Huge step in keeping Keytiles highly available

published: 2021-09-10

Let's try not to be too technical here and focus on the customer benefit of this big change. To put it short: from now on we can guarantee much more stability and much better availability in Keytiles!

To explain this a bit, let's start with the truth: every service or application has a limited set of resources (= server machines) assigned to it. These should always be sufficient to serve the "usual" load. For obvious business reasons, the maintainers of every service or application try to avoid "overscaling" the system. That means running it on just enough servers.

But what can a team do if they realize the system has a bottleneck somewhere? If demand grows, even unexpectedly and out of nowhere, what can we do? This is the main question! If a system is not prepared well enough to give the maintainer team options to "do something" before the problem reaches the customers (resulting in slow responses, random failures or, in the worst case, even data loss), then it is 100% sure that outages in the service will come.

At Keytiles we use Cassandra as the data storage layer, and this layer is one of our potentially serious bottlenecks. Cassandra is scalable by design, but scaling it up is not a simple task and definitely not a quick one either. It takes time and requires planning and then executing the plan. It cannot be a real option when we detect, for example, that something "dirty" has started to happen right now and the system is short on resources.

But that is not all! If we want to introduce a new feature which is data intensive, simply dropping it onto the data storage is always a risk, because we cannot be sure what effect it will have (over time!) on the storage, putting everything that already exists in danger.

So as a first step we wanted to have much more and much better control over the storage layer. And this is now done! From this moment on we can have not just one but multiple storage clusters, and we can very quickly configure Keytiles (in the background! Super important!) to decide exactly what is stored in which storage.
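
For the technically curious, here is a minimal sketch of the idea in Python. It is not our actual implementation: the class name `StorageRouter`, the data category names and the cluster names are all made up for illustration. It only shows the principle of a routing table that decides which storage cluster holds which kind of data, and that can be changed while the system is running.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StorageRouter:
    """Hypothetical routing table: data category -> storage cluster name."""

    default_cluster: str
    routes: Dict[str, str] = field(default_factory=dict)

    def cluster_for(self, category: str) -> str:
        # Categories without an explicit route fall back to the default cluster.
        return self.routes.get(category, self.default_cluster)

    def reroute(self, category: str, cluster: str) -> None:
        # Updating the mapping at runtime stands in for "configure in the background".
        self.routes[category] = cluster


# All names below are illustrative assumptions, not real Keytiles settings.
router = StorageRouter(default_cluster="cluster-main")

# A new, data-intensive feature gets its own cluster,
# so it cannot put the existing data in danger.
router.reroute("new-feature-data", "cluster-experimental")
```

In this sketch, everything that has no explicit route keeps going to the default cluster, so moving one kind of data elsewhere does not touch the rest.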

With this "configurability" at hand we have lots of options to solve the problems we see before they reach you, the customer!