High Availability in Kubernetes

Oct 2019 • 2 min read

What is High Availability?

Availability means, well, available. If you say something is available, it means it’s there when you need or want it.

In the context of software systems, availability means your system is constantly up and running.

And in the context of distributed systems, it means that when one node goes down, the system does not fail; it remains operational.

High availability means your system has a high rate of uptime, i.e. the system or system component is “continuously available for a desirably long length of time”. Examples of this:

SLA guarantees like AWS’ uptime guarantee.
Twitter’s architecture for scale.
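To make an uptime guarantee concrete, it helps to turn the percentage into a downtime budget. Here is a minimal sketch of that arithmetic; the function name and the target percentages are just illustrative, not taken from any particular provider's SLA.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(availability_percent):
    """Hours per year a system may be down and still meet the target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


# Illustrative targets only; check the actual SLA you care about.
for target in (99.0, 99.9, 99.99):
    hours = allowed_downtime_hours(target)
    print(f"{target}% uptime allows roughly {hours:.2f} hours of downtime per year")
```

Each extra "nine" cuts the downtime budget by a factor of ten, which is why the higher targets get expensive quickly.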

The rest of this post is gonna assume we’re in the context of distributed systems.

Why is it important?

Several factors come into play when you decide what you should care about in your system. One of the most important factors, which should go without saying really, is that users should be able to access your system. Amongst other great benefits, High Availability ensures:

Redundancy - The software system/component doesn’t go down in case of a single-node failure, i.e. disaster recovery before you even discover there’s been a disaster.
Scalability - New nodes can be added when the inevitable need to scale comes.
Easier Maintenance - Nodes can be taken down, maintained and put back up without system failure.

What are some patterns commonly used for high availability?

An HA system consists of at least two nodes, since that’s the minimum number of nodes needed for redundancy.
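A rough way to see why a second node matters: if the system stays up as long as at least one node is up, the chance of everything being down at once shrinks with every node you add. The sketch below assumes nodes fail independently and that a single healthy node can carry the load, which real systems only approximate. The function name is made up for this example.

```python
def combined_availability(node_availability, node_count):
    """Availability of a group that is up while at least one node is up.

    Assumes independent failures, which is a simplification.
    """
    all_down = (1 - node_availability) ** node_count
    return 1 - all_down


# One node at 99% versus two and three redundant nodes at 99% each.
for count in (1, 2, 3):
    print(count, "node(s):", f"{combined_availability(0.99, count):.6f}")
```

Under these assumptions, two 99% nodes already behave like a 99.99% system.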

Node configurations

Active/active
Active/passive
N + 1
N + M
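As a rough illustration of the first two configurations, here is a small sketch of which nodes would take traffic in each mode. The node tuples and the `serving_nodes` helper are hypothetical, made up for this post; real clusters decide this through health checks and a load balancer or cluster manager.

```python
def serving_nodes(nodes, mode):
    """Return the names of the nodes that should receive traffic.

    nodes is an ordered list of (name, is_healthy) tuples.
    """
    healthy = [name for name, is_healthy in nodes if is_healthy]
    if mode == "active/active":
        # Every healthy node serves traffic at the same time.
        return healthy
    if mode == "active/passive":
        # Only the first healthy node serves; the others stay on standby.
        return healthy[:1]
    raise ValueError(f"unknown mode: {mode}")


cluster = [("node-a", True), ("node-b", True)]
print(serving_nodes(cluster, "active/active"))
print(serving_nodes(cluster, "active/passive"))
```

N + 1 and N + M follow the same idea as active/passive, with one or M standby nodes shared across N active ones.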

Failover Strategies

What happens when a master node fails?

Fail fast
On fail, try one
On fail, try all

Source (Wikipedia)
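The three failover strategies differ only in how many nodes a client is willing to try before giving up. Here is a minimal sketch of that idea, assuming each node is a plain callable that raises `ConnectionError` when it is down. The strategy names and the `call_with_failover` helper are invented for this example and don't come from any specific library.

```python
def call_with_failover(nodes, request, strategy="fail_fast"):
    """Send request to nodes in order, following the chosen strategy.

    nodes is an ordered list of callables; a node that is down raises
    ConnectionError.
    """
    if strategy == "fail_fast":
        max_attempts = 1           # give up if the first node is unreachable
    elif strategy == "try_one":
        max_attempts = 2           # try one more node after the first failure
    elif strategy == "try_all":
        max_attempts = len(nodes)  # try every node before giving up
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    last_error = None
    for node in nodes[:max_attempts]:
        try:
            return node(request)
        except ConnectionError as exc:
            last_error = exc
    raise RuntimeError("all attempted nodes failed") from last_error


def down_node(request):
    raise ConnectionError("node is down")


def healthy_node(request):
    return f"handled {request}"


print(call_with_failover([down_node, healthy_node], "GET /", "try_one"))
```

With the same node list, `fail_fast` would raise straight away, while `try_all` only gives up once every node has been tried.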

Tools & providers

Nginx - NGINX Plus configured with HA, or Nginx with keepalived.
AWS - SoftNAS
DigitalOcean - Community tools for HA
GCP - Wasn’t immediately apparent but this tutorial seems good.

Conclusion

High Availability is important, especially in the context of distributed systems. It prevents disasters, enables you to provide SLAs, builds in redundancy and all-round makes your system more battle-ready.

In the next post, we’ll go into implementing HA in Kubernetes. Exciting times, innit?