High Availability in Kubernetes
Oct 2019 • 2 min read
What is High Availability?
Availability means, well, available. If you say something is available, it means it’s there when you need or want it.
In the context of software systems, availability means your system is constantly up and running.
And in the context of distributed systems, it means that when one node goes down, the system does not fail; it remains operational.
High availability means your system has a high rate of uptime, i.e. the system or system component is “continuously available for a desirably long length of time”. Examples of this:
- SLA guarantees like AWS’ uptime guarantee.
- Twitter’s architecture for scale.
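To make "a high rate of uptime" a bit more concrete, here is a minimal sketch that turns an availability percentage into a downtime budget. The function name and the 365-day default are my own choices for illustration, not something taken from any provider's SLA.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: float = 365.0) -> float:
    """Return how many minutes of downtime fit in the period at the given availability."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    # The downtime budget is whatever fraction of the period is NOT covered by uptime.
    return total_minutes * (1 - availability_percent / 100.0)
```

With these assumptions, `allowed_downtime_minutes(99.9)` works out to roughly 525.6 minutes, which is just under nine hours a year.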
The rest of this post is gonna assume we’re in the context of distributed systems.
Why is it important?
Several factors come into play when you decide what you should care about in your system. One of the most important factors, which should go without saying really, is that users should be able to access your system. Amongst other great benefits, High Availability ensures:
- Redundancy - The software system/component doesn’t go down in case of a single-node failure, i.e. disaster recovery before you even discover there’s been a disaster.
- Scalability - New nodes can be added when the inevitable need to scale comes.
- Easier Maintenance - Nodes can be taken down, maintained and put back up without system failure.
What are some patterns commonly used for high availability?
An HA system needs at least two nodes, since that’s the minimum number of nodes needed for redundancy.
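A quick back-of-the-envelope sketch of why redundancy helps: if each node is up with some probability and nodes fail independently (a big assumption, since real failures are often correlated), the system is only down when every node is down at once. The function name here is hypothetical.

```python
def combined_availability(node_availability: float, nodes: int) -> float:
    """Availability of a group of redundant nodes, assuming independent failures."""
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    # The group is unavailable only when all nodes are unavailable at the same time.
    return 1 - (1 - node_availability) ** nodes
```

Under that assumption, two nodes that are each 99% available give roughly 0.9999 combined, so a second node moves you from "two nines" to "four nines" on paper.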
Node configurations
- Active/active
- Active/passive
- N + 1
- N + M
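Here is a rough sketch of how these configurations differ in terms of which nodes end up serving traffic. The `Node` class and function names are made up for illustration, and real clusters handle this with health checks and load balancers rather than a simple list walk. N + 1 is just the N + M case with a single standby.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    healthy: bool = True


def active_active(nodes: List[Node]) -> List[str]:
    """Active/active: every healthy node serves traffic."""
    return [n.name for n in nodes if n.healthy]


def active_passive(nodes: List[Node]) -> List[str]:
    """Active/passive: only the first healthy node serves; the rest wait as standbys."""
    for n in nodes:
        if n.healthy:
            return [n.name]
    return []


def n_plus_m(active: List[Node], standby: List[Node]) -> List[str]:
    """N + M: each failed active node is replaced by a healthy standby, while any remain."""
    spares = [s for s in standby if s.healthy]
    serving = []
    for n in active:
        if n.healthy:
            serving.append(n.name)
        elif spares:
            serving.append(spares.pop(0).name)
    return serving
```

For example, with node `a` down and node `b` up, both `active_active` and `active_passive` leave `b` serving, while `n_plus_m` with one standby `s1` swaps `s1` in for `a`.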
Failover Strategies
What happens when a master node fails?
- Fail fast
- On fail, try one
- On fail, try all
Source (Wikipedia)
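The three strategies mostly differ in how many nodes a client is willing to try before giving up. Below is a minimal sketch under that reading: `call_with_failover`, the strategy names and the `AllNodesFailed` exception are all hypothetical, and `request` stands in for whatever call you would actually make to a node.

```python
from typing import Callable, Sequence


class AllNodesFailed(Exception):
    """Raised when every node the strategy allowed us to try has failed."""


def call_with_failover(nodes: Sequence[str],
                       request: Callable[[str], str],
                       strategy: str = "fail_fast") -> str:
    if not nodes:
        raise ValueError("at least one node is required")
    # Maximum number of nodes each strategy will attempt, in order.
    limits = {
        "fail_fast": 1,          # give up as soon as the first node fails
        "try_one": 2,            # on failure, try exactly one more node
        "try_all": len(nodes),   # on failure, keep going until nodes run out
    }
    if strategy not in limits:
        raise ValueError(f"unknown strategy: {strategy}")

    last_error = None
    for node in nodes[:limits[strategy]]:
        try:
            return request(node)
        except ConnectionError as err:
            last_error = err  # remember the failure and move on to the next node
    raise AllNodesFailed(f"no node responded using {strategy}") from last_error
```

The trade-off is latency versus resilience: fail fast surfaces errors immediately, while try all hides more failures from the caller at the cost of a slower worst case.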
Tools & providers
- Nginx - NGINX Plus configured with HA, or Nginx with keepalived.
- AWS - SoftNAS
- DigitalOcean - Community tools for HA
- GCP - Wasn’t immediately apparent but this tutorial seems good.
Conclusion
High Availability is important, especially in the context of distributed systems. It helps you ride out failures, enables you to provide SLAs, builds in redundancy and all-round makes your system more battle-ready.
In the next post, we’ll go into implementing HA in Kubernetes. Exciting times, innit?
Further Reading
Some resources you can check out to dig deeper.
High Availability
Distributed Systems
Hi! My name is Opeyemi. I am an SRE who cares about Observability, Performance and Dogs. You can learn more about me or send me a message on Twitter.