Opeyemi Onikute | Reliability Engineer

#About

I am an infrastructure engineer currently focusing on making systems observable the SRE way. This improves reliability by helping you infer the internal behaviour of a system from its external output. This article is a good long-form explanation of how this can benefit an engineering organisation.

My experience with observability is mainly in the AWS, Linux, NodeJS and PHP space. I’m familiar with Datadog, New Relic, CloudWatch, Splunk and the ELK stack. My current focus is on digging into the Linux kernel to understand the performance of NodeJS applications and find room for optimisation.

In the past, I’ve been a full-stack engineer focused on building and scaling systems - sometimes of a distributed nature. In an alternate timeline, I’ve also been a graphic designer expressing myself in minimalist styles.

I am interested in:

Observability: Distributed Tracing, APM, eBPF, perf, Network (TCP, UDP, DNS), Linux systems (CPU, Memory, Disk), Dashboards, Synthetics
Programming Languages: NodeJS, Python, Golang, PHP, Bash, HCL (Terraform)
Advanced Linux Troubleshooting: CPU/Memory/Disk Profiling, Network Performance, Debugging Techniques
System Stuff: Docker, Kubernetes, ECS, Linux Kernel
Distributed Systems: Distributed Consensus, Architecture, Protocols
Cloud Platforms: AWS, GCP, DigitalOcean, Heroku

If you're interested in stuff like that, read my blog.

#Work

Currently: Doing SRE at Cloudflare - helping to build a better internet.
Earlier: Did observability at an amazing company called Paystack - helping to build systems that power payments across Africa.
Even earlier: Built several full-stack projects at a design agency called Check DC.
Before that: Worked with some amazing engineers at Konga.

#Publications

I like to talk/write about my work, mainly to help others grow. Some examples:

Prometheus and Grafana: Visualizing Application Performance: Described as one of the best online courses teaching Grafana on the market.
Prometheus Essential Training: A detailed online course that takes you from beginner to intermediate-level skill in Prometheus.
Designing an Incident Response Process at Scale: Conference talk. How can your organization resolve customer-impacting incidents in record time?
Observability at Paystack - the first five years: Sharing how to set up Observability at organisations across the industry.
Improving platform resilience at Cloudflare through automation: How we use automation to improve platform resilience at a global scale.
How the Cloudflare global network optimizes for system reboots: How to use sinusoidal waves to determine low-traffic periods for safe server and data center maintenance.

#Projects

I enjoy making little tools/products for fun. It's a huge part of why I became interested in building software. You can see most of these projects here.

#Contact

The best place to reach me is @Ope__O on Twitter.

Sharing tribal knowledge with insightful videos.

I teach Monitoring and Observability concepts on LinkedIn Learning.

I’m hilarious.

All this, but in employment lingo.
