Kubernetes for SingleStoreDB Self-Managed process orchestration is growing like gangbusters. But did you know that provisioning a production-ready Kubernetes environment in the cloud is almost as involved as building one in an on-premises data center? It’s easy to think that launching a Kubernetes cluster takes just a few clicks in a wizard tool. Not so fast! Part 2 of my three-part blog series covers more of the five things you absolutely need to know.
In case you missed it, read Part 1 of my blog series, a cautionary tale of provisioning Kubernetes (K8s) clusters and the inevitable bad ending when standard-issue configurations are pressed into production-level process orchestration. In this blog I’ll focus on the pitfalls of setting up K8s environments that are robust enough to handle the rigors of mission-critical applications, and the challenges novice users face.
Data center skills are still required
I have a theory that if Kubernetes had been as popular six or seven years ago as it is today, AWS, Google Cloud and Azure would not have experienced the meteoric growth that continues today. While Kubernetes is now deployed in almost 50% of organizations worldwide, 100% of those organizations have (or had) data centers. They’ve always had large teams to manage their on-premises IT infrastructure, but with the shift to cloud computing they’ve redeployed much of that staff away from bare-metal servers, cluster administration and network management.
The point is that with Kubernetes, you still need to know how to do all of these tasks in the cloud. You still need data center skill sets to be successful with Kubernetes.
In Part 1 of this series I talked about the first of the five things you absolutely need to know about using Kubernetes to orchestrate SingleStoreDB Self-Managed: Running a production Kubernetes cluster is different from a test environment. In this blog I’ll cover points 2 and 3:
2. A Kubernetes production cluster environment is a multi-layered stack.
3. Kubernetes novices face steep challenges.
Here we go.
2. A Kubernetes production cluster environment is a multi-layered stack.
Kubernetes isn’t a solitary application working autonomously in isolation. It’s a highly orchestrated environment and, as in all technology environments, things happen, things fail and things auto-recover. When you sign on to your system at the beginning of your day, how do you know if anything happened in your cluster if you don’t monitor it? What if you’re not alerted when a problem occurs? (And there are many problems with incorrectly provisioned Kubernetes clusters.)
Building a monitoring stack
When considering K8s monitoring and alerting — because what’s the point of monitoring if you can’t be notified of critical events? — you’ll need to think about key requirements, the requisite applications and whether your organization is prepared. If you are using Kubernetes to orchestrate SingleStoreDB Self-Managed, here’s how the monitoring challenge stacks up. You’ll want to be able to:
- Capture metrics
- Generate alerts from metrics
- Trigger pages for issues and outages
- Capture logging data
- Visualize metrics and logging on dashboards
Kubernetes monitoring stacks typically comprise three separate applications, each of which requires a level of expertise and competency. One of the most popular K8s monitoring trinities is composed of:
- Prometheus for event monitoring. Prometheus is an open source software application that records real-time metrics in a time series database, with flexible queries and real-time alerting.
- Loki for recording log data in a database; it’s an open source log aggregation system inspired by Prometheus.
- Grafana for composing dashboards and visualizations across the pieces of your K8s monitoring stack. Grafana is an open source analytics and interactive visualization web application that provides charts, graphs and alerts.
Elasticsearch, Logstash and Kibana (ELK) is another popular open source trio for building a Kubernetes monitoring stack. It offers functionality broadly analogous to the Prometheus-Loki-Grafana combination, with a stronger emphasis on log aggregation and search.
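To give a concrete sense of how the Prometheus-Loki-Grafana pieces wire together, here’s a minimal sketch of a Grafana datasource provisioning file that registers both backends. The in-cluster service URLs and the `monitoring` namespace are assumptions; they depend entirely on how you deploy Prometheus and Loki in your own cluster.

```yaml
# grafana/provisioning/datasources/datasources.yaml
# Minimal sketch: registers Prometheus and Loki as Grafana data sources.
# The URLs and namespace below are assumptions -- adjust them to match
# the service names in your own deployment.
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus-server.monitoring.svc:9090
    isDefault: true                 # default data source for new panels
  - name: Loki
    type: loki
    access: proxy
    url: http://loki.monitoring.svc:3100
```

Dropping a file like this into Grafana’s provisioning directory means the data sources come up with the container, rather than being clicked together by hand in the UI.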
Stacks within stacks
Prometheus, Loki and Grafana are each a technology stack unto themselves, with their own production set-up considerations, documentation, implementation tricks and situations they’re best suited for. Especially Grafana — its dashboards are defined in Grafana’s own JSON model, and each panel’s queries are written in the query language of the underlying data source (PromQL for Prometheus, LogQL for Loki), so you can present information on the dashboard the way you want and show exactly what you’re monitoring. Grafana can also handle alerting, as an alternative to Prometheus.
Between Prometheus, Loki, Grafana and ELK, if you use any of these options to build your K8s monitoring stack you’ll probably be “fine,” but there’s usually one arrangement that works best for what you’re trying to accomplish.
Paging and alerting
If you’ve gone 80% of the way with your stack by implementing monitoring, logging and dashboards, you’ll want to make it to the metaphorical finish line with systems for alerting and paging. Popular choices include:
- Alertmanager from Prometheus is an open source solution that handles alerts and takes care of deduplicating, grouping and routing them through email, PagerDuty or Atlassian Opsgenie.
- PagerDuty is a commercial incident response platform that serves as the central control point for time-sensitive and business-critical work across organizations, including Kubernetes cluster operations.
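As a rough sketch of how Alertmanager’s deduplicating, grouping and routing works in practice, here’s a minimal configuration that groups related alerts, routes critical ones to PagerDuty and sends everything else to email. The SMTP host, addresses and PagerDuty integration key are all placeholders you’d replace with your own.

```yaml
# alertmanager.yml -- minimal sketch; hosts, addresses and keys are placeholders.
global:
  smtp_smarthost: 'smtp.example.com:587'   # placeholder SMTP relay for email
  smtp_from: 'alertmanager@example.com'

route:
  receiver: team-email                     # default receiver
  group_by: ['alertname', 'namespace']     # deduplicate and group related alerts
  group_wait: 30s                          # wait before the first notification
  group_interval: 5m                       # wait before adding new alerts to a group
  repeat_interval: 4h                      # re-notify if an alert is still firing
  routes:
    - matchers: ['severity="critical"']    # critical alerts page a human
      receiver: pagerduty-oncall

receivers:
  - name: team-email
    email_configs:
      - to: oncall@example.com
  - name: pagerduty-oncall
    pagerduty_configs:
      - routing_key: <your-pagerduty-integration-key>
```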
As you set up alerts and paging, you’ll need to work out alerting thresholds. For example, if a particular error occurs and a pod in the Kubernetes cluster restarts three times in a row, a page should be triggered because a process is likely broken. Who will you page? You’ll need schedules for the various people on duty, their phone numbers and escalation tiers, i.e., if the Tier 1 point administrator doesn’t respond to the alert in five minutes, the alert gets kicked up to a Tier 2 manager. Tier 3? It’s usually the chief technology officer.
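For that restart example, a Prometheus alerting rule might look something like the following minimal sketch. It assumes kube-state-metrics is running to expose the `kube_pod_container_status_restarts_total` metric, and the threshold and time window are assumptions you would tune for your own environment.

```yaml
# prometheus-rules.yaml -- minimal sketch; threshold and window are
# assumptions to tune, and kube-state-metrics must be deployed.
groups:
  - name: pod-restarts
    rules:
      - alert: PodRestartingRepeatedly
        # Fires when a container has restarted 3 or more times in 15 minutes.
        expr: increase(kube_pod_container_status_restarts_total[15m]) >= 3
        labels:
          severity: critical      # matches the paging route sketched above
        annotations:
          summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is restarting repeatedly"
```

Because the rule carries `severity: critical`, the Alertmanager routing sketched above would send it to PagerDuty rather than the default email receiver.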
3. Kubernetes novices face steep challenges.
As you can see, the set-up, care and feeding of a cloud Kubernetes cluster closely mirrors the standard operating procedures of a brick-and-mortar data center. But even though K8s lives in the cloud, how many casual users have the engineering and administrative skills to handle this level of complexity?
Of course, most novice Kubernetes users don’t. They struggle because they don’t have a strong understanding of the underlying technologies, and lack the fundamental skills to support a K8s cluster — a digital animal that’s living, breathing and changing fast. Let’s recap the categories of required expertise:
- Infrastructure: Kubernetes node topology is itself an expert’s wheelhouse. Whether you are setting up a K8s cluster on Google Cloud, AWS or Azure, how you configure VM pools determines whether you will have enough high-powered resources available to schedule when needed. This extends to VM sizing as well, with processor architecture (AMD or Intel?) and NUMA socket considerations figuring in prominently.
- Storage: A few clicks can make the difference between storage that’s highly performant or painfully slow. Google Cloud, AWS and Microsoft Azure have very different default storage resources. For example, the default storage type for K8s on AWS is usually good for most workloads. But on Azure, basic disk storage is limited, so when you set up a cluster there you will want to choose premium storage (see the StorageClass sketch after this list).
- Networking: Choosing a performant network stack requires knowing how Kubernetes works best on different cloud platforms. For example, Azure’s default kubenet plugin is not ready for prime-time production Kubernetes clusters. You’re better off taking the extra step of provisioning specific IP address ranges to make sure your pods have enough address space and network capacity. If you want network observability, which allows for monitoring and alerting, you’ll need to know how to enable that as well.
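To make the Azure storage point concrete, here’s a minimal sketch of a Kubernetes StorageClass that requests premium SSD-backed managed disks through the Azure Disk CSI driver. The class name here is an assumption for illustration; AKS also ships built-in classes such as managed-csi-premium that accomplish the same thing.

```yaml
# premium-storageclass.yaml -- minimal sketch for AKS; the class name is
# an assumption (AKS ships a built-in managed-csi-premium class as well).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-premium
provisioner: disk.csi.azure.com            # Azure Disk CSI driver
parameters:
  skuName: Premium_LRS                     # premium SSD-backed managed disks
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer    # bind only when a pod is scheduled
allowVolumeExpansion: true
```

A persistent volume claim that names this class gets premium disks; one that takes the cluster default may quietly land on much slower storage, which is exactly the few-clicks difference described above.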
There are many idiosyncrasies in each cloud service. Just look at Azure, which has about 600 configuration variables, some of which don’t matter much, while others can make a performance-killing difference in your Kubernetes cluster.
Finally, using Kubernetes successfully requires a solid understanding of software container concepts and the K8s technology stack. While an O’Reilly video for absolute beginners might be sufficient for getting Kubernetes up and running on your laptop, it won’t impart enough knowledge to troubleshoot root cause issues when you’re orchestrating a production-level number of SingleStoreDB Self-Managed instances.
Offload the burden to SingleStoreDB Self-Managed
If your organization is not ready or equipped to handle a data center’s worth of Kubernetes engineering and administration tasks, you’ll definitely want to read the third and final blog of my series covering five things you absolutely need to know about using Kubernetes with SingleStoreDB Self-Managed:
4. Singlestore Helios makes all of this easy for you.
5. Still want to go it alone? How SingleStoreDB Self-Managed fits into a production environment.
Follow @SingleStoreDB on Twitter to keep up with all of our latest news.