Deploying Applications with Confidence Using Kubernetes

In 2015, only 10% of us were using some container orchestration solution (Kubernetes or one of its competitors). By 2017, 71% of us were using Kubernetes, according to a survey conducted by 451 Research. More and more teams opted for Kubernetes because of its efficiency and cross-cloud integration.

With the increased popularity, more and more teams started to look for lessons learned and best practices on the topic of deploying and running applications with confidence using Kubernetes. This article summarizes my experiences - with the hope that you can learn a few new tricks after reading it.

I am going to present a very similar topic at NodeSummit in a few weeks - if you'd like to discuss these points in person, find me at the conference!

Learning Kubernetes

Before putting Kubernetes into production, it is essential to understand its concepts. If you are new to Kubernetes, stop reading this article for a few minutes and check out these resources:

Running Kubernetes

Once you have tried running Kubernetes locally, it is time to better understand its internals and set up a cluster in the cloud. To do so, I'd recommend doing Kelsey Hightower's Kubernetes The Hard Way. It walks you through all the prerequisites, then shows how to bootstrap the etcd cluster, the Kubernetes control plane, and the worker nodes, how to set up DNS, and finally how to run smoke tests.

While you might not want to do that for your production cluster, I found it useful for understanding how the components of Kubernetes work together. Whichever cloud vendor you picked, it most probably offers a managed Kubernetes service. For a comparison, check out this or this article.

Containerizing Node.js Applications

Once you have your production Kubernetes cluster up and running, it is time to create the production version of your images. By production images, I mean images that follow Docker best practices, including:

While I was with GoDaddy, we published Docker images that implement these best practices. You can grab them from Docker Hub, or check out the source.

Handle Application Lifecycle Events

Pods - the smallest computing units in Kubernetes - can have the following states:

Pending
Running
Succeeded
Failed
Unknown

Kubernetes monitors the containers in a Pod using probes. It defines both liveness and readiness probes: liveness probes tell Kubernetes when a container has to be restarted, while readiness probes determine whether a container can serve traffic.
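
As a minimal sketch of what an application could expose for these probes, here is a plain Node.js HTTP server with one endpoint per probe. The paths `/healthz` and `/ready`, the `state` object, and the `probeStatus` helper are my own illustrative assumptions, not something Kubernetes prescribes.

```javascript
const http = require('http');

// Hypothetical application state: set `ready` to true once startup work
// (for example, connecting to a database) has finished.
const state = { ready: false };

// Maps a request path to an HTTP status code. Kept as a pure function
// so the probe logic can be checked without opening a socket.
function probeStatus(path, appState) {
  if (path === '/healthz') return 200; // liveness: the process can respond
  if (path === '/ready') return appState.ready ? 200 : 503; // readiness
  return 404;
}

// The server is created here but not started; see the note below.
const server = http.createServer((req, res) => {
  res.statusCode = probeStatus(req.url, state);
  res.end();
});
```

To try it, call `server.listen(8080)` and set `state.ready = true` when the application is able to serve traffic. An HTTP probe counts any status from 200 up to (but not including) 400 as a success, so the 503 keeps the container out of rotation until it is ready.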

To make sure you don't fail any requests, we've open-sourced a tiny library that helps you implement these checks and shut your application down gracefully. The library is called terminus, and it extends your Node.js applications with health checks and graceful shutdown procedures.
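
To illustrate what a graceful shutdown involves, here is a rough sketch that uses only Node's built-in APIs. It is not terminus's API; `closeGracefully`, `shutdownOnSigterm`, and the `cleanup` hook are hypothetical names chosen for this example.

```javascript
// Wraps server.close() in a promise. The server stops accepting new
// connections, and the promise settles once existing connections have ended.
function closeGracefully(server) {
  return new Promise((resolve, reject) => {
    server.close((err) => (err ? reject(err) : resolve()));
  });
}

// Registers a handler for SIGTERM, the signal Kubernetes sends by default
// before it stops a container. `cleanup` is a hypothetical hook for closing
// database connections and similar resources.
function shutdownOnSigterm(server, cleanup = async () => {}) {
  process.once('SIGTERM', async () => {
    try {
      await closeGracefully(server);
      await cleanup();
      process.exit(0);
    } catch (err) {
      console.error('graceful shutdown failed', err);
      process.exit(1);
    }
  });
}
```

You would call `shutdownOnSigterm(server)` right after `server.listen(...)`. In a real service you would probably also fail the readiness check first, so that no new traffic is routed to the pod while it drains.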

Packaging Applications

Helm is the package manager for Kubernetes. It helps you manage Kubernetes applications: Helm Charts help you define, install, and upgrade them. In the Helm universe, you find three big concepts:

Chart is a Helm package - it contains all the resource definitions an application needs,
Repository is the place where Helm Charts live - you can think of it as the npm or maven registry,
Release is a running instance of a Helm Chart.

With Helm, you can add MySQL to your Kubernetes cluster as simply as:

# find the MySQL helm packages
$ helm search mysql
NAME              VERSION    DESCRIPTION
stable/mysql      0.1.0      Chart for MySQL
stable/mariadb    0.5.1      Chart for MariaDB

# install MariaDB (a MySQL-compatible database)
$ helm install stable/mariadb

To learn more about Helm, I'd recommend checking out the following resources:

Securing the Kubernetes Cluster

As Kubernetes is managed through a REST API, securing that interface should be your top priority.

To secure it, you can:

use TLS for all communication towards the API,
use role-based access control for authorization.

For a more comprehensive list, check out the official security guidelines.

Resource Constraints

When creating Pods or Deployments in Kubernetes, you can optionally define how much CPU or memory each container can use. One of the simplest scenarios for limiting resources a container can use is the following:

apiVersion: v1
kind: Pod
metadata:
  name: cpu-demo
  namespace: cpu-example
spec:
  containers:
    - name: cpu-demo-ctr
      image: vish/stress
      resources:
        limits:
          cpu: "1"
        requests:
          cpu: "0.5"

Limits and requests for CPU resources are measured in CPU units. Fractional requests are allowed. One CPU, in Kubernetes, is equivalent to:

1 AWS vCPU
1 GCP Core

Limits and requests for memory are measured in bytes.
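
To make the units concrete, here is a small sketch that converts quantity strings into plain numbers. The helper names `parseCpu` and `parseMemory` are hypothetical, and the sketch assumes only the `m` (millicpu) suffix for CPU and a subset of the memory suffixes Kubernetes accepts (`k`, `M`, `G`, `Ki`, `Mi`, `Gi`).

```javascript
// Converts a CPU quantity to CPU units: "1" -> 1, "0.5" -> 0.5, "500m" -> 0.5.
function parseCpu(quantity) {
  const match = /^(\d+(?:\.\d+)?)(m?)$/.exec(String(quantity).trim());
  if (!match) throw new Error(`invalid CPU quantity: ${quantity}`);
  const value = Number(match[1]);
  return match[2] === 'm' ? value / 1000 : value;
}

// Subset of memory suffixes: decimal (k, M, G) and power-of-two (Ki, Mi, Gi).
const MEMORY_SUFFIXES = new Map([
  ['', 1],
  ['k', 1e3], ['M', 1e6], ['G', 1e9],
  ['Ki', 2 ** 10], ['Mi', 2 ** 20], ['Gi', 2 ** 30],
]);

// Converts a memory quantity to bytes: "128Mi" -> 134217728.
function parseMemory(quantity) {
  const match = /^(\d+(?:\.\d+)?)([A-Za-z]*)$/.exec(String(quantity).trim());
  if (!match || !MEMORY_SUFFIXES.has(match[2])) {
    throw new Error(`invalid memory quantity: ${quantity}`);
  }
  return Number(match[1]) * MEMORY_SUFFIXES.get(match[2]);
}
```

Note the difference between `M` and `Mi`: `parseMemory('128M')` is 128,000,000 bytes, while `parseMemory('128Mi')` is 134,217,728 bytes.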

In my experience, the sooner you start using resource requests and limits, the better off you will be, because:

You can move CPU/memory intensive applications to dedicated node pools.
With node pools, you eliminate noisy neighbors (if you don't have CPU limits, a container may use up all the CPU resources a node has).
It enables you to scale your cluster, because you will know how much traffic a pod can serve with the given resources.
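
The capacity planning that requests make possible can be sketched as simple arithmetic. The function names and the numbers in the usage note are hypothetical, and the sketch ignores system pods and per-node pod limits, so treat the result as an upper bound rather than a guarantee.

```javascript
// How many replicas of a pod fit on one node, going by resource requests.
// CPU is expressed in millicpu (1000 = one CPU) to keep the math in integers.
function podsPerNode(node, podRequests) {
  const byCpu = Math.floor(node.cpuMillis / podRequests.cpuMillis);
  const byMemory = Math.floor(node.memoryBytes / podRequests.memoryBytes);
  return Math.min(byCpu, byMemory);
}

// How many replicas are needed for a target load, given the throughput
// you measured for a single pod under its resource limits.
function replicasFor(targetRps, rpsPerPod) {
  return Math.ceil(targetRps / rpsPerPod);
}

// How many nodes are needed to schedule that many replicas.
function nodesFor(replicas, perNode) {
  return Math.ceil(replicas / perNode);
}
```

For example, a node with 4 CPUs and 16 GiB of memory fits 8 pods that each request 500m CPU and 1 GiB, because CPU runs out first. If one pod serves 150 requests per second, a target of 1000 requests per second needs 7 replicas, which fit on a single such node.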

To learn more, I recommend reading the official articles Managing Compute Resources for Containers and Assign CPU Resources to Containers and Pods.

Disaster Recovery

Disaster recovery is usually a set of policies and procedures for what to do once a disaster hits mission-critical system components. A disaster can (but doesn't necessarily) involve data loss too.

When it comes to Kubernetes, it's a good practice to back up your cluster regularly. However, doing frequent backups is not enough - in my experience, these backups are only valuable if you know how to use them. I'd recommend scheduling practice runs in which you restore your whole service from the ground up.

For Kubernetes, I'd recommend a tool called Ark to create and restore these backups.

Ark gives you tools to back up and restore your Kubernetes cluster resources and persistent volumes. Ark lets you:

Take backups of your cluster and restore in case of loss.
Copy cluster resources across cloud providers.
Replicate your production environment for development and testing environments.

From: https://github.com/heptio/ark