Last week, Miles Ward, Google Cloud Platform's Global Head of Solutions, kicked off our Container series with a post about the overarching concepts around containers, Docker, and Kubernetes. If you have not yet had a chance to read his post, we suggest you start there to arm yourself with the knowledge you will need for this one! This week, Joe Beda, Senior Staff Engineer and one of the founding members of the Kubernetes project, will go a level deeper and talk in depth about the core technical concepts that underpin Google's use of containers. These have informed the creation of Kubernetes and provide a foundation for future posts in this series.

What makes a container cluster?

The recent rise of container systems like Docker has (rightly) created a lot of excitement. The ability to package, transfer and run application code across many different environments enables new levels of fluidity in how we manage applications. But, as users expand their use of containers into production, new problems crop up in terms of managing which containers run where, dealing with large numbers of containers and facilitating communication between containers across hosts.

This is where Kubernetes comes in. Kubernetes is an open source toolkit from Google that helps to solve these problems. As we discussed in last week's post, we consider Kubernetes a container cluster manager. Lots of folks call projects in this area orchestration systems, but that has never rung true for me. Orchestral music is meticulously planned, with the score decided and distributed to the musicians before the performance starts. Managing a Kubernetes cluster is more like an improv jazz performance: it is a dynamic system that reacts to conditions and inputs in real time.

So, what makes a container cluster? Is it a dynamic system that places and oversees sets of containers and the connections between them? Sure, that and a bunch of compute nodes (either raw physical servers or virtual machines). In the remainder of this post, we'll explore three things: what makes up a container cluster, how to work with one, and how the interconnected elements work together. Additionally, based on our experience, a container cluster should include a management layer, and we will dig into the implications of this below.

Why run a container cluster?

Here at Google, we build container clusters around a common set of requirements: always be available, be able to be patched and updated, scale to meet demand, be easy to instrument and monitor, and so on. While containers allow applications to be easily and rapidly deployed and broken down into smaller pieces for more granular management, you still need a solution for managing your containers so that they meet these goals. Over the past ten years at Google, we've found that having a container cluster manager addresses these requirements and provides a number of benefits:

• Microservices that keep moving parts manageable. Having a cluster manager enables us to break down an application into smaller parts that are separately manageable and scalable. This lets us scale up our organization by having clear interfaces between smaller teams of engineers.

• Self-healing systems in the face of failures. The cluster manager automatically restarts work from failed machines on healthy machines.

• Low-friction horizontal scaling. A container cluster provides tools for horizontal scaling, such that adding more capacity can be as easy as changing a setting (the replication count).
• High utilization and efficiency rates. Google was able to dramatically increase resource utilization and efficiency after moving to containers.

• Specialized roles for cluster and application operations teams. Developers are able to focus much more on the service they are building rather than on the underlying infrastructure that supports it. For example, the Gmail operations and development teams rarely have to talk directly to the cluster operations team. Having a separation of concerns here allows (but doesn't force) operations teams to be more widely leveraged.

Now, we understand that some of what we do is unique, so let's explore the ingredients of a great container cluster manager and what you should focus on to realize the benefits of running containers in clusters.

Ingredient 1: Dynamic container placement

To build a successful cluster, you need a little bit of that jazz improv. You should be able to package up your workload in a container image and declaratively specify your intent around how and where it is going to run. The cluster management system should then decide where to actually run your workload. We call this cluster scheduling.

This doesn't mean that things are placed arbitrarily. On the contrary, there is a whole set of constraints that comes into play, making cluster scheduling a very interesting and hard problem [1] from a computer science point of view. When scheduling, the scheduler makes sure to place your workload on a VM or physical machine with enough spare capacity (e.g. CPU, RAM, I/O, storage). But, in order to meet a reliability objective, the scheduler might also need to spread a set of jobs across machines or racks to reduce the risk from correlated failures. Or perhaps some machines have special hardware (e.g. GPUs or local SSDs) that certain workloads require. The scheduler should also react to changing conditions and reschedule work to deal with failures, with the cluster growing or shrinking, or with opportunities for increased efficiency. To enable this, we encourage users to avoid pinning a container to a specific machine. Sometimes you have to fall back on "I want that container on that machine," but that should be a rare exception.

The next question is: what are we scheduling? The easy answer here is individual containers. But often you want to have a set of containers running as a team on the same host. Examples include a data loader paired with a data server, or a log compressor/saver process paired with a server. These containers usually need to be located together, and you want to ensure that they do not become separated during dynamic placement. To enable this, we introduced in Kubernetes a concept known as a pod. A pod is a set of containers that are placed and scheduled together as a unit on a worker machine (also known as a Kubernetes node). By scheduling in terms of pods, Kubernetes can pack lots of work onto a node in a reliable way.

Ingredient 2: Thinking in sets

When working on a single physical node, tools generally don't operate on containers in bulk. But when moving to a container cluster, you want to easily scale out across nodes. To do this, you need to work in terms of sets of things instead of singletons, and you want to keep those sets similarly configured. In Kubernetes, we manage sets of pods using two additional concepts: labels and replication controllers.

Every pod in Kubernetes has a set of key/value pairs associated with it that we call labels. You can select a set of pods by constructing a query based on these labels.
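To make this concrete, here is a rough sketch of what a pod definition with labels might look like. The field names follow the later stable v1 API shape rather than the v0.8 API described in this post, and the image names are hypothetical, used purely for illustration:

apiVersion: v1
kind: Pod
metadata:
  name: frontend-pod
  labels:
    env: prod        # environment label, queried later to select this pod
    tier: fe         # tier label (frontend)
spec:
  containers:
  - name: php-frontend              # the main server container
    image: example/php-frontend:1.2 # hypothetical image name
    ports:
    - containerPort: 80
  - name: log-saver                 # helper container co-scheduled on the same node
    image: example/log-saver:0.9    # hypothetical image name

Both containers in this pod are placed on the same node as a unit, and the env and tier labels are what later queries (and the replication controller described below) select on.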
Kubernetes has no opinion on the correct way to organize pods. It is up to you to organize your pods in a way that makes sense to you. You can organize by application tier, geographic location, development environment, etc. In fact, as labels are non-hierarchical, you can organize your pods in multiple ways simultaneously.

Example: let's say you have a simple service that has a frontend and a backend. But you also have different environments – test, staging and production. You can label your production frontend pods with env=prod tier=fe and your production backend pods with env=prod tier=be. You could similarly label your test and staging environments. Then, when operating on or inspecting your cluster, you could restrict yourself to the pods where env=prod to see both the frontend and backend. Or you could look at all of your frontends across test, staging and production. You can imagine how this organization system can adapt as you add more tiers and environments.

Figure 1 - Filtering pods using labels

Scaling

Now that we have a way of identifying and maintaining a pool of similarly configured pods, we can use this functionality for horizontal scaling (i.e. "scaling out"). To make this easy, we have a helper object in Kubernetes called the replication controller. It maintains a pool of pods based on a desired replication count, a pod template and a label selector/query. It is really pretty easy to wrap your head around. Here is some pseudo-code:

object replication_controller {
  property num_replicas
  property template
  property label_selector

  runReplicationController(num_desired_pods, template, label_selector) {
    loop forever {
      num_pods = length(query(label_selector))
      if num_pods > num_desired_pods {
        kill_pods(num_pods - num_desired_pods)
      } else if num_pods < num_desired_pods {
        create_pods(template, num_desired_pods - num_pods)
      }
    }
  }
}

So, for example, if you wanted to run a PHP frontend tier with 3 pods, you would create a replication controller with an appropriate pod template (pointing at your PHP container image) and a num_replicas count of 3. You would identify the set of pods that this replication controller is managing with a label query of env=prod tier=fe. The replication controller takes an easy-to-understand desired state and tirelessly works to make it true. And if you want to scale in or out, all you have to do is change the desired replication count, and the replication controller will take care of the rest. By focusing on the desired state of the system, we end up with something that is easier to reason about.

Figure 2 - The replication controller enforces desired state
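For illustration, here is a rough sketch of that PHP frontend replication controller expressed as a declarative object. As with the pod above, the schema follows the later v1 API rather than the v0.8 API, and the image name is hypothetical:

apiVersion: v1
kind: ReplicationController
metadata:
  name: fe-controller
spec:
  replicas: 3                  # desired replication count
  selector:                    # label query identifying the pods this controller manages
    env: prod
    tier: fe
  template:                    # pod template used to create new replicas
    metadata:
      labels:
        env: prod
        tier: fe
    spec:
      containers:
      - name: php-frontend
        image: example/php-frontend:1.2   # hypothetical image name
        ports:
        - containerPort: 80

Scaling the tier in or out is then just a matter of changing the replicas value; the control loop sketched in the pseudo-code does the rest.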
Ingredient 3: Connecting within a cluster

You can do a lot of interesting things with the ingredients listed so far. Any sort of highly parallelizable work distribution (continuous integration systems, video encoding, etc.) can work without a lot of communication between individual pods. However, most sophisticated applications are more of a network of smaller services (microservices) that communicate with each other. The tiers of traditional application architectures are really nodes in a graph.

A cluster management system needs a name resolution system that works with the ingredients described above. Just like DNS provides the resolution of domain names to IP addresses, this naming service resolves service names to targets, with some additional requirements. Specifically, changes should be propagated almost immediately when things start or are moved, and a service name should resolve to a set of targets, possibly with extra metadata about those targets (e.g. shard assignment). For the Kubernetes API, this is done with a combination of label selectors and the watch API pattern [2]. This provides a very lightweight form of service discovery.

Most clients aren't going to be rewritten immediately (or ever) to take advantage of a new naming API. Most programs want a single address and port to talk to in order to communicate with another tier. To bridge this gap, Kubernetes introduces the idea of a service proxy. This is a simple network load balancer/proxy that does the name query for you and exposes it as a single stable IP address and port (with DNS) on the network. Currently, this proxy does simple round-robin balancing across all backends identified by a label selector. Over time, Kubernetes plans to allow for custom proxies/ambassadors that can make smarter, domain-specific decisions (keep an eye on the Kubernetes roadmap for details as the community defines this). One example that I'd love to see is a MySQL-aware ambassador that knows how to send write traffic to the master and read traffic to read replicas.
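As a rough sketch of how this looks to an application (again using the later v1 API shape and hypothetical names), a service object ties a stable name and port to whatever pods currently match a label selector:

apiVersion: v1
kind: Service
metadata:
  name: frontend            # other tiers talk to "frontend" and never track pod IPs
spec:
  selector:                 # the proxy balances across pods matching these labels
    env: prod
    tier: fe
  ports:
  - port: 80                # stable port exposed by the service
    targetPort: 80          # port on the backend pods

Pods come and go as the replication controller does its work, but the service name and its IP/port stay stable, so clients do not need to know anything about the dynamic placement underneath.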
Voila! Now you can see how the three key components of a cluster management system fit together: dynamic container placement, thinking in sets of containers, and connecting within a cluster.

We asked the question at the top of this post: what makes a container cluster? Hopefully, from the details and information we've provided, you have an answer. Simply put, a container cluster is a dynamic system that places and manages containers, grouped together in pods and running on nodes, along with all the interconnections and communication channels.

When we started Kubernetes with the goal of externalizing Google's experiences with containers, we initially focused on just scheduling and dynamic placement. However, when thinking through the various systems that are absolutely necessary to build a real application, we immediately saw that it was necessary to add the other ingredients: pods, labels and the replication controller. To my mind, these are the bare minimum necessary to build a usable container cluster manager.

Kubernetes is still baking in the oven, but it is coming together nicely. We just released v0.8, which you can download here. We're still adding features and refining the ones we have, and we've published our roadmap to v1.0. The project has quickly established a large and growing community of contributing partners (such as Red Hat, VMware, Microsoft, IBM, CoreOS and others) and customers who use Kubernetes in a variety of environments.

While we have a lot of experience in this space, Google doesn't have all the answers. There are requirements and considerations that we don't see internally. With that in mind, please check out what we are building and get involved! Try it out, file bug reports, ask for help or send a pull request (PR).

-Posted by Joe Beda, Senior Staff Engineer and Kubernetes Cofounder

───────────────────────────────────
[1] This is the classic knapsack problem, which is NP-hard in the general case.
[2] The watch API pattern is a way to deliver asynchronous events from a service. It is common in lock server systems (ZooKeeper, etc.) that are derived from the original Google Chubby paper. The client essentially reaches out and hangs a request until there are changes. This is usually coupled with version numbers so that the client stays current on any changes.