Everything you would like to know about Kubernetes

Source: GetInData, Google.

Kubernetes. What is it? Undoubtedly one of the hottest topics in the Big Data world in recent months and the subject of many discussions. That is why we’ve decided to sum up the facts and our thoughts on it and present an overview of the platform. This post is intended for a non-technical audience interested in the technology.

Kubernetes — basic information

The platform’s name comes from Greek and means helmsman or pilot. It shares a root with the words governor and cybernetic. The platform’s abbreviation is K8s: the 8 stands for the eight letters between the K and the s, “ubernete”.

What exactly is Kubernetes? First, it is worth taking a look at its history. The platform grew out of Borg, an internal cluster management system that Google engineers designed and developed (around the mid-2000s) on top of container technology. Containerization, which builds on isolation features of the Linux kernel, resembles the container idea known from the shipping business: an application is packaged together with its critical dependencies and isolated from other processes. It is worth mentioning that Google was one of the early contributors to containerization, which became popular when the Docker project was launched in 2013. Borg predated Kubernetes, and the lessons learned from developing it, along with Google’s more than 10 years of experience with scaling and containerization, paid off in the new platform, which was introduced to the public and open-sourced in 2014.

After this somewhat lengthy (but needed) intro, let’s cut to the chase and explain what Kubernetes is. It is an open-source platform for container orchestration; in other words, it helps to run applications packed in containers. Running an app in a few containers is not a complicated task, but once you start scaling, you need the kind of support Kubernetes provides. By making containerized applications dramatically easier to manage at scale, Kubernetes has become a key part of the container revolution. You can now group hosts running Linux containers into a cluster, and the platform will help you manage that cluster smoothly and efficiently, also in a cloud environment. Kubernetes is well suited to hosting cloud-native applications that require rapid scaling, such as real-time data streaming with Apache Kafka.
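For readers who like to see an idea in code, the "keep the right number of copies running" part of orchestration can be pictured as a small loop. This is only a toy sketch in Python, not how Kubernetes is implemented: the function `reconcile`, the `new_name` helper and the `app-N` names are all made up for illustration.

```python
import itertools


def reconcile(desired, running, new_name):
    """Return a new list of running copies that matches the desired count.

    `running` is a list of (hypothetical) container names and `new_name`
    is a function that produces a fresh name for each copy we start.
    """
    running = list(running)          # work on a copy, leave the input alone
    while len(running) < desired:
        running.append(new_name())   # too few copies: start a missing one
    while len(running) > desired:
        running.pop()                # too many copies: stop the newest one
    return running


if __name__ == "__main__":
    ids = itertools.count()

    def fresh():
        return f"app-{next(ids)}"

    state = reconcile(3, [], fresh)      # initial rollout of three copies
    print(state)
    state.remove("app-1")                # simulate one copy crashing
    state = reconcile(3, state, fresh)   # the loop replaces the lost copy
    print(state)
    print(reconcile(1, state, fresh))    # scaling down to one copy
```

The point of the sketch is that the user only states how many copies are wanted; working out what to start or stop is left to the loop.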

Kubernetes — specs & features

Let’s move on to Kubernetes specs. Kubernetes provides a container-centric management environment: it orchestrates computing, networking, and storage infrastructure on behalf of user workloads. This adds up to a mix of PaaS (Platform as a Service) simplicity and IaaS (Infrastructure as a Service) flexibility; however, it is not a traditional, all-inclusive PaaS system. The platform operates at the container level rather than at the hardware level and delivers some generally applicable features known from PaaS offerings, such as scaling, logging, and deployment. Kubernetes is not monolithic: its default solutions are optional and open to customization. The platform leaves the door wide open to building developer platforms on top of it, while preserving user choice and flexibility. Labels (key/value metadata attached to Kubernetes objects) let users organize their resources however they please. Annotations (similar to labels, but meant for non-identifying metadata) let users decorate resources with custom information to facilitate their workflows and give management tools an easy way to checkpoint state. What’s more, the control plane is built on the same APIs that are available to developers and users. Thanks to that, users can write their own controllers with their own APIs, which can then be targeted by a general-purpose command-line tool.
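The difference between labels and annotations can be illustrated with a short Python sketch. It assumes objects shaped like the `metadata` section of a Kubernetes manifest and mimics only the simplest, equality-based kind of label selection. The pod names, the `example.com/owner` annotation key and the helper functions are hypothetical.

```python
def matches(labels, selector):
    """True when every key/value pair of the selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(objects, selector):
    """Names of the objects whose labels satisfy the selector."""
    return [
        obj["metadata"]["name"]
        for obj in objects
        if matches(obj["metadata"].get("labels", {}), selector)
    ]


# Hypothetical objects: labels identify and group them, while annotations
# only carry extra notes and are never used for selection.
PODS = [
    {"metadata": {"name": "web-prod",
                  "labels": {"app": "web", "env": "production"},
                  "annotations": {"example.com/owner": "team-a"}}},
    {"metadata": {"name": "web-test",
                  "labels": {"app": "web", "env": "testing"},
                  "annotations": {"example.com/owner": "team-a"}}},
    {"metadata": {"name": "db-prod",
                  "labels": {"app": "db", "env": "production"}}},
]

if __name__ == "__main__":
    print(select(PODS, {"app": "web"}))
    print(select(PODS, {"app": "web", "env": "production"}))
    # Selecting by an annotation key finds nothing: only labels are consulted.
    print(select(PODS, {"example.com/owner": "team-a"}))
```

In this sketch the labels answer "which objects belong together?", while the annotation simply records who owns them.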

Although Kubernetes gives its users a lot of freedom (for example, it does not limit the types of applications supported), it has some limitations that follow from the platform’s design. It does not deploy source code or build applications; it does not dictate logging, monitoring, or alerting solutions; and it does not provide application-level services as built-in PaaS offerings, such as middleware, data-processing frameworks (e.g. Spark), or databases (e.g. MySQL). Nor does it provide advanced machine configuration, maintenance, or management systems.

Kubernetes vs. IT challenges

Cloud vs on premises: every fast-growing IT company knows this dilemma. The migration process is complicated, as a future cloud company needs to meet a lot of requirements: infrastructure adaptation, security and risk management, and data privacy, to name a few. Kubernetes gives its users a hand in the migration process because it defines a standard API. What’s more, the same tools (kubectl, helm) can manage infrastructure both on premises (e.g. OpenShift) and in the cloud (e.g. GKE). We can also start up our own cluster on a PC (via minikube or minishift) to get some hands-on experience with the platform. But one needs to remember that, since K8s is extensible, some distributions solve problems in their own way (e.g. K8s Ingress vs OpenShift Route).

How about storage? There are a few bottlenecks here. K8s pods are ephemeral and, on their own, are not a good fit for running stateful applications (quick reminder: stateful apps are the ones that keep track of previously stored information and use it for current and future transactions). Kubernetes addresses this by letting you attach volumes to pods in order to save the app state, but only a few storage types are supported, mostly in exclusive-write mode, where a volume can be written to from a single place at a time. This makes the transition process challenging, because storage is not yet easy to scale.
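To make the idea of attaching a volume to a pod more concrete, here is a rough Python sketch that builds the two manifests involved: a storage request (a PersistentVolumeClaim) and a pod that mounts it. The field names follow the Kubernetes v1 API, and `ReadWriteOnce` is the access mode in which the volume can be mounted read-write by a single node. The helper functions, object names, image and mount path are hypothetical.

```python
import json


def build_claim(name, size):
    """A request for storage that outlives any single pod."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # Exclusive write: mountable read-write by one node at a time.
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


def build_pod(name, image, claim_name, mount_path):
    """A pod whose container sees the claimed storage at mount_path."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                "volumeMounts": [{"name": "state", "mountPath": mount_path}],
            }],
            "volumes": [{
                "name": "state",
                "persistentVolumeClaim": {"claimName": claim_name},
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical names, image and size, chosen only for illustration.
    claim = build_claim("orders-data", "1Gi")
    pod = build_pod("orders-db", "example.com/orders-db:1.0",
                    "orders-data", "/var/lib/data")
    print(json.dumps([claim, pod], indent=2))
```

If the pod is replaced, a new pod that references the same claim picks up the stored state, which is what makes stateful apps workable despite pods being ephemeral.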

From a Big Data perspective, one of the most appealing K8s features is isolation. The namespace concept, which fits the CI/CD approach (Continuous Integration and Continuous Deployment), offers a separate environment inside a cluster, with access policies defined at the namespace level. This gives you the freedom to create different environments (testing, production, development) and use the same scripts to run queries on them. Setting up the environments is easy, and their full independence is ensured. From a business standpoint this solution is advantageous: costs stay under control because all the environments are maintained on one cluster. Just as importantly, using the same scripts makes testing far smoother and more accurate. No doubt, isolation is a great feature for a data scientist who needs to run an independent project with huge computing needs.
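The "same scripts, different environments" idea can be sketched in a few lines of Python. The sketch builds one Deployment manifest (field names follow the Kubernetes `apps/v1` API) and varies only the namespace. The function `deployment_for`, the application name `report-job` and the image are hypothetical, and the namespaces are assumed to exist already.

```python
import json


def deployment_for(namespace):
    """The same Deployment definition, placed in the given namespace."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        # The namespace is the only thing that differs between environments.
        "metadata": {"name": "report-job", "namespace": namespace},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": {"app": "report-job"}},
            "template": {
                "metadata": {"labels": {"app": "report-job"}},
                "spec": {
                    "containers": [{
                        "name": "report-job",
                        "image": "example.com/report-job:1.0",
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    for env in ("development", "testing", "production"):
        manifest = deployment_for(env)
        print(env, "->", json.dumps(manifest["metadata"]))
```

Because the definition is identical apart from the namespace, what gets tested in one environment is exactly what later runs in another.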

What else? We also find it helpful that Spark can already run on the platform, which removes the need for YARN (a resource manager commonly used to run Spark); however, Kubernetes does not yet deliver all the features available on YARN, such as dynamic allocation.

All in all, Kubernetes is a big box with lots of tools that deliver nice, customizable solutions, but that are not yet refined enough to fully handle some critical needs such as data storage or data migration. The system is built from a set of composable control processes that drive the cluster towards the state its users ask for, and it is continuously developed by a huge K8s community. All of this adds up to an already powerful system, backed by big corporations, with a great deal of potential. As of now, the platform is not perfect and has a lot to improve in the storage and migration areas, but we believe this is only temporary: the K8s project is open source, and the community keeps working on new functionalities and features in order to deliver a more stable and powerful system.

31 May 2019
