Tech News

7 min read

Everything you would like to know about Kubernetes

Source: GetInData, Google. Source: GetInData, Google.

Kubernetes. What is it? Undoubtedly one of the hottest topics in Big Data world over the last months and a subject of multiple discussions. This is why we’ve decided to sum up facts and thoughts on it and present a comprehensive overview of this platform. This post is dedicated for a non-technical audience that is interested in this tech.

Kubernetes — basic information

The platform’s name etymology comes from Greek and it means helmsman or pilot. The name is also associated, rooted with governor and cybernetic. The platform’s abbreviation is K8s. The 8 replaces the 8 letters from the full name: ubernete.

Source: GetInData, Google. Source: GetInData, Google.

What exactly Kubernetes is? At first, It’s worth to take a look at Kubernetes history. Originally, the platform was developed and designed (around mid 2000s) by engineers at Google, under name Borg, on the top of container technology, containerization. The technology, invented by Linux, is similar to traditional container idea known from shipping business and assumes packaging an application with its critical dependencies, isolated from other, affiliated processes. It is worth to mention that Google was one of the early contributors to containerization and became popular when the Docker containerization project was launched in 2013. Borg predated Kubernetes and the lessons learned from developing Borg, as well as Google’s +10 years of experience with scaling and containerization, ‘paid off’ in the new platform that was introduced to public and open-sourced in 2014.

After this a bit lengthy (but needed) intro, let’s cut to the point and explain what Kubernetes is. This is an open-source source platform for container orchestration, in other words, it helps to run applications packed in containers. Though the process of running apps on a few containers is not a complicated task but if you start scaling, Kubernetes support is in need. By making containerized applications dramatically easier to manage at scale, Kubernetes has become a key part of the container revolution. Now, you can bundle together hosts running Linux containers, and the platform will support you in the process of smooth and efficient cluster management, also in the cloud environment. Kubernetes is an ideal platform for hosting cloud-native applications that require rapid scaling, like real-time data streaming through Apache Kafka.

Source: GetInData, Google. Source: GetInData, Google.

Kubernetes — specs & features

Let’s move on to Kubernetes specs. The platform has a number of features. Kubernetes provides a container-based management environment. It arranges computing, networking, and storage infrastructure on behalf of user workloads. This sums up to a mix of PaaS (Platform as a Service) simplicity and IaaS (Infrastructure as a Service) flexibility, however it is not a traditional, all-inclusive PaaS system. The platform operates at the container level rather than at the hardware level and delivers generally applicable features known from PaaS menu: scaling, logging, deployment to name a few. Kubernetes is not monolithic and default solutions are non-existent, they’re optional and ready for customization. The platform leaves the door wide open to build developer platforms, but preserves user choice and flexibility. Labels (a tool to add metadata to Kubernetes objects) empower users to organize their resources however they please. Annotations (a similar feature to label, but allows to add non-identifying metadata) enable to decorate resources with custom information to facilitate workflows and provide an easy way for management tools to checkpoint state. What’s more, the platform offers the control plane built on the basis of the same APIs available for both developers and users. Thanks to that, the latter group is equipped with the resources to write their own controllers on their own APIs, that can be targeted by a general-purpose command-line tool.

Although Kubernetes provides its users a lot of freedom for running operations (i.e. it does not limit the types of applications supported) it has some limitations arising from the platform’s idea: does not deliver traditional infrastructure services like deploying code, does not dictate logging, alerting nor monitoring solutions or PaaS offerings like application-level services such as middleware, data-processing frameworks (i.e. Spark) or databases (i.e. mySQL). Kubernetes does not support advanced machine configurations, maintenance and management solutions.

Kubernetes vs. IT challenges

Cloud vs on premise — this dilemma is known for any fast-developing IT company. The migration process is complicated as a future cloud company needs to fulfill a lot of requirements: infrastructure accommodation, security and risk management or data privacy to name a few. Kubernetes gives its users a hand in the migration process as it defines the standard API. What’s more, the same tools (kubectl, helm) can manage a distribution infrastructure both on premise (Openshift) and cloud (GKE). We can also start up our own cluster on a PC (via minikube or minishift) to get some hands-on experience with the platform. But one need to remember that since K8s is expandable, some distributions solve problems in their own manner (i.e. K8s Ingress vs OpenShift Route).

How about storage? There are a few bottlenecks here. The K8s pods are ephemeral and are not a good fit for storing stateful applications (quick reminder: stateful apps are the ones that track the previously stored information which is used for current and future transactions). This all is resolved by K8s ability to connect volumes to pods in order to save the app state, but only a few storage types are supported, mainly only as exclusive write. This makes the transition process challenging, because storage is not yet easy to scale.

From a Big Data perspective, one of the most K8s amusing features is isolation. The namespace concept, based on the CICD idea (Continuous Integration and Deployment), offers a separated environment inside a cluster with access policies defined on the namespace level. This gives a freedom to create different environments (testing, production, development) and use the same scripts to run queries on them. The process of allotting the environments is easy and their full independence is ensured. From a business standpoint this solution is advantageous, the costs are under control as the whole environment is maintained on one cluster. What’s also important, the fact of using the same scripts ensures far more smooth and accurate testing processes. No doubt, isolation is a great feature for a data scientist to run an independent project with a huge computing need.

What else? We also find it helpful that Spark is already available on the platform — it eliminates the need for YARN (app to run Spark), however Kubernetes does not yet deliver all the features available on YARN such as dynamic allocation.

All in all, Kubernetes serves as a big box with lots of tools delivering nice, fancy, and customized solutions, that are not yet refined to fully handle some major, critical purposes like data storage or data transition. The system provides a set of composable control processes that are continuously developed by a huge K8s community in order to suit users desired state. These all gives an already powerful system backed by big corporations, with a great deal of potential in the future. As of now, the platform is not perfect, it has a lot to improve in data storage and transition fields, but we believe it’s only temporary as the K8s project is open-sourced and the community works on its new functionalities and features in order to deliver a more stable and powerful system.

kubernetes

google

cloud computing

big data

spark

Last updated: 31 May 2019

Written by

Mikołaj Wiśniewski

Big Data Researcher

Like this post?
Spread the word

Want more? Check our articles

finding your way llm getindataobszar roboczy 1 4

Tutorial

Finding your way through the Large Language Models Hype

With the introduction of ChatGPT, Large Language Models (LLMs) have become without doubt the hottest topic in AI and it doesn’t seem that this is…

Big Data Event

How we evaluate the CfP submissions and build the conference agenda at Big Data Technology Warsaw Summit

Big Data Technology Warsaw Summit 2021 is fast approaching. Please save the date - February 25th, 2021. This time the conference will be organized as…

deployingsecuremlfowonawsobszar roboczy 1 4

Tutorial

Deploying secure MLflow on AWS

One of the core features of an MLOps platform is the capability of tracking and recording experiments, which can then be shared and compared. It also…

Tutorial

EU Artificial Intelligence Act - where are we now

It's coming up to a year since the European Commission published its proposal for the Artificial Intelligence Act (the AI Act/AI Regulation). The…

Use-cases/Project

Enabling Hive on Spark on CDH 5.14 — a few problems (and solutions)

Recently I’ve had an opportunity to configure CDH 5.14 Hadoop cluster of one of GetInData’s customers to make it possible to use Hive on Spark…

DATA Pill – the blue pill that (accidentally) works!

Ever felt overwhelmed by the flood of news about the latest technologies, tools, and trends in Data, AI, and ML? A new framework here, a revolutionary…

Check All

Contact us

Interested in our solutions?
Contact us!

Together, we will select the best Big Data solutions for your organization and build a project that will have a real impact on your organization.

What did you find most impressive about GetInData?

They did a very good job in finding people that fitted in Acast both technically as well as culturally.

Type the form or send a e-mail: hello@getindata.com

Everything you would like to know about Kubernetes

Kubernetes — basic information

Kubernetes — specs & features

Kubernetes vs. IT challenges

Like this post?Spread the word

Want more? Check our articles

Finding your way through the Large Language Models Hype

How we evaluate the CfP submissions and build the conference agenda at Big Data Technology Warsaw Summit

Deploying secure MLflow on AWS

EU Artificial Intelligence Act - where are we now

Enabling Hive on Spark on CDH 5.14 — a few problems (and solutions)

DATA Pill – the blue pill that (accidentally) works!

Contact us

Interested in our solutions?Contact us!

Like this post?
Spread the word

Interested in our solutions?
Contact us!