Everything you would like to know about Kubernetes

Source: GetInData, Google.

Kubernetes: what is it? It has undoubtedly been one of the hottest topics in the Big Data world over recent months and the subject of many discussions. That is why we decided to sum up the facts and our thoughts on it and present an overview of the platform. This post is intended for a non-technical audience interested in the technology.

Kubernetes — basic information

The platform’s name comes from Greek and means helmsman or pilot. The same Greek root also gave rise to the words governor and cybernetics. The platform’s abbreviation is K8s: the 8 stands for the eight letters between the K and the s (“ubernete”).
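The abbreviation rule described above can be written down in a few lines. This is only an illustrative sketch; the function name `numeronym` is our own and is not part of any Kubernetes tooling.

```python
def numeronym(word: str) -> str:
    """Keep the first and last letter and replace the letters between them with their count."""
    if len(word) <= 3:
        return word  # nothing worth abbreviating
    return f"{word[0]}{len(word) - 2}{word[-1]}"


print(numeronym("Kubernetes"))  # K8s
```

The same convention gives “i18n” for “internationalization”.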

What exactly is Kubernetes? First, it is worth taking a look at its history. Around the mid-2000s, engineers at Google designed and built an internal cluster management system called Borg on top of container technology (containerization). The technology, which builds on features of the Linux kernel, resembles the container idea known from the shipping business: an application is packaged together with its critical dependencies and isolated from other processes. It is worth mentioning that Google was one of the early contributors to containerization, which became widely popular when the Docker project was launched in 2013. Borg predated Kubernetes, and the lessons learned from developing it, along with Google’s more than 10 years of experience with scaling and containerization, paid off in the new platform, which was introduced to the public and open-sourced in 2014.

After this somewhat lengthy (but necessary) intro, let’s get to the point and explain what Kubernetes is. It is an open-source platform for container orchestration; in other words, it helps to run applications packaged in containers. Running an app in a few containers is not complicated, but once you start scaling, the support Kubernetes offers becomes necessary. By making containerized applications dramatically easier to manage at scale, Kubernetes has become a key part of the container revolution. You can group hosts running Linux containers into a cluster, and the platform will help you manage that cluster smoothly and efficiently, including in the cloud. Kubernetes is well suited to hosting cloud-native applications that require rapid scaling, such as real-time data streaming with Apache Kafka.

Kubernetes — specs & features

Let’s move on to Kubernetes specs. Kubernetes provides a container-centric management environment. It orchestrates computing, networking, and storage infrastructure on behalf of user workloads. This adds up to a mix of PaaS (Platform as a Service) simplicity and IaaS (Infrastructure as a Service) flexibility; however, it is not a traditional, all-inclusive PaaS system. The platform operates at the container level rather than at the hardware level and delivers some generally applicable features known from PaaS offerings: scaling, logging, and deployment, to name a few. Kubernetes is not monolithic, and its default solutions are optional and pluggable. The platform provides the building blocks for developer platforms, but preserves user choice and flexibility.

Labels (key-value metadata attached to Kubernetes objects) let users organize their resources however they please. Annotations (similar to labels, but intended for non-identifying metadata) let users decorate resources with custom information to facilitate their workflows and give management tools an easy way to checkpoint state. What’s more, the control plane is built on the same APIs that are available to developers and users. Thanks to that, users can write their own controllers with their own APIs, which can be targeted by a general-purpose command-line tool.
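For readers who want to see what labels and annotations look like in practice, here is a minimal sketch in Python. The Pod name, image, label values and annotation key are all hypothetical, and the `matches` helper is our own simplified imitation of equality-based label selection, not Kubernetes code.

```python
import json

# Hypothetical Pod manifest; every name and value here is illustrative only.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "kafka-consumer",
        # Labels: identifying metadata used to group and select objects.
        "labels": {"app": "kafka-consumer", "env": "testing"},
        # Annotations: non-identifying metadata for people and tools.
        "annotations": {"example.com/owner": "data-team"},
    },
    "spec": {
        "containers": [{"name": "consumer", "image": "example/consumer:1.0"}]
    },
}


def matches(labels: dict, selector: dict) -> bool:
    """Simplified equality-based selection: every selector pair must appear in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


print(matches(pod["metadata"]["labels"], {"env": "testing"}))     # True
print(matches(pod["metadata"]["labels"], {"env": "production"}))  # False
print(json.dumps(pod, indent=2))
```

The printed JSON is a regular manifest; kubectl accepts JSON as well as YAML, so a structure like this could in principle be saved to a file and applied to a cluster.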

Although Kubernetes gives its users a lot of freedom (for example, it does not limit the types of applications supported), some limitations follow from the platform’s design. It does not deploy source code or build applications. It does not dictate logging, monitoring, or alerting solutions. It does not provide application-level services such as middleware, data-processing frameworks (e.g. Spark), or databases (e.g. MySQL) as built-in services. Nor does it provide comprehensive machine configuration, maintenance, or management systems.

Kubernetes vs. IT challenges

Cloud versus on-premises: this dilemma is familiar to any fast-growing IT company. The migration process is complicated, because a company moving to the cloud has to meet many requirements: infrastructure adaptation, security and risk management, and data privacy, to name a few. Kubernetes gives its users a hand in the migration process because it defines a standard API. What’s more, the same tools (kubectl, helm) can manage a distributed infrastructure both on-premises (e.g. OpenShift) and in the cloud (e.g. GKE). We can also start up our own cluster on a PC (via minikube or minishift) to get some hands-on experience with the platform. One needs to remember, though, that since K8s is extensible, some distributions solve the same problems in their own manner (e.g. K8s Ingress vs OpenShift Route).

How about storage? There are a few bottlenecks here. K8s pods are ephemeral, so on their own they are not a good fit for running stateful applications (quick reminder: stateful apps are those that keep track of previously stored information and use it for current and future transactions). K8s addresses this by letting you attach volumes to pods in order to persist the app state, but only a few storage types are supported, and most of them allow only exclusive write access. This makes the transition challenging, because storage is not yet easy to scale.
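To make the idea of connecting a volume to a pod more concrete, the sketch below builds two manifests in Python: a storage claim and a pod that mounts it. All names, the image, the size and the mount path are hypothetical. We assume that “exclusive write” corresponds to the Kubernetes access mode `ReadWriteOnce`, which allows a read-write mount from a single node.

```python
import json

# Hypothetical storage claim; the name and size are illustrative.
claim = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "app-state"},
    "spec": {
        # ReadWriteOnce: the volume can be mounted read-write by one node only.
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "1Gi"}},
    },
}

# Hypothetical pod that keeps its state on the claimed volume.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "stateful-app"},
    "spec": {
        "containers": [
            {
                "name": "app",
                "image": "example/stateful-app:1.0",
                # Data written under this path outlives the container.
                "volumeMounts": [{"name": "state", "mountPath": "/var/lib/app"}],
            }
        ],
        "volumes": [
            {
                "name": "state",
                "persistentVolumeClaim": {"claimName": claim["metadata"]["name"]},
            }
        ],
    },
}

for manifest in (claim, pod):
    print(json.dumps(manifest, indent=2))
```

The pod itself stays ephemeral; only the data written under the mount path is kept on the claimed volume.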

From a Big Data perspective, one of the most appealing K8s features is isolation. The namespace concept, which fits well with CI/CD (Continuous Integration and Continuous Deployment), offers a separate environment inside a cluster, with access policies defined at the namespace level. This gives the freedom to create different environments (testing, production, development) and use the same scripts to run queries on them. Setting up the environments is easy, and their full independence is ensured. From a business standpoint this solution is advantageous: costs stay under control because all the environments are maintained on one cluster. Just as importantly, using the same scripts makes testing far smoother and more accurate. No doubt, isolation is a great feature for a data scientist who wants to run an independent project with a huge computing need.
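The idea of reusing the same scripts across isolated environments can be sketched as follows. The ConfigMap, its contents and the namespace names are hypothetical, and `for_namespace` is a helper of our own; the point is only that one definition can be bound to several namespaces without being rewritten.

```python
import copy
import json

# Hypothetical object shared by all environments.
base = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {"name": "job-settings"},
    "data": {"query": "SELECT 1"},
}


def for_namespace(manifest: dict, namespace: str) -> dict:
    """Return a copy of the manifest bound to the given namespace."""
    scoped = copy.deepcopy(manifest)  # leave the shared definition untouched
    scoped["metadata"]["namespace"] = namespace
    return scoped


environments = ["development", "testing", "production"]
manifests = [for_namespace(base, env) for env in environments]

for manifest in manifests:
    print(json.dumps(manifest))
```

Each resulting manifest differs only in `metadata.namespace`, so the same definition can be tested in one environment and promoted to another unchanged.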

What else? We also find it helpful that Spark already runs on the platform, which eliminates the need for YARN (the Hadoop resource manager commonly used to run Spark). However, at the time of writing Kubernetes does not yet deliver all the features available on YARN, such as dynamic allocation.

All in all, Kubernetes serves as a big toolbox that delivers flexible, customizable solutions, but it is not yet refined enough to fully handle some critical purposes such as data storage or data transition. The system provides a set of composable control processes that are continuously developed by a huge K8s community to drive the system toward the state its users want. All of this adds up to an already powerful system, backed by big corporations, with a great deal of potential. As of now, the platform is not perfect and has a lot to improve in the data storage and transition fields, but we believe this is only temporary: the K8s project is open source, and the community keeps working on new functionalities and features in order to deliver a more stable and powerful system.

Last updated: 31 May 2019

Written by

Mikołaj Wiśniewski

Big Data Researcher
