Everything you would like to know about Kubernetes

Source: GetInData, Google.

Kubernetes. What is it? Undoubtedly one of the hottest topics in the Big Data world over the last few months and the subject of many discussions. This is why we’ve decided to sum up the facts and our thoughts on it and present an overview of the platform. This post is intended for a non-technical audience interested in the technology.

Kubernetes — basic information

The platform’s name comes from Greek and means helmsman or pilot. It shares a root with the words governor and cybernetic. The platform’s abbreviation is K8s: the 8 stands for the eight letters between the K and the s, “ubernete”.
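For the curious, the abbreviation rule can be written down in a few lines of Python. This is only an illustrative sketch; the function name numeronym is our own and not part of any Kubernetes tooling.

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as: first letter + number of letters in between + last letter."""
    if len(word) < 3:
        return word  # too short, nothing to abbreviate
    return f"{word[0]}{len(word) - 2}{word[-1]}"


print(numeronym("Kubernetes"))  # K8s
```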

What exactly is Kubernetes? First, it’s worth taking a look at its history. Kubernetes grew out of Borg, an internal cluster management system that engineers at Google designed and developed around the mid-2000s on top of container technology. Containerization, which on Linux builds on isolation features of the kernel, resembles the container idea known from the shipping business: an application is packaged together with its critical dependencies and isolated from other processes. It is worth mentioning that Google was one of the early contributors to containerization, which became popular when the Docker project was launched in 2013. Borg predated Kubernetes, and the lessons learned from developing Borg, as well as Google’s more than 10 years of experience with scaling and containerization, paid off in the new platform, which was introduced to the public and open-sourced in 2014.

After this somewhat lengthy (but needed) intro, let’s get to the point and explain what Kubernetes is. It is an open-source platform for container orchestration; in other words, it helps to run applications packaged in containers. Running an app in a few containers is not complicated, but once you start scaling, you need the kind of support Kubernetes provides. By making containerized applications dramatically easier to manage at scale, Kubernetes has become a key part of the container revolution. You can group hosts running Linux containers into a cluster, and the platform will help you manage that cluster smoothly and efficiently, including in a cloud environment. Kubernetes is well suited to hosting cloud-native applications that require rapid scaling, such as real-time data streaming through Apache Kafka.
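To give a feeling for what orchestration means in practice, here is a toy sketch in Python. It is not Kubernetes code: ToyCluster and reconcile are hypothetical names we made up, and a real cluster does far more. The sketch only shows the core idea that you declare how many copies of an app you want, and a loop keeps starting or stopping containers until reality matches.

```python
class ToyCluster:
    """A stand-in for a cluster: just a list of running container names."""

    def __init__(self):
        self.running = []
        self._next_id = 0

    def start(self, app):
        self.running.append(f"{app}-{self._next_id}")
        self._next_id += 1

    def stop(self):
        self.running.pop()


def reconcile(cluster, app, desired_replicas):
    """Start or stop containers until the actual count equals the desired count."""
    while len(cluster.running) < desired_replicas:
        cluster.start(app)
    while len(cluster.running) > desired_replicas:
        cluster.stop()
    return cluster.running


cluster = ToyCluster()
reconcile(cluster, "web", 3)  # scale up to three copies
cluster.running.pop(0)        # simulate one container crashing
reconcile(cluster, "web", 3)  # the loop notices the gap and starts a replacement
print(cluster.running)
```

With a handful of containers you could do this by hand; the value of an orchestrator is that the same loop keeps working when the desired number is in the hundreds.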

Kubernetes — specs & features

Let’s move on to Kubernetes specs. Kubernetes provides a container-based management environment: it orchestrates computing, networking, and storage infrastructure on behalf of user workloads. This adds up to a mix of PaaS (Platform as a Service) simplicity and IaaS (Infrastructure as a Service) flexibility, although it is not a traditional, all-inclusive PaaS system. The platform operates at the container level rather than at the hardware level and delivers some generally applicable features familiar from PaaS offerings: scaling, logging, and deployment, to name a few. Kubernetes is not monolithic, and its default solutions are optional and pluggable. The platform provides the building blocks for developer platforms but preserves user choice and flexibility. Labels (key-value metadata attached to Kubernetes objects) let users organize their resources however they please. Annotations (similar to labels, but for non-identifying metadata) let users decorate resources with custom information to facilitate their workflows and give management tools an easy way to checkpoint state. What’s more, the control plane is built on the same APIs that are available to developers and users. Thanks to that, users can write their own controllers with their own APIs, which can be targeted by a general-purpose command-line tool.
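The difference between labels and annotations is easier to see on an example. The sketch below uses plain Python dictionaries shaped like the metadata section of Kubernetes objects; the pod names, label values, and the example.com/owner annotation key are made up for illustration. The matches function mimics a simple equality-based label selector, where every requested key-value pair has to be present.

```python
# Three objects with labels (used for selecting) and annotations (free-form notes).
pods = [
    {"metadata": {"name": "kafka-0",
                  "labels": {"app": "kafka", "env": "production"},
                  "annotations": {"example.com/owner": "data-team"}}},
    {"metadata": {"name": "kafka-test-0",
                  "labels": {"app": "kafka", "env": "testing"},
                  "annotations": {"example.com/owner": "qa-team"}}},
    {"metadata": {"name": "web-0",
                  "labels": {"app": "web", "env": "production"},
                  "annotations": {}}},
]


def matches(labels, selector):
    """True if every key-value pair in the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(objects, selector):
    """Return the names of the objects whose labels match the selector."""
    return [obj["metadata"]["name"]
            for obj in objects
            if matches(obj["metadata"].get("labels", {}), selector)]


print(select(pods, {"app": "kafka"}))                       # both Kafka pods
print(select(pods, {"app": "kafka", "env": "production"}))  # only the production one
```

Annotations take no part in the selection: they travel with the object as extra information for people and tools.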

Although Kubernetes gives its users a lot of freedom (e.g. it does not limit the types of applications supported), it has some limitations that follow from the platform’s design: it does not deploy source code or build applications, it does not dictate logging, monitoring, or alerting solutions, and it does not provide application-level services such as middleware, data-processing frameworks (e.g. Spark), or databases (e.g. MySQL) as built-in services. Kubernetes also does not provide comprehensive machine configuration, maintenance, or management solutions.

Kubernetes vs. IT challenges

Cloud or on premise: this dilemma is familiar to any fast-growing IT company. Migration is complicated because a company moving to the cloud has to meet many requirements, such as infrastructure adaptation, security and risk management, and data privacy. Kubernetes gives its users a hand in the migration because it defines a standard API. What’s more, the same tools (kubectl, helm) can manage distributed infrastructure both on premise (e.g. OpenShift) and in the cloud (e.g. GKE). We can also start our own cluster on a PC (via minikube or minishift) to get some hands-on experience with the platform. One needs to remember, though, that because K8s is extensible, some distributions solve the same problems in their own way (e.g. K8s Ingress vs. OpenShift Route).

How about storage? There are a few bottlenecks here. K8s pods are ephemeral, so on their own they are not a good fit for stateful applications (quick reminder: stateful apps are the ones that keep track of previously stored information and use it in current and future transactions). K8s addresses this by letting you attach volumes to pods to preserve the app state, but only a few storage types are supported, and mostly in exclusive-write mode. This makes the transition challenging, because storage is not yet easy to scale.
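As a rough illustration of attaching a volume to a pod, the sketch below builds two Kubernetes objects as Python dictionaries and prints them as JSON, a format kubectl accepts alongside YAML. The object names, the image, the mount path, and the 1Gi size are placeholders of ours. We also assume that the exclusive-write mode mentioned above corresponds to the ReadWriteOnce access mode, which allows read-write mounting by a single node.

```python
import json

# A request for storage that outlives any single pod.
claim = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "app-state"},           # placeholder name
    "spec": {
        "accessModes": ["ReadWriteOnce"],        # read-write from a single node
        "resources": {"requests": {"storage": "1Gi"}},
    },
}

# A pod that mounts the claimed volume at a path inside its container.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "stateful-app"},        # placeholder name
    "spec": {
        "containers": [{
            "name": "app",
            "image": "example/stateful-app:1.0",  # hypothetical image
            "volumeMounts": [{"name": "state", "mountPath": "/data"}],
        }],
        "volumes": [{
            "name": "state",
            "persistentVolumeClaim": {"claimName": claim["metadata"]["name"]},
        }],
    },
}

print(json.dumps(claim, indent=2))
print(json.dumps(pod, indent=2))
```

If the pod is deleted and recreated, the new pod can mount the same claim and find the data written by its predecessor under /data.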

From a Big Data perspective, one of the most appealing K8s features is isolation. The namespace concept, which fits the CI/CD (Continuous Integration and Continuous Deployment) idea, offers a separate environment inside a cluster, with access policies defined at the namespace level. This gives you the freedom to create different environments (testing, production, development) and use the same scripts to run queries on them. Setting up the environments is easy, and they remain fully independent. From a business standpoint this is advantageous: costs stay under control because all the environments are maintained on one cluster. Just as importantly, using the same scripts makes testing far smoother and more accurate. No doubt, isolation is a great feature for a data scientist who wants to run an independent project with a huge computing need.
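The "same scripts, different environments" idea can be sketched as follows. One job definition, written here as a Python dictionary, is stamped with a different namespace for each environment; nothing else changes. The job name and image are hypothetical, and for_namespace is a helper of our own, not a Kubernetes function.

```python
import copy

# One definition of the workload, shared by all environments.
JOB_TEMPLATE = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "nightly-report"},  # placeholder name
    "spec": {
        "template": {
            "spec": {
                "containers": [{"name": "report",
                                "image": "example/report:1.0"}],  # hypothetical image
                "restartPolicy": "Never",
            }
        }
    },
}


def for_namespace(manifest, namespace):
    """Return a copy of the manifest that targets the given namespace."""
    stamped = copy.deepcopy(manifest)  # leave the shared template untouched
    stamped["metadata"]["namespace"] = namespace
    return stamped


manifests = {ns: for_namespace(JOB_TEMPLATE, ns)
             for ns in ("development", "testing", "production")}

for ns, manifest in manifests.items():
    print(ns, "->", manifest["metadata"])
```

Because only the namespace differs, whatever was verified in testing is the same definition that later runs in production.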

What else? We also find it helpful that Spark can already run on the platform, which eliminates the need for YARN (the resource manager commonly used to run Spark). However, Kubernetes does not yet deliver all the features available on YARN, such as dynamic allocation.
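For readers who want to see what submitting a Spark job to Kubernetes looks like, here is a hedged sketch that only assembles the spark-submit command line in Python and prints it. The API server address, image, and jar path are placeholders to replace with your own values, and spark_submit_command is a helper name of ours. The number of executors is fixed up front, which reflects the lack of dynamic allocation mentioned above.

```python
import shlex


def spark_submit_command(api_server, image, namespace, app_jar, main_class, executors=2):
    """Build a spark-submit command that targets a Kubernetes cluster."""
    return [
        "spark-submit",
        "--master", f"k8s://{api_server}",  # Kubernetes acts as the cluster manager
        "--deploy-mode", "cluster",
        "--class", main_class,
        "--conf", f"spark.kubernetes.namespace={namespace}",
        "--conf", f"spark.kubernetes.container.image={image}",
        "--conf", f"spark.executor.instances={executors}",  # fixed executor count
        app_jar,
    ]


cmd = spark_submit_command(
    api_server="https://kubernetes.example.com:6443",  # placeholder address
    image="example/spark:latest",                       # hypothetical image
    namespace="testing",
    app_jar="local:///opt/spark/examples/jars/spark-examples.jar",  # placeholder path
    main_class="org.apache.spark.examples.SparkPi",
)
print(shlex.join(cmd))
```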

All in all, Kubernetes is a big box of tools that deliver nice, customizable solutions but are not yet refined enough to fully handle some major, critical purposes such as data storage or data transition. The system provides a set of composable control processes that are continuously developed by a huge K8s community and that keep driving the current state towards the user’s desired state. All of this adds up to an already powerful system backed by big corporations, with a great deal of potential for the future. As of now, the platform is not perfect and has a lot to improve in the data storage and transition fields, but we believe this is only temporary: the K8s project is open source, and the community keeps working on new functionalities and features in order to deliver a more stable and powerful system.

31 May 2019
