Tutorial
10 min read

Logs analytics at scale in the cloud with Loki

Logs can provide a lot of useful information about the environment and the status of an application, and they should be part of our monitoring stack. We'll discuss how metrics enriched with logs become a valuable source of truth about the platform, and then talk about observability, which has become the key to running a stable, efficient and continuously working service in any environment. This is what makes our platform fully observable.

Observability is quite similar to DevOps in that it is not limited to technology; it covers organizational culture and approach too. The concept of observability is prominent in the DevOps approach because, as described above, the goal of monitoring is not limited to collecting and processing logs and metrics: the system should deliver enough information about its state to make it observable. That is what we call observability. A good synonym would be understandability for its users.

Let’s go and take a look at a solution designed for Kubernetes-native software that we can easily install on any cloud or on-premise infrastructure.

Managed vs. self-hosted solutions

The first issue that arises when talking about the use of the public cloud is choosing the right solution: a fully managed service provided by the cloud vendor, or a self-managed application. Each of the main public cloud providers delivers its own solution addressing the need to collect and analyze logs from our applications and infrastructure: Google Cloud has Cloud Logging, AWS has CloudWatch Logs and Microsoft Azure has Azure Monitor. Initially, it looks great: the service is fully managed, we don't need to worry about scalability, and we can easily integrate it with any cloud service or with our own applications. Unfortunately, such an approach does not provide much freedom in terms of configuration and in many cases it can incur quite high costs.

This is where self-managed log analytics tools come into play. The most popular one is the Elastic stack, but there are also some great alternatives. The most interesting one, which we use in multiple projects for different customers at GetInData, is Loki from Grafana Labs; other options include Graylog, Datadog, LogDNA and Sumo Logic.

The second issue is the volume of logs: how many logs are produced, how many of them we actually need in the platform, and how this will grow over time. It is necessary to plan the infrastructure, configure each log pipeline and estimate the costs - the last point being especially important when deciding to go with cloud managed services.

The third issue is about visualisation and alerting.

To summarise this part of the article, let’s analyze the following areas:

  • Number and size of the logs that will be sent to the system

    • Can we filter any logs at the source (e.g. not sending all logs at INFO level) to reduce the volume sent? (See the Promtail sketch after this list.)
  • High Availability of the system

  • The age of the data we would like to run queries on. 

  • The length of time we need to store these logs. 

  • Can we use our current visualisation tool or do we need to install an additional application?

  • How can we manage access to the logs?

  • How can we provide alerts based on the content of the logs?
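
If filtering at the source is possible, Promtail can drop noisy lines before they ever reach Loki. Below is a minimal sketch of a Promtail pipeline fragment that drops lines containing "level=info"; the selector and label value app="my-app" are hypothetical and need to match your own scrape configuration.

```yaml
# Fragment of a Promtail scrape_config: drop INFO-level lines at the source.
# The label value app="my-app" is an illustrative assumption.
pipeline_stages:
  - match:
      selector: '{app="my-app"} |= "level=info"'
      action: drop
      drop_counter_reason: info_lines_filtered_at_source
```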

Logs analytics at scale in the cloud: ELK vs Loki+Promtail

Use case: Grafana Loki in the cloud

Loki is a horizontally scalable, highly available, multi-tenant log aggregation system inspired by Prometheus. It is designed to be very cost-effective and easy to operate. It does not index the contents of the logs, but rather a set of labels for each log stream. The project was started in 2018 by Grafana Labs, so as you may expect, we can query Loki data in Grafana, which happens to be very useful.

Logs analytics at scale in the cloud: Grafana Loki

At the time of writing, the latest release is 2.2.0, and the project is developing at a great pace, with new features and enhancements added regularly - which is crucial when choosing the right tool.

Logs ingestion

Loki is responsible for log aggregation and running queries over logs, yet it still requires an external application to deliver logs to it. The first way is to add a dedicated pipeline in the application from which we can push logs to Loki directly, while the second, recommended and most widely used way is to run a dedicated log shipper such as Promtail, Fluentd or Fluent Bit.
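
As a minimal sketch, a Promtail configuration only needs to know where to push logs and what to scrape. The hostname, paths and labels below are illustrative assumptions:

```yaml
# Minimal Promtail configuration sketch (hostname, paths and labels are placeholders).
server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml            # where Promtail remembers how far it has read

clients:
  - url: http://loki-distributor:3100/loki/api/v1/push   # Loki push API endpoint

scrape_configs:
  - job_name: system
    static_configs:
      - targets: [localhost]
        labels:
          job: varlogs
          __path__: /var/log/*.log         # files to tail
```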

Two modes of Loki - installation and configuration

Loki can work in two different modes: monolithic and microservice. 

The first one is a great way to start the journey with Loki, or for a platform where we don't expect a high log load: it is a simple setup and most users can get it running with no major issues. On the other hand, Loki can be run as a set of microservices, which is the key to making the log analytics platform easily and horizontally scalable (depending on the infrastructure it is installed on).

Source: Grafana Blog

Installing Loki can be easily achieved by using the official Helm chart maintained by Grafana Labs. We can customize the values file, add our own configuration and quickly deploy it to the target environment.
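
As an illustration only, assuming the grafana/loki-distributed chart is used for the microservices mode (the exact value keys depend on the chart version, so treat the names below as assumptions to verify), a values override could look roughly like this and be deployed with "helm upgrade --install loki grafana/loki-distributed -f values.yaml":

```yaml
# Hypothetical values.yaml override for a microservices-mode Loki Helm deployment.
# Key names are assumptions based on the grafana/loki-distributed chart layout.
ingester:
  replicas: 3
  persistence:
    enabled: true
    size: 10Gi
distributor:
  replicas: 2
querier:
  replicas: 2
queryFrontend:
  replicas: 2
```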

Logs analytics: Diagram of Loki in microservices mode with alerting enabled.

The best option for installing Loki in microservices mode is Kubernetes - each public cloud provider delivers its own managed Kubernetes service, such as AWS EKS, Azure AKS or Google GKE. The main components of Loki are the following (see the container sketch after this list):

  • Distributor - responsible for handling incoming log streams sent by clients.
  • Ingester - responsible for writing log data to long-term storage backends on the write path and returning log data for in-memory queries on the read path.
  • Querier - handles queries using the LogQL query language, fetching logs both from the ingesters and long-term storage.
  • (Optional) Query frontend - provides the querier’s API endpoints and can be used to accelerate the read path.
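
In a microservices deployment, each component runs the same Loki binary with a different -target flag. A hedged sketch of the container arguments in a Kubernetes Deployment could look like this:

```yaml
# Container spec fragment (illustrative): same Loki image, different target per component.
containers:
  - name: loki-distributor
    image: grafana/loki:2.2.0
    args:
      - -config.file=/etc/loki/config.yaml
      - -target=distributor       # other components use: ingester, querier, query-frontend
```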

To deploy multiple ingesters it is necessary to use etcd, Consul or Memberlist: one of these backs the hash ring used to shard series/logs across the ingesters.
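
A minimal sketch of a Memberlist-based ring configuration (the join address is a hypothetical Kubernetes headless service):

```yaml
# Loki configuration fragment: Memberlist-backed hash ring for the ingesters.
memberlist:
  join_members:
    - loki-memberlist.loki.svc.cluster.local:7946   # placeholder service address

ingester:
  lifecycler:
    ring:
      kvstore:
        store: memberlist        # alternatives: etcd, consul
      replication_factor: 3
```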

The next requirement of Loki is storage. Fortunately, since the release of v1.5.0 we can use object storage alone instead of mixing object storage with key-value databases (like Cassandra or Google Cloud BigTable), which makes the whole platform cheaper and easier to maintain. We need to create a new bucket in AWS S3, Google Cloud Storage or Microsoft Azure Blob Storage, set the standard storage class, add the required permissions to the IAM user/role used by Loki and that's all. We can then start Loki.
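
For AWS, a configuration sketch using the boltdb-shipper index with S3 as the only storage backend might look like this (the region and bucket name are placeholders):

```yaml
# Loki configuration fragment: object storage only, no external key-value database.
schema_config:
  configs:
    - from: 2021-01-01
      store: boltdb-shipper        # index files are shipped to object storage
      object_store: aws
      schema: v11
      index:
        prefix: loki_index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/index_cache
    shared_store: s3
  aws:
    s3: s3://eu-west-1/my-loki-chunks-bucket   # placeholder region and bucket
```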

Moreover, when we need queries to be as fast as possible, we can add volumes to our Loki deployment so that the querier can cache query results in local storage, reducing the time spent running the same query again.
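
Result caching is configured on the read path; a minimal sketch using an in-memory FIFO cache as one possible backend (sizes and validity are arbitrary) is shown below:

```yaml
# Loki configuration fragment: cache query results so repeated queries are cheap.
query_range:
  align_queries_with_step: true
  cache_results: true
  results_cache:
    cache:
      enable_fifocache: true
      fifocache:
        max_size_items: 1024
        validity: 24h
```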

Alerting out of the box

Loki includes a component called the Ruler, which is responsible for continually evaluating a set of configurable queries and alerting when certain issues occur, e.g. a high percentage of error logs. It can then send an event to Alertmanager, from which the alert can be forwarded to email or a Slack channel.
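
A rule file evaluated by the Ruler uses the familiar Prometheus format with LogQL expressions. The selector and threshold below are hypothetical:

```yaml
# Example Loki alerting rule (Prometheus-style rule with a LogQL expression).
groups:
  - name: application-log-alerts
    rules:
      - alert: HighErrorLogRate
        # Fires when the app emits more than 10 "error" lines per second over 5 minutes.
        expr: sum(rate({app="payment-service"} |= "error" [5m])) > 10
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: High rate of error logs in payment-service
```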

Ruler supports object storage or local storage to store its state. It's important to mention that Ruler is also horizontally scalable. Similar to the ingesters, the Rulers establish a hash ring to divide up the responsibilities of evaluating rules.
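
A sketch of the corresponding Ruler configuration (the Alertmanager URL and directories are assumptions):

```yaml
# Loki configuration fragment: the Ruler loads rules and forwards alerts to Alertmanager.
ruler:
  storage:
    type: local                  # object storage (s3/gcs/azure) is also supported
    local:
      directory: /loki/rules
  rule_path: /tmp/loki/rules-temp
  alertmanager_url: http://alertmanager.monitoring.svc:9093   # placeholder address
  enable_api: true
  ring:
    kvstore:
      store: memberlist          # the Rulers shard rule evaluation over a hash ring
```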

One single place to see everything

One of the most important facts about Loki is that it is supported by Grafana. We can configure it in a few simple steps, create dashboards showing the number of occurrences of a given error or message and set up alerts directly from Grafana, or combine such a panel with Prometheus metrics from, for example, a Flink job. This provides a great opportunity to build a complex dashboard that makes the platform fully observable. It can also be useful for creating a self-healing platform - an action can be triggered based on the log content.
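
Adding Loki to Grafana is just another data source. A provisioning sketch (the URL points at a hypothetical query-frontend service) looks like this:

```yaml
# Grafana data source provisioning sketch for Loki.
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki-query-frontend.loki.svc:3100   # placeholder service address
    jsonData:
      maxLines: 1000
```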

Simplicity vs. performance

Loki doesn't require many resources, especially when compared to the Elastic stack. Unfortunately, this has a big impact on query speed, which is not ideal - and it is the main reason why Loki is a great tool for developers to understand the logs from their applications, rather than for running business analysis based on logs.

Data in Elasticsearch is stored on disk as unstructured JSON objects, and both the keys and the contents of each key are indexed. In Loki, logs are stored in plain text, tagged with a set of label names and values, and only the label pairs are indexed. This trade-off makes it cheaper to operate than a full index and allows developers to log aggressively from their applications.
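
Conceptually, the difference can be pictured like this (not a real configuration file): a stream is identified by a small, indexed label set, while the log lines themselves are stored as compressed, unindexed chunks and filtered only at query time.

```yaml
# Conceptual sketch of Loki's data model (illustrative, not an actual file format).
stream:
  labels:                       # indexed: keep this set small and low-cardinality
    app: payment-service
    environment: prod
  chunks:                       # not indexed: full text is scanned/filtered at query time
    - '2021-04-22T10:00:01Z level=error msg="payment failed" user_id=1234'
    - '2021-04-22T10:00:03Z level=error msg="payment failed" user_id=5678'
```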

A simple, well-performing log analytics tool

Loki seems to be the most interesting platform for technical log analytics, and it is an open-source project. We can simply install it on any Kubernetes cluster in any environment with object storage, or even on a virtual machine, while its features meet production requirements such as high availability, alerting and data visualization in a tool that supports access management.

At GetInData, we evaluate multiple configuration setups and we really know how to create a valuable, well performing and scalable platform for log analytics. If you want to know more, do not hesitate to contact us.

big data
analytics
monitoring system
Grafana
Loki
22 April 2021
