
Logs analytics at scale in the cloud with Loki

Logs provide a lot of useful information about the environment and the status of an application, and they should be part of our monitoring stack. We'll discuss how metrics only become an enriched, valuable source of truth about the platform when they are combined with logs, and then we'll talk about observability, which has become the key to running a stable, efficient and continuously working service in any environment. Combining logs and metrics is what makes our platform fully observable.

Observability is quite similar to DevOps: it is not limited to technology, as it also covers organizational culture and approach. The concept is prominent in the DevOps approach because the goals of monitoring go beyond collecting and processing logs and metrics. A system should expose enough information about its state for us to understand what is happening inside it. That is what we call observability; a good synonym would be understandability for users.

Let’s go and take a look at a solution designed for Kubernetes-native software that we can easily install on any cloud or on-premise infrastructure.

Managed vs. self-hosted solutions

The first issue that arises when talking about the use of the public cloud is choosing the right solution: a fully managed service provided by the cloud vendor, or a self-managed application. Each of the main public cloud providers delivers its own solution for collecting and analyzing logs from our applications and infrastructure: Google Cloud has Cloud Logging, AWS has CloudWatch Logs and Microsoft Azure has Azure Monitor. Initially, it looks great. The service is fully managed, we don't need to worry about scalability, and we can easily integrate it with any cloud service or with our own applications. Unfortunately, such an approach does not leave much freedom in terms of configuration, and in many cases it can incur quite high costs.

This is where self-managed logs analytics tools come into play. The most popular one is the Elastic stack, but there are also some great alternatives. The most interesting one, which we use in multiple projects for different customers at GetInData, is Loki made by Grafana Labs. Other options on the market include Graylog, Datadog, LogDNA and Sumo Logic.

The second issue is the volume of logs: how many logs are produced, how many of them we actually need in our platform, and how this looks from the perspective of the whole system. We need these numbers to plan the infrastructure, configure each log pipeline and estimate the costs. The cost estimate is especially important when deciding to go with cloud managed services.

The third issue is about visualisation and alerting.

To summarise this part of the article, let’s analyze the following areas:

  • Number and size of the logs that will be sent to the system

    • Can we filter any logs at the source (like not sending all logs at INFO level) to reduce sent logs?
  • High Availability of the system

  • The age of the data we would like to run queries on. 

  • The length of time we need to store these logs. 

  • Can we use our current visualisation tool or do we need to install an additional application?

  • How can we manage access to the logs?

  • How can we provide alerts based on the content of the logs?
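The question above about filtering logs at the source can be illustrated with a minimal Python sketch. It assumes an application that uses the standard `logging` module; `ListHandler` and `build_logger` are hypothetical names, and the in-memory handler only stands in for a real log shipper.

```python
import logging


class ListHandler(logging.Handler):
    """Stand-in for a real log shipper: collects formatted records in memory."""

    def __init__(self, level=logging.NOTSET):
        super().__init__(level)
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


def build_logger(name, ship_level=logging.WARNING):
    """Return a logger whose shipping handler drops records below ship_level."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # the application itself may still log everything
    logger.propagate = False
    shipper = ListHandler(level=ship_level)
    shipper.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.handlers = [shipper]
    return logger, shipper


logger, shipper = build_logger("payments")
logger.info("request handled")            # dropped before it is shipped
logger.warning("retrying upstream call")  # shipped
logger.error("upstream call failed")      # shipped
print(shipper.lines)
```

Only the WARNING and ERROR records reach the shipping handler, which is one simple way to reduce the number of logs sent to the system.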

Logs analytics at scale in the cloud: ELK vs Loki+Promtail

Use case: Grafana Loki in the cloud

Loki is a horizontally scalable, highly-available, multi-tenant log aggregation system inspired by Prometheus. It was designed to be very cost-effective and easy to operate. It does not index the contents of the logs, but rather a set of labels for each log stream. The project was started in 2018 and was developed by Grafana Labs, so as you may expect, we can query Loki data in Grafana, which happens to be very useful.

Logs analytics at scale in the cloud: Grafana Loki

At the time of writing, the latest release is 2.2.0. Development is moving quickly, with new features and enhancements arriving regularly, which is an important factor when choosing the right tool.

Logs ingestion

Loki is responsible for log aggregation and for running queries on logs, yet it still requires an external application to deliver logs to it. The first way is to add a dedicated pipeline to the application, from which we push logs to Loki directly. The second way, which is recommended and most widely used, is to run a dedicated agent such as Promtail, Fluentd or Fluent Bit.
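As a rough illustration of the first option, the sketch below builds a request body for Loki's HTTP push endpoint (`/loki/api/v1/push`) using only the Python standard library. The label names, the log lines and the `http://localhost:3100` address mentioned in the comments are assumptions made for the example, and `build_push_payload` and `push` are hypothetical helper names.

```python
import json
import time
import urllib.request


def build_push_payload(labels, lines, now_ns=None):
    """Build the JSON body for POST /loki/api/v1/push: one stream, many lines."""
    if now_ns is None:
        now_ns = time.time_ns()
    # Each value is [timestamp in nanoseconds as a string, log line].
    values = [[str(now_ns + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": dict(labels), "values": values}]}


def push(base_url, payload, timeout=5.0):
    """Send the payload to Loki; base_url is an assumption, e.g. http://localhost:3100."""
    request = urllib.request.Request(
        url=base_url.rstrip("/") + "/loki/api/v1/push",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# A fixed timestamp keeps the example reproducible.
payload = build_push_payload(
    {"app": "payments", "level": "error"},
    ["upstream call failed", "giving up after 3 retries"],
    now_ns=1_619_049_600_000_000_000,
)
print(json.dumps(payload, indent=2))
```

Calling `push("http://localhost:3100", payload)` would send the lines to a locally running Loki. In practice an agent is usually the better choice, because it also takes care of batching, retries and label discovery.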

Two modes of Loki - installation and configuration

Loki can work in two different modes: monolithic and microservice. 

The first one is a great way to start the journey with Loki, or to serve a platform on which we don't expect a high log load, as it is a simple setup that most users can create with no major issues. In the second mode, Loki runs as a set of microservices, which is the key to making the logs analytics platform easily and horizontally scalable (depending on the infrastructure on which we install it).

Source: Grafana Blog

Installing Loki can be easily achieved by using the official Helm chart maintained by Grafana Labs. We can customize the values file, add our own configuration and quickly deploy it to the target environment.

Logs analytics: Diagram of the Loki in microservices mode with turned on alerting.

The best option for installing Loki in microservice mode is Kubernetes. Each public cloud provider delivers its own managed Kubernetes service, such as Amazon EKS, Azure AKS or Google GKE. Loki consists of the following components:

  • Distributor - responsible for handling incoming streams from clients.
  • Ingester - responsible for writing log data to long-term storage backends on the write path and returning log data for in-memory queries on the read path.
  • Querier - handles queries using the LogQL query language, fetching logs both from the ingesters and long-term storage.
  • (Optional) Query frontend - provides the querier’s API endpoints and can be used to accelerate the read path.

To deploy multiple ingesters, we need etcd, Consul or memberlist. One of them stores the hash ring that is used to shard log streams across the ingesters.
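To make the idea of sharding across ingesters more concrete, here is a simplified consistent hash ring in Python. It is only a sketch of the general technique, not Loki's actual implementation: the hash function, the number of tokens per ingester, the replication factor and the ingester names are all assumptions chosen for the example.

```python
import bisect
import hashlib


def _token(key):
    """Map a string to a 32-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:4], "big")


class HashRing:
    def __init__(self, ingesters, tokens_per_ingester=64):
        # Every ingester owns several tokens so that load spreads evenly.
        self._ring = sorted(
            (_token(f"{name}-{i}"), name)
            for name in ingesters
            for i in range(tokens_per_ingester)
        )
        self._tokens = [token for token, _ in self._ring]
        self._size = len(set(ingesters))

    def owners(self, tenant, labels, replication_factor=3):
        """Walk clockwise from the stream's token and collect distinct ingesters."""
        stream_key = tenant + "|" + ",".join(
            f"{k}={v}" for k, v in sorted(labels.items())
        )
        start = bisect.bisect_right(self._tokens, _token(stream_key))
        wanted = min(replication_factor, self._size)
        found = []
        for offset in range(len(self._ring)):
            name = self._ring[(start + offset) % len(self._ring)][1]
            if name not in found:
                found.append(name)
                if len(found) == wanted:
                    break
        return found


ring = HashRing(["ingester-0", "ingester-1", "ingester-2", "ingester-3"])
owners = ring.owners("tenant-a", {"app": "payments", "level": "error"})
print(owners)
```

The same tenant and label set always map to the same ingesters, so every line of a given stream ends up in the same place, while different streams spread across the whole ring.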

The next requirement of Loki is storage. Fortunately, since the release of v1.5.0, we can use object storage alone instead of combining object storage with a key-value database (like Cassandra or Google Cloud Bigtable), which makes the whole platform cheaper and easier to maintain. We need to create a new bucket in AWS S3, Google Cloud Storage or Microsoft Azure Blob Storage, set the standard storage class, and grant the required permissions to the IAM user/role used by Loki. That's all; we can then start Loki.

Moreover, if we need queries to be as fast as possible, we can add volumes to our Loki deployment. The querier can then cache query results in local storage, which reduces the time spent on running the same query again.

Alerting out of the box

Loki includes a component called Ruler that is responsible for continually evaluating a set of configurable queries and alerting when certain conditions occur, e.g. a high percentage of error logs. It can then send an event to Alertmanager, from which the alert can be forwarded to email or a Slack channel.

Ruler supports object storage or local storage to store its state. It's important to mention that Ruler is also horizontally scalable. Similar to the ingesters, the Rulers establish a hash ring to divide up the responsibilities of evaluating rules.
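The "high percentage of error logs" condition mentioned above can be written as a LogQL expression. The Python sketch below only builds such an expression and the URL of Loki's instant query endpoint (`/loki/api/v1/query`), so that the condition can be checked by hand. The Ruler itself is configured with rule files rather than with code like this, and the selector, window, threshold and base URL are assumptions made for the example.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def error_ratio_expr(selector, window="5m", threshold=0.05):
    """LogQL: share of lines containing "error" in the window, compared to a threshold."""
    return (
        f'sum(rate({selector} |= "error" [{window}])) '
        f"/ sum(rate({selector}[{window}])) > {threshold}"
    )


def instant_query_url(base_url, expr):
    """URL for Loki's instant query endpoint with the expression URL-encoded."""
    return base_url.rstrip("/") + "/loki/api/v1/query?" + urlencode({"query": expr})


expr = error_ratio_expr('{app="payments"}')
url = instant_query_url("http://localhost:3100", expr)
print(expr)
print(url)
```

An expression of this shape can be reused as the condition of an alerting rule, with the threshold tuned to the application.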

One single place to see everything

One of the most important facts about Loki is that it is supported by Grafana. We can configure it as a data source in a few simple steps, create dashboards that show the number of occurrences of a given error or informational message, and then set up alerts from Grafana, or combine such a panel with Prometheus metrics, for example from a Flink job. This gives us the opportunity to build a comprehensive dashboard that makes the platform fully observable. It can also be useful when building a self-healing platform, where an action is triggered based on the content of the logs.

Simplicity vs. performance

Loki doesn’t require many resources, especially when compared to the Elastic stack. Unfortunately, this comes at the cost of query speed. This is the main reason why Loki is a great tool for developers who want to understand logs from their applications, rather than a tool for running business analysis based on logs.

Data in Elasticsearch is stored on disk as unstructured JSON objects. Both the keys of each object and the contents of each key are indexed. In Loki, logs are stored in plaintext form, tagged with a set of label names and values, and only the label pairs are indexed. This trade-off makes Loki cheaper to operate than a full index and allows developers to log aggressively from their applications.
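The trade-off described above can be shown with a toy model in Python. It is not how Loki stores data internally; it only illustrates the principle that labels are looked up in an index while the content of the lines is scanned. The class name `LabelIndexedStore` and the sample labels and lines are made up for the example.

```python
from collections import defaultdict


class LabelIndexedStore:
    """Toy model: index label sets only, keep log lines as plain text."""

    def __init__(self):
        self._streams = defaultdict(list)  # label set -> raw lines
        self._index = defaultdict(set)     # (name, value) -> label sets

    def append(self, labels, line):
        key = frozenset(labels.items())
        self._streams[key].append(line)
        for pair in key:
            self._index[pair].add(key)

    def query(self, matchers, contains=""):
        # Step 1: the label index narrows the search to matching streams.
        candidates = None
        for pair in matchers.items():
            streams = self._index.get(pair, set())
            candidates = streams if candidates is None else candidates & streams
        if candidates is None:
            candidates = set(self._streams)
        # Step 2: line content is not indexed, so it is scanned grep-style.
        return sorted(
            line
            for key in candidates
            for line in self._streams[key]
            if contains in line
        )


store = LabelIndexedStore()
store.append({"app": "payments", "level": "error"}, "upstream call failed")
store.append({"app": "payments", "level": "info"}, "request handled")
store.append({"app": "search", "level": "error"}, "index timeout")
result = store.query({"app": "payments"}, contains="failed")
print(result)
```

The index stays small because it grows with the number of distinct label sets rather than with the number of words in the logs. The price is that every content filter has to read the selected streams, which is why narrow label selectors matter for query speed.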

Simple, well performing logs analytics tool

Loki seems to be the most interesting platform for technical logs analytics. It is an open-source project that we can install on any available Kubernetes cluster in any environment with object storage, or even on a virtual machine. Its features meet production requirements such as High Availability, alerting and data visualization in a tool that supports access management.

At GetInData, we have evaluated multiple configuration setups and we know how to create a valuable, well performing and scalable platform for log analytics. If you want to know more, do not hesitate to contact us.

22 April 2021
