
Logs analytics at scale in the cloud with Loki

Logs provide a lot of useful information about the environment and the state of an application, so they should be part of our monitoring stack. We'll discuss how metrics enriched with logs become a valuable source of truth about the platform, and why observability has become the key to running a stable, efficient and continuously working service in any environment. This is the key to making our platform fully observable.

Observability is quite similar to DevOps in that it is not limited to technology: it covers organizational culture and approach as well. The concept of observability is also prominent in the DevOps movement, because monitoring goals are not limited to collecting and processing logs and metrics - a system should deliver enough information about its internal state to make it observable from the outside. That is what we call observability; from the user's perspective, a good synonym would be understandability.

Let’s go and take a look at a solution designed for Kubernetes-native software that we can easily install on any cloud or on-premise infrastructure.

Managed vs. self-hosted solutions

The first issue that arises when talking about using the public cloud is choosing the right solution: a fully managed service provided by a cloud provider, or a self-managed application. Each of the main public cloud providers delivers its own solution for collecting and analyzing logs from applications and infrastructure: Google Cloud has Cloud Logging, AWS has CloudWatch Logs and Microsoft Azure has Azure Monitor. Initially, it looks great: the service is fully managed, we don't need to worry about scalability and we can easily integrate it with any cloud service or with our own applications. Unfortunately, such an approach does not provide much freedom in terms of configuration and, in many cases, it can incur quite high costs.

This is where self-managed log analytics tools come into play. The most popular one is the Elastic stack, but there are also some great alternatives. The most interesting one, which we use in multiple projects for different customers at GetInData, is Loki made by Grafana Labs; Graylog, Datadog, LogDNA and Sumo Logic are also worth mentioning.

The second issue is the volume of logs: how many logs are produced, how many of them we actually require in our platform and how the system is expected to grow. This is necessary to plan the infrastructure, configure each log pipeline and estimate the costs - the last point being especially important when deciding to go with cloud managed services.

The third issue is about visualisation and alerting.

To summarise this part of the article, let’s analyze the following areas:

  • Number and size of the logs that will be sent to the system
    • Can we filter logs at the source (e.g. by not sending logs at INFO level) to reduce the volume sent?
  • High Availability of the system
  • The age of the data we would like to run queries on
  • The length of time we need to store these logs
  • Can we use our current visualisation tool or do we need to install an additional application?
  • How can we manage access to the logs?
  • How can we provide alerts based on the content of the logs?

Logs analytics at scale in the cloud: ELK vs Loki+Promtail

Use case: Grafana Loki in the cloud

Loki is a horizontally scalable, highly-available, multi-tenant log aggregation system inspired by Prometheus. It was designed to be very cost-effective and easy to operate. It does not index the contents of the logs, but rather a set of labels for each log stream. The project was started in 2018 and is developed by Grafana Labs, so as you may expect, we can query Loki data in Grafana, which happens to be very useful.

Logs analytics at scale in the cloud: Grafana Loki

At the time of writing, the latest release is 2.2.0, and we can see great speed of development, with new features and enhancements being added regularly - which is crucial when choosing the right tool.

Logs ingestion

Loki is responsible for log aggregation and for running queries over logs, yet it still requires an external application to deliver the logs to it. The first option is to add a dedicated pipeline in the application that pushes logs directly to Loki; the second, recommended and most widely used option is a dedicated log shipping agent such as Promtail, Fluentd or Fluent Bit.
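As a sketch, a minimal Promtail configuration that tails local files and pushes them to Loki could look like this (the Loki URL, labels and file paths are illustrative placeholders):

```yaml
# Minimal Promtail configuration sketch - URLs, labels and paths are illustrative.
server:
  http_listen_port: 9080        # Promtail's own HTTP endpoint (metrics, readiness)

positions:
  # Where Promtail remembers how far it has read in each file
  filename: /tmp/positions.yaml

clients:
  # Push logs to Loki's HTTP push API
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: system
    static_configs:
      - targets: [localhost]
        labels:
          job: varlogs              # label attached to every log stream
          __path__: /var/log/*.log  # glob of files to tail
```

Only the labels (here `job`) end up in Loki's index; the log lines themselves are stored as-is.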

Two modes of Loki - installation and configuration

Loki can work in two different modes: monolithic and microservice. 

The first one is a great way to start the journey with Loki, or for a platform where we don't expect a high log load: the setup is simple and most users can get it running with no major issues. On the other hand, Loki can be run as a set of microservices, which is the key to making the log analytics platform easily and horizontally scalable (depending on the infrastructure we install it on).

Source: Grafana Blog

Installing Loki can be easily achieved by using the official Helm chart maintained by Grafana Labs. We can customize the values file, add our own configuration and quickly deploy it to the target environment.
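As a sketch, customizing the chart often amounts to overriding a few values; the exact keys depend on the chart version, so treat the names below as illustrative:

```yaml
# Illustrative values.yaml fragment for a Loki Helm deployment.
# Key names vary between chart versions - check the chart's own values.yaml.
loki:
  config:
    auth_enabled: false     # single-tenant setup
ingester:
  replicas: 3               # scale the write path
querier:
  replicas: 2               # scale the read path
persistence:
  enabled: true
  size: 10Gi
```

After editing the values file, a standard `helm upgrade --install` deploys or updates the release.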

Logs analytics: Diagram of the Loki in microservices mode with turned on alerting.

The best option for installing Loki in microservice mode is Kubernetes - each public cloud provider delivers its own managed Kubernetes, like AWS EKS, Azure AKS or Google GKE. These are the main components of Loki:

  • Distributor - this is responsible for handling incoming log streams sent by clients.
  • Ingester - responsible for writing log data to long-term storage backends on the write path and returning log data for in-memory queries on the read path.
  • Querier - handles queries using the LogQL query language, fetching logs both from the ingesters and long-term storage.
  • (Optional) Query frontend - provides the querier’s API endpoints and can be used to accelerate the read path.

To deploy multiple ingesters, it is necessary to use etcd, Consul or Memberlist: one of these backends stores the hash ring that is used to shard series/logs across the ingesters.
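A sketch of wiring the ingester ring to Memberlist instead of etcd or Consul; the DNS name of the headless Kubernetes service is an assumption:

```yaml
# Sketch: hash ring backed by Memberlist.
memberlist:
  join_members:
    # headless Kubernetes service resolving to all Loki pods (name is illustrative)
    - loki-memberlist.monitoring.svc.cluster.local

ingester:
  lifecycler:
    ring:
      kvstore:
        store: memberlist   # alternatives: etcd, consul
      replication_factor: 3 # each stream is written to 3 ingesters
```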

The next requirement of Loki is storage. Fortunately, since the release of v1.5.0 we can use object storage alone, instead of mixing object storage with a key-value database (like Cassandra or Google Cloud Bigtable), which makes the whole platform cheaper and easier to maintain. We need to create a new bucket in AWS S3, Google Cloud Storage or Microsoft Azure Blob Storage, set the standard storage class, grant the required permissions to the IAM user/role utilised by Loki and that’s all - we can then start Loki.
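A sketch of a single-store configuration using the boltdb-shipper index with S3; the bucket name, region and schema start date are placeholders:

```yaml
# Sketch: object-storage-only setup using the boltdb-shipper index store.
schema_config:
  configs:
    - from: 2021-01-01        # date this schema becomes active (illustrative)
      store: boltdb-shipper   # index stored as boltdb files, shipped to object storage
      object_store: s3
      schema: v11
      index:
        prefix: index_
        period: 24h           # one index table per day

storage_config:
  boltdb_shipper:
    active_index_directory: /loki/index
    shared_store: s3
  aws:
    # region and bucket name are illustrative
    s3: s3://eu-west-1/my-loki-chunks
```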

Moreover, if we need queries to be as fast as possible, we can attach volumes to our Loki deployment so that the querier can cache data (such as index files) in local storage, reducing the time spent on running the same query once again.
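As a sketch, a local index cache for the queriers could be configured like this (the path and TTL are illustrative):

```yaml
# Sketch: cache boltdb-shipper index files on a local volume.
storage_config:
  boltdb_shipper:
    cache_location: /loki/boltdb-cache  # mount a persistent volume here
    cache_ttl: 24h                      # drop cached index files after a day
```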

Alerting out of the box

Loki includes a component called the Ruler, which is responsible for continually evaluating a set of configurable queries and alerting when certain conditions are met, e.g. a high percentage of error logs. It can then send an event to Alertmanager, from which the alert can be routed to an email address or a Slack channel.

The Ruler supports object storage or local storage for its state. It's important to mention that the Ruler is also horizontally scalable: similar to the ingesters, the Rulers establish a hash ring to divide up the responsibility of evaluating rules.
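Ruler alerting rules use the familiar Prometheus rule format with a LogQL expression; a sketch, where the `app` label and the threshold are assumptions:

```yaml
# Sketch of a Loki Ruler alerting rule (Prometheus-style format, LogQL expr).
groups:
  - name: application-logs
    rules:
      - alert: HighErrorLogRate
        # per-second rate of lines containing "error" over the last 5 minutes
        expr: |
          sum(rate({app="my-app"} |= "error" [5m])) > 10
        for: 5m                 # must stay above the threshold for 5 minutes
        labels:
          severity: warning
        annotations:
          summary: High rate of error logs in my-app
```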

One single place to see everything

One of the most important facts about Loki is that it is supported by Grafana. We can configure it in a few simple steps, create dashboards showing the number of occurrences of errors or other information, set up alerts from Grafana, or combine such a panel with Prometheus metrics from, for example, a Flink job. This gives us a great opportunity to create a complex dashboard that makes the platform fully observable. It can also be useful for building a self-healing platform, where an action is triggered based on the log content.
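Adding Loki to Grafana can even be automated with datasource provisioning; a sketch, where the URL assumes an in-cluster query frontend service (the name is illustrative):

```yaml
# Sketch: Grafana datasource provisioning file for Loki.
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    # point at the query frontend (or the querier) service
    url: http://loki-query-frontend:3100
```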

Simplicity vs. performance

Loki doesn’t require too many resources, especially when compared to the Elastic stack. Unfortunately, this has a big impact on query speed, which is not ideal. This is the main reason why Loki is a great tool for developers to understand logs from their applications, rather than for running business analysis on top of logs.

Data in ElasticSearch is stored on-disk as unstructured JSON objects. Both the keys for each object and the contents of each key are indexed. In Loki, logs are stored in plaintext form, tagged with a set of label names and values, where only the label pairs are indexed. This trade-off makes it cheaper to operate than a full index and allows developers to aggressively log from their applications.

Simple, well performing logs analytics tool

Loki seems to be the most interesting platform for technical log analytics, as it’s an open-source project. We can simply install it on any available Kubernetes cluster in any environment with object storage, or even on a virtual machine, whilst its features meet production requirements such as High Availability, alerting and data visualization in a tool that supports access management.

At GetInData, we evaluate multiple configuration setups and we really know how to create a valuable, well performing and scalable platform for log analytics. If you want to know more, do not hesitate to contact us.

big data
monitoring system
22 April 2021

