Apache NiFi - why do data engineers love it and hate it at the same time? Blog Series Introduction

Learning a new technology is like falling in love. At the beginning, you enjoy it completely, as if you were wearing rose-tinted glasses that keep you from noticing anything you don’t like. In software development, we call this phase the Proof of Concept. Then the jazzy proof of concept turns into an ordinary project with corner cases that you cannot hide and have to resolve. At some point, the number of corner cases overwhelms you and may even outweigh the advantages gained from the brand-new technology. This may mean long weeks when you truly hate it, despite having been in love a moment earlier. If you are lucky, you will get your problems solved quickly and be able to deploy to production. At this point, you can sit back, eat caviar, drink champagne and put it all together: all the issues you encountered and solved, and all the findings you made during the project.

At GetInData, we have reached this point, and this post series shares our hands-on, real-life experience with Apache NiFi. We will share our findings and opinions, but we will not answer questions like: is NiFi good enough, do we recommend it, etc. We believe there are no general answers to such questions, so we focus on describing the issues one can encounter when deploying data flows in NiFi. All of the examples come from real project scenarios.

Our blog series will be divided into the following posts:

Part I - Fast development, painful maintenance - We explore the benefits of pipeline development and the great features available in the web canvas. We also identify some minor disadvantages: things that we, as software developers, know from multiple programming languages and are used to, but that are not available in NiFi.
Part II - We have deployed, but at what cost… - CI/CD of NiFi flow - It is a long way from a successful project to a successful project release. For NiFi, the way can be even longer than for most other popular technologies. We describe how requirements (like environment separation) can make life hard, and present our solution to that.
Part III - No coding, just drag and drop what you need, but if it's not there… - custom processors, scripts, external services - Implementing the optimistic path is just a fraction of a fraction of the pie. Most of the time is spent on corner cases and features that cannot be easily handled by ready-to-go NiFi processors. Custom processors and Groovy scripts can be a solution to that. However, at some point, managing dozens of copy-pasted inline Groovy scripts and similar pieces can become problematic.
Part IV - Universe made out of flow files - NiFi architecture - High availability is a must-have for modern applications. To achieve it, one should understand the deep internals of the system that is going to be used. In this part we cover them. Being aware of the possible limitations makes it possible to mitigate them.
Part V - It’s fast and easy, what could possibly go wrong - one-year history of certain NiFi flow - Data ingestion projects have several things in common. At the beginning, they are just simple pipelines and then the complexity emerges with extra business logic to be implemented. The requirements affect the architecture of the flow and how NiFi is adopted.
Part VI - I have only one rule and that’s … - recommendations for using Apache NiFi - The previous posts provide tons of interesting findings, most of them quite detailed. In this post, we put everything together and come up with some general recommendations on data ingestion using Apache NiFi.

This is what we are planning to do. Please stay with us to read further posts, no matter if you are interested in all the topics or just some of them.

See you soon ;-)

Last updated: 31 August 2020

Written by

Tomasz Nazarewicz

Data Engineer

Paweł Leszczyński

Data Engineer
