Apache NiFi - why do data engineers love it and hate it at the same time?

Apache NiFi - why do data engineers love it and hate it at the same time? Blog Series Introduction

Learning a new technology is like falling in love. At the beginning, you enjoy it completely, as if wearing rose-tinted glasses that keep you from noticing anything you don’t like. In software development, we call this phase the Proof of Concept. Then the jazzy proof of concept turns into a regular project with corner cases that you cannot hide and need to resolve. At some point, the number of corner cases overwhelms you and may even outweigh the advantages gained from the brand-new technology. This may mean long weeks when you truly hate it, although you were in love a moment earlier. If you are lucky, you will get your problems solved quickly and will be able to deploy to production. At this point, you can sit back, eat caviar, drink champagne and put it all together - all the issues you encountered and solved, and all the findings you made during the project.

At GetInData, we have reached this point, and this post series shares our hands-on, real-life experience with Apache NiFi. We will show our findings and opinions, but we will not answer questions like: is NiFi good enough, do we recommend it, etc. We believe there are no general answers to such questions, so we focus on describing the issues one can encounter when deploying data flows in NiFi. All of the examples come from real project scenarios.

Our blog series will be divided into the following posts:

Part I - Fast development, painful maintenance - We explore the benefits of pipeline development and the great features available in the web canvas. We also identify some minor disadvantages: things we know from many programming languages and, as software developers, are used to, but which are not available in NiFi.
Part II - We have deployed, but at what cost… - CI/CD of NiFi flow - It is a long way from a successful project to a successful project release. For NiFi, this can be even longer than for most other popular technologies. We describe how requirements such as environment separation can make life hard, and present our solution to that.
Part III - No coding, just drag and drop what you need, but if it's not there… - custom processors, scripts, external services - Implementing the optimistic path is just a fraction of a fraction of the pie. Most of the time is spent on corner cases and features that cannot be easily handled by ready-to-go NiFi processors. Custom processors and Groovy scripts can be a solution to that. At some point, however, managing dozens of copy-pasted inline Groovy scripts and other workarounds can become problematic.
Part IV - Universe made out of flow files - NiFi architecture - High availability is a must-have for modern applications. To achieve it, one should understand the internals of the system that is going to be used. We cover them in this part. Being aware of the possible limitations makes it possible to mitigate them.
Part V - It’s fast and easy, what could possibly go wrong - one-year history of a certain NiFi flow - Data ingestion projects have several things in common. At the beginning, they are just simple pipelines, and then complexity emerges as extra business logic has to be implemented. The requirements affect the architecture of the flow and how NiFi is adopted.
I have only one rule and that’s … - recommendations for using Apache NiFi - In the other posts we provide plenty of interesting findings, most of them quite detailed. In this post, we put everything together and come up with some general recommendations on data ingestion using Apache NiFi.

This is what we are planning to do. Please stay with us for the further posts, whether you are interested in all the topics or just some of them.

See you soon ;-)

Last updated: 31 August 2020

Written by

Tomasz Nazarewicz

Data Engineer

Paweł Leszczyński

Data Engineer
