
Apache NiFi - why do data engineers love it and hate it at the same time? Blog Series Introduction

Learning a new technology is like falling in love. At the beginning, you enjoy it completely, as if you were wearing rose-tinted glasses that keep you from noticing anything you don’t like. In software development, we call this phase the Proof of Concept. Then the flashy proof of concept turns into an everyday project with corner cases that you cannot hide and need to resolve. At some point, the number of corner cases overwhelms you and may even outweigh the advantages gained from the brand-new technology. This can mean long weeks when you truly hate it, despite having been in love a moment earlier. If you are lucky, you will get your problems solved quickly and will be able to deploy to production. At this point, you can sit back, eat caviar, drink champagne and put it all together: all the issues you encountered and solved, and everything you learned during the project.


At GetInData, we have reached this point, and this post series shares our hands-on, real-life experience with Apache NiFi. We will present our findings and opinions, but we will not answer questions like: is NiFi good enough, do we recommend it, etc. We believe there are no general answers to such questions, so we focus on describing the issues one can encounter when deploying data flows in NiFi. All of the examples come from real project scenarios.

Our blog series will be divided into the following posts:

  • Part I - Fast development, painful maintenance - We explore the benefits of pipeline development and the great features available in the web canvas. We also identify some minor disadvantages: things that we know from multiple programming languages and, as software developers, are used to, but that are not available in NiFi.
  • Part II - We have deployed, but at what cost… - CI/CD of NiFi flow - It is a long way from a successful project to a successful project release. For NiFi it can be even longer than for most other popular technologies. We describe how requirements such as environment separation can make life hard, and present our solution.
  • Part III - No coding, just drag and drop what you need, but if it's not there… - custom processors, scripts, external services - Implementing the optimistic path is just a fraction of a fraction of the pie. Most of the time is spent on corner cases and features that cannot be easily handled by ready-to-go NiFi processors. Custom processors and Groovy scripts can be a solution. At some point, however, managing dozens of copy-pasted inline Groovy scripts and the like can become problematic.
  • Part IV - Universe made out of flow files - NiFi architecture - High availability is a must-have for modern applications. To achieve it, one should understand the deep internals of the system that is going to be used, and this part covers them. Being aware of the possible limitations makes it possible to mitigate them.
  • Part V - It’s fast and easy, what could possibly go wrong - one-year history of a certain NiFi flow - Data ingestion projects have several things in common. At the beginning, they are just simple pipelines; then complexity emerges as extra business logic has to be implemented. The requirements affect the architecture of the flow and how NiFi is adopted.
  • I have only one rule and that’s … - recommendations for using Apache NiFi - In the other posts we share plenty of interesting findings, most of them quite detailed. In this post, we put everything together and come up with some general recommendations on data ingestion using Apache NiFi.

This is what we are planning to do. Please stay with us to read further posts, no matter if you are interested in all the topics or just some of them.

See you soon ;-)

31 August 2020
