Apache NiFi - why do data engineers love it and hate it at the same time? Blog Series Introduction
Learning a new technology is like falling in love. At the beginning, you enjoy it completely, as if you were wearing rose-tinted glasses that keep you from noticing anything you don't like. In software development, we call this phase the Proof of Concept. Then the jazzy proof of concept turns into an ordinary project, with corner cases that you cannot hide and have to resolve. At some point, the number of corner cases overwhelms you and may even outweigh the advantages of the brand-new technology. This can mean long weeks when you truly hate it, even though you were in love with it a moment earlier. If you are lucky, you will solve your problems quickly and be able to deploy to production. At this point, you can sit back, eat caviar, drink champagne and put it all together: all the issues you encountered and solved and all the findings you made during the project.
At GetInData, we have reached this point, and this blog series shares our hands-on, real-life experience with Apache NiFi. We will present our findings and opinions, but we will not answer questions such as "Is NiFi good enough?" or "Do we recommend it?". We believe there are no general answers to such questions, so we focus on describing the issues one can encounter when deploying data flows in NiFi. All of the examples come from real project scenarios.
Our blog series will be divided into the following posts:
Part I - Fast development, painful maintenance - We explore the benefits of developing pipelines in NiFi and the great features available in its web canvas. We also identify some minor disadvantages: things that we know from many programming languages and, as software developers, are used to, but that are not available in NiFi.
Part II - We have deployed, but at what cost… - CI/CD of NiFi flow - It is a long way from a successful project to a successful project release, and for NiFi this road can be even longer than for most other popular technologies. We describe how requirements such as environment separation can make life hard, and present our solution.
Part III - No coding, just drag and drop what you need, but if it's not there… - custom processors, scripts, external services - Implementing the optimistic path is only a small fraction of the pie. Most of the time is spent on corner cases and features that cannot easily be handled by ready-to-use NiFi processors. Custom processors and Groovy scripts can solve that, but at some point, managing dozens of copy-pasted inline Groovy scripts and other custom components can become problematic.
Part IV - Universe made out of flow files - NiFi architecture - High availability is a must-have for modern applications. To achieve it, one should understand the deep internals of the system that is going to be used, and this part covers them. Being aware of the possible limitations allows you to mitigate them.