Learning a new technology is like falling in love. At the beginning, you enjoy it completely, as if you were wearing rose-tinted glasses that keep you from seeing anything you don’t like. In software development, we call this phase the Proof of Concept. Then the jazzy proof of concept turns into a regular project with corner cases that you cannot hide and need to resolve. At some point, the number of corner cases overwhelms you, and their cost may even outweigh the advantages gained from the brand-new technology. This may mean long weeks when you truly hate it, even though you were in love a moment earlier. If you are lucky, you will get your problems solved quickly and will be able to deploy to production. At this point, you can sit back, eat caviar, drink champagne and put it all together: all the issues you encountered and solved and all the findings you made during the project.
At GetInData, we have reached this point, and this post series shares our hands-on, real-life experience with Apache NiFi. We will show our findings and opinions, but we will not answer questions like: is NiFi good enough, do we recommend it, etc… We believe there are no general answers to such questions, so we focus on describing what issues one can encounter when deploying data flows in NiFi. All of the examples come from real project scenarios.
Our blog series will be divided into the following posts:
- Part I - Fast development, painful maintenance - We explore the benefits of pipeline development and the great features available in the web canvas. We also identify some minor disadvantages: things that we know from multiple programming languages and, as software developers, are used to, but that are not available in NiFi.
- Part II - We have deployed, but at what cost… - CI/CD of NiFi flow - It is a long way from a successful project to a successful project release. For NiFi, this way can be even longer than for most other popular technologies. We describe how requirements (like environment separation) can make life hard, and present our solution to that.
- Part III - No coding, just drag and drop what you need, but if it's not there… - custom processors, scripts, external services - Implementing the optimistic path is just a small slice of the pie. Most of the time is spent on corner cases and features that cannot be easily handled by ready-to-go NiFi processors. Custom processors and Groovy scripts can be a solution to that. At some point, however, managing dozens of copy-pasted inline Groovy scripts and other custom code can become problematic.
- Part IV - Universe made out of flow files - NiFi architecture - High availability is a must-have for modern applications. In order to achieve it, one should understand the deep internals of the system that is going to be used. In this part, we cover them. Being aware of the possible limitations makes it possible to mitigate them.
- Part V - It’s fast and easy, what could possibly go wrong - one-year history of a certain NiFi flow - Data ingestion projects have several things in common. At the beginning, they are just simple pipelines, and then complexity emerges as extra business logic has to be implemented. The requirements affect the architecture of the flow and how NiFi is adopted.
- I have only one rule and that’s … - recommendations for using Apache NiFi - In the other posts, we share many interesting findings, most of them quite detailed. In this post, we put everything together and come up with some general recommendations on data ingestion using Apache NiFi.
This is what we are planning to do. Please stay with us to read further posts, no matter if you are interested in all the topics or just some of them.
See you soon ;-)