Planning any journey requires some prerequisites. Before you decide on a route and start packing your clothes, you need to know where you are and what your destination is. In many cases, it’s crystal clear without even thinking about it. In others, it gets quite challenging.
The data landscape is very complex. On top of that, the field is growing rapidly, with innovations appearing every couple of months. That’s why trying to understand where you are compared to the peloton and the frontrunners can become quite an overwhelming task. You may need to ask yourself multiple questions, in areas ranging from the technology that you have to the culture that you are trying to foster.
To help companies on their data-driven journey, we have published the blog post “Data-driven fast-track, 3 steps to make your company more data-driven”. Today we’d like to cover the first step of this process, the Diagnosis. We will show you how you can use it to learn where your company currently stands and to set realistic goals for moving forward.
Diagnosis prepares you for the next stages of the transformation, building a roadmap and implementation.
A data-driven company excels in multiple areas, such as:
Many of them are fully-fledged fields on their own, with separate skills, tools and best practices. Navigating this landscape gets difficult, especially when you try to approach it ad-hoc. While working with our clients, we noticed that companies often face similar challenges in their data-driven journeys. That’s why we decided to organize them into a survey. You can think of it as a checklist that you can use, regardless of industry, business lifecycle (start-up vs scale-up vs enterprise) and current data maturity. This helps you to learn about the most urgent data challenges and opportunities in a systematic way.
Like many analytical products, our survey is the result of multiple iterations. The first version was based on pre-existing research in the field, which allowed us to capture the current state of knowledge. We then pushed it further by collecting and aggregating the experience of our experts, who help companies become more data-driven on a daily basis (100+ data projects at GiD so far). This process helped us identify five dimensions characterizing data-driven companies.
There are five fundamental areas that make a data-driven company: Leadership, Culture, Analytics, Data, and Technology. They are equally important and usually interrelated (more on that later).
Each dimension describes different aspects of data maturity:
Although these are separate dimensions, they shouldn’t be analyzed in isolation from each other. They are interconnected with multiple dependencies between them. For example, enabling a broad group of users to access data needed for making decisions (Culture dimension) requires a BI tool (Technology dimension). Implementing a BI tool is impossible without funding and support from executives (Leadership dimension) and will provide little value if the data is unavailable or low-quality (Data dimension).
For each dimension, there are multiple questions in the survey to build a comprehensive picture of any company. Each question has multiple possible answers (usually five), ranked from the lowest to the highest data-driven maturity.
Let’s use an example to understand how the survey can be utilized in practice. In this case we will focus on a company that is relatively early on in its journey. We will analyze just one aspect of data maturity (tools for reporting) in one of the dimensions (Technology). The goal is to illustrate the process. The same process should be applied to all areas within every dimension to build a comprehensive understanding of your company/team.
Meet John. John works at a startup with aspirations of using data and analytics to make evidence-based decisions. So far John and his colleagues have set up basic reporting using manual processes and simple tools (spreadsheets). As the company grew, they encountered a bottleneck. Reports were available too late to influence any decisions. They also often contained errors, which made the end users skeptical about any conclusions drawn from them.
To help John and his colleagues, we recommend switching to more robust solutions and starting to build analytical awareness in the company. To translate these high-level goals into actionable steps, we need a better understanding of the current challenges and opportunities. John and his team fill in the survey to make this possible. To get a better perspective on the process, let’s take a look at a sample question from the Technology dimension.
John rates his company at level two out of five, in terms of tools for reporting. The company is beyond the level of not using any tools at all. But there is no BI tool that would help to provide quick and interactive access to data for a broad group of users.
John and his team proceed to answer all the remaining questions in the survey (around twenty questions in total). After answering all of them and aggregating the results, the company is positioned on a data-driven scale. It has a highly aware and committed leadership and capable analysts. There is some work to be done to nurture data culture, but Technology and Data are the biggest bottlenecks.
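The aggregation step can be sketched in a few lines of Python. This is only an illustration: the post does not describe how the answers are actually combined, so the plain average per dimension, the function name `score_by_dimension` and all the sample levels below are our assumptions, chosen to mirror John's outcome.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (dimension, level) pairs, one per answered question.
# Levels run from 1 (lowest maturity) to 5 (highest maturity).
responses = [
    ("Leadership", 4), ("Leadership", 5),
    ("Culture", 3), ("Culture", 3),
    ("Analytics", 4), ("Analytics", 4),
    ("Data", 2), ("Data", 1),
    ("Technology", 2), ("Technology", 2),
]


def score_by_dimension(responses):
    """Average the maturity levels within each dimension (assumed rule)."""
    levels = defaultdict(list)
    for dimension, level in responses:
        levels[dimension].append(level)
    return {dimension: mean(values) for dimension, values in levels.items()}


scores = score_by_dimension(responses)

# Sort dimensions from weakest to strongest and keep the two weakest.
bottlenecks = sorted(scores, key=scores.get)[:2]

print(scores)
print(bottlenecks)
```

With these made-up answers, Data and Technology come out as the two weakest dimensions, which matches the picture described above. A real scoring scheme could just as well weight questions differently.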
Now that John and his colleagues are aware of the company’s strengths and areas for improvement, there is only one big puzzle left: setting realistic goals that the company can pursue to push its data efforts forward.
While looking at the sample question from the survey, you probably noticed that the order of answers is not random. As was previously highlighted, they are ranked from the lowest to the highest data-driven maturity. This way of framing questions is called the Guttman scale. We used it to capture the paths that different companies may choose to embark on their data-driven journey. This way, the survey itself hints at the possible next steps that you should take.
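To make this concrete, here is a minimal Python sketch of how an ordered answer scale can hint at next steps. The answer labels are our paraphrases of the levels mentioned in this post, not the survey's actual wording. The fifth level is not described here, so it appears only as a placeholder, and the helper names `level_of` and `next_steps` are hypothetical.

```python
# Answers ordered from the lowest to the highest maturity.
# Labels are paraphrased assumptions, not the survey's real wording.
REPORTING_TOOLS = (
    "No tools for reporting",
    "Manual reports in spreadsheets",
    "BI tool available to a broad group of users",
    "Self-service reports for non-technical users",
    "Highest level (placeholder, not described in this post)",
)


def level_of(answers, chosen):
    """Return the 1-based maturity level of the chosen answer."""
    return answers.index(chosen) + 1


def next_steps(answers, current_level):
    """Return (level, answer) pairs that lie above the current level."""
    if not 1 <= current_level <= len(answers):
        raise ValueError("current_level is outside the scale")
    return [
        (level, answer)
        for level, answer in enumerate(answers, start=1)
        if level > current_level
    ]


current = level_of(REPORTING_TOOLS, "Manual reports in spreadsheets")
print(current)
for level, answer in next_steps(REPORTING_TOOLS, current):
    print(level, answer)
```

Because the answers are ordered, every level above the current one is a candidate goal, and the list can be read as a rough path rather than a prescription.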
Setting goals is not a deterministic process. It can’t be replaced with a simple survey. However, such a template helps to structure the discussion and expose the wide range of opportunities available, in order to ensure that you don’t miss something important. Let’s take another look at our sample question.
If we come back to the question about tools for reporting, we can take a look further down the line from the current state. For example, one of the ideas to improve the company’s reporting capabilities is implementing a BI tool.
You don’t have to stop here, because it may be possible to kill two birds with one stone. In this example, the fourth level is about allowing non-technical users to create their own custom reports. Self-service helps to foster data democratization across the company to make all kinds of decisions using data.
Assuming that you plan to implement a BI tool anyway, you can already make sure that it supports these capabilities. At best, it can save you from replacing your BI tool in the future.
It may also turn out that you can avoid the difficulties of certain levels altogether. For example, with the rich cloud offerings that we see, the technology allows you to avoid heavy reliance on spreadsheets. If John and his team had used the survey at the very beginning of their data journey, they would have discovered that they could already have built their processes in a robust and scalable way.
Based on this example, we can see that even a single question can trigger multiple insights. After going through the entire survey, you gain a broad perspective on what you can improve to get more value from data in the near future. At this point, involving multiple people with different competencies and skill sets is priceless. It’s hard to talk about the deployment of ML models without ML Ops specialists, or about building Machine Learning models without Data Scientists. Confronting these different perspectives will help you set goals that are both ambitious and realistic.
After some discussions, John and his colleagues decided to focus on the Data and Technology dimensions. They plan to implement tools and processes to standardize and automate data pipelines (from ingestion to reporting). They also plan to foster data culture by educating employees and setting up processes for making and monitoring major decisions with data.
The first step of data-driven transformation helps you get your bearings. You learn where your company currently is and where you want to take it. The key component in making this happen is a data-driven survey. It was developed to support our everyday work with clients, but we have also decided to share it more broadly.
We believe that it can help all kinds of companies in their data-driven journeys. If you would like to use it to get a better understanding of data opportunities in your company, just follow this link. After completing the survey, you will receive a tailored summary report with insights from one of our experts.
The survey has twenty questions in total, to guide you through five data-driven dimensions. As the data landscape is changing rapidly, we plan to refine and update it on a regular basis. We are open to your feedback and willing to help you interpret the results.
Do you want to know more about:
If so, we would like to encourage you to watch the webinar “Data-Driven Fast-Track: introduction to data-drivenness” hosted by Piotr Menclewicz.