How we evaluate the CfP submissions and build the conference agenda at Big Data Technology Warsaw Summit

Big Data Technology Warsaw Summit 2021 is fast approaching. Please save the date - February 25th, 2021. This time the conference will be organized as an online interactive event. This makes it easier for everyone to speak and attend because there is no need to travel to Warsaw.

As the Call for Presentations (CfP) is still open, we’d like to encourage you to submit your proposals. To shed some light on how submissions are rated and selected, we describe our evaluation process in this blog post.

The deadline for the CfP is October 10th, 2020 and the submission page is here.

Knowledge sharing is the main focus

First and foremost, we aim for presentations about case studies, technologies that solve real-world problems, best practices and lessons learned. We reject submissions containing outright sales pitches and commercial presentations.

Yuan Jiang sharing the system architecture for interactive analytics at Alibaba at Big Data Tech Warsaw 2020.

In each submission we evaluate the value of the presentation, the uniqueness of its subject, its alignment with the conference core and the speaker’s professional achievements.

  • Value - We aim for practical and helpful content. When attendees go back to work after the conference, they should be able to apply new knowledge and inspiration in their projects. Ideal presentations are based on real-world use-cases that work in production at a reasonable scale and bring value to the business.
  • Alignment - The main goal of the conference is knowledge sharing. Everyone enjoys presentations where speakers openly talk about their use-cases, technology stack and lessons learned. The authentic goal of the presentation should be sharing knowledge, not selling a product, software or services.
  • Uniqueness - Although we aim for practical and useful content, we prioritize talks that haven’t already been presented at other conferences. Ideally, there aren’t yet many videos or blog posts on the Internet that explain the particular topic, technology or use-case in detail, and the speaker presents her/his own perspective and comments on it.
  • Speaker - We check whether a speaker is known in the BigData / Cloud / ML community, whether she/he has prior speaking experience at conferences or meetups, what company she/he represents, and so on. We also give a small percentage bonus for diversity to attract more speakers from underrepresented groups.

You can check the agenda of the last edition of the conference and notice that many presentations score very high in each of these 4 criteria. Among top-rated presentations during the CfP process in 2019 were, for example, “Building Recommendation Platform for ESPN+ and Disney+. Lessons Learned”, “Presto @Zalando: A cloud journey for Europe’s leading online retailer”, “Reliability in ML - how to manage changes in data science projects?” or “Omnichannel Personalization as an example of creating data ROI - from separate use cases to operational complete data ecosystem”.

One more experimental track

There are several tracks at the conference, and the criteria above apply to all of them except one.

This year, as an experiment, we decided to launch a new track called “Academia, the incubating projects, and POCs”. Presentations submitted to this track will be scored in the standard way against the Alignment, Uniqueness and Speaker criteria, but we will not expect submissions to describe production use-cases or battle-proven technologies that work at large scale. However, we assume that these presentations will still bring value to the audience by putting some technologies on their radar or by introducing new techniques or use-cases that might become popular in the future.

A CfP committee that consists of practitioners

Our CfP committee is quite large and consists of experts who have practical experience in the data-related field and work at top data-driven companies. Each reviewer has her/his own area of expertise. Typically each reviewer rates presentations in multiple tracks, and she/he can specify the importance (weight) of her/his rating in each particular track (from 0.0, meaning no expertise, to 1.0, meaning a lot of expertise). For example, a reviewer who is a senior data engineer specializing in building real-time streaming solutions can rate two tracks (i.e. data engineering and real-time streaming analytics) and give them slightly different weights.

To minimize the risk of bad decisions, multiple reviewers rate each submission (4 reviewers on average) and discuss it. Typically, we add comments next to the ratings so that we share our findings, feedback and concerns across the committee. It is fine for someone to change her/his vote after hearing feedback from others. We want the CfP process to be a discussion in which the committee members collaborate to select the best submissions.

The first round to build a short-list

In the first round, each committee member rates presentations in her/his track(s) in four categories and provides additional comments. Here is an example:

[Image: example of a committee member's ratings and comments for a submission]

We calculate a score for each submission based on the ratings, the criteria weights and the reviewers’ expertise.

Then, for each track, we build a fairly long list of well-rated presentations. During this process, we have several discussions and iterations based on the provided comments and other findings, so that we can review each presentation as carefully as possible and re-review it if needed.
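The exact formula is not given in this post, so the following is only a minimal sketch of one plausible way to combine these inputs: a weighted average over the criteria within each review, followed by an expertise-weighted average across reviewers. The criterion weights, the 1-5 rating scale and all names (`CRITERIA_WEIGHTS`, `Review`, `review_score`, `submission_score`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical criterion weights; the real values are not published in this post.
CRITERIA_WEIGHTS = {"value": 0.35, "alignment": 0.25, "uniqueness": 0.25, "speaker": 0.15}


@dataclass
class Review:
    ratings: Dict[str, float]  # criterion name -> rating (assumed 1-5 scale)
    expertise: float           # reviewer's declared weight for the track, 0.0-1.0


def review_score(review: Review) -> float:
    """Weighted average of one reviewer's ratings across the criteria."""
    total_weight = sum(CRITERIA_WEIGHTS.values())
    weighted = sum(CRITERIA_WEIGHTS[c] * review.ratings[c] for c in CRITERIA_WEIGHTS)
    return weighted / total_weight


def submission_score(reviews: List[Review]) -> float:
    """Expertise-weighted average of the per-review scores."""
    total_expertise = sum(r.expertise for r in reviews)
    if total_expertise == 0:
        raise ValueError("no reviewer declared any expertise for this track")
    return sum(r.expertise * review_score(r) for r in reviews) / total_expertise


# Two made-up reviews of the same submission.
reviews = [
    Review({"value": 5, "alignment": 4, "uniqueness": 4, "speaker": 3}, expertise=1.0),
    Review({"value": 3, "alignment": 3, "uniqueness": 3, "speaker": 3}, expertise=0.5),
]
print(round(submission_score(reviews), 2))
```

In this sketch, a rating from a reviewer with expertise 1.0 counts twice as much as one from a reviewer with expertise 0.5, and a reviewer with expertise 0.0 does not influence the score at all.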

Second round to avoid duplicates and move submissions between tracks

It happens that we have multiple submissions from the same speaker or from the same company. Often all of them get high scores because they are practical, unique, and aligned with the conference core. However, to build a more diverse agenda, we must avoid multiple presentations from the same company, about the same technology or about the same use-case. For example, two years ago we rejected very good submissions about Apache Flink to avoid too many talks about Flink in the real-time streaming analytics track, and accepted a slightly lower-rated presentation about Apache Apex to cover a more diverse set of stream processing technologies. A year ago, one speaker sent us three very well-rated submissions, and we could accept only one of them, even though the other two scored better than some presentations that we accepted. Also two years ago, we had three submissions about automotive and self-driving cars and we had to reject at least one of them, even though all of them looked good.

Also, a given submission might fit multiple tracks, so in this round we also check if some submissions can be moved across the tracks in case a particular track is too crowded.

As you can see, our process is not automated and it requires a lot of brainstorming to build the agenda.

In this round, we divide the short-list into two parts - the submissions that we accept now, and the waiting list.
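The committee makes these decisions through discussion rather than by algorithm, but the kind of constraint involved can be illustrated with a small sketch. It assumes a hypothetical rule of at most one accepted talk per speaker and per company and a fixed number of slots; the function name, the field names and the limits are invented for this example. Short-listed submissions that are not accepted form the waiting list.

```python
from typing import Dict, List, Tuple


def split_short_list(short_list: List[Dict], slots: int) -> Tuple[List[Dict], List[Dict]]:
    """Greedily accept the best-scored talks, at most one per speaker and company."""
    accepted, waiting = [], []
    seen_speakers, seen_companies = set(), set()
    for talk in sorted(short_list, key=lambda t: t["score"], reverse=True):
        duplicate = talk["speaker"] in seen_speakers or talk["company"] in seen_companies
        if len(accepted) < slots and not duplicate:
            accepted.append(talk)
            seen_speakers.add(talk["speaker"])
            seen_companies.add(talk["company"])
        else:
            # Everything else on the short-list goes to the waiting list.
            waiting.append(talk)
    return accepted, waiting


# Made-up short-list: talks A and B come from the same speaker and company.
short_list = [
    {"title": "A", "speaker": "s1", "company": "c1", "score": 4.6},
    {"title": "B", "speaker": "s1", "company": "c1", "score": 4.5},
    {"title": "C", "speaker": "s2", "company": "c2", "score": 4.1},
    {"title": "D", "speaker": "s3", "company": "c3", "score": 3.9},
]
accepted, waiting = split_short_list(short_list, slots=2)
print([t["title"] for t in accepted], [t["title"] for t in waiting])
```

Here talk B ends up on the waiting list despite scoring higher than C, which mirrors the case of a speaker with several well-rated submissions.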

Third round to give last-minute opportunities to speak

It turns out that each year a few speakers have to cancel their presentations a few days or weeks before the conference for various reasons. To find a proper replacement we ask speakers from the waiting list first if they can still speak at the conference.

Does this system work?

During the conference we ask the audience to score each presentation they attend so that we collect feedback about its quality. Thanks to that, we can also check whether the presentations accepted by the CfP committee were enjoyed by the audience.

Last time, 18 out of 33 presentations given at the conference received a very good score from the audience (an average of at least 4.0 on a 1-5 scale). Let’s call them well-rated presentations. 2 out of 33 presentations received quite a bad score from the audience (lower than 3.0 on average). Let’s call them badly-rated presentations. Here are some stats:

  • 10 out of 16 presentations that were accepted by us through the CfP process were well-rated presentations. None of them was a badly-rated presentation.
  • 6 out of the top 8 highest-rated presentations from the CfP process were also well-rated by the audience.
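For reference, the counts quoted above translate into the following shares. The snippet only restates numbers from this post; the labels are our own shorthand.

```python
# Counts quoted in this post, as (numerator, denominator) pairs.
counts = {
    "well-rated, all presentations": (18, 33),
    "badly-rated, all presentations": (2, 33),
    "well-rated, accepted through CfP": (10, 16),
    "well-rated, top 8 from CfP": (6, 8),
}

# Convert each pair into a percentage rounded to one decimal place.
shares = {label: round(100 * num / den, 1) for label, (num, den) in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {pct}%")
```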

In other words, if the CfP committee accepts a submission, it will most likely be enjoyed by the audience, or at least it will not disappoint the audience. We are quite happy with these numbers, but of course we still see room to improve our process.

Stats from the 2020 edition

Here are some stats that come from the CfP process that took place a year ago, in 2019, before the last edition of the conference.

  • 73 submissions from all over the world
  • Each submission rated in 4 categories by 4 reviewers on average
  • The acceptance ratio was 23%.

As you can see, it was extremely difficult to select the best presentations and, at the same time, reject many very good ones.

The diverse agenda in the 2020 edition

During the last edition of the conference, over half of the presentations came from the CfP. This was intentional because we want the conference to be open to the community and to make sure that everyone who has a good story to share can speak at the conference.

8 presentations were given by personally invited speakers. These are always experts whom we identified in the community and whose experience we verified by watching some of their presentations or reading their blog posts. Also, 8 presentations were delivered by our sponsors and partners.

We achieved good coverage of technologies including battle-proven and state-of-the-art technologies such as Kafka, Flink, Presto, Kubernetes, or Google Cloud as well as new cool kids on the block like Hudi or Amundsen or rising stars like Snowflake.

How were the above technologies actually presented at the conference? We encourage you to read our blog post about top 5 biggest ideas to hear during the edition of Big Data Technology Warsaw Summit 2021.


Half of the presentations were about infrastructure, platforms, architecture and engineering efforts to make big data projects successful. Speakers talked about how they used the above technologies as building blocks for their data analytics platforms for real-time and batch processing, or for deploying large-scale ML projects. They talked about on-premises deployments, the public cloud and migration to the cloud.

The other half of the presentations were about actually using the big data platforms and the data to implement data-driven algorithms, get insights and solve business use-cases such as recommendation systems, omnichannel personalisation, credit risk, insurance etc.

See you soon!

This is quite a long blog post, but hopefully we explained how we evaluate submissions for Big Data Technology Warsaw Summit 2021 and why the process looks this way.

We’d like to encourage you to submit your proposal. Remember that the deadline for the CfP is October 10th, 2020 and the submission page is here.

28 September 2020
