
News Recommendation: a challenging area in building recommendation systems

Remember our whitepaper “Guide to Recommendation Systems. Implementation of Machine Learning in Business” from the middle of last year? Our data scientist, Michal Stawikowski, did an excellent job of giving a cross-sectional overview of the issues related to recommender systems. The paper analyzes the topic from the business side and also dives into the technical details. It presents an example of a four-stage recommender system, in which the results are successively retrieved, filtered, scored and sorted. You can also find out what QuickStart ML Blueprints are and how they can help data scientists and engineers build recommendation systems. Download the white paper here.


Personalised news recommendation systems

Today I would like to focus on a specific issue, namely news recommendation. With the development of artificial intelligence, new solutions based, for example, on GPT-4 or diffusion models have started to appear in recent months to improve the effectiveness of recommendation engines. However, solutions based on somewhat older techniques such as TF-IDF, word2vec or Bag-of-Words are still leading the way.

As a recap, below is a breakdown of the most important approaches to building recommendation engines.

[Figure: breakdown of the main approaches to building recommendation systems]

To create a news recommendation engine, we can actually use any of the above approaches, depending on what our business objective and technological capabilities are. However, the news area is characterized by a particular sensitivity to the context of the news.

Traditional recommendation systems recommend articles according to how similar they are to articles in which the user was previously interested. Typically, similarity is measured using the distance between two pieces of text: a small distance indicates high similarity, while a large distance indicates low similarity. However, people's preferences depend on several factors, including context and recent social media trends. For example, a text about the latest transfers of one football club may not be of interest to a fan of another team. Such a news item may also become instantly irrelevant if the transfer does not materialise after all.

It is important to remember that news recommendation systems face particular challenges because articles change quickly, data about readers is limited, and the relevance of articles is highly context-dependent. As a result, there is growing interest in creating personalised news recommendation systems that can provide users with articles that match their preferences and interests.

One approach to creating such systems is to use contextual information. Users' reading preferences and habits can vary depending on their location, the time of day and other factors. Given contextual information, news recommendation systems can personalise recommendations for each user, taking into account their current state. Context and trends can be captured in several ways, such as analysing the content of articles that users click on, tracking users' social media activity, using collaborative filtering to identify similar users based on their clicking behaviour, and using contextual information such as time of day, location, device and user profile.
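To make the notion of text distance mentioned earlier in this section concrete, here is a minimal sketch using a Bag-of-Words representation and cosine distance. The helper names and the sample headlines are made up for illustration. A production system would normally add stop-word removal, stemming and a better weighting scheme.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-case the text and count word occurrences."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity (1.0 means the texts share no words)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat empty texts as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical headlines: one the user has read and two candidates.
read_article = bag_of_words("Club signs new striker in record transfer")
candidate_close = bag_of_words("Record transfer: striker joins the club")
candidate_far = bag_of_words("Central bank raises interest rates again")

dist_close = cosine_distance(read_article, candidate_close)
dist_far = cosine_distance(read_article, candidate_far)
print(f"close: {dist_close:.3f}, far: {dist_far:.3f}")
```

The football headline ends up much closer to the read article than the finance headline. Note that this distance says nothing about whether the transfer story is still current, which is exactly the limitation that contextual features are meant to address.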

Below you can find a classification of features used for news recommendation systems:

[Table: classification of feature types used in news recommendation systems]

Taking these issues into account, the target solution should be a hybrid model that considers both content and user behaviour and preferences.

News modeling

A key element in building methods for personalized news recommendations is news modeling. In this step, it is necessary to understand the content and capture the individual characteristics of the article. A large number of approaches can be used for this purpose, which we can divide into two main groups: feature-based methods and deep learning-based methods.

Feature-based methods use features prepared by the data scientist to represent news articles. These features are designed to capture different aspects of news content and context. In many collaborative filtering-based methods, articles are represented by news IDs. However, this approach can suffer from a 'cold start' problem: new articles are constantly being published and old articles quickly disappear, so the training set covers only a limited share of news identifiers. Because ID-based news modeling has these limitations, additional techniques are often used to describe news content statistically. One of these is Term Frequency-Inverse Document Frequency (TF-IDF), which extracts features from news texts. Other content features are also often used, for example topics extracted from news titles, summaries and main content with techniques such as Latent Dirichlet Allocation (LDA). In addition, factors such as news popularity, frequency, sentiment and bias can be used in the model to improve news representation.
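As an illustration of the TF-IDF step described above, here is a minimal sketch in plain Python. It assumes the basic, unsmoothed formula tf * log(N / df), while libraries usually apply smoothed or normalised variants. The function name and the sample titles are hypothetical.

```python
import math
from collections import Counter


def tf_idf(corpus: list[list[str]]) -> list[dict[str, float]]:
    """Compute TF-IDF weights for each tokenised document in the corpus."""
    n_docs = len(corpus)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    vectors = []
    for doc in corpus:
        counts = Counter(doc)
        vectors.append({
            # Term frequency (share of the document) times inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical news titles, already tokenised by whitespace.
titles = [
    "election results announced tonight".split(),
    "election campaign enters final week".split(),
    "striker transfer announced tonight".split(),
]
weights = tf_idf(titles)
print(weights[0])
```

In the first title, "results" receives a higher weight than "election" because it appears in only one document, so it is more distinctive for that article.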

On the other hand, deep learning-based methods use neural network models to automatically learn article representations from raw input data, such as news texts. In this case, we can largely skip the manual feature preparation step. These methods compete with the approach described above and are often able to capture the information and context of news articles more effectively by learning latent patterns from raw input data. For example, some methods use autoencoders, knowledge-aware convolutional neural networks (CNNs), multi-head self-attention networks and pre-trained language models (PLMs) to encode news text. Deep learning-based methods can also include news attributes, such as specific topics or concepts, in their analysis of news articles. In this way, they aim to gain a deeper understanding of the knowledge and common themes contained in news articles.

User modeling

The next step in building a recommender system is user modeling. During this phase, it is important to understand the interests and preferences of users. This involves constructing user profiles based on a set of characteristics extracted from clicked news articles. Again, as with news modeling, methods can be broadly divided into feature-based and deep learning-based.

The first approach, feature-based user modeling, involves creating user profiles based on a set of features built from historical user behavior, including clicked articles. These methods use various additional user characteristics to facilitate user modeling, such as demographics (e.g. age, gender and occupation), user location, access patterns and user tags or keywords. In some cases, it may be possible to take into account user behavior on other platforms, such as social media and e-commerce platforms, to get additional information about user interests. However, this type of approach usually requires considerable expertise in feature design and validation, as well as access to a wide range of data, preferably of good quality.
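A feature-based user profile of this kind could be sketched as follows. The `Click` fields, the feature names and the "morning" window are assumptions chosen for the example, not a prescribed schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Click:
    category: str  # hypothetical editorial category of the clicked article
    hour: int      # hour of day (0-23) when the click happened


def build_profile(clicks: list[Click]) -> dict[str, float]:
    """Turn a click history into normalised, hand-crafted features."""
    if not clicks:
        return {}
    total = len(clicks)
    # Interest features: share of clicks per category.
    profile = {
        f"share_{category}": n / total
        for category, n in Counter(c.category for c in clicks).items()
    }
    # Access-pattern feature: share of clicks made in the morning (6:00-11:59).
    profile["share_morning"] = sum(6 <= c.hour < 12 for c in clicks) / total
    return profile


history = [Click("sport", 8), Click("sport", 9), Click("politics", 20), Click("sport", 21)]
profile = build_profile(history)
print(profile)
```

Each feature here has to be designed and validated by hand, which illustrates why this approach depends so heavily on domain expertise and data quality.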

On the other hand, user modeling methods based on deep learning aim to learn representations of users based on their behavior, without the need for manual feature engineering. These methods infer user interests from click behavior, which is an implicit indicator of a user's interest in news articles. However, this data can be noisy and may not always accurately indicate a user's actual interests. To address this, many methods incorporate other types of information into user modeling, such as user IDs, contextual features (e.g. user devices and locations) and various types of user feedback on the news platform that reflect user engagement. These methods can automatically learn deep representations of user interests for personalized news recommendations, which are typically more accurate than manually created user interest features.
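One common building block in such models is attention pooling, which combines the vectors of clicked articles into a single user vector and lets more informative clicks weigh more. The sketch below shows only the forward pass with dot-product attention. In a real model the `query` vector and the article embeddings would be learned during training; here they are fixed, made-up values.

```python
import math


def attention_pool(clicked: list[list[float]], query: list[float]) -> list[float]:
    """Weighted average of clicked-article vectors.

    The weights are a softmax over the dot products between each
    article vector and the query vector.
    """
    scores = [sum(q * x for q, x in zip(query, vec)) for vec in clicked]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(clicked[0])
    return [sum(w * vec[i] for w, vec in zip(weights, clicked)) for i in range(dim)]


# Hypothetical 2-dimensional article embeddings: [sport, politics].
clicked_articles = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
query = [2.0, 0.0]  # assumed stand-in for a learned attention query
user_vector = attention_pool(clicked_articles, query)
print(user_vector)
```

Because the query aligns with the first dimension, the two sport clicks dominate the resulting user vector, while the single politics click contributes only a small share.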

Creating a ranking

Once the characteristics of news stories and users have been modeled, the next step is to create a ranking of candidate news stories based on their relevance to the user's interests. This is a key step in personalized news recommendation, as it aims to present users with the most relevant and engaging articles. 

Relevance-based methods typically rank candidate articles based on their personalized match to the user's interests. The main problem with these methods is accurately measuring the relevance between candidate news items and the user's interests. Many techniques directly assess the relevance between the user and the news items, based on the similarity of their final representations. For example, some methods calculate the cosine similarity between user and news feature vectors built with CF-IDF (Concept Frequency-Inverse Document Frequency) to measure their relevance. Other methods use similarities between vectors of news topics and user interests to determine relevance. One of the challenges of personalized relevance-based ranking is the 'filter bubble' problem: recommending articles that are similar to those previously clicked on by users can limit diversity. To address this, strategies can be used to recommend articles that are slightly different from those clicked on previously, introducing variety and randomness.
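The sketch below illustrates both ideas from this paragraph: ranking candidates by cosine similarity to the user vector, and then re-ranking them for diversity. For the diversity step it uses greedy Maximal Marginal Relevance (MMR), which is one possible strategy chosen here as an assumption, not necessarily the one a given system uses. The vectors, article IDs and the `trade_off` value are made up.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank_mmr(user: list[float], candidates: dict[str, list[float]],
               k: int, trade_off: float = 0.5) -> list[str]:
    """Greedily pick items that are relevant to the user but
    dissimilar to the items already selected."""
    remaining = dict(candidates)
    selected: list[str] = []
    while remaining and len(selected) < k:
        def mmr(news_id: str) -> float:
            relevance = cosine(user, remaining[news_id])
            redundancy = max(
                (cosine(remaining[news_id], candidates[s]) for s in selected),
                default=0.0,
            )
            return trade_off * relevance - (1 - trade_off) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        del remaining[best]
    return selected


# Hypothetical vectors over the dimensions [sport, politics, culture].
user = [0.9, 0.3, 0.1]
candidates = {
    "transfer_a": [1.0, 0.0, 0.0],
    "transfer_b": [0.95, 0.05, 0.0],
    "election": [0.0, 1.0, 0.0],
    "festival": [0.0, 0.0, 1.0],
}

by_relevance = sorted(candidates, key=lambda n: cosine(user, candidates[n]), reverse=True)
diversified = rerank_mmr(user, candidates, k=2)
print(by_relevance[:2], diversified)
```

Pure relevance ranking fills the top two slots with two nearly identical transfer stories, whereas the diversity-aware re-ranking keeps the most relevant one and replaces the second with the election article.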

Unlike relevance-based methods, ranking methods based on reinforcement learning aim to optimize the total reward in the long term. These methods explore potential user interests and aim to improve long-term user experience and engagement. Through exploration, they can increase the diversity of recommendation results and discover potential user interests.
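The exploration idea can be illustrated with an epsilon-greedy multi-armed bandit, which is a deliberately simplified, single-step relative of full reinforcement learning. The sketch below simulates a user whose true click rates per category are assumed values unknown to the recommender, which has to discover them by occasionally trying categories other than the current best.

```python
import random


def epsilon_greedy(click_rates: dict[str, float], rounds: int,
                   epsilon: float = 0.1, seed: int = 0):
    """Simulate an epsilon-greedy bandit that learns which news
    category earns the most clicks."""
    rng = random.Random(seed)
    arms = list(click_rates)
    pulls = {arm: 0 for arm in arms}
    estimates = {arm: 0.0 for arm in arms}
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.choice(arms)              # explore: try a random category
        else:
            arm = max(arms, key=estimates.get)  # exploit: best estimate so far
        # Simulated user feedback: 1.0 for a click, 0.0 otherwise.
        reward = 1.0 if rng.random() < click_rates[arm] else 0.0
        pulls[arm] += 1
        # Incremental update of the running mean click rate.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
    return estimates, pulls


# Hypothetical true click rates, hidden from the recommender.
true_rates = {"politics": 0.1, "sport": 0.6, "culture": 0.2}
estimates, pulls = epsilon_greedy(true_rates, rounds=3000)
print(estimates, pulls)
```

Without the exploration branch, the recommender would keep showing the first category it tried and might never discover the user's stronger interest in another one.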

News Recommendation Systems - Summary

In comparison to recommendation systems in other domains such as movie recommendations, news recommendation engines face unique challenges due to the dynamic and time-sensitive nature of news content. While both types of recommendation systems leverage various techniques like collaborative filtering and content-based filtering, news recommendation engines must also contend with the scarcity of user data and the need for real-time adaptation to evolving news trends. Despite these differences, the overarching goal of personalized recommendation systems remains consistent: to provide users with relevant and engaging content tailored to their preferences and interests. 

If you are seeking support to delve deeper into news recommendation system solutions, do not hesitate to take advantage of our experts' free consultation offers.

29 February 2024
