News Recommendation: the challenging area in building recommendation systems

Remember our whitepaper “Guide to Recommendation Systems. Implementation of Machine Learning in Business” from the middle of last year? Our data scientist, Michal Stawikowski, did an excellent job of giving you a cross-sectional overview of the issues related to recommender systems. In the paper, we analyzed the topic from the business side and also dived into the technical details. We also presented an example of a four-step recommender system, in which the results are retrieved, filtered, scored and sorted in successive steps. You can also find out what QuickStart ML Blueprints are and how they can help data scientists and engineers build recommendation systems. Download the white paper here.

Personalised news recommendation systems

Today I would like to focus on a specific issue, namely news recommendation. With the development of artificial intelligence, new solutions have started to appear in recent months that use, for example, GPT-4 or diffusion models to improve the effectiveness of recommendation engines. However, solutions based on slightly older techniques such as TF-IDF, word2vec or Bag-of-Words are still leading the way.

As a recap, below is a breakdown of the most important approaches to building recommendation engines.

[Figure: schema of the main approaches to building recommendation systems]

To create a news recommendation engine, we can actually use any of the above approaches, depending on what our business objective and technological capabilities are. However, the news area is characterized by a particular sensitivity to the context of the news.

Traditional recommendation systems recommend articles according to how similar they are to articles in which the user was previously interested. Typically, similarity is measured using the distance between two pieces of text: a small distance indicates high similarity, while a large distance indicates low similarity. However, people's preferences depend on several factors, including context and recent social media trends. For example, a text about the latest transfers of one football club may not be of interest to a fan of another team. Such a news item may also become instantly irrelevant if the transfer does not materialise after all.

It is important to remember that news recommendation systems face particular challenges because articles change quickly, data about readers is limited, and the relevance of articles is highly context-dependent. As a result, there is growing interest in creating personalised news recommendation systems that can provide users with articles that match their preferences and interests.

One approach to creating such systems is to use contextual information. Users' reading preferences and habits can vary depending on their location, time of day and other factors. Given contextual information, news recommendation systems can personalise recommendations for each user, taking into account their current state. Context and trends can be captured in several ways, such as analysing the content of articles that users click on, tracking users' social media activity, using collaborative filtering to identify similar users based on their clicking behaviour, and using contextual information such as time of day, location, device and user profile.
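Going back to the idea of measuring similarity as a distance between two pieces of text, here is a minimal sketch of one possible distance. It uses the Jaccard distance over word sets, which is only one of many options and is chosen here purely for illustration; the function name and the example headlines are made up.

```python
def jaccard_distance(text_a, text_b):
    """Distance in [0, 1]: 0 means identical word sets, 1 means no shared words."""
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        # Two empty texts are treated as identical (an assumption of this sketch).
        return 0.0
    return 1.0 - len(tokens_a & tokens_b) / len(union)


# Hypothetical headlines: two of four distinct words are shared.
distance = jaccard_distance("club signs striker", "club signs defender")
```

A distance like this ignores word order and meaning, which is one reason why richer representations are usually preferred in practice.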

Below you can find a classification of features used for news recommendation systems:

[Table: types of features used in news recommendation systems]

Taking these issues into account, the target solution should be a hybrid model that considers both the content and the users' behaviour and preferences.

News modeling

A key element in building methods for personalized news recommendations is news modeling. In this step, it is necessary to understand the content and capture the individual characteristics of the article. A large number of approaches can be used for this purpose, which we can divide into two main groups: feature-based methods and deep learning-based methods.

Feature-based methods use features prepared by the data scientist to represent news articles. These features are designed to capture different aspects of news content and context. In many collaborative filtering-based methods, articles are represented by news IDs. However, this approach can suffer from a 'cold start' problem: new articles are constantly being published and old articles quickly disappear, so the training set covers only a limited share of news identifiers. Because ID-based news modeling has these limitations, additional techniques are often used to describe news content statistically. One of these is Term Frequency-Inverse Document Frequency (TF-IDF), which extracts features from news texts. Other content features are also often used, for example topics extracted from news titles, summaries and main content with topic modeling techniques such as Latent Dirichlet Allocation (LDA). In addition, other factors such as news popularity, frequency, sentiment and bias can also be used in the model to improve news representation.
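To illustrate how TF-IDF turns news text into features, here is a minimal sketch in plain Python. It uses the basic, unsmoothed formula (term frequency multiplied by the logarithm of the inverse document frequency); libraries typically apply smoothed variants, so the exact numbers will differ. The function name and the example titles are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (unsmoothed TF-IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Relative term frequency times log inverse document frequency.
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical news titles used only for illustration.
titles = [
    "club confirms transfer of striker",
    "club denies transfer rumours",
    "parliament passes new budget",
]
vectors = tf_idf(titles)
```

In the first title, "striker" appears in only one document and therefore receives a higher weight than "club", which appears in two. A term that occurs in every document gets a weight of zero under this formula.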

On the other hand, deep learning-based methods use neural network models to automatically learn article representations from raw input data, such as news texts. In this case, we can largely skip the manual feature preparation step. These methods compete with the approach described above and are often able to capture the information and context of news articles more effectively by learning latent patterns from the raw input data. For example, some methods use autoencoders, knowledge-aware convolutional neural networks (CNNs), multi-head self-attention networks and pre-trained language models (PLMs) to encode news text. Deep learning-based methods can also include news attributes, such as specific topics or concepts, in their analysis of news articles. In this way, they aim to gain a deeper understanding of the knowledge and common themes contained in news articles.

User modeling

The next step in building a recommender system is user modeling. During this phase, it is important to understand the interests and preferences of users. This involves constructing user profiles based on a set of characteristics extracted from clicked news articles. Again, as with news modeling, methods can be broadly divided into feature-based and deep learning-based.

The first approach, feature-based user modeling, involves creating user profiles based on a set of features built from historical user behavior, including clicked news articles. These methods use various additional user characteristics to facilitate user modeling, such as demographics (e.g. age, gender and occupation), user location, access patterns and user tags or keywords. In some cases, it may be possible to take into account user behavior on other platforms, such as social media and e-commerce platforms, to get additional information about user interests. However, this type of approach usually requires considerable expertise in feature design and validation, as well as access to a wide range of data, preferably of good quality.
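As a rough sketch of a feature-based user profile, the snippet below turns a click history into a normalized distribution over topics. The exponential time decay is an assumption added for this example (it reflects the time-sensitive nature of news, but it is not prescribed by any particular method), and the function name, the `half_life_days` parameter and the sample data are all hypothetical.

```python
from collections import defaultdict


def build_user_profile(clicks, half_life_days=7.0):
    """Build a topic-interest profile from (topic, age_in_days) click records."""
    weights = defaultdict(float)
    for topic, age_days in clicks:
        # Assumed decay: a click loses half of its weight every half_life_days.
        weights[topic] += 0.5 ** (age_days / half_life_days)
    total = sum(weights.values())
    if not total:
        return {}  # no clicks, no profile (the cold-start case)
    return {topic: weight / total for topic, weight in weights.items()}


# Hypothetical click history: two football clicks and one older politics click.
profile = build_user_profile([("football", 0), ("football", 7), ("politics", 14)])
```

The resulting weights sum to 1, so profiles of very active and less active users remain comparable. Demographics, location or device could be added as further features in the same dictionary.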

On the other hand, user modeling methods based on deep learning aim to learn representations of users from their behavior, without the need for manual feature engineering. These methods infer user interests from click behavior, which is an implicit indicator of a user's interest in news articles. However, this data can be noisy and may not always accurately reflect a user's actual interests. To address this, many methods incorporate other types of information into user modeling, such as user IDs, contextual features (e.g. user devices and locations) and various types of user feedback on the news platform that reflect user engagement. These methods can automatically learn deep representations of user interests for personalized news recommendations, which are typically more accurate than manually created user interest features.

Creating ranking

Once the characteristics of news stories and users have been modeled, the next step is to create a ranking of candidate news stories based on their relevance to the user's interests. This is a key step in personalized news recommendation, as it aims to present users with the most relevant and engaging articles.

Relevance-based methods typically rank candidate articles based on their personalized match to the user's interests. The main problem with these methods is accurately measuring the relevance between candidate news items and the user's interests. Many techniques directly assess the relevance between the user and the news items, based on the similarity of their final representations. For example, some methods calculate the cosine similarity between user and news feature vectors built with CF-IDF (Concept Frequency-Inverse Document Frequency) to measure their relevance. Other methods use similarities between vectors of news topics and user interests to determine relevance. One of the challenges of personalized relevance-based ranking is the 'filter bubble' problem: recommending articles that are similar to those the user clicked on previously can limit diversity. To address this, strategies can be used to recommend articles that are slightly different from those clicked on previously, introducing variety and randomness.
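Relevance-based ranking with cosine similarity can be sketched as follows. The vectors are stored as plain dictionaries of feature weights; how they are produced (TF-IDF, CF-IDF, topic models or learned embeddings) is left open. The function names, article IDs and weights are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_candidates(user_vector, candidates):
    """Return (article_id, score) pairs, most relevant first."""
    scored = [
        (article_id, cosine_similarity(user_vector, vector))
        for article_id, vector in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical user interests and candidate articles.
user = {"football": 0.8, "transfers": 0.6}
candidates = {
    "a1": {"football": 1.0, "transfers": 1.0},
    "a2": {"politics": 1.0},
    "a3": {"football": 1.0, "politics": 1.0},
}
ranking = rank_candidates(user, candidates)
```

Here the article about football transfers ranks first and the purely political one last. Because the score depends only on similarity to past interests, a ranking like this is exactly what can lead to the filter bubble described above.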

Unlike relevance-based methods, ranking methods based on reinforcement learning aim to optimize the total reward in the long term. These methods explore potential user interests and aim to improve long-term user experience and engagement. Through exploration, they can increase the diversity of recommendation results and discover interests that the user has not yet shown.
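A full reinforcement learning setup is beyond a short example, but the exploration idea can be sketched with an epsilon-greedy bandit, a much simpler technique that is used here as a stand-in. It optimizes the immediate click-through rate rather than a long-term reward, so treat it only as an illustration of the exploration/exploitation trade-off. The class, its parameters and the topic names are hypothetical.

```python
import random


class EpsilonGreedyRanker:
    """Pick a topic to recommend, exploring at random with probability epsilon."""

    def __init__(self, topics, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.shows = {topic: 0 for topic in topics}
        self.clicks = {topic: 0 for topic in topics}

    def estimated_ctr(self, topic):
        """Observed click-through rate; 0.0 for topics never shown."""
        shows = self.shows[topic]
        return self.clicks[topic] / shows if shows else 0.0

    def choose(self):
        if self.rng.random() < self.epsilon:
            # Explore: show a random topic to discover potential interests.
            return self.rng.choice(list(self.shows))
        # Exploit: show the topic with the best observed click-through rate.
        return max(self.shows, key=self.estimated_ctr)

    def update(self, topic, clicked):
        """Record the user's feedback for a shown topic."""
        self.shows[topic] += 1
        self.clicks[topic] += int(clicked)


# With epsilon=0.0 the ranker never explores and always exploits.
ranker = EpsilonGreedyRanker(["sports", "politics"], epsilon=0.0, seed=0)
ranker.update("politics", clicked=True)
ranker.update("sports", clicked=False)
best_topic = ranker.choose()
```

Raising `epsilon` trades some short-term relevance for more diverse recommendations, which is the same trade-off that reinforcement learning-based rankers manage in a more principled way.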

News Recommendation Systems - Summary

In comparison to recommendation systems in other domains such as movie recommendations, news recommendation engines face unique challenges due to the dynamic and time-sensitive nature of news content. While both types of recommendation systems leverage various techniques like collaborative filtering and content-based filtering, news recommendation engines must also contend with the scarcity of user data and the need for real-time adaptation to evolving news trends. Despite these differences, the overarching goal of personalized recommendation systems remains consistent: to provide users with relevant and engaging content tailored to their preferences and interests.

If you are seeking support to delve deeper into news recommendation system solutions, do not hesitate to take advantage of our experts' free consultation offers.

Last updated: 29 February 2024

Written by

Adam Cierlik

Senior Data Scientist
