News Recommendation: a challenging area in building recommendation systems

Remember our whitepaper “Guide to Recommendation Systems. Implementation of Machine Learning in Business” from the middle of last year? Our data scientist, Michal Stawikowski, did an excellent job of giving a cross-sectional overview of the issues related to recommender systems. The paper analyzes the topic from the business side and also dives into the technical details. It presents an example of a four-step recommender system, in which the results are successively retrieved, filtered, scored and sorted. You can also find out what QuickStart ML Blueprints are and how they can help data scientists and engineers build recommendation systems. Download the whitepaper here.

Personalised news recommendation systems

Today I would like to focus on a specific issue, namely news recommendation. With the development of artificial intelligence, new solutions aimed at improving the effectiveness of recommendation engines have started to appear in recent months, based, for example, on GPT-4 or diffusion models. However, solutions based on slightly older techniques such as TF-IDF, word2vec or Bag-of-Words are still leading the way.

As a recap, below is a breakdown of the most important approaches to building recommendation engines.

[Figure: schema of the main approaches to building recommendation systems]

To create a news recommendation engine, we can use any of the above approaches, depending on our business objectives and technological capabilities. However, the news domain is particularly sensitive to context.

Traditional recommendation systems recommend articles according to how similar they are to articles the user was previously interested in. Similarity is typically measured as the distance between two pieces of text: a small distance indicates high similarity, while a large distance indicates low similarity. However, people's preferences depend on several factors, including context and recent social media trends. For example, an article about the latest transfers of one football club may not interest a fan of another team, and it may also become irrelevant instantly if the transfer does not materialise after all.

It is important to remember that news recommendation systems face particular challenges: articles change quickly, data about readers is limited, and the relevance of an article is highly context-dependent. As a result, there is growing interest in personalised news recommendation systems that provide users with articles matching their preferences and interests. One approach to building such systems is to use contextual information, since users' reading preferences and habits can vary depending on their location, time of day and other factors. Given contextual information, a news recommendation system can personalise recommendations for each user, taking into account their current situation.

Context and trends can be captured from users in several ways: analysing the content of the articles they click on, tracking their social media activity, using collaborative filtering to identify similar users based on their clicking behaviour, and using contextual information such as time of day, location, device and user profile to personalise recommendations.
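To make the idea of text distance more concrete, below is a minimal sketch in Python, assuming a plain Bag-of-Words representation and cosine distance (1 minus cosine similarity). The function names and example headlines are hypothetical, and production systems would normally use richer representations such as TF-IDF or embeddings.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count word occurrences (a simple Bag-of-Words)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_distance(a: str, b: str) -> float:
    """Return 1 - cosine similarity of the Bag-of-Words vectors of two texts.

    0.0 means the same word distribution; 1.0 means no words in common.
    """
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(count * vb[word] for word, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # an empty text is treated as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical headlines: the first one was read by the user.
    read = "Club completes transfer of striker before deadline"
    for candidate in [
        "Striker transfer collapses at the last minute",
        "Central bank raises interest rates again",
    ]:
        print(f"{cosine_distance(read, candidate):.2f}  {candidate}")
```

For these headlines the sketch prints about 0.71 for the transfer story and 1.00 for the banking story, which shares no words with the article the user read. Note that word overlap alone cannot tell that the transfer has collapsed, which is exactly the kind of context sensitivity described above.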

Below you can find a classification of features used for news recommendation systems:

[Table: classification of features used in news recommendation systems]

Taking these issues into account, the target solution should be a hybrid model that considers both content and user behaviour and preferences.
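One simple way to picture such a hybrid is a weighted blend of a content signal and a behavioural signal. The sketch below assumes a precomputed content-similarity score per candidate and derives a very basic user-based collaborative filtering score from a click log; `behaviour_score`, `hybrid_rank`, the weight `alpha` and the data are hypothetical illustrations, not a reference design.

```python
from typing import Dict, List, Set


def behaviour_score(user: str, article: str, clicks: Dict[str, Set[str]]) -> float:
    """Share of neighbour users who also clicked `article`.

    Neighbours are users with at least one click in common with `user`:
    a very simple user-based collaborative filtering signal.
    """
    own = clicks.get(user, set())
    neighbours = [u for u, items in clicks.items() if u != user and items & own]
    if not neighbours:
        return 0.0
    return sum(article in clicks[u] for u in neighbours) / len(neighbours)


def hybrid_rank(
    user: str,
    content_scores: Dict[str, float],
    clicks: Dict[str, Set[str]],
    alpha: float = 0.5,
) -> List[str]:
    """Rank candidates by alpha * content score + (1 - alpha) * behaviour score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    blended = {
        article: alpha * content + (1 - alpha) * behaviour_score(user, article, clicks)
        for article, content in content_scores.items()
    }
    return sorted(blended, key=blended.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical click log and content-similarity scores for user "ann".
    clicks = {"ann": {"a1", "a2"}, "bob": {"a1", "a3"}, "cat": {"a2", "a3"}, "dan": {"a9"}}
    content = {"a3": 0.2, "a4": 0.6}  # "a4" is a fresh article nobody has clicked yet
    print(hybrid_rank("ann", content, clicks, alpha=0.5))  # ['a3', 'a4']
    print(hybrid_rank("ann", content, clicks, alpha=1.0))  # ['a4', 'a3']
```

With `alpha=0.5` the article clicked by similar users (`a3`) ranks first, while with `alpha=1.0` (content only) the fresh article `a4` wins. The content part is what allows a brand-new article with no clicks to be recommended at all.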

News modeling

A key element in building methods for personalized news recommendations is news modeling. In this step, it is necessary to understand the content and capture the individual characteristics of the article. A large number of approaches can be used for this purpose, which we can divide into two main groups: feature-based methods and deep learning-based methods.

Feature-based methods use features prepared by the data scientist to represent news articles. These features are designed to capture different aspects of news content and context. In many collaborative filtering methods, articles are represented by news IDs. However, this approach suffers from the 'cold start' problem: new articles are constantly being published and old ones quickly disappear, so many news IDs are poorly covered in the training set. Because ID-based news modeling has these limitations, additional techniques are often used to describe news content statistically. One of them is Term Frequency-Inverse Document Frequency (TF-IDF), which extracts features from news texts. Other content features are also common, such as topics extracted from news titles, summaries and main content with topic modeling techniques like Latent Dirichlet Allocation (LDA). In addition, factors such as news popularity, frequency, sentiment and bias can be used to improve the news representation.
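As an illustration of feature-based news modeling, the sketch below computes TF-IDF weights from scratch, assuming the textbook definition tf = term count / document length and idf = log(N / df). It is a simplified example; libraries such as scikit-learn's `TfidfVectorizer` use a smoothed idf and L2 normalisation by default, so their exact weights differ. The helper names and headlines are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def tf_idf(documents: List[str]) -> List[Dict[str, float]]:
    """Return one {term: weight} dict per document.

    tf  = count of the term in the document / number of terms in the document
    idf = log(N / df), N = number of documents, df = documents containing the term
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens) or 1
        vectors.append(
            {term: (count / length) * math.log(n_docs / df[term]) for term, count in counts.items()}
        )
    return vectors


if __name__ == "__main__":
    news = [
        "The striker signs a new contract",
        "The central bank raises rates",
        "The striker scores twice in the derby",
    ]
    for doc, weights in zip(news, tf_idf(news)):
        top = sorted(weights, key=weights.get, reverse=True)[:2]
        print(top, "<-", doc)
```

Words that occur in every article (here "the") get a weight of zero, while rare, distinctive words dominate each article's representation.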

On the other hand, deep learning-based methods use neural network models to learn article representations automatically from raw input data, such as news texts. In this case, we can largely skip manual feature engineering. These methods compete with the feature-based approach and are often able to capture the information and context of news articles more effectively by learning latent patterns from the raw input. For example, some methods encode news text with autoencoders, knowledge-aware convolutional neural networks (CNNs), multi-head self-attention networks or pre-trained language models (PLMs). Deep learning-based methods can also include news attributes, such as specific topics or concepts, in their analysis of news articles. In this way, they aim to gain a deeper understanding of the knowledge and common themes contained in the news.
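To show how a self-attention news encoder works in principle, here is a deliberately simplified, forward-pass-only sketch: a single attention head with identity query/key/value projections, followed by mean pooling. Real models of this kind use multiple heads, learned projection matrices and pre-trained word embeddings; the 3-dimensional embeddings below are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def self_attention(words: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention with identity Q/K/V projections.

    Each output is a context-aware mix of all word vectors in the title.
    """
    dim = len(words[0])
    outputs = []
    for query in words:
        weights = softmax([dot(query, key) / math.sqrt(dim) for key in words])
        outputs.append([sum(w * value[i] for w, value in zip(weights, words)) for i in range(dim)])
    return outputs


def encode_news(words: List[Vector]) -> Vector:
    """Average the contextualised word vectors into one news vector."""
    contextual = self_attention(words)
    dim = len(contextual[0])
    return [sum(vec[i] for vec in contextual) / len(contextual) for i in range(dim)]


if __name__ == "__main__":
    # Toy 3-d word embeddings (hypothetical); real systems use pre-trained ones.
    emb: Dict[str, Vector] = {
        "striker": [0.9, 0.1, 0.0],
        "signs": [0.2, 0.8, 0.1],
        "contract": [0.1, 0.7, 0.3],
    }
    title = ["striker", "signs", "contract"]
    print(encode_news([emb[w] for w in title]))
```

Each word vector is replaced by a mix of all word vectors in the title, weighted by scaled dot-product similarity, so the final news vector reflects how the words relate to one another rather than each word in isolation.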

User modeling

The next step in building a recommender system is user modeling. During this phase, it is important to understand the interests and preferences of users. This involves constructing user profiles based on characteristics extracted from the news articles they clicked. Again, as with news modeling, methods can be broadly divided into feature-based and deep learning-based ones.

The first approach, feature-based user modeling, creates user profiles from a set of features built from historical user behavior, including clicked news. These methods use various additional user characteristics to facilitate user modeling, such as demographics (e.g. age, gender and occupation), user location, access patterns and user tags or keywords. In some cases, user behavior on other platforms, such as social media and e-commerce sites, can also be taken into account to get additional information about user interests. However, this type of approach usually requires considerable expertise in feature design and validation, as well as access to a wide range of data, preferably of good quality.
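A feature-based user profile can be as simple as a concatenation of hand-crafted features. The sketch below assumes hypothetical topic categories, age bands and dayparts, and builds a vector from topic shares in the click history, an access-pattern feature (share of visits per part of the day) and a one-hot encoded demographic feature.

```python
from collections import Counter
from typing import List

# Hypothetical vocabularies; a real system would derive them from its own data.
TOPICS = ["sport", "politics", "business", "tech"]
AGE_BANDS = ["<25", "25-44", "45+"]
DAYPARTS = [(0, 6), (6, 12), (12, 18), (18, 24)]  # [start, end) hours


def user_features(clicked_topics: List[str], visit_hours: List[int], age_band: str) -> List[float]:
    """Hand-crafted user profile vector.

    - topic shares from the click history (unknown topics are ignored),
    - share of visits per part of the day (an access-pattern feature),
    - one-hot encoded age band (a demographic feature).
    """
    topic_counts = Counter(t for t in clicked_topics if t in TOPICS)
    n_topics = sum(topic_counts.values()) or 1
    topic_share = [topic_counts[t] / n_topics for t in TOPICS]

    n_visits = len(visit_hours) or 1
    daypart_share = [
        sum(start <= h < end for h in visit_hours) / n_visits for start, end in DAYPARTS
    ]

    age_one_hot = [1.0 if age_band == band else 0.0 for band in AGE_BANDS]
    return topic_share + daypart_share + age_one_hot


if __name__ == "__main__":
    print(user_features(["sport", "sport", "tech"], [7, 8, 20, 21], "25-44"))
```

Every entry of this vector is a design decision (which topics, which bands, which time buckets), which is why this approach depends so heavily on feature-engineering expertise.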

On the other hand, user modeling methods based on deep learning aim to learn user representations from behavior, without the need for manual feature engineering. These methods infer user interests from clicks, which are an implicit indicator of a user's interest in a news item. However, click data can be noisy and does not always reflect a user's actual interests. To address this, many methods incorporate other types of information into user modeling, such as user IDs, contextual features (e.g. user devices and locations) and various types of user feedback on the news platform, bringing user engagement information into interest modeling. These methods learn deep representations of user interests automatically, and such representations are often more accurate than manually created user interest features.
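Many deep learning-based user encoders aggregate the representations of clicked news with attention, so that more informative clicks get larger weights. The forward pass below is a minimal sketch under that assumption: the query vector would normally be a learned parameter and the news vectors would come from a news encoder; here both are hypothetical inputs.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def encode_user(clicked_news: List[Vector], query: Vector) -> Vector:
    """Attention pooling over the vectors of clicked news.

    weight_i = softmax(query . h_i); user = sum_i weight_i * h_i.
    In a trained model `query` is a learned parameter; here it is given.
    """
    scores = [sum(q * h for q, h in zip(query, news)) for news in clicked_news]
    weights = softmax(scores)
    dim = len(clicked_news[0])
    return [sum(w * news[i] for w, news in zip(weights, clicked_news)) for i in range(dim)]


if __name__ == "__main__":
    # Hypothetical 2-d news vectors: two sport stories and one politics story.
    clicked = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
    print(encode_user(clicked, query=[0.0, 0.0]))  # plain average of the clicks
    print(encode_user(clicked, query=[2.0, 0.0]))  # attention favours sport stories
```

With a zero query the weights are uniform and the result is the plain average of the clicked news; during training, the query is adjusted so that the pooled vector better predicts future clicks.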

Creating a ranking

Once the characteristics of news stories and users have been modeled, the next step is to create a ranking of candidate news stories based on their relevance to the user's interests. This is a key step in personalized news recommendation, as it aims to present users with the most relevant and engaging articles. 

Relevance-based methods typically rank candidate articles by their personalized match to the user's interests. The main challenge for these methods is measuring the relevance between candidate news items and the user's interests accurately. Many techniques assess this relevance directly from the similarity of the final user and news representations. For example, some methods calculate the cosine similarity between user and news feature vectors, such as vectors weighted with CF-IDF (Concept Frequency-Inverse Document Frequency). Other methods use the similarity between vectors of news topics and user interests. One of the challenges of personalized relevance-based ranking is the 'filter bubble': recommending news similar to what users clicked on previously can limit diversity. To address this, the system can deliberately recommend news items that differ slightly from those clicked on previously, introducing variety and randomness.
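Relevance-based ranking and one way of countering the filter bubble can be sketched as follows. Ranking uses cosine similarity between user and news vectors; for diversity, the sketch swaps in Maximal Marginal Relevance (MMR), a well-known re-ranking technique that the text above does not name and that is used here only as one illustrative option. The vectors and the trade-off parameter `lam` are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_relevance(user: Vector, news: Dict[str, Vector]) -> List[str]:
    """Pure relevance ranking: cosine similarity between user and news vectors."""
    return sorted(news, key=lambda nid: cosine(user, news[nid]), reverse=True)


def mmr_rerank(user: Vector, news: Dict[str, Vector], k: int, lam: float = 0.7) -> List[str]:
    """Maximal Marginal Relevance: pick items one by one, balancing relevance to
    the user against similarity to items that were already picked."""
    selected: List[str] = []
    remaining = list(news)

    def score(nid: str) -> float:
        redundancy = max((cosine(news[nid], news[s]) for s in selected), default=0.0)
        return lam * cosine(user, news[nid]) - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    user = [1.0, 0.3]
    # "a" and "b" are near-duplicates; "c" covers a different topic.
    news = {"a": [1.0, 0.2], "b": [1.0, 0.25], "c": [0.3, 1.0]}
    print(rank_by_relevance(user, news))       # ['b', 'a', 'c']
    print(mmr_rerank(user, news, k=2, lam=0.5))  # ['b', 'c']
```

With `lam=1.0` MMR reduces to pure relevance ranking (here `b`, then its near-duplicate `a`); lowering it to `0.5` replaces the near-duplicate with the less similar article `c`.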

Unlike relevance-based methods, ranking methods based on reinforcement learning aim to optimize the total reward in the long term. These methods explore potential user interests to improve long-term user experience and engagement. Through exploration, they can increase the diversity of recommendation results and discover user interests that the click history does not yet reveal.
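A full reinforcement learning recommender is beyond a short example, but the exploration idea can be illustrated with its simplest special case, a multi-armed bandit. The sketch below uses an epsilon-greedy strategy: with probability `epsilon` it shows a random article (exploration), otherwise the one with the best observed click-through rate (exploitation). The class name, parameters and simulated click probabilities are hypothetical; a bandit optimises immediate clicks rather than long-term reward, and practical systems often use contextual bandits instead.

```python
import random
from typing import Dict, List


class EpsilonGreedyRecommender:
    """Explore a random article with probability epsilon, otherwise exploit
    the article with the highest observed click-through rate (CTR)."""

    def __init__(self, articles: List[str], epsilon: float = 0.1, seed: int = 0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.shows: Dict[str, int] = {a: 0 for a in articles}
        self.clicks: Dict[str, int] = {a: 0 for a in articles}

    def ctr(self, article: str) -> float:
        shows = self.shows[article]
        return self.clicks[article] / shows if shows else 0.0

    def recommend(self) -> str:
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.shows))  # explore
        return max(self.shows, key=self.ctr)  # exploit

    def update(self, article: str, clicked: bool) -> None:
        self.shows[article] += 1
        self.clicks[article] += int(clicked)


if __name__ == "__main__":
    # Hypothetical "true" click probabilities, unknown to the recommender.
    true_ctr = {"a": 0.02, "b": 0.05, "c": 0.10}
    env = random.Random(1)
    rec = EpsilonGreedyRecommender(list(true_ctr), epsilon=0.1)
    for _ in range(5000):
        article = rec.recommend()
        rec.update(article, clicked=env.random() < true_ctr[article])
    print({a: rec.shows[a] for a in true_ctr})
```

The exploration step is what occasionally surfaces articles outside the user's established pattern, which is the mechanism behind the diversity benefit described above.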

News Recommendation Systems - Summary

In comparison to recommendation systems in other domains such as movie recommendations, news recommendation engines face unique challenges due to the dynamic and time-sensitive nature of news content. While both types of recommendation systems leverage various techniques like collaborative filtering and content-based filtering, news recommendation engines must also contend with the scarcity of user data and the need for real-time adaptation to evolving news trends. Despite these differences, the overarching goal of personalized recommendation systems remains consistent: to provide users with relevant and engaging content tailored to their preferences and interests. 

If you are seeking support to delve deeper into news recommendation system solutions, do not hesitate to take advantage of our experts' free consultation offer.

29 February 2024
