
Modern Recommender Systems - Part 2: Data

Pavel Kordik
Mar 07

Data used by modern recommenders and how we can measure progress towards goals.


Data Is Crucial

Data plays an essential role in the functioning of a recommender system, as it is the primary source of information used to generate accurate and personalized recommendations. In this blog post, we discuss why data matters for recommender systems, which types of data sources are used, and how data can improve the accuracy and effectiveness of recommendations. The data that can be used for recommendations falls into three categories: 1) the item catalog, 2) the user catalog, and 3) the history of user-item interactions.

Attributes of items are stored in the item catalog, the user catalog holds information about users, and several types of user-to-item interactions are recorded in different contexts.

Item Catalog

First of all, it is good to know what we can recommend to users. A database of all items is called an item catalog. In this catalog, we store not only items that can be recommended (active items), but also historical items that were recommended in the past and are no longer available to users. Those historical items are important when measuring the similarity of users who interacted with them in the past.

Attributes of items help recommenders understand how items are related and which are more similar than others. Here are a few examples of the most important item attributes (or item properties).
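As a minimal sketch of the distinction between active and historical items, the catalog below keeps deactivated items instead of deleting them. The class names, field names, and the `active` flag are assumptions made for illustration, not the schema of any particular product.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Item:
    item_id: str
    categories: Set[str] = field(default_factory=set)
    description: str = ""
    active: bool = True  # False marks a historical item that is no longer offered


class ItemCatalog:
    """Hypothetical in-memory catalog that never forgets historical items."""

    def __init__(self) -> None:
        self._items: Dict[str, Item] = {}

    def upsert(self, item: Item) -> None:
        self._items[item.item_id] = item

    def deactivate(self, item_id: str) -> None:
        # Keep the record so that past interactions can still be resolved.
        self._items[item_id].active = False

    def recommendable(self) -> List[Item]:
        return [item for item in self._items.values() if item.active]

    def get(self, item_id: str) -> Optional[Item]:
        return self._items.get(item_id)


catalog = ItemCatalog()
catalog.upsert(Item("a1", {"news"}, "Sold-out concert announcement"))
catalog.upsert(Item("a2", {"news", "sport"}, "Match preview"))
catalog.deactivate("a1")

active_ids = [item.item_id for item in catalog.recommendable()]
```

Only `a2` is recommendable here, while `a1` stays available for computing similarities between users who interacted with it.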

  • Categories - Items can be categorized into distinct groups; however, you might also come up with a hierarchical system of categories where one item can belong to multiple categories. Categories can be used to create item segments so you can recommend particular categories to a given user. You can also filter out items from recommendations based on their category labels, or boost the probability that items from a particular set of categories are recommended to a user.
  • Text descriptions - When you recommend articles, the text of the article can be used as a text description attribute of the item. Modern recommenders can process text using advanced neural networks. Similarities between neural text embeddings of items can be very important, especially when recommending cold start items that do not have many interactions yet.
  • Images - Modern recommenders can use multiple images of an item to create a neural image embedding of the item. Again, such information is very important for recommendation systems, especially when images play a significant role for users (e.g. an online art gallery) or when interactions and text descriptions are missing. Imagine an online marketplace where users can upload images of items for sale. As they use their smartphones, it is not likely that they will also add rich and informative text descriptions. Another example would be a real-estate portal, where users like to find similar listings based on images of properties. Or a fashion e-commerce site that decided to utilize visual similarity to recommend alternatives from the product catalog.
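To make the idea of embedding similarity for cold start items concrete, here is a small sketch that ranks items by cosine similarity of their embedding vectors. The three-dimensional vectors are made-up toy values; in practice the embeddings would come from a text or image neural network, which is outside the scope of this sketch.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_a * norm_b)


def most_similar(
    query_id: str, embeddings: Dict[str, Sequence[float]], top_k: int = 2
) -> List[Tuple[str, float]]:
    """Rank all other items by similarity to the query item."""
    query = embeddings[query_id]
    scored = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in embeddings.items()
        if other_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (assumed values); "new_article" has no interactions yet.
embeddings = {
    "new_article": [0.9, 0.1, 0.0],
    "sports_news": [0.8, 0.2, 0.1],
    "cooking_tips": [0.0, 0.1, 0.9],
}

neighbours = most_similar("new_article", embeddings, top_k=1)
```

The new article can be recommended next to its nearest neighbour even though nobody has interacted with it so far.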

User Catalog

Similarly to the item catalog, the user catalog holds attributes and properties of users. The most important user attributes are the following:

  • Location of user - The geographic location of users is important in recommendation scenarios where users are interested in items that are located nearby (such as real estate, job, or event recommendation). Even users with no interaction history can then get relevant recommendations, such as popular items in their region.
  • User search history - One can suggest relevant items based on historical user search queries. Also, user search history is instrumental for personalized query suggestions, where reminding users about their past similar queries is very helpful.
  • User bio, interests or skills - In some domains, it is important to take into consideration not just user interactions with items, but also additional background information that can reveal user interests and help to select relevant items. Again, this is particularly important in cold start scenarios where we need to recommend to users without historical interactions.
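The location-based cold start case from the list above can be sketched as a per-region popularity count. The data layout (pairs of user and item IDs plus a user-to-region mapping) and all identifiers are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def popular_by_region(
    interactions: Iterable[Tuple[str, str]], user_region: Dict[str, str]
) -> Dict[str, Counter]:
    """Count interactions per item, grouped by the region of the interacting user."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for user_id, item_id in interactions:
        region = user_region.get(user_id)
        if region is not None:
            counts[region][item_id] += 1
    return counts


def recommend_for_new_user(
    region: str, counts: Dict[str, Counter], top_k: int = 3
) -> List[str]:
    """A user without history gets the most popular items of their region."""
    return [item_id for item_id, _ in counts[region].most_common(top_k)]


interactions = [
    ("u1", "flat_prague_1"),
    ("u2", "flat_prague_1"),
    ("u2", "flat_prague_2"),
    ("u3", "flat_brno_1"),
]
user_region = {"u1": "Prague", "u2": "Prague", "u3": "Brno"}

counts = popular_by_region(interactions, user_region)
top_for_prague = recommend_for_new_user("Prague", counts, top_k=1)
```

A real system would typically combine such a fallback with distance to the item and with recency, which this sketch leaves out.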

Problems and Challenges of User Catalog

User catalogs, while important for personalizing recommendations in modern recommender systems, face several significant challenges. These issues primarily revolve around data privacy concerns, user identification difficulties, and the dynamic nature of user attributes. Addressing these challenges is crucial for maintaining the effectiveness and trustworthiness of recommender systems.

Data Privacy Concerns

In the context of increasing data privacy concerns, it's crucial for recommender systems to responsibly collect and utilize user data to deliver optimal user experiences and enhance product offerings. Regulatory frameworks like the GDPR provide essential guidelines for data handling, yet these should be viewed not as obstacles but as opportunities to foster trust and transparency in the digital ecosystem.

Responsible recommenders are pivotal in striking a balance between personalization and privacy. By employing data minimization strategies, pseudonymizing user information, and ensuring robust data protection measures, recommender systems can offer highly personalized experiences without compromising user privacy.
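A minimal sketch of data minimization combined with pseudonymization might look as follows. It assumes a keyed hash (HMAC-SHA256) as the pseudonymization function and a hypothetical allow-list of profile fields; the key shown is a placeholder and would have to be stored separately from the data. Note that pseudonymized data can still count as personal data, so this is one protective measure, not a complete privacy solution.

```python
import hashlib
import hmac
from typing import Any, Dict

# Placeholder secret (assumption): in practice it lives outside the dataset.
SECRET_KEY = b"replace-with-a-secret-kept-outside-the-dataset"

# Hypothetical allow-list: only these attributes reach the recommender.
ALLOWED_FIELDS = {"region", "interests"}


def pseudonymize(user_id: str, key: bytes = SECRET_KEY) -> str:
    """Map an identifier to a stable pseudonym that cannot be reversed without the key."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def minimize(profile: Dict[str, Any]) -> Dict[str, Any]:
    """Drop every field that is not needed and replace the e-mail with a pseudonym."""
    record = {name: value for name, value in profile.items() if name in ALLOWED_FIELDS}
    record["user_id"] = pseudonymize(profile["email"])
    return record


profile = {
    "email": "jane@example.com",
    "full_name": "Jane Doe",
    "region": "Prague",
    "interests": ["jazz", "hiking"],
}
record = minimize(profile)
```

The same e-mail always maps to the same pseudonym, so interactions can still be linked to one profile without sending the e-mail or name to the recommender.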

User Identification Difficulties

Accurately identifying users is fundamental to creating and maintaining useful user profiles. However, several issues complicate user identification:

  • Multiple Profiles: Users may create multiple accounts on the same platform, leading to fragmented data that hinders a unified view of user preferences and behavior.
  • Shared Profiles: Accounts shared among several users, common in streaming services and online shopping platforms, present a challenge in discerning individual preferences, resulting in less personalized recommendations.
  • Cross-Device Identification: Users frequently access services across multiple devices, making it challenging to link these interactions to a single user profile accurately.

These identification challenges can lead to inaccuracies in user profiles, impacting the relevance of recommendations and potentially diminishing user satisfaction.

Maintaining Up-To-Date User Attributes

User preferences, interests, and even geographic locations can change over time. Keeping user attributes up-to-date is important for the accuracy of recommender systems.

  • Changing Preferences and Interests: As users evolve, so do their preferences and interests. A recommendation system that fails to adapt to these changes may continue suggesting irrelevant items, leading to user disengagement.
  • Skills and Professional Changes: In domains like job recommendation systems, users' skills and professional interests may develop, requiring the system to adapt to these changes to remain relevant.
  • Mood Variability: User mood, which can influence content preference (such as music or movies), varies significantly. Capturing and adapting to these transient states poses an additional layer of complexity.

These challenges require online platforms to implement mechanisms for regularly updating the user catalog and explicit user preferences. An alternative is to reduce reliance on user attributes and let the recommender system infer user preferences from interactions with items, by incorporating feedback loops and employing adaptive algorithms that adjust to changes in user behavior and preferences.
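One simple way to let inferred preferences follow changing interests is to weight interactions by their age. The sketch below uses exponential decay with a configurable half-life; the 30-day half-life and the day-based timestamps are assumptions for illustration, and real systems may use quite different adaptation mechanisms.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def decayed_category_scores(
    interactions: Iterable[Tuple[str, float]], now: float, half_life_days: float = 30.0
) -> Dict[str, float]:
    """Sum interaction weights per category; a weight halves every `half_life_days`."""
    decay_rate = math.log(2) / half_life_days
    scores: Dict[str, float] = defaultdict(float)
    for category, timestamp_days in interactions:
        age = now - timestamp_days
        scores[category] += math.exp(-decay_rate * age)
    return dict(scores)


# (category, day of interaction): the user moved from jazz to techno.
history = [("jazz", 0), ("jazz", 5), ("techno", 85), ("techno", 90)]
scores = decayed_category_scores(history, now=90)
```

Both categories have two interactions, but the recent techno interactions dominate the score, so the profile follows the user's current taste.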

Whereas subscription-based services can typically supply the recommender system with rich user profiles, online platforms that rely on advertising revenue can have as much as 70 percent of active users who are anonymous and have only a short, recent interaction history. For such users, recommender systems rely on simple session-based algorithms such as multi-armed bandits. When an anonymous user logs into the platform, the recommender system should be able to merge the anonymous browsing history with the history of the logged-in profile.
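Merging an anonymous session into a logged-in profile can be sketched as moving the session's interactions under the user's ID and re-sorting them by time. The store below is a hypothetical in-memory structure; the identifiers `session-abc` and `user-42` are illustrative only.

```python
from typing import Dict, List, Tuple


class InteractionStore:
    """Hypothetical store mapping a profile ID to (timestamp, item_id) pairs."""

    def __init__(self) -> None:
        self._history: Dict[str, List[Tuple[int, str]]] = {}

    def add(self, profile_id: str, timestamp: int, item_id: str) -> None:
        self._history.setdefault(profile_id, []).append((timestamp, item_id))

    def merge(self, anonymous_id: str, user_id: str) -> None:
        """Move the anonymous history to the user and keep chronological order."""
        moved = self._history.pop(anonymous_id, [])
        merged = self._history.setdefault(user_id, [])
        merged.extend(moved)
        merged.sort()

    def history(self, profile_id: str) -> List[Tuple[int, str]]:
        return list(self._history.get(profile_id, []))


store = InteractionStore()
store.add("user-42", 1, "a")       # older interaction of the known user
store.add("session-abc", 5, "b")   # anonymous browsing before login
store.add("session-abc", 7, "c")
store.merge("session-abc", "user-42")
merged_history = store.history("user-42")
```

After the merge, the anonymous ID no longer has a history of its own, and the user's profile includes what they browsed before logging in.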

Modern platforms should be able to balance personalization with privacy and transparency. As recommender systems evolve, so too must the strategies for managing user catalogs, ensuring that they continue to offer relevant, timely, and engaging recommendations in a privacy-conscious manner. Recent developments in managing personal profiles for large language models are a good source of inspiration.

User to Item Interactions

Interactions of users with items are the most important data source for recommender systems. In extreme cases, reasonable recommendations can be produced exclusively from the interaction (or rating) matrix, where user-to-item interactions are typically stored. Such recommendations can be computed for anonymous users interacting with anonymous items, meaning that neither item attributes nor user attributes are used.
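To illustrate recommendations computed from interactions alone, the sketch below counts how often two items were consumed by the same user and recommends the items that co-occur most with what a user has already seen. This item co-occurrence approach is one simple technique chosen for illustration; the IDs are opaque and no item or user attributes are used.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Set


def cooccurrence(user_items: Dict[str, Set[str]]) -> Dict[str, Counter]:
    """For every pair of items, count the users who interacted with both."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for items in user_items.values():
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(
    user_id: str, user_items: Dict[str, Set[str]], counts: Dict[str, Counter], top_k: int = 2
) -> List[str]:
    """Score unseen items by how often they co-occur with the user's items."""
    seen = user_items[user_id]
    scores: Counter = Counter()
    for item in seen:
        for other, n in counts[item].items():
            if other not in seen:
                scores[other] += n
    return [item for item, _ in scores.most_common(top_k)]


# A tiny interaction matrix stored as user -> set of items.
user_items = {
    "u1": {"i1", "i2", "i3"},
    "u2": {"i1", "i2"},
    "u3": {"i2", "i3", "i4"},
    "u4": {"i1"},
}

counts = cooccurrence(user_items)
recommended = recommend("u4", user_items, counts)  # -> ["i2", "i3"]
```

User `u4` has only seen `i1`, which co-occurs twice with `i2` and once with `i3`, hence the ranking.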

User interactions with items are collected in different scenarios, some of which are powered by a recommender system.

A variety of user interactions can be used to derive explicit or implicit feedback for recommender systems. These include ratings, browsing history, clicks, interactions with content (such as watching a video or liking a post), purchase history, search history and more. The data collected from these interactions can then be used to create user profiles and model user behavior, which in turn drive personalized recommendations.

In a typical sequence, the recommender system is asked for recommendations for a particular user in some scenario. It returns a personalized list of items that is subsequently displayed to the user. When the user engages with an item from the list, it is important to inform the recommender about the interaction and whether it resulted from a particular recommendation. For some scenarios, feedback is almost immediate; for others, it might take days (e.g. a personalized newsletter sent by email).
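The request, display, and feedback sequence described above can be sketched by attaching an identifier to every served list and sending it back with each interaction. All names here (`RecommendationLog`, `recommendation_id`, `serve`, `report`) are hypothetical and do not describe a specific vendor's API.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Interaction:
    user_id: str
    item_id: str
    recommendation_id: Optional[str] = None  # None: not triggered by a recommendation


class RecommendationLog:
    """Hypothetical log that links interactions back to served recommendations."""

    def __init__(self) -> None:
        self._served: Dict[str, Tuple[str, List[str]]] = {}
        self.interactions: List[Interaction] = []

    def serve(self, user_id: str, items: List[str]) -> str:
        """Remember which list was shown to whom and return its identifier."""
        recommendation_id = uuid.uuid4().hex
        self._served[recommendation_id] = (user_id, list(items))
        return recommendation_id

    def report(self, interaction: Interaction) -> None:
        self.interactions.append(interaction)

    def attributed(self) -> List[Interaction]:
        """Interactions that match a served list for the same user and item."""
        result = []
        for interaction in self.interactions:
            served = self._served.get(interaction.recommendation_id)
            if served and served[0] == interaction.user_id and interaction.item_id in served[1]:
                result.append(interaction)
        return result


log = RecommendationLog()
rid = log.serve("u1", ["a", "b"])
log.report(Interaction("u1", "a", rid))  # click on a recommended item
log.report(Interaction("u1", "c"))       # organic interaction, e.g. from search
attributed_items = [i.item_id for i in log.attributed()]
```

Because the identifier travels with the interaction, delayed feedback (such as a click in a newsletter days later) can still be attributed to the list that caused it.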

Problems With Collecting User Feedback

Collecting and interpreting user feedback accurately is a cornerstone for the efficiency of recommender systems. However, several challenges complicate this process, impacting the quality of recommendations. Among these challenges, caching recommendations, biased user interactions, and the lack of explicit user feedback are particularly significant.

Caching Recommendations and Its Impact: To economize on the cost of recalculating recommendations for frequent users, some platforms cache recommendations. Caching reduces costs, improves response times, and gives users the opportunity to explore recommended items more thoroughly. However, this practice introduces a significant issue: users may repeatedly encounter the same items. If the recommender system is not notified of these repeated exposures and cannot adjust accordingly, it misses the critical opportunity to refine recommendations based on the user's demonstrated lack of interest in these repeated items. This oversight often results in a decline in user experience, as the system fails to recognize and adapt to the evolving preferences of the user.
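One possible mitigation, sketched below under stated assumptions, is to count exposures on the caching side, rotate out items that were shown too often, and make the counts available for reporting to the recommender. The class, the `max_exposures` threshold, and the rotation policy are illustrative choices, not a prescribed design.

```python
from collections import Counter
from typing import Dict, List


class CachedRecommendations:
    """Hypothetical cache of one user's list that tracks how often items were shown."""

    def __init__(self, items: List[str], max_exposures: int = 3) -> None:
        self._items = list(items)
        self._max_exposures = max_exposures
        self._exposures: Counter = Counter()

    def show(self, count: int) -> List[str]:
        """Return the next items, skipping those already shown too many times."""
        page = [
            item for item in self._items
            if self._exposures[item] < self._max_exposures
        ][:count]
        self._exposures.update(page)  # each displayed item counts as one exposure
        return page

    def exposure_report(self) -> Dict[str, int]:
        """Counts that could be sent to the recommender as exposure feedback."""
        return dict(self._exposures)


cache = CachedRecommendations(["a", "b", "c", "d"], max_exposures=2)
first = cache.show(2)
second = cache.show(2)
third = cache.show(2)
```

With a limit of two exposures, the first two page views show `a` and `b`, and the third moves on to `c` and `d`.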

Biased User Interactions: Bias in user interactions can significantly skew the data that recommender systems rely on. One form of bias, editorial bias, occurs when some recommendation scenarios are curated by editors and presented the same way to all users. Users typically click on several items from these curated lists, creating an artificial interaction similarity among items that are not genuinely similar. This phenomenon can mislead the recommender system into overestimating the relevance of certain items, thereby distorting the recommendation process.
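A simple way to dampen editorial bias, assuming each interaction is logged together with the scenario it came from, is to give interactions from curated placements a lower weight. The scenario names and the weight of 0.2 are assumptions for illustration; choosing such weights is an empirical question.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical weights: scenarios not listed here count with weight 1.0.
SCENARIO_WEIGHTS = {"editorial_picks": 0.2}


def weighted_item_scores(interactions: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Sum interaction weights per item, discounting curated scenarios."""
    scores: Dict[str, float] = defaultdict(float)
    for item_id, scenario in interactions:
        scores[item_id] += SCENARIO_WEIGHTS.get(scenario, 1.0)
    return dict(scores)


# Item "a" got five clicks from a curated box, item "b" two clicks from search.
interactions = [("a", "editorial_picks")] * 5 + [("b", "search")] * 2
scores = weighted_item_scores(interactions)
```

Although `a` has more raw clicks, `b` ends up with the higher score because its interactions were not induced by a placement shown identically to everyone.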

Lack of User Feedback: Most users are reluctant to provide explicit feedback, such as rating items with stars or indicating likes and dislikes. A critical challenge for recommender systems is the absence of explicit or even implicit feedback in many scenarios. For instance, when users are recommended a list of articles and only read the excerpts without further interacting, they may still be satisfied with the recommendations. However, the recommender system receives no feedback signal to reflect this satisfaction. Similarly, in "autoplay" scenarios for music or short videos, users often continue to watch or listen without active engagement, reacting only if the recommendation is particularly unsuitable. This passive consumption can falsely signal to the recommender system that the user is engaged, leading to misinterpretations of user interest and satisfaction.

Additionally, in scenarios where a recommender system generates a vast array of items but presents only a select few to the user, it becomes essential for the system to recognize that users may not view the recommendations positioned lower on the list. Misinterpreting a user's non-interaction with these less-visible items as a lack of interest can skew the system's perception of user preferences. Furthermore, there are instances where recommendations may not be seen by the user at all, such as when they are placed far down on a webpage and the user does not scroll sufficiently to encounter them. In such cases, the system's assumption that the user has seen and disregarded these recommendations is flawed. Ideally, recommendations should be requested and displayed to the user dynamically, minimizing the time gap between generation and presentation to ensure that users are exposed to relevant recommendations in a timely manner.
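The distinction between items that were seen and ignored and items that were never seen can be sketched as a labeling step. The function assumes the front end reports how many list positions actually entered the viewport (`viewed_count`); that parameter and the label names are assumptions for illustration.

```python
from typing import Dict, List, Set


def label_feedback(recommended: List[str], viewed_count: int, clicked: Set[str]) -> Dict[str, str]:
    """Label each recommended item; only viewed items may count as negative signals."""
    labels: Dict[str, str] = {}
    for position, item in enumerate(recommended):
        if item in clicked:
            labels[item] = "positive"
        elif position < viewed_count:
            labels[item] = "seen_not_clicked"
        else:
            labels[item] = "not_seen"  # no evidence about the user's interest
    return labels


labels = label_feedback(["a", "b", "c", "d"], viewed_count=2, clicked={"b"})
```

Here only `a` is a candidate negative signal; `c` and `d` were below the fold and should not be treated as rejected.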

To effectively address these challenges, it's critical to enhance the quality of feedback loops and the accuracy of data provided to the recommender system. The more precise and comprehensive the user feedback, the more tailored the recommendations can be. For instance, tracking engagement metrics such as the portions of a video watched, segments of a song listened to, or parts of an article read can offer deeper insights into user preferences. Additionally, recommender systems need to employ advanced techniques to identify and correct biases, improve data quality, and develop methods for gauging user satisfaction beyond their immediate interactions. Furthermore, fostering an environment of transparency and encouraging users to offer direct and explicit feedback on the recommendations they receive can significantly improve the feedback loop, thereby elevating the overall performance of the recommender system.
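As an example of a finer-grained engagement metric, the sketch below computes the fraction of a video that was actually watched from a list of played segments, counting overlapping or repeated segments only once. The segment format (start and end in seconds) is an assumption about what a player might report.

```python
from typing import List, Optional, Tuple


def watched_fraction(segments: List[Tuple[float, float]], duration: float) -> float:
    """Fraction of the content covered by the played (start, end) segments."""
    if duration <= 0:
        return 0.0
    covered = 0.0
    current_start: Optional[float] = None
    current_end: Optional[float] = None
    for start, end in sorted(segments):
        # Clamp each segment to the valid range of the video.
        start, end = max(0.0, start), min(duration, end)
        if end <= start:
            continue
        if current_end is None or start > current_end:
            # A gap: close the previous merged segment and start a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlap: extend the current merged segment.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / duration


# The user watched 0-30 s, rewound and watched 20-60 s, then skipped to 90-100 s.
fraction = watched_fraction([(0, 30), (20, 60), (90, 100)], duration=100)
```

The result is 0.7, since 70 of the 100 seconds were covered. A value like this can be sent to the recommender instead of a binary "played" flag.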


Data stands at the core of modern recommender systems, fueling the generation of personalized and precise recommendations. The effectiveness of these systems hinges on their ability to leverage diverse data sources, including item catalogs, user catalogs, and user-item interactions. By understanding the attributes of both items and users, along with their interaction history, recommender systems can navigate the complexities of personalization, privacy, and changing user preferences to provide relevant recommendations. However, challenges such as the difficulty of user identification and the dynamic nature of user attributes call for advanced strategies to keep the data useful for improving the user experience. Furthermore, accurate collection and interpretation of user feedback are essential for refining recommendation algorithms and enhancing user satisfaction.

Here are the main takeaways from the article:

  • Data Categorization: Recommender systems rely on item catalogs, user catalogs, and the history of user-item interactions to generate recommendations.
  • Item Catalog Importance: Attributes stored in the item catalog, like categories, text descriptions, and images, help in understanding item relationships and preferences for better recommendations.
  • User Catalog Challenges: Data privacy, user identification difficulties, and the need for up-to-date user attributes present significant challenges in maintaining useful and current user profiles.
  • User to Item Interactions: The most crucial data source for recommender systems, enabling the creation of personalized recommendations based on user behavior.
  • Feedback Collection Challenges: Issues such as caching recommendations, biased user interactions, and lack of explicit feedback pose challenges to the effectiveness of recommender systems.
  • Privacy and Personalization Balance: Modern platforms must navigate the delicate balance between providing personalized experiences and respecting user privacy.
  • Advanced Data Quality Strategies: Employing advanced techniques to address biases, improve data quality, and adapt to user behavior changes is essential for the continued relevance and effectiveness of recommender systems.