Modern Recommender Systems - Part 1: Introduction
How machine learning methods simplify item discovery and search.
Over the last ten years, we have been working on a universal, domain-agnostic recommender system. We have learned a lot along the way, as our recommender serves hundreds of customers, including big brands across many domains. In this series, we will not only share our insights but also give you a comprehensive overview of the technology behind today's recommenders, which power almost every major site you use when you search for something online. We will also discuss related problems and ethical aspects of this technology.
In this initial blog post, we will explore the emergence of recommender systems and clarify how they relate to advertising technology, with which they are often conflated. Our goal is to highlight the various objectives that recommender systems fulfill for different stakeholders. In the subsequent blog posts, we will cover the data and signals that are essential for modern recommenders, followed by an in-depth exploration of the technology behind recommenders.
Recommender systems have become pervasive and are arguably among the most influential machine learning technologies: everyone receives hundreds of recommendations every day, whether it is news to read, songs to listen to, movies to watch, items to purchase, or tweets to see.
The number of recommendations an average active online user receives has been growing steadily over the years. In the last decade, this growth has accelerated, and it is far from saturation.
Figure: YouTube revenues in billions of dollars over the last few years.
It is hard to estimate precise numbers, but the trend is clear, and the growth is driven mostly by the following factors:
- the number of internet users is increasing
- the average person spends more time online
- more and more websites and online services are adopting recommender systems.
YouTube revenues depend on growing active usage of the service and on the number of recommendations served. We observe a similar trend among our customers.
Recommender systems are flourishing and getting better, but first let’s look at how it all started.
Recommender systems developed alongside information retrieval (IR) systems in the early seventies, thanks to the availability of computers and the internet.
Traditional IR systems focused on helping users search large catalogs of items using text queries. Users often entered their queries on shared computers in public libraries, and the output (recommended books) was the same for everyone: no information about users was taken into account, so the output was non-personalized. Even these user-agnostic IR systems were gradually improved by sequential learning (Information retrieval: A sequential learning process, 1983) and learning-to-rank algorithms (Learning to rank using gradient descent, Burges, 2005). We describe these machine learning algorithms for IR systems in our blog post on personalized search.
Personal computers and widespread internet access enabled personalized recommendations based on the history of user actions. One of the first recommendation systems relying exclusively on users' historical interactions (explicit ratings of articles) was GroupLens (GroupLens: An open architecture for collaborative filtering of netnews, 1992).
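To make the idea of recommending from explicit ratings alone more concrete, here is a minimal sketch of user-based collaborative filtering. It is one classic textbook formulation (Pearson correlation between users, then a similarity-weighted prediction) and is not meant to reproduce the GroupLens implementation. The users, articles, and ratings are hypothetical.

```python
from math import sqrt

# Hypothetical explicit ratings: user -> {article: rating on a 1-5 scale}.
ratings = {
    "ann": {"a1": 5, "a2": 4, "a3": 1},
    "bob": {"a1": 4, "a2": 5, "a3": 2, "a4": 5},
    "eve": {"a1": 1, "a2": 2, "a3": 5, "a4": 1},
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson(u, v):
    """Pearson correlation of two users over the articles both have rated."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    if len(common) < 2:
        return 0.0
    mu = mean(ratings[u][i] for i in common)
    mv = mean(ratings[v][i] for i in common)
    du = [ratings[u][i] - mu for i in common]
    dv = [ratings[v][i] - mv for i in common]
    denom = sqrt(sum(x * x for x in du)) * sqrt(sum(y * y for y in dv))
    if denom == 0:
        return 0.0
    return sum(x * y for x, y in zip(du, dv)) / denom


def predict(user, item):
    """Predict a rating as the user's own mean plus the
    similarity-weighted deviations of other users who rated the item."""
    base = mean(ratings[user].values())
    num = den = 0.0
    for other in ratings:
        if other == user or item not in ratings[other]:
            continue
        w = pearson(user, other)
        num += w * (ratings[other][item] - mean(ratings[other].values()))
        den += abs(w)
    return base if den == 0 else base + num / den


predicted = predict("ann", "a4")
```

In this toy data, Ann agrees with Bob and disagrees with Eve, so Bob's high rating and Eve's low rating of article a4 both push Ann's predicted rating above her own average. No item text or user attributes are needed, only the rating history.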
Since then, recommender systems have developed and improved in many directions. Our ambition is to give you a brief overview of the most important techniques used in modern recommender systems. In general, today's personalized recommendation and search systems combine many techniques (which will be described in this blog post series) to optimize various objectives.
Do recommenders really spy on users and generate targeted ads?
Many people associate recommender systems with annoying targeted advertisements. However, such advertisements are often based on simple heuristics and do not use AI-based recommender systems at all.
A typical example is abandoned cart retargeting, a type of targeted advertising used by e-commerce websites to encourage users who have left items in their online shopping cart to complete their purchase. When a user adds items to their cart but does not complete the purchase, the e-commerce website may display ads for those specific items to the user as they browse the web or use social media. The list of abandoned products is often a simple reminder and advertisers do not use machine learning to compile it.
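The heuristic nature of such a reminder can be illustrated with a short sketch. The function name, the event format, and the one-hour threshold are assumptions made for illustration only; real retargeting pipelines differ, but the point is that no model is trained or queried.

```python
from datetime import datetime, timedelta


def abandoned_cart_items(cart_events, purchased_ids, now,
                         min_age=timedelta(hours=1)):
    """Return items that have an add-to-cart event at least `min_age` old
    and were never purchased, most recently added first."""
    latest = {}
    for item_id, added_at in cart_events:
        if item_id in purchased_ids:
            continue  # already bought, nothing to remind about
        if now - added_at < min_age:
            continue  # the user may still be shopping
        if item_id not in latest or added_at > latest[item_id]:
            latest[item_id] = added_at
    return sorted(latest, key=latest.get, reverse=True)


# Hypothetical cart history of one user.
now = datetime(2024, 1, 1, 12, 0)
events = [
    ("shoes", now - timedelta(hours=5)),
    ("socks", now - timedelta(hours=3)),
    ("hat", now - timedelta(minutes=10)),
    ("belt", now - timedelta(hours=8)),
]
reminder = abandoned_cart_items(events, purchased_ids={"belt"}, now=now)
```

The resulting list is just a filtered and sorted copy of the user's own cart history, which is why it feels like being followed around the web rather than like a recommendation.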
Another example of targeted advertising is what users see when visiting online media websites, such as news outlets. These ads are also typically not generated by a recommender system, but auctioned by an AdTech platform based on context and user profiles. There are multiple vendors of AdTech platforms, with the most successful being global giants capable of large-scale user profiling.
Moreover, most AdTech techniques rely on collecting and analyzing large amounts of data about a user's online activities and preferences in order to create a detailed user profile. This data may include information about a user's browsing history, search queries, online purchases, and online communication, among other things. People are concerned about the potential for this data to be collected and used for targeted ads without their knowledge or consent, or for it to be misused or mishandled in some way. This raises ethical concerns about the right to privacy and the protection of personal information.
Recommender systems are primarily used to help users find relevant content on a website. These systems, as opposed to AdTech techniques, typically consider only an anonymized user's past interactions with the content on the website and do not use external data from other websites or user attributes. As a result, there are likely fewer ethical concerns associated with recommender systems than with AdTech systems, which may use a wider range of data to personalize ads.
The goal of classical Information Retrieval methods is to make search faster and more accurate by assuming that users know what they are looking for and can formulate it in a query. On the other hand, recommender systems do not require a query and instead focus on inspiring users or helping them to discover new items that they may not even know about. Modern recommender systems incorporate both search and discovery components, allowing users to start typing a query and receive intelligent suggestions to navigate their recommendations in the desired direction. The ambition of modern recommendation systems is to improve the user experience and optimize user engagement.
While the user is typically the main beneficiary of the recommender system, the goals of the system and the objectives it is set to optimize are defined by the owners, designers and developers of the system rather than by users. Recommenders are often set up to optimize for certain metrics, such as click-through rate or conversion rate. User experience and engagement are not optimized directly, because even developers of recommender systems struggle to accurately measure sophisticated and often subjective metrics such as long-term user engagement.
Product owner perspective
Making a great product so that the user base grows and users actively enjoy the product seems like the right strategy. However, product owners have an additional aspect in mind: increasing revenue.
Subscription-based services optimize revenue by maximizing the number of loyal subscribers, which is in line with improving product quality and user experience.
Many online businesses, however, generate revenue by selling advertisements or generating leads for partner sites.
Selling advertisements enables many businesses to offer their product for free. To increase revenue from displaying ads, one typically maximizes the number of page views. This goal can be achieved by tweaking the recommendation system to optimize this objective. In such scenarios, recommenders typically give more visibility to content that has the potential to generate additional page views, such as online photo galleries. Product owners need to find a good balance between increasing short-term revenue from advertisements and growing the number of active users in the long term. An overly aggressive emphasis on maximizing page views typically leads to a significant decrease in user loyalty and lower ad revenue in the long term. No boosting at all is the other extreme, typically adopted only by fully subscription-based services.
As for business models, apart from fully subscription-based services and “free” ad-sponsored services, media organizations have started to offer a combination of the two: an ad-sponsored subscription-based service. Product owners are motivated to convert “freemium” users into ad-sponsored or full subscribers. Recommendations for freemium users can be adjusted so that they see more content that is behind the paywall or that convinced similar users to subscribe in the past.
When a significant portion of revenue comes from generating leads for partner sites (e.g. in the case of online aggregators), the goal is to recommend third-party content that is associated with the highest commissions. Again, tweaking the recommender too much in this direction leads to a significant decrease in user engagement in the longer term.
A similar situation holds for online retailers who want to maximize revenue by boosting the visibility of high-margin products.
Please note that all these “tricks” are used even by manually curated websites. Recommender systems just help product owners use them more systematically and efficiently. Instead of displaying high-margin content to everyone, they can increase the probability that relevant users see it.
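One simple way to picture such boosting is a re-ranking step that multiplies each item's predicted relevance by a factor that grows with its business value. This is a hedged sketch under our own assumptions: the scoring formula, the `boost` parameter, and the example items are hypothetical and do not describe any particular product.

```python
def rerank(candidates, boost=0.0):
    """Order (item, relevance, business_value) triples by a boosted score.

    boost = 0.0 ranks purely by relevance; larger values give more
    weight to items with higher business value (margin, page views).
    """
    scored = [
        (item, relevance * (1.0 + boost * value))
        for item, relevance, value in candidates
    ]
    return [item for item, _ in sorted(scored, key=lambda p: p[1], reverse=True)]


# Hypothetical candidates for one user: relevance and business value in [0, 1].
candidates = [
    ("long_read", 0.90, 0.1),
    ("photo_gallery", 0.60, 0.9),
    ("unrelated_promo", 0.05, 1.0),
]

no_boost = rerank(candidates, boost=0.0)
with_boost = rerank(candidates, boost=1.0)
```

Because the boost is multiplicative, the photo gallery overtakes the long read only for users who already find it reasonably relevant, while the unrelated promotion stays at the bottom despite its high business value. The size of `boost` is exactly the short-term versus long-term trade-off described above.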
Content producer perspective
Various websites and services offer content created by artists (such as songs, movies, podcasts, paintings, and poetry), writers (such as books, articles, and news), vendors (such as real-estate listings, products, and online marketplace listings), or other entities (such as job offers). The ultimate aim of these content producers is to attract an audience that will engage with and appreciate their content. They rely on recommender systems to distribute their content to the right users.
Many creators aspire to achieve bestseller status, and they rely on recommender engines to make that possible. However, they are also wary of receiving negative reviews and would prefer their content to be directed towards an audience that will respond positively. As such, they expect recommender systems to accurately identify the ideal audience and maximize the likelihood of their content being seen or purchased. Creators often measure their success based on the popularity of their content, and some even tailor their content to maximize certain metrics, such as number of page views generated, time users spent reading the content, and so on.
User perspective
Users expect recommender systems to help them reach their goals. However, their goals can change even within a single session and may also be affected by the recommendations themselves. Imagine yourself reading some serious news or educational materials. For some time you are fully engaged, you are actively looking for related content, and you expect the recommender and search to support you in this process. Slowly, as you get tired, you would like to read some less serious content and be entertained. This is quite a significant change in your goals and in the objectives to optimize. A good recommender system can deal with such situations, even though it is hard to detect from the available data that the user's goal has changed.
When you visit an e-commerce website, sometimes you are in the mood just to browse interesting products in the catalog and get inspired. This is again a very different objective from the situation when you want to buy any suitable pair of shoes as fast as possible. You can help users reach their goals by offering several recommendation scenarios (e.g. “get inspired” and “your favorites”).
A typical user looks for the best-value products, but this is not always the case. Some users prefer premium-quality products and their price sensitivity is low, while other users have limited budgets and prefer low-end products. Again, more scenarios (e.g. “best value products for you”, “your premium products”, “cheapest picks for you”) can be made available. Advanced recommenders are also able to recommend scenarios (see Recombee Item Segmentations), so you do not display an irrelevant “cheapest picks for you” to users who have never bought low-end products.
The task of the recommender system is to predict user intents in real time and support users in reaching their goals. This task is, however, very difficult in an environment where users are reluctant to provide explicit feedback and even implicit historical interactions are limited. Such data needs to be available for the recommender system to align with user goals and intents.
Problems and ethical aspects
When the goals of all stakeholders are aligned, the objective of the recommender system is clear and its deployment is straightforward. The problem arises when product owners emphasize objectives that work against the goals of other stakeholders (users or content producers).
One particular example is a job board site that publishes job positions and is rewarded for the number of applicants for each position. When the recommender system and personalized search are set to maximize solely this objective, users will be recommended positions they are most likely to apply for, regardless of whether they have any chance of being accepted. Companies will receive a high number of applications to evaluate, which looks good at first. The relevance of candidates will, however, be quite low, because many of them applied to several positions where they have little chance of being accepted.
If the recommender system and personalized search are set to optimize for successful applications directly, there will be much less frustration among all participants. Users will be recommended positions where they have a high chance of being accepted. Companies will get fewer but much more relevant applicants for their positions, so they save time evaluating them. The job board company will have a better product that works well for both users (applicants) and content creators (companies). However, such a change often involves adjusting not just the recommender objectives but also the business model of the job board company, and collecting data about which applicants were selected for historically offered positions.
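The difference between the two objectives can be shown with a small numeric sketch. The probabilities below are hypothetical model outputs invented for illustration: `p_apply` is the chance that a given user applies when shown the position, and `p_accept` is the chance that the application succeeds.

```python
# Hypothetical predictions for one user: (position, p_apply, p_accept).
positions = [
    ("senior_architect", 0.30, 0.02),
    ("junior_developer", 0.20, 0.40),
    ("data_analyst", 0.10, 0.50),
]


def rank_by_applications(positions):
    """Objective 1: maximize the expected number of applications."""
    ranked = sorted(positions, key=lambda p: p[1], reverse=True)
    return [name for name, _, _ in ranked]


def rank_by_successful_applications(positions):
    """Objective 2: maximize the expected number of successful
    applications, i.e. P(apply) * P(accepted | applied)."""
    ranked = sorted(positions, key=lambda p: p[1] * p[2], reverse=True)
    return [name for name, _, _ in ranked]


by_applications = rank_by_applications(positions)
by_success = rank_by_successful_applications(positions)
```

Under the first objective, the attractive but unrealistic senior position comes first. Under the second, it drops to the last place, because its expected number of successful applications (0.30 × 0.02) is far below that of the other two positions. Estimating `p_accept` is what requires the historical data about selected applicants mentioned above.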
An even more widespread example is the already mentioned optimization for the number of page views in media. Such a strategy might help product owners generate more revenue from advertisements that are displayed to users with every single page view. Imagine that a car seller orders a certain number of ads that should be displayed with articles in the auto category. The product owner will then boost the probability that the recommender system suggests these articles to as many users as possible. These users will ask themselves why they are getting so many articles about cars. Some content producers will be in an even more difficult situation. Their insightful articles will not get enough visibility and will not even reach relevant audiences. This is because users generate fewer page views while reading long, insightful articles, so the recommender engine will favor shorter articles that make more ad revenue for the product owner.
Why should the product owner sacrifice revenue from additional page views? In our experience, it is always better to take longer-term criteria into consideration as well. Doing so has a positive effect on user loyalty and increases revenue in the longer term. Also, many media companies have introduced subscription-based models that enable them to optimize recommendations directly for user engagement. Non-subscribed users are more likely to subscribe when they are recommended relevant content that is unique or insightful. Recommenders here are often set to balance the uplift in page views with the number of new subscriptions.
There are still open questions, such as: Is it ethical for product owners to tweak recommenders and optimize solely for their own objectives? Should users be informed about these tweaks? And how? In my opinion, products that significantly deviate from the interests of their users or content creators are doomed to lose their market share anyway.
Here are 5 key takeaways from the blog post:
- Recommender systems have evolved from traditional information retrieval systems in the early seventies to personalized recommendation and search systems that combine many techniques to optimize various objectives, such as improving engagement, user satisfaction, and revenue.
- Recommender systems have become an important machine learning technology and are used to make personalized recommendations to users, but they are often confused with AdTech platforms that generate targeted ads.
- Recommender systems primarily help users find relevant content on a website and typically only consider a user's past interactions with the content on the website, while AdTech platforms may use a wider range of data to personalize ads, which raises ethical concerns about privacy and the protection of personal information.
- The number of recommendations an average active online user receives has been growing steadily over the years, driven by the amount of time people spend online, and the adoption of recommender systems by more websites and online services.
- Recommender systems can be configured to optimize different criteria, and it is important that they are aligned with the objectives of users in order to win their long-term engagement.
In the next part, we will discuss in detail the data and signals powering recommender systems. Even more importantly, we will discuss how to evaluate recommender systems and measure whether they deviate from the goals they were set to reach.