Bandit Models: Exploiting Popularity and Curiosity to Recommend Trending Content
Humans are inherently curious. In fact, curiosity is linked to the evolution of humankind. For instance, according to famous historian Yuval Noah Harari in his bestseller book "Sapiens", our language skills evolved as a way of gossiping: "It is not enough for individual men and women to know the whereabouts of lions and bison. It's much more important for them to know who in their band hates whom, who is sleeping with whom, who is honest, and who is a cheat". In this regard, one might wonder how this trait we all possess can help us develop more accurate recommendations? And how do curiosity and popularity interact in the context of recommender systems? In this blog post, we will answer these and other questions and explain how we at Recombee frequently interpret the environment in which the recommender systems are exposed to effectively respond to changes in popularity.
Let us start with an example of one of the most fundamental recommender system problems: cold-start recommendation (CSR), where we aim to profile and recommend items to new users. This problem is especially challenging due to the lack of information about the preferences and behavior of the users. For example, when a new user visits a video streamer website, an RS needs to choose a video solely based on user-agnostic information (e.g., the average times previous users watched a video). After the recommendation, the RS observes whether the new user watches the recommended video. The website aims to catch the users' attention and maximize the total number of interacted videos. Therefore, providing effective CSR here requires identifying the `trending' items most popular among the website's audience.
Multi-armed bandit algorithms for recommender systems
A popular option to solve this problem is to model CSR as a multi-armed bandit (MAB) problem. In MABs, at each trial, a gambler selects an arm to pull and observes the reward. Throughout the trial history, the gambler improves his policy to maximize the reward at the end.
Recommender systems can be modeled as MABs. In our video streamer example, at each recommendation requisition, the RS selects an item (arm) to show to the user and observes if he watches or not the item (observes the reward). Throughout the recommendation history, an algorithm improves the strategy to maximize the number of watched videos.
Although model RSs as MABs can be efficient and accurate, their standard setting assumes that there exists a fixed most popular item overall time. However, this is inadequate in practice: assuming that the items' popularity is not changing over time is highly unrealistic while considering practical applications.
We, therefore, can model CSR as a non-stationary MAB problem, where the most popular item changes over time. Interestingly, we continuously face the classic exploration/exploitation dilemma known from reinforcement learning: the RS must maintain a balance between recommending a classic popular item and recommending the object of the current viral fad. To illustrate this dilemma, let us consider the following real-world example. Suppose that an RS must select among videos of two artists: the South Korean singer Psy and the British singer David Bowie.
The grey curves in the above graphs show the cumulative percentage of youtube activity (only USA audience) associated with both artists from 2008 to 2020 (for details regarding data collection and preprocessing, see the referenced article). Most of the time, the rate of growth of the system activity is approximately constant. However, this linearity is sometimes broken by sudden bursts of events highlighted by the vertical red and blue lines. The first spike (vertical red line) matches with the Gangnam Style release. The hit had an unprecedented explosion of popularity and its music video “broke” the YouTube view counter's limit. The second burst of events (vertical blue line) coincides with David Bowie's unexpected death. This unfortunate exogenous event triggered the audience's curiosity and substantially increased his video views and reactions in posts and comments. In this frenzy, old fans recalled his talent by listening to his songs again, while the new generation discovered a treasure and quickly transformed into a musical and show business icon.
We then explain the variations in the users' activity by the existence of two types of audiences (disentangled in the above image, third graph): the loyal audience and the curious audience. The loyal audience (green curve, third graph) is constituted by fans who assiduously follow the topic. In contrast, the curious audience (yellow curve, third graph) only turned their attention to the topic due to an extraordinary event. Thus, the environmental context in which the RS must make decisions alternates between calm periods (where the users' behavior is driven by the loyal audience), and disruptive or bursty periods (where the curious audience is driving sudden bursts of interest in certain topics). In our real-life example, during stable periods, the ratio between the singers' popularity is stable, and Bowie is consistently more popular than Psy (second graph). On the other hand, during the disruptive period dominated by curious behavior, the relative popularity between Psy and Bowie changed drastically. Note that, during the recommendation event, the RS only observes the grey curve: it does not know to whose audience it belongs, nor which artist's content the user wishes to interact with. Hence, such information must be inferred.
In our example, we can clearly see how user curiosity can affect the item's popularity. The next question is how can we recognize the alternation between audiences in order to anticipate and identify new trends? For that, we will present the BMAB (Burst-induced MAB) algorithm. The core principle is to use two instances of the classic Thompson sampling (TS) algorithm: one for loyal behaviour and the other for curious behaviour.
To support our explanation we will illustrate our algorithm execution for the time series displayed in our Psy/Bowie example. In the beginning, we (1) initialize all entries of TS priors as uniform distributions. This indicates that the algorithm does not initially know which item (Psy or Bowie) is the most popular and must learn through user interactions. In order to maximize the reward, the BMAB algorithm aims to learn (by updating its priors) the reward distributions of both states with enough confidence to select the best arm at the event time.
Therefore, at each recommendation event, the algorithm needs (2) to detect the state of the system. In this blog post, we assume that an oracle is available to provide an estimate of the state at each recommendation requisition. In practice, the role of the oracle can be assumed by our realistic state detector. In the next step, we (3) sample an item according to the prior distribution corresponding to the current active state. Finally, we (4) observe the reward of the selected arm and update the priors of the active state.
In the video below, we compare the execution of our algorithm with the standard MAB model.
Observe that, after the release of the single "Gangnam Style", our model activates the curious mode and promptly identifies Psy as the most popular current item. In contrast, the standard MAB model takes a considerable amount of time to detect such a shift.
Moreover, in this instance, our model assumes that the transition from one state to another (loyal to curious or curious to loyal) comes with its own reward distribution. Accordingly, we aim to treat each state segment as a separate MAB problem, resetting the Thompson priors at the beginning of each segment whilst keeping a global count for the periods where the loyal audience dominates. However, due to the uncertainty inherent in the state prediction method, we engineer a soft transition procedure: whenever a state appears to be ending, the priors corresponding to the state are gradually forgotten rather than discarded immediately. This behaviour can be observed in the graphs at the bottom of the video.
As you can see, monitoring the audience to quickly recognize changes in popularity may be crucial to identifying trending items with the potential to significantly enhance recommendation KPIs. Curiosity-aware MABs are one of the several Recombee toolkits that deliver improved content recommendations. If you enjoyed our work, please cite us, provide us with comments, or contact our research team for further technical details.
This blog article is adapted from the contribution provided below. For details of data, please check the paper. Please include appropriate citations for any use of this content, whether in whole or in part.
Rodrigo Alves, Antoine Ledent, and Marius Kloft. 2021. Burst-induced Multi-Armed Bandit for Learning Recommendation. In Fifteenth ACM Conference on Recommender Systems (RecSys '21). Association for Computing Machinery, New York, NY, USA, 292–301. https://doi.org/10.1145/3460231.3474250
Item Segmentations are Recombee's original and elegant solution to various advanced tasks related to hierarchical and relational data. The feature provides a flexible way to group items (products or pieces of content) into segments...
At Recombee, we felt the transition within the media industry accelerated by the pandemic. OTT and CTV consumption ballooned at a significant rate.
Do you feel there is a potential to increase your success with customers through an efficient recommender engine? You're highly likely right. Adding a recommender service to your emailing campaigns gives each client tailored product recommendations in all of their emails.