How interdisciplinary collaboration can accelerate AI innovation
In a world where innovation is the new standard, Recombee uses the power of interdisciplinary collaboration to stay at the cutting edge. Partnering with a leading player in the food industry (Bofrost) and with academia (FIT CTU) allowed Recombee to hold a student competition to create AI that can shape the future of the food industry.
At Recombee, we welcome a bit of healthy competition. After all, the only way to achieve truly scalable innovation in AI is to iterate, build, hack, and come up with new ideas so that the market and its associated technologies keep moving forward. So we decided to create a competition. We gave students an opportunity to work with real-world data, threw real-world problems at them, and gave them real-world deadlines (they only had a few hours to come up with a solution... sounds like a work deadline, anyone?). We feel this is the best way for talented young people to prepare for the world of applied AI and transfer theory into practice. What was the theme of the competition, we hear you eagerly ask?
If there was ever a time when quick deployment of technology to solve problems was crucial, it was the time of COVID-19. Hackathons, competitions, ideathons and everything in between started taking place at a rocket pace. Suddenly, older structures and status quo technologies were no longer sufficient, and the hunger for new, better innovations was greater than ever. One of the things that became the alpha and omega of modern living was food delivery, especially at a time when restaurants were forced to physically shut and social distancing was enforced across society. During the COVID-19 crisis and quarantine, the popularity of services that bring food to your door surged (yep, you’re not the only person who enjoys chain-ordering pizza or orders the same chocolate cake every week…). For most of these companies, the so-called online groceries, sales volumes have multiplied, and it’s likely this trend will continue now that the crisis is almost over. Customers have gotten used to the ease with which they can consume these new services (both metaphorically and literally…).
Now, let’s not forget that AI feeds on data. Given the growing popularity of these online ordering tools, especially for weekly shopping, the amount of data flowing in about customer behaviour has been substantial. Using a range of AI algorithms and methods, the startups of today can predict, based on past behavioural data, what a customer may need and send a shopping box to their house without the customer having to order its contents. Think of it as a predictive food delivery butler. If people then don’t collect this box (i.e. the algorithm gets it wrong and sends you a collection of dried seaweed instead of pepperoni), the company will take the box back and, of course, this feedback will be incorporated into the algorithm, which will get more accurate over time as the data flows in.
Of course, this method can only be used with durable foods - you don’t want to risk something like this with raw chicken being left on your doorstep. Even so, it places great demands on the accuracy of the AI algorithm, since you don’t want food to be wasted or customers to be dissatisfied, not to mention the cost of recovering wrongly predicted products.
Enter stage right: shopping cart prediction and Recombee’s methodology. The deployment of AI methods has huge commercial potential in this area. Buyers and sellers in the food industry, as well as end customers, would welcome a more predictive way to plan what they buy and sell, streamline processes, and save time and logistics resources, not to mention no longer having to waste time online adding the same weekly items to their shopping basket. The problem the market currently faces, however, is that prediction methods are not yet reliable enough to allow, for example, the aforementioned predictive shopping. This is mainly due to the absence of quality data on which research into suitable algorithms could be performed.
Recombee has a joint research laboratory with the Faculty of Information Technology (FIT CTU) and aims to crack these types of problems through an interdisciplinary, academia-meets-business approach. Recombee’s customer Bofrost allowed us to use its unique dataset, which is unparalleled in size and quality (several hundred million purchases) for shopping cart prediction. We wanted to give students an opportunity to work on such an interesting problem, and this competition was a unique way of doing that.
So what was the competition itself?
The entrants were given a clear task to complete, and for context, let’s recap it here so we can share the real-world use case behind our thinking. Bofrost is a company selling frozen food to customers in 13 countries. Its network of drivers delivers goods to customers’ homes in delivery vans with built-in refrigerators. Customers are mostly loyal: they like Bofrost and buy its goods regularly (several entries per shopper - the ‘AI dream’). Also, when a Bofrost van visits their street, they usually buy several different products at once.
Given that delivery vans have limited capacity, and for many other optimization reasons, it would be very beneficial for Bofrost to predict what goods individual customers on a delivery route will buy, so that the right amount of goods can be loaded. The students’ goal was to produce an algorithm for forecasting a purchase (basket) based on historical data. More precisely, the goal was to build a model that accepts a set of historical transactions (baskets) made by a particular user, along with the current time (‘today’), and predicts the list of items that the user will purchase today (the basket). The algorithm should be as accurate as possible in its cart estimate. The optimal model predicts all the items that are actually purchased and nothing else, so that the van can be loaded with exactly the items needed. To help students create a model with the greatest predictive power (it probably goes without saying that the models were evaluated on unpublished test data), Recombee and FIT CTU staff provided them with several large and well-labelled data sets.
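To make the task concrete, here is a minimal sketch of what a very simple entry could look like. It is not any participant’s solution and not the competition’s official scoring: the function names, the `min_share` threshold and the choice of F1 as the metric are all our own assumptions for illustration. The predictor simply picks items that appeared in a large enough share of a user’s past baskets, and the score rewards predicting what was bought while penalising anything extra.

```python
from collections import Counter
from datetime import date


def predict_basket(history, today, min_share=0.5):
    """Predict items found in at least `min_share` of the user's past baskets.

    `history` is a list of (purchase_date, set_of_item_ids) pairs.
    Only baskets dated before `today` are used (hypothetical baseline).
    """
    past = [items for day, items in history if day < today]
    if not past:
        return set()
    counts = Counter(item for items in past for item in items)
    return {item for item, n in counts.items() if n / len(past) >= min_share}


def f1_score(predicted, actual):
    """Harmonic mean of precision and recall between two item sets.

    Assumed metric: the source does not state how entries were scored.
    """
    if not predicted and not actual:
        return 1.0  # nothing to predict and nothing predicted
    hits = len(predicted & actual)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)  # share of predicted items that were bought
    recall = hits / len(actual)        # share of bought items that were predicted
    return 2 * precision * recall / (precision + recall)


# Made-up example data for one user.
history = [
    (date(2020, 3, 2), {"pizza", "peas"}),
    (date(2020, 3, 16), {"pizza", "ice_cream"}),
    (date(2020, 3, 30), {"pizza", "peas"}),
]
predicted = predict_basket(history, today=date(2020, 4, 13))
print(sorted(predicted))                            # ['peas', 'pizza']
print(f1_score(predicted, {"pizza", "ice_cream"}))  # 0.5
```

A score of 1.0 corresponds to the optimal model described above: every purchased item is predicted and nothing else is.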
Data, data, data
For the competition, we prepared several tables and datasets for the entrants to use to crack this enigma. This was well received by the attendees, who immediately understood that data quantity isn’t everything: correct labelling and categorisation are what improve the accuracy of the algorithm.
The transaction data categories used for the competition were:
- Anonymous User ID
- Purchase date
- Purchased goods (ID)
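As a rough illustration of how rows with these three fields can be turned into per-user basket histories, consider the sketch below. The field names and the rule that everything a user buys on one date forms one basket are assumptions made for this example, not the actual competition schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Transaction:
    """One purchased item; field names are illustrative only."""
    user_id: str
    purchase_date: date
    item_id: str


def group_into_baskets(transactions):
    """Return {user_id: [(purchase_date, {item_id, ...}), ...]} ordered by date.

    Assumption: all items a user buys on the same date belong to one basket.
    """
    by_user_day = defaultdict(set)
    for t in transactions:
        by_user_day[(t.user_id, t.purchase_date)].add(t.item_id)

    baskets = defaultdict(list)
    # Sort by (user_id, date) so each user's baskets come out chronologically.
    for (user_id, day), items in sorted(by_user_day.items(), key=lambda kv: kv[0]):
        baskets[user_id].append((day, items))
    return dict(baskets)


# Made-up rows for two anonymous users.
rows = [
    Transaction("u1", date(2020, 3, 16), "pizza"),
    Transaction("u1", date(2020, 3, 2), "pizza"),
    Transaction("u1", date(2020, 3, 2), "peas"),
    Transaction("u2", date(2020, 3, 2), "ice_cream"),
]
baskets_by_user = group_into_baskets(rows)
print(len(baskets_by_user["u1"]))  # 2
```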
Crucially, Bofrost, being such a large distribution network, had not only large quantities of data about its deliveries but also a wide diversity of entries, ranging from returned items, undelivered items and successful deliveries to times of delivery, et cetera. This gave us a rich database on which more sophisticated algorithms could be built, leading to very interesting entries from all participants.
This emphasis on data and on making it publicly available underpins a core aspect of AI as a whole: AI isn’t magic, and without correct data and correct labelling it will not fashion solutions out of thin air. We are very happy to report that the students took a mature, interdisciplinary and methodical approach to this task, and we are excited about the new cohort of AI professionals incubating at our universities today.
Results and next steps?
As much as we love rewarding all enthusiasm and talent, there can only be one winner. Well, two in our case, because out of the 31 students who actively participated by submitting their solutions, two phenomenal candidates excelled in both the first and the second round of the competition. The first was Matyáš Skalický (for the technical details of his solution, which won the first round, check out his report). The second was Filip Dolník, who decided not to go down the deep learning route but instead went for more traditional data modeling methods and smart heuristics. This shows that ‘AI’, while a fantastic development in human history, has many faces, and deep learning is just one of many approaches that can achieve great results. Often, a hybrid approach, for example combining a deep learning basis with smart heuristics ‘on top’ of it, can add the extra accuracy edge needed for a top-shelf product. We’re immensely proud that this competition has brought this to the surface and highlighted more great talent in our community. Keep your eyes peeled for next steps and more cool competitions coming your way from the Recombee gang…
Also, if you’d like to learn more about the machine learning techniques involved in predicting the next shopping basket from past purchases, check out the second part of our intro blogpost. We are also about to publish more about deep learning recommender systems, as we’ve heard from many of you that you’d like to know more about this. You can look at the master’s thesis of Radek Bartyzal as a bit of a teaser of the type of cool content we’re prepping for you.
Last but not least, we’re planning a new competition of this type to keep the momentum going and to motivate more students to show us what they’ve got and how they apply what they’ve learned in their courses to real life. Also, the RecSys research community is hungry for new challenges that will have tremendous business impact.