A 2025 Research Retrospective

In 2025, I became Head of Research at Recombee. I knew it would be a challenge, and I wasn’t wrong. Recombee is a leader in recommendation-as-a-service, with thousands of clients across a wide range of domains. Keeping research moving forward while staying close to real product constraints (latency, scale, reliability, feedback loops…) is exciting, but it’s definitely not easy.
Toward the end of the year, I did something I rarely manage in the middle of ongoing projects: I stepped back and looked at the recommender-systems field as a whole. More importantly, I looked at where our industrial research team has actually moved the needle.
Recommendation is evolving fast right now. In industry, that speed does not show up as abstract trends; it shows up as a constant stream of concrete decisions: tuning models, shipping features, meeting strict latency budgets, and ensuring we are not accidentally creating harmful feedback loops. Just as importantly, any new modeling improvement must be weighed against how difficult it is to productionalize, monitor, and maintain as part of a larger product. This post is my attempt to connect those day-to-day decisions to the underlying ideas. I want to share what we worked on this year and offer an industrial researcher’s view of what “modern recommendation” looks like in practice, where scale, uncertainty, sequential dynamics, and engineering constraints can matter just as much as the choice of model family.
A Strong Year for Recombee Research
It is challenging, but also very rewarding. I am very happy (and proud) to share that our researchers published 15 papers in total: across five conferences (the vast majority ranked A or A*) and six journals (mostly high-impact Q1). There is no way I can cover everything in one post, so I will focus on a handful of contributions that best show two things: how we have been contributing to the broader community, and how that work translates into real value for our clients.
The Future is Sparse
One recurring bottleneck in real systems is retrieval and serving cost. For instance, dense embeddings are a convenient default, but at scale they become one of the most expensive parts of the stack: memory, bandwidth, and approximate nearest-neighbor search all start to dominate. Several of our 2025 results orbit the idea that sparsity is not merely a compression trick applied at the end, but a representational choice that can change the trade-offs you can achieve.
If your pipeline depends on retrieving candidates quickly and repeatedly, then being able to store and search representations efficiently is not a “nice to have”: it’s the condition under which more sophisticated ranking is even feasible. For our clients, this means faster retrieval, which unlocks larger candidate pools and leaves more budget for stronger ranking models. In short, efficiency at retrieval directly translates into better recommendations at production scale.
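To make the cost argument concrete, here is a minimal sketch, on synthetic data with illustrative dimensions, of why sparse item representations change the memory and scoring profile of retrieval. It is not our production code or the exact method from the embedding-compression paper [1]; the sparsification step below is a naive top-k truncation used purely for illustration.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
n_items, dim = 100_000, 256

# Dense item embeddings: 100k x 256 float32 ~ 100 MB that must sit in RAM (or an ANN index).
dense_items = rng.standard_normal((n_items, dim), dtype=np.float32)

# Naive sparsification for illustration: keep only the 16 largest-magnitude
# coordinates per item (real approaches learn sparse representations directly).
k = 16
top_idx = np.argpartition(np.abs(dense_items), -k, axis=1)[:, -k:]
rows = np.repeat(np.arange(n_items), k)
vals = np.take_along_axis(dense_items, top_idx, axis=1).ravel()
sparse_items = sparse.csr_matrix((vals, (rows, top_idx.ravel())), shape=(n_items, dim))

def top_candidates(items, user_vec, n=100):
    """Score every item against one user vector and return the n best item ids."""
    scores = np.asarray(items @ user_vec).ravel()   # works for both dense and CSR matrices
    return np.argpartition(scores, -n)[-n:]

user = rng.standard_normal(dim, dtype=np.float32)
print(top_candidates(dense_items, user)[:5])
print(top_candidates(sparse_items, user)[:5])

dense_mb = dense_items.nbytes / 1e6
sparse_mb = (sparse_items.data.nbytes + sparse_items.indices.nbytes + sparse_items.indptr.nbytes) / 1e6
print(f"dense: {dense_mb:.0f} MB, sparse: {sparse_mb:.0f} MB")
```

Even this crude version shows the lever: the candidate-scoring call is identical, but the memory footprint (and, with proper index structures, the scoring cost) drops by roughly an order of magnitude, which is exactly the budget that stronger ranking models can then spend.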
Scalability Is Always in the Room
Another theme we kept coming back to in 2025 was being disciplined about baselines. In industry, you quickly relearn a humbling lesson: “simple” models can stay surprisingly competitive when they are implemented well and evaluated properly at scale. That is not an argument against deep learning. We use plenty of deep models in our ML stack. It is an argument for being honest about what added complexity actually buys you, when it is worth it, and where it belongs in the recommendation stack.
That is why we spent time evaluating linear models and shallow autoencoder-style recommenders on large datasets. We wanted to see what truly breaks as the interaction matrix grows, separate algorithmic approximations from real objective changes, and keep a set of scalable baselines that new methods have to beat under realistic conditions. For companies, this is practical, not philosophical. The best model is not always the one with the fanciest architecture. It is the one that stays stable under continuous updates, can be monitored and explained, and delivers the best trade-off between quality, latency, and cost.
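To ground what a “shallow autoencoder-style recommender” means, here is a small, self-contained sketch of the model family: an EASE-style item-item model with a closed-form ridge solution and a zero diagonal, run on a toy interaction matrix. It is an illustration of the baseline class, not the specific variant or the scaling techniques evaluated in [3].

```python
import numpy as np
from scipy import sparse

def ease_like(X: sparse.csr_matrix, l2: float = 10.0) -> np.ndarray:
    """Closed-form item-item weights for a shallow linear autoencoder (EASE-style):
    minimize ||X - XB||^2 + l2 * ||B||^2 subject to diag(B) = 0."""
    G = (X.T @ X).toarray().astype(np.float64)   # item-item Gram matrix
    G[np.diag_indices_from(G)] += l2             # ridge regularization
    P = np.linalg.inv(G)
    B = -P / np.diag(P)                          # zero-diagonal constraint via Lagrangian
    B[np.diag_indices_from(B)] = 0.0
    return B

# Toy implicit-feedback matrix: 4 users x 5 items.
X = sparse.csr_matrix(np.array([
    [1, 1, 0, 0, 1],
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
], dtype=np.float32))

B = ease_like(X)
scores = X.toarray() @ B    # predicted scores for every user-item pair
print(np.round(scores, 2))
```

The appeal is exactly what the paragraph above describes: a convex objective, a deterministic closed-form solution, and item-item weights you can inspect directly. The interesting questions start when the interaction matrix grows far beyond toy size, which is where the evaluation in [3] comes in.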
Beyond “Next Click”
With thousands of clients and billions of interactions, top-k ranking or next-click prediction is definitely not our only scenario. We also spent time on sequential and dynamic settings where “predict the next click” is not the right way to think about the problem. Baskets, sessions, and recurring behaviors have real structure, and static models often miss it.
At the same time, jumping straight to heavyweight sequence encoders can turn the system into something that is hard to control and even harder to debug. That is why we explored sequence approaches that stay interpretable, with temporal windows and dependency operators made explicit. Even when a model is not simple computationally, it can still be simple scientifically: you can ask what it learned, which dependencies matter most, and how those dependencies change across cohorts or seasons. In applied work, that kind of interpretability is not a nice extra. It is what lets you connect model behavior to product hypotheses and operational constraints.
This means better recommendations in session and repeat interaction scenarios, without turning the system into a black box. The added interpretability makes it easier to debug issues, run safer experiments, and translate model behavior into clear product and business decisions.
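As a sketch of what “temporal windows and explicit dependency operators” can look like in the simplest possible form, the snippet below fits one linear weight matrix per lag so that the next basket is predicted from the last few baskets. It is a toy illustration in the spirit of the next-basket work [5], not its actual formulation; the window size and all names are assumptions made for the example.

```python
import numpy as np

def fit_windowed_ar(baskets: np.ndarray, window: int, l2: float = 1.0) -> np.ndarray:
    """Fit per-lag weight matrices W_1..W_window by ridge regression so that
    basket_t ~ sum_k basket_{t-k} @ W_k. `baskets` has shape (T, n_items)."""
    T, n_items = baskets.shape
    # Lagged design matrix: each row stacks the previous `window` baskets (lag 1 first).
    X = np.hstack([baskets[window - k: T - k] for k in range(1, window + 1)])
    Y = baskets[window:]
    A = X.T @ X + l2 * np.eye(X.shape[1])
    W = np.linalg.solve(A, X.T @ Y)              # shape (window * n_items, n_items)
    return W.reshape(window, n_items, n_items)   # one inspectable weight matrix per lag

def predict_next(history: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Score the next basket from the most recent `window` baskets."""
    window = W.shape[0]
    recent = history[-window:][::-1]             # index 0 = most recent basket = lag 1
    return sum(recent[k] @ W[k] for k in range(window))

# Toy sequence: 8 baskets over 4 items (1 = item appeared in that basket).
rng = np.random.default_rng(1)
baskets = (rng.random((8, 4)) > 0.6).astype(np.float32)
W = fit_windowed_ar(baskets, window=2, l2=0.5)
print(np.round(predict_next(baskets, W), 2))     # scores for the next basket
```

The interpretability claim is visible directly in the fitted object: each W[k] says how strongly items seen k steps back pull specific items into the next basket, and those matrices can be compared across cohorts or seasons.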
Trust, Safety, and Responsible Recommendations
We also kept a parallel track on trust-related properties that companies increasingly need in their recommender stack: explanation, safety, fairness, and diversity. The point here is not to attach slogans to systems, but to treat these as technical objects, with datasets, constraints, and measurable properties that can be tested.
Explanations matter because they improve debugging and human oversight. Diversity and serendipity matter because they reduce the risk of systems collapsing into popularity loops, especially in cold start settings. Fairness matters because group and marketplace scenarios can amplify distributional imbalances. Safety matters because stronger semantic search and richer representations can surface harmful or sensitive content unless you explicitly align for it and evaluate it.
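As a small example of treating these properties as measurable objects rather than slogans, the snippet below computes two crude but testable quantities for a single recommendation slate: intra-list diversity and per-group exposure share. Both metrics and all names here are illustrative choices, not the specific measures used in the papers listed below.

```python
import numpy as np

def intra_list_diversity(item_embeddings: np.ndarray) -> float:
    """Average pairwise cosine distance within one recommendation list;
    higher means the slate is less redundant."""
    E = item_embeddings / np.linalg.norm(item_embeddings, axis=1, keepdims=True)
    sims = E @ E.T
    off_diag = sims[~np.eye(len(E), dtype=bool)]
    return float(1.0 - off_diag.mean())

def group_exposure(recommended_groups: list[str]) -> dict[str, float]:
    """Share of recommendation slots going to each provider/item group;
    a simple, testable proxy for exposure fairness."""
    groups, counts = np.unique(recommended_groups, return_counts=True)
    return dict(zip(groups.tolist(), (counts / counts.sum()).tolist()))

rng = np.random.default_rng(2)
slate = rng.standard_normal((10, 32))   # embeddings of a 10-item slate
print(round(intra_list_diversity(slate), 3))
print(group_exposure(["indie", "major", "major", "indie", "major"]))
```

Once a property is a number computed on every slate, it can be monitored, thresholded, and regression-tested like any other part of the stack.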
Generative AI in Recommendation Systems
Generative AI also became a real part of our 2025 agenda, not as a replacement for recommender systems, but as an additional set of tools that changes how we represent items, interpret intent, and communicate decisions. In our collaboration with The Telegraph, we explored how LLMs can support editorial work by making segment-level patterns easier to inspect and act on, while keeping the recommender backbone responsible for ranking and evaluation.
In parallel, we started treating LLMs as first-class components in responsible recommendation: if language models are used to create or refine representations, they also shape fairness outcomes, so they need explicit constraints and auditing rather than implicit trust. This is shaping how we integrate LLMs into recommenders in practice, where they sit in the pipeline, what they are allowed to influence, and how their impact can be measured under real product constraints.
Finally, 2025 had a community dimension that matters to us as researchers. RecSys 2025 was held in Prague, Recombee’s home city, and several of us were involved not only as authors but also in organizing roles (including general chair, industry chair, and local chairs).
For a field that sits between academia and product, conferences are not just places where papers are presented: they are part of the infrastructure that shapes standards for evaluation, reproducibility, and the quality of dialogue between research and practice. Contributing to that infrastructure is, in a very literal sense, part of advancing the science.
…A Stronger 2026!

Looking into 2026, we’re already planning to be at The ACM Web Conference 2026 in Dubai (April 13-17, 2026), which feels like a natural venue for the kind of “web-scale” questions that increasingly shape recommender systems in practice. The product-facing challenges aren’t getting smaller: interfaces are becoming conversational, feedback is messier and more implicit, and the boundary between “retrieval,” “recommendation,” and “assistance” is blurring.
From a research perspective, a lot of our attention is therefore shifting toward GenAI in conversational settings (where the system must both understand intent and respond under uncertainty) and toward agentic recommendation: systems that don’t only rank items, but plan and adapt sequences of actions while staying controllable, evaluable, and safe.
If you’d like to collaborate with us (whether you’re tackling recommendation problems in a product setting, working on the underlying theory, or exploring new interfaces like conversational and agentic systems), feel free to reach out. We’re always happy to discuss concrete problems, exchange ideas, and learn from other perspectives. And if our work is useful for your own research or engineering efforts, we’d appreciate it if you read it and cite it where relevant.
Peer-Reviewed Publications
[1] Kasalický, P., Spišák, M., Vančura, V., Bohuněk, D., Alves, R., & Kordík, P. (2025). The Future is Sparse: Embedding Compression for Scalable Retrieval in Recommender Systems. In Proceedings of the 19th ACM Conference on Recommender Systems (RecSys 2025) (pp. 1099–1103).
[2] Spišák, M., Alves, R., Kelleher, T., Sheppard, J., Fiedler, O., Kosovrasti, E., Vančura, V., Kasalický, P., & Kordík, P. (2025). Segment-Aware Analytics for Real-Time Editorial Support in Media Groups: Lessons from The Telegraph. In INRA 2025: 13th International Workshop on News Recommendation and Analytics (CEUR Workshop Proceedings, Vol. 4056).
[3] Vančura, V., Kasalický, P., Alves, R., & Kordík, P. (2025). Evaluating Linear Shallow Autoencoders on Large Scale Datasets. ACM Transactions on Recommender Systems.
[4] Žid, Č., Alves, R., & Kordík, P. (2025). Active Recommendation for Email Outreach Dynamics. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management (CIKM 2025) (pp. 5540–5544).
[5] Zmeškalová, T., Ledent, A., Spišák, M., Kordík, P., & Alves, R. (2025). Recurrent Autoregressive Linear Model for Next-Basket Recommendation. In Proceedings of the 19th ACM Conference on Recommender Systems (RecSys 2025) (pp. 1273–1278).
[6] Koštejn, V., Peška, L., & Spišák, M. (2025). SAGEA: Sparse Autoencoder-based Group Embeddings Aggregation for Fairness-Preserving Group Recommendations. In Proceedings of the 19th ACM Conference on Recommender Systems (RecSys 2025) (pp. 1290–1295).
[7] Poernomo, J., Tan, N. G. L., Alves, R., & Ledent, A. (2025). Probabilistic Modeling, Learnability and Uncertainty Estimation for Interaction Prediction in Movie Rating Datasets. In Proceedings of the 19th ACM Conference on Recommender Systems (RecSys 2025) (pp. 1261–1266).
[8] Ledent, A., Kasalický, P., Alves, R., & Lauw, H. W. (2025). Conv4Rec: A 1-by-1 Convolutional Autoencoder for User Profiling Through Joint Analysis of Implicit and Explicit Feedback. IEEE Transactions on Neural Networks and Learning Systems.
[9] Ledent, A., Alves, R., & Lei, Y. (2025). Generalization Bounds for Rank-sparse Neural Networks. In Advances in Neural Information Processing Systems (NeurIPS 2025).
[10] Spacek, F., Vančura, V., & Kordík, P. (2025). Mitigating Risks in Marketplace Semantic Search: A Dataset for Harmful and Sensitive Query Alignment. In Proceedings of the 33rd ACM Conference on User Modeling, Adaptation and Personalization (UMAP 2025) (pp. 329–334).
[11] Kuznetsov, S., & Kordík, P. (2025). Improving recommendation diversity and serendipity with an ontology-based algorithm for cold start environments. International Journal of Data Science and Analytics, 20(2), 431–443.
[12] Alves, R. (2025). SCORE: A convolutional approach for football event forecasting. International Journal of Forecasting.
[13] Cahlik, V., Alves, R., & Kordík, P. (2025). Reasoning-grounded natural language explanations for language models. In Proceedings of the World Conference on Explainable Artificial Intelligence (pp. 3–18).
[14] Hänsch, S., Sajdoková, A., Rabau, A., Rybář, V., Alves, R., & Kordík, P. (2025). Data-driven closure model selection for multiphase CFD via matrix completion. AI Thermal Fluids.
[15] Stambrouski, T., & Alves, R. (2025). Multitask learning for cognitive sciences triplet analysis. Expert Systems with Applications, 267, 126187.