Monday, April 28, 2014

Hybrid Recommendation Systems


Most recommendation systems use either content-based filtering (CBF) or collaborative filtering (CF). CF generally makes better overall recommendations, but it suffers from the cold-start (new-item) problem: an item with no ratings will never be recommended. Hybrid recommendation systems combine both to counter this downside while still leveraging the power of CF.

There are two main approaches to CF: neighborhood models and latent factor models. Neighborhood approaches focus on the relationships between items or, alternatively, between users. An item-item approach models a user's preference for an item based on that same user's ratings of similar items. Latent factor models transform both items and users into the same latent factor space, which tries to explain ratings by characterizing both products and users on factors automatically inferred from user feedback. Latent factor models tend to provide more accurate results than neighborhood models. Most commercial systems (e.g., those of Amazon and TiVo), however, are based on neighborhood models. Part of their prevalence is due to their relative simplicity and to the fact that they naturally provide intuitive explanations of the reasoning behind recommendations (which often enhance the user experience beyond what improved accuracy may achieve). Most neighborhood methods are local in nature, concentrating on only a small subset of related ratings. In contrast, matrix factorization casts a very wide net to try to characterize items and users. See Koren and Bell, Advances in Collaborative Filtering, Recommender Systems Handbook, 145-186, 2011.
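
To make the contrast concrete, here is a minimal sketch of both CF styles on a toy ratings matrix. Everything in it is an assumption for illustration: the data, the function names, the choice of cosine similarity for the item-item model, and plain SGD with L2 regularization for the latent factor model. It is not the method of any of the systems or papers mentioned here.

```python
import math
import random

# Toy user-by-item ratings; None marks an unrated item (illustrative data only).
RATINGS = [
    [5, 4, None, 1],
    [4, 5, 1, None],
    [1, None, 5, 4],
    [None, 1, 4, 5],
]

def item_cosine(ratings, i, j):
    """Cosine similarity between items i and j over users who rated both."""
    pairs = [(row[i], row[j]) for row in ratings
             if row[i] is not None and row[j] is not None]
    if not pairs:
        return 0.0
    dot = sum(a * b for a, b in pairs)
    norm_i = math.sqrt(sum(a * a for a, _ in pairs))
    norm_j = math.sqrt(sum(b * b for _, b in pairs))
    return dot / (norm_i * norm_j)

def predict_item_item(ratings, user, item):
    """Neighborhood model: similarity-weighted mean of the user's own ratings."""
    num = den = 0.0
    for j, r in enumerate(ratings[user]):
        if j == item or r is None:
            continue
        s = item_cosine(ratings, item, j)
        num += s * r
        den += abs(s)
    return num / den if den else None

def factorize(ratings, k=2, epochs=500, lr=0.02, reg=0.02, seed=0):
    """Latent factor model: learn user factors P and item factors Q by SGD."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in ratings]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in ratings[0]]
    observed = [(u, i, r) for u, row in enumerate(ratings)
                for i, r in enumerate(row) if r is not None]
    for _ in range(epochs):
        for u, i, r in observed:
            err = r - sum(p * q for p, q in zip(P[u], Q[i]))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def rmse(ratings, P, Q):
    """Root-mean-square error of P.Q^T on the observed ratings only."""
    errs = [(r - sum(p * q for p, q in zip(P[u], Q[i]))) ** 2
            for u, row in enumerate(ratings)
            for i, r in enumerate(row) if r is not None]
    return math.sqrt(sum(errs) / len(errs))
```

Calling predict_item_item(RATINGS, 0, 2) only looks at user 0's own ratings of items similar to item 2, which is the "local" behavior described above. factorize(RATINGS) fits every observed rating at once, and a missing entry is then estimated as the dot product of the corresponding rows of P and Q.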

Hybrid approaches still seem to be lacking. Some relevant papers:
  1. Claypool et al., Combining Content-Based and Collaborative Filters in an Online Newspaper, ACM SIGIR 1999.
  2. Melville et al., Content-Boosted Collaborative Filtering for Improved Recommendations, AAAI 2002.
  3. Schein et al., Methods and Metrics for Cold-Start Recommendations, ACM SIGIR 2002.
  4. Nanopoulos et al., Matrix Factorization with Content Relationships for Media Personalization, 11th International Conference on Wirtschaftsinformatik, 2013.
  5. Forbes and Zhu, Content-boosted Matrix Factorization for Recommender Systems: Experiments with Recipe Recommendation, RecSys 2011.
  6. Nguyen and Zhu, Content-Boosted Matrix Factorization Techniques for Recommender Systems, Statistical Analysis and Data Mining (6)286-301, 2013.
  7. Li and Kim, Clustering Approach for Hybrid Recommender System, WI 2003.
  8. Salakhutdinov and Mnih, Probabilistic Matrix Factorization, NIPS 2008.
  9. Cremonesi et al., Hybrid algorithms for recommending new items, HetRec 2011.
From what I have seen, Content-Boosted Collaborative Filtering by Melville et al. (paper 2 above) seems to achieve the highest accuracy. They use a content-based filter (CBF) to fill in the missing values in the user-ratings matrix, and then apply CF to the resulting dense matrix to make recommendations to a user. The downside of this approach is that it requires too much computation time to be practical in online settings.
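
The two-step idea (fill in the gaps with a content-based predictor, then run CF on the dense matrix) can be sketched roughly as follows. This is only an illustration under my own assumptions, not the algorithm from the paper: the toy data, the binary item features, the similarity-weighted content predictor, and the cosine-based user-user CF are all stand-ins I picked to keep the example short.

```python
import math

# Toy data (illustrative only): None marks a missing rating.
# Item 4 has no ratings at all, i.e. it is a cold-start item.
RATINGS = [
    [5, 4, None, 1, None],
    [4, None, 1, 1, None],
    [1, 2, 5, None, None],
]
# Hypothetical binary content features per item (e.g. two genre flags).
FEATURES = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 0]]

def cosine(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def content_predict(row, features, item):
    """Stand-in content-based predictor: feature-similarity-weighted mean
    of the ratings this user gave to other items."""
    num = den = 0.0
    for j, r in enumerate(row):
        if r is None:
            continue
        s = cosine(features[item], features[j])
        num += s * r
        den += s
    if den:
        return num / den
    rated = [r for r in row if r is not None]
    return sum(rated) / len(rated)  # no similar rated item: use the user's mean

def densify(ratings, features):
    """Step 1: keep real ratings, fill every gap with a content-based guess."""
    return [[r if r is not None else content_predict(row, features, i)
             for i, r in enumerate(row)] for row in ratings]

def cf_predict(dense, user, item):
    """Step 2: user-based CF on the dense matrix, i.e. a similarity-weighted
    mean of the other users' (real or filled-in) ratings for the item."""
    num = den = 0.0
    for v, row in enumerate(dense):
        if v == user:
            continue
        s = cosine(dense[user], row)
        num += s * row[item]
        den += s
    return num / den if den else dense[user][item]

dense = densify(RATINGS, FEATURES)
cold_start_score = cf_predict(dense, 0, 4)  # item 4 can now be scored
```

Even in this toy form the cost issue is visible: densify runs the content predictor once for every missing cell of the matrix, and cf_predict then compares full dense rows rather than sparse ones.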