How Collaborative Filtering Techniques Work in Recommender Systems
Recommender systems store data about users’ interactions with products and use it later to suggest new ones. In machine learning, a recommender system is a subclass of information filtering systems that seeks to predict the “rating” or “preference” a user would give to an item. Rating prediction can generally be framed as a collaborative filtering problem, in which a rating matrix captures user-item interaction information. In this article, we will talk about the various techniques used in recommender systems.
Machine learning (ML) is widely used in fields such as data mining, pattern recognition, and artificial intelligence. ML focuses on algorithms that allow a machine to make predictions or decisions automatically. Its applications are diverse and widespread.
Recommendation systems are present not only on platforms like Facebook or YouTube but also in other domains such as e-commerce. They recommend products that are likely to interest the user based on purchase records and/or ratings. Such a system tries to predict which product would interest a given individual by using past purchases made by similar users. To do this, it uses collaborative filtering, which consists of building a mathematical model from the available information about users’ preferences in order to produce accurate recommendations efficiently and without human intervention. Collaborative filtering aims at processing information about users and products and predicting new user-product ratings or preferences.
Recommender systems can be used to find videos, images, or articles on social media that are of interest to the individual. The key element in collaborative filtering is that there is some relation between what is known about a user’s behavior towards certain items and their preferences for new items.
Collaborative filtering is used in different fields of application, but it is particularly popular in recommender systems, where it aims at providing personalized suggestions to users by predicting their preference for items or services they have not considered before. This prediction task requires modeling the relationships between items (or users) based on their previous preferences. Collaborative filtering assumes that, in order to predict a user’s rating of an item, it is useful to exploit information about other users who have expressed similar preferences. Its underlying assumption is that preferences are shared among people with similar interests or tastes in movies, music, etc. Collaborative filtering techniques fall into two major categories: memory-based and model-based.
The two collaborative filtering categories, together with knowledge-based approaches that are often discussed alongside them, give three types of technique:
Memory-based
Model-based
Knowledge-based
Memory-based techniques typically make recommendations directly from users’ historical ratings of items. Model-based techniques fit a model to those ratings and use it to estimate how a user would rate an item, or how likely the user is to interact with it. Knowledge-based techniques rely on explicit knowledge about products and user requirements to identify the factors or aspects that define user preferences for certain products, and can suggest alternative options to explore. A simple technique outside collaborative filtering is content-based filtering, which computes similarities between item features and a user’s profile to infer preferences. It can work from implicit feedback, meaning that it uses observational data to make inferences about preference instead of explicit feedback (ratings).
Content-based recommendation techniques aim at predicting user preferences by analyzing data about the items a user has preferred and identifying common features among them. Examples of the kind of item attributes involved are the Amazon Bestsellers rank, Netflix movie categories, IMDb genre classification, or MPAA rating groupings. Content-based methods build models from the available information about users’ preferences, but they lack robustness because it is difficult to capture all aspects of those preferences in item features.
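To make the idea of matching on common item features concrete, here is a minimal sketch in Python. It assumes items are described by a small table of genre flags and that a user profile is simply the average feature vector of the items the user liked. The feature matrix, the function name content_based_scores, and the choice of cosine similarity are all illustrative assumptions, not something prescribed by any particular system.

```python
import numpy as np

# Hypothetical item feature matrix: rows are items, columns are genre flags
# (action, comedy, drama). All values are made up for illustration.
item_features = np.array([
    [1, 0, 0],  # item 0: action
    [1, 1, 0],  # item 1: action + comedy
    [0, 0, 1],  # item 2: drama
    [0, 1, 0],  # item 3: comedy
], dtype=float)

def content_based_scores(liked_items, item_features):
    """Score each item by cosine similarity to the mean features of liked items."""
    profile = item_features[liked_items].mean(axis=0)
    norms = np.linalg.norm(item_features, axis=1) * np.linalg.norm(profile)
    # Guard against division by zero for all-zero feature rows or profiles.
    scores = item_features @ profile / np.where(norms == 0, 1.0, norms)
    scores[liked_items] = -np.inf  # never re-recommend an already liked item
    return scores

scores = content_based_scores([0], item_features)
best = int(np.argmax(scores))  # index of the highest-scoring unseen item
```

A user who liked only the action item is pointed to the action-comedy item, because it is the only other item sharing a feature with the profile. The sketch also shows the weakness mentioned above: anything about the user's taste that the features do not encode is invisible to the method.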
Memory-based techniques use all available data to build associations between users and items, from which they infer preferences. They try to distinguish users who genuinely share tastes from those whose preferences overlap only by chance, so that predictions for a user can be inferred from the computed associations. Memory-based techniques are also very popular because their basic idea can easily be applied using different kinds of variables (e.g., textual attributes). However, they are not very discriminative because they fail to account for users’ new preferences. Research has tried to solve this problem using user behavior data.
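One common way to build such associations is user-based nearest-neighbor prediction. The following Python sketch assumes a small rating matrix with users as rows and items as columns, where 0 means "not rated". The matrix values, the function names, and the use of cosine similarity over raw rating rows (which treats missing ratings as zeros) are simplifying assumptions made for illustration.

```python
import numpy as np

# Hypothetical rating matrix: rows are users, columns are items, 0 = not rated.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cosine_similarity(a, b):
    """Cosine of the angle between two rating rows (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def predict_rating(R, user, item, k=2):
    """Similarity-weighted average of the item's ratings from the k most
    similar users who have rated it."""
    candidates = [u for u in range(R.shape[0]) if u != user and R[u, item] > 0]
    sims = [(cosine_similarity(R[user], R[u]), u) for u in candidates]
    top = sorted(sims, reverse=True)[:k]
    weight = sum(s for s, _ in top)
    if weight == 0:
        return 0.0  # no usable neighbors
    return sum(s * R[u, item] for s, u in top) / weight

pred = predict_rating(R, user=0, item=2)
```

Here user 0 rates like user 1 and unlike users 2 and 3, so the prediction for item 2 is pulled towards user 1's low rating. Practical systems usually compute similarity only over co-rated items and subtract each user's mean rating first; both refinements are left out to keep the sketch short.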
Memory-based recommenders use only historical data on users’ interactions with items (ratings). They form the user base of an item as the set of all users who have interacted with it at least once (e.g., Amazon uses this strategy). To recommend items for a new user, memory-based systems start by counting how many other users have purchased or rated each unique item. This information forms the initial basis of recommendations; typically, this is done simply by listing these users and their respective ratings of the items they bought.
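The counting step for a new user can be sketched in a few lines of Python. The interaction log, the user and item names, and the function name most_popular are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical interaction log of (user, item) pairs; all names are made up.
interactions = [
    ("alice", "book"), ("bob", "book"), ("carol", "book"),
    ("alice", "lamp"), ("bob", "lamp"),
    ("carol", "pen"),
]

def most_popular(interactions, n=2):
    """Return the n items that the largest number of distinct users interacted with."""
    # set() removes repeated (user, item) pairs so each user counts once per item.
    counts = Counter(item for _, item in set(interactions))
    return [item for item, _ in counts.most_common(n)]

top_items = most_popular(interactions)
```

Because a new user has no history to compare against, a popularity ranking of this kind is a reasonable starting point until enough of the user's own interactions have been recorded.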
Model-based techniques try to build a predictive model from the observed relationships between users and items in order to make predictions for items a user has not yet rated. They can operate in both online and batch mode, which is an advantage over memory-based methods. However, model-based techniques can be computationally expensive, because the model has to be trained and may need to be rebuilt when the data changes.
There are also matrix factorization techniques, which are based on the assumption that preferences can be described by latent vectors in a low-dimensional space inferred from a user-item rating matrix R. Factorization models aim at decomposing the rating matrix into the product of two smaller matrices, A and B, which can be interpreted as user and item vectors respectively: R ≈ AB. This model is said to have three major advantages over content-based approaches: the full rating matrix does not need to be kept in memory, since only the two factor matrices are stored; it does not require large datasets to fit the parameters, contrary to content-based methods; and the factors can be interpreted, since they represent meaningful relationships between items.
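As a rough illustration of the decomposition, the Python sketch below fits the two factor matrices with stochastic gradient descent on the observed ratings only. It assumes the rows of R are users and the columns are items, so that A holds one latent vector per user and B one per item. The rating values, the function name factorize, and all hyperparameters (number of factors, learning rate, regularization, epochs) are illustrative assumptions, not tuned recommendations.

```python
import numpy as np

def factorize(R, n_factors=2, lr=0.02, reg=0.02, epochs=3000, seed=0):
    """Fit R ~ A @ B on observed (non-zero) entries using SGD with L2 regularization."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    A = rng.normal(scale=0.1, size=(n_users, n_factors))  # user factors
    B = rng.normal(scale=0.1, size=(n_factors, n_items))  # item factors
    observed = [(u, i) for u in range(n_users) for i in range(n_items) if R[u, i] > 0]
    for _ in range(epochs):
        for u, i in observed:
            err = R[u, i] - A[u] @ B[:, i]
            a_old = A[u].copy()  # keep the pre-update user vector for B's gradient
            A[u] += lr * (err * B[:, i] - reg * A[u])
            B[:, i] += lr * (err * a_old - reg * B[:, i])
    return A, B

# Hypothetical rating matrix: rows are users, columns are items, 0 = not rated.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

A, B = factorize(R)
R_hat = A @ B  # dense matrix of predicted ratings, including the unrated cells
mask = R > 0
rmse = float(np.sqrt(np.mean((R[mask] - R_hat[mask]) ** 2)))
```

The cells that were 0 in R are filled in by R_hat, and those filled-in values are the predictions used for recommendation. Only A and B need to be kept once training is done, which is the storage advantage mentioned above.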
Originally published at https://protonautoml.com on August 16, 2021.