Why use the state-of-the-art method for matrix factorization

protonAutoML
4 min read · Jul 3, 2021


Matrix factorization is arguably the most popular algorithm in collaborative filtering. It is typically used for recommendation systems and often outperforms neighbourhood-based methods such as k-nearest neighbours (KNN). Let's explore its application in this article.

Matrix factorization (MF) is a fundamental data analysis tool that has seen widespread use in recommender systems and other applications. One widely used MF variant, non-negative matrix factorization (NMF), provides an elegant solution to the challenging task of learning latent features from data. However, users’ ratings are typically sparse, which means most rating matrices contain few non-zero entries. In addition, predicted item rankings can change noticeably when even a single entry is altered by one unit. Methods that constrain all latent features to be non-negative and to share the same scale can also be vulnerable to outliers in the data.

Despite these challenges, current MF implementations have been successfully deployed on several challenging real-world problems such as movie recommendation, news article recommendation, and music recommendation.

In the past five years, there have been two major breakthroughs in matrix factorization. The first was the adoption of stochastic optimization for non-convex learning. This caused a paradigm shift, which led to successful applications on problems that were previously intractable, such as multi-label classification. The second was the development of algorithms with provable guarantees for quality and robustness. These include algorithms that handle more general types of noise and non-convex loss functions, as well as algorithms that work in settings with limited data, which is often the case in real applications.

MF algorithms try to decompose an original matrix of user/item preferences into two smaller matrices of latent factors: U, built from user characteristics, and I, built from item characteristics. These factors are then multiplied together to produce the predictions used by the recommender system.
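The decomposition described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: the names (`R`, `U`, `I`, `k`) and the simple SGD update are illustrative assumptions, and zeros are treated as missing ratings.

```python
import numpy as np

# Toy sketch of matrix factorization: approximate the rating matrix R
# as U @ I.T, where U holds user latent factors and I holds item factors.
def factorize(R, k=2, steps=2000, lr=0.01, reg=0.02, seed=0):
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, k))
    I = rng.normal(scale=0.1, size=(n_items, k))
    observed = np.argwhere(R > 0)          # treat 0 as "missing"
    for _ in range(steps):
        for u, i in observed:
            err = R[u, i] - U[u] @ I[i]    # error on one observed rating
            U[u] += lr * (err * I[i] - reg * U[u])
            I[i] += lr * (err * U[u] - reg * I[i])
    return U, I

R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4]], dtype=float)
U, I = factorize(R)
pred = U @ I.T  # dense reconstruction, including predictions for the 0 entries
```

The reconstructed matrix `pred` fills in the missing (zero) entries, which is exactly what the recommender uses to rank unseen items for each user.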

There are some problems that arise when using this algorithm to create recommendations:

The results may differ depending on whether an item is recommended or not. This can be addressed by choosing the number k of nearest neighbors and their weights so that each neighbor contributes with different strength: more-similar users should count more toward a recommendation, and the similarity itself can be used to compute each user's importance. The number of neighbors should be set based on the context — movie recommendations tend to show high similarity between users, while for books it should be lower. When there are few items in the dataset, it is possible that no nearest neighbor can contribute anything, in which case no recommendation should be made at all.
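The similarity-weighted neighbor scheme above can be sketched as follows. This is a hypothetical illustration using cosine similarity over co-rated items; the function name and the "return `None` when no neighbor can contribute" convention are assumptions, not from any particular library.

```python
import numpy as np

# Predict a user's rating for an item from the k most similar users,
# weighting each neighbour by its cosine similarity.
def predict_rating(ratings, user, item, k=2):
    target = ratings[user]
    sims = []
    for other, row in enumerate(ratings):
        if other == user or row[item] == 0:
            continue                       # skip self and non-raters of the item
        mask = (target > 0) & (row > 0)    # compare on co-rated items only
        if not mask.any():
            continue
        sim = np.dot(target[mask], row[mask]) / (
            np.linalg.norm(target[mask]) * np.linalg.norm(row[mask]))
        sims.append((sim, row[item]))
    sims.sort(reverse=True)
    top = sims[:k]
    if not top:
        return None                        # no neighbour contributes: no recommendation
    weights = np.array([s for s, _ in top])
    values = np.array([r for _, r in top])
    return float(weights @ values / weights.sum())

ratings = np.array([[5, 4, 0],
                    [5, 4, 3],
                    [1, 2, 5]], dtype=float)
p = predict_rating(ratings, user=0, item=2)
```

Note that the most similar user (who rated the item 3) pulls the prediction down, while the less similar one (who rated it 5) contributes with smaller weight — exactly the "different power" behaviour described above.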

Many irrelevant results may appear if a large portion of an item's ratings are 1-star, 5-star, or even 0 — it means the item has not been rated by enough people, and relevance computed from nearest neighbors can be very unreliable. This problem appeared especially in the Netflix data, where relatively few movies had been watched and those that had were mostly highly rated. It led to the introduction of two algorithms: one that was more aggressive and recommended items with high probability based on initial ratings, and a second, less aggressive one that assigned a smaller weight to these recommendations.
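One common way to down-weight items with too few ratings is to shrink the item's mean toward the global mean. This is a sketch of that idea, not the specific Netflix algorithms mentioned above; the damping factor `m` is a hypothetical tuning parameter.

```python
# Shrink an item's mean rating toward the global mean in proportion to
# how few ratings it has, so sparsely rated items get smaller weight.
def damped_item_mean(item_ratings, global_mean, m=5):
    n = len(item_ratings)
    if n == 0:
        return global_mean                        # unrated item: fall back entirely
    item_mean = sum(item_ratings) / n
    return (n * item_mean + m * global_mean) / (n + m)
```

An item with a single 5-star rating and a global mean of 3.0 scores about 3.3 rather than 5.0, so it no longer dominates the recommendations on the strength of one vote.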

In the case of new users, it is difficult to recommend items at all. Consider a user with 0 ratings: in the nearest-neighbor computation, k would effectively be 0 because there is no information for this user yet, so the user would receive no recommendations. This can be solved with a special component for unknown users whose values depend on estimated data (for example, averages over existing users). This component is not updated during training.
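The frozen "unknown user" component can be sketched like this. The class name and the choice of initializing the fallback from the average of trained user factors are illustrative assumptions.

```python
import numpy as np

# A cold-start fallback: keep one extra "unknown user" factor vector,
# initialised from the average of trained user factors. It is held fixed
# (never updated by gradient steps) and used for users with no ratings.
class MFWithFallback:
    def __init__(self, U, I):
        self.U, self.I = U, I
        self.unknown = U.mean(axis=0)      # frozen estimate for new users

    def predict(self, user, item):
        factors = self.U[user] if user is not None else self.unknown
        return float(factors @ self.I[item])

U = np.array([[1.0, 0.0], [3.0, 0.0]])     # trained user factors (toy values)
I = np.array([[2.0, 1.0]])                 # trained item factors (toy values)
model = MFWithFallback(U, I)
```

A brand-new user (`user=None`) then receives the population-average prediction instead of no recommendation at all.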

Performance issues may occur, especially in recommender systems with many users or very sparse data (e.g., movies on Netflix).

In many cases, the original dataset has no ratings assigned to items, and it is necessary to create them before using this algorithm. Rating values can be generated by some function R(i, j) based on user preferences — these are then used by the recommender system for item j in place of real ratings.
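A concrete R(i, j) might derive pseudo-ratings from implicit behaviour. The function below is a hypothetical example: the signals (view count, purchase flag), the weights, and the 1–5 scale are all illustrative assumptions.

```python
# Hypothetical pseudo-rating function: map implicit signals onto a 1-5 scale.
def implicit_rating(views, purchased, max_views=10):
    score = min(views, max_views) / max_views      # normalise views to [0, 1]
    if purchased:
        score = max(score, 0.8)                    # a purchase is a strong signal
    return round(1 + 4 * score, 1)                 # map onto a 1-5 rating scale
```

The resulting matrix of pseudo-ratings can then be factorized exactly like a matrix of explicit ratings.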

And here is a list of advantages and disadvantages:

Positive:

  • The cold-start problem can be mitigated with auxiliary approaches (as Netflix did);
  • Recommendations have higher quality;
  • Memory usage is lower (all we need are the factor matrices U and I, which can be small and sparse);

Negative:

  • Finding similar users (clustering) can take more time;
  • Irrelevant recommendations can appear;
  • If there are too few ratings in the dataset, some items may never be recommended;
  • Sparse data is not handled efficiently;
  • Scaling to a large number of users or items is hard;
  • Performance degrades substantially when the number of latent features exceeds the number of training examples.

Matrix factorization is an essential task in data analysis and machine learning, especially in recommender systems, neural networks, and large-scale text or image processing. In natural language processing, Levy & Goldberg showed that popular word-embedding methods implicitly perform matrix factorization. In recommender systems, the most popular algorithms are based on either stochastic gradient descent or alternating least squares. Matrix factorization has been a hot research topic since the initial burst of interest in collaborative filtering. It is used to reduce a very large sparse matrix (with many missing entries) to two smaller dense matrices. Many applications of this technique have been proposed, including content recommendation in social networks, recommender systems for e-commerce, and more.

Originally published at https://protonautoml.com on July 3, 2021.
