Online NMF - AbsoluteAstronomy.com

Online NMF is a recently developed method for real time data analysis in an online context. Non-negative matrix factorization in the past has been used for static data analysis and pattern recognition. In the past it has been used for facial recognition and spectral data analysis, however due to the time and memory expensive nature of NMF algorithms², PCA

Principal components analysis

Principal component analysis is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components. The number of principal components is less than or equal to...

, SVD

Singular value decomposition

In linear algebra, the singular value decomposition is a factorization of a real or complex matrix, with many useful applications in signal processing and statistics....

, and Pearson correlation

Pearson product-moment correlation coefficient

In statistics, the Pearson product-moment correlation coefficient is a measure of the correlation between two variables X and Y, giving a value between +1 and −1 inclusive...

based methods have been used instead. However, the fact that data can be recreated as a linear combination of the set of resolved "basis" data is advantageous in some lines of study. One such use is for collaborative filtering

Collaborative filtering

Collaborative filtering is the process of filtering for information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc. Applications of collaborative filtering typically involve very large data sets...

in recommendation systems where it is advantageous to know not only how much two individuals are alike, which can be derived from the Pearson correlation, but also in what ways are they alike. The purpose of the online NMF algorithm is to perform rapid NMF analysis so that recommendations can be produced in real time.

Mathematical Framework

We begin with the initial factorization at timestamp t

V = W*H + E

For simplicity we claim V ~ W*H

We then add the additional data U to matrix V resulting in V'

V' = ( V ) over ( U )

V' = W'*H

V' = ( W1' )*H' over ( W2' )

Also, for previously processed data:

W' = ( W*Λ⁻¹*W1' ) over ( W2' )

H' = W1'⁻¹*Λ*H

Where Λ is a diagonal matrix were Λ_ii equals weight factor h_i

For unprocessed data we need to perform the optimization problem of minimizing J. We have:

J = 1/2 ||V - W*H|| + a*R*H*H^T

Where a is a positive integer and R is a symmetry non-negative matrix. This can be done using the following iterative algorithm

w_ij <-- w_ij (V*H^T)_ij over (W*H*H^T)_ij

h_ij <-- h_ij (W^T*V)_ij over (W^T*W*H + a*R*H)_ij

Online NMF Algorithm

1) timestamp 0: Initialize using the NMF algorithm for new data

2) timestamp t:

a) calculate W' and H' using the NMF algorithm for new data
(for this step you do not need to use all of the data in previous W and H)

b) create W and H using W' and H' from the previous step and the NMF algorithm for previously processed data.

3) timestamp T: Output final W and H

Recommendation

Let V be a matrix of data were the rows represent internet users, the columns represent items, and each cell is what that internet viewer rated that item. First we decide how many "types" of user we expect there to be in our network, such as types of star wars characters: Jedi, rebel soldier, imperial soldier, ewok, or types of high-schooler: jock, emo, nerd, goth, etc. This number is the length of W (the weights matrix) and the height of H (the basis matrix). We can determine this number using PCA

Principal components analysis

. Also, W has a height of the number of people in the network we are looking at and H has a length of the number of items. After running an NMF algorithm, such as gradient descent

Gradient descent

Gradient descent is a first-order optimization algorithm. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient of the function at the current point...

or alternating least squares, you will have resolved W and H. W and H serve as an invaluable description of your network as shown in V. H represents each "type" of user, and each cell represents what someone of that type would rate the selected item (column). In the high-schooler example, the first row would be jock, the second, emo, etc. For the items football and silently crying to yourself (both of which would be in the rows), the jock would rank football high and crying low, while the emo would do the reverse.

The W matrix represents the weights, it is what forms the final result V. Remember, each row in V represents an actual person with lots of complexity. For example, the first person in list V might equally like crying and football, in which case he would be half jock and half emo, in which case the first row of the weights matrix would be 1/2 1/2. However, sometimes the person can not be perfectly represented as a linear combination of the types in the basis matrix H, this must be made up for in the error matrix E.

By performing NMF on a bunch of people and their likes and dislikes we have essentially created a recommendation system

Recommendation system

Recommender systems, recommendation systems, recommendation engines, recommendation frameworks, recommendation platforms or simply recommender form or work from a specific type of information filtering system technique that attempts to recommend information items Recommender systems, recommendation...

. Based on what someone liked in the past we can conjecture what collection of personality traits someone has and from this form recommendations based on these traits. Separating these can be highly beneficial. For example if a Youtube user likes videos of kittens, but also likes WWF videos, it is unlikely that anyone else in the network likes those two things so a Pearson Correlation with everyone else will always be bad. However matrix factorization understands this individuals dual personalities because they both exist in the basis (kitten lover and wrestling fan), so we can make recommendations on what the basis people (not real) like, and we end up properly recommending kittens and wrestling movies.

One of the reasons this is not regularly deployed is that it is hard to continually perform NMF algorithms on constantly changing huge data sets. This is the problem that Online NMF hopes to solve. It uses the previous results from timestamp t and then adds in the additional users and items as they come in timestamp t+1 without having to perform the entire algorithm over.

Search Results

Hand and hand with recommendation systems comes the back-end of search engines. Using this method it is possible to get better search results by guessing what the person is looking for based on the type of person they are.

General Signal Analysis

This type of analysis can also be applied to general signals that update themselves regularly.

Links and References

1) http://www.siam.org/proceedings/datamining/2006/dm06_059zhangs2.pdf
2) http://www.ijcai.org/papers07/Papers/IJCAI07-432.pdf
3) http://portal.acm.org/citation.cfm?id=1339264.1339709

The source of this article is wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.