In a collaborative filtering (CF) scenario, generally we start with a list of
users
, a list of
items
, and a mapping between user-item pairs and a set of weights. The latter mapping can be represented as a
matrix
. In the traditional CF domain the matrix
usually represents user ratings of items, thus the entry
represents a user
's rating on item
. In this case, the users' judgments or preferences are explicitly given by matrix
. Collaborative filtering can also be used in the context of Web usage data. In that case, the set
may represent user sessions, some of which may belong to the same user who has visited the site multiple times. For usage data, generally, the entry
represents an implicit weight associated with an item (e.g., page or product)
in a user session
. This weight may be binary (representing the existence or non-existence of the item in the user session), or it may be based on the amount of time spent on the particular item during the session. For a given active user (also called the target user)
, the task of a CF system is to (1) predict
for a given target item
which has not already been visited or rated by
; or (2) recommend a set of items that may be interesting to user
.
In user-based CF algorithms, first a set of
nearest neighbors of the target user are computed. This is performed by computing correlations or similarities between user records (rows of the matrix
) and the target user. Then, different methods can be used to combine the neighbors' item ratings (or weights) to produce a prediction value for the target user on unrated (or unvisited) items. As noted in the introduction, a major problem with this approach is the lack of scalability: the complexity of the system increases linearly as a function of the number of users which, in large-scale e-commerce sites, could reach tens of millions.
In contrast, item-based CF algorithms attempt to find
similar items that are co-rated (or visited) by different users similarly. This amounts to performing similarity computations among the columns of matrix
. Thus, item-based CF algorithms avoid the bottleneck in user-user computations by first considering the relationships among items. For a target item, predictions can be generated by taking a weighted average of the target user's item ratings (or weights) on these neighbor items.