Next: Finding Similar Items (Item Up: Semantically Enhanced Collaborative Filtering Previous: Introduction

Background on Item-Based Collaborative Filtering

In a collaborative filtering (CF) scenario, generally we start with a list of users $U = \{u_1, u_2, \ldots, u_m\}$ , a list of items $I = \{i_1, i_2, \ldots, i_n\}$ , and a mapping between user-item pairs and a set of weights. The latter mapping can be represented as a $m \times n$ matrix . In the traditional CF domain the matrix usually represents user ratings of items, thus the entry $M_{r,j}$ represents a user 's rating on item . In this case, the users' judgments or preferences are explicitly given by matrix . Collaborative filtering can also be used in the context of Web usage data. In that case, the set may represent user sessions, some of which may belong to the same user who has visited the site multiple times. For usage data, generally, the entry $M_{r,j}$ represents an implicit weight associated with an item (e.g., page or product) in a user session . This weight may be binary (representing the existence or non-existence of the item in the user session), or it may be based on the amount of time spent on the particular item during the session. For a given active user (also called the target user) , the task of a CF system is to (1) predict $M_{a,t}$ for a given target item which has not already been visited or rated by ; or (2) recommend a set of items that may be interesting to user .

In user-based CF algorithms, first a set of nearest neighbors of the target user are computed. This is performed by computing correlations or similarities between user records (rows of the matrix ) and the target user. Then, different methods can be used to combine the neighbors' item ratings (or weights) to produce a prediction value for the target user on unrated (or unvisited) items. As noted in the introduction, a major problem with this approach is the lack of scalability: the complexity of the system increases linearly as a function of the number of users which, in large-scale e-commerce sites, could reach tens of millions.

In contrast, item-based CF algorithms attempt to find similar items that are co-rated (or visited) by different users similarly. This amounts to performing similarity computations among the columns of matrix . Thus, item-based CF algorithms avoid the bottleneck in user-user computations by first considering the relationships among items. For a target item, predictions can be generated by taking a weighted average of the target user's item ratings (or weights) on these neighbor items.

Subsections

Next: Finding Similar Items (Item Up: Semantically Enhanced Collaborative Filtering Previous: Introduction

Bamshad Mobasher 2004-03-09