next up previous
Next: Finding Similar Items (Item Up: Semantically Enhanced Collaborative Filtering Previous: Introduction


Background on Item-Based Collaborative Filtering

In a collaborative filtering (CF) scenario, generally we start with a list of $m$ users $U = \{u_1, u_2, \ldots, u_m\}$, a list of $n$ items $I = \{i_1, i_2, \ldots, i_n\}$, and a mapping between user-item pairs and a set of weights. The latter mapping can be represented as a $m \times n$ matrix $M$. In the traditional CF domain the matrix $M$ usually represents user ratings of items, thus the entry $M_{r,j}$ represents a user $u_r$'s rating on item $i_j$. In this case, the users' judgments or preferences are explicitly given by matrix $M$. Collaborative filtering can also be used in the context of Web usage data. In that case, the set $U$ may represent user sessions, some of which may belong to the same user who has visited the site multiple times. For usage data, generally, the entry $M_{r,j}$ represents an implicit weight associated with an item (e.g., page or product) $i_j$ in a user session $u_r$. This weight may be binary (representing the existence or non-existence of the item in the user session), or it may be based on the amount of time spent on the particular item during the session. For a given active user (also called the target user) $u_a$, the task of a CF system is to (1) predict $M_{a,t}$ for a given target item $i_t$ which has not already been visited or rated by $u_a$; or (2) recommend a set of items that may be interesting to user $u_a$.

In user-based CF algorithms, first a set of $k$ nearest neighbors of the target user are computed. This is performed by computing correlations or similarities between user records (rows of the matrix $M$) and the target user. Then, different methods can be used to combine the neighbors' item ratings (or weights) to produce a prediction value for the target user on unrated (or unvisited) items. As noted in the introduction, a major problem with this approach is the lack of scalability: the complexity of the system increases linearly as a function of the number of users which, in large-scale e-commerce sites, could reach tens of millions.

In contrast, item-based CF algorithms attempt to find $k$ similar items that are co-rated (or visited) by different users similarly. This amounts to performing similarity computations among the columns of matrix $M$. Thus, item-based CF algorithms avoid the bottleneck in user-user computations by first considering the relationships among items. For a target item, predictions can be generated by taking a weighted average of the target user's item ratings (or weights) on these neighbor items.


Subsections
next up previous
Next: Finding Similar Items (Item Up: Semantically Enhanced Collaborative Filtering Previous: Introduction
Bamshad Mobasher 2004-03-09