User-Based Collaborative Filtering with K-NN

How to Compute Predictions:

Suppose that we have a new target user NU and we want to compute the predicted rating for NU on a target item I_t (an item NU has not rated).

Assume that we have identified the K nearest neighbors U₁, U₂, ..., U_K of NU. Generally, this is done by computing the correlation between the rating vector of NU and that of every user in the training database, ranking the users in decreasing order of similarity, and keeping the top K. Let r(U_i, I_j) denote the rating given by user U_i to item I_j, and let sim(NU, U_i) denote the similarity of user U_i to user NU. Generally, this similarity is computed as the Pearson correlation between the two users' rating vectors.
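The neighbor-selection step can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes each user's ratings are stored as a dict mapping item to rating, and that the Pearson correlation is computed over co-rated items only. The function names `pearson` and `nearest_neighbors` are hypothetical.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users have rated.

    Assumption: ratings are dicts mapping item -> rating, and the
    correlation is restricted to co-rated items.
    """
    common = [item for item in ratings_a if item in ratings_b]
    if len(common) < 2:
        return 0.0  # too little overlap to measure a correlation
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_b = sum(ratings_b[i] for i in common) / len(common)
    cov = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    var_a = sum((ratings_a[i] - mean_a) ** 2 for i in common)
    var_b = sum((ratings_b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined for a constant rating vector
    return cov / sqrt(var_a * var_b)


def nearest_neighbors(target_ratings, training_ratings, k):
    """Return the k (user, similarity) pairs most similar to the target user."""
    scored = [(user, pearson(target_ratings, ratings))
              for user, ratings in training_ratings.items()]
    # Rank in decreasing order of similarity and keep the top k.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy data for illustration only.
nu = {"a": 5, "b": 3, "c": 1}
train = {
    "U1": {"a": 4, "b": 3, "c": 2},
    "U2": {"a": 1, "b": 3, "c": 5},
    "U3": {"a": 5, "b": 4, "c": 1},
}
print(nearest_neighbors(nu, train, 2))
```

Returning 0.0 when the overlap is too small or a vector is constant is one possible convention; other treatments of these edge cases are equally defensible.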

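Using the weighted average approach, the predicted rating of NU on the target item I_t, written here as pred(NU, I_t), can be computed as follows:

\[
\mathrm{pred}(NU, I_t) = \frac{\sum_{i=1}^{K} \mathrm{sim}(NU, U_i) \cdot r(U_i, I_t)}{\sum_{i=1}^{K} \mathrm{sim}(NU, U_i)}
\]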

In other words, the ratings of the K neighbors are weighted by their similarity (correlation) to the target user, and the sum of all these weighted ratings is divided by the sum of all the similarities across the K neighbors.
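The weighted average described above can be sketched in a few lines. The function name `weighted_average` and the input layout (a list of similarity/rating pairs for the target item) are assumptions made for illustration.

```python
def weighted_average(neighbors):
    """Similarity-weighted average of neighbor ratings.

    neighbors: list of (similarity, rating) pairs, one per neighbor,
    where rating is that neighbor's rating of the target item.
    """
    total_sim = sum(sim for sim, _ in neighbors)
    if total_sim == 0:
        return None  # no similarity mass, so no prediction can be made
    weighted_sum = sum(sim * rating for sim, rating in neighbors)
    return weighted_sum / total_sim


# Hypothetical values: three neighbors with similarities 0.9, 0.6, 0.3.
print(weighted_average([(0.9, 5), (0.6, 3), (0.3, 4)]))
```

Because the most similar neighbor carries the largest weight, the result is pulled toward that neighbor's rating rather than sitting at the plain mean.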

Important Notes:

When computing the predictions (i.e., computing the weighted average), only those neighbors that have actually rated the target item I_t are considered. For example, suppose K = 3, and U₁, U₂, and U₃ are the nearest neighbors of target user NU. Suppose that only U₁ and U₃ have rated item I_t. Then the predicted rating for NU is computed using only U₁ and U₃:

\[
\mathrm{pred}(NU, I_t) = \frac{\mathrm{sim}(NU, U_1) \cdot r(U_1, I_t) + \mathrm{sim}(NU, U_3) \cdot r(U_3, I_t)}{\mathrm{sim}(NU, U_1) + \mathrm{sim}(NU, U_3)}
\]
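To make this case concrete, here is a worked instance with hypothetical values (chosen for illustration, not taken from any dataset): sim(NU, U₁) = 0.8, r(U₁, I_t) = 4, sim(NU, U₃) = 0.4, and r(U₃, I_t) = 1. U₂ has no rating for I_t, so its similarity is left out of both the numerator and the denominator.

```latex
\[
\frac{0.8 \cdot 4 + 0.4 \cdot 1}{0.8 + 0.4} = \frac{3.6}{1.2} = 3.0
\]
```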

Generally, once the K neighbors are identified, those whose correlation with the target user is less than or equal to 0 are filtered out. So, in practice, predictions may be computed with fewer than K neighbors (only those with similarity greater than 0 are considered).
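Both notes can be combined into a single prediction routine, sketched below. The function name `predict` and the data layout (a list of user/similarity pairs plus a dict of per-user ratings) are assumptions for illustration. Returning `None` when every neighbor is filtered out is one possible convention, not something the notes prescribe.

```python
def predict(neighbors, ratings, target_item):
    """Predict a rating using only neighbors that pass both filters.

    neighbors: list of (user, similarity) pairs, e.g. the top-K list.
    ratings: dict mapping user -> {item: rating}.
    """
    usable = [
        (sim, ratings[user][target_item])
        for user, sim in neighbors
        # Keep a neighbor only if its similarity is positive
        # and it has actually rated the target item.
        if sim > 0 and target_item in ratings.get(user, {})
    ]
    if not usable:
        return None  # every neighbor was filtered out
    total_sim = sum(sim for sim, _ in usable)
    return sum(sim * rating for sim, rating in usable) / total_sim


# Hypothetical data: U2 has not rated I_t, U4 has a negative correlation.
neighbors = [("U1", 0.8), ("U2", 0.5), ("U3", 0.4), ("U4", -0.2)]
ratings = {"U1": {"I_t": 4}, "U2": {}, "U3": {"I_t": 1}, "U4": {"I_t": 5}}
print(predict(neighbors, ratings, "I_t"))
```

In this example only U1 and U3 contribute, so the prediction is computed from two neighbors even though four were supplied.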