Title: Improving the Effectiveness of Collaborative Filtering on Anonymous Web Usage Data
Authors: Bamshad Mobasher, Honghua Dai, Tao Luo, Miki Nakagawa
Abstract: Recommender systems based on collaborative filtering usually require real-time comparison of users' ratings on objects. In the context of Web personalization, particularly at the early stages of a visitor's interaction with the site (i.e., before registration or authentication), recommender systems must rely on anonymous clickstream data. The lack of explicit user ratings and the sheer amount of data in such a setting pose serious challenges to standard collaborative filtering techniques in terms of scalability and performance. Offline clustering of user transactions can be used to improve the scalability of collaborative filtering; however, this often comes at the cost of reduced recommendation accuracy. In this paper we study the impact of various preprocessing techniques applied to clickstream data, such as clustering, normalization, and significance filtering, on collaborative filtering. Our experiments, performed on real usage data, indicate that with proper data preparation, the clustering-based approach to collaborative filtering can achieve dramatic improvements in recommendation effectiveness while maintaining its computational advantage over direct approaches such as the k-Nearest-Neighbor technique.
Full Paper:  [pdf]