
Integrating Content and Usage Profiles for Personalization

The recommendation engine is the online component of a Web personalization system. Its task is to compute a recommendation set for the current (active) user session, consisting of the objects (links, ads, text, products, etc.) that most closely match the current user profile. The essential aspect of computing a recommendation set is matching the current user's activity against aggregate usage profiles. The recommended objects are added to the most recently requested page in the active session before that page is sent to the browser.

Maintaining a history depth is important because most users navigate several paths leading to independent pieces of information within a session. In many cases these sub-sessions are no more than 2 or 3 references long. We capture the user history depth with a sliding window over the current session: a window of size n allows only the last n visited pages to influence the recommendation value of items in the recommendation set. Finally, the structural characteristics of the site or prior domain knowledge can also be used to associate an additional measure of significance with each pageview in the user's active session.
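As a minimal sketch of the sliding window described above, the following Python fragment keeps only the last n pageviews of a session. The function name `active_window` and the example URLs are illustrative assumptions, not part of the described system.

```python
from collections import deque


def active_window(session, n):
    """Return the last n pageviews of the session, oldest first.

    `session` is the ordered list of pageviews visited so far.
    A deque with maxlen=n silently discards older entries, which
    mimics a sliding window of size n over the active session.
    """
    return list(deque(session, maxlen=n))


# Hypothetical session: the user follows two unrelated paths.
session = ["/home", "/products", "/products/a", "/support", "/support/faq"]
window = active_window(session, 3)
print(window)
```

With n = 3, only the three most recent pageviews remain available to influence the recommendation scores; earlier references drop out of the window.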

In our proposed architecture, both content and usage profiles are represented as sets of pageview-weight pairs. This allows both the active session and the profiles to be treated as n-dimensional vectors over the space of pageviews in the site. Thus, given a content or usage profile C, we can represent C as a vector $C = \langle w_1^C, w_2^C, \ldots, w_n^C \rangle$, where

$$w_i^C = \left\{ \begin{array}{ll} weight(p_i, C), & {\rm if}\; p_i \in C \\ 0, & {\rm otherwise} \end{array} \right.$$
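The mapping from a profile to its vector can be sketched as follows. This is only an illustration under stated assumptions: the profile is held as a Python dict from pageview to weight, and `profile_vector` and the page names are hypothetical.

```python
def profile_vector(profile, pageviews):
    """Expand a profile (dict: pageview -> weight) into a dense vector.

    `pageviews` fixes the ordering p_1, ..., p_n of all pageviews in
    the site. Component i is weight(p_i, C) if p_i is in the profile
    and 0 otherwise.
    """
    return [profile.get(p, 0.0) for p in pageviews]


# Hypothetical site with four pageviews and a profile covering two of them.
pageviews = ["A", "B", "C", "D"]
profile = {"B": 0.5, "D": 1.0}
print(profile_vector(profile, pageviews))
```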

Similarly, the current active session S is also represented as a vector $S = \left\langle {s_1, s_2, ... , s_n} \right\rangle$ , where s_i is a significance weight associated with the corresponding pageview reference, if the user has accessed p_i in this session, and s_i = 0, otherwise. We can compute the profile matching score using a similarity function such as the normalized cosine measure for vectors:

$$match(S,C) = \frac{\sum_k w_k^C \cdot s_k}{\sqrt{\sum_k (s_k)^2 \times \sum_k (w_k^C)^2}}.$$
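The matching score can be sketched in Python as below. The helper names, the default significance weight of 1.0 for a visited page, and the choice to return 0 when either vector is all zeros are assumptions made for illustration; the text does not specify them.

```python
import math


def session_vector(visited, pageviews, significance=None):
    """Build the session vector S over the ordered list `pageviews`.

    s_i is the significance weight of p_i if the user accessed it and
    0 otherwise. Visited pages without an explicit weight default to
    1.0 (an assumption of this sketch).
    """
    significance = significance or {}
    visited = set(visited)
    return [significance.get(p, 1.0) if p in visited else 0.0 for p in pageviews]


def match(s, c):
    """Normalized cosine measure between session vector s and profile vector c."""
    dot = sum(sk * wk for sk, wk in zip(s, c))
    denom = math.sqrt(sum(sk * sk for sk in s) * sum(wk * wk for wk in c))
    # An empty session or empty profile matches nothing (assumed convention).
    return dot / denom if denom else 0.0


pageviews = ["A", "B", "C"]
s = session_vector(["A", "B"], pageviews)  # [1.0, 1.0, 0.0]
c = [1.0, 0.0, 1.0]                        # profile containing A and C
print(match(s, c))
```

Here the session and the profile share one of their two pageviews each, so the dot product is 1 and the denominator is 2, giving a score of 0.5.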

Note that the matching score is normalized for the size of the clusters and the active session. This corresponds to the intuitive notion that we should see more of the user's active session before obtaining a better match with a larger cluster representing a user profile. Given a profile C and an active session S, a recommendation score, Rec(S, p), is computed for each pageview p in C as follows:

$$Rec(S,p) = \sqrt{weight(p,C) \cdot match(S,C)}.$$
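Because the score is the square root of a product, it is the geometric mean of the pageview's weight in the profile and the profile's matching score. A minimal sketch, assuming both inputs are non-negative and with `rec_score` as a hypothetical name:

```python
import math


def rec_score(weight, match_score):
    """Recommendation score: sqrt(weight(p, C) * match(S, C)).

    Both arguments are assumed to be non-negative.
    """
    return math.sqrt(weight * match_score)


# A pageview with weight 0.8 in a profile that matches the session at 0.5.
print(round(rec_score(0.8, 0.5), 4))
# A strongly weighted pageview in a profile that does not match scores zero.
print(rec_score(1.0, 0.0))
```

A high score therefore requires both a well-matched profile and a pageview that is significant within that profile; either factor being zero drives the score to zero.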

If the pageview p is already in the current active session, its recommendation value is set to zero. We obtain the usage recommendation set, UREC(S), for the current active session S by collecting from each usage profile all pageviews whose recommendation score satisfies a minimum recommendation threshold $\rho$, i.e.,

$$UREC(S) = \{ p_i \mid p_i \in C,\; C \in UP, {\rm\ and\ } Rec(S, p_i) \ge \rho \},$$

where UP is the collection of all usage profiles. Furthermore, for each pageview that is contributed by several usage profiles, we use its maximal recommendation score from all of the contributing profiles. In a similar manner, we can obtain the content recommendation set CREC(S) from content profiles. Different methods can be used for combining the two recommendation sets depending on the goals of personalization and the requirements of the site. In our case, for each pageview we take the maximum recommendation value across the two recommendation sets. This allows, for example, content profiles to contribute to the recommendation set even if no matching usage profile is available and vice versa.
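The whole procedure can be sketched end to end as follows. This is an illustration under stated assumptions rather than the system's implementation: sessions and profiles are dicts from pageview to weight, the threshold is positive, and the names `recommendation_set` and `combine` as well as the sample data are hypothetical.

```python
import math


def match(session, profile):
    """Normalized cosine measure between two sparse vectors (dicts)."""
    dot = sum(w * session.get(p, 0.0) for p, w in profile.items())
    denom = math.sqrt(sum(s * s for s in session.values())
                      * sum(w * w for w in profile.values()))
    return dot / denom if denom else 0.0


def recommendation_set(session, profiles, rho):
    """Collect pageviews scoring at least rho (assumed > 0) from the profiles.

    The same function serves usage profiles (UREC) and content profiles
    (CREC). A pageview contributed by several profiles keeps its maximal
    score.
    """
    recs = {}
    for profile in profiles:
        m = match(session, profile)
        for p, w in profile.items():
            if p in session:
                continue  # already visited: recommendation value is zero
            score = math.sqrt(w * m)
            if score >= rho:
                recs[p] = max(score, recs.get(p, 0.0))
    return recs


def combine(urec, crec):
    """Take, per pageview, the maximum score across the two sets."""
    return {p: max(urec.get(p, 0.0), crec.get(p, 0.0))
            for p in urec.keys() | crec.keys()}


session = {"A": 1.0, "B": 1.0}
usage_profiles = [{"A": 1.0, "B": 0.8, "C": 0.6}, {"C": 0.9, "D": 0.5}]
content_profiles = [{"A": 0.5, "E": 0.5}]

urec = recommendation_set(session, usage_profiles, rho=0.4)
crec = recommendation_set(session, content_profiles, rho=0.4)
print(sorted(combine(urec, crec).items()))
```

In this example pageview E is recommended only through a content profile, which illustrates how content profiles can contribute when no usage profile supplies a pageview.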


Bamshad Mobasher
2000-08-14