We conducted a series of experiments with real usage data from the site
for the newsletter of the Association for Consumer Research (from
July 1998 to June 1999). The site contains a variety of news items,
including President's columns, conference announcements, and
calls for papers for a number of conferences and journals.
The usage preprocessing steps described earlier resulted in a user
transaction file containing 18430 user transactions with a total of 62
pageviews represented uniquely by their associated URLs. The
transaction clustering process yielded 16 transaction clusters
representing different types of user access patterns. A threshold of
0.5 was used to derive usage profiles from transaction clusters (i.e.,
each profile contained only those pageviews appearing in at least 50% of
the transactions in its cluster). In the content preprocessing stage, a total of 566
significant features were extracted with each document contributing at
most 20 significant features to the global dictionary (normalized term
frequency was used for measuring feature significance). Feature
clustering using multivariate k-means resulted in 28 feature clusters
from which the corresponding (overlapping) content profiles were
derived.
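The thresholding step used to derive usage profiles from transaction clusters can be illustrated with a minimal sketch. It assumes each transaction is a set of pageview URLs and that a pageview's weight is the fraction of the cluster's transactions containing it; the function name `derive_profile` and the sample URLs are hypothetical and are not taken from the experiments.

```python
from collections import Counter


def derive_profile(cluster, threshold=0.5):
    """Return {pageview: weight} for one transaction cluster.

    Assumption: weight = fraction of the cluster's transactions that
    contain the pageview; pageviews below `threshold` are dropped.
    """
    if not cluster:
        return {}
    counts = Counter()
    for transaction in cluster:
        # Count each pageview at most once per transaction.
        counts.update(set(transaction))
    n = len(cluster)
    return {page: c / n for page, c in counts.items() if c / n >= threshold}


# Hypothetical cluster of four transactions (illustrative URLs only).
cluster = [
    {"conference_update.html", "call_for_papers.html"},
    {"conference_update.html", "presidents_column.html"},
    {"conference_update.html", "call_for_papers.html"},
    {"call_for_papers.html"},
]
profile = derive_profile(cluster)
```

In this toy cluster, the two pageviews that occur in three of the four transactions are retained, while the one that occurs in a single transaction falls below the 0.5 threshold and is excluded.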
Figure 2 depicts an example of two overlapping content
profiles. The top significant features are listed for each document
for illustrative purposes. These features indicate why each
pageview was included in each content profile. The first profile
captures those documents in which a portion of the content relates to
global and international business management and marketing. On the
other hand, the second profile includes documents about consumer
behavior and psychology in marketing. Note that documents which contain
content related to both topics have been included in both profiles.
Usage profiles are represented in the same manner, but they capture
overlapping aggregate usage patterns of the site users.
Figure 2: Two Overlapping Content Profiles
The recommendation engine was applied to a sample user session with a
window size of 2. Figure 3 shows the system recommendations
based only on usage profiles, while Figure 4 shows the
results from only the content profiles. In these tables, the first
column shows the pageviews contained in the current active session. The
last pageview in each session window represents the current location of
the user in the site. The right-hand column gives the recommendation
score obtained using the techniques discussed in the previous section.
It is clear from these examples that the combination of recommendations
from both content and usage profiles can provide added value to the
user. For example, in the usage-based recommendations, the user's visit
to "ACR Board of Directors Meeting" did not yield any
recommendations with a score above the specified threshold (0.5), while
content-based recommendations produced some pages with related content.
On the other hand, navigating to the page "Conference Update"
resulted in content-based recommendations for pages with only general
information and news about conferences, while usage profiles yielded a
number of specific recommendations for pages that site users interested in
conferences and calls for papers tend to visit.
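The session window and score threshold used in the sample session can be sketched as follows. This is only an illustration under stated assumptions: the scoring technique itself comes from the previous section and is not reproduced here, so candidate scores are passed in as a plain dictionary. The names `active_windows` and `filter_recommendations` are hypothetical, and treating a score equal to the threshold as acceptable is also an assumption.

```python
from collections import deque


def active_windows(session, window_size=2):
    """Yield the active session window after each pageview visit.

    The last element of each window is the user's current location.
    """
    window = deque(maxlen=window_size)
    for pageview in session:
        window.append(pageview)
        yield list(window)


def filter_recommendations(scores, window, threshold=0.5):
    """Keep candidates scoring at or above `threshold` (assumption)
    that are not already in the active window, highest score first."""
    kept = [(page, s) for page, s in scores.items()
            if s >= threshold and page not in window]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical session and scores (illustrative values only).
windows = list(active_windows(["A", "B", "C"]))
recs = filter_recommendations(
    {"X": 0.8, "B": 0.9, "Y": 0.3, "Z": 0.6}, window=["A", "B"]
)
```

With a window size of 2, only the two most recent pageviews influence the recommendations, which is why a visit to a page with few co-occurring pages in the usage profiles can produce no usage-based recommendations above the threshold.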
Figure 3: Recommendations Based on Usage Profiles
Figure 4: Recommendations Based on Content Profiles
Bamshad Mobasher
2000-08-14