We use precisely the same representation for content profiles, namely
a weighted collection of pageviews. In contrast to usage profiles,
content profiles represent the different ways in which pages with partly
similar content may be grouped together. Our goal here is to capture the
common interests of users in a group of pages whose contents are similar
in specific portions. Different groups of users may be interested in
different segments of each page; content profiles must therefore capture
overlapping interests of users.
Clusters of pageviews obtained using standard clustering algorithms,
which partition the data, are not appropriate as candidates for content
profiles, because a partition places each pageview in exactly one
cluster and so cannot express overlapping interests. To obtain content
profiles, instead of clustering pageviews (as k-dimensional feature
vectors, where k is the number of extracted features in the global site
dictionary), we cluster the features. Using the inverted
feature-pageview matrix obtained in the content preprocessing stage,
each feature can be viewed as an n-dimensional vector over the original
space of pageviews, where n is the number of pageviews. Thus, each
dimension in the pageview vector for a feature is the weight associated
with that feature in the corresponding pageview. We use a multivariate
k-means clustering technique to cluster these pageview vectors. Now,
given a feature cluster G, we construct a content profile C_G as a
set of pageview-weight pairs (p, weight(p, C_G)), where the significance
weight, weight(p, C_G), of the pageview p within the content profile is
obtained from the weights fw(p, f) of the features f in G, and fw(p, f)
is the weight of a feature f in pageview p. As in
the case of usage profiles, we normalize pageview weights so that the
maximum weight in each profile is 1, and we filter out pageviews whose
weight is below a specified significance threshold.
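As a rough illustration of the construction just described, the following Python sketch builds a content profile from one feature cluster. Averaging fw(p, f) over the features in G is an assumption made for this example; the function name `content_profile`, the dictionary layout, and the sample pages and weights are likewise hypothetical.

```python
def content_profile(feature_cluster, fw, threshold):
    """Build a content profile (pageview -> weight) from one feature cluster.

    feature_cluster: feature names forming the cluster G.
    fw: dict mapping each pageview to a dict of feature -> weight.
    threshold: significance threshold applied after normalization.
    """
    features = list(feature_cluster)
    if not features:
        return {}

    # ASSUMPTION: the raw weight of a pageview is the mean of fw(p, f)
    # over the features f in G. Missing features count as weight 0.
    raw = {
        p: sum(weights.get(f, 0.0) for f in features) / len(features)
        for p, weights in fw.items()
    }

    top = max(raw.values(), default=0.0)
    if top == 0.0:
        return {}

    # Normalize so the maximum weight in the profile is 1, then drop
    # pageviews whose normalized weight falls below the threshold.
    profile = {}
    for p, w in raw.items():
        normalized = w / top
        if normalized >= threshold:
            profile[p] = normalized
    return profile


# Hypothetical feature weights for three pageviews.
fw = {
    "A.html": {"music": 0.8, "jazz": 0.6},
    "B.html": {"music": 0.4, "film": 0.9},
    "C.html": {"film": 0.7},
}

profile = content_profile(["music", "jazz"], fw, threshold=0.25)
print(profile)  # A.html has weight 1.0; C.html is filtered out
```

Raising the threshold to 0.5 in this example would also drop B.html, leaving a profile that contains only A.html.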
Note that
the representation of content profiles as a set of pageview-weight
pairs is identical to that for usage profiles discussed earlier. This
uniform representation allows us to easily integrate both types of
profiles with the recommendation engine.
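The feature-clustering step that yields each cluster G can be sketched as follows. The Python below inverts pageview-to-feature weights into one vector per feature and groups those vectors with a plain k-means loop. Euclidean distance, the deterministic choice of the first k vectors as initial centroids, and all names and sample weights are assumptions for illustration, not details taken from the framework.

```python
def feature_vectors(fw, pageviews):
    """Invert pageview -> feature weights into feature -> vector over pageviews."""
    features = sorted({f for weights in fw.values() for f in weights})
    return {f: [fw[p].get(f, 0.0) for p in pageviews] for f in features}


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain k-means over named vectors; returns k lists of names."""
    names = list(vectors)
    # ASSUMPTION: seed the centroids with the first k vectors so the
    # example is deterministic; real runs usually use random restarts.
    centroids = [list(vectors[n]) for n in names[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each feature goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for n in names:
            dists = [squared_distance(vectors[n], c) for c in centroids]
            clusters[dists.index(min(dists))].append(n)
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for members, old in zip(clusters, centroids):
            if members:
                updated.append([
                    sum(vectors[m][i] for m in members) / len(members)
                    for i in range(len(old))
                ])
            else:
                updated.append(old)  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return clusters


# Hypothetical feature weights for three pageviews.
fw = {
    "A.html": {"music": 0.8, "jazz": 0.6},
    "B.html": {"music": 0.4, "film": 0.9},
    "C.html": {"film": 0.7},
}
pageviews = ["A.html", "B.html", "C.html"]

vectors = feature_vectors(fw, pageviews)
clusters = kmeans(vectors, k=2)
print(clusters)  # "film" on its own; "jazz" and "music" together
```

Because the features rather than the pageviews are partitioned, a pageview such as B.html, which carries weight for both "music" and "film", can still contribute to more than one profile.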
Bamshad Mobasher
2000-08-14