Title: Measuring the Accuracy of Sessionizers for Web Usage Analysis
Authors: Bettina Berendt, Bamshad Mobasher, Myra Spiliopoulou, Jim Wiltshire, Honghua Dai, Tao Luo, Miki Nakagawa
Abstract: Companies with a web presence rely on web usage analysis to obtain
insights into customer behavior, associations among products, and the impact
of advertisement banners, web marketing campaigns and product promotions.
The validity of these results depends heavily on the accurate
reconstruction of the visitors' activities in the web site. To this
end, many sites employ cookies that distinguish among different users
coming from the same proxy server or anonymizer. However, the set of
activities thus grouped together refers to the whole lifetime of a
cookie at the user's host. The activities performed during each visit
to the web site, the "sessions", are not grouped properly, thus
prohibiting the monitoring of changes in the user's behaviour and in
her interaction with the site during each session. The reconstruction
of user sessions, the so-called "sessionizing", is blurred by client
caches and multiple instantiations of the user's browser. Sessionizing
tools exploit information on the site's topology and statistics on
its usage, in order to assess the correct contents of a user session.
These tools are based on heuristic rules and on assumptions about the
site's usage, and are therefore prone to error. In this study, we
provide a formal framework for the evaluation of the accuracy of
sessionizing tools. We introduce a set of measures that compute the
extent to which real sessions are successfully reconstructed by
different sessionizers. The wide range of measures proposed reflects the
fact that some web usage analysis applications require exact
reconstruction of a session, while for others ordering and page
revisits are not important. On the basis of these measures, we
evaluate a number of sessionizing tools using the log data of a
real web site.
Full Paper:  [pdf]
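
The idea of scoring a heuristic sessionizer against known real sessions can be illustrated with a minimal Python sketch. It is not the framework or the tools evaluated in the paper: the time-gap heuristic, the 30-minute threshold, and the names sessionize_by_gap, exact_match_rate and set_match_rate are all assumptions made for illustration. The two measures mirror the distinction drawn in the abstract between applications that need exact reconstruction and those for which ordering and page revisits do not matter.

```python
from typing import List, Tuple

Request = Tuple[float, str]  # (timestamp in seconds, page identifier)
Session = List[str]          # ordered pages visited in one session


def sessionize_by_gap(requests: List[Request], max_gap: float = 1800.0) -> List[Session]:
    """Hypothetical heuristic: start a new session whenever two consecutive
    requests of the same user are more than max_gap seconds apart."""
    sessions: List[Session] = []
    current: Session = []
    last_time = None
    for timestamp, page in sorted(requests):
        if last_time is not None and timestamp - last_time > max_gap:
            sessions.append(current)
            current = []
        current.append(page)
        last_time = timestamp
    if current:
        sessions.append(current)
    return sessions


def exact_match_rate(real: List[Session], reconstructed: List[Session]) -> float:
    """Fraction of real sessions reproduced exactly (same pages, same order)."""
    if not real:
        return 0.0
    hits = sum(1 for session in real if session in reconstructed)
    return hits / len(real)


def set_match_rate(real: List[Session], reconstructed: List[Session]) -> float:
    """Fraction of real sessions whose page set is reproduced,
    ignoring ordering and page revisits."""
    if not real:
        return 0.0
    reconstructed_sets = [set(session) for session in reconstructed]
    hits = sum(1 for session in real if set(session) in reconstructed_sets)
    return hits / len(real)


# Illustrative data only: one user's requests and the assumed real sessions.
requests = [(0.0, "A"), (60.0, "B"), (4000.0, "C"), (4100.0, "A")]
real_sessions = [["A", "B"], ["A", "C"]]

reconstructed = sessionize_by_gap(requests)
print(reconstructed)
print(exact_match_rate(real_sessions, reconstructed))
print(set_match_rate(real_sessions, reconstructed))
```

In this toy data the second real session is recovered with the right pages but in a different order, so the strict measure penalizes it while the order-insensitive measure does not.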