Analysis of Web search engine clicked documents

David F. Nettleton, Liliana Calderón-Benavides, Ricardo Baeza-Yates

Research output: Book / Book Chapter / Report › Research Books › peer-review

1 Scopus citation


In this paper we process and analyze web search engine query and click data from the perspective of the documents (URLs) selected. We first define possible document categories and select descriptive variables to characterize the documents. The URL dataset is preprocessed and analyzed using traditional statistical methods, and then processed with the Kohonen SOM clustering technique [5], which we use to produce a two-level clustering. The clusters are interpreted in terms of the document categories and variables defined initially. We then apply the C4.5 [9] rule induction algorithm to produce a decision tree for the document category. The objective of the work is to apply a systematic data mining process to click data, contrasting unsupervised (Kohonen) and supervised (C4.5) methods to cluster and model the data, in order to identify document profiles that relate to theoretical user behavior and document (URL) organization.
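As a rough illustration of the two method families the abstract contrasts, the sketch below trains a very small Kohonen self-organizing map on a handful of documents and then scores the resulting cluster assignment with the gain ratio, the split criterion that C4.5 uses when growing a decision tree. It is not the authors' implementation: the two per-URL variables, the category labels, the 2x2 grid, and all function names are hypothetical, and only the split criterion of C4.5 is shown, not the full tree induction or pruning.

```python
import math
import random
from collections import Counter


def best_matching_unit(weights, row):
    """Return the grid cell whose weight vector is closest to `row`."""
    return min(
        weights,
        key=lambda cell: sum((w - x) ** 2 for w, x in zip(weights[cell], row)),
    )


def train_som(rows, grid_w=2, grid_h=2, epochs=50, lr0=0.5, seed=0):
    """Train a tiny Kohonen SOM; returns {grid cell: weight vector}."""
    rng = random.Random(seed)
    dim = len(rows[0])
    weights = {
        (gx, gy): [rng.random() for _ in range(dim)]
        for gx in range(grid_w)
        for gy in range(grid_h)
    }
    radius0 = max(grid_w, grid_h) / 2.0
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs          # shrinks linearly towards 0
        lr = lr0 * decay                      # learning rate for this epoch
        radius = max(radius0 * decay, 0.5)    # neighbourhood radius, floored
        for row in rows:
            bx, by = best_matching_unit(weights, row)
            for (gx, gy), w in weights.items():
                grid_dist2 = (gx - bx) ** 2 + (gy - by) ** 2
                influence = math.exp(-grid_dist2 / (2.0 * radius ** 2))
                for i in range(dim):
                    # Pull each unit towards the sample, weighted by grid distance.
                    w[i] += lr * influence * (row[i] - w[i])
    return weights


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5-style gain ratio of a discrete attribute with respect to the labels."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    split_info = entropy(values)
    if split_info == 0:
        return 0.0  # attribute takes a single value, so it cannot split the data
    return (entropy(labels) - remainder) / split_info


# Hypothetical, already normalized per-URL variables (assumed for illustration).
docs = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.9, 0.8), (0.8, 0.9), (0.85, 0.85)]
# Hypothetical document categories, one per URL.
categories = ["reference", "reference", "reference", "shopping", "shopping", "shopping"]

som = train_som(docs)
clusters = [best_matching_unit(som, d) for d in docs]  # unsupervised assignment
print(clusters)
print(gain_ratio(clusters, categories))  # how well the clusters predict the category
```

A gain ratio near 1 would suggest that the unsupervised clusters line up closely with the predefined categories, while a value near 0 would suggest they capture some other structure in the data.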

Original language: English
Title of host publication: Proceedings - LA-Web 06
Subtitle of host publication: Fourth Latin American Web Congress
Number of pages: 11
State: Published - 2006
Externally published: Yes
Event: LA-Web 06: 4th Latin American Web Congress - Cholula, Mexico
Duration: 25 Oct 2006 → 27 Oct 2006

Publication series

Name: Proceedings - LA-Web 06: Fourth Latin American Web Congress


Conference: LA-Web 06: 4th Latin American Web Congress

