TY - GEN
T1 - Analysis of Web search engine clicked documents
AU - Nettleton, David F.
AU - Calderón-Benavides, Liliana
AU - Baeza-Yates, Ricardo
PY - 2006
Y1 - 2006
N2 - In this paper we process and analyze web search engine query and click data from the perspective of the documents (URLs) selected. We first define possible document categories and select descriptive variables to characterize the documents. The URL dataset is preprocessed and analyzed using traditional statistical methods, and then processed by the Kohonen SOM clustering technique [5], which we use to produce a two-level clustering. The clusters are interpreted in terms of the document categories and variables defined initially. We then apply the C4.5 [9] rule induction algorithm to produce a decision tree for the document category. The objective of the work is to apply a systematic data mining process to click data, contrasting unsupervised (Kohonen) and supervised (C4.5) methods to cluster and model the data, in order to identify document profiles that relate to theoretical user behavior and document (URL) organization.
AB - In this paper we process and analyze web search engine query and click data from the perspective of the documents (URLs) selected. We first define possible document categories and select descriptive variables to characterize the documents. The URL dataset is preprocessed and analyzed using traditional statistical methods, and then processed by the Kohonen SOM clustering technique [5], which we use to produce a two-level clustering. The clusters are interpreted in terms of the document categories and variables defined initially. We then apply the C4.5 [9] rule induction algorithm to produce a decision tree for the document category. The objective of the work is to apply a systematic data mining process to click data, contrasting unsupervised (Kohonen) and supervised (C4.5) methods to cluster and model the data, in order to identify document profiles that relate to theoretical user behavior and document (URL) organization.
UR - http://www.scopus.com/inward/record.url?scp=34547685751&partnerID=8YFLogxK
U2 - 10.1109/LA-WEB.2006.6
DO - 10.1109/LA-WEB.2006.6
M3 - Research Books
AN - SCOPUS:34547685751
SN - 0769526934
SN - 9780769526935
T3 - Proceedings - LA-Web 06: Fourth Latin American Web Congress
SP - 209
EP - 219
BT - Proceedings - LA-Web 06
T2 - LA-Web 06: 4th Latin American Web Congress
Y2 - 25 October 2006 through 27 October 2006
ER -