TY - GEN
T1 - The intention behind web queries
AU - Baeza-Yates, Ricardo
AU - Calderón-Benavides, Liliana
AU - González-Caro, Cristina
PY - 2006
Y1 - 2006
AB - Identifying a user's intention or interest from the queries they submit to a search engine can be very useful for offering more adequate results. In this work we present a framework for automatically identifying user interest, based on the analysis of query logs. This identification is made from two perspectives: the objectives or goals of a user, and the categories in which these aims are situated. The queries were first classified manually to provide a reference point, and then supervised and unsupervised learning techniques were applied. The results show that supervised learning is a good option in a considerable number of cases; however, through unsupervised learning we found relationships between users and behaviors that are not easy to detect from the query words alone. Unsupervised learning also showed that some categories cannot be determined, whereas other classes that were not considered appear naturally after the clustering process. This allowed us to establish that combining supervised and unsupervised learning is a good alternative for finding users' goals. With supervised learning we can identify the user's interest given certain established goals and categories; with unsupervised learning we can validate the goals and categories used, refine them, and select those most appropriate to the user's needs.
UR - http://www.scopus.com/inward/record.url?scp=33750321322&partnerID=8YFLogxK
U2 - 10.1007/11880561_9
DO - 10.1007/11880561_9
M3 - Research books
AN - SCOPUS:33750321322
SN - 3540457747
SN - 9783540457749
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 98
EP - 109
BT - String Processing and Information Retrieval - 13th International Conference, SPIRE 2006, Proceedings
PB - Springer Verlag
T2 - 13th International Conference on String Processing and Information Retrieval, SPIRE 2006
Y2 - 11 October 2006 through 13 October 2006
ER -