TY - JOUR
T1 - Analyzing Medical Image Search Behavior
T2 - Semantics and Prediction of Query Results
AU - De-Arteaga, Maria
AU - Eggel, Ivan
AU - Kahn, Charles E.
AU - Müller, Henning
N1 - Publisher Copyright: © 2015, Society for Imaging Informatics in Medicine.
PY - 2015/10/23
Y1 - 2015/10/23
N2 - Log files of information retrieval systems that record user behavior have been used to improve the outcomes of retrieval systems, understand user behavior, and predict events. In this article, a log file of the ARRS GoldMiner search engine containing 222,005 consecutive queries is analyzed. Time stamps are available for each query, as well as masked IP addresses, which makes it possible to identify queries from the same person. This article describes the ways in which physicians (or Internet searchers interested in medical images) search and proposes potential improvements by suggesting query modifications. For example, many queries contain only a few terms and are therefore not specific; others contain spelling mistakes or non-medical terms that likely lead to poor or empty results. One of the goals of this report is to predict the number of results a query will have, since such a model allows search engines to automatically propose query modifications in order to avoid result lists that are empty or too large. This prediction is based on characteristics of the query terms themselves. Prediction of empty results has an accuracy above 88%, and thus can be used to automatically modify the query to avoid empty result sets for a user. The semantic analysis and the data on reformulations made by users in the past can aid the development of better search systems, particularly to improve results for novice users. This paper therefore offers ideas for better understanding how people search and how to use this knowledge to improve the performance of specialized medical search engines.
KW - Human-computer interaction
KW - Image retrieval
KW - Information storage and retrieval
KW - Log file analysis
KW - Machine learning
KW - Medical image search
KW - Statistic analysis
UR - https://www.scopus.com/pages/publications/84941997230
U2 - 10.1007/s10278-015-9792-6
DO - 10.1007/s10278-015-9792-6
M3 - Article
C2 - 25810317
AN - SCOPUS:84941997230
SN - 0897-1889
VL - 28
SP - 537
EP - 546
JO - Journal of Digital Imaging
JF - Journal of Digital Imaging
IS - 5
ER -