WWW 2007 / Poster Paper
Topic: Search
MedSearch: A Specialized Search Engine
for Medical Information
University of Massachusetts – Amherst
{luog, ctang, haoyang}@us.ibm.com  xwei@cs.umass.edu

Copyright is held by the author/owner(s).
WWW 2007, May 8–12, 2007, Banff, Alberta, Canada.

ABSTRACT
People are thirsty for medical information. Existing Web search engines cannot handle medical search well because they do not consider its special requirements. Often a medical information searcher is uncertain about his exact questions and unfamiliar with medical terminology. Therefore, he prefers to pose long queries, describing his symptoms and situation in plain English, and receive comprehensive, relevant information from search results. This paper presents MedSearch, a specialized medical Web search engine, to address these challenges. MedSearch can assist ordinary Internet users in searching for medical information by accepting queries of extended length, providing diversified search results, and suggesting related medical phrases. A full version of this paper is available in [1].

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: search process

General Terms: Algorithms, Experimentation

Keywords: medical query, medical Web search engine

1. INTRODUCTION
Health care is a major business in many countries and a large part of this business is related to the management and retrieval of medical information. To help people acquire medical information in the Web era, many medical Web search engines (e.g., Healthline and Google Health) have come into existence. While these systems have their own merits, they all treat medical search in much the same way as traditional Web search.

Medical search has several unique requirements that distinguish it from traditional Web search. A common scenario in which a person performs medical search is that he feels uncomfortable but is uncertain about his exact medical problems. In this case, the searcher usually prefers to learn all kinds of knowledge that is related to his situation. However, existing medical Web search engines are optimized for precision and concentrate their search results on a few topics. This lack-of-diversity problem is aggravated by the nature of medical Web pages. When discussing a medical topic, many medical Web sites use similar, but not identical, descriptions by paraphrasing contents in medical textbooks and research papers. Hence, search results provided by existing medical Web search engines often contain much semantic redundancy, which cannot be easily handled by existing methods for identifying near-duplicate documents or result diversification. To find useful medical information, the searcher often has to go through a large number of Web pages.

Another unique feature of medical search is due to the fact that most Internet users do not have much medical knowledge. A medical information searcher is often unclear about the problem that he is facing and unaware of the related medical terminology (e.g., panophthalmitis). As a result, it is difficult for him to choose a few accurate medical phrases as a starting point for his search. Instead, considering the importance of his health, the searcher is typically willing to take his time to describe his situation in detail (e.g., his medical history, where and how he feels uncomfortable, and what happened in the last several days) by posing long queries in plain English, much like the way he talks to a doctor. Actually, many medical questions that people posted on medical forums contain several hundred words, and a recent study on medical queries [2] has reported that medical information searchers prefer to pose detailed long questions to Web search engines. Figure 1 shows one such question (www.medhelp.org/forums/RespiratoryDisorders/messages/2584.html).

My 23 month old son has been coughing since 6 months old … Seems to be constantly on antibiotics for every kind of chest infection, on pulmicort, albuterol 2x's a day, constant ear infections (tubes, adnoids, and tonsils are scheduled), chronic loose stools. Seen an allergist, he has lots of environmental allergies, did all the mattres covers, rugs are gone, air purifier in. All this to no avail. Chest xray showed streaking in the main bronch tubes (?) perihilar stuff hazy areas, left lobe is alot grayer than the right. … Went to pedi pulmonologist in Boston, scheduled for sweat test on Friday, he doesnt think he has it, but wants to rule out CF. He wants to do CT and bronchoscope next week. Mentioned something about poss. deformed broch tubes, or weak lung walls, or even a cyst compressing his lungs causing this cough … what are the possibilities he has a verison of pulmonary micobacterial infection?

Figure 1. An exemplary medical question posted on the Med Help International Medical and Health Forum (www.medhelp.org/forums.htm).

Even after stopword removal, the above query still cannot be fed directly into existing medical Web search engines, because they all impose certain limits on query length for various reasons. For instance, Google truncates long queries to its length limit of 32 words. Such a low limit on query length is a serious obstacle for medical information searchers. Moreover, a medical information searcher often prefers the search engine to automatically suggest diversified, related medical phrases that can help him quickly digest search results and refine his query. However, this cannot be done with existing medical Web search engines when the query is written as a plain English description and has a terminological discrepancy with medical phrases.

2. MEDSEARCH
In this paper, we present MedSearch, a prototype medical Web search engine that addresses the aforementioned limitations of existing systems. MedSearch uses several key techniques that greatly improve its usability and the quality of search results. First, MedSearch accepts queries of extended length and supports the use of plain English description. This is a great convenience for the majority of Internet users who do not have much medical knowledge. MedSearch automatically rewrites long queries into moderate-length queries by selectively dropping unimportant terms (i.e., words). Since unimportant terms not only appear in a large number of Web pages but also obscure the main theme of the query, dropping them can both greatly increase the query processing speed and improve the quality of search results. Second, MedSearch returns diversified Web pages without significantly increasing query processing time or deteriorating the quality of the returned top Web pages, which allows the searcher to see various aspects related to his situation. Third, MedSearch automatically suggests diversified, related medical phrases to the searcher based on information from several sources, including the standard MeSH medical ontology (www.nlm.nih.gov/mesh/meshhome.html) and the collection of crawled Web pages.

There are several key challenges in designing MedSearch. In order to rewrite long queries into moderate-length queries, we must aggressively drop unimportant terms yet avoid losing much useful information. For this purpose, we rank all the terms in the query according to the Okapi term weighting formula. Those terms with small weights are treated as unimportant ones and dropped from the query.

One major challenge in providing diversified search results is to efficiently handle the excessive redundancy among different medical Web pages. For this purpose, all the crawled Web pages are clustered into multiple clusters in a pre-processing step. Each of these clusters roughly corresponds to a different topic. When ranking Web pages, each cluster can contribute only a limited number of results to the returned top few Web pages. Then the searcher is likely to see different aspects in the top results.

The process of suggesting related medical phrases consists of two sub-steps. The first sub-step is to generate the candidate set S of related medical phrases in the MeSH ontology. The second sub-step is to rank the medical phrases in S.

In the first sub-step, MedSearch selects V=60 medical phrases from the returned top-20 Web pages. The suggested medical phrases need to be both relevant and diverse in order to provide the greatest convenience to the searcher. Intuitively, to ensure that a medical phrase M is relevant, it is better for M to appear in one of the returned top Web pages with a large tf×idf value that is computed using the Okapi formula. To ensure enough diversity in the list of suggested medical phrases, a single Web page should not contribute too many medical phrases to that list. We use a continuous discounting method to achieve these two goals. Each time a medical phrase is chosen from a Web page P, a discount is given to the tf×idf values of the remaining medical phrases in P. As a result, the more medical phrases have already been chosen from P, the harder it is for the remaining medical phrases in P to be chosen later. We select V medical phrases in V passes. In each pass, we select a medical phrase with the largest discounted tf×idf value.

The main challenge in the second sub-step of ranking the suggested medical phrases is to resolve the terminological discrepancy between medical phrases and queries written in plain English. For this purpose, a set of representative Web pages is computed offline for each medical phrase M, by using M to retrieve the top-ranked Web pages. Since a large part of these high-quality representative Web pages are written in plain English, they provide good linkages between medical terminology and plain English words. The relevance between a query Q and a medical phrase M is computed as a function of the relevance scores between Q and M's representative Web pages. Then all the suggested medical phrases are sorted in descending order of their relevance scores. A detailed description of our techniques is available in [1].

3. RESULTS
To demonstrate the effectiveness of our techniques, we conducted experiments using 30 representative medical questions that people posted on a popular medical forum, the Med Help International Medical and Health Forum (www.medhelp.org/forums.htm). One such query is shown in Figure 1. We crawled 6GB of Web pages from WebMD (www.webmd.com), one of the most popular medical Web sites.

Both relevance and diversity are judged using a single metric: usefulness. A returned Web page P is useful if P is relevant to the query, and much of P's relevant content has not been mentioned in the Web pages that are ranked higher. If P is useful, its usefulness score score(P) = 1; otherwise, score(P) = 0. A similar definition of usefulness holds for the suggested medical phrases. For the returned top-20 Web pages, the weighted average usefulness score is defined as

score = ∑ score(P_i) / …

For the suggested 60 medical phrases, their weighted average usefulness score is defined similarly. The mean of the weighted average usefulness scores over the 30 queries is the main quality metric for the returned pages and the suggested phrases.

Five colleagues served as assessors and independently determined the usefulness scores of the returned Web pages and the suggested medical phrases. None of them has formal medical training.

To give the reader a feeling of the contents returned by MedSearch, we present detailed results of the returned Web pages and the suggested medical phrases for the query in Figure 1. Table 1 shows some of the returned relevant Web pages. The suggested relevant medical phrases include bronchoscopy (rank 1), bronchitis (rank 2), and sarcoidosis (rank 4). In general, for a medical query Q, MedSearch can find several relevant Web pages and medical phrases that cover multiple aspects of Q.

Table 1. Some of the returned relevant Web pages.

The means of the weighted average usefulness scores over the 30 queries for the returned top-20 Web pages and the suggested 60 medical phrases are 7.9 and 6.1, respectively. We present a simple calculation below to give the reader some intuition on these numbers. Let ws_i denote the weighted average usefulness score when the returned top-i Web pages (or medical phrases) are useful while the others are not useful. Then ws …

Our results show that MedSearch can process long queries efficiently, at a speed roughly comparable to that of existing medical Web search engines in processing short queries. Our experiments also show that users' satisfaction is crucially tied to MedSearch's capability of returning diversified Web pages and suggesting diversified, related medical phrases that can help users quickly understand the returned pages and refine their queries.

4. REFERENCES
[1] Full version of this paper is available at http://www.cs.wisc.edu/~gangluo/medsearch.pdf.
[2] A. Spink, Y. Yang, and J. Jansen et al. A Study of Medical and Health Queries to Web Search Engines. Health Information and …
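
The cluster-based diversification described in Section 2 (each cluster may contribute only a limited number of pages to the top results) can be illustrated with a minimal Python sketch. It is not the MedSearch implementation: the function name diversified_top_k, the per_cluster_cap parameter, and the toy page and cluster labels are all hypothetical, and the clustering itself is assumed to have been done beforehand.

```python
from collections import defaultdict


def diversified_top_k(ranked_pages, cluster_of, k, per_cluster_cap):
    """Walk a relevance-ranked list and keep at most per_cluster_cap pages per cluster.

    ranked_pages: page ids, most relevant first (assumed precomputed).
    cluster_of:   mapping page id -> cluster id (assumed precomputed offline).
    """
    taken = defaultdict(int)  # pages already accepted from each cluster
    result = []
    for page in ranked_pages:
        cluster = cluster_of[page]
        if taken[cluster] >= per_cluster_cap:
            continue  # this cluster has used up its quota, skip the page
        taken[cluster] += 1
        result.append(page)
        if len(result) == k:
            break
    return result


# Hypothetical toy data: six pages in three topical clusters.
ranked = ["p1", "p2", "p3", "p4", "p5", "p6"]
clusters = {"p1": "asthma", "p2": "asthma", "p3": "asthma",
            "p4": "cf", "p5": "cf", "p6": "allergy"}
print(diversified_top_k(ranked, clusters, k=4, per_cluster_cap=2))
```

With a cap of 2, the third "asthma" page is skipped so that pages from another cluster enter the top results. A smaller cap gives more topical variety at the cost of pushing some highly ranked pages out.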
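
The continuous discounting method described in Section 2 (V passes, each picking the phrase with the largest discounted tf×idf value) can be sketched as follows. This is only an illustration under stated assumptions: the paper does not give the discount schedule here, so a geometric factor (discount raised to the number of phrases already taken from the page) is assumed, and the function name select_phrases, the discount parameter, and the toy tf×idf values are hypothetical.

```python
def select_phrases(page_phrase_scores, v, discount=0.5):
    """Pick up to v phrases in v passes with per-page discounting.

    page_phrase_scores: {page: {phrase: tfidf}} for the returned top pages.
    discount: assumed geometric factor applied once per phrase already
              chosen from the same page (not taken from the paper).
    """
    picks_from = {page: 0 for page in page_phrase_scores}
    chosen = []
    for _ in range(v):
        best = None  # (discounted score, phrase, page)
        for page, phrases in page_phrase_scores.items():
            factor = discount ** picks_from[page]
            for phrase, tfidf in phrases.items():
                if phrase in chosen:
                    continue  # each phrase is suggested at most once
                score = tfidf * factor
                if best is None or score > best[0]:
                    best = (score, phrase, page)
        if best is None:
            break  # fewer than v distinct phrases are available
        chosen.append(best[1])
        picks_from[best[2]] += 1  # later picks from this page are discounted more
    return chosen


# Hypothetical tf-idf values for two returned pages.
pages = {
    "A": {"bronchoscopy": 9.0, "bronchitis": 8.0, "asthma": 7.0},
    "B": {"sarcoidosis": 5.0, "bronchitis": 3.0},
}
print(select_phrases(pages, v=3))
```

In this toy run, page A supplies the first phrase, after which its remaining phrases are halved, so page B's best phrase wins the second pass. With discount=1.0 all three picks would come from page A, which is the lack of diversity the discount is meant to prevent.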

Source: http://www.ra.ethz.ch/cdstore/www2007/www2007.org/posters/poster858.pdf
