
Intuitive Querying of e-Health Data Repositories
Catalina Hallett, Richard Power, Donia Scott
{C.Hallett, R.Power, D.Scott}@open.ac.uk
Abstract
At the centre of the Clinical e-Science Framework (CLEF) project is a repository of well organised, detailed clinical histories, encoded as data that will be available for use in clinical care and in-silico medical experiments. An integral part of the CLEF workbench is a tool to allow biomedical researchers and clinicians to query – in an intuitive way – the repository of patient data. This paper describes the CLEF query editing interface, which makes use of natural language generation techniques in order to alleviate some of the problems generally faced by natural language and graphical query interfaces. The query interface also incorporates an answer renderer that dynamically generates responses in both natural language text and graphics.
Background
[...] aims at providing a data repository of well organised clinical histories, which can be queried and summarised both for biomedical research [...] of the query interface is to provide efficient access to aggregated data for performing a variety of tasks, e.g., assisting in diagnosis or treatment, identifying patterns in treatment, selecting subjects for clinical trials, monitoring the participants in clinical trials. The intended users of this service are clinicians, biomedical researchers, [...]
Our current domain is cancer; however, the framework in principle supports a wide range of [...]
An analysis of free text queries written by medical professionals shows that they are mostly [...] makes the design of the query interface to the CLEF repository particularly difficult, since our users will need to construct complex queries containing conditional and temporal structures.
The CLEF repository of clinical histories currently contains some 20000 records of cancer [...] ICD, and is implemented as a relational database that stores patient records modeled on the archetype for cancer developed at UCL [...]
[...] databases involves expressing queries in a language that is understood by the database management system (typically SQL). Direct SQL querying requires specialist knowledge of both the query language and the structure of the underlying database, and, in the case of medical databases, usually also knowledge of precise [...] be counter-productive to require this additional level of technical expertise of the clinicians and biomedical researchers who want to access the [...]
Attempts to overcome this problem in user interfaces to medical databases have traditionally made use of graphical devices such as forms, diagrams, menus, or pointers to communicate to the user the information content of a database (e.g., KNAVE (Shahar and Cheng, 1999) and TrialDB (Deshpande et al., 2001)), and research shows that they are much preferred over textual query languages such as SQL, especially by [...] empirical studies have reported high error rates by domain experts using graphical modelling tools (Kim, 1990) and a clear advantage of text over graphics for understanding nested or conditional structures (Petre, 1995).
However, it is also well-known that queries expressed in free natural language are sensitive [...] ungrammaticalities) or processing (at the lexical, syntactic or semantic level). A further drawback of natural language interfaces to databases is that such systems normally understand only a subset of natural language, and it is not always clear to casual users which are the valid constructions and whether the lack of response from the system is due to the unavailability of an answer or to an unaccepted input construction. On the positive side, natural language is far more expressive than SQL, so it is generally easier to ask complex questions and manipulate temporal constructions using natural language than using a database [...]
Query analysis
Types of queries
An analysis of real queries from clinical trials and invented queries supplied by clinicians identified two general types of queries, as exemplified [...]
In the first example, the expected answer is a comparison between a certain statistical measure (in this case, percentage) applied to two groups of patients differentiated by the treatment [...] a statistical measure (average) computed for a certain parameter (number of investigations of type "body scan") of a group of patients with [...]
For either of these queries, the attributes involved in constructing the query can vary within a certain range: any statistical measure can be used, the differentiating parameter could be the diagnosis instead of the treatment, etc.
Additionally, there are a number of variations to these two main types of queries. For both types, the user may ask for simple assessment [...]
There are also cases where several similar queries are combined into one more complex [...]
For all these queries, there is practically no limit to the complexity that can be achieved [...] description can in fact be a conjunction or disjunction of diagnoses, and the same applies for every concept included in a query. Therefore, the [...]
The CLEF query interface
The CLEF query system is designed to answer questions relating to patterns in medical histories over sets of patients in the data repository. The current interface is designed for casual and moderate users who are familiar with the semantic domain of the repository but not with its technical implementation (e.g., clinicians, medical researchers and hospital administrators). For the reasons we described above, the guiding principle in the design of our interface is that its use requires no prior knowledge of the structure of the repository, no expertise in database access languages such as SQL, no familiarity with medical codes, and only minimal prior [...] repository is not through SQL, or graphics or free text. Instead, query construction is performed by interacting with an automatically generated natural language feedback text (currently only English). This interaction method, based on the [...] et al (Power and Scott, 1998), allows users of the profile described above to construct, in an intuitive way, unambiguous, syntactically correct, complex natural language queries, such [...]
Query editing interface
General features
Conceptual authoring through WYSIWYM editing (Power and Scott, 1998) alleviates the need for expensive syntactic and semantic processing of the queries by providing the users with an interface for editing the conceptual meaning of a query instead of the surface text.
The WYSIWYM interface presents the contents of a knowledge base to the user in the form of a feedback text. In the case of query editing, the content of the knowledge base is a yet to be completed formal representation of the user’s [...] a natural language text that corresponds with the incomplete query and guides them towards editing a semantically consistent and complete [...] control the interpretation that the system gives [...] a basic query frame, where concepts to be instantiated (anchors) are clickable spans of text with associated pop-up menus containing options for expanding the query. For example, one can start constructing a query that asks for a group of patients fulfilling some conditions by editing the following description:
Relevant subjects:
Treatment profile:
received [some treatment]
Outcome: [measure] of [patients with
Once the user selects an anchor and a new value for the concept represented by the anchor, the semantic representation of the query is updated and a new text is generated on the basis [...] combination of features or events of the same type, thus allowing for complex queries, with nested conditional structures to be built. Some concept instances can also be typed in manually, which is useful for numerical values or other fields with unpredictable content, such as names. This is also a way of enriching the ontology with new concepts. Figure 1 is a snapshot of the query editor with a partially constructed query.
[...] selection over the feedback text is treated as an intermediate query, which is sent to the DBMS. In return, the DBMS will transmit to the interface a feedback answer. At this point, the feedback answer is a set of paired values representing the number of patient records that match the query and the percentage from the total number of records. There is also a further breakdown of patient records by sex, which was considered a good discriminatory feature. For example, for an intermediate query such as Number of patients over the age of 60, the feedback answer could be 100 records (20% of 500), 55 men (55%), 45 [...]
As a further consistency checking mechanism, the interface provides an additional rendering of the query in running text, which is performed once the editing of the feedback query has been completed. The user is presented with an alternative natural language query corresponding to the structure that has been edited (output [...] schematic to allow for more intuitive editing, the output query resembles in every respect a free text query, thus being more natural and easier to [...]
The natural language interface is database-independent, since it does not require any knowledge of the database structure. [...] structure of the database is not only completely transparent to the user, but also to the interface developer: changes at the database level require no changes in the query editor. Queries can be saved for later re-use, which is particularly useful for frequent users who formulate queries with [...]
[...] supported by the query editor, and they are not considered separate types of queries, nor [...]
Modeling queries
For presentation reasons, queries have to be decomposed into constituents that can be easily edited by the user. By way of exemplification, let us consider the query type (1). There are three elements to the query: the set of relevant patients, defined by a problem; the partition of this set according to treatment; and the further partition according to outcome, from which the [...] complicated sentences, we consider a format in which these elements are presented separately:
Relevant subjects:
Treatment profiles:
Outcome measure:
This breakdown allows the following basic [...]
Relevant subjects: [Some patients]
Treatment profiles:
Outcome measure:
Each of the bracketed elements is a complex description that models the concept definition in the CLEF archetype. For example, the concept diagnosis consists of the following obligatory and optional components: tumour name, locus, type (metastatic, primary, secondary) and TNM staging code. Each of the subcomponents can be extended through boolean operations (negation, [...]
Dealing with ambiguities
Since the processing of an edited query is deterministic and transparent to the user, the main challenge is not to construct valid database queries from edited queries but to ensure that the query the user is editing corresponds to the intended meaning. Therefore we want to ensure that the layout of the query conveys one meaning [...] based on the analysis of some real queries that could be given multiple interpretations. Several categories of possible ambiguities are presented below, along with the solution provided by the [...]
When the phrase describing a relevance set includes a conjunction or disjunction, there may be ambiguity over whether the intended query is single or multiple. Compare these three patterns: [...]
Example 7a is likely to be interpreted as two separate queries, while the others are ambiguous. Disjunctions like 7c occur often in real life [...]
In this case, it is not clear if separate [...] myelodysplastic syndrome only and for acute myelogenous leukaemia caused by bad prognosis myelodysplastic syndrome, or if it makes sense to give a single answer lumping these two groups [...] feedback texts by using different realisations for conjunctions/disjunctions that imply multiple relevance sets, and conjunctions/disjunctions that do not. For example, we use bulleted lists for the former, and conjunction words (and, or) [...]
[...] of age who have had bad prognosis myelodysplastic syndrome only for at [...] years of age who have had acute myelogenous leukaemia caused by bad [...]
Specifying constraints and temporal relations
Guiding users towards editing correct and complete queries is essential and is one of the main points where our approach improves on classical natural language query interfaces. This is achieved by defining and implementing [...]
Static (or ontological) constraints relate
to the structure of the queries as defined in the query model. This includes specifying the super-class of an instance (for example, the anchor cancer can only be instantiated with names of cancers), its type (for example, age is numeric and editable, while cancer is a static string) and its status (compulsory vs optional).
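As an illustration only, the three kinds of static constraint named here (super-class, type and status) could be checked along the following lines. This is a minimal Python sketch under stated assumptions, not the CLEF implementation: the `Anchor` class, the `check_static` function and the toy `ONTOLOGY` table are hypothetical names introduced for the example, and the choice of which anchors are compulsory is invented.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy ontology (concept -> super-class); not taken from CLEF.
ONTOLOGY = {
    "nephroblastoma": "cancer",
    "adenocarcinoma": "cancer",
    "chemotherapy": "treatment",
}


@dataclass(frozen=True)
class Anchor:
    """An editable slot in the query, with its static constraints."""
    name: str
    superclass: Optional[str]  # required super-class of a static string value
    numeric: bool              # True: editable number, False: static string
    compulsory: bool           # status: compulsory vs optional


def check_static(anchor, value):
    """Return the list of violated static constraints (empty if valid)."""
    if value is None:
        return ["missing compulsory value"] if anchor.compulsory else []
    if anchor.numeric:
        return [] if isinstance(value, (int, float)) else ["expected a number"]
    if ONTOLOGY.get(value) != anchor.superclass:
        return [f"{value!r} is not a kind of {anchor.superclass}"]
    return []


# Assumed example anchors: "cancer" is a compulsory static string,
# "age" an optional editable number.
cancer = Anchor("cancer", superclass="cancer", numeric=False, compulsory=True)
age = Anchor("age", superclass=None, numeric=True, compulsory=False)
```

With these definitions, `check_static(cancer, "nephroblastoma")` returns an empty list, while instantiating the cancer anchor with a treatment name or leaving it empty is reported as a violation.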
Dynamic constraints are triggered at runtime by the user selection of certain instances. Most constraints simply serve the role of restricting the user selection so that the resulting query is meaningful and intelligible. In other cases, however, allowing the user to construct queries [...] constraints could yield ambiguous queries.
Dynamic constraints can be either conceptual, which are compiled from a medical knowledge base and represent dependencies between medical concepts (for example, nephroblastoma is a type of kidney cancer, so users shouldn’t be allowed to query for nephroblastoma in the left breast), or numerical (for example, patients between 60 and 30 years of age is a disallowed construction).
As medical records mirror the evolution in time of a patient, it is important to be able to access the patient’s status at a certain point [...] natural language is an important advantage of natural language query interfaces over graphical interfaces. All temporal concepts in the medical record are stamped with a valid time stamp, [...] event took place. Typically, a time interval is represented as a pair of start and end dates, where start and end are discrete time values of a certain [...] associates specific linguistic expressions to time intervals. For example, between [date 1] and [date 2] is interpreted as a closed interval [date 1, date 2], in [this year] is interpreted as [01/01/this [...] cover most temporal queries, such as: patients diagnosed with cancer before 1999, patients who received chemotherapy within 5 months of [...]
1 to a certain level of granularity imposed by the representation of time instances in the database
In 9a we have two relevance sets; in 9b we [...]
Similar ambiguities can be found when several treatment profiles are mentioned, or several outcome measures. In each case, the ambiguity can be avoided in the WYSIWYM feedback texts the same way as before, by using bullets to mark [...] properties. A description can be elaborate either because it contains many boolean operators, [...] boolean combinations in running prose means that the scope of the operators can become ambiguous to the user. For this reason, layout is used to present boolean combinations more [...]
[Table columns: Gender, Age, adenocarcinoma, small cell carcinoma, squamous cell carcinoma, death]
[...] in 4 age groups according to their gender and histopathology diagnosis. 42 patients have been returned as a result to your query:
-in the 29-38 years age group there were 1 patients (0 men and 1 woman): all patients were diagnosed with adenocarcinoma. [.]
-in the 49-58 age group, there were 27 patients (14 men and 13 women): 11 were diagnosed with adenocarcinoma, 5 were diagnosed with squamous cell carcinoma, 11 were diagnosed [...]
Conclusions and further work
We have presented in this paper a query interface to a repository of patient records which makes use of natural language generation techniques. The query interface allows the editing of complex queries and is a viable alternative to natural language interfaces and visual query interfaces to medical databases. Answers to queries are provided in textual format using natural language generation techniques and also as tables and charts. The main features that set our approach apart from other querying interfaces to medical [...]
[...] users require little training for using the [...]
[...] a set of semantic constraints are used to guide users towards constructing valid queries only, therefore incorrect queries are [...]
[...] since ambiguity is dealt with in the editing [...]
Answer generation
A typical result set received from the DBMS consists of lists of patients that fulfilled the requirements of the query, for each patient having specified the age, gender, and the values for each of the query elements. For example, a query such as Select all patients between the ages of 30 and 60 with a clinical diagnosis of malignant neoplasm of bronchus or lungs and histopathology diagnosis of adenocarcinoma, small cell carcinoma or squamous cell carcinoma, who were alive after 10 years of the diagnosis, may yield the result set [...]
The result set is processed in such a way as to allow the rendering of various groups of patients according to the age/gender breakdown and each individual query term. For each individual search parameter, the data are split into a dynamically determined number of age groups, and for each age group the number of patients is further split according to their gender. The result set thus processed is presented to the user in three types of format: tables, charts and text. Each individual chart also contains an automatically generated caption that explains the content of the chart.
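The age and gender split described above can be sketched as follows. This is a hedged Python illustration, not the CLEF answer renderer: the function name `age_gender_breakdown`, the fixed `width` of ten years and the sample records are assumptions made for the example, whereas the system described here determines the number of age groups dynamically.

```python
from collections import Counter, defaultdict


def age_gender_breakdown(patients, width=10):
    """Group (age, gender) records into age bands and count genders per band.

    `width` is a hypothetical fixed band size; the dynamic choice of the
    number of age groups described in the text is not reproduced here.
    """
    if not patients:
        return {}
    youngest = min(age for age, _ in patients)
    groups = defaultdict(Counter)
    for age, gender in patients:
        # Bands start at the youngest age found in the result set.
        start = youngest + ((age - youngest) // width) * width
        groups[(start, start + width - 1)][gender] += 1
    return {band: dict(groups[band]) for band in sorted(groups)}


# Invented sample records, not data from the repository.
sample = [(29, "F"), (41, "M"), (52, "M"), (55, "F"), (58, "M")]
breakdown = age_gender_breakdown(sample)
```

Each key of `breakdown` is an inclusive age band and each value holds the per-gender counts, which is the shape a table, a chart or a generated sentence would then be rendered from.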
The captions are generated using template-based techniques, where fillers are provided by the same result set that was used for generating the chart. For the bar chart in Fig. 3, a fragment of the explanation provided in the caption reads: This chart displays the distribution of patients [...]
Figure 3: Generated bar chart: histopathology diagnosis/age/gender breakdown
[...] the query interface has wider applicability [...]
Whilst the query editing interface is fully implemented, extending the range of queries supported is an ongoing effort. This is performed in parallel with an evaluation of the usability and user-friendliness of the interface. It is expected that the evaluation will help formulate an extended range of queries and improve the editing interface. The improved query interface will provide means of interactively defining default values for instances that support them (for example, one may want to default all index events to the date of the first diagnosis). We also plan to extend the range of temporal operators to include, for example, trend operators for clinical [...] blood pressure, stationary haemoglobin count) and define independent variables for reporting statistical results (such as age groups, sex, education level).
References
M. Petre. 1995. Why looking isn’t always [...]
[...] texts. In Proceedings of 17th International Conference on Computational Linguistics [...] Association for Computational Linguistics (COLING-ACL 98), pages 1053–1059, [...]
[...] Intelligent visualization and exploration [...] Proceedings of HICSS, Maui, Hawaii.
A. Deshpande, C. Brandt, and P. Nadkarni. [...] Meeting the needs of clinical studies. Journal [...] Informatics Association, 9(4):369–382.
Dipak Kalra, Anthony Austin, A. O’Connor, [...] Implementation of a Federated Health Record Server, pages 1–13. [...] Records Institute for the Centre for Advancement of Electronic Records Ltd.
Y. Kim. 1990. Effects of conceptual data modelling formalisms on user validation and analyst modelling of information requirements. Ph.D. thesis, University of Minnesota.

Source: http://www.allhands.org.uk/2005/proceedings/papers/336.pdf
