[...]-issue anaphora. We propose a candidate-ranking model for this-issue anaphora resolution that explores different issue-specific and general abstract-anaphora features. The model is not restricted to nominal or verbal antecedents; rather, it is able to identify antecedents that are arbitrary spans of text. Our results show that (a) the model outperforms [...]; (b) [...] as distinguished from issue-specific features, [...] NPs such as this problem and this debate; and (c) it is possible to reduce the search space in order to improve performance.

Anaphora in which the anaphoric expression refers to an abstract object such as a proposition, a property, or a fact is known as abstract object anaphora. This is seen in the following examples.

(1) [Be careful what you wish, because wishes [...]]i. That's what the Semiconductor Industry Association, which represents U.S. manufacturers, has been learning.

(2) This prospective study suggested [that oral carvedilol is more effective than oral metoprolol in the prevention of AF after on-pump CABG]i. It is well tolerated when started before and continued after the surgery. However, further prospective studies are needed to clarify [this issue]i.

(3) In principle, he said, airlines should be allowed [to sell standing-room-only tickets for adults]i — as long as [this decision]i was approved by [...]

These examples highlight a difficulty not found with nominal anaphora. First, the anaphors refer to abstract concepts that can be expressed with different syntactic shapes which are usually not nominals. The anaphor That in (1) refers to the proposition in the previous utterance, whereas the anaphor this issue in (2) refers to a clause from the previous text. In (3), the anaphoric expression this decision refers to a verb phrase from the same sentence. Second, the antecedents do not always have precisely defined boundaries. In (2), for example, the whole sentence containing the marked clause could also be thought to be the correct antecedent. Third, the actual referents are not always the precise textual antecedents. The actual referent in (2), the issue to be clarified, is whether oral carvedilol is more effective than oral metoprolol in the prevention of AF after on-pump CABG or not, a variant of the antecedent text.

Generally, abstract anaphora, as distinguished from nominal anaphora, is signalled in English by pronouns this, that, and it (Müller, 2008). But in abstract anaphora, English prefers demonstratives to personal pronouns and definite articles.1 Demonstratives can be used in isolation (That in (1)) or with nouns (e.g., this issue in (2)). The latter follows the pattern demonstrative {modifier}* noun. The demonstrative acts as a determiner and the noun following the demonstrative imposes selectional constraints for the antecedent, as in examples (2) and (3). Francis (1994) calls such nouns label nouns, which "serve to encapsulate or package a stretch of discourse". Schmid (2000) calls them shell nouns, a metaphoric term which reflects different functions of these nouns such as encapsulation, pointing, and signalling.

1This is not to say that personal pronouns and definite articles do not occur in abstract anaphora, but they are not common.

Demonstrative nouns, along with pronouns like both and either, are referred to as sortal anaphors (Castaño et al., 2002; Lin and Liang, 2004; Torii and Vijay-Shanker, 2007). Castaño et al. observed that sortal anaphors are prevalent in the biomedical literature. They noted that among 100 distinct anaphors derived from a corpus of 70 Medline abstracts, 60% were sortal anaphors. But how often do demonstrative nouns refer to abstract objects? We observed that from a corpus of 74,000 randomly chosen Medline2 abstracts, of the first 150 most frequently occurring distinct demonstrative nouns (frequency > 30), 51.3% were abstract, 41.3% were concrete, and 7.3% were discourse deictic.

2http://www.nlm.nih.gov/bsd/pmresources.

This shows that abstract anaphora resolution is an important component of general anaphora resolution in the biomedical domain. However, automatic resolution of this type of anaphora has not attracted much attention and the previous work for this task is limited.

The present work is a step towards resolving abstract anaphora in written text. In particular, we choose the interesting abstract concept issue and demonstrate the complexities of resolving this-issue anaphora manually as well as automatically in the Medline domain. We present our algorithm, results, and error analysis for this-issue anaphora resolution.

The abstract concept issue was chosen for the following reasons. First, it occurs frequently in many kinds of text from newspaper articles to novels to scientific articles. There are 13,489 issue anaphora instances in the New York Times corpus and 1,116 instances in 65,000 Medline abstracts. Second, it is abstract enough that it can take several syntactic and semantic forms, which makes the problem interesting and non-trivial. Third, issue referents in scientific literature generally lie in the previous sentence or two, which makes the problem tractable. Fourth, issues in Medline abstracts are generally associated with clinical problems in the medical domain and spell out the motivation of the research presented in the article. So extraction of this information would be useful in any biomedical information retrieval [...]

Anaphora resolution has been extensively studied in computational linguistics (Hirst, 1981; Mitkov, 2002; Poesio et al., 2011). But CL research has mostly focused on nominal anaphora resolution (e.g., resolving multiple ambiguous mentions of a single entity representing a person, a location, or an organization) mainly for two reasons. First, nominal anaphora is the most frequently occurring anaphora in most domains, and second, there is a substantial amount of annotated data available for this kind of anaphora.

Besides pronominal anaphora, some work has been done on complement anaphora (Modjeska, 2003) (e.g., British and other European steelmakers). There is also some research on resolving sortal anaphora in the medical domain using domain knowledge (Castaño et al., 2002; Lin and Liang, 2004; Torii and Vijay-Shanker, 2007). But all these approaches focus only on anaphors with nominal antecedents.

By contrast, the area of abstract object anaphora remains relatively unexplored, mainly because standard anaphora resolution features such as agreement and apposition cannot be applied to abstract anaphora resolution. Asher (1993) built a theoretical framework to resolve abstract anaphora. He divided discourse abstract anaphora into three broad categories: event anaphora, proposition anaphora, and fact anaphora, and discussed how abstract entities can be resolved using discourse representation theory. Chen et al. (2011) focused on a subset of event anaphora and resolved event coreference chains in terms of the representative verbs of the events from the OntoNotes corpus. Our task differs from their work as follows. Chen et al. mainly focus on events and actions and use verbs as a proxy for the non-nominal antecedents. But this-issue antecedents cannot usually be represented by a verb. Our work is not restricted to a particular syntactic type of the antecedent; rather we provide the flexibility of marking arbitrary spans of text as antecedents.
There are also some prominent approaches to abstract anaphora resolution in the spoken dialogue domain (Eckert and Strube, 2000; Byron, 2004; Müller, 2008). These approaches go beyond nominal antecedents; however, they are restricted to spoken dialogues in specific domains and need serious adaptation if one wants to apply them to arbitrary text.

In addition to research on resolution, there is also some work on effective annotation of abstract anaphora (Strube and Müller, 2003; Botley, 2006; Poesio and Artstein, 2008; Dipper and Zinsmeister, 2011). However, to the best of our knowledge, there is currently no English corpus annotated for issue anaphora.

To create an initial annotated dataset, we collected 188 this {modifier}* issue instances along with the surrounding context from Medline abstracts.3 Five instances were discarded as they had an unrelated (publication related) sense. Among the remaining 183 instances, 132 instances were independently annotated by two annotators, a domain expert and a non-expert, and the remaining 51 instances were annotated only by the domain expert. We use the former instances for training and the latter instances (unseen by the developer) for testing. The annotator's task was to mark arbitrary text segments as antecedents (without concern for their linguistic types). To make the task tractable, we assumed that an antecedent does not span multiple sentences but lies in a single sentence (since we are dealing with singular this-issue anaphors) and that it is a continuous span of text.

3Although our dataset is rather small, its size is similar to other available abstract anaphora corpora in English: 154 instances in Eckert and Strube (2000), 69 instances in Byron (2003), 462 instances annotated by only one annotator in Botley (2006), and 455 instances restricted to those which have only nominal or clausal antecedents in Poesio and Artstein (2008).

This kind of annotation — identifying and marking arbitrary units of text that are not necessarily constituents — requires a non-trivial variant of the usual inter-annotator agreement measures. We use Krippendorff's reliability coefficient for unitizing (αu) (Krippendorff, 1995), which has not often been used or described in CL. In our context, unitizing means marking the spans of the text that serve as the antecedent for the given anaphors within the given text. The coefficient αu assumes that the annotated sections do not overlap in a single annotator's output, and our data satisfies this criterion.4 The general form of the coefficient is

αu = 1 − uDo/uDe (1)

where uDo and uDe are observed and expected disagreements respectively. Both disagreement quantities express the average squared differences between the mismatching pairs of values assigned by annotators to given units of analysis. αu = 1 indicates perfect reliability and αu = 0 indicates the absence of reliability. When αu < 0, the disagreement is systematic. Annotated data with reliability of αu ≥ 0.80 is considered reliable (Krippendorff, 2004).

4If antecedents overlap with each other in a single annotator's output (which is a rare event) we construct data that satisfies the non-overlap criterion by creating different copies of the same text corresponding to each anaphor instance.

Figure 1: Example of annotated data. Bold segments denote the marked antecedents for the corresponding anaphor ids. rhj is the jth section identified by the annotator h.

Krippendorff's αu is non-trivial, and explaining it in detail would take too much space, but the general idea, in our context, is as follows. The annotators mark the antecedents corresponding to each anaphor in their respective copies of the text, as shown in Figure 1. The marked antecedents are mutually exclusive sections r; we denote the jth section identified
There is a controversial debate (SBAR whether back school program might improve quality of life in back pain patients). This study aimed to address this issue.
(S Reduced serotonin function and abnormalities in the hypothalamic-pituitary-adrenal axis are thought to play a role in the aetiology of major depression.) We sought to examine this issue in the elderly.
(S (PP Given these data) (, ,) (NP decreasing HTD to < or = 5 years) (VP may have a detrimental effect on patients with locally advanced prostate cancer) (. .)) Only a randomized trial will conclusively clarify this issue.
As (NP the influence of estrogen alone on breast cancer detection) is not established, we examined this issue in the Women's Health Initiative trial.
Table 1: Antecedent types. In examples, the antecedent type is in bold and the marked antecedent is in italics.
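As a quick arithmetic check (ours, not from the paper), Krippendorff's coefficient has the general form αu = 1 − uDo/uDe; plugging in the observed and expected disagreements reported for this dataset reproduces the reported agreement.

```python
# Sanity check (ours): alpha_u = 1 - uDo/uDe.  With the reported
# disagreements uDo = 0.81 and uDe = 5.81, this reproduces the
# reported inter-annotator agreement of alpha_u = 0.86.
def alpha_u(observed_disagreement, expected_disagreement):
    return 1.0 - observed_disagreement / expected_disagreement

print(round(alpha_u(0.81, 5.81), 2))  # prints 0.86
```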
by the annotator h by rhj. In Figure 1, annotators 1 and 2 have reached different conclusions by identifying 9 and 10 sections respectively in their copies of the text. Annotator 1 has not marked any antecedent for the anaphor with id = 1, while annotator 2 has marked r21 for the same anaphor. Both annotators have marked exactly the same antecedent for the anaphor with id = 4. The difference between two annotated sections is defined in terms of the square of the distance between the non-overlapping parts of the sections. The distance is 0 when the sections are unmarked by both annotators or are marked and exactly the same, and is the summation of the squares of the unmatched parts if they are different. The coefficient is computed using intersections of the marked sections. In Figure 1, annotators 1 and 2 have a total of 14 intersections. The observed disagreement uDo is the weighted sum of the differences between all mismatching intersections of sections marked by the annotators, and the expected disagreement is the summation of all possible differences of pairwise combinations of all sections of all annotators, normalized by the length of the text (in terms of the number of tokens) and the number of pairwise combinations.

For our data, the inter-annotator agreement was αu = 0.86 (uDo = 0.81 and uDe = 5.81) despite the fact that the annotators differed in their domain expertise, which suggests that abstract concepts such [...]

A gold standard corpus was created by resolving the cases where the annotators disagreed. Among 132 training instances, the annotators could not resolve 6 instances, and we broke the tie by writing to the authors of the articles and using their response to resolve the disagreement. In the gold standard corpus, 95.5% of the antecedents were in the current or previous sentence and 99.2% were in the current or previous two sentences. Only one antecedent was found more than two sentences back, and it was six sentences back. One instance was a cataphor, but the antecedent occurred in the same sentence as the anaphor. This suggests that for an automatic this-issue resolution system, it would be reasonable to consider only the previous two sentences along with the sentence containing the anaphor.

The distribution of the different linguistic forms that an antecedent of this-issue can take in our data set is shown in Table 1. The majority of antecedents are clauses or whole sentences. A number of antecedents are noun phrases, but these are generally nominalizations that refer to abstract concepts (e.g., the influence of estrogen alone on breast cancer detection). Some antecedents are not even well-defined syntactic constituents5 but are combinations of several well-defined constituents. We denote the type of such antecedents as mixed. In the corpus, 18.2% of the antecedents are of this type, suggesting that it is not sufficient to restrict the antecedent search space to well-defined syntactic constituents.6

5We refer to every syntactic constituent identified by the parser as a well-defined syntactic constituent.

6Indeed, many of the mixed type antecedents (nearly three-quarters of them) are the result of parser attachment errors, but [...]

In our data, we did not find anaphoric chains for any of the this-issue anaphor instances, which indicates that the antecedents of this-issue anaphors are in the reader's local memory and not in the global memory. This observation supports the THIS-NPs hypothesis (Gundel et al., 1993; Poesio and Modjeska, 2002) that this-NPs are used to refer to entities which are active albeit not in focus, i.e., they are not the center of the previous utterance.

For correct resolution, the set of extracted candidates must contain the correct antecedent in the first place. The problem of candidate extraction is non-trivial in abstract anaphora resolution because the antecedents are of many different types of syntactic constituents, such as clauses, sentences, and nominalizations. Drawing on our observation that the mixed type antecedents are generally a combination of different well-defined syntactic constituents, we extract the set of candidate antecedents as follows. First, we create a set of candidate sentences which contains the sentence containing the this-issue anaphor and the two preceding sentences. Then, we parse every candidate sentence with the Stanford Parser7. Initially, the set of candidate constituents contains a list of well-defined syntactic constituents. We require that the node type of these constituents be in the set {S, SBAR, NP, SQ, SBARQ, S+V}. This set was empirically derived from our data. To each constituent, there is associated a set of mixed type constituents. These are created by concatenating the original constituent with its sister constituents. For example, in (4), the set of well-defined eligible candidate constituents is {NP, NP1}, and so NP1 PP1 is [...] The set of candidate constituents is updated with the extracted mixed type constituents. Extracting mixed type candidate constituents not only deals with mixed type instances as shown in Table 1, but as a side effect it also corrects some attachment errors made by the parser. Finally, the constituents having a number of leaves (words) less than a threshold8 are discarded to give the final set of candidate antecedents.

8The threshold 5 was empirically derived. Antecedents in our training data had on average 17 words.

We explored the effect of including 43 automatically extracted features (12 feature classes), which are summarized in Table 2. The features can also be broadly divided into two groups: issue-specific features and general abstract-anaphora features. Issue-specific features are based on our common-sense knowledge of the concept of issue and the different semantic forms it can take; e.g., controversy (X is controversial), hypothesis (It has been hypothesized X), or lack of knowledge (X is unknown), where X is the issue. In our data, we observed certain syntactic patterns of issues such as whether X or not and that X, and the IP feature class encodes this information. Other issue-specific features are IVERB and IHEAD. The feature IVERB checks whether the governing verb of the candidate is an issue verb (e.g., speculate, hypothesize, argue, debate), whereas IHEAD checks whether the candidate head in the dependency tree is an issue word (e.g., controversy, uncertain, unknown). The general abstract-anaphora resolution features do not make use of the semantic properties of the word issue. Some of these features are derived empirically from the training data (e.g., ST, L, D). The EL feature is borrowed from Müller (2008) and encodes the embedding level of the candidate within the candidate sentence. The MC feature tries to capture the idea of the THIS-NPs hypothesis (Gundel et al., 1993; Poesio and Modjeska, 2002) that the antecedents of this-NP anaphors are not the center of the previous utterance. The general abstract-anaphora features in the SR feature class capture the semantic role of the candidate in the candidate sentence. We used the Illinois Semantic Role Labeler9 for SR features. The general abstract-anaphora features also contain a few lexical features (e.g., M, SC). But these features are independent of the semantic properties of the word issue. The general abstract-anaphora resolution features also contain dependency-tree features, lexical-
1 iff the candidate follows the pattern SBAR → (IN whether) (S .)
1 iff the candidate follows the pattern SBAR → (IN that) (S .)
1 iff the candidate follows the pattern SBAR → (IN if) (S .)
1 iff the candidate node is a sentence node
1 iff the candidate node is an SQ or SBARQ node
1 iff the candidate node is of type mixed
EMBEDDING LEVEL (EL) (Müller, 2008)
level of embedding of the given candidate in its top clause (the root node of the syntactic tree)
level of embedding of the given candidate in its immediate clause (the closest parent of type S or SBAR)
1 iff the candidate is in the main clause
1 iff the candidate is in the same sentence as anaphor
1 iff the candidate is in the adjacent sentence
1 iff the candidate occurs 2 or more sentences before the anaphor
1 iff the candidate occurs before the anaphor
1 iff the governing verb of the given candidate is an issue verb
1 iff the candidate is the agent of the governing verb
1 iff the candidate is the patient of the governing verb
1 iff the candidate is the instrument of the governing verb
1 iff the candidate plays the role of modification
1 iff the candidate plays no well-defined semantic role in the sentence
1 iff the candidate head in the dependency tree is an issue word (e.g., controversial, unknown)
1 iff the dependency relation of the candidate to its head is of type nominal, controlling or clausal subject
1 iff the dependency relation of the candidate to its head is of type direct object or preposition obj
1 iff the dependency relation of the candidate to its head is of type dependent
1 iff the candidate is the root of the dependency tree
1 iff the dependency relation of the candidate to its head is of type preposition
1 iff the dependency relation of the candidate to its head is of type continuation
1 iff the dependency relation of the candidate to its head is of type clausal or adjectival complement
1 iff candidate’s head is the root node
1 iff the given candidate contains a modal verb
PRESENCE OF SUBORDINATING CONJUNCTION (SC)
1 iff the candidate starts with a contrastive subordinating conjunction (e.g., however, but, yet)
1 iff the candidate starts with a causal subordinating conjunction (e.g., because, as, since)
1 iff the candidate starts with a conditional subordinating conjunction (e.g., if, that, whether or not)
normalized ratio of the overlapping words in candidate and the title of the article
normalized ratio of the overlapping words in candidate and the anaphor sentence
proportion of domain-specific words in the candidate
1 iff the preceding word of the candidate is a preposition
1 iff the following word of the candidate is a preposition
1 iff the preceding word of the candidate is a punctuation mark
1 iff the following word of the candidate is a punctuation mark
Table 2: Feature sets for this-issue resolution. All features are extracted automatically.
overlap features, and context features.

[...] anaphora resolution is to choose the best candidate [...] model proposed by Denis and Baldridge (2008). The advantage of the candidate-ranking model over the mention-pair model is that it overcomes the strong independence assumption made in mention-pair models and evaluates how good a candidate is [...]

We train our model as follows. If the anaphor is a this-issue anaphor, the set C is extracted using the candidate extraction algorithm from Section 4.1. Then a corresponding set of feature vectors, Cf = {Cf1, Cf2, ..., Cfk}, is created using the features in Table 2. The training instances are created as described by Soon et al. (2001). Note that the instance creation is simpler than for general coreference resolution because of the absence of anaphoric chains in our data. For every anaphor ai and eligible candidates Cf = {Cf1, Cf2, ..., Cfk}, we create training examples (ai, Cfi, label), ∀Cfi ∈ Cf. The label is 1 if Cfi corresponds to the true antecedent of the anaphor ai; otherwise the label is −1. The examples with label 1 get the rank of 1, while other examples get the rank of 2. We use SVMrank (Joachims, 2002) for training the candidate-ranking model. During testing, the trained model is used to rank the candidates of each test instance of this-issue anaphor.

We propose two metrics for abstract anaphora evaluation. The simplest metric is the percentage of antecedents on which the system and the annotated gold data agree. We denote this metric as EXACT-M (Exact Match) and compute it as the ratio of the number of correctly identified antecedents to the total number of marked antecedents. This metric is a good indicator of a system's performance; however, it is a rather strict evaluation because, as we noted in section 1, issues generally have no precise boundaries in the text. So we propose another metric called RLL, which is similar to the ROUGE-L metric (Lin, 2004) used for the evaluation of automatic summarization. Let the marked antecedents of the gold corpus for k anaphor instances be G = g1, g2, ..., gk and the system-annotated antecedents be A = a1, a2, ..., ak. Let the number of words in G and A be m and n respectively. Let LCS(gi, ai) be the number of words in the longest common subsequence of gi and ai. Then the precision (PRLL) and recall (RRLL) over the whole data set are computed as shown in equations (2) and (3):

PRLL = Σi LCS(gi, ai) / n (2)

RRLL = Σi LCS(gi, ai) / m (3)

PRLL is the total number of word overlaps between the gold and system-annotated antecedents normalized by the number of words in the system-annotated antecedents, and RRLL is the total number of such word overlaps normalized by the number of words in the gold antecedents. If the system picks too much text for antecedents, RRLL is high but PRLL is low. The F-score, [...]

In this section we present the evaluation of each component of our system.

The set of candidate antecedents extracted by the method from Section 4.1 contained the correct antecedent 92% of the time. Each anaphor had, on average, 23.80 candidates, of which only 5.19 candidates were nominal type. The accuracy dropped to 84% when we did not extract mixed type candidates. The error analysis of the 8% of the instances where we failed to extract the correct antecedent revealed that most of these errors were parsing errors which could not be corrected by our candidate extraction method.10 In these cases, the parts of the antecedent had been placed in completely different branches of the parse tree. For example, in (5), the correct antecedent is a combination of the NP from the S → VP → NP → PP → NP branch and the PP from the S → VP → PP branch. In such a case, concatenating sister constituents does not help.

(5) The data from this pilot study (VP (VBP provide) (NP (NP no evidence) (PP (IN for) (NP a difference in hemodynamic effects between pulse HVHF and CPFA))) (PP in patients with septic shock already receiving CRRT)). A larger sample size is needed to adequately explore this issue.

10Extracting candidate constituents from the dependency trees did not add any new candidates to the set of candidates.
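The candidate extraction step evaluated above can be sketched as follows (our reconstruction, not the authors' code): keep constituents labelled S, SBAR, NP, SQ, or SBARQ (the paper also lists S+V, omitted here), add mixed candidates by concatenating an eligible constituent with its right sisters, and discard candidates with fewer than 5 leaves.

```python
ELIGIBLE = {"S", "SBAR", "NP", "SQ", "SBARQ"}  # node types kept as candidates
MIN_LEAVES = 5                                 # empirical threshold (footnote 8)

def parse_tree(s):
    """Parse a bracketed parse like '(NP (DT the) (NN issue))' into
    nested (label, children) tuples; bare tokens become leaf strings."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    def read(i):
        label = tokens[i + 1]            # tokens[i] is "("
        i += 2
        children = []
        while tokens[i] != ")":
            if tokens[i] == "(":
                child, i = read(i)
                children.append(child)
            else:
                children.append(tokens[i])
                i += 1
        return (label, children), i + 1
    return read(0)[0]

def leaves(node):
    if isinstance(node, str):
        return [node]
    return [w for child in node[1] for w in leaves(child)]

def candidates(tree):
    """Eligible constituents plus 'mixed' concatenations with right
    sisters, filtered by the leaf-count threshold."""
    found = []
    def walk(node, right_sisters):
        if isinstance(node, str):
            return
        if node[0] in ELIGIBLE:
            span = leaves(node)
            found.append(span)
            for sister in right_sisters:   # build mixed type candidates
                span = span + leaves(sister)
                found.append(span)
        for j, child in enumerate(node[1]):
            walk(child, node[1][j + 1:])
    walk(tree, [])
    return [" ".join(c) for c in found if len(c) >= MIN_LEAVES]
```

For instance, on a parse of "the influence of estrogen is not established", the whole S survives as a candidate, and the subject NP combined with its VP sister yields a mixed candidate covering the same span.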
Oracle candidate sentence extractor + row 3
Table 3: this-issue resolution results with SVMrank. All means evaluation using all features. Issue-specific features = {IP, IVERB, IHEAD}. EX-M is EXACT-M.

The lower bound of FRLL is 0, where no true antecedent has any common subsequence with the predicted antecedents, and the upper bound is 1, where all the predicted and true antecedents are exactly the same. In our results we represent these scores in [...]

There are no implemented systems that resolve issue anaphora or abstract anaphora signalled by label nouns in arbitrary text to use as a comparison. So we compare our results against two baselines: adjacent sentence and random. The adjacent sentence baseline chooses the previous sentence as the correct antecedent. This is a high baseline because in our data 84.1% of the antecedents lie within the adjacent sentence. The random baseline chooses a candidate drawn from a uniform random distribution over the [...]

11Note that our FRLL scores for both baselines are rather high because candidates often have considerable overlap with one another; hence a wrong choice may still have a high FRLL score.

We carried out two sets of systematic experiments in which we considered all combinations of our twelve feature classes. The first set consists of 5-fold cross-validation experiments on our training data. The second set evaluates how well the model built on the training data works on the unseen test [...]

Table 3 gives results of our system. The first two rows are the baseline results. Rows 3 to 8 give results for some of the best performing feature sets. All systems based on our features beat both baselines on F-scores and EXACT-M. The empirically derived feature sets IP (issue patterns) and D (distance) appeared in almost all best feature set combinations. Removing D resulted in a 6 percentage point drop in FRLL and a 4 percentage point drop in EXACT-M scores. Surprisingly, feature set ST (syntactic type) was not included in most of the best-performing feature sets. The combination of syntactic and semantic feature sets {IP, D, EL, MC, L, SR, DT} gave the best FRLL and EXACT-M scores for the cross-validation experiments. For the test-data experiments, the combination of semantic and lexical features {D, C, LO, L, SC, SR, DT} gave the best FRLL results, whereas syntactic, discourse, and semantic features {IP, D, C, EL, L, SC, SR, DT} gave the best EXACT-M results. Overall, row 3 of the table gives reasonable results for both cross-validation and test-data experiments with no statistically significant difference to the corresponding best EXACT-M scores in rows 6 and 5 respectively.12
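The EXACT-M and RLL scores above can be computed as in the following sketch (ours, not the authors' code; we take FRLL to be the usual harmonic mean of PRLL and RRLL, which the truncated text does not state explicitly).

```python
# Sketch (ours) of the evaluation metrics from Section 5: EXACT-M and
# the ROUGE-L-style RLL precision/recall over word sequences.

def lcs_len(a, b):
    """Length of the longest common subsequence of two word lists."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = dp[i][j] + 1 if a[i] == b[j] \
                else max(dp[i][j + 1], dp[i + 1][j])
    return dp[m][n]

def rll(gold, system):
    """P_RLL, R_RLL, F_RLL over paired gold/system antecedent strings."""
    overlap = sum(lcs_len(g.split(), a.split()) for g, a in zip(gold, system))
    m = sum(len(g.split()) for g in gold)    # words in gold antecedents
    n = sum(len(a.split()) for a in system)  # words in system antecedents
    p, r = overlap / n, overlap / m
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def exact_m(gold, system):
    """Fraction of antecedents on which system and gold agree exactly."""
    return sum(g == a for g, a in zip(gold, system)) / len(gold)
```

If the system picks a strict subset of a gold antecedent, precision is perfect while recall drops, matching the trade-off described in the text.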
To pinpoint the errors made by our system, we carried out three experiments. In the first experiment, we examined the contribution of issue-specific features versus non-issue features (rows 9 and 10). Interestingly, when we used only non-issue features, the performance dropped only slightly. The FRLL results from using only issue-specific features were below baseline, suggesting that the more general features associated with abstract anaphora play a crucial role in resolving this-issue anaphora.

In the second experiment, we determined the error caused by the candidate extractor component of our system. Row 12 of the table gives the result when an oracle candidate extractor was used to add the correct antecedent to the set of candidates whenever our candidate extractor failed. This did not affect cross-validation results by much because of the rarity of such instances. However, in the test-data experiment, the EXACT-M improvements that resulted were statistically significant. This shows that our resolution algorithm was able to identify antecedents that were arbitrary spans of text.

Our results show that general abstract-anaphora resolution features (i.e., other than issue-specific features) play a crucial role in resolving this-issue anaphora. This is encouraging, as it suggests that the approach could be generalized for other NPs — especially NPs having similar semantic constraints such as this problem, this decision, and this conflict. The results also show that reduction of the search space markedly improves the resolution performance, suggesting that a two-stage process that first identifies the broad region of the antecedent and then pinpoints the exact antecedent might work better than the current single-stage approach. The rationale behind this two-stage process is twofold. First, the search space of abstract anaphora is large and noisy compared to nominal anaphora.13 And second, it is possible to reduce the search space and accurately identify the broad region of the antecedents using simple features such as the location of the anaphor in the anaphor sentence (e.g., if the anaphor occurs at the beginning of the sentence, the antecedent is most likely present in the previous sentence).
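The location cue just mentioned can be illustrated with a toy rule (ours; the paper's sentence classifier was a decision tree that also used length features).

```python
def guess_antecedent_sentence(sentences, i):
    """Toy location heuristic (ours): if the sentence containing the
    this-issue anaphor begins with the anaphor itself (e.g. "This issue
    remains open."), guess the previous sentence; otherwise guess the
    anaphor's own sentence."""
    s = sentences[i].lstrip().lower()
    if s.startswith("this issue") and i > 0:
        return i - 1
    return i
```

For example, with ["A debate exists.", "This issue is unresolved."] and the anaphor in sentence 1, the rule points back to sentence 0; with "We examined this issue here." it stays in the same sentence.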
In the last experiment, we examined the effect of the reduction of the candidate search space. We assumed an oracle candidate sentence extractor (Row 13) which knows the exact candidate sentence in which the antecedent lies. We can see that both RLL and EXACT-M scores markedly improved in this setting. In response to these results, we trained a decision-tree classifier to identify the correct antecedent sentence with simple location and length features, and achieved 95% accuracy in identifying the correct sentence.

We chose scientific articles over general text because in the former domain the actual referents are seldom discourse deictic (i.e., not present in the text). In the news domain, for instance, which we have also examined and are presently annotating, a large percentage of this-issue antecedents lie outside the text. For example, newspaper articles often quote sentences of others who talk about the issues in their own world, as shown in example (6).

(6) As surprising and encouraging to organizers of the movement are the Wall Street names added to their roster. Prominent among them is Paul Singer, a hedge fund manager who is straight and chairman of the conservative Manhattan Institute. He has donated more than $8 million to various same-sex marriage efforts, in states including California, Maine, New Hampshire, New Jersey, New York and Oregon, much of it […] “It’s become something that gradually people like myself weren’t afraid to fund, weren’t afraid to speak out on,” Mr. Singer said in an interview. “I’m somebody who is philosophically very conservative, and on this issue I thought that this really was important on the basis of […]”

In such a case, the antecedent of this issue is not always in the text of the newspaper article itself, but must be inferred from the context of the quotation and the world of the speaker quoted. That said, we do not use any domain-specific information in our this-issue resolution model. Our features are solely based on distance, syntactic structure, and semantic and lexical properties of the candidate antecedents, which could be extracted for text in any domain.

Issue anaphora can also be signalled by demonstratives other than this. However, for our initial study, we chose this issue for two reasons. First, in our corpus as well as in other general corpora such as the New York Times corpus, issue occurs much more frequently with this than with other demonstratives. Second, we did not want to increase the complexity of the problem by including the plural issues.

We have demonstrated the possibility of resolving complex abstract anaphora, namely, this-issue anaphora having arbitrary antecedents. The work takes the annotation work of Botley (2006) and Dipper and Zinsmeister (2011) to the next level by resolving this-issue anaphora automatically. We proposed a set of 43 automatically extracted features that can be used for resolving abstract anaphora.

Our approach needs further development to make it able to resolve anaphora signalled by label nouns in all kinds of text. At present, the major obstacle is that there is very little annotated data available that could be used to train an abstract anaphora resolution system. And the understanding of abstract anaphora itself is still at an early stage; it would be premature to think about unsupervised approaches. In this work, we studied the narrow problem of resolution of this-issue anaphora in the medical domain to get a good grasp of the general abstract-anaphora resolution problem.

A number of extensions are planned for this work. First, we will extend the work to resolve other abstract anaphors (e.g., this decision, this problem). Second, we will experiment with a two-stage resolution approach rather than the current single-stage approach. The rationale behind this two-stage process is twofold. First, the search space of abstract anaphora is large and noisy compared to nominal anaphora.13 And second, it is possible to reduce the search space and accurately identify the broad region of the antecedents using simple features such as the location of the anaphor in the anaphor sentence (e.g., if the anaphor occurs at the beginning of the sentence, the antecedent is most likely present in the previous sentence). Third, we would like to explore the effect of including serious discourse structure features in our model. (The feature sets SC and C encode only shallow discourse information.) Finally, during annotation, we noted a number of issue patterns (e.g., An open question is X, X is under debate); a possible extension is extracting issues and problems from text using these patterns as seed patterns.

12 We performed a simple one-tailed, k-fold cross-validated paired t-test at significance level p = 0.05 to determine whether the difference between the EXACT-M scores of two feature classes is statistically significant.

13 If we consider all well-defined syntactic constituents of a sentence as issue candidates, in our data, a sentence has on average 43.61 candidates. Combinations of several well-defined syntactic constituents only add to this number. Hence if we consider the antecedent candidates from the previous 2 or 3 sentences, the search space can become quite large and noisy.
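As a rough illustration of how such issue patterns might serve as extraction seeds, the sketch below matches two patterns of the kind mentioned (An open question is X, X is under debate). The pattern inventory and function name are ours, not the paper's:

```python
import re

# Hypothetical seed patterns for issue-bearing sentences; the named
# group "issue" captures the X slot.  Illustrative only.
SEED_PATTERNS = [
    re.compile(r"An open question is (?P<issue>[^.]+)\.", re.IGNORECASE),
    re.compile(r"(?P<issue>[^.]+?) is under debate\.", re.IGNORECASE),
]

def extract_issues(text):
    """Return issue phrases matched by any seed pattern."""
    issues = []
    for pattern in SEED_PATTERNS:
        for match in pattern.finditer(text):
            issues.append(match.group("issue").strip())
    return issues

text = ("An open question is whether carvedilol prevents AF after CABG. "
        "The optimal dosing schedule is under debate.")
print(extract_issues(text))
# -> ['whether carvedilol prevents AF after CABG', 'The optimal dosing schedule']
```

In practice such surface patterns would only bootstrap a larger inventory (e.g., via pattern induction over a parsed corpus), but they show how issues and problems could be harvested from raw text as seeds.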
Acknowledgments

We thank […] Memorial Chiropractic College for annotating our […] and the anonymous reviewers for their detailed and constructive comments. This research was financially supported by the Natural Sciences and Engineering Research Council of Canada and by the University […].

References

Nicholas Asher. 1993. Reference to Abstract Objects in Discourse. Kluwer Academic Publishers, Dordrecht.

Philip Simon Botley. 2006. Indirect anaphora: Testing the limits of corpus-based linguistics. International Journal of Corpus Linguistics, 11(1):73–112.

Donna K. Byron. 2003. Annotation of pronouns and their antecedents: A comparison of two domains. Technical report, University of Rochester.

Donna K. Byron. 2004. Resolving pronominal reference to abstract entities. Ph.D. thesis, University of Rochester, Rochester, New York.

José Castaño, Jason Zhang, and James Pustejovsky. 2002. Anaphora resolution in biomedical literature. In Proceedings of the International Symposium on Reference Resolution for NLP, Alicante, Spain, June.

Bin Chen, Jian Su, Sinno Jialin Pan, and Chew Lim Tan. 2011. A unified event coreference resolution by integrating multiple resolvers. In Proceedings of the 5th International Joint Conference on Natural Language Processing, Chiang Mai, Thailand, November.

Pascal Denis and Jason Baldridge. 2008. Specialized models and ranking for coreference resolution. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 660–669, Honolulu, Hawaii, October. Association for Computational Linguistics.

Stefanie Dipper and Heike Zinsmeister. 2011. Annotating abstract anaphora. Language Resources and Evaluation.

Miriam Eckert and Michael Strube. 2000. Dialogue acts, synchronizing units, and anaphora resolution. Journal of Semantics, 17:51–89.

[…] Coulthard, editor, Advances in written text analysis, […].

Jeanette K. Gundel, Nancy Hedberg, and Ron Zacharski. 1993. Cognitive status and the form of referring expressions in discourse. Language, 69(2):274–307.

Graeme Hirst. 1981. Anaphora in Natural Language Understanding: A Survey, volume 119 of Lecture Notes in Computer Science. Springer, Berlin.

Thorsten Joachims. 2002. Optimizing search engines using clickthrough data. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD).

Klaus Krippendorff. 1995. On the reliability of unitizing contiguous data. Sociological Methodology, 25:47–76.

Klaus Krippendorff. 2004. Content Analysis: An Introduction to Its Methodology. Sage, Thousand Oaks, California.

Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out: Proceedings of the ACL-04 Workshop, pages 74–81, Barcelona, Spain, July. Association for Computational Linguistics.

Yu-Hsiang Lin and Tyne Liang. 2004. Pronominal and sortal anaphora resolution for biomedical literature. In Proceedings of ROCLING XVI: Conference on Computational Linguistics and Speech Processing, Taiwan, September.

[…] anaphora resolution in Medline abstracts. Computational […].

Ruslan Mitkov. 2002. Anaphora Resolution. Longman.

Natalia N. Modjeska. 2003. Resolving Other-Anaphora. Ph.D. thesis, School of Informatics, University of Edinburgh.

Christoph Müller. 2008. Fully Automatic Resolution of It, This and That in Unrestricted Multi-Party Dialog. Ph.D. thesis, Universität Tübingen.

[…] types of abstract pronominal anaphora. In Proceedings of the Workshop Beyond Semantics: Corpus-based investigations of pragmatic and discourse phenomena, Göttingen, Germany, February.

Rebecca Passonneau. 1989. Getting at discourse referents. In Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics, pages 51–59, Vancouver, British Columbia, Canada, June. Association for Computational Linguistics.

Massimo Poesio and Ron Artstein. 2008. Anaphoric annotation in the ARRAU corpus. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco, May.

Massimo Poesio and Natalia N. Modjeska. 2002. The THIS-NPs hypothesis: A corpus-based investigation. In Proceedings of the 4th Discourse Anaphora and Anaphor Resolution Conference (DAARC 2002), pages 157–162, Lisbon, Portugal, September.

Massimo Poesio, Simone Ponzetto, and Yannick Versley. 2011. Computational models of anaphora resolution: […].

Hans-Jörg Schmid. 2000. English Abstract Nouns As Conceptual Shells: From Corpus to Cognition. Topics in English Linguistics. De Gruyter Mouton, Berlin.

Wee Meng Soon, Hwee Tou Ng, and Chung Yong Lim. 2001. A machine learning approach to coreference resolution of noun phrases. Computational Linguistics, 27(4):521–544.

Michael Strube and Christoph Müller. 2003. A machine learning approach to pronoun resolution in spoken dialogue. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pages 168–175, Sapporo, Japan, July. Association for Computational Linguistics.