Progress in NLIP? What does the summarising
task tell us? (Prof. Karen Sparck Jones)

Natural language information processing (NLIP) has made significant progress,
in important ways, in the last twenty years. We have developed fairly
comprehensive and robust tools like grammars and parsers, and have
gained experience with applications including multilingual ones.
We have been able not only to take advantage of the general advance
in computing and communications technology but, more significantly,
to exploit by-now vast text corpora to adapt our tools to actual
patterns of language use. We have learnt, in particular, that many
NLIP tasks can be sufficiently well done to be useful in many practical
contexts by exploiting shallow text processing, i.e. by relying on
surface indications of discourse meaning and communicative intent.
We have also been learning how to do NLIP system evaluation.

Summarising illustrates what we have learnt, where we are, and where we need
to go, very well. The first experiments in automatic summarising
used very simple technology, statistical sentence extraction,
that seemed too crude to yield useful summaries. Subsequent
research focused on deeper text analysis that could sometimes work
better but could not readily be scaled up to large heterogeneous data
sources or to some user needs. More recent work on summarising has
largely returned to the simpler, extractive approach, though it
has also sought to refine or enrich this by, for example, incorporating
parsing or by exploiting machine learning.

Summarising has also
been better contextualised, partly by being seen as encompassing
a spectrum of types ranging from basic index descriptions for individual
documents to multi-source syntheses of specific types of information,
for example biographies. At the same time, summarising is increasingly,
and rightly, seen as a task that is only one activity within a set
that may all be useful for some larger purpose so that, for example,
summarising may be related to search queries or to the need to encapsulate
extended information-seeking interactions.

But this richer
view of summarising presents significant challenges for system evaluation.
NLIP research has been transformed since 1990 by the major task
evaluation programmes that have been running, notably for information
extraction and document retrieval and, later, question answering,
that have served to establish whether plausible ideas actually work
and to disseminate effective techniques. Summarising itself has
been the focus of its own evaluation programmes for five years.
This evaluation work, and the summarising evaluation work in particular,
has been important in promoting a better understanding both of NLIP
tasks and of the impact of their application conditions. The summarising
evaluations have, in particular, served to demonstrate both how
crucial application contexts are for how tasks are handled, and
how extremely challenging evaluation in itself is.

Karen Sparck Jones is emeritus Professor of
Computers and Information at the Computer Laboratory, University
of Cambridge. She has worked in automatic language and information
processing research since the late fifties, and has many publications
including nine books. She is a Fellow of the British Academy and
of the American Association for Artificial Intelligence. She has
received three awards for information retrieval research as well
as, in 2004, the Association for Computational Linguistics' Lifetime
Achievement Award. Her more recent research has been on information
retrieval models and practice, on automatic summarising, and on
system evaluation, where she is involved in international programmes.
University of Cambridge