Huettner QA systems 00 04 11

Published on October 16, 2007

Author: Melinda

Source: authorstream.com

Content

Questioning and Answering
Alison Huettner
CLARITECH Corporation
April 11, 2000

Why question answering?
- Focused information needs
- Lack of time to read through documents
- Inexperienced searchers
- High expectations

Fragment of AV query log
- who invented surf music?
- how to make stink bombs
- where are the snowdens of yesteryear?
- how to do a research paper
- which english translation of the bible is used in official catholic liturgies?
- how to do clayart
- how to copy psx
- ceramicsweb
- how to chat?
- where is silhouettes catelog?
- how to build a pyramid
- walleye fishing
- how tall is the sears tower?

What techniques are available?
- Ordinary document search techniques
- Electronic dictionaries, encyclopedias, atlases
- Hand indexing
- “Clickthrough” information
- Knowledge-base-intensive systems
- Forums for users to exchange information
- Desire for a truly open-ended QA system

Existing resources may be adequate...
- What is a codling?
- Who wrote The Complete Book of Running?
- When was the saxophone invented?
- Where can I find out information about West German beer steins?
- Show me all cases referencing Robbins vs. State of Florida.

...or answers may be elusive
- How much is a ton of asphalt?
- What percentage of Americans have children?
- What state has the most Republicans?
- Who on Wall Street has been found guilty of insider trading since 1982?

Text REtrieval Conference (TREC8)
- Standardized, judged question answering evaluation: what is the state of the art?
- 198 short-answer, fact-based questions
- 250- or 50-byte answers (five answers in order of confidence)
- Scoring by mean reciprocal rank
- 20 groups submitted out of 23 participating

CLARITECH’s approach
- NLP-based information retrieval (IR)
- Named entity (NE) extraction
- Question analysis
- Question/answer matching
- Answer ranking
- Deeper NLP

Basic CLARIT IR
- Shallow parsing to detect candidate noun phrases (NPs)
- Indexing on NPs, attested subphrases, and constituent words
- Subdocuments of 8-10 sentences
- Optional thesaurus extraction and feedback

CLARIT IR adapted for QA
- Requires some modifications:
  - Retain and index verbs, adjectives, and adverbs
  - Retrieve smaller subdocuments (1-3 sentences)
  - Prefer subdocuments with more of the search terms
- With these modifications, already a reasonable strategy for the 250-byte task
- Narrows the problem significantly for the 50-byte task

Basic CLARIT NE extraction
- Technology developed for populating databases and supporting relationship discovery
- Exploits semantic types: both lists and naming patterns
- Can index entities by type as part of IR
- Serendipity for answer identification

Preliminary question analysis
- Question word cues
  - Who, when, where, how, why
- Head noun cues
  - What city, which country, what year, ...
  - Which astronaut, what blues band, ...
- Scalar adjective cues
  - How long, how fast, how far, how old, ...
- Focus cues
  - What is the smallest country in Europe?
  - What is the major export from Thailand?

Existing general NE extractors
- Person: Mr. Hubert J. Smith, Adm. McInnes, Grace Chan
- Title: Chairman, Vice President of Technology, Undersecretary of State
- Country: USSR, France, Haiti, Haitian Republic
- City: New York, Rome, Paris, Birmingham, Seneca Falls
- Province: Kansas, Yorkshire, Uttar Pradesh
- Business: GTE Corporation, FreeMarkets Inc., Ralston-Purina Co.
- University: Bryn Mawr College, University of Iowa
- Organization: Allen Art Museum, Boys and Girls Club, Irish Republican Army
- Currency: 400 yen, $100, DM450,000

Additional extractors for QA
- Linear: 10 feet, 100 miles, 15 centimeters
- Area: a square foot, 15 acres
- Volume: 6 cubic feet, 100 gallons
- Weight: 10 pounds, half a ton, 100 kilos
- Duration: 10 days, five minutes, 3 years, a millennium
- Frequency: daily, biannually, 5 times, 3 times a day
- Speed: 6 miles per hour, 15 feet per second, 5 kph
- Age: 3 weeks old, 10-year-old, 50 years of age

CLARIT NE adapted for QA
But many foreign investors remain sceptical, and western governments are withholding aid because of the Slorc's dismal human rights record and the continued detention of Ms Aung San Suu Kyi, the opposition leader who won the Nobel Peace Prize in 1991. The military junta took power in 1988 as pro-democracy demonstrations were sweeping the country. It held elections in 1990, but has ignored their result.

It has kept the 1991 Nobel peace prize winner, Aung San Suu Kyi - leader of the opposition party which won a landslide victory in the poll - under house arrest since July 1989.

The regime, which is also engaged in a battle with insurgents near its eastern border with Thailand, ignored a 1990 election victory by an opposition party and is detaining its leader, Ms Aung San Suu Kyi, who was awarded the 1991 Nobel Peace Prize.

According to the British Red Cross, 5,000 or more refugees, mainly the elderly and women and children, are crossing into Bangladesh each day.

Question: Who won the Nobel Peace Prize in 1991?

Limitations
- Not all questions contain semantic cues
  - What caused the decline in India’s tiger population?
- Not all cues lend themselves to NE
  - What actor was the first to be named a British peer?
- Passages may contain no entities, the wrong entities, or multiple entities
  - Hisako Takahashi, a former director general of the labour ministry, has been named as Japan’s first female supreme court justice, writes Emiko Terazono.
- Approach is blind to structural cues
  - Who shot Lee Harvey Oswald?

NLP revisited
- Deeper NLP is expensive over large databases, but feasible on short passages
- Linear order and structural information can:
  - Identify some answers in the absence of obvious semantic cues
  - Differentiate among competing answers
  - Rule out prominent but incorrect answers

Basic CLARIT NLP
- Deterministic part-of-speech tagging
- Normalization
- Non-hierarchical “chunking” parser
- Discards function words
- Biased towards nouns and NPs

CLARIT NLP adapted for QA
- Improved, context-sensitive part-of-speech tagging
- CLARIT entity extraction
- “Greedy” complex noun phrase (CNP) construction
- Hierarchical representation capturing both syntactic and semantic information

Question analysis
- Question and answer patterns may reference individual words (e.g., who), extraction entities (e.g., xcity), or any constituent above the tag level (e.g., NP, CNP).
- Example: Who commanded British troops at Dunkirk?

Question/answer matching
- The question representation is compared with several hundred “sketchy patterns”
- A match on a sketchy pattern:
  - Associates the question with a question type
  - Identifies and indexes the most important elements in a question of this type
  - Indicates the possible locations of the answer with respect to the indexed question elements
  - Indicates the semantic type of the answer, wherever possible, and requires an element of that type in retrieved subdocuments

Question typing
Who discovered radium?
  who AVerb1 CNP1 (xperson)
  ANSWERS
  (xperson) []* AVerb1 CNP1                  # X discovered radium
  CNP1 []* PVerb1 []* by (xperson)           # radium was discovered by X
  CNP1 []* Rel (xperson) []* AVerb1 !CNP     # radium, which X discovered

Matching heuristics
- Elements of a question that are not indexed are treated as “bonus matches”
  - Their structural position is unspecified
  - They need not appear in a candidate answer passage, but the candidate answer is ranked higher when they do
- Dates are always treated as bonus matches
- Elements of a given question type may be explicitly declared to be bonus matches

CNP matching heuristics
- Complex noun phrases (CNPs) may cross constituent boundaries
  - Who wrote the Complete Book of Running?
  - but: Who commanded British troops at Dunkirk?
- A complete CNP match is given extra points, but
- A match on the head noun phrase is sufficient
  - British troops were commanded by...

Additional matching heuristics
- Second-choice named entity matches
  - Who first patented a DNA sequence?
  - Who won the Boer War?
- Overlap (but not identity) with a query term
  - Who was President of the United States in 1982?
  - President Ronald Reagan...

Answer ranking
- Rank of retrieved subdocument
- Entity type match
- Number of elements matched in sketchy answer pattern
- Goodness of sketchy pattern match
- Number of bonus matches found
- Exactness of CNP matches
- Number of times retrieved

Data flow
[Diagram not preserved]

Data flow, cont.
[Diagram not preserved]

Additional factors
- Ontology
  - Which astronaut, what mineral, ...
- Quoted strings
  - Who wrote “Afternoon On A Hill”?
- Paraphrase
  - Wording of answer may not match question
- Plausibility
  - Stonehenge is 14 inches high

Sample TREC questions
1. Who is the author of the book, "The Iron Lady: A Biography of Margaret Thatcher"?
2. What was the monetary value of the Nobel Peace Prize in 1989?
3. What does the Peugeot company manufacture?
4. How much did Mercury spend on advertising in 1993?
5. What is the name of the managing director of Apricot Computer?
6. Why did David Koresh ask the FBI for a word processor?
7. What debts did Qintex group leave?
8. What is the name of the rare neurological disease with symptoms such as: involuntary movements (tics), swearing, and incoherent vocalizations (grunts, shouts, etc.)?

Overview of TREC strategies
System      Run (bytes)   Score
Cymfony     50            .660
SMU         250/50        .646/.555
AT&T        250/250       .545
GePenn      250           .510
MultiText   250           .471
RMIT        250           .453
Xerox       250           .453
NTTData     250           .439
MITRE       250           .434
IBM         250/250       .430/.395
UMass       250           .383
?
[Marks in the NE, POS, Syn.Str., Ont., and VSyn. columns not preserved]

Conclusions
- Baseline performance is better than expected
- Viable question answering systems are on the horizon
- Good IR is necessary but not sufficient
- Minimal NLP is both helpful and feasible

The End
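The TREC8 slide describes the scoring: each question gets up to five ranked answers, and systems are scored by mean reciprocal rank. The following minimal sketch assumes each submitted answer has already been judged correct or incorrect. The boolean lists are hypothetical input, not TREC data. A question scores 1/rank of its first correct answer, or 0 if none of the five is correct. The system score is the mean over all questions.

```python
from typing import Sequence


def reciprocal_rank(judgments: Sequence[bool], max_answers: int = 5) -> float:
    """1/rank of the first correct answer within the top `max_answers`, else 0.0."""
    for rank, correct in enumerate(judgments[:max_answers], start=1):
        if correct:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(questions: Sequence[Sequence[bool]], max_answers: int = 5) -> float:
    """Mean of the per-question reciprocal ranks (0.0 if there are no questions)."""
    if not questions:
        return 0.0
    return sum(reciprocal_rank(q, max_answers) for q in questions) / len(questions)


# Usage: correct at rank 2, correct at rank 1, never correct.
print(mean_reciprocal_rank([[False, True], [True], [False] * 5]))  # 0.5
```

Under this measure, a correct answer at rank 6 or later earns nothing. An answer found only at rank 5 still earns 0.2.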
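The “CLARIT IR adapted for QA” slide describes two changes: retrieve 1-3 sentence subdocuments, and prefer those containing more of the search terms. The sketch below illustrates only that preference. It is not CLARIT's retrieval model. It substitutes a naive regex sentence splitter and plain word overlap for NP indexing. The function names and parameters (`best_subdocs`, `max_len`, `top_k`) are hypothetical.

```python
import re
from typing import Iterable, List, Set, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '?' or '!' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]


def words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_subdocs(text: str, query_terms: Iterable[str], max_len: int = 3,
                 top_k: int = 3) -> List[Tuple[int, str]]:
    """Rank every window of 1..max_len consecutive sentences by distinct query terms."""
    sents = split_sentences(text)
    terms = {t.lower() for t in query_terms}
    scored = []
    for start in range(len(sents)):
        for length in range(1, max_len + 1):
            if start + length > len(sents):
                break
            window = " ".join(sents[start:start + length])
            hits = len(terms & words(window))
            # Sort key: more distinct terms first, then shorter windows, then earlier ones.
            scored.append((hits, -length, -start, window))
    scored.sort(reverse=True)
    return [(hits, window) for hits, _, _, window in scored[:top_k] if hits > 0]
```

Ties go to the shorter window. A single sentence that already contains every query term therefore outranks the longer windows around it, which suits the 50-byte task.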
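The “Preliminary question analysis” slide maps surface cues to an expected answer type. The cues are question words, head nouns after "what"/"which", and scalar adjectives after "how". A rule-table sketch of that idea follows. The labels imitate the slides' `xperson`/`xcity` naming, but `xcountry`, `xdate`, `xlocation`, `xlinear`, `xspeed`, `xage` and the tables themselves are hypothetical. Focus cues are not handled.

```python
import re
from typing import Optional

# Hypothetical answer-type labels in the style of the slides' xperson / xcity.
WH_WORD_TYPES = {"who": "xperson", "when": "xdate", "where": "xlocation"}
HEAD_NOUN_TYPES = {"city": "xcity", "country": "xcountry", "year": "xdate",
                   "astronaut": "xperson"}
# "How long" is ambiguous (length or duration); this sketch simply picks length.
SCALAR_TYPES = {"long": "xlinear", "far": "xlinear", "tall": "xlinear",
                "fast": "xspeed", "old": "xage"}


def expected_answer_type(question: str) -> Optional[str]:
    """Guess the semantic type of the answer from surface cues, or None."""
    words = re.findall(r"[a-z]+", question.lower())
    if len(words) >= 2 and words[0] == "how" and words[1] in SCALAR_TYPES:
        return SCALAR_TYPES[words[1]]          # scalar adjective cue
    if len(words) >= 2 and words[0] in ("what", "which") and words[1] in HEAD_NOUN_TYPES:
        return HEAD_NOUN_TYPES[words[1]]       # head noun cue
    if words:
        return WH_WORD_TYPES.get(words[0])     # question word cue
    return None
```

For "What caused the decline in India’s tiger population?" the function returns `None`. This mirrors the “Limitations” slide: a question without a semantic cue gives the extractor no entity type to look for.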
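The “Additional extractors for QA” slide adds measurement types, among them Linear, Weight, Speed, and Age. The sketch below is regex-based and covers only those four types. The number words, unit lists, and priority order are assumptions, and a production extractor would need far broader coverage. Patterns are applied in priority order, and a later match that overlaps an earlier one is dropped. As a result, "15 feet per second" is tagged as a speed rather than split off as a linear "15 feet".

```python
import re
from typing import List, Tuple

# Hypothetical number vocabulary: digits (with optional separators) or a few words.
NUMBER = r"(?:\d+(?:[.,]\d+)*|half an?|an?|one|two|three|four|five|six|seven|eight|nine|ten)"

# (type, pattern) pairs in priority order; unit lists are illustrative only.
PATTERNS = [
    ("speed", re.compile(rf"\b{NUMBER}\s+(?:miles per hour|feet per second|mph|kph)\b", re.I)),
    ("age", re.compile(
        rf"\b{NUMBER}(?:-year-old|\s+(?:years?|weeks?|months?)\s+old|\s+years of age)\b", re.I)),
    ("weight", re.compile(rf"\b{NUMBER}\s+(?:pounds?|tons?|kilos?|kilograms?)\b", re.I)),
    ("linear", re.compile(
        rf"\b{NUMBER}\s+(?:feet|foot|miles?|inch(?:es)?|centimeters?|meters?)\b", re.I)),
]


def extract_measures(text: str) -> List[Tuple[str, str]]:
    """Return (type, matched text) pairs in document order, without overlaps."""
    claimed: List[Tuple[int, int]] = []   # spans already tagged by a higher-priority type
    found: List[Tuple[int, str, str]] = []
    for label, pattern in PATTERNS:
        for m in pattern.finditer(text):
            if any(m.start() < end and start < m.end() for start, end in claimed):
                continue
            claimed.append(m.span())
            found.append((m.start(), label, m.group(0)))
    found.sort()
    return [(label, span_text) for _, label, span_text in found]
```

Typed measurements like these are what the scalar-adjective cues ("how tall", "how fast") are matched against. The "Stonehenge is 14 inches high" example shows why a type match alone does not settle plausibility.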
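The three answer patterns for “Who discovered radium?” can be roughly approximated with regular expressions over raw sentences. This is only an illustration under strong assumptions:

- The slides match patterns against a hierarchical parse. This sketch instead treats `[]*` as "any characters within the same sentence".
- The `(xperson)` entity is replaced by a small hypothetical list of known names.
- `!CNP` is approximated by requiring punctuation right after the verb.
- The active and passive verb forms are assumed to coincide, as they do for "discovered".
- `find_answers` is a hypothetical name.

```python
import re
from typing import Iterable, List


def find_answers(text: str, verb: str, obj: str, people: Iterable[str]) -> List[str]:
    """Return candidate xperson answers for 'Who <verb> <obj>?', in pattern order."""
    names = sorted(people, key=len, reverse=True)  # longest first so full names win
    if not names:
        return []
    person = "(?P<answer>" + "|".join(re.escape(n) for n in names) + ")"
    v, o = re.escape(verb), re.escape(obj)
    gap = r"[^.]*?"  # stand-in for []*: any material inside the same sentence
    patterns = [
        # (xperson) []* AVerb1 CNP1            -> "X discovered radium"
        rf"\b{person}\b{gap}\b{v}\s+{o}\b",
        # CNP1 []* PVerb1 []* by (xperson)     -> "radium was discovered by X"
        rf"\b{o}\b{gap}\b{v}\b{gap}\bby\s+{person}\b",
        # CNP1 []* Rel (xperson) []* AVerb1 !CNP -> "radium, which X discovered"
        rf"\b{o}\b{gap}\b(?:which|that)\s+{person}\b{gap}\b{v}(?=\s*(?:[,.;:]|$))",
    ]
    answers: List[str] = []
    for pattern in patterns:
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            if m.group("answer") not in answers:
                answers.append(m.group("answer"))
    return answers
```

Each pattern fixes where the answer may sit relative to the indexed elements ("discovered", "radium"), which is the role the “Question/answer matching” slide assigns to sketchy patterns. Structure distinguishes these cases, so "Marie Curie discovered polonium" does not answer the radium question.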
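The “Answer ranking” slide lists seven factors but does not say how they are combined. One plausible reading is a weighted sum, sketched below purely as an assumption. The `Candidate` fields mirror the slide's factors, while the weights and all names are hypothetical and would need tuning on judged data. The sketch keeps the top five candidates, matching TREC8's five answers per question.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    answer: str
    subdoc_rank: int            # 1 = best-ranked retrieved subdocument
    entity_type_match: bool
    pattern_elements_matched: int
    pattern_goodness: float     # 0.0-1.0, how cleanly the sketchy pattern matched
    bonus_matches: int
    exact_cnp_match: bool
    times_retrieved: int


def score(c: Candidate) -> float:
    """Hypothetical linear combination of the slide's ranking factors."""
    return (2.0 / c.subdoc_rank
            + (3.0 if c.entity_type_match else 0.0)
            + 1.0 * c.pattern_elements_matched
            + 2.0 * c.pattern_goodness
            + 0.5 * c.bonus_matches
            + (1.0 if c.exact_cnp_match else 0.0)
            + 0.25 * c.times_retrieved)


def rank_answers(candidates: List[Candidate], k: int = 5) -> List[str]:
    """Return the k highest-scoring answers, best first."""
    return [c.answer for c in sorted(candidates, key=score, reverse=True)[:k]]
```

A lexicographic ordering is an equally defensible alternative, for example entity type match first, then pattern matches, then retrieval rank. The weighted sum simply lets a strong pattern match outweigh a slightly worse retrieval rank.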
