shen 1

Published on October 12, 2007

Author: Mahugani

Source: authorstream.com

Content

A Maximum Entropy-based Model for Answer Extraction
  Dan Shen
  IGK, Saarland University
  Supervisors: Prof. Dietrich Klakow, Dr. ir. Geert-Jan M. Kruijff

Part I -- Introduction
  Answer Extraction Module in QA
  Statistical Method for Answer Extraction
  Motivation
  Framework

Answer Extraction Module in QA
  Open-domain factoid question answering
  Basic modules
    Information Retrieval Module → a set of relevant sentences / paragraphs
    Answer Extraction (AE) Module → the appropriate answer phrase
  Q: What is the capital of Japan?
  A: Tokyo
  Q: How far is it from Earth to Mars?
  A: 249 million miles

Techniques and Resources for AE
  How to incorporate them?
    Pipeline structure
    Mathematical framework

Motivation – Use Statistical Methods?
  Flexibility
    Integrating various techniques / resources
    Easy to extend to span more in the future
  Effectiveness

Research Issues
  Answer candidate selection
    Which constituent is regarded as an answer candidate (AC)?
  Methods
    Classification / ranking / …
  Features

Part II – ME-based model
  Method
  Features
  Experiments and Results

Maximum Entropy Formulation I
  Given a set of answer candidates
  Model the probability
  Define feature functions
  Decision rule

Maximum Entropy Formulation II
  Given a set of answer candidates
  Model the probability
  Define feature functions
  Decision rule

Some Considerations
  Model I: judge whether each candidate is a correct answer
    √ Can find more than one correct answer in a sentence
    ? Is the probability comparable?
    × Suffers from the unbalanced data set (1 Pos / >20 Neg)
  Model II: find the best answer among the candidates
    × In a sentence, it finds just one correct answer
    √ Directly makes the probabilities of the candidates comparable
  Experiment
    Model II outperforms Model I by about 5%

Question Analysis
  Q: What US biochemists won the Nobel Prize in medicine in 1992?
    Question Word -- what
    Target Word -- biochemist
    Subject Words -- Nobel Prize / medicine / 1992
    Verb -- win
    Expected answer type -- PERSON
  Q: What is the name of the highest mountain in Africa?
    Question Word -- what
    Target Word -- mountain
    Subject Words -- highest / Africa
    Verb -- be
    Expected answer type -- LOCATION

Answer Candidate Selection
  Preprocessing
    Named entity recognition
    Parsing [Collins Parser]
    Conversion to dependency tree
  Answer candidate selection
    Base noun phrases
    Named entities
    Leaf nodes
  Answer candidate coverage
    11876 / 14039 = 84.6%
    Missing some sentences → consider all of the nodes?

Features – Syntactic / POS Tag Features
  Observation
    For who / where questions, answers = proper noun
    For how / when questions, answers = CD
  Question word × syntactic tag / POS tag
    QWord = "how" & SynTag = "CD"
    QWord = "who" & SynTag = "NNP"
    QWord = "when" & SynTag = "NNP"
    QWord = "when" & SynTag = "CD"
    …

Features – Surface Word Features
  Word formations
    Length / capitalized / digits, …
  Question word × word formations
    QWord = "who" & word is capitalized
    QWord = "who" & word length < 3
  Word co-occurrence between Q and A
    Observation -- answers aren't a subsequence of the question

Features – Named Entity Features
  Question type × NE type
    QType = Person & NE type = Person
    QType = Date & NE type = Date
    QType = how much & NE type = Money
    …
  Useful for who, where, when, … questions
  But what about what / which / how questions?
    Many expected answer types do not belong to a defined NE type
    Q1: What language is most commonly used in Bombay?
    Q2: What city is …
    Q3: Which movie win …

Features – TWord Relation for WHAT I
  TWord is a hypernym of the answer
  TWord is the head of the answer
  Q: What is the name of the airport in Dallas Ft. Worth?
  A: Wednesday morning, the low temperature at the Dallas-Fort Worth International Airport was 81 degrees.
  Q: What city is Disneyland in?
  A: Not bad for a struggling actor who was working at Tokyo Disneyland just a few years ago.

Features – TWord Relation for WHAT II
  TWord is the appositive of the answer
  Feature function
    QWord = what & TWord is hypernym of answer candidate
    …
  Q: What book did Rachel Carson write in 1962?
  A1: In her 1962 book Silent Spring, Rachel Carson, a marine biologist, chronicled DDT's poisonous effects, …
  A2: In 1962, former U.S. Fish and Wildlife Service biologist Rachel Carson shocked the nation with her landmark book, Silent Spring.

Features – TWord Relation for HOW
  How many / much + NN …
  How long / far / tall / fast …
    How long … → year / day / month / …
    How tall … → feet / inch / mile / …
    How fast … → per day / per hour / …
  Use some trigger word features
  Q: How many time zones are there in the world?
  A: The world is divided into 24 time zones.

Features – Subject Word Relations I
  Q: Who invented the paper clip?
  S1: The paper clip, weighing a desk-crushing 1320 pounds, is a faithful copy of Norwegian Johan Vaaler's 1899 invention, said …
  S2: "Like the guy who invented the safety pin, or the guy who invented the paper clip", David says. ×

Features – Subject Word Relations II
  Match subject words in the answer sentence
    Minimal edit distance
  Dependency relationship matching
    Observation -- answers are close to SWord in the dependency tree → answer and SWord have some relation
    Answer candidate is a subject word
    Answer candidate is the parent / child / brother of SWord
    The path from the answer candidate to SWord
  Q: What is the name of the airport in Dallas Ft. Worth?
  A: Wednesday morning, the low temperature at the Dallas-Fort Worth International Airport was 81 degrees

Experiment Settings
  Training data
    TREC 1999, TREC 2000, TREC 2002
    Total number of questions: 1108
    Total number of sentences: 11331
  Test data
    TREC 2003
    Total number of questions: 362 (NIL questions removed)
    Total number of sentences: 2708

Question Word Distribution

Overall Performance
  MRR -- Mean Reciprocal Rank
  Return five answers for each question

Contribution of Different Features
  Syntactic / POS Tag Features
  + Surface Word Features
  + Named Entity Features
  + TWord Relations for WHAT
  + TWord Relations for HOW
  + Subject Word Relations

Error Analysis – I
  Target word concept unresolved
    Q: What is the traditional dish served at Wimbledon?
    √ A: And she said she wasn't wild about Wimbledon's famed strawberries and cream.
    × A: And she said she wasn't wild about Wimbledon's famed strawberries and cream.
  Choosing the wrong entity
    Q: What actress has received the most Oscar nominations?
    √ A: Oscar perennial Meryl Streep is up for best actress for the film, tying Katharine Hepburn for most acting nominations with 12.
    × A: Oscar perennial Meryl Streep is up for best actress for the film, tying Katharine Hepburn for most acting nominations with 12.

Error Analysis – II
  Answer candidate granularity
    Q: What city is Disneyland in?
    √ A: Not bad for a struggling actor who was working at Tokyo Disneyland just a few years ago.
    × A: Not bad for a struggling actor who was working at Tokyo Disneyland just a few years ago.
  Repeated target word in answer
    Q: How many grams in an ounce?
    √ A: NOTE: 30 grams is about 1 ounce.
    × A: NOTE: 30 grams is about 1 ounce.
  Misc.

Future Work
  Extract answers from the Web
  Evaluate on other data sets
    Knowledge Master Corpus
  How to deal with NIL questions?
  Incorporate more linguistically motivated features

The End
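The formulas on the two "Maximum Entropy Formulation" slides did not survive in this transcript, so the following is only a minimal sketch of how the two models contrasted under "Some Considerations" are commonly written, not the author's exact formulation. It assumes the standard log-linear form: Model I scores each candidate independently (equivalent to logistic regression when features fire only for the positive class), while Model II normalizes over all candidates of one sentence. The function names and the dictionary-based feature representation are hypothetical.

```python
import math


def linear_score(features, weights):
    """Weighted sum of the active features (missing weights count as 0)."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def model_one_probability(features, weights):
    """Model I (assumed form): P(correct | candidate), judged independently."""
    return 1.0 / (1.0 + math.exp(-linear_score(features, weights)))


def model_two_probabilities(candidates, weights):
    """Model II (assumed form): distribution over the candidates of one sentence."""
    scores = [linear_score(f, weights) for f in candidates]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def best_candidate(candidates, weights):
    """Decision rule for Model II: index of the most probable candidate."""
    probs = model_two_probabilities(candidates, weights)
    return max(range(len(probs)), key=probs.__getitem__)
```

Because Model II normalizes within a sentence, its probabilities are directly comparable across candidates, at the cost of selecting a single answer per sentence, which matches the trade-off listed on the slide.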
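The conjunction features listed on the feature slides (question word × syntactic tag, question word × word formation, question type × NE type) can be illustrated as binary indicator features. This is a sketch under assumptions: the dictionary keys (`qword`, `qtype`, `text`, `syn_tag`, `ne_type`) and the feature-name strings are hypothetical and chosen only to mirror the slide notation.

```python
def conjunction_features(question, candidate):
    """Build binary features that pair question properties with candidate properties."""
    qword = question["qword"]
    text = candidate["text"]

    # Question word x syntactic / POS tag, e.g. QWord=who & SynTag=NNP
    feats = {f"QWord={qword}&SynTag={candidate['syn_tag']}": 1.0}

    # Question word x word formation (capitalization, length, digits)
    if text[:1].isupper():
        feats[f"QWord={qword}&capitalized"] = 1.0
    if len(text) < 3:
        feats[f"QWord={qword}&length<3"] = 1.0
    if any(ch.isdigit() for ch in text):
        feats[f"QWord={qword}&has_digit"] = 1.0

    # Question type x named entity type, only when the candidate is a named entity
    ne_type = candidate.get("ne_type")
    if ne_type is not None:
        feats[f"QType={question['qtype']}&NEType={ne_type}"] = 1.0
    return feats
```

For the paper clip question, a candidate such as "Johan Vaaler" tagged NNP and recognized as a Person would activate the who × NNP, who × capitalized, and Person × Person features.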
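The dependency relationship features on the "Subject Word Relations II" slide (candidate is the parent, child, or brother of SWord, plus the path between them) can be sketched on a toy dependency tree. The representation is an assumption made for illustration: `heads[i]` holds the index of the parent of token `i`, with `-1` marking the root. The slides do not say how the path is encoded, so path length in edges is used here as one simple possibility.

```python
def ancestors(heads, node):
    """Return [node, parent, grandparent, ..., root] for a head-index array."""
    chain = [node]
    while heads[chain[-1]] != -1:
        chain.append(heads[chain[-1]])
    return chain


def relation(heads, cand, sword):
    """Coarse relation of the answer candidate to the subject word."""
    if cand == sword:
        return "same"
    if heads[cand] == sword:
        return "child"    # candidate hangs directly below SWord
    if heads[sword] == cand:
        return "parent"   # SWord hangs directly below the candidate
    if heads[cand] != -1 and heads[cand] == heads[sword]:
        return "sibling"  # "brother" on the slide
    return "other"


def path_length(heads, cand, sword):
    """Number of edges on the tree path between candidate and SWord."""
    up_sword = {node: depth for depth, node in enumerate(ancestors(heads, sword))}
    for depth, node in enumerate(ancestors(heads, cand)):
        if node in up_sword:  # lowest common ancestor reached
            return depth + up_sword[node]
    raise ValueError("nodes are not in the same tree")
```

A short path supports the slide's observation that answers tend to sit close to subject words in the dependency tree.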
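The "Overall Performance" slide reports MRR with five answers returned per question. As a reminder of how that metric is computed, here is a minimal sketch; the input format (one list of correct/incorrect flags per question, in ranked order) is an assumption made for this example.

```python
def mean_reciprocal_rank(judgments, cutoff=5):
    """Average of 1/rank of the first correct answer within the top `cutoff`.

    `judgments` holds one list per question; each list contains booleans for
    the ranked answers. Questions with no correct answer in the top `cutoff`
    contribute 0.
    """
    if not judgments:
        return 0.0
    total = 0.0
    for ranked in judgments:
        for rank, correct in enumerate(ranked[:cutoff], start=1):
            if correct:
                total += 1.0 / rank
                break
    return total / len(judgments)
```

For three questions whose first correct answers appear at rank 2, rank 1, and beyond rank 5, the reciprocal ranks are 0.5, 1, and 0, so the MRR is 0.5.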
