NIST TDT2004

Information about NIST TDT2004

Published on May 7, 2008

Author: Lucianna

Source: authorstream.com

Content

Overview of the TDT 2004 Evaluation and Results:  Overview of the TDT 2004 Evaluation and Results Jonathan Fiscus Barbara Wheatley National Institute of Standards and Technology Gaithersburg, Maryland December 2-3, 2004 Outline:  Outline TDT Evaluation Overview Changes in 2004 2004 TDT Evaluation Result Summaries New Event Detection Link Detection Topic Tracking Experimental Tasks: Supervised Adaptive Topic Tracking Hierarchical Topic Detection Topic Detection and Tracking:  Topic Detection and Tracking 5 TDT Applications Story Segmentation* Topic Tracking Topic Detection First Story Detection Link Detection “Applications for organizing text” Terabytes of Unorganized data * Not evaluated in 2004 TDT’s Research Domain :  TDT’s Research Domain Technology challenge Develop applications that organize and locate relevant stories from a continuous feed of news stories Research driven by evaluation tasks Composite applications built from Document Retrieval Speech-to-Text (STT) – not included this year Story Segmentation – not included this year Definitions:  Definitions An event is … A specific thing that happens at a specific time and place along with all necessary preconditions and unavoidable consequences. A topic is … an event or activity, along with all directly related events and activities A broadcast news story is … a section of transcribed text with substantive information content and a unified topical focus Evaluation Corpus:  Evaluation Corpus Same languages as last year Summary of differences New time period No broadcast news No non-news stories 4.5 times more stories 3.1 times more topics Topics have ½ as many on-topic stories Topic Size Distribution:  Topic Size Distribution 35 Arb+Eng+Man 62 Arb 62 Man 63 Eng 21 Eng+Man 7 Arb+Eng Mutlilingual Topic overlap:  Mutlilingual Topic overlap 25 Common Stories Topic ID Unique Stories Topics on Terrorism 107: Casablanca bombs 71: Demonstrations in Casablanca Single Overlap Topics Multiply Overlap Topics Topic labels:  Topic labels 72 Court indicts Liberian President 89 Liberian former president arrives in exile 29 Swedish Foreign Minister killed 125 Sweden rejects the Euro 151 Egyptian delegation in Gaza 189 Palestinian public uprising suspended for three months 69 Earthquake in Algeria 145 Visit of Morocco Minister of Foreign Affairs to Algeria 186 Press conference between Lebanon and US foreign ministers 193 Colin Powell Plans to visit Middle East and Europe 105 UN official killed in attack 126 British soldiers attacked in Basra 215 Jerusalem: Bus suicide bombing 227 Bin Laden Videotape 171 Morocco: death sentences for bombing suspects 107 Casablanca bombs 71 Demonstrations in Casablanca 106 Bombing in Riyadh, Saudi Arabia 118 World Economic Forum in Jordan 154 Saudi suicide bomber dies in shootout 60 Saudi King has eye surgery 80 Spanish Elections Single Overlap Topics Multiply Overlap Topics Participation by Task: Showing the Number of Submitted System Runs:  Participation by Task: Showing the Number of Submitted System Runs Foreign Domestic New Event Detection Task:  New Event Detection Task System Goal: To detect the first story that discusses each topic TDT Evaluation Methodology:  TDT Evaluation Methodology Tasks are modeled as detection tasks Systems are presented with many trials and must answer the question: “Is this example a target trial?” Systems respond: YES this is a target, or NO this is not Each decision includes a likelihood score indicating the system’s confidence in the decision System performance measured by linearly combining the system’s missed detection rate and false alarm rate Detection Evaluation Methodology:  Detection Evaluation Methodology Performance is measured in terms of Detection Cost CDet = CMiss * PMiss * Ptarget + CFA * PFA * (1- Ptarget) Constants: CMiss = 1 and CFA = 0.1 are preset costs Ptarget = 0.02 is the a priori probability of a target System performance estimates PMiss and PFA Normalized Detection Cost generally lies between 0 and 1: (CDet)Norm = CDet/min{CMiss*Ptarget, CFA * (1-Ptarget)} Detection Error Tradeoff (DET) curves graphically depict the performance tradeoff between PMiss and PFA Makes use of likelihood scores attached to the YES/NO decisions Two important scores per system Actual Normalized Detection Cost Based on the YES/NO decision threshold Minimum Normalized DET point Based on the DET curve: Minimum score with proper threshold Slide14:  Performance Measures Example Bottom left is better DET Curve Bar Chart Primary New Event Detection Results Newswire, English Texts:  Primary New Event Detection Results Newswire, English Texts New Event Detection Performance History :  New Event Detection Performance History * 0.4283 on 2002 Topics Slide17:  TDT Link Detection Task System Goal: To detect whether a pair of stories discuss the same topic. (Can be thought of as a “primitive operator” to build a variety of applications) ? Primary Link Detection Results Newswire, Multilingual links, 10-file deferral period:  Primary Link Detection Results Newswire, Multilingual links, 10-file deferral period Link Detection Performance History :  Link Detection Performance History * 0.1798 on 2002 Topics Topic Tracking Task:  Topic Tracking Task System Goal: To detect stories that discuss the target topic, in multiple source streams Supervised Training Given Nt samples stories that discuss a given target topic Testing Find all subsequent stories that discuss the target topic Primary Tracking Results Newswire, Multilingual Texts, 1 English Training Story:  Primary Tracking Results Newswire, Multilingual Texts, 1 English Training Story Tracking Performance History :  Tracking Performance History * 0.1618 on 2002 Topics Supervised Adaptive Tracking Task:  Supervised Adaptive Tracking Task Variation of Topic Tracking system goal: To detect stories that discuss the target topic when a human provides feedback to the system System receives human judgment (on or off-topic) for every retrieved story Same task as TREC 2002 Adaptive Filtering training data test data Supervised Adaptive Tracking Metrics:  Supervised Adaptive Tracking Metrics Normalized Detection Cost Same measure as for basic Tracking task Linear Utility Measure As defined for TREC 2002 Filtering Track (Robertson & Soboroff) Measures value of the stories sent to the user: Credit for relevant stories, debit for non-relevant stories Equivalent to thresholding based on estimated probability of relevance No penalty for missing relevant stories (i.e. all precision, no recall) Implication: Challenge is to beat the “do-nothing” baseline (i.e. a system that rejects all stories) Supervised Adaptive Tracking Metrics:  Supervised Adaptive Tracking Metrics Linear Utility Measure Computation: Basic formula: U = Wrel  R - NR R = number of relevant stories retrieved NR = number of non-relevant stories retrieved Wrel = relative weight of relevant vs non-relevant (set to 10, by analogy with CMiss vs. CFA weights for CDet) Normalization across topics: Divide by maximum possible utility score for each topic Scaling across topics: Define arbitrary minimum possible score, to avoid having average dominated by a few topics with huge NR counts Corresponds to application scenario in which user stops looking at stories when system exceeds some tolerable false alarm rate Scaled, normalized value: Uscale = [ max(Unorm, Umin) ] / [ 1 - Umin ] Supervised Adaptive Tracking Best Two Submissions per Site Newswire, Multilingual Texts, 1 English Training Story:  Supervised Adaptive Tracking Best Two Submissions per Site Newswire, Multilingual Texts, 1 English Training Story Effect of Supervised Adaptation:  Effect of Supervised Adaptation CMU4 is a simple cosine similarity tracker Contrastive run submitted without supervised adaptation Supervised Adaptive Tracking Utility vs. Detection cost:  Supervised Adaptive Tracking Utility vs. Detection cost Performance on Utility measure: 2/3 of systems surpassed baseline scaled utility score (0.33) Most systems optimized for detection cost, not utility Detection Cost and Utility are uncorrelated: R2 of 0.23 Even for CMU3 which was tuned for utility Hierarchical Topic Detection:  Hierarchical Topic Detection System goal: To detect topics in terms of the (clusters of) stories that discuss them Problems with past Topic Detection evaluations: Topics are at different levels of granularity, yet systems had to choose single operating point for creating a new cluster Stories may pertain to multiple topics, yet systems had to assign each to only one cluster Topic Hierarchy Solves Problems:  Topic Hierarchy Solves Problems System operation: Unsupervised topic training - no topic instances as input Assign each story to one or more clusters Clusters may overlap or include other clusters Clusters must be organized as directed acyclic graph (DAG) with single root Treated as retrospective search Semantics of topic hierarchy: Root = entire collection Leaf nodes = the most specific topics Intermediate nodes represent different levels of granularity Performance assessment: Given a topic, find matching cluster with lowest cost Edge Vertex Story IDs Hierarchical Topic Detection Metric: Minimal Cost:  Hierarchical Topic Detection Metric: Minimal Cost Weighted combination of Detection Cost and Travel Cost: WDET  (Cdet(topic, bestVertex))Norm + (1 - WDET)  Ctravel(topic, bestVertex))Norm Detection Cost: same as for other tasks Travel Cost: function of the hierarchy Detection Cost weighted 2 Travel Cost (WDET = 0.66) Minimal Cost metric selected based on study at U Mass (Allan et al.): Effectively eliminates power set solution Favors balance of cluster purity vs. number of clusters Computationally tractable Good behavior in U Mass experiments Analytic use model: Find best-matching cluster by traversing DAG, starting from root Corresponds to analytic task of exploring an unknown collection Drawbacks: Does not model analytic task of finding other stories on same or neighboring topics Not obvious how to normalize travel cost Hierarchical Topic Detection Metric: Travel Cost:  Hierarchical Topic Detection Metric: Travel Cost Travel Cost computation: Ctravel(topic, vertex) = Ctravel(topic, parentOf(vertex)) + CBRANCH  NumChildren(parentOf(vertex)) + CTITLE CBRANCH = cost per branch, for each vertex on path to best match CTITLE = cost of examining each vertex Relative values of CBRANCH and CTITLE determine preference for shallow, bushy hierarchy vs. deep, less bushy hierarchy Evaluation values chosen to favor branching factor of 3 Travel Cost normalization: Absolute travel cost depends on size of corpus, diversity of topics Must be normalized to combine with Detection Cost Normalization scheme for trial evaluation chosen to yield CtravelNorm = 1 for “ignorant” hierarchy (by analogy with use of prior probability for CdetNorm): CtravelNorm = Ctravel / (CBRANCH * MAXVTS * NSTORIES / AVESPT) + CTITLE MAXVTS = 3 (maximum number of vertices per story, controls overlap) AVESPT = 88 (average stories per topic, computed from TDT4 multilingual data) Hierarchical Topic Detection:  Hierarchical Topic Detection Hierarchical Topic Detection Observations:  Hierarchical Topic Detection Observations All systems structured hierarchy as a tree – each vertex has one parent Travel cost has very little effect on finding the best cluster Setting WDET to 1.0 has little effect on topic mapping Cost parameters favor false alarms Average mapped cluster sizes are between 1262 and 7757 stories Average topic size is 40 stories Summary:  Summary Eleven research groups participated in five evaluation tasks Error rates increased for new event detection Why? Error rates decreased for tracking Error rates decreased for link detection Dry run of hierarchical topic detection completed Solves previous problems with topic detection task, but raises new issues Questions to consider: Is the specified hierarchical structure (single-root DAG) appropriate? Is the minimal cost metric appropriate? If so, is the normalization right? Dry run of supervised adaptive tracking completed Promising results for including relevance feedback Questions to consider: Should we continue the task? If so, should we continue using both metrics?

Related presentations


Other presentations created by Lucianna

Nutritional Care of Burns
04. 01. 2008
0 views

Nutritional Care of Burns

spine2 no background
08. 05. 2008
0 views

spine2 no background

banking
14. 04. 2008
0 views

banking

emerging security threats
29. 09. 2007
0 views

emerging security threats

Thunderstorms
03. 10. 2007
0 views

Thunderstorms

i2 traffic shaping
03. 10. 2007
0 views

i2 traffic shaping

bind
07. 10. 2007
0 views

bind

prefix delegation requirement1
09. 10. 2007
0 views

prefix delegation requirement1

dipo
12. 10. 2007
0 views

dipo

Living Things
12. 10. 2007
0 views

Living Things

wnv062904
21. 10. 2007
0 views

wnv062904

latinoamerica
22. 10. 2007
0 views

latinoamerica

Rachinsky
11. 10. 2007
0 views

Rachinsky

Slide presentazione
24. 10. 2007
0 views

Slide presentazione

feynman
16. 10. 2007
0 views

feynman

gt bot
13. 10. 2007
0 views

gt bot

fr summit marginson 230306
30. 10. 2007
0 views

fr summit marginson 230306

Accelerators CZ
15. 11. 2007
0 views

Accelerators CZ

Les Animaux du Zoo
11. 10. 2007
0 views

Les Animaux du Zoo

Rapport Nationale MAROC
23. 10. 2007
0 views

Rapport Nationale MAROC

Grammar essentials
16. 11. 2007
0 views

Grammar essentials

sponge
20. 11. 2007
0 views

sponge

Crans Montana 03 nieuw
15. 10. 2007
0 views

Crans Montana 03 nieuw

Workshop
02. 11. 2007
0 views

Workshop

NSF 12 6 2001
31. 12. 2007
0 views

NSF 12 6 2001

Class8
07. 01. 2008
0 views

Class8

toc wkshp nov03
18. 10. 2007
0 views

toc wkshp nov03

VCT Morocco
24. 10. 2007
0 views

VCT Morocco

NACADA Combined Workshop 11 04
29. 09. 2007
0 views

NACADA Combined Workshop 11 04

sky
13. 11. 2007
0 views

sky

file Kigali Strengthening Local
07. 01. 2008
0 views

file Kigali Strengthening Local

10638221831Maroc MinInt French
23. 10. 2007
0 views

10638221831Maroc MinInt French

ub geographicimagery051 001
27. 09. 2007
0 views

ub geographicimagery051 001

Presentación RR EXPORTA def
23. 10. 2007
0 views

Presentación RR EXPORTA def

prosper
28. 12. 2007
0 views

prosper

HPCN summary 7 5 2007
17. 10. 2007
0 views

HPCN summary 7 5 2007

ammosov
12. 10. 2007
0 views

ammosov

A NEW ENGLISH COURSE Book 3
20. 02. 2008
0 views

A NEW ENGLISH COURSE Book 3

Food Bank of New Jersey
29. 02. 2008
0 views

Food Bank of New Jersey

lewis
19. 10. 2007
0 views

lewis

XC Safety and mentor
03. 04. 2008
0 views

XC Safety and mentor

NA3
07. 04. 2008
0 views

NA3

Civitas Plus2006
18. 03. 2008
0 views

Civitas Plus2006

Ch14 7e
10. 04. 2008
0 views

Ch14 7e

Team2
11. 04. 2008
0 views

Team2

fmla
04. 10. 2007
0 views

fmla

retailcompetition
17. 04. 2008
0 views

retailcompetition

Using ILS
22. 04. 2008
0 views

Using ILS

shaw
16. 03. 2008
0 views

shaw

CSI Presentation 2007
19. 02. 2008
0 views

CSI Presentation 2007

chapter3v2
15. 10. 2007
0 views

chapter3v2

MEDOPSBOOKFEB01
02. 05. 2008
0 views

MEDOPSBOOKFEB01

BostwPres
02. 05. 2008
0 views

BostwPres

555 Spanish
02. 05. 2008
0 views

555 Spanish

hexapod Shirke
02. 05. 2008
0 views

hexapod Shirke

Lung Expansion 1
02. 05. 2008
0 views

Lung Expansion 1

Aaron
02. 05. 2008
0 views

Aaron

CMI slides Feb05
01. 11. 2007
0 views

CMI slides Feb05

SAP1012
10. 03. 2008
0 views

SAP1012

lesson 4
15. 10. 2007
0 views

lesson 4

2006 APHA
05. 10. 2007
0 views

2006 APHA

probir
30. 03. 2008
0 views

probir

Rauf Presentation NEW
18. 10. 2007
0 views

Rauf Presentation NEW

IAJAPAN
09. 10. 2007
0 views

IAJAPAN

Mr Daisuke Matsunaga
09. 10. 2007
0 views

Mr Daisuke Matsunaga

3 KukaGLBThealthissues
29. 10. 2007
0 views

3 KukaGLBThealthissues

Bernard ANSELMETTI
24. 10. 2007
0 views

Bernard ANSELMETTI

NBII Newark 10 02
21. 10. 2007
0 views

NBII Newark 10 02

MarketingWorkshop 4 22 05rev1
24. 10. 2007
0 views

MarketingWorkshop 4 22 05rev1

FEESTDAGEN
06. 11. 2007
0 views

FEESTDAGEN

trainplanesandautomo biles
13. 03. 2008
0 views

trainplanesandautomo biles

NWA June00
05. 10. 2007
0 views

NWA June00

Panama 2004 Reporte
25. 10. 2007
0 views

Panama 2004 Reporte

SAKURA Yamamoto
25. 03. 2008
0 views

SAKURA Yamamoto

tiner presentation
04. 01. 2008
0 views

tiner presentation

aseancjp
09. 10. 2007
0 views

aseancjp

schools talk
29. 10. 2007
0 views

schools talk

BethkeA
02. 10. 2007
0 views

BethkeA

DeVidtsPresentation
11. 10. 2007
0 views

DeVidtsPresentation

Zhu Zhiyong
16. 10. 2007
0 views

Zhu Zhiyong

pres1 1
22. 10. 2007
0 views

pres1 1

AESC 2005 VERMONT Result
02. 11. 2007
0 views

AESC 2005 VERMONT Result