040916 EV WS 10 More Applications

Published on September 29, 2007

Author: Nivedi

Source: authorstream.com

Content

Slide1:
  Multilingual text analysis applications based on automatic Eurovoc indexing
  Ralf Steinberger
  Addressing the Language Barrier Problem in the Enlarged EU: Automatic Eurovoc Descriptor Assignment
  JRC Workshop, Ispra, 16/17 September 2004
  http://www.jrc.cec.eu.int/langtech

Applications mentioned so far:
  Thesaurus indexing (summarise main concepts of document)
    Fully automatic
    Interactive
    Monolingual and cross-lingual
  Document retrieval
    Monolingual and cross-lingual
  Eurovoc indexing can be used for MUCH MORE …

Main goals of JRC’s Language Technology (LT) activity:
  Gather potentially user-relevant documents
  Analyse texts in various languages
    extract information from texts (Eurovoc)
    identify similarity between documents (Eurovoc)
  Classify documents (Eurovoc)
  Visualise contents
    of individual documents (Eurovoc)
    of whole document collections (Eurovoc)

Eurovoc indexing as part of a tool set:

(Cross-lingual) document similarity calculation:
  Spanish Text: Resolución sobre los residuos radioactivos
  monolingual

(Multilingual) text classification:
  Most current approaches to text classification are monolingual
  Text classification, via Eurovoc, is multilingual

(Multilingual) document map:
  © Cartia’s ThemeScape

‘Translation Spotting’:
  Why?
    To test document similarity calculation
    To compile a collection of parallel texts (for the training and testing of other multilingual text analysis applications)
    To detect cross-lingual document plagiarism

‘Translation Spotting’ - Results:
  Task: find Spanish translations of English source document in a parallel text collection
  Simple document similarity (DS)

(Multilingual) clustering of documents:
  To organise unknown document collections
  Algorithm:
    Find pairs of texts that are most similar
    Group them in one cluster, repeat the operation until only one cluster remains
  Figure labels: 90%, 80%, 75%, 40%, 10%

Building a (multilingual) cluster tree:

Application to (multilingual) news analysis:
  EMM system in JRC’s Web Technology sector retrieves about 20,000 news articles per day in ~20 languages (4000 articles in English) (http://emm.jrc.it)
  Cluster related news stories and identify duplicates (news topic identification)
  Identify keywords, people’s names, place names, main sentences (information extraction)
  Find related news stories over time (news topic tracking)
  Find related news stories in other languages (cross-lingual topic tracking mainly via Eurovoc and place names)

Slide13:
  Detection of the major news of the day (EMM)

Establish Links to Related News over time:

Establish links to related news in other languages:

Subject-specific summarisation (1):
  Title: "Resolution on the 10th anniversary of the Chernobyl accident"

Subject-specific summarisation (2):

Further JRC LT applications:
  Recognition and translation of:
    Place names; + visualisation
    Products
  Recognition of text language

Place name recognition / Cross-lingual display:

Place name recognition / Visualisation:
  18 references (Boston, American, America, New York)
  11 references (Vietnam)
  5 references (Iraq)
  + 1 reference to Sweden (Andre Heinz(…) Swedish based environmental consultant)

Place name recognition / Disambiguation:
  Requires disambiguation
    14 Paris’, 7 Birminghams
    cities called ‘And’, ‘Annan’
    name variants (exonyms)
  Zoom on Europe

Recognising names, places, … - News navigation:
  Top-mentioned personalities
  En/Fr news 26 July 2004

Automatic recognition of name variants:

Automatic link to online encyclopaedia:

News clusters mentioning a person:

Persons talked about in same news clusters:

Countries talked about in same news clusters:

Frequent keywords for these news clusters:

Recognising products and product groups:
  Sample text

Recognising products and product groups:
  Identified products

Recognising products and product groups:
  Cross-lingual display of products found

Slide33:
  Multilingual Information Extraction
    Language recognition (demo)
    Keywords (monolingual; cross-lingual)
    Geographical place names (intro; new EU languages; demo)
    Products and product groups (slides; demo JRC, demo CIS)
    Names of people (demo news names, demo recognition, related names, Cyrillic/Greek fuzzy name matching, demo fuzzy matching)
    Dates (demo recognition)
    Terminology extraction
    Summarisation (standard sentence extraction; subject-specific summarisation)
  Cross-lingual navigation and classification
    Document similarity (monolingual; cross-lingual; translation spotting)
    Bottom-up document clustering; topic detection (demo news analysis)
    Classification (multi-monolingual and cross-lingual; pre-classification clustering)
    Relevance-ranking of documents (slides)
    News topic tracking (monolingual historical; cross-lingual; demo news analysis)
    Navigate text collections via people, countries, keywords, clusters, across languages (slides; demo news names)
  Visualisation of textual contents
    Individual documents (document profile)
    Whole document collections (document map)
    Geographical information (maps; animated maps, demo)
    Clustering (ascii, star, tree), key-word-in-context (KWIC), search, …
  Further tools
    Document Gathering (Lang-Tech crawler; WT’s EMM system)
    Document format conversion (PDF, MS-Word, PS, HTML, XML)
    Character set conversion (UTF-8, ISO-Latin, HTML, …)
  Projects
    IDoRA for OLAF (slides)
    Cross-lingual Indexing (EUROVOC)
    Breaking News – Detection and Visualisation (BNDV / State-of-the-World)
    SVM for Text Classification
  Modus Operandi
    Ad-hoc analyses (REACH, AM, INFSO project proposals, ADMIN job descriptions, ENV Public Consultation Sustainable Development)
  JRC Introduction
  Multilingual and crosslingual text analysis
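
The bottom-up clustering described under "(Multilingual) clustering of documents" (find the most similar pair, group it into one cluster, repeat until only one cluster remains) can be sketched as follows. This is a minimal sketch and not the JRC implementation: it assumes each document is already represented as a weighted set of Eurovoc descriptors, it uses cosine similarity, and it merges two clusters by summing their vectors. The slides specify none of these choices. The descriptor labels, weights, document names and function names are invented for illustration.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two sparse {descriptor: weight} vectors."""
    dot = sum(w * b[d] for d, w in a.items() if d in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def merge_vectors(a, b):
    """Represent a merged cluster by the sum of its members' vectors (an assumption)."""
    merged = dict(a)
    for d, w in b.items():
        merged[d] = merged.get(d, 0.0) + w
    return merged


def cluster_tree(docs):
    """Repeatedly merge the two most similar clusters until one remains.

    docs maps a document name to its descriptor vector.
    Returns the merges in order as (members_a, members_b, similarity).
    """
    clusters = {(name,): vec for name, vec in docs.items()}
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the highest similarity.
        left, right = max(
            combinations(clusters, 2),
            key=lambda pair: cosine(clusters[pair[0]], clusters[pair[1]]),
        )
        sim = cosine(clusters[left], clusters[right])
        # Group the pair into one cluster and continue.
        clusters[left + right] = merge_vectors(clusters.pop(left), clusters.pop(right))
        merges.append((left, right, sim))
    return merges


# Hypothetical descriptor profiles; real profiles would come from the indexer.
docs = {
    "en_waste": {"radioactive waste": 3.0, "nuclear safety": 1.0},
    "es_residuos": {"radioactive waste": 2.0, "nuclear safety": 1.0},
    "en_fish": {"fishing quota": 2.0, "common fisheries policy": 1.0},
}

merges = cluster_tree(docs)
for left, right, sim in merges:
    print(left, "+", right, f"similarity={sim:.2f}")
```

Because the vectors are keyed by thesaurus descriptors rather than by words, the English and Spanish sample documents can be compared directly. The recorded merge order and similarities are the information a cluster tree would be drawn from.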
