OHSummarize Sept2003

Published on August 22, 2007

Author: Funtoon

Source: authorstream.com

Content

Automatic text summarization

Hercules Dalianis
NADA-KTH, Royal Institute of Technology
100 44 Stockholm
ph: +46-8-790 91 05
mobile: +46 70 568 13 59
email: [email protected]

Overview of talk

Background
Other summarizers
Technique
Future improvements
Applications
Evaluation

Automatic text summarization

Automatic text summarization is the method by which a computer summarizes a text.
A text is given to the computer, and it returns a shorter, non-redundant text: an extract from the longer original text.
The technique has its roots in the 1960s.
With the Internet and the WWW, interest in summarization techniques has reawakened.

Summarization tools

http://www.nada.kth.se/~hercules/HDbookmarks.htm
http://www.ics.mq.edu.au/~swan/summarization/projects_full.htm
Microsoft Word 97, 98 and Word 2000 have a summarizer for documents.
Intelligent Miner for Text, summarization tool (IBM)
Inxight (XEROX)
Datahammer (Glucose Development Corporation)
Corporum Summarizer, Cognit AS (Norway)
Pertinence (France)
Copernic Summarizer
MuST Prototype
Automated Text Summarization (SUMMARIST)
Columbia Newsblaster http://www1.cs.columbia.edu/nlp/newsblaster
OracleContext
Autonomy

What is automatic summarization good for?

Newspaper typesetting and printing: Sydsvenska Dagbladet, Bergens Tidende
Summarizing scientific texts: Danmarks Elektroniske Forskningsbibliotek
Telephone systems: reading summarized news with synthetic speech
Search engines: summarizing documents for the hit list, cf. Google, SiteSeeker
NewsAgent: business intelligence
TDT (Topic Detection and Tracking) and Columbia Newsblaster

Summarization approaches

Extraction vs. abstraction
Generic vs. query-based
Indicative vs. informative
Restricted vs. unrestricted domain
Background information vs. new information (TDT)
Single-document vs. multiple-document
Monolingual vs. multilingual
Textual vs. multimedia

Text summarization

Extraction is much easier than abstraction.
Abstraction needs understanding and rewriting.

Techniques

Find what the text is about.
Then decide what to say.
Then decide how to say it.
Text summarization (extraction) uses statistical, linguistic and heuristic methods.

Techniques

A text is divided into sentences.
Sentence position (news/reports)
Title words
Bold text, numerical values, citations
Named entities (frequency-based)
Keyword frequency and extraction (nouns, adverbs, adjectives)
Use of morphological information (lemmas)

Key word lexicon

Key words in the news domain
Also called an 'open class word lexicon'
Key words can be nouns, adjectives or adverbs.

Words that are present in all other sentences
User adaptation: use user keywords to obtain slanted summaries
A combination function over all rankings, with different weights, gives the rank of each sentence.
Generate all high-ranking sentences.
Voilà, the summary!

SweSum

The first text summarizer for Swedish.
Summarizes Swedish newspaper text in HTML/text format on the WWW.
Uses a Swedish key word lexicon that contains 40 000 words and their 700 000 possible inflections.
During summarization, 5-10 key words are produced that describe or categorize the text. These key words form a miniature summary.

The Swedish keyword lexicon

Inflected version (700 000 words)    Lemma (40 000 words)
statsminister                        statsminister
statsministern                       statsminister
statsministerns                      statsminister
statsministrarna                     statsminister
statsministrarnas                    statsminister
...                                  ...
regeringen                           regeringen
regeringens                          regeringen
regeringarna                         regeringen
regeringarnas                        regeringen
...                                  ...

SweSum

SweSum is available for summarizing news texts in Swedish, Danish, Norwegian, English, Spanish, French, German and Farsi (Persian).
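
The ranking steps listed under Techniques (sentence position, title words, numerical values, keyword frequency over lemmas, and a weighted combination of the rankings) can be sketched roughly as follows. This is only a minimal illustration and not SweSum's actual code: the weights, the tiny `LEMMAS` lexicon, the sentence splitter and the function names are all assumptions made for the example, and every token is counted as a key word instead of consulting an open class word lexicon.

```python
import re
from collections import Counter

# Hypothetical toy lexicon mapping inflected forms to lemmas (illustrative only).
LEMMAS = {
    "statsministern": "statsminister",
    "statsministerns": "statsminister",
    "regeringens": "regeringen",
}

# Hypothetical feature weights; the talk does not state the real ones.
WEIGHTS = {"position": 1.0, "title": 2.0, "numeric": 0.5, "keyword": 1.5}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def lemmas(sentence):
    """Lowercased word tokens, mapped to lemmas where the lexicon knows them."""
    return [LEMMAS.get(w, w) for w in re.findall(r"\w+", sentence.lower())]


def summarize(title, text, ratio=0.3):
    """Return the highest-ranked sentences, kept in their original order."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    title_words = set(lemmas(title))
    freq = Counter(w for s in sentences for w in lemmas(s))
    max_freq = max(freq.values())

    scored = []
    for i, sentence in enumerate(sentences):
        words = lemmas(sentence)
        features = {
            # Early sentences matter more in news text.
            "position": 1.0 / (i + 1),
            # Share of title words that recur in the sentence.
            "title": len(title_words & set(words)) / max(len(title_words), 1),
            # Sentences containing numerical values get a bonus.
            "numeric": 1.0 if re.search(r"\d", sentence) else 0.0,
            # Average normalized frequency of the sentence's words.
            "keyword": sum(freq[w] for w in words) / (max_freq * max(len(words), 1)),
        }
        # Weighted combination of all rankings gives the sentence score.
        score = sum(WEIGHTS[name] * value for name, value in features.items())
        scored.append((score, i))

    keep = max(1, round(len(sentences) * ratio))
    best = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:keep]
    return [sentences[i] for _, i in sorted(best, key=lambda pair: pair[1])]
```

Calling `summarize("Government raises taxes", article_text, ratio=0.5)` keeps about half of the sentences. Emitting them in document order rather than score order keeps the extract readable, which is one simple design choice among several possible ones.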
Text summarization slideshow

Problems

Pronouns and other anaphoric references: "Kalle ran. He ran fast."
Pronoun resolution
Clauses can be too long or too short.
Clause reduction and clause combination rules
Aggregation

SweSum without PRM

Analysera mera!
Director: Harold Ramis
Cast: Robert De Niro, Billy Crystal, Lisa Kudrow
Running time: 1 h 45 min
…
One of the many reasons to be glad about Analysera mera is that Robert De Niro is truly practising the art of acting again here. He accelerates emotionally from 0 to 100 in no time at all, only to brake as softly as a cat and park, calm and composed. And he is quite irresistible. Here he has produced yet another intelligent comedy for all of us friends of intelligence and comedy, preferably in combination.
SvD 99-10-08

SweSum with PRM

Analysera mera!
Director: Harold Ramis
Cast: Robert De Niro, Billy Crystal, Lisa Kudrow
Running time: 1 h 45 min
…
One of the many reasons to be glad about Analysera mera is that Robert De Niro is truly practising the art of acting again here. Robert accelerates emotionally from 0 to 100 in no time at all, only to brake as softly as a cat and park, calm and composed. And Robert is quite irresistible. Here Harold has produced yet another intelligent comedy for all of us friends of intelligence and comedy, preferably in combination.
SvD 99-10-08

Evaluation

We found that when a text is summarized to 30 percent of its original length, accuracy is around 70-80 percent on news articles of 3-4 pages.
... but query-based evaluations rest on subjective opinions.
These evaluations need a large human effort.
There is little overlap between opinions.
We need man-made extracts against which the machine-made extracts can be compared automatically.
There are some man-made extracts for English news texts.
We had to create such extracts for Swedish news text.
We created the KTH Extract Corpus, a corpus created manually, once, by voting.
One can then compare the texts from SweSum and the KTH Extract Corpus manually, or soon automatically.

KTH extract corpus

http://www.nada.kth.se/iplab/hlt/kthxc/showsumstats.php and http://www.nada.kth.se/iplab/hlt/kthxc/
Show the cell text
http://www.nada.kth.se/iplab/hlt/kthxc/showsumstats.php?cutoff=30&fileid=svenska-%3Etest-%3Etext001.htm

Future improvements of SweSum

Tagging instead of static lexicons
Clause-level summarization
Improved named entity recognition
Improved pronominal resolution
Lexical chains using SIMPLE and/or EuroWordNet
Automatic evaluation method

Demonstrators

SweSum, standard version: http://swesum.nada.kth.se/index-eng.html
SweSum, experimental NE version: http://www.nada.kth.se/~xmartin/swesum_lab/index-eng.html
(SweSum uses a Perl CGI script; there is also a standalone version for plain text/HTML.)
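
The automatic comparison mentioned under Evaluation, a machine-made extract checked against a man-made extract created by voting, might look roughly like the sketch below. The talk does not specify how votes are combined or which measure is used, so the majority-style selection in `gold_extract`, the precision/recall measure in `overlap`, and both function names are assumptions made for illustration. Sentences are identified by their index in the original text.

```python
from collections import Counter


def gold_extract(votes, size):
    """Build a man-made extract from annotator votes.

    votes: one set of chosen sentence indices per annotator (hypothetical format).
    Returns the `size` indices chosen most often; ties go to earlier sentences.
    """
    counts = Counter(i for chosen in votes for i in chosen)
    ranked = sorted(counts, key=lambda i: (-counts[i], i))
    return set(ranked[:size])


def overlap(system, gold):
    """Return (precision, recall) of a system extract against a gold extract."""
    system, gold = set(system), set(gold)
    hits = len(system & gold)
    precision = hits / len(system) if system else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Three annotators each pick three sentences from the same article.
votes = [{0, 2, 5}, {0, 2, 3}, {0, 5, 7}]
gold = gold_extract(votes, size=3)      # sentences 0, 2 and 5
print(overlap([0, 2, 4], gold))         # two of three system sentences match
```

Working on sentence indices rather than sentence strings avoids false mismatches from whitespace or markup differences between the summarizer output and the corpus.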
