Words

Published on December 6, 2007

Author: Elena

Source: authorstream.com

Content

Dictionaries
See Patrick Hanks, “Lexicography”, chapter 3 of Mitkov, R. (ed.), The Oxford Handbook of Computational Linguistics, Oxford: OUP, 2004.

Dictionaries/Lexicons
- Lexicography and the computer
- Corpus-based lexicography
- MRDs
- Dictionaries for NLP
- Thesauri: structured lexicons

Computational lexicography
- Restructuring and exploiting human dictionaries for use by computer programs
- Using computational techniques to compile (new) dictionaries
- Focus on English (and other well-established languages)
- Significantly different issues for other languages, especially:
  - Alphabetization and arrangement
  - Compilation from scratch for previously unstudied languages

Human dictionaries
- Traditional view of what a “dictionary” is
  - List of words, arranged (usually) alphabetically
  - Inclusion in dictionary lends authority, even proscriptively
- Entry typically gives
  - spelling ... alternate spellings
  - POS, morphology (if irregular)
  - core definition (using defining vocab?)
  - pronunciation (using own transcription)
  - etymology
  - examples of usage
    - as justification for inclusion
    - as illustration of use (esp. learner’s dictionaries)
- Entry typically doesn’t give
  - help with spelling
  - morphology (if regular), especially derivational
  - subcategorization information
  - contrastive examples of use
  - indications of possible metaphorical extensions to meaning

Human dictionaries
- Historically
  - bilingual dictionaries for translators
  - monolingual dictionary as (pre/proscriptive) definition of language, often polemical
  - OED (1884-1928): first dictionary on purely descriptive principle, relying on citations
- Deficiencies and difficulties
  - What to include? (neologisms, slang)
  - Inclusion of names
  - Differentiating senses

Differentiating word senses
- Dictionaries disagree widely
- Probably no right answer
- General principles (look for excuse to split vs look for reason to lump)
- Keep related words of different POS together?
- Etymology can be misleading (e.g. crane, pupil)
- Metaphorical extension of original meaning: how far do you go? (e.g. rose, bar)
- Purpose of dictionary may help decide, e.g. translation

Citations
- Senses and uses identified by collecting examples of use
- Sent in on “slips” by informants
- Lexicographer’s job is to collate these
- Criteria for a new word (or new meaning)
  - Number of citations
  - Source of citations
  - Veracity of use

Corpus-based dictionaries
- Corpus: a collection of texts, usually collected with a specific purpose in mind
- British National Corpus: attempt to capture a synchronic picture of BrE of the late 1980s (100m words)
- COBUILD “Bank of English”: dynamic “monitor” corpus used to help lexicographers identify/define usage

Machine-readable dictionaries
- “Machine” means “computer”
- Dictionary stored in a format which makes it manipulable on a computer
- Originally derived from MR version of print dictionary (from typesetter’s tapes)
- Now the other way round: data stored as a database from which hard copy can be printed (inter alia)

MRDs - advantages
- Flexibility of access and presentation
  - Not bound to alphabetical listing
  - Information presented can be filtered
  - Can be searched as a database
  - Different versions (for different users, serving different purposes) can be produced
- Increased storage capacity
  - More information can be stored, especially:
    - Implicit information can be made explicit
    - More examples, including “negative data”

Lexicons for NLP
- Have to state everything we need to know about the word
- Phonology: stress pattern, possible weak forms
- Orthography: spelling alternatives, hyphenation
- Morphology:
  - inflectional paradigms, even if regular
  - information about derivations
- Syntax:
  - explicit information about subcategorization and e.g. syntactic/semantic features of arguments
  - any special interpretation of tenses
- Lexical combinatorics: compounds, idioms
- Semantics: definition, semantic features, semantic relations
- Pragmatics: register, collocation, connotation

Lexicons for NLP - example
- Information about derivations
- Agentive derivation (-er) is very productive
  - Usually means the actor doing the action of a verb, e.g. swimmer, dancer, killer
  - Not available for some verbs, e.g. *knower, *cycler, *sayer (though cf. soothsayer), *hoper
  - May have a specialised meaning instead of or as well as the derived meaning, e.g. revolver, computer, washer, hitter
  - In some cases can mean the object undergoing the action (via ergative use of verb), e.g. taster

Subcategorization
- Words are assigned to categories (i.e. parts of speech, POS), e.g. noun, verb
  - on basis of form, meaning, use
- Syntactic behaviour is predictable from (or determined by) category
- Within a category there are subcategories with specific patterns of behaviour, both syntactic and semantic, e.g. transitive/intransitive verb → direct object? passivize?

Subcategorization
- Subcat frames indicate
  - complement patterns and preferences, e.g. subj, obj, double obj, prep-obj, infinitival complement, that-complement etc.
  - semantic features of complements, e.g. obj of eat normally edible
- Subcat information can help to disambiguate, cf.
  - He told the man where the body was buried.
  - He found the place where the body was buried.
- Much of this info can be captured in general rules

- Have to state everything we need to know about the word, though not necessarily explicitly
- There can be rules to capture inheritance of properties, e.g. accomplishment + prog tense implies incompletion, cf.
  - She was baking a cake when she dropped dead → no cake
  - She was stroking the cat when she dropped dead

Exploiting human dictionaries in NLP
- In all NLP applications, lexicon is major bottleneck
- Availability of MRD versions of human dictionaries provided possible solution
- Obviously, MRD gives list of words, and some information
- Extract further information about verb frames by analysing the examples
- Identify semantic features from definitions, e.g. a plant which..., a person who...
- Identify hidden arguments, e.g. to lock = to close sthg using a key, cf.
  - He locked the door. The key was heavy.
  - He emptied his pockets. *The key was heavy.

Exploiting human dictionaries in NLP
- Generic information about a word and its usage can be derived from definitions in which it occurs:
  - Wine: alcoholic drink made from fermented juices, especially of grapes
  - Vintage: a season’s yield of wine from a vineyard
  - Red wine: wine having a red colour derived from the skins of the grapes used ...
  - Vineyard: an orchard where grapes are grown for the purpose of wine making
  - Pinot noir: a dry red Californian table wine
  - Sake: Japanese rice wine
  - Claret: a dry red Bordeaux or Bordeaux-like wine
  - Sherry: a sweet white wine from the Jerez region of Spain
  - Riesling: a dessert wine made from white grapes grown historically in Germany ...
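
The idea of identifying semantic features from definitions (a plant which..., a person who...) can be illustrated with a rough sketch. The code below guesses the "genus" word of each wine-related definition listed above by taking the last word before the first post-modifier. The function names, the list of phrase-ending words, and the heuristic itself are assumptions made for illustration; a real system would use a parser rather than a word list.

```python
import re
from collections import defaultdict

# Definitions copied from the slide text (apostrophes normalised to ASCII).
DEFINITIONS = {
    "wine": "alcoholic drink made from fermented juices, especially of grapes",
    "vintage": "a season's yield of wine from a vineyard",
    "red wine": "wine having a red colour derived from the skins of the grapes used ...",
    "vineyard": "an orchard where grapes are grown for the purpose of wine making",
    "pinot noir": "a dry red Californian table wine",
    "sake": "Japanese rice wine",
    "claret": "a dry red Bordeaux or Bordeaux-like wine",
    "sherry": "a sweet white wine from the Jerez region of Spain",
    "riesling": "a dessert wine made from white grapes grown historically in Germany ...",
}

# Assumption: these tokens usually open a post-modifier, so the head noun
# of the definition is the word just before the first of them.
HEAD_PHRASE_ENDERS = {
    ",", "of", "from", "made", "having", "where",
    "which", "who", "that", "for", "in", "with",
}


def genus_term(definition):
    """Return a guess at the head noun (genus) of a definition, or None."""
    tokens = re.findall(r"[a-z'-]+|,", definition.lower())
    head = []
    for token in tokens:
        if token in HEAD_PHRASE_ENDERS:
            break
        head.append(token)
    return head[-1] if head else None


def group_by_genus(definitions):
    """Map each guessed genus word to the sorted headwords defined with it."""
    groups = defaultdict(list)
    for headword, definition in definitions.items():
        genus = genus_term(definition)
        if genus is not None:
            groups[genus].append(headword)
    return {genus: sorted(words) for genus, words in groups.items()}


if __name__ == "__main__":
    for genus, words in sorted(group_by_genus(DEFINITIONS).items()):
        print(f"{genus}: {', '.join(words)}")
```

Grouping headwords by the guessed genus recovers "kind of" relations that the dictionary states only implicitly: six of the nine entries come out as kinds of wine, and wine itself as a kind of drink. The heuristic is brittle by design; it is meant only to show why definition text is a usable source of semantic features.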

Corpus-based lexicography revisited
- Similarly, analysis of real examples can reveal patterns of usage
- Identify primary meaning: not always what you’d expect (example of reckon)
- Identify possible complementation patterns, and their relative frequency

Structured dictionaries
- Special type of dictionary in which words are grouped together according to their meaning: thesaurus
- Classic example: Roget’s Thesaurus (1852)
- Structured vocabulary much used in field of terminology
- Also now a valuable resource for NLP: Miller’s (Princeton) WordNet (1985)
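
To make the notion of a structured dictionary concrete, here is a minimal sketch of words organised by meaning rather than alphabetically. The data is a toy "kind of" hierarchy invented for illustration; it is not taken from Roget's Thesaurus or WordNet, and the function names are hypothetical.

```python
# Toy "kind of" links: each word points to its more general term.
# Invented for illustration only; not data from Roget's or WordNet.
HYPERNYM = {
    "sherry": "wine",
    "claret": "wine",
    "sake": "wine",
    "wine": "drink",
    "beer": "drink",
}


def hypernym_chain(word):
    """List the increasingly general terms above a word."""
    chain = []
    seen = {word}
    while word in HYPERNYM:
        word = HYPERNYM[word]
        if word in seen:  # guard against accidental cycles in the data
            break
        seen.add(word)
        chain.append(word)
    return chain


def co_hyponyms(word):
    """Return the other words that share the same immediate general term."""
    parent = HYPERNYM.get(word)
    if parent is None:
        return set()
    return {w for w, p in HYPERNYM.items() if p == parent and w != word}


def closest_shared_category(a, b):
    """Return the most specific term that covers both words, if any."""
    covers_b = {b, *hypernym_chain(b)}
    for candidate in [a, *hypernym_chain(a)]:
        if candidate in covers_b:
            return candidate
    return None


if __name__ == "__main__":
    print(hypernym_chain("sherry"))
    print(sorted(co_hyponyms("sherry")))
    print(closest_shared_category("sherry", "beer"))
```

Lookups of this kind (what a word is a kind of, which words are its neighbours, where two words meet) are what an alphabetical word list cannot answer directly, which is one reason structured vocabularies are useful to NLP programs.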
