lect29 groupwords

Published on October 18, 2007

Author: Belly

Source: authorstream.com

Content

Grouping Words

Linguistic Objects in this Course
  Trees (with strings at the nodes)
    Syntax, semantics
    Algorithms: Generation, parsing, inside-outside, build semantics
  Sequences (of strings)
    n-grams, tag sequences
    morpheme sequences, phoneme sequences
    Algorithms: Finite-state, best-paths, forward-backward
  “Atoms” (unanalyzed strings)
    Words, morphemes
    Represent by contexts – other words they occur with
    Algorithms: Grouping similar words, splitting words into senses

A Concordance for “party” from www.webcorp.org.uk
  thing. She was talking at a party thrown at Daphne's restaurant in
  have turned it into the hot dinner-party topic. The comedy is the
  selection for the World Cup party, which will be announced on May 1
  in the 1983 general election for a party which, when it could not bear to
  to attack the Scottish National Party, who look set to seize Perth and
  that had been passed to a second party who made a financial decision
  the by-pass there will be a street party. "Then," he says, "we are going
  number-crunchers within the Labour party, there now seems little doubt
  political tradition and the same party. They are both relatively Anglophilic
  he told Tony Blair's modernised party they must not retreat into "warm
  "Oh no, I'm just here for the party," they said. "I think it's terrible
  A future obliges each party to the contract to fulfil it by
  be signed by or on behalf of each party to the contract." Mr David N

What Good are Word Senses?
  thing. She was talking at a party thrown at Daphne's restaurant in
  have turned it into the hot dinner-party topic. The comedy is the
  selection for the World Cup party, which will be announced on May 1
  the by-pass there will be a street party. "Then," he says, "we are going
  "Oh no, I'm just here for the party," they said. "I think it's terrible
  in the 1983 general election for a party which, when it could not bear to
  to attack the Scottish National Party, who look set to seize Perth and
  number-crunchers within the Labour party, there now seems little doubt
  political tradition and the same party. They are both relatively Anglophilic
  he told Tony Blair's modernised party they must not retreat into "warm
  that had been passed to a second party who made a financial decision
  A future obliges each party to the contract to fulfil it by
  be signed by or on behalf of each party to the contract." Mr David N

What Good are Word Senses?
  John threw a “rain forest” party last December. His living room was full of plants and his box was playing Brazilian music …

What Good are Word Senses?
  Replace word w with sense s
    Splits w into senses: distinguishes this token of w from tokens with sense t
    Groups w with other words: groups this token of w with tokens of x that also have sense s

What Good are Word Senses?
  number-crunchers within the Labour party, there now seems little doubt
  political tradition and the same party. They are both relatively Anglophilic
  he told Tony Blair's modernised party they must not retreat into "warm
  thing. She was talking at a party thrown at Daphne's restaurant in
  have turned it into the hot dinner-party topic. The comedy is the
  selection for the World Cup party, which will be announced on May 1
  the by-pass there will be a street party. "Then," he says, "we are going
  "Oh no, I'm just here for the party," they said. "I think it's terrible
  an appearance at the annual awards bash, but feels in no fit state to
  -known families at a fundraising bash on Thursday night for Learning
  Who was paying for the bash? The only clue was the name Asprey,
  Mail, always hosted the annual bash for the Scottish Labour front-
  popular. Their method is to bash sense into criminals with a short,
  just cut off people's heads and bash their brains out over the floor,

What Good are Word Senses?
  Semantics / Text understanding
    Axioms about TRANSFER apply to (some tokens of) throw
    Axioms about BUILDING apply to (some tokens of) bank
  Machine translation
  Info retrieval / Question answering / Text categ.
    Query or pattern might not match document exactly
  Backoff for just about anything
    what word comes next? (speech recognition, language ID, …)
      trigrams are sparse but tri-meanings might not be
    bilexical PCFGs: p(S[devour] → NP[lion] VP[devour] | S[devour])
      approximate by p(S[EAT] → NP[lion] VP[EAT] | S[EAT])
  Speaker’s real intention is senses; words are a noisy channel

Cues to Word Sense
  Adjacent words (or their senses)
  Grammatically related words (subject, object, …)
  Other nearby words
  Topic of document
  Sense of other tokens of the word in the same document

Word Classes by Tagging
  Every tag is a kind of class
  Tagger assigns a class to each word token
    Simultaneously groups and splits words
    “party” gets split into N and V senses
    “bash” gets split into N and V senses
    {party/N, bash/N} vs. {party/V, bash/V}
  What good are these groupings?

Learning Word Classes
  Every tag is a kind of class
  Tagger assigns a class to each word token
    {party/N, bash/N} vs. {party/V, bash/V}
  What good are these groupings?
    Good for predicting next word or its class!
  Role of forward-backward algorithm?
    It adjusts classes etc. in order to predict sequence of words better (with lower perplexity)

Words as Vectors
  Represent each word type w by a point in k-dimensional space
    e.g., k is size of vocabulary
    the 17th coordinate of w represents strength of w’s association with vocabulary word 17
      (0, 0, 3, 1, 0, 7, ..., 1, 0) = party
      how often words appear next to each other
      how often words appear near each other
      how often words are syntactically linked
      should correct for commonness of word (e.g., “above”)
  Plot all word types in k-dimensional space
  Look for clusters of close-together types

Learning Classes by Clustering
  Plot all word types in k-dimensional space
  Look for clusters of close-together types

Bottom-Up Clustering
  Start with one cluster per point
  Repeatedly merge 2 closest clusters
    Single-link: dist(A,B) = min dist(a,b) for a∈A, b∈B
    Complete-link: dist(A,B) = max dist(a,b) for a∈A, b∈B

Bottom-Up Clustering – Single-Link
  each word type is a single-point cluster
  Again, merge closest pair of clusters:
    Single-link: clusters are close if any of their points are
    dist(A,B) = min dist(a,b) for a∈A, b∈B
  Fast, but tend to get long, stringy, meandering clusters
  example from Manning & Schütze

Bottom-Up Clustering – Complete-Link
  Again, merge closest pair of clusters:
    Complete-link: clusters are close only if all of their points are
    dist(A,B) = max dist(a,b) for a∈A, b∈B
  Slow to find closest pair – need quadratically many distances
  example from Manning & Schütze

Bottom-Up Clustering
  Start with one cluster per point
  Repeatedly merge 2 closest clusters
    Single-link: dist(A,B) = min dist(a,b) for a∈A, b∈B
    Complete-link: dist(A,B) = max dist(a,b) for a∈A, b∈B
      too slow to update cluster distances after each merge; but there are alternatives!
    Average-link: dist(A,B) = mean dist(a,b) for a∈A, b∈B
    Centroid-link: dist(A,B) = dist(mean(A), mean(B))
  Stop when clusters are “big enough”
    e.g., provide adequate support for backoff (on a development corpus)
  Some flexibility in defining dist(a,b)
    Might not be Euclidean distance; e.g., use vector angle

EM Clustering (for k clusters)
  EM algorithm
    Viterbi version – called “k-means clustering”
    Full EM version – called “Gaussian mixtures”
  Expectation step: Use current parameters (and observations) to reconstruct hidden structure
  Maximization step: Use that hidden structure (and observations) to reestimate parameters
  Parameters: k points representing cluster centers
  Hidden structure: for each data point (word type), which center generated it?
  [see spreadsheet animation]
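
The bottom-up clustering slides can be illustrated with a small sketch. This is not code from the lecture: the function names, the four-dimensional toy count vectors, and the fixed target number of clusters (used here in place of the slides' "big enough" stopping rule) are all assumptions made for illustration. Distance between word vectors is based on the vector angle, which the slides mention as an alternative to Euclidean distance. The loop naively recomputes every cluster distance after each merge, which is the slow approach the slides warn about.

```python
import math
from itertools import combinations

def cosine_distance(u, v):
    """Angle-based distance: 1 - cos(angle between u and v)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm if norm else 1.0

def cluster_distance(A, B, vectors, link):
    """dist(A,B) under single-, complete-, or average-link."""
    pair_dists = [cosine_distance(vectors[a], vectors[b]) for a in A for b in B]
    if link == "single":      # close if ANY pair of points is close
        return min(pair_dists)
    if link == "complete":    # close only if ALL pairs of points are close
        return max(pair_dists)
    if link == "average":
        return sum(pair_dists) / len(pair_dists)
    raise ValueError(f"unknown link: {link}")

def bottom_up(vectors, num_clusters, link="single"):
    """Start with one cluster per word type; repeatedly merge the 2 closest."""
    clusters = [frozenset([w]) for w in vectors]
    # Assumption: stop at a fixed number of clusters rather than "big enough".
    while len(clusters) > num_clusters:
        A, B = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(pair[0], pair[1], vectors, link))
        clusters = [c for c in clusters if c not in (A, B)] + [A | B]
    return clusters

# Hypothetical co-occurrence counts (made up for this example).
vectors = {
    "party":    (3, 1, 0, 7),
    "bash":     (2, 1, 0, 5),
    "election": (0, 6, 4, 1),
    "vote":     (0, 5, 5, 0),
}

for link in ("single", "complete", "average"):
    print(link, [sorted(c) for c in bottom_up(vectors, 2, link)])
```

On data this small the three linkages agree; they tend to differ on larger, less cleanly separated data, where single-link can chain points into long, stringy clusters.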
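
The "Viterbi version" of EM clustering named on the EM Clustering slide, k-means, can be sketched as follows. This is a minimal illustration and not the spreadsheet animation from the lecture: the function names, the random initialization from data points, the iteration cap, and the toy 2-D points standing in for word vectors are assumptions. The expectation step makes a hard choice of which center generated each point; the maximization step re-estimates each center as the mean of its points.

```python
import random

def squared_distance(u, v):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, k, iterations=20, seed=0):
    """Hard-assignment ("Viterbi") EM with k cluster centers as parameters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: initialize from the data
    assignment = None
    for _ in range(iterations):
        # Expectation step: reconstruct the hidden structure, i.e. which
        # center generated each point (hard choice: the nearest center).
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:  # converged: nothing changed
            break
        assignment = new_assignment
        # Maximization step: re-estimate each center from its points.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = mean(members)
    return centers, assignment

# Toy 2-D points (made up) standing in for k-dimensional word vectors.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]

centers, assignment = k_means(points, k=2)
print(centers)
print(assignment)
```

The full EM version (Gaussian mixtures) would replace the hard nearest-center choice with a probability for each center, so every point contributes fractionally to every center's re-estimate.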
