Optimizing Local Probability Models for Statistical Parsing

Published on September 25, 2007
Source: authorstream.com

Content

Optimizing Local Probability Models for Statistical Parsing
Kristina Toutanova, Mark Mitchell, Christopher Manning
Computer Science Department, Stanford University

Highlights
- Choosing a local probability model P(expansion(n) | history(n)) for statistical parsing: a comparison of commonly used models
- A new player: memory-based models, and their relation to interpolated models
- Joint likelihood, conditional likelihood, and classification accuracy for models of this form

Motivation
- Many problems in natural language processing are disambiguation problems:
  - word senses: "jaguar" (a big cat, a car, the name of a Java package); "line" (phone, queue, in mathematics, air line, etc.)
  - part-of-speech tags (noun, verb, proper noun, etc.)
- (Slide example: tagging "Joy makes progress every day.", with candidate tags such as NN/NNP, VB/VBZ, NN/NNS competing for individual words.)

Parsing as Classification
- Input: a sentence, e.g. "I would like to meet with you again on Monday"
- Classify it into one of its possible parses

Motivation: Classification Problems
There are two major differences from typical machine-learning domains:
- The number of classes can be very large or even infinite, and the set of available classes varies per input (it depends on a grammar)
- Data is usually very sparse, and the number of possible features is large (e.g., words)

Solutions
- The possible parse trees are broken down into small pieces defining features; features are now functions of the input and the class, not of the input only
- Discriminative or generative models are built using these features; we concentrate on generative models here, because when a huge number of analyses are possible they are the only practical ones

History-Based Generative Parsing Models
- (Slide example: the parse tree of "Tuesday Marks bought Brooks.", rooted at TOP.)
- The generative models learn a distribution P(S,T) over <sentence, parse tree> pairs, decomposed into local decisions:
    P(S,T) = Π_{n ∈ T} P(expansion(n) | history(n))
- A single most likely parse is then selected:
    T* = argmax_T P(T | S) = argmax_T P(S,T)
  (a minimal scoring sketch follows this section)

Factors in the Performance of Generative History-Based Models
- The chosen decomposition of parse-tree generation, including the representation of parse-tree nodes and the independence assumptions
- The model family chosen for representing local probability distributions: decision trees, naïve Bayes, log-linear models
- The optimization method for fitting major and smoothing parameters: maximum likelihood, maximum conditional likelihood, minimum error rate, etc.

Previous Studies and This Work
- The influence of these three factors has not been isolated in previous work: authors presented specific choices for all components, and the importance of each was unclear
- We assume the generative history-based model and the set of features (the representation of parse-tree nodes) are fixed, and we carefully study the other two factors
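The decomposition above is easy to make concrete. The sketch below scores candidate parses as products of local probabilities; it is a minimal illustration under assumed interfaces, not the paper's implementation, and the local_probs table and the (history, expansion) encoding are hypothetical stand-ins for a trained, smoothed local model.

```python
import math

# Hypothetical trained local model: P(expansion | history), keyed by
# (history, expansion) pairs. In the paper this would be one of the
# smoothed estimators under comparison (Witten-Bell, k-NN, ...).
local_probs = {
    (("TOP",), ("S",)): 0.9,
    (("S", "TOP"), ("NP", "VP")): 0.6,
    (("VP", "S"), ("VBD", "NP")): 0.4,
}

def log_score(decisions):
    """log P(S,T): sum of log local probabilities, one per tree node.
    `decisions` lists the (history(n), expansion(n)) pair of every node n."""
    return sum(math.log(local_probs.get(d, 1e-12)) for d in decisions)

def best_parse(candidate_trees):
    """argmax_T P(S,T): pick the candidate whose decisions score highest."""
    return max(candidate_trees, key=log_score)
```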
Deleted Interpolation
- Estimating the probability P(y | X) by interpolating relative-frequency estimates for lower-order distributions
- Most commonly used, over a linear feature-subsets order: Jelinek-Mercer with a fixed weight; Witten-Bell with varying d; decision trees with path interpolation; memory-based learning (a Witten-Bell sketch follows this section)

Memory-Based Learning as Deleted Interpolation
- In k-NN, the probability of a class given the features is estimated as
    P(y | X) = Σ_{x_i ∈ N_K(X)} w(d(x_i, X)) · 1[y_i = y]  /  Σ_{x_i ∈ N_K(X)} w(d(x_i, X))
- If the distance function depends only on the positions of the matching features, this is a case of deleted interpolation
- Worked example: estimating P(eye-color = blue | hair-color = blond) from N = 12 samples of people, with distances d = 0 (match) or d = 1, weights w(0) = w0 and w(1) = w1, and K = 12. This is deleted interpolation in which the interpolation weights depend on the counts and weights of the nearest neighbors at all accepted distances

The Task and Features Used
- Maximum ambiguity: 507; minimum: 2
- (Slide table: the task and the eight features used; only the ambiguity range survives in this transcript.)

Experiments (outline)
- Linear feature-subsets order: Jelinek-Mercer with fixed weight; Witten-Bell with varying d; linear memory-based learning
- Arbitrary feature-subsets order: decision trees; memory-based learning; log-linear models
- Experiments on the connection among likelihoods and accuracy

Experiments: Linear Sequence
- The features {1, 2, ..., 8} are ordered by gain ratio: {1, 8, 2, 3, 5, 4, 7, 6}
- Models: Jelinek-Mercer with fixed weight; Witten-Bell with varying d
- (Slide plot: accuracy versus amount of smoothing; heavy smoothing is needed for the best results.)

MBL Linear Subsets Sequence
- Restrict MBL to be an instance of the same linear-subsets-sequence deleted interpolation
- The weighting functions INV3 and INV4 performed best:
  - LKNN3, best at K = 3,000: 79.94%
  - LKNN4, best at K = 15,000: 80.18%
- LKNN4 is the best of all

Experiments outline (revisited): arbitrary feature-subsets order (decision trees, memory-based learning, log-linear models)

Model Implementations: Decision Trees (DecTreeWBd)
- n-ary decision trees: if we choose a feature f to split on, all of its values form subtrees
- Splitting criterion: gain ratio
- The final probability estimates at the leaves are Witten-Bell d interpolations of the estimates on the path to the root, so these trees are themselves instances of deleted interpolation models
- (Slide diagram: an example tree splitting on features 1 and 2, with expansion labels such as HCOMP and NOPTCOMP at the leaves.)
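As a concrete illustration of deleted interpolation over a linear feature-subsets order, here is a minimal sketch of Witten-Bell interpolation with a multiplier d. It is a reconstruction under assumptions, not the paper's code: the exact form of the "varying d" model is not given in this transcript, and the train/p_wb interfaces are illustrative.

```python
from collections import Counter, defaultdict

def train(data, order):
    """Collect counts for every prefix of the gain-ratio-ordered features.
    `data` is a list of (feature_tuple, y) pairs."""
    joint, marginal, types = Counter(), Counter(), defaultdict(set)
    for feats, y in data:
        for k in range(order + 1):
            h = tuple(feats[:k])
            joint[(h, y)] += 1    # c(h, y)
            marginal[h] += 1      # c(h)
            types[h].add(y)       # distinct outcomes seen after context h
    return joint, marginal, types

def p_wb(joint, marginal, types, n_classes, feats, y, k, d=5.0):
    """P(y | first k features), Witten-Bell-interpolated with the (k-1)-order
    estimate; larger d means heavier smoothing. k = -1 bottoms out uniform."""
    if k < 0:
        return 1.0 / n_classes
    h = tuple(feats[:k])
    lower = p_wb(joint, marginal, types, n_classes, feats, y, k - 1, d)
    t = d * len(types[h])
    denom = marginal[h] + t
    return (joint[(h, y)] + t * lower) / denom if denom else lower

# Usage: two expansions competing given two (hypothetical) features.
data = [(("vb", "np"), "HCOMP"), (("vb", "np"), "HCOMP"), (("vb", "pp"), "NOPTCOMP")]
counts = train(data, order=2)
print(p_wb(*counts, n_classes=2, feats=("vb", "np"), y="HCOMP", k=2))
```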
Model Implementations: Log-linear Models
- Binary features are formed by instantiating templates
- Three models with different allowable features:
  - single attributes only (LogLinSingle)
  - pairs of attributes, restricted to pairs involving the most important feature, the node label (LogLinPairs)
  - linear feature subsets, comparable to the previous models (LogLinBackoff)
- Gaussian smoothing was used; trained by conjugate gradient (Stanford Classify package)

Model Implementations: Memory-Based Learning
- Weighting functions INV3 and INV4 (a k-NN sketch follows this section)
- KNN4 is better than DecTreeWBd and the log-linear models
- KNN4 gives a 5.8% error reduction over WBd (significant at the 0.01 level)

Accuracy Curves for MBL and Decision Trees
- (Slide plot: accuracy curves for MBL and decision trees; the figure itself is not recoverable from this transcript.)

Experiments outline (revisited): the connection among likelihoods and accuracy

Joint Likelihood, Conditional Likelihood, and Classification Accuracy
- Our aim is to maximize parsing accuracy, but smoothing parameters are usually fit on held-out data to maximize the joint likelihood; sometimes the conditional likelihood is optimized instead
- We look at the relationship among the maxima of these three scoring functions as the amount of smoothing varies, and find that:
  - much heavier smoothing is needed to maximize accuracy than to maximize joint likelihood
  - conditional likelihood also keeps increasing with smoothing, long after the joint-likelihood maximum
  (a sketch of this comparison closes the document)

Test Set Performance versus Amount of Smoothing
- (Slide plots, parts I and II, plus a PP-attachment experiment with Witten-Bell varying d: test-set performance as a function of the amount of smoothing.)

Summary
- The problem of effectively estimating local probability distributions for the compound decision models used in classification is under-explored
- We showed that the choice of local distribution model matters
- We showed the relationship between MBL and deleted interpolation models
- MBL with large numbers of neighbors and appropriate weighting outperformed more expensive and popular algorithms, namely decision trees and log-linear models
- Fitting a small number of smoothing parameters to maximize classification accuracy is promising for improving performance

Future Work
- Compare MBL to other state-of-the-art smoothing methods
- Find better ways of fitting the MBL weight functions
- Theoretical investigation of bias-variance tradeoffs for compound decision systems with strong independence assumptions
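The memory-based estimator also admits a short sketch: a weighted k-NN class-probability estimate in which the distance between instances is the number of mismatched feature positions, so the weight depends only on which features match (the condition under which k-NN reduces to deleted interpolation). The weighting w(dist) = 1/(dist + 1)^p used below is a generic inverse-distance family standing in for INV3/INV4, whose exact form is not given in this transcript.

```python
from collections import defaultdict

def mbl_prob(query, train_set, K, p=3):
    """Weighted k-NN estimate of P(y | query) over tuples of nominal features.
    Distance counts mismatched positions; w = 1/(dist + 1)**p is a stand-in
    for the INV3/INV4 weighting functions (p = 3 or 4)."""
    dist = lambda a, b: sum(u != v for u, v in zip(a, b))
    neighbors = sorted(train_set, key=lambda ex: dist(ex[0], query))[:K]
    mass, total = defaultdict(float), 0.0
    for feats, y in neighbors:
        w = 1.0 / (dist(feats, query) + 1) ** p
        mass[y] += w
        total += w
    return {y: w / total for y, w in mass.items()}

# Usage: K as large as the training set, as in the paper's best settings.
train_set = [(("vb", "np"), "HCOMP")] * 8 + [(("vb", "pp"), "NOPTCOMP")] * 4
print(mbl_prob(("vb", "np"), train_set, K=len(train_set)))
```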

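Finally, the joint-likelihood / conditional-likelihood / accuracy comparison can be made concrete with a small held-out sweep. This is a minimal sketch under assumed interfaces: joint_prob(x, y, d) is a hypothetical stand-in for a generative model's P(x, y) evaluated at smoothing amount d.

```python
import math

def evaluate(d, heldout, joint_prob):
    """Score smoothing amount d on held-out data under three criteria.
    `heldout` holds (x, candidate_classes, true_y) triples;
    `joint_prob(x, y, d)` -> P(x, y) under smoothing d (hypothetical)."""
    jll = cll = correct = 0.0
    for x, candidates, y in heldout:
        scores = {c: joint_prob(x, c, d) for c in candidates}
        z = sum(scores.values()) or 1e-300
        jll += math.log(max(scores[y], 1e-300))      # joint log-likelihood
        cll += math.log(max(scores[y] / z, 1e-300))  # conditional log-likelihood
        correct += max(scores, key=scores.get) == y  # 0/1 accuracy
    return jll, cll, correct / len(heldout)

def fit_smoothing(d_grid, heldout, joint_prob):
    """Sweep d, keeping a separate argmax per criterion. Per the paper's
    finding, the accuracy argmax sits at much heavier smoothing than the
    joint-likelihood argmax, and conditional likelihood keeps improving
    well past the joint-likelihood maximum."""
    scored = [(d, *evaluate(d, heldout, joint_prob)) for d in d_grid]
    return (max(scored, key=lambda t: t[1])[0],   # best d for joint LL
            max(scored, key=lambda t: t[2])[0],   # best d for conditional LL
            max(scored, key=lambda t: t[3])[0])   # best d for accuracy
```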