JFlex tutorial

Published on November 27, 2007

Author: Charlo

Source: authorstream.com

JFlex

Basically, a lexer is a finite-state “transducer” plus bells and whistles. Arbitrary Java code (an action) can be associated with each state transition. You specify the “transducer” in a .flex file, and JFlex compiles it into a .java file. By default, JFlex gives you convenience methods to access the results of state transitions (for example, yytext() returns the text just matched).

A simple task

Charniak’s statistical parser takes input sentences delimited by <s>...</s>. Suppose we want to take a Reader over such input and get back a Tokenizer over the tokens that returns Word objects, plus a special end-of-sentence token. Example input:

    garbage garbage garbage <s>Stocks skyrocketed on news that investigation of Cheney ’s energy taskforce was dropped . </s>more garbage

edu.stanford.nlp.process.AbstractTokenizer

    ...
    /**
     * Internally fetches the next token.
     *
     * @return the next token in the token stream, or null if none exists.
     */
    protected abstract Object getNext();
    ...

Lexical Rules

Basically, you’re specifying a finite-state automaton* with actions associated with state transitions.
* Though not strictly limited by FSA expressivity.

Schematic .flex file

    {user code}
    %%
    {options and declarations}
    %%
    {lexical rules}

Lexical Rules (schematic)

    <YYINITIAL> {
      {BeginSentence}  { yybegin(SENTENCE); return yylex(); }
      {WhiteSpace}     { /* ignore */ return yylex(); }
      .                { /* ignore */ return yylex(); }
    }
    <SENTENCE> {
      {EndSentence}    { yybegin(YYINITIAL); return SENTENCE_BOUNDARY; }
      {Token}          { return new Word(yytext()); }
      {WhiteSpace}     { /* ignore */ return yylex(); }
    }

Lexical Rules (detail)

    <YYINITIAL> {
      {BeginSentence} / .*  { yybegin(SENTENCE); return yylex(); }
      ...
    }
    <SENTENCE> {
      {EndSentence} / .*    { yybegin(YYINITIAL); return SENTENCE_BOUNDARY; }
      {Token}               { return new Word(yytext()); }
      ...
    }

Options and declarations: States and Macros

Macros can be used to define other macros, and the order of macro definitions is irrelevant. Since < and > are metacharacters in JFlex regular expressions, they are quoted here.

    %state SENTENCE

    SentenceLetter = s
    BeginSentence  = "<"{SentenceLetter}">"
    EndSentence    = "</"{SentenceLetter}">"
    WhiteSpace     = [ \t\r\n\f]
    Token          = [^ \t\r\n\f]+

Other options and declarations

    %class CharniakTokenizer
    %implements Tokenizer
    %extends AbstractTokenizer
    %unicode
    %type Object
    %eofval{
      return null;
    %eofval}

Options & declarations: class-internal code (1)

    %{
      static final Word SENTENCE_BOUNDARY = new Word("SENTENCE_BOUNDARY");

      public Object getNext() {
        try {
          Object o = yylex();
          return o;
        } catch (IOException e) {
          return null;
        }
      }
      ...
    %}

Options & declarations: class-internal code (2)

    %{
      ...
      public static void main(String[] args) throws IOException {
        Reader r = new FileReader(args[0]);
        Tokenizer t = new CharniakTokenizer(r);
        while (t.hasNext()) {
          System.out.println(t.next());
        }
      }
    %}

User code, inserted directly into the file

Everything before the first %% is copied verbatim to the top of the generated .java file:

    package rog;

    import java.util.*;
    import java.io.*;
    import edu.stanford.nlp.ling.Word;
    import edu.stanford.nlp.process.*;

    /** A lexer for Charniak input sentences
     * @author Roger Levy
     */

Beyond FSA expressivity

Because numParens is an unbounded counter, this lexer can track nesting depth, which no finite-state automaton can do.

    %class ParenCounter
    %{
      private int numParens = 0;
    %}
    ...
    %%
    ...
    <YYINITIAL> {
      \(  { numParens++; return yytext(); }
      \)  { if (numParens == 0)
              throw new RuntimeException("error – too many close parens!");
            else {
              numParens--;
              return yytext();
            }
          }
    }
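The "Lexical Rules" slides describe a two-state machine: in YYINITIAL everything is skipped until <s>, and in SENTENCE whitespace-separated tokens are emitted until </s>, which produces a boundary marker. As a rough illustration only, here is a hand-written Java sketch of that behaviour. It is not JFlex output; the class name SentenceExtractor and the use of plain String tokens instead of Word objects are assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical hand-written analogue of the CharniakTokenizer rules (not generated by JFlex). */
class SentenceExtractor {
    static final String SENTENCE_BOUNDARY = "SENTENCE_BOUNDARY";

    // The two lexical states from the .flex schematic.
    private enum State { YYINITIAL, SENTENCE }

    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        State state = State.YYINITIAL;
        int i = 0;
        int n = input.length();
        while (i < n) {
            if (state == State.YYINITIAL) {
                // {BeginSentence}: switch states; any other character is ignored garbage.
                if (input.startsWith("<s>", i)) {
                    state = State.SENTENCE;
                    i += 3;
                } else {
                    i++;
                }
            } else {
                char c = input.charAt(i);
                if (input.startsWith("</s>", i)) {
                    // {EndSentence}: back to YYINITIAL and emit the boundary marker.
                    state = State.YYINITIAL;
                    out.add(SENTENCE_BOUNDARY);
                    i += 4;
                } else if (isWhiteSpace(c)) {
                    i++; // {WhiteSpace}: ignored
                } else {
                    // {Token}: the longest run of non-whitespace characters.
                    int start = i;
                    while (i < n && !isWhiteSpace(input.charAt(i))) {
                        i++;
                    }
                    out.add(input.substring(start, i));
                }
            }
        }
        return out;
    }

    // Same character class as the WhiteSpace macro: [ \t\r\n\f]
    private static boolean isWhiteSpace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '\f';
    }
}
```

For example, SentenceExtractor.tokenize("garbage <s>Stocks skyrocketed . </s>more garbage") yields the three words followed by SENTENCE_BOUNDARY; the text outside <s>...</s> is dropped.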
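The AbstractTokenizer slide shows a getNext() contract that returns null at end of input, while the main() driver iterates with hasNext()/next(). One common way to bridge the two is a one-token lookahead buffer. The sketch below is hypothetical: the class LookaheadTokenizer is invented for illustration and is not the Stanford NLP source.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * Hypothetical sketch: layering hasNext()/next() on top of a getNext()
 * that returns null when input is exhausted. Not the Stanford NLP implementation.
 */
abstract class LookaheadTokenizer<T> implements Iterator<T> {
    private T nextToken;       // one-token lookahead buffer
    private boolean buffered;  // true once getNext() has filled the buffer

    /** Fetches the next token, or null if none exists (same contract as the slide). */
    protected abstract T getNext();

    @Override
    public boolean hasNext() {
        if (!buffered) {
            nextToken = getNext();
            buffered = true;
        }
        return nextToken != null;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        buffered = false; // hand out the buffered token and mark the buffer empty
        return nextToken;
    }
}
```

Once getNext() has returned null, the buffer stays marked as filled, so repeated hasNext() calls do not call getNext() again.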
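The "Beyond FSA expressivity" slide relies on an integer counter carried between actions. The plain-Java sketch below mimics those two actions outside JFlex to show the idea; the class name ParenCounterSketch, the openCount() accessor, and the choice to skip all non-parenthesis characters are assumptions, since the slide elides the remaining rules.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical plain-Java analogue of the ParenCounter actions (not JFlex output). */
class ParenCounterSketch {
    // Unbounded counter: this is the state a pure finite-state automaton cannot keep.
    private int numParens = 0;

    /** Returns the parentheses seen, in order; throws on an unmatched ')'. */
    List<String> scan(String input) {
        List<String> out = new ArrayList<>();
        for (char c : input.toCharArray()) {
            if (c == '(') {
                numParens++;
                out.add("(");
            } else if (c == ')') {
                if (numParens == 0) {
                    throw new RuntimeException("error - too many close parens!");
                }
                numParens--;
                out.add(")");
            }
            // Any other character is skipped (assumption: the slide elides these rules).
        }
        return out;
    }

    /** Number of '(' still waiting for a matching ')'. */
    int openCount() {
        return numParens;
    }
}
```

Because the counter persists across calls, a caller can also check openCount() at end of input to detect unclosed parentheses, which the slide's two rules alone do not report.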
