CS 595 Presentation

Published on October 17, 2007

Author: Danielle

Source: authorstream.com

Content

Classifying Gender on Shakespeare’s Characters
By Sobhan
Advisor: Dr. Argamon

Outline:
Introduction to the Problem
Data Collection, Meta Data Generation, Feature Description File Selection
Importing Data, Generating the ARFF File (ATMan)
Vector Calculation
ML Algorithms Used for This Classification Problem
Experiments and Results
Top and Bottom 20 Features Responsible for Gender Classification
Machine/OS/Tools Used
Future Work
References

Introduction:
Research in gender classification: email authorship, written text, authorship of novels.
Do male and female playwrights write the same way for the male and female characters in their plays, or do they write differently?
Finding the accuracy of gender classification for Shakespeare’s characters.
Finding the features used by character gender in the plays.
Finding the accuracy of social class classification for Shakespeare’s characters.

Data Collection:
Version used: Moby Shakespeare
Available at: http://www-tech.mit.edu/Shakespeare/
All HTML files were collected using “wget”.
Class used (html2txt): converted the HTML files to text files, one per play and one per scene.

Data Cleaning:
Unwanted data were removed from each scene:
exeunt
Exit

Meta Data Generation:
Meta data is data about data. The following six pieces of information are captured for each character in a play.
Data about a character:
Type of Play: Comedy
Name of the Play: Midsummer Night’s Dream
Name of the Character: CLOWN
Speech Length: 1024.0
Gender: Male
Social Class: Low

Corpus Selection:
Initially, all scenes were selected. The speech length of each character was added to the metadata, and then the following selections were made.
Characters with a speech length of more than 100, 200, 300, 400, or 500 were taken into consideration.
(For scenes, acts, and whole plays.)
Separate files per character were created for speech lengths of more than 500 and 200.

Features File Selection:
Most frequent 500 words from the plays (FDescMostFrequentAttr - Sterling)
Function words (standard FWs from Bar-Ilan University, #471)
Function words collected from the ARFF received from Bar-Ilan (#364)
Shakespearean function words from the plays (#491)
All stop words (#645)
Appraisal features (#47)
Systemic features (#94)

System Architecture:
[Diagram; component labels: Corpus, ATMAN Importer, ImportShakespeareData, ATXT, TOKEN, Cdesc, Fdesc, ATMAN QuickARFF, ARFF FILE]

A Meta-Info Tag from an Atxt File:
[Figure not preserved in this transcript.]

Vector Calculation:
C(w,c) = number of occurrences of function word (FW) w for character c
N(c) = total number of word occurrences for character c (number of tokens)
Vector_Value(w) = C(w,c) / N(c)

Algorithms:
Decision Trees: J48, Decision Stump
Functions: SMO e-1, SMO e-2
Rules: PART
Meta: AdaBoostM1 + J48 (- 30 I), AdaBoostM1 + DecisionStump (- 30 I), MultiBoost + J48 (- 30 I), MultiBoost + DecisionStump (- 30 I)

Experiments:
Strategy used: 10 different partitions on each of the following categories.
Experiments were made with all female characters and an equal number of randomly selected male characters.
Categories: All, Comedy, History, Tragedy, High, Low
Testing option: 10-fold CV

[Result slides; tables and charts not preserved. Slide titles:]
All & Comedy – MF - 500
Tragedy & History - MF - 500
High & Low - MF - 500
Bar-Ilan FWs (#471)
364 FWs for Characters with Speech Length more than 100 – Acts Based
364 FWs, Characters with Speech Length > 500
364 FWs + Quote Features, Characters with Speech Length > 500
BAR-ILAN Results F - 55 - M
364 FWs (F - 89 - V M), Characters with Speech Length > 200
Stop Words-Appraisal-Systemic

Machine/OS/Tools:
Altaic - Linux OS, 4GB RAM - importing data and generating ARFF using ATMan
My PC - Windows XP, 1GB RAM - running experiments in Weka-3-4
HLL - Java 1.4.2
FileZilla - transferring files remotely
PuTTY - running commands remotely on the server
TextPad - tool for text processing
EditPlus - IDE for generating scripts and programs

Future Work:
Experiments with individual categories of play type and social class
Accuracy for social class
Features and combinations of features:
Get subtle features to distinguish character gender
Get subtle features to distinguish social class
Combination of features for gender/social class classification
Combinations of features may allow predicting characteristics of appraisal or systemic behavior

References:
Authorship Verification as a One-Class Classification Problem - Moshe Koppel, Jonathan Schler
Automatic Authorship Attribution - E. Stamatatos, N. Fakotakis, G. Kokkinakis
Gender Preferential Text Mining of E-mail Discourse - Malcolm Corney, Olivier de Vel, Alison Anderson, George Mohay
Mining E-mail Authorship - Olivier de Vel
Style Mining of Electronic Messages for Multiple Authorship Discrimination: First Results - S3, Shlomo Argamon, Marin
Automatically Categorizing Written Texts by Author Gender - Moshe Koppel, Shlomo Argamon, Anat Rachel Shimoni
Gender, Genre and Writing Style in Formal Written Texts - Moshe Koppel, Shlomo Argamon, Anat Rachel Shimoni, Jonathan Fine
Measuring the Usefulness of Function Words for Authorship Attribution - Shlomo Argamon, Shlomo Levitan
A Short Introduction to Boosting - Yoav Freund, Robert E. Schapire
A Competitive Analysis of Automated Authorship Attribution Techniques - Jason Sorenson
Text Categorization with Support Vector Machines: Learning with Many Relevant Features - Thorsten Joachims
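
The Vector Calculation slide defines each feature value as C(w,c)/N(c), the relative frequency of function word w in the speech of character c. A minimal Python sketch of that formula follows. The function name, the tiny word list, and the sample speech are hypothetical, and lowercasing tokens before counting is an assumption; the slides do not say how ATMan tokenizes or normalizes text.

```python
from collections import Counter


def feature_vector(tokens, function_words):
    """Return C(w, c) / N(c) for each function word w, given one character's tokens."""
    n = len(tokens)  # N(c): total number of tokens for the character
    if n == 0:
        # A character with no speech gets an all-zero vector.
        return [0.0 for _ in function_words]
    # C(w, c) for every word; lowercasing is an assumption, not from the slides.
    counts = Counter(token.lower() for token in tokens)
    return [counts[w] / n for w in function_words]


# Hypothetical data: four function words and one short speech of 8 tokens.
function_words = ["the", "and", "of", "thou"]
tokens = "the king and the queen of the realm".split()
vector = feature_vector(tokens, function_words)
print(vector)  # [0.375, 0.125, 0.125, 0.0]
```

Each character yields one such vector, with one entry per word in the chosen feature file, so vectors built from the same feature file are directly comparable across characters with different speech lengths.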

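
The Corpus Selection and Experiments slides describe filtering characters by speech length and then pairing all female characters with an equal number of randomly chosen male characters, repeated over 10 partitions. The sketch below shows one way this could be done in Python. The record layout, function names, and seed are hypothetical, and reading "10 different partitions" as ten independent random draws of male characters is an assumption rather than something the slides state.

```python
import random


def select_characters(characters, min_speech_length):
    """Keep characters whose speech length exceeds the threshold."""
    return [c for c in characters if c["speech_length"] > min_speech_length]


def balanced_partitions(characters, n_partitions=10, seed=0):
    """Build partitions of all females plus an equal-sized random sample of males."""
    females = [c for c in characters if c["gender"] == "Female"]
    males = [c for c in characters if c["gender"] == "Male"]
    if len(males) < len(females):
        raise ValueError("need at least as many male as female characters")
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    return [females + rng.sample(males, len(females)) for _ in range(n_partitions)]


# Hypothetical metadata records: 3 female, 8 male, and 1 minor male character.
characters = (
    [{"name": f"F{i}", "gender": "Female", "speech_length": 600.0} for i in range(3)]
    + [{"name": f"M{i}", "gender": "Male", "speech_length": 600.0} for i in range(8)]
    + [{"name": "Minor", "gender": "Male", "speech_length": 50.0}]
)
selected = select_characters(characters, 500)
partitions = balanced_partitions(selected)
print(len(selected), len(partitions), len(partitions[0]))  # 11 10 6
```

Each partition is balanced by construction, so a classifier that always predicts one gender scores 50% on it, which gives the reported accuracies a clear baseline.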