shin


Published on December 11, 2007

Author: Dixon

Source: authorstream.com

Content

Exploitation of Structural Similarity in Semi-Structured Bioinformatics Data for Efficient Storage Construction:
Dongkyoo Shin ([email protected]), Sejong University, InCoB 2007

Table of contents:
Abstract
Background
Methods
Results
Conclusions

Abstract (1):
Background
  Many studies address storing XML data
    They aim to reduce the number of joins between tables
    They are not well suited to microarray data, which has a distinctive hierarchy
  Hierarchical feature of the microarray data model
    A few core values occur repeatedly
  New approach for capturing this feature
    Classify elements with similar structure into a group
    Design a common database table for the group

Abstract (2):
Results
  Database schema created by our approach
    Remarkably reduces the number of table joins
    Improves performance of storing and loading XML-based microarray data
Conclusions
  Mining the structural similarity of elements is an efficient way to improve performance for microarray data

Background (1):
DTD (Document Type Definition)-dependent base
  Map one element into one table
  For each e ∈ E, #(S) ≥ 1 OR #(A) ≥ 1 -> define_Class(e)
  For each Se ∈ S -> Add_attributes_of_Class(e)
  Se ∈ SequenceType -> Define_multivalued_att(Se, e)

Background (2):
Inline technique base
  Reduce the complexity of the DTD (Document Type Definition)
  For each e, #(S) == 1 AND Se ∈ SequenceType -> Add_Multi-valued_attribute_of_ParentClass(e)

Background (3):
Drawbacks of previous approaches
  DTD-dependent
    Database schema has the same complexity as the DTD
  Inline technique
    Strongly depends on the number of omissible elements
New design approach for a microarray database
  Capture similar structural features of microarray data
  Need a fast and simple way to mine the structural features

Background (5):
Microarray data and MAGE (Microarray Gene Expression) standards
  Research groups share microarray data with others and use it to answer their biological questions
  MGED Society's standard definitions
    MIAME (Minimum Information About a Microarray Experiment)
    MAGE-OM and MAGE-ML
      Exchange object model and format for MIAME
  Structural feature of MAGE-OM
    A variety of objects define the same data types, including complex types

Background (6):
Decision tree
  A simple model that makes classification rules, correlations, and effects between variables easy to understand
  Suitable for mining structural features of the MAGE-ML DTD itself (not MAGE-ML instances)
  Can classify all elements into three levels: a root, a mediators group, and a bottoms group

Methods (1):
Classification of core features using a decision tree
Terminology for expressing a complexType
  e: an element defined in the XML schema
  E: the set of elements e
  SE: the set of sub-elements of e
  a: an attribute of e
  A: the set of attributes of e
  SA: the set of attributes of all sub-elements of e
  complexType: structural information that consists of SE and/or A of e
  Lowest child: an element without a sub-element
  Lowest parent: an element with a sub-element that is one of the lowest child elements
  PG (Parent Group): a set of candidate elements to be parents of a lowest child
  LPCG (Lowest Parent Candidate Group): a set of candidates to be lowest parents
  LCG (Lowest Child Group): a set of lowest child elements
  LPG (Lowest Parent Group): a set of lowest parent elements
  ULPG (Upper Level Parent Group): a set of upper-level parents, including elements that are neither lowest child nor lowest parent

Methods (2):
Expression of a complexType
  A complexType defines the structural information of elements
  A set of arrays including data types
Definition of structural similarity
  SE_elex = {e_1, e_2, …, e_n}, SA_elex = {A_e1, A_e2, …, A_en}
  complexType(ele_x) = {SE_elex, SA_elex}
  ele_x and ele_y are structurally similar when complexType(ele_x) == complexType(ele_y)

Methods (3):
Decision tree for recognizing the core features
  Condition 1: If rule 1 is satisfied, then e arrives at LCG. Otherwise, it arrives at PG.
  Condition 2: If rule 2 is satisfied, then e and the elements similar to e arrive at a new LCG.
  Condition 3: If rule 3 is satisfied, then e arrives at LPG. Otherwise, it arrives at ULPG.
  Condition 4: If rule 4 is satisfied, then e and the elements similar to e arrive at a new LPG.
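The complexType comparison from Methods (2) can be illustrated with a small Python sketch. This is not the author's implementation. The `Element` class, the function names, and the toy schema are hypothetical, and the sketch assumes that a complexType signature combines SE, A, and SA, since the slides do not spell out exactly how those sets are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical stand-in for an element declared in a schema or DTD."""
    name: str
    sub_elements: tuple = ()   # names of sub-elements (SE)
    attributes: tuple = ()     # (attribute name, data type) pairs (A)


def complex_type(e, schema):
    """Build a hashable structural signature for e.

    Assumption: the signature is (SE, A, SA), where SA collects the
    attribute sets of every sub-element of e.
    """
    se = frozenset(e.sub_elements)
    a = frozenset(e.attributes)
    sa = frozenset(frozenset(schema[s].attributes) for s in e.sub_elements)
    return (se, a, sa)


def structurally_similar(x, y, schema):
    """Two elements are similar when their signatures are equal."""
    return complex_type(x, schema) == complex_type(y, schema)


# Toy schema with made-up element names (not taken from MAGE-ML).
schema = {el.name: el for el in (
    Element("Value", attributes=(("text", "string"),)),
    Element("Unit", attributes=(("text", "string"),)),
    Element("MeasurementA", sub_elements=("Value", "Unit")),
    Element("MeasurementB", sub_elements=("Value", "Unit")),
    Element("Sample", sub_elements=("Value",), attributes=(("id", "ID"),)),
)}

same = structurally_similar(schema["MeasurementA"], schema["MeasurementB"], schema)
different = structurally_similar(schema["MeasurementA"], schema["Sample"], schema)
print(same, different)
```

Using frozensets makes the signature independent of declaration order and usable as a dictionary key, which is convenient when grouping many elements by structure.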
Methods (4):
Classification rules: Rule 1
Decide whether an element belongs to group LCG or PG
  For each e_i ∈ E {
    if (number of elements in SE_ei == 0) {
      e_i is classified into LCG;
    } else {
      e_i is classified into PG;
    }
  }

Methods (5):
Classification rules: Rule 2
Classify multiple sets of LCG
  p = 0;
  For each e_i ∈ LCG_0 {
    Flag = 0;
    If (p > 0) {
      For q = 1 to p
        If (complexType(e_i) == complexType(element in LCG_q)) {
          e_i is classified into LCG_q;
          Flag = 1;
        }
    }
    If (Flag == 0) {
      For each e_j ∈ LCG_0
        if (complexType(e_i) == complexType(e_j)) {
          p = p + 1;
          e_i and e_j are classified into a new group LCG_p;
        }
    }
  }

Methods (6):
Classification rules: Rule 3
Separate the elements in PG into two groups: LPG and ULPG
  For each e_i ∈ PG {
    if (SE_ei ⊆ LCG) {
      e_i is classified into LPG;
    } else {
      e_i is classified into ULPG;
    }
  }

Methods:
Classification rules: Rule 4
Classify multiple sets of LPG
  p = 0;
  For each e_i ∈ LPG_0 {
    Flag = 0;
    If (p > 0) {
      For q = 1 to p
        If (complexType(e_i) == complexType(element in LPG_q)) {
          e_i is classified into LPG_q;
          Flag = 1;
        }
    }
    If (Flag == 0) {
      For each e_j ∈ LPG_0
        if (complexType(e_i) == complexType(e_j)) {
          p = p + 1;
          e_i and e_j are classified into a new group LPG_p;
        }
    }
  }

Result (1):
Database design by the proposed decision tree

Result (2):
Database space complexity
Time complexity

Result (3):
Reconstructing the XML document

Conclusions:
Proposed approach
  Mine elements with structural similarity from the XML Schema for biological information
Experimental result
  Mining the structural similarity of the object model suits microarray data and is more efficient than previous approaches
Future work
  Plan to extend the current classification rules to the root, LCG, LPG, and ULPG respectively
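Classification rules 1 to 4 from the Methods slides can be sketched in Python as shown here. This is a minimal illustration under stated assumptions, not the author's code. The schema layout, the function names, and the element names are hypothetical. The sketch assumes that Rule 3 tests whether every sub-element of a parent is a lowest child (the relation symbol did not survive in the source), and that Rules 2 and 4 only form a new group when at least two distinct elements share a complexType.

```python
from collections import defaultdict


def complex_type(name, schema):
    """Hashable signature (SE, A, SA); the combination is an assumption."""
    subs, attrs = schema[name]
    sub_attrs = frozenset(frozenset(schema[s][1]) for s in subs)
    return (frozenset(subs), frozenset(attrs), sub_attrs)


def group_similar(names, schema):
    """Rules 2 and 4: collect elements whose complexType is equal.

    Returns (groups, rest): groups hold two or more similar elements,
    rest holds elements with no structural twin.
    """
    buckets = defaultdict(list)
    for n in names:
        buckets[complex_type(n, schema)].append(n)
    groups = sorted(sorted(g) for g in buckets.values() if len(g) > 1)
    rest = sorted(g[0] for g in buckets.values() if len(g) == 1)
    return groups, rest


def classify(schema):
    # Rule 1: elements without sub-elements go to LCG, the others to PG.
    lcg0 = [n for n, (subs, _) in schema.items() if not subs]
    pg = [n for n in schema if n not in lcg0]
    # Rule 3 (assumed subset test): parents whose sub-elements are all
    # lowest children go to LPG, the remaining parents go to ULPG.
    lcg = set(lcg0)
    lpg0 = [n for n in pg if set(schema[n][0]) <= lcg]
    ulpg = sorted(n for n in pg if n not in lpg0)
    lcg_groups, lcg_rest = group_similar(lcg0, schema)   # Rule 2
    lpg_groups, lpg_rest = group_similar(lpg0, schema)   # Rule 4
    return {
        "LCG_groups": lcg_groups, "LCG_rest": lcg_rest,
        "LPG_groups": lpg_groups, "LPG_rest": lpg_rest,
        "ULPG": ulpg,
    }


# Toy schema: name -> (sub-element names, (attribute, type) pairs).
# All names are made up for illustration and are not from MAGE-ML.
schema = {
    "Value": ((), (("text", "string"),)),
    "Unit": ((), (("text", "string"),)),
    "Label": ((), (("lang", "string"),)),
    "MeasurementA": (("Value", "Unit"), ()),
    "MeasurementB": (("Value", "Unit"), ()),
    "Note": (("Label",), ()),
    "Experiment": (("MeasurementA", "MeasurementB", "Note"), (("id", "ID"),)),
}

result = classify(schema)
print(result)
```

In the approach described in the slides, each group of similar elements would share one common database table, which is where the reduction in table joins comes from. Grouping by a hashable signature avoids the pairwise comparisons of the slide pseudocode while producing the same kind of groups.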
