Databases in bioinformatics

Published on March 7, 2014

Author: raniashok

Source: authorstream.com

Content

Mrs. Rani Ashok, Associate Professor of Zoology, Lady Doak College (LDC), Madurai - 2
Email: [email protected]
3/7/2014

Nucleotide sequence databases

Primary nucleotide sequence databases

GenBank (www.ncbi.nlm.nih.gov/Genbank/)
Maintained by the National Center for Biotechnology Information (NCBI), which is part of the National Institutes of Health (NIH), a federal agency of the US government. It can be accessed and searched through the Entrez system at NCBI, or the entire database can be downloaded as flat files.

EMBL (www.ebi.ac.uk/embl/)
The EMBL (European Molecular Biology Laboratory) nucleotide sequence database is maintained by the European Bioinformatics Institute (EBI) in Hinxton, Cambridge, UK. It can be accessed and searched through the SRS system at EBI, or the entire database can be downloaded as flat files.

DDBJ (www.ddbj.nig.ac.jp)
The DNA Data Bank of Japan began as a collaboration with EMBL and GenBank and is run by the National Institute of Genetics. Entries can be searched by accession number, and little else.
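The slides mention that the primary nucleotide databases can be downloaded as flat files. As a rough illustration of what working with such a file might look like, the sketch below reads one GenBank-style record and pulls out the accession number and the sequence. The record shown (locus name, accession EX000001, sequence) is hypothetical and made up for this example, and the parser handles only the ACCESSION and ORIGIN fields; it is not a complete reader for the format.

```python
# Minimal sketch: extract the accession and sequence from one
# GenBank-style flat-file record. The record below is hypothetical.

SAMPLE_RECORD = """\
LOCUS       EXAMPLE1                  24 bp    DNA     linear   SYN 07-MAR-2014
DEFINITION  Hypothetical example record.
ACCESSION   EX000001
VERSION     EX000001.1
ORIGIN
        1 atggccattg taatgggccg ctga
//
"""


def parse_flat_file_record(text):
    """Return (accession, sequence) for a single flat-file record."""
    accession = None
    sequence_blocks = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):
            # "//" terminates a record.
            break
        if in_origin:
            # Sequence lines start with a position number followed by
            # blocks of bases; drop the number and keep the blocks.
            sequence_blocks.extend(line.split()[1:])
        elif line.startswith("ACCESSION"):
            fields = line.split()
            if len(fields) > 1:
                accession = fields[1]
        elif line.startswith("ORIGIN"):
            in_origin = True
    return accession, "".join(sequence_blocks).upper()


accession, sequence = parse_flat_file_record(SAMPLE_RECORD)
print(accession, len(sequence), sequence)
```

For real work an established parser from a bioinformatics library would normally be preferable; the point here is only that a flat file is plain text with keyword-tagged lines.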
Other nucleotide sequence databases

UniGene (http://www.ncbi.nlm.nih.gov/sites/entrez?db=unigene)
Attempts to process the GenBank sequence data into a non-redundant set of gene-oriented clusters. Each UniGene cluster contains sequences that represent a unique gene, as well as related information such as the tissue types in which the gene has been expressed and its map location.

SGD (http://www.yeastgenome.org/)
A scientific database of the molecular biology and genetics of the yeast Saccharomyces cerevisiae.

EBI Genomes (www.ebi.ac.uk/genomes/)
Provides access to, and statistics for, the completed genomes, along with information about ongoing projects.

Genome Biology (www.ncbi.nlm.nih.gov/Genomes/)
A site at NCBI that contains information about the available complete genomes.

Ensembl (www.ensembl.org)
A joint project between EMBL-EBI and the Sanger Centre to develop a software system that produces and maintains automatic annotation on eukaryotic genomes.

Protein sequence databases

SWISS-PROT, TrEMBL (www.expasy.ch/sprot/)
SWISS-PROT provides a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy, and a high level of integration with other databases.
SWISS-PROT was started in 1986 by Amos Bairoch in the Department of Medical Biochemistry at the University of Geneva. It is one of the best protein sequence databases in terms of the quality of its annotation. TrEMBL is a computer-annotated supplement of SWISS-PROT that contains all the translations of EMBL nucleotide sequence entries not yet integrated in SWISS-PROT.

SWISS-PROT and TrEMBL are developed by the SWISS-PROT groups at the Swiss Institute of Bioinformatics (SIB) and at EBI. They can be accessed and searched through the SRS system at ExPASy, or the entire database can be downloaded as a single flat file. The SWISS-PROT database has some legal restrictions: the entries themselves are copyrighted, but freely accessible and usable by academic researchers. Commercial companies must buy a license from SIB.

PIR (pir.georgetown.edu)
The Protein Information Resource (PIR) is a division of the National Biomedical Research Foundation (NBRF) in the US. It collaborates with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID). PIR grew out of Margaret Dayhoff's work in the middle of the 1960s. It is comprehensive, well organized, accurate, and consistently annotated, but its entry annotation does not reach the level of completeness found in SWISS-PROT. Although SWISS-PROT and PIR overlap extensively, many sequences can still be found in only one of them.
PIR can also be downloaded as a set of flat files. PIR also produces NRL-3D, a database of sequences extracted from the three-dimensional structures in the Protein Data Bank (PDB).

Sequence motif databases

Pfam (pfam.sanger.ac.uk/, pfam.cgb.ki.se)
A database of protein families defined as domains (contiguous segments of entire protein sequences). It was started in 1996 and is maintained by a consortium of scientists, among them Erik Sonnhammer (CGB, KI, Sweden), Sean Eddy (WashU, St Louis, USA), and Richard Durbin, Alan Bateman and Ewan Birney (Sanger Centre, UK). The family alignments can be converted into hidden Markov models (HMMs), which can be used to search for domains in a query protein sequence. The HMMER software (by Sean Eddy) is the computational foundation for Pfam.

The Pfam database can be searched, used to identify domains in a sequence, or downloaded from the website. It is licensed under the GNU General Public License, which makes it available to anyone but imposes the restriction that derivative works (new databases, modifications) must be made available in source form.

PROSITE (www.expasy.ch/prosite/)
A database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify which known protein family (if any) a new sequence belongs to. It was started by Amos Bairoch and is part of SWISS-PROT.
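To make the idea of a PROSITE pattern more concrete, here is a small sketch that translates the PROSITE pattern syntax into a Python regular expression and scans a sequence with it. The pattern used, N-{P}-[ST]-{P}, is the N-glycosylation site pattern as I understand it; the test sequence and the function names are hypothetical. The converter covers only the common syntax elements (x, [..], {..}, repeat counts, and the < and > anchors) and reports non-overlapping matches, so it is an approximation of what a real PROSITE scanner does.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        anchored_start = element.startswith("<")
        anchored_end = element.endswith(">")
        element = element.strip("<>")
        # Split an element such as "x(2,4)" into its core and repeat count.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            core = "."  # x means any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # {..} means any residue except these
        # "[..]" (one of these residues) and single letters are valid as-is.
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(
            ("^" if anchored_start else "") + core + repeat + ("$" if anchored_end else "")
        )
    return "".join(parts)


def find_pattern(pattern, sequence):
    """Return (start_index, matched_text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


# Hypothetical protein fragment, used only for illustration.
hits = find_pattern("N-{P}-[ST]-{P}", "MKNGSAANPTL")
print(prosite_to_regex("N-{P}-[ST]-{P}"), hits)
```

Short patterns like this one match by chance fairly often, which is one reason motif databases also carry profiles and curated annotation.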
PROSITE has also been extended to contain some profiles (probability patterns for specific protein sequence families). It can be used to search by keyword or other text in the entries, to search for a pattern in a sequence, or to search for proteins in SWISS-PROT that match a pattern.

Macromolecular structure databases

SCOP (scop.mrc-lmb.cam.ac.uk/scop/)
The SCOP (Structural Classification of Proteins) database was started by Alexey Murzin in 1994 (Lab of Molecular Biology, MRC, Cambridge, UK). Its purpose is to classify protein 3D structures in a hierarchical scheme of structural classes. It is maintained by experts ("by hand"); all protein structures in the PDB are classified, and it is updated as new structures are deposited in the PDB. SCOP is a typical secondary database: it is based on data in a primary database (in this case the PDB), but adds information through analysis and/or organisation, in this case the classification of protein 3D structures into a hierarchical scheme of folds, superfamilies and families.

CATH (www.cathdb.info)
The CATH database (Class, Architecture, Topology, Homologous superfamily) is a hierarchical classification of protein domain structures, which clusters proteins at four major structural levels. Although the aim is very similar to that of SCOP, the scheme it uses is different, and the philosophy and practical details of producing the classification differ considerably.
For instance, in CATH a larger fraction of the decisions made when classifying a new protein 3D structure is made automatically by software. CATH was started by Christine Orengo in Janet Thornton's lab (University College London) in 1996.

PDB (www.rcsb.org/pdb/)
The Protein Data Bank (PDB) is the main primary database for 3D structures of biological macromolecules determined by X-ray crystallography and NMR. Structural biologists usually deposit their structures in the PDB on publication, and some scientific journals require this before accepting a paper. The PDB also accepts the experimental data used to determine the structures (X-ray structure factors and NMR restraints) and homology models.

The PDB was established in the 1970s at the Brookhaven Lab on Long Island, New York State, US. In 1999, the management was moved to the Research Collaboratory for Structural Bioinformatics (RCSB, a joint organisation between Rutgers University, the San Diego Supercomputer Center and NIST). PDB entries contain the atomic coordinates and some structural parameters connected with the atoms (B-factors, occupancies) or computed from the structures (secondary structure). PDB entries contain some annotation, but it is not as comprehensive as in SWISS-PROT.
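As an illustration of the atomic coordinates, occupancies and B-factors mentioned above, the sketch below reads one ATOM line in the classic fixed-column PDB text format. The column positions follow the PDB format as I understand it; the coordinate values in the sample line are made up, and the Atom class and function name are hypothetical. Only a subset of the fields is extracted.

```python
from dataclasses import dataclass

# One ATOM line assembled field by field so the fixed columns are visible.
# The values are illustrative, not taken from a real PDB entry.
SAMPLE_LINE = (
    "ATOM  "      # columns 1-6:   record name
    "    1"       # columns 7-11:  atom serial number
    " "           # column 12:     blank
    " N  "        # columns 13-16: atom name
    " "           # column 17:     alternate location indicator
    "MET"         # columns 18-20: residue name
    " "           # column 21:     blank
    "A"           # column 22:     chain identifier
    "   1"        # columns 23-26: residue sequence number
    "    "        # columns 27-30: insertion code and blanks
    "  27.340"    # columns 31-38: x coordinate (angstroms)
    "  24.430"    # columns 39-46: y coordinate
    "   2.614"    # columns 47-54: z coordinate
    "  1.00"      # columns 55-60: occupancy
    "  9.67"      # columns 61-66: B-factor (temperature factor)
    "          "  # columns 67-76: blanks
    " N"          # columns 77-78: element symbol
)


@dataclass
class Atom:
    serial: int
    name: str
    residue_name: str
    chain_id: str
    residue_number: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float


def parse_atom_record(line):
    """Parse selected fields from an ATOM or HETATM line."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    # Python slices are 0-based and end-exclusive, so columns 7-11 are [6:11].
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        residue_name=line[17:20].strip(),
        chain_id=line[21],
        residue_number=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
    )


print(parse_atom_record(SAMPLE_LINE))
```

Slicing by column rather than splitting on whitespace matters here, because adjacent numeric fields in this format can run together without a separating space.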
Literature databases

PubMed: the bibliographic database
Developed by NCBI.
Designed to provide access to citations with abstracts from biomedical journals.
A linking feature was later added to provide access to full-text journal articles.
Part of the relational database management system Entrez.
Retrieves and displays results in the summary format, in the order in which the records were initially added to PubMed.

Data sources for PubMed
MEDLINE: covers the fields of medicine, nursing, dentistry, veterinary medicine, the health care system, and pre-clinical sciences such as molecular biology. Contains citations from more than 4,600 biomedical journals published in the US and 70 other countries.
Non-MEDLINE: covers out-of-scope citations, primarily for general science and chemistry journals. Also includes "ahead of print" or "epub" citations, e.g. from POPLINE, BIOETHICSLINE, etc.
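PubMed is described above as part of Entrez. One way to query Entrez from a program is through NCBI's E-utilities web interface, which the slides do not mention; the sketch below assumes that interface and only builds the search URL, without sending any request. The function name and the example search term are hypothetical.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities (an assumption of this sketch;
# the slides only describe the interactive Entrez system).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def pubmed_search_url(term, max_results=20):
    """Build an ESearch URL that would search PubMed for `term`."""
    params = {
        "db": "pubmed",         # Entrez database to search
        "term": term,           # query text, same syntax as the web search box
        "retmax": max_results,  # maximum number of record IDs to return
        "retmode": "json",      # ask for a JSON response
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


url = pubmed_search_url("bioinformatics databases", max_results=5)
print(url)
```

Opening the resulting URL would return a list of PubMed identifiers for matching citations. NCBI publishes usage guidelines for automated access, which are worth reading before sending requests in a loop.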
