ECCROct18 06

Information about ECCROct18 06

Published on November 20, 2007

Author: Ming

Source: authorstream.com

Content

Overview of Chemical Informatics and Cyberinfrastructure Collaboratory:
October 18 2006
Geoffrey Fox
Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University, Bloomington IN 47401
[email protected]
http://www.infomall.org
http://www.chembiogrid.org

Activities:
Local teams, successful prototypes and international collaboration set up in 3 initial major focus areas:
  Chemical Informatics Cyberinfrastructure/Grids with services, workflows and demonstration uses, building on success in other applications (LEAD) and showing distributed integration of academic and commercial tools
  Computational Chemistry Cyberinfrastructure/Grids with simulation, databases and TeraGrid use
  Education with courses and degrees
Review of activities suggests we also formalize work in two further areas:
  Chemical Informatics Research: model applicability and data mining
  Interfacing with the User: interaction tools and portals optimized for particular customer groups
We have also started an activity to identify “customers” for Cyberinfrastructure and its implied Chemistry eScience model

CICC Senior Personnel:
Geoffrey C. Fox, Mu-Hyun (Mookie) Baik, Dennis B. Gannon, Marlon Pierce, Beth A. Plale, Gary D. Wiggins, David J. Wild, Yuqing (Melanie) Wu, Peter T. Cherbas, Mehmet M. Dalkilic, Charles H. Davis, A. Keith Dunker, Kelsey M. Forsythe, Kevin E. Gilbert, John C. Huffman, Malika Mahoui, Daniel J. Mindiola, Santiago D. Schnell, William Scott, Craig A. Stewart, David R. Williams
From Biology, Chemistry, Computer Science, Informatics at IU Bloomington and IUPUI (Indianapolis)

CICC Infrastructure Vision:
Drug discovery and other academic chemistry and pharmacology research will be aided by powerful modern information technology
ChemBioGrid is set up as distributed cyberinfrastructure in the eScience model
ChemBioGrid will provide portals (user interfaces) to distributed databases, results of high throughput screening instruments, results of computational chemical simulations and other analyses
ChemBioGrid will provide services to manipulate this data and combine them in workflows; it will have convenient ways to submit and manage multiple jobs
ChemBioGrid will include access to PubChem, PubMed, PubMed Central, the Internet and its derivatives like Microsoft Academic Live and Google Scholar
The services include open-source software like CDK, commercial code from vendors (BCI, OpenEye, Gaussian and Google), and any user-contributed programs
ChemBioGrid will define open interfaces to use for a particular type of service, allowing plug-and-play choice between different implementations

Slide5:
CICC Combines Grid Computing with Chemical Informatics
CICC: Chemical Informatics and Cyberinfrastructure Collaboratory
Funded by the National Institutes of Health
www.chembiogrid.org
Indiana University Department of Chemistry, School of Informatics, and Pervasive Technology Laboratories

Science and Cyberinfrastructure
Large Scale Computing Challenges
Chemical Informatics is a non-traditional area of high performance computing, but many new, challenging problems may be investigated.
CICC is an NIH-funded project to support the chemical informatics needs of High Throughput Cancer Screening Centers. The NIH is creating a data deluge of publicly available data on potential new drugs.
CICC supports the NIH mission by combining state-of-the-art chemical informatics techniques with:
  World-class high performance computing
  National-scale computing resources (TeraGrid)
  Internet-standard web services
  International activities for service orchestration
  Open distributed computing infrastructure for scientists worldwide
[Diagram labels: NIH PubMed DataBase; OSCAR Text Analysis; POVRay Parallel Rendering; Initial 3D Structure Calculation; Toxicity Filtering; Cluster Grouping; Docking; Molecular Mechanics Calculations; Quantum Mechanics Calculations; IU’s Varuna DataBase; NIH PubChem DataBase]
Chemical informatics text analysis programs can process hundreds of thousands of abstracts of online journal articles to extract chemical signatures of potential drugs.
OSCAR-mined molecular signatures can be clustered, filtered for toxicity, and docked onto larger proteins. These are classic “pleasingly parallel” tasks. Top-ranking docked molecules can be further examined for drug potential.
Big Red (and the TeraGrid) will also enable us to perform time-consuming, multi-step Quantum Chemistry calculations on all of PubMed.
Results go back to public databases that are freely accessible by the scientific community.

CICC Prototype Web Services:
Basic cheminformatics
  Molecular weights
  Molecular formulae
  Tanimoto similarity
  2D structure diagrams
  Molecular descriptors
  3D structures
  InChI generation/search
  CMLRSS
R and Excel
Application based services
  Compare (NIH)
  Toxicity predictions (ToxTree)
  Literature extraction (OSCAR3)
  Clustering (BCI Toolkit)
  Docking, filtering, ... (OpenEye)
  Varuna simulation
Next steps?
  Define WSDL interfaces to enable global production of compatible Web services; refine CML
  Add more services (identify gaps)
  Add more databases, including 3D structural info
  Demonstrate use of services in other pipelining tools (KDE, Knime; Pipeline Pilot already done)
  Extend Computational Chemistry (Varuna) Services
  Routine TeraGrid and Big Red use; “Production” on OSCAR3, CDK, Gamess, Jaguar
  Develop more training material
Key Ideas
  Add value to PubChem with additional distributed services and databases
  Develop nifty ideas like VOTables
  Wrapping existing code in web services is not difficult
  Provide “core” (CDK) services and exemplars of typical tools
  Provide access to key databases via a web service interface
  Provide access to major Compute Grids

Web Service Locations:
Indiana University: Clustering, VOTables, OSCAR3, Toxicity classification, Database services
Penn State University (now moved to IU): CDK based services, Fingerprints, Similarity calculations, 2D structure diagrams, Molecular descriptors
Cambridge University: InChI generation / search, CMLRSS, OpenBabel
InfoChem: SPRESI database
SDSC: Typical TeraGrid Site
NIH: PubChem ….. Compare …..

Cheminformatics Education at IU:
Linked to bioinformatics in Indiana University’s School of Informatics
School of Informatics degree programs: BS, MS, PhD
Programs offered at both the Indianapolis (IUPUI) and Bloomington (IUB) campuses
  Bioinformatics MS and track on PhD
  Chemical Informatics MS and track on PhD
  Informatics undergraduates can choose a chemistry cognate (change to Life Sciences)
PhD in Informatics started in August 2005 and offers tracks in bioinformatics; chemical informatics; health informatics; human-computer interaction design; social and organizational informatics; more to come!
Good employer interest but modest student understanding of the value of a Cheminformatics degree
3 core courses in Cheminformatics plus seminar/independent studies
Significant interest in a distance education version of the introductory Cheminformatics course (enrollment promising in Distance Graduate Certificate in Chemical Informatics)

Current Status:
Web site http://www.chembiogrid.org
Wiki chosen to support the project as a shared editable web space
Building Collaboratory involving PubChem: a Global Information System accessible anywhere and at any time; enhance PubChem with distributed tools (clustering, simulation, annotation etc.) and data
Adopted Taverna as the workflow system because it is popular in Bioinformatics, but we will evaluate other systems such as GPEL from LEAD
Demonstrated CI-enhanced Chemistry simulations
Initiated Data-mining, User interface and Chemical Informatics tools research
Prototyped a large set of runs on the local Big Red 23 Teraflop supercomputer (OSCAR3 and modeling, moving to CDK, Gamess, Jaguar)
Initial results discussed at conferences/workshops/papers
  Gordon Conferences, ACS, SDSC tutorial
First new Cheminformatics courses offered
Advisory board set up and met; this is the second meeting
Videoconferencing-based meetings with Peter Murray-Rust and group at Cambridge roughly every 2-3 weeks
Good or potentially good interactions with Local HTS in CGB, NIH DTP, Scripps, Lilly and Michigan ECCR

MLSCN Post-HTS Biology Decision Support:
Percent Inhibition or IC50 data is retrieved from HTS
Question: Was this screen successful?
Question: What should the active/inactive cutoffs be?
Question: What can we learn about the target protein or cell line from this screen?
Compounds submitted to PubChem
Workflows encoding distribution analysis of screening results
Workflows encoding plate & control well statistics, distribution analysis, etc
Workflows encoding statistical comparison of results to similar screens, docking of compounds into proteins to correlate binding with activity, literature search of active compounds, etc
Grids can link data analysis (e.g. image processing developed in existing Grids), traditional Cheminformatics tools, as well as annotation tools (Semantic Web, del.icio.us) and enhance lead ID and SAR analysis
A Grid of Grids linking collections of services at PubChem, ECCR centers, MLSCN centers
CHEMINFORMATICS PROCESS GRIDS

Example HTS workflow: finding cell-protein relationships:
A protein implicated in tumor growth with a known ligand is selected (in this case HSP90 taken from the PDB 1Y4 complex)
The screening data from a cellular HTS assay is similarity searched for compounds with similar 2D structures to the ligand.
Similar structures to the ligand can be browsed using client portlets.
Similar structures are filtered for druggability, are converted to 3D, and are automatically passed to the OpenEye FRED docking program for docking into the target protein.
Once docking is complete, the user visualizes the high-scoring docked structures in a portlet using the JMOL applet.
Docking results and activity patterns are fed into R services for building of activity models and correlations
  Least Squares Regression
  Random Forests
  Neural Nets

Varuna environment for molecular modeling (Baik, IU):
[Diagram labels: QM Database; Researcher; Simulation Service; FORTRAN Code, Scripts; Chemical Concepts; Experiments; QM/MM Database; PubChem, PDB, NCI, etc.; ChemBioGrid; Reaction DB; DB Service; Queries, Clustering, Curation, etc.; Papers etc.; Condor; TeraGrid; Supercomputers; “Flocks”]

Methods Development at the CICC:
Tagging methods for web-based annotation exploiting del.icio.us and Connotea
Development of QSAR model interpretability and applicability methods
RNN-Profiles for exploration of chemical spaces
VisualiSAR: SAR through visual analysis
  See http://www.daylight.com/meetings/mug99/Wild/Mug99.html
Visual Similarity Matrices for High Volume Datasets
  See http://www.osl.iu.edu/~chemuell/new/bioinformatics.php
Fast, accurate clustering using parallel Divisive K-means
Mapping of Natural Language queries to use cases and workflows
Advanced data mining models for drug discovery information

Structure of Proposal:
a) Define audience that we are targeting
b) Cyberinfrastructure Framework with Key services: Registry, Computing, portal, workflow
  Exemplar Chemoinformatics Services
  Exemplar workflows using services
  WSDL defined for key cases to allow others to contribute
  Tutorial
c) Education
d) IT/Cyber-enhanced Computational Chemistry
e) Cheminformatics Research Systems Tools and Modeling

Questions:
We expect to respond to “big” NIH RFP in about 4 months
Should we partner with Michigan?
Who is the “customer” and how do we get more?
  Do/Should chemists want our or more generally NIH’s product?
  Interactions with “large” and “small” industry
What is the balance between infrastructure, computational chemistry, Cheminformatics tools and research, chemical informatics systems and interfaces?
  Should we stress the literature (OSCAR3) project?
  Balance of applications and generic capabilities?
How should we structure the education component?
  The field does not have strong student appeal compared to Bioinformatics
We are strong in Computer Sciences (Grids/Cyberinfrastructure) but it is doubtful there will be any CS reviewers
We are strong in Cheminformatics systems, but it is not clear this is a recognized activity; how do we justify the claim that Grids/Cyberinfrastructure/Open Access are “good”?
Should we link more with biology?

Covering our bases: Who are our “Customers”?:

What do we need to conquer traditional chemical Research Community:
High-Fidelity Structural Data, Redox Potentials, Spectroscopy, Transition State Structures, Energies, Molecular Orbitals…..

“Departments” of the future Center:
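
Illustration (not part of the original slides): the deck lists Tanimoto similarity among the CICC prototype web services and uses 2D similarity searching in the example HTS workflow, but it does not show how the measure is computed. The following is a minimal sketch, assuming each compound is represented by the set of "on" bit positions of a structural fingerprint. The function names, the compound names and the fingerprints are hypothetical and are not taken from CDK or from any CICC service.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by distinct on-bits."""
    if not a and not b:
        # Two empty fingerprints: returning 0.0 is a convention of this sketch.
        return 0.0
    return len(a & b) / len(a | b)


def rank_by_similarity(
    query: Set[int], library: Dict[str, Set[int]]
) -> List[Tuple[str, float]]:
    """Score every library compound against the query, most similar first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints, given as sets of on-bit positions.
query = {1, 4, 7, 9}
library = {
    "cmpd_A": {1, 4, 7, 9},
    "cmpd_B": {1, 4, 8},
    "cmpd_C": {2, 3},
}

ranking = rank_by_similarity(query, library)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In a workflow like the one described in the slides, a ranking of this kind would typically be cut at a similarity threshold before the retained compounds are passed on to filtering and docking; the threshold itself is a choice the slides do not specify.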
