biosummer04 yang keynote

Published on January 17, 2008

Author: Silvestre

Source: authorstream.com

Detecting adaptive protein evolution
Ziheng Yang
Department of Biology, University College London

There are two main explanations for the genetic variation observed within a population or between species:
- natural selection (survival of the fittest)
- mutation and drift (survival of the luckiest)

Gillespie, J.H. 1998. Population genetics: a concise guide. Johns Hopkins University Press, Baltimore.
Hartl, D.L., and A.G. Clark. 1997. Principles of population genetics. Sinauer Associates, Sunderland, Massachusetts.

Positive & negative selection

Genotype:    AA    Aa         aa
Frequency:   p²    2p(1−p)    (1−p)²
Fitness:     1     1+s        1+2s

(A: wild-type allele; a: new mutant)
s is the selection coefficient:
- s = 0: neutral evolution
- s < 0: negative (purifying) selection
- s > 0: positive selection (adaptive evolution)

Positive & negative selection (continued)

Whether mutation and drift or selection dominates the fate of the new allele depends on the size of |Ns| relative to 1, where N is the effective population size:
- Ns < −3: fatal mutations
- −3 < Ns < −1: unlucky losers
- −1 < Ns < 1: nearly neutral
- 1 < Ns < 3: occasional hopefuls
- Ns > 3: rare monsters

Theories of molecular evolution
[Figure after Akashi, H. 1999. Gene 238:39-51]

Detecting the effect of natural selection is useful for (a) advancing evolutionary theory and (b) inferring functional significance from genomic data.

Evolutionary conservation indicates functional significance (Thomas et al. 2003. Nature 424:788-793).

Fast-evolving genes or gene regions are also functionally important if their variability is driven by natural selection.

In protein-coding genes, we can distinguish between synonymous (silent) and nonsynonymous (replacement) mutations and contrast their substitution rates to infer selection on the protein.

Synonymous & nonsynonymous substitutions

Definitions
- dS (KS): number of synonymous substitutions per synonymous site
- dN (KA): number of nonsynonymous substitutions per nonsynonymous site
- ω = dN/dS: nonsynonymous/synonymous rate ratio

The ω ratio measures selection at the protein level
- ω = 1: neutral evolution
- ω < 1: negative (purifying) selection
- ω > 1: positive (diversifying) selection

Data & information
(a dot means the same nucleotide as in a2)
a2    GGC TCT CAC TCC ATG AGG TAT TTC TTC ACA TCC
a24   ... ..C ... ... ... ..T ... ... .A. ..C ...
a11   ... ..C ..A ... ... ... ... ... .A. ..C ...
aw24  ... ..C ... ... ... ... ... ... CA. ..C ...
aw68  ... ..C ... ... ... ..A ... ... .A. ..C ...
a3    ... ..T ..T ... ... ... ... ... C.. ..T ...

Early studies averaged synonymous and nonsynonymous rates over all sites and had little power to detect adaptive evolution: positive selection typically acts on only a few sites, so its signal is swamped by the many sites under purifying selection and the averaged ω rarely exceeds 1.

Possible approaches
- Decide which sites might be under selection and focus on them (Hughes & Nei 1988, Nature 335:167-170) (fixed-sites model)
- Test each site for positive selection (Suzuki & Gojobori 1999, Mol. Biol. Evol. 16:1315-1328)
- Use a statistical distribution to model the variation of ω among sites (random-sites model; "fishing expedition")

A simple approach (Fitch et al. 1997; Suzuki & Gojobori 1999)
[Figure: codons observed at one site in several sequences, mapped onto a tree with reconstructed ancestral codons; the changes inferred on the branches are counted as 3 nonsynonymous changes and 1 synonymous change.]

Use of codon models to detect amino acid sites under diversifying selection
- Likelihood ratio test (LRT) for sites under positive selection
- Bayesian calculation of posterior probabilities of sites under positive selection

Rates to CTG
- Synonymous: CTC (Leu) → CTG (Leu); TTG (Leu) → CTG (Leu)
- Nonsynonymous: GTG (Val) → CTG (Leu); CCG (Pro) → CTG (Leu)

Rate matrix Q = {qij}
(Goldman & Yang 1994, Mol. Biol. Evol. 11:725-736; Muse & Gaut 1994, Mol. Biol. Evol. 11:715-724)

LRT of sites under positive selection
- H0: there are no sites at which ω > 1
- H1: there are such sites
Compare 2Δℓ = 2(ℓ1 − ℓ0), where ℓ0 and ℓ1 are the maximized log-likelihoods under H0 and H1, with a χ² distribution (Nielsen & Yang 1998, Genetics 148:929-936; Yang, Nielsen, Goldman & Pedersen 2000, Genetics 155:431-449).

Two pairs of useful models

M1a (nearly neutral)
Site class k:   0         1
pk:             p0        p1
ωk:             ω0 < 1    ω1 = 1

M2a (positive selection)
Site class k:   0         1         2
pk:             p0        p1        p2
ωk:             ω0 < 1    ω1 = 1    ω2 > 1

Modified from Nielsen & Yang (1998), where ω0 = 0 is fixed.

M7 (beta, using 10 site classes): ω ~ beta(p, q)
M8 (beta&ω): a proportion p0 of sites have ω drawn from beta(p, q); the remaining p1 = 1 − p0 of sites have ωs > 1
From Yang et al. (2000).

Discretisation of a continuous distribution
[Figure: M7 (beta): proportion of sites against the ω ratio on the interval 0 to 1, with the beta distribution approximated by discrete site classes]

Mixture distribution M8 (beta&ω)
[Figure: proportion of sites against the ω ratio; a proportion p0 of sites with ω from beta(p, q) on 0 to 1, plus a proportion p1 of sites with ω = 1.7]

Likelihood function and empirical Bayesian inference of sites under selection (M2a)
Site class k:     0         1         2
Proportion pk:    p0        p1        p2
ω ratio ωk:       ω0 < 1    ω1 = 1    ω2 > 1

Bayes empirical Bayes (BEB): M2a

Human MHC class I data: 192 alleles, 270 codons

Model          ℓ            Parameter estimates
M7 (beta)      −7,498.97    beta(0.10, 0.35)
M8 (beta&ω)    −7,232.68    p0 = 0.90, beta(0.17, 0.71), (p1 = 0.10), ωs = 5.12

Likelihood ratio test of positive selection: 2Δℓ = 2 × 266.29 = 532.58, d.f. = 2, P < 0.001.

Posterior probabilities for MHC

25 sites identified by M8 (beta&ω) using both naive empirical Bayes (NEB) and BEB

Comparison between NEB and BEB in real data analysis and computer simulation suggests that:
- BEB is effective in correcting the high false-positive rate of NEB in small (non-informative) data sets.
- BEB does not seem to cause a loss of power in large (informative) data sets.
- Some wrong models are more useful than the true model.

A small data set: HTLV tax gene (Suzuki & Nei 2004, MBE 21:914-921)
20 sequences, 181 codons; 23 singleton differences on a star tree: 2 synonymous, 21 nonsynonymous.
- NEB: M0 (one-ratio), M2 (selection), M2a (positive selection) and M8 (beta&ω) all give ω = 4.87; every site is under positive selection with P = 1.
- BEB: 21 sites have 0.91 < P < 0.93 under M2a and 0.96 < P < 0.97 under M8. Other sites have P ≈ 57% or 70%.
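The "Rates to CTG" slide contrasts synonymous and nonsynonymous changes into the codon CTG, but the rate matrix Q itself appeared only as an image. A minimal sketch, assuming the simplified parameterization commonly used with these codon models (our assumption, not read off the slide): qij = 0 if codons i and j differ at more than one position; πj for a synonymous transversion; κπj for a synonymous transition; ωπj for a nonsynonymous transversion; ωκπj for a nonsynonymous transition. Here κ is the transition/transversion rate ratio and πj the equilibrium frequency of codon j. The function name `codon_rate` and the values κ = 2, ω = 0.5, πj = 1 are illustrative only; the diagonal of Q and its scaling are omitted.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order (first position varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
PURINES = {"A", "G"}


def is_transition(x: str, y: str) -> bool:
    """A<->G and C<->T are transitions; every other change is a transversion."""
    return (x in PURINES) == (y in PURINES)


def codon_rate(i: str, j: str, kappa: float, omega: float, pi_j: float) -> float:
    """Off-diagonal rate q_ij from codon i to codon j (hypothetical helper)."""
    diffs = [(a, b) for a, b in zip(i, j) if a != b]
    # Stop codons are excluded and only single-nucleotide changes are allowed.
    if CODE[i] == "*" or CODE[j] == "*" or len(diffs) != 1:
        return 0.0
    a, b = diffs[0]
    rate = pi_j
    if is_transition(a, b):
        rate *= kappa          # transition/transversion bias
    if CODE[i] != CODE[j]:
        rate *= omega          # nonsynonymous change: scaled by omega
    return rate


# The four changes into CTG from the slide (kappa = 2, omega = 0.5, pi_j = 1 for readability).
for src in ["CTC", "TTG", "GTG", "CCG"]:
    print(src, "->", "CTG", codon_rate(src, "CTG", kappa=2.0, omega=0.5, pi_j=1.0))
# CTC -> CTG 1.0   synonymous transversion
# TTG -> CTG 2.0   synonymous transition
# GTG -> CTG 0.5   nonsynonymous transversion
# CCG -> CTG 1.0   nonsynonymous transition
```

Because ω multiplies only the nonsynonymous entries, ω = 1 makes replacement and silent changes equally fast, which is the neutral case on the "ω ratio" slide.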
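The likelihood and empirical Bayes slides for M2a rest on a mixture: for site h, f(xh) = Σk pk f(xh | ωk), and the posterior probability that the site belongs to class k is pk f(xh | ωk) / f(xh). A minimal sketch of that last step, assuming the per-class site likelihoods f(xh | ωk) have already been computed on the tree (not shown). Plugging in point estimates of pk and ωk like this is the NEB calculation; BEB additionally accounts for uncertainty in those estimates, which this sketch does not attempt. The function name and all numbers are hypothetical.

```python
def site_class_posteriors(proportions, class_likelihoods):
    """Posterior probability that one site h belongs to each site class k.

    proportions[k]       -- p_k, proportion of sites in class k (sums to 1)
    class_likelihoods[k] -- f(x_h | omega_k), probability of the data at site h
                            if the site evolves with ratio omega_k
    """
    joint = [p * f for p, f in zip(proportions, class_likelihoods)]
    total = sum(joint)  # f(x_h) = sum over k of p_k * f(x_h | omega_k)
    return [j / total for j in joint]


# Hypothetical M2a estimates and per-class site likelihoods, for illustration only.
p = [0.6, 0.3, 0.1]                 # p0, p1, p2
site_lik = [0.001, 0.002, 0.024]    # f(x_h | omega_0), f(x_h | omega_1), f(x_h | omega_2)
post = site_class_posteriors(p, site_lik)
print([round(x, 3) for x in post])  # -> [0.167, 0.167, 0.667]
print(post[2] > 0.95)               # P(omega > 1 | data) against a 95% cutoff -> False
```

Even though the data at this hypothetical site are twelve times more probable under ω2 than under ω1, the small prior proportion p2 keeps the posterior below a 95% cutoff.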
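The MHC test statistic can be reproduced from the two log-likelihoods in the table. A minimal sketch, assuming 2 degrees of freedom as stated on the slide; for a χ² distribution with 2 d.f. the tail probability has the closed form P = exp(−x/2), so no statistics library is needed. The function name is illustrative.

```python
import math


def lrt_chi2_df2(lnl_null: float, lnl_alt: float) -> tuple[float, float]:
    """Likelihood ratio test statistic 2*(l1 - l0) and its P-value for 2 d.f."""
    stat = 2.0 * (lnl_alt - lnl_null)
    # Survival function of the chi-square distribution with 2 degrees of freedom.
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# M7 (null) vs M8 (alternative) for the human MHC class I data.
stat, p = lrt_chi2_df2(lnl_null=-7498.97, lnl_alt=-7232.68)
print(f"2*delta_lnL = {stat:.2f}, P = {p:.2e}")
# 2*delta_lnL = 532.58; P is of the order 1e-116, far below 0.001
```

For tests with a different number of degrees of freedom, a general χ² survival function such as `scipy.stats.chi2.sf(stat, df)` would replace the closed form.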
Performance measures in simulation
True positive = 50/80; false positive = 10/120; accuracy = 50/60
(that is, of 80 sites truly under positive selection and 120 sites that are not, 60 are identified as positively selected, 50 correctly and 10 falsely; accuracy is the proportion of identified sites that are truly under positive selection)

Performance of BEB (NEB) in simulations (cutoff P = 95%)

Advantages of ML
- Accounts for the genetic code
- Accounts for transition/transversion rate bias and codon usage bias
- Avoids bias in ancestral reconstruction
- Uses probability theory to correct for multiple hits

Assumptions & limitations
- Same selective pressure over all lineages
- No recombination within the sequence
- No variation in synonymous rate among sites
- Same rate for all amino acid changes
- No sequencing or alignment errors
The level of sequence divergence and the number of sequences are two major factors affecting accuracy and power. Data consisting of only a few closely related sequences contain little information.

Adaptive molecular evolution
- Proteins involved in immunity or defence (MHC, immunoglobulin VH, class I chitinase)
- Proteins involved in evading defence systems (HIV env, nef, gag, pol, etc.; capsid in FMD virus; influenza virus hemagglutinin gene)
- Proteins involved in male & female reproduction (abalone sperm lysin, sea urchin bindin, proteins in mammals)
- Miscellaneous

Acknowledgments
BBSRC
http://abacus.gene.ucl.ac.uk/

References
Yang, Z., and J.P. Bielawski. 2000. Statistical methods for detecting molecular adaptation. Trends in Ecology and Evolution 15:496-503.
Yang, Z. 2001. Adaptive molecular evolution. Chapter 12 (pp. 327-350) in Handbook of statistical genetics, eds. D. Balding, M. Bishop, and C. Cannings. Wiley, New York.
Yang, Z. 2002. Inference of selection from multiple species alignments. Current Opinion in Genetics and Development 12:688-694.
Wong, W.S.W., et al. 2004. Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites. Genetics 168:1041-1051.
Yang, Z., et al. submitted. Bayes empirical Bayes inference of amino acid sites under positive selection. Molecular Biology and Evolution.
