nips06 tutorial

Published on September 17, 2007

Author: CoolDude26

Source: authorstream.com

Slide1: Bayesian models of human learning and inference. Josh Tenenbaum, MIT Department of Brain and Cognitive Sciences, Computer Science and AI Lab (CSAIL). Thanks to Tom Griffiths, Charles Kemp, Vikash Mansinghka. (http://web.mit.edu/cocosci/Talks/nips06-tutorial.ppt)

The probabilistic revolution in AI: Principled and effective solutions for inductive inference from ambiguous data: vision, robotics, machine learning, expert systems / reasoning, natural language processing. Standard view: no necessary connection to how the human brain solves these problems.

Probabilistic inference in human cognition?: 'People aren’t Bayesian'. Kahneman and Tversky (1970s-present): 'heuristics and biases' research program. 2002 Nobel Prize in Economics. Slovic, Fischhoff, and Lichtenstein (1976): 'It appears that people lack the correct programs for many important judgmental tasks.... it may be argued that we have not had the opportunity to evolve an intellect capable of dealing conceptually with uncertainty.' Stephen Jay Gould (1992): 'Our minds are not built (for whatever reason) to work by the rules of probability.'

Slide4: The probability of breast cancer is 1% for a woman at 40 who participates in a routine screening. If a woman has breast cancer, the probability is 80% that she will have a positive mammography. If a woman does not have breast cancer, the probability is 9.6% that she will also have a positive mammography. A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer? A. greater than 90%; B. between 70% and 90%; C. between 50% and 70%; D. between 30% and 50%; E. between 10% and 30%; F. less than 10%.

Availability biases in probability judgment: How likely is it that a randomly chosen word ends in 'g'? Ends in 'ing'?
When buying a car, how much do you weigh your friend’s experience relative to consumer satisfaction surveys?

Probabilistic inference in human cognition?: 'People aren’t Bayesian'. Kahneman and Tversky (1970s-present): 'heuristics and biases' research program. 2002 Nobel Prize in Economics. Psychology is often drawn towards the mind’s errors and apparent irrationalities. But the computationally interesting question remains: How does the mind work so well?

Bayesian models of cognition:
Visual perception [Weiss, Simoncelli, Adelson, Richards, Freeman, Feldman, Kersten, Knill, Maloney, Olshausen, Jacobs, Pouget, ...]
Language acquisition and processing [Brent, de Marcken, Niyogi, Klein, Manning, Jurafsky, Keller, Levy, Hale, Johnson, Griffiths, Perfors, Tenenbaum, …]
Motor learning and motor control [Ghahramani, Jordan, Wolpert, Kording, Kawato, Doya, Todorov, Shadmehr, …]
Associative learning [Dayan, Daw, Kakade, Courville, Touretzky, Kruschke, …]
Memory [Anderson, Schooler, Shiffrin, Steyvers, Griffiths, McClelland, …]
Attention [Mozer, Huber, Torralba, Oliva, Geisler, Yu, Itti, Baldi, …]
Categorization and concept learning [Anderson, Nosofsky, Rehder, Navarro, Griffiths, Feldman, Tenenbaum, Rosseel, Goodman, Kemp, Mansinghka, …]
Reasoning [Chater, Oaksford, Sloman, McKenzie, Heit, Tenenbaum, Kemp, …]
Causal inference [Waldmann, Sloman, Steyvers, Griffiths, Tenenbaum, Yuille, …]
Decision making and theory of mind [Lee, Stankiewicz, Rao, Baker, Goodman, Tenenbaum, …]

Learning concepts from examples: Word learning: 'horse', 'horse', 'horse'.

Everyday inductive leaps: How can people learn so much about the world . . .
Kinds of objects and their properties; the meanings of words, phrases, and sentences; cause-effect relations; the beliefs, goals and plans of other people; social structures, conventions, and rules . . . from such limited evidence?

Contributions of Bayesian models: Principled quantitative models of human behavior, with broad coverage and a minimum of free parameters and ad hoc assumptions. Explain how and why human learning and reasoning works, in terms of (approximations to) optimal statistical inference in natural environments. A framework for studying people’s implicit knowledge about the structure of the world: how it is structured, used, and acquired. A two-way bridge to state-of-the-art AI and machine learning.

Marr’s Three Levels of Analysis: Computation: 'What is the goal of the computation, why is it appropriate, and what is the logic of the strategy by which it can be carried out?' Algorithm: Cognitive psychology. Implementation: Neurobiology.

What about those errors?: The human mind is not a universal Bayesian engine. But the mind does appear adapted to solve important real-world inference problems in approximately Bayesian ways, e.g. predicting everyday events, causal learning and reasoning, learning concepts from examples. As with perceptual tasks, adults and even young children solve these problems mostly unconsciously, effortlessly, and successfully.

Technical themes: Inference in probabilistic models: role of priors, explaining away. Learning in graphical models: parameter learning, structure learning. Bayesian model averaging: being Bayesian over network structures. Bayesian Occam’s razor: trade off model complexity against data fit.

Technical themes: Structured probabilistic models: grammars, first-order logic, relational schemas. Hierarchical Bayesian models: acquire abstract knowledge, support transfer.
Nonparametric Bayes: flexible models that grow in complexity as new data warrant. Tractable approximate inference: Markov chain Monte Carlo (MCMC), Sequential Monte Carlo (particle filtering).

Outline: Predicting everyday events; causal learning and reasoning; learning concepts from examples.

Basics of Bayesian inference: Bayes’ rule: $P(h \mid d) = \frac{P(d \mid h)\, P(h)}{\sum_{h'} P(d \mid h')\, P(h')}$. An example. Data: John is coughing. Some hypotheses: (1) John has a cold; (2) John has lung cancer; (3) John has a stomach flu. Likelihood $P(d \mid h)$ favors 1 and 2 over 3. Prior probability $P(h)$ favors 1 and 3 over 2. Posterior probability $P(h \mid d)$ favors 1 over 2 and 3.

Bayesian inference in perception and sensorimotor integration: (Weiss, Simoncelli & Adelson 2002) (Kording & Wolpert 2004)

Memory retrieval as Bayesian inference (Anderson & Schooler, 1991): [Figure: three panels (power law of forgetting; additive effects of practice & delay; spacing effects in forgetting). Axis labels: log delay (hours), log delay (seconds), retention interval (days), log memory strength, mean # recalled.]

Memory retrieval as Bayesian inference (Anderson & Schooler, 1991): For each item in memory, estimate the probability that it will be useful in the present context. Use priors based on the statistics of natural information sources.

Memory retrieval as Bayesian inference (Anderson & Schooler, 1991): [Figure: log need odds against log # days since last occurrence, for the same three panels (power law of forgetting; additive effects of practice & delay; spacing effects in forgetting).] [New York Times data; cf.
email sources, child-directed speech]

Everyday prediction problems (Griffiths & Tenenbaum, 2006): You read about a movie that has made \$60 million to date. How much money will it make in total? You see that something has been baking in the oven for 34 minutes. How long until it’s ready? You meet someone who is 78 years old. How long will they live? Your friend quotes to you from line 17 of his favorite poem. How long is the poem? You see taxicab #107 pull up to the curb in front of the train station. How many cabs are in this city?

Making predictions: You encounter a phenomenon that has existed for $t_{past}$ units of time. How long will it continue into the future? (i.e. what’s $t_{total}$?) We could replace 'time' with any other quantity that ranges from 0 to some unknown upper limit.

Bayesian inference: $P(t_{total} \mid t_{past}) \propto P(t_{past} \mid t_{total})\, P(t_{total})$ (posterior probability $\propto$ likelihood $\times$ prior).

Bayesian inference: $P(t_{total} \mid t_{past}) \propto \frac{1}{t_{total}} \cdot \frac{1}{t_{total}}$. Likelihood $1/t_{total}$: assume a random sample ($0 < t_{past} < t_{total}$). Prior $1/t_{total}$: 'uninformative' prior (e.g., Jeffreys, Jaynes).

Bayesian inference: [Plot: posterior $P(t_{total} \mid t_{past})$ against $t_{total}$, with $t_{past}$ marked.] Best guess for $t_{total}$: $t$ such that $P(t_{total} > t \mid t_{past}) = 0.5$.

Bayesian inference: Yields Gott’s Rule: $P(t_{total} > t \mid t_{past}) = 0.5$ when $t = 2 t_{past}$, i.e., best guess for $t_{total} = 2 t_{past}$.
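The step from the posterior to Gott's Rule can be checked in a few lines. The sketch below assumes the random-sampling likelihood and the 'uninformative' prior from the slides, so the normalized posterior is $t_{past}/t_{total}^2$ for $t_{total} \ge t_{past}$ and the tail probability is $P(t_{total} > t \mid t_{past}) = t_{past}/t$. The function names are illustrative and do not come from the tutorial.

```python
def survival_probability(t: float, t_past: float) -> float:
    """P(t_total > t | t_past) when the posterior is proportional to 1/t_total**2.

    Integrating t_past / s**2 from s = t to infinity gives t_past / t.
    """
    if t_past <= 0:
        raise ValueError("t_past must be positive")
    if t <= t_past:
        return 1.0  # t_total can never be smaller than what was already observed
    return t_past / t


def gott_quantile(t_past: float, tail: float = 0.5) -> float:
    """Return t such that P(t_total > t | t_past) equals `tail`."""
    if t_past <= 0:
        raise ValueError("t_past must be positive")
    if not 0.0 < tail <= 1.0:
        raise ValueError("tail must lie in (0, 1]")
    return t_past / tail


if __name__ == "__main__":
    # A cake that has been baking for 34 minutes: the median guess doubles it.
    print(gott_quantile(34.0))
    print(survival_probability(68.0, 34.0))
```

Setting `tail=0.5` recovers the median guess $2 t_{past}$; other values of `tail` give the corresponding posterior quantiles under the same assumptions.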
Evaluating Gott’s Rule: You read about a movie that has made \$78 million to date. How much money will it make in total? '\$156 million' seems reasonable. You meet someone who is 35 years old. How long will they live? '70 years' seems reasonable. Not so simple: You meet someone who is 78 years old. How long will they live? You meet someone who is 6 years old. How long will they live?

The effects of priors: Different kinds of priors $P(t_{total})$ are appropriate in different domains: e.g., wealth, contacts; e.g., height, lifespan. [Gott: $P(t_{total}) \propto t_{total}^{-1}$]

Evaluating human predictions: Different domains with different priors: A movie has made \$60 million. Your friend quotes from line 17 of a poem. You meet a 78 year old man. A movie has been running for 55 minutes. A U.S. congressman has served for 11 years. A cake has been in the oven for 34 minutes. Use 5 values of $t_{past}$ for each. People predict $t_{total}$.

Slide37: You learn that in ancient Egypt, there was a great flood in the 11th year of a pharaoh’s reign. How long did he reign?

Slide38: How long did the typical pharaoh reign in ancient Egypt?

Slide39: If a friend is calling a telephone box office to book tickets and tells you he has been on hold for 3 minutes, how long do you think he will be on hold in total? Exponential or power law?

Summary: prediction: Predictions about the extent or magnitude of everyday events follow Bayesian principles. Contrast with Bayesian inference in perception, motor control, memory: no 'universal priors' here. Predictions depend rationally on priors that are appropriately calibrated for different domains.
Form of the prior (e.g., power-law or exponential). Specific distribution given that form (parameters). Non-parametric distribution when necessary. In the absence of concrete experience, priors may be generated by qualitative background knowledge.

Outline: Predicting everyday events; causal learning and reasoning; learning concepts from examples.

Bayesian networks: Four random variables: X1 coughing, X2 high body temperature, X3 flu, X4 lung cancer. Nodes: variables. Links: direct dependencies. Each node has a conditional probability distribution. Data: observations of X1, ..., X4.

Causal Bayesian networks: Four random variables: X1 coughing, X2 high body temperature, X3 flu, X4 lung cancer. Nodes: variables. Links: causal mechanisms. Each node has a conditional probability distribution. Data: observations of and interventions on X1, ..., X4. (Pearl; Glymour & Cooper)

Inference in causal graphical models: Explaining away or 'discounting' in social reasoning (Kelley; Morris & Larrick). 'Screening off' in intuitive causal reasoning (Waldmann, Rehder & Burnett, Blok & Sloman, Gopnik & Sobel): better in chains than common-cause structures; common-cause better if mechanisms clearly independent. Understanding and predicting the effects of interventions (Sloman & Lagnado; Gopnik & Schulz). [Diagrams: three-node graphs over A, B, C; comparison of P(c|b) vs. P(c|b, a) and P(c|b, not a).]

Learning graphical models: Structure learning: what causes what? Parameter learning: how do causes work?

Bayesian learning of causal structure: Data d. Causal hypotheses h. 1. What is the most likely network h given observed data d? 2. How likely is there to be a link X4 → X2?
[Diagrams: candidate networks over X1, X2, X3, X4.] (Bayesian model averaging)

Bayesian Occam’s Razor: For any model M, $\sum_{D} P(D \mid M) = 1$. Law of 'conservation of belief': A model that can predict many possible data sets must assign each of them low probability. (MacKay, 2003; Ghahramani tutorials)

Learning causation from contingencies: Subjects judge the extent to which C causes E (rate on a scale from 0 to 100). Contingency table: C present (c+): a cases with E present (e+), b cases with E absent (e-); C absent (c-): c cases with e+, d cases with e-. E.g., 'Does injecting this chemical cause mice to express a certain gene?'

Two models of causal judgment: Delta-P (Jenkins & Ward, 1965): $\Delta P = P(e^+ \mid c^+) - P(e^+ \mid c^-)$. Power PC (Cheng, 1997): $\text{Power} = \frac{\Delta P}{1 - P(e^+ \mid c^-)}$.

Judging the probability that C → E (Buehner & Cheng, 1997; 2003): Independent effects of both ΔP and causal power. At ΔP = 0, judgments decrease with base rate ('frequency illusion').

Learning causal strength (parameter learning): Assume this causal structure: [diagram with background cause B]. ΔP and causal power are maximum likelihood estimates of the strength parameter w1, under different parameterizations for P(E|B,C): linear → ΔP, noisy-OR → causal power.

Learning causal structure (Griffiths & Tenenbaum, 2005): Hypotheses: h0 and h1 [diagrams]. Bayesian causal support: likelihood ratio (Bayes factor) gives evidence in favor of h1. Noisy-OR parameterization (assume uniform parameter priors, but see Yuille et al., Danks et al.).

Buehner and Cheng (1997): [Figure: people’s judgments compared with model predictions.] ΔP (r = 0.89), Power (r = 0.88), Support (r = 0.97).

Implicit background theory: Injections may or may not cause gene expression, but gene expression does not cause injections: no hypotheses with E → C. Other naturally occurring processes may also cause gene expression.
All hypotheses include an always-present background cause B. Causes are generative, probabilistically sufficient and independent, i.e. each cause independently produces the effect in some proportion of cases: noisy-OR parameterization.

Sensitivity analysis: [Figure: people’s judgments compared with Support (noisy-OR), χ², and Support (generic parameterization).]

Generativity is essential: Predictions result from a 'ceiling effect'; ceiling effects only matter if you believe a cause increases the probability of an effect. [Figure: support (0 to 100) for conditions with P(e+|c+) = P(e+|c-) = 8/8, 6/8, 4/8, 2/8, 0/8.]

Different parameterizations for different kinds of mechanisms: 'Does C cause E?' 'Is there a difference in E with C vs. not-C?' 'Does C prevent E?'

Blicket detector (Sobel, Gopnik, and colleagues)

'Backwards blocking' (Sobel, Tenenbaum & Gopnik, 2004): Initially: nothing on detector, detector silent (A=0, B=0, E=0). Trial 1 (AB trial): A and B on detector, detector active (A=1, B=1, E=1). Trial 2 (A trial): A on detector, detector active (A=1, B=0, E=1). 4-year-olds judge if each object is a blicket. A: a blicket (100% say yes). B: probably not a blicket (34% say yes). (cf. 'explaining away in weight space', Dayan & Kakade)

Possible hypotheses?: [Diagrams: the candidate graph structures over A, B, and E.]

Bayesian causal learning: With a uniform prior on hypotheses, generic parameterization: A B. Probability of being a blicket: 0.32 0.32 0.34 0.34

A stronger hypothesis space: Links can only exist from blocks to detectors. Blocks are blickets with prior probability q.
Blickets always activate detectors; detectors never activate on their own (i.e., deterministic OR parameterization, no hidden causes). Across the four hypotheses (one column each):
P(E=1 | A=0, B=0): 0 0 0 0
P(E=1 | A=1, B=0): 0 0 1 1
P(E=1 | A=0, B=1): 0 1 0 1
P(E=1 | A=1, B=1): 0 1 1 1
Priors: $P(h_{00}) = (1 - q)^2$, $P(h_{10}) = q(1 - q)$, $P(h_{01}) = (1 - q)q$, $P(h_{11}) = q^2$.

Manipulating prior probability (Tenenbaum, Sobel, Griffiths, & Gopnik): [Figure: judgments at the initial, AB-trial, and A-trial stages.]

Learning more complex structures: Tenenbaum et al., Griffiths & Sobel: detectors with more than two objects and noisy mechanisms. Steyvers et al., Sobel & Kushnir: active learning with interventions (cf. Tong & Koller, Murphy). Lagnado & Sloman: learning from interventions on continuous dynamical systems.

Inferring hidden causes: The 'stick ball' machine (Kushnir, Schulz, Gopnik, & Danks, 2003). Common unobserved cause: 4x, 2x, 2x. Independent unobserved causes: 1x, 2x, 2x, 2x, 2x. One observed cause: 2x, 4x.

Slide66: Bayesian learning with unknown number of hidden variables (Griffiths et al 2006)

Inferring latent causes in classical conditioning (Courville, Daw, Gordon, Touretzky 2003): Training: A US A X B US. Test: X X B. E.g., A = noise, X = tone, B = click, US = shock.

Inferring latent causes in perceptual learning (Orban, Fiser, Aslin, Lengyel 2006): Learning to recognize objects and segment scenes.

Inferring latent causes in sensory integration (Kording et al.
2006, NIPS 06)

Coincidences (Griffiths & Tenenbaum, in press): The birthday problem: How many people do you need to have in the room before the probability exceeds 50% that two of them have the same birthday? 23. The bombing of London.

Slide72: How much of a coincidence?

Bayesian coincidence factor: Chance vs. latent common cause. [Diagram: latent cause C generating observations x; example month: August.] Alternative hypotheses: proximity in date, matching days of the month, matching month, ....

Slide74: How much of a coincidence?

Bayesian coincidence factor: Chance (uniform) vs. latent common cause (uniform + regularity). [Diagram: latent cause C generating observations x.]

Summary: causal inference & learning: Human causal induction can be explained using core principles of graphical models. Bayesian inference (explaining away, screening off). Bayesian structure learning (Occam’s razor, model averaging). Active learning with interventions. Identifying latent causes.

Summary: causal inference & learning: Crucial constraints on hypothesis spaces come from abstract prior knowledge, or 'intuitive theories'. What are the variables? How can they be connected? How are their effects parameterized? Big open questions: How can these theories be described formally? How can these theories be learned?

Hierarchical Bayesian framework: Abstract principles → structure → data. (Griffiths, Tenenbaum, Kemp et al.)

A theory for blickets (cf. PRMs, BLOG, FOPL)

Learning with a uniform prior on network structures: True network over attributes (1-12); sample 75 observations (patients) as the observed data.

Learning a block-structured prior on network structures: (Mansinghka et al.
2006): True network over attributes (1-12); sample 75 observations (patients) as the observed data. [Figure: parameters labeled z and h over attributes 1-12; values shown: 0.8, 0.0, 0.01, 0.0, 0.0, 0.75, 0.0, 0.0, 0.0.]

The 'blessing of abstraction': [Figure: true structure of graphical model G compared with the learned edges (G) and classes (z) at 20, 80, and 1000 samples, for a model with Data D and Graph G and a model that adds an abstract theory Z.]

The 'nonparametric safety-net': [Figure: true structure of graphical model G over attributes 1-12 compared with the learned edges (G) and classes (z) at 40, 100, and 1000 samples, for a model with Data D and Graph G and a model that adds an abstract theory Z.]

Outline: Predicting everyday events; causal learning and reasoning; learning concepts from examples.

Slide85: Learning from just one or a few examples, and mostly unlabeled examples ('semi-supervised learning').

Simple model of concept learning: 'Can you show me the other blickets?'

Simple model of concept learning: Other blickets.

Simple model of concept learning: Learning from just one positive example is possible if we assume concepts refer to clusters in the world and observe enough unlabeled data to identify clear clusters. (cf. learning with mixture models and EM, Ghahramani & Jordan, 1994; Nigam et al. 2000)

Concept learning with mixture models in cognitive science: Fried & Holyoak (1984): modeled unsupervised and semi-supervised categorization as EM in a Gaussian mixture. Anderson (1990): modeled unsupervised and semi-supervised categorization as greedy sequential search in an infinite (Chinese restaurant process) mixture.

Infinite (CRP) mixture models: Construct from k-component mixtures by integrating out mixing weights, collapsing equivalent partitions, and taking the limit as $k \to \infty$.
Does not require that we commit to a fixed, or even finite, number of classes. Effective number of classes can grow with number of data points, balancing complexity with data fit. Computationally much simpler than applying Bayesian Occam’s razor or cross-validation. Easy to learn with standard Monte Carlo approximations (MCMC, particle filtering), hopefully avoiding local minima.

High school lunch room analogy

Slide92: Sampling from the CRP: 'nerds', 'jocks', 'punks', 'preppies'.

Slide94: Gibbs sampler (Neal): 'nerds', 'jocks', 'punks', 'preppies'. Assign to larger groups. Group with similar objects.

A typical cognitive experiment:
Training stimuli:
F1 F2 F3 F4 Label
1 1 1 1 1
1 0 1 0 1
0 1 0 1 1
0 0 0 0 0
0 1 0 0 0
1 0 1 1 0
Test stimuli:
F1 F2 F3 F4 Label
0 1 1 1 ?
1 1 0 1 ?
1 1 1 0 ?
1 0 0 0 ?
0 0 1 0 ?
0 0 0 1 ?

Slide96: Anderson (1990), 'Rational model of categorization': greedy sequential search in an infinite mixture model. Sanborn, Griffiths, Navarro (2006), 'More rational model of categorization': particle filter with a small # of particles.

Towards more natural concepts

CrossCat: Discovering multiple structures that capture different subsets of features (Shafto, Kemp, Mansinghka, Gordon & Tenenbaum, 2006)

Infinite relational models (Kemp, Tenenbaum, Griffiths, Yamada & Ueda, AAAI 06): [Figure: concept × concept × predicate data.] Biomedical predicate data from UMLS (McCray et al.): 134 concepts: enzyme, hormone, organ, disease, cell function ... 49 predicates: affects(hormone, organ), complicates(enzyme, cell function), treats(drug, disease), diagnoses(procedure, disease) … (cf. Xu, Tresp, et al.
SRL 06)

Infinite relational models (Kemp, Tenenbaum, Griffiths, Yamada & Ueda, AAAI 06): e.g., Diseases affect Organisms; Chemicals interact with Chemicals; Chemicals cause Diseases.

Learning from very few examples: Property induction. Premises: Cows have T9 hormones. Sheep have T9 hormones. Goats have T9 hormones. Conclusion: All mammals have T9 hormones. Premises: Cows have T9 hormones. Seals have T9 hormones. Squirrels have T9 hormones. Conclusion: All mammals have T9 hormones. Word learning: 'tufa', 'tufa', 'tufa'.

The computational problem (cf. semi-supervised learning): [Figure: species (Horse, Cow, Chimp, Gorilla, Mouse, Squirrel, Dolphin, Seal, Rhino, Elephant) by features, with a new property whose values are mostly unknown (?).] (85 features from Osherson et al., e.g., for Elephant: ‘gray’, ‘hairless’, ‘toughskin’, ‘big’, ‘bulbous’, ‘longleg’, ‘tail’, ‘chewteeth’, ‘tusks’, ‘smelly’, ‘walks’, ‘slow’, ‘strong’, ‘muscle’, ‘quadrapedal’,…)

Slide103: [Figure: the same species with observed labels X and unknown labels Y; hypotheses h with prior P(h).]

Slide104: [Figure: the same species with observed labels X and unknown labels Y; hypotheses h with prior P(h); prediction P(Y | X).]

Many sources of priors

Hierarchical Bayesian Framework (Kemp & Tenenbaum): F: form (tree). S: structure (over mouse, squirrel, chimp, gorilla). D: data (features F1, F2, F3, F4, and a new property 'Has T9 hormones' with unknown values).

P(D|S): How the structure constrains the data of experience: Define a stochastic process over structure S that generates hypotheses h. For generic properties, prior should favor hypotheses that vary smoothly over structure. Many properties of biological species were actually generated by such a process (i.e., mutation + selection). Smooth: P(h) high.
Not smooth: P(h) low.

P(D|S): How the structure constrains the data of experience: [Diagram: structure S → Gaussian process y (~ random walk, diffusion) → threshold → hypothesis h.] [Zhu, Ghahramani & Lafferty 2003]

A graph-based prior: Let $d_{ij}$ be the length of the edge between i and j ($= \infty$ if i and j are not connected). A Gaussian prior ~ $N(0, \Sigma)$, with covariance $\Sigma$ derived from the edge lengths (Zhu, Lafferty & Ghahramani, 2003).

Slide110: [Figure: structure S over Species 1 to Species 10, and data D as a species-by-features matrix.] (85 features from Osherson et al., e.g., for Elephant: ‘gray’, ‘hairless’, ‘toughskin’, ‘big’, ‘bulbous’, ‘longleg’, ‘tail’, ‘chewteeth’, ‘tusks’, ‘smelly’, ‘walks’, ‘slow’, ‘strong’, ‘muscle’, ‘quadrapedal’,…)

Slide113: [Figure: structure S over Species 1 to Species 10, data D as a species-by-features matrix (the same 85 features from Osherson et al.), and a new property whose values are mostly unknown (?).]

Slide114: Premises: Gorillas have property P. Mice have property P. Seals have property P. Conclusion: All mammals have property P. Premises: Cows have property P. Elephants have property P. Horses have property P. [Figure: predictions of the Tree and 2D models.]

Reasoning about spatially varying properties: 'Native American artifacts' task.

Slide116: Property type and theory structure:
'has T9 hormones': taxonomic tree + diffusion process
'can bite through wire': directed chain + drift process
'carry E. Spirus bacteria': directed network + noisy transmission
[Diagrams: example structures over Classes A to G for each theory.]
Hypotheses.

Slide117: [Diagrams: two structures over Kelp, Human, Dolphin, Sand shark, Mako shark, Tuna, Herring.]

Hierarchical Bayesian Framework: F: form (tree, space, chain). S: structure (over mouse, squirrel, chimp, gorilla). D: data (features F1, F2, F3, F4).

Discovering structural forms: [Diagrams: alternative structures over Ostrich, Robin, Crocodile, Snake, Bat, Orangutan, Turtle.]

Discovering structural forms: Linnaeus: tree over Ostrich, Robin, Crocodile, Snake, Bat, Orangutan, Turtle. 'Great chain of being': chain that also includes Angel, God, Rock, Plant.

People can discover structural forms: Scientists: tree structure for living kinds (Linnaeus); periodic structure for chemical elements (Mendeleev). Children: hierarchical structure of category labels; clique structure of social groups; cyclical structure of seasons or days of the week; transitive structure for value.

The value of structural form knowledge: inductive bias

Typical structure learning algorithms assume a fixed structural form:
Flat clusters: K-means, mixture models, competitive learning
Line: Guttman scaling, ideal point models
Tree: hierarchical clustering, Bayesian phylogenetics
Circle: circumplex models
Euclidean space: MDS, PCA, factor analysis
Grid: self-organizing map, generative topographic mapping

Goal: a universal framework for unsupervised learning: A 'universal learner' mapping data to a representation, subsuming K-means, hierarchical clustering, factor analysis, Guttman scaling, circumplex models, self-organizing maps, ···

Hierarchical Bayesian Framework: F: form. S: structure (over mouse, squirrel, chimp, gorilla). D: data (features F1, F2, F3, F4).

Structural forms as graph grammars: [Table: each form paired with its generating process.]

Node-replacement graph grammars: [Diagram: production (line) and a derivation.]

Model fitting: Evaluate each form in parallel. For each form, heuristic search over structures based on greedy growth from a one-node seed.

Development of structural forms as more data are observed

Beyond 'Nativism' versus 'Empiricism': 'Nativism': Explicit knowledge of structural forms for core domains is innate. Atran (1998): The tendency to group living kinds into hierarchies reflects an 'innately determined cognitive structure'. Chomsky (1980): 'The belief that various systems of mind are organized along quite different principles leads to the natural conclusion that these systems are intrinsically determined, not simply the result of common mechanisms of learning or growth.' 'Empiricism': General-purpose learning systems without explicit knowledge of structural form. Connectionist networks (e.g., Rogers and McClelland, 2004). Traditional structure learning in probabilistic graphical models.

Summary: concept learning: Models based on Bayesian inference over hierarchies of structured representations (F: form; S: structure; D: data). How does abstract domain knowledge guide learning of new concepts? How can this knowledge be represented, and how might it be learned? How can probabilistic inference work together with flexibly structured representations to model complex, real-world learning and reasoning?
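The Chinese restaurant process (CRP) prior behind the infinite mixture models in this section, and the 'assign to larger groups' tendency in the lunch room analogy, can be illustrated with a short sampler. This is a minimal sketch of the standard sequential CRP construction, not code from the tutorial; the concentration parameter name `alpha`, the function name, and the seed handling are assumptions made for illustration.

```python
import random


def sample_crp(n: int, alpha: float, seed: int = 0):
    """Draw one partition of n items from a Chinese restaurant process.

    Item i (0-based) joins existing group k with probability
    counts[k] / (i + alpha) and starts a new group with probability
    alpha / (i + alpha), so larger groups tend to attract more members.
    """
    if n < 0 or alpha <= 0:
        raise ValueError("n must be non-negative and alpha must be positive")
    rng = random.Random(seed)
    assignments = []  # group index chosen for each item
    counts = []       # current size of each group
    for i in range(n):
        r = rng.random() * (i + alpha)
        group = len(counts)  # default: open a new group
        acc = 0.0
        for k, size in enumerate(counts):
            acc += size
            if r < acc:
                group = k
                break
        if group == len(counts):
            counts.append(0)
        counts[group] += 1
        assignments.append(group)
    return assignments, counts


if __name__ == "__main__":
    assignments, counts = sample_crp(n=30, alpha=1.0, seed=1)
    print(assignments)
    print(counts)
```

Because a new group can always be opened, the number of groups is not fixed in advance and tends to grow slowly with the number of items, which is the property the slides describe as balancing complexity against data fit.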
Contributions of Bayesian models:  Principled quantitative models of human behavior, with broad coverage and a minimum of free parameters and ad hoc assumptions. Explain how and why human learning and reasoning work, in terms of (approximations to) optimal statistical inference in natural environments. A framework for studying people’s implicit knowledge about the structure of the world: how it is structured, used, and acquired. A two-way bridge to state-of-the-art AI and machine learning.
Looking forward:  What we need to understand: the mind’s ability to build rich models of the world from sparse data. Learning about objects, categories, and their properties; causal inference; language comprehension and production; scene understanding; understanding other people’s actions, plans, thoughts, goals. What do we need to understand these abilities? Bayesian inference in probabilistic generative models; hierarchical models, with inference at all levels of abstraction; structured representations: graphs, grammars, logic; flexible representations, growing in response to observed data.
Learning word meanings:  (Tenenbaum & Xu) Abstract Principles: whole-object principle, shape bias, taxonomic principle, contrast principle, basic-level bias. Structure. Data.
Causal learning and reasoning:  Abstract Principles. Structure. Data. (Griffiths, Tenenbaum, Kemp et al.)
Slide139:  'Universal Grammar': hierarchical phrase structure grammars (e.g., CFG, HPSG, TAG). Grammar. Phrase structure. Utterance. Speech signal.
Vision as probabilistic parsing:  (Han & Zhu, 2006; cf. Zhu, Yuanhao & Yuille NIPS 06)
Goal-directed action (production and comprehension):  (Wolpert et al., 2003)
Bayesian models of action understanding:  (Baker, Tenenbaum & Saxe; Verma & Rao)
Open directions and challenges:  Effective methods for learning structured knowledge: How to balance expressiveness/learnability tradeoff? More precise relation to psychological processes: To what extent do mental processes implement boundedly rational methods of approximate inference? Relation to neural computation: How to implement structured representations in brains? Modeling individual subjects and single trials: Is there a rational basis for probability matching? Understanding failure cases: Are these simply 'not Bayesian', or are people using a different model? How do we avoid circularity?
Want to learn more?:  Special issue of Trends in Cognitive Sciences (TiCS), July 2006 (Vol. 10, no. 7), on 'Probabilistic models of cognition'. Tom Griffiths’ reading list, a/k/a http://bayesiancognition.com Summer school on probabilistic models of cognition, July 2007, Institute for Pure and Applied Mathematics (IPAM) at UCLA.
Extra slides
Bayesian prediction:  What is the best guess for $t_{\text{total}}$? Compute $t$ such that $P(t_{\text{total}} > t \mid t_{\text{past}}) = 0.5$, where $P(t_{\text{total}} \mid t_{\text{past}}) \propto \frac{1}{t_{\text{total}}} \, P(t_{\text{total}})$: the posterior probability is proportional to the random-sampling likelihood $1/t_{\text{total}}$ times the domain-dependent prior $P(t_{\text{total}})$. Plot axes: $P(t_{\text{total}} \mid t_{\text{past}})$ against $t_{\text{total}}$. We compared the median of the Bayesian posterior with the median of subjects’ judgments… but what about the distribution of subjects’ judgments?
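The prediction rule on the Bayesian prediction slide (find the t that splits the posterior over the total duration in half) can be sketched numerically. The following is a minimal sketch, not the tutorial's own code: the power-law prior with exponent `gamma`, the upper bound `t_max`, the grid size, and the function name `posterior_median` are all illustrative assumptions.

```python
def posterior_median(t_past, gamma=1.0, t_max=1000.0, n_grid=20000):
    """Approximate the t with P(t_total > t | t_past) = 0.5 on a grid.

    Assumptions (illustrative, not from the slides):
      - prior P(t_total) proportional to t_total ** -gamma
      - t_total restricted to the interval [t_past, t_max]
    """
    step = (t_max - t_past) / (n_grid - 1)
    # t_total can never be smaller than the elapsed time t_past.
    grid = [t_past + i * step for i in range(n_grid)]
    # Unnormalized posterior = likelihood (1 / t_total, random sampling)
    # times the assumed power-law prior.
    weights = [(1.0 / t) * (t ** -gamma) for t in grid]
    total = sum(weights)
    running = 0.0
    for t, w in zip(grid, weights):
        running += w
        if running >= 0.5 * total:
            return t  # first grid point where the CDF reaches 0.5
    return grid[-1]


# Example: something has lasted 10 units so far.
prediction = posterior_median(10.0, gamma=1.0)
```

With `gamma = 1` and `t_max` much larger than `t_past`, the returned value is close to twice `t_past`. A different prior family would give a different prediction function, which is the sense in which the prior is domain-dependent.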
Sources of individual differences:  Individuals’ judgments could be noisy. Individuals’ judgments could be optimal, but with different priors (e.g., each individual has seen only a sparse sample of the relevant population of events). Individuals’ inferences about the posterior could be optimal, but their judgments could be based on probability (or utility) matching rather than maximizing.
Individual differences in prediction:  Plot axes: $P(t_{\text{total}} \mid t_{\text{past}})$ against $t_{\text{total}}$. Plot labels: quantile of Bayesian posterior distribution; proportion of judgments below predicted value.
Individual differences in prediction:  Average over all prediction tasks: movie run times, movie grosses, poem lengths, life spans, terms in Congress, cake baking times.
Individual differences in concept learning
Why probability matching?:  Optimal behavior under some (evolutionarily natural) circumstances: optimal betting theory, portfolio theory; optimal foraging theory; competitive games; dynamic tasks (changing probabilities or utilities). Side-effect of algorithms for approximating complex Bayesian computations: Markov chain Monte Carlo (MCMC): instead of integrating over complex hypothesis spaces, construct a sample of high-probability hypotheses. Judgments from individual (independent) samples can on average be almost as good as using the full posterior distribution.
Markov chain Monte Carlo:  (Metropolis-Hastings algorithm)
The puzzle of coincidences:  Discoveries of hidden causal structure are often driven by noticing coincidences... Science: Halley’s comet (1705).
Slide156:  (Halley, 1705)
Slide157:  (Halley, 1705)
The puzzle of coincidences:  Discoveries of hidden causal structure are often driven by noticing coincidences...
Science: Halley’s comet (1705); John Snow and the cause of cholera (1854).
Rational analysis of cognition:  One can often show that apparently irrational behavior is actually rational. Which cards do you have to turn over to test this rule? 'If there is an A on one side, then there is a 2 on the other side'
Rational analysis of cognition:  Oaksford & Chater’s rational analysis: optimal data selection based on maximizing expected information gain. Test the rule 'If p, then q' against the null hypothesis that p and q are independent. Assuming p and q are rare predicts people’s choices.
Integrating multiple forms of reasoning (Kemp, Shafto, Berke & Tenenbaum NIPS 06):  1) Taxonomic relations between categories. 2) Causal relations between features. … Parameters of causal relations vary smoothly over the category hierarchy. T9 hormones cause elevated heart rates. Elevated heart rates cause faster metabolisms. Mice have T9 hormones. …?
Integrating multiple forms of reasoning
Infinite relational models (Kemp, Tenenbaum, Griffiths, Yamada & Ueda, AAAI 06):  Figure labels: concept, concept, predicate. Biomedical predicate data from UMLS (McCrae et al.): 134 concepts: enzyme, hormone, organ, disease, cell function ... 49 predicates: affects(hormone, organ), complicates(enzyme, cell function), treats(drug, disease), diagnoses(procedure, disease) … (cf. Xu, Tresp, et al.
SRL 06)
Learning relational theories:  e.g., Diseases affect Organisms; Chemicals interact with Chemicals; Chemicals cause Diseases.
Learning annotated hierarchies from relational data (Roy, Kemp, Mansinghka, Tenenbaum NIPS 06)
Slide167:  Learning abstract relational structures. Primate troop: 'x beats y', dominance hierarchy. Bush administration: 'x told y', tree. Prison inmates: 'x likes y', cliques. Kula islands: 'x trades with y', ring.
Bayesian inference in neural networks:  (Rao, in press)
The big problem of intelligence:  The development of intuitive theories in childhood. Psychology: How do we learn to understand others’ actions in terms of beliefs, desires, plans, intentions, values, morals? Biology: How do we learn that people, dogs, bees, worms, trees, flowers, grass, coral, moss are alive, but chairs, cars, tricycles, computers, the sun, Roomba, robots, clocks, rocks are not?
The big problem of intelligence:  Consider a man named Boris. Is the mother of Boris’s father his grandmother? Is the mother of Boris’s sister his mother? Is the son of Boris’s sister his son? (Note: Boris and his family were stranded on a desert island when he was a young boy.) Common sense reasoning.
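Returning to the extra slides on probability matching and Markov chain Monte Carlo: the idea that a judgment based on a single posterior sample looks like probability matching can be illustrated with a toy Metropolis-Hastings sampler. This is a minimal sketch under stated assumptions, not the tutorial's implementation: the three-hypothesis posterior, the uniform proposal, and the function name `metropolis_hastings` are hypothetical.

```python
import random


def metropolis_hastings(unnorm_post, n_steps, seed=0):
    """Sample hypothesis indices with frequency proportional to unnorm_post.

    unnorm_post: list of strictly positive, unnormalized posterior scores.
    Uses a uniform proposal over all hypotheses, which is symmetric, so the
    acceptance ratio reduces to a ratio of posterior scores.
    """
    rng = random.Random(seed)
    k = len(unnorm_post)
    current = rng.randrange(k)
    samples = []
    for _ in range(n_steps):
        proposal = rng.randrange(k)
        accept_prob = min(1.0, unnorm_post[proposal] / unnorm_post[current])
        if rng.random() < accept_prob:
            current = proposal
        samples.append(current)
    return samples


# Hypothetical posterior over three hypotheses (illustrative numbers only).
posterior = [0.6, 0.3, 0.1]
samples = metropolis_hastings(posterior, n_steps=50_000)
freqs = [samples.count(h) / len(samples) for h in range(len(posterior))]
```

An agent that answers with one sample from such a chain picks each hypothesis about as often as its posterior probability, which is the probability-matching pattern, whereas a maximizing agent would always answer with the first hypothesis.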
