tufts web

Information about tufts web

Published on November 19, 2007

Author: Columbia

Source: authorstream.com

Content

Learning from how dogs learn:
Prof. Bruce Blumberg, The Media Lab, MIT
[email protected] www.media.mit.edu/~bruce

About me…

Practical & compelling real-time learning:
Easy for interactive characters to learn what they ought to be able to learn
Easy for a human trainer to guide learning process
A compelling user experience
Provide heuristics and practical design principles

My bias & focus:
Learning occurs within an innate structure that biases…
  Attention
  Motivation
  Innate frequency, form and organization of behavior
  When certain things are most easily learned
What are the catalytic components of the scaffolding that make learning possible?

sheep|dog:trial by eire:
See sheep|dog video on my website

Object persistence:
See object persistence video on my website

Temporal representation:
See temporal representation (aka Goatzilla) video on my website

Alpha Wolf:
See alpha wolf video on my website

[email protected]:
See [email protected] video on my website or go to Scientific American Frontiers website

Dobie T. Coyote Goes to School:
See Dobie video on my website

Why look at Dog Training?
Interactive characters pose unique challenges:
  State, action and state-action spaces are often continuous and far too big to search exhaustively
  To be compelling, characters must
    Learn “obvious” contingencies between state, actions and consequences quickly
    Be easy to train without visibility into internal state of character
  Learning is only one thing they have to do
Dogs and their trainers seem to solve these problems easily

Invaluable resources:
Doing it, and talking to people who do it.
Wilkes, Pryor, Ramirez
Lindsay, Burch & Bailey, Mackintosh
Lorenz, Leyhausen, Coppinger & Coppinger

The problem facing dogs (real and synthetic):
Set of all possible actions
Set of all motivational goals
Set of all possible stimuli
What do I do, when, in order to best satisfy my motivational goals?

The space of possible stimuli is wicked big:
[Figure labels: Time of Occurrence; State Space]

The space of possible actions is also very big:
[Figure labels: Set of all possible actions; Action; Time of Performance; Action Space]

Who gets credit for good things happening?
[Figure labels: Yumm..; Action; Figure-8; Shake; High-5; Beg; Down; Left ear twitch; Modality of Stimuli; Time]

Conventional idea: back propagation from goal:
[Figure labels: stalk; grab-bite; eye; orient; kill-bite; chase; Yumm..; Time]
Credit flows backward

The problem:
If each element in sequence has 3 variants, there are 729 possible combinations of which 1 may work (ignoring stimuli)
If there are 12 possible stimuli, there are 1,586,874,322,944 possible combinations of stimuli-action pairs to explore.
Don’t know if it is the right sequence until goal is reached
What happens if “variant” needs to be learned?
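The counts on "The problem" slide can be checked with a few lines of arithmetic. The sketch below assumes a six-element sequence (the six elements named in the preceding diagrams) and reads the larger figure as 3^6 action sequences multiplied by (12 × 3)^6 sequences of stimulus-action pairs. The slides do not spell out this derivation, so the factorisation is only one reading that happens to reproduce the number; all variable names are illustrative.

```python
# Assumption (not stated on the slide): the sequence has 6 elements,
# matching the six labels in the preceding diagrams.
SEQUENCE_LENGTH = 6
VARIANTS_PER_ELEMENT = 3
STIMULI = 12

# Action sequences only, ignoring stimuli: 3^6.
action_sequences = VARIANTS_PER_ELEMENT ** SEQUENCE_LENGTH

# One stimulus-action pair per element: 12 * 3 choices at each step.
pairs_per_step = STIMULI * VARIANTS_PER_ELEMENT
pair_sequences = pairs_per_step ** SEQUENCE_LENGTH

# One reading (an assumption) that reproduces the slide's larger figure.
slide_figure = action_sequences * pair_sequences

print(f"{action_sequences:,}")   # action sequences, ignoring stimuli
print(f"{pair_sequences:,}")     # sequences of stimulus-action pairs
print(f"{slide_figure:,}")       # product of the two counts
```

Whichever reading is intended, the point of the slide survives: the count grows exponentially with sequence length, so exhaustive search is impractical even for a short sequence.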
Leyhausen’s suggestion…:
[Figure labels: stalk; grab-bite; eye; orient; kill-bite; chase; Time; “motivation & reward” under each element]
Each element is innately self-motivating and has innate reward metric

Coppinger’s suggestion…:
[Figure labels: stalk; grab-bite; eye; orient; kill-bite; chase; Time]
Varying innate tendency to follow behavior with “next” in sequence

Functional goal plays incidental role:
[Figure labels: stalk; grab-bite; eye; orient; kill-bite; chase; Yumm..; Time]
Propagated value from functional goal plays incidental role

Big idea: innate biases make learning possible:
Biases include…
  Temporal proximity implies causality
  Attend more readily to certain classes of stimuli than to others (motion vs. speech)
  Lazy discovery (pay attention once you have a reason to pay attention)
  Elements may be “innately” self-motivating and have local metric of “goodness”

Good trainers actively guide dog’s exploration:
Behavioral
  Train behavior, then cue
  Differential rewards encourage variability
Motor
  Shaping
    Rewarding successive approximations
  Luring
    Pose, e.g. “down”
    Trajectory, e.g. “figure-8”

Dogs constrain search for causal agents:
[Figure labels: Time]
Consequences Window: Trainer “clicks” signaling reward is coming.
When reward is actually received
Attention Window: Cue given immediately before or as dog is moving into desired pose
[Figure labels: Sit; Approach; Eat]
Dogs make the problem tractable by constraining search for causal agents to narrow temporal windows

Dogs use implicit feedback to guide perceptual learning:
[Figure labels: Time; “sit-utterance” perceived; Dog decides to sit; Sit; “click” perceived; Approach; Eat]
Build & update perceptual model of “sit-utterance”
Dogs use rewarded action to identify potentially promising state to explore and to guide formation of perceptual models

Dogs give credit where credit is due…:
Trainer repeatedly lures dog through a trajectory or into a pose
Eventually, dog performs behavior spontaneously
Implication: dog associates reward with resulting body configuration or trajectory and not just with “follow-your-nose”

Observation: dogs give credit where credit is due:
[Figure labels: Time; “sit-utterance” perceived; Dog decides to sit; Sit; “click” perceived; Approach; Eat]
Credit sitting in presence of “sit-utterance”
Build & update perceptual model of “sit-utterance”

D.L.: Take Advantage of Predictable Regularities:
Constrain search for causal agents by taking advantage of temporal proximity & natural hierarchy of state spaces
Use consequences to bias choice of action
  But vary performance and attend to differences
Explore state and action spaces on “as-needed” basis
  Build models on demand

D.L.: Make Use of All Feedback: Explicit & Implicit:
Use rewarded action as context for identifying
  Promising state space and action space to explore
  Good examples from which to construct perceptual models, e.g., a good example of a “sit-utterance” is one that occurs within the context of a rewarded Sit.
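The temporal-window idea in the slides above (credit only the action that falls shortly before the click, and associate only the cue that falls shortly before that action) can be sketched as follows. This is a minimal illustration, not the system described in the talk: the `Event` type, the `assign_credit` function, and both window sizes are assumptions made up for the example, and the sketch only handles cues given before the action, not cues given while the dog is already moving into the pose.

```python
from typing import NamedTuple, Optional


class Event(NamedTuple):
    time: float  # seconds since start of session
    kind: str    # "cue", "action", or "click"
    name: str


# Window sizes are illustrative guesses, not values from the talk.
ATTENTION_WINDOW = 1.0     # a cue must precede the action by at most this much
CONSEQUENCES_WINDOW = 2.0  # an action must precede the click by at most this much


def assign_credit(events: list[Event]) -> list[tuple[Optional[str], str]]:
    """Return one (cue name or None, action name) pair per creditable click."""
    ordered = sorted(events, key=lambda e: e.time)
    credited = []
    for click in (e for e in ordered if e.kind == "click"):
        # Candidate actions: only those inside the consequences window.
        actions = [e for e in ordered
                   if e.kind == "action"
                   and 0 <= click.time - e.time <= CONSEQUENCES_WINDOW]
        if not actions:
            continue
        action = actions[-1]  # the most recent action gets the credit
        # Candidate cues: only those inside the attention window.
        cues = [e for e in ordered
                if e.kind == "cue"
                and 0 <= action.time - e.time <= ATTENTION_WINDOW]
        credited.append((cues[-1].name if cues else None, action.name))
    return credited


session = [
    Event(0.5, "action", "left-ear-twitch"),  # too early to be credited
    Event(4.2, "cue", "sit-utterance"),
    Event(4.8, "action", "sit"),
    Event(5.5, "click", "click"),
]
print(assign_credit(session))  # [('sit-utterance', 'sit')]
```

Restricting the candidates to two narrow windows is what keeps the search small: the early ear twitch is never considered, however many other events the session contains.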
D.L.: Make Them Easy to Train:
Respond quickly to “obvious” contingencies
Support Luring and Shaping
  Techniques to prompt infrequently expressed or novel motor actions
“Trainer friendly” credit assignment
  Assign credit to candidate that matches trainer’s expectation

The System

Dobie T. Coyote…:
See Dobie video on my website

Limitations and Future Work:
Important extensions
  Other kinds of learning (e.g., social or spatial)
  Generalization
  Sequences
  Expectation-based emotion system
How will the system scale?

Useful Insights:
Use temporal proximity to limit search
Hierarchical representations of state, action and state-action space & use implicit feedback to guide exploration
“Trainer friendly” credit assignment
Luring and shaping are essential

Acknowledgements:
Members of the Synthetic Characters Group, past, present & future
Gary Wilkes
Funded by the Digital Life Consortium
