VRP15

Information about VRP15

Published on February 26, 2008

Author: Alohomora

Source: authorstream.com

Content

A Reinforcement Learning Approach for Product Delivery by Multiple Vehicles:  A Reinforcement Learning Approach for Product Delivery by Multiple Vehicles Scott Proper Oregon State University Prasad Tadepalli Hong Tang Rasaratnam Logendran Vehicle Routing & Product Delivery:  Vehicle Routing & Product Delivery Contributions of our Research:  Contributions of our Research Multiple vehicle product delivery is a well-studied problem in operations research We have formulated this problem as an average reward reinforcement learning (RL) problem We have combined inventory control with vehicle routing We have scaled RL methods to work with large state spaces Markov Decision Processes:  Markov Decision Processes Action a Actions are stochastic: Pi,j(a) Actions have costs or rewards: ri(a) Move Unload Unload Average Reward Reinforcement Learning:  Average Reward Reinforcement Learning Goal: Maximize average reward/time step Minimize stockout penalty + movement penalty Policy: states → actions Value function: states → real values expected long-term reward from a state, relative to other states, when following the optimal policy H-Learning:  H-Learning The value function satisfies the Bellman equation: The optimal action a* maximizes the immediate reward + expected value of the next state H-Learning is a real-time algorithm for solving the value function H-Learning: an example 1:  H-Learning: an example 1 -.1, 1/1 0, 9/9 0, 0/9 A E D C B H-Learning: an example 2:  H-Learning: an example 2 Stockout penalty: -20 A E D C B -.1, 1/1 0, 9/10 -20, 1/10 H-Learning: an example 3:  H-Learning: an example 3 A E D C B -.1, 1/1 0, 9/10 -20, 1/10 H-Learning: an example 4:  H-Learning: an example 4 Move penalty: -.1 A E D C B -.1, 2/2 0, 9/10 -20, 1/10 On-line Product Delivery:  On-line Product Delivery Deliver 1 product 9 truck actions: 4 levels of unload 4 move directions wait P(Inventory decrease | shop) Stockout penalty: -20 Movement penalty: -.1 5 Shops Depot The problem of state-space explosion:  The problem of state-space explosion The loads of trucks and shop inventories are discretized into 5 levels States grow exponentially in shops and trucks 10 locations, 5 shops, 2 trucks = (102)(55)(52) = 7,812,500 states 5 trucks = 976,592,500,000 states Table-based methods take too much time and space Piecewise Linear Function Approximation:  Piecewise Linear Function Approximation We use a different linear function for each possible 5-tuple of locations l1,…, l5 of trucks Each function is linear in truck loads and shop inventories Every function represents 10 million states million-fold reduction of learnable parameters Piecewise linear function approximation vs. table-based:  Piecewise linear function approximation vs. table-based Storing and using the action models:  Storing and using the action models Problem: exponential time to determine the expected value of the next state: - Each shop’s consumption is independent - Value function is piecewise linear ? ? ? ? Ignoring Truck Identity:  Ignoring Truck Identity m = number of locations (10) k = number of trucks (2-5) 5 trucks: 105 functions Learnable parameters: 1.1 million 2002 functions Learnable parameters: 22,022 mk The problem of action-space explosion:  The problem of action-space explosion Every action a is a vector of individual “truck actions” a = (a1, a2,…,an) Actions grow exponentially in the number of trucks 9 “truck actions” For 2 trucks: 92 = 81 total actions For 5 trucks: 95 = 59,049 total actions Hill Climbing Search:  Hill Climbing Search We initialize the vector of truck actions a to all “wait” actions We use hill climbing to reach a local optimum Randomly perturb a truck action, repeat This results in an order-of-magnitude improvement in search time Hill climbing vs. exhaustive search for 4 and 5 trucks:  Hill climbing vs. exhaustive search for 4 and 5 trucks Conclusion:  Conclusion Average-reward RL and Piecewise linear function approximation are promising approaches for real-time product delivery Hill climbing shows great potential for speeding up search in domains with a large action space Problems of scaling are surmountable Future Work:  Future Work Scaling! More trucks, more locations, more shops, more depots, and more items Allowing trucks to move with non-uniform speeds (event-based model needed) Real-valued shop inventory and truck load levels

Related presentations


Other presentations created by Alohomora

Learning The Months Of the Year
02. 11. 2007
0 views

Learning The Months Of the Year

HIV STDs Presentation
06. 08. 2007
0 views

HIV STDs Presentation

17 Supply Chain
15. 11. 2007
0 views

17 Supply Chain

duxbury asa
04. 10. 2007
0 views

duxbury asa

Review Week 19
04. 10. 2007
0 views

Review Week 19

ChaneyPres2
16. 10. 2007
0 views

ChaneyPres2

collab filtering tutorial
17. 10. 2007
0 views

collab filtering tutorial

Winderemere Shibboleth 2005 11
17. 10. 2007
0 views

Winderemere Shibboleth 2005 11

1940 1950
23. 10. 2007
0 views

1940 1950

Curso Chagas clase 1
23. 10. 2007
0 views

Curso Chagas clase 1

eplan en
26. 11. 2007
0 views

eplan en

Angiosperms and Gymnosperms
07. 12. 2007
0 views

Angiosperms and Gymnosperms

ns1
13. 10. 2007
0 views

ns1

dahl poster
26. 08. 2007
0 views

dahl poster

RuneGangeskar
07. 11. 2007
0 views

RuneGangeskar

VOCsLouisville 5 18a AHewitt
07. 11. 2007
0 views

VOCsLouisville 5 18a AHewitt

darkts mlecture
13. 11. 2007
0 views

darkts mlecture

Ipertesto
24. 10. 2007
0 views

Ipertesto

SheepGoatHealth
15. 11. 2007
0 views

SheepGoatHealth

IntroLFG2
22. 11. 2007
0 views

IntroLFG2

HERC Waste Summit
14. 11. 2007
0 views

HERC Waste Summit

ENGS11 2007 ERP
28. 12. 2007
0 views

ENGS11 2007 ERP

Kerr
28. 11. 2007
0 views

Kerr

Bourgault
03. 01. 2008
0 views

Bourgault

CHM 103 Lecture 17 S07
03. 01. 2008
0 views

CHM 103 Lecture 17 S07

Stock
26. 10. 2007
0 views

Stock

russian project proposals plus
05. 01. 2008
0 views

russian project proposals plus

Heredity
06. 08. 2007
0 views

Heredity

interest
06. 08. 2007
0 views

interest

HowAdultsReallyLearn PPt1 13 05
06. 08. 2007
0 views

HowAdultsReallyLearn PPt1 13 05

neutrino
05. 01. 2008
0 views

neutrino

Early chemotherapy
15. 10. 2007
0 views

Early chemotherapy

pdb95
15. 10. 2007
0 views

pdb95

18 DanceLanguage
16. 10. 2007
0 views

18 DanceLanguage

Hermez
06. 08. 2007
0 views

Hermez

NIC
15. 10. 2007
0 views

NIC

indicators
23. 10. 2007
0 views

indicators

BodyPsychotherapy
17. 10. 2007
0 views

BodyPsychotherapy

cikm05 ium
19. 11. 2007
0 views

cikm05 ium

Strock
28. 02. 2008
0 views

Strock

Okereke King County
26. 08. 2007
0 views

Okereke King County

wap cfprog
25. 03. 2008
0 views

wap cfprog

Quotable quotes 070508
26. 03. 2008
0 views

Quotable quotes 070508

e31 glob
07. 04. 2008
0 views

e31 glob

WCG CHAPTER13
27. 03. 2008
0 views

WCG CHAPTER13

KeenWhatsBehindRecor dDebt
09. 04. 2008
0 views

KeenWhatsBehindRecor dDebt

lecture05
10. 04. 2008
0 views

lecture05

GEO205 powerpoint 10
13. 04. 2008
0 views

GEO205 powerpoint 10

4903
14. 04. 2008
0 views

4903

021003 Pang Slides
16. 04. 2008
0 views

021003 Pang Slides

wward
17. 10. 2007
0 views

wward

UMBC presentation 050406
17. 04. 2008
0 views

UMBC presentation 050406

Eureka Gold Real Estate
22. 04. 2008
0 views

Eureka Gold Real Estate

IntroPolicy
07. 10. 2007
0 views

IntroPolicy

ncb klein brief01feb06
04. 03. 2008
0 views

ncb klein brief01feb06

26 canadian
18. 06. 2007
0 views

26 canadian

20 Grund Kurs Kurortmedizin 2005
18. 06. 2007
0 views

20 Grund Kurs Kurortmedizin 2005

wrapebs
18. 06. 2007
0 views

wrapebs

wirsindspoe
18. 06. 2007
0 views

wirsindspoe

Wagemans stud
18. 06. 2007
0 views

Wagemans stud

sgtalk eg 2k4a1
16. 10. 2007
0 views

sgtalk eg 2k4a1

Asset Trade In
18. 06. 2007
0 views

Asset Trade In

arte moderno
18. 06. 2007
0 views

arte moderno

AP Presentation
18. 06. 2007
0 views

AP Presentation

Americas Tour
18. 06. 2007
0 views

Americas Tour

African Americans A Z
18. 06. 2007
0 views

African Americans A Z

Advantage Webinar
18. 06. 2007
0 views

Advantage Webinar

56 personnel Schlegel
18. 06. 2007
0 views

56 personnel Schlegel

GIApowerpoint
06. 08. 2007
0 views

GIApowerpoint

heck
15. 10. 2007
0 views

heck

amore travagliato
18. 06. 2007
0 views

amore travagliato

George HseTrust HIV Lawn
06. 08. 2007
0 views

George HseTrust HIV Lawn

Missao ACentral
22. 10. 2007
0 views

Missao ACentral

fun at work presentatie
15. 06. 2007
0 views

fun at work presentatie

Ellesparlentdeshommes
15. 06. 2007
0 views

Ellesparlentdeshommes

donde esta mama1
15. 06. 2007
0 views

donde esta mama1

Cs101 Lec34
15. 06. 2007
0 views

Cs101 Lec34

Creationism
15. 06. 2007
0 views

Creationism

close your eyes
15. 06. 2007
0 views

close your eyes

Cat V2
15. 06. 2007
0 views

Cat V2

projectmamage
12. 10. 2007
0 views

projectmamage

Healthmanagement7
29. 02. 2008
0 views

Healthmanagement7

area riservata 2006
18. 06. 2007
0 views

area riservata 2006

dominguez hills
26. 08. 2007
0 views

dominguez hills

University Forum Feb 232005
03. 01. 2008
0 views

University Forum Feb 232005

4 cornerstone readership
18. 06. 2007
0 views

4 cornerstone readership

Where is the Donkey
01. 11. 2007
0 views

Where is the Donkey

agbus 435 lec9 f04
29. 10. 2007
0 views

agbus 435 lec9 f04

conchicoll
15. 10. 2007
0 views

conchicoll

nystax
27. 09. 2007
0 views

nystax

RussianAFMuseum
02. 10. 2007
0 views

RussianAFMuseum

weitz
24. 02. 2008
0 views

weitz

Jean McLellan
06. 08. 2007
0 views

Jean McLellan

Andrey Uroda
26. 10. 2007
0 views

Andrey Uroda

ED Lafco Present092805chew
26. 08. 2007
0 views

ED Lafco Present092805chew

buckinghamshirecc
26. 10. 2007
0 views

buckinghamshirecc

232nm14
30. 10. 2007
0 views

232nm14

Growingup Healthy with AFHK logo
06. 08. 2007
0 views

Growingup Healthy with AFHK logo