# VRP15

Published on February 26, 2008

Author: Alohomora

Source: authorstream.com

## A Reinforcement Learning Approach for Product Delivery by Multiple Vehicles

Scott Proper, Prasad Tadepalli, Hong Tang, Rasaratnam Logendran
Oregon State University

## Vehicle Routing & Product Delivery

(Slide: illustration of the vehicle routing and product delivery domain.)

## Contributions of our Research

- Multiple-vehicle product delivery is a well-studied problem in operations research
- We have formulated this problem as an average-reward reinforcement learning (RL) problem
- We have combined inventory control with vehicle routing
- We have scaled RL methods to work with large state spaces

## Markov Decision Processes

- Actions are stochastic: P_ij(a) is the probability that action a moves the system from state i to state j
- Actions have costs or rewards: r_i(a)

(Diagram: example Move and Unload actions.)

## Average Reward Reinforcement Learning

- Goal: maximize the average reward per time step, i.e. minimize the stockout penalty plus the movement penalty
- Policy: a mapping from states to actions
- Value function: a mapping from states to real values, giving the expected long-term reward from a state, relative to other states, when following the optimal policy

## H-Learning

- The value function satisfies the Bellman equation:

  h(i) = max_a [ r_i(a) − ρ + Σ_j P_ij(a) h(j) ]

  where ρ is the average reward per time step
- The optimal action a* maximizes the immediate reward plus the expected value of the next state
- H-Learning is a real-time algorithm for solving the value function

## H-Learning: Examples 1–4

(Diagrams: a small grid world with states A–E. Edge labels give the reward and transition probability: −.1, 1/1 and 0, 9/9 initially; adding a stockout penalty of −20 changes the labels to 0, 9/10 and −20, 1/10; adding a move penalty of −.1 yields −.1, 2/2.)

## On-line Product Delivery

- Deliver 1 product from a depot to 5 shops
- 9 truck actions: 4 levels of unload, 4 move directions, and wait
- Shop inventories decrease stochastically: P(inventory decrease | shop)
- Stockout penalty: −20
- Movement penalty: −.1

## The Problem of State-Space Explosion

- The loads of trucks and the shop inventories are discretized into 5 levels
- The number of states grows exponentially in the numbers of shops and trucks
- 10 locations, 5 shops, 2 trucks: (10^2)(5^5)(5^2) = 7,812,500 states
- 5 trucks: (10^5)(5^5)(5^5) = 976,562,500,000 states
- Table-based methods take too much time and space

## Piecewise Linear Function Approximation

- We use a different linear function for each possible 5-tuple of truck locations l1, …, l5
- Each function is linear in the truck loads and shop inventories
- Each function covers roughly 10 million states, a million-fold reduction in learnable parameters

## Piecewise Linear Function Approximation vs. Table-Based

(Chart comparing piecewise linear function approximation with the table-based method.)

## Storing and Using the Action Models

- Problem: naively, determining the expected value of the next state takes exponential time
- We exploit two properties: each shop's consumption is independent, and the value function is piecewise linear

## Ignoring Truck Identity

- m = number of locations (10); k = number of trucks (2–5)
- With distinct trucks there are m^k location tuples: for 5 trucks, 10^5 = 100,000 functions and 1.1 million learnable parameters
- Treating the trucks as interchangeable leaves only 2002 functions and 22,022 learnable parameters

## The Problem of Action-Space Explosion

- Every action a is a vector of individual "truck actions": a = (a1, a2, …, an)
- The number of actions grows exponentially in the number of trucks
- With 9 truck actions, 2 trucks give 9^2 = 81 joint actions; 5 trucks give 9^5 = 59,049 joint actions

## Hill Climbing Search

- We initialize the vector of truck actions a to all "wait" actions
- We use hill climbing to reach a local optimum: randomly perturb a truck action, keep the change if it improves the value, and repeat
- This yields an order-of-magnitude improvement in search time

## Hill Climbing vs. Exhaustive Search for 4 and 5 Trucks

(Chart comparing hill climbing with exhaustive search for 4 and 5 trucks.)

## Conclusion

- Average-reward RL and piecewise linear function approximation are promising approaches for real-time product delivery
- Hill climbing shows great potential for speeding up search in domains with a large action space
- The problems of scaling are surmountable

## Future Work

- Scaling: more trucks, more locations, more shops, more depots, and more items
- Allowing trucks to move with non-uniform speeds (an event-based model is needed)
- Real-valued shop inventory and truck load levels
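The H-Learning update can be sketched on a toy problem. This is a minimal illustrative sketch, not the paper's delivery domain: the 2-state MDP, its rewards, and the exploration scheme are assumptions made for the example, and the transition model is given up front rather than learned from experience as in full H-Learning.

```python
import random

# A minimal sketch of the H-Learning update on a toy 2-state MDP.
# The model P, r is assumed known here (full H-Learning learns it online).
#
# Action "cycle" alternates the states (reward 0 on 0->1, reward 2 on
# 1->0, so its average reward is 1.0); "stay" pays 0.5 per step.
# The optimal policy cycles, so rho should approach 1.0.
P = {(0, "stay"): [(0, 1.0)], (1, "stay"): [(1, 1.0)],
     (0, "cycle"): [(1, 1.0)], (1, "cycle"): [(0, 1.0)]}
R = {(0, "stay"): 0.5, (1, "stay"): 0.5,
     (0, "cycle"): 0.0, (1, "cycle"): 2.0}
ACTIONS = ("stay", "cycle")


def h_learning(steps=2000, explore=0.1, seed=0):
    rng = random.Random(seed)
    h = {0: 0.0, 1: 0.0}  # relative value of each state
    rho = 0.0             # running estimate of the average reward per step
    n_greedy = 0          # greedy steps taken, for the 1/n step size
    s = 0
    for _ in range(steps):
        # Backed-up value of action a: r_i(a) - rho + sum_j P_ij(a) h(j)
        def backup(a):
            return R[s, a] - rho + sum(p * h[j] for j, p in P[s, a])

        greedy = rng.random() >= explore
        a = max(ACTIONS, key=backup) if greedy else rng.choice(ACTIONS)
        s_next = P[s, a][0][0]  # transitions are deterministic in this toy
        if greedy:
            # rho is averaged from r + h(s') - h(s) over greedy steps only
            n_greedy += 1
            rho += (R[s, a] + h[s_next] - h[s] - rho) / n_greedy
        # Bellman backup at the current state (performed every step)
        h[s] = max(backup(b) for b in ACTIONS)
        s = s_next
    return h, rho
```

Running `h_learning()` should drive `rho` toward the optimal gain of 1.0; `h` carries only the *relative* values of the two states, as in the value-function definition above.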
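The hill-climbing search over joint truck actions described above can be sketched as follows. The 9 truck actions match the slides (wait, 4 moves, 4 unload levels), but the rest is assumption for illustration: the separable score function stands in for the real backed-up value, and the slides' random perturbation is replaced here by a deterministic scan of single-truck changes.

```python
import itertools

# Hill climbing over the joint action of k trucks, versus scoring all
# 9**k joint actions exhaustively.
TRUCK_ACTIONS = ["wait", "north", "south", "east", "west",
                 "unload1", "unload2", "unload3", "unload4"]


def hill_climb(score, k):
    """Start from all-"wait" and greedily improve one truck action at a time."""
    a = ["wait"] * k
    best, evals = score(a), 1
    improved = True
    while improved:
        improved = False
        for i in range(k):                    # perturb one truck's action
            for act in TRUCK_ACTIONS:
                if act == a[i]:
                    continue
                cand = a[:i] + [act] + a[i + 1:]
                v = score(cand)
                evals += 1
                if v > best:                  # keep only improving changes
                    a, best, improved = cand, v, True
    return a, best, evals


def exhaustive(score, k):
    """Baseline: score every one of the 9**k joint actions."""
    best = max(itertools.product(TRUCK_ACTIONS, repeat=k), key=score)
    return list(best), score(best)


# Hypothetical per-truck utilities (illustrative stand-in for the backed-up
# value); any separable score makes the local optimum also globally optimal.
UTILITY = {"wait": 0.0, "north": 1.0, "south": -1.0, "east": 2.0,
           "west": 0.5, "unload1": 3.0, "unload2": 4.0,
           "unload3": 5.0, "unload4": 2.5}


def score(actions):
    return sum(UTILITY[x] for x in actions)
```

For 4 trucks, `hill_climb(score, 4)` reaches the same joint action as `exhaustive(score, 4)` after only a few dozen score evaluations instead of 9^4 = 6,561; with a non-separable score it would guarantee only a local optimum, which is the trade-off the slides accept.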
