Contextual reinforcement learning for supply chain management

Bibliographic Details
Published in: Expert Systems with Applications, 2024-09, Vol. 249, Article 123541
Main Authors: Batsis, Alex; Samothrakis, Spyridon
Format: Article
Language: English
Description
Summary: Efficient generalisation in supply chain inventory management is challenging due to a potential mismatch between the model optimised and objective reality. It is hard to know how the real world is configured and, thus, hard to train an agent optimally for it. We address this problem by combining offline training and online adaptation. Agents were trained offline using data from all possible environmental configurations, termed contexts. During an online adaptation phase, agents search for the context maximising rewards. Agents adapted online rapidly and achieved performance close to that of knowing the context a priori. In particular, they acted optimally without inferring the correct context, but by finding a suitable one for reward maximisation. By enabling agents to leverage offline training and online adaptation, we improve their efficiency and effectiveness in unknown environments. The methodology has broader potential applications and contributes to making RL algorithms useful in practical scenarios. We have released the code for this paper at https://github.com/abatsis/supply_chain_few_shot_RL.
• Agents are usually trained in simulations, but tested in the real world.
• We use online adaptation to close the gap between simulation and the real world.
• This problem is prevalent in real-world applications of AI.
• We test our methods in supply chain optimisation.
ISSN: 0957-4174, 1873-6793
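
The summary describes two phases: a policy trained offline across all contexts, then an online search for the context that maximises reward. The following is a minimal sketch of the online phase only, under loudly stated assumptions. It is not the authors' implementation (see their repository for that). The toy inventory model, the base-stock rule standing in for the offline-trained policy, the names CONTEXTS, policy, run_episode and adapt_online, and the use of an epsilon-greedy bandit as the search procedure are all hypothetical choices made for illustration.

```python
import random

# Hypothetical candidate contexts: the assumed mean customer demand per period.
CONTEXTS = [2, 5, 8]

def policy(context, stock):
    """Stand-in for an offline-trained, context-conditioned policy.
    Assumption: a base-stock rule ordering up to twice the assumed mean demand."""
    return max(0, 2 * context - stock)

def run_episode(context, true_demand_mean, rng, periods=200):
    """Toy single-product inventory episode; returns the total reward."""
    stock, total = 0, 0.0
    for _ in range(periods):
        stock += policy(context, stock)           # order arrives immediately
        demand = rng.randint(0, 2 * true_demand_mean)
        sold = min(stock, demand)
        stock -= sold
        # Revenue per unit sold, minus holding cost and a lost-sales penalty.
        total += 4.0 * sold - 1.0 * stock - 2.0 * (demand - sold)
    return total

def adapt_online(true_demand_mean, episodes=60, epsilon=0.2, seed=0):
    """Epsilon-greedy search over contexts, using only observed rewards.
    The true context is never revealed to the search."""
    rng = random.Random(seed)
    counts = {c: 0 for c in CONTEXTS}
    means = {c: 0.0 for c in CONTEXTS}
    for _ in range(episodes):
        untried = [c for c in CONTEXTS if counts[c] == 0]
        if untried:
            c = untried[0]                        # try every context once
        elif rng.random() < epsilon:
            c = rng.choice(CONTEXTS)              # explore
        else:
            c = max(CONTEXTS, key=means.get)      # exploit best estimate
        reward = run_episode(c, true_demand_mean, rng)
        counts[c] += 1
        means[c] += (reward - means[c]) / counts[c]   # running average
    return max(CONTEXTS, key=means.get), means

if __name__ == "__main__":
    best, estimates = adapt_online(true_demand_mean=5)
    print("selected context:", best)
    print("estimated returns:", estimates)
```

The search scores contexts purely by the reward they yield, so it selects whichever context makes the policy perform best, which need not be the true one. This mirrors the summary's observation that agents can act well by finding a suitable context rather than inferring the correct one.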