Continuous‐time mean–variance portfolio selection: A reinforcement learning framework


Bibliographic Details
Published in: Mathematical Finance, 2020-10, Vol. 30 (4), p. 1273-1308
Main Authors: Wang, Haoran; Zhou, Xun Yu
Format: Article
Language: English
Description
Summary: We approach the continuous-time mean–variance portfolio selection problem with reinforcement learning (RL). The problem is to achieve the best trade-off between exploration and exploitation, and is formulated as an entropy-regularized, relaxed stochastic control problem. We prove that the optimal feedback policy for this problem must be Gaussian, with time-decaying variance. We then prove a policy improvement theorem, based on which we devise an implementable RL algorithm. We find that our algorithm and its variant outperform both traditional and deep neural network based algorithms in our simulation and empirical studies.
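The summary's key structural result is that the optimal exploratory policy is Gaussian with a variance that shrinks as the terminal time approaches. The sketch below illustrates only that qualitative shape; the specific functional form (variance proportional to lam/(2*sigma**2) * exp(rho**2 * (T - t))) and all parameter values here are illustrative assumptions, not the paper's exact coefficients.

```python
import math
import random

def policy_variance(t, T, lam=1.0, rho=0.3, sigma=0.2):
    """Illustrative exploration variance at time t.

    Assumed form: lam/(2*sigma**2) * exp(rho**2 * (T - t)), which decays
    monotonically toward lam/(2*sigma**2) as t approaches the horizon T.
    """
    return lam / (2.0 * sigma ** 2) * math.exp(rho ** 2 * (T - t))

def sample_action(mean, t, T, rng):
    """Draw a portfolio position from the Gaussian feedback policy."""
    return rng.gauss(mean, math.sqrt(policy_variance(t, T)))

if __name__ == "__main__":
    rng = random.Random(0)
    T = 1.0
    for t in (0.0, 0.5, 1.0):
        # Exploration variance shrinks as t moves toward T.
        print(f"t={t}: variance={policy_variance(t, T):.4f}")
```

The decaying variance captures the intuition in the summary: the policy explores aggressively early on and concentrates around its mean action near the terminal date, when the cost of random deviations is highest.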
ISSN: 0960-1627 (print); 1467-9965 (online)