Merton’s portfolio problem

19. Merton’s portfolio problem#

Merton’s portfolio problem is an optimization problem: choose the fraction of wealth invested in a risky asset \(\pi_t\) and the consumption \(c_t\) that maximize a value function. Merton assumes that the portfolio is built from a risk-free asset and a risky asset only. The problem has no minimum spending level (subsistence level, what you need for survival) and no transaction costs. Under some assumptions, the optimal fraction of the investment in the risky asset \(\pi^*_t\) is constant, which requires continuous rebalancing (free, as costs are neglected), and equal to the excess return of the risky asset \(\mu - r\), divided by its variance \(\sigma^2\) and by a personal risk aversion \(\gamma\),

\[\pi^* = \frac{\mu - r}{\gamma \sigma^2} \ ,\]

and the consumption is proportional to the wealth by a factor depending on time \(t\),

\[c^*_t = \frac{X_t}{f(t)}\]

19.1. Wealth dynamics#

Assume that the wealth of a family or an individual is invested with a fraction \(\pi_t\) in the risky part of the portfolio, with expected return \(\mu_t\) and standard deviation \(\sigma_t\), and with a fraction \(1-\pi_t\) in a risk-free asset, with return \(r_t\) and zero standard deviation. The wealth \(X_t\) then evolves according to the SDE

\[\begin{aligned} d X_t & = \underbrace{ (1 - \pi_t ) r_t X_t \, dt }_{\text{risk-free asset}} + \underbrace{\pi_t \left( \mu_t X_t \, dt + \sigma_t X_t \, dW_t \right)}_{\text{risky asset}} - \underbrace{ c_t \, dt}_{\text{consumption}} \ , \end{aligned}\]

or, collecting the terms in \(dt\) and \(dW_t\),

\[\begin{aligned} dX_t & = \underbrace{\left[ r_t + \left( \mu_t - r_t \right) \pi_t \right] X_t \, dt}_{\text{expected return}} - \underbrace{c_t \, dt}_{\text{consumption}} + \underbrace{\pi_t \sigma_t X_t \, dW_t}_{\text{volatility of the return}} \ , \end{aligned}\]

i.e. the equation of a geometric Brownian motion with an additional consumption term in the drift, with

1-period (or percentage) expected value \((1-\pi_t) r_t + \pi_t \mu_t = r_t + \pi_t (\mu_t - r_t)\)
1-period (or percentage) std-deviation \(\pi_t \sigma_t\)
1-period drift contribution \(-c_t\) from consumption (an absolute amount, not a percentage of wealth)

This is the same equation that can be used to discuss sequence-of-returns risk in investing, especially when withdrawals are involved.
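The wealth SDE can be explored numerically. The following is a minimal sketch using the Euler-Maruyama scheme, under assumptions that are not part of the text: all coefficients are constant, and consumption is a fixed fraction of wealth, \(c_t = k X_t\), with the hypothetical parameter `c_rate` playing the role of \(k\). The function name and all numbers are illustrative only.

```python
import numpy as np


def simulate_wealth(x0, pi, r, mu, sigma, c_rate, T, n_steps, n_paths, seed=0):
    """Euler-Maruyama paths of dX = [r + (mu - r) pi] X dt - c dt + pi sigma X dW.

    Assumption (not from the text): consumption is a fixed fraction of
    wealth, c_t = c_rate * X_t, and all coefficients are constant.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    X = np.empty((n_paths, n_steps + 1))
    X[:, 0] = x0
    for k in range(n_steps):
        # Brownian increments over one time step, one per path
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        drift = (r + (mu - r) * pi) * X[:, k] - c_rate * X[:, k]
        diffusion = pi * sigma * X[:, k]
        X[:, k + 1] = X[:, k] + drift * dt + diffusion * dW
    return X


# Illustrative parameters: 30 years, monthly steps, 50% in the risky asset
paths = simulate_wealth(x0=1.0, pi=0.5, r=0.02, mu=0.06, sigma=0.2,
                        c_rate=0.03, T=30.0, n_steps=360, n_paths=10_000)
print("mean terminal wealth:", paths[:, -1].mean())
print("5% / 95% quantiles:", np.quantile(paths[:, -1], [0.05, 0.95]))
```

The spread between the quantiles of terminal wealth gives a rough picture of how much the outcome depends on the particular sequence of returns.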

19.2. Value function#

The goal of the problem is to maximize an objective function representing the lifetime cumulative discounted reward (in the vocabulary of continuous-time reinforcement learning, see for example Math:Introduction to RL) or, roughly, cumulative happiness. The objective function can be written as a value function,

\[V(x,t) = \mathbb{E} \left[ \left. \int_{s=t}^{T} e^{- \rho (s-t)} u(c_s) \, ds + e^{-\rho(T-t)} B(T) u(X_T) \right| X_t = x \right] \ ,\]

with

\(u(c)\) the utility function, the “joy from spending”
\(\rho\) the discount rate, i.e. “the personal weight of the present”, or a “measure of impatience”
\(B(T)\), the bequest weight, i.e. “the importance of leaving money behind”

The optimization problem can be solved using continuous-time reinforcement learning, see for example Math:Introduction to RL and Statistics:RL (todo). The optimal solution \(\pi_t^*\), \(c_t^*\) is the pair of risky-asset fraction and consumption that maximises the value function \(V(x,t)\).
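To make the objective concrete, the expectation in \(V(x,t)\) can be estimated by Monte Carlo for a fixed, not optimized, policy. This is only a sketch under stated assumptions: constant coefficients, a constant bequest weight, a CRRA utility \(u(c) = c^{1-\gamma}/(1-\gamma)\) with \(\gamma \neq 1\), a constant risky fraction `pi`, and consumption \(c_s = k X_s\) with the hypothetical parameter `c_rate` as \(k\). All names and numbers are illustrative.

```python
import numpy as np


def crra_utility(c, gamma):
    """CRRA utility c^(1-gamma)/(1-gamma), assumed for illustration (gamma != 1)."""
    return c ** (1.0 - gamma) / (1.0 - gamma)


def estimate_value(x, t, T, pi, c_rate, r, mu, sigma, rho, gamma, bequest,
                   n_steps=500, n_paths=20_000, seed=0):
    """Monte Carlo estimate of V(x, t) for a fixed (not optimized) policy.

    Hypothetical policy: constant risky fraction `pi` and consumption
    c_s = c_rate * X_s. `bequest` plays the role of B(T).
    """
    rng = np.random.default_rng(seed)
    dt = (T - t) / n_steps
    X = np.full(n_paths, float(x))
    reward = np.zeros(n_paths)
    for k in range(n_steps):
        discount = np.exp(-rho * k * dt)  # e^{-rho (s - t)} at s = t + k dt
        c = c_rate * X
        reward += discount * crra_utility(c, gamma) * dt  # running reward
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        X = X + ((r + (mu - r) * pi) * X - c) * dt + pi * sigma * X * dW
    # Discounted bequest term at the final time
    reward += np.exp(-rho * (T - t)) * bequest * crra_utility(X, gamma)
    return reward.mean()


v = estimate_value(x=1.0, t=0.0, T=30.0, pi=0.5, c_rate=0.04, r=0.02,
                   mu=0.06, sigma=0.2, rho=0.03, gamma=2.0, bequest=1.0)
print("estimated value of the fixed policy:", v)
```

Repeating the estimate for different values of `pi` and `c_rate` gives a crude way to compare policies, with the caveat that the Monte Carlo noise has to be smaller than the differences being compared.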

Arguments of the value function

The value function may be written as a function of several arguments, which can be treated as

independent variables, like:
- the initial time \(t\)
- the initial value of the wealth \(x\)
parameters, like
- the final time \(T\)
- the discount rate \(\rho\),
- the parameters in wealth dynamics, e.g. \(r\), \(\mu\), \(\sigma\)
functions and their parameters, like
- the wealth fraction invested in the risky asset \(\pi_t\)
- the utility function \(u_t(c; \gamma)\), and the risk aversion therein \(\gamma\)
- …

For brevity, the value function is usually written as a function of the independent variables \(x\), \(t\) only, but it could be written with the explicit dependence on parameters and functions,

\[V(x,t; \mathbf{p}, \mathbf{f}_t) \ .\]

For constant expected return and volatility of the assets, under the assumption of constant relative risk aversion (CRRA), i.e. utility function \(u(x) = \frac{x^{1-\gamma}}{1 - \gamma}\), it’s possible to find an analytical solution with optimal fraction invested in the risky asset

\[\pi^* = \frac{\mu - r}{\gamma \sigma^2} \ ,\]

and optimal consumption that’s proportional to the wealth through a function of time \(f(t)\)

\[c^*_t = \frac{X_t}{f(t)} \ .\]
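As a quick numerical illustration of the formula for \(\pi^*\), the sketch below evaluates it for a few values of the risk aversion \(\gamma\). The function name `merton_fraction` and the parameter values are assumptions chosen for illustration, not taken from the text.

```python
def merton_fraction(mu, r, sigma, gamma):
    """Optimal constant risky fraction pi* = (mu - r) / (gamma * sigma**2)."""
    if sigma <= 0 or gamma <= 0:
        raise ValueError("sigma and gamma must be positive")
    return (mu - r) / (gamma * sigma ** 2)


# Illustrative numbers only (assumptions, not from the text):
# 6% expected risky return, 2% risk-free rate, 20% volatility
mu, r, sigma = 0.06, 0.02, 0.20
for gamma in (2.0, 4.0, 8.0):
    print(f"gamma = {gamma}: pi* = {merton_fraction(mu, r, sigma, gamma):.3f}")
```

With these numbers \((\mu - r)/\sigma^2 = 1\), so \(\pi^* = 1/\gamma\): doubling the risk aversion halves the fraction held in the risky asset. Values of \(\pi^*\) above 1 are possible for low \(\gamma\) and correspond to a leveraged position.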

19.3. Solution of the optimization problem#

The solution of the optimization problem in the framework of reinforcement learning (or stochastic control) may exploit the principle of optimality: the global optimization of the value function over the time range \([t, T]\) is obtained from the local optimization over each elementary time step \(dt\) in the range of interest.
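As a sketch of how this works, assume constant \(r\), \(\mu\), \(\sigma\), a constant bequest weight \(B > 0\) and the CRRA utility \(u(x) = \frac{x^{1-\gamma}}{1-\gamma}\), and let \(V\) denote the optimal value function. Taking the limit \(dt \to 0\) of the local optimization gives the Hamilton-Jacobi-Bellman equation with its terminal condition. The ansatz and the constant \(\nu\) below are notation introduced for this sketch.

\[\rho V = \partial_t V + \max_{\pi, c} \left\{ u(c) + \left[ \left( r + (\mu - r) \pi \right) x - c \right] \partial_x V + \frac{1}{2} \pi^2 \sigma^2 x^2 \, \partial_{xx} V \right\} \ , \qquad V(x,T) = B \, u(x) \ .\]

For a value function that is concave in \(x\), the first-order conditions of the maximization read

\[u'(c^*) = \partial_x V \ , \qquad \pi^* = - \frac{(\mu - r) \, \partial_x V}{\sigma^2 x \, \partial_{xx} V} \ .\]

With the ansatz \(V(x,t) = f(t)^\gamma \frac{x^{1-\gamma}}{1-\gamma}\), these conditions give \(c^* = \frac{x}{f(t)}\) and \(\pi^* = \frac{\mu - r}{\gamma \sigma^2}\), while the HJB equation reduces to a linear ODE for \(f(t)\),

\[f'(t) = \nu f(t) - 1 \ , \qquad f(T) = B^{1/\gamma} \ , \qquad \nu = \frac{1}{\gamma} \left[ \rho - (1-\gamma) \left( r + \frac{(\mu - r)^2}{2 \gamma \sigma^2} \right) \right] \ ,\]

whose solution, for \(\nu \ne 0\), is

\[f(t) = \frac{1}{\nu} + \left( B^{1/\gamma} - \frac{1}{\nu} \right) e^{-\nu (T-t)} \ .\]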
