Portfolio Optimization and Performance Evaluation

To test a strategy prior to implementation under market conditions, we need to simulate the trades that the algorithm would make and verify their performance. Strategy evaluation includes backtesting against historical data to optimize the strategy's parameters, and forward-testing to validate the in-sample performance against new, out-of-sample data. The goal is to avoid false discoveries from tailoring a strategy to specific past circumstances.

In a portfolio context, positive asset returns can offset negative price movements. Positive price changes for one asset are more likely to offset losses on another the lower the correlation between the two positions. Based on how portfolio risk depends on the positions' covariance, Harry Markowitz developed the theory behind modern portfolio management based on diversification in 1952. The result is mean-variance optimization, which selects weights for a given set of assets to minimize risk, measured as the standard deviation of returns for a given expected return.

The capital asset pricing model (CAPM) introduces a risk premium, measured as the expected return in excess of a risk-free investment, as an equilibrium reward for holding an asset. This reward compensates for the exposure to a single risk factor, the market, which is systematic as opposed to idiosyncratic to the asset and thus cannot be diversified away.

Risk management has evolved to become more sophisticated as additional risk factors and more granular choices for exposure have emerged. The Kelly criterion is a popular approach to dynamic portfolio optimization, that is, the choice of a sequence of positions over time; it was famously adapted from its original application in gambling to the stock market by Edward Thorp in 1968.

As a result, there are several approaches to optimize portfolios, including the application of machine learning (ML) to learn hierarchical relationships among assets and to treat holdings as complements or substitutes with respect to the portfolio risk profile. This chapter will cover the following topics:

How to measure portfolio performance
The (adjusted) Sharpe ratio
The fundamental law of active management
How to manage portfolio risk and return
The evolution of modern portfolio management
Mean-variance optimization
Code example: Finding the efficient frontier in Python
Alternatives to mean-variance optimization
The 1/N portfolio
The minimum-variance portfolio
The Black-Litterman approach
How to size your bets: the Kelly rule

How to measure portfolio performance

To evaluate and compare different strategies, or to improve an existing strategy, we need metrics that reflect their performance with respect to our objectives. In investment and trading, the most common objectives are the return and the risk of the investment portfolio.

The return and risk objectives imply a trade-off: taking more risk may yield higher returns in some circumstances, but it also implies greater downside. To compare how different strategies navigate this trade-off, ratios that compute a measure of return per unit of risk are very popular. We will discuss the Sharpe ratio and the information ratio (IR) in turn.

The (adjusted) Sharpe ratio
The ex-ante Sharpe ratio (SR) compares the portfolio's expected excess return to the volatility of this excess return, measured by its standard deviation. It measures the compensation as the average excess return per unit of risk taken, and it can be estimated from data.

Financial returns often violate the iid assumption. Andrew Lo has derived the necessary adjustments to the distribution and the time aggregation of the SR for returns that are stationary but autocorrelated. This matters because the time-series properties of investment strategies (for example, mean reversion, momentum, and other forms of serial correlation) can have a non-trivial impact on the SR estimator itself, especially when annualizing the SR from higher-frequency data.
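To make this concrete, here is a minimal sketch (the function names and the 252-day convention are my assumptions, not the book's code) of the per-period Sharpe ratio, its naive iid annualization, and Lo's (2002) time-aggregation adjustment for stationary but autocorrelated returns:

```python
import numpy as np

def sharpe_ratio(returns, rf=0.0):
    """Per-period Sharpe ratio: mean excess return over its standard deviation."""
    excess = np.asarray(returns) - rf
    return excess.mean() / excess.std(ddof=1)

def annualized_sharpe(returns, periods=252, rf=0.0):
    """Naive annualization under the iid assumption: SR * sqrt(periods)."""
    return sharpe_ratio(returns, rf) * np.sqrt(periods)

def lo_adjusted_sharpe(returns, periods=252, rf=0.0):
    """Lo (2002) aggregation for autocorrelated returns:
    SR(q) = SR * q / sqrt(q + 2 * sum_{k=1}^{q-1} (q - k) * rho_k),
    where rho_k is the lag-k autocorrelation of the excess returns."""
    excess = np.asarray(returns) - rf
    q = periods
    demeaned = excess - excess.mean()
    var = demeaned @ demeaned
    rho = np.array([(demeaned[:-k] @ demeaned[k:]) / var for k in range(1, q)])
    scale = q / np.sqrt(q + 2 * np.sum((q - np.arange(1, q)) * rho))
    return sharpe_ratio(returns, rf) * scale
```

The adjustment requires a sample longer than the aggregation horizon (here, more than 252 observations) to estimate the autocorrelations; for uncorrelated returns, the scale factor reduces to roughly sqrt(q), recovering the naive rule.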

The Statistics of Sharpe Ratios, Andrew W. Lo, Financial Analysts Journal, 2002
The fundamental law of active management
It is a curious fact that Renaissance Technologies (RenTec), the top-performing quant fund founded by Jim Simons that we mentioned in Chapter 1, has produced returns similar to those of Warren Buffett despite extremely different approaches. Warren Buffett's investment firm Berkshire Hathaway holds some 100 to 150 stocks for fairly long periods, whereas RenTec may execute 100,000 trades per day. How can we compare these distinct strategies?

ML is about optimizing objective functions. In algorithmic trading, the objectives are the return and the risk of the overall investment portfolio, typically relative to a benchmark (which may be cash, the risk-free interest rate, or an asset price index like the S&P 500).

A high information ratio (IR) implies attractive outperformance relative to the additional risk taken. The fundamental law of active management breaks the IR down into the information coefficient (IC), a measure of forecasting skill, and the ability to apply this skill through independent bets. It summarizes the importance of playing often (high breadth) and playing well (high IC).

The IC measures the correlation between an alpha factor and the forward returns resulting from its signals, and it captures the accuracy of a manager's forecasting skill. The breadth of the strategy is measured by the number of independent bets an investor makes in a given time period. The IR, also known as the appraisal ratio (Treynor and Black), is proportional to the product of the IC and the square root of the breadth.

The fundamental law is important because it highlights the key drivers of outperformance: both accurate predictions and the ability to make independent forecasts and act on them matter. In practice, estimating the breadth of a strategy is difficult, given the cross-sectional and time-series correlation among forecasts.
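In symbols, using the standard Grinold formulation (the approximation ignores constraints and correlated bets):

```latex
\text{IR} = \frac{\mathbb{E}[R_p - R_b]}{\sigma(R_p - R_b)},
\qquad
\text{IR} \approx \text{IC} \times \sqrt{\text{breadth}}
```

where the IR is the active return over the benchmark per unit of tracking error, the IC is the correlation between forecasts and realized returns, and breadth is the number of independent bets per period.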

Active Portfolio Management: A Quantitative Approach for Producing Superior Returns and Controlling Risk, Richard Grinold and Ronald Kahn, 1999
How to Use Security Analysis to Improve Portfolio Selection, Jack L. Treynor and Fischer Black, Journal of Business, 1973
Portfolio Constraints and the Fundamental Law of Active Management, Clarke et al., 2002

How to manage portfolio risk and return

Portfolio management aims to pick and size positions in financial instruments that achieve the desired risk-return trade-off relative to a benchmark. As a portfolio manager, in each period you select positions that optimize diversification to reduce risk while achieving a target return. Across periods, these positions may require rebalancing to account for changes in weights resulting from price movements, in order to achieve or maintain a target risk profile.

The evolution of modern portfolio management
Diversification permits us to reduce risk for a given expected return by exploiting how imperfect correlation allows one asset's gains to make up for another asset's losses. Harry Markowitz invented modern portfolio theory (MPT) in 1952 and provided the mathematical tools to optimize diversification by choosing appropriate portfolio weights.

Mean-variance optimization
Modern portfolio theory solves for the optimal portfolio weights that minimize volatility for a given expected return, or maximize return for a given level of volatility. The key required inputs are expected asset returns, standard deviations, and the covariance matrix.

Portfolio Selection, Harry Markowitz, The Journal of Finance, 1952
The Capital Asset Pricing Model: Theory and Evidence, Eugene F. Fama and Kenneth R. French, Journal of Economic Perspectives, 2004
Code example: Finding the efficient frontier in Python
We can calculate an efficient frontier using scipy.optimize.minimize and historical estimates of asset returns, standard deviations, and the covariance matrix.

The notebook mean_variance_optimization computes the efficient frontier in Python.
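As a sketch of what this approach can look like (the input estimates below are hypothetical, and the long-only bounds and SLSQP solver are my assumptions rather than the notebook's exact setup):

```python
import numpy as np
from scipy.optimize import minimize

def min_vol_weights(mu, cov, target):
    """Weights that minimize portfolio volatility subject to a target
    expected return, full investment, and long-only positions."""
    n = len(mu)
    res = minimize(
        lambda w: np.sqrt(w @ cov @ w),                      # portfolio volatility
        x0=np.full(n, 1 / n),
        method="SLSQP",
        bounds=[(0, 1)] * n,                                 # long-only
        constraints=[
            {"type": "eq", "fun": lambda w: w.sum() - 1},    # fully invested
            {"type": "eq", "fun": lambda w: w @ mu - target} # hit target return
        ],
    )
    return res.x

# Hypothetical annualized estimates for three assets
mu = np.array([0.08, 0.12, 0.10])
cov = np.array([[0.0400, 0.0060, 0.0100],
                [0.0060, 0.0900, 0.0200],
                [0.0100, 0.0200, 0.0625]])

# Trace the frontier: one (volatility, return) point per target return
frontier = []
for target in np.linspace(mu.min(), mu.max(), 11):
    w = min_vol_weights(mu, cov, target)
    frontier.append((np.sqrt(w @ cov @ w), target))
```

Plotting the (volatility, return) pairs in `frontier` draws the efficient frontier; in practice the inputs would be estimated from historical return series.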

Alternatives to mean-variance optimization

The challenge of providing accurate inputs for the mean-variance optimization problem has led to the adoption of several practical alternatives that constrain the mean, the variance, or both, or that omit the more challenging return estimates, such as the risk parity approach discussed later in this section.

The 1/N portfolio
Simple portfolios provide useful benchmarks for gauging the added value of complex models that generate a risk of overfitting. The simplest strategy, an equally weighted portfolio, has been shown to be one of the best performers.

The minimum-variance portfolio
Another alternative is the global minimum-variance (GMV) portfolio, which prioritizes the minimization of risk. It is shown in the efficient frontier figure and can be calculated by minimizing the portfolio standard deviation using the mean-variance framework.
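For illustration, when the covariance matrix is invertible and short sales are allowed, the GMV weights even have a closed form, w = Σ⁻¹1 / (1ᵀΣ⁻¹1). A sketch with hypothetical numbers (not the book's code):

```python
import numpy as np

def gmv_weights(cov):
    """Global minimum-variance weights: w = inv(Sigma) @ 1 / (1' inv(Sigma) 1).
    Only the full-investment constraint is imposed, so shorts are allowed."""
    ones = np.ones(len(cov))
    w = np.linalg.solve(cov, ones)   # avoids forming the explicit inverse
    return w / w.sum()

# Hypothetical 2-asset covariance matrix
cov = np.array([[0.040, 0.006],
                [0.006, 0.090]])
w = gmv_weights(cov)
# By construction, the GMV portfolio is no riskier than either asset alone
```

With long-only or other constraints, the closed form no longer applies and a numerical solver such as the one used for the efficient frontier is needed.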

The Black-Litterman approach
The Global Portfolio Optimization approach of Black and Litterman (1992) combines economic models with statistical learning and is popular because it generates estimates of expected returns that are plausible in many situations. The technique assumes that the market is a mean-variance portfolio, as implied by the CAPM equilibrium model. It builds on the fact that the observed market capitalization can be considered as the optimal weights assigned to each security by the market. Market weights reflect market prices, which in turn embody the market's expectations of future returns.
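The reverse-optimization step implied by this logic can be sketched as follows (the covariance matrix, market weights, and risk-aversion value are hypothetical placeholders; the full Black-Litterman model then blends these equilibrium returns with investor views, which is omitted here):

```python
import numpy as np

def implied_equilibrium_returns(cov, w_mkt, risk_aversion=2.5):
    """Back out the excess returns Pi = delta * Sigma * w_mkt that make
    the observed market-cap weights mean-variance optimal."""
    return risk_aversion * cov @ w_mkt

cov = np.array([[0.040, 0.006],
                [0.006, 0.090]])
w_mkt = np.array([0.6, 0.4])          # hypothetical market-cap weights
pi = implied_equilibrium_returns(cov, w_mkt)

# Sanity check: plugging Pi back into unconstrained mean-variance
# optimization, w* = inv(Sigma) @ Pi / delta, recovers the market weights
```

Using these implied returns instead of noisy historical means is what makes the resulting optimal portfolios plausible rather than extreme.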
Global Portfolio Optimization, Fischer Black and Robert Litterman, Financial Analysts Journal, 1992

How to size your bets: the Kelly rule

The Kelly rule has a long history in gambling because it provides guidance on how much to stake on each of an (infinite) sequence of bets with varying (but favorable) odds to maximize terminal wealth. It was published as A New Interpretation of the Information Rate in 1956 by John Kelly, a colleague of Claude Shannon's at Bell Labs. He was intrigued by bets placed on candidates on the new quiz show The $64,000 Question, where viewers on the West Coast used the three-hour broadcast delay to obtain insider information about the winners.

Kelly drew a connection to Shannon's information theory to solve for the bet size that is optimal for long-term capital growth when the odds are favorable but uncertainty remains. His rule maximizes logarithmic wealth as a function of the odds of success of each game and includes implicit bankruptcy protection: since log(0) is negative infinity, a Kelly gambler naturally avoids losing everything.
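For a single binary bet, maximizing the expected log wealth E[log W] = p log(1 + f b) + (1 - p) log(1 - f) over the bet fraction f gives the familiar closed form below; the continuous variant is the form Thorp applied to markets (a sketch, with names of my choosing):

```python
def kelly_fraction(p, b):
    """Kelly stake for a bet won with probability p at net odds b
    (win b per unit staked): f* = p - (1 - p) / b.
    Returns 0 when the edge is negative, i.e. never bet against the odds."""
    return max(0.0, p - (1 - p) / b)

def kelly_leverage(mean_excess, variance):
    """Continuous-time analogue for a single asset with expected excess
    return mu and variance sigma^2: optimal leverage f = mu / sigma^2."""
    return mean_excess / variance

# A 60% coin at even odds: stake 20% of capital each round
f = kelly_fraction(0.6, 1.0)
```

In practice, many traders use a fraction of the Kelly stake ("half-Kelly") because estimation error in p and b makes the full stake aggressive.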

A New Interpretation of the Information Rate, John Kelly, 1956
Beat the Dealer: A Winning Strategy for the Game of Twenty-One, Edward O. Thorp, 1966
Beat the Market: A Scientific Stock Market System, Edward O. Thorp, 1967
Quantitative Trading: How to Build Your Own Algorithmic Trading Business, Ernie Chan, 2008

Alternatives to MV Optimization with Python
The notebook kelly_rule demonstrates the application for the single-asset and the multi-asset case.
The results for the latter are also included in the notebook mean_variance_optimization, along with several other alternatives.

Hierarchical Risk Parity
This novel approach developed by Marcos Lopez de Prado aims to address three major concerns of quadratic optimizers in general, and of Markowitz's critical line algorithm (CLA) in particular:

  • instability,
  • concentration, and
  • underperformance.

Hierarchical Risk Parity (HRP) applies graph theory and machine learning to build a diversified portfolio based on the information contained in the covariance matrix. However, unlike quadratic optimizers, HRP does not require the invertibility of the covariance matrix. In fact, HRP can compute a portfolio even on a degenerate or singular covariance matrix, an impossible feat for quadratic optimizers.
Monte Carlo experiments show that HRP delivers lower out-of-sample variance than CLA, even though minimum variance is CLA's optimization objective. HRP also produces less risky portfolios out of sample compared with traditional risk parity methods. We will discuss HRP in more detail in Chapter 13, which covers applications of unsupervised learning, including hierarchical clustering, to trading.
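A minimal sketch of HRP's first stage, tree clustering plus quasi-diagonalization (the correlation matrix is hypothetical, and the algorithm's later recursive bisection with inverse-variance allocation is omitted here):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import squareform

def quasi_diagonal_order(corr):
    """Cluster assets on the correlation-implied distance
    d_ij = sqrt((1 - rho_ij) / 2) and return the leaf order that
    quasi-diagonalizes the covariance matrix (similar assets adjacent)."""
    dist = np.sqrt((1 - corr) / 2)
    # scipy's linkage expects a condensed (upper-triangular) distance vector
    link = linkage(squareform(dist, checks=False), method="single")
    return leaves_list(link)

# Hypothetical correlation matrix: assets 0 and 2 are closely related
corr = np.array([[1.0, 0.1, 0.9],
                 [0.1, 1.0, 0.2],
                 [0.9, 0.2, 1.0]])
order = quasi_diagonal_order(corr)
```

Because only pairwise distances are used, this step works even when the covariance matrix is singular, which is the property that lets HRP sidestep the inversion required by quadratic optimizers.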

Building Diversified Portfolios that Outperform Out-of-Sample, Marcos López de Prado, The Journal of Portfolio Management 42, no. 4 (2016): 59-69
Hierarchical Clustering-Based Asset Allocation, Thomas Raffinot, 2016
We demonstrate how to implement HRP and compare it to alternatives in Chapter 13 on unsupervised learning, where we also introduce hierarchical clustering.

Trading and portfolio management with Zipline

The open-source zipline library is an event-driven backtesting system maintained and used in production by the crowd-sourced quantitative investment fund Quantopian to facilitate algorithm development and live trading. It automates the algorithm's reaction to trade events and provides it with current and historical point-in-time data, which avoids look-ahead bias. Chapter 8, The ML4T Workflow, has a more detailed, dedicated section on backtesting with zipline and backtrader.

In Chapter 4, we introduced zipline to simulate the computation of alpha factors from cross-sectional market, fundamental, and alternative data. Now we will exploit these alpha factors to derive and act on buy and sell signals.

Code examples: Backtests with trades and portfolio optimization
The code for this section lives in the following two notebooks:

The notebooks for this section use the conda environment backtest. See the installation instructions for downloading the latest Docker image or alternative ways to set up your environment.
The notebook backtest_with_trades simulates the trading decisions that build a portfolio based on the simple mean-reversion alpha factors from the last chapter, using Zipline. We do not explicitly optimize the portfolio weights and simply allocate positions of equal value to each holding.
The notebook backtest_with_pf_optimization demonstrates how to use portfolio optimization as part of a simple strategy backtest.

Measuring backtest performance with pyfolio
Pyfolio facilitates the analysis of portfolio performance and risk in-sample and out-of-sample using many standard metrics. It produces tear sheets covering the analysis of returns, positions, and transactions, as well as event risk during periods of market stress using several built-in scenarios, and it also includes Bayesian out-of-sample performance analysis.

Code example: pyfolio evaluation of the Zipline backtest
The notebook pyfolio_demo illustrates how to extract the pyfolio input from the backtest conducted in the previous folder. It then proceeds to compute several performance metrics and generate tear sheets using pyfolio.

The notebook requires the conda environment backtest. See the installation instructions for running the latest Docker image or alternative ways to set up your environment.
