
The accompanying report is here.

The package rGARCH_DCC is included here. It is preliminary; I hope to make it more user friendly and publish it to my GitHub.

Realized GARCH-DCC Model

In this project we study the realized GARCH-DCC model. The model is designed to forecast volatility and correlations among symbols, and it takes high-frequency data into account.

Specifically, we let $r^i_t$ denote the log return $\log P^i_t - \log P^i_{t-1}$ of stock $i$, $\sigma^i_t$ the conditional volatility of stock $i$, $z_t^i$ the shock of stock $i$, and $C_t$ the conditional correlation matrix of the $z_t^i$. The model can be described as

\[\begin{aligned} &r_t^i = \mu^i + \sigma_t^i z_t^i \\ &\log \text{RV}_t^i = \xi^i + \phi^i \log (\sigma_{t}^i)^2 + \tau^i_1 z_t^i + \tau^i_2 \left((z_t^i)^2-1\right) + u_t^i\\ &(\sigma_{t+1}^i)^2 = \omega^i + \beta^i (\sigma_{t}^i)^2 + \gamma^i \text{RV}_t^i \\ &C_t = \text{diag}(Q_t)^{-1/2}\, Q_t\, \text{diag}(Q_t)^{-1/2} \\ &Q_{t+1} = S(1-a-b) + a\, z_t z_t^\top + b\, Q_t \\ &z_t \sim \mathcal{N}(0, C_t) \\ &u_t \sim \mathcal{N}(0, \Sigma_u) . \end{aligned}\]

The first equation describes the return through the volatility scale and a constant expected mean $\mu^i$. The second equation describes the "measurement" mechanics: the realized variance depends on the hidden true volatility and the shock, plus a measurement noise $u_t$. The $u_t^i$ have zero pairwise correlation but different scales across stocks $i$, so $\Sigma_u$ is a diagonal matrix. The log transformation in the measurement equation makes the noise $u_t$ closer to normally distributed. The third equation is the updating equation for the volatility $\sigma_t^i$; the usual GARCH model updates the volatility with the squared return, but here we use the more accurate estimate, the realized variance. $Q_t$ drives the conditional covariance of $z_t$, and its update equation mean-reverts to the sample estimate $S$. The normalization in the fourth equation turns $Q_t$ into a proper correlation matrix $C_t$ with unit diagonal.
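To make the recursions concrete, here is a minimal single-step sketch of the dynamics in plain NumPy; the parameter values are placeholders for illustration, not fitted values.

import numpy as np

# placeholder parameters for n = 2 stocks (illustration only)
n = 2
omega, beta, gamma = np.full(n, 1e-5), np.full(n, 0.5), np.full(n, 0.3)
a, b = 0.05, 0.9                         # DCC parameters
S = np.array([[1.0, 0.4], [0.4, 1.0]])   # unconditional correlation of z_t

sigma2 = np.full(n, 1e-4)                # current conditional variances
Q = S.copy()                             # current DCC state

# suppose we observe today's realized variances and shocks
RV = np.array([1.2e-4, 0.8e-4])
z = np.array([0.3, -1.1])

# volatility update: (sigma_{t+1})^2 = omega + beta sigma_t^2 + gamma RV_t
sigma2_next = omega + beta * sigma2 + gamma * RV

# DCC update: Q_{t+1} = S(1-a-b) + a z_t z_t' + b Q_t, then normalize
Q_next = S * (1 - a - b) + a * np.outer(z, z) + b * Q
d = 1.0 / np.sqrt(np.diag(Q_next))
C_next = Q_next * np.outer(d, d)         # correlation matrix, unit diagonal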

import numpy as np
import pandas as pd
import pandas_market_calendars as mcal
import datetime 

from scipy.optimize import minimize
from sklearn.linear_model import LinearRegression
from arch import arch_model

import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
import matplotlib.dates as mdates
from matplotlib.ticker import FormatStrFormatter
from statsmodels.graphics.gofplots import qqplot
import matplotlib

from rGARCH_DCC import *


font = { "family": "sans",
         "weight": "normal",
         'size'  : 12}

matplotlib.rc('font', **font)

Load Data

We first load data from https://firstratedata.com/free-intraday-data. The data contain one-minute tick data for AAPL, AMZN, FB, MSFT, TSLA, and the S&P 500 for 2019.

FOLDER = "/Users/bunny_home/Documents/Cunwei/stockData/"
AAPL_FILE = FOLDER + "AAPL_FirstRateDatacom1.txt"
AMZN_FILE = FOLDER + "AMZN_FirstRateDatacom1.txt"
FB_FILE   = FOLDER + "FB_FirstRateDatacom1.txt"
MSFT_FILE = FOLDER + "MSFT_FirstRateDatacom1.txt"
TSLA_FILE = FOLDER + "TSLA_FirstRateDatacom1.txt"
SPX_FILE  = FOLDER + "SPX_Firstratedata1.txt"

FILES = {"AAPL" : AAPL_FILE, 
         "AMZN" : AMZN_FILE, 
         "FB"   : FB_FILE, 
         "MSFT" : MSFT_FILE, 
         "TSLA" : TSLA_FILE,
         "SP500": SPX_FILE}

Since there are missing ticks in the data, we construct a time-tick frame at one-minute frequency and left-join the data onto it. This puts all symbols on the same tick frequency so that they can be joined conveniently.

def get_trading_index(start, end):
    nyse = mcal.get_calendar('NYSE')
    days = nyse.valid_days(start_date = start, end_date = end)
    date_index = []
    
    for d in days:
        # 2019 early-close days (1:00 pm close): July 3, the day after
        # Thanksgiving, and Christmas Eve
        early_close = [datetime.date(d.year, 7, 3),
                       datetime.date(d.year, 11, 29),
                       datetime.date(d.year, 12, 24)]
        if d.date() in early_close:
            open_time = pd.Timestamp(year=d.year, month = d.month, 
                                     day = d.day, hour = 9, minute = 30)
            close_time = pd.Timestamp(year=d.year, month = d.month, 
                                      day = d.day, hour = 13, minute = 0)
        else:
            schedule = nyse.schedule(start_date = d.date(), end_date = d.date(), tz = nyse.tz.zone)
            open_time = schedule["market_open"][0].replace(tzinfo=None)
            close_time = schedule["market_close"][0].replace(tzinfo=None)
        
        # one tick per minute from open to close, inclusive
        date_index += pd.date_range(open_time, close_time, freq = "1T")
        
    return pd.Series(date_index)
        

def load_file(filename, trading_index):
    df = pd.read_csv(filename, header=None)
    df.columns = ["time", "open", "high", "low", "close", "volume"][:len(df.columns)]
    df["time"] = pd.to_datetime(df["time"], format= "%Y-%m-%d %H:%M:%S")
    df = df.set_index("time")
    
    dummy = pd.DataFrame.from_dict({"time":trading_index, 
                                    "dummy": np.ones(len(trading_index)) } )
    dummy = dummy.set_index("time")
    
    df = dummy.join(df)
    df = df.drop("dummy", axis = 1)
    return df


DATA = {}
print ("Generating time ticks ...")
TRADING_MINUTES = get_trading_index("2019-01-01", "2019-12-30")
print ("Ticks number: %d" % (len(TRADING_MINUTES)))
for F in FILES:
    data = load_file(FILES[F], TRADING_MINUTES)
    DATA[F] = data 
    print ("%s\t length: %d, done" %(F, data.shape[0]))
Generating time ticks ...
Ticks number: 97601
AAPL	 length: 97601, done
AMZN	 length: 97601, done
FB	 length: 97601, done
MSFT	 length: 97601, done
TSLA	 length: 97601, done
SP500	 length: 97601, done
combine = []
keys = []
for s in DATA:
    combine.append(DATA[s]["close"])
    keys.append(s) 
prices = pd.concat(combine, axis = 1, keys = keys)
prices
AAPL AMZN FB MSFT TSLA SP500
time
2019-01-02 09:30:00 154.7800 1466.9690 129.7950 99.010 305.9600 2470.40
2019-01-02 09:31:00 155.1597 1469.0000 130.1100 99.250 308.8050 2470.80
2019-01-02 09:32:00 154.8073 1470.6300 130.4329 99.220 306.9370 2471.26
2019-01-02 09:33:00 154.6700 1470.8000 130.4700 99.355 305.1100 2469.64
2019-01-02 09:34:00 154.7500 1473.9806 130.6524 99.410 303.8800 2470.11
... ... ... ... ... ... ...
2019-12-30 15:56:00 291.4700 1846.7900 204.4700 157.585 414.6150 3220.75
2019-12-30 15:57:00 291.4477 1845.5500 204.4200 157.520 414.4712 3219.83
2019-12-30 15:58:00 291.3800 1845.5100 204.3000 157.410 414.3600 3218.84
2019-12-30 15:59:00 291.6470 1847.1800 204.4100 157.680 414.6200 3222.06
2019-12-30 16:00:00 291.6100 1846.8900 204.3600 157.590 414.4700 NaN

97601 rows × 6 columns

Clean NaNs and Create Log Returns

Some ticks are missing in the data, so we should clean them. From the NaN counts we see that most of the NaNs fall on 2019-09-30; since each day should contain 391 points from 9:30 am to 4:00 pm inclusive (6.5 hours × 60 minutes + 1), this means the data for 2019-09-30 are missing almost entirely. We drop that day and forward-fill the few remaining gaps.

def get_Nan_days(data):
    # count NaNs per calendar day; report days where more than one
    # symbol is missing more than one tick
    nans = data.isna()
    nans = nans.groupby(nans.index.date).sum()
    return nans.loc[(nans > 1).sum(axis = 1) > 1]

nan_days = get_Nan_days(prices)
nan_days
AAPL AMZN FB MSFT TSLA SP500
2019-08-28 0 3 0 0 2 1
2019-09-30 0 391 391 391 391 1
2019-10-17 2 2 0 0 0 1
2019-10-21 4 10 4 4 4 1
delete_idx = (prices.index.date==datetime.date(2019,9,30))
prices = prices.loc[~delete_idx]
prices = prices.fillna(method="ffill")
get_Nan_days(prices)
AAPL AMZN FB MSFT TSLA SP500
prices.shape
(97210, 6)

We then construct the log returns at one-minute frequency, differencing within each day so that no overnight returns are created.

# The log return:

returns = np.log(prices).groupby(prices.index.date).diff()
returns
AAPL AMZN FB MSFT TSLA SP500
time
2019-01-02 09:30:00 NaN NaN NaN NaN NaN NaN
2019-01-02 09:31:00 0.002450 0.001384 0.002424 0.002421 0.009256 0.000162
2019-01-02 09:32:00 -0.002274 0.001109 0.002479 -0.000302 -0.006067 0.000186
2019-01-02 09:33:00 -0.000887 0.000116 0.000284 0.001360 -0.005970 -0.000656
2019-01-02 09:34:00 0.000517 0.002160 0.001397 0.000553 -0.004039 0.000190
... ... ... ... ... ... ...
2019-12-30 15:56:00 0.000103 0.000846 0.000489 0.000667 0.000084 0.000096
2019-12-30 15:57:00 -0.000077 -0.000672 -0.000245 -0.000413 -0.000347 -0.000286
2019-12-30 15:58:00 -0.000232 -0.000022 -0.000587 -0.000699 -0.000268 -0.000308
2019-12-30 15:59:00 0.000916 0.000904 0.000538 0.001714 0.000627 0.001000
2019-12-30 16:00:00 -0.000127 -0.000157 -0.000245 -0.000571 -0.000362 0.000000

97210 rows × 6 columns

returns = returns.dropna()
returns.to_csv("logReturns.csv")

Train-Test Split

test_index = returns.index.date >= datetime.date(2019,12,1)
train = returns[~test_index]
test = returns[test_index]

Construct Realized Variance

In calculating the realized variance, the spacing of the data points needs to be chosen. Since our data are spaced at one-minute frequency, we should choose intervals no smaller than that. We calculate the realized variance for different spacings and plot the realized variance against the spacing; we then choose the smallest spacing after which the realized variance stabilizes as the optimal spacing.
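Concretely, for a spacing of $k$ minutes the functions below form $k$-minute returns, sum their squares over the day, and average over the $k$ possible starting offsets (subsampled realized variance), schematically

\[\text{RV}^{(k)} = \frac{1}{k}\sum_{s=0}^{k-1} \sum_{j} \Big( \sum_{m=1}^{k} r_{s+jk+m} \Big)^{2} .\]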

def RV_singleDay(returns, spacing):
    # subsampled realized variance: average the estimate over all
    # possible starting offsets
    res = 0
    for start_tick in range(spacing):
        res += RV_singleDay_test(returns, spacing, start_tick)
    return res/spacing

def RV_singleDay_test(returns, spacing, start_tick):
    # sum of squared `spacing`-minute returns for one starting offset
    sums = returns.rolling(spacing).sum()
    if len(sums.shape) == 1:
        return np.sum((sums.iloc[start_tick+spacing::spacing])**2, axis = 0)
    return np.sum((sums.iloc[start_tick+spacing::spacing,:])**2, axis = 0)

def getRVs(returns, spacing, start_idx = None):
    
    def RVsingle(x):
        return RV_singleDay(x, spacing)
    if start_idx is None:
        return returns.groupby(returns.index.date).apply(RVsingle)
    
    def RVsingle_test(x):
        return RV_singleDay_test(x, spacing, start_idx)
    
    return returns.groupby(returns.index.date).apply(RVsingle_test)

def multipleRVs(returns, spacing):
    res = {}
    for s in returns.columns:
        res[s] = getRVs(returns[s], spacing[s])
    res = pd.DataFrame.from_dict(res, orient = "columns")
    return res

def test_spacing(data, max_spacing, step = 1,  start_idx = None):

    res = {}
    for s in range(1, max_spacing+1, step):
        # was hardcoded to the global `train`; use the `data` argument
        res[s] = getRVs(data, s, start_idx).mean(axis = 0)
    res = pd.DataFrame.from_dict(res, orient = "index")
    res["spacing"] = res.index
    
    return res

RV_tuning = test_spacing(train, 100, start_idx = 0, step=2)
fig, axs = plt.subplots(2, 3, figsize=(16,10))
for i in range(2):
    for j in range(3):
        ax = axs[i][j]
        ax.scatter(RV_tuning.iloc[:,-1], RV_tuning.iloc[:,i*3+j], s = 60)
        #ax.set_yscale("log")
        ax.yaxis.set_major_formatter(FormatStrFormatter('%1.1e'))
        ax.title.set_text(RV_tuning.columns[i*3+j])
plt.savefig("RVspacing.png")
plt.show()

(figure: realized variance vs. spacing, one panel per symbol; saved as RVspacing.png)

From the above plot, we choose the smallest spacing after which the realized variance stabilizes. This corresponds to the values listed below, and we can then transform the data to get the daily realized variance.

SPACING = {
    "AAPL" : 40,
    "AMZN" : 30,
    "FB"   : 10,
    "MSFT" : 10,
    "TSLA" : 40, 
    "SP500": 10
}

def collapse_daily(returns, spacing=SPACING):
    RVs = multipleRVs(returns, spacing)
    returns = returns.groupby(returns.index.date).sum()
    index = pd.MultiIndex.from_product([returns.columns, ["RV", "return"] ] )
    data = pd.DataFrame(np.zeros((RVs.shape[0], RVs.shape[1]*2)), 
                        index = RVs.index, columns = index)
    for j in RVs.columns:
        data.loc[:, (j,"return")] = returns[j]
        data.loc[:, (j, "RV")] = RVs[j]
    return data
    
train_daily = collapse_daily(train)
test_daily = collapse_daily(test)

index = (train_daily["TSLA"]["RV"] == 0)
train_daily = train_daily[~index]
train_daily
AAPL AMZN FB MSFT TSLA SP500
RV return RV return RV return RV return RV return RV return
2019-01-02 0.000276 0.018817 0.000566 0.047986 0.000478 0.044343 0.000252 0.021878 0.000517 0.012795 0.000143 0.014631
2019-01-03 0.000334 -0.014916 0.000346 -0.017998 0.000446 -0.021329 0.000419 -0.025744 0.000573 -0.024370 0.000222 -0.019205
2019-01-04 0.000240 0.025416 0.000369 0.031095 0.000224 0.023634 0.000381 0.025587 0.000389 0.032885 0.000171 0.022167
2019-01-07 0.000355 -0.002600 0.000272 0.019787 0.000296 0.001024 0.000194 0.005699 0.000679 0.040694 0.000068 0.006271
2019-01-08 0.000114 0.007528 0.000419 -0.002183 0.000299 0.012709 0.000191 -0.003128 0.001019 -0.021433 0.000087 0.001863
... ... ... ... ... ... ... ... ... ... ... ... ...
2019-11-22 0.000038 -0.002365 0.000027 0.004836 0.000064 -0.000905 0.000034 -0.003204 0.000223 -0.020516 0.000007 -0.000595
2019-11-25 0.000022 0.012085 0.000052 0.008161 0.000050 0.000200 0.000012 0.006900 0.000164 -0.014727 0.000004 0.003998
2019-11-26 0.000012 -0.006490 0.000035 0.009007 0.000034 -0.004639 0.000016 0.003198 0.000120 -0.016852 0.000006 0.002352
2019-11-27 0.000011 0.008658 0.000036 0.011019 0.000085 0.011351 0.000013 0.000722 0.000083 0.004661 0.000002 0.002683
2019-11-29 0.000007 0.001948 0.000030 -0.010237 0.000081 0.001410 0.000015 -0.004232 0.000027 -0.000545 0.000003 -0.001396

229 rows × 12 columns

We can visualize the realized variance and the squared log returns.

for s in DATA:
    plt.figure()
    plt.title(s)
    plt.plot((train_daily[s]["return"]**2),label="r2")
    plt.plot(train_daily[s]["RV"],label="RV")
    plt.legend(loc = "best")
    plt.show()

(figures: daily realized variance and squared daily returns, one plot per symbol)

We then fit the individual realized GARCH model for each symbol.

GARCHs = {}
PARAMs = {}
scale = 1e3
for symbol in DATA:
    print ("fit symbol "+ str(symbol) + " ...")
    returns = train_daily[symbol]["return"]
    RVs = train_daily[symbol]["RV"]
    rl_garch = RealizedGARCH(returns, RVs, scale = scale)
    rl_garch.fit(verbose = 0)
    GARCHs[symbol] = rl_garch
    params = pd.concat( (rl_garch.params, rl_garch.measure_params) ) 
    PARAMs[symbol] = params
    
PARAMs = pd.DataFrame.from_dict(PARAMs, orient="columns")
PARAMs
fit symbol AAPL ...
	 best L1 = -650.558
fit symbol AMZN ...
	 best L1 = -668.673
fit symbol FB ...
	 best L1 = -711.490
fit symbol MSFT ...
	 best L1 = -636.488
fit symbol TSLA ...
	 best L1 = -816.116
fit symbol SP500 ...
	 best L1 = -498.618
AAPL AMZN FB MSFT TSLA SP500
omega 0.000035 0.000013 0.000012 0.000011 0.000241 0.000003
beta 0.306265 0.804552 0.932296 0.665771 0.000000 0.415299
gamma 0.498232 0.112582 0.000000 0.248973 0.762372 0.502016
mu 0.001039 0.000049 0.000246 0.000175 0.000450 0.000387
xi 0.221838 7.076395 153.648558 2.076153 -1.130535 -0.787398
phi 1.084924 1.846474 18.791195 1.256567 0.931592 0.943701
tau1 -0.101627 -0.092318 -0.105260 -0.072026 -0.026562 -0.174259
tau2 0.244832 0.176118 0.134127 0.156663 0.149789 0.153689
sigmaU 0.650246 0.584097 0.490212 0.444128 0.566098 0.554517
def realizedGarch_log_likelihood():
    Likelihoods = {}
    for symbol in GARCHs:
        Likelihoods[symbol] = GARCHs[symbol].train_log_likelihood()
    Likelihoods = pd.DataFrame.from_dict(Likelihoods, orient= "index")
    Likelihoods.columns = ["log(L)"]
    Likelihoods = Likelihoods.transpose()
    return Likelihoods

def get_garchs():
    original_garches = {}
    for symbol in DATA:
        am = arch_model(1000*train_daily[symbol]["return"], mean='Constant', vol='garch')
        res = am.fit(disp="off")
        original_garches[symbol] = res
    
    return original_garches

def arch_log_likelihood(arches, scale = 1):
    ans = {}
    for symbol in arches:
        arch_res = arches[symbol]
        vol = (arch_res.conditional_volatility/scale)**2
        resid = arch_res.std_resid 
        loglikelihood = -0.5*np.sum(np.log(2*np.pi*vol) + resid**2)
        ans[symbol] = loglikelihood
    ans = pd.DataFrame.from_dict(ans, orient="index")
    ans.columns=["log(L)"]
    ans = ans.transpose()
    return ans


We fit our realized GARCH model and the classical GARCH separately and compare them by their likelihoods. Our realized model achieves a better value for AAPL, MSFT, TSLA, and SP500. The fits for AMZN and FB do not seem to work as well.
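To keep the comparison like-for-like, both log(L) rows below report the Gaussian partial log-likelihood of the returns alone, with the realized GARCH's measurement equation excluded; this is what arch_log_likelihood computes above, and train_log_likelihood is assumed to report the analogous quantity:

\[\log L_r = -\frac{1}{2}\sum_t \left( \log(2\pi \sigma_t^2) + \frac{(r_t - \mu)^2}{\sigma_t^2} \right) .\]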

realizedGarch_log_likelihood()
AAPL AMZN FB MSFT TSLA SP500
log(L) 720.881208 702.766017 659.948803 734.951069 555.322595 872.821334
arch_log_likelihood(get_garchs(), scale = 1000)
AAPL AMZN FB MSFT TSLA SP500
log(L) 714.535985 708.639553 662.078304 734.394404 549.42429 867.143143

From the plot below it is clear why FB and AMZN do not show a higher likelihood: the fitted conditional variance $\sigma^2$ does not follow the realized variance. In particular, the FB fit has $\gamma = 0$, so the realized variance never enters its volatility update, and the AMZN fit has a very small $\gamma$, so it is very insensitive to changes in the market (the rise or fall of realized variance).
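To see why this matters, set $\gamma^i = 0$ in the volatility update: the recursion degenerates to

\[(\sigma_{t+1}^i)^2 = \omega^i + \beta^i (\sigma_{t}^i)^2 ,\]

a deterministic recursion that decays toward the fixed point $\omega^i/(1-\beta^i)$ no matter what the market does.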

fig, axs = plt.subplots(2, 3, figsize=(16,10))
for i in range(2):
    for j in range(3):
        ax = axs[i][j]
        symbol = RV_tuning.columns[i*3+j]
        GARCHs[symbol].plot(ax= ax)
        ax.xaxis.set_major_locator(MaxNLocator(5)) 
        ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
        ax.xaxis.set_minor_formatter(mdates.DateFormatter("%Y-%m"))
        ax.yaxis.set_major_formatter(FormatStrFormatter('%1.1g'))
        ax.title.set_text(symbol)

(figure: fitted conditional variance vs. realized variance, one panel per symbol)

def plot_z(Garchs):
    fig, axs = plt.subplots(nrows=2, ncols=3, figsize=(16,8))
    fig.suptitle(r"QQ plot for $z_t$")
    plt.subplots_adjust(wspace = 0.6)
    axs = axs.reshape(-1)
    for j, stock in enumerate(Garchs):
        ax1 = axs[j]
        ax1.set_title(stock)
        qqplot(Garchs[stock].std(), line = "s", ax = ax1)
        ax1.set_xlabel("")
    plt.savefig("zt.png")
def plot_u(Garchs):
    fig, axs = plt.subplots(nrows=2, ncols=3, figsize=(16,8))
    fig.suptitle(r"QQ plot for $u_t$")
    plt.subplots_adjust(wspace = 0.6)
    axs = axs.reshape(-1)
    for j, stock in enumerate(Garchs):
        ax1 = axs[j]
        ax1.set_title(stock)
        measure_RV = Garchs[stock].measure_RV()
        diff = np.log(measure_RV) - np.log(Garchs[stock].RVs_original[1:])
        qqplot(diff, line = "q", ax = ax1)
        ax1.set_xlabel("")
    plt.savefig("ut.png")

We show QQ plots of the fitted residuals $z_t$ and the measurement-equation residuals $u_t$. Both approximately follow the normal distribution, so the distributional assumptions are reasonably well satisfied.

plot_z(GARCHs)
plot_u(GARCHs)

(figure: QQ plots for $z_t$; saved as zt.png)

(figure: QQ plots for $u_t$; saved as ut.png)

DCC Correlations

We fit the rGARCH-DCC model, where the DCC part describes the correlation among the shocks $z_t$ across symbols.
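The log(L) printed in the iterations below is, presumably, the standard DCC correlation-part quasi-likelihood of the standardized shocks, maximized over $a$ and $b$:

\[\log L_C = -\frac{1}{2}\sum_t \left( \log |C_t| + z_t^\top C_t^{-1} z_t - z_t^\top z_t \right) .\]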

rGarch = RGARCH_DCC(train_daily, garch_scale = 1000)
rGarch.fit(verbose = 0)
rGarch.params()
fit AAPL ...
	 best L1 = -650.558
fit AMZN ...
	 best L1 = -668.673
fit FB ...
	 best L1 = -711.490
fit MSFT ...
	 best L1 = -636.488
fit TSLA ...
	 best L1 = -816.116
fit SP500 ...
	 best L1 = -498.618
iteration: 1 	 log(L): 73.840 	 a: 0.38595 	 b:0.61405
iteration: 2 	 log(L): 680.975 	 a: 0.3732 	 b:0.31926
iteration: 3 	 log(L): 672.804 	 a: 0.3509 	 b:0.58848
iteration: 4 	 log(L): 515.326 	 a: 0.31236 	 b:0.68764
iteration: 5 	 log(L): 543.880 	 a: 0.30289 	 b:0.69711
iteration: 6 	 log(L): 579.609 	 a: 0.2883 	 b:0.7117
iteration: 7 	 log(L): 572.286 	 a: 0.29162 	 b:0.70838
iteration: 8 	 log(L): 625.487 	 a: 0.25989 	 b:0.74011
iteration: 9 	 log(L): 533.779 	 a: 0.30642 	 b:0.69358
iteration: 10 	 log(L): 649.872 	 a: 0.19871 	 b:0.80129
iteration: 11 	 log(L): 687.172 	 a: 0.30417 	 b:0.62254
iteration: 12 	 log(L): 687.863 	 a: 0.30861 	 b:0.59438
iteration: 13 	 log(L): 687.865 	 a: 0.30947 	 b:0.59154
iteration: 14 	 log(L): 687.865 	 a: 0.30952 	 b:0.59176
iteration: 15 	 log(L): 687.865 	 a: 0.30948 	 b:0.59181
iteration: 16 	 log(L): 687.865 	 a: 0.30948 	 b:0.59181
rGarch parameters:
            AAPL      AMZN          FB      MSFT      TSLA     SP500
omega   0.000035  0.000013    0.000012  0.000011  0.000241  0.000003
beta    0.306265  0.804552    0.932296  0.665771  0.000000  0.415299
gamma   0.498232  0.112582    0.000000  0.248973  0.762372  0.502016
mu      0.001039  0.000049    0.000246  0.000175  0.000450  0.000387
xi      0.221838  7.076395  153.648558  2.076153 -1.130535 -0.787398
phi     1.084924  1.846474   18.791195  1.256567  0.931592  0.943701
tau1   -0.101627 -0.092318   -0.105260 -0.072026 -0.026562 -0.174259
tau2    0.244832  0.176118    0.134127  0.156663  0.149789  0.153689
sigmaU  0.650246  0.584097    0.490212  0.444128  0.566098  0.554517

DCC parameters:
a    0.309478
b    0.591807
dtype: float64
def plot_predict(RGARCH, test, n_sample = 1000, low_q = 0.025, high_q = 0.975):
    horizon = test.shape[0]
    sigma2_pred, RV_pred = RGARCH.predict_horizon(
                                horizon, 
                                n_sample = n_sample, 
                                low_q = low_q, high_q = high_q)
    sigma2_mean = pd.DataFrame(sigma2_pred[1,...])  # central predicted sigma^2 path
    sigma2_high = pd.DataFrame(RV_pred[2,...])      # upper quantile of predicted RV
    sigma2_low  = pd.DataFrame(RV_pred[0,...])      # lower quantile of predicted RV
    
    sigma2_mean.index = test.index
    sigma2_high.index = test.index
    sigma2_low.index  = test.index
    
    fig, axs = plt.subplots(2, 3, figsize=(16,10))
    fig.suptitle("prediction on in-sample and out-of-sample data")
    
    axs = axs.reshape(-1)
    
    for j, stock in enumerate(RGARCH.stocks):
        test_returns = test[stock]["RV"]
        ax = axs[j]
        RGARCH.GARCHs[stock].plot(ax=ax)
        ax.plot(test_returns, color ='#1f77b4')
        ax.plot(sigma2_mean.iloc[:,j], color = "darkorange")
        ax.fill_between(sigma2_mean.index, sigma2_low.iloc[:,j], 
                sigma2_high.iloc[:,j], color = "darkgrey")
        ax.axvline(x=test.index[0], color = "k", linestyle = '--')
        ax.xaxis.set_major_locator(MaxNLocator(5)) 
        ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
        ax.xaxis.set_minor_formatter(mdates.DateFormatter("%Y-%m"))
        ax.yaxis.set_major_formatter(FormatStrFormatter('%1.1g'))
        ax.title.set_text(stock)
    return fig

In this plot we show the fitted $\sigma^2$ together with the predicted $\sigma^2$ for December. The gray area is the $95\%$ confidence interval for the realized variance. The plots show that the prediction interval contains the realized variance observed in the test period, which indicates a successful forecast.
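As a univariate illustration of how such interval forecasts can be produced, here is a minimal Monte Carlo sketch: simulate the fitted recursion forward with fresh normal shocks and take pointwise quantiles. The function and its parameter names are hypothetical, not the package's predict_horizon API.

import numpy as np

def simulate_rv_quantiles(sigma2_0, omega, beta, gamma, xi, phi,
                          tau1, tau2, sigma_u, horizon,
                          n_sample=1000, qs=(0.025, 0.5, 0.975)):
    # simulate `n_sample` forward paths of one symbol's realized GARCH
    # and return pointwise RV quantiles, shape (len(qs), horizon)
    rng = np.random.default_rng(0)
    rv_paths = np.empty((n_sample, horizon))
    sigma2 = np.full(n_sample, sigma2_0)
    for t in range(horizon):
        z = rng.standard_normal(n_sample)
        u = sigma_u * rng.standard_normal(n_sample)
        # measurement: log RV_t = xi + phi log sigma_t^2 + tau1 z + tau2 (z^2 - 1) + u
        rv = np.exp(xi + phi * np.log(sigma2) + tau1 * z + tau2 * (z**2 - 1) + u)
        rv_paths[:, t] = rv
        # update: sigma_{t+1}^2 = omega + beta sigma_t^2 + gamma RV_t
        sigma2 = omega + beta * sigma2 + gamma * rv
    return np.quantile(rv_paths, qs, axis=0)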

fig = plot_predict(rGarch, test_daily)
fig.savefig("train_test.png")

(figure: in-sample fit and out-of-sample prediction, one panel per symbol; saved as train_test.png)

We design a function that uses a rolling bootstrap to estimate the confidence intervals of the estimated parameters (the function returns bootstrap standard deviations).

def rolling_bootstrap(train, n_bootstrap):
    length = train.shape[0] - n_bootstrap + 1
    GARCH_params = None
    DCC_params = None
    count = 0
    for i in range(n_bootstrap):
        garchDCC = RGARCH_DCC(train.iloc[i:i+length,:], 
                              garch_scale = 1000)
        garchDCC.fit(verbose = 0)
        if GARCH_params is None:
            shape = [n_bootstrap] + list(garchDCC.GARCH_PARAMs.shape)
            GARCH_params = np.zeros(shape=shape)
        if DCC_params is None:
            DCC_params = np.zeros(shape = (n_bootstrap, 2))
        # skip degenerate fits where the DCC persistence b collapses to zero
        if garchDCC.DCC_PARAMs["b"] <= 1.0e-5 :
            continue
        GARCH_params[count,:,:] = garchDCC.GARCH_PARAMs
        DCC_params[count,:] = garchDCC.DCC_PARAMs
        count += 1
        
    # compute the std over the `count` successful fits only
    GARCH_params_std = np.std(GARCH_params[:count,:,:], axis = 0)
    DCC_params_std = np.std(DCC_params[:count,:], axis = 0)
    
    GARCH_params_std = pd.DataFrame(GARCH_params_std)
    GARCH_params_std.index = garchDCC.GARCH_PARAMs.index
    GARCH_params_std.columns = garchDCC.GARCH_PARAMs.columns
    
    DCC_params_std = pd.Series(DCC_params_std)
    DCC_params_std.index = garchDCC.DCC_PARAMs.index
    print (count)
    return GARCH_params_std, DCC_params_std
rGarch.GARCH_PARAMs
AAPL AMZN FB MSFT TSLA SP500
omega 0.000035 0.000013 0.000012 0.000011 0.000241 0.000003
beta 0.306265 0.804552 0.932296 0.665771 0.000000 0.415299
gamma 0.498232 0.112582 0.000000 0.248973 0.762372 0.502016
mu 0.001039 0.000049 0.000246 0.000175 0.000450 0.000387
xi 0.221838 7.076395 153.648558 2.076153 -1.130535 -0.787398
phi 1.084924 1.846474 18.791195 1.256567 0.931592 0.943701
tau1 -0.101627 -0.092318 -0.105260 -0.072026 -0.026562 -0.174259
tau2 0.244832 0.176118 0.134127 0.156663 0.149789 0.153689
sigmaU 0.650246 0.584097 0.490212 0.444128 0.566098 0.554517
rGarch.DCC_PARAMs
a    0.309478
b    0.591807
dtype: float64
def getRV_data(data, weights):
    indexs = data.columns.get_level_values(0).unique()
    returns = np.zeros( (data.shape[0], len(indexs)) )
    for i,symbol in enumerate(indexs):
        returns[:,i] = data[symbol]["return"]
    sigma2 = returns.dot(weights)**2
    sigma2 = pd.Series(sigma2)
    sigma2.index = data.index
    return sigma2

def plot_portfolio_RV(train, test, model, weights, low_q = 0.025,
                     high_q = 0.975, ax = None):
    model_sigma2 = model.get_portfolio_sigma2(weights)
    train_sigma2 = getRV_data(train, weights)  # was hardcoded to the global train_daily
    test_sigma2  = getRV_data(test, weights)   # was hardcoded to the global test_daily
    pred_sigma2  = model.predict_horizon_portfolio(
                            test_sigma2.shape[0], 
                            weights, 
                            low_q = low_q, high_q = high_q)
    pred_sigma2 = pd.DataFrame(pred_sigma2.T)
    pred_sigma2.index = test.index
    if ax is None:
        ax = plt.gca()
    ax.plot(train_sigma2, color = "#1f77b4", label = r"$r^2$")
    ax.plot(test_sigma2, color = "#1f77b4")
    ax.plot(model_sigma2, color = "darkorange", label = r"model $\sigma^2$")
    ax.plot(pred_sigma2.iloc[:,1], color = "darkorange")
    ax.fill_between(pred_sigma2.index, pred_sigma2.iloc[:,0], 
                pred_sigma2.iloc[:,2], color = "darkgrey")
    ax.axvline(x=test.index[0], color = "k", linestyle = '--')
    ax.legend(loc="best")
    return

def plot_six_pack(train, test, model, weights = None, 
                  low_q = 0.025, high_q = 0.975):
    
    symbols = train.columns.get_level_values(0).unique()
    n_symbols = len(symbols)
    
    if weights is None:
        weights = np.random.uniform(size=(6, n_symbols))
    weights = weights/np.sum(weights, axis=1)[:,None]
    
    fig, axs = plt.subplots(2, 3, figsize=(16,10))
    fig.suptitle(r"portfolio $\sigma^2$ estimation")
    
    print (weights)
    print (weights.sum(axis=1))
    axs = axs.reshape(-1)
    for i in range(6):
        ax = axs[i]
        plot_portfolio_RV(train, test, model, weights[i,:], 
                          low_q, high_q, ax = ax)
        ax.xaxis.set_major_locator(MaxNLocator(5)) 
        ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
        ax.xaxis.set_minor_formatter(mdates.DateFormatter("%Y-%m"))
        ax.yaxis.set_major_formatter(FormatStrFormatter('%1.1g'))
        ax.title.set_text(f"portfolio {i+1:d}")
    
    return fig
    
    

We then construct six random portfolios from the 6 symbols we have. We do this to test whether the correlation matrix is estimated well: the portfolio volatility requires both the correlation matrix and the individual volatilities. Below we plot the fitted portfolio $\sigma^2$ and the predicted $\sigma^2$ for December. The blue line is the squared realized portfolio return.
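For weights $w$, the model portfolio variance combines the fitted individual volatilities with the conditional correlation matrix; this is presumably what get_portfolio_sigma2 computes:

\[(\sigma_t^p)^2 = w^\top D_t C_t D_t\, w , \qquad D_t = \text{diag}(\sigma_t^1, \dots, \sigma_t^n) .\]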

fig = plot_six_pack(train_daily, test_daily, rGarch)
plt.savefig("six_pack.png")
[[0.16405074 0.17298219 0.19924118 0.0351977  0.247057   0.18147119]
 [0.12837752 0.19303799 0.01225164 0.04758079 0.17934214 0.43940992]
 [0.10489163 0.25960325 0.0583599  0.26450911 0.0773851  0.23525101]
 [0.09721203 0.31495958 0.05358826 0.04046787 0.3348108  0.15896146]
 [0.14997454 0.26229132 0.18209948 0.11644926 0.19562806 0.09355733]
 [0.0638383  0.15728596 0.12019817 0.25126211 0.2573531  0.15006237]]
[1. 1. 1. 1. 1. 1.]

(figure: portfolio $\sigma^2$ estimation for the six random portfolios; saved as six_pack.png)
