
The accompanying report is here.

The package rGARCH_DCC is included here. It is preliminary; I hope to make it more user friendly and publish it to my GitHub.

Realized GARCH-DCC Model

In this project we study the realized GARCH-DCC model. The model is designed to forecast volatility and correlations among symbols, and it takes high-frequency data into account.

Specifically, we let $r^i_t$ denote the log return $\log P^i_t - \log P^i_{t-1}$ of stock $i$, $\sigma^i_t$ the conditional volatility of stock $i$, $z_t^i$ the shock of stock $i$, and $C_t$ the conditional correlation matrix of the $z_t^i$. The model can be described as

\[\begin{aligned} &r_t^i = \mu^i + \sigma_t^i z_t^i \\ &\log \text{RV}_t^i = \xi^i + \phi^i \log (\sigma_{t}^i)^2 + \tau^i_1 z_t^i + \tau^i_2 \left((z_t^i)^2-1\right) + u_t^i\\ &(\sigma_{t+1}^i)^2 = \omega^i + \beta^i (\sigma_{t}^i)^2 + \gamma^i \text{RV}_t^i \\ &C_t = \text{diag}(Q_t)^{-1/2}\, Q_t\, \text{diag}(Q_t)^{-1/2} \\ &Q_{t+1} = S(1-a-b) + a\, z_t z_t^\top + b\, Q_t \\ &z_t \sim \mathcal{N}(0, C_t) \\ &u_t \sim \mathcal{N}(0, \Sigma_u) . \end{aligned}\]

The first equation describes the return through the volatility scale and a constant expected mean $\mu^i$. The second equation describes the "measurement" mechanics: the realized variance depends on the hidden true volatility and the shock, plus a measurement noise $u_t$. The $u_t^i$ have zero pairwise correlation but different scales across stocks $i$, so $\Sigma_u$ is a diagonal matrix. The log transformation in the measurement equation makes the noise $u_t$ closer to normally distributed. The third equation is the updating equation for the volatility $\sigma_t^i$; the usual GARCH model updates the volatility with the squared return, but here we use the more accurate estimate, the realized variance. $Q_t$ drives the conditional covariance of $z_t$, and its update equation mean-reverts to the sample estimate $S$. The normalization in the fourth equation turns $Q_t$ into a proper correlation matrix $C_t$ with unit diagonal.
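To make the recursions concrete, here is a minimal single-step sketch of the dynamics in plain NumPy; the parameter values are placeholders for illustration, not fitted values.

import numpy as np

# placeholder parameters for n = 2 stocks (illustration only)
n = 2
omega, beta, gamma = np.full(n, 1e-5), np.full(n, 0.5), np.full(n, 0.3)
a, b = 0.05, 0.9                         # DCC parameters
S = np.array([[1.0, 0.4], [0.4, 1.0]])   # unconditional correlation of z_t

sigma2 = np.full(n, 1e-4)                # current conditional variances
Q = S.copy()                             # current DCC state

# suppose we observe today's realized variances and shocks
RV = np.array([1.2e-4, 0.8e-4])
z = np.array([0.3, -1.1])

# volatility update: (sigma_{t+1})^2 = omega + beta sigma_t^2 + gamma RV_t
sigma2_next = omega + beta * sigma2 + gamma * RV

# DCC update: Q_{t+1} = S(1-a-b) + a z_t z_t' + b Q_t, then normalize
Q_next = S * (1 - a - b) + a * np.outer(z, z) + b * Q
d = 1.0 / np.sqrt(np.diag(Q_next))
C_next = Q_next * np.outer(d, d)         # correlation matrix, unit diagonal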

import numpy as np
import pandas as pd
import pandas_market_calendars as mcal
import datetime 

from scipy.optimize import minimize
from sklearn.linear_model import LinearRegression
from arch import arch_model

import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
import matplotlib.dates as mdates
from matplotlib.ticker import FormatStrFormatter
from statsmodels.graphics.gofplots import qqplot
import matplotlib

from rGARCH_DCC import *


font = { "family": "sans",
         "weight": "normal",
         'size'  : 12}

matplotlib.rc('font', **font)

Load Data

We first load data from https://firstratedata.com/free-intraday-data. The data contain one-minute tick data for AAPL, AMZN, FB, MSFT, TSLA, and the S&P 500 for 2019.

FOLDER = "/Users/bunny_home/Documents/Cunwei/stockData/"
AAPL_FILE = FOLDER + "AAPL_FirstRateDatacom1.txt"
AMZN_FILE = FOLDER + "AMZN_FirstRateDatacom1.txt"
FB_FILE   = FOLDER + "FB_FirstRateDatacom1.txt"
MSFT_FILE = FOLDER + "MSFT_FirstRateDatacom1.txt"
TSLA_FILE = FOLDER + "TSLA_FirstRateDatacom1.txt"
SPX_FILE  = FOLDER + "SPX_Firstratedata1.txt"

FILES = {"AAPL" : AAPL_FILE, 
         "AMZN" : AMZN_FILE, 
         "FB"   : FB_FILE, 
         "MSFT" : MSFT_FILE, 
         "TSLA" : TSLA_FILE,
         "SP500": SPX_FILE}

Since there are missing ticks in the data, we construct a time-tick frame at one-minute frequency and left-join the data onto it. This puts all symbols on the same tick frequency so that they can be joined conveniently.

def get_trading_index(start, end):
    nyse = mcal.get_calendar('NYSE')
    days = nyse.valid_days(start_date = start, end_date = end)
    date_index = []
    
    for d in days:
        # 2019 early-close days (1:00 pm close): July 3, the day after
        # Thanksgiving, and Christmas Eve
        early_close = [datetime.date(d.year, 7, 3),
                       datetime.date(d.year, 11, 29),
                       datetime.date(d.year, 12, 24)]
        if d.date() in early_close:
            open_time = pd.Timestamp(year=d.year, month = d.month, 
                                     day = d.day, hour = 9, minute = 30)
            close_time = pd.Timestamp(year=d.year, month = d.month, 
                                      day = d.day, hour = 13, minute = 0)
        else:
            schedule = nyse.schedule(start_date = d.date(), end_date = d.date(), tz = nyse.tz.zone)
            open_time = schedule["market_open"][0].replace(tzinfo=None)
            close_time = schedule["market_close"][0].replace(tzinfo=None)
        
        # one tick per minute from open to close, inclusive
        date_index += pd.date_range(open_time, close_time, freq = "1T")
        
    return pd.Series(date_index)
        

def load_file(filename, trading_index):
    df = pd.read_csv(filename, header=None)
    df.columns = ["time", "open", "high", "low", "close", "volume"][:len(df.columns)]
    df["time"] = pd.to_datetime(df["time"], format= "%Y-%m-%d %H:%M:%S")
    df = df.set_index("time")
    
    dummy = pd.DataFrame.from_dict({"time":trading_index, 
                                    "dummy": np.ones(len(trading_index)) } )
    dummy = dummy.set_index("time")
    
    df = dummy.join(df)
    df = df.drop("dummy", axis = 1)
    return df


DATA = {}
print ("Generating time ticks ...")
TRADING_MINUTES = get_trading_index("2019-01-01", "2019-12-30")
print ("Ticks number: %d" % (len(TRADING_MINUTES)))
for F in FILES:
    data = load_file(FILES[F], TRADING_MINUTES)
    DATA[F] = data 
    print ("%s\t length: %d, done" %(F, data.shape[0]))
Generating time ticks ...
Ticks number: 97601
AAPL	 length: 97601, done
AMZN	 length: 97601, done
FB	 length: 97601, done
MSFT	 length: 97601, done
TSLA	 length: 97601, done
SP500	 length: 97601, done
combine = []
keys = []
for s in DATA:
    combine.append(DATA[s]["close"])
    keys.append(s) 
prices = pd.concat(combine, axis = 1, keys = keys)
prices
AAPL AMZN FB MSFT TSLA SP500
time
2019-01-02 09:30:00 154.7800 1466.9690 129.7950 99.010 305.9600 2470.40
2019-01-02 09:31:00 155.1597 1469.0000 130.1100 99.250 308.8050 2470.80
2019-01-02 09:32:00 154.8073 1470.6300 130.4329 99.220 306.9370 2471.26
2019-01-02 09:33:00 154.6700 1470.8000 130.4700 99.355 305.1100 2469.64
2019-01-02 09:34:00 154.7500 1473.9806 130.6524 99.410 303.8800 2470.11
... ... ... ... ... ... ...
2019-12-30 15:56:00 291.4700 1846.7900 204.4700 157.585 414.6150 3220.75
2019-12-30 15:57:00 291.4477 1845.5500 204.4200 157.520 414.4712 3219.83
2019-12-30 15:58:00 291.3800 1845.5100 204.3000 157.410 414.3600 3218.84
2019-12-30 15:59:00 291.6470 1847.1800 204.4100 157.680 414.6200 3222.06
2019-12-30 16:00:00 291.6100 1846.8900 204.3600 157.590 414.4700 NaN

97601 rows × 6 columns

Clean NaNs and Create Log Returns

Some ticks are missing in the data, so we should clean them. From the NaN counts we see that most of the NaNs fall on 2019-09-30; since each day should contain 391 points from 9:30 am to 4:00 pm inclusive (6.5 hours × 60 minutes + 1), this means the data for 2019-09-30 are missing almost entirely. We drop that day and forward-fill the few remaining gaps.

def get_Nan_days(data):
    # count NaNs per calendar day; report days where more than one
    # symbol is missing more than one tick
    nans = data.isna()
    nans = nans.groupby(nans.index.date).sum()
    return nans.loc[(nans > 1).sum(axis = 1) > 1]

nan_days = get_Nan_days(prices)
nan_days
AAPL AMZN FB MSFT TSLA SP500
2019-08-28 0 3 0 0 2 1
2019-09-30 0 391 391 391 391 1
2019-10-17 2 2 0 0 0 1
2019-10-21 4 10 4 4 4 1
delete_idx = (prices.index.date==datetime.date(2019,9,30))
prices = prices.loc[~delete_idx]
prices = prices.fillna(method="ffill")
get_Nan_days(prices)
AAPL AMZN FB MSFT TSLA SP500
prices.shape
(97210, 6)

We then construct the log returns at one-minute frequency, differencing within each day so that no overnight returns are created.

# The log return:

returns = np.log(prices).groupby(prices.index.date).diff()
returns
AAPL AMZN FB MSFT TSLA SP500
time
2019-01-02 09:30:00 NaN NaN NaN NaN NaN NaN
2019-01-02 09:31:00 0.002450 0.001384 0.002424 0.002421 0.009256 0.000162
2019-01-02 09:32:00 -0.002274 0.001109 0.002479 -0.000302 -0.006067 0.000186
2019-01-02 09:33:00 -0.000887 0.000116 0.000284 0.001360 -0.005970 -0.000656
2019-01-02 09:34:00 0.000517 0.002160 0.001397 0.000553 -0.004039 0.000190
... ... ... ... ... ... ...
2019-12-30 15:56:00 0.000103 0.000846 0.000489 0.000667 0.000084 0.000096
2019-12-30 15:57:00 -0.000077 -0.000672 -0.000245 -0.000413 -0.000347 -0.000286
2019-12-30 15:58:00 -0.000232 -0.000022 -0.000587 -0.000699 -0.000268 -0.000308
2019-12-30 15:59:00 0.000916 0.000904 0.000538 0.001714 0.000627 0.001000
2019-12-30 16:00:00 -0.000127 -0.000157 -0.000245 -0.000571 -0.000362 0.000000

97210 rows × 6 columns

returns = returns.dropna()
returns.to_csv("logReturns.csv")

Train-Test Split

test_index = returns.index.date >= datetime.date(2019,12,1)
train = returns[~test_index]
test = returns[test_index]

Construct Realized Variance

In calculating the realized variance, the spacing of the data points needs to be chosen. Since our data are spaced at one-minute frequency, we should choose intervals no smaller than that. We calculate the realized variance for different spacings and plot the realized variance against the spacing; we then choose the smallest spacing after which the realized variance stabilizes as the optimal spacing.
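Concretely, for a spacing of $k$ minutes the functions below form $k$-minute returns, sum their squares over the day, and average over the $k$ possible starting offsets (subsampled realized variance), schematically

\[\text{RV}^{(k)} = \frac{1}{k}\sum_{s=0}^{k-1} \sum_{j} \Big( \sum_{m=1}^{k} r_{s+jk+m} \Big)^{2} .\]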

def RV_singleDay(returns, spacing):
    # subsampled realized variance: average the estimate over all
    # possible starting offsets
    res = 0
    for start_tick in range(spacing):
        res += RV_singleDay_test(returns, spacing, start_tick)
    return res/spacing

def RV_singleDay_test(returns, spacing, start_tick):
    # sum of squared `spacing`-minute returns for one starting offset
    sums = returns.rolling(spacing).sum()
    if len(sums.shape) == 1:
        return np.sum((sums.iloc[start_tick+spacing::spacing])**2, axis = 0)
    return np.sum((sums.iloc[start_tick+spacing::spacing,:])**2, axis = 0)

def getRVs(returns, spacing, start_idx = None):
    
    def RVsingle(x):
        return RV_singleDay(x, spacing)
    if start_idx is None:
        return returns.groupby(returns.index.date).apply(RVsingle)
    
    def RVsingle_test(x):
        return RV_singleDay_test(x, spacing, start_idx)
    
    return returns.groupby(returns.index.date).apply(RVsingle_test)

def multipleRVs(returns, spacing):
    res = {}
    for s in returns.columns:
        res[s] = getRVs(returns[s], spacing[s])
    res = pd.DataFrame.from_dict(res, orient = "columns")
    return res

def test_spacing(data, max_spacing, step = 1,  start_idx = None):

    res = {}
    for s in range(1, max_spacing+1, step):
        # was hardcoded to the global `train`; use the `data` argument
        res[s] = getRVs(data, s, start_idx).mean(axis = 0)
    res = pd.DataFrame.from_dict(res, orient = "index")
    res["spacing"] = res.index
    
    return res

RV_tuning = test_spacing(train, 100, start_idx = 0, step=2)
fig, axs = plt.subplots(2, 3, figsize=(16,10))
for i in range(2):
    for j in range(3):
        ax = axs[i][j]
        ax.scatter(RV_tuning.iloc[:,-1], RV_tuning.iloc[:,i*3+j], s = 60)
        #ax.set_yscale("log")
        ax.yaxis.set_major_formatter(FormatStrFormatter('%1.1e'))
        ax.title.set_text(RV_tuning.columns[i*3+j])
plt.savefig("RVspacing.png")
plt.show()

(figure: realized variance vs. spacing, one panel per symbol; saved as RVspacing.png)

From the above plot, we choose the smallest spacing after which the realized variance stabilizes. This corresponds to the values listed below, and we can then transform the data to get the daily realized variance.

SPACING = {
    "AAPL" : 40,
    "AMZN" : 30,
    "FB"   : 10,
    "MSFT" : 10,
    "TSLA" : 40, 
    "SP500": 10
}

def collapse_daily(returns, spacing=SPACING):
    RVs = multipleRVs(returns, spacing)
    returns = returns.groupby(returns.index.date).sum()
    index = pd.MultiIndex.from_product([returns.columns, ["RV", "return"] ] )
    data = pd.DataFrame(np.zeros((RVs.shape[0], RVs.shape[1]*2)), 
                        index = RVs.index, columns = index)
    for j in RVs.columns:
        data.loc[:, (j,"return")] = returns[j]
        data.loc[:, (j, "RV")] = RVs[j]
    return data
    
train_daily = collapse_daily(train)
test_daily = collapse_daily(test)

index = (train_daily["TSLA"]["RV"] == 0)
train_daily = train_daily[~index]
train_daily
AAPL AMZN FB MSFT TSLA SP500
RV return RV return RV return RV return RV return RV return
2019-01-02 0.000276 0.018817 0.000566 0.047986 0.000478 0.044343 0.000252 0.021878 0.000517 0.012795 0.000143 0.014631
2019-01-03 0.000334 -0.014916 0.000346 -0.017998 0.000446 -0.021329 0.000419 -0.025744 0.000573 -0.024370 0.000222 -0.019205
2019-01-04 0.000240 0.025416 0.000369 0.031095 0.000224 0.023634 0.000381 0.025587 0.000389 0.032885 0.000171 0.022167
2019-01-07 0.000355 -0.002600 0.000272 0.019787 0.000296 0.001024 0.000194 0.005699 0.000679 0.040694 0.000068 0.006271
2019-01-08 0.000114 0.007528 0.000419 -0.002183 0.000299 0.012709 0.000191 -0.003128 0.001019 -0.021433 0.000087 0.001863
... ... ... ... ... ... ... ... ... ... ... ... ...
2019-11-22 0.000038 -0.002365 0.000027 0.004836 0.000064 -0.000905 0.000034 -0.003204 0.000223 -0.020516 0.000007 -0.000595
2019-11-25 0.000022 0.012085 0.000052 0.008161 0.000050 0.000200 0.000012 0.006900 0.000164 -0.014727 0.000004 0.003998
2019-11-26 0.000012 -0.006490 0.000035 0.009007 0.000034 -0.004639 0.000016 0.003198 0.000120 -0.016852 0.000006 0.002352
2019-11-27 0.000011 0.008658 0.000036 0.011019 0.000085 0.011351 0.000013 0.000722 0.000083 0.004661 0.000002 0.002683
2019-11-29 0.000007 0.001948 0.000030 -0.010237 0.000081 0.001410 0.000015 -0.004232 0.000027 -0.000545 0.000003 -0.001396

229 rows × 12 columns

We can visualize the realized variance and the squared log returns.

for s in DATA:
    plt.figure()
    plt.title(s)
    plt.plot((train_daily[s]["return"]**2),label="r2")
    plt.plot(train_daily[s]["RV"],label="RV")
    plt.legend(loc = "best")
    plt.show()

(figures: daily realized variance and squared daily returns, one plot per symbol)

We then fit the individual realized GARCH model for each symbol.

GARCHs = {}
PARAMs = {}
scale = 1e3
for symbol in DATA:
    print ("fit symbol "+ str(symbol) + " ...")
    returns = train_daily[symbol]["return"]
    RVs = train_daily[symbol]["RV"]
    rl_garch = RealizedGARCH(returns, RVs, scale = scale)
    rl_garch.fit(verbose = 0)
    GARCHs[symbol] = rl_garch
    params = pd.concat( (rl_garch.params, rl_garch.measure_params) ) 
    PARAMs[symbol] = params
    
PARAMs = pd.DataFrame.from_dict(PARAMs, orient="columns")
PARAMs
fit symbol AAPL ...
	 best L1 = -650.558
fit symbol AMZN ...
	 best L1 = -668.673
fit symbol FB ...
	 best L1 = -711.490
fit symbol MSFT ...
	 best L1 = -636.488
fit symbol TSLA ...
	 best L1 = -816.116
fit symbol SP500 ...
	 best L1 = -498.618
AAPL AMZN FB MSFT TSLA SP500
omega 0.000035 0.000013 0.000012 0.000011 0.000241 0.000003
beta 0.306265 0.804552 0.932296 0.665771 0.000000 0.415299
gamma 0.498232 0.112582 0.000000 0.248973 0.762372 0.502016
mu 0.001039 0.000049 0.000246 0.000175 0.000450 0.000387
xi 0.221838 7.076395 153.648558 2.076153 -1.130535 -0.787398
phi 1.084924 1.846474 18.791195 1.256567 0.931592 0.943701
tau1 -0.101627 -0.092318 -0.105260 -0.072026 -0.026562 -0.174259
tau2 0.244832 0.176118 0.134127 0.156663 0.149789 0.153689
sigmaU 0.650246 0.584097 0.490212 0.444128 0.566098 0.554517
def realizedGarch_log_likelihood():
    Likelihoods = {}
    for symbol in GARCHs:
        Likelihoods[symbol] = GARCHs[symbol].train_log_likelihood()
    Likelihoods = pd.DataFrame.from_dict(Likelihoods, orient= "index")
    Likelihoods.columns = ["log(L)"]
    Likelihoods = Likelihoods.transpose()
    return Likelihoods

def get_garchs():
    original_garches = {}
    for symbol in DATA:
        am = arch_model(1000*train_daily[symbol]["return"], mean='Constant', vol='garch')
        res = am.fit(disp="off")
        original_garches[symbol] = res
    
    return original_garches

def arch_log_likelihood(arches, scale = 1):
    ans = {}
    for symbol in arches:
        arch_res = arches[symbol]
        vol = (arch_res.conditional_volatility/scale)**2
        resid = arch_res.std_resid 
        loglikelihood = -0.5*np.sum(np.log(2*np.pi*vol) + resid**2)
        ans[symbol] = loglikelihood
    ans = pd.DataFrame.from_dict(ans, orient="index")
    ans.columns=["log(L)"]
    ans = ans.transpose()
    return ans


We fit our realized GARCH model and the classical GARCH separately and compare them by their likelihoods. Our realized model achieves a better value for AAPL, MSFT, TSLA, and SP500. The fits for AMZN and FB do not seem to work as well.
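To keep the comparison like-for-like, both log(L) rows below report the Gaussian partial log-likelihood of the returns alone, with the realized GARCH's measurement equation excluded; this is what arch_log_likelihood computes above, and train_log_likelihood is assumed to report the analogous quantity:

\[\log L_r = -\frac{1}{2}\sum_t \left( \log(2\pi \sigma_t^2) + \frac{(r_t - \mu)^2}{\sigma_t^2} \right) .\]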

realizedGarch_log_likelihood()
AAPL AMZN FB MSFT TSLA SP500
log(L) 720.881208 702.766017 659.948803 734.951069 555.322595 872.821334
arch_log_likelihood(get_garchs(), scale = 1000)
AAPL AMZN FB MSFT TSLA SP500
log(L) 714.535985 708.639553 662.078304 734.394404 549.42429 867.143143

From the plot below it is clear why FB and AMZN do not show a higher likelihood: the fitted conditional variance $\sigma^2$ does not follow the realized variance. In particular, the FB fit has $\gamma = 0$, so the realized variance never enters its volatility update, and the AMZN fit has a very small $\gamma$, so it is very insensitive to changes in the market (the rise or fall of realized variance).
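To see why this matters, set $\gamma^i = 0$ in the volatility update: the recursion degenerates to

\[(\sigma_{t+1}^i)^2 = \omega^i + \beta^i (\sigma_{t}^i)^2 ,\]

a deterministic recursion that decays toward the fixed point $\omega^i/(1-\beta^i)$ no matter what the market does.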

fig, axs = plt.subplots(2, 3, figsize=(16,10))
for i in range(2):
    for j in range(3):
        ax = axs[i][j]
        symbol = RV_tuning.columns[i*3+j]
        GARCHs[symbol].plot(ax= ax)
        ax.xaxis.set_major_locator(MaxNLocator(5)) 
        ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
        ax.xaxis.set_minor_formatter(mdates.DateFormatter("%Y-%m"))
        ax.yaxis.set_major_formatter(FormatStrFormatter('%1.1g'))
        ax.title.set_text(symbol)

(figure: fitted conditional variance vs. realized variance, one panel per symbol)

def plot_z(Garchs):
    fig, axs = plt.subplots(nrows=2, ncols=3, figsize=(16,8))
    fig.suptitle(r"QQ plot for $z_t$")
    plt.subplots_adjust(wspace = 0.6)
    axs = axs.reshape(-1)
    for j, stock in enumerate(Garchs):
        ax1 = axs[j]
        ax1.set_title(stock)
        qqplot(Garchs[stock].std(), line = "s", ax = ax1)
        ax1.set_xlabel("")
    plt.savefig("zt.png")
def plot_u(Garchs):
    fig, axs = plt.subplots(nrows=2, ncols=3, figsize=(16,8))
    fig.suptitle(r"QQ plot for $u_t$")
    plt.subplots_adjust(wspace = 0.6)
    axs = axs.reshape(-1)
    for j, stock in enumerate(Garchs):
        ax1 = axs[j]
        ax1.set_title(stock)
        measure_RV = Garchs[stock].measure_RV()
        diff = np.log(measure_RV) - np.log(Garchs[stock].RVs_original[1:])
        qqplot(diff, line = "q", ax = ax1)
        ax1.set_xlabel("")
    plt.savefig("ut.png")

We show QQ plots of the fitted residuals $z_t$ and the measurement-equation residuals $u_t$. Both approximately follow the normal distribution, so the distributional assumptions are reasonably well satisfied.

plot_z(GARCHs)
plot_u(GARCHs)

(figure: QQ plots for $z_t$; saved as zt.png)

(figure: QQ plots for $u_t$; saved as ut.png)

DCC Correlations

We fit the rGARCH-DCC model, where the DCC part describes the correlation among the shocks $z_t$ across symbols.
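The log(L) printed in the iterations below is, presumably, the standard DCC correlation-part quasi-likelihood of the standardized shocks, maximized over $a$ and $b$:

\[\log L_C = -\frac{1}{2}\sum_t \left( \log |C_t| + z_t^\top C_t^{-1} z_t - z_t^\top z_t \right) .\]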

rGarch = RGARCH_DCC(train_daily, garch_scale = 1000)
rGarch.fit(verbose = 0)
rGarch.params()
fit AAPL ...
	 best L1 = -650.558
fit AMZN ...
	 best L1 = -668.673
fit FB ...
	 best L1 = -711.490
fit MSFT ...
	 best L1 = -636.488
fit TSLA ...
	 best L1 = -816.116
fit SP500 ...
	 best L1 = -498.618
iteration: 1 	 log(L): 73.840 	 a: 0.38595 	 b:0.61405
iteration: 2 	 log(L): 680.975 	 a: 0.3732 	 b:0.31926
iteration: 3 	 log(L): 672.804 	 a: 0.3509 	 b:0.58848
iteration: 4 	 log(L): 515.326 	 a: 0.31236 	 b:0.68764
iteration: 5 	 log(L): 543.880 	 a: 0.30289 	 b:0.69711
iteration: 6 	 log(L): 579.609 	 a: 0.2883 	 b:0.7117
iteration: 7 	 log(L): 572.286 	 a: 0.29162 	 b:0.70838
iteration: 8 	 log(L): 625.487 	 a: 0.25989 	 b:0.74011
iteration: 9 	 log(L): 533.779 	 a: 0.30642 	 b:0.69358
iteration: 10 	 log(L): 649.872 	 a: 0.19871 	 b:0.80129
iteration: 11 	 log(L): 687.172 	 a: 0.30417 	 b:0.62254
iteration: 12 	 log(L): 687.863 	 a: 0.30861 	 b:0.59438
iteration: 13 	 log(L): 687.865 	 a: 0.30947 	 b:0.59154
iteration: 14 	 log(L): 687.865 	 a: 0.30952 	 b:0.59176
iteration: 15 	 log(L): 687.865 	 a: 0.30948 	 b:0.59181
iteration: 16 	 log(L): 687.865 	 a: 0.30948 	 b:0.59181
rGarch parameters:
            AAPL      AMZN          FB      MSFT      TSLA     SP500
omega   0.000035  0.000013    0.000012  0.000011  0.000241  0.000003
beta    0.306265  0.804552    0.932296  0.665771  0.000000  0.415299
gamma   0.498232  0.112582    0.000000  0.248973  0.762372  0.502016
mu      0.001039  0.000049    0.000246  0.000175  0.000450  0.000387
xi      0.221838  7.076395  153.648558  2.076153 -1.130535 -0.787398
phi     1.084924  1.846474   18.791195  1.256567  0.931592  0.943701
tau1   -0.101627 -0.092318   -0.105260 -0.072026 -0.026562 -0.174259
tau2    0.244832  0.176118    0.134127  0.156663  0.149789  0.153689
sigmaU  0.650246  0.584097    0.490212  0.444128  0.566098  0.554517

DCC parameters:
a    0.309478
b    0.591807
dtype: float64
def plot_predict(RGARCH, test, n_sample = 1000, low_q = 0.025, high_q = 0.975):
    horizon = test.shape[0]
    sigma2_pred, RV_pred = RGARCH.predict_horizon(
                                horizon, 
                                n_sample = n_sample, 
                                low_q = low_q, high_q = high_q)
    sigma2_mean = pd.DataFrame(sigma2_pred[1,...])  # central predicted sigma^2 path
    sigma2_high = pd.DataFrame(RV_pred[2,...])      # upper quantile of predicted RV
    sigma2_low  = pd.DataFrame(RV_pred[0,...])      # lower quantile of predicted RV
    
    sigma2_mean.index = test.index
    sigma2_high.index = test.index
    sigma2_low.index  = test.index
    
    fig, axs = plt.subplots(2, 3, figsize=(16,10))
    fig.suptitle("prediction on in-sample and out-of-sample data")
    
    axs = axs.reshape(-1)
    
    for j, stock in enumerate(RGARCH.stocks):
        test_returns = test[stock]["RV"]
        ax = axs[j]
        RGARCH.GARCHs[stock].plot(ax=ax)
        ax.plot(test_returns, color ='#1f77b4')
        ax.plot(sigma2_mean.iloc[:,j], color = "darkorange")
        ax.fill_between(sigma2_mean.index, sigma2_low.iloc[:,j], 
                sigma2_high.iloc[:,j], color = "darkgrey")
        ax.axvline(x=test.index[0], color = "k", linestyle = '--')
        ax.xaxis.set_major_locator(MaxNLocator(5)) 
        ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
        ax.xaxis.set_minor_formatter(mdates.DateFormatter("%Y-%m"))
        ax.yaxis.set_major_formatter(FormatStrFormatter('%1.1g'))
        ax.title.set_text(stock)
    return fig

In this plot we show the fitted $\sigma^2$ together with the predicted $\sigma^2$ for December. The gray area is the $95\%$ confidence interval for the realized variance. The plots show that the prediction interval contains the realized variance observed in the test period, which indicates a successful forecast.
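As a univariate illustration of how such interval forecasts can be produced, here is a minimal Monte Carlo sketch: simulate the fitted recursion forward with fresh normal shocks and take pointwise quantiles. The function and its parameter names are hypothetical, not the package's predict_horizon API.

import numpy as np

def simulate_rv_quantiles(sigma2_0, omega, beta, gamma, xi, phi,
                          tau1, tau2, sigma_u, horizon,
                          n_sample=1000, qs=(0.025, 0.5, 0.975)):
    # simulate `n_sample` forward paths of one symbol's realized GARCH
    # and return pointwise RV quantiles, shape (len(qs), horizon)
    rng = np.random.default_rng(0)
    rv_paths = np.empty((n_sample, horizon))
    sigma2 = np.full(n_sample, sigma2_0)
    for t in range(horizon):
        z = rng.standard_normal(n_sample)
        u = sigma_u * rng.standard_normal(n_sample)
        # measurement: log RV_t = xi + phi log sigma_t^2 + tau1 z + tau2 (z^2 - 1) + u
        rv = np.exp(xi + phi * np.log(sigma2) + tau1 * z + tau2 * (z**2 - 1) + u)
        rv_paths[:, t] = rv
        # update: sigma_{t+1}^2 = omega + beta sigma_t^2 + gamma RV_t
        sigma2 = omega + beta * sigma2 + gamma * rv
    return np.quantile(rv_paths, qs, axis=0)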

fig = plot_predict(rGarch, test_daily)
fig.savefig("train_test.png")

(figure: in-sample fit and out-of-sample prediction, one panel per symbol; saved as train_test.png)

We design a function that uses a rolling bootstrap to estimate the confidence intervals of the estimated parameters (the function returns bootstrap standard deviations).

def rolling_bootstrap(train, n_bootstrap):
    length = train.shape[0] - n_bootstrap + 1
    GARCH_params = None
    DCC_params = None
    count = 0
    for i in range(n_bootstrap):
        garchDCC = RGARCH_DCC(train.iloc[i:i+length,:], 
                              garch_scale = 1000)
        garchDCC.fit(verbose = 0)
        if GARCH_params is None:
            shape = [n_bootstrap] + list(garchDCC.GARCH_PARAMs.shape)
            GARCH_params = np.zeros(shape=shape)
        if DCC_params is None:
            DCC_params = np.zeros(shape = (n_bootstrap, 2))
        # skip degenerate fits where the DCC persistence b collapses to zero
        if garchDCC.DCC_PARAMs["b"] <= 1.0e-5 :
            continue
        GARCH_params[count,:,:] = garchDCC.GARCH_PARAMs
        DCC_params[count,:] = garchDCC.DCC_PARAMs
        count += 1
        
    # compute the std over the `count` successful fits only
    GARCH_params_std = np.std(GARCH_params[:count,:,:], axis = 0)
    DCC_params_std = np.std(DCC_params[:count,:], axis = 0)
    
    GARCH_params_std = pd.DataFrame(GARCH_params_std)
    GARCH_params_std.index = garchDCC.GARCH_PARAMs.index
    GARCH_params_std.columns = garchDCC.GARCH_PARAMs.columns
    
    DCC_params_std = pd.Series(DCC_params_std)
    DCC_params_std.index = garchDCC.DCC_PARAMs.index
    print (count)
    return GARCH_params_std, DCC_params_std
rGarch.GARCH_PARAMs
AAPL AMZN FB MSFT TSLA SP500
omega 0.000035 0.000013 0.000012 0.000011 0.000241 0.000003
beta 0.306265 0.804552 0.932296 0.665771 0.000000 0.415299
gamma 0.498232 0.112582 0.000000 0.248973 0.762372 0.502016
mu 0.001039 0.000049 0.000246 0.000175 0.000450 0.000387
xi 0.221838 7.076395 153.648558 2.076153 -1.130535 -0.787398
phi 1.084924 1.846474 18.791195 1.256567 0.931592 0.943701
tau1 -0.101627 -0.092318 -0.105260 -0.072026 -0.026562 -0.174259
tau2 0.244832 0.176118 0.134127 0.156663 0.149789 0.153689
sigmaU 0.650246 0.584097 0.490212 0.444128 0.566098 0.554517
rGarch.DCC_PARAMs
a    0.309478
b    0.591807
dtype: float64
def getRV_data(data, weights):
    indexs = data.columns.get_level_values(0).unique()
    returns = np.zeros( (data.shape[0], len(indexs)) )
    for i,symbol in enumerate(indexs):
        returns[:,i] = data[symbol]["return"]
    sigma2 = returns.dot(weights)**2
    sigma2 = pd.Series(sigma2)
    sigma2.index = data.index
    return sigma2

def plot_portfolio_RV(train, test, model, weights, low_q = 0.025,
                     high_q = 0.975, ax = None):
    model_sigma2 = model.get_portfolio_sigma2(weights)
    train_sigma2 = getRV_data(train, weights)  # was hardcoded to the global train_daily
    test_sigma2  = getRV_data(test, weights)   # was hardcoded to the global test_daily
    pred_sigma2  = model.predict_horizon_portfolio(
                            test_sigma2.shape[0], 
                            weights, 
                            low_q = low_q, high_q = high_q)
    pred_sigma2 = pd.DataFrame(pred_sigma2.T)
    pred_sigma2.index = test.index
    if ax is None:
        ax = plt.gca()
    ax.plot(train_sigma2, color = "#1f77b4", label = r"$r^2$")
    ax.plot(test_sigma2, color = "#1f77b4")
    ax.plot(model_sigma2, color = "darkorange", label = r"model $\sigma^2$")
    ax.plot(pred_sigma2.iloc[:,1], color = "darkorange")
    ax.fill_between(pred_sigma2.index, pred_sigma2.iloc[:,0], 
                pred_sigma2.iloc[:,2], color = "darkgrey")
    ax.axvline(x=test.index[0], color = "k", linestyle = '--')
    ax.legend(loc="best")
    return

def plot_six_pack(train, test, model, weights = None, 
                  low_q = 0.025, high_q = 0.975):
    
    symbols = train.columns.get_level_values(0).unique()
    n_symbols = len(symbols)
    
    if weights is None:
        weights = np.random.uniform(size=(6, n_symbols))
    weights = weights/np.sum(weights, axis=1)[:,None]
    
    fig, axs = plt.subplots(2, 3, figsize=(16,10))
    fig.suptitle(r"portfolio $\sigma^2$ estimation")
    
    print (weights)
    print (weights.sum(axis=1))
    axs = axs.reshape(-1)
    for i in range(6):
        ax = axs[i]
        plot_portfolio_RV(train, test, model, weights[i,:], 
                          low_q, high_q, ax = ax)
        ax.xaxis.set_major_locator(MaxNLocator(5)) 
        ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
        ax.xaxis.set_minor_formatter(mdates.DateFormatter("%Y-%m"))
        ax.yaxis.set_major_formatter(FormatStrFormatter('%1.1g'))
        ax.title.set_text(f"portfolio {i+1:d}")
    
    return fig
    
    

We then construct six random portfolios from the 6 symbols we have. We do this to test whether the correlation matrix is estimated well: the portfolio volatility requires both the correlation matrix and the individual volatilities. Below we plot the fitted portfolio $\sigma^2$ and the predicted $\sigma^2$ for December. The blue line is the squared realized portfolio return.
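For weights $w$, the model portfolio variance combines the fitted individual volatilities with the conditional correlation matrix; this is presumably what get_portfolio_sigma2 computes:

\[(\sigma_t^p)^2 = w^\top D_t C_t D_t\, w , \qquad D_t = \text{diag}(\sigma_t^1, \dots, \sigma_t^n) .\]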

fig = plot_six_pack(train_daily, test_daily, rGarch)
plt.savefig("six_pack.png")
[[0.16405074 0.17298219 0.19924118 0.0351977  0.247057   0.18147119]
 [0.12837752 0.19303799 0.01225164 0.04758079 0.17934214 0.43940992]
 [0.10489163 0.25960325 0.0583599  0.26450911 0.0773851  0.23525101]
 [0.09721203 0.31495958 0.05358826 0.04046787 0.3348108  0.15896146]
 [0.14997454 0.26229132 0.18209948 0.11644926 0.19562806 0.09355733]
 [0.0638383  0.15728596 0.12019817 0.25126211 0.2573531  0.15006237]]
[1. 1. 1. 1. 1. 1.]

(figure: portfolio $\sigma^2$ estimation for the six random portfolios; saved as six_pack.png)
