Time Series Terminal Whitepaper
Brief Solutions Ltd, May 2022
The Problem
Is time series A predictive of time series B?
i.e. given all information about A's value up to time t, is it predictive of the conditional distribution of B's value at a future time t + h, with h being the prediction horizon?
Is this relationship causal?
i.e. the presence of A in the prediction model makes the prediction of B's future better, and its absence makes the prediction worse. A computational version of this notion is Granger Causality. For a philosophical overview, refer to Causation.
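For a toy illustration of the computational notion, the sketch below runs a textbook Granger-causality test with statsmodels on synthetic data. It is only a sketch of the general idea, not the engine's actual procedure.

```python
# Toy Granger-causality check on synthetic data (illustrative only).
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
a = rng.standard_normal(500)                        # candidate predictor A
b = np.roll(a, 2) + 0.5 * rng.standard_normal(500)  # B echoes A with a 2-step lag

# grangercausalitytests expects a 2-column array and tests whether the
# SECOND column helps predict the FIRST.
data = np.column_stack([b, a])
results = grangercausalitytests(data, maxlag=4, verbose=False)
for lag, (tests, _) in results.items():
    print(f"lag {lag}: p-value {tests['ssr_ftest'][1]:.4g}")
```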
The Solution
Given a pre-configured collection of time series data, we designed a computational engine that computes the predictive power of any series A for any series B, at a given prediction horizon, by conducting many random experiments. Collating the results yields a causal graph that represents the strengths of predictive power between the series.
For example, following the close of Thu 14 Dec 2023, the computational engine updates all causal relations for the 24-week-ahead horizon. The vicinity around the S&P 500 ETF SPY.US would read like
link: https://tsterm.com/?q=SPY.US&h=24w&asof=2023-12-14
so it is computed that the history of values (in this case, the price of the stock) of Microsoft MSFT.US up to Thu 14 Dec 2023 is predictive of the value of the S&P 500 ETF SPY.US 24 weeks later, on Thu 30 May 2024.
Once the causal predictors are selected, we train a group of models, each predicting the complete conditional distribution of the target time series' value at the forecast horizon; the predictions can be aggregated in different ways for practical use.
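As a sketch of what one such model could look like, the example below fits quantile regressors (scikit-learn's gradient boosting with quantile loss) on synthetic data; the features, target and quantile grid are assumptions made for the illustration, not the engine's actual design.

```python
# Sketch: one assessor predicting a conditional distribution at three
# probability levels via quantile regression (synthetic data).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 5))        # features built from causal predictors
y = X[:, 0] + rng.standard_normal(1000)   # target value change at the horizon

quantiles = [0.25, 0.50, 0.75]
models = {q: GradientBoostingRegressor(loss="quantile", alpha=q).fit(X, y)
          for q in quantiles}

x_new = rng.standard_normal((1, 5))       # features as of the calculation time
print({q: round(float(m.predict(x_new)[0]), 3) for q, m in models.items()})
```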
For example, for a certain target instrument and a certain target date, two assessors, Model 1 and Model 2, predicted the value change (from now to the target date) at the 25%-, 50%- and 75%-probability levels as below:
|                       | 25% | 50%  | 75%  | Expectation        |
|-----------------------|-----|------|------|--------------------|
| Model 1               | -14 | 3    | 17   | 2 (greater than 0) |
| Model 2               | -20 | -4   | 6    | -6 (less than 0)   |
| Averaged Distribution | -17 | -0.5 | 11.5 |                    |
Model 1 seems more "optimistic" than Model 2, predicting a higher value change at each probability level. For each probability level, if we average the two predicted numbers across the models, we obtain the results in the last row, "Averaged Distribution". On tsterm.com the averaged conditional distribution is plotted on charts.
On the other hand, if Model 1 is to take an action solely based on its own predicted conditional distribution, one way is to first compute the expectation, as the average of its three predicted numbers. Depending on whether the expectation is greater or less than zero, it can vote "up" or "down", take a long or short trading position, etc.
In the above example, one of the two models votes "up" (Model 1, with expectation 2) and one votes "down" (Model 2, with expectation -6). The net prediction is then neutral: the models vote neither "up" nor "down".
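The numbers in the table and the tally can be reproduced mechanically, as in the sketch below.

```python
# Reproducing the example: average the quantile predictions across models,
# take each model's expectation as the mean of its three quantiles, and
# tally the up/down votes.
import numpy as np

preds = {"Model 1": [-14, 3, 17], "Model 2": [-20, -4, 6]}

averaged = np.mean(list(preds.values()), axis=0)
print("Averaged distribution:", averaged)          # [-17.  -0.5 11.5]

votes = {name: "up" if np.mean(q) > 0 else "down" for name, q in preds.items()}
print("Votes:", votes)                             # Model 1 up, Model 2 down

net = sum(+1 if v == "up" else -1 for v in votes.values())
print("Net prediction:", net)                      # 0: neither up nor down
```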
Goodness of the Probabilistic Prediction
Regardless of the nature of the time series data set, we can generically test the following hypothesis:
The realised value follows the predicted conditional distribution well.
We denote by realised probability level the amount of probability mass in the predicted conditional distribution that lies below the realised value. It is always a value between 0 and 1. One may prove that if the prediction is well made, the realised probability level is distributed uniformly between 0 and 1: it cannot hover most often around any particular area between 0 and 1. Our hypothesis is therefore equivalent to the hypothesis:
The realised probability level follows the uniform distribution between 0 and 1.
for which we take a random sample of dates, look up the respective realised values and predicted conditional distributions, and compute the realised probability levels. A statistical test then produces a p-value for how well the realised probability levels are distributed uniformly between 0 and 1.
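Assuming each predicted conditional distribution is represented by a set of scenario values, the realised probability level can be computed as below; the scenarios here are synthetic, purely for illustration.

```python
# Sketch: realised probability level of one prediction, when the predicted
# conditional distribution is represented by scenario values.
import numpy as np

def realised_probability_level(scenarios, realised_value):
    """Fraction of predicted scenario values below the realised value."""
    return float(np.mean(np.asarray(scenarios) < realised_value))

rng = np.random.default_rng(2)
scenarios = rng.normal(loc=430.0, scale=15.0, size=1000)  # hypothetical scenarios
print(realised_probability_level(scenarios, 425.0))        # a value in [0, 1]
```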
For example, for the prediction of the S&P 500 ETF SPY.US at the 24-week-ahead horizon, let us first look at two target dates, Fri 2023-10-20 and Fri 2023-11-24. On the former date the price was going down; on the latter it was going up.
For Fri 2023-10-20, the prediction was made 24 weeks before, on Fri 2023-05-05. The realisation 421.19 falls in the lower 50% of predicted outcome scenarios. Precisely, 42% of the predicted values were below the realisation 421.19; the realised probability level was thus 42%, or 0.42.
For Fri 2023-11-24, the prediction was made 24 weeks before, on Fri 2023-06-09. The realisation 455.3 falls in the upper 50% of predicted outcome scenarios. Precisely, 56% of the predicted values were below the realisation 455.3; the realised probability level was thus 56%, or 0.56.
We collect a random sample of 20 realised probability levels, including the above two values 0.42 and 0.56, and plot their empirical distribution alongside the probability density function of the uniform distribution, which is a constant probability density of 1 for any value between 0 and 1. Could the sampled realised probability levels have come from the uniform distribution between 0 and 1?
Let us run a statistical test which produces a p-value as an indicator of the goodness of fit of the sample, under the hypothesis that it did come from the uniform distribution. The p-value of any statistical test is always a value between 0 and 1. For our sample above, we obtain a p-value of 0.0013.
How high should the p-value be for us not to reject the hypothesis that the realised probability levels follow the uniform distribution between 0 and 1, or equivalently, that the predicted conditional distribution is good? In practice people have different preferences, e.g. 0.05, 0.01, 0.001, etc.
For example, if we decide on using the threshold 0.05, when the computed p-value is at least 0.05, we will not reject the hypothesis; when it is below 0.05, we reject it. For our sample, the computed p-value 0.0013 is below 0.05, so we reject the hypothesis that the predicted conditional distribution is good. But if we decide on using the threshold 0.001, we will not reject the hypothesis.
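The whitepaper does not name the specific test; a one-sample Kolmogorov-Smirnov test against the uniform distribution is one standard choice, sketched below on a hypothetical sample of 20 levels (which will not reproduce the p-value 0.0013 above).

```python
# Sketch: testing uniformity of realised probability levels with a one-sample
# Kolmogorov-Smirnov test (one standard choice among several).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
# Hypothetical sample of 20 realised probability levels, clustered low.
levels = np.clip(rng.normal(loc=0.3, scale=0.12, size=20), 0.0, 1.0)

result = stats.kstest(levels, "uniform")   # H0: sample ~ Uniform(0, 1)
print(f"p-value: {result.pvalue:.4g}")

for threshold in (0.05, 0.01, 0.001):
    verdict = "not rejected" if result.pvalue >= threshold else "rejected"
    print(f"threshold {threshold}: hypothesis {verdict}")
```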
Trading Backtest
On financial data we can, in addition, perform a synthetic trading backtest to demonstrate the goodness. We follow the second way of aggregation: each model, from its predicted conditional distribution, votes "up", "down" or "undecided" for the direction of movement from the calculation time to the target forecast time. We then tally the majority vote.
Take for example a 1-month forecast horizon, with the calculation updated daily. Each vote stays in force over its 1-month horizon, so the votes cast daily over the last month all bear on the next day; the next-day position is therefore the average of the majority votes cast daily over the last month.
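One possible reading of this scheme, sketched on synthetic votes (the 21-trading-day month, like the votes themselves, is an assumption of the example):

```python
# Sketch: with a 1-month (~21 trading days) horizon updated daily, the
# next-day position is the average of the majority votes cast over the
# last month (synthetic votes; trading costs and calendars ignored).
import numpy as np

rng = np.random.default_rng(3)
daily_votes = rng.choice([-1, 0, +1], size=252)   # one majority vote per day

window = 21                                       # ~1 month of trading days
positions = np.convolve(daily_votes, np.ones(window) / window, mode="valid")
print(positions[:5])                              # next-day positions in [-1, +1]
```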
With a 2-day-ahead forecast horizon, under the ideal condition of no trading cost, the performances are as follows:
For the Invesco Nasdaq-100 ETF,
For the SPDR S&P 500 ETF,
For the SPDR Dow Jones Industrial Average ETF.
For interest rates, let us switch to a 1-week-ahead forecast horizon:
For the US 5-Year Treasury Yield Rate,
For the US 10-Year Treasury Yield Rate.