Time Series Terminal Whitepaper

Brief Solutions Ltd, May 2022

The Problem

Is time series A predictive of time series B?

i.e. given all information about A's values up to time t, does that information improve the prediction of the conditional distribution of B's value at a future time t+p, where p is the prediction horizon?

Is this relationship causal?

i.e. the presence of A in the prediction model makes the prediction of B's future better, and its absence makes the prediction worse. A computational version of this notion is Granger Causality. For a philosophical overview, refer to Causation.
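As an illustration of the Granger notion only (not the terminal's own procedure), the sketch below runs the standard Granger causality test from statsmodels on two hypothetical aligned series a (the candidate predictor) and b (the target); the function and variable names are assumptions for illustration.

    import pandas as pd
    from statsmodels.tsa.stattools import grangercausalitytests

    def granger_p_value(a: pd.Series, b: pd.Series, max_lag: int = 4) -> float:
        """Smallest p-value over lags 1..max_lag for 'A does not help predict B'."""
        # grangercausalitytests expects a two-column array: [target, candidate predictor].
        data = pd.concat([b, a], axis=1).dropna().to_numpy()
        results = grangercausalitytests(data, maxlag=max_lag)
        # results[lag][0] is a dict of test statistics; take the SSR F-test p-value.
        return min(results[lag][0]["ssr_ftest"][1] for lag in range(1, max_lag + 1))

A small p-value suggests that past values of A carry information about future values of B beyond what B's own past provides.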

The Solution

Random experiments to identify the causal predictors

Given a set of time series data, tsterm.com computes the predictive power of any series A for any series B, at a pre-determined prediction horizon, by conducting many random experiments that compare prediction performance when A is present versus when A is absent.
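A minimal sketch of this random-experiment idea, assuming the other predictors X, the candidate series a and the horizon-shifted target y are plain numpy arrays; the ridge model and the random train/test splits are illustrative choices, not the terminal's actual procedure.

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.metrics import mean_squared_error

    def predictive_lift(X, a, y, n_experiments=100, seed=0):
        """Average error reduction from including `a`, over random experiments."""
        rng = np.random.default_rng(seed)
        n = len(y)
        lifts = []
        for _ in range(n_experiments):
            # Random splits ignore time ordering here, purely for illustration.
            test = rng.choice(n, size=n // 5, replace=False)
            train = np.setdiff1d(np.arange(n), test)
            X_with = np.column_stack([X, a])
            err_without = mean_squared_error(
                y[test], Ridge().fit(X[train], y[train]).predict(X[test]))
            err_with = mean_squared_error(
                y[test], Ridge().fit(X_with[train], y[train]).predict(X_with[test]))
            lifts.append(err_without - err_with)   # positive => A helps
        return float(np.mean(lifts))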

As an example, at the close of Thu 14 Dec 2023, the causal predictors for the S&P 500 ETF SPY.US at a 24-week horizon would read as follows.

Train a group of models

Once the causal predictors are selected, we train a group of models, each to predict a complete conditional distribution of the target time series' value.

For example, as of the 2025-02-18 market close, below are two models' predicted outcomes for SPY.US in 24 weeks (2025-08-05), colored in green and yellow respectively. The average of each model's predicted outcomes (its "expectation") is shown as a circled marker.
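As a sketch of what one such model could look like, the snippet below uses quantile regression to produce a set of predicted outcomes and their average; the gradient-boosting choice, the quantile grid and the variable names are assumptions for illustration only.

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    def predict_distribution(X_train, y_train, x_new,
                             quantiles=np.linspace(0.05, 0.95, 19)):
        """Return predicted outcomes at several quantiles and their average ("expectation")."""
        outcomes = []
        for q in quantiles:
            model = GradientBoostingRegressor(loss="quantile", alpha=q)
            outcomes.append(model.fit(X_train, y_train).predict(x_new.reshape(1, -1))[0])
        outcomes = np.array(outcomes)
        return outcomes, outcomes.mean()   # the mean plays the role of the circled "expectation"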

How to use the results of all models?

We have to "aggregate" them: the predicted outcomes can be pooled together across the models. Alternatively, each model can vote "up" or "down" by comparing its own expectation with the last price; if it has predicted an infinitely wide distribution, it votes "undecided".

In the above example, one of the two models would vote "up" and the other "down": 1/2 = 50% of the models vote "up" and 1/2 = 50% vote "down". The majority vote is therefore 50% - 50% = 0%, i.e. neither "up" nor "down".
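A small sketch of this voting scheme, assuming each model is summarised by its expectation and that an infinitely wide distribution is represented by NaN:

    import numpy as np

    def majority_vote(expectations, last_price):
        """Net vote in [-1, 1]: fraction voting "up" minus fraction voting "down"."""
        votes = []
        for e in expectations:
            if not np.isfinite(e):
                votes.append(0)          # "undecided"
            elif e > last_price:
                votes.append(1)          # "up"
            else:
                votes.append(-1)         # "down"
        up = votes.count(1) / len(votes)
        down = votes.count(-1) / len(votes)
        return up - down

    # Two-model example: one expectation above the last price, one below -> net vote 0.
    print(majority_vote([410.0, 390.0], last_price=400.0))   # 0.0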

Goodness of the Probabilistic Prediction

Whether from a particular model or pooled over all the models, we can test the goodness of the predicted distributions over time:

Does the realised value generally follow the predicted conditional distribution well?

For a given time point, we denote by realised probability level the amount of probability mass in the predicted conditional distribution that lies below the realised value. It is always a value between 0 and 1. One may prove that if the prediction is well done, the realised probability level should be distributed uniformly between 0 and 1; it cannot concentrate in any particular sub-interval between 0 and 1. Testing the goodness of the prediction is therefore equivalent to testing the hypothesis:

The realised probability level follows the uniform distribution between 0 and 1.
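Assuming the predicted conditional distribution is represented by a sample of predicted outcomes (as in the model sketch above), the realised probability level for one target date can be computed as:

    import numpy as np

    def realised_probability_level(predicted_outcomes, realised_value):
        """Fraction of predicted probability mass below the realised value."""
        predicted_outcomes = np.asarray(predicted_outcomes, dtype=float)
        return float(np.mean(predicted_outcomes < realised_value))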

For example, for the prediction of the S&P 500 ETF SPY.US at a 24-week ahead horizon, let us first look at two target dates, Fri 2023-10-20 and Fri 2023-11-24.

For Fri 2023-10-20, the prediction was made 24 weeks before, on Fri 2023-05-05. 42% of the predicted values were below the realisation 421.19; the realised probability level was thus 42%, or 0.42.

For Fri 2023-11-24, the prediction was made 24 weeks before, on Fri 2023-06-09. 56% of the predicted values were below the realisation 455.3; the realised probability level was thus 56%, or 0.56.

For each target date we obtain one realised probability level. Across many target dates, we have a collection of realised probability levels. We plot their empirical distribution as a solid line, together with the probability density function of the uniform distribution (the constant 1) as a dotted line. Can the observed realised probability levels have come from the uniform distribution between 0 and 1?

Let us run a statistical test that produces a p-value as an indicator of the goodness of fit, under the hypothesis that the levels did come from the uniform distribution. A p-value is always a value between 0 and 1, for any statistical test. For our sample above, we obtain a p-value of 0.0013.
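The whitepaper does not name the test; as one common choice, a Kolmogorov-Smirnov test against the uniform distribution on [0, 1] would look like:

    from scipy.stats import kstest

    def uniformity_p_value(levels):
        """p-value for 'the realised probability levels are uniform on [0, 1]'."""
        return kstest(levels, "uniform").pvalue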

One compares the obtained p-value against a threshold, usually 0.05, 0.01 or 0.001. If we use 0.05, the computed p-value is below 0.05, so we reject the hypothesis being tested. But if we use the lower threshold 0.001, we do not.

Trading Backtest

On financial data, we can additionally perform a trading backtest as a test of goodness. One way to build the signal is the majority vote of "up", "down" or "undecided" described above.

Take for example a 1-month forecast horizon. The causal predictors are updated daily, new models are trained gradually, and model votes take place daily. Each signal concerns the coming month, so the next-day position is the average of the signals output over the last month.
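A minimal sketch of this averaging, assuming a pandas Series of daily majority votes in [-1, 1] and an approximate 21-trading-day month; both are assumptions for illustration.

    import pandas as pd

    def next_day_position(signal: pd.Series, horizon_days: int = 21) -> pd.Series:
        """Average the signals issued over the last `horizon_days` trading days."""
        # Each vote concerns the coming month, so today's position averages the
        # still-active votes; shift by one day so the position is tradable tomorrow.
        return signal.rolling(horizon_days, min_periods=1).mean().shift(1)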

With a 2-day ahead forecast horizon, under the ideal condition of no trading cost, the performances would be:

For Invesco Nasdaq-100 ETF,

For SPDR S&P 500 ETF,

For SPDR Dow Jones Industrial Average ETF,

For interest rates, let's switch to a 1-week ahead forecast horizon:

For US 5-Year Treasury Yield Rate,

For US 10-Year Treasury Yield Rate,