Neural Nets, Stock Prices, and the Fourier Transform

Trevor McGuire
10 min read · May 12, 2022


Stock prices are notoriously difficult to forecast. There are various reasons for this, depending on how you approach the problem. These reasons include (but are not limited to) self-similarity, non-periodicity, and the chaotic nature of price movements. But no matter the underlying reasons, one thing remains certain: typical methods, such as ARIMA and the many flavors of RNNs, have proved unreliable at solving this task.

In this article, we’re going to explore a slightly unconventional way of forecasting stock prices. And to do that, we’re going to view price as a waveform.

Representing price as a waveform implies some underlying assumptions; the most important being that price is really just a composite signal made up of a number of sinusoids. To both identify and analyze these constituent sinusoids, we can use the Discrete Fourier Transform (DFT) to decompose our signal. Likewise, we can also use the inverse Discrete Fourier Transform (iDFT) to reconstruct our original signal from these underlying parts.

For those who don’t know, the DFT allows you to translate a signal from the time domain into the frequency domain. And the iDFT does the reverse, meaning it allows you to translate from the frequency domain back into the time domain. For a mathematical definition, please see here or here.

To demonstrate this principle, imagine a musician playing an (angsty) A-Minor chord on the piano. Using the DFT, we can decompose this chord into its constituent parts, namely the notes “A”, “C”, and “E”. This is shown below. By the way, if you’re interested in the code behind all of this, here is the Jupyter notebook version of this post.

Fig 1. Breaking down an A-Minor chord into the 3 notes that make it up

Let’s break down what we just did there. First (from top to bottom), we started off by plotting an A-Minor chord. Next, we used the DFT to translate the A-Minor chord from the time domain into the frequency domain. Lastly, once in the frequency domain, we found the frequencies with the most prominent amplitudes, which gave us 28Hz, 33Hz, and 41Hz. Note that these are the frequencies associated with the notes that made up our A-Minor chord: “A”, “C”, and “E”, respectively.
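In NumPy, those steps look roughly like this. (A toy version: I synthesize the chord as three pure sinusoids at the figure’s frequencies, using an assumed 1,024 Hz sample rate so each frequency bin is exactly 1 Hz wide; real piano notes would also carry overtones.)

```python
import numpy as np

# 1 second of audio at an assumed 1,024 Hz sample rate
fs = 1024
t = np.arange(fs) / fs

# synthesize the "chord" as three pure sinusoids at the
# frequencies from the figure
chord = sum(np.sin(2 * np.pi * f * t) for f in (28.0, 33.0, 41.0))

# DFT: time domain -> frequency domain
spectrum = np.fft.rfft(chord)
freqs = np.fft.rfftfreq(len(chord), d=1 / fs)

# the three most prominent amplitudes give us the constituent notes
top3 = freqs[np.argsort(np.abs(spectrum))[-3:]]
print(sorted(top3.tolist()))  # [28.0, 33.0, 41.0]
```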

So we were able to use the DFT to break down the original chord into its constituent notes. Likewise, now that we have our three notes (“A”, “C”, and “E”), we can use the iDFT to reconstruct our original A-Minor chord. Here is the result of that below:

Fig 2. Using the iDFT to reconstruct the A-Minor chord from the notes “A”, “C”, and “E”

As you can see, the reconstructed signal matches the original signal (almost) perfectly. The extremely small error (5.7e-16) proves this quantitatively.
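The round trip is a two-liner to verify (again on a synthetic chord; the exact error value will differ from the figure’s, but it stays down at floating-point-noise levels):

```python
import numpy as np

fs = 1024
t = np.arange(fs) / fs
chord = sum(np.sin(2 * np.pi * f * t) for f in (28.0, 33.0, 41.0))

# round trip: time domain -> frequency domain -> time domain
reconstructed = np.fft.irfft(np.fft.rfft(chord), n=len(chord))

# maximum pointwise error is floating-point noise
error = np.max(np.abs(chord - reconstructed))
print(error < 1e-12)  # True
```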

So we know the DFT is capable of interpolation. But as cool as that is, what about extrapolation? Can we use the DFT and iDFT to effectively forecast a signal into the future?

Turns out, the answer is yes — however, with some strings attached.

To extrapolate a signal, we can calculate the DFT over N timesteps, find the frequencies of the x most prominent amplitudes, and then use the iDFT to reconstruct our series over N+p timesteps (where p is the number of future timesteps we want to predict). Doing this yields the below, where the red line is our prediction based on the iDFT:
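A minimal sketch of that procedure, assuming nothing beyond NumPy (the function name and defaults are my own, not from any library):

```python
import numpy as np

def fourier_extrapolate(x, n_predict, n_harmonics=10):
    """Fit the most prominent DFT components of x and extend
    them n_predict steps past the end of the series."""
    n = len(x)
    spectrum = np.fft.fft(x)
    freqs = np.fft.fftfreq(n)

    # keep the strongest bins (each real harmonic occupies two
    # conjugate bins, plus one slot for the DC term)
    keep = np.argsort(np.abs(spectrum))[::-1][: 2 * n_harmonics + 1]

    t = np.arange(n + n_predict)
    restored = np.zeros(len(t))
    for k in keep:
        amplitude = np.abs(spectrum[k]) / n
        phase = np.angle(spectrum[k])
        restored += amplitude * np.cos(2 * np.pi * freqs[k] * t + phase)
    return restored

# toy example: a pure sine wave extends perfectly
x = np.sin(2 * np.pi * 4 * np.arange(128) / 128)
forecast = fourier_extrapolate(x, n_predict=32, n_harmonics=2)
```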

Fig 3. Forecasting a timeseries using the iDFT

The results actually aren’t half bad (for our toy dataset, that is). Unfortunately, as you will see, there are many inherent issues with this method when using real-world, non-periodic data. For now though, let’s demonstrate this on stock prices.

First, we will reiterate that the iDFT is indeed able to reconstruct any signal almost perfectly.

Fig 4. Reconstructing stock prices using the iDFT

Once again, the extremely small error (1.3e-14) shows that the reconstructed signal is a near-perfect match for the original.

Now, we will predict future prices using the same iDFT methodology as before.

Fig 5. Forecasting stock prices using the iDFT

At first glance, you should notice two things. First, the iDFT was able to model the training data excellently. This isn’t surprising, though, as we already know that we can use these methods to interpolate any signal. What is surprising, however, is the degree to which the iDFT model was able to predict our future price values.

Beware, as this is just an illusion. In reality, the iDFT cannot effectively forecast future values. There are several reasons for this, which we will now briefly discuss.

For starters, the iDFT assumes the underlying function has a period equal to N (the number of samples). And since a signal’s frequency is inversely proportional to its period, the component frequencies are frozen at integer multiples of 1/N. Consequently, the iDFT cannot effectively model any non-periodic components of a signal, so the fact that it near-perfectly fits the training data is inconsequential for extrapolation. In fact, because the iDFT cannot model non-periodic components, any forecast using this technique will essentially just repeat the patterns identified in the training data forever. This is obviously not ideal.
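This repetition is easy to show directly: every DFT component of an N-sample signal repeats (over integer timesteps) with a period that divides N, so the full reconstruction repeats with period N, no matter how non-periodic the input is. (The random walk below is my stand-in for a price series.)

```python
import numpy as np

# a random walk as a stand-in for prices (decidedly non-periodic)
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=128))
n = len(x)

spectrum = np.fft.fft(x)
freqs = np.fft.fftfreq(n)

# rebuild the signal from ALL of its DFT components, extended 20 steps
t = np.arange(n + 20)
restored = np.zeros(len(t))
for k in range(n):
    restored += (np.abs(spectrum[k]) / n) * np.cos(
        2 * np.pi * freqs[k] * t + np.angle(spectrum[k])
    )

# every component repeats with period n, so the "forecast" is just
# the first 20 timesteps replayed
print(np.allclose(restored[n:], restored[:20]))  # True
```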

Let’s demonstrate this fact with another example:

Fig 6. Another example of forecasting stock prices using the iDFT

Once again, the iDFT forecast looks pretty good, at least directionally. However, if we take a closer look and compare the forecast to the beginning of the timeseries, it becomes apparent what is actually happening.

Fig 7. A closer look reveals that the forecast mimics the start of the timeseries

To state the obvious, the 20-period “forecast” (titled yhat) is actually just a crude replication of the first 20 timesteps of the series. What’s rather amusing, though, is how often this leads to decent results. I have several guesses as to why this is.

For starters, when prices are trending, there is a high likelihood that mimicking the shape of earlier timesteps will be directionally accurate. Furthermore, prices do seem to have some periodicity. It is important to note, however, that in my experiments this degree of periodicity fluctuated over time, alongside a continuously changing market environment. But, in those relatively periodic market environments, the iDFT’s forecast seemed capable of capturing where in time local extrema were located. And given the sinusoidal nature of any iDFT model, this is an important observation, as it gives some credence to the idea that price is indeed a waveform.

Nonetheless, using the iDFT to forecast price movements is a terrible idea by itself, and I definitely am not claiming that it can be used as a winning trading strategy. What I am emphasizing, however, is that there are some interesting characteristics given by iDFT forecasts that may give some insight into the true nature of price movements.

At this point I would like to pause for a brief second and point out that the DFT is technically a simple single-layer neural network with no activation function or bias, and with weights that are pre-calculated. In fact, any linear transformation can be represented this way, as we can rewrite such a transformation using matrix multiplication — which is the basis for neural networks. To conceptualize this, I encourage you to read this blog post.

This brings us to Neural Decomposition, which is a neural network approach heavily influenced by the iDFT. As you will see, this method mimics the iDFT for modeling periodic components, while also accounting for non-periodicity by using an augmentation function.

Neural Decomposition (ND), detailed in this paper, uses an iDFT-like model with two key differences. First, ND allows frequencies to be trained; and second, ND is capable of handling non-periodic components via an augmentation function, referred to as g(t).

Below is the anatomy of this model.

Fig 8. Neural Decomposition Architecture

In the diagram above, w(i) represents our frequencies, φ(i) represents our phase-shifts, a(i) represents our amplitudes, and g(t) represents our augmentation function.

Mathematically, the model can be described as:

x(t) = Σᵢ a(i) · sin( w(i) · t + φ(i) ) + g(t)

In short, this is just a simplified version of the iDFT with an extra augmentation function tacked on.
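As a concrete sketch, here is what the forward pass of an ND-style model looks like in NumPy, with g(t) reduced to a bare linear trend (the paper’s g(t) also includes sigmoidal units, and all the names here are mine, not the paper’s):

```python
import numpy as np

def nd_forward(t, w, phi, a, g_params):
    """Forward pass of an ND-style model: a sum of trainable
    sinusoids plus an augmentation function g(t), here reduced
    to a linear trend (slope, intercept)."""
    slope, intercept = g_params
    periodic = np.sum(a * np.sin(np.outer(t, w) + phi), axis=1)
    return periodic + slope * t + intercept

# start the frequencies at the iDFT bins (as I understand the paper's
# initialisation); training is then free to nudge them off those bins
N = 128
k = np.arange(1, N // 2 + 1)
w0 = 2 * np.pi * k / N
phi0 = np.zeros_like(w0)
a0 = np.zeros_like(w0)
```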

Details regarding the Python implementation of this model can be found within the code version of this article, as well as the original paper. But the main idea behind it is the usage of trainable frequencies, combined with a trainable function g(t) that is designed to model both linear trend and non-linear irregularities, including any non-periodic components.

The neural net uses a combination of sinusoidal, linear, and sigmoidal activation functions to help augment the iDFT in a way such that it can more accurately represent chaotic, non-periodic data. Moreover, the output layer uses L1 regularization to promote sparsity, theoretically giving us a model that is both simple yet robust.
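The objective being minimized can be sketched as squared error plus an L1 penalty on the output-layer amplitudes (the function name and penalty weight here are illustrative choices of mine, not values from the paper):

```python
import numpy as np

def nd_loss(pred, target, amplitudes, l1_weight=0.01):
    """Squared error plus an L1 penalty on the output-layer
    amplitudes, nudging most sinusoids' contributions toward
    zero so only a sparse handful survive training."""
    mse = np.mean((pred - target) ** 2)
    return mse + l1_weight * np.sum(np.abs(amplitudes))
```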

Below I will feature some results that highlight the advantages of taking this approach relative to the iDFT forecasting method that we were utilizing before. Note that our model’s predictions are in red, while the iDFT’s forecast is in blue.

Those with a keen eye will immediately notice what we were discussing before. That is, in every single example, the iDFT forecast is simply a shifted copy of the first few dozen timesteps in our training data. Our neural network model, on the other hand, isn’t.

The samples you just saw are, of course, cherry-picked from a larger population. However, I chose these particular ones intentionally, as they each show something interesting about what the model was able to learn. Not only can it predict the general direction of price movement, but it is also capable of predicting the general “shape”. Pretty cool, right?

That being said, there are definitely instances when this model performs as poorly as the generic iDFT. Take, for instance, the below example, where both models are clearly just attempting to repeat the pattern identified in the first dozen-or-so timesteps.

Fig 9. A completely erroneous prediction

However, even in this scenario, one should note that the ND model doesn’t precisely mimic the pattern that the iDFT copies. The reason for this goes back to how the iDFT can only model periodic components, while our neural net model is capable of modeling both periodic and non-periodic parts. So while not perfect, the above example shows how our neural net is at least learning the correct things, even when outputting erroneous predictions.

After spending a lot of time experimenting with all of the methods described in this article, I have many thoughts regarding the feasibility of using Fourier-like models to predict price movements. Some of these thoughts I will briefly share with you now, and others I will keep to myself for further development.

For our neural network, there were several things that heavily affected prediction quality. The first was how we scaled our data. In the original paper, the authors claimed that scaling between 0 and 10 worked best. In my experiments using stock prices, however, I found that scaling between 0 and 5 worked better.
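A simple min-max scaler along these lines (the helper name is my own, and the 0-to-5 default just reflects my finding above):

```python
import numpy as np

def scale_to_range(x, lo=0.0, hi=5.0):
    """Min-max scale a series into [lo, hi]; 0-to-5 by default."""
    x = np.asarray(x, dtype=float)
    return lo + (hi - lo) * (x - x.min()) / (x.max() - x.min())
```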

But even more important than the scaling of our data was the context window used. And if you think about what the DFT and iDFT actually do, you can surmise why the context window is so important. For instance, many people (including myself) believe that the stock market has some form of “memory”. That is, certain price action from weeks, months, or even years ago can provide a psychological “push” that can, in turn, move the market. Well, if our context window is too small, our model won’t have enough information to decide which frequencies to emphasize when reconstructing and forecasting our signal. Likewise, if our context window is too large, it may place emphasis on some long-term harmonic that falsely skews our prediction.

In the examples I gave to you here, I used a 128-day training period, with a 22-day prediction period. This was pretty arbitrary, as I just randomly sampled a window of 150 days and then used an 85/15 split for our training and test data.

In my opinion, an optimal context window is most likely a function of the current market environment. For example, in periods with higher volatility, it might make more sense to use a smaller context window, as prices have shorter “memory”. And in periods with lower volatility, it might make sense to use a larger context window for the opposite reason.

There are probably many ways to solve the optimal context problem. However, my initial thought is to potentially use reinforcement learning to craft this function for us. Features for this model would most likely include things like volatility and the Hurst exponent, which measures whether prices are trending or mean-reverting.
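One common rough-and-ready Hurst estimate uses how the spread of lagged differences scales with the lag; a sketch (the lag range here is an arbitrary choice of mine):

```python
import numpy as np

def hurst_exponent(series, max_lag=20):
    """Rough Hurst estimate from how lagged differences scale.

    H ~ 0.5: random walk; H > 0.5: trending;
    H < 0.5: mean-reverting."""
    lags = np.arange(2, max_lag)
    tau = [np.std(series[lag:] - series[:-lag]) for lag in lags]
    # H is the slope of log(std) vs. log(lag)
    return np.polyfit(np.log(lags), np.log(tau), 1)[0]

# sanity check: a random walk should come out near 0.5
rng = np.random.default_rng(0)
walk = np.cumsum(rng.normal(size=5000))
print(round(hurst_exponent(walk), 2))
```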

Anyway, I will stop there.

Obviously none of what I presented in this article should be used for any live-trading decisions. The use of the Fourier Transform in trading is a largely unexplored area and there is probably good reason for this. That being said, I hope you found this as interesting as I did. Thanks for reading.
