The Lomb-Scargle Periodogram

When the Fourier Transform Doesn't Work

There are many situations in physics when you have some time dependant data and would like to find from which frequencies the signal is made. Alternatively, you could have some data as a function of position and you would like to find which wave numbers are there. The usual way to solve both of these problems is using the Fourier transform \begin{equation} \mathcal{F} [f(\omega)] = \int_{-\infty}^\infty dt f(t) e^{i \omega t} . \end{equation} This gives a convenient way to extract the frequency dependance of a signal. In the figure below, I demonstrae how this can be done in python. We setup some data made from sine waves of two frequencies $4$ and $8$. When the Fourier transform is performed, and we look at which frequencies contribute most strongly, we find two delta like peaks at the two frequencies that appear in our signal.

Notice that when we setup the data array we used xv = np.linspace(0, 2.0*np.pi, 512), meaning that the data is evenly spaced. What happens if the data is not evenly spaced? Let's have a look. We're using exactly the same signal, expect we're now sampling it on random points on the interval.

The Fourier peaks are in the wrong place! This is because the discrete Fourier transform requires data to be sampled on a regular grid. If the grid is not regular, then the discrete Fourier transform won't work.

This is bad news in plenty of situations. For example, say you'd like to find the period of oscillation of a Cepheid variable, a type of star whose luminosity oscillates. This information is useful for benchmarking cosmic distances. You can only take data when there's no cloud blocking your telescope, leading to data that is irregular in time. Or say you want to track the habits of animals, which again you can only observe sometimes. Neither of these problems can make use of the Fourier transform, which is the natural tool for extracting frequencies from time series data.

Luckily, there is another tool that can handle data on an irregular grid: the Lomb Scargle method.

The Lomb Periodogram

The Lomb-Scargle method was developed by the astronomers Lomb and Scargle separately. Lomb [1] came up with the method and Scargle proved several important things about it. The method can be understood as follows.

You have some data points, $(t_i, y_i)$, that are not evenly sampled. We assume that our data has the form \begin{equation} y_i = A \cos (\omega (t_i - \tau)) + B \sin (\omega (t_i - \tau)) \end{equation} where $A$, $B$, $\tau$ are parameters that we have to find and depend upon $\omega$. We want to find values for these parameters for each frequency that best fit our data. To do this, we seek to minimise the usual sum of squares error between our model $A \cos (\omega (t_i - \tau)) + B \sin (\omega (t_i - \tau))$ and our data $y_i$ \begin{equation} \chi^2 = \sum_i \left[ A \cos (\omega (t_i - \tau)) + B \sin (\omega (t_i - \tau)) - y_i \right]^2 . \end{equation} To minimise this, we have to solve \begin{align} \frac{\partial \chi^2}{\partial A} &= 0 & \frac{\partial \chi^2}{\partial B} &= 0 \end{align} for $A$ and $B$. Evaluating the derivatives, we get \begin{align} \frac{\partial \chi^2}{\partial A} &= 2 \sum_i \cos (\omega (t_i - \tau)) \left[ A \cos (\omega (t_i - \tau)) + B \sin (\omega (t_i - \tau)) - y_i \right] = 0 \\ \frac{\partial \chi^2}{\partial B} &= 2 \sum_i \sin (\omega (t_i - \tau)) \left[ A \cos (\omega (t_i - \tau)) + B \sin (\omega (t_i - \tau)) - y_i \right] = 0 \end{align} Dividing by the 2 to remove it, then moving the $y_i$ terms to the right hand side, we have \begin{align} \sum_i A \cos^2 (\omega (t_i - \tau)) + B \sin (\omega (t_i - \tau)) \cos (\omega (t_i - \tau)) &= \sum_i y_i \cos (\omega (t_i - \tau)) \\ \sum_i A \sin (\omega (t_i - \tau)) \cos (\omega (t_i - \tau)) + B \sin^2 (\omega (t_i - \tau)) &= \sum_i y_i \sin (\omega (t_i - \tau)) . \end{align}

Now, we can choose $\tau$ so that the cross terms $\sin (...) \cos (...)$ vanish, leaving \begin{align} \sum_i A \cos^2 (\omega (t_i - \tau)) &= \sum_i y_i \cos (\omega (t_i - \tau)) \\ \sum_i B \sin^2 (\omega (t_i - \tau)) &= \sum_i y_i \sin (\omega (t_i - \tau)) , \end{align} which we can solve for $A$ and $B$ \begin{align} A &= \frac{\sum_i y_i \cos (\omega (t_i - \tau))}{\sum_i \cos^2 (\omega (t_i - \tau))} \\ B &= \frac{\sum_i y_i \sin (\omega (t_i - \tau))}{\sum_i \sin^2 (\omega (t_i - \tau))} . \end{align} For each frequency, $\omega$, we get a different set of $A$, $B$ and $\tau$, but how do we find which frequencies are inside our signal? To find this, we just need to write the periodogram, which is a mesure of the spectral content of the signal, as [2] \begin{equation} P = \left( \sum_i y_i \cos (\omega (t_i - \tau)) \right)^2 + \left( \sum_i y_i \sin (\omega (t_i - \tau)) \right)^2 \end{equation} Plugging in the form of our data, $y_i$ and remembering that we've chosen $\tau$ so that the terms like $\cos (...) \sin (...)$ vanish \begin{align} P(\omega) &= \left( \sum_i \left[ A \cos (\omega (t_i - \tau)) + B \sin (\omega (t_i - \tau))\right] \cos (\omega (t_i - \tau)) \right)^2 + \left( \sum_i \left[ A \cos (\omega (t_i - \tau)) + B \sin (\omega (t_i - \tau))\right] \sin (\omega (t_i - \tau)) \right)^2 \\ P(\omega) &= A^2 \sum_i \cos^2 (\omega (t_i - \tau)) + B^2 \sum_i \sin^2 (\omega (t_i - \tau)) \\ P(\omega) &= \frac{\left(\sum_i y_i \cos (\omega (t_i - \tau))\right)^2 \sum_i \cos^2 (\omega (t_i - \tau)) }{\left(\sum_i \cos^2 (\omega (t_i - \tau))\right)^2} + \frac{\left(\sum_i y_i \sin (\omega (t_i - \tau))\right)^2 \sum_i \sin^2 (\omega (t_i - \tau)) }{\left(\sum_i \sin^2 (\omega (t_i - \tau))\right)^2} \\ P(\omega) &= \frac{\left(\sum_i y_i \cos (\omega (t_i - \tau))\right)^2}{\sum_i \cos^2 (\omega (t_i - \tau))} + \frac{\left(\sum_i y_i \sin (\omega (t_i - \tau))\right)^2 }{\sum_i \sin^2 (\omega (t_i - \tau))} \end{align} This is the Lomb-Scargle Periodogram. To use the periodogram, it should be evaluated for a range of frequencies. If the periodogram is large for a particular frequency (i.e. $A(\omega)$ or $B(\omega)$ is large for that frequency), then that frequency fits the data well. If it is small for a particular data, then that frequency fits poorly. It's worth remembering that the Lomb-Scargle periodogram is not a drop-in replacement for the Fourier transform as it only tells you whether fitting a particular frequency of sine or cosine wave to your data is effective in reducing the least-squares error between your data and the shape you've assumed it has. The Lomb-Scargle periodogram is also very easy to evaluate numerically, as scipy has a built in function for evaluating it. Let's see whether using the Lomb-Scargle periodogram to analyse our unevenly sampled data is any improvement on the discrete Fourier transform.

It does a much better job than the Fourier transform - we now have peaks at the correct frequencies! It's also fairly easy to write your own function that evaluates the periodogram [3] (although it may be a little slow), for example, in classic Blue Peter fashion, here's one I made earlier.

Choosing $\tau$

The only thing we skipped over was how to choose $\tau$ so that $\cos (\omega t-\tau) \sin (\omega t-\tau) = 0$. It's quite important in a few of the steps that this is true, so I'll show the working here. We begin by using the angle-sum rules \begin{align} \sin (\alpha \pm \beta) &= \sin \alpha \cos \beta \pm \cos \alpha \sin \beta, \\ \cos (\alpha \pm \beta) &= \cos \alpha \cos \beta \mp \sin \alpha \sin \beta, \end{align} to write \begin{equation} \left[ \cos \omega t \cos \tau + \sin \omega t \sin \tau \right] \left[ \sin \omega t \cos \tau - \cos \omega t \sin \tau \right] = 0. \end{equation} Expanding the brackets and re-organising we find that \begin{equation} \sin \omega t \cos \omega t \left[ \cos^2 \tau - \sin^2 \tau \right] + \cos \tau \sin \tau \left[ \sin^2 \omega t - \cos^2 \omega t \right] = 0 . \end{equation} We then use the double angle identities \begin{align} \sin 2\theta &= 2 \sin \theta \cos \theta \\ \frac{\cos 2 \theta + 1}{2} &= \cos^2 \theta \\ \frac{1 - \cos 2\theta}{2} &= \sin^2 \theta \end{align} to eliminate all of the squared terms \begin{align} \sin (2\omega t) \cos (2\tau) = \sin(2\tau)\cos(2\omega t) . \end{align} This is easily re-arranged to find that the correct choice of $\tau$ is
\begin{equation} \tau = \frac{1}{2} \arctan \left[ \frac{\sin(2\omega t)}{\cos(2\omega t)} \right] . \end{equation}

References

[1] N. R. Lomb, "Least Squares Spectral Analysis of Unevenly Sampled Data", Astrophysics and Space Science 39 pp. 447-462 (1972)

[2] J. D. Scargle, "Studies in Astronomical Time Series Analysis. II. Statistical Aspects of Spectral Analysis of Unevenly Spaced Data", 263 pp. 835-853 (1982)

[3] W. H. Press, S. A. Teukolsky, W. T. Vetterling and B. P. Flannery, "Numerical Recipes in C" (Cambridge University Press, 2007)