I have a dataset (an array) and I need to find the periodicity in it. How should I proceed? Somebody said I can use an FFT, but I am not sure how it will give me the periodicity. Your help is appreciated!
For this task it's best to use the autocorrelation.
The FFT is the wrong tool to use for finding the periodicity.
Consider, for example, a waveform made by adding together two simple sine waves, one with a period of 2 seconds (0.5 Hz) and the other with a period of 3 seconds (0.333 Hz). This waveform has a period of 6 seconds (the least common multiple of 2 and 3), but the Fourier spectrum shows only two peaks, at 0.5 Hz and 0.333 Hz, and nothing at 1/6 Hz.
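To make the two-sine example concrete, here is a minimal NumPy sketch. The sampling rate (100 Hz) and the signal length (60 s) are my own assumptions, chosen so that both sines complete a whole number of cycles. It reads the two largest spectral peaks off the FFT, then takes the lag of the highest autocorrelation peak outside the main lobe around lag 0.

```python
import numpy as np

fs = 100                    # assumed sampling rate, samples per second
n_samples = 60 * fs         # assumed length: 60 s of signal
t = np.arange(n_samples) / fs
x = np.sin(2 * np.pi * t / 2) + np.sin(2 * np.pi * t / 3)

# Fourier spectrum: the two largest peaks sit at the component frequencies.
mag = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(n_samples, d=1 / fs)
peak_freqs = np.sort(freqs[np.argsort(mag)[-2:]])

# Autocorrelation for lags >= 0 (biased estimate, so it tapers with lag).
ac = np.correlate(x, x, mode="full")[n_samples - 1:]
first_neg = np.argmax(ac < 0)            # first lag past the lobe around lag 0
best_lag = first_neg + np.argmax(ac[first_neg:])
period = best_lag / fs                   # in seconds

print("spectral peaks (Hz):", peak_freqs)
print("autocorrelation period (s):", period)
```

The spectral peaks come out at about 0.333 Hz and 0.5 Hz, while the autocorrelation peak lands at a lag of about 6 s. Skipping everything before the first negative lag is a simple heuristic to avoid picking lag 0; real data may need something more careful.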
Periodicity is not a well-defined term. For example, data such as:
1, 10, 1, 10, 1, 11, 1, 10, 1, 10, 1, 11, 1, 10, 1, 10, 1, 11
can be treated as having an inexact but strong periodicity of 2, or an exact periodicity of 6.
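For exact periodicity, you can simply search for the data as a substring of the data repeated twice (concatenated with itself), ignoring the trivial match at offset 0. The offset of the first match is the period.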
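A minimal sketch of that substring search on a plain Python list follows. The function name `exact_period` is made up for illustration. It assumes the data holds a whole number of cycles; if the last cycle is cut off, no shorter match exists and it returns the full length.

```python
def exact_period(data):
    """Smallest shift > 0 at which `data` equals a cyclic rotation of itself."""
    seq = list(data)
    n = len(seq)
    doubled = seq + seq
    for shift in range(1, n + 1):
        # Compare the original against the window starting at `shift`.
        if doubled[shift:shift + n] == seq:
            return shift
    return n  # only reached for empty input


samples = [1, 10, 1, 10, 1, 11] * 3
print(exact_period(samples))  # prints 6
```

A shift equal to the length always matches, so a result equal to `len(data)` means no shorter exact period was found. This naive scan is quadratic in the worst case; a proper string-search algorithm would be faster on long inputs.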
For the inexact periodicity of a real, noisy signal, both time-domain and frequency-domain methods can be used.
The time-domain method is autocorrelation (self-correlation). It is like the substring search above: you look for the shift at which the data is most similar to itself.
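As a rough illustration, the sketch below scores each shift of the example sequence by the Pearson correlation between the data and a delayed copy of itself. The helper name `self_similarity` and the range of shifts tried (1 to 9) are assumptions of mine, and this is only one of several ways to normalise an autocorrelation.

```python
import numpy as np

data = np.array([1, 10, 1, 10, 1, 11] * 3, dtype=float)


def self_similarity(x, shift):
    """Pearson correlation between x and a copy of itself delayed by `shift`."""
    return np.corrcoef(x[:-shift], x[shift:])[0, 1]


# Score a handful of candidate shifts and keep the most self-similar one.
scores = {shift: self_similarity(data, shift) for shift in range(1, 10)}
best_shift = max(scores, key=scores.get)

for shift, score in scores.items():
    print(shift, round(score, 3))
print("best shift:", best_shift)
```

On this data, odd shifts score negative, shift 2 scores high but below 1 (the strong, inexact periodicity), and shift 6 scores highest because the overlapping parts are identical.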
For simple signals, counting threshold crossings may be enough.
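Here is one way the threshold-crossing idea could look, assuming a clean signal and using its mean as the default threshold. The function name `period_from_crossings` is hypothetical. It finds the upward crossings and averages the spacing between them.

```python
import numpy as np


def period_from_crossings(x, threshold=None):
    """Average spacing (in samples) between upward threshold crossings."""
    x = np.asarray(x, dtype=float)
    if threshold is None:
        threshold = x.mean()          # assumed default: the signal mean
    above = x >= threshold
    # Indices i where x[i] is below and x[i + 1] is at or above the threshold.
    rising = np.flatnonzero(~above[:-1] & above[1:])
    if len(rising) < 2:
        return None                   # not enough crossings to measure
    return float(np.mean(np.diff(rising)))


n = np.arange(500)
signal = np.sin(2 * np.pi * n / 50 + 0.3)   # period of 50 samples
print(period_from_crossings(signal))
```

With noise, samples near the threshold can produce spurious crossings, so in practice you would probably smooth the signal first or add hysteresis.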
Frequency-domain methods include using an FFT/FHT: search for the maximum in the frequency spectrum, which gives the frequency 1/T, where T is the period.
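A small sketch of the FFT variant: take the magnitude spectrum, skip the zero-frequency bin, and invert the frequency of the largest peak. The function name `dominant_period`, the test signal and its noise level are assumptions for illustration.

```python
import numpy as np


def dominant_period(x, sample_spacing=1.0):
    """Period T = 1/f of the strongest non-DC peak in the FFT magnitude."""
    x = np.asarray(x, dtype=float)
    spectrum = np.abs(np.fft.rfft(x - x.mean()))
    freqs = np.fft.rfftfreq(len(x), d=sample_spacing)
    k = 1 + np.argmax(spectrum[1:])   # skip the zero-frequency bin
    return 1.0 / freqs[k]


rng = np.random.default_rng(0)
n = np.arange(1000)
noisy = np.sin(2 * np.pi * n / 25) + 0.5 * rng.standard_normal(n.size)
print(dominant_period(noisy))         # close to 25 samples
```

The resolution is limited by the bin spacing: if the true period does not divide the record length, the estimate is only as good as the nearest bin unless you interpolate around the peak.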
Another method is to use the cepstrum.
This recent paper on spectral clustering hasn't had a great deal of attention:
Amariei, C., Tomita, M., & Murray, D. B. (2014). Quantifying periodicity in omics data. Frontiers in Cell and Developmental Biology.
It is implemented in an R package available at oscillat.iab.keio.ac.jp. I'm not affiliated with the authors, but I put the code (including the main script) up on GitHub for easier access.
It uses a DFT and groups rows by their major spectral powers, and it has been nice to use in my experience. It was designed for genomics, so it is meant to be robust (the code notes that it is computationally heavy); whether it suits you may depend on the application.
You could use an FFT because it converts your data set from the value (time) domain to the frequency domain.
This means you end up with a set of frequency components that, combined, reproduce the input you want to analyze. You can then see which frequencies contribute most, and so work out how many periodicities there are and which are the most influential.
Take a look here: http://local.wasp.uwa.edu.au/~pbourke/miscellaneous/dft/
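To illustrate ranking the contributions, this sketch lists the strongest frequency components with their share of the one-sided spectral power (zero-frequency bin excluded). The function name `top_periodicities` and the test signal are assumptions for illustration.

```python
import numpy as np


def top_periodicities(x, k=2):
    """Return [(period_in_samples, share_of_power), ...] for the k strongest bins."""
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    freqs = np.fft.rfftfreq(len(x))
    # Rank the non-DC bins by power, strongest first.
    top = 1 + np.argsort(power[1:])[::-1][:k]
    total = power[1:].sum()
    return [(float(1.0 / freqs[i]), float(power[i] / total)) for i in top]


n = np.arange(400)
x = 3 * np.sin(2 * np.pi * n / 50) + np.sin(2 * np.pi * n / 8)
for period, share in top_periodicities(x):
    print(f"period {period:.1f} samples, {share:.0%} of power")
```

Here the component with period 50 carries about 90% of the power and the one with period 8 about 10%, matching the 3:1 amplitude ratio. With real data, a peak may be spread over neighbouring bins, so you may want to sum the power around each peak.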
I found a paper that combines an FFT-based periodogram with autocorrelation to provide more accurate information on the periodicity of a signal. I think that this method could be worth looking into:
On Periodicity Detection and Structural Periodic Similarity