I have created a process similar to Shazam that creates a spectrogram of a given sound clip. I am trying to figure out how to store this data in a database so that I can run comparisons on it. (I don't need actual code, just conceptual help on the process.)
For those unfamiliar with a spectrogram, it's a graph with time on the x-axis and frequency on the y-axis. I need to save this data in a form that I can run comparisons on. Also, I can't simply create a long string of the frequency values from left to right, because that becomes a time complexity issue when searching against it with large data sets (basically an N^2 substring comparison).
Essentially, I was thinking about creating some sort of hash of the sound clip and storing it in a trie or suffix tree, but I'm not sure how I would then run a comparison against it.
Any ideas would be greatly appreciated.
This is a 2D array. Possibly a sparse one if most of the data is 0.0.
I'd use a ROOT histogram (say TH2F) to avoid having to manage all the edge cases and so on, though almost any scientific library should support an appropriate data structure. ROOT supports at least two histogram similarity measures (Chi squared and Kolmogorov) which will allow you to make quantitative comparisons.
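To make the idea of a quantitative comparison concrete, here is a minimal sketch in plain Python. It is not ROOT's API: the function name chi2_distance and the choice of the symmetric chi-squared distance are assumptions made for illustration, and the spectrograms are assumed to be equally shaped 2D lists of non-negative magnitudes.

```python
def chi2_distance(a, b):
    """Symmetric chi-squared distance between two equally shaped 2D arrays.

    Hypothetical helper for illustration: 0.0 means identical bin contents,
    larger values mean the two spectrograms differ more.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("spectrograms must have the same shape")
    total = 0.0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            s = x + y
            if s > 0:  # bins that are empty in both inputs contribute nothing
                total += (x - y) ** 2 / s
    return 0.5 * total
```

A distance like this only ranks candidates; it still has to be evaluated against every stored spectrogram, so it does not by itself solve the search-cost problem raised in the question.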
You can either store it as a raw 2D array, or do some higher-level feature extraction (tracking pitch contours, etc.) to pull out the significant features, which you can then use for comparison.
The problem with a hash is that you need close matches, not exact matches. I was thinking of something along the lines of extracting the (time, freq) tuples of local peaks in the spectrogram and then putting those in a spatial database (http://en.wikipedia.org/wiki/Spatial_database).
To search, you could extract the n highest peaks (4 to 8?) from the query clip, then look up the closest peaks in the spatial database and pick the best-fitting match.
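The peak-based search can be sketched roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: the names top_peaks, match_score and best_match are hypothetical, spec[f][t] is assumed to hold the magnitude at frequency bin f and time frame t, the query and stored clips are assumed to be aligned in time, and a plain linear scan stands in for the spatial database.

```python
import math


def top_peaks(spec, n=5):
    """Return (time, freq) indices of the n strongest local maxima in spec."""
    n_freq, n_time = len(spec), len(spec[0])
    peaks = []
    for f in range(n_freq):
        for t in range(n_time):
            v = spec[f][t]
            # Values of the up to 8 surrounding cells (fewer at the edges).
            neighbours = [
                spec[ff][tt]
                for ff in range(max(0, f - 1), min(n_freq, f + 2))
                for tt in range(max(0, t - 1), min(n_time, t + 2))
                if (ff, tt) != (f, t)
            ]
            if v > 0 and all(v > w for w in neighbours):
                peaks.append((v, t, f))
    peaks.sort(reverse=True)  # strongest peaks first
    return [(t, f) for _, t, f in peaks[:n]]


def match_score(query_peaks, stored_peaks):
    """Sum of distances from each query peak to its nearest stored peak."""
    if not stored_peaks:
        return math.inf
    return sum(min(math.dist(q, s) for s in stored_peaks) for q in query_peaks)


def best_match(query_peaks, database):
    """Return the key in database (name -> peak list) with the lowest score."""
    return min(database, key=lambda name: match_score(query_peaks, database[name]))
```

In a real system the inner nearest-peak lookup would be answered by the spatial index rather than by scanning every stored peak, and the time offset between the query clip and the stored track would have to be accounted for.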