Audio File Matching Program

Developer https://www.devze.com 2023-02-06 12:55 Source: Internet

I'm trying to write a program for the iPhone that can take two audio files (e.g. WAV) as inputs, compare them, and spit out a number that tells you how similar the audio files are.

If someone has done something like this, knows how to go about doing it, or just has some ideas, please let me know. Any help will be greatly appreciated.

Specific questions: What language is suitable? How hard is it to do (how many hours, roughly)? Where can I find good audio libraries or tools?

Thanks!

I'd say it's pretty hard, not so much the implementation, but coming up with a reasonable definition of 'similar'.

That said, you're probably looking at techniques like autocorrelation and FFT, both of which are CPU-intensive tasks, so I'd say a fully-compiled language (C, C++, don't know about Objective-C) would be most suitable at least for the actual calculations. Also, you're facing a somewhat underpowered platform for such tasks (if only because uncompressed audio files are pretty large), so you're in for quite some optimization.
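To make the correlation idea concrete, here is a minimal sketch in Python (chosen only for brevity; on the device you would port this to a compiled language). It assumes both files have already been decoded into equal-rate lists of float samples, and the function name and the `max_lag` parameter are made up for illustration. It slides one signal against the other and reports the best normalized correlation, a number between -1 and 1.

```python
import math

def normalized_cross_correlation(a, b, max_lag):
    """Peak normalized cross-correlation of a and b over lags in [-max_lag, max_lag]."""
    # Remove the DC offset so a constant bias does not count as similarity.
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    a = [x - mean_a for x in a]
    b = [y - mean_b for y in b]

    # Normalizing by the signal energies keeps the score within [-1, 1].
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if norm == 0.0:
        return 0.0  # at least one signal is silent

    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i, x in enumerate(a):
            j = i + lag
            if 0 <= j < len(b):
                total += x * b[j]
        best = max(best, total / norm)
    return best
```

This direct loop costs roughly len(a) times the number of lags, which is why an FFT-based correlation is the usual choice for whole audio files. Note also that a high score only means the waveforms line up; two recordings of the same song at different speeds or pitches would score low, which is the "definition of similar" problem again.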

This book: http://www.dspguide.com/ is quite concise reading for all things DSP-related.

Sounds similar to what 'Shazam' does - awesome iPhone app by the way, check it out if you haven't already (it's free too).

A while ago there was an article on how Shazam works, read it here. It takes an acoustic fingerprint and compares it to other songs' fingerprints, returning the closest match.

I would say there is a lot of math, probably some matrices and maybe Fourier transforms involved in fingerprinting and then trying to compare the audio.
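To give a feel for the fingerprint-and-compare idea, here is a deliberately simplified sketch in Python. It is not Shazam's actual algorithm: it only records the strongest frequency bin of each frame and counts how many frames agree. The function names and the `frame_size` default are assumptions for illustration, the samples are assumed to be already decoded into a list of floats, and the naive DFT would be replaced by a real FFT in practice.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitudes of the first N/2 DFT bins (naive O(N^2) transform)."""
    n = len(frame)
    mags = []
    for k in range(n // 2):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags

def fingerprint(samples, frame_size=64):
    """One dominant-frequency bin index per non-overlapping frame."""
    peaks = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        mags = dft_magnitudes(samples[start:start + frame_size])
        # Skip bin 0 (the DC component) when looking for the strongest bin.
        peaks.append(max(range(1, len(mags)), key=lambda k: mags[k]))
    return peaks

def fingerprint_similarity(fp_a, fp_b):
    """Fraction of aligned frames whose dominant bins agree (0.0 to 1.0)."""
    n = min(len(fp_a), len(fp_b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(fp_a, fp_b) if x == y)
    return matches / n
```

A real fingerprint would keep several peaks per frame and tolerate time offsets between the two recordings; this version assumes both files start at the same point.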

Probably would take a good while to program. If your math skills are up to it though, sounds like a good challenge :-)

EDIT: turns out there was some source code on the site I linked. It's in Java but would be well worth a look through before you start writing your own. Source code here

I am working on something similar in Java on a speech recognition app.

I would recommend using MFCC (requires calculating FFT) for feature extraction and neural networks or some other sort of machine learning technique for training and recognition. You train the NN with the features extracted from the reference WAV file, more precisely from consecutive equal-length slices/windows of that audio file. Then you use the NN to detect whether another file, also split into slices, has the same features.
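As a rough illustration of the feature-extraction half, here is a sketch in Python of MFCCs computed over equal-length slices. Several things are assumptions rather than part of the answer above: the function names, the default frame size and filter counts, the commonly used 2595 * log10(1 + f / 700) mel formula, and above all the comparison step, which is a plain average Euclidean distance standing in for the neural network. The naive DFT is only for self-containment; a real FFT would be used in practice.

```python
import cmath
import math

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Power of DFT bins 0..N/2 after a Hamming window (naive O(N^2) DFT)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))
                for t, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum

def mel_filterbank(num_filters, frame_size, sample_rate):
    """Triangular filters evenly spaced on the mel scale, as weights per DFT bin."""
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    edges = [hz * frame_size / sample_rate for hz in edges_hz]  # fractional bin positions
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        weights = []
        for k in range(frame_size // 2 + 1):
            if lo < k <= mid:
                weights.append((k - lo) / (mid - lo))   # rising edge
            elif mid < k < hi:
                weights.append((hi - k) / (hi - mid))   # falling edge
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank

def mfcc(frame, bank, num_coeffs):
    """Log mel filterbank energies followed by a DCT-II."""
    spec = power_spectrum(frame)
    # Floor the energy so log() never sees zero.
    log_e = [math.log(max(sum(w * p for w, p in zip(weights, spec)), 1e-10))
             for weights in bank]
    m = len(log_e)
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / m) for j, e in enumerate(log_e))
            for c in range(num_coeffs)]

def mfcc_frames(samples, sample_rate, frame_size=128, num_filters=12, num_coeffs=8):
    """MFCC vector for each consecutive, non-overlapping slice of the signal."""
    bank = mel_filterbank(num_filters, frame_size, sample_rate)
    return [mfcc(samples[s:s + frame_size], bank, num_coeffs)
            for s in range(0, len(samples) - frame_size + 1, frame_size)]

def mean_mfcc_distance(feats_a, feats_b):
    """Average Euclidean distance between aligned frames (0.0 means identical)."""
    pairs = list(zip(feats_a, feats_b))
    if not pairs:
        raise ValueError("need at least one frame in each input")
    return sum(math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
               for a, b in pairs) / len(pairs)
```

The per-slice vectors returned by `mfcc_frames` are what you would feed to a classifier as training inputs; the distance function is only the simplest possible way to turn two sets of features into a single number.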

This is the basic idea, which you can elaborate on to fit your own specifications and exactly what you want your app to do.

In terms of libraries in Objective-C, I think you can find a few for the signal processing part (FFT and such). As for the machine learning part, I have no idea what you could find.

As for programming time, it's hard to estimate because it depends on a lot of details. I would say somewhere around a week, but that's just a rough estimate.

P.S. MFCC stands for Mel-Frequency Cepstral Coefficients: http://en.wikipedia.org/wiki/Mel-frequency_cepstrum