Compare sounds inside of the App

Developer https://www.devze.com 2023-01-16 08:21 Source: Internet
Is it possible to compare two sounds? For example, the app already has a sound file (mp3 or any other format). Is it possible to compare a static sound file with a sound recorded inside the app?

Any comments are welcomed.

Regards


This forum thread has a good answer (about three down) - http://www.dsprelated.com/showmessage/103820/1.php.

The trick is to get the decoded audio from the mp3 - if they're just short 'hello' sounds, I'd store them inside the app as a wav instead of decoding them (though I've never used CoreAudio or any of the other frameworks before so mp3 decoding into memory might be easy).
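As a rough illustration of why wav is the easy route, here is a minimal sketch of pulling raw samples out of a wav file. It is in Python purely for illustration (not iOS code), the helper name read_wav_mono is made up, and it assumes uncompressed 16-bit PCM:

```python
import struct
import wave

def read_wav_mono(path):
    """Return (sample_rate, samples) with samples as floats in [-1.0, 1.0).

    Assumes uncompressed 16-bit PCM; multi-channel audio is averaged to mono.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    # WAV PCM data is little-endian: one signed 16-bit value per channel per frame.
    values = struct.unpack("<%dh" % (len(raw) // 2), raw)
    samples = []
    for i in range(0, len(values) - channels + 1, channels):
        frame = values[i:i + channels]
        samples.append(sum(frame) / (channels * 32768.0))
    return rate, samples
```

The returned sample rate matters later: if the reference and the recording were captured at different rates, they need to be brought to a common rate before their spectra can be compared bin by bin.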

When you've got your reference wav and your recorded wav, follow the steps in the post linked above:

1 Do whatever is necessary to convert the .wav files to their discrete-time signals:

http://www.sonicspot.com/guide/wavefiles.html

2 Time warping might or might not be necessary, depending on the difference between the two sample rates:

http://en.wikipedia.org/wiki/Dynamic_time_warping

3 After time warping, truncate both signals so that their durations are equivalent.

4 Compute the normalized energy spectral density (ESD) from the DFTs of the two signals:

http://en.wikipedia.org/wiki/Power_spectrum

6 Compute the mean squared error (MSE) between the normalized ESDs of the two signals:

http://en.wikipedia.org/wiki/Mean_squared_error
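The truncate, ESD and MSE steps above can be sketched roughly as follows. This is Python for illustration only, and the names dft, normalized_esd, esd_mse and tone are mine, not from the linked post. It assumes both signals are already plain lists of samples at the same sample rate, and it skips time warping entirely. The DFT is a naive O(n^2) version so the sketch has no dependencies; for real recordings you would want an FFT library.

```python
import cmath
import math

def dft(signal):
    """Naive O(n^2) discrete Fourier transform; fine for short clips only."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def normalized_esd(signal):
    """Energy spectral density |X[k]|^2, scaled so the bins sum to 1."""
    energy = [abs(x) ** 2 for x in dft(signal)]
    total = sum(energy)
    if total == 0:
        raise ValueError("signal is silent; ESD cannot be normalized")
    return [e / total for e in energy]

def esd_mse(reference, recorded):
    """Truncate both signals to equal length, then compare normalized ESDs."""
    n = min(len(reference), len(recorded))
    esd_a = normalized_esd(reference[:n])
    esd_b = normalized_esd(recorded[:n])
    return sum((a - b) ** 2 for a, b in zip(esd_a, esd_b)) / n

def tone(freq_hz, n=64, rate=64):
    """Synthetic sine wave, used here in place of real recordings."""
    return [math.sin(2 * math.pi * freq_hz * t / rate) for t in range(n)]

same = esd_mse(tone(5), tone(5))        # identical signals
different = esd_mse(tone(5), tone(12))  # energy sits in different bins
print(same, different)
```

Identical inputs give an MSE of zero, while two tones at different frequencies give a clearly larger value, which is the behaviour the comparison relies on.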

The MSE between the normalized ESDs of two signals is a good metric of closeness. If you have, say, 10 .wav files, and 2 of them are nearly the same but the others are not, the two that are close should have a relatively low MSE. Two perfectly identical signals will obviously have an MSE of zero. Ideally, two "equivalent" signals with different time scales (20-second human talking versus 5-second chipmunk), different energies (soft-spoken human versus yelling chipmunk), and different phases (sampling began at a slightly different instant against the continuous-time input) should still have an MSE of zero, but quantization errors inherent in DSP will yield an MSE slightly greater than zero.

http://en.wikipedia.org/wiki/Minimum_mean-square_error
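To see why the normalization handles the "different energies" and "different phases" cases, here is a short derivation. It assumes (my simplification, not the linked post's) that the second signal y is the first signal x multiplied by a nonzero constant gain a and circularly shifted by d samples, with X and Y their N-point DFTs. It says nothing about the time-scale case.

```latex
% Assumption: y is x with constant gain a (a \neq 0) and a circular shift of d samples.
y[n] = a\,x[(n - d) \bmod N]
\;\Longrightarrow\;
Y[k] = a\,e^{-2\pi i k d / N}\,X[k]

% The phase factor has magnitude 1, so only the gain survives in the ESD.
|Y[k]|^2 = a^2\,|X[k]|^2

% Normalizing by the total energy cancels the gain.
\hat{E}_y[k] = \frac{|Y[k]|^2}{\sum_{j=0}^{N-1} |Y[j]|^2}
             = \frac{a^2\,|X[k]|^2}{a^2 \sum_{j=0}^{N-1} |X[j]|^2}
             = \hat{E}_x[k]

% Hence the MSE between the two normalized ESDs is zero.
\mathrm{MSE} = \frac{1}{N} \sum_{k=0}^{N-1} \bigl(\hat{E}_x[k] - \hat{E}_y[k]\bigr)^2 = 0
```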

You should get two different MSE values, one between your male reference and the recorded track and one between your female reference and the recorded track. The comparison with the lower MSE is probably the correct gender.
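Picking the winner is then just a minimum over the MSE values. A minimal sketch in Python, where the function name closest_reference is hypothetical and the two scores are made-up numbers for illustration, not measurements:

```python
def closest_reference(mse_by_label):
    """Return the label whose reference track gave the smallest MSE."""
    if not mse_by_label:
        raise ValueError("need at least one reference")
    return min(mse_by_label, key=mse_by_label.get)

# Hypothetical MSE values, invented purely to show the call.
scores = {"male": 0.0042, "female": 0.0017}
print(closest_reference(scores))
```

If the two values are very close to each other, the result is probably not trustworthy, so it may be worth requiring some margin between them before accepting the answer.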

I confess that I've never tried to do this and it looks very hard - good luck!
