Compare two audio files for beat/tempo and rating on iPhone

Developer https://www.devze.com 2023-02-01 00:49 Source: the web

I want to develop an iPhone application that can count the number of phrases received when the user sings into the mic.

The application should also be able to determine whether the user's phrases are in or out of cadence with a preset beat. While the user sings into the mic, instrumental-only music plays.

So I have to merge the user's recorded voice with the instrumental music; this gives one audio file. I already have the original song file. I have to compare the two and give the user a rating.

Note: the instrumental music is the original song file without the vocals.

Can anyone please help me? Thanks, Vadivelu

First, you are going to need a solution for audio segmentation and onset detection. There are a few different ways to do this, and some of them have already been discussed on Stack Overflow. Aubio is one library that may help you with this.
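To make the idea concrete, here is a minimal sketch of one of the simpler approaches: flag an onset wherever the short-term energy jumps sharply, then check how many onsets land near the preset beat. It is written in plain Python only to show the arithmetic; it does not use Aubio, and the function names, frame size, `rise_ratio`, `floor`, and `tolerance` values are all assumptions chosen for illustration.

```python
def detect_onsets(samples, sample_rate, frame_size=512, rise_ratio=2.0, floor=1e-4):
    """Return onset times (seconds) where frame energy jumps sharply.

    Assumes `samples` is a list of floats in the range -1.0..1.0.
    """
    onsets = []
    prev_energy = floor
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energy = sum(x * x for x in frame) / frame_size
        # An onset is a frame that is audible and much louder than the previous one
        if energy > floor and energy > rise_ratio * prev_energy:
            onsets.append(start / sample_rate)
        prev_energy = max(energy, floor)
    return onsets


def fraction_on_beat(onsets, bpm, tolerance=0.1):
    """Fraction of onsets within `tolerance` seconds of the nearest beat.

    Assumes the first beat falls at time 0.
    """
    if not onsets:
        return 0.0
    beat = 60.0 / bpm
    hits = 0
    for t in onsets:
        offset = t % beat
        distance = min(offset, beat - offset)  # distance to the nearest beat
        if distance <= tolerance:
            hits += 1
    return hits / len(onsets)
```

The length of the list returned by `detect_onsets` gives a rough count of sung events, and `fraction_on_beat` could feed into the rating. A real detector would need something more robust than a fixed energy ratio, which is where a dedicated library helps.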

The second part, merging the two sound files, should be a simple summing operation between the sample buffers of the incoming microphone sound and the sample buffers of the original audio source.
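As a rough sketch of that summing step, assuming both buffers hold float samples in the range -1.0..1.0 at the same sample rate (the function name and the gain parameters are made up for this example):

```python
def mix_buffers(voice, backing, voice_gain=1.0, backing_gain=1.0):
    """Sum two float sample buffers and clip the result to -1.0..1.0."""
    length = max(len(voice), len(backing))
    mixed = []
    for i in range(length):
        # Treat the shorter buffer as silence once it runs out
        v = voice[i] if i < len(voice) else 0.0
        b = backing[i] if i < len(backing) else 0.0
        s = voice_gain * v + backing_gain * b
        # Hard-clip so the sum stays inside the valid sample range
        mixed.append(max(-1.0, min(1.0, s)))
    return mixed
```

Hard clipping is the crudest way to handle overflow; lowering the two gains so the sum rarely exceeds the range will usually sound better.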

Let me try to understand the application you are building.

1. I have an iPhone and I play Lady Gaga :P.
2. It plays the original song (instrumentals + vocals).
3. As I start singing, the app must detect that I am trying to sing the song that is playing.
4. If it does determine this, it switches to playing instrumentals only (karaoke style).
5. Concurrently, it records my voice. At the end of the song, it does some analysis on how well I sang.

If this is correct, let me try to take a stab at Step #4. The basic idea is that only if I am singing something close to the song being played should it switch into karaoke mode.

I would pre-compute an energy envelope of the vocal-only portion of the song (the part the person is supposed to sing). To get the vocal-only portion, you might have to pay a good singer to sing it, because you probably cannot extract it from the original song.

To compute the energy envelope, I would use something like half-wave rectification followed by a low-pass filter (definitely something causal and fast).
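A minimal sketch of that envelope follower, assuming float samples and using a one-pole low-pass filter as the "causal and fast" smoother (the filter type, the function name, and the 10 Hz default cutoff are my choices for illustration, not requirements):

```python
import math


def energy_envelope(samples, sample_rate, cutoff_hz=10.0):
    """Half-wave rectify, then smooth with a causal one-pole low-pass filter."""
    # Smoothing coefficient for a one-pole filter with the given cutoff
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    envelope = []
    state = 0.0
    for x in samples:
        # Half-wave rectification: keep positive samples, zero the rest
        rectified = x if x > 0.0 else 0.0
        # The filter only looks at the current input and its previous output
        state += alpha * (rectified - state)
        envelope.append(state)
    return envelope
```

Because each output depends only on the current sample and one stored value, the same loop can run on microphone buffers as they arrive, as long as `state` is carried over from one buffer to the next.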

Then, I would listen on the microphone and in real time compute the energy envelope of the input audio.

Knowing that I am 2:00 into "Telephone", I would compare the ground-truth energy envelope from 1:55 to 2:00 with the energy envelope of the last 5 seconds I recorded. I would normalize each envelope in some way. Depending on the overlap score, I would decide whether the person was attempting to sing "Telephone" or not.
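One possible reading of "normalize and score the overlap" is a normalized correlation between the two 5-second envelopes. The sketch below assumes both envelopes have the same length and are already time-aligned; the function names and the 0.6 threshold are placeholders that would need tuning on real recordings.

```python
import math


def normalize(envelope):
    """Subtract the mean and scale to unit length; all zeros if the envelope is flat."""
    mean = sum(envelope) / len(envelope)
    centered = [e - mean for e in envelope]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        return [0.0] * len(envelope)
    return [c / norm for c in centered]


def overlap_score(reference, recorded):
    """Correlation in the range -1..1 between two equal-length envelopes."""
    if len(reference) != len(recorded) or not reference:
        raise ValueError("envelopes must be non-empty and the same length")
    a = normalize(reference)
    b = normalize(recorded)
    return sum(x * y for x, y in zip(a, b))


def is_singing_along(reference, recorded, threshold=0.6):
    """Decide whether the recorded envelope follows the reference closely enough."""
    return overlap_score(reference, recorded) >= threshold
```

Normalizing this way makes the score independent of how loudly the person sings. It does not correct for microphone or playback latency, so in practice you may want to try a few small time offsets and keep the best score.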

Best of luck!

Chuan