Programmatically increase the pitch of an array of audio samples_问答_开发者

Hello kind people of the audio computing world,

I have an array of samples that respresent a recording. Let us say that it is 5 seconds at 44100Hz. How would I play this back at an increased pitch? And is it possible to increase and decrease the pitch dynamically? Like have the pitch slowly increase to double the speed and then back down.

In other words I want to take a recording and play it back as if it is being 'scratched' by a d.j.

Pseudocode is always welcomed. I will be writing this up in C.

Thanks,

EDIT 1

Allow me to clarify my intentions. I want to keep the playback at 44100Hz and so therefore I need to manipulate the samples before playback. This is also because I would want to mix the audio that has an increased pitch with audio that is running at a normal rate.

Expressed in another way, maybe I need to shrink the audio over the same number of samples somehow? That way when it is played back it will sound faster?

EDIT 2

Also, I would like to do this myself. No libraries please (unless you feel I could pick through the code and find something interesting).

EDIT 3

A sample piece of code written in C that takes 2 arguments (array of samples and pitch factor) and then returns an array of the new audio would be fantastic!

PS I've started a bounty on this not because I don't think the answers already given aren't valid. I just thought it would be good to get more feedback on the subject.

AWARD OF BOUNTY

Honestly I wish I could distribute the bounty over several different answers as they were quite a few that I thought were super helpful. Special shoutout to 开发者_如何学运维Daniel for passing me some code and AShelly and Hotpaw2 for putting in such detailed responses.

Ultimately though I used an answer from another SO question referenced by datageist and so the award goes to him.

Thanks again everyone!

Take a look at the "Elephant" paper in Nosredna's answer to this (very similar) SO question: How do you do bicubic (or other non-linear) interpolation of re-sampled audio data?

Sample implementations are provided starting on page 37, and for reference, AShelly's answer corresponds to linear interpolation (on that same page). With a little tweaking, any of the other formulas in the paper could be plugged into that framework.

For evaluating the quality of a given interpolation method (and understanding the potential problems with using "cheaper" schemes), take a look at this page:

http://www.discodsp.com/highlife/aliasing/

For more theory than you probably want to deal with (with source code), this is a good reference as well:

https://ccrma.stanford.edu/~jos/resample/

One way is to keep a floating point index into the original wave, and mix interpolated samples into the output wave.

//Simulate scratching of `inwave`: 
// `rate` is the speedup/slowdown factor. 
// result mixed into `outwave`
// "Sample" is a typedef for the raw audio type.
void ScratchMix(Sample* outwave, Sample* inwave, float rate)
{
   float index = 0;
   while (index < inputLen)
   {
      int i = (int)index;          
      float frac = index-i;      //will be between 0 and 1
      Sample s1 = inwave[i];
      Sample s2 = inwave[i+1];
      *outwave++ += s1 + (s2-s1)*frac;   //do clipping here if needed
      index+=rate;
   }

}

If you want to change rate on the fly, you can do that too.

If this creates noisy artifacts when rate > 1, try replacing *outwave++ += s1 + (s2-s1)*frac; with this technique (from this question)

*outwave++ = InterpolateHermite4pt3oX(inwave+i-1,frac);

where

public static float InterpolateHermite4pt3oX(Sample* x, float t)
{
    float c0 = x[1];
    float c1 = .5F * (x[2] - x[0]);
    float c2 = x[0] - (2.5F * x[1]) + (2 * x[2]) - (.5F * x[3]);
    float c3 = (.5F * (x[3] - x[0])) + (1.5F * (x[1] - x[2]));
    return (((((c3 * t) + c2) * t) + c1) * t) + c0;
}

Example of using the linear interpolation technique on "Windows Startup.wav" with a factor of 1.1. The original is on top, the sped-up version is on the bottom:

Programmatically increase the pitch of an array of audio samples

It may not be mathematically perfect, but it sounds like it should, and ought to work fine for the OP's needs..

Yes, it is possible.

~~But this is not a small amount of pseudo code. You are asking for a time pitch modification algorithm, which is a fairly large and complicated amount of DSP code for decent results.~~

~~Here's a Time Pitch stretching overview from DSP Dimensions. You can also Google for phase vocoder algorithms.~~

ADDED:

If you want to "scratch", as a DJ might do with an LP on a physical turntable, you don't need time-pitch modification. Scratching changes the pitch and the speed of play by the same amount (not independently as would require time-pitch modification).

And the resulting array won't be of the same length, but will be shorter or longer by the amont of pitch/speed change.

You can change the pitch, as well as make the sound play faster or slower by the same ratio, by just resampling the signal using properly filtered interpolation. Just move each sample point, instead of by 1.0, by floating point addition by your desired rate change, then filter and interpolate the data at that point. Interpolation using a windowed Sinc interpolation kernel, with a low-pass filter transition frequency below the lower of the original and interpolated local sample rate, will work fairly well. Searching for "windowed Sinc interpolation" on the web returns lots of suitable result.

You need an interpolation method that includes a low-pass filter, or else you will hear horrible aliasing noise. (The exception to this might be if your original sound file is already severely low-pass filtered a decade or more below the sample rate.)

If you want this done easily, see AShelly's suggestion [edit: as a matter of fact, try it first anyway]. If you need good quality, you basically need a phase vocoder.

The very basic idea of a phase vocoder is to find the frequencies that the sound consists of, change those frequencies as needed and resynthesize the sound. So a brutal simplification would be:

run FFT
change all frequencies by a factor
run inverse FFT

If you're going to implement this yourself, you definitely should read a thorough explanation of how a phase vocoder works. The algorithm really needs many more considerations than the three-step simplification above.

Of course, ready-made implementations exist, but from the question I gather you want to do this yourself.

To decrease and increase the pitch is as simple as playing the sample back at a lower or higher rate than 44.1kHz. This will produce the slower/faster record sound but you'll need to add the 'scratchiness' of real records.

This helped me with resampling, which is same thing you need just looked from the opposite side.

If you can't find code, ping me, I have a nice C routine for this.