Classic! I’ve been in so many meetings like this…

Most of my audio coding work these days is in Flash or iOS, but I always keep an eye on other possible platforms to develop audio apps for. One of the biggies is of course Windows 8, so I figured I should at least get familiar with the basics of doing interactive real-time audio generation in a Windows 8 app, using Microsoft audio APIs.

Much to my surprise I couldn’t find a good demo app (at least when I searched a few months ago). The nearest I could find was an MSDN WASAPI sample, which is fine, but doesn’t do real-time audio generation, i.e. it doesn’t generate audio that’s calculated continuously, on-the-fly, and sent to the audio output with relatively little latency.

So I created my own (the full project source is available as a zip download), using the MSDN WASAPI sample as a starting point and stripping it down to the bare essentials for interactive audio generation. It's a simple continuous sine tone generator, with the sine frequency adjustable via a slider. It looks like this:

My code sets up a real-time audio output that calls an audio generation callback function at regular intervals. The fiddly WASAPI stuff is encapsulated in a C++ AudioOutput class with a very simple interface. Basically, you create the AudioOutput and initialize it with a static callback function and an object pointer:

```cpp
// Create and initialize a WASAPI renderer
m_audioOutput = Make<AudioOutput>();
if (m_audioOutput)
    m_audioOutput->Init(AudioOutputCallback, this);
```

In that static callback function, you can use the object pointer to call a non-static method of an object. In my demo, the callback function is implemented in MainPage.xaml.cpp. MainPage.xaml has a slider that sets the instantaneous frequency of the sine tone. Here’s the static callback function and the redirection to MainPage’s class method:

```cpp
void MainPage::AudioOutputCallback(
    float32 *output, int n, int numChannels, int sampleRate,
    Platform::Object^ user)
{
    MainPage^ mainPage = safe_cast<MainPage^>(user);
    assert(mainPage);
    mainPage->GenerateAudio(output, n, numChannels, sampleRate);
}
```
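The static-callback-plus-user-pointer "trampoline" is a general C++ pattern, so here's a minimal sketch of it that compiles outside Windows. It is an illustration under assumptions, not the demo's code: `FakeAudioOutput` and `DemoPage` are hypothetical names, and a plain `void*` with `static_cast` stands in for the demo's C++/CX `Platform::Object^` and `safe_cast`. No real audio device is involved; `RenderBuffer` just simulates the engine asking for one buffer.

```cpp
#include <cassert>
#include <vector>

// Callback signature: interleaved output buffer, frame count, channel count,
// sample rate, and an opaque user pointer (void* here, instead of Object^).
typedef void (*AudioCallback)(float *output, int n, int numChannels,
                              int sampleRate, void *user);

// Hypothetical stand-in for AudioOutput: stores the callback and user pointer.
class FakeAudioOutput {
public:
    void Init(AudioCallback cb, void *user) { m_cb = cb; m_user = user; }

    // Simulates the audio engine requesting one buffer of n frames.
    void RenderBuffer(std::vector<float> &buf, int n, int numChannels,
                      int sampleRate) {
        buf.assign(n * numChannels, 0.0f);
        if (m_cb) m_cb(buf.data(), n, numChannels, sampleRate, m_user);
    }

private:
    AudioCallback m_cb = nullptr;
    void *m_user = nullptr;
};

class DemoPage {
public:
    // Static trampoline: recover the object from the user pointer and
    // forward to the non-static member function.
    static void AudioOutputCallback(float *output, int n, int numChannels,
                                    int sampleRate, void *user) {
        DemoPage *page = static_cast<DemoPage *>(user);
        assert(page);
        page->GenerateAudio(output, n, numChannels, sampleRate);
    }

    // Placeholder generator: fills every sample with a constant level and
    // counts how many frames have been requested.
    void GenerateAudio(float *output, int n, int numChannels, int /*sampleRate*/) {
        for (int i = 0; i < n * numChannels; i++) output[i] = level;
        framesRendered += n;
    }

    float level = 0.25f;
    int framesRendered = 0;
};
```

Usage mirrors the demo: `out.Init(DemoPage::AudioOutputCallback, &page);` and from then on the output object never needs to know the page's type.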

And here’s MainPage’s GenerateAudio function:

```cpp
void MainPage::GenerateAudio(
    float32 *output, int n, int numChannels, int sampleRate)
{
    // Compute the phase increment for the current frequency
    assert(m_frequency != 0);
    double phaseInc = 2*M_PI*m_frequency/sampleRate;

    // Generate the samples (the same value on every channel)
    for (int i = 0; i < n; i++)
    {
        float32 x = float(0.1 * sin(m_phase));
        for (int ch = 0; ch < numChannels; ch++)
            *output++ = x;
        m_phase += phaseInc;
    }

    // Bring phase back into range [0, 2pi)
    m_phase = fmod(m_phase, 2*M_PI);
}
```
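The key detail in GenerateAudio is that the phase is stored in the object and carried from one callback to the next, so consecutive buffers join without clicks even when the slider changes the frequency. Here's a portable sketch of that phase-accumulator idea, assuming a hypothetical `SineGenerator` class and plain `float` in place of `float32`; it uses `acos(-1.0)` for pi because `M_PI` isn't part of standard C++ (on MSVC it needs `_USE_MATH_DEFINES`).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical portable version of the demo's phase-accumulator sine generator.
class SineGenerator {
public:
    explicit SineGenerator(double frequency) : m_frequency(frequency) {}

    // May be called between buffers (e.g. from a slider handler).
    void SetFrequency(double f) { m_frequency = f; }

    // Fill n interleaved frames with a 0.1-amplitude sine on every channel.
    void Generate(float *output, int n, int numChannels, int sampleRate) {
        const double twoPi = 2.0 * std::acos(-1.0);
        const double phaseInc = twoPi * m_frequency / sampleRate;
        for (int i = 0; i < n; i++) {
            float x = static_cast<float>(0.1 * std::sin(m_phase));
            for (int ch = 0; ch < numChannels; ch++) *output++ = x;
            m_phase += phaseInc;
        }
        // Wrap into [0, 2pi) (for positive frequencies) so the phase
        // never grows large enough to lose precision.
        m_phase = std::fmod(m_phase, twoPi);
    }

    double Phase() const { return m_phase; }

private:
    double m_frequency;
    double m_phase = 0.0;
};
```

Rendering two buffers of 100 frames back to back produces the same samples as one buffer of 200 frames, which is exactly the continuity the callback design relies on.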

Again, you can download a zip with the full project source code. It builds fine in Microsoft Visual Studio Express 2013 for Windows, and runs fine on Windows 8.1 running under Boot Camp on my MacBook Pro. I make no claims that it will build or run on any other configuration! If you find this code useful and adapt it for your own projects, no attribution is necessary… but of course it's always welcome, as are thank-you notes in comments. Enjoy!

Posted in Uncategorized

Reached an interesting milestone a few days ago: 100 user reviews for AudioStretch. The response has been overwhelmingly positive, averaging about 4.5 out of 5 stars across all versions.

Here’s a screenshot of the most recent reviews as they appear on AppAnnie:


If you’ve got a good ear and enjoy playing around with music tools, check out Indiloop. Now’s a great time to sign up, as Indiloop’s having a remix contest, with a top prize of $2000. The task: using Indiloop, create a remix using ‘stems’ from three artists.

No need to buy anything. Signing up to Indiloop is free. All the raw musical material is available on Indiloop. It's all browser based, so no special software is required, just a reasonably fast PC or Mac and a web browser that supports Flash.

My connection to Indiloop: I developed the Flash real-time audio signal processing “engine” that does all the pitch-shifting, time-stretching, and mixing. Basically the fiddly behind-the-scenes code that makes real-time remixing possible in a web browser. A team of very talented folks at Indiloop HQ in Vancouver does everything else – UI design, web and database programming, support, music licensing, legal, finance. Little startups with big ambitions are always team efforts!


Here’s a little promo video for Indiloop, a Vancouver-based company that has a super-cool online music remixing/mashup service. It’s been up and running on the web for a while, and an iPad version is coming soon. I wrote all the audio signal processing code for both versions – in Flash for the web, and native iOS code for iPad.

Indiloop was chosen as one of the finalists for the MidemLab startup competition at MIDEM, a major music industry conference that takes place every year in Cannes, France.

Turns out that a startup I did some audio programming for a couple of years ago, Paris-based Weezic, was also selected as a MidemLab finalist this year.

I’m hoping both Indiloop and Weezic will win in their respective categories!


A few weeks ago, I posted a demo Flash real-time pitch detector, and described a bit how it worked without showing any source code. Well, today I am posting some actual code: C++ code for a very accurate monophonic time-domain pitch estimator.

I’ve used variants of this pitch estimator in various projects since the late 1990s. The version posted here is a simple, non-optimized version which I wrote for a friend. Note that it’s written for clarity, not speed. Note also that it’s only appropriate for monophonic (in the sense of single pitched source) signals. Polyphonic pitch detection is a harder problem, best tackled using spectral techniques.

Scroll to the bottom of this post for the code. I’m releasing it under the MIT license, which means you can do pretty much whatever you like with it as long as your source code also includes the license. Hat-tips in the form of comments, credits, and free copies of whatever products you create using it are welcome.

The first essential step in this and many other time-domain pitch estimators is the normalized autocorrelation (NAC), which in my code looks like this:

```cpp
// Lags minP-1 .. maxP+1 are computed, so the vector needs maxP+2 entries
vector<double> nac(maxP+2);
for ( int p = minP-1; p <= maxP+1; p++ )
{
    double ac = 0.0;        // Standard auto-correlation
    double sumSqBeg = 0.0;  // Sum of squares of beginning part
    double sumSqEnd = 0.0;  // Sum of squares of ending part

    for ( int i = 0; i < n-p; i++ )
    {
        ac += x[i]*x[i+p];
        sumSqBeg += x[i]*x[i];
        sumSqEnd += x[i+p]*x[i+p];
    }
    nac[p] = ac / sqrt( sumSqBeg * sumSqEnd );
}
```

In this code, x points to the input signal, which should normally have a length at least two times the maximum period; minP/maxP are the minimum/maximum periods of interest; p is the ‘lag’ or time shift in samples (i.e. the hypothetical period); and ac is just the standard (non-normalized) autocorrelation.

Plenty of pitch estimators use autocorrelation, but the problem is that the magnitude of the autocorrelation depends on the magnitude of the signal. Another problem is that because the autocorrelation is computed based on fewer points as the ‘lag’ p increases, the autocorrelation tends to get smaller with increasing p. That makes it difficult to choose the best period.

The normalized autocorrelation (nac in the code) is computed by dividing the (non-normalized) autocorrelation ac by the square root of the product of the sums-of-squares of the two sub-sequences that were multiplied to give the autocorrelation. That’s a mouthful – it’s easier to say in math lingo, but clearest in code.
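In math lingo, the loop computes the following for each lag $p$, where the numerator is `ac` and the two sums under the square root are `sumSqBeg` and `sumSqEnd`:

$$
\mathrm{NAC}(p) = \frac{\sum_{i=0}^{n-p-1} x[i]\,x[i+p]}{\sqrt{\left(\sum_{i=0}^{n-p-1} x[i]^2\right)\left(\sum_{i=0}^{n-p-1} x[i+p]^2\right)}}
$$

By the Cauchy–Schwarz inequality, $|\mathrm{NAC}(p)| \le 1$, with $\mathrm{NAC}(p) = 1$ exactly when $x[i+p] = c\,x[i]$ for all $i$ with some constant $c > 0$.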

The net result of the normalization is that if p is exactly equal to the real period (or an integer multiple thereof), the NAC at that period has a value of exactly 1.0. A happy by-product of the normalization is that if you have a signal that’s periodic but has an exponentially growing or decaying envelope, the NAC will still be 1.0. It even works if there’s no energy at the fundamental frequency.
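Those claims are easy to check numerically. This sketch assumes two hypothetical helpers: `NacAtLag`, which evaluates the NAC at a single lag exactly as in the loop above, and `DecayingMissingFundamental`, which builds a test signal whose period is exactly 100 samples but which contains only the 2nd and 3rd harmonics (no energy at the fundamental) under an exponentially decaying envelope.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// NAC at one lag p, computed the same way as the loop in the post.
double NacAtLag(const std::vector<double> &x, int p) {
    double ac = 0.0, sumSqBeg = 0.0, sumSqEnd = 0.0;
    const int n = static_cast<int>(x.size());
    for (int i = 0; i < n - p; i++) {
        ac += x[i] * x[i + p];
        sumSqBeg += x[i] * x[i];
        sumSqEnd += x[i + p] * x[i + p];
    }
    return ac / std::sqrt(sumSqBeg * sumSqEnd);
}

// Periodic signal (integer period) with no fundamental, only harmonics 2
// and 3, multiplied by an exponential envelope decayPerSample^k.
std::vector<double> DecayingMissingFundamental(int n, int period,
                                               double decayPerSample) {
    const double pi = std::acos(-1.0);
    std::vector<double> x(n);
    for (int k = 0; k < n; k++) {
        double env = std::pow(decayPerSample, k);
        x[k] = env * (0.6 * std::sin(2 * pi * 2 * k / period) +
                      0.3 * std::sin(2 * pi * 3 * k / period));
    }
    return x;
}
```

Because each sample equals the one a period earlier times a constant factor, the NAC at lags 100 and 200 comes out as 1.0 to within rounding, while at lag 50 (where only the 2nd harmonic lines up) it stays well below 1.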

I came up with this particular normalization independently in the late 1990s, but it's not unique. A couple of years ago, a friend of mine, Dave Fernandes (CEO of Mint Leaf Software), told me my normalization is the same as the one in Paul Boersma's Praat speech analysis tool, which has been around since the mid-1990s. And I've seen the same normalization in various academic papers, for example this 2003 paper by Sumit Basu from Microsoft.

What's more interesting than the NAC algo is how one can apply some simple tricks to get musically useful results. First off, we need to improve the resolution. For typical sample rates and musical pitches, estimating the period to the nearest number of samples is nowhere near good enough. Consider the top note on a piano, C8 = 4186Hz. For a sample rate of 44.1kHz, C8 would have a period of about 10.54 samples. If we could only estimate the period to the nearest sample, we'd get either 10 or 11 samples: an error of about 5%, nearly a semitone! To get a vastly better estimate, we can apply quadratic interpolation using the peak NAC and the points on either side of it. With this interpolation, the error is reduced to tiny fractions of a semitone, at least throughout the entire range of a piano.
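Here's the interpolation step pulled out on its own, as a sketch with hypothetical helper names. `ParabolicPeakOffset` fits a parabola through the NAC values at `bestP-1`, `bestP` and `bestP+1` and returns the vertex offset in samples; it uses the same formula as `EstimatePeriod` in the full listing. `CentsError` converts a period error into cents (1200 per octave). Note that the NAC peak is only approximately parabolic, so on real signals the interpolation is an approximation; on an exact parabola it recovers the vertex exactly.

```cpp
#include <cassert>
#include <cmath>

// Vertex offset (in samples, relative to the middle point) of the parabola
// through (-1, left), (0, mid), (+1, right). Assumes mid is a strict local
// maximum, so the denominator is positive.
double ParabolicPeakOffset(double left, double mid, double right) {
    double denom = 2.0 * mid - left - right;
    assert(denom > 0.0);
    return 0.5 * (right - left) / denom;
}

// Pitch error in cents of an estimated period relative to the true period.
// A period that is too short means a frequency that is too high (positive).
double CentsError(double estimatedPeriod, double truePeriod) {
    return 1200.0 * std::log2(truePeriod / estimatedPeriod);
}
```

For example, sampling 1 - (x - 10.3)^2 at x = 9, 10 and 11 gives -0.69, 0.91 and 0.51, and `ParabolicPeakOffset` returns 0.3, placing the peak at 10.3. For C8 at 44.1kHz, rounding the period to 10 samples is about 90 cents sharp and rounding to 11 is about 75 cents flat.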

The other challenge is to eliminate so-called 'octave errors' in which the estimated period is actually a multiple of the real period (e.g. C4 gets misrecognized as C3). The trick is to check whether the NAC has strong peaks at integer submultiples of the period implied by the strongest peak. For example, if the biggest peak in the NAC is for a period of 300 samples, but there are very strong peaks at 100 and 200 as well, then assume the period is 100 samples.
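The submultiple check can be isolated and tried on a toy NAC array. This is a sketch, not a drop-in replacement: `CorrectOctaveErrors` is a hypothetical helper that mirrors the loop at the end of `EstimatePeriod` in the full listing, trying the largest divisor first and accepting it only if the NAC at every submultiple `k*pEst/mul` is at least 90% of the peak value.

```cpp
#include <cassert>
#include <vector>

// nac: normalized autocorrelation indexed by integer lag.
// bestP: lag of the strongest peak; pEst: its interpolated (fractional) value.
// Returns pEst divided by the largest divisor whose submultiples are all strong.
double CorrectOctaveErrors(const std::vector<double> &nac, int bestP,
                           double pEst, int minP, double threshold = 0.90) {
    int maxMul = bestP / minP;
    for (int mul = maxMul; mul >= 1; mul--) {
        bool subsAllStrong = true;
        for (int k = 1; k < mul; k++) {
            int subMulP = static_cast<int>(k * pEst / mul + 0.5);
            if (nac[subMulP] < threshold * nac[bestP]) subsAllStrong = false;
        }
        if (subsAllStrong) return pEst / mul; // mul == 1 always succeeds
    }
    return pEst; // only reached if bestP < minP (no divisors tried)
}
```

With strong peaks at 100, 200 and 300 (the example above), the function returns 100; with strong peaks only at 150 and 300, it returns 150, because every larger divisor lands on at least one weak lag.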

To build the sample, just drop the code (see below) into a main.cpp in your favourite C++ development environment. It's hard-coded to generate and analyze a signal with fundamental frequency corresponding to middle C (C4 ~= 261.6 Hz). When you run it, you should get this output:

```
Actual freq: 261.626
Estimated freq: 261.625
Error (cents): -0.002
Periodicity quality: 1.000
```

'Actual freq' is the pitch of the test signal; 'Estimated freq' is the estimated pitch of the test signal as computed by my algorithm; 'Error (cents)' is the error in the estimate in hundredths of a semitone; and 'Periodicity quality' is a measure of how periodic the signal is, with 1.0 meaning perfectly periodic. Note that error: 2 millicents!

Incidentally, that ‘periodicity quality’ can be handy if you’re synthesizing speech – rather than use a boolean voiced/unvoiced decision, you can synthesize with varying amounts of noisiness based on the quality. Also good for synthesis of musical signals (which after all are never purely periodic or purely noisy).

There are loads of possible performance tweaks not shown here:

- Use an FFT-based method to compute the autocorrelation.
- Compute the square of each sample only once.
- Find places in the signal where the periodicity is very strong, then scan forwards and backwards from there to track the pitch into the less certain portions. (Typically only applicable for non-real-time cases).
- Limit the search range to only a small part of the possible pitch range around the most recently identified pitch.
- Mix in some noise and/or use a level threshold to prevent spurious pitch detections when the signal level gets very small.
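As one concrete illustration of the "compute the square of each sample only once" tweak, here's a sketch (hypothetical function name `NacPrefixSums`) that precomputes a running sum of squares, S[m] = x[0]^2 + ... + x[m-1]^2. For lag p, the beginning-part energy is then S[n-p] and the ending-part energy is S[n] - S[p], each in constant time. The cross term `ac` is still O(n) per lag; speeding that up would need the FFT approach from the first bullet. Like the original, it divides by zero on pure silence, which is one reason for the level threshold in the last bullet.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Normalized autocorrelation for lags minP-1 .. maxP+1, with the energy
// terms taken from a prefix sum of squares instead of being recomputed per lag.
std::vector<double> NacPrefixSums(const std::vector<double> &x, int minP, int maxP) {
    const int n = static_cast<int>(x.size());
    assert(minP > 1 && maxP > minP && n >= 2 * maxP);

    // S[m] = sum of x[i]^2 for i < m (each sample squared exactly once)
    std::vector<double> S(n + 1, 0.0);
    for (int i = 0; i < n; i++) S[i + 1] = S[i] + x[i] * x[i];

    std::vector<double> nac(maxP + 2, 0.0);
    for (int p = minP - 1; p <= maxP + 1; p++) {
        double ac = 0.0;
        for (int i = 0; i < n - p; i++) ac += x[i] * x[i + p];
        double sumSqBeg = S[n - p];     // x[0] .. x[n-p-1]
        double sumSqEnd = S[n] - S[p];  // x[p] .. x[n-1]
        nac[p] = ac / std::sqrt(sumSqBeg * sumSqEnd);
    }
    return nac;
}
```

The results match the straightforward loop to within floating-point rounding, so it can be swapped in without changing the rest of the estimator.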

All things I’ve done at various times, and which people ‘skilled-in-the-art’ (as they say in patents) could figure out. Besides performance optimizations, to use this algo in a real-time context you have to know how to capture input audio, how to call the pitch estimator at appropriate times, how to display the output, etc. These are left as exercises for the reader (as they say in textbooks)… but if you’d like to hire someone to help with it, I know someone you can call. ;-)

```cpp
// ===================================================================
// PeriodEstimator demo
//
// Demonstrates use of period estimator algorithm based on
// normalized autocorrelation. Other neat tricks include sub-sample
// accuracy of the period estimate, and avoidance of octave errors.
//
// Released under the MIT License
//
// The MIT License (MIT)
//
// Copyright (c) 2009 Gerald T Beauregard
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in
// all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
// THE SOFTWARE.
// ===================================================================

#include <stdio.h>
#include <math.h>
#include <assert.h>
#include <vector>

using namespace std;

double EstimatePeriod(
    const double *x,    // Sample data.
    const int n,        // Number of samples. For best results, should be at least 2 x maxP
    const int minP,     // Minimum period of interest
    const int maxP,     // Maximum period of interest
    double& q );        // Quality (1= perfectly periodic)

int main (int argc, char * const argv[])
{
    const double pi = 4*atan(1.0);

    const double sr = 44100;        // Sample rate.
    const double minF = 27.5;       // Lowest pitch of interest (27.5 = A0, lowest note on piano.)
    const double maxF = 4186.0;     // Highest pitch of interest (4186 = C8, highest note on piano.)

    const int minP = int(sr/maxF-1);    // Minimum period
    const int maxP = int(sr/minF+1);    // Maximum period

    // Generate a test signal
    const double A440 = 440.0;                  // A440
    double f = A440 * pow(2.0,-9.0/12.0);       // Middle C (9 semitones below A440)

    double p = sr/f;
    double q;
    const int n = 2*maxP;
    vector<double> x(n);    // n isn't a compile-time constant, so no plain array
    for ( int k = 0; k < n; k++ )
    {
        x[k] = 0;
        x[k] += 1.0*sin(2*pi*1*k/p);    // First harmonic
        x[k] += 0.6*sin(2*pi*2*k/p);    // Second harmonic
        x[k] += 0.3*sin(2*pi*3*k/p);    // Third harmonic
    }

    // TODO: Add low-pass filter to remove very high frequency
    // energy. Harmonics above about 1/4 of Nyquist tend to mess
    // things up, as their periods are often nowhere close to
    // integer numbers of samples.

    // Estimate the period
    double pEst = EstimatePeriod( &x[0], n, minP, maxP, q );

    // Compute the fundamental frequency (reciprocal of period)
    double fEst = 0;
    if ( pEst > 0 )
        fEst = sr/pEst;

    printf( "Actual freq: %8.3lf\n", f );
    printf( "Estimated freq: %8.3lf\n", fEst );
    printf( "Error (cents): %8.3lf\n", 100*12*log(fEst/f)/log(2.0) );
    printf( "Periodicity quality: %8.3lf\n", q );

    return 0;
}

// ===================================================================
// EstimatePeriod
//
// Returns best estimate of period.
// ===================================================================
double EstimatePeriod(
    const double *x,    // Sample data.
    const int n,        // Number of samples. Should be at least 2 x maxP
    const int minP,     // Minimum period of interest
    const int maxP,     // Maximum period
    double& q )         // Quality (1= perfectly periodic)
{
    assert( minP > 1 );
    assert( maxP > minP );
    assert( n >= 2*maxP );
    assert( x != NULL );

    q = 0;

    // --------------------------------
    // Compute the normalized autocorrelation (NAC). The normalization is such that
    // if the signal is perfectly periodic with (integer) period p, the NAC will be
    // exactly 1.0. (Bonus: NAC is also exactly 1.0 for periodic signal
    // with exponential decay or increase in magnitude).
    // Lags minP-1 .. maxP+1 are computed, so the vector needs maxP+2 entries.
    vector<double> nac(maxP+2);
    for ( int p = minP-1; p <= maxP+1; p++ )
    {
        double ac = 0.0;        // Standard auto-correlation
        double sumSqBeg = 0.0;  // Sum of squares of beginning part
        double sumSqEnd = 0.0;  // Sum of squares of ending part

        for ( int i = 0; i < n-p; i++ )
        {
            ac += x[i]*x[i+p];
            sumSqBeg += x[i]*x[i];
            sumSqEnd += x[i+p]*x[i+p];
        }
        nac[p] = ac / sqrt( sumSqBeg * sumSqEnd );
    }

    // ---------------------------------------
    // Find the highest peak in the range of interest.

    // Get the highest value
    int bestP = minP;
    for ( int p = minP; p <= maxP; p++ )
        if ( nac[p] > nac[bestP] )
            bestP = p;

    // Give up if it's the highest value in range, but not actually a peak
    // (a neighbor just outside [minP, maxP] is higher). This can happen
    // if the period is outside the range [minP, maxP].
    if ( nac[bestP] < nac[bestP-1] || nac[bestP] < nac[bestP+1] )
    {
        return 0.0;
    }

    // "Quality" of periodicity is the normalized autocorrelation
    // at the best period (which may be a multiple of the actual
    // period).
    q = nac[bestP];

    // --------------------------------------
    // Interpolate based on neighboring values
    // E.g. if value to right is bigger than value to the left,
    // real peak is a bit to the right of discretized peak.
    // if left == right, real peak = mid;
    // if left == mid, real peak = mid-0.5
    // if right == mid, real peak = mid+0.5
    double mid   = nac[bestP];
    double left  = nac[bestP-1];
    double right = nac[bestP+1];

    assert( 2*mid - left - right > 0.0 );

    double shift = 0.5*(right-left) / ( 2*mid - left - right );

    double pEst = bestP + shift;

    // -----------------------------------------------
    // If the range of pitches being searched is greater
    // than one octave, the basic algo above may make "octave"
    // errors, in which the period identified is actually some
    // integer multiple of the real period. (Makes sense, as
    // a signal that's periodic with period p is technically
    // also periodic with period 2p).
    //
    // Algorithm is pretty simple: we hypothesize that the real
    // period is some "submultiple" of the "bestP" above. To
    // check it, we see whether the NAC is strong at each of the
    // hypothetical subpeak positions. E.g. if we think the real
    // period is at 1/3 our initial estimate, we check whether the
    // NAC is strong at 1/3 and 2/3 of the original period estimate.

    const double k_subMulThreshold = 0.90;  // If strength at all submultiples of peak pos are
                                            // this strong relative to the peak, assume the
                                            // submultiple is the real period.

    // For each possible multiple error (starting with the biggest)
    int maxMul = bestP / minP;
    bool found = false;
    for ( int mul = maxMul; !found && mul >= 1; mul-- )
    {
        // Check whether all "submultiples" of original
        // peak are nearly as strong.
        bool subsAllStrong = true;

        // For each submultiple
        for ( int k = 1; k < mul; k++ )
        {
            int subMulP = int(k*pEst/mul+0.5);
            // If it's not strong relative to the peak NAC, then
            // not all submultiples are strong, so we haven't found
            // the correct submultiple.
            if ( nac[subMulP] < k_subMulThreshold * nac[bestP] )
                subsAllStrong = false;

            // TODO: Use spline interpolation to get better estimates of nac
            // magnitudes for non-integer periods in the above comparison
        }

        // If yes, then we're done. New estimate of
        // period is "submultiple" of original period.
        if ( subsAllStrong == true )
        {
            found = true;
            pEst = pEst / mul;
        }
    }

    return pEst;
}
```


AudioStretch for iOS continues to get overwhelmingly positive user reviews from all over the world!

Many thanks to all who have written reviews or written to support@audiostretch.com with feedback and suggestions. It’s always great to hear from users.

I’m especially grateful to uber-bassist Damian Erskine for his review on notreble.com, and for Gerry Malloy’s review on his PlayRightAway YouTube channel.

Haven’t tried AudioStretch? If you’re not ready to buy, there’s always AudioStretch Lite. It’s free, and apart from some modest limitations (e.g. max song length 5 minutes), it’s nearly identical to the full paid version. If you don’t have an Apple iThing, you can still try AudioStretch for Flash, a pretty darn good simulation of the AudioStretch iOS app which runs in a browser using your own mp3 files.
