A few weeks ago, I posted a demo Flash real-time pitch detector, and described a bit how it worked without showing any source code. Well, today, I am posting some actual code, C++ code for a very accurate monophonic time-domain pitch estimator.

I’ve used variants of this pitch estimator in various projects since the late 1990s. The version posted here is a simple, non-optimized version which I wrote for a friend. Note that it’s written for clarity, not speed. Note also that it’s only appropriate for monophonic (in the sense of single pitched source) signals. Polyphonic pitch detection is a harder problem, best tackled using spectral techniques.

Scroll to the bottom of this post for the code. I’m releasing it under the MIT license, which means you can do pretty much whatever you like with it as long as your source code also includes the license. Hat-tips in the form of comments, credits, and free copies of whatever products you create using it are welcome.

The first essential step in this and many other time-domain pitch estimators is the normalized autocorrelation (NAC), which in my code looks like this:

vector nac(maxP+1);
for ( int p = minP-1; p <= maxP+1; p++ )
{
double ac = 0.0; // Standard auto-correlation
double sumSqBeg = 0.0; // Sum of squares of beginning part
double sumSqEnd = 0.0; // Sum of squares of ending part
for ( int i = 0; i < n-p; i++ )
{
ac += x[i]*x[i+p];
sumSqBeg += x[i]*x[i];
sumSqEnd += x[i+p]*x[i+p];
}
nac[p] = ac / sqrt( sumSqBeg * sumSqEnd );
}

In this code, x points to the input signal, which should normally have a length at least two times the maximum period; minP/maxP are the minimum/maximum periods of interest; p is the ‘lag’ or time shift in samples (i.e. the hypothetical period); and ac is just the standard (non-normalized) autocorrelation.

Plenty of pitch estimators use autocorrelation, but the problem is that the magnitude of the autocorrelation depends on the magnitude of the signal. Another problem is that because the autocorrelation is computed based on fewer points as the ‘lag’ p increases, the autocorrelation tends to get smaller with increasing p. That makes it difficult to choose the best period.

The normalized autocorrelation (nac in the code) is computed by dividing the (non-normalized) autocorrelation ac by the square root of the product of the sums-of-squares of the two sub-sequences that were multiplied to give the autocorrelation. That’s a mouthful – it’s easier to say in math lingo, but clearest in code.

The net result of the normalization is that if p is exactly equal to the real period (or an integer multiple thereof), the NAC at that period has a value of exactly 1.0. A happy by-product of the normalization is that if you have a signal that’s periodic but has an exponentially growing or decaying envelope, the NAC will still be 1.0. It even works if there’s no energy at the fundamental frequency.

I came up with this particular normalization independently in the late 1990s, but it’s not unique. A couple of years ago, a friend of mine, Dave Fernandes (CEO of Mint Leaf Software) told me my normalization is the same as the one in Paul Boersma’s Praat speech analysis tool, which has been around from the mid-1990s. And I’ve seen the same normalization in various academic papers, for example this 2003 paper by Sumit Basu from Microsoft.

What’s more interesting than the NAC algo is how one can apply some simple tricks to get musically useful results. First off, we need to improve the resolution. For typical sample rates and musical pitches, estimating the period to the nearest number of samples is nowhere near good enough. Consider the top note on a piano, C8 = 4186Hz. For a sample rate of 44.1kHz, C8 would have a period of 10.53 samples. If we could only estimate the period to the nearest sample, we’d get either 10 or 11 samples – error of 5%, nearly a semitone! To get a vastly better estimate, we can apply quadratic interpolation using the peak NAC and the points on either side of it. With this interpolation, the error is reduced to tiny fractions of a semitone, throughout the entire range of a piano at least.

The other challenge is to eliminate so-called ‘octave errors’ in which the estimated period is actually a multiple of the real period (e.g. C4 gets misrecognized as C3). The trick is to check whether the NAC has strong peaks at integer submultiples of period implied by the strongest peak. For example, if the biggest peak in the NAC is for a period of 300 samples, but there are very strong peaks at 100 and 200 as well, then assume the period is 100 samples.

To build sample, just drop the code (see below) into a main.cpp in your favourite C++ development environment. It’s hard-coded to generate and analyze a signal with fundamental frequency corresponding to middle C (C4 ~= 261.6 Hz). When you run it, you should get this output:

Actual freq: 261.626
Estimated freq: 261.625
Error (cents): -0.002
Periodicity quality: 1.000

‘Actual freq’ is the pitch of the test signal; ‘Estimated freq’ is the estimated pitch of the test signal as computed by my algorithm; ‘Error (cents)’ is the error in the estimate is hundredths of semitones; and ‘Periodicity quality’ is a measure of how periodic the signal is, which 1.0 meaning perfectly periodic. Note that error: 2 millicents!

Incidentally, that ‘periodicity quality’ can be handy if you’re synthesizing speech – rather than use a boolean voiced/unvoiced decision, you can synthesize with varying amounts of noisiness based on the quality. Also good for synthesis of musical signals (which after all are never purely periodic or purely noisy).

There are loads of possible performance tweaks not shown here:

- Use an FFT-based method to compute the autocorrelation
- Compute the square of each sample only once.
- Find places in the signal where the periodicity is very strong, then scan forwards and backwards from there to track the pitch into the less certain portions. (Typically only applicable for non-real-time cases).
- Limit the search range to only a small part of the possible pitch range around the most recently identified pitch.
- Mix in some noise and/or use a level threshold to prevent spurious pitch detections when the signal level gets very small.

All things I’ve done at various times, and which people ‘skilled-in-the-art’ (as they say in patents) could figure out. Besides performance optimizations, to use this algo in a real-time context you have to know how to capture input audio, how to call the pitch estimator at appropriate times, how to display the output, etc. These are left as exercises for the reader (as they say in textbooks)… but if you’d like to hire someone to help with it, I know someone you can call. ;-)

// ===================================================================
// PeriodEstimator demo
//
// Demonstrates use of period estimator algorithm based on
// normalized autocorrelation. Other neat tricks include sub-sample
// accuracy of the period estimate, and avoidance of octave errors.
//
// Released under the MIT License
//
// The MIT License (MIT)
//
// Copyright (c) 2009 Gerald T Beauregard
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in
// all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
// THE SOFTWARE.
// ===================================================================
#include <stdio.h>
#include <math.h>
#include <assert.h>
#include <vector>
using namespace std;
double EstimatePeriod(
const double *x, // Sample data.
const int n, // Number of samples. For best results, should be at least 2 x maxP
const int minP, // Minimum period of interest
const int maxP, // Maximum period of interest
double& q ); // Quality (1= perfectly periodic)
int main (int argc, char * const argv[])
{
const double pi = 4*atan(1);
const double sr = 44100; // Sample rate.
const double minF = 27.5; // Lowest pitch of interest (27.5 = A0, lowest note on piano.)
const double maxF = 4186.0; // Highest pitch of interest(4186 = C8, highest note on piano.)
const int minP = int(sr/maxF-1); // Minimum period
const int maxP = int(sr/minF+1); // Maximum period
// Generate a test signal
const double A440 = 440.0; // A440
double f = A440 * pow(2.0,-9.0/12.0); // Middle C (9 semitones below A440)
double p = sr/f;
double q;
const int n = 2*maxP;
double x[n];
for ( int k = 0; k < n; k++ )
{
x[k] = 0;
x[k] += 1.0*sin(2*pi*1*k/p); // First harmonic
x[k] += 0.6*sin(2*pi*2*k/p); // Second harmonic
x[k] += 0.3*sin(2*pi*3*k/p); // Third harmonic
}
// TODO: Add low-pass filter to remove very high frequency
// energy. Harmonics above about 1/4 of Nyquist tend to mess
// things up, as their periods are often nowhere close to
// integer numbers of samples.
// Estimate the period
double pEst = EstimatePeriod( x, n, minP, maxP, q );
// Compute the fundamental frequency (reciprocal of period)
double fEst = 0;
if ( pEst > 0 )
fEst = sr/pEst;
printf( "Actual freq: %8.3lf\n", f );
printf( "Estimated freq: %8.3lf\n", sr/pEst );
printf( "Error (cents): %8.3lf\n", 100*12*log(fEst/f)/log(2) );
printf( "Periodicity quality: %8.3lf\n", q );
return 0;
}
// ===================================================================
// EstimatePeriod
//
// Returns best estimate of period.
// ===================================================================
double EstimatePeriod(
const double *x, // Sample data.
const int n, // Number of samples. Should be at least 2 x maxP
const int minP, // Minimum period of interest
const int maxP, // Maximum period
double& q ) // Quality (1= perfectly periodic)
{
assert( minP > 1 );
assert( maxP > minP );
assert( n >= 2*maxP );
assert( x != NULL );
q = 0;
// --------------------------------
// Compute the normalized autocorrelation (NAC). The normalization is such that
// if the signal is perfectly periodic with (integer) period p, the NAC will be
// exactly 1.0. (Bonus: NAC is also exactly 1.0 for periodic signal
// with exponential decay or increase in magnitude).
vector<double> nac(maxP+1);
for ( int p = minP-1; p <= maxP+1; p++ )
{
double ac = 0.0; // Standard auto-correlation
double sumSqBeg = 0.0; // Sum of squares of beginning part
double sumSqEnd = 0.0; // Sum of squares of ending part
for ( int i = 0; i < n-p; i++ )
{
ac += x[i]*x[i+p];
sumSqBeg += x[i]*x[i];
sumSqEnd += x[i+p]*x[i+p];
}
nac[p] = ac / sqrt( sumSqBeg * sumSqEnd );
}
// ---------------------------------------
// Find the highest peak in the range of interest.
// Get the highest value
int bestP = minP;
for ( int p = minP; p <= maxP; p++ )
if ( nac[p] > nac[bestP] )
bestP = p;
// Give up if it's highest value, but not actually a peak.
// This can happen if the period is outside the range [minP, maxP]
if ( nac[bestP] < nac[bestP-1]
&& nac[bestP] < nac[bestP+1] )
{
return 0.0;
}
// "Quality" of periodicity is the normalized autocorrelation
// at the best period (which may be a multiple of the actual
// period).
q = nac[bestP];
// --------------------------------------
// Interpolate based on neighboring values
// E.g. if value to right is bigger than value to the left,
// real peak is a bit to the right of discretized peak.
// if left == right, real peak = mid;
// if left == mid, real peak = mid-0.5
// if right == mid, real peak = mid+0.5
double mid = nac[bestP];
double left = nac[bestP-1];
double right = nac[bestP+1];
assert( 2*mid - left - right > 0.0 );
double shift = 0.5*(right-left) / ( 2*mid - left - right );
double pEst = bestP + shift;
// -----------------------------------------------
// If the range of pitches being searched is greater
// than one octave, the basic algo above may make "octave"
// errors, in which the period identified is actually some
// integer multiple of the real period. (Makes sense, as
// a signal that's periodic with period p is technically
// also period with period 2p).
//
// Algorithm is pretty simple: we hypothesize that the real
// period is some "submultiple" of the "bestP" above. To
// check it, we see whether the NAC is strong at each of the
// hypothetical subpeak positions. E.g. if we think the real
// period is at 1/3 our initial estimate, we check whether the
// NAC is strong at 1/3 and 2/3 of the original period estimate.
const double k_subMulThreshold = 0.90; // If strength at all submultiple of peak pos are
// this strong relative to the peak, assume the
// submultiple is the real period.
// For each possible multiple error (starting with the biggest)
int maxMul = bestP / minP;
bool found = false;
for ( int mul = maxMul; !found && mul >= 1; mul-- )
{
// Check whether all "submultiples" of original
// peak are nearly as strong.
bool subsAllStrong = true;
// For each submultiple
for ( int k = 1; k < mul; k++ )
{
int subMulP = int(k*pEst/mul+0.5);
// If it's not strong relative to the peak NAC, then
// not all submultiples are strong, so we haven't found
// the correct submultiple.
if ( nac[subMulP] < k_subMulThreshold * nac[bestP] )
subsAllStrong = false;
// TODO: Use spline interpolation to get better estimates of nac
// magnitudes for non-integer periods in the above comparison
}
// If yes, then we're done. New estimate of
// period is "submultiple" of original period.
if ( subsAllStrong == true )
{
found = true;
pEst = pEst / mul;
}
}
return pEst;
}