High Accuracy Monophonic Pitch Estimation Using Normalized Autocorrelation


A few weeks ago, I posted a demo Flash real-time pitch detector, and described a bit how it worked without showing any source code.  Well, today, I am posting some actual code,  C++ code for a very accurate monophonic time-domain pitch estimator.

I’ve used variants of this pitch estimator in various projects since the late 1990s. The version posted here is a simple, non-optimized version which I wrote for a friend. Note that it’s written for clarity, not speed. Note also that it’s only appropriate for monophonic (in the sense of single pitched source) signals. Polyphonic pitch detection is a harder problem, best tackled using spectral techniques.

Scroll to the bottom of this post for the code. I’m releasing it under the MIT license, which means you can do pretty much whatever you like with it as long as your source code also includes the license. Hat-tips in the form of comments, credits, and free copies of whatever products you create using it are welcome.

The first essential step in this and many other time-domain pitch estimators is the normalized autocorrelation (NAC), which in my code looks like this:

	vector nac(maxP+1);

	for ( int p =  minP-1; p <= maxP+1; p++ )
	{
		double ac = 0.0;		// Standard auto-correlation
		double sumSqBeg = 0.0;	// Sum of squares of beginning part
		double sumSqEnd = 0.0;	// Sum of squares of ending part

		for ( int i = 0; i < n-p; i++ )
		{
			ac += x[i]*x[i+p];
			sumSqBeg += x[i]*x[i];
			sumSqEnd += x[i+p]*x[i+p];
		}
		nac[p] = ac / sqrt( sumSqBeg * sumSqEnd );
	}

In this code, x points to the input signal, which should normally have a length at least two times the maximum period; minP/maxP are the minimum/maximum periods of interest; p is the ‘lag’ or time shift in samples (i.e. the hypothetical period); and ac is just the standard (non-normalized) autocorrelation.

Plenty of pitch estimators use autocorrelation, but the problem is that the magnitude of the autocorrelation depends on the magnitude of the signal. Another problem is that because the autocorrelation is computed based on fewer points as the ‘lag’ p increases, the autocorrelation tends to get smaller with increasing p. That makes it difficult to choose the best period.

The normalized autocorrelation (nac in the code) is computed by dividing the (non-normalized) autocorrelation ac by the square root of the product of the sums-of-squares of the two sub-sequences that were multiplied to give the autocorrelation. That’s a mouthful – it’s easier to say in math lingo, but clearest in code.

The net result of the normalization is that if p is exactly equal to the real period (or an integer multiple thereof), the NAC at that period has a value of exactly 1.0. A happy by-product of the normalization is that if you have a signal that’s periodic but has an exponentially growing or decaying envelope, the NAC will still be 1.0. It even works if there’s no energy at the fundamental frequency.

I came up with this particular normalization independently in the late 1990s, but it’s not unique. A couple of years ago, a friend of mine, Dave Fernandes (CEO of Mint Leaf Software) told me my normalization is the same as the one in Paul Boersma’s Praat speech analysis tool, which has been around from the mid-1990s. And I’ve seen the same normalization in various academic papers, for example this 2003 paper by Sumit Basu from Microsoft.

What’s more interesting than the NAC algo is how one can apply some simple tricks to get musically useful results. First off, we need to improve the resolution. For typical sample rates and musical pitches, estimating the period to the nearest number of samples is nowhere near good enough. Consider the top note on a piano, C8 = 4186Hz. For a sample rate of 44.1kHz, C8 would have a period of 10.53 samples. If we could only estimate the period to the nearest sample, we’d get either 10 or 11 samples – error of 5%, nearly a semitone! To get a vastly better estimate, we can apply quadratic interpolation using the peak NAC and the points on either side of it. With this interpolation, the error is reduced to tiny fractions of a semitone, throughout the entire range of a piano at least.

The other challenge is to eliminate so-called ‘octave errors’ in which the estimated period is actually a multiple of the real period (e.g. C4 gets misrecognized as C3). The trick is to check whether the NAC has strong peaks at integer submultiples of period implied by the strongest peak. For example, if the biggest peak in the NAC is for a period of 300 samples, but there are very strong peaks at 100 and 200 as well, then assume the period is 100 samples.

To build sample, just drop the code (see below) into a main.cpp in your favourite C++ development environment. It’s hard-coded to generate and analyze a signal with fundamental frequency corresponding to middle C (C4 ~= 261.6 Hz). When you run it, you should get this output:

Actual freq:          261.626
Estimated freq:       261.625
Error (cents):         -0.002
Periodicity quality:    1.000

‘Actual freq’ is the pitch of the test signal; ‘Estimated freq’ is the estimated pitch of the test signal as computed by my algorithm; ‘Error (cents)’ is the error in the estimate is hundredths of semitones; and ‘Periodicity quality’ is a measure of how periodic the signal is, which 1.0 meaning perfectly periodic. Note that error: 2 millicents!

Incidentally, that ‘periodicity quality’ can be handy if you’re synthesizing speech – rather than use a boolean voiced/unvoiced decision, you can synthesize with varying amounts of noisiness based on the quality. Also good for synthesis of musical signals (which after all are never purely periodic or purely noisy).

There are loads of possible performance tweaks not shown here:

  • Use an FFT-based method to compute the autocorrelation
  • Compute the square of each sample only once.
  • Find places in the signal where the periodicity is very strong, then scan forwards and backwards from there to track the pitch into the less certain portions. (Typically only applicable for non-real-time cases).
  • Limit the search range to only a small part of the possible pitch range around the most recently identified pitch.
  • Mix in some noise and/or use a level threshold to prevent spurious pitch detections when the signal level gets very small.

All things I’ve done at various times, and which people ‘skilled-in-the-art’ (as they say in patents) could figure out. Besides performance optimizations, to use this algo in a real-time context you have to know how to capture input audio, how to call the pitch estimator at appropriate times, how to display the output, etc. These are left as exercises for the reader (as they say in textbooks)… but if you’d like to hire someone to help with it, I know someone you can call. 😉

// ===================================================================
//	PeriodEstimator demo
//
//	Demonstrates use of period estimator algorithm based on 
//	normalized autocorrelation. Other neat tricks include sub-sample
//	accuracy of the period estimate, and avoidance of octave errors.
//
//	Released under the MIT License
//
//	The MIT License (MIT)
//
//	Copyright (c) 2009 Gerald T Beauregard
//
//	Permission is hereby granted, free of charge, to any person obtaining a copy
//	of this software and associated documentation files (the "Software"), to deal
//	in the Software without restriction, including without limitation the rights
//	to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
//	copies of the Software, and to permit persons to whom the Software is
//	furnished to do so, subject to the following conditions:
//
//	The above copyright notice and this permission notice shall be included in
//	all copies or substantial portions of the Software.
//
//	THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
//	IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
//	FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
//	AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
//	LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
//	OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
//	THE SOFTWARE.
// ===================================================================

#include <stdio.h>
#include <math.h>
#include <assert.h>
#include <vector>

using namespace std;

double EstimatePeriod(
	const double	*x,			//	Sample data.
	const int		n,			//	Number of samples.  For best results, should be at least 2 x maxP
	const int		minP,		//	Minimum period of interest
	const int		maxP,		//	Maximum period of interest
	double&			q );		//	Quality (1= perfectly periodic)




int main (int argc, char * const argv[])
{
	const double pi = 4*atan(1);

	const double sr = 44100;		//	Sample rate.
	const double minF = 27.5;		//	Lowest pitch of interest (27.5 = A0, lowest note on piano.)
	const double maxF = 4186.0;		//	Highest pitch of interest(4186 = C8, highest note on piano.)
	
	const int minP = int(sr/maxF-1);	//	Minimum period
	const int maxP = int(sr/minF+1);	//	Maximum period

	//	Generate a test signal

	const double A440 = 440.0;				//	A440
	double f = A440 * pow(2.0,-9.0/12.0);	//	Middle C (9 semitones below A440)
	
	double p  = sr/f;
	double q;
	const int n = 2*maxP;
	double x[n];
	
	for ( int k = 0; k < n; k++ )
	{
		x[k] = 0;
		x[k] += 1.0*sin(2*pi*1*k/p);	//	First harmonic
		x[k] += 0.6*sin(2*pi*2*k/p);	//	Second harmonic
		x[k] += 0.3*sin(2*pi*3*k/p);	//	Third harmonic
	}
	
	//	TODO: Add low-pass filter to remove very high frequency 
	//	energy. Harmonics above about 1/4 of Nyquist tend to mess
	//	things up, as their periods are often nowhere close to 
	//	integer numbers of samples.
		
	//	Estimate the period
	double pEst = EstimatePeriod( x, n, minP, maxP, q );
	
	//	Compute the fundamental frequency (reciprocal of period)
	double fEst = 0;
	if ( pEst > 0 )
		fEst = sr/pEst;
	
	printf( "Actual freq:         %8.3lf\n", f );
	printf( "Estimated freq:      %8.3lf\n", sr/pEst );
	printf( "Error (cents):       %8.3lf\n", 100*12*log(fEst/f)/log(2) );
	printf( "Periodicity quality: %8.3lf\n", q );

    return 0;
}



// ===================================================================
//	EstimatePeriod
//
//	Returns best estimate of period.
// ===================================================================
double EstimatePeriod(
	const double	*x,			//	Sample data.
	const int		n,			//	Number of samples.  Should be at least 2 x maxP
	const int		minP,		//	Minimum period of interest
	const int		maxP,		//	Maximum period
	double&			q )			//	Quality (1= perfectly periodic)
{
	assert( minP > 1 );
	assert( maxP > minP );
	assert( n >= 2*maxP );
	assert( x != NULL );
	
	q = 0;
	
	//	--------------------------------
	//	Compute the normalized autocorrelation (NAC).  The normalization is such that
	//	if the signal is perfectly periodic with (integer) period p, the NAC will be
	//	exactly 1.0.  (Bonus: NAC is also exactly 1.0 for periodic signal
	//	with exponential decay or increase in magnitude).
	
	vector<double> nac(maxP+2); // Size is maxP+2 (not maxP+1!) because we need up to element maxP+1 to  check                       
                                    // whether element at maxP is a peak. Thanks to Les Cargill for spotting the bug. 
	
	for ( int p =  minP-1; p <= maxP+1; p++ )
	{
		double ac = 0.0;		// Standard auto-correlation
		double sumSqBeg = 0.0;	// Sum of squares of beginning part
		double sumSqEnd = 0.0;	// Sum of squares of ending part
		
		for ( int i = 0; i < n-p; i++ )
		{
			ac += x[i]*x[i+p];
			sumSqBeg += x[i]*x[i];
			sumSqEnd += x[i+p]*x[i+p];
		}
		nac[p] = ac / sqrt( sumSqBeg * sumSqEnd );
	}
	
	//	---------------------------------------
	//	Find the highest peak in the range of interest.
	
	//	Get the highest value
	int bestP = minP;
	for ( int p = minP; p <= maxP; p++ )
		if ( nac[p] > nac[bestP] )
			bestP = p;
	
	//	Give up if it's highest value, but not actually a peak.
	//	This can happen if the period is outside the range [minP, maxP]
	if ( nac[bestP] < nac[bestP-1] 
	  && nac[bestP] < nac[bestP+1] )
	{
		return 0.0;
	}
	
	//	"Quality" of periodicity is the normalized autocorrelation
	//	at the best period (which may be a multiple of the actual
	//	period).
	q = nac[bestP];
	

	//	--------------------------------------
	//	Interpolate based on neighboring values
	//	E.g. if value to right is bigger than value to the left,
	//	real peak is a bit to the right of discretized peak.
	//	if left  == right, real peak = mid;
	//	if left  == mid,   real peak = mid-0.5
	//	if right == mid,   real peak = mid+0.5
	
	double mid   = nac[bestP];
	double left  = nac[bestP-1];
	double right = nac[bestP+1]; 
	
	assert( 2*mid - left - right > 0.0 );

	double shift = 0.5*(right-left) / ( 2*mid - left - right );
		
	double pEst = bestP + shift;
	
	//	-----------------------------------------------
	//	If the range of pitches being searched is greater
	//	than one octave, the basic algo above may make "octave"
	//	errors, in which the period identified is actually some
	//	integer multiple of the real period.  (Makes sense, as
	//	a signal that's periodic with period p is technically
	//	also period with period 2p).
	//
	//	Algorithm is pretty simple: we hypothesize that the real
	//	period is some "submultiple" of the "bestP" above.  To
	//	check it, we see whether the NAC is strong at each of the
	//	hypothetical subpeak positions.  E.g. if we think the real
	//	period is at 1/3 our initial estimate, we check whether the 
	//	NAC is strong at 1/3 and 2/3 of the original period estimate.
	
	const double k_subMulThreshold = 0.90;	//	If strength at all submultiple of peak pos are 
											//	this strong relative to the peak, assume the 
											//	submultiple is the real period.
											
	//	For each possible multiple error (starting with the biggest)
	int maxMul = bestP / minP;
	bool found = false;
	for ( int mul = maxMul; !found && mul >= 1; mul-- )
	{
		//	Check whether all "submultiples" of original
		//	peak are nearly as strong.
		bool subsAllStrong = true;
		
		//	For each submultiple
		for ( int k = 1; k < mul; k++ )
		{
			int subMulP = int(k*pEst/mul+0.5);
			//	If it's not strong relative to the peak NAC, then
			//	not all submultiples are strong, so we haven't found
			//	the correct submultiple.
			if ( nac[subMulP] < k_subMulThreshold * nac[bestP] )
				subsAllStrong = false;
				
			//	TODO: Use spline interpolation to get better estimates of nac
			//	magnitudes for non-integer periods in the above comparison
		}

		//	If yes, then we're done.   New estimate of 
		//	period is "submultiple" of original period.
		if ( subsAllStrong == true )
		{
			found = true;
			pEst = pEst / mul;
		}
	}
	
	return pEst;
}

Advertisements

About Gerry Beauregard

I'm a Singapore-based Canadian software engineer, inventor, musician, and occasional triathlete. My current work and projects mainly involve audio technology for the web and iOS. I'm the author of AudioStretch, an audio time-stretching/pitch-shifting app for musicians. Past jobs have included writing speech recognition software for Apple, creating automatic video editing software for muvee, and designing ASICs for Nortel. I hold a Bachelor of Applied Science (Electrical Engineering) from Queen's University and a Master of Arts in Electroacoustic Music from Dartmouth College.
This entry was posted in Uncategorized. Bookmark the permalink.

19 Responses to High Accuracy Monophonic Pitch Estimation Using Normalized Autocorrelation

  1. radu says:

    if only we could have it run in the browser

  2. Tyler says:

    Thanks very much for sharing this. I’ve spent a fair amount of time playing around with autocorrelation pitch detection for band instruments. You’ve given me some more ideas to play around with.

    A lot of the recordings I have to process have varying amounts of background music bleeding into the monophonic instrument recordings. It seems like having a bias towards shorter periods (higher frequencies) often helps to guard against octave errors caused by this. Thus far I’ve had the most luck with simply not normalizing, but my guess is that there’s probably some middle ground that would improve things.

    • Thanks for the comment! Autocorrelation (whether normalized or not) tends not to work so well unless you’ve got a truly monophonic source. Even reverb can throw it off. That said, if there’s one monophonic source that’s much louder than any other instruments, it’ll probably work OK, especially if you tweak k_subMulThreshold. These days, I tend to lean more towards frequency domain methods.

      • Tyler says:

        Yes, it seems most solutions go that route nowadays. Unfortunately my knowledge holds me back there. You really have to know what you’re doing to manipulate the data into something usable. If you ever write a book on that, I’ll buy it!

  3. Hi, I’m writing a guitar tuner and had tested various methods , FFT and AutoCorrelation wich gave not enough precision for the guitar frequency range. Your implementation is awesome and works so well ! I did not fully understand it right now but I got the logic thanks to your explanations and comments. Thank you !!

  4. Bruce says:

    Hi ,
    Great page on AutoCorrelation. I am attempting to convert your code to C. How could I handle
    the vector nac(maxP+1); without Vector include file as it only seems to be available for C++
    Thanks

    • You could declare MAX_P as a constant, and use ordinary C-style array notation, e.g. float nac[MAX_P]. Alternatively you could do dynamic allocation and use malloc, e.g. float *nac = (float*)malloc((maxP+1)*sizeof(nac[0])). If you do use malloc(), remember to free() later, otherwise you’ll have memory leaks. Do you have really have to use C? C++ compilers are available for pretty much any platform these days…

  5. Bruce says:

    I got it working with a standard array as you mentioned..thanks.
    I am using it in an embedded project and C is more the norm for ARM STM32F4 development unfortunately. BTW it works really well. The big sqrt loop is a bit of a chug on a uController but I changed the code to floats so it can use the hardware FPU and now have it updating at about 200mS intervals. I reduced the sample rate from 48k to 24k to reduce the sample array dimension and it’s still pretty good up to about 4k. I’m very happy…

    • Glad it’s working well for you. BTW, it’s possible to make it much more efficient by using an FFT + IFFT for the autocorrelation; some notes on that in another one of my posts (https://gerrybeauregard.wordpress.com/2013/04/30/pitch-detection-in-flash/). You might also try eliminating the sqrt(). Just use nac[p] = (ac*ac) / ( sumSqBeg * sumSqEnd ) instead (taking care to set nac[p] to zero is ac is negative to avoid spurious positive peaks), and using the square root of that on for the peak interpolation. [Edit: previously said ‘eliminating the sort()’; the spellchecker had ‘corrected’ sqrt() for me :-(]

  6. Bruce says:

    thanks but not sure what you mean about the sort()
    Do you mean use nac[p] = (ac*ac) / ( sumSqBeg * sumSqEnd ) instead of
    nac[p] = ac /sqrt ( sumSqBeg * sumSqEnd )?
    and then only apply the square root to the values used for
    mid = ptr->nac[bestP];
    left = ptr->nac[bestP – 1];
    right = ptr->nac[bestP + 1];

    so it only requires 3x sqrt() instead of the complete buffer.

    Will the section
    bestP = minP1;
    for (p = minP1; p nac[p] > ptr->nac[bestP])
    bestP = p;
    work without the sqrts?

    • All correct. You just need to be careful in the case where ac is negative, as the square of that will be positive. Something like this should do the trick:
      if (ac > 0 && sumSqBeg > 0 && sumSqEng > 0)
      nac[p] = (ac*ac) / ( sumSqBeg * sumSqEnd );
      else
      nac[p] = 0;

      • Bruce says:

        Excellent thank you… that works great and speeds it up noticeably.
        I can post my C version if you would like to have it here or email it if it’s easier for you to format for the site as I have found WordPress sites have trouble with code tags.

  7. cloudjanak says:

    i want to detect pitch in objective-c can any-one share demo project please.

  8. Les Cargill says:

    HI, Jerry. Nice algo . I’m sort of thinking about building a specialized tuner for pedal steel. Actually, a sort of *designer* for tuning offsets for a given pedal steel. That’s complicated.

    Unfortunately, I do get a heap-corruption crash when building this with Mingw and with Visual Studio 12 on Windows. It references “nac” one cell over the limit in line 141 of your copy ( I added instrumentation to detect bad references in the copy I’m using now, so it’s at a different line number ).

    Hope this helps; I just hate to find a bug and let it go unreported.

    Here’s the instrumentation:

    static int NACSIZE = 0;
    static int zNCC(char *file,int line,int x)
    {
    if (x >= NACSIZE)
    {
    printf(“%s,%d\n”,__FILE__,__LINE__);
    fflush(stdout);
    printf(“nac reference! :%s:%d: index=%d\n”,file,line,x);
    fflush(stdout);
    }
    return x;
    }
    #define NCC(x) zNCC(__FILE__,__LINE__,x)

    then global-search-and-replace all references to nac[..] with nac[NCC(…)] and add this:
    NACSIZE = maxP+1;
    vector nac(maxP+2);
    instead of line 127 of your copy.
    BTW, the “…+2” seem to actually work around it, but I can’t say whether the loop invariants for that for loop are off or not.

    • You’re right! Nice catch! Declaring vector nac’s size as maxP+2 instead of maxP+1 is the best fix. Originally I had the first for loop’s limit as p <= maxP, in which case declaring nac's size as maxP+1 was fine. Later I added some code to check that the maximum nac value was in fact a peak, and that requires having the next element after maxP, i.e. element maxP+1. I changed the for loop limit, but forgot to bump the vector size up to maxP+2. Thanks again for spotting this!

  9. I tried to use this algorithm as-is and I’m finding that although I’m sending it data sampled (in real time) at 44100, the periods being returned for different frequencies indicate that the sample rate seems to be 52800. I’ve no idea why this is — wondering if anyone has an insight. Basically, I’m collecting blocks of 512 samples and everytime I have 4k worth, I call the EstimatePeriod with the samples, using the same minP and maxP as in the example above.
    Thanks in advance.

  10. I should add that the assertion 2*center – left – right > 0.0 fails. Does that help?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s