Downloading audio and video from YouTube

Users of my AudioStretch app sometimes ask me whether you can use YouTube content in it, for example to work out how to play a solo from some jazz concert video. Short answer: you can’t. AudioStretch doesn’t have the ability to grab content directly from YouTube, and even if it were technically possible to add such a feature, it’s probably not allowed.

That said, there are services to download content from YouTube and get it into audio or video files on your PC (or Mac). Once you’ve got the file on your PC, you can copy it into Dropbox. Then in Dropbox on your iPad, you can choose the file and ‘export’ it to AudioStretch (or another app) via the iOS “Open In…” mechanism. So here are a couple of services you can check out. One of them lets you download the audio track of a YouTube video as an mp3 file. Just take the URL from YouTube, copy/paste it into the site, and in a few seconds an mp3 will be ready for download.

If you want to download a video (not just the audio) from YouTube, go to the YouTube video, then change the start of the URL from to, e.g. change:


That second link redirects to a download page, and from there you’ll be able to download the video as an mp4. Note that you can open the audio track of mp4 videos in AudioStretch via “Open In…” from Dropbox, just as you would load up mp3 files.

Disclaimer: I’m unsure of the legality of the above services, and have no idea whether they’re entirely safe, etc. so use them at your own risk! And also, please remember that much material on YouTube is not in the public domain, and probably shouldn’t have been uploaded to YouTube in the first place. If you’re just downloading a song to work it out in AudioStretch or some other time-stretching app, and not redistributing it, that may be OK (at least morally, if not legally). But if you really like a song, support the artist by buying a copy of it from a legitimate source. If your favourite artist puts out educational videos or books for learning their stuff, buy them. You’d be surprised how little most recording artists earn (and most app developers, for that matter). Musicians and other creative people often love doing what they do, but need to buy food and pay the rent too!

Posted in Uncategorized | Leave a comment

The Expert (Short Comedy Sketch)

Classic! I’ve been in so many meetings like this…


Real-Time Audio Generation Using WASAPI on Windows 8.1

Most of my audio coding work these days is in Flash or iOS, but I always keep an eye on other possible platforms to develop audio apps for. One of the biggies is of course Windows 8, so I figured I should at least get familiar with the basics of doing interactive real-time audio generation in a Windows 8 app, using Microsoft audio APIs.

Much to my surprise I couldn’t find a good demo app (at least when I searched a few months ago). The nearest I could find was an MSDN WASAPI sample, which is fine, but doesn’t do real-time audio generation, i.e. it doesn’t generate audio that’s calculated continuously, on-the-fly, and sent to the audio output with relatively little latency.

So I created my own (full code here), using the MSDN WASAPI sample as a starting point, stripping it down to the bare essentials for interactive audio generation. It’s a simple continuous sine tone generator, with the sine frequency adjustable via a slider. It looks like this:


My code sets up a real-time audio output that calls an audio generation callback function at regular intervals. The fiddly WASAPI stuff is encapsulated in a C++ AudioOutput class with a very simple interface. Basically you create the AudioOutput, then initialize it with a static callback function and an object pointer.

	// Create and initialize a WASAPI renderer
	m_audioOutput = Make<AudioOutput>();
	if (m_audioOutput)
		m_audioOutput->Init(AudioOutputCallback, this);

In that static callback function, you can use the object pointer to call a non-static method of an object. In my demo, the callback function is implemented in MainPage.xaml.cpp. MainPage.xaml has a slider that sets the instantaneous frequency of the sine tone. Here’s the static callback function and the redirection to MainPage’s class method:

void MainPage::AudioOutputCallback(
	float32 *output, 
	int n, 
	int numChannels,
	int sampleRate,
	Platform::Object^ user)
{
	// Recover the MainPage instance from the user pointer, then
	// forward the call to its non-static method.
	MainPage^ mainPage = safe_cast<MainPage^>(user);
	mainPage->GenerateAudio(output, n, numChannels, sampleRate);
}

And here’s MainPage’s GenerateAudio function:

void MainPage::GenerateAudio(
	float32 *output, 
	int n, 
	int numChannels, 
	int sampleRate)
{
	// Compute the phase increment for the current frequency
	assert(m_frequency != 0);
	double phaseInc = 2*M_PI*m_frequency/sampleRate;

	// Generate the samples (same value written to every channel)
	for (int i = 0; i < n; i++)
	{
		float32 x = float(0.1 * sin(m_phase));
		for (int ch = 0; ch < numChannels; ch++)
			*output++ = x;
		m_phase += phaseInc;
	}

	// Bring phase back into range [0, 2pi)
	m_phase = fmod(m_phase, 2*M_PI);
}
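
For readers who want to try the generator outside a Windows 8 project, here is a minimal sketch of the same phase-accumulator idea in plain standard C++. The SineOscillator struct and its member names are hypothetical (they are not part of the demo project), and plain float stands in for float32. Because the phase is carried over from one buffer to the next, the frequency can be changed between calls without a discontinuity in the waveform.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical standalone oscillator; not part of the WASAPI demo project.
struct SineOscillator
{
	double phase = 0.0;        // Current phase in radians, carried across buffers
	double frequency = 440.0;  // Tone frequency in Hz; may be changed between calls

	// Fill 'output' with n interleaved frames of a 0.1-amplitude sine tone.
	void generate(float *output, int n, int numChannels, int sampleRate)
	{
		assert(output != nullptr && sampleRate > 0);
		const double twoPi = 2.0 * std::acos(-1.0);
		const double phaseInc = twoPi * frequency / sampleRate;

		for (int i = 0; i < n; i++)
		{
			const float x = static_cast<float>(0.1 * std::sin(phase));
			for (int ch = 0; ch < numChannels; ch++)
				*output++ = x;
			phase += phaseInc;
		}

		// Wrap once per buffer so the phase never grows without bound
		phase = std::fmod(phase, twoPi);
	}
};
```

A host would call generate() from its audio callback with the buffer it is handed, exactly as GenerateAudio is called above.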

Again, you can download a zip with the full project source code. It builds fine in Microsoft Visual Studio Express 2013 for Windows, and runs fine on Windows 8.1 running under Bootcamp on my MacBook Pro. I make no claims that it will build or run on any other configuration! If you find this code useful and adapt the code for your own projects, no attribution is necessary… but of course it’s always welcome, as are thank you notes in comments. Enjoy!


100 AppStore Reviews for AudioStretch

Reached an interesting milestone a few days ago: 100 user reviews for AudioStretch. The response has been overwhelmingly positive, averaging about 4.5 out of 5 stars across all versions.

Here’s a screenshot of the most recent reviews as they appear on AppAnnie:



Indiloop “Mix for the Masses” contest

If you’ve got a good ear and enjoy playing around with music tools, check out Indiloop. Now’s a great time to sign up, as Indiloop’s having a remix contest, with a top prize of $2000. The task: using Indiloop, create a remix using ‘stems’ from three artists.


No need to buy anything. Signing up to Indiloop is free. All the raw musical material is available on Indiloop. It’s all browser based, so no special software required, just a reasonably fast PC or Mac, and a web browser that supports Flash.

My connection to Indiloop: I developed the Flash real-time audio signal processing “engine” that does all the pitch-shifting, time-stretching, and mixing. Basically the fiddly behind-the-scenes code that makes real-time remixing possible in a web browser. A team of very talented folks at Indiloop HQ in Vancouver does everything else – UI design, web and database programming, support, music licensing, legal, finance. Little startups with big ambitions are always team efforts!



Here’s a little promo video for Indiloop, a Vancouver-based company that has a super-cool online music remixing/mashup service. It’s been up and running on the web for a while, and an iPad version is coming soon. I wrote all the audio signal processing code for both versions – in Flash for the web, and native iOS code for iPad.

Indiloop was chosen as one of the finalists for the MidemLab startup competition at MIDEM, a major music industry conference that takes place every year in Cannes, France.

Turns out that a startup I did some audio programming for a couple of years ago, Paris-based Weezic, was also selected as a MidemLab finalist this year.

I’m hoping both Indiloop and Weezic will win in their respective categories!


High Accuracy Monophonic Pitch Estimation Using Normalized Autocorrelation

A few weeks ago, I posted a demo Flash real-time pitch detector, and described a bit how it worked without showing any source code. Well, today I am posting some actual code: C++ code for a very accurate monophonic time-domain pitch estimator.

I’ve used variants of this pitch estimator in various projects since the late 1990s. The version posted here is a simple, non-optimized version which I wrote for a friend. Note that it’s written for clarity, not speed. Note also that it’s only appropriate for monophonic (in the sense of single pitched source) signals. Polyphonic pitch detection is a harder problem, best tackled using spectral techniques.

Scroll to the bottom of this post for the code. I’m releasing it under the MIT license, which means you can do pretty much whatever you like with it as long as your source code also includes the license. Hat-tips in the form of comments, credits, and free copies of whatever products you create using it are welcome.

The first essential step in this and many other time-domain pitch estimators is the normalized autocorrelation (NAC), which in my code looks like this:

	vector<double> nac(maxP+2);	// Indices 0..maxP+1; the loop below writes nac[maxP+1]

	for ( int p =  minP-1; p <= maxP+1; p++ )
	{
		double ac = 0.0;		// Standard auto-correlation
		double sumSqBeg = 0.0;	// Sum of squares of beginning part
		double sumSqEnd = 0.0;	// Sum of squares of ending part

		for ( int i = 0; i < n-p; i++ )
		{
			ac += x[i]*x[i+p];
			sumSqBeg += x[i]*x[i];
			sumSqEnd += x[i+p]*x[i+p];
		}
		nac[p] = ac / sqrt( sumSqBeg * sumSqEnd );
	}

In this code, x points to the input signal, which should normally have a length at least two times the maximum period; minP/maxP are the minimum/maximum periods of interest; p is the ‘lag’ or time shift in samples (i.e. the hypothetical period); and ac is just the standard (non-normalized) autocorrelation.

Plenty of pitch estimators use autocorrelation, but the problem is that the magnitude of the autocorrelation depends on the magnitude of the signal. Another problem is that because the autocorrelation is computed based on fewer points as the ‘lag’ p increases, the autocorrelation tends to get smaller with increasing p. That makes it difficult to choose the best period.

The normalized autocorrelation (nac in the code) is computed by dividing the (non-normalized) autocorrelation ac by the square root of the product of the sums-of-squares of the two sub-sequences that were multiplied to give the autocorrelation. That’s a mouthful – it’s easier to say in math lingo, but clearest in code.
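
For those who prefer the math lingo, the loop can be written as a single formula. This is only a transcription of the code, using the same names: x is the signal, n its length, and p the lag.

```latex
\mathrm{NAC}(p) =
  \frac{\displaystyle\sum_{i=0}^{n-p-1} x_i \, x_{i+p}}
       {\sqrt{\displaystyle\sum_{i=0}^{n-p-1} x_i^{2} \;\cdot\; \sum_{i=0}^{n-p-1} x_{i+p}^{2}}}
```

The numerator is ac; the two sums under the square root are sumSqBeg and sumSqEnd.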

The net result of the normalization is that if p is exactly equal to the real period (or an integer multiple thereof), the NAC at that period has a value of exactly 1.0. A happy by-product of the normalization is that if you have a signal that’s periodic but has an exponentially growing or decaying envelope, the NAC will still be 1.0. It even works if there’s no energy at the fundamental frequency.
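
Both of those claims are easy to check numerically. The sketch below is illustrative only: nacAtLag repeats the formula from the loop above for a single lag, and makeDecayingSignal is a hypothetical helper that builds a test signal with an integer period, harmonics 2 and 3 only (so no energy at the fundamental), and an exponentially decaying envelope.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Normalized autocorrelation of x at a single lag p (same formula as the loop above).
double nacAtLag(const std::vector<double> &x, int p)
{
	const int n = static_cast<int>(x.size());
	assert(p > 0 && p < n);
	double ac = 0.0, sumSqBeg = 0.0, sumSqEnd = 0.0;
	for (int i = 0; i < n - p; i++)
	{
		ac += x[i] * x[i + p];
		sumSqBeg += x[i] * x[i];
		sumSqEnd += x[i + p] * x[i + p];
	}
	return ac / std::sqrt(sumSqBeg * sumSqEnd);
}

// Hypothetical test signal: harmonics 2 and 3 of the given period (no fundamental),
// multiplied by an envelope that shrinks by decayPerSample every sample.
std::vector<double> makeDecayingSignal(int n, int period, double decayPerSample)
{
	const double twoPi = 2.0 * std::acos(-1.0);
	std::vector<double> x(n);
	double env = 1.0;
	for (int i = 0; i < n; i++)
	{
		x[i] = env * (std::sin(twoPi * 2 * i / period)
		      + 0.5 * std::sin(twoPi * 3 * i / period));
		env *= decayPerSample;
	}
	return x;
}
```

With a period of 100 samples and a decay of 0.999 per sample, nacAtLag(x, 100) comes out at 1.0 to within rounding error, while the value at a lag of 50 is clearly lower.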

I came up with this particular normalization independently in the late 1990s, but it’s not unique. A couple of years ago, a friend of mine, Dave Fernandes (CEO of Mint Leaf Software) told me my normalization is the same as the one in Paul Boersma’s Praat speech analysis tool, which has been around since the mid-1990s. And I’ve seen the same normalization in various academic papers, for example this 2003 paper by Sumit Basu from Microsoft.

What’s more interesting than the NAC algo is how one can apply some simple tricks to get musically useful results. First off, we need to improve the resolution. For typical sample rates and musical pitches, estimating the period to the nearest number of samples is nowhere near good enough. Consider the top note on a piano, C8 = 4186Hz. For a sample rate of 44.1kHz, C8 would have a period of 10.53 samples. If we could only estimate the period to the nearest sample, we’d get either 10 or 11 samples – an error of about 5%, nearly a semitone! To get a vastly better estimate, we can apply quadratic interpolation using the peak NAC and the points on either side of it. With this interpolation, the error is reduced to tiny fractions of a semitone, throughout the entire range of a piano at least.
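
The interpolation step on its own is tiny. Here is a sketch of it as a standalone function; the name parabolicPeakOffset is mine, but the formula is the same one used for 'shift' in the full listing at the bottom of this post. It fits a parabola through the peak and its two neighbours and returns how far the vertex lies from the middle point.

```cpp
#include <cassert>
#include <cmath>

// Offset, in samples, of the vertex of the parabola through three equally
// spaced points (left, mid, right), measured from the position of 'mid'.
// When mid is at least as large as both neighbours, the result is in [-0.5, +0.5].
double parabolicPeakOffset(double left, double mid, double right)
{
	const double curvature = 2.0 * mid - left - right;
	assert(curvature > 0.0);	// The three points must not lie on a line or form a dip
	return 0.5 * (right - left) / curvature;
}
```

The refined period is then bestP + parabolicPeakOffset(nac[bestP-1], nac[bestP], nac[bestP+1]). If the three values really do lie on a parabola, the vertex is recovered exactly; a NAC peak is only approximately parabolic, so in practice the result is a close estimate rather than an exact one.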

The other challenge is to eliminate so-called ‘octave errors’ in which the estimated period is actually a multiple of the real period (e.g. C4 gets misrecognized as C3). The trick is to check whether the NAC has strong peaks at integer submultiples of the period implied by the strongest peak. For example, if the biggest peak in the NAC is for a period of 300 samples, but there are very strong peaks at 100 and 200 as well, then assume the period is 100 samples.
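
Here is that check pulled out as a simplified standalone sketch. The function name pickFundamentalPeriod is hypothetical, and unlike the full listing it works from the integer peak position rather than the interpolated one. The 0.9 default threshold matches k_subMulThreshold in the full code.

```cpp
#include <cassert>
#include <vector>

// Given the NAC values and the lag of the strongest peak, return the smallest
// submultiple of that lag whose intermediate peaks are all nearly as strong.
double pickFundamentalPeriod(const std::vector<double> &nac, int bestP, int minP,
                             double threshold = 0.9)
{
	assert(minP >= 1 && bestP >= minP);
	assert(bestP < static_cast<int>(nac.size()));

	// Try the largest possible multiple error first
	for (int mul = bestP / minP; mul >= 1; mul--)
	{
		bool subsAllStrong = true;
		for (int k = 1; k < mul; k++)
		{
			// Nearest integer lag to k/mul of the strongest peak's lag
			const int subP = static_cast<int>(static_cast<double>(k) * bestP / mul + 0.5);
			if (nac[subP] < threshold * nac[bestP])
			{
				subsAllStrong = false;
				break;
			}
		}
		if (subsAllStrong)
			return static_cast<double>(bestP) / mul;
	}
	return bestP;	// Not reached: mul == 1 always succeeds
}
```

For the example in the text (strongest peak at 300, strong peaks at 100 and 200, nothing much elsewhere), this returns 100. If only the peak at 300 is strong, it returns 300.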

To build the sample, just drop the code (see below) into a main.cpp in your favourite C++ development environment. It’s hard-coded to generate and analyze a signal with a fundamental frequency corresponding to middle C (C4 ~= 261.6 Hz). When you run it, you should get this output:

Actual freq:          261.626
Estimated freq:       261.625
Error (cents):         -0.002
Periodicity quality:    1.000

‘Actual freq’ is the pitch of the test signal; ‘Estimated freq’ is the estimated pitch of the test signal as computed by my algorithm; ‘Error (cents)’ is the error in the estimate in hundredths of a semitone; and ‘Periodicity quality’ is a measure of how periodic the signal is, with 1.0 meaning perfectly periodic. Note that error: 2 millicents!
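
In case the cents unit is unfamiliar: there are 1200 cents to an octave, so the error is 1200 times the base-2 log of the frequency ratio. That is what the demo's 100*12*log(fEst/f)/log(2) expression computes. A small sketch, with a hypothetical helper name:

```cpp
#include <cassert>
#include <cmath>

// Pitch error in cents (hundredths of a semitone); positive means the estimate is sharp.
double centsError(double estimatedHz, double actualHz)
{
	assert(estimatedHz > 0.0 && actualHz > 0.0);
	return 1200.0 * std::log2(estimatedHz / actualHz);
}
```

An estimate that is exactly one octave too high gives +1200 cents, and a perfect estimate gives 0.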

Incidentally, that ‘periodicity quality’ can be handy if you’re synthesizing speech – rather than use a boolean voiced/unvoiced decision, you can synthesize with varying amounts of noisiness based on the quality. Also good for synthesis of musical signals (which after all are never purely periodic or purely noisy).

There are loads of possible performance tweaks not shown here:

  • Use an FFT-based method to compute the autocorrelation
  • Compute the square of each sample only once.
  • Find places in the signal where the periodicity is very strong, then scan forwards and backwards from there to track the pitch into the less certain portions. (Typically only applicable for non-real-time cases).
  • Limit the search range to only a small part of the possible pitch range around the most recently identified pitch.
  • Mix in some noise and/or use a level threshold to prevent spurious pitch detections when the signal level gets very small.
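
As an illustration of the second tweak only, here is one possible way to avoid recomputing the squares for every lag. It is a sketch, not the optimized code I've actually shipped, and the function names are hypothetical. A running sum of squares is built once, after which sumSqBeg and sumSqEnd for any lag are simple lookups.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// prefix[k] = x[0]^2 + ... + x[k-1]^2, so the sum of squares over any
// range of samples is a single subtraction.
std::vector<double> prefixSumSquares(const std::vector<double> &x)
{
	std::vector<double> prefix(x.size() + 1, 0.0);
	for (std::size_t i = 0; i < x.size(); i++)
		prefix[i + 1] = prefix[i] + x[i] * x[i];
	return prefix;
}

// NAC at lag p, with both normalization terms taken from the prefix sums.
double nacWithPrefix(const std::vector<double> &x,
                     const std::vector<double> &prefix, int p)
{
	const int n = static_cast<int>(x.size());
	assert(p > 0 && p < n && prefix.size() == x.size() + 1);

	double ac = 0.0;
	for (int i = 0; i < n - p; i++)
		ac += x[i] * x[i + p];

	const double sumSqBeg = prefix[n - p];          // Squares of x[0 .. n-p-1]
	const double sumSqEnd = prefix[n] - prefix[p];  // Squares of x[p .. n-1]
	return ac / std::sqrt(sumSqBeg * sumSqEnd);
}
```

This removes two of the three multiply-adds from the inner loop. The autocorrelation sum itself is still computed directly, which is where an FFT-based method would come in.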

All things I’ve done at various times, and which people ‘skilled-in-the-art’ (as they say in patents) could figure out. Besides performance optimizations, to use this algo in a real-time context you have to know how to capture input audio, how to call the pitch estimator at appropriate times, how to display the output, etc. These are left as exercises for the reader (as they say in textbooks)… but if you’d like to hire someone to help with it, I know someone you can call. ;-)

// ===================================================================
//	PeriodEstimator demo
//	Demonstrates use of period estimator algorithm based on 
//	normalized autocorrelation. Other neat tricks include sub-sample
//	accuracy of the period estimate, and avoidance of octave errors.
//	Released under the MIT License
//	The MIT License (MIT)
//	Copyright (c) 2009 Gerald T Beauregard
//	Permission is hereby granted, free of charge, to any person obtaining a copy
//	of this software and associated documentation files (the "Software"), to deal
//	in the Software without restriction, including without limitation the rights
//	to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
//	copies of the Software, and to permit persons to whom the Software is
//	furnished to do so, subject to the following conditions:
//	The above copyright notice and this permission notice shall be included in
//	all copies or substantial portions of the Software.
// ===================================================================

#include <stdio.h>
#include <math.h>
#include <assert.h>
#include <vector>

using namespace std;

double EstimatePeriod(
	const double	*x,			//	Sample data.
	const int		n,			//	Number of samples.  For best results, should be at least 2 x maxP
	const int		minP,		//	Minimum period of interest
	const int		maxP,		//	Maximum period of interest
	double&			q );		//	Quality (1= perfectly periodic)

int main (int argc, char * const argv[])
{
	const double pi = 4*atan(1.0);

	const double sr = 44100;		//	Sample rate.
	const double minF = 27.5;		//	Lowest pitch of interest (27.5 = A0, lowest note on piano.)
	const double maxF = 4186.0;		//	Highest pitch of interest(4186 = C8, highest note on piano.)
	const int minP = int(sr/maxF-1);	//	Minimum period
	const int maxP = int(sr/minF+1);	//	Maximum period

	//	Generate a test signal

	const double A440 = 440.0;				//	A440
	double f = A440 * pow(2.0,-9.0/12.0);	//	Middle C (9 semitones below A440)
	double p  = sr/f;
	double q;
	const int n = 2*maxP;
	vector<double> x(n);	//	vector rather than a variable-length array, which isn't standard C++
	for ( int k = 0; k < n; k++ )
	{
		x[k] = 0;
		x[k] += 1.0*sin(2*pi*1*k/p);	//	First harmonic
		x[k] += 0.6*sin(2*pi*2*k/p);	//	Second harmonic
		x[k] += 0.3*sin(2*pi*3*k/p);	//	Third harmonic
	}

	//	TODO: Add low-pass filter to remove very high frequency 
	//	energy. Harmonics above about 1/4 of Nyquist tend to mess
	//	things up, as their periods are often nowhere close to 
	//	integer numbers of samples.

	//	Estimate the period
	double pEst = EstimatePeriod( &x[0], n, minP, maxP, q );

	//	Compute the fundamental frequency (reciprocal of period)
	double fEst = 0;
	if ( pEst > 0 )
		fEst = sr/pEst;

	printf( "Actual freq:         %8.3lf\n", f );
	printf( "Estimated freq:      %8.3lf\n", fEst );	//	fEst stays 0 if no period was found
	printf( "Error (cents):       %8.3lf\n", 100*12*log(fEst/f)/log(2) );
	printf( "Periodicity quality: %8.3lf\n", q );

	return 0;
}

// ===================================================================
//	EstimatePeriod
//	Returns best estimate of period.
// ===================================================================
double EstimatePeriod(
	const double	*x,			//	Sample data.
	const int		n,			//	Number of samples.  Should be at least 2 x maxP
	const int		minP,		//	Minimum period of interest
	const int		maxP,		//	Maximum period
	double&			q )			//	Quality (1= perfectly periodic)
{
	assert( minP > 1 );
	assert( maxP > minP );
	assert( n >= 2*maxP );
	assert( x != NULL );

	q = 0;

	//	--------------------------------
	//	Compute the normalized autocorrelation (NAC).  The normalization is such that
	//	if the signal is perfectly periodic with (integer) period p, the NAC will be
	//	exactly 1.0.  (Bonus: NAC is also exactly 1.0 for periodic signal
	//	with exponential decay or increase in magnitude).

	vector<double> nac(maxP+2);	//	Indices 0..maxP+1; the loop below writes nac[maxP+1]

	for ( int p =  minP-1; p <= maxP+1; p++ )
	{
		double ac = 0.0;		// Standard auto-correlation
		double sumSqBeg = 0.0;	// Sum of squares of beginning part
		double sumSqEnd = 0.0;	// Sum of squares of ending part

		for ( int i = 0; i < n-p; i++ )
		{
			ac += x[i]*x[i+p];
			sumSqBeg += x[i]*x[i];
			sumSqEnd += x[i+p]*x[i+p];
		}
		nac[p] = ac / sqrt( sumSqBeg * sumSqEnd );
	}

	//	---------------------------------------
	//	Find the highest peak in the range of interest.

	//	Get the highest value
	int bestP = minP;
	for ( int p = minP; p <= maxP; p++ )
		if ( nac[p] > nac[bestP] )
			bestP = p;

	//	Give up if it's highest value, but not actually a peak.
	//	This can happen if the period is outside the range [minP, maxP].
	//	(Uses ||, not &&: a neighbour on either side being higher is
	//	enough to mean this isn't a peak.)
	if ( nac[bestP] < nac[bestP-1] 
	  || nac[bestP] < nac[bestP+1] )
	{
		return 0.0;
	}

	//	"Quality" of periodicity is the normalized autocorrelation
	//	at the best period (which may be a multiple of the actual
	//	period).
	q = nac[bestP];

	//	--------------------------------------
	//	Interpolate based on neighboring values
	//	E.g. if value to right is bigger than value to the left,
	//	real peak is a bit to the right of discretized peak.
	//	if left  == right, real peak = mid;
	//	if left  == mid,   real peak = mid-0.5
	//	if right == mid,   real peak = mid+0.5

	double mid   = nac[bestP];
	double left  = nac[bestP-1];
	double right = nac[bestP+1]; 

	assert( 2*mid - left - right > 0.0 );

	double shift = 0.5*(right-left) / ( 2*mid - left - right );

	double pEst = bestP + shift;

	//	-----------------------------------------------
	//	If the range of pitches being searched is greater
	//	than one octave, the basic algo above may make "octave"
	//	errors, in which the period identified is actually some
	//	integer multiple of the real period.  (Makes sense, as
	//	a signal that's periodic with period p is technically
	//	also periodic with period 2p).
	//
	//	Algorithm is pretty simple: we hypothesize that the real
	//	period is some "submultiple" of the "bestP" above.  To
	//	check it, we see whether the NAC is strong at each of the
	//	hypothetical subpeak positions.  E.g. if we think the real
	//	period is at 1/3 our initial estimate, we check whether the 
	//	NAC is strong at 1/3 and 2/3 of the original period estimate.

	const double k_subMulThreshold = 0.90;	//	If strength at all submultiple of peak pos are 
											//	this strong relative to the peak, assume the 
											//	submultiple is the real period.

	//	For each possible multiple error (starting with the biggest)
	int maxMul = bestP / minP;
	bool found = false;
	for ( int mul = maxMul; !found && mul >= 1; mul-- )
	{
		//	Check whether all "submultiples" of original
		//	peak are nearly as strong.
		bool subsAllStrong = true;

		//	For each submultiple
		for ( int k = 1; k < mul; k++ )
		{
			int subMulP = int(k*pEst/mul+0.5);
			//	If it's not strong relative to the peak NAC, then
			//	not all submultiples are strong, so we haven't found
			//	the correct submultiple.
			if ( nac[subMulP] < k_subMulThreshold * nac[bestP] )
				subsAllStrong = false;

			//	TODO: Use spline interpolation to get better estimates of nac
			//	magnitudes for non-integer periods in the above comparison
		}

		//	If yes, then we're done.   New estimate of 
		//	period is "submultiple" of original period.
		if ( subsAllStrong == true )
		{
			found = true;
			pEst = pEst / mul;
		}
	}

	return pEst;
}
