Pitch Detection in Flash

Was just doing some tidying up on my webserver and I came across an old demo I created quite a while ago, but never demoed publicly (as far as I can remember): a real-time monophonic pitch detector, written in Flash. It looks like this (click to run it!):


The pitch detector listens to the microphone input, displays the waveform, and shows the detected pitch as a red dot on a keyboard. It updates continuously, so as you sing, whistle, or play an instrument, you can see the red dot move around.

I’m not planning on releasing the source code to this. I earn a living writing audio code for other people, so if I give away all my secrets, I’d be putting myself out of business 🙂 [Update: I’ve released non-optimized C++ source code for a monophonic pitch detection algorithm. Consider it a public service].

But I can say a bit about how it works. Basically there are two main approaches to pitch detection: time-domain approaches, which typically use an autocorrelation; and spectral approaches, which typically use Fourier transforms and some simple pattern matching.  This demo uses a time-domain approach.

Time-domain approaches are only useful for monophonic cases: that is, where there’s at most one pitched source at any given instant. The idea behind autocorrelation is basically to see how well a signal lines up with a delayed version of itself for varying amounts of delay (or “lag”). The very best alignment is for zero lag, but that’s not very interesting. What is interesting is that you also get very good alignment (autocorrelation) at a lag that corresponds to the period of the waveform; the reciprocal of that period is the fundamental frequency (i.e. the pitch, or f0, pronounced ‘F naught’) of the sound.
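To make the idea concrete, here is a minimal Python sketch of brute-force autocorrelation pitch estimation. It is not the code behind the demo: the function names, the 80 to 1000 Hz search range, and the synthetic 200 Hz test tone are all assumptions picked for illustration.

```python
import math

def autocorrelation(x, max_lag):
    """Brute-force autocorrelation: r[lag] = sum of x[i] * x[i + lag]."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def estimate_f0(x, sample_rate, f_min=80.0, f_max=1000.0):
    """Return sample_rate / lag for the lag with the strongest autocorrelation.

    The search range (f_min, f_max) is a hypothetical choice; lag 0 is
    excluded because it always gives the best (and least interesting) match.
    """
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    r = autocorrelation(x, max_lag)
    best_lag = max(range(min_lag, max_lag + 1), key=lambda lag: r[lag])
    return sample_rate / best_lag

# Synthetic test tone (an assumption, not real microphone input):
# a 200 Hz sine sampled at 8 kHz, so the true period is 40 samples.
sample_rate = 8000
x = [math.sin(2.0 * math.pi * 200.0 * n / sample_rate) for n in range(1024)]
estimate = estimate_f0(x, sample_rate)
print(estimate)
```

For this clean test tone the strongest peak lands on a lag of 40 samples, which corresponds to 200 Hz. Real signals are much less cooperative.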

That’s basically it, but as always with anything audio, there are loads of subtleties and tricks. First off, the autocorrelation will be very strong not just for a lag of one period, but also two periods, three periods, etc. What that means is that a middle C (C4) can easily be ‘mis-heard’ as a note one octave below (C3, whose period is twice that of C4) or an octave-and-a-fifth below (F2, whose period is three times that of C4). Depending on the signal and what sort of normalization you use, the autocorrelation peaks for multiples of the real period may be stronger than the peak at a lag of one period.
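One common way to guard against these octave errors (a generic heuristic, not necessarily what the demo does) is to normalize the autocorrelation and then take the shortest lag whose peak comes close to the best one, rather than the single highest peak. The sketch below assumes a hypothetical 0.9 threshold and a synthetic tone with a 40-sample period.

```python
import math

def normalized_autocorrelation(x, max_lag):
    """Autocorrelation divided by the energy of the two overlapping segments.

    With this normalization a perfectly periodic signal scores about 1.0 at
    every multiple of its period, which is exactly why octave errors happen.
    """
    n = len(x)
    out = []
    for lag in range(max_lag + 1):
        a, b = x[:n - lag], x[lag:]
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        out.append(num / den if den > 0.0 else 0.0)
    return out

def pick_period(nac, min_lag, threshold=0.9):
    """Return the shortest lag that is a local peak within `threshold` of the best."""
    best = max(nac[min_lag:])
    for lag in range(min_lag + 1, len(nac) - 1):
        is_peak = nac[lag] >= nac[lag - 1] and nac[lag] >= nac[lag + 1]
        if is_peak and nac[lag] >= threshold * best:
            return lag
    return None

# Synthetic tone with three harmonics and a period of exactly 40 samples.
x = [math.sin(2 * math.pi * n / 40)
     + 0.5 * math.sin(4 * math.pi * n / 40)
     + 0.25 * math.sin(6 * math.pi * n / 40) for n in range(1024)]
nac = normalized_autocorrelation(x, 130)
period = pick_period(nac, min_lag=8)
print(period, nac[40], nac[80], nac[120])
```

The peaks at lags 80 and 120 are essentially as strong as the one at 40, so a plain "highest peak wins" rule could land on any of them; preferring the shortest qualifying lag picks the true period.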

Next, there’s a problem of resolution: for high pitches, the period is really not very long. For the highest note on a piano (C8, ~4186 Hz), the period is less than a dozen samples (if the sample rate is 44.1 kHz). To get a musically accurate measurement of the pitch, you need to upsample (say, to 88.2 kHz), interpolate the peaks of the autocorrelation, or both.
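The arithmetic is worth spelling out. At 44.1 kHz the period of C8 is about 10.5 samples, and the two nearest whole-sample lags (10 and 11) are roughly 165 cents apart, well over a semitone. The Python sketch below shows that calculation, plus parabolic interpolation through the three points around the peak, which is one standard way to interpolate. The idealized cosine standing in for the autocorrelation and all the function names are assumptions for illustration.

```python
import math

def cents(f1, f2):
    """Interval from f2 up to f1 in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(f1 / f2)

def parabolic_offset(y_prev, y_peak, y_next):
    """Offset of the vertex of the parabola through three equally spaced
    points, in samples relative to the middle point."""
    denom = y_prev - 2.0 * y_peak + y_next
    if denom == 0.0:
        return 0.0
    return 0.5 * (y_prev - y_next) / denom

sample_rate = 44100.0
f_c8 = 4186.01                      # equal-tempered C8, assuming A4 = 440 Hz
true_period = sample_rate / f_c8    # about 10.5 samples

# Pitch step between the two nearest integer lags (10 and 11 samples).
step_cents = cents(sample_rate / 10, sample_rate / 11)

# Idealized autocorrelation of a pure tone: a cosine peaking at the true period.
def ideal_ac(lag):
    return math.cos(2.0 * math.pi * lag / true_period)

peak_lag = max(range(5, 16), key=ideal_ac)
refined_period = peak_lag + parabolic_offset(
    ideal_ac(peak_lag - 1), ideal_ac(peak_lag), ideal_ac(peak_lag + 1))
refined_f0 = sample_rate / refined_period

print(step_cents, peak_lag, refined_period, cents(refined_f0, f_c8))
```

In this idealized case the interpolated period lands within a small fraction of a sample of the true value. With real, noisy autocorrelation data the improvement will be less dramatic.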

There’s also a matter of CPU load. Brute-force autocorrelation is pretty expensive computationally. Fortunately autocorrelation can be performed more efficiently using Fourier transforms, as the autocorrelation of a signal is equal to the inverse Fourier transform of the product of the signal’s Fourier transform and its complex conjugate: AC(x) = IFFT(FFT(x) · FFT(x)*). There are some subtleties there: you need to zero-pad the signal, otherwise you’ll compute the circular autocorrelation, which is far less useful.
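Here is a small Python sketch of that identity, including the zero-padding. To keep it self-contained it uses a tiny recursive radix-2 FFT written for this example (the `fft` helper and the `zero_pad` flag are hypothetical, not from any library); in practice you would use an optimized FFT routine.

```python
import cmath

def fft(a, inverse=False):
    """Recursive radix-2 FFT. len(a) must be a power of two.
    The inverse transform is left unscaled (caller divides by len(a))."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def autocorrelation_fft(x, zero_pad=True):
    """AC(x) = IFFT(FFT(x) * conj(FFT(x))).

    With zero_pad=True the signal is padded to at least twice its length,
    so the wrapped-around products are all zero and the result is the
    ordinary (linear) autocorrelation for lags 0..len(x)-1.
    """
    n = len(x)
    target = 2 * n if zero_pad else n
    size = 1
    while size < target:          # round up to a power of two for the FFT
        size *= 2
    padded = list(x) + [0.0] * (size - n)
    spectrum = fft(padded)
    power = [s * s.conjugate() for s in spectrum]
    result = fft(power, inverse=True)
    return [v.real / size for v in result[:n]]

def autocorrelation_direct(x):
    """Brute-force reference: O(n^2) multiply-adds."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(n)]

x = [1.0, 2.0, 3.0, 4.0]
linear = autocorrelation_fft(x)                     # matches the direct result
circular = autocorrelation_fft(x, zero_pad=False)   # wraps around
direct = autocorrelation_direct(x)
print(direct, linear, circular)
```

For this four-sample input the direct result at lag 1 is 1·2 + 2·3 + 3·4 = 20, while the unpadded version also picks up the wrapped term 4·1 and gives 24. That wrap-around is what makes the circular autocorrelation misleading for pitch detection.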

Even if you get all this stuff right, monophonic pitch detectors using autocorrelation can be thrown off pretty easily, as real-world signals tend not to be as monophonic as we’d like. Even with an instrument that physically only produces one note at a time (say a clarinet, ignoring advanced playing with multiphonics), if you record it in a highly reverberant space, at any given time-slice the recorded signal will contain not just the current note, but also the echoes/reverberation of notes played slightly earlier.

In my more recent experiments with pitch detection, I generally use spectral approaches, as they can be applied to polyphonic pitch detection, and can be tweaked to deal with reverb in the monophonic case.

If this all seems like Greek (even without the mathematical notation, which tends to use Greek letters a lot!), well, that’s the nature of this domain… and that’s why it’s worth hiring experts 🙂

Incidentally, my AudioStretch app for iOS includes a spectrum analyzer graphically aligned with a keyboard display (which is playable). While it doesn’t do note-recognition per se, it shows you the spectrum of whatever notes are playing; by playing the keyboard you can audibly and graphically figure out which note(s) best line up with the spectrum. Eventually I’ll have the app automatically identify the notes.

For example, here’s AudioStretch displaying the spectrum for a major third played on a piano, specifically middle-C (C4) and the E just above it (E4).  The spectrum clearly shows the peaks at the C4 and E4, as well as at the harmonics of those notes.
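To see why lining a spectrum up with a keyboard works, it helps to map harmonic frequencies to the nearest notes. The Python sketch below is only an illustration and says nothing about how AudioStretch is implemented; it assumes equal temperament with A4 = 440 Hz, and the helper names are made up for the example.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def note_frequency(midi_note):
    """Equal-tempered frequency of a MIDI note number (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12.0)

def nearest_note(freq):
    """Return (note name, deviation in cents) for the closest keyboard note."""
    midi_note = round(69 + 12.0 * math.log2(freq / 440.0))
    name = NOTE_NAMES[midi_note % 12] + str(midi_note // 12 - 1)
    deviation = 1200.0 * math.log2(freq / note_frequency(midi_note))
    return name, deviation

c4 = note_frequency(60)   # middle C, about 261.6 Hz
e4 = note_frequency(64)   # the E just above it, about 329.6 Hz

for label, f0 in (("C4", c4), ("E4", e4)):
    for harmonic in range(1, 6):
        name, deviation = nearest_note(harmonic * f0)
        print(label, harmonic, round(harmonic * f0, 1), name, round(deviation, 1))
```

Most of the low harmonics fall on or near keyboard notes (the third harmonic of C4 sits very close to G5, for instance), while the fifth harmonic is noticeably flat of the nearest equal-tempered note.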



About Gerry Beauregard

I'm a Singapore-based Canadian software engineer, inventor, musician, and occasional triathlete. My current work and projects mainly involve audio technology for the web and iOS. I'm the author of AudioStretch, an audio time-stretching/pitch-shifting app for musicians. Past jobs have included writing speech recognition software for Apple, creating automatic video editing software for muvee, and designing ASICs for Nortel. I hold a Bachelor of Applied Science (Electrical Engineering) from Queen's University and a Master of Arts in Electroacoustic Music from Dartmouth College.

8 Responses to Pitch Detection in Flash

  1. Mike says:

    Gerry, you’re killing me here..My mind is exploding now with all kinds of ideas that I can do with it !
    The people demand an ANE extension 🙂

    This is really great ! I sat down whistling for 10 mins 🙂

  2. Pingback: High Accuracy Monophonic Pitch Estimation Using Normalized Autocorrelation | Gerry Beauregard

  3. This is an ideal tool for pre-lingual CI folks to develop a pitch choice/control skill leading to singing activities. This is incredibly useful!

  4. The link doesn’t work (or maybe chrome just won’t display it 😡 )

  5. Neetesh says:

    Interesting. Can you detect the pitch when someone is singing?

    • Sure can! Assuming your browser has Flash, you can try it by clicking on the keyboard image on this page, or alternatively just go to http://www.samboo.org/audiostretch/pitch/

  6. David says:

    Hello, I think Flash now requires the connection to be secure (https) before it can accept/detect microphone/video input. Thus, your demo is not working anymore. Can you upload it somewhere else that has an https connection? (Such as this site) Thanks.


    • Running Flash has certainly gotten more difficult. Because of security issues with Flash (and plugins generally) over the years, browsers require you to give permission to use plugins, sometimes on a site-by-site basis. The demo still works for me in Safari on my Mac, but in Chrome, it no longer works. The Chrome developer console tells me that “Microphone and Camera access no longer works on insecure origins. To use this feature, you should consider switching your application to a secure origin, such as HTTPS”. Time-permitting, I’ll move the demo to a secure server somewhere.
