One of the new features in Flash Player 10.1 is real-time access to microphone audio input data. Using that feature and my AS3 FFT I created a real-time audio spectrum analyzer.

Here’s a screenshot of its output:

Here I’m singing an “aah” at G4 (the G above middle C on a piano), which has a fundamental frequency (f0, pronounced “F naught”) of 392 Hz. Spectral peaks are clearly visible at the fundamental, as well as at the 2nd, 3rd, 4th, etc. harmonics (at 2f0, 3f0, 4f0, etc.). The stuff below the peaks is background noise, mostly from an air conditioner.
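The harmonic arithmetic is easy to check numerically; here's a quick sketch in Python (illustrative only):

```python
# Harmonics of a tone sit at integer multiples of the fundamental f0.
# For G4, f0 = 392 Hz:
f0 = 392.0
harmonics = [k * f0 for k in range(1, 5)]
print(harmonics)  # [392.0, 784.0, 1176.0, 1568.0]
```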

You can try the spectrum analyzer yourself here. You’ll probably get a dialog box saying “www.samboo.org is requesting access to your camera and microphone”. Don’t worry, just click “Allow” – despite what the dialog says, I’m only using mic access, and I’m not recording you.

Update 8 September 2010: I’ve decided to release the source code for the spectrum analyzer. Click “show source” below to see it. It’s not the loveliest code in the world, but it works. To build it, you’ll also need the FFT code from one of my earlier posts. Enjoy!

package
{
    import __AS3__.vec.Vector;
    import flash.display.Sprite;
    import flash.events.*;
    import flash.media.Microphone;
    import flash.text.*;
    import flash.utils.*;

    /**
     * A real-time spectrum analyzer.
     *
     * Released under the MIT License
     *
     * Copyright (c) 2010 Gerald T. Beauregard
     *
     * Permission is hereby granted, free of charge, to any person obtaining a copy
     * of this software and associated documentation files (the "Software"), to
     * deal in the Software without restriction, including without limitation the
     * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
     * sell copies of the Software, and to permit persons to whom the Software is
     * furnished to do so, subject to the following conditions:
     *
     * The above copyright notice and this permission notice shall be included in
     * all copies or substantial portions of the Software.
     *
     * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
     * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
     * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
     * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
     * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
     * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
     * IN THE SOFTWARE.
     */
    [SWF(width='600', height='400', frameRate='30', backgroundColor='0xFFFFFF')]
    public class SpectrumAnalyzer extends Sprite
    {
        private const SAMPLE_RATE:Number = 22050;  // Actual microphone sample rate (Hz)
        private const LOGN:uint = 11;              // Log2 FFT length
        private const N:uint = 1 << LOGN;          // FFT Length
        private const BUF_LEN:uint = N;            // Length of buffer for mic audio
        private const UPDATE_PERIOD:int = 50;      // Period of spectrum updates (ms)

        private var m_fft:FFT2;                    // FFT object

        private var m_tempRe:Vector.<Number>;      // Temporary buffer - real part
        private var m_tempIm:Vector.<Number>;      // Temporary buffer - imaginary part
        private var m_mag:Vector.<Number>;         // Magnitudes (at each of the frequencies below)
        private var m_freq:Vector.<Number>;        // Frequencies (for each of the magnitudes above)
        private var m_win:Vector.<Number>;         // Analysis window (Hanning)

        private var m_mic:Microphone;              // Microphone object
        private var m_writePos:uint = 0;           // Position to write new audio from mic
        private var m_buf:Vector.<Number> = null;  // Buffer for mic audio

        private var m_tickTextAdded:Boolean = false;
        private var m_timer:Timer;                 // Timer for updating spectrum

        /**
         *
         */
        public function SpectrumAnalyzer()
        {
            var i:uint;

            // Set up the FFT
            m_fft = new FFT2();
            m_fft.init(LOGN);
            m_tempRe = new Vector.<Number>(N);
            m_tempIm = new Vector.<Number>(N);
            m_mag = new Vector.<Number>(N/2);
            //m_smoothMag = new Vector.<Number>(N/2);

            // Vector with frequencies for each bin number. Used
            // in the graphing code (not in the analysis itself).
            m_freq = new Vector.<Number>(N/2);
            for ( i = 0; i < N/2; i++ )
                m_freq[i] = i*SAMPLE_RATE/N;

            // Hanning analysis window
            m_win = new Vector.<Number>(N);
            for ( i = 0; i < N; i++ )
                m_win[i] = (4.0/N) * 0.5*(1-Math.cos(2*Math.PI*i/N));

            // Create a buffer for the input audio
            m_buf = new Vector.<Number>(BUF_LEN);
            for ( i = 0; i < BUF_LEN; i++ )
                m_buf[i] = 0.0;

            // Set up microphone input
            m_mic = Microphone.getMicrophone();
            m_mic.rate = SAMPLE_RATE/1000;
            m_mic.setSilenceLevel(0.0);  // Have the mic run non-stop, regardless of the input level
            m_mic.addEventListener( SampleDataEvent.SAMPLE_DATA, onMicSampleData );

            // Set up a timer to do periodic updates of the spectrum
            m_timer = new Timer(UPDATE_PERIOD);
            m_timer.addEventListener(TimerEvent.TIMER, updateSpectrum);
            m_timer.start();
        }

        /**
         * Called when new microphone input data is available. See this call
         * above:
         *   m_mic.addEventListener( SampleDataEvent.SAMPLE_DATA, onMicSampleData );
         */
        private function onMicSampleData( event:SampleDataEvent ):void
        {
            // Get number of available input samples
            var len:uint = event.data.length/4;

            // Read the input data and stuff it into
            // the circular buffer
            for ( var i:uint = 0; i < len; i++ )
            {
                m_buf[m_writePos] = event.data.readFloat();
                m_writePos = (m_writePos+1)%BUF_LEN;
            }
        }

        /**
         * Called at regular intervals to update the spectrum
         */
        private function updateSpectrum( event:Event ):void
        {
            // Copy data from circular microphone audio
            // buffer into temporary buffer for FFT, while
            // applying Hanning window.
            var i:int;
            var pos:uint = m_writePos;
            for ( i = 0; i < N; i++ )
            {
                m_tempRe[i] = m_win[i]*m_buf[pos];
                pos = (pos+1)%BUF_LEN;
            }

            // Zero out the imaginary component
            for ( i = 0; i < N; i++ )
                m_tempIm[i] = 0.0;

            // Do FFT and get magnitude spectrum
            m_fft.run( m_tempRe, m_tempIm );
            for ( i = 0; i < N/2; i++ )
            {
                var re:Number = m_tempRe[i];
                var im:Number = m_tempIm[i];
                m_mag[i] = Math.sqrt(re*re + im*im);
            }

            // Convert to dB magnitude
            const SCALE:Number = 20/Math.LN10;
            for ( i = 0; i < N/2; i++ )
            {
                // 20 log10(mag) => 20/ln(10) ln(mag)
                // Addition of MIN_VALUE prevents log from returning minus infinity if mag is zero
                m_mag[i] = SCALE*Math.log( m_mag[i] + Number.MIN_VALUE );
            }

            // Draw the graph
            drawSpectrum( m_mag, m_freq );
        }

        /**
         * Draw a graph of the spectrum
         */
        private function drawSpectrum( mag:Vector.<Number>, freq:Vector.<Number> ):void
        {
            // Basic constants
            const MIN_FREQ:Number = 0;     // Minimum frequency (Hz) on horizontal axis.
            const MAX_FREQ:Number = 4000;  // Maximum frequency (Hz) on horizontal axis.
            const FREQ_STEP:Number = 500;  // Interval between ticks (Hz) on horizontal axis.
            const MAX_DB:Number = -0.0;    // Maximum dB magnitude on vertical axis.
            const MIN_DB:Number = -60.0;   // Minimum dB magnitude on vertical axis.
            const DB_STEP:Number = 10;     // Interval between ticks (dB) on vertical axis.
            const TOP:Number = 50;       // Top of graph
            const LEFT:Number = 60;      // Left edge of graph
            const HEIGHT:Number = 300;   // Height of graph
            const WIDTH:Number = 500;    // Width of graph
            const TICK_LEN:Number = 10;  // Length of tick in pixels
            const LABEL_X:String = "Frequency (Hz)";  // Label for X axis
            const LABEL_Y:String = "dB";              // Label for Y axis

            // Derived constants
            const BOTTOM:Number = TOP+HEIGHT;                      // Bottom of graph
            const DBTOPIXEL:Number = HEIGHT/(MAX_DB-MIN_DB);       // Pixels/dB
            const FREQTOPIXEL:Number = WIDTH/(MAX_FREQ-MIN_FREQ);  // Pixels/Hz

            //-----------------------

            var i:uint;
            var numPoints:uint;

            numPoints = mag.length;
            if ( mag.length != freq.length )
                trace( "mag.length != freq.length" );

            graphics.clear();

            // Draw a rectangular box marking the boundaries of the graph
            graphics.lineStyle( 1, 0x000000 );
            graphics.drawRect( LEFT, TOP, WIDTH, HEIGHT );
            graphics.moveTo(LEFT, TOP+HEIGHT);

            //--------------------------------------------

            // Tick marks on the vertical axis
            var y:Number;
            var x:Number;
            for ( var dBTick:Number = MIN_DB; dBTick <= MAX_DB; dBTick += DB_STEP )
            {
                y = BOTTOM - DBTOPIXEL*(dBTick-MIN_DB);
                graphics.moveTo( LEFT-TICK_LEN/2, y );
                graphics.lineTo( LEFT+TICK_LEN/2, y );

                if ( m_tickTextAdded == false )
                {
                    // Numbers on the tick marks
                    var t:TextField = new TextField();
                    t.text = int(dBTick).toString();
                    t.width = 0;
                    t.height = 20;
                    t.x = LEFT-20;
                    t.y = y - t.textHeight/2;
                    t.autoSize = TextFieldAutoSize.CENTER;
                    addChild(t);
                }
            }

            // Label for vertical axis
            if ( m_tickTextAdded == false )
            {
                t = new TextField();
                t.text = LABEL_Y;
                t.x = LEFT-50;
                t.y = TOP + HEIGHT/2 - t.textHeight/2;
                t.height = 20;
                t.width = 50;
                //t.rotation = -90;
                addChild(t);
            }

            //--------------------------------------------

            // Tick marks on the horizontal axis
            for ( var f:Number = MIN_FREQ; f <= MAX_FREQ; f += FREQ_STEP )
            {
                x = LEFT + FREQTOPIXEL*(f-MIN_FREQ);
                graphics.moveTo( x, BOTTOM - TICK_LEN/2 );
                graphics.lineTo( x, BOTTOM + TICK_LEN/2 );

                if ( m_tickTextAdded == false )
                {
                    t = new TextField();
                    t.text = int(f).toString();
                    t.width = 0;
                    t.x = x;
                    t.y = BOTTOM+7;
                    t.autoSize = TextFieldAutoSize.CENTER;
                    addChild(t);
                }
            }

            // Label for horizontal axis
            if ( m_tickTextAdded == false )
            {
                t = new TextField();
                t.text = LABEL_X;
                t.width = 0;
                t.x = LEFT+WIDTH/2;
                t.y = BOTTOM+30;
                t.autoSize = TextFieldAutoSize.CENTER;
                addChild(t);
            }

            m_tickTextAdded = true;

            // -------------------------------------------------

            // The line in the graph

            // Ignore points that are too far to the left
            for ( i = 0; i < numPoints && freq[i] < MIN_FREQ; i++ )
            {
            }

            // For all remaining points within range of x-axis
            var firstPoint:Boolean = true;
            for ( /**/; i < numPoints && freq[i] <= MAX_FREQ; i++ )
            {
                // Compute horizontal position
                x = LEFT + FREQTOPIXEL*(freq[i]-MIN_FREQ);

                // Compute vertical position of point
                // and clip at top/bottom.
                y = BOTTOM - DBTOPIXEL*(mag[i]-MIN_DB);
                if ( y < TOP )
                    y = TOP;
                else if ( y > BOTTOM )
                    y = BOTTOM;

                // If it's the first point
                if ( firstPoint )
                {
                    // Move to the point
                    graphics.moveTo(x,y);
                    firstPoint = false;
                }
                else
                {
                    // Otherwise, draw line from the previous point
                    graphics.lineTo(x,y);
                }
            }
        }
    }
}

Got an iPhone or iPad? Check out AudioStretch for iOS.

Hi, Gerry! Can i view code of this sample?

I’ve just posted the code. I was hesitant for a while because the code quality isn’t up to my usual standards, but what the heck – if other folks find it educational or useful, I figure why not share it… -Gerry

Hey Gerry,

I would like to build an app that compares two audio clips (for voice recognition and control). I’m pretty sure that fft and spectrum analysis is the way to go.

Got any ideas on available APIs for this?

Thanx

Spectrum analysis is just one of many stages in a speech recognition system. A complete speech recognition system can be remarkably simple or extremely complicated depending on the task and how well it needs to work. For example, if you only need to recognize a dozen different isolated words spoken by one person in a non-noisy environment and always using the same microphone, a very simple template-based recognizer will probably suffice. However, if you need it to work with many different speakers and microphone setups, handle a large number of different multi-word commands, or operate in real-world (i.e. noisy!) environments, some pretty fancy techniques are required.

There are lots of speech recognition APIs around, but none accessible from Flash as far as I know. So if you’re looking at creating a speech recognition system that runs in Flash, you’d probably need to create your own.

This is amazing stuff. I would like to work with your code on some music notation stuff. It looks like your code gives better control than the native Flash FFT.

Copying the source includes the line numbers; not sure if that will mess up getting it working in Flash. Any chance of sending a zipped version of the *.fla file, or posting it on the site?

Glad you like it, and I hope you’ll find it useful! If you mouse over at the top-right of the code listing, a few icons will appear. One of them is for “copy to clipboard”. I’ve done a bit of work on doing polyphonic note recognition using this spectrum analysis. Made some good progress, but it’s very hard to make it reliable enough to be worth the bother! -Gerry

Polyphonic recognition!! Awesome, are you going to publish some of that code in the future? I can see, in the near future, a TC Electronic PolyTune-like app made in Flash eheheheh

(again sorry for my poor english skills)

Gerry, you’ve saved my bacon! I’ve been banging my head against a brick wall with Flash, trying to get an FFT from the microphone input. Since computeSpectrum only works against playing audio (not ideal with input from the mic), having this is just superb.

I’m so glad you made this available under an MIT license – thank you very much 🙂

Hey, glad to hear it’ll be of use! I’ve gotten so much great stuff from other generous developers online, I feel I should give something back. BTW, I released this under the MIT license because I know that in commercial development, code released under more restrictive licenses (e.g. GPL or LGPL) can raise enough concerns that devs are sometimes barred from using it. Not much point in that – if I give away code, I want it to be used! With the MIT license, you can basically do whatever you like with the code, as long as you include the license text in a comment. -Gerry

Absolutely! I’ve been doing some experiments recently, and want to do a new one with some nice stuff reacting to the audio from the microphone. I can obviously pull the raw feed from the mic, but that seems little better than querying the activity level. What I really want to do, which I figure the FFT is great for, is basing things on the active frequencies in the audio.

I’m no audiophile unfortunately, so my knowledge is pretty weak. I do understand the very basics of what’s going on, though 😀

A quick question: am I right in thinking that if I wanted to have things in – say – 16 buckets (more like a bar-chart equalizer feel), I should pass through the 0–4000 Hz range in jumps of 250 Hz, taking the mean level in each bucket? Seems like the right thing to do, but could be an error of epic proportions 😀

Depends on the application really. In many audio applications, it’s better to have logarithmically-spaced bands, as perception of pitch generally behaves that way (e.g. each octave on a piano keyboard sounds like it’s covering an equal range, but it represents a doubling of the frequency). So you might want to use 8 buckets of bins, with mid-points separated by octaves, e.g. 62, 125, 250, 500, 1K, 2K, 4K, 8K, or by 1/2 octaves if you want better resolution. -Gerry

Another question RE: the license. I don’t plan to give the Flash source code away, because it would be largely irrelevant to the world, so would you want the MIT license put into the source code of the HTML, or is that unnecessary?
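The octave-band bucketing described above can be sketched like this (Python rather than AS3, just to show the binning arithmetic; `octave_buckets` and its parameters are names I’ve made up for illustration):

```python
import math

def octave_buckets(mags, sample_rate, fft_len, f_mid0=62.0, n_bands=8):
    """Group FFT bin magnitudes into octave-spaced buckets.
    mags has fft_len/2 entries; bin i corresponds to i*sample_rate/fft_len Hz.
    Band k is centred at f_mid0 * 2**k and spans half an octave either side."""
    totals = [0.0] * n_bands
    counts = [0] * n_bands
    for i, m in enumerate(mags):
        f = i * sample_rate / fft_len
        if f <= 0:
            continue
        # Nearest octave band: log2 distance from the lowest band centre
        k = round(math.log2(f / f_mid0))
        if 0 <= k < n_bands:
            totals[k] += m
            counts[k] += 1
    # Mean magnitude per band (0 for bands with no bins)
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]
```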

You’re only required to keep the license comment in the ActionScript code. (“The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software” is the wording in the MIT license, where “the Software” consists of the .as file I posted). There’s no requirement to give credit anywhere else, but it’s always welcome and appreciated 😉 -Gerry

Hello again Gerry,

Can you post some code where the input is an mp3 file? Thanks!

Hi Gerry, first of all thanks for sharing your excellent work!

I just have a little question for you…

I’m trying to create a real time equalizer acting on an audio file loaded at runtime.

Using two Sound obj’s I can easily load an mp3, read its byte content at runtime listening for the SampleDataEvent and write the modified ByteArray for final reproduction.

Your sound spectrum (as well as the one obtained with SoundMixer.getSpectrum(), even if it’s slower) correctly shows the expected behaviour when I apply a low pass filter, but I’m having problems with band and high pass filters: the audio seems to be correctly filtered, but the spectrum still has peaks in the low frequency regions.

I agree, my analysis is not so fine, and I think the problem is in the implementation of the filter, but what is really frustrating is the divergence between the audio and its spectral representation.

Ok… That’s my question:

Do you have any suggestion about the way to apply this kind of filters?

Thanks in advance,

fabio

[One good sanity check on whether a filter has the frequency response you expect is to pass it sine tones of known amplitudes at various frequencies, and look at the magnitude of the output. Do that at a few frequencies inside and outside the pass band, and you can quickly determine whether the filter is basically working (or not). If the input sine tone has amplitude Ain and the output has amplitude Aout, the frequency response at that frequency is 20 log10(Aout/Ain) dB.

Do that at a dozen different frequencies, and you’ll be able to create a nice frequency response curve. You might even want to generate the sine tone programmatically, with a slider in your UI to control the frequency. As you adjust the frequency, look at the spectrum analyzer to see how the peak magnitude changes as a function of frequency. Note that no filter is ideal/perfect, so for input signals in the stop band you’ll inevitably still get a little bit of output. -Gerry]
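That measurement procedure might look something like this sketch (Python; `measure_response_db` and the toy `moving_avg` filter are hypothetical names, purely to illustrate the 20 log10(Aout/Ain) idea):

```python
import math

def measure_response_db(filt, freq, sample_rate=44100.0, n=4096):
    """Estimate a filter's gain (dB) at one frequency by feeding it a sine
    tone and comparing output to input amplitude: 20*log10(Aout/Ain).
    The first half of the output is skipped so start-up transients die out."""
    x = [math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]
    y = filt(x)
    rms = lambda s: math.sqrt(sum(v * v for v in s) / len(s))
    return 20 * math.log10(rms(y[n // 2:]) / rms(x[n // 2:]))

def moving_avg(x):
    """A 2-point moving average: a crude low-pass filter to test with."""
    return [(x[i] + x[i - 1]) / 2 if i else x[0] for i in range(len(x))]
```

Sweep `freq` over a dozen values and you get the frequency response curve described above.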

Am I the only one who can’t get past all the compile errors with that class?

If you let me know what the errors are, I might be able to help! -Gerry

Hi Gerry,

I’m working on a flash pitch detection module (for voice over) for the last 2 weeks and I have tried lots of different ways till now, but they are all too complicated or involving too much c/c++ libraries and the need of Alchemy!

I was getting the FFT from the microphone using computeSpectrum() but your FFT methods look more detailed and faster!

I’m trying to apply autocorrelation on the FFT…

Do you have any idea about how that could be done, or any other suggestions to get the fundamental frequency from a frame of a sound?

Any help will be really appreciated!

Thank you

Stelios

[Hi Stelios, I’m getting quite a lot of questions about pitch detection these days. It’s something I’ve worked on a lot, so it’s probably time I wrote a post about it! Monophonic pitch detection is not too difficult, much easier than polyphonic pitch detection. If you want to take a first stab at it, you could use the real-time spectral analysis code as a starting point. The first step is to find the spectral bin with the largest magnitude. The resolution will be a bit coarse; to make it more accurate you can interpolate using that peak bin and the bins on either side, for example using quadratic interpolation. One challenge with pitch detectors is to avoid “octave errors”, where the pitch estimate is off by some integer factor. There are a bunch of heuristics one can use to prevent octave errors… a bit too much for me to get into here… -Gerry]

I’m also trying to estimate pitch via autocorrelation (I see I’m bringing this issue back up after a long period of silence, apologies!), but am quite confused about how to do it correctly (perhaps because I don’t quite understand the difference between autocorrelation and cepstrum).
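The quadratic (parabolic) interpolation mentioned in the bracketed reply can be sketched as follows (Python; the helper name is mine):

```python
def interp_peak_offset(mag, k):
    """Parabolic interpolation around peak bin k of a magnitude spectrum.
    Fits a parabola through bins k-1, k, k+1 and returns the fractional
    offset of its vertex, in the range [-0.5, 0.5]. The refined frequency
    estimate is then (k + offset) * sample_rate / N."""
    a, b, c = mag[k - 1], mag[k], mag[k + 1]
    denom = a - 2 * b + c
    if denom == 0:
        return 0.0  # flat top: no refinement possible
    return 0.5 * (a - c) / denom
```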

Basically, I’m taking an inverse fft of the complex conjugate of an fft: first, I’m taking the FFT of a signal (a vowel), then creating a vector whose elements are this FFT multiplied by its complex conjugate, and feeding this second vector as real-number input to your inverse FFT with the imaginary-input vector as all zeros; e.g. fft.run(conjugate, zeros, true).

I expect a graph of the resulting FFT to peak at the first index and be symmetrical, as online sources indicate a cepstrum should look like. My results do have a maximum initially, then steadily decrease, but are not symmetrical, though the graph is highly periodic. Even if this method is correct, I’m unsure how to translate the index values from this fft of an fft back into frequency!

Is this method correct? How are the indices of the ifft translated back into frequency?

Thank you!

Given a vector x with a segment of the signal, to compute the autocorrelation basically you need to compute…

IFFT( FFT(x) * conj(FFT(x)) )

… which is really just the IFFT of the power spectrum of x (i.e. the FFT magnitude squared). But that’ll give you the “circular” autocorrelation, which generally isn’t very useful.

To get the non-circular autocorrelation, you need to “zero pad” your buffer. For example, if you want to compute the autocorrelation for a vector x that has 1024 samples, add 1024 zero samples to extend it to 2048 samples. Then compute the IFFT of the FFT magnitude squared to get the autocorrelation.

The ‘units’ of the autocorrelation are in samples of time-domain ‘lag’. Think of taking the original signal, and doing a dot product (point-by-point multiply, and summing up the result) with a delayed version of itself. The autocorrelation is that dot product as a function of the amount of delay or ‘lag’. The biggest peak in the autocorrelation (other than at zero lag) will typically correspond to the period (in samples), or some integer multiple of the period. Note that autocorrelation for pitch detection is really only applicable to monophonic signals; for polyphonic pitch detection, spectral approaches make more sense.

Thanks! So if I understand correctly, if the FFT size is N, then the first FFT will produce a vector of length N/2, so the complex conjugate vector must be zero-padded to regain length N. Whereas the FFT magnitude is Math.sqrt( re*re + im*im ), the value of its conjugate for the IFFT is (re*re + im*im)*(re*re – im*im). These values comprise the vector of real-number input for the IFFT, and its imaginary number input is all zeros, as is the case for the original FFT. Is this correct?

Finally, is the magnitude of the autocorrelation the number of samples of lag, or the index?

Thank you so much for your work on Flash DSP!

Not quite… let’s assume the FFT is of length N. You take N/2 real samples of the source signal, stuff those into the first N/2 elements of your re buffer; fill the remaining N/2 with zeros. Fill the im buffer with N zeros. Do the N-point FFT. Next compute for each bin the magnitude squared, i.e. (re[i]^2 + im[i]^2). Stuff that into your re buffer. Set the im buffer to zero. Do the IFFT. The real part of the result is the autocorrelation, and the imaginary part should be zero.

If you’re wondering what happened to the complex conjugate bit… well, the FFT gives you re and im output, so for each bin you have (re[i] + j im[i]). Taking the complex conjugate of that just means flipping the sign of the im parts, which gives (re[i] – j im[i]). So when you multiply the FFT by the conjugate of the FFT, for each bin you have (re[i] + j im[i]) (re[i] – j im[i]), which gives you (re[i]^2 + im[i]^2), i.e. the magnitude squared.

I’m using “j” to mean sqrt(-1). I know “i” is the more commonly-used symbol, but electrical engineers like me usually use j, since i is used for current!
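Putting the whole recipe together, here’s a self-contained sketch (Python, with a naive O(N²) DFT standing in for the FFT so the demo needs no library; in practice you’d use a real FFT):

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform -- demo only."""
    n = len(x)
    s = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(s * 2j * cmath.pi * m * k / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out

def autocorr(samples):
    """Non-circular autocorrelation: zero-pad N/2 samples to N, transform,
    take magnitude squared (i.e. FFT times its conjugate), then inverse
    transform. The real part of the result is the autocorrelation."""
    half = len(samples)
    x = list(samples) + [0.0] * half       # zero-pad to double length
    power = [abs(v) ** 2 for v in dft(x)]  # magnitude squared per bin
    ac = dft(power, inverse=True)
    return [v.real for v in ac[:half]]     # imaginary part is ~0

# The biggest peak past lag 0 sits at the period of the signal.
period = 8
sig = [math.sin(2 * math.pi * i / period) for i in range(32)]
ac = autocorr(sig)
best_lag = max(range(4, 16), key=lambda lag: ac[lag])
print(best_lag)  # 8, i.e. the period in samples
```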

Thanks for your detailed explanations! It works now! Tends to underestimate the pitch by an octave, but that should be ok.

You saved my life, thank you! 🙂

Glad to hear it! 😉 -Gerry

Thank you so much for sharing this code.

I have been trying to do this for a long time now. This is probably the most difficult part of my project.

My next step is to get a reliable frequency to tune my singing.

I have been able to loop through the numPoints to return the highest magnitude, which gives me a starting point (cool), however not the most accurate method. There seem to be Java, C# and Python solutions. Any chance I could tempt you with the challenge? Or just your thoughts on where I should start.

// Identify greatest magnitude = possible fundamental frequency
for ( i = 0; i < numPoints; i++ )
{
    if ( myMag < mag[i] )
    {
        myMag = mag[i];
        txtFreq = i * 10;
    }
}

Hi Darren, using the spectral bin with the greatest magnitude is a good start, but by itself won’t give good enough resolution. Using three-point interpolation around the largest peak can help a lot. I get questions about pitch detection pretty frequently, so it’s probably time for me to do a post on the subject… -Gerry

Gerry, I’m working on a guitar tuner application and I’m trying to make the pitch detection in ActionScript 3.

I have the same questions as Darren Winfield above.

Have you managed to post something on this subject?

By the way, your FFT code is amazing!

Thanks! I’ve worked on pitch detection algorithms, but still haven’t gotten around to writing a post on pitch detection…

My brain is hurting trying to make this pitch detection algorithm.

I made a few experiments last night, but I didn’t get good results.

When you have some time, please make a post about this.

I’m sure there are many others out there who are looking for a pitch detection algorithm in ActionScript 3.

Hallo Gerry..

Thanks for this nice piece of code. I’m currently trying to get it working with a beat detection class. The first big problem I have is finding the peaks of your magnitude output. I am a little bit puzzled here, because the values (roughly -148 to -85 at silence) don’t make any sense to me. Hope you can give me a clue..

thx in advance

The code computes a log magnitude spectrum, so the values represent the level in dB as a function of frequency. A large negative value in dB means a very small (but always positive) value on a linear scale. (On a dB scale, total silence is actually minus infinity.)

I’ve come across some beat and tempo detection algorithms that do a sliding windowed Fourier transform as a first step, for example, the method described in this Jean Laroche paper. For a basic beat detector, though, you can do a pretty good job using a purely time-domain method, which is likely to be simpler and more efficient. No need to compute a magnitude spectrum at all. -Gerry
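For a feel of what a purely time-domain approach can look like, here’s a deliberately crude sketch (Python; this is not the method from the Laroche paper, and all names and thresholds are illustrative):

```python
def energy_onsets(samples, frame_len=512, threshold=1.5):
    """Very basic time-domain beat/onset detector: split the signal into
    frames, compute each frame's mean energy, and flag frames whose energy
    jumps well above the running average of recent frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(v * v for v in frame) / frame_len)

    onsets = []
    history = []  # rolling window of recent frame energies
    for i, e in enumerate(energies):
        avg = sum(history) / len(history) if history else 0.0
        if history and e > threshold * avg:
            onsets.append(i)  # this frame is much louder than recent ones
        history.append(e)
        if len(history) > 8:
            history.pop(0)
    return onsets
```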

Thx for your quick answer. Well, sound analysis is unfamiliar territory for me, so please indulge me if I’m substantially wrong: I figured that FFT simply sorts the frequencies of one frame, or one snapshot of sound, respectively. Each band represents a frequency between 0–20000 Hz (theoretically), and each frequency has a power measured in decibels. Even though the dB logarithmically decrement along with incrementing frequency, I thought that all frequency bars could be scaled to a value between 0 and 1. From there on goes the time-domain thing you mentioned. I’m going to add listeners to each bar and simply calculate the variances..

In order to do so, I need to know how to scale your output to the 0–1 range.

I bow to your musicality and your math skills… but parts of your code, and even more the Jean Laroche paper, are beyond my kung-fu.

And I bow to your design skills… was just checking out layouterlimits.de, very nice stuff. The code for the spectrum analyzer is admittedly not the loveliest, and I was reluctant to release it at first partly because of that. As for the maths in it… unfortunately it’s largely unavoidable with audio signal processing 😦

Your understanding of the FFT is basically correct. The FFT ‘bins’ will correspond to frequencies that depend on the sample rate and the length of the FFT. For example, if you use a sample rate of 22050 (which is what I use in my example code if I remember correctly) and a window with N=2048 (and logN=11), then the bin frequencies will be multiples of 22050/2048 ~= 10.7 Hz. For real input, you can basically ignore the output values above N/2, as they’ll contain the same info as the bins below N/2 (just with the sign of the imaginary component flipped).

If you want to convert from the dB log-magnitude scale used in the spectrogram to a 0-to-1 range, you could choose some lower and upper thresholds, and linearly map everything in between. For example, have -60dB (or below) map to 0; 0dB (or above) map to 1; and everything else in between map linearly (e.g. so -30dB would map to 0.5).
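That mapping is a one-liner; a sketch (Python, hypothetical helper name):

```python
def db_to_unit(db, floor_db=-60.0, ceil_db=0.0):
    """Map a dB magnitude linearly onto 0..1: floor_db (and below) -> 0,
    ceil_db (and above) -> 1, everything in between maps linearly."""
    t = (db - floor_db) / (ceil_db - floor_db)
    return min(1.0, max(0.0, t))
```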

Hi Gerry, this is really phenomenal stuff! I’ve spent the past few days going over the newest FFT classes and now the Spectrum Analyzer. I am not an audiophile nor a mathematician, but I have gotten a basic understanding of what this all does. So, that being said, I was wondering what direction I should go in to read the byte array from a loaded mp3 file (via sound.extract in as3). The ultimate goal is to be able to read the entire file, map out the Hz/dB the way you did, and be able to scrub through the song to see how it changes over time and at specific times. The latter of this goal is something I can definitely do, but reading a byte array from sound.extract into an FFT has proved to be very challenging. Any advice on where I should go from here? Any help is much appreciated!!

Thanks!

~Jake

Basically you’d need to modify the updateSpectrum() function in the spectrum analyzer so that it grabs samples from the mp3 instead of getting them from the circular buffer for the microphone input. Getting audio data from an mp3 is pretty straightforward. One place to see mp3 audio extraction in action is Andre-Michelle’s gapless mp3 looper code here. Note that my spectrum analyzer code as currently written works on a mono signal (since the mic is mono anyhow), but the mp3s you’ll be loading will be stereo; to create a mono signal from stereo, just add the left & right channels sample-by-sample, and (optionally) divide by two.
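The stereo-to-mono mixdown is trivial; as a sketch (Python, illustrative):

```python
def stereo_to_mono(left, right):
    """Mix a stereo signal down to mono by averaging the channels
    sample-by-sample; the divide-by-two keeps levels in the same range."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]
```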

Ahh many thanks! Consider my hat officially off to you. I was sitting there modifying the onMicSampleData, and excessively commenting each line in the class. Thank you!

Hello Gerry,

I would like to say thank you for all these technically very enriching explanations!!!

Thank you very much.

Thank you! I would never have figured out these algorithms by myself. You have saved my brain from distortion.


how to use it in flash cs5?

Not sure really. You’d have to ask someone who’s familiar with Flash CS5. Personally I use Flash Builder, but I know that it is possible to add classes with AS3 code in Flash CS5.

Hello,

first of all, thanks for sharing this code, works great.

Based on your work, I’m currently trying to find the value of the frequency which has the highest magnitude. But I guess since I don’t have much background knowledge, I don’t know how to achieve this. Could you give me a basic idea how or where I could get that value?

Thanks!

Markus

The magnitude of each bin is in variable m_mag. Just loop through that to find the index of the bin with the greatest magnitude. The frequency at a particular bin i is i*SAMPLE_RATE/N. To make the graph plotting code a little simpler, the frequencies are computed once and stored in vector m_freq. I hope this helps!
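As a sketch of that loop (Python rather than AS3; the helper name is mine):

```python
def peak_frequency(mags, sample_rate=22050.0, fft_len=2048):
    """Frequency of the strongest spectral bin: the bin with the greatest
    magnitude at index i corresponds to i * sample_rate / fft_len Hz."""
    i = max(range(len(mags)), key=lambda k: mags[k])
    return i * sample_rate / fft_len
```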

I am curious about your Hann window function implemented as:

m_win[i] = (4.0/N) * 0.5*(1-Math.cos(2*Math.PI*i/N));

Why did you add the (4.0/N) term to the standard Hann function?

With the 4/N scaling, a full-scale sinusoid (i.e. one that swings from -1 to 1) results in a 0 dB spectral peak.
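This is easy to verify numerically: window a full-scale sine that sits exactly on a bin, take the DFT, and the peak magnitude comes out as 1.0, i.e. 0 dB. A sketch (Python, evaluating the DFT sum directly for just the one bin):

```python
import cmath
import math

N = 64
k = 8  # put the sine exactly on bin k so there's no leakage at the peak

# Hann window with the 4/N scaling from the code above
win = [(4.0 / N) * 0.5 * (1 - math.cos(2 * math.pi * i / N)) for i in range(N)]

# Full-scale sinusoid (swings from -1 to 1), windowed
x = [win[i] * math.sin(2 * math.pi * k * i / N) for i in range(N)]

# Magnitude of DFT bin k, computed directly
X_k = abs(sum(x[i] * cmath.exp(-2j * math.pi * k * i / N) for i in range(N)))
peak_db = 20 * math.log10(X_k)  # ~0.0 dB, to within rounding error
```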

for ( i = 0; i < N; i++ )
{
    m_tempRe[i] = m_win[i]*m_buf[pos];
    pos = (pos+1)%BUF_LEN;
}

I'm curious about the pos iterator; can I change it to

m_tempRe[i] = m_win[i] * m_buf[i];

m_buf is a circular buffer. Grabbing the most recent N samples may involve getting some samples at the end, followed by some at the beginning of m_buf. By incrementing pos and applying modulo division (%), pos wraps around correctly when it reaches the end of the buffer.
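The wraparound read can be sketched like this (Python; the helper name is mine):

```python
def latest_samples(buf, write_pos, n):
    """Read the n most recent samples from a circular buffer in
    chronological order. write_pos is where the NEXT sample will be
    written, so it also marks the oldest sample; reading forward from it
    with modulo wraparound walks oldest-to-newest."""
    return [buf[(write_pos + i) % len(buf)] for i in range(n)]
```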

You are too soooon!!!!

Thank you

I’m writing a program like SingStar, to try to score people singing. I read your blog and this paper (http://www.google.com.hk/url?sa=t&rct=j&q=karaoke+algorithm&source=web&cd=6&ved=0CEUQFjAF&url=http%3A%2F%2Forbit.dtu.dk%2FgetResource%3FrecordId%3D185857%26objectId%3D1%26versionId%3D1&ei=KUrxTsHmKIrZiAK5rYCZDg&usg=AFQjCNGn6exCfbtrVQpMvO9vB3s26aTC4Q&sig2=HdlvvDPbbI6hRl8aSix9ag)

Now I’m confused by something. I changed your test example code to use Sound.extract to get sample data (4096 bytes per 100 ms) from an mp3 file, then do FFT.run and find the frequency with the largest magnitude, and draw it on the screen: the x-axis is time, the y-axis is Hz. But I found some frequencies are higher than 1000 Hz, which leads the graph to have many rapid changes. Is that normal?

I read the paper above; it says you should discard spikes, but I don't know the discard algorithm. I read another paper that says you can simply discard frequencies above 1000 Hz and below 60 Hz. Is that right?

P.S. The mp3 file is popular music, with both accompaniment and vocals.

The FFT magnitude will give you the breakdown of frequencies from 0 (“DC”) all the way to the Nyquist frequency, which is half the sample rate (e.g. 22050 Hz if your sample rate is 44.1 kHz, which is what you get from Sound.extract()).

I haven’t read the thesis you cite, but since it’s scoring of people singing, I assume part of it involves checking whether they sing at the correct pitch, and that that is why you’re finding the frequency with the largest magnitude. That won’t work well. The magnitude spectrum of a periodic signal will have ‘spikes’ at the harmonics, i.e. at integer multiples of the fundamental frequency. It’s not at all unusual to have harmonics that are stronger than the fundamental. This can be seen clearly in the graph above, in which the third harmonic has the highest magnitude.

Thanks for your suggestion.

I think I need to learn some "pitch detection" algorithms.

I need to correct something I said above: it's not 4096 bytes, it's 4096 samples, and each sample has two numbers (left channel, right channel).

The audio input is mono, with no left/right channels. Each sample is 4 bytes; I use ByteArray.readFloat() to get each sample.

Mic input is mono, but the audio data I get from the mp3 file has left/right channels. I read your other comment, so I convert it to mono: (left + right)/2.

郑贤伟 and 白开水 are both me.

Hello. I'm actually looking for Java source code for an FFT frequency spectrum analyzer. I'd be glad if you could help. It's urgent, and it's for my final year project.

I don’t have any Java sources, but it should be pretty easy to translate from the ActionScript source code (which is in my post) to Java. Good luck on the project!

Is it the same as for a spectrum analyzer? Can I use the same code, just translated?

Not sure what you’re asking… the ActionScript spectrum analyzer code I posted uses an FFT to get the spectrum (i.e. magnitude as a function of frequency) for an audio signal in real time. If for your project you need to write one in Java, looking at my ActionScript code for inspiration might help.

Mine is a frequency spectrum analyzer, and I am new to Java.

Hi,

First of all, thank you so much for your blog.

How can we change the linear plotting of values on the x axis to logarithmic?

Thanks

Suppose you wanted the x axis to be linear in frequency from some MIN_F to MAX_F, and that the axis is WIDTH pixels long. You’d basically do a linear mapping like this:

x = WIDTH * (f - MIN_F) / (MAX_F - MIN_F)

If you want the x axis to be log frequency instead, you do a similar linear mapping but with the log of the frequency:

x = WIDTH * (log(f)-log(MIN_F))/(log(MAX_F)-log(MIN_F))

Since log(a) - log(b) = log(a/b), we can simplify this to:

x = WIDTH * log(f/MIN_F)/log(MAX_F/MIN_F)
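The simplified mapping above, written out as a small function (TypeScript rather than AS3; parameter names are illustrative):

```typescript
// Sketch: pixel x position for frequency f on a log-frequency
// axis spanning minF..maxF over `width` pixels.
function logFreqToX(f: number, minF: number, maxF: number, width: number): number {
  return width * Math.log(f / minF) / Math.log(maxF / minF);
}
```

Note that minF must be greater than zero, since log(0) is undefined.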

@Gerry Beauregard

Thanks a lot for your help. It worked for me.

Is there any chance I can get your algorithm to work on other audio input (such as an mp3) without it being real-time? I want to render it as an image before playing it, in a relatively short time, so I can use it afterwards. I can't seem to get the output right (only negative numbers and a very odd range, 1 to 7000).

Thank you.

P.S. I would like to mention I intend to use the output for a spectrogram.

Should be pretty easy. Just decode the mp3 using Sound.extract(). It’ll give you interleaved stereo audio data, so mix it to mono. Make a series of calls to updateSpectrum(), copying blocks of (mono) audio into m_tempRe, advancing the starting position into the mono audio data on each call. Modify drawSpectrum() so that instead of drawing a graph, it draws one vertical line of varying brightness, i.e. one more time slice of the spectrogram.
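The framing loop described above might be sketched like this (TypeScript rather than AS3; `analyze` is a hypothetical stand-in for the windowing + FFT + magnitude step in updateSpectrum()):

```typescript
// Sketch: march through a mono signal in hops, handing each frame
// to an analysis callback. Each result becomes one time slice
// (one vertical line) of the spectrogram.
function spectrogramFrames(
  mono: number[],
  frameSize: number,
  hop: number,
  analyze: (frame: number[]) => number[] // hypothetical FFT+magnitude step
): number[][] {
  const slices: number[][] = [];
  for (let start = 0; start + frameSize <= mono.length; start += hop) {
    slices.push(analyze(mono.slice(start, start + frameSize)));
  }
  return slices;
}
```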

Thank you for your reply. Unfortunately I must be doing something wrong: using the exact code you provided and reading the bytes from a Sound object, my magnitudes are huge, something like >700. Should I continue with this and adjust the scale and maximum magnitude, or am I too far from a good result?

The ByteArray that you get from Sound.extract() contains normalized (-1.0 to +1.0) floating point sample values. After calling extract(), set the ByteArray’s position to 0, then call readFloat() to get each successive sample value. Be sure to check out the ActionScript Reference and Adobe engineer Tinic Uro’s blog.

Thank you this is very cool and impressive work.

I'm trying to figure out how to implement pitch detection, i.e. determining the main frequency and thus the note of the input.

Any advice on what steps would be necessary to achieve it? I noticed in some comments you said you would do an article on that 🙂; maybe just a little teaser would be nice!

Many thanks again 🙂

Thanks! Alas, I have too little time to write these days… Interpolating around the biggest spectral peak is not too bad, though the biggest peak may not be the fundamental.
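One common way to do that interpolation is to fit a parabola through the strongest bin and its two neighbours. A sketch (TypeScript rather than AS3; this is a generic technique, not code from the post):

```typescript
// Sketch: quadratic (parabolic) interpolation around the strongest
// bin to refine a peak-frequency estimate beyond the FFT's bin
// resolution. `mags` are per-bin magnitudes; returns Hz.
function interpolatedPeakFreq(mags: number[], sampleRate: number, n: number): number {
  let k = 1;
  for (let i = 2; i < mags.length - 1; i++) {
    if (mags[i] > mags[k]) k = i; // strongest interior bin
  }
  const alpha = mags[k - 1], beta = mags[k], gamma = mags[k + 1];
  // Vertex offset of the parabola through the three points, in bins
  const p = 0.5 * (alpha - gamma) / (alpha - 2 * beta + gamma);
  return (k + p) * sampleRate / n;
}
```

As noted above, the strongest peak may be a harmonic rather than the fundamental, so this refines a frequency estimate but is not by itself a pitch detector.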

Hi! I’m working on that too1 =)

I’m just new to AS and programming and I’m studying a lot to make this work as I want it to work with starling for my graduation work.

The way I’m trying to achieve it is by registering the fundamentals and when it finds any of then with its harmonic series, it pushes that to a vector…

Couldn’t do that yet, if anyone has some tips, I’ll be very grateful =D

Although I don’t know if it the best way to do it…

Hello Gerry,

And thank you for your very great work !

For one of my projects, I'm trying to use your real-time spectrum analysis to transfer data via sound waves.

It's a simple implementation:

> For a 0, I make a 500 Hz sound for 300 ms

> For a 1, I make a 1250 Hz sound for 300 ms

> And I have a marker at 200 Hz for 300 ms at the end of each bit

Then I use your spectrum analysis (50 ms spectrum updates) to listen for whether 500 Hz, 1250 Hz or 200 Hz is high. But the issue is that the analysis doesn't catch each bit for the same amount of time, so I get a lot of errors in the capture. Sometimes I get five 500 Hz readings in a row, and other times only one.

Is it because the analysis is not fast enough? Do you have any advice?

Thanks a lot !

Hi Stéphane, and thanks for visiting!

The general idea certainly seems doable, but are you sure those numbers and units are correct? If it’s 300ms per bit, plus 300ms between bits, then your data transmission rate will be below 2 bits/s, which seems too slow to be very useful. How is the sound being sent? Is it via a phone line, or through open air from a loudspeaker to microphone? Is there a lot of echo?

To test where the errors are coming from, I’d recommend creating an ‘offline’ non-real-time test program that extracts bits from a recording of a typical input signal.

Depending on the audio channel, you might want to look up other ways of creating a data-carrying audio signal. There's loads of literature on modems for use on phone lines, for example; they managed to get pretty decent transmission rates over regular phone lines, which have less than 4 kHz of bandwidth. Not sure offhand what modulation technique they use. For lower bandwidth, say a few dozen bits per second, you could use DTMF (dual-tone multi-frequency) signaling; the DTMF frequencies are cleverly chosen so they remain easy to detect even if channel distortion adds spurious tones (due to harmonic and inharmonic distortion).

For detecting specific frequencies, the FFT is overkill, btw. You could instead use filters tuned at the specific frequencies.
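One standard choice for such a tuned detector is the Goertzel algorithm, which measures the power at a single target frequency far more cheaply than a full FFT. A sketch (TypeScript rather than AS3; the frequencies and sample rate in the test are illustrative):

```typescript
// Sketch: Goertzel algorithm -- a second-order filter that
// accumulates the power at one target frequency.
function goertzelPower(samples: number[], targetHz: number, sampleRate: number): number {
  const w = 2 * Math.PI * targetHz / sampleRate;
  const coeff = 2 * Math.cos(w);
  let s1 = 0, s2 = 0; // filter state
  for (const x of samples) {
    const s0 = x + coeff * s1 - s2;
    s2 = s1;
    s1 = s0;
  }
  // squared magnitude at the target frequency
  return s1 * s1 + s2 * s2 - coeff * s1 * s2;
}
```

Running one Goertzel filter per signaling frequency (500 Hz, 1250 Hz, 200 Hz) on each block of mic samples would give exactly the per-tone levels needed for the bit detector described above.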

Gerry,

Thank you very much for your reply !

Indeed, my algorithm with a 0/1 every 300 ms was just a test; in reality I'd like to reach 13 kbit/s. It's for a digital art installation that transmits a webcam capture via audio (speaker and mic on the same computer), so if I can't reach that rate, I'll only transmit part of the image.

So I went back to the basics of your spectrum analysis, and it works really nicely for transmitting bits at 2 bit/s. In fact I just had a newbie mistake in my code: the magnitude vector was being updated while I was analyzing it. So thanks for the advice to go back to basics! Now I (just) have to increase the rate with some work on my signal encoding.

If you're interested, I'll update you when the final installation is done!

Thanks again, and really sorry for my bad English.

Hi Gerry,

Berkeley Prof. Ken Goldberg announced yesterday at Maker Faire NY the winners of the 10-dollar robot design challenge. The objective of this challenge was to develop low cost robots to help education in Africa.

One of the winners, N-Bot, uses your AS3 Flash-based FFT system! 🙂

N-Bot website:

http://www.natalnet.br/~aroca/afron/

10-dollar robot design challenge winners

http://robotics-africa.org/design_challenge.html

http://www.wired.com/design/2012/09/afron-winners

N-Bot costs between 11 and 15 dollars (depending on the number of sensors). After drawing an N-Bot control program using blocks, you simply click Run, and audio tones control the robot! Sensor feedback is also captured by the computer's microphone input. We use an adapted version of your real-time FFT system to map frequencies to sensor states. The videos and instructions pages on the N-Bot site explain how Gerry's FFT system is used to capture the sensors' states.

If you want to see it working right now, please follow the link:

http://www.natalnet.br/~aroca/N-Bot/html/beta/natalnet/

On this site, there is a credits page with your name. We hope this is okay for the Flash FFT license.

Thank you very much, Gerry! Your piece of software helped us build N-Bot!!

Rafael

Congrats on your win! Very cool project, and I’m glad to see my FFT spectrum analyzer code has been put to good use. Thanks for the credit too!

Thank you very much for your contribution to sound processing in AS3. You and Andre Michelle have opened up amazing new possibilities for the AS3 community.

It seems like the momentum behind these great sound projects has somewhat diminished since 2010. I’m sure many fellow developers would agree that we’d love to see more code and explanation in this area of signal processing, especially now that we have access to ActionScript workers.

I’d be willing to offer a small donation if the next article is about how to implement pitch detection. Since there seems to be a lot of demand for pitch detection, perhaps we can start a KickStarter project around it?

I'm up for it too; let's start a Kickstarter for pitch detection. Thanks again, Gerry.

I too need to implement pitch detection into an application of mine.

I’m willing to donate for it.

I have been searching far and wide on the internet for something like this. I was so close, too. I followed your link, and the site had died or something. :[[[

EDIT: Disregard that. Now it works!!

Now, I’m trying to modify the code to make this logarithmic instead of linear. I followed your response to a comment asking for help on this issue, and it lead me to modify lines 190 and 250.

const FREQTOPIXEL:Number = WIDTH/(Math.log((MAX_FREQ/MIN_FREQ)));// Pixels/Hz

x = LEFT + FREQTOPIXEL*(Math.log((f/MIN_FREQ)));

Now, when I compile, I get undesired results. Are there more changes to be had that I’ve missed?

Only one I can think of offhand (rushing, Christmas morning here!) is to make sure MIN_FREQ is non-zero.


Hi Gerry, first of all thank you for your post.

I'm a computer science student, and my group and I are making a chord identifier, but we can't find code for doing spectrum analysis on mp3s. Could you give me source code? I will deeply appreciate your help. Thank you in advance!

Assuming you’re using Flash, just use the Sound class extract() method to get raw audio sample data from the mp3.

Can I get the code from you? This is really getting me down…

Thank you for your reply. I was wondering if you also have code for converting bytes to frequencies.

Thank you so much.

To use an FFT to do pitch recognition on an mp3, you first need to extract the audio data from an mp3. Initially you’ll get it as a ByteArray. You can convert that into a vector of Numbers by making a series of calls to ByteArray.readFloat(), and pushing the values to the end of a vector. Note that the audio data in the ByteArray will be interleaved, that is alternating left/right/left/right. For pitch recognition, you’ll probably want to mix to mono. Andre Michelle’s blog has some pretty good examples for extracting and manipulating mp3 sample data. I hope that helps!
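A sketch of that conversion (TypeScript rather than AS3; `DataView` plays the role of ActionScript's ByteArray, and the endianness flag is an assumption you'd match to however the bytes were produced):

```typescript
// Sketch: walk interleaved left/right float32 samples and mix them
// down to a mono vector, analogous to repeated ByteArray.readFloat()
// calls on the data returned by Sound.extract().
function interleavedFloatsToMono(bytes: ArrayBuffer, littleEndian: boolean): number[] {
  const view = new DataView(bytes);
  const mono: number[] = [];
  for (let pos = 0; pos + 8 <= view.byteLength; pos += 8) {
    const left = view.getFloat32(pos, littleEndian);      // like readFloat()
    const right = view.getFloat32(pos + 4, littleEndian);
    mono.push((left + right) / 2);                         // mix to mono
  }
  return mono;
}
```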

thank you for the reply

We are already done with getting the ByteArray; our problem is how to convert the bytes to frequencies.

thank you.

And also, Gerry, we are using Java.

Hi Gerry, I realize I’m exhuming a very old post and you’ve quite likely moved on from AS3 and the Flash platform, but there appears to be a damping phenomenon with the Microphone handler, and your spectrum analyser exhibits the same behaviour. A simple test: generate some white / pink noise at a constant amplitude and observe your spectrum levels. After about 5 seconds, the levels drop off dramatically. Depending on the purity of your noise input, it could drop off to zero entirely. As soon as you cut off the input and then start up again, the levels are back up… for a little while.

Sounds like some noise compensation algorithm at play. Have you had to deal with this, and if so would you be willing to address my StackOverflow question or even post a response here? The SO question is here: http://stackoverflow.com/questions/24619985/as3-microphone-levels-decrease-over-time-built-in-noise-compensation

Must be platform dependent. I just tested in Safari using a Logitech C170 webcam for the mic, using a steady sound source, and notice no drop in levels in the spectrum analyzer, even after a minute.

Hi. Really nice code! I was searching for a spectrum analyzer, but I couldn't run it together with the FFT code you provided. I created 3 classes for all the code you posted, but I can't run it. I'm trying it in NetBeans. Can you help me? Thank you!!

Pretty busy these days I’m afraid… but if you tell me what the errors are, I might be able to help.

Hey! Would you mind linking the FFT code/earlier post you mention in the original post? I’m having trouble finding it

There’s a link to the FFT code in the first paragraph of the post.Good Job!. I have been struggling for days with my code. My spectrum is not responding, looks like random and always almost the same.

I think it has to do with the mic buffer. Could you help me with this:

private void Microphone_DataAvailable(object sender, MicrophoneDataEventArgs e)
{
    uint len = (uint)(e.Buffer.Length / 4);
    for (uint i = 0; i < len; i++)
    {
        short sample = (short)((e.Buffer[i + 1] << 8) | e.Buffer[i]);
        float sample32 = sample / 32768f;
        m_buf[m_writePos] = (float)Convert.ToDouble(sample);
        m_writePos = (m_writePos + 1) % BUF_LEN;
    }
}

You can download the full code at:

https://drive.google.com/file/d/0B2i6MVQ3QsVjUUFza3AyZTQ0QlE/view?usp=sharing

Any help would be really appreciated.

I’m not familiar with how the mic input works in C#, so can’t help much. But just from the small code snippet you provided, I suggest double-checking the format of e.Buffer. If it’s shorts, then number of input values would be the number of bytes in the buffer divided by 2 (i.e. sizeof(short)) not by 4. Also, if it’s shorts, you should be advancing through the buffer two bytes for each sample, e.g. short sample = (short)((e.Buffer[2*i + 1] << 8) | e.Buffer[2*i]);. Even easier, just cast the buffer to short*, e.g. short* shortBuffer = (short*)(&e.Buffer[0]), then access the short values directly from shortBuffer.

I made it work! Works great. Thank you!