A few weeks ago, I wrote a post comparing the performance of Flash/AS3 vs. Silverlight/C#. I used a very simple test, element-by-element addition of two vectors of numbers. The C# version was about 5 times faster.

I was curious whether I’d get a similar performance disparity with some less trivial computation, so I ported my “even faster” AS3 FFT implementation to C#. The C# FFT uses only managed code (no pointers) so you can use it in Silverlight. I didn’t attempt to make any C#-specific optimizations, just did the minimal changes to get it to build.

As in my earlier C# vs. Flash performance test, I once again found that the C# implementation is much faster, nearly 4x faster than AS3. For 1000 forward-inverse pairs of 1024 pt FFTs, my C# FFT takes ~85 ms. The same test with the AS3 FFT takes over 300 ms.

I’m releasing the C# FFT code under the MIT license, which means you can pretty much do whatever you like with it, as long as you keep the comment block that says I wrote it and that you can do whatever you like with it ;-). If you find it useful or have suggestions to improve it, feel free to leave a comment.

*Update 3 August 2011: Graeme West reported some strange results using my FFT. I dug into it, and discovered a bug: the sign for the imaginary part of the “twiddle factors” was incorrect. I’ve fixed it now – just required changing the if (inverse) condition to if (inverse == false). I never noticed because for real input data (i.e. where the imaginary component is zero), the only impact on the output was that the sign of the imaginary component was flipped, and that had no effect on magnitude spectra (which is what I normally was computing). *

using System; using System.Net; using System.Windows; using System.Windows.Controls; using System.Windows.Documents; using System.Windows.Ink; using System.Windows.Input; using System.Windows.Media; using System.Windows.Media.Animation; using System.Windows.Shapes; namespace SpeedTest { /** * Performs an in-place complex FFT. * * Released under the MIT License * * Copyright (c) 2010 Gerald T. Beauregard * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to * deal in the Software without restriction, including without limitation the * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or * sell copies of the Software, and to permit persons to whom the Software is * furnished to do so, subject to the following conditions: * * The above copyright notice and this permission notice shall be included in * all copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS * IN THE SOFTWARE. */ public class FFT2 { // Element for linked list in which we store the // input/output data. We use a linked list because // for sequential access it's faster than array index. class FFTElement { public double re = 0.0; // Real component public double im = 0.0; // Imaginary component public FFTElement next; // Next element in linked list public uint revTgt; // Target position post bit-reversal } private uint m_logN = 0; // log2 of FFT size private uint m_N = 0; // FFT size private FFTElement[] m_X; // Vector of linked list elements /** * */ public FFT2() { } /** * Initialize class to perform FFT of specified size. * * @param logN Log2 of FFT length. e.g. for 512 pt FFT, logN = 9. */ public void init( uint logN ) { m_logN = logN; m_N = (uint)(1 << (int)m_logN); // Allocate elements for linked list of complex numbers. m_X = new FFTElement[m_N]; for (uint k = 0; k < m_N; k++) m_X[k] = new FFTElement(); // Set up "next" pointers. for (uint k = 0; k < m_N-1; k++) m_X[k].next = m_X[k+1]; // Specify target for bit reversal re-ordering. for (uint k = 0; k < m_N; k++ ) m_X[k].revTgt = BitReverse(k,logN); } /** * Performs in-place complex FFT. * * @param xRe Real part of input/output * @param xIm Imaginary part of input/output * @param inverse If true, do an inverse FFT */ public void run( double[] xRe, double[] xIm, bool inverse = false ) { uint numFlies = m_N >> 1; // Number of butterflies per sub-FFT uint span = m_N >> 1; // Width of the butterfly uint spacing = m_N; // Distance between start of sub-FFTs uint wIndexStep = 1; // Increment for twiddle table index // Copy data into linked complex number objects // If it's an IFFT, we divide by N while we're at it FFTElement x = m_X[0]; uint k = 0; double scale = inverse ? 1.0/m_N : 1.0; while (x != null) { x.re = scale*xRe[k]; x.im = scale*xIm[k]; x = x.next; k++; } // For each stage of the FFT for (uint stage = 0; stage < m_logN; stage++) { // Compute a multiplier factor for the "twiddle factors". // The twiddle factors are complex unit vectors spaced at // regular angular intervals. The angle by which the twiddle // factor advances depends on the FFT stage. In many FFT // implementations the twiddle factors are cached, but because // array lookup is relatively slow in C#, it's just // as fast to compute them on the fly. double wAngleInc = wIndexStep * 2.0*Math.PI/m_N; if (inverse == false) wAngleInc *= -1; double wMulRe = Math.Cos(wAngleInc); double wMulIm = Math.Sin(wAngleInc); for (uint start = 0; start < m_N; start += spacing) { FFTElement xTop = m_X[start]; FFTElement xBot = m_X[start+span]; double wRe = 1.0; double wIm = 0.0; // For each butterfly in this stage for (uint flyCount = 0; flyCount < numFlies; ++flyCount) { // Get the top & bottom values double xTopRe = xTop.re; double xTopIm = xTop.im; double xBotRe = xBot.re; double xBotIm = xBot.im; // Top branch of butterfly has addition xTop.re = xTopRe + xBotRe; xTop.im = xTopIm + xBotIm; // Bottom branch of butterly has subtraction, // followed by multiplication by twiddle factor xBotRe = xTopRe - xBotRe; xBotIm = xTopIm - xBotIm; xBot.re = xBotRe*wRe - xBotIm*wIm; xBot.im = xBotRe*wIm + xBotIm*wRe; // Advance butterfly to next top & bottom positions xTop = xTop.next; xBot = xBot.next; // Update the twiddle factor, via complex multiply // by unit vector with the appropriate angle // (wRe + j wIm) = (wRe + j wIm) x (wMulRe + j wMulIm) double tRe = wRe; wRe = wRe*wMulRe - wIm*wMulIm; wIm = tRe*wMulIm + wIm*wMulRe; } } numFlies >>= 1; // Divide by 2 by right shift span >>= 1; spacing >>= 1; wIndexStep <<= 1; // Multiply by 2 by left shift } // The algorithm leaves the result in a scrambled order. // Unscramble while copying values from the complex // linked list elements back to the input/output vectors. x = m_X[0]; while (x != null) { uint target = x.revTgt; xRe[target] = x.re; xIm[target] = x.im; x = x.next; } } /** * Do bit reversal of specified number of places of an int * For example, 1101 bit-reversed is 1011 * * @param x Number to be bit-reverse. * @param numBits Number of bits in the number. */ private uint BitReverse( uint x, uint numBits) { uint y = 0; for (uint i = 0; i < numBits; i++) { y <<= 1; y |= x & 0x0001; x >>= 1; } return y; } } }

Got an iPhone or iPad? Check out AudioStretch for iOS.

Gerry-

I have written a small application in C# using NAudio that will record a sound clip from line-in or MIC. My requirement is to search this recorded sound clip for a 2 second clip. (I know this is very similar to Shazam). Another dev suggested I use FFT, but I have no background at all in signal processing.

Basically, in pseudocode, I want to achieve the following:

If RecordedSound.data contains SoundsClip(myclip.wav)

{

Do this

and this

}

If FFT is the way to go, could you point me in the right direction to get started on this?

Thanks-

Jason

Jason, that’s a challenging problem!Basically what you need to do is to do is analyze the two clips and generate some “feature vectors” (vectors of numbers that represent their acoustic properties in some time slice) at intervals of 10 – 50 ms or so. Then you need to attempt to match the feature vectors of one clip against another clip. That would require some measure of “distance” between the feature vectors, and some fast matching algorithm that’s smart enough to not do an exhaustive computation of the match at every time position in every clip. In short – quite a big problem.

Fortunately people have done a lot of work in this field already, and some of the stuff has made it into audio standards already. In particular the choice of feature vectors has been gravitating towards a small set thanks to MPEG-7 standardization. Do a Google search for “MPEG7 music features”.

Incidentally, the Fourier transform by itself won’t give a good feature vector – it’s too big, and contains loads of information that can change dramatically even for very small changes in perceived sound. I believe people normally use a cepstrum, which can be computed from the Fourier Transform – the cepstrum’s much smaller and more robust. Look for MFCC (Mel-Frequency Cepstral Coefficients) in the literature on the subject].

-Gerry

Hi

Thanks for this very nice implementation.

It seems that there is something non-standard happening here. Compared to other implementations, the entries seem to be in a rather strange order: call the requisite outputs of the FFT y[0], y[1], …. y[N]. (These are what the outputs would be according to other implementations, and indeed according to the definition i.e. just doing the algebra of multiplying the vector by the requisite square matrix.) But actually your implementation is writing the output out as y[0], y[N], y[N-1], …. y[1].

I’m sure I’m missing something. Can you advise?

Hi,

Your algorithm works very well. I really appreciate it. Is there any limitation as to the size of the data array? If I set my data size greater than 262,144 it does not seem to return a result.

Again, thanks for simple well commented code.

The only hard-coded limitation on the size of the data is due to the size of uint/int, but that would limit the FFT size to 2^31 (or maybe 2^32). If it’s failing for a smaller size, it’s probably because of memory. I suggest just checking the result of each “new” call in init() to ensure it succeeded, i.e. that the value is non-null. Perhaps just modify init() so it returns a bool. If any of the new calls gives a null pointer, immediately return from init() with false; and return true at the end of the the init() function. Then in the code that uses the FFT, just check the return value from init(). If it’s true, it’s good to go. If the return value is false, then you know you’ve hit the limit for memory allocation on the heap.

If the issue

isthat you’re running out of memory, maybe there’s some way to increase the maximum amount of heap memory. Alternatively, you could use a different FFT algorithm. The C# version I posted is a bit odd, in that it copies all the data into linked lists for processing. It’s a direct port from an ActionScript implementation which had to be written in a strange way to get decent performance. In ActionScript, it turns out that accessing elements of vectors is not very fast, and stepping through linked lists much faster. In C#, it’s probably faster (and definitely less memory intensive) to operate on the arrays/vectors of input data directly, as I did in my original more conventional ActionScript FFT implementation here which you could probably port pretty easily to C#.i was wondering how could i use this code in my project to determine the frequency of the sound recorded from the mic, plz can you help me , thanks

Hi Gerry, I used your AS3 FFT code and it worked great. I’m moving my project over to C# now though. For some reason, I keep getting Infinity when I try to convert the FFT output (which is in magnitude I think) to dB. I’ll link my stackoverflow post.

http://stackoverflow.com/questions/8696926/magnitude-to-decibel-always-returns-infinity-in-c-sharp

I can’t find anything wrong with my conversion, so I think I’m not seeing something. Thanks!!

If the power in a bin happens to be zero (i.e. re and im parts are both zero), then you’ll be taking the log of zero. The log of zero is minus infinity. A straightforward way to prevent this is to add a very small number to the power before taking the log, as I do in my spectrum analyzer code.

Incidentally, since you’re getting the magnitude (not the power) in each bin by taking the square root of re*re + im*im, to get dB levels you want 20 times the log10, not 10. Alternatively, you can skip the square root so you’re dealing with power, skip the square root, and multiply the log10 by 10.

Hi Gerry, please disregard my question below, I found out what was wrong. Everything is working as it should, but my values are very large (up in the positive 600’s) but when I did the spectrum analyzer in AS3 everything was between -150 and 0. Do you know what might be causing this? Thanks again!!

Thanks for the advice, I think I sort of jumble up things a bit when I was trying to initially fix them. I tried what you advised just now and I keep getting NaN for some reason. I’ve gone back to the spectrum analyzer code and taken from there. I still keep getting NaN though, but I’m not sure why. Here is my code:

…

private double SCALE = (double)20/System.Math.Log(10); // used to convert magnitude from FFT to usable dB

private double MIN_VALUE = (double)System.Double.MinValue;

…

// Perform FFT

fft.run(tempRe, tempIm);

fftCount++;

// Convert from Decibel to Magnitude

for (int i = 0; i < N/2; i++) {

double re = tempRe[i]; // get the Real FFT Number at position i

double im = tempIm[i]; // get the Imaginary FFT Number at position i

m_mag[i] = Math.Sqrt(re * re + im * im); // Convert magnitude to decibels

m_mag[i] = SCALE * Math.Log(m_mag[i] + MIN_VALUE);

if (fftCount == 50 && i == 400) print ("dB @ 399: " + m_mag[399]);

}

I'm new to C# so it could just be something simple but I'm not sure. Thanks!!

Gerry,

Thanks for the useful code! I dropped it into a Visual Studio 2010 C# project and it worked perfectly!

Many thanks. This is really clear and effective. I’m just getting into C# and I learnt a lot from this and wrote a little ‘is my violin in tune’ programme for my son. You can download this if you want from the website. This should take you to it: http://www.moonsch.com/?creator&Administrator&20

There were no changes required, except in the ‘run’ definition my (old 2003) compiler didn’t like ‘inverse’ being defaulted to false for some reason. Easily overcome, but next lesson is probably to look into this and find out why! Not tonight though.

Does the code work for all sizes or for 2^n’s?

It’s only for power-of-2 lengths. The FFT size is specified via the logN argument to init(), logN being the base-2 length of the FFT. For example, for a 512 point FFT, logN = 9. For non-power-of-2 lengths, you could try FFTW; I don’t think it’s available in C#, but people have written C# wrappers to call it.

3 Questions:

1.) If sample 1024 points of data at a rate of 10KHz (every 0.1ms) How do I figure out what the frequency is for each array index in the fft values you return?

2.) When run() completes, xRe and xIm contain 1024 points of data, the second half of the data almost mirrors the first half, am I supposed to use only the first 512 points of data? and the rest is garbage?

3.) If I do this Magnitude[i] = Math.Sqrt(re * re + im * im); What units is this in?

1) The first bin is at 0 Hz. Spacing between bins is SR/N. So for SR=10kHz and N=1024, successive bins are every ~9.75 Hz (10kHz/1024).

2) If your input is all real (i.e. all elements of xIm are zero), then the second half is basically redundant. Due to some symmetry properties, you’ll find that xRe[N-k] = xRe[k], and xIm[N-k] = -xIm[k], or more concisely x[N-k] = conj(X[k]).

3) The FFT itself doesn’t know anything about units. It’s just a mathematical transform. So the units of the bin magnitudes depends what the units of your input are :-) If you want the values to be in the same sort of range as the magnitude of the signal you’re analyzing, you’ll probably want to divide by N.

BTW, if you’re using the FFT to analyze audio data (or almost any other kind of real-world data), there are a few other things to consider, e.g. applying a suitable window to the data (Hann, Hamming, Blackman, etc.), and (if you’re plotting the output) converting to dB scale. You may find my post on real-time spectrum analysis useful:

http://gerrybeauregard.wordpress.com/2010/08/06/real-time-spectrum-analysis/

It’s in Flash/ActionScript, but it should be pretty easy to read the code if you know C#.

Hi Gerry. Can you please confirm something for me? If I have a 1024 point FFT (N = 1024) and the data is all real. Then only the first 512 bytes of the result contain the results. When calculating the bin size is N 1024 or 512?

Thanks

Bryon

If you have a complex FFT function, and you want to do an FFT on 1024 real points, the simplest approach is to set the FFT size to 1024 (log2N = 10), set the imaginary input to all zeroes, and perform the FFT. The result will have 1024 complex points, but a lot of it’s redundant information due to symmetry properties of real FFTs; basically you can ignore all the bins above N/2.

It’s possible to calculate the FFT of N-point pure real sequences using an N/2-point FFT. It’s also possible to do two N-point real FFTs using a single N-point complex FFT. Both technique are useful in applications where high performance is necessary, but the code and math are a bit fiddly.

Thanks for the implementation – quick question however. Calculating signal amplitude for a power bucket should be A = SQRT(2 * P / N); where P = Power (assuming no power bleed-over into adjacent buckets).

In my testing, however, I am finding that with this algorithm, A is calculated as: A= 2 * P / N. <== notice I need to drop the SQRT.

Am I misunderstanding something here?

You should be able to get the magnitude at bin i by computing (a/N) * sqrt(xRe[i]*xRe[i] + xIm[i]*xIm[i]), where ‘a’ is some constant that depends on the particular analysis window you’re using. -GerryThanks for the reply. Assuming a = 1 in your example above, here’s what I am seeing:

A = 2 * SQRT(xRe[i]*xRe[i] + xIm[i]*xIm[i]) / N

or

A = 2 * P / N

Where A = Magnitude of the signal, and P = SQRT(xRe[i]*xRe[i] + xIm[i]*xIm[i]) . According to the texts I have read, I was expecting to see:

A = SQRT(2 * P / N), but I am finding that I don’t need the SQRT to get the correct A values.

Power is proportional to amplitude *squared*. So if you want the power, you just want (xRe[i]*xRe[i] + xIm[i]*xIm[i]), not the square root of it.Thank you very much for your code, Gerald. It was a great help.

I’m using it for audio-processing, and so for an audio packet of size “N” samples I set xRe[0...N] to the waveform and I set xIm[0..N] to zero, and do an FFT of size “N”.

But I read that it’s possible, and more efficient, to pack the audio interleaved into xRe[0...N/2] and xIm[0..N/2], and then do an FFT of size “N/2″. This obviously runs twice as fast. I found two websites with explanations on how to do it:

http://www.katjaas.nl/realFFT/realFFT2.html

http://processors.wiki.ti.com/index.php/Efficient_FFT_Computation_of_Real_Input

But after spending three fruitless hours attempting to follow their techniques, I came to the conclusion that I’m not smart enough. Well, if you ever end up implementing those techniques in a double-speed FFT, I’d love to know!

I know of the technique, but have never implemented it. For a lot of the stuff I do, I need to do two N-length real FFTs at once (e.g. for processing stereo audio data), and that can be done with a single N-length complex FFT.

Basically you stuff the left channel into the real part and the right channel into the imaginary part, run the transform, then do some post-processing to separate the spectra of the two channels by taking advantage of even-odd symmetry properties of the FFT. There’s an explanation here.

It’s still a bit fiddly, but it’s relatively easy to implement compared to trying to do a N-length real FFT with an N/2 complex FFT.

Hi!

I’m trying to FFT some data from an audio file, but I still struggle to understand the ins and outs of the problem.

I want to thank you for the implementation!

The only missing part would be a test function, that show how to call the methods. Can you please give me some guidances on how to do this?

FFT2 myFFT = new FFT2() ;

myFFT.init(5) ; // 2^5 = 32

myFFT.run(??, ??, false) ;

You need to declare arrays for the input/output data. The input and output are complex numbers i.e. they have “real” and “imaginary” components. Audio data is all “real”, so you copy a block of audio data into the elements of the real array, and set the imaginary elements to zero. After running the FFT, the arrays contain the real and imaginary parts of the spectrum.

Codewise you need to do something like this.

int logN = 10;

int n = 1 << logN; // = 2^10

double[] re;

double[] im;

re = new double[n];

im = new double[n];

// TODO: Copy audio data into re, set im to zero

FFT2 myFFT = new FFT2(logN);

myFFT.init(logN);

myFFT.run(re, im, false);

// TODO: Do something with the FFT results, e.g. compute magnitude from re and im components.

(Not 100% sure of the C# syntax, btw. Haven't used C# in a while, and too lazy to fire up the old Windows XP virtual machine on my Mac to check my old code).

Often one wants to get the magnitude of the spectrum. To get that, you apply the Pythagoras theorem (take the square root of the sum of the squares of the real and imaginary elements), mag[i] = sqrt(re[i]*re[i] + im[i]*im[i]).

Take a look at my post on Real-Time Spectrum Analysis. The code's in ActionScript, but it should be pretty easy to figure out how to port it to C#.

hey,

I am developing a speech recognition module, where i need to use fft for signal processing.

Your code was very valuable for the same. Could you please tell, how to store the audio data to the arrays. How to specify the path of the audio file..?

Thanks in advance..

You’ll probably want to take a frame of speech audio data, window it (e.g. with a Hamming window), and put it in the real array, and set the imaginary array to zero. As for getting audio from the files, you’ll have to look elsewhere – haven’t done any work in C# audio programming in a long time. -GerryThank you for very nice implementation, I really appreciate it. Here my small piece of code which will give you about 1% speedup. Sorry for ugly code :)

private uint BitReverse(uint x, uint bits)

{

uint v = x; // 32-bit word to reverse bit order

// swap odd and even bits

v = ((v >> 1) & 0x55555555) | ((v & 0x55555555) <> 2) & 0x33333333) | ((v & 0x33333333) <> 4) & 0x0F0F0F0F) | ((v & 0x0F0F0F0F) <> 8) & 0x00FF00FF) | ((v & 0x00FF00FF) <> 16) | (v <>= (32 – (int)bits);

return v & (uint)((1 << (int)bits) – 1);

}

oops, formatting was wrong ;) 2nd try. I hope it works now.

private uint BitReverse(uint x, uint bits)

{

uint v = x; // 32-bit word to reverse bit order

// swap odd and even bits

v = ((v >> 1) & 0x55555555) | ((v & 0x55555555) << 1);

// swap consecutive pairs

v = ((v >> 2) & 0x33333333) | ((v & 0x33333333) << 2);

// swap nibbles …

v = ((v >> 4) & 0x0F0F0F0F) | ((v & 0x0F0F0F0F) << 4);

// swap bytes

v = ((v >> 8) & 0x00FF00FF) | ((v & 0x00FF00FF) << 8);

// swap 2-byte long pairs

v = (v >> 16) | (v << 16);

v >>= (32 – (int)bits);

return v & (uint)((1 << (int)bits) – 1);

}

Thanks! But for BitReverse(), I’ll take clarity over speed, especially given that BitReverse() is only called in init(). In most applications, one wants to perform many FFTs of the same size, which means you call init() once, then call run() many times (thousands of times in a typical audio application). So how long init() takes doesn’t really matter; speedwise, it’s the speed of run() that matters.

Hi Gerry

I am trying to use an FFT to preform frequency analysis on accelerometer data. From what I understand, if I used your code I would pass my accelerometer array as the real part, and set the imaginary part to be 0 (since all my data is real?).

Where I am getting confused is where you use your functions (run then intt) and how to get the result. Neither function appears to have a return type, and there is no externally accessible variable. I assume the output will be in the FFTElement m_X

If I am wrong about anything above let me know. I will look through the code more thoroughly, but it takes me a long time to understand code I haven’t written myself.

Hi Mike, the arguments xRe and xIm to the run() function are for input *and* output. So in your case, you would put the accelerometer data into xRe, set all the elements of xIm to zero, can call run(). xRe and xIm then contain the results of the Fourier transform.

Ok so I call init and pass it the log base 2 of my array size.

Then I pass my array into run, but how do it get the results from x.Re and x.Im, or am I going about this the wrong way?

After you call run(), xRe and xIm contain the results of the Fourier transform, i.e. the spectrum of the real-valued signal you passed in via xRe (with xIm set to zero). The spectrum is complex because it has magnitude and phase information. To get the magnitude at bin i, just compute sqrt(xRe[i]*xRe[i] + xIm[i]*xIm[i]), basically Pythagoras’ theorem. The spacing of the bins is Fs/N, where Fs is the sample rate, and N is the length of the FFT. For example, if the accelerometer data was sampled at 100 Hz, and you performed FFTs with buffers of 128 points, the frequency increment between bins would be 100 Hz / 128 ~= 0.78 Hz. Note that in the FFT, any bins above N/2 are redundant. They’re above the Nyquist frequency (half sample rate), and the values are also complex conjugates of the bins below N/2.

BTW, before passing the real data into the FFT via the xRe argument, you’ll probably want to apply a non-rectangular analysis window function to it. I usually use a Hann window (raised single-period inverted cosine function). The windows reduce “spectral leakage”, which can be quite severe if you don’t use a window (or more precisely, if you use a rectangular window); spectral leakage makes the spectrum values pretty much unusable, as you get strong values in the spectrum at frequencies where there wasn’t any power in the original signal.

I hope this helps! You might find my post on real-time spectral analysis useful; it uses audio, and the code’s in Flash/ActionScript, but much of it could easily be ported to C# and applied to accelerometer data.

Hello Gerry,

Thanks for your code. It is very helpful.

Now i have this problem that i have a sound input from a file. This sound file is a single frequency file (for example 100 hz) and its duration is 20 second. I need to be able to know what is the frequency of the played sound from the FFT results. The problem that i don’t understand the results coming from the FFT function. I strongly believe that i am able to extract the correct frequency from the file if i have the samples values and the sample frequency. Could you please help me to achieve this ?

Thanks,

Ahmad

Hi Ahmad, glad you like the code. Take a look my real-time spectrum analysis project. It’s in Flash/ActionScript, but if you know C# you should find it relatively easy to read.

Key steps to getting the (most prominent) frequency: grab a frame of audio data (typically a power of two long) ; apply a window function (e.g. Hanning); perform FFT (input: audio data in real part, zeroes in imaginary part); find magnitudes, i.e. for each FFT ‘bin’, compute sqrt(xRe[i]^2 + xIm[i]^2); find peak; interpolate to find more precise frequency.

If the signal is just a single frequency, getting its frequency via an FFT is actually a bit of an overkill, but it’ll work.

BTW, often when people say they’re looking for the “frequency” of an audio signal, they really mean the pitch (essentially the fundamental frequency). What’s your signal?

Hello Gerry,

Have been trying to do this for a good week or so now, and getting a little stuck.

I was wondering if you had the time to explain where I am going wrong or fill in the gaps in my knowledge.

Firstly, I take an audio file, (only .wav), and create a byte array out of this. the sample song I am using is 48000Hz and 217 seconds in length. This results in the size of the byte array for the

left audio channel being 10439424. I am only using left channel, and song is stereo should I be doing something with the right here?

Should I perform the FFT on an array of size 16777216 (after passing in the audio data and filling the rest with 0s) and then split this into 8192/4096/2048 etc. sized chunks, or should I split the audio into 4096 sized chunks and do an FFT on each of those. Or is there no difference here?

I can calculate the magnitude but don’t see how this then gives me a method of displaying the whole spectral graph.

I can calculate the magnitude at any point in the array:

//magnitude

magnitude[i] = Math.Sqrt(real[i] * real[i] + imaginary[i] * imaginary[i]);

//frequency

frequency[i] = magnitude[i] * (48000/lengthOfFFT)

So if i use these as my two axis.

I can display a graph with the x plots being frequency[i], y being magnitude[i]

If I split the audio into 4096 chunks I can display the magnitude and frequency graph of each of these at a point in time relative to the song? So:

If i pass an array of 4096 size into an fft i get 2048 bins? With a sampling rate of 48000Hz I need to refresh my graph every 42/43ms?

Thanks,

Tim

Got a display now, I saw my issue after I posted – frequency is meant to be just [i] not magnitude[i].

Glad you worked it out. Frequency is proportional to the bin index i, basically i*SAMPLE_RATE/FFT_LEN. You’ll probably find the spectrum display more useful if you plot the dB magnitude (20*log10(magnitude[i]) instead of the raw magnitude.

I am still having some problems getting this FFT to work. I added this code to the end of the run() function:

for (int i = 0; i < 16384; i++)

{

FFTOutPut[i] = Math.Sqrt((xRe[i] * xRe[i]) + (xIm[i] * xIm[i]));

}

return FFTOutPut;

It all run ok, but when I display the result on a graph weird things start happening. Even if the data is not changing (i.e. no frequencies other than 0Hz) the graph shows weird values. Even more unusual is that the values keep getting larger (even though everything is instantiated in a function, and then disposed of). By this I mean the FFT output values which grow until they are over 10^308.

This is the code that actually calls the FFT (called once every 5 seconds aprox):

private void FFTRun()

{

FFT ffTrans = new FFT();

List fftList = new List { };

double[] tempFFTList = new double[16384];

fftList = YAxisAcc.DataArray.OfType().ToList();

ffTrans.init(14);

tempFFTList = ffTrans.run(fftList.ToArray(), emptyImArray, false);

tempFFTList[0] = 0;

if (Global.GlobalGraphOn)

{

Graph.DrawGraph(zg1, “FFT”, “Frequency”, “Amplitude”, tempFFTList);

}

}

I have also used another FFT function, so I know the graph etc works. The problem with this one is that it is not radix 2, and uses large amounts of memory for a 16k point FFT.

Any idea what is going on, or if I am doing something wrong?

Hi Mike, not quite sure I follow everything that’s going on in your code… but the most essential thing is to understand that FFT2.run() has no return value. The real & imaginary spectrum gets put into the same arrays you use to pass the input values to run().

Ok. So if I do something like

FFT myFFT = new FFT();

myFFT.run(myArray1, myArray2, false);

Then myArray1 & 2 would contain the result?

[Yes, that's correct, though of course you also need to call init() before calling run(). Note that initializing the FFT takes a bit of time, so if speed is an issue, create and init the FFT object once, and then reuse it over and over again. -Gerry]So you can call init() once then use run() over and over again?

[Correct. Most other FFTs work that way too, one time initialization (which may be time-consuming), then call as many times as you like. -Gerry]I worked out why I am getting these large values. I have an array called yAxisAcc (accelerometer data) which I pass to another array called fftReArray. The result of the fft are being stored in the FFT array but also the accelerometer array. The assigning doesn’t even happen in the same function.

Any idea why this is happening?

Hi Gerry, thanks for your code, its much appreciated. I implemented a simple DFT (i.e. not the fast version) then compared its results with your code for both 128 bins and also 1024 bins and they correspond exactly, apart from a few minor significant digits. Its actually hard to find a simple C# example of an FFT implementation like this. The only other one I found, which might be of some interest to your readers might be the one on code project. Here is the link for reference: http://www.codeproject.com/Articles/9388/How-to-implement-the-FFT-algorithm

Glad you like it, and glad to hear the results agree with your simple DFT implementation. I’d be worried if they didn’t! :-)

It is kind of hard to find a C# FFT, especially one that can be used very freely. Mine’s released under the MIT license, which basically means you can do anything you want with it, as long as the comment block remains in the source code. No hassle to use in commercial products – no need for attribution in “About” dialogs, no need to make modifications freely available, etc., though attribution is of course welcome :-)

I’m actually kind of surprised there’s no Microsoft-supplied library of DSP functions (at least as far as I know). Nothing like Apple’s vDSP/Accelerate framework for example. There are some great third party libraries, for example Intel IPP (for which there is apparently a C# wrapper), but IPP doesn’t run on ARM chips, so if you’re developing in C# and targeting Windows Surface Pro (Intel/AMD instruction set) *and* Surface RT (ARM), you’re stuck.

Hi,Gerry. You have a wonderful post on the performance of C#. I am a student, and currently I’m doing a project on Audio informaton retrival. Exactly speaking, I’m doing a project where my search query would be a recorded piece of audio which would be matched and searched in database full of lecture notes, podcasts and other speech dominated audio content.

Till date, I have reached a stage where I am reading the audio file (WAV format) and obtaining the different frequencies. After that I’m stuck. I have a stereo, so have a 2 dimensional array of the bytes that I read. To further my project, I need to apply FFT. I have no clue as to how to use your module. Could you please guide me from here?

Thanks for your comments. Sounds like an interesting project you’re doing!

To use the FFT, first create an FFT object and specify the base-2 log of the FFT length. E.g. if you’ll be doing 1024 = 2^10 point FFTs, create an FFT object like this:

const int LOGN = 10;

FFT2 fft = new FFT;

fft.init(LOGN);

You only need to do this once.

Then for each FFT computation, assuming you’re doing it on real data, put the audio data into a 1024-point array re, and fill another 1024-point array im with zeros, and call the FFT’s run function:

fft.run(re, im);

The arrays re and im will contain the complex spectrum. Chances are you don’t need the phase info, and only want the magnitude spectrum, which you would compute like this:

const int N = 1 << LOGN;

for (int i = 0; i < N/2; i++)

{

mag[i] = sqrt(re[i]*re[i] + im[i]*im[i]);

}

BTW, I haven’t written any C# code in ages… so not 100% sure I’ve got all the syntax correct above (e.g. it might be Math.sqrt() instead of sort()). But the above should be enough to give you the idea. Good luck!

for (int i = 0; i < N/2; i++)

N is a constant. It will just loop 5 times then end.

nvm, misread the bit shift part const int N = 1 << LOGN;

But I am still getting an out of bounds exception. Why not just do for (int i = 0; i < re.Length; i++)

Assuming the re and im arrays were created with length N, you should be able to access up to element N-1 without an out-of-bounds exception, so a for loop with i running from 0 to N/2-1 should be no problem at all. I suggest you double check how the re and im arrays are being allocated.

Hi Gerry,

amazing piece of code you have there, I have played with it quite a bit now. I have a small question on its performance though: you mentioned that you managed to get 85ms per 1000 forward-inverse pairs for 1024 points long arrays. What machine specs have you tested it on?

I seem to be getting values of around 330ms for a release build compiled it in VS Express 2012 C# 4.5. The machine I am using is a few years old but still fairly capable quad core i5 2.6GHz.

I’ve got a mid-2010, 15-inch MacBook Pro. It’s old, but zippy: 2.66 GHz Intel Core i7, 8 GB 1067 MHz DDR3 RAM. Can’t remember offhand all the other test conditions. Not sure whether the C# FFT timing test program is either, but it was almost certainly a fairly direct translation of my AS3 FFT timing test code.

Just thought I’d let you know I’ve ported your code to C and am running it on a Raspberry Pi in baremetal mode (i.e. no operating system – using the gcc compiler for ARM). It runs quite fast – taking approx 180 microseconds to do the calc with an FFT size of 64 bins. As a comparison, when using the FFTW package under Debian Linux on the Raspberry Pi it runs the same calc in 50 microseconds, however the FFTW is extremely complex code to get to that level of optimisation.

Cool! I’ve been working on a C (actually C++) port of it as well… which is sort of funny, given that the C# is a port of my ActionScript version, which was an adaptation of a C FFT I wrote 20 years ago!! I couldn’t compare it with FFTW, because I couldn’t figure out how to build FFTW :-( But I did compare it with KissFFT, and found I got comparable speed. A lot depends on the specific platform, compiler settings, and the FFT size.

hehe! yeah using FFTW from Windows (C#) was tricky, but I just used the precompiled DLL. There’s a page here that helps … http://www.sdss.jhu.edu/~tamas/bytes/fftwcsharp.html. It was easier to test FFTW on Linux using a plain C program – you just install a precompiled package.

Thank you very much, I appreciate. What more i say.

Good works

Hi Gerry,

This is a really nice contribution to the software engineering community you have made. Well done.

I have a small question about the maths and configuring for optimal performance…

To tune the algorithm for the optimum performance I believe that you should initialise the FFT for the appropriate number of points. (Too many points means it needs to do more work right?) What criteria do you use for selecting the number of points in your FFT?

Further to that question, the smaller the sample buffer then the faster the algorithm can do its work. What criteria can you use to turn down the sample rate so you don’t have to analyse too many samples. To help with the answer, here is some information about what I am trying to analyse:

a. The range of frequencies I am interested is between 15Hz and 8KHz.

b. The space between frequencies that I care about is 1Hz.

Based on the above, how would you configure the audio-sampling hardware and the FFT so the algorithm can perform at its best?

Thanks again!

Bryon

Sorry Gerry – I hit send before I added. Here is what I believe to be the approach:

a. The sample rate should be at a minimum twice the maximum frequency. So I would use 16KHz.

b. For a 1Hz signal distinction I should use a 0.5Hz bin size.

fftPoints can therefore be calculated as: fftPoints = sampleRate / binSize (16000/0.5)

fftPoints should therefore be 32768.

This is a lot of FFT points though….. What are your thoughts?

You would need 1024*8 points. You will get a latency of 1 second, the covered frequencies will be from 0Hz (DC offset) to 8192Hz.

You need to sample at 16kHz (at least). Personally I’d probably go with 22kHz, as that’s supported by a lot of computer hardware. The window length is a tradeoff: the longer it is, the greater the ability to resolve closely spaced sinusoids, but the poorer your time resolution becomes. If you want to resolve a pair of sinusoids 1 Hz apart (i.e. see separate peaks for the two sinusoids), then you’ll need a window at least 2s long.

If you’re not dealing with very closely spaced sinusoids, but just need to get a very accurate measure of frequencies of very spaced out sinusoids, you can use a relatively short FFT and use quadratic interpolation around spectral peaks to get sub-bin accuracy.

Thanks Gerry,

Can you please help me understand why a sample size of two seconds versus one second would make the difference for this application?

Rgds,

Bryon

Thanks David. So you suggest 8192 points – but at what sample rate?

I make it then that the sample rate would be 4KHz. Is that fast enough?

binSize = sampleFreq / fftPoints => (0.5 * 8192 = 4096)

How did you arrive at that number?

Thanks in advance,

Bryon

Sorry, I got you confused.

The sample rate you will need is 16000Hz, that will cover frequencies from 0-8kHz.

In fact you would need 1024*16 = 16384 FFT points, you can do it with 8192 but that is a topic for another discussion.

You simply load your 16000 sampled points into the rex buffer (pad the 384 extra points with zeros, that will not make a problem) and you will get two buffers on return, rex will contain cosine waves while imx sine waves.

Thanks David.

That will give me a bin size of about 1Hz. Given the frequency gap I am concerned with is 1Hz – is a bin size of 1Hz enough separation do you think?

Thanks,

Bryon

It depends on your purposes of course. There are many applications where bins of 10s, or even 100s Hz are perfectly acceptable.

HI David,

Thanks for sharing your code, good work!

I have discovered one minor issue with your it: the imaginary part vector is pointing the opposite direction. At descrambling (around line 190), the code should be: xIm[target] = -x.im;

Please let me clarify the bin spacing:

Bin spacing or bin size is Bandwidth/Window size, so for example in audio files it is Samplerate/2/Window. A signal sampled at 22kHz have a max frequency of 11kHz, and that 11k should be divided by the size of the window. In one of the previous posts it is not really clear.

What I need help with from someone is how to get a meaning of the returned numbers, as increasing the window size will result in exponentially(?) decreasing values.

Using a perfect sine wave, while changing only the window size I expected to get similar Power values at the same frequencies.

My test result of a 44100Hz 5512 Hz sine wave (period length of 8 samples):

Freq Re^2+Im^2 Re Im

FFT window: 256 Sample rate: 44100 Freq delta: 86,13

2756,250 1,190 0,028 -1,091

5426,367 0,243 0,461 0,174

5512,500 2628,820 -51,260 -1,117

5598,633 0,242 -0,461 0,174

FFT window: 512 Sample rate: 44100 Freq delta: 43,07

2756,250 0,124 0,009 -0,352

5512,500 177,371 -13,309 -0,484

FFT window: 1024 Sample rate: 44100 Freq delta: 21,53

5512,500 7,938 -2,810 -0,204

FFT window: 2048 Sample rate: 44100 Freq delta: 10,77

5512,500 0,785 -0,876 -0,128

FFT window: 4096 Sample rate: 44100 Freq delta: 5,38

5512,500 0,140 -0,358 -0,108

Can you (or someone) give me some pointers on this?

You’re sure about the sign of the imaginary part? I definitely did have a bug in the sign of the imaginary part, but I corrected that way back in 2011. I don’t do any coding in C# these days, so I can’t really check easily.

The bin spacing is SR/N, where SR is the sample rate and N is the number of points in the FFT. For example, for an FFT with N = 1024 (log2N = 10) and sample rate of 44.1 kHz, the bin spacing is ~43.1 Hz.

Yesterday I checked Exocortex DSP FFT and a naive DFT in Math.NET, and these conclusions came up. Maybe these two other implemenations have a same bug in common, but taking into account that one is a DFT, I think you should verify your results against other results you trust in. The values were perfectly matching apart form that.

It’s true although, I have only tested forward FFT. Maybe the reverse FFT’s imaginary sings are ok.

I’m sorry to say, but you must be mistaken about the bin size, at least on real input, it must be half of that. I have generated sine waves in Audacity, even counted sample bytes in the sinewaves to make it sure, and resulting frequencies were always doubled.

The sample rate is the double of the frequency of signal that could be contained within, therefore you must divide the signal frequency (which is obviously half of the samplerate) with the number of bins.

I am not an expert of complex math/FFT, but using generated audio samples this was the result.

Another explanation I can come up with that we must divide by two because of the mirrored results, the fact that bins are effectively doubled.

Please do verify it yourself, I am waiting your comments.

Yesterday I checked Exocortex DSP FFT and a naive DFT in Math.NET, and these conclusions came up. Maybe these two other implemenations have a same bug in common, but taking into account that one is a DFT, I think you should verify your results against other results you trust in. The values were perfectly matching apart form that.

It’s true although, I have only tested forward FFT. Maybe the reverse FFT’s imaginary sings are ok.

I’m sorry to say, but you must be mistaken about the bin size, at least on real input, it must be half of that. I have generated sine waves in Audacity, even counted sample bytes in the sinewaves to make it sure, and resulting frequencies were always doubled.

The sample rate is the double of the frequency of signal that could be contained within, therefore you must divide the signal frequency (which is obviously half of the samplerate) with the number of bins.

I am not an expert of complex math/FFT, but using generated audio samples this was the result.

Another explanation I can come up with that we must divide by two because of the mirrored results, the fact that bins are effectively doubled.

Please do verify it yourself, I am waiting your comments.

If you want to test the FFT itself, you should write a test program than uses a test signal that’s generated mathematically in code. Don’t rely on the values read from a file generated in Audacity; loads of things can go wrong with reading audio data from files – endianess, bit depth, stereo/mono mixups.

As for the bin spacing for an N-point DFT (or FFT), it really is SR/N, with bin N/2 corresponding to the Nyquist frequency (SR/2). For real-valued input signals, the bins above N/2 will be complex-conjugates of those below N/2. E.g. xRe[N-i] = xRe[i] and xIm[N-i] = -xIm[i].

Thanks for pointing to input data. It was not the WAV.

The window function(s) were applied incorrectly.

Now the output values makes much more sense ;-)