
VidSpeak Part 3 – Generating and Playing Audio

This is part 3 of my VidSpeak series, where I show off an app I wrote for my multimedia course.  Check out part 2.

5. Generate the Sound

Alright, so this is the complex part :).  For the assignment, we were instructed to convert each column of the quantized grayscale 64x64 frame into a chord of sound.  We do this by assigning a frequency to each row of the image (the middle row being 440Hz, or "A440", the "A" above middle "C" on a piano).  The "brightness" of the pixel in each row is then the intensity of that row's frequency in the chord.  So if a pixel is black, its frequency is not part of the chord, and if it is white, its frequency is part of the chord at 100% intensity.
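To make the row-to-frequency mapping concrete, here's a hypothetical sketch of building such a table, assuming one semitone per row with the middle row pinned to A440.  Note that the semitone spacing and the name frequencyTable are illustrative assumptions, not necessarily the spacing VidSpeak actually uses:

```csharp
using System;

class FrequencyTableSketch {
    static void Main() {
        // Hypothetical mapping: one semitone per row, middle row (32) = 440 Hz.
        // The real VidSpeak spacing may differ; this is only an illustration.
        const int Rows = 64;
        double[] frequencyTable = new double[Rows];
        for (int row = 0; row < Rows; row++) {
            int semitonesFromMiddle = row - Rows / 2;  // middle row maps to 0
            frequencyTable[row] = 440.0 * Math.Pow(2.0, semitonesFromMiddle / 12.0);
        }
        Console.WriteLine(frequencyTable[32]);  // 440 (A440)
        Console.WriteLine(frequencyTable[44]);  // 880, one octave up
    }
}
```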

Sound is generated by summing sine waves, a process properly called additive synthesis (the assignment billed it as Frequency Modulation, or "FREAK WHEN SEE MOD YOU LATE SHUN" as I like to call it (inside joke for MS interns who did The Microsoft Intern Game :P), but strictly speaking FM means varying one wave's frequency with another).  Essentially, we create a sine wave for each frequency we want to play, scale each one by its intensity value, and then add them all together to produce the waveform for the whole chord.  This is done with the following code:

    uint* data = (uint*) Marshal.AllocHGlobal(SamplesPerColumn * Marshal.SizeOf(typeof(uint)));
    for (int sample = 0; sample < SamplesPerColumn; sample++) {
        double signal = 0.0;
        // Add one sine wave per row frequency, scaled by that pixel's intensity
        for (int col = 0; col < width; col++) {
            PixelData* cell = row0 + col;
            double intensity = cell->Red / (double) (_quantizationLevels - 1);
            signal += intensity * Math.Sin(2.0 * Math.PI * FrequencyTable[col] * TimeTable[sample]);
        }
        // After summing the whole chord, map the signal from [-1, 1]
        // onto the unsigned sample range, with (uint.MaxValue / 2) as the zero point
        signal = (uint.MaxValue / 2) * (signal + 1);
        *(data + sample) = (uint) signal;
    }

Because sound is represented digitally in computers, we can't send the sine wave itself; we have to take discrete samples of it to send to the sound card.  In this case, I'm taking 500 samples for each column (SamplesPerColumn = 500).  Elsewhere in my code I told the sound card that I would be generating sound at 8000Hz (8000 samples per second), so 500 samples produce 500 / 8000 = 0.0625 seconds of audio per column.
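The TimeTable used in the loop above is just each sample's time offset in seconds at that 8000Hz rate (the sine formula needs sin(2*pi*f*t), so t = sample / sampleRate).  Something like this minimal sketch, with names matching the code above but the construction itself reconstructed rather than copied:

```csharp
using System;

class TimeTableSketch {
    static void Main() {
        const int SampleRate = 8000;       // samples per second, as configured on the sound card
        const int SamplesPerColumn = 500;  // 500 / 8000 = 0.0625 s of audio per column
        double[] timeTable = new double[SamplesPerColumn];
        for (int sample = 0; sample < SamplesPerColumn; sample++) {
            timeTable[sample] = sample / (double) SampleRate;  // time offset in seconds
        }
        Console.WriteLine(timeTable[0]);    // 0
        Console.WriteLine(timeTable[400]);  // 400 / 8000 = 0.05
    }
}
```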

For each sample I'm going to generate, I iterate across the columns (remember, I rotated the image in step 1, so each row of the image now represents a column of the original frame).  For each frequency I generate a sample of its sine wave and multiply it by the pixel's value, mapped to a float where 15 is 1.0 and 0 is 0.0 (that's the line that starts with "signal +=").  Once I've summed up the contributions for the whole chord, I convert the float into an unsigned integer, where (uint.MaxValue / 2) represents 0, and store it in the (unmanaged) buffer I'm building.

From there, I just send that unmanaged buffer 'data' to the sound card using a helper class I created to wrap the waveOut API:

SoundDevice.WriteAudioBlock(new IntPtr(data), (uint)SamplesPerColumn);

The helper class basically wraps a low-level Win32 API called "waveOut".  It's not as powerful as using DirectX or some other high-level API, but it gets the job done :).  The code for this helper is in the SoundDevice class in my code. Remember, all the code is posted in my Part 1 post.
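For the curious, wrapping waveOut from C# boils down to a handful of P/Invoke declarations against winmm.dll.  Here's a minimal sketch of the kind of declarations involved; the WAVEHDR layout and function names are the real Win32 ones, but the class organization is illustrative and not necessarily how my SoundDevice class is laid out:

```csharp
using System;
using System.Runtime.InteropServices;

// Win32 WAVEHDR structure: describes one buffer of audio samples for waveOut.
[StructLayout(LayoutKind.Sequential)]
struct WAVEHDR {
    public IntPtr lpData;        // pointer to the sample buffer
    public uint dwBufferLength;  // buffer length in bytes
    public uint dwBytesRecorded;
    public IntPtr dwUser;
    public uint dwFlags;
    public uint dwLoops;
    public IntPtr lpNext;
    public IntPtr reserved;
}

static class WaveOutSketch {
    // A header must be "prepared" before it can be written to the device.
    [DllImport("winmm.dll")]
    public static extern int waveOutPrepareHeader(IntPtr hWaveOut, ref WAVEHDR hdr, int size);

    // Queues the buffer described by the header for playback.
    [DllImport("winmm.dll")]
    public static extern int waveOutWrite(IntPtr hWaveOut, ref WAVEHDR hdr, int size);

    static void Main() {
        // Windows-only at call time, but the declarations marshal anywhere:
        Console.WriteLine(Marshal.SizeOf(typeof(WAVEHDR)));  // nonzero struct size
    }
}
```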

That's it for VidSpeak.  I'm going to be posting more code from my other school projects soon, but I have to finish them first :P.
