VidSpeak Part 1 - Extracting Frames from Video in C#!

I know, it's been too long since I blogged, but it's pretty busy at school right now :). Anyway, I'm taking a course in Multimedia this semester, and as part of that course I have to write a program that converts the frames of a video into short audio clips. I thought it might be interesting to examine how that is done in C#. So, over the course of about 3-4 posts, I'll go over the code that I wrote. I've attached the full project to this post, so you can take a look at it right now. The GUI app should work, though I can't guarantee it. All I can give it is the "Works on My Machine" seal of approval :)

Here are the steps involved:


  1. Extract the next frame from the video
  2. Scale the frame down to 64x64 pixels
  3. Make the frame a grayscale image
  4. "Quantize" the grayscale frame into 4-bit colour
  5. Convert the frame to sound
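Steps 2 to 4 get their own posts, but to make the list above a little more concrete, here is a rough sketch of what they could look like on a raw pixel array. This is not the VidSpeak code: the class and method names are hypothetical, and nearest-neighbour scaling, the Rec. 601 luminance weights (0.299, 0.587, 0.114) and "keep the top four bits" quantization are my own assumptions about reasonable choices. It also assumes tightly packed 24-bit pixels stored as blue, green, red with no row padding.

```csharp
// Hypothetical sketch of steps 2-4 on a raw 24-bit BGR pixel array.
// Not the VidSpeak implementation: nearest-neighbour scaling and the
// Rec. 601 luma weights are assumptions chosen for illustration.
public static class FrameSteps
{
    // Step 2: nearest-neighbour resize of a tightly packed BGR image
    // (stride = width * 3, no row padding) to size x size pixels.
    public static byte[] Downscale(byte[] bgr, int width, int height, int size = 64)
    {
        var result = new byte[size * size * 3];
        for (int y = 0; y < size; y++)
        {
            int srcY = y * height / size;
            for (int x = 0; x < size; x++)
            {
                int srcX = x * width / size;
                int src = (srcY * width + srcX) * 3;
                int dst = (y * size + x) * 3;
                result[dst] = bgr[src];         // blue
                result[dst + 1] = bgr[src + 1]; // green
                result[dst + 2] = bgr[src + 2]; // red
            }
        }
        return result;
    }

    // Step 3: convert each BGR triplet to one 8-bit luminance value.
    public static byte[] ToGrayscale(byte[] bgr)
    {
        var gray = new byte[bgr.Length / 3];
        for (int i = 0; i < gray.Length; i++)
        {
            int b = bgr[3 * i], g = bgr[3 * i + 1], r = bgr[3 * i + 2];
            gray[i] = (byte)((299 * r + 587 * g + 114 * b) / 1000);
        }
        return gray;
    }

    // Step 4: quantize 8-bit gray levels (0-255) to 4-bit levels (0-15)
    // by keeping only the top four bits.
    public static byte[] QuantizeTo4Bit(byte[] gray)
    {
        var levels = new byte[gray.Length];
        for (int i = 0; i < gray.Length; i++)
            levels[i] = (byte)(gray[i] >> 4);
        return levels;
    }
}
```

Chained together, Downscale, ToGrayscale and QuantizeTo4Bit turn a full frame into 64 x 64 = 4096 values in the range 0 to 15, which is the kind of input step 5 would work from.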

The Code

The code is a Visual Studio 2008 solution, written for .NET 3.5. It uses unsafe code for image processing and sound generation, so you can't run it without full trust (i.e. you can't run it from a network share).

1. Extract the next frame from the video

I used a "Pipeline" (http://en.wikipedia.org/wiki/Pipeline_(software)) architecture, so this phase is handled by a component I call a "Frame Source", which is expected to return a new frame when asked (or null to signal the end of the input). I used the DirectShow COM library "DexterLib" to do the extraction. DexterLib contains a class called MediaDet (short for Media Detector), which does most of the work. Here's the code for the function that retrieves a frame at a specified timecode (in seconds).
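The Frame Source contract (hand back a frame on request, null at the end) is what lets the stages of a pipeline be driven by one simple loop. Here is a minimal sketch of that idea; IFrameSource, IFrameStage, Pipeline and the toy classes are hypothetical names for illustration, not the types in the attached project.

```csharp
using System.Collections.Generic;

// Hypothetical pipeline interfaces; the names are illustrative and are
// not taken from the VidSpeak project.
public interface IFrameSource<TFrame> where TFrame : class
{
    // Returns the next frame, or null once the input is exhausted.
    TFrame GetNextFrame();
}

public interface IFrameStage<TIn, TOut>
{
    TOut Process(TIn input);
}

public static class Pipeline
{
    // Pulls frames until the source returns null, pushes each one
    // through a processing stage and collects the results.
    public static List<TOut> Run<TFrame, TOut>(
        IFrameSource<TFrame> source, IFrameStage<TFrame, TOut> stage)
        where TFrame : class
    {
        var results = new List<TOut>();
        TFrame frame;
        while ((frame = source.GetNextFrame()) != null)
            results.Add(stage.Process(frame));
        return results;
    }
}

// Toy source that serves a fixed list of "frames" (strings here).
public sealed class ListSource : IFrameSource<string>
{
    private readonly Queue<string> _frames;
    public ListSource(IEnumerable<string> frames) { _frames = new Queue<string>(frames); }
    public string GetNextFrame() => _frames.Count > 0 ? _frames.Dequeue() : null;
}

// Toy stage that "processes" a frame by measuring its length.
public sealed class LengthStage : IFrameStage<string, int>
{
    public int Process(string input) => input.Length;
}
```

In a program like this one, the stages would be the scaling, grayscale, quantization and sound steps. Keeping the source behind a small interface means it could be swapped (say, for a source that reads still images while testing) without touching the loop.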

FYI: "_detector" is an instance of DexterLib.MediaDetClass, held through its IMediaDet interface; "_streamLength" is the length of the video stream in seconds; "_bufferHandle" is an IntPtr referring to an unmanaged buffer (allocated with Marshal.AllocHGlobal) that holds the bitmap; and "_bufferSize" and "_frameSize" are the size of that buffer and the size of each video frame, respectively.
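For reference, here is one way the buffer size could be worked out. It assumes, as the Marshal.SizeOf(typeof(BITMAPINFOHEADER)) offset in GetFrameAtTime suggests, that the buffer holds a 40-byte BITMAPINFOHEADER followed by the 24-bit pixel rows, with each row padded to a multiple of 4 bytes as in any DIB. DibBuffer and its methods are hypothetical helpers, not part of the project.

```csharp
// Sketch of sizing the unmanaged frame buffer, assuming it holds a
// 40-byte BITMAPINFOHEADER followed by DIB rows that are each padded
// to a multiple of 4 bytes. The helper names are hypothetical.
public static class DibBuffer
{
    public const int BitmapInfoHeaderSize = 40; // sizeof(BITMAPINFOHEADER)

    // Bytes per row of a DIB: rows are DWORD (4-byte) aligned.
    public static int Stride(int width, int bitsPerPixel)
    {
        return ((width * bitsPerPixel + 31) / 32) * 4;
    }

    // Total bytes needed for the header plus all pixel rows.
    public static int BufferSize(int width, int height, int bitsPerPixel = 24)
    {
        return BitmapInfoHeaderSize + Stride(width, bitsPerPixel) * height;
    }
}
```

The result could then be passed to Marshal.AllocHGlobal to allocate the buffer. For a 320x240 frame, that is 40 + 960 x 240 = 230,440 bytes; a 321-pixel-wide frame would need 964 bytes per row rather than 963.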

// WARNING: The returned Bitmap wraps the shared unmanaged buffer instead of copying it,
// so each call overwrites the pixels of the Bitmap returned by the previous call.
public Bitmap GetFrameAtTime(double timeCode) {
    // Get the bitmap at this time
    Bitmap frame = null;
    unsafe {
        byte* bufferPointer = (byte*)_bufferHandle;
        // Fills the buffer with a BITMAPINFOHEADER followed by the frame's pixel data
        _detector.GetBitmapBits(timeCode,
                                ref _bufferSize,
                                ref *bufferPointer,
                                _frameSize.Width,
                                _frameSize.Height);
        // 24-bit DIB rows are padded to a multiple of 4 bytes (GDI+ also requires this for the stride)
        int stride = (_frameSize.Width * 3 + 3) & ~3;
        frame = new Bitmap(_frameSize.Width,                 // Width
                           _frameSize.Height,                // Height
                           stride,                           // Stride
                           PixelFormat.Format24bppRgb,       // Pixel Format
                           new IntPtr(bufferPointer +
                               Marshal.SizeOf(typeof(BITMAPINFOHEADER)))); // Start of pixel data
    }
    return frame;
}
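To see what the pointer arithmetic in GetFrameAtTime is doing, here is a managed sketch that reads a single pixel out of a copy of that buffer. It assumes the standard 24-bit DIB layout: the 40-byte header, then pixels stored as blue, green, red, rows padded to 4 bytes, and rows stored bottom-up. That last point would explain why the frame comes out upside-down when the Bitmap constructor treats the start of the pixel data as the top row. DibPixels.GetPixel is a hypothetical helper.

```csharp
// Sketch of reading pixel (x, y) from the raw buffer filled by
// GetBitmapBits, assuming a 40-byte BITMAPINFOHEADER, 24-bit BGR pixels,
// 4-byte-aligned rows and bottom-up row order (the standard DIB layout).
// Names are hypothetical; this is not VidSpeak code.
public static class DibPixels
{
    const int HeaderSize = 40;

    public static (byte R, byte G, byte B) GetPixel(
        byte[] buffer, int width, int height, int x, int y)
    {
        int stride = (width * 3 + 3) & ~3;  // padded row length in bytes
        int row = height - 1 - y;           // bottom-up: the top image row is stored last
        int offset = HeaderSize + row * stride + x * 3;
        // Each pixel is stored as blue, green, red.
        return (buffer[offset + 2], buffer[offset + 1], buffer[offset]);
    }
}
```

For a 2x2 frame, the stride is 8 bytes (6 bytes of pixels plus 2 of padding), and the top-left pixel starts at offset 40 + 8 = 48, in the last row of the buffer.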

(Note: If you look at the actual code, you will notice I snipped some code from the beginning of this function to display it on the blog. The missing code just handles an (experimental) feature I added that lets me start at any location in the video, rather than always starting at the beginning.)

After loading the frame, I have to flip it, because Dexter loads the frame upside-down. Fortunately, the System.Drawing.Image class provides a RotateFlip method to do just that! I also rotate it 90 degrees clockwise, so that each row of the transformed image holds one column of the original frame. This makes step 5 easier: since Bitmaps are stored in "row-major" order (http://en.wikipedia.org/wiki/Row-major_order), each column of the original frame ends up in one contiguous run of memory.
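Here is a sketch of the two transforms on a flat, row-major pixel array, to show why the rotation helps. Applied to the upside-down frame as it comes out of the buffer, "flip vertically, then rotate 90 degrees clockwise" turns out to be a transpose: row x of the result is column x of that frame. FrameTransforms and its methods are hypothetical stand-ins for System.Drawing; if I'm reading the RotateFlipType enum right, the same combination should be a single RotateFlip(RotateFlipType.Rotate90FlipX) call (a 90-degree clockwise rotation followed by a horizontal flip).

```csharp
using System;

// Sketch (not VidSpeak code) of the flip and rotation on a flat array
// of pixels stored in row-major order.
public static class FrameTransforms
{
    // Mirror a w x h image top-to-bottom by copying whole rows.
    public static T[] FlipVertical<T>(T[] pixels, int w, int h)
    {
        var result = new T[pixels.Length];
        for (int y = 0; y < h; y++)
            Array.Copy(pixels, y * w, result, (h - 1 - y) * w, w);
        return result;
    }

    // Rotate a w x h image 90 degrees clockwise into an h x w image:
    // source pixel (x, y) lands at column (h - 1 - y), row x.
    public static T[] Rotate90Clockwise<T>(T[] pixels, int w, int h)
    {
        var result = new T[pixels.Length];
        int newWidth = h;
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                result[x * newWidth + (h - 1 - y)] = pixels[y * w + x];
        return result;
    }
}
```

For the 3x2 array {0, 1, 2, 3, 4, 5}, FlipVertical followed by Rotate90Clockwise gives {0, 3, 1, 4, 2, 5}: each row of the result is one column of the input, now sitting contiguously in memory.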

To use the Frame Source, all my program has to do is call the GetFrameAtTime method, passing in a timecode (in seconds). This is handled in the FrameProcessor by the GetNextFrame method:

_source.GetFrameAtTime((DateTime.Now - _startTime).TotalSeconds)

Rather than going frame-by-frame, I'm extracting the next frame by time.  So, if it takes 4 seconds to process a frame, the next frame I take is approximately 4 seconds after the frame I just processed.
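Here is a small sketch of that sampling policy with a simulated clock, so the skipping is easy to see. TimeBasedSampler and its parameters are hypothetical. In the real program the clock would be the (DateTime.Now - _startTime).TotalSeconds expression above, and the end of the video would be signalled by the source returning null, which the streamLength check stands in for here.

```csharp
using System;
using System.Collections.Generic;

// Sketch of time-based frame selection: the next timecode is the elapsed
// wall-clock time since processing started, so frames that go by while
// the previous frame is being processed are skipped. The clock is passed
// in as a delegate (a hypothetical design choice) so it can be simulated.
public static class TimeBasedSampler
{
    public static List<double> SampleTimecodes(
        Func<double> elapsedSeconds, Action processFrame, double streamLength)
    {
        var timecodes = new List<double>();
        while (true)
        {
            double t = elapsedSeconds();
            if (t >= streamLength) break; // past the end of the video
            timecodes.Add(t);
            processFrame();               // time spent here decides the next sample
        }
        return timecodes;
    }
}
```

With a clock that advances 4 seconds per processed frame and a 10-second stream, the sampled timecodes are 0, 4 and 8; everything in between is skipped. Passing the clock in as a delegate also makes it easy to replace DateTime.Now with a System.Diagnostics.Stopwatch, which is not affected by changes to the system clock.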

Here's the code: VidSpeak.zip (267.21 KB)