Win32 Multimedia subsystem
The Win32 multimedia subsystem is fairly good at what it was designed for - powering applications such as "CD Player" and "Sound Recorder" - but when I tried to use it for sound capture and CD playing in a game-type environment, it became obvious fairly early on that a lot of work would be required to get things up to speed.
Real Time Sound Capture
I wanted to be able to retrieve a constant stream of sampled audio from the computer's sound card as it was being recorded "live", so that CD music, microphone input etc. could all be captured and processed in real time by my program. I aimed for a latency of about 50 to 100ms between the user hearing a sound and the sampled sound showing up in my data.
I'd seen this done in a few other applications but had noticed that many were not able to update the sound stream often [their outputs were jerky, at about 4 updates per second]. As I began to code, I noted the problems that caused this behaviour. Actually, it probably wouldn't have mattered if I hadn't smoothed everything out for my program, since the user probably wouldn't notice anyway, but I can't resist a new problem :)
There is a set of functions in the Win32 multimedia API that deals only with recording sound [there are other sets for playing sounds and various other things]. These functions are separate from MCI, which does not allow real-time access to recorded data.
waveInGetNumDevs, waveInGetDevCaps, waveInOpen, waveInPrepareHeader, waveInAddBuffer, waveInUnprepareHeader, waveInReset, waveInClose, waveInStart, waveInStop
[Details in the MM help files]
I'm not going to go into a lot of detail about what I did but I'll describe the problems I found. [Otherwise, where would all the fun be for you when coding this stuff :)]
First of all you have to open a wavein device. It's best to enumerate all the available ones [usually there is only one] and let the user select one. You have to pick a sample rate and sample format here too; use the default PCM format [it's just plain uncompressed sample data]. I try to open the device at several sample rates, channel counts etc., starting at 64kHz, Stereo, 16bit [which I think is the highest any sound card will go] and then clocking down through to 11kHz, Mono, 8bit. Running through them like this avoids problems with cards that, when running in full duplex mode [playing and recording waves at the same time], won't allow different frequencies for playing and recording. [e.g. if I run Modplug with my program, I can only open a 44.1kHz wavein device; without Modplug I can open a 48kHz wavein device].
The way that your application receives sound is simple - you post data buffers to the wavein system, it fills them and calls a callback when it's done - and you can post several buffers to get continuous coverage. You are not meant to post new buffers in the callback but so far I haven't found problems with doing so. I use two data buffers that are continually cycled round, each about 1/2 second long. You need to 'prepare' the buffers before sending them [using waveInPrepareHeader] but you don't [I think] have to prepare them again before the next time you send them.
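The "start high and clock down" search can be sketched roughly as below. This is only an illustration under assumptions: try_open is a hypothetical callback standing in for a real waveInOpen attempt, the intermediate sample rates and the loop order are guesses, and accept_up_to_rate is a made-up stand-in for a card that caps the rate [like the Modplug case above].

```c
#include <assert.h>
#include <stddef.h>

/* One candidate capture format. */
typedef struct {
    unsigned long sample_rate; /* samples per second */
    unsigned channels;         /* 2 = stereo, 1 = mono */
    unsigned bits;             /* bits per sample */
} CaptureFormat;

/* Hypothetical callback: returns nonzero if the device accepts the format.
   In a real program this would wrap an attempt to open the wavein device. */
typedef int (*TryOpenFn)(const CaptureFormat *fmt, void *user);

/* Try formats from best to worst and keep the first one that opens.
   The rate list between 64kHz and 11kHz is an assumption for illustration.
   Returns 1 and fills *out on success, 0 if nothing could be opened. */
int pick_capture_format(TryOpenFn try_open, void *user, CaptureFormat *out)
{
    static const unsigned long rates[] = { 64000, 48000, 44100, 22050, 11025 };
    static const unsigned channels[] = { 2, 1 };
    static const unsigned bits[] = { 16, 8 };
    size_t r, c, b;

    assert(try_open != NULL && out != NULL);
    for (r = 0; r < sizeof rates / sizeof rates[0]; r++) {
        for (c = 0; c < sizeof channels / sizeof channels[0]; c++) {
            for (b = 0; b < sizeof bits / sizeof bits[0]; b++) {
                CaptureFormat fmt;
                fmt.sample_rate = rates[r];
                fmt.channels = channels[c];
                fmt.bits = bits[b];
                if (try_open(&fmt, user)) {
                    *out = fmt;
                    return 1;
                }
            }
        }
    }
    return 0;
}

/* Stand-in device for experiments: accepts any format whose sample rate
   does not exceed the limit pointed to by user. */
int accept_up_to_rate(const CaptureFormat *fmt, void *user)
{
    return fmt->sample_rate <= *(const unsigned long *)user;
}
```

With a limit of 44100 the stand-in device makes the search settle on 44.1kHz, Stereo, 16bit, which mirrors what happens when another program is already holding the card at that rate.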
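As a small worked example of sizing those buffers, the helper below works out how many bytes a buffer of a given duration needs for a PCM format. The function name and its parameters are illustrative assumptions, not part of the wavein API.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to hold `millis` milliseconds of uncompressed PCM audio.
   The result is always a whole number of sample frames, so a buffer never
   ends halfway through a stereo or 16bit sample. */
size_t capture_buffer_bytes(unsigned long sample_rate, unsigned channels,
                            unsigned bits, unsigned long millis)
{
    size_t frame_bytes;
    size_t frames;

    assert(channels > 0 && bits > 0 && bits % 8 == 0);
    frame_bytes = (size_t)channels * (bits / 8);
    frames = (size_t)(((unsigned long long)sample_rate * millis) / 1000);
    return frames * frame_bytes;
}
```

For a 1/2 second buffer at 44.1kHz, Stereo, 16bit this gives 22050 frames of 4 bytes each, i.e. 88200 bytes per buffer.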
When you want to activate recording, call waveInStart.
So far, easy - now comes the annoying part. The problems are caused by the fact that I need to see recorded data in real time.
There is a wavein function [waveInGetPosition] that I thought at first would solve my problems. It doesn't. It only tells you how long the wavein channel has been recording, not how much data has been recorded. This wouldn't be a problem if:
a) You could make sure that data was always being fed into your buffers for the whole time between waveInStart and waveInStop. If it isn't, waveInGetPosition keeps counting... [so you lose track]
b) waveInGetPosition wasn't on some sort of weird 30Hz timer that made its updates very erratic.
I said above that I was using 1/2 second buffers - the next thing I tried was using 16.6ms buffers [approx 1 frame at 60fps], expecting that approximately every frame I would receive a filled data buffer. Nope. The driver latency caused by this approach was HUGE [about 20ms a time].
I realised why the other recording applications I saw had such slow sound updates - they were using 1/4 [ish] second buffers and only processing when the buffer was full. This was unacceptable to me. So I came up with a better method of retrieving the data.
After a little checking, I found that my sound driver filled up the data buffer I posted with data packets at about 20Hz [that varied a bit depending on the sound frequency/channels/bits]. So I filled the data buffers with a signature string [8 bytes]. Every frame, I scanned the buffers [from the previous search end] to see if I could find the signature. If it was found in the same place as last frame then I knew that no data had been received; if it had moved then I captured all the data up until the next signature match. Each time I received a filled data buffer, I filled it with the signature again before sending it back to the sound driver.
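A minimal sketch of the signature idea follows. It is not the actual code from the program described here: the signature bytes, the function names and the byte-by-byte scan are all assumptions made for illustration, and it leaves out the sliding window and the Win32 calls entirely.

```c
#include <assert.h>
#include <stddef.h>

#define SIG_LEN 8

/* Arbitrary 8 byte signature (an assumption). It avoids 0x00 and 0x80 so
   that digital silence in 16bit or 8bit PCM is not mistaken for it. */
static const unsigned char k_sig[SIG_LEN] = {
    0xDE, 0xAD, 0xBE, 0xEF, 0x13, 0x37, 0xC0, 0xDE
};

/* Fill a buffer with the repeating signature before posting it. */
void sig_fill(unsigned char *buf, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        buf[i] = k_sig[i % SIG_LEN];
}

/* Return the offset of the first byte at or after `from` where the
   signature pattern is still intact, i.e. where the driver has not
   written yet. Returns `len` if the rest of the buffer has been
   overwritten. Real audio could match the pattern by chance; that is
   very unlikely for a full 8 bytes but more likely in the last few
   bytes of the buffer, where fewer bytes are left to compare. */
size_t sig_find_frontier(const unsigned char *buf, size_t len, size_t from)
{
    size_t p, i, n;

    assert(buf != NULL && from <= len);
    for (p = from; p < len; p++) {
        n = (len - p < SIG_LEN) ? (len - p) : SIG_LEN;
        for (i = 0; i < n; i++) {
            if (buf[p + i] != k_sig[(p + i) % SIG_LEN])
                break;
        }
        if (i == n)
            return p;
    }
    return len;
}
```

Each frame you would call sig_find_frontier with the offset returned last time. If the result is unchanged, nothing new has arrived; otherwise the bytes between the old offset and the new one are fresh sample data. When the driver hands a full buffer back, sig_fill resets it before it is posted again.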
I'm not sure if this approach will work on all sound cards/drivers but it works very well on the few I've tried it on. It is a little more complex than described above [To smooth things out more you need to have a sliding window on the data you've received, the position of which is computed by predicting when the next data packet will arrive] but if you're interested in adding sound reactivity to a program, it's worth the effort.
I'm releasing an alpha test of my wave and base 3D code sometime this week or next. I'll fix any reported bugs and, if there is any interest, I'll release the source code for the sound recording section of the application. [All will be available on my web site]
CD audio routines
These too are a complete bastard, especially trying to detect when CDs have been inserted or removed [the driver takes 10ms any time you ask it to do/check anything]. The most annoying problem I came across was when I tried to implement a track timer [it tells you how long the current track has been playing]. Since calling the driver each frame was so slow, and calling it less often [e.g. once per second] would have caused aliasing, I had to bypass the CD functions [I used MCI this time] and time it myself. Not exactly hard to do but annoying.
[The key to making the Win32 multimedia functions fast is not to use them.]
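Timing the track yourself can be as simple as remembering a millisecond tick value when the track starts and subtracting it each frame. The sketch below is only one way to do it, with made-up names: the caller is assumed to pass in the current tick count from whatever millisecond timer it already uses, and offset_ms is an assumed extra for starting partway through a track.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local track timer: no calls to the CD driver after the track starts. */
typedef struct {
    uint32_t start_ms; /* tick value corresponding to position 0 of the track */
    int running;       /* nonzero while a track is playing */
} TrackTimer;

/* Call once when playback of a track begins. offset_ms is how far into
   the track playback starts (0 for the beginning). */
void track_timer_start(TrackTimer *t, uint32_t now_ms, uint32_t offset_ms)
{
    assert(t != NULL);
    t->start_ms = now_ms - offset_ms;
    t->running = 1;
}

/* Call when playback stops or the disc is removed. */
void track_timer_stop(TrackTimer *t)
{
    assert(t != NULL);
    t->running = 0;
}

/* Elapsed track time in milliseconds. Unsigned 32 bit subtraction keeps
   the answer right even if the tick counter wraps around. */
uint32_t track_timer_elapsed_ms(const TrackTimer *t, uint32_t now_ms)
{
    assert(t != NULL);
    return t->running ? (uint32_t)(now_ms - t->start_ms) : 0;
}
```

Because the elapsed time is computed from a cheap tick value, it can be read every frame without touching the slow driver calls at all.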