Abstract
Microsoft Windows Vista includes several audio rendering digital signal processing effects that are installed along with the in-box class drivers. These effects are packaged as user-mode System Effect Audio Processing Objects (sAPOs). The effects include per-stream pre-mix render sAPOs (local effects [LFX]) and one post-mix render sAPO (global effects [GFX]).
This white paper describes the render LFX and GFX sAPOs that are included with Windows Vista. It also describes two strategies that hardware manufacturers can implement to reuse the inbox sAPOs.
This information applies to the Windows Vista operating system.
Future versions of this preview information will be provided in the Microsoft Windows Driver Kit (WDK).
The current version of this paper is maintained on the Web at:
http://www.microsoft.com/whdc/device/audio/vista_sysfx.mspx
References and resources discussed here are listed at the end of this paper.
Contents
Introduction
New Audio Features for Windows Vista
  Loudness Equalization DSP (LFX)
  Bass Management (LFX)
  Low Frequency Protection (LFX)
  Speaker Fill (LFX)
  Room Correction (GFX)
  Virtual Surround (LFX)
  Speaker Phantoming (LFX)
  Enhanced Sound for Laptop Computers
  Audio System Effects User Interface
Reuse of Microsoft sAPOs by Third Parties
  How to Install HD Audio and USB Audio Drivers
  How to Combine Custom and Windows Vista sAPOs
Detailed Guidelines for Strategy A
  General Programming Issues
  Features and Their Modes
  Supported IPropertyStore Properties
  Mutual Exclusion and Feature Interactions
  Sample Code for Strategy A
Detailed Guidelines for Strategy B
  Programming Information
  Initialization
  Query Windows Vista sAPO's Feature State
  Format Negotiation
  LockForProcess/UnlockForProcess
  GetLatency
  APOProcess
  Handling Windows Vista sAPO errors
  Compilation and Linking
General Guidelines for Custom Audio System Effects
Resources
Appendix. Run-Time Considerations When Reusing Windows Vista sAPOs
  Handling the Limitations of Different Input-Output Format Combination
  Interaction between Speaker Fill and Bass Management
  Interaction between Folddown and Bass Management
Disclaimer
This is a preliminary document and may be changed substantially prior to final commercial release of the software described herein.
The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.
This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.
Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.
Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, email address, logo, person, place or event is intended or should be inferred.
© 2006 Microsoft Corporation. All rights reserved.
Microsoft, Windows, and Windows Vista are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.
The names of actual companies and products mentioned herein may be the trademarks of their respective owners.
Introduction
Today's consumers use their PCs to access and enjoy a wide variety of entertainment. It is not uncommon to have music, television, and movie playback integrated into one computer. With Microsoft Windows Media Center Edition, computers are increasingly found in the living room and are used as the entertainment hub for the entire household. New audio features enable users to store their media in one place (the computer) instead of spreading it across several sources such as a CD or DVD player, an audio/video receiver (AVR), a TV, and so on. Centralizing media sources provides a much more engrossing playback experience for both the casual and the avid listener.
Microsoft Windows Vista introduces new advanced audio and communication functionality that enhances the high-fidelity music and movie audio experience and provides great hands-free voice support. It supports the kind of entertainment audio functionality and performance that is usually found only in expensive, feature-laden AVRs, including previously exclusive features such as room correction and bass management. For communications audio, Windows Vista supports echo cancellation and microphone array voice acquisition. This new inbox audio functionality for both entertainment and communications services is well ahead of the competition.
The new audio features can be broadly classified into three categories:
• Enhanced audio playback for music, television, and movies
• Surround headphones and bass boost for laptop computers
• Advanced communication support
This white paper provides an overview of the audio system effects in Windows Vista. It includes descriptions of the underlying digital signal processing (DSP) algorithms and the user interface (UI) for accessing and choosing the available audio system effects.
Windows Vista can maintain a more uniform perceived loudness across different digital audio files or sources than earlier technologies. This means that loudness always stays within a specified range, even for different digital signals, when users:
• Switch between a broadcast NTSC or ATSC TV program and a locally stored Windows Media Audio (WMA) or MP3 file.
• Switch between different formats in a playlist (such as WMA, MP3, or WAV files) that were authored at different volume levels.
Loudness equalization is ideal, for example, for watching a movie at night. It makes the quieter parts of the movie easier to hear while limiting the maximum loudness to a level that is considerate of others. Loudness equalization also improves the listening experience in noisy playback settings by making the quiet parts of the content loud enough to be audible without creating disturbingly loud peaks.
Loudness and intensity are different ways of quantifying audio levels. Loudness, in its technical sense, refers to the listener's perception of an audio signal's volume. Intensity (also called volume or level) is the externally measured power of an audio signal.
Two signals of the same intensity with different time structure or frequency content can have substantially different loudness levels. This leads to the common experience where some content sounds much louder than other content of the same intensity simply because of differences in the source material and the way in which the content was recorded. Furthermore, different content standards (such as digital versus analog TV) can have different intensity levels for the same content. As a result, the perceived level of audio content can vary widely, from nearly inaudible in a moderately quiet listening environment to uncomfortably loud.
Loudness equalization simulates human hearing to accurately measure the loudness (as opposed to the intensity) of an audio source. It then uses dynamic gain adjustment to keep the loudness of different sources more nearly constant. Loudness equalization can thus affect both dynamic range and peak loudness.
Windows Vista uses single-pass loudness equalization, which calculates loudness on a block-by-block basis. A block corresponds to the critical band resolution of the human ear. Single-pass loudness equalization adjusts the gain with a fast attack and slow decay, just as many wideband compressors do, to tightly control the peak loudness of a signal while maintaining the local dynamics. Fast attack means that relatively loud signals have their gain rapidly reduced to control the loudest signal that is presented to the listener. Slow decay means that, when an audio signal reaches a peak but does not sustain that level, the gain following the peak is slowly increased.
Single-pass loudness equalization partially equalizes long-term level changes but preserves the signal's short-term dynamics. The equalization is deliberately incomplete, so some sense of louder versus softer is preserved across different material.
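The fast-attack, slow-decay gain behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Windows Vista algorithm: the per-block level is plain RMS rather than a model of human hearing, and the parameter names and values (targetLevel, attackCoeff, decayCoeff, maxGain) are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical tuning values: the paper does not publish the sAPO's constants.
struct LoudnessEqParams {
    float targetLevel;  // desired per-block level (linear RMS)
    float attackCoeff;  // fraction of the gap closed per block when gain falls (fast)
    float decayCoeff;   // fraction of the gap closed per block when gain rises (slow)
    float maxGain;      // upper bound so near-silent blocks are not boosted without limit
};

// Per-block level estimate. Plain RMS is a simplification: the Windows Vista
// sAPO models human hearing (critical bands) rather than measuring intensity.
float BlockLevel(const std::vector<float>& block) {
    double sum = 0.0;
    for (float s : block) sum += static_cast<double>(s) * s;
    return block.empty() ? 0.0f : static_cast<float>(std::sqrt(sum / block.size()));
}

class SinglePassLoudnessEq {
public:
    explicit SinglePassLoudnessEq(const LoudnessEqParams& p) : p_(p), gain_(1.0f) {}

    // Scales the block in place and returns the gain that was applied.
    float ProcessBlock(std::vector<float>& block) {
        const float level = BlockLevel(block);
        float desired = level > 0.0f ? p_.targetLevel / level : p_.maxGain;
        if (desired > p_.maxGain) desired = p_.maxGain;
        // Fast attack when the gain must drop; slow decay when it is allowed to rise.
        const float coeff = desired < gain_ ? p_.attackCoeff : p_.decayCoeff;
        gain_ += coeff * (desired - gain_);
        for (float& s : block) s *= gain_;
        return gain_;
    }

private:
    LoudnessEqParams p_;
    float gain_;
};
```

With settings such as target 0.1, attack 0.9, and decay 0.05, a full-scale block pulls the gain from 1.0 to about 0.19 in one block, while a following quiet block raises it only to about 0.23. That asymmetry is what controls peaks while leaving short-term dynamics largely intact.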
A loudness equalization service is very rare in AVRs and is not generally included in competing products. Figure 1 illustrates the Enhancements tab of the Windows Vista Control Panel Speakers application, showing the UI for enabling and configuring loudness equalization.
[Figure: bass management diagram. Labels: LPF, HPF, and BASS blocks for each channel (including SL and SR); LFE Subwoofer.]
Figure 3 shows a typical example of reverse bass management. The subwoofer's signal and the low-frequency portion of the signal that is intended for the small loudspeakers are redirected to the two large loudspeakers.
[Figure 3. Reverse bass management. Front LR speakers large (full range); satellite speakers small; no subwoofer; LFE and bass from the satellite speakers channeled to front LR. Diagram labels: L, R, SL, SR, LFE, and per-channel LPF, HPF, and BASS blocks.]
With either type of system, a user can use the Bass Management Settings dialog box during setup to specify the loudspeaker configuration and the cutoff frequency of the limited-bandwidth loudspeakers. The system can then take the appropriate actions. Figure 4 shows the Bass Management Settings dialog box.
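The LPF/HPF split at the cutoff (crossover) frequency that both bass management diagrams rely on can be sketched as a complementary crossover. This is an illustrative assumption, not the sAPO's filter design, which this paper does not specify: a one-pole low-pass produces the low band, and the high band is the input minus the low band, so the two bands always sum back to the original signal. The class and parameter names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

constexpr double kPi = 3.14159265358979323846;

// Splits one channel at the crossover frequency into a low band (routed to the
// subwoofer or a large speaker) and a high band (kept on the small speaker).
// A one-pole low-pass is used for brevity; the high band is the input minus the
// low band, so the two bands always sum back to the input (a complementary pair).
class ComplementaryCrossover {
public:
    ComplementaryCrossover(double crossoverHz, double sampleRate)
        : a_(1.0 - std::exp(-2.0 * kPi * crossoverHz / sampleRate)), low_(0.0) {
        assert(crossoverHz > 0.0 && crossoverHz < sampleRate / 2.0);
    }

    // Returns the low band for sample x and writes the high band to `high`.
    double Process(double x, double& high) {
        low_ += a_ * (x - low_);
        high = x - low_;
        return low_;
    }

private:
    double a_;    // smoothing coefficient derived from the crossover frequency
    double low_;  // low-pass filter state
};
```

A one-pole filter rolls off only gently, so a production crossover would use steeper filters; the sketch only shows how a single crossover frequency divides each channel into the part that is redirected and the part that stays.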
Speaker fill can be turned on and off with the Control Panel Audio application's Speakers (High Definition Audio Device) dialog box, as shown in Figure 5.
procedure automatically attempts to flatten the frequency response of each channel to compensate for relative differences between the channels, as well as any deficiencies in each channel's frequency response. After these measurements have been made, they are stored as a profile that the room correction DSP uses to correct the delay, overall gain, and frequency balance between loudspeaker locations. Compared to the uncorrected system, room correction gives the listening area a good stereo and multichannel soundstage with improved timbre, envelopment, and front and back sensation. Figure 6 shows the completion page of the Room Calibration Wizard.
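The delay and overall-gain part of that correction can be illustrated with a simple alignment rule. This is a hypothetical sketch, not the room correction DSP's actual method: it assumes each loudspeaker's measured arrival time (in samples) and linear level are already known, delays every channel to match the latest arrival, and attenuates every channel to match the quietest one so that no channel is boosted. Frequency balance correction is not covered, and all type and function names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical measurement for one loudspeaker (not a Windows type).
struct ChannelMeasurement {
    int arrivalSamples;  // measured time of arrival of the test signal
    double level;        // measured linear level at the listening position (> 0)
};

struct ChannelCorrection {
    int delaySamples;    // extra delay so every channel arrives together
    double gain;         // trim so every channel arrives at the same level
};

std::vector<ChannelCorrection> ComputeDelayAndGain(const std::vector<ChannelMeasurement>& m) {
    std::vector<ChannelCorrection> out;
    if (m.empty()) return out;
    int latest = m[0].arrivalSamples;
    double quietest = m[0].level;
    for (const auto& c : m) {
        latest = std::max(latest, c.arrivalSamples);
        quietest = std::min(quietest, c.level);
    }
    for (const auto& c : m) {
        // Early channels wait for the latest one; loud channels drop to the quietest.
        out.push_back({latest - c.arrivalSamples, quietest / c.level});
    }
    return out;
}
```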
use other combinations such as the rear-left and rear-right or side-left and side-right speakers.
Users enable virtualized surround sound by indicating that they are listening with headphones. Figure 7 shows the UI for enabling virtualized surround sound.
After users choose an audio end point, they must run a Speaker Configuration Wizard. The wizard asks users to specify their loudspeaker configuration (stereo or multichannel) and verifies that the configuration is accurate by playing test tones through the loudspeakers. In addition, users can specify the physical characteristics of their loudspeakers, such as which loudspeakers are full range, whether any loudspeakers are absent, and so on. Figure 9 shows the Speaker Configuration Wizard.
The loudspeaker configuration settings determine which audio system effects are enabled. The configuration settings also determine some of the parameters that are required for audio system effects such as bass management and speaker phantoming. After users complete the Speaker Configuration Wizard, they can use the Enhancements tab on the Control Panel Audio application's Speakers Properties dialog box to select audio system effects that are pertinent to their loudspeaker configuration. Figure 10 shows the Enhancements tab of the Speakers Properties dialog box.
The available audio system effects depend on which audio end point, loudspeaker configuration, and loudspeaker characteristics the user has chosen. Table 1 lists the audio system effects that are available for the various loudspeaker configurations.
required audio effects functionality, there is no need to install any Windows Vista sAPOs. The details in the following sections pertain specifically to Option 2, where the IHVs and OEMs must delegate missing functionalities to Windows Vista sAPOs.
Strategy B: Write the custom sAPO as a thin wrapper around the Windows Vista sAPO. This strategy is best suited to IHVs or OEMs who want to:
• Add their custom effects in the simplest way possible.
• Have the Windows Vista UI continue to control the effects.
IHVs or OEMs who choose Strategy B should still read the Strategy A section to obtain a thorough understanding of Windows Vista custom audio system effects.
Note: With Strategy B, IHVs cannot add UI for their custom audio system effects to the Windows Vista Enhancements tab. There is only one Enhancements tab, and it must remain associated with the property page for the Windows Vista sAPOs. The IHV's UI must be implemented in some other way, such as a separate Control Panel application.
The CLSIDs are CLSID_CWMAudioLFXAPO and CLSID_CWMAudioGFXAPO for the LFX and GFX sAPOs, respectively. The CLSIDs are declared in wmcodecdsp.h and defined in wmcodecdspuuid.lib. They must support COM aggregation. However, aggregation is not expected to be used in custom audio system effects scenarios, so it should pose no significant problems.
Initialization
A custom sAPO must initialize the Windows Vista sAPO by calling its IAudioProcessingObject::Initialize method. This is typically done from the custom sAPO's own Initialize method. Any arguments that are passed to the custom sAPO's Initialize method should be passed directly to the Windows Vista sAPO's Initialize method. This allows the Windows Vista sAPO to fetch its settings from the endpoint and Fx property stores in the APOInitSystemEffects structure. It is possible to have the custom sAPO fetch the settings and selectively pass them to the Windows Vista sAPO, but that is essentially Strategy A.
If the custom sAPO replaces a Windows Vista feature, it is generally advisable to turn off the corresponding feature on the Windows Vista sAPO, although this might not be strictly necessary, depending on how the feature works. To turn off a feature, query the Windows Vista sAPO for its IPropertyStore interface and call IPropertyStore::SetValue. The properties that the Windows Vista sAPO's property store supports are described in "Supported IPropertyStore Properties" later in this paper.
Processing
Generally, you can call IAudioProcessingObjectRT::APOProcess with an arbitrary number of input samples, and the sAPO produces the same number of output samples. Make sure that the output buffer is large enough to hold that many output samples. Both sAPOs support in-place processing, so the input and output buffers can be the same. In that case, the buffer must be large enough to hold the same number of output samples as were sent as input, allowing for the additional channels that features such as speaker fill can produce.
When headphone virtualization is enabled, the Windows Vista sAPO currently works correctly only if the number of input samples that are passed to it on each call is a multiple of 2048. This restriction should eventually be addressed by one of the following methods, chosen according to the audio engine's requirements:
• Remove the restriction. The sAPO accepts any number of input samples and produces the same number of output samples.
• Remove the restriction, but allow the number of output samples to differ from the number of input samples. In that case, the caller must use IAudioProcessingObjectRT::CalcInputFrames or IAudioProcessingObjectRT::CalcOutputFrames to predict the number of samples that the sAPO will produce and calculate the required output buffer size.
• Maintain the restriction, but advertise it through an sAPO interface.
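The buffer-size rule and the current 2048-sample restriction can be handled on the caller's side roughly as sketched below. This is a hypothetical adapter, not part of any Windows API: it treats "samples" as interleaved frames, collects arbitrary-sized input, and releases only whole multiples of 2048 frames for the call into the Windows Vista sAPO. RequiredBufferSamples sizes an in-place buffer for whichever side has more channels.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kHeadphoneBlockFrames = 2048;

// True if a single APOProcess call with this many frames meets the current
// headphone-virtualization restriction.
bool IsValidHeadphoneCall(std::size_t frames) {
    return frames > 0 && frames % kHeadphoneBlockFrames == 0;
}

// Samples an in-place buffer must hold: the output may have more channels than
// the input (for example, when speaker fill is enabled).
std::size_t RequiredBufferSamples(std::size_t frames, std::size_t inChannels,
                                  std::size_t outChannels) {
    return frames * std::max(inChannels, outChannels);
}

// Hypothetical adapter: collects arbitrary-sized interleaved input and releases
// it in multiples of 2048 frames.
class BlockAccumulator {
public:
    explicit BlockAccumulator(std::size_t channels) : channels_(channels) {}

    // Appends `frames` frames; returns how many whole 2048-frame blocks are ready.
    std::size_t Push(const float* samples, std::size_t frames) {
        pending_.insert(pending_.end(), samples, samples + frames * channels_);
        return PendingFrames() / kHeadphoneBlockFrames;
    }

    std::size_t PendingFrames() const { return pending_.size() / channels_; }

    // Moves every complete block into `out`; returns the number of frames moved.
    std::size_t PopReady(std::vector<float>& out) {
        const std::size_t frames =
            PendingFrames() / kHeadphoneBlockFrames * kHeadphoneBlockFrames;
        const auto end = pending_.begin() + static_cast<std::ptrdiff_t>(frames * channels_);
        out.assign(pending_.begin(), end);
        pending_.erase(pending_.begin(), end);
        return frames;
    }

private:
    std::size_t channels_;
    std::vector<float> pending_;
};
```

Accumulating this way adds up to 2047 frames of latency, which is one reason the restriction is expected to be relaxed.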
Note: The Windows Vista sAPO IAudioProcessingObject::GetLatency implementation is not currently functional.
API Documentation
The custom audio system effects sAPOs currently have no custom APIs. Complete COM, sAPO, and IPropertyStore documentation is beyond the scope of this paper. For complete COM documentation, see the MSDN library. The sAPO interfaces are documented in audioenginebaseapo.idl, which is included in the Platform SDK.
The bass management mode that is used depends on the availability of a subwoofer and the bass-handling capability of main speakers. Table 2 summarizes which bass management mode applies in various scenarios. The six scenarios are numbered for later reference. FBM and RBM refer to forward and reverse bass management, respectively.
Table 2. Bass Management Modes
Subwoofer \ Main speakers | All speakers are small | The front left/right speakers are large | All speakers are large
Subwoofer is present (inverted or noninverted) | FBM (Scenario 1) | FBM (Scenario 2) | N/A (Scenario 3)
No subwoofer | Low-frequency protection or bass boost (Scenario 4) | RBM and FBM (Scenario 5) | RBM (Scenario 6)
In all six scenarios, the user has at least the following choices:
• Turn off bass management completely.
• Turn on bass management, which causes the sAPO to automatically decide the appropriate bass management mode.
The following list is a case-by-case description of the six scenarios:
• Scenario 1: Forward bass management. The low-frequency portion of the signal for the speaker channels is redirected to the subwoofer.
• Scenario 2: Forward bass management. The low-frequency portion of the signal for the speaker channels is redirected as follows:
  - If the original channel is off center, the low-frequency signal is redirected to the front-left or front-right channel, whichever is on the same side as the original channel.
  - If the original channel is on the center axis, the low-frequency signal is redirected to the subwoofer channel.
• Scenario 3: No bass management.
• Scenario 4: Low-frequency protection. The low-frequency portion of each of the main channels is removed. The user can choose to turn on bass boost instead of low-frequency protection.
• Scenario 5: Both bass management modes are applied; there is no way to enable them separately.
  - Forward bass management. The low-frequency portion of each of the surround channels is redirected to the front-left or front-right channel, whichever is on the same side as the original channel. If the incoming channel is on the center axis, the low-frequency part of its signal is divided equally between the front-left and front-right channels.
  - Reverse bass management. The subwoofer signal is sent with equal gain to the front-left and front-right channels.
• Scenario 6: Reverse bass management. The subwoofer signal is sent with equal gain to each of the main output channels.
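The Scenario 5 routing can be sketched per frame as follows, assuming a 5.1 source whose below-crossover components have already been split off by the crossover filters. The types and function are hypothetical. Both "divided equally" (center-axis bass) and "equal gain" (subwoofer) are taken here as a gain of 0.5, which is an assumption; it matches the inverse-linear rule for two bass-capable channels that the MFPKEY_BASSMGMT_BIGROOM description says reverse bass management currently uses.

```cpp
#include <cassert>

// Below-crossover components of one 5.1 frame (hypothetical type; the crossover
// filters have already split these off the surround channels).
struct BassFrame51 {
    float center;     // front-center low band (center axis)
    float leftSurr;   // left surround low band
    float rightSurr;  // right surround low band
    float lfe;        // subwoofer (.1) channel
};

struct FrontBass {
    float left;
    float right;
};

// Scenario 5 (front L/R large, no subwoofer): forward and reverse together.
FrontBass RouteBassScenario5(const BassFrame51& b) {
    constexpr float kHalf = 0.5f;  // assumed "equal" split between two channels
    FrontBass out{};
    // Forward: same-side surround bass plus half of the center-axis bass.
    out.left = b.leftSurr + kHalf * b.center;
    out.right = b.rightSurr + kHalf * b.center;
    // Reverse: the LFE channel goes to both front speakers with equal gain.
    out.left += kHalf * b.lfe;
    out.right += kHalf * b.lfe;
    return out;
}
```

The returned values would be added to the full-band front-left and front-right signals, since those large speakers keep their own bass.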
Note: In this context, the term surround refers to all main channels other than the front-left and front-right channels. It includes the front-center channel. The low-frequency portion refers to frequencies below a user-adjustable crossover frequency.
When a user turns on bass management, the sAPO uses the following logic to decide which bass management mode to enable:
• Enable reverse bass management if the content has a .1 channel and there is no subwoofer channel. The lack of a subwoofer channel is indicated by either of the following:
  - The GFX sAPO does not have a .1 channel.
  - The NoSub flag is set.
• Enable forward bass management if the subwoofer is present or if both of the following are true:
  - The LRBig flag is set, indicating that the front-left and front-right speakers are large.
  - The content has main channels other than the front-left and front-right channels.
When the NoSub and LRBig flags are both set and the content has both surround and subwoofer channels, both bass management modes are called for.
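The mode selection can be sketched as a small decision function. This is one reading of Table 2 combined with the content rules above, not the sAPO's source: the function, enum, and struct names are hypothetical, the channel bits come from Table 5, and "surround" follows the note above (every main channel other than front left/right, including center).

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kFrontLR = 0x1 | 0x2;   // SPEAKER_FRONT_LEFT | SPEAKER_FRONT_RIGHT
constexpr std::uint32_t kLowFrequency = 0x8;    // SPEAKER_LOW_FREQUENCY

enum class SpeakerSize { AllSmall, FrontLRBig, AllBig };

struct BassMgmtDecision {
    bool forward = false;
    bool reverse = false;
    bool lowFreqProtection = false;  // Scenario 4; the user may choose bass boost instead
};

// Hypothetical helper combining Table 2 with the content rules listed above.
BassMgmtDecision DecideBassManagement(SpeakerSize size, bool noSub,
                                      std::uint32_t contentMask, std::uint32_t outputMask) {
    BassMgmtDecision d;
    const bool subPresent = (outputMask & kLowFrequency) != 0 && !noSub;
    const bool contentHasLfe = (contentMask & kLowFrequency) != 0;
    // "Surround" = any main channel other than front left/right (includes center).
    const bool contentHasSurround = (contentMask & ~(kFrontLR | kLowFrequency)) != 0;

    if (subPresent) {
        // Scenarios 1 and 2; Scenario 3 (all big) needs no bass management.
        d.forward = size == SpeakerSize::AllSmall ||
                    (size == SpeakerSize::FrontLRBig && contentHasSurround);
    } else if (size == SpeakerSize::AllSmall) {
        d.lowFreqProtection = true;  // Scenario 4
    } else {
        d.reverse = contentHasLfe;   // Scenarios 5 and 6
        d.forward = size == SpeakerSize::FrontLRBig && contentHasSurround;  // Scenario 5
    }
    return d;
}
```

For example, LRBig speakers with a subwoofer and stereo content yield no bass management at all, because the only active speakers are already large.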
Bass Management Settings
The following settings are used to define the speaker configuration programmatically.
• Crossover frequency. Only some speakers, such as the subwoofer, can support frequencies below the crossover frequency. The setting is used for forward bass management, low-frequency protection, and bass boost. Multiple crossover frequencies (such as different values for front and surround speakers) are not supported.
• Speaker size for speakers other than the subwoofer has three settings:
  - All big: All speakers can handle unlimited deep bass.
  - All small: No speakers can go below the crossover frequency.
  - Front LR big: The front left and right speakers are big, and the rest are small. This is referred to subsequently as LRBig.
LRBig allows, for example, forward bass management to work without an output subwoofer channel by redirecting deep bass signals from the surround/rear channels into the front channels. Otherwise, forward bass management requires an output subwoofer channel. Other bass management modes also need to know which main speakers are big.
A flag named NoSub is set to indicate that no subwoofer is connected even though the output format advertised by the audio device or GFX input may include a .1 channel. The NoSub flag indicates that the output configuration is effectively N.0 as far as bass management is concerned. Note that NoSub is an explicit setting, separate from the low-frequency effects (LFE) flag in the output channel mask that indicates a subwoofer. The output channel mask cannot be used to convey the presence or absence of a subwoofer because most drivers do not support N.0 channel masks for N > 4.
Bass Management Channel Mask Dependencies
Usually, at least some form of bass management is supported. Bass management is unavailable only if all of the following conditions are met:
• NoSub is set to FALSE.
• The output channel mask includes an LFE flag.
• There are no small output speakers. This includes the case where the speaker setup is LRBig but stereo content is being played.
Virtual Surround
This effect is also referred to as left total/right total (LTRT) folddown or left/right matrix encoding. It is used if the channel format of the content that is being played back (N.x) is 2.0 or larger, where x can be 0 or 1. LTRT folddown is normally 4.0 to 2.0, and any other input format is usually handled by first applying a generic N.x-to-4.0 folddown. In the Windows Vista implementation, however, LTRT folddown is natively 5.1 to 2.0, and any other input is handled by first applying a generic N.x-to-5.1 folddown. The output channel mask must be 0x3 (stereo), and the number of input channels (including the subwoofer, if present) must be no more than eight.
Speaker Fill
This effect is used when the number of input channels (N) is less than the number of output channels (M). The effect fills N.x channels to M.x channels, where x can be either 0 or 1. The channel masks in Table 4 (ignoring the LFE channel) are supported for speaker fill. Speaker fill supports any combination of input or output subwoofer channel presence, so the subwoofer presence implied by the values in Table 4 is only an example; the actual configurations might or might not have a subwoofer.
Table 4: Speaker Fill Channel Masks
Name (definition) = example value
MASK_STEREO (MASK_FRONTLR) = 0x3
MASK_3_FRONT (SPEAKER_FRONT_CENTER | MASK_FRONTLR) = 0x7
MASK_4_SQUARE (MASK_FRONTLR | MASK_BACKLR) = 0x33
MASK_4_DIAMOND (MASK_FRONTLR | MASK_FBCENTERS) = 0x107
MASK_5_BACK (MASK_FRONTLR | MASK_BACKLR | SPEAKER_FRONT_CENTER) = 0x3F
MASK_5_SIDE (MASK_FRONTLR | MASK_SIDELR | SPEAKER_FRONT_CENTER) = 0x60F
MASK_7_SIDE_BACK (MASK_FRONTLR | MASK_BACKLR | SPEAKER_FRONT_CENTER | MASK_SIDELR) = 0x63F
MASK_7_FRONT_SIDE (MASK_FRONTLR | MASK_SIDELR | SPEAKER_FRONT_CENTER | MASK_CENTERLR) = 0x6CF
MASK_7_FRONT_BACK (MASK_FRONTLR | MASK_BACKLR | SPEAKER_FRONT_CENTER | MASK_CENTERLR) = 0xFF
Speaker fill is not supported if any of the following is true:
• The input mask equals the output mask.
• The only difference between input and output is that one has side left/right channels and the other has back left/right channels.
• Input has more main channels than output has.
• The output mask includes the center left/right speakers, but the input mask does not.
• The set of channels in the output but not in the input does not include at least one of: front center, back left/right, or side left/right.
There is one exception to the second item on the list. If the only difference between input and output is that one has side left/right channels and the other has back left/right channels, speaker fill is supported if either format contains channels that would fall between sideLR and backLR in the channel mask bit order. There are three such channels:
• SPEAKER_FRONT_LEFT_OF_CENTER
• SPEAKER_FRONT_RIGHT_OF_CENTER
• SPEAKER_BACK_CENTER
If the input or output mask contains any of these three channels, speaker fill might be supported even though the second condition on the list applies, but only if none of the other conditions on the list is true. For example, speaker fill from MASK_7_FRONT_BACK to or from MASK_7_FRONT_SIDE is supported for this reason. Table 5 lists the full set of channel values for convenient reference.
Table 5. Channel Values
Name = Value
SPEAKER_FRONT_LEFT = 0x1
SPEAKER_FRONT_RIGHT = 0x2
SPEAKER_FRONT_CENTER = 0x4
SPEAKER_LOW_FREQUENCY = 0x8
SPEAKER_BACK_LEFT = 0x10
SPEAKER_BACK_RIGHT = 0x20
SPEAKER_FRONT_LEFT_OF_CENTER = 0x40
SPEAKER_FRONT_RIGHT_OF_CENTER = 0x80
SPEAKER_BACK_CENTER = 0x100
SPEAKER_SIDE_LEFT = 0x200
SPEAKER_SIDE_RIGHT = 0x400
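The speaker fill support rules above can be expressed as a predicate over channel masks, using the Table 5 values. This is a sketch of the documented conditions, not the sAPO's code; the composite constants (kBackLR, kCenterLR, kSideLR) are inferred from Table 4 and Table 5, and the function names are hypothetical. Both masks are first checked against the Table 4 layouts with the LFE bit cleared, which is also why a 6.1 layout such as 0x13F is rejected.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kLfe = 0x8;           // SPEAKER_LOW_FREQUENCY
constexpr std::uint32_t kFrontCenter = 0x4;   // SPEAKER_FRONT_CENTER
constexpr std::uint32_t kBackLR = 0x30;       // SPEAKER_BACK_LEFT | SPEAKER_BACK_RIGHT
constexpr std::uint32_t kCenterLR = 0xC0;     // front left/right of center
constexpr std::uint32_t kBackCenter = 0x100;  // SPEAKER_BACK_CENTER
constexpr std::uint32_t kSideLR = 0x600;      // SPEAKER_SIDE_LEFT | SPEAKER_SIDE_RIGHT

// Table 4 masks with the LFE bit cleared.
constexpr std::uint32_t kFillMasks[] = {0x3, 0x7, 0x33, 0x107, 0x37,
                                        0x607, 0x637, 0x6C7, 0xF7};

bool IsFillMask(std::uint32_t m) {
    for (std::uint32_t f : kFillMasks)
        if (f == m) return true;
    return false;
}

int CountChannels(std::uint32_t m) { return static_cast<int>(std::bitset<32>(m).count()); }

bool IsSpeakerFillSupported(std::uint32_t inMask, std::uint32_t outMask) {
    // Speaker fill handles any input/output LFE combination, so compare without it.
    const std::uint32_t in = inMask & ~kLfe;
    const std::uint32_t out = outMask & ~kLfe;
    if (!IsFillMask(in) || !IsFillMask(out)) return false;  // only Table 4 layouts
    if (in == out) return false;                             // condition 1
    const std::uint32_t sideBack = kSideLR | kBackLR;
    const bool onlySideVsBack = (in ^ out) == sideBack &&
                                ((in & sideBack) == kSideLR || (in & sideBack) == kBackLR);
    const std::uint32_t between = kCenterLR | kBackCenter;  // bits between backLR and sideLR
    if (onlySideVsBack && ((in | out) & between) == 0) return false;  // condition 2 + exception
    if (CountChannels(in) > CountChannels(out)) return false;         // condition 3
    if ((out & kCenterLR) != 0 && (in & kCenterLR) == 0) return false; // condition 4
    const std::uint32_t added = out & ~in;                             // condition 5
    return (added & (kFrontCenter | kBackLR | kSideLR)) != 0;
}
```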
Delays are used for channels in the output configuration that are "outside" the front-back range of the input configuration. Conversely, if a speaker in the output configuration is "between" some speakers in the input configuration in the front-back sense, the output for that speaker is generated by mixing the input channels on either side of it.
The physical speaker mask is ignored in some cases, including when:
• The physical mask is set to 0.
• The physical mask includes channels that are not present in the input mask.
• Either the output mask or the physical mask lacks left/right symmetry.
• The number of bits in the mask does not match the channel count for either the input or output format.
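These cases can be checked with a few mask operations, as in the sketch below. The left/right pairs come from Table 5, and the function names are hypothetical. The last condition is ambiguous in the text; the sketch assumes it means that each format's channel mask must have exactly one bit per channel.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Left/right speaker pairs from Table 5: front, back, front-of-center, side.
constexpr std::uint32_t kPairs[][2] = {{0x1, 0x2}, {0x10, 0x20}, {0x40, 0x80}, {0x200, 0x400}};

bool IsLeftRightSymmetric(std::uint32_t mask) {
    for (const auto& pair : kPairs) {
        const bool hasLeft = (mask & pair[0]) != 0;
        const bool hasRight = (mask & pair[1]) != 0;
        if (hasLeft != hasRight) return false;
    }
    return true;
}

int CountBits(std::uint32_t m) { return static_cast<int>(std::bitset<32>(m).count()); }

// Returns true when the physical speaker mask must be ignored.
bool ShouldIgnorePhysicalMask(std::uint32_t physicalMask, std::uint32_t inMask,
                              std::uint32_t outMask, int inChannels, int outChannels) {
    if (physicalMask == 0) return true;
    if ((physicalMask & ~inMask) != 0) return true;  // channels absent from the input mask
    if (!IsLeftRightSymmetric(outMask) || !IsLeftRightSymmetric(physicalMask)) return true;
    // Assumed interpretation: each format's mask must have one bit per channel.
    if (CountBits(inMask) != inChannels || CountBits(outMask) != outChannels) return true;
    return false;
}
```

For instance, a 5.1 stream (0x3F) played on a speaker set without a center speaker (physical mask 0x3B) passes every check, so the physical mask is honored and the center can be phantomed.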
Figure 11 shows a block diagram that captures the processing modifications that are used to support phantom speakers. The nature of the optional folddown is a function of the LFX that is enabled.
[Figure 11. Phantom speaker processing. Diagram labels: Add Zeroes, LFX Processing, Input Mask, Physical Mask, Output Mask.]
Supported Formats
The Windows Vista custom audio system effects sAPOs support the following formats, sampling rates, and channel masks.
PCM Format
The Windows Vista sAPOs support only the 32-bit floating-point pulse code modulation (PCM) format.
Sampling Rates
The Windows Vista sAPOs support the following common sample rates: 8000, 11025, 16000, 22050, 24000, 32000, 44100, 48000, 88200, 96000, 176400, and 192000 Hz.
Channel Masks
The Windows Vista sAPOs support channel masks and channel counts as follows:
• If one channel mask is empty, it is assumed to be equal to the other channel mask.
• If both channel masks are empty, they are assumed to be equal to each other, but unknown. Both channel masks can be empty only if the number of input channels is equal to the number of output channels.
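A caller negotiating formats could pre-check these rules as sketched below. StreamFormat is a hypothetical stand-in, not a Windows type (a real implementation would read WAVEFORMATEXTENSIBLE fields); a channel mask of 0 represents an empty mask.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>

// Sample rates listed above for the Windows Vista sAPOs.
constexpr std::uint32_t kSupportedRates[] = {8000,  11025, 16000, 22050, 24000,  32000,
                                             44100, 48000, 88200, 96000, 176400, 192000};

// Hypothetical stream description; this is not a Windows type.
struct StreamFormat {
    bool isFloat;
    std::uint32_t bitsPerSample;
    std::uint32_t sampleRate;
    std::uint32_t channels;
    std::uint32_t channelMask;  // 0 means the mask is empty (unspecified)
};

// 32-bit floating-point PCM at one of the listed rates.
bool IsSampleFormatSupported(const StreamFormat& f) {
    return f.isFloat && f.bitsPerSample == 32 &&
           std::find(std::begin(kSupportedRates), std::end(kSupportedRates), f.sampleRate) !=
               std::end(kSupportedRates);
}

// Applies the empty-mask rules; returns false for a combination they reject.
bool ResolveChannelMasks(StreamFormat& in, StreamFormat& out) {
    if (in.channelMask == 0 && out.channelMask == 0) {
        // Both unknown: allowed only when the channel counts match.
        return in.channels == out.channels;
    }
    if (in.channelMask == 0) in.channelMask = out.channelMask;
    if (out.channelMask == 0) out.channelMask = in.channelMask;
    return true;
}
```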
Except for cases of empty channel masks, all four of the possible LFE channel combinations are supported regardless of the rest of the channel mask:
• No input or output LFE channel
• Input and output LFE channels
• Input LFE channel but no output LFE channel
• Output LFE channel but no input LFE channel
Two categories of channel masks are supported only when both the channel masks and the number of channels match exactly between input and output, except possibly in the subwoofer channel:
• Channel masks lacking left-right symmetry
• Channel masks with bits beyond the low 11 bits
Main channel manipulation features are disabled in that case.
If the input channel mask differs from the output channel mask in more than subwoofer presence, the combination is supported only if a channel manipulation feature (such as speaker fill, virtual surround, or headphone virtualization) can translate one channel mask to the other. That feature must be enabled when the APO is initialized.
There is an exception to the rule in the previous paragraph if both of the following conditions are true:
• The only difference between the input and output formats is that one has the side left/right channels whereas the other has the back left/right channels instead.
• Neither format has any channels whose position in the channel mask flag order would fall between sideLR and backLR.
If these two criteria are satisfied, the channel masks are treated as equivalent, and conversion between them happens automatically by using memcpy. If, however, either format has any channels that would fall between sideLR and backLR in the channel mask order, simple conversion by using memcpy is not possible (the channel positions would be corrupted), and the conversion must be performed with speaker fill.
For example:
- 0x60F <=> 0x3F is supported without speaker fill.
- 0xFF <=> 0x6CF requires speaker fill.
- 0x13F <=> 0x70F would be possible by using speaker fill if speaker fill supported 6.1. However, because speaker fill does not currently support 6.1, 0x13F <=> 0x70F is not supported.
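The side/back exception can be checked with a few mask operations. This is a hypothetical sketch, not the sAPO's code: it treats the rule strictly (no other difference at all, including LFE) and uses the standard channel-mask bits, where front-left-of-center, front-right-of-center, and back center (0x40, 0x80, 0x100) sit between backLR (0x30) and sideLR (0x600).

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kBackLR = 0x010 | 0x020;           // back left | back right
constexpr uint32_t kSideLR = 0x200 | 0x400;           // side left | side right
constexpr uint32_t kBetween = 0x040 | 0x080 | 0x100;  // positions between them

// Hypothetical helper: true if converting between the two masks is a pure
// side <=> back swap that a memcpy can handle.
bool IsSideBackSwap(uint32_t a, uint32_t b) {
    if ((a | b) & kBetween) return false;  // memcpy would corrupt positions
    const uint32_t lr = kBackLR | kSideLR;
    if ((a & ~lr) != (b & ~lr)) return false;  // masks differ elsewhere too
    const uint32_t aPair = a & lr, bPair = b & lr;
    return (aPair == kBackLR && bPair == kSideLR) ||
           (aPair == kSideLR && bPair == kBackLR);
}
```

Applied to the three examples above, only 0x60F <=> 0x3F qualifies; the other two involve channels in the "between" positions and therefore need speaker fill.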
MFPKEY_BASSMGMT_BIGROOM (VT_BOOL)
If this value is set to TRUE, the bass outputs of the various speakers are assumed to arrive at the listener with randomized phase and to add by power. FALSE means that they arrive in phase and add algebraically. For forward bass management, this affects the filter design choice: the filters should be complementary for small rooms and noncomplementary for large rooms. For reverse bass management, this property would ideally determine how the gain with which the subwoofer channel is added to the bass-capable main channels scales with the number N of such channels: normally 1/sqrt(N) for large rooms and 1/N (inverse-linear) for small rooms. However, the distinction between big and small rooms is not currently implemented, and reverse bass management always uses the small-room (inverse-linear) rule, regardless of the value of this property.

MFPKEY_BASSMGMT_NO_SUB (VT_BOOL)
If this value is set to TRUE, no subwoofer is connected to the system even if the output channel mask includes an LFE channel. An example of such a scenario is a 5.1 sound card that is connected to a 5.0 speaker set.

MFPKEY_BASSMGMT_INVERT_SUB (VT_BOOL)
If this value is set to TRUE, the subwoofer channel output is inverted. This property exists because subwoofers are sometimes wired backwards.

MFPKEY_BASS_BOOST_AMOUNT (VT_I4)
This value controls the amount of boost that is applied when MFPKEY_CORR_BASS_MANAGEMENT_MODE is 2 ("boost"). Valid values range from 0 to 3.

MFPKEY_CORR_HEADPHONE (VT_BOOL)
If this value is set to TRUE, the speaker configuration is a headphone. Setting this property to TRUE does not enable headphone virtualization; it is used only to determine whether headphone virtualization or room correction is possible. The two effects are mutually exclusive.

MFPKEY_AUVRHP_ROOMMODEL (VT_I4)
This value specifies the room model. Valid values, defined in wmcodecdsp.h, are:
- VRHP_SMALLROOM
- VRHP_MEDIUMROOM
- VRHP_BIGROOM
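The two reverse bass management gain rules mentioned under MFPKEY_BASSMGMT_BIGROOM follow from how the copies of the LFE signal add up. If N in-phase copies add algebraically, N times g must equal 1, so g = 1/N. If N random-phase copies add by power, N times g squared must equal 1, so g = 1/sqrt(N). The hypothetical helper below illustrates only this arithmetic; as noted above, the Windows Vista sAPO currently always uses the small-room rule.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: gain applied to the subwoofer (LFE) channel when it is
// folded into N bass-capable main channels.
//   small room (in phase, algebraic sum): g = 1 / N
//   big room (random phase, power sum):   g = 1 / sqrt(N)
double ReverseBassGain(int bassCapableChannels, bool bigRoom) {
    assert(bassCapableChannels > 0);
    const double n = static_cast<double>(bassCapableChannels);
    return bigRoom ? 1.0 / std::sqrt(n) : 1.0 / n;
}
```

With four bass-capable main channels, for example, the small-room rule gives 0.25 and the big-room rule gives 0.5.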
MFPKEY_CORR_MULTICHANNEL_MODE (VT_I4)
This value specifies the multichannel mode. There are five modes:
- Normal: No main channel manipulations.
- Passthru: Do not use. This value is obsolete and will be removed.
- SpkrFill: Speaker fill.
- HPVR: Headphone virtualization. Headphone virtualization is possible only if MFPKEY_CORR_HEADPHONE is set to TRUE.
- LTRT: Virtual surround.
MFPKEY_CORR_BASS_MANAGEMENT_MODE (VT_I4)
This value specifies the bass management mode. There are three modes:
- None: No bass management or bass boost.
- Management: Bass management only. The sAPO turns on whichever type of bass management is possible, using the logic that was described earlier in this paper.
- Boost: Bass boost only. The sAPO enables the low-frequency protection form of bass management.
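A client that sets these VT_I4 properties needs numeric values for the modes. The enums below are hypothetical mirrors of the two lists. Only SpkrFill = 2, Management = 1, and Boost = 2 are confirmed by this paper (by the sample code and the MFPKEY_BASS_BOOST_AMOUNT description). The other values assume that the modes are numbered in the order in which they are listed; check wmcodecdsp.h before relying on them.

```cpp
#include <cassert>

// Hypothetical mirrors of the mode lists (see the numbering caveat above).
enum class MultichannelMode : int { Normal = 0, Passthru = 1, SpkrFill = 2, HPVR = 3, LTRT = 4 };
enum class BassManagementMode : int { None = 0, Management = 1, Boost = 2 };

// Rejects the modes that this paper says must not be used, or that cannot be
// used with the given MFPKEY_CORR_HEADPHONE value. Other constraints (for
// example, format-dependent ones) are not checked here.
bool IsMultichannelModeAllowed(MultichannelMode mode, bool headphone) {
    switch (mode) {
        case MultichannelMode::Passthru: return false;      // obsolete
        case MultichannelMode::HPVR:     return headphone;  // needs a headphone
        default:                         return true;
    }
}
```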
MFPKEY_ROOMCORR_PROFILE (VT_BLOB)
This value is used to store the binary contents of the room profile file. The file extension is .rmp.

The following properties are expected to be of no use in the scenarios that are described in this document. However, the GFX sAPO does use the MFPKEY_CORR_HEADPHONE property: setting that property to TRUE disables room correction.

MFPKEY_CORR_ROOM_CORRECTION_ENABLED (VT_BOOL)
If this value is set to TRUE, room correction is turned on, if possible. Room correction is possible if both of these conditions are met:
- A room profile was successfully loaded and parsed earlier through MFPKEY_ROOMCORR_PROFILE.
- The output channel mask set on the GFX APO is a subset of the channel mask in the room profile.
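The room correction conditions reduce to a flag check plus a bitmask subset test. The following hypothetical helper combines them with the MFPKEY_CORR_HEADPHONE rule; it is a sketch of the stated logic, not the GFX sAPO's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical check for whether MFPKEY_CORR_ROOM_CORRECTION_ENABLED = TRUE
// can take effect.
bool CanApplyRoomCorrection(bool profileLoaded,      // .rmp parsed via MFPKEY_ROOMCORR_PROFILE
                            uint32_t profileMask,    // channel mask stored in the room profile
                            uint32_t gfxOutputMask,  // output channel mask set on the GFX APO
                            bool headphone) {        // MFPKEY_CORR_HEADPHONE
    if (headphone || !profileLoaded) return false;
    // Subset test: no output channel may be missing from the profile.
    return (gfxOutputMask & ~profileMask) == 0;
}
```

For example, a stereo output (0x3) can use a 5.1 profile (0x3F), but a 5.1 output cannot use a stereo profile.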
MFPKEY_CORR_LOUDNESS_EQUALIZATION_ON (VT_BOOL)
If this value is set to TRUE, loudness equalization is turned on. This is always possible on the LFX sAPO.

MFPKEY_LOUDNESS_EQUALIZATION_RELEASE (VT_I4)
This value controls the release time of the compressor that is used for loudness equalization. Higher numbers mean slower release:
- 0: As fast as possible (instant)
- 1: Extreme (practically instant)
- 2: Aggressive (~1s)
- 3: Reasonable (~3s)
- 4: Conservative (~7s)
- 5: Slow (~15s)
- 6: Very slow (~half a minute)
- 7: Extremely slow (~minute)
The audio effects UI for the release setting has six slider positions that correspond to values 2 through 7.
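A custom control panel that imitates the built-in UI would need to map slider positions to property values. The following sketch assumes that slider positions are numbered 0 through 5 from fastest to slowest. That numbering and the helper names are hypothetical; the descriptions are copied from the list above.

```cpp
#include <cassert>

// Descriptions of MFPKEY_LOUDNESS_EQUALIZATION_RELEASE values 0..7.
const char* const kReleaseDescriptions[8] = {
    "As fast as possible (instant)", "Extreme (practically instant)",
    "Aggressive (~1s)", "Reasonable (~3s)", "Conservative (~7s)",
    "Slow (~15s)", "Very slow (~half a minute)", "Extremely slow (~minute)"};

// Hypothetical mapping from a zero-based slider position (0..5) to the
// property value it represents (2..7).
int SliderPositionToReleaseValue(int position) {
    assert(position >= 0 && position <= 5);
    return position + 2;
}

const char* DescribeReleaseValue(int value) {
    assert(value >= 0 && value <= 7);
    return kReleaseDescriptions[value];
}
```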
Table (features): RBM, SFill, LEQ, RCorr, VRHP, LtRt, LProt
#include <windows.h>
#include <stdio.h>
#include <mmreg.h>              // WAVEFORMATEXTENSIBLE, WAVE_FORMAT_IEEE_FLOAT
#include <propsys.h>            // IPropertyStore
#include <audioenginebaseapo.h> // APO interfaces and connection structures
#include <audiomediatype.h>     // IAudioMediaType, CreateAudioMediaType
#include <wmcodecdsp.h>         // CLSID_CWMAudioLFXAPO, MFPKEY_* (link with wmcodecdspuuid.lib)

// Connection descriptors: one input and one output connection.
APO_CONNECTION_DESCRIPTOR inDesc = {
    APO_CONNECTION_BUFFER_TYPE_EXTERNAL, // APO_CONNECTION_BUFFER_TYPE
    0,                                   // pBuffer
    0,                                   // u32MaxFrameCount
    NULL,                                // pFormat, set before LockForProcess
    APO_CONNECTION_DESCRIPTOR_SIGNATURE
}, *pInDesc = &inDesc,
outDesc = {
    APO_CONNECTION_BUFFER_TYPE_EXTERNAL, // APO_CONNECTION_BUFFER_TYPE
    0,                                   // pBuffer
    0,                                   // u32MaxFrameCount
    NULL,                                // pFormat, set before LockForProcess
    APO_CONNECTION_DESCRIPTOR_SIGNATURE
}, *pOutDesc = &outDesc;

// Connection properties passed to APOProcess on each call.
APO_CONNECTION_PROPERTY inConn = {
    0,              // pBuffer
    0,              // u32ValidFrameCount
    BUFFER_VALID,   // u32BufferFlags
    APO_CONNECTION_PROPERTY_SIGNATURE
}, *pInConn = &inConn,
outConn = {
    0,              // pBuffer
    0,              // u32ValidFrameCount
    BUFFER_INVALID, // u32BufferFlags
    APO_CONNECTION_PROPERTY_SIGNATURE
}, *pOutConn = &outConn;

#define CHECKHR(x) do { hr = (x); if (FAILED(hr)) { printf("%d: %08X\n", __LINE__, (unsigned int)hr); goto exit; } } while (0)
// In C++, IPropertyStore::SetValue takes the PROPVARIANT by reference.
#define SET_I4(pkey, val) do { pv.vt = VT_I4; pv.lVal = (val); CHECKHR(pPS->SetValue(pkey, pv)); } while (0)
#define SET_BOOL(pkey, val) do { pv.vt = VT_BOOL; pv.boolVal = (val) ? VARIANT_TRUE : VARIANT_FALSE; CHECKHR(pPS->SetValue(pkey, pv)); } while (0)
#define SAFE_RELEASE(p) do { if (p) { (p)->Release(); (p) = NULL; } } while (0)

void useAPO()
{
    IUnknown* pUnk = NULL;
    IAudioProcessingObjectRT* pRT = NULL;
    IAudioProcessingObjectConfiguration* pConfig = NULL;
    IPropertyStore* pPS = NULL;
    WAVEFORMATEXTENSIBLE wfx;
    IAudioMediaType* pAMTIn = NULL, *pAMTOut = NULL;
    PROPVARIANT pv;
    HRESULT hr;

    PropVariantInit(&pv);

    // Assumes COM has already been initialized on the calling thread.
    CHECKHR(CoCreateInstance(CLSID_CWMAudioLFXAPO, NULL, CLSCTX_INPROC_SERVER, IID_IUnknown, (void**)&pUnk));
    CHECKHR(pUnk->QueryInterface(__uuidof(IAudioProcessingObjectRT), (void**)&pRT));
    CHECKHR(pUnk->QueryInterface(__uuidof(IAudioProcessingObjectConfiguration), (void**)&pConfig));
    CHECKHR(pUnk->QueryInterface(IID_IPropertyStore, (void**)&pPS));

    SET_I4(MFPKEY_CORR_MULTICHANNEL_MODE, 2);         // turn on speaker filling
    SET_I4(MFPKEY_CORR_BASS_MANAGEMENT_MODE, 1);      // turn on bass management
    SET_I4(MFPKEY_BASSMGMT_SPKRBASSCONFIG, 0);        // all main speakers are small
    SET_I4(MFPKEY_BASSMGMT_CROSSOVER_FREQ, 120);
    SET_BOOL(MFPKEY_CORR_LOUDNESS_EQUALIZATION_ON, TRUE); // turn on loudness equalization
    SET_BOOL(MFPKEY_BASSMGMT_BIGROOM, TRUE);          // affects bass management filter

    // Initialize WAVEFORMATEXTENSIBLE for the input format: 32-bit float stereo.
    wfx.Format.wFormatTag = WAVE_FORMAT_EXTENSIBLE;
    wfx.Format.nChannels = 2;
    wfx.Format.nSamplesPerSec = 44100;
    wfx.Format.wBitsPerSample = 32;
    wfx.Format.nBlockAlign = wfx.Format.wBitsPerSample / 8 * wfx.Format.nChannels;
    wfx.Format.nAvgBytesPerSec = wfx.Format.nSamplesPerSec * wfx.Format.nBlockAlign;
    wfx.Format.cbSize = 22;
    wfx.Samples.wValidBitsPerSample = 32;
    wfx.dwChannelMask = 3; // stereo
    // SubFormat = KSDATAFORMAT_SUBTYPE_IEEE_FLOAT
    wfx.SubFormat.Data1 = WAVE_FORMAT_IEEE_FLOAT;
    wfx.SubFormat.Data2 = 0x0000;
    wfx.SubFormat.Data3 = 0x0010;
    wfx.SubFormat.Data4[0] = 0x80;
    wfx.SubFormat.Data4[1] = 0x00;
    wfx.SubFormat.Data4[2] = 0x00;
    wfx.SubFormat.Data4[3] = 0xaa;
    wfx.SubFormat.Data4[4] = 0x00;
    wfx.SubFormat.Data4[5] = 0x38;
    wfx.SubFormat.Data4[6] = 0x9b;
    wfx.SubFormat.Data4[7] = 0x71;
    // Later SDK versions may declare CreateAudioMediaType with an extra
    // format-size argument, e.g. CreateAudioMediaType(&wfx.Format, sizeof(wfx), &pAMTIn).
    CHECKHR(CreateAudioMediaType(&wfx.Format, &pAMTIn));

    // Modify WAVEFORMATEXTENSIBLE for the output format: 32-bit float 5.1.
    wfx.Format.nChannels = 6;
    wfx.Format.nBlockAlign = wfx.Format.wBitsPerSample / 8 * wfx.Format.nChannels;
    wfx.Format.nAvgBytesPerSec = wfx.Format.nSamplesPerSec * wfx.Format.nBlockAlign;
    wfx.dwChannelMask = 0x3f; // one 5.1 flavor
    CHECKHR(CreateAudioMediaType(&wfx.Format, &pAMTOut));

    pInDesc->pFormat = pAMTIn;
    pOutDesc->pFormat = pAMTOut;
    CHECKHR(pConfig->LockForProcess(1, &pInDesc, 1, &pOutDesc));

    while (0 /* have data to process */) {
        pInConn->u32ValidFrameCount = 0;   // number of valid frames in the input buffer
        pInConn->pBuffer = 0;              // input buffer
        pOutConn->pBuffer = 0;             // output buffer
        pOutConn->u32BufferFlags = BUFFER_INVALID;
        pRT->APOProcess(1, &pInConn, 1, &pOutConn);
        // do something with the output buffer
    }
    pConfig->UnlockForProcess();

exit:
    SAFE_RELEASE(pAMTIn);
    SAFE_RELEASE(pAMTOut);
    SAFE_RELEASE(pUnk);
    SAFE_RELEASE(pRT);
    SAFE_RELEASE(pConfig);
    SAFE_RELEASE(pPS);
}
Programming Information
This section covers the general programming issues that must be addressed when using Strategy B to implement custom sAPOs. Both the LFX and GFX custom audio system effects sAPOs have the following general characteristics:
- They must be registered as COM in-process server objects that can be instantiated by using CoCreateInstance. The CLSIDs are CLSID_CWMAudioLFXAPO and CLSID_CWMAudioGFXAPO for the LFX and GFX sAPOs, respectively. The CLSIDs are declared in wmcodecdsp.h and defined in wmcodecdspuuid.lib.
- They must support COM aggregation. However, aggregation is not expected to be used in the custom audio system effects scenario, so this should pose no significant problems.
Initialization
A custom sAPO must initialize the Windows Vista sAPO by calling its IAudioSystemEffects::Initialize method. This is typically done from the custom sAPO's Initialize method. Any arguments that are passed to the custom sAPO's Initialize method should be passed directly to the Windows Vista sAPO's Initialize method. This allows the Windows Vista sAPO to fetch its settings from the endpoint and Fx property stores in the APOInitSystemEffects structure. It is possible to have the custom sAPO fetch the settings and selectively pass them to the Windows Vista sAPO, but that is essentially Strategy A.

If the custom sAPO replaces a Windows Vista feature, it is generally advisable to turn off the corresponding feature on the Windows Vista sAPO, although this might not be strictly necessary, depending on how the feature works. To turn off a feature, query the Windows Vista sAPO for its IPropertyStore interface and call IPropertyStore::SetValue. The properties that are supported by the Windows Vista sAPO's property store are described in "Supported IPropertyStore Properties" later in this paper. For examples of how to communicate with the Windows Vista custom audio system effects sAPO property store, see the "compress" and "spkrfill" samples.
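The delegation pattern described above is sketched below with plain C++ stand-ins instead of COM. `VistaSapoMock`, `InitArgs`, and `CustomSapo` are hypothetical. Real code would pass the APOInitSystemEffects data unchanged to the Windows Vista sAPO's Initialize method and would use IPropertyStore::SetValue with a PROPVARIANT. The example assumes that the custom sAPO implements its own loudness equalization.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for the initialization data (endpoint and Fx property stores).
struct InitArgs { std::string endpointStore; std::string fxStore; };

// Stand-in for the wrapped Windows Vista sAPO.
struct VistaSapoMock {
    InitArgs received;
    std::map<std::string, bool> props;  // models the sAPO's IPropertyStore
    void Initialize(const InitArgs& args) { received = args; }
    void SetValue(const std::string& key, bool value) { props[key] = value; }
};

struct CustomSapo {
    VistaSapoMock inner;
    void Initialize(const InitArgs& args) {
        inner.Initialize(args);  // pass the arguments through unchanged
        // This custom sAPO replaces loudness equalization, so it turns off
        // the corresponding Windows Vista feature.
        inner.SetValue("MFPKEY_CORR_LOUDNESS_EQUALIZATION_ON", false);
    }
};
```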
Option A
This option has the advantage that it can be done without instantiating a Windows Vista sAPO. Also, if a custom sAPO wants to monitor the Fx property store, Option A is the only way to receive on-the-fly property change notifications. For an example of Option A, see the "compress" sample.

With Option A, the custom sAPO queries the main endpoint property store (not the Fx property store) for PKEY_AudioEngine_DeviceFormat. It then uses the channel mask from that format as the PID of the property key that is used to query the Fx property store. The GUID (fmtid) of that property key is one of the XXX_XXX_KEY_GUID values from wmcodecdsp.h. The _KEY_GUID names correspond in obvious ways to the MFPKEY_ names that were discussed earlier in this paper. For examples of this approach, see the Initialize code from the "compress" and "spkrfill" samples.

Option B
This option has the advantage that it correctly handles the possibility that the Windows Vista sAPO could eventually enable some of its features by default when the corresponding property does not exist in the Fx property store. With Option B, the custom sAPO simply queries the Windows Vista sAPO for its IPropertyStore interface and calls IPropertyStore::GetValue with one of the MFPKEY_XXX keys that were discussed earlier in this paper. None of the samples does this, although the "spkrfill" sample could use this approach.
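The Option A key construction can be shown with stand-in types. `Guid`, `PropertyKey`, and `MakeFxStoreKey` are hypothetical substitutes for GUID, PROPERTYKEY (whose members are `fmtid` and `pid`), and the sample code. The feature GUID would be one of the XXX_XXX_KEY_GUID values from wmcodecdsp.h, and the channel mask would come from PKEY_AudioEngine_DeviceFormat.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins so the sketch builds without Windows headers.
struct Guid { uint32_t data1; uint16_t data2, data3; uint8_t data4[8]; };
struct PropertyKey { Guid fmtid; uint32_t pid; };

// Option A: feature key GUID as fmtid, device channel mask as pid.
PropertyKey MakeFxStoreKey(const Guid& featureKeyGuid, uint32_t deviceChannelMask) {
    return PropertyKey{featureKeyGuid, deviceChannelMask};
}
```

Because the PID encodes the device channel mask, a setting stored for a 5.1 device format (0x3F) is not found after the device format changes to stereo (0x3).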
Format Negotiation
When implementing a custom LFX sAPO that wraps the Windows Vista LFX sAPO, do not specify APO_FLAG_SAMPLESPERFRAME_MUST_MATCH in the custom sAPO's registration properties. This rule should be followed whether or not the custom sAPO can change the channel format. If the custom LFX sAPO were to specify this flag, it would prevent the corresponding Windows Vista LFX sAPO from doing speaker filling, headphone virtualization, or virtual surround, all of which change the number of channels. A custom LFX sAPO implementation must implement or override IAudioProcessingObject::IsInputFormatSupported. The base class IsInputFormatSupported implementation is unlikely to accurately reflect the set of possible channel conversions that are implemented by the custom LFX sAPO and the Windows Vista LFX sAPO. The custom LFX sAPO's IsInputFormatSupported method should call the corresponding Windows Vista sAPO's IsInputFormatSupported method. This ensures that the Windows Vista LFX sAPO handles any channel conversions that are not handled by the custom LFX sAPO. Note that the Windows Vista LFX sAPO might be updated to support more conversions in future Windows releases. Calling the Windows Vista sAPO's IsInputFormatSupported method is one way to ensure that the set of channel conversions that are supported by the custom sAPO completely
contains the set of channel conversions that are supported by the Windows Vista LFX sAPO. What the custom sAPO should do with the return value from the Windows Vista LFX sAPO's IsInputFormatSupported method depends on which channel conversions, if any, the custom LFX sAPO supports. If the custom LFX sAPO does not support any channel conversions of its own, its IsInputFormatSupported method can return the value that was returned by the Windows Vista LFX sAPO's IsInputFormatSupported method directly to the caller. For an example, see the "swap" and "compress" samples. If the custom LFX sAPO supports its own channel conversions, then a negative return value (including S_FALSE) from the Windows Vista LFX sAPO's IsInputFormatSupported method does not necessarily translate into a negative return value to the caller. The custom LFX sAPO could, for example, support channel conversions that are not supported by the corresponding Windows Vista sAPO. In that case, the custom LFX sAPO must combine the return value from the Windows Vista LFX sAPO's IsInputFormatSupported method with its own logic for determining supported inputs. For an example, see the "spkrfill" sample's IsInputFormatSupported implementation. Note that the optimal meaning of "combine" depends on which type of channel conversion should take precedence. It might be appropriate to deviate from the "spkrfill" sample, depending on the exact design of the custom implementation. The IsOutputFormatSupported method on an LFX sAPO is uninteresting because an LFX sAPO's output format is the device's mix format. This format is based on external considerations and cannot be affected by an LFX sAPO or its input format. For that reason, the samples do not attempt to implement correct logic for IsOutputFormatSupported. The above considerations do not apply to GFX sAPOs because the Windows Vista GFX sAPO does not implement any features that require or imply changing the channel format.
For that reason, the GFX sample does nothing special for either IsInputFormatSupported or IsOutputFormatSupported. The format negotiation logic of a custom GFX sAPO is not affected by the fact that it is wrapping the Windows Vista GFX sAPO.
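One possible meaning of "combine" is sketched below: the input is accepted if either the Windows Vista LFX sAPO or the custom sAPO can handle the conversion; otherwise the Windows Vista answer is passed through. `FormatResult` is a hypothetical simplification of the S_OK, S_FALSE, and error outcomes, and the stereo-to-quad conversion is an invented example of a custom conversion, not one taken from the samples.

```cpp
#include <cassert>
#include <cstdint>

// Simplified outcomes: S_OK, S_FALSE (suggestion offered), or an error.
enum class FormatResult { Supported, SuggestionOffered, NotSupported };

// Hypothetical conversion implemented by the custom sAPO itself:
// stereo (0x3) in, quadraphonic (0x33) out.
bool CustomSupportsConversion(uint32_t inMask, uint32_t outMask) {
    return inMask == 0x3 && outMask == 0x33;
}

// Accept if either sAPO can handle the conversion; otherwise report what
// the Windows Vista LFX sAPO reported.
FormatResult CombineInputSupport(FormatResult windowsResult,
                                 uint32_t inMask, uint32_t outMask) {
    if (windowsResult == FormatResult::Supported) return windowsResult;
    if (CustomSupportsConversion(inMask, outMask)) return FormatResult::Supported;
    return windowsResult;
}
```

A design that gives the Windows Vista conversions precedence would keep this shape. A design that prefers the custom conversion would also need to record, for LockForProcess, which sAPO performs the conversion.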
LockForProcess/UnlockForProcess
The custom sAPO's IAudioProcessingObjectConfiguration::LockForProcess method should call the corresponding method on the Windows Vista sAPO. LockForProcess is a good place to decide the order in which the various processing stages should happen. For example, it can decide whether to apply the custom sAPO's processing or the Windows Vista sAPO's processing first. All three samples provide examples of such decision logic, and the comments in the samples provide some background. However, it is impossible to provide completely general guidance on that subject in this document, because doing so would require knowledge of the specific features of the custom sAPO and how they might interact with the Windows Vista sAPO's features.
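The following hypothetical sketch shows the idea of deciding the order once at lock time. The plan is computed from the connection formats in LockForProcess and is then simply followed by each APOProcess call. The policy shown is an invented example: the custom stage processes stereo only, so it runs before a Windows Vista upmix or after a Windows Vista downmix.

```cpp
#include <cassert>

enum class Order { CustomThenWindows, WindowsThenCustom, WindowsOnly };

// Computed in LockForProcess and stored in the custom sAPO.
struct ProcessingPlan { Order order; };

// Example policy for a hypothetical stereo-only custom stage.
ProcessingPlan PlanAtLock(int inChannels, int outChannels) {
    if (inChannels == 2)  return {Order::CustomThenWindows};  // before upmix
    if (outChannels == 2) return {Order::WindowsThenCustom};  // after downmix
    return {Order::WindowsOnly};                               // stage unusable
}
```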
GetLatency
The custom sAPO's IAudioProcessingObject::GetLatency implementation should call GetLatency on the Windows Vista sAPO that is being wrapped. If the custom sAPO's own processing incurs latency, the custom sAPO should add that latency to the value returned by the Windows Vista sAPO before returning the total to the caller.
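GetLatency reports an HNSTIME, which is measured in 100-nanosecond units. The hypothetical helpers below show the arithmetic. If the custom stage delays audio by a number of frames, that delay is converted to 100-ns units and added to the wrapped sAPO's latency.

```cpp
#include <cassert>
#include <cstdint>

using Hnstime = int64_t;  // 100-nanosecond units

// Converts a delay in frames to 100-ns units.
Hnstime FramesToHnstime(int64_t frames, int64_t framesPerSecond) {
    return frames * 10000000 / framesPerSecond;
}

// Total latency reported by the wrapping sAPO.
Hnstime CombinedLatency(Hnstime windowsSapoLatency, Hnstime customLatency) {
    return windowsSapoLatency + customLatency;
}
```

For example, a 480-frame delay at 48000 frames per second is 10 ms, or 100000 units.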
APOProcess
The custom sAPO's IAudioProcessingObjectRT::APOProcess method should call the Windows Vista sAPO's APOProcess method before, after, or even during its own processing. The decision about when to call APOProcess should be made in LockForProcess, so that any necessary intermediate buffers can be allocated. The Windows Vista sAPOs support in-place processing whenever their input and output formats are identical. In that case, the custom sAPO can pass the same APO_CONNECTION_PROPERTY as both the input and output connection property for the Windows Vista sAPO. However, the custom sAPO should not use its own input connection property as the output connection property for the Windows Vista sAPO, because the Windows Vista sAPO would then overwrite the custom sAPO's input buffer. In general, sAPOs should not modify their input buffer.
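The buffer rule above is sketched here with plain float arrays in place of APO_CONNECTION_PROPERTY. The custom stage writes from its input into its own output buffer, and the Windows Vista stage then runs in place on that output buffer, so the custom sAPO's input is never written. Both processing functions are hypothetical stand-ins. The in-place step assumes, as the text requires, that the Windows Vista sAPO's input and output formats are identical.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the Windows Vista sAPO processing in place
// (same buffer as input and output).
void WindowsStageInPlace(float* buffer, size_t samples) {
    for (size_t i = 0; i < samples; ++i) buffer[i] *= 0.5f;
}

// Stand-in for the custom processing: reads input, writes output.
void CustomStage(const float* in, float* out, size_t samples) {
    for (size_t i = 0; i < samples; ++i) out[i] = in[i] + 1.0f;
}

// Custom stage first, then the Windows stage in place on the custom sAPO's
// own output buffer. The custom sAPO's input buffer stays untouched.
void ProcessChain(const float* input, float* output, size_t samples) {
    CustomStage(input, output, samples);
    WindowsStageInPlace(output, samples);
}
```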
Resources
White Papers:
- A Wave Port Driver for Real-Time Audio Streaming
  http://www.microsoft.com/whdc/device/audio/wavertport.mspx
- Audio Device Technologies for Windows
  http://www.microsoft.com/whdc/device/audio/default.mspx
- Custom Audio Effects in Windows Vista
  http://www.microsoft.com/whdc/device/audio/sysfx.mspx
- Device Finish-Install Actions in Windows Vista
  http://www.microsoft.com/whdc/driver/install/Finish_Install.mspx
- Pin Configuration Guidelines for High Definition Audio Devices
  http://www.microsoft.com/whdc/device/audio/PinConfig.mspx
- Plug and Play Guidelines for High Definition Audio Devices
  http://www.microsoft.com/whdc/device/audio/HD-aud_PnP.mspx
- UAA Hardware Design Guidelines
  http://www.microsoft.com/whdc/device/audio/UAA_HWdesign.mspx
- Universal Audio Architecture
  http://www.microsoft.com/whdc/device/audio/uaa.mspx
- Windows Driver Kit (WDK)
  http://www.microsoft.com/whdc/driver/WDK/

Software Development Kits:
- APO SDK (available Summer 2006)
  http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sdkintro/sdkintro/devdoc_platform_software_development_kit_start_page.asp
- Platform SDK
  http://msdn.microsoft.com/library/en-us/sdkintro/sdkintro/devdoc_platform_software_development_kit_start_page.asp
- Windows Vista SDK
  http://msdn.microsoft.com/windowsvista/
Why one or more instances?
To avoid undesirable interactions, most features require a certain relative ordering. Because Windows Vista sAPOs implement multiple features inside a single sAPO, multiple instances of that sAPO might be required to ensure correct ordering. For example, assume that three enabled features, A, B, and C, must be applied in the order A, then B, then C. The custom implementation handles B but delegates A and C to the Windows Vista sAPO. A and C must then be in separate instances of the Microsoft sAPO so that the custom implementation of B can happen between them.

Windows Vista implements room correction in the GFX sAPO, which is a separate COM object from the LFX sAPO. A custom implementation could choose to delegate room correction to the Windows implementation but place it in a custom LFX sAPO. The custom LFX implementation might then need to delegate some processing to the Windows Vista LFX sAPO implementation and other processing to the Windows Vista GFX sAPO implementation.
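The A/B/C example can be modeled by recording the order in which stages run. `MultiFeatureInstance` is a hypothetical stand-in for one instance of a multi-feature Windows Vista sAPO with a fixed internal order (A before C). With a single instance, the custom stage B can run only before or after both A and C. With two instances, B lands between them.

```cpp
#include <cassert>
#include <string>

// Stand-in for one multi-feature sAPO instance; internal order is A, then C.
struct MultiFeatureInstance {
    bool aEnabled = false;
    bool cEnabled = false;
    std::string Process(std::string trace) const {
        if (aEnabled) trace += "A";
        if (cEnabled) trace += "C";
        return trace;
    }
};

// Stand-in for the custom implementation of feature B.
std::string CustomB(std::string trace) { return trace + "B"; }

// Required order A, B, C: instance 1 runs only A, instance 2 runs only C.
std::string RunTwoInstanceChain(const std::string& input) {
    MultiFeatureInstance first;
    first.aEnabled = true;
    MultiFeatureInstance second;
    second.cEnabled = true;
    return second.Process(CustomB(first.Process(input)));
}
```

Here `RunTwoInstanceChain("")` yields "ABC". A single instance with both A and C enabled, followed by B, would yield "ACB" instead.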
The Windows Vista sAPO implements certain common speaker fill configurations, such as 2.0 => 5.1, with special optimized code that handles reverse bass management in the same step as speaker fill.
When any of the headphone virtualization, virtual surround encoding, or speaker fill effects is on, reverse bass management is handled during that step. Reverse bass management is still controlled through the sAPO's reverse bass management property as if it were a separate feature; in these cases, it simply controls the fold-down coefficients for the .1 input channel. One open issue is that reverse bass management cannot be disabled when LTRT is on. In that case, reverse bass management uses an unconventional subwoofer channel gain.

The Windows audio system effects sAPOs apply some minor processing (gain and delay) even when no features are enabled. The goal of such processing is to ensure that the gain and delay do not change when a feature is enabled on the fly. Delay is inherent in the implementation of some features, and some features apply a gain less than 1 to avoid excessively high output in certain situations. The set of available features depends on the input and output formats and on certain properties, and so do the cumulative normalization gain and delay. If features will not be turned on or off on the fly, normalization gain can be disabled by calling IPropertyStore::SetValue to set the MFPKEY_CORR_NORMALIZATION_GAIN property to FALSE. The property might be TRUE by default. There is no mechanism to disable the normalization delay, because it is presumed less likely to be objectionable than normalization gain. If normalization delay is objectionable, simply bypass the sAPO in question.