Volume 7, Number 2, 2008 — The ARM-Enabled Home
Special ARM-Enabled Home Section
Throw Away Your CD Collection? Yes, with HD-AAC Audio Codecs on ARM CPUs
Author:
Robert Bleidt, Division General Manager; Jan Nordmann, Director of Marketing and Business Development; Fraunhofer USA Digital Media Technologies
Synopsis:
Featuring music encoding with quality beyond CDs, plus iPod and mobile phone compatibility, the HD-AAC codec offers a new way of storing, streaming and distributing music on portable devices, within the connected home and through online music stores, replacing the physical CD. Unifying the MPEG-4 SLS and Advanced Audio Coding standards (AAC; a compression and encoding scheme for digital audio, designed to be the successor of the MP3 format), Fraunhofer's HD-AAC codec provides future-proof, lossless compression of music with up to 24-bit quality. Via ARM Powered® platforms, Fraunhofer IIS is efficiently making HD-AAC available to consumers, broadcasters, electronic music distributors and the consumer electronics industry. This article, which includes a detailed explanation of the principles of audio codecs, outlines the technical and business cases for this new technology.
Consumers have enjoyed the permanence and quality of audio CDs, but CDs don't offer the convenience of today's portable media players, home media servers and car entertainment systems. Plus, the production and distribution costs of CDs are high compared to music downloads, and the variety of inventory in retail stores is decreasing.
Fraunhofer's answer is HD-AAC, our first lossless audio codec, which offers better-than-CD quality, preserving every bit of the 24-bit/96 kHz studio master unchanged, yet playing on every iPod and AAC mobile phone. When consumers purchase HD-AAC files from a Web music store, they are not just buying a filename. HD-AAC files may include all the art, lyrics, liner notes and metadata that consumers associate with a tangible purchase. These files can be conveniently copied from one consumer electronics device to another without keeping track of multiple bitrate copies or associated metadata, and can even be streamed over home networks at the best possible quality for the available bandwidth. HD-AAC is the audio format of the future that, due to its AAC core layer, also plays on many of today's legacy devices.
New Ways of Listening to Music
Portability
Music listening habits have changed dramatically over the past 10 years, as portable music players became a mass phenomenon, and multiple playback and storage devices within the household were enabled to connect over the Internet. Recently, the introduction of portable music players as nomadic devices, docked with a car head unit or home theater entertainment system, has intensified the consumer expectation of listening to music everywhere.
Convenience
As compressed audio and flash or hard-drive based music players came to offer superior storage capacity and ease of handling, the usage and sales of CDs declined. Chances are high that consumers will strongly welcome a combination of CD quality with the flexibility and portability of compressed audio, and then demand new services and products combining these benefits. Because HD-AAC unifies both, it represents something essentially new and guarantees that content created or purchased in the new HD-AAC format also plays on the majority of existing consumer electronics devices. The content can either be stored in HD-AAC on a single portable device or archived on the home media server.
Ease of Use
With HD-AAC, consumers may download, store and manage only a single file instead of two or more versions of the same song, depending on the application, download store or corresponding playback devices. In addition to stereo music, HD-AAC also compresses surround signals, which makes it the most convenient solution for any content or playback scenario.
Quality
With the move to compressed music formats, consumers sometimes worry that they are missing something from the studio master, even though listening tests have proven that today's audio codecs offer transparent quality when operated at their recommended bitrates. Due to its 24-bit/96 kHz support and lossless compression, HD-AAC delivers every bit unchanged, bringing the studio master standard to the consumer in the most convenient way.
A New Generation of HD-AAC-Enabled Consumer Devices
Legacy Devices
The successful introduction of new audio formats invariably leads to increased consumer demand for new playback devices. The hurdle for wide market adoption is very high, since existing device generations are usually not capable of playing a new format, and consumers are reluctant to leave those devices behind without a good reason. Due to its backward-compatible approach and unique new features, HD-AAC creates consumer demand for the next generation of devices without making the current generation obsolete.
Portable Media Players and Connectivity
The number of devices benefiting from HD-AAC could be extensive. High-end audio has been the domain of physical media in the living room or in the car for many years. This exclusivity was mainly driven by the lack of storage space on portable media players. With the enormous increase in flash storage capacity, the multiplication of available CPU power, improvements in battery life management and the availability of economical 24-bit converters, today's portable media players fulfill the hardware requirements for lossless audio storage and decoding. In combination with quality headphones, HD-AAC may offer mobile listeners the ability to listen to the same music file in its highest fidelity.
Due to the docking station concept and bitstream output feature of modern portable media players, they easily dock to any stationary audio device to reproduce studio master quality from the HD-AAC file stored on the mobile. In this respect, the connectivity to other listening environments, such as home theatre systems or automotive entertainment systems, becomes even more important, as shown in Figure 1.
Home Media Servers and Connected Devices
Music libraries stored on home media servers in the HD-AAC format offer the convenience of one file, the emotional assurance that every bit of original information is preserved, and the possibility that the complete artwork is contained in the music file. Maximum audio quality at minimum disk space is achieved through state-of-the-art lossless compression in an .mp4 file container. Through these characteristics, HD-AAC creates more than a file experience. Due to the scalable nature of the codec's files, the content stored on the media server in the living room can be streamed to various devices in the household, even under difficult network conditions, while always guaranteeing maximum audio quality, as shown in Figure 2.
A New Way of Creating Content
Consumer Software Tools
Many consumers have ripped their CD collections more than once to create a compressed digital copy. Whether driven by the wish to increase overall quality or motivated by a new gadget purchase, they found that this process always included a loss of original information and an uncertainty about what new and better format the future might bring. Consequently, many consumers kept their CD shelves in the living room as a backup medium. With the introduction of the lossless HD-AAC format, consumers will have to encode their CD collections once more, but since every bit of original information is preserved and the HD-AAC file can be played on most legacy devices, it will be for the last time.
Production Software Tools
Professional producers and musicians, as well as advanced consumers, can use software tools to encode 96 kHz productions with up to 24-bit resolution into the HD-AAC format. At the same time, easy pre-listening on iPods and exchange by Web or email become possible.
A New Way of Selling and Purchasing Music
High-Fidelity Downloads
Audio CDs store uncompressed music in 16-bit, 44.1 kHz quality, while most of today's music is produced in the improved 24-bit, 96 kHz standard. Due to the technical limitations of the Audio CD and current download formats, the music industry has been unable to offer this improved studio-master sound quality to the consumer. HD-AAC closes this technical gap between music production and distribution standards, enabling a true high-fidelity experience for the consumer and even offering the chance to resell back-catalog titles. It may also help soften the trend of decreasing album sales, since the complete record could be released as a premium version sold in one HD-AAC file incorporating the complete artwork.
One-File Shopping
Music-buying habits have changed enormously in the past few months, since legal downloads have become the norm, and related revenues are increasing rapidly through online music sales. Still, buying music online is a file download experience, often not yet considered equivalent to traditional physical media, especially with regard to album sales. Too often it's a complicated two-file or dual-delivery world: one file is downloaded to the mobile phone, and a copy of the same song is downloaded to the PC in a second step. This lack of convenience is mostly due to the bandwidth constraints of current 3G networks. With the next generation of HSDPA and 4G/LTE-powered networks, this bottleneck will disappear, and consumers will benefit from the convenience and ease of use that HD-AAC's simple one-file approach delivers.
New Coding Technology in HD-AAC
Principles of Audio Codecs
Audio codecs rely on two general techniques to reduce the bitrate of a signal. One is to remove redundancies in the transmitted bitstream by using mathematical algorithms to pack more information in fewer bits. We are all users of these techniques when we make a ZIP file on our computers. Similar, though audio-specific, algorithms are used in compressing the encoded bitstream for audio codecs such as MP3 and AAC, and in the lossless coding part of HD-AAC.
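As a minimal sketch of redundancy removal, the following Python uses the standard zlib module, which implements the DEFLATE algorithm commonly used in ZIP files. The function name lossless_roundtrip and the sample data are illustrative assumptions; audio codecs use their own audio-specific coders rather than DEFLATE, so this only demonstrates the general principle of packing redundant data into fewer bits without losing any of it.

```python
import zlib


def lossless_roundtrip(data: bytes) -> tuple[bytes, float]:
    """Compress and decompress data, returning the restored bytes and the ratio.

    Hypothetical helper for illustration only.
    """
    packed = zlib.compress(data, level=9)
    restored = zlib.decompress(packed)
    ratio = len(data) / len(packed)
    return restored, ratio


# Highly repetitive input: plenty of redundancy to remove.
redundant = b"la la la " * 1000
restored, ratio = lossless_roundtrip(redundant)

print("bit-exact after decoding:", restored == redundant)
print(f"original is {ratio:.0f} times larger than the packed form")
```

Because the input is so repetitive, the packed form is far smaller than the original, yet decoding returns exactly the same bytes.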
The other technique is to reduce irrelevancy in the signal by removing parts of the signal that won't be heard or are less important to perceived quality. In speech codecs, such as the GSM or ITU G-series codecs, this is done by taking advantage of the characteristics of human speech. This is one reason that music heard over a mobile phone sounds so bad: the codec is not designed to transmit it.
In music codecs such as MP3 or AAC, properties of human hearing are used in deciding what signal components contribute to the perceived quality, so they will work with any input signal. The primary hearing property exploited is frequency masking, as shown in Figure 3. In some respects, our ears are similar to a spectrum analyzer in that they have a certain resolution bandwidth below which frequency components in the signal are not resolved. In the ear, signal components lower in amplitude and near in frequency to a stronger signal component are not heard because they are below the ear's masking threshold.
Perceptual audio codecs use a psychoacoustic model of the ear's masking threshold to analyze the input signal and decide which signal components will not be heard. These components can then be coarsely quantized or even removed before transmission.
To do this, we need to operate on the input signal in the frequency domain, so all music codecs perform a time-to-frequency-domain conversion in an analysis filterbank as the first step.
Except for the compression introduced by the following lossless compression encoding, it is only by quantizing the signal that the data to be transmitted is compressed. This quantizing also adds noise to the signal, but fortunately, the same masking effects keep this noise from being heard. With smaller frequency bands in the filterbank, this quantization can be more precise. One reason AAC offers higher compression efficiency than MP3 is that it uses a larger filterbank.
If we examine the block diagram of an AAC encoder, shown in Figure 4, we can see that the filterbank is followed by some auxiliary spectral processing tools that improve the encoder's performance in specific cases, and then by functions that quantize the spectral components. The resulting values are then Huffman-coded (a form of lossless data compression) and packed into a bitstream. The perceptual model supplies psychoacoustic information to allow the bitrate controller to change the quantization levels.
This compression process, which has evolved in complexity over the past 20 years as available computing power increased and new coding discoveries were made, works well. A typical music signal can be compressed 10 or 20 times with no perceivable loss. When the bitrate is not high enough to meet the requirements of the perceptual model, or the model does not accurately duplicate the listener's perception, the listener hears coding artifacts.
Artifacts are also increased by tandem coding, the re-encoding of material that has already been encoded one or more times. At typical bitrates, audio codecs such as MP3 or AAC are intended for final transmission or delivery to the consumer. These codecs have also been successfully employed in broadcast editing and production, but at higher bitrates. The consumer who is making his own remix in a music manager, or converting files from one format to another by re-encoding, may start to create unmasked artifacts. In these cases, consumers need an archival format instead of a delivery one.
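The trade-off between quantization step size and added noise can be sketched in a few lines of Python. This is a simplified illustration that assumes a plain uniform quantizer and made-up coefficient values; the quantizer in a real AAC encoder is more elaborate, and the function names here are hypothetical.

```python
def quantize(values, step):
    """Map each value to the index of the nearest multiple of step."""
    return [round(v / step) for v in values]


def dequantize(indices, step):
    """Reconstruct approximate values from quantizer indices."""
    return [i * step for i in indices]


def max_error(values, step):
    """Largest absolute difference between input and reconstruction."""
    restored = dequantize(quantize(values, step), step)
    return max(abs(v - r) for v, r in zip(values, restored))


# Made-up spectral coefficients, for illustration only.
coefficients = [0.8, -0.31, 0.05, 0.002]

coarse_step, fine_step = 0.25, 0.01
print("coarse indices:", quantize(coefficients, coarse_step))
print("fine indices:  ", quantize(coefficients, fine_step))
print("coarse noise <= step/2:", max_error(coefficients, coarse_step) <= coarse_step / 2 + 1e-12)
print("fine noise   <= step/2:", max_error(coefficients, fine_step) <= fine_step / 2 + 1e-12)
```

A coarse step yields small indices that cost few bits but adds more noise; a fine step does the opposite. The encoder's task is to choose, per frequency band, the coarsest step whose noise still stays below the masking threshold.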
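To illustrate the Huffman-coding step, here is a minimal Python sketch that builds a Huffman code from the symbol statistics of a short list of quantized values. It is an assumption-laden teaching example: the function huffman_code and the sample values are hypothetical, and a real AAC encoder selects from predefined codebooks rather than building a code for each signal.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Return a dict mapping each symbol to a prefix-free bit string."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (count, unique tie-breaker, partial code table).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)
        c2, _, t2 = heapq.heappop(heap)
        # Merge the two rarest subtrees, prefixing their codes with 0 and 1.
        merged = {s: "0" + code for s, code in t1.items()}
        merged.update({s: "1" + code for s, code in t2.items()})
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]


# Made-up quantized spectral values: mostly zeros, as is typical after quantization.
quantized = [0, 0, 0, 0, 0, 0, 1, 0, -1, 0, 0, 2, 0, 1, 0, 0]
table = huffman_code(quantized)
bitstream = "".join(table[v] for v in quantized)

print("code lengths:", {sym: len(code) for sym, code in table.items()})
print("Huffman bits:", len(bitstream), "versus fixed 2-bit coding:", 2 * len(quantized))
```

Frequent values such as zero receive the shortest codes, so the packed bitstream is shorter than a fixed-length encoding of the same values.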
Lossless Coding and the HD-AAC Codec
Unlike lossy audio codecs, lossless codecs only remove redundancies. As with computer ZIP files, their operations are mathematically lossless. After decoding, the signal is identical to the original. This allows an unlimited number of encoding and decoding cycles without any loss or artifacts. Their disadvantage is that lossless compression offers only a 2-to-1 compression ratio, on average, compared to files 10 to 20 times smaller with lossy codecs.
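The compression ratios quoted in this article translate into bitrates as follows. This is a back-of-the-envelope calculation in Python that assumes two-channel (stereo) PCM; the helper name pcm_bitrate is hypothetical, and the ratios are the averages given in the text, not measurements.

```python
def pcm_bitrate(sample_rate_hz, bits_per_sample, channels=2):
    """Uncompressed PCM bitrate in bits per second (stereo assumed by default)."""
    return sample_rate_hz * bits_per_sample * channels


cd = pcm_bitrate(44_100, 16)      # Audio CD format
studio = pcm_bitrate(96_000, 24)  # 24-bit/96 kHz studio master

print(f"CD PCM:             {cd / 1000:.1f} kb/s")
print(f"Studio master PCM:  {studio / 1000:.1f} kb/s")
print(f"CD, lossless 2:1:   {cd / 2 / 1000:.1f} kb/s")
print(f"CD, lossy 10x:      {cd / 10 / 1000:.1f} kb/s")
print(f"CD, lossy 20x:      {cd / 20 / 1000:.1f} kb/s")
```

Under these assumptions, a losslessly compressed CD track needs roughly 700 kb/s, while a lossy encoding of the same track needs on the order of 70 to 140 kb/s.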
Lossless codecs have traditionally been built using time-domain algorithms such as linear or polynomial prediction. HD-AAC is a new approach that operates in the frequency domain with an AAC base layer inside. In the HD-AAC encoder, the input signal is encoded with both a standard AAC encoder and a modified version that uses integer arithmetic. Integer arithmetic is needed because the traditional AAC encoding operations, particularly the filterbank that performs time to frequency conversion, lose a small amount of precision due to fixed-point rounding or to floating-point arithmetic.
The integer filterbank, implemented with an integer MDCT transform, avoids this by only using integer operations where the rounding is precisely specified and its effect is perfectly inverted in the inverse transform. The advantage of the integer MDCT is that when the decoder performs the inverse transform to recreate the output time-domain signal, the result is an exact duplicate of the input signal. In a traditional AAC encoder, there are always small rounding errors that make this impossible, even if no quantization is done.
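One well-known way to build integer transforms whose rounding is perfectly inverted is the lifting scheme, in which a rotation is split into three steps that each add a rounded value to one of the two inputs. The Python sketch below shows a single such rotation; it is an illustration of the principle only, not the HD-AAC filterbank itself, and the function names and the chosen angle are assumptions.

```python
import math


def lifting_rotate(x, y, angle):
    """Integer approximation of rotating (x, y) by angle via three lifting steps.

    The angle must not be a multiple of pi (sin(angle) must be nonzero).
    """
    c = (math.cos(angle) - 1.0) / math.sin(angle)  # equals -tan(angle / 2)
    s = math.sin(angle)
    x = x + round(c * y)
    y = y + round(s * x)
    x = x + round(c * y)
    return x, y


def lifting_unrotate(x, y, angle):
    """Exact inverse: undo the three steps in reverse order with the same rounding."""
    c = (math.cos(angle) - 1.0) / math.sin(angle)
    s = math.sin(angle)
    x = x - round(c * y)
    y = y - round(s * x)
    x = x - round(c * y)
    return x, y


angle = 0.3  # arbitrary illustrative rotation angle in radians
x, y = 1234, -567
rx, ry = lifting_rotate(x, y, angle)

exact_x = x * math.cos(angle) - y * math.sin(angle)
exact_y = x * math.sin(angle) + y * math.cos(angle)
print("integer rotation:", (rx, ry))
print("close to exact rotation:", abs(rx - exact_x) < 2 and abs(ry - exact_y) < 2)
print("perfectly inverted:", lifting_unrotate(rx, ry, angle) == (x, y))
```

Each step changes only one value by a rounded function of the other, so the decoder can recompute the same rounded amount and subtract it. The rounding error therefore never accumulates, while the result stays within a small distance of the ideal floating-point rotation.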
The integer MDCT closely approximates the calculations of the regular MDCT used in the AAC encoder. This allows us to send only the difference between each of the quantized AAC frequency coefficients and its losslessly calculated counterpart. These differences are compressed in a bit-plane order, so the MSBs of the differences can be decoded without decoding the LSBs, as shown in Figure 5.
Compressing the residual error signal in this way allows the quality level and total bitrate to be adjusted on the fly during transmission. Discarding the LSBs of the error signal to reduce the bitrate leaves the AAC frequency coefficients quantized more finely than the core layer alone, almost exactly as if the signal had simply been encoded with the AAC encoder set for a higher core bitrate.
Although this architecture introduces additional computation in the codec compared to traditional lossless codecs, it provides compatibility with legacy AAC devices and offers the scalability feature to match the audio quality to the available storage or transmission bandwidth. This is done while maintaining lossless compression efficiency similar to that of the older lossless codecs: about 2-to-1 for most signals.
The HD-AAC codec offers the capability to encode the lossless signal at higher sampling rates and precision than the core AAC layer. This means that the core AAC layer can be encoded at 256 kb/s with 48 kHz sampling and 16-bit resolution, while the lossless residual layer operates on data at 96 or 192 kHz and 24 bits. The residual layer bitstream is encoded so that a legacy AAC decoder will ignore it in the combined bitstream. This gives HD-AAC files the ability to be played directly on legacy devices that only support AAC, while supplying full-precision oversampled playback from a new device supporting HD-AAC.
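The bit-plane ordering of the residual can be sketched as follows. This Python example only shows the ordering and the effect of truncation; it assumes a simple sign-and-magnitude layout with made-up residual values, omits the entropy coding a real codec would apply to each plane, and uses hypothetical function names.

```python
def to_bit_planes(residuals, num_bits):
    """Split residual magnitudes into bit planes, most significant plane first."""
    signs = [r < 0 for r in residuals]
    mags = [abs(r) for r in residuals]
    planes = [[(m >> b) & 1 for m in mags] for b in range(num_bits - 1, -1, -1)]
    return signs, planes


def from_bit_planes(signs, planes, num_bits, planes_used):
    """Rebuild residuals from only the first planes_used planes."""
    mags = [0] * len(signs)
    for idx in range(planes_used):
        shift = num_bits - 1 - idx
        for i, bit in enumerate(planes[idx]):
            mags[i] |= bit << shift
    return [-m if neg else m for m, neg in zip(mags, signs)]


# Made-up differences between lossless and quantized coefficients.
residuals = [5, -3, 0, 12, -7]
signs, planes = to_bit_planes(residuals, num_bits=4)

print("all 4 planes:", from_bit_planes(signs, planes, 4, planes_used=4))
print("top 2 planes:", from_bit_planes(signs, planes, 4, planes_used=2))
```

With all planes received, the residual is recovered exactly and decoding is lossless. If the stream is cut after the top two planes, each value is still approximated to within the weight of the missing planes, which is why the bitrate can be reduced simply by dropping the tail of the enhancement data.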
HD-AAC on ARM Processors
The ARM processor architecture dominates the music player and mobile phone markets where AAC has been applied, and is Fraunhofer's natural choice for our first embedded port of HD-AAC. With the speed of today's ARM implementations, there is little need for separate DSP-based processors, even for encoding. AAC encoding and decoding is possible directly on the ARM processor through fixed-point arithmetic implementations of AAC. The integer nature of HD-AAC means it is already suitable for running on the general-purpose ARM processor. This simplifies product design, as the control, file system and user interface code can all run on the same processor as the audio codec.
Fraunhofer is now in the process of porting our floating-point reference code, used in the MPEG standardization process, to fixed-point production software that will run efficiently in embedded devices. As shown in Figure 6, we first convert the floating-point code to a generic fixed-point prototype by replacing floating-point operations with their fixed-point equivalents, perhaps using double-precision fixed-point or implementing an internal software block floating-point representation where necessary.
We will also configure the data storage for efficient use of an embedded processor's more limited memory, and we will remove the runtime overhead of calling transcendental functions and other math libraries, either by substituting our own optimized real-time ones or by pre-computing look-up tables. Once we've done this, and implemented some other techniques to compact and speed up the code, we will test it for stability and conformance. For encoders, this often means conducting double-blind listening tests, just as we do with our MPEG standards work.
When this code passes our tests, it becomes our fixed-point template code that is then modified only in a bit-exact way to port to each processor, taking advantage of specific memory models or hardware instructions to improve the execution speed. In this way, we're able to easily keep all of our ports current if we make an improvement, just by changing the fixed-point template and conditionally including the optimizations for each processor when we re-compile. For the ARM processor family, this is usually very straightforward, as the architecture does not require many special optimizations to our code.
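Replacing a floating-point operation with a fixed-point equivalent can be illustrated with a multiply in the common Q15 format, where a 16-bit integer represents a value in the range from -1 to just under 1. The Python below is a generic sketch, not Fraunhofer's code; the Q15 format choice and the helper names to_q15 and q15_mul are assumptions made for illustration.

```python
Q15_SHIFT = 15
Q15_MAX = (1 << Q15_SHIFT) - 1   # 32767
Q15_MIN = -(1 << Q15_SHIFT)      # -32768


def saturate(value):
    """Clamp to the 16-bit Q15 range instead of wrapping around."""
    return max(Q15_MIN, min(Q15_MAX, value))


def to_q15(x):
    """Convert a float in [-1, 1) to its nearest Q15 integer."""
    return saturate(round(x * (1 << Q15_SHIFT)))


def q15_mul(a, b):
    """Multiply two Q15 values with rounding, returning a saturated Q15 result."""
    product = a * b                            # Q30 intermediate (fits in 32 bits)
    rounded = product + (1 << (Q15_SHIFT - 1))  # add half an LSB before shifting
    return saturate(rounded >> Q15_SHIFT)


a, b = to_q15(0.5), to_q15(0.25)
print("0.5 * 0.25 in Q15:", q15_mul(a, b), "which is", q15_mul(a, b) / (1 << Q15_SHIFT))
print("-1 * -1 saturates to:", q15_mul(Q15_MIN, Q15_MIN))
```

The multiply itself becomes one integer multiplication, an addition and a shift, all of which a general-purpose integer processor executes directly. Saturation handles the single case, -1 times -1, whose true result does not fit in the format.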
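Pre-computing a look-up table to avoid runtime calls to a transcendental function might look like the following Python sketch. The table size, the Q15 scaling and the names SINE_TABLE and sin_q15 are illustrative assumptions; a production port would choose these to fit the memory and accuracy budget of the target device.

```python
import math

TABLE_SIZE = 1024        # entries per full sine period (assumed size)
Q15_ONE = 1 << 15

# Built once, ahead of time; on an embedded target this would be constant data.
SINE_TABLE = [
    min(Q15_ONE - 1, round(math.sin(2 * math.pi * i / TABLE_SIZE) * Q15_ONE))
    for i in range(TABLE_SIZE)
]


def sin_q15(phase_index):
    """Sine in Q15 for a phase given in units of 1/TABLE_SIZE of a full turn."""
    return SINE_TABLE[phase_index % TABLE_SIZE]


# Runtime use needs only an index operation, no math library call.
for index in (0, 128, 256, 512, 768):
    approx = sin_q15(index) / Q15_ONE
    exact = math.sin(2 * math.pi * index / TABLE_SIZE)
    print(f"index {index:4d}: table {approx:+.5f}  exact {exact:+.5f}")
```

The trade is memory for speed: the table occupies 1024 entries, but each evaluation costs a single array access.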
At this point, we expect that an HD-AAC decoder will take about twice the processing power of an AAC decoder. We expect this work to be completed in the next few months.
Further Fraunhofer Products on ARM Processors
In addition to HD-AAC, Fraunhofer offers many other audio and video codecs for the ARM processor architecture, as shown in Figure 7.