ADX (file format)
ADX is a lossy proprietary audio storage and compression format developed by CRI Middleware specifically for use in video games; it is derived from ADPCM. Its most notable feature is a looping function that has proved useful for background music in various games that have adopted the format, including many games for the Sega Dreamcast as well as some PlayStation 2 and GameCube games. One of the first games to use ADX was Burning Rangers, on the Sega Saturn. Notably, the Sonic the Hedgehog series, from the Dreamcast generation up to at least Shadow the Hedgehog, has used this format for music and voice recordings.

On top of the main ADPCM encoding, the ADX toolkit also includes a sibling format, AHX, which uses a variant of MPEG-2 audio intended specifically for voice recordings, and a packaging archive, AFS, for bundling multiple ADX and AHX tracks into a single container file.

General Overview

ADX is a compressed audio format, but unlike MP3 and similar formats, it does not apply a psychoacoustic model to the sound to reduce its complexity. The ADPCM model instead stores samples by recording the error relative to a prediction function, which means more of the original signal survives the encoding process. ADPCM compression therefore trades accuracy of the representation for size by using relatively small sample sizes, usually 4 bits. The human auditory system's tolerance for the noise this causes makes the loss of accuracy barely noticeable.

Like other encoding formats, ADX supports multiple sampling frequencies such as 22050 Hz, 44100 Hz, 48000 Hz, etc.; however, the output sample depth is locked at 16 bits, generally due to the lack of precision already mentioned. It supports multiple channels, but there seems to be an implicit limitation to stereo (2 channel) audio, although the file format itself can represent up to 255 channels. The only particularly distinctive feature that sets ADX apart from alternatives like IMA ADPCM (other than having a different prediction function) is the integrated looping functionality. This enables an audio player to optionally skip backwards after reaching a single specified point in the track to create a coherent loop; hypothetically, this functionality could be used to skip forwards as well, but that would be redundant since the audio could simply be clipped with an editing program instead.
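The looping behaviour described above amounts to a single conditional jump in the playback position. The following C99 sketch shows it in isolation; `next_sample_index` and its parameters are hypothetical names, and it assumes the loop end index is exclusive, which matches the decoder later in this article.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper (not from any ADX library): returns the index of the sample set
// to play after `position`. loop_end is treated as exclusive: reaching it jumps straight
// back to loop_begin instead of playing the sample at loop_end.
static uint32_t next_sample_index(uint32_t position, bool looping_enabled,
                                  uint32_t loop_begin, uint32_t loop_end)
{
    uint32_t next = position + 1;
    if (looping_enabled && next == loop_end)
        next = loop_begin; // skip backwards to form the loop
    return next;
}
```

Stepping from index 0 with loop_begin = 4 and loop_end = 10 would therefore play indices 0 to 9 once and then 4 to 9 repeatedly.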

For playback there are a few plugins for Winamp and a convert-to-WAV tool (see the references section). The open source program and library FFmpeg also has ADX support implemented; however, its decoder is hard-coded, so it can only properly decode 44100 Hz ADXs.

Technical Description

The ADX specification is not freely available; however, the most important elements of the structure have been reverse engineered and documented in various places on the web. The information here may be incomplete but should be sufficient to build a working codec or transcoder.

As a side note, the AFS archive files that ADXs are sometimes packed in are a simple variant of a tarball which uses numerical indices to identify the contents rather than names. Source code for an extractor can be found in the ADX archive at .
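For illustration, a minimal reader for such an index-addressed archive might look like the following. The AFS layout is not specified in this article, so the sketch assumes the layout commonly described by reverse engineers: an "AFS\0" signature, a little-endian 32-bit entry count, then a little-endian 32-bit offset and size for each entry. Treat those offsets as an assumption, and `afs_entry_count` and `afs_get_entry` as hypothetical helpers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Assumed AFS layout (reverse-engineered, not an official specification):
//   0x00  4 bytes   "AFS\0"
//   0x04  uint32 LE number of entries
//   0x08  per entry: uint32 LE offset, uint32 LE size
typedef struct { uint32_t offset, size; } afs_entry;

static uint32_t read_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

// Returns the entry count, or 0 if `data` does not look like an AFS image
static uint32_t afs_entry_count(const uint8_t *data, size_t len)
{
    if (len < 8 || memcmp(data, "AFS", 4) != 0) // compares 'A','F','S','\0'
        return 0;
    uint32_t count = read_u32le(data + 4);
    return count <= (len - 8) / 8 ? count : 0;  // the entry table must fit in the buffer
}

// Looks up entry `index` (entries have numbers, not names); returns false if the
// entry does not exist or points outside the buffer
static bool afs_get_entry(const uint8_t *data, size_t len, uint32_t index, afs_entry *out)
{
    if (index >= afs_entry_count(data, len))
        return false;
    out->offset = read_u32le(data + 8 + 8 * (size_t)index);
    out->size   = read_u32le(data + 12 + 8 * (size_t)index);
    return out->offset <= len && out->size <= len - out->offset;
}
```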

File Header

All multi-byte fields in the ADX on-disk format are big-endian. The identified sections of the main header are outlined below:
Offset                  Size  Field
0x00                    2     0x80 0x00
0x02                    2     Copyright Offset
0x04                    1     Encoding Type
0x05                    1     Block Size
0x06                    1     Sample Bitdepth
0x07                    1     Channel Count
0x08                    4     Sample Rate
0x0C                    4     Total Samples
0x10                    2     Highpass Frequency
0x12                    1     Version
0x13                    1     Flags
0x14                    4     Unknown
0x18                    4     Loop Enabled (v3)
0x1C                    4     Loop Begin Sample Index (v3)
0x20                    4     Loop Begin Byte Index (v3)
0x24                    4     Loop End Sample Index (v3) / Loop Enabled (v4)
0x28                    4     Loop End Byte Index (v3) / Loop Begin Sample Index (v4)
0x2C                    4     Loop Begin Byte Index (v4)
0x30                    4     Loop End Sample Index (v4)
0x34                    4     Loop End Byte Index (v4)
0x38                    ...   Zero or more bytes of empty space
[CopyrightOffset - 2]   6     ASCII (unterminated) string: "(c)CRI"
[CopyrightOffset + 4]   ...   Audio data starts here

Fields labelled "Unknown" contain either unknown data or are apparently just reserved (i.e. filled with null bytes). Fields labelled with 'v3' or 'v4' but not both are considered "Unknown" in the version they are not marked with. It should also be noted that this header may be as short as 20 bytes (0x14), as determined by the copyright offset, which implicitly removes support for a loop since those fields are not present.
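As a rough sketch of reading this header, the following C99 code decodes the big-endian fields from an in-memory buffer and picks the loop fields from the v3 or v4 positions in the table. The struct and function names are hypothetical, the field sizes are inferred from the table's byte columns, and versions other than 3 and 4 are simply treated as having no loop information.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical in-memory form of the header fields listed above
typedef struct {
    uint16_t copyright_offset;
    uint8_t  encoding_type, block_size, sample_bitdepth, channel_count;
    uint32_t sample_rate, total_samples;
    uint16_t highpass_frequency;
    uint8_t  version, flags;
    uint32_t loop_enabled, loop_begin_sample, loop_begin_byte, loop_end_sample, loop_end_byte;
} adx_header_fields;

// Big-endian readers: multi-byte fields are stored most significant byte first
static uint16_t be16(const uint8_t *p) { return (uint16_t)(p[0] << 8 | p[1]); }
static uint32_t be32(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 | (uint32_t)p[2] << 8 | p[3];
}

// Parses the header at the start of `buf`. Returns 0 on success, -1 if the 0x80 0x00
// signature or the "(c)CRI" string is missing. Loop fields stay zero when the header
// is too short to hold them or the version is not 3 or 4.
static int adx_parse_header(const uint8_t *buf, size_t len, adx_header_fields *h)
{
    *h = (adx_header_fields){0};
    if (len < 4 || buf[0] != 0x80 || buf[1] != 0x00)
        return -1;
    h->copyright_offset = be16(buf + 0x02);
    // The fixed fields occupy 0x00-0x13 and must precede the "(c)CRI" string
    if (h->copyright_offset < 0x16 || (size_t)h->copyright_offset + 4 > len
        || memcmp(buf + h->copyright_offset - 2, "(c)CRI", 6) != 0)
        return -1;

    h->encoding_type      = buf[0x04];
    h->block_size         = buf[0x05];
    h->sample_bitdepth    = buf[0x06];
    h->channel_count      = buf[0x07];
    h->sample_rate        = be32(buf + 0x08);
    h->total_samples      = be32(buf + 0x0C);
    h->highpass_frequency = be16(buf + 0x10);
    h->version            = buf[0x12];
    h->flags              = buf[0x13];

    // Version 3 keeps its five loop fields from 0x18, version 4 from 0x24
    size_t base = h->version == 0x03 ? 0x18 : h->version == 0x04 ? 0x24 : 0;
    if (base != 0 && base + 20 <= (size_t)h->copyright_offset - 2) {
        h->loop_enabled      = be32(buf + base);
        h->loop_begin_sample = be32(buf + base + 4);
        h->loop_begin_byte   = be32(buf + base + 8);
        h->loop_end_sample   = be32(buf + base + 12);
        h->loop_end_byte     = be32(buf + base + 16);
    }
    return 0;
}
```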

The "Encoding Type" field should contain one of:
  • 0x03 for Standard ADX
  • 0x04 for ADX with an exponential scale
  • 0x10 or 0x11 for AHX

The "Version" field should contain one of:
  • 0x02 for a variant of 'version 3' with a different fixed decoder
  • 0x03 for ADX 'version 3'
  • 0x04 for ADX 'version 4'
  • 0x05 for a variant of ADX 4 without looping support

When decoding AHX audio, the version field does not appear to have any meaning and can be safely ignored.

Sample Format

ADX encoded audio data is broken into a series of 'blocks', each containing data for only one channel. The blocks are then laid out in 'frames' which consist of one block from every channel in ascending order. For example, in a stereo (2 channel) stream this would consist of Frame 1: left channel block, right channel block; Frame 2: left, right; etc. Blocks are almost always 18 bytes in size and contain 4-bit samples, though other sizes are technically possible. An example of such a block looks like this:
Bytes 0-1: Scale
Bytes 2-17: 32 4-bit samples
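Because every frame holds one block per channel, the byte position of the block containing a given sample can be computed directly. A minimal sketch, where `adx_block_offset` is a hypothetical helper and `data_start` is the copyright offset plus 4 (where the audio data starts):

```c
#include <stdint.h>

// Hypothetical helper: byte offset of the block that holds sample `sample_index`
// of channel `channel`. Frames contain one block per channel in ascending order.
static uint64_t adx_block_offset(uint32_t data_start, uint32_t block_size,
                                 uint32_t channel_count, uint32_t bitdepth,
                                 uint64_t sample_index, uint32_t channel)
{
    // Samples per block: 32 for the usual 18-byte block with 4-bit samples
    uint64_t samples_per_block = (uint64_t)(block_size - 2) * 8 / bitdepth;
    uint64_t frame = sample_index / samples_per_block;
    return data_start + (frame * channel_count + channel) * block_size;
}
```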

The scale is a 16-bit unsigned integer (big-endian like the header) which is essentially the amplification of all the samples in that block. Each sample in the block must be decoded in bit-stream order, that is, most significant bit first. For example, when the sample size is 4 bits:
Bits 7-4: First sample
Bits 3-0: Second sample

The samples themselves are not stored in reverse, so there is no need to fiddle with them once they are extracted. Each sample is signed, so for this example the value can range between -8 and +7 (which will be multiplied by the scale during decoding). As an aside, although the header makes any bit depth between 1 and 255 possible, it is unlikely that one-bit samples would ever occur, as they can only represent the values {0, 1}, {-1, 0} or {-1, 1}, none of which is particularly useful for encoding music; if they were to occur, it is unclear which of the three possibilities would be the correct interpretation.
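One possible way to pull a 4-bit sample out of a block and sign-extend it is shown below. It is a sketch rather than a reference implementation: `sign_extend` fills in the helper that the decoder later in this section leaves to the reader (it handles bit depths from 1 to 31), and `adx_block_nibble` is a hypothetical helper that assumes the common 18-byte block.

```c
#include <stdint.h>

// Sign-extends the low `bits` bits of `value` (1 <= bits <= 31) to a signed 32-bit integer
int32_t sign_extend(uint32_t value, unsigned bits)
{
    uint32_t sign_bit = 1u << (bits - 1);
    value &= (sign_bit << 1) - 1;                   // keep only the sample's own bits
    return (int32_t)(value ^ sign_bit) - (int32_t)sign_bit;
}

// Hypothetical helper: the n-th 4-bit sample (n = 0..31) of an 18-byte block,
// taking the high nibble of each byte first
static int32_t adx_block_nibble(const uint8_t block[18], unsigned n)
{
    uint8_t byte = block[2 + n / 2];                // skip the 2-byte scale
    uint32_t nibble = (n % 2 == 0) ? byte >> 4 : byte & 0x0F;
    return sign_extend(nibble, 4);
}
```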

ADX Decoding

This section walks through decoding ADX 'version 3' or 'version 4' when "Encoding Type" is "Standard ADX" (0x03). An encoder can also be built by simply flipping the code to run in reverse. All code samples in this section are written using C99.
Before a 'standard' ADX can be either encoded or decoded, the set of prediction coefficients must be calculated. This is generally best done in the initialisation stage:

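#include <math.h>

double coefficient[2]; // Prediction coefficients, used by the decoder below

// Calculates the prediction coefficients from the header's highpass frequency and sample rate
void adx_calculate_coefficients( unsigned highpass_frequency, unsigned sample_rate )
{
    const double pi = acos(-1.0); // M_PI is not part of standard C99, so derive pi instead
    double a = sqrt(2.0) - cos(2.0 * pi * ((double)highpass_frequency / sample_rate));
    double b = sqrt(2.0) - 1.0;
    // (a+b)*(a-b) = a*a-b*b, however the simpler formula loses accuracy in floating point
    double c = (a - sqrt((a + b) * (a - b))) / b;

    coefficient[0] = c * 2.0;
    coefficient[1] = -(c * c);
}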

This code calculates prediction coefficients for predicting the current sample from the 2 previous samples. The same coefficients also define a second-order finite impulse response high-pass filter: the prediction error, x[n] - coefficient[0]*x[n-1] - coefficient[1]*x[n-2], is a high-pass filtered version of the signal.
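For reference, the same relationships in equation form, where f_hp is the highpass frequency, f_s the sample rate, s the block scale and d[n] the sign-extended stored sample. The last line is the filter that turns the signal into its prediction error; it factors into a double zero at z = c, which lies just below 1 for cutoffs far below the sample rate and is what gives it a high-pass response. The integer truncation of the prediction done in the code is ignored here.

```latex
\begin{aligned}
a &= \sqrt{2} - \cos\!\left(2\pi \frac{f_{hp}}{f_s}\right), \qquad
b = \sqrt{2} - 1, \qquad
c = \frac{a - \sqrt{(a+b)(a-b)}}{b} \\
\hat{x}[n] &= 2c\,x[n-1] - c^{2}\,x[n-2] \\
x[n] &= \hat{x}[n] + s\,d[n] \\
H(z) &= 1 - 2c\,z^{-1} + c^{2} z^{-2} = \left(1 - c\,z^{-1}\right)^{2}
\end{aligned}
```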

Once we know the decoding coefficients, we can start decoding the stream:

#include <stdbool.h>
#include <stdint.h>

// External helpers assumed by this walkthrough (not shown): a bit reader that returns the
// next bit_count bits most significant bit first, and sign extension (left as an exercise
// for the reader). ADX_header is the parsed header structure, also not shown.
void bitstream_seek( unsigned long bit_address );
uint_fast32_t bitstream_read( unsigned bit_count );
int32_t sign_extend( uint32_t value, unsigned bits );

static int32_t* past_samples; // Previously decoded samples from each channel, zeroed at start (size = 2*channel_count)
static uint_fast32_t sample_index = 0; // sample_index is the index of sample set that needs to be decoded next
static ADX_header* adx_header;

// buffer is where the decoded samples will be put
// samples_needed states how many sample 'sets' (one sample from every channel) need to be decoded to fill the buffer
// looping_enabled is a boolean flag to control use of the built-in loop
// Returns the number of sample 'sets' in the buffer that could not be filled (EOS)
unsigned decode_adx_standard( int16_t* buffer, unsigned samples_needed, bool looping_enabled )
{
    unsigned const samples_per_block = (adx_header->block_size - 2) * 8 / adx_header->sample_bitdepth;
    uint16_t scale[ adx_header->channel_count ]; // The scale is an unsigned 16-bit value

    if (looping_enabled && !adx_header->loop_enabled)
        looping_enabled = false;

    // Loop until the requested number of samples are decoded, or the end of file is reached
    while (samples_needed > 0 && sample_index < adx_header->total_samples)
    {
        // Calculate the number of samples that are left to be decoded in the current block
        unsigned sample_offset = sample_index % samples_per_block;
        unsigned samples_can_get = samples_per_block - sample_offset;

        // Clamp the samples we can get during this run if they won't fit in the buffer
        if (samples_can_get > samples_needed)
            samples_can_get = samples_needed;

        // Clamp the number of samples to be acquired if the stream isn't long enough or the loop trigger is nearby
        if (looping_enabled && sample_index + samples_can_get > adx_header->loop_end_index)
            samples_can_get = adx_header->loop_end_index - sample_index;
        else if (sample_index + samples_can_get > adx_header->total_samples)
            samples_can_get = adx_header->total_samples - sample_index;

        // Calculate the bit address of the start of the frame that sample_index resides in and record that location
        unsigned long started_at = (adx_header->copyright_offset + 4 +
            sample_index / samples_per_block * adx_header->block_size * adx_header->channel_count) * 8;

        // Read the scale values from the start of each block in this frame
        for (unsigned i = 0 ; i < adx_header->channel_count ; ++i)
        {
            bitstream_seek( started_at + adx_header->block_size * i * 8 );
            scale[i] = bitstream_read( 16 ); // Read MSB first, so the big-endian value needs no byte swap
        }

        // Pre-calculate the stop value for sample_offset
        unsigned sample_endoffset = sample_offset + samples_can_get;

        // Save the bitstream address of the first sample immediately after the scale in the first block of the frame
        started_at += 16;
        while ( sample_offset < sample_endoffset )
        {
            for (unsigned i = 0 ; i < adx_header->channel_count ; ++i)
            {
                // Predict the next sample
                double sample_prediction = coefficient[0] * past_samples[i*2 + 0] + coefficient[1] * past_samples[i*2 + 1];

                // Seek to the sample offset, read it and sign extend it to a 32bit integer
                bitstream_seek( started_at + adx_header->sample_bitdepth * sample_offset +
                                adx_header->block_size * 8 * i );
                int_fast32_t sample_error = bitstream_read( adx_header->sample_bitdepth );
                sample_error = sign_extend( sample_error, adx_header->sample_bitdepth );

                // Scale the error correction value
                sample_error *= scale[i];

                // Calculate the sample by combining the prediction with the error correction
                int_fast32_t sample = sample_error + (int_fast32_t)sample_prediction;

                // Clamp the decoded sample to the valid range for a 16bit integer
                if (sample > 32767)
                    sample = 32767;
                else if (sample < -32768)
                    sample = -32768;

                // Update the past samples with the newer (clamped) sample
                past_samples[i*2 + 1] = past_samples[i*2 + 0];
                past_samples[i*2 + 0] = sample;

                // Save the sample to the buffer then advance one place
                *buffer++ = sample;
            }
            ++sample_offset;  // We've decoded one sample from every block, advance block offset by 1
            ++sample_index;   // This also means we're one sample further into the stream
            --samples_needed; // And so there is one less set of samples that need to be decoded
        }

        // Check if we hit the loop end marker, if we did we need to jump to the loop start
        if (looping_enabled && sample_index == adx_header->loop_end_index)
            sample_index = adx_header->loop_start_index;
    }

    return samples_needed;
}

Most of the above code should be straightforward enough for anyone versed in C. The 'adx_header' pointer refers to the data extracted from the header as outlined earlier; it is assumed to have already been converted to host endianness. This implementation is not intended to be optimal, and external concerns have been ignored, such as the specific method for sign extension and the method of acquiring a bitstream from a file or network source. Once it completes, the output buffer will hold up to the requested number of sample sets (pairs of samples for stereo, for example). The decoded samples will be in host-endian standard interleaved PCM format, i.e. left 16 bit, right 16 bit, left, right, etc. Finally, if looping is not enabled, or not supported, then the function will return the number of sample sets in the buffer that could not be filled. The caller can test whether this value is nonzero to detect the end of the stream and drop or write silence into the unused spaces if necessary.
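To see the per-sample arithmetic without the bitstream and looping machinery, here is a self-contained sketch specialised to the common case of 18-byte blocks with 4-bit samples. `adx_decode_block4` is a hypothetical name; `coef` is assumed to hold the two prediction coefficients computed earlier, and `hist` the channel's two previous output samples (most recent first), which the function updates in place so that consecutive blocks of the same channel can be decoded in sequence. It clamps each sample to 16 bits before storing it in the history.

```c
#include <stdint.h>

// Decodes one 18-byte, 4-bit block of a single channel into 32 output samples
static void adx_decode_block4(const uint8_t block[18], const double coef[2],
                              int32_t hist[2], int16_t out[32])
{
    int32_t scale = (int32_t)(block[0] << 8 | block[1]);     // big-endian 16-bit scale
    for (unsigned n = 0; n < 32; ++n) {
        uint8_t byte = block[2 + n / 2];
        int32_t d = (n % 2 == 0) ? byte >> 4 : byte & 0x0F;  // high nibble first
        if (d >= 8)
            d -= 16;                                         // sign extend the 4-bit value
        int32_t s = d * scale + (int32_t)(coef[0] * hist[0] + coef[1] * hist[1]);
        if (s > 32767)                                       // clamp to the 16-bit range
            s = 32767;
        else if (s < -32768)
            s = -32768;
        hist[1] = hist[0];                                   // shift the prediction history
        hist[0] = s;
        out[n] = (int16_t)s;
    }
}
```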

Encryption

ADX supports a simple encryption scheme which XORs values from a linear congruential pseudorandom number generator with the block scale values. This method is computationally inexpensive to decrypt (in keeping with ADX's real-time decoding) yet renders the encrypted files unusable. The encryption is active when the "Flags" value in the header is 0x08. As XOR is symmetric, the same method is used to decrypt as to encrypt. The encryption key is a set of three 16-bit values: the multiplier, increment, and start values for the linear congruential generator (the modulus is 0x8000 to keep the values in the 15-bit range of valid block scales). Typically all ADX files from a single game will use the same key.
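A minimal sketch of the scheme in C99 follows. The key struct and helper names are hypothetical, and two details that the description above does not pin down are assumptions: the generator advances once per block in file order (silent blocks included), and the first block is XORed with the start value itself. Because only the scale is touched, a block's silence can be detected identically before and after encryption, and running the function twice restores the original data.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical key representation: the three generator parameters
typedef struct { uint16_t multiplier, increment, start; } adx_key;

// A block is silent when every sample byte after the 2-byte scale is zero
static bool adx_block_is_silent(const uint8_t *block, size_t block_size)
{
    for (size_t i = 2; i < block_size; ++i)
        if (block[i] != 0)
            return false;
    return true;
}

// Encrypts or decrypts (XOR is its own inverse) the scale of each of `block_count`
// consecutive blocks in `data`. Assumption: the generator steps once per block.
static void adx_crypt_scales(uint8_t *data, size_t block_count, size_t block_size, adx_key key)
{
    uint32_t x = key.start & 0x7FFF;
    for (size_t b = 0; b < block_count; ++b) {
        uint8_t *block = data + b * block_size;
        if (!adx_block_is_silent(block, block_size)) { // silent blocks stay unencrypted
            block[0] ^= (uint8_t)(x >> 8);              // the scale is big-endian
            block[1] ^= (uint8_t)(x & 0xFF);
        }
        x = (x * key.multiplier + key.increment) & 0x7FFF; // modulus 0x8000
    }
}
```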

The encryption method is vulnerable to known-plaintext attacks. If an unencrypted version of the same audio is known, the random number stream can be easily retrieved, and from it the key parameters can be determined, rendering every ADX encrypted with that same key decryptable. The encryption method attempts to make this more difficult by not encrypting silent blocks (with all sample nybbles equal to 0), as their scale is known to be 0.
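The attack can be sketched as follows. XORing each known plaintext scale with its encrypted counterpart yields the generator's outputs; since the modulus is only 0x8000, every multiplier can simply be tried, with the increment then fixed by the first two outputs. The sketch assumes the outputs come from consecutive generator steps (no skipped silent blocks in between) and that the first one is the start value; `adx_recover_key` is a hypothetical name. Because the modulus is a power of two, several parameter sets may fit a short sequence, so more outputs narrow the result down.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint16_t multiplier, increment, start; } adx_key;

// Given n >= 3 consecutive 15-bit generator outputs x[0..n-1], finds generator
// parameters that reproduce all of them. Returns false if none fit.
static bool adx_recover_key(const uint16_t *x, size_t n, adx_key *key)
{
    if (n < 3)
        return false;
    for (uint32_t m = 0; m < 0x8000; ++m) {
        uint32_t inc = (x[1] - m * x[0]) & 0x7FFF;       // from x1 = m*x0 + inc (mod 0x8000)
        bool ok = true;
        for (size_t i = 1; i + 1 < n && ok; ++i)         // check every later step
            ok = ((m * x[i] + inc) & 0x7FFF) == (x[i + 1] & 0x7FFFu);
        if (ok) {
            key->multiplier = (uint16_t)m;
            key->increment = (uint16_t)inc;
            key->start = x[0];
            return true;
        }
    }
    return false;
}
```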

Even if the encrypted ADX is the only sample available, it is possible to determine a key by assuming that the scale values of the decrypted ADX must fall within a "low range". This method does not necessarily find the key used to encrypt the file, however. While it can always determine keys that produce an apparently correct output, errors may exist undetected. This is due to the increasingly random distribution of the lower bits of the scale values, which becomes impossible to separate from the randomness added by the encryption.
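The scoring step of such a ciphertext-only search could look like the sketch below: it decrypts the scales of consecutive non-silent blocks with a candidate key and counts how many fall at or below a threshold. The threshold `low_limit` is an assumption standing in for the "low range" mentioned above, the generator is assumed to start at the start value and step once per block, and `adx_score_key` is a hypothetical name. A full search would score every candidate key this way and keep the best; as explained above, a perfect score does not prove that the key is the one originally used.

```c
#include <stddef.h>
#include <stdint.h>

// Counts how many of `n` encrypted scales decrypt to a value <= low_limit
// under the candidate generator parameters
static size_t adx_score_key(const uint16_t *enc_scales, size_t n,
                            uint16_t multiplier, uint16_t increment, uint16_t start,
                            uint16_t low_limit)
{
    uint32_t x = start & 0x7FFF;
    size_t hits = 0;
    for (size_t i = 0; i < n; ++i) {
        if ((uint16_t)(enc_scales[i] ^ x) <= low_limit)
            ++hits;
        x = (x * multiplier + increment) & 0x7FFF; // modulus 0x8000
    }
    return hits;
}
```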

AHX Decoding

As noted earlier, AHX is just an implementation of MPEG-2 audio, and the decoding method is basically the same as the standard, so it is possible just to demux the stream from the ADX container and feed it through a standard MPEG audio decoder like mpg123. The ADX header's "sample rate" and "total samples" are usually correct if a decoder needs them (so they should be set by encoder/muxer implementations), but most of the other fields, such as the "block size" and "sample bitdepth", will usually be zero; as noted above, the looping functionality is also unavailable.
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.