Speex


 
 

Speex is a free software speech codec that may be used on VoIP applications and podcasts. Speex is free of any patent restrictions and is licensed under the revised (3-clause) BSD license. It may be used with the Ogg container format or directly transmitted over UDP/RTP.

The Speex designers see their project as complementary to the Vorbis general-purpose audio compression project.

Speex is a lossy format, meaning quality is permanently degraded to reduce file size.

Description

Unlike many other speech codecs, Speex is not targeted at cellular telephony but rather at Voice over IP (VoIP) and file-based compression. The design goal was a codec optimized for high-quality speech at low bit rates. To achieve this, the codec uses multiple bit rates and supports ultra-wideband (32 kHz sampling rate), wideband (16 kHz sampling rate) and narrowband (telephone quality, 8 kHz sampling rate) operation. Designing for Voice over IP instead of cell phone use means that Speex must be robust to lost packets, but not to corrupted ones, since the User Datagram Protocol (UDP) ensures that packets either arrive unaltered or don't arrive at all. All this led to the choice of Code Excited Linear Prediction (CELP) as the encoding technique for Speex. One of the main reasons is that CELP had long proven that it could do the job and scale well to both low bit rates (as evidenced by DoD CELP at 4.8 kbit/s) and high bit rates (as with G.728 at 16 kbit/s).
The main characteristics can be summarized as follows:
  • Free software/open-source, patent- and royalty-free
  • Integration of narrowband and wideband in the same bit-stream
  • Wide range of bit rates available (from 2 kbit/s to 44 kbit/s)
  • Dynamic bit-rate switching and variable bit-rate (VBR)
  • Voice Activity Detection (VAD, integrated with VBR)
  • Variable complexity
  • Ultra-wideband mode at 32 kHz (up to 48 kHz)
  • Intensity stereo encoding option
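
To put the 2 to 44 kbit/s range in per-packet terms, the short calculation below converts a bit rate into the payload size of a single frame. It is a sketch that assumes Speex's usual 20 ms frame (50 frames per second), which the list above does not state, and it ignores RTP/UDP/IP or Ogg overhead; the helper name `bits_per_frame` is hypothetical.

```python
FRAME_MS = 20  # assumption: Speex's usual 20 ms frame; overhead is ignored


def bits_per_frame(bitrate_bps: int) -> float:
    """Coded bits carried by one frame at the given bit rate."""
    return bitrate_bps * FRAME_MS / 1000


# The two ends of the bit-rate range quoted in the list above.
for kbps in (2, 44):
    bits = bits_per_frame(kbps * 1000)
    print(f"{kbps} kbit/s: {bits:.0f} bits = {bits / 8:.0f} bytes per frame")
```

Under that assumption, a frame carries roughly 5 bytes at the lowest rate and 110 bytes at the highest.
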

Features

Sampling rate: Speex is mainly designed for three different sampling rates: 8 kHz (the sampling rate used for telephone calls), 16 kHz, and 32 kHz. These are respectively referred to as narrowband, wideband and ultra-wideband.
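
The three modes can be summarized as a small lookup table, together with the number of PCM samples each frame holds. This is a minimal sketch: the sampling rates come from the text, while the 20 ms frame length is an assumption based on Speex's usual configuration, and `SPEEX_MODES` and `samples_per_frame` are illustrative names, not part of any Speex API.

```python
FRAME_MS = 20  # assumption: Speex's usual frame length

SPEEX_MODES = {  # mode name -> sampling rate in Hz (values from the text)
    "narrowband": 8_000,
    "wideband": 16_000,
    "ultra-wideband": 32_000,
}


def samples_per_frame(mode: str) -> int:
    """Number of PCM samples in one frame for the given mode."""
    return SPEEX_MODES[mode] * FRAME_MS // 1000


for mode, rate in SPEEX_MODES.items():
    print(f"{mode}: {rate} Hz, {samples_per_frame(mode)} samples per frame")
```

Each step up in mode doubles both the sampling rate and the samples per frame, while the frame duration stays the same.
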
Quality: Speex encoding is controlled most of the time by a quality parameter that ranges from 0 to 10. In constant bit-rate (CBR) operation, the quality parameter is an integer, while for variable bit-rate (VBR), the parameter is a real number.
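
The difference between the CBR and VBR quality parameters can be expressed as a validation rule. The sketch below is hypothetical: `check_quality` is not a Speex function, it only encodes the rule stated above (an integer from 0 to 10 for CBR, a real number in the same range for VBR). In the libspeex C API the two cases are set through separate encoder requests, `SPEEX_SET_QUALITY` (an `int`) and `SPEEX_SET_VBR_QUALITY` (a `float`).

```python
def check_quality(quality: float, vbr: bool) -> float:
    """Return `quality` unchanged if it is valid for the chosen mode.

    Hypothetical helper: CBR requires an integer 0..10, VBR a real 0..10.
    """
    if not vbr and not isinstance(quality, int):
        raise TypeError("CBR quality must be an integer from 0 to 10")
    if not 0 <= quality <= 10:
        raise ValueError("quality must lie between 0 and 10")
    return quality


print(check_quality(8, vbr=False))   # an integer is accepted in CBR
print(check_quality(7.5, vbr=True))  # a fractional value is accepted in VBR
```

Passing `7.5` with `vbr=False` raises `TypeError`, mirroring the integer-only CBR setting.
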
Complexity (variable): With Speex, it is possible to vary the complexity allowed for the encoder. This is done by controlling how the search is performed, with an integer ranging from 1 to 10, in a way that is similar to the -1 to -9 options of the gzip compression utility. For normal use, the noise level at complexity 1 is between 1 and 2 dB higher than at complexity 10, but the CPU requirements for complexity 10 are about five times higher than for complexity 1. In practice, the best trade-off is between complexity 2 and 4, though higher settings are often useful when encoding non-speech sounds like DTMF tones, or if encoding is not in real time.
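
The trade-off is easier to judge in linear terms. A noise level that is d dB higher corresponds to a noise power ratio of 10^(d/10), so the 1 to 2 dB penalty at complexity 1 means roughly 26% to 58% more noise power, bought in exchange for about a fifth of the CPU time. The figures come from the text; the helper name is illustrative.

```python
def db_to_power_ratio(db: float) -> float:
    """Convert a level difference in dB into a power ratio."""
    return 10 ** (db / 10)


# 1-2 dB more noise at complexity 1 than at complexity 10 (from the text).
for extra_db in (1.0, 2.0):
    ratio = db_to_power_ratio(extra_db)
    print(f"+{extra_db:.0f} dB -> {ratio:.2f}x the noise power of complexity 10")

cpu_ratio = 5  # complexity 10 needs about five times the CPU of complexity 1
print(f"complexity 1 uses about {1 / cpu_ratio:.0%} of the CPU time")
```
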
Variable Bit-Rate (VBR): Variable bit-rate (VBR) allows a codec to change its bit rate dynamically to adapt to the "difficulty" of the audio being encoded. In the case of Speex, sounds like vowels and high-energy transients require a higher bit rate to achieve good quality, while fricatives (e.g. s and f sounds) can be coded adequately with fewer bits. For this reason, VBR can achieve a lower bit rate for the same quality, or better quality for a given bit rate. Despite its advantages, VBR has two main drawbacks. First, by only specifying quality, there is no guarantee about the final average bit-rate. Second, for some real-time applications like Voice over IP (VoIP), what counts is the maximum bit-rate, which must be low enough for the communication channel.
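
A toy model makes both drawbacks concrete: when each frame's rate follows its content, the average depends on the material, and a constrained channel has to accommodate the peak. This is only an illustration; the sound classes and the kbit/s values in `RATE_KBPS` are invented and are not Speex's actual rate tables.

```python
from statistics import mean

# Hypothetical per-class rates in kbit/s (invented for illustration).
RATE_KBPS = {"silence": 2, "fricative": 8, "vowel": 16, "transient": 20}


def vbr_rates(frame_classes):
    """Map a sequence of per-frame sound classes to per-frame bit rates."""
    return [RATE_KBPS[c] for c in frame_classes]


frames = ["silence", "fricative", "vowel", "vowel", "transient", "fricative"]
rates = vbr_rates(frames)
print(f"average: {mean(rates):.2f} kbit/s")  # set by the content, not in advance
print(f"peak: {max(rates)} kbit/s")          # what a fixed channel must carry
```
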
Average Bit-Rate (ABR): Average bit-rate solves one of the problems of VBR: it dynamically adjusts the VBR quality in order to meet a specific target bit-rate. Because the quality/bit-rate is adjusted in real time (open-loop), the overall quality will be slightly lower than that obtained by encoding in VBR with exactly the right quality setting to meet the target average bit-rate.
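
The idea of steering VBR quality toward a target average can be sketched as a simple feedback loop: after every frame, compare the running average with the target and nudge the quality down or up. This is a hypothetical controller, not the one inside Speex; the rate model in `frame_rate_kbps`, the step size and the difficulty values are all assumptions chosen for illustration.

```python
def frame_rate_kbps(quality: float, difficulty: float) -> float:
    # Hypothetical rate model: harder frames and higher quality cost more bits.
    return (2.0 + 2.0 * quality) * difficulty


def abr_encode(difficulties, target_kbps, quality=5.0, step=0.5):
    """Return (average kbit/s, final quality) for a simulated ABR run."""
    total = 0.0
    for i, d in enumerate(difficulties, start=1):
        total += frame_rate_kbps(quality, d)
        running_avg = total / i
        # Open-loop correction: only frames already encoded inform the next
        # quality setting, so the encoder never revisits past decisions.
        if running_avg > target_kbps:
            quality = max(0.0, quality - step)
        else:
            quality = min(10.0, quality + step)
    return total / len(difficulties), quality


difficulties = [1.0, 1.5, 0.5, 1.2] * 50  # hypothetical per-frame difficulty
avg, final_q = abr_encode(difficulties, target_kbps=10.0)
print(f"average {avg:.2f} kbit/s, final quality {final_q:.1f}")
```

Because quality keeps oscillating around the setting that would exactly hit the target, individual frames are sometimes coded below that ideal setting, which is one way to picture the slight quality loss described above.
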
Voice Activity Detection (VAD): When enabled, voice activity detection determines whether the audio being encoded is speech or silence/background noise. VAD is always implicitly activated when encoding in VBR, so the option is only useful in non-VBR operation. In this case, Speex detects non-speech periods and encodes them with just enough bits to reproduce the background noise. This is called "comfort noise generation" (CNG).
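
The basic job of a voice activity detector, labeling each frame as speech or not, can be shown with a deliberately simple rule: treat a frame as speech when its mean energy exceeds a threshold. This energy-threshold detector is a swapped-in illustration, not the detector Speex uses, and the sample values and threshold are made up.

```python
def frame_energy(samples):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in samples) / len(samples)


def simple_vad(frames, threshold):
    """Return True for frames judged to contain speech (toy rule)."""
    return [frame_energy(f) > threshold for f in frames]


quiet = [10, -12, 8, -9] * 40              # low-level background noise
loud = [3000, -2800, 3100, -2900] * 40     # stand-in for a voiced frame
print(simple_vad([quiet, loud, quiet], threshold=1e4))  # [False, True, False]
```

In a scheme like the one described above, frames labeled `False` would be coded with just enough bits for comfort noise.
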
Discontinuous Transmission (DTX): Discontinuous transmission is an addition to VAD/VBR operation that allows the encoder to stop transmitting completely when the background noise is stationary. In a file, 5 bits are used for each missing frame (corresponding to 250 bit/s).
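
The 250 bit/s figure follows from the 5 bits per missing frame once a frame rate is fixed. The calculation below assumes 20 ms frames (50 frames per second), which is consistent with the numbers in the text, and also shows what a minute of stationary background noise would cost under DTX.

```python
BITS_PER_MISSING_FRAME = 5            # from the text
FRAME_MS = 20                         # assumption: Speex's usual frame length
FRAMES_PER_SECOND = 1000 // FRAME_MS  # 50 frames per second

dtx_rate_bps = BITS_PER_MISSING_FRAME * FRAMES_PER_SECOND
print(dtx_rate_bps, "bit/s")  # prints 250 bit/s

# One minute of stationary background noise written to a file under DTX.
silence_bytes = dtx_rate_bps * 60 // 8
print(silence_bytes, "bytes")  # prints 1875 bytes
```
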
Perceptual enhancement: Perceptual enhancement is a part of the decoder which, when turned on, tries to reduce the perception of the noise produced by the coding/decoding process. In most cases, perceptual enhancement objectively moves the sound further from the original (as measured by signal-to-noise ratio), but it still sounds better in the end (a subjective improvement).
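
The objective measure mentioned above, signal-to-noise ratio, compares the decoded signal sample by sample with the original. The sketch below computes it with a hypothetical `snr_db` helper; it cannot capture the subjective improvement, which is exactly why an enhanced output with a lower SNR can still be preferred by listeners.

```python
import math


def snr_db(reference, decoded):
    """Signal-to-noise ratio of `decoded` against `reference`, in dB."""
    signal = sum(r * r for r in reference)
    noise = sum((r - d) ** 2 for r, d in zip(reference, decoded))
    return 10 * math.log10(signal / noise)


reference = [1.0, -1.0, 1.0, -1.0]
decoded = [0.9, -0.9, 0.9, -0.9]
print(f"{snr_db(reference, decoded):.1f} dB")  # prints 20.0 dB
```
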
Algorithmic delay: Every codec introduces a delay in the transmission. For Speex, this delay is equal to the frame size, plus some amount of "look-ahead" required to process each frame. In narrowband operation (8 kHz), the delay is 30 ms, while for wideband (16 kHz), the delay is 34 ms. These values don't account for the CPU time it takes to encode or decode the frames.
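
The delays above can also be expressed in samples at each mode's sampling rate. The millisecond figures come from the text and, like the text, exclude CPU processing time. If Speex's usual 20 ms frame is assumed, the 30 ms narrowband figure corresponds to one frame plus 10 ms of look-ahead; that split is an inference, not something the text states.

```python
DELAYS = {  # mode: (sampling rate in Hz, algorithmic delay in ms), from the text
    "narrowband": (8_000, 30),
    "wideband": (16_000, 34),
}


def delay_in_samples(rate_hz: int, delay_ms: int) -> int:
    """Convert a delay in milliseconds into a number of samples."""
    return rate_hz * delay_ms // 1000


for mode, (rate, ms) in DELAYS.items():
    print(f"{mode}: {ms} ms = {delay_in_samples(rate, ms)} samples")
```
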

Large application base

There is already a large base of applications supporting the Speex codec, ranging from streaming applications such as teleconferencing to video games and audio processing applications. Most of these are based on components such as the DirectShow filter or the OpenACM codec (used, for example, by NetMeeting) on Microsoft Windows, or OpenH323 on Linux. There are also plugins for the Winamp and XMMS players, and KSP Sound Player (from version 2006.0.0.2) and foobar2000 support Speex as well.

The media type for Speex is audio/ogg when contained in Ogg, and audio/x-speex when transported through RTP or without a container.
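
When an application has to label Speex data, the rule above reduces to a small lookup. The function below is a hypothetical helper that simply encodes the media types given in the text; the transport labels `"ogg"`, `"rtp"` and `"raw"` are assumptions made for this sketch.

```python
def speex_media_type(transport: str) -> str:
    """Return the media type for Speex data carried over `transport`."""
    if transport == "ogg":
        return "audio/ogg"       # Speex inside an Ogg container
    if transport in ("rtp", "raw"):
        return "audio/x-speex"   # RTP transport or no container
    raise ValueError(f"unknown transport: {transport!r}")


print(speex_media_type("ogg"))
print(speex_media_type("rtp"))
```
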

See the plugin and software page on the Speex website for more details.

Microsoft's Xbox Live uses Speex for the headsets, as announced by Ralph Giles, the Theora codec maintainer, on LugRadio.

The latest Half-Life 1 engine and its mods use the voice_speex.dll codec for in-game VoIP. It is not enabled by default: server administrators must enable it by entering "sv_voicecodec voice_speex" in the server console, either through rcon or at the physical server computer. Speex provides much better quality than the default Miles voice codec.

The United States Army's Land Warrior system, designed by General Dynamics, also uses Speex for VoIP on an EPLRS radio designed by Raytheon.

In Sid Meier's Civilization IV, Speex is used to encode the descriptions of the technologies as read by Leonard Nimoy.

The VoIP program TeamSpeak uses Speex as one of its three available codecs, with quality settings ranging from 3.4 kbit/s to 25.9 kbit/s. Many servers prefer the Speex codec because of its good quality whether few or many people are in a room.

The Rockbox project uses Speex for its voice interface. It can also play Speex files on supported players, such as the Apple iPod or the iRiver H10.

The handheld data acquisition device for science education uses Speex for voice annotations created by students and teachers using either the built-in or an external microphone.

The Flash Player 10 Beta from Adobe includes support for the Speex audio codec.

See also

  • Comparison of audio codecs
  • DSS (Digital Speech Standard), a proprietary speech audio compression standard
