Speex
Speex is a patent-free audio compression format designed for speech, and also a free software speech codec that may be used in VoIP applications and podcasts. It is based on the CELP speech coding algorithm. Speex claims to be free of any patent restrictions and is licensed under the revised (3-clause) BSD license. It may be used with the Ogg container format or transmitted directly over UDP/RTP.

The Speex designers see their project as complementary to the Vorbis general-purpose audio compression project.

Speex is a lossy format, meaning quality is permanently degraded to reduce file size.

The Speex project was created on February 13, 2002. The first development versions of Speex were released under the LGPL, but as of version 1.0 beta 1, Speex is released under Xiph's version of the (revised) BSD license. Speex 1.0 was announced on March 24, 2003, after a year of development. The last stable version of the Speex encoder and decoder is 1.1.12.

Description

Unlike many other speech codecs, Speex is not targeted at cellular telephony but rather at Voice over IP (VoIP) and file-based compression. The design goal was a codec optimized for high-quality speech at low bit rates. To achieve this, the codec uses multiple bit rates and supports ultra-wideband (32 kHz sampling rate), wideband (16 kHz sampling rate) and narrowband (telephone quality, 8 kHz sampling rate). Since Speex was designed for VoIP instead of cell phone use, the codec must be robust to lost packets, but not to corrupted ones. All this led to the choice of Code Excited Linear Prediction (CELP) as the encoding technique for Speex. One of the main reasons is that CELP has long proven that it can do the job and scale well to both low bit rates (as evidenced by DoD CELP at 4.8 kbit/s) and high bit rates (as with G.728 at 16 kbit/s).
The main characteristics can be summarized as follows:
  • Free software/open-source, patent- and royalty-free.
  • Integration of narrowband and wideband in the same bit-stream.
  • Wide range of bit rates available (from 2 kbit/s to 44 kbit/s).
  • Dynamic bit rate switching and Variable bit-rate (VBR).
  • Voice Activity Detection (VAD, integrated with VBR) (not working from version 1.2).
  • Variable complexity.
  • Ultra-wideband mode at 32 kHz (up to 48 kHz).
  • Intensity stereo encoding option.

Features

Sampling rate: Speex is mainly designed for three different sampling rates: 8 kHz (the same sampling rate used to transmit telephone calls), 16 kHz, and 32 kHz. These are respectively referred to as narrowband, wideband and ultra-wideband.
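
The mapping between sampling rate and band name can be sketched in a few lines of Python. This is only an illustration: the names BANDS, band_name and samples_per_frame are hypothetical, and the 20 ms frame duration is an assumption that the text above does not state.

```python
# Band names for the three sampling rates Speex is mainly designed for.
BANDS = {8000: "narrowband", 16000: "wideband", 32000: "ultra-wideband"}

# Assumption: every mode uses 20 ms frames (not stated in the text above).
FRAME_MS = 20


def band_name(sample_rate_hz):
    """Return the band name for one of the three main sampling rates."""
    if sample_rate_hz not in BANDS:
        raise ValueError(f"unsupported sampling rate: {sample_rate_hz} Hz")
    return BANDS[sample_rate_hz]


def samples_per_frame(sample_rate_hz):
    """Number of samples in one frame under the 20 ms assumption."""
    band_name(sample_rate_hz)  # rejects rates outside the table
    return sample_rate_hz * FRAME_MS // 1000
```

Under that assumption, a narrowband frame holds 160 samples and a wideband frame 320.
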
Quality: Speex encoding is controlled most of the time by a quality parameter that ranges from 0 to 10. In constant bit-rate (CBR) operation, the quality parameter is an integer, while for variable bit-rate (VBR), the parameter is a real (floating point) number.
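
The integer-versus-real distinction can be made concrete with a small validation sketch. The helper check_quality below is hypothetical and is not part of any Speex API; it only encodes the rule stated above.

```python
def check_quality(value, vbr):
    """Validate a quality setting in the 0 to 10 range.

    CBR accepts integers only; VBR also accepts real numbers.
    """
    if isinstance(value, bool):
        raise TypeError("quality must be a number, not a bool")
    if vbr:
        if not isinstance(value, (int, float)):
            raise TypeError("VBR quality must be a real number")
        quality = float(value)
    else:
        if not isinstance(value, int):
            raise TypeError("CBR quality must be an integer")
        quality = value
    if not 0 <= quality <= 10:
        raise ValueError("quality must be between 0 and 10")
    return quality
```

For example, check_quality(8, vbr=False) and check_quality(7.5, vbr=True) are accepted, while check_quality(7.5, vbr=False) is rejected.
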
Complexity (variable): With Speex, it is possible to vary the complexity allowed for the encoder. This is done by controlling how the search is performed with an integer ranging from 1 to 10, in a way similar to the -1 to -9 options of the gzip compression utility. For normal use, the noise level at complexity 1 is between 1 and 2 dB higher than at complexity 10, but the CPU requirements for complexity 10 are about five times higher than for complexity 1. In practice, the best trade-off is between complexity 2 and 4, though higher settings are often useful when encoding non-speech sounds like DTMF tones, or if encoding is not in real time.
Variable Bit-Rate (VBR): Variable bit-rate (VBR) allows a codec to change its bit rate dynamically to adapt to the "difficulty" of the audio being encoded. In the case of Speex, sounds like vowels and high-energy transients require a higher bit rate to achieve good quality, while fricatives (e.g. s and f sounds) can be coded adequately with fewer bits. For this reason, VBR can achieve a lower bit rate for the same quality, or better quality for a given bit rate. Despite its advantages, VBR has three main drawbacks. First, by only specifying quality, there is no guarantee about the final average bit rate. Second, for some real-time applications like voice over IP (VoIP), what counts is the maximum bit rate, which must be low enough for the communication channel. Third, encryption of VBR-encoded speech may not ensure complete privacy, as phrases can still be identified, at least in a controlled setting with a small dictionary of phrases, by analysing the pattern of variation of the bit rate.
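
The first two drawbacks can be illustrated by comparing the average and the peak bit rate of a short VBR stream. The sketch below is not taken from Speex: the frame sizes are invented for illustration, the function name bitrate_stats is hypothetical, and 20 ms frames are assumed.

```python
FRAME_MS = 20  # assumption: 20 ms frames


def bitrate_stats(frame_bits):
    """Return (average, peak) bit rate in bit/s for per-frame sizes in bits."""
    if not frame_bits:
        raise ValueError("need at least one frame")
    frames_per_second = 1000 / FRAME_MS
    average = sum(frame_bits) / len(frame_bits) * frames_per_second
    peak = max(frame_bits) * frames_per_second
    return average, peak


# Hypothetical frame sizes: vowels and transients cost more bits than fricatives.
frames = [300, 300, 160, 160, 492, 300, 160, 128]
avg, peak = bitrate_stats(frames)
print(f"average {avg:.0f} bit/s, peak {peak:.0f} bit/s")
```

With these made-up numbers the average is 12500 bit/s while the peak is 24600 bit/s, so a channel sized for the average alone would be too small for the largest frame.
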
Average Bit-Rate (ABR): Average bit-rate solves one of the problems of VBR, as it dynamically adjusts VBR quality in order to meet a specific target bit-rate. Because the quality/bit-rate is adjusted in real-time (open-loop), the global quality will be slightly lower than that obtained by encoding in VBR with exactly the right quality setting to meet the target average bitrate.
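
The idea of steering quality toward a target bit rate in a single pass can be sketched with a toy controller. This is not Speex's actual algorithm: the linear cost model toy_frame_bits, the gain value, the starting quality and the 20 ms frame duration are all assumptions chosen to keep the example short.

```python
def toy_frame_bits(quality, difficulty):
    """Hypothetical cost model: frame size grows linearly with quality."""
    return (40 + 40 * quality) * difficulty


def abr_encode(difficulties, target_bps, frame_ms=20, gain=0.01):
    """Toy average-bit-rate loop that nudges quality after every frame.

    difficulties: per-frame multipliers (hypothetical) for how costly a frame is.
    Returns (quality used for each frame, achieved average bit rate in bit/s).
    """
    if not difficulties:
        raise ValueError("need at least one frame")
    frames_per_second = 1000 / frame_ms
    target_bits = target_bps / frames_per_second
    quality = 5.0  # arbitrary starting point
    total_bits = 0.0
    qualities = []
    for difficulty in difficulties:
        qualities.append(quality)
        bits = toy_frame_bits(quality, difficulty)
        total_bits += bits
        # No frame is re-encoded; only the quality for the next frame changes.
        quality += gain * (target_bits - bits)
        quality = min(10.0, max(0.0, quality))
    average_bps = total_bits / len(difficulties) * frames_per_second
    return qualities, average_bps
```

Because the first frames are encoded before the controller has settled, the achieved average ends up slightly off the target, which mirrors the small penalty relative to a VBR encode run with exactly the right quality setting.
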
Voice Activity Detection (VAD): When enabled, voice activity detection detects whether the audio being encoded is speech or silence/background noise. VAD is always implicitly activated when encoding in VBR, so the option is only useful in non-VBR operation. In this case, Speex detects non-speech periods and encodes them with just enough bits to reproduce the background noise. This is called "comfort noise generation" (CNG). The last version in which VAD worked properly is 1.1.12; since version 1.2 it has been replaced with a simple Any Activity Detection.
Discontinuous Transmission (DTX): Discontinuous transmission is an addition to VAD/VBR operation that allows transmission to stop completely when the background noise is stationary. In a file, 5 bits are used for each missing frame (corresponding to 250 bit/s).
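
The 250 bit/s figure follows from the frame rate. A minimal sketch of the arithmetic, assuming 20 ms frames (50 frames per second), which the text implies but does not state; the function name is hypothetical:

```python
def dtx_file_bitrate(bits_per_missing_frame=5, frame_ms=20):
    """Bit rate in bit/s spent marking untransmitted frames in a file."""
    frames_per_second = 1000 / frame_ms
    return bits_per_missing_frame * frames_per_second


print(dtx_file_bitrate())  # 5 bits x 50 frames per second
```
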
Perceptual enhancement: Perceptual enhancement is a part of the decoder which, when turned on, tries to reduce (the perception of) the noise produced by the coding/decoding process. In most cases, perceptual enhancement makes the sound further from the original objectively (signal-to-noise ratio), but in the end it still sounds better (subjective improvement).
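
The objective measure mentioned here, signal-to-noise ratio, can be computed as in the sketch below. It is a generic SNR calculation, not code from Speex, and the function name snr_db is hypothetical.

```python
import math


def snr_db(original, decoded):
    """Signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(original) != len(decoded):
        raise ValueError("sequences must have the same length")
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, decoded))
    if signal == 0:
        raise ValueError("original signal has no energy")
    if noise == 0:
        return math.inf  # decoded output is identical to the original
    return 10 * math.log10(signal / noise)
```

A lower value from snr_db after enhancement would correspond to the objective degradation described above, even when listeners judge the result to sound better.
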
Algorithmic delay: Every codec introduces a delay in the transmission. For Speex, this delay is equal to the frame size, plus some amount of "look-ahead" required to process each frame. In narrowband operation (8 kHz), the delay is 30 ms, while for wideband (16 kHz), the delay is 34 ms. These values don't account for the CPU time it takes to encode or decode the frames.
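
The delay figures can be split into frame size and look-ahead. The sketch below assumes 20 ms frames in both modes, which the text does not state, and derives the look-ahead from the quoted totals; the names are hypothetical.

```python
FRAME_MS = 20  # assumption: 20 ms frames in both modes

# Algorithmic delays quoted above, in milliseconds.
ALGORITHMIC_DELAY_MS = {"narrowband": 30, "wideband": 34}


def look_ahead_ms(mode):
    """Look-ahead implied by the quoted delay and the assumed frame size."""
    return ALGORITHMIC_DELAY_MS[mode] - FRAME_MS


def one_way_delay_ms(mode, processing_ms=0.0, network_ms=0.0):
    """Algorithmic delay plus caller-supplied CPU and network time."""
    return ALGORITHMIC_DELAY_MS[mode] + processing_ms + network_ms
```

Under that assumption the look-ahead is 10 ms in narrowband and 14 ms in wideband; CPU and network time come on top of the algorithmic delay.
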

Applications

There is already a large base of applications supporting the Speex codec, from streaming applications like teleconferencing (e.g. TeamSpeak; many servers prefer Speex due to its good quality), to VoIP systems (e.g. Asterisk), to video games (e.g. Xbox Live, Civilization 4) and audio processing applications. Most of these are based on the DirectShow filter or OpenACM codec (e.g. Microsoft NetMeeting) on Microsoft Windows, or Xiph.org's reference implementation, libspeex, on Linux (e.g. Ekiga). There are also plugins for many audio players. See the plugin and software page on the speex.org site for more details.

The media type for Speex is audio/ogg when contained in Ogg, and audio/speex (previously audio/x-speex) when transported through RTP or without a container.
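
The rule can be expressed as a small lookup. The function speex_media_type and its argument values are hypothetical, chosen only to mirror the sentence above.

```python
def speex_media_type(container):
    """Return the media type for Speex data.

    container: "ogg", "rtp", or None for a bare stream without a container.
    """
    if container == "ogg":
        return "audio/ogg"
    if container in ("rtp", None):
        return "audio/speex"
    raise ValueError(f"unknown container: {container!r}")
```
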

The United States Army's Land Warrior system, designed by General Dynamics, also uses Speex for VoIP on an EPLRS radio designed by Raytheon.

The Ear Bible is a single-ear headphone with a built-in Speex player with 1 GB of flash memory, preloaded with a recording of the New American Standard Bible.

ASL Safety & Security's Linux-based VIPA OS software, which is used in long-line public address systems and voice alarm systems at major international air transport hubs and rail networks, also uses Speex.

The Rockbox project uses Speex for its voice interface. It can also play Speex files on supported players, such as the Apple iPod or the iRiver H10.

The Vernier LabQuest handheld data acquisition device for science education uses Speex for voice annotations created by students and teachers using either the built-in or an external microphone.

The Google Mobile App for iPhone currently incorporates Speex. It has also been suggested that the new Google voice search iPhone app is using Speex to transmit voice to Google servers for interpretation.

Adobe Flash Player supports Speex starting with Flash Player 10.0.12.36, released in October 2008. Because of some bugs in Flash Player, the first recommended version for Speex support is 10.0.22.87. Speex in Flash Player can be used for both kinds of communication, through Flash Media Server or P2P. Speex can be decoded or converted to any format, unlike Nellymoser audio, which was the only speech format in previous versions of Flash Player. Speex can also be used in the Flash Video container format (.flv), starting with version 10 of the Video File Format Specification (published in November 2008).

The JavaSonics ListenUp voice recorder uses Speex to compress voice messages that are recorded in a browser and then uploaded to a web server. Primary applications are language training, transcription and social networking.

Speex is used as the voice compression algorithm in the Siri voice assistant on the iPhone 4S. Since the speech-to-text occurs on Apple's servers, the Speex codec is used to minimize network bandwidth.

This article uses material from the Speex Codec Manual which is copyright © Jean-Marc Valin and licensed under the terms of the GFDL.
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 