CMU Sphinx
CMU Sphinx, also called Sphinx for short, is the general term for a group of speech recognition systems developed at Carnegie Mellon University. These include a series of speech recognizers (Sphinx 2 through 4) and an acoustic model trainer (SphinxTrain).

In 2000, the Sphinx group at Carnegie Mellon committed to open-sourcing several speech recognizer components, including Sphinx 2 and later Sphinx 3 (in 2001). The speech decoders come with acoustic models and sample applications. The available resources also include software for acoustic model training, language model compilation, and a public-domain pronunciation dictionary, cmudict.

Sphinx encompasses a number of software systems, described below.

Sphinx

Sphinx is a continuous-speech, speaker-independent recognition system that uses hidden Markov acoustic models (HMMs) and an n-gram statistical language model. It was developed by Kai-Fu Lee. Sphinx demonstrated the feasibility of continuous-speech, speaker-independent, large-vocabulary recognition, the possibility of which was in dispute at the time (1986). Sphinx is of historical interest only; it has been superseded in performance by subsequent versions. An archival article describes the system in detail.
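
The role of an n-gram language model can be illustrated with a minimal bigram sketch in Python. This is not code from any Sphinx release: the function names, the toy corpus, and the choice of add-one smoothing are all assumptions made for illustration. It shows only how such a model scores one word sequence as more likely than another.

```python
from collections import Counter
import math

def train_bigram(sentences):
    """Count histories and bigrams over tokenized sentences (hypothetical helper)."""
    histories, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        histories.update(tokens[:-1])                 # every token that has a successor
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # adjacent word pairs
    vocab = {w for s in sentences for w in s} | {"</s>"}
    return histories, bigrams, vocab

def bigram_prob(prev, cur, model):
    """P(cur | prev) with add-one smoothing (an assumed, deliberately simple choice)."""
    histories, bigrams, vocab = model
    return (bigrams[(prev, cur)] + 1) / (histories[prev] + len(vocab))

def log_prob(words, model):
    """Log probability of a whole sentence, including the end-of-sentence marker."""
    tokens = ["<s>"] + words + ["</s>"]
    return sum(math.log(bigram_prob(p, c, model))
               for p, c in zip(tokens[:-1], tokens[1:]))

# Toy corpus, invented for this example.
corpus = [["recognize", "speech"],
          ["recognize", "speech", "now"],
          ["wreck", "a", "nice", "beach"]]
model = train_bigram(corpus)
print(log_prob(["recognize", "speech"], model))
print(log_prob(["speech", "recognize"], model))
```

In a recognizer, scores of this kind are combined with the acoustic scores from the HMMs so that word sequences seen in training text are preferred over acoustically similar but unlikely ones.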

Sphinx 2

Sphinx 2 is a fast, performance-oriented recognizer, originally developed by Xuedong Huang at Carnegie Mellon and released as open source with a BSD-style license on SourceForge by Kevin Lenzo at LinuxWorld in 2000. Sphinx 2 focuses on real-time recognition suitable for spoken language applications. As such, it incorporates functionality such as end-pointing, partial hypothesis generation, and dynamic language model switching. It is used in dialog systems and language learning systems, and it can be used in computer-based PBX systems such as Asterisk. Sphinx 2 code has also been incorporated into a number of commercial products. It is no longer under active development (other than routine maintenance). Current real-time decoder development is taking place in the PocketSphinx project. An archival article describes the system.

Sphinx 3

Sphinx 2 used a semi-continuous representation for acoustic modeling (i.e., a single set of Gaussians is used for all models, with individual models represented as a weight vector over these Gaussians). Sphinx 3 adopted the prevalent continuous HMM representation and has been used primarily for high-accuracy, non-real-time recognition. Recent developments (in algorithms and in hardware) have made Sphinx 3 "near" real-time, although not yet suitable for critical interactive applications. Sphinx 3 is under active development and, in conjunction with SphinxTrain, provides access to a number of modern modeling techniques, such as LDA/MLLT, MLLR and VTLN, that improve recognition accuracy (see the article on speech recognition for descriptions of these techniques).
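
The difference between the semi-continuous and continuous representations can be sketched in Python as follows. This is an illustrative sketch, not Sphinx code: the function names, the diagonal-covariance assumption, and the data layout are hypothetical. In the semi-continuous case the shared Gaussians are evaluated once per frame and each state contributes only a weight vector; in the continuous case each state evaluates its own Gaussians.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))

def semi_continuous_scores(x, codebook, state_weights):
    """All states share one codebook; each state owns only a weight vector."""
    shared = [log_gauss(x, m, v) for m, v in codebook]  # evaluated once per frame
    return {s: log_sum_exp([math.log(w) + g for w, g in zip(ws, shared)])
            for s, ws in state_weights.items()}

def continuous_scores(x, state_mixtures):
    """Each state owns its own Gaussians (weight, mean, variance triples)."""
    return {s: log_sum_exp([math.log(w) + log_gauss(x, m, v) for w, m, v in mix])
            for s, mix in state_mixtures.items()}

# Toy one-dimensional example with invented parameters; weights must be positive.
codebook = [([0.0], [1.0]), ([3.0], [1.0])]
weights = {"a": [0.9, 0.1], "b": [0.2, 0.8]}
print(semi_continuous_scores([0.1], codebook, weights))
```

The shared codebook keeps the per-frame cost low because the Gaussian evaluations do not grow with the number of states, which is consistent with Sphinx 2's focus on speed; per-state Gaussians cost more but can fit each state's distribution more closely.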

Sphinx 4

Sphinx 4 is a complete rewrite of the Sphinx engine with the goal of providing a more flexible framework for research in speech recognition, written entirely in the Java programming language. Sun Microsystems supported the development of Sphinx 4 and contributed software engineering expertise to the project. Participants included individuals at MERL, MIT and CMU.

Current development goals include:
  • developing a new (acoustic model) trainer
  • implementing speaker adaptation (e.g. MLLR)
  • improving configuration management
  • creating a graph-based UI for graphical system design

PocketSphinx

PocketSphinx is a version of Sphinx that can be used in embedded systems (e.g., based on an ARM processor). It is under active development and incorporates features such as fixed-point arithmetic and efficient algorithms for GMM computation.
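
As a rough illustration of why fixed-point arithmetic suits processors without fast floating-point hardware, the sketch below computes the variance-weighted squared distance at the core of a diagonal Gaussian evaluation using integers only. It is not PocketSphinx's implementation: the 16-bit fractional format, the function names, and the sample values are assumptions chosen for the example.

```python
FRAC_BITS = 16          # assumed number of fractional bits
ONE = 1 << FRAC_BITS    # fixed-point representation of 1.0

def to_fixed(x):
    """Convert a float to the assumed fixed-point integer format."""
    return int(round(x * ONE))

def to_float(q):
    """Convert a fixed-point integer back to a float (for inspection only)."""
    return q / ONE

def fixed_mul(a, b):
    """Multiply two fixed-point numbers, rescaling the product by FRAC_BITS."""
    return (a * b) >> FRAC_BITS

def fixed_weighted_sq_dist(x, mean, inv_var):
    """Sum of (x - mean)^2 * inv_var over dimensions, using integer operations."""
    total = 0
    for xi, mi, pi in zip(x, mean, inv_var):
        d = xi - mi
        total += fixed_mul(fixed_mul(d, d), pi)
    return total

# Invented sample values, converted to fixed point before the computation.
x = [to_fixed(v) for v in (1.5, -0.25)]
mean = [to_fixed(v) for v in (1.0, 0.5)]
inv_var = [to_fixed(v) for v in (2.0, 0.5)]
print(to_float(fixed_weighted_sq_dist(x, mean, inv_var)))
```

Python integers do not overflow, so this sketch hides the range and overflow management that a real embedded implementation in C would need.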

External links

  • CMU Sphinx homepage
  • (broken link) Sphinx subwiki - Getting-started tutorials and Python integration information.
  • SourceForge hosts Sphinx software and should be considered the definitive source for code.
  • (broken link) NeXT on Campus, Fall 1990 (this document is in PostScript format, compressed with gzip). Carnegie Mellon University - Breakthroughs in speech recognition and document management, pp. 12-13
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 