VoxSigma

VoxSigma is a speech recognition software suite developed by Vocapia Research
for Unix-like x86 and x86-64 platforms.

History

The VoxSigma software suite has its roots at LIMSI, a French CNRS
laboratory that has conducted research on speech processing since the 1970s. VoxSigma is
the latest generation of speech processing software offered by Vocapia Research, building
upon statistical modeling techniques developed at LIMSI for
speech production and speech perception. The first commercial version was
released in July 2003.

Features

The VoxSigma suite offers large vocabulary speech-to-text capabilities in multiple languages.
It includes adaptive features allowing the transcription of noisy speech, such as speech over background music.
The software suite has been designed for professional users needing to transcribe large quantities of audio and
video documents such as broadcast data, either in batch mode or in real-time. Versions can also be used to
transcribe call-center data.

The result of speech-to-text processing is a fully annotated XML document that includes labels for speech and non-speech
segments, speaker labels, and words with time codes and confidence scores. This XML file can be indexed directly
by a search engine, or converted into plain text with capitalization and punctuation.
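
As a rough illustration of the XML-to-text conversion described above, the following Python sketch parses a small annotated transcript and prints one line of plain text per speech segment. The element and attribute names (Transcript, Segment, Word, type, speaker, start, dur, conf) and the function to_plain_text are hypothetical, chosen only for this example; they are not the actual VoxSigma output schema, which should be taken from the vendor documentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document. The tag and attribute names are
# illustrative assumptions, NOT the real VoxSigma schema.
SAMPLE = """<Transcript>
  <Segment type="speech" speaker="spk1" start="0.00" end="1.20">
    <Word start="0.00" dur="0.40" conf="0.95">hello</Word>
    <Word start="0.45" dur="0.50" conf="0.40">word</Word>
  </Segment>
  <Segment type="nonspeech" start="1.20" end="2.00"/>
  <Segment type="speech" speaker="spk2" start="2.00" end="3.00">
    <Word start="2.00" dur="0.30" conf="0.90">good</Word>
    <Word start="2.35" dur="0.40" conf="0.88">morning</Word>
  </Segment>
</Transcript>"""


def to_plain_text(xml_text, min_conf=0.0):
    """Return one 'speaker: words' line per speech segment.

    Words whose confidence score is below min_conf are dropped.
    """
    root = ET.fromstring(xml_text)
    lines = []
    for seg in root.iter("Segment"):
        # Skip non-speech segments (music, silence, noise).
        if seg.get("type") != "speech":
            continue
        words = [
            (w.text or "").strip()
            for w in seg.iter("Word")
            if float(w.get("conf", "0")) >= min_conf
        ]
        words = [w for w in words if w]
        if words:
            speaker = seg.get("speaker", "unknown")
            lines.append(f"{speaker}: {' '.join(words)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_plain_text(SAMPLE, min_conf=0.5))
```

Raising min_conf keeps only the words the recognizer scored highly, which is one way the confidence scores might be used before indexing. Restoring capitalization and punctuation is not attempted in this sketch.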

Supported languages: Arabic, Dutch, English, French, German, Greek, Italian, Mandarin, Russian, Spanish.

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.