Gnuspeech
Gnuspeech is an extensible text-to-speech computer software package that produces artificial speech output based on real-time articulatory speech synthesis by rules. That is, it converts text strings into phonetic descriptions, aided by a pronouncing dictionary, letter-to-sound rules, and rhythm and intonation models; transforms the phonetic descriptions into parameters for a low-level articulatory speech synthesizer; uses these parameters to drive an articulatory model of the human vocal tract, producing an output suitable for the normal sound output devices used by various computer operating systems; and, for adult speech, does this at the same rate as the speech is spoken or faster.
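
The first stage of that pipeline, turning text into a phonetic description with the help of a pronouncing dictionary and letter-to-sound rules, can be illustrated with a minimal sketch. It is not Gnuspeech's code: the function name `to_phonetic`, both tables, the phonetic symbols, and the dictionary-first lookup order are all assumptions made for illustration.

```python
# Toy data: both tables are hypothetical and far smaller than a real system's.
PRONOUNCING_DICTIONARY = {
    "speech": ["s", "p", "ee", "ch"],
}

# Hypothetical letter-to-sound rules: one symbol per letter, no context.
LETTER_TO_SOUND = {
    "a": "a",
    "c": "k",
    "t": "t",
}


def to_phonetic(text):
    """Convert text to a list of (word, symbols) pairs.

    Assumed order: a dictionary entry takes priority; a word that is not in
    the dictionary falls back to the letter-to-sound rules. Letters with no
    rule are marked "?".
    """
    result = []
    for word in text.lower().split():
        if word in PRONOUNCING_DICTIONARY:
            symbols = list(PRONOUNCING_DICTIONARY[word])
        else:
            symbols = [LETTER_TO_SOUND.get(ch, "?") for ch in word]
        result.append((word, symbols))
    return result
```

In a full system the resulting symbols would then be combined with rhythm and intonation models and converted into synthesizer parameters; this sketch covers neither step.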

Design

The synthesizer is a tube resonance, or waveguide, model that simulates the behavior of the real vocal tract directly and reasonably accurately, unlike formant synthesizers, which model the speech spectrum indirectly. The control problem is solved by using René Carré's Distinctive Region Model, which relates changes in the radii of eight longitudinal divisions of the vocal tract to corresponding changes in the three formant frequencies in the speech spectrum that convey much of the information of speech. The regions are, in turn, based on work by the Stockholm Speech Technology Laboratory of the Royal Institute of Technology (KTH) on "formant sensitivity analysis": that is, how formant frequencies are affected by small changes in the radius of the vocal tract at various places along its length.
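
To show why region radii are a natural control parameter for a tube model, the sketch below computes the reflection coefficient at each junction between adjacent tube sections from their cross-sectional areas, as in a textbook Kelly-Lochbaum-style waveguide. This is a generic illustration under stated assumptions, not Gnuspeech's implementation: the function names are hypothetical, the sections are assumed to be circular, and the sign convention shown is only one of those in use.

```python
import math


def section_areas(radii_cm):
    """Cross-sectional area of each section, assuming circular sections."""
    return [math.pi * r * r for r in radii_cm]


def reflection_coefficients(radii_cm):
    """Reflection coefficient at each junction between adjacent sections.

    Uses k = (A_i - A_next) / (A_i + A_next). Sign conventions vary between
    texts; this is one common choice. N sections give N - 1 junctions.
    """
    areas = section_areas(radii_cm)
    return [(a1 - a2) / (a1 + a2) for a1, a2 in zip(areas, areas[1:])]
```

A uniform tube gives a coefficient of zero at every junction, so nothing is reflected inside it. Changing one region's radius changes the coefficients at its two junctions, which shifts the resonances of the tube, and it is those resonances that correspond to the formants.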

History

Gnuspeech was originally commercial software produced by the now-defunct Trillium Sound Research for the NeXT computer as various grades of "TextToSpeech" kit. Trillium Sound Research was a technology transfer spin-off company formed at the University of Calgary, Alberta, Canada, based on long-standing research in the computer science department on computer-human interaction using speech; papers and manuals relevant to the system are maintained there. The initial version in 1992 used a formant-based speech synthesizer. When NeXT ceased manufacturing hardware, the synthesizer software was completely rewritten and also ported to NSFIP (NextStep For Intel Processors), using the waveguide approach to acoustic tube modeling based on the research at the Center for Computer Research in Music and Acoustics (CCRMA) at Stanford University, especially the Music Kit. The synthesis approach is explained in more detail in a paper presented to the American Voice I/O Society in 1995.

The system used the onboard 56001 Digital Signal Processor (DSP) on the NeXT computer, and a Turtle Beach add-on board with the same DSP on the NSFIP version, to run the waveguide (also known as the tube model). Speed limitations meant that the shortest vocal tract length that could be used for speech in real time (that is, generated at the same or faster rate than it was "spoken") was around 15 centimeters, because the sample rate for the waveguide computations increases with decreasing vocal tract length. Faster processor speeds are progressively removing this restriction, an important advance for producing children's speech in real time.
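
The inverse relation between vocal tract length and sample rate can be made concrete with a small sketch. It assumes a tube divided into a fixed number of equal sections, each of which the sound wave crosses in exactly one sample period, so that the sample rate is sections × speed of sound / tract length. The section count (10), the speed of sound (about 350 m/s) and the function name are illustrative assumptions, not values taken from this article.

```python
SPEED_OF_SOUND_CM_PER_S = 35000.0  # assumed approximate value (about 350 m/s)
NUM_SECTIONS = 10                  # hypothetical number of equal tube sections


def required_sample_rate(tract_length_cm,
                         num_sections=NUM_SECTIONS,
                         speed_of_sound=SPEED_OF_SOUND_CM_PER_S):
    """Sample rate in Hz at which a wave crosses one section per sample.

    Each section has length tract_length_cm / num_sections, and the wave
    needs section_length / speed_of_sound seconds to cross it. Setting
    that time equal to one sample period gives the rate below.
    """
    if tract_length_cm <= 0:
        raise ValueError("tract length must be positive")
    return num_sections * speed_of_sound / tract_length_cm
```

Under these assumptions a 17.5 cm tract needs 20,000 samples per second, while a 10 cm tract, closer to a child's, needs 35,000. Because the required rate grows as the tract shortens, a processor with a fixed computation budget per second can only keep up in real time above some minimum tract length.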

Trillium ceased trading in the late 1990s, and the Gnuspeech project was first entered into the GNU Savannah repository under the terms of the GNU General Public License in 2002, as official GNU software.

Portability

Various associated modules used to help in developing the original spoken English databases are being ported, and they could be used for other languages. The whole software suite is suitable for psychoacoustic and linguistic research, but is currently only complete for the NeXT. A main module, Monet, is available for Mac OS X. Monet allows the creation and modification of the rules used to form and concatenate the speech sound parameters for different languages, with the exception of the rules used for intonation. However, the rule-based intonation can be manually varied.

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.