Speech recognition

Speech recognition technologies allow computers equipped with a source of sound input, such as a microphone, to interpret human speech, e.g. for transcription or as an alternative method of interacting with a computer.

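Such systems can be classified as to whether they require the user to train the system on their own speech patterns (speaker-dependent systems) or not (speaker-independent systems), whether they can recognize continuous speech or require users to pause between discrete words, and whether their vocabulary is small (tens or at most hundreds of words) or large (thousands of words).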

Speaker-dependent systems that require a short amount of training can (as of 2001) capture continuous speech with a large vocabulary at a normal pace with an accuracy of about 98% (getting two words in one hundred wrong) if operated under optimal conditions. Other systems, which require no training, can recognize a small number of words (for instance, the ten digits of the decimal system) as spoken by most English speakers. Such systems are popular for routing incoming phone calls to their destinations in large organisations.
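The accuracy figure above counts words the system gets wrong. A common way to measure this (an assumption here, since the article does not name its metric) is the word error rate (WER): the fewest word substitutions, deletions and insertions needed to turn the recognizer's output into the correct transcript, divided by the number of words in the correct transcript. The sketch below computes it with a standard edit-distance table; the function name `word_error_rate` and the sample transcripts are hypothetical, and word accuracy is taken, as a simplification, to be 1 minus the WER.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j] = fewest word edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            same = ref[i - 1] == hyp[j - 1]
            d[i][j] = min(
                d[i - 1][j] + 1,                       # deletion
                d[i][j - 1] + 1,                       # insertion
                d[i - 1][j - 1] + (0 if same else 1),  # match or substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)

# Two errors in a 100-word transcript, mirroring "two words in one hundred wrong".
reference = " ".join(f"w{i}" for i in range(100))
hyp_words = reference.split()
hyp_words[10] = "x10"  # substitution
hyp_words[50] = "x50"  # substitution
hypothesis = " ".join(hyp_words)

wer = word_error_rate(reference, hypothesis)
print(wer)  # 0.02, i.e. about 98% word accuracy
```

Because insertions are counted, the WER can exceed 1 (for example, when a two-word reference is transcribed as four unrelated words), which is one reason "1 minus WER" is only a rough stand-in for accuracy.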

Commercial speech recognition systems have been available off the shelf since the 1990s. Despite the apparent success of the technology, however, few people use such systems on their desktop computers. By contrast, the use of speech recognition in telephone applications, such as travel booking and information, financial account information, and directory assistance, has been increasing as the cost of implementing such voice-activated systems has dropped.

It appears that most computer users can create and edit documents more quickly with a conventional keyboard, even though most people can speak considerably faster than they can type. Using keyboard and speech recognition together, however, can in some cases be more efficient than using either input alone. Additionally, heavy use of the speech organs results in vocal loading. Finally, a typical office environment, with loud background speech, is among the most adverse environments for current speech recognition technologies.

Large-vocabulary systems that are speaker-independent, or that are designed to operate in adverse environments, however, have significantly lower recognition rates. As of 2003, the typical achievable recognition rate for large-vocabulary, speaker-independent systems was about 80% to 90% in a clean environment, and could be as low as 50% in scenarios such as a cellular phone call with background noise.

Some of the key technical problems in speech recognition are that:

The "understanding" of the meaning of spoken words is regarded by some as a separate field, that of natural language understanding. However, many sentences sound the same and can only be disambiguated by appeal to context: one famous T-shirt worn by Apple Computer researchers stated,

I helped Apple wreck a nice beach,

which, when spoken, sounds like I helped Apple recognize speech.

A general solution to many of the above problems effectively requires human knowledge and experience, and would thus require advanced pattern recognition and artificial intelligence technologies to be implemented on a computer. In particular, statistical language models are often used to choose between candidate transcriptions and to improve recognition accuracy.
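A statistical language model is what lets a recognizer prefer "I helped Apple recognize speech" over "I helped Apple wreck a nice beach" when the two sound alike. The article does not say which kind of model is used; the sketch below assumes one of the simplest common choices, a bigram model with add-one (Laplace) smoothing, trained on a tiny hypothetical corpus. The acoustic side is left out entirely: the two candidate transcriptions are assumed to have already been proposed by the recognizer, and the helper names `train_bigram` and `log_prob` are illustrative.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams, padding each sentence with <s> and </s> markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams):
    """Log-probability of a sentence under an add-one (Laplace) smoothed bigram model."""
    vocab_size = len(unigrams)  # distinct training tokens, including the markers
    words = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(words, words[1:]):
        # Unseen bigrams (and unseen histories) still get a small nonzero probability.
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total

# Hypothetical training text; a real model would be trained on a large corpus.
corpus = [
    "the software can recognize speech",
    "computers recognize speech poorly in noise",
    "i helped build the system",
    "researchers at apple study speech",
]
unigrams, bigrams = train_bigram(corpus)

# Two acoustically similar candidates, as a recognizer might propose them.
candidates = ["i helped apple recognize speech", "i helped apple wreck a nice beach"]
best = max(candidates, key=lambda s: log_prob(s, unigrams, bigrams))
print(best)  # i helped apple recognize speech
```

With a corpus this small, part of the preference comes simply from the second candidate being longer and made of unseen words. In practice the language-model score is typically combined with an acoustic score rather than used on its own.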

For non-native speakers, an unintended side effect of using speech recognition technology is that they may improve their pronunciation while trying to make the computer understand what they are saying.

This article is from Wikipedia. All text is available under the terms of the GNU Free Documentation License.