Thursday, April 17, 2008

Speech Libraries added to .NET 3.0

Microsoft has wrapped SAPI into .NET 3.0 under the System.Speech namespace. System.Speech has two main sections: Recognition and Synthesis. To use the library, you need to add a reference to System.Speech.dll in your project.

Synthesis

Synthesis can be done very easily.

// Requires: using System.Speech.Synthesis;
SpeechSynthesizer synthesizer = new SpeechSynthesizer();
synthesizer.Speak("This text is spoken by the computer.");

In SAPI 5.3 (the version that System.Speech wraps), Microsoft Anna is the only voice installed by default. Hopefully Microsoft will make more voices available.
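You can check which voices are actually installed on a machine rather than assuming. Here's a short sketch using the SpeechSynthesizer's GetInstalledVoices and SelectVoice members (the voice name "Microsoft Anna" is just the Vista default; SelectVoice throws if the named voice isn't installed):

```csharp
using System;
using System.Speech.Synthesis;

class VoiceList
{
    static void Main()
    {
        using (SpeechSynthesizer synthesizer = new SpeechSynthesizer())
        {
            // List every voice the synthesizer can use on this machine.
            foreach (InstalledVoice voice in synthesizer.GetInstalledVoices())
            {
                VoiceInfo info = voice.VoiceInfo;
                Console.WriteLine("{0} ({1}, {2})", info.Name, info.Culture, info.Gender);
            }

            // Pick a voice by name before speaking.
            synthesizer.SelectVoice("Microsoft Anna");
            synthesizer.Speak("Hello from the selected voice.");
        }
    }
}
```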

Recognition

Getting started with speech recognition is a little more involved. You need to decide whether you want to recognize free-form text (dictation) or specific commands. The Grammar object you load tells the recognition engine which of the two you want. First we'll look at dictation (recognizing text).

static void Main(string[] args)
{
    // Requires: using System.Speech.Recognition;
    SpeechRecognizer recognizer = new SpeechRecognizer();
    recognizer.LoadGrammar(new DictationGrammar());

    // The handler type must match the event's generic signature.
    recognizer.SpeechRecognized +=
        new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);

    // Keep the process alive so recognition events can fire.
    Console.ReadLine();
}

static void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    if (e.Result != null)
        Console.WriteLine(e.Result.Text);
}

This will recognize speech from the default audio input (typically the microphone) and write the text to the console. We can also take input from a wave file.

// Requires: using System.Speech.Recognition;
SpeechRecognitionEngine engine = new SpeechRecognitionEngine();
engine.LoadGrammar(new DictationGrammar());
engine.SetInputToWaveFile("C:\\My Temp\\Test Audio Search.wav");

// Recognize() blocks until the first phrase is recognized;
// it returns null if nothing could be recognized.
RecognitionResult result = engine.Recognize();
if (result != null)
    Console.WriteLine("Recognized: " + result.Text);

Command recognition is done by creating a Grammar object that identifies the commands you want recognized. You can build the Grammar with the GrammarBuilder and Choices classes. Here is a simple example:

static void Main(string[] args)
{
    SpeechRecognizer recognizer = new SpeechRecognizer();

    // Build a grammar that accepts exactly one of the listed commands.
    GrammarBuilder commandGB = new GrammarBuilder();
    commandGB.Append(new Choices("open", "close"));
    recognizer.LoadGrammar(new Grammar(commandGB));

    recognizer.SpeechRecognized +=
        new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);

    // Keep the process alive so recognition events can fire.
    Console.ReadLine();
}

static void recognizer_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    if (e.Result != null)
        Console.WriteLine("Recognized: " + e.Result.Text);
}

Here we are listening for "open" or "close" as commands. The System.Speech namespace makes this very easy to code, but the speech recognition and synthesis aren't the quality you would hope for in 2008. Synthesis output can be tuned with SSML markup, and recognition grammars can be authored in SRGS, but the voice still sounds robotic and Anna is the only one available. Recognition needs to be calibrated (trained) to get reasonably accurate results, at least for dictation. The documentation also leaves a lot to be discovered. I like that Microsoft has included this in a managed library, but I would like to see even more effort applied to the speech technology itself.
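To give a feel for the SSML side, here's a small sketch that feeds SSML markup to the synthesizer through SpeakSsml (the emphasis and rate values are just illustrative; the markup follows the W3C SSML 1.0 namespace):

```csharp
using System.Speech.Synthesis;

class SsmlDemo
{
    static void Main()
    {
        // A minimal SSML document: emphasize one word, slow down another phrase.
        string ssml =
            "<speak version=\"1.0\" " +
            "xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">" +
            "The next word is <emphasis>important</emphasis>, " +
            "and this part is spoken <prosody rate=\"slow\">slowly</prosody>." +
            "</speak>";

        using (SpeechSynthesizer synthesizer = new SpeechSynthesizer())
        {
            synthesizer.SpeakSsml(ssml); // blocks until playback finishes
        }
    }
}
```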

