Scott Hanselman recently blogged about speech recognition in Windows Vista, so I decided to try it myself. In general, the results are good. However, I ran into a number of problems along the way, so I’m not yet sure that I can dictate faster than I can type.

In the initial configuration wizard, there was a problem with the volume of my microphone. Windows thought it could not understand me and showed the following funny window:

Speech recognition seems to have enormous problems with the capitalisation of various words. For example, at first it was unable to learn that Windows Vista has to be capitalised this way. It also kept recognizing the word Vista as Easter; even though I have corrected that mistake a number of times, I’m still not sure it works correctly now.
There are also a number of problems that come up every now and then, such as the cursor jumping to a different place in the text when I dictate single words. In these cases, it helps to dictate the words in the context of a longer sentence. I still don’t understand why this cursor jumping happens at all. There must be some kind of command that moves the cursor to a particular position, but I’m not aware of it. The problem also comes up sometimes when I’m dictating whole sentences: I spent ten minutes or so trying to dictate a sentence starting with “Now let’s try…” before giving up, because the word “now” always made the cursor jump around instead of being inserted into the text.
Finally, a more fundamental problem is that the system regularly makes mistakes that I can’t fix using its built-in correction functionality. One example is the insertion of extra short words into dictated sentences: there is no way to correct that kind of problem using voice commands, and in most cases I can’t even select those words correctly by voice. But if I can’t make the correction through the speech recognition system itself, then the system obviously can’t learn from me.

Overall, I get the impression that the quality of speech recognition is still not good enough to understand exactly what I’m saying when I just talk normally. That may be due to my particular accent or the fact that I’m not a native English speaker, but the result is that I have to make so many corrections that I could easily type the same text faster on the keyboard. And that’s with a good-quality USB headset; I can’t imagine how bad the results would be with a built-in microphone instead. After all, I wouldn’t want to sit around with a headset on all day just to give my computer a command every now and then. Most of this post was dictated rather than typed, and I guesstimate that it took me about three or four times as long as typing it would have.

First blog post using speech recognition