Dec. 11, 2014
Always-Active Voice Command Recognition Without False Detections
Python, AutoHotkey, C#, PowerShell
dragonfly, Windows Speech Recognition
Uses the built-in voice recognition engine of Windows to respond to spoken commands, without accidental triggers from conversation, music, or television.
I use a simple but highly effective technique for skewing Windows speech recognition toward false negatives (missed detections) instead of false positives (triggering when a command wasn't spoken). Because the match percent returned by WSR is a relative value between potential words, it's unsuitable as a general estimate of detection accuracy. What I needed was a set of highly distinct sounds to use as a "combination lock". The phonetic alphabet was ideally suited.
Setting the trigger phrase to three specific letters in a row, e.g. "alpha bravo charlie", opens up 15,600 permutations (26 × 25 × 24 ordered picks of three distinct letters) that the speech recognition engine will attempt to match against every spoken phrase. This number is large enough that the command almost never triggers accidentally. (Since 2013 I've had the system active with daily computer use: YouTube, Twitch.tv, Pandora, etc. and it has only falsely triggered a few times.)
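To make the "combination lock" arithmetic concrete, here is a minimal sketch in plain Python. It doesn't touch the speech engine at all; the names `count_codes` and `is_trigger` are made up for illustration, and it assumes the recognizer hands back a list of words.

```python
import math

# The 26 phonetic alphabet words the recognizer is allowed to choose from.
PHONETIC = ("alpha bravo charlie delta echo foxtrot golf hotel india juliett "
            "kilo lima mike november oscar papa quebec romeo sierra tango "
            "uniform victor whiskey xray yankee zulu").split()


def count_codes(vocabulary, length=3):
    """Number of ordered sequences of `length` distinct words."""
    return math.perm(len(vocabulary), length)


def is_trigger(heard, code=("alpha", "bravo", "charlie")):
    """True only if the recognized words are exactly the code, in order."""
    return tuple(heard) == tuple(code)


print(count_codes(PHONETIC))  # 26 * 25 * 24
```

Only one of those sequences unlocks anything, which is why random speech so rarely lands on it.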
With the phonetic alphabet in place as the available choices, it's easy enough to slip in some custom words, specifically "come", "pew", and "tur". The engine will now trigger on the phrase "computer" rather than "alpha bravo charlie". You may have to enunciate, but it won't trigger by accident.
Of course, once the trigger word is heard, the remaining portion is trivially matched to your command choice.
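The trigger-then-command flow can be sketched roughly like this. It's a stand-in for what the grammar does, not my actual dragonfly code: `parse_utterance` and the entries in `COMMANDS` are hypothetical, and the recognizer is assumed to return the utterance as a list of words drawn from the vocabulary.

```python
PHONETIC = ("alpha bravo charlie delta echo foxtrot golf hotel india juliett "
            "kilo lima mike november oscar papa quebec romeo sierra tango "
            "uniform victor whiskey xray yankee zulu").split()

# Custom syllables sit alongside the phonetic words as decoys for each other.
VOCABULARY = PHONETIC + ["come", "pew", "tur"]
TRIGGER = ("come", "pew", "tur")  # spoken as "computer"

# Hypothetical commands, purely for illustration.
COMMANDS = {
    "lights on": "LIGHTS_ON",
    "pause music": "PAUSE_MUSIC",
}


def parse_utterance(words):
    """Return the command id if `words` is trigger + known command, else None."""
    words = [w.lower() for w in words]
    n = len(TRIGGER)
    if tuple(words[:n]) != TRIGGER:
        return None  # wrong or missing trigger: stay silent
    return COMMANDS.get(" ".join(words[n:]))  # None if the command is unknown
```

For example, `parse_utterance("come pew tur lights on".split())` gives `"LIGHTS_ON"`, while the same command behind "alpha bravo charlie" gives `None`.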
This technique can easily be adapted for a public-but-secure voice recognition system with rotating codes. A user with access to the current code (e.g. "delta hotel foxtrot") can control the device with their voice, after which the system selects a new code and privately notifies the user of it. I would love to see an implementation of this with a smartwatch like Pebble.
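I haven't built the rotating-code version, but a rough sketch of the idea might look like the following. Everything here is an assumption: the `RotatingLock` class, the `notify` callback (standing in for whatever private channel reaches the user, such as a watch notification), and the choice to rotate the code after every successful use.

```python
import secrets

PHONETIC = ("alpha bravo charlie delta echo foxtrot golf hotel india juliett "
            "kilo lima mike november oscar papa quebec romeo sierra tango "
            "uniform victor whiskey xray yankee zulu").split()


def new_code(length=3):
    """Pick `length` distinct phonetic words using a cryptographic RNG."""
    pool = list(PHONETIC)
    return tuple(pool.pop(secrets.randbelow(len(pool))) for _ in range(length))


class RotatingLock:
    def __init__(self, notify):
        self._notify = notify  # hypothetical private channel to the user
        self._code = None
        self._rotate()

    def _rotate(self):
        """Choose a code different from the current one and send it privately."""
        code = new_code()
        while code == self._code:
            code = new_code()
        self._code = code
        self._notify(code)

    def try_unlock(self, heard):
        """Accept only the current code; a success immediately retires it."""
        if tuple(heard) != self._code:
            return False
        self._rotate()
        return True
```

Because a code is retired the moment it works, anyone who overhears it being spoken can't reuse it.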
I originally learned from a PowerShell script written by Joel "Jaykul" Bennett that you could leverage the built-in voice recognition training of Vista (and its successors) to execute your own strict set of commands.
I then wrote my own quick implementation in C# as a DLL I ran from AutoHotkey.
When I discovered "acampbell" had submitted a patch to add Sapi5InProcEngine support (the particular WSR engine variant that doesn't include the accessibility commands most are familiar with) to the dragonfly Python package, I ported my code to Python. I then successfully lobbied to have the patch added to the official version (thanks t4ngo).