People have figured out how to use the Google Speech API (speech-to-text). I'm trying to get it working with Flash's Speex codec, but I can't figure it out. I've tried inserting a frame-size byte before every 160 bytes (as some sources suggest), but that doesn't work.
So here is a challenge: translate the Flash Speex bytes into something the Google Speech API understands.
Here is the basic Flex code:
<?xml version="1.0" encoding="utf-8"?>
<s:Application xmlns:fx="http://ns.adobe.com/mxml/2009"
xmlns:s="library://ns.adobe.com/flex/spark"
creationComplete="init();">
<fx:Script>
<![CDATA[
// Speech API info
// Reference: http://mikepultz.com/2011/03/accessing-google-speech-api-chrome-11/,
// Reference: https://stackoverflow.com/questions/4361826/does-chrome-have-buil-in-speech-recognition-for-input-type-text-x-webkit-speec
private static const speechApiUrl:String = "http://www.google.com/speech-api/v1/recognize";
private static const speechLanguage:String = "en";
private static const mimeType:String = "audio/x-speex-with-header-byte";
private static const sampleRate:uint = 8;
// Sound bytes & mic
private var soundBytes:ByteArray;
private var microphone:Microphone;
// Initial setup
private function init():void {
// Set up the microphone
microphone = Microphone.getMicrophone();
// Speech API supports 8 kHz and 16 kHz rates (Microphone.rate is given in kHz)
microphone.rate = sampleRate;
// Select the SPEEX codec
microphone.codec = SoundCodec.SPEEX;
// I don't know what effect this has...
microphone.framesPerPacket = 1;
}
// THIS IS THE CHALLENGE
// We have the flash speex bytes and we need to translate them so Google API understands
private function process():void{
soundBytes.position = 0;
var processed:ByteArray = new ByteArray();
processed.endian = Endian.BIG_ENDIAN;
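var frameSize:uint = 160;
// Count whole frames only, so a trailing partial frame never gets a 160-byte header
var frameCount:uint = Math.floor(soundBytes.length / frameSize);
for(var n:uint = 0; n < frameCount; n++){
// One length byte, then the frame itself
processed.writeByte(frameSize);
processed.writeBytes(soundBytes, frameSize * n, frameSize);
}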
processed.position = 0;
soundBytes = processed;
}
// Sending to Google Speech server
private function send():void {
var loader:URLLoader = new URLLoader();
var request:URLRequest = new URLRequest(speechApiUrl + "?lang=" + speechLanguage);
request.method = URLRequestMethod.POST;
request.data = soundBytes;
request.contentType = mimeType + "; rate=" + (1000 * sampleRate);
loader.addEventListener(Event.COMPLETE, onComplete);
loader.addEventListener(IOErrorEvent.IO_ERROR, onError);
loader.load(request);
trace("Connecting to Speech API server");
}
private function onError(event:IOErrorEvent):void{
trace("Error: " + event.toString());
}
private function onComplete(event:Event):void{
trace("Done: " + event.target.data);
}
private function record(event:Event):void{
soundBytes = new ByteArray();
soundBytes.endian = Endian.BIG_ENDIAN;
microphone.addEventListener(SampleDataEvent.SAMPLE_DATA, sampleData);
}
private function sampleData(event:SampleDataEvent):void {
soundBytes.writeBytes(event.data, 0, event.data.bytesAvailable);
}
private function stop(e:Event):void {
microphone.removeEventListener(SampleDataEvent.SAMPLE_DATA, sampleData);
if(soundBytes != null){
process();
send();
}
}
]]>
</fx:Script>
<s:HGroup>
<s:Button label="Record"
click="record(event)"/>
<s:Button label="Stop and Send"
click="stop(event)"/>
</s:HGroup>
</s:Application>
For more info, check these links: http://mikepultz.com/2011/03/accessing-google-speech-api-chrome-11/ and the Stack Overflow question "Does Chrome have built-in speech recognition for "x-webkit-speech" input elements?"
The code you are looking for is at http://src.chromium.org/viewvc/chrome/trunk/src/content/browser/speech/speech_recognizer.cc?view=diff&r1=79556&r2=79557, around lines 100-160, which in turn #includes .../viewvc/chrome/trunk/deps/third_party/speex/
However, Chrome switched from Speex to FLAC at the end of March without any real explanation in the change log (http://src.chromium.org/viewvc/chrome/trunk/src/content/browser/speech/speech_recognizer.cc?view=diff&r1=79556&r2=79557), so I would not advise using Speex. On the other hand, someone looked at the Android source and said Speex is still used there, so it will likely remain supported (it needs less than a fifth as many bytes per second).
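As a rough illustration of the header-byte framing the question attempts, here is a minimal sketch. It assumes `encoded` already holds real Speex-encoded frames of a fixed, known size; `addHeaderBytes` and `frameBytes` are hypothetical names, not part of any Flash or Google API. Note that a Speex narrowband frame covers 160 samples (20 ms at 8 kHz), so the 160 in the question may well be a sample count rather than an encoded byte count; the encoded size per frame depends on the encoder settings and has to be determined separately.

```actionscript
import flash.utils.ByteArray;

// Hypothetical helper: prefix every fixed-size frame with a one-byte length.
// Assumption: all frames in `encoded` are exactly `frameBytes` bytes long.
function addHeaderBytes(encoded:ByteArray, frameBytes:uint):ByteArray {
    // The length must fit in the single header byte.
    if (frameBytes == 0 || frameBytes > 255) {
        throw new ArgumentError("frameBytes must be between 1 and 255");
    }

    var framed:ByteArray = new ByteArray();

    // Use whole frames only; a trailing partial frame is dropped.
    var frameCount:uint = Math.floor(encoded.length / frameBytes);

    for (var i:uint = 0; i < frameCount; i++) {
        framed.writeByte(frameBytes);                               // header byte
        framed.writeBytes(encoded, i * frameBytes, frameBytes);     // frame payload
    }

    framed.position = 0; // rewind so the result can be sent as request data
    return framed;
}
```

In the question's `process()`, this would be used as `soundBytes = addHeaderBytes(soundBytes, frameBytes);` once the actual encoded frame size is known.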