I am using an Azure Speech service resource to transcribe real-time audio from my mic using microsoft-cognitiveservices-speech-sdk. I want to send the transcribed text to another endpoint (or to an Azure Function through which I can send the text to another endpoint) before the recognized event runs in my browser app.
Is there a way to do this? I could not find anything to help me with this in the documentation or in the portal.
Any help is appreciated.
asked Feb 7 at 7:58 by Abdullah Nadeem
- Can you provide your code in the question? – Dasari Kamali Commented Feb 7 at 8:13
- Please provide enough code so others can better understand or reproduce the problem. – Community Bot Commented Feb 7 at 19:23
1 Answer
I tried the Node.js code below to convert speech to text using microsoft-cognitiveservices-speech-sdk. I recorded audio using my microphone, saved it as a .wav file, and then sent the transcribed text to another endpoint through an HTTP-triggered Azure Function in JavaScript.
index.js:
const fs = require('fs');
const sdk = require("microsoft-cognitiveservices-speech-sdk");
const axios = require('axios');

// Replace with your Speech resource key and region.
const speechKey = "<speechKey>";
const speechRegion = "<speechRegion>";

const speechConfig = sdk.SpeechConfig.fromSubscription(speechKey, speechRegion);
speechConfig.speechRecognitionLanguage = "en-US";

// Read the recorded microphone audio from a .wav file.
const audioConfig = sdk.AudioConfig.fromWavFileInput(fs.readFileSync("kamsp.wav"));
const speechRecognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);

// Local URL of the HTTP-triggered Azure Function.
const functionEndpoint = "http://localhost:7071/api/ProcessSpeech";

// Fires once per finalized phrase; forward the recognized text to the function.
speechRecognizer.recognized = async (s, e) => {
    if (e.result.reason === sdk.ResultReason.RecognizedSpeech) {
        console.log(`RECOGNIZED: Text=${e.result.text}`);
        try {
            await axios.post(functionEndpoint, { text: e.result.text }, { headers: { 'Content-Type': 'application/json' } });
            console.log("Transcription sent to function.");
        } catch (error) {
            console.error("Error sending transcription:", error);
        }
    }
};

// Stop recognition once the session ends (here, at the end of the audio file).
speechRecognizer.sessionStopped = (s, e) => {
    speechRecognizer.stopContinuousRecognitionAsync();
};

speechRecognizer.startContinuousRecognitionAsync();
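Since the question is about a browser app with live microphone input, the same idea can be sketched for that case. This is only a sketch under assumptions: `wireRecognizer` and `startMicForwarding` are hypothetical helper names, `functionUrl` stands for your own endpoint, and `sdk` is the imported Speech SDK module passed in as a parameter. The forwarding still happens inside the `recognized` handler; the helper just makes sure the POST is awaited before the app's own display callback runs.

```javascript
// Hypothetical helper (not part of the Speech SDK): forwards each final
// result first, then hands the text to the app's own display callback.
function wireRecognizer(recognizer, recognizedReason, forward, display) {
  recognizer.recognized = async (_sender, event) => {
    // Ignore NoMatch and any other non-speech results.
    if (event.result.reason !== recognizedReason) return;
    const text = event.result.text;
    try {
      await forward(text); // wait for the POST before touching the UI
    } catch (err) {
      console.error("Error sending transcription:", err);
    }
    display(text); // the UI is still updated if forwarding failed
  };
  return recognizer;
}

// Assumed browser wiring: `sdk` is the imported Speech SDK module and
// `functionUrl` is the address of your own function or endpoint.
function startMicForwarding(sdk, speechConfig, functionUrl, display) {
  const audioConfig = sdk.AudioConfig.fromDefaultMicrophoneInput();
  const recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);
  // fetch rejects only on network errors, not on 4xx/5xx responses.
  const forward = (text) =>
    fetch(functionUrl, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text }),
    });
  wireRecognizer(recognizer, sdk.ResultReason.RecognizedSpeech, forward, display);
  recognizer.startContinuousRecognitionAsync();
  return recognizer;
}
```

Two caveats on this design: as far as I know the SDK does not wait for an async handler to finish, so ordering is only guaranteed within one phrase (forward, then display), not across phrases; and a browser calling an Azure Function directly will likely need CORS enabled on the function app.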
httpTrigger1.js:
const { app } = require('@azure/functions');

// Kept in memory only: the value is lost when the function host restarts.
let latestTranscription = "";

app.http('processSpeech', {
    methods: ['GET', 'POST'],
    authLevel: 'anonymous',
    handler: async (request, context) => {
        context.log("Speech-to-text function triggered");
        if (request.method === "POST") {
            // POST: store the transcription sent by index.js.
            try {
                const requestBody = await request.json();
                const text = requestBody.text;
                if (text) {
                    latestTranscription = text;
                    context.log(`Received transcription: ${text}`);
                    return { body: "Transcription received", status: 200 };
                }
                return { body: "No transcription received", status: 400 };
            } catch (error) {
                context.log(`Error: ${error}`);
                return { body: "Error processing transcription", status: 500 };
            }
        } else if (request.method === "GET") {
            // GET: return the most recently stored transcription.
            return { body: `Latest Transcription: ${latestTranscription}`, status: 200 };
        }
    }
});
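The request handling above can also be pulled out into a small pure function, which makes the branches easy to check without starting the Functions host. This is a hedged sketch, not the original answer's code: `handleTranscription` and the `store` object are hypothetical names, and the in-memory store stands in for whatever you would actually use. A module-level variable lives only in one process instance, so real storage or a forward to the next endpoint would be needed if the function scales out or restarts.

```javascript
// Hypothetical pure helper mirroring the handler's branches; `store` is any
// object with a `latest` string property (an assumption for this sketch).
function handleTranscription(method, body, store) {
  if (method === "GET") {
    return { status: 200, body: `Latest Transcription: ${store.latest}` };
  }
  if (method === "POST") {
    // Accept only a non-empty string in the `text` field.
    const text = body && typeof body.text === "string" ? body.text.trim() : "";
    if (!text) return { status: 400, body: "No transcription received" };
    store.latest = text;
    return { status: 200, body: "Transcription received" };
  }
  return { status: 405, body: "Method not allowed" };
}

// Small demonstration with an in-memory store.
const store = { latest: "" };
console.log(handleTranscription("POST", { text: "hello world" }, store));
console.log(handleTranscription("GET", null, store));
```

Inside the `app.http` handler this could be called with `request.method` and the parsed JSON body, returning the result object directly.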
Node.js output:
HTTP trigger function output: