Python streamlit realtime speech-to-text with azure SDK

Hello, I'm trying to build real-time speech-to-text using Streamlit and the Azure Speech SDK. I can easily transcribe audio/video files with no issues, but I want to add real-time transcription from the browser.

With the code I've tried, nothing is transcribed when I speak into the microphone.

I've also tried reusing the function I wrote for files, passing it the AudioStream and making it async, but that didn't work either.

The guided path: .html

It only works on a local machine because it uses the host's microphone.

I've tried the code posted below, searched on Google, and asked AI. I want the user to be able to start live speech-to-text and see the live transcription in the chat (speaker recognition must stay).
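One thing that may matter for showing results live in the chat: as far as I know, the Speech SDK fires its event callbacks on a background thread, while `st.session_state` is meant to be used from the Streamlit script thread. A minimal sketch of a hand-off through a thread-safe queue follows. `on_transcribed`, `drain`, and the `"Guest-1"` speaker id are hypothetical names for illustration, and the SDK is simulated with a plain thread, so this is not the actual Azure callback signature.

```python
import queue
import threading

# Thread-safe mailbox between the callback thread and the Streamlit script.
results = queue.Queue()

def on_transcribed(speaker_id: str, text: str) -> None:
    """Hypothetical stand-in for addsentence: only enqueue, never touch st.*."""
    if speaker_id == "Unknown":
        return  # same filter as the original callback
    results.put(f"**{speaker_id}**: {text}")

def drain(q: queue.Queue) -> list:
    """Collect everything queued so far without blocking."""
    lines = []
    while True:
        try:
            lines.append(q.get_nowait())
        except queue.Empty:
            return lines

def fake_sdk_events() -> None:
    # Simulates the SDK firing two events from its own thread.
    on_transcribed("Guest-1", "ciao")
    on_transcribed("Unknown", "...")

worker = threading.Thread(target=fake_sdk_events)
worker.start()
worker.join()

# In the app, this would run inside the main loop and extend st.session_state.r.
transcript = drain(results)
print(transcript)
```

In the real app the callback connected to `transcribed` would call `results.put(...)`, and the `while webrtc_ctx.state.playing` loop would call `drain(results)` on each pass and write the lines into the chat.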

EDIT 19/3: Using pydub I can now save and listen to the .wav file; I only need to pass the stream to the Speech SDK.
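For passing the stream on, the format probably matters: a `PushAudioInputStream` created without an explicit format expects 16 kHz, 16-bit, mono PCM, whereas browser audio often arrives as 48 kHz stereo (an assumption here, worth checking on the actual frames). Below is a rough stdlib-only sketch of the conversion. `to_mono_16k`, `feed`, and `FakeStream` are hypothetical names; `FakeStream` only stands in for an object with a `write(bytes)` method such as the push stream.

```python
import struct
import sys
from array import array

def to_mono_16k(pcm: bytes, channels: int = 2, in_rate: int = 48000) -> bytes:
    """Downmix interleaved little-endian 16-bit PCM to mono at 16 kHz.

    Crude sketch: averages each group of frames, with no proper low-pass filter.
    """
    if in_rate % 16000:
        raise ValueError("in_rate must be a multiple of 16000 in this sketch")
    step = in_rate // 16000
    samples = array("h")
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 2])  # drop a stray odd byte
    if sys.byteorder == "big":
        samples.byteswap()
    group = channels * step  # samples that collapse into one output sample
    out = array("h")
    for i in range(0, len(samples) - group + 1, group):
        out.append(sum(samples[i:i + group]) // group)
    if sys.byteorder == "big":
        out.byteswap()
    return out.tobytes()

def feed(stream, pcm: bytes) -> None:
    """Convert one chunk and write it to any object exposing write(bytes)."""
    stream.write(to_mono_16k(pcm))

class FakeStream:
    """Stand-in for the push stream so the sketch runs without the SDK."""
    def __init__(self) -> None:
        self.data = b""
    def write(self, buf: bytes) -> None:
        self.data += buf

# Six stereo frames at 48 kHz (12 samples) become two mono samples at 16 kHz.
chunk = struct.pack("<12h", *([300] * 6 + [-600] * 6))
fake = FakeStream()
feed(fake, chunk)
converted = struct.unpack("<2h", fake.data)
```

Since pydub is already in use, `segment.set_channels(1).set_frame_rate(16000).set_sample_width(2).raw_data` should give equivalent bytes with proper resampling, and those bytes could be written to `st.session_state.stream` instead.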

Edited Code:

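# Imports inferred from the names used below. `env` is assumed to be a
# dict-like settings object; os.environ is used here as a stand-in.
import logging
from os import environ as env

import azure.cognitiveservices.speech as speechsdk
import pydub
import streamlit as st
from azure.cognitiveservices.speech.audio import PushAudioInputStream
from azure.cognitiveservices.speech.transcription import (
    ConversationTranscriber,
    ConversationTranscriptionEventArgs,
)
from streamlit_webrtc import WebRtcMode, webrtc_streamer

logger = logging.getLogger(__name__)

def addsentence(evt: ConversationTranscriptionEventArgs):
    # Called by the SDK for each finalized utterance; skip unattributed speech.
    if evt.result.speaker_id == "Unknown":
        logger.debug("Unknown speaker: " + str(evt))
        return
    logger.info(f"Detected **{evt.result.speaker_id}**: {evt.result.text}")
    st.session_state.r.append(f"**{evt.result.speaker_id}**: {evt.result.text}")

# Audio-only WebRTC stream: the browser sends microphone audio to the server.
webrtc_ctx = webrtc_streamer(key="speech-to-text", mode=WebRtcMode.SENDONLY,
        media_stream_constraints={"video": False, "audio": True},
        audio_receiver_size=256)

while webrtc_ctx.state.playing:
    if not st.session_state["recording"]:
        st.session_state.r = []

        # Push stream that the browser audio is supposed to be written into.
        st.session_state.stream = PushAudioInputStream()
        ###
        audio_input = speechsdk.AudioConfig(stream=st.session_state.stream)
        # Pass key and region by keyword: the second positional argument of
        # SpeechConfig is the endpoint, not the region.
        speech_config = speechsdk.SpeechConfig(subscription=env["SPEECH_KEY"], region=env["SPEECH_REGION"])
        if "proxy_host" in env and "proxy_port" in env:
            speech_config.set_proxy(env["proxy_host"], int(env["proxy_port"]))
        conversation_transcriber = ConversationTranscriber(speech_config, audio_input, language="it-IT")

        conversation_transcriber.transcribed.connect(addsentence)
        ###

        # Accumulates the received audio so it can be saved as a .wav file.
        st.session_state.fullwav = pydub.AudioSegment.empty()
        with st.chat_message("assistant"):
            with st.spinner("Trascrizione in corso..."):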
                stream_placeholder = st.expander("Trascrizione", icon="
