
Python streamlit realtime speech-to-text with azure SDK - Stack Overflow


Hello, I'm trying to build real-time speech-to-text using Streamlit and the Azure Speech SDK. I can easily transcribe audio/video files with no issues, but I want to add real-time transcription from the browser.

With the code I've tried, nothing is transcribed when I speak into the microphone.

I also tried reusing the function I wrote for files, passing it the AudioStream and making it async, but that didn't work either.

The guided path: .html

That approach only works on a local machine because it uses the host's microphone.

I've tried the code posted below, searched Google, and asked an AI. I want the user to be able to start live speech-to-text and see the transcription appear in the chat as they speak (speaker recognition must stay).

EDIT 19/3: Using pydub I can now save and listen to the .wav file; I only need to pass the stream to the Speech SDK.
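For the remaining step, one thing worth checking is the audio format. A `PushAudioInputStream` created without an explicit stream format expects 16 kHz, 16-bit, mono PCM, while audio coming from the browser is often at a different rate (48 kHz is common) and may be stereo. The sketch below is only an illustration under that assumption: `push_segment` is a hypothetical helper name, `segment` is assumed to be a `pydub.AudioSegment` built from the received frames, and `push_stream` is assumed to be the `PushAudioInputStream` from the code below.

```python
def push_segment(segment, push_stream, rate=16000):
    """Convert a pydub-style segment to 16-bit mono PCM and push it.

    `segment` is assumed to behave like pydub.AudioSegment;
    `push_stream` is assumed to expose write(bytes), like
    PushAudioInputStream. Returns the number of bytes written.
    """
    pcm = (
        segment.set_frame_rate(rate)  # resample, e.g. 48 kHz -> 16 kHz
        .set_channels(1)              # downmix to mono
        .set_sample_width(2)          # 2 bytes per sample = 16-bit
    )
    push_stream.write(pcm.raw_data)   # raw PCM bytes, no WAV header
    return len(pcm.raw_data)
```

Inside the receive loop this might be called as `push_segment(sound_chunk, st.session_state.stream)`, where `sound_chunk` is a hypothetical name for the segment built from the latest frames.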

Edited Code:

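import pydub
import streamlit as st
import azure.cognitiveservices.speech as speechsdk
from azure.cognitiveservices.speech.audio import PushAudioInputStream
from azure.cognitiveservices.speech.transcription import (
    ConversationTranscriber,
    ConversationTranscriptionEventArgs,
)
from streamlit_webrtc import WebRtcMode, webrtc_streamer

# `logger` and `env` are defined elsewhere in the app.


def addsentence(evt: ConversationTranscriptionEventArgs):
    # Callback for the transcriber's `transcribed` event.
    if evt.result.speaker_id == "Unknown":
        logger.debug("Unknown speaker: " + str(evt))
        return
    logger.info(f"Detected **{evt.result.speaker_id}**: {evt.result.text}")
    st.session_state.r.append(f"**{evt.result.speaker_id}**: {evt.result.text}")


# Receive audio only (no video) from the browser microphone.
webrtc_ctx = webrtc_streamer(key="speech-to-text", mode=WebRtcMode.SENDONLY,
        media_stream_constraints={"video": False, "audio": True},
        audio_receiver_size=256)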

while webrtc_ctx.state.playing:
    if not st.session_state["recording"]:
        st.session_state.r = []

        st.session_state.stream = PushAudioInputStream()
        ###
        audio_input = speechsdk.AudioConfig(stream=st.session_state.stream)
        speech_config = speechsdk.SpeechConfig(env["SPEECH_KEY"], env["SPEECH_REGION"])
        if "proxy_host" in env and "proxy_port" in env:
            speech_config.set_proxy(env["proxy_host"], int(env["proxy_port"]))
        conversation_transcriber = ConversationTranscriber(speech_config, audio_input, language="it-IT")

        conversation_transcriber.transcribed.connect(addsentence)
        ###

        st.session_state.fullwav = pydub.AudioSegment.empty()
        with (st.chat_message("assistant")):
            with st.spinner("Trascrizione in corso..."):
                stream_placeholder = st.expander("Trascrizione", icon="
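
A possible reason the chat never updates, stated here as an assumption rather than a confirmed diagnosis: the Speech SDK invokes `addsentence` on its own background thread, and writes to `st.session_state` from a thread other than the Streamlit script thread may not be visible to the running script. A common workaround is to have the callback only enqueue text and let the script loop drain the queue. A minimal sketch of that handoff, using only the standard library; `results`, `on_transcribed`, and `drain` are hypothetical names:

```python
import queue

results = queue.Queue()  # thread-safe handoff between the two threads


def on_transcribed(speaker_id, text):
    """Runs on the SDK callback thread: enqueue only, touch no UI state."""
    if speaker_id == "Unknown":
        return  # skip results without a recognised speaker
    results.put(f"**{speaker_id}**: {text}")


def drain(q):
    """Runs on the script thread: return everything queued so far."""
    lines = []
    while True:
        try:
            lines.append(q.get_nowait())
        except queue.Empty:
            return lines
```

In the question's code, `addsentence` would call `on_transcribed(evt.result.speaker_id, evt.result.text)`, and the `while webrtc_ctx.state.playing:` loop would extend `st.session_state.r` with `drain(results)` on each pass. Because Streamlit reruns the script, the queue would need to be created once and kept (for example in `st.session_state`) rather than rebuilt at module level on every run.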