python - Google Speech returning all words ever spoken, instead of just the words from the transcript - Stack Overflow

Using Google Speech in Python, I'm able to get a transcript for each spoken phrase using result.alternatives[0].transcript. However, when I look at the words for that phrase, result.alternatives[0].words always returns an array of ALL the words ever spoken, not just the words from the transcript, which seems wrong. Is this a bug, or is there a way to filter out or reset the words array? I'm only interested in the words of the current spoken phrase.

My code:

```python
for response in responses:  # assumed: the streaming loop, which the question does not show
    if not response.results:
        continue

    result = response.results[0]
    if not result.alternatives:
        continue

    transcript = result.alternatives[0].transcript
    confidence = result.alternatives[0].confidence
    words = result.alternatives[0].words

    if result.is_final:
        print("*******************")
        sensory_log.info(f"Final STT output: {transcript}")
        print(f"Confidence: {confidence:.2f}")
        self.process_input(transcript)

        # Check for multiple speakers using words
        if words:
            print(words)
            # Track unique speaker IDs using a list
            speaker_ids = []
            for word in words:
                print(f"Word: {word.word} (speaker_tag: {word.speaker_tag})")
                if word.speaker_tag not in speaker_ids:
                    speaker_ids.append(word.speaker_tag)

            print(f"Detected {len(speaker_ids)} speakers")
```

asked Mar 4 at 0:03 by JackKalish
  • It looks strange. Maybe it is a bug, and maybe you should report it to the authors of this module. – furas, commented Mar 4 at 10:01
  • Hi @jacki, I have posted an answer and I hope it helps you. If it does, please consider accepting and upvoting it, as per Stack Overflow guidelines, so it can help other contributors with their research. – Sourav Dutta, commented Mar 5 at 21:37

1 Answer


The problem here is that result.alternatives[0].words also contains all the words from the previous transcripts. You can filter those out with the help of each word's start time (start_time = word_info.start_time), so that when result.is_final is True you only process words that were not already captured in a previous transcript.

In your code, the section to modify is the one starting with if words: print(words). You can refer to this documentation to make the change in your code.
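A minimal sketch of the start_time filter described above. It assumes that a word is stale if it starts before the end of the last phrase already handled, and that earlier words keep the same timestamps in later results; neither assumption is confirmed by the answer. WordInfo here is a hypothetical stand-in dataclass so the sketch runs without the API (the real client's WordInfo objects also have word, start_time, end_time and speaker_tag, with the times exposed as datetime.timedelta in recent versions of the Python client), and NewWordFilter is a hypothetical helper name.

```python
from dataclasses import dataclass
from datetime import timedelta


# Hypothetical stand-in for the client's WordInfo message, used only so the
# filter can run without calling the API.
@dataclass
class WordInfo:
    word: str
    start_time: timedelta
    end_time: timedelta
    speaker_tag: int = 0


class NewWordFilter:
    """Hypothetical helper: drops words that belong to phrases already handled.

    Assumes a word is stale if it starts before the end of the last phrase
    this filter returned, and that earlier words keep their timestamps.
    """

    def __init__(self) -> None:
        self.last_end = timedelta(0)

    def new_words(self, words):
        # Keep only words that start at or after the previous phrase's end.
        fresh = [w for w in words if w.start_time >= self.last_end]
        if fresh:
            # Advance the cutoff so the next call skips these words too.
            self.last_end = max(w.end_time for w in fresh)
        return fresh


if __name__ == "__main__":
    word_filter = NewWordFilter()
    first = [WordInfo("hello", timedelta(seconds=0.0), timedelta(seconds=0.5), 1),
             WordInfo("there", timedelta(seconds=0.5), timedelta(seconds=1.0), 1)]
    # The second final result repeats the earlier words, as described in the question.
    second = first + [WordInfo("hi", timedelta(seconds=1.2), timedelta(seconds=1.5), 2)]
    for words in (first, second):
        fresh = word_filter.new_words(words)
        speakers = {w.speaker_tag for w in fresh}
        print([w.word for w in fresh], f"{len(speakers)} speaker(s)")
```

In the question's code, one filter instance would live for the whole stream, and the speaker-counting loop would iterate over word_filter.new_words(words) instead of the full words list. A set replaces the list of speaker IDs because only distinct tags matter.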

isFinal indicates whether the results obtained within this list entry are interim or final. Check out the full documentation for more information.
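isFinal is the REST/JSON name of the is_final field the question's code reads. The sketch below combines that check with the answer's start_time filter in one generator that skips interim results and yields only the new words of each final phrase. It rests on the same assumption as the answer, that stale words are the ones timed before the end of the last processed phrase. final_phrases, fake_word and fake_response are hypothetical names. The fakes only mimic the attributes the question's code uses (results, alternatives, is_final, transcript, words), so the logic can be run without calling the API.

```python
from datetime import timedelta
from types import SimpleNamespace


def final_phrases(responses):
    """Yield (transcript, new_words) for each final result, skipping interim
    results and words already reported for earlier phrases."""
    last_end = timedelta(0)
    for response in responses:
        if not response.results:
            continue
        result = response.results[0]
        if not result.alternatives or not result.is_final:
            # Interim results may still change; wait for the final one.
            continue
        alternative = result.alternatives[0]
        # Assumption: words that start before the last phrase ended are stale.
        new_words = [w for w in alternative.words if w.start_time >= last_end]
        if new_words:
            last_end = max(w.end_time for w in new_words)
        yield alternative.transcript, new_words


# Hypothetical fakes shaped like streaming responses, for local testing only.
def fake_word(word, start, end, speaker):
    return SimpleNamespace(word=word, start_time=timedelta(seconds=start),
                           end_time=timedelta(seconds=end), speaker_tag=speaker)


def fake_response(transcript, words, is_final):
    alternative = SimpleNamespace(transcript=transcript, words=words)
    result = SimpleNamespace(alternatives=[alternative], is_final=is_final)
    return SimpleNamespace(results=[result])


if __name__ == "__main__":
    w1 = [fake_word("hello", 0.0, 0.5, 1), fake_word("there", 0.5, 1.0, 1)]
    w2 = w1 + [fake_word("hi", 1.2, 1.5, 2)]  # cumulative word list
    stream = [fake_response("hello", [], False),  # interim result, skipped
              fake_response("hello there", w1, True),
              fake_response("hi", w2, True)]
    for transcript, words in final_phrases(stream):
        speakers = {w.speaker_tag for w in words}
        print(f"{transcript!r}: {[w.word for w in words]}, {len(speakers)} speaker(s)")
```

With the real client, the responses iterable would be the stream returned by the streaming recognize call, and the body of the loop over final_phrases would take the place of the if result.is_final: block in the question.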

You can also file a bug on the issue tracker and vote for it with +1.
