Using Google Speech in Python, I'm able to get a transcript for each phrase spoken using result.alternatives[0].transcript. But when I try to look at the words for the phrase, result.alternatives[0].words always returns an array of ALL of the words ever spoken, not just the words from the transcript, which seems wrong. Is this a bug, or is there some way to filter or reset the words array? I'm only interested in the words in the spoken phrase.
My code:
# Fragment: runs inside the loop that iterates over the API responses
if not response.results:
    continue
result = response.results[0]
if not result.alternatives:
    continue
transcript = result.alternatives[0].transcript
confidence = result.alternatives[0].confidence
words = result.alternatives[0].words
if result.is_final:
    print("*******************")
    sensory_log.info(f"Final STT output: {transcript}")
    print(f"Confidence: {confidence:.2f}")
    self.process_input(transcript)
    # Check for multiple speakers using words
    if words:
        print(words)
        # Track unique speaker IDs using a list
        speaker_ids = []
        for word in words:
            print(f"Word: {word.word} (speaker_tag: {word.speaker_tag})")
            if word.speaker_tag not in speaker_ids:
                speaker_ids.append(word.speaker_tag)
        print(f"Detected {len(speaker_ids)} speakers")
asked Mar 4 at 0:03 by JackKalish
- It looks strange. Maybe it is a bug, and maybe you should report it to the authors of this module. – furas Commented Mar 4 at 10:01
- Hi @jacki, I have posted an answer. I hope it helps you. Please consider accepting and upvoting it if it does, as per Stack Overflow guidelines; this helps other Stack contributors with their research. – Sourav Dutta Commented Mar 5 at 21:37
1 Answer
The problem here is that result.alternatives[0].words also contains all the words from the previous transcripts. You can filter the words by their start time (start_time = word_info.start_time) when result.is_final is True, so that words from previous transcripts are not captured.
In your code, the section to modify is: if words: print(words). You can refer to the documentation when changing your code: isFinal indicates whether the results obtained within this list entry are interim or final. Check out the full doc for more info.
You can also file a bug on the issue tracker and vote for it with +1.
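The start_time filter described in this answer could be sketched roughly as follows. This is only an illustration under stated assumptions, not the library's own API: the Word dataclass is a hypothetical stand-in for the per-word objects in result.alternatives[0].words, new_words and phrase_start are made-up names, and start_time / end_time are assumed to be datetime.timedelta offsets from the start of the stream. The idea is to remember the end time of the last word already handled and keep only the words that start at or after it.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Word:
    """Hypothetical stand-in for the API's per-word info object."""
    word: str
    start_time: timedelta
    end_time: timedelta
    speaker_tag: int


def new_words(words, phrase_start):
    """Keep only the words that start at or after phrase_start."""
    return [w for w in words if w.start_time >= phrase_start]


# Simulated cumulative word lists from two consecutive final results:
# the second list repeats the words of the first, as described above.
first_final = [
    Word("hello", timedelta(seconds=0.0), timedelta(seconds=0.4), 1),
    Word("there", timedelta(seconds=0.5), timedelta(seconds=0.9), 1),
]
second_final = first_final + [
    Word("good", timedelta(seconds=2.0), timedelta(seconds=2.3), 2),
    Word("morning", timedelta(seconds=2.4), timedelta(seconds=2.9), 2),
]

phrase_start = timedelta(0)  # end time of the last word already processed
phrases = []
for words in (first_final, second_final):
    phrase = new_words(words, phrase_start)
    if phrase:
        # The next phrase can only contain words after this one ends.
        phrase_start = phrase[-1].end_time
    phrases.append(phrase)
    speakers = {w.speaker_tag for w in phrase}
    print([w.word for w in phrase], f"-> {len(speakers)} speaker(s)")
```

In the question's code, the same filter would be applied to words inside the if result.is_final: branch before counting speaker tags, with phrase_start kept outside the response loop so it persists between final results.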