
python - How to Adjust Google TTS SSML to Match Original SRT Timing? - Stack Overflow


I have an .srt file where each speech segment is supposed to last a specific duration (e.g., 4 seconds). However, when I generate the speech using Google Text-to-Speech (TTS) with SSML, the resulting audio plays the same segment in a shorter time (e.g., 3 seconds).
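
The "original duration" of a segment is the gap between the start and end timestamps on its `.srt` timing line. Below is a minimal sketch of how to compute it in milliseconds. It assumes standard `HH:MM:SS,mmm` timestamps. The helper names `srt_time_to_ms` and `cue_duration_ms` are hypothetical and do not come from the question.

```python
import re

# SRT timestamps look like "00:01:02,345" (hours:minutes:seconds,milliseconds).
_TS = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def srt_time_to_ms(ts: str) -> int:
    """Convert an SRT timestamp such as '00:00:04,000' to milliseconds."""
    m = _TS.fullmatch(ts.strip())
    if m is None:
        raise ValueError(f"not an SRT timestamp: {ts!r}")
    h, mnt, s, ms = (int(g) for g in m.groups())
    return ((h * 60 + mnt) * 60 + s) * 1000 + ms


def cue_duration_ms(timing_line: str) -> int:
    """Target duration of one cue, taken from its 'start --> end' timing line."""
    start, end = timing_line.split("-->")
    # Some SRT files append positioning after the end time, so keep only the first token.
    return srt_time_to_ms(end.split()[0]) - srt_time_to_ms(start.strip())


print(cue_duration_ms("00:00:01,000 --> 00:00:05,000"))  # 4000
```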

I want to adjust the speech rate dynamically in SSML so that each segment matches its original timing. My idea is to use ffmpeg to extract the actual duration of each generated speech segment, then calculate the speech rate percentage as:

$$\text{speech rate} = \frac{\text{generated duration}}{\text{original duration}}$$
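
Here is the ratio above worked through for the example in the question. Audio that takes 3 s at the default rate has to fill a 4 s slot, so the rate becomes 3 / 4 = 75%. At 75% of normal speed the segment should take roughly 3 / 0.75 = 4 s. This is a sketch only. It assumes spoken duration scales inversely with the rate, which real TTS voices only approximate, and `<break>` pauses are not scaled at all. The function name is hypothetical.

```python
def speech_rate_percent(generated_s: float, original_s: float) -> float:
    """Prosody rate (percent) intended to stretch or squeeze audio to original_s.

    Assumes generated_s was measured at the default (100%) rate and that the
    spoken duration scales roughly inversely with the rate.
    """
    if original_s <= 0:
        raise ValueError("original duration must be positive")
    return generated_s / original_s * 100.0


print(speech_rate_percent(3.0, 4.0))  # 75.0
```

The ratio is relative to the rate used when the audio was measured. If a segment was measured at a rate other than 100%, multiply the result by that rate.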

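This percentage would then be applied through the `rate` attribute of the SSML `<prosody>` tag. For the 3 s / 4 s example above, that gives: `<prosody rate="75%">Text to be spoken</prosody>`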
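
This minimal sketch wraps a segment's text in a `<prosody>` element. It uses the standard-library `xml.sax.saxutils.escape` to escape `&`, `<` and `>`. Rounding to a whole percent is an assumption made here. The helper name `prosody` is hypothetical.

```python
from xml.sax.saxutils import escape


def prosody(text: str, rate_percent: float) -> str:
    """Wrap text in an SSML <prosody> element with a percentage rate."""
    # escape() handles &, < and >; the text is element content, so quotes need no escaping.
    return f'<prosody rate="{rate_percent:.0f}%">{escape(text)}</prosody>'


print(prosody("Text to be spoken", 75.0))
# <prosody rate="75%">Text to be spoken</prosody>
```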

How can I accurately measure the duration of each segment using ffmpeg, and what is the best way to apply the correct speech rate in SSML to match the original .srt timing?
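
One way to get the measured duration is `ffprobe`, which ships with ffmpeg. It can print only the container duration in seconds. The sketch below assumes each segment was synthesized into its own audio file and that `ffprobe` is on `PATH`. The helper names are hypothetical. Command construction and parsing are kept as separate functions so they can be checked without running ffprobe.

```python
import subprocess


def ffprobe_duration_cmd(path: str) -> list[str]:
    """ffprobe command that prints only the container duration, in seconds."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]


def parse_duration(output: str) -> float:
    """Parse ffprobe's single-value output, e.g. '3.024000' plus a newline."""
    return float(output.strip())


def audio_duration_s(path: str) -> float:
    """Measured duration of one synthesized segment (requires ffprobe on PATH)."""
    result = subprocess.run(
        ffprobe_duration_cmd(path), capture_output=True, text=True, check=True
    )
    return parse_duration(result.stdout)
```

You would then pass the value from `audio_duration_s("segment_1.mp3")` (a hypothetical file name) as the generated duration in the rate formula.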

I tried `duration`, and my SSML should look like this:

```python
f.write(f'\t<p>{break_until_start}{text}<break time="{value["break_until_next"]}ms"/></p>\n')
```

Code writing the SSML:

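```python
# Runs inside the SSML-writing block: `f` is the open output file (after '<speak>' was written)
# and `subsDict` maps cue numbers ("1", "2", ...) to dicts with 'text', 'start_ms',
# 'end_ms' and 'break_until_next'.
for key, value in subsDict.items():
    text = value['text']
    start_time_ms = int(value['start_ms'])  # Start time in milliseconds
    previous_end_ms = int(subsDict.get(str(int(key) - 1), {}).get('end_ms', 0))  # Get the previous end time
    gap_to_fill = max(0, start_time_ms - previous_end_ms)

    # Escape XML special characters ('&' first so the other entities are not double-escaped)
    text = text.replace("&", "&amp;").replace('"', "&quot;").replace("'", "&apos;").replace("<", "&lt;").replace(">", "&gt;")

    break_until_start = f'<break time="{gap_to_fill}ms"/>' if gap_to_fill > 0 else ''

    f.write(f'\t<p>{break_until_start}{text}<break time="{value["break_until_next"]}ms"/></p>\n')

f.write('</speak>\n')
```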
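
Below is one possible way to fold the per-segment rate into an SSML writer like the one above. It is a sketch under stated assumptions, not a tested recipe for Google TTS. It assumes each cue dict has `start_ms`, `end_ms` and `text`, as the `subsDict` values do. It also assumes `generated_ms` holds hypothetical per-cue durations measured at the default rate. The function name `build_ssml` is hypothetical. Each silence is emitted once, as a leading `<break>`. If `break_until_next` in the code above holds the gap to the next cue, then writing both a leading and a trailing break could insert each gap twice.

```python
from xml.sax.saxutils import escape


def build_ssml(cues: list[dict], generated_ms: list[int]) -> str:
    """Build SSML in which each cue's prosody rate is chosen to fill its SRT slot.

    cues: dicts with 'start_ms', 'end_ms', 'text' (same keys as the subsDict values).
    generated_ms: measured duration of each cue when synthesized at the default rate.
    """
    parts = ["<speak>"]
    previous_end_ms = 0
    for cue, gen_ms in zip(cues, generated_ms):
        start_ms, end_ms = int(cue["start_ms"]), int(cue["end_ms"])
        gap_ms = max(0, start_ms - previous_end_ms)  # silence before this cue
        target_ms = end_ms - start_ms  # slot length from the .srt file
        rate = gen_ms / target_ms * 100 if target_ms > 0 else 100
        pause = f'<break time="{gap_ms}ms"/>' if gap_ms > 0 else ""
        parts.append(
            f'\t<p>{pause}<prosody rate="{rate:.0f}%">{escape(cue["text"])}</prosody></p>'
        )
        previous_end_ms = end_ms
    parts.append("</speak>")
    return "\n".join(parts)


cues = [
    {"start_ms": 1000, "end_ms": 5000, "text": "Hello & welcome"},
    {"start_ms": 6000, "end_ms": 8000, "text": "Bye"},
]
print(build_ssml(cues, [3000, 2000]))
```

Because the rate-to-duration relationship is only approximate, you may need to re-measure after a first pass and adjust.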