javascript - Fastest way to process audio segments and concatenate buffers in Node.js - Stack Overflow

I am trying to optimize a Node.js function that generates audio segments (for example, with something like OpenAI's TTS API) and then concatenates the resulting audio buffers. My goal is to speed up two steps: reading each audio response into a buffer, and the final concatenation of the buffers.

Current Approach

Naively, I thought it would be as simple as requesting all of the audio segments, reading each one into its own buffer, and joining them together. Here is a simplified version of what I am doing now:

const createAudioSegment = async (text) => {
    const response = await openai.audio.speech.create({
        model: "tts-1",
        voice: "echo",
        input: text,
    });
    return response;
};

const audio_texts = ["text 1", "text 2"] // list of text to turn into audio

const segmentsTimeStart = new Date().getTime()
// Process dialogue segments in parallel
const audioSegments = await Promise.all(
    audio_texts.map((text) => createAudioSegment(text))
);
const segmentsTimeEnd = new Date().getTime()
const segmentsTimeDiff = segmentsTimeEnd - segmentsTimeStart
console.log(`Total Audio Segment Time: ${segmentsTimeDiff}ms`)

const audioBufferReadStart = new Date().getTime()
let responseTimes = []
const audioBuffers = await Promise.all(
    audioSegments.map(async (segment) => {
     const responseStartTime = new Date().getTime()
     const arrayBuffer = await segment.arrayBuffer();
     const responseTimeDiff = responseEndTime - responseStartTime
     responseTimes.push(responseTimeDiff)
     return Buffer.from(arrayBuffer);
    })
);
const audioBufferReadEnd = new Date().getTime()

console.log(`Audio Buffer Total Read Time: ${audioBufferReadEnd - audioBufferReadStart}ms`)
console.log(`Audio Buffer Individual Read Times (ms): ${responseTimes}`)

const concatStartTime = new Date().getTime()
const finalBuffer = Buffer.concat(audioBuffers);
const concatEndTime = new Date().getTime()
console.log(`Buffer Concatenation Time: ${concatEndTime - concatStartTime}ms`)
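Two properties of Promise.all matter when reading these timings. First, it resolves with results in the same order as the input array, so Buffer.concat(audioBuffers) joins the segments in text order. Second, any side effect inside the callbacks, such as responseTimes.push, happens in the order the reads finish. The minimal sketch below shows both. The `delay` helper is hypothetical and stands in for a network read.

```javascript
// Hypothetical stand-in for a network read: resolves with `value` after `ms`.
const delay = (ms, value) =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));

async function demoOrdering() {
  const delays = [30, 10, 20]; // segment 0 is slowest, segment 1 is fastest
  const completionOrder = [];

  const results = await Promise.all(
    delays.map(async (ms, i) => {
      const value = await delay(ms, `segment ${i}`);
      completionOrder.push(i); // runs when this read finishes, like responseTimes.push
      return value;
    })
  );

  // results follow input order; completionOrder follows finishing order
  return { results, completionOrder };
}

demoOrdering().then(({ results, completionOrder }) => {
  console.log(results); // prints [ 'segment 0', 'segment 1', 'segment 2' ]
  console.log(completionOrder); // prints [ 1, 2, 0 ]
});
```

Because of this, the values in a pushed array like responseTimes cannot be matched to specific segments by position.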

Performance Issues

Overall, the whole process usually takes 40-60 seconds. However, when I log the individual operations, reading each audio segment into its own buffer turns out to take most of that time. For example, with 12 audio segments I see the following timings:

  • Total Audio Segment Time: 2333 ms
  • Audio Buffer Total Read Time: 27455 ms
  • Audio Buffer Individual Read Times (ms): [1035,3420,6497,96,150,360,70,88,20344,32,83,254]
  • Buffer Concatenation Time: 2 ms

It takes over 20 seconds to read all of the responses into their buffers. Why do some reads take under a second and others over 20 seconds? I don't understand.

Question

What could be causing such a large discrepancy between the individual read times? Can it be avoided with a more efficient approach?
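One way to narrow down where the read time goes is to read a body chunk by chunk and record when the first and last chunks arrive. With fetch, the Response is available once the headers arrive, and the body can keep streaming afterwards. Assuming the speech call behaves the same way, a long gap between the first and last chunk would suggest the server is still producing audio, rather than local buffering being slow. The `streamingResponse` helper below is hypothetical: it builds a Response whose chunks arrive on a timer, standing in for a real TTS response. The code needs Node 18+ for the global Response and ReadableStream.

```javascript
// Hypothetical stand-in for a TTS response: emits `chunks` one-byte pieces, one every `intervalMs`.
function streamingResponse(chunks, intervalMs) {
  let sent = 0;
  const stream = new ReadableStream({
    start(controller) {
      const timer = setInterval(() => {
        controller.enqueue(new Uint8Array([sent]));
        sent += 1;
        if (sent === chunks) {
          clearInterval(timer);
          controller.close();
        }
      }, intervalMs);
    },
  });
  return new Response(stream);
}

// Reads a fetch-style Response body and reports when data actually arrived.
async function timeBody(response) {
  const start = performance.now();
  const reader = response.body.getReader();
  const parts = [];
  let firstChunkMs = null;

  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    if (firstChunkMs === null) firstChunkMs = performance.now() - start;
    parts.push(value);
  }

  return {
    buffer: Buffer.concat(parts), // Buffer.concat accepts Uint8Array chunks
    firstChunkMs,
    lastChunkMs: performance.now() - start,
  };
}

timeBody(streamingResponse(5, 20)).then(({ buffer, firstChunkMs, lastChunkMs }) => {
  console.log(buffer.length, firstChunkMs.toFixed(0), lastChunkMs.toFixed(0));
});
```

Applied to a real response object, this shows whether a slow arrayBuffer() is spent waiting for bytes or processing them.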

Asked Mar 13 at 15:59 by Trevor Woods; edited Mar 18 at 17:32.
  • It would be helpful to know how you are measuring "Audio Buffer Individual Read Times". – James Commented Mar 13 at 17:59
  • const createAudioSegment = async ([i, segment]) - when you call it createAudioSegment(text) you are passing a string. Please fix/clarify. – James Commented Mar 13 at 18:05
  • Can you put console.time and console.timeEnd between the calls and share the result? – bogdanoff Commented Mar 16 at 17:17
  • Thanks for the comments/edits, gents. @James yes, simply the text is being passed in. @bogdanoff I've added the logging statements for clarity. – Trevor Woods Commented Mar 18 at 15:04
  • Assuming response there is just an HTTP response, you're essentially making N parallel calls to OpenAI, so sure, some of them might finish quicker and others more slowly. .. – AKX Commented Mar 18 at 15:33

1 Answer


Note: This is not the final answer, but my attempt to narrow down the problem.

I have noticed a few issues with your code. The first:

Issue 1

const audioSegments = await Promise.all(
    audio_texts.map((text) => createAudioSegment(text))
);

Notice that you are passing a string (text) to createAudioSegment, but the definition destructures its parameter as if it were an array:

const createAudioSegment = async ([text]) => {

so only the very first character of each string (for "text 1", just "t") was being sent to OpenAI.
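This is easy to check in isolation. Array destructuring works on any iterable, and a string iterates over its characters, so `[text]` binds only the first character. The two function names below are illustrative only.

```javascript
// Signature from the earlier revision of the question: destructures its argument.
const withDestructuring = ([text]) => text;
// Corrected signature: takes the string as-is.
const withPlainParam = (text) => text;

console.log(withDestructuring("text 1")); // prints t
console.log(withPlainParam("text 1")); // prints text 1
```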

Issue 2

const responseTimeDiff = responseEndTime - responseStartTime

responseEndTime is never defined, so this line throws a ReferenceError and the surrounding Promise.all rejects.
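Here is a minimal corrected version of that measuring loop, as a sketch. It takes the end timestamp after the await and stores each duration at the segment's index, so the logged times line up with audio_texts. The `fakeSegment` helper is hypothetical and only mimics the arrayBuffer() method of a response.

```javascript
// Hypothetical stand-in for an SDK response: arrayBuffer() resolves after `ms`.
const fakeSegment = (bytes, ms) => ({
  arrayBuffer: () =>
    new Promise((resolve) => setTimeout(() => resolve(new Uint8Array(bytes).buffer), ms)),
});

async function readAll(audioSegments) {
  const responseTimes = new Array(audioSegments.length);
  const audioBuffers = await Promise.all(
    audioSegments.map(async (segment, i) => {
      const responseStartTime = Date.now();
      const arrayBuffer = await segment.arrayBuffer();
      const responseEndTime = Date.now(); // taken after the await, so it is defined
      responseTimes[i] = responseEndTime - responseStartTime; // stored by index, not push order
      return Buffer.from(arrayBuffer);
    })
  );
  return { finalBuffer: Buffer.concat(audioBuffers), responseTimes };
}

readAll([fakeSegment([1, 2], 40), fakeSegment([3], 10)]).then(({ finalBuffer, responseTimes }) => {
  console.log(finalBuffer); // prints <Buffer 01 02 03>
  console.log(responseTimes); // roughly [ 40, 10 ], exact values vary
});
```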

After fixing both issues and simplifying, you get this:

Simplified code

import OpenAI from 'openai'
const openai = new OpenAI()

/** @param {string} text */
const createAudioSegment = (text) =>
  openai.audio.speech.create({
    model: 'tts-1',
    voice: 'echo',
    input: text,
  })

const audio_texts = ['I am text 1', 'I am text 2']

console.time('audioSegments')
const audioSegments = await Promise.all(audio_texts.map((text) => createAudioSegment(text)))
console.timeEnd('audioSegments')

console.time('audioArrayBuffers')
const audioArrayBuffers = await Promise.all(audioSegments.map((segment) => segment.arrayBuffer()))
console.timeEnd('audioArrayBuffers')

console.time('buffers')
const buffers = audioArrayBuffers.map((ab) => Buffer.from(ab))
console.timeEnd('buffers')

console.time('finalBuffer')
const finalBuffer = Buffer.concat(buffers)
console.timeEnd('finalBuffer')

console.log(finalBuffer.length, finalBuffer.byteLength)

Now run this code as-is and share the whole terminal output here. (I don't have access to OpenAI, so I could not test it.)
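A variant that may be worth timing alongside this one, offered as a sketch rather than a known fix. In both the question's code and the code above, no body is read until every speech.create call has resolved. Chaining the request and the body read inside one async function per text lets each segment's read start as soon as its own response is available. Promise.all still returns the buffers in text order. `fakeSpeechCreate` is a hypothetical stub that returns a fetch Response after a delay; in real code it would be replaced by the OpenAI call. It needs Node 18+ for the global Response.

```javascript
// Hypothetical stub standing in for openai.audio.speech.create:
// resolves with a fetch Response after `ms` milliseconds.
const fakeSpeechCreate = (text, ms) =>
  new Promise((resolve) => setTimeout(() => resolve(new Response(`audio:${text};`)), ms));

// One pipeline per text: request, then read that same response's body right away.
const synthesize = (texts, delays) =>
  Promise.all(
    texts.map(async (text, i) => {
      const response = await fakeSpeechCreate(text, delays[i]);
      return Buffer.from(await response.arrayBuffer());
    })
  ).then((buffers) => Buffer.concat(buffers)); // buffers are in text order

synthesize(["I am text 1", "I am text 2"], [30, 10]).then((finalBuffer) => {
  console.log(finalBuffer.toString()); // prints audio:I am text 1;audio:I am text 2;
});
```

Whether this helps in practice depends on how much of the body the HTTP client buffers before it is read, so compare the timings rather than assuming a speed-up.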
