最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

javascript - text to audio file on node.js - Stack Overflow

programmeradmin0浏览0评论

Im looking for an optimized legal way to create an audio file from text on nodejs.

Now I see 5 variants:

1) simple hhtp request to google translate text-to-sppeach api. This variant isn't good, because each request demands on generated token e.g. 'tk:729008.879154' Without this it can fail. Besides that, this option is 'illegal'.

2) http request to google translate text-to-sppeach api from 'console browser' - puppeteer

Is there a way to generate right token key to make this request 'legal'?

3) use Web Speech Api in puppeteer to get binary data and save it to file? Or is there a way to work with Chromium/Chrome source code ?

4) Using any other technology/language library on machine with nodejs and to use js as interpreter to call mands in this technology/programm. Any ideas?

5) Any free public api with different languages support (dream api)?

Any advice will be highly appreciated.

Im looking for an optimized legal way to create an audio file from text on nodejs.

Now I see 5 variants:

1) simple hhtp request to google translate text-to-sppeach api. This variant isn't good, because each request demands on generated token e.g. 'tk:729008.879154' Without this it can fail. Besides that, this option is 'illegal'.

2) http request to google translate text-to-sppeach api from 'console browser' - puppeteer

Is there a way to generate right token key to make this request 'legal'?

3) use Web Speech Api in puppeteer to get binary data and save it to file? Or is there a way to work with Chromium/Chrome source code ?

4) Using any other technology/language library on machine with nodejs and to use js as interpreter to call mands in this technology/programm. Any ideas?

5) Any free public api with different languages support (dream api)?

Any advice will be highly appreciated.

Share Improve this question asked Apr 16, 2018 at 15:59 Oleg MiniukOleg Miniuk 311 silver badge4 bronze badges 1
  • As I've mentioned in the answer below, there are a couple of options that do not require a key, the MARY engine (mary.dfki.de/documentation/index.html) is a good choice and you can install locally. – Terry Lennox Commented Apr 19, 2018 at 11:32
Add a ment  | 

2 Answers 2

Reset to default 5

One possible approach is to wrap the eSpeak mand line tool (Windows & Linux) http://espeak.sourceforge/. You can then wrap with Node.js.

const { exec } = require('child_process');

var outputFile = process.argv[2] || "output.wav";
var voice = process.argv[3] || "en-uk-north";
var text = process.argv[4] || "hello there buddy";
var mand = `espeak.exe -v ${voice} -w ${outputFile} "${text}"`;

exec(mand, (err, stdout, stderr) => {
  if (err) {
    console.log("Error occurred: ", err);
    return;
  }
});

This gives a fairly low quality output.

I've also played with the Bing Speech API and the output is very good, I've created a Node.js example. You would need to sign up for an API key but this is very easy (you to to https://azure.microsoft./en-us/try/cognitive-services/ and select "Speech").

const key = 'your api key here';

function synthesizeSpeech(apiKey)
{
    const fs = require('fs');
    const request = require('request');
    const xmlbuilder = require('xmlbuilder');
    const text = process.argv[2] || "The fault, dear Brutus, is not in our stars, But in ourselves, that we are underlings.";
    const outputFile = process.argv[3] || "speech.wav";

    var ssml_doc = xmlbuilder.create('speak')
        .att('version', '1.0')
        .att('xml:lang', 'en-au')
        .ele('voice')
        .att('xml:lang', 'en-au')
        .att('xml:gender', 'Female')
        .att('name', 'Microsoft Server Speech Text to Speech Voice (en-AU, HayleyRUS)')
        .txt(text)
        .end();
    var post_speak_data = ssml_doc.toString();

    console.log('Synthesizing speech: ', text);
    request.post({
        url: 'https://api.cognitive.microsoft./sts/v1.0/issueToken',
        headers: {
            'Ocp-Apim-Subscription-Key' : apiKey
        }
    }, function (err, resp, access_token) {
        if (err || resp.statusCode != 200) {
            console.log(err, resp.body);
        } else {
            try {
                request.post({
                    url: 'https://speech.platform.bing./synthesize',
                    body: post_speak_data,
                    headers: {
                        'content-type' : 'application/ssml+xml',
                        'X-Microsoft-OutputFormat' : 'riff-16khz-16bit-mono-pcm',
                        'Authorization': 'Bearer ' + access_token,
                        'X-Search-AppId': '9FCF779F0EFB4E8E8D293EEC544221E9',
                        'X-Search-ClientID': '0A13B7717D0349E683C00A6AEA9E8B6D',
                        'User-Agent': 'Node.js-Demo'
                    },
                    encoding: null
                }, function (err, resp, data) {
                    if (err || resp.statusCode != 200) {
                        console.log(err, resp.body);
                    } else {
                        try {
                            console.log('Saving output to file: ', outputFile);
                            fs.writeFileSync(outputFile, data);
                        } catch (e) {
                            console.log(e.message);
                        }
                    }
                });
            } catch (e) {
                console.log(e.message);
            }
        }
    });
}

synthesizeSpeech(key);

Also check out the MARY project here: http://mary.dfki.de/, this is an open source server that you can install, the voice output is very good, you could make calls to the server from node.js.

If you install the Mary Speech engine (quite easy):

"use strict";

const fs = require('fs');
const request = require('request');
const text = process.argv[2] || "The fault, dear Brutus, is not in our stars, But in ourselves, that we are underlings.";
const outputFile = process.argv[3] || "speech_mary_output.wav";

const options = {
    url: `http://localhost:59125/process?INPUT_TEXT=${text}!&INPUT_TYPE=TEXT&OUTPUT_TYPE=AUDIO&AUDIO=WAVE_FILE&LOCALE=en_US&VOICE=cmu-slt-hsmm`,
    encoding: null // Binary data.
}

console.log('Synthesizing speech (using Mary engine): ', text);
console.log('Calling: ', options.url);
request.get(options, function (err, resp, data) {
    if (err || resp.statusCode != 200) {
        console.log(err, resp.body);
    } else {
        try {
            console.log(`Saving output to file: ${outputFile}, length: ${data.length} byte(s)`);
            fs.writeFileSync(outputFile, data, { encoding: 'binary'});
        } catch (e) {
            console.log(e.message);
        }
    }
});

This will synthesize speech for you. No API key required!

Using text2wav.node.js, you don't need to depend on any external online service or separately installed host program. It's all self-contained FOSS. Besides, it supports 101 languages.

发布评论

评论列表(0)

  1. 暂无评论