javascript - Parsing XML file in Node.js - Stack Overflow

I am using an Arch Linux system with KDE Plasma. I have an XML file of approximately 50 MB, and I need to parse it. The file has custom tags.

Example XML:

<JMdict>
   <entry>
      <ent_seq>1000000</ent_seq>
      <r_ele>
         <reb>ヽ</reb>
      </r_ele>
      <sense>
         <pos>&unc;</pos>
         <gloss g_type="expl">repetition mark in katakana</gloss>
      </sense>
   </entry>
</JMdict>

I have tried many solutions suggested on Stack Overflow, and none of them worked. Some packages, such as xml-stream and xml2json, could not even be installed on my system. I decided to use xml2js (most answers suggest it), and got the same result. How can I use it correctly? I am using this code, but it always logs undefined:

const fs = require('fs-extra');
const xml2js = require('xml2js');
const parser = new xml2js.Parser();

const path = "test.xml";

fs.readFile(path, {encoding: 'utf-8'}, function(error, data) {
     parser.parseString(data, function(err, res) {
         console.log(res);
     });
});

Result: undefined

Is there any way to handle an XML file by hand (without a package)?

Asked Jan 1, 2019 at 15:53 by Kaan Taha Köken; edited Jan 1, 2019 at 16:03 by jonrsharpe.

Comment from Michael Kay (Jan 1, 2019 at 18:58): Your "XML" file is not well-formed: it contains an undefined entity reference &unc;. So parsing should fail.

3 Answers

This solution uses xml2js.

var fs = require('fs'),
    slash = require('slash'),
    xml2js = require('xml2js');

var parser = new xml2js.Parser();

// Convert any Windows backslashes in the path to forward slashes
let filename = slash(__dirname + '/foo.xml');

fs.readFile(filename, "utf8", function(err, data) {
    if (err) {
        console.log('Failed to read the file:');
        console.log(err);
    } else {
        // Escape every "&" that does not start a predefined entity
        // (&apos; &quot; &gt; &lt; &amp;) or a numeric character reference,
        // so the undefined &unc; no longer makes the parser fail.
        parser.parseString(data.replace(/&(?!(?:apos|quot|[gl]t|amp);|#)/g, '&amp;'), function (err, result) {
            if (err) {
                console.log('Failed to parse the XML:');
                console.log(err);
            } else {
                console.log(JSON.stringify(result));
                console.log('Done');
            }
        });
    }
});

The essential step is this replacement:

data.replace(/&(?!(?:apos|quot|[gl]t|amp);|#)/g, '&amp;')

The only problem is the undefined entity &unc; in this tag:

<pos>&unc;</pos>

Reference, with thanks to @tim.
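
To see what that replacement does on its own, here is a minimal sketch that needs no packages. The helper name escapeUndefinedEntities and the sample string are made up for illustration; only the regular expression comes from the answer above.

```javascript
// Escape every "&" that does not start a predefined XML entity
// (&apos; &quot; &gt; &lt; &amp;) or a numeric character reference (&#...;).
// escapeUndefinedEntities is a hypothetical helper name, not part of xml2js.
function escapeUndefinedEntities(xml) {
  return xml.replace(/&(?!(?:apos|quot|[gl]t|amp);|#)/g, '&amp;');
}

// Made-up sample: one undefined entity plus references that must stay intact.
const sample = '<pos>&unc;</pos><gloss>a &amp; b &lt; c &#12541;</gloss>';

console.log(escapeUndefinedEntities(sample));
// prints: <pos>&amp;unc;</pos><gloss>a &amp; b &lt; c &#12541;</gloss>
```

Note the trade-off: after this step the parser should see the literal text "&unc;" inside <pos> rather than an entity reference, so the entity's meaning is not expanded, only preserved as text.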

I think your problem is an unescaped character in your XML data.

I was able to get your example to work with the following:

XML data:

<JMdict>
    <entry>
        <ent_seq>1000000</ent_seq>
        <r_ele>
            <reb>ヽ</reb>
        </r_ele>
        <sense>
             <pos>YOUR PROBLEM WAS HERE</pos>
             <gloss g_type="expl">repetition mark in katakana</gloss>
        </sense>
    </entry>
</JMdict>

Node.js code:

const fs = require('fs-extra');
const xml2js = require('xml2js');
const parser = new xml2js.Parser();

const path = "test.xml";

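fs.readFile(path, {encoding: 'utf-8'}, function(error, data) {
     // Stop early if the file could not be read
     if (error) throw error;
     parser.parseString(data, function(err, res) {
         // res is undefined when parsing fails, so check err first
         if (err) throw err;
         console.log(JSON.stringify(res.JMdict.entry, null, 4));
     });
});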

In situations like this, when I know the code should work, I always check the input data for possible issues.
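
That kind of check can be automated. The sketch below scans a string for named entity references that XML does not predefine and reports their line numbers. The function name findUndefinedEntities is hypothetical, and the sketch assumes the file has no DOCTYPE that declares extra entities (as in the sample from the question).

```javascript
// The five entities every XML parser knows without a DTD.
const PREDEFINED = new Set(['amp', 'lt', 'gt', 'apos', 'quot']);

// Hypothetical helper: list named entity references that are not predefined.
// Assumes the document declares no entities of its own in a DOCTYPE.
function findUndefinedEntities(xml) {
  const hits = [];
  xml.split('\n').forEach((line, index) => {
    // Numeric references such as &#12541; start with "#" and are skipped.
    for (const match of line.matchAll(/&([A-Za-z_][\w.-]*);/g)) {
      if (!PREDEFINED.has(match[1])) {
        hits.push({ line: index + 1, entity: match[1] });
      }
    }
  });
  return hits;
}

// The sample XML from the question, joined into one string.
const sampleXml = [
  '<JMdict>',
  '   <entry>',
  '      <ent_seq>1000000</ent_seq>',
  '      <r_ele>',
  '         <reb>ヽ</reb>',
  '      </r_ele>',
  '      <sense>',
  '         <pos>&unc;</pos>',
  '         <gloss g_type="expl">repetition mark in katakana</gloss>',
  '      </sense>',
  '   </entry>',
  '</JMdict>',
].join('\n');

console.log(findUndefinedEntities(sampleXml));
// prints: [ { line: 8, entity: 'unc' } ]
```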

The way you use the xml2js package is fine. However, your XML is not well-formed.

If you log the error passed to the callback, you can see what is causing the failure:

fs.readFile(path, {encoding: 'utf-8'}, function(error, data) {
     parser.parseString(data, function(err, res) {
         if (err) console.log(err);

         console.log(res);
     });
});

You'll see that the line <pos>&unc;</pos> causes the problem: &unc; is an entity that is not defined anywhere in the file. If you fix the undefined entities, the parser should work fine.
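
One possible way to fix the entities before parsing is to expand them from a lookup table. This is only a sketch: the helper names are hypothetical, and the table content (unc mapped to "unclassified") is an assumption for illustration, so check the real definitions for your data. Names missing from the table are escaped so they survive as literal text.

```javascript
// The five entities every XML parser knows without a DTD.
const PREDEFINED = new Set(['amp', 'lt', 'gt', 'apos', 'quot']);

// Assumed expansion table for illustration only; fill it from your own data.
const ENTITY_TABLE = { unc: 'unclassified' };

// Escape characters that are not allowed as-is in XML text content.
function escapeText(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Hypothetical helper: expand known custom entities, escape unknown ones.
function expandEntities(xml, table) {
  return xml.replace(/&([A-Za-z_][\w.-]*);/g, (match, name) => {
    if (PREDEFINED.has(name)) return match; // leave &amp; and friends alone
    if (Object.prototype.hasOwnProperty.call(table, name)) {
      return escapeText(table[name]);
    }
    return '&amp;' + name + ';'; // unknown entity: keep it as literal text
  });
}

console.log(expandEntities('<pos>&unc;</pos>', ENTITY_TABLE));
// prints: <pos>unclassified</pos>
```

The resulting string can then be passed to parser.parseString in place of the raw file contents.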
