I am using an Arch Linux system with KDE Plasma. I have an XML file of approximately 50 MB that I need to parse. The file has custom tags.
Example XML:
<JMdict>
<entry>
<ent_seq>1000000</ent_seq>
<r_ele>
<reb>ヽ</reb>
</r_ele>
<sense>
<pos>&unc;</pos>
<gloss g_type="expl">repetition mark in katakana</gloss>
</sense>
</entry>
</JMdict>
I have tried many solutions suggested on Stack Overflow, but none of them worked, and some packages, such as xml-stream and xml2json, could not be installed on my system. I decided to use xml2js (most answers suggest it) and got the same result. How can I use it correctly?
I am using this code but it always returns undefined:
const fs = require('fs-extra');
const xml2js = require('xml2js');
const parser = new xml2js.Parser();
const path = "test.xml";
fs.readFile(path, {encoding: 'utf-8'}, function(error, data) {
    parser.parseString(data, function(err, res) {
        console.log(res);
    });
});
Result: undefined
Is there any way to handle an XML file by hand (without a package)?
Asked Jan 1, 2019 at 15:53 by Kaan Taha Köken; edited Jan 1, 2019 at 16:03 by jonrsharpe.
Comment: Your "XML" file is not well-formed: it contains an undefined entity reference, &unc;, so parsing should fail. – Michael Kay, commented Jan 1, 2019 at 18:58
3 Answers
This solution uses xml2js.
var fs = require('fs'),
    slash = require('slash'),
    xml2js = require('xml2js');
var parser = new xml2js.Parser();
let filename = slash(__dirname + '/foo.xml');
// console.log(filename);
fs.readFile(filename, "utf8", function(err, data) {
    if (err) {
        console.log('Err1111');
        console.log(err);
    } else {
        // Escape every bare '&' (one that does not start a predefined entity
        // or a character reference), so that &unc; becomes &amp;unc;
        parser.parseString(data.replace(/&(?!(?:apos|quot|[gl]t|amp);|#)/g, '&amp;'), function (err, result) {
            if (err) {
                console.log('Err');
                console.log(err);
            } else {
                console.log(JSON.stringify(result));
                console.log('Done');
            }
        });
    }
});
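The key step is the replacement below, which escapes every ampersand that does not start one of the predefined entities or a character reference:
data.replace(/&(?!(?:apos|quot|[gl]t|amp);|#)/g, '&amp;')
The only problem in your file is the &unc; entity in the tag below:
<pos>&unc;</pos>
Reference and thanks to @tim.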
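To see what that replacement does in isolation, here is a minimal sketch that needs no packages. The helper name escapeBareAmpersands and the sample string are made up for illustration; only the pattern comes from the answer above.

```javascript
// Escape every '&' that does not begin one of the five predefined XML
// entities (&amp; &lt; &gt; &apos; &quot;) or a character reference (&#...;).
// Hypothetical helper name, for illustration only.
function escapeBareAmpersands(xml) {
  return xml.replace(/&(?!(?:apos|quot|[gl]t|amp);|#)/g, '&amp;');
}

const sample = '<pos>&unc;</pos><gloss>a &lt; b &#38; c</gloss>';
console.log(escapeBareAmpersands(sample));
// prints: <pos>&amp;unc;</pos><gloss>a &lt; b &#38; c</gloss>
```

Note that this keeps the entity name as literal text: after parsing, the content of pos is the string "&unc;", not an expanded value.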
I think your problem is unescaped characters in your XML data.
I was able to get your example to work using this:
XML data:
<JMdict>
<entry>
<ent_seq>1000000</ent_seq>
<r_ele>
<reb>ヽ</reb>
</r_ele>
<sense>
<pos>YOUR PROBLEM WAS HERE</pos>
<gloss g_type="expl">repetition mark in katakana</gloss>
</sense>
</entry>
</JMdict>
node.js code:
const fs = require('fs-extra');
const xml2js = require('xml2js');
const parser = new xml2js.Parser();
const path = "test.xml";
fs.readFile(path, {encoding: 'utf-8'}, function(error, data) {
    parser.parseString(data, function(err, res) {
        console.log(JSON.stringify(res.JMdict.entry, null, 4));
    });
});
In situations like this, when I know the code should work, I always inspect the input data for possible issues.
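One way to inspect the data for this particular issue is to list the named entity references that XML does not predefine. The following is a rough sketch, not part of xml2js: the function name findUndefinedEntities is hypothetical, and it assumes the file has no DTD that declares the entities (it does not look at any DOCTYPE).

```javascript
// Names of the five entities that XML predefines.
const PREDEFINED = new Set(['amp', 'lt', 'gt', 'apos', 'quot']);

// Return the distinct named entity references in `xml` that are not predefined.
// Character references such as &#12541; are skipped because the pattern
// requires a letter or underscore right after '&'.
// Assumption: no DTD in the file declares these names.
function findUndefinedEntities(xml) {
  const names = new Set();
  for (const match of xml.matchAll(/&([A-Za-z_][\w.-]*);/g)) {
    if (!PREDEFINED.has(match[1])) names.add(match[1]);
  }
  return [...names];
}

console.log(findUndefinedEntities('<pos>&unc;</pos><gloss>a &amp; b</gloss>'));
```

For the sample string, the returned array contains only the name unc, which points straight at the offending tag.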
The way you use the xml2js package should be fine. However, your XML is not quite well-formed.
If you add a console.log to see what is causing the error:
fs.readFile(path, {encoding: 'utf-8'}, function(error, data) {
    parser.parseString(data, function(err, res) {
        if (err) console.log(err);
        console.log(res);
    });
});
You'll see that it is the line <pos>&unc;</pos> that causes the problem.
If you fix the undefined entities, the parser should work fine.
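One possible way to fix the entities before parsing is to substitute each known name with its text. This is only a sketch under stated assumptions: the table ENTITY_TEXT, the expansion given for unc, and both function names are hypothetical, and a real file would need one table entry per entity it uses.

```javascript
// Hypothetical lookup table: entity name -> replacement text.
// The expansion given for "unc" is an assumption made for this sketch.
const ENTITY_TEXT = { unc: 'unclassified' };

// Escape characters that are special in XML text content.
function escapeXmlText(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Replace each &name; that appears in the table with its escaped text;
// every other reference (for example &amp;) is left untouched.
function expandEntities(xml, table) {
  return xml.replace(/&([A-Za-z_][\w.-]*);/g, (ref, name) =>
    Object.prototype.hasOwnProperty.call(table, name) ? escapeXmlText(table[name]) : ref
  );
}

console.log(expandEntities('<pos>&unc;</pos>', ENTITY_TEXT));
// prints: <pos>unclassified</pos>
```

The result of expandEntities(data, ENTITY_TEXT) can then be passed to parser.parseString in place of data.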