javascript - Replace HTML entities (e.g. &#8217;) with character equivalents when parsing an XML feed - Stack Overflow

When parsing an XML feed, I am getting text from the content tag, like this:

The Government has awarded funding for a major refurbishment project to go ahead at St Eunan&#8217;s College. This is in addition to last month&#8217;s announcement that grant for its prefabs to be replaced with permanent accommodation. The latest grant will allow for major refurbishment to a section of the school to allow for new accommodation for classes &#8211; the project will also involve roof repairs, the installation of a dust extraction system, new science room fittings and installation of firm alarms. Donegal Deputy Joe McHugh says credit must go to the school&#8217;s board of management

Is there any way to easily replace these HTML entities (for apostrophes, dashes, etc.) with their character equivalents?

EDIT:

Ti.API.info("is this real------------"+win.dataToPass)


returns: (line breaks added for clarity)

[INFO][TiAPI   ( 5437)]  Is this real------------------Police in Strabane are
warning home owners and car owners in the town to be vigilant following a recent
spate of break-ins. There has been a number of thefts from gardens and vehicles
in the Jefferson Court and Carricklynn Avenue area of the town. The PSNI have
said that residents have reported seeing a dark haired male in and around the
area in the early hours of the morning. Local Cllr Karina Carlin has been
monitoring the situation &#8211; she says the problem seems to be getting
worse&#8230;&#8230;.


My external.js file, which simply displays the text above, is below:

var win= Titanium.UI.currentWindow;

Ti.API.info("Is this real------------------"+ win.dataToPass);

var escapeChars = { lt: '<', gt: '>', quot: '"', apos: "'", amp: '&' };

function unescapeHTML(str) {//modified from underscore.string and string.js
    return str.replace(/\&([^;]+);/g, function(entity, entityCode) {
        var match;

        if ( entityCode in escapeChars) {
            return escapeChars[entityCode];
        } else if ( match = entityCode.match(/^#x([\da-fA-F]+)$/)) {
            return String.fromCharCode(parseInt(match[1], 16));
        } else if ( match = entityCode.match(/^#(\d+)$/)) {
            return String.fromCharCode(~~match[1]);
        } else {
            return entity;
        }
    });
}

var newText= unescapeHTML(win.datatoPass);


var label= Titanium.UI.createLabel({
    color: "black",
    //text: win.dataToPass,//this works!
    text:newText,//this is causing an error
    font: "Helvetica",
    fontSize: 50,
    width: "auto",
    height: "auto",
    textAlign: "center"
})

win.add(label);
asked Jul 16, 2013 at 14:03 by user2363025; edited Sep 11, 2014 at 18:17 by Dennis T --Reinstate Monica--
  • Avoid as in remove them? replace them with their character equivalents? - What do you want to do with the string? – Alex K. Commented Jul 16, 2013 at 14:07
  • @ Alex K. Yep replace them with their character equivalents. I am displaying them as text on a window – user2363025 Commented Jul 16, 2013 at 14:11
  • @ Alex K. I realise a custom find and replace function could do it, but I was wondering if there was another way, as I'd have to know all the possible special characters which could appear – user2363025 Commented Jul 16, 2013 at 14:16
  • Is this in a browser? Then you can use a pattern match, or set a DOM member's HTML and then read back its node text; stackoverflow.com/questions/4338963/… – Alex K. Commented Jul 16, 2013 at 14:17
  • @AlexK. No it's not in a browser – user2363025 Commented Jul 16, 2013 at 14:28

3 Answers

There are many libraries you can include in Titanium (Underscore.string, string.js) that will make this happen, but if you only want the unescape HTML function, just try this code, adapted from those libraries:

var escapeChars = { lt: '<', gt: '>', quot: '"', apos: "'", amp: '&' };

function unescapeHTML(str) {//modified from underscore.string and string.js
    return str.replace(/\&([^;]+);/g, function(entity, entityCode) {
        var match;

        if ( entityCode in escapeChars) {
            return escapeChars[entityCode];
        } else if ( match = entityCode.match(/^#x([\da-fA-F]+)$/)) {
            return String.fromCharCode(parseInt(match[1], 16));
        } else if ( match = entityCode.match(/^#(\d+)$/)) {
            return String.fromCharCode(~~match[1]);
        } else {
            return entity;
        }
    });
}

This replaces the named entities in `escapeChars` and any numeric entities (decimal or hex) with the characters they stand for, and returns the modified string. Unrecognised entities are left unchanged. Just put this somewhere in your code and you're good to go. I have used this myself in Titanium and it's quite handy.
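As a quick check, the function can be applied to the tail of the log output from the question. This is a minimal sketch that repeats the answer's function so it runs on its own in any JavaScript engine; the `sample` string is copied from the question's log, and nothing here depends on Titanium.

```javascript
var escapeChars = { lt: '<', gt: '>', quot: '"', apos: "'", amp: '&' };

// Same logic as the answer's function, repeated so this snippet is self-contained.
function unescapeHTML(str) {
    return str.replace(/&([^;]+);/g, function (entity, entityCode) {
        var match;
        if (entityCode in escapeChars) {
            return escapeChars[entityCode];
        } else if ((match = entityCode.match(/^#x([\da-fA-F]+)$/))) {
            return String.fromCharCode(parseInt(match[1], 16));
        } else if ((match = entityCode.match(/^#(\d+)$/))) {
            return String.fromCharCode(~~match[1]);
        }
        return entity; // unknown entity: leave it as-is
    });
}

// Text taken from the question's log output.
var sample = "monitoring the situation &#8211; she says the problem seems to be getting worse&#8230;&#8230;.";

// &#8211; becomes U+2013 (en dash) and each &#8230; becomes U+2026 (ellipsis).
var decoded = unescapeHTML(sample);
```

Note that `String.fromCharCode` only covers code points up to U+FFFF, which is enough for the punctuation entities shown in the question.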

I encountered the same issue, and @Josiah Hester's solution works for me. I have added a condition so that only string values are processed; anything else is returned unchanged.

this.unescapeHTML = function(str) {
    var escapeChars = { lt: '<', gt: '>', quot: '"', apos: "'", amp: '&' };
    if (typeof str !== 'string') {
        return str; // leave non-string values untouched
    }
    return str.replace(/\&([^;]+);/g, function(entity, entityCode) {
        var match;
        if ( entityCode in escapeChars) {
            return escapeChars[entityCode];
        } else if ( match = entityCode.match(/^#x([\da-fA-F]+)$/)) {
            return String.fromCharCode(parseInt(match[1], 16));
        } else if ( match = entityCode.match(/^#(\d+)$/)) {
            return String.fromCharCode(~~match[1]);
        } else {
            return entity;
        }
    });
};
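The guard is relevant to the code in the question: the log call reads `win.dataToPass`, but the call to `unescapeHTML` reads `win.datatoPass` (lowercase `t`). Property names are case-sensitive, so the second lookup most likely yields `undefined`, and calling `.replace` on it throws. The sketch below reproduces this with a plain object standing in for the Titanium window. The `win` stub and the simplified decimal-only decoders are assumptions for illustration, not Titanium API.

```javascript
// Hypothetical stand-in for Titanium.UI.currentWindow; only dataToPass is set.
var win = { dataToPass: "getting worse&#8230;" };

// Unguarded version (decimal entities only, for brevity): assumes str is a string.
function unescapeUnguarded(str) {
    return str.replace(/&#(\d+);/g, function (entity, code) {
        return String.fromCharCode(parseInt(code, 10));
    });
}

// Guarded version: non-string input is returned untouched.
function unescapeGuarded(str) {
    if (typeof str !== 'string') {
        return str;
    }
    return unescapeUnguarded(str);
}

var threw = false;
try {
    unescapeUnguarded(win.datatoPass); // misspelled property, so the argument is undefined
} catch (e) {
    threw = e instanceof TypeError;
}

var guardedResult = unescapeGuarded(win.datatoPass); // undefined, but no exception
var fixedResult = unescapeGuarded(win.dataToPass);   // entity decoded
```

The guard only stops the exception; the label would still receive `undefined`. Correcting the property name to `win.dataToPass` is what makes the text appear.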

Below are two references for these special characters. Be aware that simply filtering them out may discard information you actually want to keep. My advice is to use the symbol reference table to build an array of codes, then search your string for each code and replace it with its corresponding character.

For example:

A-Z are represented by: &#65; to &#90;

Filtering out this information may significantly change the data you expect to be reading.
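A lookup table along these lines might look like the sketch below. The `namedEntities` table and the `decodeNamedEntities` helper are hypothetical names, and the table holds only a handful of standard HTML entity names chosen for illustration; a fuller table would be built from an entity reference. Numeric entities such as `&#65;` are not covered here, since they can be decoded arithmetically without a table.

```javascript
// Small illustrative table; extend it from an entity reference as needed.
var namedEntities = {
    amp: '&', lt: '<', gt: '>', quot: '"', apos: "'",
    nbsp: '\u00A0', ndash: '\u2013', mdash: '\u2014',
    lsquo: '\u2018', rsquo: '\u2019', hellip: '\u2026'
};

function decodeNamedEntities(str) {
    return str.replace(/&([A-Za-z][A-Za-z0-9]*);/g, function (entity, name) {
        // hasOwnProperty avoids matching inherited keys such as "toString".
        if (Object.prototype.hasOwnProperty.call(namedEntities, name)) {
            return namedEntities[name];
        }
        return entity; // unknown names are kept rather than dropped
    });
}

var out = decodeNamedEntities("St Eunan&rsquo;s College &ndash; fish &amp; chips &bogus;");
```

Leaving unknown names in place, instead of stripping them, addresses the concern about losing information: anything missing from the table stays visible in the output and can be added later.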

HTML Symbol Entities Reference:
http://www.webmonkey.com/2010/02/special_characters/
http://www.w3schools.com/tags/ref_symbols.asp
