最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

javascript - How to parse HTMLXML documents with Node.js? - Stack Overflow

programmeradmin2浏览0评论

I have an editor.html that contains generatePNG function:

  <!DOCTYPE html> 
<html> 
<head> 
    <meta charset="UTF-8"> 
    <title>Diagram</title> 

    <script type="text/javascript" src="lib/jquery-1.8.1.js"></script> 
//    <!-- I use many resources -->
<script></script> 

    <script> 

        function generatePNG (oViewer) { 
            var oImageOptions = { 
                includeDecoratorLayers: false, 
                replaceImageURL: true 
            }; 

            var d = new Date(); 
            var h = d.getHours(); 
            var m = d.getMinutes(); 
            var s = d.getSeconds(); 

            var sFileName = "diagram" + h.toString() + m.toString() + s.toString() + ".png"; 

            var sResultBlob = oViewer.generateImageBlob(function(sBlob) { 
                b = 64; 
                var reader = new window.FileReader(); 
                reader.readAsDataURL(sBlob); 
                reader.onloadend = function() { 
                    base64data = reader.result; 
                    var image = document.createElement('img'); 
                    image.setAttribute("id", "GraphImage"); 
                    image.src = base64data; 
                    document.body.appendChild(image); 
                } 

            }, "image/png", oImageOptions); 
            return sResult; 
        } 

    </script> 


</head> 

<body > 
    <div id="diagramContainer"></div> 
</body> 
</html>

I want to access the DOM and get image.src using Node.js. I find that I can work with cheerio or jsdom.

I start with this:

var cheerio = require('cheerio'),
    $ = cheerio.load('editor.html');

But I don't find how to access and get image.src.

I have an editor.html that contains generatePNG function:

  <!DOCTYPE html> 
<html> 
<head> 
    <meta charset="UTF-8"> 
    <title>Diagram</title> 

    <script type="text/javascript" src="lib/jquery-1.8.1.js"></script> 
//    <!-- I use many resources -->
<script></script> 

    <script> 

        function generatePNG (oViewer) { 
            var oImageOptions = { 
                includeDecoratorLayers: false, 
                replaceImageURL: true 
            }; 

            var d = new Date(); 
            var h = d.getHours(); 
            var m = d.getMinutes(); 
            var s = d.getSeconds(); 

            var sFileName = "diagram" + h.toString() + m.toString() + s.toString() + ".png"; 

            var sResultBlob = oViewer.generateImageBlob(function(sBlob) { 
                b = 64; 
                var reader = new window.FileReader(); 
                reader.readAsDataURL(sBlob); 
                reader.onloadend = function() { 
                    base64data = reader.result; 
                    var image = document.createElement('img'); 
                    image.setAttribute("id", "GraphImage"); 
                    image.src = base64data; 
                    document.body.appendChild(image); 
                } 

            }, "image/png", oImageOptions); 
            return sResult; 
        } 

    </script> 


</head> 

<body > 
    <div id="diagramContainer"></div> 
</body> 
</html>

I want to access the DOM and get image.src using Node.js. I find that I can work with cheerio or jsdom.

I start with this:

var cheerio = require('cheerio'),
    $ = cheerio.load('editor.html');

But I don't find how to access and get image.src.

Share Improve this question edited Oct 22, 2021 at 12:10 Armen Michaeli 9,1409 gold badges65 silver badges100 bronze badges asked Dec 16, 2015 at 10:28 ameniameni 3932 gold badges9 silver badges17 bronze badges 8
  • The image.src you want to get is generated inside the editor.html using javascript that lays within that page? – luiso1979 Commented Dec 16, 2015 at 10:42
  • @luiso yes the image.src is a based64 data and it is generated in the editor.html , i want to extract it from node.js server – ameni Commented Dec 16, 2015 at 10:44
  • Just to clarify, you load the editor.html into cheerio on the server? So there is no browser involved in this? – Rogier Spieker Commented Dec 16, 2015 at 11:26
  • @RogierSpieker i just want to access to edtior.html from node.js and get the image.src – ameni Commented Dec 16, 2015 at 11:56
  • There are two possibilities in my mind as to what you are asking. Either you want Node.js to access an image generated by a web browser on a live page, or you want to be able to access image data stored in an html file in an img element's src attribute. Please clarify. – Jonathan Gray Commented Dec 16, 2015 at 12:07
 |  Show 3 more ments

1 Answer 1

Reset to default 12

The problem is that by loading an html file into cheerio (or any other node module) will not process the HTML as a browser does. Assets (such as stylesheets, images and javascripts) will not be loaded and/or processed as they would be within a browser.

While both node.js and modern webbrowsers have the same (or similar) javascript engines, however a browser adds a lot of additional stuff, such as window, the DOM (document), etc. Node.js does not have these concepts, so there is no window.FileReader nor document.createElement.

If the image is created entirely without user interaction (your code sample 'magically' receives the sBlob argument wich appears to be a string like data:<type>;<encoding>,<data>) you could use a so called headless browser on the server, PhantomJS seems most popular these days. Then again, if no user interaction is required for the creation of the sBlob, you are probably better off using a pure node.js solution, e.g. How do I parse a data URL in Node?.

If there is some kind of user interaction required to create the sBlob, and you need to store it on a server, you can use pretty much the same solution as mentioned by simply sending the sBlob to the server using Ajax or a websocket, processing the sBlob into an image and (optionally) returning the URL where to find the image.

发布评论

评论列表(0)

  1. 暂无评论