When I run the following code in the console of the page I'm trying to scrape, I get the element shown in the picture.
document.querySelector('#sb-site > div.sticky_footer > div:nth-child(9)')
However, when I run this in my program, console.log prints '{}':
const inputContent = await page.evaluate(() => {
  return document.querySelector('#sb-site > div.sticky_footer > div:nth-child(9)');
});
edited Nov 10, 2023 at 16:17 by ggorlen
asked Mar 6, 2019 at 6:39 by Ryan Soderberg
- How are you loading the page? Are you loading with waitUntil: 'networkidle0'? Are you trying to log an HTML element to the Node.js console, or just get the text/link? – Md. Abu Taher Commented Mar 6, 2019 at 6:48
- I have added that code so now it fully loads, I also added .innerHTML after the selector. I am trying to grab that giant block of text from the image in the main post so I can pull content out of it – Ryan Soderberg Commented Mar 6, 2019 at 7:17
- You are trying to pull text from image? :/ – Md. Abu Taher Commented Mar 6, 2019 at 7:23
- tbh, it's hard to help if you don't provide more code or url, so that we can reproduce this problem. I dealt with lots of react/vue/angular site scraping, but still I needed more specific information. – Md. Abu Taher Commented Mar 6, 2019 at 7:25
- Instead of sending us pictures, please copy and paste just the code you want into your question. – Heretic Monkey Commented Mar 6, 2019 at 20:29
3 Answers
Puppeteer can transfer two types of data between Node.js and the browser context: serializable data (i.e. data that is supported by JSON.stringify()/JSON.parse()) and JavaScript object ids (including DOM elements), in the form of JSHandle and ElementHandle. The latter have a slightly more complicated API (see the JSHandle and ElementHandle methods, or the methods that mention them).
page.evaluate() can only transfer serializable data; in place of non-serializable data it returns undefined or empty objects. DOM elements are non-serializable as they contain circular references and methods.
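To see why an element can come back as '{}', here is a minimal sketch that runs in plain Node.js without Puppeteer. It leans on the JSON.stringify() analogy from the paragraph above; Puppeteer's real transfer mechanism may differ in detail. `FakeElement` is a hypothetical stand-in for a DOM element, whose data is exposed through getters on the prototype rather than as own enumerable properties.

```javascript
// Hypothetical stand-in for a DOM element: the data is only reachable
// through prototype getters, so the instance has no own enumerable properties.
class FakeElement {
  get tagName() { return 'DIV'; }
  get textContent() { return 'hello'; }
}

const el = new FakeElement();

// Serializing the object itself finds nothing to copy.
const asJson = JSON.stringify(el);

// Reading the values first and serializing a plain object keeps the data.
const plain = JSON.stringify({ tagName: el.tagName, text: el.textContent });

console.log(asJson); // {}
console.log(plain);  // {"tagName":"DIV","text":"hello"}
```

The second call is the pattern to aim for: pull the values out first, then send back a plain object or string.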
So if you just need some text or element attributes, try to do most of the processing in the browser context and return just serializable data.
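As a rough sketch of that advice: read the fields you need inside the callback and return only strings or plain objects. `describeElement` is a hypothetical helper name, and `page` is assumed to be a Puppeteer Page object; the selector is passed as an argument to page.evaluate() because the callback runs in the browser and cannot see Node.js variables.

```javascript
// Hypothetical helper: `page` is assumed to be a Puppeteer Page.
async function describeElement(page, selector) {
  // The callback runs in the browser context; `sel` arrives as an argument.
  return page.evaluate((sel) => {
    const el = document.querySelector(sel);
    if (el === null) return null; // nothing matched the selector
    // Strings and plain objects survive the trip back to Node.js.
    return {
      tagName: el.tagName,
      text: el.textContent,
      html: el.outerHTML,
    };
  }, selector);
}
```

With a real page, the call from the question would become something like `const info = await describeElement(page, '#sb-site > div.sticky_footer > div:nth-child(9)');`, and `info.html` would hold the markup as a string.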
Make sure the page loads completely before scraping:
page.goto(url, {waitUntil: 'networkidle0'})
Also, according to the docs, .evaluate will return a promise; it will not return a DOM element.
Logging the result prints either {} or the value the promise resolves to.
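Putting those two points together might look like the sketch below. `loadAndRead` is a hypothetical helper name, `page` is assumed to be a Puppeteer Page, and returning innerHTML is just one choice of serializable value.

```javascript
// Hypothetical helper: `page` is assumed to be a Puppeteer Page.
async function loadAndRead(page, url, selector) {
  // Treat navigation as finished once there have been no network
  // connections for at least 500 ms.
  await page.goto(url, { waitUntil: 'networkidle0' });

  // page.evaluate() returns a promise, so await it to get the resolved value.
  const html = await page.evaluate((sel) => {
    const el = document.querySelector(sel);
    return el ? el.innerHTML : null; // a string (or null) is serializable
  }, selector);

  return html;
}
```

If the await is left off, the variable holds a pending promise rather than the markup.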
In your case you're trying to select a custom DOM object injected into the page, which leads to some strange behavior when using the nth-child() CSS selector. So you should try to target the DOM node directly instead. Say you were trying to get a similar element here: https://wefunder.com/chattanoogafc
You can do:
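const inputContent = await page.evaluate(() => {
  // Fourth direct child div of the sticky footer, then its first descendant element.
  const element = document.querySelectorAll("#sb-site > div.sticky_footer > div")[3].querySelectorAll("*")[0];
  // The attribute value is a string, so it can be returned to Node.js.
  return element.getAttribute("company-json");
});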
console.log("test:" + inputContent);
And that should return the JSON that you want. You can then parse it using JSON.parse(inputContent).
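Since getAttribute() returns null when the attribute is missing, and JSON.parse() throws on malformed input, a small guard can help. This is only a sketch; `parseJsonAttribute` is a hypothetical helper name.

```javascript
// Hypothetical helper: turn an attribute string into an object, or null on failure.
function parseJsonAttribute(raw) {
  // getAttribute() yields null when the attribute does not exist.
  if (typeof raw !== 'string') return null;
  try {
    return JSON.parse(raw);
  } catch (err) {
    // The attribute was present but did not contain valid JSON.
    return null;
  }
}
```

It would be called as `const data = parseJsonAttribute(inputContent);` after the page.evaluate() call above.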