
javascript - How to scrape data from iframe tag with puppeteer - Stack Overflow


I am trying to scrape some data from an iframe/frame tag, but I got stuck with the Puppeteer code. I'm a novice, so please bear with me. This is the link of the site . There, when I click a name in the first frame, I get some data in the second frame, on which I can click again to get data in the third frame. In the code I try to loop through the first frame to get all the data for the second and third.

Thank you for any hints.

I have run this command in the console: document.querySelector("body > form > font > select > option"), but I can't find a way to run it in Puppeteer.

const puppeteer = require("puppeteer");

(async () => {

    const browser = await puppeteer.launch();

    const page = await browser.newPage();
    await page.goto('');

    const iframeParagraph = await page.evaluate(() => {

        const iframe = document.getElementsByName("stanga");

        // grab iframe's document object
        const iframeDoc = iframe.contentDocument || iframe.contentWindow.document;

        const iframeP = iframeDoc.getElementsByName("fmtstatii");

        return iframeP.innerHTML;
    });

    console.log(iframeParagraph); 

    await browser.close();

})();

or

const puppeteer = require('puppeteer');

let scrape = async () => {
    const browser = await puppeteer.launch({headless: false});
    const page = await browser.newPage();

    await page.goto('');
    await page.click('document.querySelector("body > form > font > select")');
    await page.waitFor(1000);

    const result = await page.evaluate(() => {
        let statie = document.querySelector('document.querySelector("body > form > font > select > option")').innerText;

        return {
            statie
        }

    });

    browser.close();
    return result;
};

scrape().then((value) => {
    console.log(value); // Success!
});

This is the error that I get:

[(node:13308) UnhandledPromiseRejectionWarning: Error: Evaluation failed: DOMException: Failed to execute 'querySelector' on 'Document': 'document.querySelector("body > form > font > select")' is not a valid selector.
    at __puppeteer_evaluation_script__:1:33
    at ExecutionContext._evaluateInternal (D:\Zero\ratt_scrap\node_modules\puppeteer\lib\ExecutionContext.js:122:13)
    at process._tickCallback (internal/process/next_tick.js:68:7)
  -- ASYNC --
    at ExecutionContext.<anonymous> (D:\Zero\ratt_scrap\node_modules\puppeteer\lib\helper.js:111:15)
    at ElementHandle.$ (D:\Zero\ratt_scrap\node_modules\puppeteer\lib\JSHandle.js:395:50)
    at ElementHandle.<anonymous> (D:\Zero\ratt_scrap\node_modules\puppeteer\lib\helper.js:112:23)
    at DOMWorld.$ (D:\Zero\ratt_scrap\node_modules\puppeteer\lib\DOMWorld.js:121:34)
    at process._tickCallback (internal/process/next_tick.js:68:7)
  -- ASYNC --
    at Frame.<anonymous> (D:\Zero\ratt_scrap\node_modules\puppeteer\lib\helper.js:111:15)
    at Page.click (D:\Zero\ratt_scrap\node_modules\puppeteer\lib\Page.js:986:29)
    at scrape (D:\Zero\ratt_scrap\scrape.js:23:16)
    at process._tickCallback (internal/process/next_tick.js:68:7)
(node:13308) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:13308) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.]

Asked Jul 23, 2019 at 18:54 by Adrian; edited Sep 17, 2019 at 15:32.

1 Answer

You have made several mistakes:

  1. You should interact with the Frame instead of the Page object.

    const frame = page.frames().find(frame => frame.name() === 'stanga'); // Find the right frame.
    
  2. The click() method expects a selector string, so pass the CSS selector itself instead of wrapping it in document.querySelector(...).

    await frame.click('body > form > font > select');
    
  3. To get the innerText of every option, you have to iterate over the matched elements.

  4. Don't forget the await. You left it out when calling the close method.

    await browser.close();
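
Point 1 can fail quietly: if no frame has the expected name, find() returns undefined and the next call throws a less helpful error. A minimal sketch of a guard follows. The helper name findFrameByName is hypothetical and not part of Puppeteer; it relies only on page.frames() and frame.name(), which the answer already uses.

```javascript
// Hypothetical helper (not part of Puppeteer): look up a frame by its name
// and throw a descriptive error instead of silently returning undefined.
function findFrameByName(page, name) {
    // page.frames() lists every frame attached to the page;
    // frame.name() returns that frame's name.
    const frames = page.frames();
    const frame = frames.find((candidate) => candidate.name() === name);
    if (!frame) {
        const available = frames.map((candidate) => candidate.name() || '(unnamed)');
        throw new Error(`No frame named "${name}"; available: ${available.join(', ')}`);
    }
    return frame;
}
```

After page.goto(...), it would be called as const frame = findFrameByName(page, 'stanga'); the error message then lists the frame names that were actually found, which helps when a name is misspelled.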
    

SOLUTION:

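const puppeteer = require('puppeteer');

const scrape = async () => {
    const browser = await puppeteer.launch({headless: false});
    const page = await browser.newPage();

    await page.goto('http://ratt.ro/txt');

    // The <select> lives inside the frame named "stanga", so query the Frame, not the Page.
    const frame = page.frames().find(frame => frame.name() === 'stanga');
    await frame.click('body > form > font > select');

    // page.waitFor() is deprecated and absent from recent Puppeteer releases,
    // so wait for the options themselves instead of sleeping for a fixed time.
    await frame.waitForSelector('body > form > font > select > option');

    // Collect the text of every <option> inside the frame.
    const optionsResult = await frame.$$eval('body > form > font > select > option', (options) => {
        return options.map(option => option.innerText);
    });

    await browser.close();

    return optionsResult;
};

scrape()
    .then((value) => {
        console.log(value); // Success!
    })
    .catch(console.error); // Avoids the UnhandledPromiseRejectionWarning from the question.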
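
If any step in scrape() throws, as the invalid selector did in the question, browser.close() is never reached and the browser process keeps running. One way to guard against that, sketched here under assumptions, is a small wrapper built on try/finally. The name withBrowser and its launch parameter are hypothetical; launch stands for any function that returns a browser, for example () => puppeteer.launch().

```javascript
// Hypothetical helper (not part of Puppeteer): run `task` with a browser and
// always close the browser afterwards, even when `task` throws.
async function withBrowser(launch, task) {
    const browser = await launch();
    try {
        // Await here so the finally block runs only after the task settles.
        return await task(browser);
    } finally {
        await browser.close();
    }
}

// Assumed usage with Puppeteer, left as a comment so the block has no dependencies:
// withBrowser(() => puppeteer.launch(), async (browser) => {
//     const page = await browser.newPage();
//     /* ...scraping steps... */
// }).then(console.log).catch(console.error);
```

The error from the task still propagates to the caller, so it should be handled with .catch() or a surrounding try/catch.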