javascript - Puppeteer: Open a page, get the data, go back to the previous page, enter a new page to get data - Stack Overflow

Getting data from one page is simple, but how do I go back after getting the data from the first page, open a new page, get the data from that page, and so on? I am trying to do this on the website http://books.toscrape.com/.

I chose to print how many books are in stock, because that information is only available after following a book's link. For example, if you run the code you will get: { stock: 'In stock (22 available)' }

Now I want to go back to the original page, open the second link, and extract the same information as before. And so on for the remaining links.

How can this be done using vanilla JavaScript?

const puppeteer = require('puppeteer');

let scrape = async () => {
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();

    await page.goto('http://books.toscrape.com/');
    await page.click('#default > div > div > div > div > section > div:nth-child(2) > ol > li:nth-child(1) > article > div.image_container > a > img');
    await page.waitFor(1000);

    const result = await page.evaluate(() => {
        let stock = document.querySelector('#content_inner > article > table > tbody > tr:nth-child(6) > td').innerText;

        return {
            stock
        }
    });

    await browser.close();
    return result;
};

scrape().then((value) => {
    console.log(value); // Success!
});
asked Apr 23, 2019 at 16:12 by user9746492, edited Apr 23, 2019 at 16:30 by Thomas Dondorf

1 Answer

Explanation

Call page.goBack() to return to the previous page once you have extracted the data, then click the next element. To do this, use page.$$ to get the list of clickable elements and loop over them one at a time, running the same extraction code on each detail page. Element handles obtained before a navigation are no longer usable afterwards, so query the list again each time you come back to the listing page.

Code

I adapted your code below to print the desired result to the console for each page. Note that I removed :nth-child(1) from the selector in your question so that it matches all clickable elements instead of only the first one.

const puppeteer = require('puppeteer');

const elementsToClickSelector = '#default > div > div > div > div > section > div:nth-child(2) > ol > li > article > div.image_container > a > img';

let scrape = async () => {
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();

    await page.goto('http://books.toscrape.com/');

    // get all elements to be clicked
    let elementsToClick = await page.$$(elementsToClickSelector);
    console.log(`Elements to click: ${elementsToClick.length}`);

    for (let i = 0; i < elementsToClick.length; i++) {
        // click the element and wait for the resulting navigation to finish;
        // this replaces the fixed page.waitFor(1000) delay, which newer
        // Puppeteer versions no longer provide
        await Promise.all([
            page.waitForNavigation(),
            elementsToClick[i].click(),
        ]);

        // generate result for the current page
        const result = await page.evaluate(() => {
            let stock = document.querySelector('#content_inner > article > table > tbody > tr:nth-child(6) > td').innerText;
            return { stock };
        });
        console.log(result); // do something with the result here...

        // go back one page and repopulate the elements, because handles
        // obtained before a navigation are no longer valid
        await page.goBack();
        elementsToClick = await page.$$(elementsToClickSelector);
    }

    await browser.close();
};

scrape();
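
Alternative sketch

If clicking and going back is not a requirement, a possible variation is to read the href of every book link once and then visit each URL directly with page.goto(). That avoids stale element handles altogether. The following is only a sketch under some assumptions: the function names collectLinks, scrapeStock and main are made up for illustration, the link selector is the selector from above with the trailing "> img" removed, and the site address is assumed to be http://books.toscrape.com/.

```javascript
// Selector for the book links: the selector used above without the trailing "> img" (assumption).
const linkSelector = '#default > div > div > div > div > section > div:nth-child(2) > ol > li > article > div.image_container > a';
const stockSelector = '#content_inner > article > table > tbody > tr:nth-child(6) > td';

// Open the listing page and collect the URL of every book link on it.
async function collectLinks(page, listUrl) {
    await page.goto(listUrl);
    // In the browser, the href property of an anchor is already an absolute URL.
    return page.$$eval(linkSelector, (anchors) => anchors.map((a) => a.href));
}

// Visit each URL in turn and read the stock cell from the detail page.
async function scrapeStock(page, urls) {
    const results = [];
    for (const url of urls) {
        await page.goto(url);
        const stock = await page.$eval(stockSelector, (td) => td.innerText);
        results.push({ url, stock });
    }
    return results;
}

// Hypothetical entry point. puppeteer is required here rather than at the top
// so that the two helpers above can be exercised without a browser.
async function main() {
    const puppeteer = require('puppeteer');
    const browser = await puppeteer.launch({ headless: false });
    try {
        const page = await browser.newPage();
        const urls = await collectLinks(page, 'http://books.toscrape.com/'); // assumed site address
        console.log(await scrapeStock(page, urls));
    } finally {
        await browser.close();
    }
}

// Call main() to run this against the live site.
```

Because both helpers only rely on page.goto, page.$$eval and page.$eval, they never hold an element handle across a navigation, so nothing has to be re-queried inside the loop.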
