
javascript - Skip to next step on timeout - Stack Overflow


I'm leveraging Puppeteer to open a website from a list of URLs, grab a few pieces of data, then write to CSV.

While there are a few elements that could be collected from a given URL, not all URLs will have all elements.

When my code is unable to find one of the stated elements (XPath), it times out and stops the script altogether. Instead of doing this, I would like it to enter null or 0 to indicate that no data was actually gathered from the URL for that element.

I tried adjusting the duration until timeout, but it doesn't move to the next step; it just exits the script altogether (as it does with the default timeout).

As there will be instances where the XPath can't be found, I don't want to disable the timeout, as the script would then wait forever at that point.

Here's my code as it currently stands:

const puppeteer = require('puppeteer');
const fs = require('fs');
const csv = require('csv-parser');
const createCsvWriter = require('csv-writer').createObjectCsvWriter;

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  
  const urls = [];
    fs.createReadStream('urls.csv')
        .pipe(csv())
        .on('data', (row) => {
            urls.push(row.url); // Assuming the CSV has a column named 'url'
        })
        .on('end', async () => {
                       
            for (const url of urls) {
                await page.goto(url, { waitUntil: 'networkidle2' });
                const url_visited = url;

                //* PRICE 1

                let xpath_ELEMENT_1 = 'XPATH';
                const el1 = await page.waitForSelector('xpath/' + xpath_ELEMENT_1);
                const ELEMENT_1 = await page.evaluate(el => el.textContent.trim(), el1);

                //* PRICE 2

                let xpath_ELEMENT_2 = 'XPATH';
                const el2 = await page.waitForSelector('xpath/' + xpath_ELEMENT_2);
                const ELEMENT_2 = await page.evaluate(el => el.textContent.trim(), el2);


// create csv file
const csvWriter = createCsvWriter({
    path: 'output.csv',
    header: [
        {id: 'url', title: 'URL'},
        {id: 'price1', title: 'Price1'},
        {id: 'price2', title: 'Price2'}
    ]
});

// create record using collected data
const records = [
    {url: url_visited, price1: ELEMENT_1, price2: ELEMENT_2}
]

// write record to csv
await csvWriter.writeRecords(records);
}

await browser.close();
});
})();


1 Answer


You need to wrap your code in try...catch blocks so you can catch the timeout errors, keep the script moving, and write nulls into your results.

Something like this:

let ELEMENT_1 = null;
let ELEMENT_2 = null;

try {
  await page.goto(url, { waitUntil: "networkidle2" });

  try {
    const el1 = await page.waitForSelector("xpath/XPATH_1", { timeout: 3000 });
    ELEMENT_1 = await page.evaluate((el) => el.textContent.trim(), el1);
  } catch (error) {
    // timed out: ELEMENT_1 stays null
  }

  try {
    const el2 = await page.waitForSelector("xpath/XPATH_2", { timeout: 3000 });
    ELEMENT_2 = await page.evaluate((el) => el.textContent.trim(), el2);
  } catch (error) {
    // timed out: ELEMENT_2 stays null
  }
} catch (error) {
  // navigation itself failed: both elements stay null
}
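For completeness, here is one way this pattern could be folded back into the original loop. This is a minimal sketch, not a drop-in replacement: the textOrNull helper and the XPATH_1/XPATH_2 placeholders are illustrative, and the urls.csv reading step is replaced by a hard-coded list for brevity. It also creates the csv-writer once, outside the loop, and writes all records in a single call, so earlier rows aren't overwritten on each iteration.

const puppeteer = require('puppeteer');
const createCsvWriter = require('csv-writer').createObjectCsvWriter;

// Hypothetical helper: wait for an XPath-selected element and return its
// trimmed text, or null if it doesn't appear within the timeout.
async function textOrNull(page, xpath, timeout = 3000) {
  try {
    const el = await page.waitForSelector('xpath/' + xpath, { timeout });
    return await page.evaluate((node) => node.textContent.trim(), el);
  } catch (error) {
    return null; // not found in time: record null instead of aborting
  }
}

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();

  // Create the writer once so every URL's row ends up in the same file.
  const csvWriter = createCsvWriter({
    path: 'output.csv',
    header: [
      { id: 'url', title: 'URL' },
      { id: 'price1', title: 'Price1' },
      { id: 'price2', title: 'Price2' },
    ],
  });

  const urls = ['https://example.com']; // placeholder for the CSV-loaded list
  const records = [];

  for (const url of urls) {
    try {
      await page.goto(url, { waitUntil: 'networkidle2' });
      records.push({
        url,
        price1: await textOrNull(page, 'XPATH_1'),
        price2: await textOrNull(page, 'XPATH_2'),
      });
    } catch (error) {
      // navigation itself failed: still record the URL, with nulls
      records.push({ url, price1: null, price2: null });
    }
  }

  await csvWriter.writeRecords(records);
  await browser.close();
})();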