javascript - nightmarejs scrape multiple Elements with querySelectorAll - Stack Overflow

I'm trying to scrape some information from an Instagram profile page with nightmarejs (a PhantomJS derivative that uses Electron as its browser).

The goal is to get the alt attributes of all images on the profile (for example's sake, I focus only on the images before the "show more" button).

var Nightmare = require('nightmare');
var nightmare = Nightmare({ show: true });

nightmare
  .goto('https://www.instagram.com/ackerfestival/')
  .evaluate(function () {
    let array = [...document.querySelectorAll('._icyx7')];
    return array.length;
  })
  .end()
  .then(function (result) {
    console.log(result);
  })
  .catch(function (error) {
    console.error('Search failed:', error);
  });
  

This example works: the array has a length of 12, and the Electron browser opens and closes, so everything is fine. But if I change the return value to the array itself, the Electron browser never closes and the console.log never runs.

What am I doing wrong? I want to get all the information from the images in an array or object.

asked Feb 26, 2017 at 17:34 by tarpier, edited Feb 27, 2017 at 5:08 by Vaviloff

1 Answer

The problem you're hitting is that document.querySelectorAll() returns a NodeList of DOM elements. Neither of those object types serializes well, and the return value from .evaluate() has to be serialized to cross the IPC boundary. I'm betting you're getting an empty array on the other side of your .evaluate() call?

The easiest answer here is to map out what, specifically, you want from the NodeList. From the hip, something like the following should get the idea across:

.evaluate(function(){
  return Array.from(document.querySelectorAll('._icyx7')).map(element => element.innerText);
})
.then((innerTexts) => {
  // ... do something with the inner texts of each element
})
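
Since the question asks for the alt attributes rather than the inner text, the same mapping idea can be sketched with plain objects. This is only an illustration under assumptions: toPlainImageInfo and fakeElement are hypothetical names, not part of Nightmare, and it assumes the elements matched by '._icyx7' are the img elements themselves (if they are wrappers, the img inside each one would have to be queried first). The fake elements exist only so the sketch runs in plain Node without a browser.

```javascript
// Hypothetical helper (not part of Nightmare): copy the wanted attributes
// out of each element into plain objects that survive serialization.
function toPlainImageInfo(elements) {
  return Array.from(elements).map(function (el) {
    return {
      alt: el.getAttribute('alt'), // null when the attribute is missing
      src: el.getAttribute('src')
    };
  });
}

// Stand-in for a DOM element so the sketch runs outside a browser.
function fakeElement(attrs) {
  return {
    getAttribute: function (name) {
      return Object.prototype.hasOwnProperty.call(attrs, name) ? attrs[name] : null;
    }
  };
}

const info = toPlainImageInfo([
  fakeElement({ alt: 'Stage at night', src: 'a.jpg' }),
  fakeElement({ src: 'b.jpg' })
]);

// Plain objects, strings and null round-trip through JSON unchanged.
console.log(JSON.stringify(info));
```

Inside .evaluate() the body of the mapping would have to be written inline, as in the snippet above, because that function runs in the page and cannot see helpers defined on the Node side.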