I need to scrape a site: find a div by one of its classes, then return the full list of classes on that div.
For instance, given this HTML:
<!-- HTML on site -->
<div class="main">Main Stuff</div>
<div class="class1 class2 specialclass">Other Stuff</div>
<div class="footer">Footer Stuff</div>
I need to search for "specialclass" as a div class and return the list of classes for that div, so I would want to get back:
class1 class2 specialclass
I'm using a Wikibooks site as an example and running this code:
//Puppeteer Code
const puppeteer = require('puppeteer');

// await is only valid inside an async function here, so wrap the script
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.wikibooks/');
  // Runs in the page and returns the text of the first .lang1 element
  const myclassname = await page.evaluate(() =>
    document.querySelector('.lang1').innerText);
  console.log(myclassname);
  await browser.close();
})();
It searches for the div with a class of lang1, the default language div near the top of the screen, and returns the text of that element. I don't know what to change innerText to in order to get the element's class names, so that it returns central-featured-lang lang1, all of the classes of that element.
2 Answers
Consider the following element from the webpage you specified:
<div class="central-featured-lang lang1" lang="en">...</div>
You can use className or getAttribute('class') to obtain the content of the class attribute of an element:
const myclassname = await page.evaluate(() => document.querySelector('.lang1').className);
console.log(myclassname); // Returns "central-featured-lang lang1"
Or, you can return the classes of an element as an array by spreading its classList:
const myclassnamearray = await page.evaluate(() => [...document.querySelector('.lang1').classList]);
console.log(myclassnamearray[0]); // Returns "central-featured-lang"
console.log(myclassnamearray[1]); // Returns "lang1"
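If the selector might not match anything, document.querySelector returns null and the snippets above throw inside the page. Here is a minimal sketch of a null-safe wrapper, assuming a Puppeteer page object is already open. Note that getClassNames is a hypothetical helper name, not part of Puppeteer:

```javascript
// Hypothetical helper (not a Puppeteer API): returns the classes of the first
// element matching `selector` as an array, or null if nothing matches.
async function getClassNames(page, selector) {
  // Extra arguments to page.evaluate are passed to the function run in the page.
  return page.evaluate((sel) => {
    const el = document.querySelector(sel);
    // classList is a DOMTokenList; copy it to a plain array so the result
    // can be serialized back to Node.
    return el ? Array.from(el.classList) : null;
  }, selector);
}
```

Called as await getClassNames(page, '.lang1'), this should give the same two class names as the classList example for the element shown above.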
Use .getAttribute("class"), for example:
var x = document.getElementsByTagName("H1")[0].getAttribute("class");
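Note that getAttribute("class") returns one space-separated string, or null if the element has no class attribute at all. If you need a list, you can split the string on whitespace. A small sketch, where splitClassAttribute is a made-up helper name for illustration:

```javascript
// Hypothetical helper: turn the value of a class attribute into an array.
// Accepts null (attribute missing) and tolerates repeated or leading spaces.
function splitClassAttribute(value) {
  if (!value) return [];
  return value.split(/\s+/).filter(Boolean);
}

console.log(splitClassAttribute('class1 class2 specialclass'));
// [ 'class1', 'class2', 'specialclass' ]
```

This is plain string handling, so it works the same in the page or in Node after the string has been returned from page.evaluate.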