javascript - How to use Puppeteer on NODE server and get the results on frontend HTML page? - Stack Overflow

I'm just starting to learn Node and Puppeteer, so apologies in advance for any beginner mistakes.

I have a simple form on my index.html page, and I want it to return the images for an Instagram profile from a function on a Node server running Puppeteer. The code below consists of an index.html file and an index.js file. When the button in index.html is clicked, I want to call the server with an AJAX request that passes in the username, run that function on the server, return the result to the HTML page, and put the response text into the .images div (I can split the result and render img tags later).

I have a couple of questions:

1: I am running the server script with the Live Server plugin in VS Code, and it serves the file at http://127.0.0.1:5500/12_Puppeteer/12-scraping-instagram/index.js. Is that now the endpoint? If so, how do I pass the username to the server function: in the headers or in the URL? Can you show me?

2: In my AJAX request in index.html, what does the request need to look like to pass the username through to the server's scrapeImages(username) function and get back what it returns?

This is what I've tried in my index.html file:

       <body>
            <form>
                Username: <input type="text" id="username">&nbsp;&nbsp;
                <button id="clickMe" type="button" value="clickme" onclick="scrape(username.value);">
                Scrape Account Images</button>
            </form>

            <div class="images">
            </div>
        </body>

        <script>
            function scrape() {
                var xhttp = new XMLHttpRequest();
                xhttp.onreadystatechange = function() {
                    if (this.readyState == 4 && this.status == 200) {
                    document.querySelector(".images").innerHTML = this.responseText;
                    }
                };
                xhttp.open("GET", "http://127.0.0.1:5500/12_Puppeteer/12-scraping-instagram/index.js", true);
                xhttp.send();
            }


        </script>

This is my index.js file (it works when I debug it with my own username and password):

const puppeteer = require("puppeteer");
const fs = require('fs');

async function scrapeImages (username) {
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();

    await page.goto('https://www.instagram.com/accounts/login/')

    await page.type('[name=username]','[email protected]')
    await page.type('[name=password]','xxxxxx')

    await page.click('[type=submit]')
    await page.goto(`https://www.instagram.com/${username}`);

    await page.waitForSelector('img', {
        visible: true,
    })

    const data = await page.evaluate( () => {
        const images = document.querySelectorAll('img');
        const urls = Array.from(images).map(v => v.src + '||');
        return urls;
    } );


    fs.writeFileSync('./myData2.txt', data);


    return data;
}

Asked Dec 24, 2019 at 17:59 by AGrush

1 Answer

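You'll have to set up a Node server, using Express or something similar. Live Server only serves static files, so a request for index.js returns the file's source text rather than running it. Pass the username to the server with a GET or POST request, read it on the server with Node/Express, and then run Puppeteer with it.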
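
To make the idea of "pass the username in the URL and read it on the server" concrete, here is a minimal sketch that uses only Node's built-in http module. The route prefix, the helper names extractUsername and createScraperServer, and the choice to return the images as a JSON array are assumptions made for illustration; they are not part of the question's code.

```javascript
const http = require('http')

// Hypothetical route prefix, chosen to match the URL used in this answer.
const ROUTE_PREFIX = '/instascraper/user/'

// Pull the username out of a request path such as "/instascraper/user/nasa".
// Returns null when the path does not match the route.
function extractUsername (requestUrl) {
    const { pathname } = new URL(requestUrl, 'http://localhost')
    if (!pathname.startsWith(ROUTE_PREFIX)) return null
    let name
    try {
        name = decodeURIComponent(pathname.slice(ROUTE_PREFIX.length))
    } catch (error) {
        return null // malformed percent-encoding
    }
    return name.length > 0 && !name.includes('/') ? name : null
}

// Build a server around any async scrape function (for example, the
// Puppeteer-based scrapeImages from the question).
function createScraperServer (scrape) {
    return http.createServer(async (request, response) => {
        const username = extractUsername(request.url)
        if (request.method !== 'GET' || username === null) {
            response.writeHead(404, { 'Content-Type': 'text/plain' })
            response.end('Not found')
            return
        }
        try {
            const images = await scrape(username)
            response.writeHead(200, {
                'Content-Type': 'application/json',
                'Access-Control-Allow-Origin': '*'
            })
            response.end(JSON.stringify({ images }))
        } catch (error) {
            response.writeHead(500, { 'Content-Type': 'text/plain' })
            response.end('Scrape failed')
        }
    })
}

// Usage (the server is not started automatically):
// createScraperServer(scrapeImages).listen(8888)
```

The server only runs when you start it with node, not when the file is opened through Live Server. Note that this sketch sends the images as a JSON array, so a client written for it would loop over the array directly instead of splitting a string.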

For example, suppose your Node.js/Express server is running on port 8888. Your HTML would look like this:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta http-equiv="X-UA-Compatible" content="ie=edge">
    <title>Document</title>
</head>
<body>
    <form method="post">
        Username: <input type="text" name="username" id="username">&nbsp;&nbsp;
        <button id="clickMe" type="button" value="clickme" onclick="getImages(this.form.username.value)">
        Scrape Account Images</button>
    </form>

    <div id="scrapedimages"></div>
    <script>
        let imgArray

        const getImages = (username) => {
            var xhttp = new XMLHttpRequest();
            xhttp.onreadystatechange = function () {
                if (this.readyState == 4 && this.status == 200) {
                    document.querySelector('#scrapedimages').innerHTML = ''
                    imgArray = JSON.parse(this.responseText)
                    if ( imgArray.images.length > 0 ) {
                        imgArray.images.split(',').forEach( function (source) {
                            var image = document.createElement('img')
                            image.src = source
                            document.querySelector('#scrapedimages').appendChild(image)
                        })
                    }
                }
            };
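            // Encode the username so characters such as "/" or "?" cannot change the route
            xhttp.open('GET', 'http://127.0.0.1:8888/instascraper/user/' + encodeURIComponent(username), true);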
            xhttp.send();
        }
    </script>
</body>
</html>
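
As a possible alternative to XMLHttpRequest, the same request could be written with fetch. This is only a sketch: API_BASE, buildScrapeUrl and toImageList are hypothetical names, and it assumes the server responds with a JSON body whose "images" field is either a comma-separated string or an array.

```javascript
// Assumed base URL: the Express server from this answer, listening on port 8888.
const API_BASE = 'http://127.0.0.1:8888/instascraper/user/'

// Encode the username so characters like "/" or "?" cannot change the route.
function buildScrapeUrl (username) {
    return API_BASE + encodeURIComponent(username.trim())
}

// Accept either a comma-separated string or an array of strings,
// and return a clean array of image URLs.
function toImageList (images) {
    const list = Array.isArray(images) ? images : String(images || '').split(',')
    return list.map(url => url.trim()).filter(url => url.length > 0)
}

// Fetch the image URLs and render them into the #scrapedimages container.
// This function needs a browser, because it uses fetch and document.
async function getImages (username) {
    const response = await fetch(buildScrapeUrl(username))
    if (!response.ok) {
        throw new Error(`Request failed with status ${response.status}`)
    }
    const payload = await response.json()
    const container = document.querySelector('#scrapedimages')
    container.innerHTML = ''
    for (const source of toImageList(payload.images)) {
        const image = document.createElement('img')
        image.src = source
        container.appendChild(image)
    }
}
```

The button's onclick handler can call getImages(this.form.username.value) exactly as before. Keeping the URL building and response parsing in small separate functions makes them easy to check without a browser.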

Then, on the Node.js server, your script would look like this:

const puppeteer = require('puppeteer')
const express = require('express')
const app = express()
const port = 8888

const username = 'usernameInstaGram'
const password = 'passwordInstaGram'

;(async () => {

    app.get('/instascraper/user/:userID', async (request, response) => {
        const profile = request.params.userID
        const content = await scrapeImages (profile)
        response.set({
            'Access-Control-Allow-Origin': '*',
            'Access-Control-Allow-Credentials': true,
            'Access-Control-Allow-Methods': 'POST, GET, PUT, DELETE, OPTIONS',
            'Access-Control-Allow-Headers': 'Content-Type',
            'Content-Type': 'text/plain'
        })

        response.send(content)
    })

    app.listen(port, () => {
        console.log(`Instascraper server listening on port ${port}!`)
    })

    const scrapeImages = async profile => {

        const browser = await puppeteer.launch()
        const [page] = await browser.pages()

        await page.goto('https://www.instagram.com/accounts/login/', {waitUntil: 'networkidle0', timeout: 0})

        await page.waitForSelector('[name=username]', {timeout: 0})
        await page.type('[name=username]', username)
        await page.waitForSelector('[name=password]', {timeout: 0})
        await page.type('[name=password]',password)

        await Promise.all([
            page.waitForNavigation(),
            page.click('[type=submit]')
        ])

        await page.waitForSelector('input[placeholder="Search"]', {timeout: 0})
        await page.goto(`https://www.instagram.com/${profile}`, {waitUntil: 'networkidle0', timeout: 0})

        await page.waitForSelector('body section > main > div > header ~ div ~ div > article a[href] img[srcset]', {visible:true, timeout: 0})

        const data = await page.evaluate( () => {
            const images = document.querySelectorAll('body section > main > div > header ~ div ~ div > article a[href] img[srcset]')
            const urls = Array.from(images).map(img => img.src )
            return urls;
        })

        await browser.close()

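        // Serialize with JSON.stringify so a quote inside a URL cannot break the JSON;
        // the client expects "images" as one comma-separated string.
        return JSON.stringify({ images: data.join(',') })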
    }

})()
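
Because the route parameter goes straight into the profile URL that Puppeteer visits, it may be worth validating it first. The sketch below is one way to do that. The helper name buildProfileUrl is hypothetical, and the pattern rests on an assumption: that Instagram handles are 1 to 30 characters made of letters, digits, periods and underscores.

```javascript
// Assumption: Instagram handles are 1-30 characters drawn from letters,
// digits, periods and underscores. Adjust the pattern if that is not the case.
// The lookahead rejects handles made only of periods, such as "..".
const HANDLE_PATTERN = /^(?!\.+$)[A-Za-z0-9._]{1,30}$/

// Return the profile URL for a handle, or throw a RangeError if the
// handle does not look valid. A leading "@" is tolerated and removed.
function buildProfileUrl (handle) {
    const cleaned = String(handle).trim().replace(/^@/, '')
    if (!HANDLE_PATTERN.test(cleaned)) {
        throw new RangeError(`Invalid Instagram handle: ${handle}`)
    }
    return `https://www.instagram.com/${cleaned}/`
}
```

Inside scrapeImages, the profile navigation could then use page.goto(buildProfileUrl(profile), ...), and the route handler could answer with status 400 when a RangeError is thrown.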