javascript - How do I get the absolute path for '<img src=''>' in node from the a

So I want to use request-promise to pull the body of a page. Once I have the page, I want to collect all the img tags and build an array of their src values. Assume the src attributes on the page include both relative and absolute paths; I want an array of absolute paths for every img on the page. I know I could use some string manipulation and the npm path module to build the absolute paths, but I wanted to find a better way of doing it.

var rp = require('request-promise'),
    cheerio = require('cheerio');

var options = {
    uri: 'http://www.google.com',
    method: 'GET',
    resolveWithFullResponse: true
};

rp(options)
  .then (function (response) {
    $ = cheerio.load(response.body);
    var relativeLinks = $("img");
    relativeLinks.each( function() {
        var link = $(this).attr('src');
        console.log(link);
        if (link.startsWith('http')){
            console.log('abs');
        }
        else {
            console.log('rel');
        }
   });
});

results

  /logos/doodles/2016/phoebe-snetsingers-85th-birthday-5179281716019200-hp.gif
  rel
asked Jun 9, 2016 at 18:43 by bsego; edited Jun 9, 2016 at 20:04 by Nick Bartlett
  • Possible duplicate of getting the absolute path of a <img/> – Midas Commented Jun 9, 2016 at 19:01
  • @Midas This question is closely related, but not quite a duplicate of that other question because of the implementation differences between the DOM and jQuery in that case, and Cheerio in this case. Doing something like $(this) or $('img')[0].src won't return anything in Cheerio. – Michael Commented Jun 10, 2016 at 17:02

4 Answers

Store your page URL in a variable and use url.resolve to join the pieces together. In the Node REPL, this works for both relative and absolute paths (hence the "resolving"):

$:~/Projects/test$ node
> var base = "https://www.google.com";
undefined
> var imageSrc = "/logos/doodles/2016/phoebe-snetsingers-85th-birthday-5179281716019200-hp.gif";
undefined
> var url = require('url');
undefined
> url.resolve(base, imageSrc);
'https://www.google.com/logos/doodles/2016/phoebe-snetsingers-85th-birthday-5179281716019200-hp.gif'
> imageSrc = base + imageSrc;
'https://www.google.com/logos/doodles/2016/phoebe-snetsingers-85th-birthday-5179281716019200-hp.gif'
> url.resolve(base, imageSrc);
'https://www.google.com/logos/doodles/2016/phoebe-snetsingers-85th-birthday-5179281716019200-hp.gif'

Your code would change to something like:

var rp = require('request-promise'),
    cheerio = require('cheerio'),
    url = require('url'),
    base = 'http://www.google.com';

var options = {
    uri: base,
    method: 'GET',
    resolveWithFullResponse: true
};

rp(options)
  .then (function (response) {
    var $ = cheerio.load(response.body); // declare $ locally instead of leaking a global
    var relativeLinks = $("img");
    relativeLinks.each( function() {
        var link = $(this).attr('src');
        var fullImagePath = url.resolve(base, link); // should be absolute 
        console.log(link);
        if (link.startsWith('http')){
            console.log('abs');
        }
        else {
            console.log('rel');
        }
   });
});

To get an array of image links in your scenario, use url.resolve to resolve the src attribute of each img tag against the request URL, which yields an absolute URL. The array is passed to the final then, where you can do something other than console.log with it if you wish.

var rp = require('request-promise'),
    cheerio = require('cheerio'),
    url = require('url'),
    base = 'http://www.google.com';

var options = {
    uri: base,
    method: 'GET',
    resolveWithFullResponse: true
};

rp(options)
    .then (function (response) {
        var $ = cheerio.load(response.body);

        return $('img').map(function () {
            return url.resolve(base, $(this).attr('src'));
        }).toArray();
    })
    .then(console.log);

url.resolve works for both absolute and relative URLs: resolving a relative path against your request URL returns the combined absolute URL, while resolving an absolute URL just returns that absolute URL unchanged. For example, with img tags on Google whose src attributes are /logos/cat.gif and https://test.com/dog.gif, this would output:

[ 
    'http://www.google.com/logos/cat.gif',
    'https://test.com/dog.gif'
]
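
One detail the example above does not exercise is that a src without a leading slash resolves against the directory of the page, not the site root, so the base should be the full URL of the page that was fetched. A small sketch of how the different src shapes resolve; the page URL and the src values here are made-up placeholders, not taken from the question:

```javascript
// url.resolve is the legacy API used in the answers above; it still ships with Node.js.
const url = require('url');

// Hypothetical page URL, chosen only for illustration.
const pageUrl = 'http://example.com/a/b/page.html';

// Hypothetical src values covering the common shapes.
const srcs = [
    '/logos/cat.gif',               // root-relative
    'img/x.png',                    // relative to the page's directory
    '../up.png',                    // one directory up from the page
    '//cdn.example.net/y.png',      // protocol-relative
    'https://test.example/dog.gif'  // already absolute
];

// Resolve every src against the page URL.
const resolved = srcs.map(function (src) {
    return url.resolve(pageUrl, src);
});

console.log(resolved);
```

If base were only the site root, img/x.png and ../up.png would resolve to different locations than the browser would load, which is why passing the actual page URL is the safer choice.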

It looks like you're using jQuery, so you could do this:

$('img').each(function(i, e) {
    console.log(e.src)
});

If you read the src property of the element (in a browser DOM), it will expand relative paths to absolute ones.

It's 2022 and url.resolve is deprecated now.

Here is how I do it (works for both a href and img src):

import URI from 'urijs'

function absolutizeUri(maybeRelativeUri: string, baseUri: string): string {
    if (!maybeRelativeUri || maybeRelativeUri.length === 0) {
        return ''
    }
    let uri = new URI(maybeRelativeUri);
    if (uri.is('relative')) {
        uri = uri.absoluteTo(baseUri)
    }
    return uri.toString()
}

// ...
const baseUri = 'http://www.google.com'
const src = absolutizeUri($(this).attr('src'), baseUri)
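
Node also ships the WHATWG URL class, whose constructor takes a base URL as its second argument, so the same resolution can be done without an extra dependency. A minimal sketch, assuming a hypothetical helper name toAbsoluteUrl and placeholder URLs (neither comes from the answers above):

```javascript
// Hypothetical helper (not from the answers above): resolves src against base
// with the built-in WHATWG URL class, available as a global in modern Node.js.
function toAbsoluteUrl(src, base) {
    // Treat a missing or empty src as "no URL" instead of resolving to base itself
    if (!src) {
        return '';
    }
    try {
        return new URL(src, base).href;
    } catch (err) {
        // new URL throws a TypeError when the input cannot be parsed
        return '';
    }
}

// Placeholder values for illustration only
var base = 'http://www.example.com';
console.log(toAbsoluteUrl('/logos/cat.gif', base));
console.log(toAbsoluteUrl('https://test.example/dog.gif', base));
console.log(toAbsoluteUrl(undefined, base));
```

Inside the Cheerio loop this would be called as toAbsoluteUrl($(this).attr('src'), base). Returning an empty string for a missing src mirrors the urijs version above; filtering those entries out afterwards is left to the caller.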