javascript - Node.js scraping, converting image src -> full URL - Stack Overflow

I'm using Cheerio (https://github.com/MatthewMueller/cheerio) to scrape websites and get images for a project I'm working on. I'm wondering if there's an easy way with Node.js (or another package) to convert the $(img).attr('src') to a fully qualified URL? Sometimes I'll get "image.jpg", other times "../../image.jpg", and other times "//somepath/image.jpg". Perhaps I'm just missing a regex of some sort... Thanks for your time :)

asked Oct 26, 2012 at 1:14 by ewindsor
  • We will need the URL of the scraped site... Or an example of a site like that. Either way, I recommend building yourself an extra function to parse these values. – hhh Commented Oct 26, 2012 at 3:42
  • Ohh, brilliant!! I was troubled by the exact same thing and was manually writing out solutions for each of these cases. God bless SO! – vishalv2050 Commented May 31, 2014 at 15:28

1 Answer


Look at the node url module. Specifically url.resolve(from, to) should be what you're looking for.
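As a minimal sketch of that suggestion, the snippet below resolves the three kinds of src values from the question against a page URL. The page URL `http://example.com/a/b/page.html` and the helper name `toAbsolute` are assumptions made up for illustration; in a real scraper the base would be the URL of the page you fetched. Note that current Node.js documentation marks `url.resolve` as a legacy API, so the sketch also shows the WHATWG `new URL(input, base)` constructor, which should give the same result for these inputs.

```javascript
const url = require('url');

// Hypothetical URL of the scraped page (an assumption for illustration).
const pageUrl = 'http://example.com/a/b/page.html';

// Hypothetical helper: resolve a possibly relative src against the page URL,
// using url.resolve(from, to) as the answer suggests.
function toAbsolute(src, base) {
  return url.resolve(base, src);
}

// Same idea with the WHATWG URL constructor available in modern Node.js.
function toAbsoluteWhatwg(src, base) {
  return new URL(src, base).href;
}

// The three shapes of src mentioned in the question.
const samples = ['image.jpg', '../../image.jpg', '//somepath/image.jpg'];

const resolved = samples.map((src) => toAbsolute(src, pageUrl));
const resolvedWhatwg = samples.map((src) => toAbsoluteWhatwg(src, pageUrl));

console.log(resolved);
// [ 'http://example.com/a/b/image.jpg',
//   'http://example.com/image.jpg',
//   'http://somepath/image.jpg' ]
```

With Cheerio, you would presumably pass `$(img).attr('src')` as `src` and the address you requested as `base`.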
