I'm using Cheerio (https://github.com/MatthewMueller/cheerio) to scrape websites and get images for a project I'm working on. Is there an easy way in Node.js (or with another package) to convert $(img).attr('src') to a fully qualified URL? Sometimes I get "image.jpg", other times "../../image.jpg", and other times "//somepath/image.jpg". Perhaps I'm just missing a regex of some sort... Thanks for your time :)
asked Oct 26, 2012 at 1:14 by ewindsor
- We will need the URL of the scraped site, or an example of a site like that. Either way, I recommend building yourself an extra function to parse these values. – hhh Commented Oct 26, 2012 at 3:42
- Ohh, brilliant! I was troubled by the exact same thing and was manually writing out solutions for each of these. God bless SO! – vishalv2050 Commented May 31, 2014 at 15:28
1 Answer
Look at the Node.js url module. Specifically, url.resolve(from, to) should be what you're looking for.
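As a rough sketch of how that could look for the three src forms from the question: the page URL below is made up for illustration, and toAbsolute / toAbsoluteWhatwg are hypothetical helper names, not part of Cheerio or Node. The first helper uses url.resolve(from, to) as suggested above; the second uses the WHATWG URL constructor, which newer Node.js versions also provide and which should give the same result for these inputs.

```javascript
const url = require('url');

// Assumed for illustration: the address of the page the <img> tags were scraped from.
const pageUrl = 'http://example.com/a/b/c/page.html';

// Hypothetical helper: resolve a src value against the page URL with url.resolve.
function toAbsolute(base, src) {
  return url.resolve(base, src);
}

// Hypothetical alternative using the WHATWG URL class (global in newer Node.js versions).
function toAbsoluteWhatwg(base, src) {
  return new URL(src, base).href;
}

// The three src shapes mentioned in the question.
const samples = ['image.jpg', '../../image.jpg', '//somepath/image.jpg'];

for (const src of samples) {
  console.log(src, '->', toAbsolute(pageUrl, src));
}
// image.jpg            -> http://example.com/a/b/c/image.jpg
// ../../image.jpg      -> http://example.com/a/image.jpg
// //somepath/image.jpg -> http://somepath/image.jpg
```

With Cheerio, you would pass the URL you fetched as the base and $(img).attr('src') as the second argument. Note that the base needs to be the full page URL (including the path), otherwise relative paths like "../../image.jpg" resolve against the wrong directory.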