javascript - How can I programmatically scrape an image from another website?

A few years ago I helped someone put together a webpage (for local personal use only, not served to the world) that aggregates outdoor webcam photos from several of his favorite websites. It's a time-saver for viewing multiple websites at once. We had it easy when the images on those websites had fixed URLs. And we were able to write some JavaScript code when the URLs changed predictably (e.g., when the URL had a date in it). But now he'd like to add an image whose filename changes seemingly at random, and I don't know how to handle that. Basically, I'd like to:

Programmatically visit another website to find the URL of a particular image.
Insert that URL into my webpage with an <img> tag.

I realize this is probably a confusing and unusual question. I'm willing to help clarify as much as possible. I'm just not sure how to ask for what this guy wants to do.

Update: David Dorward mentioned that doing this with JavaScript violates the Same Origin Policy. I'm open to suggestions for other ways to approach this problem.

asked Mar 4, 2010 at 14:14, edited Mar 4, 2010 at 16:15, by Michael Kristofik

What you could ask him: 1. Does the image path change (or is it always /images/something_random.jpg)? 2. Does the image location on the page to parse change (or is it always the first element in a div with the ID "content")? Clarifying that would help a lot for a start. The more random changes you expect, the more complicated the solution will be. – Select0r Commented Mar 4, 2010 at 14:20
I think the image path is fixed. Only the filename changes. And I think it's a safe assumption that the target webpage's structure is fixed. Otherwise this becomes a much harder problem. I think when I looked at it, the image in question was the first tag following some main div tag. – Michael Kristofik Commented Mar 4, 2010 at 14:30
Hot linking is not a good idea! – Josh Stodola Commented Mar 4, 2010 at 15:23
The only programming language you have tagged this with is "JavaScript". If you are talking about JS in a standard browser context, then you are going to run smack bang into the Same Origin Policy (making what you want to achieve impossible). – Quentin Commented Mar 4, 2010 at 15:24
@David Dorward, thank you, I didn't know that. I tagged this with html and JS because that's what the guy's webpage currently uses. We can certainly pursue other options. – Michael Kristofik Commented Mar 4, 2010 at 15:44

4 Answers

Fetch the HTML of the remote page using cross-domain AJAX.
Then parse it to get the URLs of the images of interest.
Then, for each URL, emit <img src="url" />.

It's probably a big fat violation of copyright.
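
The last step above can also be done ahead of time by a small script that writes the page, which may be a reasonable fit since the page is only used locally. A minimal sketch in Python, assuming the image URLs have already been found; the function name `render_img_tags`, the `alt` text, and the example URL are hypothetical:

```python
from html import escape


def render_img_tags(urls):
    """Return one <img> tag per URL, with the src value HTML-escaped."""
    return "\n".join(
        '<img src="{}" alt="webcam image" />'.format(escape(url, quote=True))
        for url in urls
    )


if __name__ == "__main__":
    # Hypothetical URL, for illustration only.
    print(render_img_tags(["http://example.com/images/cam_8f3a.jpg"]))
```

Escaping the URL keeps characters such as `&` in a query string from producing invalid markup.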

The picture is most likely contained within a page, so just visit that page regularly and parse the img tag. Make sure that the random bit you commented on is not just a random parameter added to force browsers to fetch the fresh image instead of retrieving a cached version.
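
One way to check for such a cache-busting parameter is to compare the image URLs seen on two visits to the page and test whether they differ only in the query string. A rough sketch in Python, assuming the random part appears after a `?`; the helper names and any URLs passed in are hypothetical:

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Drop the query string and fragment, keeping scheme, host, and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def differs_only_by_query(url_a, url_b):
    """True if the URLs are different but identical up to the '?' part."""
    return url_a != url_b and strip_query(url_a) == strip_query(url_b)
```

If `differs_only_by_query` returns True for URLs collected on two visits, the image probably has a fixed address, and the result of `strip_query` may be usable directly in the `<img>` tag.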

You have a Python question in your profile, so I'll just say that if I were trying to do this, I'd go with Python & Beautiful Soup. It has the added advantage of being able to handle invalid HTML.
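
A minimal sketch of the parsing step in Python. It uses the standard library's `html.parser` rather than Beautiful Soup so that it runs without extra packages; Beautiful Soup would likely be more convenient in practice. The div id `content`, the class and function names, and any URLs are assumptions based on the comments under the question, not details of the real target page:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FirstImageInDiv(HTMLParser):
    """Record the src of the first <img> found inside <div id=div_id>."""

    def __init__(self, div_id):
        super().__init__()
        self.div_id = div_id
        self.depth = 0  # <div> nesting depth inside the target div; 0 means outside
        self.src = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth > 0:
                self.depth += 1
            elif attrs.get("id") == self.div_id:
                self.depth = 1
        elif tag == "img" and self.depth > 0 and self.src is None:
            self.src = attrs.get("src")

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1


def find_image_url(html, page_url, div_id="content"):
    """Return the absolute URL of the first image in the target div, or None."""
    parser = FirstImageInDiv(div_id)
    parser.feed(html)
    parser.close()
    # Resolve a relative src against the address of the page it came from.
    return urljoin(page_url, parser.src) if parser.src else None
```

`urljoin` matters here because the page may reference the image with a relative path, which would not work when copied into a page hosted elsewhere.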

If you use PHP in your project, you can use the cURL library to fetch the other website's content and then parse it with a regex to extract the image URL from the source code.
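
The same fetch-and-regex idea, sketched here in Python rather than PHP. The pattern assumes the image is a `.jpg` referenced from a quoted `src` attribute, and the page is assumed to be UTF-8; the function names and the pattern are illustrative assumptions, and a regex like this is more fragile than a real HTML parser:

```python
import re
from urllib.request import urlopen

# Captures the quoted src value of an <img> tag when it ends in .jpg.
IMG_SRC_RE = re.compile(
    r'<img\b[^>]*?\bsrc\s*=\s*["\']([^"\']+\.jpg)["\']',
    re.IGNORECASE,
)


def fetch_html(page_url, timeout=10):
    """Download the page and decode it as text (UTF-8 is an assumption)."""
    with urlopen(page_url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def first_jpg_src(html):
    """Return the src of the first matching <img> tag, or None if none match."""
    match = IMG_SRC_RE.search(html)
    return match.group(1) if match else None
```

In use, the result of `fetch_html` for the target page would be passed to `first_jpg_src`, and a `None` result would signal that the page layout no longer matches the pattern.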
