I need to get the HTML content of a page using JavaScript. The page may be on another domain, much like what wget does, but in JavaScript. I want to use it for a kind of web crawler.
Using JavaScript, how can I get the content of a page, given its URL, and store it in a string?
asked Oct 23, 2012 at 11:52 by Eduard Florinescu
- XMLHttpRequest? Unless you're expecting someone to give you lots of code to actually do it, you need more information in your question. What have you tried? What did you rule out? Why are you using javascript? there may be better ways. – Paystey Commented Oct 23, 2012 at 11:53
- Is this for client-side JS (in the browser) or for server-side JS (like Node.js)? – Sirko Commented Oct 23, 2012 at 11:55
- @Sirko it is for browser – Eduard Florinescu Commented Oct 23, 2012 at 12:08
- @Paystey I thought of that too, but it does not work on other domains; see the answer below about the concerns I had in mind – Eduard Florinescu Commented Oct 23, 2012 at 12:12
3 Answers
Try this:
// `url` must hold the address of the page to fetch.
function cbfunc(html) { alert(html.results[0]); }
$.getScript('http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22' +
    encodeURIComponent(url) + '%22&format=xml&diagnostics=true&callback=cbfunc');
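The general way to load content over HTTP via JavaScript is to use the XMLHttpRequest object. It is subject to the same-origin policy, so to access content on other domains you have to work around that policy (for example, the target server can opt in with CORS headers, or you can route the request through a server-side proxy).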
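As a rough illustration of the XMLHttpRequest approach, here is a minimal sketch that reads a URL's body into a string. The name `getPageAsString` and the optional `XHR` parameter are assumptions made for this example (the parameter only exists so a stand-in constructor can be passed in outside a browser); neither belongs to any library.

```javascript
// Sketch: fetch a URL's body as a string with XMLHttpRequest.
// `getPageAsString` and `XHR` are illustrative names, not a library API.
function getPageAsString(url, XHR) {
  // Fall back to the browser's global constructor when none is injected.
  var Ctor = XHR || XMLHttpRequest;
  return new Promise(function (resolve, reject) {
    var xhr = new Ctor();
    xhr.open('GET', url, true); // true = asynchronous
    xhr.onload = function () {
      if (xhr.status >= 200 && xhr.status < 300) {
        resolve(xhr.responseText); // the page's HTML as a plain string
      } else {
        reject(new Error('HTTP ' + xhr.status + ' for ' + url));
      }
    };
    // Fires on network failures and on cross-origin requests the browser blocks.
    xhr.onerror = function () {
      reject(new Error('Request failed or was blocked: ' + url));
    };
    xhr.send();
  });
}
```

In a browser, `getPageAsString('/index.html').then(function (html) { console.log(html.length); });` should work for same-origin pages. For a page on another domain, the request succeeds only if that server sends suitable CORS headers; otherwise the `onerror` branch runs.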
This answer assumes you are running JS in a web browser (implied by "the page could be also on another domain"). If you were not, other options would be open to you. For example, with Node.js you could use its built-in http client.
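For the Node.js case, a sketch along these lines might do. It uses the built-in `http` module, so it only handles plain `http://` URLs (the `https` module offers the same `get` call for `https://`), it does not follow redirects, and it assumes the page is UTF-8 encoded. `fetchPage` is a hypothetical helper name.

```javascript
// Sketch for Node.js only: uses the built-in http module (plain http:// URLs).
const http = require('http');

// `fetchPage` is a hypothetical helper name, not part of Node's API.
function fetchPage(url) {
  return new Promise((resolve, reject) => {
    http.get(url, (res) => {
      if (res.statusCode < 200 || res.statusCode >= 300) {
        res.resume(); // discard the body so the socket is released
        reject(new Error('HTTP ' + res.statusCode + ' for ' + url));
        return;
      }
      res.setEncoding('utf8'); // assumption: the page is UTF-8 encoded
      let body = '';
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve(body)); // full HTML as one string
    }).on('error', reject); // DNS failures, refused connections, etc.
  });
}
```

Because Node.js is not a browser, the same-origin policy does not apply here, which is why this route suits a crawler better than in-page script.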
If you also want to capture the html tags, you could concatenate them to the HTML like this:
function getPageHTML() {
    return "<html>" + $("html").html() + "</html>";
}
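For comparison, here is a jQuery-free sketch of the same idea using the standard DOM property `outerHTML`. Unlike the concatenation above, it keeps any attributes on the `<html>` tag (such as `lang`). The optional `doc` parameter is an assumption added for this example so that a different document object can be passed in. Like the jQuery version, it reads the page the script is running in, not a remote URL.

```javascript
// `doc` is a hypothetical parameter; in a browser it defaults to the page's own document.
function getPageHTML(doc) {
  var d = doc || document;
  // outerHTML serializes the <html> element itself, including its attributes.
  // The doctype is not part of this string.
  return d.documentElement.outerHTML;
}
```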
How do I get the entire page's HTML with jQuery?