
url - How can I get a web page into a string using JavaScript? - Stack Overflow


I need to get the HTML content of a page using JavaScript. The page could also be on another domain, similar to what wget does, but in JavaScript. I want to use it for a kind of web crawler.

Using JavaScript, how can I get the content of a page, given its URL, into a string?

asked Oct 23, 2012 at 11:52 by Eduard Florinescu
  • 1 XMLHttpRequest? Unless you're expecting someone to give you lots of code to actually do it, you need more information in your question. What have you tried? What did you rule out? Why are you using javascript? there may be better ways. – Paystey Commented Oct 23, 2012 at 11:53
  • 1 Is this for client side JS (in the browser) or for server side JS (like node.js) ? – Sirko Commented Oct 23, 2012 at 11:55
  • @Sirko it is for browser – Eduard Florinescu Commented Oct 23, 2012 at 12:08
  • @Paystey I thought that too but then on other domains, see answer below about the concerns I had in mind – Eduard Florinescu Commented Oct 23, 2012 at 12:12

3 Answers


Try this:

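// YQL replies with a JSONP call to cbfunc; with format=xml, results[0] holds the page markup as a string.
function cbfunc(html) { alert(html.results[0]); }
// url is the address of the page to fetch.
$.getScript('http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22' +
  encodeURIComponent(url) + '%22&format=xml&diagnostics=true&callback=cbfunc');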

DEMO

More about YQL

The general way to load content over HTTP via JavaScript is to use the XMLHttpRequest object. This is subject to the same-origin policy, so to access content on other domains you have to work around it (for example, with a server-side proxy, or with a target server that sends CORS headers).
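A minimal sketch of the XMLHttpRequest approach is given here, assuming the target is same-origin or allows cross-origin reads. The helper name `getPageAsString` and its optional `XHR` parameter are illustrative assumptions, not part of the question; the parameter only exists so the constructor can be swapped out.

```javascript
// Sketch: load a URL into a string with XMLHttpRequest (browser).
// getPageAsString and the XHR parameter are hypothetical names for illustration.
function getPageAsString(url, callback, XHR = XMLHttpRequest) {
  const xhr = new XHR();
  xhr.open('GET', url, true); // true = asynchronous request
  xhr.onload = function () {
    if (xhr.status >= 200 && xhr.status < 300) {
      callback(null, xhr.responseText); // page source as a string
    } else {
      callback(new Error('HTTP status ' + xhr.status));
    }
  };
  // Network failures and blocked cross-origin requests end up here.
  xhr.onerror = function () {
    callback(new Error('Request failed (network error or blocked cross-origin request)'));
  };
  xhr.send();
}
```

It could be called as `getPageAsString('/some/page.html', function (err, html) { ... })`, where the path is a placeholder. For a page on another domain without CORS headers, the error callback is expected to fire in a browser.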

This assumes you are running JS in a web browser (implied by "the page could be also on another domain"). If you were not, other options would be open to you. For example, with Node.js you could use its built-in http client.
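For the Node.js case, a rough sketch with the built-in `http` module might look like the following. The helper name `fetchPage` is an assumption for illustration, and the sketch handles plain `http://` URLs only: it does not follow redirects, and `https://` URLs would need the `https` module instead.

```javascript
// Sketch: download a page body into a string with Node's built-in http client.
const http = require('http');

// fetchPage is a hypothetical helper name; it calls back with (error, body).
function fetchPage(url, callback) {
  http.get(url, (res) => {
    res.setEncoding('utf8'); // deliver chunks as strings instead of Buffers
    let body = '';
    res.on('data', (chunk) => { body += chunk; });
    res.on('end', () => callback(null, body));
  }).on('error', (err) => callback(err));
}
```

The same origin policy does not apply outside the browser, so this works for any reachable host.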

If you also want to capture the html tags of the current page, you can concatenate them around the HTML like this:

function getPageHTML() {
  return "<html>" + $("html").html() + "</html>";
}

How do I get the entire page's HTML with jQuery?
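As a possible alternative without jQuery, the standard DOM property `document.documentElement.outerHTML` returns the root element including its own `<html>` tag and attributes (the doctype is not included). A small sketch; the name `getPageOuterHTML` and the `doc` parameter are assumptions added so the function can be pointed at any document object:

```javascript
// Sketch: serialize a document's root element, <html> tag included.
// doc defaults to the current page's document when running in a browser.
function getPageOuterHTML(doc = document) {
  return doc.documentElement.outerHTML;
}
```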
