
curl - Command line URL fetch with JavaScript capability - Stack Overflow


I use curl in PHP, and httplib2 in Python, to fetch URLs.

However, there are some pages that use JavaScript (AJAX) to retrieve the data after you have loaded the page, and they just overwrite a specific section of the page afterward.

So, is there any command line utility that can handle JavaScript?

To see what I mean, go to monster.com and try searching for a job.

You'll see that the Ajax is getting the list of jobs afterward. So, if I wanted to pull in the jobs based on my keyword search, I would get the page with no jobs.

But via a browser it works.


asked Jul 9, 2009 at 20:29 by Avid Coder; edited Jan 8, 2017 at 14:55 by marc_s

6 Answers


You can use PhantomJS (http://phantomjs.org).

You can use it as below. Note that require("webpage") returns a module, so you need to call its create() method to get a page object, and phantom.exit() must be called outside page.evaluate(), because the sandboxed page context cannot see the phantom object:

var page = require("webpage").create();
page.open("http://monster.com", function (status) {
  page.evaluate(function () {
    /* your JavaScript code here, e.g.
       $.ajax("....", function (result) { ... });
    */
  });
  phantom.exit(0);
});

Get Firebug and see the URL for that Ajax request. You may then use curl with that URL.

There are two ways to handle this: write your screen scraper using a full browser-based client like WebKit, or go to the actual page, find out what the AJAX request is doing, and make that request directly. You then need to parse the results, of course. Use Firebug to help you out.
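The second route usually hands you JSON or an HTML fragment to parse yourself. A minimal Python sketch, assuming a hypothetical JSON response body (the real endpoint and field names will differ from whatever Firebug shows you):

```python
import json

# Hypothetical response body, as you might capture it from the AJAX call
# in Firebug's Net panel; the field names and values are placeholders.
raw = '{"totalResults": 2, "jobs": [{"title": "PHP Developer"}, {"title": "Python Developer"}]}'

payload = json.loads(raw)
titles = [job["title"] for job in payload["jobs"]]
print(titles)  # ['PHP Developer', 'Python Developer']
```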

Check out this post for more info on the subject. The upvoted answer suggests using a test tool to drive a real browser. What's a good tool to screen-scrape with Javascript support?

I think env.js can handle <script> elements. It runs in the Rhino JavaScript interpreter and has its own XMLHttpRequest object, so you should be able to at least run the scripts manually (select all the <script> tags, fetch the .js files, and call eval) if it doesn't run them automatically. Be careful about running scripts you don't trust, though, since they can use any Java classes.

I haven't played with it since John Resig's first version, so I don't know much about how to use it, but there's a discussion group on Google Groups.
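The manual route described above, collecting the <script> tags before evaluating them, can be sketched with Python's standard library. The HTML string here is a stand-in, not real page markup:

```python
from html.parser import HTMLParser

class ScriptExtractor(HTMLParser):
    """Collect external script URLs and inline script bodies from HTML."""
    def __init__(self):
        super().__init__()
        self.srcs = []          # external .js file URLs, to fetch and eval
        self.inline = []        # inline script source text
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)
            else:
                self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.inline.append(data)

html = ('<html><head><script src="app.js"></script>'
        '<script>var x = 1;</script></head></html>')
p = ScriptExtractor()
p.feed(html)
print(p.srcs)    # ['app.js']
print(p.inline)  # ['var x = 1;']
```

You would then download each URL in p.srcs and hand the source, together with the inline bodies, to the interpreter.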

Maybe you could try and use features of HtmlUnit in your own utility?

HtmlUnit is a "GUI-Less browser for Java programs". It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, etc... just like you do in your "normal" browser.

It has fairly good JavaScript support (which is constantly improving) and is able to work even with quite complex AJAX libraries, simulating either Firefox or Internet Explorer depending on the configuration you want to use.

It is typically used for testing purposes or to retrieve information from web sites.

Use LiveHttpHeaders, a plug-in for Firefox, to see all the URL details, then use cURL with that URL. LiveHttpHeaders shows everything: the method (POST or GET), the headers, the body, and so on. It also shows the POST or GET parameters in the headers. I think this may help you.
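Once LiveHttpHeaders shows you the method, headers, and body, you can rebuild the same request in code. A sketch with Python's urllib; the URL, header, and body below are placeholder values standing in for a hypothetical capture:

```python
import urllib.request

# Rebuild the request observed in LiveHttpHeaders as a Request object.
# The URL, header, and body are placeholders, not a real capture.
req = urllib.request.Request(
    "https://example.com/jobs/search",
    data=b"q=python&location=NYC",                     # POST body from the capture
    headers={"X-Requested-With": "XMLHttpRequest"},    # header many AJAX calls send
)
print(req.get_method())  # POST (urllib infers POST when data is set)
```

Passing the same values to curl with -d and -H should produce an equivalent request on the command line.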
