
screen scraping - How can I programmatically scrape a web page and "click" a javascript button? - Stack Overfl


I'm trying to scrape a web page for work that has hundreds of table rows, each with a checkbox. To submit the form I need to click a button that calls a JavaScript function. The button's HTML looks like this:

<a onclick="JavaScript: return verifyChecked('Resend the selected request for various approvals?');"
id="_ctl0_cphMain_lbtnReapprove"
title="Click a single request to send to relevant managers for reapproval."
class="lnkDBD" href="javascript:__doPostBack('_ctl0$cphMain$lbtnReapprove','')"
style="border-color:#0077D4;border-width:1px;border-style:Solid;text-decoration: overline;">&nbsp;Resend&nbsp;</a>

I know that with libraries like Beautiful Soup you can submit forms by adding post data to the URL, but how could I check a checkbox and "click" this JavaScript button? The website is a help desk of sorts, and for this particular button we can only check one request at a time, which takes far too long when there are hundreds of requests that need to be resubmitted.

When I check the checkbox, a message also pops up asking me to confirm that I want to do this. I don't know whether that will affect submitting it programmatically.

EDIT: I forgot to include the __doPostBack method.

<script type="text/javascript"> 
<!--
var theForm = document.forms['aspnetForm'];
if (!theForm) {
    theForm = document.aspnetForm;
}
function __doPostBack(eventTarget, eventArgument) {
    if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
        theForm.__EVENTTARGET.value = eventTarget;
        theForm.__EVENTARGUMENT.value = eventArgument;
        theForm.submit();
    }
}
// -->
</script>

Asked May 2, 2012 at 1:38 by FirehaK; edited May 5, 2012 at 2:47 by Jeff Mercado.
  • You need to inspect what clicking the JS button actually does, then just create a copy of the process. – Petah Commented May 2, 2012 at 1:41
  • I know that jQuery isn't mentioned, and is probably a bad fit here, but this may be interesting to look at: api.jquery.com/trigger – Jarrod Mosen Commented May 2, 2012 at 1:44
  • @Petah I updated the first post with the method, which I forgot to include. I'm not sure how to go about copying what it does from something like Beautiful Soup, though. – FirehaK Commented May 2, 2012 at 1:49
  • @FirehaK I mean what it actually does in terms of the HTTP protocol/requests. It is easy to spoof HTTP requests, you just need to know what you need to spoof. – Petah Commented May 2, 2012 at 1:51

3 Answers

Get Firefox and Firebug, open Firebug, load up the page, and look in the Console (or Net) tab to see what it's actually sending to the server.

Then just repeat what it's sending using whatever tool you like.
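As a rough sketch of "repeating what it's sending", the Python below rebuilds the form fields that the __doPostBack function from the question would submit: every hidden input on the page (ASP.NET pages usually carry state such as __VIEWSTATE there), plus __EVENTTARGET, __EVENTARGUMENT and one checked checkbox. The helper names, the checkbox name, and the value "on" are assumptions made for illustration; the captured request in Firebug is the authority on what the server really expects.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback_fields(page_html, event_target, checkbox_name, event_argument=""):
    """Mimic __doPostBack: keep the hidden fields, set the two event fields."""
    parser = HiddenFieldParser()
    parser.feed(page_html)
    fields = dict(parser.fields)
    fields["__EVENTTARGET"] = event_target
    fields["__EVENTARGUMENT"] = event_argument
    # Assumption: a checked checkbox without a value attribute is sent as "on".
    fields[checkbox_name] = "on"
    return fields


def build_postback_request(url, fields):
    """Wrap the fields in a form-encoded POST request (nothing is sent here)."""
    body = urlencode(fields).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(url, data=body, headers=headers)
```

The event target would be the '_ctl0$cphMain$lbtnReapprove' string from the button's href. The request would still need to be sent with the same login cookies as the browser session, for example through urllib.request with a cookie-aware opener, and the page should be fetched again before each resend so that the hidden fields are current.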

You're probably better off using a browser automation library like Selenium for something like this.
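A minimal sketch of the Selenium route in Python might look like the following. It assumes that each checkbox can be located by an id (the ids passed in are hypothetical), that the confirmation message from verifyChecked is a native JavaScript confirm() dialog, and that Selenium with a Firefox driver is installed. The button id is the one from the question's HTML.

```python
BY_ID = "id"  # same string as selenium.webdriver.common.by.By.ID

RESEND_BUTTON_ID = "_ctl0_cphMain_lbtnReapprove"  # id of the Resend link in the question


def resend_request(driver, checkbox_id, button_id=RESEND_BUTTON_ID):
    """Tick one checkbox, click Resend, and accept the confirmation dialog."""
    checkbox = driver.find_element(BY_ID, checkbox_id)
    if not checkbox.is_selected():
        checkbox.click()
    driver.find_element(BY_ID, button_id).click()
    # Assumption: verifyChecked() opens a native confirm() dialog.
    driver.switch_to.alert.accept()


def resend_all(page_url, checkbox_ids):
    """Reload the page for each request, since a postback replaces the page."""
    from selenium import webdriver  # third-party package, imported lazily

    driver = webdriver.Firefox()
    try:
        for checkbox_id in checkbox_ids:
            driver.get(page_url)
            resend_request(driver, checkbox_id)
    finally:
        driver.quit()
```

Because the browser runs the page's own JavaScript, nothing about __doPostBack has to be reimplemented. A real script would probably also need to log in first and wait for each postback to finish before loading the page again.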

Try iMacros. For simple browser automation it's excellent. You can record your sessions and it generates code based on them. If you need more logic, the documentation is not complex and covers standard programming, so you can get going fast. You can call outside languages and scripts as well. One project, for example, that I've used this for:

1) Collecting business leads: a site had a list of all their retail stores but would not show them all, only those close to a user-entered zip code. I put a large number of zip codes in a spreadsheet, and when run, the macro went through each one from the CSV and scraped the store info into a CSV file. Every 5 minutes it would open a VPN program on the PC and change the IP. It took 20 minutes to make.

If you're set on programming it then OK, but I find this the best way, as it's easier to debug if the site changes, their "code" is very easy, and you can call other scripts and files with ease. The Firefox add-on is the most stable and is free.
