is it possible to have mechanize follow an anchor link that is of type javascript?
I am trying to login into a website in python using mechanize and beautifulsoup.
this is the anchor link
<a id="StaticModuleID15_ctl00_SkinLogin1_Login1_Login1_LoginButton" href="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions("StaticModuleID15$ctl00$SkinLogin1$Login1$Login1$LoginButton", "", true, "Login1", "", false, true))"><img id="StaticModuleID15_ctl00_SkinLogin1_Login1_Login1_Image2" border="0" src="../../App_Themes/default/images/Member/btn_loginenter.gif" align="absmiddle" style="border-width:0px;" /></a>
and here is what i have tried
links = SoupStrainer('a', id="StaticModuleID15_ctl00_SkinLogin1_Login1_Login1_LoginButton")
[anchor for anchor in BeautifulSoup(data, parseOnlyThese=links)]
link = mechanize.Link( base_url = self.url,
url = str(anchor['href']),
text = str(anchor.string),
tag = str(anchor.name),
attrs = [(str(name), str(value))
for name, value in anchor.attrs])
response2 = br.follow_link(link)
Right now I am getting the error message of,
urllib2.URLError:
any help or suggestion is appreciated
Edit
After the ment by helpers, I went and looked at the code of the asp page a bit.
I found a little bit of useful scripts but I am unsure of what I have to do in python to emulate the JS code with python. In no where did I see any cookies set, am I looking at the wrong places?
<form name="form1" method="post" action="BrowseSchedule.aspx?ItemId=75" onsubmit="javascript:return WebForm_OnSubmit();" id="form1">
//<![CDATA[
function WebForm_OnSubmit() {
if (typeof(ValidatorOnSubmit) == "function" && ValidatorOnSubmit() == false) return false;
return true;
}
//]]>
<script type="text/javascript">
//<![CDATA[
var theForm = document.forms['form1'];
if (!theForm) {
theForm = document.form1;
}
function __doPostBack(eventTarget, eventArgument) {
if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
theForm.__EVENTTARGET.value = eventTarget;
theForm.__EVENTARGUMENT.value = eventArgument;
theForm.submit();
}
}
//]]>
</script>
function WebForm_DoPostBackWithOptions(options) {
var validationResult = true;
if (options.validation) {
if (typeof(Page_ClientValidate) == 'function') {
validationResult = Page_ClientValidate(options.validationGroup);
}
}
if (validationResult) {
if ((typeof(options.actionUrl) != "undefined") && (options.actionUrl != null) && (options.actionUrl.length > 0)) {
theForm.action = options.actionUrl;
}
if (options.trackFocus) {
var lastFocus = theForm.elements["__LASTFOCUS"];
if ((typeof(lastFocus) != "undefined") && (lastFocus != null)) {
if (typeof(document.activeElement) == "undefined") {
lastFocus.value = options.eventTarget;
}
else {
var active = document.activeElement;
if ((typeof(active) != "undefined") && (active != null)) {
if ((typeof(active.id) != "undefined") && (active.id != null) && (active.id.length > 0)) {
lastFocus.value = active.id;
}
else if (typeof(active.name) != "undefined") {
lastFocus.value = active.name;
}
}
}
}
}
}
if (options.clientSubmit) {
__doPostBack(options.eventTarget, options.eventArgument);
}
}
is it possible to have mechanize follow an anchor link that is of type javascript?
I am trying to login into a website in python using mechanize and beautifulsoup.
this is the anchor link
<a id="StaticModuleID15_ctl00_SkinLogin1_Login1_Login1_LoginButton" href="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions("StaticModuleID15$ctl00$SkinLogin1$Login1$Login1$LoginButton", "", true, "Login1", "", false, true))"><img id="StaticModuleID15_ctl00_SkinLogin1_Login1_Login1_Image2" border="0" src="../../App_Themes/default/images/Member/btn_loginenter.gif" align="absmiddle" style="border-width:0px;" /></a>
and here is what i have tried
links = SoupStrainer('a', id="StaticModuleID15_ctl00_SkinLogin1_Login1_Login1_LoginButton")
[anchor for anchor in BeautifulSoup(data, parseOnlyThese=links)]
link = mechanize.Link( base_url = self.url,
url = str(anchor['href']),
text = str(anchor.string),
tag = str(anchor.name),
attrs = [(str(name), str(value))
for name, value in anchor.attrs])
response2 = br.follow_link(link)
Right now I am getting the error message of,
urllib2.URLError:
any help or suggestion is appreciated
Edit
After the ment by helpers, I went and looked at the code of the asp page a bit.
I found a little bit of useful scripts but I am unsure of what I have to do in python to emulate the JS code with python. In no where did I see any cookies set, am I looking at the wrong places?
<form name="form1" method="post" action="BrowseSchedule.aspx?ItemId=75" onsubmit="javascript:return WebForm_OnSubmit();" id="form1">
//<![CDATA[
function WebForm_OnSubmit() {
if (typeof(ValidatorOnSubmit) == "function" && ValidatorOnSubmit() == false) return false;
return true;
}
//]]>
<script type="text/javascript">
//<![CDATA[
var theForm = document.forms['form1'];
if (!theForm) {
theForm = document.form1;
}
function __doPostBack(eventTarget, eventArgument) {
if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
theForm.__EVENTTARGET.value = eventTarget;
theForm.__EVENTARGUMENT.value = eventArgument;
theForm.submit();
}
}
//]]>
</script>
function WebForm_DoPostBackWithOptions(options) {
var validationResult = true;
if (options.validation) {
if (typeof(Page_ClientValidate) == 'function') {
validationResult = Page_ClientValidate(options.validationGroup);
}
}
if (validationResult) {
if ((typeof(options.actionUrl) != "undefined") && (options.actionUrl != null) && (options.actionUrl.length > 0)) {
theForm.action = options.actionUrl;
}
if (options.trackFocus) {
var lastFocus = theForm.elements["__LASTFOCUS"];
if ((typeof(lastFocus) != "undefined") && (lastFocus != null)) {
if (typeof(document.activeElement) == "undefined") {
lastFocus.value = options.eventTarget;
}
else {
var active = document.activeElement;
if ((typeof(active) != "undefined") && (active != null)) {
if ((typeof(active.id) != "undefined") && (active.id != null) && (active.id.length > 0)) {
lastFocus.value = active.id;
}
else if (typeof(active.name) != "undefined") {
lastFocus.value = active.name;
}
}
}
}
}
}
if (options.clientSubmit) {
__doPostBack(options.eventTarget, options.eventArgument);
}
}
Share
Improve this question
edited Feb 15, 2015 at 6:08
Sheena
16.3k15 gold badges78 silver badges122 bronze badges
asked Aug 13, 2009 at 6:01
freshWoWerfreshWoWer
64.3k10 gold badges38 silver badges36 bronze badges
3
- You should read this: wwwsearch.sourceforge/bits/GeneralFAQ.html and read the "Embedded script is messing up my web-scraping. What do I do?" What you choose will largely depend on your needs, and how plicated the login is. It might end up being that the easiest way is to just to emulate the JS code with Python. – user120242 Commented Aug 13, 2009 at 6:31
- That was the page I was looking for: I knew it was on the site somewhere. – jkp Commented Aug 13, 2009 at 6:49
- That page is now at wwwsearch.sourceforge/old/bits/GeneralFAQ.html – utapyngo Commented Dec 12, 2011 at 14:12
2 Answers
Reset to default 4I don't think this is possible with the mechanize module: it doesn't have the ability to interact with JavaScript: its purely Python and HTTP based.
That said, you may be intested in python-spidermonkey module, which it seems is aimed at letting you do just this kind of thing. According to it's website it's aim is to let you
"Execute arbitrary JavaScript code from Python. Allows you to reference arbitrary Python objects and functions in the JavaScript VM"
I've not used it yet but it certainly looks like it would do what you are looking for, although it is still in alpha.
You may set cookies using cookielib
import mechanize
import cookielib
# add headers to your browser also
browser = mechanize.Browser()
browser.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
cj = cookielib.LWPCookieJar()
browser.set_cookiejar(cj)
I doubt this is even relevant now, but oh well :)