
javascript - Python Requests run JS file from GET - Stack Overflow


Goal

To log in to this website (https://www.reliant.) using Python requests or similar. (I know this could be done with Selenium, PhantomJS, or the like, but I would prefer not to.)

Problem

During the login process there are a couple of redirects where "session ID"-type parameters are passed. Most of these I can get, but there's one called dtPC that appears to come from a cookie that you get when first visiting the page. As far as I can tell, the cookie originates from this JS file (https://www.reliant./ruxitagentjs_ICA2QSVfhjqrux_10175190917092722.js). This URL is the next GET request the browser performs after the initial GET of the main URL. All the methods I've tried so far have failed to get me that cookie.

Code thus far

from requests_html import HTMLSession

url = r'https://www.reliant.'
url2 = r'https://www.reliant./ruxitagentjs_ICA2QSVfhjqrux_10175190917092722.js'
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Host': 'www.reliant.',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.3'
}

headers2 = {
    'Referer': 'https://www.reliant.',
    'Sec-Fetch-Mode': 'no-cors',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36'
}

s = HTMLSession()
r = s.get(url, headers=headers)
js = s.get(url2, headers=headers2).text

r.html.render()  # works but doesn't get the cookie
r.html.render(script=js)  # fails on Network error

asked Sep 26, 2019 at 16:31 by SuperStew

1 Answer

Alright, I figured this one out, despite it fighting me the whole way. I don't know why dtPC wasn't showing up in s.cookies like it should, but I wasn't using the script keyword quite right. Apparently, whatever JS you pass to it is executed after everything else has rendered, as if you had opened the console in your browser and pasted it in there. When I actually tried that in Chrome, I got some errors. Eventually I realized I could just run a simple JS script to return the cookies generated by the other JS.

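import urllib.parse

from requests_html import HTMLSession

s = HTMLSession()
r = s.get(url, headers=headers)  # url and headers as defined in the question
print(r.status_code)

# render() returns the value of the script, which runs after the page's own JS
c = r.html.render(script='document.cookie')

c = urllib.parse.unquote(c)
# strip the space that follows each ';' and split only on the first '='
c = [x.strip().split('=', 1) for x in c.split(';') if '=' in x]
c = {x[0]: x[1] for x in c}
print(c)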

At this point, c will be a dict with 'dtPC' as a key and the corresponding value.
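The parsing step can be tried offline, without rendering the page. Below is a minimal sketch of it as a standalone helper, using only the standard library. The name parse_cookie_string and the sample cookie string are made up for illustration; they are not part of requests_html or of the site in question. Unlike the snippet above, this variant decodes each name and value after splitting, on the assumption that an encoded ';' or '=' inside a value should not affect how the pairs are separated.

```python
from urllib.parse import unquote


def parse_cookie_string(raw):
    """Turn a document.cookie-style string into a dict of name -> value.

    Hypothetical helper for illustration only.
    """
    cookies = {}
    for pair in raw.split(";"):
        pair = pair.strip()  # document.cookie separates pairs with "; "
        if not pair:
            continue
        # partition() splits on the first "=" only, so values containing
        # "=" stay intact; a pair with no "=" gets an empty value
        name, _, value = pair.partition("=")
        cookies[unquote(name)] = unquote(value)
    return cookies


# Made-up sample input, not real cookie values from any site
sample = "dtPC=1$made_up_value; other=a%3Db"
parsed = parse_cookie_string(sample)
print(parsed)
```

If the rest of the login flow expects the cookie on the session, the parsed value could then be copied across with something like s.cookies.set('dtPC', parsed['dtPC']), though whether the site accepts that is untested here.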
