python - How to scrape website which has hidden data inside table?

I am trying to scrape the Screener.in website to extract stock-related information. In the Quarterly Results section, however, some rows are hidden: clicking the + button next to a parent header reveals additional rows related to that header. I need this additional information as well.

I am using the Python code below, which gives me a DataFrame but without the additional information:

from io import StringIO
from urllib.request import Request, urlopen

import pandas as pd
from bs4 import BeautifulSoup

url = 'https://www.screener.in/company/TATAPOWER/consolidated/'
print(url)
# Send a browser-like User-Agent header with the request
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
page = urlopen(req).read()
soup = BeautifulSoup(page, 'html.parser')
# Take the first table that carries this class string
table = soup.find_all("table", {"class": "data-table responsive-text-nowrap"})[0]
df = pd.read_html(StringIO(str(table)))[0]
df

The code above works, but I am not able to pull the additional (hidden) rows.

Can somebody help me with this?

Asked Feb 15 at 18:01 by Data-7scientist, edited Feb 15 at 19:52 by HedgeHog

Many sites are driven by JS, and the HTML is not generated until JS events are triggered and add elements to the DOM. You can't parse HTML that isn't there, so in such cases you may need a different kind of library, such as a web scraping or web automation library. There are many; as an example, in a short course on web development I took we were introduced to Splinter, a Python framework that I believe is based on Selenium. You can still use your automation code to "click" the right button and then use BS to parse the new HTML. – topsail Commented Feb 15 at 18:07

1 Answer

As already noted in the comment, the additional rows are loaded on demand. However, the very requests that load them can be replicated to obtain that content as well.

To do this, iterate over the rows of the table and, for every row that has a + button, make the corresponding request.
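
The request in question is a plain GET against a schedules endpoint, as the URL in the code below shows. Here is a minimal sketch of assembling that URL with the standard library so that the parent name is encoded properly. The helper name `schedule_url` is made up for illustration, and `3371` is simply the company id that appears in this answer; other companies will have other ids, which this sketch does not look up.

```python
from urllib.parse import urlencode

API_ROOT = "https://www.screener.in/api/company"


def schedule_url(company_id, parent, section="quarters", consolidated=""):
    """Build the URL for the sub-items of one expandable row.

    The parameter names mirror the query string used in this answer;
    the function itself is a hypothetical helper, not part of any API.
    """
    query = urlencode(
        {"parent": parent, "section": section, "consolidated": consolidated}
    )
    return f"{API_ROOT}/{company_id}/schedules/?{query}"


# 3371 is the id hard-coded in this answer; "Other Income" is one of the
# expandable rows of the table.
url = schedule_url(3371, "Other Income")
print(url)
```

The space in "Other Income" is encoded as `+` here, which is equivalent in a query string to the `%20` that `requests` produces when it is handed the raw URL.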

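import requests
import pandas as pd
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0'}
url = 'https://www.screener.in/company/TATAPOWER/consolidated/'
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(requests.get(url, headers=headers).text, 'html.parser')

# Column names: 'Item' plus the quarter labels from the table header
keys = ['Item'] + list(soup.select_one('#quarters thead tr').stripped_strings)

data = []

# Skip the last row of the table body
for row in soup.select('#quarters tbody tr')[:-1]:
    if row.td.button:
        # Expandable row: keep the parent row itself ...
        data.append(dict(zip(keys, [c.text for c in row.select('td')])))
        # ... then request its sub-items (3371 is the company id hard-coded for this page)
        parent = row.td.button.text.strip(" +")
        d = requests.get(f'https://www.screener.in/api/company/3371/schedules/?parent={parent}&section=quarters&consolidated=', headers=headers).json()
        # Only the first entry of the response is added to the result
        first_key = next(iter(d))
        data.append({"Item": first_key, **d[first_key]})
    else:
        data.append(dict(zip(keys, row.stripped_strings)))

pd.DataFrame(data)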

Result:

Item	Dec 2021	Mar 2022	Jun 2022	Sep 2022	Dec 2022	Mar 2023	Jun 2023	Sep 2023	Dec 2023	Mar 2024	Jun 2024	Sep 2024	Dec 2024
Sales +	10,913	11,960	14,495	14,031	14,129	12,454	15,213	15,738	14,651	15,847	17,294	15,698	15,391
YOY Sales Growth %	43.63%	15.41%	43.06%	43.02%	29.47%	4.13%	4.95%	12.17%	3.69%	27.24%	13.67%	-0.26%	5.05%
Expenses +	9,279	10,091	12,812	12,270	11,810	10,526	12,500	12,967	12,234	13,540	14,232	12,427	12,312
Material Cost %	8.67%	13.38%	6.74%	4.04%	6.55%	12.13%	6.00%	6.09%	9.29%	13.86%	5.50%	3.59%	6.75%
Operating Profit	1,634	1,869	1,683	1,760	2,319	1,928	2,713	2,771	2,417	2,307	3,062	3,271	3,079
OPM %	15%	16%	12%	13%	16%	15%	18%	18%	16%	15%	18%	21%	20%
Other Income +	865	62	1,227	1,502	1,497	1,352	877	567	1,092	1,407	578	632	589
Exceptional items	0	-618	0	0	0	0	235	0	0	39	0	-140	0
Interest	953	1,015	1,026	1,052	1,098	1,196	1,221	1,182	1,094	1,136	1,176	1,143	1,170
Depreciation	758	846	822	838	853	926	893	926	926	1,041	973	987	1,041
Profit before tax	788	71	1,062	1,373	1,864	1,158	1,476	1,231	1,489	1,537	1,490	1,773	1,457
Tax %	30%	-794%	17%	32%	44%	19%	23%	17%	28%	32%	20%	38%	18%
Net Profit +	552	632	884	935	1,052	939	1,141	1,017	1,076	1,046	1,189	1,093	1,188
Profit after tax	552	632	884	935	1,052	939	1,141	1,017	1,076	1,046	1,189	1,093	1,188
EPS in Rs	1.33	1.57	2.49	2.56	2.96	2.43	3.04	2.74	2.98	2.80	3.04	2.90	3.23
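
The loop above adds only the first entry of each JSON response (`next(iter(d))`), which is why every expandable row in the result is followed by a single sub-row. If a response holds several sub-items, a variant that keeps all of them could look like the sketch below. The helper name `flatten_schedule` and the second sub-item in the sample are hypothetical. The response shape (sub-item name mapped to a dict of quarter to value) is inferred from how the code above reads `d`.

```python
def flatten_schedule(parent_row, schedule):
    """Return the parent row followed by one row per sub-item in `schedule`.

    `schedule` is assumed to map each sub-item name to a dict of
    {quarter label: value}, the shape the answer's code relies on.
    """
    rows = [parent_row]
    for name, values in schedule.items():
        rows.append({"Item": name, **values})
    return rows


# Sample input; the second sub-item is a placeholder, not real data.
parent = {"Item": "Expenses +", "Dec 2024": "12,312"}
schedule = {
    "Material Cost %": {"Dec 2024": "6.75%"},
    "Hypothetical Cost %": {"Dec 2024": "n/a"},
}

rows = flatten_schedule(parent, schedule)
for r in rows:
    print(r)
```

In the loop, `data.extend(flatten_schedule(parent_row, d))` would then take the place of the two `data.append(...)` calls in the expandable branch, with `parent_row` being the dict built from the row's cells.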
