I am trying to scrape the Screener.in website to extract stock information. In the Quarterly Results section, some fields are hidden: clicking the + button next to a parent header reveals additional rows related to that header. I need this additional information as well.
I am using the Python code below, which gives me a DataFrame but without the additional information:
from io import StringIO
from urllib.request import Request, urlopen
import pandas as pd
from bs4 import BeautifulSoup
url = 'https://www.screener.in/company/TATAPOWER/consolidated/'
print(url)
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
page = urlopen(req).read()
soup = BeautifulSoup(page, 'html.parser')
table = soup.find_all("table", {"class": "data-table responsive-text-nowrap"})[0]
df = pd.read_html(StringIO(str(table)))[0]
df
The code above works, but I am not able to pull the additional information.
Can somebody help me with this?
asked Feb 15 at 18:01 by Data-7scientist, edited Feb 15 at 19:52 by HedgeHog

Comment: Many sites are driven by JS, and the HTML is not generated until JS events are triggered to add elements to the DOM. You can't parse HTML that isn't there, so in such cases you may need a different library, such as a web scraping or web automation library (there are many; as an example, a short web development course I took introduced Splinter, a Python framework I believe is based on Selenium). You could also use your automation code to "click" the right button and still use BS to parse the new HTML. – topsail, Feb 15 at 18:07
1 Answer
As already commented, the content is loaded on demand, but exactly these requests can be replicated to obtain the content as well.
To do this, iterate over the rows of the table and make the request wherever a row has an expand button.
import requests
import pandas as pd
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0'}
url = 'https://www.screener.in/company/TATAPOWER/consolidated/'
soup = BeautifulSoup(requests.get(url, headers=headers).text, 'html.parser')

# Column names: 'Item' plus the quarter labels from the table header
keys = ['Item'] + list(soup.select_one('#quarters thead tr').stripped_strings)

data = []
# The last body row is skipped
for row in soup.select('#quarters tbody tr')[:-1]:
    if row.td.button:
        # Expandable row: keep the parent row, then request its hidden schedule
        data.append(dict(zip(keys, [c.text for c in row.select('td')])))
        parent = row.td.button.text.strip(" +")
        # 3371 is the company id used in the API path (hard-coded in this example)
        d = requests.get(f'https://www.screener.in/api/company/3371/schedules/?parent={parent}&section=quarters&consolidated=', headers=headers).json()
        # Only the first sub-item of the response is appended
        first_key = next(iter(d))
        data.append({"Item": first_key, **d[first_key]})
    else:
        data.append(dict(zip(keys, row.stripped_strings)))

pd.DataFrame(data)
Result:
Item | Dec 2021 | Mar 2022 | Jun 2022 | Sep 2022 | Dec 2022 | Mar 2023 | Jun 2023 | Sep 2023 | Dec 2023 | Mar 2024 | Jun 2024 | Sep 2024 | Dec 2024 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Sales + | 10,913 | 11,960 | 14,495 | 14,031 | 14,129 | 12,454 | 15,213 | 15,738 | 14,651 | 15,847 | 17,294 | 15,698 | 15,391 |
YOY Sales Growth % | 43.63% | 15.41% | 43.06% | 43.02% | 29.47% | 4.13% | 4.95% | 12.17% | 3.69% | 27.24% | 13.67% | -0.26% | 5.05% |
Expenses + | 9,279 | 10,091 | 12,812 | 12,270 | 11,810 | 10,526 | 12,500 | 12,967 | 12,234 | 13,540 | 14,232 | 12,427 | 12,312 |
Material Cost % | 8.67% | 13.38% | 6.74% | 4.04% | 6.55% | 12.13% | 6.00% | 6.09% | 9.29% | 13.86% | 5.50% | 3.59% | 6.75% |
Operating Profit | 1,634 | 1,869 | 1,683 | 1,760 | 2,319 | 1,928 | 2,713 | 2,771 | 2,417 | 2,307 | 3,062 | 3,271 | 3,079 |
OPM % | 15% | 16% | 12% | 13% | 16% | 15% | 18% | 18% | 16% | 15% | 18% | 21% | 20% |
Other Income + | 865 | 62 | 1,227 | 1,502 | 1,497 | 1,352 | 877 | 567 | 1,092 | 1,407 | 578 | 632 | 589 |
Exceptional items | 0 | -618 | 0 | 0 | 0 | 0 | 235 | 0 | 0 | 39 | 0 | -140 | 0 |
Interest | 953 | 1,015 | 1,026 | 1,052 | 1,098 | 1,196 | 1,221 | 1,182 | 1,094 | 1,136 | 1,176 | 1,143 | 1,170 |
Depreciation | 758 | 846 | 822 | 838 | 853 | 926 | 893 | 926 | 926 | 1,041 | 973 | 987 | 1,041 |
Profit before tax | 788 | 71 | 1,062 | 1,373 | 1,864 | 1,158 | 1,476 | 1,231 | 1,489 | 1,537 | 1,490 | 1,773 | 1,457 |
Tax % | 30% | -794% | 17% | 32% | 44% | 19% | 23% | 17% | 28% | 32% | 20% | 38% | 18% |
Net Profit + | 552 | 632 | 884 | 935 | 1,052 | 939 | 1,141 | 1,017 | 1,076 | 1,046 | 1,189 | 1,093 | 1,188 |
Profit after tax | 552 | 632 | 884 | 935 | 1,052 | 939 | 1,141 | 1,017 | 1,076 | 1,046 | 1,189 | 1,093 | 1,188 |
EPS in Rs | 1.33 | 1.57 | 2.49 | 2.56 | 2.96 | 2.43 | 3.04 | 2.74 | 2.98 | 2.80 | 3.04 | 2.90 | 3.23 |
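
The answer's loop appends only the first entry (`first_key`) of each schedule response, which is why each expandable row in the result is followed by a single sub-item. If every sub-item is wanted, a minimal sketch could look like the following. It assumes the response has the shape the answer's code implies (a dict mapping each sub-item name to a dict of quarter to value). The helper names `schedule_url` and `expand_schedule` are hypothetical, and the second entry in the sample response is made up purely for illustration; only the "YOY Sales Growth %" values come from the result table.

```python
from urllib.parse import urlencode


def schedule_url(company_id, parent, section="quarters", consolidated=""):
    """Build the schedules URL used in the answer (hypothetical helper).

    urlencode takes care of escaping, e.g. spaces in names such as
    "Other Income", and keeps the "&section=" separator explicit.
    """
    query = urlencode(
        {"parent": parent, "section": section, "consolidated": consolidated}
    )
    return f"https://www.screener.in/api/company/{company_id}/schedules/?{query}"


def expand_schedule(parent_row, schedule):
    """Return the parent row followed by one row per sub-item.

    Assumes `schedule` maps sub-item name -> {quarter: value}, as the
    answer's use of d[first_key] suggests.
    """
    rows = [parent_row]
    for name, values in schedule.items():
        rows.append({"Item": name, **values})
    return rows


# Sample data: the first sub-item is taken from the result table above,
# the second one is a made-up placeholder.
parent_row = {"Item": "Sales +", "Dec 2021": "10,913", "Mar 2022": "11,960"}
sample_schedule = {
    "YOY Sales Growth %": {"Dec 2021": "43.63%", "Mar 2022": "15.41%"},
    "Hypothetical sub-item": {"Dec 2021": "1", "Mar 2022": "2"},
}

rows = expand_schedule(parent_row, sample_schedule)
for r in rows:
    print(r)
```

In the answer's loop, `data.extend(expand_schedule(parent_row, d))` could then stand in for the two `data.append(...)` calls of the expandable branch, and `pd.DataFrame(data)` would work unchanged.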