
web scraping - GET request to an API endpoint comes back as 403 forbidden even with headers using httr package in R. (works in b


I obtained an API endpoint URL by inspecting a webpage. That endpoint is: ";. If I paste it into a browser, the JSON response loads just fine. However, if I make the same call from R with the same headers the browser sends:

library(httr)

url = ";

headers <- add_headers(
  `accept` = "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
  `accept-encoding` = "gzip, deflate, br, zstd",
  `accept-language` = "en-US,en;q=0.9",
  `cache-control` = "max-age=0",
  `cookie` = "_ga=GA1.1.598020237.1738380109; _cc_id=fe871723732fbf4c2401642c107a26fe; panoramaId_expiry=1742486201782; panoramaId=daa5f27b6006082f849fd5198a3b16d539383b6b209451763912ab7650a957c1; panoramaIdType=panoIndiv; cto_bundle=d8efBl9kZGJhN3FiV1NUcU1sZnNpTkx6ZmFqRm9RdWtzWHlvcGJzbHlLR1Y0d1lpS211NFV0RlNjZVFwcmVrU0ZZeU12MlRjb0hIOExZNWVIZTBubThNV0N1WHJ2NmRwdHdTOUlaRTFLcnVLTkxjTHpaU25yMnlwODJ3eGRCNjd6RWMlMkZpd0pVcm53TUhBazRlSFZKaG5HMm5PUSUzRCUzRA; FCNEC=%5B%5B%22AKsRol-wu_0XC3FpfHpetGephwnn9tq3I6kd5brworJJEdPd-xbuokPGjGGDI8A3ClXx5gkbCZkQnxp3pFBxykXMh08rWsb91bSEJC2NHmP_GKKFEaKHGOYt2jwek5_EqpbmDRYEl8LWRgZtGge3p_CecJbmr2sumQ%3D%3D%22%5D%5D; _awl=2.1741897730.5-b2f593bc3c95e1889f553be2c7879f1b-6763652d75732d6561737431-3; _ga_HNQ9P9MGZR=GS1.1.1741896640.5.1.1741898812.60.0.0",
  `if-none-match` = "\"8192fa0e83\"",
  `priority` = "u=0, i",
  `sec-ch-ua` = "\"Chromium\";v=\"134\", \"Not:A-Brand\";v=\"24\", \"Google Chrome\";v=\"134\"",
  `sec-ch-ua-mobile` = "?0",
  `sec-ch-ua-platform` = "\"Windows\"",
  `sec-fetch-dest` = "document",
  `sec-fetch-mode` = "navigate",
  `sec-fetch-site` = "none",
  `sec-fetch-user` = "?1",
  `upgrade-insecure-requests` = "1",
  `user-agent` = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36"
)


response = GET(url, headers)

response$status_code

I get a 403 Forbidden error. I've also tried removing or simplifying various headers, but I can't figure out how or why the server blocks the GET request from R but not the one from the browser.
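
For reference, one of the simplified-header attempts could look roughly like the sketch below. It is only an illustration: the reduced header set is an assumption about what "simplified" means, and `fetch_diagnostics` is a hypothetical helper name, not part of httr. It also collects the `server` response header and the start of the body, which may hint at what is issuing the 403.

```r
library(httr)

# Reduced header set (an assumption): cookies, cache validators and
# sec-* client hints from the browser request are dropped.
simple_headers <- add_headers(
  accept = "application/json",
  `accept-language` = "en-US,en;q=0.9",
  `user-agent` = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36"
)

# Hypothetical helper: send the GET request and gather details that are
# useful when diagnosing a blocked response.
fetch_diagnostics <- function(url, config = simple_headers) {
  response <- GET(url, config)
  body_text <- content(response, as = "text", encoding = "UTF-8")
  list(
    status = status_code(response),       # e.g. 403 when blocked
    server = headers(response)$server,    # NULL if the header is absent
    body_start = substr(body_text, 1, 300) # first characters of the body
  )
}
```

Calling `fetch_diagnostics(url)` with the endpoint from the code above returns the status code together with the extra details, so the blocked response can be compared against what the browser receives.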

My best guess is that the browser still behaves differently from a plain GET request sent from R, but I don't have the experience or knowledge to understand why. I'd like to avoid spinning up my own browser with Selenium if possible. Is there a good way to tackle this problem?
