javascript - scrape Shopee API v4 - Stack Overflow

For my final project I need to collect data by scraping Shopee through its hidden (internal) API. The request works when I send it from Insomnia, but the same request returns an error when I run it as a Python script on my local machine or in Google Colab.

How can I fix this?

    import requests
    import json
    headers = {
        'sec-ch-ua-mobile': '?0',
        'cookie': ('REC_T_ID=e67c02b5-ae54-11ec-b368-46ac8e8cc9d8; SPC_F=KrJ9Ck0EYC252EWJ3FSH5QFNzjmvng6O; SPC_IA=-1; _gcl_au=1.1.459910866.1654678938; _fbp=fb.2.1654678939550.956784750; G_ENABLED_IDPS=google; SPC_CLIENTID=S3JKOUNrMEVZQzI1jkqfwanvqrwehsep; _gcl_aw=GCL.1660529943.Cj0KCQjwuuKXBhCRARIsAC-gM0g5RPYu1Cfx0PZbXHrR5qqd7JqgFEy4XrCAxXEGFD4quU2tORTIR9caAsVdEALw_wcB; _gac_UA-61904553-8=1.1660529949.Cj0KCQjwuuKXBhCRARIsAC-gM0g5RPYu1Cfx0PZbXHrR5qqd7JqgFEy4XrCAxXEGFD4quU2tORTIR9caAsVdEALw_wcB; _med=refer; _gid=GA1.3.792417909.1660891119; csrftoken=Hk3UgpYhG30zu0CO9Vhk2OIKptWNBS0g; _QPWSDCXHZQA=9be12e07-9c49-426e-e0d8-01a11f73956b; AMP_TOKEN=%24NOT_FOUND; __LOCALE__null=ID; _dc_gtm_UA-61904553-8=1; SPC_T_ID="uXbSXytLbRMSr+KtQpRkW7f5FHiriPO+CdAryBv6THa5ljtJhfxKSiI5g2Ps2Fl4eILJBWgkAYeR+c0hO4843b12KCXHt56jNWASfgA5Uq8="; SPC_U=616200160; SPC_T_IV="eB95as87FjhL8HoasAA0kw=="; _ga_KK6LLGGZNQ=GS1.1.1660961596.8.0.1660961596.0.0.0; SPC_R_T_ID=uXbSXytLbRMSr+KtQpRkW7f5FHiriPO+CdAryBv6THa5ljtJhfxKSiI5g2Ps2Fl4eILJBWgkAYeR+c0hO4843b12KCXHt56jNWASfgA5Uq8=; SPC_R_T_IV=eB95as87FjhL8HoasAA0kw==; SPC_T_ID=uXbSXytLbRMSr+KtQpRkW7f5FHiriPO+CdAryBv6THa5ljtJhfxKSiI5g2Ps2Fl4eILJBWgkAYeR+c0hO4843b12KCXHt56jNWASfgA5Uq8=; SPC_T_IV=eB95as87FjhL8HoasAA0kw==; SPC_SI=id+yYgAAAABBUWdBaGJJRaEQWwAAAAAAbDFtUDZZZ2k=; SPC_ST=.aVZDcFoyVjBuUWIwUXVSUnkGCGuGI58EkFOzdykhsuSCGz0GrBWotkUiREvJO38YxTxyl3Pgbl73NUs1AmCexDhPneO/ABd8bgUkVqlhCvZTNPDPg8jv/9KaHwWagKm9FM55IY61eECu5twdRUQl9u3xgfshk26TRkvpli4dlCUZzIE0boMi5/5B/CcqUgoXsDH567+KunuKEe92wUSC1Q==; _ga=GA1.3.1352849021.1654678939; cto_bundle=ZLv7oF9EWUpOZWVHYUkyUHh0d2RBWDJvTWk5eXllWVpia1F1eXJ4RkdZcjhVZ1Q3NVRYYnE0c0hOWERsMm1tTjFER09MbmdMTW1VZG5VbkQ0MjByVnpxYlNRdk1MRk9TUGtNSzZpRzRnZFNXU1ZUVlElMkY1dXRpbGFUZm5vdjdvcklFQzk0YzBuVm1qMUJzZnRyb2xPMHpRMldVQSUzRCUzRA; _ga_SW6D8G0HXK=GS1.1.1660959836.35.1.1660961617.22.0.0; '
                   'SPC_EC=U1A1Vk5JdzlVaFVYdjJRUk4fyVPKEHSso64GpvFSCO/oihfsUpaQrXO9e4XqPT/AjNQJP7hcW+o+A7chna6AIbCtFRsocFdW1x1oS3A8+pNHmK3oRTDCZe2BDyAP0cOp133wiyu0GTSCetXIhbIRwvkOTJYqOXYBGKuTW6tGY1o=; shopee_webUnique_ccd=veSMI3XpR84mDT6rWJgoWg%3D%3D%7C9xD6GCFDkurxx4Cxf%2F72oK7%2FP2ilXgSYBkzRAd4F%2BSkKrCsqCWGVzz0SHGMINBr5KgoTxt7LXhBKejCILMQlWRcetFY%3D%7ClXsfMcnYECC51PEy%7C05%7C3'),
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36',
        'referer':'https://shopee.co.id/Perawatan-Kecantikan-cat.11043145?page=0&ratingFilter=4',
          }
    x = 0                 # zero-based page index
    number_page = x * 60  # offset sent as "newest" (60 items per page)
    url = ("https://shopee.co.id/api/v4/search/search_items?by=relevancy&limit=60&match_id=11043145"
           "&newest={}&order=desc&page_type=search&rating_filter=4&scenario=PAGE_CATEGORY&version=2").format(number_page)
    y = requests.get(url, headers=headers).json()
    y

Output:

{'tracking_id': '745a6f4b-0fc3-48af-b563-5a7ec483a601',
 'action_type': 2,
 'error': 90309999}
asked Aug 20, 2022 at 4:52 by msa.stat; edited Nov 6, 2022 at 4:15 by cigien
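
The paging arithmetic in the question (the "newest" offset is the page index times 60) and the error payload shown in the output can be handled a little more defensively. The following is a minimal sketch, not Shopee's documented behaviour: the helper names build_search_url and raise_on_error are made up for illustration, and it assumes that any payload carrying a non-zero error field should be treated as a rejected request.

```python
from urllib.parse import urlencode

BASE_URL = "https://shopee.co.id/api/v4/search/search_items"
PAGE_SIZE = 60  # same limit as in the question


def build_search_url(page, match_id=11043145):
    """Build the search URL for a zero-based page index (hypothetical helper)."""
    params = {
        "by": "relevancy",
        "limit": PAGE_SIZE,
        "match_id": match_id,
        "newest": page * PAGE_SIZE,  # offset of the first item on this page
        "order": "desc",
        "page_type": "search",
        "rating_filter": 4,
        "scenario": "PAGE_CATEGORY",
        "version": 2,
    }
    return BASE_URL + "?" + urlencode(params)


def raise_on_error(payload):
    """Return the payload unchanged, or raise if it carries a non-zero 'error'.

    Assumption: a non-zero 'error' field means the request was rejected.
    """
    error = payload.get("error")
    if error:
        raise RuntimeError(
            f"API returned error {error} (tracking_id={payload.get('tracking_id')})"
        )
    return payload
```

Wrapping the decoded JSON in raise_on_error makes a rejected request fail loudly instead of passing an error dictionary on to later parsing code.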
  • The cookie you're sending includes embedded identity information. When you switch machines/browsers, the identity information doesn't match any more. Can't you just start a new session each time and log in again? – Tim Roberts Commented Aug 20, 2022 at 4:59
  • So the cookies from Chrome or from the API testing app can't be reused when I make the GET request in Python? Do I have to obtain the cookies on the machine I will actually use? – msa.stat Commented Aug 21, 2022 at 7:17
  • Essentially, yes. Every HTTP request stands completely alone. When you connect to a web site for the first time, they hand you back cookies. You are expected to provide those cookies when you call them again. That's how they know it was you. Chrome stores the cookies it gets in its cache. You need to do the same. – Tim Roberts Commented Aug 21, 2022 at 23:35
  • Thank you very much for the answer and this useful information; I learned something new from this problem. I hope you won't mind if I have other questions about it. I'm still testing according to your suggestions. – msa.stat Commented Aug 23, 2022 at 2:16
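
The point made in these comments, that the client making the request has to collect and replay its own cookies, can be sketched with requests.Session, which stores the cookies a server sets and sends them back on later requests. This is only an illustration of cookie handling: the helper names open_session and fetch_json are hypothetical, and it is an assumption that visiting the category page first yields cookies the API will accept.

```python
import requests

# Category page taken from the referer header in the question.
CATEGORY_URL = ("https://shopee.co.id/Perawatan-Kecantikan-cat.11043145"
                "?page=0&ratingFilter=4")
USER_AGENT = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36")


def open_session(user_agent=USER_AGENT):
    """Return a Session that keeps cookies between requests (hypothetical helper)."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent, "referer": CATEGORY_URL})
    return session


def fetch_json(session, url):
    """Visit the category page to collect cookies, then call the API with them."""
    session.get(CATEGORY_URL, timeout=30)     # cookies set here are stored in session.cookies
    response = session.get(url, timeout=30)   # the stored cookies are sent back automatically
    response.raise_for_status()
    return response.json()
```

Compared with pasting a cookie string copied from Chrome, the session obtains cookies on the machine that actually sends the request, which is what the comments recommend.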

1 Answer

You can add a request header with the key af-ac-enc-dat and the value null. This works for me.

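    // Requires: using System.IO; using System.Net; and the Newtonsoft.Json package.
    // apiURL holds the search_items URL from the question.
    var request = WebRequest.Create(apiURL);
    request.Headers["x-api-source"] = "pc";
    request.Headers["af-ac-enc-dat"] = "null";  // sent as the literal string "null"
    string s_ResponseString;
    using (var response = request.GetResponse())
    using (var reader = new StreamReader(response.GetResponseStream()))
    {
        s_ResponseString = reader.ReadToEnd();  // read the whole JSON body
    }
    dynamic Prodata = Newtonsoft.Json.Linq.JObject.Parse(s_ResponseString);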
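
Since the question uses Python, the same idea can be expressed with requests. This is a sketch translated from the C# snippet above, not something the answer's author posted: the helper names add_answer_headers and fetch_search_page are made up, and whether these two headers are still sufficient depends on Shopee's current checks.

```python
import requests

# Search endpoint from the question, first page (newest=0).
API_URL = ("https://shopee.co.id/api/v4/search/search_items?by=relevancy&limit=60"
           "&match_id=11043145&newest=0&order=desc&page_type=search"
           "&rating_filter=4&scenario=PAGE_CATEGORY&version=2")


def add_answer_headers(headers):
    """Return a copy of headers with the two headers from the answer added."""
    merged = dict(headers)
    merged["x-api-source"] = "pc"
    merged["af-ac-enc-dat"] = "null"  # the literal string "null", as in the C# code
    return merged


def fetch_search_page(url=API_URL, headers=None):
    """Send the GET request with the extra headers and decode the JSON body."""
    response = requests.get(url, headers=add_answer_headers(headers or {}), timeout=30)
    response.raise_for_status()
    return response.json()
```

Passing the headers dictionary from the question as the headers argument keeps the original User-Agent and referer while adding the two new entries.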