
web crawler - Problem with Rust web crawling getting blocked and crashing - Stack Overflow


I am experimenting with web crawling in Rust using thirtyfour, but I ran into an issue: a Google CAPTCHA blocks my code from getting the results I want.

use thirtyfour::prelude::*;
use thirtyfour::Key;
use tokio::time::{sleep, Duration};

#[tokio::main]
async fn main() -> WebDriverResult<()> {
    let mut capa = DesiredCapabilities::chrome();
    //capa.add_arg("--user-data-dir=C:/Users/name/AppData/Local/Google/Chrome/User Data")?;
    //capa.add_arg("--profile-directory=Default")?;
    // "name" is a placeholder for my actual username; these two lines cause the error
    let driver = WebDriver::new("http://localhost:63867", capa).await?; // WebDriver port
    // The URL was lost from the original post; Google's home page is assumed from the text.
    driver.goto("https://www.google.com").await?;

    // Give the page time to load.
    sleep(Duration::from_secs(3)).await;

    // Type the query, pause, then submit it.
    let search_box = driver.find(By::Name("q")).await?;
    search_box.send_keys("rust").await?;
    sleep(Duration::from_secs(2)).await;
    search_box.send_keys(Key::Enter).await?;

    // Wait for the results page.
    sleep(Duration::from_secs(5)).await;

    let page_source = driver.source().await?;
    println!("{}", page_source);

    driver.quit().await?;
    Ok(())
}

This is my code. It is supposed to grab the search results for "rust", but it returns this error:

Error: WebDriverError(SessionNotCreated(WebDriverErrorInfo { status: 500, error: "", value: WebDriverErrorValue { message: "session not created: Chrome failed to start: crashed.\n (chrome not reachable)\n (The process started from chrome location C:\Program Files\Google\Chrome\Application\chrome.exe is no longer running, so ChromeDriver is assuming that Chrome has crashed.)", error: Some("session not created"), stacktrace: Some("\tGetHandleVerifier [0x00007FF71C4D6F15+28773]\n\t(No symbol) [0x00007FF71C442600]\n\t(No symbol) [0x00007FF71C2D8FAA]\n\t(No symbol) [0x00007FF71C3157F4]\n\t(No symbol) [0x00007FF71C3115A6]\n\t(No symbol) [0x00007FF71C36547B]\n\t(No symbol) [0x00007FF71C364A50]\n\t(No symbol) [0x00007FF71C357023]\n\t(No symbol) [0x00007FF71C31FF5E]\n\t(No symbol) [0x00007FF71C3211E3]\n\tGetHandleVerifier [0x00007FF71C82425D+3490733]\n\tGetHandleVerifier [0x00007FF71C83BA43+3586963]\n\tGetHandleVerifier [0x00007FF71C83147D+3544525]\n\tGetHandleVerifier [0x00007FF71C59C9DA+838442]\n\t(No symbol) [0x00007FF71C44D04F]\n\t(No symbol) [0x00007FF71C449614]\n\t(No symbol) [0x00007FF71C4497B6]\n\t(No symbol) [0x00007FF71C438CE9]\n\tBaseThreadInitThunk [0x00007FF9C133E8D7+23]\n\tRtlUserThreadStart [0x00007FF9C2D5BF2C+44]\n"), data: None } }))
error: process didn't exit successfully: target\debug\rust-crawl.exe (exit code: 1)

The error above appears when I include the two commented-out lines. Without them the program runs, but all it returns is the CAPTCHA page. I tried other ways to keep Google from flagging it as a bot, such as adding delays between actions and simulating scrolling, but none of them worked. Please help.
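
For reference, here is a minimal sketch of how the two profile switches could be built for a separate scratch folder instead of the regular "User Data" folder. It rests on an assumption that the error text does not confirm: that Chrome crashes at startup because an ordinary Chrome window is already using that profile folder. The helper names `chrome_profile_args` and `scratch_user_data_dir` and the folder name "crawler-profile" are made up for illustration. This only concerns the crash, not the CAPTCHA.

```rust
use std::path::{Path, PathBuf};

/// Builds the two Chrome switches used in the question.
/// `user_data_dir` is the folder Chrome keeps its profiles in;
/// `profile` is the sub-folder name inside it (for example "Default").
fn chrome_profile_args(user_data_dir: &Path, profile: &str) -> Vec<String> {
    vec![
        format!("--user-data-dir={}", user_data_dir.display()),
        format!("--profile-directory={}", profile),
    ]
}

/// Hypothetical helper: a scratch profile folder under the system temp
/// directory, so the automated Chrome does not share a profile folder
/// with a Chrome window that is already open (the assumed crash cause).
fn scratch_user_data_dir(name: &str) -> PathBuf {
    std::env::temp_dir().join(name)
}
```

Each returned string can then be handed to the capabilities the same way the commented-out lines do, for example `for arg in chrome_profile_args(&scratch_user_data_dir("crawler-profile"), "Default") { capa.add_arg(&arg)?; }`.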
