Using R to add field to online form and scrape resulting javascript created table - Stack Overflow

I am trying to get R to complete the 'Search by postcode' field on this webpage http://cti.voa.gov.uk/cti/ with predefined text (e.g. BN1 1NA), advance to the next page and scrape the resulting 4 column table, which, depending on the postcode, can span multiple pages. To make it more complex, the 'Improvement indicator' is not a text field but an image file (as seen if you search with postcode BN1 3HP). I would prefer this column to contain either a 0 or a 1, depending on whether the image is present.

Ultimately I am after a nice data frame that mirrors the 4 columns on screen.

I have tried to modify the suggestions from this question to do what I have described above with no luck, and to be honest I am out of my depth trying to decipher this one.

I realise R may not be the most suited for what I need to do, but it's all I have available to me. Any help would be greatly appreciated.

asked Jul 8, 2015 at 14:56 by Chris, edited May 23, 2017 at 12:25
  • I have tried to use look<-getHTMLFormDescription("http://cti.voa.gov.uk/cti/") ; look<-look[[1]]; look(txtPostCode="W2 4RH"); but this gives me "Error: Not Found". – dax90 Commented Jul 11, 2015 at 1:20

2 Answers

I'm not sure what the T&C of the VOA website have to say about scraping, but this code will do the job:

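library("httr")
library("rvest")

# Postcode (or partial postcode) to search for
post_code <- "B1 1"

# Submit the search form directly; the fields mirror the browser's own POST request
resp <- POST("http://cti.voa.gov.uk/cti/InitS.asp?lcn=0",
             encode = "form",
             body = list(btnPush = 1,
                         txtPageNum = 0,
                         txtPostCode = post_code,
                         txtRedirectTo = "InitS.asp",
                         txtStartKey = 0))

# Parse the returned HTML and extract the results table as a data frame
resp_cont <- read_html(resp)
council_table <- resp_cont %>%
  html_node(".scl_complex table") %>%
  html_table()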

Firebug has an excellent 'Net' panel where the POST headers can be seen. Most modern browsers also have something similar built in.
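
For the 'Improvement indicator' column asked about in the question, html_table() keeps only the cell text, so the image is lost. Below is a minimal sketch of one way to recover it as 0/1. It assumes the indicator is the third column and is rendered as an <img> inside the cell; the HTML fragment and its column names are made up for illustration and are not the site's actual markup, so the selectors will likely need adjusting:

```r
library(rvest)

# Hypothetical stand-in for the results table; the real page may differ.
sample_html <- '
<table>
  <tr><th>Address</th><th>Band</th><th>Improvement indicator</th><th>Reference</th></tr>
  <tr><td>1 EXAMPLE ROAD</td><td>A</td><td></td><td>111</td></tr>
  <tr><td>2 EXAMPLE ROAD</td><td>B</td><td><img src="tick.gif" alt="yes"></td><td>222</td></tr>
</table>'

page <- read_html(sample_html)

# All table rows, minus the header row
rows <- html_nodes(page, "table tr")
rows <- rows[-1]

# For each row, check whether the third cell (assumed position) holds an <img>
has_image <- vapply(rows, function(row) {
  cells <- html_nodes(row, "td")
  length(html_nodes(cells[[3]], "img")) > 0
}, logical(1))

# 0/1 vector, one entry per data row, ready to overwrite the indicator column
improvement_indicator <- as.integer(has_image)
```

With a table already read by html_table(), the resulting vector could then replace the third column, provided both were taken from the same page so the row counts match.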

I used RSelenium to scrape the council tax list for an Exeter postcode:

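library(RSelenium)
library(XML)  # provides htmlParse(), xpathSApply() and readHTMLTable()

input <- "EX4 2NU"
appURL <- "http://cti.voa.gov.uk/cti/"

# Start a local Selenium server and open a browser session.
# Note: startServer() is defunct in newer RSelenium releases, where rsDriver() replaces it.
RSelenium::startServer()
remDr <- remoteDriver()
remDr$open()
Sys.sleep(5)  # give the browser time to start

# Type the postcode into the search box and submit it with Enter
remDr$navigate(appURL)
search.form <- remDr$findElement(using = "xpath", "//*[@id='txtPostCode']")
search.form$sendKeysToElement(list(input, key = "enter"))

# Read the table on the first results page
doc <- remDr$getPageSource()
tbl <- xpathSApply(htmlParse(doc[[1]]), "//tbody")
temp1 <- readHTMLTable(tbl[[1]], header = FALSE)

# While a "next" link exists, click it and append that page's rows
v <- length(xpathSApply(htmlParse(doc[[1]]), '//a[@class="next"]'))
while (v != 0) {
    nextpage <- remDr$findElement(using = "xpath", "//*[@class = 'next']")
    nextpage$clickElement()
    doc <- remDr$getPageSource()
    tbl <- xpathSApply(htmlParse(doc[[1]]), "//tbody")
    temp2 <- readHTMLTable(tbl[[1]], header = FALSE)
    temp1 <- rbind(temp1, temp2)
    v <- length(xpathSApply(htmlParse(doc[[1]]), '//a[@class="next"]'))
}
finaltable <- temp1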

Hope you find it helpful. With this approach you can scrape results that are spread over multiple pages.
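
As a side note on the loop above: calling rbind() on every iteration copies the growing table each time. A common alternative is to collect each page's table in a list and bind once at the end. The sketch below shows only that pattern, using made-up data frames in place of the tables returned by readHTMLTable(); the addresses and bands are illustrative, not real results:

```r
# Hypothetical stand-ins for the tables read from successive result pages
page_tables <- list(
  data.frame(V1 = c("1 EXAMPLE ROAD", "2 EXAMPLE ROAD"),
             V2 = c("A", "B"),
             stringsAsFactors = FALSE),
  data.frame(V1 = "3 EXAMPLE ROAD",
             V2 = "C",
             stringsAsFactors = FALSE)
)

# Inside a paging loop, each new page would be appended with
#   page_tables[[length(page_tables) + 1]] <- temp2
# and the pages combined in a single step afterwards:
finaltable <- do.call(rbind, page_tables)
```

For the handful of pages a single postcode returns, the difference is negligible; it matters more when looping over many postcodes.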
