R - Build data frame from multiple rvest elements


I am trying to scrape journal article metadata (title, authors, abstract, etc.) from the web. I have a list of pages that I need to navigate, and each page has the information I need (except for the table of contents pages in the list). I built a function that pulls each piece of information from a page in the list, and I'm trying to go through each page and end up with a data frame of the results.

Here is what I have:

    library(rvest)

    article.links <- c("http://onlinelibrary.wiley.com/doi/10.1002/jee.20116/abstract",
                       "http://onlinelibrary.wiley.com/doi/10.1002/jee.20120/abstract",
                       "http://onlinelibrary.wiley.com/doi/10.1002/jee.20117/abstract")

    pager <- function(page) {
      new.row = vector("list", 4)
      page <- read_html(page)

      # doi
      new.row[1] <- page %>%
        html_node("#doi") %>%
        html_text()

      # title
      new.row[2] <- page %>%
        html_node(".maintitle") %>%
        html_text()

      # authors
      new.row[3] <- page %>%
        html_node("#authors") %>%
        html_text()

      # abstract
      new.row[4] <- page %>%
        html_node("#abstract") %>%
        html_text()

      return(unlist(new.row))
    }

When I run pager.test(article.links.test[1]), I get the results I expect for one entry. I'm not quite sure how to build a data frame from the series of results, though. I tried a loop with rbind to put the rows together, but when I try it on all of the rows it throws errors as the entries are generated:

    # this doesn't seem to work
    abstracts <- data.frame()
    for(key in 1:length(article.links.test)) {
      abstracts <- rbind(abstracts2, pager.test(article.links.test[key]))
    }

How can I scrape the elements from each of the pages in the list and combine the results into a data frame?

You can use lapply to build one single-row data frame per page and then rbind the rows together:

    # keep text columns as character rather than factor
    options(stringsAsFactors = FALSE)
    library(rvest)

    article.links <- c("http://onlinelibrary.wiley.com/doi/10.1002/jee.20116/abstract",
                       "http://onlinelibrary.wiley.com/doi/10.1002/jee.20120/abstract",
                       "http://onlinelibrary.wiley.com/doi/10.1002/jee.20117/abstract")

    # return a one-row data frame for a single page
    pager <- function(page) {
      doc <- read_html(url(page))
      data.frame(doi      = doc %>% html_node("#doi") %>% html_text(),
                 title    = doc %>% html_node(".maintitle") %>% html_text(),
                 authors  = doc %>% html_node("#authors") %>% html_text(),
                 abstract = doc %>% html_node("#abstract") %>% html_text())
    }

    # scrape every page, then stack the one-row data frames
    ans <- do.call(rbind, lapply(article.links, pager))
    str(ans)
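The same lapply and rbind pattern can be tried without network access, which also shows what happens on the table of contents pages mentioned in the question. The following is only a sketch: the two inline HTML strings and their contents are made-up stand-ins for the real pages, and dropping rows with a missing doi is an assumption about how those pages should be handled, not something the original answer prescribes. It relies on html_node() returning a missing node when a selector matches nothing, so that html_text() yields NA for that field.

```r
library(rvest)

# Hypothetical inline pages standing in for the real article URLs:
# one article-like page and one table-of-contents-like page with no metadata.
pages <- c(
  article = paste0(
    '<html><body>',
    '<p id="doi">10.1002/example.1</p>',
    '<h1 class="maintitle">An example title</h1>',
    '<div id="authors">A. Author</div>',
    '<div id="abstract">An example abstract.</div>',
    '</body></html>'
  ),
  contents = '<html><body><h1>Table of contents</h1></body></html>'
)

# Same shape as the pager() function above, but reading literal HTML.
pager <- function(page) {
  doc <- read_html(page)
  data.frame(
    doi      = doc %>% html_node("#doi") %>% html_text(),
    title    = doc %>% html_node(".maintitle") %>% html_text(),
    authors  = doc %>% html_node("#authors") %>% html_text(),
    abstract = doc %>% html_node("#abstract") %>% html_text(),
    stringsAsFactors = FALSE
  )
}

# One single-row data frame per page, stacked in a single rbind call.
all_rows <- do.call(rbind, lapply(pages, pager))

# Assumption: a page without a doi is not an article, so drop its row.
ans <- all_rows[!is.na(all_rows$doi), , drop = FALSE]
str(ans)
```

Building the list first and binding once avoids growing a data frame inside a loop, and because every call returns the same four named columns, rbind has nothing to reconcile between rows.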
