Add new column if range of columns contains string in R
I have the dataframe below. I want to add 2 columns:
containsanz: indicates if any of columns f0 to f3 contain 'australia' or 'new zealand', ignoring NA values
allanz: indicates if all non-NA columns contain 'australia' or 'new zealand'
The starting dataframe would be:
df
  col.a col.b col.c            f0            f1            f2        f3
1  data     0   xxx     australia     singapore          <NA>      <NA>
2  data     1   yyy united states united states united states      <NA>
3  data     0   zzz     australia     australia     australia australia
4  data     0   ooo     hong kong        london     australia      <NA>
5  data     1   xxx   new zealand          <NA>          <NA>      <NA>
The end result should look like this:
df
  col.a col.b col.c            f0            f1            f2        f3 containsanz      allanz
1  data     0   xxx     australia     singapore          <NA>      <NA>   australia   undefined
2  data     1   yyy united states united states united states      <NA>   undefined   undefined
3  data     0   zzz     australia     australia     australia australia   australia   australia
4  data     0   ooo     hong kong        london     australia      <NA>   australia   undefined
5  data     1   xxx   new zealand          <NA>          <NA>      <NA> new zealand new zealand
I'm using dplyr (preferred solution), and have come up with this code, which doesn't work and is repetitive. Is there a better way to write this without having to copy the f0|f1|f2... rules over again? My real data set has more columns. Are the NAs interfering with the code?
df <- df %>%
  mutate(anzflag = ifelse(
    f0 == 'australia' | f1 == 'australia' | f2 == 'australia' | f3 == 'australia',
    'australia',
    ifelse(
      f0 == 'new zealand' | f1 == 'new zealand' | f2 == 'new zealand' | f3 == 'new zealand',
      'new zealand',
      'undefined'
    )
  ))
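On the NA question: a small base-R sketch of how `==` and `|` treat NA, which is likely why the chained comparison misbehaves. The variable names below are illustrative only and do not come from the post. A comparison with NA yields NA, `FALSE | NA` stays NA, and `ifelse()` passes that NA through instead of falling to the 'undefined' branch. Rows with no 'australia' and at least one NA (rows 2 and 5 in the sample data) would hit this case.

```r
# Illustrative values mimicking one row with a missing entry
f0 <- "hong kong"
f1 <- NA_character_
f2 <- "australia"

na_cmp   <- f1 == "australia"              # NA: comparing with NA is NA, not FALSE
or_true  <- na_cmp | (f2 == "australia")   # TRUE: NA | TRUE is TRUE
or_false <- (f0 == "australia") | na_cmp   # NA: FALSE | NA is NA

# ifelse() returns NA when its test is NA, so 'undefined' is never reached
label <- ifelse(or_false, "australia", "undefined")

# %in% never returns NA, so it is one way to sidestep the problem
safe <- any(c(f0, f1) %in% "australia")    # FALSE
```

Wrapping each comparison as `!is.na(x) & x == "australia"`, or using `%in%`, keeps the test strictly TRUE/FALSE.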
I'm still typing, but I think this gets at the essence of what you're looking for:
library(dplyr)

df <- read.table(text='col.a,col.b,col.c,f0,f1,f2,f3
data,0,xxx,australia,singapore,NA,NA
data,1,yyy,"united states","united states","united states",NA
data,0,zzz,australia,australia,australia,australia
data,0,ooo,"hong kong",london,australia,NA
data,1,xxx,"new zealand",NA,NA,NA', header=TRUE, sep=",", stringsAsFactors=FALSE)

down_under <- function(x) {
  mtch <- c("australia", "new zealand")
  # pull the f* values for this row into a plain character vector
  cols <- unlist(x)[c("f0", "f1", "f2", "f3")]
  bind_cols(x, data_frame(
    # TRUE if either country appears anywhere in the row
    containsanz=any(mtch %in% cols, na.rm=TRUE),
    # TRUE if every non-NA value in the row is one of the two countries
    allanz=all(as.vector(na.omit(cols)) %in% mtch)))
}

rowwise(df) %>% do(down_under(.))
## Source: local data frame [5 x 9]
## Groups: <by row>
##
##   col.a col.b col.c            f0            f1            f2        f3 containsanz allanz
##   (chr) (int) (chr)         (chr)         (chr)         (chr)     (chr)       (lgl)  (lgl)
## 1  data     0   xxx     australia     singapore            NA        NA        TRUE  FALSE
## 2  data     1   yyy united states united states united states        NA       FALSE  FALSE
## 3  data     0   zzz     australia     australia     australia australia        TRUE   TRUE
## 4  data     0   ooo     hong kong        london     australia        NA        TRUE  FALSE
## 5  data     1   xxx   new zealand            NA            NA        NA        TRUE   TRUE
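If the character labels from the question ('australia' / 'new zealand' / 'undefined') are wanted rather than logical flags, a vectorized base-R sketch along these lines might work. It makes two assumptions that are not stated in the post: the columns of interest can be selected with the pattern `^f[0-9]+$`, and allanz means "every non-NA value in the row is the same one of the two countries". The helper names (`f_cols`, `vals`, `n_obs` and so on) are made up for illustration.

```r
# Sample data from the question, rebuilt by hand
df <- data.frame(
  col.a = "data",
  col.b = c(0L, 1L, 0L, 0L, 1L),
  col.c = c("xxx", "yyy", "zzz", "ooo", "xxx"),
  f0 = c("australia", "united states", "australia", "hong kong", "new zealand"),
  f1 = c("singapore", "united states", "australia", "london", NA),
  f2 = c(NA, "united states", "australia", "australia", NA),
  f3 = c(NA, NA, "australia", NA, NA),
  stringsAsFactors = FALSE
)

# Assumption: the columns to scan are named f0, f1, f2, ...
f_cols <- grep("^f[0-9]+$", names(df), value = TRUE)
vals   <- as.matrix(df[f_cols])           # character matrix, NAs preserved

# NA-safe match matrices: NA cells count as "no match"
is_au <- !is.na(vals) & vals == "australia"
is_nz <- !is.na(vals) & vals == "new zealand"
n_obs <- rowSums(!is.na(vals))            # non-NA values per row

any_au <- rowSums(is_au) > 0
any_nz <- rowSums(is_nz) > 0
all_au <- n_obs > 0 & rowSums(is_au) == n_obs
all_nz <- n_obs > 0 & rowSums(is_nz) == n_obs

# 'australia' takes precedence over 'new zealand', as in the question's ifelse()
df$containsanz <- unname(ifelse(any_au, "australia",
                         ifelse(any_nz, "new zealand", "undefined")))
df$allanz      <- unname(ifelse(all_au, "australia",
                         ifelse(all_nz, "new zealand", "undefined")))
```

Because the columns are picked by name pattern, adding f4, f5 and so on needs no further changes, and no `f0 | f1 | f2 ...` chain has to be repeated. The same vectors could presumably be computed inside a dplyr `mutate()` call if a pipeline is preferred.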