Add new column if range of columns contains string in R

- January 15, 2012

I have the data frame below and would like to add 2 columns:

containsanz: indicates whether any of the columns f0 to f3 contain 'australia' or 'new zealand', ignoring NA values

allanz: indicates whether all of the non-NA columns contain 'australia' or 'new zealand'
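To pin down the two definitions, here they are as plain R functions over one row's f0..f3 values. This is only a sketch: `contains_anz` and `all_anz` are hypothetical helper names. It assumes 'australia' wins when both countries appear (mirroring the order of the nested `ifelse()` in the question) and that a row with no non-NA values counts as 'undefined'.

```r
# Hypothetical helpers (not from the question or answer): classify the
# f0..f3 values of a single row, passed in as a character vector.
contains_anz <- function(vals) {
  vals <- vals[!is.na(vals)]                     # ignore NA values
  if ("australia" %in% vals) return("australia") # assumed to take precedence
  if ("new zealand" %in% vals) return("new zealand")
  "undefined"
}

all_anz <- function(vals) {
  vals <- vals[!is.na(vals)]                     # keep only non-NA values
  if (length(vals) == 0) return("undefined")     # assumption: all-NA row
  if (all(vals == "australia")) return("australia")
  if (all(vals == "new zealand")) return("new zealand")
  "undefined"
}

contains_anz(c("australia", "singapore", NA, NA))  # "australia"
all_anz(c("australia", "singapore", NA, NA))       # "undefined"
all_anz(c("new zealand", NA, NA, NA))              # "new zealand"
```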

The starting data frame would be:

df
  col.a col.b col.c            f0            f1            f2        f3
1  data     0   xxx     australia     singapore          <NA>      <NA>
2  data     1   yyy united states united states united states      <NA>
3  data     0   zzz     australia     australia     australia australia
4  data     0   ooo     hong kong        london     australia      <NA>
5  data     1   xxx   new zealand          <NA>          <NA>      <NA>

The end result should be this:

df
  col.a col.b col.c            f0            f1            f2        f3 containsanz      allanz
1  data     0   xxx     australia     singapore          <NA>      <NA>   australia   undefined
2  data     1   yyy united states united states united states      <NA>   undefined   undefined
3  data     0   zzz     australia     australia     australia australia   australia   australia
4  data     0   ooo     hong kong        london     australia      <NA>   australia   undefined
5  data     1   xxx   new zealand          <NA>          <NA>      <NA> new zealand new zealand

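I'm using dplyr (preferred solution) and have come up with the code below, but it doesn't work and it is repetitive. Is there a better way to write this without having to copy the f0|f1|f2... rules over and over again? My real data set has more of these columns. Are the NAs interfering with the code?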

df <- df %>%
  mutate(anzflag =
    ifelse(
      f0 == 'australia' |
      f1 == 'australia' |
      f2 == 'australia' |
      f3 == 'australia',
      'australia',
      ifelse(
        f0 == 'new zealand' |
        f1 == 'new zealand' |
        f2 == 'new zealand' |
        f3 == 'new zealand',
        'new zealand', 'undefined'
      )
    )
  )
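The NAs are indeed part of the problem. In R a comparison with NA returns NA, `|` only turns that into TRUE when another operand is TRUE, and `ifelse()` returns NA for an NA condition instead of falling back to the "no" value. So with real NA cells, rows 2 and 5 (no f-column equals 'australia', but some are NA) come out as NA rather than 'undefined' or 'new zealand'. A short base-R illustration:

```r
# Comparisons with NA are NA, and `|` only recovers when one side is TRUE
NA == "australia"                        # NA
FALSE | NA                               # NA
TRUE | NA                                # TRUE

# ifelse() passes an NA condition straight through instead of
# falling back to the "no" value
ifelse(NA, "australia", "undefined")     # NA

# Two NA-safe ways to treat a missing value as "no match"
f3 <- c(NA, "australia")
f3 == "australia"                        # NA TRUE
!is.na(f3) & f3 == "australia"           # FALSE TRUE
f3 %in% "australia"                      # FALSE TRUE
```

`%in%` never returns NA, which is why it is a convenient drop-in for `==` in this kind of row test.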

Still typing, but I think this gets at the essence of what you're looking for:

library(dplyr)

df <- read.table(text='col.a,col.b,col.c,f0,f1,f2,f3
data,0,xxx,australia,singapore,NA,NA
data,1,yyy,"united states","united states","united states",NA
data,0,zzz,australia,australia,australia,australia
data,0,ooo,"hong kong",london,australia,NA
data,1,xxx,"new zealand",NA,NA,NA', header=TRUE, sep=",", stringsAsFactors=FALSE)

down_under <- function(x) {
  mtch <- c("australia", "new zealand")
  # pull this single row's f0..f3 values out as a named character vector
  cols <- unlist(x)[c("f0", "f1", "f2", "f3")]
  bind_cols(x, tibble(containsanz=any(mtch %in% cols, na.rm=TRUE),
                      # note: this checks the non-NA values against cols itself, so it
                      # is always TRUE (as in the output below); checking them against
                      # mtch, i.e. all(na.omit(cols) %in% mtch), is closer to the question
                      allanz=all(as.vector(na.omit(cols)) %in% cols)))
}

# apply down_under() to one row at a time
rowwise(df) %>% do(down_under(.))

## Source: local data frame [5 x 9]
## Groups: <by row>
##
##   col.a col.b col.c            f0            f1            f2        f3 containsanz allanz
##   (chr) (int) (chr)         (chr)         (chr)         (chr)     (chr)       (lgl)  (lgl)
## 1  data     0   xxx     australia     singapore            NA        NA        TRUE   TRUE
## 2  data     1   yyy united states united states united states        NA       FALSE   TRUE
## 3  data     0   zzz     australia     australia     australia australia        TRUE   TRUE
## 4  data     0   ooo     hong kong        london     australia        NA        TRUE   TRUE
## 5  data     1   xxx   new zealand            NA            NA        NA        TRUE   TRUE
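Since dplyr is the preferred tool, here is a sketch of a vectorized alternative to `rowwise()` + `do()` that returns the string labels from the expected output. It assumes dplyr 1.0.4 or later, where `if_any()` / `if_all()` apply one test across a range of columns, so the `f0|f1|f2...` rules no longer have to be written out one by one. It also assumes 'australia' takes precedence when both countries appear and that an all-NA row counts as 'undefined'.

```r
library(dplyr)  # if_any() / if_all() need dplyr >= 1.0.4

df <- data.frame(
  col.a = "data",
  col.b = c(0L, 1L, 0L, 0L, 1L),
  col.c = c("xxx", "yyy", "zzz", "ooo", "xxx"),
  f0 = c("australia", "united states", "australia", "hong kong", "new zealand"),
  f1 = c("singapore", "united states", "australia", "london", NA),
  f2 = c(NA, "united states", "australia", "australia", NA),
  f3 = c(NA, NA, "australia", NA, NA),
  stringsAsFactors = FALSE
)

res <- df %>%
  mutate(
    # any f-column matches; %in% treats NA as "no match", so NAs are ignored
    containsanz = case_when(
      if_any(f0:f3, ~ .x %in% "australia")   ~ "australia",
      if_any(f0:f3, ~ .x %in% "new zealand") ~ "new zealand",
      TRUE                                   ~ "undefined"
    ),
    # every f-column is either NA or the given country
    allanz = case_when(
      if_all(f0:f3, is.na)                             ~ "undefined",
      if_all(f0:f3, ~ is.na(.x) | .x == "australia")   ~ "australia",
      if_all(f0:f3, ~ is.na(.x) | .x == "new zealand") ~ "new zealand",
      TRUE                                             ~ "undefined"
    )
  )

res
```

With more f-columns, only the `f0:f3` selection has to change; for example `starts_with("f")` would work here, provided no other column name starts with "f".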
