regex - Extracting last names from a vector of names in R
I have a dataframe that contains U.S. senator names, and I need to extract the last names so I can fuzzy match them with another dataframe that has other information about the senators (and a column that contains last names).
The problem is that some names contain a middle initial or middle name, and all of them have the senator's party at the end. How can I write a gsub command to extract the senator's last name? Apologies, I'm new to regex and bad at it.
A snippet of the data is here:
names <- c("john kerry (d)", "john h chafee (r)", "chris dodd (d)", "joe lieberman (d)", "frank r lautenberg (d)", "daniel patrick moynihan (d)", "alfonse m d'amato (r)", "arlen specter (r)", "jay rockefeller (d)", "carl levin (d)")
You can use strsplit() along with lapply() on the resulting list:
> unlist(lapply(strsplit(names, " "), function(x) { return(x[length(x)-1]) }))
 [1] "kerry"       "chafee"      "dodd"        "lieberman"   "lautenberg"
 [6] "moynihan"    "d'amato"     "specter"     "rockefeller" "levin"
The trick here is to take the second-to-last element of each split string, which is the last name (the final element is the party tag).
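Since the question asked for a gsub-style command, here is a minimal regex sketch as an alternative to splitting. It assumes every entry ends with a space followed by a parenthesised party tag, and that the last name contains no spaces. The variable names senators and last_names are made up for this illustration, and sub() is used instead of gsub() because only one replacement per string is needed.

```r
# Sample data copied from the question (renamed so it does not mask base::names)
senators <- c("john kerry (d)", "john h chafee (r)", "chris dodd (d)",
              "joe lieberman (d)", "frank r lautenberg (d)",
              "daniel patrick moynihan (d)", "alfonse m d'amato (r)",
              "arlen specter (r)", "jay rockefeller (d)", "carl levin (d)")

# Pattern, left to right:
#   ^.*           everything up to the space before the last name (greedy)
#   ([^ ]+)       the last name: a run of non-space characters (captured)
#    \\([^)]*\\)$ a space, then the party tag in parentheses at the end
# The whole string is replaced by the captured group.
last_names <- sub("^.* ([^ ]+) \\([^)]*\\)$", "\\1", senators)

print(last_names)
```

A string that does not fit the assumed shape (for example, one with no party tag) is returned unchanged by sub(), so it is worth checking the result for leftover parentheses before fuzzy matching.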