regex - Extracting last names from a vector of names in R


I have a dataframe that contains U.S. senator names, and I need to extract the last names so that I can fuzzy match them against another dataframe that has other information on the senators (and a column that contains only last names).

The problem is that the names sometimes contain a middle initial or middle name, and they all have the senator's party at the end. How can I write a gsub command to extract each senator's last name? Apologies, I'm new to regex and bad at it.

A snippet of the data is here:

names <- c("john kerry (d)", "john h chafee (r)", "chris dodd (d)", "joe lieberman (d)", "frank r lautenberg (d)", "daniel patrick moynihan (d)", "alfonse m d'amato (r)", "arlen specter (r)", "jay rockefeller (d)", "carl levin (d)") 

You can use strsplit() along with lapply() on the resulting list:

> unlist(lapply(strsplit(names, " "), function(x) { return(x[length(x)-1]) }))
 [1] "kerry"       "chafee"      "dodd"        "lieberman"   "lautenberg"
 [6] "moynihan"    "d'amato"     "specter"     "rockefeller" "levin"

The trick here is to take the second-to-last element of each split string. The last element is always the party tag, such as "(d)", so the element just before it is the last name.
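Since the question asked about gsub specifically, here is a minimal regex-based sketch of the same idea. It is not part of the answer above, and the variable names no_party and last_names are just illustrative. It assumes every entry ends with a single-letter party tag in parentheses and that the last name is a single word with no spaces.

```r
names <- c("john kerry (d)", "john h chafee (r)", "chris dodd (d)",
           "joe lieberman (d)", "frank r lautenberg (d)",
           "daniel patrick moynihan (d)", "alfonse m d'amato (r)",
           "arlen specter (r)", "jay rockefeller (d)", "carl levin (d)")

# Step 1: strip the trailing party tag, e.g. " (d)" or " (r)".
no_party <- sub("[[:space:]]*\\([a-z]\\)[[:space:]]*$", "", names)

# Step 2: greedily remove everything up to and including the last space,
# which leaves only the final word of each name.
last_names <- sub("^.*[[:space:]]", "", no_party)

print(last_names)
```

sub() is enough here because each pattern matches at most once per string; gsub() with the same patterns gives the same result. A surname made of several words would be cut down to its final word, so such names would need separate handling.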

