regex - Filter dates from a pandas DataFrame


I need to remove all kinds of dates (mm-dd-yy, mm/dd/yyyy, dd.mm.yy, dd-mon-yyyy, etc.) from a .csv file using a pandas DataFrame. Can the filter method be used for this?

for col in df.columns.values:
    pd.filter(regex=r'(([1-9]|1[012])[-/.]([1-9]|[12][0-9]|3[01])[-/.](19|20)\d\d)|((1[012]|0[1-9])(3[01]|2\d|1\d|0[1-9])(19|20)\d\d)|((1[012]|0[1-9])[-/.](3[01]|2\d|1\d|0[1-9])[-/.](19|20)\d\d)')

E.g., if I have a .csv file with various columns of data, and dates such as 10/12/2015, 12/01/1995, 2016-19-04, 19th April, 2016, etc., the output file must contain no dates.

Data sample:

column1     column2         column3
data        4th april,2016  data
4/20/2016   20-04-16        20.04.2016
data        data            20-04-2016
4-apr-16    data            20/04/2016

As you can see, there are various date formats here. I need to remove them all.

Of course you can use a regex to filter out the dates, but I found another way: pick the first row of the DataFrame (assuming there are no NaN values in df) and try to initialize a pandas.Timestamp object from each value in that row. If that succeeds, the corresponding column contains dates.

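import pandas

time_columns = []
for col in df.columns:
    try:
        # If the first value parses as a timestamp, treat the column as a date column
        pandas.Timestamp(df.loc[0, col])
        time_columns.append(col)
    except Exception:
        # Not parseable as a date, so keep the column
        pass

# Drop every column that was detected as holding dates
df = df.drop(time_columns, axis=1)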

But I don't think this is a good solution; it's a little bit weird. Instead, I think you might want to analyze the original data first.
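
If the dates are mixed in with other values inside the same column, as in the sample, dropping whole columns won't help, and DataFrame.filter selects rows or columns by their labels rather than by cell contents, so it can't do this on its own. Here is a rough sketch of the regex route that checks every cell instead. The pattern and the names is_date and blank_dates are my own, not from pandas, and the pattern only tries to cover the formats listed in the question (it would also match things like "1.2.34"), so treat it as a starting point to adjust against the real data.

```python
import re
import pandas as pd

# Hypothetical pattern: only covers the date shapes mentioned in the question.
MONTH = r"(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*"
DATE_PATTERN = (
    # Numeric dates: 4/20/2016, 20-04-16, 20.04.2016, 2016-19-04
    r"\d{1,4}[-/.]\d{1,2}[-/.]\d{2,4}"
    # Day, month name, year: 4-apr-16, 19th april,2016
    r"|\d{1,2}(?:st|nd|rd|th)?[-/. ]+" + MONTH + r"[-/. ,]+\d{2,4}"
)


def is_date(value):
    """Return True if the whole cell looks like one of the date formats above."""
    if not isinstance(value, str):
        return False
    return re.fullmatch(DATE_PATTERN, value.strip(), flags=re.IGNORECASE) is not None


def blank_dates(df):
    """Return a copy of df with every date-like cell replaced by an empty string."""
    date_mask = df.apply(lambda col: col.map(is_date).astype(bool))
    return df.mask(date_mask, "")


# Values taken from the sample in the question
df = pd.DataFrame({
    "column1": ["data", "4/20/2016", "data", "4-apr-16"],
    "column2": ["4th april,2016", "20-04-16", "data", "data"],
    "column3": ["data", "20.04.2016", "20-04-2016", "20/04/2016"],
})
cleaned = blank_dates(df)
```

After that, cleaned.to_csv(...) with index=False would write the result back out. Blanking the cells keeps the rows and columns aligned; if you would rather have missing values than empty strings, df.mask(date_mask) without a replacement value leaves NaN instead.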

