regex - Filter dates from a DataFrame
I need to remove all kinds of dates (mm-dd-yy, mm/dd/yyyy, dd.mm.yy, dd-mon-yyyy, etc.) from a .csv file using a pandas DataFrame. Can the filter method be of use?
for col in df.columns.values:
    pd.filter(regex = '(([1-9]|1[012])[-/.]([1-9]|[12][0-9]|3[01])[-/.](19|20)\d\d)|((1[012]|0[1-9])(3[01]|2\d|1\d|0[1-9])(19|20)\d\d)|((1[012]|0[1-9])[-/.](3[01]|2\d|1\d|0[1-9])[-/.](19|20)\d\d)')
E.g.: if I have a .csv file with various columns of data and dates such as 10/12/2015, 12/01/1995, 2016-19-04, 19th april,2016, etc., the output file must contain no dates.
Data sample:

column1      column2          column3
data         4th april,2016   data
4/20/2016    20-04-16         20.04.2016
data         data             20-04-2016
4-apr-16     data             20/04/2016
As you can see, I have various date formats here. I need to remove them all.
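On the filter question: DataFrame.filter selects rows or columns by their labels, not by cell values, so it would not remove dates that sit inside the cells. A cell-level regex pass might look like the sketch below. The pattern and the helper name blank_dates are hypothetical, written only to cover the formats shown in the sample, and the numeric part would also match non-date strings such as "1.2.34", so treat it as a starting point rather than a complete date detector.

```python
import re

import pandas as pd

# Hypothetical patterns, tuned only to the formats in the sample data.
MONTH = r"(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*"
NUMERIC = r"\d{1,4}[-/.]\d{1,2}[-/.]\d{2,4}"  # 4/20/2016, 20-04-16, 2016-19-04
TEXTUAL = r"\d{1,2}(?:st|nd|rd|th)?[-\s]" + MONTH + r"[-,\s]+\d{2,4}"  # 4-apr-16
DATE_RE = re.compile(r"^\s*(?:" + NUMERIC + "|" + TEXTUAL + r")\s*$", re.IGNORECASE)


def blank_dates(df):
    """Return a copy of df with every date-looking cell replaced by NaN."""
    # Build a boolean frame: True where the whole cell text matches DATE_RE.
    is_date = df.apply(
        lambda col: col.map(lambda v: bool(DATE_RE.match(str(v))))
    )
    # DataFrame.mask replaces the cells where the condition is True.
    return df.mask(is_date)
```

The cleaned frame can then be written back out with to_csv. Whether a date cell should become empty or the whole row or column should be dropped depends on the data, so that choice is left open here.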
Of course you can use a regex to filter out the dates, but I found another way: pick the first row of the DataFrame (assuming there are no NaN values in df) and initialize a pandas.Timestamp object from each value in that row. If that succeeds, the corresponding column contains dates:
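import pandas

time_columns = []
for col in df.columns:
    try:
        # If the first value parses as a timestamp, treat the column as a date column.
        pandas.Timestamp(df.loc[0, col])
        time_columns.append(col)
    except Exception:
        # Not parseable as a date: keep the column.
        pass
# Drop every column whose first value looked like a date.
df = df.drop(time_columns, axis=1)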
But I don't think this is a good solution; it's a little bit weird. Instead, I think you might analyze the original data first.
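One way to do that analysis, sketched under assumptions: instead of trusting only the first row, measure for each column what fraction of its cells parse as dates, and decide from those numbers what to drop. The helper name date_share is hypothetical, and which strings pandas accepts as dates depends on its parser, so ambiguous formats such as 20-04-16 may not be read the way you expect.

```python
import pandas as pd


def date_share(df):
    """Return, per column, the fraction of cells that parse as a date."""

    def looks_like_date(value):
        # Only strings are tested; bare numbers would be read as epoch offsets.
        if not isinstance(value, str):
            return False
        # errors="coerce" yields NaT instead of raising on unparseable text.
        return not pd.isna(pd.to_datetime(value, errors="coerce"))

    # Mean of a boolean Series is the share of True values in that column.
    return df.apply(lambda col: col.map(looks_like_date).mean())
```

A column with a share near 1.0 is a candidate for dropping entirely, while a column with a mixed share, like those in the sample above, suggests the dates need to be handled cell by cell.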