regex - filter dates from dataframe
I need to remove all kinds of dates (mm-dd-yy, mm/dd/yyyy, dd.mm.yy, dd-mon-yyyy, etc.) from a .csv file using a pandas DataFrame. Can the filter method be used for this?
This is what I tried:

for col in df.columns.values:
    df.filter(regex=r'(([1-9]|1[012])[-/.]([1-9]|[12][0-9]|3[01])[-/.](19|20)\d\d)|((1[012]|0[1-9])(3[01]|2\d|1\d|0[1-9])(19|20)\d\d)|((1[012]|0[1-9])[-/.](3[01]|2\d|1\d|0[1-9])[-/.](19|20)\d\d)')

For example, if I have a .csv file with various columns of data and dates such as 10/12/2015, 12/01/1995, 2016-19-04, 19th april,2016, etc., the output file must contain no dates.
Data sample:

column1      column2           column3
data         4th april,2016    data
4/20/2016    20-04-16          20.04.2016
data         data              20-04-2016
4-apr-16     data              20/04/2016

As you can see, I have dates in various formats here. I need to remove them all.
Of course you can use a regex to filter out the dates, but I found another way: pick the first row of the DataFrame (assuming there are no NaN values in df) and try to initialize a pandas.Timestamp object from each value in that row. If that succeeds, the corresponding column contains dates:
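import pandas

# df is assumed to be the DataFrame already loaded from the .csv file
time_columns = []
for col in df.columns:
    try:
        # Raises if the first-row value cannot be parsed as a timestamp
        t = pandas.Timestamp(df.loc[0, col])
        time_columns.append(col)
    except Exception:
        pass
# Drop every column whose first value parsed as a date
df = df.drop(time_columns, axis=1)

But I don't think this is a good solution. It's a little bit weird. Instead, I think you might want to analyze the original data first.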
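For the regex route mentioned above, here is a minimal sketch that blanks out date-like substrings cell by cell rather than dropping whole columns. The names MONTH, DATE_RE and strip_dates are made up for illustration, and the pattern is an assumption based only on the formats listed in the question: it does not validate day or month ranges, and it would also match things like three-part version numbers (1.2.34), so treat it as a starting point.

```python
import re

# Month names or abbreviations, e.g. "apr", "april" (hypothetical helper pattern)
MONTH = r"(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*"

DATE_RE = re.compile(
    # Numeric dates: 4/20/2016, 20-04-16, 20.04.2016, 2016-19-04
    r"\b\d{1,4}[-/.]\d{1,2}[-/.]\d{2,4}\b"
    # Textual dates: "4th april,2016", "4-apr-16"
    r"|\b\d{1,2}(?:st|nd|rd|th)?[-\s]*" + MONTH + r"[-,\s]*\d{2,4}\b",
    re.IGNORECASE,
)


def strip_dates(value):
    """Remove date-like substrings from a string; leave other types untouched."""
    if not isinstance(value, str):
        return value
    return DATE_RE.sub("", value).strip()


# The sample rows from the question
rows = [
    ["data", "4th april,2016", "data"],
    ["4/20/2016", "20-04-16", "20.04.2016"],
    ["data", "data", "20-04-2016"],
    ["4-apr-16", "data", "20/04/2016"],
]
cleaned = [[strip_dates(cell) for cell in row] for row in rows]
```

With a DataFrame df, the same function could be applied to every cell with df.apply(lambda col: col.map(strip_dates)). DataFrame.filter is not suitable for this, because it selects rows or columns by their labels and never looks at the cell values.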