python - Slicing and arranging dataframe in pandas
I want to arrange the data from a data frame into multiple dataframes or groups. Input data:
    id    channel  path
    15    direct   a1
    15    direct   a2
    15    direct   a3
    15    direct   a4
    213   paid     b2
    213   paid     b1
    2222  direct   as25
    2222  direct   dw46
    2222  direct   32q
    3111  paid     d32a
    3111  paid     23ff
    3111  paid     www32
    3111  paid     2d2
The desired output should look like:
    id    channel  p1    p2
    213   paid     b2    b1

    id    channel  p1    p2    p3
    2222  direct   as25  dw46  32q

    id    channel  p1    p2    p3     p4
    15    direct   a1    a2    a3     a4
    3111  paid     d32a  23ff  www32  2d2
Please tell me a way I can achieve this. Thanks.
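For anyone who wants to reproduce this, here is a minimal sketch of the input frame. The variable name df and the way the columns are spelled out are my own choices for illustration; only the values come from the table above.

```python
import pandas as pd

# Rebuild the input table from the question (values copied from the post).
df = pd.DataFrame({
    "id": [15] * 4 + [213] * 2 + [2222] * 3 + [3111] * 4,
    "channel": ["direct"] * 4 + ["paid"] * 2 + ["direct"] * 3 + ["paid"] * 4,
    "path": ["a1", "a2", "a3", "a4",
             "b2", "b1",
             "as25", "dw46", "32q",
             "d32a", "23ff", "www32", "2d2"],
})

# Number of path rows per id; this is what decides which output group an id lands in.
paths_per_id = df.groupby("id").size()
print(paths_per_id)
```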
I think you can first create a helper column cols with cumcount, then use pivot_table. Next you need to find the number of notnull columns in each row (subtract the first 2, id and channel) and groupby that length. Last, dropna the columns in each group:
    # df is the input DataFrame from the question
    # helper column: p1, p2, ... numbers each path within its id
    df['cols'] = 'p' + (df.groupby('id')['id'].cumcount() + 1).astype(str)

    # one row per (id, channel), one column per path position
    df1 = df.pivot_table(index=['id', 'channel'],
                         columns='cols',
                         values='path',
                         aggfunc='first').reset_index().rename_axis(None, axis=1)

    print(df1)
         id channel    p1    p2     p3    p4
    0    15  direct    a1    a2     a3    a4
    1   213    paid    b2    b1   None  None
    2  2222  direct  as25  dw46    32q  None
    3  3111    paid  d32a  23ff  www32   2d2

    # number of paths per row: non-null columns minus id and channel
    print(df1.apply(lambda x: x.notnull().sum() - 2, axis=1))
    0    4
    1    2
    2    3
    3    4
    dtype: int64

    # one group per path count; drop the columns that are empty in that group
    for i, g in df1.groupby(df1.apply(lambda x: x.notnull().sum() - 2, axis=1)):
        print(i)
        print(g.dropna(axis=1))

    2
        id channel  p1  p2
    1  213    paid  b2  b1
    3
         id channel    p1    p2   p3
    2  2222  direct  as25  dw46  32q
    4
         id channel    p1    p2     p3   p4
    0    15  direct    a1    a2     a3   a4
    3  3111    paid  d32a  23ff  www32  2d2
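As a side note, the row-wise apply used for counting can probably be replaced by a vectorized call. The sketch below rebuilds the sample data so it runs on its own; the name n_paths is only for illustration, and it assumes id and channel are never null, so subtracting 2 is safe.

```python
import pandas as pd

# Sample data from the question, rebuilt so the snippet is self-contained.
df = pd.DataFrame({
    "id": [15] * 4 + [213] * 2 + [2222] * 3 + [3111] * 4,
    "channel": ["direct"] * 4 + ["paid"] * 2 + ["direct"] * 3 + ["paid"] * 4,
    "path": ["a1", "a2", "a3", "a4", "b2", "b1",
             "as25", "dw46", "32q", "d32a", "23ff", "www32", "2d2"],
})

# Same helper column and pivot as in the answer.
df["cols"] = "p" + (df.groupby("id").cumcount() + 1).astype(str)
df1 = (df.pivot_table(index=["id", "channel"], columns="cols",
                      values="path", aggfunc="first")
         .reset_index()
         .rename_axis(None, axis=1))

# Vectorized count of non-null cells per row, minus the id and channel columns.
n_paths = df1.notnull().sum(axis=1) - 2

# The row-wise version from the answer, kept here only for comparison.
n_paths_apply = df1.apply(lambda x: x.notnull().sum() - 2, axis=1)
print(n_paths.equals(n_paths_apply))
```

Either Series can be passed to df1.groupby(...); the vectorized form just avoids calling a Python lambda once per row.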
For storing the results, you can use a dictionary of DataFrames:
    dfs = {i: g.dropna(axis=1) for i, g in df1.groupby(df1.apply(lambda x: x.notnull().sum() - 2, axis=1))}

    # select the DataFrame with 2 paths
    print(dfs[2])
        id channel  p1  p2
    1  213    paid  b2  b1

    # select the DataFrame with 3 paths
    print(dfs[3])
         id channel    p1    p2   p3
    2  2222  direct  as25  dw46  32q
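If this is needed more than once, the steps could be wrapped in a small function. This is only a sketch: split_by_path_count is a made-up name, not a pandas function, and it assumes the columns are called id, channel and path as in the question. It uses assign so the caller's frame is not modified.

```python
import pandas as pd


def split_by_path_count(df):
    """Hypothetical helper: return {number_of_paths: wide DataFrame}."""
    # Position of each path within its id: p1, p2, ...
    cols = "p" + (df.groupby("id").cumcount() + 1).astype(str)
    wide = (df.assign(cols=cols)
              .pivot_table(index=["id", "channel"], columns="cols",
                           values="path", aggfunc="first")
              .reset_index()
              .rename_axis(None, axis=1))
    # Non-null cells per row, minus the id and channel columns.
    n_paths = wide.notnull().sum(axis=1) - 2
    # One DataFrame per path count, without the columns that are empty in that group.
    return {int(n): g.dropna(axis=1) for n, g in wide.groupby(n_paths)}


# Sample data from the question.
df = pd.DataFrame({
    "id": [15] * 4 + [213] * 2 + [2222] * 3 + [3111] * 4,
    "channel": ["direct"] * 4 + ["paid"] * 2 + ["direct"] * 3 + ["paid"] * 4,
    "path": ["a1", "a2", "a3", "a4", "b2", "b1",
             "as25", "dw46", "32q", "d32a", "23ff", "www32", "2d2"],
})

dfs = split_by_path_count(df)
for n in sorted(dfs):
    print(n)
    print(dfs[n])
```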