hadoop - How to count the different values in a column in Pig and Hive
I have the data file below, where the last field indicates whether an order is valid or invalid. I want to calculate the count of valid orders and the count of invalid orders.
1,flipkart,pepsi,invalid
2,flipkart,tshirt,valid
3,flipkart,shirt,valid
4,amazon,shoe,valid
5,amazon,beer,invalid
6,flipkart,jewels,valid
7,flipkart,coke,invalid
The final output should show:
How many valid and invalid records there are in total.
e.g. valid: 4, invalid: 3
How many valid and invalid records there are for flipkart, and how many for amazon.
e.g. flipkart: valid 3, invalid 2; amazon: valid 1, invalid 1
In Pig, use GROUP ... BY and FOREACH.
Assuming the column names are id, name, pp, state:
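bynamestate = GROUP my_data BY (name, state);
-- FLATTEN(group) keeps the (name, state) key next to each count; COUNT must be uppercase in Pig
bynamestatecounts = FOREACH bynamestate GENERATE FLATTEN(group) AS (name, state), COUNT(my_data) AS ccc;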
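The answer above only covers the per-store breakdown and leaves out loading the data. A minimal sketch of a complete script might look like the following. It assumes the records are stored one per line in a comma-delimited file; the path 'orders.csv' and the relation names bystate and bystatecounts are hypothetical, introduced only for illustration.

```pig
-- Hypothetical input path; one comma-separated record per line is assumed.
my_data = LOAD 'orders.csv' USING PigStorage(',')
          AS (id:int, name:chararray, pp:chararray, state:chararray);

-- Overall totals: one output row per state (valid / invalid).
bystate = GROUP my_data BY state;
bystatecounts = FOREACH bystate GENERATE group AS state, COUNT(my_data) AS ccc;

-- Per-store totals: one output row per (name, state) pair.
bynamestate = GROUP my_data BY (name, state);
bynamestatecounts = FOREACH bynamestate
                    GENERATE FLATTEN(group) AS (name, state), COUNT(my_data) AS ccc;

DUMP bystatecounts;
DUMP bynamestatecounts;
```

On the seven sample records, the first relation should hold a count of 4 for valid and 3 for invalid, and the second should hold 3 valid and 2 invalid for flipkart and 1 valid and 1 invalid for amazon. The order of the rows is not guaranteed unless an ORDER BY is added.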