hadoop - How to count the different values in a column in Pig and Hive


I have a data file, shown below, that indicates whether each order is valid or invalid. I want to calculate the count of valid orders and the count of invalid orders.

    1,flipkart,pepsi,invalid
    2,flipkart,tshirt,valid
    3,flipkart,shirt,valid
    4,amazon,shoe,valid
    5,amazon,beer,invalid
    6,flipkart,jewels,valid
    7,flipkart,coke,invalid

So the final output should show:

  1. How many valid and invalid records there are in total.

    e.g. valid: 4, invalid: 3

  2. For flipkart, how many valid and invalid records there are, and likewise for amazon.

    e.g. flipkart: valid 3, invalid 2; amazon: valid 1, invalid 1

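In Pig, use GROUP ... BY followed by FOREACH.

Assuming the column names are id, name, pp, state:

    -- group by (name, state), then count the records in each group
    bynamestate = GROUP my_data BY (name, state);
    bynamestatecounts = FOREACH bynamestate GENERATE FLATTEN(group) AS (name, state), COUNT(my_data) AS ccc;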
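Grouping by (name, state) covers the per-site breakdown (part 2 of the question). For the overall totals (part 1), a similar script that groups on state alone should work. This is only a minimal sketch: the file name orders.csv, the PigStorage(',') loader, and the relation names bystate and bystatecounts are assumptions for illustration and do not come from the original post.

```pig
-- Load the comma-separated file (hypothetical path) with the assumed column names
my_data = LOAD 'orders.csv' USING PigStorage(',')
          AS (id:int, name:chararray, pp:chararray, state:chararray);

-- One group per distinct value of state (valid / invalid)
bystate = GROUP my_data BY state;

-- Emit the group key together with the number of records in each group
bystatecounts = FOREACH bystate GENERATE group AS state, COUNT(my_data) AS ccc;

DUMP bystatecounts;
```

For the seven sample records this should give a count of 4 for valid and 3 for invalid; the order of the output tuples is not guaranteed. Note that Pig keywords such as GROUP and FOREACH are case-insensitive, but built-in function names such as COUNT are case-sensitive.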
