Wednesday, November 13, 2013

More Pig stuff

Given a two-column dataset (Salesperson, SalesAmount):
Mary 20000
Tom 1200
Mary 6000
Jane 8000
Jim 9000
Tom 20000
Amy 28000
Barry 35000
Charles 2400
Dawson 7384
Haley 2847
Kelly 29495
Lucy 3648



Here's how you get the top 3 salespeople in Pig:

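-- LOAD uses PigStorage by default, which expects tab-delimited fields.
salespeople = LOAD 'data' AS (salesperson:chararray, salesamount:int);
-- Group each salesperson's rows so their sales can be summed.
salespeople_group = GROUP salespeople BY salesperson;
sales_totals = FOREACH salespeople_group GENERATE group AS salesperson, SUM(salespeople.salesamount) AS total_sales;
-- Sort by total, highest first, and keep the first 3 rows.
sorted = ORDER sales_totals BY total_sales DESC;
top_sales = LIMIT sorted 3;
DUMP top_sales;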
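
If what you want is the three largest individual sales across the whole dataset, rather than anything per salesperson, Pig's built-in TOP function is another option. This is only a sketch: the alias names (sales, all_sales, top_rows) are made up for illustration, and it assumes 'data' is tab-delimited, which is what the default loader expects.

```pig
-- Assumption: 'data' is tab-delimited (PigStorage's default delimiter).
sales = LOAD 'data' AS (salesperson:chararray, salesamount:int);
-- Put every row into a single group so TOP sees the whole dataset.
all_sales = GROUP sales ALL;
-- TOP(n, column, bag): column is a 0-based index, so 1 means salesamount.
top_rows = FOREACH all_sales GENERATE FLATTEN(TOP(3, 1, sales));
DUMP top_rows;
```

With the sample data above this should give the rows for Barry (35000), Kelly (29495) and Amy (28000). TOP does not promise any particular order for the rows it returns, so add an ORDER afterwards if the order matters.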
