I had to convert this sql into pig recently, took me a few tries and it's not very intuitive to me, so if you ever come across this problem..here's a template you can copy.
SQL:
select month_year, osname, appname, count (distinct sessionid) as distinct_sessions
from tableA
where starttime >= '2013-01-01'
group by month_year, osname, appname;
PIG:
A = LOAD '/user/yourfolder/*/part*' USING PigStorage('\t') AS (startday:chararray, appname:chararray, osname:chararray, uniquesessionid:chararray);
B = FILTER A by startday matches '2013.*';
--project startday to month
C = FOREACH B GENERATE SUBSTRING(TRIM(startday), 0, 7) as monthyear, appname, osname, uniquesessionid;
D = GROUP C BY (monthyear, appname, osname);
SESSIONS = foreach D {
unique_sessions = DISTINCT C.uniquesessionid;
generate FLATTEN(group), COUNT(unique_sessions) as sessions;
};
STORE SESSIONS INTO '/user/yourfolder' USING PigStorage('\t');
No comments:
Post a Comment