Wednesday, January 9, 2013

Pig null handling

When I sqooped in data from our Netezza database into HDFS..the null values also translated into nulls in the hadoop file.

To replace null with an empty string, do it in the GENERATE statement.

A = LOAD 'your file' USING PigStorage('\t') AS (col1:charray, col2:charray);

B = FOREACH A GENERATE col1, (col2 is null ? '' : TRIM(col2)) as col2;

--Note: I also sneaked in a TRIM to get rid of any trailing spaces in col2, otherwise if it's a null, I set it to empty string.

No comments:

Post a Comment