Thursday, November 14, 2013

Regular expression

In Pig, suppose you want to keep only the records dated February through May 2013:

/*
 * Include only records where the 'order_date' field matches
 * the regular expression pattern:
 *
 *   ^       = beginning of string
 *   2013    = literal value '2013'
 *   0[2345] = 0 followed by 2, 3, 4, or 5
 *   -       = a literal character '-'
 *   \\d{2}  = exactly two digits
 *   \\s     = a single whitespace character
 *   .*      = any number of any characters
 *   $       = end of string
 *
 * If you are not familiar with regular expressions and would
 * like to know more about them, see the Regular Expression
 * Reference at the end of the Exercise Manual.
 */
A = FILTER data BY order_date MATCHES '^2013-0[2345]-\\d{2}\\s.*$';
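
To check the pattern outside of Pig, here is a minimal Python sketch. It assumes order_date holds strings shaped like 'YYYY-MM-DD HH:MM:SS'; the helper name and the sample values are made up for illustration. Pig's matches operator tests the whole string, so the sketch uses re.fullmatch to get the same behaviour.

```python
import re

# Same pattern as the Pig filter. The Pig string literal needs '\\d' and '\\s';
# a Python raw string needs only a single backslash.
PATTERN = re.compile(r'^2013-0[2345]-\d{2}\s.*$')


def in_feb_to_may_2013(order_date: str) -> bool:
    """Return True if the whole string matches (hypothetical helper)."""
    return PATTERN.fullmatch(order_date) is not None


# Hypothetical sample values, not taken from any real data set.
samples = [
    "2013-02-14 10:30:00",
    "2013-05-01 00:00:00",
    "2013-01-31 23:59:59",
    "2013-06-01 00:00:00",
    "2012-03-15 08:00:00",
    "2013-03-15",
]

for value in samples:
    print(value, in_feb_to_may_2013(value))
```

Two things to keep in mind: a value with no time part, such as '2013-03-15', is rejected because the pattern requires a whitespace character after the day, and \d{2} does not check that the day is valid, so '2013-02-30 00:00:00' would still pass.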
