Friday, January 4, 2013

Doing a Group By and count in PIG

suppose you want to convert this SQL to PIG latin.


 select  count(*)
 from   gpl_limits gl
 group by grouptypeid

In PIG, you'll have to do this:
--begin pig script

rmf /user/tsusanto/test;

--1) load data
A = LOAD '/user/tsusanto/full_load/gpl_limits/part*' USING PigStorage('\t') AS (groupid:chararray, originallimit:chararray, groupdesc:chararray, spentprior:chararray, actuallimit:chararray, effectivedate:chararray, grouptypeid:chararray, stacklim:chararray, displaytypeid:chararray);
 
--2) project required fields
B = FOREACH A GENERATE originallimit, groupid, grouptypeid;

--3)group by groupetypeid
C = GROUP B BY (grouptypeid);

D = FOREACH C generate FLATTEN(group), COUNT(B) as count;

--4)write final resultset
STORE D INTO '/user/tsusanto/test' USING PigStorage('\t');

--end pig script

If you then view the file, you should see 2 columns, the grouptypeid and the count

hadoop fs -text /user/tsusanto/test/part*



Tuesday, December 18, 2012

Hadoop etc

Cheat sheet of Hadoop stuff

SQOOP

IMPORT:
sqoop import --connect jdbc:netezza://host/dbname --username yourusername --password=yourpassword --query "SELECT * FROM tableA where \$CONDITIONS and columnA >= '2012-08-01'"  --direct --fields-terminated-by '\t' --split-by columnB--num-mappers 1 --target-dir /user/tsusanto/input_data/20120803 --compression-codec org.apache.hadoop.io.compress.GzipCodec


DELETE. You can also delete records using sqoop:
sqoop eval --connect jdbc:netezza://host/dbname --username yourusername --password=yourpassword# -e "delete from yourtable where columnA= 926"


Hadoop FS SHELL

write a value to a file in HDFS:
echo 'test123' | hadoop fs -put - yourfilepath

display a few rows of your data file stored in hadoop file system
hadoop fs -text /user/tsusanto/yourdatafolder/part* | less


to search a text string
hadoop fs -text /user/tsusanto/yourdatafolder/part* | grep -P '\byoursearchstring\b'

move a file from hadoop file system to your local current folder
hadoop fs -get /user/tsusanto/jobs/yourfolder/workflow.xml .

delete a file from hadoop file system.  You must delete the file before you can push a new file to the same location, file must not exists
hadoop fs -rm -r /user/tsusanto/jobs/yourfolder/workflow.xml

move a file to the hadoop file system from your current folder
hadoop fs -put workflow.xml /user/tsusanto/jobs/yourfolder/

Wednesday, December 12, 2012

Netezza SQL Extension Toolkit

There are many useful string manipulation functions in Netezza available if you install Netezza SQL Extension Toolkit.

To get the Right 3 characters of a string:

select strright( 'this is a string abc',  3)

The above will return abc.


Tuesday, December 11, 2012

Microstratgy Dashboard

Microstrategy Dashboard built using Microstrategy 9.3

Features:
  1. Row level security based on user login - Microsite IDs. The selectors displayed on the left side of the screen below changes based on the user login
  2. Cascading selectors. Selector Offer Type filters the Retailer selector, which in turn filters the Microsite selector
  3. Multi Panels, so user can easily tab between panels to view different data




Thursday, September 27, 2012

Microstrategy filter on dataset in document

On Microstrategy Web, if your document/dashboard is based on a dataset that is returning you too much data and you want to apply a filter on your dashboard..right click on your grid, click "Edit View Filter", then you can select what attribute/metric you want to filter on and the conditions.



Wednesday, September 19, 2012

Datastage Date Manipulation

I always have a hard time trying to add or minus days  to a given date in Datastage. You can use icon, ocon too....but I like this method when I need to manipulate a date column in Transformer Stage.

To add one day to a given date (an incoming date column):

DateFromJulianDay(JulianDayFromDate(DSLink1.your_date_column) + 1)

So basically you convert a date to a julian day, then add 1, then convert the julian day back to a date.

Friday, September 7, 2012

Microstrategy Modifying Tooltips on graphs

By default, when you have a graph, the tooltip on the web will display:
Series: Electronics Sales Amt
Category: 2012-09-07
Value: $2,509.24

But what if you want your tooltip to say:

Electronics Sales Amt: $2,509.24
Date: 2012-09-07?

Under Graph, Preferences, Series, select one of your series, show tooltip, remove the default value of {&TOOLTIP}
and instead type this:
{&SERIESLABEL}:{&VALUE}&#13Date:{&GROUPLABEL}