Thursday, August 8, 2013

My first experience with Splunk

I've been tasked to explore the Splunk tool and this is my findings after playing with it for a couple hours...

Here's what I did...

Downloaded the free trial Enterprise version from splunk.com to install on my windows laptop (splunk-5.0.4-172409-x64-release.msi). With this trial version I get to index 500 MB of data per day, for 60 days.

Our main goal is to use splunk to explore data sitting on our Hadoop file system but to start of initially, all I wanted to do was upload a simple tab delimited file and write some queries against it...wasn't as straightforward as I thought! They splunk folks did mention that ideally the input file should be key value pairs of data but most of the data I work with are csv/tab delimited. Anyway...to make the fields show up so you can include it in your splunk query, you have to define the column names in 2 files called props.conf and transforms.conf.

I installed splunk using all the defaults so I had to create the 2 files props.conf and transforms.conf here:
C:\Program Files\Splunk\etc\system\local.

Just right click, create new text file and in props.conf, type this:

[mydataname]
NO_BINARY_CHECK = 1
pulldown_type = 1
REPORT-myname =  myfieldnames
----------------------------
--comments:
you can replace "mydataname" with anything you want
you can replace "myfieldnames" with anything you want
----------------------------------------------------

Create a new text file and in transforms.conf, type this:

[myfieldnames]
DELIMS = "\t"
FIELDS = "ACTIONID","CALLID","GIQDEVICEID","APPSESSIONID","ACTION","INFO1","INFO2","STARTTIME","ENDTIME","ADDDATE"


-----------------------------------------
--comments
my input file had 10 columns (no header row). So I just typed out the column names in the order they came in.

Then  I went to Splunk Manager, System, and clicked on Server controls, and it gives me an option to Restart Splunk, which I did.

Then I went to Data, imported my tab delimited file that is sitting on my local machine by using the Add Data to Splunk wizard, selected the file, and in the window where it says "Set source type", if you click on the drop down list, you should now see "mydataname" in the drop down list. Your data should now appear in the source types of the summary page. I clicked on "mydataname" which brought me to the search page and gave me the default query:
sourcetype="mydataname" (which is equivalent to select * from mydataname).

And now I can further filter my results using the column names I specified in the transforms.conf file.
I wanted to write the equivalent of this:
select count(*)
from mydataname
where ACTION="AppStarted"

Splunk:
sourcetype="giquseraction" AND ACTION="AppStarted" | stats COUNT


So there you go...my first experiment with Splunk...will keep you guys posted on what I find next..



We had the splunk folks come down to give us a demo a few weeks ago and I asked them what happens if one of us index a huge 1 GB data file by mistake that makes us go over the limit...(oops)....they say it happens all the time and they give their customers 5 oopsies per month, which means you have 5 chances per month when you can call your sales representative up and ask them to "reset" the quota for that day.

Friday, August 2, 2013

Netezza regexp_like. Regex to find integers only

regex is very handy and to be able to use it as part of a sql is just plain awesome I think.

I needed to find all records that only has integers in that field and this is how I did it:

select *
from TABLENAME
where regexp_like(INPUTCOLUMNNAME ,'^\d+$');