Showing posts with label Scripting. Show all posts
Showing posts with label Scripting. Show all posts

Tuesday, December 3, 2013

bash script to extract dates from sql statements from history table

We had to figure out the usage against one of our biggest table to decide how to archive it. We keep a history of all queries issued against the table and I wanted to extract the date filter used in the sql.

There are many ways to do it but I wanted to do it in bash script just for fun...Probably not the most efficient way of doing it but I'm posting this anyway mostly for my own benefit in case I need to figure out how to do something in bash..like how to find the position of a word in a string, how to extract words from a line etc etc



#! /bin/bash

#goal: to extract the dates used in where condition in sql ie adddate >= '2013-11-01' or daystring=20131105
nzsql -t -host serverhere -d databasehere -u usernamehere -W passwordhere -c "select QH_SESSIONID, QH_TSUBMIT, QH_USER, QH_SQL from QUERYHISTORY" > queryhistory.txt 2>&1

rm -f lhqueryhistory1.csv

while read line
do
  if [ -z "$line" ];
  then
    echo "line is empty"
    break
  fi

  sessionid=`echo $line | cut -d"|" -f1`
  starttime=`echo $line | cut -d"|" -f2`
  username=`echo $line | cut -d"|" -f3`
  sql=`echo $line | cut -d"|" -f4`

  #cut out sql starting from the where word
  wherepos=$(echo $sql | awk '{print match(tolower($0), "where")}')

  #if no where condition skip that sql
  if [ ${wherepos:-0} -eq 0 ];
  then
    continue
  fi
  wheresql=$(echo ${sql:${wherepos}})

  #cut out sql starting from the adddate or daystring word
  wherepos2=$(echo $wheresql | awk '{print match(tolower($0), "adddate|daystring")}')
  if [ ${wherepos2:-0} -eq 0 ];
  then
    wheresql2="none and"
  else
    wherepos2=$(($wherepos2 - 1))
    wheresql2=$(echo ${wheresql:${wherepos2}})
  fi

  #cut out sql up until the and, group or limit word
  wherepos3=$(echo $wheresql2 | awk '{print match(tolower($0), "and|group|limit")}')
  if [ ${wherepos3:-0} -eq 0 ];
  then
    wheresql3=$wheresql2
  else
    wherepos3=$(($wherepos3 - 1))
    wheresql3=$(echo ${wheresql2:0:${wherepos3:-0}})
  fi

  #extract the date (before single quote)
  wherepos4=$(expr index "$wheresql3" "\'") #look for single quote
  wheresql4=$(echo ${wheresql3:${wherepos4}})
  if [ ${wherepos4:-0} -eq 0 ];
  then
    wheresql4=$wheresql3
  else
    wherepos4=$(expr index "$wheresql3" "\'")#look for next single quote
    wherepos4=$(($wherepos4 - 1))
    wheresql4=$(echo ${wheresql3:${wherepos4}})
  fi
  finalrow="$sessionid|$starttime|$username|$wheresql4"
  echo $finalrow >> lhqueryhistory1.txt


done < queryhistory.txt

rm -f queryhistory.txt

Monday, November 11, 2013

Datastage stop and reset long hanging jobs

We run a lot of loading jobs from our source SQL Server databases into Netezza using Datastage..these are just simple table to table load with no data transformation and runs every hour.

Every now and them...some of these jobs will hang but will not abort, and it will be a in a perpetual running state until one of us comes in and manually stop the job and recompile it then the next hourly scheduled run will kick off successfully.

I wrote a little shell script to check for Datastage jobs that have been running for more than a certain interval and if it is on the "okay to reset and kill" list (stored in a textfile), then it will stop the job and reset using dsjob commands.

#! /bin/bash

PROG=`basename ${0}`
EXIT_STATUS=0
ProjectName="${1}"
MaxMinutesBeforeReset="${2}"

if [ ${#} -ne 2 ]; then
   echo "${PROG}    : Invalid parameter list. The script needs 2 parameters:"
   echo "Param 1    : DS Project Name "
   echo "Param 2    : MinutesBeforeReset"

   EXIT_STATUS=99
   echo ${NOW} ${PROG} Exiting without completion with status [${EXIT_STATUS}]
   exit ${EXIT_STATUS}
fi

#go to /opt/IBM/InformationServer/Server/DSEngine
BinFileDirectory=`cat /.dshome`/bin
cd ${BinFileDirectory}

#Get current epochtime to as a baseline
CurrentTimeEpoch=$(date +%s)

#check for current running Datastage jobs
ps aux | grep 'DSD.RUN' | tr -s ' ' | cut -d" " -f13 | tr -d '\.' | while read JobName;
do

   #check if it is in the jobs to monitor & reset file, if not skip it
   if grep -Fxq "$JobName" /home/myfolder/JobsToMonitorAndReset.txt
   then
     #Get starttime which is on the 3rd row after the colon
     StartTime=$(./dsjob -jobinfo $ProjectName $JobName | sed -n 3p | grep -o ": .*" | grep -o " .*")
     StartTimeEpoch=$(date -d "$StartTime" +%s)
     DifferenceInMinutes=$((($CurrentTimeEpoch - $StartTimeEpoch)/(60)))
     echo "$JobName has been running for $DifferenceInMinutes minutes"

     #if it has been running more than x (2nd argument) minutes, stop and reset job
     if [ $DifferenceInMinutes -ge $MaxMinutesBeforeReset];
     then
       echo "$JobName will be stopped and reset."
       ./dsjob -stop $ProjectName $JobName
       ./dsjob -run -mode RESET -wait -jobstatus $ProjectName $JobName
       exit 0
     fi
   fi
done

exit 0

---------------------------------------------------------------
If you want to monitor only specific jobs, just add the datastage JobName your file, I stored mine in /home/myfolder/JobsToMonitorAndReset.txt.

You can send an email to yourself too with the jobs that were stopped and reset.

Thursday, November 7, 2013

Convert string to JSON array using Perl

original json file (note: the string values are delimited by a colon)
[
{"cxbrandid":"100158:100159","cxbrandname":"APPLE, ORANGE & BANANA:JUICE JUICE"},
{"cxbrandid":"3000","cxbrandname":"JUNGLE, FOREST"}
]

desired output json file:
[{"cxbrandid":["100158","100159"],"cxbrandname":["APPLE, ORANGE & BANANA","JUICE JUICE"]},{"cxbrandid":["3000"],"cxbrandname":["JUNGLE, FOREST"]}]


Perl script to do that:
========================================================================
use JSON;

my $json;
{
  local $/; #Enable 'slurp' mode
  open my $fh, "<", "t.json";
  $json = <$fh>;
  close $fh;
}
my $data = decode_json($json);

#count number of elements in array and minus one
$t = scalar(@{$data});
$t = $t - 1;

for (my $i=0; $i <= $t; $i++) {

$cxbrandname =  $data->[$i]->{'cxbrandname'};
$cxbrandid = $data->[$i]->{'cxbrandid'};

my $brandidtable=[];
my $brandnametable=[];

@cxbrandidarray=split(/:/, $cxbrandid);
@cxbrandnamearray=split(/:/, $cxbrandname);

foreach my $val (@cxbrandidarray) {
  push @{$brandidtable}, $val;
}

foreach my $val (@cxbrandnamearray) {
  push @{$brandnametable}, $val;
}

$data->[$i]->{'cxbrandid'}= $brandidtable;
$data->[$i]->{'cxbrandname'}= $brandnametable;

}

open my $fh, ">", "data_out.json";
print $fh encode_json($data);
close $fh;