We had to figure out the usage against one of our biggest tables to decide how to archive it. We keep a history of all queries issued against the table, and I wanted to extract the date filter used in the SQL.
There are many ways to do this, but I wanted to do it in a bash script just for fun. It's probably not the most efficient approach, but I'm posting it anyway, mostly for my own benefit in case I need to remember how to do something in bash: how to find the position of a word in a string, how to extract words from a line, and so on.
#! /bin/bash
#goal: extract the dates used in the where condition of each sql, e.g. adddate >= '2013-11-01' or daystring=20131105
nzsql -t -host serverhere -d databasehere -u usernamehere -W passwordhere -c "select QH_SESSIONID, QH_TSUBMIT, QH_USER, QH_SQL from QUERYHISTORY" > queryhistory.txt 2>&1
rm -f lhqueryhistory1.txt
while IFS= read -r line
do
    if [ -z "$line" ]; then
        echo "line is empty"
        break
    fi
    sessionid=$(echo "$line" | cut -d"|" -f1)
    starttime=$(echo "$line" | cut -d"|" -f2)
    username=$(echo "$line" | cut -d"|" -f3)
    sql=$(echo "$line" | cut -d"|" -f4)
    #find the position of the where keyword; if there is no where condition, skip that sql
    wherepos=$(echo "$sql" | awk '{print match(tolower($0), "where")}')
    if [ "${wherepos:-0}" -eq 0 ]; then
        continue
    fi
    #note: awk's match() is 1-based and bash offsets are 0-based, so this starts just after the w
    wheresql=${sql:$wherepos}
    #cut out the sql starting from the adddate or daystring column
    wherepos2=$(echo "$wheresql" | awk '{print match(tolower($0), "adddate|daystring")}')
    if [ "${wherepos2:-0}" -eq 0 ]; then
        wheresql2="none and"
    else
        wherepos2=$((wherepos2 - 1))
        wheresql2=${wheresql:$wherepos2}
    fi
    #keep the sql only up to the next and, group or limit keyword
    wherepos3=$(echo "$wheresql2" | awk '{print match(tolower($0), "and|group|limit")}')
    if [ "${wherepos3:-0}" -eq 0 ]; then
        wheresql3=$wheresql2
    else
        wherepos3=$((wherepos3 - 1))
        wheresql3=${wheresql2:0:$wherepos3}
    fi
    #extract the date between single quotes, if any
    wherepos4=$(expr index "$wheresql3" "'")
    if [ "${wherepos4:-0}" -eq 0 ]; then
        wheresql4=$wheresql3            #no quotes, e.g. daystring=20131105
    else
        wheresql4=${wheresql3:$wherepos4}   #text after the first quote
        wheresql4=${wheresql4%%\'*}         #drop everything from the closing quote on
    fi
    finalrow="$sessionid|$starttime|$username|$wheresql4"
    echo "$finalrow" >> lhqueryhistory1.txt
done < queryhistory.txt
rm -f queryhistory.txt
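The two bash idioms the script leans on, awk's match() for the position of a keyword and ${var:offset} for substring extraction, can be tried in isolation. A minimal sketch with a made-up query:

```shell
sql="select * from sales where adddate >= '2013-11-01' limit 10"

# awk's match() prints the 1-based position of the first occurrence (0 if absent)
wherepos=$(echo "$sql" | awk '{print match(tolower($0), "where")}')
echo "$wherepos"    # 21

# bash substring expansion is 0-based, so this starts one character into the match
wheresql=${sql:$wherepos}
echo "$wheresql"    # here adddate >= '2013-11-01' limit 10

# extracting the nth word of a line with cut
fifthword=$(echo "$sql" | cut -d" " -f5)
echo "$fifthword"   # where
```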
Tuesday, December 3, 2013
Monday, November 11, 2013
Datastage stop and reset long hanging jobs
We run a lot of loading jobs from our source SQL Server databases into Netezza using Datastage. These are simple table-to-table loads with no data transformation, and they run every hour.
Every now and then, some of these jobs will hang but not abort, staying in a perpetual running state until one of us comes in, manually stops the job, and recompiles it, after which the next hourly scheduled run kicks off successfully.
I wrote a little shell script that checks for Datastage jobs that have been running longer than a certain interval; if a job is on the "okay to reset and kill" list (stored in a text file), the script stops and resets it using dsjob commands.
#! /bin/bash
PROG=$(basename "${0}")
NOW=$(date)
EXIT_STATUS=0
ProjectName="${1}"
MaxMinutesBeforeReset="${2}"
if [ ${#} -ne 2 ]; then
    echo "${PROG} : Invalid parameter list. The script needs 2 parameters:"
    echo "Param 1 : DS Project Name"
    echo "Param 2 : MinutesBeforeReset"
    EXIT_STATUS=99
    echo ${NOW} ${PROG} Exiting without completion with status [${EXIT_STATUS}]
    exit ${EXIT_STATUS}
fi
#cd to the DSEngine bin directory, e.g. /opt/IBM/InformationServer/Server/DSEngine/bin
BinFileDirectory=$(cat /.dshome)/bin
cd "${BinFileDirectory}" || exit 1
#get the current epoch time as a baseline
CurrentTimeEpoch=$(date +%s)
#loop over the currently running Datastage jobs
ps aux | grep 'DSD.RUN' | tr -s ' ' | cut -d" " -f13 | tr -d '\.' | while read JobName;
do
    #skip the job if it is not in the jobs-to-monitor-and-reset file
    if grep -Fxq "$JobName" /home/myfolder/JobsToMonitorAndReset.txt
    then
        #the start time is on the 3rd row of the jobinfo output, after the colon
        StartTime=$(./dsjob -jobinfo "$ProjectName" "$JobName" | sed -n 3p | grep -o ": .*" | grep -o " .*")
        StartTimeEpoch=$(date -d "$StartTime" +%s)
        DifferenceInMinutes=$(( (CurrentTimeEpoch - StartTimeEpoch) / 60 ))
        echo "$JobName has been running for $DifferenceInMinutes minutes"
        #if it has been running more than x (2nd argument) minutes, stop and reset the job
        if [ "$DifferenceInMinutes" -ge "$MaxMinutesBeforeReset" ];
        then
            echo "$JobName will be stopped and reset."
            ./dsjob -stop "$ProjectName" "$JobName"
            ./dsjob -run -mode RESET -wait -jobstatus "$ProjectName" "$JobName"
            exit 0
        fi
    fi
done
exit 0
---------------------------------------------------------------
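The staleness check at the heart of the script reduces to an epoch-seconds subtraction. A standalone sketch with hard-coded timestamps (GNU date's -d flag is assumed, as in the script itself; the times are made up):

```shell
# pretend "now" is 10:30 and the job started at 09:45 on the same day
CurrentTimeEpoch=$(date -d "2013-11-11 10:30:00" +%s)
StartTimeEpoch=$(date -d "2013-11-11 09:45:00" +%s)
DifferenceInMinutes=$(( (CurrentTimeEpoch - StartTimeEpoch) / 60 ))
echo "$DifferenceInMinutes"    # 45
```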
If you want to monitor only specific jobs, just add the Datastage JobName to the file; I stored mine in /home/myfolder/JobsToMonitorAndReset.txt.
You can also send yourself an email listing the jobs that were stopped and reset.
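For the email notification, something like the following could work. This is a hypothetical sketch assuming mailx is installed and configured on the server; the job name and address are placeholders:

```shell
# hypothetical notification after a job has been stopped and reset
JobName="SomeLoadJob"
msg="Datastage job $JobName was stopped and reset on $(hostname)"
if command -v mailx >/dev/null 2>&1; then
    echo "$msg" | mailx -s "Datastage job reset: $JobName" you@example.com
else
    echo "mailx not available: $msg"
fi
```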
Thursday, November 7, 2013
Convert string to JSON array using Perl
original json file (note: the string values are delimited by a colon)
[
{"cxbrandid":"100158:100159","cxbrandname":"APPLE, ORANGE & BANANA:JUICE JUICE"},
{"cxbrandid":"3000","cxbrandname":"JUNGLE, FOREST"}
]
Perl script to do that:
========================================================================
use strict;
use warnings;
use JSON;

#slurp the whole json file into one string
my $json;
{
    local $/; #enable 'slurp' mode
    open my $fh, "<", "t.json" or die "Cannot open t.json: $!";
    $json = <$fh>;
    close $fh;
}
my $data = decode_json($json);

#walk every element of the top-level array
for my $i (0 .. $#{$data}) {
    my $cxbrandname = $data->[$i]->{'cxbrandname'};
    my $cxbrandid   = $data->[$i]->{'cxbrandid'};
    #split the colon-delimited strings and rebuild each field as an array reference
    my $brandidtable   = [];
    my $brandnametable = [];
    push @{$brandidtable},   $_ for split /:/, $cxbrandid;
    push @{$brandnametable}, $_ for split /:/, $cxbrandname;
    $data->[$i]->{'cxbrandid'}   = $brandidtable;
    $data->[$i]->{'cxbrandname'} = $brandnametable;
}

open my $out, ">", "data_out.json" or die "Cannot open data_out.json: $!";
print $out encode_json($data);
close $out;
desired output json file:
[{"cxbrandid":["100158","100159"],"cxbrandname":["APPLE, ORANGE & BANANA","JUICE JUICE"]},{"cxbrandid":["3000"],"cxbrandname":["JUNGLE, FOREST"]}]