Every now and then, some of these jobs will hang but not abort, and stay in a perpetual running state until one of us comes in, manually stops the job and recompiles it. After that, the next hourly scheduled run kicks off successfully.
I wrote a little shell script that checks for DataStage jobs that have been running for more than a certain interval. If a job is on the "okay to reset and kill" list (stored in a text file), the script stops the job and resets it using dsjob commands.
#! /bin/bash
PROG=`basename ${0}`
EXIT_STATUS=0
ProjectName="${1}"
MaxMinutesBeforeReset="${2}"
if [ ${#} -ne 2 ]; then
echo "${PROG} : Invalid parameter list. The script needs 2 parameters:"
echo "Param 1 : DS Project Name "
echo "Param 2 : MinutesBeforeReset"
EXIT_STATUS=99
echo "$(date '+%Y-%m-%d %H:%M:%S') ${PROG} Exiting without completion with status [${EXIT_STATUS}]"
exit ${EXIT_STATUS}
fi
#go to the bin directory under DSEngine (e.g. /opt/IBM/InformationServer/Server/DSEngine/bin)
BinFileDirectory=`cat /.dshome`/bin
cd ${BinFileDirectory} || exit 1
#Get current epoch time to use as a baseline
CurrentTimeEpoch=$(date +%s)
#check for current running Datastage jobs
#the [D] keeps grep from matching its own entry in the process list
ps aux | grep '[D]SD\.RUN' | tr -s ' ' | cut -d" " -f13 | tr -d '\.' | while read -r JobName;
do
#check if it is in the jobs to monitor & reset file, if not skip it
if grep -Fxq "$JobName" /home/myfolder/JobsToMonitorAndReset.txt
then
#Get starttime which is on the 3rd row after the colon
StartTime=$(./dsjob -jobinfo $ProjectName $JobName | sed -n 3p | grep -o ": .*" | grep -o " .*")
StartTimeEpoch=$(date -d "$StartTime" +%s)
DifferenceInMinutes=$((($CurrentTimeEpoch - $StartTimeEpoch)/(60)))
echo "$JobName has been running for $DifferenceInMinutes minutes"
#if it has been running more than x (2nd argument) minutes, stop and reset job
if [ "$DifferenceInMinutes" -ge "$MaxMinutesBeforeReset" ];
then
echo "$JobName will be stopped and reset."
./dsjob -stop $ProjectName $JobName
./dsjob -run -mode RESET -wait -jobstatus $ProjectName $JobName
exit 0
fi
fi
done
exit 0
---------------------------------------------------------------
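The elapsed-time check is the part of the script most worth trying out on its own before pointing it at real jobs. Below is a minimal sketch of that calculation, assuming GNU date (the -d option is not available in every date implementation). The function name minutes_since and its optional second argument are my own additions for illustration and are not part of the script above.

```shell
#!/bin/bash
# minutes_since: print the whole minutes elapsed since a start timestamp.
#   $1 = start time in any format that GNU "date -d" understands
#   $2 = optional "now" in epoch seconds (defaults to the current time)
minutes_since() {
    local start_epoch now_epoch
    start_epoch=$(date -d "$1" +%s) || return 1   # fail if the date cannot be parsed
    now_epoch=${2:-$(date +%s)}
    echo $(( (now_epoch - start_epoch) / 60 ))
}

# Example with a fixed "now" so the result is reproducible
now=$(date -d "2024-01-01 12:00:00" +%s)
minutes_since "2024-01-01 10:30:00" "$now"   # prints 90
```

Passing "now" as a parameter makes it easy to check the threshold logic with known values instead of waiting for a job to actually hang.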
If you want to monitor only specific jobs, just add each DataStage job name to your file, one name per line. I stored mine in /home/myfolder/JobsToMonitorAndReset.txt.
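Here is a small sketch of how the list lookup behaves, since the script matches with grep -Fxq and that means a name has to match a whole line exactly. The job names and the temporary file are made up for the example, and is_monitored is a helper name I introduced; the real script reads /home/myfolder/JobsToMonitorAndReset.txt directly.

```shell
#!/bin/bash
# Hypothetical watch list: one job name per line, nothing else on the line.
WatchList=$(mktemp)
printf '%s\n' "LoadCustomerHourly" "LoadOrdersHourly" > "$WatchList"

# is_monitored: succeed only if $1 matches one whole line of the list.
#   -F = fixed string (no regex), -x = whole-line match, -q = no output
is_monitored() {
    grep -Fxq -- "$1" "$WatchList"
}

is_monitored "LoadOrdersHourly" && echo "LoadOrdersHourly: monitored"
is_monitored "LoadOrders"       || echo "LoadOrders: skipped (partial names do not match)"
```

Because the match is on the whole line, trailing spaces or Windows line endings in the file will make a job look like it is not on the list.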
You can send an email to yourself too with the jobs that were stopped and reset.
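One way to do the email part is sketched below. It assumes a mail command (mailx or similar) is installed and configured on the server, which may not be true on yours. ResetLog, record_reset, build_report, send_report and NOTIFY_TO are all names I made up for this sketch. The job names are written to a file rather than kept in a variable because the while loop in the script runs on the receiving end of a pipe, so variables set inside it are lost when the loop ends.

```shell
#!/bin/bash
# Hypothetical names throughout: none of these exist in the original script.
ResetLog=$(mktemp)

# record_reset: remember a job that was reset.
# In the main loop this would be called right after the two dsjob commands.
record_reset() {
    echo "$1" >> "$ResetLog"
}

# build_report: print a short message body listing every recorded job.
build_report() {
    echo "The following DataStage jobs were stopped and reset:"
    sed 's/^/  - /' "$ResetLog"
}

# send_report: mail the report only if something was reset and a recipient is set.
send_report() {
    [ -s "$ResetLog" ] || return 0           # nothing was reset, nothing to send
    [ -n "${NOTIFY_TO:-}" ] || return 0      # no recipient configured
    build_report | mail -s "DataStage jobs reset" "$NOTIFY_TO"
}
```

To use it, set NOTIFY_TO to your address and call send_report after the loop. Note that the script above exits right after the first reset, so the call would have to go before that exit 0 (or the exit would have to be removed) for the report to be sent.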