SQL Agent Monitoring – Long Running Job Queries

The SQL Server alert system does a great job of handling notifications for SQL Agent job failures but if want to know if a job is hung, taking longer than expected, or maybe it’s disabled and hasn’t been running, we will need to put in a bit of effort ourselves.

This first query will return the start and stop times of the last execution of a job. If the job is currently running then it will provide the duration in seconds that it has been running for.¬†This can easily be put into a SQL Agent job to run every few minutes and send out an e-mail with jobs that have been running for longer than you’d like.

SELECT jobs.name AS [Job_Name]
, CONVERT(VARCHAR(23),ja.start_execution_date,121)
AS [Start_execution_date]
, ISNULL(CONVERT(VARCHAR(23),ja.stop_execution_date,121), 'Is Running')
AS [Stop_execution_date]
, CASE ISNULL(CAST(ja.stop_execution_date AS VARCHAR(30)),'NULL')
WHEN 'NULL'
THEN CAST(DATEDIFF(ss,ja.start_execution_date,GETDATE()) AS VARCHAR(30))
ELSE 'Not running'
END AS [Duration_secs]
FROM msdb.dbo.sysjobs jobs
LEFT JOIN msdb.dbo.sysjobactivity ja ON ja.job_id = jobs.job_id
AND ja.start_execution_date IS NOT NULL
WHERE jobs.name = 'test'

SQL-Agent-Monitoring-Long-Running-Job-Queries-1

SQL-Agent-Monitoring-Long-Running-Job-Queries-3

This next query will return the last execution and status for a given job, or all jobs if you comment out the WHERE clause. This can be used to monitor how long it’s been since a job last succeeded.

;WITH last_hist_rec AS

(
SELECT ROW_NUMBER() OVER
(PARTITION BY job_id ORDER BY run_date DESC, run_time DESC) AS [RowNum]
, job_id
, run_date AS [last_run_date]
, run_time AS [last_run_time]
, CASE run_status
WHEN 0 THEN 'Failed'
WHEN 1 THEN 'Succeeded'
WHEN 2 THEN 'Retry'
WHEN 3 THEN 'Canceled'
END AS [run_status_desc]
FROM msdb.dbo.sysjobhistory
)
SELECT jobs.name AS [job_name]
, hist.last_run_date
, hist.last_run_time
, hist.run_status_desc
FROM msdb.dbo.sysjobs jobs
LEFT JOIN last_hist_rec hist ON hist.job_id = jobs.job_id
AND hist.RowNum = 1
WHERE jobs.name = 'test'
--AND hist.[run_status_desc] = 'Failed'
--AND hist.[run_status_desc] = 'Succeeded'
--AND hist.[run_status_desc] = 'Retry'
--AND hist.[run_status_desc] = 'Canceled'

SQL-Agent-Monitoring-Long-Running-Job-Queries-2

This article has 4 comments

  1. Derik,

    Thanks for this post. When I run the first query, I see multiple results for a single job and strangely, those results skip around a lot. For example, I have a job that runs every day but the last three rows (out of a total of 129 rows) for this job show start dates of 7/14/14, 8/15/14, and 8/28/14. Any idea why I would see rows like that? Seems that maybe the sysjobs table needs to be cleaned up?? If that is the case, how would I do that?

    Thanks

    • Thanks for your question Dave. Your results are interesting. In theory, there should never be multiple records in the msdb.dbo.sysjobs table with the same job name. The SSMS GUI disallows it. There is not, however, a unique index on the sysjobs table for name. So you can insert duplicates. This throws SSMS Job Activity Monitor into a fit but I have not tested how it would affect job execution yet. Either way, the above query could only return more than one record if there were duplicate records in sysjobs or your WHERE clause was not written the way you expect. Such as, if you used a LIKE and a wild card and you were actually getting 129 different jobs with similar names. Order shouldn’t matter, since you should only get one record, but an ORDER BY clause would be required to line those up if you desired. Finally, I do think that sysjobs cleanup is required and you will need to perform manual DELETE statements. This can be dangerous, especially if those jobs are actually functioning properly and this is a production environment. Proceed with caution.

  2. I’ve been trying to do something similar to this using SSIS to bring together MSDB job activities from several servers into one table. Mainly so we can spot unfinished jobs which have got stuck. What I’ve found is that it is possible to have a job in the [jobactivity] table with a start_execution_date and a NULL stop_execution_date even if it isn’t actually running. I think this happens when a job is scheduled and someone disables it, either by disabling or changing the schedule. I’ve seen it happen today when a job which was cancelled by a user has an entry for what would have been the next scheduled run time but that doesn’t show as the next run time in the Job Activity Monitor console.

    If you know how to eliminate these using another table please can you share?

    • Unfortunately, I do not know of any other source of information to suite this purpose. Even worse, I can understand your concern. I have had SQL Agent’s act funny with scheduling and run metadata. On occasion I have had to restart the agent to resolve issues. Mostly I saw this in SQL Server 2008 R2. Just because I have not experienced it on a newer version does not guarantee that it still cannot happen, however.

Leave a Reply

%d bloggers like this: