Thursday, January 17, 2008

Working with Biztalk Database


Count of all Orchestration instances grouped by state
 What it does:   Gives statistical information about all orchestration instances in your system and groups them into the different states they might be in. This type of information is available in the admin MMC by right clicking on an orchestration and selecting properties. This is just a way to rollup all data across all orchestration types and to do it programmatically.
How to read the results: If you have a large number of suspended instances well then something is failing. If you have a large number of dehydrated instances it could indicate a problem with your backend systems not responding promptly or it could just mean that the host in which messaging services are running is not up. If you have a large number of Ready To Run services then either your server is not up or you have a lot of load on your system and are trying hard to keep up. Might not be a problem if we catch up later, but if we can’t catch up then your system is being over driven.
What to do: If they are suspended, you need to look at the suspended info in HAT and determine why they are suspended and do something about. If they are dehydrated and you are concerned, you should look at the message flow to see what they are waiting for and then determine why that message has not come back. If they are ready to run, you should make sure your services are running and then check CPU utilization on these machines and other resource issues because they might be over-driven.
 SET NOCOUNT ON
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
SET DEADLOCK_PRIORITY LOW
 SELECT  o.nvcName AS Orchestration, COUNT(*) as Count,
                CASE i.nState
                                WHEN 1 THEN 'Ready To Run'
                                WHEN 2 THEN 'Active'
                                WHEN 4 THEN 'Suspended Resumable'
                                WHEN 8 THEN 'Dehydrated'
                                WHEN 16 THEN 'Completed With Discarded Messages'
                                WHEN 32 THEN 'Suspended Non-Resumable'
                END as State
FROM [BizTalkMsgboxDb]..[Instances] AS i WITH (NOLOCK)
JOIN [BizTalkMgmtDb]..[bts_Orchestration] AS o WITH (NOLOCK) ON i.uidServiceID = o.uidGUID
--WHERE dtCreated > '2004-08-24 00:00:00' AND dtCreated < '2004-08-24 13:30:00'
GROUP BY o.nvcName, i.nState

Counts for all host queues
What it does:   This will return the table size for the three queues described in the tables section for all applications
How to read the results: If the suspendedQ grows, things are failing. If the workQ grows, you are either over-driving the system, your services are off, or perhaps things have failed and you are going to do retries so the messages are delayed (see valid time). If the state queue grows, you either have a lot of dehydrated orchestrations, or you might be having problems with MSMQt.
What to do: If it is the SuspendedQ, use the error information in HAT and the eventlog to figure out what is failing and fix it. Make sure to either resume or terminate these instances eventually. You don’t want stuff sitting around forever. If it is the workQ, check to make sure the services are running. If they are, check to see if the CPU or other resources are pegged since you might be over driving the system. Also check the evenlog for warnings indicating transport failures which could cause retires. If it is the StateQ, check to see if you have a lot of dehydrated or suspended orchestrations and if so, why. You can also run one of the later queries to try and see if it is an MSMQt instance holding all the references and if so, you are probably not sending data to the outgoing queue anymore because we are still waiting for an ack for an earlier message.
 SET NOCOUNT ON
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
SET DEADLOCK_PRIORITY LOW
 CREATE TABLE #Temp (AppQueue nvarchar(256), QueueSize int)
declare @nvcAppName sysname
declare MyCursor CURSOR FAST_FORWARD FOR
SELECT nvcApplicationName FROM [BizTalkMsgboxDb]..[Applications] WITH (NOLOCK)
 open MyCursor
FETCH NEXT FROM MyCursor INTO @nvcAppName
WHILE (@@FETCH_STATUS = 0)
BEGIN
                INSERT INTO #Temp
                exec ('SELECT ''' + @nvcAppName + 'Q'', COUNT(*) FROM ' + @nvcAppName +'Q WITH (NOLOCK)')
                INSERT INTO #Temp
                exec ('SELECT ''' + @nvcAppName + 'Q_Suspended'', COUNT(*) FROM ' + @nvcAppName +'Q_Suspended WITH (NOLOCK)')
                INSERT INTO #Temp
                exec ('SELECT ''InstanceStateMessageReferences_' + @nvcAppName + ''', COUNT(*) FROM InstanceStateMessageReferences_' + @nvcAppName + '  WITH (NOLOCK)')
                    FETCH NEXT FROM MyCursor INTO @nvcAppName
END 
SELECT * FROM #Temp ORDER BY QueueSize DESC
 close MyCursor
deallocate MyCursor
DROP TABLE #Temp

 Count for number of state messages grouped by instances
What it does:   This will give you a count of the number of state messages associated with each instance
How to read the results: This specific query is probably only useful for systems which use MSMQt as a quick warning when things are starting to go wrong. If you have a lot of messages building up for a specific instance and that instance is an MSMQt instances, you could have problems.
What to do: If you have one of these problematic instances and the service class is MSMQt, you can look at the UserState column to determine what the name of the queue is which we are talking to and then you need to see why the communication with that machines queue is not working. If this is an orchestration, it is possible this is simply expected. It depends on your business logic as to why so many messages are going to the same instance.
 SET NOCOUNT ON
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
SET DEADLOCK_PRIORITY LOW
 CREATE TABLE #Temp (AppName nvarchar(256), StateMessages int, InstanceID uniqueidentifier, ServiceClassID uniqueidentifier, ServiceState nvarchar(256))
declare @nvcAppName sysname
declare MyCursor CURSOR FAST_FORWARD FOR
SELECT nvcApplicationName FROM [BizTalkMsgboxDb]..[Applications] WITH (NOLOCK)
open MyCursor
FETCH NEXT FROM MyCursor INTO @nvcAppName
WHILE (@@FETCH_STATUS = 0)
BEGIN
                INSERT INTO #Temp
                exec ('SELECT ''' + @nvcAppName + ''', COUNT(*), i.uidInstanceID, i.uidClassID, i.nvcUserState FROM InstanceStateMessageReferences_' + @nvcAppName + ' AS s WITH (NOLOCK) JOIN Instances AS i WITH (NOLOCK) ON s.uidInstanceID = i.uidInstanceID GROUP BY i.uidInstanceID, i.uidClassID, i.nvcUserState')
                    FETCH NEXT FROM MyCursor INTO @nvcAppName
END
SELECT * FROM #Temp ORDER BY StateMessages DESC
 close MyCursor
deallocate MyCursor
DROP TABLE #Temp

Count of all active messages for a specific sendport
What it does:   This query not only gives the count for how many messages are in the workQ for a given sendport, but it will do a breakdown based upon the primary and secondary transport also. Even better, it will give you a count for the maximum number of retries any message is on.
How to read the results: Having a high number for a certain transport is not necessarily worrying. It is basically the same as having a lot of data in the workQ generically and could just mean your system is slow. If you retry count is high it means things are failing. If you have a lot of messages on the secondary transport, then things never succeeded on the primary and now have failed over to the secondary transport
What to do: You should look in the eventlog for information on why the failures are happening and perhaps look into the system which this sendport is communicating with the to figure out what the problem is.
 SET NOCOUNT ON
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
SET DEADLOCK_PRIORITY LOW
 CREATE TABLE #Temp (MaxRetries int, Active int, SendPort nvarchar(256), IsPrimaryTransport int)
declare @nvcAppName sysname
declare MyCursor CURSOR FAST_FORWARD FOR
SELECT nvcApplicationName FROM [BizTalkMsgboxDb]..[Applications] WITH (NOLOCK)
open MyCursor
FETCH NEXT FROM MyCursor INTO @nvcAppName
WHILE (@@FETCH_STATUS = 0)
BEGIN
                INSERT INTO #Temp
                exec ('      SELECT    MAX(q.nRetryCount) AS MaxRetries
                                                  ,COUNT(*) AS Active
                                                  ,sp.nvcName AS SendHandlersHostServiceName
                                                  ,spt.bIsPrimary AS IsPrimaryTransport
                                FROM ' +  @nvcAppName + 'Q as q WITH (NOLOCK)
                                                  INNER LOOP JOIN Subscription as s WITH (NOLOCK) ON q.uidServiceID = s.uidServiceID AND s.uidPortID = q.uidPortID
                                                  INNER LOOP JOIN [BizTalkMgmtDb]..[bts_sendport] as sp WITH (NOLOCK) ON q.uidServiceID = sp.uidGUID
                                                  INNER LOOP JOIN [BizTalkMgmtDb]..[bts_sendport_transport] as spt WITH (NOLOCK) ON sp.nID = spt.nSendPortID AND spt.uidGUID = q.uidPortID
                                GROUP BY  sp.nvcName, s.uidPortID, spt.bIsPrimary
                                ORDER BY  SendHandlersHostServiceName ASC')

                FETCH NEXT FROM MyCursor INTO @nvcAppName
END 
SELECT * FROM #Temp ORDER BY Active DESC

close MyCursor
deallocate MyCursor
DROP TABLE #Temp

Last duration for our SQL Agent jobs
What it does:   This will give you a view into when each job last ran and how long it took. This is important to monitor not just because of the data it gives you but also to make sure you realize the importance of having sql agent running.
How to read the results: If the jobs start taking a long time, you could have database resource issues. Either you are running out of CPU or perhaps your disk is getting thrashed. If the job failed, it could be because of a number of deadlock issues which we are fixing in SP1 or have fixed in QFEs.
What to do: If it is a resource issue, you might need to consider beefing up your sql server. If the issue only happens rarely, then you are probably okay as we will catch up. If a failure occurred due to deadlock, you could consider getting SP1 if it has released yet as we fix a number of these issues. There is also a QFE out for deadlocking in the TrackedMessages_Copy job. In general, the deadlocks will not seriously hurt your system as long as they don’t occur too often. The job will fail and we will just have more work to do next time it kicks in.
 SET NOCOUNT ON
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
SET DEADLOCK_PRIORITY LOW
 CREATE TABLE #Temp (JobID uniqueidentifier,
                    JobName sysname,
                    Status int,
                    DateStarted int,
                    TimeStarted int,
                    Duration int,
                    OpEmailed nvarchar(20),
                    OpNetSent nvarchar(20),
                    OpPaged nvarchar(20),
                    Retries int,
                    Server nvarchar(30)
                    )
 CREATE TABLE #Final (MessageBox sysname,
                                    JobName sysname,
                                    DateStarted int,
                                    TimeStarted int,
                                    Duration int,
                    Status int
                    )

declare @dbName sysname
declare MyCursor cursor FAST_FORWARD FOR
select DBName FROM [BizTalkMgmtDb]..[adm_MessageBox] WITH (NOLOCK)
 open MyCursor
 FETCH NEXT FROM MyCursor INTO @dbName
WHILE (@@FETCH_STATUS = 0)
BEGIN
     INSERT INTO #Temp
    exec ('[msdb]..[sp_help_jobhistory] @job_name = ''MessageBox_DeadProcesses_Cleanup_' + @dbName + '''')
    INSERT INTO #Final
    SELECT TOP 1 @dbName, JobName, DateStarted, TimeStarted, Duration, Status FROM #Temp
    TRUNCATE TABLE #Temp  

    INSERT INTO #Temp
    exec ('[msdb]..[sp_help_jobhistory] @job_name = ''MessageBox_Message_Cleanup_' + @dbName + '''')
    INSERT INTO #Final
    SELECT TOP 1 @dbName, JobName, DateStarted, TimeStarted, Duration, Status FROM #Temp
    TRUNCATE TABLE #Temp  

    INSERT INTO #Temp
    exec ('[msdb]..[sp_help_jobhistory] @job_name = ''MessageBox_Parts_Cleanup_' + @dbName + '''')
    INSERT INTO #Final
    SELECT TOP 1 @dbName, JobName, DateStarted, TimeStarted, Duration, Status FROM #Temp
    TRUNCATE TABLE #Temp  

    INSERT INTO #Temp
    exec ('[msdb]..[sp_help_jobhistory] @job_name = ''PurgeSubscriptionsJob_' + @dbName + '''')
    INSERT INTO #Final
    SELECT TOP 1 @dbName, JobName, DateStarted, TimeStarted, Duration, Status FROM #Temp
    TRUNCATE TABLE #Temp  

    INSERT INTO #Temp
    exec ('[msdb]..[sp_help_jobhistory] @job_name = ''TrackedMessages_Copy_' + @dbName + '''')
    INSERT INTO #Final
    SELECT TOP 1 @dbName, JobName, DateStarted, TimeStarted, Duration, Status FROM #Temp
    TRUNCATE TABLE #Temp  
   FETCH NEXT FROM MyCursor INTO @dbName
END

SELECT * FROM #Final ORDER BY MessageBox
close MyCursor
deallocate MyCursor
 drop table #Temp
drop table #Final