Service Broker Gone Wild: Dealing With sysxmitqueue Bloat

by Mar 15, 2015

Have you ever been surprised by suddenly finding out that one of your databases has grown way beyond your expectations? Hopefully, this isn’t a surprise you get on production, as this is something easy to monitor (both database size and disk space are monitored by default with SQL diagnostic manager). But in other environments where change control isn’t strict or where you don’t have monitoring running actively, it’s easy for this to sneak up and surprise you.

 

I got this surprise recently on my laptop. It wasn’t a tremendous surprise as I currently have 7 SQL Server instances and only 2 drives in my laptop. But when I noticed that my 750 GB drive was down to ~2 GB free, I decided to investigate. One of the interesting things I discovered was that a couple of my instances had really large data files for the msdb database (>30 GB). My first thought was that the data file was probably mostly empty as I currently have no replication, log shipping, CDC, etc., running on these instances. To my surprise the files had very little free space. I needed to investigate deeper.

msdb Properties

My next suspicion was that there was a bloated logging table such as dbo.sysjobhistory or dbo.sysmail_log. I decided to query sys.partition to get a quick count of table sizes. This query (shown below) showed that the only large table was the system table sys.sysxmitqueue with more than 94 million rows.

Select OBJECT_NAME(object_id) As TBName,
    SUM(rows) As TotalRows
From sys.partitions
Where index_id In (0, 1)
Group By object_id
Order By TotalRows desc;

This table is defined in Books Online as, “contains a row for each Service Broker transmission queue.” Okay, so it’s a service broker thing. This is a hidden system object, so I can’t delete it directly. I need to purge whatever service broker queue is bloating it. I logged in using the dedicated admin connection (DAC) and queried it to see what the contents were. The entries were all for Event Notifications which rung a bell. I must have set the database mirroring monitoring via service broker that detects a mirroring failover. The root of the problem is that an Event Notification is set up to monitor database mirroring state change, but the attempt to send the message is not reaching its destination due to some configuration problem like the target queue no longer existing or being disabled.

One way to fix this is to resolve the problem that is preventing delivery is to fix the configuration problem and wait for the messages to be sent. This may require restarting the service broker endpoint if it has stopped due to a poison message problem. This takes kind of a long time and then if the target queue is the same server, you’ve simply moved the large chunk of data to a new location. You now have to process those messages.

The quicker way to fix it, AND PLEASE BE CAREFUL DOING THIS IN PRODUCTION, is to create a new broker using Alter Database msdb Set New_Broker With Rollback Immediate;. In order to do this, you must have no traffic in msdb. Stop the SQL Server Agent before you try to run this against msdb. In user databases, disabling SQL Agent is not required. Only do this if you have nothing else in the database using service broker or be prepared to recreate any user created routes.

It may take a few minutes to complete the process as it will delete all existing messages (without processing them) in all service broker queues. It may take several minutes to complete the deletion of existing messages, so don’t be alarmed. It will repeat the message “Nonqualified transactions are being rolled back. Estimated rollback completion: 100%.” while it is deleting the messages. It took about 3 1/2 minutes to delete over 94 million messages on my laptop.

If I check the number of rows in sysxmitqueue now, it has 0 rows, and I can use DBCC SHRINKFILE() for a one-time shrink (do not use regularly unless you bad things to happen) to reclaim 30+ GB of space on the drive.