We have a rather large SQLDM configuration here, with 140+ servers / 1000+ databases being monitored simultaneously from one SQLDM instance. However, monitoring feels like it is starting to fail - many servers, even ones that are not that busy, have gaps in their monitoring, varying from 10-15 minutes up to several days.
There is some correlation with server load, with the busiest servers having the largest and most frequent gaps. However, the correlation is not straightforward. Production Server A, which is our biggest server with the highest average load, maxes out at gaps of about a day, whereas Production Server S currently has a running gap of nearly four days. Development Server A has gaps of up to 50 minutes when nothing is running. QA Server S has gaps of up to an hour, but it has been running heavily for the last few days, whereas QA Server A has gaps larger than Production Server A.
When you are monitoring a specific server, such as Server S, you will see values feed into the graphs as time passes. However, if you switch to monitoring another server and then return to S, no snapshots were recorded for that period.
Does anyone know whether this is just the limit of what Idera can monitor on the hardware we are using, or has anyone experienced this before and found something to fix or examine?
NOTE: We intend to open a ticket with Idera soon, but I was hoping that some genius here could give us a hint on what we should be looking at.
IDERA SQL Diagnostic Manager Desktop Client 10.2.0.3269
IDERA SQL Diagnostic Manager Repository 10.2
IDERA SQL Diagnostic Manager Management Service 10.2.0.3270
IDERA SQL Diagnostic Manager Collection Service 10.2.0.3269
Microsoft Data Access Component (MDAC) 6.3.9600.16384
Microsoft .Net Framework 4.0.30319.34014
Microsoft Windows Operating System Microsoft Windows NT 6.2.9200.0
SQLDM Mobile and Newsfeed Version 188.8.131.52
We have installed 10.3.1.3, and the process is running. Things are definitely better, but I still have one server (I have not yet gone through all 140 servers) that is skipping large chunks of time. There may be more, but this upgrade did indeed improve the situation significantly.
When you get a chance, I'd recommend opening a support case. If you can isolate the problem to specific instances, they may be able to identify the cause of the "large chunks of time" that are missing.
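If it helps with isolating the affected instances before opening the case, here is a rough Python sketch for finding gaps in a list of snapshot timestamps. It assumes you have already exported the collection times for one instance by some means; it does not touch the SQLDM repository, and the function name `find_gaps`, the 10-minute threshold, and the sample timestamps are all made up for illustration.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, max_interval=timedelta(minutes=10)):
    """Return (start, end, duration) for every pair of consecutive
    snapshots that are further apart than max_interval."""
    ordered = sorted(timestamps)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        delta = later - earlier
        if delta > max_interval:
            gaps.append((earlier, later, delta))
    return gaps

# Hypothetical snapshot times for a single monitored instance.
snapshots = [
    datetime(2018, 1, 1, 9, 0),
    datetime(2018, 1, 1, 9, 6),
    datetime(2018, 1, 1, 9, 56),  # 50 minutes after the previous snapshot
    datetime(2018, 1, 1, 10, 2),
]

for start, end, duration in find_gaps(snapshots):
    print(f"gap of {duration} between {start} and {end}")
```

Running something like this per instance would give support a list of which servers skip time and for how long, rather than a general impression from the graphs.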