Easier To Put Things Into Maintenance

by Aug 18, 2015

Maintenance mode is one of the best ways to cut down on needless alerts, since many alerts fire while someone is doing routine work on a system. The key is getting people to actually use it. If putting a host into temporary maintenance is quick and easy, people are far more likely to do so. Currently, however, the process is quite cumbersome:

1. I cannot put a system into temporary maintenance from its own page. Instead, I must find it in the My Infrastructure tree and use the menu on that specific device. It would be much quicker to use the search bar to go straight to a device and put it into temporary maintenance from there, just as I already can with scheduled maintenance. Why not temporary maintenance as well? Hunting for a specific device in the infrastructure tree takes much longer and the tree can be difficult to navigate.

2. I must put every single system into maintenance individually. Why can't I put an entire Infrastructure group or View into maintenance all at once? Very often several devices in a group are being worked on at the same time. A good example is a remote office whose network connection back to headquarters is having problems. That office might have 50 monitored systems, all already grouped together, yet it is absolutely ridiculous to have to click 100 times (twice on each device) to silence the alarms from the group. In practice, that means I simply won't do it.

3. When a device is put into temporary maintenance, it must be re-enabled manually. This step is very easy to forget, leaving devices unmonitored for much longer than intended until someone notices. Scheduled maintenance can of course be used instead, but temporary maintenance is much quicker than defining and saving a schedule. In competing products we have seen quick options to put a device into maintenance for a common period (15 minutes, 1 hour, 1 day, etc.), after which monitoring resumes on its own. This would be similar to the "Quick Date" selections in graphing, though those particular presets cover longer spans than maintenance mode usually needs.

4. When a device is put into maintenance, all services attached to it are disabled, so no data at all is collected from the system. This discourages people from using maintenance mode, since large gaps in the collected metrics are undesirable. To me, the primary purpose of maintenance mode is to silence alerts from the system. The device may remain reachable and "monitorable" through much of the maintenance window, and it would be useful to rely on up.time itself to pinpoint exactly when the device was fully returned to service, but that is impossible while it is in maintenance. I understand that if the "Host Check" service goes down, the other services stop as well. Still, could maintenance mode simply silence alarms instead of disabling everything? Or, at the very least, could there be an option for that?