Feeling the bloat? Quiet down our log files.

May 19, 2017

Log obesity. It's a silent problem, one that is often hidden by well-meaning administrators who keep adding space to their VMs over time. The patient, helpful enablers, feeding the beast. NO MORE, I SAY! NO MORE!

True story. I was in a customer's Uptime environment a little while back and saw that his C drive was 100 GB, with 89 or so used. Now, on a larger Uptime install that wouldn't surprise me one little bit, but he had his MySQL datastore off on another drive. YIKES! What was taking up all that space? A little digging revealed he had nearly 70 GIGABYTES OF TEXT LOGS in the Uptime logs folder! Wow! The logs roll over and start fresh whenever the services restart, so mind you, there were a lot of files in there, some dating back 5 years! I asked him if he was saving them for some particular reason, because to us, stuff that old and that many versions behind was, well, junk. He said no, and we deleted them. It was like clearing junk out of the garage, a cathartic moment when you free yourself from the… oh, you get it.

So here's the deal. Let's own this. This obviously doesn't have to get out of control, because we can monitor the logs folder using our file and directory monitor. Better still, we can control just about anything and everything that goes into our log files. Surprised? Don't be. That's how Uptime is: extremely configurable. In your Uptime folder is a subfolder called templates. Actually, the line in uptime.conf refers to the logging configuration file.
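Uptime's own monitor is the right tool for watching the folder over time, but for a quick one-off check a few lines of Python will do. This is only a sketch: the function names are mine, not anything shipped with Uptime, and you pass in the path to your own logs folder and whatever size limit you consider reasonable.

```python
import os


def folder_size_bytes(folder):
    """Return the combined size in bytes of all regular files under folder."""
    total = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            try:
                total += os.path.getsize(path)
            except OSError:
                # A file may be rolled over or removed while we scan; skip it.
                continue
    return total


def over_limit(folder, limit_gb):
    """Return True when folder holds more than limit_gb gigabytes of files."""
    return folder_size_bytes(folder) > limit_gb * 1024 ** 3
```

Calling over_limit with your logs folder and, say, a limit of 5 tells you whether it is time to go digging, long before you get anywhere near 70 GB.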

In this file there are lots of comments and many commented-out lines that would ADD logging. A quick review and you'll get the idea of how we're going to make things shut up. The FIRST thing that sticks out to me is what are known as ERDC events. ERDC refers to the monitoring events in general. You know the "WARN: your disk is running out of space" and the "CRIT: your cpu is on fire.." alerts you see in Uptime? Those ALL get logged, every single time they change state, to our default uptime.log… Crazy, right? We have an option in alert profiles to do EXACTLY that, so why on Earth would we want to do it by default? It creates bloat, adds noise to the logs when you actually want to find issues, and let's not forget the unnecessary I/O on the disk. Let's review the file. The default is pasted below…

# This configuration file can be used to fine-tune logging thresholds in the
# up.time application. The format of the file is:
#
# logger.name=LEVEL
#
# where LEVEL is a standard Java log level (ALL, DEBUG, INFO, WARN, ERROR,
# FATAL, OFF). Logging thresholds defined in this file override the global
# up.time logging threshold that is specified by the loggingLevel option in
# uptime.conf.
#
# Some common configuration options are listed below.

# Show more granular timing information for service monitor runs that exceed
# the long run warn threshold:
#com.uptimesoftware.uptime.base.event.erdc.ERDCEventProcessor=DEBUG

# Hide warnings about service monitors entering CRIT:
#com.uptimesoftware.uptime.base.event.erdc.ErdcRunResultHandler=ERROR

# Show more detailed information about vSphere inventory updates, including
# timeouts and reconnects:
#com.uptimesoftware.uptime.vmware.update.InventoryUpdateListener=DEBUG

# Hide all informational logs about vSphere inventory updates:
#com.uptimesoftware.uptime.vmware.update.log=WARN

# Logger names starting with com.uptimesoftware.util.StackLogger affect only
# the uptime_exceptions.log, which is globally set to the DEBUG threshold.
#
# An example configuration option follows.

# Hide exceptions printed when transactions are retried following a
# deadlock:
#com.uptimesoftware.util.StackLogger.com.uptimesoftware.uptime.database.session.SessionTemplate=INFO
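The format described in the file's header (logger.name=LEVEL, with a leading # turning a line into a comment) is simple enough to inspect with a script. Here is a minimal sketch that lists which overrides are actually active in a copy of the file. The function name is my own, and it assumes nothing beyond the format the header comment describes.

```python
def active_overrides(config_text):
    """Return {logger_name: LEVEL} for every uncommented logger.name=LEVEL line."""
    overrides = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip blank lines and anything commented out with '#'.
        if not line or line.startswith("#"):
            continue
        name, sep, level = line.partition("=")
        if sep:
            overrides[name.strip()] = level.strip().upper()
    return overrides
```

Run against the default file above it returns an empty dictionary, since every example line ships commented out. After you edit the file, it is a quick way to confirm which thresholds you have really turned on.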

Ok, so you're wondering what is safe and what isn't safe to play with. Remember, it's a LOG FILE, so technically ANYTHING is fair game. However, I will warn you: if you disable any of it, it can make it more difficult, or impossible, to troubleshoot an issue if one were to arise. So what do I recommend, you ask? If you've just installed Uptime on a trial, leave it be, because who knows? If you've had Uptime installed for a long time and you're not having any issues, go crazy. If you are about to upgrade, turn the more in-depth logging back on for a day or so beforehand, just in case. The upgrade will overwrite the file anyway… Basically, whenever there is change, and in the early days, we need to pay more attention to things. Like a new puppy. Except, well, Uptime isn't likely to eat your favorite pair of Italian leather dress shoes. So. I recommend this.

# This configuration file can be used to fine-tune logging thresholds in the
# up.time application. The format of the file is:
#
# logger.name=LEVEL
#
# where LEVEL is a standard Java log level (ALL, DEBUG, INFO, WARN, ERROR,
# FATAL, OFF). Logging thresholds defined in this file override the global
# up.time logging threshold that is specified by the loggingLevel option in
# uptime.conf.
#
# Some common configuration options are listed below.

# Show more granular timing information for service monitor runs that exceed
# the long run warn threshold:
#com.uptimesoftware.uptime.base.event.erdc.ERDCEventProcessor=DEBUG

# Hide warnings about service monitors entering CRIT:
com.uptimesoftware.uptime.base.event.erdc.ErdcRunResultHandler=ERROR

# Show more detailed information about vSphere inventory updates, including
# timeouts and reconnects:
#com.uptimesoftware.uptime.vmware.update.InventoryUpdateListener=DEBUG

# Hide all informational logs about vSphere inventory updates:
com.uptimesoftware.uptime.vmware.update.log=WARN

# Logger names starting with com.uptimesoftware.util.StackLogger affect only
# the uptime_exceptions.log, which is globally set to the DEBUG threshold.
#
# An example configuration option follows.

# Hide exceptions printed when transactions are retried following a
# deadlock:
#com.uptimesoftware.util.StackLogger.com.uptimesoftware.uptime.database.session.SessionTemplate=INFO

The two rows I changed are the ErdcRunResultHandler=ERROR and vmware.update.log=WARN lines, which are now uncommented. You can just copy / paste the whole thing too if you like. After modifying the file, you will need to restart the Uptime Data Collector (core in Linux) to pick up the changes, since they are Java related. While the service is down, it is a perfect time to clean up that logs folder. Current logs will have no date / time stamp on them. The old ones will. If you've got something going with support, probably leave everything be for now. Otherwise, have at it.
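If the folder holds years of files, it helps to see the cleanup candidates before deleting anything. The sketch below is a dry run only: it lists files that have not been modified in a given number of days and deletes nothing. Note that it goes by last-modified time rather than the date stamp in the file name, which is my own simplification, and the function name is hypothetical.

```python
import os
import time


def stale_files(folder, max_age_days, now=None):
    """Return sorted paths of files directly in folder not modified within max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    stale = []
    for entry in os.scandir(folder):
        # Only look at regular files in the folder itself, not subfolders.
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            stale.append(entry.path)
    return sorted(stale)
```

Review the list it returns, and only then remove the files by whatever means you prefer. The current logs are still being written to, so they will not show up as stale.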

Now, you may be saying, "OK, this is nice to know, but I don't like messing with configuration files for stuff like this." I would say you're in good company! In the release that is just now available, 7.8.2, we have split all Java-based monitors' status messages into their very own log file. Others will follow suit soon. Also, we plan to modify the default logging levels so that you'd have to tell us you WANT more verbosity in the first place. We also plan on adding more control from within the configuration section of the GUI. And we will have what we are referring to as health dots in the Uptime dashboard, representing all kinds of things within three categories: monitoring station health (drive capacity, folder growth like logs, cpu, mem, etc), health of the monitors themselves (if some can't connect, aren't pulling data, have errors, anything preventing the monitor from working properly), and monitoring performance health (things internal to monitoring like number of events processed, backlog, various JVM stats, etc). But I'll save some of that detail for later. I figured if you made it this far in my article, I'd give you a little taste.

Thanks for reading and enjoy that free space!