SCOM – Windows Service Monitor Management Pack

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

NOTE – If you already have a previous version of this MP imported into SCOM and are updating to a newer version, back up your existing MP and follow the instructions below to copy the service monitoring configurations to the updated MP.

The progression should go:

  • Export your current Windows Service Monitoring MP.
  • Download the updated version and copy it to another location
  • Open the MP you exported from SCOM and copy the old service configurations to the new MP
  • So from this (search for “Dim Arr” to find it):

wsm_updatemp

  • To this:

wsm_update1mp

  • Now import the new MP into SCOM and you’re good to go

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

** Changelog **

Updates 2/25/2016

  • Added console tasks for service starting/stopping/status
  • Added a recovery to the service monitor for automatic service restart (disabled by default). It uses the three strikes and you’re out rule: by default it will try to start a service 3 times in a 24 hour period and then it will not try to start the service again. This prevents a constant start/stop loop and can be configured to your preference
  • Added a timer reset monitor for when the 3 strike limit is reached. It is enabled by default and closes automatically (after 24 hours by default)
  • Fixed a small bug in the “ServiceMPEditor.ps1” tool where it wouldn’t import if you had 0 entries

Blog post begins here (with edits to reflect current updates)

I dislike the default Windows Service Monitoring Template in SCOM. Here are a few reasons why:

  • The templates create a lot of monitoring bloat. Each service template configuration comes with a lot of baggage:
    • Its own class
    • Its own discovery
    • 3 monitors
    • 8 overrides
  • It takes forever to add/edit/remove a large number of services from the console
  • If you override “Alert only if service start type is automatic” and set it to “false”, it will alert for all startup types. This is a bummer when you have some disabled ones out there or want to temporarily set one to disabled and not have SCOM alert.
  • No built-in flexibility for days of the week or hours of the day filtering
  • This method will not scale very well

So how do we make things better? It appears there’s some room for improvement. I’ve been thinking about and working on this problem for years, and in that time two approaches have always floated to the top:

  1. A scripted monitoring solution
  2. An optimized version of the built in template method

There are pros and cons to each of these approaches as well.

A Scripted Monitoring Solution

Pros:

Cons:

  • Performing overrides and alerting at the individual service level is a bit cumbersome

An optimized version of the built in template method

Pros:

  • You get most of the features you get with the template monitoring and some added extra ones that aren’t in the default template monitoring.
  • Each service has its own alert/monitor.

Cons:

  • Although it will scale better than the built in solution, it won’t scale up as well as the scripted solution.

An early version of the optimized solution is what I have here for you today. I’ve been working the last couple weekends on getting this put together and here’s a high level overview of the features I have so far:

  • I used the same datasource for service monitoring that SCOM uses so whatever magic they have going on to optimize that process should be included
  • Date and time filtering so you can exclude certain days/times from monitoring on a per service or service object basis
  • Console tasks for service starting/stopping/status
  • Automatic service recovery (disabled by default). Works on a 3 strikes and you’re out format. After 3 failures in a 24 hour period it will stop trying to restart the service. This is overridable to your needs.
  • Timer reset monitor (closes itself after 24 hours) to watch for and alert on the 3 strike out situation. This is enabled by default
  • Monitor all service startup types with the exclusion of disabled services from alerting
  • Created a custom discovery which discovers and adds all the service objects to one class rather than scattering them about like the templates do
  • Created a Service MP Editor to allow you to add/remove or mass import services to be monitored

Getting started

  • First download the MP and Service MP Editor here: https://gallery.technet.microsoft.com/SCOM-Windows-Service-c9fc3f95
  • Second, this is still really early days for this MP. Don’t go throwing it in production right away without testing. I’m releasing the MP in its current state hoping to get feedback from other people who might be interested in testing it out at this point.
  • Third, the Service MP Editor is really rough and only sort of tested. You’ll want to double check your MP before you import it back into SCOM or just edit the MP directly to add service monitoring configurations (more on how to do those things later)
  • Ok, so now that we have these things out of the way, let’s get started. You’ll see that you have three files from the download. The Readme.txt directs you to this blog, the ServiceMPEditor.ps1 is used to add/modify/remove services in the MP, and WindowsServiceMonitor.xml is the management pack.

wsmfiles

  • You can go ahead and import the management pack into SCOM. No services are discovered by default, so out of the box it doesn’t really do much until it’s configured
  • Once you have it imported, go ahead and run the ServiceMPEditor.ps1 script. This will launch the service add/edit/remove PowerShell form

wsm_poshgui1

  • Basically, this form exports the management pack from SCOM and reads the service discovery VBScript contained within the MP. It then lets you edit the XML through the GUI and import it back in after saving the changes.
  • Again, remember to have the MP imported, and then let’s go through adding the first one together

wsm_poshgui2

  1. Management Server – Put in the server name of one of your SCOM management servers.
  2. Management Pack Location – This is where the MP will be exported to and updated. This is also where any service exports/imports will go to/from.
  3. Get MP Config – Select this button after steps 1 and 2 are completed. This will export the MP to the “Management Pack Location”.
  4. New Service – Select this button to start a new service configuration
  5. Service Name – Put the “Service name” in here. Do not put in the “Display Name”, that will not work (unless they are the same).
  6. Confirm Service Edit – Select this after choosing all your service monitoring options. Days that are checked are the days the service is monitored, and the service is monitored between “Monitor Start Time” and “Monitor End Time”.
  7. Save MP Config – Once you’ve added your service(s) to be monitored, select this button to save the service monitoring configuration.
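
To make the day and time options in step 6 concrete, here is a rough sketch of how such a filter behaves. It is written in Python purely for illustration (the MP itself does this inside SCOM, not with this code). The function name `is_monitored`, the inclusive start/end comparison, and the assumption that the start time is earlier than the end time on the same day are all mine, not taken from the MP.

```python
from datetime import datetime, time

# Index matches datetime.weekday(), where Monday is 0.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def is_monitored(now, monitored_days, start, end):
    """Return True if `now` is on a checked day and between start and end.

    Assumes start <= end (no overnight windows) and inclusive endpoints.
    """
    day_name = DAY_NAMES[now.weekday()]
    return day_name in monitored_days and start <= now.time() <= end

weekdays = {"Monday", "Tuesday", "Wednesday", "Thursday", "Friday"}
business_hours = (time(8, 0), time(18, 0))

# Thursday 09:30, inside the window
print(is_monitored(datetime(2016, 2, 25, 9, 30), weekdays, *business_hours))  # True
# Thursday 22:00, outside the hours
print(is_monitored(datetime(2016, 2, 25, 22, 0), weekdays, *business_hours))  # False
# Saturday 09:30, unchecked day
print(is_monitored(datetime(2016, 2, 27, 9, 30), weekdays, *business_hours))  # False
```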

Next you’ll probably want to take a look at the MP and verify the GUI added the services properly. A little about what is going on behind the scenes:

The discovery script in the MP’s xml has an array that will hold the configurations for each service that will be discovered:

wsm_vbs1

After you select “Save MP Config” in the GUI, anything that’s in the “Services” list will be placed in the discovery script:

wsm_vbs2

For the days of the week value, add together the numbers for each day you want monitored.
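
Here is a quick worked example of that addition, sketched in Python for illustration only. It assumes the usual SCOM days-of-week mask values (Sunday = 1, Monday = 2, Tuesday = 4, Wednesday = 8, Thursday = 16, Friday = 32, Saturday = 64), so double check them against the numbers in your copy of the discovery script. The names `DAY_VALUES`, `days_mask` and `days_from_mask` are made up for this example.

```python
# Assumed day values (the usual SCOM days-of-week mask); verify against your MP.
DAY_VALUES = {
    "Sunday": 1, "Monday": 2, "Tuesday": 4, "Wednesday": 8,
    "Thursday": 16, "Friday": 32, "Saturday": 64,
}

def days_mask(days):
    """Add together the values of the days to monitor (each day counted once)."""
    return sum(DAY_VALUES[day] for day in set(days))

def days_from_mask(mask):
    """Reverse lookup: list the days that a mask value covers."""
    return [day for day, value in DAY_VALUES.items() if mask & value]

weekdays = days_mask(["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"])
print(weekdays)                # 2 + 4 + 8 + 16 + 32 = 62
print(days_mask(DAY_VALUES))   # every day = 127
print(days_from_mask(96))      # ['Friday', 'Saturday']
```

Because each day is its own power of two, any total maps back to exactly one set of days, which is why a single number is enough to store the whole week.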

After you see a couple of them in there you’ll get the idea of how it works, and maybe you’ll prefer editing it manually. It’s up to you. The key is you have choices =)

  • Ok, back to main bullet points. So next you’ll want to head over to the console and check to make sure your service(s) is discovered:

wsm_discoveredinv

  • Looks pretty good – Some words about that.
    • The discovery interval is 86400 seconds (once a day). If you don’t want to wait for the discovery interval just restart the monitoring agent and it will run the discovery right away. You can also override the discovery interval if you like (Go to SCOM Console – Choose “Authoring” – “Management Pack Objects” – “Object Discoveries” – Search for: Windows Service Monitor Discovery)
    • You can also enable debug if you like. This will write 101 events to the SCOM event log each time the discovery is run. It will let you know that it ran and which services the script discovered.
  • Lastly, let’s take a look at the monitor configuration:

wsm_monitor

  • The big thing to mention here is the overrides: you can override the start/end times and the days of the week mask for one service object or a group of them (if you create one) here. Or just stick with your discovery settings. The world is your oyster.

Additional Monitoring Features Explained

  • Console tasks to start/stop and get service status. Pretty self-explanatory. They will show up on the alerts as well.

wsm_consoletasks

  • Next is the recovery task. This is disabled by default. When enabled, it will try to start a service automatically 3 times in a 24 hour period by default. If you would like to use this feature or change the settings, you can override these values in the “Windows Service Monitor (Custom MP)” monitor.

wsm_monitorrecovery

  • Also, there is another monitor that watches for when the service is restarted 3 times in a 24 hour period. This is enabled to alert by default and is called “Excessive automated service restart monitor”. It’s a timer reset based monitor, meaning that when it alerts it will stay open and then close after a period of time (24 hours by default).
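
If the three strikes idea is new to you, here is a minimal sketch of the logic in Python, for illustration only. It is not the code the MP runs. The class name `RestartLimiter` is hypothetical, and treating the 24 hour period as a rolling window is my assumption; the recovery in the MP may count the period differently.

```python
from datetime import datetime, timedelta

class RestartLimiter:
    """Allow at most `max_restarts` restart attempts per rolling `window`."""

    def __init__(self, max_restarts=3, window=timedelta(hours=24)):
        self.max_restarts = max_restarts
        self.window = window
        self.attempts = []  # timestamps of recent restart attempts

    def may_restart(self, now):
        # Forget attempts that have aged out of the window.
        self.attempts = [t for t in self.attempts if now - t < self.window]
        if len(self.attempts) >= self.max_restarts:
            return False  # strike limit reached: leave the service alone
        self.attempts.append(now)
        return True

limiter = RestartLimiter()
start = datetime(2016, 2, 25, 8, 0)

# Four failures one hour apart: the fourth restart is refused.
results = [limiter.may_restart(start + timedelta(hours=h)) for h in (0, 1, 2, 3)]
print(results)  # [True, True, True, False]

# 25 hours after the first attempt, old strikes have expired.
print(limiter.may_restart(start + timedelta(hours=25)))  # True
```

The refused attempt is the situation the “Excessive automated service restart monitor” is there to alert on.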


I think that’s the basics on how things work. Again this is really almost a POC version of this management pack. I would love it if you would try it out and let me know your feedback! You can reach me via the comments or LinkedIn: www.linkedin.com/pub/andy-leibundgut/ab/23/a72

Honey do list:

  • See about adding performance gathering rules (disabled by default)
  • Add some views
  • Maybe some reporting?
  • Add an alert description that can be configured per service in the discovery (ServiceMPEditor.ps1) so each service can have its own unique “KB” in the alert description.
  • Make ServiceMPEditor.ps1 a little prettier.
    • Maybe add a folder picker.
    • Prefill Management Server/Folder with last location (config file or something)
  • What else? Let me know your ideas!


13 thoughts on “SCOM – Windows Service Monitor Management Pack”

  1. We’re using something similar.
    Some things to add could be:
    - interval monitoring with option to go red after X times
    - wildcard service discovery

    • Thank you, love the ideas. I’ll include them in the next update. Been busy lately but I might carve out some time next week to work on improvements.


  2. Thx, looks good! Is there any way to target certain servers? Or does the discovery run on every server?
    Is there any way to change the alert name for each service?

  3. Nice, I made some small changes and added a state widget view to the MP just to get a basic overview of monitored services.

    I had no problems in my test environment, so I am doing a larger scale test to see if any other problems come to the surface.

    I think this is a good balance between usability and function, since not many people are able to author their own MPs. Having to train people to use VS just to add monitoring of new services is a bit of overkill. The built-in SCOM service monitor template is a bit problematic in SCOM 2016. For one thing, it does not allow for triggering a command channel if you want to run some scripts (like sending SMS). The e-mail bit works fine, just not the command channel. Go figure.

    If I find bugs I will fix them and report back.
    Thanks again

    • Hi again, found a bug in the discovery. If the service description field is empty for a service you want to monitor, the VBS script will throw a type mismatch error. I fixed it by checking if the description field was null and just adding some text.

      This is the code I used:
      ' Use placeholder text when the service description is empty or null
      If IsEmpty(objSvc.description) Then
          strDescription = "No description"
      ElseIf IsNull(objSvc.description) Then
          strDescription = "No description"
      Else
          strDescription = objSvc.description
      End If

      • Awesome, thanks! I’ll incorporate it into my next release. I need to quit slacking and get to work on the next round of fixes/enhancements! ;P


  4. Very useful article. But there is one point. When discovery runs, it selects displayname of service. But in class definition there is property “DisplayName” and this is confusing, because there is also property “DisplayName” in system entity. I would suggest renaming “DisplayName” to “ServiceDisplayName” and maybe “Description” to “ServiceDescription”


  5. Let me give one more comment. The ConditionDetection in unit monitor type “WindowsService.CheckNTServiceState.MonitorType” is not complete.
    In particular, if my service is disabled and I want to monitor it with CheckStartupType=false, then (if the service is not running) the CD will return nothing (neither “ServiceRunning” nor “ServiceNotRunning”). I think this should not be allowed, although this situation is not common.
    Maybe it is better to insert a separate script like in this monitor type (probe action with ID “CheckServiceState”)?
    http://systemcentercore.com/?GetElement=Microsoft.SQLServer.2012.CheckWinServiceStateMonitorType&Type=UnitMonitorType&ManagementPack=Microsoft.SQLServer.2012.Monitoring&Version=6.7.15.0

  6. I tried modifying the unit monitor type “WindowsService.CheckNTServiceState.MonitorType”:
    1. replace the data source with a “Microsoft.Windows.WmiProvider”-based one rather than “Microsoft.Windows.Win32ServiceInformationProvider”
    2. add a probe action (like “Microsoft.SQLServer.2012.VerifyWindowsServiceState”, see link) to detect whether the service is running
    3. simplify the condition detection, because the detailed check is done in the probe action.
    It works and became simpler (because the computer name is not necessary for “Microsoft.Windows.WmiProvider”, and the condition detection is simpler). In the probe action you can check whether “CheckStartupType” is spelled properly.

    I would also suggest removing the “bDebug” flag all over the XML. In my opinion it is simpler to set a constant variable directly in the script.

  7. I noticed that in the MS MP the frequency is set to 60 sec. Maybe it is better to use this value (instead of 30 sec).
    Also, there is one interesting option, “Unavailable time”, which allows a service to be in a stopped state without alerting during this specific period of time.
