Service Status Monitoring using PowerShell
PowerShell is a blessing for Windows administrators. With the level of reach that PowerShell has in Windows, there is hardly anything in Windows administration that cannot be achieved using PowerShell.
Recently, a colleague requested a script that could help them monitor a set of services on a set of computers. They wanted the script to run every few minutes, check whether specific services were running on a specified set of servers, and notify them if a service wasn't running.
It is a pretty simple, straightforward requirement. Or so it seems at first. I mean, how complex can a service check be? In reality, it is simple, no doubt. However, there are a few things that you might want to remember. This post discusses the following:

- The not-so-good solution
- Scalability is the key
- The script
- The input file
- How the script works
The not-so-good solution
It is really tempting to create a hard-coded script with the service name in it. This script can then be set to run from the Task Scheduler in Windows. It can be kicked off at regular intervals, and emails can be triggered for each of those services that aren’t running. It is, in fact, a matter of running a one-liner:
```powershell
if ((Get-Service blah).Status -ne 'Running') { Send-MailMessage -From 'bot@domain.com' -To 'WindowsAdmin@domain.com' -Subject 'Blah not running' -Body "Hey, Admin!`n`nI see that Blah is not running on SVR001.`n`nYou might want to check this.`n`nBot" -SmtpServer 'smtp.domain.com' }
```
That actually does the job. But what if I want to monitor seven services on the server? Still possible. Okay, what if, as with any enterprise, I want to monitor seventy servers for different sets of services on them? Would you create four hundred scripts and manage them from seventy servers? And worse, receive thirty different notifications from your servers that are being patched during a maintenance window?
Scalability is the key
In my current environment, we have one server that handles all of the PowerShell-based automation solutions in my infrastructure. If I wanted a script to monitor four hundred services across seventy servers, this is the server where I would schedule the task to do that.
You would ideally want to write one script that takes care of all of the services on all of the servers. Here is what I translated the requirements into:
- Allow multiple servers to be monitored
- Allow multiple services to be monitored on each of the servers
- Allow notification to multiple concerned stakeholders
- Centralise management
- Simplify configuration
- Offer an option to create a sample configuration file
- Send only one notification per administrator/group, with actionable services from all of the servers that they own
The script
Here is the script that can take care of all of the aforementioned requirements. I also threw in a little email formatting for better user experience.
```powershell
function main {
    Get-ServiceStatus -InputFilePath '\\path\to\input-file.csv' -From 'bot@domain.com' -SmtpServer 'smtp.domain.com'
}

function New-InputFile {
    param (
        # Path to the file
        [Parameter(Mandatory=$true, Position=0)]
        [string]
        $Path
    )
    $Fields = 'ComputerName,Service,NotificationEmail',
              'SVR001,WinRM,admin@domain.com;me@domain.com;you@domain.com,<< Use this line as a guide; delete it before using the script.'
    New-Item -Path $Path -ItemType File -Value ($Fields | Out-String).Trim() -Force
    Write-Host "New file was successfully created at $Path"
}

function Get-ServiceStatus {
    [CmdletBinding(DefaultParameterSetName='FromFile')]
    param (
        # Path to the input file
        [Parameter(Mandatory=$true, Position=0, ParameterSetName='FromFile')]
        [string]
        $InputFilePath,

        # Path at which a new input file template should be created
        [Parameter(Mandatory=$true, ParameterSetName='NewTemplate')]
        [string]
        $NewTemplatePath,

        # Address from which the email should be sent
        [Parameter(Mandatory=$true, ParameterSetName='FromFile')]
        [string]
        $From,

        # SMTP server FQDN
        [Parameter(Mandatory=$true, ParameterSetName='FromFile')]
        [string]
        $SmtpServer
    )
    begin {
        if ($InputFilePath) {
            try {
                Write-Verbose 'Importing contents of the input file.'
                $ServiceRecords = Import-Csv $InputFilePath -ErrorAction Stop
            }
            catch {
                Write-Warning $PSItem.Exception
                Write-Error 'Could not read the file.'
                break
            }
        }
        $style = "<style>BODY{font-family:'Segoe UI';font-size:10pt;line-height: 120%}h1,h2{font-family:'Segoe UI Light';font-weight:normal;}TABLE{border:1px solid white;background:#f5f5f5;border-collapse:collapse;}TH{border:1px solid white;background:#f0f0f0;padding:5px 10px 5px 10px;font-family:'Segoe UI Light';font-size:13pt;font-weight: normal;}TD{border:1px solid white;padding:5px 10px 5px 10px;}</style>"
        $ServiceStatusTable = @()
    }
    process {
        if ($NewTemplatePath) {
            if (Test-Path $NewTemplatePath) {
                $NewTemplatePathItem = Get-Item $NewTemplatePath
                if ($NewTemplatePathItem.PsIsContainer) {
                    Write-Verbose "The path given is that of a directory. Creating a new input file in the directory."
                    New-InputFile -Path "$NewTemplatePath\Input.csv"
                }
                elseif ($NewTemplatePathItem.Extension -eq '.csv') {
                    if ((Read-Host "A file exists at the specified path. Would you like to overwrite it?") -imatch '^y') {
                        New-InputFile -Path $NewTemplatePath
                    }
                }
                else {
                    Write-Warning "The path specified is neither a directory, nor a CSV file. Attempting to create the file anyway."
                    New-InputFile -Path $NewTemplatePath
                }
            }
            elseif ($NewTemplatePath -match '\.csv$') {
                Write-Verbose 'Creating a new CSV file at the path specified.'
                New-InputFile -Path $NewTemplatePath
            }
            else {
                New-InputFile -Path "$NewTemplatePath\Input.csv"
            }
        }
        else {
            foreach ($Record in $ServiceRecords) {
                try {
                    $ServiceStatus = (Get-Service -Name $Record.Service -ComputerName $Record.ComputerName -ErrorAction Stop).Status
                }
                catch {
                    # Record the reason for the failure instead of an arbitrary status
                    $ServiceStatus = $PSItem.Exception
                }
                $ServiceStatusTable += New-Object PsObject -Property @{
                    ComputerName      = $Record.ComputerName
                    ServiceName       = $Record.Service
                    Status            = $ServiceStatus
                    NotificationEmail = $Record.NotificationEmail
                }
            }
            # One email per recipient list, covering all of their servers
            $StoppedServices = $ServiceStatusTable |
                Where-Object Status -ne 'Running' |
                Group-Object NotificationEmail
            if ($StoppedServices) {
                foreach ($Group in $StoppedServices) {
                    $FilteredStoppedServices = $Group.Group |
                        Select-Object ComputerName, ServiceName, Status |
                        ConvertTo-Html -As Table -Fragment | Out-String
                    $Body = ConvertTo-Html -Head $style -Body '<p>Hi Team,</p><p>The following services were found to be not running when the status was checked by the Automated Service Status Check monitor.</p>', $FilteredStoppedServices, '<p>Please take actions as necessary.</p><p>Thanks,<br />Service Check Bot</p>' | Out-String
                    Send-MailMessage -From $From -To ($Group.Name -split ';').Trim() -SmtpServer $SmtpServer -Subject 'Services found to be not running' -Body $Body -BodyAsHtml
                }
            }
        }
    }
}

main
```
The input file
PowerShell handles structured data better than any shell out there, and input in the form of structured files is ideal. While PowerShell can handle XML, JSON, and so on, nothing probably beats the simplicity of a CSV, from a Windows admin standpoint. The primary reason I choose CSV over other formats in such cases is that a CSV opens in Excel, and is really easy to update for most people.
Here's some sample data to help you configure the input file (I use the table format to help you understand it better; the values are illustrative):

| ComputerName | Service | NotificationEmail |
|--------------|---------|-------------------|
| SVR001 | WinRM | admin@domain.com;me@domain.com |
| SVR001 | Spooler | admin@domain.com |
| SVR002 | W32Time | me@domain.com;you@domain.com |
This type of input enables flexibility: a certain service on a server may interest one set of administrators, while several services on a single server might concern a single administrator. This format handles both situations gracefully.
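And reading the file back is trivial; each row becomes an object with those three properties (illustrative output, using the path from the `main` function above):

```powershell
Import-Csv '\\path\to\input-file.csv'

# ComputerName Service NotificationEmail
# ------------ ------- -----------------
# SVR001       WinRM   admin@domain.com;me@domain.com
# SVR001       Spooler admin@domain.com
# SVR002       W32Time me@domain.com;you@domain.com
```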
How the script works
This script uses the same skeleton that my other scripts use, including the `main` function. I prefer this model because all of your configuration is presented front-and-centre. You don't have to look around or dig to find something to change. If your SMTP server address has changed, you only have to change that one line, which is in the `main` function.
Next, there is a function to create a new input file. This is in line with the Don't Repeat Yourself (or DRY) principle. I will explain this further in a bit.
The major chunk of this script is the `Get-ServiceStatus` function. It accepts four parameters in all, spread across two parameter sets: `FromFile`, which is the default, and `NewTemplate`, which is used to create a new template. `NewTemplate` can be thought of as an add-on, just in case someone wants a new template. Of course, nobody would schedule this; the functionality is required only before the very first run of the script. I added it to ensure there is no mismatch in the column names: one wrong character in the header, and the script won't function the way you want it to.
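For instance, a one-off call like this generates the template (after loading the functions into your session; the path here is hypothetical, and because it points to an existing directory, the function creates `Input.csv` inside it):

```powershell
# One-off call before the first scheduled run; C:\Scripts is a hypothetical path.
Get-ServiceStatus -NewTemplatePath 'C:\Scripts'
# Creates C:\Scripts\Input.csv with the expected CSV header and a guide row.
```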
The `begin` block contains three pieces: importing the input file, the style definition for the HTML email, and initialising the service status table. The flow breaks out of the function if the input file could not be imported for some reason. (Completing the template path, in case the path specified lacks a file name or points to a directory, happens later, in the `process` block.)
The `process` block starts with the case where `NewTemplatePath` is given. The script goes through most possible situations when `NewTemplatePath` is specified, and based on what is passed as the value, creates the right kind of file, or exits the function altogether.
Here is where DRY comes into play. For instance, the action would be the same in case:
- The specified path is a directory
- The specified path does not exist at all
Similarly, the action will be the same in case:
- The template path does not exist, but the administrator has specified the file name and extension
- The new template path contains the right extension, the file already exists, but the administrator confirms the overwrite
Specifying the input file and asking for a template are mutually exclusive (we ensure this by means of parameter sets). Therefore, these can be handled using a simple if–else split. The `if` part creates the template; the `else` part handles generating the report and sending it out.
Let's concentrate on the `else` part, which is the core idea of this script. We start with the first record in the sheet. The status of the service is checked on the specified server.[^1] There is a good chance that this check fails, for any number of reasons. If it does, the service status should not show an arbitrary value. Therefore, we record the error that occurred in the `Status` column. This not only ensures that no wrong information is recorded, but also gives the administrator an explanation as to what failed.
Whenever an error occurs, the automatic variable `$PSItem` (an alias of `$_`, the current object in the pipeline; inside a `catch` block, it holds the current error record) is assigned the error. We simply need the exception in this case, and therefore, we only pick the `Exception` property (the error record itself has many fields, which we don't need in this context).
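A quick way to see what lands in `$PSItem` (the service name here is deliberately bogus):

```powershell
try {
    Get-Service -Name 'NoSuchService' -ErrorAction Stop
}
catch {
    # $PSItem is the current error record; Exception carries the details.
    $PSItem.Exception.Message
}
# e.g. "Cannot find any service with service name 'NoSuchService'."
```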
We then combine data from the already-existing `$Record` and the newly obtained `$ServiceStatus` into a new object, by passing a hash table of properties to `New-Object`, and append it to `$ServiceStatusTable`. When all the records in the CSV have been looped through, we have a complete table with all the specified services on all the specified computers.
We are only concerned with services that are not running.[^2] Therefore, we filter the output to contain only those services that are not running at the moment. We want to send only one notification per recipient/group. For instance, I own four servers, and care about a dozen services across them. I should neither receive four, nor twelve notifications, but only one. I co-own another two servers with a John Doe in my team. John and I should receive only one email about any of the services on those servers. (Which means I receive two notifications in all; anything more refined could become more complicated than necessary.)
Solution: Group the services by the notification email recipient.
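Here is roughly what that grouping produces (illustrative output, based on the sample input above):

```powershell
$ServiceStatusTable | Where-Object Status -ne 'Running' | Group-Object NotificationEmail

# Count Name                           Group
# ----- ----                           -----
#     2 admin@domain.com;me@domain.com {@{ComputerName=SVR001; ServiceName=WinRM; ...}, ...}
#     1 me@domain.com;you@domain.com   {@{ComputerName=SVR002; ServiceName=W32Time; ...}}
```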
Next, you don't want anything done if all the specified services are running. Therefore, you wrap the further actions in an `if` block. We loop through each group, and create an HTML table fragment with the relevant service information. We only need the computer name, the service name and the status, but not the notification email. So, we select only those three properties from the table, convert the result into an HTML table fragment, and output it as a string.
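The fragment is just the bare table markup; roughly:

```powershell
$Group.Group | Select-Object ComputerName, ServiceName, Status |
    ConvertTo-Html -As Table -Fragment

# <table>
# <colgroup><col/><col/><col/></colgroup>
# <tr><th>ComputerName</th><th>ServiceName</th><th>Status</th></tr>
# <tr><td>SVR001</td><td>WinRM</td><td>Stopped</td></tr>
# </table>
```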
We then compose the email body. We specify the styling (that we set in `begin`), add a couple of paragraphs to the HTML body, then the HTML table fragment, then another couple of paragraphs, and output everything as a string. This string is saved in the variable `$Body`.
Finally, we send the email message using the `Send-MailMessage` cmdlet. Notice how we split `$Group.Name` at `;` and trim the elements. This is for cases where there are multiple recipients. The `To` parameter of `Send-MailMessage` accepts a string array; it cannot, by itself, send an email to addresses separated by semicolons. Therefore, we combine the convention (of separating email addresses with `;`) and a little string manipulation to cater to the requirement. We use `Trim()` to trim off spaces in addresses (in case someone entered `one@domain.com; two@domain.com`). Also, picking the addresses becomes a cakewalk because of grouping by recipient list; that is the group name!
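To illustrate the split-and-trim (calling `Trim()` on the array trims every element, thanks to member enumeration in PowerShell 3 and later):

```powershell
'one@domain.com; two@domain.com' -split ';'
# one@domain.com
#  two@domain.com      <- note the leading space

('one@domain.com; two@domain.com' -split ';').Trim()
# one@domain.com
# two@domain.com
```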
Of course, do not forget the `BodyAsHtml` switch.
That brings us to the call to `main`, which happens at the end. If you did not have the `main` function, you would've had to make configuration changes at the end of the script, which wouldn't have been the best of experiences. Therefore, the configuration is done in the very beginning, and the call is made in the end.
Summary
In this post, we looked at how many things we may have to consider to cover the most common possibilities of issues when it comes to creating a script to monitor services. We wrote a script that is (of course, not infinitely, but greatly) scalable, and easily configurable. We wrote the script with quite a few best practices in mind, and made it modular.[^3]
We set the logic in such a way that the script is efficient, and the information it sends is useful and human-readable. We also built the ability to create the input file into the script itself.
Now, a confession: I considered enabling the script to write to the Event Log, but then decided against it, thinking it would seem more complex than is necessary for this situation. If you would like it, do [let me know](https://twitter.com/{{ site.twitter }}), and I'll add the functionality. Consider this situation: the script is set to run every five minutes, to monitor fifty servers. All the specified services on all fifty servers are running. Therefore, no email is triggered to anyone. How would we know the script ran at all? Sure, we could create a simple log file with a check-in that the script was started, but that's not exactly a good practice. If we could have an entry made to the Event Log on every run, and also record every error into the Event Log, wouldn't that be great?
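A minimal sketch of what that could look like, assuming a custom event source named `ServiceCheckBot` (a hypothetical name; registering the source is a one-time step that needs administrative rights):

```powershell
# One-time setup, run elevated: register a source in the Application log.
New-EventLog -LogName Application -Source 'ServiceCheckBot'

# At the start of each run:
Write-EventLog -LogName Application -Source 'ServiceCheckBot' -EventId 1000 `
    -EntryType Information -Message 'Service status check started.'

# Inside the catch blocks:
Write-EventLog -LogName Application -Source 'ServiceCheckBot' -EventId 1001 `
    -EntryType Error -Message "Service check failed: $($PSItem.Exception.Message)"
```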
So, that, Ladies and Gentlemen, is the script to monitor services across multiple computers. The code is available on GitHub. You are free to use it, and even modify it if you want. If you think a modification can benefit more people, feel free to submit a pull request. However, remember the best practices and the conventions followed in the script. Submissions that make the script a monolith, or deviate from administration best practices, will be rejected (with an explanation, of course).
[^1]: It seemed to me as though querying for multiple services on the same server, using multiple queries, was inefficient. For instance, if I had to monitor four services on a server, this script makes four calls to the server. However, the alternative methods I tried either consumed more time than multiple queries (unless you were monitoring more than seven services on the same server), or increased the complexity of the script. If everything worked, error handling broke. (A sketch of one such alternative follows.) So, if you happen to find a better way to handle this, please make the change and submit a pull request. I'd be rather grateful!
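    One batched approach, sketched under the assumption of one `Get-Service` call per computer (this is what costs you the per-service error granularity the script currently has):

    ```powershell
    # Hypothetical alternative: query all of a computer's services at once.
    foreach ($ComputerGroup in ($ServiceRecords | Group-Object ComputerName)) {
        Get-Service -Name $ComputerGroup.Group.Service -ComputerName $ComputerGroup.Name
    }
    ```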
[^2]: Of course, there is no harm in sending an "Everything is good!" report to administrators, but consider the situation where the script is designed to run every five minutes. The administrator could get the "Everything is good!" email 288 times in a day! Never do anything like that unless you want failures to be ignored as well, because, as humans, that's what we do with overloads.
[^3]: In my experience, everywhere I look, PowerShell scripts are monoliths. It is sad that this has become the norm. PowerShell is a programming language, too, and we should treat it as such. A good developer cares about their program; a good administrator cares about efficiency. If you write PowerShell scripts, you are a developer-administrator. You should care about this stuff.