An assortment of indigestible things

Monitoring SANsymphony-V with Nagios

Update (6-Feb-2012): I am reliably informed that the next major release of SANsymphony-V (9.0) will include direct SNMP support, making this nasty procedure unnecessary. Hooray 🙂

It seems that every time I install a new product for production, I have to find new and amusing ways to monitor everything to make sure I’m alerted in case of untowardness. My monitoring solution has Nagios at its core, so I want every alarm and fault condition to appear in the same place.

I’m in the early stages of implementing SANsymphony-V, which, despite its clumsy name, is a rather clever way of presenting replicated storage to a vSphere cluster. It presents iSCSI LUNs to hosts while looking after all the replication and mirroring nastiness itself, hiding the physical storage from the rest of the infrastructure. I might write more about it when I’ve finished the implementation, but for now let’s look at monitoring.

As shipped, SSV (much easier to type) displays its alerts in its own GUI. It can be configured to send emails when things go wrong by creating a ‘task’, but getting alerts this way is inflexible: I can’t collect alert statistics, configure who gets email and when, and so forth. SSV’s task feature is a bit noddy anyway, and I’m sure it wouldn’t be long before my needs would exceed its capabilities.

SSV also comes with a whole load of PowerShell cmdlets. I’m a Unix guy and PowerShell is completely foreign to me, so I’m feeling my way around here. Comments from seasoned PowerShellers welcome 🙂

Prerequisites

To make this work, we need:

  • A working Nagios installation
  • SSV installed on a Windows box
  • NSClient++ installed and working in NRPE mode on the SSV server

If you haven’t seen it before, NSClient++ is like NRPE for Windows: it lets Nagios monitor things like disk space and CPU usage without having to get a decent SNMP server running. It’s a small and simple bit of software that just sits there and works. I like that a lot.

Getting scripts to run non-interactively

You’d think that the good people at DataCore would have thought of this and made it easier. As shipped, the cmdlets can only be run interactively, because there’s some registration script that needs to run beforehand, and this spawns an interactive session. The documentation mentions that the cmdlets can be used ‘in a scripting environment’, but is silent on how this is possible. Thankfully, I found this thread on DataCore’s forums, and I’ll reproduce the important bits here.

It seems that our PowerShell script has to start with this bit of code, which comes from Register-DcsCmdlets.ps1 in SSV’s installation directory.

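# Find the product's registry key, which is named under DataCore\Executive
$bpKey = 'BaseProductKey';
$regKey = get-Item "HKLM:\Software\DataCore\Executive";
$strProductKey = $regKey.getValue($bpKey);
# Read the installation directory from that key
$regKey = get-Item "HKLM:\$strProductKey";
$installPath = $regKey.getValue('InstallPath');
# Load the cmdlets; stop immediately if the DLL cannot be imported
Import-Module "$installPath\DataCore.Executive.Cmdlets.dll" -DisableNameChecking -ErrorAction Stop;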

We also have to create C:\Windows\System32\WindowsPowerShell\v1.0\PowerShell.exe.config containing

<?xml version="1.0"?>
<configuration>
  <startup useLegacyV2RuntimeActivationPolicy="true">
    <supportedRuntime version="v4.0.30319"/>
    <supportedRuntime version="v2.0.50727"/>
  </startup>
</configuration>

(That forum thread suggests that the location of this file should be C:\Windows\SysWOW64\WindowsPowerShell\v1.0, but that didn’t work for me, despite this being a 64-bit installation of 2008R2. I put the file in both places to be extra-sure.)
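On 64-bit Windows, System32 holds the 64-bit PowerShell and SysWOW64 the 32-bit one, so which config file gets read depends on which PowerShell is actually launched. A quick way to find out is to ask the process itself. This is only a minimal sketch using built-in .NET properties; the function name Get-RuntimeInfo is my own invention.

```powershell
# Hypothetical helper: report the bitness and CLR version of the current process.
function Get-RuntimeInfo {
    # [IntPtr]::Size is 8 in a 64-bit process and 4 in a 32-bit one.
    $bits = [IntPtr]::Size * 8
    # [Environment]::Version is the version of the runtime hosting this session.
    $clr = [System.Environment]::Version
    "PowerShell is running as ${bits}-bit process on CLR $clr"
}

Get-RuntimeInfo
```

If the config file is being picked up, I would expect the CLR version to start with 4.0 rather than 2.0.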

Installing the test script and configuring NSClient++ to use it

My test script, including the block of code above, is very simple and looks like this:

# Load the DataCore cmdlets (same block as above)
$bpKey = 'BaseProductKey';
$regKey = get-Item "HKLM:\Software\DataCore\Executive";
$strProductKey = $regKey.getValue($bpKey);
$regKey = get-Item "HKLM:\$strProductKey";
$installPath = $regKey.getValue('InstallPath');
Import-Module "$installPath\DataCore.Executive.Cmdlets.dll" -DisableNameChecking -ErrorAction Stop;

# Count the alerts currently raised on the server
$alerts = 0
Connect-DcsServer
Get-DcsAlert | ForEach-Object { $alerts++ }
Disconnect-DcsServer

# Report in Nagios plugin style: one status line plus an exit code
if ($alerts -gt 0) {
    "CRITICAL: $alerts alerts"
    exit 2
} else {
    "OK: no alerts"
    exit 0
}

In other words, it exits with a status of 2 (i.e. ‘critical’) if there are any alerts, and 0 (‘OK’) if there are none. Save the script as C:\Program Files\NSClient++\scripts\ssv_alerts.ps1.
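One gap in the simple script is that a failed connection is not reported as its own state. Nagios plugins can also exit with 3 (‘unknown’), and the sketch below shows one way to map any error to that status. The helper names Get-AlertStatus and Invoke-AlertCheck are hypothetical, and the demonstration uses stand-in data rather than a live SSV server.

```powershell
# Nagios plugin exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.

# Hypothetical helper: turn an alert count into a status line and exit code.
function Get-AlertStatus([int]$Count) {
    if ($Count -gt 0) {
        return @{ Code = 2; Text = "CRITICAL: $Count alerts" }
    }
    return @{ Code = 0; Text = "OK: no alerts" }
}

# Hypothetical helper: run a script block that emits alerts, and map any
# terminating error to UNKNOWN instead of letting the script die.
function Invoke-AlertCheck([scriptblock]$GetAlerts) {
    try {
        # @(...) forces an array, so .Count works for zero, one or many alerts.
        $n = @(& $GetAlerts).Count
        return Get-AlertStatus $n
    } catch {
        return @{ Code = 3; Text = "UNKNOWN: $($_.Exception.Message)" }
    }
}

# Demonstration with stand-in data instead of a live SSV server.
$result = Invoke-AlertCheck { 'alert one', 'alert two' }
$result.Text    # CRITICAL: 2 alerts

# In the real check this might become the following (assumption: the DataCore
# cmdlets honour the common -ErrorAction parameter):
#   $result = Invoke-AlertCheck {
#       Connect-DcsServer -ErrorAction Stop | Out-Null
#       Get-DcsAlert -ErrorAction Stop
#   }
#   Disconnect-DcsServer | Out-Null
#   $result.Text
#   exit $result.Code
```

Keeping the status mapping in its own function means it can be tried out on any machine with PowerShell, without SSV installed.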

Now edit C:\Program Files\NSClient++\NSC.ini and make sure that CheckExternalScripts.dll is not commented out. Then add this line:

check_ssv_alerts=cmd /c echo scripts\ssv_alerts.ps1; exit($lastexitcode) | powershell.exe -command -
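For orientation, here is roughly where those two changes sit in the file. This is a sketch that assumes the stock NSC.ini layout of NSClient++ 0.3.x; the section names are from memory, so check them against your own file rather than pasting this in wholesale.

```ini
[modules]
; NRPE mode needs the listener, and external scripts need their own module
NRPEListener.dll
CheckExternalScripts.dll

[External Scripts]
; alias=command line; the alias is what check_nrpe asks for with -c
check_ssv_alerts=cmd /c echo scripts\ssv_alerts.ps1; exit($lastexitcode) | powershell.exe -command -
```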

Don’t restart NSClient++ yet; we have one more change to make (and you might not like it).

Granting administrator rights to NSClient++

This is the part that makes me a bit twitchy. Although I have NSClient++ locked down to a single IP address, and it’s firewalled in all sorts of ways, I don’t really want to run it as administrator. However, I found that the Connect-DcsServer cmdlet won’t run if the service logs on as ‘Local Service’. It just times out, even if given a username and password, and I’ve no idea why. As I said at the start, this is foreign territory to me, so the amount of debugging I can do is limited. If you know a better way to do this, I’d really love to hear it!

If you’re happy to make this change (and, on balance, I decided it was worth it), open the Server Manager, select ‘Services’, right-click on NSClient++, choose ‘Properties’, select the ‘Log On’ tab, and enter the credentials for a local administrator account. You’ll need to restart the NSClient++ service to make this take effect.

Does it work?

On your Nagios server, try running a check command by hand to see if you’ve done everything right.

nagios:~$ /usr/lib/nagios/plugins/check_nrpe -H <hostname> -c check_ssv_alerts
OK: no alerts
nagios:~$ echo $?
0
nagios:~$

Woo-hoo! It all looks great. The ‘echo $?’ was to check that the exit status matched the message. To make absolutely sure things are working, we need to make SSV display an alert and try the script again. This time, the result of ‘echo $?’ should be 2. This is all that Nagios needs to interpret the result and generate an alarm accordingly.

Now all that’s left is to create a Nagios service that uses this check. I won’t bother showing you how to do that, as you’re already running Nagios and I don’t want to teach you to suck eggs. Comments are of course very welcome.
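For anyone who would like a starting point anyway, a minimal sketch of the Nagios side might look like this. The host name and the generic-service template are assumptions, and many installations already define a check_nrpe command, in which case only the service block is needed.

```cfg
# Assumes check_nrpe lives in the plugin directory that $USER1$ points at
define command {
    command_name    check_nrpe
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

define service {
    use                     generic-service       ; template name is an assumption
    host_name               ssv-server-1          ; hypothetical host object
    service_description     SANsymphony-V alerts
    check_command           check_nrpe!check_ssv_alerts
}
```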


1 Comment

  1. Dave

    Thank you so much for your efforts. We were using melody and relied on perfmon and nagios hopefully now we can get the info we need which is used space on nmvs.
