Wednesday, November 27, 2013

KB2775511 & Hotfix

As I blogged before, KB2775511 causes issues on Windows Server 2008 R2 SP1 in conjunction with SCOM 2007/2012, whether it’s an Agent or the Management Server.

Some weeks ago Microsoft published a hotfix for this issue, KB2878378. Apply it when you’re experiencing this issue.

Thursday, November 21, 2013

Troubleshooting EventID 33333

Prelude
Even though SCOM monitors itself, the Operations Manager event log on the SCOM Management Servers still tells a lot more. So periodically I go through those event logs on the SCOM Management Servers in order to check whether everything is okay.

Hello EventID 33333! :(
This way I bumped into a SCOM 2012 R2 Management Server logging EventID 33333 way too many times:
image
I live by the credo: ‘…ONE event isn’t an event.’ Or in other words: when a single event happens and the rest is okay, it was just a blip and nothing more.

But this tells me a different story: something isn’t going as planned. Time for some investigation.

Loving the event log
Seriously, I do! Why? Because the events contain so much information. I have done a lot of troubleshooting and most of the time the Operations Manager event logs were the starting point of my investigations and also the clue to the solutions.

And in this case the event log helped me a lot since 99% of the EventID 33333 logged had the same sources in the description of the event:
image

As you can see the BaseManagedEntityId and MonitorId are logged, both with their GUIDs. Awesome! And yes, 99% of the events with EventID 33333 had the same GUIDs. So the cause was already pinned down to only ONE source and ONE monitor not functioning well. Awesome!
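The '99% of the events share the same GUIDs' observation can also be checked with a few lines of script once the event descriptions are exported as text. This is a minimal sketch in Python: the sample descriptions and their `Name=GUID` layout are made up for illustration (the real EventID 33333 text looks different), and only the idea of counting BaseManagedEntityId/MonitorId pairs comes from the post.

```python
import re
from collections import Counter

# A GUID in the usual 8-4-4-4-12 hexadecimal layout.
GUID = r"([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
ENTITY_RE = re.compile(r"BaseManagedEntityId\W+" + GUID)
MONITOR_RE = re.compile(r"MonitorId\W+" + GUID)


def count_pairs(descriptions):
    """Count how often each (BaseManagedEntityId, MonitorId) pair occurs."""
    counts = Counter()
    for text in descriptions:
        entity = ENTITY_RE.search(text)
        monitor = MONITOR_RE.search(text)
        if entity and monitor:
            counts[(entity.group(1).lower(), monitor.group(1).lower())] += 1
    return counts


# Hypothetical event descriptions, NOT the real EventID 33333 text.
NOISY = ("BaseManagedEntityId=11111111-2222-3333-4444-555555555555, "
         "MonitorId=aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee")
OTHER = ("BaseManagedEntityId=99999999-8888-7777-6666-555555555555, "
         "MonitorId=ffffffff-0000-1111-2222-333333333333")
sample = [NOISY] * 99 + [OTHER]

counts = count_pairs(sample)
top_pair, top_count = counts.most_common(1)[0]
share = top_count / sum(counts.values())
print(f"{top_pair} accounts for {share:.0%} of the events")
```

When one pair dominates like this, there is exactly one object and one Monitor to translate into readable names.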

Sherlock Holmes or PowerShell?
Now it was time for some plain PowerShell cmdlets in order to translate the GUIDs into understandable human language.

  1. In order to get a proper name for the GUID attached to BaseManagedEntityId I ran this PS cmdlet:
    Get-SCOMClassInstance -Id 'GUID' | Format-Table DisplayName
    (Replace GUID with the GUID for the BaseManagedEntityId shown in the event description.)

    This gave me the FQDN of the BaseManagedEntityId. It turned out to be a monitored Windows Server.

  2. In order to get a proper name for the GUID attached to MonitorID I ran this PS cmdlet:
    Get-SCOMMonitor -Id 'GUID' | Format-Table DisplayName
    (Replace GUID with the GUID for the MonitorID shown in the event description.)

    This gave me the name of the Monitor involved. In this case it was the Monitor System Center Management Health Service Memory Utilization.

Health Explorer
Time to open Health Explorer for that particular Windows server. Since it’s a Monitor targeted against the Agent, I opened the Health Explorer of the Agent of that Windows Server. And this is what I saw:
image

Yikes! Flip flopping! This Monitor is not doing well on this particular Windows Server. On all other monitored Windows Servers this Monitor runs just fine. I checked about 20 other servers in order to be sure, but on none of those servers did this Monitor have issues. And the counter kept on growing…

So the culprit wasn’t the Monitor itself but the Windows Server.

The culprit
Time to start an RDP session with the Windows Server having issues with this Monitor. On this server I opened the Operations Manager event log as well. But all I got was this:
image

That’s not okay. But it could very well be the reason for the flip flopping. Time to run a repair of the SCOM Agent running on this server:

  1. Go to Programs and Features > right click Microsoft Monitoring Agent > Change;
  2. Next > select the Repair option > Next;
  3. Now the Agent will be repaired > Finish.

After this repair job I could open the Operations Manager event log. Apart from a few events it was empty and contained no errors.

On the Management Server side of things, the EventID 33333 stopped coming in from the moment the Agent on the Windows Server was repaired!

And in Health Explorer? The counter stopped. No more flip flopping!

Recap
Whenever you see an event (warning/critical) coming back in the Operations Manager event log on the Management Servers, chances are something is not okay.

Use those very same events as a starting point for your investigation and use PowerShell in order to get the understandable names of those GUIDs.

This way you obtain a lot of information within just a few minutes, aiding you in good old troubleshooting without ending up on a wild goose chase.

System Center 2012 R2 & Visio Stencils

My much respected Irish buddy Kevin Greene has a new hobby: creating Visio Stencils for many different components of the System Center 2012 R2 stack.

The score for now is already impressive since he has made Visio stencils for:

  1. DPM 2012 R2;
  2. SCOM 2012 R2 APM;
  3. SCOM 2012 R2 Infrastructure;
  4. SCOM 2012 R2 Network Monitoring;
  5. VMM 2012 R2.

And like the true community guy he is, he shares them with anyone who’s interested. So whenever you’re involved with one of these System Center 2012 R2 components, go here and GRAB those Visio stencils.

A BIG word of thanks to Kevin Greene for his effort AND willingness to share them with the public. Thanks man!
image

Removing An OM12 Gateway Server With A Site Association

As I blogged before, removing an OM12 Gateway Server with a Site association isn’t straightforward at all. It takes a lot of time and one has to go through non-supported steps in order to get things done.

Finally there’s some good news about it. Microsoft has fixed this issue with the release of SCOM 2012 R2. The tool Microsoft.EnterpriseManagement.GatewayApprovalTool.exe – included with the installation media of SCOM 2012 R2 – is rewritten for this purpose, so DON’T use previous versions of this tool.

On top of that, a new article has been posted on the System Center: Operations Manager Engineering Blog, all about this topic and how to use this tool in different scenarios.

Want to know more? Go here.

Tuesday, November 19, 2013

The Super Glue For Unleashed

In every family there is always a single person who builds bridges and brings different minds, generations and opinions together. In many families this role is taken on by the mother of the house. Somehow, somewhere she becomes the super glue bringing and keeping it all together, thus making it into a family.
image

When looking at the Unleashed series of books I often see the same set of highly skilled and respected authors. Without them these books wouldn’t stand out the way they do now. They set the standard so high that other books have a very hard time reaching the same level. And many times they simply don’t. Nonetheless, these are still great books.

And yet, even in this kind of setup there is also a single person who brings all the different minds, opinions and levels of experience together in order to give you, the reader, the experience that the book is written by a single person/entity. So for those books she becomes the super glue, bringing it all together and creating a situation where 1 + 1 isn’t 2 but becomes 2.5 or even 3!

For the Unleashed series of books that role is fulfilled by Kerrie Meyler. I know she isn’t the type of person to step into the spotlight. But IMHO she deserves a bit of extra attention, because she’s the silent force behind many of the Unleashed books.

And, perhaps for one time only, she wrote a post about herself, all about how she got where she is today. So for anyone who wants to gain a deeper understanding about the super glue of many of the books of the Unleashed series, this is THE posting to read.

Many times people ask me how I got so far in the community. One of the main reasons is that I learned a lot from people like Kerrie, Cameron, Pete, John, Anders, Marcus and so on. Compared to them I feel humble, and I am just happy to know these people personally.

And you know what? Every single day I learn new stuff, and when having questions I many times find the answers in the Unleashed series, which closes the circle…

Free eBook: Introducing Windows Server 2012 R2 Technical Overview

Some days ago Microsoft published a free new eBook, titled: Introducing Windows Server 2012 R2 Technical Overview.
image

The eBook can be downloaded in various formats. The related download links can be found here.

Cross Post: Jonathan Almquist About Best Practices

In the SCOM community there are a few BIG names which don’t need any introduction. IMHO Jonathan Almquist is one of them. He knows a lot about SCOM and has deep knowledge of and experience with the inner workings of SCOM and MP authoring.

Jonathan has started a whole new series of blog articles, all about Best Practices. He also explains why something is a Best Practice. Every single posting contains good information and should be read by anyone working with SCOM on a daily basis.

Jonathan’s blog can be found here. Simply look for the postings starting with Best Practices. Thanks Jonathan for sharing. Awesome!
image

PKI Certificate Verification MP & Windows Server 2012 R2 & SCOM 2012 R2

There are some real pearls made by the community, and the PKI Certificate Verification MP is one of them. However, the last version (1.0.1.20) dates from March 20, 2012.

Since that date much has changed. We have seen new versions of Windows Server and SCOM 2012 as well. So how does this MP work with Windows Server 2012, Windows Server 2012 R2 and SCOM 2012 R2?

Based on my own experiences I can tell you this:

The latest version of the PKI Certificate Verification MP (version 1.0.1.20) works well with:

  • Windows Server 2012;
  • Windows Server 2012 R2;
  • SCOM 2012 R2.

The only trick you need is to set the proper overrides for the Discoveries you want to enable. These are the Discoveries:
image

Many times I only enable the Discovery related to the certificates residing in the personal computer store of the monitored servers, the Discovery of local computer's personal certificate store (registry).

In the ‘old days’ this Discovery was enabled by using the Objects Windows Server 2008 Computer or Windows Server 2003 Computer.

All you have to do now is to enable this Discovery against the Objects Windows Server 2012 Computer and/or Windows Server 2012 R2 Computer.
image

Soon the Discovered Certificate Stores will be shown in SCOM 2012 R2 and the related Certificates as well.

So I am glad to see this pearl of the community is still valid :).

Monday, November 18, 2013

New MP: SUSE Manager Management Pack

A few days ago SUSE launched something very special: the SUSE Manager Management Pack for SCOM, enabling Windows systems administrators to view server health information and perform both Windows and Linux patching duties via the same console.

This is really good news since it shows that finally SCOM is taken seriously by the much respected Linux community and software companies.

The SUSE Manager Management Pack for SCOM has some requirements:

  1. SUSE Manager version 1.2 or higher;

  2. SUSE Manager Management Pack;

  3. System Center Operations Manager 2007 R2 or later.

Want to know more? Go here.

Tuesday, November 12, 2013

SCOM 2012 Management Servers: Better One Or Two Too Many…

When the dinos ruled the world
Back in the old days of SCOM 2007x one really had to consider how many SCOM 2007x Management Servers to roll out, mostly based on these two facts:
  1. Every single SCOM 2007x Management Server required a special SCOM license;
  2. Many environments were using physical hardware for their servers.

So any additional SCOM 2007x Management Server was an extra burden to the (many times already loaded) IT budgets. On top of these costs one had to consider the extra costs of the Server OS license as well.

In situations like these, often only the minimum number of required SCOM 2007x Management Servers was rolled out, never the optimum number, resulting in an underperforming SCOM 2007x Management Group.

Back to the current situation
With the roll out of the System Center 2012 Product, Microsoft revamped the license model accordingly. This resulted in the related System Center 2012 Management Servers becoming free of System Center 2012 licenses since only the managed end points require a SC 2012 license.

So in the case of the SCOM 2012 Management Servers, these servers became free of the SC 2012 license (of course, when these servers are ‘touched’ by Orchestrator for instance, these servers do require a SC 2012 license…).

On top of it, virtualization of workloads had become default as well. So instead of rolling out physical hardware for servers, VMs were spawned as required. And when the underlying virtualization hosts are covered by a Data Center license for the Windows Server OS, the VMs running on top of those same virtualization hosts are covered as well by a Windows Server OS license. So no hidden costs!

Good to know, the SC 2012 license comes in two flavors: Standard and Data Center. Only difference is virtualization density. All components found in the System Center 2012 Product are covered by both licenses. Many times the SC 2012 Data Center license is the best solution since many virtualization hosts do run many VMs.

And now a second nice thing kicks in: with the Data Center flavor of the SC 2012 license, all VMs running on that same virtualization host are covered automatically with a SC 2012 license, no matter what SC 2012 based workload you run on them.

SQL for free?
And yes, SQL Server comes for free for the System Center 2012 Product when those SQL Servers only run SC 2012 based workloads AND the Standard edition of SQL Server is used.

So what?
The nice thing here is that you don’t have to be lean & mean anymore when rolling out SCOM 2012 Management Servers. So when your design tells you to roll out 3 of them, add an additional one. Yes, it will take resources like disk IO, CPU and RAM. But that goes for any other VM as well of which many new ones are deployed on a weekly basis.

But why? Because we can?
No. There is more to it. With SCOM 2012 you can monitor network devices much better compared to SCOM 2007x. However, other MPs require SNMP to be present on at least one of the SCOM 2012 Management Servers.

For network monitoring SCOM 2012 uses an SNMP trap module of its own. And yes, the SNMP feature of Windows Server 2008 R2/2012 and that particular SCOM 2012 module don’t work well together.

In cases like these it’s better to use at least one dedicated SCOM 2012 Management Server, exclude it from network monitoring, and install the SNMP service on that server for those special MPs, like the SAN MP of HP for instance.

This way you know for sure these two components won’t bite each other, giving you a more stable SCOM 2012 environment.

Recap
When designing a SCOM 2012 environment I use this new rule of thumb for the quantity of needed SCOM 2012 Management Servers:

(Required SCOM 2012 Management Servers based on quantity of monitored objects + future growth) + 1 additional SCOM 2012 Management Server = Total amount of SCOM 2012 Management Servers to be rolled out.

This way you know you have a SCOM 2012 environment in place which can be used in a smart manner, with a dedicated additional role per Management Server, like communicating with the framework used by HP for monitoring their SAN solutions.
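For those who prefer the rule of thumb as a small calculation, here is a minimal sketch in Python. The function and parameter names are my own invention for illustration; only the arithmetic (sized number + future growth + 1) comes from the rule above.

```python
def management_servers_to_deploy(required_for_load: int,
                                 growth_reserve: int = 0,
                                 dedicated_extra: int = 1) -> int:
    """Rule of thumb: (sized servers + future growth) + 1 additional server.

    required_for_load: servers needed for the current number of monitored objects
    growth_reserve:    servers reserved for expected future growth
    dedicated_extra:   the additional server for special roles (e.g. SNMP based MPs)
    """
    if min(required_for_load, growth_reserve, dedicated_extra) < 0:
        raise ValueError("server counts cannot be negative")
    return required_for_load + growth_reserve + dedicated_extra


# A design that calls for 3 Management Servers ends up with 4.
print(management_servers_to_deploy(3))
```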

For more information about the System Center 2012 Product license model, check out this series of postings I wrote about it.

Monday, November 11, 2013

Windows Event Log Monitoring: How To Get The Proper Event Source

First some background information
With SCOM it’s a straightforward process to create new Monitors/Rules which are triggered by certain Windows event log entries. However, you only want to trigger those Monitors/Rules on the correct type of event logged in the Windows event log, since every false-positive Alert is one too many, breaking down the overall acceptance of SCOM and playing down the importance of the Alerts shown in the SCOM Console.

No NOISE please!
So the more filtering in that particular Monitor/Rule takes place, the better. This way the false-positives are skipped and only the Alerts which truly matter are triggered.

In the case of Monitors/Rules which are targeted at a certain event in the Windows Event log you need to add additional filters on top of the most basic one, which is the Event ID itself.

For instance, these additional filters can be used:

  1. Event Source;
  2. EventDescription as described by Kevin Holman in this blog posting. Also take note of his warning about using this additional filter, stated in the same posting.

In SCOM these two additional Parameter Names (which are the filters) will be added and configured:
image

So far so good. But beware. The Parameter Name Event Source can be a bit tricky. And when you don’t get it right from the start, NOT a single event will be captured by SCOM. Why? By default all these ‘filters’ (Parameter Names) are ANDed, so ALL of the filters must be met, or SCOM won’t pick up that event:
image

So when you get the Event Source wrong, not all filters in the AND group are met, thus causing SCOM to skip that particular event you want to be caught by SCOM in the first place…
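The effect of ANDed filters is easy to show with a few lines of script. This is only an illustration in Python of the logic, not of how SCOM evaluates expressions internally; the event values and the source name `Contoso-Service` are hypothetical.

```python
def event_matches(event: dict, filters: dict) -> bool:
    """Return True only when EVERY filter matches (logical AND)."""
    return all(event.get(name) == wanted for name, wanted in filters.items())


# Hypothetical event as it is logged on the monitored server.
event = {"EventID": 1001, "EventSource": "Contoso-Service"}

# All filters correct: the event is picked up.
good_filters = {"EventID": 1001, "EventSource": "Contoso-Service"}

# Correct Event ID but a slightly wrong Event Source: nothing is picked up.
bad_filters = {"EventID": 1001, "EventSource": "Contoso Service."}

print(event_matches(event, good_filters))  # True
print(event_matches(event, bad_filters))   # False
```

One wrong value in the AND group is enough to silence the whole Monitor/Rule, even though the Event ID itself is spot on.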

What proper value to use
So what Value do you have to use for the Parameter Name Event Source in order to make it work as intended?

When you know it (duh!) it’s easy. First I want to show you what value NOT to use for Event Source:
image

For any given EventID you want SCOM to trigger an Alert or to capture it for Reporting purposes and you use the Event Source as well, you DON’T use the Source of that EventID as depicted in the screen dump shown above.

Even when you select the whole Source (which is in this example Microsoft Windows security auditing., the dot included) SCOM won’t react at all.

Instead, open the EventID you want SCOM to act on and go to the second tab Details > select the option XML View > in the XML View go to Event > System > Provider Name:
image

The yellow highlighted entry is required for SCOM, which is in this example Microsoft-Windows-Security-Auditing. In SCOM the Value for Parameter Name Event Source will look like this:
image

And now we have the proper Event Source for SCOM which enables far more granular monitoring for certain EventIDs in the Windows Event Log.
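When you have the XML View of an event copied as text, the Provider Name can also be pulled out with a small script instead of by eye. A minimal sketch in Python, assuming the XML is available as a string: the sample XML below is trimmed down and its EventID is just an example value, while the Provider Name is the one from this post.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Windows event XML schema.
EVENT_NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def provider_name(event_xml: str) -> str:
    """Return the Name attribute found at Event > System > Provider."""
    root = ET.fromstring(event_xml)
    provider = root.find("e:System/e:Provider", EVENT_NS)
    if provider is None:
        raise ValueError("no Event > System > Provider element found")
    return provider.attrib["Name"]


# Trimmed-down sample; the EventID is an illustrative value only.
SAMPLE_XML = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Security-Auditing" />
    <EventID>4625</EventID>
  </System>
</Event>"""

print(provider_name(SAMPLE_XML))
```

The string printed here is the value to type into the Event Source filter, not the friendly Source name shown on the General tab.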

Recap
When building Monitors/Rules in SCOM which are triggered by certain EventIDs, the more filtering is used the better. However, keep Kevin Holman’s remarks in mind, and when using the Event Source as one of the filters, make sure you use the proper Value for it. Otherwise that Monitor/Rule will fail to work.

Thursday, November 7, 2013

SCOM 2012 R2 Network Monitoring: Where Are The Alerts?!

First some background information
Okay, SCOM 2007 had some serious issues with network monitoring. So in SCOM 2012 this component got a complete overhaul and is rewritten from the ground up. And indeed, network monitoring in SCOM 2012 has improved compared to SCOM 2007. But to say it has really become top notch is a bit too much.

No, SCOM 2012 won’t replace the purebred network monitoring tools. But guess what? Those tools will never replace SCOM 2012 either. Ever. No matter what the marketing departments of those very same vendors want you to believe.

But when the network monitoring part of SCOM 2012 is put into perspective (SCOM 2012 monitors tons of workloads, whether they’re on-premises, cloud based or mobile units, and from different angles, inside and outside) it’s okay. It has become an integrated part of the famous 360 degree monitoring. And for once I agree with the marketing team of Microsoft, because on this topic they tell the truth without any exaggeration.

And now what?!
However, some things seem not to change and can still cause some strange issues. Suppose you have a brand new SCOM 2012 R2 RTM environment in place and everything is by the book. Many servers (Windows & Unix) are monitored, as well as many different kinds of workloads running on those very same servers. And yes, many important network devices are being monitored too.

And now one of those important monitored network devices goes down. In this case there were other monitoring solutions in place as well and they triggered the alarms. However, SCOM, which was monitoring that network device as well, stayed quiet. And not for a few minutes but for a long, long time. It even reported the network device to be HEALTHY!

Time to investigate
This really puzzled me so it was time for a deep dive into the way SCOM monitors network devices and alerts upon them. I agree, noise is bad but not Alerting when something is really amiss is even worse!

In Health Explorer of any given monitored network device you’ll find these two Unit Monitors:

  1. ICMP Ping
  2. SNMP Ping

These two Unit Monitors roll up to the Dependency Monitor Network Device Responsiveness, as seen in this screen dump:
image

So far so good. Both Unit Monitors are targeted against the Class Node, which is basically any monitored network device. However, per Unit Monitor there is an override in place which disables it.

The ICMP Ping Unit Monitor is disabled when the network device is covered by SNMP only, and the SNMP Ping Unit Monitor is disabled when the network device is covered by ICMP only. And this makes perfect sense.

But the configuration of those Unit Monitors really puzzled me.

Unit Monitor SNMP Ping
This Unit Monitor has some settings which I don’t fully understand. Let’s take a look at the Knowledge which describes this Unit Monitor in Health Explorer:
image

The options Interval and Number of Samples are most important here. First of all the Interval on this Unit Monitor isn’t 240 seconds in SCOM 2012 R2, but 300 seconds, which is 5 minutes. The Number of Samples is indeed set to three. This basically means any given monitored network device can be down for 15 minutes (3 x 300 seconds) before SCOM 2012 R2 triggers an Alert!
image

Another thing which I am not happy with is the Health State when the network device doesn’t respond. It’s not set to Critical but to a Warning status:
image

When a network device goes down, I want it to be a Critical Alert, not a Warning. But since this Unit Monitor (and the ICMP Ping Unit Monitor) rolls up to a Dependency Monitor, which also triggers the Alert, this kind of modification shouldn’t be done on the Unit Monitor level.

So for the Unit Monitor SNMP Ping I set these two overrides:

  1. Interval: from 300 seconds to 30 seconds;
  2. Number of Samples: from 3 to 2.

So now this Unit Monitor will change State after a minute when a monitored Network Device is down:
image

Time to take a look at the second Unit Monitor, ICMP Ping.

Unit Monitor ICMP Ping
This Monitor is configured a bit differently compared to the SNMP Ping Unit Monitor. But still it needs some serious attention. This is what Health Explorer tells us:
image

So this Unit Monitor changes State after 6 minutes (Interval of 120 seconds x Number of Samples, 3) which is still too much. Also a Warning State is generated, not a Critical condition…

Time for some Overrides here as well. So now this Unit Monitor will change State after a minute when a monitored Network Device is down:
image
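The arithmetic behind these numbers is simply Interval x Number of Samples. Here is a minimal sketch in Python that reproduces the values mentioned in this posting. It follows the simple model used here; the real delay also depends on where in the polling interval the device actually fails, so treat the outcome as an approximation. The function name is my own.

```python
def seconds_until_state_change(interval_seconds: int, samples: int) -> int:
    """Approximate time before the Unit Monitor changes State: Interval x Samples."""
    if interval_seconds <= 0 or samples <= 0:
        raise ValueError("interval and samples must be positive")
    return interval_seconds * samples


# Values taken from this posting.
snmp_default = seconds_until_state_change(300, 3)   # SNMP Ping, out of the box
snmp_override = seconds_until_state_change(30, 2)   # SNMP Ping, with my overrides
icmp_default = seconds_until_state_change(120, 3)   # ICMP Ping, out of the box

for label, seconds in [("SNMP Ping default", snmp_default),
                       ("SNMP Ping override", snmp_override),
                       ("ICMP Ping default", icmp_default)]:
    print(f"{label}: {seconds} seconds ({seconds / 60:g} minutes)")
```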

Time to move on to the Dependency Monitor, Network Device Responsiveness, since I want a Critical Alert with Priority High (for the Notifications which send out only New Alerts which are Critical and have Priority High).

These are the Overrides I set:
image

Time to test it
And now a new network device was added to SCOM to be monitored. This was a test network device. So when SCOM was monitoring it, the network cable was unplugged.

And YES! After a minute SCOM raised a Critical Alert with priority High. This Alert was neatly pushed out by the Notification Model as well. Awesome!

Recap
When you’re running SCOM 2012 R2 and are monitoring network devices, check the settings of the Monitors and make sure they match the requirements of your organization. Chances are you have to make some modifications :).

Monday, November 4, 2013

Exchange Server 2010 MP: No Synthetic Transaction Tests Please

Many times I’ve imported and configured the Exchange Server 2010 MP. And now for the first time ever, for a particular use case, there are good reasons NOT to enable the Synthetic Transaction Tests, as described in the related MP guide on pages 14 and 15.

However, even though the related MP guide is a big one, nowhere does it describe how to do that. Nor could I find anything on the internet. So it was time for me to look for some solutions myself and soon I was disabling quite a few Rules and Monitors.

However, the Exchange 2010 MP operates in a whole different way, so when disabling a Rule, the Monitor with the same name has to be disabled as well. When you don’t do that, chances are the related SCOM DB will get some serious issues. All thanks to the Correlation Engine, this marvelous wonder of code :)

Even KB2592561 didn’t help at all. By the way, did you ever read that KB? It contains this sentence which makes my skin crawl: ‘…This is by far the largest MP to date from Microsoft, and provides a massive amount of visibility to Exchange issues.  However, there are just some things in the Management Pack that just don’t work…’
image

Still don’t know whether to laugh or to start crying here… However, I am wandering off now. Back to the topic of this blog posting.

So here are the Rules and Monitors I disabled up to now. When there are more to come I’ll update this posting accordingly.

Rules: Test-OwaConnectivity

  • NonServiceImpacting: There was an Outlook Web App connectivity (External) transaction failure. The Test-OWAConnectivity cmdlet must be run on a Client Access server.
  • NonServiceImpacting: There was an Outlook Web App connectivity (Internal) transaction failure. The Test-OWAConnectivity cmdlet must be run on a Client Access server.
  • KHI: Exchange Control Panel connectivity (Internal) transaction failure - The test credentials can't be used to test the Exchange Control Panel.
  • KHI: Exchange Control Panel connectivity (External) transaction failure - The test credentials can't be used to test the Exchange Control Panel.
  • KHI: Failed to execute the Test-OWAConnectivity (Internal) diagnostic cmdlet.
  • KHI: Failed to execute the Test-OWAConnectivity (External) diagnostic cmdlet.
  • KHI: A directory error occurred while running the Test-MAPIConnectivity cmdlet.
  • KHI: An error occurred while executing the Test-OWAConnectivity (Internal) cmdlet.
  • KHI: An error occurred while running the Test-OWAConnectivity (External) cmdlet.
  • Script performance collection: Execute: Test-OwaConnectivity (External) diagnostic cmdlet. (Report Collection)
  • Script performance collection: Execute: Test-OwaConnectivity (Internal) diagnostic cmdlet. (Report Collection)

Rules: Test-ActiveSyncConnectivity

  • KHI: An error occurred while running the Test-ActiveSyncConnectivity (Internal) cmdlet.
  • KHI: Failed to execute the Test-ActiveSyncConnectivity (Internal) diagnostic cmdlet.
  • KHI: There was an Exchange ActiveSync connectivity (Internal) transaction failure. The Test-ActiveSyncConnectivity cmdlet must be run on a Client Access server.
  • Script performance collection: Execute: Test-ActiveSyncConnectivity (Internal) diagnostic cmdlet. (Report Collection)

Rules: Test-WebServicesConnectivity

  • KHI: An error occurred while running the Test-WebServicesConnectivity cmdlet in internal mode.
  • NonServiceImpacting: The internal Web Services connectivity transaction failed. The Test-WebServicesConnectivity cmdlet must be run on a Client Access server.
  • Script performance collection: Execute: Test-WebServicesConnectivity (Internal) diagnostic cmdlet

Rules: General

  • NonServiceImpacting: WebServices connectivity (Internal) transaction failure - The credentials can't be used to test Web Services.
  • WebServices connectivity (Internal) transaction failure - The credentials can't be used to test Web Services.
  • Some Client Access test cmdlets failed to run
  • KHI: An Outlook Web App connectivity (External) transaction failure occurred. The test credentials can't be used to test Outlook Web App.

Monitors: Test-OwaConnectivity

  • NonServiceImpacting: There was an Outlook Web App connectivity (External) transaction failure. The Test-OWAConnectivity cmdlet must be run on a Client Access server.
  • NonServiceImpacting: There was an Outlook Web App connectivity (Internal) transaction failure. The Test-OWAConnectivity cmdlet must be run on a Client Access server.
  • KHI: Exchange Control Panel connectivity (Internal) transaction failure - The test credentials can't be used to test the Exchange Control Panel.
  • KHI: Exchange Control Panel connectivity (External) transaction failure - The test credentials can't be used to test the Exchange Control Panel.
  • KHI: Failed to execute the Test-OWAConnectivity (Internal) diagnostic cmdlet.
  • KHI: Failed to execute the Test-OWAConnectivity (External) diagnostic cmdlet.
  • KHI: A directory error occurred while running the Test-MAPIConnectivity cmdlet.
  • KHI: An error occurred while executing the Test-OWAConnectivity (Internal) cmdlet.
  • KHI: An error occurred while running the Test-OWAConnectivity (External) cmdlet.

Monitors: Test-ActiveSyncConnectivity

  • KHI: An error occurred while running the Test-ActiveSyncConnectivity (Internal) cmdlet.
  • KHI: Failed to execute the Test-ActiveSyncConnectivity (Internal) diagnostic cmdlet.
  • KHI: There was an Exchange ActiveSync connectivity (Internal) transaction failure. The Test-ActiveSyncConnectivity cmdlet must be run on a Client Access server.

Monitors: Test-WebServicesConnectivity

  • KHI: An error occurred while running the Test-WebServicesConnectivity cmdlet in internal mode.
  • NonServiceImpacting: The internal Web Services connectivity transaction failed. The Test-WebServicesConnectivity cmdlet must be run on a Client Access server.

Monitors: General

  • NonServiceImpacting: WebServices connectivity (Internal) transaction failure - The credentials can't be used to test Web Services.
  • WebServices connectivity (Internal) transaction failure - The credentials can't be used to test Web Services.
  • Some Client Access test cmdlets failed to run
  • KHI: An Exchange ActiveSync connectivity (Internal) transaction failure occurred. The test credentials can't be used to test Exchange ActiveSync.
  • KHI: An Outlook Web App connectivity (External) transaction failure occurred. The test credentials can't be used to test Outlook Web App.
  • KHI: An Outlook Web App connectivity (Internal) transaction failure occurred. The test credentials can't be used to test Outlook Web App.
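Since a disabled Rule needs its same-named Monitor disabled as well, a quick cross-check of both lists helps to spot anything overlooked. A minimal sketch in Python, assuming you have both lists of display names available as plain text; the function name is my own and the three names are just a small sample taken from the lists above.

```python
def rules_without_monitor(disabled_rules, disabled_monitors):
    """Return disabled Rule names that have no same-named disabled Monitor."""
    monitor_names = {name.strip().lower() for name in disabled_monitors}
    return sorted(rule for rule in disabled_rules
                  if rule.strip().lower() not in monitor_names)


# Small sample of display names copied from the lists above.
rules = [
    "KHI: Failed to execute the Test-OWAConnectivity (Internal) diagnostic cmdlet.",
    "Some Client Access test cmdlets failed to run",
    "Script performance collection: Execute: Test-OwaConnectivity (Internal) "
    "diagnostic cmdlet. (Report Collection)",
]
monitors = [
    "KHI: Failed to execute the Test-OWAConnectivity (Internal) diagnostic cmdlet.",
    "Some Client Access test cmdlets failed to run",
]

for name in rules_without_monitor(rules, monitors):
    print("No same-named Monitor in the disabled list:", name)
```

A name that shows up here is not necessarily a mistake: either no Monitor with that name exists (which seems to be the case for the performance collection Rules), or it still has to be disabled.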

Yes, I know. Everybody is on Exchange Server 2013 by now or is using Office 365 :). But for the customers out there who’re still on Exchange Server 2010 (and that’s still a huge part…), this posting might come in handy when you don’t want to use the synthetic transactions.

HP MP For Blade Systems, Virtual Connect & Linux Systems: Where Are The Alerts?!

First of all, I want to compliment HP for the quality of their MPs. Seriously. The last few years HP has put a lot of effort into the overall quality of their MPs, the requirements and how they operate. And every new iteration showed progress and improvements.

In the last few weeks I have worked with the latest versions of the HP MPs for SCOM and I must really say, they have improved significantly. That’s an awesome feat since we all know that the overall quality of some MPs delivered by other vendors isn’t that good at all, which is a shame.

So this posting isn’t meant in any kind of way to bash HP. Instead I want to point out some challenges with the latest version of their MP targeted at monitoring ESX servers, Linux servers, Blade Systems, Virtual Connect and Agentless servers.

Challenges
The latest version of HP Insight Control 7.1 was in place, installed, imported and properly configured. Also the related Blade Systems and Linux servers were added. And soon enough these devices showed up in SCOM and got a status. Sweet!

So it was time for some tests. The system engineers went to the computer room and took out some hardware from the monitored Blade Systems and Linux Servers. And now something strange happened…

State Changes? Yes. Alerts? NO!
A bit late (?) SCOM started to show the related state changes. The time it took was far too long but nothing alarming. A properly configured override would take care of that issue. But what worried me was that no Alert whatsoever showed up. Nothing. Zip. Nada! Time for some investigation.

No Noise please…
And this one really puzzled me. The related Monitor was set to generate an Alert, as this screen dump shows:
image

So why wasn’t the Alert being shown? SCOM itself was in a healthy state and Alerts for other monitored components, covered by other MPs, still came in. So the cause was related to the HP MP itself.

Time to check the overrides. And this one was a bit surprising, since it turned out that ALL Monitors in the HP MP are set with an Override NOT to generate an Alert by default:
image

I don’t like noise for sure, but this kind of tuning is a bit too much if you ask me :-). And no, none of the related guides for this MP tells you anything about this configuration…

Split brain scenario & Enforcing an Override
But this isn’t a nice situation at all since this MP has some configuration issues now which can be addressed but need some serious attention. Why? Well…

  1. The MP contains Monitors which by default generate Alerts;
  2. Out of the box these Monitors contain overrides which suppress this setting (Generates Alert: FALSE). And this Override is boxed in a Sealed MP, so it can’t be removed or edited directly;
  3. So an EXTRA Override is required (Generates Alert: TRUE).

However, with the option described in Step 3 a new situation is born which is equivalent to the split brain scenario we had back in the days with the old failover clusters. There can only be one owner of the quorum at any given time. But during disasters and their recoveries a situation can arise where two or even more nodes think they’re the quorum owner. And that makes things even worse for your failover cluster.

With two Overrides set on the same Parameter (Generates Alert), one FALSE and the other TRUE, SCOM doesn’t know what to do, so its behavior becomes erratic. One time it will generate an Alert and the other time it won’t.

THANKFULLY, Microsoft had a very bright moment when they engineered SCOM 2007 RTM, and from the beginning they added an extra option for setting Overrides: the ENFORCED option. Basically it means that SCOM has to enforce that particular Override, no matter what other Overrides are in place for the same Parameter of that very same Rule/Monitor.
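To make the idea a bit more tangible, here is a minimal Python sketch of how an enforced Override settles the conflict. Mind you, this is a toy model and NOT SCOM's actual resolution logic: the `Override` class, the `effective_value` function and the MP names are all hypothetical, and real SCOM also weighs other things (such as how specific the Override target is), which are left out here on purpose.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Override:
    source_mp: str          # MP holding the override (hypothetical name)
    parameter: str          # overridden parameter, e.g. "GeneratesAlert"
    value: bool
    enforced: bool = False  # the ENFORCED checkbox


def effective_value(default: bool, overrides: List[Override],
                    parameter: str) -> Optional[bool]:
    """Toy resolver: enforced overrides beat non-enforced ones.

    Returns None when the remaining overrides contradict each other,
    which stands in for the 'split brain' situation.
    """
    relevant = [o for o in overrides if o.parameter == parameter]
    if not relevant:
        return default
    enforced = [o for o in relevant if o.enforced]
    candidates = enforced if enforced else relevant
    values = {o.value for o in candidates}
    return next(iter(values)) if len(values) == 1 else None


# The Monitor generates Alerts by default; the sealed MP switches that off.
sealed = Override("Sealed.Vendor.MP", "GeneratesAlert", False)
# An extra override without ENFORCED conflicts with the sealed one.
extra = Override("My.Overrides.MP", "GeneratesAlert", True)
# The same override with ENFORCED set wins.
extra_enforced = Override("My.Overrides.MP", "GeneratesAlert", True, enforced=True)

print(effective_value(True, [sealed], "GeneratesAlert"))
print(effective_value(True, [sealed, extra], "GeneratesAlert"))
print(effective_value(True, [sealed, extra_enforced], "GeneratesAlert"))
```

In this model the sealed Override alone suppresses the Alert, the two plain Overrides cancel each other out (no clear winner), and only the enforced one gives a predictable TRUE.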

So when setting this Override I used the ENFORCED option like this:
image

While I was at it, I also changed the PeriodSeconds Parameter from 900 seconds (15 minutes), which is way too long, to 60 seconds, so the Monitor would trigger an Alert far sooner. After these modifications the related Monitors looked like this:
image

And now the second test went far better: when the system engineers went out to pull out some disks or other hardware, SCOM showed a State Change within a minute AND the related Alert was also shown!

So for anyone having this MP in place, open the related hardware in the Health Explorer in SCOM and check those Monitors one by one. I’ll bet they have that Override in place, suppressing the Alerts. Now you know how to fix them, and when required also how to make sure those very same Monitors run a bit more often…

Recap
Like I said before, HP has done a great job and delivers good MPs now. Some additional tuning is still required, but once that’s done you have a good monitoring solution in place. And to be frank, I’d rather have MPs like this one (no noise) and the ability to tune them.

Nonetheless, HP could do these two things:

  1. Document these Overrides so their consumers know about them;
  2. Put these Overrides in an additional Unsealed MP, so people can decide whether or not to import it.

And for the rest: RESPECT to HP!

Additional resources
There are some additional resources about this MP, how to import, configure and tune it:

  1. My respected fellow MVP Stanislav Zhelyazkov: https://cloudadministrator.wordpress.com/2012/08/12/configure-hp-bladesystem-management-pack-for-scom/
  2. And my own blog: http://thoughtsonopsmgr.blogspot.nl/2013/06/high-level-overview-hp-blade-monitoring.html