Thursday, January 29, 2009

SCOM Discovery Wizard doesn’t work

Symptom
The SCOM Discovery Wizard can run forever without discovering a single system, even though the systems to be discovered are up and running and not blocked by a firewall.

Cause
The SQL Broker of the OperationsManager database is not running. Without it, the Discovery Wizard will not function.

Remedy

  1. Check to see whether the SQL Broker is running
    - Open SQL Server Management Studio
    - Select the right instance and the OpsMgr database
    - Start a new query on the OpsMgr database:
    SELECT is_broker_enabled FROM sys.databases WHERE name = 'OperationsManager'
    - Value = 0: SQL Broker is disabled. Go to Step 2.
    - Value = 1: SQL Broker is enabled. All is OK.

    Check here for another issue which might be causing the Discovery Wizard to run forever.

  2. Enabling the SQL Broker for the OpsMgr database
    - Open SQL Server Management Studio
    - Select the right instance and the OpsMgr database
    - Start a new query on the OpsMgr database:
    ALTER DATABASE OperationsManager SET SINGLE_USER WITH ROLLBACK IMMEDIATE
    - Click Execute
    - Start this query on the OpsMgr database:
    ALTER DATABASE OperationsManager SET ENABLE_BROKER
    - Click Execute
    - Close SQL Server Management Studio.
    Note: Closing SQL Server Management Studio closes its connection to the database, which is now in single-user mode. Most of the time one also has to stop all SCOM-related services on the RMS, since these services keep connections open to this database. Without stopping them, one won't be able to run the next query.

    - Open SQL Server Management Studio
    - Select the right instance and the OpsMgr database
    - Start a new query on the OpsMgr database:
    ALTER DATABASE OperationsManager SET MULTI_USER
    - Click Execute

Repeat Step 1 to check that the SQL Broker is now running (the value must be 1).
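
For those who prefer to script the procedure, here is a minimal Python sketch that only builds the T-SQL statements from the steps above, in the order they have to be run. It does not connect to SQL Server; the function names are my own and purely illustrative, and the statements still have to be executed with a tool of your choice after the SCOM services have been stopped.

```python
# Sketch only: builds the T-SQL text from Step 1 and Step 2.
# Nothing here talks to SQL Server; function names are hypothetical.

def broker_check_query(database="OperationsManager"):
    """Query from Step 1: yields 1 when the broker is enabled, 0 when not."""
    return (
        "SELECT is_broker_enabled FROM sys.databases "
        f"WHERE name = '{database}'"
    )


def enable_broker_statements(database="OperationsManager"):
    """Statements from Step 2, in the order they must be executed."""
    return [
        f"ALTER DATABASE {database} SET SINGLE_USER WITH ROLLBACK IMMEDIATE",
        f"ALTER DATABASE {database} SET ENABLE_BROKER",
        f"ALTER DATABASE {database} SET MULTI_USER",
    ]


def next_action(is_broker_enabled):
    """Interpret the value returned by the Step 1 query."""
    return "All is OK" if is_broker_enabled == 1 else "Go to Step 2"


if __name__ == "__main__":
    print(broker_check_query())
    for statement in enable_broker_statements():
        print(statement)
```

Keeping the three ALTER DATABASE statements in one ordered list makes it harder to forget the final MULTI_USER step, which would leave the database in single-user mode.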

Tuesday, January 27, 2009

Alert Severities/Priorities and their integers

In SCOM SP1 one can override an alert's priority and/or severity. However, when making the override, one has to use integers to define the alert priority/severity.


The tables below show the alert severities/priorities and their corresponding integers:

Alert Severity    Integer
Critical          2
Warning           1
Information       0

Alert Priority    Integer
High              2
Medium            1
Low               0
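
When overrides are prepared in a script, the same mapping can be kept in a small lookup. The Python sketch below simply restates the two tables; the names ALERT_SEVERITY, ALERT_PRIORITY and override_value are illustrative and not part of SCOM.

```python
# Sketch only: the two tables above as lookups (names are hypothetical).
ALERT_SEVERITY = {"Critical": 2, "Warning": 1, "Information": 0}
ALERT_PRIORITY = {"High": 2, "Medium": 1, "Low": 0}


def override_value(table, name):
    """Return the integer for a severity/priority name, ignoring case."""
    lookup = {key.lower(): value for key, value in table.items()}
    try:
        return lookup[name.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown name: {name!r}") from None


if __name__ == "__main__":
    print(override_value(ALERT_SEVERITY, "Warning"))
    print(override_value(ALERT_PRIORITY, "high"))
```

Raising an error for an unknown name is a deliberate choice here: a silently wrong integer in an override is harder to spot than a failing script.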

Monday, January 26, 2009

SCOM & Maintenance Mode

16-04-2009 Update: A new blog posting answers the top 3 questions I get most often about MM. Read all about it here.

SCOM has an option called Maintenance Mode. Whenever a monitored server undergoes maintenance, one doesn't want to see all kinds of alerts in SCOM related to this object. The reports won't show relevant availability data for this server either: for the time it received maintenance, SCOM reports the server as unavailable, which negatively influences the SLAs.

By using Maintenance Mode in SCOM one can overcome these issues. However, Maintenance Mode in SCOM isn't complete. For instance, one cannot schedule Maintenance Mode in advance; one has to start it right away. But what if some monitored servers receive maintenance during the night? Does the SCOM administrator have to get out of bed, put the related servers and objects into Maintenance Mode and go back to bed?

Well, according to Microsoft the answer is YES. But thanks to Tim McFadden, the answer is NO. He has made a very good tool which enables SCOM administrators to plan Maintenance Mode in advance. Moreover, it is GUI-based, so it is easy to use.

I have seen many other solutions for the above mentioned issue, but I must say Tim McFadden's solution is the best. It works like a charm and is very straightforward.

One can download it here.
All credits go to Tim McFadden who runs a very good blog.

Thursday, January 22, 2009

EventID 10102

The OpsMgr eventlog of the RMS logs this EventID many times per hour:
Event Type: Error
Event Source: Health Service Modules
Event Category: None
Event ID: 10102
Date: xx-x-xxxx
Time: xx:xx:xx
User: N/A
Computer: xxxx
Description:
In PerfDataSource, could not resolve counter SQLServer:SSIS Pipeline, Rows written, . Module will be unloaded.

One or more workflows were affected by this.

Workflow name: Microsoft.SQLServer.2005.SQLServer_SSIS_Pipeline_Rows_Written_15.0_minutes_2_Rule
Instance name: MsDtsServer
Instance ID: {8CF3D89A-4393-20C5-1178-6715DD04AA4F}
Management group: xxxxxxx
This happens during the installation of SQL Server on an x64 server. SQL Server Setup sets the registry entry for the x64 SQL Server:SSIS Pipeline performance object to the wrong location. Thus the performance counter cannot be found.

The solution is simple:
Open a cmd-prompt on the SQL-server and type these commands, each followed by Enter:
cd DriveLetter:\Program Files\Microsoft SQL Server\90\DTS\Binn
unlodctr dtspipeline
lodctr dtsperf.ini


This should solve the problem. There is a KB article about this issue: KB941154

Sometimes step 2 of this KB article (editing the registry) is needed as well. There is an easy way to check whether these steps have helped: start Perfmon and check whether the counter 'SQL Server:SSIS Pipeline' is present. See screendump:

Tuesday, January 13, 2009

HOWTO: Monitoring Exchange 2007 with SCOM

Update 2 March 2009: The fourth part has been published, so the series is now complete.
Rui Silva, an Exchange MVP, is writing a very good and detailed HOWTO guide about monitoring Exchange 2007 with SCOM.

In total there are four articles covering this topic.

All I can say is that they are great and cover every detail.

So look here for:
Part 1
Part 2
Part 3
Part 4

Monday, January 12, 2009

SCOM Reporting, Installation issues. Posting 4, error ‘An error occurred while parsing the configuration file’

This and the previous postings are about the most common errors one can bump into when testing SQL Reporting Services (SRS), and how to resolve these. SRS is needed to make SCOM Reporting work. So this posting - and the others to come - presume SRS is already installed AND configured.

The most common way to test SRS is to go to the website of SRS on the local server hosting the SRS services AND website. This latter is important since this server hosts the website with the SRS components. When one tries to open the SRS website (http://localhost/reports) one can bump into this error:


Cause
Certain elements of the Reporting website are missing.

Solution
- Start IIS Manager and remove everything related to SQL Reporting Services (website & application pools)

- Start SQL Reporting Services Configuration tool and rebuild the related website(s) and application pools. Make sure the related accounts (Windows Service Account & Execution Account) are OK.

When the SRS Configuration Tool reports everything to be OK (green state), SCOM Reporting will be OK again.

Friday, January 9, 2009

EventID 11464

The OpsMgr eventlog of the RMS logs this EventID every hour:
A container for the management group MANAGEMENTGROUP NAME either does not exist in domain DOMAIN NAME or the Run As Account associated with the AD based agent assignment rule does not have access to the container. Please run MomADAdmin for this Management Group before configuring assignment rules and make sure the associated Run As Account is the member of the Operations Manager Administrator role.

When one looks in the AD the SCP (Service Connection Point) is present, so what goes wrong?

Most of the time, the issue is that the MOMADAdmin.exe tool has been used with the wrong syntax. The tool doesn't generate an error; it even creates an SCP (a faulty one, that is), so one tends to think all is well.

The solution is straightforward:

First Step
Remove the current SCP with this command:
momadadmin -d MANAGEMENT_GROUP_NAME FQDN_DOMAIN_NAME

Second Step
Create the new SCP with this command:
MomADAdmin MANAGEMENT_GROUP_NAME GROUP_WITH_ADMINPERMISSIONS_IN_SCOM RMS_NAME FQDN_DOMAIN_NAME
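
Since the wrong syntax is the usual culprit, it can help to assemble both command lines from named values before running them. The Python sketch below only builds the two strings in the argument order shown above; it does not run MomADAdmin, and the function names as well as the sample values in the usage lines are made up for illustration.

```python
# Sketch only: builds the two MomADAdmin command lines shown above.
# Function names and the sample values below are hypothetical.

def format_command(arguments):
    """Join arguments into one command line, quoting any that contain spaces."""
    return " ".join(f'"{arg}"' if " " in arg else arg for arg in arguments)


def scp_repair_commands(management_group, admin_group, rms_name, domain_fqdn):
    """Return the remove command (First Step) and create command (Second Step)."""
    remove_scp = ["MomADAdmin", "-d", management_group, domain_fqdn]
    create_scp = ["MomADAdmin", management_group, admin_group, rms_name, domain_fqdn]
    return [format_command(remove_scp), format_command(create_scp)]


if __name__ == "__main__":
    for line in scp_repair_commands(
        "MG1", "CONTOSO\\SCOM Admins", "RMS01", "contoso.local"
    ):
        print(line)
```

Naming the four values forces one to think about which is which, and the quoting keeps a group name with spaces from being split into two arguments.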


After a while (an hour after the last EventID 11464 has been logged) EventID 11470 (possibly multiple times, depending on your SCOM environment) should show up:

AD assignment module successfully added xx computers to SecurityGroup xx_PrimarySG_45962 in domain FQDN DOMAIN NAME since the result of the assignment ldap query has changed.
The SCP is now up & running.

Tuesday, January 6, 2009

Missing Performance Counters

Sometimes a server monitored by SCOM will generate performance counter related errors in the OpsMgr eventlog.

First an error with EventID 10102 is logged, soon to be followed by a warning with EventID 1103. These events will be shown many times in the OpsMgr eventlog.

These events have nothing to do with SCOM but everything with the monitored server itself. Some perfmon counters are missing or corrupt. In rare cases, all the perfmon counters are missing or corrupt. There are solutions for both cases. However, when you are experiencing the latter situation, it will take some time to get things working again.

First you have to check to what extent perfmon is damaged. The description of the first EventID (10102) gives the exact name of the perfmon counter that is missing or corrupt. Sometimes multiple perfmon counters are missing. Therefore check all events with EventID 10102 that were written to the eventlog in a certain time range.

  1. Write down the name(s) of the mentioned perfmon counters.
  2. Start Perfmon and check whether it is showing any counters. If it does, only the counter(s) mentioned in the EventID 10102 is/are missing. If all counters are missing, you have to check out a certain KB article, which will be mentioned later on.
  3. Check in Perfmon whether the perfmon counters written down at step 1 are present. Most of the time these won't be present.
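
When there are many 10102 events, writing the names down by hand gets tedious. The Python sketch below pulls them out of the event descriptions. It assumes the description follows the pattern "could not resolve counter OBJECT, COUNTER, INSTANCE. Module will be unloaded", which I derived from a single sample event, so treat the pattern and the function names as assumptions; how the descriptions are exported from the eventlog is left out.

```python
import re

# Assumed layout, based on one sample event:
# "... could not resolve counter <object>, <counter>, <instance>. Module will be unloaded."
PATTERN = re.compile(
    r"could not resolve counter (?P<body>.*?)\.\s*Module will be unloaded",
    re.DOTALL,
)


def missing_counter(description):
    """Return (object, counter, instance) from one description, or None."""
    match = PATTERN.search(description)
    if match is None:
        return None
    parts = [part.strip() for part in match.group("body").split(",")]
    parts += [""] * (3 - len(parts))  # pad when fields are absent
    return tuple(parts[:3])


def missing_counters(descriptions):
    """Collect the distinct counters named in a series of descriptions."""
    found = []
    for description in descriptions:
        counter = missing_counter(description)
        if counter is not None and counter not in found:
            found.append(counter)
    return found


if __name__ == "__main__":
    sample = (
        "In PerfDataSource, could not resolve counter "
        "SQLServer:SSIS Pipeline, Rows written, . Module will be unloaded."
    )
    print(missing_counters([sample, sample]))
```

Duplicates are dropped on purpose, since the same counter is usually reported many times per hour.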

When the missing perfmon counter is about ASP.NET, look here for how to fix it. Are there any other missing perfmon counters, or are all perfmon counters missing? Check out this KB article for how to fix it. The only disadvantage is that it can take a long time to get everything OK, so it is something which has to be scheduled in advance.

A quick and dirty workaround is to follow the steps mentioned below, but be careful! They are to be performed at your own risk... They also work only on a Windows 2003 based server!
  1. Follow steps 1, 2 and 3 in the earlier mentioned KB article, section 'To rebuild the base performance counter libraries manually'
  2. Follow steps 4 and 5 only for the perfmon counters involved
  3. Rebuild all perfmon counters by typing this command at the command prompt: c:\windows\system32\lodctr /R (Note: R is uppercase.)

It can take some time (minutes) before this command is completed. After a while all should be OK and perfmon is showing all counters again. If it is not, the only way is the long way: follow all the way through the KB article without using any shortcuts.

Monday, January 5, 2009

SCOM Reporting, Installation issues. Posting 3, error ‘Unable to generate a temporary class (result=1).’

This and the previous postings are about the most common errors one can bump into when testing SQL Reporting Services (SRS), and how to resolve these. SRS is needed to make SCOM Reporting work. So this posting - and the others to come - presume SRS is already installed AND configured.

The most common way to test SRS is to go to the website of SRS on the local server hosting the SRS services AND website. This latter is important since this server hosts the website with the SRS components. When one tries to open the SRS website (http://localhost/reports) one can bump into this error:


Cause
The account used by ASP.NET has no permissions on the folder mentioned in the error message (in this case c:\windows\temp).

Solution
Grant the group ‘Users’ (servername\Users) write permissions on the folder mentioned in the error message (in this case c:\windows\temp) and all its subfolders.