Monday, June 17, 2013

OM12 SP1 UNIX/Linux Agent Troubleshooting Table

Many good postings and articles already cover this topic (I list them at the end of this posting), but I still want to add my own experiences.

For a customer, quite a few Linux servers had to be monitored. During the roll-out of the OM12 SP1 Agents to these Linux systems, several errors popped up. Thanks to a highly experienced Linux guru working for this customer, these issues were sorted out pretty fast. Based on this experience I have made a table of the most frequently occurring errors, their possible causes and their fixes.

Issue / Cause & Resolution

DNS Configuration error
  01: Faulty reverse DNS lookup zone. Once the zone was fixed, all went just fine.
  02: Linux system had multiple names, all registered in DNS. After a couple of retries the Agent landed properly.
  03: System resided in an old segment which didn’t have a zone on the new DNS servers. Once the zone was added, all went just fine.

Failed during SSH Discovery
  01: SSH access was locked down to root only. Once SSH access was granted to the account OM12 SP1 uses on the Linux system, all went just fine.
  02: An outdated version of SSH which isn’t compatible with the .NET SSH implementation Microsoft uses on the OM12 SP1 side. SSH requires an update.
  03: An outdated version of SSH which doesn’t accept certain SSH calls. SSH requires an update.

Failed to install kit
  01: Home folder of the OM12 SP1 Linux account was missing. After this folder was added, all went just fine.
  02: Certain files were locked. Retrying the installation of the OM12 SP1 Agent some hours later went just fine.

Installation hangs
  On some systems the installation of the OM12 SP1 Linux Agent just hung. I had to hard stop the OM12 SP1 Console. A second attempt then went just fine.

Unexpected Discovery Result
  01: Reason unknown. A second attempt (some hours later) ran just fine.
  02: Restarting the OM12 SP1 services on the OM12 SP1 Management Server (MS) running the Discovery resolved it (be careful though): http://www.opsman.co.za/?p=50

WinRM cannot complete the operation
  Firewall was blocking the WinRM service. After that port (TCP 1270) was opened it still didn’t work. See this posting to get it working: http://blogs.technet.com/b/chandanbharti/archive/2011/12/21/linux-agent-install-issue.aspx

Agent verification failed
  Multiple DNS issues:
  01: Linux system has a hostname that differs from its FQDN. Correct it (hostname or FQDN) and all is just fine.
  02: DNS record isn’t present. Add the record and all is just fine.
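
Since so many of the issues above come down to DNS or a blocked port, it can help to check those before starting a Discovery. Below is a minimal Python sketch of such a pre-check, not an official tool. The function names (diagnose_dns, resolve, port_open) and the example host names are my own assumptions for illustration; only the port numbers come from the table (TCP 1270) and from the SSH default (TCP 22).

```python
import socket


def diagnose_dns(local_hostname, fqdn, forward_ips, reverse_name):
    """Compare already-resolved DNS data and return a list of problems.

    local_hostname: what the Linux system reports as its own hostname
    fqdn:           the name used in the Discovery wizard
    forward_ips:    addresses returned by a forward lookup of fqdn
    reverse_name:   name returned by a reverse lookup, or None if absent
    """
    problems = []
    want = fqdn.lower().rstrip(".")
    if not forward_ips:
        problems.append(f"no forward DNS record for {fqdn}")
    elif reverse_name is None:
        problems.append(f"no reverse (PTR) record for {forward_ips[0]}")
    elif reverse_name.lower().rstrip(".") != want:
        problems.append(f"reverse lookup returns {reverse_name}, expected {fqdn}")
    # Compare only the short host part, so 'web01' matches 'web01.example.com'.
    if local_hostname.lower().split(".")[0] != want.split(".")[0]:
        problems.append(f"hostname {local_hostname} does not match FQDN {fqdn}")
    return problems


def resolve(fqdn):
    """Forward and reverse lookup via the local resolver: (ips, reverse_name)."""
    try:
        ips = sorted({info[4][0] for info in socket.getaddrinfo(fqdn, None)})
    except socket.gaierror:
        return [], None
    try:
        reverse_name = socket.gethostbyaddr(ips[0])[0]
    except (socket.herror, socket.gaierror):
        reverse_name = None
    return ips, reverse_name


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A possible way to use it, run from the Management Server, with web01.example.com standing in for a real Linux system: call resolve("web01.example.com"), pass the result together with the hostname reported by the Linux system into diagnose_dns, and then call port_open for ports 22 and 1270. An empty problem list and two open ports do not guarantee a successful Discovery, but they rule out the most common causes from the table.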

Other resources for troubleshooting OM12 SP1 UNIX/Linux Agent installation issues:

  1. Bob Cornelissen: http://www.bictt.com/blogs/bictt.php/2011/05/29/scom-trick-15-cross-platform
  2. Microsoft TechNet Wiki: http://social.technet.microsoft.com/wiki/contents/articles/4966.troubleshooting-unixlinux-agent-discovery-in-system-center-2012-operations-manager.aspx
  3. Stefan Roth: http://blog.scomfaq.ch/2012/09/11/scom-2012-linux-discovery-unspecified-failure/
  4. Enabling logging and debugging in OM12: http://technet.microsoft.com/en-us/library/hh212862 
  5. Microsoft TechNet – Troubleshooting UNIX/Linux monitoring: http://technet.microsoft.com/en-us/library/hh212885

Other useful resources, all related to UNIX/Linux monitoring with OM12:

Tasks
Install Agent on UNIX and Linux Using the Discovery Wizard
Concepts
Using Templates for Additional Monitoring of UNIX and Linux
Troubleshooting UNIX and Linux Monitoring
Accessing UNIX and Linux Computers in Operations Manager
Required Capabilities for UNIX and Linux Accounts
Management Pack Issues
Operating System Issues
Certificate Issues
Managing Certificates for UNIX and Linux Computers
Managing Resource Pools for UNIX and Linux Computers
