Sun Microsystems, Inc.  Sun System Handbook - CD 2.1.8 April 2005 Internal/Partner Edition

Asset ID:1-25-77137-1
Update Date:Tue Sep 21 00:00:00 MDT 2004
Keywords:lom lomlite fault LED faulton faultoff showenvironment environment showlogs

Sun Fire[TM] and Netra[TM] Servers: How to Troubleshoot Servers When the Fault LED Is On

Luca Vit

2004-09-21


Problem Statement:

This document provides troubleshooting information for service personnel who need a quick reference on what to check when the Fault Light-Emitting Diode (LED) on a Sun Fire or Netra server is on.

Resolution:

The Fault LED on a Sun Fire server or Netra server can have three states: off, solid, or flashing.

Refer to the following URL under the Telco section for more information about LEDs:

http://sunsolve.sun.com/handbook_pub/General/LEDs_TOC.html

To determine what actually failed or what caused the fault condition, service personnel need to access Lights Out Management (LOM). You can do this in one of two ways:

- Through the LOM management port
- Through the operating system (OS) by using the lom command provided by the LOMlite packages

USING THE LOM PROMPT

Type the #. escape sequence to drop the console to the lom> prompt if it is not already there.

Note: Dropping to the lom> prompt does not affect the OS in any way, unless you issue a command that acts on the host, such as "break," "reset," or "poweroff."

Check the following two conditions when you are at the lom> prompt:

1. When the Fault LED is on, check the status of the machine. For example, check for hardware failures, such as the power supply and the fan. The command to use at the lom prompt is "environment" or "showenvironment" depending on the platform. A sample output follows, which might vary depending on the server platform. Any component's state labeled as "FAILED" might indicate a bad component requiring replacement.

        lom>environment     OR     lom>showenvironment
        Fault  ON
        Alarm1 OFF
        Alarm2 OFF
        Alarm3 OFF
        Fans:
        1 fan1 OK speed 61%
        PSUs:
        1 OK
        Temperature sensors:
        1 Enclosure 21degC OK
        Overheat sensors:
        1 CPU OK
        Circuit breakers:
        1 USB0 OK
        2 USB1 OK
        3 SCC OK
        Supply rails:
        1 5V OK
        2 3V3 OK
        3 +12V OK
        4 -12V OK
        5 VDD core OK
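
When the output has been captured to a file, the check for failing components can be scripted. The following is a minimal sketch, not a Sun-supplied tool. It assumes the layout of the sample above (a section header ending in a colon, followed by numbered entries), and the function name failed_components is hypothetical. It flags any numbered entry that does not report OK, which covers "FAILED" without assuming the exact wording on every platform.

```python
import re

# Sample text copied from the output shown above; real output varies by platform.
SAMPLE = """\
Fault  ON
Alarm1 OFF
Fans:
1 fan1 OK speed 61%
PSUs:
1 OK
Temperature sensors:
1 Enclosure 21degC OK
Supply rails:
1 5V OK
2 3V3 OK
"""


def failed_components(text):
    """Return (section, line) pairs for numbered entries that do not report OK."""
    section = None
    problems = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            # Section header such as "Fans:" or "PSUs:".
            section = line[:-1]
            continue
        # Assumption: component entries start with an index number.
        if section and re.match(r"\d+\s", line):
            if not re.search(r"\bOK\b", line):
                problems.append((section, line))
    return problems
```

An empty list means every numbered entry reported OK. In that case, continue with the log check in step 2.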

2. If neither the environment nor the showenvironment command shows a failing component, check the event log for the trigger. Use the "showlogs -v" command from the lom> prompt.

        Sample error messages:

        lom>showlogs -v

        SCC card removed:
        +1d+8h5m9s host FATAL FAULT: SCC removed <--- Cause
        +1d+8h5m9s Fault LED 3Hz <--- Fault LED flashing

        Rocker switch/Power switch/Power Button switch turned to off:
        +11d+0h14m58s host FAULT: unexpected power off <--- Cause
        +11d+0h14m58s Fault LED ON <--- Fault LED solid

        Input power source failure:
        +19h25m20s PSU 1 FAULT: state change - InA failed <--- Cause
        +19h25m20s Fault LED ON <--- Fault LED solid

        Fan failure:
        +18d+20h22m59s Fan 4 FATAL FAULT: failed 7% <--- Cause
        +18d+20h22m59s Fault LED ON <--- Fault LED solid
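
In a long log, the cause lines are easier to find with a filter. The sketch below assumes that every cause line carries the text "FAULT:" after a relative time stamp, as in the samples above. The function name fault_causes is hypothetical, and the "<--- Cause" annotations are editorial marks that are not expected in real output.

```python
import re

# Assumption: each event line starts with a relative time stamp such as "+1d+8h5m9s".
EVENT = re.compile(r"^(?P<stamp>\+\S+)\s+(?P<msg>.*)$")


def fault_causes(log_text):
    """Return (time stamp, message) pairs for log lines that report a fault."""
    causes = []
    for raw in log_text.splitlines():
        m = EVENT.match(raw.strip())
        # Matches both "FAULT:" and "FATAL FAULT:" messages.
        if m and "FAULT:" in m.group("msg"):
            causes.append((m.group("stamp"), m.group("msg")))
    return causes
```

The "Fault LED ON" and "Fault LED 3Hz" lines are skipped on purpose, because they record the LED state rather than the cause.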

Once the fault has been fixed, service personnel might need to turn off the Fault LED. To do so, issue the "faultoff" command from the lom> prompt.

USING THE LOM COMMAND FROM THE OS (LOMLITE PACKAGES INSTALLED)

The lom command can be issued from the OS provided that the LOMlite packages have been installed. Use the pkginfo(1) command to check if the LOMlite packages have been installed:

        # pkginfo | grep SUNWlom
        system      SUNWlomm                         LOMlite manual pages
        system      SUNWlomr                         LOMlite driver (root)
        system      SUNWlomu                         LOMlite Utilities (usr)
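
The package check can also be run against saved pkginfo output. This is a minimal sketch that assumes the two-column "category package-name" prefix shown above. Treating all three packages as required is an assumption made for illustration, and the function name missing_lom_packages is hypothetical.

```python
# Package names taken from the pkginfo listing shown above.
REQUIRED = {"SUNWlomm", "SUNWlomr", "SUNWlomu"}


def missing_lom_packages(pkginfo_text):
    """Return the LOMlite package names that do not appear in pkginfo output."""
    installed = set()
    for line in pkginfo_text.splitlines():
        fields = line.split()
        # Assumption: the second field on each line is the package name.
        if len(fields) >= 2:
            installed.add(fields[1])
    return sorted(REQUIRED - installed)
```

An empty list means all three packages were found in the listing.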

Again, check for problems with the environment and the event log:

        # lom -plvtf        
        PSUs:
        1 OK
        LOM alarm states:
        Alarm1=off
        Alarm2=off
        Alarm3=off
        Fault LED=off
        Supply voltages:
         1               5V status=ok
         2              3V3 status=ok
         3             +12V status=ok
         4             -12V status=ok
         5         CPU core status=ok
         6            +3VSB status=ok
        System status flags:
         1        SCSI-Term status=ok
         2             USB0 status=ok
         3             USB1 status=ok
         4              SCC status=ok
        System Temperature Sensors:
        1        Enclosure 23 degC : warning 67 degC : shutdown 72 degC
        System Over-temperature Sensors:
         1              CPU status=ok
        Fans:
        1 OK speed 95%
        2 OK speed 91%
        3 OK speed 100%
        4 OK speed 100%
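
The temperature line in this output reports the warning and shutdown thresholds next to the current reading, so the remaining headroom can be computed directly. The sketch below assumes the exact line layout shown above. The function name temperature_headroom is hypothetical.

```python
import re

# Assumption: lines look like
# "1        Enclosure 23 degC : warning 67 degC : shutdown 72 degC"
TEMP = re.compile(
    r"^\s*\d+\s+(?P<name>\S+)\s+(?P<now>\d+)\s*degC\s*:\s*"
    r"warning\s+(?P<warn>\d+)\s*degC\s*:\s*shutdown\s+(?P<shut>\d+)\s*degC\s*$"
)


def temperature_headroom(text):
    """Map sensor name to (degrees below warning, degrees below shutdown)."""
    result = {}
    for line in text.splitlines():
        m = TEMP.match(line)
        if m:
            now = int(m.group("now"))
            result[m.group("name")] = (
                int(m.group("warn")) - now,
                int(m.group("shut")) - now,
            )
    return result
```

For the sample above, the Enclosure sensor at 23 degC is 44 degrees below the warning threshold and 49 degrees below the shutdown threshold.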

The following command displays the last 50 events:

        # lom -e 50     
        LOM Event Log:
         +0h0m0s  Fault LED ON
         +0h0m0s host power on
         +0h3m17s  Fault LED OFF
         +0h3m56s host power off
         +0h4m8s host power on
         +0h0m0s LOM booted
         +0h0m0s host power on
         +0h0m0s LOM booted
         +0h0m0s host power on
         +0h0m0s  Fault LED ON
         +0h0m0s host power on
         +0h1m19s host power off
         +0h1m30s host power on
         5/20/2004 4:54:48 GMT LOM time reference
         +0h41m33s host reset
         5/20/2004 5:51:15 GMT LOM time reference
         +0h32m36s host reset
         5/20/2004 6:24:38 GMT LOM time reference
         +0h0m0s LOM flash download: v3.12 to v3.13
         +0h0m0s LOM reset
         +0h0m0s host power on
         5/20/2004 6:33:57 GMT LOM time reference
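
The relative time stamps in the event log, such as +0h3m17s or +17d+5h55m10s, are easier to compare once they are converted to durations. The sketch below handles only the two forms that appear in this document. The function name parse_offset is hypothetical, and the sketch makes no assumption about which event an offset is measured from.

```python
import re
from datetime import timedelta

# Assumption: offsets are "+<h>h<m>m<s>s", optionally prefixed by "+<d>d".
OFFSET = re.compile(r"^\+(?:(?P<d>\d+)d\+)?(?P<h>\d+)h(?P<m>\d+)m(?P<s>\d+)s$")


def parse_offset(stamp):
    """Convert a relative log time stamp to a timedelta, or None if it does not match."""
    m = OFFSET.match(stamp)
    if not m:
        return None
    return timedelta(
        days=int(m.group("d") or 0),
        hours=int(m.group("h")),
        minutes=int(m.group("m")),
        seconds=int(m.group("s")),
    )
```

Absolute entries, such as the "LOM time reference" lines, do not match the pattern and return None.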

To turn off the Fault LED, use the lom -F off command from the OS.

Temporary Workaround:

Additional Information:

Check that the LOM firmware and LOMlite software are patched before parts are replaced. This check prevents unnecessary parts replacement caused by false alarms from known bugs. Use the "version" command from the lom> prompt, or if you have the LOMlite packages installed, issue the lom -c command from the OS.

         lom>version
         LOM version:            v3.8 <---
         LOM checksum:           965b
         LOM firmware part#      258-7871-14
         Microcontroller:        H8/3437S
         LOM firmware build      Feb  2 2001 13:25:30

         # lom -c
         LOM configuration settings:
         serial escape sequence=#.
         serial event reporting=default
         Event reporting level=fatal, warning & information
         Serial security=enabled
         Disable watchdog on break=enabled
         Automatic return to console=disabled
         alarm3 mode=user controlled
         firmware version=3.13 <---
         firmware checksum=1191
         product revision=0.1
         product ID=120
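
The two commands report the firmware version in different formats ("LOM version: v3.8" and "firmware version=3.13"). The sketch below extracts the version from either format and returns it as a pair of integers. Comparing the parts as integers, so that 3.13 ranks above 3.8, is an assumption about the numbering scheme, and the function name lom_firmware_version is hypothetical. This document does not state a minimum required version, so none is encoded here.

```python
import re

# Matches "LOM version:   v3.8" (lom> version) or "firmware version=3.13" (lom -c).
VERSION = re.compile(r"(?:LOM version:\s*v|firmware version=)(\d+)\.(\d+)")


def lom_firmware_version(text):
    """Return the firmware version as (major, minor), or None if it is not found."""
    m = VERSION.search(text)
    if not m:
        return None
    return (int(m.group(1)), int(m.group(2)))
```

Tuples compare element by element, so (3, 8) < (3, 13) holds, whereas a string or floating-point comparison of "3.8" and "3.13" would give the opposite order.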

If you need help with lom commands, review the lom man page from the OS (if the LOMlite packages are installed) or issue the "help" command from the lom> prompt.

Internal Only:

Important note: The Fault LED can be turned on manually with the lom command from the LOMlite packages while the OS is running. Typically, the user-programmable alarms are sufficient for users' applications. However, users who run out of alarms sometimes turn on the Fault LED with lom -F on to gain an extra indicator (that is, a combination of the Fault LED and the Alarm LEDs). Do not be surprised to find the Fault LED and alarms set when there are no hardware faults and showlogs does not indicate any failure except the message "Fault LED ON."

For example:

         +0h22m30s host power off
         +0h22m51s host power on
         11/5/2003 6:12:55 LOM time reference
         11/19/2003 6:19:35 LOM time reference
         12/3/2003 6:26:16 LOM time reference
         +17d+5h55m10s  Fault LED ON
         +59d+20h18m26s host power off
         +0h0m0s LOM booted
         +0h0m3s host power on
         2/1/2004 3:25:22 LOM time reference
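
A log like this one can be screened for "Fault LED ON" entries that have no fault message at the same time stamp. The sketch below is a heuristic only. It assumes that a hardware fault is logged with "FAULT:" at the same time stamp as the LED change, as in the showlogs samples earlier in this document. The function name unexplained_fault_led_on is hypothetical, and a hit is a prompt for review, not proof that the LED was set manually.

```python
def unexplained_fault_led_on(log_text):
    """Return "Fault LED ON" lines with no FAULT message at the same time stamp."""
    lines = [l.strip() for l in log_text.splitlines() if l.strip()]
    # Time stamps (first field) of every line that reports a fault.
    fault_stamps = {l.split()[0] for l in lines if "FAULT:" in l}
    hits = []
    for l in lines:
        parts = l.split(None, 1)
        if (
            len(parts) == 2
            and parts[1].strip() == "Fault LED ON"
            and parts[0] not in fault_stamps
        ):
            hits.append(l)
    return hits
```

For the example log above, the only hit is the entry at +17d+5h55m10s.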
 Copyright 1994-2005 Sun Microsystems, Inc.    All rights reserved.