Sun System Handbook - CD 2.1.8 April 2005 Internal/Partner Edition
2004-09-21

This document provides troubleshooting information for service personnel who need a quick reference on what to check when the Fault Light-Emitting Diode (LED) is on for a Sun Fire or Netra server.

The Fault LED on a Sun Fire or Netra server can have three states: off, solid, or flashing. For more information about LEDs, refer to the Telco section at the following URL:

http://sunsolve.sun.com/handbook_pub/General/LEDs_TOC.html

To determine what actually failed or what caused the fault condition, service personnel need to access Lights Out Management (LOM). You can do this in one of two ways:

- Through the LOM management port
- Through the operating system (OS), by using the lom command provided by the LOMlite packages

USING THE LOM PROMPT

If the console is not already at the lom> prompt, type the #. escape sequence to drop to it.

Note: Dropping to the lom> prompt does not affect the OS in any way unless you issue disruptive commands, such as "break," "reset," or "poweroff," from the lom> prompt.

Check the following two conditions when you are at the lom> prompt:

1. When the Fault LED is on, check the status of the machine for hardware failures, such as a failed power supply or fan. The command to use at the lom> prompt is "environment" or "showenvironment," depending on the platform. Sample output follows; it might vary depending on the server platform. Any component whose state is labeled "FAILED" might be a bad component that requires replacement.

lom>environment OR lom>showenvironment
Fault ON
Alarm1 OFF
Alarm2 OFF
Alarm3 OFF
Fans:
1 fan1 OK speed 61%
PSUs:
1 OK
Temperature sensors:
1 Enclosure 21degC OK
Overheat sensors:
1 CPU OK
Circuit breakers:
1 USB0 OK
2 USB1 OK
3 SCC OK
Supply rails:
1 5V OK
2 3V3 OK
3 +12V OK
4 -12V OK
5 VDD core OK
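When the environment output has been captured to a file or a terminal log, the scan for "FAILED" entries can be automated. The following is a minimal sketch, not a Sun-supplied tool: the function name failed_components is hypothetical, the layout is assumed to match the sample above (a section heading ending in a colon, followed by one line per component), and the "1 FAILED" PSU line in the sample data is invented for illustration.

```python
import re


def failed_components(env_output):
    """Return (section, line) pairs for every entry whose state reads FAILED.

    Assumes the layout of the sample output: a heading line that ends
    with ':' followed by one line per component.
    """
    section = ""
    failures = []
    for raw in env_output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            # Heading such as "Fans:" or "PSUs:"
            section = line[:-1]
        elif re.search(r"\bFAILED\b", line):
            failures.append((section, line))
    return failures


# Hypothetical capture: the PSU failure line is made up for this example.
sample = """\
Fans:
1 fan1 OK speed 61%
PSUs:
1 FAILED
Supply rails:
1 5V OK
"""

for section, line in failed_components(sample):
    print(section + ": " + line)
```

An empty result does not clear the machine; it only means that the next check, the event log, is where the trigger is likely to be found.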
2. If the environment or showenvironment command does not show any failing components, check the event log for the trigger. Use the "showlogs -v" command from the lom> prompt. Sample error messages:

lom>showlogs -v

SCC card removed:
+1d+8h5m9s host FATAL FAULT: SCC removed <--- Cause
+1d+8h5m9s Fault LED 3Hz <--- Fault LED flashing
Rocker switch/Power switch/Power Button switch turned to off:
+11d+0h14m58s host FAULT: unexpected power off <--- Cause
+11d+0h14m58s Fault LED ON <--- Fault LED solid
Input power source failure:
+19h25m20s PSU 1 FAULT: state change - InA failed <--- Cause
+19h25m20s Fault LED ON <--- Fault LED solid
Fan failure:
+18d+20h22m59s Fan 4 FATAL FAULT: failed 7% <--- Cause
+18d+20h22m59s Fault LED ON <--- Fault LED solid
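In each sample above, the cause is the entry logged at the same timestamp immediately before the Fault LED entry. A minimal sketch of that pairing follows. The function name fault_causes is hypothetical, the "<---" annotations are assumed to have been removed from the captured lines, and the rule that the cause always precedes the LED entry is inferred from these samples only.

```python
def fault_causes(log_lines):
    """Pair each Fault LED turn-on with the entry logged just before it.

    Returns (timestamp, led_event, cause) tuples. cause is None when the
    previous entry does not carry the same timestamp.
    """
    pairs = []
    prev_stamp, prev_event = None, None
    for raw in log_lines:
        parts = raw.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        stamp, event = parts
        event = event.strip()
        if event.startswith("Fault LED") and event != "Fault LED OFF":
            cause = prev_event if prev_stamp == stamp else None
            pairs.append((stamp, event, cause))
        prev_stamp, prev_event = stamp, event
    return pairs


# Entries copied from the samples above, without the "<---" annotations.
log = [
    "+1d+8h5m9s host FATAL FAULT: SCC removed",
    "+1d+8h5m9s Fault LED 3Hz",
    "+19h25m20s PSU 1 FAULT: state change - InA failed",
    "+19h25m20s Fault LED ON",
]

for stamp, led, cause in fault_causes(log):
    print(stamp, led, "<-", cause)
```

A Fault LED entry that comes back with no cause deserves a closer look at the surrounding entries before any part is replaced.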
Once the fault has been fixed, service personnel might need to turn off the Fault LED. To do so, issue the "faultoff" command from the lom> prompt.

USING THE LOM COMMAND FROM THE OS (LOMLITE PACKAGES INSTALLED)

The lom command can be issued from the OS, provided that the LOMlite packages have been installed. Use the pkginfo(1) command to check whether the LOMlite packages are installed:

# pkginfo | grep SUNWlom
system SUNWlomm LOMlite manual pages
system SUNWlomr LOMlite driver (root)
system SUNWlomu LOMlite Utilities (usr)
Again, check for problems with the environment and the event log:

# lom -plvtf
PSUs:
1 OK
LOM alarm states:
Alarm1=off
Alarm2=off
Alarm3=off
Fault LED=off
Supply voltages:
1 5V status=ok
2 3V3 status=ok
3 +12V status=ok
4 -12V status=ok
5 CPU core status=ok
6 +3VSB status=ok
System status flags:
1 SCSI-Term status=ok
2 USB0 status=ok
3 USB1 status=ok
4 SCC status=ok
System Temperature Sensors:
1 Enclosure 23 degC : warning 67 degC : shutdown 72 degC
System Over-temperature Sensors:
1 CPU status=ok
Fans:
1 OK speed 95%
2 OK speed 91%
3 OK speed 100%
4 OK speed 100%
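The temperature sensor line reports the current reading together with the warning and shutdown thresholds, so the remaining headroom can be computed directly. The following is a minimal sketch that assumes the line format shown in the sample above; the function name temperature_margins and the regular expression are illustrative and not part of the LOMlite software.

```python
import re

# Matches lines such as:
#   1 Enclosure 23 degC : warning 67 degC : shutdown 72 degC
TEMP_LINE = re.compile(
    r"^\s*\d+\s+(\S+)\s+(\d+)\s*degC\s*:\s*warning\s+(\d+)\s*degC"
    r"\s*:\s*shutdown\s+(\d+)\s*degC"
)


def temperature_margins(output):
    """Map each sensor name to its reading and headroom in degrees C."""
    margins = {}
    for line in output.splitlines():
        match = TEMP_LINE.match(line)
        if match:
            name = match.group(1)
            current = int(match.group(2))
            warning = int(match.group(3))
            shutdown = int(match.group(4))
            margins[name] = {
                "current": current,
                "to_warning": warning - current,
                "to_shutdown": shutdown - current,
            }
    return margins


sample = "1 Enclosure 23 degC : warning 67 degC : shutdown 72 degC"
print(temperature_margins(sample))
```

For the sample reading of 23 degC, the headroom is 44 degrees to the warning threshold and 49 degrees to the shutdown threshold.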
The following command displays the last 50 events:

# lom -e 50
LOM Event Log:
+0h0m0s Fault LED ON
+0h0m0s host power on
+0h3m17s Fault LED OFF
+0h3m56s host power off
+0h4m8s host power on
+0h0m0s LOM booted
+0h0m0s host power on
+0h0m0s LOM booted
+0h0m0s host power on
+0h0m0s Fault LED ON
+0h0m0s host power on
+0h1m19s host power off
+0h1m30s host power on
5/20/2004 4:54:48 GMT LOM time reference
+0h41m33s host reset
5/20/2004 5:51:15 GMT LOM time reference
+0h32m36s host reset
5/20/2004 6:24:38 GMT LOM time reference
+0h0m0s LOM flash download: v3.12 to v3.13
+0h0m0s LOM reset
+0h0m0s host power on
5/20/2004 6:33:57 GMT LOM time reference
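The relative timestamps in the event log use a compact notation such as +0h41m33s or +18d+20h22m59s. When events need to be sorted or compared, it can help to convert that notation into a duration. The sketch below handles only the two forms that appear in this document; the function name parse_offset is hypothetical, and the sketch makes no assumption about what reference point the offsets are measured from.

```python
import re
from datetime import timedelta

# Accepts "+0h41m33s" and "+18d+20h22m59s"; the day part is optional.
STAMP = re.compile(r"^\+(?:(\d+)d\+)?(\d+)h(\d+)m(\d+)s$")


def parse_offset(stamp):
    """Convert a relative LOM event timestamp into a timedelta."""
    match = STAMP.match(stamp)
    if match is None:
        raise ValueError("not a relative LOM timestamp: " + repr(stamp))
    days, hours, minutes, seconds = (
        int(group) if group else 0 for group in match.groups()
    )
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


print(parse_offset("+0h41m33s"))
print(parse_offset("+18d+20h22m59s"))
```

Absolute entries such as the "LOM time reference" lines do not match this pattern and raise ValueError, so a caller would need to handle them separately.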
To turn off the Fault LED, use the lom -F off command from the OS.

Check that the LOM firmware and LOMlite software are patched before you replace parts. This verification prevents unnecessary parts replacement due to false alarms caused by known bugs. Use the "version" command from the lom> prompt or, if the LOMlite packages are installed, issue the lom -c command from the OS.

lom>version
LOM version: v3.8 <---
LOM checksum: 965b
LOM firmware part# 258-7871-14
Microcontroller: H8/3437S
LOM firmware build Feb 2 2001 13:25:30
# lom -c
LOM configuration settings:
serial escape sequence=#.
serial event reporting=default
Event reporting level=fatal, warning & information
Serial security=enabled
Disable watchdog on break=enabled
Automatic return to console=disabled
alarm3 mode=user controlled
firmware version=3.13 <---
firmware checksum=1191
product revision=0.1
product ID=120
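The two samples report firmware versions 3.8 and 3.13. Comparing such values as decimal numbers gives the wrong order, because 3.8 is numerically greater than 3.13. The sketch below assumes that the version is a major.minor pair of integers, so that v3.13 is later than v3.8 (consistent with the "v3.12 to v3.13" flash download entry in the event log). The function name parse_lom_version is hypothetical.

```python
import re


def parse_lom_version(text):
    """Extract (major, minor) from a line such as 'LOM version: v3.8'
    or 'firmware version=3.13'."""
    match = re.search(r"v?(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version found in: " + repr(text))
    return int(match.group(1)), int(match.group(2))


from_prompt = parse_lom_version("LOM version: v3.8")
from_os = parse_lom_version("firmware version=3.13")

# Tuples compare element by element, so (3, 8) sorts before (3, 13).
print(from_prompt < from_os)

# A decimal comparison would order the same two versions the other way.
print(float("3.8") < float("3.13"))
```

The minimum firmware level that a given fix requires is not stated here and has to come from the relevant patch documentation.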
If you need help with lom commands, review the lom man page from the OS (if LOMlite is installed) or issue the "help" command from the lom> prompt.

Important note: The Fault LED can be turned on manually with the lom command in the LOMlite packages while the OS is running. In a typical scenario, the user-programmable alarms are sufficient for users' applications. However, users occasionally run out of alarms and start using the Fault LED, through lom -F on, to get more alarm states (that is, combinations of the Fault LED and the Alarm LEDs). Do not be surprised to find the Fault LED and alarms set when there are no hardware faults and showlogs does not indicate any failure except the message "Fault LED ON." For example:

+0h22m30s host power off
+0h22m51s host power on
11/5/2003 6:12:55 LOM time reference
11/19/2003 6:19:35 LOM time reference
12/3/2003 6:26:16 LOM time reference
+17d+5h55m10s Fault LED ON
+59d+20h18m26s host power off
+0h0m0s LOM booted
+0h0m3s host power on
2/1/2004 3:25:22 LOM time reference