Cisco NX-OS Troubleshooting Guide, Release 4.0
Troubleshooting Upgrades and Reboots

Table Of Contents

Troubleshooting Upgrades and Reboots

Information About Upgrades and Reboots

Upgrades and Reboot Guidelines

Verifying Cisco NX-OS Software Upgrades

Verifying a Nondisruptive Upgrade

Using ROM Monitor Mode

Troubleshooting Cisco NX-OS Software Upgrades and Downgrades

Software Upgrade Ends with Error

Upgrading Cisco NX-OS Software

Troubleshooting Cisco NX-OS Software System Reboots

Power-On or Switch Reboot Hangs

Corrupted Bootflash Recovery

Recovery from the loader> Prompt on Supervisor Modules

Recovery from the loader> Prompt

Recovery from the switch(boot)# Prompt

Recovery for Systems with Dual Supervisor Modules

Recovering One Supervisor Module With Corrupted Bootflash

Recovering Both Supervisor Modules with Corrupted Bootflash

System or Process Resets

Recoverable System Restarts

Unrecoverable System Restarts

Recovering the Administrator Password


Troubleshooting Upgrades and Reboots


This chapter describes how to identify and resolve problems that might occur when upgrading or restarting Cisco NX-OS.

This chapter includes the following sections:

Information About Upgrades and Reboots

Upgrades and Reboot Guidelines

Verifying Cisco NX-OS Software Upgrades

Troubleshooting Cisco NX-OS Software Upgrades and Downgrades

Troubleshooting Cisco NX-OS Software System Reboots

Recovering the Administrator Password

Information About Upgrades and Reboots

Cisco NX-OS consists of two images—the kickstart image and the system image. These two images should have the same image version to bring up the system.

Upgrades and reboots are ongoing network maintenance activities. You should try to minimize the risk of disrupting the network when performing these operations in production environments, and you should know how to recover quickly when something does go wrong.


Note This publication uses the term upgrade to refer to both Cisco NX-OS upgrades and downgrades.


Upgrades and Reboot Guidelines

Use the following checklist to prepare for an upgrade:

Checklist
Check off

Read the Cisco NX-OS Release Notes for the release you are upgrading or downgrading to. Cisco NX-OS Release Notes are available at the following URL:

http://cisco.com/en/US/products/ps5989/prod_release_notes_list.html

Ensure that an FTP or TFTP server is available to download the Cisco NX-OS software images.

Copy the new Cisco NX-OS image onto your supervisor modules in bootflash: or slot0:.

Use the show install all impact command to verify that the new image is healthy and to determine the impact that the new image will have on any hardware. Check the output for compatibility.

Copy the startup-config to a snapshot config in NVRAM. This step creates a backup copy of the startup-config (see the Rollback chapter in the Cisco NX-OS System Management Configuration Guide, Release 4.0).

Save your running configuration to the startup configuration.

Back up a copy of your configuration to a remote TFTP server.

Schedule your upgrade during an appropriate maintenance window for your network.


After you have completed the checklist, you are ready to upgrade the systems in your network. See the Cisco NX-OS Software Upgrade Guide, Release 4.0 for complete upgrade details.


Note It is normal for the active supervisor to become the standby supervisor during an upgrade.



Note Log messages are not saved across system reboots. However, a maximum of 100 log messages with a severity level of critical and below (levels 0, 1, and 2) are saved in NVRAM. You can view this log at any time by entering the show logging nvram CLI command.
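The retention rule in the note above can be pictured with a small model. This is only a sketch, not the NX-OS implementation: the class name is hypothetical, and the eviction order (oldest message dropped first when the log is full) is an assumption that the note does not state.

```python
from collections import deque

NVRAM_CAPACITY = 100      # maximum number of messages kept, per the note
NVRAM_MAX_SEVERITY = 2    # severity levels 0, 1, and 2 are retained


class NvramLogModel:
    """Toy model of the NVRAM log retention rule (hypothetical helper)."""

    def __init__(self):
        # Assumption: when the log is full, the oldest message is discarded.
        self._messages = deque(maxlen=NVRAM_CAPACITY)

    def record(self, severity, text):
        """Store the message only if its severity level is 0, 1, or 2."""
        if 0 <= severity <= NVRAM_MAX_SEVERITY:
            self._messages.append((severity, text))
            return True
        return False

    def messages(self):
        """Return the retained messages, oldest first."""
        return list(self._messages)
```

In this model, a level 3 (error) message is never stored, so it would not be available after a reboot.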


Verifying Cisco NX-OS Software Upgrades

You can use the show install all status command from a console, SSH, or Telnet session to watch the progress of an ongoing install all command or to view the log of the last completed install all command. This command shows the install all output on both the active and standby supervisor modules, even if you are not connected to the console terminal. See Example 2-1.

Example 2-1 install all Command Output

switch# show install all status
There is an on-going installation... <---------------------- in progress installation
Enter Ctrl-C to go back to the prompt.
Verifying image bootflash:/b-4.0.0.104
-- SUCCESS
Verifying image bootflash:/i-4.0.0.104
-- SUCCESS
Extracting "system" version from image bootflash:/i-4.0.0.104.
-- SUCCESS
Extracting "kickstart" version from image bootflash:/b-4.0.0.104.
-- SUCCESS
Extracting "loader" version from image bootflash:/b-4.0.0.104.
-- SUCCESS
switch# show install all status
This is the log of last installation. <----------------- log of last install
Verifying image bootflash:/b-4.0.0.104
-- SUCCESS
Verifying image bootflash:/i-4.0.0.104
-- SUCCESS
Extracting "system" version from image bootflash:/i-4.0.0.104.
-- SUCCESS
Extracting "kickstart" version from image bootflash:/b-4.0.0.104.
-- SUCCESS
Extracting "loader" version from image bootflash:/b-4.0.0.104.
-- SUCCESS
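If you capture this output to a file, a short script can pair each step with its result. The sketch below assumes the layout shown in Example 2-1 (a step line that starts with Verifying or Extracting, followed by a result line that starts with "--"); the function names are illustrative and are not part of Cisco NX-OS.

```python
def parse_install_status(output):
    """Pair each step line with the result line ("-- SUCCESS") that follows it."""
    steps = []
    pending = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("--"):
            # Result line: attach it to the most recent step, if any.
            if pending is not None:
                steps.append((pending, line[2:].strip()))
                pending = None
        elif line.startswith(("Verifying", "Extracting")):
            pending = line
    return steps


def failed_steps(steps):
    """Return the step lines whose result is anything other than SUCCESS."""
    return [step for step, result in steps if result != "SUCCESS"]
```

An empty list from failed_steps means that every captured step reported SUCCESS. Other step prefixes may appear in real output and would need to be added to the tuple.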

This section includes the following topics:

Verifying a Nondisruptive Upgrade

Using ROM Monitor Mode

Verifying a Nondisruptive Upgrade

When you initiate a nondisruptive upgrade, Cisco NX-OS notifies all services that an upgrade is about to start and determines whether the upgrade can proceed. If a service cannot allow the upgrade to proceed at this time, the service aborts the upgrade, and you are prompted to enter the show install all failure-reason command to determine why the upgrade cannot proceed.

...
Do you want to continue with the installation (y/n)?  [n] y
Install is in progress, please wait.
Notifying services about the upgrade. 
[#                  ]   0% -- FAIL. Return code 0x401E0066 (request timed out).
Please issue "show install all failure-reason" to find the cause of the failure. <--- system
prompt to enter the show install all failure-reason command
Install has failed. Return code 0x401E0066 (request timed out).
Please identify the cause of the failure, and try 'install all' again.

switch# show install all failure-reason 
Service: "cfs" failed to respond within the given time period.
switch# 

If a failure occurs for whatever reason (such as a save runtime state failure or module upgrade failure) after the upgrade is in progress, then the device reboots disruptively because the changes cannot be rolled back. In such cases, the upgrade has failed.

If you need further assistance to determine why an upgrade is unsuccessful, you should collect the details from the show tech-support command output and the console output from the installation, if available, before you contact your technical support representative.
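When you gather details for a support case, it can help to pull the return code and the unresponsive service out of the captured console text. The following sketch assumes the message wording shown in the example above; the function name and the dictionary keys are hypothetical.

```python
import re

# Patterns based on the wording in the example output above.
RETURN_CODE_RE = re.compile(r"Return code (0x[0-9A-Fa-f]+) \(([^)]*)\)")
SERVICE_RE = re.compile(r'Service: "([^"]+)" failed to respond')


def summarize_failure(install_output, failure_reason_output):
    """Collect the return code, its description, and the failing service name."""
    code_match = RETURN_CODE_RE.search(install_output)
    service_match = SERVICE_RE.search(failure_reason_output)
    return {
        "return_code": code_match.group(1) if code_match else None,
        "reason": code_match.group(2) if code_match else None,
        "service": service_match.group(1) if service_match else None,
    }
```

Fields that cannot be found are returned as None, so the summary does not guess at missing information.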

Using ROM Monitor Mode

If your device does not find a valid system image to load, the system will start in ROM monitor mode. ROM monitor mode can also be accessed by interrupting the boot sequence during startup. From ROM monitor mode, you can boot the device or perform diagnostic tests.

On most systems, you can enter ROM monitor mode by entering the reload EXEC command and then pressing the Break key on your keyboard or by using the Break key-combination (the default Break key combination is Ctrl-C) during the first 60 seconds of startup.

Troubleshooting Cisco NX-OS Software Upgrades and Downgrades

This section describes how to troubleshoot a software installation upgrade or downgrade failure.

This section includes the following topics:

Software Upgrade Ends with Error

Upgrading Cisco NX-OS Software

Software Upgrade Ends with Error

Symptom    The software upgrade ends with an error.

Table 2-1 Software Upgrade Ends with Error

Problem: The upgrade ends with an error.

Possible Cause: The standby supervisor module bootflash: file system does not have sufficient space to accept the updated image.
Solution: Use the delete command to remove unnecessary files from the file system.

Possible Cause: The specified system and kickstart images are not compatible.
Solution: Check the output of the installation process for details on the incompatibility. If necessary, update the kickstart image before updating the system image.

Possible Cause: The install all command is entered on the standby supervisor module.
Solution: Enter the command on the active supervisor module only.

Possible Cause: A module was inserted while the upgrade was in progress.
Solution: Restart the installation. See the "Upgrading Cisco NX-OS Software" section.

Possible Cause: The system experienced a power disruption while the upgrade was in progress.
Solution: Restart the installation. See the "Upgrading Cisco NX-OS Software" section.

Possible Cause: An incorrect software image path was specified.
Solution: Specify the entire path for the remote location accurately.

Possible Cause: Another upgrade is already in progress.
Solution: Verify the state of the system at every stage and restart the upgrade after 10 seconds. If you restart the upgrade within 10 seconds, the command is rejected, and an error message indicates that an upgrade is currently in progress.

Possible Cause: Module failed to upgrade.
Solution: Restart the upgrade (see the "Upgrading Cisco NX-OS Software" section), or use the install module CLI command to upgrade the failed module.


Upgrading Cisco NX-OS Software

To perform an automated software upgrade on any system from the CLI, follow these steps:


Step 1 Log into the system through the console, Telnet, or SSH port of the active supervisor.

Step 2 Create a backup of your existing configuration file, if required.

Step 3 Perform the upgrade by entering the install all command.

The example below demonstrates upgrading using the install all command with the source images located on an SCP server.


Tip Always carefully read the output of the install all compatibility check command. This compatibility check tells you exactly what needs to be upgraded (such as the BIOS, loader, or firmware) and what modules will experience a disruptive upgrade. If there are any questions or concerns about the results of the output, type n to stop the installation and contact the next level of support.


switch# install all system scp://testuser@tftp-server1/tftpboot/rel/qa/4.0/final/m9500-sf1ek9-mz.4.0.bin kickstart scp://testuser@tftp-server1/tftpboot/rel/qa/4.0/final/n7000-s1-kickstart-mz.4.0.bin
For scp://testuser@tftp-server1, please enter password:
For scp://testuser@tftp-server1, please enter password:

Copying image from scp://testuser@pal/tftpboot/rel/qa/4.0/final/n7000-s1
-kickstart-mz.4.0.bin to bootflash:///n7000-s1-kickstart-mz.4.0.bin.
[####################] 100% -- SUCCESS

Copying image from scp://testuser@pal/tftpboot/rel/qa/4.0/final/n7000-s1
-mz.4.0.bin to bootflash:///n7000-s1-mz.4.0.bin.
[####################] 100% -- SUCCESS

Verifying image bootflash:///n7000-s1-kickstart-mz.4.0.bin
[####################] 100% -- SUCCESS

Verifying image bootflash:///n7000-s1-mz.4.0.bin
[####################] 100% -- SUCCESS

Extracting "slc" version from image bootflash:///n7000-s1-mz.4.0.bin.
[####################] 100% -- SUCCESS

Extracting "ips" version from image bootflash:///n7000-s1-mz.4.0.bin.
[####################] 100% -- SUCCESS

Extracting "svclc" version from image bootflash:///n7000-s1-mz.4.0.bin.
[####################] 100% -- SUCCESS

Extracting "system" version from image bootflash:///n7000-s1-mz.4.0.bin.
[####################] 100% -- SUCCESS

Extracting "kickstart" version from image bootflash:///n7000-s1-kickstart-mz
.4.0.bin.
[####################] 100% -- SUCCESS

Extracting "loader" version from image bootflash:///n7000-s1-kickstart-mz.2.
1.1a.bin.
[####################] 100% -- SUCCESS



Compatibility check is done:
Module  bootable          Impact  Install-type  Reason
------  --------  --------------  ------------  ------
     1       yes  non-disruptive       rolling
     2       yes  non-disruptive       rolling
     3       yes      disruptive       rolling  Hitless upgrade is not supported
     4       yes      disruptive       rolling  Hitless upgrade is not supported
     5       yes  non-disruptive         reset
     6       yes  non-disruptive         reset



Images will be upgraded according to following table:
Module       Image       Running-Version           New-Version  Upg-Required
------  ----------  --------------------  --------------------  ------------
     1         slc               2.0(2b)               2.1(1a)           yes
     1        bios      v1.1.0(10/24/03)      v1.1.0(10/24/03)            no
     2         slc               2.0(2b)               2.1(1a)           yes
     2        bios      v1.1.0(10/24/03)      v1.1.0(10/24/03)            no
     3         ips               2.0(2b)               2.1(1a)           yes
     3        bios      v1.1.0(10/24/03)      v1.1.0(10/24/03)            no
     4       svclc               2.0(2b)               2.1(1a)           yes
     4       svcsb               1.3(5m)               1.3(5m)            no
     4       svcsb               1.3(5m)               1.3(5m)            no
     4        bios      v1.1.0(10/24/03)      v1.1.0(10/24/03)            no
     5      system               2.0(2b)               2.1(1a)           yes
     5   kickstart               2.0(2b)               2.1(1a)           yes
     5        bios      v1.1.0(10/24/03)      v1.1.0(10/24/03)            no
     5      loader                1.2(2)                1.2(2)            no
     6      system               2.0(2b)               2.1(1a)           yes
     6   kickstart               2.0(2b)               2.1(1a)           yes
     6        bios      v1.1.0(10/24/03)      v1.1.0(10/24/03)            no
     6      loader                1.2(2)                1.2(2)            no

Do you want to continue with the installation (y/n)?  [n] y

Install is in progress, please wait.

Syncing image bootflash:///n7000-s1-kickstart-mz.4.0.bin to standby.
[####################] 100% -- SUCCESS

Syncing image bootflash:///n7000-s1-mz.4.0.bin to standby.
[####################] 100% -- SUCCESS

Setting boot variables.
[####################] 100% -- SUCCESS

Performing configuration copy.
[####################] 100% -- SUCCESS

Module 5: Waiting for module online.
2005 May 20 15:46:03 ca-9506 %KERN-2-SYSTEM_MSG: mts: HA communication with standby 
terminated. Please check the standby supervisor.
 -- SUCCESS

"Switching over onto standby".

Step 4 Exit the system console and open a new terminal session to view the upgraded supervisor module by using the show module command.

If the configuration meets all guidelines when the install all command is used, all modules (supervisor and switching) are upgraded.
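Because the compatibility check decides which modules lose traffic during the upgrade, you might want to extract the disruptive modules from a saved copy of the output before answering the prompt. This is a minimal sketch that assumes the column layout shown in the example above (Module, bootable, Impact, Install-type, Reason) and that you pass in only the compatibility table, not the image version table that follows it. The function names are illustrative.

```python
def parse_compatibility_table(text):
    """Parse the rows of the "Compatibility check is done" table."""
    rows = []
    for line in text.splitlines():
        # At most five fields; the fifth (Reason) may contain spaces.
        fields = line.split(None, 4)
        # Data rows start with a numeric module number; skip headers and rules.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        rows.append({
            "module": int(fields[0]),
            "bootable": fields[1] == "yes",
            "impact": fields[2],
            "install_type": fields[3],
            "reason": fields[4].strip() if len(fields) > 4 else "",
        })
    return rows


def disruptive_modules(rows):
    """Return the module numbers whose impact is reported as disruptive."""
    return [row["module"] for row in rows if row["impact"] == "disruptive"]
```

For the table in the example above, the disruptive modules are 3 and 4, both with the reason "Hitless upgrade is not supported".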


Troubleshooting Cisco NX-OS Software System Reboots

This section describes how to troubleshoot software reboots and includes the following topics:

Power-On or Switch Reboot Hangs

Corrupted Bootflash Recovery

Recovery from the loader> Prompt on Supervisor Modules

Recovery from the loader> Prompt

Recovery from the switch(boot)# Prompt

Recovery for Systems with Dual Supervisor Modules

Power-On or Switch Reboot Hangs

Symptom    Power on or switch reboot hangs.

Table 2-2 Power-On or Switch Reboot Hangs 

Problem: A power-on or switch reboot hangs for a dual supervisor configuration.

Possible Cause: The bootflash is corrupted.
Solution: See the "Recovery for Systems with Dual Supervisor Modules" section.

Problem: A power-on or switch reboot hangs for a single supervisor configuration.

Possible Cause: The BIOS is corrupted.
Solution: Replace this module. Contact your customer support representative to return the failed module.

Possible Cause: The kickstart image is corrupted.
Solution: Interrupt the boot process at the loader> prompt. Update the kickstart image. See the "Recovery from the loader> Prompt on Supervisor Modules" section.

Possible Cause: Boot parameters are incorrect.
Solution: Verify and correct the boot parameters and reboot.

Possible Cause: The system image is corrupted.
Solution: Interrupt the boot process at the switch(boot)# prompt. Update the system image. See the "Recovery from the switch(boot)# Prompt" section.


Corrupted Bootflash Recovery

All device configuration resides in the internal bootflash. If you have a corrupted internal bootflash, you could potentially lose your configuration. Be sure to save and back up your configuration files periodically. The regular system boot goes through the following sequence (see Figure 2-1):

1. The basic input/output system (BIOS) loads the loader.

2. The loader loads the kickstart image into RAM and starts the kickstart image.

3. The kickstart image loads and starts the system image.

4. The system image reads the startup configuration file.

Figure 2-1 Regular Boot Sequence

If the images on your system are corrupted and you cannot proceed (error state), you can interrupt the system boot sequence and recover the image by entering the BIOS configuration utility described in the following section. Access this utility only when needed to recover a corrupted internal disk.


Caution The BIOS changes explained in this section are required only to recover a corrupted bootflash.

Recovery procedures require the regular sequence to be interrupted. The internal sequence goes through four phases between the time you turn on the system and the time that the system prompt appears on your terminal—BIOS, boot loader, kickstart, and system. (See Table 2-3 and Figure 2-2.)

Table 2-3 Recovery Interruption 

Phase: BIOS
Normal Prompt [1]: loader>
Recovery Prompt [2]: No bootable device
Description: The BIOS begins the power-on self test, memory test, and other operating system applications. While the test is in progress, press Ctrl-C to enter the BIOS configuration utility and use the netboot option.

Phase: Boot loader
Normal Prompt [1]: Starting kickstart
Recovery Prompt [2]: loader>
Description: The boot loader uncompresses loaded software to boot an image using its filename as a reference. These images are made available through bootflash. When the memory test is over, press Esc to enter the boot loader prompt.

Phase: Kickstart
Normal Prompt [1]: Uncompressing system
Recovery Prompt [2]: switch(boot)#
Description: When the boot loader phase is over, press Ctrl-] [3] (Control key plus right bracket key) to enter the switch(boot)# prompt. If the corruption causes the console to stop at this prompt, copy the system image and reboot the system.

Phase: System
Normal Prompt [1]: Login:
Recovery Prompt [2]: (none)
Description: The system image loads the configuration file of the last saved running configuration and returns a switch login prompt.

[1] This prompt or message appears at the end of each phase.

[2] This prompt or message appears when the system cannot progress to the next phase.

[3] Depending on your Telnet client, these keys may be reserved, and you need to remap the keystroke. See the documentation provided by your Telnet client.
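The recovery prompts in Table 2-3 can be read as a lookup from what the console shows to the phase that could not progress. The sketch below encodes only the three recovery prompts listed in the table; the function name is hypothetical, and a real console capture may contain other text that this lookup does not recognize.

```python
# Recovery prompt (from Table 2-3) mapped to the phase that could not progress.
RECOVERY_PROMPTS = {
    "No bootable device": "BIOS",
    "loader>": "Boot loader",
    "switch(boot)#": "Kickstart",
}


def stalled_phase(console_line):
    """Return the phase that could not progress, or None if unrecognized."""
    text = console_line.strip()
    for prompt, phase in RECOVERY_PROMPTS.items():
        if text.startswith(prompt):
            return phase
    return None
```

A console that stops at loader> therefore points to the boot loader phase (typically a kickstart image problem), and one that stops at switch(boot)# points to the kickstart phase (typically a system image problem).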


Figure 2-2 Regular and Recovery Sequence

Recovery from the loader> Prompt on Supervisor Modules


Caution This procedure uses the init system command, which reformats the file system of the device. Be sure that you have made a backup of the configuration files before you begin this procedure.

The loader> prompt is different from the regular switch# prompt. The CLI command completion feature does not work at the loader> prompt and may result in undesired errors. You must type the command exactly as you want the command to appear.


Note If you boot over TFTP from the loader> prompt, you must supply the full path to the image on the remote server.


Use the help command at the loader> prompt to display a list of commands available at this prompt or to obtain more information about a specific command in that list.

To recover a corrupted kickstart image (system error state) for a system with a single supervisor module, follow these steps:


Step 1 Enter the local IP address and subnet mask for the system at the loader> prompt, and press Enter.

loader> set ip 172.16.1.2 255.255.255.0


Step 2 Specify the IP address of the default gateway.

loader> set gw 172.16.1.1

Step 3 Boot the kickstart image file from the required server.

loader> boot tftp://172.16.10.100/tftpboot/n7000-s1-kickstart-4.0.bin

In this example, 172.16.10.100 is the IP address of the TFTP server, and n7000-s1-kickstart-4.0.bin is the name of the kickstart image file that exists on that server.

The switch(boot)# prompt indicates that you have a usable kickstart image.

Step 4 Enter the init system command at the switch(boot)# prompt.

switch(boot)# init system


Caution Be sure that you have made a backup of the configuration files before you enter this command.

Step 5 Follow the procedure specified in the "Recovery from the switch(boot)# Prompt" section.
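Before you type the set ip and set gw commands, you can check on a workstation that the gateway actually lies on the same subnet as the address you plan to assign. The following sketch uses the Python standard library ipaddress module; the function name is illustrative, and the addresses in the usage note are the ones from this procedure.

```python
import ipaddress


def gateway_on_local_subnet(ip, mask, gateway):
    """Return True if the gateway address falls inside the subnet of ip/mask."""
    interface = ipaddress.ip_interface(f"{ip}/{mask}")
    return ipaddress.ip_address(gateway) in interface.network
```

With the values used in Step 1 and Step 2 (172.16.1.2, 255.255.255.0, and 172.16.1.1), the check returns True. The TFTP server in Step 3 (172.16.10.100) is outside that subnet, which is why the default gateway must be set before booting over TFTP.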


Recovery from the loader> Prompt


Caution This procedure uses the init system command, which reformats the file system of the device. Be sure that you have made a backup of the configuration files before you begin this procedure.


Note The loader> prompt is different from the regular switch# or switch(boot)# prompt. The CLI command completion feature does not work at the loader> prompt and may result in undesired errors. You must type the command exactly as you want the command to appear.



Note If you boot over TFTP from the loader> prompt, you must supply the full path to the image on the remote server.



Tip Use the help command at the loader> prompt to display a list of commands available at this prompt or to obtain more information about a specific command in that list.


To recover a corrupted kickstart image (system error state) for a system with a single supervisor module, follow these steps:


Step 1 Specify the local IP address and the subnet mask for the system.

loader> set ip 172.21.55.213 255.255.255.224
set ip 172.21.55.213 255.255.255.224                                   
Correct - ip addr is 172.21.55.213, mask is 255.255.255.224
Found Intel 82546GB [2:9.0] at 0xe040, ROM address 0xf980
Probing...[Intel 82546GB]
Management interface
Link UP in 1000/full mode
Ethernet addr: 00:1B:54:C1:28:60
Address: 172.21.55.213
Netmask: 255.255.255.224
Server: 0.0.0.0
Gateway: 172.21.55.193

Step 2 Specify the IP address of the default gateway.

loader> set gw 172.21.55.193                                                   
Correct gateway addr 172.21.55.193
Address: 172.21.55.213
Netmask: 255.255.255.224
Server: 0.0.0.0
Gateway: 172.21.55.193

Step 3 Boot the kickstart image file from the required server.

loader> boot tftp://172.28.255.18/tftpboot/n7000-s1-kickstart.4.0.3.gbin
Address: 172.21.55.213
Netmask: 255.255.255.224
Server: 172.28.255.18
Gateway: 172.21.55.193
 Filesystem type is tftp, using whole disk
Booting: /tftpboot/n7000-s1-kickstart.4.0.3.gbin console=ttyS0,9600n8nn quiet loader
_ver="3.17.0"....
.............................................................................Im
age verification OK

Starting kernel...
INIT: version 2.85 booting
Checking all filesystems..r.r.r.. done.
Setting kernel variables: sysctlnet.ipv4.ip_forward = 0
net.ipv4.ip_default_ttl = 64
net.ipv4.ip_no_pmtu_disc = 1
. 
Setting the System Clock using the Hardware Clock as reference...System Clock set. Local 
time: Wed Oct  1 11:20:11 PST 2008
WARNING: image sync is going to be disabled after a loader netboot
Loading system software
No system image Unexporting directories for NFS kernel daemon...done.
INIT: Sending processes the KILL signal
Cisco Nexus Operating System (NX-OS) Software
TAC support: http://www.cisco.com/tac
Copyright (c) 2002-2008, Cisco Systems, Inc. All rights reserved.
The copyrights to certain works contained in this software are
owned by other third parties and used and distributed under
license. Certain components of this software are licensed under
the GNU General Public License (GPL) version 2.0 or the GNU
Lesser General Public License (LGPL) Version 2.1. A copy of each
such license is available at
http://www.opensource.org/licenses/gpl-2.0.php and
http://www.opensource.org/licenses/lgpl-2.1.php
switch(boot)# 

The switch(boot)# prompt indicates that you have a usable kickstart image.

Step 4 Enter the init system command at the switch(boot)# prompt.

switch(boot)# init system


Caution Be sure that you have made a backup of the configuration files before you enter this command.

Step 5 Follow the procedure specified in the "Recovery from the switch(boot)# Prompt" section.
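In the output above, the loader echoes its network settings (Address, Netmask, Server, and Gateway) after each command. If you log the console session, a small parser can confirm what the loader actually accepted. This is a sketch that assumes the "Name: value" layout shown in the output; the function name is hypothetical.

```python
LOADER_FIELDS = ("Address", "Netmask", "Server", "Gateway")


def parse_loader_network(output):
    """Collect the most recent Address, Netmask, Server, and Gateway values."""
    settings = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        # Ignore lines without a colon and fields such as "Ethernet addr".
        if sep and key.strip() in LOADER_FIELDS:
            settings[key.strip()] = value.strip()
    return settings
```

Later values overwrite earlier ones, so parsing a whole session returns the settings in effect at the end of the capture. In the output above, Server changes from 0.0.0.0 to 172.28.255.18 once the boot command names the TFTP server.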


Recovery from the switch(boot)# Prompt

To recover a system image using the kickstart image for a system with a single supervisor module, follow these steps:


Step 1 Change to configuration mode and configure the IP address of the mgmt0 interface.

switch(boot)# config t
switch(boot)(config)# interface mgmt0

Step 2 Follow this step if you entered an init system command. Otherwise, skip to Step 3.

a. Enter the ip address command to configure the local IP address and the subnet mask for the system.

switch(boot)(config-mgmt0)# ip address 172.16.1.2 255.255.255.0

b. Enter the ip default-gateway command to configure the IP address of the default gateway.

switch(boot)(config-mgmt0)# ip default-gateway 172.16.1.1

Step 3 Enter the no shutdown command to enable the mgmt0 interface on the system.

switch(boot)(config-mgmt0)# no shutdown

Step 4 Enter end to exit to EXEC mode.

switch(boot)(config-mgmt0)# end

Step 5 If you believe there are file system problems, enter the init system check-filesystem command. This command checks all internal file systems and fixes any errors that are encountered. This command takes a few minutes to complete.

switch(boot)# init system check-filesystem

Step 6 Copy the system image from the required TFTP server.

switch(boot)# copy tftp://172.16.10.100/system-image1 bootflash:system-image1

Step 7 Copy the kickstart image from the required TFTP server.

switch(boot)# copy tftp://172.16.10.100/kickstart-image1 bootflash:kickstart-image1

Step 8 Verify that the system and kickstart image files are copied to your bootflash: file system.

switch(boot)# dir bootflash: 
12456448     Jul 30 23:05:28 1980  kickstart-image1 
12288        Jun 23 14:58:44 1980  lost+found/ 
27602159     Jul 30 23:05:16 1980  system-image1 

Usage for bootflash://sup-local 
  135404544 bytes used 
   49155072 bytes free 
  184559616 bytes total 

Step 9 Load the system image from the bootflash: file system.

switch(boot)# load bootflash:system-image1
Uncompressing system image: bootflash:/system-image1
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

Would you like to enter the initial configuration mode? (yes/no): yes


Note If you enter no, you will return to the switch# login prompt, and you must manually configure the system.
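The verification in Step 8 can also be done on a saved copy of the dir bootflash: output. The sketch below assumes the listing layout shown in that step (size, date, time, year, name, followed by the usage summary); the function names are illustrative, and the required file names are whatever you copied in Step 6 and Step 7.

```python
import re

# One file entry: size, month, day, time, year, name.
FILE_RE = re.compile(r"^\s*(\d+)\s+\w{3}\s+\d+\s+[\d:]+\s+\d{4}\s+(\S+)\s*$")
# One usage line, for example "49155072 bytes free".
USAGE_RE = re.compile(r"^\s*(\d+) bytes (used|free|total)\s*$")


def parse_dir(output):
    """Return ({file name: size in bytes}, {"used" / "free" / "total": bytes})."""
    files, usage = {}, {}
    for line in output.splitlines():
        match = FILE_RE.match(line)
        if match:
            files[match.group(2)] = int(match.group(1))
            continue
        match = USAGE_RE.match(line)
        if match:
            usage[match.group(2)] = int(match.group(1))
    return files, usage


def missing_images(files, required):
    """Return the required image names that are absent from the listing."""
    return [name for name in required if name not in files]
```

For the listing in Step 8, both kickstart-image1 and system-image1 are present, and the used and free byte counts add up to the reported total (135404544 + 49155072 = 184559616).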



Recovery for Systems with Dual Supervisor Modules

This section describes how to recover when one or both supervisor modules in a dual supervisor system have corrupted bootflash.

Recovering One Supervisor Module With Corrupted Bootflash

If one supervisor module has functioning bootflash and the other has corrupted bootflash, follow these steps:


Step 1 Boot the functioning supervisor module and log on to the system.

Step 2 At the switch# prompt on the booted supervisor module, enter the reload module slot force-dnld command, where slot is the slot number of the supervisor module with the corrupted bootflash.

The supervisor module with the corrupted bootflash performs a netboot and checks the bootflash for corruption. When the bootup scripts discover that the bootflash is corrupted, they generate an init system command, which fixes the corrupt bootflash. The supervisor boots as the HA Standby.


Caution If your system has an active supervisor module currently running, you must enter the system standby manual-boot command in EXEC mode on the active supervisor module before issuing the init system command on the standby supervisor module to avoid corrupting the internal bootflash:. After the init system command completes on the standby supervisor module, enter the system no standby manual-boot command in EXEC mode on the active supervisor module.


Recovering Both Supervisor Modules with Corrupted Bootflash

If both supervisor modules have corrupted bootflash, follow these steps:


Step 1 Boot the system and press the Esc key after the BIOS memory test to interrupt the boot loader.


Note Press Esc immediately after you see the following message:

00000589K Low Memory Passed
00000000K Ext Memory Passed
Hit ^C if you want to run SETUP....
Wait.....

If you wait too long, you will skip the boot loader phase and enter the kickstart phase.


You see the loader> prompt.


Caution The loader> prompt is different from the regular switch# or switch(boot)# prompt. The CLI command completion feature does not work at the loader> prompt and may result in undesired errors. You must type the command exactly as you want the command to appear.

Tip Use the help command at the loader> prompt to display a list of commands available at this prompt or to obtain more information about a specific command in that list.


Step 2 Specify the local IP address and the subnet mask for the system.

loader> set ip 172.21.55.213 255.255.255.224
set ip 172.21.55.213 255.255.255.224                                   
Correct - ip addr is 172.21.55.213, mask is 255.255.255.224
Found Intel 82546GB [2:9.0] at 0xe040, ROM address 0xf980
Probing...[Intel 82546GB]
Management interface
Link UP in 1000/full mode
Ethernet addr: 00:1B:54:C1:28:60
Address: 172.21.55.213
Netmask: 255.255.255.224
Server: 0.0.0.0
Gateway: 172.21.55.193

Step 3 Specify the IP address of the default gateway.

loader> set gw 172.21.55.193                                                   
Correct gateway addr 172.21.55.193
Address: 172.21.55.213
Netmask: 255.255.255.224
Server: 0.0.0.0
Gateway: 172.21.55.193

Step 4 Boot the kickstart image file from the required server.

loader> boot tftp://172.28.255.18/tftpboot/n7000-s1-kickstart.4.0.3.gbin
Address: 172.21.55.213
Netmask: 255.255.255.224
Server: 172.28.255.18
Gateway: 172.21.55.193
 Filesystem type is tftp, using whole disk
Booting: /tftpboot/n7000-s1-kickstart.4.0.3.gbin console=ttyS0,9600n8nn quiet loader
_ver="3.17.0"....
.............................................................................Im
age verification OK

Starting kernel...
INIT: version 2.85 booting
Checking all filesystems..r.r.r.. done.
Setting kernel variables: sysctlnet.ipv4.ip_forward = 0
net.ipv4.ip_default_ttl = 64
net.ipv4.ip_no_pmtu_disc = 1
. 
Setting the System Clock using the Hardware Clock as reference...System Clock set. Local 
time: Wed Oct  1 11:20:11 PST 2008
WARNING: image sync is going to be disabled after a loader netboot
Loading system software
No system image Unexporting directories for NFS kernel daemon...done.
INIT: Sending processes the KILL signal
Cisco Nexus Operating System (NX-OS) Software
TAC support: http://www.cisco.com/tac
Copyright (c) 2002-2008, Cisco Systems, Inc. All rights reserved.
The copyrights to certain works contained in this software are
owned by other third parties and used and distributed under
license. Certain components of this software are licensed under
the GNU General Public License (GPL) version 2.0 or the GNU
Lesser General Public License (LGPL) Version 2.1. A copy of each
such license is available at
http://www.opensource.org/licenses/gpl-2.0.php and
http://www.opensource.org/licenses/lgpl-2.1.php
switch(boot)# 

The switch(boot)# prompt indicates that you have a usable kickstart image.


Note If you boot over TFTP from the loader> prompt, you must supply the full path to the image on the remote server.


Step 5 Enter the init system command to repartition and format the bootflash.

Step 6 Perform the procedure specified in the "Recovery from the switch(boot)# Prompt" section.

Step 7 Perform the procedure specified in the "Recovering One Supervisor Module With Corrupted Bootflash" section to recover the other supervisor module.



Note If you do not enter the reload module command when a boot failure has occurred, the active supervisor module automatically reloads the standby supervisor module within 3 to 6 minutes after the failure.


System or Process Resets

When a recoverable or nonrecoverable error occurs, the system or a process on the system may reset. See Table 2-4 for possible causes and solutions.

Symptom    The system or a process on the system reset.

Table 2-4 System or Process Resets

Problem: The system or a process on the system resets.

Possible Cause: A recoverable error occurred on the system or on a process in the system.
Solution: Cisco NX-OS automatically recovered from the problem. See the "Recoverable System Restarts" section and the "System or Process Resets" section.

Possible Cause: A nonrecoverable error occurred on the system.
Solution: Cisco NX-OS cannot recover automatically from the problem. See the "Unrecoverable System Restarts" section to determine the cause.

Possible Cause: A clock module failed.
Solution: Verify that a clock module failed. Replace the failed clock module during the next maintenance window.


Recoverable System Restarts

Every process restart generates a syslog message and a Call Home event. Even if the event does not affect service, you should identify and resolve the condition immediately because future occurrences could cause a service interruption.

To respond to a recoverable system restart, follow these steps:


Step 1 Check the syslog file to see which process restarted and why it restarted.

switch# show log logfile | include error

For information about the meaning of each message, see the Cisco NX-OS System Messages Reference.

The system output looks like the following example:

Sep 10 23:31:31 dot-6 % LOG_SYSMGR-3-SERVICE_TERMINATED: Service "sensor" (PID 704) has finished with error code SYSMGR_EXITCODE_SY.
switch# show logging logfile | include fail
Jan 27 04:08:42 88 %LOG_DAEMON-3-SYSTEM_MSG: bind() fd 4, family 2, port 123, addr 0.0.0.0, in_classd=0 flags=1 fails: Address already in use
Jan 27 04:08:42 88 %LOG_DAEMON-3-SYSTEM_MSG: bind() fd 4, family 2, port 123, addr 127.0.0.1, in_classd=0 flags=0 fails: Address already in use
Jan 27 04:08:42 88 %LOG_DAEMON-3-SYSTEM_MSG: bind() fd 4, family 2, port 123, addr 127.1.1.1, in_classd=0 flags=1 fails: Address already in use
Jan 27 04:08:42 88 %LOG_DAEMON-3-SYSTEM_MSG: bind() fd 4, family 2, port 123, addr 172.22.93.88, in_classd=0 flags=1 fails: Address already in use
Jan 27 23:18:59 88 % LOG_PORT-5-IF_DOWN: Interface fc1/13 is down (Link failure or not-connected)
Jan 27 23:18:59 88 % LOG_PORT-5-IF_DOWN: Interface fc1/14 is down (Link failure or not-connected)
Jan 28 00:55:12 88 % LOG_PORT-5-IF_DOWN: Interface fc1/1 is down (Link failure or not-connected)
Jan 28 00:58:06 88 % LOG_ZONE-2-ZS_MERGE_FAILED: Zone merge failure, Isolating port fc1/1 (VSAN 100)
Jan 28 00:58:44 88 % LOG_ZONE-2-ZS_MERGE_FAILED: Zone merge failure, Isolating port fc1/1 (VSAN 100)
Jan 28 03:26:38 88 % LOG_ZONE-2-ZS_MERGE_FAILED: Zone merge failure, Isolating port fc1/1 (VSAN 100)
Jan 29 19:01:34 88 % LOG_PORT-5-IF_DOWN: Interface fc1/1 is down (Link failure or not-connected)
switch#
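
When the log file is long, it can help to split each message into its facility, severity, and mnemonic before reading it in detail. The following Python sketch is one way to do that offline on saved output. It assumes each message has been joined onto a single line and follows the %FACILITY-SEVERITY-MNEMONIC layout seen in the example above. The function names and the severity threshold are illustrative and are not part of Cisco NX-OS.

```python
import re

# Pattern for the "%FACILITY-SEVERITY-MNEMONIC: text" part of a message.
# Some lines in the sample output have a space after "%", so allow it.
MESSAGE_RE = re.compile(
    r"%\s*(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+):\s*(?P<text>.*)"
)


def parse_message(line):
    """Return the message fields as a dict, or None if the line does not match."""
    match = MESSAGE_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["severity"] = int(fields["severity"])
    return fields


def most_severe(lines, threshold=3):
    """Keep messages at or below the threshold (a lower number is more severe)."""
    parsed = (parse_message(line) for line in lines)
    return [m for m in parsed if m is not None and m["severity"] <= threshold]


# Three messages taken from the example output above.
sample = [
    'Sep 10 23:31:31 dot-6 % LOG_SYSMGR-3-SERVICE_TERMINATED: Service "sensor" '
    "(PID 704) has finished with error code SYSMGR_EXITCODE_SY.",
    "Jan 27 23:18:59 88 % LOG_PORT-5-IF_DOWN: Interface fc1/13 is down "
    "(Link failure or not-connected)",
    "Jan 28 00:58:06 88 % LOG_ZONE-2-ZS_MERGE_FAILED: Zone merge failure, "
    "Isolating port fc1/1 (VSAN 100)",
]

for message in most_severe(sample):
    print(message["facility"], message["severity"], message["mnemonic"])
```

With the default threshold, the sketch keeps the LOG_SYSMGR and LOG_ZONE messages and drops the severity 5 link-down notification.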

Step 2 Identify the processes that are running and the status of each process.

switch# show processes 

The following codes are used in the system output for the State (process state):

D = uninterruptible sleep (usually I/O)

R = runnable (on run queue)

S = sleeping

T = traced or stopped

Z = defunct ("zombie") process

NR = not running

ER = should be running but currently not running


Note A process usually enters the ER state after it has restarted too many times and the system has detected it as faulty and disabled it.


The system output looks like the following example. (This output has been abbreviated to be more concise.)

PID    State  PC        Start_cnt    TTY   Process
-----  -----  --------  -----------  ----  -------------
    1      S  2ab8e33e            1     -  init
    2      S         0            1     -  keventd
    3      S         0            1     -  ksoftirqd_CPU0
    4      S         0            1     -  kswapd
    5      S         0            1     -  bdflush
    6      S         0            1     -  kupdated
   71      S         0            1     -  kjournald
  136      S         0            1     -  kjournald
  140      S         0            1     -  kjournald
  431      S  2abe333e            1     -  httpd
  443      S  2abfd33e            1     -  xinetd
  446      S  2ac1e33e            1     -  sysmgr
  452      S  2abe91a2            1     -  httpd
  453      S  2abe91a2            1     -  httpd
  456      S  2ac73419            1    S0  vsh
  469      S  2abe91a2            1     -  httpd
  470      S  2abe91a2            1     -  httpd  
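
On a system with many processes, the state codes can be checked programmatically on saved output. The following Python sketch assumes the six-column layout shown in the example above and flags rows whose state is Z, NR, or ER, or whose Start_cnt is greater than 1. Treating a Start_cnt above 1 as a sign of a restart is an assumption, the last sample row is hypothetical, and the function names are illustrative.

```python
# State codes from this step that usually deserve a closer look.
SUSPECT_STATES = {"Z", "NR", "ER"}


def parse_processes(output):
    """Parse the data rows of saved `show processes` output into dicts."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows have six columns; the PID is numeric, or "-" (assumed
        # here for a process that is not running). This skips the header
        # and the dashed separator line.
        if len(fields) != 6 or not (fields[0].isdigit() or fields[0] == "-"):
            continue
        pid, state, pc, start_cnt, tty, name = fields
        rows.append({
            "pid": int(pid) if pid.isdigit() else None,
            "state": state,
            "start_cnt": int(start_cnt),
            "name": name,
        })
    return rows


def needs_attention(row):
    """Flag suspect states and processes that started more than once."""
    return row["state"] in SUSPECT_STATES or row["start_cnt"] > 1


# The first two data rows come from the example above.
# The last row is hypothetical and only illustrates a flagged process.
sample_output = """\
PID    State  PC        Start_cnt    TTY   Process
-----  -----  --------  -----------  ----  -------------
    1      S  2ab8e33e            1     -  init
  446      S  2ac1e33e            1     -  sysmgr
    -     ER         -            4     -  exampled
"""

flagged = [row["name"] for row in parse_processes(sample_output) if needs_attention(row)]
print(flagged)
```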

Step 3 Display the processes that have exited abnormally and check whether a stack trace or core dump exists for each one.

switch# show process log
Process           PID     Normal-exit  Stack-trace  Core     Log-create-time
----------------  ------  -----------  -----------  -------  ---------------
ntp               919               N            N        N  Jan 27 04:08
snsm              972               N            Y        N  Jan 24 20:50
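
The summary table can be turned into a list of follow-up commands for the next step. The following Python sketch assumes the column layout shown above and prints a show processes log pid command for each process that did not exit normally. The function names are illustrative.

```python
def parse_process_log(output):
    """Parse the data rows of saved `show process log` output into dicts."""
    entries = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows: name, numeric PID, three Y/N flags, then the timestamp.
        # The header and separator rows have no numeric second field.
        if len(fields) < 5 or not fields[1].isdigit():
            continue
        name, pid, normal_exit, stack_trace, core = fields[:5]
        entries.append({
            "name": name,
            "pid": int(pid),
            "normal_exit": normal_exit == "Y",
            "stack_trace": stack_trace == "Y",
            "core": core == "Y",
            "created": " ".join(fields[5:]),
        })
    return entries


def follow_up_commands(entries):
    """Build one detail command per process that exited abnormally."""
    return [
        f"show processes log pid {entry['pid']}"
        for entry in entries
        if not entry["normal_exit"]
    ]


# Output copied from the example above.
sample_output = """\
Process           PID     Normal-exit  Stack-trace  Core     Log-create-time
----------------  ------  -----------  -----------  -------  ---------------
ntp               919               N            N        N  Jan 27 04:08
snsm              972               N            Y        N  Jan 24 20:50
"""

entries = parse_process_log(sample_output)
for command in follow_up_commands(entries):
    print(command)
```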

Step 4 Show detailed information about a specific process that has restarted.

switch# show processes log pid 898
Service: idehsd
Description: ide hotswap handler Daemon
Started at Mon Sep 16 14:56:04 2002 (390923 us)
Stopped at Thu Sep 19 14:18:42 2002 (639239 us)
Uptime: 2 days 23 hours 22 minutes 22 seconds
Start type: SRV_OPTION_RESTART_STATELESS (23)
Death reason: SYSMGR_DEATH_REASON_FAILURE_SIGTERM (3)
Exit code: signal 15 (no core)
CWD: /var/sysmgr/work
Virtual Memory:
    CODE      08048000 - 0804D660
    DATA      0804E660 - 0804E824
    BRK       0804E9A0 - 08050000
    STACK     7FFFFD10
Register Set:
    EBX 00000003         ECX 0804E994         EDX 00000008
    ESI 00000005         EDI 7FFFFC9C         EBP 7FFFFCAC
    EAX 00000008         XDS 0000002B         XES 0000002B
    EAX 00000003 (orig)  EIP 2ABF5EF4         XCS 00000023
    EFL 00000246         ESP 7FFFFC5C         XSS 0000002B
Stack: 128 bytes. ESP 7FFFFC5C, TOP 7FFFFD10
0x7FFFFC5C: 0804F990 0804C416 00000003 0804E994 ................
0x7FFFFC6C: 00000008 0804BF95 2AC451E0 2AAC24A4 .........Q.*.$.*
0x7FFFFC7C: 7FFFFD14 2AC2C581 0804E6BC 7FFFFCA8 .......*........
0x7FFFFC8C: 7FFFFC94 00000003 00000001 00000003 ................
0x7FFFFC9C: 00000001 00000000 00000068 00000000 ........h.......
0x7FFFFCAC: 7FFFFCE8 2AB4F819 00000001 7FFFFD14 .......*........
0x7FFFFCBC: 7FFFFD1C 0804C470 00000000 7FFFFCE8 ....p...........
0x7FFFFCCC: 2AB4F7E9 2AAC1F00 00000001 08048A2C ...*...*....,...
PID: 898
SAP: 0
UUID: 0
switch#
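
The Exit code line reports the signal that ended the process and whether a core file was written. The following Python sketch decodes that line. The signal table is a small subset of the conventional Linux signal numbers and is included only for illustration; the function name is not part of Cisco NX-OS.

```python
import re

# A few conventional Linux signal numbers (subset, for illustration only).
SIGNAL_NAMES = {6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}

EXIT_RE = re.compile(r"Exit code: signal (\d+) \((no core|core dumped)\)")


def decode_exit(line):
    """Return the signal number, its name, and whether a core was dumped."""
    match = EXIT_RE.search(line)
    if match is None:
        return None
    number = int(match.group(1))
    return {
        "signal": number,
        "name": SIGNAL_NAMES.get(number, "unknown"),
        "core": match.group(2) == "core dumped",
    }


# Line copied from the example output above.
print(decode_exit("Exit code: signal 15 (no core)"))
```

For the example above, signal 15 decodes to SIGTERM, which agrees with the SYSMGR_DEATH_REASON_FAILURE_SIGTERM death reason in the same output.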

Step 5 Determine if the restart recently occurred.

switch# show system uptime 
Start Time: Fri Sep 13 12:38:39 2002
Up Time:    0 days, 1 hours, 16 minutes, 22 seconds

To determine if the restart is repetitive or a one-time occurrence, compare the length of time that the system has been up with the timestamp of each restart.
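
The comparison can be sketched in Python as follows. The sketch parses the two lines of show system uptime output, works out the window during which the system has been up, and counts the restarts that fall inside that window. The restart timestamps in the example are hypothetical, and the rule that two or more restarts in the window count as repetitive is an assumption made for illustration.

```python
import re
from datetime import datetime, timedelta

UPTIME_RE = re.compile(r"(\d+) days, (\d+) hours, (\d+) minutes, (\d+) seconds")


def parse_start_time(line):
    """Parse a line such as 'Start Time: Fri Sep 13 12:38:39 2002'."""
    text = line.split(":", 1)[1].strip()
    return datetime.strptime(text, "%a %b %d %H:%M:%S %Y")


def parse_uptime(line):
    """Parse a line such as 'Up Time: 0 days, 1 hours, 16 minutes, 22 seconds'."""
    match = UPTIME_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognized uptime line: {line!r}")
    days, hours, minutes, seconds = map(int, match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def classify(start_time, uptime, restart_times):
    """Classify restarts that occurred while the system has been up."""
    now = start_time + uptime
    in_window = [t for t in restart_times if start_time <= t <= now]
    if not in_window:
        return "none"
    return "repetitive" if len(in_window) > 1 else "one-time"


# Values copied from the example output above.
start = parse_start_time("Start Time: Fri Sep 13 12:38:39 2002")
uptime = parse_uptime("Up Time:    0 days, 1 hours, 16 minutes, 22 seconds")

# Hypothetical restart timestamps, for illustration only.
restarts = [datetime(2002, 9, 13, 13, 10, 0), datetime(2002, 9, 13, 13, 40, 0)]

print(classify(start, uptime, restarts))
```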

Step 6 View the core files.

switch# show cores
Module-num  Process-name  PID   Core-create-time
----------  ------------  ----  ----------------
5           fspf          1524  Jan 9 03:11
6           fcc           919   Jan 9 03:09
8           acltcam       285   Jan 9 03:09
8           fib           283   Jan 9 03:08

The output shows all cores that are presently available for upload from the active supervisor. The Module-num column shows the slot number on which the core was generated. In the previous example, an FSPF core was generated on the active supervisor module in slot 5. An FCC core was generated on the standby supervisor module in slot 6. Core dumps generated on the module in slot 8 include ACLTCAM and FIB.

Copy the FSPF core dump to a TFTP server with the IP address 1.1.1.1, as follows:

switch# copy core://5/1524 tftp://1.1.1.1/abcd
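
When several cores are listed, one copy command is needed for each module and PID pair. The following Python sketch builds those commands from saved show cores output, using the copy core://module#/pid# tftp://tftp_ip_address/file_name form. The destination file naming scheme (process name followed by PID) is hypothetical, and the function names are illustrative.

```python
def parse_cores(output):
    """Parse the data rows of saved `show cores` output into dicts."""
    cores = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows: numeric module, process name, numeric PID, timestamp.
        if len(fields) < 3 or not (fields[0].isdigit() and fields[2].isdigit()):
            continue
        cores.append({
            "module": int(fields[0]),
            "process": fields[1],
            "pid": int(fields[2]),
            "created": " ".join(fields[3:]),
        })
    return cores


def copy_commands(cores, server):
    """Build one copy command per core; the file name scheme is hypothetical."""
    return [
        f"copy core://{c['module']}/{c['pid']} tftp://{server}/{c['process']}_{c['pid']}"
        for c in cores
    ]


# Output copied from the example above.
sample_output = """\
Module-num  Process-name  PID   Core-create-time
----------  ------------  ----  ----------------
5           fspf          1524  Jan 9 03:11
6           fcc           919   Jan 9 03:09
8           acltcam       285   Jan 9 03:09
8           fib           283   Jan 9 03:08
"""

commands = copy_commands(parse_cores(sample_output), "1.1.1.1")
for command in commands:
    print(command)
```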

Display the log information for the process with PID 1473 as follows:

switch# show processes log pid 1473
======================================================
Service: ips
Description: IPS Manager

Started at Tue Jan  8 17:07:42 1980 (757583 us)
Stopped at Thu Jan 10 06:16:45 1980 (83451 us)
Uptime: 1 days 13 hours 9 minutes 9 seconds

Start type: SRV_OPTION_RESTART_STATELESS (23)
Death reason: SYSMGR_DEATH_REASON_FAILURE_SIGNAL (2)
Exit code: signal 6 (core dumped)
CWD: /var/sysmgr/work

Virtual Memory:

    CODE      08048000 - 080FB060
    DATA      080FC060 - 080FCBA8
    BRK       081795C0 - 081EC000
    STACK     7FFFFCF0
    TOTAL     20952 KB

Register Set:

    EBX 000005C1         ECX 00000006         EDX 2AD721E0
    ESI 2AD701A8         EDI 08109308         EBP 7FFFF2EC
    EAX 00000000         XDS 0000002B         XES 0000002B
    EAX 00000025 (orig)  EIP 2AC8CC71         XCS 00000023
    EFL 00000207         ESP 7FFFF2C0         XSS 0000002B

Stack: 2608 bytes. ESP 7FFFF2C0, TOP 7FFFFCF0

0x7FFFF2C0: 2AC8C944 000005C1 00000006 2AC735E2 D..*.........5.*
0x7FFFF2D0: 2AC8C92C 2AD721E0 2AAB76F0 00000000 ,..*.!.*.v.*....
0x7FFFF2E0: 7FFFF320 2AC8C920 2AC513F8 7FFFF42C  ... ..*...*,...
0x7FFFF2F0: 2AC8E0BB 00000006 7FFFF320 00000000 ...*.... .......
0x7FFFF300: 2AC8DFF8 2AD721E0 08109308 2AC65AFC ...*.!.*.....Z.*
0x7FFFF310: 00000393 2AC6A49C 2AC621CC 2AC513F8 .......*.!.*...*
0x7FFFF320: 00000020 00000000 00000000 00000000  ...............
0x7FFFF330: 00000000 00000000 00000000 00000000 ................
0x7FFFF340: 00000000 00000000 00000000 00000000 ................
0x7FFFF350: 00000000 00000000 00000000 00000000 ................
0x7FFFF360: 00000000 00000000 00000000 00000000 ................
0x7FFFF370: 00000000 00000000 00000000 00000000 ................
0x7FFFF380: 00000000 00000000 00000000 00000000 ................
0x7FFFF390: 00000000 00000000 00000000 00000000 ................
0x7FFFF3A0: 00000002 7FFFF3F4 2AAB752D 2AC5154C .
... output abbreviated ...
Stack: 128 bytes. ESP 7FFFF830, TOP 7FFFFCD0

Step 7 Enter the system cores tftp:[//servername][/path] command to configure the system to use TFTP to send the core dump to a TFTP server.

This command enables the automatic copying of core files to a TFTP server. For example, the following command sends the core files to the TFTP server with the IP address 10.1.1.1:

switch(config)# system cores tftp://10.1.1.1/cores

The following conditions apply:

The core files are copied every 4 minutes. This time interval is not configurable.

To copy a specific core file to a TFTP server manually, use the copy core://module#/pid# tftp://tftp_ip_address/file_name command.

The maximum number of times that a process can be restarted is part of the high-availability (HA) policy for any process. (This parameter is not configurable.) If the process restarts more than the maximum number of times, the older core files are overwritten.

The maximum number of core files that can be saved for any process is part of the HA policy for any process. (This parameter is not configurable, and it is set to three.)
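
The retention rule described above (at most three core files per process, with the oldest overwritten first) can be pictured with a small Python model. This is only a sketch of the stated behavior and not the Cisco NX-OS implementation; the class name and the core file names are hypothetical.

```python
from collections import defaultdict, deque

# Limit stated in the text above; it is not configurable.
MAX_CORES_PER_PROCESS = 3


class CoreStore:
    """Toy model: keep only the newest core files for each process."""

    def __init__(self):
        # A bounded deque discards its oldest item when a new one is appended.
        self._cores = defaultdict(lambda: deque(maxlen=MAX_CORES_PER_PROCESS))

    def save(self, process, core_name):
        self._cores[process].append(core_name)

    def saved(self, process):
        return list(self._cores[process])


store = CoreStore()
for index in range(1, 5):
    # Hypothetical core file names for four successive restarts.
    store.save("fspf", f"fspf_core_{index}")

print(store.saved("fspf"))
```

After the fourth restart in this model, the first core file is no longer available, which is why copying core files off the system promptly matters when a process restarts repeatedly.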

Step 8 Determine the cause and resolution for the restart condition by contacting your technical support representative and asking the representative to review your core dump.


See the Cisco NX-OS High Availability and Redundancy Guide, Release 4.0 for more information on high-availability policies.

Unrecoverable System Restarts

An unrecoverable system restart might occur in the following cases:

A critical process fails and is not restartable.

A process restarts more times than is allowed by the system configuration.

A process restarts more frequently than is allowed by the system configuration.

The effect of a process reset is determined by the policy configured for each process. An unrecoverable reset may cause functionality loss, the active supervisor to restart, a supervisor switchover, or the system to restart.
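
The three cases listed above can be expressed as a simple check. The following Python sketch is a hypothetical model only: the policy fields, their names, and their values are assumptions made for illustration and do not correspond to actual Cisco NX-OS policy parameters.

```python
from dataclasses import dataclass


@dataclass
class RestartPolicy:
    restartable: bool      # hypothetical: can the process be restarted at all?
    max_restarts: int      # hypothetical limit on the total number of restarts
    min_interval_s: float  # hypothetical minimum spacing between restarts (seconds)


def is_unrecoverable(policy, restart_times_s):
    """Check a failed process against the three cases.

    restart_times_s holds the ascending timestamps, in seconds, of every
    restart of the process so far.
    """
    # Case 1: the process is not restartable.
    if not policy.restartable:
        return True
    # Case 2: the process restarted more times than allowed.
    if len(restart_times_s) > policy.max_restarts:
        return True
    # Case 3: two consecutive restarts were closer together than allowed.
    gaps = (later - earlier for earlier, later in zip(restart_times_s, restart_times_s[1:]))
    return any(gap < policy.min_interval_s for gap in gaps)


policy = RestartPolicy(restartable=True, max_restarts=3, min_interval_s=60.0)
print(is_unrecoverable(policy, [0.0, 100.0, 200.0]))
print(is_unrecoverable(policy, [0.0, 10.0]))
```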

To respond to an unrecoverable reset, see the "Troubleshooting Cisco NX-OS Software System Reboots" section.

The show system reset-reason CLI command displays the following information (see Example 2-2):

The last four reset-reason codes for the supervisor modules are displayed. If either supervisor module is absent, the reset-reason codes for that supervisor module are not displayed.