Cisco AS5400 Series Universal Gateways

Configuring NextPort SPE Recovery

Document ID: 6207



Contents

Introduction
Prerequisites
      Requirements
      Components Used
      Conventions
SPE/Port Failure Overview
Configuring SPE Recovery
      SPE Recovery Configuration Examples
Monitoring SPE Recovery
Using SPE Recovery
      Automatic Identification of Bad Port/SPEs
      Reloading Firmware
      Explaining the Configuration
      SPE Recovery Command Summary
Related Information

Introduction

Field analysis and investigation have found cases where NextPort Software Port Entities (SPEs) in production sometimes stop working. Reloading the firmware resets the SPE and generally returns it to operation. The objective of the SPE recovery feature is to have the network access server (NAS) identify SPEs that have gone out of operation and automatically reload their digital signal processor (DSP) firmware, with minimal impact on end users and NAS capacity.

For information on how to perform modem recovery on MICA modem platforms, refer to the document Configuring Modem Recovery.

Prerequisites

Requirements

There are no specific requirements for this document.

Components Used

SPE recovery first appeared in Cisco IOS® Software Release 12.1(1)XD (for the AS5400). It was implemented for the AS5800 Series starting in 12.1(3)T.

We recommend using Cisco IOS Software Release 12.2 mainline or later. Make sure that the Cisco IOS image you have loaded supports SPE recovery.

The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.

Conventions

For more information on document conventions, refer to the Cisco Technical Tips Conventions.

SPE/Port Failure Overview

NextPort platforms are implemented with one Software Port Entity (SPE) servicing six individual ports (modems). Features like busyout and shutdown can be configured at the SPE or the individual port level. The NextPort Dial Feature Card (DFC) contains 18 SPEs, whose value ranges from 0 to 17. Because each SPE has six ports, the NextPort DFC has a total of 108 ports. The port value ranges from 0 to 107.
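The numbering above can be sketched in a few lines of Python. This is only an illustration, and it assumes that ports are assigned to SPEs contiguously (SPE 0 serves ports 0 to 5, SPE 1 serves ports 6 to 11, and so on), which is consistent with the debug output later in this document, where ports 1/0 through 1/5 belong to SPE 1/00. The function and constant names are hypothetical and are not part of Cisco IOS.

```python
PORTS_PER_SPE = 6   # from the text: one SPE services six ports
SPES_PER_DFC = 18   # from the text: a NextPort DFC contains 18 SPEs


def spe_for_port(port: int) -> int:
    """Return the SPE number (0-17) that serves a DFC port (0-107).

    Assumes contiguous assignment of ports to SPEs.
    """
    if not 0 <= port < PORTS_PER_SPE * SPES_PER_DFC:
        raise ValueError(f"port {port} is outside the range 0-107")
    return port // PORTS_PER_SPE


def ports_for_spe(spe: int) -> range:
    """Return the six port numbers served by an SPE (0-17)."""
    if not 0 <= spe < SPES_PER_DFC:
        raise ValueError(f"SPE {spe} is outside the range 0-17")
    first = spe * PORTS_PER_SPE
    return range(first, first + PORTS_PER_SPE)


print(spe_for_port(107))          # 17
print(list(ports_for_spe(0)))     # [0, 1, 2, 3, 4, 5]
```

Under this assumption, a failure on any one of the six ports returned by ports_for_spe affects recovery of the whole SPE, because firmware is reloaded per SPE rather than per port.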

Configuring SPE Recovery

Once you have installed a Cisco IOS Software release that supports SPE recovery, configure a recovery scheme that meets the needs of your installation. To decide how best to configure it, you need to understand your usage patterns and policies. This involves answering questions such as:

  • At what rate are the ports failing? Two SPEs per day? Two SPEs per hour?

  • When is your daily usage at its lowest?

  • What is your policy regarding clearing active calls? Are you willing to drop calls as needed, in order to reload SPEs, to prevent other callers from getting busy signals?

SPE Recovery Configuration Examples

Here are some example configurations for SPE recovery. For more information, refer to Managing Port Services for the AS5400 and AS5800.

Tip: Due to similarities between the AS5350 and AS5400 platforms, the document Managing Port Services on the AS5400 can also be used for the AS5350. Similarly, the document Managing Port Services on the AS5800 can also be applied to the AS5850.

Note: On MICA modem systems, modem recovery commands are used instead of the SPE recovery commands shown here. For more information on modem recovery with MICA modems, refer to Configuring MICA Modem Recovery.

SPE command                      MICA command
spe recovery port-threshold      modem recovery threshold
spe recovery port-action         modem recovery action
spe download maintenance         modem recovery maintenance

Note: Keep in mind that each SPE controls six ports. SPE recovery can tag a malfunctioning individual port as "pending recovery" but requires all the ports on the SPE to be idle before reloading the SPE firmware. In other words, the SPE will be considered idle only when all six ports on the SPE are idle at which time recovery may take place for that SPE.

The scenarios below describe sample parameters to:

  • Identify that an individual port is malfunctioning.

  • Specify whether to mark the port as "Bad" or wait till the other ports on the SPE are idle before performing recovery.

  • Specify the time and window of the firmware download for the affected SPE.

  • Specify the action to take if the SPE is not idle at the end of the download window.

Note: All the commands in each table should be applied to the router for each of the following scenarios.

Highly Aggressive SPE Recovery

This assumes the administrator finds ports that need frequent recovery. Hence, ports must be recovered quickly and continuously to prevent the clients from encountering busies. In this case, it is assumed that the administrator is willing to drop active calls as needed.

Command

Explanation

spe recovery port-threshold 10

When a port suffers 10 consecutive trainup failures, it will be taken out of service and marked as pending recovery.

spe recovery port-action recover

Sets the port to a recovery pending state, thereby stopping the port from accepting new calls. Recovery is performed when the SPE is in the idle state and has no active calls on its 6 ports.

spe download maintenance max-spes 3

Sets a maximum of three SPEs that can simultaneously be in maintenance.

spe download maintenance window 60

Any idle port will be placed in recovery pending state during the maintenance window. If all active calls on the SPE drop during the 60 minute maintenance window, the SPE will reload immediately.

After 60 minutes, the system performs the action specified in spe download maintenance expired-window shown below.

spe download maintenance expired-window drop-call

Once the maintenance window expires, all active connections on the SPE ports are shut down and the firmware is downloaded.

Moderately Aggressive SPE Recovery

In this case, the administrator finds that ports need recovering at the rate of only two or three SPEs per day. The customer has several spare SPEs per chassis, and usage is lowest from 02:00 to 04:00, so it is acceptable to delay recovery until that time.

Command

Explanation

spe recovery port-threshold 10

We keep the consecutive trainup failure threshold at 10 to maintain the degree of certainty that the port is indeed bad.

spe recovery port-action recover

Sets the port to a recovery pending state, thereby stopping the port from accepting new calls. Recovery is performed when the SPE is in the idle state and has no active calls on its 6 ports.

spe download maintenance time 02:00

Rather than starting immediately, however, this administrator is willing to wait until a lull in usage at 2 AM.

The download maintenance activity starts at 2 AM and steps through all SPEs that need recovery and the SPEs that need a firmware upgrade.

spe download maintenance stop-time 04:00

The SPE firmware download process can continue until 4 AM.

spe download maintenance max-spes 5

The maximum number of SPEs tied up is raised to 5 because during the maintenance window fewer ports are needed to service incoming requests.

spe download maintenance window 90

Any idle port is placed in recovery pending state during the maintenance window. If all active calls on the SPE drop during the 90 minute maintenance window, the SPE reloads immediately.

After 90 minutes, the system performs the action specified in spe download maintenance expired-window shown below.

spe download maintenance expired-window drop-call

If a SPE needs to be recovered, this administrator is still willing to disconnect a current user to make sure the modems are available during peak usage hours.

Conservative SPE Recovery

In this case, the administrator has many spare modems in the chassis, and is unwilling to drop user calls.

Command

Explanation

spe recovery port-threshold 7

In this case, the consecutive trainup failure threshold is lowered to 7 to make it more likely for the port to be marked bad.

spe recovery port-action recover

Set the port into a recovery pending state, thereby stopping the port from accepting new calls. Perform recovery when the SPE is in the idle state and has no active calls on all 6 ports.

spe download maintenance time 02:00

This administrator is also willing to wait until the number of users is smallest.

The download maintenance activity starts at 2 AM and steps through all SPEs that need recovery and/or firmware upgrade.

spe download maintenance stop-time 05:00

This maintenance window lasts until 5 AM.

spe download maintenance max-spes 5

The maximum number of SPEs would still be 5 because during the maintenance window fewer modems are needed to service incoming requests.

spe download maintenance window 120

Any idle port is placed in recovery pending state during the maintenance window. If all active calls on the SPE drop during the 120 minute maintenance window, the SPE reloads immediately.

After 120 minutes, the system performs the action specified in spe download maintenance expired-window shown below.

spe download maintenance expired-window reschedule

The firmware download is rescheduled to the next download maintenance time.

Since this NAS has spare SPEs, the administrator can afford to have some of the modems out of service during peak usage hours. There is no need to disconnect users.

Because the NAS has many spare SPEs, the administrator configures a consecutive trainup failure threshold of 7. This quickly detects a failed modem, but at the cost of some modems being marked for download prematurely. During the maintenance period, if active calls on the SPE should fail to drop, the recovery will be automatically rescheduled rather than drop existing calls.

Monitoring SPE Recovery

Use the command show spe recovery to monitor SPE recovery once it has been enabled. A sample output, obtained from an AS5400, is provided:

RABU-5400#show spe recovery 
SPE#      Session Abort   Session NAK   Call Failure   SPE Bad
1/00                  0             0              1         0
1/01                  0             0              0         0
1/02                  0             0              0         0

The 'Call Failure' count increments every time there are 'x' consecutive failed calls to one modem or port in that SPE, where 'x' is the value configured with spe recovery port-threshold x. In other words, when ANY port (modem) on an SPE reaches the spe recovery port-threshold x value, the Call Failure column increments by 1, and the SPE is marked for recovery (download) during the maintenance window (defined in the router configuration).

You can also use debug commands to verify that SPE recovery was performed. An example is shown below:
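If the output is collected by a script, the counters can be checked automatically. The following Python sketch parses the sample output shown above and lists the SPEs whose Call Failure counter is nonzero. It assumes the five-column layout of that sample; the function names and dictionary keys are hypothetical, and the layout on other releases may differ.

```python
SAMPLE_OUTPUT = """\
RABU-5400#show spe recovery
SPE#      Session Abort   Session NAK   Call Failure   SPE Bad
1/00                  0             0              1         0
1/01                  0             0              0         0
1/02                  0             0              0         0
"""

# Hypothetical key names for the four counter columns, in display order.
COLUMNS = ("session_abort", "session_nak", "call_failure", "spe_bad")


def parse_spe_recovery(text: str) -> dict:
    """Map each SPE identifier (for example '1/00') to its counters."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # A data row is "<slot>/<spe>" followed by four integers;
        # the prompt line and the header line do not match this shape.
        if len(fields) != 5 or "/" not in fields[0]:
            continue
        if not all(field.isdigit() for field in fields[1:]):
            continue
        counters[fields[0]] = dict(zip(COLUMNS, map(int, fields[1:])))
    return counters


def spes_with_call_failures(text: str) -> list:
    """Return the SPEs on which at least one port reached the threshold."""
    return [spe for spe, row in parse_spe_recovery(text).items()
            if row["call_failure"] > 0]


print(spes_with_call_failures(SAMPLE_OUTPUT))  # ['1/00']
```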

debug csm modem
debug spe download-maintenance 
debug spe fm 

Nov  7 00:24:07.295: CSM DSPLIB(1/1): Modem state changed to 
(TERMINATING_STATE)
Nov  7 00:24:07.295: CSM DSPLIB(1/1): Modem went onhook
Nov  7 00:24:07.295: CSM_PROC_IC8_OC8_DISCONNECTING: 
CSM_EVENT_MODEM_ONHOOK at slot 1, port 1
Nov  7 00:24:07.295: CSM(1/1): Enter csm_enter_idle_state

!-- This is the last failed modem call that triggers spe recovery.

Nov  7 00:24:07.295: PM_DNLD_MAINT: Recovery due to idle 1/00

!-- SPE is marked for recovery, and recovery starts immediately
!-- when the whole SPE is IDLE.

Nov  7 00:24:07.295: CSM DSPLIB(1/1):DSPLIB_IDLE: 
Modem session transition to FLUSHING
Nov  7 00:24:07.295: CSM DSPLIB(1/1):DSPLIB_IDLE: 
Modem session transition to IDLE
Nov  7 00:24:07.295: CSM_PROC_IDLE: CSM_EVENT_SHUTDOWN_INTERFACE 
at slot 1, port 0
Nov  7 00:24:07.295: CSM_PROC_IDLE: CSM_EVENT_SHUTDOWN_INTERFACE 
at slot 1, port 1
Nov  7 00:24:07.295: CSM_PROC_IDLE: CSM_EVENT_SHUTDOWN_INTERFACE 
at slot 1, port 2
Nov  7 00:24:07.295: CSM_PROC_IDLE: CSM_EVENT_SHUTDOWN_INTERFACE 
at slot 1, port 3
Nov  7 00:24:07.295: CSM_PROC_IDLE: CSM_EVENT_SHUTDOWN_INTERFACE 
at slot 1, port 4
Nov  7 00:24:07.295: CSM_PROC_IDLE: CSM_EVENT_SHUTDOWN_INTERFACE 
at slot 1, port 5

!-- Since all 6 ports (1/0-1/5) on the SPE are idle,
!-- they are busied out in preparation for recovery.

Nov  7 00:24:07.299: PM_FW_MGR: pm_fm_req_download
Nov  7 00:24:07.299: PM_FW_MGR: firm_index = 1
Nov  7 00:24:07.299: PM_NP_FW_MGR: pm_np_fm_alloc_rsp()
Nov  7 00:24:07.299: PM_NP_FW_MGR: fw_msg addrs = 0x64022A64
Nov  7 00:24:07.299: PM_NP_FW_MGR: firmware index = 1 spes 0x1 
mi = 0x1000000
Nov  7 00:24:07.299: PM_DNLD_MAINT: Rsp of Recovery Req 0 for 1/00
Nov  7 00:24:07.299: PM_NP_FW_MGR:pm_np_fm_dl_event_hdlr
Nov  7 00:24:07.299: PM_NP_FW_MGR:fw_msg->err_code == 0x0
Nov  7 00:24:07.299: PM_NP_FW_MGR:SPE are busied out - fw_index = 1
Nov  7 00:24:10.591: CSM_PROC_IDLE: CSM_EVENT_DSX0_DISCONN_TIMEOUT 
at slot 1, port 1
Nov  7 00:24:14.991: PM_FW_MGR: pm_fm_spe_enable()
Nov  7 00:24:14.991: PM_FW_MGR: slot = 1 spe = 0
Nov  7 00:24:14.991: PM_DNLD_MAINT: 1/01 port recovery count = 1
Nov  7 00:24:14.991: PM_NP_FW_MGR:pm_np_fm_dl_event_hdlr
Nov  7 00:24:14.991: PM_NP_FW_MGR: DL completed
Nov  7 00:24:14.991: PM_NP_FW_MGR: module 0 max_spe = 6 spe_bitmape 0x1
Nov  7 00:24:14.991: Slot 1, spe 0 - Download done

!-- Firmware download for SPE 1/0 is complete.

Nov  7 00:24:15.003: CSM DSPLIB(1/1):DSPLIB_IDLE: 
Modem session transition to IDLE
Nov  7 00:24:15.003: CSM DSPLIB(1/2):DSPLIB_IDLE: 
Modem session transition to IDLE
Nov  7 00:24:15.003: CSM DSPLIB(1/3):DSPLIB_IDLE: 
Modem session transition to IDLE
Nov  7 00:24:15.003: CSM DSPLIB(1/4):DSPLIB_IDLE: 
Modem session transition to IDLE
Nov  7 00:24:15.003: CSM DSPLIB(1/5):DSPLIB_IDLE: 
Modem session transition to IDLE
Nov  7 00:24:15.035: CSM DSPLIB(1/0):DSPLIB_IDLE: 
Modem session transition to IDLE

!-- All the ports in the SPE transition to IDLE.
!-- They are now ready to take calls.

Using SPE Recovery

Automatic Identification of Bad Port/SPEs

NextPort modems have been shown to maintain a healthy call success rate (CSR) of 90 to 95 percent under normal usage. This means that 90 to 95 percent of all calls that are allocated to a port successfully connect, link, train up, negotiate, and finally enter a steady state where the client and NAS modems can transfer data. The 5 to 10 percent failure rate can be attributed to numerous client-side issues such as incompatible clients and clients disconnecting. These client-side issues cannot be viewed as a problem with the NextPort SPE. So, in the worst-case scenario, you can expect that about 1 call in 10 attempts will fail on a healthy port. If each call attempt is treated as independent, basic statistics tell us:

  • The probability of 1 failed call attempt is: 1/10.

  • The probability of 2 consecutive failed call attempts is: 1/10 x 1/10.

  • The probability of 3 consecutive failed call attempts is: 1/10 x 1/10 x 1/10.

  • The probability of "n" consecutive failed call attempts is: (1/10) raised to the power of "n".

As such, even with a normal CSR of 90 percent, the probability that a good modem repeatedly fails to enter a steady state (after a call has been allocated to it) drops by a factor of ten with each additional consecutive failed call attempt. Therefore, under this analysis, with the port threshold set to ten, it is extremely unlikely that a modem with ten consecutive failures is operating properly.
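The arithmetic above can be checked with a short Python sketch. It assumes, as the analysis does, a worst-case 10 percent failure rate on a healthy port and that call attempts are independent of one another; the function name is hypothetical.

```python
def prob_consecutive_failures(n: int, failure_rate: float = 0.10) -> float:
    """Probability that a healthy port fails n call attempts in a row.

    Assumes independent attempts, each failing with probability
    failure_rate (10 percent is the worst case quoted in the text).
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    return failure_rate ** n


# Thresholds used in this document: 7 (conservative example),
# 10 (aggressive examples), and 30 (the stated default).
for threshold in (1, 2, 3, 7, 10, 30):
    p = prob_consecutive_failures(threshold)
    print(f"{threshold:2d} consecutive failures: {p:.0e}")
```

With a threshold of 10, a healthy port would be flagged by chance roughly once in ten billion sequences of ten calls under these assumptions. With a threshold of 7 the figure is about one in ten million, which is why the lower threshold detects failures sooner at the cost of occasionally marking a good port.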

Reloading Firmware

As mentioned above, SPE ports are implemented in a modular fashion whereby six ports are allocated to a single SPE. This was done to minimize costs and complexity. An unfortunate consequence of this design is that the NAS is unable to download firmware to a single port, but requires all six ports to be reloaded at the same time. This issue is not significant when booting the NAS because no active calls are being processed at that time. However, this issue becomes significant when trying to load firmware for either recovery or upgrade purposes. The objective is to reload the Dial Feature Card (DFC) with a minimal impact to the end users and the NAS operation. For this purpose, you have two tools at your disposal:

  • SPE "busyout": This locks all ports on the module as being busy and does not allow new calls to be allocated on any of the ports until the "busyout" is removed. This is usually after the SPE is reloaded. Existing calls on ports are not affected when SPE is in the "busyout" state.

  • Hourly utilization analysis: Modem usage is quite predictable. There are telecommuters who use modems between 7:00 AM and 6:00 PM who provide a consistent call volume throughout the business day. Then, there are nightly surfers who surf the Web between 6:00 PM and 2:00 AM. As a result, modem usage between 2:00 AM and 7:00 AM is typically at its lowest.

The "busyout" tool is currently widely used for firmware upgrades. However, this tool has a significant drawback. If the SPE is left in a "busyout" state until all calls drop, then a single modem call that does not disconnect can prevent new calls from being accepted on that entire SPE. For example, if you have one active call in a SPE with six ports and the remaining five ports do not have existing calls, this can seriously impact a NAS's ability to perform at top capacity. This is because you have removed these five ports from service.

To avoid this, the SPE recovery feature uses a firmware reload algorithm which will attempt to reload the SPE firmware with the least possible impact and still retain a good chance of getting the firmware downloaded to the ports. This is done by:

  • The firmware download taking place as soon as possible without requiring a "busyout". If any port on a given SPE is in a recovery pending (seen as a "r" state), and there are no active calls left on that SPE, the recovery mechanism will download the SPE right away. This mechanism should do most of the downloads in a safe and controlled fashion without requiring any ports to be in "busyout" state to accomplish the objective.

  • Scheduling "busyout" during the off-peak hours, so recovery maintenance can be performed on the SPEs. This is necessary for a NAS which is heavily loaded throughout the day, where new calls must be kept off an SPE and the active calls given a chance to drop normally before the download proceeds. However, unlike the regular "busyout", the SPE recovery mechanism only puts the SPE in "busyout" for a specified window of time. If the window expires before the download can take place, the mechanism cannot keep the SPE in "busyout" any longer and must take a different action to return the modem capacity to the NAS. Also, you must manage the number of SPEs which can be in the "busyout" state at the same time. Even during off-peak hours, you should not "busyout" more than 20 percent of SPEs at one time.

Explaining the Configuration

The "busyout" behavior is managed with the SPE recovery maintenance configuration, which includes the time when recovery starts (the default is 3:00 AM), the window (the maximum "busyout" duration for a single SPE), and max-spes (the maximum number of SPEs which can be in the "busyout" state at the same time during the window). The default for max-spes is 20 percent of NAS capacity and is dynamically calculated.

Consider a NAS with 10 SPEs (60 ports) that need to be reloaded and the following configuration:

spe download maintenance time 00:00 

!--- hh:mm
 
spe download maintenance window 60 

!--- minutes
 
spe download maintenance max-spes 2

!--- number of SPEs

   TIME
 00:00 01:00 02:00 03:00 04:00 05:00 06:00
   |     |     |     |     |     |     |
   ------------------------------------------------------------------->
   ^     ^     ^     ^     ^     ^
   |     |     |     |     |     |
   |     |     |     |     |     - should be finished at 5AM
   |     |     |     |     - window to download last 2 SPEs
   |     |     |     - window to download next 2 SPEs
   |     |     - window to download next 2 SPEs
   |     - window to download next 2 SPEs
   - window to download first 2 SPEs

In the above case, the NAS will be in a recovery maintenance "busyout" state for at most five hours. This is a very unlikely situation, but can easily be handled by the recovery maintenance process.
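The five-hour figure follows from simple arithmetic, sketched below in Python. The sketch assumes the worst case shown in the timeline: SPEs are handled in batches of max-spes, batches run one after another, and each batch uses its entire window. In practice an SPE reloads as soon as it goes idle, so the real duration is usually shorter. The function name and parameters are hypothetical.

```python
import math
from datetime import datetime, timedelta


def worst_case_finish(start_hhmm: str, spes_to_reload: int,
                      max_spes: int, window_minutes: int):
    """Return (number of batches, worst-case finish time as 'HH:MM').

    Assumes sequential batches of at most max_spes SPEs, each batch
    occupying the full maintenance window.
    """
    batches = math.ceil(spes_to_reload / max_spes)
    start = datetime.strptime(start_hhmm, "%H:%M")
    finish = start + timedelta(minutes=batches * window_minutes)
    return batches, finish.strftime("%H:%M")


# Values from the example: start at 00:00, 10 SPEs, max-spes 2, window 60.
print(worst_case_finish("00:00", 10, 2, 60))  # (5, '05:00')
```

The same calculation can be used to check whether a planned max-spes and window combination fits before a configured stop-time.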

SPE Recovery Command Summary

The following are the configuration commands available for fine-tuning the SPE recovery process:

spe recovery port-threshold num-failures

A port failing to connect for a certain number of consecutive times (specified with num-failures) indicates that a problem exists in a specific part or the whole of the SPE firmware. The SPE to which the port belongs has to be recovered by downloading new firmware. Any port failing to connect the specified number of times is moved to a state based on the spe recovery port-action value explained below.

This command sets the number of consecutive call attempts which fail to trainup before you consider the port faulty. The default is set to 30.

spe recovery port-action action

Once a port is determined to be faulty, the configured action takes place on that port. The following choices are possible:

  • disable: Mark the SPE port bad (B).

  • recover: Set the port into a recovery pending state, thereby stopping the port from accepting new calls. Recover the port when the SPE is in the idle state and has no active calls.

  • none: No action is taken. This is the default.
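The interaction between port-threshold and port-action can be modeled with a small Python sketch. This is a simplified illustration and not how Cisco IOS is implemented: the state names are hypothetical, and the sketch assumes that a successful trainup resets the consecutive-failure counter (implied by the word "consecutive") and that the action fires when the counter reaches the threshold.

```python
from dataclasses import dataclass


@dataclass
class PortMonitor:
    """Toy model of one port under SPE recovery (hypothetical names)."""
    threshold: int = 30          # default port-threshold stated in the text
    action: str = "none"         # "disable", "recover", or "none" (default)
    consecutive_failures: int = 0
    state: str = "in-service"

    def record_call(self, trained_up: bool) -> str:
        """Record one call attempt and return the resulting port state."""
        if trained_up:
            # Assumption: any successful trainup clears the streak.
            self.consecutive_failures = 0
            return self.state
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold:
            if self.action == "disable":
                self.state = "bad"                 # shown as (B)
            elif self.action == "recover":
                self.state = "recovery-pending"    # shown as (r)
            # action "none": the port stays in service
        return self.state


port = PortMonitor(threshold=3, action="recover")
for result in (False, False, True, False, False, False):
    state = port.record_call(result)
print(state)  # recovery-pending
```

In the example, the success on the third call resets the streak, so the port is only marked after the three failures that follow it.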

spe download maintenance time hh:mm

The download maintenance activity starts at the set start time and steps through all SPEs that need recovery and the SPEs that need a firmware upgrade.

This command sets the actual time of day when the SPE recovery maintenance process wakes up and starts recovering the ports. (The default is 03:00).

spe download maintenance window time-period

When an SPE attempts to reload its portware, it must avoid taking down any active connections that may exist. Because of this, the recovery process sets all ports not currently in use to the recovery pending state. If any ports on the module are active, the recovery process waits for the calls to terminate normally.

In order to avoid capacity problems from attempting recovery for an excessively long period, a maintenance window is configured to require the SPE recovery to take place within that timeframe. The system waits for the spe download maintenance window time-period for all the ports on the SPE to become inactive before moving the SPE to the Idle state. Immediately after the SPE moves to the Idle state, the system starts to download firmware. If the window expires before recovery can be performed, then the action specified in spe download maintenance expired-window is performed on that module. The default window is 60 minutes.

spe download maintenance max-spes number

When the SPE recovery maintenance process starts, it attempts to recover all ports in the recovery pending state. This can potentially be all modules on a given system. To avoid taking down all ports on a given system, only a maximum number of recoveries can take place at one time. The default is dynamically calculated to be 20 percent of the modules on a given system.

This command overrides the default and specifies the maximum number of SPEs that can simultaneously be in maintenance.

spe download maintenance expired-window {drop-call | reschedule}

At the end of the recovery window, connections on the SPE ports are shut down and the firmware is downloaded by choosing the drop-call option, or the firmware download is rescheduled to the next download maintenance time by choosing the reschedule option.

spe download maintenance stop-time hh:mm

Time of day to stop all pending recovery maintenance tasks. Some customers have specific maintenance times which can be fine tuned. If you prefer that the maintenance window not exceed a certain time of day, this option can be useful.

There is no default stop-time.


Related Information



Updated: Oct 30, 2008 Document ID: 6207