Cisco MGX 8200 Series Edge Concentrators

MGX 8240 CES Redundancy Fault Tips for Release 1.X

Document ID: 6931



Contents

Introduction
Prerequisites
      Requirements
      Components Used
      Conventions
Backup CES May Switch in for a Primary CES After Several Days of Service
fetchFiles Flag Gets Stuck
File Transfers and Configuration Updates
Debugging Aids
Tracing for Fault Management

Introduction

The MGX 8240 Circuit Emulation Service (CES) card supports N:1 redundancy. A redundancy group typically consists of one backup and four primary CES cards, for a total of five CES cards. The backup CES can switch in for any of the primary CES cards. Once the backup CES is switched in (SI) for a primary CES, the card must be manually switched back (SB) so that the primary CES resumes operation and the backup CES returns to its backup role.

Release 1.X of the MGX 8240 switch software has known behaviors that affect redundancy. The steps outlined in this document can isolate each behavior to a specific cause. For most performance problems, the sysReset command clears the anomaly, but service is disrupted while the card is restored to service, which usually takes between five and ten minutes. The sysReset command is issued from the VxWorks shell of the CES.

In addition to the VxWorks operating system, three separate interfaces are used to manage the MGX 8240:

  • VCLI—Virtual Command Line Interface

  • CMDR—Craft interface/console Management Diagnostic Resource

  • SentientView Network Management System

This document assumes the reader is familiar with the VxWorks and VCLI user interfaces. In the examples, VxWorks shell commands follow the -> prompt and VCLI commands follow the VCLI>> prompt. All commands in this document are case sensitive.

Prerequisites

Requirements

There are no specific requirements for this document.

Components Used

This document is not restricted to specific software and hardware versions.

The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.

Conventions

For more information on document conventions, refer to the Cisco Technical Tips Conventions.

Backup CES May Switch in for a Primary CES After Several Days of Service

The backup CES card sometimes switches in for a primary CES when the watchdog timers become synchronized between the two cards. This causes VxWorks ITIME events and IPC_MSG (TQ0) events to be processed in the same Operations Support System (OSS) event.

Because of a defect in the fmTask main loop, the following sequence occurs:

  • The ITIME event is ignored on the primary CES.

  • Without the ITIME event, no health message is sent by the primary CES to the backup CES.

  • After four to seven missed messages, the backup CES switches in for the primary CES.

This defect is resolved in Release 2.X switch software.
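
The failure sequence described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the fmTask implementation: the function and class names are hypothetical, one OSS event is modeled as a set of event names, the missed-message count is assumed to reset whenever a health message arrives, and the switch-in threshold is fixed at four (the lower end of the four-to-seven range given above).

```python
from dataclasses import dataclass

# Assumption: lower end of the "four to seven missed messages" range.
MISS_THRESHOLD = 4


def primary_sends_health(oss_event):
    """Model the Release 1.X defect on the primary CES.

    The ITIME event is ignored when it is processed in the same OSS
    event as IPC_MSG, so no health message is sent for that interval.
    """
    return "ITIME" in oss_event and "IPC_MSG" not in oss_event


@dataclass
class BackupMonitor:
    """Hypothetical backup-side counter of consecutive missed messages."""
    threshold: int = MISS_THRESHOLD
    missed: int = 0
    switched_in: bool = False

    def on_interval(self, health_received):
        if self.switched_in:
            return  # already switched in; nothing more to track
        if health_received:
            self.missed = 0  # assumption: any health message resets the count
            return
        self.missed += 1
        if self.missed >= self.threshold:
            self.switched_in = True


def simulate(oss_events):
    """Return True if the backup switches in for the given event sequence."""
    monitor = BackupMonitor()
    for event in oss_events:
        monitor.on_interval(primary_sends_health(event))
    return monitor.switched_in
```

In this model, a run of four consecutive intervals in which ITIME and IPC_MSG coincide is enough to trigger a switch-in, even though the primary CES is otherwise healthy.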

In order to view the current redundancy state of an MGX 8240 switch, issue the show switch redundancy (or sh sw red) command from the VCLI. The screen display shown here indicates that there are two redundancy groups, and in each group the backup CES card is switched in (SI) for a primary CES card that is in the switched out (SO) state. In redundancy group 4, the backup CES in slot 4 is switched in for the primary CES in slot 5. In redundancy group 15, the backup CES in slot 15 is switched in for the primary CES in slot 16.

From the VCLI, issue the change ces redundancy -card <slot_number> -force back command (for example, ch ces red -card 15 -force back) in order to manually switch the primary CES back to the in service (IS) state. If the switchback has not completed after ten minutes, the VxWorks fetchFiles flag could be stuck.

tac[4..] - VCLI>> sh SW red 

Switch Name: tac 

Phys Slot 1A   1B   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16 
Group     -    -    -   -   4   4   4   -   -   -   -   -   -   -   -   15  15 
Role      -    -    -   -   B   P   P   -   -   -   -   -   -   -   -   B   P 
State     -    -    -   -   SI  SO  IS  -   -   -   -   -   -   -   -   SI  SO 
Log Slot  -    -    -   -   5   0   0   -   -   -   -   -   -   -   -   16  0 
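
When many switches need to be checked, the display above can be parsed by a script. The following is a minimal sketch that assumes the five-row layout shown in this example; the function names are hypothetical, and the reading of the Log Slot row as the slot that a switched-in backup is serving is inferred from this sample display rather than from a documented definition.

```python
LABELS = ("Phys Slot", "Group", "Role", "State", "Log Slot")

SAMPLE = """\
Phys Slot 1A   1B   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16
Group     -    -    -   -   4   4   4   -   -   -   -   -   -   -   -   15  15
Role      -    -    -   -   B   P   P   -   -   -   -   -   -   -   -   B   P
State     -    -    -   -   SI  SO  IS  -   -   -   -   -   -   -   -   SI  SO
Log Slot  -    -    -   -   5   0   0   -   -   -   -   -   -   -   -   16  0
"""


def parse_redundancy(text):
    """Turn the five labeled rows into one dictionary per physical slot."""
    rows = {}
    for line in text.splitlines():
        stripped = line.strip()
        for label in LABELS:
            if stripped.startswith(label + " "):
                rows[label] = stripped[len(label):].split()
    missing = [label for label in LABELS if label not in rows]
    if missing:
        raise ValueError(f"missing rows: {missing}")
    # Each column of the display describes one slot.
    columns = zip(*(rows[label] for label in LABELS))
    return [dict(zip(LABELS, column)) for column in columns]


def switched_in_backups(cards):
    """Return (backup slot, slot served) for every backup in the SI state."""
    return [
        (card["Phys Slot"], card["Log Slot"])
        for card in cards
        if card["Role"] == "B" and card["State"] == "SI"
    ]
```

For the display above, switched_in_backups(parse_redundancy(SAMPLE)) returns the pairs ("4", "5") and ("15", "16"), which matches the two switched-in backups described in the text.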

After the VCLI ch ces red -card 15 -force back command is issued, the backup CES in slot 15 is now available to switch in for the primary CES in slot 16. The backup CES is in the switch backup (SB) state, and the primary CES in slot 16 is in the in service (IS) state.

tac[15..] - VCLI>> sh SW red 

Switch Name: tac 

Phys Slot 1A   1B   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16 
Group     -    -    -   -   4   4   4   -   -   -   -   -   -   -   -   15  15 
Role      -    -    -   -   B   P   P   -   -   -   -   -   -   -   -   B   P 
State     -    -    -   -   SI  SO  IS  -   -   -   -   -   -   -   -   SB  IS 
Log Slot  -    -    -   -   5   0   0   -   -   -   -   -   -   -   -   0   0 

fetchFiles Flag Gets Stuck

In an attempt to prevent multiple reader and writer situations in the VxWorks shell, a flag is used to signal that an FTP transfer is in progress. When this flag has the value 1, it is said to be set. While the flag is set, fault management Simple Network Management Protocol (SNMP) requests are ignored, so the CES card is unmanageable and cannot be provisioned. This affects the backup CES that is switched in for the primary CES.

In the earlier example, if the backup CES card in slot 4 that is switched in for the CES in slot 5 has a stuck fetchFiles flag, then all provisioning is performed on the CES in slot 4. This defect incorrectly associates the provisioned services with the CES in slot 4 instead of the primary CES in slot 5.

In order to check whether an FTP transfer really is in progress (which means the flag is correctly set to 1), log in to the VxWorks shell of the CES card. After login, issue the ddiGetSlot command in order to verify the CES slot number. Once logged in to the correct CES card, set the verbose logging flag for FTP:

-> ftpVerbose=1

Wait five to ten seconds. With verbose logging enabled, FTP activity is reported in the shell. If no FTP is in progress, the flag is stuck and needs to be cleared. An access function can be called in order to clear this flag. The call to issue at the VxWorks shell is:

-> fmSetFetchFilesVariable(fmGetFmCb(), 0)

Give fault management at least 30 minutes to recover once this occurs. An FTP could start once the flag is cleared. Use the FTP Verbose flag in order to check again for an FTP. Several queued requests could set the fetchFiles flag again, so the flag might have to be cleared several times before it stays cleared.

When the fetchFiles flag is cleared and no more FTP transfers occur, issue the sysReset command. The reset happens as part of a force back operation if the card is a backup, or it can be performed manually.

In order to turn off the FTP Verbose flag, enter:

-> ftpVerbose=0

In order to verify that there are no remaining files to FTP, enter:

-> fmPrintCb

If fetchIndx = 0 is displayed, there are no more files in queue. If fetchIndx = 1, then the call fmSetFetchFilesVariable(fmGetFmCb(), 0) needs to be reissued. More information about the fmPrintCb command is provided in the Debugging Aids section of this document.
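
The procedure in this section can be summarized as a small decision helper. This is a sketch only: the function name and its three inputs are hypothetical, and the operator still has to gather the observations by hand (FTP activity seen with ftpVerbose=1, and the fetchFiles and fetchIndx values shown by fmPrintCb). The strings returned name only commands that appear in this document.

```python
def next_step(ftp_active, fetch_files, fetch_indx):
    """Suggest the next action for a suspected stuck fetchFiles flag.

    ftp_active  -- True if FTP activity was seen with ftpVerbose=1
    fetch_files -- fetchFiles value from fmPrintCb
    fetch_indx  -- fetchIndx value from fmPrintCb
    """
    if ftp_active:
        # The flag is legitimately set while a transfer is running.
        return "wait: an FTP is in progress"
    if fetch_files == 1 or fetch_indx != 0:
        # No transfer is running but work is still flagged or queued.
        return "clear: fmSetFetchFilesVariable(fmGetFmCb(), 0)"
    # Flag is clear and the queue is empty.
    return "finish: ftpVerbose=0, then sysReset"
```

Because queued requests can set the flag again, the helper is meant to be consulted after each check until it reports the finish step.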

File Transfers and Configuration Updates

File transfers between primary and backup CES cards are not always successful. There are several ways that file transfers can fail. When a primary CES card is added to a redundancy group, the backup CES card copies the primary CES configuration. If a configuration update is made on the primary CES card while a file is transferred, the file that is transferred could be incomplete. Due to this behavior, it is recommended that no provisioning be conducted for 30 minutes after you add a primary CES card to a redundancy group.

Every time the primary CES card logs a configuration change due to provisioning, the card sends an updated copy of the configuration to the backup CES card. This can conflict with the force back activity that takes place when you manually switch a primary CES card back into service. When a force back occurs, the backup CES card pushes the configuration files to the primary CES card. If the primary CES card experiences a configuration change during the force back, the primary CES pushes the updated configuration file to the backup CES card. This creates a scenario where a slightly different configuration file is pushed to the primary CES card.

There are other similar configuration conflicts. The basic rule is to wait at least ten minutes when a configuration update is in progress. The main configuration update events are:

  • When a primary is added to a redundancy group

  • When a force over or force back command is issued
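
The waiting periods given in this section can be turned into a simple calculation. The sketch below uses hypothetical event names and assumes the 30-minute recommendation for adding a primary CES card and the ten-minute basic rule for force over and force back operations; it only does date arithmetic and does not query the switch.

```python
from datetime import datetime, timedelta

# Quiet periods taken from the recommendations in this section.
QUIET_PERIODS = {
    "add_primary": timedelta(minutes=30),
    "force_over": timedelta(minutes=10),
    "force_back": timedelta(minutes=10),
}


def earliest_provisioning(event, started_at):
    """Return the earliest time at which provisioning should resume."""
    if event not in QUIET_PERIODS:
        raise ValueError(f"unknown configuration update event: {event}")
    return started_at + QUIET_PERIODS[event]
```

For example, a primary CES card added to a redundancy group at 12:00 should not be provisioned before 12:30.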

Verify the file sizes with the VxWorks ls or ll commands. A common error is a file of size zero. On the CES card, do not use the UNIX change directory (cd) command in order to navigate directories, because many places in the code use relative path references. In order to view files in the access, trunk, and fpe directories, issue the ll command and specify the path name as shown in these examples:

-> ll "cmdata/slot0/fpe/access"

-> ll "cmdata/slot0/fpe/trunk"

-> ll "cmdata/slot0/fpe"

Debugging Aids

The state of the fault management and redundancy internal data structures can be viewed with a group of debug commands. The commands are issued from the VxWorks shell on the local CES card. For this example, all commands are issued while in the VxWorks shell of the CES card in slot 5. In order to verify the local slot number, issue the ddiGetSlot command:

-> ddiGetSlot 
value = 5 = 0x5

In order to view the fault management control block and redundancy state for a CES card, issue the fmPrintCb command. The screen contains information about the status of the card and the fetchFiles flag.

-> fmPrintCb 
Dumping tFM_CB structure from DRAM. 
==================================================== 
fmCompTbl[0].cfgFileRev =   2 
fmCompTbl[0].activeSwDir =  SW2 
fmCompTbl[0].compName   =   c38s4 
fmCompTbl[0].defPath    =   F:/SW2 
fmCompTbl[0].compIpAddr =   10.64.38.4 
fmCompTbl[0].redundRole =   FM_backup 
fmCompTbl[0].primaryNum =   0 
fmCompTbl[0].redundGroup=   4 
fmCompTbl[0].adminState =   1 
fmCompTbl[0].operState  =   S_BKUP_SWD_IN 
fmCompTbl[0].numRcvdMsgs=   1607537 

myCfgFileRev =   2 
myActiveSwDir=   SW2 
compTblIndx  =   16 
fetchIndx    =   0 
delayFetching=   0 
myHostName   =   c38s5 
myDefPath    =   F:/SW2 
myIpAddr     =   10.64.38.5 
adminState   =   1 
redundRole   =   FM_primary 
primaryNum   =   5 
redundGroup  =   4 
redundAction =   1 
fmReset      =   0 
ownRcvdMsgs  =   0 

bkupIndx     =   0 

SwitchedInForSlotN    =   0 
FrcSwitchOvrForSlotN  =   0 
ReasonForSwitchOver   =   0 
primSwdOutDefPath     = 
primSwdOutAddr        =   0.0.0.0 
primSwdOutAdminState  =   0 
primSwdOutNumber      =   0 
fetchFiles            =   0 

internalEvent         =   23 

AddrErrCnt   =   0 
tftpErrCnt   =   0 
mkDirErrCnt  =   0 
invldEvtCnt  =   486 
ossErrCnt    =   0 
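
If the fmPrintCb output is captured to a text file, the name = value lines can be read into a dictionary for quick checks. This is a minimal sketch that assumes the output format shown above; the function names are hypothetical, and the idle check simply applies the fetchFiles and fetchIndx conditions described in the fetchFiles Flag Gets Stuck section.

```python
SAMPLE = """\
Dumping tFM_CB structure from DRAM.
====================================================
fmCompTbl[0].redundRole =   FM_backup
fmCompTbl[0].operState  =   S_BKUP_SWD_IN
fetchIndx    =   0
redundRole   =   FM_primary
primSwdOutDefPath     =
fetchFiles            =   0
"""


def parse_fm_print_cb(text):
    """Collect every 'name = value' line; all values are kept as strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or not key:
            continue  # skips the banner and the ==== separator line
        fields[key] = value.strip()
    return fields


def fetch_queue_is_idle(fields):
    """True when the flag is clear and no files remain in the queue."""
    return fields.get("fetchFiles") == "0" and fields.get("fetchIndx") == "0"
```

Values are left as strings because some fields, such as primSwdOutDefPath in the example, can be empty.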

In order to view the current redundancy state for a specific CES card, issue the fmPrintState command.

-> fmPrintState 
==================================================== 
state       =   S_PRIM_SWD_OUT 
lastEvent   =   2 
enableRstrt =   0 
actionIndex =   0 

value = 1 = 0x1 

Tracing for Fault Management

In order to trace fault management messages in the VxWorks shell of a local CES card, issue these commands:

  • -> ftpVerbose=1

  • -> tprintfSetMask 35,255

In order to disable tracing and logging, issue these commands:

  • -> tprintfSetMask 35,0

  • -> ftpVerbose=0

Updated: Jun 03, 2005