Table Of Contents
Overload Control
Overload Control Phases
Detecting Overload
Computing MCL
Reducing Overload
Slowing Overload Reduction
Configuring
Setting the Minimum System MCL
Configuring the SIP Response Code
Configuring Emergency Call Handling
Configuring SIP Message Handling
SIP Message Types
Message Rejection Logic
Operating
Viewing MCL
Editing the OLM.CFG File
Sample olm.cfg
Measurements
Call Processing Measurements
Service Interaction Manager Measurements
Traffic Measurements Monitor Counters
Miscellaneous Measurements
Troubleshooting
Events and Alarms
Congestion Status—Maintenance (112)
CPU Load of Critical Processes—Maintenance(113)
Queue Length of Critical Processes—Maintenance(114)
IPC Buffer Usage Level—Maintenance(115)
CA Reports the Congestion Level of FS—Maintenance(116)
Logs
Overload Control
Revised: December 11, 2008, OL-8723-17
Overload Control Phases
Overload is a switch condition that exists when system resources cannot handle system tasks. Increases in call traffic, or in messages indirectly related to call traffic, usually cause overload.
The Overload Control feature supports the BTS Call Agent (CA) and Feature Server (FS). Overload Control detects, controls, and manages overload from all types of networks (SIP, SS7, ISDN, MGCP, H.323) in the phases listed in Table C-1.
Table C-1 Overload Control Phases
Overload Control Phase | Actions
1. Detection | Measures and compares factors to threshold values. Determines system congestion and machine congestion level (MCL). Detects BTS machine congestion conditions in five levels: none, mild, moderate, severe, and emergency.
2. Control | Decreases overload. The action is configurable and varies with MCL, but usually means rejecting a percentage of incoming calls. Caution: To control overload you must edit configuration files. Changes to these files significantly impact system performance. Edit them only under the direction of a Cisco engineer.
3. Management | Affects the following switch areas: alarms, logs, billing, and measurements.
Detecting Overload
In the detection phase of Overload Control, BTS computes an MCL for each of three factors. The highest factor MCL dictates the MCL for the entire system. The three factors are:
• Critical processes CPU usage—The olm.cfg configuration file has a UNIX "nice" value that identifies critical processes. BTS uses this value to calculate CPU utilization for critical processes over a small period of time (2 seconds minimum). The nice value of each critical process should be at or below the configured value.
• Critical process queue lengths—The olm.cfg configuration file has critical queue lengths for BTS processes like BCM, MGA, SGA, SIA, ISA, and H3A. You can define multiple critical queues for any BTS process (32 factors total). BTS monitors the usage proportion of each critical IPC queue.
• IPC buffer pool usage—BTS monitors the proportion of available buffers in the IPC buffer pool. Buffer usage reflects MCL: the higher the usage, the greater the congestion.
BTS detects its own MCL in five levels:
• MCL0—No congestion and no need for any abatement.
• MCL1—Mild congestion. Call rejection starts as configured in olm.cfg.
• MCL2—Moderate congestion. Call rejection increases as configured in olm.cfg.
• MCL3—Severe congestion. Call rejection increases still more as configured in olm.cfg.
• MCL4—Emergency congestion. BTS rejects all calls including emergency calls.
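The rule that the highest factor MCL becomes the system MCL can be sketched in a few lines of Python. This is an illustration only, not BTS code: the names `MCL` and `system_mcl` and the factor labels used as dictionary keys are hypothetical.

```python
from enum import IntEnum


class MCL(IntEnum):
    """The five machine congestion levels described above."""
    NONE = 0
    MILD = 1
    MODERATE = 2
    SEVERE = 3
    EMERGENCY = 4


def system_mcl(factor_mcls):
    """Return the highest MCL among all monitored factors.

    factor_mcls maps a factor label (hypothetical naming) to its MCL.
    With no factors reported, the sketch assumes no congestion.
    """
    return max(factor_mcls.values(), default=MCL.NONE)
```

For example, if CPU usage is at MCL1 and IPC buffer pool usage is at MCL3, `system_mcl` returns `MCL.SEVERE`.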
Computing MCL
BTS computes each factor level by averaging samples of that factor. The number of samples averaged (number of slots) can be configured per factor (3-10 slots). BTS sets the MCL by comparing the factor level with the configured thresholds. In Table C-2 the thresholds are set to 70, 80, 90, and 95 percent.
Table C-2 MCL Thresholds
Onset/abatement threshold | Factor level | MCL
- | 0-69 | MCL0
level_1_threshold = 70 | 70-79 | MCL1
level_2_threshold = 80 | 80-89 | MCL2
level_3_threshold = 90 | 90-94 | MCL3
level_4_threshold = 95 | 95-100 | MCL4
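The following Python sketch shows one way the averaging and threshold comparison could work. It assumes a plain arithmetic mean over the most recent samples, which the text above does not specify in detail. The class name `FactorMonitor` and its methods are hypothetical; the thresholds are those of Table C-2.

```python
from collections import deque

# level_1_threshold .. level_4_threshold, as set in Table C-2
THRESHOLDS = (70, 80, 90, 95)


class FactorMonitor:
    """Sliding average of one factor over a fixed number of slots."""

    def __init__(self, slots=10, thresholds=THRESHOLDS):
        if not 3 <= slots <= 10:
            raise ValueError("slots must be between 3 and 10")
        self.samples = deque(maxlen=slots)  # oldest sample drops out automatically
        self.thresholds = thresholds

    def add_sample(self, percent):
        self.samples.append(percent)

    def level(self):
        """Average of the stored samples (0.0 before any sample arrives)."""
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)

    def mcl(self):
        """Count how many thresholds the factor level has reached."""
        current = self.level()
        return sum(1 for threshold in self.thresholds if current >= threshold)
```

With three slots holding 60, 80, and 100, the factor level is 80 and the factor MCL is 2. A fourth sample of 100 pushes out the 60, raising the level to about 93 and the MCL to 3.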
Reducing Overload
When MCL exceeds MCL0, Overload Control reduces MCL as follows:
• Selectively reject new calls by the signaling adapters—A percentage of calls and messages are rejected at the current MCL, based on olm.cfg. Emergency calls are not rejected at MCL1 through MCL3, but all calls, including emergency calls, are rejected at MCL4.
• Tell the network to stop sending traffic—This starts when BTS is mildly congested (at MCL1) and continues through all higher MCLs until the overload condition abates to MCL0. This action can only be applied to the following types of networks:
  – SS7 sends the Automatic Congestion Level (ACL) parameter in ISUP release messages.
  – H.323 sends a Resource Availability Indicator (RAI) message.
  – SIP sends 500 or 503 with a Retry-After header.
• CA stops sending triggers to POTS FS—When the FS is congested, the following occurs:
  – FS notifies CA once of its congested status.
  – CA sends only emergency triggers to FS, as it manages FS's congestion abatement.
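The selective rejection rule can be sketched as follows. The reject rates are the level_1_reject_rate through level_3_reject_rate values from the sample olm.cfg in this appendix. How BTS picks which individual calls to reject is not described here, so the evenly spread, counter-based selection below is an assumption, and `should_reject` is a hypothetical name.

```python
# level_N_reject_rate values (percent) from the sample olm.cfg
REJECT_RATE = {1: 10, 2: 50, 3: 90}


def should_reject(mcl, is_emergency, call_index):
    """Decide whether to reject the call with the given running index.

    Assumption: rejections are spread evenly, so that exactly
    REJECT_RATE[mcl] of every 100 consecutive normal calls are rejected.
    """
    if mcl <= 0:
        return False          # MCL0: no abatement needed
    if mcl >= 4:
        return True           # MCL4: all calls, including emergency calls
    if is_emergency:
        return False          # MCL1-MCL3: emergency calls are not rejected
    rate = REJECT_RATE[mcl]
    return (call_index * rate) % 100 < rate
```

At MCL1 this rejects every tenth normal call, at MCL2 every second call, and at MCL3 nine calls out of ten.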
Slowing Overload Reduction
Suddenly reducing abatement may cause MCL to rapidly increase again. To counteract this MCL "bouncing", the system MCL drops only one level at a time, regardless of how low the computed MCL becomes. This lets the system MCL decrease gracefully over a number of intervals.
Damping also slows the reduction of the system MCL. Damping values are in milliseconds and define the shortest amount of time an MCL level can exist. You can configure the damping time for each MCL level in olm.cfg using:
• level_1_damping_time
• level_2_damping_time
• level_3_damping_time
• level_4_damping_time
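A minimal sketch of the one-level-at-a-time reduction combined with damping is shown below. The damping times are those of the sample olm.cfg in this appendix. The sketch assumes that damping delays only reductions and that increases take effect immediately; the text above does not say how increases are handled. The function name `next_system_mcl` is hypothetical.

```python
# level_N_damping_time values (milliseconds) from the sample olm.cfg
DAMPING_MS = {1: 1600, 2: 1200, 3: 800, 4: 400}


def next_system_mcl(current, computed, ms_at_current):
    """Return the system MCL for the next interval.

    current       -- system MCL now in effect
    computed      -- MCL just computed from the factors
    ms_at_current -- time already spent at the current level, in ms
    """
    if computed >= current:
        return computed            # assumption: increases are not damped
    if ms_at_current < DAMPING_MS[current]:
        return current             # damping time not yet served
    return current - 1             # drop one level only, never straight to computed
```

For example, with the system at MCL3 and a computed MCL of 0, the system stays at MCL3 until 800 ms have passed, then moves to MCL2, and so on down to MCL0.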
Configuring
This section explains how to perform the following tasks:
• Setting the Minimum System MCL
• Configuring the SIP Response Code
• Configuring Emergency Call Handling
• Configuring SIP Message Handling
Note
These tasks include examples of CLI commands that illustrate how to provision the specific feature. For a complete list of all CLI tables and tokens, refer to the Cisco BTS 10200 Softswitch Command Line Interface Reference Guide.
Setting the Minimum System MCL
Warning
Manually setting minimum MCL means call processing is affected exactly as it would be if MCL were set at that level due to actual system overload/congestion. Use it for test purposes only.
To set the minimum system MCL, enter a command similar to the following:
control machine-congestion-level platform_id=CA146, mcl=2;
MACHINE CONGESTION LEVEL ON CALL AGENT CA146 IS... ->
ADMIN MCL -> NO_CONGESTION(0)
COMPUTED MCL -> NO_CONGESTION(0)
EFFECTIVE MCL -> NO_CONGESTION(0)
FEATURE SERVER CONGESTION ->
FSAIN205 IS NOT CONGESTED
FSPTC235 IS NOT CONGESTED
REASON -> ADM executed successfully
RESULT -> ADM configure result in success
Reply : Success: at 2006-02-28 09:54:27 by btsadmin
Configuring the SIP Response Code
When rejecting a SIP message during overload, the BTS can use either of the following response codes:
• 500 Server Internal Error
• 503 Service Unavailable
To set the response code, enter a command similar to the following. The default value is 503.
add ca_config type=SIA-OC-REJECTION-RESP; datatype=integer; value=500;
Configuring Emergency Call Handling
The BTS checks the following to identify emergency calls:
• Called-party number for all incoming calls against the EMERGENCY-NUMBER-LIST
• Calling party category (CPC) in ISUP calls
If the BTS determines that a call is an emergency call and the MCL is 1, 2, or 3, the BTS gives the call priority and does not reject it. If the MCL is 4, the BTS rejects all calls, including emergency calls.
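The emergency-call check can be pictured with the sketch below. It assumes an exact match of the called number against the list and treats the CPC check as a boolean supplied by the caller, because the CPC values that mark an emergency call are not given here. The function name `classify_call` and its return labels are hypothetical.

```python
def classify_call(called_number, emergency_numbers, mcl, cpc_emergency=False):
    """Classify an incoming call under overload.

    Returns "reject" (MCL4), "accept" (emergency call at MCL0-MCL3),
    or "normal" (handled by the configured rejection percentage).
    """
    is_emergency = cpc_emergency or called_number in emergency_numbers
    if mcl >= 4:
        return "reject"       # MCL4 rejects all calls, including emergency calls
    if is_emergency:
        return "accept"       # emergency calls get priority below MCL4
    return "normal"
```

With `{"911"}` as the list, a call to 911 at MCL3 is classified as "accept", while the same call at MCL4 is classified as "reject".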
To add a number to the EMERGENCY-NUMBER-LIST, enter a command similar to the following:
add emergency-number-list digit_string=911;
Reply : Success: at 2006-02-28 09:48:40 by btsadmin
Transaction 934823299797597704 was processed.
To display the EMERGENCY-NUMBER-LIST, enter:
show emergency-number-list;
Reply : Success: at 2006-02-28 09:48:45 by btsadmin
To delete a number from the EMERGENCY-NUMBER-LIST, enter a command similar to the following:
delete emergency-number-list digit_string=911;
Reply : Success: at 2006-02-28 09:52:20 by btsadmin
Transaction 934823480106794504 was processed.
Configuring SIP Message Handling
When processing an incoming SIP call, the BTS looks at the MCL of the CA. It uses the following factors to decide whether to accept or reject the message:
• SIP message type
• Call type (normal or emergency)
• Configured rejection percentage
• Current MCL status
SIP Message Types
Message Rejection: INVITE
When overloaded, BTS rejects a percentage of incoming INVITE messages. The percentage rejected is based on sia.cfg. Only new INVITE messages are checked for acceptance. Re-INVITE messages are always accepted.
Message Rejection: REGISTER
When overloaded, BTS rejects a configured percentage of REGISTER messages.
Message Rejection: REFER
When overloaded, BTS rejects a percentage of incoming REFER messages.
Message Rejection: SUBSCRIBE
When overloaded, BTS rejects a configured percentage of out-of-dialog SUBSCRIBE messages. The BTS also rejects SUBSCRIBE messages without call contexts. The BTS does not reject SUBSCRIBE messages received in an INVITE dialog.
Message Rejection: OPTIONS
When overloaded, BTS rejects OPTIONS messages. No configuration is required; all OPTIONS messages are rejected at MCL1 through MCL4.
Message: Unsolicited NOTIFY Repression
When overloaded, BTS does not send unsolicited NOTIFY messages (MWI requests) to endpoints. However, even when overloaded, BTS still receives and processes unsolicited NOTIFY requests.
UDP Messages
BTS drops UDP messages, such as STUN messages, that are smaller than the configured size.
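The per-method rules above can be summarized as a small decision function. This is a sketch, not the SIA implementation: the name `sip_overload_action` and the return labels are hypothetical, "rate" stands for "reject the configured percentage", and the handling of methods not listed above is an assumption.

```python
def sip_overload_action(method, mcl, *, in_dialog=False, emergency=False):
    """Return "accept", "reject", or "rate" for an incoming SIP request."""
    if mcl <= 0:
        return "accept"                      # no overload, no rejection
    if method == "OPTIONS":
        return "reject"                      # always rejected at MCL1-MCL4
    if method == "INVITE":
        if in_dialog:
            return "accept"                  # re-INVITE is always accepted
        if mcl >= 4:
            return "reject"                  # MCL4 rejects all new calls
        return "accept" if emergency else "rate"
    if method in ("REGISTER", "REFER"):
        return "rate"
    if method == "SUBSCRIBE":
        # SUBSCRIBE inside an INVITE dialog is not rejected
        return "accept" if in_dialog else "rate"
    return "accept"                          # assumption: other methods pass through
```

For example, a new emergency INVITE at MCL3 is accepted, a re-INVITE at any level is accepted, and an OPTIONS request at MCL1 is rejected.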
Message Rejection Logic
When the BTS rejects an incoming SIP call, it responds with 500 or 503. Use the CLI to set the response code, as described in Configuring the SIP Response Code.
The BTS includes a "Retry-After" header in its response. The value (in seconds) in this header notifies the endpoint that the BTS will not accept further requests for the specified time. For example, "Retry-After: 5" means the endpoint should not send the next request to the BTS until 5 seconds have passed.
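As an illustration of the rejection response, the sketch below formats the status line and the Retry-After header, and computes the earliest time an endpoint should retry. It deliberately omits the other headers a complete SIP response carries (Via, From, To, Call-ID, CSeq). The function names are hypothetical.

```python
# Reason phrases for the two response codes the BTS can be configured to send
REASONS = {500: "Server Internal Error", 503: "Service Unavailable"}


def rejection_response_lines(code=503, retry_after=5):
    """Return the status line and Retry-After header of a rejection response."""
    if code not in REASONS:
        raise ValueError("overload rejection uses 500 or 503 only")
    return [f"SIP/2.0 {code} {REASONS[code]}", f"Retry-After: {retry_after}"]


def earliest_retry(received_at, retry_after):
    """Time (in seconds) before which the endpoint should not resend."""
    return received_at + retry_after
```

An endpoint that receives "Retry-After: 5" at time 100 should therefore wait until time 105 before sending its next request.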
Operating
This section explains how to perform the following tasks:
• Viewing MCL
• Editing the OLM.CFG File
It also explains how this feature affects the following operational area:
• Measurements
Viewing MCL
To display the MCL, enter a command similar to the following:
status machine-congestion-level platform_id=CA146;
MACHINE CONGESTION LEVEL ON CALL AGENT CA146 IS... ->
ADMIN MCL -> NO_CONGESTION(0)
COMPUTED MCL -> NO_CONGESTION(0)
EFFECTIVE MCL -> NO_CONGESTION(0)
FEATURE SERVER CONGESTION ->
FSAIN205 IS NOT CONGESTED
FSPTC235 IS NOT CONGESTED
REASON -> ADM executed successfully
RESULT -> ADM configure result in success
Reply : Success: at 2006-02-28 09:54:27 by btsadmin
If platform_id identifies an FS (for example, FSPTC235), the output shows the MCL of that FS. If platform_id identifies the CA (for example, CA146), the output also includes the congestion status of the FSs as seen by the CA. If you omit platform_id, the command displays the MCL of all platforms on the system.
Editing the OLM.CFG File
The Overload Manager (OLM) section appears in the platform.cfg file. Overload Control uses the following configuration files:
Table C-3 Configuration Files Used by Overload Control
Name | Location | Description
olm.cfg | /opt/OptiCall/ca/bin/ | Specifies the parameters that control OLM. Exists in separate versions for the Call Agent and each Feature Server. Stores configuration information (to compute MCL) in the global data section. Shown in Appendix A. Caution: Changes to olm.cfg significantly impact system performance. Edit it only under the direction of a Cisco engineer.
sia.cfg (SIP Adapter) | /opt/OptiCall/ca/bin/ | Has SIP timer values used during overload. Caution: These values can be changed only by a Cisco engineer.
Sample olm.cfg
#These macros can be used as default token values to make this file easier to maintain.
#i.e. for all OBJECTS that use them, the values for all can be changed in this one place.
#These values take effect if no corresponding entry is made for a given OBJECT when the
level_1_threshold=50 #The value at which MCL level 1 is triggered for this factor
level_2_threshold=70 #The value at which MCL level 2 is triggered for this factor
level_3_threshold=90 #The value at which MCL level 3 is triggered for this factor
level_4_threshold=95 #The value at which MCL level 4 is triggered for this factor
info_alarm_step_size=0 #The number of percentage point steps that will cause an info
slot_array_size=10 #The number of slots in the array of factor levels over which
olm_sampling_interval=2 #In milliseconds, how often OLM "wakes up" and performs
olm_printing_cycle=0 #In cycles of olm_sampling_interval how often to print
level_1_reject_rate=10 #Percentage of new calls to reject when the system
level_2_reject_rate=50 #Percentage of new calls to reject when the system
level_3_reject_rate=90 #Percentage of new calls to reject when the system
level_1_damping_time=1600 #The minimum amount of time (ms) that can be spent at MCL1
level_2_damping_time=1200 #The minimum amount of time (ms) that can be spent at MCL2
level_3_damping_time=800 #The minimum amount of time (ms) that can be spent at MCL3
level_4_damping_time=400 #The minimum amount of time (ms) that can be spent at MCL4
alarm_damping_time=60 #The minimum amount of time(s) before alarm 112 is changed
#Defines the section of the configuration file relating to CPU Utilization.
factor_type=cpu_utilization #The type of this factor
cpu_collection_interval=5 #In seconds, the value passed to the gosGetCpuUsage()
cpu_nice_value=12 #The Unix nice value that is used to identify critical processes.
slot_array_size=${SLOTS} #The number of slots in the array of factor levels over which the
level_1_threshold=85 #The value at which MCL level 1 is triggered for this factor
level_2_threshold=90 #The value at which MCL level 2 is triggered for this factor
level_3_threshold=95 #The value at which MCL level 3 is triggered for this factor
level_4_threshold=98 #The value at which MCL level 4 is triggered for this factor
info_alarm_step_size=${STEP} #The number of percentage point steps that will cause an
#info alarm for this factor
#Defines the section of the configuration file pertaining to IPC BufferPool Utilization.
factor_type=ipc_buff_pool #The type of this factor
slot_array_size=${SLOTS} #The number of slots in the array of factor levels over which the
level_1_threshold=${THRESH1} #The value at which MCL level 1 is triggered for this factor
level_2_threshold=${THRESH2} #The value at which MCL level 2 is triggered for this factor
level_3_threshold=${THRESH3} #The value at which MCL level 3 is triggered for this factor
level_4_threshold=${THRESH4} #The value at which MCL level 4 is triggered for this factor
info_alarm_step_size=${STEP} #The number of percentage point steps that will cause an
#info alarm for this factor
#Defines a section of the configuration file pertaining to monitoring of Critical Queue Sizes.
factor_type=critical_queue #The type of this factor
process_name=BCM #The 3 or 4 character process name.
thread_type=1 #The numeric thread type associated with the queue.
thread_instance=1 #The numeric thread instance number associated with
#the queue being monitored.
slot_array_size=${SLOTS} #The number of slots in the array of factor levels over which the
level_1_threshold=${THRESH1} #The value at which MCL level 1 is triggered for this factor
level_2_threshold=${THRESH2} #The value at which MCL level 2 is triggered for this factor
level_3_threshold=${THRESH3} #The value at which MCL level 3 is triggered for this factor
level_4_threshold=${THRESH4} #The value at which MCL level 4 is triggered for this factor
info_alarm_step_size=${STEP} #The number of percentage point steps that will cause an
#info alarm for this factor
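For reading (not editing) entries such as those in the sample, the following Python sketch parses one "token=value #comment" line and expands ${NAME} macros. The sample does not show how macros such as THRESH1 are defined, so the sketch takes them from a caller-supplied dictionary; that dictionary and the function name `parse_line` are assumptions.

```python
import re


def parse_line(line, macros):
    """Parse one 'token=value #comment' line.

    Returns (token, value) with ${NAME} macros expanded from `macros`,
    or None for comment-only and blank lines.
    """
    text = line.split("#", 1)[0].strip()     # drop the trailing comment
    if "=" not in text:
        return None
    token, value = (part.strip() for part in text.split("=", 1))
    # Replace each ${NAME} with the caller-supplied macro value
    value = re.sub(r"\$\{(\w+)\}", lambda m: str(macros[m.group(1)]), value)
    return token, value
```

For example, with a hypothetical macro table `{"THRESH1": 50}`, the line `level_1_threshold=${THRESH1} #...` yields `("level_1_threshold", "50")`.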
Measurements
These tables list new, modified, or deleted measurements.
Note
See the Measurements section of the BTS 10200 Operations and Maintenance Guide for a complete list of all traffic measurements.
Call Processing Measurements
Table C-4 lists the new call processing measurements provided to support this feature.
Table C-4 Call Processing Measurements Used by Overload Control
Measurement | Description
CALLP_OLM_OFFERED | The total number of calls offered to OLM
CALLP_OLM_ACCEPT | The total number of calls accepted by OLM
CALLP_OLM_REJECT | The total number of calls rejected by OLM
CALLP_OLM_ACCEPT_MCL0 | Calls accepted by OLM at MCL0
CALLP_OLM_ACCEPT_MCL1 | Calls accepted by OLM at MCL1
CALLP_OLM_ACCEPT_MCL2 | Calls accepted by OLM at MCL2
CALLP_OLM_ACCEPT_MCL3 | Calls accepted by OLM at MCL3
CALLP_OLM_REJECT_MCL1 | Calls rejected by OLM at MCL1
CALLP_OLM_REJECT_MCL2 | Calls rejected by OLM at MCL2
CALLP_OLM_REJECT_MCL3 | Calls rejected by OLM at MCL3
CALLP_OLM_REJECT_MCL4 | Calls rejected by OLM at MCL4
CALLP_OLM_REJECT_EMERGENCY | Emergency calls rejected at MCL4
CALLP_OLM_MCL1_COUNT | Total number of MCL1 occurrences
CALLP_OLM_MCL2_COUNT | Total number of MCL2 occurrences
CALLP_OLM_MCL3_COUNT | Total number of MCL3 occurrences
CALLP_OLM_MCL4_COUNT | Total number of MCL4 occurrences
CALLP_OLM_ISUP_MSG_DUMPED | Number of ISUP messages dumped at MCL4 by layer 3/4 interface (MIM) due to system overload.
Service Interaction Manager Measurements
Table C-5 lists the new Service Interaction Manager measurements provided to support this feature.
Table C-5 Service Interaction Manager Measurements used by Overload Control
Measurement | Description
SIM_OC_TRIG_FILTERED | The number of triggers dropped when the FS is overloaded. SIM uses a single counter to track trigger filtering for all FSs and updates it each time it filters a trigger due to congestion on an FS.
SIM_OC_EMG_TRIG_FORCED | The number of emergency triggers (that is, TRIGGER_911) forced when the FS is overloaded. SIM uses a single counter to track forced emergency triggers for all FSs and updates it each time it forces an emergency trigger (TRIGGER_911) to an FS.
SIM_OC_TRIG_FORCED | The number of triggers forced when the FS is overloaded. SIM uses a single counter to track forced triggers for all FSs and updates it each time it forces a trigger.
Traffic Measurements Monitor Counters
Table C-6 lists the new Traffic Measurements Monitor (TMM) measurements provided to support this feature.
Table C-6 TMM Counters Used by Overload Control
Measurement | Description
SIA_OC_RX_INVITE_REJECT | The total number of incoming INVITE messages rejected by SIA due to overload.
SIA_OC_RX_REGISTER_REJECT | The total number of incoming REGISTER messages rejected by SIA due to overload.
SIA_OC_RX_REFER_REJECT | The total number of incoming REFER messages rejected by SIA due to overload.
SIA_OC_RX_SUBSCRIBE_REJECT | The total number of incoming SUBSCRIBE messages rejected.
SIA_OC_RX_UNSOL_NOTIFY_SUPP | The total number of unsolicited NOTIFY requests suppressed without being sent to endpoints.
SIA_OC_RX_OPTIONS_REJECT | The total number of incoming OPTIONS messages rejected by SIA due to overload.
Miscellaneous Measurements
Table C-7 lists additional measurements added to support Overload Control.
Table C-7 Miscellaneous Measurements used by Overload Control
Measurement | Description
ISUP_CONG_CALL_REJECTED | The number of calls rejected due to congestion, on a per-trunk-group basis. This is implemented for SGA.
POTS_OC_DP_RECEIVED | The number of Detection Points (DPs) reported during periods of congestion. The FS pegs this counter.
H323_OC_SETUP_REJECTED | The total number of incoming H.225 Setup messages rejected by the BTS due to overload.
MEAS_ISA_OC_SETUP_REJECTED | The number of ISDN calls rejected due to system overload.
MEAS_MGA_OC_CALL_REJECTED | The number of MGCP calls rejected due to system overload.
Troubleshooting
This section lists the Events and Alarms added to support this feature.
Events and Alarms
The FS sends an alarm when:
• MCL changes
• An individual critical factor reaches its threshold
The CA sends an Informational alarm when:
• It receives a congested notification
• It receives an abatement notification from an FS
Informational alarms are sent at fixed 25 percent increments. A configurable parameter, info_alarm_step_size, is added to each factor defined in olm.cfg. Ensure the value allows sufficient warning. The default for info_alarm_step_size is 5, giving factor informational alarms at 5, 10, 15 percent, etc.
Congestion Status—Maintenance (112)
The Congestion Status alarm (major) reports changes in MCL as "System MCL Level". This is the effective MCL, that is, the greater of the computed MCL and the administrative MCL.
When a new MAINTENANCE(112) alarm appears, old MAINTENANCE(112) alarms clear. When the system MCL falls to 0, the "new" alarm clears.
Dampen this alarm using alarm_damping_time in olm.cfg. The value of alarm_damping_time is the minimum amount of time that passes before the alarm is issued after the change occurred.
For additional information, refer to the "MAINTENANCE (112)" section on page 7-60.
CPU Load of Critical Processes—Maintenance(113)
The CPU Load of Critical Processes alarm (info) indicates that the CPU utilization factor crossed a multiple of the info_alarm_step_size set in olm.cfg. It reports "Factor Level" and "Factor MCL". The alarm appears each time the factor crosses a multiple of info_alarm_step_size in either direction, but the factor must pass the next higher or lower multiple before the alarm appears again.
For additional information, refer to the "MAINTENANCE (113)" section on page 7-60.
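The crossing rule for info_alarm_step_size can be read as a form of hysteresis: once an alarm is raised at one multiple, small movements around that multiple do not raise it again. The sketch below implements that reading in Python. It is one interpretation of the description above, it assumes a positive step size, and the class name `StepAlarm` is hypothetical.

```python
import math


class StepAlarm:
    """Raise an alarm only when the factor level passes the next multiple."""

    def __init__(self, step):
        if step <= 0:
            raise ValueError("this sketch assumes a positive step size")
        self.step = step
        self.mark = 0   # multiple at which the last alarm was raised

    def update(self, level):
        """Return ("up" or "down", multiple) when an alarm fires, else None."""
        if level >= self.mark + self.step:
            self.mark = math.floor(level / self.step) * self.step
            return ("up", self.mark)
        if level <= self.mark - self.step:
            self.mark = math.ceil(level / self.step) * self.step
            return ("down", self.mark)
        return None
```

With a step size of 5, a level of 52 raises an alarm at 50. Levels of 49 and 51 that follow raise nothing, and the next alarm appears only when the level reaches 55 or falls to 45.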
Queue Length of Critical Processes—Maintenance(114)
The Queue Length of Critical Processes alarm (info) indicates that a defined critical process queue length factor crossed a multiple of the info_alarm_step_size set in olm.cfg. It reports the process_name, then 1 byte for "Factor Level" and 1 byte for "Factor MCL". The alarm appears each time the factor crosses a multiple of info_alarm_step_size in either direction, but the factor must pass the next higher or lower multiple before the alarm appears again.
For additional information, refer to the "MAINTENANCE (114)" section on page 7-61.
IPC Buffer Usage Level—Maintenance(115)
The IPC Buffer Usage Level alarm (info) indicates that the IPC buffer usage factor crossed a multiple of the info_alarm_step_size set in olm.cfg. It reports "Factor Level" and "Factor MCL". The alarm appears each time the factor crosses a multiple of info_alarm_step_size in either direction, but the factor must pass the next higher or lower multiple before the alarm appears again.
For additional information, refer to the "MAINTENANCE (115)" section on page 7-61.
CA Reports the Congestion Level of FS—Maintenance(116)
The CA Reports the Congestion Level of FS alarm (info) indicates that the CA received a congestion or abatement notification from an FS.
For additional information, refer to the "MAINTENANCE (116)" section on page 7-62.
Logs
Use the INFO logs to get differing levels of information about the alarms:
• INFO1—Included with each alarm
• INFO3—Prints factors (controlled by olm.cfg) and shows a system overview
• INFO4—Includes extra detail
• INFO5—Shows exact details of the factor MCL computations