Table Of Contents
Design Considerations for High Availability
What's New in This Chapter
Designing for High Availability
Data Network Design Considerations
Unified CM and CTI Manager Design Considerations
Configuring Unified ICM for CTI Manager Redundancy
Unified IP IVR (CRS) Design Considerations
Unified IP IVR (CRS) High Availability Using Unified CM
Unified IP IVR (CRS) High Availability Using Unified ICM
Cisco Unified Customer Voice Portal (Unified CVP) Design Considerations
Multi-Channel Design Considerations (Cisco Email Manager Option and Cisco Collaboration Server Option)
Cisco Email Manager Option
Cisco Collaboration Server Option
Cisco Unified Outbound Dialer (Unified OUTD) Design Considerations
Peripheral Gateway Design Considerations
Unified CM Failure Scenarios
Unified ICM Failover Scenarios
Scenario 1: Unified CM and CTI Manager Fail
Scenario 2: Agent PG Side A Fails
Scenario 3: Only the Primary Unified CM Subscriber Fails
Scenario 4: Only the Unified CM CTI Manager Service Fails
Unified CCE Scenarios for Clustering over the WAN
Scenario 1: Unified ICM Central Controller or Peripheral Gateway Private Network Fails
Scenario 2: Visible Network Fails
Scenario 3: Visible and Private Networks Both Fail (Dual Failure)
Scenario 4: Unified MA Location WAN (Visible Network) Fails
Understanding Failure Recovery
Unified CM Service
Unified IP IVR (CRS)
Unified ICM
Unified CM PG and CTI Manager Service
Unified ICM Voice Response Unit PG
Unified ICM Call Router and Logger
Administrative Workstation Real-Time Distributor (RTD)
CTI Server
CTI OS Considerations
Cisco Agent Desktop Considerations
Design Considerations for Unified CCE System Deployment with Unified ICM Enterprise
Parent/Child Components
The Unified ICM Enterprise (Parent) Data Center
The Unified CCX Call Center (Child) Site
The Unified CCE Call Center (Child) Site
Parent/Child Call Flows
Typical Inbound PSTN Call Flow
Post-Route Call Flow
Parent/Child Fault Tolerance
Unified CCE Child Loses WAN Connection to Unified ICM Parent
Unified CCE Gateway PG Fails or Cannot Communicate with Unified ICM Parent
Parent/Child Reporting and Configuration Impacts
Other Considerations for the Parent/Child Model
Other Considerations for High Availability
Design Considerations for High Availability
Last revised on: October 29, 2008
This chapter covers several possible Unified CCE failover scenarios and explains design considerations for providing high availability of system functions and features in each of those scenarios. This chapter contains the following sections:
•
Designing for High Availability
•
Data Network Design Considerations
•
Unified CM and CTI Manager Design Considerations
•
Unified IP IVR (CRS) Design Considerations
•
Cisco Unified Customer Voice Portal (Unified CVP) Design Considerations
•
Multi-Channel Design Considerations (Cisco Email Manager Option and Cisco Collaboration Server Option)
•
Cisco Email Manager Option
•
Cisco Collaboration Server Option
•
Cisco Unified Outbound Dialer (Unified OUTD) Design Considerations
•
Peripheral Gateway Design Considerations
•
Understanding Failure Recovery
•
CTI OS Considerations
•
Cisco Agent Desktop Considerations
•
Design Considerations for Unified CCE System Deployment with Unified ICM Enterprise
•
Other Considerations for High Availability
What's New in This Chapter
Table 3-1 lists the topics that are new in this chapter or that have changed significantly from previous releases of this document.
Table 3-1 New or Changed Information Since the Previous Release of This Document
New or Revised Topic | Described in:
Recovery of calls after PG failover | Unified CM PG and CTI Manager Service
Designing for High Availability
Cisco Unified CCE is a distributed solution that uses numerous hardware and software components, and it is important to design each system in a way that eliminates any single point of failure or that at least addresses potential failures in a way that will impact the fewest resources in the call center. The type and number of resources impacted will depend on how stringent your requirements are and which design characteristics you choose for the various Unified CCE components, including the network infrastructure. A good Unified CCE design will be tolerant of most failures (defined later in this section), but not all failures can be made transparent.
Cisco Unified CCE is a solution designed for mission-critical call centers. The success of any Unified CCE deployment requires a team with experience in data and voice internetworking, system administration, and Unified CCE application design and configuration.
Note
Simplex deployments are allowed for demo, laboratory, and non-production deployments. However, all production deployments must be deployed with redundancy.
Before implementing Unified CCE, use careful preparation and design planning to avoid costly upgrades or maintenance later in the deployment cycle. Always design for the worst possible failure scenario, with future scalability in mind for all Unified CCE sites.
In summary, plan ahead and follow all the design guidelines and recommendations presented in this guide and in the Cisco Unified Communications Solution Reference Network Design (SRND) guide, available at
http://www.cisco.com/go/designzone
For assistance in planning and designing your Unified CCE solution, consult your Cisco or certified Partner Systems Engineer (SE).
Figure 3-1 shows a high-level design for a fault-tolerant Unified CCE single-site deployment.
Figure 3-1 Unified CCE Single-Site Design for High Availability
In Figure 3-1, each component in the Unified CCE solution is duplicated with a redundant or duplex component, with the exception of the intermediate distribution frame (IDF) switch for the Unified CCE agents and their phones. The IDF switches do not interconnect with each other, but only with the main distribution frame (MDF) switches, because it is better to distribute the agents among different IDF switches for load balancing and for geographic separation (for example, different building floors or different cities). If an IDF switch fails, all calls should be routed to other available agents in a separate IDF switch or to a Unified IP IVR (Customer Response Solutions (CRS)) queue. Follow the design recommendations for a single-site deployment as documented in the Cisco Unified Communications Solution Reference Network Design (SRND) guide, available at
http://www.cisco.com/go/designzone
If designed correctly for high availability and redundancy, a Unified CCE system can lose half of its systems and still be operational. With this type of design, no matter what happens in the Unified CCE system, each call can still be handled in one of the following ways:
•
Routed and answered by an available Unified CCE agent using an IP phone or desktop softphone
•
Sent to an available Unified IP IVR (CRS) or Unified CVP port or session
•
Answered by the Cisco Unified Communications Manager AutoAttendant
•
Prompted by a Unified IP IVR (CRS) or Unified CVP announcement that the call center is currently experiencing technical difficulties, and to call back later
•
Rerouted to another site with available agents or resources to handle the call
The components in Figure 3-1 can be rearranged to form two connected Unified CCE sites, as illustrated in Figure 3-2.
Figure 3-2 Unified CCE Single-Site Redundancy
Figure 3-2 emphasizes the redundancy of the single site design in Figure 3-1. Side A and Side B are basically mirror images of each other. In fact, one of the main Unified CCE features to enhance high availability is its ability to add redundant/duplex components that are designed to automatically fail-over and recover without any manual intervention. Core system components with redundant/duplex components are interconnected to provide failure detection of the "partner" system with the use of TCP keep-alive messages generated every 100 ms over a separate network path. The fault-tolerant design and failure detection/recovery method is described later in this chapter.
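The partner failure detection described above can be illustrated with a short sketch. It is not the Unified ICM implementation. The 100 ms keep-alive interval comes from the text; the number of silent intervals tolerated before the partner is declared failed (MISSED_LIMIT) and the function name are assumptions chosen only for illustration.

```python
KEEPALIVE_INTERVAL_MS = 100  # from the text: keep-alives are generated every 100 ms
MISSED_LIMIT = 5             # assumption: silent intervals tolerated before declaring failure


def partner_failed_at(arrival_times_ms, now_ms):
    """Return the time (ms) at which the partner is declared failed, or None.

    arrival_times_ms: sorted times at which keep-alives were received.
    now_ms: current time; silence is measured up to this point.
    """
    timeout = KEEPALIVE_INTERVAL_MS * MISSED_LIMIT
    last_seen = 0
    for t in arrival_times_ms:
        if t - last_seen > timeout:
            # The gap between two keep-alives already exceeded the timeout.
            return last_seen + timeout
        last_seen = t
    if now_ms - last_seen > timeout:
        return last_seen + timeout
    return None
```

With these assumed values, a partner that stops sending keep-alives is detected 500 ms after the last message received. A shorter limit detects failures sooner but is more sensitive to brief congestion on the separate network path.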
Other components in the solution use other types of redundancy strategies. For example, Cisco Unified Communications Manager (Unified CM) uses a cluster design to provide IP phones and devices with multiple Unified CM subscribers (servers) with which to register if the primary server fails, and those devices automatically re-home to the primary when it is restored.
The following sections use Figure 3-1 as the model design to discuss issues and features that you should consider when designing Unified CCE for high availability. These sections use a bottom-up model (from a network model perspective, starting with the physical layer first) that divides the design into segments that can be deployed in separate stages.
Cisco recommends using only duplex (redundant) Unified CM, Unified IP IVR/Unified CVP, and Unified ICM configurations for all Unified CCE deployments that require high availability. This chapter assumes that the Unified CCE failover feature is a critical requirement for all deployments; therefore, it presents only deployments that use a redundant (duplex) configuration, with each Unified CM cluster having at least one publisher and one subscriber. Additionally, where possible, deployments should follow the best practice of having no devices, call processing, or CTI Manager Services running on the Unified CM publisher.
Data Network Design Considerations
The Unified CCE design shown in Figure 3-3 starts from a time division multiplexing (TDM) call access point and ends where the call reaches a Unified CCE agent. The bottom of the network infrastructure in the design supports the Unified CCE environment for data and voice traffic. The network, including the PSTN, is the foundation for the Unified CCE solution. If the network is poorly designed to handle failures, then everything in the call center is prone to failure because all the servers and network devices depend on the network for communication. Therefore, the data and voice networks must be a primary part of your solution design and must be addressed in the early stages for all Unified CCE implementations.
Note
Cisco recommends that the NIC and Ethernet switch port be set to 100 Mbps full duplex for 10/100 links, or set to auto-negotiate for gigabit links.
In addition, the choice of voice gateways for a deployment is critical because some protocols offer more call resiliency than others. This chapter provides high-level information on how the voice gateways should be configured for high availability with the Unified CCE solution.
For more information on voice gateways and voice networks in general, refer to the Cisco Unified Communications Solution Reference Network Design (SRND) guide, available at
http://www.cisco.com/go/designzone
Figure 3-3 High Availability in a Network with Two Voice Gateways and One Unified CM Cluster
Using multiple voice gateways avoids the problem of a single gateway failure causing blockage of all calls. In a configuration with two voice gateways and one Unified CM cluster, each gateway should register with a different primary Unified CM to spread the workload across the Unified CMs in the cluster. Each gateway should use the other Unified CM as a backup in case its primary Unified CM fails. For details on setting up Unified CM for redundant service related to call processing, refer to the Cisco Unified Communications Solution Reference Network Design (SRND) guide, available at
http://www.cisco.com/go/designzone
With H.323 voice gateways, additional call processing is available by using TCL scripts and additional dial peers if the gateway is unable to reach its Unified CM for call control or call processing instructions. MGCP gateways do not have this built-in functionality, and the trunks that are terminated in these gateways should have backup routing from the PSTN carrier or service provider to reroute the trunk on failure or no-answer to another gateway or location.
As for sizing the gateway's trunk capacity, it is a good idea to account for failover of the gateways, building in enough excess capacity to handle the maximum busy hour call attempts (BHCA) if one or more voice gateways fail. During the design phase, first decide how many simultaneous voice gateway failures are acceptable for the site. Based upon this requirement, the number of voice gateways used, and the distribution of trunks across those voice gateways, you can determine the total number of trunks required for normal and "disaster" modes of operation. The more you distribute the trunks over multiple voice gateways, the fewer trunks you will need in a failure mode. However, using more voice gateways will increase the cost of that component of the solution, so you should compare the annual operating cost of the trunks (paid to the PSTN provider) against the one-time fixed cost of the voice gateways. The form-factor of the gateway is also a consideration; for example, if an entire 8-port T1 blade fails in a Cisco Catalyst 6500 chassis, that event could impact 184 calls coming into the site.
As an example, assume a call center has a maximum BHCA that results in the need for four T1 lines, and the company has a requirement for no call blockage in the event of a single component (voice gateway) failure. If two voice gateways are deployed in this case, then each voice gateway should be provisioned with four T1 lines (total of eight). If three voice gateways are deployed, then two T1 lines per voice gateway (total of six) would be enough to achieve the same level of redundancy. If five voice gateways are deployed, then one T1 per voice gateway (total of five) would be enough to achieve the same level of redundancy. Thus, you can reduce the number of T1 lines required by adding more voice gateways and spreading the risk over multiple physical devices.
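The arithmetic in this example can be checked with a short sketch. It assumes that trunks are spread evenly across the voice gateways and that the PSTN redistributes calls across the surviving gateways; the function names are illustrative and are not part of any Cisco tool.

```python
import math


def t1_lines_per_gateway(required_t1, gateways, tolerated_failures=1):
    """T1 lines to provision on each gateway so that the surviving gateways
    can still carry required_t1 lines of traffic after the tolerated failures."""
    surviving = gateways - tolerated_failures
    if surviving < 1:
        raise ValueError("at least one gateway must survive the tolerated failures")
    return math.ceil(required_t1 / surviving)


def total_t1_lines(required_t1, gateways, tolerated_failures=1):
    """Total T1 lines across all gateways for the same requirement."""
    return gateways * t1_lines_per_gateway(required_t1, gateways, tolerated_failures)


# Reproduce the example: four T1 lines needed, one gateway failure tolerated.
for n in (2, 3, 5):
    print(n, "gateways:", t1_lines_per_gateway(4, n), "T1 each,", total_t1_lines(4, n), "total")
```

For four required T1 lines, the sketch yields totals of eight, six, and five T1 lines for two, three, and five gateways respectively, matching the figures in the example.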
The operational cost savings of fewer T1 lines might be greater than the one-time capital cost of the additional voice gateways. In addition to the recurring operational costs of the T1 lines, you should also factor in the one-time installation cost of the T1 lines to ensure that your design accounts for the most cost-effective solution. Every installation has different availability requirements and cost metrics, but using multiple voice gateways is often more cost-effective. Therefore, it is a worthwhile design practice to perform this cost comparison.
After you have determined the number of trunks needed, the PSTN service provider has to configure them so that calls can be terminated onto trunks connected to all of the voice gateways (or at least more than one voice gateway). From the PSTN perspective, if the trunks going to the multiple voice gateways are configured as a single large trunk group, then all calls will automatically be routed to the surviving voice gateways when one voice gateway fails. If all of the trunks are not grouped into a single trunk group within the PSTN, then you must ensure that PSTN rerouting or overflow routing to the other trunk groups is configured for all dialed numbers.
If a voice gateway with a digital interface (T1 or E1) fails, then the PSTN automatically stops sending calls to that voice gateway because the carrier level signaling on the digital circuit has dropped. Loss of carrier level signaling causes the PSTN to busy-out all trunks on that digital circuit, thus preventing the PSTN from routing new calls to the failed voice gateway. When the failed voice gateway comes back on-line and the circuits are back in operation, the PSTN automatically starts delivering calls to that voice gateway again.
With H.323 voice gateways, it is possible for the voice gateway itself to be operational but for its communication paths to the Unified CM servers to be severed (for example, a failed Ethernet connection). If this situation occurs in the case of an H.323 gateway, you can use the busyout-monitor interface command to monitor the Ethernet interfaces on a voice gateway. To place a voice port into a busyout monitor state, use the busyout-monitor interface voice-port configuration command. To remove the busyout-monitor state on the voice port, use the no form of this command. As noted previously, these gateways also provide additional processing options if the call control interface to Unified CM is not available: they can reroute the calls to another site or dialed number, or play a locally stored .wav file to the caller and end the call.
With MGCP-controlled voice gateways, when the voice gateway interface to Unified CM fails, the gateway will look for secondary and tertiary Unified CM subscribers from the redundancy group. The MGCP gateway will automatically fail-over to the other subscribers in the group and periodically check the health of each, marking it as available once it comes back on-line. The gateway will then fail-back to the primary subscriber when all calls are idle or after 24 hours (whichever comes first). If no subscribers are available, the voice gateway automatically busies-out all its trunks. This action prevents new calls from being routed to this voice gateway from the PSTN. When the voice gateway interface to Unified CM homes to the backup subscriber, the trunks are automatically idled and the PSTN should begin routing calls to this voice gateway again (assuming the PSTN has not permanently busied-out those trunks). The design practice is to spread the gateways across the Unified CM call processing servers in the cluster to limit the risk of losing all the gateway calls in a call center if the primary subscriber that has all the gateways registered to it should fail.
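The MGCP failover and fail-back behavior described above can be summarized in a short sketch. It is a simplified model for illustration, not gateway code: the function names are hypothetical, and it assumes that the 24-hour limit is measured from the time the primary subscriber becomes available again.

```python
FAILBACK_MAX_WAIT_HOURS = 24  # from the text: fail back after 24 hours at the latest


def select_subscriber(redundancy_group, is_available):
    """Return the first available subscriber in priority order
    (primary, secondary, tertiary), or None if none is reachable.

    None corresponds to the gateway busying out all of its trunks.
    """
    for name in redundancy_group:
        if is_available(name):
            return name
    return None


def should_fail_back(primary_available, active_calls, hours_since_primary_restored):
    """Decide whether a gateway homed to a backup subscriber returns to the primary."""
    if not primary_available:
        return False
    # Fail back when all calls are idle or after 24 hours, whichever comes first.
    return active_calls == 0 or hours_since_primary_restored >= FAILBACK_MAX_WAIT_HOURS
```

For example, with a redundancy group of ["sub1", "sub2", "sub3"] and only sub1 down, select_subscriber returns "sub2". Once sub1 is restored, should_fail_back stays False while calls are active and fewer than 24 hours have elapsed.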
Voice gateways that are used with the Cisco Unified Survivable Remote Site Telephony (SRST) option for Unified CM follow a similar failover process. If the gateway is cut off from the Unified CM that is controlling it, the gateway will fail over into SRST mode, which terminates all trunk calls and resets the gateway. Phones re-home to the local SRST gateway for call control, and calls will be processed locally and directed to local phones. While running in SRST mode, it is assumed that the agents also have no CTI connection from their desktops, so they will be seen as not ready within the Unified CCE routing application. Therefore, no calls will be sent to these agents by Unified CCE. When the data connection is re-established to the gateway at the site, Unified CM will take control of the gateway and phones again, allowing the agents to be reconnected to Unified CCE.
Unified CM and CTI Manager Design Considerations
Cisco Unified CM Release 3.3(x) and later uses CTI Manager, a service that acts as an application broker and abstracts the physical binding of the application to a particular Unified CM server to handle all its CTI resources. (Refer to the Cisco Unified Communications Solution Reference Network Design (SRND) guide for further details about the architecture of the CTI Manager.) The CTI Manager and Unified CM are two separate services running on a Unified CM server. Some other services running on a Unified CM server include TFTP, Cisco Messaging Interface, and Real-time Information Server (RIS) data collector services.
The main function of the CTI Manager is to accept messages from external CTI applications and send them to the appropriate resource in the Unified CM cluster. The CTI Manager uses the Cisco JTAPI link to communicate with the applications. It acts like a JTAPI messaging router. The JTAPI client library in Cisco Unified CM Release 3.3(x) and above connects to the CTI Manager instead of connecting directly to the Unified CM service, as in prior releases. In addition, there can be multiple CTI Manager services running on different Unified CM servers in the cluster that are aware of each other (via the Unified CM service, which is explained later in this section). The CTI Manager uses the same Signal Distribution Layer (SDL) signaling mechanism that the Unified CM services in the cluster use to communicate with each other. However, the CTI Manager does not directly communicate with the other CTI Managers in its cluster. (This is also explained later in detail.)
The main function of the Unified CM service is to register and monitor all the Cisco Unified Communications devices. It basically acts as a switch for all the Cisco Unified Communications resources and devices in the system, while the CTI Manager service acts as a router for all the CTI application requests for the system devices. Some of the devices that can be controlled by JTAPI that register with the Unified CM service include the IP phones, CTI ports, and CTI route points.
Figure 3-4 illustrates some of the functions of Unified CM and the CTI Manager.
Figure 3-4 Functions of Unified CM and the CTI Manager
The servers in a Unified CM cluster communicate with each other using the Signal Distribution Layer (SDL) service. SDL signaling is used only by the Unified CM service to talk to the other Unified CM services to make sure everything is in sync within the Unified CM cluster. The CTI Managers in the cluster are completely independent and do not establish a direct connection with each other. CTI Managers route only the external CTI application requests to the appropriate devices serviced by the local Unified CM service on this subscriber. If the device is not resident on its local Unified CM subscriber, then the Unified CM service forwards the application request to the appropriate Unified CM in the cluster. Figure 3-5 shows the flow of a device request to another Unified CM in the cluster.
Figure 3-5 CTI Manager Device Request to a Remote Unified CM
Although it might be tempting to register all of the Unified CCE devices to a single subscriber in the cluster and point the Peripheral Gateway (PG) to that server, this configuration would put a high load on that subscriber. If the PG were to fail in this case, the duplex PG would connect to a different subscriber, and all the CTI Manager messaging would have to be routed across the cluster to the original subscriber. It is important to distribute devices and CTI applications appropriately across all the call processing nodes in the Unified CM cluster to balance the CTI traffic and possible failover conditions.
The external CTI applications use a JTAPI user account on the CTI Manager to establish a connection and assume control of the Unified CM devices registered to this JTAPI user. In addition, given that the CTI Managers are independent from each other, any CTI application can connect to any CTI Manager to perform its requests. However, because the CTI Managers are independent, one CTI Manager cannot pass the CTI application to another CTI Manager upon failure. If the first CTI Manager fails, the external CTI application must implement the failover mechanism to connect to another CTI Manager in the cluster.
For example, the Agent PG handles failover for the CTI Manager by using its duplex servers, sides A and B, each of which is pointed to a different subscriber in the cluster, but not at the same time. The PG processes are designed to prevent both sides from trying to be active at the same time. Additionally, both of the duplex PG servers use the same JTAPI user to log into the CTI Manager applications. However, only one Unified CM PG side allows the JTAPI user to register and monitor the user devices to conserve system resources in the Unified CM cluster. The other side of the Unified CM PG stays in hot-standby mode, waiting to be activated immediately upon failure of the active side.
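Because one CTI Manager cannot hand an application over to another, the failover logic lives in the application. The following sketch shows one way such client-side failover could look. It is illustrative only: connect is a hypothetical callable standing in for whatever opens a session with a CTI Manager, and the host names are placeholders. It does not use the Cisco JTAPI library.

```python
def connect_with_failover(cti_managers, connect):
    """Try each CTI Manager in order and return (host, session) for the first
    one that accepts the connection.

    cti_managers: host names in order of preference.
    connect: hypothetical callable that returns a session object or raises
             ConnectionError if the CTI Manager cannot be reached.
    """
    errors = {}
    for host in cti_managers:
        try:
            return host, connect(host)
        except ConnectionError as exc:
            # Remember why this CTI Manager was skipped, then try the next one.
            errors[host] = str(exc)
    raise ConnectionError(f"no CTI Manager reachable: {errors}")
```

The duplex Agent PG achieves the same result differently: each PG side is pointed at a single CTI Manager, and failover happens by activating the other PG side rather than by one process walking a list.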
Figure 3-6 shows two external CTI applications using the CTI Manager, the Agent PG, and the Unified IP IVR (CRS). The Unified CM PG logs into the CTI Manager using the JTAPI account User 1, while the Unified IP IVR (CRS) uses account User 2. Each external application uses its own specific JTAPI user account and will have different devices registered and monitored by that user. For example, the Unified CM PG (User 1) will monitor all four agent phones and the inbound CTI Route Points, while the Unified IP IVR (User 2) will monitor its CTI Ports and the CTI Route Points used for its JTAPI Triggers. Although multiple applications could monitor the same devices, this method is not recommended because it can cause race conditions between the applications trying to take control of the same physical device.
Figure 3-6 CTI Application Device Registration
Unified CM CTI applications also add to the device weights on the subscribers, adding memory objects used to monitor registered devices. These monitors are registered on the subscriber that has the connection to the external application. It is a good design practice to distribute these application-to-CTI Manager registrations across multiple subscribers to avoid overloading a single subscriber with all of the monitored object tracking.
The design of Unified CM and CTI Manager should be performed as the second design stage, right after the network design stage, and deployment should occur in this same order. The reason for this order is that the Cisco Unified Communications infrastructure must be in place to dial and receive calls using its devices before you can deploy any telephony applications. Before moving to the next design stage, make sure that a PSTN phone can call an IP phone and that this same IP phone can dial out to a PSTN phone, with all the call survivability capabilities considered for treating these calls. Also keep in mind that the Unified CM cluster design is paramount to the Unified CCE system, and any server failure in a cluster will take down two services (CTI and Unified CM), thereby adding an extra load to the remaining servers in the cluster.
Configuring Unified ICM for CTI Manager Redundancy
To enable Unified CM support for CTI Manager failover in a duplex Unified CM model, perform the following steps:
Step 1
Create a Unified CM redundancy group, and add subscribers to the group. (Publishers and TFTP servers should not be used for call processing, device registration, or CTI Manager use.)
Step 2
Designate two CTI Managers to be used for each side of the duplex Peripheral Gateway (PG), one for PG Side A and one for PG Side B.
Step 3
Assign one of the CTI Managers to be the JTAPI service of the Unified CM PG Side A. (See Figure 3-7.)
Step 4
Assign the second CTI Manager to be the JTAPI service of the Unified CM PG Side B. (See Figure 3-7.)
Figure 3-7 Assigning CTI Managers for PG Sides A and B
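The rules implied by these steps can be expressed as a small consistency check. This is a planning aid sketched under assumptions, not a Cisco configuration tool: the CmServer type, its role values, and the server names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CmServer:
    name: str
    role: str  # assumed values: "publisher", "tftp", or "subscriber"


def validate_pg_cti_assignment(side_a, side_b):
    """Return a list of problems with the CTI Manager assignment for PG Sides A and B."""
    problems = []
    for side, server in (("A", side_a), ("B", side_b)):
        # Publishers and TFTP servers should not be used for CTI Manager.
        if server.role != "subscriber":
            problems.append(f"PG Side {side}: {server.name} is a {server.role}, not a subscriber")
    # Each PG side needs its own CTI Manager so that one failure does not affect both.
    if side_a.name == side_b.name:
        problems.append("PG Sides A and B point to the same CTI Manager")
    return problems
```

An empty list means the assignment satisfies both rules, for example validate_pg_cti_assignment(CmServer("sub1", "subscriber"), CmServer("sub2", "subscriber")).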
Unified IP IVR (CRS) Design Considerations
The JTAPI subsystem in Unified IP IVR (CRS) can establish connections with two CTI Managers. This feature enables Unified CCE designs to add Unified IP IVR (CRS) redundancy at the CTI Manager level in addition to using the Unified ICM script to check for the availability of Unified IP IVR (CRS) before sending a call to it. Load balancing is highly recommended to ensure that all Unified IP IVR (CRS) servers are used in the most efficient way.
Figure 3-8 shows two Unified IP IVR (CRS) servers configured for redundancy within one Unified CM cluster. The Unified IP IVR (CRS) group should be configured so that each server is connected to a different CTI Manager service on different Unified CM subscribers in the cluster for load balancing and high availability. Using the redundancy feature of the JTAPI subsystem in the Unified IP IVR (CRS) server, you can implement redundancy by adding the IP addresses or host names of two Unified CMs from the cluster. Then, if one of the Unified CMs fails, the Unified IP IVR (CRS) associated with that particular Unified CM will fail-over to the second Unified CM.
Figure 3-8 High Availability with Two Unified IP IVR (CRS) Servers and One Unified CM Cluster
You can increase Unified IP IVR (CRS) availability by using one of the following optional methods:
•
Call-forward-busy and call-forward-on-error features in Unified CM. This method is more complicated, and Cisco recommends it only for special cases where a few critical CTI route points and CTI ports absolutely must have high availability down to the call processing level in Unified CM.
•
Unified ICM script features to check the availability of a Unified IP IVR (CRS) prior to sending a call to it.
Note
Do not confuse the Unified IP IVR (CRS) subsystems with services. Unified IP IVR (CRS) uses only one service, the Cisco CRS Node Manager service. The Unified IP IVR (CRS) subsystems are connections to external applications such as the CTI Manager and Unified ICM.
Unified IP IVR (CRS) High Availability Using Unified CM
You can implement Unified IP IVR (CRS) port high availability by using any of the following call-forward features in Unified CM:
•
Forward Busy — forwards calls to another port or route point when Unified CM detects that the port is busy. This feature can be used to forward calls to another CTI port when a Unified IP IVR (CRS) CTI port is busy due to a Unified IP IVR (CRS) application problem, such as running out of available CTI ports.
•
Forward No Answer — forwards calls to another port or route point when Unified CM detects that a port has not picked up a call within the timeout period set in Unified CM. This feature can be used to forward calls to another CTI port when a Unified IP IVR (CRS) CTI port is not answering due to a Unified IP IVR (CRS) application problem.
•
Forward on Failure — forwards calls to another port or route point when Unified CM detects a port failure caused by an application error. This feature can be used to forward calls to another CTI port when a Unified IP IVR (CRS) CTI port is busy due to a Unified CM application error.
Note
When using the call forwarding features to implement high availability of Unified IP IVR (CRS) ports, avoid creating a loop in the event that all the Unified IP IVR (CRS) servers are unavailable. Basically, do not establish a path back to the first CTI port that initiated the call forwarding.
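A forwarding plan can be checked for such loops before it is configured. The sketch below is a simplified model: it assumes that each port or route point has at most one forwarding target, whereas Unified CM allows separate targets for Forward Busy, Forward No Answer, and Forward on Failure. The port names are placeholders.

```python
def find_forwarding_loop(forward_to, start):
    """Follow call-forward targets from start and return the loop as a list,
    or None if the chain ends without revisiting a port.

    forward_to: dict mapping a port or route point to its forwarding target;
                a port missing from the dict has no forwarding configured.
    """
    path = []
    seen = set()
    current = start
    while current is not None:
        if current in seen:
            # Report the loop from its first port back to that same port.
            return path[path.index(current):] + [current]
        seen.add(current)
        path.append(current)
        current = forward_to.get(current)
    return None
```

For example, a plan in which port1 forwards to port2, port2 to port3, and port3 back to port1 is reported as the loop port1, port2, port3, port1. Ending the chain at a destination outside the Unified IP IVR (CRS) ports removes the loop.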
Unified IP IVR (CRS) High Availability Using Unified ICM
You can implement Unified IP IVR (CRS) high availability through Unified ICM scripts. You can prevent calls from queuing to an inactive Unified IP IVR (CRS) by using the Unified ICM scripts to check the Unified IP IVR (CRS) Peripheral Status before sending the calls to it. For example, you can program a Unified ICM script to check if the Unified IP IVR (CRS) is active by using an IF node or by configuring a Translation Route to the Voice Response Unit (VRU) node (by using the consider if field) to select the Unified IP IVR (CRS) with the most idle ports to distribute the calls evenly on a call-by-call basis. This method can be modified to load-balance ports across multiple Unified IP IVR (CRS) servers, and it can address all of the Unified IP IVR (CRS) servers on the cluster in the same Translation Route or Send to VRU node.
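The selection logic that such a script applies can be sketched outside the script editor. The following Python is not Unified ICM script syntax. It only illustrates the decision under stated assumptions: the field names online and idle_ports are hypothetical stand-ins for the Peripheral Status check and the idle port count.

```python
def select_ivr(ivrs):
    """Return the name of the online IVR with the most idle ports, or None.

    ivrs: list of dicts with keys 'name', 'online' (bool), and 'idle_ports' (int).
    """
    # Skip IVRs that are offline or have no idle ports left.
    candidates = [ivr for ivr in ivrs if ivr["online"] and ivr["idle_ports"] > 0]
    if not candidates:
        return None  # the script would need another treatment for the call
    return max(candidates, key=lambda ivr: ivr["idle_ports"])["name"]
```

Evaluating this choice on every call spreads calls across the servers, because each routed call reduces the idle port count of the server that received it.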

Note
All calls at the Unified IP IVR (CRS) are dropped if the Unified IP IVR (CRS) server itself fails. It is important to distribute calls across multiple Unified IP IVR (CRS) servers to minimize the impact of such a failure. In Unified IP IVR Release 4.0(x), there is a default script to handle cases where the Unified IP IVR (CRS) loses the link to the IVR Peripheral Gateway, so that the calls are not lost.
Cisco Unified Customer Voice Portal (Unified CVP) Design Considerations
The Unified CVP can be deployed with Unified CCE as an alternative to Unified IP IVR (CRS) for call treatment and queuing. Unified CVP is different from Unified IP IVR (CRS) in that it does not rely on Unified CM for JTAPI call control. Unified CVP uses H.323 for call control and is used in front of Unified CM or other PBX systems as part of a hybrid Unified CCE or migration solution. (See Figure 3-9.)
Figure 3-9 High Availability with Two Unified CVP Call Control Servers
Unified CVP uses the following system components:
•
Cisco Voice Gateway
The Cisco Voice Gateway is typically used to terminate TDM PSTN trunks and calls to transform them into IP-based calls on an IP network. Unified CVP uses specific H.323 voice gateways to enable more flexible call control models outside of the Unified CM MGCP control model. H.323 allows Unified CVP to integrate with multiple IP and TDM architectures for Unified CCE. Unified CVP-controlled voice gateways also provide additional functionality using the Cisco IOS built-in Voice Extensible Markup Language (VoiceXML) Browser to provide caller treatment and call queuing on the voice gateway without having to move the call to a physical IVR device. The gateway can also use the Media Resource Control Protocol (MRCP) interface to add automatic speech recognition (ASR) and text-to-speech (TTS) functions under Unified CVP control.
•
Unified CVP Call Control Server
The Unified CVP Call Control Server provides call control signaling when calls are switched between the ingress gateway and another endpoint gateway or a Unified CCE agent. It also provides the interface to the Unified ICM VRU Peripheral Gateway and translates specific Unified ICM VRU commands into VoiceXML code that is rendered on the Unified CVP Voice Gateway.
•
Unified CVP Media Server
The Unified CVP caller treatment is provided either by using ASR/TTS functions via MRCP or with predefined .wav files stored on media servers. The media servers act as web servers and serve up the .wav files to the voice browsers as part of their VoiceXML processing. Media servers can be clustered using the Cisco Content Services Switch (CSS) products, thus allowing multiple media servers to be pooled behind a single URL for access by all the voice browsers in the network.
•
Unified CVP Web Server
Unified CVP Release 3.0 provides a VoiceXML service creation environment using an Eclipse toolkit browser, which is hosted on the Unified CVP Web Server. This server also hosts the Unified CVP VoiceXML runtime environment where the dynamic VoiceXML scripts are executed and Java and Web Services calls are processed for external systems and database access.
•
H.323 Gatekeepers
Gatekeepers are used with Unified CVP to register the voice browsers and associate them with specific dialed numbers. When calls come into the network, the gateway will query the gatekeeper to find out where to send the call based upon the dialed number. The gatekeeper is also aware of the state of the voice browsers and will load-balance calls across them and avoid sending calls to out-of-service voice browsers or ones that have no available sessions.
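The gatekeeper behavior described above can be illustrated with a small model. The following Python sketch is not gatekeeper configuration and does not represent the H.323 protocol; the class names are hypothetical, and the balancing policy (pick the browser with the most free sessions) is an assumption chosen only to make the example concrete.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class VoiceBrowser:
    """Hypothetical view of a registered voice browser."""
    name: str
    in_service: bool
    free_sessions: int


class Gatekeeper:
    """Toy registry that maps dialed numbers to voice browsers."""

    def __init__(self) -> None:
        self._by_dialed_number: Dict[str, List[VoiceBrowser]] = {}

    def register(self, dialed_number: str, browser: VoiceBrowser) -> None:
        self._by_dialed_number.setdefault(dialed_number, []).append(browser)

    def resolve(self, dialed_number: str) -> Optional[VoiceBrowser]:
        """Pick a usable browser for the dialed number, or None."""
        usable = [
            b for b in self._by_dialed_number.get(dialed_number, [])
            if b.in_service and b.free_sessions > 0
        ]
        if not usable:
            return None
        # Assumed policy: send the call where the most sessions are free.
        chosen = max(usable, key=lambda b: b.free_sessions)
        chosen.free_sessions -= 1  # the new call consumes one session
        return chosen
```

In this model, a browser that is out of service or has no free sessions is never returned, which mirrors the two conditions the gatekeeper is described as avoiding.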
Availability of Unified CVP can be increased by the following methods:
•
Adding redundant Unified CVP systems under control of the Unified ICM Peripheral gateways, thus allowing the calls to be balanced automatically across multiple Unified CVP Call Control Servers
•
Adding TCL scripts to the Unified CVP gateway to handle conditions where the gateway cannot contact the Unified CVP Call Control Server to direct the call correctly
•
Adding gatekeeper redundancy with HSRP
•
Adding Cisco Content Server to load-balance .wav file requests across multiple Unified CVP Media Servers and VoiceXML URL access across multiple servers
Note
Calls in Unified CVP are not dropped if the Unified CVP Call Control Server or Unified CVP PG fails because they can be redirected to another Unified CVP Call Control Server on another Unified CVP-controlled gateway as part of the fault-tolerant design using TCL scripts (which are provided with the Unified CVP images) in the voice gateway.
For more information on these options, review the Unified CVP product documentation at
http://www.cisco.com/en/US/products/sw/custcosw/ps1006/tsd_products_support_series_home.html
Multi-Channel Design Considerations (Cisco Email Manager Option and Cisco Collaboration Server Option)
The Unified CCE solution can be extended to support multi-channel customer contacts, with email and web contacts being routed by the Unified CCE to agents in a blended or universal queue mode. The following optional components are integrated into the Unified CCE architecture (see Figure 3-10):
•
Media Routing Peripheral Gateway
To route multi-channel contacts, the Cisco Email Manager and Cisco Collaboration Server Media Blender communicate with the Media Routing Peripheral Gateway. The Media Routing Peripheral Gateway, like any peripheral gateway, can be deployed in a redundant or duplex manner with two servers interconnected for high availability. Typically, the Media Routing Peripheral Gateway is co-located at the Central Controller and has an IP socket connection to the multi-channel systems.
•
Admin Workstation ConAPI Interface
The integration of the Cisco multi-channel options allows for the Unified ICM and optional systems to share configuration information about agents and their related skill groups. The Configuration Application Programming Interface (ConAPI) runs on an Administrative Workstation and can be configured with a backup service running on another Administrative Workstation.
•
Agent Reporting and Management (ARM) and Task Event Services (TES) Connections
ARM and TES services provide call (ARM) and non-voice (TES) state and event notification from the Unified CCE CTI Server to the multi-channel systems. These connections provide agent information to the email and web environments as well as accepting and processing task requests from them. The connection is a TCP/IP socket that connects to the agent's associated CTI Server, which can be deployed as a redundant or duplex pair on the Agent Peripheral Gateway.
Figure 3-10 Multi-Channel System
Recommendations for high availability:
•
Deploy the Media Routing Peripheral Gateways in duplex pairs.
•
Deploy ConAPI as a redundant pair of Administrative Workstations that are not used for configuration and scripting, so that they will be less likely to be shut off or rebooted. Also consider using the HDS servers at the central sites to host this function.
•
Deploy the Unified CCE Agent Peripheral Gateways and CTI Servers in duplex pairs.
Cisco Email Manager Option
The Cisco Email Manager is integrated with Unified CCE to provide full email support in the multi-channel contact center. It can be deployed using a single server (see Figure 3-11) for small deployments or with multiple servers to meet larger system design requirements. The major components of Cisco Email Manager are:
•
Cisco Email Manager Server — The core routing and control server; it is not redundant.
•
Cisco Email Manager Database Server — The server that maintains the online database of all email and configuration and routing rules in the system. It can be co-resident on the Cisco Email Manager server for smaller deployments or on a dedicated server for larger systems.
•
Cisco Email Manager UI Server — This server allows the agent user interface (UI) components to be off-loaded from the main Cisco Email Manager server to scale for larger deployments or to support multiple Unified Mobile Agent (Unified MA) sites. Each remote site could have a local UI Server to reduce the data traffic from the agent browser clients to the Cisco Email Manager server. Additionally, multiple UI servers could be configured for agents to have a redundant/secondary path to access the email application. (See Figure 3-12.)
Figure 3-11 Single Cisco Email Manager Server
Figure 3-12 Multiple UI Servers
Cisco Collaboration Server Option
The Cisco Collaboration Server is integrated with Unified CCE to provide web chat and co-browsing support in the multi-channel contact center with Unified CCE. The major components of the Cisco Collaboration Server are (see Figure 3-13):
•
Cisco Collaboration Server — Collaboration servers are deployed outside the corporate firewall in a demilitarized zone (DMZ) with the corporate web servers they support. The Collaboration Server typically supports up to 400 concurrent sessions, but multiple servers can be deployed to handle larger contact volume or to provide a backup collaboration server for agents to access if their primary server fails.
•
Cisco Collaboration Server Database Server — This server maintains the online database of all chat and browsing sessions as well as configuration and routing rules in the system. It can be co-resident on the Cisco Collaboration Server; however, because the Cisco Collaboration Server is outside the firewall, most enterprises deploy it on a separate server inside the firewall to protect the historical data in the database. Multiple Cisco Collaboration Servers can point to the same database server to reduce the total number of servers required for the solution. For redundancy, each collaboration server could also have its own dedicated database server.
•
Cisco Collaboration Server Media Blender — This server polls the collaboration servers to check for new requests, and it manages the Media Routing and CTI/Task interfaces to connect the agent and caller. Each Unified CCE Agent Peripheral Gateway will have its own Media Blender, and each Media Blender will have a Media Routing peripheral interface manager (PIM) component on the Media Routing Peripheral Gateway.
•
Cisco Collaboration Dynamic Content Adaptor (DCA) — This server is deployed in the DMZ with the collaboration server, and it allows the system to share content that is generated dynamically by programs on the web site (as opposed to static HTTP pages). Multiple DCA servers can be configured and called from the Collaboration Server(s) for redundancy as well.
Figure 3-13 Cisco Collaboration Server
Cisco Unified Outbound Dialer (Unified OUTD) Design Considerations
The Unified OUTD provides the ability for Unified CCE to place calls on behalf of agents to customers based upon a predefined campaign. The major components of the Unified OUTD are (see Figure 3-14):
•
Outbound Campaign Manager — A software module that manages the dialing lists and rules associated with the calls to be placed. This software is loaded on the Logger Side A platform and is not redundant; it can be loaded and active on only one server of the duplex pair of Loggers in the Unified CCE system.
•
Unified OUTD — A software module that performs the dialing tasks on behalf of the Campaign Manager. In Unified CCE, the Unified OUTD emulates a set of IP phones for Unified CM to make the outbound calls, and it detects the called party and manages the interaction tasks with the CTI OS server to transfer the call to an agent. It also interfaces with the Media Routing Peripheral Gateway, and each Dialer has its own peripheral interface manager (PIM) on the Media Routing Peripheral Gateway.
Figure 3-14 Unified CCE Unified OUTD
The system can support multiple dialers across the enterprise, all of which are under control of the central Campaign Manager software. Although they do not function as a redundant or duplex pair the way a Peripheral Gateway does, with a pair of dialers under control of the Campaign Manager, a failure of one of the dialers can be handled automatically and calls will continue to be placed and processed by the surviving dialer. Any calls that were already connected to agents would remain connected and would experience no impact from the failure.
For smaller implementations, the Dialer could be co-resident on the Unified CCE Peripheral Gateway. For larger systems, the Dialer should be on its own server, or you could possibly use multiple Dialers under control of the central Campaign Manager.
Recommendations for high availability:
•
Deploy the Media Routing Peripheral Gateways in duplex pairs.
•
Deploy Dialers on their own servers as standalone devices to eliminate a single point of failure. (If they were co-resident on a PG, the dialer would fail whenever the PG server failed.)
•
Deploy multiple Dialers and make use of them in the Campaign Manager to allow for automatic fault recovery to a second Dialer in the event of a failure.
•
Include Dialer phones (virtual phones in Unified CM) in redundancy groups in Unified CM to allow them to fail-over to a different subscriber, as would any other phone or device in the Unified CM cluster.
Peripheral Gateway Design Considerations
The Agent PG uses the Unified CM CTI Manager process to communicate with the Unified CM cluster, while a single Peripheral Interface Manager (PIM) controls agent phones anywhere in the cluster. The Peripheral Gateway PIM process registers with CTI Manager on one of the Unified CM servers in the cluster, and the CTI Manager accepts all JTAPI requests from the PG for the cluster. If the phone, route point, or other device being controlled by the PG is not registered to that specific Unified CM server in the cluster, the CTI Manager forwards that request via Unified CM SDL links to the other Unified CM servers in the cluster. There is no need for a PG to connect to multiple Unified CM servers in a cluster.
Although the Agent PG in this document is described as typically having only one PIM process that connects to the Unified CM cluster, the Agent PG can manage multiple PIM interfaces to the same Unified CM cluster, which can be used to create additional peripherals within Unified CCE to separate registration for CTI route points and phones into two different streams. Using two PIMs for one cluster is required only when a customer has a large number of CTI Route Points in the Unified CCE Configuration. A single PIM can register approximately five CTI Route Points per second. To reduce initialization and failover times, you should use a second PIM when there are more than 250 Route Points. It is recommended that you add another PIM to the Unified CCE PG to handle only the CTI Route Points. With this model, all agent control is contained within a distinct PIM, and all CTI Route Points are contained on a second PIM. This enables the CTI Route Point PIM to register the Route Points and bring the system online before all of the agent phones are registered. This model applies only to Unified CCE models. Note that system Unified CCE allows only one Unified CM PIM.
The typical process for the startup of a PIM is to register with the peripheral for all of the monitored objects, such as CTI route points, dialed numbers, agent phones, or device targets. The PIM will not go active until all of these objects are registered and monitored properly in the PIM connection.
Many Unified CCE implementations have only a small number of CTI route points, while having numerous device targets to register. By separating these two configuration objects across two PIMs, the CTI route points can be registered and used for routing long before all of the agent phones or device targets are registered and active. The concept is to have the CTI route points available to Unified CCE immediately upon PG/PIM startup and failover, thus allowing Unified CM to process these route requests and start to handle calls even if the agent phones are not all registered. Callers can be put into queue quickly or played a message rather than having the default forward-on-failure processing in Unified CM take over the calls if the PIM is not ready to accept the incoming calls. (However, this functionality is not available in System Unified CCE, which allows only one PIM per PG. Only traditional Unified CCE configurations allow for multiple PIM configuration.)
Duplex Agent PG implementations are highly recommended because the PG has only one connection to the Unified CM cluster using a single CTI Manager. If that CTI Manager were to fail, the PG would no longer be able to communicate with the Unified CM cluster. Adding a redundant or duplex PG allows the Unified ICM to have a second pathway or connection to the Unified CM cluster using a second CTI Manager process on a different Unified CM server in the cluster.
The minimum requirement for Unified ICM high-availability support for CTI Manager and Unified IP IVR is a duplex (redundant) Agent PG environment with one Unified CM cluster containing at least two subscribers. Therefore, the minimum configuration for a Unified CM cluster in this case is one publisher and two subscribers. This minimum configuration ensures that, if the primary subscriber fails, the devices will re-home to the secondary subscriber and not to the publisher for the cluster. (See Figure 3-15.) In smaller systems and labs, Cisco permits a single publisher and single subscriber, which means if the subscriber fails, then all the devices will be active on the publisher. For specific details about the number of recommended Unified CM servers, see Sizing Cisco Unified Communications Manager Servers, page 11-1.
Figure 3-15 Unified ICM High Availability with One Unified CM Cluster
Redundant Unified ICM servers can be located at the same physical site or geographically distributed. In both cases, the Unified ICM Call Router and Logger/Database Server processes are interconnected through a separate private network. If the servers are located at the same site, you can provide the private LAN by installing a second NIC in each server (sides A and B) and connecting them with a crossover cable. If the servers are geographically distributed, you can provide the private network by installing a second NIC in each server (sides A and B) and connecting them with a dedicated T1 line that meets the specific network requirements for this connection as documented in the Unified ICM Installation Guide, available at
http://www.cisco.com/en/US/products/sw/custcosw/ps1001/prod_installation_guides_list.html
Within the Agent PG, two software processes are run to manage the connectivity to the Unified CM cluster: the JTAPI Gateway and the Unified CM PIM. The JTAPI Gateway is started by the PG automatically and runs as a node-managed process, which means that the PG will monitor this process and automatically restart it if it should fail for any reason. The JTAPI Gateway handles the low-level JTAPI socket connection protocol and messaging between the PIM and the Unified CM CTI Manager, and it is specific for the version of Unified CM. This software module must be downloaded from Unified CM when the PG is initially set up to ensure specific version compatibility.
The Agent PG Peripheral Interface Manager (PIM) is also a node-managed process and is monitored for unexpected failures and automatically restarted. This process manages the higher-level interface between the Unified ICM and the Unified CM cluster, requesting specific objects to monitor and handling route requests from the Unified CM cluster.
In a duplex Agent PG environment, both JTAPI services from both Agent PG sides log into the CTI Manager upon initialization. Unified CM PG side A logs into the primary CTI Manager, while PG side B logs into the secondary CTI Manager. However, only the active side of the Unified CM PG registers monitors for phones and CTI route points. The duplex Agent PG pair works in hot-standby mode, with only the active PG side PIM communicating with the Unified CM cluster. The standby side logs into the secondary CTI Manager only to initialize the interface and make it available for a failover. The registration and initialization services of the Unified CM devices take a significant amount of time, and having the CTI Manager available significantly decreases the time for failover.
In duplex PG operation, the side that goes active is the PG side that is first able to connect to the Unified ICM Call Router Server and request configuration information. The choice is not determined by the side-A or side-B designation of the PG device; it depends only upon the ability of the PG to connect to the Call Router, which ensures that only the PG side that has the best connection to the Call Router will attempt to go active.
The startup process of the PIM requires that all of the CTI route points be registered first, which is done at a rate of 5 route points per second. For systems with a lot of CTI route points (for example, 1000), this process can take more than 3 minutes (1000 route points at 5 per second is 200 seconds) to complete before the system will allow any of the agents to log in. This time can be reduced by distributing the devices over multiple PIM interfaces to the Unified CM cluster.
Unified CM will accept multiple PG/PIM connections on a single cluster, which allows Unified CCE to distribute different types of registrations across multiple connections. For example, one PIM could be configured to log in as JTAPIuser1 and have only the CTI route points associated with it, while a second PIM could be logged in as JTAPIuser2 and have the agent IP phones registered to it. This arrangement would allow agents to log in faster at startup, without having to wait for the dialed numbers to be registered with CTI Manager first.
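The sizing arithmetic behind this recommendation can be expressed as a short calculation. The following Python sketch uses only the two figures quoted in this section (approximately 5 route points registered per second, and a second PIM when there are more than 250 route points); the function names are illustrative and are not part of any Cisco tool.

```python
ROUTE_POINTS_PER_SECOND = 5   # approximate registration rate quoted in this section
SECOND_PIM_THRESHOLD = 250    # route-point count above which a second PIM is advised


def route_point_registration_seconds(route_points: int) -> float:
    """Estimate how long one PIM needs to register the given CTI route points."""
    if route_points < 0:
        raise ValueError("route_points must not be negative")
    return route_points / ROUTE_POINTS_PER_SECOND


def second_pim_recommended(route_points: int) -> bool:
    """Apply the 'more than 250 route points' guideline from this section."""
    return route_points > SECOND_PIM_THRESHOLD
```

For 1000 route points the estimate is 200 seconds (roughly 3 minutes), and at the 250 route-point threshold it is 50 seconds. The estimate covers route-point registration only; actual startup and failover times depend on the rest of the configuration.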
Unified CM Failure Scenarios
A fully redundant Unified CCE system contains no single points of failure. However, there are scenarios where a combination of multiple failures can reduce Unified CCE system functionality and availability. Also, if a component of the Unified CCE solution does not itself support redundancy and failover, existing calls on that component will be dropped. The following failure scenarios have the most impact on high availability, and Unified CM Peripheral Interface Managers (PIMs) cannot activate if either of the following failure scenarios occurs (see Figure 3-16):
•
Agent PG/PIM side A and the secondary CTI Manager that services the PG/PIM on side B both fail.
•
Agent PG/PIM side B and the primary CTI Manager that services the PG/PIM on side A both fail.
In either of these cases, the Unified ICM will not be able to communicate with the Unified CM cluster.
Figure 3-16 Unified CM PGs Cannot Cross-Connect to Backup CTI Managers
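The two dual-failure cases follow from the fact that each PG side uses only its own CTI Manager and cannot cross-connect to the other one. A minimal Python sketch of that rule is shown below; the function and parameter names are hypothetical and serve only to make the combinations explicit.

```python
def icm_can_reach_cluster(
    pg_a_up: bool,
    pg_b_up: bool,
    primary_cti_up: bool,
    secondary_cti_up: bool,
) -> bool:
    """Return True if at least one PG side still has its own CTI Manager.

    PG side A connects only to the primary CTI Manager and PG side B
    only to the secondary CTI Manager; there is no cross-connection.
    """
    path_a = pg_a_up and primary_cti_up
    path_b = pg_b_up and secondary_cti_up
    return path_a or path_b
```

With this rule, losing PG side A together with the primary CTI Manager still leaves the side B path intact, whereas losing PG side A together with the secondary CTI Manager (or PG side B together with the primary CTI Manager) leaves no working path.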
Unified ICM Failover Scenarios
This section describes how redundancy works in the following failure scenarios:
•
Scenario 1: Unified CM and CTI Manager Fail
•
Scenario 2: Agent PG Side A Fails
•
Scenario 3: Only the Primary Unified CM Subscriber Fails
•
Scenario 4: Only the Unified CM CTI Manager Service Fails
Scenario 1: Unified CM and CTI Manager Fail
Figure 3-17 shows a complete system failure or loss of network connectivity on Cisco Unified CM subscriber A. The CTI Manager and Cisco CallManager services were initially both active on this same server, and Unified CM subscriber A is the primary CTI Manager in this case. The following conditions apply to this scenario:
•
All phones and gateways are registered with Unified CM subscriber A as the primary server.
•
All phones and gateways are configured to re-home to Unified CM subscriber B (that is, B is the backup server).
•
Unified CM subscribers A and B are each running a separate instance of CTI Manager.
•
When Unified CM subscriber A or its CCM.exe process fails, all registered phones and gateways re-home to Unified CM subscriber B.
•
PG side A detects a failure and induces a failover to PG side B.
•
PG side A receives out-of-service events from Unified CM and logs out the agents.
•
PG side B becomes active and registers all dialed numbers and phones, and call processing continues.
•
To provide full access to state and third-party call control functions again, the agents must log back in after their phones have re-homed to subscriber B.
•
During this failure, any calls in progress at a Unified CCE agent remain active; however, the agent will not be able to use conference, transfer, or any other Unified CM features until the phone re-homes to an active subscriber. When the call is completed, the phone re-homes automatically to the backup Unified CM subscriber, and the agent is then required to log back into Unified CCE.
•
When Unified CM subscriber A recovers, all idle phones and gateways re-home to it. Active devices wait until they are idle before re-homing to the primary subscriber.
•
PG side B remains active, using the CTI Manager on Unified CM subscriber B.
•
After recovery from the failure, the PG does not fail back to the A side of the duplex pair. All CTI messaging is handled using the CTI Manager on Unified CM subscriber B, which communicates with Unified CM subscriber A to obtain phone state and call information.
Figure 3-17 Scenario 1 - Unified CM and CTI Manager Fail
Scenario 2: Agent PG Side A Fails
Figure 3-18 shows a failure on PG side A and a failover to PG side B. All CTI Manager and Unified CM services continue running normally. The following conditions apply to this scenario:
•
All phones and gateways are registered with Unified CM subscriber A.
•
All phones and gateways are configured to re-home to Unified CM subscriber B (that is, B is the backup server).
•
Unified CM subscribers A and B are each running a local instance of CTI Manager.
•
When PG side A fails, PG side B becomes active.
•
PG side B registers all dialed numbers and phones, and call processing continues. Phones and gateways stay registered and operational with Unified CM subscriber A; they do not fail-over.
•
Calls in progress at agents will stay in progress, but with no third-party call control (conference, transfer, and so forth) available from the agent desktop softphones. After an agent disconnects from all calls, that agent's desktop functionality is restored to the state it had prior to the failover.
•
When PG side A recovers, PG side B remains active and uses the CTI Manager on Unified CM subscriber B.
Figure 3-18 Scenario 2 - Agent PG Side A Fails
Scenario 3: Only the Primary Unified CM Subscriber Fails
Figure 3-19 shows a failure on Unified CM subscriber A. The CTI Manager services are running on Unified CM subscribers C and D, and the CTI Manager on Unified CM subscriber C is actively connected to the Unified CM Peripheral Gateway (PG) side A. However, all phones and gateways are registered with Unified CM subscriber A. All phones and devices will re-home individually to the backup Unified CM subscriber B. If the device is in use (a phone on a call), it will re-home to backup Unified CM subscriber B after it is no longer in use.
The following conditions apply to this scenario:
•
All phones and gateways are registered with Unified CM subscriber A.
•
All phones and gateways are configured to re-home to Unified CM subscriber B (that is, B is the backup server).
•
Unified CM subscribers C and D are each running a local instance of CTI Manager.
•
If Unified CM subscriber A fails, phones and gateways re-home to the backup Unified CM subscriber B.
•
PG side A remains connected and active, with a CTI Manager connection on Unified CM subscriber C. It does not fail-over because the JTAPI-to-CTI Manager connection has not failed. However, it will see the phones and devices being unregistered from Unified CM subscriber A (where they were registered) and will then be notified of these devices being re-registered on Unified CM subscriber B automatically. During the time that the agent phones are not registered, the PG will disable the agent desktops to prevent the agents from attempting to use the system while their phones are not actively registered with a Unified CM subscriber.
•
Call processing continues for any devices not registered to Unified CM subscriber A. Call processing also continues for those devices on subscriber A when they are re-registered with their backup subscriber.
•
Agents on active calls will stay in their connected state until they complete the call; however, the agent desktop will be disabled to prevent any conference, transfer, or other third-party call control during the failover. After the agent disconnects the active call, that agent's phone will re-register with the backup subscriber, and the agent will have to log in again manually using the agent desktop.
•
When Unified CM subscriber A recovers, phones and gateways re-home to it. This re-homing can be set up on Unified CM to gracefully return groups of phones and devices over time or to require manual intervention during a maintenance window to minimize the impact to the call center. During this re-homing process, the CTI Manager service will notify the Unified CCE Peripheral Gateway of the phones being unregistered from the backup Unified CM subscriber B and re-registered with the original Unified CM subscriber A.
•
Call processing continues normally after the phones and devices have returned to their original subscriber.
Figure 3-19 Scenario 3 - Only the Primary Unified CM Subscriber Fails
Scenario 4: Only the Unified CM CTI Manager Service Fails
Figure 3-20 shows a CTI Manager service failure on Unified CM subscriber C. The CTI Manager services are running on Unified CM subscribers C and D, and Unified CM subscriber C is the active CTI Manager connected to the Unified CCE Peripheral Gateway side A. However, all phones and gateways are registered with Unified CM subscriber A. During this failure, both the CTI Manager and the PG fail-over to their secondary sides. Because the JTAPI service on PG side B is already logged into the secondary (now primary) CTI Manager, the device registration and initialization time is significantly shorter than if the JTAPI service on PG side B had to log into the CTI Manager.
The following conditions apply to this scenario:
•
All phones and gateways are registered with Unified CM subscriber A.
•
All phones and gateways are configured to re-home to Unified CM subscriber B (that is, B is the backup server).
•
Unified CM subscribers C and D are each running a local instance of CTI Manager.
•
If the Unified CM CTI Manager service on subscriber C fails, the PG side A detects a failure of the CTI Manager service and induces a failover to PG side B.
•
PG side B registers all dialed numbers and phones with the Unified CM CTI Manager service on subscriber D, and call processing continues.
•
Calls in progress at agents will stay in progress, but with no third-party call control (conference, transfer, and so forth) available from the agent desktop softphones. After an agent disconnects from all calls, that agent's desktop functionality is restored to the state it had prior to the failover.
•
When the Unified CM CTI Manager service on subscriber C recovers, PG side B continues to be active and uses the CTI Manager service on Unified CM subscriber D.
Figure 3-20 Scenario 4 - Only the Unified CM CTI Manager Service Fails
Unified CCE Scenarios for Clustering over the WAN
Unified CCE can also be overlaid with the Unified CM design model for clustering over the WAN, which allows for high availability of Unified CM resources across multiple locations and data center locations. There are a number of specific design requirements for Unified CM to support this deployment model, and Unified CCE adds its own specific requirements and new failover considerations to the model.
Specific testing has been performed to identify the design requirements and failover scenarios, but no code changes were made to the core Unified CCE solution components to support this model. The success of this design model relies on specific network configuration and setup, and the network must be monitored and maintained. The component failure scenarios noted previously (see Unified ICM Failover Scenarios) are still valid in this model, and the additional failure scenarios for this model include:
•
Scenario 1: Unified ICM Central Controller or Peripheral Gateway Private Network Fails
•
Scenario 2: Visible Network Fails
•
Scenario 3: Visible and Private Networks Both Fail (Dual Failure)
•
Scenario 4: Unified MA Location WAN (Visible Network) Fails
Note
The terms public network and visible network are used interchangeably throughout this document.
Scenario 1: Unified ICM Central Controller or Peripheral Gateway Private Network Fails
In clustering over the WAN with Unified CCE, there should be an isolated private network connection between the geographically distributed Central Controller (Call Router/Logger) and the split Peripheral Gateway pair to maintain state and synchronization between the sides of the system.
To understand this scenario fully, a brief review of the ICM Fault Tolerant architecture is warranted. On each call router, there is a process known as the Message Delivery Service (MDS), which delivers messages to and from local processes such as router.exe and which handles synchronization of messages to both call routers. For example, if a route request comes from the carrier or any routing client to side A, MDS ensures that both call routers receive the request. MDS also handles the duplicate output messages.
The MDS process ensures that the duplex ICM sides run in a synchronized-execution, fault-tolerant manner. Both routers execute everything in lockstep, based on the input each router receives from MDS. Because of this synchronized execution method, the MDS processes must always be in communication with each other over the private network. They use TCP keep-alive messages generated every 100 ms to ensure the health of the redundant mate or the other side. Missing five consecutive TCP keep-alive messages indicates to Unified ICM that the link or the remote partner system might have failed.
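The keep-alive rule can be modeled with a simple counter. The following Python sketch is an illustration only and is not the MDS implementation; the class and function names are hypothetical. It uses the two values stated above (a 100 ms interval and five consecutive misses), and the resulting 500 ms figure is simple arithmetic from those values rather than a documented detection time.

```python
KEEPALIVE_INTERVAL_MS = 100   # keep-alive interval stated in this section
MISSED_LIMIT = 5              # consecutive misses that indicate a possible failure


class KeepAliveMonitor:
    """Counts consecutive missed keep-alive intervals for the peer side."""

    def __init__(self) -> None:
        self.missed = 0

    def tick(self, received: bool) -> bool:
        """Record one interval; return True once the peer is suspected failed."""
        # Any received keep-alive resets the count of consecutive misses.
        self.missed = 0 if received else self.missed + 1
        return self.missed >= MISSED_LIMIT


def detection_window_ms() -> int:
    """Time spanned by the consecutive misses needed to raise suspicion."""
    return KEEPALIVE_INTERVAL_MS * MISSED_LIMIT
```

Because a single received keep-alive resets the counter, only an unbroken run of five misses (about half a second of silence on the private network in this model) marks the link or the partner as possibly failed.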
When running duplexed ICM sides, as recommended for all production systems, one MDS will be the enabled synchronizer and will be in a paired-enabled state. Its partner will be the disabled synchronizer and is said to be paired-disabled. Whenever the sides are running synchronized, the side A MDS will be the enabled synchronizer in paired-enabled state. Its partner, side B, will be the disabled synchronizer in paired-disabled state. The enabled synchronizer sets the ordering of input messages to the router and also maintains the master clock for the ICM system.
If the private network fails between the Unified ICM Central Controllers, the following conditions apply:
•
The Call Routers detect the failure by missing five consecutive TCP keep-alive messages. The currently enabled side (side A in most cases) transitions to an isolated-enabled state and continues to function as long as it can communicate with at least half of the configured number of Peripheral Gateways (PGs). The disabled side (side B in this example) checks that it can communicate with a majority of the PGs, or half of the configured PGs plus one (half + 1). If the disabled side (B) cannot communicate with a majority (half + 1) of the PGs, then it transitions to an isolated-disabled state and side A runs in simplex mode. (Note that any PG with a connected DMP path, either active or idle, is counted in this majority calculation for both side A and side B.) If the B side does have device majority (can communicate with half + 1 of the configured PGs), then it transitions to a testing state and initiates the Test Other Side (TOS) procedure to determine if it should become enabled.
•
The disabled synchronizer (side B in this case) sends TOS messages through each PG in sequential order to Router A. One successful message from any PG, communicating that Router A is still enabled, will cause Router B to transition to the isolated-disabled state and make itself idle. Its mated Logger B will also idle itself. All the Peripheral Gateways realign their active data feed to the active Call Router over the visible network, with no failover or loss of service because the PGs do not disconnect when they realign.
•
If all PGs reply that side A is down during the TOS procedure, then side B promotes itself to isolated-enabled state and runs in simplex mode.
•
There is no impact to the agents, calls in progress, or calls in queue. The system can continue to function normally; however, the Call Routers will be in simplex mode until the private network link is restored.
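The Call Router decision path in the preceding list can be sketched in Python as follows. This is a minimal model only: the function names and the boolean TOS replies are hypothetical, "half + 1" is computed with integer division (an assumption for odd PG counts), and the behavior of the enabled side when it cannot reach at least half of the PGs is not modeled because the text does not describe it.

```python
from typing import Iterable


def has_device_majority(reachable_pgs: int, configured_pgs: int) -> bool:
    """Majority test used by the disabled side: half of the configured PGs plus one."""
    return reachable_pgs >= configured_pgs // 2 + 1


def enabled_side_can_continue(reachable_pgs: int, configured_pgs: int) -> bool:
    """The enabled side keeps running while it reaches at least half of the PGs."""
    return 2 * reachable_pgs >= configured_pgs


def disabled_side_state(reachable_pgs: int,
                        configured_pgs: int,
                        tos_reports_a_enabled: Iterable[bool]) -> str:
    """State the disabled side (B) settles in after a private network failure.

    tos_reports_a_enabled holds one hypothetical reply per PG, in the order
    the TOS messages are sent; True means the PG reports Router A as enabled.
    """
    if not has_device_majority(reachable_pgs, configured_pgs):
        return "isolated-disabled"
    # Side B is now in the testing state and runs the TOS procedure.
    for a_enabled in tos_reports_a_enabled:
        if a_enabled:
            # One successful report that Router A is enabled is sufficient.
            return "isolated-disabled"
    # Every PG replied that side A is down, so side B promotes itself.
    return "isolated-enabled"
```

For example, with four configured PGs, side B needs three reachable PGs before it starts the TOS procedure, whereas side A continues to run with only two.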
If the private network fails between the Unified CM Peripheral Gateways, the following conditions apply:
•
The Peripheral Gateway sides detect a failure if they miss five consecutive TCP keep-alive messages, and they follow a process similar to the call routers, leveraging the MDS process when handling a private link failure. As with the Central Controllers, one MDS process is the enabled synchronizer and its redundant side is the disabled synchronizer. When running redundant PGs, as is always recommended in production, the A side will always be the enabled synchronizer.
•
After detecting the failure, the disabled synchronizer (side B) initiates a test of its peer synchronizer via the TOS procedure on the Public or Visible Network connection. If PG side B receives a TOS response stating that the A side synchronizer is enabled or active, then the B side immediately goes out of service, leaving the A side to run in simplex mode until the Private Network connection is restored. The PIM, OPC, and CTI SVR processes become active on PG side A, if not already in that state, and the CTI OS Server process still remains active on both sides as long as the PG side B server is healthy. If the B side does not receive a message stating that the A side is enabled, then side B continues to run in simplex mode and the PIM, OPC, and CTI SVR processes become active on PG side B if not already in that state. This condition should occur only if the PG side A server is truly down or unreachable due to a double failure of visible and private network paths.
•
There is no impact to the agents, calls in progress, or calls in queue because the agents stay connected to their already established CTI OS Server process connection. The system can continue to function normally; however, the PGs will be in simplex mode until the private network link is restored.
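The Peripheral Gateway outcome described in the preceding list reduces to a single test on the TOS response, which the following Python sketch captures. The function name, dictionary keys, and boolean input are hypothetical and serve only to summarize the text; they are not part of any Unified ICM interface.

```python
def pg_private_link_outcome(tos_reports_a_enabled: bool) -> dict:
    """Summarize which PG side runs in simplex mode after a private link failure.

    tos_reports_a_enabled is True when PG side B receives a TOS response over
    the visible network stating that the side A synchronizer is enabled.
    """
    if tos_reports_a_enabled:
        # Side B goes out of service; PIM, OPC, and CTI SVR are active on side A.
        return {"simplex_side": "A", "pim_opc_ctisvr_active_on": "A"}
    # No confirmation that side A is enabled: side B continues in simplex mode.
    # Per the text, this should occur only if side A is truly down or unreachable.
    return {"simplex_side": "B", "pim_opc_ctisvr_active_on": "B"}
```

The sketch deliberately omits the CTI OS Server process, which the text says remains active on both sides as long as the PG side B server is healthy.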
If the two private network connections are combined into one link, the failures follow the same path; however, the system runs in simplex mode on both the Call Router and the Peripheral Gateway. If a second failure were to occur at that point, the system could lose some or all of the call routing and ACD functionality.
Scenario 2: Visible Network Fails
The visible network in this design model is the network path between the data center locations where the main system components (Unified CM subscribers, Peripheral Gateways, Unified IP IVR/Unified CVP components, and so forth) are located. This network carries all voice traffic (RTP streams and call control signaling), Unified ICM CTI (call control signaling) traffic, and all typical data network traffic between the sites. To meet the requirements of Unified CM clustering over the WAN, this link must be highly available with very low latency and sufficient bandwidth. This link is critical to the Unified CCE design because it is part of the fault-tolerant design of the system, so it must also be highly resilient:
•
The highly available (HA) WAN between the central sites must be fully redundant with no single point of failure. (For information regarding site-to-site redundancy options, refer to the WAN infrastructure and QoS design guides available at http://www.cisco.com/go/designzone.) In case of partial failure of the highly available WAN, the redundant link must be capable of handling the full central-site load with all QoS parameters. For more information, see the section on Bandwidth Requirements for Unified CCE Clustering Over the WAN, page 12-19.
•
A highly available (HA) WAN using point-to-point technology is best implemented across two separate carriers, but this is not necessary when using a ring technology.
If the visible network fails between the data center locations, the following conditions apply:
•
The Unified CM subscribers will detect the failure and continue to function locally, with no impact to local call processing and call control. However, any calls that were set up over this WAN link will fail with the link.
•
The Unified ICM Call Routers will detect the failure because the normal flow of TCP keep-alives from the remote Peripheral Gateways will stop. Likewise, the Peripheral Gateways will detect this failure by the loss of TCP keep-alives from the remote Call Routers. The Peripheral Gateways will automatically realign their data communications to the local Call Router, and the local Call Router will then use the private network to pass data to the Call Router on the other side to continue call processing. This does not cause a failover of the Peripheral Gateway or the Call Router.
•
Agents might be affected by this failure under the following circumstances:
–
If the agent desktop (Cisco Agent Desktop or CTI OS) is registered to the Peripheral Gateway on side A of the system but the physical phone is registered to side B of the Unified CM cluster.
Under normal circumstances, the phone events would be passed from side B to side A over the visible network via the CTI Manager Service to present these events to the side A Peripheral Gateway. The visible network failure will not force the IP phone to re-home to side A of the cluster, and the phone will remain operational on the isolated side B. The Peripheral Gateway will no longer be able to see this phone, and the agent will be logged out of Unified CCE automatically because the system can no longer direct calls to the agent's phone.
–
If the agent desktop (Cisco Agent Desktop or CTI OS) and IP phone are both registered to side A of the Peripheral Gateway and Unified CM, but the phone is reset and it re-registers to a Unified CM subscriber on side B.
If the IP phone re-homes or is manually reset and forced to register to a Unified CM subscriber on side B, the Unified CM subscriber on side A that is providing the CTI Manager service to the local Peripheral Gateway will unregister the phone and remove it from service. Because the visible network is down, the remote Unified CM subscriber at side B cannot send the phone registration event to the remote Peripheral Gateway. Unified CCE will log out this agent because it can no longer control the phone for the agent.
–
If the agent desktop (CTI toolkit Agent Desktop or Cisco Agent Desktop) is registered to the CTI OS Server at the side-B site but the active Peripheral Gateway side is at the side-A site.