Clustering and High-Performance Computing Solution

Stanford University Advances High-Performance Computing

Engineering school speeds interprocessor communications with Cisco® Server Fabric Switches on HPC research platforms.

EXECUTIVE SUMMARY

STANFORD UNIVERSITY HIGH PERFORMANCE COMPUTING CENTER

• Industry: Research, Higher Education

• Location: Palo Alto, California

BUSINESS CHALLENGE

• Consolidate and advance campuswide HPC facilities

• Choose cost-efficient components for cluster computing that optimize replication, simplify support, and provide researchers with superior platforms compared to dedicated systems

SOLUTION

• Dual-speed InfiniBand 4X interfaces with 20-Gbps bandwidth per port, 96 ports per switch

• Full compatibility with open-source standards and software as well as commercial applications

• Fast setup and deployment of new clusters

BUSINESS RESULTS

• Efficient and scalable foundation for high-performance CFD simulation codes

• Highly reliable environment that avoids application restarts, reducing downtime

• Premier facility for research teams doing first-of-a-kind simulations as well as academic courses for credit

Business Challenge

Stanford University, a premier research and education institution, recently introduced the on-campus High Performance Computing Center (HPCC) to support sponsored research efforts and credit-based courses within the School of Engineering. The center provides a focus for the previously independent high-performance computing (HPC) efforts of many different research and academic teams, and it has already become a leading center for large-scale simulations of computational fluid dynamics (CFD) and other engineering problems that require massively parallel computing resources. In particular, the HPCC provides the expertise and resources to give researchers and faculty access to the latest advances in:

• HPC systems: By consolidating resources, the HPCC now offers numerous HPC platforms with varied architectures. These HPC systems enable larger simulations and analyses and significantly faster processing than was possible with computers dedicated to individual researchers.

• Advanced scientific visualization: Computing systems with high-performance graphics hardware and leading-edge desktop and large-venue display capabilities enable large-scale data analysis and promote discovery.

• Massive data storage systems: The vast quantities of data from simulations on HPC systems must be stored and be accessible at speeds that complement real-time data visualization.

The mission of the HPCC challenges a small staff to efficiently deliver a broad range of services and carry out the work required to introduce the latest HPC technology to the campus. "To keep up with our user base requirements, we needed to derive a standard cluster configuration - servers, memory, I/O, and processor interconnects," says Steve Jones, HPC manager for the flow physics and computational engineering group and founder of the HPCC. "The goal was to evaluate and choose the best price-performance options for each key cluster component and establish a reproducible best practice for rapid deployment."

Network Solutions

Processor interconnections - the server fabric within each cluster - are an important factor in the overall scalability of applications. The HPCC clusters, while impressive HPC resources in their own right, provide a development environment for codes and applications that are destined for much larger clusters at the national laboratories affiliated with the CFD research community. For example, when a research team is allocated a precious week of processing time on a 65,000-processor Department of Energy Advanced Simulation and Computing (ASC) supercomputer, it is essential that codes be thoroughly tested in advance.

In the search for the best server fabric, the HPCC team defined the requirements to include:

• A single chassis solution capable of enabling a full nonblocking server fabric

• Support for application loads consisting of CFD codes including Stanford University multiblock (SUmb) and CDP (named for the late Charles David Pierce) codes as well as commercially available applications

• Low-latency processor interconnects

• The ability to scale from 48-node to 96-node clusters, with systems configured with up to 1000 processors, without processor interconnect performance becoming a bottleneck within the clusters

• An open, proven design that simplifies code portability when moving applications to larger national computing centers, such as those affiliated with the Department of Energy's Advanced Simulation and Computing program

"We were looking for an interconnect based on InfiniBand technology, but it wasn't just about finding the best hardware component," Jones says. "We wanted a complete solution including the message-passing layer - a solid hardware and software combination. To maximize our code porting within the research community and leverage the cluster findings from other research teams, we were also looking for a solution based on the Message Passing Interface (MPI) standard. Many companies offer proprietary implementations, but our goals for an open environment led us to look for a vendor with knowledge about the OSU MVAPICH implementation of the MPI standard. Working relationships between our supplying vendors were also desirable since we are a test bed for new HPC applications and want a collaborative relationship with our vendors."

At the end of its search for a server fabric, the HPCC team found that the vendor with the most expertise was Cisco®. The search included stringent benchmarking to determine the overall price-performance of the top-ranked choices, measuring both timed interprocessor transactions and actual application iterations. The Cisco SFS 7008 Server Fabric Switch was the clear winner of the benchmarking phase, and it has already been deployed in several on-campus clusters.

The Cisco SFS 7008 supports dual-speed InfiniBand 4X interfaces: double data rate (DDR) at 20 Gbps per port and single data rate (SDR) at 10 Gbps per port. Its nonblocking cross-sectional bandwidth and low port-to-port latency enable the creation of high-performance server fabrics within large-scale clusters. The platform provides 96 InfiniBand 10-Gbps 4X ports for server and interswitch connectivity along with superior reliability, high availability, and serviceability.
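
The per-port figures above follow from how InfiniBand links are built. As background that the case study does not spell out, a 4X link aggregates four lanes, each lane signals at 2.5 Gbps (SDR) or 5 Gbps (DDR), and 8b/10b encoding leaves 80 percent of the signaling rate for data. The following Python sketch works through that arithmetic for a 96-port switch. The function names are illustrative, and treating cross-sectional bandwidth as half the ports multiplied by the per-port rate, per direction, is an assumed definition for a nonblocking fabric rather than a published Cisco specification.

```python
# Back-of-the-envelope InfiniBand bandwidth arithmetic (illustrative sketch).

# Per-lane signaling rates in Gbps for the two speeds named in the text.
LANE_SIGNAL_GBPS = {"SDR": 2.5, "DDR": 5.0}

LANES_4X = 4            # a 4X link aggregates four lanes
PORTS_PER_SWITCH = 96   # port count quoted in the case study


def port_signal_gbps(rate: str, lanes: int = LANES_4X) -> float:
    """Raw signaling rate of one port, per direction."""
    return LANE_SIGNAL_GBPS[rate] * lanes


def port_data_gbps(rate: str, lanes: int = LANES_4X) -> float:
    """Usable data rate of one port after 8b/10b encoding (8 data bits per 10 sent)."""
    return port_signal_gbps(rate, lanes) * 8 / 10


def bisection_gbps(rate: str, ports: int = PORTS_PER_SWITCH) -> float:
    """Signaling bandwidth across an even split of the ports, per direction.

    Assumption: in a nonblocking fabric, every port in one half can talk to a
    port in the other half at full rate at the same time.
    """
    return (ports // 2) * port_signal_gbps(rate)


if __name__ == "__main__":
    for rate in ("SDR", "DDR"):
        print(
            f"{rate}: {port_signal_gbps(rate):.0f} Gbps signaling, "
            f"{port_data_gbps(rate):.0f} Gbps data per port; "
            f"{bisection_gbps(rate):.0f} Gbps across the fabric bisection"
        )
```

Run as a script, this reports 10 Gbps and 20 Gbps of signaling per port for SDR and DDR, which matches the figures quoted for the switch. The 8-Gbps and 16-Gbps data rates are upper bounds on what an application can move per port before any protocol overhead.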

Business Results

The cluster technology evaluation efforts have paid off for the HPCC and its prestigious base of researchers. "By finding an optimal combination of foundational cluster technologies, we've been able to refine the art of cluster deployment and operation," says Jones. "We can bring up a new 96-processor system in less than a day. The Cisco SFS plays an important role in our replicable compute model and has helped us achieve very scalable results with CFD and other simulation codes. Just as important, we've had no failures that require restarts of applications. When codes can take more than a week to run, it's imperative that we provide cluster solutions with the best possible sustained uptimes."

Other benefits of the Cisco SFS solution include:

• Ease of installation and management (all clusters are supported by a small team)

• Clean drivers and software stack that further simplify support efforts

• Highly skilled Cisco engineers with in-depth understanding of HPC paradigms

The Cisco SFS solution also integrates easily into the open, standards-based compute environment of the HPCC, which includes Linux-based servers and Rocks cluster management software.

The establishment of the HPCC and its rapid rise to become a leading simulation facility for CFD and similar research have given the Stanford community and its partners the ability to realistically and affordably scale HPC clusters in support of programs that require massively parallel computing resources. The newest clusters, with Cisco SFS processor interconnect solutions, are enabling first-of-a-kind simulations for the study of structural dynamics, contact problems, nonlinear aeroelasticity of fighter aircraft, fluid-structure interaction, underwater acoustics, inverse problems, and shape optimization.

PRODUCT LIST

High-Performance Computing

• Cisco SFS 7008 Server Fabric Switch

The low-latency clusters provide superior performance for the center's flagship CFD codes, CDP and SUmb. CDP is an unstructured large eddy simulation (LES) code with multiphysics capabilities for computing high-fidelity turbulent reacting multiphase flows. Its low-dissipation numerics are critical when important flow structures persist for a relatively long time, such as the trailing vortices generated by helicopter blades. SUmb is a block-structured Reynolds-averaged Navier-Stokes (RANS) code that is currently being used to study compressible flow related to jet engines, aircraft, helicopters, and space-going vehicles. Both codes have been run on thousands of processors and can support scalable simulation on large parallel computers such as BlueGene/L, which includes as many as 130,000 processors.

Next Steps

"The InfiniBand interconnect technology introduces an opportunity to improve other connections within our clusters, in addition to interprocessor connections," Jones says. "Storage systems could also benefit from the high-speed, low-latency characteristics of an InfiniBand fabric. We plan to further explore InfiniBand switching solutions to continue to push cluster platforms beyond current capacities and capabilities."

For More Information

To find out more about Cisco HPC solutions, go to: http://www.cisco.com/go/hpc.