The Experience of Building a Scalable Supercomputer

This week, the TOP500 project ranked the system at Taiwan’s National Center for High Performance Computing #42 on its June 2011 list of supercomputers, which is published twice a year. The system was supplied by Acer together with its technology partners AMD, DataDirect Networks, QLogic, and Platform Computing. Platform Computing provided the management software and MPI libraries for the system, as well as the services to deploy them.


During system installation and configuration, several areas demonstrated the advantages of partnering with Platform Computing:


(1) Management software: Platform HPC was chosen to manage the system. The scalability and maturity of its software components simplified the installation and configuration of the management software layer. Both the workload scheduler (based on Platform LSF) and the MPI library (Platform MPI) scale effortlessly on the system.


(2) MPI expertise: Achieving the best possible Linpack results depends on well-optimized MPI performance. During the installation and configuration stage, the Platform MPI development team provided numerous best practices to help maximize the benchmark results, from checking cluster health to tuning MPI performance. They collaborated closely with developers from QLogic, which provided the InfiniBand interconnects.


(3) Dynamic zoning: The system will be used by multiple research user groups, each with its own workload management instance. The size of each workload management zone will change from time to time, based on the workload of its user group. Each zone has its own user account management system and scheduling policies. Platform HPC was set up to manage such dynamic configuration changes easily.


The maturity of Platform HPC, as well as the expertise of Platform Computing’s development and services teams, played a key role in ensuring the success of this Acer project. The performance and stability of the benchmarking runs enabled the results to be submitted in time for the June TOP500 list. But most importantly, when the system is in the hands of hundreds of users in production, the robustness of the workload management, the performance of MPI, and the support from the experts who built the software will make a difference in the quality of service delivered by this top Taiwanese supercomputer.
