The HPC²'s high performance computing needs date back to its inception, but it was not until June 1996 that it made its first appearance on the TOP500 Supercomputer Sites list. The TOP500 list is published in June and November of each year by the University of Tennessee, the University of Mannheim, and the Department of Energy's National Energy Research Scientific Computing Center (NERSC), and it ranks the 500 most powerful computers in the world based on their performance on the LINPACK benchmark. This benchmark solves a dense system of linear equations and was chosen because it simulates the kinds of computations that many systems of this size perform in the real world.
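As a rough illustration of the computation being timed (this is not the actual HPL benchmark code used for TOP500 submissions), the small C sketch below solves a dense system A·x = b by Gaussian elimination with partial pivoting; the benchmark's reported rate is essentially its floating-point operation count (roughly 2n³/3 for an n-by-n system) divided by the wall-clock solve time.

/*
 * Minimal sketch of the kind of dense linear solve LINPACK measures:
 * A*x = b via Gaussian elimination with partial pivoting.
 * Illustrative only; real TOP500 runs use the heavily optimized,
 * distributed HPL code on problem sizes in the hundreds of thousands.
 */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define N 4  /* tiny problem size, for illustration only */

int main(void) {
    double A[N][N], b[N], x[N];

    /* Fill A and b with pseudo-random values. */
    srand(42);
    for (int i = 0; i < N; i++) {
        b[i] = (double)rand() / RAND_MAX;
        for (int j = 0; j < N; j++)
            A[i][j] = (double)rand() / RAND_MAX;
    }

    /* Forward elimination with partial pivoting. */
    for (int k = 0; k < N; k++) {
        int piv = k;
        for (int i = k + 1; i < N; i++)
            if (fabs(A[i][k]) > fabs(A[piv][k])) piv = i;
        for (int j = 0; j < N; j++) {       /* swap rows k and piv */
            double t = A[k][j]; A[k][j] = A[piv][j]; A[piv][j] = t;
        }
        double tb = b[k]; b[k] = b[piv]; b[piv] = tb;

        for (int i = k + 1; i < N; i++) {
            double m = A[i][k] / A[k][k];
            for (int j = k; j < N; j++) A[i][j] -= m * A[k][j];
            b[i] -= m * b[k];
        }
    }

    /* Back substitution. */
    for (int i = N - 1; i >= 0; i--) {
        x[i] = b[i];
        for (int j = i + 1; j < N; j++) x[i] -= A[i][j] * x[j];
        x[i] /= A[i][i];
    }

    for (int i = 0; i < N; i++) printf("x[%d] = %f\n", i, x[i]);
    return 0;
}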
In the June 1996 TOP500 list, the HPC² appeared for the first time with its new SGI CHALLENGE computer system, capable of 4.32 gigaflops (billion calculations per second) on the LINPACK benchmark. It debuted at 359th overall and 33rd among academic institutions in the United States.
On the 27 lists published between June 1996 and November 2019, the HPC² was represented 16 times. Since that first appearance, the measured LINPACK performance of the HPC²'s largest computer system has grown from a mere 4.32 billion calculations per second to 3.666 quadrillion calculations per second.
As of November 2019, the HPC² has averaged 254th place overall on the lists in which it has appeared, reaching as high as 60th overall (November 2019) and 4th among U.S. academic institutions (June 2019).
The Message Passing Interface (MPI) is a library specification that defines a communications protocol used by a group of processors performing parallel calculations. The original standard was defined by a group of vendors, laboratories, and universities (including Mississippi State University); work began in April 1992 and culminated in the publication of MPI Version 1.0 in May 1994.
Shortly thereafter, a project began to develop a portable implementation of the MPI standard that would run on almost any type of computer. The project was called "MPICH", where the "CH" stood for "chameleon", a creature known for its ability to adapt to its environment. The first paper on this project was "A High-Performance, Portable Implementation of the MPI Message Passing Interface Standard", authored by William Gropp and Ewing Lusk of the Mathematics and Computer Science Division of Argonne National Laboratory and by Nathan Doss and Anthony Skjellum of the HPC² and the Department of Computer Science at Mississippi State University.
Most of the original MPICH implementation was developed by researchers at Argonne National Laboratory and at the HPC² at Mississippi State University, and MPICH is still sometimes referred to as the "MSU/ANL MPI Implementation".
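To give a sense of the programming model the standard defines (this is a generic sketch, not code from the MPICH sources), a minimal MPI program in C passes a message between two processes as shown below; it should build with any MPI implementation, for example MPICH's mpicc, and run with mpiexec -n 2.

/* Minimal message-passing example using the MPI API.
 * Build and run (assuming an MPICH-style toolchain):
 *   mpicc hello_mpi.c -o hello_mpi && mpiexec -n 2 ./hello_mpi
 */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Status status;

    MPI_Init(&argc, &argv);               /* start the MPI runtime          */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's id (0..size-1)  */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes      */

    if (size < 2) {
        if (rank == 0) printf("Run this example with at least 2 processes.\n");
    } else if (rank == 0) {
        char msg[] = "hello from rank 0";
        /* send the message buffer to rank 1 with tag 0 */
        MPI_Send(msg, (int)strlen(msg) + 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        char buf[64];
        /* receive the message from rank 0 with tag 0 */
        MPI_Recv(buf, (int)sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
        printf("rank 1 received: \"%s\"\n", buf);
    }

    MPI_Finalize();                       /* shut down the MPI runtime      */
    return 0;
}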
Myricom, Inc. was founded in 1994. In that first year, researchers at the HPC² and the Department of Computer Science began beta testing and software development for some of Myricom's earliest Myrinet products. HPC² researchers Gregory Henley, Thomas McMahon, Anthony Skjellum, and Nathan Doss helped develop the first high-performance Myrinet drivers for MPI on Sun/Solaris systems. This research led to the development of the "BullDog Myrinet Control Program" (BDM), which provided workstation host stack and NIC board-level software for managing the Myrinet interface hardware and interacting with the host-level software through a set of shared queues.
Many people think that cluster computing originated with Thomas Sterling and Donald Becker's work on the Beowulf Project in 1994. This project is certainly one of the most important events in the history of cluster computing. Its use of the Linux operating system on inexpensive PCs revolutionized the high performance computing community and created a whole new class of systems (known as Beowulf clusters). However, it was not the first time that a cluster, or what had often previously been referred to as a "multicomputer", had been built. Many others, including Mississippi State University, had been working on the subject for several years before that event. What follows is a brief description of the history of cluster computing research at Mississippi State University.
Mississippi State University has been involved in what is now called cluster computing at least since 1987. In that year, DARPA funded an MSU project called MADEM (Mapped Array Differential Equation Machine). MADEM was a distributed memory MIMD system based on the Sun 4/110 workstation.
By 1992, research had moved to an 8-node system based on SPARCstation 2 workstations interconnected with communications cards developed by MSU researchers. Included in this system were custom-built performance monitoring capabilities as well as a midplane with wormhole router chips. This system was known as the MSPARC/8, a name indicating that it was the second generation of the MADEM system, that it was based on the new Sun SPARC architecture, and that it had 8 processors. As with the MADEM system, the MSPARC/8 had motherboards that were removed from their original chassis and mounted in a custom chassis with direct interconnects to the midplane.
In June of 1993, the first components were purchased for what would be known as the SuperMSPARC, the third generation of this project. The SuperMSPARC comprises 8 Sun SPARCstation 10 workstations. Each node has four 90MHz HyperSPARC processor modules and 288 MB of RAM. Sun had originally intended to release a quad-processor SuperSPARC-based SPARCstation 10, but due to heat issues eventually released the quad-processor configuration with HyperSPARC processors instead. Unfortunately, the project had already been named SuperMSPARC by that time. The nodes have been interconnected via the built-in 10Mb/s ethernet, 155Mb/s (OC3) ATM, and Myrinet. The system also has a custom-built midplane and SBUS cards used for monitoring interprocess communications. Unlike its predecessors, the SuperMSPARC nodes were left in their original chassis and connected via cables from their SBUS ports to the custom midplane. This project has been so successful that, as of June 2002, nine years after its construction began, it is still in service as a tool for teaching parallel computing techniques.
In December 1999, the fourth generation of this project began. The UltraMSPARC is a 16-node system. Each node has four 400 MHz UltraSPARC II processors and 2 GB of RAM. The nodes are connected via Myrinet as well as 100Mb/s ethernet. The research continues with this system by using custom-built Global Positioning System (GPS) cards in the nodes to synchronize their system clocks very accurately with similar systems in a remote location. MSU is now experimenting with clustering techniques in which the physical location of the nodes no longer matters. Unlike previous generations, the UltraMSPARC was designed from the outset to be primarily a production-level system; the clustering research on this system is secondary to its main function as a center-wide computational resource.
It was due to the experience gained through more than a decade of cluster computing that the MSU Engineering Research Center embarked on the large-scale production system that became known as EMPIRE (ERC's Massively Parallel Initiative for Research and Engineering). EMPIRE is currently a 1038-processor (519-node) cluster based on Intel Pentium III processors running the Linux operating system. Each node contains dual Pentium III processors running at either 1GHz or 1.266GHz, along with 1GB of RAM. It is the first cluster built at the HPC² based on the Intel/Linux architecture instead of the Sun/SunOS/Solaris architecture. EMPIRE is built primarily with IBM eServer x330 rackmountable systems connected via 100Mb/s ethernet, with interswitch communications via gigabit ethernet.
In October 2003, construction began on "Maverick". This system is a 384-processor (192-node) cluster based on 3.06GHz Intel Xeon processors running the Linux operating system. Each node has 2.5GB of RAM. The system is constructed with IBM eServer x335 servers and is unusual in that all compute nodes are diskless. For communications it uses InfiniBand technology provided by Voltaire, Inc., which delivers 10Gb/s of low-latency data throughput to each node. The cluster uses the PXE network-boot protocol to start each node across a supplemental ethernet network: at boot time each node downloads a Linux kernel and a ramdisk image of its operating system, boots that kernel, and loads its entire operating system into memory. At the time of its construction, it was the third-largest InfiniBand cluster in the world and the only large diskless InfiniBand cluster known to exist.
Maverick was decommissioned in November 2013.
In October 2006, "Raptor" was installed. This system is a 2048-processor cluster based on 512 Sun Microsystems Sun Fire X2200 M2 servers, each with two dual-core AMD Opteron 2218 processors (2.6GHz) and 8GB of memory. Once again, all compute nodes are diskless; they are connected with gigabit ethernet between the 32 nodes in each rack and 10-gigabit ethernet between each of the 16 racks. This system has a peak performance of more than 10.6 trillion calculations per second.
Raptor was decommissioned in December 2017.
In 2010, "Talon" was brought online. Talon is a 3072 core cluster composed of 256 IBM DataPlex Nodes. Each node has
two six-core Intel Westmere processors with a clock rate of 2.8GHz. Each node contained 24 GB of memory for a system total of 6TB.
Talon's interconnect utilizes a Voltaire quad data-rate Infiniband rated at 40 Gb/s. Talon utilizes a rear door
water cooling system that is thermally neutral or positive in air temperature. The system's peak performance
is 34.4 teraFLOPS (trillion calculations per second), and debuted as the 331st entry on the June 2010 Top500. Talon
was also named the ninth overall most energy efficient system in the June 2010 Green500.
Talon was decommissioned in April 2019.
"Shadow" came online in 2013. Shadow is a Cray CS300-LC cluster with 4,800 Intel Ivy Bridge processor cores and 28,800 Intel Xeon Phi cores. Each node has
either 512 GB of RAM (45%), 128 GB of RAM (45%) or 64 GB of RAM (10%) for a system total of 70 TB of RAM. Furthermore, each
node is interconnected via FDR InfiniBand (56 Gb/s). Shadow has a peak performance of 593 teraFLOPS (trillion calculations
per second). Shadow is the first production system of its kind and uses an innovative cooling system called direct,
warm-water cooling, which allows the system to be cooled with water as warm as 104 degrees Fahrenheit (40 degrees Celsius).
On the June 2015 TOP500 Supercomputer Sites list, Shadow was ranked as the 143rd fastest computer in the world and the
11th most powerful computer at any academic site in the United States. It was also the 16th most energy efficient
supercomputer in the world according to the June 2015 Green500 list.
Shadow was decommissioned in June 2024.
Commissioned in 2019, "Scout" is a Dell C8220 cluster with 2,688 Intel Sandy Bridge processor cores. Each of the 168 nodes contains dual Xeon processors (E5-2680, 2.7GHz base, 3.5GHz turbo, 8 cores each) and 32 GB of memory, for a total system memory of 5 TB. The nodes are interconnected via FDR InfiniBand (56 Gb/s), and 8 of them each include one Nvidia K20 GPU.
Scout was decommissioned in April 2024.