Story:
Creating the Nation's Best Research Resource Through Improved Management
December 23, 2011 at 12:35pm
Summary
Purdue University has built five Top500.org-ranked supercomputers in the past four years, not by receiving millions of dollars in additional funding, but by improving the way it manages its research funding. The Community Clusters Program provides incentives for researchers to pool their funding to create a shared resource.
Context
Purdue University is a large, publicly funded research university in West Lafayette, Indiana. It is famous for excellence in science and engineering, and is known as the "mother of astronauts" for its many space-traveling alumni, including Neil Armstrong. Science and engineering researchers are in constant search of more computing cycles on large, and expensive, supercomputers. This is a national bottleneck in our race to make discoveries in areas as important to us as cancer research, developing new energy sources, and understanding global climate change. With public funding of universities in steep decline, and federal research funding flat or declining, these large supercomputers are an expense most universities simply cannot afford, further limiting research.
Triggers
Often these research groups are led by a prominent researcher who may receive millions of dollars in research funds. At universities, including Purdue, these researchers operate to a certain extent as independent contractors. They control how their funding is spent, and if they take a job at another university much of their funding travels with them. As independent teams, these research groups commonly purchase small high performance computers, or build small cluster supercomputers, and install them in their labs to conduct their research. This is an inefficient system: each researcher must learn how to operate and maintain these high performance computers, and must also install the supporting infrastructure, such as the additional electricity and cooling they require. Also, even in the most productive research groups, these high-priced computers are often in use less than half of the time.
Key Innovations & Timeline
Four years ago Purdue offered its researchers a new option: they could purchase portions of a supercomputer the university planned to build. It was something of a time-share arrangement. The nodes they purchased would be available to them, but when they were not using those nodes the cycles would be made available to others. Likewise, if they had a computational run that required more cycles than they had contracted for, they could make use of any available cycles. This increased the flexibility of the resources available to them. The researchers would give up some amount of choice and control: the central IT organization would determine the type and brand of supercomputer being purchased. But there were many benefits to the research teams. By joining this bulk purchase, the researchers would pay half as much per node as if they were buying it themselves. It also freed the researchers from having to install and maintain their own systems. Finally, the university offered additional incentives: it would pay for the accompanying data storage systems as well as the racks, networking, and other infrastructure needed for the nodes. The university was able to do this because of the large energy savings that would result from consolidating dozens of small, inefficient datacenters scattered across campus into one large, modern datacenter.
Challenges & Solutions
The Community Cluster Program at Purdue faced several challenges, most associated with trust. Researchers had to believe they could turn over a mission-critical portion of their work to a large, centralized organization and not experience downtime, lost data, or other problems. Also, installing a large supercomputer often takes weeks or even months, during which time researchers committed to the machine would not be able to work. To counter both of these concerns, the central IT organization, ITaP, announced that it would build the first supercomputer, named Steele, in just one day. This would make the downtime issue essentially moot and demonstrate competence. ITaP carefully planned how to execute the build, and the effort generated excitement on campus and elsewhere. Nearly everyone with technical skill on campus pitched in to help, and Purdue's athletic rival Indiana University even sent a team of IT staff to lend a hand. The first machine was not only built in a day, it was up and running science before lunchtime, an event that became national news. After the excitement of the first day, ITaP had to demonstrate excellence in both operations and customer service. Each year additional researchers have purchased shares in the Community Cluster Program, and each year the supercomputer has been built in a day as if it were routine.
Benefits & Metrics
Researchers at Purdue have benefited from participating in the Community Cluster Program. They are able to buy twice as much computational capacity as before, and the program frees them from having to operate and maintain their own research computers and datacenters. They also have access to supercomputers so fast and powerful that each of the five machines has been internationally ranked on the industry's Top500.org list. The computational capacity at Purdue has increased seven-fold since the first supercomputer was built four years ago. The most recent supercomputer, Carter, is the fastest campus supercomputer in the United States. Because of its speed and groundbreaking technology, certain types of important research that either could not be done on a campus machine (requiring use of national supercomputer center resources) or could not be done at all are now running on Carter. One example is the statistical analysis being used to identify cancer stem cells, the cells that allow a tumor to grow. Identifying these cells will allow researchers to develop new therapies that target just these cells instead of entire tumors or organs. More than 100 research teams at Purdue have purchased nodes in the Community Cluster Program, and after four years not a single team has withdrawn and returned to running its own datacenter. The university determined that one research team that recently joined the program saved the university $16,500 in electricity costs alone by closing its own datacenter and joining the central program. The overall result is that Purdue now has by far the largest campus computational resource without spending additional funds.
Lessons
There have been many lessons learned or reinforced from the Community Cluster Program. Some are prosaic but no less critical: well-constructed strategic plans can provide large benefits; teamwork and customer service are critical to success; and an effective community benefits each of its individual members. But the unexpected benefit is exemplified in Carter, Purdue's latest supercomputer. After ITaP built its first three supercomputers in less than a day each, and managed the nation's largest campus research computing resource, Purdue was approached by Intel and HP, which asked if Purdue would be interested in being the first to build a new supercomputer using unreleased and unannounced technology, including new processors. The result was Carter, which was built in October 2011 using Intel Sandy Bridge processors that will not be generally available until mid-2012. The lessons here are that you have to be prepared for opportunity to arise, and that successful people and organizations like to partner with others who are likewise national leaders.
Credits
Gerry McCartney, vice president for information technology and CIO at Purdue University, together with John Campbell, associate vice president for academic technologies, and the staff at Purdue's Rosen Center for Advanced Computing, developed the Community Cluster Program. The program is part of Purdue University President France Córdova's five-year strategic plan for the university, implemented in 2007. Purdue also received assistance and support from many high performance computing corporations, including Intel, HP, Dell, Mellanox, CoolCentric, and many others.
Materials
Purdue Community Cluster Program:
Campus Technology magazine: Purdue's Community Cluster Program:
IEEE research paper: Community cluster costs compared to cloud computing costs:
MarketWatch: Purdue builds nation's fastest campus supercomputer:
Inside HPC: "Purdue's HPC funding model receives award, model for other universities":
Downloadable photo of Carter supercomputer:
Purdue "installation Day" promotional video:
Purdue Coates supercomputer installation photo gallery: