GrayWulf: Scalable Clustered Architecture for Data Intensive Computing

Bibliographic Details
Main Authors:	Szalay, A.S.; Bell, G.; Vandenberg, J.; Wonders, A.; Burns, R.; Fay, D.; Heasley, J.; Hey, T.; Nieto-Santisteban, M.; Thakar, A.; van Ingen, C.; Wilton, R.
Format:	Conference Proceeding
Language:	eng
Subjects:	Application software; Central Processing Unit; Cloud computing; Computer architecture; Data analysis; Grid computing; Hardware; High performance computing; Supercomputers; Workstations

Description
Summary:	Data-intensive computing presents a significant challenge for traditional supercomputing architectures that maximize FLOPS, since CPU speed has surpassed the I/O capabilities of HPC systems and Beowulf clusters. We present the architecture of GrayWulf, a three-tier commodity-component cluster designed for a range of data-intensive computations operating on petascale data sets. The design goal is a system balanced in terms of I/O performance and memory size, according to Amdahl's laws. The hardware currently installed at JHU exceeds one petabyte of storage and has 0.5 bytes/sec of I/O and 1 byte of memory for each CPU cycle. GrayWulf provides almost an order of magnitude better balance than existing systems. The paper covers its architecture and reference applications. The software design is presented in a companion paper.
ISSN:	1530-1605, 2572-6862