GSRC posters authored by Wen-mei Hwu

   FCUDA: Compilation of CUDA kernels onto FPGA
Pub ID:  1976 Authors:  Alex Papakonstantinou, Karthik Gururaj, John Stratton, Deming Chen, Jason Cong, Wen‑mei Hwu
As growing power dissipation and thermal effects disrupted the rising clock-frequency trend and threatened to annul Moore’s law, the computing industry turned to parallel processing as its route to higher performance. The rise of multi-core systems in all domains of computing has opened the door to heterogeneous multi-processors, in which processors with different compute characteristics can be combined to boost the performance per watt of different application kernels. GPUs and FPGAs are becoming very popular in PC-based heterogeneous systems for speeding up compute-intensive kernels of scientific, imaging and simulation applications. GPUs can execute hundreds of concurrent threads, while FPGAs provide customized concurrency for highly parallel kernels. However, exploiting the parallelism available in these applications is currently not a push-button task. Often the programmer has to expose the application’s fine- and coarse-grained parallelism by using special APIs. CUDA is one such parallel-computing API; it is driven by the GPU industry and is gaining significant popularity. In this work, we adapt the CUDA programming model into a new FPGA design flow called FCUDA, which efficiently maps the coarse- and fine-grained parallelism exposed in CUDA onto the reconfigurable fabric. Our CUDA-to-FPGA flow employs AutoPilot, an advanced high-level synthesis tool that enables FPGA programming at a high level of abstraction. FCUDA is based on source-to-source compilation that transforms the SPMD CUDA thread blocks into parallel C code for AutoPilot. Our experimental results demonstrate the highly competitive performance of the resulting customized FPGA multi-core accelerators.
Sep 3, 2009,   GSRC Annual Symposium 2009

   ADAPT: An Automated Doubly Adaptive Performance Modeling Tool for GPU Architectures 

Pub ID:  2029 Authors:  Sara Baghsorkhi, Wen‑mei Hwu
We propose an analytical model to predict the performance of general-purpose applications on a GPU architecture. The model is designed to provide performance information to an automated optimizing compiler and can also be incorporated into a tool that helps programmers better assess the performance bottlenecks in their code. We have validated the model on NVIDIA GPUs using data-parallel benchmarks that stress different GPU microarchitectural events, such as uncoalesced memory accesses, scratch-pad memory bank conflicts and control-flow divergence. The model adapts to both hardware parameters and kernel input parameters.
Sep 3, 2009,   GSRC Annual Symposium 2009

   Portable Parallel Programming with MCUDA
Pub ID:  1417 Authors:  John Stratton, Wen‑mei Hwu
For programmers targeting modern parallel architectures and accelerators, it is risky to limit an application to a specific vendor or architecture, and costly to write for multiple platforms. We introduce MCUDA, the anchor of a tool chain which translates a program in a data-parallel programming model to execute efficiently on a wide variety of modern architectures. MCUDA generates source code for multiple granularities of instruction- and thread-level parallelism targeted for specific architectures, achieving portable and scalable parallel performance.
Sep 29, 2008,   GSRC Annual Symposium 2008

   Application studies on GPU
Pub ID:  339 Authors:  Shane Ryoo, Christopher Rodrigues, Sam Stone, Sara Sadeghi Baghsorkhi, Sain‑Zee Ueng, Stephanie C. Tsao, Wen‑mei Hwu
Through studying applications on NVIDIA's Compute Unified Device Architecture (CUDA), we investigate which approaches and configuration settings are needed to fully utilize the GPGPU. In addition, we experiment with various coding and parallelization configurations to determine the optimal settings for individual applications. From these studies, we hope to formulate heuristics to guide future application development in such an environment. Finally, we discuss some implications of our CUDA application studies for the development of implicitly parallel programming models.
Sep 20, 2007,   GSRC Annual Symposium 2007

   Expanding the IMPACT Toolbox
Pub ID:  130 Authors:  Sara Sadeghi, Robert Kidd, Sain‑Zee Ueng, Wen‑mei Hwu
The current, manually intensive paradigm for designing parallel systems deters a large number of applications from fully utilizing the computing power of contemporary multi-processing systems. To address this issue, we propose a new interactive, visualization- and interface-driven framework that eases the developer effort needed to obtain multi-threaded performance. This framework will provide flexible tools to analyze and transform code correctly and quickly, allowing developers to rapidly iterate through various designs and shorten the time to final product.
Sep 28, 2006,   GSRC Annual Symposium 2006

   Analysis for Parallelization of Media Applications
Pub ID:  27 Author:  Wen‑mei Hwu
Sep 8, 2005,   GSRC Annual Symposium 2005

   IMPACT Vision
Pub ID:  28 Author:  Wen‑mei Hwu
Sep 8, 2005,   GSRC Annual Symposium 2005