 MCUDA: An Efficient Implementation of CUDA Kernels on Multi-cores
John Stratton, Sam Stone, Wen-mei Hwu

Citation
John Stratton, Sam Stone, Wen-mei Hwu. "MCUDA: An Efficient Implementation of CUDA Kernels on Multi-cores". Technical report, University of Illinois at Urbana-Champaign, IMPACT-08-01, March, 2008.

Abstract
The CUDA programming model, which is based on an extended ANSI C language and a runtime environment, allows the programmer to specify explicitly data parallel computation. NVIDIA developed CUDA to open the architecture of their graphics accelerators to more general applications, but did not provide an efficient mapping to execute the programming model on any other architecture. This document describes Multicore-CUDA (MCUDA), a system that efficiently maps the CUDA programming model to a multicore CPU architecture. The major contribution of this work is the source-to-source translation process that converts CUDA code into standard C that interfaces to a runtime library for parallel execution. We apply the MCUDA framework to some CUDA applications previously shown to have high performance on a GPU, and demonstrate high efficiency executing these applications on a multicore CPU architecture. The thread-level parallelism, data locality and computational regularity of the code as expressed in the CUDA model achieve much of the benefit of hand-tuning an application for the CPU architecture. With the MCUDA framework, it is now possible to write data-parallel code in a single programming model for efficient execution on CPU or GPU architectures.
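As an illustration of the kind of source-to-source translation the abstract describes, the following is a minimal sketch in standard C. It is not MCUDA's actual output: it assumes one common approach, in which the implicit threads of a CUDA thread block are serialized into an explicit loop and the blocks are handed to a runtime. The names `dim1`, `vec_add_block`, and `vec_add_launch` are hypothetical, and the serial block loop only stands in for a runtime library that would distribute blocks across CPU cores.

```c
#include <assert.h>

/* Hypothetical stand-in for CUDA's built-in index types (x dimension only). */
typedef struct { int x; } dim1;

/* Original CUDA kernel, shown for reference only:
 *
 *   __global__ void vec_add(const float *a, const float *b, float *c, int n) {
 *       int i = blockIdx.x * blockDim.x + threadIdx.x;
 *       if (i < n) c[i] = a[i] + b[i];
 *   }
 */

/* One thread block, executed by a single CPU thread. The threads that CUDA
 * runs implicitly in parallel become an explicit loop over threadIdx.x. */
static void vec_add_block(const float *a, const float *b, float *c, int n,
                          dim1 blockIdx, dim1 blockDim)
{
    dim1 threadIdx;
    for (threadIdx.x = 0; threadIdx.x < blockDim.x; threadIdx.x++) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            c[i] = a[i] + b[i];
    }
}

/* Assumed stand-in for the runtime library: blocks are independent, so a
 * real runtime could assign them to worker threads. Here they run serially
 * to keep the sketch self-contained. */
void vec_add_launch(const float *a, const float *b, float *c, int n,
                    int grid_dim, int block_dim)
{
    assert(grid_dim > 0 && block_dim > 0);
    dim1 blockDim = { block_dim };
    for (int bx = 0; bx < grid_dim; bx++) {
        dim1 blockIdx = { bx };
        vec_add_block(a, b, c, n, blockIdx, blockDim);
    }
}
```

Calling `vec_add_launch(a, b, c, n, grid_dim, block_dim)` plays the role of the CUDA launch `vec_add<<<grid_dim, block_dim>>>(a, b, c, n)`. Kernels that use `__syncthreads()` would need more care than this sketch shows, since a single loop over threads cannot pause every thread at a barrier.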

Citation formats  

  • HTML
    John Stratton, Sam Stone, Wen-mei Hwu. <a
    href="http://www.gigascale.org/pubs/1278.html"><i>MCUDA:
    An Efficient Implementation of CUDA Kernels on
    Multi-cores</i></a>, Technical report, 
    University of Illinois at Urbana-Champaign, IMPACT-08-01,
    March, 2008.
  • Plain text
    John Stratton, Sam Stone, Wen-mei Hwu. "MCUDA: An Efficient
    Implementation of CUDA Kernels on Multi-cores". Technical
    report, University of Illinois at Urbana-Champaign,
    IMPACT-08-01, March, 2008.
  • BibTeX
    @techreport{StrattonStoneHwu08_MCUDAEfficientImplementationOfCUDAKernelsOnMulticores,
        author = {John Stratton and Sam Stone and Wen-mei Hwu},
        title = {MCUDA: An Efficient Implementation of CUDA Kernels
                  on Multi-cores},
        institution = {University of Illinois at Urbana-Champaign},
        number = {IMPACT-08-01},
        month = {March},
        year = {2008},
        abstract = {The CUDA programming model, which is based on an
                  extended ANSI C language and a runtime
                  environment, allows the programmer to specify
                  explicitly data parallel computation. NVIDIA
                  developed CUDA to open the architecture of their
                  graphics accelerators to more general
                  applications, but did not provide an efficient
                  mapping to execute the programming model on any
                  other architecture. This document describes
                  Multicore-CUDA (MCUDA), a system that efficiently
                  maps the CUDA programming model to a multicore CPU
                  architecture. The major contribution of this work
                  is the source-to-source translation process that
                  converts CUDA code into standard C that interfaces
                  to a runtime library for parallel execution. We
                  apply the MCUDA framework to some CUDA
                  applications previously shown to have high
                  performance on a GPU, and demonstrate high
                  efficiency executing these applications on a
                  multicore CPU architecture. The thread-level
                  parallelism, data locality and computational
                  regularity of the code as expressed in the CUDA
                  model achieve much of the benefit of hand-tuning
                  an application for the CPU architecture. With the
                  MCUDA framework, it is now possible to write
                  data-parallel code in a single programming model
                  for efficient execution on CPU or GPU
                  architectures.},
        URL = {http://www.gigascale.org/pubs/1278.html}
    }
    

Posted by John Stratton on 18 Apr 2008.

Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright.

 
©1998-2009 GSRC