GSRC posters

authored by

Mahmut Kandemir



   Integrated Processor-Cache Partitioning in CMPs
Pub ID:  2006 Authors:  Shekhar Srikantaiah, Mahmut Kandemir, Mary Jane Irwin
Existing cache partitioning schemes are designed in a manner oblivious to the implicit processor partitioning enforced by the operating system. This research examines an operating-system-directed integrated processor-cache partitioning scheme that partitions both the available processors and the shared cache in a chip multiprocessor (CMP) among different multi-threaded applications. Extensive simulations using a set of multi-programmed workloads show that our integrated processor-cache partitioning scheme achieves better performance isolation than state-of-the-art hardware/software-based solutions. Specifically, on the fair speedup metric on an 8-core system, our integrated processor-cache partitioning approach performs, on average, 20.83% and 14.14% better than equal partitioning and the implicit partitioning enforced by the underlying operating system, respectively. We also compare our approach to processor partitioning alone and to a state-of-the-art cache partitioning scheme; our scheme fares 8.21% and 9.19% better than these schemes on a 16-core system.
Sep 3, 2009,   GSRC Annual Symposium 2009

   Cache Sharing Aware Computation Distribution and Scheduling for Multicore Systems
Pub ID:  2025 Authors:  Shekhar Srikantaiah, Mahmut Kandemir, Mary Jane Irwin
The main contribution of this work is a compiler-based, cache-topology-aware code optimization scheme for emerging multicore systems. This scheme implements two complementary optimizations: distributing the iterations of a parallel loop across the cores of a target multicore machine, and scheduling the iterations assigned to each core. The goal of these optimizations is to improve the utilization of the on-chip cache hierarchy and maximize overall application performance. To explore future multicore machines that are not currently available (with higher core counts and deeper on-chip cache hierarchies), we also conduct a simulation-based study. Our experience so far with our code optimization scheme is very promising. Specifically, the results collected from our experiments with three Intel multicore machines show that the proposed compiler-based approach is very effective in enhancing the performance of on-chip cache hierarchies of multicores. Our simulation results also indicate that optimizing for the on-chip cache hierarchy will be even more important in future multicores with increasing numbers of cores and cache levels.
Sep 3, 2009,   GSRC Annual Symposium 2009

   Adaptive Set Pinning: Managing Shared Caches in CMPs
Pub ID:  1374 Authors:  Shekhar Srikantaiah, Mahmut Kandemir, Mary Jane Irwin
Shared cache management is a crucial CMP design aspect for the performance of the system. We present a new classification of cache misses – CII: Compulsory, Inter-processor and Intra-processor misses – for CMPs with shared caches to provide a better understanding of the interactions between memory transactions of different processors at the level of shared cache in a CMP. We then propose a novel approach, called set pinning, for eliminating inter-processor misses and reducing intra-processor misses in a shared cache. Our experiments indicate that the set pinning scheme achieves an average improvement of 22.18% in the L2 miss rate while the adaptive set pinning scheme reduces the miss rates by an average of 47.94% as compared to the traditional shared cache scheme. They also improve the performance by 7.24% and 17.88% respectively.
Sep 29, 2008,   GSRC Annual Symposium 2008

   Profile-Driven Energy Reduction in Network-on-Chips
Pub ID:  331 Authors:  Feihui Li, Guangyu Chen, Mahmut Kandemir
Reducing energy consumption of a Network-on-Chip (NoC) is a critical design goal, especially for power-constrained embedded systems. In response, prior research has proposed several circuit/architectural level mechanisms to reduce NoC power consumption. This paper considers the problem from a different perspective and demonstrates that compiler analysis can be very helpful for enhancing the effectiveness of a hardware-based link power management mechanism by increasing the duration of communication links' idle periods. The proposed profile-based approach achieves its goal by maximizing the communication link reuse through compiler-directed, static message re-routing. That is, it clusters the required data communications into a small set of communication links at any given time, which increases the idle periods for the remaining communication links in the network. This helps hardware shut down more communication links and their corresponding buffers to reduce leakage power. The current experimental evaluation, with twelve data-intensive embedded applications, shows that the proposed profile-driven compiler approach reduces leakage energy by more than 35% (on average) as compared to a pure hardware-based link power management scheme.
Sep 20, 2007,   GSRC Annual Symposium 2007

   Evaluating the Role of Scratchpad Memories in Multi-core for Sparse Matrix Computations
Pub ID:  380 Authors:  Aditya Yanamandra, Brian Cover, Konrad Malkowski, Padma Raghavan, Mahmut Kandemir, Mary Jane Irwin
We consider hardware acceleration for sparse matrix vector multiplication (SpMV), a kernel that is used widely in iterative linear solvers, modeling and simulation applications. In particular, we consider how scratchpad memory can be used for increasing the performance and the energy efficiency of SpMV in a multi-core system. Scratchpad memories (SPM) are more energy efficient than traditional caches. This, coupled with the predictability of data presence, makes SPM an attractive alternative to a cache. We ensure the efficient utilization of the SPM by using it to store data that doesn't perform well in the traditional cache. We evaluate the impact of using an SPM at all levels of the on-chip memory hierarchy. Depending on the level of the hierarchy in which the SPM is utilized, we observe on an eight-core system an average increase in performance of 13.5%-15% at an average decrease in energy consumption of 23%-28%.
Sep 20, 2007,   GSRC Annual Symposium 2007

   A Constraint Network Based Solution to Code Parallelization
Pub ID:  111 Author:  Mahmut Kandemir
Software issues regarding the programming of chip multiprocessors need to be reconsidered. Chip multiprocessors call for a fresh look: interprocessor communication is cheap, while off-chip memory accesses are very costly. Code parallelization techniques proposed and studied for high-performance parallel machines do not extend directly to chip multiprocessors: code parallelizers handle one loop nest at a time and fail to capture the data sharing patterns. Chip multiprocessing is quickly becoming a viable approach. Our goal: explore a new code parallelization scheme, targeting chip multiprocessors used in embedded computing.
Sep 28, 2006,   GSRC Annual Symposium 2006