These pages describe work on the design, implementation, and applications of a technique that we call static approximate phase analysis. The PI is Hridesh Rajan, and much of the work is carried out by Tyler Sondag.
News
January 2010: Tutorial on Frances tool accepted for CCSC 2010.
October 2009: Paper on Frances tool accepted for SIGCSE 2010.
August 2009: Technical Report: Frances: A Tool For Understanding Code Generation, ISU, 2009.
July 2009: Technical Report: A Theory of Reads and Writes for Multi-level Caches, ISU, 2009.
March 2009: Technical Report: Phase-guided Auto-Tuning for Improved Utilization of Performance-Asymmetric Multicore Processors, ISU, 2009.
February 2009: Tyler's paper accepted for IWMSE 2009.
July 2007: Tyler and Viswanath's paper accepted for PLOS 2007.
Phase-guided Thread-to-core Assignment for Improved Utilization of Performance-Asymmetric Multi-Core Processors
Tyler Sondag and Hridesh Rajan

Abstract
CPU vendors are starting to explore trade-offs between die size, number of cores on a die, and power consumption, leading to performance asymmetry among cores on a single chip. For efficient utilization of these performance-asymmetric multi-core processors, application threads must be assigned to cores such that the resource needs of a thread closely match resource availability at the assigned core. This significantly complicates the task of an average programmer. The contribution of this work is a technique for automatically determining the mapping between threads and performance-asymmetric cores of a processor. Our approach, which we call phase-guided thread-to-core assignment, builds on a well-known insight that programs exhibit phase behavior. We first take code sections and group them into clusters such that each section in a cluster is likely to exhibit similar runtime characteristics. The key idea is that with this clustering, characteristics of a small number of representative sections in a cluster give insight into the behavior of the entire cluster. Thus, the exhibited characteristics of the representative sections on different types of cores can be used for automating thread-to-core assignment at a lower runtime cost. Variations of our technique show up to an average 150% improvement in throughput over the stock Linux scheduler for systems with a constant feed of jobs, while maintaining comparable fairness and efficiency.

Bibliographic Information
@inproceedings{Sondag-Rajan-09,
Most recent version: PDF
Previous version appeared as Technical Report 08-14, Computer Science, Iowa State University, January 31, 2009. [PDF]
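
To make the clustering idea from the abstract concrete, the following is a minimal illustrative sketch and not the implementation described in the paper. It assumes that each code section is summarized by a small static feature vector, that sections are grouped with a simple greedy distance threshold, and that a caller-supplied `measure` function returns a score (higher is better) for running a section on a given core type. The names `Section`, `cluster_sections`, `assign_cores`, and `measure`, the feature choice, and the clustering rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """A code section with hypothetical static features,
    e.g. (memory-op ratio, branch ratio). Assumption for illustration."""
    name: str
    features: tuple


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def cluster_sections(sections, threshold):
    """Greedy clustering (an assumed, simplified rule): a section joins the
    first cluster whose representative (its first member) lies within
    `threshold`; otherwise it starts a new cluster."""
    clusters = []
    for section in sections:
        for cluster in clusters:
            if distance(section.features, cluster[0].features) <= threshold:
                cluster.append(section)
                break
        else:
            clusters.append([section])
    return clusters


def assign_cores(clusters, measure, core_types):
    """Measure only each cluster's representative on every core type, then
    give all sections in the cluster the best-scoring core type."""
    assignment = {}
    for cluster in clusters:
        representative = cluster[0]
        best = max(core_types, key=lambda core: measure(representative, core))
        for section in cluster:
            assignment[section.name] = best
    return assignment
```

In this sketch, the number of measurements grows with the number of clusters rather than the number of sections, which is the source of the lower runtime cost that the abstract refers to.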