| Cited by | Paper title | Year |
|---:|---|---:|
| 1553 | Power provisioning for a warehouse-sized computer. | 2007 |
| 1203 | Dark silicon and the end of multicore scaling. | 2011 |
| 937 | Scalable high performance main memory system using phase-change memory technology. | 2009 |
| 875 | Architecting phase change memory as a scalable dram alternative. | 2009 |
| 756 | Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU. | 2010 |
| 644 | A durable and energy efficient main memory using phase change memory technology. | 2009 |
| 625 | Corona: System Implications of Emerging Nanophotonic Technology. | 2008 |
| 610 | Continuous Optimization. | 2005 |
| 588 | 3D-Stacked Memory Architectures for Multi-core Processors. | 2008 |
| 547 | Adaptive insertion policies for high performance caching. | 2007 |
| 539 | Techniques for Multicore Thermal Management: Classification and New Exploration. | 2006 |
| 532 | An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness. | 2009 |
| 511 | Interconnections in Multi-Core Architectures: Understanding Mechanisms, Overheads and Scaling. | 2005 |
| 497 | Virtualizing Transactional Memory. | 2005 |
| 477 | Cooperative Caching for Chip Multiprocessors. | 2006 |
| 451 | Anton, a special-purpose machine for molecular dynamics simulation. | 2007 |
| 448 | Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems. | 2008 |
| 427 | Design and Management of 3D Chip Multiprocessors Using Network-in-Memory. | 2006 |
| 413 | High performance cache replacement using re-reference interval prediction (RRIP). | 2010 |
| 411 | Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors. | 2005 |
| 389 | An integrated GPU power and performance model. | 2010 |
| 380 | Reactive NUCA: near-optimal block placement and replication in distributed caches. | 2009 |
| 376 | Ensemble-level Power Management for Dense Blade Servers. | 2006 |
| 368 | Express virtual channels: towards the ideal interconnection fabric. | 2007 |
| 367 | Technology-Driven, Highly-Scalable Dragonfly Topology. | 2008 |
| 365 | An effective hybrid transactional memory system with strong isolation guarantees. | 2007 |
| 363 | Flattened butterfly: a cost-efficient topology for high-radix networks. | 2007 |
| 361 | Energy proportional datacenter networks. | 2010 |
| 353 | A reconfigurable fabric for accelerating large-scale datacenter services. | 2014 |
| 341 | BugNet: Continuously Recording Program Execution for Deterministic Replay Debugging. | 2005 |
| 341 | Optimizing Replication, Communication, and Capacity Allocation in CMPs. | 2005 |
| 337 | Firefly: illuminating future network-on-chip with nanophotonics. | 2009 |
| 334 | A High Throughput String Matching Architecture for Intrusion Detection and Prevention. | 2005 |
| 327 | Bulk Disambiguation of Speculative Threads in Multiprocessors. | 2006 |
| 318 | Understanding sources of inefficiency in general-purpose chips. | 2010 |
| 316 | A case for bufferless routing in on-chip networks. | 2009 |
| 306 | Hybrid cache architecture with disparate memory technologies. | 2009 |
| 305 | The Impact of Performance Asymmetry in Emerging Multicore Architectures. | 2005 |
| 295 | Power management of online data-intensive services. | 2011 |
| 294 | Core fusion: accommodating software diversity in chip multiprocessors. | 2007 |
| 292 | Variation-Aware Application Scheduling and Power Management for Chip Multiprocessors. | 2008 |
| 290 | Improving NAND Flash Based Disk Caches. | 2008 |
| 289 | A Scalable Architecture For High-Throughput Regular-Expression Pattern Matching. | 2006 |
| 288 | Self-Optimizing Memory Controllers: A Reinforcement Learning Approach. | 2008 |
| 285 | Raksha: a flexible information flow architecture for software security. | 2007 |
| 272 | GPUWattch: enabling energy optimizations in GPGPUs. | 2013 |
| 267 | A Case for MLP-Aware Cache Replacement. | 2006 |
| 267 | PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches. | 2009 |
| 265 | A novel dimensionally-decomposed router for on-chip communication in 3D architectures. | 2007 |
| 263 | Mitigating Amdahl’s Law through EPI Throttling. | 2005 |
| 263 | New cache designs for thwarting software cache-based side channel attacks. | 2007 |
| 258 | Performance pathologies in hardware transactional memory. | 2007 |
| 254 | NoHype: virtualized cloud infrastructure without the virtualization. | 2010 |
| 253 | SODA: A Low-power Architecture For Software Radio. | 2006 |
| 238 | Hardware support for WCET analysis of hard real-time multicore systems. | 2009 |
| 236 | Scaling the bandwidth wall: challenges in and avenues for CMP scaling. | 2009 |
| 236 | RAIDR: Retention-aware intelligent DRAM refresh. | 2012 |
| 232 | Microarchitecture of a High-Radix Router. | 2005 |
| 232 | Use ECP, not ECC, for hard failures in resistive memories. | 2010 |
| 230 | Design and Implementation of the AEGIS Single-Chip Secure Processor Using Physical Random Functions. | 2005 |
| 229 | A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip Networks. | 2006 |
| 227 | Exploiting Structural Duplication for Lifetime Reliability Enhancement. | 2005 |
| 225 | Carbon: architectural support for fine-grained parallelism on chip multiprocessors. | 2007 |
| 225 | Trading off Cache Capacity for Reliability to Enable Low Voltage Operation. | 2008 |
| 225 | Thread motion: fine-grained power management for multi-core systems. | 2009 |
| 223 | BulkSC: bulk enforcement of sequential consistency. | 2007 |
| 222 | Virtual Circuit Tree Multicasting: A Case for On-Chip Hardware Multicast Support. | 2008 |
| 221 | MIRA: A Multi-layered On-Chip Interconnect Router Architecture. | 2008 |
| 220 | The V-Way Cache: Demand Based Associativity via Global Replacement. | 2005 |
| 217 | Architectural Semantics for Practical Transactional Memory. | 2006 |
| 217 | DeLorean: Recording and Deterministically Replaying Shared-Memory Multiprocessor Execution Efficiently. | 2008 |
| 217 | A Comprehensive Memory Modeling Tool and Its Application to the Design and Analysis of Future Memory Hierarchies. | 2008 |
| 216 | Understanding and Designing New Server Architectures for Emerging Warehouse-Computing Environments. | 2008 |
| 215 | Computing Architectural Vulnerability Factors for Address-Based Structures. | 2005 |
| 211 | Architecture for Protecting Critical Secrets in Microprocessors. | 2005 |
| 211 | Rethinking DRAM design and organization for energy-constrained multi-cores. | 2010 |
| 208 | The BlackWidow High-Radix Clos Network. | 2006 |
| 207 | Virtual hierarchies to support server consolidation. | 2007 |
| 205 | Scheduling heterogeneous multi-cores through performance impact estimation (PIE). | 2012 |
| 195 | Near-Optimal Worst-Case Throughput Routing for Two-Dimensional Mesh Networks. | 2005 |
| 195 | An Ultra Low Power System Architecture for Sensor Network Applications. | 2005 |
| 195 | Relax: an architectural framework for software recovery of hardware faults. | 2010 |
| 192 | Configurable isolation: building high availability systems with commodity multi-core processors. | 2007 |
| 192 | Virtual private caches. | 2007 |
| 191 | RegionScout: Exploiting Coarse Grain Sharing in Snoop-Based Coherence. | 2005 |
| 190 | The performance of PC solid-state disks (SSDs) as a function of bandwidth, concurrency, device architecture, and system organization. | 2009 |
| 189 | Security refresh: prevent malicious wear-out and increase durability for phase-change memory with dynamically randomized address mapping. | 2010 |
| 186 | Dynamic warp subdivision for integrated branch and memory divergence tolerance. | 2010 |
| 179 | Rerun: Exploiting Episodes for Lightweight Memory Race Recording. | 2008 |
| 178 | An Architecture Framework for Transparent Instruction Set Customization in Embedded Processors. | 2005 |
| 175 | Energy-efficient mechanisms for managing thread context in throughput processors. | 2011 |
| 174 | Temperature-constrained power control for chip multiprocessors with online model estimation. | 2009 |
| 173 | Reducing cache power with low-cost, multi-bit error-correcting codes. | 2010 |
| 172 | Web search using mobile cores: quantifying and mitigating the price of efficiency. | 2010 |
| 172 | The impact of memory subsystem resource sharing on datacenter applications. | 2011 |
| 170 | Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors. | 2009 |
| 169 | Phastlane: a rapid transit optical routing network. | 2009 |
| 165 | Spatial Memory Streaming. | 2006 |
| 162 | Making the fast case common and the uncommon case simple in unbounded transactional memory. | 2007 |
| 160 | Flexible Decoupled Transactional Memory Support. | 2008 |
| 160 | Benefits and limitations of tapping into stored energy for datacenters. | 2011 |
| 159 | Rigel: an architecture and scalable programming interface for a 1000-core accelerator. | 2009 |
| 157 | Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite. | 2007 |
| 157 | Vantage: scalable and efficient fine-grain cache partitioning. | 2011 |
| 154 | Disaggregated memory for expansion and sharing in blade servers. | 2009 |
| 153 | ReCycle: pipeline adaptation to tolerate process variation. | 2007 |
| 150 | Design and Evaluation of Hybrid Fault-Detection Systems. | 2005 |
| 150 | Globally-Synchronized Frames for Guaranteed Quality-of-Service in On-Chip Networks. | 2008 |
| 148 | Opportunistic Transient-Fault Detection. | 2005 |
| 148 | Direct Cache Access for High Bandwidth Network I/O. | 2005 |
| 145 | Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems. | 2012 |
| 144 | Towards energy-proportional datacenter memory with mobile DRAM. | 2012 |
| 143 | Morphable memory system: a robust architecture for exploiting multi-level phase change memories. | 2010 |
| 143 | ZSim: fast and accurate microarchitectural simulation of thousand-core systems. | 2013 |
| 141 | Resistive computation: avoiding the power wall with low-leakage, STT-MRAM based computing. | 2010 |
| 140 | A case for exploiting subarray-level parallelism (SALP) in DRAM. | 2012 |
| 139 | Improving Cost, Performance, and Security of Memory Encryption and Authentication. | 2006 |
| 139 | Aérgia: exploiting packet latency slack in on-chip networks. | 2010 |
| 138 | An integrated hardware-software approach to flexible transactional memory. | 2007 |
| 137 | Limiting the power consumption of main memory. | 2007 |
| 137 | Achieving predictable performance through better memory controller placement in many-core CMPs. | 2009 |
| 137 | A case for an interleaving constrained shared-memory multi-processor. | 2009 |
| 136 | Scale-out processors. | 2012 |
| 134 | TokenTM: Efficient Execution of Large Transactions with Hardware Transactional Memory. | 2008 |
| 134 | Kilo-NOC: a heterogeneous network-on-chip architecture for scalability and service guarantees. | 2011 |
| 132 | Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks. | 2011 |
| 131 | Interconnect-Aware Coherence Protocols for Chip Multiprocessors. | 2006 |
| 131 | Flipping bits in memory without accessing them: An experimental study of DRAM disturbance errors. | 2014 |
| 130 | Flexible Hardware Acceleration for Instruction-Grain Program Monitoring. | 2008 |
| 127 | Comparing memory systems for chip multiprocessors. | 2007 |
| 125 | A Robust Main-Memory Compression Scheme. | 2005 |
| 125 | Dynamic prediction of architectural vulnerability from microarchitectural state. | 2007 |
| 124 | Improving Multiprocessor Performance with Coarse-Grain Coherence Tracking. | 2005 |
| 124 | Architectural core salvaging in a multi-core processor for hard-error tolerance. | 2009 |
| 124 | A dynamically configurable coprocessor for convolutional neural networks. | 2010 |
| 124 | Energy-performance tradeoffs in processor architecture and circuit design: a marginal cost analysis. | 2010 |
| 123 | Bubble-flux: precise online QoS management for increased utilization in warehouse scale computers. | 2013 |
| 122 | Temporal Streaming of Shared Memory. | 2005 |
| 121 | Thin servers with smart pipes: designing SoC accelerators for memcached. | 2013 |
| 120 | Learning-Based SMT Processor Resource Distribution via Hill-Climbing. | 2006 |
| 120 | TRAP-Array: A Disk Array Architecture Providing Timely Recovery to Any Point-in-time. | 2006 |
| 120 | FabScalar: composing synthesizable RTL designs of arbitrary cores within a canonical superscalar template. | 2011 |
| 119 | An experimental study of data retention behavior in modern DRAM devices: implications for retention time profiling mechanisms. | 2013 |
| 118 | Analysis of the O-GEometric History Length Branch Predictor. | 2005 |
| 118 | Atom-Aid: Detecting and Surviving Atomicity Violations. | 2008 |
| 118 | Managing distributed UPS energy for effective power capping in data centers. | 2012 |
| 114 | DBAR: an efficient routing algorithm to support multiple concurrent applications in networks-on-chip. | 2011 |
| 113 | Using Hardware Memory Protection to Build a High-Performance, Strongly-Atomic Hybrid Transactional Memory. | 2008 |
| 113 | VEAL: Virtualized Execution Accelerator for Loops. | 2008 |
| 111 | Energy-efficient cache design using variable-strength error-correcting codes. | 2011 |
| 110 | Translation caching: skip, don’t walk (the page table). | 2010 |
| 109 | Energy Optimization of Subthreshold-Voltage Sensor Network Processors. | 2005 |
| 109 | Interconnect design considerations for large NUCA caches. | 2007 |
| 107 | Efficient virtual memory for big memory servers. | 2013 |
| 106 | Memory mapped ECC: low-cost error protection for last level caches. | 2009 |
| 106 | PreSET: Improving performance of phase change memories by exploiting asymmetry in write times. | 2012 |
| 105 | Convolution engine: balancing efficiency & flexibility in specialized computing. | 2013 |
| 103 | Examining ACE analysis reliability estimates using fault-injection. | 2007 |
| 101 | Adaptive Mechanisms and Policies for Managing Cache Hierarchies in Chip Multiprocessors. | 2005 |
| 101 | Towards energy proportionality for large-scale latency-critical workloads. | 2014 |
| 100 | SigRace: signature-based data race detection. | 2009 |
| 100 | Re-architecting DRAM memory systems with monolithically integrated silicon photonics. | 2010 |
| 99 | High Efficiency Counter Mode Security Architecture via Prediction and Precomputation. | 2005 |
| 99 | Mechanisms for store-wait-free multiprocessors. | 2007 |
| 99 | The impact of management operations on the virtualized datacenter. | 2010 |
| 99 | SieveStore: a highly-selective, ensemble-level disk cache for cost-performance. | 2010 |
| 98 | A Tree Based Router Search Engine Architecture with Single Port Memories. | 2005 |
| 98 | Scalable power control for many-core architectures running multi-threaded applications. | 2011 |
| 98 | Prefetch-aware shared resource management for multi-core systems. | 2011 |
| 98 | A scalable processing-in-memory accelerator for parallel graph processing. | 2015 |
| 97 | Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators. | 2011 |
| 96 | MetaTM/TxLinux: transactional memory for an operating system. | 2007 |
| 96 | Spatio-temporal memory streaming. | 2009 |
| 96 | Orchestrated scheduling and prefetching for GPGPUs. | 2013 |
| 95 | Piecewise Linear Branch Prediction. | 2005 |
| 95 | Robust architectural support for transactional memory in the power architecture. | 2013 |
| 95 | General-purpose code acceleration with limited-precision analog computation. | 2014 |
| 94 | AnySP: anytime anywhere anyway signal processing. | 2009 |
| 94 | Modeling critical sections in Amdahl’s law and its implications for multicore design. | 2010 |
| 94 | Die-stacked DRAM caches for servers: hit ratio, latency, or bandwidth? have it all with footprint cache. | 2013 |
| 91 | A Proactive Wearout Recovery Approach for Exploiting Microarchitectural Redundancy to Extend Cache SRAM Lifetime. | 2008 |
| 90 | Synchronization state buffer: supporting efficient fine-grain synchronization on many-core architectures. | 2007 |
| 90 | InvisiFence: performance-transparent memory ordering in conventional multiprocessors. | 2009 |
| 89 | Conflict exceptions: simplifying concurrent language semantics with precise hardware exceptions for data-races. | 2010 |
| 88 | Rotary router: an efficient architecture for CMP interconnection networks. | 2007 |
| 88 | The virtual write queue: coordinating DRAM and last-level cache policies. | 2010 |
| 88 | A case for heterogeneous on-chip interconnects for CMPs. | 2011 |
| 88 | Memory persistency. | 2014 |
| 86 | Silicon-photonic network architectures for scalable, power-efficient multi-chip systems. | 2010 |
| 84 | Evolution of thread-level parallelism in desktop applications. | 2010 |
| 84 | Catnap: energy proportional multiple network-on-chip. | 2013 |
| 83 | On the feasibility of online malware detection with performance counters. | 2013 |
| 82 | Disk Drive Roadmap from the Thermal Perspective: A Case for Dynamic Thermal Management. | 2005 |
| 82 | Hardware atomicity for reliable software speculation. | 2007 |
| 82 | A defect-tolerant accelerator for emerging high-performance applications. | 2012 |
| 80 | Store Vulnerability Window (SVW): Re-Execution Filtering for Enhanced Load Optimization. | 2005 |
| 80 | iSwitch: Coordinating and optimizing renewable energy powered server clusters. | 2012 |
| 80 | EIE: Efficient Inference Engine on Compressed Deep Neural Network. | 2016 |
| 79 | An abacus turn model for time/space-efficient reconfigurable routing. | 2011 |
| 78 | ReVIVaL: A Variation-Tolerant Architecture Using Voltage Interpolation and Variable Latency. | 2008 |
| 77 | An intra-chip free-space optical interconnect. | 2010 |
| 76 | Rescue: A Microarchitecture for Testability and Defect Tolerance. | 2005 |
| 76 | Power model validation through thermal measurements. | 2007 |
| 76 | ShiDianNao: shifting vision processing closer to the sensor. | 2015 |
| 75 | Online Estimation of Architectural Vulnerability Factor for Soft Errors. | 2008 |
| 75 | Can traditional programming bridge the Ninja performance gap for parallel computing applications? | 2012 |
| 74 | Application-aware deadlock-free oblivious routing. | 2009 |
| 74 | Architecting on-chip interconnects for stacked 3D STT-RAM caches in CMPs. | 2011 |
| 74 | Bypass and insertion algorithms for exclusive last-level caches. | 2011 |
| 74 | Simultaneous branch and warp interweaving for sustained GPU performance. | 2012 |
| 74 | Aladdin: A pre-RTL, power-performance accelerator simulator enabling large design space exploration of customized architectures. | 2014 |
| 73 | Thread tailor: dynamically weaving threads together for efficient, adaptive parallel applications. | 2010 |
| 73 | TimeWarp: Rethinking timekeeping and performance monitoring mechanisms to mitigate side-channel attacks. | 2012 |
| 73 | The Yin and Yang of power and performance for asymmetric hardware and managed software. | 2012 |
| 72 | A 64-bit stream processor architecture for scientific applications. | 2007 |
| 71 | iDEAL: Inter-router Dual-Function Energy and Area-Efficient Links for Network-on-Chip (NoC) Architectures. | 2008 |
| 70 | Scalable Load and Store Processing in Latency Tolerant Processors. | 2005 |
| 70 | Chisel: A Storage-efficient, Collision-free Hash-based Network Processing Architecture. | 2006 |
| 70 | Decoupled DIMM: building high-bandwidth memory system using low-speed DRAM devices. | 2009 |
| 69 | Techniques for Efficient Processing in Runahead Execution Engines. | 2005 |
| 69 | Automated design of application specific superscalar processors: an analytical approach. | 2007 |
| 69 | Polymorphic On-Chip Networks. | 2008 |
| 69 | Crafting a usable microkernel, processor, and I/O system with strict and provable information flow security. | 2011 |
| 68 | Program Demultiplexing: Data-flow based Speculative Parallelization of Methods in Sequential Programs. | 2006 |
| 68 | Elastic cooperative caching: an autonomous dynamically adaptive memory hierarchy for chip multiprocessors. | 2010 |
| 66 | Indirect adaptive routing on large scale interconnection networks. | 2009 |
| 66 | Adaptive granularity memory systems: a tradeoff between storage efficiency and throughput. | 2011 |
| 66 | The role of optics in future high radix switch design. | 2011 |
| 66 | Navigating big data with high-throughput, energy-efficient data partitioning. | 2013 |
| 65 | A case for FAME: FPGA architecture model execution. | 2010 |
| 65 | Combining memory and a controller with photonics through 3D-stacking to enable scalable and energy-efficient systems. | 2011 |
| 65 | End-to-end sequential consistency. | 2012 |
| 64 | An Integrated Memory Array Processor Architecture for Embedded Image Recognition Systems. | 2005 |
| 64 | Mechanisms for bounding vulnerabilities of processor structures. | 2007 |
| 64 | A case for random shortcut topologies for HPC interconnects. | 2012 |
| 64 | Whare-map: heterogeneity in “homogeneous” warehouse-scale computers. | 2013 |
| 64 | Design space exploration and optimization of path oblivious RAM in secure processors. | 2013 |
| 62 | Balanced Cache: Reducing Conflict Misses of Direct-Mapped Caches. | 2006 |
| 62 | Internet-scale service infrastructure efficiency. | 2009 |
| 61 | LOT-ECC: Localized and tiered reliability mechanisms for commodity memory systems. | 2012 |
| 60 | An Integrated Framework for Dependable and Revivable Architectures Using Multicore Processors. | 2006 |
| 60 | Simultaneous speculative threading: a novel pipeline architecture implemented in sun’s rock processor. | 2009 |
| 60 | ColorSafe: architectural support for debugging and dynamically avoiding multi-variable atomicity violations. | 2010 |
| 60 | SRAM-DRAM hybrid memory with applications to efficient register files in fine-grained multi-threading. | 2011 |
| 60 | ArchShield: architectural framework for assisting DRAM scaling by tolerating high error rates. | 2013 |
| 60 | A hardware evaluation of cache partitioning to improve utilization and energy-efficiency while preserving responsiveness. | 2013 |
| 59 | Deconstructing Commodity Storage Clusters. | 2005 |
| 59 | Flexible Snooping: Adaptive Forwarding and Filtering of Snoops in Embedded-Ring Multiprocessors. | 2006 |
| 59 | Rapid identification of architectural bottlenecks via precise event counting. | 2011 |
| 59 | Probabilistic Shared Cache Management (PriSM). | 2012 |
| 58 | Branch regulation: Low-overhead protection from code reuse attacks. | 2012 |
| 58 | Triggered instructions: a control paradigm for spatially-programmed architectures. | 2013 |
| 58 | The CHERI capability model: Revisiting RISC in an age of risk. | 2014 |
| 57 | Thermal modeling and management of DRAM memory systems. | 2007 |
| 57 | Heracles: improving resource efficiency at scale. | 2015 |
| 57 | PIM-enabled instructions: a low-overhead, locality-aware processing-in-memory architecture. | 2015 |
| 56 | Side-channel vulnerability factor: A metric for measuring information leakage. | 2012 |
| 55 | Cohesion: a hybrid memory model for accelerators. | 2010 |
| 54 | Memory Model = Instruction Reordering + Store Atomicity. | 2006 |
| 54 | LINQits: big data on little clients. | 2013 |
| 53 | An Evaluation Framework and Instruction Set Architecture for Ion-Trap Based Quantum Micro-Architectures. | 2005 |
| 53 | Profiling a warehouse-scale computer. | 2015 |
| 52 | WiDGET: Wisconsin decoupled grid execution tiles. | 2010 |
| 52 | SpecTLB: a mechanism for speculative address translation. | 2011 |
| 52 | Reducing memory reference energy with opportunistic virtual caching. | 2012 |
| 52 | SurfNoC: a low latency and provably non-interfering approach to secure networks-on-chip. | 2013 |
| 51 | Quantum Memory Hierarchies: Efficient Designs to Match Available Parallelism in Quantum Computing. | 2006 |
| 51 | Sampling + DMR: practical and low-overhead permanent fault detection. | 2011 |
| 51 | Tri-level-cell phase change memory: toward an efficient and reliable memory system. | 2013 |
| 50 | Reducing memory access latency with asymmetric DRAM bank organizations. | 2013 |
| 50 | Enabling preemptive multiprogramming on GPUs. | 2014 |
| 49 | CAPRI: Prediction of compaction-adequacy for handling control-divergence in GPGPU architectures. | 2012 |
| 49 | Utility-based acceleration of multithreaded applications on asymmetric CMPs. | 2013 |
| 48 | Physical simulation for animation and visual effects: parallelization and characterization for chip multiprocessors. | 2007 |
| 48 | Continuous real-world inputs can open up alternative accelerator designs. | 2013 |
| 47 | ParallAX: an architecture for real-time physics. | 2007 |
| 47 | i-NVMM: a secure non-volatile main memory system with incremental encryption. | 2011 |
| 47 | The dynamic granularity memory system. | 2012 |
| 46 | RENO - A Rename-Based Instruction Optimizer. | 2005 |
| 46 | Stream chaining: exploiting multiple levels of correlation in data prefetching. | 2009 |
| 46 | Physically Addressed Queueing (PAQ): Improving parallelism in Solid State Disks. | 2012 |
| 45 | Late-binding: enabling unordered load-store queues. | 2007 |
| 45 | Learning and Leveraging the Relationship between Architecture-Level Measurements and Individual User Satisfaction. | 2008 |
| 45 | iGPU: Exception support and speculative execution on GPUs. | 2012 |
| 44 | Store Buffer Design in First-Level Multibanked Data Caches. | 2005 |
| 44 | Multiple Instruction Stream Processor. | 2006 |
| 44 | Multi-execution: multicore caching for data-similar executions. | 2009 |
| 43 | VPC prediction: reducing the cost of indirect branches via hardware-based dynamic devirtualization. | 2007 |
| 42 | Improving Program Efficiency by Packing Instructions into Registers. | 2005 |
| 42 | Area-Performance Trade-offs in Tiled Dataflow Architectures. | 2006 |
| 42 | Using hardware vulnerability factors to enhance AVF analysis. | 2010 |
| 42 | RADISH: Always-on sound and complete race detection in software and hardware. | 2012 |
| 42 | Understanding and mitigating refresh overheads in high-density DDR4 DRAM systems. | 2013 |
| 41 | Performance and power of cache-based reconfigurable computing. | 2009 |
| 41 | A new perspective for efficient virtual-cache coherence. | 2013 |
| 41 | Flicker: a dynamically adaptive architecture for power limited multicore systems. | 2013 |
| 41 | An energy-efficient and scalable eDRAM-based register file architecture for GPGPU. | 2013 |
| 40 | Achieving Out-of-Order Performance with Almost In-Order Complexity. | 2008 |
| 40 | A fault tolerant, area efficient architecture for Shor’s factoring algorithm. | 2009 |
| 40 | Dynamic performance tuning for speculative threads. | 2009 |
| 40 | BOOM: Enabling mobile memory based low-power server DIMMs. | 2012 |
| 39 | Reducing Startup Time in Co-Designed Virtual Machines. | 2006 |
| 39 | Matrix scheduler reloaded. | 2007 |
| 39 | From Speculation to Security: Practical and Efficient Information Flow Tracking Using Speculative Hardware. | 2008 |
| 39 | Forwardflow: a scalable core for power-constrained CMPs. | 2010 |
| 39 | RETCON: transactional repair without replay. | 2010 |
| 39 | CPPC: correctable parity protected cache. | 2011 |
| 39 | AC-DIMM: associative computing with STT-MRAM. | 2013 |
| 39 | SCORPIO: A 36-core research chip demonstrating snoopy coherence on a scalable mesh NoC with in-network ordering. | 2014 |
| 38 | Atomic Vector Operations on Chip Multiprocessors. | 2008 |
| 38 | LReplay: a pending period based deterministic replay scheme. | 2010 |
| 38 | Automatic abstraction and fault tolerance in cortical microarchitectures. | 2011 |
| 38 | Criticality stacks: identifying critical threads in parallel programs using synchronization behavior. | 2013 |
| 37 | Software-Controlled Priority Characterization of POWER5 Processor. | 2008 |
| 37 | SC2: A statistical compression cache scheme. | 2014 |
| 36 | Transparent control independence (TCI). | 2007 |
| 36 | ECMon: exposing cache events for monitoring. | 2009 |
| 36 | TLSync: support for multiple fast barriers using on-chip transmission lines. | 2011 |
| 36 | Fighting fire with fire: modeling the datacenter-scale effects of targeted superlattice thermal management. | 2011 |
| 36 | Buffer-on-board memory systems. | 2012 |
| 36 | WebCore: Architectural support for mobile Web browsing. | 2014 |
| 36 | SynFull: Synthetic traffic models capturing cache coherent behaviour. | 2014 |
| 35 | Dynamic Verification of Sequential Consistency. | 2005 |
| 35 | Running a Quantum Circuit at the Speed of Data. | 2008 |
| 35 | Watchdog: Hardware for safe and secure manual memory management and full memory safety. | 2012 |
| 35 | Improving memory scheduling via processor-side load criticality information. | 2013 |
| 35 | Harnessing ISA diversity: Design of a heterogeneous-ISA chip multiprocessor. | 2014 |
| 34 | Virtualizing performance asymmetric multi-core systems. | 2011 |
| 33 | Tolerating Dependences Between Large Speculative Threads Via Sub-Threads. | 2006 |
| 33 | Intra-disk Parallelism: An Idea Whose Time Has Come. | 2008 |
| 33 | Demand-driven software race detection using hardware performance counters. | 2011 |
| 33 | Exploring memory consistency for massively-threaded throughput-oriented processors. | 2013 |
| 33 | STAG: Spintronic-Tape Architecture for GPGPU cache hierarchies. | 2014 |
| 33 | Half-DRAM: A high-bandwidth and low-power DRAM architecture from the rethinking of fine-grained activation. | 2014 |
| 32 | Tolerating process variations in nanophotonic on-chip networks. | 2012 |
| 32 | FLEXclusion: Balancing cache capacity and on-chip bandwidth via Flexible Exclusion. | 2012 |
| 32 | The locality-aware adaptive cache coherence protocol. | 2013 |
| 32 | Resilient die-stacked DRAM caches. | 2013 |
| 32 | Data reorganization in memory using 3D-stacked DRAM. | 2015 |
| 32 | ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars. | 2016 |
| 32 | Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators. | 2016 |
| 31 | Data marshaling for multi-core architectures. | 2010 |
| 31 | A case for globally shared-medium on-chip interconnect. | 2011 |
| 31 | Zombie memory: extending memory lifetime by reviving dead blocks. | 2013 |
| 31 | Rumba: an online quality management system for approximate computing. | 2015 |
| 31 | PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory. | 2016 |
| 30 | Conditional Memory Ordering. | 2006 |
| 30 | Interconnection Networks for Scalable Quantum Computers. | 2006 |
| 30 | Cooperative boosting: needy versus greedy power management. | 2013 |
| 29 | Aquacore: a programmable architecture for microfluidics. | 2007 |
| 29 | Boosting single-thread performance in multi-core systems through fine-grain multi-threading. | 2009 |
| 29 | Timetraveler: exploiting acyclic races for optimizing memory race recording. | 2010 |
| 29 | Harmony: Collection and analysis of parallel block vectors. | 2012 |
| 29 | PARDIS: A programmable memory controller for the DDRx interfacing standards. | 2012 |
| 29 | Virtualizing power distribution in datacenters. | 2013 |
| 29 | The Dirty-Block Index. | 2014 |
| 29 | Architecting to achieve a billion requests per second throughput on a single key-value store server platform. | 2015 |
| 27 | Revisiting hardware-assisted page walks for virtualized systems. | 2012 |
| 27 | A first-order mechanistic model for architectural vulnerability factor. | 2012 |
| 27 | A micro-architectural analysis of switched photonic multi-chip interconnects. | 2012 |
| 27 | Agile, efficient virtualization power management with low-latency server power states. | 2013 |
| 27 | Redundant memory mappings for fast access to large memories. | 2015 |
| 27 | DjiNN and Tonic: DNN as a service and its implications for future warehouse scale computers. | 2015 |
| 26 | DNA-based molecular architecture with spatially localized components. | 2013 |
| 26 | Unifying on-chip and inter-node switching within the Anton 2 network. | 2014 |
| 26 | BlueDBM: an appliance for big data analytics. | 2015 |
| 25 | Ginger: control independence using tag rewriting. | 2007 |
| 25 | Flexible reference-counting-based hardware acceleration for garbage collection. | 2009 |
| 25 | A memory system design framework: creating smart memories. | 2009 |
| 25 | Rebound: scalable checkpointing for coherent shared memory. | 2011 |
| 25 | VRSync: Characterizing and eliminating synchronization-induced voltage emergencies in many-core processors. | 2012 |
| 25 | Protozoa: adaptive granularity cache coherence. | 2013 |
| 25 | QuickSAN: a storage area network for fast, distributed, solid state disks. | 2013 |
| 25 | Architecture implications of pads as a scarce resource. | 2014 |
| 24 | Slackened Memory Dependence Enforcement: Combining Opportunistic Forwarding with Decoupled Verification. | 2006 |
| 24 | Boosting mobile GPU performance with a decoupled access/execute fragment processor. | 2012 |
| 24 | Studying multicore processor scaling via reuse distance analysis. | 2013 |
| 24 | Quantitative comparison of hardware transactional memory for Blue Gene/Q, zEnterprise EC12, Intel Core, and POWER8. | 2015 |
| 24 | Warped-compression: enabling power efficient GPUs through register compression. | 2015 |
| 24 | Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks. | 2016 |
| 23 | Improving writeback efficiency with decoupled last-write prediction. | 2012 |
| 23 | Lane decoupling for improving the timing-error resiliency of wide-SIMD architectures. | 2012 |
| 23 | SIMD divergence optimization through intra-warp compaction. | 2013 |
| 23 |
Maximizing SIMD resource utilization in GPGPUs with SIMD lane permutation. |
2013 |
| 22 |
Bit mapping for balanced PCM cell programming. |
2013 |
| 22 |
Dynamic reduction of voltage margins by leveraging on-chip ECC in Itanium II processors. |
2013 |
| 22 |
Eliminating redundant fragment shader executions on a mobile GPU via hardware memoization. |
2014 |
| 22 |
A case for core-assisted bottleneck acceleration in GPUs: enabling flexible data compression with assist warps. |
2015 |
| 21 |
Distributed Arithmetic on a Quantum Multicomputer. |
2006 |
| 21 |
Dynamic MIPS rate stabilization in out-of-order processors. |
2009 |
| 21 |
Moguls: a model to explore the memory hierarchy for bandwidth improvements. |
2011 |
| 21 |
WeeFence: toward making fences free in TSO. |
2013 |
| 21 |
Going vertical in memory management: Handling multiplicity by multi-policy. |
2014 |
| 21 |
SleepScale: Runtime joint speed scaling and sleep states management for power efficient data centers. |
2014 |
| 21 |
Dynamic thread block launch: a lightweight execution mechanism to support irregular applications on GPUs. |
2015 |
| 21 |
CAWA: coordinated warp scheduling and cache prioritization for critical warp acceleration of GPGPU workloads. |
2015 |
| 21 |
A fully associative, tagless DRAM cache. |
2015 |
| 20 |
Leveraging the core-level complementary effects of PVT variations to reduce timing emergencies in multi-core processors. |
2010 |
| 20 |
Flexible auto-refresh: enabling scalable and energy-efficient DRAM refresh reductions. |
2015 |
| 20 |
BEAR: techniques for mitigating bandwidth bloat in gigascale DRAM caches. |
2015 |
| 19 |
End-to-end register data-flow continuous self-test. |
2009 |
| 19 |
OUTRIDER: efficient memory latency tolerance with decoupled strands. |
2011 |
| 19 |
Inspection resistant memory: Architectural support for security from physical examination. |
2012 |
| 19 |
QuickRec: prototyping an intel architecture extension for record and replay of multithreaded programs. |
2013 |
| 19 |
Single-graph multiple flows: Energy efficient design alternative for GPGPUs. |
2014 |
| 19 |
Exploring the potential of heterogeneous von neumann/dataflow execution models. |
2015 |
| 18 |
HELIX-RC: An architecture-compiler co-design for automatic parallelization of irregular programs. |
2014 |
| 18 |
Stash: have your scratchpad and cache it too. |
2015 |
| 18 |
Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing. |
2016 |
| 17 |
The Future of Virtualization Technology. |
2006 |
| 17 |
Necromancer: enhancing system throughput by animating dead cores. |
2010 |
| 17 |
Optimizing virtual machine consolidation performance on NUMA server architecture for cloud workloads. |
2014 |
| 17 |
HIOS: A host interface I/O scheduler for Solid State Disks. |
2014 |
| 17 |
Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems. |
2016 |
| 16 |
CPU transparent protection of OS kernel and hypervisor integrity with programmable DRAM. |
2013 |
| 16 |
Towards sustainable in-situ server systems in the big data era. |
2015 |
| 16 |
HEB: deploying and managing hybrid energy buffers for improving datacenter efficiency and economy. |
2015 |
| 15 |
Counting Dependence Predictors. |
2008 |
| 15 |
Microcoded Architectures for Ion-Tap Quantum Computers. |
2008 |
| 15 |
Sentry: light-weight auxiliary memory access control. |
2010 |
| 15 |
CODOMs: Protecting software with Code-centric memory Domains. |
2014 |
| 15 |
Real-world design and evaluation of compiler-managed GPU redundant multithreading. |
2014 |
| 15 |
EOLE: Paving the way for an effective implementation of value prediction. |
2014 |
| 15 |
Multiple clone row DRAM: a low latency and area optimized DRAM. |
2015 |
| 14 |
Ten ways to waste a parallel computer. |
2009 |
| 14 |
Viper: Virtual pipelines for enhanced reliability. |
2012 |
| 14 |
Enhancing effective throughput for transmission line-based bus. |
2012 |
| 14 |
STREX: boosting instruction cache reuse in OLTP workloads through stratified transaction execution. |
2013 |
| 14 |
Secure I/O device sharing among virtual machines on multiple hosts. |
2013 |
| 14 |
Page overlays: an enhanced virtual memory framework to enable fine-grained memory management. |
2015 |
| 14 |
Hi-fi playback: tolerating position errors in shift operations of racetrack memory. |
2015 |
| 13 |
Performance and security lessons learned from virtualizing the alpha processor. |
2007 |
| 13 |
The rebirth of neural networks. |
2010 |
| 13 |
CRIB: consolidated rename, issue, and bypass. |
2011 |
| 13 |
ArchRanker: A ranking approach to design space exploration. |
2014 |
| 13 |
Fine-grain task aggregation and coordination on GPUs. |
2014 |
| 13 |
GangES: Gang error simulation for hardware resiliency evaluation. |
2014 |
| 13 |
Manycore network interfaces for in-memory rack-scale computing. |
2015 |
| 13 |
Callback: efficient synchronization without invalidation with a directory just for spin-waiting. |
2015 |
| 13 |
ArMOR: defending against memory consistency model mismatches in heterogeneous architectures. |
2015 |
| 13 |
Accelerating Dependent Cache Misses with an Enhanced Memory Controller. |
2016 |
| 13 |
Neurocube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory. |
2016 |
| 12 |
Fusion: design tradeoffs in coherent cache hierarchies for accelerators. |
2015 |
| 12 |
SLIP: reducing wire energy in the memory hierarchy. |
2015 |
| 12 |
Cambricon: An Instruction Set Architecture for Neural Networks. |
2016 |
| 11 |
Tailoring quantum architectures to implementation style: a quantum computer for mobile and persistent qubits. |
2007 |
| 11 |
End-to-end performance forecasting: finding bottlenecks before they happen. |
2009 |
| 11 |
Microarchitectural mechanisms to exploit value structure in SIMT architectures. |
2013 |
| 11 |
OmniOrder: Directory-based conflict serialization of transactions. |
2014 |
| 11 |
Harmonia: balancing compute and memory power in high-performance GPUs. |
2015 |
| 11 |
RedEye: Analog ConvNet Image Sensor Architecture for Continuous Mobile Vision. |
2016 |
| 10 |
Architectural implications of brick and mortar silicon manufacturing. |
2007 |
| 10 |
Decoupled store completion/silent deterministic replay: enabling scalable data memory for CPR/CFP processors. |
2009 |
| 10 |
Improving virtualization in the presence of software managed translation lookaside buffers. |
2013 |
| 10 |
Increasing off-chip bandwidth in multi-core processors with switchable pins. |
2014 |
| 10 |
Race Logic: A hardware acceleration for dynamic programming algorithms. |
2014 |
| 10 |
Flexible software profiling of GPU architectures. |
2015 |
| 9 |
Increased Scalability and Power Efficiency by Using Multiple Speed Pipelines. |
2005 |
| 9 |
A Two-Level Load/Store Queue Based on Execution Locality. |
2008 |
| 9 |
Replay debugging: Leveraging record and replay for program debugging. |
2014 |
| 9 |
Navigating the cache hierarchy with a single lookup. |
2014 |
| 9 |
An examination of the architecture and system-level tradeoffs of employing steep slope devices in 3D CMPs. |
2014 |
| 9 |
Avoiding core’s DUE&SDC via acoustic wave detectors and tailored error containment and recovery. |
2014 |
| 9 |
Thermal time shifting: leveraging phase change materials to reduce cooling costs in warehouse-scale computers. |
2015 |
| 9 |
Probable cause: the deanonymizing effects of approximate DRAM. |
2015 |
| 9 |
COP: to compress and protect main memory. |
2015 |
| 8 |
Setting an error detection infrastructure with low cost acoustic wave detectors. |
2012 |
| 8 |
A low power and reliable charge pump design for Phase Change Memories. |
2014 |
| 8 |
Improving the energy efficiency of Big Cores. |
2014 |
| 8 |
Row-buffer decoupling: A case for low-latency DRAM microarchitecture. |
2014 |
| 8 |
Reducing access latency of MLC PCMs through line striping. |
2014 |
| 8 |
DynaSpAM: dynamic spatial architecture mapping using out of order instruction schedules. |
2015 |
| 8 |
PrORAM: dynamic prefetcher for oblivious RAM. |
2015 |
| 8 |
Computer performance microscopy with Shim. |
2015 |
| 8 |
CloudMonatt: an architecture for security health monitoring and attestation of virtual machines in cloud computing. |
2015 |
| 8 |
LaPerm: Locality Aware Scheduler for Dynamic Parallelism on GPUs. |
2016 |
| 7 |
Energy-Effectiveness of Pre-Execution and Energy-Aware P-Thread Selection. |
2005 |
| 7 |
Moving the needle, computer architecture research in academe and industry. |
2010 |
| 7 |
FlexBulk: intelligently forming atomic blocks in blocked-execution multiprocessors to minimize squashes. |
2011 |
| 7 |
Non-race concurrency bug detection through order-sensitive critical sections. |
2013 |
| 7 |
FASE: finding amplitude-modulated side-channel emanations. |
2015 |
| 7 |
The load slice core microarchitecture. |
2015 |
| 6 |
The End of Scaling? Revolutions in Technology and Microarchitecture as We Pass the 90 Nanometer Node. |
2006 |
| 6 |
Fetch-Criticality Reduction through Control Independence. |
2008 |
| 6 |
Accelerating asynchronous programs through event sneak peek. |
2015 |
| 6 |
Reducing world switches in virtualized environment with flexible cross-world calls. |
2015 |
| 6 |
Semantic locality and context-based prefetching using reinforcement learning. |
2015 |
| 6 |
Efficient execution of memory access phases using dataflow specialization. |
2015 |
| 6 |
Clean: a race detector with cleaner semantics. |
2015 |
| 6 |
A variable warp size architecture. |
2015 |
| 6 |
Coherence protocol for transparent management of scratchpad memories in shared memory manycore architectures. |
2015 |
| 6 |
Dynamo: Facebook’s Data Center-Wide Power Management System. |
2016 |
| 6 |
Warped-Slicer: Efficient Intra-SM Slicing through Dynamic Resource Partitioning for GPU Multiprogramming. |
2016 |
| 6 |
Agile Paging: Exceeding the Best of Nested and Shadow Paging. |
2016 |
| 5 |
Improving the future by examining the past. |
2010 |
| 5 |
Euripus: A flexible unified hardware memory checkpointing accelerator for bidirectional-debugging and reliability. |
2012 |
| 5 |
Configurable fine-grain protection for multicore processor virtualization. |
2012 |
| 5 |
Quantum rotations: a case study in static and dynamic machine-code generation for quantum computers. |
2013 |
| 5 |
Unified address translation for memory-mapped SSDs with FlashMap. |
2015 |
| 5 |
Efficient Synonym Filtering and Scalable Delayed Translation for Hybrid Virtual Caching. |
2016 |
| 5 |
Automatic Generation of Efficient Accelerators for Reconfigurable Hardware. |
2016 |
| 5 |
Energy Efficient Architecture for Graph Analytics Accelerators. |
2016 |
| 5 |
MITTS: Memory Inter-arrival Time Traffic Shaping. |
2016 |
| 5 | Biscuit: A Framework for Near-Data Processing of Big Data Workloads. | 2016 |
| 5 | CASH: Supporting IaaS Customers with a Sub-core Configurable Architecture. | 2016 |
| 4 | IVEC: off-chip memory integrity protection for both security and reliability. | 2010 |
| 4 | MemGuard: A low cost and energy efficient design to support and enhance memory system reliability. | 2014 |
| 4 | Fractal++: Closing the performance gap between fractal and conventional coherence. | 2014 |
| 4 | Branch vanguard: decomposing branch functionality into prediction and resolution instructions. | 2015 |
| 4 | SHRINK: reducing the ISA complexity via instruction recycling. | 2015 |
| 4 | MiSAR: minimalistic synchronization accelerator with resource overflow management. | 2015 |
| 4 | Back to the Future: Leveraging Belady’s Algorithm for Improved Cache Replacement. | 2016 |
| 4 | Mellow Writes: Extending Lifetime in Resistive Memories through Selective Slow Write Backs. | 2016 |
| 4 | Using Multiple Input, Multiple Output Formal Control to Maximize Resource Efficiency in Architectures. | 2016 |
| 4 | ASIC Clouds: Specializing the Datacenter. | 2016 |
| 4 | Virtual Thread: Maximizing Thread-Level Parallelism beyond GPU Scheduling Limit. | 2016 |
| 3 | Releasing efficient beta cores to market early. | 2011 |
| 3 | BlockChop: Dynamic squash elimination for hybrid processor architecture. | 2012 |
| 3 | Pacifier: Record and replay for relaxed-consistency multiprocessors with distributed directory protocol. | 2014 |
| 3 | MBus: an ultra-low power interconnect bus for next generation nanopower systems. | 2015 |
| 3 | Cost-effective speculative scheduling in high performance processors. | 2015 |
| 3 | Treadmill: Attributing the Source of Tail Latency through Precise Load Testing and Statistical Inference. | 2016 |
| 3 | Peak Efficiency Aware Scheduling for Highly Energy Proportional Servers. | 2016 |
| 2 | Shared caches in multicores: the good, the bad, and the ugly. | 2010 |
| 2 | Deconfigurable microprocessor architectures for silicon debug acceleration. | 2013 |
| 2 | VIP: virtualizing IP chains on handheld platforms. | 2015 |
| 2 | Bit-Plane Compression: Transforming Data for Better Compression in Many-Core Architectures. | 2016 |
| 2 | ARM Virtualization: Performance and Architectural Implications. | 2016 |
| 2 | Exploiting Dynamic Timing Slack for Energy Efficiency in Ultra-Low-Power Embedded Systems. | 2016 |
| 2 | PowerChop: Identifying and Managing Non-critical Units in Hybrid Processor Architectures. | 2016 |
| 2 | Future Vector Microprocessor Extensions for Data Aggregations. | 2016 |
| 2 | LAP: Loop-Block Aware Inclusion Properties for Energy-Efficient Asymmetric Last Level Caches. | 2016 |
| 2 | Morpheus: Creating Application Objects Efficiently for Heterogeneous Computing. | 2016 |
| 2 | Towards Statistical Guarantees in Controlling Quality Tradeoffs for Approximate Acceleration. | 2016 |
| 2 | ActivePointers: A Case for Software Address Translation on GPUs. | 2016 |
| 2 | The Anytime Automaton. | 2016 |
| 1 | Computer Architecture Research and Future Microprocessors: Where Do We Go from Here? | 2006 |
| 1 | Efficient digital neurons for large scale cortical architectures. | 2014 |
| 1 | FaultHound: value-locality-based soft-fault tolerance. | 2015 |
| 1 | Accelerating Markov Random Field Inference Using Molecular Optical Gibbs Sampling Units. | 2016 |
| 1 | Strober: Fast and Accurate Sample-Based Energy Simulation for Arbitrary RTL. | 2016 |
| 1 | Decoupling Loads for Nano-Instruction Set Computers. | 2016 |
| 1 | Efficiently Scaling Out-of-Order Cores for Simultaneous Multithreading. | 2016 |
| 1 | Energy Efficient Data Encoding in DRAM Channels Exploiting Data Value Similarity. | 2016 |
| 1 | Boosting Access Parallelism to PCM-Based Main Memory. | 2016 |
| 1 | Power Attack Defense: Securing Battery-Backed Data Centers. | 2016 |
| 1 | Short-Circuit Dispatch: Accelerating Virtual Machine Interpreters on Embedded Processors. | 2016 |
| 1 | APRES: Improving Cache Efficiency by Exploiting Load Characteristics on GPUs. | 2016 |
| 1 | Rescuing Uncorrectable Fault Patterns in On-Chip Memories through Error Pattern Transformation. | 2016 |
| 1 | Asymmetry-Aware Work-Stealing Runtimes. | 2016 |
| 1 | XED: Exposing On-Die Error Detection Information for Strong Memory Reliability. | 2016 |
| 1 | All-Inclusive ECC: Thorough End-to-End Protection for Reliable Computer Memory. | 2016 |
| 0 | Message from the General Chair. | 2006 |
| 0 | Message from the Program Chair. | 2006 |
| 0 | SIGARCH Guidelines. | 2006 |
| 0 | LaZy superscalar. | 2015 |
| 0 | DRAF: A Low-Power DRAM-Based Reconfigurable Acceleration Fabric. | 2016 |
| 0 | Opportunistic Competition Overhead Reduction for Expediting Critical Section in NoC Based CMPs. | 2016 |
| 0 | Production-Run Software Failure Diagnosis via Adaptive Communication Tracking. | 2016 |
| 0 | RelaxFault Memory Repair. | 2016 |
| 0 | Evaluation of an Analog Accelerator for Linear Algebra. | 2016 |
| 0 | Base-Victim Compression: An Opportunistic Cache Compression Architecture. | 2016 |