| Cited by |
Paper title |
Year |
| 826 |
PowerNap: eliminating server idle power. |
2009 |
| 814 |
A comparison of software and hardware techniques for x86 virtualization. |
2006 |
| 680 |
Learning from mistakes: a comprehensive study on real world concurrency bug characteristics. |
2008 |
| 627 |
DFTL: a flash translation layer employing demand-based selective caching of page-level address mappings. |
2009 |
| 610 |
“No “”power”” struggles: coordinated multi-level power management for the data center. “ |
2008 |
| 549 |
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs. |
2006 |
| 540 |
Addressing shared resource contention in multicore processors via scheduling. |
2010 |
| 498 |
Hybrid transactional memory. |
2006 |
| 485 |
Clearing the clouds: a study of emerging scale-out workloads on modern hardware. |
2012 |
| 405 |
Conservation cores: reducing the energy of mature computations. |
2010 |
| 392 |
Overshadow: a virtualization-based approach to retrofitting protection in commodity operating systems. |
2008 |
| 377 |
AVIO: detecting atomicity violations via access interleaving invariants. |
2006 |
| 377 |
Accelerator: using data parallelism to program GPUs for general-purpose uses. |
2006 |
| 370 |
Accurate and efficient regression modeling for microarchitectural performance and power prediction. |
2006 |
| 369 |
Kendo: efficient deterministic multithreading in software. |
2009 |
| 326 |
S2E: a platform for in-vivo multi-path analysis of software systems. |
2011 |
| 318 |
Merge: a programming model for heterogeneous multi-core systems. |
2008 |
| 314 |
Mnemosyne: lightweight persistent memory. |
2011 |
| 310 |
Combinatorial sketching for finite programs. |
2006 |
| 307 |
NV-Heaps: making persistent objects fast and safe with next-generation, non-volatile memories. |
2011 |
| 300 |
DMP: deterministic shared memory multiprocessing. |
2009 |
| 275 |
CoreDet: a compiler and runtime system for deterministic multithreaded execution. |
2010 |
| 272 |
Flikker: saving DRAM refresh-power through critical data partitioning. |
2011 |
| 269 |
Accelerating critical section execution with asymmetric multi-core architectures. |
2009 |
| 268 |
Early experience with a commercial hardware transactional memory implementation. |
2009 |
| 267 |
CTrigger: exposing atomicity violation bugs from their hiding places. |
2009 |
| 266 |
Efficiently exploring architectural design spaces via predictive modeling. |
2006 |
| 258 |
Gordon: using flash memory to build fast, power-efficient clusters for data-intensive applications. |
2009 |
| 256 |
Architecture support for disciplined approximate programming. |
2012 |
| 248 |
Paragon: QoS-aware scheduling for heterogeneous datacenters. |
2013 |
| 239 |
Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems. |
2010 |
| 232 |
Producing wrong data without doing anything obviously wrong! |
2009 |
| 227 |
Supporting nested transactional memory in logTM. |
2006 |
| 225 |
Mercury and freon: temperature emulation and management for server systems. |
2006 |
| 220 |
DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning. |
2014 |
| 217 |
Understanding the propagation of hard errors to software and implications for resilient system design. |
2008 |
| 213 |
Quasar: resource-efficient and QoS-aware cluster management. |
2014 |
| 212 |
PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor. |
2006 |
| 199 |
Geiger: monitoring the buffer cache in a virtual machine environment. |
2006 |
| 199 |
MemScale: active low-power modes for main memory. |
2011 |
| 197 |
Dynamic knobs for responsive power-aware computing. |
2011 |
| 194 |
Tarazu: optimizing MapReduce on heterogeneous clusters. |
2012 |
| 180 |
Joint optimization of idle and cooling power in data centers while maintaining response time. |
2010 |
| 177 |
Green-Marl: a DSL for easy and efficient graph analysis. |
2012 |
| 176 |
RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations. |
2009 |
| 175 |
An asymmetric distributed shared memory model for heterogeneous parallel systems. |
2010 |
| 172 |
Faults in linux: ten years later. |
2011 |
| 171 |
SherLog: error diagnosis by connecting clues from run-time logs. |
2010 |
| 170 |
Feedback-driven threading: power-efficient and high-performance execution of multi-threaded workloads on CMPs. |
2008 |
| 169 |
ELI: bare-metal performance for I/O virtualization. |
2012 |
| 168 |
Accelerating two-dimensional page walks for virtualized systems. |
2008 |
| 164 |
Cosmic rays don’t strike twice: understanding the nature of DRAM errors and the implications for system design. |
2012 |
| 163 |
A randomized scheduler with probabilistic guarantees of finding bugs. |
2010 |
| 161 |
Micro-pages: increasing DRAM efficiency with locality-aware data placement. |
2010 |
| 161 |
On-the-fly elimination of dynamic irregularities for GPU computing. |
2011 |
| 157 |
Unikernels: library operating systems for the cloud. |
2013 |
| 156 |
Shoestring: probabilistic soft error reliability on the cheap. |
2010 |
| 155 |
ASSURE: automatic software self-healing using rescue points. |
2009 |
| 153 |
DoublePlay: parallelizing sequential logging and replay. |
2011 |
| 151 |
Parasol and GreenSwitch: managing datacenters powered by renewable energy. |
2013 |
| 150 |
Dynamically replicated memory: building reliable systems from nanoscale resistive memories. |
2010 |
| 147 |
OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance. |
2013 |
| 144 |
Recording shared memory dependencies using strata. |
2006 |
| 143 |
Ultra low-cost defect protection for microprocessor pipelines. |
2006 |
| 142 |
A performance counter architecture for computing accurate CPI components. |
2006 |
| 140 |
DejaVu: accelerating resource allocation in virtualized environments. |
2012 |
| 137 |
Rethinking the library OS from the top down. |
2011 |
| 136 |
Capo: a software-hardware interface for practical deterministic multiprocessor replay. |
2009 |
| 133 |
Whole-system persistence. |
2012 |
| 132 |
Respec: efficient online multiprocessor replayvia speculation and external determinism. |
2010 |
| 132 |
Blink: managing server clusters on intermittent power. |
2011 |
| 131 |
Traffic management: a holistic approach to memory placement on NUMA systems. |
2013 |
| 130 |
Adaptive set pinning: managing shared caches in chip multiprocessors. |
2008 |
| 128 |
Improving software diagnosability via log enhancement. |
2011 |
| 128 |
InkTag: secure applications on an untrusted operating system. |
2013 |
| 127 |
Parallelizing security checks on commodity hardware. |
2008 |
| 127 |
Complete information flow tracking from the gates up. |
2009 |
| 124 |
Hybrid NOrec: a case study in the effectiveness of best effort hardware transactional memory. |
2011 |
| 123 |
A regulated transitive reduction (RTR) for longer memory race recording. |
2006 |
| 120 |
Computation spreading: employing hardware migration to specialize CMP cores on-the-fly. |
2006 |
| 113 |
Hardbound: architectural support for spatial safety of the C programming language. |
2008 |
| 111 |
Power routing: dynamic power provisioning in the data center. |
2010 |
| 110 |
Unbounded page-based transactional memory. |
2006 |
| 109 |
Providing safe, user space access to fast, solid state disks. |
2012 |
| 106 |
Ensuring operating system kernel integrity with OSck. |
2011 |
| 104 |
Flexible architectural support for fine-grain scheduling. |
2010 |
| 102 |
Virtualized and flexible ECC for main memory. |
2010 |
| 101 |
Bell: bit-encoding online memory leak detection. |
2006 |
| 101 |
The design and implementation of microdrivers. |
2008 |
| 101 |
Speculative parallelization using software multi-threaded transactions. |
2010 |
| 101 |
Architectural support for hypervisor-secure virtualization. |
2012 |
| 99 |
Better bug reporting with better privacy. |
2008 |
| 97 |
Bottleneck identification and scheduling in multithreaded applications. |
2012 |
| 95 |
Automatic generation of peephole superoptimizers. |
2006 |
| 93 |
Tradeoffs in transactional memory virtualization. |
2006 |
| 92 |
Streamware: programming general-purpose multicore processors using streams. |
2008 |
| 92 |
Optimistic parallelism benefits from data partitioning. |
2008 |
| 91 |
A power-efficient all-optical on-chip interconnect using wavelength-based oblivious routing. |
2010 |
| 91 |
ConMem: detecting severe concurrency bugs through an effect-oriented approach. |
2010 |
| 90 |
ConSeq: detecting concurrency bugs through sequential errors. |
2011 |
| 89 |
Leveraging stored energy for handling power emergencies in aggressively provisioned datacenters. |
2012 |
| 89 |
Improving GPGPU concurrency with elastic kernels. |
2013 |
| 88 |
Stochastic superoptimization. |
2013 |
| 88 |
KVM/ARM: the design and implementation of the linux ARM hypervisor. |
2014 |
| 87 |
RCDC: a relaxed consistency deterministic computer. |
2011 |
| 87 |
SDF: software-defined flash for web-scale internet storage systems. |
2014 |
| 85 |
Mementos: system support for long-running computation on RFID-scale devices. |
2011 |
| 84 |
Sponge: portable stream programming on graphics engines. |
2011 |
| 80 |
How low can you go?: recommendations for hardware-supported minimal TCB code execution. |
2008 |
| 80 |
Relyzer: exploiting application-level fault equivalence to analyze application resiliency to transient faults. |
2012 |
| 80 |
Iago attacks: why the system call API is a bad untrusted RPC interface. |
2013 |
| 79 |
Understanding and visualizing full systems with data flow tomography. |
2008 |
| 79 |
Paraprox: pattern-based approximation for data parallel applications. |
2014 |
| 76 |
Q100: the architecture and design of a database processing unit. |
2014 |
| 74 |
Pocket cloudlets. |
2011 |
| 74 |
Data races vs. data race bugs: telling the difference with portend. |
2012 |
| 72 |
Looking back on the language and hardware revolutions: measured power, performance, and scaling. |
2011 |
| 72 |
Scalable address spaces using RCU balanced trees. |
2012 |
| 71 |
Tartan: evaluating spatial computation for whole program execution. |
2006 |
| 71 |
Characterizing processor thermal behavior. |
2010 |
| 71 |
Inter-core cooperative TLB for chip multiprocessors. |
2010 |
| 70 |
Temporal search: detecting hidden malware timebombs with virtual machines. |
2006 |
| 70 |
GPUfs: integrating a file system with GPUs. |
2013 |
| 68 |
Introspective 3D chips. |
2006 |
| 68 |
Uncertain: a first-order type for uncertain data. |
2014 |
| 68 |
Using ARM trustzone to build a trusted language runtime for mobile applications. |
2014 |
| 67 |
Probabilistic job symbiosis modeling for SMT processor scheduling. |
2010 |
| 66 |
Adapting to intermittent faults in multicore systems. |
2008 |
| 66 |
Inter-core prefetching for multicore processors using migrating helper threads. |
2011 |
| 65 |
Virtual ghost: protecting applications from hostile operating systems. |
2014 |
| 64 |
An evaluation of the TRIPS computer system. |
2009 |
| 64 |
CRUISE: cache replacement and utility-aware scheduling. |
2012 |
| 61 |
Per-thread cycle accounting in SMT processors. |
2009 |
| 59 |
Scale-out NUMA. |
2014 |
| 58 |
Integrated network interfaces for high-bandwidth TCP/IP. |
2006 |
| 58 |
Archipelago: trading address space for reliability and security. |
2008 |
| 58 |
Mixed-mode multicore reliability. |
2009 |
| 58 |
Understanding modern device drivers. |
2012 |
| 58 |
Portable performance on heterogeneous architectures. |
2013 |
| 58 |
Memory Errors in Modern Systems: The Good, The Bad, and The Ugly. |
2015 |
| 57 |
Analyzing multicore dumps to facilitate concurrency bug reproduction. |
2010 |
| 56 |
Software-based instruction caching for embedded processors. |
2006 |
| 56 |
Execution migration in a heterogeneous-ISA chip multiprocessor. |
2012 |
| 56 |
STABILIZER: statistically sound performance evaluation. |
2013 |
| 55 |
A spatial path scheduling algorithm for EDGE architectures. |
2006 |
| 55 |
Power containers: an OS facility for fine-grained power and energy management on multicore servers. |
2013 |
| 54 |
ISOLATOR: dynamically ensuring isolation in comcurrent programs. |
2009 |
| 54 |
Decoupling contention management from scheduling. |
2010 |
| 54 |
NVM duet: unified working memory and persistent store architecture. |
2014 |
| 53 |
Recovery domains: an organizing principle for recoverable operating systems. |
2009 |
| 53 |
DreamWeaver: architectural support for deep sleep. |
2012 |
| 53 |
PuDianNao: A Polyvalent Machine Learning Accelerator. |
2015 |
| 52 |
PICSEL: measuring user-perceived performance to control dynamic frequency scaling. |
2008 |
| 52 |
PocketWeb: instant web browsing for mobile devices. |
2012 |
| 51 |
ApproxHadoop: Bringing Approximations to MapReduce Frameworks. |
2015 |
| 50 |
Reflex: using low-power processors in smartphones without knowing them. |
2012 |
| 50 |
Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers. |
2015 |
| 49 |
SlicK: slice-based locality exploitation for efficient redundant multithreading. |
2006 |
| 48 |
HOTL: a higher order theory of locality. |
2013 |
| 47 |
A probabilistic pointer analysis for speculative optimizations. |
2006 |
| 46 |
Efficiency trends and limits from comprehensive microarchitectural adaptivity. |
2008 |
| 45 |
Leak pruning. |
2009 |
| 44 |
2ndStrike: toward manifesting hidden concurrency typestate bugs. |
2011 |
| 44 |
Using likely invariants for automated software fault localization. |
2013 |
| 44 |
Fine-grained fault tolerance using device checkpoints. |
2013 |
| 44 |
Architectural support for address translation on GPUs: designing memory management units for CPU/GPUs with unified address spaces. |
2014 |
| 43 |
Understanding prediction-based partial redundant threading for low-overhead, high- coverage fault tolerance. |
2006 |
| 43 |
Ubik: efficient cache sharing with strict qos for latency-critical workloads. |
2014 |
| 43 |
Price theory based power management for heterogeneous multi-cores. |
2014 |
| 42 |
Exploiting access semantics and program behavior to reduce snoop power in chip multiprocessors. |
2008 |
| 42 |
Efficient online validation with delta execution. |
2009 |
| 42 |
Commutativity analysis for software parallelization: letting program transformations see the big picture. |
2009 |
| 42 |
ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications. |
2010 |
| 42 |
Heterogeneous-race-free memory models. |
2014 |
| 42 |
REF: resource elasticity fairness with sharing incentives for multiprocessors. |
2014 |
| 41 |
Comprehensively and efficiently protecting the heap. |
2006 |
| 41 |
DeNovoND: efficient hardware support for disciplined non-determinism. |
2013 |
| 40 |
Safe and automatic live update for operating systems. |
2013 |
| 40 |
EnCore: exploiting system environment and correlation information for misconfiguration detection. |
2014 |
| 39 |
SoftSig: software-exposed hardware signatures for code analysis and optimization. |
2008 |
| 39 |
Path-exploration lifting: hi-fi tests for lo-fi emulators. |
2012 |
| 39 |
Post-compiler software optimization for reducing energy. |
2014 |
| 38 |
Predictor virtualization. |
2008 |
| 38 |
TwinDrivers: semi-automatic derivation of fast and safe hypervisor network drivers from guest OS drivers. |
2009 |
| 38 |
Efficient sequential consistency via conflict ordering. |
2012 |
| 38 |
Data-parallel finite-state machines. |
2014 |
| 38 |
GPU Concurrency: Weak Behaviours and Programming Assumptions. |
2015 |
| 37 |
Hardware counter driven on-the-fly request signatures. |
2008 |
| 36 |
Orchestration by approximation: mapping stream programs onto multicore architectures. |
2011 |
| 35 |
HeapMD: identifying heap-based bugs using anomaly detection. |
2006 |
| 35 |
Stealth prefetching. |
2006 |
| 35 |
A defect tolerant self-organizing nanoscale SIMD architecture. |
2006 |
| 34 |
Instruction scheduling for a tiled dataflow architecture. |
2006 |
| 34 |
Demand-based coordinated scheduling for SMP VMs. |
2013 |
| 34 |
Protecting Data on Smartphones and Tablets from Memory Attacks. |
2015 |
| 34 |
Beyond the PDP-11: Architectural Support for a Memory-Safe C Abstract Machine. |
2015 |
| 33 |
A new idiom recognition framework for exploiting hardware-assist instructions. |
2006 |
| 33 |
Optimal task assignment in multithreaded processors: a statistical approach. |
2012 |
| 33 |
Verifying systems rules using rule-directed symbolic execution. |
2013 |
| 33 |
Verifying security invariants in ExpressOS. |
2013 |
| 33 |
A study of the scalability of stop-the-world garbage collectors on multicores. |
2013 |
| 33 |
K2: a mobile operating system for heterogeneous coherence domains. |
2014 |
| 33 |
Transactionalizing legacy code: an experience report using GCC and Memcached. |
2014 |
| 33 |
Disengaged scheduling for fair, protected access to fast computational accelerators. |
2014 |
| 33 |
Page Placement Strategies for GPUs within Heterogeneous Memory Systems. |
2015 |
| 33 |
GhostRider: A Hardware-Software System for Memory Trace Oblivious Computation. |
2015 |
| 33 |
Asynchronized Concurrency: The Secret to Scaling Concurrent Search Data Structures. |
2015 |
| 32 |
Xoc, an extension-oriented compiler for systems programming. |
2008 |
| 32 |
Computational sprinting on a hardware/software testbed. |
2013 |
| 32 |
PolyMage: Automatic Optimization for Image Processing Pipelines. |
2015 |
| 31 |
Mapping esterel onto a multi-threaded embedded processor. |
2006 |
| 31 |
Tapping into the fountain of CPUs: on operating system support for programmable devices. |
2008 |
| 30 |
MacroSS: macro-SIMDization of streaming applications. |
2010 |
| 30 |
A case for neuromorphic ISAs. |
2011 |
| 30 |
Hardware acceleration of transactional memory on commodity systems. |
2011 |
| 30 |
Region scheduling: efficiently using the cache architectures via page-level affinity. |
2012 |
| 30 |
Regularities considered harmful: forcing randomness to memory accesses to reduce row buffer conflicts for multi-core, multi-bank systems. |
2013 |
| 30 |
ReQoS: reactive static/dynamic compilation for QoS in warehouse scale computers. |
2013 |
| 30 |
Deterministic galois: on-demand, portable and parameterless. |
2014 |
| 30 |
Mojim: A Reliable and Highly-Available Non-Volatile Memory System. |
2015 |
| 29 |
Exploring circuit timing-aware language and compilation. |
2011 |
| 29 |
Efficient processor support for DRFx, a memory model with exceptions. |
2011 |
| 29 |
Comprehensive kernel instrumentation via dynamic binary translation. |
2012 |
| 29 |
VSwapper: a memory swapper for virtualized environments. |
2014 |
| 29 |
FACADE: A Compiler and Runtime for (Almost) Object-Bounded Big Data Applications. |
2015 |
| 28 |
The mapping collector: virtual memory support for generational, parallel, and concurrent compaction. |
2008 |
| 28 |
Discerning the dominant out-of-order performance advantage: is it speculation or dynamism? |
2013 |
| 28 |
Production-run software failure diagnosis via hardware performance counters. |
2013 |
| 28 |
Parallelizing data race detection. |
2013 |
| 28 |
Reduced Hardware NOrec: A Safe and Scalable Hybrid Transactional Memory. |
2015 |
| 28 |
Chimera: Collaborative Preemption for Multitasking on a Shared GPU. |
2015 |
| 27 |
Improving the performance of object-oriented languages with dynamic predication of indirect jumps. |
2008 |
| 27 |
Maximum benefit from a minimal HTM. |
2009 |
| 27 |
COMPASS: a programmable data prefetcher using idle GPU shaders. |
2010 |
| 27 |
Chameleon: operating system support for dynamic processors. |
2012 |
| 27 |
ConAir: featherweight concurrency bug recovery via single-threaded idempotent execution. |
2013 |
| 27 |
DDOS: taming nondeterminism in distributed systems. |
2013 |
| 27 |
Underprovisioning backup power infrastructure for datacenters. |
2014 |
| 26 |
Orthrus: efficient software integrity protection on multi-cores. |
2010 |
| 26 |
Synthesizing concurrent schedulers for irregular algorithms. |
2011 |
| 26 |
Improving the performance of trace-based systems by false loop filtering. |
2011 |
| 26 |
Transparent mutable replay for multicore debugging and patch validation. |
2013 |
| 25 |
Cooperative empirical failure avoidance for multithreaded programs. |
2013 |
| 25 |
Integrated 3D-stacked server designs for increasing physical density of key-value stores. |
2014 |
| 25 |
Few-to-Many: Incremental Parallelism for Reducing Tail Latency in Interactive Services. |
2015 |
| 24 |
Butterfly analysis: adapting dataflow analysis to dynamic parallel monitoring. |
2010 |
| 24 |
Applying transactional memory to concurrency bugs. |
2012 |
| 24 |
A Probabilistic Graphical Model-based Approach for Minimizing Energy Under Performance Constraints. |
2015 |
| 24 |
Architectural Support for Software-Defined Metadata Processing. |
2015 |
| 23 |
Anomaly-based bug prediction, isolation, and validation: an automated approach for software debugging. |
2009 |
| 23 |
A real system evaluation of hardware atomicity for software speculation. |
2010 |
| 23 |
Automated repair of binary and assembly programs for cooperating embedded devices. |
2013 |
| 23 |
Energy-efficient work-stealing language runtimes. |
2014 |
| 23 |
A Hardware Design Language for Timing-Sensitive Information-Flow Security. |
2015 |
| 23 |
A DNA-Based Archival Storage System. |
2016 |
| 22 |
HICAMP: architectural support for efficient concurrency-safe shared structured data access. |
2012 |
| 22 |
Monitoring and Debugging the Quality of Results in Approximate Programs. |
2015 |
| 21 |
Accurate branch prediction for short threads. |
2008 |
| 21 |
Specifying and checking semantic atomicity for multithreaded programs. |
2011 |
| 21 |
SIMD defragmenter: efficient ILP realization on data-parallel architectures. |
2012 |
| 21 |
DeNovoSync: Efficient Support for Arbitrary Synchronization without Writer-Initiated Invalidations. |
2015 |
| 20 |
Phantom-BTB: a virtualized branch target buffer design. |
2009 |
| 20 |
A case for unlimited watchpoints. |
2012 |
| 20 |
GPUDet: a deterministic GPU architecture. |
2013 |
| 20 |
Volition: scalable and precise sequential consistency violation detection. |
2013 |
| 20 |
Cyrus: unintrusive application-level record-replay for replay parallelism. |
2013 |
| 20 |
Sapper: a language for hardware-level security policy enforcement. |
2014 |
| 20 |
SI-TM: reducing transactional memory abort rates through snapshot isolation. |
2014 |
| 20 |
NumaGiC: a Garbage Collector for Big Data on Big NUMA Machines. |
2015 |
| 20 |
Freecursive ORAM: [Nearly] Free Recursion and Integrity Verification for Position-based Oblivious RAM. |
2015 |
| 19 |
Dispersing proprietary applications as benchmarks through code mutation. |
2008 |
| 19 |
The benefit of SMT in the multi-core era: flexibility towards degrees of thread-level parallelism. |
2014 |
| 19 |
Hybrid Static: Dynamic Analysis for Statically Bounded Region Serializability. |
2015 |
| 18 |
StreamRay: a stream filtering architecture for coherent ray tracing. |
2009 |
| 18 |
Specifying and dynamically verifying address translation-aware memory consistency. |
2010 |
| 18 |
Aikido: accelerating shared data dynamic analyses. |
2012 |
| 18 |
Comprehending performance from real-world execution traces: a device-driver case. |
2014 |
| 18 |
Prototyping symbolic execution engines for interpreted languages. |
2014 |
| 17 |
Architectural implications of nanoscale integrated sensing and computing. |
2009 |
| 17 |
Request behavior variations. |
2010 |
| 17 |
Nested Kernel: An Operating System Architecture for Intra-Kernel Privilege Separation. |
2015 |
| 16 |
Architectural support for SWAR text processing with parallel bit streams: the inductive doubling principle. |
2009 |
| 16 |
A declarative language approach to device configuration. |
2011 |
| 16 |
Why you should care about quantile regression. |
2013 |
| 16 |
Rhythm: harnessing data parallel hardware for server workloads. |
2014 |
| 16 |
I/o paravirtualization at the device file boundary. |
2014 |
| 16 |
Targeted Automatic Integer Overflow Discovery Using Goal-Directed Conditional Branch Enforcement. |
2015 |
| 16 |
DEUCE: Write-Efficient Encryption for Non-Volatile Memories. |
2015 |
| 16 |
Maximizing Performance Under a Power Cap: A Comparison of Hardware, Software, and Hybrid Techniques. |
2016 |
| 15 |
Dynamic prediction of collection yield for managed runtimes. |
2009 |
| 15 |
Wait-n-GoTM: improving HTM performance by serializing cyclic dependencies. |
2013 |
| 15 |
rIOMMU: Efficient IOMMU for I/O Devices that Employ Ring Buffers. |
2015 |
| 15 |
CoGENT: Verifying High-Assurance File System Implementations. |
2016 |
| 14 |
Automatic generation of hardware/software interfaces. |
2012 |
| 14 |
To hardware prefetch or not to prefetch?: a virtualized environment study and core binding approach. |
2013 |
| 14 |
Hardware support for fine-grained event-driven computation in Anton 2. |
2013 |
| 14 |
RelaxReplay: record and replay for relaxed-consistency multiprocessors. |
2014 |
| 14 |
OpenPiton: An Open Source Manycore Research Framework. |
2016 |
| 13 |
Communication optimizations for global multi-threaded instruction scheduling. |
2008 |
| 13 |
iThreads: A Threading Library for Parallel Incremental Computation. |
2015 |
| 12 |
Totally green: evaluating and designing servers for lifecycle environmental impact. |
2012 |
| 12 |
Iterative optimization for the data center. |
2012 |
| 12 |
Low-level detection of language-level data races with LARD. |
2014 |
| 12 |
The sharing architecture: sub-core configurability for IaaS clouds. |
2014 |
| 12 |
Synchronization Using Remote-Scope Promotion. |
2015 |
| 12 |
High-Performance Transactions for Persistent Memories. |
2016 |
| 11 |
Triple-A: a Non-SSD based autonomic all-flash array for high performance storage systems. |
2014 |
| 11 |
CommGuard: Mitigating Communication Errors in Error-Prone Parallel Execution. |
2015 |
| 11 |
HCloud: Resource-Efficient Provisioning in Shared Cloud Systems. |
2016 |
| 10 |
Dynamic filtering: multi-purpose architecture support for language runtime systems. |
2010 |
| 10 |
“Challenging the “”embarrassingly sequential””: parallelizing finite state machine-based computations through principled speculation. “ |
2014 |
| 10 |
Speculative hardware/software co-designed floating-point multiply-add fusion. |
2014 |
| 10 |
Leveraging the short-term memory of hardware to diagnose production-run software failures. |
2014 |
| 10 |
VARAN the Unbelievable: An Efficient N-version Execution Framework. |
2015 |
| 10 |
SPECS: A Lightweight Runtime Mechanism for Protecting Software from Security-Critical Processor Bugs. |
2015 |
| 10 |
Automated OS-level Device Runtime Power Management. |
2015 |
| 10 |
TaxDC: A Taxonomy of Non-Deterministic Concurrency Bugs in Datacenter Distributed Systems. |
2016 |
| 10 |
Taurus: A Holistic Language Runtime System for Coordinating Distributed Managed-Language Applications. |
2016 |
| 10 |
How to Build Static Checking Systems Using Orders of Magnitude Less Code. |
2016 |
| 9 |
Practical automatic loop specialization. |
2013 |
| 9 |
Fence-free work stealing on bounded TSO processors. |
2014 |
| 9 |
Neuromorphic processing: a new frontier in scaling computer architecture. |
2014 |
| 9 |
Ziria: A DSL for Wireless Systems Programming. |
2015 |
| 9 |
CoolAir: Temperature- and Variation-Aware Management for Free-Cooled Datacenters. |
2015 |
| 9 |
Improving Agility and Elasticity in Bare-metal Clouds. |
2015 |
| 9 |
SD-PCM: Constructing Reliable Super Dense Phase Change Memory under Write Disturbance. |
2015 |
| 9 |
ANVIL: Software-Based Protection Against Next-Generation Rowhammer Attacks. |
2016 |
| 8 |
An update-aware storage system for low-locality update-intensive workloads. |
2012 |
| 8 |
Cider: native execution of iOS apps on android. |
2014 |
| 8 |
Specifying and Checking File System Crash-Consistency Models. |
2016 |
| 8 |
Failure-Atomic Persistent Memory Updates via JUSTDO Logging. |
2016 |
| 8 |
The Computational Sprinting Game. |
2016 |
| 8 |
Scaling up Superoptimization. |
2016 |
| 7 |
Continuous object access profiling and optimizations to overcome the memory wall and bloat. |
2012 |
| 7 |
The rise of the expert amateur: DIY culture and the evolution of computer science. |
2013 |
| 7 |
ASC: automatically scalable computation. |
2014 |
| 7 |
Finding the limit: examining the potential and complexity of compilation scheduling for JIT-based runtime systems. |
2014 |
| 7 |
Supporting Differentiated Services in Computers via Programmable Architecture for Resourcing-on-Demand (PARD). |
2015 |
| 7 |
Dirigent: Enforcing QoS for Latency-Critical Tasks on Shared Multicore Systems. |
2016 |
| 7 |
High Performance Packet Processing with FlexNIC. |
2016 |
| 7 |
Baymax: QoS Awareness and Increased Utilization for Non-Preemptive Accelerators in Warehouse Scale Computers. |
2016 |
| 7 |
Scalable Kernel TCP Design and Implementation for Short-Lived Connections. |
2016 |
| 7 |
COATCheck: Verifying Memory Ordering at the Hardware-OS Interface. |
2016 |
| 6 |
A program transformation and architecture support for quantum uncomputation. |
2006 |
| 6 |
Toward molecular programming with DNA. |
2008 |
| 6 |
Improved device driver reliability through hardware verification reuse. |
2011 |
| 6 |
High-performance fractal coherence. |
2014 |
| 6 |
Temporally Bounding TSO for Fence-Free Asymmetric Synchronization. |
2015 |
| 6 |
ProteusTM: Abstraction Meets Performance in Transactional Memory. |
2016 |
| 6 |
Generating Configurable Hardware from Parallel Patterns. |
2016 |
| 6 |
Proactive Control of Approximate Programs. |
2016 |
| 5 |
The cloud will change everything. |
2011 |
| 5 |
Compiler Management of Communication and Parallelism for Quantum Computation. |
2015 |
| 5 |
PIFT: Predictive Information-Flow Tracking. |
2016 |
| 5 |
An Energy-interference-free Hardware-Software Debugger for Intermittent Energy-harvesting Systems. |
2016 |
| 5 |
Paravirtual Remote I/O. |
2016 |
| 5 |
High-Density Image Storage Using Approximate Memory Cells. |
2016 |
| 5 |
Analyzing Behavior Specialized Acceleration. |
2016 |
| 4 |
Impact of virtualization on computer architecture and operating systems. |
2006 |
| 4 |
DeAliaser: alias speculation using atomic region support. |
2013 |
| 4 |
Efficient virtualization on embedded power architecture® platforms. |
2013 |
| 4 |
Guardrail: a high fidelity approach to protecting hardware devices from buggy drivers. |
2014 |
| 4 |
Finding trojan message vulnerabilities in distributed systems. |
2014 |
| 4 |
Kinetic Dependence Graphs. |
2015 |
| 4 |
On-the-Fly Principled Speculation for FSM Parallelization. |
2015 |
| 4 |
Watson and the Era of Cognitive Computing. |
2015 |
| 4 |
NVWAL: Exploiting NVRAM in Write-Ahead Logging. |
2016 |
| 4 |
Sego: Pervasive Trusted Metadata for Efficiently Verified Untrusted System Services. |
2016 |
| 4 |
memif: Towards Programming Heterogeneous Memory Asynchronously. |
2016 |
| 4 |
Silent Shredder: Zero-Cost Shredding for Secure Non-Volatile Main Memory Controllers. |
2016 |
| 3 |
Architectural Support for Cyber-Physical Systems. |
2015 |
| 3 |
M3: A Hardware/Operating-System Co-Design to Tame Heterogeneous Manycores. |
2016 |
| 3 |
Whirlpool: Improving Dynamic Cache Management with Static Data Classification. |
2016 |
| 3 |
RAPID Programming of Pattern-Recognition Processors. |
2016 |
| 3 |
TxRace: Efficient Data Race Detection Using Commodity Hardware Transactional Memory. |
2016 |
| 3 |
SpaceJMP: Programming with Multiple Virtual Address Spaces. |
2016 |
| 3 |
ReBudget: Trading Off Efficiency vs. Fairness in Market-Based Multicore Resource Allocation via Runtime Budget Reassignment. |
2016 |
| 3 |
Efficient Address Translation for Architectures with Multiple Page Sizes. |
2017 |
| 3 |
SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing. |
2017 |
| 2 |
Locality-oblivious cache organization leveraging single-cycle multi-hop NoCs. |
2014 |
| 2 |
Dual Execution for On the Fly Fine Grained Execution Comparison. |
2015 |
| 2 |
HIPStR: Heterogeneous-ISA Program State Relocation. |
2016 |
| 2 |
Interference Management for Distributed Parallel Applications in Consolidated Clusters. |
2016 |
| 2 |
WiSync: An Architecture for Fast Synchronization through On-Chip Wireless Communication. |
2016 |
| 2 |
Architecture-Adaptive Code Variant Tuning. |
2016 |
| 2 |
True IOMMU Protection from DMA Attacks: When Copy is Faster than Zero Copy. |
2016 |
| 2 |
CSR: Core Surprise Removal in Commodity Operating Systems. |
2016 |
| 2 |
DySel: Lightweight Dynamic Selection for Kernel-based Data-parallel Programming Model. |
2016 |
| 2 |
AxGames: Towards Crowdsourcing Quality Target Determination in Approximate Computing. |
2016 |
| 2 |
Lifting Assembly to Intermediate Representation: A Novel Approach Leveraging Compilers. |
2016 |
| 2 |
Automated Synthesis of Comprehensive Memory Model Litmus Test Suites. |
2017 |
| 2 |
Breaking the Boundaries in Heterogeneous-ISA Datacenters. |
2017 |
| 2 |
KickStarter: Fast and Accurate Computations on Streaming Graphs via Trimmed Approximations. |
2017 |
| 2 |
Translation-Triggered Prefetching. |
2017 |
| 1 |
Research directions for 21st century computer systems: ASPLOS 2013 panel.
2013 |
| 1 |
Resolved: specialized architectures, languages, and system software should supplant general-purpose alternatives within a decade. |
2014 |
| 1 |
Inside Windows Azure: the challenges and opportunities of a cloud operating system.
2014 |
| 1 |
More is Less, Less is More: Molecular-Scale Photonic NoC Power Topologies. |
2015 |
| 1 |
Asymmetric Memory Fences: Optimizing Both Performance and Implementability. |
2015 |
| 1 |
Architectural Support for Dynamic Linking. |
2015 |
| 1 |
DIABLO: A Warehouse-Scale Computer Network Simulator using FPGAs. |
2015 |
| 1 |
Prudent Memory Reclamation in Procrastination-Based Synchronization. |
2016 |
| 1 |
Programming Uncertain<T>hings.
2016 |
| 1 |
TPC: Target-Driven Parallelism Combining Prediction and Correction to Reduce Tail Latency in Interactive Services. |
2016 |
| 1 |
LDX: Causality Inference by Lightweight Dual Execution. |
2016 |
| 1 |
CloudSeer: Workflow Monitoring of Cloud Infrastructures via Interleaved Logs. |
2016 |
| 1 |
CASPAR: Breaking Serialization in Lock-Free Multicore Synchronization. |
2016 |
| 1 |
Brain Inspired Computing. |
2016 |
| 1 |
ReFlex: Remote Flash ≈ Local Flash.
2017 |
| 1 |
SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs. |
2017 |
| 1 |
Hardware-Software Co-design to Mitigate DRAM Refresh Overheads: A Case for Refresh-Aware Process Scheduling. |
2017 |
| 1 |
Exploiting Intra-Request Slack to Improve SSD Performance. |
2017 |
| 1 |
Towards Practical Default-On Multi-Core Record/Replay. |
2017 |
| 1 |
Enabling Lightweight Transactions with Precision Time. |
2017 |
| 1 |
An Analysis of Persistent Memory Use with WHISPER. |
2017 |
| 1 |
GRIFFIN: Guarding Control Flows Using Intel Processor Trace. |
2017 |
| 1 |
TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory. |
2017 |
| 1 |
Thermostat: Application-transparent Page Management for Two-tiered Main Memory. |
2017 |
| 1 |
Page Fault Support for Network Controllers. |
2017 |
| 0 |
Technology for developing regions: Moore’s law is not enough. |
2010 |
| 0 |
TSO_ATOMICITY: efficient hardware primitive for TSO-preserving region optimizations. |
2013 |
| 0 |
Programmer Productivity in a World of Mushy Interfaces: Challenges of the Post-ISA Reality. |
2016 |
| 0 |
Synopsis of the ASPLOS ’16 Wild and Crazy Ideas (WACI) Invited-Speakers Session.
2016 |
| 0 |
RID: Finding Reference Count Bugs with Inconsistent Path Pair Checking. |
2016 |
| 0 |
Sidewinder: An Energy Efficient and Developer Friendly Heterogeneous Architecture for Continuous Mobile Sensing. |
2016 |
| 0 |
Pallas: Semantic-Aware Checking for Finding Deep Bugs in Fast Path. |
2017 |
| 0 |
Kill the Program Counter: Reconstructing Program Behavior in the Processor Cache Hierarchy. |
2017 |
| 0 |
FLEP: Enabling Flexible and Efficient Preemption on GPUs. |
2017 |
| 0 |
CHERI JNI: Sinking the Java Security Model into the C. |
2017 |
| 0 |
3DGates: An Instruction-Level Energy Analysis and Optimization of 3D Printers. |
2017 |
| 0 |
Moonwalk: NRE Optimization in ASIC Clouds. |
2017 |
| 0 |
AMNESIAC: Amnesic Automatic Computer. |
2017 |
| 0 |
AsyncClock: Scalable Inference of Asynchronous Event Causality. |
2017 |
| 0 |
Prophet: Precise QoS Prediction on Non-Preemptive Accelerators to Improve Utilization in Warehouse-Scale Computers. |
2017 |
| 0 |
Typed Architectures: Architectural Support for Lightweight Scripting. |
2017 |
| 0 |
REDSPY: Exploring Value Locality in Software. |
2017 |
| 0 |
Locality Transformations for Nested Recursive Iteration Spaces. |
2017 |
| 0 |
Verification of a Practical Hardware Security Architecture Through Static Information Flow Analysis. |
2017 |
| 0 |
Optimizing CNNs on Multicores for Scalability, Performance and Goodput. |
2017 |
| 0 |
Locality-Aware CTA Clustering for Modern GPUs. |
2017 |
| 0 |
Browsix: Bridging the Gap Between Unix and the Browser. |
2017 |
| 0 |
DCatch: Automatically Detecting Distributed Concurrency Bugs in Cloud Systems. |
2017 |
| 0 |
Identifying Security Critical Properties for the Dynamic Verification of a Processor. |
2017 |
| 0 |
An Architecture Supporting Formal and Compositional Binary Analysis. |
2017 |
| 0 |
Sound Loop Superoptimization for Google Native Client. |
2017 |
| 0 |
Dynamic Resource Management for Efficient Utilization of Multitasking GPUs. |
2017 |
| 0 |
Black-box Concurrent Data Structures for NUMA Architectures. |
2017 |
| 0 |
Determining Application-specific Peak Power and Energy Requirements for Ultra-low Power Processors. |
2017 |
| 0 |
Approximate Storage of Compressed and Encrypted Videos. |
2017 |
| 0 |
History-Based Arbitration for Fairness in Processor-Interconnect of NUMA Servers. |
2017 |
| 0 |
Crossing Guard: Mediating Host-Accelerator Coherence Interactions. |
2017 |
| 0 |
Bolt: I Know What You Did Last Summer... In The Cloud. |
2017 |
| 0 |
DudeTM: Building Durable Transactions with Decoupling for Persistent Memory. |
2017 |
| 0 |
Voltage Regulator Efficiency Aware Power Management. |
2017 |
| 0 |
Failure-Atomic Slotted Paging for Persistent Memory. |
2017 |
| 0 |
TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA. |
2017 |
| 0 |
Mallacc: Accelerating Memory Allocation. |
2017 |
| 0 |
Towards "Full Containerization" in Containerized Network Function Virtualization.
2017 |
| 0 |
IncBricks: Toward In-Network Computation with an In-Network Cache. |
2017 |
| 0 |
Big Data Analytics and Intelligence at Alibaba Cloud. |
2017 |
| 0 |
CoRAL: Confined Recovery in Distributed Asynchronous Graph Processing. |
2017 |
| 0 |
Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge. |
2017 |
| 0 |
ProRace: Practical Data Race Detection for Production Use. |
2017 |
| 0 |
What Scalable Programs Need from Transactional Memory. |
2017 |
| 0 |
Graspan: A Single-machine Disk-based Graph System for Interprocedural Static Analyses of Large-scale Systems Code. |
2017 |