ASPLOS

All

Cited by	Paper title	Year
826	PowerNap: eliminating server idle power.	2009
814	A comparison of software and hardware techniques for x86 virtualization.	2006
680	Learning from mistakes: a comprehensive study on real world concurrency bug characteristics.	2008
627	DFTL: a flash translation layer employing demand-based selective caching of page-level address mappings.	2009
610	No "power" struggles: coordinated multi-level power management for the data center.	2008
549	Exploiting coarse-grained task, data, and pipeline parallelism in stream programs.	2006
540	Addressing shared resource contention in multicore processors via scheduling.	2010
498	Hybrid transactional memory.	2006
485	Clearing the clouds: a study of emerging scale-out workloads on modern hardware.	2012
405	Conservation cores: reducing the energy of mature computations.	2010
392	Overshadow: a virtualization-based approach to retrofitting protection in commodity operating systems.	2008
377	AVIO: detecting atomicity violations via access interleaving invariants.	2006
377	Accelerator: using data parallelism to program GPUs for general-purpose uses.	2006
370	Accurate and efficient regression modeling for microarchitectural performance and power prediction.	2006
369	Kendo: efficient deterministic multithreading in software.	2009
326	S2E: a platform for in-vivo multi-path analysis of software systems.	2011
318	Merge: a programming model for heterogeneous multi-core systems.	2008
314	Mnemosyne: lightweight persistent memory.	2011
310	Combinatorial sketching for finite programs.	2006
307	NV-Heaps: making persistent objects fast and safe with next-generation, non-volatile memories.	2011
300	DMP: deterministic shared memory multiprocessing.	2009
275	CoreDet: a compiler and runtime system for deterministic multithreaded execution.	2010
272	Flikker: saving DRAM refresh-power through critical data partitioning.	2011
269	Accelerating critical section execution with asymmetric multi-core architectures.	2009
268	Early experience with a commercial hardware transactional memory implementation.	2009
267	CTrigger: exposing atomicity violation bugs from their hiding places.	2009
266	Efficiently exploring architectural design spaces via predictive modeling.	2006
258	Gordon: using flash memory to build fast, power-efficient clusters for data-intensive applications.	2009
256	Architecture support for disciplined approximate programming.	2012
248	Paragon: QoS-aware scheduling for heterogeneous datacenters.	2013
239	Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems.	2010
232	Producing wrong data without doing anything obviously wrong!	2009
227	Supporting nested transactional memory in logTM.	2006
225	Mercury and freon: temperature emulation and management for server systems.	2006
220	DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning.	2014
217	Understanding the propagation of hard errors to software and implications for resilient system design.	2008
213	Quasar: resource-efficient and QoS-aware cluster management.	2014
212	PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor.	2006
199	Geiger: monitoring the buffer cache in a virtual machine environment.	2006
199	MemScale: active low-power modes for main memory.	2011
197	Dynamic knobs for responsive power-aware computing.	2011
194	Tarazu: optimizing MapReduce on heterogeneous clusters.	2012
180	Joint optimization of idle and cooling power in data centers while maintaining response time.	2010
177	Green-Marl: a DSL for easy and efficient graph analysis.	2012
176	RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations.	2009
175	An asymmetric distributed shared memory model for heterogeneous parallel systems.	2010
172	Faults in linux: ten years later.	2011
171	SherLog: error diagnosis by connecting clues from run-time logs.	2010
170	Feedback-driven threading: power-efficient and high-performance execution of multi-threaded workloads on CMPs.	2008
169	ELI: bare-metal performance for I/O virtualization.	2012
168	Accelerating two-dimensional page walks for virtualized systems.	2008
164	Cosmic rays don’t strike twice: understanding the nature of DRAM errors and the implications for system design.	2012
163	A randomized scheduler with probabilistic guarantees of finding bugs.	2010
161	Micro-pages: increasing DRAM efficiency with locality-aware data placement.	2010
161	On-the-fly elimination of dynamic irregularities for GPU computing.	2011
157	Unikernels: library operating systems for the cloud.	2013
156	Shoestring: probabilistic soft error reliability on the cheap.	2010
155	ASSURE: automatic software self-healing using rescue points.	2009
153	DoublePlay: parallelizing sequential logging and replay.	2011
151	Parasol and GreenSwitch: managing datacenters powered by renewable energy.	2013
150	Dynamically replicated memory: building reliable systems from nanoscale resistive memories.	2010
147	OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance.	2013
144	Recording shared memory dependencies using strata.	2006
143	Ultra low-cost defect protection for microprocessor pipelines.	2006
142	A performance counter architecture for computing accurate CPI components.	2006
140	DejaVu: accelerating resource allocation in virtualized environments.	2012
137	Rethinking the library OS from the top down.	2011
136	Capo: a software-hardware interface for practical deterministic multiprocessor replay.	2009
133	Whole-system persistence.	2012
132	Respec: efficient online multiprocessor replay via speculation and external determinism.	2010
132	Blink: managing server clusters on intermittent power.	2011
131	Traffic management: a holistic approach to memory placement on NUMA systems.	2013
130	Adaptive set pinning: managing shared caches in chip multiprocessors.	2008
128	Improving software diagnosability via log enhancement.	2011
128	InkTag: secure applications on an untrusted operating system.	2013
127	Parallelizing security checks on commodity hardware.	2008
127	Complete information flow tracking from the gates up.	2009
124	Hybrid NOrec: a case study in the effectiveness of best effort hardware transactional memory.	2011
123	A regulated transitive reduction (RTR) for longer memory race recording.	2006
120	Computation spreading: employing hardware migration to specialize CMP cores on-the-fly.	2006
113	Hardbound: architectural support for spatial safety of the C programming language.	2008
111	Power routing: dynamic power provisioning in the data center.	2010
110	Unbounded page-based transactional memory.	2006
109	Providing safe, user space access to fast, solid state disks.	2012
106	Ensuring operating system kernel integrity with OSck.	2011
104	Flexible architectural support for fine-grain scheduling.	2010
102	Virtualized and flexible ECC for main memory.	2010
101	Bell: bit-encoding online memory leak detection.	2006
101	The design and implementation of microdrivers.	2008
101	Speculative parallelization using software multi-threaded transactions.	2010
101	Architectural support for hypervisor-secure virtualization.	2012
99	Better bug reporting with better privacy.	2008
97	Bottleneck identification and scheduling in multithreaded applications.	2012
95	Automatic generation of peephole superoptimizers.	2006
93	Tradeoffs in transactional memory virtualization.	2006
92	Streamware: programming general-purpose multicore processors using streams.	2008
92	Optimistic parallelism benefits from data partitioning.	2008
91	A power-efficient all-optical on-chip interconnect using wavelength-based oblivious routing.	2010
91	ConMem: detecting severe concurrency bugs through an effect-oriented approach.	2010
90	ConSeq: detecting concurrency bugs through sequential errors.	2011
89	Leveraging stored energy for handling power emergencies in aggressively provisioned datacenters.	2012
89	Improving GPGPU concurrency with elastic kernels.	2013
88	Stochastic superoptimization.	2013
88	KVM/ARM: the design and implementation of the linux ARM hypervisor.	2014
87	RCDC: a relaxed consistency deterministic computer.	2011
87	SDF: software-defined flash for web-scale internet storage systems.	2014
85	Mementos: system support for long-running computation on RFID-scale devices.	2011
84	Sponge: portable stream programming on graphics engines.	2011
80	How low can you go?: recommendations for hardware-supported minimal TCB code execution.	2008
80	Relyzer: exploiting application-level fault equivalence to analyze application resiliency to transient faults.	2012
80	Iago attacks: why the system call API is a bad untrusted RPC interface.	2013
79	Understanding and visualizing full systems with data flow tomography.	2008
79	Paraprox: pattern-based approximation for data parallel applications.	2014
76	Q100: the architecture and design of a database processing unit.	2014
74	Pocket cloudlets.	2011
74	Data races vs. data race bugs: telling the difference with portend.	2012
72	Looking back on the language and hardware revolutions: measured power, performance, and scaling.	2011
72	Scalable address spaces using RCU balanced trees.	2012
71	Tartan: evaluating spatial computation for whole program execution.	2006
71	Characterizing processor thermal behavior.	2010
71	Inter-core cooperative TLB for chip multiprocessors.	2010
70	Temporal search: detecting hidden malware timebombs with virtual machines.	2006
70	GPUfs: integrating a file system with GPUs.	2013
68	Introspective 3D chips.	2006
68	Uncertain: a first-order type for uncertain data.	2014
68	Using ARM trustzone to build a trusted language runtime for mobile applications.	2014
67	Probabilistic job symbiosis modeling for SMT processor scheduling.	2010
66	Adapting to intermittent faults in multicore systems.	2008
66	Inter-core prefetching for multicore processors using migrating helper threads.	2011
65	Virtual ghost: protecting applications from hostile operating systems.	2014
64	An evaluation of the TRIPS computer system.	2009
64	CRUISE: cache replacement and utility-aware scheduling.	2012
61	Per-thread cycle accounting in SMT processors.	2009
59	Scale-out NUMA.	2014
58	Integrated network interfaces for high-bandwidth TCP/IP.	2006
58	Archipelago: trading address space for reliability and security.	2008
58	Mixed-mode multicore reliability.	2009
58	Understanding modern device drivers.	2012
58	Portable performance on heterogeneous architectures.	2013
58	Memory Errors in Modern Systems: The Good, The Bad, and The Ugly.	2015
57	Analyzing multicore dumps to facilitate concurrency bug reproduction.	2010
56	Software-based instruction caching for embedded processors.	2006
56	Execution migration in a heterogeneous-ISA chip multiprocessor.	2012
56	STABILIZER: statistically sound performance evaluation.	2013
55	A spatial path scheduling algorithm for EDGE architectures.	2006
55	Power containers: an OS facility for fine-grained power and energy management on multicore servers.	2013
54	ISOLATOR: dynamically ensuring isolation in concurrent programs.	2009
54	Decoupling contention management from scheduling.	2010
54	NVM duet: unified working memory and persistent store architecture.	2014
53	Recovery domains: an organizing principle for recoverable operating systems.	2009
53	DreamWeaver: architectural support for deep sleep.	2012
53	PuDianNao: A Polyvalent Machine Learning Accelerator.	2015
52	PICSEL: measuring user-perceived performance to control dynamic frequency scaling.	2008
52	PocketWeb: instant web browsing for mobile devices.	2012
51	ApproxHadoop: Bringing Approximations to MapReduce Frameworks.	2015
50	Reflex: using low-power processors in smartphones without knowing them.	2012
50	Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers.	2015
49	SlicK: slice-based locality exploitation for efficient redundant multithreading.	2006
48	HOTL: a higher order theory of locality.	2013
47	A probabilistic pointer analysis for speculative optimizations.	2006
46	Efficiency trends and limits from comprehensive microarchitectural adaptivity.	2008
45	Leak pruning.	2009
44	2ndStrike: toward manifesting hidden concurrency typestate bugs.	2011
44	Using likely invariants for automated software fault localization.	2013
44	Fine-grained fault tolerance using device checkpoints.	2013
44	Architectural support for address translation on GPUs: designing memory management units for CPU/GPUs with unified address spaces.	2014
43	Understanding prediction-based partial redundant threading for low-overhead, high-coverage fault tolerance.	2006
43	Ubik: efficient cache sharing with strict QoS for latency-critical workloads.	2014
43	Price theory based power management for heterogeneous multi-cores.	2014
42	Exploiting access semantics and program behavior to reduce snoop power in chip multiprocessors.	2008
42	Efficient online validation with delta execution.	2009
42	Commutativity analysis for software parallelization: letting program transformations see the big picture.	2009
42	ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications.	2010
42	Heterogeneous-race-free memory models.	2014
42	REF: resource elasticity fairness with sharing incentives for multiprocessors.	2014
41	Comprehensively and efficiently protecting the heap.	2006
41	DeNovoND: efficient hardware support for disciplined non-determinism.	2013
40	Safe and automatic live update for operating systems.	2013
40	EnCore: exploiting system environment and correlation information for misconfiguration detection.	2014
39	SoftSig: software-exposed hardware signatures for code analysis and optimization.	2008
39	Path-exploration lifting: hi-fi tests for lo-fi emulators.	2012
39	Post-compiler software optimization for reducing energy.	2014
38	Predictor virtualization.	2008
38	TwinDrivers: semi-automatic derivation of fast and safe hypervisor network drivers from guest OS drivers.	2009
38	Efficient sequential consistency via conflict ordering.	2012
38	Data-parallel finite-state machines.	2014
38	GPU Concurrency: Weak Behaviours and Programming Assumptions.	2015
37	Hardware counter driven on-the-fly request signatures.	2008
36	Orchestration by approximation: mapping stream programs onto multicore architectures.	2011
35	HeapMD: identifying heap-based bugs using anomaly detection.	2006
35	Stealth prefetching.	2006
35	A defect tolerant self-organizing nanoscale SIMD architecture.	2006
34	Instruction scheduling for a tiled dataflow architecture.	2006
34	Demand-based coordinated scheduling for SMP VMs.	2013
34	Protecting Data on Smartphones and Tablets from Memory Attacks.	2015
34	Beyond the PDP-11: Architectural Support for a Memory-Safe C Abstract Machine.	2015
33	A new idiom recognition framework for exploiting hardware-assist instructions.	2006
33	Optimal task assignment in multithreaded processors: a statistical approach.	2012
33	Verifying systems rules using rule-directed symbolic execution.	2013
33	Verifying security invariants in ExpressOS.	2013
33	A study of the scalability of stop-the-world garbage collectors on multicores.	2013
33	K2: a mobile operating system for heterogeneous coherence domains.	2014
33	Transactionalizing legacy code: an experience report using GCC and Memcached.	2014
33	Disengaged scheduling for fair, protected access to fast computational accelerators.	2014
33	Page Placement Strategies for GPUs within Heterogeneous Memory Systems.	2015
33	GhostRider: A Hardware-Software System for Memory Trace Oblivious Computation.	2015
33	Asynchronized Concurrency: The Secret to Scaling Concurrent Search Data Structures.	2015
32	Xoc, an extension-oriented compiler for systems programming.	2008
32	Computational sprinting on a hardware/software testbed.	2013
32	PolyMage: Automatic Optimization for Image Processing Pipelines.	2015
31	Mapping esterel onto a multi-threaded embedded processor.	2006
31	Tapping into the fountain of CPUs: on operating system support for programmable devices.	2008
30	MacroSS: macro-SIMDization of streaming applications.	2010
30	A case for neuromorphic ISAs.	2011
30	Hardware acceleration of transactional memory on commodity systems.	2011
30	Region scheduling: efficiently using the cache architectures via page-level affinity.	2012
30	Regularities considered harmful: forcing randomness to memory accesses to reduce row buffer conflicts for multi-core, multi-bank systems.	2013
30	ReQoS: reactive static/dynamic compilation for QoS in warehouse scale computers.	2013
30	Deterministic galois: on-demand, portable and parameterless.	2014
30	Mojim: A Reliable and Highly-Available Non-Volatile Memory System.	2015
29	Exploring circuit timing-aware language and compilation.	2011
29	Efficient processor support for DRFx, a memory model with exceptions.	2011
29	Comprehensive kernel instrumentation via dynamic binary translation.	2012
29	VSwapper: a memory swapper for virtualized environments.	2014
29	FACADE: A Compiler and Runtime for (Almost) Object-Bounded Big Data Applications.	2015
28	The mapping collector: virtual memory support for generational, parallel, and concurrent compaction.	2008
28	Discerning the dominant out-of-order performance advantage: is it speculation or dynamism?	2013
28	Production-run software failure diagnosis via hardware performance counters.	2013
28	Parallelizing data race detection.	2013
28	Reduced Hardware NOrec: A Safe and Scalable Hybrid Transactional Memory.	2015
28	Chimera: Collaborative Preemption for Multitasking on a Shared GPU.	2015
27	Improving the performance of object-oriented languages with dynamic predication of indirect jumps.	2008
27	Maximum benefit from a minimal HTM.	2009
27	COMPASS: a programmable data prefetcher using idle GPU shaders.	2010
27	Chameleon: operating system support for dynamic processors.	2012
27	ConAir: featherweight concurrency bug recovery via single-threaded idempotent execution.	2013
27	DDOS: taming nondeterminism in distributed systems.	2013
27	Underprovisioning backup power infrastructure for datacenters.	2014
26	Orthrus: efficient software integrity protection on multi-cores.	2010
26	Synthesizing concurrent schedulers for irregular algorithms.	2011
26	Improving the performance of trace-based systems by false loop filtering.	2011
26	Transparent mutable replay for multicore debugging and patch validation.	2013
25	Cooperative empirical failure avoidance for multithreaded programs.	2013
25	Integrated 3D-stacked server designs for increasing physical density of key-value stores.	2014
25	Few-to-Many: Incremental Parallelism for Reducing Tail Latency in Interactive Services.	2015
24	Butterfly analysis: adapting dataflow analysis to dynamic parallel monitoring.	2010
24	Applying transactional memory to concurrency bugs.	2012
24	A Probabilistic Graphical Model-based Approach for Minimizing Energy Under Performance Constraints.	2015
24	Architectural Support for Software-Defined Metadata Processing.	2015
23	Anomaly-based bug prediction, isolation, and validation: an automated approach for software debugging.	2009
23	A real system evaluation of hardware atomicity for software speculation.	2010
23	Automated repair of binary and assembly programs for cooperating embedded devices.	2013
23	Energy-efficient work-stealing language runtimes.	2014
23	A Hardware Design Language for Timing-Sensitive Information-Flow Security.	2015
23	A DNA-Based Archival Storage System.	2016
22	HICAMP: architectural support for efficient concurrency-safe shared structured data access.	2012
22	Monitoring and Debugging the Quality of Results in Approximate Programs.	2015
21	Accurate branch prediction for short threads.	2008
21	Specifying and checking semantic atomicity for multithreaded programs.	2011
21	SIMD defragmenter: efficient ILP realization on data-parallel architectures.	2012
21	DeNovoSync: Efficient Support for Arbitrary Synchronization without Writer-Initiated Invalidations.	2015
20	Phantom-BTB: a virtualized branch target buffer design.	2009
20	A case for unlimited watchpoints.	2012
20	GPUDet: a deterministic GPU architecture.	2013
20	Volition: scalable and precise sequential consistency violation detection.	2013
20	Cyrus: unintrusive application-level record-replay for replay parallelism.	2013
20	Sapper: a language for hardware-level security policy enforcement.	2014
20	SI-TM: reducing transactional memory abort rates through snapshot isolation.	2014
20	NumaGiC: a Garbage Collector for Big Data on Big NUMA Machines.	2015
20	Freecursive ORAM: [Nearly] Free Recursion and Integrity Verification for Position-based Oblivious RAM.	2015
19	Dispersing proprietary applications as benchmarks through code mutation.	2008
19	The benefit of SMT in the multi-core era: flexibility towards degrees of thread-level parallelism.	2014
19	Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability.	2015
18	StreamRay: a stream filtering architecture for coherent ray tracing.	2009
18	Specifying and dynamically verifying address translation-aware memory consistency.	2010
18	Aikido: accelerating shared data dynamic analyses.	2012
18	Comprehending performance from real-world execution traces: a device-driver case.	2014
18	Prototyping symbolic execution engines for interpreted languages.	2014
17	Architectural implications of nanoscale integrated sensing and computing.	2009
17	Request behavior variations.	2010
17	Nested Kernel: An Operating System Architecture for Intra-Kernel Privilege Separation.	2015
16	Architectural support for SWAR text processing with parallel bit streams: the inductive doubling principle.	2009
16	A declarative language approach to device configuration.	2011
16	Why you should care about quantile regression.	2013
16	Rhythm: harnessing data parallel hardware for server workloads.	2014
16	I/O paravirtualization at the device file boundary.	2014
16	Targeted Automatic Integer Overflow Discovery Using Goal-Directed Conditional Branch Enforcement.	2015
16	DEUCE: Write-Efficient Encryption for Non-Volatile Memories.	2015
16	Maximizing Performance Under a Power Cap: A Comparison of Hardware, Software, and Hybrid Techniques.	2016
15	Dynamic prediction of collection yield for managed runtimes.	2009
15	Wait-n-GoTM: improving HTM performance by serializing cyclic dependencies.	2013
15	rIOMMU: Efficient IOMMU for I/O Devices that Employ Ring Buffers.	2015
15	CoGENT: Verifying High-Assurance File System Implementations.	2016
14	Automatic generation of hardware/software interfaces.	2012
14	To hardware prefetch or not to prefetch?: a virtualized environment study and core binding approach.	2013
14	Hardware support for fine-grained event-driven computation in Anton 2.	2013
14	RelaxReplay: record and replay for relaxed-consistency multiprocessors.	2014
14	OpenPiton: An Open Source Manycore Research Framework.	2016
13	Communication optimizations for global multi-threaded instruction scheduling.	2008
13	iThreads: A Threading Library for Parallel Incremental Computation.	2015
12	Totally green: evaluating and designing servers for lifecycle environmental impact.	2012
12	Iterative optimization for the data center.	2012
12	Low-level detection of language-level data races with LARD.	2014
12	The sharing architecture: sub-core configurability for IaaS clouds.	2014
12	Synchronization Using Remote-Scope Promotion.	2015
12	High-Performance Transactions for Persistent Memories.	2016
11	Triple-A: a Non-SSD based autonomic all-flash array for high performance storage systems.	2014
11	CommGuard: Mitigating Communication Errors in Error-Prone Parallel Execution.	2015
11	HCloud: Resource-Efficient Provisioning in Shared Cloud Systems.	2016
10	Dynamic filtering: multi-purpose architecture support for language runtime systems.	2010
10	Challenging the "embarrassingly sequential": parallelizing finite state machine-based computations through principled speculation.	2014
10	Speculative hardware/software co-designed floating-point multiply-add fusion.	2014
10	Leveraging the short-term memory of hardware to diagnose production-run software failures.	2014
10	VARAN the Unbelievable: An Efficient N-version Execution Framework.	2015
10	SPECS: A Lightweight Runtime Mechanism for Protecting Software from Security-Critical Processor Bugs.	2015
10	Automated OS-level Device Runtime Power Management.	2015
10	TaxDC: A Taxonomy of Non-Deterministic Concurrency Bugs in Datacenter Distributed Systems.	2016
10	Taurus: A Holistic Language Runtime System for Coordinating Distributed Managed-Language Applications.	2016
10	How to Build Static Checking Systems Using Orders of Magnitude Less Code.	2016
9	Practical automatic loop specialization.	2013
9	Fence-free work stealing on bounded TSO processors.	2014
9	Neuromorphic processing: a new frontier in scaling computer architecture.	2014
9	Ziria: A DSL for Wireless Systems Programming.	2015
9	CoolAir: Temperature- and Variation-Aware Management for Free-Cooled Datacenters.	2015
9	Improving Agility and Elasticity in Bare-metal Clouds.	2015
9	SD-PCM: Constructing Reliable Super Dense Phase Change Memory under Write Disturbance.	2015
9	ANVIL: Software-Based Protection Against Next-Generation Rowhammer Attacks.	2016
8	An update-aware storage system for low-locality update-intensive workloads.	2012
8	Cider: native execution of iOS apps on android.	2014
8	Specifying and Checking File System Crash-Consistency Models.	2016
8	Failure-Atomic Persistent Memory Updates via JUSTDO Logging.	2016
8	The Computational Sprinting Game.	2016
8	Scaling up Superoptimization.	2016
7	Continuous object access profiling and optimizations to overcome the memory wall and bloat.	2012
7	The rise of the expert amateur: DIY culture and the evolution of computer science.	2013
7	ASC: automatically scalable computation.	2014
7	Finding the limit: examining the potential and complexity of compilation scheduling for JIT-based runtime systems.	2014
7	Supporting Differentiated Services in Computers via Programmable Architecture for Resourcing-on-Demand (PARD).	2015
7	Dirigent: Enforcing QoS for Latency-Critical Tasks on Shared Multicore Systems.	2016
7	High Performance Packet Processing with FlexNIC.	2016
7	Baymax: QoS Awareness and Increased Utilization for Non-Preemptive Accelerators in Warehouse Scale Computers.	2016
7	Scalable Kernel TCP Design and Implementation for Short-Lived Connections.	2016
7	COATCheck: Verifying Memory Ordering at the Hardware-OS Interface.	2016
6	A program transformation and architecture support for quantum uncomputation.	2006
6	Toward molecular programming with DNA.	2008
6	Improved device driver reliability through hardware verification reuse.	2011
6	High-performance fractal coherence.	2014
6	Temporally Bounding TSO for Fence-Free Asymmetric Synchronization.	2015
6	ProteusTM: Abstraction Meets Performance in Transactional Memory.	2016
6	Generating Configurable Hardware from Parallel Patterns.	2016
6	Proactive Control of Approximate Programs.	2016
5	The cloud will change everything.	2011
5	Compiler Management of Communication and Parallelism for Quantum Computation.	2015
5	PIFT: Predictive Information-Flow Tracking.	2016
5	An Energy-interference-free Hardware-Software Debugger for Intermittent Energy-harvesting Systems.	2016
5	Paravirtual Remote I/O.	2016
5	High-Density Image Storage Using Approximate Memory Cells.	2016
5	Analyzing Behavior Specialized Acceleration.	2016
4	Impact of virtualization on computer architecture and operating systems.	2006
4	DeAliaser: alias speculation using atomic region support.	2013
4	Efficient virtualization on embedded power architecture® platforms.	2013
4	Guardrail: a high fidelity approach to protecting hardware devices from buggy drivers.	2014
4	Finding trojan message vulnerabilities in distributed systems.	2014
4	Kinetic Dependence Graphs.	2015
4	On-the-Fly Principled Speculation for FSM Parallelization.	2015
4	Watson and the Era of Cognitive Computing.	2015
4	NVWAL: Exploiting NVRAM in Write-Ahead Logging.	2016
4	Sego: Pervasive Trusted Metadata for Efficiently Verified Untrusted System Services.	2016
4	memif: Towards Programming Heterogeneous Memory Asynchronously.	2016
4	Silent Shredder: Zero-Cost Shredding for Secure Non-Volatile Main Memory Controllers.	2016
3	Architectural Support for Cyber-Physical Systems.	2015
3	M3: A Hardware/Operating-System Co-Design to Tame Heterogeneous Manycores.	2016
3	Whirlpool: Improving Dynamic Cache Management with Static Data Classification.	2016
3	RAPID Programming of Pattern-Recognition Processors.	2016
3	TxRace: Efficient Data Race Detection Using Commodity Hardware Transactional Memory.	2016
3	SpaceJMP: Programming with Multiple Virtual Address Spaces.	2016
3	ReBudget: Trading Off Efficiency vs. Fairness in Market-Based Multicore Resource Allocation via Runtime Budget Reassignment.	2016
3	Efficient Address Translation for Architectures with Multiple Page Sizes.	2017
3	SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing.	2017
2	Locality-oblivious cache organization leveraging single-cycle multi-hop NoCs.	2014
2	Dual Execution for On the Fly Fine Grained Execution Comparison.	2015
2	HIPStR: Heterogeneous-ISA Program State Relocation.	2016
2	Interference Management for Distributed Parallel Applications in Consolidated Clusters.	2016
2	WiSync: An Architecture for Fast Synchronization through On-Chip Wireless Communication.	2016
2	Architecture-Adaptive Code Variant Tuning.	2016
2	True IOMMU Protection from DMA Attacks: When Copy is Faster than Zero Copy.	2016
2	CSR: Core Surprise Removal in Commodity Operating Systems.	2016
2	DySel: Lightweight Dynamic Selection for Kernel-based Data-parallel Programming Model.	2016
2	AxGames: Towards Crowdsourcing Quality Target Determination in Approximate Computing.	2016
2	Lifting Assembly to Intermediate Representation: A Novel Approach Leveraging Compilers.	2016
2	Automated Synthesis of Comprehensive Memory Model Litmus Test Suites.	2017
2	Breaking the Boundaries in Heterogeneous-ISA Datacenters.	2017
2	KickStarter: Fast and Accurate Computations on Streaming Graphs via Trimmed Approximations.	2017
2	Translation-Triggered Prefetching.	2017
1	Research directions for 21st century computer systems: asplos 2013 panel.	2013
1	Resolved: specialized architectures, languages, and system software should supplant general-purpose alternatives within a decade.	2014
1	Inside windows azure: the challenges and opportunities of a cloud operating system.	2014
1	More is Less, Less is More: Molecular-Scale Photonic NoC Power Topologies.	2015
1	Asymmetric Memory Fences: Optimizing Both Performance and Implementability.	2015
1	Architectural Support for Dynamic Linking.	2015
1	DIABLO: A Warehouse-Scale Computer Network Simulator using FPGAs.	2015
1	Prudent Memory Reclamation in Procrastination-Based Synchronization.	2016
1	Programming Uncertain <T>hings.	2016
1	TPC: Target-Driven Parallelism Combining Prediction and Correction to Reduce Tail Latency in Interactive Services.	2016
1	LDX: Causality Inference by Lightweight Dual Execution.	2016
1	CloudSeer: Workflow Monitoring of Cloud Infrastructures via Interleaved Logs.	2016
1	CASPAR: Breaking Serialization in Lock-Free Multicore Synchronization.	2016
1	Brain Inspired Computing.	2016
1	ReFlex: Remote Flash ≈ Local Flash.	2017
1	SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs.	2017
1	Hardware-Software Co-design to Mitigate DRAM Refresh Overheads: A Case for Refresh-Aware Process Scheduling.	2017
1	Exploiting Intra-Request Slack to Improve SSD Performance.	2017
1	Towards Practical Default-On Multi-Core Record/Replay.	2017
1	Enabling Lightweight Transactions with Precision Time.	2017
1	An Analysis of Persistent Memory Use with WHISPER.	2017
1	GRIFFIN: Guarding Control Flows Using Intel Processor Trace.	2017
1	TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory.	2017
1	Thermostat: Application-transparent Page Management for Two-tiered Main Memory.	2017
1	Page Fault Support for Network Controllers.	2017
0	Technology for developing regions: Moore’s law is not enough.	2010
0	TSO_ATOMICITY: efficient hardware primitive for TSO-preserving region optimizations.	2013
0	Programmer Productivity in a World of Mushy Interfaces: Challenges of the Post-ISA Reality.	2016
0	Synopsis of the ASPLOS ’16 Wild and Crazy Ideas (WACI) Invited-Speakers Session.	2016
0	RID: Finding Reference Count Bugs with Inconsistent Path Pair Checking.	2016
0	Sidewinder: An Energy Efficient and Developer Friendly Heterogeneous Architecture for Continuous Mobile Sensing.	2016
0	Pallas: Semantic-Aware Checking for Finding Deep Bugs in Fast Path.	2017
0	Kill the Program Counter: Reconstructing Program Behavior in the Processor Cache Hierarchy.	2017
0	FLEP: Enabling Flexible and Efficient Preemption on GPUs.	2017
0	CHERI JNI: Sinking the Java Security Model into the C.	2017
0	3DGates: An Instruction-Level Energy Analysis and Optimization of 3D Printers.	2017
0	Moonwalk: NRE Optimization in ASIC Clouds.	2017
0	AMNESIAC: Amnesic Automatic Computer.	2017
0	AsyncClock: Scalable Inference of Asynchronous Event Causality.	2017
0	Prophet: Precise QoS Prediction on Non-Preemptive Accelerators to Improve Utilization in Warehouse-Scale Computers.	2017
0	Typed Architectures: Architectural Support for Lightweight Scripting.	2017
0	REDSPY: Exploring Value Locality in Software.	2017
0	Locality Transformations for Nested Recursive Iteration Spaces.	2017
0	Verification of a Practical Hardware Security Architecture Through Static Information Flow Analysis.	2017
0	Optimizing CNNs on Multicores for Scalability, Performance and Goodput.	2017
0	Locality-Aware CTA Clustering for Modern GPUs.	2017
0	Browsix: Bridging the Gap Between Unix and the Browser.	2017
0	DCatch: Automatically Detecting Distributed Concurrency Bugs in Cloud Systems.	2017
0	Identifying Security Critical Properties for the Dynamic Verification of a Processor.	2017
0	An Architecture Supporting Formal and Compositional Binary Analysis.	2017
0	Sound Loop Superoptimization for Google Native Client.	2017
0	Dynamic Resource Management for Efficient Utilization of Multitasking GPUs.	2017
0	Black-box Concurrent Data Structures for NUMA Architectures.	2017
0	Determining Application-specific Peak Power and Energy Requirements for Ultra-low Power Processors.	2017
0	Approximate Storage of Compressed and Encrypted Videos.	2017
0	History-Based Arbitration for Fairness in Processor-Interconnect of NUMA Servers.	2017
0	Crossing Guard: Mediating Host-Accelerator Coherence Interactions.	2017
0	Bolt: I Know What You Did Last Summer... In The Cloud.	2017
0	DudeTM: Building Durable Transactions with Decoupling for Persistent Memory.	2017
0	Voltage Regulator Efficiency Aware Power Management.	2017
0	Failure-Atomic Slotted Paging for Persistent Memory.	2017
0	TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA.	2017
0	Mallacc: Accelerating Memory Allocation.	2017
0	Towards “Full Containerization” in Containerized Network Function Virtualization.	2017
0	IncBricks: Toward In-Network Computation with an In-Network Cache.	2017
0	Big Data Analytics and Intelligence at Alibaba Cloud.	2017
0	CoRAL: Confined Recovery in Distributed Asynchronous Graph Processing.	2017
0	Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge.	2017
0	ProRace: Practical Data Race Detection for Production Use.	2017
0	What Scalable Programs Need from Transactional Memory.	2017
0	Graspan: A Single-machine Disk-based Graph System for Interprocedural Static Analyses of Large-scale Systems Code.	2017

2017¶

Cited by	Paper title
3	Efficient Address Translation for Architectures with Multiple Page Sizes.
3	SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing.
2	Automated Synthesis of Comprehensive Memory Model Litmus Test Suites.
2	Breaking the Boundaries in Heterogeneous-ISA Datacenters.
2	KickStarter: Fast and Accurate Computations on Streaming Graphs via Trimmed Approximations.
2	Translation-Triggered Prefetching.
1	ReFlex: Remote Flash ≈ Local Flash.
1	SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs.
1	Hardware-Software Co-design to Mitigate DRAM Refresh Overheads: A Case for Refresh-Aware Process Scheduling.
1	Exploiting Intra-Request Slack to Improve SSD Performance.
1	Towards Practical Default-On Multi-Core Record/Replay.
1	Enabling Lightweight Transactions with Precision Time.
1	An Analysis of Persistent Memory Use with WHISPER.
1	GRIFFIN: Guarding Control Flows Using Intel Processor Trace.
1	TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory.
1	Thermostat: Application-transparent Page Management for Two-tiered Main Memory.
1	Page Fault Support for Network Controllers.
0	Pallas: Semantic-Aware Checking for Finding Deep Bugs in Fast Path.
0	Kill the Program Counter: Reconstructing Program Behavior in the Processor Cache Hierarchy.
0	FLEP: Enabling Flexible and Efficient Preemption on GPUs.
0	CHERI JNI: Sinking the Java Security Model into the C.
0	3DGates: An Instruction-Level Energy Analysis and Optimization of 3D Printers.
0	Moonwalk: NRE Optimization in ASIC Clouds.
0	AMNESIAC: Amnesic Automatic Computer.
0	AsyncClock: Scalable Inference of Asynchronous Event Causality.
0	Prophet: Precise QoS Prediction on Non-Preemptive Accelerators to Improve Utilization in Warehouse-Scale Computers.
0	Typed Architectures: Architectural Support for Lightweight Scripting.
0	REDSPY: Exploring Value Locality in Software.
0	Locality Transformations for Nested Recursive Iteration Spaces.
0	Verification of a Practical Hardware Security Architecture Through Static Information Flow Analysis.
0	Optimizing CNNs on Multicores for Scalability, Performance and Goodput.
0	Locality-Aware CTA Clustering for Modern GPUs.
0	Browsix: Bridging the Gap Between Unix and the Browser.
0	DCatch: Automatically Detecting Distributed Concurrency Bugs in Cloud Systems.
0	Identifying Security Critical Properties for the Dynamic Verification of a Processor.
0	An Architecture Supporting Formal and Compositional Binary Analysis.
0	Sound Loop Superoptimization for Google Native Client.
0	Dynamic Resource Management for Efficient Utilization of Multitasking GPUs.
0	Black-box Concurrent Data Structures for NUMA Architectures.
0	Determining Application-specific Peak Power and Energy Requirements for Ultra-low Power Processors.
0	Approximate Storage of Compressed and Encrypted Videos.
0	History-Based Arbitration for Fairness in Processor-Interconnect of NUMA Servers.
0	Crossing Guard: Mediating Host-Accelerator Coherence Interactions.
0	Bolt: I Know What You Did Last Summer... In The Cloud.
0	DudeTM: Building Durable Transactions with Decoupling for Persistent Memory.
0	Voltage Regulator Efficiency Aware Power Management.
0	Failure-Atomic Slotted Paging for Persistent Memory.
0	TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA.
0	Mallacc: Accelerating Memory Allocation.
0	Towards “Full Containerization” in Containerized Network Function Virtualization.
0	IncBricks: Toward In-Network Computation with an In-Network Cache.
0	Big Data Analytics and Intelligence at Alibaba Cloud.
0	CoRAL: Confined Recovery in Distributed Asynchronous Graph Processing.
0	Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge.
0	ProRace: Practical Data Race Detection for Production Use.
0	What Scalable Programs Need from Transactional Memory.
0	Graspan: A Single-machine Disk-based Graph System for Interprocedural Static Analyses of Large-scale Systems Code.

2016¶

Cited by	Paper title
23	A DNA-Based Archival Storage System.
16	Maximizing Performance Under a Power Cap: A Comparison of Hardware, Software, and Hybrid Techniques.
15	CoGENT: Verifying High-Assurance File System Implementations.
14	OpenPiton: An Open Source Manycore Research Framework.
12	High-Performance Transactions for Persistent Memories.
11	HCloud: Resource-Efficient Provisioning in Shared Cloud Systems.
10	TaxDC: A Taxonomy of Non-Deterministic Concurrency Bugs in Datacenter Distributed Systems.
10	Taurus: A Holistic Language Runtime System for Coordinating Distributed Managed-Language Applications.
10	How to Build Static Checking Systems Using Orders of Magnitude Less Code.
9	ANVIL: Software-Based Protection Against Next-Generation Rowhammer Attacks.
8	Specifying and Checking File System Crash-Consistency Models.
8	Failure-Atomic Persistent Memory Updates via JUSTDO Logging.
8	The Computational Sprinting Game.
8	Scaling up Superoptimization.
7	Dirigent: Enforcing QoS for Latency-Critical Tasks on Shared Multicore Systems.
7	High Performance Packet Processing with FlexNIC.
7	Baymax: QoS Awareness and Increased Utilization for Non-Preemptive Accelerators in Warehouse Scale Computers.
7	Scalable Kernel TCP Design and Implementation for Short-Lived Connections.
7	COATCheck: Verifying Memory Ordering at the Hardware-OS Interface.
6	ProteusTM: Abstraction Meets Performance in Transactional Memory.
6	Generating Configurable Hardware from Parallel Patterns.
6	Proactive Control of Approximate Programs.
5	PIFT: Predictive Information-Flow Tracking.
5	An Energy-interference-free Hardware-Software Debugger for Intermittent Energy-harvesting Systems.
5	Paravirtual Remote I/O.
5	High-Density Image Storage Using Approximate Memory Cells.
5	Analyzing Behavior Specialized Acceleration.
4	NVWAL: Exploiting NVRAM in Write-Ahead Logging.
4	Sego: Pervasive Trusted Metadata for Efficiently Verified Untrusted System Services.
4	memif: Towards Programming Heterogeneous Memory Asynchronously.
4	Silent Shredder: Zero-Cost Shredding for Secure Non-Volatile Main Memory Controllers.
3	M3: A Hardware/Operating-System Co-Design to Tame Heterogeneous Manycores.
3	Whirlpool: Improving Dynamic Cache Management with Static Data Classification.
3	RAPID Programming of Pattern-Recognition Processors.
3	TxRace: Efficient Data Race Detection Using Commodity Hardware Transactional Memory.
3	SpaceJMP: Programming with Multiple Virtual Address Spaces.
3	ReBudget: Trading Off Efficiency vs. Fairness in Market-Based Multicore Resource Allocation via Runtime Budget Reassignment.
2	HIPStR: Heterogeneous-ISA Program State Relocation.
2	Interference Management for Distributed Parallel Applications in Consolidated Clusters.
2	WiSync: An Architecture for Fast Synchronization through On-Chip Wireless Communication.
2	Architecture-Adaptive Code Variant Tuning.
2	True IOMMU Protection from DMA Attacks: When Copy is Faster than Zero Copy.
2	CSR: Core Surprise Removal in Commodity Operating Systems.
2	DySel: Lightweight Dynamic Selection for Kernel-based Data-parallel Programming Model.
2	AxGames: Towards Crowdsourcing Quality Target Determination in Approximate Computing.
2	Lifting Assembly to Intermediate Representation: A Novel Approach Leveraging Compilers.
1	Prudent Memory Reclamation in Procrastination-Based Synchronization.
1	Programming Uncertain <T>hings.
1	TPC: Target-Driven Parallelism Combining Prediction and Correction to Reduce Tail Latency in Interactive Services.
1	LDX: Causality Inference by Lightweight Dual Execution.
1	CloudSeer: Workflow Monitoring of Cloud Infrastructures via Interleaved Logs.
1	CASPAR: Breaking Serialization in Lock-Free Multicore Synchronization.
1	Brain Inspired Computing.
0	Programmer Productivity in a World of Mushy Interfaces: Challenges of the Post-ISA Reality.
0	Synopsis of the ASPLOS ’16 Wild and Crazy Ideas (WACI) Invited-Speakers Session.
0	RID: Finding Reference Count Bugs with Inconsistent Path Pair Checking.
0	Sidewinder: An Energy Efficient and Developer Friendly Heterogeneous Architecture for Continuous Mobile Sensing.

2015¶

Cited by	Paper title
58	Memory Errors in Modern Systems: The Good, The Bad, and The Ugly.
53	PuDianNao: A Polyvalent Machine Learning Accelerator.
51	ApproxHadoop: Bringing Approximations to MapReduce Frameworks.
50	Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers.
38	GPU Concurrency: Weak Behaviours and Programming Assumptions.
34	Protecting Data on Smartphones and Tablets from Memory Attacks.
34	Beyond the PDP-11: Architectural Support for a Memory-Safe C Abstract Machine.
33	Page Placement Strategies for GPUs within Heterogeneous Memory Systems.
33	GhostRider: A Hardware-Software System for Memory Trace Oblivious Computation.
33	Asynchronized Concurrency: The Secret to Scaling Concurrent Search Data Structures.
32	PolyMage: Automatic Optimization for Image Processing Pipelines.
30	Mojim: A Reliable and Highly-Available Non-Volatile Memory System.
29	FACADE: A Compiler and Runtime for (Almost) Object-Bounded Big Data Applications.
28	Reduced Hardware NOrec: A Safe and Scalable Hybrid Transactional Memory.
28	Chimera: Collaborative Preemption for Multitasking on a Shared GPU.
25	Few-to-Many: Incremental Parallelism for Reducing Tail Latency in Interactive Services.
24	A Probabilistic Graphical Model-based Approach for Minimizing Energy Under Performance Constraints.
24	Architectural Support for Software-Defined Metadata Processing.
23	A Hardware Design Language for Timing-Sensitive Information-Flow Security.
22	Monitoring and Debugging the Quality of Results in Approximate Programs.
21	DeNovoSync: Efficient Support for Arbitrary Synchronization without Writer-Initiated Invalidations.
20	NumaGiC: a Garbage Collector for Big Data on Big NUMA Machines.
20	Freecursive ORAM: [Nearly] Free Recursion and Integrity Verification for Position-based Oblivious RAM.
19	Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability.
17	Nested Kernel: An Operating System Architecture for Intra-Kernel Privilege Separation.
16	Targeted Automatic Integer Overflow Discovery Using Goal-Directed Conditional Branch Enforcement.
16	DEUCE: Write-Efficient Encryption for Non-Volatile Memories.
15	rIOMMU: Efficient IOMMU for I/O Devices that Employ Ring Buffers.
13	iThreads: A Threading Library for Parallel Incremental Computation.
12	Synchronization Using Remote-Scope Promotion.
11	CommGuard: Mitigating Communication Errors in Error-Prone Parallel Execution.
10	VARAN the Unbelievable: An Efficient N-version Execution Framework.
10	SPECS: A Lightweight Runtime Mechanism for Protecting Software from Security-Critical Processor Bugs.
10	Automated OS-level Device Runtime Power Management.
9	Ziria: A DSL for Wireless Systems Programming.
9	CoolAir: Temperature- and Variation-Aware Management for Free-Cooled Datacenters.
9	Improving Agility and Elasticity in Bare-metal Clouds.
9	SD-PCM: Constructing Reliable Super Dense Phase Change Memory under Write Disturbance.
7	Supporting Differentiated Services in Computers via Programmable Architecture for Resourcing-on-Demand (PARD).
6	Temporally Bounding TSO for Fence-Free Asymmetric Synchronization.
5	Compiler Management of Communication and Parallelism for Quantum Computation.
4	Kinetic Dependence Graphs.
4	On-the-Fly Principled Speculation for FSM Parallelization.
4	Watson and the Era of Cognitive Computing.
3	Architectural Support for Cyber-Physical Systems.
2	Dual Execution for On the Fly Fine Grained Execution Comparison.
1	More is Less, Less is More: Molecular-Scale Photonic NoC Power Topologies.
1	Asymmetric Memory Fences: Optimizing Both Performance and Implementability.
1	Architectural Support for Dynamic Linking.
1	DIABLO: A Warehouse-Scale Computer Network Simulator using FPGAs.

2014¶

Cited by	Paper title
220	DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning.
213	Quasar: resource-efficient and QoS-aware cluster management.
88	KVM/ARM: the design and implementation of the Linux ARM hypervisor.
87	SDF: software-defined flash for web-scale internet storage systems.
79	Paraprox: pattern-based approximation for data parallel applications.
76	Q100: the architecture and design of a database processing unit.
68	Uncertain: a first-order type for uncertain data.
68	Using ARM TrustZone to build a trusted language runtime for mobile applications.
65	Virtual ghost: protecting applications from hostile operating systems.
59	Scale-out NUMA.
54	NVM duet: unified working memory and persistent store architecture.
44	Architectural support for address translation on GPUs: designing memory management units for CPU/GPUs with unified address spaces.
43	Ubik: efficient cache sharing with strict QoS for latency-critical workloads.
43	Price theory based power management for heterogeneous multi-cores.
42	Heterogeneous-race-free memory models.
42	REF: resource elasticity fairness with sharing incentives for multiprocessors.
40	EnCore: exploiting system environment and correlation information for misconfiguration detection.
39	Post-compiler software optimization for reducing energy.
38	Data-parallel finite-state machines.
33	K2: a mobile operating system for heterogeneous coherence domains.
33	Transactionalizing legacy code: an experience report using GCC and Memcached.
33	Disengaged scheduling for fair, protected access to fast computational accelerators.
30	Deterministic Galois: on-demand, portable and parameterless.
29	VSwapper: a memory swapper for virtualized environments.
27	Underprovisioning backup power infrastructure for datacenters.
25	Integrated 3D-stacked server designs for increasing physical density of key-value stores.
23	Energy-efficient work-stealing language runtimes.
20	Sapper: a language for hardware-level security policy enforcement.
20	SI-TM: reducing transactional memory abort rates through snapshot isolation.
19	The benefit of SMT in the multi-core era: flexibility towards degrees of thread-level parallelism.
18	Comprehending performance from real-world execution traces: a device-driver case.
18	Prototyping symbolic execution engines for interpreted languages.
16	Rhythm: harnessing data parallel hardware for server workloads.
16	I/O paravirtualization at the device file boundary.
14	RelaxReplay: record and replay for relaxed-consistency multiprocessors.
12	Low-level detection of language-level data races with LARD.
12	The sharing architecture: sub-core configurability for IaaS clouds.
11	Triple-A: a Non-SSD based autonomic all-flash array for high performance storage systems.
10	Challenging the “embarrassingly sequential”: parallelizing finite state machine-based computations through principled speculation.
10	Speculative hardware/software co-designed floating-point multiply-add fusion.
10	Leveraging the short-term memory of hardware to diagnose production-run software failures.
9	Fence-free work stealing on bounded TSO processors.
9	Neuromorphic processing: a new frontier in scaling computer architecture.
8	Cider: native execution of iOS apps on android.
7	ASC: automatically scalable computation.
7	Finding the limit: examining the potential and complexity of compilation scheduling for JIT-based runtime systems.
6	High-performance fractal coherence.
4	Guardrail: a high fidelity approach to protecting hardware devices from buggy drivers.
4	Finding trojan message vulnerabilities in distributed systems.
2	Locality-oblivious cache organization leveraging single-cycle multi-hop NoCs.
1	Resolved: specialized architectures, languages, and system software should supplant general-purpose alternatives within a decade.
1	Inside Windows Azure: the challenges and opportunities of a cloud operating system.

2013¶

Cited by	Paper title
248	Paragon: QoS-aware scheduling for heterogeneous datacenters.
157	Unikernels: library operating systems for the cloud.
151	Parasol and GreenSwitch: managing datacenters powered by renewable energy.
147	OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance.
131	Traffic management: a holistic approach to memory placement on NUMA systems.
128	InkTag: secure applications on an untrusted operating system.
89	Improving GPGPU concurrency with elastic kernels.
88	Stochastic superoptimization.
80	Iago attacks: why the system call API is a bad untrusted RPC interface.
70	GPUfs: integrating a file system with GPUs.
58	Portable performance on heterogeneous architectures.
56	STABILIZER: statistically sound performance evaluation.
55	Power containers: an OS facility for fine-grained power and energy management on multicore servers.
48	HOTL: a higher order theory of locality.
44	Using likely invariants for automated software fault localization.
44	Fine-grained fault tolerance using device checkpoints.
41	DeNovoND: efficient hardware support for disciplined non-determinism.
40	Safe and automatic live update for operating systems.
34	Demand-based coordinated scheduling for SMP VMs.
33	Verifying systems rules using rule-directed symbolic execution.
33	Verifying security invariants in ExpressOS.
33	A study of the scalability of stop-the-world garbage collectors on multicores.
32	Computational sprinting on a hardware/software testbed.
30	Regularities considered harmful: forcing randomness to memory accesses to reduce row buffer conflicts for multi-core, multi-bank systems.
30	ReQoS: reactive static/dynamic compilation for QoS in warehouse scale computers.
28	Discerning the dominant out-of-order performance advantage: is it speculation or dynamism?
28	Production-run software failure diagnosis via hardware performance counters.
28	Parallelizing data race detection.
27	ConAir: featherweight concurrency bug recovery via single-threaded idempotent execution.
27	DDOS: taming nondeterminism in distributed systems.
26	Transparent mutable replay for multicore debugging and patch validation.
25	Cooperative empirical failure avoidance for multithreaded programs.
23	Automated repair of binary and assembly programs for cooperating embedded devices.
20	GPUDet: a deterministic GPU architecture.
20	Volition: scalable and precise sequential consistency violation detection.
20	Cyrus: unintrusive application-level record-replay for replay parallelism.
16	Why you should care about quantile regression.
15	Wait-n-GoTM: improving HTM performance by serializing cyclic dependencies.
14	To hardware prefetch or not to prefetch?: a virtualized environment study and core binding approach.
14	Hardware support for fine-grained event-driven computation in Anton 2.
9	Practical automatic loop specialization.
7	The rise of the expert amateur: DIY culture and the evolution of computer science.
4	DeAliaser: alias speculation using atomic region support.
4	Efficient virtualization on embedded power architecture® platforms.
1	Research directions for 21st century computer systems: ASPLOS 2013 panel.
0	TSO_ATOMICITY: efficient hardware primitive for TSO-preserving region optimizations.

2012¶

Cited by	Paper title
485	Clearing the clouds: a study of emerging scale-out workloads on modern hardware.
256	Architecture support for disciplined approximate programming.
194	Tarazu: optimizing MapReduce on heterogeneous clusters.
177	Green-Marl: a DSL for easy and efficient graph analysis.
169	ELI: bare-metal performance for I/O virtualization.
164	Cosmic rays don’t strike twice: understanding the nature of DRAM errors and the implications for system design.
140	DejaVu: accelerating resource allocation in virtualized environments.
133	Whole-system persistence.
109	Providing safe, user space access to fast, solid state disks.
101	Architectural support for hypervisor-secure virtualization.
97	Bottleneck identification and scheduling in multithreaded applications.
89	Leveraging stored energy for handling power emergencies in aggressively provisioned datacenters.
80	Relyzer: exploiting application-level fault equivalence to analyze application resiliency to transient faults.
74	Data races vs. data race bugs: telling the difference with portend.
72	Scalable address spaces using RCU balanced trees.
64	CRUISE: cache replacement and utility-aware scheduling.
58	Understanding modern device drivers.
56	Execution migration in a heterogeneous-ISA chip multiprocessor.
53	DreamWeaver: architectural support for deep sleep.
52	PocketWeb: instant web browsing for mobile devices.
50	Reflex: using low-power processors in smartphones without knowing them.
39	Path-exploration lifting: hi-fi tests for lo-fi emulators.
38	Efficient sequential consistency via conflict ordering.
33	Optimal task assignment in multithreaded processors: a statistical approach.
30	Region scheduling: efficiently using the cache architectures via page-level affinity.
29	Comprehensive kernel instrumentation via dynamic binary translation.
27	Chameleon: operating system support for dynamic processors.
24	Applying transactional memory to concurrency bugs.
22	HICAMP: architectural support for efficient concurrency-safe shared structured data access.
21	SIMD defragmenter: efficient ILP realization on data-parallel architectures.
20	A case for unlimited watchpoints.
18	Aikido: accelerating shared data dynamic analyses.
14	Automatic generation of hardware/software interfaces.
12	Totally green: evaluating and designing servers for lifecycle environmental impact.
12	Iterative optimization for the data center.
8	An update-aware storage system for low-locality update-intensive workloads.
7	Continuous object access profiling and optimizations to overcome the memory wall and bloat.

2011¶

Cited by	Paper title
326	S2E: a platform for in-vivo multi-path analysis of software systems.
314	Mnemosyne: lightweight persistent memory.
307	NV-Heaps: making persistent objects fast and safe with next-generation, non-volatile memories.
272	Flikker: saving DRAM refresh-power through critical data partitioning.
199	MemScale: active low-power modes for main memory.
197	Dynamic knobs for responsive power-aware computing.
172	Faults in Linux: ten years later.
161	On-the-fly elimination of dynamic irregularities for GPU computing.
153	DoublePlay: parallelizing sequential logging and replay.
137	Rethinking the library OS from the top down.
132	Blink: managing server clusters on intermittent power.
128	Improving software diagnosability via log enhancement.
124	Hybrid NOrec: a case study in the effectiveness of best effort hardware transactional memory.
106	Ensuring operating system kernel integrity with OSck.
90	ConSeq: detecting concurrency bugs through sequential errors.
87	RCDC: a relaxed consistency deterministic computer.
85	Mementos: system support for long-running computation on RFID-scale devices.
84	Sponge: portable stream programming on graphics engines.
74	Pocket cloudlets.
72	Looking back on the language and hardware revolutions: measured power, performance, and scaling.
66	Inter-core prefetching for multicore processors using migrating helper threads.
44	2ndStrike: toward manifesting hidden concurrency typestate bugs.
36	Orchestration by approximation: mapping stream programs onto multicore architectures.
30	A case for neuromorphic ISAs.
30	Hardware acceleration of transactional memory on commodity systems.
29	Exploring circuit timing-aware language and compilation.
29	Efficient processor support for DRFx, a memory model with exceptions.
26	Synthesizing concurrent schedulers for irregular algorithms.
26	Improving the performance of trace-based systems by false loop filtering.
21	Specifying and checking semantic atomicity for multithreaded programs.
16	A declarative language approach to device configuration.
6	Improved device driver reliability through hardware verification reuse.
5	The cloud will change everything.

2010¶

Cited by	Paper title
540	Addressing shared resource contention in multicore processors via scheduling.
405	Conservation cores: reducing the energy of mature computations.
275	CoreDet: a compiler and runtime system for deterministic multithreaded execution.
239	Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems.
180	Joint optimization of idle and cooling power in data centers while maintaining response time.
175	An asymmetric distributed shared memory model for heterogeneous parallel systems.
171	SherLog: error diagnosis by connecting clues from run-time logs.
163	A randomized scheduler with probabilistic guarantees of finding bugs.
161	Micro-pages: increasing DRAM efficiency with locality-aware data placement.
156	Shoestring: probabilistic soft error reliability on the cheap.
150	Dynamically replicated memory: building reliable systems from nanoscale resistive memories.
132	Respec: efficient online multiprocessor replay via speculation and external determinism.
111	Power routing: dynamic power provisioning in the data center.
104	Flexible architectural support for fine-grain scheduling.
102	Virtualized and flexible ECC for main memory.
101	Speculative parallelization using software multi-threaded transactions.
91	A power-efficient all-optical on-chip interconnect using wavelength-based oblivious routing.
91	ConMem: detecting severe concurrency bugs through an effect-oriented approach.
71	Characterizing processor thermal behavior.
71	Inter-core cooperative TLB for chip multiprocessors.
67	Probabilistic job symbiosis modeling for SMT processor scheduling.
57	Analyzing multicore dumps to facilitate concurrency bug reproduction.
54	Decoupling contention management from scheduling.
42	ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications.
30	MacroSS: macro-SIMDization of streaming applications.
27	COMPASS: a programmable data prefetcher using idle GPU shaders.
26	Orthrus: efficient software integrity protection on multi-cores.
24	Butterfly analysis: adapting dataflow analysis to dynamic parallel monitoring.
23	A real system evaluation of hardware atomicity for software speculation.
18	Specifying and dynamically verifying address translation-aware memory consistency.
17	Request behavior variations.
10	Dynamic filtering: multi-purpose architecture support for language runtime systems.
0	Technology for developing regions: Moore’s law is not enough.

2009¶

Cited by	Paper title
826	PowerNap: eliminating server idle power.
627	DFTL: a flash translation layer employing demand-based selective caching of page-level address mappings.
369	Kendo: efficient deterministic multithreading in software.
300	DMP: deterministic shared memory multiprocessing.
269	Accelerating critical section execution with asymmetric multi-core architectures.
268	Early experience with a commercial hardware transactional memory implementation.
267	CTrigger: exposing atomicity violation bugs from their hiding places.
258	Gordon: using flash memory to build fast, power-efficient clusters for data-intensive applications.
232	Producing wrong data without doing anything obviously wrong!
176	RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations.
155	ASSURE: automatic software self-healing using rescue points.
136	Capo: a software-hardware interface for practical deterministic multiprocessor replay.
127	Complete information flow tracking from the gates up.
64	An evaluation of the TRIPS computer system.
61	Per-thread cycle accounting in SMT processors.
58	Mixed-mode multicore reliability.
54	ISOLATOR: dynamically ensuring isolation in concurrent programs.
53	Recovery domains: an organizing principle for recoverable operating systems.
45	Leak pruning.
42	Efficient online validation with delta execution.
42	Commutativity analysis for software parallelization: letting program transformations see the big picture.
38	TwinDrivers: semi-automatic derivation of fast and safe hypervisor network drivers from guest OS drivers.
27	Maximum benefit from a minimal HTM.
23	Anomaly-based bug prediction, isolation, and validation: an automated approach for software debugging.
20	Phantom-BTB: a virtualized branch target buffer design.
18	StreamRay: a stream filtering architecture for coherent ray tracing.
17	Architectural implications of nanoscale integrated sensing and computing.
16	Architectural support for SWAR text processing with parallel bit streams: the inductive doubling principle.
15	Dynamic prediction of collection yield for managed runtimes.

2008¶

Cited by	Paper title
680	Learning from mistakes: a comprehensive study on real world concurrency bug characteristics.
610	No "power" struggles: coordinated multi-level power management for the data center.
392	Overshadow: a virtualization-based approach to retrofitting protection in commodity operating systems.
318	Merge: a programming model for heterogeneous multi-core systems.
217	Understanding the propagation of hard errors to software and implications for resilient system design.
170	Feedback-driven threading: power-efficient and high-performance execution of multi-threaded workloads on CMPs.
168	Accelerating two-dimensional page walks for virtualized systems.
130	Adaptive set pinning: managing shared caches in chip multiprocessors.
127	Parallelizing security checks on commodity hardware.
113	Hardbound: architectural support for spatial safety of the C programming language.
101	The design and implementation of microdrivers.
99	Better bug reporting with better privacy.
92	Streamware: programming general-purpose multicore processors using streams.
92	Optimistic parallelism benefits from data partitioning.
80	How low can you go?: recommendations for hardware-supported minimal TCB code execution.
79	Understanding and visualizing full systems with data flow tomography.
66	Adapting to intermittent faults in multicore systems.
58	Archipelago: trading address space for reliability and security.
52	PICSEL: measuring user-perceived performance to control dynamic frequency scaling.
46	Efficiency trends and limits from comprehensive microarchitectural adaptivity.
42	Exploiting access semantics and program behavior to reduce snoop power in chip multiprocessors.
39	SoftSig: software-exposed hardware signatures for code analysis and optimization.
38	Predictor virtualization.
37	Hardware counter driven on-the-fly request signatures.
32	Xoc, an extension-oriented compiler for systems programming.
31	Tapping into the fountain of CPUs: on operating system support for programmable devices.
28	The mapping collector: virtual memory support for generational, parallel, and concurrent compaction.
27	Improving the performance of object-oriented languages with dynamic predication of indirect jumps.
21	Accurate branch prediction for short threads.
19	Dispersing proprietary applications as benchmarks through code mutation.
13	Communication optimizations for global multi-threaded instruction scheduling.
6	Toward molecular programming with DNA.

2006¶

Cited by	Paper title
814	A comparison of software and hardware techniques for x86 virtualization.
549	Exploiting coarse-grained task, data, and pipeline parallelism in stream programs.
498	Hybrid transactional memory.
377	AVIO: detecting atomicity violations via access interleaving invariants.
377	Accelerator: using data parallelism to program GPUs for general-purpose uses.
370	Accurate and efficient regression modeling for microarchitectural performance and power prediction.
310	Combinatorial sketching for finite programs.
266	Efficiently exploring architectural design spaces via predictive modeling.
227	Supporting nested transactional memory in LogTM.
225	Mercury and Freon: temperature emulation and management for server systems.
212	PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor.
199	Geiger: monitoring the buffer cache in a virtual machine environment.
144	Recording shared memory dependencies using strata.
143	Ultra low-cost defect protection for microprocessor pipelines.
142	A performance counter architecture for computing accurate CPI components.
123	A regulated transitive reduction (RTR) for longer memory race recording.
120	Computation spreading: employing hardware migration to specialize CMP cores on-the-fly.
110	Unbounded page-based transactional memory.
101	Bell: bit-encoding online memory leak detection.
95	Automatic generation of peephole superoptimizers.
93	Tradeoffs in transactional memory virtualization.
71	Tartan: evaluating spatial computation for whole program execution.
70	Temporal search: detecting hidden malware timebombs with virtual machines.
68	Introspective 3D chips.
58	Integrated network interfaces for high-bandwidth TCP/IP.
56	Software-based instruction caching for embedded processors.
55	A spatial path scheduling algorithm for EDGE architectures.
49	SlicK: slice-based locality exploitation for efficient redundant multithreading.
47	A probabilistic pointer analysis for speculative optimizations.
43	Understanding prediction-based partial redundant threading for low-overhead, high-coverage fault tolerance.
41	Comprehensively and efficiently protecting the heap.
35	HeapMD: identifying heap-based bugs using anomaly detection.
35	Stealth prefetching.
35	A defect tolerant self-organizing nanoscale SIMD architecture.
34	Instruction scheduling for a tiled dataflow architecture.
33	A new idiom recognition framework for exploiting hardware-assist instructions.
31	Mapping Esterel onto a multi-threaded embedded processor.
6	A program transformation and architecture support for quantum uncomputation.
4	Impact of virtualization on computer architecture and operating systems.

Last updated:	2017-08-07