ECE 511 FA21 Reading List:

Lecture 1 (08/24)

Introduction: Technology and Performance
[Read] Cramming more components onto integrated circuits
https://newsroom.intel.com/wp-content/uploads/sites/11/2018/05/moores-law-electronics.pdf
[Read] Coming Challenges in Microarchitecture and Architecture
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=915377 

Lecture 2 (08/26)

gem5 Tutorial: See Resources
[Review] Power: a first-class architectural design constraint
http://ieeexplore.ieee.org/document/917539/
[Reference] Morgan Claypool Lecture on Power
https://www.morganclaypool.com/doi/pdf/10.2200/S00119ED1V01Y200805CAC004
[Read] The gem5 Simulator: Version 20.0+
https://arxiv.org/abs/2007.03152
[Reference] http://learning.gem5.org/book/index.html
[Reference] Morgan Claypool Lecture on Simulation
https://www.morganclaypool.com/doi/pdfplus/10.2200/S00273ED1V01Y201006CAC010

Lecture 3 (08/31)

Technology and Performance, ISA
[Review] Limits of Instruction-Level Parallelism  
https://www.hpl.hp.com/techreports/Compaq-DEC/WRL-93-6.pdf
[Read] Power Struggles: Revisiting the RISC vs. CISC Debate on Contemporary ARM and x86 Architectures
https://research.cs.wisc.edu/vertical/papers/2013/hpca13-isa-power-struggles.pdf
[Reference] Instruction Sets and Beyond: Computers, Complexity, and Controversy
https://ieeexplore.ieee.org/document/1663000

Lecture 4 (09/02)

Microarchitecture Overview
[Review] The Microarchitecture of Superscalar Processors
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=476078
[Read] Instruction-Level Parallel Processing: History, Overview and Perspective
https://www.hpl.hp.com/techreports/92/HPL-92-132.pdf
[Reference] R. Tomasulo, "An Efficient Algorithm for Exploiting Multiple Arithmetic Units," IBM Journal of Research and Development, 11:25-33, January 1967.
[Reference] Y.N. Patt, W.W. Hwu, and M.C. Shebanow, "HPS, A New Microarchitecture: Rationale and Introduction," Proceedings of the 18th International Microprogramming Workshop, Asilomar, CA, Dec. 1985, pp. 103-108.

Lecture 5 (09/07)

Instruction Fetch (I)
[Review] Optimization of instruction fetch mechanisms for high issue rates
https://ieeexplore.ieee.org/document/524573
[Read] Evaluation of design options for the trace cache fetch mechanism
https://ieeexplore.ieee.org/document/752661

Lecture 6 (09/09)

Instruction Fetch (II)
[Read] Combining Branch Predictors
https://www.hpl.hp.com/techreports/Compaq-DEC/WRL-TN-36.pdf
[Review] Design Tradeoffs for the Alpha EV8 Conditional Branch Predictor
http://www.cs.binghamton.edu/~dima/cs522_05/ev8predictor.pdf

Lecture 7 (09/14)

Hardware Tutorial
[Read] Introducing the IA-64 Architecture
https://ieeexplore.ieee.org/document/877947
[Review] A Comparison of Full and Partial Predicated Execution Support for ILP Processors
https://ieeexplore.ieee.org/document/524556

Lecture 8 (09/16)

Speculative Execution / OO (I)
[Review Option 1] The MIPS R10000 Superscalar Microprocessor
https://ieeexplore.ieee.org/document/491460
[Review Option 2] Tuning the Pentium Pro Microarchitecture
https://www.computer.org/csdl/magazine/mi/1996/02/m2008/13rRUxZ0nYd
[Reference] Processor Microarchitecture: An Implementation Perspective, Chapters 5 and 6

Lecture 9 (09/21)

Speculative Execution / OO (II)
[Review] Dynamic speculation and synchronization of data dependences
https://dl.acm.org/doi/10.1145/264107.264189
[Read] Instruction Issue Logic for High-Performance, Interruptible, Multiple Functional Unit, Pipelined Computers
https://ieeexplore.ieee.org/document/48865

Lecture 10 (09/23)

Speculative Execution / OO (III)
[Review] Implementing Precise Interrupts in Pipelined Processors
https://ieeexplore.ieee.org/iel1/12/257/00004607.pdf
[Read] Haswell: The Fourth-Generation Intel Core Processor
http://ieeexplore.ieee.org/document/6762795
[Reference] Processor Microarchitecture: An Implementation Perspective, Chapter 7

Lecture 11 (09/28)

Speculative Execution / OO (IV)
[Read] Look-ahead Processors
https://dl.acm.org/doi/10.1145/356654.356657
[Review] Complexity-Effective Superscalar Processors
https://dl.acm.org/doi/10.1145/384286.264201
[Reference] Processor Microarchitecture: An Implementation Perspective, Chapter 8

Lecture 12 (09/30)

Memory Dataflow
[Read] Scalable Hardware Memory Disambiguation for High ILP Processors
https://www.microarch.org/micro36/html/pdf/sethumadhavan-ScalableHardware.pdf
[Review] Memory Dependence Prediction using Store Sets
http://people.csail.mit.edu/emer/papers/1998.06.isca.storesets.pdf

Lecture 13 (10/05)

[N/A] Midterm

Lecture 14 (10/07)

Caches
[Review Option 1] Caches and Memory
https://dl.acm.org/doi/10.1145/356887.356892
[Review Option 2] Selective Cache Ways: On-Demand Cache Resource Allocation
https://dl.acm.org/doi/10.5555/320080.320119
[Reference] Multi-core Hierarchies
https://www.morganclaypool.com/doi/abs/10.2200/S00365ED1V01Y201105CAC017

Lecture 15 (10/12)

Caches and Memory
[Review] Sandbox Prefetching: Safe Run-Time Evaluation of Aggressive Prefetchers
https://www.cs.utah.edu/~rajeev/pubs/hpca14p.pdf
[Read] Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers
https://dl.acm.org/doi/10.1145/325096.325162
[Reference] A Primer on Hardware Prefetching
https://www.morganclaypool.com/doi/abs/10.2200/S00581ED1V01Y201405CAC028

Lecture 16 (10/14)

Caches and Memory
[Review] Fine-Grained DRAM: Energy-Efficient DRAM for Extreme Bandwidth Systems
https://ieeexplore.ieee.org/document/8686544
[Read] High-performance DRAMs in workstation environments
https://ieeexplore.ieee.org/document/966491
[Reference] The Memory System: You Can’t Avoid It, You Can’t Ignore It, You Can’t Fake It
https://www.morganclaypool.com/doi/pdf/10.2200/S00201ED1V01Y200907CAC007

Lecture 17 (10/19)

[Review] Agile Paging: Exceeding the Best of Nested and Shadow Paging
https://research.cs.wisc.edu/multifacet/papers/isca16_agile_paging.pdf
[Read] Amortized Efficiency of List Update and Paging Rules
https://dl.acm.org/doi/10.1145/2786.2793
[Reference] Architecture and OS Support for Virtual Memory
https://www.morganclaypool.com/doi/pdf/10.2200/S00795ED1V01Y201708CAC042

Lecture 18 (10/21)

System-level Issues
[Read] A case for redundant arrays of inexpensive disks (RAID)
https://dl.acm.org/doi/pdf/10.1145/50202.50214
[Review] A Comparison of Software and Hardware Techniques for x86 Virtualization
https://www.vmware.com/pdf/asplos235_adams.pdf

Lecture 19 (10/26)

Multithreading
[Read] Hyper-Threading Technology and Microarchitecture
http://www.moreno.marzolla.name/teaching/HPC/vol6iss1_art01.pdf
[Review] Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor
https://ieeexplore.ieee.org/document/1563047

Lecture 20 (10/28)

Multiprocessing
[Read] The Oracle SPARC T5 16-Core Processor Scales to Eight Sockets
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6493301&tag=1
[Review] The Stanford Dash Multiprocessor
https://ieeexplore.ieee.org/document/121510

Lecture 21 (11/02)

GPUs / Accelerators (No required paper review)
[Read] The GPU Computing Era
https://ieeexplore.ieee.org/document/5446251
[Read] The Accelerator Wall: Limits of Chip Specialization
https://parallel.princeton.edu/papers/wall-hpca19.pdf

Lecture 22 (11/04)

Interconnections
[Read] A Survey of Wormhole Routing Techniques in Direct Networks
https://ieeexplore.ieee.org/document/191995
[Review] In-network cache coherence
https://dl.acm.org/doi/10.1109/MICRO.2006.27

Lecture 23 (11/09)

[N/A] Final

Lecture 24 (11/11)

Massively Parallel Processors and Programming
[Read] The CRAY-1 Computer System
https://dl.acm.org/doi/10.1145/359327.359336
[Review] A Special-Purpose Machine For Molecular Dynamics Simulation
https://cacm.acm.org/magazines/2008/7/5372-anton-a-special-purpose-machine-for-molecular-dynamics-simulation/fulltext