|
1) Cache Coherence and Scalability:
1a) P. Stenstrom. "A
Survey of Cache Coherence Schemes for Multiprocessors". IEEE Computer,
1990. Also see Dubois, Annavaram, and Stenstrom's book or
Culler and Singh's book.
1b) D. Lenoski et al. "The
Directory-Based Cache Coherence Protocol for the DASH Multiprocessor".
ISCA 1990.
1c) A. Gupta et al. "Reducing
Memory and Traffic Requirements for Scalable Directory-Based Cache Coherence
Schemes". ICPP 1990.
1d) A. Gupta & W. Weber "Cache
Invalidation Patterns in Shared-Memory Multiprocessors". Transactions on
Computers, July 1992.
1e) J. Torrellas, M. Lam & J. Hennessy
"False Sharing and Spatial Locality in Multiprocessor Caches". Transactions
on Computers, June 1994.
1f) Mengjia Yan, Jen-Yang Wen, Christopher Fletcher, and Josep Torrellas.
"SecDir: A Secure Directory to Defeat Directory Side-Channel Attacks". ISCA 2019.
|
|
2) Memory Consistency Models:
2a) K. Gharachorloo et al. "Memory
Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors".
ISCA 1990.
2b) K. Gharachorloo et al. "Performance
Evaluation of Memory Consistency Models for Shared-Memory Multiprocessors".
ASPLOS 1991.
2c) K. Gharachorloo et al.
"Two Techniques to Enhance the Performance of Memory Consistency
Models". ICPP 1991.
2d) S. Adve & K. Gharachorloo.
"Shared Memory Consistency Models: A Tutorial". WRL Research
Report 95/7, 1995.
|
|
3) Prefetching and Forwarding:
3a) T. Mowry et al. "Design
and Evaluation of a Compiler Algorithm for Prefetching".
ASPLOS 1992.
3b) Y. Solihin et al.
"Using a User-Level Memory Thread for Correlation Prefetching".
ISCA 2002.
|
|
4) Synchronization:
4a) J. Goodman
et al. "Efficient Synchronization Primitives for Large Scale Cache-Coherent
Multiprocessors". ASPLOS 1989.
4b) J. Mellor-Crummey
and M. Scott. "Algorithms for Scalable Synchronization on Shared-Memory
Multiprocessors". ACM TOCS 1991.
|
|
5) Multithreading:
5a) D. Tullsen et al.
"Simultaneous Multithreading: Maximizing On-Chip Parallelism".
ISCA 1995.
5b) D. Tullsen et al.
"Exploiting
Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading
Processor". ISCA 1996.
|
|
6) Multiple Processors on a Chip:
6a) K. Olukotun et al.
"The Case for a Single-Chip Multiprocessor". ASPLOS 1996.
6b) G. Sohi et al.
"Multiscalar
Processors". ISCA 1995.
6c) V. Krishnan and J. Torrellas.
"A Chip Multiprocessor Architecture with Speculative Multithreading".
IEEE Transactions on Computers, 1999.
|
|
7) Speculative Parallelization and Execution:
7a) J. Steffan et al.
"A Scalable Approach to Thread-Level Speculation". ISCA 2000.
7b) J. Martinez et al.
"Speculative Synchronization: Applying Thread-Level Speculation to
Explicitly Parallel Applications". ASPLOS 2002.
|
|
8) Processor and Memory Integration:
8a) D. Patterson et al.
"A
Case for Intelligent DRAM". IEEE Micro 1997.
8b) Y. Kang et al.
"FlexRAM: Toward an Advanced Intelligent Memory System". ICCD 1999.
|
|
9) Reliability:
9a) J. Oplinger et al.
"Enhancing Software Reliability with Speculative Threads".
ASPLOS 2002.
9b) M. Prvulovic et al.
"ReEnact: Using Thread-Level Speculation to Debug Data Races in Multithreaded
Codes". ISCA 2003.
9c) S. Mukherjee et al.
"Detailed Design and Evaluation of Redundant Multithreading
Alternatives". ISCA 2002.
9d) M. Prvulovic et al.
"ReVive: Cost-Effective Architectural Support for Rollback Recovery
in Shared-Memory Multiprocessors". ISCA 2002.
9e) J. Nakano et al.
"ReViveI/O: Efficient Handling of I/O in Highly-Available Rollback-Recovery
Servers". HPCA 2006 (optional).
|
|
10) Interaction of Operating Systems with Architecture:
10a) J. Torrellas et al. "Characterizing
the Caching and Synchronization Performance of a Multiprocessor Operating
System". ASPLOS 1992.
10b) B. Verghese
et al. "Operating System Support for Improving Data Locality on CC-NUMA
Compute Servers". ASPLOS 1996.
10c) P. Trancoso et al. "The
Memory Performance of DSS Commercial Workloads in Shared-Memory Multiprocessors".
HPCA 1997.
10d) L. Barroso et al.
"Memory System Characterization of Commercial Workloads".
ISCA 1998.
10e) J. Torrellas, Andrew Tucker, and Anoop Gupta.
"Evaluating the Performance of Cache-Affinity Scheduling in Shared-Memory
Multiprocessors". Journal of Parallel and Distributed Computing (JPDC), February
1995.
|
|
11) Message Passing Architectures:
11a) D. Culler and J. Singh's book: Chapter 10.
11b) L. Ni and P. McKinley.
"A Survey of Wormhole Routing Techniques in Direct Networks". IEEE
Computer 1993.
11c) W. Dally. "Performance
Analysis of k-ary n-cube Interconnection Networks". IEEE Trans. on
Computers, 1990.
11d) S. Scott and G. Thorson. "The
Cray T3E Network: Adaptive Routing in a High Performance 3D Torus". Hot Interconnects IV, 1996.
|
|
12) Dataflow Architectures:
12a) R. Iannucci. "Toward
a Dataflow/Von Neumann Hybrid Architecture". ISCA 1988.
12b) A. Veen. "Dataflow Machine Architecture". ACM Computing Surveys, December 1986.
|
|
13) Data-Parallel and Data-Center Architectures:
13a) Norman P. Jouppi, George Kurian, Sheng Li, Peter Ma, Rahul Nagarajan, Lifeng Nai, Nishant Patil, Suvinay
Subramanian, Andy Swing, Brian Towles, Cliff Young, Xiang Zhou, Zongwei Zhou, and David Patterson. "TPU v4: An Optically Reconfigurable Supercomputer for
Machine Learning with Hardware Support for Embeddings". ISCA 2023.
13b) Jovan Stojkovic, Chaojie Zhang, Íñigo Goiri, Esha Choukse, Haoran Qiu, Rodrigo Fonseca, Josep Torrellas, and Ricardo Bianchini. "TAPAS: Thermal- and Power-Aware Scheduling for LLM Inference in Cloud Platforms". ASPLOS 2025.
|
|
14) 3D-Stacked Architectures:
14a) Aditya Agrawal, Josep Torrellas, and Sachin Idgunji. "Xylem: Enhancing Vertical Thermal Conduction in 3D Processor-Memory Stacks". MICRO 2017.
14b) Bhargava Gopireddy and Josep Torrellas. "Designing Vertical Processors in Monolithic 3D". ISCA 2019.
|
|
15) Cache-Only Memory Architectures:
15) Cache-Only Memory Architecture
|