ECE 412: Computer Architecture
Fall 2003, Supplementary Readings

Instruction Set Architecture

David A. Patterson, "Reduced instruction set computers", Communications of the ACM, v.28 n.1, p.8-21, Jan. 1985

Microarchitecture

Ronen, R., Mendelson, A., Lai, K., Lu, S-L., Pollack, F., and Shen, J., "Coming Challenges in Microarchitecture and Architecture," Proceedings of the IEEE, Vol 89, No. 3, March 2001.

Instruction Fetch

Johnny K. F. Lee and Alan Jay Smith, "Branch Prediction Strategies and Branch Target Buffer Design," IEEE Computer, January 1984. (This is a 17 MB file; it may take a while to download and print.)

Scott McFarling, "Combining Branch Predictors," Digital Western Research Laboratory Technical Note TN-36, June 1993.

Caches

Mark D. Hill and Alan Jay Smith, Evaluating Associativity in CPU Caches, IEEE Transactions on Computers, 38(12), 1989. (This is the paper that introduced the notion of the "3 Cs" of cache misses: compulsory, capacity and conflict.)

Norman P. Jouppi, Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers, ISCA, 1990.

Out-of-order Execution

A very nice survey paper:

James E. Smith and Gurindar S. Sohi, The Microarchitecture of Superscalar Processors, Proceedings of the IEEE, vol. 83, pp 1609--1624, Dec 1995.

The MIPS R10000: The first commercial microprocessor with an RRAT:

Kenneth C. Yeager, The MIPS R10000 Superscalar Microprocessor, IEEE Micro, April 1996.

Classics

R. M. Tomasulo, An Efficient Algorithm for Exploiting Multiple Arithmetic Units, IBM Journal of Research and Development, 11(1), 1967.

Robert M. Keller, Look-Ahead Processors, ACM Computing Surveys, 7(4), 1975.

VLIW and EPIC

B. Ramakrishna Rau and Joseph A. Fisher, Instruction-Level Parallel Processing: History, Overview, and Perspective, The Journal of Supercomputing, 7, 9-50, 1993.

David I. August, Daniel A. Connors, Scott A. Mahlke, John W. Sias, Kevin M. Crozier, Ben-Chung Cheng, Patrick R. Eaton, Qudus B. Olaniran, and Wen-mei W. Hwu, Integrated Predicated and Speculative Execution in the IMPACT EPIC Architecture, Proceedings of the 25th International Symposium on Computer Architecture, July 1998.

Intel's page for the Intel Itanium Architecture Software Developer's Manual.

Non-associative scheduling automata

Dan Ernst, Andrew Hamel, and Todd Austin, Cyclone: A Broadcast-Free Dynamic Instruction Scheduler with Selective Replay, ACM/IEEE 30th Annual International Symposium on Computer Architecture (ISCA-2003), June 2003.

Throughput Optimizations

David F. Bacon, Susan L. Graham and Oliver J. Sharp, Compiler Transformations for High-Performance Computing, ACM Computing Surveys, 26(4), 1994. (Sections 6.2.1, 6.2.7 and 6.4.1.)

Dean M. Tullsen, Susan J. Eggers, Joel S. Emer, Henry M. Levy, Jack L. Lo and Rebecca L. Stamm, Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor, ISCA, 1996.

Sanjay Patel, Daniel Holmes Friendly and Yale N. Patt, Evaluation of design options for the trace cache fetch mechanism, IEEE Transactions on Computers, 48(2), 1999.

Interconnects

Lionel Ni and Philip K. McKinley, A Survey of Wormhole Routing Techniques in Direct Networks, IEEE Computer, February, 1993.

Speculative Multithreading

Gurindar S. Sohi and Amir Roth, Speculative Multithreaded Processors, IEEE Computer, April 2001.