ECE 511: Computer Architecture
Fall 2004, Readings


Required Text (HJS): Mark D. Hill, Norman P. Jouppi and Gurindar S. Sohi (editors), "Readings in Computer Architecture," Morgan Kaufmann, 2000.


Intro

Ronen, R., Mendelson, A., Lai, K., Lu, S-L., Pollack, F., and Shen, J., "Coming Challenges in Microarchitecture and Architecture," Proceedings of the IEEE, 89(3), 2001.

G.E. Moore, "Cramming More Components onto Integrated Circuits," Electronics, Apr. 1965. (HJS:56)

S. Mazor, "The History of the Microcomputer - Invention and Evolution," Proceedings of the IEEE, 83(12):1601-1608, 1995. (HJS:60)

Simulator Construction

David W. Wall, "Limits of Instruction Level Parallelism," Digital Western Research Laboratory Research Report 93/6, 1993. (Extended version of a paper that appeared in ASPLOS 1991; the appendix describes trace-based simulator design.)

Mark D. Hill and Alan Jay Smith, "Evaluating Associativity in CPU Caches," IEEE Transactions on Computers, 38(12), 1989. (HJS:82)

Instruction Fetch

Scott McFarling, "Combining Branch Predictors," Digital Western Research Laboratory Technical Note TN-36, June 1993.

Sanjay Patel, Daniel Holmes Friendly and Yale N. Patt, "Evaluation of Design Options for the Trace Cache Fetch Mechanism," IEEE Transactions on Computers, 48(2), 1999.

Thomas Ball and James R. Larus, "Efficient Path Profiling," MICRO-29, December, 1996.

Daniel A. Jiménez, "Fast Path-Based Neural Branch Prediction," MICRO-36, December, 2003.

Speculative Execution and Rollback

James E. Smith and Gurindar S. Sohi, "The Microarchitecture of Superscalar Processors," Proceedings of the IEEE, 83(12):1609-1624, 1995.

Kenneth C. Yeager, "The MIPS R10000 Superscalar Microprocessor," IEEE Micro, April 1996. (HJS:275)

D.B. Papworth, "Tuning the Pentium Pro Microarchitecture," IEEE Micro, 16(2):8-15, 1996. (HJS:660)

B. Ramakrishna Rau and Joseph A. Fisher, "Instruction-Level Parallel Processing: History, Overview, and Perspective," The Journal of Supercomputing, 7:9-50, 1993. (HJS:288)

Memory Dataflow

George Z. Chrysos and Joel S. Emer, "Memory Dependence Prediction using Store Sets," ISCA, 1998.

Dynamic Optimization

Brian Fahs, Satarupa Bose, Matthew Crum, Brian Slechta, Francesco Spadini, Tony Tung, Sanjay J. Patel and Steven S. Lumetta, "Performance Characterization of a Hardware Mechanism for Dynamic Optimization," MICRO, 2001.

Scheduling

Dan Ernst, Andrew Hamel, and Todd Austin, "Cyclone: A Broadcast-Free Dynamic Instruction Scheduler with Selective Replay," ACM/IEEE 30th Annual International Symposium on Computer Architecture (ISCA-2003), June 2003.

Vasanth Bala and Norman Rubin, "Efficient Instruction Scheduling Using Finite State Automata," MICRO-28, 1995.

B. Ramakrishna Rau, "Iterative Modulo Scheduling," MICRO, 1994.

Caching

Daniel D. Sleator and Robert E. Tarjan, "Amortized Efficiency of List Update and Paging Rules," Communications of the ACM, 28(2), 1985.

Erik G. Hallnor and Steven K. Reinhardt, "A Fully Associative Software-Managed Cache Design," ISCA, 2000.

Amir Roth, Andreas Moshovos and Gurindar S. Sohi, "Dependence Based Prefetching for Linked Data Structures," ASPLOS-8, 1998.

Multithreading

David F. Bacon, Susan L. Graham and Oliver J. Sharp, "Compiler Transformations for High-Performance Computing," ACM Computing Surveys, 26(4), 1994. (Sections 6.2.1, 6.2.7 and 6.4.1.)

Dean M. Tullsen, Susan J. Eggers, Joel S. Emer, Henry M. Levy, Jack L. Lo and Rebecca L. Stamm, "Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor," ISCA, 1996. (HJS:350)

Hakim Akkary and Michael A. Driscoll, "A Dynamic Multithreading Processor," MICRO, 1998.

K. Akeley, "RealityEngine Graphics," SIGGRAPH, 1993. (HJS:507)

S. K. Reinhardt and S. S. Mukherjee, "Transient Fault Detection via Simultaneous Multithreading," ISCA, 2000.

Communication and Synchronization

Lionel Ni and Philip K. McKinley, "A Survey of Wormhole Routing Techniques in Direct Networks," IEEE Computer, February, 1993. (HJS:492)

Leslie Lamport, "How to Make a Multiprocessor Computer that Correctly Executes Multiprocess Programs," IEEE Transactions on Computers, 28(9):690-691, 1979. (HJS:574)

L.M. Censier and P. Feautrier, "A New Solution to Coherence Problems in Multicache Systems," IEEE Transactions on Computers, 27(12):1112-1118, 1978. (HJS:576)

Ravi Rajwar and James R. Goodman, "Transactional Lock-Free Execution of Lock-Based Programs," ASPLOS, 2002.