Required Text (HJS): Mark D. Hill, Norman P. Jouppi and Gurindar S. Sohi (editors), "Readings in Computer Architecture," Morgan Kaufmann, 2000.
Ronen, R., Mendelson, A., Lai, K., Lu, S-L., Pollack, F., and Shen, J., "Coming Challenges in Microarchitecture and Architecture," Proceedings of the IEEE, 89(3), 2001.
G.E. Moore, "Cramming More Components onto Integrated Circuits," Electronics, Apr. 1965. (HJS:56)
S. Mazor, "The History of the Microcomputer - Invention and Evolution," Proceedings of the IEEE, 83(12):1601-1608, 1995. (HJS:60)
David W. Wall, "Limits of Instruction Level Parallelism," Digital Western Research Laboratory Research Report 93/6, 1993 (extended version of a paper that appeared in ASPLOS 1991; the appendix describes trace-based simulator design).
Mark D. Hill and Alan Jay Smith, "Evaluating Associativity in CPU Caches," IEEE Transactions on Computers, 38(12), 1989. (HJS:82)
Scott McFarling, "Combining Branch Predictors," Digital Western Research Laboratory Technical Note TN-36, June 1993.
Sanjay Patel, Daniel Holmes Friendly and Yale N. Patt, "Evaluation of Design Options for the Trace Cache Fetch Mechanism," IEEE Transactions on Computers, 48(2), 1999.
Thomas Ball and James R. Larus, "Efficient Path Profiling," MICRO-29, December 1996.
Daniel A. Jiménez, "Fast Path-Based Neural Branch Prediction," MICRO-36, December 2003.
James E. Smith and Gurindar S. Sohi, "The Microarchitecture of Superscalar Processors," Proceedings of the IEEE, 83(12):1609-1624, 1995.
Kenneth C. Yeager, "The MIPS R10000 Superscalar Microprocessor," IEEE Micro, April 1996. (HJS:275)
D.B. Papworth, "Tuning the Pentium Pro Microarchitecture," IEEE Micro, 16(2):8-15, 1996. (HJS:660)
B. Ramakrishna Rau and Joseph A. Fisher, "Instruction-Level Parallel Processing: History, Overview, and Perspective," The Journal of Supercomputing, 7:9-50, 1993. (HJS:288)
George Z. Chrysos and Joel S. Emer, "Memory Dependence Prediction using Store Sets," ISCA, 1998.
Brian Fahs, Satarupa Bose, Matthew Crum, Brian Slechta, Francesco Spadini, Tony Tung, Sanjay J. Patel and Steven S. Lumetta, "Performance Characterization of a Hardware Mechanism for Dynamic Optimization," MICRO, 2001.
Dan Ernst, Andrew Hamel, and Todd Austin, "Cyclone: A Broadcast-Free Dynamic Instruction Scheduler with Selective Replay," ACM/IEEE 30th Annual International Symposium on Computer Architecture (ISCA-2003), June 2003.
Vasanth Bala and Norman Rubin, "Efficient Instruction Scheduling Using Finite State Automata," MICRO-28, 1995.
B. Ramakrishna Rau, "Iterative Modulo Scheduling," MICRO, 1994.
Daniel D. Sleator and Robert E. Tarjan, "Amortized Efficiency of List Update and Paging Rules," Communications of the ACM, 28(2), 1985.
Erik G. Hallnor and Steven K. Reinhardt, "A Fully Associative Software-Managed Cache Design," ISCA, 2000.
Amir Roth, Andreas Moshovos and Gurindar S. Sohi, "Dependence Based Prefetching for Linked Data Structures," ASPLOS-8, 1998.
David F. Bacon, Susan L. Graham and Oliver J. Sharp, "Compiler Transformations for High-Performance Computing," ACM Computing Surveys, 26(4), 1994. (Sections 6.2.1, 6.2.7 and 6.4.1.)
Dean M. Tullsen, Susan J. Eggers, Joel S. Emer, Henry M. Levy, Jack L. Lo and Rebecca L. Stamm, "Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor," ISCA, 1996. (HJS:350)
Hakim Akkary and Michael A. Driscoll, "A Dynamic Multithreading Processor," MICRO, 1998.
K. Akeley, "RealityEngine Graphics," SIGGRAPH, 1993. (HJS:507)
S. K. Reinhardt and S. S. Mukherjee, "Transient Fault Detection via Simultaneous Multithreading," ISCA, 2000.
Lionel Ni and Philip K. McKinley, "A Survey of Wormhole Routing Techniques in Direct Networks," IEEE Computer, February 1993. (HJS:492)
Leslie Lamport, "How to Make a Multiprocessor Computer that Correctly Executes Multiprocess Programs," IEEE Transactions on Computers, 28(9):690-691, 1979. (HJS:574)
L.M. Censier and P. Feautrier, "A New Solution to Coherence Problems in Multicache Systems," IEEE Transactions on Computers, 27(12):1112-1118, 1978. (HJS:576)
Ravi Rajwar and James R. Goodman, "Transactional Lock-Free Execution of Lock-Based Programs," ASPLOS, 2002.