Recommended Text (HJS): Mark D. Hill, Norman P. Jouppi and Gurindar S. Sohi (editors), "Readings in Computer Architecture," Morgan Kaufmann, 2000.
G.E. Moore, "Cramming More Components onto Integrated Circuits," Electronics, Apr. 1965. (HJS:56)
S. Mazor, "The History of the Microcomputer - Invention and Evolution," Proceedings of the IEEE, 83(12):1601-1608, 1995. (HJS:60)
Ronen, R., Mendelson, A., Lai, K., Lu, S-L., Pollack, F., and Shen, J., "Coming Challenges in Microarchitecture and Architecture," Proceedings of the IEEE, 89(3), 2001.
G.M. Amdahl, G.A. Blaauw, F.P. Brooks, Jr., "Architecture of the IBM System/360," IBM Journal of Research and Development, Apr. 1964. (HJS:17)
G. Radin, "The 801 Minicomputer," Proceedings of ASPLOS, Apr. 1982. (HJS:126)
D.A. Patterson and D.R. Ditzel, "The Case for the Reduced Instruction Set Computer," Proceedings of ASPLOS, Apr. 1980. (HJS:135)
R. Tomasulo, "An Efficient Algorithm for Exploiting Multiple Arithmetic Units," IBM Journal of Research and Development, 11:25-33, January 1967.
Y.N. Patt, W.W. Hwu, and M.C. Shebanow, "HPS, A New Microarchitecture: Rationale and Introduction," Proceedings of the 18th International Microprogramming Workshop, Asilomar, CA, Dec. 1985, pp. 103-108. (HJS:238)
James E. Smith and Gurindar S. Sohi, "The Microarchitecture of Superscalar Processors," Proceedings of the IEEE, 83(12):1609-1624, 1995.
PC Magazine, "DRAM Technology."
A.J. Smith, "Cache Memories," ACM Computing Surveys, 14(3), September 1982.
T.-Y. Yeh and Y.N. Patt, "Two-level Adaptive Training Branch Prediction," IEEE Transactions on Computers, 48(2), 1999. (HJS:228)
Scott McFarling, "Combining Branch Predictors," Digital Western Research Laboratory Technical Note TN-36, June 1993.
Sanjay Patel, Daniel Holmes Friendly and Yale N. Patt, "Evaluation of Design Options for the Trace Cache Fetch Mechanism," IEEE Transactions on Computers, 48(2), 1999.
Kenneth C. Yeager, "The MIPS R10000 Superscalar Microprocessor," IEEE Micro, April 1996. (HJS:275)
D.B. Papworth, "Tuning the Pentium Pro Microarchitecture," IEEE Micro, 16(2):8-15, 1996. (HJS:660)
B. Ramakrishna Rau and Joseph A. Fisher, "Instruction-Level Parallel Processing: History, Overview, and Perspective," The Journal of Supercomputing, 7, 9-50, 1993. (HJS:288)
B. Ramakrishna Rau, "Iterative Modulo Scheduling," MICRO, 1994.
S. Mahlke, W.Y. Chen, W.W. Hwu, B.R. Rau, M. Schlansker, "Sentinel Scheduling for VLIW and Superscalar Processors," ASPLOS 1992.
S. Mahlke, R.E. Hank, J.E. McCormick, D.I. August, W.W. Hwu, "A Comparison of Full and Partial Predicated Execution Support for ILP Processors," ISCA 1995. (HJS:163)
D.I. August, D.A. Connors, S.A. Mahlke, J.W. Sias, K.M. Crozier, B.-C. Cheng, P.R. Eaton, Q.B. Olaniran, W.W. Hwu, "Integrated Predicated and Speculative Execution in the IMPACT EPIC Architecture," ISCA 1998.
J.E. Thornton, "Parallel Operation in the Control Data 6600," Fall Joint Computer Conference, vol. 26, pp. 33-40, 1964. (HJS:32)
Dean M. Tullsen, Susan J. Eggers, Joel S. Emer, Henry M. Levy, Jack L. Lo and Rebecca L. Stamm, "Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor," ISCA, 1996. (HJS:350)
D. Marr et al., "Hyper-Threading Technology and Microarchitecture," Intel Technology Journal, Feb. 2002.
R.M. Russell, "The CRAY-1 Computer System," Communications of the ACM, 21(1):63-72, 1978. (HJS:40)
M.C. Merten, A.R. Trick, C.N. George, J.C. Gyllenhaal, and W.W. Hwu, "A Hardware-Driven Profiling Scheme for Identifying Program Hot Spots to Support Runtime Optimization," ISCA 1999.
M.C. Merten, A.R. Trick, E.M. Nystrom, R.D. Barnes, and W. W. Hwu, "A Hardware Mechanism for Dynamic Extraction and Relayout of Program Hot Spots," ISCA 2000.
B. Fahs, S. Bose, M. Crum, B. Slechta, F. Spadini, T. Tung, S. J. Patel and S. S. Lumetta, "Performance Characterization of a Hardware Mechanism for Dynamic Optimization," MICRO, 2001.
L. Lamport, "How to Make a Multiprocessor Computer that Correctly Executes Multiprocess Programs," IEEE Transactions on Computers, 28(9):690-691, 1979. (HJS:574)
D. Lenoski et al., "The Stanford Dash Multiprocessor," IEEE Computer, 25(3):63-79, 1992. (HJS:583)
Dan Ernst, Andrew Hamel, and Todd Austin, "Cyclone: A Broadcast-Free Dynamic Instruction Scheduler with Selective Replay," ACM/IEEE 30th Annual International Symposium on Computer Architecture (ISCA-2003), June 2003.
Vasanth Bala and Norman Rubin, "Efficient Instruction Scheduling Using Finite State Automata," MICRO-28, 1995.
Daniel D. Sleator and Robert E. Tarjan, "Amortized Efficiency of List Update and Paging Rules," Communications of the ACM, 28(2), 1985.
Erik G. Hallnor and Steven K. Reinhardt, "A Fully Associative Software-Managed Cache Design," ISCA, 2000.
David F. Bacon, Susan L. Graham and Oliver J. Sharp, "Compiler Transformations for High-Performance Computing," ACM Computing Surveys, 26(4), 1994. (Sections 6.2.1, 6.2.7 and 6.4.1.)
Hakim Akkary and Michael A. Driscoll, "A Dynamic Multithreading Processor," MICRO, 1998.
K. Akeley, "Reality Engine Graphics," SIGGRAPH, 1993. (HJS:507)
S. K. Reinhardt and S. S. Mukherjee, "Transient Fault Detection via Simultaneous Multithreading," ISCA, 2000.
Lionel Ni and Philip K. McKinley, "A Survey of Wormhole Routing Techniques in Direct Networks," IEEE Computer, February 1993. (HJS:492)
L.M. Censier and P. Feautrier, "A New Solution to Coherence Problems in Multicache Systems," IEEE Transactions on Computers, 27(12):1112-1118, 1978. (HJS:576)
Ravi Rajwar and James R. Goodman, "Transactional Lock-Free Execution of Lock-Based Programs," ASPLOS, 2002.