Recommended Text (HJS): Mark D. Hill, Norman P. Jouppi and Gurindar S. Sohi (editors), Readings in Computer Architecture, Morgan Kaufmann, 2000.
Ronen, R., Mendelson, A., Lai, K., Lu, S-L., Pollack, F., and Shen, J., "Coming Challenges in Microarchitecture and Architecture," Proceedings of the IEEE, 89(3), 2001.
G.E. Moore, "Cramming More Components onto Integrated Circuits," Electronics, Apr. 1965. (HJS:56)
David W. Wall, "Limits of Instruction Level Parallelism," Digital Western Research Laboratory Research Report 93/6, 1993 (extended version of a paper that appeared in ASPLOS 1991; the appendix describes trace-based simulator design).
B. Ramakrishna Rau and Joseph A. Fisher, "Instruction-Level Parallel Processing: History, Overview, and Perspective," The Journal of Supercomputing, 7, 9-50, 1993. (HJS:288)
James E. Smith and Gurindar S. Sohi, "The Microarchitecture of Superscalar Processors," Proceedings of the IEEE, 83(12):1609-1624, 1995.
Susan J. Eggers, Joel S. Emer, Henry M. Levy, Jack L. Lo, Rebecca L. Stamm, and Dean M. Tullsen, "Simultaneous Multithreading: A Foundation for Next-generation Processors," IEEE Micro, September/October 1997, pp. 12-18.
Kunle Olukotun, Basem A. Nayfeh, Lance Hammond, Ken Wilson, and Kun-Yung Chang, "The Case for a Single-Chip Multiprocessor," Proceedings of the Seventh International Conference on Architectural Support for Programming Languages and Operating Systems, October 1996.
J. Huh, D. Burger, and S. W. Keckler, "Exploring the Design Space of Future CMPs," in PACT '01: Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques, pp. 199-210, 2001.
Rakesh Kumar, Dean Tullsen, Norman Jouppi, and Partha Ranganathan, "Heterogeneous Chip Multiprocessors," IEEE Computer, November 2005.
M. B. Taylor, J. Kim, J. Miller, D. Wentzlaff, F. Ghodrat, B. Greenwald, H. Hoffman, P. Johnson, J.-W. Lee, W. Lee, A. Ma, A. Saraf, M. Seneski, N. Shnidman, V. Strumpen, M. Frank, S. Amarasinghe, and A. Agarwal, "The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs," IEEE Micro, vol. 22, no. 2, pp. 25-35, 2002.
K. Sankaralingam, R. Nagarajan, H. Liu, C. Kim, J. Huh, D. Burger, S. W. Keckler, and C. R. Moore, "Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture," SIGARCH Comput. Archit. News, vol. 31, no. 2, pp. 422-433, 2003.
M. M. K. Martin, M. D. Hill, and D. A. Wood, "Token coherence: decoupling performance and correctness," in ISCA '03: Proceedings of the 30th annual international symposium on Computer architecture, pp. 182-193, 2003.
D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy, "The directory-based cache coherence protocol for the DASH multiprocessor," in ISCA '90: Proceedings of the 17th annual international symposium on Computer Architecture, pp. 148-159, 1990.
S. V. Adve and K. Gharachorloo, "Shared Memory Consistency Models: A Tutorial," Tech. Rep., DEC WRL, 1995.
W.J. Dally and B. Towles, "Route packets, not wires: On-chip interconnection networks," in Proceedings of DAC 2001.
Vassos Soteriou, Noel Eisley, Hangsheng Wang, Bin Li, Li-Shiuan Peh. "Polaris: A System-Level Roadmap for On-Chip Interconnection Networks". Proceedings of the 24th International Conference on Computer Design (ICCD), October, 2006.
"The Cell Broadband Engine: Exploiting Multiple Levels of Parallelism in a Chip Multiprocessor," International Journal of Parallel Programming, Vol. 35, No. 3, June 2007.
J. H. Ahn, W. J. Dally, B. Khailany, U. J. Kapasi, and A. Das, "Evaluating the Imagine Stream Architecture," in ISCA '04: Proceedings of the 31st annual international symposium on Computer architecture, p. 14, 2004.