ECE 408 - Applied Parallel Programming

Last offered Fall 2021

Official Description

Parallel programming with emphasis on developing applications for processors with many computation cores. Computational thinking, forms of parallelism, programming models, mapping computations to parallel hardware, efficient data structures, paradigms for efficient parallel algorithms, and application case studies. Course Information: Same as CS 483 and CSE 408. 4 undergraduate hours. 4 graduate hours. Prerequisite: ECE 220.

Subject Area

  • Computer Engineering

Detailed Description and Outline

Parallel programming with emphasis on developing applications for processors with many computation cores. Computational thinking, forms of parallelism, programming model features, mapping computations to parallel hardware, efficient data structures, paradigms for efficient parallel algorithms, hardware features and limitations, and application case studies. Same as CS 483.

Computer Usage

Extensive usage for all programming assignments and final project

Reports

A final project report is required

Lab Projects

  • Lab 0 - Installation and test of programming environment
  • Lab 1 - Parallel Vector Addition
  • Lab 2 - Parallel Matrix Multiplication
  • Lab 3 - Tiled Parallel Matrix Multiplication
  • Lab 4 - Parallel Reduction
  • Lab 5 - Parallel Scan
  • Lab 6 - Tiled Parallel Convolution
  • Lab 7 - Sparse Matrix-Vector Multiplication
  • Final Project - Project Proposal, Project Workshop, Project Presentation, and Project Report

Lab Equipment

Linux-based cluster system

Lab Software

C Programming Language and CUDA Software Development Kit, WebGPU for labs, RAI for final project

Topical Prerequisites

C programming, basic data structures, introduction to computer organization

Texts

D. Kirk and W. Hwu, Programming Massively Parallel Processors, Morgan Kaufmann, 3rd Edition.

Required, Elective, or Selected Elective

Elective

Course Goals

The aim of this course is to provide students with knowledge and hands-on experience in developing application software for processors with massively parallel computing resources. In general, we refer to a processor as massively parallel if it can complete more than 64 arithmetic operations per clock cycle. Many commercial offerings from NVIDIA, AMD, and Intel already offer such levels of concurrency. Programming these processors effectively requires in-depth knowledge of parallel programming principles, as well as the parallelism models, communication models, hardware organization, and resource limitations of these processors. The target audience is students who want to develop exciting applications for these processors, as well as those who want to develop programming tools and future designs for them.

Instructional Objectives

A. After the seven machine problems (after approximately 20 seventy-five-minute lectures) the student should be able to:

1. Analyze and implement common parallel algorithm patterns in a parallel programming model such as CUDA. (1, 2)

2. Design experiments to analyze the performance bottlenecks in their parallel code. (6)

3. Apply common parallel techniques to improve performance given hardware constraints. (1, 2, 6)

4. Learn about the features of a parallel debugger and use them to identify and repair code defects. (6, 7)

5. Learn about the features of a parallel profiler and use them to identify performance bottlenecks in their code. (6, 7)

B. By examination 2 (after approximately 29 seventy-five-minute lectures) the student should be able to:

6. Understand and apply common parallel algorithm patterns. (1, 7)

7. Understand the major types of hardware limitations that limit parallel program performance. (1, 6, 7)

8. Understand and apply common parallel programming interface features. (1, 6, 7)

9. Review a parallel code segment and identify its behavior and potential problems. (b, e)

C. By the end of the final project (with proposal, workshop discussions, presentation, and report) the student should be able to:

10. Identify and solve a computational problem with parallel algorithm design and program. (1, 2, 6, 7)

11. Learn the necessary domain knowledge in order to solve the identified problem. (7)

12. Work with domain experts and teammates from different disciplines to maximize the effectiveness of the solutions. (3, 5)

13. Properly divide up the responsibilities among teammates and support each other towards success. (3, 4, 5)

14. Identify design space and explore optimization opportunities for the solutions. (1, 2, 6, 7)

15. Motivate the problem and approach in a presentation. (3)

16. Properly explain the solutions experimented and justify the final decision and outcome. (1, 2, 3, 4, 6)

17. Identify limitations of the solutions and future directions. (1, 2, 4, 6, 7)

| Title | Section | CRN | Type | Hours | Times | Days | Location | Instructor |
|---|---|---|---|---|---|---|---|---|
| Applied Parallel Programming | AB | 58791 | LAB | 0 | | | | Henry Haase |
| Applied Parallel Programming | AL1 | 76323 | LEC | 4 | 1100 - 1220 | T R | 1310 Digital Computer Laboratory | Volodymyr Kindratenko |
| Applied Parallel Programming | AL2 | 67069 | LEC | 4 | 0930 - 1050 | T R | 1002 Electrical & Computer Eng Bldg | Sanjay Patel |
| Applied Parallel Programming | OD1 | 58790 | PKG | 4 | 0930 - 1050 | T R | | Sanjay Patel, Henry Haase |
| Applied Parallel Programming | OD1 | 58790 | PKG | 4 | | | | Sanjay Patel, Henry Haase |
| Applied Parallel Programming | ZJ1 | 72391 | PKG | 4 | 0930 - 1050 | T R | ARR Zhejiang University | Volodymyr Kindratenko |
| Applied Parallel Programming | ZJ1 | 72391 | PKG | 4 | 1900 - 2050 | R | ARR Zhejiang University | Volodymyr Kindratenko |
| Applied Parallel Programming | ZJ2 | 72392 | PKG | 4 | 0930 - 1050 | T R | ARR Zhejiang University | Volodymyr Kindratenko |
| Applied Parallel Programming | ZJ2 | 72392 | PKG | 4 | 1900 - 1950 | R | ARR Zhejiang University | Volodymyr Kindratenko |