UNIVERSITY OF NORTH CAROLINA CHARLOTTE
ITCS 4/5145 PARALLEL COMPUTING
Barry Wilkinson
abw@uncc.edu
This page contains the materials for the Spring 2016 ITCS 4145 and ITCS 5145 Parallel Computing courses, as provided to students on Moodle. The courses use our pattern programming approach. For more information, see ../PatternProgProject.html.
For educational use only. Please acknowledge if you use the posted materials. Thank you. This work is supported by the NSF.
Modification date: June 13, 2016.
Course: ITCS 4/5145 Parallel Computing is a programming course. The course starts with OpenMP, then MPI, and then patterns and our Suzaku pattern programming tools. Parallel algorithms are introduced (sorting, numeric, etc.). CUDA is covered last. There are seven programming assignments plus a "preassignment" to set up the software on your own computer. Most programming is done on students' own computers using a provided virtual machine, with some programming on a small departmental parallel programming cluster (notably speedup tests and CUDA). Prerequisites: knowledge of C programming, data structures, and algorithms.
ITCS 4145 course flyer
ITCS 5145 course flyer
ITCS 5145 Welcome -- Describes the various components of the online course.
Additional Course Materials:
Parallel Programming Software -- Software needed for course.
Additional Information -- Includes the course FAQ, documents on installing and using software, previous tests, etc.
Class videos from Fall 2014 -- Mostly similar in content but not identical to the Spring 2016 class. (Videos provided for online version only.)
Wk | Topic | Quiz questions | Content | Study Guide | Lecture video | Moodle mini-quiz
(The Study Guide, Lecture video, and Moodle mini-quiz columns are additional materials for the online course.)
1 | | | Course outline, prerequisites, course text, course contents, instructor details. TA details and responsibilities. | Week1 Study Guide | |
| Preassignment | | Assignment preliminaries, Moodle, student accounts. Setting up software environment. Due: Week 2 | | |
| Parallel Comp. Demand | | Demand for computational speed, grand challenge problems | | |
| Parallel Comp. Potential | Quiz questions | Potential for speedup using multiple process(or)s, speedup factor, maximum speedup, Amdahl's law, Gustafson's law. | | Lecture 2 not recorded |
| Parallel Computers | | Types of parallel computers, shared memory systems, multicore, programming shared memory, distributed memory platforms, networked computers, cluster computing, programming, GPU systems. | | |
| Programming with Shared Memory-1 | | Programming shared memory systems, processes, fork, fork-join pattern, threads, Pthreads, thread pool pattern | | |
2 | Introduction to OpenMP | OpenMP Quiz questions | Introduction to OpenMP, thread team pattern, directives/constructs, parallel, shared and local variables, work-sharing, sections, for, loop scheduling, for reduction, single, master. | Week2 Study Guide | Lecture 3 | Mini-quiz Week 2
| | | OpenMP tutorial. Due: Week 3 | | |
3 | Programming with Shared memory-2 | Shared memory Quiz questions-I | Accessing shared data, critical sections, locks, condition variables, critical sections serializing code, deadlock, semaphores, monitors, Pthreads program example. | Week3 Study Guide | Lecture 4 | Mini-quiz Week 3 |
| OpenMP continued | | Sharing data and synchronization, critical, barrier, atomic, flush. | | |
4 | Intro to stencil pattern | | Stencil pattern, heat distribution | Week4 Study Guide | Lecture 5 | Mini-quiz Week 4
| Assignment 2 (4145), Assignment 2 (5145) | | OpenMP heat distribution program, graphics. Due: Week 5 | | |
| Programming with Shared Memory-3 | Shared Memory Quiz questions-II | Shared memory performance issues, specifying parallelism, par, forall constructs, dependency analysis (Bernstein's conditions), data shared in caches, false sharing, sequential consistency, code re-ordering. | | Lecture 6 |
| Lower Level Message-Passing Computing - MPI | | Basics of message-passing programming, MPI, point-to-point message passing, message tags, MPI communicator, blocking send/recv, command-line compiling and executing of MPI programs, instrumenting code for execution time, Eclipse IDE Parallel Tools Platform. | | Lecture 7 |
5 | | | Message-passing patterns, MPI collective routines, broadcast, scatter, gather, reduce, barrier, all-to-all broadcast. | Week5 Study Guide | Lecture 8 | Mini-quiz Week 5
| Synchronization | | Barrier implementations, counter, reentrant code, tree, butterfly, local synchronization, safety and deadlock, safe MPI routines, MPI_Sendrecv(), MPI_Bsend(), MPI_Isend()/MPI_Irecv(), synchronous message passing, asynchronous (non-blocking) message passing, changing to synchronous message passing. | | Lecture 9 |
| Assignment 3 | | MPI tutorial, using command line and Eclipse-PTP. Due: Week 7 | | |
6 | Review for Test 1 | | Test format: ITCS 4145 75-minute in-class paper test. | Week6 Study Guide | |
| Test 1 | | Posted afterwards: ITCS 4145 Test 1, ITCS 4145 Test 1 with solutions | Week7 Study Guide | |
7 | Introduction to Patterns | | Pattern programming concepts, problem addressed, low-level message-passing patterns, point-to-point data transfer, broadcast, scatter, gather, reduce, all-to-all broadcast, higher-level message-passing patterns, workpool, pipeline, divide and conquer, all-to-all, iterative synchronous patterns, iterative synchronous all-to-all, stencil, advantages and disadvantages of patterns, our tools. | | Lecture 10 | Mini-quiz Week 7
| Suzaku framework | | Suzaku, macros, routines, implementation | | |
| Suzaku workpool version 2 | | MPI application: Monte Carlo pi workpool. Due: Week 9 | | Lecture 11 |
8 | Seeds framework | Quiz questions | Seeds pattern programming framework, module method, bootstrapping class, network and multicore versions, workpool programming examples - Monte Carlo pi, matrix addition, matrix multiplication. | Week8 Study Guide | Lecture 13 | Mini-quiz Week 8
| | | All-to-all pattern, iterative synchronous all-to-all pattern, gravitational N-body problem, Barnes-Hut algorithm, solving systems of linear equations by iteration, Jacobi iteration, convergence rate. | | Lecture 14 |
| Stencil pattern | Quiz questions | Stencil pattern, applications, solving Laplace's equation, heat distribution problem, ways to improve performance, partially synchronous method, red-black, multigrid. | | |
9 | Pipeline pattern | Quiz questions | Pipeline pattern, space-time diagram, speedup factor, applications, matrix-vector multiplication, matrix multiplication, insertion sort, prime numbers, upper triangular linear equations. | Week9 Study Guide | Lecture 15 | Mini-quiz Week 9
| Sorting Algorithms | Quiz questions | Potential speedup of sorting in parallel, compare and exchange, bubble sort, odd-even transposition sort, mergesort, quicksort, odd-even mergesort, bitonic mergesort, shearsort, rank sort, counting sort, radix sort. | | Lecture 16 |
10 | | | Using Suzaku to create MPI programs – N-body problem. Due: Week 11. | Week10 Study Guide | | Mini-quiz Week 10
| | | Combining MPI and OpenMP to take advantage of clusters that have both distributed memory and shared memory. Discussion of whether the hybrid approach is any better than using only MPI or only OpenMP. | | Lecture 18 |
11 | Data Parallel Pattern | | Data parallel pattern, use of forall notation, example, data parallel prefix sum algorithm, matrix multiplication. | Week11 Study Guide | Lecture 19 | Mini-quiz Week 11
| | Quiz questions | CPU-GPU architecture evolution, 1970s to present, dedicated pipelined GPUs, general-purpose GPU design, NVIDIA products, Fermi architecture, GPU performance gains, CUDA. CUDA SIMT programming model, CUDA kernel routines, CPU and GPU memories, basic CUDA program structure, code example adding two vectors, compiling and executing on the Linux command line and in Windows MS Visual Studio. | | |
| | | CUDA programming: threads, blocks, grid, multidimensional grids and blocks, compute capabilities, thread addressing, predefined variables, flattening arrays, 2-D grid and block code: matrix addition/multiplication. | | Lecture 20 |
| Assignment 6 | | Suzaku Workpool Version 2 programming assignment. Due: Week 13 | | |
12 | | | Measuring performance, timing program execution, CUDA "events", synchronous and asynchronous CUDA routines, maximum and effective bandwidth, computation measures, FLOPs. Declaring routines called from device and from host, local device variables, accessing kernel variables from host, cudaMemcpyToSymbol()/cudaMemcpyFromSymbol(). | Week12 Study Guide | | Mini-quiz Week 12
| | | CUDA assignment using Linux environment to compile and execute simple CUDA programs, makefile, vector/matrix addition/multiplication, sorting. Due: Week 15 | | |
| | | Ways to achieve thread synchronization, __syncthreads(), CPU synchronization, cudaThreadSynchronize(), __threadfence(). Memory structures and bandwidth optimization, memory coalescing. | | Lecture 21 |
13 | | | Demonstration of memory coalescing, code, performance improvements. Demonstration of using shared memory, code, performance improvements. | Week13 Study Guide | Lecture 22 |
| Review for quiz | | | | Lecture 23 |
| Test 2 | | Posted afterwards: ITCS 4145 Test 2, ITCS 4145 Test 2 with solutions | | |
14 | Review/discussion/sample finals | | | Week14 Study Guide | Lecture 24 |
15 | Review/discussion/sample finals | | | Week15 Study Guide | Lecture 25 |
16 | Last class. Review/discussion | | | Week16 Study Guide | Lecture 26 |