UNIVERSITY OF NORTH CAROLINA CHARLOTTE
ITCS 4/5145 PARALLEL COMPUTING
Barry Wilkinson
abw@uncc.edu
This page contains the materials for the Spring 2016 ITCS 4145 and ITCS 5145 Parallel Computing courses, as provided to students on Moodle. The courses use our pattern programming approach. For more information, see ../PatternProgProject.html.
For educational use only. Please acknowledge if you use the posted materials. Thank you. This work is supported by the NSF.
Modification date: June 13, 2016.
Course: ITCS 4/5145 Parallel Computing is a programming course. The course starts with OpenMP, then MPI, and then patterns and our Suzaku pattern programming tools. Parallel algorithms are introduced (sorting, numeric, etc.). CUDA is covered last. There are seven programming assignments plus a "preassignment" to set up the software on your own computer. Most programming is done on students' own computers using a provided virtual machine, with some programming on a small departmental parallel programming cluster (notably speedup tests and CUDA). Prerequisites: knowledge of C programming, data structures, and algorithms.
ITCS 4145 course flyer
ITCS 5145 course flyer
ITCS 5145 Welcome -- Describes the various components of the online course.
Additional Course Materials:
Parallel Programming Software -- Software needed for course.
Additional Information -- Includes the course FAQ, documents on installing and using software, previous tests, etc.
Class videos from Fall 2014 -- Mostly similar in content but not identical to the Spring 2016 class. (Videos provided for online version only.)
Wk | Topic | Quiz questions | Content | Study Guide | Lecture video | Moodle mini-quiz
(The Study Guide, Lecture video, and Moodle mini-quiz columns are additional materials for the online course.)
1 | | | Course outline, prerequisites, course text, course contents, instructor details. TA details and responsibilities. | Week1 Study Guide | |
| Preassignment | | Assignment preliminaries, Moodle, student accounts. Setting up software environment. Due: Week 2 | | |
| Parallel Comp. Demand | | Demand for computational speed, grand challenge problems | | |
| Parallel Comp. Potential | Quiz questions | Potential for speedup using multiple process(or)s, speedup factor, maximum speedup, Amdahl's law, Gustafson's law. | | Lecture 2 not recorded |
| Parallel Computers | | Types of parallel computers, shared memory systems, multicore, programming shared memory, distributed memory platforms, networked computers, cluster computing, programming, GPU systems. | | |
| Programming with Shared Memory-1 | | Programming shared memory systems, processes, fork, fork-join pattern, threads, Pthreads, thread pool pattern | | |
2 | Introduction to OpenMP | OpenMP Quiz questions | Introduction to OpenMP, thread team pattern, directives/constructs, parallel, shared and local variables, work-sharing, sections, for, loop scheduling, for reduction, single, master. | Week2 Study Guide | Lecture 3 | Mini-quiz Week 2
| | | OpenMP tutorial. Due: Week 3 | | |
3 | Programming with Shared memory-2 | Shared memory Quiz questions-I | Accessing shared data, critical sections, locks, condition variables, critical sections serializing code, deadlock, semaphores, monitors, Pthreads program example. | Week3 Study Guide | Lecture 4 | Mini-quiz Week 3 |
| OpenMP continued | | Sharing data and synchronization, critical, barrier, atomic, flush. | | |
4 | Intro to stencil pattern | | Stencil pattern, heat distribution | Week4 Study Guide | Lecture 5 | Mini-quiz Week 4
| Assignment 2 (4145), Assignment 2 (5145) | | OpenMP heat distribution program, graphics. Due: Week 5 | | |
| Programming with Shared Memory-3 | Shared Memory Quiz questions-II | Shared memory performance issues, specifying parallelism, par, forall constructs, dependency analysis (Bernstein's conditions), data shared in caches, false sharing, sequential consistency, code re-ordering. | | Lecture 6 |
| Lower Level Message-Passing Computing - MPI | | Basics of message-passing programming, MPI, point-to-point message passing, message tags, MPI communicator, blocking send/recv, command-line compiling and executing of MPI programs, instrumenting code for execution time, Eclipse IDE Parallel Tools Platform. | | Lecture 7 |
5 | | | Message-passing patterns, MPI collective routines, broadcast, scatter, gather, reduce, barrier, all-to-all broadcast. | Week5 Study Guide | Lecture 8 | Mini-quiz Week 5
| Synchronization | | Barrier implementations, counter, reentrant code, tree, butterfly, local synchronization, safety and deadlock, safe MPI routines, MPI_Sendrecv(), MPI_Bsend(), MPI_Isend()/MPI_Irecv(), synchronous message passing, asynchronous (non-blocking) message passing, changing to synchronous message passing. | | Lecture 9 |
| Assignment 3 | | MPI tutorial, using command line and Eclipse-PTP. Due: Week 7 | | |
6 | Review for Test 1 | | Test format: ITCS 4145 75-minute in-class paper test. | Week6 Study Guide | |
| Test 1 | | Posted afterwards: ITCS 4145 Test 1, ITCS 4145 Test 1 with solutions | Week7 Study Guide | |
7 | Introduction to Patterns | | Pattern programming concepts, problem addressed, low-level message-passing patterns, point-to-point data transfer, broadcast, scatter, gather, reduce, all-to-all broadcast, higher-level message-passing patterns, workpool, pipeline, divide and conquer, all-to-all, iterative synchronous patterns, iterative synchronous all-to-all, stencil, advantages and disadvantages of patterns, our tools. | | Lecture 10 | Mini-quiz Week 7
| Suzaku framework | | Suzaku, macros, routines, implementation | | |
| Suzaku workpool version 2 | | MPI application: Monte Carlo pi workpool. Due: Week 9 | | Lecture 11 |
8 | Seeds framework | Quiz questions | Seeds pattern programming framework, module method, bootstrapping class, network and multicore versions, workpool programming examples - Monte Carlo pi, matrix addition, matrix multiplication. | Week8 Study Guide | Lecture 13 | Mini-quiz Week 8
| | | All-to-all pattern, iterative synchronous all-to-all pattern, gravitational N-body problem, Barnes-Hut algorithm, solving systems of linear equations by iteration, Jacobi iteration, convergence rate. | | Lecture 14 |
| Stencil pattern | Quiz questions | Stencil pattern, applications, solving Laplace's equation, heat distribution problem, ways to improve performance, partially synchronous method, red-black, multigrid. | | |
9 | Pipeline pattern | Quiz questions | Pipeline pattern, space-time diagram, speedup factor, applications, matrix-vector multiplication, matrix multiplication, insertion sort, prime numbers, upper triangular linear equations. | Week9 Study Guide | Lecture 15 | Mini-quiz Week 9
| Sorting Algorithms | Quiz questions | Potential speedup of sorting in parallel, compare and exchange, bubble sort, odd-even transposition sort, mergesort, quicksort, odd-even mergesort, bitonic mergesort, shearsort, rank sort, counting sort, radix sort. | | Lecture 16 |
10 | | | Using Suzaku to create MPI programs – N-body problem. Due: Week 11. | Week10 Study Guide | | Mini-quiz Week 10
| | | Combining MPI and OpenMP to take advantage of clusters that have both distributed memory and shared memory. Discussion of whether the hybrid approach is any better than using only MPI or only OpenMP. | | Lecture 18 |
11 | Data Parallel Pattern | | Data parallel pattern, use of forall notation, example, data parallel prefix sum algorithm, matrix multiplication. | Week11 Study Guide | Lecture 19 | Mini-quiz Week 11
| | Quiz questions | CPU-GPU architecture evolution, 1970s to present, dedicated pipelined GPUs, general-purpose GPU design, NVIDIA products, Fermi architecture, GPU performance gains, CUDA. CUDA SIMT programming model, CUDA kernel routines, CPU and GPU memories, basic CUDA program structure, code example adding two vectors, compiling and executing on the Linux command line and in Windows MS Visual Studio. | | |
| | | CUDA programming: threads, blocks, grid, multidimensional grids and blocks, compute capabilities, thread addressing, predefined variables, flattening arrays, 2-D grid and block code: matrix addition/multiplication. | | Lecture 20 |
| Assignment 6 | | Suzaku Workpool Version 2 programming assignment. Due: Week 13 | | |
12 | | | Measuring performance, timing program execution, CUDA "events", synchronous and asynchronous CUDA routines, maximum and effective bandwidth, computation measures, FLOPs. Declaring routines called from device and from host, local device variables, accessing kernel variables from host, cudaMemcpyToSymbol()/cudaMemcpyFromSymbol(). | Week12 Study Guide | | Mini-quiz Week 12
| | | CUDA assignment using Linux environment to compile and execute simple CUDA programs, makefile, vector/matrix addition/multiplication, sorting. Due: Week 15 | | |
| | | Ways to achieve thread synchronization, __syncthreads(), CPU synchronization, cudaThreadSynchronize(), __threadfence(). Memory structures and bandwidth optimization, memory coalescing. | | Lecture 21 |
13 | | | Demonstration of memory coalescing, code, performance improvements. Demonstration of using shared memory, code, performance improvements. | Week13 Study Guide | Lecture 22 |
| Review for quiz | | | | Lecture 23 |
| Test 2 | | Posted afterwards: ITCS 4145 Test 2, ITCS 4145 Test 2 with solutions | | |
14 | Review/discussion/sample finals | | | Week14 Study Guide | Lecture 24 |
15 | Review/discussion/sample finals | | | Week15 Study Guide | Lecture 25 |
16 | Last class. Review/discussion | | | Week16 Study Guide | Lecture 26 |