Department of Computer Science
ITCS 4145/5145 Parallel Computing
Spring 2014
Tuesday/Thursday 5:00 pm - 6:15 pm, Woodward 130
This page is continually updated as the course proceeds. Watch for
announcements. Modification date: June 5, 2014. Always make sure you have the most recent copy of this page (not cached, re-load page).
Assignment Frequently Asked Questions | Academic calendar | Lecture Materials | Reading materials | Assignments | Tests | UNC-C Moodle 2
The following slides are provided as PowerPoint slides. You may wish to print these slides out as 1 x 2 or 2 x 3 thumbnails. The slides are not ready for use until the date of the class, and they are likely to be revised just before the class.
Wk | Date, 2014 | No of slides | | Review/Quiz questions | |
1 | Thurs Jan 9 | 26 | Outline | Course outline, prerequisites, course text, course contents, instructor details. | |
1 | Thurs Jan 9 | 23 | Assignment Preliminaries | Assignment preliminaries, access to servers, Moodle, student accounts. TA details and responsibilities. | |
1/2 | Thurs Jan 9 | 13 | Parallel Comp. Demand | Demand for computational speed, grand challenge problems | |
2 | Tues Jan 14 | 16 | Parallel Comp. Potential | Quiz questions | Potential for speed-up using multiple process(or)s, speed-up factor, max speed up, Amdahl's law, Gustafson's law. |
2 | Tues Jan 14 | 19 | Parallel Computers | Types of parallel computers, shared memory systems, multicore, programming shared memory, distributed memory platform, networked computers cluster computing, programming, GPU systems. | |
2 | Tues Jan 14/Thur Jan 16 | 33 | Pattern Programming-1 | Quiz questions | Parallel patterns for structured parallel programming, workpool, pipeline, divide and conquer, stencil, all-to-all patterns, advantages of starting with patterns, tools, Seeds pattern programming framework, user interface, programming example (Monte Carlo pi). |
2 | Thurs Jan 16 | 41 | Pattern Programming-2 | Seeds Framework, workpool module methods, bootstrapping class, further details of Seeds workpool for Assignment 1, Monte Carlo pi code, matrix addition and multiplication workpool code. | |
2 | Thurs Jan 16 | Assignment 1 | Using the Seeds Pattern Programming Framework: Workpool | ||
3 | Tues Jan 21 | Matrix addition and multiplication | Matrix addition and multiplication, partitioning, block matrix multiplication, computation/communication ratio, workpool pattern. Review, quiz questions, etc. | ||
3 | Thurs Jan 23 | 54 | Lower Level Message-passing Computing - MPI | Basics of message-passing programming, MPI, point-to-point message passing, message tags, MPI communicator, blocking send/recv, command line compiling and executing MPI programs, instrumenting code for execution time, Eclipse IDE Parallel Tools Platform. | |
4/5 | Thurs Jan 30/Tues Feb 4 | 57 | | Collective data transfer patterns and MPI collective routines, broadcast, scatter, gather, reduce, barrier, alltoall broadcast. Combined patterns, broadcast-gather, MPI routines, all gather, alltoall, general features | |
4 | Thurs Jan 30 | Assignment 2 | Compiling and executing MPI programs. Comparison with Seeds | ||
5 | Tues Feb 4 | 33 | Synchronization | Synchronous message passing, asynchronous (non-blocking) message passing, changing to synchronous message passing. MPI_Bsend(), MPI_Isend()/MPI_Irecv(), Barrier synchronization pattern, MPI Barrier, implementations, counter, reentrant code, tree, butterfly, local synchronization, safety and deadlock, safe MPI routines, MPI_Sendrecv() | |
5 | Thurs Feb 6 | 39 | Programming with Shared Memory | Programming shared memory systems, processes, threads, issues, interleaved statements, thread safe routines, re-ordering code, compiler/processor optimizations, accessing shared data, critical sections, locks, condition variables, deadlock, semaphores, monitors, Pthreads program example. | |
6 | Tues Feb 11 | 45 | Introduction to OpenMP | Introduction to OpenMP, directives/constructs, parallel, shared and local variables, work-sharing, sections, for, loop scheduling, for reduction, single, master, critical, barrier, atomic, flush. | |
6 | Thurs Feb 13 | | | University closed because of snow. Class canceled. | |
7 | Tuesday Feb 18 2014 | Assignment 3 | OpenMP assignment. | ||
7 | Tuesday Feb 18 2014 | Review for test | |||
7 | Thur Feb 20 | Class Test | Paper test in classroom | ||
8 | Tues Feb 25, 2014 | Return and go over test 1 | |||
8 | Tues Feb 25 | 16 | Assignment 3 Notes | Stencil pattern intro, heat distribution, and Assignment 3 Generating X11 graphics | |
8 | Feb 27 | 29 | Quiz questions | Shared memory performance issues, specifying parallelism, par, forall constructs, dependency analysis (Bernstein's conditions), data shared in caches, false sharing, sequential consistency, code re-ordering | |
March 3 - 8, 2014 | Spring Break, no classes | ||||
9 | Tues March 11 | 15, 39 | Using compiler-directed approach to create MPI code automatically, intro to Paraguin compiler, compiling. Paraguin, parallel regions, barrier, forall, broadcast, scatter, gather, and reduction. | ||
9 | Thurs March 13 | 28 | Compiler directive approach | Quiz Questions | Patterns, Scatter/Gather, Stencil |
9 | Thurs March 13 | 26 | Hybrid Programming | Combining MPI and OpenMP to take advantage of clusters that have both distributed-memory and shared-memory. Discussion of whether hybrid is any better than using only MPI or only OpenMP. Using the Paraguin compiler to generate a hybrid program | |
10 | Tues March 18 | Assignment 4 | Suzaku assignment (new for Spring 2014) - pattern programming using macros/routines. | ||
10 | Tues March 18 | 32 | Synchronous All-To-All pattern, example use in gravitational N-body problem, Barnes-Hut algorithm, Seeds CompleteSynchGraph pattern code for N-body problem, iterative synchronous All-To-All pattern, solving system of linear equations by iteration, Jacobi iteration, convergence rate. Seeds CompleteSynchGraph Pattern, MPI_Allgather() routine | ||
10 | Thurs March 20, 2014 | 37 | Stencil pattern | Quiz questions | Stencil pattern, applications, solving Laplace's eq., heat distribution problem, Seeds stencil pattern, cellular automata, game of life, ways to improve performance, partially synchronous method, red-black, multigrid. |
10 | Thurs March 20, 2014 | 32 | Pipeline pattern | Quiz questions | Pipeline pattern, space time diagram, speed up factor, matrix-vector multiplication, matrix multiplication, adding rows of an array, unfolding loops, frequency filter, insertion sort, prime numbers, upper triangular linear equations. Seeds pipeline pattern. |
Reading | 23 | Sieve of Eratosthenes | Quiz questions | Sieve of Eratosthenes Algorithm for computing prime numbers | |
Reading | 42 | Graph Algorithms | Prim's Algorithm for Minimum Spanning Tree, Dijkstra's Algorithm for Single-Source Shortest Path, Dijkstra's and Floyd's Algorithms for All-Pairs Shortest Path | ||
11 | Tues March 25, 2014 | 40 | Sorting Algorithms | Quiz questions | Potential speedup of sorting in parallel, compare and exchange, bubble sort, odd-even transposition sort, mergesort, quicksort, odd-even mergesort, bitonic mergesort, shearsort, rank sort, counting sort, radix sort |
11 | Thurs March 27 | 14 | Data Parallel Pattern | Data parallel pattern, use of forall notation, example, data parallel prefix sum algorithm, matrix multiplication. | |
11/12 | Thurs March 27/Tuesday April 1 | 21, 21 | CPU-GPU architecture evolution, 1970s to present, dedicated pipelined GPUs, general purpose GPU design, NVIDIA products, Fermi architecture, GPU performance gains, CUDA. CUDA SIMT prog. model, CUDA kernel routines, CPU and GPU memories, basic CUDA program structure, code example adding two vectors, compiling and executing on Linux command line, Windows MS Visual Studio. | ||
12 | Tuesday April 1 | 38 | CUDA programming: threads, blocks, grid, multidimensional grid and blocks, compute capabilities, thread addressing, predefined variables, flattening array, 2-D grid and block code: matrix addition/multiplication. | ||
12 | Thurs April 3 | | Assignment 5 | CUDA assignment using Linux environment to compile and execute simple CUDA programs, make file, vector/matrix addition/multiplication, and sorting. | |
12 | Thurs April 3 | 21, 14, 12 | | Measuring performance, timing program execution, CUDA “events”, synchronous and asynchronous CUDA routines, max and effective bandwidth, computation measures, FLOPs. Declaring routines called from device and from host, local device variables, accessing kernel variables from host, cudaMemcpyToSymbol()/cudaMemcpyFromSymbol(). Ways to achieve thread synchronization, __syncthreads(), CPU synchronization, cudaThreadSynchronize(), __threadfence(). | |
13 | Tues April 8 | Review for test | |||
13 | Thurs April 10 | Class Test | Paper test in classroom | ||
14 | Tues April 15 | 41 | Memory structures and bandwidth optimization, memory coalescing | ||
14 | Thurs April 17 | Return test | |||
15 | Tues April 22 | ||||
15 | Thurs April 24 | Review/discussion/sample finals | |||
16 | Tues April 29, 2014 | Last class. Review/discussions/sample finals. Teaching evaluations done on-line. |
Notes on installing MPI on your own computer
Let me know if you have anything to post on MPI installations. Thanks. BW
Paraguin
OpenMP
Notes on installing MPI on your own computer
CUDA
UNCC parallel programming cci-grid0x cluster
Date set | Date to report system/account problems | Assignment | Topic | Date due 12 pm (noon)
Thursday Jan 16, 2014 | | Assignment 1 | Using the Seeds Pattern Programming Framework 1 - Workpool Pattern | Wednesday Jan 29, 2014
Thursday Jan 30, 2014 | Thursday Feb 6th, 2014 for any issues installing MPI on your own computer. | Assignment 2 | Writing and executing MPI programs on a local computer and on the cluster. MPI workpool program - Compare with Seeds Assignment 1 | Wednesday Feb 19, 2014
Thurs Feb 20, 2014 | Thursday Feb 27th, 2014 for any issues installing OpenMP compiler on your own computer. | Assignment 3 | OpenMP assignment | Wednesday March 19, 2014
Tues March 18, 2014 | Tuesday March 25, 2014 for any issues that are preventing you from starting | Assignment 4, suzaku.h (all OS's), suzaku.o (zipped) | Suzaku assignment (new for Spring 2014) - pattern programming using macros/routines. | Friday April 4, 2014
April 3, 2014 | | Assignment 5 | CUDA programs, Linux environment to compile and execute simple CUDA programs, make file, vector/matrix addition, prefix sum, and sorting (extra credit). | April 22, 2014
Class test 1 date: Thur Feb 20, 2014
Format: Paper test, same format as posted tests. MPI and OpenMP summaries provided, see Previous Tests. Otherwise closed book.
Topics: All lecture materials presented in class from beginning of course (week 1) to week 6 inclusive (parallel computers, pattern programming and Seeds, message passing programming and MPI, shared memory programming and OpenMP), and materials in Assignment 1 (Seeds pattern programming) and Assignment 2 (MPI). Does not include Assignment 3.
Class test 2 date: Thurs April 10, 2014
Format: Paper test, same format as posted tests.
Topics: All lecture materials after test 1, week 8 to week 12 inclusive: shared memory performance issues, Paraguin, hybrid programming, synchronous all-to-all pattern, stencil pattern, pipeline pattern, sorting algorithms, CUDA, Assignment 4. Does not include Assignment 5 (although it includes the CUDA lectures).
Final exam (2 1/2 hour exam) date: 5:00 to 7:30 pm, Tuesday May 6th, 2014, in Woodward 130
Topics: Comprehensive
Format: Paper test in format of previous posted final tests. Closed book.