University of North Carolina Charlotte |
Dr. Barry Wilkinson |
and |
Dr. Clayton Ferner, University of North Carolina at Wilmington. Office Hours: Tuesday/Thursday 11 am - 1 pm |
This page is continually updated as the course proceeds. Watch for announcements. Modification date: Dec 3, 2014. Always make sure you have the most recent copy of this page (not cached, re-load page).
Academic calendar | Lecture Materials | Reading materials | Assignments | Tests | UNC-C Moodle 2 | Class videos (Streaming) | Assignment FAQs |
The following lecture materials are provided as PowerPoint slides. The slides are not ready for use until the date of the class. They are likely to be revised just before the class. The order of materials may also change.
Wk | Date, 2014 | No. of slides | Topic | Review/Quiz questions | Contents | Class Videos (Trimmed) |
1 | Thurs Aug. 21 | 28 | Outline | Course outline, prerequisites, course text, course contents, instructor details. TA details and responsibilities. | ||
1 | Thurs Aug. 21 | 12 | Assignment Preliminaries | Assignment preliminaries, Moodle, student accounts. | ||
1 | Thurs Aug. 21 | 14 pages | Pre-Assignment | Test software environment on your computer | ||
1 | Thurs Aug. 21 | 10 | Parallel Comp. Demand | Demand for computational speed, grand challenge problems | ||
2 | Tues Aug 26 | 14 | Parallel Comp. Potential | Quiz questions | Potential for speed-up using multiple process(or)s, speed-up factor, max speed-up, Amdahl's law, Gustafson's law. | Lecture 2 not available |
2 | Tues Aug 26 | 19 | Parallel Computers | Types of parallel computers, shared memory systems, multicore, programming shared memory, distributed memory platform, networked computers, cluster computing, programming, GPU systems. | ||
2 | Tues Aug 26 | 17 | Programming with Shared Memory-1 | Programming shared memory systems, processes, fork, fork-join pattern, threads, Pthreads, thread pool pattern | ||
2 | Thurs Aug 28 | 37 | Introduction to OpenMP | Introduction to OpenMP, thread team pattern, directives/constructs, parallel, shared and local variables, work-sharing, sections, for, loop scheduling, for reduction, single, master. | Lecture 3 | |
2 | Thurs Aug 28 | 9 pages | OpenMP tutorial | |||
3 | Tuesday Sept 2 | 29 | Programming with Shared memory-2 | OpenMP Quiz questions | Accessing shared data, critical sections, locks, condition variables, critical sections serializing code, deadlock, semaphores, monitors, Pthreads program example. | Lecture 4 |
3 | Tuesday Sept 2 | 10 | OpenMP continued | Sharing data and synchronization, critical, barrier, atomic, flush. | ||
3 | Thursday Sept 4 | 13 | Intro to stencil pattern | Stencil pattern, heat distribution | Lecture 5 | |
3 | Thursday Sept 4 | 6 pages | Assignment 2 | OpenMP heat distribution program, graphics | ||
3/4 | Thurs Sept 4/Tues Sept 9 | 34 | Programming with Shared Memory-3 | Shared Memory Quiz questions | Shared memory performance issues, specifying parallelism, par, forall constructs, dependency analysis (Bernstein's conditions), data shared in caches, false sharing, sequential consistency, code re-ordering | Lecture 6 |
4 | Tues Sept 9/Thurs Sept 11 | 62 | Lower Level Message-passing Computing - MPI | Basics of message-passing programming, MPI, point-to-point message passing, message tags, MPI communicator, blocking send/recv, command line compiling and executing MPI programs, instrumenting code for execution time, Eclipse IDE Parallel Tools Platform. | Lecture 7 | |
4/5 | Thurs Sept 11/Tues Sept 16 | 45 | | Message passing patterns, MPI collective routines, broadcast, scatter, gather, reduce, barrier, all-to-all broadcast. | Lecture 8 | |
5 | Tues Sept 16 | 22 pages | Assignment 3 | MPI tutorial, using command line and Eclipse-PTP | ||
5 | Thurs Sept 18 | 35 | Synchronization | Barrier implementations, counter, reentrant code, tree, butterfly, local synchronization, safety and deadlock, safe MPI routines, MPI_Sendrecv(), MPI_Bsend(), MPI_Isend()/MPI_Irecv(), synchronous message passing, asynchronous (non-blocking) message passing, changing to synchronous message passing. | Lecture 9 | |
6 | Tues Sept 23 | 25 | Introduction to Patterns | Pattern programming concepts, problem addressed, low-level message-passing patterns, point-to-point data transfer, broadcast, scatter, gather, reduce, all-to-all broadcast, higher-level message-passing patterns, workpool, pipeline, divide and conquer, all-to-all, iterative synchronous patterns, iterative synchronous all-to-all, stencil, advantages and disadvantages of patterns, our tools. | Lecture 10 | |
6 | Tues Sept 23 | 21 | Suzaku framework | Quiz questions | Suzaku, macros, routines, implementation | |
6 | Thurs Sept 25, 2014 | Class quiz | No lecture. Take quiz at any location | |||
7 | Tues Sept 30 | 4 pages | MPI application. Monte Carlo pi workpool | Lecture 11 | ||
7 | Tues Sept 30 | 47 | Compiler directive approach | Introduction to Paraguin, parallel regions, barrier, forall, broadcast, scatter, gather, and reduction. | ||
7 | Thurs Oct 2 | 37 and 34 | Compiler directive approach Examples | Quiz questions | Patterns, Scatter/Gather, Stencil | Lecture 12 |
8 | Tues Oct 7 | Fall Recess, no class. The fall break follows the UNC-Charlotte calendar. Students at other sites with a different break will need to watch the videos of any classes missed during their break, at their convenience. | ||||
8 | Thurs Oct 9 | Paraguin continued | ||||
8 | Thurs Oct 9 | 45 | Seeds framework | Quiz questions | Seeds pattern programming framework, module method, bootstrapping class, network and multicore versions, workpool programming examples - Monte Carlo pi, matrix addition, matrix multiplication. | Lecture 13 |
9 | Tues Oct 14 | 29 | Patterns and Applications | All-To-All pattern, iterative synchronous All-To-All pattern, gravitational N-body problem, Barnes-Hut algorithm, solving system of linear equations by iteration, Jacobi iteration, convergence rate. | Lecture 14 | |
9 | Tues Oct 14 | 22 | Stencil pattern | Quiz questions | Stencil pattern, applications, solving Laplace's eq., heat distribution problem, ways to improve performance, partially synchronous method, red-black, multigrid. | |
9 | Thurs Oct 16 | 23 | Assignment 5 slides, Assignment 5 | Using Paraguin to Create MPI Programs - hello world, matrix multiplication, stencil pattern, and Monte Carlo. | ||
9 | Thurs Oct 16 | 26 | Pipeline pattern | Quiz questions | Pipeline pattern, space time diagram, speed up factor, applications, matrix-vector multiplication, matrix multiplication, insertion sort, prime numbers, upper triangular linear equations. | Lecture 15 |
9/10 | Thurs Oct 16/Tues Oct 21 | 40 | Sorting Algorithms | Quiz questions | Potential speedup of sorting in parallel, compare and exchange, bubble sort, odd-even transposition sort, mergesort, quicksort, odd-even mergesort, bitonic mergesort, shearsort, rank sort, counting sort, radix sort | Lecture 16 |
10 | Thurs Oct 23 | 23 | Sieve of Eratosthenes | Quiz questions | Sieve of Eratosthenes Algorithm for computing prime numbers | Lecture 17 |
10 | Thurs Oct 23 | 42 | Graph Algorithms | Prim's Algorithm for Minimum Spanning Tree, Dijkstra's Algorithm for Single-Source Shortest Path, Dijkstra's and Floyd's Algorithms for All-Pairs Shortest Path | ||
11 | Tues Oct 28 | 26 pages | Assignment 6 | Using the Seeds Pattern Programming Framework: 1 - Workpool | ||
11 | Tues Oct 28 | 21 and 14 | | Combining MPI and OpenMP to take advantage of clusters that have both distributed memory and shared memory. Discussion of whether hybrid is any better than using only MPI or only OpenMP. Using the Paraguin compiler to generate a hybrid program. | Lecture 18 | |
11 | Thurs Oct 30 | 14 | Data Parallel Pattern | Data parallel pattern, use of forall notation, example, data parallel prefix sum algorithm, matrix multiplication. | Lecture 19 | |
11 | Thurs Oct 30 | 21 and 21 | | Quiz questions | CPU-GPU architecture evolution, 1970s to present, dedicated pipelined GPUs, general purpose GPU design, NVIDIA products, Fermi architecture, GPU performance gains, CUDA. CUDA SIMT programming model, CUDA kernel routines, CPU and GPU memories, basic CUDA program structure, code example adding two vectors, compiling and executing on Linux command line, Windows MS Visual Studio. | ||
12 | Tues Nov 4 | 38 | | CUDA programming: threads, blocks, grid, multidimensional grid and blocks, compute capabilities, thread addressing, predefined variables, flattening array, 2-D grid and block code: matrix addition/multiplication. | Lecture 20 | ||
12 | Tues Nov 4 | 21 and 14 | | Measuring performance, timing program execution, CUDA “events”, synchronous and asynchronous CUDA routines, max and effective bandwidth, computation measures, FLOPs. Declaring routines called from device and from host, local device variables, accessing kernel variables from host, cudaMemcpyToSymbol()/cudaMemcpyFromSymbol(). | ||
12 | Nov 6th | 5 pages | | CUDA assignment using Linux environment to compile and execute simple CUDA programs, makefile, vector/matrix addition/multiplication, prefix sum, sorting. | ||
12 | Nov 6th | 12 and 34 | | Ways to achieve thread synchronization, __syncthreads(), CPU synchronization, cudaThreadSynchronize(), __threadfence(). Memory structures and bandwidth optimization, memory coalescing. | Lecture 21 | |
13 | Nov 11 | 28, 22 and 11 | | Introduction to a few performance analysis tools: time, gettimeofday, read_real_time, MPI_Wtime, prof, gprof, xprofiler, mpiP. Demonstration of memory coalescing, code, performance improvements. Demonstration of using shared memory, code, performance improvements. | Lecture 22 | |
13 | Nov 13th | Review for quiz | Lecture 23 | |||
14 | Nov 18th | Moodle Quiz | No lecture. Take at any location | |||
14 | Nov 20th | Review/discussion/sample finals | Lecture 24 | |||
15 | Nov 25th | Review/discussion/sample finals | Lecture 25 | |||
15 | Nov 27th | Thanksgiving break. No classes | ||||
16 | Dec 2nd, 2014 | Last NCREN class. Review/discussion. Teaching evaluations done on-line. | Lecture 26 |
OpenMP
MPI
Paraguin
Seeds Pattern Programming Framework
CUDA
Textbook home page: http://www.cs.uncc.edu/par_prog
English to American translation
Assignments
No assignment is ready for use until its set date.
Assignment FAQs
Clusters used in assignments:
Date set | Date to report system/software problems | Assignment | Topic | Date due 12 pm (noon) |
Thurs Aug. 21 | Tues Aug 26, 2014 | Pre-Assignment | Test software environment on your computer | Thursday Aug 28, 2014 |
Thursday Aug 28, 2014 | Tues Sept 2, 2014 | Assignment 1 | OpenMP tutorial | Thursday Sept 4, 2014 |
Thursday Sept 4, 2014 | Tues Sept 9, 2014 | Assignment 2 | OpenMP heat distribution program, graphics | Tuesday Sept 16, 2014 |
Tuesday Sept 16, 2014 | Thurs Sept 18, 2014 | Assignment 3 | MPI tutorial, using command line and Eclipse-PTP | Tues Sept 30, 2014 |
Tues Sept 30, 2014 | Thurs Oct 2, 2014 | Assignment 4 | MPI program. Monte Carlo pi workpool | Friday Oct 10, 2014 |
Thurs Oct 16, 2014 | Tues Oct 21, 2014 | Assignment 5 | Paraguin | Tues Oct 28, 2014 |
Tues Oct 28, 2014 | Thurs Oct 30, 2014 | Assignment 6 | Seeds | Thurs Nov 6th, 2014 |
Thurs Nov 6th, 2014 | Tues Nov 11, 2014 | Assignment 7 | CUDA (On UNC-C K20 cci-grid08.uncc.edu) | Thurs Nov 20, 2014 |
Class test 1 date: Thursday Sept 25th, 2014, 2 pm - 3:15 pm (during class period) Take at any location.
Format: 40 questions, multiple choice, Moodle quiz. Closed book.
Topics: All lecture materials presented in class from the beginning of the course (week 1) to week 5 inclusive, and materials in assignments.
Class test 2 date: Tuesday Nov 18th, 2014, 2 pm - 3:15 pm (during class period) Take at any location.
Format: 40 questions, multiple choice, Moodle quiz. Closed book.
Topics: All materials after test 1, week 6 to week 13 inclusive and assignments.
Final exam date (2 1/2 hour exam within the university-scheduled exam period for the class):
UNCC students: Tuesday Dec 9, 2014, 2 pm to 4:30 pm
UNCW students: Thursday Dec 11, 2014, 3:00 pm- 6:00 pm
App State students: Thursday, Dec 11, 2014, 12:00 pm (Noon)- 2:30 pm
ECU students: Tuesday Dec 16, 2014, 2 pm - 4:30 pm
UNCG students: Saturday Dec 6, 2014, 3:30 pm - 6:30 pm
WCU students: Monday Dec 8, 2014 12:00 pm (Noon) - 2:30 pm