University of North Carolina Charlotte
University of North Carolina Wilmington
Parallel Programming
Fall 2013
Tuesday/Thursday, 11:00 am - 12:15 pm
Dr. Barry Wilkinson, University of North Carolina at Charlotte. Office Hours: 2 pm - 3:30 pm T/Th
and
Dr. Clayton Ferner, University of North Carolina at Wilmington. Office Hours: 12:30 pm - 2 pm T/Th
This page is continually updated as the course proceeds. Watch for announcements. Modification date: Dec 9, 2013. Always make sure you have the most recent copy of this page (not cached, re-load page).
Assignment Frequently Asked Questions | Academic calendar | Lecture Materials | Reading materials | Assignments | Tests | UNC-C Moodle 2 | Class videos
The following slides are provided as PowerPoint slides. You may wish to print these slides out as 1 x 2 or 2 x 3 thumbnails. The slides are not ready for use until the date of the class. They are likely to be revised just before the class.
Wk | Date, 2013 | No. of slides | Topic | Review/Quiz questions | Contents |
1 | Thurs Aug. 22 | 26 | Outline | Course outline, prerequisites, course text, course contents, instructor details. | |
1 | Thurs Aug. 22 | 23 | Assignment Preliminaries | Assignment preliminaries, access to servers, Moodle, student accounts. TA details and responsibilities. | |
1/2 | Tues Aug 27 | 13 | Parallel Comp. Demand | Demand for computational speed, grand challenge problems | |
2 | Tues Aug 27 | 18 | Parallel Comp. Potential | Quiz questions | Potential for speed-up using multiple process(or)s, speed-up factor, max speed up, Amdahl's law, Gustafson's law. |
2 | Tues Aug 27 | 20 | Parallel Computers | Types of parallel computers, shared memory systems, multicore, programming shared memory, distributed memory platform, networked computers, cluster computing, programming, GPU systems. | |
2 | Thurs Aug 29 | 30 | Pattern Programming-1 | Quiz questions | Parallel patterns for structured parallel programming, workpool, pipeline, divide and conquer, stencil, all-to-all patterns, advantages of starting with patterns, tools, Seeds pattern programming framework, user interface, programming example (Monte Carlo pi), Seeds documentation. |
2/3 | Thurs Aug 29/Tues Sept 3 | 43 | Pattern Programming-2 Matrix add/multiply | Seeds Framework, workpool module methods, bootstrapping class, further details of Seeds workpool for Assignment 1, Monte Carlo pi code, matrix addition workpool code. | |
2/3 | Thurs Aug 29/Tues Sept 3 | Assignment 1 | Using the Seeds Pattern Programming Framework: 1 - Workpool | ||
3 | Thurs Sept 5 | 47 | Compiler directive approach | Introduction to Paraguin, parallel regions, barrier, forall, broadcast, scatter, gather, and reduction. | |
4 | Tues Sept 10 | 37 and 34 | Compiler directive approach Examples | Quiz Questions | Patterns, Scatter/Gather, Stencil |
4 | Thurs Sept 12 | 23 | Assignment 2 slides Assignment 2 | Using Paraguin to Create MPI Programs - hello world, matrix multiplication, stencil pattern, and Monte Carlo. | |
5 | Tues Sept 17 | 54 | Lower Level Message-passing Computing - MPI | Basics of message-passing programming, MPI, point-to-point message passing, message tags, MPI communicator, blocking send/recv, command line compiling and executing MPI programs, instrumenting code for execution time, Eclipse IDE Parallel Tools Platform. | |
5 | Thurs Sept 19 | 61 | | MPI collective routines, general features, broadcast, scatter, gather, reduce, barrier, alltoall broadcast, synchronous message passing, asynchronous (non-blocking) message passing, changing to synchronous message passing. | |
6 | Tues Sept 24 | 16 | Synchronization | Barrier implementations, counter, reentrant code, tree, butterfly, local synchronization, safety and deadlock, safe MPI routines, MPI_Sendrecv(), MPI_Bsend(), MPI_Isend()/MPI_Irecv() | |
6 | Thurs Sept 26 | Assignment 3 | Compiling and executing MPI programs. Comparison with Seeds | ||
6 | Tues Sept 24/Thurs Sept 26 | 39 | Programming with Shared Memory | Programming shared memory systems, processes, threads, issues, interleaved statements, thread safe routines, re-ordering code, compiler/processor optimizations, accessing shared data, critical sections, locks, condition variables, deadlock, semaphores, monitors, Pthreads program example. | |
6/7 | Thurs Sept 26/Tues Oct 1 | 50 | Introduction to OpenMP | Introduction to OpenMP, directives/constructs, parallel, shared and local variables, work-sharing, sections, for, loop scheduling, for reduction, single, master, critical, barrier, atomic, flush. | |
7/8 | Tues Oct 1/ Thur Oct 10 | 32 | Shared memory performance issues | Shared memory performance issues, specifying parallelism, par, forall constructs, dependency analysis (Bernstein's conditions), critical sections serializing code, data shared in caches, false sharing, sequential consistency, code re-ordering | |
7 | Thurs Oct 3 | Class Test | No lecture. Take at any location | ||
8 | Mon/Tues Oct 7-8 | Fall Recess, no classes. Fall break follows the UNC-Charlotte calendar; students at other sites with a different break will need to watch the video of the missed class at their convenience. | |||
8 | Thur Oct 10 | Shared memory performance issues | |||
8 | Thur Oct 10 | 25 | Java threads and synchronization | Brief review of Java threads, Thread class, Runnable interface, Java synchronization, synchronized methods, statements, atomic. | |
9 | Tues Oct 15 | 21 and 14 | | Combining MPI and OpenMP to take advantage of clusters that have both distributed and shared memory. Discussion of whether hybrid is any better than using only MPI or only OpenMP. Using the Paraguin compiler to generate a hybrid program. | |
9 | October 15/17 | OpenMP and hybrid MPI/OpenMP assignment using command line. | |||
9 | October 15/17 | 32 | Synchronous All-To-All pattern, example use in gravitational N-body problem, Barnes-Hut algorithm, Seeds CompleteSynchGraph pattern code for N-body problem, iterative synchronous All-To-All pattern, solving system of linear equations by iteration, Jacobi iteration, convergence rate. Seeds CompleteSynchGraph Pattern, MPI_Allgather() routine | ||
10 | Tues Oct 22 | 37 | Stencil pattern | Quiz questions | Stencil pattern, applications, solving Laplace's eq., heat distribution problem, Seeds stencil pattern, cellular automata, game of life, ways to improve performance, partially synchronous method, red-black, multigrid. |
10 | Oct 22/24 | 32 | Pipeline pattern | Quiz questions | Pipeline pattern, space time diagram, speed up factor, matrix-vector multiplication, matrix multiplication, adding rows of an array, unfolding loops, frequency filter, insertion sort, prime numbers, upper triangular linear equations. Seeds pipeline pattern. |
10 | Thurs Oct 24 | 23 | Sieve of Eratosthenes | Quiz questions | Sieve of Eratosthenes Algorithm for computing prime numbers |
11 | Tues Oct 29 | 42 | Graph Algorithms | Prim's Algorithm for Minimum Spanning Tree, Dijkstra's Algorithm for Single-Source Shortest Path, Dijkstra's and Floyd's Algorithms for All-Pairs Shortest Path | |
11 | Thurs Oct 31 | 40 | Sorting Algorithms | Quiz questions | Potential speedup of sorting in parallel, compare and exchange, bubble sort, odd-even transposition sort, mergesort, quicksort, odd-even mergesort, bitonic mergesort, shearsort, rank sort, counting sort, radix sort |
12 | Tues Nov 5 | 14 | Data Parallel Pattern | Data parallel pattern, use of forall notation, example, data parallel prefix sum algorithm, matrix multiplication. | |
12 | Tues Nov 5 | 21 and 21 | CPU-GPU architecture evolution, 1970s to present, dedicated pipelined GPUs, general purpose GPU design, NVIDIA products, Fermi architecture, GPU performance gains, CUDA. CUDA SIMT programming model, CUDA kernel routines, CPU and GPU memories, basic CUDA program structure, code example adding two vectors, compiling and executing on Linux command line, Windows MS Visual Studio. | ||
12/13 | Nov 7/12 | 38 | CUDA programming: threads, blocks, grid, multidimensional grid and blocks, compute capabilities, thread addressing, predefined variables, flattening array, 2-D grid and block code: matrix addition/multiplication. | ||
12 | Thurs Nov 7 | | | CUDA assignment using Linux environment to compile and execute simple CUDA programs, make file, vector/matrix addition/multiplication, and sorting. | |
13 | Tues Nov 12 | 21, 14 and 12 | | Measuring performance, timing program execution, CUDA “events”, synchronous and asynchronous CUDA routines, max and effective bandwidth, computation measures, FLOPs. Declaring routines called from device and from host, local device variables, accessing kernel variables from host, cudaMemcpyToSymbol/cudaMemcpyFromSymbol. Ways to achieve thread synchronization, __syncthreads(), CPU synchronization, cudaThreadSynchronize(), __threadfence(). | |
13 | Thurs Nov 14 | Class Test | No lecture. Take at any location | ||
14 | Tues Nov 19 | 34 and 28 | Memory structures and bandwidth optimization, memory coalescing. Introduction to a few performance analysis tools: time, gettimeofday, read_real_time, MPI_Wtime, prof, gprof, xprofiler, mpiP | ||
14 | Thurs Nov 21 | Review/discussion/sample finals | |||
15 | Tues Nov 26 | Last NCREN class. Review/discussions/sample finals. Teaching evaluations done on-line. |
Paraguin
Paraguin Compiler Version 2.1 User Manual
CUDA
UNC-Charlotte Spring CUDA programming course (ITCS 4/5010 Spring 2013)
UNCC parallel programming cci-grid0x cluster
cci-grid0x cluster (newly reorganized)
Generating X11 graphical output
Notes on installing MPI on your own computer
OpenMPI on a Mac, from Tristan Bithell, Spring 2013 class
MPICH on Ubuntu http://jetcracker.wordpress.com/2012/03/01/how-to-install-mpi-in-ubuntu/ from Thomas Kraft, Spring 2013 class
Notes on installing OpenMP on your own computer
To add. Let me know if you have anything to post on MPI or OpenMP installation. Thanks. BW
Date set | Date to report system/account problems | Assignment | Topic | Date due 12 pm (noon) |
Thursday Aug 29th, 2013 | Monday Sept 2nd, 2013 | Assignment 1 | Using the Seeds Pattern Programming Framework 1 - Workpool Pattern | Wednesday Sept 11th, 2013 |
Thursday Sept 12th, 2013 | Monday Sept 16th, 2013 | Assignment 2 | Using Paraguin to Create MPI Programs - hello world, matrix multiplication, stencil pattern, and Monte Carlo. | Wednesday Sept 25th, 2013 |
Thursday Sept 26, 2013 | Monday Sept 30, 2013 | Assignment 3 | Compiling and running MPI programs | Wednesday Oct 16, 2013 |
Thursday Oct 17, 2013 | Monday Oct 21, 2013 | Test file here G = 100 | OpenMP and hybrid OpenMP/MPI assignment | New: Thursday Nov 7th, 2013 at 11:55 pm |
Thursday Nov 7, 2013 | Monday Nov 11, 2013 | Assignment 5 | CUDA programs, Linux environment to compile and execute simple CUDA programs, make file, vector/matrix addition, and sorting. | Tuesday Nov 26, 2013 |
Class test 1 date: Thursday Oct 3rd, 2013 -- 60 minutes scheduled during class time. Opens 10:45 am. Closes 12:30 pm.
Format: 40 questions, multiple choice, Moodle quiz. Closed book.
Topics: All lecture materials presented in class from the beginning of the course (week 1) to week 6 inclusive, and materials in Assignment 1 (pattern programming), Assignment 2 (Paraguin compiler directives), and MPI, but not Assignment 3. Does not include shared memory programming.
Class test 2 date: Nov 14, 2013 -- 60 minutes scheduled during class time. Open 10:45 am. Closes 12:30 pm.
Format: 40 questions, multiple choice, Moodle quiz. Closed book.
Topics: All materials after test 1, week 6 to week 13 inclusive - shared memory programming, OpenMP, shared memory performance issues, Java threads, hybrid programming, synchronous all-to-all pattern, stencil pattern, pipeline pattern, Sieve of Eratosthenes, graph algorithms, numerical algorithms, sorting algorithms, CUDA, Assignment 4. Does not include Assignment 5 (although includes CUDA lectures).
Final exam (2 hour exam) date:
UNC-C students: 11 am - 1:00 pm, Tuesday December 10th 2013
UNC-W students: 11 am - 1:00 pm, Tuesday December 10th 2013
UNC A&T students: 1 pm - 3 pm, Thursday December 12, 2013
ECU students: 11 am - 1:00 pm, Thursday December 5th 2013
UNC-G students: 12 noon - 2:00 pm, Tuesday December 10th 2013
WSSU students: 11 am - 1 pm, Tuesday, December 10, 2013