University
of North Carolina Charlotte University of North Carolina Wilmington Parallel Programming Fall 2012 Tuesday/Thursday, 11:00 am - 12:15 pm |
||
Dr.
Barry
Wilkinson University of North Carolina at Charlotte Office Hours: 9:15am-10:45 am T/Th |
and |
Dr.
Clayton
Ferner University of North Carolina at Wilmington Office Hours: 2 pm - 3:30 pm T/Th |
Assignment
Frequently Asked Questions |
Academic
calendar |
Lecture
Materials |
Reading
materials |
Assignments |
Tests |
UNC-C Moodle |
Class videos |
The following slides are provided as Powerpoint slides. You
may wish to print these sides out as 1 x 2 or 2 x 3 thumbnails. The
slides are not ready for use until the date of the class. They
are likely to be revised just before the class.
Wk |
Date, 2012 |
No of slides |
|
|
1 |
Thurs Aug. 23 |
25 | Outline |
Course outline, prerequisites, course text, course contents, instructor details. |
1 |
Thurs Aug. 23 |
23 | Assignment
Preliminaries |
Assignment preliminaries,
access to servers, Moodle, student accounts. TA details and
responsibilities. |
2 |
Tues Aug. 28 |
13 | Parallel
Comp. Demand |
Demand for computational speed, grand challenge problems |
2 |
Tues Aug. 28 |
16 | Parallel
Comp. Potential |
Potential for speed-up using
multiple process(or)s, speed-up factor, max speed up, Amdahl's law,
Gustafson's law. |
2 |
Tues Aug 28 |
20 | Parallel Computers | Types of parallel computers, shared memory systems, multicore, programming shared memory, distributed memory platform, networked computers cluster computing, programming, GPU systems. |
2 |
Thurs Aug 30 |
31 | Pattern
Programming-1 |
Parallel patterns for structured
parallel programming, workpool, pipeline, divide and conquer, stencil,
all-to-all patterns, advantages of starting with patterns, tools, Seeds
pattern programming framework, user interface, programming example
(Monte Carlo pi), Seeds documentation. |
2 |
Thurs Aug 30 |
27 |
Pattern
Programming-2 Matrix add/multiply |
Seeds Framework, workpool module methods, bootstrapping class, further details of Seeds workpool for Assignment 1, Monte Carlo pi code, matrix addition workpool code. |
2 | Thurs Aug 30 | Assignment 1 | Using the Seeds Pattern Programming Framework: 1 - Workpool | |
3 |
Tues Sept 4 |
52 | Lower
Level Message-passing
Computing - MPI |
Basics of message-passing programming, MPI, point-to-point message passing, message tags, MPI communicator, blocking send/recv, command line compiling and executing MPI programs, instrumenting code for execution time, Eclipse IDE Parallel Tools Platform. |
3 |
Thurs Sept 6 and Tues Sept 11 | 50 | MPI collective routines, general features, broadcast, scatter, gather, reduce, barrier, alltoall broadcast, synchronous message passing, asynchronous (non-blocking) message passing, changing to synchronous message passing. | |
4 |
Tues Sept 11 |
Assignment 2 | Compiling and executing MPI programs. Comparison with Seeds | |
4 |
Thurs Sept 13 |
44 | Compiler directive approach | Introduction to the Paraguin
compiler, affine expressions, partitioning, mapping, parallel region,
forall |
5 | Tues Sept 18 and Thurs Sept 20 | 22 and 43 | Compiler
directive approach Examples |
Communication, Broadcast, Gather, Data Dependence, More Examples |
6 | Tues Sept 25 | Assignment 3 | Using Paraguin to Create MPI Programs - hello world, matrix multiplication, Sobel Edge Detection, and Monte Carlo. | |
Reading |
9 |
Divide and conquer pattern | Recursive divide and conquer pattern, example: numerical integration with adaptive quadrature. | |
6 |
Tues Sept 25 | 30 |
Pipeline pattern | Pipeline pattern, examples unfolding loops, insertion sort, prime numbers, upper triangular linear equations. Seeds pipeline pattern, sorting code. |
6 |
Tues Sept 25 | 35 |
Synchronous All-To-All pattern, example use in gravitational N-body problem, Barnes-Hut algorithm, Seeds CompleteSynchGraph pattern code for N-body problem, module methods and bootstrap class. | |
6 |
Tues Sept 25 |
13 |
Iterative synchronous All-To-All
pattern, solving system of linear equations by iteration, Seeds
CompleteSynchGraph Pattern, MPI _Allgather() routine, Jacobi iteration,
convergence rate. |
|
Reading |
34 |
Stencil pattern | Stencil pattern, applications, heat distribution problem, Seeds code, cellular automata, game of life, partially synchronous method. | |
6 |
Thurs Sept 27 |
39 | Programming
with Shared Memory |
Programming shared memory systems, processes, threads, issues, interleaved statements, thread safe routines, re-ordering code, compiler/processor optimizations, accessing shared data, critical sections, locks, condition variables, deadlock, semaphores, monitors, Pthreads program example. |
6 | 32 | Shared memory performance issues | Shared memory performance issues, specifying parallelism, par, forall constructs, dependency analysis (Bernstein's conditions), critical sections serializing code, data shared in caches, false sharing, sequential consistency, code re-ordering | |
Reading |
25 | Java threads and synchronization | Brief review of Java threads, Thread class, Runnable interface, Java synchronization, Synchronised methods, statements, atomic. | |
6 |
Thurs Oct 3 |
50 | Introduction
to OpenMP |
Introduction to OpenMP,
directives/constructs, parallel, shared and local variables,
work-sharing, sections, for, loop scheduling, for reduction, single
master, critical, barrier, atomic, flush. |
7 |
Oct 8/9 |
Fall Recess, no classes |
||
7 |
Thurs Oct 11 |
Class test |
||
8 |
Tues Oct 16 |
Assignment 4 |
OpenMP and hybrid MPI/OpenMP
assignment using command line. |
|
9 |
Thurs Oct 18 |
44 | Numerical
Algorithms Num.Alg. quiz |
Parallelizing matrix multiplication, time complexities, block multiplication, recursive algorithm, mesh algorithms, Cannon's algorithm, systolic array, solving system of linear equations, direct and iterative methods, red-black, multigrid. |
10 |
Tues Oct 22 |
32 | Hybrid Programming | Combining MPI and OpenMP to take advantage of clusters that have both distributed-memory and shared-memory. Using the Paraguin compiler to generate a hybrid program. Discussion of whether hybrid is any better than using only MPI or only OpenMP. |
10 |
Thurs Oct 25 |
40 | Sorting Algorithms | Potential speedup of sorting in parallel, compare and
exchange, bubble sort, odd-even transposition sort, mergesort,
quicksort, odd-even mergesort, bitonic mergesort, shearsort. |
10 |
Thurs
Oct 25 |
9 | Data
Parallel Pattern |
Data parallel pattern, use of
forall notation, example, data parallel prefix sum algorithm, matrix
multiplication. |
11 |
Tues Oct 30
|
16
26 |
CPU-GPU architecture evolution, 1970s to present, dedicated pipelined GPUs, general purpose GPU design, NVIDIA products, Fermi architecture, GPU performance gains, CUDA. CUDA SIMT prog. model, CUDA kernel routines, CPU and GPU memories, basic CUDA program structure, code example adding two vectors, compiling and executing on Linux command line, Windows MS Visual Studio. |
|
11 |
Thurs Nov 1 |
25 | Multidimensional
thread structure |
CUDA programming:
threads, blocks, grid, multidimensional grid and blocks, compute
capabilities, thread
addressing, predefined variables, flattening array, 2-D grid and block
code: matrix
addition/multiplication. |
11 | Thurs Nov 1/Tues Nov 6 | 20 | Measuring performance, timing program execution, CUDA “events”, synchronous and asynchronous CUDA routines, max and effective bandwidth, computation measures, FLOPs. | |
12 |
Tues Nov 6th |
|
|
CUDA assignment using Linux environment to compile and execute simple CUDA programs, make file, vector/matrix addition and static heat distribution problem. |
12 | Tues Nov 6 |
14 | Device routines | Declaring routines called from device and from host, local device variables, accessing kernel variables from host, cudaMemcopyToSymbol/FromSymbol |
12 |
Thurs Nov 8 |
12 |
Ways to achieve thread synchronization, __syncThreads(), CPU
synchronization, cudaThreadSynchronize(), __threadfence(). |
|
12 |
Thurs Nov 8 |
Sorting Algorithms | Sorting continued. Rank sort, counting sort, radix sort | |
13 |
Tues Nov 13 |
40 |
Graph
Algorithms |
Prim's Algorithm for Minimum
Spanning Tree, Dijkstra's Algorithm for Single-Source Shortest Path,
Dijkstra's and Floyd's Algorithms for All-Pairs Shortest Path |
13 |
Thurs Nov 15 |
Class test |
||
14 |
Tues Nov 20 |
23 | Sieve of
Eratosthenes |
Sieve of Eratosthenes Algorithm
for computing prime numbers |
15 |
Tues Nov 27 |
Class teaching evaluation | ||
15 |
Thurs Nov 29 |
Presentations by UNCW graduate
students |
||
16 |
Tues Dec 4, 2012 |
Last class. Review/discussion |
Paraguin
userman.pdf (will be available later)
CUDA
Date set | Interim
dates |
Assignment | Topic | Date due 12 pm (noon) |
Thurs Aug 30, 2012 |
Date
to report system/account problems: Tues Sept 4, 2012 |
Assignment 1 Seeds 2.0 Pi Approx |
Using the Seeds Pattern
Programming Framework 1 - Workpool Pattern |
Monday, Sept 10, 2012 |
Tues Sept 11 |
Date
to report system/account problems: Tues Sept 18, 2012 |
Assignment 2 |
Compiling and running MPI
programs Comparison with Seeds |
Monday, Sept 24, 2012 |
Tues Sept 25 |
Date to report system/account
problems: Tues Oct 2, 2012 |
Assignment 3 |
Using Paraguin to Create MPI Programs - hello world, matrix multiplication, Sobel Edge Detection, and Monte Carlo. | Wednesday Oct 10, 2012 |
Tues Oct 16 |
Date to report system/ |
OpenMP and hybrid OpenMP/MPI assignment |
Monday Nov 5, 2012 |
|
Tues Nov 6, 2012 |
Date to report system/ compiler problems: Tues Nov 13, 2012 |
Assignment 5 |
CUDA programs, Linux environment
to compile and execute
simple CUDA programs, make file, vector/matrix addition and static heat
distribution problem. |
Mon Nov 26, 2012 |
Class
test 1: Thursday October 11th, in class 11 am - 12:15 pm
Format: 45 question, multiple choice,
paper test. Closed book.
Topics: All lecture materials
presented in class from
beginning of course (week 1) to week 6 inclusive, and materials in
Assignment 1 (pattern programming), Assignment 2 (MPI), and Assignment
3 (Paraguin compiler directives). Does not include
shared memory
programming.
Class
test 2: Thursday November 15th in class 11 am - 12:15 pm
Format:
Paper test in format of previous posted 2nd tests. Closed book.
Topics:
All materials after test 1, week 6 to week 10 inclusive - shared
memory programming, numerical algorithms, sorting algorithms, hybrid
programming, assignment 4. DOES
NOT INCLUDE GPU/CUDA PROGRAMMING
Final
Exam:
UNC-C
students: 11 am - 1:30 pm, Tuesday December 11th 2012
UNC-W students:
11:30 am - 2:00 pm, Tuesday December 11th 2012
Topics: Comprehensive.