UNCC-logo
University of North Carolina Charlotte
University of North Carolina Wilmington
Parallel Programming Fall 2012
Tuesday/Thursday, 11:00 am - 12:15 pm
UNCW logo
Dr. Barry Wilkinson
University of North Carolina at Charlotte

Office Hours:
9:15am-10:45 am T/Th

and
Dr. Clayton Ferner
University of North Carolina at Wilmington

Office Hours:
2 pm - 3:30 pm T/Th
This page is continually updated as the course proceeds. Watch for announcements. Modification date: Dec 18, 2012

ANNOUNCEMENTS


Assignment Frequently Asked Questions
Academic calendar
Lecture Materials
Reading materials
Assignments
Tests
UNC-C Moodle
Class videos

Lecture Materials

The following slides are provided as Powerpoint slides.  You may wish to print these sides out as 1 x 2 or 2 x 3 thumbnails. The slides are not ready for use until the date of the class.  They are likely to be revised just before the class.
Lecture slides

Wk

Date, 2012
No of slides
Slides
Topics
1
Thurs Aug. 23
25 Outline
Course outline, prerequisites, course text,  course contents, instructor details.
1
Thurs Aug. 23
23 Assignment Preliminaries
Assignment preliminaries, access to servers, Moodle, student accounts. TA details and responsibilities.
2
Tues Aug. 28
13 Parallel Comp. Demand
Demand for computational speed, grand challenge problems
2
Tues Aug. 28
16 Parallel Comp. Potential
Potential for speed-up using multiple process(or)s, speed-up factor, max speed up, Amdahl's law, Gustafson's law.
2
Tues Aug 28
20 Parallel Computers Types of parallel computers, shared memory systems, multicore, programming shared memory, distributed memory platform, networked computers cluster computing, programming, GPU systems.
2
Thurs Aug 30
31 Pattern Programming-1
Parallel patterns for structured parallel programming, workpool, pipeline, divide and conquer, stencil, all-to-all patterns, advantages of starting with patterns, tools, Seeds pattern programming framework, user interface, programming example (Monte Carlo pi), Seeds documentation.
2
Thurs Aug 30
27


Pattern Programming-2

Matrix add/multiply
Seeds Framework, workpool module methods, bootstrapping class, further details of Seeds workpool for Assignment 1, Monte Carlo pi code, matrix addition workpool code.
2 Thurs Aug 30   Assignment 1 Using the Seeds Pattern Programming Framework: 1 - Workpool
3
Tues Sept 4
52 Lower Level Message-passing Computing - MPI
Basics of message-passing programming, MPI, point-to-point message passing, message tags, MPI communicator, blocking send/recv, command line compiling and executing MPI programs, instrumenting code for execution time, Eclipse IDE Parallel Tools Platform.
3
Thurs Sept 6 and Tues Sept 11 50

More MPI routines

MPI Quiz questions

MPI collective routines, general features, broadcast, scatter, gather, reduce, barrier, alltoall broadcast, synchronous message passing, asynchronous (non-blocking) message passing, changing to synchronous message passing.
4
Tues Sept 11

Assignment 2 Compiling and executing MPI programs. Comparison with Seeds
4
Thurs Sept 13
44 Compiler directive approach Introduction to the Paraguin compiler, affine expressions, partitioning, mapping, parallel region, forall
5 Tues Sept 18 and Thurs Sept 20 22 and 43 Compiler directive approach

Examples
Communication, Broadcast, Gather, Data Dependence, More Examples
6 Tues Sept 25   Assignment 3 Using Paraguin to Create MPI Programs - hello world, matrix multiplication, Sobel Edge Detection, and Monte Carlo.

 Reading
9
Divide and conquer pattern Recursive divide and conquer pattern, example:  numerical integration with adaptive quadrature.
6
Tues Sept 25 30
Pipeline pattern Pipeline pattern, examples unfolding loops, insertion sort, prime numbers, upper triangular linear equations.  Seeds pipeline pattern, sorting code.
6
Tues Sept 25 35

Synchronous All-ToAll Pattern

Demo

Synchronous All-To-All pattern, example use in gravitational N-body problem, Barnes-Hut algorithm, Seeds CompleteSynchGraph pattern code for N-body problem, module methods and bootstrap class.
6
Tues Sept 25

13

Iterative synchronous All-To-All pattern

Iterative synchronous All-To-All pattern, solving system of linear equations by iteration, Seeds CompleteSynchGraph Pattern, MPI _Allgather() routine, Jacobi iteration, convergence rate.

Reading
34
Stencil pattern Stencil pattern, applications, heat distribution problem, Seeds code, cellular automata, game of life, partially synchronous method.





6
Thurs Sept 27
39 Programming with Shared Memory
Programming shared memory systems, processes, threads, issues, interleaved statements, thread safe routines, re-ordering code, compiler/processor optimizations, accessing shared data, critical sections, locks, condition variables, deadlock, semaphores, monitors, Pthreads program example.
6   32 Shared memory performance issues Shared memory performance issues, specifying parallelism, par, forall constructs, dependency analysis (Bernstein's conditions), critical sections serializing code, data shared in caches, false sharing, sequential consistency, code re-ordering

Reading
25 Java threads and synchronization Brief review of Java threads, Thread class, Runnable interface, Java synchronization, Synchronised methods, statements, atomic.
6
Thurs Oct 3
50 Introduction to OpenMP
Introduction to OpenMP, directives/constructs, parallel, shared and local variables, work-sharing, sections, for, loop scheduling, for reduction, single master, critical, barrier, atomic, flush.
7
Oct 8/9


Fall Recess, no classes
7
Thurs Oct 11


Class test
8
Tues Oct 16
  Assignment 4

OpenMP and hybrid MPI/OpenMP assignment using command line.

9
Thurs Oct 18
44 Numerical Algorithms

Num.Alg. quiz
Parallelizing matrix multiplication, time complexities, block multiplication, recursive algorithm, mesh algorithms, Cannon's algorithm, systolic array, solving system of linear equations, direct and iterative methods, red-black, multigrid.
10
Tues Oct 22
32 Hybrid Programming Combining MPI and OpenMP to take advantage of clusters that have both distributed-memory and shared-memory. Using the Paraguin compiler to generate a hybrid program. Discussion of whether hybrid is any better than using only MPI or only OpenMP.
10
Thurs Oct 25
40 Sorting Algorithms Potential speedup of sorting in parallel, compare and exchange, bubble sort, odd-even transposition sort, mergesort, quicksort, odd-even mergesort, bitonic mergesort, shearsort.
10
Thurs Oct 25
9 Data Parallel Pattern
Data parallel pattern, use of forall notation, example, data parallel prefix sum algorithm, matrix multiplication.
11

Tues Oct 30

 


16

 

 

26

Intro to GPUs and CUDA


CUDA Prog. Model

CPU-GPU architecture evolution, 1970s to present, dedicated pipelined GPUs, general purpose GPU design, NVIDIA products, Fermi architecture, GPU performance gains, CUDA.

CUDA SIMT prog. model, CUDA kernel routines, CPU and GPU memories, basic CUDA program structure, code example adding two vectors, compiling and executing on Linux command line, Windows MS Visual Studio.

11
Thurs Nov 1
25 Multidimensional thread structure
CUDA programming: threads, blocks, grid, multidimensional grid and blocks, compute capabilities, thread addressing, predefined variables, flattening array, 2-D grid and block code: matrix addition/multiplication.
11 Thurs Nov 1/Tues Nov 6 20

Performance measurements

Measuring performance, timing program execution, CUDA “events”, synchronous and asynchronous CUDA routines, max and effective bandwidth, computation measures, FLOPs.
12
Tues Nov 6th

 

Assignment 5


CUDA assignment using Linux environment to compile and execute simple CUDA programs, make file, vector/matrix addition and static heat distribution problem.

12  Tues Nov 6
14 Device routines Declaring routines called from device and from host, local device variables, accessing kernel variables from host, cudaMemcopyToSymbol/FromSymbol
12
Thurs Nov 8

12

Thread synchronization

Ways to achieve thread synchronization, __syncThreads(), CPU synchronization, cudaThreadSynchronize(), __threadfence().

12
Thurs Nov 8

Sorting Algorithms Sorting continued. Rank sort, counting sort, radix sort
13
Tues Nov 13
40
Graph Algorithms
Prim's Algorithm for Minimum Spanning Tree, Dijkstra's Algorithm for Single-Source Shortest Path, Dijkstra's and Floyd's Algorithms for All-Pairs Shortest Path
13
Thurs Nov 15


Class test
14
Tues Nov 20
23 Sieve of Eratosthenes
Sieve of Eratosthenes Algorithm for computing prime numbers
15
Tues Nov 27
 
  Class teaching evaluation
15
Thurs Nov 29
 
Presentations by UNCW graduate students
16
Tues Dec 4, 2012
 
Last class. Review/discussion

Top 
Reading materials

Patterns
MPI

Paraguin

userman.pdf (will be available later) 

CUDA

Top 
Assignments

Each assignment is not ready for use until the date set.

Date set Interim dates
Assignment Topic Date due
12 pm (noon)
Thurs Aug 30, 2012
Date to report system/account problems:
Tues Sept 4, 2012
Assignment 1
Seeds 2.0
Pi Approx
Using the Seeds Pattern Programming Framework 1 - Workpool Pattern
Monday, Sept 10, 2012
Tues Sept 11
Date to report
system/account problems:
Tues Sept 18, 2012
Assignment 2
Compiling and running MPI programs
Comparison with Seeds
Monday, Sept 24, 2012
Tues Sept 25
Date to report system/account problems:
Tues Oct 2, 2012
Assignment 3
Using Paraguin to Create MPI Programs - hello world, matrix multiplication, Sobel Edge Detection, and Monte Carlo. Wednesday Oct 10, 2012
Tues Oct 16

Date to report system/
compiler problems:Tues Oct 23, 2012

Assignment 4

Generating X11 graphical output

Using UNC-C coit-grid cluster for MPI

OpenMP and hybrid OpenMP/MPI assignment

Monday Nov 5, 2012

Tues Nov 6, 2012
Date to report system/
compiler problems: Tues Nov 13, 2012
Assignment 5
CUDA programs, Linux environment to compile and execute simple CUDA programs, make file, vector/matrix addition and static heat distribution problem.

Mon Nov 26, 2012

Top 
Tests

Class test 1: Thursday October 11th, in class 11 am - 12:15 pm
Format: 45 question, multiple choice, paper test. Closed book.

Topics:
All lecture materials presented in class from beginning of course (week 1) to week 6 inclusive, and materials in Assignment 1 (pattern programming), Assignment 2 (MPI), and Assignment 3 (Paraguin compiler directives).  Does not include shared memory programming.

Class test 2: Thursday November 15th in class 11 am - 12:15 pm
Format: Paper test in format of previous posted 2nd tests. Closed book.

Topics:
All materials after test 1, week 6 to week 10 inclusive - shared memory programming, numerical algorithms, sorting algorithms, hybrid programming, assignment 4.   DOES NOT INCLUDE GPU/CUDA PROGRAMMING


Final Exam:
UNC-C students: 11 am - 1:30 pm, Tuesday December 11th 2012
UNC-W students: 11:30 am - 2:00 pm, Tuesday December 11th 2012

Topics: Comprehensive.


Previous tests (UNC-C courses that did not do patterns)

Top