ITCS 4145 Parallel Programming

University of North Carolina Charlotte and University of North Carolina Wilmington
Parallel Programming, Fall 2013
Tuesday/Thursday, 11:00 am - 12:15 pm

Dr. Barry Wilkinson, University of North Carolina at Charlotte. Office Hours: 2 pm - 3:30 pm T/Th
Dr. Clayton Ferner, University of North Carolina at Wilmington. Office Hours: 12:30 pm - 2 pm T/Th

This page is continually updated as the course proceeds. Watch for announcements. Modification date: Dec 9, 2013. Always make sure you have the most recent copy of this page (not cached, re-load page).

ANNOUNCEMENTS

Assignment Frequently Asked Questions

Academic calendar

Lecture Materials

Reading materials

Assignments

Tests

UNC-C Moodle 2

Class videos

Lecture Materials

The following slides are provided as PowerPoint slides. You may wish to print these slides out as 1 x 2 or 2 x 3 thumbnails. The slides are not ready for use until the date of the class, and they are likely to be revised just before the class.
**Lecture slides**
Wk	Date, 2013	No of slides	Slides	Review/Quiz questions	Topics
1	Thurs Aug. 22	26	Outline		Course outline, prerequisites, course text, course contents, instructor details.
1	Thurs Aug. 22	23	Assignment Preliminaries		Assignment preliminaries, access to servers, Moodle, student accounts. TA details and responsibilities.
1/2	Tues Aug 27	13	Parallel Comp. Demand		Demand for computational speed, grand challenge problems
2	Tues Aug 27	18	Parallel Comp. Potential	Quiz questions	Potential for speed-up using multiple process(or)s, speed-up factor, max speed up, Amdahl's law, Gustafson's law.
2	Tues Aug 27	20	Parallel Computers		Types of parallel computers, shared memory systems, multicore, programming shared memory, distributed memory platform, networked computers, cluster computing, programming, GPU systems.
2	Thurs Aug 29	30	Pattern Programming-1	Quiz questions	Parallel patterns for structured parallel programming, workpool, pipeline, divide and conquer, stencil, all-to-all patterns, advantages of starting with patterns, tools, Seeds pattern programming framework, user interface, programming example (Monte Carlo pi), Seeds documentation.
2/3	Thurs Aug 29/Tues Sept 3	43	Pattern Programming-2 Matrix add/multiply		Seeds Framework, workpool module methods, bootstrapping class, further details of Seeds workpool for Assignment 1, Monte Carlo pi code, matrix addition workpool code.
2/3	Thurs Aug 29/Tues Sept 3		Assignment 1		Using the Seeds Pattern Programming Framework: 1 - Workpool
3	Thurs Sept 5	47	Compiler directive approach		Introduction to Paraguin, parallel regions, barrier, forall, broadcast, scatter, gather, and reduction.
4	Tues Sept 10	37 and 34	Compiler directive approach Examples	Quiz Questions	Patterns, Scatter/Gather, Stencil
4	Thurs Sept 12	23	Assignment 2 slides Assignment 2		Using Paraguin to Create MPI Programs - hello world, matrix multiplication, stencil pattern, and Monte Carlo.
5	Tues Sept 17	54	Lower Level Message-passing Computing - MPI		Basics of message-passing programming, MPI, point-to-point message passing, message tags, MPI communicator, blocking send/recv, command line compiling and executing MPI programs, instrumenting code for execution time, Eclipse IDE Parallel Tools Platform.
5	Thurs Sept 19	61	More MPI routines	Quiz questions	MPI collective routines, general features, broadcast, scatter, gather, reduce, barrier, alltoall broadcast, synchronous message passing, asynchronous (non-blocking) message passing, changing to synchronous message passing.
6	Tues Sept 24	16	Synchronization	Quiz questions Quiz questions	Barrier implementations, counter, reentrant code, tree, butterfly, local synchronization, safety and deadlock, safe MPI routines, MPI_Sendrecv(), MPI_Bsend(), MPI_Isend()/MPI_Irecv()
6	Thurs Sept 26		Assignment 3		Compiling and executing MPI programs. Comparison with Seeds
6	Tues Sept 24/Thurs Sept 26	39	Programming with Shared Memory		Programming shared memory systems, processes, threads, issues, interleaved statements, thread safe routines, re-ordering code, compiler/processor optimizations, accessing shared data, critical sections, locks, condition variables, deadlock, semaphores, monitors, Pthreads program example.
6/7	Thurs Sept 26/Tues Oct 1	50	Introduction to OpenMP		Introduction to OpenMP, directives/constructs, parallel, shared and local variables, work-sharing, sections, for, loop scheduling, for reduction, single, master, critical, barrier, atomic, flush.
7/8	Tues Oct 1/ Thur Oct 10	32	Shared memory performance issues		Shared memory performance issues, specifying parallelism, par, forall constructs, dependency analysis (Bernstein's conditions), critical sections serializing code, data shared in caches, false sharing, sequential consistency, code re-ordering
7	Thurs Oct 3		Class Test		No lecture. Take at any location
8	Mon/Tues Oct 7-8				Fall Recess, no classes. Fall break follows the UNC-Charlotte calendar; students at other sites with a different break will need to watch the video of the missed class at their convenience.
8	Thur Oct 10		Shared memory performance issues	Quiz questions
8	Thur Oct 10	25	Java threads and synchronization		Brief review of Java threads, Thread class, Runnable interface, Java synchronization, synchronized methods, statements, atomic.
9	Tues Oct 15	21 14	Hybrid Programming Paraguin Hybrid Programming		Combining MPI and OpenMP to take advantage of clusters that have both distributed-memory and shared-memory. Discussion of whether hybrid is any better than using only MPI or only OpenMP. Using the Paraguin compiler to generate a hybrid program.
9	October 15/17		Assignment 4 cci-grid0x cluster		OpenMP and hybrid MPI/OpenMP assignment using command line.
9	October 15/17	32	Synchronous All-To-All Patterns Demo		Synchronous All-To-All pattern, example use in gravitational N-body problem, Barnes-Hut algorithm, Seeds CompleteSynchGraph pattern code for N-body problem, iterative synchronous All-To-All pattern, solving system of linear equations by iteration, Jacobi iteration, convergence rate. Seeds CompleteSynchGraph Pattern, MPI_Allgather() routine
10	Tues Oct 22	37	Stencil pattern	Quiz questions	Stencil pattern, applications, solving Laplace's eq., heat distribution problem, Seeds stencil pattern, cellular automata, game of life, ways to improve performance, partially synchronous method, red-black, multigrid.
10	Oct 22/24	32	Pipeline pattern	Quiz questions	Pipeline pattern, space time diagram, speed up factor, matrix-vector multiplication, matrix multiplication, adding rows of an array, unfolding loops, frequency filter, insertion sort, prime numbers, upper triangular linear equations. Seeds pipeline pattern.
10	Thurs Oct 24	23	Sieve of Eratosthenes	Quiz questions	Sieve of Eratosthenes Algorithm for computing prime numbers
11	Tues Oct 29	42	Graph Algorithms		Prim's Algorithm for Minimum Spanning Tree, Dijkstra's Algorithm for Single-Source Shortest Path, Dijkstra's and Floyd's Algorithms for All-Pairs Shortest Path
11	Thurs Oct 31	40	Sorting Algorithms	Quiz questions	Potential speedup of sorting in parallel, compare and exchange, bubble sort, odd-even transposition sort, mergesort, quicksort, odd-even mergesort, bitonic mergesort, shearsort, rank sort, counting sort, radix sort
12	Tues Nov 5	14	Data Parallel Pattern		Data parallel pattern, use of forall notation, example, data parallel prefix sum algorithm, matrix multiplication.
12	Tues Nov 5	21 21	Intro to GPUs and CUDA CUDA Prog. Model		CPU-GPU architecture evolution, 1970s to present, dedicated pipelined GPUs, general purpose GPU design, NVIDIA products, Fermi architecture, GPU performance gains, CUDA. CUDA SIMT prog. model, CUDA kernel routines, CPU and GPU memories, basic CUDA program structure, code example adding two vectors, compiling and executing on Linux command line, Windows MS Visual Studio.
12/13	Nov 7/12	38	Multidimensional thread structure		CUDA programming: threads, blocks, grid, multidimensional grid and blocks, compute capabilities, thread addressing, predefined variables, flattening array, 2-D grid and block code: matrix addition/multiplication.
12	Thurs Nov 7		Assignment 5		CUDA assignment using Linux environment to compile and execute simple CUDA programs, make file, vector/matrix addition/multiplication, and sorting.
13	Tues Nov 12	21 14 12	Performance measurements Device routines Thread synchronization		Measuring performance, timing program execution, CUDA “events”, synchronous and asynchronous CUDA routines, max and effective bandwidth, computation measures, FLOPs. Declaring routines called from device and from host, local device variables, accessing kernel variables from host, cudaMemcpyToSymbol()/cudaMemcpyFromSymbol(). Ways to achieve thread synchronization, __syncthreads(), CPU synchronization, cudaThreadSynchronize(), __threadfence().
13	Thurs Nov 14		Class Test		No lecture. Take at any location
14	Tues Nov 19	34 28	GPU memory structures Performance Analysis tools		Memory structures and bandwidth optimization, memory coalescing Introduction to a few performance analysis tools: time, gettimeofday, read_real_time, MPI_Wtime, prof, gprof, xprofiler, mpiP
14	Thurs Nov 21				Review/discussion/sample finals
15	Tues Nov 26				Last NCREN class. Review/discussions/sample finals Teaching evaluations done on-line
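
The Week 2 lecture on the potential for speed-up lists Amdahl's law and Gustafson's law among its topics. As a rough illustration only (not taken from the course slides), the usual textbook forms of both laws can be sketched as follows. The function names and the sample values (a 5% serial fraction on 16 processors) are assumptions chosen for this example.

```python
"""Illustrative sketch of the speed-up formulas named in the Week 2 topics."""


def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Amdahl's law: fixed problem size.

    serial_fraction is the part of the single-processor run time
    that cannot be parallelized.
    """
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def amdahl_max_speedup(serial_fraction: float) -> float:
    """Upper bound of Amdahl's law as the processor count grows without limit."""
    return 1.0 / serial_fraction


def gustafson_speedup(serial_fraction: float, processors: int) -> float:
    """Gustafson's law: scaled speed-up, problem size grows with processors.

    serial_fraction is the serial part of the run time measured
    on the parallel system.
    """
    return processors - serial_fraction * (processors - 1)


if __name__ == "__main__":
    f, p = 0.05, 16  # hypothetical values, not from the course materials
    print(f"Amdahl speed-up on {p} processors:    {amdahl_speedup(f, p):.2f}")
    print(f"Amdahl maximum speed-up:              {amdahl_max_speedup(f):.2f}")
    print(f"Gustafson scaled speed-up on {p}:      {gustafson_speedup(f, p):.2f}")
```

The two laws answer different questions: Amdahl's law holds the problem size fixed, so the serial part caps the speed-up, while Gustafson's law assumes the parallel part of the work grows with the number of processors.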

Reading materials

Seeds Pattern Programming Framework

Seeds Framework Home Page

Seeds Tutorials

MPI

MPI forum and standard: http://www.mpi-forum.org/

MPICH http://www.mcs.anl.gov/research/projects/mpich2/

Textbook home page: http://www.cs.uncc.edu/par_prog

Formal parameters of MPI routines (ppt): MPI Parameters

MPI Gather/scatter examples

Some error messages when running MPI programs

Paraguin

Paraguin Compiler Version 2.1 User Manual

CUDA

UNC-Charlotte Spring CUDA programming course (ITCS 4/5010 Spring 2013)

UNCC parallel programming cci-grid0x cluster

cci-grid0x cluster (newly reorganized)

Generating X11 graphical output

Notes on installing MPI on your own computer

OpenMPI on a Mac, from Tristan Bithell, Spring 2013 class

MPICH on Ubuntu http://jetcracker.wordpress.com/2012/03/01/how-to-install-mpi-in-ubuntu/ from Thomas Kraft, Spring 2013 class

Notes on installing OpenMP on your own computer

To add. Let me know if you have anything to post on MPI or OpenMP installation. Thanks. BW

Assignments

No assignment is ready for use until its set date.

Date set	Date to report system/account problems	Assignment	Topic	Date due 12 pm (noon)
Thursday Aug 29th, 2013	Monday Sept 2nd, 2013	Assignment 1 Seeds Software	Using the Seeds Pattern Programming Framework 1 - Workpool Pattern	Wednesday Sept 11th, 2013
Thursday Sept 12th, 2013	Monday Sept 16th, 2013	Assignment 2	Using Paraguin to Create MPI Programs - hello world, matrix multiplication, stencil pattern, and Monte Carlo.	Wednesday Sept 25th, 2013
Thursday Sept 26, 2013	Monday Sept 30, 2013	Assignment 3	Compiling and running MPI programs Comparison with Seeds and Paraguin	Wednesday Oct 16, 2013
Thursday Oct 17, 2013	Monday Oct 21, 2013	Assignment 4 Test file here G = 100 Generating X11 graphical output cci-grid0x cluster	OpenMP and hybrid OpenMP/MPI assignment	New: Thursday Nov 7th, 2013 at 11:55 pm
Thursday Nov 7, 2013	Monday Nov 11, 2013	Assignment 5	CUDA programs, Linux environment to compile and execute simple CUDA programs, make file, vector/matrix addition, and sorting.	Tuesday Nov 26, 2013

Tests

Class test 1 date: Thursday Oct 3rd, 2013 -- 60 minutes scheduled during class time. Open 10:45 am. Closes 12:30 pm.

Format: 40 questions, multiple choice, Moodle quiz. Closed book.
Topics: All lecture materials presented in class from the beginning of the course (week 1) to week 6 inclusive, the materials in Assignment 1 (pattern programming) and Assignment 2 (Paraguin compiler directives), and MPI, but not Assignment 3. Does not include shared memory programming.

Class test 2 date: Nov 14, 2013 -- 60 minutes scheduled during class time. Open 10:45 am. Closes 12:30 pm.

Format: 40 questions, multiple choice, Moodle quiz. Closed book.
Topics: All materials after test 1, week 6 to week 13 inclusive: shared memory programming, OpenMP, shared memory performance issues, Java threads, hybrid programming, synchronous all-to-all pattern, stencil pattern, pipeline pattern, Sieve of Eratosthenes, graph algorithms, numerical algorithms, sorting algorithms, CUDA, Assignment 4. Does not include Assignment 5 (although it does include the CUDA lectures).

Final exam (2 hour exam) date:

    UNC-C students: 11 am - 1:00 pm, Tuesday December 10th 2013
    UNC-W students: 11 am - 1:00 pm, Tuesday December 10th 2013
    UNC A&T students: 1 pm - 3 pm, Thursday December 12, 2013
    ECU students: 11 am - 1:00 pm, Thursday December 5th 2013
    UNC-G students: 12 noon- 2:00 pm, Tuesday December 10th 2013
    WSSU students: 11 am- 1 pm, Tuesday, December 10, 2013

Topics: Comprehensive
Format: Paper test in format of previous posted final tests. Closed book.

Previous tests (UNC-C courses that did not do patterns)
