ITCS 4145 Cluster Computing

	University of North Carolina Charlotte University of North Carolina Wilmington Parallel Programming Fall 2012 Tuesday/Thursday, 11:00 am - 12:15 pm
Dr. Barry Wilkinson University of North Carolina at Charlotte Office Hours: 9:15am-10:45 am T/Th	and	Dr. Clayton Ferner University of North Carolina at Wilmington Office Hours: 2 pm - 3:30 pm T/Th

This page is continually updated as the course proceeds. Watch for announcements. Modification date: Dec 18, 2012

ANNOUNCEMENTS

Assignment Frequently Asked Questions

Academic calendar

Lecture Materials

Reading materials

Assignments

Tests

UNC-C Moodle

Class videos

Lecture Materials

The following slides are provided as Powerpoint slides. You may wish to print these sides out as 1 x 2 or 2 x 3 thumbnails. The slides are not ready for use until the date of the class. They are likely to be revised just before the class.

Lecture slides

Wk

Date, 2012
No of slides Slides Topics

1
Thurs Aug. 23
25 Outline
Course outline, prerequisites, course text, course contents, instructor details.

1
Thurs Aug. 23
23 Assignment Preliminaries
Assignment preliminaries, access to servers, Moodle, student accounts. TA details and responsibilities.

2
Tues Aug. 28
13 Parallel Comp. Demand
Demand for computational speed, grand challenge problems

2
Tues Aug. 28
16 Parallel Comp. Potential
Potential for speed-up using multiple process(or)s, speed-up factor, max speed up, Amdahl's law, Gustafson's law.

2
Tues Aug 28
20 Parallel Computers Types of parallel computers, shared memory systems, multicore, programming shared memory, distributed memory platform, networked computers cluster computing, programming, GPU systems.

2
Thurs Aug 30
31 Pattern Programming-1
Parallel patterns for structured parallel programming, workpool, pipeline, divide and conquer, stencil, all-to-all patterns, advantages of starting with patterns, tools, Seeds pattern programming framework, user interface, programming example (Monte Carlo pi), Seeds documentation.

2
Thurs Aug 30
27

Pattern Programming-2

Matrix add/multiply
Seeds Framework, workpool module methods, bootstrapping class, further details of Seeds workpool for Assignment 1, Monte Carlo pi code, matrix addition workpool code.

2 Thurs Aug 30 Assignment 1 Using the Seeds Pattern Programming Framework: 1 - Workpool

3
Tues Sept 4
52 Lower Level Message-passing Computing - MPI
Basics of message-passing programming, MPI, point-to-point message passing, message tags, MPI communicator, blocking send/recv, command line compiling and executing MPI programs, instrumenting code for execution time, Eclipse IDE Parallel Tools Platform.

3
Thurs Sept 6 and Tues Sept 11 50
More MPI routines

MPI Quiz questions

MPI collective routines, general features, broadcast, scatter, gather, reduce, barrier, alltoall broadcast, synchronous message passing, asynchronous (non-blocking) message passing, changing to synchronous message passing.

4
Tues Sept 11

Assignment 2 Compiling and executing MPI programs. Comparison with Seeds

4
Thurs Sept 13
44 Compiler directive approach Introduction to the Paraguin compiler, affine expressions, partitioning, mapping, parallel region, forall

5 Tues Sept 18 and Thurs Sept 20 22 and 43 Compiler directive approach

Examples Communication, Broadcast, Gather, Data Dependence, More Examples

6 Tues Sept 25 Assignment 3 Using Paraguin to Create MPI Programs - hello world, matrix multiplication, Sobel Edge Detection, and Monte Carlo.

Reading
9
Divide and conquer pattern Recursive divide and conquer pattern, example: numerical integration with adaptive quadrature.

6
Tues Sept 25 30
Pipeline pattern Pipeline pattern, examples unfolding loops, insertion sort, prime numbers, upper triangular linear equations. Seeds pipeline pattern, sorting code.

6
Tues Sept 25 35

Synchronous All-ToAll Pattern

Demo
Synchronous All-To-All pattern, example use in gravitational N-body problem, Barnes-Hut algorithm, Seeds CompleteSynchGraph pattern code for N-body problem, module methods and bootstrap class.

6
Tues Sept 25

13

Iterative synchronous All-To-All pattern

Iterative synchronous All-To-All pattern, solving system of linear equations by iteration, Seeds CompleteSynchGraph Pattern, MPI _Allgather() routine, Jacobi iteration, convergence rate.

Reading
34
Stencil pattern Stencil pattern, applications, heat distribution problem, Seeds code, cellular automata, game of life, partially synchronous method.

6
Thurs Sept 27
39 Programming with Shared Memory
Programming shared memory systems, processes, threads, issues, interleaved statements, thread safe routines, re-ordering code, compiler/processor optimizations, accessing shared data, critical sections, locks, condition variables, deadlock, semaphores, monitors, Pthreads program example.

6 32 Shared memory performance issues Shared memory performance issues, specifying parallelism, par, forall constructs, dependency analysis (Bernstein's conditions), critical sections serializing code, data shared in caches, false sharing, sequential consistency, code re-ordering

Reading
25 Java threads and synchronization Brief review of Java threads, Thread class, Runnable interface, Java synchronization, Synchronised methods, statements, atomic.

6
Thurs Oct 3
50 Introduction to OpenMP
Introduction to OpenMP, directives/constructs, parallel, shared and local variables, work-sharing, sections, for, loop scheduling, for reduction, single master, critical, barrier, atomic, flush.

7
Oct 8/9

Fall Recess, no classes

7
Thurs Oct 11

Class test

8
Tues Oct 16
Assignment 4
OpenMP and hybrid MPI/OpenMP assignment using command line.

9
Thurs Oct 18
44 Numerical Algorithms

Num.Alg. quiz
Parallelizing matrix multiplication, time complexities, block multiplication, recursive algorithm, mesh algorithms, Cannon's algorithm, systolic array, solving system of linear equations, direct and iterative methods, red-black, multigrid.

10
Tues Oct 22
32 Hybrid Programming Combining MPI and OpenMP to take advantage of clusters that have both distributed-memory and shared-memory. Using the Paraguin compiler to generate a hybrid program. Discussion of whether hybrid is any better than using only MPI or only OpenMP.

10
Thurs Oct 25
40 Sorting Algorithms Potential speedup of sorting in parallel, compare and exchange, bubble sort, odd-even transposition sort, mergesort, quicksort, odd-even mergesort, bitonic mergesort, shearsort.

10
Thurs Oct 25
9 Data Parallel Pattern
Data parallel pattern, use of forall notation, example, data parallel prefix sum algorithm, matrix multiplication.

11

Tues Oct 30

16

26

Intro to GPUs and CUDA

CUDA Prog. Model

CPU-GPU architecture evolution, 1970s to present, dedicated pipelined GPUs, general purpose GPU design, NVIDIA products, Fermi architecture, GPU performance gains, CUDA.

CUDA SIMT prog. model, CUDA kernel routines, CPU and GPU memories, basic CUDA program structure, code example adding two vectors, compiling and executing on Linux command line, Windows MS Visual Studio.

11
Thurs Nov 1
25 Multidimensional thread structure
CUDA programming: threads, blocks, grid, multidimensional grid and blocks, compute capabilities, thread addressing, predefined variables, flattening array, 2-D grid and block code: matrix addition/multiplication.

11 Thurs Nov 1/Tues Nov 6 20
Performance measurements
Measuring performance, timing program execution, CUDA “events”, synchronous and asynchronous CUDA routines, max and effective bandwidth, computation measures, FLOPs.

12
Tues Nov 6th

Assignment 5

CUDA assignment using Linux environment to compile and execute simple CUDA programs, make file, vector/matrix addition and static heat distribution problem.

12 Tues Nov 6
14 Device routines Declaring routines called from device and from host, local device variables, accessing kernel variables from host, cudaMemcopyToSymbol/FromSymbol

12
Thurs Nov 8

12

Thread synchronization

Ways to achieve thread synchronization, __syncThreads(), CPU synchronization, cudaThreadSynchronize(), __threadfence().

12
Thurs Nov 8

Sorting Algorithms Sorting continued. Rank sort, counting sort, radix sort

13
Tues Nov 13
40
Graph Algorithms
Prim's Algorithm for Minimum Spanning Tree, Dijkstra's Algorithm for Single-Source Shortest Path, Dijkstra's and Floyd's Algorithms for All-Pairs Shortest Path

13
Thurs Nov 15

Class test

14
Tues Nov 20
23 Sieve of Eratosthenes
Sieve of Eratosthenes Algorithm for computing prime numbers

15
Tues Nov 27

Class teaching evaluation

15
Thurs Nov 29

Presentations by UNCW graduate students

16
Tues Dec 4, 2012

Last class. Review/discussion

**Lecture slides**
Wk	Date, 2012	No of slides	Slides	Topics
1	Thurs Aug. 23	25	Outline	Course outline, prerequisites, course text, course contents, instructor details.
1	Thurs Aug. 23	23	Assignment Preliminaries	Assignment preliminaries, access to servers, Moodle, student accounts. TA details and responsibilities.
2	Tues Aug. 28	13	Parallel Comp. Demand	Demand for computational speed, grand challenge problems
2	Tues Aug. 28	16	Parallel Comp. Potential	Potential for speed-up using multiple process(or)s, speed-up factor, max speed up, Amdahl's law, Gustafson's law.
2	Tues Aug 28	20	Parallel Computers	Types of parallel computers, shared memory systems, multicore, programming shared memory, distributed memory platform, networked computers cluster computing, programming, GPU systems.
2	Thurs Aug 30	31	Pattern Programming-1	Parallel patterns for structured parallel programming, workpool, pipeline, divide and conquer, stencil, all-to-all patterns, advantages of starting with patterns, tools, Seeds pattern programming framework, user interface, programming example (Monte Carlo pi), Seeds documentation.
2	Thurs Aug 30	27	Pattern Programming-2 Matrix add/multiply	Seeds Framework, workpool module methods, bootstrapping class, further details of Seeds workpool for Assignment 1, Monte Carlo pi code, matrix addition workpool code.
2	Thurs Aug 30		Assignment 1	Using the Seeds Pattern Programming Framework: 1 - Workpool
3	Tues Sept 4	52	Lower Level Message-passing Computing - MPI	Basics of message-passing programming, MPI, point-to-point message passing, message tags, MPI communicator, blocking send/recv, command line compiling and executing MPI programs, instrumenting code for execution time, Eclipse IDE Parallel Tools Platform.
3	Thurs Sept 6 and Tues Sept 11	50	More MPI routines MPI Quiz questions	MPI collective routines, general features, broadcast, scatter, gather, reduce, barrier, alltoall broadcast, synchronous message passing, asynchronous (non-blocking) message passing, changing to synchronous message passing.
4	Tues Sept 11		Assignment 2	Compiling and executing MPI programs. Comparison with Seeds
4	Thurs Sept 13	44	Compiler directive approach	Introduction to the Paraguin compiler, affine expressions, partitioning, mapping, parallel region, forall
5	Tues Sept 18 and Thurs Sept 20	22 and 43	Compiler directive approach Examples	Communication, Broadcast, Gather, Data Dependence, More Examples
6	Tues Sept 25		Assignment 3	Using Paraguin to Create MPI Programs - hello world, matrix multiplication, Sobel Edge Detection, and Monte Carlo.
	Reading	9	Divide and conquer pattern	Recursive divide and conquer pattern, example: numerical integration with adaptive quadrature.
6	Tues Sept 25	30	Pipeline pattern	Pipeline pattern, examples unfolding loops, insertion sort, prime numbers, upper triangular linear equations. Seeds pipeline pattern, sorting code.
6	Tues Sept 25	35	Synchronous All-ToAll Pattern Demo	Synchronous All-To-All pattern, example use in gravitational N-body problem, Barnes-Hut algorithm, Seeds CompleteSynchGraph pattern code for N-body problem, module methods and bootstrap class.
6	Tues Sept 25	13	Iterative synchronous All-To-All pattern	Iterative synchronous All-To-All pattern, solving system of linear equations by iteration, Seeds CompleteSynchGraph Pattern, MPI _Allgather() routine, Jacobi iteration, convergence rate.
	Reading	34	Stencil pattern	Stencil pattern, applications, heat distribution problem, Seeds code, cellular automata, game of life, partially synchronous method.

6	Thurs Sept 27	39	Programming with Shared Memory	Programming shared memory systems, processes, threads, issues, interleaved statements, thread safe routines, re-ordering code, compiler/processor optimizations, accessing shared data, critical sections, locks, condition variables, deadlock, semaphores, monitors, Pthreads program example.
6		32	Shared memory performance issues	Shared memory performance issues, specifying parallelism, par, forall constructs, dependency analysis (Bernstein's conditions), critical sections serializing code, data shared in caches, false sharing, sequential consistency, code re-ordering
	Reading	25	Java threads and synchronization	Brief review of Java threads, Thread class, Runnable interface, Java synchronization, Synchronised methods, statements, atomic.
6	Thurs Oct 3	50	Introduction to OpenMP	Introduction to OpenMP, directives/constructs, parallel, shared and local variables, work-sharing, sections, for, loop scheduling, for reduction, single master, critical, barrier, atomic, flush.
7	Oct 8/9			Fall Recess, no classes
7	Thurs Oct 11			Class test
8	Tues Oct 16		Assignment 4	OpenMP and hybrid MPI/OpenMP assignment using command line.
9	Thurs Oct 18	44	Numerical Algorithms Num.Alg. quiz	Parallelizing matrix multiplication, time complexities, block multiplication, recursive algorithm, mesh algorithms, Cannon's algorithm, systolic array, solving system of linear equations, direct and iterative methods, red-black, multigrid.
10	Tues Oct 22	32	Hybrid Programming	Combining MPI and OpenMP to take advantage of clusters that have both distributed-memory and shared-memory. Using the Paraguin compiler to generate a hybrid program. Discussion of whether hybrid is any better than using only MPI or only OpenMP.
10	Thurs Oct 25	40	Sorting Algorithms	Potential speedup of sorting in parallel, compare and exchange, bubble sort, odd-even transposition sort, mergesort, quicksort, odd-even mergesort, bitonic mergesort, shearsort.
10	Thurs Oct 25	9	Data Parallel Pattern	Data parallel pattern, use of forall notation, example, data parallel prefix sum algorithm, matrix multiplication.
11	Tues Oct 30	16 26	Intro to GPUs and CUDA CUDA Prog. Model	CPU-GPU architecture evolution, 1970s to present, dedicated pipelined GPUs, general purpose GPU design, NVIDIA products, Fermi architecture, GPU performance gains, CUDA. CUDA SIMT prog. model, CUDA kernel routines, CPU and GPU memories, basic CUDA program structure, code example adding two vectors, compiling and executing on Linux command line, Windows MS Visual Studio.
11	Thurs Nov 1	25	Multidimensional thread structure	CUDA programming: threads, blocks, grid, multidimensional grid and blocks, compute capabilities, thread addressing, predefined variables, flattening array, 2-D grid and block code: matrix addition/multiplication.
11	Thurs Nov 1/Tues Nov 6	20	Performance measurements	Measuring performance, timing program execution, CUDA “events”, synchronous and asynchronous CUDA routines, max and effective bandwidth, computation measures, FLOPs.
12	Tues Nov 6th		Assignment 5	CUDA assignment using Linux environment to compile and execute simple CUDA programs, make file, vector/matrix addition and static heat distribution problem.
12	Tues Nov 6	14	Device routines	Declaring routines called from device and from host, local device variables, accessing kernel variables from host, cudaMemcopyToSymbol/FromSymbol
12	Thurs Nov 8	12	Thread synchronization	Ways to achieve thread synchronization, __syncThreads(), CPU synchronization, cudaThreadSynchronize(), __threadfence().
12	Thurs Nov 8		Sorting Algorithms	Sorting continued. Rank sort, counting sort, radix sort
13	Tues Nov 13	40	Graph Algorithms	Prim's Algorithm for Minimum Spanning Tree, Dijkstra's Algorithm for Single-Source Shortest Path, Dijkstra's and Floyd's Algorithms for All-Pairs Shortest Path
13	Thurs Nov 15			Class test
14	Tues Nov 20	23	Sieve of Eratosthenes	Sieve of Eratosthenes Algorithm for computing prime numbers
15	Tues Nov 27			Class teaching evaluation
15	Thurs Nov 29			Presentations by UNCW graduate students
16	Tues Dec 4, 2012			Last class. Review/discussion

Top

Reading materials

Patterns

Seeds Framework Home Page

MPI

MPI forum and standard: http://www.mpi-forum.org/
MPICH http://www.mcs.anl.gov/research/projects/mpich2/
Textbook home page: http://www.cs.uncc.edu/par_prog
Formal parameters of MPI routines (ppt): MPI Parameters
Gather/scatter

Paraguin

userman.pdf (will be available later)

CUDA

Top

Assignments

Each assignment is not ready for use until the date set.

Date set	Interim dates	Assignment	Topic	Date due 12 pm (noon)
Thurs Aug 30, 2012	Date to report system/account problems: Tues Sept 4, 2012	Assignment 1 Seeds 2.0 Pi Approx	Using the Seeds Pattern Programming Framework 1 - Workpool Pattern	Monday, Sept 10, 2012
Tues Sept 11	Date to report system/account problems: Tues Sept 18, 2012	Assignment 2	Compiling and running MPI programs Comparison with Seeds	Monday, Sept 24, 2012
Tues Sept 25	Date to report system/account problems: Tues Oct 2, 2012	Assignment 3	Using Paraguin to Create MPI Programs - hello world, matrix multiplication, Sobel Edge Detection, and Monte Carlo.	Wednesday Oct 10, 2012
Tues Oct 16	Date to report system/ compiler problems:Tues Oct 23, 2012	Assignment 4 Generating X11 graphical output Using UNC-C coit-grid cluster for MPI	OpenMP and hybrid OpenMP/MPI assignment	Monday Nov 5, 2012
Tues Nov 6, 2012	Date to report system/ compiler problems: Tues Nov 13, 2012	Assignment 5	CUDA programs, Linux environment to compile and execute simple CUDA programs, make file, vector/matrix addition and static heat distribution problem.	Mon Nov 26, 2012

Top

Tests

Class test 1: Thursday October 11th, in class 11 am - 12:15 pm
Format: 45 question, multiple choice, paper test. Closed book.
Topics: All lecture materials presented in class from beginning of course (week 1) to week 6 inclusive, and materials in Assignment 1 (pattern programming), Assignment 2 (MPI), and Assignment 3 (Paraguin compiler directives). Does not include shared memory programming.

Class test 2: Thursday November 15th in class 11 am - 12:15 pm
Format: Paper test in format of previous posted 2nd tests. Closed book.
Topics: All materials after test 1, week 6 to week 10 inclusive - shared memory programming, numerical algorithms, sorting algorithms, hybrid programming, assignment 4. DOES NOT INCLUDE GPU/CUDA PROGRAMMING

Final Exam:
UNC-C students: 11 am - 1:30 pm, Tuesday December 11th 2012
UNC-W students: 11:30 am - 2:00 pm, Tuesday December 11th 2012
Topics: Comprehensive.

Previous tests (UNC-C courses that did not do patterns)

Top