UNIVERSITY OF NORTH CAROLINA AT CHARLOTTE
Department of Computer Science
ITCS 4145/5145 Parallel Computing

Spring 2014

Tuesday/Thursday 5:00 pm - 6:15 pm, Woodward 130

Dr. Barry Wilkinson

This page is continually updated as the course proceeds. Watch for announcements. Modification date: June 5, 2014. Always make sure you have the most recent copy of this page (not cached, re-load page).


ANNOUNCEMENTS


Assignment Frequently Asked Questions
Academic calendar
Lecture Materials
Reading materials
Assignments
Tests
UNC-C Moodle 2

Lecture Materials

The following slides are provided as PowerPoint files. You may wish to print them as 1 x 2 or 2 x 3 thumbnails. The slides are not ready for use until the date of the class and are likely to be revised just before it.
Lecture slides

Wk

Date, 2014
No of slides
Slides
Review/Quiz questions
Topics
1
Thurs Jan 9
26 Outline
  Course outline, prerequisites, course text,  course contents, instructor details.
1
Thurs Jan 9
23 Assignment Preliminaries
  Assignment preliminaries, access to servers, Moodle, student accounts. TA details and responsibilities.
1/2
Thurs Jan 9
13 Parallel Comp. Demand
  Demand for computational speed, grand challenge problems
2
Tues Jan 14
16 Parallel Comp. Potential
Quiz questions Potential for speed-up using multiple process(or)s, speed-up factor, max speed up, Amdahl's law, Gustafson's law.
2
Tues Jan 14
19 Parallel Computers   Types of parallel computers, shared memory systems, multicore, programming shared memory, distributed memory platform, networked computers, cluster computing, programming, GPU systems.
2
Tues Jan 14/Thur Jan 16
33 Pattern Programming-1
 Quiz questions Parallel patterns for structured parallel programming, workpool, pipeline, divide and conquer, stencil, all-to-all patterns, advantages of starting with patterns, tools, Seeds pattern programming framework, user interface, programming example (Monte Carlo pi).
2
Thurs Jan 16
41 Pattern Programming-2
  Seeds Framework, workpool module methods, bootstrapping class, further details of Seeds workpool for Assignment 1, Monte Carlo pi code, matrix addition and multiplication workpool code.
2 Thurs Jan 16
  Assignment 1   Using the Seeds Pattern Programming Framework: Workpool
3 Tues Jan 21   Matrix addition and multiplication  

Matrix addition and multiplication, partitioning, block matrix multiplication, computation/communication ratio, workpool pattern

Review, quiz questions etc.

3
Thurs Jan 23
54 Lower Level Message-passing Computing - MPI
  Basics of message-passing programming, MPI, point-to-point message passing, message tags, MPI communicator, blocking send/recv, command line compiling and executing MPI programs, instrumenting code for execution time, Eclipse IDE Parallel Tools Platform.
4/5
Thurs Jan 30/Tues Feb 4
57 More MPI routines
Quiz questions Collective data transfer patterns and MPI collective routines, broadcast, scatter, gather, reduce, barrier, alltoall broadcast. Combined patterns, broadcast-gather, MPI routines, all gather, alltoall, general features.
4 Thurs Jan 30   Assignment 2   Compiling and executing MPI programs. Comparison with Seeds
5 Tues Feb 4 33 Synchronization

Quiz questions

Synchronous message passing, asynchronous (non-blocking) message passing, changing to synchronous message passing. MPI_Bsend(), MPI_Isend()/MPI_Irecv(), barrier synchronization pattern, MPI Barrier, implementations, counter, reentrant code, tree, butterfly, local synchronization, safety and deadlock, safe MPI routines, MPI_Sendrecv().

5
Thurs Feb 6
39 Programming with Shared Memory
  Programming shared memory systems, processes, threads, issues, interleaved statements, thread safe routines, re-ordering code, compiler/processor optimizations, accessing shared data, critical sections, locks, condition variables, deadlock, semaphores, monitors, Pthreads program example.
6
Tues Feb 11
45 Introduction to OpenMP
  Introduction to OpenMP, directives/constructs, parallel, shared and local variables, work-sharing, sections, for, loop scheduling, for reduction, single, master, critical, barrier, atomic, flush.
6 Thurs Feb 13   University closed because of snow. Class canceled.

7 Tuesday Feb 18 2014   Assignment 3   OpenMP assignment.
7 Tuesday Feb 18 2014       Review for test
7 Thur Feb 20   Class Test   Paper test in classroom
8 Tues Feb 25, 2014      

Return and go over test 1

8 Tues Feb 25 16 Assignment 3 Notes  

Stencil pattern intro, heat distribution, and Assignment 3

Generating X11 graphics

8 Thurs Feb 27 29 Shared memory performance issues Quiz questions Shared memory performance issues, specifying parallelism, par, forall constructs, dependency analysis (Bernstein's conditions), data shared in caches, false sharing, sequential consistency, code re-ordering.


March 3 - 8, 2014


  Spring Break, no classes
9
Tues March 11
15 Paraguin Introduction
  Using compiler-directed approach to create MPI code automatically, intro to Paraguin compiler, compiling.
9
Tues March 11
39 Compiler directive approach
  Paraguin, parallel regions, barrier, forall, broadcast, scatter, gather, and reduction.
9 Thurs March 13 28 Compiler directive approach Quiz questions Patterns, Scatter/Gather, Stencil
9 Thurs March 13 26 Hybrid Programming  

Combining MPI and OpenMP to take advantage of clusters that have both distributed-memory and shared-memory. Discussion of whether hybrid is any better than using only MPI or only OpenMP.

Using the Paraguin compiler to generate a hybrid program

10 Tues March 18   Assignment 4   Suzaku assignment (new for Spring 2014) - pattern programming using macros/routines.

10 Tues March 18 32 Synchronous All-To-All Patterns Demo Synchronous All-To-All pattern, example use in gravitational N-body problem, Barnes-Hut algorithm, Seeds CompleteSynchGraph pattern code for N-body problem, iterative synchronous All-To-All pattern, solving system of linear equations by iteration, Jacobi iteration, convergence rate. Seeds CompleteSynchGraph pattern, MPI_Allgather() routine.

10 Thurs March 20, 2014 37 Stencil pattern Quiz questions Stencil pattern, applications, solving Laplace's eq., heat distribution problem, Seeds stencil pattern, cellular automata, game of life, ways to improve performance, partially synchronous method, red-black, multigrid.
10 Thurs March 20, 2014 32 Pipeline pattern Quiz questions Pipeline pattern, space time diagram, speed up factor, matrix-vector multiplication, matrix multiplication, adding rows of an array, unfolding loops, frequency filter, insertion sort, prime numbers, upper triangular linear equations.  Seeds pipeline pattern.
  Reading 23 Sieve of Eratosthenes Quiz questions Sieve of Eratosthenes Algorithm for computing prime numbers
  Reading 42 Graph Algorithms   Prim's Algorithm for Minimum Spanning Tree, Dijkstra's Algorithm for Single-Source Shortest Path, Dijkstra's and Floyd's Algorithms for All-Pairs Shortest Path
11
Tues March 25, 2014
40 Sorting Algorithms Quiz questions Potential speedup of sorting in parallel, compare and exchange, bubble sort, odd-even transposition sort, mergesort, quicksort, odd-even mergesort, bitonic mergesort, shearsort, rank sort, counting sort, radix sort
11
Thurs March 27
14 Data Parallel Pattern
  Data parallel pattern, use of forall notation, example, data parallel prefix sum algorithm, matrix multiplication.

11
Thurs March 27
21 Intro to GPUs and CUDA
  CPU-GPU architecture evolution, 1970s to present, dedicated pipelined GPUs, general purpose GPU design, NVIDIA products, Fermi architecture, GPU performance gains, CUDA.
12
Tuesday April 1
21 CUDA Prog. Model
  CUDA SIMT prog. model, CUDA kernel routines, CPU and GPU memories, basic CUDA program structure, code example adding two vectors, compiling and executing on Linux command line, Windows MS Visual Studio.

12 Tuesday April 1 38 Multidimensional thread structure   CUDA programming: threads, blocks, grid, multidimensional grid and blocks, compute capabilities, thread addressing, predefined variables, flattening array, 2-D grid and block code: matrix addition/multiplication.

12 Thurs April 3   Assignment 5   CUDA assignment using Linux environment to compile and execute simple CUDA programs, make file, vector/matrix addition/multiplication, and sorting.

12 Thurs April 3 21 Performance measurements   Measuring performance, timing program execution, CUDA “events”, synchronous and asynchronous CUDA routines, max and effective bandwidth, computation measures, FLOPs.
12 Thurs April 3 14 Device routines   Declaring routines called from device and from host, local device variables, accessing kernel variables from host, cudaMemcpyToSymbol/FromSymbol.
12 Thurs April 3 12 Thread synchronization   Ways to achieve thread synchronization, __syncthreads(), CPU synchronization, cudaThreadSynchronize(), __threadfence().

13 Tues April 8       Review for test
13
Thurs April 10

Class Test
  Paper test in classroom
14 Tues April 15 41 GPU memory structures   Memory structures and bandwidth optimization, memory coalescing

14 Thurs April 17       Return test
15 Tues April 22        
15 Thurs April 24       Review/discussion/sample finals
16
Tues April 29, 2014
 
 

Last class. Review/discussions/sample finals

Teaching evaluations done on-line

Reading materials

Seeds Pattern Programming Framework
MPI

Notes on installing MPI on your own computer

Let me know if you have anything to post on MPI installations. Thanks. BW

Paraguin

OpenMP

CUDA

UNCC parallel programming cci-grid0x cluster



Assignments

An assignment is not ready for use until its set date.

Date set
Date to report system/account problems
Assignment Topic Date due
12 pm (noon)
Thursday Jan 16, 2014 Tuesday Jan 21, 2014

Assignment 1

Test files

Seeds Software

Using the Seeds Pattern Programming Framework 1 - Workpool Pattern
Wednesday Jan 29, 2014
Thursday Jan 30, 2014
Thursday Feb 6th, 2014 for any issues installing MPI on your own computer.

Assignment 2

Test files

cci-grid0x cluster

Writing and executing MPI programs on a local computer and on the cluster. MPI workpool program - Compare with Seeds Assignment 1
Wednesday Feb 19 2014
Thurs Feb 20, 2014
Thursday Feb 27th, 2014 for any issues installing OpenMP compiler on your own computer.

Assignment 3

Generating X11 graphical output

OpenMP assignment

Wednesday March 19, 2014
Tues March 18, 2014

Tuesday March 25 2014 for any issues that are preventing you from starting

Assignment 4

suzaku.h (all OS's)

suzaku.o (zipped):
Mac (Clang)
64-bit Ubuntu
32-bit Ubuntu
UNCC Cluster


Test files

Suzaku assignment (new for Spring 2014) - pattern programming using macros/routines.

Friday April 4, 2014

April 3, 2014


Assignment 5
CUDA programs, Linux environment to compile and execute simple CUDA programs, make file, vector/matrix addition, prefix sum, and sorting (extra credit).
April 22, 2014


Tests

Class test 1 date: Thur Feb 20, 2014

Format: Paper test, same format as posted tests. MPI and OpenMP summaries provided, see Previous Tests. Otherwise closed book.

Topics: All lecture materials presented in class from beginning of course (week 1) to week 6 inclusive (parallel computers, pattern programming and Seeds, message passing programming and MPI, shared memory programming and OpenMP), and materials in Assignment 1 (Seeds pattern programming) and Assignment 2 (MPI).  Does not include Assignment 3.


Class test 2 date: Thurs April 10, 2014

Format: Paper test, same format as posted tests.
Topics: All lecture materials after test 1, week 8 to week 12 inclusive: shared memory performance issues, Paraguin, hybrid programming, synchronous all-to-all pattern, stencil pattern, pipeline pattern, sorting algorithms, CUDA, Assignment 4. Does not include Assignment 5 (although it includes the CUDA lectures).

Final exam (2 1/2 hour exam) date: 5:00 to 7:30 pm, Tuesday May 6th, 2014, in Woodward 130.

Topics: Comprehensive
Format: Paper test in format of previous posted final tests. Closed book.


Previous tests
