Frequently Asked Questions and Error Messages

Last updated Nov 7, 2013


Seeds Framework

See Seeds FAQs


MPI Assignment

1. "When I run the matrix multiplication program, the values in the output differ from the values in the file output512x512mult by small quantities, such as:

C =
1329796.50 1266513.38 1330501.88 1262294.12 1428972.12 1292474.62 1365209.62 ...

versus:

C =
1329796.52 1266512.52 1330502.06 1262293.92 1428972.18 1292474.58 1365209.85 ...
"

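Change the data type of the matrices from float to double. A float holds only about 7 significant decimal digits, so values around 1.3 million cannot keep two accurate decimal places. The instructions for Assignment 3 showed the matrix multiplication program with float matrices; the type was correct in Assignment 2, but not in Assignment 3.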
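
The effect can be reproduced in isolation. The sketch below is not the assignment's code; dot_float, dot_double, and show_rounding are hypothetical helper names used only for illustration. Near 1.3 million, adjacent float values are 0.125 apart, so a value such as 1329796.52 is stored as 1329796.5 in a float, while a double keeps it to two decimals.

```c
#include <stdio.h>

/* Dot product accumulated in single precision. */
float dot_float(const float *a, const float *b, int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* The same dot product accumulated in double precision. */
double dot_double(const double *a, const double *b, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Store one value from the expected output in each type and print it. */
void show_rounding(void)
{
    float  f = 1329796.52f;  /* nearest float is 1329796.5 */
    double d = 1329796.52;
    printf("float:  %.2f\n", f);
    printf("double: %.2f\n", d);
}
```

Calling show_rounding() prints 1329796.50 for the float and 1329796.52 for the double, which is the same kind of difference seen between the two outputs above.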

2. "My matrix multiplication program works fine for 4, 8, or 16 processors, but when I try to run it for 12, I get the error:

[compute-0-1.local][[6640,1],0][btl_tcp_frag.c:118:mca_btl_tcp_frag_send]
mca_btl_tcp_frag_send: writev error (0x7fffb96329d0, 8272)
Bad address(1)
"

Update: I investigated the problem some more. It turns out that scattering a variable number of rows doesn't fix the problem entirely. The best solution seems to be to add extra rows to the matrices, leaving room for the extra rows that result when NP does not divide N evenly. Adding 20 extra rows to the matrices seems to be enough to fix the problem.

The problem is that 12 does not divide evenly into 512 (the number of rows in the input). This means that blksz * NP is more than 512. That would not be a problem in itself, but when you ask to scatter blksz rows to each processor, not every processor can actually get that many rows. So you need to compute the number of rows each processor should actually receive and use that as a parameter in the Scatter. Most of the processors will receive blksz rows, but not all of them.
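
The arithmetic behind both fixes can be sketched as follows. This assumes blksz is N / NP rounded up, which is what "blksz * NP is more than 512" implies; the function names are hypothetical and the code involves no MPI calls, it only computes the sizes you would pass to the scatter or use to pad the matrices.

```c
#include <stdio.h>

/* Rows per block when n rows are split over np processes, rounded up.
   Assumption: this matches how blksz is computed in the assignment. */
int block_size(int n, int np)
{
    return (n + np - 1) / np;
}

/* Rows of real data that process `rank` actually holds. */
int rows_for_rank(int n, int np, int rank)
{
    int blksz = block_size(n, np);
    int start = rank * blksz;
    if (start >= n)
        return 0;
    int remaining = n - start;
    return remaining < blksz ? remaining : blksz;
}

/* Row count after padding so that every process can be sent a full block. */
int padded_rows(int n, int np)
{
    return block_size(n, np) * np;
}

/* Print the layout for a given matrix size and process count. */
void print_layout(int n, int np)
{
    printf("blksz = %d, padded rows = %d\n",
           block_size(n, np), padded_rows(n, np));
    for (int rank = 0; rank < np; rank++)
        printf("rank %2d gets %d rows\n", rank, rows_for_rank(n, np, rank));
}
```

For N = 512 and NP = 12 this gives blksz = 43 and 516 padded rows. Ranks 0 through 10 hold 43 rows each and rank 11 holds the remaining 39, so 4 rows of padding are needed, well within the 20 extra rows suggested above.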


Paraguin Compiler

1. "I can't get the matrix multiplication program to produce the correct results, even though I am transposing the B matrix and scattering it."

Transposing the B matrix and scattering it does not work for matrix multiplication. Each processor needs the entire B matrix in order to compute the partial results for any single row. So disregard the comment about transposing the matrix and scattering it. You will simply have to broadcast the B matrix. The A matrix can still be scattered.
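
To see why, here is a minimal sketch of the work one processor does on its share of the rows. It is plain C rather than Paraguin or MPI code; multiply_block is a hypothetical name, and storing the matrices as flat row-major arrays is an assumption made to keep the example short. The inner loop reads B[k][j] for every k, so each row of C depends on all of B.

```c
/*
 * Compute `rows` rows of C = A * B for n x n matrices stored row-major.
 * a_block holds only this process's rows of A (what a scatter delivers);
 * b must be the whole B matrix (what a broadcast delivers).
 */
void multiply_block(const double *a_block, const double *b,
                    double *c_block, int rows, int n)
{
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a_block[i * n + k] * b[k * n + j];  /* visits every row of B */
            c_block[i * n + j] = sum;
        }
    }
}
```

A processor that was handed only a slice of B would be missing some of the b[k * n + j] terms for every element it tries to compute.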

2. "When I try to compile my program, I'm getting the following output:
mpirun was unable to launch the specified application as it could not access
or execute an executable:

Executable: ./hello.out
Node: compute-0-1.local "

Check that your comments in the job submission file are all on one line. For example, the line
"#$ -pe orte 12         # Specify how many processors we want"
should all be on one line. If the word "want" wraps onto a separate line by itself, it is a syntax error. Either put the entire comment on one line or put a "#" in front of the part that wraps to the next line.

If that doesn't seem to fix it, it might be because you created the file on your PC and uploaded it as a DOS file. DOS files end each line with a carriage return/linefeed pair, whereas Unix-style files use a linefeed only. You may need to edit the file on babbage, delete the last line, and retype it.

3. "When I try to compile my program, I get the following error:
In file included from /usr/include/features.h:385,
                 from /usr/include/stdio.h:28,
                 from hello.c:7:
/usr/include/gnu/stubs.h:7:27: error: gnu/stubs-32.h: No such file or directory
/usr/bin/cpp -D__SCC__  -DPARAGUIN -D_x86_64_   -I/share/apps/suifhome/x86_64-redhat-linux/include -undef -U__GNUC__ -U__GNUC_MINOR__ hello.c /tmp/scc29468_0.i
FAILED (exit status 0x1) "

One possible cause is a typo in the command you are entering to compile. For example, you will get this error if you don't have the proper number of underscores ('_') in "-D__x86_64__". There should be TWO underscores before the "x86" and TWO after the "64". Note that the command echoed in the output above shows "-D_x86_64_", with only one underscore on each side.

4. "When I compile my program with the Paraguin compiler, I'm getting the following error message:
/usr/lib/gcc/x86_64-redhat-linux/4.4.6/include/stdarg.h:40: syntax error; found `__gnuc_va_list' expecting `;'
"

The lines in the program that declare the "builtin_va_list" need to come BEFORE all the include files. The reason is that va_list is used in the stdarg.h file, so the declaration must already be in place when that file is included.

5. "I don't know where I am supposed to put the input file for the matrix multiplication problem."
or
"I get the error:
Usage: ./matrix file "

The matrix multiplication skeleton program takes the name of the input file as its command-line argument. So the name of the input file you copied into your directory should be given immediately after the name of your program.
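
A check of this kind is what typically produces the "Usage" message. The sketch below is an assumption about how such a check might look, not the skeleton's actual code; input_file_name is a hypothetical helper.

```c
#include <stdio.h>

/*
 * Return the input file name given on the command line, or NULL
 * (after printing a usage message) if no file name was given.
 */
const char *input_file_name(int argc, char *argv[])
{
    if (argc < 2) {
        /* argv[0] is the program name, e.g. ./matrix */
        fprintf(stderr, "Usage: %s file\n", argv[0]);
        return NULL;
    }
    return argv[1];  /* first argument after the program name */
}
```

With a check like this, running ./matrix with nothing after it prints "Usage: ./matrix file", while running ./matrix followed by the name of your input file passes that name through to the code that opens the file.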


Assignment 4 (UNC-C cluster issues)

1. I get the error message "Cannot connect to X server localhost:10:0"

Check that your local client is running an X server.
Check that you are forwarding X11 from the server, including through any intermediate servers (e.g. ssh -X ...)
Check if xclock displays.
This is not a server issue.