Frequently Asked Questions and Error Messages

Last updated April 3, 2014.


Installing software on your own computer

To be added. Please contribute.

UNC-Charlotte cci-grid0x cluster (cci-gridgw.uncc.edu) issues

1. I get the message "mpiexec_cci-gridgw.uncc.edu: cannot connect to local mpd ... "

You are using the wrong command to execute the MPI program. On the UNCC cluster the command is mpiexec.hydra; see "Using UNCC Parallel Programming Cluster".

2. (X11 libraries) I get the error message "Cannot connect to X server localhost:10.0"

Check that your local client is running an X server.
Check that you are forwarding X11 from the server, including through any intermediate servers (e.g. ssh -X ...)
Check whether xclock displays.
This is not a server issue.



UNC-Wilmington Cluster

1. "My matrix multiplication program works fine for 4, 8, or 16 processors, but when I try to run it for 12, I get the error:

[compute-0-1.local][[6640,1],0][btl_tcp_frag.c:118:mca_btl_tcp_frag_send]
mca_btl_tcp_frag_send: writev error (0x7fffb96329d0, 8272)
Bad address(1)
"

I investigated the problem some more. It turns out that using a variable number of rows for the scatter does not fix the problem entirely. The best solution seems to be to pad the matrices with extra rows, so that there is room for the leftover rows that result when NP does not divide N evenly. Adding 20 extra rows to the matrices seems to be enough to fix the problem. CF

MPI Assignment

1. "When I run the matrix multiplication program, the values in the output differ from the values in the file output512x512mult by small quantities, such as:

C =
1329796.50 1266513.38 1330501.88 1262294.12 1428972.12 1292474.62 1365209.62 ...

versus:

C =
1329796.52 1266512.52 1330502.06 1262293.92 1428972.18 1292474.58 1365209.85 ...
"


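Change the data type for the matrices from float to double. A float holds only about 7 significant decimal digits, which is not enough for values of this size with two decimal places.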




Paraguin Compiler

1. "I can't get the matrix multiplication program to produce the correct results, even though I am transposing the B matrix and scattering it."

Transposing the B matrix and scattering it does not work for the matrix multiplication. Each processor needs the entire B matrix in order to compute the results for any single row. So disregard the comment about transposing the matrix and scattering it. You will simply have to broadcast the B matrix. The A matrix can still be scattered.

2. "When I try to compile my program, I'm getting the following output:
mpirun was unable to launch the specified application as it could not access
or execute an executable:

Executable: ./hello.out
Node: compute-0-1.local "

Check that your comments in the job submission file are all on one line. For example, the line
"#$ -pe orte 12         # Specify how many processors we want"
should all be on one line. If the word "want" ends up on a separate line by itself, it is a syntax error. Either put the entire comment on one line or put a "#" in front of the part that wraps to the next line.

If that does not fix it, the cause may be that you created the file on your PC and uploaded it as a DOS file. DOS files end each line with a carriage return followed by a linefeed, whereas Linux expects a linefeed only. You may need to edit the file on babbage, delete the last line, and retype it.

3. "When I try to compile my program, I get the following error:
In file included from /usr/include/features.h:385,
                 from /usr/include/stdio.h:28,
                 from hello.c:7:
/usr/include/gnu/stubs.h:7:27: error: gnu/stubs-32.h: No such file or directory
/usr/bin/cpp -D__SCC__  -DPARAGUIN -D_x86_64_   -I/share/apps/suifhome/x86_64-redhat-linux/include -undef -U__GNUC__ -U__GNUC_MINOR__ hello.c /tmp/scc29468_0.i
FAILED (exit status 0x1) "

One possible cause is a typo in the command you are entering to compile. For example, you will get this error if you do not have the proper number of underscores ('_') in "-D__x86_64__". There should be TWO underscores before "x86" and TWO after "64".

4. "When I compile my program with the Paraguin compiler, I'm getting the following error message:
/usr/lib/gcc/x86_64-redhat-linux/4.4.6/include/stdarg.h:40: syntax error; found `__gnuc_va_list' expecting `;'
"

The lines in the program that declare the "builtin_va_list" need to come BEFORE all the include files. The reason is that va_list is used in the stdarg.h file.

5. "I don't know where I am supposed to put the input file for the matrix multiplication problem."
or
"I get the error:
Usage: ./matrix file "

The matrix multiplication skeleton program takes its command-line argument as the name of the input file. So the name of the input file you copied into your directory should be given immediately after the name of your program on the command line.



OpenMP

1. My program is executing much slower when I parallelize it.

First check that the parallel code produces the same results as the corresponding sequential program.

Make sure all variables that need to be private are declared as such.

Make sure you have not serialized the code excessively with critical sections or contention on shared variables.

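Note that, in general, the maximum possible speedup is bounded by the number of cores on the machine. You may not see any speedup for small problem sizes, because the overhead of creating and synchronizing threads outweighs the work being shared. Increase the size of the data to see an effect.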


CUDA

1. The make file will not re-compile my program. (It compiled the first time.)

Make only re-compiles source programs that have been modified since the executable was last built. This mechanism relies on the time stamps of the files, so if a time stamp is incorrect, you may find the file will not recompile. In that case, delete the executable (not the source file!) and run make again.




Suzaku

1. I get a string of error messages: "undefined reference to ompi...." on my own computer

This may be due to different versions of MPI and/or different implementations of MPI (such as OpenMPI or MPICH). Try running on the cluster.

2. I get a string of error messages: "In function '_start': ... multiple definition of '_start' ... "

You left out -c when compiling the source file to an object file. The command should be: mpicc -c helloworld.c -o helloworld.o
