Frequently Asked Questions and Error Messages
Last updated Nov 15, 2014
Note: Google the error message for more information and help.
SSH | VirtualBox and Course Virtual Machine | UNC-C cluster issues | C programming | OpenMP | MPI | Paraguin Compiler | Seeds FAQs | CUDA |
1. When I try to connect with ssh -1 username cci-gridgw.uncw.edu, it returns with nothing happening.
Check that the option is -l (lowercase "el") and not -1 ("one").
2. When I try to ssh from my laptop through the UNCC wireless network, it hangs.
The UNCC wireless network blocks port 22 used by ssh so you cannot use this wireless network. Use your laptop at home (wireless or wired) or through a wired network.
1. I get the message "VT-x is not available. (VERR_VMX_NO_VMX) ..." when I try to import a 64-bit VM (.ova file).
The system must have VT-x (Intel CPU) or AMD-V (AMD CPU) enabled to run a 64-bit OS in VirtualBox. The easiest solution is to use the supplied 32-bit VM instead.
2. When I connect to cci-gridgw.uncc.edu from the VM, I get an error "... REMOTE HOST IDENTIFICATION HAS CHANGED! ...".
This was caused by changes on the server since the VM was created. To fix, enter the command:
ssh-keygen -f "/home/abw/.ssh/known_hosts" -R cci-gridgw.uncc.edu
as given in the error message, and reconnect to the server with ssh.
C programming issues
1. I get a compiler "warning" message ...
Generally you can ignore warning messages if the code compiled. (However, you might wish to check what it says about your code.)
2. My C program generates a segmentation fault when I increase the size of declared arrays.
Several possibilities:
(a) You chose a very large size and there is not enough memory for the arrays. (Too large for your system.)
(b) You have tried to use a variable as the size when declaring a static array. In the original C language, it was incorrect to use code such as:
int N = 1000;
double A[2][N][N];
Static declaration requires the size to be known at compile time, i.e., N needs to be declared as a defined constant:
#define N 1000
before main. If you require N to be a variable, you will need to allocate the memory dynamically using malloc. In some cases (especially with three-dimensional arrays) it might be easier to declare the array statically instead and define N to be the largest value that can occur, although that will occupy memory space even when a smaller size is used.
The above restriction applies only to pre-C99 versions of C (original ANSI C). C99 relaxed this restriction. The C compiler may automatically compile to C99. To specify C99 explicitly, use the -std=c99 option in the compile command, e.g. cc -std=c99 -o prog prog.c. However, an array declared this way inside a function is placed on the stack, which is limited in size, so N = 1000 may not be possible.
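The malloc route mentioned above can be sketched as follows. This is only a minimal illustration, not code from the assignments: the helper names grid_alloc and grid_index are made up here, and the 2 x N x N array is stored as one flat block on the heap and indexed by hand.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate a 2 x n x n grid of doubles on the heap,
   zero-initialized. Returns NULL if the allocation fails. */
double *grid_alloc(size_t n)
{
    return calloc(2 * n * n, sizeof(double));
}

/* Hypothetical helper: flat index of element [p][i][j] in row-major order. */
size_t grid_index(size_t n, size_t p, size_t i, size_t j)
{
    return (p * n + i) * n + j;
}
```

Where the static version would write A[p][i][j], this version writes A[grid_index(N, p, i, j)]. Because the memory comes from the heap rather than the stack, N can be a run-time variable and N = 1000 is not a problem. Remember to check the returned pointer for NULL and to call free when done.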
Making variables private that should not be private or do not need to be (for example "current" and "next" in the heat distribution code).
1. "When I run the matrix multiplication program, the values in the output differ from the values in the file output512x512mult by small quantities, such as:
C =
1329796.50 1266513.38 1330501.88 1262294.12 1428972.12 1292474.62 1365209.62 ...
versus:
C =
1329796.52 1266512.52 1330502.06 1262293.92 1428972.18 1292474.58 1365209.85 ...
"
Change the data type for the matrices from float to double. The instructions for Assignment 3 had the matrix multiplication program with floats as the type of the matrices. It was correct in Assignment 2, but not Assignment 3.
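To see why the type matters, here is a small sketch (not taken from the assignment code; the function names are made up) that accumulates the same dot product in float and in double. A float carries roughly 7 significant decimal digits and a double roughly 15 to 16, so sums in the millions lose their fractional part in float.

```c
/* Dot product accumulated in single precision. */
float dot_float(const float *a, const float *b, int n)
{
    float sum = 0.0f;
    int i;
    for (i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* The same dot product accumulated in double precision. */
double dot_double(const double *a, const double *b, int n)
{
    double sum = 0.0;
    int i;
    for (i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

On a typical IEEE 754 system, calling these with a = {16777216, 1} and b = {1, 1} gives 16777217 from the double version but 16777216 from the float version, because 16777217 cannot be represented in single precision. The same kind of rounding is what produces the small differences in the matrix output.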
2. "My matrix multiplication program works fine for 4, 8, or 16 processors, but when I try to run it for 12, I get the error:
[compute-0-1.local][[6640,1],0][btl_tcp_frag.c:118:mca_btl_tcp_frag_send]
mca_btl_tcp_frag_send: writev error (0x7fffb96329d0, 8272)
Bad address(1)
"
1. "I can't get the matrix multiplication program to produce the correct results, even though I am transposing the B matrix and scattering it."
The transposing of the B matrix and scattering it does not work in the matrix multiplication. Each processor needs the entire B matrix in order to compute the partial results for any single row. So disregard the comment about transposing the matrix and scattering it. You will simply have to broadcast the B matrix. The A matrix can still be scattered.
2. "When I try to compile my program, I'm getting the following output:
mpirun was unable to launch the specified application as it could not access
or execute an executable:
Executable: ./hello.out
Node: compute-0-1.local "
Check that your comments in the job submission file are all on one line. For example, the line
"#$ -pe orte 12 # Specify how many processors we want"
should all be on one line. If the word "want" is on a separate line by itself, it would be a syntax error. Either put the entire comment on one line or put a "#" in front of the part that wraps to the next line.
If that doesn't seem to fix it, it might be because you created the file on your PC and uploaded it as a DOS file. DOS files end each line with a carriage return/linefeed pair instead of the single linefeed used on Linux. You may need to edit the file on babbage, delete the last line, and retype it.
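To make the DOS line-ending difference concrete, here is a small sketch (the function name strip_cr is made up for illustration) that removes the carriage return from every CR/LF pair in a text buffer and reports how many it found. A nonzero count means the text had DOS line endings.

```c
#include <stddef.h>

/* Remove every '\r' that is immediately followed by '\n', in place.
   Returns the number of carriage returns removed. */
size_t strip_cr(char *s)
{
    size_t removed = 0;
    char *dst = s;

    for (; *s != '\0'; s++) {
        if (s[0] == '\r' && s[1] == '\n') {
            removed++;          /* skip the CR, keep the LF */
            continue;
        }
        *dst++ = *s;
    }
    *dst = '\0';
    return removed;
}
```

This only illustrates what differs between the two file formats; retyping the affected line on the cluster, as described above, has the same effect for a single line.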
3. "When I try to compile my program, I get the following error:
In file included from /usr/include/features.h:385,
from /usr/include/stdio.h:28,
from hello.c:7:
/usr/include/gnu/stubs.h:7:27: error: gnu/stubs-32.h: No such file or directory
/usr/bin/cpp -D__SCC__ -DPARAGUIN -D_x86_64_ -I/share/apps/suifhome/x86_64-redhat-linux/include -undef -U__GNUC__ -U__GNUC_MINOR__ hello.c /tmp/scc29468_0.i
FAILED (exit status 0x1) "
One reason is a typo in the command you are entering to compile. For example, you will get this error if you don't have the proper number of underscores ('_') in the option "-D__x86_64__". There should be TWO underscores before the "x86" and after the "64".
4. "When I compile my program with the Paraguin compiler, I'm getting the following error message:
/usr/lib/gcc/x86_64-redhat-linux/4.4.6/include/stdarg.h:40: syntax error; found `__gnuc_va_list' expecting `;'
"
The #include line for the paraguin.h file needs to come BEFORE any other includes.
5. "I don't know where I am supposed to put the input file for the matrix multiplication problem."
or
"I get the error:
Usage: ./matrix file
"
The matrix multiplication skeleton program is written to take the command-line argument given to the program as the name of the input file. So the input file you copied into your directory should be given immediately after the name of your program.
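The idea can be sketched like this. It is not the actual skeleton code, just an assumed illustration of how a program picks up its input file name from the command line; the helper name input_file_name is made up.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: return the input file name given on the command
   line, or print a usage message and return NULL if it is missing. */
const char *input_file_name(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "Usage: %s file\n", argv[0]);
        return NULL;
    }
    return argv[1];   /* first argument after the program name */
}
```

A main function would call input_file_name(argc, argv) and pass the result to fopen. Running the program with no argument is what produces the "Usage" message.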
6. "When I compile my program, I'm getting the error:
matrixmult.out.c:85: warning: conflicting types for built-in function ‘malloc’
"
This is a warning message and not an error. The gcc compiler is complaining about malloc having conflicting declarations. It can be ignored.
7. "When I compile my program, I'm getting the error:
matrixmult.out.c:113: error: storage size of ‘__guin_status’ isn’t known
"
This is an issue related to the version of MPI used on the UNCC cluster. You can fix it by editing your .out.c file and replacing the line:
struct ompi_status_public_t __guin_status;
with:
MPI_Status __guin_status;
This shouldn't be an issue on the UNCW cluster.
8. "When I compile my program, I'm getting the error:
matrixmult.out.c:(.text+0x232): undefined reference to `__isoc99_fscanf'
"
This is also an issue related to the version of MPI used on the UNCC cluster. You can fix it by editing your .out.c file and replacing any occurrence of the function "__isoc99_fscanf" with "fscanf". This shouldn't be an issue on the UNCW cluster.
9. "I am unable to get my program to use the X11 library on babbage. Also, I get a 'command not found' error when I try to run 'xclock &'."
You will not be able to make use of the X11 library on the UNCW cluster. The reason is that when you use the scheduler to run your program, it runs in batch mode. The output will have to go to a file and not to a window on your local system.
In order to create a graphical representation of the output without using X11, you can open the output file in Excel, then create a surface chart of the data.
10. "My heat distribution program just seems to hang."
Although there are many possible reasons for a deadlock, the stencil pattern should not produce one. One cause I have seen is asking the user for the number of iterations. If you do this, then you will need to broadcast the number of iterations to the other processors. In the examples, the number of iterations is a literal, and therefore does not need to be communicated to the other processors. However, if you ask the user for that number, then only the master knows the number of iterations. The other processors are doing zero iterations, causing a deadlock.
Perhaps, the parameters to the stencil pattern should be broadcast as part of the stencil pattern. That didn't occur to me when I implemented it, so I think I will have to add that to my list of enhancements. For the time being, any parameters that are variables and not literals (e.g. #define) need to be broadcast before the stencil.
11. "When I run my heat distribution program, I'm getting the following error:
[...] *** Process received signal ***
[...] error (7)
[...] Signal code: (128)
[...] Failing at address: (nil)
[...] [ 0] /lib64/libpthread.so.0() [0x3c8460f4a0]
[...] [ 1] newheat.out(main+0xb69) [0x401a9f]
[...] [ 2] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3c83e1ecdd]
...
"
One possible cause of this error is using the __guin_current pre-defined compiler variable incorrectly. There are two things that you need to make sure of when using one of the Paraguin pre-defined variables:
If you don't do the two above, then the variable you declare is not initialized. If you use it to access an array, then the value may be out of the bounds of the array.
12. "When I compile my program, I'm getting the following error:
[...] .c:33:Warning(paraguin): end parallel region near line 53 without a begin parallel region; ignoring
[...] .c:62:Warning(paraguin): found begin parallel region near line 62 within a parallel region; ignoring
But my parallel region pragmas are there."
This is a warning that cannot be ignored. Most likely, some of your pragma statements immediately follow a block. For example:
if (...) {
...
}
#pragma paraguin ...
13. "When I compile my program, I'm getting the following error:
s2c:Error: illegal null operand
"
This is one of those nasty error messages that gives you no help in finding the problem. The cause of this (most likely) is not specifying the sizes of the dimensions of a multi-dimensional array parameter, for example:
double f(double A[][][])
{
...
}
Since C uses row-major order for storing arrays, it needs to know the sizes of all but the first dimension of a multi-dimensional array passed as a parameter. If you don't provide them, the compiler has no way of generating the correct code for something like A[i][j][k]. Only the size of the 1st dimension can be left out. So the above function should be declared using something like:
double f(double A[][100][200])
{
...
}
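Here is a small sketch of why the trailing sizes are needed. The sizes ROWS and COLS and the function names are made up for illustration. The second function shows the row-major offset that the compiler has to compute for A[i][j][k], and it cannot do so without knowing ROWS and COLS.

```c
#define ROWS 3
#define COLS 4

/* All dimensions except the first are given, so the compiler can
   compute the address of A[i][j][k]. */
double sum3d(double A[][ROWS][COLS], int planes)
{
    double total = 0.0;
    int i, j, k;

    for (i = 0; i < planes; i++)
        for (j = 0; j < ROWS; j++)
            for (k = 0; k < COLS; k++)
                total += A[i][j][k];
    return total;
}

/* Row-major offset of A[i][j][k] from the start of the array, in elements. */
long offset3d(long i, long j, long k)
{
    return (i * ROWS + j) * COLS + k;
}
```

Notice that the offset formula uses ROWS and COLS but never the size of the first dimension, which is why that one may be left empty in the parameter declaration.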
14. "When I compile my program, I'm getting the error:
[...].c: /usr/include/stdlib.h:120: warning: non-ANSI type ``long long int'' used
"
This is a warning message and not an error. It can be ignored.
15. "When I compile my program, I'm getting the error:
[...].c:65:Error(paraguin): Invalid type for parameter 5 of stencil pattern at line 65
"
Parameter 5 is the name of the function to compute values. If it is declared correctly, then make sure that you either have a prototype for the function, or you have put the function definition before main. The reason is that C assumes a function's return type is int if it sees a function call before it knows what the function is.
You can do either of the following:
1) Put a prototype at the top of your program. A function prototype is the first line of the function (return type, name, and parameters) FOLLOWED BY A SEMICOLON and no body.
2) Put the definition of the function at the top of the program.
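Option 1 can be sketched as follows. The function average4 and its parameters are made up for illustration and are not the signature Paraguin requires for a stencil function; the point is only the placement of the prototype.

```c
/* Prototype: return type, name, and parameters, followed by a semicolon. */
double average4(double up, double down, double left, double right);

/* This call is compiled with the correct return type because the
   prototype above has already been seen. */
double center_value(void)
{
    return average4(1.0, 2.0, 3.0, 4.0);
}

/* The definition can now appear anywhere later in the file. */
double average4(double up, double down, double left, double right)
{
    return (up + down + left + right) / 4.0;
}
```

Without the prototype, a pre-C99 compiler would assume average4 returns int at the point of the call, which does not match the double it really returns.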
1. When I execute the command: make VectorAdd, I get the error message:
/usr/local/cuda/bin/nvcc -I/usr/local/cuda/include -o VectorAdd VectorAdd.cu -L/usr/local/cuda /lib64 -lcuda -lcudart -lm
nvcc fatal : Don't know what to do with '/lib64'
There is a space before /lib64 where there should not be one. Check the makefile.