Frequently Asked Questions and Error Messages

Last updated Nov 15, 2014

Note: Google the error message for more information and help.


SSH

1. When I try to connect with ssh -1 username cci-gridgw.uncw.edu, it returns with nothing happening.

Check that the option is -l (lowercase "el") and not -1 (the digit "one").

2. When I try to ssh from my laptop through the UNCC wireless network, it hangs.

The UNCC wireless network blocks port 22 used by ssh so you cannot use this wireless network. Use your laptop at home (wireless or wired) or through a wired network.

VirtualBox and Course Virtual Machine

1. I get the message "VT-x is not available. (VERR_VMX_NO_VMX) ..." when I try to import a 64-bit VM (.ova file)

The system must have VT-x (Intel CPU) or AMD-V (AMD CPU) enabled to install a 64-bit OS in VirtualBox. The easiest solution is to use the supplied 32-bit VM instead.

2. When I connect to cci-gridgw.uncc.edu from the VM, I get an error "... REMOTE HOST IDENTIFICATION HAS CHANGED! ...".

This was caused by changes on the server since the VM was created. To fix, enter the command:

ssh-keygen -f "/home/abw/.ssh/known_hosts" -R cci-gridgw.uncc.edu

as given in the error message, then reconnect to the server with ssh.

UNC-C cluster issues

1. I get the error message "Cannot connect to X server localhost:10.0" when accessing the cluster from Windows with an X server installed.

Check that your local client is running an X server.
Check that you are forwarding X11 from the server, including through any intermediate servers (e.g. ssh -X ...).
Check if xclock displays.

This is not a server issue.

C programming issues

1. I get a compiler "warning" message ...

Generally you can ignore warning messages if the code compiled. (However, you might wish to check what it says about your code.)

2. My C program generates a segmentation fault when I increase the size of declared arrays.

Several possibilities:

(a) You chose a very large size and there is not enough memory for the array. (It is too large for your system.)

(b) You have tried to use a variable when declaring static arrays. In the original C language, it was incorrect to use code such as:

int N = 1000;
double A[2][N][N];


Static declaration requires the size to be known at compile time, i.e., N needs to be declared as a defined constant:

#define N 1000

before main. If you require N to be a variable, you will need to allocate the memory dynamically using malloc. In some cases (especially with three-dimensional arrays) it might be easier to declare the array statically instead and define N to be the largest value that can occur, although that will occupy memory space whether or not it is used.

The above restriction applies only to pre-C99 versions of C (original ANSI C). C99 relaxed this restriction by allowing variable-length arrays. The C compiler may compile to C99 by default. To specify C99 explicitly, use the -std=c99 option in the compile command, e.g. cc -std=c99 -o prog prog.c. However, the array will then be placed on the stack, which limits its size. N = 1000 may not be possible.




OpenMP

1. My OpenMP program generates a segmentation fault.

Apart from basic C mistakes (see under C programming):

Making variables private that should not be private or do not need to be (for example "current" and "next" in the heat distribution code).

MPI

1. "When I run the matrix multiplication program, the values in the output differ from the values in the file output512x512mult by small quantities, such as:

C =
1329796.50 1266513.38 1330501.88 1262294.12 1428972.12 1292474.62 1365209.62 ...

versus:

C =
1329796.52 1266512.52 1330502.06 1262293.92 1428972.18 1292474.58 1365209.85 ...
"

Change the data type for the matrices from float to double. The instructions for Assignment 3 showed the matrix multiplication program with float as the type of the matrices. It was correct in Assignment 2, but not in Assignment 3.

2. "My matrix multiplication program works fine for 4, 8, or 16 processors, but when I try to run it for 12, I get the error:

[compute-0-1.local][[6640,1],0][btl_tcp_frag.c:118:mca_btl_tcp_frag_send]
mca_btl_tcp_frag_send: writev error (0x7fffb96329d0, 8272)
Bad address(1)
"

I investigated the problem some more. It turns out that having a variable number of rows for the scatter doesn't fix the problem entirely. The best solution seems to be to add extra rows to the matrices to allow extra room for the extra rows that result when NP does not evenly divide into N. Adding 20 extra rows to the matrices seems to be enough to fix the problem.

The problem is that 12 does not divide evenly into 512 (which is the number of rows in the input). This means that blksz * NP is more than 512. This would not be a problem, except that when you indicate that you want to scatter blksz number of rows to each processor, not all processors will get that many rows. So you need to compute the number of rows each processor should actually receive and use that as a parameter in the Scatter. Most of the processors will receive blksz number of rows, but not all of them.


Paraguin Compiler

1. "I can't get the matrix multiplication program to produce the correct results, even though I am transposing the B matrix and scattering it."

Transposing the B matrix and scattering it does not work in the matrix multiplication. Each processor needs the entire B matrix in order to compute the partial results for any single row. So disregard the comment about transposing the matrix and scattering it. You will simply have to broadcast the B matrix. The A matrix can still be scattered.

2. "When I try to compile my program, I'm getting the following output:
mpirun was unable to launch the specified application as it could not access
or execute an executable:

Executable: ./hello.out
Node: compute-0-1.local "

Check that your comments in the job submission file are all on one line. For example, the line
"#$ -pe orte 12         # Specify how many processors we want"
should all be on one line.  If the word "want" is on a separate line by itself, it would be a syntax error.  Either put the entire comment on one line or put a "#" in front of the part that wraps to the next line.

If that doesn't seem to fix it, it might be because you created the file on your PC and uploaded it as a DOS file. DOS files end each line with a carriage return/linefeed pair rather than the single linefeed used on Linux. You may need to edit the file on babbage, delete the last line, and retype it.

3. "When I try to compile my program, I get the following error:
In file included from /usr/include/features.h:385,
                 from /usr/include/stdio.h:28,
                 from hello.c:7:
/usr/include/gnu/stubs.h:7:27: error: gnu/stubs-32.h: No such file or directory
/usr/bin/cpp -D__SCC__  -DPARAGUIN -D_x86_64_   -I/share/apps/suifhome/x86_64-redhat-linux/include -undef -U__GNUC__ -U__GNUC_MINOR__ hello.c /tmp/scc29468_0.i
FAILED (exit status 0x1) "

One possible reason is a typo in the command you are entering to compile.  For example, you will get this error if you don't have the proper number of underscores ('_') in the word "-D__x86_64__".  There should be TWO underscores before the "x86" and after the "64".

4. "When I compile my program with the Paraguin compiler, I'm getting the following error message:
/usr/lib/gcc/x86_64-redhat-linux/4.4.6/include/stdarg.h:40: syntax error; found `__gnuc_va_list' expecting `;'
"

The line to include the paraguin.h file needs to be included BEFORE any other includes.

5. "I don't know where I am supposed to put the input file for the matrix multiplication problem."
or
"I get the error:
Usage: ./matrix file "

The matrix multiplication skeleton program is written to take the command-line argument given to the program as the name of the input file. So the input file you copied into your directory should be given immediately after the name of your program.

6. "When I compile my program, I'm getting the error:
matrixmult.out.c:85: warning: conflicting types for built-in function ‘malloc’ "

This is a warning message and not an error. The gcc compiler is complaining about malloc having conflicting declarations. It can be ignored.

7. "When I compile my program, I'm getting the error:
matrixmult.out.c:113: error: storage size of ‘__guin_status’ isn’t known "

This is an issue related to the version of MPI used on the UNCC cluster. You can fix it by editing your .out.c file and replacing the line:
      struct ompi_status_public_t __guin_status;
with:
      MPI_Status __guin_status;
This shouldn't be an issue on the UNCW cluster.

8. "When I compile my program, I'm getting the error:
matrixmult.out.c:(.text+0x232): undefined reference to `__isoc99_fscanf' "

This is also an issue related to the version of MPI used on the UNCC cluster. You can fix it by editing your .out.c file and replacing any occurrence of the function "__isoc99_fscanf" with "fscanf". This shouldn't be an issue on the UNCW cluster.

9. "I am unable to get my program to use the X11 library on babbage. Also, I get a 'command not found' error when I try to run 'xclock &'."

You will not be able to make use of the X11 library on the UNCW cluster. The reason is that when you use the scheduler to run your program, it runs in batch mode. The output will have to go to a file and not to a window on your local system.

In order to create a graphical representation of the output without using X11, you can open the output file in Excel, then create a surface chart of the data.

10. "My heat distribution program just seems to hang."

Although there are many possible reasons for a deadlock, the stencil pattern should not produce one. One cause I have seen is asking the user for the number of iterations. If you do this, then you will need to broadcast the number of iterations to the other processors. In the examples, the number of iterations is a literal, and therefore does not need to be communicated to the other processors. However, if you ask the user for that number, then only the master knows the number of iterations. The other processors do zero iterations, causing a deadlock.

Perhaps, the parameters to the stencil pattern should be broadcast as part of the stencil pattern. That didn't occur to me when I implemented it, so I think I will have to add that to my list of enhancements. For the time being, any parameters that are variables and not literals (e.g. #define) need to be broadcast before the stencil.

11. "When I run my heat distribution program, I'm getting the following error:
[...] *** Process received signal ***
[...] error (7)
[...] Signal code: (128)
[...] Failing at address: (nil)
[...] [ 0] /lib64/libpthread.so.0() [0x3c8460f4a0]
[...] [ 1] newheat.out(main+0xb69) [0x401a9f]
[...] [ 2] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3c83e1ecdd]
... "

One possible cause of this error is using the __guin_current pre-defined compiler variable incorrectly. There are two things that you need to make sure of when using one of the Paraguin pre-defined variables:

  1. The variable should be initialized (like to zero).
  2. The variable should be declared in the global space (outside of all functions) and not within main(). If you declare it within main(), then it is not the same variable as the one the compiler declares, which is global.

If you don't do both of the above, then the variable you declare is not initialized. If you use it to access an array, then the value may be out of the bounds of the array.

12. "When I compile my program, I'm getting the following error:
[...] .c:33:Warning(paraguin): end parallel region near line 53 without a begin parallel region; ignoring
[...] .c:62:Warning(paraguin): found begin parallel region near line 62 within a parallel region; ignoring

But my parallel region pragmas are there."

This is a warning that cannot be ignored. Most likely, some of your pragma statements immediately follow a block. For example:
if (...) {
   ...
}
#pragma paraguin ...

The pragma will be attached to the last statement, which, in this example, happens to be the last statement within the block for the if statement. That is the wrong location. Since the compiler is looking for the parallel regions to be at the outermost nesting within a function, it doesn't see the pragma and thinks you didn't put one in. The solution is to put a semicolon on a line by itself before the pragma. This creates a no-op statement to which the pragma can be attached.

13. "When I compile my program, I'm getting the following error:
s2c:Error: illegal null operand
"

This is one of those nasty error messages that gives you no help in finding the problem. The cause of this (most likely) is not specifying the sizes of the dimensions of a multi-dimensional array parameter, for example:

double f(double A[][][]) {
   ...
}

Since C uses row-major order for storing arrays, it needs to know the sizes of all but the first dimension of a multi-dimensional array passed as a parameter. If you don't provide them, the compiler has no way of generating the correct code for something like A[i][j][k]. It does not need the size of the first dimension. So the above function should be declared using something like:
double f(double A[][100][200]) {
   ...
}

14. "When I compile my program, I'm getting the error:
[...].c: /usr/include/stdlib.h:120: warning: non-ANSI type ``long long int'' used "

This is a warning message and not an error. It can be ignored.

15. "When I compile my program, I'm getting the error:
[...].c:65:Error(paraguin): Invalid type for parameter 5 of stencil pattern at line 65 "

Parameter 5 is the name of the function to compute values. If it is declared correctly, then make sure that you either have a prototype for the function, or you have put the function definition before main. The reason is that C assumes a function's return type is int if it sees a function call before it knows what the function is.
You can do one of the following:
1) Put a prototype at the top of your program. A function prototype is the first line of the function (return type, name, and parameters) FOLLOWED BY A SEMICOLON and no body.
2) Put the definition of the function at the top of the program.




CUDA

1. When I execute the command: make VectorAdd, I get the error message:
/usr/local/cuda/bin/nvcc -I/usr/local/cuda/include -o VectorAdd VectorAdd.cu -L/usr/local/cuda /lib64 -lcuda -lcudart -lm
nvcc fatal : Don't know what to do with '/lib64'

There is a space before /lib64 where there should not be one. Check the makefile.

