Parallel Programming
Frequently Asked Questions and Error Messages
Last updated January 22, 2016
Note: Google the error message for more information and help. If you have an issue with a solution that you think should be reported below, please send an email to abw@uncc.edu. Thank you. BW
General Issue: Do not copy and paste commands directly from PDF documents. Doing so may create problems by introducing extra spaces, wrong characters, etc.
SSH | VirtualBox and Course Virtual Machine | UNC-C cluster issues | C programming | Make files | X11 | OpenMP | MPI | Eclipse | Suzaku | Seeds FAQs | CUDA |
SSH
1. When I try to connect with ssh -1 username cci-gridgw.uncw.edu, it returns with nothing happening.
Check that the option is -l (lowercase "el") and not -1 ("one").
2. When I try to ssh from my laptop through the UNCC wireless network, it hangs.
The UNCC wireless network blocks port 22 used by ssh so you cannot use this wireless network. Port 22 is also blocked off-campus.
Solutions:
a) On campus, you can use a networked lab computer.
b) Set up a VPN connection. (See notes on that in Additional Information). This will work for both campus wireless and off campus.
3. When I try to ssh to a server, I get the message "could not resolve host name....name or service not known".
Most likely you have a typo in the command. Check the name of the server carefully. (One person reported that restarting Ubuntu solved the connection issue.)
VirtualBox and Course Virtual Machine
1. I get the message "VT-x is not available. (VERR_VMX_NO_VMX) ..." when I try to import a 64-bit VM (.ova file)
The system must have VT-x (Intel CPU) or AMD-V (AMD CPU) to install a 64-bit OS in VirtualBox. The easiest solution is to use the supplied 32-bit VM instead.
2. I get the message "VT-x is not available. ..." when I try to import a 32-bit VM (.ova file)
We have seen situations where the system has VT-x support but it is disabled in the BIOS, which prevents even a 32-bit OS from running in VirtualBox. Enable VT-x support in the BIOS. To get to the BIOS setup, hold down the appropriate key (typically F12) while the system boots. You may need to do some research on exactly which key gets you to the BIOS on your system.
Image link from: http://www.sysprobs.com/disable-enable-virtualization-technology-bios
3. When I start Ubuntu, I get an error about a broken link to a shared folder VMshared although the start up continues and Ubuntu works.
The VM provided was created with a shared folder on the original host, which does not exist on your host. Delete the shared folder, most likely at /media/, using the command sudo rmdir sf_VMShared (or create the shared folder on your host). Note that you cannot use graphical operations for root commands.
4. When I connect to cci-gridgw.uncc.edu from the VM, I get an error "... REMOTE HOST IDENTIFICATION HAS CHANGED! ...".
This was caused by changes on the server since the VM was created. To fix, enter the command:
ssh-keygen -f "/home/abw/.ssh/known_hosts" -R cci-gridgw.uncc.edu
as given in the error message and reconnect to the server with ssh.
5. I get "Oops! Something went wrong. Unhandled error message: Timed out when logging in" when trying to use the Ubuntu Nautilus File Manager.
We had a few reports of this message in Fall 2014. It appears to be a bug, particularly when using Windows 8. In this case you will have to use the command line to access files, unless anyone has a solution. Please let us know if you do have a solution (a different GUI file manager, use Xubuntu?). To transfer files to and from the remote server, use the scp command (see Assignment Preliminaries Linux Commands appendix). BW
6. When I try to install an Ubuntu VM on VirtualBox, I get a pop-up message "Sorry, Ubuntu 14.04 has experienced an internal error"
The following was reported to work in one case. Get a command-line interface at the login screen using the key combination Ctrl+Alt+F1. Then issue the following command:
sudo apt-get update && sudo apt-get upgrade
The error was probably due to the graphics drivers needing fixes.
7. When I try to connect to a remote server using the Ubuntu Nautilus file explorer File > Connect to Server, I get a "Connection timed out" message.
Check the basic ssh connection issues under SSH. Check to see if you can make a connection using PuTTY (Windows platform). If so, then your issue is with Nautilus on your platform and you may have to use the command line terminal throughout.
If all else fails to get the course VM working, install a clean Ubuntu or Xubuntu OS in VirtualBox. If that works, then install the course software. Instructions here: http://webpages.uncc.edu/abw/ParallelProgSoftware/
UNC-C cluster issues (cci-gridgw.uncc.edu)
1. I cannot get gedit to work on the UNC-C cluster
It is not installed. Use nano instead.
2. I get the error message "Cannot connect to X server localhost:10.0" when accessing the cluster from Windows with an X server installed.
Check that your local client is running an X server.
Check that you are forwarding X11 from the server, including through any intermediate servers (i.e. ssh -X ...).
Check if xclock displays.
This is not a server issue.
C programming
1. I get a compiler "warning" message ...
Generally you can ignore warning messages if the code compiled. (However, you might wish to check what it says about your code.) But see next:
2. I get "Warning: incompatible implicit declaration of built-in function 'strcpy' ... "
You need to add the include:
#include <string.h>
where strcpy is declared. Similar fixes apply to other C system routines that are called without their header file being included.
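As a minimal sketch of the fix, the fragment below includes the header before strcpy is called. The helper copy_name is hypothetical and only for illustration; it is not part of any course program.

```c
#include <string.h>   /* declares strcpy and strlen, and defines size_t */

/* Hypothetical helper for illustration: copies src into dst if it fits.
   Returns 0 on success, -1 if dst is too small. */
int copy_name(char *dst, size_t dst_size, const char *src)
{
    if (strlen(src) + 1 > dst_size)
        return -1;        /* copying would overflow dst */
    strcpy(dst, src);     /* declared in <string.h>, so no warning */
    return 0;
}
```

With the include in place, the compiler knows the correct prototype and the "incompatible implicit declaration" warning should disappear.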
3. I get a segmentation fault.
This means you are trying to refer to memory outside the allocated segment in memory. First identify where in your code this is occurring. Then check:
(a) The values of array indices are correct, e.g. the value of x in A[x].
(This is a very common error. I have seen a situation when x was not set to any value.)
(b) The value of any pointer (e.g. the value of int *p)
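Checks (a) and (b) can be sketched as follows. This is only an illustration under assumed names: the array A, its size N, and the helper safe_read are hypothetical. While debugging, a guard like this reports a bad index or pointer instead of crashing.

```c
#include <stddef.h>

#define N 100             /* hypothetical array size for illustration */

static double A[N];       /* static storage, so initialized to zero */

/* Returns 1 and stores A[x] in *out if x is a valid index and out is
   not NULL; returns 0 otherwise instead of touching invalid memory. */
int safe_read(int x, double *out)
{
    if (out == NULL)          /* check (b): the pointer must point somewhere */
        return 0;
    if (x < 0 || x >= N)      /* check (a): the index must be in 0..N-1 */
        return 0;
    *out = A[x];
    return 1;
}
```

Printing the index just before the failing access is often enough to locate an unset or out-of-range value.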
4. My C program generates a segmentation fault when I increase the size of declared arrays.
Several possibilities:
(a) If you chose a very large size, there may be insufficient memory for the array (too large for your system). Arrays declared inside a function are normally placed on the stack, which is limited in size. Try adding the keyword static to the array declaration, e.g. static double A[N][N]; The lifetime of a static variable extends across the whole program, and static variables are not stored on the stack or heap.
(b) You have tried to use a variable in declaring static arrays. In the original C language, it was incorrect to use code such as:
int N = 1000;
double A[2][N][N];
Static declaration requires the size to be known at compile time, i.e. N needs to be declared as a defined constant, i.e.
#define N 1000
before main. If you require N to be a variable, you will need to allocate memory dynamically using malloc. In some cases (especially with three-dimensional arrays) it might be easier to declare the array statically instead and define N to be the largest value that can occur, although that will occupy memory space.
The above restriction applies only to pre-C99 versions of C (original ANSI C). C99 relaxed this restriction. The C compiler may automatically compile to C99. To specify C99 explicitly, use the -std=c99 option in the compile command, i.e. cc -std=c99 -o prog prog.c. However, the array will then be placed on the stack and limited in size. N = 1000 may not be possible.
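If N must be a variable, one way to allocate the array dynamically is sketched below. This is a minimal example, not code from the course: the function name alloc_matrix is hypothetical, and it assumes the matrix is stored as one contiguous block with element (i, j) at a[i * n + j].

```c
#include <stdlib.h>

/* Allocates an n x n matrix of doubles as one contiguous, zero-filled
   block on the heap. Returns NULL if n is 0 or the allocation fails.
   Element (i, j) is a[i * n + j]. Release the block with free(). */
double *alloc_matrix(size_t n)
{
    if (n == 0 || n > (size_t)-1 / n)
        return NULL;                  /* n * n would overflow size_t */
    return calloc(n * n, sizeof(double));
}
```

Call it as double *a = alloc_matrix(n); check the result for NULL before use, and call free(a) when finished. Because the memory comes from the heap, the stack size limit does not apply.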
Make files
1. When I run make, I get a message that the program is "up-to-date"
If you have not altered the source file, make will not re-compile and says so with that message. Make uses the time stamp of a file to decide whether to re-compile. We have had issues with make not recognizing that a program needs re-compiling, apparently because the time stamp on the source file was not up to date. If make is not re-compiling altered source files, delete the compiled executable (not the source file, be careful). Make then has no executable to compare against and must re-compile.
It is also possible you have a mistake in the makefile. Make sure you have all dependencies listed.
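The time stamp comparison described above can be illustrated with the sketch below. It is a simplified illustration and not make's actual implementation: the function name needs_rebuild is hypothetical, it compares a single source file against a single target, and it assumes a POSIX system that provides stat().

```c
#include <sys/stat.h>

/* Returns 1 if target should be rebuilt from source, 0 if target is up
   to date, and -1 if source cannot be examined. Simplified: real make
   checks every listed dependency, not just one source file. */
int needs_rebuild(const char *target, const char *source)
{
    struct stat t, s;
    if (stat(source, &s) != 0)
        return -1;                      /* source file is missing */
    if (stat(target, &t) != 0)
        return 1;                       /* target is missing: always rebuild */
    return s.st_mtime > t.st_mtime;     /* rebuild only if source is newer */
}
```

The sketch shows why deleting the executable forces a rebuild: with no target there is nothing to compare, so the "up-to-date" outcome is impossible.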
X11
1. When I try to compile an X11 program I get: "... fatal error: X11/Xlib.h: No such file or directory"
If you are using the -I or -L options when compiling, check that the paths to the X11 directories are correct.
Check whether X11 is installed. (X11 is installed on the course VM.) If not installed, install with:
sudo apt-get install libx11-dev
2. When I execute the X11 sample program sample.c, an X11 window is displayed but nothing else.
The computer might be too slow to process the X11 commands. One fix that worked was to add a sleep statement after the XFillArc() routine:
XFillArc (display,win,gc,400,400,50,50,0,23040); // draw circle of size 50x50 at location 400,400
sleep(1); // sleep for one second
OpenMP
1. When I compile an OpenMP program, I get a string of error messages "undefined reference" to omp routines, e.g. undefined reference to `omp_get_wtime', undefined reference to `omp_get_thread_num', etc.
You are probably missing the -fopenmp flag when compiling:
cc -fopenmp hello.c -o hello
2. My OpenMP program generates a segmentation fault
Apart from basic C mistakes (see under C programming):
Check that you have not made variables private that should not be private or do not need to be (for example "current" and "next" in the heat distribution code).
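The point about "current" and "next" can be sketched as follows. This is a hypothetical one-dimensional stand-in, not the course's heat distribution code: it assumes two buffers that are swapped after every sweep, and the function name relax is made up for the example.

```c
/* Runs `steps` averaging sweeps over n points, swapping the two buffers
   after each sweep. Returns the buffer that holds the final values. */
double *relax(double *current, double *next, int n, int steps)
{
    if (n < 2)
        return current;
    for (int s = 0; s < steps; s++) {
        /* current and next stay shared: every thread needs the same two
           pointer values. Listing them in a private(...) clause would
           give each thread its own uninitialized copy, and dereferencing
           that copy is a typical cause of a segmentation fault. The loop
           index i is made private automatically. */
        #pragma omp parallel for
        for (int i = 1; i < n - 1; i++)
            next[i] = 0.5 * (current[i - 1] + current[i + 1]);

        next[0] = current[0];              /* fixed boundary values */
        next[n - 1] = current[n - 1];

        double *tmp = current;             /* swap outside the parallel loop */
        current = next;
        next = tmp;
    }
    return current;
}
```

If a thread really does need its own copy of a variable that already holds a value, firstprivate copies the value in, whereas private does not.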
3. When I run the hello world OpenMP program, it uses one thread as the default, e.g.
Hello World from thread = 0
Number of threads = 1
Check how many cores are available. For VirtualBox, check Machine > Settings > System > Processor to see how many cores are allocated. You can only alter the number of cores when no VM appliances are running. If you can only use one core, the default number of threads will be 1. You can alter the number of threads in OpenMP, although they will all run on one core, so do not expect a speed improvement.
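One way to alter the number of threads from within the program is sketched below, using omp_set_num_threads() and omp_get_num_threads() from omp.h. The function name threads_used is hypothetical. The preprocessor guard is an addition for this example so that the code also compiles without the -fopenmp flag.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Requests `requested` threads and returns how many threads the next
   parallel region actually used. Compiled without -fopenmp, there is
   no parallel region and the function returns 1. */
int threads_used(int requested)
{
    int used = 1;
    if (requested < 1)
        requested = 1;
#ifdef _OPENMP
    omp_set_num_threads(requested);
    #pragma omp parallel
    {
        #pragma omp single
        used = omp_get_num_threads();   /* one thread records the team size */
    }
#else
    (void)requested;                    /* unused without OpenMP */
#endif
    return used;
}
```

Alternatively, set the OMP_NUM_THREADS environment variable before running the program, which requires no code change.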
MPI
1. When I compile my program prog1.c with:
mpicc prog1.c -o prog1
I receive the error:
/usr/bin/ld: /usr/lib/debug/usr/lib/i386-linux-gnu/crt1.o(.debug_line): relocation X has invalid symbol index Y
/usr/lib/gcc/i686-linux-gnu/4.8/../../../i386-linux-gnu/crt1.o: In function `_start':
(.text+0x18): undefined reference to `main'
collect2: error: ld returned 1 exit status
Try compiling with the -o option first:
mpicc -o prog1 prog1.c
(In fact all options should really be before the source file although some prefer after and it usually does work either way.)
2. When I execute the matrix multiplication program, I get the run time error message "memcpy argument memory ranges overlap, dst_=0x7fff6f48a730 src_=0x7fff6f48a730 len_=65536".
Using the same buffer for the source data and destination data in scatter and gather does not work with MPICH, i.e. in these lines:
MPI_Scatter(a, blksz*N, MPI_DOUBLE, a, blksz*N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Gather(c, blksz*N, MPI_DOUBLE, c, blksz*N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
Changing the locations for the destination data to separate arrays gets rid of the run time error message (which actually points to the issue).
3. "When I run the matrix multiplication program, the values in the output differ from the values in the file output512x512mult by small quantities, such as:
C =
1329796.50 1266513.38 1330501.88 1262294.12 1428972.12 1292474.62 1365209.62 ...
versus:
C =
1329796.52 1266512.52 1330502.06 1262293.92 1428972.18 1292474.58 1365209.85 ...
"
This can occur if the data type for the matrices is a float instead of a double. Even with doubles, very small rounding errors have still been seen on occasion, so you will have to accept very small rounding errors.
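The effect of the data type can be seen in the sketch below, which accumulates the same dot product in float and in double. The function names are hypothetical. A float carries a 24-bit significand, roughly 7 significant decimal digits, so sums of the size shown above cannot be exact to two decimal places in single precision.

```c
/* Dot product accumulated in single precision. */
float dot_float(const float *x, const float *y, int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* The same computation accumulated in double precision. */
double dot_double(const double *x, const double *y, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

Each element of the product matrix is one such dot product of a row and a column, so the two versions can differ in the last printed digits.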
4. "My matrix multiplication program works fine for 4, 8, or 16 processors, but when I try to run it for 12, I get the error:
[compute-0-1.local][[6640,1],0][btl_tcp_frag.c:118:mca_btl_tcp_frag_send]
mca_btl_tcp_frag_send: writev error (0x7fffb96329d0, 8272)
Bad address(1)
"
The error occurs because the number of rows does not divide evenly among the processes. The best solution seems to be to add extra rows to the matrices, to make room for the rows that are left over when P does not divide evenly into N. Adding 20 extra rows to the matrices seems to be enough to fix the problem.
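Instead of a fixed number of extra rows, the padding can be computed from N and P, as sketched below. The function names are hypothetical, and the sketch assumes each process handles one equal block of rows, as blksz suggests in the scatter and gather calls shown earlier.

```c
/* Rows given to each process when n rows are split over p processes,
   rounded up so that every row is covered. */
int rows_per_process(int n, int p)
{
    return (n + p - 1) / p;
}

/* Total rows to allocate so that p equal blocks fit. */
int padded_rows(int n, int p)
{
    return rows_per_process(n, p) * p;
}
```

For N = 512 and P = 12, this gives 43 rows per process and 516 rows in total, i.e. 4 extra rows. The extra rows should be zero-filled and ignored in the output.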
Eclipse
1. With MPI -- In the Open-MPI-Generic-Interactive run configuration, when I select connection type as "Local", I get the error message "src-resolve: Cannot resolve the name 'lml:layout_root' to a(n) type definition component."
Restart the virtual machine. This has fixed it on the one occasion the error was reported.
2. With MPI - When I run the hello.c program, I get an error such as:
"Launching Hello has encountered a problem. perl Exited with value: 132 ... "
"perl Exited with value: 131 ... mpirun was unable to launch the specified application as it could not access or execute an executable: ..."
The system does not know how to handle a space in a file name or a workspace directory name. Take out the space.
CUDA
1. When I execute the command: make VectorAdd, I get the error message:
/usr/local/cuda/bin/nvcc -I/usr/local/cuda/include -o VectorAdd VectorAdd.cu -L/usr/local/cuda /lib64 -lcuda -lcudart -lm
nvcc fatal : Don't know what to do with '/lib64'
There is a space before /lib64 where there should not be one. Check the makefile.
2. When I use cudaEventRecord:
cudaEventRecord(start, 0);
...
cudaEventRecord(stop, 0);
cudaEventSynchronize(stop);
cudaEventElapsedTime(&time, start, stop);
to measure time, it does not give the correct time.
Although not always necessary, add a cudaEventSynchronize() after both time stamps:
cudaEventRecord(start, 0);
cudaEventSynchronize(start);
...
cudaEventRecord(stop, 0);
cudaEventSynchronize(stop);
cudaEventElapsedTime(&time, start, stop);
A cudaEventSynchronize() is needed if there is no instruction that waits for all threads to complete. One report indicated that it was needed for measuring sequential execution time.