eperimentation continued till about 2 yesterday, when we confgured a new linux user (and group) on each node.. and (re)installed mpi on each.. also we configured nfs (network file sharing) so that we could code on one computer and simply execute them from a common folder
the issues we sorted out yesterday were
- the pwd problem: earlier when we used mpi with different usernames on each node, a mpiexec -n 10 pwd returned the correct location only on the computer on which the command was executed and defaulted to '/' on other nodes. we figured that this was due to the absolute location being different on each node.. hence we added a linux user account with the same name and home directory location on each node (/home/cluster/) hence even all relative paths given in any mpiexec command mapped to the same absolute path on each node..
- the nfs problem: after setting /home/cluster/nfs to be shared from pv's computer (as server) and allowing other nodes access to this folder via /etc/exports and /etc/hosts.allow, we tried mounting this drive on the other nodes (my comp only for the time being). however, read-write permissions seemed a bit elusive at the start. in fact on mounting, the owner and group of the mounted shared directory were assigned to up and nobody. unfortunately this prevented recursive write permissions to the folder. after a bit of googling, we found that the user id (uid) and group id (gid) of the owner and group of the shared folder should be same on both the server and all nfs clients. to sort this out.. we deleted the user cluster and created (yes, again) cluster on each node with a uid=1042 and gid=1042 (yes, yes, we like 42 very much, thank you). then remounted the nfs folder.. and there!.. we had owner=cluster, and group=cluster. then we reinstallled mpi on the cluster @each node.. reset ssh-keygen, etc etc. and tried mpiexec -l -n 10 mpich2-1.0.6p1/examples/cpi. all sorted thanks to 42 and a lot of simple brainwork
- get_new_nodes(void)
- get_a_software_to_run_on_them(void)