Friday, March 14, 2008

the name's cluster, embarrassed cluster

google! we finally got mpd running without any issues whatsoever on 2 comps!
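for the record, the two-comp ring bring-up went roughly like this (the hostname and port below are placeholders, and both comps need the same secretword in ~/.mpd.conf):

  # on the first comp: start an mpd and note the port it listens on
  mpd &
  mpdtrace -l                  # prints something like node1_46227
  # on the second comp: join the existing ring using that host and port
  mpd -h node1 -p 46227 &
  # from either comp: both hosts should show up, and jobs should spread across them
  mpdtrace
  mpiexec -n 2 hostname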

experimentation continued till about 2 yesterday, when we configured a new linux user (and group) on each node and (re)installed mpi on each.. we also configured nfs (network file system) so that we could write code on one computer and simply execute it from a common folder on every node
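the nfs bit went roughly like this (the server hostname and subnet are placeholders for whatever our lan actually uses, and the daemon names in hosts.allow may differ by distro):

  # on pv's comp (the server) -- /etc/exports:
  /home/cluster/nfs   192.168.1.0/24(rw,sync,no_subtree_check)
  # ... and /etc/hosts.allow:
  portmap mountd : 192.168.1.
  # still on the server: re-export the share
  sudo exportfs -ra
  # on each client node: mount it at the same path
  sudo mount -t nfs pv-comp:/home/cluster/nfs /home/cluster/nfs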

the issues we sorted out yesterday were
  1. the pwd problem: earlier, when we used mpi with a different username on each node, an mpiexec -n 10 pwd returned the correct location only on the computer the command was run from and defaulted to '/' on the other nodes. we figured this was because the home directory's absolute path differed from node to node, so we added a linux user account with the same name and home directory location (/home/cluster/) on every node. with that, any relative path given in an mpiexec command maps to the same absolute path on each node.
  2. the nfs problem: after setting /home/cluster/nfs to be shared from pv's computer (as the server) and allowing the other nodes access to it via /etc/exports and /etc/hosts.allow, we tried mounting the share on the other nodes (only my comp for the time being). read-write permissions proved a bit elusive at first though: on mounting, the owner and group of the shared directory came out as up and nobody, which blocked recursive write access to the folder. after a bit of googling, we found that the user id (uid) and group id (gid) of the shared folder's owner and group have to be the same on the server and on every nfs client. so we deleted the user cluster and created it (yes, again) on each node with uid=1042 and gid=1042 (yes, yes, we like 42 very much, thank you), remounted the nfs folder.. and there! owner=cluster, group=cluster. we then reinstalled mpi as cluster on each node, redid ssh-keygen etc., and tried mpiexec -l -n 10 mpich2-1.0.6p1/examples/cpi. all sorted, thanks to 42 and a lot of simple brainwork (the whole sequence is sketched after this list).
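the uid/gid dance, roughly (run as root on every node; pv-comp is again a placeholder for the server's hostname):

  # recreate the cluster user with a fixed uid/gid on every node
  userdel -r cluster                 # only if it already exists
  groupadd -g 1042 cluster
  useradd -m -d /home/cluster -u 1042 -g 1042 -s /bin/bash cluster
  # then, logged in as cluster: passwordless ssh between the nodes
  ssh-keygen -t rsa
  cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys    # and copy the key to the other nodes
  # remount the share and check that ownership comes out right
  mount -t nfs pv-comp:/home/cluster/nfs /home/cluster/nfs
  ls -ld /home/cluster/nfs           # should now show cluster:cluster
  # finally, the sanity check
  mpiexec -l -n 10 mpich2-1.0.6p1/examples/cpi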
this should be very simple to scale to all the new nodes (vinayakzark, vai.sinh, kk).. the ssh problem with kk's sshd still remains to be figured out, so we're keeping it out of the ring for the moment.. now it's a simple matter of running some custom applications on the mpi platform.. maybe we could try AMBER9 or something else that already uses MPI as its parallel computing framework (a rough mpdboot sequence for new nodes is sketched after the list below). so i guess our immediate objectives are the following
  1. get_new_nodes(void)
  2. get_a_software_to_run_on_them(void)
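for objective 1, once a new node has the same cluster user and the nfs mount, adding it should just be a matter of listing it in an mpd.hosts file and rebooting the ring; something like this (the -n count assumes two hostnames in mpd.hosts plus the machine you run it from):

  # mpd.hosts: one hostname per line (new comps get appended here)
  # take the old ring down, start one mpd per listed host plus this one, then check
  mpdallexit
  mpdboot -n 3 -f mpd.hosts
  mpdtrace
  mpiexec -n 10 mpich2-1.0.6p1/examples/cpi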
foobar to pv: we are green to go. i repeat, we are green to go. do you copy?

