Wednesday, September 9, 2009

MTP discussions

one optimization technique springs to mind:
we can break up the initial 2d array into lots of small 2d arrays, optimise each one separately, and then stitch them back together. this can probably give better results.
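roughly what i have in mind (a hypothetical sketch in C; the sizes, the tile size BS and optimise_block() are all placeholders for whatever we actually end up doing per tile):

    /* split an H x W array into BS x BS tiles, optimise each tile
       independently, then stitch the results back into place */
    #define H  512
    #define W  512
    #define BS 64   /* tile size -- a knob we'd have to tune */

    /* placeholder for the per-tile optimisation */
    static void optimise_block(double *blk, int rows, int cols)
    {
        (void)blk; (void)rows; (void)cols;   /* real work goes here */
    }

    static void optimise_tiled(double a[H][W])
    {
        double blk[BS * BS];
        for (int i = 0; i < H; i += BS)
            for (int j = 0; j < W; j += BS) {
                int rows = (i + BS <= H) ? BS : H - i;   /* handle edge tiles */
                int cols = (j + BS <= W) ? BS : W - j;
                for (int r = 0; r < rows; r++)           /* copy the tile out */
                    for (int c = 0; c < cols; c++)
                        blk[r * cols + c] = a[i + r][j + c];
                optimise_block(blk, rows, cols);
                for (int r = 0; r < rows; r++)           /* stitch it back */
                    for (int c = 0; c < cols; c++)
                        a[i + r][j + c] = blk[r * cols + c];
            }
    }

whether the per-tile results stitch back cleanly at the tile boundaries is the part we'd have to check.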

MTP discussions

in section 4.1 of the paper, the authors say they used a sparse variant of the levenberg-marquardt algorithm. the levmar site (http://www.ics.forth.gr/~lourakis/levmar/) you found earlier is exactly that. i don't think the GSL implementation has sparse matrix support, so we will switch to levmar.
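for reference, a bare-bones levmar call looks something like this (a minimal sketch fitting a toy exponential with the dense finite-difference driver dlevmar_dif; the header name and link line are from memory, so double-check them against the levmar docs -- the sparse business is a separate story):

    #include <math.h>
    #include <stdio.h>
    #include "levmar.h"   /* older releases install this as lm.h */

    /* model: hx[i] = p[0] * exp(-p[1] * i) */
    static void expfunc(double *p, double *hx, int m, int n, void *adata)
    {
        (void)m; (void)adata;
        for (int i = 0; i < n; i++)
            hx[i] = p[0] * exp(-p[1] * i);
    }

    int main(void)
    {
        enum { M = 2, N = 40 };             /* 2 parameters, 40 measurements */
        double x[N], p[M] = { 1.0, 0.0 };   /* synthetic data, initial guess */
        for (int i = 0; i < N; i++)
            x[i] = 5.0 * exp(-0.1 * i);

        double info[LM_INFO_SZ];
        /* NULL opts = default settings; Jacobian approximated by finite differences */
        int ret = dlevmar_dif(expfunc, p, x, M, N, 1000, NULL, info, NULL, NULL, NULL);

        printf("levmar returned %d, p = (%g, %g)\n", ret, p[0], p[1]);
        return 0;   /* link roughly as: gcc fit.c -llevmar -llapack -lblas -lm */
    }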

Tuesday, November 4, 2008

para3dhwt

It's been a snail's pace thanks to random errors and the debugging involved. We have adopted a modular approach, making separate files for all the functions involved and keeping the related ones in the same file. What this has cost us, at a speed of around 0.25x for just the 3dhwt, is about 15 seconds for a 100-frame chunk. That's at least better than the 38 minutes it once took! I seriously hope this gives good compression over the usual stuff for all the time it's taking... and this was just for order-1 wavelets! Well, we do have some optimizations already in mind, leaving aside the fact that we have already come down to init 2 without X and gdm.
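for context, the order-1 step we keep looping over is basically just pairwise averaging and differencing; the 3d version applies it along x, then y, then t of the frame chunk. a sketch (the unnormalised form, not our actual para3dhwt code):

    /* one level of the order-1 (Haar) transform on a length-n signal:
       first half of the output holds the pairwise averages,
       second half holds the pairwise differences */
    static void haar1d_step(float *s, int n, float *tmp)
    {
        int half = n / 2;
        for (int i = 0; i < half; i++) {
            tmp[i]        = (s[2 * i] + s[2 * i + 1]) / 2.0f;  /* approximation */
            tmp[half + i] = (s[2 * i] - s[2 * i + 1]) / 2.0f;  /* detail */
        }
        for (int i = 0; i < n; i++)
            s[i] = tmp[i];
    }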

We have a few major hurdles to overcome though. One is the time factor, this time meaning the deadline for the BTP, after which it is as good as dead; a second, staring us in the face, is the disappearance of nodes from the cluster owing to the lab upgrade that's going on... No, we still won't be given the new ThinkSmarts to work on, and no, we (at least I) don't intend to appeal either, for it'll mean wasting another weekend over installation of the components.

Friday, October 31, 2008

Installing OpenCV with ffmpeg

Making it work took some time, effort, and repetition, owing to the 4 nodes being individually separate entities. Here's what we and the ffmpeg and opencv tarballs had to go through:
  1. untar ffmpeg
  2. ./configure --enable-shared --enable-swscale --enable-gpl
  3. make
  4. sudo make install
  5. untar opencv
  6. sudo apt-get install patch (if not already installed)
  7. patch otherlibs/highgui/cvcap_ffmpeg.cpp ../nfs/opencv-1.0.0-cvcapffmpegundefinedsymbols.patch
  8. #4 from the page http://www.rainsoft.de/projects/ffmpeg_opencv.html
  9. su
    cd /usr/local/include/
    mkdir ffmpeg
    cp libavcodec/* ffmpeg/
    cp libavdevice/* ffmpeg/
    cp libavformat/* ffmpeg/
    cp libavutil/* ffmpeg/
    cp libswscale/* ffmpeg/
    exit
  10. change FFMPEGLIBS="-lavcodec -lavformat" to FFMPEGLIBS="-lavcodec -lavformat -lswscale" in configure
  11. ./configure --enable-shared
  12. make
  13. sudo make install
  14. sudo ldconfig
Sandy instead made patch files for both steps #8 and #10, so if anyone ever reads this and needs them, they can contact him.
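To check that the build actually picked up ffmpeg, a tiny highgui program like this one does the job (a sketch; the link line assumes the opencv pkg-config file got installed, which may or may not be the case on a hand-built install):

    /* try to open a video through highgui/ffmpeg and count its frames */
    #include <stdio.h>
    #include "cv.h"
    #include "highgui.h"

    int main(int argc, char **argv)
    {
        if (argc < 2) {
            fprintf(stderr, "usage: %s <videofile>\n", argv[0]);
            return 1;
        }
        CvCapture *cap = cvCaptureFromFile(argv[1]);
        if (!cap) {
            fprintf(stderr, "highgui/ffmpeg could not open %s\n", argv[1]);
            return 1;
        }
        int frames = 0;
        while (cvQueryFrame(cap))   /* the frame is owned by the capture */
            frames++;
        printf("decoded %d frames\n", frames);
        cvReleaseCapture(&cap);
        return 0;
    }

    compile roughly as: gcc testcap.c -o testcap `pkg-config --cflags --libs opencv`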

Wednesday, June 4, 2008

From the Horse's mouth: a copy of the MPI reference manual

I'm diving head first into MPI and related PPing since:
1. it's fun and I always wanted to do it!
2. I need to do it for my summer project at IISc which will otherwise take months on the single/dual processor machines.
I was surprised when Hollow told me that you did not have the full manual for MPI (MIT press).
I am a bit aware of some of the problems you ran into while setting up the Beowulf cluster. I will try to get solutions for these since there are people here who are proficient at this sort of stuff.
In any case, I will be adding some notes which I feel are important from the reference manual under this tag. Also, I'll post some bioinformatics problems that can be done when the cluster is up and running on all its feet.

Friday, March 14, 2008

the name's cluster, embarrassed cluster

google! we finally got mpd running without any issues whatsoever on 2 comps!

experimentation continued till about 2 yesterday, when we configured a new linux user (and group) on each node and (re)installed mpi on each. we also configured nfs (network file system) so that we could code on one computer and simply execute the programs from a common folder.

the issues we sorted out yesterday were
  1. the pwd problem: earlier, when we used mpi with different usernames on each node, an mpiexec -n 10 pwd returned the correct location only on the computer on which the command was executed and defaulted to '/' on the other nodes. we figured this was due to the absolute location being different on each node, so we added a linux user account with the same name and home directory location (/home/cluster/) on each node, so that all relative paths given in any mpiexec command mapped to the same absolute path on every node.
  2. the nfs problem: after setting /home/cluster/nfs to be shared from pv's computer (as server) and allowing other nodes access to this folder via /etc/exports and /etc/hosts.allow, we tried mounting this share on the other nodes (my comp only for the time being). however, read-write permissions seemed a bit elusive at the start. in fact, on mounting, the owner and group of the mounted shared directory were assigned to up and nobody, which unfortunately prevented recursive write permissions to the folder. after a bit of googling, we found that the user id (uid) and group id (gid) of the owner and group of the shared folder should be the same on the server and all nfs clients. to sort this out we deleted the user cluster and created it (yes, again) on each node with uid=1042 and gid=1042 (yes, yes, we like 42 very much, thank you), then remounted the nfs folder... and there! we had owner=cluster and group=cluster. then we reinstalled mpi for the cluster user on each node, reset the ssh keys, etc. etc., and tried mpiexec -l -n 10 mpich2-1.0.6p1/examples/cpi. all sorted, thanks to 42 and a lot of simple brainwork.
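for anyone wondering, the cpi example is essentially a program of this shape (a from-memory sketch, not the actual MPICH2 source): each rank integrates a slice of 4/(1+x^2) over [0,1] and rank 0 collects the sum.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        const int n = 1000000;                /* number of intervals */
        double h, local = 0.0, pi = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        h = 1.0 / n;
        for (int i = rank; i < n; i += size) {    /* interleave intervals across ranks */
            double x = h * (i + 0.5);
            local += 4.0 / (1.0 + x * x);
        }
        local *= h;

        MPI_Reduce(&local, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("pi is approximately %.16f\n", pi);

        MPI_Finalize();
        return 0;
    }

compile it with mpicc and run it from the nfs folder with mpiexec -n 10 so every node picks it up from the same path.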
this should be very simply scalable to all new nodes (vinayakzark, vai.sinh, kk). the ssh problem with kk's sshd still remains to be figured out, so we're keeping it out of the ring for the moment. now it's a simple matter of running some custom applications on the mpi platform; maybe we could try AMBER9 or something else that already uses MPI as a parallel computing framework. so i guess our immediate objectives are the following
  1. get_new_nodes(void)
  2. get_a_software_to_run_on_them(void)
foobar to pv: we are green to go. i repeat, we are green to go. do you copy?

Friday, March 7, 2008

Third weak week

With the problems faced earlier we decided to start all over again. And this time we had 5 nodes (tgwtt's scribbler's proliferous in the ring, :P). The problem persists with one of them, and we blame it on the sshd on that comp. For the time being, it has been isolated from the ring.
So, after setting up ssh for MPI came the very next step: installing MPICH2 on the two nodes it wasn't already on. We did it successfully on proliferous, whereas we'll have to wait till the next day for cluster to be ready with it.

failed to handshake with mpd on recvd output={}

Finally our focus shifted from the ssh-ing problem to a new one. The very first command, mpdboot, gave an error. We figured this out to be a hostname resolution problem, so we modified the /etc/hosts files on the comps we had su permissions on. And so we had to say goodbye to proliferous too for the time being. With the sshd problem not yet resolved on krishna and MPICH2 not yet installed on cluster, we were now left with only two nodes.
With this problem fixed we proceeded to the next command. All this had looked pretty simple until we started encountering problems in every command we gave. It was a late realization that this was happening, and we had to search exhaustively to get the problems solved... hmm, or are they solved!?
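a sanity check we could have used before touching /etc/hosts: resolve each node's name the same way any of these tools eventually has to (a hypothetical little helper, nothing to do with mpd itself):

    /* check that every node name passed on the command line resolves */
    #include <stdio.h>
    #include <netdb.h>
    #include <arpa/inet.h>

    int main(int argc, char **argv)
    {
        for (int i = 1; i < argc; i++) {
            struct hostent *he = gethostbyname(argv[i]);
            if (!he) {
                printf("%s: does not resolve (check /etc/hosts)\n", argv[i]);
                continue;
            }
            printf("%s -> %s\n", argv[i],
                   inet_ntoa(*(struct in_addr *)he->h_addr_list[0]));
        }
        return 0;
    }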

problem with execution of [Errno 2] No such file or directory

Next, with mpdboot now working, we proceeded to give the ring something to execute. mpiexec worked well when we executed files in /bin or any other path in $PATH of all the nodes. Where we met the next obstacle was in executing a file on some path not already in $PATH, for instance the home directory of the node user accounts! We tried to fix this as follows:
copied the file onto every node's home dir -> ran mpiexec, but... -> got the same error. Obviously this thing wasn't looking for the file where we'd expected it would. Our doubts were confirmed on giving mpiexec -n 2 pwd: this displayed / and /home/hollow, hollow being my node, indicating that on the other nodes it looks for the file in / itself!
To deal with this, we added /home/deepcyan to $PATH on node deepcyan. This still didn't work. We can now identify this problem as one of wanting to run two different programs on two different nodes using the same mpiexec, and we weren't even using ":" (mpiexec's separator for multiple executables) for that purpose.
In searching for a solution we came across NFS and how it can be used for this purpose. That's when it struck us: we had to run the same program on different nodes in parallel! Thanks to the links [1] and [2], we set up and configured an NFS server and a single client for the time being.
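once the nfs share behaves, a tiny check program like this should make the "same binary, same path, every node" point concrete (a sketch; compile it with mpicc into the shared directory and mpiexec it from there):

    /* each rank reports who and where it is */
    #include <stdio.h>
    #include <unistd.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size, len;
        char host[MPI_MAX_PROCESSOR_NAME], cwd[1024];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(host, &len);

        if (!getcwd(cwd, sizeof cwd))
            cwd[0] = '\0';
        printf("rank %d of %d on %s, cwd = %s\n", rank, size, host, cwd);

        MPI_Finalize();
        return 0;
    }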

mpiexec: failed to obtain sock from manager

Hmm, this is what we are currently facing: some NFS configuration problem, most probably. It's like a video game. You need to fight a monster to get to the next level and fight a bigger one. Right now the game's saved at this level. I do hope we complete all the levels someday.
