In the previous Data Storage hands-on section, you should have copied the class scripts to your /data area. If you skipped or missed that section, type hpc-classes biowulf now. This command will copy the scripts and input files used in this online class to your /data area, and will take about 5 minutes.
In the following session, you will resubmit the Blat swarm that you ran earlier, but this time as a swarm bundle. If you're not familiar with Blat, don't worry; this is just an example, and the basic principles of job submission are not specific to Blat.
cd /data/$USER/hpc-classes/biowulf/swarm

# submit the swarm
swarm -f blat.swarm -g 4 --module blat -b 4
Answer
Each subjob runs 4 commands sequentially, so it could take up to 4 times as long as a subjob submitted without the '-b 4' flag. However, Blat is a pattern-matching program that reads the file /fdb/genome/hg19/chr10.fa. With a bundled swarm, this file is read into memory on the node during the first run, so the 2nd, 3rd, and 4th runs will be faster (this is especially true for a larger database file such as the entire human genome, /fdb/genome/hg19/chr_all.fa, which is 3 GB). Thus, if I/O is a major factor in the job and the jobs are set up correctly (sufficient memory), a bundled swarm can take less time than an unbundled swarm.
Another factor to consider is how busy the cluster is. If there are lots of free resources, all the subjobs of an unbundled swarm may start up right away and complete quickly. But if the cluster is busy, only a few of the jobs may run at a time. In that case, you may be better off bundling your swarm, so that you have only a few subjobs and they all start quickly.
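To see how bundling changes the subjob count, here is a simplified model of what '-b 4' does: it splits the list of command lines into groups of 4, and each group becomes one subjob that runs its commands one after another. This sketch uses a stand-in command file and hypothetical filenames; it does not reflect swarm's internal implementation.

```shell
# Simulate bundling 10 command lines with a bundle factor of 4.
tmp=$(mktemp -d)
cd "$tmp"
seq 1 10 | sed 's/^/echo command /' > cmds.txt   # stand-in for blat.swarm (10 lines)
split -l 4 cmds.txt bundle_                      # creates bundle_aa, bundle_ab, bundle_ac
ls bundle_* | wc -l                              # 3 subjobs, i.e. ceil(10/4)
wc -l < bundle_aa                                # first subjob runs 4 commands sequentially
wc -l < bundle_ac                                # last subjob runs the remaining 2
```

With 10 commands, an unbundled swarm would start 10 subjobs; '-b 4' reduces that to 3, which is why bundling helps when the cluster is busy and only a few subjobs can start at once.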