Fully Utilizing Your X-Core CPU - Page 2

3. Logging

When ppss.sh is working, a log directory is created right under the working directory. When something does not work as expected, you should have a deep look into this directory, which by default is named ppss. There is a log of ppss.sh itself, and also logs of the commands it performs. If all works well the ppss directory could be deleted after execution.


4. Distributed ppss.sh

One feature of ppss.sh was completely ignored until now. Since version 2.0 it is able to parallelize such kind of jobs not only over the x-cores of one machines, but distributed over x-cores of x-machines. So we have an easy way to do some HPC computing ! When the nodes of the HPC cluster were equipped with OpenCL or DirectCompute capable graphiccards, a simple way to a kind of numbercrunching HPC cluster might also be possible !

The grid is organized in a master-slave structure, with one master and x slaves. The communication between the master and the nodes in the grid is achieved with the help of ssh. The files to be processed have to be on one place reachable from all nodes, for instance a NFS or SMB share on the master (maybe sshfs also works), but scp could also be used. The OSes on the nodes could be different, there can also be different CPU's in the grid.

Requirements are a bit higher, than in a standalone installation, additionally sshd and screen are needed. Also there has to be a unprivileged account on all machines, with passwordless node to server ssh connection.

Setting up such a gridcluster is detailled described in the wiki on the website, as there are detailled instructions for every aspect of ppss.sh

It's done in 4 steps:

  • Setup an account and SSH access on the server and all nodes.
  • Create a list of all nodes.
  • Create a configfile for ppss on the server, that will be distributed to nodes.
  • Deploy ppss to the nodes, and run it

I have built a server/2 node configuration (the server itself also being a node, which is not recommended on larger setups) for demonstration purposes, outlined below. The Server is Ubuntu 9.04, the node is Debian 5. I created an account on all systems intentionally named ppss, so the homedirectory is /home/ppss. I decided to make a subdir /home/ppss/mp on the server, which should hold the files to be processed. I also created a subdir /home/ppss/mp on the node to be used as mountpoint, and manually mounted the subdir /home/ppss/mp of the server there by doing a

sshfs ppss@srvr:/home/ppss/mp ~/mp

on the node. This way there are the same paths on boths systems.

The accounts are created by

adduser -m ppss

If you have a lot of nodes clusterssh would simplify this step. You also would be well advised either to keep all UIDs/GID's in sync on all systems, or to use LDAP based authentication, outlined for instance in http://www.howtoforge.com/linux_openldap_setup_server_client

Create the passwordless ssh access like outlined for instance in http://www.debian-administration.org/articles/152 or http://www.howtoforge.com/set-up-ssh-with-public-key-authentication-debian-etch and do a first login (to accept the fingerprint).

Then use your favourite editor to create a file named nodes.txt on the server, one IP adress/hostname on each line.

The most complicated step is to create the configfile. I wanted only to perform the simple task to downmix some mp3 files using lame, like we did in one of the above examples.

Here is an example, run it in the homedir of the ppss User:

ppss.sh config -C config.cfg -c 'lame $ITEM -V 4 -B 160 "${ITEM%.mp3}.downmp3"' -d /home/ppss/mp -m srvr -u ppss -K ~/.ssh/known_hosts -n /home/ppss/nodes.txt -k /home/ppss/ppss-private.key

You should get the configfile, and should now be able to deploy it on the nodes (listed in the configfile), by doing a

ppss.sh deploy -C config.cfg

Afterwards you should be able to start the whole process by doing a

ppss.sh start -C config.cfg


5. Bottleneck

If you build a large gridcluster for HPC numbercrunching, the server with the files may become a bottleneck because he has to do a lot of disk-IO and a lot of networktraffic aggregates on this system. Also the method the filesystem with the files to be processed may be critical. scp/sshfs may be worse than NFS, but NFS also has the reputation to suffer under heavy load. On the roadmap of ppss.sh is the point, to use netcat for transfering the data over the net. netcat has very little overhead, so this should help. An alternative might be pNFS coming with NFS 4.1 (coming with Linux kernel 2.6.30).

But the whole system, favourably the IO subsystem and the network interface should be as fast as possible. Maybe it's worth thinking about RAID and NIC-bonding.


6. Alternatives

A simple alternative to ppss.sh doing parallelized processing, which should not be unmentioned here, is xargs in piped cooperation with find. This way you can also perform parallelized processing in the above style, but only on the local system, and you have to decide yourself howmuch concurrent processes you run.

A simple example looks like

find . -name '*.wav' | xargs -n1 -P2 oggenc

Another way to invoke parallel processes on more than one host, but interactively, are pdsh, or clusterssh. With this kind of apps you simultaneously login into a number of hosts, and can issue the same commands, but it might be difficult to perform such batch operations as with ppss.sh are possible. But clusterssh is a good mean to simplify administraton of all nodes in the grid outlined above.


7. HPC

There is a lot of stuff floating around in the net regarding HPC on Linux. The most famous is the Beowulf project launched in the mid of the 90's.

A good entrypoint into the HPC topic is LinuxHPC.

A distribution which might be suited well to build a single-user Ad-Hoc HPC cluster might be PelicanHPC. PelicanHPC is a Live-Linux which could be started from a CD/DVD and has an integrated PXE Server, from where all nodes could easily boot. So a single-user HPC cluster could be implemented without installing a byte on one of the involved systems.

Another lightweight distribution for building HPC clusters is CAOS. CAOS has perceus integrated which is a cluster management system. Warewulf, which is also integrated, does Monitoring.


8. URL's

Share this page:

9 Comment(s)

Add comment


From: Anonymous at: 2010-03-04 01:58:24

Nice article..

But just to let you know its not recommend to run applications as root for security reasons..  

From: Ole Tange at: 2010-06-07 22:21:41

I have here translated your examples to GNU Parallel http://www.gnu.org/software/parallel/ It makes the examples somewhat easier to read:

for i in *.flac; do oggenc -q3 -o ${i%%flac}ogg $i; done

ls *.flac | parallel -j+0 oggenc -q3 -o {.}.ogg {}

for file in *.flac; do $(flac -cd "$file" | lame -h - "${file%.flac}.mp3"); done

ls *.flac | parallel -j+0 'flac -cd {} | lame -h - {.}.mp3'

for i in *.mp3; do lame -V 4 -B 160 "$i" "${i%.mp3}.downmp3"'; done

ls *.mp3 | parallel lame -V 4 -B 160 {} {.}.downmp3

ppss.sh -f ppss_list -d /path/to/images -c 'mogrify -solarize 50 "$ITEM"'

cat feh_list | parallel mogrify -solarize 50

ppss.sh config -C config.cfg -c 'lame $ITEM -V 4 -B 160 "${ITEM%.mp3}.downmp3"' -d /home/ppss/mp -m srvr -u ppss -K ~/.ssh/known_hosts -n /home/ppss/nodes.txt -k /home/ppss/ppss-private.key

ls *.mp3 | parallel -j+0 --sshloginfile nodes.txt --trc {.}.downmp3 lame -V 4 -B 160 {} {.}.downmp3


From: linuxease.com at: 2010-03-02 00:13:48

Most 64bit distributions do not need this.  I find both my cpu's working in Linux Mint system monitor if I get a few apps going at once.  Will this squeeze more performance out of a dual core???

From: Anonymous at: 2010-03-04 03:36:05

The above statement "Most 64bit distributions do not need this." is incorrect as implied by the follow up question, "Will this squeeze more performance out of a dual core???"

 The article details methods to distribute jobs that would otherwise only use one core, over multiple cores.

 If the program can not be, or is not yet optimised to run over multiple cores, a convenient way to parallelise the work load is detailed.

For example, convert song a from flac to mp3 on one core whilst converting song b from flac to mp3 on another.

From: JohnP at: 2010-03-07 14:13:32

I think some folks are missing the point. Modern OS schedulers will spread new running processes and threads across the available CPUs automatically. The problem arises when we overload the number of tasks, especially IO bound tasks, that prevent more and more of them from getting run while the OS is trying to give each running task some amount of CPU.

This is where Task Spooler and Parallel Processing Shell Script help out, but in slightly different ways. I can use a use where ppss.sh feeds jobs into ts.

These tools are both different than tools for HA cluster management or to spread CPU intensive jobs over many different computers. Some folks need those, but that isn't what ppss.sh is about.

From: at: 2010-03-02 06:03:34

I wonder if u have ever tried kerrighed. It requires lesser meddling in running ur job. It does autoload balance for u. (though patching the kernel initially is a female dog)



From: JohnP at: 2010-03-03 18:08:03

Task Spooler provides a FIFO queue for running jobs. You can configure multiple queues or set a default number of concurrent jobs. I routinely set the set the number of max simultaneous jobs to 4, ts -S4 (quad core CPU). Then push 20+ video transcoding jobs onto the queue.

`ts transcode 1.mpg -o 1.avi`
`ts transcode 2.mpg -o 2.avi`
`ts transcode 3.mpg -o 3.avi`
`ts transcode 4.mpg -o 4.avi`
`ts transcode 5.mpg -o 5.avi`
The 5th job will be delayed until one of the others tasks completes. Log files with the output are stored too. They will need to be manually cleaned up eventually.

From: Louwrentius at: 2010-03-03 23:33:09

Thank you for this elaborate howto on PPSS. Please note that the -f option should not require the -d option and PPSS should be able to process absolute paths as specified in the file supplied to the -f option. If not, this is a bug. 



From: Leo at: 2010-03-10 18:19:59

This is very nice. An alternative (I've used for years) is simply to use "at", "atq", and "atd". It lets you queue processes, and you can specify what's the load at which a new process in the queue is automatically launched (basically, if you have 4 processors and N intensive processes to be run, you can queue them all and get them run, for at a time).

Load balancing is awesome.