Fully Utilizing Your X-Core CPU

Almost all systems sold nowadays have at least a dual-core CPU, and triple- and quad-core CPUs are getting cheaper and will become standard in the near future. But how do you use your shiny x-core CPU to its full potential when your applications only use one core? Linux, like all Unix-like operating systems, has strong multitasking capabilities, which makes it easy to parallelize tasks that normally use only one core of an x-core CPU.

You should be familiar with working in the shell, because all work is done there. No browser, no fancy GUIs, no eye candy.

 

Disclaimer

The following article describes the way I installed and used the software. I do not guarantee that the same way works for you.

 

1. Parallel Processing Shell Script

This script is hosted on http://code.google.com/p/ppss/. It is released under BSD licensing terms, changing to GPL with version 2.55. It is not packaged in the repositories of the popular Linux distributions, so it has to be installed manually.

Full, detailed documentation is available on the hosting site, and the script can be downloaded from there as an archive. As the archive contains only the script, I unpacked it to /usr/local/bin and made sure that the script belongs to root and is world-readable and world-executable:

wget http://ppss.googlecode.com/files/ppss-2.50.tgz
tar xvzf ppss-2.50.tgz -C /usr/local/bin
chown root:root /usr/local/bin/ppss.sh && chmod a+rx /usr/local/bin/ppss.sh
ls -l /usr/local/bin/ppss.sh

-rwxr-xr-x 1 root root 54612 2009-12-17 16:40 /usr/local/bin/ppss.sh*

Apart from bash, the only requirement is the mkfifo command, which is usually part of the GNU coreutils package. As we are installing without a package manager, you have to make sure yourself that all dependencies are met. ppss.sh runs not only on Linux but also on Mac OS X, and maybe on *BSD as well.
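
Since no package manager checks these dependencies for you, a quick test before the first run can help. The following is a minimal sketch; the function name check_deps is my own and not part of ppss.sh:

```shell
#!/bin/sh
# Report any of the given commands that cannot be found in PATH.
# check_deps is a hypothetical helper, not part of ppss.sh.
check_deps() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# bash and mkfifo are the two requirements named above.
if check_deps bash mkfifo; then
    echo "all dependencies found"
fi
```

The function returns a non-zero status as soon as one command is missing, so it can also be used in front of a longer script.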

When you call it without any arguments, it should show its help screen:

$ ppss.sh 

|P|P|S|S| Distributed Parallel Processing Shell Script 2.50

usage: /usr/local/bin/ppss.sh [ -d  | -f  ]  [ -c ' "$ITEM"' ]
                 [ -C  ]  [ -j ] [ -l  ] [ -p <# jobs> ]
                 [ -D  ] [ -h ] [ --help ]

Examples:
                 /usr/local/bin/ppss.sh -d /dir/with/some/files -c 'gzip '
                 /usr/local/bin/ppss.sh -d /dir/with/some/files -c 'gzip "$ITEM"' -D 5
                 /usr/local/bin/ppss.sh -d /dir/with/some/files -c 'cp "$ITEM" /tmp' -p 2

For basic usage of ppss.sh we don't need to configure anything, as the script is smart enough to detect how many cores the CPU of the machine it is running on has.
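
How ppss.sh detects the core count internally is not covered here, but the number it should report can be checked by hand. A minimal sketch, assuming a system that offers nproc (GNU coreutils), getconf, or /proc/cpuinfo; count_cores is a name of my own:

```shell
#!/bin/sh
# Print the number of online logical processors.
# count_cores is a hypothetical helper, not part of ppss.sh.
count_cores() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    elif getconf _NPROCESSORS_ONLN >/dev/null 2>&1; then
        getconf _NPROCESSORS_ONLN
    else
        # Linux fallback: one "processor" line per logical CPU
        grep -c '^processor' /proc/cpuinfo
    fi
}

echo "logical processors: $(count_cores)"
```

If you want fewer workers than cores, the -p option from the help screen above sets the number of parallel jobs.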

 

2. Examples

A job perfectly suited to being parallelized with ppss.sh is converting a bunch of .flac files to, for instance, Ogg Vorbis files for listening on your portable player. This can easily be done in a for loop, and we are lucky that oggenc is capable of reading .flac files:

for i in *.flac; do oggenc -q3 -o "${i%%flac}ogg" "$i"; done

This is the sequential approach: only one of your cores is used. This can be seen by issuing a

top -n1

in another window, while the conversion is running:

Tasks: 112 total,   2 running, 110 sleeping,   0 stopped,   0 zombie
Cpu(s): 56.9%us,  0.6%sy,  0.0%ni, 41.2%id,  1.1%wa,  0.0%hi,  0.2%si, 0.0%st
Mem:   4062968k total,  2303836k used,  1759132k free,    36268k buffers
Swap:   522072k total,        0k used,   522072k free,  1558228k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND          
 7248 root      20   0 19780 2604 1468 R   88  0.1   0:05.02 oggenc  
...
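
The for loop shown earlier builds each output name with shell parameter expansion: ${i%%flac} removes the trailing text flac from $i, and ogg is appended afterwards. The following sketch shows the expansion in isolation; to_ogg and the sample file names are made up for illustration:

```shell
#!/bin/sh
# Derive an .ogg name from a .flac name (to_ogg is a hypothetical helper).
to_ogg() {
    printf '%s\n' "${1%.flac}.ogg"
}

to_ogg "01 - Intro.flac"        # prints: 01 - Intro.ogg

# % removes the shortest matching suffix, %% the longest.
# The difference only shows when the pattern contains a wildcard:
f="album.tar.gz"
printf '%s\n' "${f%.*}"         # prints: album.tar
printf '%s\n' "${f%%.*}"        # prints: album
```

The double quotes around the expansions matter as soon as a file name contains spaces, which is common in music collections.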

Let's try to parallelize it with ppss.sh:

ppss.sh -d /path/to/your/flacfiles/ -c 'oggenc "$ITEM" -q3 -o "${ITEM%%.flac}.ogg"'

Instead of the loop variable i we have to use the variable ITEM, which ppss.sh presets as its "loop" variable. We get the following output on the screen:

Feb 03 20:16:00:  =========================================================
Feb 03 20:16:00:                         |P|P|S|S|                        
Feb 03 20:16:00:  Distributed Parallel Processing Shell Script version 2.50
Feb 03 20:16:00:  =========================================================
Feb 03 20:16:00:  Hostname:             xxxxx
Feb 03 20:16:00:  ---------------------------------------------------------
Feb 03 20:16:00:  CPU: AMD Athlon(tm) Dual Core Processor 4850e
Feb 03 20:16:00:  Found 2 logic processors.
Feb 03 20:16:00:  Starting 2 parallel workers.
Feb 03 20:16:00:  ---------------------------------------------------------
Feb 03 20:17:00:  Currently 100 percent complete. Processed 7 of 7 items.
Feb 03 20:17:00:  1 job is remaining.      
Feb 03 20:17:34:  Finished. Consult ./ppss/job_log for job output.

And another

top -n1

shows that all cores are utilized:

Tasks: 117 total,   3 running, 114 sleeping,   0 stopped,   0 zombie
Cpu(s): 99.4%us,  0.6%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si, 0.0%st
Mem:   4062968k total,  2070004k used,  1992964k free,    38916k buffers
Swap:   522072k total,        0k used,   522072k free,  1709844k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND          
10933 root      20   0 19856 2600 1468 R   99  0.1   0:06.20 oggenc            
10854 root      20   0 19856 2600 1468 R   87  0.1   0:08.51 oggenc     
...
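
The general idea behind this, a fixed pool of workers that each pick up the next unprocessed file, can be sketched with standard tools. This is not how ppss.sh is implemented; it is a minimal illustration that assumes a find with -maxdepth and -print0 and an xargs with -0 and -P (the GNU and BSD versions have them), and it uses a harmless placeholder command instead of oggenc. run_pool is a name of my own:

```shell
#!/bin/sh
# Run a command on every matching file in a directory, a few at a time.
# run_pool is a hypothetical helper: $1 = number of workers,
# $2 = directory, $3 = file name pattern, the rest is the command,
# which receives one file name as its last argument.
run_pool() {
    workers=$1
    dir=$2
    pattern=$3
    shift 3
    find "$dir" -maxdepth 1 -type f -name "$pattern" -print0 |
        xargs -0 -n 1 -P "$workers" "$@"
}

# Demo with throwaway files: each worker writes a byte count next to its input.
demo=$(mktemp -d)
for n in 1 2 3 4; do echo "track $n" > "$demo/$n.txt"; done
run_pool 2 "$demo" '*.txt' sh -c 'wc -c < "$1" > "$1.size"' sh
ls "$demo"
```

ppss.sh adds progress reporting and a job log on top of this basic pattern, as the output above shows.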

Another example would be to convert the same .flac files to .mp3. This is a bit more complicated, because we need flac to decode the .flac files and lame to encode them. This can be done sequentially in a for loop, piping stdout of flac to stdin of lame:

for file in *.flac; do flac -cd "$file" | lame -h - "${file%.flac}.mp3"; done

Another

top -n1

shows that only one core is used:

Tasks: 118 total,   3 running, 115 sleeping,   0 stopped,   0 zombie
Cpu(s): 99.3%us,  0.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si, 0.0%st
Mem:   4062968k total,  2080004k used,  1982964k free,    37926k buffers
Swap:   522072k total,        0k used,   522072k free,  1709844k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND          
17837 root      20   0 19856 2600 1468 R   99  0.1   0:04.70 oggenc            
...

The same parallelized by ppss.sh:

ppss.sh -d /path/to/flacfiles -c 'flac -cd "$ITEM" | lame -V 4 -B 160 - "${ITEM%.flac}.mp3"'

Here

top -n1

shows that both cores are utilized nearly to their full potential:

Tasks: 129 total,   3 running, 126 sleeping,   0 stopped,   0 zombie
Cpu(s): 98.2%us,  1.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.2%hi,  0.0%si, 0.0%st
Mem:   4062968k total,  2629956k used,  1433012k free,    33996k buffers
Swap:   522072k total,        0k used,   522072k free,  1890244k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
 7287 root      20   0 14788 2480 1124 R   93  0.1   0:13.41 lame               
 7210 root      20   0 14788 2484 1124 R   89  0.1   0:23.59 lame               
 7209 root      20   0 15776 1060  792 S    7  0.0   0:01.67 flac               
 7286 root      20   0 15776 1064  792 S    6  0.0   0:00.95 flac     

A last example (dealing with sound files) would be to downsample some .mp3 files encoded at 320 kbit/s to 128 kbit/s VBR, again to save precious space on the portable player. This can be done with lame alone, without a separate decoder as in the previous example:

for i in *.mp3; do lame "$i" -V 4 -B 160 "${i%.mp3}.downmp3"; done

Parallelizing this with ppss.sh looks like

ppss.sh -d /path/to/320kBit/mp3_files -c 'lame "$ITEM" -V 4 -B 160 "${ITEM%.mp3}.downmp3"'

and top shows both cores busy:

Tasks: 132 total,   3 running, 129 sleeping,   0 stopped,   0 zombie
Cpu(s): 98.8%us,  1.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.2%si, 0.0%st
Mem:   4062968k total,  2441644k used,  1621324k free,    37068k buffers
Swap:   522072k total,        0k used,   522072k free,  1671080k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND          
 9073 root      20   0 14792 2724 1248 R   98  0.1   0:10.80 lame              
 8994 root      20   0 14788 2720 1248 R   98  0.1   0:15.57 lame

As we do not want to overwrite the 320 kbit/s files, we simply give the downsampled files the extension .downmp3. After the downsampling is done, you can move the downsampled files out of the way with mmv:

mmv "*.downmp3" "/new/path/#1.mp3"

This renames the files and moves them elsewhere with one command. mmv, by the way, should be in the repositories of most common distributions.
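
If mmv is not available, a plain shell loop does the same job. A minimal sketch; move_downsampled and the two throwaway directories are placeholders of my own:

```shell
#!/bin/sh
# Move every *.downmp3 file from $1 to $2, renaming it to *.mp3.
# move_downsampled is a hypothetical helper.
move_downsampled() {
    src_dir=$1
    dest_dir=$2
    for f in "$src_dir"/*.downmp3; do
        [ -e "$f" ] || continue          # no match: the glob stays literal
        name=${f##*/}                    # strip the directory part
        mv -- "$f" "$dest_dir/${name%.downmp3}.mp3"
    done
}

# Demo on throwaway directories.
demo_src=$(mktemp -d)
demo_dest=$(mktemp -d)
touch "$demo_src/song one.downmp3" "$demo_src/song two.downmp3"
move_downsampled "$demo_src" "$demo_dest"
ls "$demo_dest"
```

Like the mmv call, this leaves the original 320 kbit/s files untouched, because only files ending in .downmp3 are matched.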

Another feature of ppss.sh is that you can use a file to provide it with the items to process. Say you have a directory with a lot of pictures. You first want to view all of them with the image viewer feh and use feh's filelist feature to pick some files for later processing. The processing is to be done with ImageMagick, which has a lot of features for filtering and manipulating images. As image manipulation can be CPU-intensive, this is perfectly suited to ppss.sh.

A filelist of feh looks like

/path/to/images/1.png
/path/to/images/3.png
/path/to/images/17.png
...
/path/to/images/13815.png

It is produced with the -f filelist command-line parameter. You can delete an image from the filelist by pressing the <DELETE> key while in feh. Unfortunately absolute paths cannot be used with ppss.sh, but feh itself can be used to convert its filelist to one suitable for ppss.sh:

feh *.png -f feh_list -L %n > ppss_list

or awk:

awk -F/ '{ print $NF }' feh_list > ppss_list
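
Both commands reduce every line to the part after the last slash. The same can be done in pure shell with the ${line##*/} expansion, which removes the longest prefix ending in a slash. A sketch using sample paths from the filelist above; strip_dirs is a name of my own:

```shell
#!/bin/sh
# Read paths on stdin, print only the file name of each.
# strip_dirs is a hypothetical helper.
strip_dirs() {
    while IFS= read -r line; do
        printf '%s\n' "${line##*/}"
    done
}

printf '%s\n' /path/to/images/1.png /path/to/images/17.png | strip_dirs
# prints:
# 1.png
# 17.png
```

Like the awk version, this assumes that all selected images live in the same directory, the one later passed to ppss.sh with -d.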

Say you want to do some really time-consuming filtering on all the selected files. ImageMagick cannot read feh's filelists; if you want to process a whole directory, you can use the for loop method mentioned above. feh can also build recursive filelists, which a simple for loop with ImageMagick could not handle. But back to the example: now that you have a filelist, you can do the nifty filtering like this:

ppss.sh -f ppss_list -d /path/to/images -c 'mogrify -solarize 50 "$ITEM"'

(I only do a solarize, but you get the idea).

On the screen you see ppss working:

Feb 03 15:06:06:  =========================================================
Feb 03 15:06:06:                         |P|P|S|S|                         
Feb 03 15:06:06:  Distributed Parallel Processing Shell Script version 2.50
Feb 03 15:06:06:  =========================================================
Feb 03 15:06:06:  Hostname:		xxxxxxxx
Feb 03 15:06:06:  ---------------------------------------------------------
Feb 03 15:06:06:  CPU: Intel(R) Core(TM)2 Duo CPU     T5470  @ 1.60GHz
Feb 03 15:06:06:  Found 2 logic processors.
Feb 03 15:06:06:  Starting 2 parallel workers.
Feb 03 15:06:06:  ---------------------------------------------------------
Feb 03 15:06:17:  Currently 100 percent complete. Processed 65 of 65 items.
Feb 03 15:06:17:  1 job is remaining.       
Feb 03 15:06:17:  Finished. Consult ./ppss/job_log for job output.