Creating a Handbrake Distributed / Cluster setup

Posted: Fri Aug 06, 2010 1:10 am
by louwrentius
Encoding or transcoding a single video file on multiple systems is not worth the effort. But you can create a Handbrake cluster that processes multiple files in parallel. I've written down some instructions about how to achieve this. I hope someone finds this useful.

Say you have about 20 DVDs you want to convert to MP4 to save some space. Or you want to transcode an entire season of a series that you've bought on DVD to a format your phone or tablet can handle.

You will need:

Knowledge:
- Some understanding of SSH and keys.
- Some understanding of Linux or the Terminal on Mac OS X.
- The ability to configure NFS or SMB on Linux / Mac OS X.

Stuff:
- Some Linux or Mac machines. Windows is not supported, sorry.
- HandBrakeCLI installed on all machines.
- A central file server that is accessible through NFS or SMB.
- SSH access to all systems that participate in the en/transcoding and the 'master' server.
- PPSS (http://code.google.com/p/ppss/) Parallel Processing Shell Script (full disclosure: I wrote PPSS.)
- An example PPSS config file (see below).
- A script that drives HandBrakeCLI to transcode a DVD to MP4.

The hard part is setting up PPSS. This takes some work, since it requires working SSH key access between all systems. But it does work, and it may well be worth the effort.

0. Short intro about PPSS so you know what you're building:

PPSS is just a shell script.

You feed PPSS a list of files and PPSS hands these files out to nodes for processing. PPSS allows you, for instance, to have 5 machines processing those 20 DVDs we talked about earlier. A master node is used to manage the queue, and nodes access this queue through SSH. Thus, all nodes must be able to talk to the master through SSH. Each node checks through SSH whether an item has already been claimed; if not, it claims and processes the item, otherwise it tries the next one.

A single system acts as the 'master' server and is used to manage the queue. The master itself doesn't do any processing: it is just a single SSH server that all nodes use for communication (claiming items). The same machine can also participate as a node.
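To make the claiming idea concrete, here is a minimal local sketch. It assumes one lock directory per item and relies on mkdir failing when the directory already exists. The names claim_item, next_item and LOCK_DIR are made up for this illustration and are not taken from the PPSS source; in the real setup the equivalent check happens on the master over SSH.

```shell
#!/bin/sh
# Hypothetical illustration of item claiming; not PPSS's actual code.
# LOCK_DIR stands in for a directory that lives on the master server.
LOCK_DIR="${LOCK_DIR:-/tmp/claim-demo-locks}"

# claim_item ITEM: succeed only for the first caller that claims ITEM.
# mkdir fails if the directory already exists, so it works as an atomic lock.
claim_item() {
    mkdir -p "$LOCK_DIR" || return 1
    # Flatten the item path into a lock name (slashes and spaces become underscores).
    lock_name=$(printf '%s' "$1" | tr '/ ' '__')
    mkdir "$LOCK_DIR/$lock_name" 2>/dev/null
}

# next_item LIST_FILE: print the first item in the list that nobody has claimed yet.
next_item() {
    while IFS= read -r item; do
        if claim_item "$item"; then
            printf '%s\n' "$item"
            return 0
        fi
    done < "$1"
    return 1
}
```

Each node would call something like next_item dvds.txt in a loop and stop once it returns non-zero, which means every item has been claimed.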

1. Create a list of files you want to process.

I assume that you've already ripped the DVDs, so that the input consists of ISO files or directories containing VOB files. This is how you can create an input list to process:

Example:

Code:

find /mnt/input/2\ -\ Video/0\ -\ Films/ -type f -iname "*.iso" >> dvds.txt
find /mnt/input/2\ -\ Video/0\ -\ Films/ -type f -iname "*.vob" -exec dirname {} \; | sort | uniq >> dvds.txt
The /mnt/input directory is mounted through NFS at the same location on every node, so each path in dvds.txt is valid on all of them.
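Since everything depends on the paths being identical on all nodes, a quick check on each node can save a failed run. The following is a small sketch; the function name check_list is made up for this example.

```shell
#!/bin/sh
# check_list FILE: print every entry in FILE that does not exist on this
# machine and return non-zero if at least one entry is missing.
check_list() {
    missing=0
    while IFS= read -r entry; do
        [ -n "$entry" ] || continue        # skip blank lines
        if [ ! -e "$entry" ]; then
            printf 'missing: %s\n' "$entry"
            missing=1
        fi
    done < "$1"
    return "$missing"
}
```

Running check_list dvds.txt on a node should print nothing. Any "missing:" line suggests the NFS share is not mounted there, or is mounted at a different path.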

2. Install and configure your nodes to work with PPSS

- You must create a 'ppss' user account on every node system.
- Generate an SSH private and public key pair and add the public key to the 'authorized_keys' file of the ppss account on all nodes. It must also be added to the ppss account on the master system, since all nodes will use this key to access the master server. The SSH key is used both for deploying PPSS to the nodes and for the nodes to communicate with the master server.

If you log in to the master server over SSH, the master's host key is added to your ~/.ssh/known_hosts file. Copy this entry, since you will need it later on.
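The key setup could look roughly like the sketch below. The file name ppss-key-dsa matches the config used in this topic; the RSA key type in the comment is an assumption, chosen because recent OpenSSH releases disable DSA keys by default. The helper install_pubkey is a made-up name; it only appends the key if it is not there yet, so it is safe to run more than once.

```shell
#!/bin/sh
# Example key generation (run once, in the working directory):
#   ssh-keygen -t rsa -N "" -f ppss-key-dsa
# This creates ppss-key-dsa (private) and ppss-key-dsa.pub (public).

# install_pubkey PUBKEY_FILE AUTHORIZED_KEYS_FILE
# Append the public key to authorized_keys unless that exact line is present.
install_pubkey() {
    key=$(cat "$1") || return 1
    mkdir -p "$(dirname "$2")" && touch "$2" || return 1
    chmod 600 "$2"
    if ! grep -qxF -- "$key" "$2"; then
        printf '%s\n' "$key" >> "$2"
    fi
}
```

On every node and on the master you would run, as the ppss user, something like install_pubkey ppss-key-dsa.pub ~/.ssh/authorized_keys.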

3. Create a PPSS configuration file:

Example of how to generate a PPSS config with PPSS itself:

Code:

ppss config -C config.cfg -f dvds.txt -c './transcode2mp4.sh "$ITEM" /mnt/output/mp4/ source' -u ppss -k ppss-key-dsa -n nodes.txt -m 192.168.0.10 -p 1 -S ./transcode2mp4.sh -K known_hosts
Please consult the PPSS manual for the details of each option, although most of them are self-explanatory if you look closely.
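The command above hands each item to ./transcode2mp4.sh together with an output directory. The real script is not reproduced in this post, so the following is only a hypothetical stand-in to show the general shape. The preset name is an assumption (it exists in HandBrake 1.x; older versions use different names), and the third argument ('source' in the command above) is ignored because its meaning is not described here.

```shell
#!/bin/sh
# Hypothetical stand-in for transcode2mp4.sh; not the author's actual script.
# Usage: transcode2mp4.sh INPUT OUTPUT_DIR [MODE]

# Assumed preset name; list the presets of your build with
# "HandBrakeCLI --preset-list".
PRESET="${PRESET:-Fast 1080p30}"

# output_path INPUT OUTPUT_DIR: derive OUTPUT_DIR/<title>.mp4 from an ISO
# file or a DVD directory. For ".../Title/VIDEO_TS" the parent name is used,
# otherwise every ripped DVD would map to the same VIDEO_TS.mp4.
output_path() {
    src=${1%/}
    name=$(basename "$src")
    if [ "$name" = "VIDEO_TS" ]; then
        name=$(basename "$(dirname "$src")")
    fi
    case "$name" in
        *.iso|*.ISO) name=${name%.*} ;;
    esac
    printf '%s/%s.mp4\n' "${2%/}" "$name"
}

# transcode INPUT OUTPUT_DIR: encode the main feature of INPUT to MP4.
transcode() {
    out=$(output_path "$1" "$2")
    HandBrakeCLI -i "$1" -o "$out" --main-feature --preset "$PRESET"
}

# Run only when arguments are given, so the functions can also be sourced.
if [ "$#" -ge 2 ]; then
    transcode "$1" "$2"
fi
```

Deriving the output name from the input keeps the script stateless, which suits PPSS: any node can pick up any item without coordination beyond the claim itself.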

Take note of the -p 1 option (MAX_NO_OF_RUNNING_JOBS=1 in the config file): HandBrake usually keeps all CPU cores busy on its own, so you should start only one HandBrake instance per host. The whole point of this exercise is adding more computers to the mix, not more processes per computer. If you configure HandBrake in such a way that it does not use all available CPU cores, you can let PPSS run more than one HandBrake instance per node, for example with -p 2.
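As a rough worked example of that trade-off: if each HandBrake instance is limited to a fixed number of threads, the value for -p is simply the core count divided by that number, with a minimum of one. The function names below are made up for this sketch.

```shell
#!/bin/sh
# cpu_count: number of logical CPUs on Linux (nproc) or Mac OS X (sysctl).
cpu_count() {
    nproc 2>/dev/null || sysctl -n hw.ncpu
}

# jobs_for CORES THREADS_PER_JOB: how many HandBrake instances fit on a
# node, never fewer than one.
jobs_for() {
    n=$(( $1 / $2 ))
    [ "$n" -ge 1 ] || n=1
    printf '%s\n' "$n"
}
```

For instance, jobs_for 8 4 gives 2, so a node with 8 cores running 4-thread encodes would use -p 2, while jobs_for "$(cpu_count)" 4 computes the value for the current machine.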

Example contents of a PPSS configuration file (config.cfg):

Code:

INPUT_FILE=dvds.txt
COMMAND='./transcode2mp4.sh "$ITEM" /mnt/output/mp4/ source'
USER=ppss
SSH_KEY=ppss-key-dsa
NODES_FILE=nodes.txt
SSH_SERVER=192.168.0.10
SCRIPT=./transcode2mp4.sh
SSH_KNOWN_HOSTS=known_hosts
PPSS_LOCAL_TMPDIR=ppss_dir/PPSS_LOCAL_TMPDIR
PPSS_LOCAL_OUTPUT=ppss_dir/PPSS_LOCAL_OUTPUT
MAX_NO_OF_RUNNING_JOBS=1
The nodes file contains the IP addresses of all the nodes and looks like this. You can also add the master host, in which case it acts as both master and node:

Code:

192.168.0.10
192.168.0.100
192.168.0.101
192.168.0.102
4. Create a working directory.

Put the following in this directory:
- the ppss script itself and the transcode2mp4.sh script referenced in the config file.
- the PPSS config file.
- the PPSS SSH private key, which is used to access all nodes and is used by the nodes to access the master server.
- the nodes file containing the DNS names or IP addresses of all nodes.
- the file 'dvds.txt' listing everything you want to process.
- a known_hosts file containing the known_hosts entry of the master server. Otherwise you will have to log in to the master server with ssh as the ppss user on every node to accept the SSH server key.

5. Deploy PPSS to all nodes

Run ./ppss deploy -C config.cfg to deploy all files to all nodes. Make sure that all systems have a valid NFS mount for both the source and the destination directory.

6. Start encoding!

Run ./ppss start -C config.cfg to start the encoding process on all systems.

Run ./ppss status -C config.cfg to show the current status.

Code:

Aug 06 03:07:51:  
Aug 06 03:07:51:  =========================================================
Aug 06 03:07:51:                         |P|P|S|S|                         
Aug 06 03:07:51:  Distributed Parallel Processing Shell Script vers. 2.84
Aug 06 03:07:51:  =========================================================
Aug 06 03:07:51:  Hostname:		Core-i7-Linux
Aug 06 03:07:51:  ---------------------------------------------------------
Aug 06 03:07:51:  CPU: Intel(R) Core(TM) i7 CPU         920  @ 2.67GHz
Aug 06 03:07:53:  Status:		2 percent complete.
Aug 06 03:07:53:  Nodes:	 3
Aug 06 03:07:53:  Items:		203
Aug 06 03:07:53:  ---------------------------------------------------------
Aug 06 03:07:53:  IP-address       Hostname            Processed     Status
Aug 06 03:07:53:  ---------------------------------------------------------
Aug 06 03:07:53:  192.168.0.10       Core-i7-Linux              3    RUNNING
Aug 06 03:07:55:  192.168.0.100      Storage                    0    RUNNING
Aug 06 03:07:57:  192.168.0.101      iMac                       0    RUNNING
Aug 06 03:07:57:  ---------------------------------------------------------
Aug 06 03:07:57:  Total processed:                              3

Use top or htop or ps aux to determine if HandBrakeCLI is actually running on all hosts.
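Instead of logging in to every host by hand, the check can be scripted. This is a sketch that assumes pgrep is available on the nodes and that the ppss account and key from this setup are used; check_nodes is a made-up name, and the SSH variable exists only so the loop can be tried without touching real hosts.

```shell
#!/bin/sh
# The ssh command is overridable, e.g. SSH=true for a dry run.
SSH="${SSH:-ssh}"

# check_nodes NODES_FILE KEY_FILE: report for each node whether a
# HandBrakeCLI process was found.
check_nodes() {
    while IFS= read -r node; do
        [ -n "$node" ] || continue
        # -n keeps ssh from reading the rest of the nodes file from stdin.
        if $SSH -n -i "$2" -o BatchMode=yes "ppss@$node" pgrep -x HandBrakeCLI >/dev/null 2>&1; then
            printf '%s running\n' "$node"
        else
            printf '%s idle-or-unreachable\n' "$node"
        fi
    done < "$1"
}
```

Run from the working directory, check_nodes nodes.txt ppss-key-dsa prints one line per node. A node reported as idle-or-unreachable either has no HandBrakeCLI process or could not be reached with the key.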

Re: Creating a Handbrake Distributed / Cluster setup

Posted: Tue Aug 17, 2010 5:34 pm
by marjiea1
Just the instructions I was looking for to break up my batches of coding movies to watch on my Zen.

I already have a Linux Blender render farm. Although I have always used Auto Gordian Knot on Windows to convert my movies, I think I can transfer my settings to a Handbrake script to get the multi-node benefit with just a little more research (I also need subtitles).

Re: Creating a Handbrake Distributed / Cluster setup

Posted: Tue Aug 17, 2010 10:40 pm
by louwrentius
I updated my own transcode2mp4 handbrake script as provided within this topic. I got some ideas based on work of another forum member.

http://code.google.com/p/ppss/downloads ... z&can=2&q=

It now transfers all subtitles and audio tracks to the mp4 file instead of only the first.

Please share your experiences.

Re: Creating a Handbrake Distributed / Cluster setup

Posted: Mon Nov 06, 2017 5:49 pm
by kcam1999
This is exactly what I've been looking for. I encode my TV series and would love to use this. I have both a server and 4 Macs I would love to hook up. I'm not too familiar with Unix code or SSH. Is there a chance that there is a YouTube tutorial on this? If not, is anyone willing to do one? Thanks so much for the time and effort that was put into this!
-kcam1999

Re: Creating a Handbrake Distributed / Cluster setup

Posted: Tue Nov 07, 2017 12:31 am
by mduell
marjiea1 wrote:
Tue Aug 17, 2010 5:34 pm
I have always used Auto Gordian Knot on Windows to convert my movies
Wow, that's a blast from the past.