Say you have about 20 DVDs you want to convert to MP4 to save some space. Or you want to transcode an entire season of a series that you've bought on DVD to a format your phone or tablet can handle.
You will need:
Knowledge:
- Some understanding of SSH and keys.
- Some understanding of Linux or the Terminal on Mac OS X.
- The ability to configure NFS or SMB on Linux/Mac OS X.
Stuff:
- Some Linux or Mac machines. Windows is not supported, sorry.
- HandBrakeCLI installed on all machines.
- A central file server that is accessible through NFS or SMB.
- SSH access to all systems that participate in the en/transcoding and the 'master' server.
- PPSS (http://code.google.com/p/ppss/) Parallel Processing Shell Script (full disclosure: I wrote PPSS.)
- An example PPSS config file (look below)
- A script for transcoding a DVD to MP4 handling HandBrakeCLI.
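The transcoding script itself is not shown in this post. To give a rough idea of what such a script could look like, here is a minimal sketch. It is not the author's transcode2mp4.sh: the function names output_path and transcode, the naming rule for VIDEO_TS directories, and the preset name "Normal" are all assumptions (preset names differ between HandBrake versions). The third argument ('source') that appears in the PPSS command later on is ignored here.

```shell
#!/bin/sh
# Hypothetical stand-in for transcode2mp4.sh; NOT the author's script.
# Usage: ./transcode2mp4.sh <input> <output-dir>

# Map an input path to an output file name.
# "Film A.iso" becomes "<outdir>/Film A.mp4"; a VIDEO_TS directory
# is named after its parent directory (an assumed convention).
output_path() {
    in=${1%/}
    outdir=${2%/}
    name=$(basename "$in")
    if [ "$name" = "VIDEO_TS" ]; then
        name=$(basename "$(dirname "$in")")
    else
        case $name in
            *.iso|*.ISO) name=${name%.*} ;;
        esac
    fi
    printf '%s/%s.mp4\n' "$outdir" "$name"
}

transcode() {
    out=$(output_path "$1" "$2")
    # The preset name is an assumption; list the presets of your
    # HandBrake version with: HandBrakeCLI --preset-list
    HandBrakeCLI -i "$1" -o "$out" --preset "Normal"
}

# Only transcode when called with both arguments.
if [ "$#" -ge 2 ]; then
    transcode "$1" "$2"
fi
```

Called as ./transcode2mp4.sh "/mnt/input/Film A.iso" /mnt/output/mp4/ this would write /mnt/output/mp4/Film A.mp4.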
The hard part is setting up PPSS. It takes some effort, since it depends on working SSH access between all machines, but it does work and may well be worth it.
0. Short intro about PPSS so you know what you're building:
PPSS is just a shell script.
You feed PPSS a list of files and PPSS delegates these files to nodes for them to process. PPSS allows you, for instance, to have 5 machines processing those 20 DVDs we talked about earlier. A master node is used to 'manage' the queue and nodes access this 'queue' through SSH. Thus, all nodes must be able to talk to the master through SSH. What nodes actually do is check through SSH if a file is already processed and if not, process it or otherwise try the next one.
A single system acts as the 'master' server and is used to manage the 'queue'. The master can also participate as a node. In its role as master it doesn't do any processing itself: it is just a single SSH server that all nodes use for communication (claiming items).
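To make the idea of 'claiming items' concrete, here is a small local sketch. It assumes a claim is an atomic mkdir of a lock directory, which only the first caller can create; on a real setup that mkdir would run on the master through SSH. The names LOCK_DIR, lock_name, claim_item and process_list are made up for this illustration, and this is not PPSS's actual code.

```shell
#!/bin/sh
# Illustration only: claim items with an atomic mkdir in a local
# directory. This is not PPSS's own implementation.

LOCK_DIR=${LOCK_DIR:-/tmp/claim-demo.$$}
mkdir -p "$LOCK_DIR"

# Turn an item (a path) into a name that is safe as a directory name.
# Note: different items can map to the same name ("a b" and "a_b").
lock_name() {
    printf '%s' "$1" | tr -c 'A-Za-z0-9._-' '_'
}

# Succeeds (exit status 0) only for the first caller per item,
# because mkdir fails if the directory already exists.
claim_item() {
    mkdir "$LOCK_DIR/$(lock_name "$1")" 2>/dev/null
}

# Walk the list: report what we could claim and what is already taken.
process_list() {
    while IFS= read -r item; do
        if claim_item "$item"; then
            echo "claimed: $item"
        else
            echo "taken:   $item"
        fi
    done < "$1"
}
```

Running process_list dvds.txt twice against the same LOCK_DIR reports every item as claimed the first time and as taken the second time, which is the behaviour each node relies on to skip work another node already picked up.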
1. Create a list of files you want to process.
I assume that you've already ripped the DVDs, so the input consists of ISO files or directories containing VOB files. This is how you can create an input list to process:
Example:
find /mnt/input/2\ -\ Video/0\ -\ Films/ -type f -iname "*.iso" >> dvds.txt
find /mnt/input/2\ -\ Video/0\ -\ Films/ -type f -iname "*.vob" -exec dirname {} \; | sort | uniq >> dvds.txt
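Before feeding the list to PPSS it can help to check it. The sketch below uses two hypothetical helper functions (check_list and dedupe_list, not part of PPSS): one reports entries that do not exist on this machine, the other removes duplicate lines, which appear if the find commands above are run twice because they append with >>.

```shell
#!/bin/sh
# Print every list entry that does not exist on this machine.
# Returns non-zero if at least one entry is missing.
check_list() {
    missing=0
    while IFS= read -r item; do
        [ -n "$item" ] || continue      # ignore empty lines
        if [ ! -e "$item" ]; then
            echo "missing: $item"
            missing=1
        fi
    done < "$1"
    return "$missing"
}

# Sort the list and drop duplicate lines, in place.
dedupe_list() {
    sort -u -o "$1" "$1"
}
```

For example: dedupe_list dvds.txt && check_list dvds.txt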
2. Install and configure your nodes to work with PPSS
- You must create a 'ppss' user account on every node system.
- Generate an SSH private/public key pair and add the public key to the 'authorized_keys' file of the ppss account on all nodes. It must also be added to the ppss account on the master system, since all nodes will use this key to access the master server. The SSH key is used both for deploying PPSS to the nodes and for the nodes to communicate with the master server.
If you log in to the master server over SSH, the master's host key is added to ~/.ssh/known_hosts on the machine you logged in from. Copy this entry, since you will need it later on.
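The key setup could look roughly like this. It is only a sketch under a few assumptions: the key type (ed25519) is an assumption chosen because recent OpenSSH releases no longer accept DSA keys by default, the file name ppss-key-dsa is kept so that it matches the example config further down, ssh-copy-id is available on your machine, and nodes.txt lists one host per line. The helpers run, setup_keys and fetch_master_hostkey are made up for this example. Set DRY_RUN=1 to print the commands instead of running them.

```shell
#!/bin/sh
KEY=${KEY:-ppss-key-dsa}            # file name from the example config
MASTER=${MASTER:-192.168.0.10}      # master address from the example
NODES_FILE=${NODES_FILE:-nodes.txt}

# With DRY_RUN=1, print each command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_keys() {
    # One key pair without a passphrase, so nodes can use it unattended.
    run ssh-keygen -t ed25519 -N "" -f "$KEY"
    # Install the public key for the ppss user on the master and on
    # every node (each host once, even if the master is also a node).
    for host in $( { echo "$MASTER"; cat "$NODES_FILE"; } | sort -u ); do
        run ssh-copy-id -i "$KEY.pub" "ppss@$host"
    done
}

# Alternative to copying the entry by hand: append the master's host
# key to ./known_hosts. Verify the fingerprint before trusting it.
fetch_master_hostkey() {
    ssh-keyscan "$MASTER" >> known_hosts
}
```

A dry run (DRY_RUN=1 followed by setup_keys) shows which hosts would receive the key before anything is changed.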
3. Create a PPSS configuration file:
Example of how to generate a PPSS config with PPSS itself:
ppss config -C config.cfg -f dvds.txt -c './transcode2mp4.sh "$ITEM" /mnt/output/mp4/ source' -u ppss -k ppss-key-dsa -n nodes.txt -m 192.168.0.10 -p 1 -S ./transcode2mp4.sh -K known_hosts
Take note of the -p 1 option: HandBrake usually keeps all CPU cores busy on its own, so you should start only one HandBrake instance per host. The whole point of this exercise is adding more computers to the mix. If you configure HandBrake so that it does not use all available CPU cores, you can have PPSS run more than one HandBrake instance per node, for example with -p 2.
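To pick a value for -p you need to know how many cores a node has. A small sketch, assuming getconf, nproc or sysctl is available on the node. Both cpu_count and suggest_jobs are hypothetical helpers, and the cores-per-job figure is your own estimate of how many cores one HandBrakeCLI instance keeps busy.

```shell
#!/bin/sh
# Print the number of online CPU cores (Linux or Mac OS X).
cpu_count() {
    getconf _NPROCESSORS_ONLN 2>/dev/null ||
        nproc 2>/dev/null ||
        sysctl -n hw.ncpu
}

# Suggest a value for -p.
# $1 = cores one HandBrakeCLI instance is expected to use
#      (defaults to all cores, which gives one job per node).
suggest_jobs() {
    cores=$(cpu_count)
    per_job=${1:-$cores}
    n=$((cores / per_job))
    [ "$n" -ge 1 ] || n=1       # never suggest fewer than one job
    echo "$n"
}
```

With no argument suggest_jobs prints 1, matching the -p 1 advice; suggest_jobs 4 on an 8-core node prints 2.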
Example contents of a PPSS configuration file (config.cfg):
INPUT_FILE=dvds.txt
COMMAND='./transcode2mp4.sh "$ITEM" /mnt/output/mp4/ source'
USER=ppss
SSH_KEY=ppss-key-dsa
NODES_FILE=nodes.txt
SSH_SERVER=192.168.0.10
SCRIPT=./transcode2mp4.sh
SSH_KNOWN_HOSTS=known_hosts
PPSS_LOCAL_TMPDIR=ppss_dir/PPSS_LOCAL_TMPDIR
PPSS_LOCAL_OUTPUT=ppss_dir/PPSS_LOCAL_OUTPUT
MAX_NO_OF_RUNNING_JOBS=1
Example contents of the nodes file (nodes.txt):
192.168.0.10
192.168.0.100
192.168.0.101
192.168.0.102
Put the following in the directory from which you run PPSS:
- The PPSS config file.
- The ppss SSH private key that is used to access all nodes and that the nodes use to access the master server.
- The nodes file containing the DNS names or IP addresses of all nodes.
- The file 'dvds.txt' containing all the files you want to process.
- The known_hosts file, with the known_hosts entry of the master server added to it. Otherwise you will have to log in to the master server with ssh as the ppss user on every node to accept the SSH server key.
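A quick way to verify the checklist above before deploying. The file names are taken from the example config in this post; the preflight function itself is a hypothetical helper, not a PPSS command.

```shell
#!/bin/sh
# Check that the files from the checklist are present and non-empty.
# $1 = directory to check (defaults to the current directory).
preflight() {
    dir=${1:-.}
    status=0
    for f in config.cfg ppss-key-dsa nodes.txt dvds.txt known_hosts; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $f"
            status=1
        fi
    done
    return "$status"
}
```

For example: preflight . && ./ppss deploy -C config.cfg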
4. Deploy PPSS to all nodes
Run ./ppss deploy -C config.cfg to deploy all files to all nodes. Make sure that all systems have a valid NFS or SMB mount for both the source and the destination directory.
5. Start encoding!
Run ./ppss start -C config.cfg to start the encoding process on all systems.
Run ./ppss status -C config.cfg to show the current status. Example output:
Aug 06 03:07:51:
Aug 06 03:07:51: =========================================================
Aug 06 03:07:51: |P|P|S|S|
Aug 06 03:07:51: Distributed Parallel Processing Shell Script vers. 2.84
Aug 06 03:07:51: =========================================================
Aug 06 03:07:51: Hostname: Core-i7-Linux
Aug 06 03:07:51: ---------------------------------------------------------
Aug 06 03:07:51: CPU: Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
Aug 06 03:07:53: Status: 2 percent complete.
Aug 06 03:07:53: Nodes: 3
Aug 06 03:07:53: Items: 203
Aug 06 03:07:53: ---------------------------------------------------------
Aug 06 03:07:53: IP-address Hostname Processed Status
Aug 06 03:07:53: ---------------------------------------------------------
Aug 06 03:07:53: 192.168.0.10 Core-i7-Linux 3 RUNNING
Aug 06 03:07:55: 192.168.0.100 Storage 0 RUNNING
Aug 06 03:07:57: 192.168.0.101 iMac 0 RUNNING
Aug 06 03:07:57: ---------------------------------------------------------
Aug 06 03:07:57: Total processed: 3
Use top, htop or ps aux to check whether HandBrakeCLI is actually running on all hosts.
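Checking every host by hand gets tedious with more than a few nodes. A minimal sketch that asks each node over SSH, assuming pgrep exists on the nodes and that the key and user from the example config are used. The check_nodes function and the SSH override variable are made up for this example.

```shell
#!/bin/sh
# Command used to reach the nodes; override SSH for testing.
SSH=${SSH:-ssh}

# Ask every host in the nodes file whether HandBrakeCLI is running.
# $1 = nodes file (defaults to nodes.txt), one host per line.
check_nodes() {
    for host in $(cat "${1:-nodes.txt}"); do
        # BatchMode makes ssh fail instead of prompting for a password.
        if $SSH -i ppss-key-dsa -o BatchMode=yes "ppss@$host" \
                pgrep HandBrakeCLI >/dev/null 2>&1; then
            echo "$host: HandBrakeCLI running"
        else
            echo "$host: not running (or unreachable)"
        fi
    done
}
```

A node that cannot be reached is reported the same way as a node without a running HandBrakeCLI, so follow up with a manual ssh login if a host is unexpectedly listed as not running.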