On your server
To use s3sync you need Ruby installed. I found OpenSSL was already installed on my server, but you may need to install it too if you want to use SSL connections (you can use yum for this as well).
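If OpenSSL turns out to be missing, the standard package on yum-based systems should cover it (the package name may vary by distribution):
openssl version || yum install openssl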
To get Ruby, use yum:
yum install ruby
Once it has installed, check the version using
ruby -v
You should see something like
ruby 1.8.5 (2006-08-25) [i386-linux]
To use s3sync you need version 1.8.4 or above. This tutorial assumes you download s3sync to your home directory; if not, you will need to change the paths in the examples. The examples use “your-user”, so you will obviously need to change that to whichever user you are using. Let’s start by downloading and extracting s3sync, then removing the download:
wget http://s3.amazonaws.com/ServEdge_pub/s3sync/s3sync.tar.gz
tar xzf s3sync.tar.gz
rm s3sync.tar.gz
Now you’ll need to set up the configuration with the access keys you have from S3.
cd s3sync
# Copy the default configuration to the right location in /etc
# You may need to be root for this
mkdir /etc/s3conf
cp s3config.yml.example /etc/s3conf/s3config.yml
# Edit the file
vi /etc/s3conf/s3config.yml
Edit the file so it contains the following lines:
aws_access_key_id: ------Your Access Key here ------
aws_secret_access_key: ---- Your Secret Access Key here ------
ssl_cert_dir: /home/your-user/s3sync/certs
Now we need to set up the SSL certificates so we can connect over a secure connection. I had some trouble setting up my certificates (I think because I am in Europe), so I used John Eberly’s example and it worked fine.
mkdir /home/your-user/s3sync/certs
cd /home/your-user/s3sync/certs
# Use John Eberly's example to get the certificates
wget http://mirbsd.mirsolutions.de/cvs.cgi/~checkout~/src/etc/ssl.certs.shar
# Run the script
sh ssl.certs.shar
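If the script ran successfully, the certs directory should now contain a set of certificate files. A quick check (the exact filenames will vary):
ls /home/your-user/s3sync/certs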
Connecting to S3
You should now be set up to access S3. There are two scripts you can use to administer and set up your backups. Both s3sync and s3cmd are well documented on the s3sync site, but I will take you through a basic setup.
First we are going to set up a bucket for this server (as we may wish to back up others in the future).
cd /home/your-user/s3sync
# Create the bucket (add -s to use ssl)
ruby s3cmd.rb createbucket shapeshed.com
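To confirm the bucket was created you can list all your buckets; s3cmd.rb has a listbuckets command for this (see the s3sync documentation for the full command list):
ruby s3cmd.rb listbuckets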
For this backup I’m going to back up my vhosts directory. Here’s the command I use (I’m still in /home/your-user/s3sync).
ruby s3sync.rb -r -s -v --exclude="cache$|captchas$" --delete /nas/content/live/lifeofaserver/vhosts/ shapeshed.com:/vhosts > /var/log/s3sync
Let’s go through the options:
- -r This tells the script to act recursively, including everything in the folder.
- -s This tells the script to use SSL. We certainly want to do this if there is any sensitive information being transmitted, and I’d recommend doing it by default anyway.
- -v This tells the script to be verbose, meaning it will output all messages to the terminal.
- --exclude="cache$|captchas$" This tells the script to exclude certain folders or files based on a regular expression. In this example I want to exclude any folders called cache or captchas.
- /nas/content/live/lifeofaserver/vhosts/ This is the path to the folder that you want to back up. Bear in mind that this backs up everything in the folder.
- --delete This tells the script to delete any obsolete files, so it will remove files you have deleted on your local server from the mirror.
- shapeshed.com:/vhosts This is first the bucket that you want to use (the one we created earlier), followed by the prefix you would like. I’m backing up my vhosts, so vhosts is a good one for me.
- > /var/log/s3sync This tells the script to log the output to a file. This is optional but I like to keep an eye on things. You’ll need to make sure your user has permission to write to the file or the script will error. This is crude logging as it will only log the last sync.
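Once a sync has completed you can check what actually landed in the bucket with s3cmd.rb. A quick sketch using its list command (run from the s3sync directory; adjust the bucket and prefix to yours):
cd /home/your-user/s3sync
ruby s3cmd.rb list shapeshed.com:/vhosts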
You can run the script as a dry run by using the additional --dryrun flag, which will show you everything the script would do without actually doing it. You can also use the -d flag to debug the script. Depending on the size of your folder, syncing can take some time, so be patient. That’s it: you now have a remote backup of your files that is likely to cost cents rather than dollars per month. If any files or folders are subsequently removed from or added to /nas/content/live/lifeofaserver/vhosts/, your remote copy will be updated to mirror your folder the next time you run the script.
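Because s3sync works in either direction, restoring is just a matter of swapping the source and destination. Here is a minimal sketch, assuming a hypothetical restore location of /home/your-user/restore (trailing slashes affect how s3sync maps paths, so check what a dry run reports first):
# Dry run of a restore from S3 to a local folder; drop --dryrun once happy
ruby /home/your-user/s3sync/s3sync.rb -r -s -v --dryrun shapeshed.com:/vhosts/ /home/your-user/restore/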
Automating the task
To take all the administration out of this task you can automate the backup using cron. First we need to put the command into a file so cron can use it.
mkdir /home/your-user/shell_scripts
cd /home/your-user/shell_scripts
# Create and edit the file
vi s3backup.sh
Copy the command you want to run as a cron job into this file, ensuring you specify the full path to the ruby script. Remember to add #!/bin/bash (or whichever shell you use) at the top of the script, as in the example below.
#!/bin/bash
ruby /home/your-user/s3sync/s3sync.rb -r -s -v --exclude="cache$|captchas$" --delete /nas/content/live/lifeofaserver/vhosts/ shapeshed.com:/vhosts > /var/log/s3sync
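Cron will need to execute this file directly, so make sure it is executable:
chmod +x /home/your-user/shell_scripts/s3backup.sh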
Save this file and then set up the cron job
crontab -e
# Add the following line. This runs the backup every Sunday at 6am
0 6 * * 0 /home/your-user/shell_scripts/s3backup.sh
The backup will now run at 6am every Sunday without any further input from you. You can check the script is running OK by checking /var/log/s3sync (if you have created it). If you want to back up more frequently, just change the cron timings.
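For a quick look at the most recent run, tail the log (assuming you set one up as above):
tail /var/log/s3sync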
Source: http://shapeshed.com/journal/automating_backups_with_amazon_s3_on_linux/