amazon

  • Automating backups with Amazon S3 on Linux

    On your server

    To use s3sync you need Ruby installed. I found OpenSSL was already installed on my server, but you may need to install it too if you want to use SSL connections (you can use yum for that as well).

    To get ruby use yum:

    yum install ruby
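
    If OpenSSL turns out to be missing, yum can install it too (openssl is the usual package name, though it may differ on your distribution):

    yum install openssl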

    Once it has installed, check the version using

    ruby -v

    You should see something like

    ruby 1.8.5 (2006-08-25) [i386-linux]
    

    To use s3sync you need Ruby 1.8.4 or later. This tutorial assumes you download s3sync to your home directory; if not, you will need to change the paths in the examples. The examples use “your-user”, so you will obviously need to change that to whichever user you are using. Let’s start by downloading s3sync, extracting it and then removing the download:

    wget http://s3.amazonaws.com/ServEdge_pub/s3sync/s3sync.tar.gz
    tar xzf s3sync.tar.gz
    rm s3sync.tar.gz

    Now you’ll need to set up the configuration with the access keys you have from S3.

    cd s3sync
    # Copy the default configuration to the right location in /etc
    # You may need to be root for this
    mkdir /etc/s3conf
    cp s3config.yml.example /etc/s3conf/s3config.yml
    # Edit the file
    vi /etc/s3conf/s3config.yml

    Edit the file so it contains the following lines

    aws_access_key_id: ------Your Access Key here ------
    aws_secret_access_key: ---- Your Secret Access Key here ------
    ssl_cert_dir: /home/your-user/s3sync/certs
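
    Since this file now contains your secret access key, it is worth restricting its permissions so only root can read it (an optional hardening step):

    chmod 600 /etc/s3conf/s3config.yml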

    Now we need to set up the SSL certificates so we can connect over a secure connection. I had some trouble setting up my certificates (I think because I am in Europe) so I used John Eberly’s example and it worked fine.

    mkdir /home/your-user/s3sync/certs
    cd /home/your-user/s3sync/certs
    # Use John Eberly's example to get the certificates
    wget http://mirbsd.mirsolutions.de/cvs.cgi/~checkout~/src/etc/ssl.certs.shar
    # Run the script
    sh ssl.certs.shar
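
    Once the script has run you should see a set of certificate (.pem) files in the certs directory; a quick way to check (assuming the paths above):

    ls /home/your-user/s3sync/certs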

    Connecting to S3

    You should now be set up to access S3. There are two scripts you can use to administer and set up your backups. Both s3sync and s3cmd are well documented on the s3sync site, but I will take you through a basic setup.

    First we are going to set up a bucket for this server (as we may wish to back up others in the future).

    cd /home/your-user/s3sync
    # Create the bucket (add -s to use ssl)
    ruby s3cmd.rb createbucket shapeshed.com
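
    You can confirm the bucket was created by listing the buckets your account owns:

    ruby s3cmd.rb listbuckets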

    For this backup I’m going to back up my vhosts directory. Here’s the command I use (I’m still in /home/your-user/s3sync).

    ruby s3sync.rb -r -s -v --exclude="cache$|captchas$" --delete /nas/content/live/lifeofaserver/vhosts/ shapeshed.com:/vhosts > /var/log/s3sync
    

    Let’s go through the options:

    • -r This tells the script to act recursively, including everything in the folder.
    • -s This tells the script to use SSL. We certainly want this if any sensitive information is being transmitted, and I’d recommend using it by default anyway.
    • -v This tells the script to be verbose, meaning it outputs all messages to the terminal.
    • --exclude="cache$|captchas$"
      This tells the script to exclude certain folders or files based on a regular expression. In this example I want to exclude any folders called cache or captchas.
    • /nas/content/live/lifeofaserver/vhosts/
      This is the path to the folder that you want to back up. Bear in mind that this backs up everything in the folder.
    • --delete This tells the script to delete any obsolete files, so it will remove files you have deleted on your local server from the mirror.
    • shapeshed.com:/vhosts
      This is first the bucket that you want to use (the one we created earlier), and then the prefix you would like. I’m backing up my vhosts, so vhosts is a good one for me.
    • > /var/log/s3sync
      This tells the shell to log the output to a file. This is optional, but I like to keep an eye on things. You’ll need to make sure your user has permission to write to the file or the script will error. This is crude logging as it only keeps the last sync.

    You can do a dry run by adding the --dryrun flag, which will show you everything the script would do without actually doing it, and you can use the -d flag to debug the script. Depending on the size of your folder, syncing can take some time, so be patient. That’s it: you now have a remote backup of your files that is likely to cost cents rather than dollars per month. If any files or folders are subsequently removed from or added to /nas/content/live/lifeofaserver/vhosts/, running the script again will update your remote copy to mirror your folder.
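
    For example, a dry run of the backup command above (the same flags with --dryrun added, and the log redirect dropped so the output appears in the terminal):

    ruby s3sync.rb --dryrun -r -s -v --exclude="cache$|captchas$" --delete /nas/content/live/lifeofaserver/vhosts/ shapeshed.com:/vhosts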

    Automating the task

    To take all the administration out of this task you can automate the backup using cron. First we need to put the command into a file so cron can use it.

    mkdir /home/your-user/shell_scripts
    cd /home/your-user/shell_scripts
    # Create and edit the file
    vi s3backup.sh

    Copy the script you want to run as a cron job into this file, ensuring you specify the full path to your Ruby script. Remember to add #!/bin/bash (or whichever shell you use) at the top of the script.

    #!/bin/bash
    ruby /home/your-user/s3sync/s3sync.rb -r -s -v --exclude="cache$|captchas$" --delete /nas/content/live/lifeofaserver/vhosts/ shapeshed.com:/vhosts > /var/log/s3sync

    Save this file and make it executable, since cron will run the script directly:
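
    chmod u+x /home/your-user/shell_scripts/s3backup.sh

    Now set up the cron job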

    crontab -e
    
    # Add the following line. This runs the backup every Sunday at 6am
    0 6 * * 0 /home/your-user/shell_scripts/s3backup.sh
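
    You can confirm the entry was saved by listing your crontab:

    crontab -l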

    The backup will now run at 6am every Sunday without any further input from you. You can check that the script is running correctly by looking at /var/log/s3sync (if you have created it). If you want to back up more frequently, just change the cron timings.
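
    For example, to take a quick look at the output of the last sync:

    tail /var/log/s3sync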

    Source: http://shapeshed.com/journal/automating_backups_with_amazon_s3_on_linux/

  • Using s3cmd.rb to copy files to/from Amazon

    Examples
    --------
    List all the buckets your account owns:
    	s3cmd.rb listbuckets
    
    Create a new bucket:
    	s3cmd.rb createbucket BucketName
    
    Create a new bucket in the EU:
    	s3cmd.rb createbucket BucketName EU
    
    Find out the location constraint of a bucket:
       s3cmd.rb location BucketName
    
    Delete an old bucket you don't want any more:
    	s3cmd.rb deletebucket BucketName
    
    Find out what's in a bucket, 10 lines at a time:
    	s3cmd.rb list BucketName 10
    
    Only look in a particular prefix:
    	s3cmd.rb list BucketName:startsWithThis
    
    Look in the virtual "directory" named foo;
    lists sub-"directories" and keys that are at this level.
    Note that if you specify a delimiter you must specify a max before it.
    (until I make the options parsing smarter)
    	s3cmd.rb list BucketName:foo/  10  /
    
    Delete a key:
    	s3cmd.rb delete BucketName:AKey
    
    Delete all keys that match (like a combo between list and delete):
    	s3cmd.rb deleteall BucketName:SomePrefix
    
    Only pretend you're going to delete all keys that match, but list them:
    	s3cmd.rb  --dryrun  deleteall  BucketName:SomePrefix
    
    Delete all keys in a bucket (leaving the bucket):
    	s3cmd.rb deleteall BucketName
    
    Get a file from S3 and store it to a local file
    	s3cmd.rb get BucketName:TheFileOnS3.txt  ALocalFile.txt
    
    Put a local file up to S3
    Note we don't automatically set mime type, etc.
    NOTE that the order of the options doesn't change. S3 stays first!
    	s3cmd.rb put BucketName:TheFileOnS3.txt ALocalFile.txt

    Example to retrieve a file:

    /home/s3sync/s3cmd.rb get server1.3mwhosting.com:backup/bak_2014_01_09_04_41_23/bak-1-tar-2014-1-9/backup.tar.gzaa backup.tar.gzaa
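
    The file in this example is one part of a split archive (hence the .gzaa suffix). Once all the parts have been retrieved they can be recombined and extracted locally, for example (filenames assumed):

    cat backup.tar.gz?? > backup.tar.gz
    tar xzf backup.tar.gz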

    Source: http://s3.amazonaws.com/ServEdge_pub/s3sync/README_s3cmd.txt

  • Using s3sync.rb to synchronize files to/from Amazon

    Examples:
    --------
    (using S3 bucket ‘mybucket’ and prefix ‘pre’)

    Put the local etc directory itself into S3

    s3sync.rb  -r  /etc  mybucket:pre

    (This will yield S3 keys named pre/etc/…)

    Put the contents of the local /etc dir into S3, rename dir:

    s3sync.rb  -r  /etc/  mybucket:pre/etcbackup

    (This will yield S3 keys named pre/etcbackup/…)

    Put contents of S3 “directory” etc into local dir

    s3sync.rb  -r  mybucket:pre/etc/  /root/etcrestore

    (This will yield local files at /root/etcrestore/…)

    Put the contents of S3 “directory” etc into a local dir named etc

    s3sync.rb  -r  mybucket:pre/etc  /root

    (This will yield local files at /root/etc/…)

    Put S3 nodes under the key pre/etc/ to the local dir etcrestore
    **and create local dirs even if S3 side lacks dir nodes**

    s3sync.rb  -r  --make-dirs  mybucket:pre/etc/  /root/etcrestore

    (This will yield local files at /root/etcrestore/…)
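
    Tying this back to the tutorial above, restoring the vhosts backup might look like this (a sketch; the bucket, prefix and local path are assumed from the earlier examples):

    ruby s3sync.rb -r -s --make-dirs shapeshed.com:/vhosts/ /tmp/vhosts-restore

    (This would yield local files at /tmp/vhosts-restore/…)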

    Source: http://s3.amazonaws.com/ServEdge_pub/s3sync/README.txt