
Wednesday, August 29, 2012

vsespb/mt-aws-glacier · GitHub



Perl Multithreaded multipart sync to Amazon AWS Glacier service.


Amazon AWS Glacier is an archive/backup service with a very low storage price. However, there are some caveats regarding usage and archive retrieval prices. Read more about Amazon AWS Glacier.
mt-aws-glacier is a client application for Glacier.


  • Version 0.7 Beta


  • Does not use any existing AWS library, so it can be flexible in implementing advanced features
  • Glacier Multipart upload
  • Multithreaded upload
  • Multipart+Multithreaded upload
  • Multithreaded retrieval, deletion and download
  • Tracking of all uploaded files with a local journal file (opened for write in append mode only)
  • Checking integrity of local files using journal
  • Ability to limit number of archives to retrieve
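
The journal-based features above can be pictured with a small sketch. It assumes a hypothetical tab-separated journal line of the form `CREATED <archive_id> <sha256> <relative path>`; the real journal format of mt-aws-glacier is not described here and may well differ. The function names `record_upload` and `check_local` are made up for illustration, and a plain SHA-256 stands in for whatever checksum the tool actually stores.

```python
import hashlib
import os


def file_sha256(path):
    """Return the hex SHA-256 of a file, read in 1 MiB blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(block)
    return digest.hexdigest()


def record_upload(journal_path, archive_id, root, relative_path):
    """Append one line to the journal; existing lines are never rewritten."""
    checksum = file_sha256(os.path.join(root, relative_path))
    # Mode "a" only ever adds to the end of the file.
    with open(journal_path, "a", encoding="utf-8") as journal:
        journal.write("CREATED\t%s\t%s\t%s\n" % (archive_id, checksum, relative_path))


def check_local(journal_path, root):
    """Return relative paths that are missing or whose content changed."""
    changed = []
    with open(journal_path, encoding="utf-8") as journal:
        for line in journal:
            fields = line.rstrip("\n").split("\t", 3)
            action, _archive_id, checksum, relative_path = fields
            if action != "CREATED":
                continue  # hypothetical: other record types are skipped
            full_path = os.path.join(root, relative_path)
            if not os.path.isfile(full_path) or file_sha256(full_path) != checksum:
                changed.append(relative_path)
    return changed
```

Because the journal is only ever appended to, an interrupted run can at worst leave a truncated last line, and earlier records stay intact.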

Coming-soon features

  • Multipart download (using HTTP Range header)
  • Ability to limit the archives to retrieve by size or by traffic per hour
  • Use journal file as flock() mutex
  • Checking integrity of remote files
  • Upload from STDIN
  • Some integration with external world, ability to read SNS topics
  • Simplified distribution for Debian/RedHat
  • Split code into reusable modules and publish them on CPAN (there are already great Glacier modules on CPAN; see Net::Amazon::Glacier by Tim Nordenfur)
  • Create/Delete vault function
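
As an illustration of the planned multipart download, here is a minimal sketch of how a file of known size could be split into HTTP `Range` header values, one per part. The function name `range_headers` and its parameters are hypothetical; how mt-aws-glacier will actually implement this is not stated in the list above.

```python
def range_headers(total_size, part_size):
    """Return one HTTP Range header value per part, covering total_size bytes."""
    if total_size < 0 or part_size <= 0:
        raise ValueError("total_size must be >= 0 and part_size must be > 0")
    headers = []
    start = 0
    while start < total_size:
        # Both ends of an HTTP byte range are inclusive.
        end = min(start + part_size, total_size) - 1
        headers.append("bytes=%d-%d" % (start, end))
        start = end + 1
    return headers
```

Each value could then be sent as the `Range` header of a separate request, so that several workers fetch different parts of the same archive in parallel and write them at the matching file offsets.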

Planned next version features

  • Amazon S3 support

Important bugs/missing features

  • Zero length files are ignored
  • Chunk size is hardcoded as 2MB
  • Only multipart upload is implemented, no plain upload
  • Retrieval works only as a proof of concept, so you can't initiate a retrieval job twice (until the previous job is completed)
  • No way to specify SNS topic
  • HTTP only, no way to configure HTTPS yet (however it works fine in HTTPS mode)
  • Internal refactoring needed, no comments in source yet, unit tests not published
  • The journal file is required to restore a backup. To be fixed: file meta-information will be stored in the archive description.
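
To see why the hardcoded chunk size matters, here is a small worked calculation. It assumes that "2MB" means 2 MiB (2,097,152 bytes) and relies on Glacier's documented limit of 10,000 parts per multipart upload; the function names are made up for this sketch.

```python
MIB = 1024 * 1024
PART_SIZE = 2 * MIB   # assumption: "2MB" in the list above means 2 MiB
MAX_PARTS = 10000     # Glacier's limit on parts per multipart upload


def parts_needed(file_size, part_size=PART_SIZE):
    """Number of parts for a file; the last part may be shorter."""
    return -(-file_size // part_size)  # ceiling division


def largest_uploadable(part_size=PART_SIZE, max_parts=MAX_PARTS):
    """Largest file size in bytes that fits within the part limit."""
    return part_size * max_parts


print(parts_needed(5 * MIB))   # 3 parts: 2 MiB + 2 MiB + 1 MiB
print(largest_uploadable())    # 20971520000 bytes, roughly 19.5 GiB
```

Under these assumptions, a file larger than about 19.5 GiB would not fit in a single multipart upload until the chunk size becomes configurable.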

Production ready

  • Not recommended for use in production until the first "Release" version. Currently Beta.


  • Install the following CPAN modules:
            LWP::UserAgent JSON::XS
    That's all.
  • In case you use HTTPS, also install
  • Some CPAN modules are better installed as OS packages (example for Ubuntu/Debian):
            libjson-xs-perl liblwp-protocol-https-perl liburi-perl


  • When playing with Glacier, make sure you will be able to delete all your archives: it is currently impossible to delete an archive or a non-empty vault in the Amazon console. Also make sure you have read all of the AWS Glacier pricing/FAQ.
  • Read their pricing FAQ again, really. Beware of the retrieval fee.
  • Back up your local journal file. Currently it's impossible to correctly restore a backup without the journal file.


  1. Create a directory containing the files to back up, for example /data/backup
  2. Create a config file, say, glacier.cfg
            region=us-east-1 #eu-west-1, us-east-1 etc
  3. Create a vault (myvault) in the specified region, using the Amazon Console
  4. Choose a filename for the Journal, for example, journal.log
  5. Sync your files
            ./ sync --config=glacier.cfg --from-dir /data/backup --to-vault=myvault --journal=journal.log --concurrency=3
  6. Add more files and sync again
  7. Check that your local files have not been modified since the last sync
            ./ check-local-hash --config=glacier.cfg --from-dir /data/backup --to-vault=myvault --journal=journal.log
  8. Delete some files from your backup location
  9. Initiate an archive restore job on the Amazon side
            ./ restore --config=glacier.cfg --from-dir /data/backup --to-vault=myvault --journal=journal.log --max-number-of-files=10
  10. Wait 4+ hours
  11. Download the restored files back to the backup location
            ./ restore-completed --config=glacier.cfg --from-dir /data/backup --to-vault=myvault --journal=journal.log
  12. Delete all your files from the vault
            ./ purge-vault --config=glacier.cfg --from-dir /data/backup --to-vault=myvault --journal=journal.log
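
Step 9 caps the restore with `--max-number-of-files=10`. How the tool picks those files is not described here; the following is only a sketch of one plausible selection rule, assuming it walks the paths recorded in the journal in order and requests those that are missing locally. The function name `pick_for_restore` and its arguments are hypothetical.

```python
import os


def pick_for_restore(journal_paths, local_dir, max_files):
    """Return up to max_files journal paths that are missing under local_dir."""
    missing = []
    for relative_path in journal_paths:
        if len(missing) >= max_files:
            break  # stop once the cap is reached
        if not os.path.exists(os.path.join(local_dir, relative_path)):
            missing.append(relative_path)
    return missing
```

Capping the number of files per run is one way to keep retrieval requests, and therefore retrieval fees, under control; the restore can be repeated later for the remaining files.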

Test/Play with it

  1. Create an empty directory MYDIR
  2. Set vault name inside
  3. Run
    ./ init MYDIR
    ./ retrieve MYDIR
    ./ restore MYDIR
    ./ init MYDIR
    ./ purge MYDIR

Minimum AWS permissions

Something like this:
            "Statement": [
                "Effect": "Allow",
