Restore date-separated Glacier directories, directory by directory, using Amazon S3 batch operations

Background

I save log data to S3 daily under paths like `bucket/log/20201`, and move it from S3 to Glacier after a certain period. This time I needed several months' worth of log data that was already in Glacier, so I had to restore it.

What I considered

- Restore S3 Glacier archive by folder (Qiita): I didn't use this tool this time because installation was rather involved.

- One-liner for restoring from Glacier to S3 in one go (Qiita): a one-liner is fine when the amount of data is small, but for the several months of data I had this time, it took about an hour just to restore a single day's directory. (It might behave differently with asynchronous options and the like.) A sketch of this per-object approach follows below.
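For comparison, restoring object by object from Ruby looks roughly like the minimal sketch below. The bucket, prefix, retention days, and tier are made-up values, and `list_objects_v2` is shown without pagination; each object costs one synchronous `restore_object` call, which is why this approach gets slow for months of data.

```ruby
require 'aws-sdk-s3'

# Hypothetical values for illustration only
bucket = 'my-log-bucket'
prefix = 'log/20201201'

s3 = Aws::S3::Client.new
s3.list_objects_v2(bucket: bucket, prefix: prefix).contents.each do |object|
  # One synchronous API call per object; this is what makes the approach slow at scale
  s3.restore_object(
    bucket: bucket,
    key: object.key,
    restore_request: {
      days: 10,
      glacier_job_parameters: { tier: 'Standard' },
    }
  )
end
```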

Approach

Since what I wanted was a single operation (restore) applied to a large number of S3 objects, I decided to use S3 batch operations.

S3 batch operations are performed in the following steps.

- Create the manifest file
- Upload the manifest file
- Create a job
- Run the job

Creating a manifest file

- The manifest file is a CSV in the format `bucket,object_key`. Here, create a CSV listing the bucket and the key of each log data object you want to restore.
- Upload the created file to S3.

```ruby
require 'csv'
require 'date'

# Build a manifest row for every object saved on each day in the configured range
target_date_range = Date.parse(config['START_DATE'])..Date.parse(config['END_DATE'])
target_date_range.each do |date|
  # Get the list of objects in the directory where the log data to be restored is saved
  s3_objects = s3_list_object_content(config['BUCKET'], "#{config['S3_KEY']}/#{date.strftime('%Y%m%d')}")
  # Append each object to the manifest CSV as "bucket,key"
  s3_objects.each do |object|
    CSV.open(file_name, 'a') { |f| f << [config['BUCKET'], object.key] }
  end
end
```
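The listing helper `s3_list_object_content` and the manifest upload step are not shown above. The following is a minimal sketch of how they could look, under my own assumptions: the helper name and `file_name` follow the snippet above, `manifest_key` is a placeholder, and the upload captures the ETag that the job creation below needs.

```ruby
require 'aws-sdk-s3'

# Sketch of the listing helper used above: collects every object under the given prefix
def s3_list_object_content(bucket, prefix)
  s3 = Aws::S3::Client.new
  objects = []
  continuation_token = nil
  loop do
    resp = s3.list_objects_v2(bucket: bucket, prefix: prefix, continuation_token: continuation_token)
    objects.concat(resp.contents)
    break unless resp.is_truncated
    continuation_token = resp.next_continuation_token
  end
  objects
end

# Upload the manifest CSV and keep its ETag for create_job (manifest_key is a placeholder)
s3 = Aws::S3::Client.new
resp = s3.put_object(
  bucket: config['MANIFEST_BUCKET'],
  key: manifest_key,
  body: File.read(file_name)
)
etag = resp.etag.delete('"') # resp.etag is quoted; strip the quotes if create_job rejects it
```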

Creating a job

- By specifying `confirmation_required: false`, the job runs as soon as it is created. If set to `true`, the job is only created and must be run separately.
- The job report is written to the bucket specified in `report`. If any objects fail to restore, they are listed there.


```ruby
require 'aws-sdk-s3control'
require 'securerandom'

resp = Aws::S3Control::Client.new.create_job(
  account_id: config['AWS_ACCOUNT_ID'],
  confirmation_required: false,
  operation: {
    s3_initiate_restore_object: {
      expiration_in_days: 10, # how long the restored copies stay available
      glacier_job_tier: config['GLACIER_JOB_TIER'],
    },
  },
  report: {
    bucket: "arn:aws:s3:::#{config['MANIFEST_BUCKET']}",
    format: 'Report_CSV_20180820',
    enabled: true,
    prefix: 'report',
    report_scope: 'AllTasks',
  },
  client_request_token: SecureRandom.uuid,
  manifest: {
    spec: {
      format: 'S3BatchOperations_CSV_20180820',
      fields: ['Bucket', 'Key'],
    },
    location: {
      object_arn: "#{manifest_bucket_arn}/#{manifest_file_name}",
      etag: etag,
    },
  },
  description: 'restore',
  priority: 10,
  role_arn: config['ROLE_ARN'],
  tags: config['TAGS'],
)
job_id = resp.job_id

puts job_id
```
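To check whether the job has finished, and how many objects failed, you can poll `describe_job` with the returned job ID. A minimal sketch, reusing the `config` and `job_id` values from above:

```ruby
# Poll the job status until S3 batch operations reports completion
resp = Aws::S3Control::Client.new.describe_job(
  account_id: config['AWS_ACCOUNT_ID'],
  job_id: job_id
)
puts resp.job.status                                      # e.g. 'Active', 'Complete', 'Failed'
puts resp.job.progress_summary.number_of_tasks_succeeded
puts resp.job.progress_summary.number_of_tasks_failed
```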

Other

- S3 batch operations are charged per job, so be careful how you split up jobs.
- The final script I created is at https://github.com/akiraisomura/s3_utils.
