[JAVA] I got a little stuck comparing S3 checksums, so I made a note.

As always, this is a reprint from my blog: https://munchkins-diary.hatenablog.com/entry/2019/10/28/230924 I haven't written much recently, but I want to start again soon.

This post covers a pitfall that is easy to fall into when getting the checksum of a file on S3 and, as a bonus at the bottom, how to calculate a file's checksum in Java.

I hope it helps someone, at least a little.

I want to compare S3 checksums

I was writing a process that uploads a file of several GB to S3, and I wanted to be able to retry it when a later processing step failed.

Uploading the file again on every retry would be wasteful, but the target file in the migration source storage may have changed in the meantime. So I want to skip the upload only if exactly the same file is already on S3.

Comparing checksums is the easy way to do this, so I wrote the code below using S3's getObjectMetadata API.

    private boolean shouldSkip(String bucketName, String key, String md5CheckSum) {
      try {
        ObjectMetadata meta = s3Client.getObjectMetadata(bucketName, key);
        if (meta == null || meta.getContentMD5() == null) {
          log.info("meta data not exist for the file {} in bucket {}", key, bucketName);
          return false;
        }
        log.info(
            "Checksum of existing file is {} and present file checksum is {}",
            meta.getContentMD5(),
            md5CheckSum);
        return meta.getContentMD5().equals(md5CheckSum);
      } catch (SdkClientException e) {
        log.error("Exception thrown while validating the checksum of the file {}", key, e);
        return false;
      }
    }

But it doesn't work. `ObjectMetadata#contentMD5` is always null.

After checking, it seems that the checksum of an existing object in S3 is exposed as the `ETag`, not as contentMD5.

Then what is contentMD5 used for? It is added to the HTTP header at upload time so that S3 can verify the data was not corrupted or tampered with in transit (that is its proper use), so it is not returned when you get an object.
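For reference, the value carried in the Content-MD5 header is the Base64 encoding of the raw 128-bit MD5 digest, not the hex string that the ETag uses. Below is a minimal sketch using only the JDK. The class name `ContentMd5Example` and the method name `contentMd5` are made up for illustration and are not part of the AWS SDK.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper for illustration only; not part of the AWS SDK.
class ContentMd5Example {

  /** Returns the Base64-encoded MD5 digest of the body, the form a Content-MD5 header carries. */
  static String contentMd5(byte[] body) {
    try {
      byte[] digest = MessageDigest.getInstance("MD5").digest(body);
      return Base64.getEncoder().encodeToString(digest);
    } catch (NoSuchAlgorithmException e) {
      // Every Java platform is required to provide MD5, so this is not expected.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // Prints the header value for a small in-memory body.
    System.out.println(contentMd5("hello".getBytes(StandardCharsets.UTF_8)));
  }
}
```

Because the header value is Base64 and the ETag is hex, the two strings cannot be compared directly even when they are derived from the same digest.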

Therefore, if you want to check the checksum of a file that is already on S3, you need to compare against the ETag.

Like this.

    private boolean shouldSkip(String bucketName, String key, String md5CheckSum) {
      try {
        ObjectMetadata meta = this.s3Client.getObjectMetadata(bucketName, key);
        if (meta == null || meta.getETag() == null) {
          log.info("meta data not exist for the file {} in bucket {}", key, bucketName);
          return false;
        }
        log.info(
            "Checksum of existing file is {} and present file checksum is {}",
            meta.getETag(),
            md5CheckSum);
        return meta.getETag().equals(md5CheckSum);
      } catch (SdkClientException e) {
        log.error("Exception thrown while validating the checksum of the file {}", key, e);
        return false;
      }
    }

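This works fine. One caveat: the ETag is the MD5 digest of the object data only for objects uploaded in a single PUT and stored as plaintext or with SSE-S3; for objects created by multipart upload, the ETag is not an MD5 digest. I hope it helps someone.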
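A more defensive comparison might first check that the ETag really is a plain 32-character hex MD5 before comparing, because ETags of multipart uploads typically look like `<hex>-<part count>` and are not the MD5 of the content. The sketch below is one way to do that with plain string handling. `EtagMatcher` and its methods are hypothetical names, and stripping quotes is only a precaution in case the ETag arrives as the raw quoted header value.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of the AWS SDK.
class EtagMatcher {

  /** Trims, strips one pair of surrounding double quotes if present, and lower-cases. */
  static String normalize(String etag) {
    String s = etag.trim();
    if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
      s = s.substring(1, s.length() - 1);
    }
    return s.toLowerCase(Locale.ROOT);
  }

  /** True when the ETag has the "<32 hex chars>-<number>" shape typical of multipart uploads. */
  static boolean isMultipartEtag(String etag) {
    return etag != null && normalize(etag).matches("[0-9a-f]{32}-[0-9]+");
  }

  /** True only when the ETag is a plain hex MD5 and equals the given hex MD5. */
  static boolean matchesMd5(String etag, String md5Hex) {
    if (etag == null || md5Hex == null) {
      return false;
    }
    String normalized = normalize(etag);
    if (!normalized.matches("[0-9a-f]{32}")) {
      // Multipart or otherwise not a plain MD5: treat as "cannot confirm a match".
      return false;
    }
    return normalized.equals(md5Hex.trim().toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    String md5 = "5d41402abc4b2a76b9719d911017c592";
    System.out.println(matchesMd5("\"" + md5 + "\"", md5));
    System.out.println(isMultipartEtag(md5 + "-3"));
  }
}
```

Returning false when the ETag is not a plain MD5 means the file is uploaded again instead of being skipped on an unverifiable match, which is the safer default for a retry.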

Bonus: How to calculate a checksum in Java

For those who landed here after googling checksums, here's how to calculate a checksum in Java.

  // DigestUtils comes from Apache Commons Codec (org.apache.commons.codec.digest.DigestUtils).
  public static String checkMd5Checksum(File file) {
    try (BufferedInputStream is = new BufferedInputStream(new FileInputStream(file))) {
      return DigestUtils.md5Hex(is);
    } catch (Exception e) {
      // Not likely to occur.
      log.error(
          "ERROR Happened while calculating the check sum for file {}", file.getAbsolutePath(), e);
      return "NOT FOUND";
    }
  }

For SHA-256, just change `DigestUtils#md5Hex` to `DigestUtils#sha256Hex`.
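If you would rather not depend on Commons Codec, roughly the same thing can be done with the JDK's `MessageDigest`. This is a minimal sketch, not the code from the original project. The class name `StreamChecksum` and the method name `hexDigest` are made up for illustration, and the algorithm is passed in by name so one method covers both MD5 and SHA-256.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only.
class StreamChecksum {

  /** Reads the stream in chunks and returns the lower-case hex digest for the given algorithm. */
  static String hexDigest(InputStream in, String algorithm) {
    try {
      MessageDigest md = MessageDigest.getInstance(algorithm);
      byte[] buffer = new byte[8192];
      int read;
      // Feed the digest chunk by chunk so large files never have to fit in memory.
      while ((read = in.read(buffer)) != -1) {
        md.update(buffer, 0, read);
      }
      StringBuilder hex = new StringBuilder();
      for (byte b : md.digest()) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalArgumentException("Unknown digest algorithm: " + algorithm, e);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
    System.out.println(hexDigest(new ByteArrayInputStream(data), "MD5"));
    System.out.println(hexDigest(new ByteArrayInputStream(data), "SHA-256"));
  }
}
```

For a real file, wrap a `FileInputStream` in try-with-resources and pass it in, the same way the Commons Codec version above does.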

That's all for this memo.
