I needed to check the contents of a gz file on a server, but my Windows 7 PC has no gz-capable software such as 7-Zip installed. Decompressing the file on the server was not an option either, because the decompressed file might be too large to transfer to the PC. The only tools I had at hand were Java and Eclipse, so I ended up decompressing the gz file in Java. I am recording the sample code here on Qiita.
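GZIPUtils.java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

public class GZIPUtils {

    // Decompresses the GZIP file src and writes the result to dst.
    public static void unzip(Path src, Path dst) {
        // try-with-resources closes both streams, even if the copy fails.
        try (GZIPInputStream in = new GZIPInputStream(Files.newInputStream(src));
             OutputStream out = Files.newOutputStream(dst)) {
            int len;
            byte[] b = new byte[1024 * 4]; // copy in 4 KB chunks
            // read() returns -1 at the end of the decompressed data.
            while ((len = in.read(b)) != -1) {
                out.write(b, 0, len);
            }
        } catch (IOException e) {
            // Rethrow as unchecked so callers need no throws clause.
            throw new UncheckedIOException(e);
        }
    }
}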
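If the goal is only to look at the contents, it may not be necessary to write the decompressed file to disk at all. The following is a minimal sketch, not part of the original code: the class name GZIPPeek and the method head are hypothetical names, and it assumes the gz file holds line-oriented UTF-8 text such as a log.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

// Hypothetical helper (not from the article): reads only the first lines of a gz file.
public class GZIPPeek {

    // Returns at most maxLines lines from a GZIP-compressed stream.
    // Assumption: the compressed data is UTF-8 text with line breaks.
    public static List<String> head(InputStream gz, int maxLines) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(gz), StandardCharsets.UTF_8))) {
            String line;
            // Stop as soon as enough lines were read; the rest is never decompressed.
            while (lines.size() < maxLines && (line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java GZIPPeek <file.gz>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            head(in, 10).forEach(System.out::println);
        }
    }
}
```

Because the loop stops early, only the beginning of the file is decompressed. For data without line breaks, the first readLine() call would read the whole content as a single line, so this sketch only suits files that really are split into lines.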
Let's check that it works. First, create some test text files: 1 MB, 10 MB, and 100 MB files named 001mb.txt, 010mb.txt, and 100mb.txt, respectively.
$ cat /dev/urandom | tr -cd '[:print:]' | head -c 1m > 001mb.txt
$ cat /dev/urandom | tr -cd '[:print:]' | head -c 10m > 010mb.txt
$ cat /dev/urandom | tr -cd '[:print:]' | head -c 100m > 100mb.txt
Next, I compute a message digest for each of the three files so that I can verify later whether they were decompressed correctly.
$ sha256sum *.txt
f2cddb9f0bb64033f7a9d8979a47d925af06271b2e7d79b99bd9e41ad504616d 001mb.txt
0282119ac4c00a1c14370b65e72f3283d988264fb3240f46b679ada117db3979 010mb.txt
1104a28dd5a4033c5dd495289b590526b60c1f285357dd26af53ed0f5c4898c0 100mb.txt
With the message digests recorded, I compress the three files in GZIP format.
$ gzip *.txt
$ ls -lh *.gz
-rwxrwxrwx 1 dev dev 852K Nov 7 20:33 001mb.txt.gz
-rwxrwxrwx 1 dev dev 8.4M Nov 7 20:33 010mb.txt.gz
-rwxrwxrwx 1 dev dev 84M Nov 7 20:34 100mb.txt.gz
Now let's decompress the compressed files with our own Java program. The following Main.java uses GZIPUtils::unzip to decompress 001mb.txt.gz, 010mb.txt.gz, and 100mb.txt.gz into unzipped_001mb.txt, unzipped_010mb.txt, and unzipped_100mb.txt, respectively.
Main.java
import java.nio.file.Paths;

public class Main {

    public static void main(String[] args) {
        GZIPUtils.unzip(Paths.get("001mb.txt.gz"), Paths.get("unzipped_001mb.txt"));
        GZIPUtils.unzip(Paths.get("010mb.txt.gz"), Paths.get("unzipped_010mb.txt"));
        GZIPUtils.unzip(Paths.get("100mb.txt.gz"), Paths.get("unzipped_100mb.txt"));
    }
}
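Main.java hard-codes three file names. As a variation, the sketch below takes the gz files from the command line and derives each output name with the same unzipped_ prefix. The class BatchGunzip and the method targetFor are hypothetical names introduced here for illustration, and the copy loop is swapped for the standard Files.copy(InputStream, Path, CopyOption...) so that the block stands on its own.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;

// Hypothetical batch variant of Main (not from the article).
public class BatchGunzip {

    // Assumed naming rule, mirroring the article: "001mb.txt.gz" -> "unzipped_001mb.txt",
    // placed in the same directory as the source file.
    public static Path targetFor(Path gz) {
        String name = gz.getFileName().toString();
        if (name.endsWith(".gz")) {
            name = name.substring(0, name.length() - ".gz".length());
        }
        return gz.resolveSibling("unzipped_" + name);
    }

    // Same job as GZIPUtils::unzip, with the copy loop delegated to Files.copy.
    public static void unzip(Path src, Path dst) {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(src))) {
            // REPLACE_EXISTING overwrites an existing dst instead of failing.
            Files.copy(in, dst, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Each argument is treated as the path of one gz file.
        for (String arg : args) {
            Path src = Paths.get(arg);
            unzip(src, targetFor(src));
        }
    }
}
```

Under these assumptions, running `java -cp . BatchGunzip 001mb.txt.gz 010mb.txt.gz 100mb.txt.gz` should produce the same three unzipped_ files as Main.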
After compiling and running Main.java, you can see that the three files unzipped_001mb.txt, unzipped_010mb.txt, and unzipped_100mb.txt were created as expected.
$ javac Main.java GZIPUtils.java
$ java -cp . Main
$ ls -hl unzipped*
-rwxrwxrwx 1 dev dev 1.0M Nov 7 23:09 unzipped_001mb.txt
-rwxrwxrwx 1 dev dev 10M Nov 7 23:09 unzipped_010mb.txt
-rwxrwxrwx 1 dev dev 100M Nov 7 23:09 unzipped_100mb.txt
The message digests of these three files are exactly the same as those of the original txt files before compression. In other words, GZIPUtils::unzip appears to work as expected.
$ sha256sum unzipped*
f2cddb9f0bb64033f7a9d8979a47d925af06271b2e7d79b99bd9e41ad504616d unzipped_001mb.txt
0282119ac4c00a1c14370b65e72f3283d988264fb3240f46b679ada117db3979 unzipped_010mb.txt
1104a28dd5a4033c5dd495289b590526b60c1f285357dd26af53ed0f5c4898c0 unzipped_100mb.txt
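On a PC where sha256sum is not available, the same check could also be done in Java. The following is a minimal sketch using the standard java.security.MessageDigest API. The class name Sha256 and the method hex are hypothetical names, not part of the original code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not from the article): SHA-256 digest as lowercase hex.
public class Sha256 {

    // Streams the input through the digest in chunks, so large files fit in memory.
    public static String hex(InputStream in) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] buf = new byte[8192];
            int len;
            while ((len = in.read(buf)) != -1) {
                md.update(buf, 0, len);
            }
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest()) {
                sb.append(String.format("%02x", b)); // two hex digits per byte
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Convenience overload for a file on disk.
    public static String hex(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            return hex(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Prints one "digest  filename" line per argument, similar to sha256sum.
        for (String arg : args) {
            System.out.println(hex(Paths.get(arg)) + "  " + arg);
        }
    }
}
```

Running it against unzipped_001mb.txt and the other output files should print the same digests as the sha256sum output above, provided the files are byte-for-byte identical.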