Quite often, ordinary users receive files created with Excel or Word on a Windows machine and need to process the data on Linux. This post summarizes the basic conversion steps needed in that situation.
Here, a "Japanese-named file" means a file whose name contains so-called double-byte characters. To extract a zip archive containing such files, tell unzip that the names are encoded in CP932:
unzip -O cp932 "archive containing Japanese-named files.zip"
CP932 is Microsoft's extended version of Shift_JIS.
Reference: "Actually not scary CP932"
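To see why the encoding name matters, the sketch below decodes the same bytes with iconv's `cp932` and `shift_jis` tables. It assumes glibc's iconv. The bytes 0x93FA 0x967B are "日本" in both encodings; 0x8160 is the well-known wave dash position, which Microsoft's table maps to U+FF5E (FULLWIDTH TILDE) while the JIS-based table maps it to U+301C (WAVE DASH). The helper name `decode` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: decode a byte string (written as \xHH escapes)
# from the named encoding ($2) to UTF-8.
decode() {
    printf '%b' "$1" | iconv -f "$2" -t utf-8
}

# 0x93FA 0x967B is "日本" in both Shift_JIS and CP932.
decode '\x93\xfa\x96\x7b' cp932; echo

# 0x8160 is where the two tables disagree; print the resulting UTF-8 bytes.
decode '\x81\x60' cp932 | od -An -tx1       # ef bd 9e (U+FF5E) with glibc
decode '\x81\x60' shift_jis | od -An -tx1   # e3 80 9c (U+301C) with glibc
```

For files that came from Windows, `cp932` is therefore the safer choice than `shift_jis`.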
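To convert the names of files that are already on disk from CP932 to UTF-8, use convmv (without `--notest`, convmv only prints what it would rename and changes nothing):
convmv -f cp932 -t utf-8 * --notest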
On Ubuntu, the convmv command is not installed by default, so install it beforehand with `sudo apt install convmv`.
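Before renaming, it can help to see which names in the current directory are actually not valid UTF-8, since those are the candidates for convmv. A minimal sketch, assuming glibc's iconv, which exits with a non-zero status when its input is not valid UTF-8; the function name `list_non_utf8_names` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print each entry in the current directory whose
# name is not valid UTF-8 (for example, a name still encoded in CP932).
list_non_utf8_names() {
    local name
    for name in *; do
        [ -e "$name" ] || continue    # empty directory: the glob stays literal
        # Round-tripping UTF-8 -> UTF-8 fails on invalid byte sequences.
        if ! printf '%s' "$name" | iconv -f utf-8 -t utf-8 >/dev/null 2>&1; then
            printf '%s\n' "$name"
        fi
    done
}

list_non_utf8_names
```

If this prints nothing, the names are already UTF-8 and convmv has nothing to convert.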
If a zip containing Japanese file names is extracted on Linux with the "standard" procedure (plain `unzip` without `-O`), the names apparently come out as garbled characters that are nevertheless encoded as UTF-8. Running convmv afterwards does not help: it treats the names as already UTF-8 and refuses to process them. The only option is to give up and re-extract from the original zip file using the procedure above.
To convert a Shift_JIS (CP932) text file to UTF-8, change its line endings from CR/LF (the Windows standard) to LF, and write the result to a new file:
iconv -f cp932 -t utf-8 "target file" | sed 's/\r//g' > "output file"
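One caveat with the pipeline above: its exit status is sed's, so if iconv hits bytes that are not valid CP932, the failure goes unnoticed. A minimal sketch that wraps the same conversion in a function and enables `pipefail`; the function name `cp932_to_utf8` is hypothetical, and it swaps `s/\r//g` for `s/\r$//` so that only a CR at the end of a line is removed.

```shell
#!/bin/bash
# With pipefail, the pipeline fails if iconv fails, not only if sed fails.
set -o pipefail

# Hypothetical helper: convert one CP932 text file with CR/LF line endings
# into a UTF-8 file with LF line endings.
cp932_to_utf8() {
    local src=$1 dst=$2
    # 's/\r$//' strips only a trailing CR; 's/\r//g' would also strip
    # CR characters in the middle of a line.
    iconv -f cp932 -t utf-8 "$src" | sed 's/\r$//' > "$dst"
}
```

Usage, with hypothetical file names: `cp932_to_utf8 report.csv report.utf8.csv || echo "conversion failed" >&2`.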
I also wrote a way to process every file in the current directory at once. It writes the converted contents to a file with the same name in the subdirectory `utf8`, using a bash loop.
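#!/bin/bash
# Convert every regular file in the current directory from CP932 to UTF-8,
# strip CR characters, and write the result to utf8/ under the same name.
[ -d utf8 ] || mkdir utf8
for a in *
do
    [ -f "$a" ] || continue   # skip directories, including utf8 itself
    iconv -f cp932 -t utf-8 "$a" | sed 's/\r//g' > "utf8/$a"
done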
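If the files sit in subdirectories, the loop above only sees the top level. A hypothetical recursive variant, sketched under the assumption that GNU find is available: it mirrors each file's relative path under the output directory (default `utf8`), prunes that directory so it never converts its own output, and reports files iconv cannot convert. The function name `convert_tree` is hypothetical.

```shell
#!/bin/bash
# With pipefail, an iconv failure is reported even though sed succeeds.
set -o pipefail

# Hypothetical helper: convert every regular file below the current directory
# into the same relative path under the output directory ($1, default utf8).
convert_tree() {
    local out=${1:-utf8} f
    mkdir -p "$out"
    # -prune skips the output directory; -print0 and read -d '' keep names
    # containing spaces or newlines intact.
    find . -path "./$out" -prune -o -type f -print0 |
    while IFS= read -r -d '' f; do
        f=${f#./}
        mkdir -p "$out/$(dirname "$f")"
        if ! iconv -f cp932 -t utf-8 "$f" | sed 's/\r$//' > "$out/$f"; then
            echo "conversion failed: $f" >&2
        fi
    done
}
```

Run it from the directory that holds the files, for example `convert_tree utf8`. Like the loop above, it treats every file as CP932 text, so binary files such as .xlsx will either fail or produce meaningless output.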