At my current job I was asked to output CSV files with Shift_JIS as the default character encoding.
Ruby strings are UTF-8 by default, so I assumed that converting to Shift_JIS when writing the CSV would be enough, but I got stuck on something else.
Others may run into the same problem, so I will describe the solution.
CSV output is relatively easy to implement with the generate method of the CSV class in Ruby's standard library.
Example implementation:
require "csv"
text =<<-EOS
id,first name,last name,age
1,taro,tanaka,20
2,jiro,suzuki,18
3,ami,sato,19
4,yumi,adachi,21
EOS
# Append one row to the CSV text and return the resulting string
csv = CSV.generate(text, headers: true) do |csv|
  csv.add_row(["5", "saburo", "kondo", "34"])
end
Code reference: https://docs.ruby-lang.org/ja/latest/method/CSV/s/generate.html
To convert to Shift_JIS, pass the encoding: option to specify the encoding of the output.
CSV.generate(text, headers: true, encoding: "SJIS")
With this option, the output should be converted automatically from UTF-8 to Shift_JIS. (In Ruby, "SJIS" is an alias for Windows-31J, which is why the error message names Windows-31J.)
When I implemented it this way, however, the following error occurred:
incompatible character encodings: Windows-31J and UTF-8
So I investigated why the encoding error occurs.
Narrowing down which string caused it, I found that the error was raised for the following string:
"AAA−0001"
The string looks perfectly normal, so why does it fail?
On closer examination, it turns out that converting any of the following characters to Shift_JIS (Windows-31J) raises an exception.
Code point (Unicode) | Character | Remarks |
---|---|---|
U+00A2 | ¢ | Cent sign (currency) |
U+00A3 | £ | Pound sign (currency) |
U+00AC | ¬ | NOT sign |
U+2016 | ‖ | Double vertical line |
U+2212 | − | Minus sign |
U+301C | 〜 | Wave dash |
Reference: https://osa.hatenablog.com/entry/2014/08/21/113602
"AAA−0001"
This string contains − (U+2212, minus sign) rather than the ASCII hyphen-minus (-), so that is presumably what raised the exception.
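To check which characters in a given string are the culprits, one option is to try encoding each character on its own. The following is a minimal sketch; the helper names `convertible?` and `unconvertible_chars` are hypothetical and not part of Ruby or the CSV library.

```ruby
# Hypothetical helper: true if the single character can be encoded.
def convertible?(char, encoding)
  char.encode(encoding)
  true
rescue Encoding::UndefinedConversionError
  false
end

# Hypothetical helper: lists the distinct characters of str that
# cannot be converted to the given encoding.
def unconvertible_chars(str, encoding = "Windows-31J")
  str.each_char.reject { |char| convertible?(char, encoding) }.uniq
end

p unconvertible_chars("AAA\u22120001") # U+2212 MINUS SIGN is reported
p unconvertible_chars("AAA-0001")      # ASCII hyphen-minus: nothing reported
```

Running this against the real data before exporting makes it easier to decide which characters need a substitution rule.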
Now the cause of the error is clear. So how do we solve it?
The easiest way is to extend the String class using Ruby's open classes.
Ruby places no restrictions on class inheritance: even built-in classes such as String and Array can be subclassed. Beyond that, an existing class can be reopened to add methods to the class itself, which is what "open class" refers to.
So add a method to the String class that prevents exceptions when converting to Windows-31J, as follows.
class String
  # Returns a UTF-8 copy of the string that can be converted to Windows-31J
  # without raising. The receiver itself is not modified.
  def sjisable
    # Replace the characters in the conversion table with the characters below
    from_chr = "\u{301C 2212 00A2 00A3 00AC 2013 2014 2016 203E 00A0 00F8 203A}"
    to_chr = "\u{FF5E FF0D FFE0 FFE1 FFE2 FF0D 2015 2225 FFE3 0020 03A6 3009}"
    str = tr(from_chr, to_chr)
    # Characters missing from the table: convert to Windows-31J and back to UTF-8,
    # replacing anything unmappable, so that later conversions do not raise
    str.encode("Windows-31J", "UTF-8", invalid: :replace, undef: :replace).encode("UTF-8", "Windows-31J")
  end
end
Code reference: https://qiita.com/yugo-yamamoto/items/0c12488447cb8c2fc018
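The two steps inside the method can be traced separately. The sketch below uses two standalone helpers with hypothetical names (`swap_minus`, `round_trip`) and reduces the table to a single pair, purely for illustration.

```ruby
# Step 1 (table lookup), reduced to one pair for illustration:
# U+2212 MINUS SIGN becomes U+FF0D FULLWIDTH HYPHEN-MINUS.
def swap_minus(str)
  str.tr("\u2212", "\uFF0D")
end

# Step 2 (round trip): anything Windows-31J cannot represent is
# replaced while encoding, then the result is brought back to UTF-8.
def round_trip(str)
  str.encode("Windows-31J", "UTF-8", invalid: :replace, undef: :replace)
     .encode("UTF-8", "Windows-31J")
end

p swap_minus("AAA\u22120001") == "AAA\uFF0D0001" # => true
p round_trip("ok\u{1F600}")                       # => "ok?"
```

Step 1 keeps the text readable by choosing a look-alike character, while step 2 is only a safety net: whatever it catches is lost and shows up as "?".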
Calling this method on the string at the point where the exception occurs prevents the exception.
"AAA−0001".sjisable
Open classes are very powerful and can improve development efficiency.
On the other hand, although it is convenient to add your own methods to Ruby's standard classes, someone reading the code cannot tell who defined the method or for what purpose, which can end up lowering the development efficiency of the whole team instead.
Another possible drawback is that errors can occur at unexpected times.
It also raises a question of responsibility: should the String class be responsible for changing the encoding to Shift_JIS, or, since the conversion is only needed when producing CSV, should the class that handles CSV own it?
So if you do not use open classes, it is better to create something like a CsvUtility class, consolidate the CSV handling there, and implement it so that output is possible in either Shift_JIS or UTF-8.
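As a rough sketch of that idea, the class below keeps the substitution table and the encoding step next to the CSV generation instead of inside String. Only the name CsvUtility comes from the text above; the method name, the keyword argument, and the choice to build the CSV in UTF-8 and convert the finished text afterwards are assumptions made for illustration.

```ruby
require "csv"

# Hypothetical utility class; its interface is an assumption.
class CsvUtility
  # Same substitution table as the sjisable method.
  FROM_CHARS = "\u{301C 2212 00A2 00A3 00AC 2013 2014 2016 203E 00A0 00F8 203A}"
  TO_CHARS   = "\u{FF5E FF0D FFE0 FFE1 FFE2 FF0D 2015 2225 FFE3 0020 03A6 3009}"

  # Builds CSV text from an array of rows.
  # encoding: "UTF-8" (default) or "Windows-31J" (what Ruby calls SJIS).
  def self.generate(rows, encoding: "UTF-8")
    utf8_csv = CSV.generate do |csv|
      rows.each { |row| csv << row }
    end
    return utf8_csv if encoding == "UTF-8"

    # Swap the known problem characters, then replace anything that
    # still cannot be represented in the target encoding.
    utf8_csv.tr(FROM_CHARS, TO_CHARS)
            .encode(encoding, "UTF-8", invalid: :replace, undef: :replace)
  end
end

rows = [["id", "code"], ["1", "AAA\u22120001"]]
sjis_csv = CsvUtility.generate(rows, encoding: "Windows-31J")
puts sjis_csv.encoding # Windows-31J
```

With this shape, callers never touch String directly, and the decision about which characters to substitute lives in one place.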
[Junichi Ito. Introduction to Ruby for professionals: From language specifications to test-driven development / debugging techniques](https://www.amazon.co.jp/dp/B077Q8BXHC/ref=dp-kindle-redirect?_encoding=UTF8&btkr=1)