A story about converting character encodings from UTF-8 to Shift_JIS in Ruby

At my current job, I was asked to make Shift_JIS the default character encoding for strings when exporting CSV.

Ruby strings are UTF-8 by default, so I assumed that converting to Shift_JIS at CSV output time would be enough, but I got stuck on something else entirely.

Others may run into the same problem, so here is how I solved it.

Where I got stuck

CSV output is relatively easy to implement with the generate method of the CSV class in Ruby's csv library.

A rough implementation looks like this ↓

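require "csv"

# Existing CSV text (header row) that new rows are appended to
text =<<-EOS
id,first name,last name,age
EOS

# CSV.generate appends the rows added in the block and returns the resulting string
csv = CSV.generate(text, headers: true) do |csv|
  csv.add_row(["5", "saburo", "kondo", "34"])
end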

Code reference: https://docs.ruby-lang.org/ja/latest/method/CSV/s/generate.html

To convert to Shift_JIS, pass the encoding: key to specify the encoding of the output.

CSV.generate(text, headers: true, encoding: "SJIS")

With this option, the output should be converted from UTF-8 to Shift_JIS automatically.

However, when I implemented this, the following error occurred for some reason...

incompatible character encodings: Windows-31J and UTF-8
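As a rough illustration of what this message means, the sketch below triggers the same class of error by joining a Windows-31J string with a UTF-8 string, both containing non-ASCII characters. It also shows why the message says Windows-31J even though "SJIS" was specified: in Ruby, "SJIS" is one of the names of the Windows-31J encoding. The variable names are made up for this example, and this is not necessarily the exact code path taken inside the CSV library.

```ruby
# "SJIS" is one of the names Ruby accepts for the Windows-31J encoding.
sjis = Encoding.find("SJIS")

sjis_text = "\u3042".encode(sjis) # a hiragana character, converted to Windows-31J
utf8_text = "\u3044"              # another hiragana character, still UTF-8

# Joining two non-ASCII strings with different encodings raises an error.
error = begin
  sjis_text + utf8_text
  nil
rescue Encoding::CompatibilityError => e
  e
end

puts sjis.name
puts error.class
puts error.message
```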

Investigating the cause

Next, I investigated why the encoding error occurs.

I narrowed down which string was triggering the error and found the one responsible.


It looked like a perfectly ordinary string, so why did it cause an error?

On closer examination, it turns out that converting any of the following characters to Shift_JIS raises an exception.

| Code point (Unicode) | Character | Remarks |
|---|---|---|
| U+00A2 | ¢ | Cent sign (currency) |
| U+00A3 | £ | Pound sign (currency) |
| U+00AC | ¬ | NOT sign |
| U+2016 | ‖ | Double vertical line |
| U+2212 | − | Minus sign |
| U+301C | 〜 | Wave dash |

Reference: https://osa.hatenablog.com/entry/2014/08/21/113602


The string contained a − (minus sign, U+2212), so that appears to be what raised the exception.
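The behavior described above can be checked with a small sketch like the following. It tries to convert each character from the table to Windows-31J and reports whether the conversion succeeds. The constant PROBLEM_CHARS and the helper sjis_convertible? are hypothetical names introduced only for this example.

```ruby
# Characters from the table above, keyed by the names given there.
PROBLEM_CHARS = {
  "cent sign"            => "\u00A2",
  "pound sign"           => "\u00A3",
  "not sign"             => "\u00AC",
  "double vertical line" => "\u2016",
  "minus sign"           => "\u2212",
  "wave dash"            => "\u301C"
}.freeze

# Returns true when str can be converted to Windows-31J without raising.
def sjis_convertible?(str)
  str.encode("Windows-31J")
  true
rescue Encoding::UndefinedConversionError
  false
end

PROBLEM_CHARS.each do |name, char|
  puts format("U+%04X %-22s convertible: %s", char.ord, name, sjis_convertible?(char))
end
```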


Now that the cause of the error is clear, how do we solve it?

The easiest way is to extend the String class using Ruby's open classes.

Ruby places no restrictions on class inheritance: even built-in classes such as String and Array can be subclassed to define your own classes. Beyond inheritance, Ruby also lets you reopen an existing class and add methods to it, which is what "open class" refers to.

So we add a method to the String class that prevents exceptions when converting to Windows-31J, as follows.

class String
  # Returns a copy of the string that can be converted to Windows-31J without raising
  def sjisable
    # Replace the characters in the conversion table with the counterparts below
    # (tr rather than tr!, so the receiver itself is not modified)
    from_chr = "\u{301C 2212 00A2 00A3 00AC 2013 2014 2016 203E 00A0 00F8 203A}"
    to_chr   = "\u{FF5E FF0D FFE0 FFE1 FFE2 FF0D 2015 2225 FFE3 0020 03A6 3009}"
    str = tr(from_chr, to_chr)
    # Characters not covered by the table become "?" when converted to Windows-31J;
    # converting back to UTF-8 afterwards prevents exceptions later on
    str.encode("Windows-31J", "UTF-8", invalid: :replace, undef: :replace).encode("UTF-8", "Windows-31J")
  end
end

Code reference: https://qiita.com/yugo-yamamoto/items/0c12488447cb8c2fc018

Calling this method on the string just before the point where the exception was raised makes the error go away.
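To see what the two steps inside the method do individually, here is a minimal sketch that runs them on sample strings without reopening String. The sample strings and variable names are assumptions made for this example.

```ruby
# Step 1: tr swaps a problematic character for a counterpart that
# Windows-31J can represent (minus sign -> fullwidth hyphen-minus).
mapped = "10 \u2212 3".tr("\u2212", "\uFF0D")

# Step 2: characters with no Windows-31J counterpart (here an emoji)
# are replaced instead of raising, then the text goes back to UTF-8.
fallback = "ok \u{1F600}"
           .encode("Windows-31J", "UTF-8", invalid: :replace, undef: :replace)
           .encode("UTF-8", "Windows-31J")

# After step 1 the conversion to Windows-31J no longer raises.
converted = mapped.encode("Windows-31J")

puts mapped
puts fallback
puts converted.encoding
```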


If you don't want to use open classes

Open classes are very powerful and can be used to improve development efficiency.

On the other hand, adding your own methods to Ruby's standard classes has a cost: someone reading the code cannot tell who defined the method or for what purpose, which can end up lowering the development efficiency of the whole team.

Another possible drawback is that errors can surface at unexpected times.

Some people may also question where the responsibility lies: should the String class be responsible for converting to Shift_JIS, or should that belong to the class that handles CSV, since the conversion is only needed for CSV output?

So, if you do not want to use open classes, it is better to create something like a CsvUtility class, consolidate the CSV handling there, and implement it so that it can output either Shift_JIS or UTF-8.
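A minimal sketch of that idea might look like the following. The module name CsvUtility comes from the paragraph above, but its generate method, the encoding: keyword, and the constants are assumptions made for illustration. Unlike the encoding: option of CSV.generate, this sketch builds the CSV in UTF-8 first and converts the whole string once at the end, reusing the replacement table from sjisable.

```ruby
require "csv"

# Hypothetical utility module; names and signatures are illustrative only.
module CsvUtility
  # Same replacement table as the sjisable method shown earlier.
  FROM_CHARS = "\u{301C 2212 00A2 00A3 00AC 2013 2014 2016 203E 00A0 00F8 203A}"
  TO_CHARS   = "\u{FF5E FF0D FFE0 FFE1 FFE2 FF0D 2015 2225 FFE3 0020 03A6 3009}"

  module_function

  # Builds the CSV in UTF-8, then converts it to the requested encoding.
  def generate(headers, rows, encoding: "UTF-8")
    utf8_csv = CSV.generate(+"") do |csv|
      csv << headers
      rows.each { |row| csv << row }
    end
    return utf8_csv if encoding == "UTF-8"

    # Swap known problem characters, then replace anything else
    # that still cannot be represented in the target encoding.
    utf8_csv.tr(FROM_CHARS, TO_CHARS)
            .encode(encoding, "UTF-8", invalid: :replace, undef: :replace)
  end
end

sjis_csv = CsvUtility.generate(
  ["id", "note"],
  [["1", "10 \u2212 3"]],
  encoding: "Windows-31J"
)
puts sjis_csv.encoding
```

Keeping the conversion in one place like this means callers only choose the output encoding, and the String class stays untouched.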


[Junichi Ito. Introduction to Ruby for professionals: From language specifications to test-driven development / debugging techniques](https://www.amazon.co.jp/dp/B077Q8BXHC/ref=dp-kindle-redirect?_encoding=UTF8&btkr=1)
