Converting TSV files to CSV files (with BOM) in Ruby

Introduction

This article shows how to convert a tab-delimited file (TSV file) to a CSV file with a BOM in Ruby.

Table of contents

  0. Completed code
  1. Creating a CSV file with a BOM
  2. Reading data in TSV file
  3. Read and write line by line in CSV
  4. Run the program

0. Completed code

tsv_to_csv.rb



require 'csv'

# Create the CSV file and write a BOM at the start.
File.write("meibo.csv", "\uFEFF")

# Read 'meibo.txt' line by line, turn each line into an array with split,
# and append it to the CSV file one row at a time.
CSV.open("meibo.csv", "a", force_quotes: true) do |meibo_csv|
  File.foreach('meibo.txt') do |student|
    meibo_csv << student.chomp.split("\t", -1)
  end
end

1. Creating a CSV file with a BOM

A BOM (byte order mark) is a few bytes of data placed at the beginning of text encoded in a Unicode encoding. In a Japanese environment, Excel tries to open CSV files as Shift-JIS by default, so a UTF-8 file shows garbled characters. As a workaround, adding a BOM when writing UTF-8 output makes Excel recognize the file as UTF-8 and prevents the garbling. The BOM is written as follows.

File.write("meibo.csv", "\uFEFF")
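To see what that line actually puts in the file, here is a small sketch that inspects the bytes of "\uFEFF". The variable names are only for illustration and are not part of the original script.

```ruby
# "\uFEFF" is a single code point; encoded as UTF-8 it becomes three bytes.
bom = "\uFEFF"
bom_bytes = bom.bytes                                        # byte values as integers
bom_hex = bom_bytes.map { |b| format("%02X", b) }.join(" ")  # same bytes in hex

puts bom_hex  # prints "EF BB BF"
```

These three bytes at the start of the file are what lets Excel detect UTF-8.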

2. Reading data in TSV file

The TSV file to be read is as follows.

meibo.txt


john	m	18
paul	m	20
alice	f	15
dabid	m	17
jasmin	f	17

meibo.txt contains tab-delimited data. Rather than loading the whole file first, we write one line each time we read one line: split converts each tab-delimited line into an array, and that array is written to the CSV file as is.

# meibo_csv is the CSV object opened with CSV.open in the next section.
File.foreach('meibo.txt') do |student|
  meibo_csv << student.chomp.split("\t", -1)
end

3. Read and write line by line in CSV

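If you enclose the above code in CSV.open, each line is written as soon as it is read. When converting a tab-delimited file to a CSV file, processing line by line is the basic approach, because it keeps memory use low even if the data is huge, such as 10 million rows. The file is opened in append mode ("a") so that the BOM written earlier is kept, and force_quotes: true wraps every field in double quotes.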

CSV.open("meibo.csv", "a", force_quotes: true ) do |meibo_csv|
  File.foreach('meibo.txt') do |student|
    meibo_csv << student.chomp.split("\t", -1)
  end
end

Note that -1 is passed as the second argument (limit) of split. The default limit is 0, which drops empty strings at the end of the resulting array, so a row whose last fields are empty would lose columns. Passing -1 keeps those trailing empty fields.
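The difference is easy to check with a short sketch. The sample row below is hypothetical (it is not in meibo.txt): its second field and its last two fields are empty.

```ruby
# Hypothetical TSV row with an empty 2nd field and two empty trailing fields.
line = "john\t\t18\t\t\n"

fields_default = line.chomp.split("\t")      # limit 0: trailing empty strings are dropped
fields_keep    = line.chomp.split("\t", -1)  # limit -1: every field is kept

p fields_default  # => ["john", "", "18"]
p fields_keep     # => ["john", "", "18", "", ""]
```

The empty field in the middle survives either way; only the trailing ones depend on the limit.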

4. Run the program

% ruby tsv_to_csv.rb 

A CSV file with data written in it is generated.

meibo.csv


"john","m","18"
"paul","m","20"
"alice","f","15"
"dabid","m","17"
"jasmin","f","17"

The data has been correctly converted to a comma-separated CSV file.
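The BOM is invisible in the listing above, so one way to confirm it is really there is to read the file back. The following is a minimal sketch, not part of the original script: it assumes a throwaway file in a temporary directory and only two sample rows, and it uses Ruby's "BOM|UTF-8" read mode to strip the BOM before parsing.

```ruby
require 'csv'
require 'tmpdir'

first_bytes = nil
rows = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "meibo.csv")  # temporary stand-in for meibo.csv

  # Same write pattern as the article: BOM first, then append quoted rows.
  File.write(path, "\uFEFF")
  CSV.open(path, "a", force_quotes: true) do |csv|
    csv << %w[john m 18]
    csv << %w[paul m 20]
  end

  # The first three bytes of the file should be the UTF-8 BOM.
  first_bytes = File.binread(path, 3).bytes

  # "r:BOM|UTF-8" removes a leading BOM, so the parser sees clean text.
  text = File.open(path, "r:BOM|UTF-8", &:read)
  rows = CSV.parse(text)
end

p first_bytes  # => [239, 187, 191]
p rows         # => [["john", "m", "18"], ["paul", "m", "20"]]
```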
