I made a Ruby script template that processes a huge file (even 1 million lines) line by line

I copy this template whenever I need to process a huge file.

Benefits

- You don't have to look up the commands for reading and writing CSV and other files.
- Progress is displayed.
  - Reading a huge file usually takes about 10 minutes, and without a progress display you cannot tell how far along it is.

Script template

The following script is the template for reading a CSV file.

- If you want to work with a plain text file, see [here](https://github.com/setsumaru1992/portableScripts/blob/master/frequent_use_codes/ruby_file_handlers/read_and_write_file.rb).
- If you want to aggregate after reading all the lines, rather than processing each line sequentially, see [here](https://github.com/setsumaru1992/portableScripts/blob/master/frequent_use_codes/ruby_file_handlers/read_csv_lines_and_write_result.rb).

require "csv"

$is_debug = true

def main(csv) #, output_file)
  # output_file_writer = CSV.open(output_file, "w")
  # output_cols = ["hoge"]
  # output_file_writer.puts(output_cols)

  FileHandler.csv_foreach(csv) do |row|
    # Process each row here
    p row

    # output_row_values = []
    # output_file_writer.puts(output_row_values)
  end

  # puts "#{output_file} is created"
  # output_file_writer.close
end

module FileHandler
  class << self
    def csv_foreach(csv)
      log "#{Time.now}: read start #{csv}"
      # Subtract 1 because the header line is not yielded as a row
      all_line_count = line_count(csv) - 1

      return_values = CSV.foreach(csv, headers: true).with_index(1).map do |row, row_no|
        log progress(row_no, all_line_count) if progress_timing?(all_line_count, row_no)
        yield(row)
      end

      log "#{Time.now}: read end #{csv}"

      return_values
    end

    def line_count(file)
      # File.open (not Kernel#open) so the path is never treated as a command
      File.open(file) do |f|
        while f.gets; end
        f.lineno
      end
    end

    private

    def log(message)
      puts(message) if $is_debug
    end

    def progress_timing?(all_line_count, line_no)
      return false if all_line_count < 100

      # NOTE: Number of progress reports; adjust to the processing time
      div_number = 100

      percent_unit = all_line_count / div_number
      line_no % percent_unit == 0
    end

    def progress(current_count, all_count)
      "#{Time.now}: #{CommonUtilities.percent(current_count, all_count)}% (#{CommonUtilities.number_with_commma(current_count)} / #{CommonUtilities.number_with_commma(all_count)})"
    end
  end
end

module CommonUtilities
  class << self
    def percent(num, all_count)
      (num.fdiv(all_count) * 100).round(2)
    end

    def number_with_commma(number)
      number.to_s.gsub(/(\d)(?=\d{3}+$)/, '\\1,')
    end
  end
end

main(ARGV[0])
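
As a rough illustration of how the commented-out output lines in the template might be filled in, here is a minimal sketch that reads a CSV row by row and writes a converted CSV. It is not part of the original template: the sample data, the column names, and the method name `convert` are hypothetical. It uses plain `CSV.foreach` instead of `FileHandler.csv_foreach` so that it runs on its own, and the block form of `CSV.open` so that the writer is closed automatically.

```ruby
require "csv"
require "tempfile"

# Hypothetical sample input: a header line followed by 3 data rows.
input = Tempfile.new(["input", ".csv"])
input.write("name,price\napple,100\nbanana,250\ncherry,1200\n")
input.close

# Hypothetical output destination.
output = Tempfile.new(["output", ".csv"])
output.close

# Same shape as main(csv, output_file) in the template,
# with the output writer enabled.
def convert(csv, output_file)
  CSV.open(output_file, "w") do |writer|
    writer << ["name", "price_with_tax"] # output header

    # Rows are read one at a time, so the whole input is never held in memory.
    CSV.foreach(csv, headers: true) do |row|
      # Integer arithmetic (assumed 10% tax) to avoid float rounding.
      price_with_tax = row["price"].to_i * 110 / 100
      writer << [row["name"], price_with_tax]
    end
  end
end

convert(input.path, output.path)
result = File.read(output.path)
puts result
```

In the template itself, the same per-row logic would go inside the `FileHandler.csv_foreach(csv) do |row| ... end` block, which adds the progress log on top of this.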
