Numeric values changing slightly in a pandas CSV round trip, and a workaround

Introduction

A memo on numeric values changing slightly when data is read with pandas' read_csv, modified (for example, by renaming columns), and written back out with to_csv.

Source

Below is the minimal code that shows the essence of the issue. It just reads a CSV file and writes it straight back out.

PandasTest.py


import pandas as pd
import argparse


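def main():
    # Paths of the CSV file to read and the CSV file to write
    parser = argparse.ArgumentParser()
    parser.add_argument("-input", type=str, required=True)
    parser.add_argument("-output", type=str, required=True)

    args = parser.parse_args()

    # Read the CSV (first column as the index) and write it back unchanged
    df = pd.read_csv(args.input, index_col=0)
    df.to_csv(args.output)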


if __name__ == "__main__":
    main()

Problem

Comparing the input file with the file written by the above script in WinMerge shows that values with many digits do not match in some places.
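The same comparison can also be scripted instead of done in a GUI diff tool. The following is only a minimal sketch: differing_lines is a hypothetical helper, and the two sample strings are hand-written illustrations, not real output from the script above. It compares lines pairwise and ignores any extra lines at the end of the longer text.

```python
def differing_lines(before, after):
    """Return (line_number, old, new) for every line that differs.

    Hypothetical helper for illustration; lines are compared pairwise,
    so trailing extra lines in either text are not reported.
    """
    diffs = []
    pairs = zip(before.splitlines(), after.splitlines())
    for number, (old, new) in enumerate(pairs, start=1):
        if old != new:
            diffs.append((number, old, new))
    return diffs


# Hand-written sample texts (assumed data, for illustration only)
before = "id,value\na,0.1234567890123456789\nb,1.5\n"
after = "id,value\na,0.12345678901234568\nb,1.5\n"

for number, old, new in differing_lines(before, after):
    print(f"line {number}: {old!r} -> {new!r}")
```

For real files, the two strings would come from reading the input and output files as text.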

Workaround

I have not pinpointed the exact cause, but I suspect that digits are lost when pandas converts the values to float. The workaround is to pass dtype=object to read_csv so that no type conversion takes place and every value is kept as a string. With this change, the output matches the input exactly.

    df = pd.read_csv(args.input, index_col=0, dtype=object)
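The effect can be checked in memory without touching any files. This is a rough sketch under stated assumptions: CSV_TEXT is made-up sample data whose values carry more significant digits than a float64 can represent, and round_trip is a hypothetical helper that mirrors the read-then-write step of the script.

```python
import io

import pandas as pd

# Assumed sample data: each value has more digits than a float64 can hold
CSV_TEXT = "id,value\na,0.1234567890123456789\nb,123456789.123456789012\n"


def round_trip(text, **read_kwargs):
    """Read CSV text with read_csv and write it straight back with to_csv."""
    df = pd.read_csv(io.StringIO(text), index_col=0, **read_kwargs)
    buffer = io.StringIO()
    df.to_csv(buffer)
    # Compare line by line so platform-specific line endings do not matter
    return buffer.getvalue().splitlines()


original = CSV_TEXT.splitlines()
as_float = round_trip(CSV_TEXT)               # default: values parsed as float64
as_text = round_trip(CSV_TEXT, dtype=object)  # values kept as strings

print(as_float == original)  # False: the extra digits cannot survive as floats
print(as_text == original)   # True: the strings are written back unchanged
```

Two caveats: with dtype=object the columns hold strings, so any arithmetic needs an explicit conversion later, and read_csv still treats markers such as NA as missing values unless keep_default_na=False is also passed.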
