Introduction

A memo on how numerical values can change slightly when data is read with Pandas' read_csv, modified (for example, by renaming columns), and written back out with to_csv.

Source

Below is the minimal code that reproduces the issue. It simply reads a CSV file and writes it straight back out.

`PandasTest.py`


import pandas as pd
import argparse


def main():

    parser = argparse.ArgumentParser()
    parser.add_argument("-input", type=str, required=True)
    parser.add_argument("-output", type=str, required=True)

    args = parser.parse_args()

    df = pd.read_csv(args.input, index_col=0)
    df.to_csv(args.output)


if __name__ == "__main__":
    main()

Problem

Comparing the input file with the output file produced by the script above in WinMerge shows that values with many digits differ in some places.

Workaround

I haven't pinpointed the exact cause, but I presume digits are lost when Pandas converts the values to float internally. To prevent the type conversion, pass dtype=object to read_csv. Every column is then read as a string, and the output matches the input exactly.

    df = pd.read_csv(args.input, index_col=0, dtype=object)
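
The difference can be reproduced without any files. The following is a minimal sketch, not the script from this article: the sample data, `SAMPLE_CSV`, and the `round_trip` helper are made up for illustration. One value has more digits than a float64 can hold, and the other has a trailing zero that a float cannot remember.

```python
import io

import pandas as pd

# Hypothetical sample data (not from the article): the first value has more
# digits than float64 can represent, the second has a trailing zero.
SAMPLE_CSV = "id,value\n1,0.1234567890123456789012345\n2,1.10\n"


def round_trip(csv_text, **read_kwargs):
    """Read CSV text with read_csv and write it straight back with to_csv."""
    df = pd.read_csv(io.StringIO(csv_text), index_col=0, **read_kwargs)
    return df.to_csv()


as_float = round_trip(SAMPLE_CSV)                # default: parsed as float64
as_text = round_trip(SAMPLE_CSV, dtype=object)   # values kept as strings

# Compare line by line so that platform line endings do not matter.
print(as_float.splitlines() == SAMPLE_CSV.splitlines())
print(as_text.splitlines() == SAMPLE_CSV.splitlines())
```

With the default settings the first comparison should come out False, and with dtype=object the second should come out True. Note that with dtype=object the columns are strings, so any arithmetic on them needs an explicit conversion first. read_csv also accepts float_precision="round_trip", which changes how floats are parsed, but the values still become floats, so text such as 1.10 would presumably still not survive unchanged.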

Phenomenon that the numerical value changes slightly with Pandas and its response
