A memo on numerical values changing slightly when you read data with Pandas read_csv, rename columns or make similar edits, and write it back out with to_csv.
Below is the minimal code that captures the essence of this article. It just reads a file and writes it straight back out.
PandasTest.py
import pandas as pd
import argparse


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("-input", type=str, required=True)
    parser.add_argument("-output", type=str, required=True)
    args = parser.parse_args()
    df = pd.read_csv(args.input, index_col=0)
    df.to_csv(args.output)


if __name__ == "__main__":
    main()
Comparing the input file with the output file produced by the code above in WinMerge shows that values with many digits do not match in some places.
I haven't pinpointed the exact cause, but I presume that digits were lost when Pandas converted the values to float internally. To prevent that type conversion, add dtype=object to the arguments of read_csv. With that change, the output matches the input exactly.
df = pd.read_csv(args.input, index_col=0, dtype=object)
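The difference can be reproduced without any files. The following is only a minimal sketch: the sample data in CSV_TEXT and the helper round_trip are made up for illustration and are not part of the original script. It reads the same CSV text once with the default type inference and once with dtype=object, then compares each result with the original text.

```python
import io

import pandas as pd

# Hypothetical sample data: the first value has more digits than a
# 64-bit float can represent.
CSV_TEXT = "id,value\na,0.1234567890123456789012345\nb,1.5\n"


def round_trip(text, **read_kwargs):
    """Read CSV text with pandas and write it straight back out as text."""
    df = pd.read_csv(io.StringIO(text), index_col=0, **read_kwargs)
    return df.to_csv()  # with no path, to_csv returns the CSV as a string


# Default behaviour: the "value" column is parsed as float64.
as_float = round_trip(CSV_TEXT)

# dtype=object: every cell is kept as the original string.
as_text = round_trip(CSV_TEXT, dtype=object)

# Compare line by line so the platform's line terminator does not matter.
original_lines = CSV_TEXT.splitlines()
print("float round trip matches:", as_float.splitlines() == original_lines)
print("object round trip matches:", as_text.splitlines() == original_lines)
```

Note that with dtype=object the columns hold strings, so any numerical processing needs an explicit conversion first. If you only rename columns and write the data back out, keeping everything as strings is enough.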