When writing out prediction results computed by a machine learning model, you may want to adjust the number of digits in the output values. For example:
- For binary classification, you want to output 0/1.
- For probability values, you want to output values in [0, 1] with about 5 digits after the decimal point.
- For regression, you want to use exponential notation suited to the range of possible values.
In NumPy, you can write CSV and TSV files with numpy.savetxt(), and you can adjust the number of digits by passing the fmt parameter to that call.
In [1]: import numpy as np
In [2]: np.savetxt("output.csv", [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
In [3]: cat output.csv
1.000000000000000056e-01 2.000000000000000111e-01 2.999999999999999889e-01
4.000000000000000222e-01 5.000000000000000000e-01 5.999999999999999778e-01
In [4]: np.savetxt("output.csv", [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], fmt="%.5f")
In [5]: cat output.csv
0.10000 0.20000 0.30000
0.40000 0.50000 0.60000
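Note that the output above is space-separated, because savetxt uses a single space as its default delimiter. A minimal sketch of writing actual comma-separated values with the same fmt, using the delimiter parameter of savetxt; the file name output_comma.csv is only a placeholder chosen for this illustration:

```python
import numpy as np

data = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

# delimiter="," separates columns with commas; fmt still controls the digits.
np.savetxt("output_comma.csv", data, fmt="%.5f", delimiter=",")

# Read the file back to inspect what was written.
with open("output_comma.csv") as f:
    content = f.read()
print(content)
```

Passing delimiter="\t" in the same way should give TSV output.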
In fact, fmt also accepts a list of format strings. If you set it like fmt=["%.0f", "%.1f", "%.5f"], each column is written with its own specified number of digits.
In [6]: np.savetxt("output.csv", [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], fmt=["%.0f", "%.1f", "%.5f"])
In [7]: cat output.csv
0 0.2 0.30000
0 0.5 0.60000
In [8]: np.savetxt("output.csv", [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], fmt=["%.0e", "%.1e", "%.5e"])
In [9]: cat output.csv
1e-01 2.0e-01 3.00000e-01
4e-01 5.0e-01 6.00000e-01
However, when specifying formats per column, you must give one format for every column. If the number of formats does not match the number of columns, an error is raised.
In [10]: np.savetxt("output.csv", [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], fmt=["%.0e", "%.1e"])
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-11-6d7d74124420> in <module>()
----> 1 np.savetxt("output.csv", [[0.1, 0.2, 0.3] ,[0.4, 0.5, 0.6]], fmt=["%.0e", "%.1e"])
[...]
AttributeError: fmt has wrong shape. ['%.0e', '%.1e']
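One way to avoid this mismatch is to build the fmt list from the number of columns in the data and then override only the columns that need a different format. This is just a sketch of that idea, not something savetxt requires; the variable names and the file name output_percol.csv are made up for this example:

```python
import numpy as np

data = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])

# Start with a default format for every column, so the list length
# always equals the number of columns.
fmt = ["%.5f"] * data.shape[1]

# Override only the column that needs a different format.
fmt[0] = "%.0f"

np.savetxt("output_percol.csv", data, fmt=fmt)

# Read the file back to inspect what was written.
with open("output_percol.csv") as f:
    content = f.read()
print(content)
```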
Per-column formats are useful when the first column holds an ID, or when the explanatory variables are floats but the target variable is of a different type such as int.
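The use case above can be sketched roughly as follows. The IDs, the probabilities, the 0.5 threshold, and the file name predictions.csv are all invented for illustration. Because np.column_stack promotes the integer columns to float, the "%d" formats are what make the ID and label columns print as integers again:

```python
import numpy as np

# Hypothetical sample data: integer IDs and predicted probabilities.
ids = np.array([101, 102, 103])
probs = np.array([0.123456789, 0.5, 0.987654321])

# Assumed rule for this sketch: label is 1 when the probability is >= 0.5.
labels = (probs >= 0.5).astype(int)

# Stacking int and float columns yields a single float array.
table = np.column_stack([ids, labels, probs])

# One format per column: integer ID, integer label, 5-digit probability.
np.savetxt("predictions.csv", table, fmt=["%d", "%d", "%.5f"], delimiter=",")

# Read the file back to inspect what was written.
with open("predictions.csv") as f:
    content = f.read()
print(content)
```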