ROOT, the C++-based analysis library developed at CERN, manages and saves data in *.root files. uproot is a Python/numpy-based library that reads and writes *.root files at high speed. I couldn't find any Japanese documentation, probably because uproot was released relatively recently, but it looks very useful for connecting ROOT, which is mainly used in particle physics, nuclear physics, and space experiments, with Python-based machine learning frameworks. So I tried to summarize it here. References: the official uproot documentation and the ROOT home page.
If you want to run machine learning on ROOT files you have already created, there are several possible approaches.
The first, converting the data to CSV, is obviously tedious because you have to write both a ROOT macro to export the CSV and Python code to read it back and convert it to a numpy array. The second, PyROOT, looks straightforward at first glance, but it seems to be quite difficult to install reliably in a cloud environment such as Google Colab. (It seems possible, but ...)
Meanwhile, the recently released uproot can handle ROOT files as they are, and it is cheap to adopt, so I think it will be very helpful. Just as important for machine learning, it can read huge amounts of data at high speed: according to the development team, its read speed exceeds that of the original ROOT for large files.
uproot already seems to have many users in major experiments (at least in my field), such as the LHC (Large Hadron Collider) experiments and the [XENON-nT experiment](https://science.purdue.edu/xenon1t/), and development looks set to continue. It is also used in MLaaS developed at the LHC.
You can install it with the pip command.
pip install uproot
You can also install it with conda.
conda config --add channels conda-forge # if you haven't added conda-forge already
conda install uproot
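You do not need to have C++ ROOT installed to use uproot.
This article focuses on the most basic uses. uproot reads and writes ROOT objects by name. Note that the Python code below uses the uproot 3 interface (now published on PyPI as uproot3); uproot 4 and later changed several of these calls.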
First, prepare a sample file with a suitable ROOT macro. Run it with `root -l GenRootFile.cpp`.
GenRootFile.cpp
void GenRootFile(){
    // Output file holding one tree and one histogram
    TFile* fout = new TFile("sample.root","recreate");
    TTree* tout = new TTree("tout","tout");
    Int_t event;
    Double_t val;
    tout->Branch("event",&event,"event/I");
    tout->Branch("val"  ,&val  ,"val/D");
    // Gaussian with amplitude 1, mean 0, sigma 2
    TF1* fgaus = new TF1("fgaus","[0]*TMath::Gaus(x,[1],[2])",-10,10);
    fgaus->SetParameters(1,0,2);
    TH1D* hgaus = new TH1D("hgaus","hgaus",20,-10,10);
    Int_t Nevent = 100;
    for (Int_t ievent = 0; ievent < Nevent; ievent++) {
        event = ievent;
        val = fgaus->GetRandom();  // sample val from the Gaussian
        hgaus->Fill(val);
        tout ->Fill();
    }
    fout ->cd();
    tout ->Write();
    hgaus->Write();
    fout ->Close();
}
For example, the following histogram is generated.
Access the tree by its name ("tout" in this case).
import uproot
file = uproot.open("sample.root")
tout = file["tout"]  # look up the TTree by its name
print(tout)
You can load a branch of the tree as a numpy array.
val = tout.array("val")  # numpy array with one entry per event
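Once a branch is a numpy array, ordinary numpy operations apply to it. The following is a minimal sketch of an event selection. The arrays are random stand-ins for `tout.array("event")` and `tout.array("val")` so that the snippet runs without sample.root, and `select_events` and `max_abs` are hypothetical names introduced only for this illustration.

```python
import numpy as np

# Stand-ins for tout.array("event") and tout.array("val") (assumption:
# random data with the same shape and sigma as the sample macro).
rng = np.random.default_rng(seed=0)
event = np.arange(100, dtype=np.int32)
val = rng.normal(loc=0.0, scale=2.0, size=100)


def select_events(event, val, max_abs=2.0):
    """Keep only the events whose |val| is below max_abs."""
    mask = np.abs(val) < max_abs  # boolean array, one flag per event
    return event[mask], val[mask]


sel_event, sel_val = select_events(event, val)
print(f"{sel_val.size} of {val.size} events pass |val| < 2")
```

Because the mask is applied to both arrays, the selected event numbers and values stay aligned, which is usually what you want before handing the data to a machine learning framework.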
Histograms can also be read.
hgaus = file["hgaus"]  # look up the histogram by its name
print(hgaus.edges)     # bin edges (x-axis)
print(hgaus.values)    # bin contents (y-axis)
hgaus.show()           # print a text rendering of the histogram
The execution result looks like this.
[-10. -9. -8. -7. -6. -5. -4. -3. -2. -1. 0. 1. 2. 3.
4. 5. 6. 7. 8. 9. 10.]
[ 0. 0. 0. 0. 0. 0. 5. 13. 17. 24. 20. 14. 2. 3. 1. 0. 1. 0.
0. 0.]
0 25.2
+---------------------------------------------------------------+
[-inf, -10) 0 | |
[-10, -9) 0 | |
[-9, -8) 0 | |
[-8, -7) 0 | |
[-7, -6) 0 | |
[-6, -5) 0 | |
[-5, -4) 0 | |
[-4, -3) 5 |************ |
[-3, -2) 13 |******************************** |
[-2, -1) 17 |****************************************** |
[-1, 0) 24 |************************************************************ |
[0, 1) 20 |************************************************** |
[1, 2) 14 |*********************************** |
[2, 3) 2 |***** |
[3, 4) 3 |******* |
[4, 5) 1 |** |
[5, 6) 0 | |
[6, 7) 1 |** |
[7, 8) 0 | |
[8, 9) 0 | |
[9, 10) 0 | |
[10, inf] 0 | |
+---------------------------------------------------------------+
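As a quick sanity check, the bin edges and contents printed above are enough to estimate the number of entries, the mean, and the width of the distribution. This is a sketch in plain Python so that it runs without uproot or ROOT; the numbers are copied from the output above, and `binned_stats` is a hypothetical helper name.

```python
import math

# Bin edges and contents copied from the output printed above.
edges = [float(x) for x in range(-10, 11)]
values = [0, 0, 0, 0, 0, 0, 5, 13, 17, 24, 20, 14, 2, 3, 1, 0, 1, 0, 0, 0]


def binned_stats(edges, values):
    """Return (entries, mean, std) estimated from the bin centers."""
    centers = [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]
    entries = sum(values)
    mean = sum(c * n for c, n in zip(centers, values)) / entries
    var = sum(n * (c - mean) ** 2 for c, n in zip(centers, values)) / entries
    return entries, mean, math.sqrt(var)


entries, mean, std = binned_stats(edges, values)
print(f"entries={entries}, mean={mean:.2f}, std={std:.2f}")
```

The histogram holds 100 entries, matching Nevent in the macro, and the estimated width of roughly 1.8 is consistent with the sigma of 2 passed to SetParameters, given the coarse binning and small sample.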
You can also create a new ROOT file and write a histogram and a new tree to it as follows:
import numpy as np
# Declare an empty tree: branch name -> type
t = uproot.newtree({"branch1": int,
                    "branch2": np.int32,
                    "branch3": uproot.newbranch(np.float64, title="This is the title")})
with uproot.recreate("example.root") as f:
    f["hist"] = hgaus  # write the histogram read above under a name of your choice
    f["t"] = t         # create the (still empty) tree
I found that uproot lets me read and write ROOT files quickly in a Python environment. It seems especially useful when you want to feed large files to Python-based machine learning frameworks (PyTorch, TensorFlow, etc.).