uproot: Python / NumPy based library for reading and writing ROOT files

Overview

ROOT, the C++ based analysis library developed by CERN, manages and saves data in *.root files. uproot is a Python / NumPy based library that reads and writes these *.root files at high speed. Because it was released fairly recently, I could not find any documentation in Japanese, but it looks very useful for connecting ROOT, which is mainly used in elementary particle, nuclear, and space experiments, with Python-based machine learning frameworks, so I have tried to summarize it here. See also: Official documentation, ROOT Home Page.

Background

If you want to run machine learning on data stored in a ROOT file, there are a few possible approaches.

  1. Convert the data to CSV with a ROOT macro and read the CSV
  2. Read the file directly using PyROOT etc.

Option 1 is clearly a nuisance, because you have to write both the ROOT macro that converts the data to CSV and the Python code that reads the CSV and converts it to a NumPy array or similar. Option 2 looks straightforward at first glance, but installing PyROOT reliably in a cloud environment such as Google Colab seems to be quite difficult. (It does seem to be possible, but ...)
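To make the cost of option 1 concrete, the Python half alone might look like the following minimal sketch. The CSV contents and the column names `event` and `val` are assumptions made up for illustration (they mirror the sample tree used later in this article), the ROOT macro that would produce the CSV is not shown, and only the standard library is used.

```python
import csv
import io

# Stand-in for a CSV file that a ROOT macro would have written (hypothetical contents).
CSV_TEXT = """event,val
0,-1.25
1,0.5
2,2.75
"""

def read_columns(text):
    """Parse CSV text into a dict of columns, converting each field by hand."""
    columns = {"event": [], "val": []}
    for row in csv.DictReader(io.StringIO(text)):
        columns["event"].append(int(row["event"]))
        columns["val"].append(float(row["val"]))
    return columns

columns = read_columns(CSV_TEXT)
print(columns["event"])
print(columns["val"])
```

Every branch needs its own type conversion here, and the same list of columns has to be kept in sync with the macro. Passing `columns["val"]` to `numpy.array` would then give the array that a machine learning framework expects.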

Meanwhile, the recently released uproot can handle ROOT files as they are, and the cost of setting it up is low, so I think it will be very helpful. It also has a feature that matters a great deal for machine learning: it can read huge amounts of data at high speed. According to the development team, for large files it achieves read speeds that exceed those of the original ROOT.

It already seems to have many users in major experiments (at least the ones around me), such as the LHC (Large Hadron Collider) experiments and the [XENON-nT experiment](https://science.purdue.edu/xenon1t/), and development looks set to continue. It is also used in MLaaS developed at the LHC.

[Figure: uproot_popularity.png]

Installation

You can install it with the pip command.

pip install uproot

You can also install it with conda.

conda config --add channels conda-forge  # if you haven't added conda-forge already
conda install uproot

You do not need to have C++ ROOT installed to use uproot.
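One thing worth checking right after installation is which major version you got. As far as I know, the calls used in the rest of this article (`tout.array(...)`, `hgaus.edges`, `uproot.newtree`) belong to the uproot 3 interface, while a plain `pip install uproot` today installs a later major version whose interface differs (for example `tout["val"].array(library="np")` and `hgaus.values()`); uproot 3 is still published under the package name `uproot3`. The helper name `installed_major` below is my own, and the sketch uses only the standard library.

```python
from importlib import metadata

def installed_major(package):
    """Return the major version of an installed package, or None if it is absent."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    return int(version.split(".")[0])

for name in ("uproot", "uproot3"):
    major = installed_major(name)
    if major is None:
        print(f"{name}: not installed")
    else:
        print(f"{name}: major version {major}")
```

If `uproot` reports a major version of 4 or higher, the snippets below need to be adapted, or run against `uproot3` instead.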

How to use

In this article, we will focus on the most basic uses. uproot reads and writes objects by their ROOT object names.

Create a suitable ROOT file

Prepare a simple ROOT macro like the one below and run it with root -l GenRootFile.cpp.

GenRootFile.cpp



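void GenRootFile(){
   // Output file; "recreate" overwrites an existing sample.root
   TFile* fout = new TFile("sample.root","recreate");
   // Tree "tout" with one integer branch and one double branch
   TTree* tout = new TTree("tout","tout");
   Int_t    event;
   Double_t val;
   tout->Branch("event",&event,"event/I");
   tout->Branch("val"  ,&val  ,"val/D");

   // Gaussian with amplitude 1, mean 0, sigma 2, defined on [-10, 10]
   TF1* fgaus = new TF1("fgaus","[0]*TMath::Gaus(x,[1],[2])",-10,10);
   fgaus->SetParameters(1,0,2);
   // 20 bins of width 1 over the same range
   TH1D* hgaus = new TH1D("hgaus","hgaus",20,-10,10);
   Int_t Nevent = 100;
   for (Int_t ievent = 0; ievent < Nevent; ievent++) {
      event = ievent;
      val   = fgaus->GetRandom();   // draw one value from the Gaussian

      hgaus ->Fill(val);
      tout  ->Fill();
   }

   // Write both objects into the file and close it
   fout ->cd();
   tout ->Write();
   hgaus->Write();
   fout ->Close();
}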

For example, the following histogram is generated.

[Figure: histogram.png]

Read the tree in the root file

Access the Tree by its name ("tout" in this case).

import uproot
file = uproot.open("sample.root")
tout = file["tout"]  # look up the TTree by its name
print(tout)

Access the elements of the Tree

You can load a branch of the Tree as a NumPy array.

val  = tout.array("val")  # one entry per event

Load Histogram

Histograms can also be read.

hgaus = file["hgaus"]  # look up the histogram by its name
print(hgaus.edges)     # bin edges (x-axis)
print(hgaus.values)    # bin contents (y-axis)
hgaus.show()           # print a text rendering of the histogram

The execution result looks like this.

[-10.  -9.  -8.  -7.  -6.  -5.  -4.  -3.  -2.  -1.   0.   1.   2.   3.
   4.   5.   6.   7.   8.   9.  10.]
[ 0.  0.  0.  0.  0.  0.  5. 13. 17. 24. 20. 14.  2.  3.  1.  0.  1.  0.
  0.  0.]
               0                                                            25.2
               +---------------------------------------------------------------+
[-inf, -10) 0  |                                                               |
[-10, -9)   0  |                                                               |
[-9, -8)    0  |                                                               |
[-8, -7)    0  |                                                               |
[-7, -6)    0  |                                                               |
[-6, -5)    0  |                                                               |
[-5, -4)    0  |                                                               |
[-4, -3)    5  |************                                                   |
[-3, -2)    13 |********************************                               |
[-2, -1)    17 |******************************************                     |
[-1, 0)     24 |************************************************************   |
[0, 1)      20 |**************************************************             |
[1, 2)      14 |***********************************                            |
[2, 3)      2  |*****                                                          |
[3, 4)      3  |*******                                                        |
[4, 5)      1  |**                                                             |
[5, 6)      0  |                                                               |
[6, 7)      1  |**                                                             |
[7, 8)      0  |                                                               |
[8, 9)      0  |                                                               |
[9, 10)     0  |                                                               |
[10, inf]   0  |                                                               |
               +---------------------------------------------------------------+
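Note that `edges` has one more entry than `values`: 21 edges delimit 20 bins. As a small illustration of working with the two arrays, the sketch below recomputes the number of entries and a binned estimate of the mean from the numbers printed above. The values are copied from that output into plain Python lists, so uproot itself is not needed; `bin_centers` and `binned_mean` are names introduced here for illustration.

```python
# Bin edges and bin contents copied from the output above.
edges = [float(x) for x in range(-10, 11)]
values = [0, 0, 0, 0, 0, 0, 5, 13, 17, 24, 20, 14, 2, 3, 1, 0, 1, 0, 0, 0]

def bin_centers(edges):
    """Midpoint of each bin; there is one fewer center than edges."""
    return [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]

def binned_mean(edges, values):
    """Mean estimated from the bin centers weighted by the bin contents."""
    total = sum(values)
    return sum(c * v for c, v in zip(bin_centers(edges), values)) / total

print(len(edges), len(values))     # 21 20
print(sum(values))                 # 100, the number of events filled by the macro
print(binned_mean(edges, values))  # -0.3
```

With the NumPy arrays that uproot returns, the bin centers can be written more compactly as `(edges[:-1] + edges[1:]) / 2`.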

Writing

You can also create a new ROOT file and write a histogram and a new Tree to it as follows:

import numpy as np
# Declare the branches of the new Tree and their types
t = uproot.newtree({"branch1": int,
                    "branch2": np.int32,
                    "branch3": uproot.newbranch(np.float64, title="This is the title")})
with uproot.recreate("example.root") as f:
    f["hist"] = hgaus  # the key becomes the object's name in the file
    f["t"] = t
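The code above only declares the branches, so the Tree it writes has no entries. In the uproot 3 interface used here, entries are, as far as I know, appended with `extend`, which takes one array per branch, all of the same length. The following is a minimal sketch under that assumption; `check_columns` and `fill_tree` are names of my own, and `fill_tree` is meant to be called inside the `with uproot.recreate(...)` block after `f["t"] = t`.

```python
def check_columns(columns):
    """Return the common length of all branch columns, or raise if they differ."""
    lengths = {name: len(col) for name, col in columns.items()}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"branch lengths differ: {lengths}")
    return next(iter(lengths.values()))

def fill_tree(f, columns):
    """Append `columns` to the Tree stored under key "t" of a writable file `f`.

    Assumes the branch names and types declared in the article's example.
    """
    import numpy as np  # imported here so check_columns works without NumPy

    check_columns(columns)
    f["t"].extend({
        "branch1": np.asarray(columns["branch1"], dtype=int),
        "branch2": np.asarray(columns["branch2"], dtype=np.int32),
        "branch3": np.asarray(columns["branch3"], dtype=np.float64),
    })

# The length check alone can be tried without uproot:
example = {"branch1": [1, 2, 3], "branch2": [4, 5, 6], "branch3": [0.1, 0.2, 0.3]}
print(check_columns(example))  # 3
```

Checking the lengths first gives a clearer error message than letting the write fail partway through.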

Summary

I found that uproot lets me read and write ROOT files (quickly) from a Python environment. It seems especially useful when you want to feed large files to Python-based machine learning frameworks (PyTorch, TensorFlow, etc.).
