I tried out Tribuo, published by Oracle. Document: Tribuo - A Java prediction library (v4.0)

Postscript (2020/09/28): This article is out of date. I have reorganized it here: https://qiita.com/jashika/items/d7c86dd8053379fd909f

Introduction

Tribuo is a Java library for building and deploying machine learning models. The core development team is the Machine Learning Research Group at Oracle Labs, and the library is published on GitHub under the Apache 2.0 license.

- The API is strongly typed, with parameterized classes for models, predictions, datasets, and examples.

- The API is high level: models consume examples and produce predictions, not float arrays.

- The API is unified: all prediction types share the same (well-typed) API, and Tribuo's classes are parameterized by the prediction type (e.g., classification uses Label, regression uses Regressor).

- The API is reusable and modular, packaged in small pieces so you can install only what you need.
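To make the typing idea concrete, here is a toy sketch in plain Java with no Tribuo dependency. All the Toy* names and TypedApiDemo are hypothetical and only mimic the design described above. Models and predictions are parameterized by an output type, and a prediction carries a typed output plus named scores instead of a float array. Tribuo's real Output, Label, Regressor, Model, and Prediction classes are considerably richer.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy types (NOT Tribuo's API): every output type implements a
// marker interface, and models/predictions are parameterized by that type.
interface ToyOutput {}

// Classification output: a label name.
final class ToyLabel implements ToyOutput {
    final String name;
    ToyLabel(String name) { this.name = name; }
}

// Regression output: a real value.
final class ToyRegressor implements ToyOutput {
    final double value;
    ToyRegressor(double value) { this.value = value; }
}

// A prediction holds a typed output plus per-output scores, not a bare float array.
final class ToyPrediction<T extends ToyOutput> {
    final T output;
    final Map<String, Double> scores;
    ToyPrediction(T output, Map<String, Double> scores) {
        this.output = output;
        this.scores = Collections.unmodifiableMap(new HashMap<>(scores));
    }
}

// Models consume named features and produce predictions of their own output type.
interface ToyModel<T extends ToyOutput> {
    ToyPrediction<T> predict(Map<String, Double> features);
}

class TypedApiDemo {
    // A hard-coded threshold "classifier" standing in for a trained model.
    static ToyModel<ToyLabel> thresholdClassifier(String feature, double threshold) {
        return features -> {
            double x = features.getOrDefault(feature, 0.0);
            String name = x > threshold ? "positive" : "negative";
            Map<String, Double> scores = new HashMap<>();
            scores.put(name, 1.0);
            return new ToyPrediction<>(new ToyLabel(name), scores);
        };
    }

    public static void main(String[] args) {
        ToyModel<ToyLabel> model = thresholdClassifier("petalLength", 2.5);
        Map<String, Double> example = new HashMap<>();
        example.put("petalLength", 4.7);
        ToyPrediction<ToyLabel> p = model.predict(example);
        System.out.println(p.output.name + " " + p.scores);
        // ToyModel<ToyRegressor> r = model;  // does not compile: output types differ
    }
}
```

The commented-out last line shows the payoff of the design. The compiler rejects mixing a classification model into code that expects regression outputs.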

Tribuo offers a wide range of ML algorithms and features with the same API.

・ Classification: Linear models, SVMs, trees, ensembles, deep learning

・ Regression: Linear models, penalized linear regression, SVMs, trees, ensembles, deep learning

・ Clustering: K-Means

・ Anomaly detection: SVM

We plan to add more algorithms over time.

Tribuo makes it easy to load datasets, train models, and evaluate models on test data. For example, this code trains and evaluates a logistic regression model.

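// Load LibSVM-format training data, train logistic regression, then evaluate the model on the test data
var trainSet = new MutableDataset<>(new LibSVMDataSource<>(Paths.get("train-data"), new LabelFactory()));
var model    = new LogisticRegressionTrainer().train(trainSet);
var eval     = new LabelEvaluator().evaluate(model, new LibSVMDataSource<>(Paths.get("test-data"), trainSet.getOutputFactory()));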
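The following is a minimal from-scratch sketch of what "train and evaluate a logistic regression model" amounts to. It uses a one-feature toy dataset. The class ToyLogisticRegression, its learning rate, and the data are made up for illustration, and in-memory arrays stand in for the LibSVM files. It uses plain batch gradient descent on the log loss. This is not how Tribuo's LogisticRegressionTrainer is implemented.

```java
// Hypothetical from-scratch logistic regression (batch gradient descent on log loss),
// for illustration only; not Tribuo's LogisticRegressionTrainer.
class ToyLogisticRegression {
    double w, b; // weight and bias for a single input feature

    static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

    // Fit on 1-D inputs xs with 0/1 labels ys.
    void train(double[] xs, int[] ys, double learningRate, int epochs) {
        for (int epoch = 0; epoch < epochs; epoch++) {
            double gradW = 0, gradB = 0;
            for (int i = 0; i < xs.length; i++) {
                double error = sigmoid(w * xs[i] + b) - ys[i]; // d(log loss)/dz
                gradW += error * xs[i];
                gradB += error;
            }
            w -= learningRate * gradW / xs.length; // step against the mean gradient
            b -= learningRate * gradB / xs.length;
        }
    }

    // Predict label 1 when the estimated probability is at least 0.5.
    int predict(double x) { return sigmoid(w * x + b) >= 0.5 ? 1 : 0; }

    // Fraction of examples whose predicted label matches the true label.
    double accuracy(double[] xs, int[] ys) {
        int correct = 0;
        for (int i = 0; i < xs.length; i++) {
            if (predict(xs[i]) == ys[i]) correct++;
        }
        return (double) correct / xs.length;
    }

    public static void main(String[] args) {
        double[] trainX = {-2, -1, 1, 2};
        int[] trainY = {0, 0, 1, 1};
        ToyLogisticRegression model = new ToyLogisticRegression();
        model.train(trainX, trainY, 0.5, 200);
        double[] testX = {-3, -0.5, 0.5, 3};
        int[] testY = {0, 0, 1, 1};
        System.out.println("accuracy = " + model.accuracy(testX, testY));
    }
}
```

This toy data is linearly separable at 0, so main prints accuracy = 1.0. Tribuo's one-liner hides exactly this train-then-score loop behind its trainer and evaluator.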

Getting Started

To use Tribuo in a project, add the following Maven dependency:

<dependency>
    <groupId>org.tribuo</groupId>
    <artifactId>tribuo-all</artifactId>
    <version>4.0.0</version>
    <type>pom</type>
</dependency>

The tribuo-all module pulls in all of Tribuo; later, you can depend on only the subset of modules your use case needs. Here is a simple example that shows how to build and evaluate a classification system in four steps:

  1. Load the iris species classification dataset from a CSV file.
  2. Split the dataset into a training set and a test set.
  3. Train two types of models using different trainers.
  4. Use a model to make predictions on the test set, and evaluate its performance over the whole test set.
import java.nio.file.Paths;
import org.tribuo.DataSource;
import org.tribuo.Model;
import org.tribuo.MutableDataset;
import org.tribuo.Prediction;
import org.tribuo.classification.Label;
import org.tribuo.classification.LabelFactory;
import org.tribuo.classification.dtree.CARTClassificationTrainer;
import org.tribuo.classification.evaluation.LabelEvaluation;
import org.tribuo.classification.evaluation.LabelEvaluator;
import org.tribuo.classification.sgd.linear.LogisticRegressionTrainer;
import org.tribuo.data.csv.CSVLoader;
import org.tribuo.evaluation.TrainTestSplitter;

// Load the labelled iris data
var irisHeaders = new String[]{"sepalLength", "sepalWidth", "petalLength", "petalWidth", "species"};
DataSource<Label> irisData =
        new CSVLoader<>(new LabelFactory()).loadDataSource(Paths.get("bezdekIris.data"),
                                     /* Output column   */ irisHeaders[4],
                                     /* Column headers  */ irisHeaders);

// Split the iris data into a training set (70%) and a test set (30%)
var splitIrisData = new TrainTestSplitter<>(irisData,
                       /* Train fraction */ 0.7,
                             /* RNG seed */ 1L);
var trainData = new MutableDataset<>(splitIrisData.getTrain());
var testData = new MutableDataset<>(splitIrisData.getTest());

// Train a decision tree
var cartTrainer = new CARTClassificationTrainer();
Model<Label> tree = cartTrainer.train(trainData);

// Train a logistic regression model
var linearTrainer = new LogisticRegressionTrainer();
Model<Label> linear = linearTrainer.train(trainData);

// Finally, make a prediction on unseen data.
// Each prediction maps the output names (labels) to scores/probabilities.
Prediction<Label> prediction = linear.predict(testData.getExample(0));

// Evaluate on the complete test dataset to compute accuracy, F1, etc.
LabelEvaluation evaluation = new LabelEvaluator().evaluate(linear, testData);

// Inspect individual metrics manually.
double acc = evaluation.accuracy();

// Print the formatted evaluation summary.
System.out.println(evaluation.toString());

The formatted evaluation output looks like this:

Class                           n          tp          fn          fp      recall        prec          f1
Iris-versicolor                16          16           0           1       1.000       0.941       0.970
Iris-virginica                 15          14           1           0       0.933       1.000       0.966
Iris-setosa                    14          14           0           0       1.000       1.000       1.000
Total                          45          44           1           1
Accuracy                                                                    0.978
Micro Average                                                               0.978       0.978       0.978
Macro Average                                                               0.978       0.980       0.978
Balanced Error Rate                                                         0.022
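Every figure in this table follows from the per-class tp/fn/fp counts:

- recall = tp/(tp+fn)
- precision = tp/(tp+fp)
- F1 = 2tp/(2tp+fp+fn)
- accuracy = total tp / n
- macro averages are unweighted means over the classes
- the balanced error rate here equals 1 minus the macro-averaged recall

The plain-Java sketch below recomputes these values from the counts shown above. It does not call Tribuo, and the class name EvalMetrics is hypothetical. It is a handy sanity check when reading Tribuo's output.

```java
import java.util.Locale;

// Hypothetical helper: recompute the evaluation table from its tp/fn/fp counts
// using the standard definitions of recall, precision and F1.
class EvalMetrics {
    static double recall(int tp, int fn) { return tp / (double) (tp + fn); }
    static double precision(int tp, int fp) { return tp / (double) (tp + fp); }
    // F1 = harmonic mean of precision and recall = 2tp / (2tp + fp + fn)
    static double f1(int tp, int fn, int fp) { return 2.0 * tp / (2.0 * tp + fp + fn); }

    public static void main(String[] args) {
        String[] classes = {"Iris-versicolor", "Iris-virginica", "Iris-setosa"};
        int[][] counts = {{16, 0, 1}, {14, 1, 0}, {14, 0, 0}}; // {tp, fn, fp} from the table
        int n = 45; // total test examples
        double sumRecall = 0, sumPrec = 0, sumF1 = 0;
        int totalTp = 0;
        for (int i = 0; i < classes.length; i++) {
            int tp = counts[i][0], fn = counts[i][1], fp = counts[i][2];
            sumRecall += recall(tp, fn);
            sumPrec += precision(tp, fp);
            sumF1 += f1(tp, fn, fp);
            totalTp += tp;
            System.out.printf(Locale.ROOT, "%-16s recall=%.3f prec=%.3f f1=%.3f%n",
                    classes[i], recall(tp, fn), precision(tp, fp), f1(tp, fn, fp));
        }
        double macroRecall = sumRecall / classes.length;
        System.out.printf(Locale.ROOT, "accuracy=%.3f%n", totalTp / (double) n);
        System.out.printf(Locale.ROOT, "macro recall=%.3f prec=%.3f f1=%.3f%n",
                macroRecall, sumPrec / classes.length, sumF1 / classes.length);
        // Balanced error rate = 1 - mean per-class recall
        System.out.printf(Locale.ROOT, "balanced error rate=%.3f%n", 1 - macroRecall);
    }
}
```

The rounded output matches the table: 0.941 precision and 0.970 F1 for Iris-versicolor, 0.978 accuracy, 0.980 macro precision, and 0.022 balanced error rate. With single-label classes, total fp equals total fn, so the micro-averaged precision, recall, and F1 all equal the accuracy.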

For more details on this example, see the classification tutorial, which uses the same iris dataset. (Update: I have since translated that tutorial as well.)
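Step 2 of the example passes a fixed RNG seed (1L) so that the 70/30 split is reproducible from run to run. The sketch below shows that idea with only the standard library. It shuffles with a seeded Random and then cuts at the train fraction. SeededSplit is a hypothetical name. Tribuo's TrainTestSplitter may shuffle and round differently, so this will not reproduce its exact partition.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical seeded train/test split; not Tribuo's TrainTestSplitter.
class SeededSplit {
    // Returns [train, test]; the same seed always yields the same partition.
    static <T> List<List<T>> split(List<T> data, double trainFraction, long seed) {
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed)); // deterministic for a given seed
        int trainSize = (int) Math.round(trainFraction * shuffled.size());
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, trainSize)));
        parts.add(new ArrayList<>(shuffled.subList(trainSize, shuffled.size())));
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 150; i++) rows.add(i); // the iris dataset has 150 rows
        List<List<Integer>> parts = split(rows, 0.7, 1L);
        System.out.println("train=" + parts.get(0).size() + " test=" + parts.get(1).size());
    }
}
```

With 150 rows this gives 105 training and 45 test rows. That matches the 45 test examples in the evaluation table above.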

Documentation Overview

The feature list gives an overview of what Tribuo can do and of the algorithms it supports, both natively and through interfaces to third-party libraries. The best way to understand Tribuo is to read the Tribuo architecture documentation. It describes the basic definitions, data flow, library structure, configuration (including options and provenance), data loading, transformations, examples, and the obfuscation features that can be used to hide input features. The package structure overview describes how Tribuo's packages are organized around the machine learning tasks they support. These packages are grouped into modules so that Tribuo users can depend on only the parts they need. Be sure to read the security considerations for using Tribuo and what is expected of users. See the FAQ for other issues and common questions, and see Tribuo's JavaDoc for details on all classes and packages.
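As a rough illustration of what hiding input features can mean, the sketch below replaces each feature name with a salted SHA-256 digest before it is stored. A serialized model would then not expose the raw names. This shows only the general idea, not Tribuo's obfuscation mechanism. FeatureNameHasher and the salt value are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical feature-name obfuscation: salted SHA-256 of each name.
// Not Tribuo's implementation; how the salt is chosen and kept secret is an assumption.
class FeatureNameHasher {
    private final String salt;

    FeatureNameHasher(String salt) { this.salt = salt; }

    // Hex-encoded SHA-256 of salt + feature name (64 hex characters).
    String hash(String featureName) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest((salt + featureName).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b & 0xff));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Rewrites an example's features: values are kept, names are replaced by hashes.
    Map<String, Double> obfuscate(Map<String, Double> features) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : features.entrySet()) {
            out.put(hash(e.getKey()), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        FeatureNameHasher hasher = new FeatureNameHasher("my-secret-salt");
        Map<String, Double> example = new LinkedHashMap<>();
        example.put("sepalLength", 5.1);
        example.put("petalWidth", 0.2);
        System.out.println(hasher.obfuscate(example));
    }
}
```

Because the hash is deterministic for a given salt, the same feature always maps to the same hashed name at training and prediction time. Anyone without the salt cannot easily recover the original names.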

Tutorials

Tutorial notebooks are available for classification, clustering, regression, anomaly detection, and the configuration system. They use a Java Jupyter notebook kernel and run on Java 10+. Converting the tutorial code to Java 8 should be straightforward: replace each var keyword with the appropriate explicit type.
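Here are two examples of that conversion:

- var cartTrainer = new CARTClassificationTrainer(); becomes CARTClassificationTrainer cartTrainer = new CARTClassificationTrainer();
- var trainData = new MutableDataset<>(splitIrisData.getTrain()); becomes MutableDataset<Label> trainData = new MutableDataset<>(splitIrisData.getTrain());

The self-contained sketch below shows the same pattern using standard-library types only, so it compiles without Tribuo on the classpath. The class Java8Style is hypothetical. Each Java 10+ var form is kept as a comment next to its Java 8 equivalent.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example: Java 8-compatible code with each `var` replaced by an explicit type.
class Java8Style {
    static Map<String, Integer> countSpecies(List<String> labels) {
        // Java 10+:  var counts = new HashMap<String, Integer>();
        Map<String, Integer> counts = new HashMap<>();
        // Java 10+:  for (var label : labels)
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum); // Map.merge is available since Java 8
        }
        return counts;
    }

    public static void main(String[] args) {
        // Java 10+:  var irisHeaders = new String[]{...};
        String[] irisHeaders = new String[]{"sepalLength", "sepalWidth", "petalLength", "petalWidth", "species"};
        // Java 10+:  var labels = new ArrayList<String>();
        List<String> labels = new ArrayList<>();
        labels.add("Iris-setosa");
        labels.add("Iris-setosa");
        labels.add("Iris-virginica");
        System.out.println(irisHeaders[4] + " counts: " + countSpecies(labels));
    }
}
```

Declaring the interface type (Map, List) on the left and keeping the diamond on the right is the usual Java 8 idiom. It is also the type that var would otherwise have inferred as the concrete class.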

Configuration and Provenance

Tribuo trainers can be fully configured through the OLCUT configuration system. This lets you define a trainer once in an XML (or JSON or EDN) file and repeatedly build models with exactly the same parameters. The config folder of each package contains example configurations for the trainers it provides. Models, like datasets, can be serialized using Java serialization, and every model stores the configuration used to create it. All models and evaluations contain a serializable provenance object. It records when the model or evaluation was created, what data was used, what transformations were applied to the data, what the trainer's hyperparameters were, and, for evaluations, which model was used. This information can be extracted to JSON or serialized directly using Java serialization. In production, the provenance information can be redacted and replaced with a hash to provide model tracking through an external system. Learn more about configuration, options, and provenance.
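The provenance-plus-hash idea can be sketched in a few lines of plain Java. This is a hypothetical illustration, not Tribuo's provenance classes or format. ToyProvenance, its field names, the createdAt timestamp, and the naive JSON rendering are all assumptions. Keys are kept sorted, so the same settings always render to the same string. A SHA-256 digest of that string then serves as a compact ID that an external tracking system could store instead of the full record.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical provenance record with a deterministic JSON rendering and a SHA-256 ID.
class ToyProvenance {
    // TreeMap keeps keys sorted so rendering (and therefore the hash) is deterministic.
    private final Map<String, String> fields = new TreeMap<>();

    ToyProvenance put(String key, String value) { fields.put(key, value); return this; }

    // Naive JSON rendering; assumes keys and values contain no quotes or backslashes.
    String toJson() {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 1) sb.append(',');
            sb.append('"').append(e.getKey()).append("\":\"").append(e.getValue()).append('"');
        }
        return sb.append('}').toString();
    }

    // Redacted form: only this digest needs to leave the system.
    String sha256() {
        try {
            byte[] d = MessageDigest.getInstance("SHA-256").digest(toJson().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : d) hex.append(String.format("%02x", b & 0xff));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ToyProvenance p = new ToyProvenance()
                .put("trainer", "LogisticRegressionTrainer")
                .put("dataset", "bezdekIris.data")
                .put("trainFraction", "0.7")
                .put("seed", "1")
                .put("createdAt", "2020-09-28T00:00:00Z"); // illustrative timestamp
        System.out.println(p.toJson());
        System.out.println("tracking id: " + p.sha256());
    }
}
```

Any change to a recorded setting, such as a different seed, produces a different tracking ID. That is what lets an external system detect that two models were built differently without seeing their full configuration.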

Platform Support & Requirements

Tribuo runs on Java 8+ and is tested on the LTS versions of Java as well as the latest release. Tribuo itself is a Java library and is supported on all Java platforms. However, some of its interfaces require native code and are supported only where the native library is available. Tribuo is tested on the x86_64 architecture on Windows 10, macOS, and Linux (RHEL / OL / CentOS 7+). If you are interested in another platform and want to use one of the native library interfaces (ONNX Runtime, TensorFlow, XGBoost), we recommend contacting the developers of those libraries.
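An application might check whether it is running on one of the tested platforms listed above before enabling a native-backed interface. The sketch below is a hypothetical guard, and PlatformCheck is an illustrative name. Passing the check does not guarantee that a native library will load. It only flags platforms outside the tested x86_64 Windows 10 / macOS / Linux set.

```java
import java.util.Locale;

// Hypothetical guard: is this JVM on one of the tested platforms named in the text?
class PlatformCheck {
    static boolean isTestedPlatform(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);
        // JVMs usually report x86_64 as "amd64" (Linux, Windows) or "x86_64" (macOS).
        boolean x64 = arch.equals("amd64") || arch.equals("x86_64");
        boolean knownOs = os.startsWith("windows 10") || os.startsWith("mac") || os.startsWith("linux");
        return x64 && knownOs;
    }

    public static void main(String[] args) {
        String name = System.getProperty("os.name");
        String arch = System.getProperty("os.arch");
        System.out.println(name + "/" + arch + " tested for native interfaces: "
                + isTestedPlatform(name, arch));
    }
}
```

Keeping the decision in a pure function of the os.name and os.arch strings makes it easy to test without running on each platform.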
