I tried Tribuo, published by Oracle. Tribuo - A Java prediction library (v4.0)

I actually tried it. To give the conclusion first: it made me very uneasy. Also, if you are going to call yourself a framework, you can at least write proper Javadoc in English. And instead of patching the manual, you should fix the implementation code properly.

Setup

I used Maven to resolve the dependencies. After creating the project, add Tribuo to the pom file as shown below.

<dependency>
    <groupId>org.tribuo</groupId>
    <artifactId>tribuo-all</artifactId>
    <version>4.0.0</version>
    <type>pom</type>
</dependency>

According to the documentation, training and evaluating a logistic regression model looks like the following, so I wrote this code in a Java class.

var trainSet = new MutableDataset<>(new LibSVMDataSource("train-data",new LabelFactory()));
var model    = new LogisticRegressionTrainer().train(trainSet);
var eval     = new LabelEvaluator().evaluate(new LibSVMDataSource("test-data",trainSet.getOutputFactory()));

The class is named TribuoSample. Huh? What is this?

[Screenshot: compile error saying no matching constructor exists]

Here are the constructors of LibSVMDataSource. None of them takes a String as an argument.

[Screenshot: the LibSVMDataSource constructors]

And what is with this java.nio.file.Path interface? java.nio.file.Paths is not an implementation of it, either... Is it considered modern because you get static access through Path.of?

java


        // Are you telling me to write it like this? (Each line below is an alternative.)
        var trainSet = new MutableDataset<>(new LibSVMDataSource<>(Path.of(new URL("file:train-data").getPath()),new LabelFactory()));

        // Or like this?
        var trainSet = new MutableDataset<>(new LibSVMDataSource<>(Paths.get("train-data"),new LabelFactory()));

        // Then wouldn't a URL do just as well? Going through Path is redundant.
        var trainSet = new MutableDataset<>(new LibSVMDataSource<>(new URL("file:train-data"),new LabelFactory()));
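For reference, here is a standard-library-only sketch of the Path routes in the snippet above. Paths is a utility class holding static get methods, not an implementation of Path; Path.of was added to the interface itself in Java 11, and since then the Javadoc of Paths recommends Path.of over Paths.get, noting that Paths may be deprecated in a future release. PathRoutes is a hypothetical name used only for this comparison, and toUrl mirrors the path.toUri().toURL() conversion that the Path constructor performs.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical comparison class: standard-library routes from a file name
// to the Path/URL that the LibSVMDataSource constructors take.
final class PathRoutes {
    private PathRoutes() {}

    // Java 11+: static factory method on the Path interface itself.
    static Path viaPathOf(String name) {
        return Path.of(name);
    }

    // Java 7+: the Paths utility class (static methods only; it does not implement Path).
    static Path viaPaths(String name) {
        return Paths.get(name);
    }

    // The same conversion the Path constructor uses: path.toUri().toURL().
    // toUri() makes a relative path absolute against the current working directory.
    static URL toUrl(Path path) throws MalformedURLException {
        return path.toUri().toURL();
    }

    public static void main(String[] args) throws MalformedURLException {
        Path a = viaPathOf("train-data");
        Path b = viaPaths("train-data");
        System.out.println(a.equals(b)); // true: same default file system, same name
        System.out.println(toUrl(a));    // a file: URL ending in /train-data
    }
}
```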

Either the manual has to be fixed or the class itself has to accept a String, and I think it is the class, LibSVMDataSource, that should be fixed. For the time being, you could do it like this.

LibSVMDataSource



    public LibSVMDataSource(String url, OutputFactory<T> outputFactory) throws IOException {
        this(null,new URL(url),outputFactory,false,false,0);
    }

    public LibSVMDataSource(String url, OutputFactory<T> outputFactory, boolean zeroIndexed, int maxFeatureID) throws IOException {
        this(null,new URL(url),outputFactory,true,zeroIndexed,maxFeatureID);
    }
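One caveat about the String constructors above: they pass the argument to new URL(url), and java.net.URL requires a scheme, so the documentation's "train-data" would still fail with a MalformedURLException (which is an IOException); only a spec such as "file:train-data" parses. Below is a minimal sketch of a hypothetical helper (StringLocations.toUrl, not part of Tribuo) that accepts either form: it tries the string as a URL first and falls back to treating it as a file path. The fallback order is a design assumption; a file name that happens to look like a URL would be read as a URL.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Path;
import java.util.Objects;

// Hypothetical helper, not part of Tribuo: String -> URL for a String-accepting constructor.
final class StringLocations {
    private StringLocations() {}

    static URL toUrl(String location) throws MalformedURLException {
        // Fail with a message that names the problem instead of a bare NullPointerException.
        Objects.requireNonNull(location, "location must not be null");
        if (location.isBlank()) {
            throw new IllegalArgumentException("location must not be blank");
        }
        try {
            // Already a URL with a scheme, e.g. "file:train-data".
            return new URL(location);
        } catch (MalformedURLException notAUrl) {
            // No scheme, e.g. "train-data": treat it as a file path and convert it
            // the same way the Path constructor does (path.toUri().toURL()).
            return Path.of(location).toUri().toURL();
        }
    }
}
```

With this, a String constructor could call this(null, StringLocations.toUrl(url), outputFactory, false, false, 0), and both "train-data" and "file:train-data" would resolve.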

However, the constructor never checks url or path. Below is the corresponding code in LibSVMDataSource.

LibSVMDataSource


    private LibSVMDataSource(Path path, URL url, OutputFactory<T> outputFactory, boolean rangeSet, boolean zeroIndexed, int maxFeatureID) throws IOException {
        this.outputFactory = outputFactory;
        this.path = path;
        this.url = url;
        this.rangeSet = rangeSet;
        if (rangeSet) {
            this.zeroIndexed = zeroIndexed;
            this.minFeatureID = zeroIndexed ? 0 : 1;
            if (maxFeatureID < minFeatureID + 1) {
                throw new IllegalArgumentException("maxFeatureID must be positive, found " + maxFeatureID);
            }
            this.maxFeatureID = maxFeatureID;
        }
        read();
    }

So how far does it get? All the way into LibSVMDataSource#read, which is called from the constructor. Below is the corresponding code.

LibSVMDataSource#read



    private void read() throws IOException {
        int pos = 0;
        ArrayList<HashMap<Integer,Double>> processedData = new ArrayList<>();
        ArrayList<String> labels = new ArrayList<>();

        // Idiom copied from Files.readAllLines,
        // but this doesn't require keeping the whole file in RAM.
        String line;
        // Parse the libsvm file, ignoring malformed lines.
        try (BufferedReader r = new BufferedReader(new InputStreamReader(url.openStream(),StandardCharsets.UTF_8))) {
            for (;;) {
                line = r.readLine();
                if (line == null) {
                    break;
                }
                pos++;
                String[] fields = splitPattern.split(line);
                try {
                    boolean valid = true;
                    HashMap<Integer, Double> features = new HashMap<>();
                    for (int i = 1; i < fields.length && valid; i++) {
                        int ind = fields[i].indexOf(':');
                        if (ind < 0) {
                            logger.warning(String.format("Weird line at %d", pos));
                            valid = false;
                        }
                        String ids = fields[i].substring(0, ind);
                        int id = Integer.parseInt(ids);
                        if ((!rangeSet) && (maxFeatureID < id)) {
                            maxFeatureID = id;
                        }
                        if ((!rangeSet) && (minFeatureID > id)) {
                            minFeatureID = id;
                        }
                        double val = Double.parseDouble(fields[i].substring(ind + 1));
                        Double value = features.put(id, val);
                        if (value != null) {
                            logger.warning(String.format("Repeated features at line %d", pos));
                            valid = false;
                        }
                    }
                    if (valid) {
                        // Store the label
                        labels.add(fields[0]);
                        // Store the features
                        processedData.add(features);
                    } else {
                        throw new IOException("Invalid LibSVM format file");
                    }
                } catch (NumberFormatException ex) {
                    logger.warning(String.format("Weird line at %d", pos));
                    throw new IOException("Invalid LibSVM format file", ex);
                }
            }
        }

It calls url.openStream() inside the try-with-resources, and the only exception it catches is NumberFormatException. According to the descriptions on the member variables, either url or path is required.
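Looking more closely at the quoted read(): when a token has no ':', the loop sets valid = false but still calls substring(0, ind) with ind == -1 in the same iteration, which throws StringIndexOutOfBoundsException. That is not a NumberFormatException, so it escapes the catch as an unchecked exception instead of the intended IOException. Below is a minimal sketch of per-line parsing that reports every malformed token as an IOException with the line number. LibSvmLineParser and its whitespace split pattern are assumptions for illustration, not Tribuo's code.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical parser for one LibSVM line: "label id:value id:value ...".
final class LibSvmLineParser {
    // Assumption: tokens are separated by whitespace.
    private static final Pattern SPLIT = Pattern.compile("\\s+");

    private LibSvmLineParser() {}

    static Map<Integer, Double> parseFeatures(String line, int lineNumber) throws IOException {
        String[] fields = SPLIT.split(line.trim());
        Map<Integer, Double> features = new HashMap<>();
        // fields[0] is the label; every later token must be "id:value".
        for (int i = 1; i < fields.length; i++) {
            String token = fields[i];
            int colon = token.indexOf(':');
            if (colon < 0) {
                // Leave immediately rather than falling through to substring(0, -1).
                throw new IOException("Missing ':' in '" + token + "' at line " + lineNumber);
            }
            int id;
            double value;
            try {
                id = Integer.parseInt(token.substring(0, colon));
                value = Double.parseDouble(token.substring(colon + 1));
            } catch (NumberFormatException e) {
                throw new IOException("Bad number in '" + token + "' at line " + lineNumber, e);
            }
            if (features.put(id, value) != null) {
                throw new IOException("Repeated feature " + id + " at line " + lineNumber);
            }
        }
        return features;
    }
}
```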

LibSVMDataSource


    // url is the store of record.
    @Config(description="URL to load the data from. Either this or path must be set.")
    private URL url;

    @Config(description="Path to load the data from. Either this or url must be set.")
    private Path path;

LibSVMDataSource#postConfig does check whether both are null, but that is no good here: the constructors shown above call read() directly and never go through postConfig.

LibSVMDataSource#postConfig


    @Override
    public void postConfig() throws IOException {
        if (maxFeatureID != Integer.MIN_VALUE) {
            rangeSet = true;
            minFeatureID = zeroIndexed ? 0 : 1;
            if (maxFeatureID < minFeatureID + 1) {
                throw new IllegalArgumentException("maxFeatureID must be positive, found " + maxFeatureID);
            }
        }
        if ((url == null) && (path == null)) {
            throw new PropertyException("","path","At most one of url and path must be set.");
        } else if ((url != null) && (path != null) && !path.toUri().toURL().equals(url)) {
            throw new PropertyException("","path","At most one of url and path must be set");
        } else if (path != null) {
            // url is the store of record.
            try {
                url = path.toUri().toURL();
            } catch (MalformedURLException e) {
                throw new PropertyException(e,"","path","Path was not a valid URL");
            }
        }
        read();
    }

With code like this, nothing stops a null from getting through.

TribuoSample


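package org.project.eden.adam;

import java.io.IOException;
import java.net.URL;

import org.tribuo.MutableDataset;
import org.tribuo.classification.LabelFactory;
import org.tribuo.datasource.LibSVMDataSource;

public class TribuoSample {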

    /**
     * @param args main method arguments.
     */
    public static void main(String[] args) {

        URL url = null;

        try {
            var trainSet = new MutableDataset<>(
                    new LibSVMDataSource<>(url, new LabelFactory()));
        } catch (IOException e) {
            //TODO auto-generated catch block
            e.printStackTrace();
        }
    }
}

When you run this...

StackTrace


Exception in thread "main" java.lang.NullPointerException
	at org.tribuo.datasource.LibSVMDataSource.read(LibSVMDataSource.java:204)
	at org.tribuo.datasource.LibSVMDataSource.<init>(LibSVMDataSource.java:125)
	at org.tribuo.datasource.LibSVMDataSource.<init>(LibSVMDataSource.java:105)
	at org.project.eden.adam.TribuoSample.main(TribuoSample.java:28)

In this case, the message the user needs is: "Either url or path is required, but the url you passed is null." That is what it should say, but instead processing stops with the stack trace of an unexpected error. That might pass in an in-house business application, but this is a framework carrying the Oracle name, so I have to question failing like this. With path, the constructor dereferences the Path object itself, so it throws a NullPointerException on the spot.
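That kind of message is easy to produce if the url/path rule is checked in one place that every constructor goes through. Below is a minimal sketch, assuming the rule from the @Config descriptions (either url or path must be set; if both are set they must point to the same place). SourceLocation.resolve is a hypothetical helper, and throwing IllegalArgumentException instead of PropertyException is an assumption. It also avoids a wording slip in the quoted postConfig, whose first branch reports "At most one of url and path must be set." when neither is set.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Path;

// Hypothetical helper, not Tribuo API: one place that enforces "either url or path".
final class SourceLocation {
    private SourceLocation() {}

    static URL resolve(Path path, URL url) throws MalformedURLException {
        if (path == null && url == null) {
            throw new IllegalArgumentException(
                    "Either url or path must be set, but both were null.");
        }
        if (path == null) {
            return url;
        }
        URL fromPath = path.toUri().toURL();
        if (url != null && !fromPath.equals(url)) {
            throw new IllegalArgumentException(
                    "url and path point to different locations: url=" + url + ", path=" + path);
        }
        // As in postConfig, the URL is the store of record.
        return fromPath;
    }
}
```

Calling this before read() in the private constructor would turn the anonymous NullPointerException above into an error that says what the caller got wrong.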

Here is that implementation.

LibSVMDataSource


    public LibSVMDataSource(Path path, OutputFactory<T> outputFactory) throws IOException {
        this(path,path.toUri().toURL(),outputFactory,false,false,0);
    }
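Here path.toUri() runs inside the this(...) call, so a null path fails before any check could happen. In the Java versions Tribuo 4.0 targets, no statement may precede this(...), but a static helper called from the argument list can validate first. A minimal sketch with a hypothetical stand-in class (PathSourceSketch, not Tribuo's LibSVMDataSource):

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Path;
import java.util.Objects;

// Hypothetical stand-in for a Path-based data source; only the constructor chain matters here.
final class PathSourceSketch {
    private final Path path;
    private final URL url;

    PathSourceSketch(Path path) throws MalformedURLException {
        // toUrl(path) is evaluated as an argument of this(...), so it can validate
        // the Path before anything dereferences it.
        this(path, toUrl(path));
    }

    private PathSourceSketch(Path path, URL url) {
        this.path = path;
        this.url = url;
    }

    // Fails fast with a message instead of an anonymous NullPointerException from path.toUri().
    private static URL toUrl(Path path) throws MalformedURLException {
        Objects.requireNonNull(path, "path must not be null: pass the location of the LibSVM file");
        return path.toUri().toURL();
    }

    Path path() {
        return path;
    }

    URL url() {
        return url;
    }
}
```

Passing a null Path now still throws NullPointerException, but its message names the parameter and what was expected.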

I ran the following sample.

TribuoSample


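package org.project.eden.adam;

import java.io.IOException;
import java.nio.file.Path;

import org.tribuo.MutableDataset;
import org.tribuo.classification.LabelFactory;
import org.tribuo.datasource.LibSVMDataSource;

public class TribuoSample {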

    /**
     * @param args main method arguments.
     */
    public static void main(String[] args) {

        Path path = null;

        try {
            var trainSet = new MutableDataset<>(
                    new LibSVMDataSource<>(path, new LabelFactory()));

        } catch (IOException e) {
            //TODO auto-generated catch block
            //e.printStackTrace();
        }
    }
}

Needless to say, it throws a NullPointerException.

Exception in thread "main" java.lang.NullPointerException
	at org.tribuo.datasource.LibSVMDataSource.<init>(LibSVMDataSource.java:97)
	at org.project.eden.adam.TribuoSample.main(TribuoSample.java:28)

I didn't expect things to fall apart from the very first line of code on the documentation's top page. I'll run the samples from the other pages in another post.
