This article shows how to create a simple batch application with Asakusa Framework, a batch framework for Java. There is an official Asakusa Framework Tutorial, but this article walks through a much smaller example. The purpose is to give you an overview of Asakusa development and a feel for how it works.
The steps in this article were verified in the following environment.
(Before trying the steps below, install the JDK and Gradle.)
The sample application is a simple one: it reads an input file (whose only content is Hello), updates the content to Hello, World!, and writes the result to an output file.
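To make the goal concrete before bringing in the framework, here is a minimal plain-Java sketch of the same transformation. It does not use Asakusa Framework at all: the class name `PlainHelloWorld` and the method `run` are made up for illustration, the file names are simply borrowed from later in this article, and CSV quoting is ignored.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not part of Asakusa Framework:
// reads each line of the input file, appends ", World!", and writes the result.
final class PlainHelloWorld {

    static void run(Path input, Path output) throws IOException {
        List<String> results = new ArrayList<>();
        for (String line : Files.readAllLines(input, StandardCharsets.UTF_8)) {
            results.add(line + ", World!");
        }
        Files.write(output, results, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Work in a temporary folder so nothing in the project is touched.
        Path dir = Files.createTempDirectory("plain-helloworld");
        Path input = dir.resolve("hello.csv");
        Path output = dir.resolve("helloworld.csv");
        Files.write(input, List.of("Hello"), StandardCharsets.UTF_8);

        run(input, output);
        System.out.println(Files.readAllLines(output, StandardCharsets.UTF_8));
    }
}
```

The rest of the article builds the same behavior out of Asakusa's data model, operator, job flow, and batch definitions.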
Expressed as a data flow diagram (DFD), it looks as follows.
Create the following folder and use it as the project folder of the sample application.
C:\asakusa-develop\helloworld
Create a Gradle build script file (build.gradle) under the project folder you just created.
Because this sample application runs on a JVM on Windows, the script is configured to use Asakusa Vanilla.
By changing the settings here, it is also possible to generate an application for Hadoop or M3BP.
For more information on the Gradle script, see the Asakusa Gradle Plugin Reference (http://docs.asakusafw.com/latest/release/ja/html/application/gradle-plugin-reference.html).
build.gradle
group 'com.example'

buildscript {
    repositories {
        maven { url 'http://asakusafw.s3.amazonaws.com/maven/releases' }
        maven { url 'http://asakusafw.s3.amazonaws.com/maven/snapshots' }
    }
    dependencies {
        classpath group: 'com.asakusafw.gradle', name: 'asakusa-distribution', version: '0.10.0'
    }
}

apply plugin: 'asakusafw-sdk'
apply plugin: 'asakusafw-organizer'
apply plugin: 'asakusafw-vanilla'
apply plugin: 'eclipse'

asakusafwOrganizer {
    vanilla.enabled true
    profiles.prod {
        hadoop.embed true
    }
}
The following Gradle command generates the configuration files needed to import the project into Eclipse. (Run it in the project folder.)
C:\asakusa-develop\helloworld>gradle eclipse
If you want to use IntelliJ IDEA, see Using IntelliJ IDEA in the official documentation. If you use Eclipse, consider using Shafu (an Eclipse plug-in for Asakusa development).
Create a DMDL (Data Model Definition Language) script file to define the data models.
In the script below, the input and output models are named input and output, respectively, and both are configured for Direct I/O CSV so that they can be read from and written to CSV files.
Create the following folder and create the script file (models.dmdl) under it.
C:\asakusa-develop\helloworld\src\main\dmdl
models.dmdl
@directio.csv
input = {
    value : TEXT;
};

@directio.csv
output = {
    value : TEXT;
};
When you run the following Gradle command, the data model classes used by Asakusa Framework are generated from the script file. (Run it in the project folder.)
C:\asakusa-develop\helloworld>gradle compileDMDL
Create the following folder and create the com.example.operator.HelloWorldOperator class under it.
Because the operator appends a value (, World!) to the value read from the input file, it is implemented as an @Update operator (http://docs.asakusafw.com/latest/release/ja/html/dsl/operators.html#update-operator).
For more information on operators, see the Asakusa DSL Operator Reference (http://docs.asakusafw.com/latest/release/ja/html/dsl/operators.html).
The com.example.modelgen.dmdl.model.Input class is generated automatically from the DMDL.
C:\asakusa-develop\helloworld\src\main\java
com.example.operator.HelloWorldOperator
package com.example.operator;

import com.asakusafw.vocabulary.operator.Update;
import com.example.modelgen.dmdl.model.Input;

public abstract class HelloWorldOperator {

    @Update
    public void appendMessage(Input input) {
        input.setValueAsString(input.getValueAsString() + ", World!");
    }
}
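An update operator receives a record and modifies it in place, which is why appendMessage returns void. The following sketch imitates that behavior in plain Java so it can be tried without the framework. `FakeInput` is a hypothetical stand-in for the generated `Input` class (only the two accessor names are taken from the code above), and the loop merely stands in for the framework calling the operator once per record.

```java
import java.util.ArrayList;
import java.util.List;

final class UpdateSketch {

    // Hypothetical stand-in for the generated Input model class.
    static final class FakeInput {
        private String value;

        FakeInput(String value) {
            this.value = value;
        }

        String getValueAsString() {
            return value;
        }

        void setValueAsString(String value) {
            this.value = value;
        }
    }

    // Same body as HelloWorldOperator.appendMessage, applied to the stand-in.
    static void appendMessage(FakeInput input) {
        input.setValueAsString(input.getValueAsString() + ", World!");
    }

    public static void main(String[] args) {
        List<FakeInput> records = new ArrayList<>();
        records.add(new FakeInput("Hello"));

        // Stand-in for the framework invoking the operator for each record.
        for (FakeInput record : records) {
            appendMessage(record);
        }
        System.out.println(records.get(0).getValueAsString());
    }
}
```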
Create an input definition class and an output definition class. They extend AbstractInputCsvInputDescription and AbstractOutputCsvOutputDescription, respectively. These base classes are also generated automatically from the DMDL.
For more information on file I/O classes, see Direct I/O User Guide - File I/O (http://docs.asakusafw.com/latest/release/ja/html/directio/user-guide.html#id41).
com.example.jobflow.InputFile
package com.example.jobflow;

import com.example.modelgen.dmdl.csv.AbstractInputCsvInputDescription;

public class InputFile extends AbstractInputCsvInputDescription {

    @Override
    public String getBasePath() {
        return "input";
    }

    @Override
    public String getResourcePattern() {
        return "hello.csv";
    }
}
com.example.jobflow.OutputFile
package com.example.jobflow;

import com.example.modelgen.dmdl.csv.AbstractOutputCsvOutputDescription;

public class OutputFile extends AbstractOutputCsvOutputDescription {

    @Override
    public String getBasePath() {
        return "output";
    }

    @Override
    public String getResourcePattern() {
        return "helloworld.csv";
    }
}
In the following job flow class, the constructor declares the input and output of the data flow: the input definition class (InputFile) is set in @Import, and the output definition class (OutputFile) is set in @Export.
The data flow itself is described in the describe method. In the sample, the input (input) is connected to the update operator (appendMessage), and the result is written to the output (output).
For more information, see Asakusa DSL User Guide - Flow DSL.
com.example.jobflow.HelloWorldJob
package com.example.jobflow;

import com.asakusafw.vocabulary.flow.Export;
import com.asakusafw.vocabulary.flow.FlowDescription;
import com.asakusafw.vocabulary.flow.Import;
import com.asakusafw.vocabulary.flow.In;
import com.asakusafw.vocabulary.flow.JobFlow;
import com.asakusafw.vocabulary.flow.Out;
import com.asakusafw.vocabulary.flow.util.CoreOperatorFactory;
import com.example.modelgen.dmdl.model.Input;
import com.example.modelgen.dmdl.model.Output;
import com.example.operator.HelloWorldOperatorFactory;
import com.example.operator.HelloWorldOperatorFactory.AppendMessage;

@JobFlow(name = "helloWorldJob")
public class HelloWorldJob extends FlowDescription {

    final In<Input> input;
    final Out<Output> output;

    public HelloWorldJob(
            @Import(name = "input", description = InputFile.class)
            In<Input> input,
            @Export(name = "output", description = OutputFile.class)
            Out<Output> output) {
        this.input = input;
        this.output = output;
    }

    @Override
    protected void describe() {
        CoreOperatorFactory core = new CoreOperatorFactory();
        HelloWorldOperatorFactory operator = new HelloWorldOperatorFactory();
        AppendMessage appendedMessage = operator.appendMessage(input);
        output.add(core.restructure(appendedMessage.out, Output.class));
    }
}
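Read as ordinary code, the describe method wires three steps together: take a record from input, apply appendMessage, then convert the result to the Output type and pass it to output. The sketch below mimics that wiring in plain Java. `InputRecord` and `OutputRecord` are hypothetical stand-ins for the generated model classes, and `restructure` reflects only an assumption about what `core.restructure` does in this sample (copying the same-named `value` property into the other model type). It is not the framework's implementation.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

final class FlowSketch {

    // Hypothetical stand-ins for the generated Input and Output model classes.
    static final class InputRecord {
        String value;

        InputRecord(String value) {
            this.value = value;
        }
    }

    static final class OutputRecord {
        String value;
    }

    // Stand-in for the update operator: modifies the record and passes it on.
    static InputRecord appendMessage(InputRecord in) {
        in.value = in.value + ", World!";
        return in;
    }

    // Assumed behavior of restructure for this sample: copy the same-named property.
    static OutputRecord restructure(InputRecord in) {
        OutputRecord out = new OutputRecord();
        out.value = in.value;
        return out;
    }

    // input -> appendMessage -> restructure -> output
    static List<OutputRecord> describe(List<InputRecord> input) {
        Function<InputRecord, InputRecord> update = FlowSketch::appendMessage;
        Function<InputRecord, OutputRecord> flow = update.andThen(FlowSketch::restructure);
        return input.stream().map(flow).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<OutputRecord> output = describe(List.of(new InputRecord("Hello")));
        System.out.println(output.get(0).value);
    }
}
```

One difference worth keeping in mind: the real describe method only declares how the pieces are connected, and the records are processed later when the batch runs, whereas this sketch processes them immediately.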
The following batch class simply runs the job flow class (HelloWorldJob).
For information on batch classes, see Asakusa DSL User Guide - Batch DSL.
com.example.batch.HelloWorldBatch
package com.example.batch;

import com.asakusafw.vocabulary.batch.Batch;
import com.asakusafw.vocabulary.batch.BatchDescription;
import com.example.jobflow.HelloWorldJob;

@Batch(name = "example.helloWorld")
public class HelloWorldBatch extends BatchDescription {

    @Override
    protected void describe() {
        run(HelloWorldJob.class).soon();
    }
}
At this point, the contents of the source folder should have the following structure.
Generate the deployment archive file for the sample application with the following Gradle command. (Run it in the project folder.)
C:\asakusa-develop\helloworld>gradle assemble
If the command succeeds, the deployment archive file is created under the build folder.
C:\asakusa-develop\helloworld\build\asakusafw-helloworld.tar.gz
This time we deploy on Windows (in production, you would deploy to a Hadoop or M3BP environment), so add ASAKUSA_HOME to the Windows environment variables.
Variable | Value |
---|---|
ASAKUSA_HOME | C:\asakusa-develop\asakusa |
Extract the generated deployment archive file (asakusafw-helloworld.tar.gz) under ASAKUSA_HOME, then run the following command under ASAKUSA_HOME.
C:\asakusa-develop\asakusa>java -jar tools\bin\setup.jar
setup: C:\asakusa-develop\asakusa
framework version: 0.10.0
installation path is not a POSIX file system: C:\asakusa-develop\asakusa
Place the batch input file in the following folder. By default, Direct I/O files are located under the target\testing\directio folder in the user home directory.
Make sure that the location matches the basePath (the return value of the getBasePath method) and the resourcePattern (the return value of the getResourcePattern method) set in the input definition class.
C:\...<User home directory>...\target\testing\directio\input\
hello.csv
Hello
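If you prefer to create the input file from code, the following sketch builds the location from the same two values and writes the file. The root folder (target\testing\directio under the user home directory) is taken from the description above. The class and method names are hypothetical, and the resource pattern is assumed to be a plain file name without wildcards.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of Asakusa Framework.
final class InputFileSetup {

    // Default Direct I/O root as described in this article.
    static Path defaultRoot() {
        return Paths.get(System.getProperty("user.home"), "target", "testing", "directio");
    }

    // Writes the given lines to <root>/<basePath>/<resourcePattern>,
    // creating missing folders on the way.
    static Path writeInput(Path root, String basePath, String resourcePattern, List<String> lines)
            throws IOException {
        Path file = root.resolve(basePath).resolve(resourcePattern);
        Files.createDirectories(file.getParent());
        Files.write(file, lines, StandardCharsets.UTF_8);
        return file;
    }

    public static void main(String[] args) throws IOException {
        // Same values as InputFile.getBasePath() and InputFile.getResourcePattern().
        Path file = writeInput(defaultRoot(), "input", "hello.csv", List.of("Hello"));
        System.out.println("wrote " + file);
    }
}
```

Keeping the base path and file name in one place like this makes it harder for the test data and the input definition class to drift apart.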
This sample runs the batch with the Asakusa CLI (in production, it would be executed with YAESS), so add the following value to the Windows PATH environment variable.
%ASAKUSA_HOME%\bin
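Before running the batch, it can help to confirm that both settings are visible to newly started processes. The sketch below is a hypothetical pre-flight check in plain Java, not an Asakusa tool: it reads ASAKUSA_HOME and PATH with System.getenv and checks whether the bin folder appears among the PATH entries. The comparison ignores case on the assumption that it runs on Windows, and an entry written with a trailing separator would not be matched.

```java
import java.io.File;
import java.nio.file.Paths;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Asakusa Framework.
final class EnvCheck {

    // Returns true if 'entry' is one of the entries of a PATH-style value.
    static boolean pathContains(String pathValue, String separator, String entry) {
        if (pathValue == null || entry == null) {
            return false;
        }
        for (String candidate : pathValue.split(Pattern.quote(separator))) {
            if (candidate.trim().equalsIgnoreCase(entry)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String home = System.getenv("ASAKUSA_HOME");
        if (home == null) {
            System.out.println("ASAKUSA_HOME is not set");
            return;
        }
        String bin = Paths.get(home, "bin").toString();
        boolean onPath = pathContains(System.getenv("PATH"), File.pathSeparator, bin);
        System.out.println(bin + (onPath ? " is on PATH" : " is NOT on PATH"));
    }
}
```

Remember that a command prompt opened before the variables were changed will not see the new values, so run the check from a new window.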
Run the batch application with the following command.
asakusa run vanilla.example.helloWorld
Confirm that the following files are output as the execution result.
C:\...<User home directory>...\target\testing\directio\output\
helloworld.csv
"Hello, World!"
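The output value is wrapped in double quotes, most likely because it now contains a comma, which would otherwise be read as a field separator. The sketch below shows the usual CSV quoting convention (as described in RFC 4180) in plain Java. It only illustrates the convention and is not Asakusa's own CSV writer; `CsvQuoting` is a hypothetical name.

```java
// Hypothetical illustration of common CSV quoting, not Asakusa's implementation.
final class CsvQuoting {

    // Quotes a field only when it contains a comma, a double quote, or a line break.
    // Double quotes inside the field are escaped by doubling them.
    static String quote(String field) {
        boolean needsQuotes = field.contains(",")
                || field.contains("\"")
                || field.contains("\n")
                || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(quote("Hello"));
        System.out.println(quote("Hello, World!"));
    }
}
```

Under this convention the input value Hello needs no quotes, while the updated value does, which matches the difference between hello.csv and helloworld.csv.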
Asakusa Framework has reached version 0.10.0, and the barrier to development and operation has been lowered, so I took this opportunity to summarize the basics in this article. I hope you found it useful. There is a lot I have not covered yet, and I hope to introduce it in a future article.