Create a simple Apache Beam program to understand how it works
Create a program that reads a local text file and writes it as is
IntelliJ
IntelliJ IDEA 2017.3.3 (Ultimate Edition)
Build #IU-173.4301.25, built on January 16, 2018
Licensed to kaito iwatsuki
Subscription is active until January 24, 2019
For educational use only.
JRE: 1.8.0_152-release-1024-b11 x86_64
JVM: OpenJDK 64-Bit Server VM by JetBrains s.r.o
Mac OS X 10.12.6
Maven: 3.5.2
Select Maven, set the Project SDK to 1.8, and click <Next>.
Enter the GroupId and ArtifactId and click <Next> (both are beamSample this time).
Enter the Project name and click <Finish> (beamSampleProject this time).
Add ~/beamSample/src/main/java/SimpleBeam.java
SimpleBeam.java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class SimpleBeam {
    public static void main(String[] args) {
        // Default options: no runner is specified, so the local DirectRunner is used
        PipelineOptions options = PipelineOptionsFactory.create();
        Pipeline p = Pipeline.create(options);

        // Read Sample.txt (relative to the working directory); each line becomes one element
        PCollection<String> textData = p.apply(TextIO.read().from("Sample.txt"));

        // Write the lines unchanged to files whose names start with "wordcounts"
        textData.apply(TextIO.write().to("wordcounts"));

        // Run the pipeline and block until it has finished
        p.run().waitUntilFinish();
    }
}
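`TextIO.write().to("wordcounts")` treats `wordcounts` as a filename prefix, not a complete filename: by default Beam appends a shard suffix of the form `-SSSSS-of-NNNNN`, so the output may be split across several files. The runner chooses the number of shards unless you fix it (for example with `withNumShards`). The plain-Java sketch below has no Beam dependency; the `ShardNames` class and its helpers are hypothetical and exist only to illustrate that default naming pattern.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration (not Beam code): mimics the default
// "-SSSSS-of-NNNNN" shard suffix that TextIO appends to the output prefix.
public class ShardNames {

    // Builds one shard filename, e.g. ("wordcounts", 0, 3) -> "wordcounts-00000-of-00003"
    public static String shardName(String prefix, int shardIndex, int numShards) {
        return String.format("%s-%05d-of-%05d", prefix, shardIndex, numShards);
    }

    // Lists every filename produced when the output is split into numShards files
    public static List<String> allShardNames(String prefix, int numShards) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < numShards; i++) {
            names.add(shardName(prefix, i, numShards));
        }
        return names;
    }

    public static void main(String[] args) {
        // 3 is an arbitrary example; the DirectRunner decides the real shard count
        for (String name : allShardNames("wordcounts", 3)) {
            System.out.println(name);
        }
    }
}
```

This is why the success check below looks for files that start with `wordcounts-` rather than for a single file named `wordcounts`.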
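If a library's dependency is not resolved, use the command + shortcut shown in the image to add it. Because no runner is specified, the pipeline runs on the DirectRunner, so beam-runners-direct-java must be available in addition to beam-sdks-java-core.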
Create ~/beamSample/Sample.txt (the text to be read).
Open Run > Edit Configurations, click + in the upper left, and choose Maven. Then enter a Name and set:
Working directory: ~/beamSample
Command line: compile exec:java -Dexec.mainClass=SimpleBeam
The output is as follows. The run succeeded if files whose names start with `wordcounts-` are created in the ~/beamSample directory.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 4.827 s
[INFO] Finished at: 2018-02-27T02:46:10+09:00
[INFO] Final Memory: 21M/373M
[INFO] ------------------------------------------------------------------------
Process finished with exit code 0
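Checking that the `wordcounts-` files exist only shows that something was written. To confirm the text was copied as-is, you can compare the output with the input. Beam does not guarantee that lines keep their original order or end up in a single file, so the sketch below compares the lines as sorted lists. It is plain Java (java.nio only); the `VerifyCopy` class, and the assumption that it runs from ~/beamSample with the `wordcounts` prefix, are hypothetical and for illustration only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not part of Beam): checks that the "<prefix>-*" output
// files together contain exactly the lines of the input file, ignoring order.
public class VerifyCopy {

    public static boolean sameLines(Path input, Path dir, String prefix) throws IOException {
        List<String> expected = new ArrayList<>(Files.readAllLines(input, StandardCharsets.UTF_8));
        List<String> actual = new ArrayList<>();
        // Collect the lines of every output shard whose name starts with "<prefix>-"
        try (DirectoryStream<Path> shards = Files.newDirectoryStream(dir, prefix + "-*")) {
            for (Path shard : shards) {
                actual.addAll(Files.readAllLines(shard, StandardCharsets.UTF_8));
            }
        }
        // Output order is not guaranteed, so compare sorted copies
        Collections.sort(expected);
        Collections.sort(actual);
        return expected.equals(actual);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(".");  // assumed to be ~/beamSample, the working directory
        boolean ok = sameLines(dir.resolve("Sample.txt"), dir, "wordcounts");
        System.out.println(ok ? "output matches input" : "output differs from input");
    }
}
```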
There was nothing particularly tricky this time, but since I don't know IntelliJ well, I was thrown off by unfamiliar errors several times. Most of them turned out to be unresolved dependencies, and I managed to fix them with `Add_Maven`.
This time I only got the pipeline running, so next time I would like to build a simple pipeline that also serves as a review of the MapReduce concept.