[JAVA] Introduction to Apache Beam (1) ~ Reading and writing text ~

Overall purpose

Create a simple Apache Beam program to understand how it works

Goal of this article

Create a program that reads a local text file and writes its contents out unchanged

Main content



IntelliJ IDEA 2017.3.3 (Ultimate Edition)
Build #IU-173.4301.25, built on January 16, 2018
Licensed to kaito iwatsuki
Subscription is active until January 24, 2019
For educational use only.
JRE: 1.8.0_152-release-1024-b11 x86_64
JVM: OpenJDK 64-Bit Server VM by JetBrains s.r.o
Mac OS X 10.12.6

Maven : 3.5.2






import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class SimpleBeam {
    public static void main(String[] args){
        PipelineOptions options = PipelineOptionsFactory.create();

        Pipeline p = Pipeline.create(options);
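        // Read the input file; each line becomes one element of the PCollection
        PCollection<String> textData = p.apply(TextIO.read().from("Sample.txt"));
        // Write the lines back out unchanged
        // (assumption: the "wordcounts" prefix is inferred from the output file name mentioned later in this article)
        textData.apply(TextIO.write().to("wordcounts"));
        // Run the pipeline and block until it finishes
        p.run().waitUntilFinish();
    }
}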

If a library's dependency cannot be resolved, press command + and add the library, as shown in the image.









The output is as follows. If a file matching `wordcounts-.*` is created in the `~/beamSample` directory, the run succeeded.

[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 4.827 s
[INFO] Finished at: 2018-02-27T02:46:10+09:00
[INFO] Final Memory: 21M/373M
[INFO] ------------------------------------------------------------------------

Process finished with exit code 0
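
To confirm the result without opening a file browser, the check described above can be sketched in plain Java. This is only a minimal sketch: the class name `OutputCheck` and the helper `findShards` are hypothetical names chosen for illustration, and it assumes the output files sit directly under `~/beamSample` with names starting with `wordcounts-` (Beam's text sink normally appends a shard suffix such as `-00000-of-00001` to the prefix).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class OutputCheck {

    // Returns every regular file in dir whose name starts with prefix + "-", sorted by name.
    static List<Path> findShards(Path dir, String prefix) throws IOException {
        List<Path> shards = new ArrayList<>();
        // The glob is matched against the file name only, not the full path.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, prefix + "-*")) {
            for (Path candidate : stream) {
                if (Files.isRegularFile(candidate)) {
                    shards.add(candidate);
                }
            }
        }
        Collections.sort(shards);
        return shards;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the project (and therefore the output) lives in ~/beamSample.
        Path dir = Paths.get(System.getProperty("user.home"), "beamSample");
        if (!Files.isDirectory(dir)) {
            System.out.println("Directory not found: " + dir);
            return;
        }
        List<Path> shards = findShards(dir, "wordcounts");
        if (shards.isEmpty()) {
            System.out.println("No wordcounts-* files found in " + dir);
            return;
        }
        for (Path shard : shards) {
            // Print each output file together with its number of lines.
            int lineCount = Files.readAllLines(shard).size();
            System.out.println(shard.getFileName() + ": " + lineCount + " lines");
        }
    }
}
```

Because the pipeline only copies the input, the line counts of all listed files added together should equal the number of lines in `Sample.txt`.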


Where I got stuck

Nothing in this exercise was particularly difficult, but because I do not know IntelliJ well, I was thrown by unfamiliar errors several times. Most of them turned out to be unresolved dependencies, which I fixed with `Add_Maven`.

Next time

This time I only got the program running, so next time I would like to build a simple Pipeline that also serves as a review of the MapReduce concept.
