How to convert a 2D array to CSV format and save it to a file using Java 8's Stream API.
First, a simple example that just separates strings with commas (","). Suppose you have a two-dimensional array like this.
String arrays[][] = {
{ "aaa", "bbb", "ccc", "ddd", "eee" },
{ "abc", "def", "hij", "klm", "opq" },
{ "AAA", "BBB", "CCC", "DDD", "EEE" }
};
We want to write it to a file in the following form:
aaa,bbb,ccc,ddd,eee
abc,def,hij,klm,opq
AAA,BBB,CCC,DDD,EEE
First, convert each row to a comma-separated string and store it in a List, because a List of String is easy to write to a file. Since each row only has to be processed individually, turn the array into a stream with Arrays.stream() and process it as follows.
// convert each array[] to csv strings, then store to a List
List<String> list = Arrays.stream(arrays)
.map(line -> String.join(",",line))
.collect(Collectors.toList());
Since each row is an array of String, the elements can easily be combined with the [String.join()](https://docs.oracle.com/javase/jp/8/docs/api/java/lang/String.html#join-java.lang.CharSequence-java.lang.CharSequence...-) method. String.join() was added in Java 8. If you pass it a delimiter and a String array, it returns the contents of the array joined with that delimiter.
The terminal operation is collect() with Collectors.toList(), which gathers the results into a List.
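As a minimal sketch of the step above, the following wraps the same stream pipeline in a small helper. The class name CsvJoinDemo and the method toCsvLines are hypothetical names chosen for illustration; they are not part of the original program.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CsvJoinDemo {
    // Hypothetical helper: joins each row with commas, producing one string per row.
    static List<String> toCsvLines(String[][] rows) {
        return Arrays.stream(rows)
                .map(row -> String.join(",", row))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String[][] arrays = {
            { "aaa", "bbb", "ccc" },
            { "abc", "def", "hij" }
        };
        // Prints one comma-separated line per row.
        toCsvLines(arrays).forEach(System.out::println);
    }
}
```

Keeping the conversion in its own method makes it easy to check the result before anything is written to disk.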
To write a List of String to a file, it is convenient to use the Files.write() method added in Java 7.
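// save to a file in the current directory
try {
// CREATE alone does not truncate an existing file, so TRUNCATE_EXISTING is added
Files.write(Paths.get(System.getProperty("user.dir"),"out.csv"), list, StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);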
} catch (IOException e) {
e.printStackTrace();
}
Pass the Path object of the destination file as the first argument of [Files.write()](https://docs.oracle.com/javase/jp/8/docs/api/java/nio/file/Files.html#write-java.nio.file.Path-java.lang.Iterable-java.nio.file.OpenOption...-), and pass an Iterable object holding the contents you want to save as the second argument. Since the second argument is declared as Iterable<? extends CharSequence>, the elements of the Iterable must be objects of a class that implements CharSequence, such as String or StringBuffer.
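The point about Iterable<? extends CharSequence> can be tried out with a small sketch like the one below. It writes a mixed list of String and StringBuilder to a temporary file and reads it back. The class name CsvWriteDemo, the helper roundTrip, and the use of a temporary file are assumptions made for this illustration; the original program writes to out.csv in the current directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class CsvWriteDemo {
    // Hypothetical helper: writes the lines to a temp file, reads them back, then deletes the file.
    static List<String> roundTrip(Iterable<? extends CharSequence> lines) {
        try {
            Path tmp = Files.createTempFile("csv-demo", ".csv");
            try {
                // With no OpenOption, Files.write behaves as CREATE, TRUNCATE_EXISTING, WRITE.
                Files.write(tmp, lines, StandardCharsets.UTF_8);
                return Files.readAllLines(tmp, StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Any CharSequence implementation is accepted, not only String.
        List<CharSequence> lines = Arrays.asList("aaa,bbb", new StringBuilder("ccc,ddd"));
        System.out.println(roundTrip(lines));
    }
}
```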
The following is an example of reading from a file and holding the data in a two-dimensional array. This can also be processed line by line, so it is convenient to read the file with the [Files.lines()](https://docs.oracle.com/javase/jp/8/docs/api/java/nio/file/Files.html#lines-java.nio.file.Path-) method. Files.lines() reads all lines from the specified file as a stream. The return value is a Stream<String>.
// read from csv file
try (Stream<String> stream = Files.lines(Paths.get(System.getProperty("user.dir"),"out.csv"))) {
// read each line
String data[][] = stream.map(line -> line.split(","))
.map(line -> Arrays.stream(line)
.map(String::new)
.toArray(String[]::new))
.toArray(String[][]::new);
} catch (IOException e) {
e.printStackTrace();
}
Since Files.lines() already provides the data as a Stream, all that remains is to split each line on "," and convert it to an array. Stream is used twice here: the inner one processes each row and produces a String[], while the outer one processes the file as a whole. Because the outer stream is a Stream of String[] after map(), toArray() can turn it into a two-dimensional array.
Besides this approach, I think it is also possible to load all the data into a List with the [Files.readAllLines()](https://docs.oracle.com/javase/jp/8/docs/api/java/nio/file/Files.html#readAllLines-java.nio.file.Path-) method and then process it.
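A rough sketch of the Files.readAllLines() alternative might look like this. The class name CsvReadAllLinesDemo, the helper toTable, and the temporary file are assumptions for the sake of a self-contained example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class CsvReadAllLinesDemo {
    // Hypothetical helper: splits every line on "," and collects the rows into a 2D array.
    static String[][] toTable(List<String> lines) {
        return lines.stream()
                .map(line -> line.split(","))
                .toArray(String[][]::new);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("csv-demo", ".csv");
        try {
            Files.write(tmp, Arrays.asList("aaa,bbb,ccc", "abc,def,hij"));
            // readAllLines loads the whole file into memory as a List<String>.
            List<String> lines = Files.readAllLines(tmp);
            System.out.println(Arrays.deepToString(toTable(lines)));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Note that split(",") discards trailing empty strings, so a line such as a,b, yields only two fields; split(",", -1) keeps them if that matters.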
Next, consider a two-dimensional array of primitive types. Suppose you have the following array:
int arrays[][] = {
{ 11, 12, 13, 14, 15 },
{ 21, 22, 23, 24, 25 },
{ 31, 32, 33, 34, 35 }
};
We want to save it to a file like this:
11,12,13,14,15
21,22,23,24,25
31,32,33,34,35
This is also converted to a List of String using the Stream API. However, String.join() cannot be used to build the comma-separated string because each row is an int[] rather than a String[], so you need to write that part yourself. The inside of map() therefore looks like this:
// convert each array[] to csv strings, then store to a List
List<String> list = Arrays.stream(arrays)
.map(line -> Arrays.stream(line)
.mapToObj(String::valueOf)
.collect(Collectors.joining(",")))
.collect(Collectors.toList());
Since each row is an int[], Arrays.stream() turns it into an IntStream, and mapToObj() converts each element to a String object. Then, as the terminal operation, Collectors.joining() joins all the elements with ",".
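To see the per-row conversion in isolation, here is a small sketch that handles a single int[] row. The class name IntCsvJoinDemo and the method toCsvLine are hypothetical names used only for this example.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class IntCsvJoinDemo {
    // Hypothetical helper: converts one int[] row into a comma-separated string.
    static String toCsvLine(int[] row) {
        return Arrays.stream(row)              // IntStream
                .mapToObj(String::valueOf)     // Stream<String>
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(toCsvLine(new int[] { 11, 12, 13, 14, 15 }));
    }
}
```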
The output to a file is similar to that of a String array.
Reading is the same as in the string case up to the point where the file is read as a Stream with Files.lines(). This time, however, we want to store the data as an int array, so each element is converted to an int with mapToInt() in the stream processing and the result is then turned into an array.
try (Stream<String> stream = Files.lines(Paths.get(System.getProperty("user.dir"),"out2.csv"))) {
// read each line
int data[][] = stream.map(line -> line.split(","))
.map(line -> Arrays.stream(line)
.mapToInt(Integer::parseInt)
.toArray())
.toArray(int[][]::new);
} catch (IOException e) {
e.printStackTrace();
}
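The parsing step can also be tried without a file. The sketch below works on a List of lines and, as an addition that is not in the original code, trims each element first, because Integer.parseInt() throws NumberFormatException for input with surrounding spaces. The names IntCsvParseDemo and parseRows are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class IntCsvParseDemo {
    // Hypothetical helper: parses comma-separated lines into an int[][].
    static int[][] parseRows(List<String> lines) {
        return lines.stream()
                .map(line -> Arrays.stream(line.split(","))
                        .map(String::trim)             // assumption: tolerate spaces around numbers
                        .mapToInt(Integer::parseInt)   // throws NumberFormatException on bad input
                        .toArray())
                .toArray(int[][]::new);
    }

    public static void main(String[] args) {
        int[][] data = parseRows(Arrays.asList("11,12,13", "21, 22, 23"));
        System.out.println(Arrays.deepToString(data));
    }
}
```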
The first example was a simple CSV format in which strings were just separated by ",". Left as is, it cannot be processed correctly when a string contains double quotes or commas. So let's consider a program that supports CSV in which each field is enclosed in double quotes.
There are various libraries for handling CSV, but here I will try it on my own. That said, writing a proper parser is hard, so we'll use a regular expression to extract the string for each field. To that end, we simplify the CSV specification and assume the following rules.
- Fields without double quotes are treated as strings, including any spaces
- A comma inside a part enclosed in double quotes is treated as an ordinary character
- Double quotes inside double quotes are not allowed, unless they are escaped with \
- Characters outside the double quotes of a quoted field are not allowed
As an example, consider the following two-dimensional array.
String[][] arrays = {
{ "Dog" , "Cat" , "" , "Turtle", "" , "" },
{ "hoge", "pi yo" , " fuga " , " foo" , "bar ", "bow" },
{ "hoge", " pi yo", " fuga " , "foo " , "bar " , "" },
{ "hoge", "pi yo" , "fu\" ga", "foo" , "bar " , "bow" },
{ " ", "pi yo" , "fu,ga ", "foo" , " bar ", "" }
};
We want to write it to a file in the following format:
"Dog","Cat","","Turtle","",""
"hoge","pi yo"," fuga "," foo","bar ","bow"
"hoge"," pi yo"," fuga ","foo ","bar ",""
"hoge","pi yo","fu\" ga","foo","bar ","bow"
" ","pi yo","fu,ga ","foo"," bar ",""
As before, use Stream to convert each row to a comma-separated string and store it in a List. This time, however, double quotes are added to both ends of each string before the strings are joined with commas. Also, double quotes inside a string are escaped as \" before being output.
// convert each array[] to csv strings, then store to a List
List<String> list = Arrays.stream(arrays)
.map(line -> Arrays.stream(line)
.map(str -> str.replaceAll("\\\"", "\\\\\""))
.map(str -> "\"" + str + "\"")
.collect(Collectors.joining(",")))
.collect(Collectors.toList());
The output to a file is the same as in previous cases.
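The quoting step can be sketched as a pair of small helpers. Here String.replace() is swapped in for the replaceAll() call used above, since it does the same literal substitution without regular-expression escaping. The names CsvQuoteDemo, quote, and toCsvLine are hypothetical.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class CsvQuoteDemo {
    // Hypothetical helper: escapes embedded double quotes as \" and wraps the field in quotes.
    static String quote(String field) {
        return "\"" + field.replace("\"", "\\\"") + "\"";
    }

    // Hypothetical helper: quotes every field of a row and joins them with commas.
    static String toCsvLine(String[] row) {
        return Arrays.stream(row)
                .map(CsvQuoteDemo::quote)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(toCsvLine(new String[] { "hoge", "fu\" ga", "fu,ga ", "" }));
    }
}
```

Like the code above, this sketch does not escape backslashes that already appear in the data.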
When reading the data, you need to take the CSV rules mentioned above into account. There are two cases to handle: fields that are not enclosed in double quotes and fields that are.
If not enclosed in double quotes, treat the entire comma-to-comma (or line beginning to comma, comma to end of line) as a field string.
If it is enclosed in double quotes, the enclosed part is treated as a string of the field. If there is a comma in it, the comma is also part of the string. Double quotes are treated as characters only if they are escaped.
Based on the above conditions, consider a regular expression that matches the character string of each field. First, consider the case where escape characters are not considered. This can be detected by using look-ahead and look-behind, because each field is comma-to-comma, or line beginning to comma, and comma to line ending.
(?<=^|,)hogehoge(?=$|,)
The hogehoge part can be split into fields that are not enclosed in double quotes and fields that are. Unquoted fields can be expressed by the regular expression [^",]* and quoted fields by "[^"]*". The point is that the former does not allow "," while the latter does. Written together, they become the following.
(?:[^",]*|"[^"]*")
If you add the look-behind and look-ahead to this, you get the following.
(?<=^|,)(?:[^",]*|"[^"]*")(?=$|,)
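To get a feel for how this intermediate pattern behaves, the following sketch applies it to a single line and collects the raw matches, with the surrounding quotes still attached. The class name SimpleCsvRegexDemo, the method fields, and the sample line are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleCsvRegexDemo {
    // The pattern above as a Java string literal (escapes are not handled yet).
    private static final Pattern FIELD =
            Pattern.compile("(?<=^|,)(?:[^\",]*|\"[^\"]*\")(?=$|,)");

    // Hypothetical helper: returns each matched field exactly as it appears in the line.
    static List<String> fields(String line) {
        List<String> result = new ArrayList<>();
        Matcher matcher = FIELD.matcher(line);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // The comma inside "b,c" stays part of the second field; the third field is empty.
        fields("aaa,\"b,c\",,\"ddd\"").forEach(f -> System.out.println("[" + f + "]"));
    }
}
```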
Now consider escaping with \. Any character may follow the escape character, so the [^",] above becomes (?:\\.|[^\\",]). Similarly, [^"] becomes (?:\\.|[^\\"]). With that in mind, the final regular expression can be written as:
(?<=^|,)(?:(?:\\.|[^\\",])*|"(?:\\.|[^\\"])*")(?=$|,)
You can use this regular expression to retrieve the value of each field individually as you read the data from the file.
// Regex expression that matches with csv fields.
String regex = "(?<=^|,)(?:(?:\\\\.|[^\\\\\",])*|\"(?:\\\\.|[^\\\\\"])*\")(?=$|,)";
Pattern pattern = Pattern.compile(regex);
// open a file
try (Stream<String> stream = Files.lines(Paths.get(System.getProperty("user.dir"),"out3.csv"))) {
// read each line
String data[][] = stream
.map(line -> {
Matcher matcher = pattern.matcher(line);
List<String> aList = new ArrayList<>();
while (matcher.find()) {
aList.add(matcher.group().replaceFirst("^\"","").replaceFirst("\"$",""));
}
return aList;
})
.map(line -> line.stream().toArray(String[]::new))
.toArray(String[][]::new);
} catch (IOException e) {
e.printStackTrace();
}
First, compile the regular expression into a Pattern object. Getting the data from the file as a Stream and processing it line by line is the same as before. In the part that processes each line, Matcher is used to extract the matched strings. Note that in this example, fields that do not match (that is, do not follow the rules) are ignored.
Each matched string is temporarily stored in a List, and at that point the double quotes at both ends are removed. Finally, converting to arrays with toArray() is the same as before.
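The per-line processing can be exercised on a plain string, as in the sketch below. It uses the same regular expression but, as an addition that the code above does not perform, also turns \" back into " after stripping the outer quotes, so that a value written with the escaping shown earlier reads back unchanged. The names CsvLineParseDemo and parseLine are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CsvLineParseDemo {
    // Same regular expression as in the reading code above.
    private static final Pattern FIELD = Pattern.compile(
            "(?<=^|,)(?:(?:\\\\.|[^\\\\\",])*|\"(?:\\\\.|[^\\\\\"])*\")(?=$|,)");

    // Hypothetical helper: extracts the field values of one CSV line.
    static List<String> parseLine(String line) {
        List<String> result = new ArrayList<>();
        Matcher matcher = FIELD.matcher(line);
        while (matcher.find()) {
            String field = matcher.group();
            if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
                // Strip the enclosing quotes, then unescape \" (assumption: not in the original code).
                field = field.substring(1, field.length() - 1).replace("\\\"", "\"");
            }
            result.add(field);
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "\"hoge\",\"pi yo\",\"fu\\\" ga\",\"fu,ga \",\"\"";
        parseLine(line).forEach(f -> System.out.println("[" + f + "]"));
    }
}
```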
I think that by refining the regular expression part, you could also handle input with more complicated rules.