If a field in a CSV file contains a comma, splitting the record with the ordinary String.split method causes a problem: the field gets split in the middle.
For example, suppose there is a CSV record like this:
test.csv
ABC,DEF,GHI,'1,000,000',JKL,MN
When you split this record on commas, you want the string enclosed in single quotes to be treated as one field, not split at the commas inside it.
The expected result is:
Expected results
ABC
DEF
GHI
'1,000,000'
JKL
MN
However, if you simply use the split method, the record is divided like this:
Disappointing result of split(",")
ABC
DEF
GHI
'1
000
000'
JKL
MN
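The result above can be reproduced with a minimal sketch like the following. The class name NaiveSplitDemo and the helper naiveSplit are only for illustration and are not part of the Main.java shown later.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class (an assumption for illustration, not from the original post)
final class NaiveSplitDemo {

    // Splits on every comma, including the ones inside single quotes
    static List<String> naiveSplit(String line) {
        return Arrays.asList(line.split(","));
    }

    public static void main(String[] args) {
        String line = "ABC,DEF,GHI,'1,000,000',JKL,MN";
        // The quoted value is broken into '1 / 000 / 000'
        for (String col : naiveSplit(line)) {
            System.out.println(col);
        }
    }
}
```

The six intended fields come back as eight strings, because split has no idea that the quotes are supposed to group anything.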
I looked into various approaches, including regular expressions, but the split method is fundamentally a method that says "split where the condition is met".
I could not find a straightforward way to tell it "do not split where the condition is met" (here the condition is "a comma enclosed in quotation marks", and I want to skip the split only in that case), so I gave up and wrote my own method.
Main.java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
public class Main {
    public static void main(String[] args) {
        String fileName = "C:\\temp\\test.csv"; // CSV file you want to read
        File file = new File(fileName);
        // try-with-resources closes the reader (and the underlying streams) automatically
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), "SJIS"))) {
            // Read one record from the input CSV file
            String line = br.readLine();
            if (line != null) { // readLine() returns null for an empty file
                // Call your own method
                List<String> data = csvSplit(line);
                for (String col : data) {
                    System.out.print(col + "\r\n");
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
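    // Splits one CSV record on commas, ignoring commas inside single quotes
    private static List<String> csvSplit(String line) {
        char c;
        StringBuilder s = new StringBuilder();
        List<String> data = new ArrayList<String>();
        boolean singleQuoteFlg = false; // true while inside a single-quoted field
        for (int i = 0; i < line.length(); i++) {
            c = line.charAt(i);
            if (c == ',' && !singleQuoteFlg) {
                // Comma outside quotes: the current field is complete
                data.add(s.toString());
                s.delete(0, s.length());
            } else if (c == ',' && singleQuoteFlg) {
                // Comma inside quotes: part of the field
                s.append(c);
            } else if (c == '\'') {
                singleQuoteFlg = !singleQuoteFlg;
                s.append(c);
            } else {
                s.append(c);
            }
        }
        // The last field has no trailing comma, so add it after the loop
        data.add(s.toString());
        return data;
    }
}
Below are the execution results.
Execution result
ABC
DEF
GHI
'1,000,000'
JKL
MN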
The commas inside the quotes are not treated as separators, and the quoted value is correctly kept as a single string.
What the code does
- Read one CSV record (String line = br.readLine();)
- Take the characters out of the record one at a time (c = line.charAt(i);) and store them in a StringBuilder (s.append(c);)
- Branch on what the character is:
  - A comma that is not enclosed in single quotes: add the accumulated string to the list as one field and clear the StringBuilder
  - A comma that is enclosed in single quotes: append it to the StringBuilder like any other character
  - A single quote: toggle singleQuoteFlg (and append the quote itself)
  - Any other character: append it to the StringBuilder
That is the logic. By the way, if you don't want the single quotes in the output, simply don't append them to the StringBuilder.
csvSplit after modification
    // Same as before, but the single quotes themselves are dropped from the output
    private static List<String> csvSplit(String line) {
        char c;
        StringBuilder s = new StringBuilder();
        List<String> data = new ArrayList<String>();
        boolean singleQuoteFlg = false; // true while inside a single-quoted field
        for (int i = 0; i < line.length(); i++) {
            c = line.charAt(i);
            if (c == ',' && !singleQuoteFlg) {
                data.add(s.toString());
                s.delete(0, s.length());
            } else if (c == ',' && singleQuoteFlg) {
                s.append(c);
            } else if (c == '\'') {
                singleQuoteFlg = !singleQuoteFlg;
                // s.append(c); // Not appending here keeps the single quotes out of the output
            } else {
                s.append(c);
            }
        }
        // The last field has no trailing comma, so add it after the loop
        data.add(s.toString());
        return data;
    }
Execution result (without single quote)
ABC
DEF
GHI
1,000,000
JKL
MN
That's how it turned out.
If you know a better way, I would appreciate it if you could let me know!
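One possible alternative, offered only as a sketch: String.split accepts a regular expression, and a lookahead can express "a comma that is followed by an even number of single quotes", which is another way of saying "a comma that is not inside quotes". The class name RegexCsvSplit and the constant UNQUOTED_COMMA are made up for this example, and it assumes the quotes in each record are balanced and never escaped.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical class for illustration; assumes balanced, unescaped single quotes
final class RegexCsvSplit {

    // A comma is a separator only if an even number of single quotes
    // remains between it and the end of the line.
    private static final String UNQUOTED_COMMA = ",(?=(?:[^']*'[^']*')*[^']*$)";

    static List<String> split(String line) {
        // Limit -1 keeps trailing empty fields instead of discarding them
        return Arrays.asList(line.split(UNQUOTED_COMMA, -1));
    }

    public static void main(String[] args) {
        for (String col : split("ABC,DEF,GHI,'1,000,000',JKL,MN")) {
            System.out.println(col);
        }
    }
}
```

The lookahead rescans the rest of the line for every comma, so the hand-written loop is likely the better fit for long records, and it is easier to extend if the quoting rules get more complicated.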