To be clear, I think this is **something that has already been implemented a million times**. I ended up implementing it myself for various reasons, so I uploaded it to GitHub.
As requirements go, **"we need to implement a CSV parser"** is a very common one. Personally, I dislike CSV as a file format because it is a lot of trouble. I would much rather have systems exchange data in XML, JSON, or some other structured text format. Unfortunately, due to various circumstances, it was decided that **everything would be CSV**.
In short, what I needed was a CSV parser, and if possible I did not want to write such grubby code myself. So I searched for a library I could use, but none of them was a good fit.
The main obstacles were the following.
- Their handling of CSV escaping is weak.
- They load the entire file into memory at once.
  *(Probably because they use Scanner or regular expressions internally.)*
I wanted to get by with an existing library *(I did not want to write the grubby code myself)*, so I searched as hard as I could. In the end, I could not find one that met all of the following requirements.
- The data in the CSV contains line breaks *(LF, line feed: \n)*, which must be handled properly.
- The specification fixes the record delimiter as CRLF, and line breaks inside the data are LF only.
- The CSV files are fairly large, so loading the whole file into memory is not acceptable. Being able to process the data as a stream is essential.
"Well, I just have to read the text file line by line and do ordinary CSV token parsing while tracking whether I am inside quotes. Grubby, but easy enough."
That is what I thought, but unfortunately I did not know that **BufferedReader#readLine() does not distinguish between CRLF and LF**.
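This behaviour is easy to confirm. The following is a minimal sketch, not code from the repository; the names `ReadLineDemo` and `readAll` are made up for illustration. It collects every line that `BufferedReader#readLine()` reports for a given text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of sharp4j.
class ReadLineDemo {

    // Returns every line that BufferedReader#readLine() reports for the text.
    static List<String> readAll(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // StringReader should not fail here; rethrow unchecked to keep the demo simple.
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

Calling `ReadLineDemo.readAll("1,\"a\nb\"\r\n2,c\r\n")` yields three strings, `1,"a`, `b"` and `2,c`, although the input contains only two CRLF-terminated records. The LF inside the quoted field is treated as a line terminator just like the CRLF.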
The token parser was finished quickly, but the all-important line reading did not work, so in the end I had to write that part myself.
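For reference, the token-parsing step can be sketched roughly as follows. This is not the parser from the repository. It is a minimal illustration that assumes RFC 4180 style quoting (a field may be wrapped in double quotes, and a literal quote inside a quoted field is written as `""`), and the names `CsvTokens` and `split` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a quote-aware field splitter for ONE record.
// Assumes the record delimiter (CRLF) has already been stripped.
class CsvTokens {

    static List<String> split(String record) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < record.length() && record.charAt(i + 1) == '"') {
                        field.append('"'); // "" inside quotes is an escaped quote
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append(c);       // commas and bare LF are plain data here
                }
            } else if (c == '"') {
                inQuotes = true;           // opening quote
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());      // last field (possibly empty)
        return fields;
    }
}
```

The splitter only works if each record arrives whole, including any LF inside quoted fields, which is exactly why the line reader has to split on CRLF alone.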
Only the core logic is excerpted here. The other related classes are omitted, so please see GitHub for the full source.
CrLfReader#next
```java
// Returns the next CRLF-terminated record, or null once the input is exhausted.
public String next()
{
    this.sb.setLength( 0 );
    if ( this.build() )
    {
        return this.sb.toString();
    }
    else
    {
        return null;
    }
}
```
CrLfReader#build
```java
// Fills sb with one record. Returns false if the end of input was already reached.
private boolean build()
{
    if ( this.end )
    {
        return false;
    }
    else
    {
        this.readline( false );
        return true;
    }
}
```
CrLfReader.Buffer
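```java
// Fixed-size character buffer over the underlying reader.
private class Buffer
{
    private final char[] temp;
    private int size;
    private int index;
    private boolean eof;

    public Buffer(int size)
    {
        this.temp = new char[max( size, MIN_SIZE )];
        this.size = 0;
        // Start past the end so that seekable() is false until the first fill().
        this.index = this.temp.length;
        this.eof = false;
    }

    // Reads the next chunk. Returns false once the end of input is reached.
    // Note: if reader is a java.io.Reader, read() throws IOException;
    // the handling is not shown in this excerpt.
    public boolean fill()
    {
        if ( !eof )
        {
            size = reader.read( temp );
            index = 0;
            eof = -1 == size;
        }
        return !eof;
    }

    // True while unread characters remain in the current chunk.
    public boolean seekable()
    {
        return index < size;
    }

    // Returns the next character and advances the position.
    public char seek()
    {
        return temp[index++];
    }
}
```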
CrLfReader#readline
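```java
private static final char CR = '\r';
private static final char LF = '\n';

// Appends characters to sb until a CR LF pair is found.
// cr is true when the previous character was a CR that has not been emitted yet.
private void readline(boolean cr)
{
    while ( this.buffer.seekable() )
    {
        final char c = this.buffer.seek();
        if ( cr )
        {
            // CR followed by LF terminates the record.
            if ( LF == c ) return;
            // Otherwise the held-back CR was ordinary data.
            this.sb.append( CR );
        }
        cr = CR == c;
        if ( !cr )
        {
            this.sb.append( c );
        }
    }
    // Buffer exhausted: refill and continue, carrying the pending-CR state across.
    if ( this.buffer.fill() )
    {
        this.readline( cr );
    }
    else
    {
        // End of input: flush a trailing lone CR and mark the reader as finished.
        if ( cr ) sb.append( CR );
        this.end = true;
    }
}
```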
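Because the excerpts depend on fields and classes that are omitted here, they cannot be run on their own. For experimenting with the idea, here is a compact, self-contained variant. It is a sketch under my own assumptions, not the class from the repository: the name `CrLfLines` is hypothetical, it reads one character at a time instead of using a chunk buffer, it uses a loop instead of recursion, and a trailing CRLF does not produce a final empty record.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified variant: splits on CR LF only, keeps bare LF and bare CR as data.
class CrLfLines {

    private final Reader reader;
    private boolean end = false;

    CrLfLines(Reader reader) {
        this.reader = reader;
    }

    // Returns the next record, or null when the input is exhausted.
    String next() {
        if (end) return null;
        StringBuilder sb = new StringBuilder();
        boolean cr = false;      // previous char was a CR that is still held back
        boolean readAny = false; // distinguishes "empty tail" from a real last record
        try {
            int n;
            while ((n = reader.read()) != -1) {
                readAny = true;
                char c = (char) n;
                if (cr) {
                    if (c == '\n') return sb.toString(); // CR LF ends the record
                    sb.append('\r');                     // lone CR was data
                }
                cr = (c == '\r');
                if (!cr) sb.append(c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        end = true;
        if (cr) sb.append('\r'); // flush a trailing lone CR
        return readAny ? sb.toString() : null;
    }

    // Convenience helper for trying the reader on an in-memory string.
    static List<String> readAll(String text) {
        CrLfLines lines = new CrLfLines(new StringReader(text));
        List<String> out = new ArrayList<>();
        for (String s = lines.next(); s != null; s = lines.next()) {
            out.add(s);
        }
        return out;
    }
}
```

With this variant, `CrLfLines.readAll("1,\"a\nb\"\r\n2,c\r\n")` returns two records, and the first one still contains the embedded LF. For real files, wrapping the source in a `BufferedReader` keeps the per-character `read()` calls cheap.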
■GitHub
repository URL https://github.com/sugaryo/sharp4j
class FQN sharp4j.util.io.CrLfReader
In C#, yield return would make this a more convenient utility. In Java, would the Stream API be the way to go?
I would like to extend it after studying a bit more.
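As one possible direction for the Stream API idea, a reader whose `next()` returns null at the end of input (as in the CrLfReader#next excerpt) can be adapted with `Stream.iterate`. This is only a sketch: it assumes Java 9 or later for the three-argument `Stream.iterate`, and the names `NullTerminatedStreams` and `of` are hypothetical.

```java
import java.util.Objects;
import java.util.function.Supplier;
import java.util.stream.Stream;

// Hypothetical adapter: turns a "returns null when finished" source into a Stream.
class NullTerminatedStreams {

    // source.get() is called once up front for the first element,
    // then lazily for each following element until it returns null.
    static Stream<String> of(Supplier<String> source) {
        return Stream.iterate(source.get(), Objects::nonNull, previous -> source.get());
    }
}
```

Given a reader instance named `reader`, usage would look like `NullTerminatedStreams.of(reader::next).forEach(System.out::println)`. The resulting stream is sequential and single-use, which matches the one-pass nature of the reader.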