[Java] Implemented a strict line-break stream reader whose readLine splits only on CrLf.

To be clear, I think this is **something that has already been implemented a million times**. I had my own reasons for implementing it anyway, so I uploaded it to GitHub.

■ Summary:

◇ Background

A very common requirement is **"we need to implement a CSV parser"**. Personally, I don't like the CSV file format because it is so much trouble. I would rather see XML, JSON, or some other structured text format adopted for inter-system interfaces. Unfortunately, due to various circumstances, it so often ends up as **"CSV for everything"**.

◇ I searched for it before implementing it independently

In short, what I want is a CSV parser, and if possible I would rather not write such grubby code myself. So I searched for a library I could use, but none of them fit the bill.

The main sticking points were as follows.

- The CSV escape handling is weak.
- The whole file is loaded into memory at once.
  *(Probably because they use Scanner or regular expressions internally, I think.)*

I wanted to get by with an existing library *(I didn't want to write the grubby code myself)*, so I tried hard to find one. In the end, I couldn't find anything that met all the requirements.

◇ Implementation requirements

- The data in the CSV contains line breaks *(LF, line feed: \n)*, and these need to be handled properly.
- The specification fixes the record delimiter as CrLf, and line breaks inside the data are Lf only.
- The CSV files are fairly large, so loading the whole file into memory is not acceptable. Being able to process the file as a stream is essential.

Well, just read the text file line by line and do ordinary CSV token parsing while tracking whether we are inside quotes. It's grubby, but easy enough.

Or so I thought. Unfortunately, I didn't know that **BufferedReader#readLine() doesn't distinguish between CrLf and Lf**.
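
To make the problem concrete, here is a small sketch of what readLine() does with a record that contains an LF inside a quoted field. The class name ReadLineDemo and the sample data are made up for illustration; only BufferedReader and StringReader come from the JDK.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of sharp4j.
final class ReadLineDemo {

    /** Collects every line that BufferedReader#readLine reports for the given text. */
    static List<String> readLines(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Two records separated by CrLf; the first has an LF inside a quoted field.
        String csv = "1,\"first\nsecond\"\r\n2,plain\r\n";
        for (String line : readLines(csv)) {
            System.out.println("[" + line + "]");
        }
    }
}
```

readLine() treats \n, \r, and \r\n all as line terminators, so the two records come back as three lines and the first record is cut in half at the LF inside the quotes.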

The token parser was done in no time, but the all-important line reading didn't work out, and in the end I had to write it myself.
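
For reference, the token-parsing half can be sketched roughly as follows. This is not the parser from the repository: the class name CsvTokens is hypothetical, and the escape rule (a quote inside a quoted field is written as two quotes, as in RFC 4180) is an assumption, since the rule used in the actual project is not stated here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the sharp4j parser.
final class CsvTokens {

    /** Splits one record (already cut at CrLf) into fields, honoring double quotes. */
    static List<String> split(String record) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean quoted = false;

        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (quoted) {
                if (c != '"') {
                    // Inside quotes, commas and LF are plain data.
                    field.append(c);
                } else if (i + 1 < record.length() && record.charAt(i + 1) == '"') {
                    // Assumed escape rule: "" inside quotes means one literal quote.
                    field.append('"');
                    i++;
                } else {
                    quoted = false; // closing quote
                }
            } else if (c == '"') {
                quoted = true;      // opening quote
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString()); // the last field has no trailing comma
        return fields;
    }
}
```

With this shape, a record such as `1,"a,b","x<LF>y"` comes out as three fields, and the LF stays inside the third one. That only works if the record reaches the parser in one piece, which is exactly why the line reader has to split on CrLf alone.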

■ Implementation code:

Only the core logic is excerpted here. The other related classes are omitted, so please see GitHub for the full source.

CrLfReader#next



	public String next()
	{
		this.sb.setLength( 0 );
		
		if ( this.build() )
		{
			return this.sb.toString();
		}
		else
		{
			return null;
		}
	}

CrLfReader#build



	private boolean build()
	{
		if ( this.end )
		{
			return false;
		}
		else
		{
			this.readline( false );
			return true;
		}
	}

CrLfReader.Buffer



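	private class Buffer
	{
		private final char[] temp;
		
		private int size;   // number of valid chars in temp, or -1 at end of input
		private int index;  // position of the next char to hand out
		private boolean eof;
		
		public Buffer(int size)
		{
			this.temp = new char[max( size, MIN_SIZE )];
			// Start empty (size 0), so the first seekable() is false and forces a fill().
			this.size = 0;
			this.index = this.temp.length;
			this.eof = false;
		}
		
		// Reads the next chunk from the underlying reader; returns false once the input is exhausted.
		public boolean fill()
		{
			if ( !eof )
			{
				size = reader.read( temp );
				index = 0;
				eof = -1 == size;
			}
			
			return !eof;
		}
		
		
		// True while unread chars remain in the current chunk.
		public boolean seekable()
		{
			return index < size;
		}
		
		// Returns the next char and advances. Call only when seekable() is true.
		public char seek()
		{
			return temp[index++];
		}
	}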

CrLfReader#readline



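	private static final char CR = '\r';
	private static final char LF = '\n';
	
	// cr: true when the previous character was a CR whose meaning is still undecided.
	private void readline(boolean cr)
	{
		while ( this.buffer.seekable() )
		{
			final char c = this.buffer.seek();
			
			if ( cr )
			{
				// CR followed by LF: end of the record. Neither character is appended.
				if ( LF == c ) return;
				
				// The pending CR turned out to be ordinary data, so emit it now.
				this.sb.append( CR );
			}
			
			// Hold back a CR until the next character shows whether it starts a CrLf.
			cr = CR == c;
			
			if ( !cr )
			{
				this.sb.append( c );
			}
		}
		
		
		// Chunk exhausted: refill and continue, carrying the pending-CR state across the boundary.
		if ( this.buffer.fill() )
		{
			this.readline( cr );
		}
		else
		{
			// End of input: a trailing lone CR is data, and whatever was collected is the last line.
			if ( cr ) sb.append( CR );
			
			this.end = true;
		}
	}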
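
To try the same idea in isolation, here is a self-contained sketch. It is a simplified variant, not the class from the repository: the name StrictCrLfLines is hypothetical, it reads one character at a time instead of using its own buffer, it uses a loop instead of recursion, and it wraps IOException in UncheckedIOException. The pending-CR state handling follows the readline excerpt above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified variant, not sharp4j.util.io.CrLfReader itself.
final class StrictCrLfLines {
    private final Reader reader; // pass a BufferedReader for real files
    private boolean end = false;

    StrictCrLfLines(Reader reader) {
        this.reader = reader;
    }

    /** Returns the next line terminated by CrLf, or null once the input is exhausted. */
    String next() {
        if (end) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        boolean cr = false; // true while a CR is held back, waiting for the next char
        try {
            int n;
            while ((n = reader.read()) != -1) {
                char c = (char) n;
                if (cr) {
                    if (c == '\n') {
                        return sb.toString(); // CrLf: end of record
                    }
                    sb.append('\r');          // lone CR is data
                }
                cr = (c == '\r');
                if (!cr) {
                    sb.append(c);             // includes a bare LF
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (cr) {
            sb.append('\r');                  // trailing lone CR is data
        }
        end = true;
        return sb.toString();                 // text after the last CrLf, possibly empty
    }

    /** Convenience helper for experiments: reads all lines of a string. */
    static List<String> readAll(String text) {
        StrictCrLfLines lines = new StrictCrLfLines(new StringReader(text));
        List<String> result = new ArrayList<>();
        for (String line = lines.next(); line != null; line = lines.next()) {
            result.add(line);
        }
        return result;
    }
}
```

For the input `1,"first<LF>second"<CR><LF>2,plain<CR><LF>`, readAll returns three strings: the first record with its LF intact, `2,plain`, and a final empty string, because the text after the last CrLf counts as one more line. As far as I can tell from the excerpt, the next/build pair above behaves the same way, so a caller may want to skip a trailing empty line.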

■ GitHub:

repository URL https://github.com/sugaryo/sharp4j

class FQN sharp4j.util.io.CrLfReader

Digression

In C#, you could use yield return to turn this into a more convenient utility. In Java, would the Stream API be the way to go?

I'd like to extend it once I've studied a bit more.
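
As a rough idea of what the Stream API route might look like, the sketch below turns any "returns null at the end" supplier into a Stream<String>. The class name NullTerminatedStreams is hypothetical, and passing `reader::next` assumes a next() method shaped like the one shown earlier. Spliterators.AbstractSpliterator and StreamSupport.stream are standard JDK 8 APIs.

```java
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.function.Supplier;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper, not part of sharp4j.
final class NullTerminatedStreams {

    /** Wraps a supplier that signals the end with null into a lazy, sequential Stream. */
    static Stream<String> stream(Supplier<String> next) {
        Spliterator<String> spliterator = new Spliterators.AbstractSpliterator<String>(
                Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL) {
            @Override
            public boolean tryAdvance(Consumer<? super String> action) {
                String line = next.get();
                if (line == null) {
                    return false;    // end of input
                }
                action.accept(line); // hand one line to the pipeline
                return true;
            }
        };
        return StreamSupport.stream(spliterator, false);
    }
}
```

Lines are pulled one at a time as the pipeline asks for them, so the whole file never has to sit in memory. On Java 9 or later, `Stream.generate(supplier).takeWhile(Objects::nonNull)` would be a shorter way to get a similar result.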
