The first important point is that __ "CSV is not just a matter of reading a file and simply splitting it line by line with", "" __.
RFC4180 is currently defined as the standard specification for CSV.
http://www.kasai.fm/wiki/rfc4180jp http://www.ietf.org/rfc/rfc4180.txt
However, this is a follow-up to the one released in October 2005, and it is likely that there are some differences from the ones used at each site.
The main features described in this specification are as follows.
-Each record is separated by a line feed (CRLF) -The end of the record at the end of the file may or may not have a line break. -The header line may or may not exist at the beginning of the file. -Each line and header have fields separated by commas. • The number of fields should be the same throughout the file. -The last field must not end with a comma. -Each field may or may not be enclosed in double quotes. -Fields containing line breaks should be enclosed in double quotes. -Enclose fields containing double quotation marks in double quotation marks, and add double quotation marks before the double quotation marks in the field.
xxx,"test""test"
An example of operating the following CSV in each programming language is shown. In this sample, we will verify how the following CSV file can be imported.
__test1.csv Common CSV example __
test1.csv
Jack,12,Warrior,Explanation 1
Burn,17,Knight,Explanation 2
Maam,15,Monk,Explanation 3
__test2.csv If the first line is a comment and the field contains line breaks, double quotes, and commas __
test2.csv
#name,age,class,In the explanation(",)Put in
"Jack","12","Warrior",",Or""Can be entered"
"Burn","17","Knight","Explanation 2
Make a line break"
"Maam","15","Monk","Explanation 3"
__test3.csv Example where a blank line exists __
test3.csv
#Treatment of blank lines
"Jack","12","Warrior",",Or""Can be entered"
"Burn","17","Knight","Explanation 2
If there is an empty newline in double quotes"
#Comments on the way
"Maam","15","Monk","Explanation 3"
__test4.csv Example where the number of columns in each record is different __
test4.csv
#If the number of columns is different
"Jack","12"
"Burn"
"Maam","15","Monk","Explanation 3"
An example of parsing using TextField that can be used by referring to "Microsoft.VisualBasic" is shown below.
__CSV reading sample __
using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.VisualBasic.FileIO; //add to
//It is a specification to skip blank lines
// http://msdn.microsoft.com/ja-jp/library/microsoft.visualbasic.fileio.textfieldparser.readfields.aspx
namespace csvTest
{
class Program
{
static void Main(string[] args)
{
dumpCsv("C:\\dev\\csv\\test1.csv");
dumpCsv("C:\\dev\\csv\\test2.csv");
dumpCsv("C:\\dev\\csv\\test3.csv");
dumpCsv("C:\\dev\\csv\\test4.csv");
Console.ReadLine();
}
static void dumpCsv(string file)
{
Console.WriteLine(file + "================================");
TextFieldParser parser = new TextFieldParser(file,
System.Text.Encoding.GetEncoding("Shift_JIS"));
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(","); //The delimiter is a comma
parser.CommentTokens = new string[1] {"#"};
int line = 0, col = 0;
while (!parser.EndOfData)
{
++line;
col = 0;
string[] row = parser.ReadFields(); //Read one line
Console.WriteLine("{0}", line);
//The element of the array row is the value of each field in the read row
foreach (string field in row)
{
++col;
Console.WriteLine("{0}:{1}", col, field);
}
Console.WriteLine("----------------------------");
}
parser.Close();
}
}
}
Execution result
C:\dev\csv\test1.csv================================
1
1:Jack
2:12
3:Warrior
4:Explanation 1
----------------------------
2
1:Burn
2:17
3:Knight
4:Explanation 2
----------------------------
3
1:Maam
2:15
3:Monk
4:Explanation 3
----------------------------
C:\dev\csv\test2.csv================================
1
1:Jack
2:12
3:Warrior
4:,Or"Can be entered
----------------------------
2
1:Burn
2:17
3:Knight
4:Explanation 2
Make a line break
----------------------------
3
1:Maam
2:15
3:Monk
4:Explanation 3
----------------------------
C:\dev\csv\test3.csv================================
1
1:Jack
2:12
3:Warrior
4:,Or"Can be entered
----------------------------
2
1:Burn
2:17
3:Knight
4:Explanation 2
If there is an empty newline in double quotes
----------------------------
3
1:Maam
2:15
3:Monk
4:Explanation 3
----------------------------
C:\dev\csv\test4.csv================================
1
1:Jack
2:12
----------------------------
2
1:Burn
----------------------------
3
1:Maam
2:15
3:Monk
4:Explanation 3
----------------------------
From this result, the following can be said as the operation of TextFieldParser. -No need to install a special library. -Comment lines can be specified by setting CommentTokens. ・ Skip blank lines. This is no exception even during double quotes. This behavior is by design. http://msdn.microsoft.com/ja-jp/library/microsoft.visualbasic.fileio.textfieldparser.readfields.aspx -There is no function to create CSV.
If you have a problem with the specification to skip blank lines of TextParser, or if you need to create CSV, use CsvHelper.
https://github.com/JoshClose/CsvHelper
Download it from the above and build it yourself, or run the following command in your package manager.
Install-Package CsvHelper
The verification here was done with .NET 3.5, but the library supports .NET 2.0 to 4.5.
__CSV reading sample __
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using CsvHelper;
// .NET3.Validated by 5
namespace csvTest
{
class Program
{
static void Main(string[] args)
{
dumpCsv("C:\\dev\\csv\\test1.csv");
dumpCsv("C:\\dev\\csv\\test2.csv");
dumpCsv("C:\\dev\\csv\\test3.csv");
dumpCsv("C:\\dev\\csv\\test4.csv");
Console.Read();
}
static void dumpCsv(string file)
{
Console.WriteLine(file + "================================");
var parser = new CsvReader(new StreamReader(file,
System.Text.Encoding.GetEncoding(932)));
parser.Configuration.Encoding = System.Text.Encoding.GetEncoding(932);
parser.Configuration.AllowComments = true;
parser.Configuration.Comment = '#';
parser.Configuration.HasHeaderRecord = false;
while (parser.Read())
{
for (var i = 0; i < parser.CurrentRecord.Length; ++i)
{
Console.WriteLine("{0}:{1}", i, parser.CurrentRecord.ElementAt(i));
}
Console.WriteLine("----------------------------");
}
parser.Dispose();
}
}
}
Execution result
C:\dev\csv\test1.csv================================
0:Jack
1:12
2:Warrior
3:Explanation 1
----------------------------
0:Burn
1:17
2:Knight
3:Explanation 2
----------------------------
0:Maam
1:15
2:Monk
3:Explanation 3
----------------------------
C:\dev\csv\test2.csv================================
0:Jack
1:12
2:Warrior
3:,Or"Can be entered
----------------------------
0:Burn
1:17
2:Knight
3:Explanation 2
Make a line break
----------------------------
0:Maam
1:15
2:Monk
3:Explanation 3
----------------------------
C:\dev\csv\test3.csv================================
0:Jack
1:12
2:Warrior
3:,Or"Can be entered
----------------------------
0:Burn
1:17
2:Knight
3:Explanation 2
If there is an empty newline in double quotes
----------------------------
0:Maam
1:15
2:Monk
3:Explanation 3
----------------------------
C:\dev\csv\test4.csv================================
0:Jack
1:12
----------------------------
0:Burn
----------------------------
0:Maam
1:15
2:Monk
3:Explanation 3
----------------------------
__CSV writing sample __
var csv = new CsvWriter(new StreamWriter("C:\\dev\\csv\\out.csv", false, System.Text.Encoding.GetEncoding(932)));
csv.WriteField("I'm Kagome");
csv.WriteField(12);
csv.WriteField(true);
csv.WriteField("\",Symbols such as");
csv.NextRecord();
csv.WriteField("2nd line");
csv.WriteField("Two line breaks\n\n mix");
csv.NextRecord();
csv.Dispose();
Output file
I'm Kagome,12,True,""",Symbols such as"
2nd line,"Two line breaks
Mix"
From this result, the following can be said as the operation of CsvHelper. ・ Do not skip even if there is a blank line. -You can specify the presence or absence of comments in Configuration.AllowComments and Comments. -You can specify whether to use the first line as a header in Configuration.HasHeaderRecord. -You can easily create a CSV file. -Although not done here, you can also use Mapping to associate with an object.
Excel can open CSV, so let's implement it using that function.
Public Sub test()
Call DumpCsv("C:\\dev\\csv\\test1.csv")
Call DumpCsv("C:\\dev\\csv\\test2.csv")
Call DumpCsv("C:\\dev\\csv\\test3.csv")
Call DumpCsv("C:\\dev\\csv\\test4.csv")
End Sub
Private Sub DumpCsv(ByVal path As String)
Debug.Print path & "=============================="
Dim wkb As Workbook
Dim wks As Worksheet
Application.ScreenUpdating = False
Set wkb = Application.Workbooks.Open(path)
Application.Windows(wkb.Name).Visible = False
Set wks = wkb.Sheets(1)
Dim r As Long
Dim c As Long
Dim maxRow As Long
Dim maxCol As Long
maxRow = wks.Cells(1, 1).SpecialCells(xlLastCell).Row
maxCol = wks.Cells(1, 1).SpecialCells(xlLastCell).Column
For r = 1 To maxRow
For c = 1 To maxCol
Debug.Print c & ":" & wks.Cells(r, c).Value
Next c
Debug.Print "----------------------"
Next r
Call wkb.Close(False)
Application.ScreenUpdating = True
End Sub
Execution result
C:\\dev\\csv\\test1.csv==============================
1:Jack
2:12
3:Warrior
4:Explanation 1
----------------------
1:Burn
2:17
3:Knight
4:Explanation 2
----------------------
1:Maam
2:15
3:Monk
4:Explanation 3
----------------------
C:\\dev\\csv\\test2.csv==============================
1:#name
2:age
3:class
4:In the explanation("
5:)Put in
----------------------
1:Jack
2:12
3:Warrior
4:,Or"Can be entered
5:
----------------------
1:Burn
2:17
3:Knight
4:Explanation 2
Make a line break
5:
----------------------
1:Maam
2:15
3:Monk
4:Explanation 3
5:
----------------------
C:\\dev\\csv\\test3.csv==============================
1:#Treatment of blank lines
2:
3:
4:
----------------------
1:
2:
3:
4:
----------------------
1:Jack
2:12
3:Warrior
4:,Or"Can be entered
----------------------
1:Burn
2:17
3:Knight
4:Explanation 2
If there is an empty newline in double quotes
----------------------
1:#Comments on the way
2:
3:
4:
----------------------
1:Maam
2:15
3:Monk
4:Explanation 3
----------------------
C:\\dev\\csv\\test4.csv==============================
1:#If the number of columns is different
2:
3:
4:
----------------------
1:
2:
3:
4:
----------------------
1:Jack
2:12
3:
4:
----------------------
1:Burn
2:
3:
4:
----------------------
1:Maam
2:15
3:Monk
4:Explanation 3
----------------------
From this, when opening CSV using Excel, the following can be said. -Comments cannot be controlled. -Since CSV is opened in the sheet on Excel, the data operation can be the same as normal Excel VBA. -Since it is expanded with a sheet on Excel, the performance will be lower than when accessing the data in memory.
If you refer to "Microsoft ActiveX Data Objects x.x Library" in the reference settings, you can treat CSV as if you were operating a database.
Public Sub tstAdo()
Call DumpCsvByADO("test1.csv")
Call DumpCsvByADO("test2.csv")
Call DumpCsvByADO("test3.csv")
Call DumpCsvByADO("test4.csv")
End Sub
Private Sub DumpCsvByADO(ByVal path As String)
Dim cnn As ADODB.Connection
Set cnn = New ADODB.Connection
cnn.Open ("Driver={Microsoft Text Driver (*.txt; *.csv)};" & _
"DBQ=C:\dev\csv\;" & _
"FirstRowHasNames=0;")
Dim rs As ADODB.Recordset
'FirstRowHasNames=You can make the header unnecessary with 0, but it doesn't work due to a bug.
' http://support.microsoft.com/kb/288343/ja
'http://support.microsoft.com/kb/257819/JA
Dim i As Long
Set rs = cnn.Execute("SELECT * FROM " & path)
Debug.Print path & "=============================="
Do While Not rs.EOF
For i = 0 To rs.Fields.Count - 1
Debug.Print rs.Fields(i).Value
Next i
Debug.Print "-------------------------"
rs.MoveNext
Loop
rs.Close
cnn.Close
Set cnn = Nothing
Set rs = Nothing
End Sub
Execution result
test1.csv==============================
Burn
17
Knight
Explanation 2
-------------------------
Maam
15
Monk
Explanation 3
-------------------------
test2.csv==============================
Jack
12
Warrior
,Or"Can be entered
Null
-------------------------
Burn
17
Knight
Explanation 2
Make a line break
Null
-------------------------
Maam
15
Monk
Explanation 3
Null
-------------------------
test3.csv==============================
Null
Null
Null
Null
-------------------------
Jack
12
Warrior
,Or"Can be entered
-------------------------
Burn
17
Knight
Explanation 2
If there is an empty newline in double quotes
-------------------------
#Comments on the way
Null
Null
Null
-------------------------
Maam
15
Monk
Explanation 3
-------------------------
test4.csv==============================
Null
Null
Null
Null
-------------------------
Jack
12
Null
Null
-------------------------
Burn
Null
Null
Null
-------------------------
Maam
15
Monk
Explanation 3
-------------------------
From this result, the following can be said when using ADO. -Can handle CSV files like a database. -A field that does not exist will be Null. ・ Comments do not work. -The first line is always treated as a header. And this cannot be avoided due to a bug. http://support.microsoft.com/kb/288343/ja http://support.microsoft.com/kb/257819/JA
In the case of Python, it can be supported by the csv module included as standard. http://docs.python.jp/2/library/csv.html
__ CSV read __
# -*- coding: utf-8 -*-
import csv
def dumpCsv(path):
print ('%s==============' % path)
reader = csv.reader(open(path,'rb'))
for row in reader:
print (', '.join(row))
dumpCsv('C:\\dev\\csv\\test1.csv')
dumpCsv('C:\\dev\\csv\\test2.csv')
dumpCsv('C:\\dev\\csv\\test3.csv')
dumpCsv('C:\\dev\\csv\\test4.csv')
Execution result
C:\dev\csv\test1.csv==============
Jack, 12,Warrior,Explanation 1
Burn, 17,Knight,Explanation 2
Maam, 15,Monk,Explanation 3
C:\dev\csv\test2.csv==============
#name,age,class,In the explanation(", )Put in
Jack, 12,Warrior, ,Or"Can be entered
Burn, 17,Knight,Explanation 2
Make a line break
Maam, 15,Monk,Explanation 3
C:\dev\csv\test3.csv==============
#Treatment of blank lines
Jack, 12,Warrior, ,Or"Can be entered
Burn, 17,Knight,Explanation 2
If there is an empty newline in double quotes
#Comments on the way
Maam, 15,Monk,Explanation 3
C:\dev\csv\test4.csv==============
#If the number of columns is different
Jack, 12
Burn
Maam, 15,Monk,Explanation 3
__csv write __
writer = csv.writer(open('C:\\dev\\csv\\out.csv', 'wb'),quoting=csv.QUOTE_ALL)
writer.writerow(['test', 'Ah ah\nAh ah', 'symbol",'])
writer.writerow(['test'])
Output result
"test","Ah ah
Ah ah","symbol"","
"test"
From the above results, the following can be said when using the csv module. ・ Comments cannot be handled. -Can handle commas, line breaks, and double quotes when enclosed in double quotes. -You can easily create CSV.
In the case of Node.js, CSV can be read and written by using node-csv. If you also use a character code such as cp932, use the iconv module as well.
__ How to install __
Install node-csv
npm install csv
installation of iconv
npm install iconv
__CSV reading example __
var csv = require('csv');
var fs = require('fs');
var Iconv = require('iconv').Iconv;
var conv = new Iconv('cp932','utf-8');
dumpCsv('test1.csv');
dumpCsv('test2.csv');
dumpCsv('test3.csv');
dumpCsv('test4.csv');
function dumpCsv(path) {
fs.readFile(path, function(err, sjisBuf) {
var buf = conv.convert(sjisBuf);
console.log(path + '================');
csv.parse(buf.toString(),{comment:'#'}, function(err, data) {
console.log(err);
console.log(data);
});
});
}
Execution result
test1.csv================
null
[ [ 'Jack', '12', 'Warrior', 'Explanation 1' ],
[ 'Burn', '17', 'Knight', 'Explanation 2' ],
[ 'Maam', '15', 'Monk', 'Explanation 3' ] ]
test2.csv================
[Error: Invalid closing quote at line 1; found "The" instead of delimiter ","]
undefined
test3.csv================
null
[ [ 'Jack', '12', 'Warrior', ',Or"Can be entered' ],
[ 'Burn', '17', 'Knight', 'Explanation 2\r\n\r\n If there is an empty newline in double quotes' ],
[ 'Maam', '15', 'Monk', 'Explanation 3' ] ]
test4.csv================
null
[ [ 'Jack', '12' ], [ 'Burn' ], [ 'Maam', '15', 'Monk', 'Explanation 3' ] ]
The reading result of test2.csv is an error. csv.parse cannot read data with double quotes in comments as follows.
#name,age,class,In the explanation(",)Put in
To fix this, you need to apply the following patch to node_modules / csv / node_modules / csv-parse / lib / index.js.
--- node_modules/csv/node_modules/csv-parse/lib/index_bk.js 2014-06-20 17:36:56.000000000 +0900
+++ node_modules/csv/node_modules/csv-parse/lib/index.js 2014-07-22 22:06:12.826116745 +0900
@@ -253,7 +253,7 @@
this.closingQuote = i;
i++;
continue;
- } else if (!this.field) {
+ } else if (!this.field && !this.commenting) {
this.quoting = true;
i++;
continue;
The result when this is applied is as follows.
Execution result
test2.csv================
null
[ [ 'Jack', '12', 'Warrior', ',Or"Can be entered' ],
[ 'Burn', '17', 'Knight', 'Explanation 2\r\n Start a new line' ],
[ 'Maam', '15', 'Monk', 'Explanation 3' ] ]
__CSV writing example __
var csv = require('csv')
var data = [
['xx', 'Ah ah', 1],
['id', '"atagfa,asteata','#teat','aaa'],
['newline', 'tests\n\ntesat']
];
console.log(data);
csv.stringify(data, {quoted: true}, function(err, output) {
console.log(err);
console.log(output);
});
String created
"xx","Ah ah","1"
"id","""atagfa,asteata","#teat","aaa"
"newline","tests
tesat"
From the above results, the following can be said when using node-csv.
-You can read and write CSV by using node-csv. ・ It is also possible to handle comments. -However, since there are bugs or specifications around the comments as explained above, either create a patch yourself and apply it, or do not use comments.
Here, we will consider how to design the format of the CSV file.
It is desirable to specify the header file so that you can understand the meaning of the fields just by looking at the CSV file. At this time, it is better to treat it as a comment. I looked at some libraries, but by default I often commented on "#".
Another advantage of using CSV is that it is easy to edit data in Excel. It is a big attraction to be able to easily create data as long as you can use Excel, and you need to be careful when using a data format that destroys its advantages.
Here, we consider a method of expressing department data including employees in CSV format as follows.
The simplest is to treat department data and employee data as separate CSVs.
Department.csv
#Department name,Street address,phone number
Sales,Tokyo,12345
development of,Osaka,34566
management,Dreamland,44444
Employee.csv
#Employee name,Affiliation department,Hire date,Email
Tsubasa Oozora,Sales,1999/1/2,[email protected]
Taro Misaki,Sales,1999/1/3,[email protected]
Joseph Joestar,development of,2000/1/1,[email protected]
Jotaro Kujo,development of,2000/1/3,[email protected]
Shimakosaku,management,2003/1/1,[email protected]
The drawback is that the files are not completely separated and linked, making it difficult to understand the parent-child relationship of the data.
Another way of thinking about how to handle it as a separate file is to divide the employee file into departments and link them with department .csv.
Sales.csv
#Department name,Street address,phone number,Employee file
Sales,Tokyo,12345,Sales社員.csv
development of,Osaka,34566,development of社員.csv
management,Dreamland,44444,management社員.csv
Sales employee.csv
Tsubasa Oozora,1999/1/2,[email protected]
Taro Misaki,1999/1/3,[email protected]
Development employee.csv
Joseph Joestar,2000/1/1,[email protected]
Jotaro Kujo,2000/1/3,[email protected]
Management employee.csv
Shimakosaku,2003/1/1,[email protected]
The advantage of this is that the CSV of the parent's department is linked to the CSV of the employee, making it easy to understand the parent-child relationship, and because it is not necessary to describe the parent's name as the department name in each record, the number of files increases, but the total capacity Can be reduced.
Consider the case where departments and employees must be treated in the same file.
First, consider whether it is really necessary to express it in one CSV file. If XML is fine, there is a way to adopt it, and if it is okay to divide the file into multiple files, divide it.
However, editing XML is not as easy as CSV, and it is a fact that there is a demand to combine files into one, such as when uploading files to a server. Here, we consider how to combine hierarchical data into one file.
In this case, the standard specification "the number of fields should be the same throughout the file" cannot be satisfied.
Specify whether it is "department" or "employee" with the first field as the data category.
Department employee.csv
Department,Sales,Tokyo,12345
Department,development of,Osaka,34566
Department,management,Dreamland,44444
Employee,Tsubasa Oozora,Sales,1999/1/2,[email protected]
Employee,Taro Misaki,Sales,1999/1/3,[email protected]
Employee,Joseph Joestar,development of,2000/1/1,[email protected]
Employee,Jotaro Kujo,development of,2000/1/3,[email protected]
Employee,Shimakosaku,management,2003/1/1,[email protected]
When the amount of data actually increases, it is thought that departments and employees will be confused and difficult to understand.
Store all employees in one department line. Prepare as many fields as there are employees in the field after the department information. Include all employee information in one field.
Department employee.csv
Sales,Tokyo,12345,"Tsubasa Oozora,1999/1/2,[email protected]","Taro Misaki,1999/1/3,[email protected]"
development of,Osaka,34566,"Joseph Joestar,2000/1/1,[email protected]","Jotaro Kujo,2000/1/3,[email protected]"
management,Dreamland,44444,"Shimakosaku,2003/1/1,[email protected]"
This method makes it difficult for the user to input when editing data in Excel. For example, department addresses can be easily created by copying and pasting cells, but user email addresses cannot be copied and pasted by cells.
Store all employees in one line of department At that time, the start character and the end character are set so that the employee data can be distinguished. For example, if the field contains the characters "\ $ Tsubasa Oozora", it is considered that the data of Employee: Tsubasa Oozora is stored until "\ $" appears.
Department employee.csv
Sales,Tokyo,12345,$Tsubasa Oozora,1999/1/2,[email protected],$,$Taro Misaki,Sales,1999/1/3,[email protected],$
development of,Osaka,34566,$Joseph Joestar,2000/1/1,[email protected],$,$Jotaro Kujo,development of,2000/1/3,[email protected],$
management,Dreamland,44444,$Shimakosaku,2003/1/1,[email protected],$
Editing with Excel is easier because cell copying is enabled. However, it is necessary to determine rules such as determining special characters, and it is necessary to match the awareness with the data creator.
Since each has advantages and disadvantages, it is necessary to determine what to prioritize, including discarding CSV and using XML.
Recommended Posts