These are review notes on the java.util.Scanner class, introduced in Java 1.5. Most usage examples of the Scanner class read from standard input (System.in), but this article focuses on reading text files and does not deal with standard input. It also does not cover every API of the Scanner class.
environment
reference
signature
public Scanner(File source) throws FileNotFoundException
public Scanner(File source, String charsetName) throws FileNotFoundException
** Code example **
example
Path in = Paths.get("path/to/sample.in");
Scanner scanner = new Scanner(in.toFile());
try (scanner) {
// ... (omitted) ...
}
signature
public Scanner(InputStream source)
public Scanner(InputStream source, String charsetName)
** Code example **
example
InputStream in = Files.newInputStream(Paths.get("path/to/sample.in"));
Scanner scanner = new Scanner(in);
try (scanner) {
// ... (omitted) ...
}
As described in the JavaDoc, when a Scanner is closed, its input source is also closed if that source implements the Closeable interface. The source therefore does not need to be listed separately in the try clause.
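The closing behavior can be checked with a small experiment. The following is a minimal sketch, not part of the original article: TrackingStream and sourceClosedWithScanner are hypothetical names introduced only for illustration, and the input is an in-memory string rather than a file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

class ScannerCloseSketch {

    // Hypothetical helper: an in-memory stream that records whether close() was called.
    static class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingStream(byte[] bytes) {
            super(bytes);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Returns true if closing the Scanner also closed the stream it was reading from.
    static boolean sourceClosedWithScanner(String text) {
        TrackingStream in = new TrackingStream(text.getBytes(StandardCharsets.UTF_8));
        // Only the Scanner is declared in the try clause; the stream is not.
        try (Scanner scanner = new Scanner(in, "UTF-8")) {
            while (scanner.hasNext()) {
                scanner.next(); // consume the tokens; their values are not needed here
            }
        }
        return in.closed;
    }

    public static void main(String[] args) {
        System.out.println(sourceClosedWithScanner("apple banana cherry")); // prints true
    }
}
```

Only the Scanner appears in the try clause, yet the wrapped stream reports that it was closed.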
signature
public Scanner(Path source) throws IOException
public Scanner(Path source, String charsetName) throws IOException
** Code example **
example
Path in = Paths.get("path", "to", "sample.in");
Scanner scanner = new Scanner(in);
try (scanner) {
// ... (omitted) ...
}
signature
public Scanner(String source)
** Code example **
example
String in = "apple banana cherry durian elderberry";
Scanner scanner = new Scanner(in);
try (scanner) {
// ... (omitted) ...
}
signature
public Scanner(Readable source)
** Code example **
example
Readable in = new FileReader(new File("path/to/sample.in"));
Scanner scanner = new Scanner(in);
try (scanner) {
// ... (omitted) ...
}
** Sample file **
The sample file is the postal code data that can be downloaded from the website of Japan Post Co., Ltd. The code examples in this article handle this comma-separated text directly, but in practice you would usually use a library such as opencsv.
sample.in
32343,"69917","6991701","Shimanen","Nitagun Okuizumo","Kameda","Shimane Prefecture","Okuizumo Town, Nita District","Tortoise",0,0,0,0,0,0
32343,"69915","6991515","Shimanen","Nitagun Okuizumo","Kamokura","Shimane Prefecture","Okuizumo Town, Nita District","Kamokura",0,0,0,0,0,0
32343,"69915","6991514","Shimanen","Nitagun Okuizumo","Kawachi","Shimane Prefecture","Okuizumo Town, Nita District","Kawachi",0,0,0,0,0,0
The default token delimiter is whitespace. Here, whitespace means whitespace as defined by Java (any character for which Character.isWhitespace returns true), so characters other than the half-width space, such as the following, are also recognized as delimiters.
** Output result **
Characters for which the return value is true are whitespace by Java's definition.
output
Character.isWhitespace(' '); //Half-width space
// → true
Character.isWhitespace('\u0020'); //Half-width space
// → true
Character.isWhitespace('\u3000'); //Full-width space
// → true
Character.isWhitespace('\t'); //tab
// → true
Character.isWhitespace('\n'); //new line
// → true
Character.isWhitespace('\f'); //Form feed
// → true
Character.isWhitespace('\r'); //return
// → true
Character.isWhitespace('\u001C'); //File separator
// → true
Character.isWhitespace('\u001D'); //Group separator
// → true
Character.isWhitespace('\u001E'); //Record separator
// → true
Character.isWhitespace('\u001F'); //Unit separator
// → true
Character.isWhitespace('\u00a0'); //No-break space (the so-called NBSP)
// → false
Character.isWhitespace('a');
// → false
Character.isWhitespace('あ');
// → false
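The effect on tokenizing can be seen directly. The following is a minimal sketch, assuming an in-memory string as input; the class and method names are hypothetical. It splits a string with the default delimiter and shows that the full-width space and the tab separate tokens, while the no-break space does not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DefaultDelimiterSketch {

    // Splits the text into tokens using the Scanner's default (whitespace) delimiter.
    static List<String> splitWithDefaultDelimiter(String text) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // apple, full-width space (U+3000), banana, tab, cherry, no-break space (U+00A0), durian
        String text = "apple\u3000banana\tcherry\u00a0durian";
        List<String> tokens = splitWithDefaultDelimiter(text);
        // The no-break space is not Java whitespace, so "cherry" and "durian" stay in one token.
        System.out.println(tokens.size()); // prints 3
    }
}
```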
signature
public Pattern delimiter()
** Code example **
example
scanner.delimiter().pattern();
Output result
output
\p{javaWhitespace}+
Use the useDelimiter method to specify an arbitrary pattern as the delimiter.
signature
public Scanner useDelimiter(Pattern pattern)
public Scanner useDelimiter(String pattern)
** Code example **
example
String in = "apple : banana : cherry : durian : elderberry";
Scanner scanner = new Scanner(in);
try (scanner) {
scanner.useDelimiter("\\s*:\\s*");
while (scanner.hasNext()) {
System.out.println("[" + scanner.next() + "]");
}
}
Output result
output
[apple]
[banana]
[cherry]
[durian]
[elderberry]
To read a file line by line, use the hasNextLine and nextLine methods. The hasNextLine method returns true if the scanner's input has another line. The nextLine method returns the content from the scanner's current position to the end of the line and moves the scanner's position to the beginning of the next line.
signature
public boolean hasNextLine()
public String nextLine()
** Code example **
example
File in = new File("path/to/sample.in");
Scanner scanner = new Scanner(in);
try (scanner) {
int counter = 0;
while (scanner.hasNextLine()) {
System.out.println(String.format("%2d: %s", ++counter, scanner.nextLine()));
}
}
Output result
output
1: 32343,"69917","6991701","Shimanen","Nitagun Okuizumo","Kameda","Shimane Prefecture","Okuizumo Town, Nita District","Tortoise",0,0,0,0,0,0
2: 32343,"69915","6991515","Shimanen","Nitagun Okuizumo","Kamokura","Shimane Prefecture","Okuizumo Town, Nita District","Kamokura",0,0,0,0,0,0
3: 32343,"69915","6991514","Shimanen","Nitagun Okuizumo","Kawachi","Shimane Prefecture","Okuizumo Town, Nita District","Kawachi",0,0,0,0,0,0
The hasNext method returns true if the scanner's input has another token. The next method returns the next token from the scanner's current position and moves the scanner's position to the next delimiter.
The default token delimiter is whitespace (half-width space, full-width space, tab, line feed, and so on). In this sample data the token delimiter is a comma, so specify it explicitly with the useDelimiter method. Once the delimiter is overridden, the line feed is no longer treated as a delimiter, so it must also be specified as a delimiter character.
signature
public boolean hasNext()
public String next()
** Code example **
example
File in = new File("path/to/sample.in");
Scanner scanner = new Scanner(in);
try (scanner) {
scanner.useDelimiter(",|\n");
int counter = 0;
while (scanner.hasNext()) {
System.out.println(String.format("%2d: %s", ++counter, scanner.next()));
}
}
Output result
output
1: 32343
2: "69917"
3: "6991701"
4: "Shimanen"
5: "Nitagun Okuizumo"
6: "Kameda"
7: "Shimane Prefecture"
8: "Okuizumo Town, Nita District"
9: "Tortoise"
10: 0
11: 0
12: 0
13: 0
14: 0
15: 0
16: 32343
17: "69915"
18: "6991515"
19: "Shimanen"
20: "Nitagun Okuizumo"
21: "Kamokura"
22: "Shimane Prefecture"
23: "Okuizumo Town, Nita District"
24: "Kamokura"
25: 0
26: 0
27: 0
28: 0
29: 0
30: 0
31: 32343
32: "69915"
33: "6991514"
34: "Shimanen"
35: "Nitagun Okuizumo"
36: "Kawachi"
37: "Shimane Prefecture"
38: "Okuizumo Town, Nita District"
39: "Kawachi"
40: 0
41: 0
42: 0
43: 0
44: 0
45: 0
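The delimiter used above names only the line feed, so a file with Windows line endings would leave a carriage return attached to the last token of each line. The following is a minimal sketch of one way around this, assuming Java 8 or later: it uses the regular expression \R (any line break) in the delimiter. The class and method names are hypothetical, and the input is an in-memory string rather than the sample file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class LineBreakDelimiterSketch {

    // Splits the text into tokens separated by a comma or by any kind of line break.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            // \R matches \r\n as a single line break, as well as a lone \n or \r.
            scanner.useDelimiter(",|\\R");
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The first line ends with \r\n, the second with \n.
        System.out.println(tokenize("a,b\r\nc,d\n")); // prints [a, b, c, d]
    }
}
```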
You can also combine the two approaches: use the next method to read only the tokens at the positions you need, and the nextLine method to skip the rest of the line.
** Code example **
example
File in = new File("path/to/sample.in");
Scanner scanner = new Scanner(in);
try (scanner) {
scanner.useDelimiter(",");
int counter = 0;
while (scanner.hasNextLine()) {
int code = scanner.nextInt(); // national local government code
String zip5 = scanner.next(); // postal code (5 digits)
String zip7 = scanner.next(); // postal code (7 digits)
scanner.next(); // skip: prefecture name (half-width katakana)
scanner.next(); // skip: city/ward/town/village name (half-width katakana)
scanner.next(); // skip: town area name (half-width katakana)
String prefectures = scanner.next(); // prefecture name
String city = scanner.next(); // city/ward/town/village name
String townArea = scanner.next(); // town area name
System.out.println(String.format("%2d: %d %s %s %s %s %s", ++counter, code, zip5, zip7, prefectures, city, townArea));
scanner.nextLine(); // next line
}
}
Output result
output
1: 32343 "69917" "6991701" "Shimane Prefecture" "Okuizumo Town, Nita District" "Tortoise"
2: 32343 "69915" "6991515" "Shimane Prefecture" "Okuizumo Town, Nita District" "Kamokura"
3: 32343 "69915" "6991514" "Shimane Prefecture" "Okuizumo Town, Nita District" "Kawachi"
The findInLine method searches for a string that matches the specified pattern, from the scanner's current position to the end of the line. It returns null if no matching string is found.
signature
public String findInLine(String pattern)
public String findInLine(Pattern pattern)
** Code example **
example
File in = new File("path/to/sample.in");
Scanner scanner = new Scanner(in);
try (scanner) {
int counter = 0;
while (scanner.hasNextLine()) {
String find = scanner.findInLine("69915[0-9]{2}");
System.out.println(String.format("%2d: %s", ++counter, find));
scanner.nextLine(); // next line
}
}
Output result
output
1: null
2: 6991515
3: 6991514
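The same loop also works with the Pattern overload of findInLine. The following is a minimal sketch, assuming an in-memory string as input; the class and method names are hypothetical. It collects the first match on each line and keeps null for lines without a match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class FindInLineSketch {

    // Returns the first match of the pattern on each line, or null for a line without a match.
    static List<String> firstMatchPerLine(String text, Pattern pattern) {
        List<String> results = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNextLine()) {
                // findInLine never looks past the end of the current line.
                results.add(scanner.findInLine(pattern));
                // Discard the rest of the line and move to the next one.
                scanner.nextLine();
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("[0-9]+");
        System.out.println(firstMatchPerLine("a 123\nb\nc 45\n", digits)); // prints [123, null, 45]
    }
}
```

Compiling the Pattern once outside the loop avoids passing the same pattern string on every iteration.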
tokens
Returns a stream of the scanner's delimiter-separated tokens. This method was added in Java 9.
signature
public Stream<String> tokens()
** Code example **
example
String in = "apple banana cherry durian elderberry";
Scanner scanner = new Scanner(in);
try (scanner) {
final List<String> fruits = scanner.tokens()
.map(String::toUpperCase)
.collect(Collectors.toUnmodifiableList());
System.out.println(fruits);
}
Output result
output
[APPLE, BANANA, CHERRY, DURIAN, ELDERBERRY]
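The tokens method follows whatever delimiter is currently set, so it can be combined with useDelimiter. The following is a minimal sketch, assuming Java 9 or later and an in-memory string as input; the class and method names are hypothetical.

```java
import java.util.Scanner;

class TokensSketch {

    // Sums comma-separated integers, ignoring whitespace around the commas.
    static int sumOfInts(String text) {
        try (Scanner scanner = new Scanner(text)) {
            // useDelimiter returns the scanner itself, so tokens() can be chained onto it.
            return scanner.useDelimiter("\\s*,\\s*")
                    .tokens()
                    .mapToInt(Integer::parseInt)
                    .sum();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfInts("10, 20 ,30")); // prints 60
    }
}
```

The stream is consumed inside the try block, before the scanner is closed.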
findAll
Returns a stream of match results (MatchResult) for each occurrence of the pattern in the scanner's input. This method was added in Java 9.
signature
public Stream<MatchResult> findAll(Pattern pattern)
public Stream<MatchResult> findAll(String patString)
** Code example **
example
File in = new File("path/to/sample.in");
Scanner scanner = new Scanner(in);
try (scanner) {
List<String> list = scanner.findAll("\"[0-9]{5,}\"")
.map(MatchResult::group)
.collect(Collectors.toUnmodifiableList());
System.out.println(list);
}
Output result
output
["69917", "6991701", "69915", "6991515", "69915", "6991514"]
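Because findAll yields MatchResult objects, capture groups can be used to pull out part of each match. The following is a minimal sketch, assuming Java 9 or later and an in-memory string shaped like the first columns of the sample data; the class and method names are hypothetical. It extracts only the 7-digit codes, without the surrounding quotes.

```java
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class FindAllGroupSketch {

    // Returns every quoted 7-digit number in the text, with the quotes removed.
    static List<String> sevenDigitCodes(String text) {
        // Group 1 captures the digits between the double quotes.
        Pattern quotedSevenDigits = Pattern.compile("\"([0-9]{7})\"");
        try (Scanner scanner = new Scanner(text)) {
            return scanner.findAll(quotedSevenDigits)
                    .map(match -> match.group(1))
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        String text = "32343,\"69917\",\"6991701\"\n32343,\"69915\",\"6991515\"\n";
        System.out.println(sevenDigitCodes(text)); // prints [6991701, 6991515]
    }
}
```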
Constructors that take a Charset as the second argument were added in Java 10. The code example is omitted.
signature
public Scanner(InputStream source, Charset charset)
public Scanner(File source, Charset charset) throws IOException
public Scanner(Path source, Charset charset) throws IOException
public Scanner(ReadableByteChannel source, Charset charset)
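As a supplement, the following is a minimal sketch of the InputStream variant, assuming Java 10 or later. The class and method names are hypothetical, and the bytes come from an in-memory array rather than a file. Passing a Charset object instead of a charset name avoids spelling the name as a string.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

class CharsetConstructorSketch {

    // Decodes the bytes with the given charset and returns the first line (or "" if empty).
    static String firstLine(byte[] bytes, Charset charset) {
        InputStream in = new ByteArrayInputStream(bytes);
        // Uses the Scanner(InputStream, Charset) constructor.
        try (Scanner scanner = new Scanner(in, charset)) {
            return scanner.hasNextLine() ? scanner.nextLine() : "";
        }
    }

    public static void main(String[] args) {
        // Encode a string containing a non-ASCII character, then read it back with the same charset.
        byte[] bytes = "caf\u00e9,1\n".getBytes(StandardCharsets.ISO_8859_1);
        String line = firstLine(bytes, StandardCharsets.ISO_8859_1);
        System.out.println(line.equals("caf\u00e9,1")); // prints true
    }
}
```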