[Java] MalformedInputException can also occur when writing

A Java program I wrote failed with the following exception.

Caused by: java.nio.charset.MalformedInputException: Input length = 1
	at java.base/java.nio.charset.CoderResult.throwException(CoderResult.java:274)
	at java.base/sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:306)
	at java.base/sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:281)
	at java.base/sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
	at java.base/java.io.OutputStreamWriter.write(OutputStreamWriter.java:211)
	at java.base/java.io.BufferedWriter.flushBuffer(BufferedWriter.java:120)
	at java.base/java.io.BufferedWriter.flush(BufferedWriter.java:256)
(The rest of the stack trace, which points into the application code, is omitted.)

It is easy to see that this is a character encoding problem, but looking at the relevant part of the application code below, I could not think of a cause.

Path file = ...; // java.nio.file.Path, which is what Files.newBufferedWriter expects
try (BufferedWriter writer = java.nio.file.Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
    String s = ...;
    writer.write(s);
    writer.flush(); // The exception is thrown here, so the argument of the preceding write() is suspect
}

In such cases, the usual approach is to dump the string s and examine its contents.

System.err.println("[" + s + "]");

The output was as follows. What is this "?"? Is the text garbled?
And why does System.err#println not raise an error in the first place?

[?]

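I lost some time because I did not immediately think of surrogate pairs.
When I inspected each char of s with the following code, it printed "High.". A high surrogate that is not followed by a low surrogate cannot be encoded as UTF-8, and this is the cause of the original error.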

for (char c : s.toCharArray()) {
    if (Character.isHighSurrogate(c)) {
        System.err.println("High.");
    }
    if (Character.isLowSurrogate(c)) {
        System.err.println("Low.");
    }
}
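The difference between the writer and println can be reproduced without any file I/O. The following is a minimal sketch, not the original application code; the class and method names are made up for illustration. It assumes, as far as I understand the JDK, that Files.newBufferedWriter encodes with a fresh CharsetEncoder, whose default action is to report malformed input, while PrintStream goes through the replacing behavior documented for OutputStreamWriter and String#getBytes.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original program.
final class LoneSurrogateEncoding {

    /** Strict encoding: a newly created encoder reports malformed input by throwing. */
    static boolean encodesStrictly(String s) {
        try {
            StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(s));
            return true;
        } catch (CharacterCodingException e) {
            // MalformedInputException is a subclass of CharacterCodingException
            return false;
        }
    }

    /** Lenient encoding: String#getBytes replaces malformed input instead of throwing. */
    static byte[] encodeLeniently(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String lone = "\uD83D";        // high surrogate with no low surrogate after it
        String pair = "\uD83D\uDE00";  // complete surrogate pair (U+1F600)

        System.out.println(encodesStrictly(lone));             // false
        System.out.println(encodesStrictly(pair));             // true
        System.out.println((char) encodeLeniently(lone)[0]);   // ?
    }
}
```

The lenient path turns the lone surrogate into a single "?" byte, which would explain the [?] seen on System.err, while the strict path fails in the same way as the BufferedWriter.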

As of 2020, is it considered standard practice in Java programming to handle surrogate pairs properly?
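One way to take surrogate pairs into account is to walk the string by code point rather than by char. The char-by-char loop above prints "High." and "Low." even for a valid pair, so it cannot tell a well-formed string from a broken one. The following is a sketch of a check that reports only unpaired surrogates; SurrogateCheck and indexOfUnpairedSurrogate are hypothetical names introduced here, not an existing JDK API.

```java
// Hypothetical helper, not part of the original program.
final class SurrogateCheck {

    /** Returns the index of the first unpaired surrogate in s, or -1 if there is none. */
    static int indexOfUnpairedSurrogate(String s) {
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            // codePointAt returns the surrogate code unit itself
            // when it is not part of a valid high/low pair.
            if (cp <= Character.MAX_VALUE && Character.isSurrogate((char) cp)) {
                return i;
            }
            // Advance by 1 for a BMP character, by 2 for a valid pair.
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfUnpairedSurrogate("ok \uD83D\uDE00")); // -1
        System.out.println(indexOfUnpairedSurrogate("ng \uD83D"));       // 3
    }
}
```

Running such a check before writer.write(s) would point at the offending position instead of leaving it to the exception raised at flush time. What to do with the broken string (reject it, or replace the lone surrogate) depends on the application.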