About CLDR locale data enabled by default from Java 9

Overview

This is a study of the CLDR locale data that was adopted in Java 1.8 (but disabled by default) and enabled by default in Java 9. See the site listed under the references for details on CLDR; briefly, it is a project run by the Unicode Consortium that builds a database of locale-specific data from around the world: date formats, currency names, country names, day-of-week names, number formats, and so on. The data is managed and published in an XML format called LDML (Locale Data Markup Language), and Java incorporates this data, though not in its entirety.

What prompted this article was the post "Problems and solutions caused by migrating Nulab's account infrastructure to Java 9". After reading it, I wondered what kind of code would be affected by the part quoted below.

** Date and currency formats have changed **

Automated tests failed because of a change in run-time behavior between Java 8 and 9.

Date formats are internationalized, and the failures occurred because Java 9 changed the default locale data for internationalization to CLDR (Common Locale Data Repository), the de facto standard for internationalization data defined by the Unicode Consortium (JEP 252).

Environment

Reference

Differences in behavior between Java versions

I briefly investigated the difference in behavior between versions of Java 8 (Oracle JDK) / Java 9 (OpenJDK) / Java 10 (OpenJDK).

Locale.toLanguageTag

Returns a well-formed IETF BCP 47 language tag that represents this locale.

Locale.getDefault().toLanguageTag();             // → (1)
new Locale("ja", "JP").toLanguageTag();          // → (2) 
new Locale("ja", "JP", "JP").toLanguageTag();    // → (3)

** Output result **

pattern | 1.8.0 | 9.0.4 | 10.0.1
1 | ja-JP | ja-JP | ja-JP
2 | ja-JP | ja-JP | ja-JP
3 | ja-JP-u-ca-japanese-x-lvariant-JP | ja-JP-u-ca-japanese-x-lvariant-JP | ja-JP-u-ca-japanese-x-lvariant-JP

About the special locale Locale("ja", "JP", "JP")

Locale (Java SE 10 & JDK 10)

For compatibility reasons, two non-conforming locales are treated as special cases. These are ja_JP_JP and th_TH_TH.

Java has used ja_JP_JP to represent Japanese as used in Japan together with the Japanese Imperial calendar. This is now representable using a Unicode locale extension, by specifying the Unicode locale key ca (for "calendar") and type japanese. When the Locale constructor is called with the arguments "ja", "JP", "JP", the extension "u-ca-japanese" is automatically added.

In the "u-ca-japanese" of the JavaDoc quoted above, "u" indicates a Unicode locale extension, and "ca-japanese" is a keyword (a key / type pair) that overrides one of the locale's default behaviors (the calendar, in this example).

+----- U extension
| +--- Keyword (Key & Type)
| |
- -----------
u-ca-japanese
  ^^ ^^^^^^^^
  |  |
  |  +--- Type (japanese = Japanese Imperial calendar)
  +------ Key  (ca = Calendar algorithm)
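To see the key / type pair from code, the following is a minimal sketch of my own (the class name LocaleExtensionDemo and the helper typeOf are made up for illustration). It only uses Locale.getExtension, Locale.getUnicodeLocaleType and Calendar.getCalendarType to inspect what the diagram above describes.

```java
import java.util.Calendar;
import java.util.Locale;

public class LocaleExtensionDemo {

    // Hypothetical helper: returns the type registered for a Unicode locale key,
    // or null when the locale does not carry that key.
    static String typeOf(Locale locale, String key) {
        return locale.getUnicodeLocaleType(key);
    }

    public static void main(String[] args) {
        // The special-case locale: the constructor adds u-ca-japanese automatically.
        Locale special = new Locale("ja", "JP", "JP");
        System.out.println(special.getExtension(Locale.UNICODE_LOCALE_EXTENSION)); // → ca-japanese
        System.out.println(typeOf(special, "ca"));                                 // → japanese

        // The same keyword written directly in a language tag.
        Locale tagged = Locale.forLanguageTag("ja-JP-u-ca-japanese");
        System.out.println(typeOf(tagged, "ca"));                                  // → japanese
        System.out.println(Calendar.getInstance(tagged).getCalendarType());        // → japanese

        // A plain ja-JP locale has no "ca" keyword.
        System.out.println(typeOf(new Locale("ja", "JP"), "ca"));                  // → null
    }
}
```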

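Up to Java 9, Java supports two keys:

  • ca (calendar)
  • nu (numbering system)

Java 10 adds four more (JEP 314: Additional Unicode Language-Tag Extensions):

  • cu (currency type)
  • fw (first day of the week)
  • rg (region override)
  • tz (time zone)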

Override the currency type of the locale

Locale.forLanguageTag could also be used, but Locale.Builder is recommended, so the customized locale here is generated with the Builder. The following example overrides the currency of the Japanese locale (the default locale in this environment) with US dollars.

Locale locale = new Locale.Builder()
    .setLocale(Locale.getDefault())
    .setUnicodeLocaleKeyword("cu", "USD")
    .build();

System.out.println(locale.toLanguageTag());
// → ja-JP-u-cu-usd

Currency currency = Currency.getInstance(locale);
System.out.println(currency.getCurrencyCode());
// → USD
System.out.println(currency.getDisplayName());
//→ US dollar
System.out.println(currency.getSymbol());
// → $

double money = 123456789.12345;
NumberFormat formatter = NumberFormat.getCurrencyInstance(locale);
formatter.setMinimumFractionDigits(3);
System.out.println(formatter.format(money));
// → $123,456,789.123

Override the first day of the week in the calendar

As the JavaDoc of the Calendar class quoted below describes, you can override the first day of the week with any day by specifying the Unicode extension keyword "u-fw-xxx".

Calendar (Java SE 10 & JDK 10)

Calendar defines a locale-specific seven day week using two parameters: the first day of the week and the minimal days in first week (from 1 to 7). These numbers are taken from the locale resource data or the locale itself when a Calendar is constructed. If the designated locale contains "fw" and/or "rg" Unicode extensions, the first day of the week will be obtained according to those extensions.

// Default locale (ja-JP in this environment)
Calendar calendar = Calendar.getInstance();
System.out.println(calendar.getCalendarType());
// → gregory
System.out.println(calendar.getFirstDayOfWeek());
// → 1 (Calendar.SUNDAY)

// Same locale with the "fw" extension added (Java 10 or later)
Locale locale = new Locale.Builder()
    .setLocale(Locale.getDefault())
    .setUnicodeLocaleKeyword("fw", "mon")
    .build();

System.out.println(locale.toLanguageTag());
// → ja-JP-u-fw-mon

Calendar mondayFirst = Calendar.getInstance(locale);
System.out.println(mondayFirst.getCalendarType());
// → gregory
System.out.println(mondayFirst.getFirstDayOfWeek());
// → 2  (Calendar.MONDAY)
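The same override can be written directly as a language tag. The sketch below is my own addition (FirstDayOfWeekDemo and firstDayOfWeek are hypothetical names) and assumes Java 10 or later, where the "fw" and "rg" keys are honored. I have left the "rg" result without an expected value because it depends on the locale data of the JDK in use.

```java
import java.util.Calendar;
import java.util.Locale;

public class FirstDayOfWeekDemo {

    // Hypothetical helper: builds a Calendar for the given BCP 47 tag and
    // returns its first day of the week (Calendar.SUNDAY = 1 ... SATURDAY = 7).
    static int firstDayOfWeek(String languageTag) {
        Locale locale = Locale.forLanguageTag(languageTag);
        return Calendar.getInstance(locale).getFirstDayOfWeek();
    }

    public static void main(String[] args) {
        // "fw" names the day explicitly (requires Java 10 or later).
        System.out.println(firstDayOfWeek("ja-JP-u-fw-mon")); // → 2 (Calendar.MONDAY)
        System.out.println(firstDayOfWeek("ja-JP-u-fw-fri")); // → 6 (Calendar.FRIDAY)

        // "rg" borrows the regional default of another region (here GB);
        // the value comes from the locale data, so it is only printed here.
        System.out.println(firstDayOfWeek("en-US-u-rg-gbzzzz"));
    }
}
```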

Date and DateFormat

Unicode locale extension (addition of u-ca-japanese)

Locale locale = new Locale("ja", "JP", "JP");
Date now = new Date();
DateFormat.getDateInstance(DateFormat.FULL, locale).format(now);   // → (1)
DateFormat.getDateInstance(DateFormat.LONG, locale).format(now);   // → (2)
DateFormat.getDateInstance(DateFormat.MEDIUM, locale).format(now); // → (3)
DateFormat.getDateInstance(DateFormat.SHORT, locale).format(now);  // → (4)

Default locale

Date now = new Date();
DateFormat.getDateInstance(DateFormat.FULL).format(now);           // → (5)
DateFormat.getDateInstance(DateFormat.LONG).format(now);           // → (6)
DateFormat.getDateInstance(DateFormat.MEDIUM).format(now);         // → (7)
DateFormat.getDateInstance(DateFormat.SHORT).format(now);          // → (8)

** Output result **

pattern | 1.8.0 | 9.0.4 | 10.0.1 | Difference between 1.8 and 9
1 | June 25, 2018 | June 25, 2018 | June 25, 2018 |
2 | H30.06.25 | 2018.06.25 | 2018.06.25 | Yes
3 | H30.06.25 | 2018.06.25 | 2018.06.25 | Yes
4 | H30.06.25 | 2018.06.25 | 2018.06.25 | Yes
5 | June 25, 2018 | Monday, June 25, 2018 | Monday, June 25, 2018 | Yes
6 | 2018/06/25 | June 25, 2018 | June 25, 2018 | Yes
7 | 2018/06/25 | 2018/06/25 | 2018/06/25 |
8 | 18/06/25 | 2018/06/25 | 2018/06/25 | Yes
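When output like this differs between versions, it can help to look at the pattern the locale data supplies rather than at the formatted string. The following is a small diagnostic sketch of my own (LocalePatternDump and patternOf are hypothetical names). It relies on DateFormat.getDateInstance returning a SimpleDateFormat, which is common but not guaranteed, hence the instanceof check.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Locale;

public class LocalePatternDump {

    // Hypothetical helper: returns the pattern string behind a localized date style.
    // Falls back to toString() if the DateFormat is not a SimpleDateFormat.
    static String patternOf(int style, Locale locale) {
        DateFormat format = DateFormat.getDateInstance(style, locale);
        if (format instanceof SimpleDateFormat) {
            return ((SimpleDateFormat) format).toPattern();
        }
        return format.toString();
    }

    public static void main(String[] args) {
        int[] styles = {DateFormat.FULL, DateFormat.LONG, DateFormat.MEDIUM, DateFormat.SHORT};
        String[] names = {"FULL", "LONG", "MEDIUM", "SHORT"};
        for (int i = 0; i < styles.length; i++) {
            // The printed patterns depend on the JDK version and on java.locale.providers.
            System.out.println(names[i] + ": " + patternOf(styles[i], Locale.JAPAN));
        }
    }
}
```

Running this on each JDK and comparing the output shows which styles changed without depending on the current date.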

Calendar and DateFormat

Unicode locale extension (addition of u-ca-japanese)

Locale locale = new Locale("ja", "JP", "JP");
Calendar now = Calendar.getInstance(locale);
DateFormat.getDateInstance(DateFormat.FULL, locale).format(now.getTime());   // → (1)
DateFormat.getDateInstance(DateFormat.LONG, locale).format(now.getTime());   // → (2)
DateFormat.getDateInstance(DateFormat.MEDIUM, locale).format(now.getTime()); // → (3)
DateFormat.getDateInstance(DateFormat.SHORT, locale).format(now.getTime());  // → (4)

Default locale

Calendar now = Calendar.getInstance();
DateFormat.getDateInstance(DateFormat.FULL).format(now.getTime());           // → (5)
DateFormat.getDateInstance(DateFormat.LONG).format(now.getTime());           // → (6)
DateFormat.getDateInstance(DateFormat.MEDIUM).format(now.getTime());         // → (7)
DateFormat.getDateInstance(DateFormat.SHORT).format(now.getTime());          // → (8)

** Output result **

pattern | 1.8.0 | 9.0.4 | 10.0.1 | Difference between 1.8 and 9
1 | June 25, 2018 | June 25, 2018 | June 25, 2018 |
2 | H30.06.25 | 2018.06.25 | 2018.06.25 | Yes
3 | H30.06.25 | 2018.06.25 | 2018.06.25 | Yes
4 | H30.06.25 | 2018.06.25 | 2018.06.25 | Yes
5 | June 25, 2018 | Monday, June 25, 2018 | Monday, June 25, 2018 | Yes
6 | 2018/06/25 | June 25, 2018 | June 25, 2018 | Yes
7 | 2018/06/25 | 2018/06/25 | 2018/06/25 |
8 | 18/06/25 | 2018/06/25 | 2018/06/25 | Yes

LocalDateTime and DateTimeFormatter

Unicode locale extension (addition of u-ca-japanese)

Locale locale = new Locale("ja", "JP", "JP");
LocalDateTime now = LocalDateTime.now();
DateTimeFormatter.ofLocalizedDate(FormatStyle.FULL).withLocale(locale).format(now);    // → (1)
DateTimeFormatter.ofLocalizedDate(FormatStyle.LONG).withLocale(locale).format(now);    // → (2)
DateTimeFormatter.ofLocalizedDate(FormatStyle.MEDIUM).withLocale(locale).format(now);  // → (3)
DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT).withLocale(locale).format(now);   // → (4)

Default locale

LocalDateTime now = LocalDateTime.now();
DateTimeFormatter.ofLocalizedDate(FormatStyle.FULL).format(now);     // → (5)
DateTimeFormatter.ofLocalizedDate(FormatStyle.LONG).format(now);     // → (6)
DateTimeFormatter.ofLocalizedDate(FormatStyle.MEDIUM).format(now);   // → (7)
DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT).format(now);    // → (8)

** Output result **

pattern | 1.8.0 | 9.0.4 | 10.0.1 | Difference between 1.8 and 9
1 | June 25, 2018 | Monday, June 25, 2018 | Monday, June 25, 2018 | Yes
2 | 2018/06/25 | June 25, 2018 | June 25, 2018 | Yes
3 | 2018/06/25 | 2018/06/25 | 2018/06/25 |
4 | 18/06/25 | 2018/06/25 | 2018/06/25 | Yes
5 | June 25, 2018 | Monday, June 25, 2018 | Monday, June 25, 2018 | Yes
6 | 2018/06/25 | June 25, 2018 | June 25, 2018 | Yes
7 | 2018/06/25 | 2018/06/25 | 2018/06/25 |
8 | 18/06/25 | 2018/06/25 | 2018/06/25 | Yes
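In the table above, rows 1 to 4 match rows 5 to 8, so withLocale with ja_JP_JP does not switch java.time to the imperial calendar. One way to get imperial years that also works on Java 8 is to set the chronology explicitly. This is a sketch of my own and not something the original investigation covered (JapaneseChronologyDemo and formatImperial are hypothetical names). The pattern contains only digits, so the result does not depend on the locale data.

```java
import java.time.LocalDate;
import java.time.chrono.JapaneseChronology;
import java.time.chrono.JapaneseDate;
import java.time.chrono.JapaneseEra;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoField;

public class JapaneseChronologyDemo {

    // Hypothetical helper: formats an ISO date with the year-of-era
    // taken from the Japanese imperial calendar.
    static String formatImperial(LocalDate date) {
        return DateTimeFormatter.ofPattern("y-MM-dd")
                .withChronology(JapaneseChronology.INSTANCE) // override ISO chronology
                .format(date);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2018, 6, 25);
        System.out.println(formatImperial(date));                  // → 30-06-25

        // The same conversion without a formatter.
        JapaneseDate imperial = JapaneseDate.from(date);
        System.out.println(imperial.getEra() == JapaneseEra.HEISEI); // → true
        System.out.println(imperial.get(ChronoField.YEAR_OF_ERA));   // → 30
    }
}
```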

DateTimeFormatter.localizedBy

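The localizedBy method was introduced in Java 10. It returns a copy of the formatter with the given locale; if the locale contains Unicode extensions, the corresponding formatter settings (such as the calendar) are overridden by them. (The call has no side effects on the locale instance.)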

Locale locale = new Locale("ja", "JP", "JP");
LocalDateTime now = LocalDateTime.now();
DateTimeFormatter.ofLocalizedDate(FormatStyle.FULL).localizedBy(locale).format(now);    // → (1)
DateTimeFormatter.ofLocalizedDate(FormatStyle.LONG).localizedBy(locale).format(now);    // → (2)
DateTimeFormatter.ofLocalizedDate(FormatStyle.MEDIUM).localizedBy(locale).format(now);  // → (3)
DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT).localizedBy(locale).format(now);   // → (4)

** Output result **

pattern | 1.8.0 | 9.0.4 | 10.0.1 | Difference between 1.8 and 9
1 | - | - | Wednesday, June 27, 2018 | -
2 | - | - | June 27, 2018 | -
3 | - | - | June 27, 2018 | -
4 | - | - | H30/6/27 | -
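The difference between withLocale and localizedBy is easier to see with a digits-only pattern. The following is a minimal sketch of my own, assuming Java 10 or later (LocalizedByDemo and its two helpers are hypothetical names). Both formatters receive the same locale, but only localizedBy applies its "ca" extension.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class LocalizedByDemo {

    // A locale carrying the "ca" (calendar) extension.
    static final Locale JAPANESE_CALENDAR = Locale.forLanguageTag("ja-JP-u-ca-japanese");

    // withLocale only swaps the locale; the chronology stays ISO.
    static String withLocaleOnly(LocalDate date) {
        return DateTimeFormatter.ofPattern("y-MM-dd")
                .withLocale(JAPANESE_CALENDAR)
                .format(date);
    }

    // localizedBy (Java 10+) also applies the Unicode extensions of the locale.
    static String withLocalizedBy(LocalDate date) {
        return DateTimeFormatter.ofPattern("y-MM-dd")
                .localizedBy(JAPANESE_CALENDAR)
                .format(date);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2018, 6, 25);
        System.out.println(withLocaleOnly(date));  // → 2018-06-25
        System.out.println(withLocalizedBy(date)); // → 30-06-25
    }
}
```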

ZonedDateTime and DateTimeFormatter

Unicode locale extension (addition of u-ca-japanese)

Locale locale = new Locale("ja", "JP", "JP");
ZonedDateTime now = ZonedDateTime.now();
DateTimeFormatter.ofLocalizedDateTime(FormatStyle.FULL, FormatStyle.FULL).withLocale(locale).format(now);     // → (1)
DateTimeFormatter.ofLocalizedDateTime(FormatStyle.LONG, FormatStyle.LONG).withLocale(locale).format(now);     // → (2)
DateTimeFormatter.ofLocalizedDateTime(FormatStyle.MEDIUM, FormatStyle.MEDIUM).withLocale(locale).format(now); // → (3)
DateTimeFormatter.ofLocalizedDateTime(FormatStyle.SHORT, FormatStyle.SHORT).withLocale(locale).format(now);   // → (4)

Default locale

ZonedDateTime now = ZonedDateTime.now();
DateTimeFormatter.ofLocalizedDateTime(FormatStyle.FULL, FormatStyle.FULL).format(now);      // → (5)
DateTimeFormatter.ofLocalizedDateTime(FormatStyle.LONG, FormatStyle.LONG).format(now);      // → (6)
DateTimeFormatter.ofLocalizedDateTime(FormatStyle.MEDIUM, FormatStyle.MEDIUM).format(now);  // → (7)
DateTimeFormatter.ofLocalizedDateTime(FormatStyle.SHORT, FormatStyle.SHORT).format(now);    // → (8)

** Output result **

pattern | 1.8.0 | 9.0.4 | 10.0.1 | Difference between 1.8 and 9
1 | June 25, 2018 22:20:24 JST | Monday, June 25, 2018 22:22:28 Japan Standard Time | Monday, June 25, 2018 22:24:57 Japan Standard Time | Yes
2 | 2018/06/25 22:20:24 JST | June 25, 2018 22:22:28 JST | June 25, 2018 22:24:57 JST | Yes
3 | 2018/06/25 22:20:24 | 2018/06/25 22:22:28 | 2018/06/25 22:24:57 |
4 | 18/06/25 22:20 | 2018/06/25 22:22 | 2018/06/25 22:24 | Yes
5 | June 25, 2018 22:20:24 JST | Monday, June 25, 2018 22:22:28 Japan Standard Time | Monday, June 25, 2018 22:24:57 Japan Standard Time | Yes
6 | 2018/06/25 22:20:24 JST | June 25, 2018 22:22:28 JST | June 25, 2018 22:24:57 JST | Yes
7 | 2018/06/25 22:20:24 | 2018/06/25 22:22:28 | 2018/06/25 22:24:57 |
8 | 18/06/25 22:20 | 2018/06/25 22:22 | 2018/06/25 22:24 | Yes

Specifying an arbitrary pattern

When an explicit pattern string was specified, there was no difference between the versions.

Locale locale = new Locale("ja", "JP", "JP");

LocalDateTime now = LocalDateTime.now();
DateTimeFormatter.ofPattern("G yyyy-MM-dd (E) a HH:mm:ss.SSS", locale).format(now);                // → (1)
DateTimeFormatter.ofPattern("G yyyy-MM-dd (E) a HH:mm:ss.SSS").format(now);                        // → (2)
DateTimeFormatter.ofPattern("G yy-MM-dd (E) a HH:mm:ss.SSS").localizedBy(locale).format(now);      // → (3) Java 10 or later

ZonedDateTime zonedNow = ZonedDateTime.now();
DateTimeFormatter.ofPattern("G yyyy-MM-dd (E) a HH:mm:ss.SSS zzz", locale).format(zonedNow);            // → (4)
DateTimeFormatter.ofPattern("G yyyy-MM-dd (E) a HH:mm:ss.SSS zzz").format(zonedNow);                    // → (5)
DateTimeFormatter.ofPattern("G yy-MM-dd (E) a HH:mm:ss.SSS zzz").localizedBy(locale).format(zonedNow);  // → (6) Java 10 or later

** Output result **

pattern | 1.8.0 | 9.0.4 | 10.0.1 | Difference between 1.8 and 9
1 | AD 2018-06-26 (Tue) AM 00:01:45.267 | AD 2018-06-26 (Tue) AM 00:02:50.751 | AD 2018-06-26 (Tue) AM 00:03:56.584 |
2 | AD 2018-06-26 (Tue) AM 00:01:45.267 | AD 2018-06-26 (Tue) AM 00:02:50.751 | AD 2018-06-26 (Tue) AM 00:03:56.584 |
3 | - | - | Heisei 30-06-26 (Tue) AM 00:26:03.888 | -
4 | AD 2018-06-26 (Tue) AM 00:01:45.269 JST | AD 2018-06-26 (Tue) AM 00:02:50.767 JST | AD 2018-06-26 (Tue) AM 00:03:56.601 JST |
5 | AD 2018-06-26 (Tue) AM 00:01:45.269 JST | AD 2018-06-26 (Tue) AM 00:02:50.767 JST | AD 2018-06-26 (Tue) AM 00:03:56.601 JST |
6 | - | - | Heisei 30-06-26 (Tue) AM 00:26:03.898 JST | -

Supplement

Unicode Technical Standard #35

UNICODE LOCALE DATA MARKUP LANGUAGE (LDML)

Transition of internationalization extensions

Internationalization extensions in Java SE 6

CLDR is not mentioned. Although it is unrelated to CLDR, support for the Japanese imperial calendar was added in Java SE 6.

** Japanese calendar support **

A new Calendar implementation has been added to support the Japanese imperial calendar, in which, for example, Gregorian year 2005 is counted as Heisei 17. An instance of this calendar can be created with the Calendar.getInstance factory by specifying Locale("ja", "JP", "JP"). The java.text.SimpleDateFormat class supports calendar-specific year and date formats for calendars other than the Gregorian calendar.

Locale locale = new Locale("ja", "JP", "JP");
Calendar calendar = Calendar.getInstance(locale);
System.out.println(calendar.getClass().getCanonicalName());
// → java.util.JapaneseImperialCalendar

System.out.println(new SimpleDateFormat("Gyy year MM month dd day(E)", locale).format(calendar.getTime()));
//→ June 25, 2018(Month)
Calendar calendar = Calendar.getInstance();
System.out.println(calendar.getClass().getCanonicalName());
// → java.util.GregorianCalendar

System.out.println(new SimpleDateFormat("Gyyyy year MM month dd day(E)").format(calendar.getTime()));
//→ June 25, 2018 AD(Month)
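The imperial year can also be read from the calendar fields without formatting. Below is a minimal sketch of my own (ImperialCalendarFields and imperialCalendarAt are hypothetical names). The time is set with setTimeInMillis rather than set(Calendar.YEAR, ...), because on this calendar the YEAR field is the year within the era.

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.Calendar;
import java.util.Locale;
import java.util.TimeZone;

public class ImperialCalendarFields {

    // Hypothetical helper: a ja_JP_JP calendar positioned at the given instant (Tokyo time).
    static Calendar imperialCalendarAt(long epochMillis) {
        Calendar calendar = Calendar.getInstance(
                TimeZone.getTimeZone("Asia/Tokyo"), new Locale("ja", "JP", "JP"));
        calendar.setTimeInMillis(epochMillis);
        return calendar;
    }

    public static void main(String[] args) {
        // 2018-06-25T00:00 in Tokyo, expressed as epoch milliseconds.
        long epochMillis = ZonedDateTime
                .of(2018, 6, 25, 0, 0, 0, 0, ZoneId.of("Asia/Tokyo"))
                .toInstant().toEpochMilli();

        Calendar calendar = imperialCalendarAt(epochMillis);
        System.out.println(calendar.getCalendarType());       // → japanese
        System.out.println(calendar.get(Calendar.ERA));       // → 4 (Heisei)
        System.out.println(calendar.get(Calendar.YEAR));      // → 30
        System.out.println(calendar.get(Calendar.MONTH) + 1); // → 6 (MONTH is zero-based)
    }
}
```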

Internationalization extensions in Java SE 7

** Locale class supports BCP47 and UTR35 **

The Locale class has been updated to implement identifiers interchangeable with BCP 47 (IETF BCP 47, "Tags for Identifying Languages"), with support for the LDML (UTS #35, "Unicode Locale Data Markup Language") BCP 47-compatible extensions for locale data exchange.

Internationalization extensions in JDK 8

** Adoption of Unicode CLDR data and java.locale.providers system properties **

The Unicode Consortium has released the Common Locale Data Repository (CLDR) project to "support the world's languages with the largest and most extensive standard locale data repository". CLDR is becoming the de facto standard for locale data.

CLDR's XML-based locale data is included in the JDK 8 release, but is disabled by default.

Default

The default behavior is equivalent to the following settings.

java.locale.providers=JRE,SPI

Internationalization extensions in JDK 9

** CLDR locale data enabled by default **

The XML-based locale data for the Unicode Common Locale Data Repository (CLDR) that was first added to JDK 8 is the default locale data for JDK 9. In previous releases, the default was JRE.

Default

If you do not set this property, the default behavior is equivalent to the following setting:

java.locale.providers=CLDR,COMPAT,SPI
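To confirm which setting a given run is using, the property can be read at startup. The sketch below is my own (LocaleProviderInfo and effectiveProviders are hypothetical names). When the property is unset it simply returns the defaults quoted in this article for JDK 8 and JDK 9; it is not an API that queries the JDK, and later JDK releases may use different defaults.

```java
public class LocaleProviderInfo {

    // Hypothetical helper: returns the configured provider list, or the default
    // quoted in this article when the property is not set.
    // specVersion is the value of java.specification.version ("1.8", "9", "10", ...).
    static String effectiveProviders(String configured, String specVersion) {
        if (configured != null && !configured.isEmpty()) {
            return configured;
        }
        boolean java8OrEarlier = specVersion.startsWith("1.");
        return java8OrEarlier ? "JRE,SPI" : "CLDR,COMPAT,SPI";
    }

    public static void main(String[] args) {
        String configured = System.getProperty("java.locale.providers");
        String specVersion = System.getProperty("java.specification.version");
        System.out.println("java.locale.providers = " + configured);
        System.out.println("effective (per this article) = "
                + effectiveProviders(configured, specVersion));
    }
}
```

Starting the JVM with -Djava.locale.providers=COMPAT,SPI on Java 9 should bring back the Java 8 style output, which is useful for checking whether a failing test is caused by the locale data. I have not checked how long the COMPAT provider remains available in later JDK releases, so consult the release notes of the JDK in use.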

Internationalization extensions in JDK 10

** Additional Unicode language tag extensions **

Java SE 9 only supports -ca (calendar) and -nu (numeric) extensions. Java SE 10 adds support for the following additional extensions in the associated JDK classes:

  • -cu (currency type)
  • -fw (first day of the week)
  • -rg (Region Override)
  • -tz (time zone)
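The cu and fw extensions are demonstrated earlier in this article; for rg and tz, the following is a minimal sketch of my own, assuming Java 10 or later (RegionAndZoneOverrideDemo and its helpers are hypothetical names). The tags "en-US-u-tz-jptyo" and "en-US-u-rg-jpzzzz" are example values I chose: "jptyo" is the CLDR short ID for Tokyo and "jpzzzz" is the region-override form of JP.

```java
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Currency;
import java.util.Locale;

public class RegionAndZoneOverrideDemo {

    // Hypothetical helper: the zone a formatter ends up with after localizedBy.
    // The "tz" extension of the locale overrides the formatter's zone.
    static ZoneId zoneFromTag(String languageTag) {
        return DateTimeFormatter.ISO_LOCAL_DATE_TIME
                .localizedBy(Locale.forLanguageTag(languageTag))
                .getZone();
    }

    // Hypothetical helper: the currency selected for a locale.
    // The "rg" extension replaces the region used for the lookup.
    static String currencyFromTag(String languageTag) {
        return Currency.getInstance(Locale.forLanguageTag(languageTag)).getCurrencyCode();
    }

    public static void main(String[] args) {
        System.out.println(zoneFromTag("en-US-u-tz-jptyo"));      // → Asia/Tokyo
        System.out.println(currencyFromTag("en-US"));             // → USD
        System.out.println(currencyFromTag("en-US-u-rg-jpzzzz")); // → JPY
    }
}
```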

Issues addressed in Java 9

java.time: DateTimeFormatter containing "DD" fails on 3-digit day-of-year value

Java 9 fixed a bug in which specifying "DD" in the pattern string threw an exception when the day of the year of the target date was 100 or greater.

LocalDateTime now = LocalDateTime.now();
DateTimeFormatter.ofPattern("D").format(now);
// → 177
DateTimeFormatter.ofPattern("DD").format(now);  // ← throws an exception in Java 1.8
// → 177
DateTimeFormatter.ofPattern("DDD").format(now);
// → 177

DateTimeFormatter won't parse dates with custom format "yyyyMMddHHmmssSSS"

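In Java 1.8, parsing with the pattern string "yyyyMMddHHmmssSSS" threw an exception; this bug was fixed in Java 9. Formatting with the same pattern works in both versions.

LocalDateTime now = LocalDateTime.now();
DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS");
formatter.format(now);
// → 20180625222024147
LocalDateTime.parse("20180625222024147", formatter);  // ← DateTimeParseException in Java 1.8
// → 2018-06-25T22:20:24.147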

java.time.format.FormatStyle.LONG or FULL causes unchecked exception

Formatting a LocalDateTime with DateTimeFormatter.ofLocalizedDateTime(FormatStyle.FULL) raises an exception, but this is by design and not a bug: the FULL and LONG styles include the time zone, which a LocalDateTime does not carry.

LocalDateTime now = LocalDateTime.now();
DateTimeFormatter.ofLocalizedDateTime(FormatStyle.FULL).format(now);
// → Exception in thread "main" java.time.DateTimeException: Unable to extract ZoneId from temporal 2018-06-26T01:00:54.557262700

In case of ZonedDateTime, it can be formatted without any problem.

ZonedDateTime now = ZonedDateTime.now();
DateTimeFormatter.ofLocalizedDateTime(FormatStyle.FULL, FormatStyle.FULL).format(now);
//→ Monday, June 25, 2018 22:24:57 Japan Standard Time
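If only a LocalDateTime is available, attaching a zone with atZone before formatting avoids the exception. The following is a minimal sketch of my own (FullStyleWithZone and canFormat are hypothetical names); the zone Asia/Tokyo is just the one used in this article's environment.

```java
import java.time.DateTimeException;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.time.temporal.TemporalAccessor;
import java.util.Locale;

public class FullStyleWithZone {

    static final DateTimeFormatter FULL =
            DateTimeFormatter.ofLocalizedDateTime(FormatStyle.FULL).withLocale(Locale.JAPAN);

    // Hypothetical helper: reports whether the FULL style can format the given value.
    static boolean canFormat(TemporalAccessor temporal) {
        try {
            FULL.format(temporal);
            return true;
        } catch (DateTimeException e) {
            // Thrown when the value carries no zone for the FULL style to print.
            return false;
        }
    }

    public static void main(String[] args) {
        LocalDateTime local = LocalDateTime.of(2018, 6, 25, 22, 24, 57);
        System.out.println(canFormat(local));                                // → false
        System.out.println(canFormat(local.atZone(ZoneId.of("Asia/Tokyo")))); // → true
        // The formatted text itself depends on the JDK's locale data.
        System.out.println(FULL.format(local.atZone(ZoneId.of("Asia/Tokyo"))));
    }
}
```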

Other pattern-letter changes (made to conform to the CLDR specification rather than as new features)

  • DateTimeFormatter pattern letters 'A','n','N'
  • Add date-time patterns 'v' and 'vvvv'
  • DateTimeFormatter pattern letter 'g'
  • Incorrect documentation for DateTimeFormatter letter 'k'

Issues addressed in Java 11

I only picked the items that interested me, so this is not a complete list.

Release Note: Japanese New Era Implementation

This addresses the era change that will take place on May 1, 2019. For now, "New Era" is displayed as a provisional era name.

Release Note: Update locale data to Unicode CLDR v33

The bundled CLDR data is upgraded to version 33 (http://cldr.unicode.org/index/downloads/cldr-33).
