Kinx Tips: UTF8 string formatting

Introduction

This is about Kinx, the scripting language known for **“Looks like JavaScript, brain (contents) is Ruby, (stability is AC/DC)”**. This post introduces a little trick implemented in Kinx.

Yes, a little trick.

UTF8 string formatting issues

C language printf

In UTF8, the number of bytes in a string does not match its display width, so even if you specify “%-20s”, the columns end up slightly misaligned. Let’s try it with names that include half-width kana.

#include <stdio.h>

int main()
{
    struct fruits {
        const char *name;   /* points at string literals, so keep it const */
        int price;
    } fruits[] = {
        { .name = "Apple", .price = 230 },
        { .name = "Mandarin orange(1 bag)", .price = 450 },
        { .name = "Grapefruit", .price = 120 },
    };
    printf("01234567890123456789\n");
    for (size_t i = 0; i < sizeof(fruits)/sizeof(fruits[0]); ++i) {
        printf("%-20s ... %3d yen\n", fruits[i].name, fruits[i].price);
    }
    return 0;
}

The result is like this.

01234567890123456789
Apple...230 yen
Mandarin orange(1 bag)      ...450 yen
Grapefruit...120 yen

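printf basically counts the width in bytes, so a Japanese character encoded as 3 bytes consumes 3 of the 20 even though it is displayed narrower than 3 columns. Therefore the padding comes up short and the displayed width shrinks.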
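To make the byte counting concrete, here is a small Ruby sketch that mimics what `printf("%-20s")` does by padding on the byte count. The helper name `pad_by_bytes` and the sample string are assumptions made up for illustration; they are not taken from the code above.

```ruby
# Mimics C's byte-based "%-20s": pad with spaces until the BYTE count
# reaches the requested width. (Hypothetical helper for illustration.)
def pad_by_bytes(str, width)
  padding = [width - str.bytesize, 0].max
  str + " " * padding
end

name = "ﾘﾝｺﾞ"  # sample half-width kana string (an assumption, not the article's data)

puts name.bytesize                        # 12 bytes (4 characters x 3 bytes each)
puts name.length                          # 4 characters
puts "[" + pad_by_bytes(name, 20) + "]"   # only 8 spaces are added
puts "[" + pad_by_bytes("Apple", 20) + "]"
```

The kana string is followed by only 8 spaces, so it occupies 12 columns instead of 20, while the ASCII string fills all 20. That is the shrinking seen in the C output.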

Ruby sprintf

You might expect the same result as C, but it is different.

fruits = [
    ["Apple", 230],
    ["Mandarin orange(1 bag)", 450],
    ["Grapefruit", 120],
]

puts "01234567890123456789"
fruits.each {|name, price|
    puts sprintf("%-20s ... %3d yen", name, price)
}

The result is like this.

01234567890123456789
Apple...230 yen
Mandarin orange(1 bag)              ...450 yen
Grapefruit...120 yen

Ruby counts the width in characters, not bytes. A Japanese character also consumes just one (no matter how many bytes it takes), so full-width characters, which are displayed two columns wide, make the displayed width grow. Because the count is in characters, the columns actually do line up when the text is half-width kana.
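The character counting can be checked directly in Ruby. The sketch below uses two sample strings of my own choosing (assumptions, not the article's data) and compares the character count of the padded result with its byte count.

```ruby
full = "りんご"  # full-width hiragana: 3 characters, 9 bytes (sample string)
half = "ﾘﾝｺﾞ"    # half-width kana:     4 characters, 12 bytes (sample string)

padded_full = format("%-20s", full)
padded_half = format("%-20s", half)

# Both results are padded to 20 CHARACTERS, whatever the byte count is.
puts padded_full.length    # 20
puts padded_half.length    # 20
puts padded_full.bytesize  # 26 (9 bytes of text + 17 spaces)
puts padded_half.bytesize  # 28 (12 bytes of text + 16 spaces)
```

In a terminal that draws full-width characters two columns wide, the first string takes 3 x 2 + 17 = 23 columns, while the half-width one takes exactly 20.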

Kinx formatter

Kinx checks the display width of each UTF8 character and automatically adjusts the numeric part of the format string, so the number functions as a display width.

var fruits = [
    ["Apple", 230],
    ["Mandarin orange(1 bag)", 450],
    ["Grapefruit", 120],
];

System.println("01234567890123456789");
fruits.each(&(e) => {
    System.println("%-20s ... %3d yen" % e[0] % e[1]);
});

The result is like this.

01234567890123456789
Apple...230 yen
Mandarin orange(1 bag)          ...450 yen
Grapefruit...120 yen

It may be hard to tell because Qiita’s code blocks do not seem to be displayed in a monospaced font, but if you look at it in a monospaced font, the columns are neatly aligned. It adjusts according to the East Asian Width, so it should generally work well [^1].

[^1]: According to “here”, characters in the Ambiguous category are recommended to be treated as fullwidth in the context of traditional East Asian character encodings, so I lean toward fullwidth for them. Otherwise the alignment gets disturbed.

The behavior that many people expect is **this**.
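For readers who want the same effect outside Kinx, here is a rough Ruby sketch of padding by display width. It is not Kinx's implementation: `WIDE_RANGES`, `display_width` and `ljust_display` are hypothetical names, the range list is a hand-picked subset of wide characters rather than the full East Asian Width table, and Ambiguous characters and combining marks are not handled.

```ruby
# Hand-picked code point ranges drawn two columns wide.
# (A simplified assumption, NOT the complete East Asian Width data.)
WIDE_RANGES = [
  0x1100..0x115F,  # Hangul Jamo initial consonants
  0x2E80..0x303E,  # CJK radicals, symbols and punctuation
  0x3041..0x33FF,  # Hiragana, Katakana, CJK compatibility
  0x3400..0x4DBF,  # CJK Unified Ideographs Extension A
  0x4E00..0x9FFF,  # CJK Unified Ideographs
  0xAC00..0xD7A3,  # Hangul syllables
  0xF900..0xFAFF,  # CJK compatibility ideographs
  0xFF01..0xFF60,  # Fullwidth forms
  0xFFE0..0xFFE6   # Fullwidth signs
].freeze

# Sum of per-character widths: 2 for the ranges above, 1 for everything else
# (half-width kana U+FF61..U+FF9F therefore count as 1).
def display_width(str)
  str.each_codepoint.sum { |cp| WIDE_RANGES.any? { |r| r.cover?(cp) } ? 2 : 1 }
end

# Left-justify by display width instead of byte or character count.
def ljust_display(str, width)
  str + " " * [width - display_width(str), 0].max
end

puts "01234567890123456789"
[["Apple", 230], ["りんご", 230], ["ﾘﾝｺﾞ", 230]].each do |name, price|
  puts format("%s ... %3d yen", ljust_display(name, 20), price)
end
```

The padding is computed separately and the already padded string is passed to a plain `%s`, so the width logic stays in one place and the rest of the format string is untouched.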

In conclusion

For the times when I’m too busy to write a full article, I keep small stories like this one in stock.

See you next time.