[PHP] If you have trouble with character encoding problems in Myanmar (Burmese) text

About Burmese character code

Myanmar is one of the regions where, for historical reasons, Internet technology developed later than in the rest of the world. The Zawgyi encoding used to be the mainstream, but as the market opens up and internationalization progresses, it is being replaced by Unicode.

This resembles Japan's own history, in which websites encoded in EUC and SJIS gradually moved to UTF-8. https://enjoy-yangon.com/ja/enyanblog/351-change-myanmar-font-zawgyi-to-unicode

Basic policy for dealing with garbled characters due to mixed character codes

If you cannot read Burmese, garbled text looks no different from correct text, so engineers and programmers like us cannot tell by eye what the problem is. As engineers, however, we still have to solve it.

In other words, you need to pin down the requirements and then solve the problem in software.

Requirement 1: Determine whether a given piece of text is Zawgyi or Unicode.

Requirement 2: Convert text from Zawgyi to Unicode.

These two points are essential requirements.
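To illustrate why Requirement 1 needs a dedicated detector: Zawgyi reuses the code points of the Unicode Myanmar block (U+1000 to U+109F) with different glyph assignments, so looking at code point ranges alone cannot tell the two apart. The following is a minimal Python sketch (Python is used here only for illustration). The sample strings are my own choice, the word "Myanmar" in each encoding, and `in_myanmar_block` is a hypothetical helper name.

```python
# The word "Myanmar" written in each encoding (illustrative samples).
UNICODE_SAMPLE = "\u1019\u103c\u1014\u103a\u1019\u102c"
ZAWGYI_SAMPLE = "\u103b\u1019\u1014\u1039\u1019\u102c"

# The Unicode Myanmar block: U+1000 through U+109F.
MYANMAR_BLOCK = range(0x1000, 0x10A0)


def in_myanmar_block(text: str) -> bool:
    """Return True if every character of text lies in the Myanmar block."""
    return all(ord(ch) in MYANMAR_BLOCK for ch in text)


if __name__ == "__main__":
    for label, sample in (("Unicode", UNICODE_SAMPLE), ("Zawgyi", ZAWGYI_SAMPLE)):
        code_points = " ".join(f"U+{ord(ch):04X}" for ch in sample)
        print(label, code_points, in_myanmar_block(sample))
```

Both samples pass the range check even though their code point sequences differ, which is why the detection step has to look at character patterns rather than ranges.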

Actual implementation

I searched GitHub and elsewhere, and Google's Myanmar Tools came up. https://github.com/google/myanmar-tools

According to its documentation, it can judge whether text is Zawgyi or Unicode. We will use it for detection.

The documentation also contains a further hint: use Rabbit to convert text from Zawgyi to Unicode.

Rabbit-Converter https://github.com/Rabbit-Converter

That gives us the two libraries we need.

With PHP, all you have to do is install the libraries with Composer, load the classes, and pass the text through them. It's easy to use.

1. Determine the character code.
2. Convert.

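php

// Assumes both libraries were installed with Composer and their classes are autoloaded.
$ZawgyiDetector = new ZawgyiDetector();
$Rabbit = new Rabbit();
$text = 'Myanmar text';

// Probability that $text is Zawgyi-encoded.
$check = $ZawgyiDetector->getZawgyiProbability($text);

// Convert only when the text is very likely Zawgyi; otherwise keep it as-is.
$newtext = $text;
if ($check >= 0.95) {
  $newtext = $Rabbit->zg2uni($text);
}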
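The same two-step flow can be sketched in Python with the detector and the converter passed in as plain functions, so the example runs without either library installed. This is only a sketch: `normalize_to_unicode`, `fake_detector`, and `fake_converter` are hypothetical names, and the two stand-ins use toy rules that are nothing like the real libraries. In real use you would pass in the detector's probability function and Rabbit's Zawgyi-to-Unicode conversion instead.

```python
from typing import Callable

# Same cut-off as the PHP snippet: convert only when Zawgyi is very likely.
ZAWGYI_THRESHOLD = 0.95


def normalize_to_unicode(
    text: str,
    detect: Callable[[str], float],
    convert: Callable[[str], str],
    threshold: float = ZAWGYI_THRESHOLD,
) -> str:
    """Step 1: estimate the Zawgyi probability. Step 2: convert if it is high."""
    if not text:
        return text
    if detect(text) >= threshold:
        return convert(text)
    return text


# Illustrative samples: the word "Myanmar" in each encoding.
UNICODE_WORD = "\u1019\u103c\u1014\u103a\u1019\u102c"
ZAWGYI_WORD = "\u103b\u1019\u1014\u1039\u1019\u102c"


def fake_detector(text: str) -> float:
    """Toy stand-in: treats text starting with U+103B as Zawgyi."""
    return 0.99 if text.startswith("\u103b") else 0.01


def fake_converter(text: str) -> str:
    """Toy stand-in: only knows how to convert the one sample word."""
    return text.replace(ZAWGYI_WORD, UNICODE_WORD)


if __name__ == "__main__":
    print(normalize_to_unicode(ZAWGYI_WORD, fake_detector, fake_converter))
```

Returning the input unchanged when the probability is below the threshold means text that is already Unicode never goes through the converter.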

Once the encoding is corrected like this, the text is displayed correctly as Unicode. Note that the page's CSS must also apply a Unicode Myanmar web font rather than a Zawgyi one.

Deal with it at the entrance or the exit

When using a CMS or similar, putting this code either where text is written to the database or where it is read back out will resolve the garbled characters. I think it is better to add the check when writing to the database: if you run this logic on every page view, rendering gets slower as the amount of text grows.
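A minimal sketch of the "check on the way in" approach, using Python's built-in sqlite3 module for illustration. The `posts` table, the `make_db`, `save_post`, and `load_post` helpers, and the `demo_normalize` stand-in are all hypothetical; a real CMS would hook the detect-and-convert step into its own save routine.

```python
import sqlite3
from typing import Callable

# Illustrative samples: the word "Myanmar" in each encoding.
UNICODE_WORD = "\u1019\u103c\u1014\u103a\u1019\u102c"
ZAWGYI_WORD = "\u103b\u1019\u1014\u1039\u1019\u102c"


def demo_normalize(text: str) -> str:
    """Toy stand-in for the real detect-and-convert step."""
    return text.replace(ZAWGYI_WORD, UNICODE_WORD)


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a single posts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
    return conn


def save_post(conn: sqlite3.Connection, body: str,
              normalize: Callable[[str], str]) -> int:
    """Normalize once at write time, then store the Unicode result."""
    cur = conn.execute("INSERT INTO posts (body) VALUES (?)", (normalize(body),))
    conn.commit()
    return cur.lastrowid


def load_post(conn: sqlite3.Connection, post_id: int) -> str:
    """Read path: no detection or conversion needed here."""
    row = conn.execute("SELECT body FROM posts WHERE id = ?", (post_id,)).fetchone()
    return row[0]


if __name__ == "__main__":
    db = make_db()
    post_id = save_post(db, ZAWGYI_WORD, demo_normalize)
    print(load_post(db, post_id) == UNICODE_WORD)
```

With this layout the conversion cost is paid once per write, and every later read returns stored Unicode directly.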

This is a niche topic, but if you ever work on a Myanmar-related website, I hope you find it useful.

See you again.
