Java has a modifier called strictfp. I prepared code and an execution environment in which the behavior actually changes depending on whether strictfp is present.

Machine

**strictfp has no effect when running Java on modern machines.** Floating-point arithmetic should be IEEE 754-compliant with or without strictfp.

(For this reason, there is a JEP that makes floating-point behavior IEEE-compliant whether or not strictfp is specified: JEP 306: Restore Always-Strict Floating-Point Semantics.)

strictfp matters on x86 CPUs that do not implement SSE2. On the Intel side, SSE2 first appeared in the Pentium 4 around 2000, so a machine without SSE2 means a machine older than that.

However, getting hold of a machine more than 20 years old is difficult (I am not a retro PC enthusiast), so we will use emulation instead. Two emulators are worth considering: Intel SDE and QEMU.

Among the CPU types that Intel SDE accepts, -quark appears to be the one that does not implement SSE2. The next least capable option is -p4 (Pentium 4), which already has SSE2, so to get an environment without SSE2 under Intel SDE you have to choose -quark.

QEMU accepts -cpu pentium, -cpu pentium2, -cpu pentium3, and so on. On Linux, QEMU's user-mode emulation is convenient because it runs a program directly, without setting up a virtual machine. This article uses Linux as well.

For the JVM, I use Java 8 Update 261 distributed by Oracle. It appears to support the Pentium II and later. I have not checked the requirements of newer JREs.

Reference: What are the system requirements for Java?

Program

strictfp appears to matter when overflow or underflow occurs in an intermediate step of a floating-point computation. With strictfp, the computation follows IEEE 754 exactly; without it, an intermediate overflow or underflow may be avoided, or a different value may be returned.

Java Language Specification - Chapter 15. Expressions

Here, we test three cases: multiplication of two doubles, multiplication of three doubles, and multiplication of three floats.

For each case, we prepare one version with strictfp and one without, and compute them for specific values. For the three-operand multiplications, the methods are called in a loop (100,000 times for double, 1,000,000 times for float) to trigger JIT compilation, and the values are printed before and after the loop.

class StrictfpTest
{
    static double multiplyDefault(double x, double y)
    {
        return x * y;
    }
    static strictfp double multiplyStrict(double x, double y)
    {
        return x * y;
    }
    static double multiplyThreeDoublesDefault(double x, double y, double z)
    {
        return x * y * z;
    }
    static strictfp double multiplyThreeDoublesStrict(double x, double y, double z)
    {
        return x * y * z;
    }
    static float multiplyThreeFloatsDefault(float x, float y, float z)
    {
        return x * y * z;
    }
    static strictfp float multiplyThreeFloatsStrict(float x, float y, float z)
    {
        return x * y * z;
    }
    public static void main(String[] args)
    {
        {
            double x = 0x1.00002fff0p0, y = 0x1.000000008p0;
            System.out.printf("multiplyDefault(%a, %a) = %a\n", x, y, multiplyDefault(x, y));
            System.out.printf("multiplyStrict(%a, %a) = %a\n", x, y, multiplyStrict(x, y));
        }
        {
            double x = 0x1.fffe0effffffep0, y = 0x1.0000000000001p0;
            System.out.printf("multiplyDefault(%a, %a) = %a\n", x, y, multiplyDefault(x, y));
            System.out.printf("multiplyStrict(%a, %a) = %a\n", x, y, multiplyStrict(x, y));
        }
        {
            double x = 0x1.fffe0effffffep-51, y = 0x1.0000000000001p-1000;
            System.out.printf("multiplyDefault(%a, %a) = %a\n", x, y, multiplyDefault(x, y));
            System.out.printf("multiplyStrict(%a, %a) = %a\n", x, y, multiplyStrict(x, y));
        }
        {
            double x = 0x1p-1000, y = 0x1p-1000, z = 0x1p1000;
            System.out.printf("multiplyThreeDoublesDefault(%a, %a, %a) = %a\n", x, y, z, multiplyThreeDoublesDefault(x, y, z));
            System.out.printf("multiplyThreeDoublesStrict(%a, %a, %a) = %a\n", x, y, z, multiplyThreeDoublesStrict(x, y, z));
            for (int i = 0; i < 100000; ++i) {
                multiplyThreeDoublesDefault(x, z, y);
                multiplyThreeDoublesStrict(x, z, y);
            }
            System.out.printf("multiplyThreeDoublesDefault(%a, %a, %a) = %a\n", x, y, z, multiplyThreeDoublesDefault(x, y, z));
            System.out.printf("multiplyThreeDoublesStrict(%a, %a, %a) = %a\n", x, y, z, multiplyThreeDoublesStrict(x, y, z));
        }
        {
            float x = 0x1p-100f, y = 0x1p-100f, z = 0x1p100f;
            System.out.printf("multiplyThreeFloatsDefault(%a, %a, %a) = %a\n", x, y, z, multiplyThreeFloatsDefault(x, y, z));
            System.out.printf("multiplyThreeFloatsStrict(%a, %a, %a) = %a\n", x, y, z, multiplyThreeFloatsStrict(x, y, z));
            for (int i = 0; i < 1000000; ++i) {
                multiplyThreeFloatsDefault(x, z, y);
                multiplyThreeFloatsStrict(x, z, y);
            }
            System.out.printf("multiplyThreeFloatsDefault(%a, %a, %a) = %a\n", x, y, z, multiplyThreeFloatsDefault(x, y, z));
            System.out.printf("multiplyThreeFloatsStrict(%a, %a, %a) = %a\n", x, y, z, multiplyThreeFloatsStrict(x, y, z));
        }
    }
}

First, here is the result in a modern environment. I ran it on x86_64, but it should give the same result on x86 processors with SSE2 and on other CPUs such as AArch64.

$ java StrictfpTest
multiplyDefault(0x1.00002fffp0, 0x1.000000008p0) = 0x1.00002fff80001p0
multiplyStrict(0x1.00002fffp0, 0x1.000000008p0) = 0x1.00002fff80001p0
multiplyDefault(0x1.fffe0effffffep0, 0x1.0000000000001p0) = 0x1.fffe0fp0
multiplyStrict(0x1.fffe0effffffep0, 0x1.0000000000001p0) = 0x1.fffe0fp0
multiplyDefault(0x1.fffe0effffffep-51, 0x1.0000000000001p-1000) = 0x0.0000000ffff07p-1022
multiplyStrict(0x1.fffe0effffffep-51, 0x1.0000000000001p-1000) = 0x0.0000000ffff07p-1022
multiplyThreeDoublesDefault(0x1.0p-1000, 0x1.0p-1000, 0x1.0p1000) = 0x0.0p0
multiplyThreeDoublesStrict(0x1.0p-1000, 0x1.0p-1000, 0x1.0p1000) = 0x0.0p0
multiplyThreeDoublesDefault(0x1.0p-1000, 0x1.0p-1000, 0x1.0p1000) = 0x0.0p0
multiplyThreeDoublesStrict(0x1.0p-1000, 0x1.0p-1000, 0x1.0p1000) = 0x0.0p0
multiplyThreeFloatsDefault(0x1.0p-100, 0x1.0p-100, 0x1.0p100) = 0x0.0p0
multiplyThreeFloatsStrict(0x1.0p-100, 0x1.0p-100, 0x1.0p100) = 0x0.0p0
multiplyThreeFloatsDefault(0x1.0p-100, 0x1.0p-100, 0x1.0p100) = 0x0.0p0
multiplyThreeFloatsStrict(0x1.0p-100, 0x1.0p-100, 0x1.0p100) = 0x0.0p0

In a modern environment, the results are the same with and without strictfp, and they do not change after JIT compilation either.

Next, let's use QEMU to run it on an emulated Pentium II. I placed the 32-bit version of the java command at ~/jre1.8.0_261/bin/java.

$ qemu-i386 -cpu pentium2 ~/jre1.8.0_261/bin/java StrictfpTest
multiplyDefault(0x1.00002fffp0, 0x1.000000008p0) = 0x1.00002fff80001p0
multiplyStrict(0x1.00002fffp0, 0x1.000000008p0) = 0x1.00002fff80001p0
multiplyDefault(0x1.fffe0effffffep0, 0x1.0000000000001p0) = 0x1.fffe0fp0
multiplyStrict(0x1.fffe0effffffep0, 0x1.0000000000001p0) = 0x1.fffe0fp0
multiplyDefault(0x1.fffe0effffffep-51, 0x1.0000000000001p-1000) = 0x0.0000000ffff08p-1022
multiplyStrict(0x1.fffe0effffffep-51, 0x1.0000000000001p-1000) = 0x0.0000000ffff07p-1022
multiplyThreeDoublesDefault(0x1.0p-1000, 0x1.0p-1000, 0x1.0p1000) = 0x0.0p0
multiplyThreeDoublesStrict(0x1.0p-1000, 0x1.0p-1000, 0x1.0p1000) = 0x0.0p0
multiplyThreeDoublesDefault(0x1.0p-1000, 0x1.0p-1000, 0x1.0p1000) = 0x1.0p-1000
multiplyThreeDoublesStrict(0x1.0p-1000, 0x1.0p-1000, 0x1.0p1000) = 0x0.0p0
multiplyThreeFloatsDefault(0x1.0p-100, 0x1.0p-100, 0x1.0p100) = 0x0.0p0
multiplyThreeFloatsStrict(0x1.0p-100, 0x1.0p-100, 0x1.0p100) = 0x0.0p0
multiplyThreeFloatsDefault(0x1.0p-100, 0x1.0p-100, 0x1.0p100) = 0x0.0p0
multiplyThreeFloatsStrict(0x1.0p-100, 0x1.0p-100, 0x1.0p100) = 0x0.0p0

The first example, 0x1.00002fffp0 * 0x1.000000008p0, causes neither overflow nor underflow, so the result is the same with or without strictfp. The same holds for the next example, 0x1.fffe0effffffep0 * 0x1.0000000000001p0.

In the third example, 0x1.fffe0effffffep-51 * 0x1.0000000000001p-1000, on the other hand, underflow occurs and the result is a subnormal (denormalized) number. Here the last digit differs by 1 depending on whether strictfp is present. The result with strictfp is, of course, the IEEE 754-compliant one, and in this case it is also the one closer to the true value.
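The one-digit difference can be reproduced on a modern JVM with exact integer arithmetic. The following is a minimal sketch, assuming that the non-strictfp code on the x87 first rounds the product to a 53-bit significand with a wider exponent range, and then rounds it a second time when the value is stored as a subnormal double (double rounding). The class and method names are hypothetical, chosen only for this illustration.

```java
import java.math.BigInteger;

// Hypothetical demo class: reproduces the third example with exact integer arithmetic.
public class DoubleRoundingSketch {
    // Significands of x = 0x1.fffe0effffffep-51 and y = 0x1.0000000000001p-1000
    // as 53-bit integers, so that x = MA * 2^-103 and y = MB * 2^-1052.
    static final BigInteger MA = BigInteger.valueOf(0x1fffe0effffffeL);
    static final BigInteger MB = BigInteger.valueOf(0x10000000000001L);

    // Exact product: x * y == P * 2^-1155, and P has 105 bits.
    static final BigInteger P = MA.multiply(MB);

    // Divide v (v >= 0) by 2^shift and round to the nearest integer, ties to even.
    static BigInteger roundHalfEven(BigInteger v, int shift) {
        BigInteger q = v.shiftRight(shift);
        BigInteger r = v.subtract(q.shiftLeft(shift));
        int c = r.compareTo(BigInteger.ONE.shiftLeft(shift - 1));
        if (c > 0 || (c == 0 && q.testBit(0))) {
            q = q.add(BigInteger.ONE);
        }
        return q;
    }

    // IEEE 754 (strictfp): round the exact product once, directly to a multiple
    // of 2^-1074 (the spacing of subnormal doubles). 2^-1155 -> 2^-1074 drops 81 bits.
    static double roundedOnce() {
        BigInteger n = roundHalfEven(P, 1155 - 1074);
        return Math.scalb(n.doubleValue(), -1074);
    }

    // Assumed x87 behavior: first round to a 53-bit significand (drop 52 of P's 105 bits,
    // giving a multiple of 2^-1103), then round again to a multiple of 2^-1074 (drop 29 bits).
    static double roundedTwice() {
        BigInteger m = roundHalfEven(P, 52);
        BigInteger n = roundHalfEven(m, 29);
        return Math.scalb(n.doubleValue(), -1074);
    }

    public static void main(String[] args) {
        System.out.printf("rounded once:  %a%n", roundedOnce());
        System.out.printf("rounded twice: %a%n", roundedTwice());
    }
}
```

By my hand calculation, the single rounding gives 0x0.0000000ffff07p-1022 and the double rounding gives 0x0.0000000ffff08p-1022: after the first rounding the value lands exactly halfway between two subnormals, and ties-to-even then rounds it up.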

In the fourth example, double is used to calculate 0x1p-1000 * 0x1p-1000 * 0x1p1000 ($2^{-1000} \times 2^{-1000} \times 2^{1000}$). The intermediate result `0x1p-2000` ($2^{-2000}$) is far below the smallest positive double ($2^{-1074}$), so it should become 0, and the final result should also be 0. Indeed, the strictfp version, as well as the non-strictfp version before JIT compilation, returns 0x0.0p0.

Without strictfp, however, the result after JIT compilation is 0x1p-1000. This means the intermediate result was computed with a wider exponent range than that of double.
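The effect of a wider exponent range can be roughly modeled in ordinary Java by carrying the exponent in a separate int. The sketch below is an illustration only, not how the JIT works: it assumes all factors are normal, nonzero, finite doubles, keeps a 53-bit significand (like the x87 in double-precision mode), and converts back into the double range only once, at the end. The class and method names are hypothetical.

```java
// Hypothetical demo: multiplication with a 53-bit significand but an unbounded exponent.
public class WideExponentSketch {
    // Assumes every factor is a normal, nonzero, finite double.
    static double multiplyWideExponent(double... factors) {
        double sig = 1.0; // running significand, magnitude kept in [1, 2)
        int exp = 0;      // running exponent, not limited to the double range
        for (double f : factors) {
            int e = Math.getExponent(f);
            sig *= Math.scalb(f, -e); // significand of f has magnitude in [1, 2)
            exp += e;
            int se = Math.getExponent(sig); // renormalize sig back into [1, 2)
            sig = Math.scalb(sig, -se);
            exp += se;
        }
        // Only this final step is limited to the double exponent range.
        return Math.scalb(sig, exp);
    }

    public static void main(String[] args) {
        double x = 0x1p-1000, y = 0x1p-1000, z = 0x1p1000;
        System.out.printf("x * y * z            = %a%n", x * y * z);
        System.out.printf("multiplyWideExponent = %a%n", multiplyWideExponent(x, y, z));
    }
}
```

On a modern JVM, x * y * z gives 0x0.0p0 because 0x1p-2000 underflows, while the wide-exponent model keeps the intermediate exponent at -2000 and ends up with 0x1.0p-1000, the same value as the non-strictfp result after JIT compilation.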

In the fifth example, I computed 0x1p-100 * 0x1p-100 * 0x1p100 with float. The intermediate result 0x1p-200 cannot be represented as a float, so the final result should be 0, and indeed it is. In this case, the result did not change even after JIT compilation.
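For comparison, the following sketch shows what a wider intermediate would have produced in the float case, using double as the wider type. This is an illustration under that assumption, not a description of what the JIT does; the class and method names are hypothetical. Since 0x1p-200 fits comfortably in the double range, the widened computation yields 0x1p-100, so the 0x0.0p0 observed after JIT compilation suggests that, on this setup, the float intermediate was not kept in a wider exponent range.

```java
// Hypothetical demo: the float example evaluated with a float and with a double intermediate.
public class FloatWideningSketch {
    static float strictProduct(float x, float y, float z) {
        return x * y * z; // x * y = 0x1p-200 underflows to 0 in float
    }

    static float widenedProduct(float x, float y, float z) {
        return (float) ((double) x * y * z); // 0x1p-200 fits in double, result is 0x1p-100
    }

    public static void main(String[] args) {
        float x = 0x1p-100f, y = 0x1p-100f, z = 0x1p100f;
        System.out.printf("float intermediate:  %a%n", strictProduct(x, y, z));
        System.out.printf("double intermediate: %a%n", widenedProduct(x, y, z));
    }
}
```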

Incidentally, when I ran it on Intel SDE with -quark, the emulated CPU appears to be equivalent to the original (plain) Pentium, which Java apparently does not support: it crashed with `Executed instruction not valid for specified chip (PENTIUM): 0xf7f61dd0: nop ebx, edi`.

I won't give a deep explanation here

This article does not go into the underlying details.

I may write a follow-up article such as "The Curse of the x87 FPU" or "What Java's strictfp was introduced for, and how it became unnecessary".

Prepare an environment where Java strictfp really makes sense