A story of porting code from C to Go and getting stuck (and then turning to the language specs)

Background

We programmers often port source code from one language implementation to another. The compiler checks the syntax for us, so I won't go into that here.

Evaluation is a different matter: even when the syntax looks similar, the result can differ from language to language, and I think that is where we sometimes get stuck.

This time I'll write about how, when porting from C to Go, a difference in the language specifications kept me stuck for a while before I got the intended result.

"Grr ... no matter how I look at it, I ported it exactly the same way ..."

For example, take this C code:

`main.c`


// This is main.c

#include <stdio.h>

int main(void) {
    unsigned char  a;
    unsigned char  b;
    unsigned short c;

    a = 0x12;
    b = 0x34;
    c = 0x0000;

    c |= (unsigned short)(a << 8);
    c |= (unsigned short)(b << 0);

    printf("c is 0x%04X\n", c);
}

I ported it to this Go code.

`main.go`


package main

import "fmt"

func main() {
    var a uint8
    var b uint8
    var c uint16

    a = 0x12
    b = 0x34
    c = 0x0000

    c |= uint16(a << 8)
    c |= uint16(b << 0)

    fmt.Printf("c is 0x%04X\n", c)
}

This is a common idiom for combining the bits of two variables into one value.

"Okay, I wrote the casts the same way as in C. No matter how you look at it, it is exactly the same code, so it should work!!"

I ran it.

$ gcc -o main.exe main.c
$ main.exe
c is 0x1234

$ go build -o main.exe
$ main.exe
c is 0x0034

"Oh ... the results are different ... ??"

"The original C source has the casts written properly, and Go should complain in the first place if the types don't match or a cast is missing ... I don't get it, I don't get it ..."

Looking back, it isn't surprising at all (although the real program was more complex). I scratched my head over this for about 30 minutes.

"Hmm ... ?? This is ..."

After 30 minutes of head-scratching (actually, of desperate debugging) ...

"Oh, this `a << 8` in C is promoted to int." "The C program works because the operands of integer operations are implicitly promoted to int ..."

"But Go has no such implicit behavior ..."

`main.c`


    c |= (unsigned short)(a << 8);
    c |= (unsigned short)(b << 0);

That's right. Because of the era in which C was born, and because its standard leaves a lot to the CPU and the implementation, C relies heavily on "undefined behavior", implicit type conversion, integer promotion, and so on.

In the end, the code above is simply wrong in Go. A trivial assumption on my part delayed me from realizing why the Go code didn't work.

So the fix is that, in Go, the widening conversion has to be written explicitly, like this:

`main.go`


    c |= uint16(a) << 8 // convert the 8-bit variable to a 16-bit type before `<<` is evaluated
    c |= uint16(b) << 0

The two programs now produce the same result.

$ gcc -o main.exe main.c
$ main.exe
c is 0x1234

$ go build -o main.exe
$ main.exe
c is 0x1234
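
To see the difference in isolation, here is a minimal sketch that puts the two orderings side by side. The function names `shiftThenWiden` and `widenThenShift` are made up for this illustration, and the shift count is passed as a parameter only to keep the two variants symmetric.

```go
package main

import "fmt"

// shiftThenWiden mirrors the first (broken) port: the shift is evaluated
// in uint8, so bits pushed past bit 7 are discarded before the conversion.
func shiftThenWiden(v uint8, n uint) uint16 {
	return uint16(v << n)
}

// widenThenShift mirrors the fix: the value is converted to uint16 first,
// so the shift is evaluated in 16 bits and the high byte survives.
func widenThenShift(v uint8, n uint) uint16 {
	return uint16(v) << n
}

func main() {
	var a, b uint8 = 0x12, 0x34

	broken := shiftThenWiden(a, 8) | shiftThenWiden(b, 0)
	fixed := widenThenShift(a, 8) | widenThenShift(b, 0)

	fmt.Printf("shift then widen: 0x%04X\n", broken) // 0x0034
	fmt.Printf("widen then shift: 0x%04X\n", fixed)  // 0x1234
}
```

The only difference between the two functions is where the closing parenthesis of the conversion sits, which is why the bug is so easy to overlook when porting.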

Check the language specifications

Up to this point I had been working from memory. But what do the language specifications of each language actually say?

So let's look at how the specifications of the languages I actually used (C and Go) define the behavior for this kind of operation.

The following is the specification of the shift operators, quoted from ISO/IEC 9899 (the international standard for the C language: C99).

# About the bit shift operation
6.5.7 Bitwise shift operators
(Omission)
Semantics
3 The integer promotions are performed on each of the operands. The type of the result is
that of the promoted left operand. If the value of the right operand is negative or is
greater than or equal to the width of the promoted left operand, the behavior is undefined.

(Translated by the author)
If the value of the right operand is negative, or is greater than or equal to
the bit width of the "promoted" left operand, the behavior is undefined.

# About integer promotion
If an int can represent all values of the original type, the value is converted to an int;
otherwise, it is converted to an unsigned int. These are called the integer
promotions.

(Translated by the author)
If int can represent all values of the original type, the value is converted to an int.
Otherwise, it is converted to an unsigned int. These are called the "integer promotions".
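
What C does implicitly here can be written out step by step in Go, which may make the promotion easier to see. This is only a sketch: `cStyleShift` is a hypothetical helper, not something taken from either specification. Go's `int` is at least 32 bits wide, so it can stand in for C's `int` in this example.

```go
package main

import "fmt"

// cStyleShift spells out, one step per line, what the C expression
// (unsigned short)(v << n) does for an unsigned char v.
func cStyleShift(v uint8, n uint) uint16 {
	promoted := int(v)       // integer promotion: unsigned char -> int
	shifted := promoted << n // the shift is evaluated in int
	return uint16(shifted)   // the explicit (unsigned short) cast
}

func main() {
	var a, b uint8 = 0x12, 0x34

	c := cStyleShift(a, 8) | cStyleShift(b, 0)
	fmt.Printf("c is 0x%04X\n", c) // c is 0x1234
}
```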

The following is the specification of integer operations (which covers the shift operation), quoted from the Go language specification.

Integer overflow
For unsigned integer values, the operations +, -, *, and << are computed modulo 2^n,
where n is the bit width of the unsigned integer's type. Loosely speaking,
these unsigned integer operations discard high bits upon overflow, and programs may rely on "wrap around".

(Translated by the author)
(Omission) Unsigned integer operations discard the high-order bits on overflow and "wrap around".
(For example, for 8-bit integers, 0xFF + 0x01 = 0x00.)
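
The wrap-around rule can be checked with a few lines of Go. The sketch below uses two hypothetical helpers, `addWrap` and `shlWrap`, which just wrap the `+` and `<<` operators on `uint8` so that the operands are not compile-time constants.

```go
package main

import "fmt"

// addWrap adds two uint8 values; the result is computed modulo 2^8.
func addWrap(x, y uint8) uint8 {
	return x + y
}

// shlWrap shifts a uint8 left; bits shifted past bit 7 are discarded.
func shlWrap(x uint8, n uint) uint8 {
	return x << n
}

func main() {
	fmt.Printf("0xFF + 0x01 = 0x%02X\n", addWrap(0xFF, 0x01)) // 0x00
	fmt.Printf("0x81 << 1   = 0x%02X\n", shlWrap(0x81, 1))    // 0x02
	fmt.Printf("0x12 << 8   = 0x%02X\n", shlWrap(0x12, 8))    // 0x00
}
```

The last line is the case from this article: shifting an 8-bit value left by 8 leaves nothing in the low 8 bits, so the later conversion to `uint16` has only zero to work with.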

To recap: in a shift operation, C implicitly promotes the operands to int, whereas Go performs no implicit promotion and explicitly defines the result as computed within the width of the operand's type.

Summary

Go is sometimes called a "better C", but as this example shows, the actual behavior is not always apparent from the syntax. In such cases, when the primary documentation is available in an easy-to-read form, I felt that consulting the language specification is a quick and reliable way to reach a deeper understanding of the language.

References: ISO/IEC 9899:TC3 (open-std.org); The Go Programming Language Specification