CRuby code reading (2): rb_newobj_of

About this theme

This is a series of articles in which I read the CRuby implementation. This is the second installment.

This time I wanted to read around rb_intern, which defines symbols, but after digging through a few functions I got confused: the code distinguishes between a fake RString and a real RString, both related to RString (strings). Looking at the surrounding code, it seemed that I could follow it once I understood how a Ruby object is initialized after it is allocated. So, as groundwork for next time, I will read rb_newobj_of, which appears to allocate a new object.

Because the goal is to learn the initial state of an object, I will put the details of GC and allocation on hold.

rb_newobj_of

rb_newobj_of is defined in gc.c.

VALUE
rb_newobj_of(VALUE klass, VALUE flags)
{
    return newobj_of(klass, flags, 0, 0, 0);
}

Apparently the real work is done by newobj_of. Looking at the call sites, klass is a value that identifies the object's class, and flags carries various object-related flags as a bit field.

By the way, every argument and the return value have type VALUE. Since it can hold bit flags it must be some unsigned integer type, but what exactly is it? And the three 0s passed in the call are magic numbers; what do they mean?

I looked into each of these in turn.

VALUE type

What is the VALUE type?

I searched with grep and found a typedef in include/ruby/ruby.h.

#if defined HAVE_UINTPTR_T && 0
typedef uintptr_t VALUE;
typedef uintptr_t ID;
# define SIGNED_VALUE intptr_t
# define SIZEOF_VALUE SIZEOF_UINTPTR_T
# undef PRI_VALUE_PREFIX
#elif SIZEOF_LONG == SIZEOF_VOIDP
typedef unsigned long VALUE;
typedef unsigned long ID;
# define SIGNED_VALUE long
# define SIZEOF_VALUE SIZEOF_LONG
# define PRI_VALUE_PREFIX "l"
#elif SIZEOF_LONG_LONG == SIZEOF_VOIDP
typedef unsigned LONG_LONG VALUE;
typedef unsigned LONG_LONG ID;
# define SIGNED_VALUE LONG_LONG
# define LONG_LONG_VALUE 1
# define SIZEOF_VALUE SIZEOF_LONG_LONG
# define PRI_VALUE_PREFIX PRI_LL_PREFIX
#else
# error ---->> ruby requires sizeof(void*) == sizeof(long) or sizeof(LONG_LONG) to be compiled. <<----
#endif

It looks like an unsigned integer type whose size matches the pointer size of the environment. Why use VALUE instead of a plain pointer or integer type?

Grepping for uses of VALUE shows the following.

- It is basically used as a pointer to a Ruby object.
- Sometimes it holds an integer or bit flags instead of a pointer.
- The two are told apart by the least significant bit: 0 for a pointer, FIXNUM_FLAG (= 1) for an integer.
- See also the INT2FIX macro in include/ruby/ruby.h.

This is probably a mechanism for improving memory efficiency. Knowing that VALUE is basically a pointer is enough for now, so I decided not to research VALUE any further.
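
To make the low-bit distinction concrete, here is a minimal sketch of the tagging idea. It is not CRuby's code: demo_value, DEMO_FIXNUM_FLAG, and the three helper functions are hypothetical names chosen for illustration, and the sketch assumes a two's-complement machine on which object addresses are at least 2-byte aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for VALUE and the Fixnum tag bit. */
typedef uintptr_t demo_value;
#define DEMO_FIXNUM_FLAG ((demo_value)1)

/* Pack a small integer into a tagged word: shift left, then set the low bit. */
static demo_value demo_int2fix(intptr_t n)
{
    return ((demo_value)n << 1) | DEMO_FIXNUM_FLAG;
}

/* A word is treated as an integer when its least significant bit is 1. */
static int demo_is_fixnum(demo_value v)
{
    return (v & DEMO_FIXNUM_FLAG) != 0;
}

/* Recover the integer: drop the tag bit, then halve.
 * Division is used instead of a right shift so that negative
 * values do not depend on implementation-defined shifting. */
static intptr_t demo_fix2int(demo_value v)
{
    return (intptr_t)(v - DEMO_FIXNUM_FLAG) / 2;
}
```

Because aligned addresses always have a low bit of 0, a word produced by demo_int2fix can never be mistaken for a pointer, and small integers need no heap slot at all.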

newobj_of

Like rb_newobj_of, it is defined in gc.c.

static VALUE
newobj_of(VALUE klass, VALUE flags, VALUE v1, VALUE v2, VALUE v3)
{
   :
<<Omission>>
   :
   :
    return obj;
}

The three 0s passed by rb_newobj_of end up in v1 to v3.

Let's read from the top.

static VALUE
newobj_of(VALUE klass, VALUE flags, VALUE v1, VALUE v2, VALUE v3)
{
    rb_objspace_t *objspace = &rb_objspace;
    VALUE obj;

    if (UNLIKELY(during_gc || ruby_gc_stressful)) {
	if (during_gc) {
	    dont_gc = 1;
	    during_gc = 0;
	    rb_bug("object allocation during garbage collection phase");
	}

	if (ruby_gc_stressful) {
	    if (!garbage_collect(objspace, FALSE, FALSE, FALSE, GPR_FLAG_NEWOBJ)) {
		rb_memerror();
	    }
	}
    }
   :
<<Omission>>

Oh, UNLIKELY. Since UNLIKELY is specified, this conditional is basically a path that is not taken, so let's skip it for now. (Judging from the condition, it seems to handle behavior during GC execution.)

objspace and obj are declared as local variables. The rb_objspace macro is defined in gc.c.

#if defined(ENABLE_VM_OBJSPACE) && ENABLE_VM_OBJSPACE
#define rb_objspace (*GET_VM()->objspace)
#else
static rb_objspace_t rb_objspace = {{GC_MALLOC_LIMIT_MIN}};
#endif

As far as I can tell, it is the area where objects are placed, as the name implies. Since the goal for now is the initial state, there seems to be no need to dig deeper here.

Let's move on to the middle of the newobj_of function.

static VALUE
newobj_of(VALUE klass, VALUE flags, VALUE v1, VALUE v2, VALUE v3)
{
<<Omission>>
   :
    obj = heap_get_freeobj(objspace, heap_eden);

    if (RGENGC_CHECK_MODE > 0) assert(BUILTIN_TYPE(obj) == T_NONE);

    /* OBJSETUP */
    RBASIC(obj)->flags = flags & ~FL_WB_PROTECTED;
    RBASIC_SET_CLASS_RAW(obj, klass);
    if (rb_safe_level() >= 3) FL_SET((obj), FL_TAINT);
    RANY(obj)->as.values.v1 = v1;
    RANY(obj)->as.values.v2 = v2;
    RANY(obj)->as.values.v3 = v3;
<<Omission>>
}

It reads as if heap_get_freeobj fetches a free object (?) and stores it in the obj declared earlier. After that, flags is set on the result of passing obj through the RBASIC macro.

Grepping for FL_WB_PROTECTED turns up #define FL_WB_PROTECTED (((VALUE)1)<<5), so it is a bit flag that uses bit 5 (counting from 0).

Because flags is ANDed with the bitwise complement of this flag, bit 5 is always cleared, even if it is set in the flags argument of newobj_of.
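
The masking step can be tried in isolation with a small sketch. The names demo_value, DEMO_FL_WB_PROTECTED, and demo_strip_wb_protected are hypothetical; only the bit position is taken from the definition quoted above.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t demo_value;

/* Same bit position as the FL_WB_PROTECTED definition quoted above. */
#define DEMO_FL_WB_PROTECTED (((demo_value)1) << 5)

/* Clear bit 5 and leave every other bit exactly as the caller passed it. */
static demo_value demo_strip_wb_protected(demo_value flags)
{
    return flags & ~DEMO_FL_WB_PROTECTED;
}
```

Passing 0x25 (bits 0, 2, and 5 set) yields 0x05, while a value that never had bit 5 set comes back unchanged.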

I am curious what FL_WB_PROTECTED actually is, but as far as the initial value goes, it is enough to know that this bit is not set at initialization time. Wherever it is used, bit 5 will be accessed through this flag name or something similar. (If anything, I should come to understand the meaning of FL_WB_PROTECTED as I keep reading.)

So I will not worry about it for now.

What is the RBASIC macro? RANY is used here in a similar way, and it is through RANY that the arguments v1 to v3 are assigned. What effect does that have?

What are RBASIC_SET_CLASS_RAW and FL_SET doing? And in what situation is rb_safe_level() 3 or higher?

Summary of the questions so far:

  1. What is heap_get_freeobj doing?
  2. What are RBASIC and RANY doing to obj? In particular, what data is being accessed through values.v1 or flags?
  3. What does RBASIC_SET_CLASS_RAW do?
  4. In what situation is rb_safe_level() >= 3?

Let's take a look.

  1. What is heap_get_freeobj doing?

The definition of heap_get_freeobj () is as follows.

static inline VALUE
heap_get_freeobj(rb_objspace_t *objspace, rb_heap_t *heap)
{
    RVALUE *p = heap->freelist;

    while (1) {
	if (LIKELY(p != NULL)) {
	    heap->freelist = p->as.free.next;
	    return (VALUE)p;
	}
	else {
	    p = heap_get_freeobj_from_next_freepage(objspace, heap);
	}
    }
}

Oh, it's LIKELY. Taking LIKELY as a hint about the common path, the function simply takes an object off the free list. This does not seem relevant to my goal, so I just noted that memory has been obtained at this point and moved on.

In short, heap_get_freeobj simply takes an object from the free list.
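
A free list of this kind can be sketched in a few lines. This is a simplified illustration, not the gc.c implementation: demo_slot, demo_heap, demo_heap_init, and demo_get_freeobj are hypothetical names, and where the quoted function calls heap_get_freeobj_from_next_freepage when the list runs out, the sketch simply returns NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slot type: a free slot only needs a link to the next one. */
struct demo_slot {
    struct demo_slot *next;
};

struct demo_heap {
    struct demo_slot *freelist;   /* head of the chain of unused slots */
};

/* Chain an array of slots into a free list, first element at the head. */
static void demo_heap_init(struct demo_heap *heap, struct demo_slot *slots, size_t n)
{
    heap->freelist = NULL;
    while (n > 0) {
        n--;
        slots[n].next = heap->freelist;
        heap->freelist = &slots[n];
    }
}

/* Pop the head of the free list; returns NULL when the list is empty. */
static struct demo_slot *demo_get_freeobj(struct demo_heap *heap)
{
    struct demo_slot *p = heap->freelist;
    if (p != NULL) {
        heap->freelist = p->next;
    }
    return p;
}
```

Taking a slot is a constant-time pointer swap, which fits the LIKELY hint on the non-empty branch in the quoted code.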

  2. What are RBASIC and RANY doing to obj?

I grepped for RBASIC and RANY to see what they do to obj, and found the following two definitions.

#define RBASIC(obj)  (R_CAST(RBasic)(obj))
#define RANY(o) ((RVALUE*)(o))

R_CAST is defined as follows.

#define R_CAST(st)   (struct st*)

In other words, both cast obj to a pointer to some structure: the RBASIC macro to a pointer to an RBasic structure, and RANY to a pointer to an RVALUE structure.

The RBasic structure and the RVALUE structure were as follows, respectively.

struct RBasic {
    VALUE flags;
    const VALUE klass;
};

typedef struct RVALUE {
    union {
	struct {
	    VALUE flags;		/* always 0 for freed obj */
	    struct RVALUE *next;
	} free;
	struct RBasic  basic;
	struct RObject object;
	struct RClass  klass;
	struct RFloat  flonum;
	struct RString string;
	struct RArray  array;
	struct RRegexp regexp;
	struct RHash   hash;
	struct RData   data;
	struct RTypedData   typeddata;
	struct RStruct rstruct;
	struct RBignum bignum;
	struct RFile   file;
	struct RNode   node;
	struct RMatch  match;
	struct RRational rational;
	struct RComplex complex;
	struct RSymbol symbol;
	struct {
	    struct RBasic basic;
	    VALUE v1;
	    VALUE v2;
	    VALUE v3;
	} values;
    } as;
#if GC_DEBUG
    const char *file;
    int line;
#endif
} RVALUE;

Ignore GC_DEBUG as it seems to be debug information.

RBasic is the leading part of RVALUE, and RVALUE appears to be a union that covers every object type. When I checked, the structure definition of each type in RVALUE basically starts with an RBasic. So RBasic seems to serve as a common management header for all objects.

Since flags lives in RBasic, the header shared by all objects, the assignment to flags in the original code is an assignment to this header.

Also, each type in RVALUE appears to be kept small: the part after the header fits within three pointers (= 3 VALUEs).

Therefore, by writing v1 through v3 in the values member of RVALUE, the body of any object can be initialized uniformly.
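
The layout idea (a shared header followed by at most three words, with a generic view over the body) can be shown with a miniature union. All names here (demo_basic, demo_string, demo_pair, demo_rvalue, demo_clear_body) are hypothetical, and the member lists are invented for illustration rather than copied from CRuby's structures.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t demo_value;

/* Every variant starts with the same two-word header. */
struct demo_basic  { demo_value flags; demo_value klass; };
struct demo_string { struct demo_basic basic; demo_value len; demo_value ptr; demo_value capa; };
struct demo_pair   { struct demo_basic basic; demo_value first; demo_value second; };

union demo_rvalue {
    struct demo_basic  basic;
    struct demo_string string;
    struct demo_pair   pair;
    struct {
        struct demo_basic basic;
        demo_value v1, v2, v3;
    } values;                      /* generic view: header plus three words */
};

/* Zero the three body words through the generic view, whatever the type. */
static void demo_clear_body(union demo_rvalue *slot)
{
    slot->values.v1 = 0;
    slot->values.v2 = 0;
    slot->values.v3 = 0;
}
```

Since no variant is larger than the header plus three words, the union is exactly five words wide, and clearing values.v1 to values.v3 resets the body of a string, a pair, or any other variant without knowing which one it is.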

  3. What does RBASIC_SET_CLASS_RAW do?

You can find the following by grep.

#define RBASIC_SET_CLASS_RAW(obj, cls) (((struct RBasicRaw *)((VALUE)(obj)))->klass = (cls))

Grepping for the definition of the RBasicRaw structure as well gives:

struct RBasicRaw {
    VALUE flags;
    VALUE klass;
};

This is what it looks like. It is very similar to RBasic, but comparing the two, they differ in the const qualifier on klass. Apparently klass should not normally be modified, so it is protected by const, and RBASIC_SET_CLASS_RAW exists for the places that really need to write it.

By the way, klass was the value that identifies the object's class. So, in the end, RBASIC_SET_CLASS_RAW can be thought of as a macro that records cls as the class of the given obj.

  4. In what situation is rb_safe_level() >= 3?

Finally, let's read rb_safe_level(). Grep finds it near the beginning of safe.c.

int
rb_safe_level(void)
{
    return GET_THREAD()->safe_level;
}

And GET_THREAD, which I assumed was a macro, turns out to be an inline function:

static inline rb_thread_t *
GET_THREAD(void)
{
    rb_thread_t *th = ruby_current_thread;
#if OPT_CALL_CFUNC_WITHOUT_FRAME
    if (UNLIKELY(th->passed_ci != 0)) {
	void vm_call_cfunc_push_frame(rb_thread_t *th);
	vm_call_cfunc_push_frame(th);
    }
#endif
    return th;
}

That is how it looks. Does it fetch the current thread object?

Having looked this far, I realized: "Oh, isn't this the safe level from the security model?" In that model each thread has its own safe_level, which fits what the code does.

I decided to stop digging here and instead check whether this understanding is enough to interpret the original code.

Back to the middle of newobj_of

Here is the middle of newobj_of again.

static VALUE
newobj_of(VALUE klass, VALUE flags, VALUE v1, VALUE v2, VALUE v3)
{
<<Omission>>
   :
    obj = heap_get_freeobj(objspace, heap_eden);

    if (RGENGC_CHECK_MODE > 0) assert(BUILTIN_TYPE(obj) == T_NONE);

    /* OBJSETUP */
    RBASIC(obj)->flags = flags & ~FL_WB_PROTECTED;
    RBASIC_SET_CLASS_RAW(obj, klass);
    if (rb_safe_level() >= 3) FL_SET((obj), FL_TAINT);
    RANY(obj)->as.values.v1 = v1;
    RANY(obj)->as.values.v2 = v2;
    RANY(obj)->as.values.v3 = v3;
<<Omission>>
}

The meaning of FL_TAINT is just what the name says.

Reading it with the understanding gained so far, this part:

  1. Acquires a new object.
  2. Sets the specified flags in the object header (with FL_WB_PROTECTED cleared).
  3. Sets the object's class to the specified class.
  4. If the safe level is 3 or higher, sets the taint flag on the object.
  5. Initializes the body of the object (v1 to v3) to the specified initial values. When called through rb_newobj_of, v1 to v3 are all 0.

I see. I feel I now understand the initialization well enough.
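
Steps 2 to 5 of the list above can be condensed into a toy setup function. This is a sketch under stated assumptions, not the gc.c code: demo_obj and demo_setup are hypothetical names, the safe level is passed in as a plain argument instead of being read from the current thread, and the bit position used for DEMO_FL_TAINT is assumed for illustration only.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t demo_value;

#define DEMO_FL_WB_PROTECTED (((demo_value)1) << 5)
#define DEMO_FL_TAINT        (((demo_value)1) << 8)   /* bit position assumed */

/* Flattened toy object: two header words followed by three body words. */
struct demo_obj {
    demo_value flags;
    demo_value klass;
    demo_value v1, v2, v3;
};

/* Initialize an already-acquired slot in the same order as the quoted code. */
static void demo_setup(struct demo_obj *obj, demo_value klass, demo_value flags,
                       int safe_level,
                       demo_value v1, demo_value v2, demo_value v3)
{
    obj->flags = flags & ~DEMO_FL_WB_PROTECTED;        /* step 2 */
    obj->klass = klass;                                /* step 3 */
    if (safe_level >= 3) obj->flags |= DEMO_FL_TAINT;  /* step 4 */
    obj->v1 = v1;                                      /* step 5 */
    obj->v2 = v2;
    obj->v3 = v3;
}
```

Calling demo_setup with three zeros for v1 to v3 mirrors what happens when the object is created through rb_newobj_of.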

Looking at the rest of the function, it consists of assertions, debugging aids, and the processing that registers the object with the garbage collector, none of which seem to determine the object's initial state.

So I concluded that nothing else is needed to understand the initial state of an object.

Conclusion

The initial state of an object allocated by rb_newobj_of is as follows.

- The header holds the specified flags with FL_WB_PROTECTED cleared, and klass as specified.
- However, if the safe level is 3 or higher, FL_TAINT is also set.
- The body of the object is cleared to 0.

About next time

This one turned out quite long. Next time I want to pick a part that looks easy and keep it short.

Other information

The CRuby source read in this article is the trunk branch of ruby/ruby on GitHub. The commit hash at the time of writing is c1b05c53b795fdb1137819bc2973d591af2712d0. In future articles I will always read the latest version.
