This is the second article in my series on reading the CRuby implementation.
This time I wanted to read the code around rb_intern, which defines symbols, but after digging through a few functions
I got confused: RString (the string type) is used in two distinct ways, as a "fake" RString and as a real RString.
Looking at the surrounding code, it seemed I could follow it once I understood how a Ruby object is initialized after it has been allocated.
So, as a stepping stone to the next article, I will read rb_newobj_of,
which appears to allocate a new object.
Since my goal is to learn the initial state of an object, I will put the details of GC and allocation on hold.
rb_newobj_of
rb_newobj_of is defined in gc.c.
VALUE
rb_newobj_of(VALUE klass, VALUE flags)
{
return newobj_of(klass, flags, 0, 0, 0);
}
The real work apparently happens in newobj_of.
Reading the callers shows that klass is a constant representing the object's class,
and flags carries various object-related settings packed as bit flags.
Incidentally, every argument and the return value have type VALUE. Since it can hold bit flags, it must be some unsigned integer type, but what exactly is it? Also, the three 0 arguments in the call are magic numbers; what do they mean?
I looked into each of these.
What is the VALUE type?
Searching with grep, I found the typedef in include/ruby/ruby.h.
#if defined HAVE_UINTPTR_T && 0
typedef uintptr_t VALUE;
typedef uintptr_t ID;
# define SIGNED_VALUE intptr_t
# define SIZEOF_VALUE SIZEOF_UINTPTR_T
# undef PRI_VALUE_PREFIX
#elif SIZEOF_LONG == SIZEOF_VOIDP
typedef unsigned long VALUE;
typedef unsigned long ID;
# define SIGNED_VALUE long
# define SIZEOF_VALUE SIZEOF_LONG
# define PRI_VALUE_PREFIX "l"
#elif SIZEOF_LONG_LONG == SIZEOF_VOIDP
typedef unsigned LONG_LONG VALUE;
typedef unsigned LONG_LONG ID;
# define SIGNED_VALUE LONG_LONG
# define LONG_LONG_VALUE 1
# define SIZEOF_VALUE SIZEOF_LONG_LONG
# define PRI_VALUE_PREFIX PRI_LL_PREFIX
#else
# error ---->> ruby requires sizeof(void*) == sizeof(long) or sizeof(LONG_LONG) to be compiled. <<----
#endif
It is an unsigned integer type whose size matches the pointer size of the platform. Why use VALUE instead of a plain pointer or integer?
Grepping for uses of VALUE reveals the following.
--It is basically used as a pointer to a Ruby object.
--Sometimes it holds an integer or bit flags instead of a pointer.
--The two are told apart by the least significant bit: 0 means a pointer, FIXNUM_FLAG (= 1) means an integer.
--See also the INT2FIX macro in include/ruby/ruby.h.
This is probably a memory-saving mechanism: a small integer can be stored directly in the VALUE itself instead of in a separately allocated object. Knowing that VALUE is basically a pointer is enough for now, so I will not dig any further into it.
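To make the tagging idea concrete, here is a minimal sketch of how one pointer-sized integer can carry either an aligned pointer or a small integer, told apart by its lowest bit. The names value_t, TAG_FIXNUM, int2fix, fix2long and is_fixnum are hypothetical; I believe CRuby's INT2FIX works along these lines, but this is a simplified model, not its real definition. It assumes two's-complement integers and an arithmetic right shift, as on GCC and Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names: a simplified model of tagging, not CRuby's real macros. */
typedef uintptr_t value_t;          /* pointer-sized unsigned integer, like VALUE */
#define TAG_FIXNUM ((value_t)1)     /* low bit 1 marks an immediate integer */

/* Encode a small integer: shift left one bit and set the tag bit. */
static value_t int2fix(long i) {
    return ((value_t)i << 1) | TAG_FIXNUM;
}

/* Decode: shifting right through a signed type restores the value
 * (assumes an arithmetic right shift, as on GCC/Clang). */
static long fix2long(value_t v) {
    assert(v & TAG_FIXNUM);         /* only valid for tagged integers */
    return (long)((intptr_t)v >> 1);
}

/* Heap objects are aligned, so a real pointer always has its low bit clear. */
static int is_fixnum(value_t v) {
    return (v & TAG_FIXNUM) != 0;
}
```

For example, int2fix(1) yields 3 (binary 11) and fix2long turns it back into 1, while an aligned address such as 8 has its low bit clear and is therefore treated as a pointer.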
newobj_of
Like rb_newobj_of, it is defined in gc.c.
static VALUE
newobj_of(VALUE klass, VALUE flags, VALUE v1, VALUE v2, VALUE v3)
{
:
<<Omission>>
:
:
return obj;
}
The three 0 arguments passed by rb_newobj_of end up in v1 through v3.
Let's read it from the top.
static VALUE
newobj_of(VALUE klass, VALUE flags, VALUE v1, VALUE v2, VALUE v3)
{
rb_objspace_t *objspace = &rb_objspace;
VALUE obj;
if (UNLIKELY(during_gc || ruby_gc_stressful)) {
if (during_gc) {
dont_gc = 1;
during_gc = 0;
rb_bug("object allocation during garbage collection phase");
}
if (ruby_gc_stressful) {
if (!garbage_collect(objspace, FALSE, FALSE, FALSE, GPR_FLAG_NEWOBJ)) {
rb_memerror();
}
}
}
:
<<Omission>>
Oh, UNLIKELY.
Because the condition is wrapped in UNLIKELY, this branch is basically a path that is not normally taken, so let's skip it for now.
(Judging from the condition, it seems to handle allocation while GC is running, or when GC stress mode is enabled.)
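LIKELY and UNLIKELY are branch-prediction hints. A common way to define such macros, and as far as I know roughly what CRuby does when the compiler supports it, is GCC's __builtin_expect. The sketch below uses hypothetical names (MY_LIKELY, MY_UNLIKELY, checked_div) and falls back to plain conditions on compilers without the builtin.

```c
#include <assert.h>

/* Hypothetical macro names; __builtin_expect is available in GCC and Clang. */
#if defined(__GNUC__)
#define MY_LIKELY(x)   (__builtin_expect(!!(x), 1))
#define MY_UNLIKELY(x) (__builtin_expect(!!(x), 0))
#else
#define MY_LIKELY(x)   (x)
#define MY_UNLIKELY(x) (x)
#endif

/* The hint only affects code layout; the condition's truth value is unchanged. */
static int checked_div(int a, int b, int *out) {
    if (MY_UNLIKELY(b == 0)) {
        return 0;               /* rare error path, moved out of the hot path */
    }
    *out = a / b;
    return 1;
}
```

The hint never changes which branch runs; it only tells the compiler which side to optimize for, which is why reading UNLIKELY as "the rare path" is a safe shortcut when skimming.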
objspace and obj are declared at the start of the function.
rb_objspace is defined in gc.c as follows.
#if defined(ENABLE_VM_OBJSPACE) && ENABLE_VM_OBJSPACE
#define rb_objspace (*GET_VM()->objspace)
#else
static rb_objspace_t rb_objspace = {{GC_MALLOC_LIMIT_MIN}};
#endif
As the name implies, it seems to be the space where objects are placed. Since our goal is the initial state, there is no need to dig deeper here.
Let's move on to the middle of the newobj_of function.
static VALUE
newobj_of(VALUE klass, VALUE flags, VALUE v1, VALUE v2, VALUE v3)
{
<<Omission>>
:
obj = heap_get_freeobj(objspace, heap_eden);
if (RGENGC_CHECK_MODE > 0) assert(BUILTIN_TYPE(obj) == T_NONE);
/* OBJSETUP */
RBASIC(obj)->flags = flags & ~FL_WB_PROTECTED;
RBASIC_SET_CLASS_RAW(obj, klass);
if (rb_safe_level() >= 3) FL_SET((obj), FL_TAINT);
RANY(obj)->as.values.v1 = v1;
RANY(obj)->as.values.v2 = v2;
RANY(obj)->as.values.v3 = v3;
<<Omission>>
}
This seems to take a free object (?) with the heap_get_freeobj function and store it in obj, declared earlier. It then sets flags on obj after passing it through the RBASIC macro.
Grepping for FL_WB_PROTECTED finds
#define FL_WB_PROTECTED (((VALUE) 1) << 5)
so it is a bit flag using bit 5 (counting from bit 0).
Because flags is ANDed with the complement of FL_WB_PROTECTED, bit 5 is always cleared, even if the caller sets it in the flags argument of newobj_of.
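Here is a minimal sketch of that masking. value_t and MY_FL_WB_PROTECTED are hypothetical stand-ins for VALUE and FL_WB_PROTECTED (same bit position), and initial_flags is an illustrative helper, not a CRuby function.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t value_t;                      /* stand-in for VALUE */
#define MY_FL_WB_PROTECTED (((value_t)1) << 5)  /* same bit as FL_WB_PROTECTED (0x20) */

/* Clear bit 5 regardless of what the caller passed; every other bit is kept. */
static value_t initial_flags(value_t requested) {
    return requested & ~MY_FL_WB_PROTECTED;
}
```

So passing 0x21 yields 0x01 and passing 0xFF yields 0xDF: only bit 5 is dropped.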
I'm curious what FL_WB_PROTECTED actually means, but as far as the initial state goes,
it is enough to know that this bit is never set at initialization.
Wherever the bit matters, the code should refer to it by the flag name or otherwise access bit 5,
so reading further should reveal what FL_WB_PROTECTED means.
So I'll leave it alone for now.
What does the RBASIC macro do? RANY is used in a similar way here,
and in particular the arguments v1 through v3 are assigned through RANY. What effect does that have?
What are RBASIC_SET_CLASS_RAW and FL_SET doing?
And what does it mean for rb_safe_level() to be 3 or higher?
Summary of the questions so far:
--What are RBASIC and RANY doing to obj?
--In particular, what data is accessed through values.v1 and flags?
--What does RBASIC_SET_CLASS_RAW do?
--What does rb_safe_level() >= 3 mean?
Let's look at each in turn.
First, heap_get_freeobj() is defined as follows.
static inline VALUE
heap_get_freeobj(rb_objspace_t *objspace, rb_heap_t *heap)
{
RVALUE *p = heap->freelist;
while (1) {
if (LIKELY(p != NULL)) {
heap->freelist = p->as.free.next;
return (VALUE)p;
}
else {
p = heap_get_freeobj_from_next_freepage(objspace, heap);
}
}
}
Oh, LIKELY this time.
Taking LIKELY as a hint, the normal path simply pops an object off the head of the free list; when the list is empty, it refills it from the next free page and loops.
That isn't central to our goal, so I'll just take it as "memory was allocated" and move on.
In short, heap_get_freeobj simply takes an object from the free list.
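The free-list pop can be sketched as follows. The names slot_t, heap_t, heap_init and heap_get_free are hypothetical; the sketch only models the LIKELY path and returns NULL where CRuby would instead call heap_get_freeobj_from_next_freepage and retry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slot type: a free slot reuses its own storage as the "next"
 * link, much like the free member of RVALUE. */
typedef struct slot {
    struct slot *next;
} slot_t;

typedef struct {
    slot_t *freelist;
} heap_t;

/* Build a free list threaded through an array of slots. */
static void heap_init(heap_t *heap, slot_t *slots, size_t n) {
    heap->freelist = NULL;
    for (size_t i = n; i > 0; i--) {      /* push in reverse so slots[0] is first */
        slots[i - 1].next = heap->freelist;
        heap->freelist = &slots[i - 1];
    }
}

/* Pop the head of the free list; returns NULL when the list is empty. */
static slot_t *heap_get_free(heap_t *heap) {
    slot_t *p = heap->freelist;
    if (p != NULL) {
        heap->freelist = p->next;
    }
    return p;
}
```

Popping is O(1) and needs no extra bookkeeping memory, because the link lives inside the unused slot itself.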
What are RBASIC and RANY doing to obj?
I grepped to see what RBASIC and RANY do to obj,
and found the following two definitions.
#define RBASIC(obj) (R_CAST(RBasic)(obj))
#define RANY(o) ((RVALUE*)(o))
R_CAST, in turn, is defined as follows.
#define R_CAST(st) (struct st*)
In other words, both simply cast obj to a pointer to some struct: RBASIC casts it to a pointer to struct RBasic, and RANY to a pointer to RVALUE.
The RBasic and RVALUE structures are defined as follows.
struct RBasic {
VALUE flags;
const VALUE klass;
};
typedef struct RVALUE {
union {
struct {
VALUE flags; /* always 0 for freed obj */
struct RVALUE *next;
} free;
struct RBasic basic;
struct RObject object;
struct RClass klass;
struct RFloat flonum;
struct RString string;
struct RArray array;
struct RRegexp regexp;
struct RHash hash;
struct RData data;
struct RTypedData typeddata;
struct RStruct rstruct;
struct RBignum bignum;
struct RFile file;
struct RNode node;
struct RMatch match;
struct RRational rational;
struct RComplex complex;
struct RSymbol symbol;
struct {
struct RBasic basic;
VALUE v1;
VALUE v2;
VALUE v3;
} values;
} as;
#if GC_DEBUG
const char *file;
int line;
#endif
} RVALUE;
I'll ignore GC_DEBUG, since it looks like debug-only information.
RVALUE is essentially a union of every object type. When I looked up the types it contains, almost all of their struct definitions begin with a struct RBasic. Since all members of a union start at the same address, RBasic appears to serve as the common header for managing every object.
RBasic holds flags, so this common header is where every object's flags live;
the assignment to flags in the original code writes the flags into this header.
Also, each type in RVALUE is kept small, so that after the header its body fits in at most three pointers (= three VALUEs).
Therefore, writing v1 through v3 of RVALUE's values member
appears to initialize the body of any object type.
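The layout idea, a union whose members all begin with the same header plus a generic "header + three words" view, can be sketched like this. The struct names are hypothetical and heavily trimmed; real RVALUE members look different. It initializes a slot through the generic values view and then reads it back as a string-shaped object.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t value_t;                       /* stand-in for VALUE */

/* Hypothetical, trimmed-down model of RVALUE: every member starts with the header. */
struct basic { value_t flags; value_t klass; };  /* common header, like RBasic */
struct my_string { struct basic basic; value_t len; value_t ptr; value_t capa; };
struct my_float  { struct basic basic; double value; };

typedef struct {
    union {
        struct basic basic;
        struct my_string string;
        struct my_float flonum;
        struct { struct basic basic; value_t v1, v2, v3; } values;
    } as;
} slot_t;

/* Write the header plus three body words through the generic "values" view. */
static void init_slot(slot_t *s, value_t flags, value_t klass) {
    s->as.values.basic.flags = flags;
    s->as.values.basic.klass = klass;
    s->as.values.v1 = 0;
    s->as.values.v2 = 0;
    s->as.values.v3 = 0;
}
```

Because every member shares the header at offset 0, code that only needs flags or klass can use as.basic no matter which type the slot holds, and zeroing v1 through v3 clears the body of any type that fits in three words.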
What does RBASIC_SET_CLASS_RAW do?
Grepping finds the following.
#define RBASIC_SET_CLASS_RAW(obj, cls) (((struct RBasicRaw *)((VALUE)(obj)))->klass = (cls))
Grepping for the definition of struct RBasicRaw as well gives:
struct RBasicRaw {
VALUE flags;
VALUE klass;
};
It is very similar to RBasic; comparing the two, the only difference is that klass is not const here.
Apparently klass should not normally be modified:
it is protected with const, and code that really needs to change it
uses RBASIC_SET_CLASS_RAW, which casts obj to the non-const RBasicRaw.
Recall that klass holds a value indicating the object's class.
So, in the end, RBASIC_SET_CLASS_RAW can be thought of as a macro that records cls as the class of obj.
What does rb_safe_level() >= 3 mean?
Finally, let's read rb_safe_level().
Grepping finds it near the top of safe.c.
int
rb_safe_level(void)
{
return GET_THREAD()->safe_level;
}
GET_THREAD looks like a macro, but it is actually an inline function:
static inline rb_thread_t *
GET_THREAD(void)
{
rb_thread_t *th = ruby_current_thread;
#if OPT_CALL_CFUNC_WITHOUT_FRAME
if (UNLIKELY(th->passed_ci != 0)) {
void vm_call_cfunc_push_frame(rb_thread_t *th);
vm_call_cfunc_push_frame(th);
}
#endif
return th;
}
So it apparently returns the current thread object.
Having looked this far, I realized: "Oh, isn't this the safe level ($SAFE) of Ruby's security model?"
In that model each thread has its own safe level, which fits naturally with safe_level being a field of the thread.
I decided to stop digging here and instead check whether this understanding is enough to interpret the original code.
Here is the middle of newobj_of again.
static VALUE
newobj_of(VALUE klass, VALUE flags, VALUE v1, VALUE v2, VALUE v3)
{
<<Omission>>
:
obj = heap_get_freeobj(objspace, heap_eden);
if (RGENGC_CHECK_MODE > 0) assert(BUILTIN_TYPE(obj) == T_NONE);
/* OBJSETUP */
RBASIC(obj)->flags = flags & ~FL_WB_PROTECTED;
RBASIC_SET_CLASS_RAW(obj, klass);
if (rb_safe_level() >= 3) FL_SET((obj), FL_TAINT);
RANY(obj)->as.values.v1 = v1;
RANY(obj)->as.values.v2 = v2;
RANY(obj)->as.values.v3 = v3;
<<Omission>>
}
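The meaning of FL_TAINT is then easy to guess: at safe level 3 or higher, a newly created object is marked as tainted.
With the understanding so far, this part can now be read as: take a free slot, set its flags (with FL_WB_PROTECTED cleared) and its class, taint it if the safe level is 3 or higher, and store v1 through v3 in its body.
I see. I feel I now have a good enough grasp of the initialization.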
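The sequence just described can be sketched as a self-contained C function. Everything here is hypothetical: objsetup, slot_t and the MY_ flags are illustrative stand-ins, only bit 5 for the write-barrier flag comes from the source, the taint bit position is chosen arbitrarily, and safe_level is passed as a parameter rather than read from the current thread.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t value_t;                        /* stand-in for VALUE */

/* Hypothetical flag bits; only the WB_PROTECTED position (bit 5) comes from the source. */
#define MY_FL_WB_PROTECTED (((value_t)1) << 5)
#define MY_FL_TAINT        (((value_t)1) << 8)   /* bit position chosen for illustration */

struct basic { value_t flags; value_t klass; };
typedef struct {
    union {
        struct basic basic;
        struct { struct basic basic; value_t v1, v2, v3; } values;
    } as;
} slot_t;

/* Mirrors the OBJSETUP part of newobj_of; all writes go through one union member. */
static void objsetup(slot_t *obj, value_t klass, value_t flags, int safe_level,
                     value_t v1, value_t v2, value_t v3) {
    obj->as.values.basic.flags = flags & ~MY_FL_WB_PROTECTED;  /* like RBASIC(obj)->flags = ... */
    obj->as.values.basic.klass = klass;                        /* like RBASIC_SET_CLASS_RAW */
    if (safe_level >= 3) {
        obj->as.values.basic.flags |= MY_FL_TAINT;             /* like FL_SET(obj, FL_TAINT) */
    }
    obj->as.values.v1 = v1;                                    /* like RANY(obj)->as.values.v1 */
    obj->as.values.v2 = v2;
    obj->as.values.v3 = v3;
}
```

With v1 through v3 all 0, as rb_newobj_of passes them, objsetup(&o, klass, 0x25, 0, 0, 0, 0) leaves flags at 0x05, klass as given and a zeroed body; passing safe level 3 additionally sets the taint bit.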
Looking at the rest of the function, I found assertion and debugging helpers and the code that registers the object with the garbage collector, none of which seemed to determine the object's initial state.
Assuming nothing else touches the object, this is enough to understand its initial state.
The initial state of an object allocated by rb_newobj_of is as follows.
--Header: flags are the specified flags with FL_WB_PROTECTED cleared, and klass is the specified class.
--However, if the safe level is 3 or higher, FL_TAINT is also set.
--Body: v1 through v3 are all 0.
This article turned out quite long. Next time I'd like to pick a part that looks easier and keep it shorter.
The CRuby source read in this article is the trunk branch of ruby/ruby on GitHub, at commit c1b05c53b795fdb1137819bc2973d591af2712d0. In future articles I will always read the latest version.