Great Swift pointer type commentary

Introduction

Swift has `UnsafePointer<T>` and its companion types for representing pointers. They are used when working with C libraries such as Core Foundation. These pointer APIs are very well thought out, and this article introduces and explains them. I think it will be interesting for C and C++ users. (Swift 3.0.2)

Pointer type

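The pointer types are as follows.

Basic pointer type

- `UnsafePointer<T>`
- `UnsafeMutablePointer<T>`
- `UnsafeRawPointer`
- `UnsafeMutableRawPointer`
- `UnsafeBufferPointer<T>`
- `UnsafeMutableBufferPointer<T>`
- `UnsafeRawBufferPointer`
- `UnsafeMutableRawBufferPointer`

For language bridge

- `OpaquePointer`
- `CVaListPointer`
- `AutoreleasingUnsafeMutablePointer<T>`

There are so many.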

Basic pointer type attributes

The most basic pointer type is `UnsafePointer<T>`. It represents a pointer to a value of type T. The referenced T value is immutable. `UnsafePointer<T>` itself is a struct, so `let` and `var` express whether the pointer itself can be changed.

func f(_ x: UnsafePointer<Int>) {
	let a: UnsafePointer<Int> = x
	var b: UnsafePointer<Int> = x
}

In C, the above would be written as follows.

void f(const int * const x) {
	const int * const a = x;
	const int * b = x;
}

Starting from `UnsafePointer<T>`, adding each of three attributes yields a different type. That gives 2 x 2 x 2 = 8 combinations.

Mutability

There are mutable versions in which the referenced T value can be changed; they have `Mutable` in the name. An immutable pointer can be converted to the mutable version with the initializer labeled `mutating`. Conversely, a mutable pointer can be converted to the immutable version with an unlabeled initializer.

public struct UnsafePointer<Pointee> : Strideable, Hashable {
	...
    public init(_ other: UnsafeMutablePointer<Pointee>)
    ...
}

public struct UnsafeMutablePointer<Pointee> : Strideable, Hashable {
	...
    public init(mutating other: UnsafePointer<Pointee>)
    ...
}
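As a minimal sketch of the two conversions above, the following round-trips a pointer between the mutable and immutable types. The function name `mutabilityRoundTrip` is made up for illustration, and the cleanup calls `deinitialize(count:)` and `deallocate()` assume a recent Swift 5 toolchain rather than the Swift 3.0.2 spellings used elsewhere in this article.

```swift
// Hypothetical helper, not part of the standard library.
func mutabilityRoundTrip() -> Int {
    let storage = UnsafeMutablePointer<Int>.allocate(capacity: 1)
    storage.initialize(to: 10)
    defer {
        storage.deinitialize(count: 1)
        storage.deallocate()   // Swift 5 spelling (assumption)
    }

    // Mutable -> immutable: unlabeled initializer.
    let readOnly: UnsafePointer<Int> = UnsafePointer(storage)

    // Immutable -> mutable: initializer labeled `mutating`.
    let writable = UnsafeMutablePointer<Int>(mutating: readOnly)
    writable.pointee = 20

    // Both pointers refer to the same memory, so the write is visible here.
    return readOnly.pointee
}
```

Note that `init(mutating:)` only changes the type. Whether writing through the result is actually valid still depends on the memory being mutable in the first place.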

With and without type

`UnsafePointer<T>` carries the referenced type T as a type parameter, but there are also versions without this type information. The versions without type information have `Raw` in their names. For example, `const void *` in C appears in Swift as `UnsafeRawPointer`. It is used when the type of the referenced value is unknown.

Apparently, with `UnsafePointer<T>` alone, some code could cause bugs related to strict aliasing, and `UnsafeRawPointer` with memcpy semantics was needed to avoid them.

The details are written in the document from the time of the proposal: UnsafeRawPointer API

For strict aliasing itself, the following article is easy to understand: (Translation) Understand C/C++ Strict Aliasing, or why #$@##@^% the compiler doesn't let me do what I want!

The conversion between typed and untyped will be discussed later.

Pointers and buffers

In C, pointers are often used to represent arrays, but to treat one as an actual array you also need to know the number of elements. `UnsafeBufferPointer<T>` therefore represents an array by pairing the pointer with an element count. There are four `Buffer` types, corresponding to immutable and mutable, typed and untyped.

`UnsafeBufferPointer<T>` takes the start address and the number of elements in its initializer, and it conforms to the `Collection` protocol.

public struct UnsafeBufferPointer<Element> : Indexable, Collection, RandomAccessCollection {
	...
    public init(start: UnsafePointer<Element>?, count: Int)
    ...
    public var baseAddress: UnsafePointer<Element>? { get }
    ...
    public var count: Int { get }
    ...
}
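Because the buffer type is a `Collection`, the usual collection algorithms work on raw memory. The sketch below is only an illustration: `sumOfBuffer` is a made-up name, and `deallocate()` and `deinitialize(count:)` assume a recent Swift 5 toolchain.

```swift
// Hypothetical helper showing a buffer used as a Collection.
func sumOfBuffer() -> Int {
    let count = 4
    let base = UnsafeMutablePointer<Int>.allocate(capacity: count)
    defer { base.deallocate() }                 // runs last
    for i in 0..<count {
        (base + i).initialize(to: (i + 1) * 10) // 10, 20, 30, 40
    }
    defer { base.deinitialize(count: count) }   // runs first

    // Pair the start address with the element count.
    let readOnlyBase: UnsafePointer<Int> = UnsafePointer(base)
    let buffer = UnsafeBufferPointer(start: readOnlyBase, count: count)

    // Collection API: iteration, count, reduce, and so on.
    return buffer.reduce(0, +)
}
```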

Types for bridging

OpaquePointer

`OpaquePointer` represents the C pattern known as an opaque pointer. An opaque pointer is a pointer to a type that has only been forward-declared by name; because the type definition is not visible, there is effectively no information about how to access the referenced value. The pattern is used, for example, to hide a library's internal details from code outside the library. In Swift it never happens that only the name of a type is visible and not its definition, but a representation is still needed when an opaque pointer defined in C is imported into Swift, and this type serves that purpose.

For example, suppose you have the following C source.

struct CatImpl;
struct Cat {
	struct CatImpl * impl;
};
void PassCat(const struct Cat * a);
void PassCatImpl(const struct CatImpl * b);

From Swift, the two functions look like this:

func PassCat(a: UnsafePointer<Cat>?)
func PassCatImpl(b: OpaquePointer?)

`UnsafePointer<T>` and `OpaquePointer` can be converted to each other through their initializers.

public struct UnsafePointer<Pointee> : Strideable, Hashable {
	...
    public init(_ from: OpaquePointer)
    ...
}

public struct OpaquePointer : Hashable {
	...
    public init<T>(_ from: UnsafePointer<T>)
    ...
}
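A small sketch of this round trip follows. No C library is involved here; the memory is allocated in Swift purely so that the example is self-contained. `opaqueRoundTrip` is a made-up name, and the cleanup calls assume a recent Swift 5 toolchain.

```swift
// Hypothetical helper: typed pointer -> OpaquePointer -> typed pointer.
func opaqueRoundTrip() -> Int {
    let storage = UnsafeMutablePointer<Int>.allocate(capacity: 1)
    storage.initialize(to: 42)
    defer {
        storage.deinitialize(count: 1)
        storage.deallocate()
    }

    let typed: UnsafePointer<Int> = UnsafePointer(storage)

    // The pointee type is dropped here.
    let opaque: OpaquePointer = OpaquePointer(typed)

    // The caller has to name the correct type again; nothing checks it.
    let restored: UnsafePointer<Int> = UnsafePointer(opaque)
    return restored.pointee
}
```

Converting back requires the programmer to know the real type, which is exactly the information an opaque pointer does not carry.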

CVaListPointer

Variadic arguments in C use the special `...` notation and the special type `va_list`. `CVaListPointer` is the pointer type for handling a `va_list`.

AutoreleasingUnsafeMutablePointer

I have not investigated this one. It seems to be how a pointer with the Objective-C `__autoreleasing` qualifier appears from Swift, but I do not understand it well.

Optional, pointer and nullability

Swift pointer types are never null. In other words, `UnsafePointer<T>` is a non-null pointer. A nullable pointer is represented with Optional, as `UnsafePointer<T>?`. **This means Swift can handle pointers, too, with null-safety mechanisms such as the `if let` syntax.**

The conversion initializers between `UnsafePointer<T>`, `UnsafeMutablePointer<T>`, and `OpaquePointer` also have Optional versions; if nil is passed as the argument, the initializer returns nil as well.

public struct UnsafePointer<Pointee> : Strideable, Hashable {
	...
    public init?(_ from: OpaquePointer?)
    ...
    public init?(_ other: UnsafeMutablePointer<Pointee>?)
    ...
}

`UnsafeBufferPointer<T>` is nullable by nature. Its initializer takes the pointer as an Optional from the start, and if nil is passed, the `baseAddress` property is nil.
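The nullability rules above can be sketched as follows. The names `describe` and `optionalPointerDemo` are made up for this example, and the cleanup calls assume a recent Swift 5 toolchain.

```swift
// Hypothetical helper: a nullable pointer is just an Optional.
func describe(_ pointer: UnsafePointer<Int>?) -> String {
    if let p = pointer {
        return "value \(p.pointee)"   // p is non-null here
    } else {
        return "null"
    }
}

func optionalPointerDemo() -> [String] {
    let storage = UnsafeMutablePointer<Int>.allocate(capacity: 1)
    storage.initialize(to: 7)
    defer {
        storage.deinitialize(count: 1)
        storage.deallocate()
    }
    let readOnly: UnsafePointer<Int> = UnsafePointer(storage)

    // The Optional conversion initializer passes nil through.
    let nothing: UnsafeMutablePointer<Int>? = nil
    let converted: UnsafePointer<Int>? = UnsafePointer<Int>(nothing)

    // A buffer built from a nil start address has a nil baseAddress.
    let emptyBuffer = UnsafeBufferPointer<Int>(start: nil, count: 0)

    return [
        describe(readOnly),
        describe(converted),
        emptyBuffer.baseAddress == nil ? "nil base" : "non-nil base",
    ]
}
```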

Basic access

The value referenced by an `UnsafePointer<T>` is accessed through the `pointee` property. In C this was written with the dereference operator (`*`) or the arrow operator (`->`). Subscript access is also available, so a contiguously allocated memory area can be accessed like an array.

public struct UnsafePointer<Pointee> : Strideable, Hashable {
	...
    public var pointee: Pointee { get }
    ...
    public subscript(i: Int) -> Pointee { get }
    ...
}

These are writable in the mutable version.

public struct UnsafeMutablePointer<Pointee> : Strideable, Hashable {
	...
    public var pointee: Pointee { get nonmutating set }
	...
    public subscript(i: Int) -> Pointee { get nonmutating set }
	...
}

The untyped Raw versions do not have these properties. Unless a type is supplied, the referenced memory cannot be accessed.
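For the typed pointers, the access described above might look like the sketch below. `accessDemo` is a made-up name, and `initialize(repeating:count:)`, `deinitialize(count:)`, and `deallocate()` are the spellings of a recent Swift 5 toolchain, which is an assumption relative to the Swift 3.0.2 declarations shown in this article.

```swift
// Hypothetical helper contrasting Swift access with the C equivalents.
func accessDemo() -> [Int] {
    let p = UnsafeMutablePointer<Int>.allocate(capacity: 3)
    defer { p.deallocate() }
    p.initialize(repeating: 0, count: 3)
    defer { p.deinitialize(count: 3) }

    p.pointee = 1          // C: *p = 1
    p[1] = 2               // C: p[1] = 2
    (p + 2).pointee = 3    // C: *(p + 2) = 3

    // Reading through the immutable version works the same way.
    let readOnly: UnsafePointer<Int> = UnsafePointer(p)
    return [readOnly.pointee, readOnly[1], readOnly[2]]
}
```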

Three states of the pointer

The memory pointed to by an `UnsafePointer<T>` is in one of three states.

- Unallocated
- Allocated and uninitialized
- Allocated and initialized

The distinction between these three states is an important concept underpinning Swift's pointer types. The type system cannot tell these states apart, so the programmer needs to know exactly which state the pointer at hand is in.

This is not something Swift added, though; it is a concept inherent to pointers. It is explained below.

Memory allocation

Allocated means that the memory area the pointer points to has been reserved. Conversely, unallocated means that the pointer is null or that the memory area it points to has been freed.

Note that the memory area handled by an `UnsafePointer<T>` is not necessarily the size of a single value. It can also be an area allocated contiguously so that multiple elements can be stored.

Memory is allocated with `allocate`, a static method of the mutable version of the pointer.

public struct UnsafeMutablePointer<Pointee> : Strideable, Hashable {
   	...
    public static func allocate(capacity count: Int) -> UnsafeMutablePointer<Pointee>
    ...
}

The argument `count` is the number of elements to allocate contiguously. This method takes care of alignment and stride according to the value type `Pointee`. Alignment is a constraint that the memory address where a value is placed must be a multiple of a given number. For example, if the alignment is 8, the address is always a multiple of 8. It is available as `MemoryLayout<T>.alignment`. Stride is the number of bytes by which the address advances from one value to the next when values are placed contiguously. For example, if a type has a size of 5 bytes but a stride of 8 bytes, 8 bytes are reserved per element, leaving 3 bytes of padding. It is available as `MemoryLayout<T>.stride`. These values are determined by the compiler and cannot be set arbitrarily by the programmer.
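The 5-byte example can be reproduced with a small struct. The `Sample` type and the `layoutDemo` function below are made up for illustration, and `deallocate()` assumes a recent Swift 5 toolchain.

```swift
// Hypothetical type: a 4-byte field followed by a 1-byte field.
struct Sample {
    var a: Int32
    var b: Int8
}

func layoutDemo() -> (size: Int, stride: Int, alignment: Int, byteStep: Int, aligned: Bool) {
    let p = UnsafeMutablePointer<Sample>.allocate(capacity: 2)
    defer { p.deallocate() }

    // Distance in bytes between element 0 and element 1.
    let byteStep = UnsafeRawPointer(p).distance(to: UnsafeRawPointer(p + 1))

    // The allocated address respects the type's alignment.
    let aligned = Int(bitPattern: p) % MemoryLayout<Sample>.alignment == 0

    return (MemoryLayout<Sample>.size,
            MemoryLayout<Sample>.stride,
            MemoryLayout<Sample>.alignment,
            byteStep,
            aligned)
}
```

Here the size is 5, the alignment is 4, and the stride is 8, so advancing the typed pointer by one element moves the address by 8 bytes.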

For the untyped Raw pointer, these values are not known, so the parameters of `allocate` are different.

public struct UnsafeMutableRawPointer : Strideable, Hashable {
    public static func allocate(bytes size: Int, alignedTo: Int) -> UnsafeMutableRawPointer
}

It takes a plain number of bytes and an alignment value. To allocate space for several consecutive elements, you have to compute the size yourself, taking the stride described above into account.
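The size calculation might be sketched like this. Note the assumption: on a recent Swift 5 toolchain the method is spelled `allocate(byteCount:alignment:)`, not the `allocate(bytes:alignedTo:)` of Swift 3.0.2 shown above. `rawAllocationDemo` is a made-up name, and `storeBytes` and `load` are used only to show that the space is usable once a type is supplied at each access.

```swift
// Hypothetical helper: room for three Int32 values in untyped memory.
func rawAllocationDemo() -> (byteCount: Int, second: Int32) {
    let count = 3
    let stride = MemoryLayout<Int32>.stride

    // Size = stride x number of elements.
    let byteCount = stride * count
    let raw = UnsafeMutableRawPointer.allocate(
        byteCount: byteCount,
        alignment: MemoryLayout<Int32>.alignment)
    defer { raw.deallocate() }

    // Write 0, 100, 200 at offsets 0, stride, 2 x stride.
    for i in 0..<count {
        raw.storeBytes(of: Int32(i * 100), toByteOffset: i * stride, as: Int32.self)
    }

    // Read back the element at index 1.
    let second = raw.load(fromByteOffset: stride, as: Int32.self)
    return (byteCount, second)
}
```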

Both return values are non-Optional, because a pointer to an allocated memory area is never null. Also, allocating an immutable memory area would be pointless, so these static methods are defined only on the mutable versions of the types.

To free the allocated memory, call the `deallocate` method.

public struct UnsafeMutablePointer<Pointee> : Strideable, Hashable {
    public func deallocate(capacity: Int)
}

Memory initialization state

Initialized is a concept that indicates whether a value exists in the memory area. A freshly allocated memory area is really just memory; no value exists there yet, so it is uninitialized.

Whether memory is initialized or uninitialized matters when you access what the pointer refers to and **read** or **write** a value.

If you read a value from an uninitialized memory area, the contents of memory are unknown, so you may get a garbage value and risk a crash. I think this is easy to understand. The interesting case is writing.

Consider an ordinary variable declared with `var` in Swift. Suppose there is a `Cat` type (a reference type) and a `CatHouse` type (a value type) that holds it, as shown below.

class Cat {
}

struct CatHouse {
    var cat: Cat?
}

Suppose that an `App` type like the following has a property of type `CatHouse`, which is overwritten in the `update` method.

class App {
    init (a: CatHouse) {
        self.a = a
    }

    var a: CatHouse
    
    func update(b: CatHouse) {
        self.a = b
    }
}

At this point, Swift's ARC increments the reference count of the `cat` held by the `CatHouse` in `b`. But there is one more thing to remember: **the reference count of the `cat` held by the old `CatHouse` that was originally in `a` is decremented**. So in general, when a value is copied in Swift, **the old value that the copy overwrites is destroyed**.
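This destruction of the overwritten value can be observed directly. The sketch below uses its own made-up types (`DeinitLog`, `NamedCat`, `NamedCatHouse`) rather than the `Cat` and `CatHouse` above, so that each cat can record its name when it is destroyed.

```swift
// Hypothetical log object that records which cats have been destroyed.
final class DeinitLog {
    var names: [String] = []
}

final class NamedCat {
    let name: String
    let log: DeinitLog
    init(name: String, log: DeinitLog) {
        self.name = name
        self.log = log
    }
    deinit {
        log.names.append(name)
    }
}

struct NamedCatHouse {
    var cat: NamedCat?
}

func assignmentDemo() -> [String] {
    let log = DeinitLog()
    var a = NamedCatHouse(cat: NamedCat(name: "old", log: log))
    let b = NamedCatHouse(cat: NamedCat(name: "new", log: log))

    // Copying b into a retains "new" and releases "old".
    a = b
    let destroyedSoFar = log.names

    // Keep the new value alive until after the snapshot was taken.
    withExtendedLifetime(a) {}
    return destroyedSoFar
}
```

Only the old cat has been destroyed at the point of the snapshot; the new one is still referenced by both `a` and `b`.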

Now consider writing a value through a pointer. Writing to the memory a pointer refers to is the same as assigning to a variable there, so the original value must be destroyed. But what if the memory has only just been allocated and no value has been written yet? In that state, destroying the original value would be bad, because no value has been written and the memory holds arbitrary contents.

This is why initialized and uninitialized must be distinguished. By convention, the `pointee` property of `UnsafePointer<T>` must only be used once the memory has been initialized. Use the `initialize` method to write to an uninitialized memory area, and the `deinitialize` method to return an initialized memory area to the uninitialized state.

public struct UnsafeMutablePointer<Pointee> : Strideable, Hashable {
	...
    public func initialize(to newValue: Pointee, count: Int = default)
    ...
    public func deinitialize(count: Int = default) -> UnsafeMutableRawPointer
    ...
    public func move() -> Pointee
    ...
}

Since a memory area can hold multiple elements, these methods take a `count` argument. `initialize` fills `count` elements with the value given by the `to` argument. It does not destroy whatever was in the memory area before. Conversely, the `deinitialize` method only destroys values. The `move` method is `deinitialize` for a single element, and it also hands the value back as its return value. It realizes move semantics under exactly the same name as `std::move` in C++.

Let's experiment. First, make `Cat` log its `init` and `deinit`.

class Cat {
    init () {
        print("init")
    }
    deinit {
        print("deinit")
    }
}

Then execute the following function.

func test1() {
    var p = UnsafeMutablePointer<CatHouse>.allocate(capacity: 1)
    defer {
        p.deallocate(capacity: 1)
    }
    p.initialize(to: CatHouse(cat: Cat()))
    p.move()
}

I put `deallocate` in a `defer` block. The output is as follows.

init
deinit

Now, let's make a version that does not move.

func test2() {
    var p = UnsafeMutablePointer<CatHouse>.allocate(capacity: 1)
    defer {
        p.deallocate(capacity: 1)
    }
    p.initialize(to: CatHouse(cat: Cat()))
}

Now `deinit` is no longer called.

init

Although the memory area was freed, the `CatHouse` written into it was never destroyed, so the reference count of the `Cat` it held was never decremented, and the `Cat` has leaked.

Next, try a test that overwrites the old value through `pointee` along the way.

func test3() {
    var p = UnsafeMutablePointer<CatHouse>.allocate(capacity: 1)
    defer {
        p.deallocate(capacity: 1)
    }
    p.initialize(to: CatHouse(cat: Cat()))
    p.pointee = CatHouse(cat: Cat())
    p.move()
}

init
init
deinit
deinit

You can see that two were correctly created and two destroyed.

So what happens if you write the value before `initialize`?

func test4() {
    var p = UnsafeMutablePointer<CatHouse>.allocate(capacity: 1)
    defer {
        p.deallocate(capacity: 1)
    }
    p.pointee = CatHouse(cat: Cat())
    p.initialize(to: CatHouse(cat: Cat()))
    p.move()
}

init
init
deinit

As you can see, one `Cat` has leaked. The value written through `pointee` before `initialize` was overwritten by `initialize`, which **initializes without destroying**, so the reference count operation for that `cat` was skipped and it leaked.

Even before that, this code **destroys an uninitialized area when it writes to `pointee`**, so there is also a risk of a crash.

Value exchange between pointers and move semantics

Suppose there are two allocated memory areas, one of which has been initialized. In other words, one side holds a value. When transferring that value from the pointer that holds it to the other pointer, there are 2 x 2 patterns, depending on the following conditions.

- Whether the destination pointer is initialized or uninitialized
- Whether to leave the value at the source pointer as it is or discard it

Copying a value type is fast in Swift, but if it has a reference-type property, as `CatHouse` does, copying has to increment that reference's count by 1, which is overhead. If the source value is then discarded, the count is decremented by 1 at that point, so the increment and decrement cancel out and are wasted work. If there were a single operation that discards the source value and writes it to the destination at the same time, this overhead could be eliminated. C++ calls this a move operation, and Swift's pointer types have methods for it.

As mentioned earlier, writing a value to uninitialized memory is called `initialize`. Writing a value to initialized memory, on the other hand, is called `assign`. These two are ordinary copies. Their move versions carry the prefix `move`.

public struct UnsafeMutablePointer<Pointee> : Strideable, Hashable {
	...
    public func initialize(from source: UnsafePointer<Pointee>, count: Int)
    ...
    public func moveInitialize(from source: UnsafeMutablePointer<Pointee>, count: Int)
    ...
    public func assign(from source: UnsafePointer<Pointee>, count: Int)
    ...
    public func moveAssign(from source: UnsafeMutablePointer<Pointee>, count: Int)
    ...
}

In the move versions, `source` is mutable, because it is subject to the discard operation.
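As a rough sketch of one of the four patterns (uninitialized destination, source discarded), the following moves two `String` values with `moveInitialize`. `moveDemo` is a made-up name, and the cleanup calls assume a recent Swift 5 toolchain.

```swift
// Hypothetical helper: move values into uninitialized memory.
func moveDemo() -> [String] {
    let count = 2
    let source = UnsafeMutablePointer<String>.allocate(capacity: count)
    let destination = UnsafeMutablePointer<String>.allocate(capacity: count)
    defer {
        source.deallocate()
        destination.deallocate()
    }

    source.initialize(to: "tama")
    (source + 1).initialize(to: "mike")

    // destination is uninitialized, so the initialize family is the right one.
    // After this call source is uninitialized and destination owns the values.
    destination.moveInitialize(from: source, count: count)

    let result = [destination[0], destination[1]]

    // Only destination needs to be deinitialized; source holds no values now.
    destination.deinitialize(count: count)
    return result
}
```

Calling `deinitialize` on `source` after the move would be a mistake, since its values were already handed over.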

Initialization and destruction cannot be controlled through the Raw types, because the type is unknown.

Memory state and buffer type

The BufferPointer types do not have methods for allocation, initialization, and so on. Those memory operations are done with the pointer types; a buffer merely acts as a view onto the memory.

Conversion between typed and untyped

A typed pointer `UnsafePointer<T>` can be converted to an untyped pointer `UnsafeRawPointer` with an initializer.

public struct UnsafeRawPointer : Strideable, Hashable {
	...
    public init<T>(_ other: UnsafePointer<T>)
    ...
}

However, there is no initializer for converting an untyped pointer to a typed one. Instead, there are two dedicated methods.

public struct UnsafeRawPointer : Strideable, Hashable {
	...
    public func bindMemory<T>(to type: T.Type, capacity count: Int) -> UnsafePointer<T>
    ...
    public func assumingMemoryBound<T>(to: T.Type) -> UnsafePointer<T>
    ...
}

Apparently, in connection with the strict aliasing mentioned above, the compiler statically tracks which type T the memory area behind an `UnsafeRawPointer` is currently being treated as. This is called binding.

Memory allocated as `UnsafeRawPointer` starts out unbound, and `bindMemory` is the method that binds it to a type T; at the same time it returns an `UnsafePointer<T>`. For memory that is already bound to T, you can use the `assumingMemoryBound` method.

There is also a method called `initializeMemory`, which initializes uninitialized memory while binding it to T.
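The two conversion methods might be used as in the sketch below. It works on the mutable raw pointer so that the memory can be written. `bindingDemo` is a made-up name, and `allocate(byteCount:alignment:)`, `deinitialize(count:)`, and `deallocate()` assume a recent Swift 5 toolchain.

```swift
// Hypothetical helper: untyped memory -> bound to Int32 -> accessed as Int32.
func bindingDemo() -> [Int32] {
    let count = 2
    let raw = UnsafeMutableRawPointer.allocate(
        byteCount: MemoryLayout<Int32>.stride * count,
        alignment: MemoryLayout<Int32>.alignment)
    defer { raw.deallocate() }

    // Freshly allocated raw memory is unbound; bind it and get a typed pointer.
    let typed = raw.bindMemory(to: Int32.self, capacity: count)
    typed.initialize(to: 1)
    (typed + 1).initialize(to: 2)

    // The memory is already bound to Int32, so this only reinterprets the pointer.
    let again = raw.assumingMemoryBound(to: Int32.self)
    let values = [again[0], again[1]]

    typed.deinitialize(count: count)
    return values
}
```

`assumingMemoryBound` performs no check, so using it on memory that is bound to a different type is the programmer's responsibility to avoid.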

The binding state transitions are described in the document mentioned above: Binding memory type

Conclusion

Swift provides what pointers need without dedicated syntax, handles the type information generically, is null-safe, offers operations with clearly organized conventions for the three states of memory, and even supports move semantics. I think it is very well done.

I think it is interesting to compare this with Rust and C++. In those languages, raw pointers are the first-class pointers, and smart pointers with reference counting and the like are provided as generic types. In Swift this is reversed: smart pointers are the first-class pointers, and raw pointers are provided as generic types. I think this reversal strikes a good balance for a language designed for writing apps that can also reach down to the low layers.
