Blog

February 21, 2009

I am American

The Apple Publications Style Guide [PDF] is the reference document for how to write documentation for Apple products. It is interesting to see that they’re putting online what is mostly meant for internal use. Here’s an interesting tidbit about the word America, which many people should remember:

America

American

Refers to both North and South America. Don’t use when you mean United States. See also U.S.

I am Canadian, therefore I am American. Sounds strange… but it’s true.

February 14, 2009

Some Ideas for Dynamic Vtables in D

In my last two posts, I’ve written about things I’d like to see added to the D programming language. The first is a non-fragile ABI, where you don’t need to recompile every subclass when you change the base class. The second is a class extension system where you can attach new functions to existing classes without recompiling them. Now it’s time to look at how it can be implemented.

In the current vtable system, each D object begins with a pointer to the virtual function table (vtable for short), followed by the monitor pointer (used for locking), followed by the object’s members. The vtable contains a list of pointers to the implementations of the virtual functions of that class, in the order they were declared. When you call a virtual method, the compiler generates code that grabs the pointer to the function implementation at the right offset, and then calls the function.
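The layout described above can be modelled by hand. This is only a sketch of the idea, not the compiler's actual layout: `Obj`, `Method`, `nodeVtbl`, `callVirtual` and the slot constants are hypothetical names invented for the illustration.

```d
import std.stdio : writeln;

// Hypothetical hand-rolled model; real D objects are laid out by the compiler.
alias Method = string function(Obj* self);

struct Obj
{
    Method* vtbl;   // pointer to the class's table of function pointers
    void* monitor;  // placeholder for the monitor pointer
    string name;    // first member of the object
}

string nodeName(Obj* self) { return self.name; }
string nodeKind(Obj* self) { return "node"; }

// Slots follow declaration order: name() is slot 0, kind() is slot 1.
Method[2] nodeVtbl = [&nodeName, &nodeKind];

// Offsets fixed at compile time, as in the current vtable system.
enum size_t slotName = 0;
enum size_t slotKind = 1;

// Roughly what a virtual call does: load the table, index it, call.
string callVirtual(Obj* o, size_t slot)
{
    return o.vtbl[slot](o);
}

void main()
{
    auto o = Obj(nodeVtbl.ptr, null, "readme.txt");
    writeln(callVirtual(&o, slotName)); // readme.txt
    writeln(callVirtual(&o, slotKind)); // node
}
```

The point to notice is that `slotName` and `slotKind` are constants baked into the calling code, which is what makes the scheme fragile.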

Then, when you create a subclass, a new vtable is created, starting with all the pointers from the base class’s vtable (except for overridden functions, where the pointer is replaced with the new implementation) and followed by pointers to any virtual functions the derived class declares.

Now, suppose that in a new version of a library we’re adding a function to the base class. This will offset by one all the function pointers in the vtable that come after the function we’re adding. Since functions are added to the vtable in the lexical order they appear in the source file, we may avoid changing any of the base class’s offsets by placing the function at the end of the class declaration, but we can’t avoid offsetting all of the derived class’s function pointers. This means that all derived classes need to be recompiled, and that’s the first thing we’ll try to avoid to allow libraries to keep their ABI compatible when adding functions to classes.
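To make the offset problem concrete, here is a small sketch using plain arrays of function pointers as stand-ins for vtables. The function names and the `size` function added in "version 2" are made up for the example.

```d
import std.stdio : writeln;

alias Method = string function();

string path()   { return "path"; }
string name()   { return "name"; }
string size()   { return "size"; }   // hypothetical function added in library v2
string backup() { return "backup"; } // declared by the derived class

void main()
{
    // Library v1: base slots first, then the derived class's own functions.
    Method[] derivedV1 = [&path, &name, &backup];
    // Library v2: size() appended to the base shifts every derived slot by one.
    Method[] derivedV2 = [&path, &name, &size, &backup];

    // Offset baked into client code that was compiled against v1.
    enum size_t backupSlot = 2;

    writeln(derivedV1[backupSlot]()); // backup
    writeln(derivedV2[backupSlot]()); // size: the stale offset hits the wrong function
}
```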

Note that this problem also occurs with member variables, which are accessed by adding an offset to the pointer to the start of the class. But that’s less of a problem since you can always reserve an opaque pointer for keeping the new data you may want in the future.

Interfaces are not the solution

D has the concept of interfaces. Implementing an interface in a class adds a new pointer to the memory layout of the object pointing to a special vtable for the interface. This makes interfaces more robust when it comes to the undesirable offset problem when creating a subclass in another library. That’s only true for interfaces declared final though, as any interface derived from another will break when you add a function to the base.

The problem with interfaces is that they don’t solve any part of this problem. If you build a library by exposing only interfaces, you can’t create a subclass of anything to override the default behaviour. At this rate, using static functions from the library owning the class is even safer: in both cases you lose the possibility of creating a subclass, but in the latter at least you aren’t dependent on the lexical order of declaration in the interface source file.

Dynamic vtables

The solution I’m proposing would be to build vtables dynamically, either while linking, while initializing the program, or at runtime before the first use of a class. The vtable builder, whether it’s in the linker or in the runtime, needs two things for each class: a list of function names properly decorated with their argument types (so each function signature has a unique string) and a corresponding list of pointers. This allows the creation of the vtable at runtime.

Any code calling a virtual function would be required to check a global variable containing the vtable offset for the desired function. Once the vtable is built for a given class, all these global variables are set to the right offset for each function and the rest of the code can start using them to access the vtable.
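A rough sketch of such a builder follows. It is an illustration under assumptions, not a proposal for the real runtime: `Entry`, `slotOf` and `buildVtable` are invented names, the signature strings stand in for properly decorated names, and an associative array plays the role of the per-function global offset variables. It also assumes a single class hierarchy.

```d
import std.stdio : writeln;

alias Method = string function();

// What each class hands to the (hypothetical) vtable builder.
struct Entry
{
    string signature; // decorated name, assumed unique per function signature
    Method impl;
}

// Stand-in for the global offset variables: one offset per virtual function.
size_t[string] slotOf;

// Build a class's table from its base class's table plus its own entries.
Method[] buildVtable(Method[] baseTable, Entry[] own)
{
    auto table = baseTable.dup;
    foreach (e; own)
    {
        auto slot = e.signature in slotOf;
        if (slot !is null && *slot < table.length)
            table[*slot] = e.impl;              // override: reuse the inherited slot
        else
        {
            slotOf[e.signature] = table.length; // publish the new offset
            table ~= e.impl;
        }
    }
    return table;
}

string nodePath() { return "Node.path"; }
string nodeName() { return "Node.name"; }
string fileName() { return "File.name"; }
string fileSize() { return "File.size"; }

void main()
{
    auto nodeTable = buildVtable(null, [Entry("Node.path()", &nodePath),
                                        Entry("Node.name()", &nodeName)]);
    auto fileTable = buildVtable(nodeTable, [Entry("Node.name()", &fileName),
                                             Entry("File.size()", &fileSize)]);

    // A call site fetches the offset at run time instead of hard-coding it.
    writeln(fileTable[slotOf["Node.name()"]]()); // File.name
    writeln(nodeTable[slotOf["Node.name()"]]()); // Node.name
    writeln(fileTable[slotOf["File.size()"]]()); // File.size
}
```

Because the offsets are computed when the tables are built, adding a function to the base class changes the published offsets but not the code that reads them.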

Dynamic Indirect vtables

With another change, dynamic vtables could allow us to add functions at runtime while loading a new library containing some class extensions. When doing this, the vtables of the extended classes and their subclasses need to be extended to account for the new virtual functions being added. Once that is done, the global offset variables are updated to match the new vtables, and subsequent calls to functions from that vtable will use the updated offsets.

This process could cause vtables to be relocated in memory. To solve this, we could fix each object’s vtable pointer using the GC’s type information about each memory block.

Another approach to relocated vtables would be to have the vtable be accessed using a second level of indirection. This is what I call Dynamic Indirect: the object contains a pointer to a pointer to the vtable. When relocating the vtable, you update the second pointer and all objects are now using it.
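The double indirection can be sketched as follows. `ClassRecord`, `Obj` and `call` are hypothetical names, and a D slice stands in for the second pointer; this only illustrates why existing objects need no fix-up when the table moves.

```d
import std.stdio : writeln;

alias Method = string function();

// Hypothetical per-class record: the one place that knows where the table lives.
struct ClassRecord
{
    Method[] vtable;
}

struct Obj
{
    ClassRecord* cls; // first level of indirection; cls.vtable is the second
}

string call(ref Obj o, size_t slot)
{
    return o.cls.vtable[slot]();
}

string name()   { return "name"; }
string backup() { return "backup"; }

void main()
{
    auto nodeClass = new ClassRecord;
    nodeClass.vtable = [&name];

    auto a = Obj(nodeClass);
    auto b = Obj(nodeClass);

    // Loading an extension grows the table, which may move it in memory...
    nodeClass.vtable = nodeClass.vtable ~ &backup;

    // ...yet both existing objects see the new slot without being touched.
    writeln(call(a, 1)); // backup
    writeln(call(b, 1)); // backup
}
```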

A general problem with updating the virtual tables while the program is running has to do with thread safety. If any other thread is using the offset global variables or the vtable while you’re updating, you could easily get a wrongly dispatched function call. I’m not sure how to solve this currently. Updating the vtable would need to be an atomic operation, but how to do this without imposing a lock at each function call?

Effects on performance

Calling each virtual function would require fetching the vtable offset from a global variable instead of it being part of the code.

If we add a second level of indirection for vtable pointers, then it means we’re fetching another value from a remote location. This value could be put in close proximity to the vtable offsets to improve cache efficiency though.

I’m currently preparing some benchmarks to check the performance impacts, but saving the result for another post as I’ll have a lot to say about them.

February 7, 2009

Class extensions in D?

One interesting thing about Objective-C is that you can extend a class by adding new methods to it at runtime. This is done through Categories. There has been some demand for something like it in D too. Here is my take on how it should work in D, and what kind of problem it would solve.

Say that a library provides you with some classes for accessing the filesystem:

class Node
{
    string path();
    string name();
}

class File : Node
{
    ...
}

class Directory : Node
{
    Node[] children();
    ...
}

And a couple of functions giving you directories and files:

File getConfigFile();
Directory getResourceDirectory();
Directory getDataDirectory();

As you don’t have control over these functions, you can’t change the class returned by them. Even if you subclass File or Directory to do what you want, it won’t cause these functions to magically create instances of your class.

It has been suggested many times that D could allow functions where the first argument is a class to be called as if they were a member of that class. For instance, this function:

void backup(Node node, string backupPath);

could be called this way:

backup(node, backupPath);

or this way:

node.backup(backupPath);

The simplicity of this is interesting, but faking member functions like this has a major drawback: contrary to true methods in classes, that function is not part of the virtual table, and thus cannot be dispatched dynamically based on the runtime type. If you wanted to do different things depending on the type of node, the ideal way would be to add a true member function in Node, which you’ll override in the File and Directory subclasses.
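D has since gained this call syntax under the name uniform function call syntax (UFCS), so the drawback can be shown with code that compiles today. The classes and the `describe` functions below are made up for the example.

```d
import std.stdio : writeln;

class Node {}
class File : Node {}

// Free functions, callable with member syntax through UFCS.
string describe(Node n) { return "generic node"; }
string describe(File f) { return "file"; }

void main()
{
    Node n = new File;
    writeln(n.describe());              // generic node: chosen from the static type
    writeln((cast(File) n).describe()); // file: only because the static type changed
}
```

The overload is picked at compile time from the declared type of `n`, so the fact that the object is really a `File` plays no role in the first call.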

Unfortunately, since Node comes from an external library, overriding is not an option… well, that’s what I’m suggesting we add to the language through class extensions. Class extensions are somewhat akin to categories in Objective-C, although safer in regard to accidental name clashes.

Here’s how it works.

The idea of an extension is that you can add member functions to a class:

extension NodeBackup : Node
{
    void backup(string backupPath) { }
}

With this syntax, you say that the NodeBackup extension applies to class Node. Since it’s an extension, it’s not a type you can instantiate. Since it applies to class Node, you can call its functions as if they were part of Node whenever you have imported the extension’s module.

Then you can override that function in another extension. You do that by deriving that extension from the first one (NodeBackup), and applying it to a subclass of the first extension’s base class (File):

extension FileBackup : NodeBackup, File
{
    override void backup(string backupPath)
    {
        copy(this.path, backupPath ~ "/" ~ this.name);
    }
}

The function backup defined here overrides NodeBackup.backup and will be called whenever the Node is a File at runtime.

You can then do the same for Directory.

extension DirectoryBackup : NodeBackup, Directory
{
    override void backup(string backupPath)
    {
        foreach(child; children)
            child.backup(backupPath ~ "/" ~ this.name);
    }
}

And now you can use it like this:

getDataDirectory.backup("/backup_disk");
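Since the `extension` syntax is only a proposal, the closest approximation in plain D is a free function that dispatches by hand on the runtime type. The sketch below does that under a few simplifying assumptions: `name` is a plain field instead of a method, the classes are redefined locally, and `backup` returns the target paths it would copy to rather than copying anything.

```d
import std.stdio : writeln;

class Node
{
    string name;
    this(string n) { name = n; }
}

class File : Node
{
    this(string n) { super(n); }
}

class Directory : Node
{
    Node[] children;
    this(string n, Node[] c) { super(n); children = c; }
}

// Hand-written dispatch on the runtime type, standing in for what the
// proposed extension would get from a vtable slot.
string[] backup(Node node, string backupPath)
{
    if (auto dir = cast(Directory) node)
    {
        string[] copied;
        foreach (child; dir.children)
            copied ~= child.backup(backupPath ~ "/" ~ dir.name);
        return copied;
    }
    if (auto file = cast(File) node)
        return [backupPath ~ "/" ~ file.name]; // where the file would be copied
    return [];
}

void main()
{
    Node b = new File("b.txt");
    Node sub = new Directory("sub", [b]);
    Node a = new File("a.txt");
    Node data = new Directory("data", [a, sub]);

    // Lists one target path per file found under the directory.
    writeln(data.backup("/backup_disk"));
}
```

The chain of casts has to be edited every time a new subclass appears, which is exactly the maintenance burden that overriding in an extension would remove.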

Special Considerations

How can the compiler generate code that does this?

There are several ways. One way is using dynamic vtable offsets and constructing the vtables at runtime, before first using the class. There are others. I’ll leave the details to another post.

Say you already have a `backup` function in the `Node` or `File` class, what happens?

That should be flagged as ambiguous at the call site and you’d have to manually specify which version of the function you want, something like this (invented syntax):

    (&Node.backup)("/path");

    (&NodeBackup.backup)("/path");

If you don’t like this syntax, avoid defining duplicate names, or avoid importing the module containing that annoying extension.

Say you update the library containing `Node` and it suddenly adds a `backup` function after your code was compiled, which one gets used?

Since your code was compiled to call NodeBackup’s backup function, it should continue to do so. The dispatch mechanism should be good enough to tell that you were calling Node.

When you recompile, it’ll get flagged as ambiguous (see previous point).

Say I want to override `backup` and use private variables of the subclass?

Either define a new extension in the same module as the class (private protection doesn’t apply to code in the same module), or merge the extension directly in your subclass by declaring the extension of the base class as an ancestor:

class DirectoryWithSpecialBackupName : Directory, DirectoryBackup
{
    override void backup(string path) { ... }
}

Doesn’t that pose some of the same issues as multiple inheritance?

Yes it does. In the preceding example, if, say, both the Directory class and the DirectoryBackup extension implement the backup function, a call to backup on a DirectoryWithSpecialBackupName will be ambiguous. I suggest we forbid deriving a class from an extension implementing a function of the same name.

You can still override the extension’s function from another extension if necessary. And of course, in this case, a call to the class’s version is going to be ambiguous.

Can extensions access private and protected members of the attached class?

No. This would break encapsulation.

Extensions are designed for adding functionality, not changing existing behaviour, and therefore are not granted any more rights than any function in the same scope as the extension. (That last sentence would make it a yes if the extension is defined in the same module though.)

February 5, 2009

Non-fragile ABI in D?

One shortcoming of the D programming language comes from its C++ roots. In C++, whenever you add, change or remove a virtual method or a variable as a member of a class or struct, you’re probably breaking binary compatibility and must recompile everything depending on that. That makes C++ a bad choice for publishing public APIs when you expect binary compatibility.

The fragile binary interface problem (as Wikipedia calls it) is not new. As an OS written entirely in C++, BeOS has been “bitten” by it in the last decade and suggests some guidelines to alleviate that problem. But following those guidelines adds clutter in the code, forces you to be extra careful when changing a class, and some suggestions are downright impossible in D: for instance, you can’t have private virtual functions in D (something I disagree with, but that’s another subject).

As the Wikipedia entry suggests, a better approach is to use a language that does not have the problem. Or, I should say, change an existing language so it no longer has that problem.

I’d certainly like to see D gain a non-fragile ABI for classes. I’ll explore this and other subjects in a couple of posts about the D language I’m preparing.