Wednesday, September 7, 2016

State reflection

Overview

The Stingray engine has two controller threads -- the main thread and the render thread. These two threads build up work for our job system, which is distributed on the remaining threads. The main thread and the render thread are pipelined, so that while the main thread runs the simulation/update for frame N, the render thread is processing the rendering work for the previous frame (N-1). This post will dive into the details of how state is propagated from the main thread to the render thread.

I will use code snippets to explain how the state reflection works. It's mostly actual code from the engine but it has been cleaned up to a certain extent. Some stuff has been renamed and/or removed to make it easier to understand what's going on.

The main loop

Here is a slimmed down version of the update loop which is part of the main thread:

while (!quit())
{
    // Calls out to the mandatory user supplied `update` Lua function. Lua is used
    // as a scripting language to manipulate objects. From Lua, worlds, objects, etc.
    // can be created, manipulated, destroyed, etc. All these changes are recorded
    // on a `StateStream` that is a part of each world.
    _game->update();

    // Flush state changes recorded on the `StateStream` for each world to
    // the rendering world representation.
    unsigned n_worlds = _worlds.size();
    for (uint32_t i = 0; i < n_worlds; ++i) {
        auto &world = *_worlds[i];
        _render_interface->update_world(world);
    }

    // Begin a new render frame.
    _render_interface->begin_frame();

    // Calls out to the user supplied `render` Lua function. It's up to the script
    // to call render on worlds(). The script controls what camera and viewport
    // are used when rendering the world.
    _game->render();

    // Present the frame.
    _render_interface->present_frame();

    // End frame.
    _render_interface->end_frame(_delta_time);

    // Never let the main thread run more than 1 frame ahead of the render thread.
    _render_interface->wait_for_fence(_frame_fence);

    // Create a new fence for the next frame.
    _frame_fence = _render_interface->create_fence();
}

First thing to point out is the _render_interface. This is not a class full of virtual functions that some other class can inherit from and override as the name might suggest. The word "interface" is used in the sense that it's used to communicate from one thread to another. So in this context the _render_interface is used to post messages from the main thread to the render thread.

As said in the first comment in the code snippet above, Lua is used as our scripting language and from Lua things such as worlds, objects, etc can be created, destroyed, manipulated, etc.

State is very rarely shared between the main thread and the render thread. Instead, each thread has its own representation, and when state changes on the main thread it is reflected over to the render thread. E.g., MeshObject, which represents a mesh with vertex buffers, materials, textures, shaders, skinning data, etc. to be rendered, is the main thread representation and RenderMeshObject is the corresponding render thread representation. All objects that have a representation on both the main and render thread are set up to work the same way:

class MeshObject : public RenderStateObject
{
};

class RenderMeshObject : public RenderObject
{
};

The corresponding render thread class is prefixed with Render. We use this naming convention for all objects that have both a main and a render thread representation.

The main thread objects inherit from RenderStateObject and the render thread objects inherit from RenderObject. These structs are defined as:

struct RenderStateObject
{
    uint32_t render_handle;
    StateReflection *state_reflection;
};

struct RenderObject
{
    uint32_t type;
};

The render_handle is an ID that identifies the corresponding object on the render thread. state_reflection is a stream of data that is used to propagate state changes from the main thread to the render thread. type is an enum used to identify the type of render objects.

Object creation

In Stingray a world is a container of renderable objects, physical objects, sounds, etc. On the main thread, it is represented by the World class, and on the render thread by a RenderWorld.

When a MeshObject is created in a world on the main thread, there's an explicit call to WorldRenderInterface::create() to create the corresponding render thread representation:

MeshObject *mesh_object = MAKE_NEW(_allocator, MeshObject);
_world_render_interface.create(mesh_object);

The purpose of the call to WorldRenderInterface::create is to explicitly create the render thread representation, acquire a render_handle and to post that to the render thread:

void WorldRenderInterface::create(MeshObject *mesh_object)
{
    // Get a unique render handle.
    mesh_object->render_handle = new_render_handle();

    // Set the state_reflection pointer, more about this later.
    mesh_object->state_reflection = &_state_reflection;

    // Create the render thread representation.
    RenderMeshObject *render_mesh_object = MAKE_NEW(_allocator, RenderMeshObject);

    // Pass the data to the render thread
    create_object(mesh_object->render_handle, RenderMeshObject::TYPE, render_mesh_object);
}

The new_render_handle function speaks for itself.

uint32_t WorldRenderInterface::new_render_handle()
{
    if (_free_render_handles.any()) {
        uint32_t handle = _free_render_handles.back();
        _free_render_handles.pop_back();
        return handle;
    } else
        return _render_handle++;
}

There is a recycling mechanism for the render handles and a similar pattern reoccurs at several places in the engine. The release_render_handle function together with the new_render_handle function should give the complete picture of how it works.

void WorldRenderInterface::release_render_handle(uint32_t handle)
{
    _free_render_handles.push_back(handle);
}
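
To see the recycling in isolation, here is a minimal stand-alone sketch of the same free-list idea. The `HandleAllocator` class and its method names are made up for this example; in the engine the state lives inside WorldRenderInterface as shown above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-alone allocator, not the engine's class.
class HandleAllocator
{
public:
    // Reuse the most recently released handle if there is one,
    // otherwise mint a fresh one.
    uint32_t acquire()
    {
        if (!_free_handles.empty()) {
            uint32_t handle = _free_handles.back();
            _free_handles.pop_back();
            return handle;
        }
        return _next_handle++;
    }

    // Put a handle back on the free list so a later acquire() can reuse it.
    void release(uint32_t handle)
    {
        assert(handle < _next_handle); // Only handles we have handed out.
        _free_handles.push_back(handle);
    }

private:
    std::vector<uint32_t> _free_handles;
    uint32_t _next_handle = 0;
};
```

One thing to keep in mind with plain recycling like this: a handle that is still held somewhere after it has been released will silently refer to whatever object reuses it. This post does not cover how the engine guards against that.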

There is one WorldRenderInterface per world which contains the _state_reflection that is used by the world and all of its objects to communicate with the render thread. The StateReflection in its simplest form is defined as:

struct StateReflection
{
    StateStream *state_stream;
};

The create_object function needs a bit more explanation though:

void WorldRenderInterface::create_object(uint32_t render_handle, RenderObject::Type type, void *user_data)
{
    // Allocate a message on the `state_stream`.
    ObjectManagementPackage *omp;
    alloc_message(_state_reflection.state_stream, WorldRenderInterface::CREATE, &omp);

    omp->object_type = RenderWorld::TYPE;
    omp->render_handle = render_handle;
    omp->type = type;
    omp->user_data = user_data;
}

What happens here is that alloc_message will allocate enough bytes to make room for a MessageHeader together with the size of ObjectManagementPackage in a buffer owned by the StateStream. The StateStream is defined as:

struct StateStream
{
    void *buffer;
    uint32_t capacity;
    uint32_t size;
};

capacity is the size of the memory pointed to by buffer, and size is the number of bytes currently allocated from buffer.

The MessageHeader is defined as:

struct MessageHeader
{
    uint32_t type;
    uint32_t size;
    uint32_t data_offset;
};

The alloc_message function first places the MessageHeader, followed by the data. Some ASCII to the rescue:

+-------------------------------------------------------------------+
| MessageHeader | data                                              |
+-------------------------------------------------------------------+
<- data_offset ->
<-                          size                                   ->

The size and data_offset mentioned in the ASCII are two of the members of MessageHeader, these are assigned during the alloc_message call:

template<class T>
void alloc_message(StateStream *state_stream, uint32_t type, T **data)
{
    uint32_t data_size = sizeof(T);

    uint32_t message_size = sizeof(MessageHeader) + data_size;

    // Allocate message and fill in the header.
    void *buffer = allocate(state_stream, message_size, alignof(MessageHeader));
    auto header = (MessageHeader*)buffer;

    header->type = type;
    header->size = message_size;
    header->data_offset = sizeof(MessageHeader);

    *data = (T*)memory_utilities::pointer_add(buffer, header->data_offset);
}

The buffer member of the StateStream will contain several consecutive chunks of message headers and data blocks.

+-----------------------------------------------------------------------+
| Header | data | Header | data | Header | data | Header | data | etc   |
+-----------------------------------------------------------------------+

This is the necessary code on the main thread to create an object and populate the StateStream which will later on be consumed by the render thread. A very similar pattern is used when changing the state of an object on the main thread, e.g:

void MeshObject::set_flags(renderable::Flags flags)
{
    _flags = flags;

    // Allocate a message on the `state_stream`.
    SetVisibilityPackage *svp;
    alloc_message(state_reflection->state_stream, MeshObject::SET_VISIBILITY, &svp);

    // Fill in message information.
    svp->object_type = RenderMeshObject::TYPE;

    // The render handle that got assigned in `WorldRenderInterface::create`
    // to be able to associate the main thread object with its render thread 
    // representation.
    svp->render_handle = render_handle;

    // The new flags value.
    svp->flags = _flags;
}
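
To make the buffer layout concrete, here is a small self-contained sketch that writes messages into a stream and walks them back out again. It is not the engine code: the fixed-capacity `StateStream`, the `Message` struct, `read_payload`, the `SetFlags` and `SetScale` payloads, and the alignment handling (padding `data_offset` and `size` so every header and payload stays aligned) are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>
#include <type_traits>
#include <vector>

struct MessageHeader
{
    uint32_t type;
    uint32_t size;        // Whole message: header, padding and payload.
    uint32_t data_offset; // From the start of the header to the payload.
};

// Simplified stand-in for the engine's StateStream (fixed capacity).
struct StateStream
{
    explicit StateStream(uint32_t capacity) : buffer(capacity), size(0) {}
    std::vector<unsigned char> buffer; // Never resized after construction.
    uint32_t size;                     // Bytes handed out so far.
};

// Hypothetical payloads, only for this example.
struct SetFlags { uint32_t render_handle; uint32_t flags; };
struct SetScale { uint32_t render_handle; double scale; };

constexpr uint32_t MESSAGE_ALIGNMENT = alignof(std::max_align_t);

constexpr uint32_t align_up(std::size_t value, std::size_t alignment)
{
    return static_cast<uint32_t>((value + alignment - 1) / alignment * alignment);
}

// Writer side: reserve header + payload, fill in the header and return a
// pointer to the (zero-initialized) payload for the caller to fill in.
template <class T>
T *alloc_message(StateStream &stream, uint32_t type)
{
    static_assert(std::is_trivially_copyable<T>::value, "payloads are plain data");
    static_assert(alignof(T) <= MESSAGE_ALIGNMENT, "over-aligned payload");

    const uint32_t data_offset = align_up(sizeof(MessageHeader), alignof(T));
    const uint32_t message_size = align_up(data_offset + sizeof(T), MESSAGE_ALIGNMENT);
    assert(stream.size + message_size <= stream.buffer.size());

    unsigned char *start = stream.buffer.data() + stream.size;
    stream.size += message_size;

    new (start) MessageHeader{type, message_size, data_offset};
    return new (start + data_offset) T();
}

// A message as seen by the reader: a copy of the header plus the payload bytes.
struct Message
{
    MessageHeader header;
    const unsigned char *data;
};

// Reader side: `cursor` starts at 0 and is advanced past each message.
inline bool get_message(const StateStream &stream, uint32_t &cursor, Message &out)
{
    if (cursor >= stream.size)
        return false;
    const unsigned char *start = stream.buffer.data() + cursor;
    std::memcpy(&out.header, start, sizeof(MessageHeader));
    out.data = start + out.header.data_offset;
    cursor += out.header.size;
    return true;
}

// Copy the payload out of the stream as a T.
template <class T>
T read_payload(const Message &message)
{
    T payload;
    std::memcpy(&payload, message.data, sizeof(T));
    return payload;
}
```

Because every header stores the total size of its message, the reader can step from one message to the next without knowing anything about the payload types it skips over.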

Getting the recorded state to the render thread

Let's take a step back and explain what happens in the main update loop during the following code excerpt:

// Flush state changes recorded on the `StateStream` for each world to
// the rendering world representation.
unsigned n_worlds = _worlds.size();
for (uint32_t i = 0; i < n_worlds; ++i) {
    auto &world = *_worlds[i];
    _render_interface->update_world(world);
}

Once Lua is done creating, destroying, and manipulating objects during update(), each world's StateStream, which contains all the recorded changes, is ready to be sent over to the render thread for consumption. The call to RenderInterface::update_world() does just that. It looks roughly like this:

void RenderInterface::update_world(World &world)
{
    UpdateWorldMsg uw;

    // Get the render thread representation of the `world`.
    uw.render_world = render_world_representation(world);

    // The world's current `state_stream` that contains all changes made 
    // on the main thread.
    uw.state_stream = world._world_reflection_interface.state_stream;

    // Create and assign a new `state_stream` to the world's `_world_reflection_interface`
    // that will be used for the next frame.
    world._world_reflection_interface.state_stream = new_state_stream();

    // Post a message to the render thread to update the world.
    post_message(UPDATE_WORLD, &uw);
}

This function will create a new message and post it to the render thread. The world being flushed and its StateStream are stored in the message and a new StateStream is created that will be used for the next frame. This new StateStream is set on the StateReflection owned by the world's WorldRenderInterface, and since all objects being created got a pointer to that same StateReflection they will use the newly created StateStream when storing state changes for the next frame.
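
The swap itself can be sketched in a few lines. The version below is a simplified illustration, not the engine code: `StateStreamPool`, `WorldSide` and `flush` are hypothetical names, the stream is reduced to a byte vector, and all cross-thread synchronization is left out (in the engine, streams are created on the main thread and released on the render thread, so a real pool would have to be thread safe).

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Simplified stand-in for the engine's StateStream.
struct StateStream
{
    std::vector<unsigned char> bytes;
};

// Hypothetical pool that recycles consumed streams.
class StateStreamPool
{
public:
    // Hand out a recycled stream if one is available, else a new one.
    std::unique_ptr<StateStream> acquire()
    {
        if (_free.empty())
            return std::unique_ptr<StateStream>(new StateStream());
        std::unique_ptr<StateStream> stream = std::move(_free.back());
        _free.pop_back();
        return stream;
    }

    // Take back a consumed stream. clear() keeps the allocated capacity,
    // so a recycled stream can be refilled without reallocating.
    void release(std::unique_ptr<StateStream> stream)
    {
        assert(stream);
        stream->bytes.clear();
        _free.push_back(std::move(stream));
    }

private:
    std::vector<std::unique_ptr<StateStream>> _free;
};

// The main thread side of a world: the stream currently being recorded to.
struct WorldSide
{
    std::unique_ptr<StateStream> recording;
};

// Detach the recorded stream (to be sent to the render thread) and
// install a fresh one for the next frame.
inline std::unique_ptr<StateStream> flush(WorldSide &world, StateStreamPool &pool)
{
    std::unique_ptr<StateStream> recorded = std::move(world.recording);
    world.recording = pool.acquire();
    return recorded;
}
```

The point of the swap is that after `flush` the main thread never touches the recorded stream again, so the render thread can read it without any locking on the stream itself.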

Render thread

The render thread is spinning in a message loop:

void RenderInterface::render_thread_entry()
{
    while (!_quit) {
        // If there's no message -- put the thread to sleep until there's
        // a new message to consume.
        RenderMessage *message = get_message();

        void *message_data = data(message);
        switch (message->type) {
            case UPDATE_WORLD:
                internal_update_world((UpdateWorldMsg*)(message_data));
                break;

            // ... And a lot more case statements to handle different messages. There
            // are other threads than the main thread that also communicate with the
            // render thread. E.g., the resource loading happens on its own thread
            // and will post messages to the render thread.
        }
    }
}
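
The post does not show how get_message() puts the thread to sleep. One common way to build such a blocking queue with the standard library is a mutex plus a condition variable. The sketch below assumes that approach; the `MessageQueue` class and the fields of `RenderMessage` are made up for the example and may differ from what the engine does.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>

// Hypothetical message layout for this example.
struct RenderMessage
{
    uint32_t type;
    void *data;
};

class MessageQueue
{
public:
    // Called by producers (the main thread, the resource loading thread, ...).
    void post(const RenderMessage &message)
    {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _messages.push_back(message);
        }
        // Wake the consumer if it is sleeping in get().
        _not_empty.notify_one();
    }

    // Called by the consumer. Sleeps until at least one message is queued,
    // then returns the oldest one.
    RenderMessage get()
    {
        std::unique_lock<std::mutex> lock(_mutex);
        _not_empty.wait(lock, [this] { return !_messages.empty(); });
        RenderMessage message = _messages.front();
        _messages.pop_front();
        return message;
    }

private:
    std::mutex _mutex;
    std::condition_variable _not_empty;
    std::deque<RenderMessage> _messages;
};
```

Passing the predicate to `wait` matters: it rechecks the queue after every wakeup, so spurious wakeups and messages posted before the consumer went to sleep are both handled.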

The internal_update_world() function is defined as:

void RenderInterface::internal_update_world(UpdateWorldMsg *uw)
{
    // Call update on the `render_world` with the `state_stream` as argument.
    uw->render_world->update(uw->state_stream);

    // Release and recycle the `state_stream`.
    release_state_stream(uw->state_stream);
}

It calls update() on the RenderWorld with the StateStream and when that is done the StateStream is released to a pool.

void RenderWorld::update(StateStream *state_stream)
{
    MessageHeader *message_header;
    StatePackageHeader *package_header;

    // Consume a message and get the `message_header` and `package_header`.
    while (get_message(state_stream, &message_header, (void**)&package_header)) {
        switch (package_header->object_type) {
            case RenderWorld::TYPE:
            {
                auto omp = (WorldRenderInterface::ObjectManagementPackage*)package_header;
                // The call to `WorldRenderInterface::create` created this message.
                if (message_header->type == WorldRenderInterface::CREATE)
                    create_object(omp);
                break;
            }
            case RenderMeshObject::TYPE:
            {
                if (message_header->type == MeshObject::SET_VISIBILITY) {
                    auto svp = (MeshObject::SetVisibilityPackage*)package_header;

                    // The `render_handle` is used to do a lookup in `_object_lut`
                    // to get the `object_index`.
                    uint32_t object_index = _object_lut[package_header->render_handle];

                    // Get the `render_object`.
                    void *render_object = _objects[object_index];

                    // Cast it since the type is already given from the `object_type`
                    // in the `package_header`.
                    auto rmo = (RenderMeshObject*)render_object;

                    // Call update on the `RenderMeshObject`.
                    rmo->update(message_header->type, svp);
                }
                break;
            }
            // ... And a lot more case statements to handle different kind of messages.
        }
    }
}

The above is mostly infrastructure to extract messages from the StateStream. It can be a bit involved since a lot of stuff is written out explicitly but the basic idea is hopefully simple and easy to understand.

On to the create_object call done when (message_header->type == WorldRenderInterface::CREATE) is satisfied:

void RenderWorld::create_object(WorldRenderInterface::ObjectManagementPackage *omp)
{
    // Acquire an `object_index`.
    uint32_t object_index = _objects.size();

    // Same recycling mechanism as seen for render handles.
    if (_free_object_indices.any()) {
        object_index = _free_object_indices.back();
        _free_object_indices.pop_back();
    } else {
        _objects.resize(object_index + 1);
        _object_types.resize(object_index + 1);
    }

    void *render_object = omp->user_data;
    if (omp->type == RenderMeshObject::TYPE) {
        // Cast the `render_object` to a `RenderMeshObject`.
        RenderMeshObject *rmo = (RenderMeshObject*)render_object;

        // If needed, do more stuff with `rmo`.
    }

    // Store the `render_object` and `type`.
    _objects[object_index] = render_object;
    _object_types[object_index] = omp->type;

    if (omp->render_handle >= _object_lut.size())
        _object_lut.resize(omp->render_handle + 1);
    // The `render_handle` is used as the index into `_object_lut`.
    _object_lut[omp->render_handle] = object_index;
}

So the takeaway from the code above lies in the general usage of the render_handle and the object_index. The render_handle of an object is used to do a lookup in _object_lut to get the object_index, which in turn indexes _objects and _object_types. Let's look at an example: the same RenderWorld::update code presented earlier, but this time focused on the case where the message is MeshObject::SET_VISIBILITY:

void RenderWorld::update(StateStream *state_stream)
{
    MessageHeader *message_header;
    StatePackageHeader *package_header;

    while (get_message(state_stream, &message_header, (void**)&package_header)) {
        switch (package_header->object_type) {
            case RenderMeshObject::TYPE:
            {
                if (message_header->type == MeshObject::SET_VISIBILITY) {
                    auto svp = (MeshObject::SetVisibilityPackage*)package_header;

                    // The `render_handle` is used to do a lookup in `_object_lut`
                    // to get the `object_index`.
                    uint32_t object_index = _object_lut[package_header->render_handle];

                    // Get the `render_object` from the `object_index`.
                    void *render_object = _objects[object_index];

                    // Cast it since the type is already given from the `object_type`
                    // in the `package_header`.
                    auto rmo = (RenderMeshObject*)render_object;

                    // Call update on the `RenderMeshObject`.
                    rmo->update(message_header->type, svp);
                }
            }
        }
    }
}
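
The two levels of indirection (render_handle to object_index, object_index to object) can be pulled out into a small stand-alone table. The sketch below follows RenderWorld::create_object above, but the class name `RenderObjectTable`, the `INVALID_INDEX` sentinel, the `destroy` function and the type check in `lookup` are additions made for this example and are not taken from the engine.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sentinel for "no object behind this handle".
constexpr uint32_t INVALID_INDEX = 0xffffffffu;

class RenderObjectTable
{
public:
    void create(uint32_t render_handle, uint32_t type, void *object)
    {
        // Acquire an object index, recycling a free slot when possible.
        uint32_t object_index;
        if (!_free_object_indices.empty()) {
            object_index = _free_object_indices.back();
            _free_object_indices.pop_back();
        } else {
            object_index = static_cast<uint32_t>(_objects.size());
            _objects.push_back(nullptr);
            _object_types.push_back(0);
        }
        _objects[object_index] = object;
        _object_types[object_index] = type;

        // Grow the lookup table on demand; unused entries stay invalid.
        if (render_handle >= _object_lut.size())
            _object_lut.resize(render_handle + 1, INVALID_INDEX);
        _object_lut[render_handle] = object_index;
    }

    void destroy(uint32_t render_handle)
    {
        uint32_t object_index = index_of(render_handle);
        assert(object_index != INVALID_INDEX);
        _objects[object_index] = nullptr;
        _free_object_indices.push_back(object_index);
        _object_lut[render_handle] = INVALID_INDEX;
    }

    // Returns the object behind `render_handle`, or nullptr if the handle
    // is unknown or the stored type does not match `expected_type`.
    void *lookup(uint32_t render_handle, uint32_t expected_type) const
    {
        uint32_t object_index = index_of(render_handle);
        if (object_index == INVALID_INDEX || _object_types[object_index] != expected_type)
            return nullptr;
        return _objects[object_index];
    }

private:
    uint32_t index_of(uint32_t render_handle) const
    {
        return render_handle < _object_lut.size() ? _object_lut[render_handle] : INVALID_INDEX;
    }

    std::vector<void *> _objects;
    std::vector<uint32_t> _object_types;
    std::vector<uint32_t> _object_lut;
    std::vector<uint32_t> _free_object_indices;
};
```

The lookup table is indexed directly by render_handle, which is why recycling handles on the main thread is worthwhile: it keeps the table dense instead of letting it grow with every object ever created.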

The state reflection pattern shown in this post is a fundamental part of the engine. Similar patterns appear in other places as well and having a good understanding of this pattern makes it much easier to understand the internals of the engine.

Tuesday, September 6, 2016

A New Localization System for Stingray

The current Stingray localization system is based around the concept of properties. A property is any period separated part of the file name before the extension. Consider the following three files:

  • trees/larch_03.unit
  • trees/larch_03.fr.unit
  • trees/larch_03.ps4.unit

These three files all have the same type (.unit), and the same name (trees/larch_03), but their properties differ. The first one has no properties set. The second one has the property .fr and the last one has the property .ps4. (Note that resources can have more than one property.)

Properties are resolved in slightly different ways, depending on the kind of property. Platform properties are resolved at compile time, so if you compile for PS4, you will get the PS4 version of the resource (or the default version if there is no .ps4 specific version).

Other properties are resolved at resource load time. When you load a bunch of resources, which property variant is loaded depends on a global property preference order set from the script. A property preference order of ['.fr', '.es'] means that resources with the property .fr are preferred, then resources with the property .es (if no .fr resource is available), and finally a resource without any properties at all.
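
The preference order boils down to a "first match wins" search. Here is a rough sketch of that rule in C++. The function `resolve_variant` and its signature are invented for this example, and it ignores resources with more than one property.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// `variants` holds the properties for which a file with this name and type
// exists. The empty string stands for the default file without properties.
// On success, `chosen` is set to the property of the variant to load.
inline bool resolve_variant(const std::set<std::string> &variants,
                            const std::vector<std::string> &preference_order,
                            std::string &chosen)
{
    // Walk the preference order and take the first variant that exists.
    for (const std::string &property : preference_order) {
        if (variants.count(property)) {
            chosen = property;
            return true;
        }
    }
    // Nothing preferred is available, so fall back to the default file.
    if (variants.count("")) {
        chosen.clear();
        return true;
    }
    return false;
}
```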

This single mechanism is used for localizing strings, sounds, textures, etc. Strings, for example, are stored in .strings files, which are essentially just key-value stores:

file = "File"
open = "Open"
...

To create a French localized version of this menu.strings resource, you just create a menu.fr.strings resource and fill it with:

file = "Fichier"
open = "Ouvert"
...

This basic localization system has served us well for many years, but it has some drawbacks that are starting to become more pronounced:

  • It doesn't allow file names with periods in them. Since we always interpret periods as properties, periods can't be a part of the regular file name. This isn't a huge problem when users name their own files, but as we are increasing the interoperability between Stingray and other software packages we more and more run into software that has, let's say peculiar, ways of naming its files. Renaming things by hand is cumbersome and can also break things when files cross-reference each other.

  • Switching language requires reloading the resource packages. This seems overly complicated. We have more memory these days than when we started building Stingray. In many cases, especially for strings, it makes more sense to keep them in memory all the time, so we can switch between them easily.

  • Just switching on platform isn't enough. Mobile devices range from very low-end to at least mid-end. Rather than having .ios and .android properties, we might want .low-quality and .high-quality and select which one to use based on the actual capabilities of the hardware.

  • Making editors work well with the property system has been surprisingly complicated. For example, when the editor runs on Windows, what should it show if there is a .win32 specialization of a resource -- the default version or the .win32 one? How would you edit a .ps4 resource when those are normally stripped out of the Windows runtime?

    We used to have this wonky thing where you could sort of cross-compile the resources and say, "I want to run on Windows, but as if I was running on PS4." But to be honest, that system never really worked that well and in the new editor we have gotten rid of it.

Interestingly, out of all these problems, it is the first one -- the most stupid one -- that is the main impetus for change.

The New System

The new system has several parts. First, we decided that for systems that deal with localization a lot, such as strings and sounds, it makes sense to have the system actually be aware of localization. That way, we can provide the best possible experience.

So the .strings format has changed to:

file = {en = "File", fr = "Fichier", ...}
open = {en = "Open", fr = "Ouvert", ...}
...

All the languages are stored in the same file and to switch language you just call Localizer.set_language("fr"). We keep all the different languages in memory at all times. Even for a game with ridiculous amounts of text this still doesn't use much memory and it means we can hot-swap languages instantly.
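
As an illustration of why switching is instant when everything is resident, here is a tiny sketch of such a table in C++. The `StringTable` class, its method names, and the choice to fall back to the key when a translation is missing are all assumptions made for this example, not the engine's implementation.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory string table holding every language at once.
class StringTable
{
public:
    void add(const std::string &key, const std::string &language, const std::string &text)
    {
        _strings[key][language] = text;
    }

    // Switching language only changes which column is read.
    void set_language(const std::string &language)
    {
        _language = language;
    }

    // Returns the text for `key` in the current language, or the key
    // itself if there is no such entry (an assumption of this sketch).
    std::string lookup(const std::string &key) const
    {
        auto entry = _strings.find(key);
        if (entry == _strings.end())
            return key;
        auto text = entry->second.find(_language);
        return text != entry->second.end() ? text->second : key;
    }

private:
    std::map<std::string, std::map<std::string, std::string>> _strings;
    std::string _language = "en";
};
```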

This is a nice approach, but it doesn't work for all resources. We don't want to add this deep kind of integration to resources that are normally not localized, such as .unit and .texture. Still, there sometimes is a need to localize such resources. For example, a .texture might have text in it that needs to be localized. We may need a low-poly version of a .unit for a less capable platform. Or a less gory version of an animation for countries with stricter age ratings.

To make things easier for the editor we decided to ditch the property system altogether, and instead go for a substitution strategy. There are no special magical parts of a resource's path -- it is just a name and a type. But if you want to, you can say to the engine that all instances of a certain resource should be replaced with another resource:

trees/larch_03.unit → trees/larch_03_ps4.unit

Note here that there is nothing special or magical about the trees/larch_03_ps4.unit. There is no problem with displaying it on Windows. You just edit it in the editor, like any other unit. However, when you play the game -- any time a trees/larch_03.unit is requested by the engine, a trees/larch_03_ps4.unit is substituted. So if you have authored a level full of larch_03 units, when the override above is in place, you will instead see larch_03_ps4 units.

There are many ways for this scheme to go wrong. The gameplay script might expect to find a certain node branch_43 in the unit -- a node that exists in larch_03.unit, but not in larch_03_ps4.unit and this may lead to unexpected behavior. The same problem existed in the old property system. We don't try to do anything special about this, because it is impossible. In the end, it is only the gameplay script that can know what it means for two things to be similar enough to be used interchangeably. Anyone working with localized resources just has to be careful not to break things.

Overrides can be specified from the Lua script:

Application.set_resource_override("unit", "trees/larch_03", "trees/larch_03_ps4");

Note that this is a much more powerful system than the old property system. Any resource can be set to override any other -- we are not restricted to work within the strict naming scheme required by the property system. Also, the override is dynamic and can be determined at runtime. So it can be based on dynamic properties, such as measured CPU or GPU performance -- or a user setting for the amount of gore they are comfortable with.
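
Conceptually, the dynamic override is just a table that is consulted whenever a resource is requested. A minimal sketch of that idea in C++ follows. The `ResourceOverrides` class is hypothetical, and resolving only a single level (an override of an override is not followed) is an assumption of the sketch.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical override table keyed on (type, name).
class ResourceOverrides
{
public:
    // Register `replacement` to be used whenever (type, name) is requested.
    void set(const std::string &type, const std::string &name, const std::string &replacement)
    {
        _overrides[std::make_pair(type, name)] = replacement;
    }

    // Returns the name to actually load: the replacement if one is
    // registered for (type, name), otherwise `name` unchanged.
    std::string resolve(const std::string &type, const std::string &name) const
    {
        auto it = _overrides.find(std::make_pair(type, name));
        return it != _overrides.end() ? it->second : name;
    }

private:
    std::map<std::pair<std::string, std::string>, std::string> _overrides;
};
```

Since the key includes the type, overriding the unit trees/larch_03 has no effect on a texture that happens to share the same name.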

It can even be used for completely different things than localization or platform specific resources -- such as replacing the units in a level for a night-time or psychedelic version of the same level. And I'm sure our users will find many other ways of (ab)using this mechanism.

But this dynamic system is not quite enough to do everything we want to do.

First, since the override is dynamic and only happens at runtime, our packaging system can't be aware of it. Normally, our packaging system figures out all resource dependencies automatically. So when you say that you want a package with the forest level, the packaging system will automatically pull in the larch_03 unit that is used in that level, any textures used by that unit, etc. But since the packaging system can't know that at runtime you will replace larch_03 with larch_03_ps4, it doesn't know that larch_03_ps4 and its dependencies should go into the package as well.

You could add larch_03_ps4 to the package manually, since you know it will be used. That might work if you only have one or two overrides. However, even with a very small number of overrides, micromanaging packages in this way becomes incredibly tedious and error prone.

Second, we don't want to burden the packages with resources that will never be used. If we are making a game for digital distribution on iOS or Android we don't want to include large PS4-only resources in that game.

So we need a static override mechanism that is known by the package manager to make sure it includes and excludes the right resources. The simplest thing would be a big file that just listed all the overrides. For example, to override larch_03 on PS4 we would write something like:

resource_overrides = [
  {
    type = "unit"
    name = "trees/larch_03"
    override = "trees/larch_03_ps4"
    platforms = ["ps4"]
  }
]

This would work, but could again get pretty tedious if there are a lot of overrides. It would be nice with something that was a bit more automatic.

Since our users are already used to using name suffixes such as .fr and .ps4 for localization, we decided to build on the same mechanism -- creating overrides automatically based on suffix rules:

resource_overrides = [
  {suffix = "_ps4", platforms = ["ps4"]}
]

This rule says that when we are compiling for the platform PS4, if we find a resource that has the same name as another resource, but with the added suffix _ps4, that resource will automatically be registered as an override for that resource:

trees/larch_03.unit → trees/larch_03_ps4.unit
leaves/larch_leaves.texture → leaves/larch_leaves_ps4.texture
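
The matching rule can be sketched as a pass over the list of known resources. The code below is an illustration under assumptions, not the compiler's actual implementation: `SuffixRule`, `Resource` and `build_overrides` are names invented for the example, and only platform rules are handled.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// One entry of `resource_overrides` (platform rules only in this sketch).
struct SuffixRule
{
    std::string suffix;
    std::set<std::string> platforms;
};

using Resource = std::pair<std::string, std::string>; // (type, name)

// For every rule that applies to `platform`, find resources whose name is
// "<base><suffix>" where "<base>" also exists with the same type, and
// register them as the override for the base resource.
inline std::map<Resource, std::string> build_overrides(const std::set<Resource> &resources,
                                                       const std::vector<SuffixRule> &rules,
                                                       const std::string &platform)
{
    std::map<Resource, std::string> overrides;
    for (const SuffixRule &rule : rules) {
        if (!rule.platforms.count(platform))
            continue;
        for (const Resource &resource : resources) {
            const std::string &name = resource.second;
            // Skip names that do not end with the suffix.
            if (name.size() <= rule.suffix.size() ||
                name.compare(name.size() - rule.suffix.size(), std::string::npos, rule.suffix) != 0)
                continue;
            Resource base(resource.first, name.substr(0, name.size() - rule.suffix.size()));
            // Only an override if the unsuffixed resource exists too.
            if (resources.count(base))
                overrides[base] = name;
        }
    }
    return overrides;
}
```

Note that a resource such as rocks/boulder_ps4.unit with no matching rocks/boulder.unit is left alone: the suffix only has meaning when there is a base resource to override.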

In addition to platform settings, the system also generalizes to support other flags:

resource_overrides = [
  {suffix = "_fr", flags = ["fr"]}
  {suffix = "_4k", flags = ["4K"]}
  {suffix = "_noblood", flags = ["noblood", "PG-13"]}
]

This defines three suffixes: _fr for French localization, _4k for high-quality versions of resources suitable for 4K monitors, and _noblood for resources without blood and gore.

The flags can be set at compile time with:

--compile --resource-flag-true 4K

This means that we are compiling a 4K version of the game, so when bundling only the 4K resources will be included and the other versions will be stripped out. Just as if we were compiling for a specific platform.

But we can also choose to resolve the flags at runtime:

--compile --resource-flag-runtime noblood

With this setting, both the regular resource and the _noblood resource will be included in the package and loaded into memory. And we can hot swap between them with:

Application.set_resource_flag("noblood", true)

I have not decided yet whether in addition to these two alternatives we should also have an option that resolves at package load time. I.e., both variants of the resource would be included on disk, but only one of them would be loaded into memory and if you wanted to switch resource you would have to unload the package and load it back into memory again.

I can see some use cases for this, but on the other hand adding more options complicates the system and I like to keep things as simple as possible.

A nice thing about this suffix mapping is that it can be configured to be backwards compatible with the old property system:

resource_overrides = [
  {suffix = ".fr", flags = ["fr"]}
  {suffix = ".ps4", platforms = ["ps4"]}
  {suffix = ".xb1", platforms = ["xb1"]}
]

Whenever we change something in Stingray we try to make it more flexible and data-driven, while at the same time ensuring that the most common cases are still easy to work with. This rewrite of the localization is a good example:

  • It fixes the problem with periods in file names. Periods are now only an issue if you have made an explicit suffix mapping that matches them.

  • We can switch language (or any other resource setting) at runtime.

  • The new system is more flexible -- it doesn't just handle localization and platform specific resources, we can set up whatever resource categories we want. And we can even dynamically override individual resources.

  • The editor no longer needs to do anything special to deal with the concept of "properties". Resources that are used to override other resources can be edited in the editor just like any other resource.

  • And the system can easily be configured to be backwards compatible with the old localization system.

I still feel slightly queasy about using name matching to drive parts of this system. Name matching is a practice that can go horribly wrong. But in this case, since the name matching is completely user controlled I think it makes a good compromise between purity and usability.