Deep dive Tainted Grail [1] - Drake - runtime entity renderer registration system

The Name

Drake is a type of dragon. Dragons are gold keepers, and this system is also golden. :) With this system, I started a naming convention related to mythological creatures/entities.

Background

We faced significant performance issues with MeshRenderers. Internal managers for MeshRenderers were (and may still be) very CPU-intensive, as was frustum culling. However, we needed to use many renderers.
At the time, the Entities package was nearing version 1.0, and my tests with heavy renderer loads showed much better performance. The idea was straightforward: use Entities for rendering. The requirements were:

Minimal changes to artists' workflows
Only rendering changes (physics and other systems remain unchanged)
Interactivity with Unity's OOP world (most renderers are static, but hundreds require some interaction):
- Follow transforms
- Copy enable/disable state from corresponding GameObject
- Process material change requests (a recent addition)
Streaming via Addressables, as Entities' content management would duplicate assets (at the time, we were unaware of Addressables' scaling limitations)

Subscene systems were not feasible for this use case.

First steps

Let's start simple: take a MeshRenderer and convert it to an entity at runtime. That's easy with RenderMeshUtility.AddComponents. Nice win.
The next step was to remove MeshRenderer and move all required data to DrakeMeshRenderer:

Mesh
Materials
RenderMeshDescription

Editor code was implemented to convert MeshRenderer to DrakeMeshRenderer. The DrakeMeshRenderer itself is just a few lines of code:

// Assumed fields on DrakeMeshRenderer: _mesh, _materials, desc (RenderMeshDescription), _entities;
// entityManager is the EntityManager of the world we render in.
var count = _materials.Count;
var renderMeshArray = new RenderMeshArray(_materials, new Mesh[] { _mesh }); // Allocation :(
var localToWorld = new LocalToWorld { Value = transform.localToWorldMatrix };

_entities = new NativeArray<Entity>(count, Allocator.Persistent, NativeArrayOptions.UninitializedMemory);

for (var i = 0; i < count; ++i)
{
    var entity = entityManager.CreateEntity();

    // One entity per submesh: material i draws submesh i of mesh 0.
    RenderMeshUtility.AddComponents( // Heavy :(
        entity,
        entityManager,
        desc,
        renderMeshArray,
        MaterialMeshInfo.FromRenderMeshArrayIndices(i, 0, i)
    );
    entityManager.AddComponentData(entity, localToWorld);

    _entities[i] = entity;
}

LODs

Next, we needed to handle LODGroup conversion: create an entity with the LODGroup data and notify the rendering entities that they are linked to it.

Entities LodGroup

We need a MeshLODGroupComponent. The code is straightforward:

public void Initialize(LODGroup lodGroup)
{
    var transform = lodGroup.transform;
    var worldSize = LodUtils.GetWorldSpaceScale(transform) * lodGroup.size;
    localReferencePoint = lodGroup.localReferencePoint; // Local space

    lodDistances0 = new float4(float.PositiveInfinity);
    lodDistances1 = new float4(float.PositiveInfinity);
    var lods = lodGroup.GetLODs();
    for (var i = 0; i < lods.Length; ++i)
    {
        var d = worldSize / lods[i].screenRelativeTransitionHeight; // But here world space ugh
        if (i < 4)
        {
            lodDistances0[i] = d;
        }
        else
        {
            lodDistances1[i - 4] = d;
        }
    }
}
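To sanity-check the distance math outside Unity, here is a minimal plain-C# sketch of the same loop. `LodDistanceSketch.Compute` is a hypothetical helper, not part of Drake or Entities Graphics: each switch distance is the world-space size divided by that LOD's `screenRelativeTransitionHeight`, and the 8 returned slots correspond to `lodDistances0` (slots 0-3) and `lodDistances1` (slots 4-7), with unused slots left at infinity.

```csharp
using System;

// Hypothetical plain-C# mirror of Initialize(LODGroup); no Unity types involved.
public static class LodDistanceSketch
{
    // Returns 8 distances: slots 0-3 ~ lodDistances0, slots 4-7 ~ lodDistances1.
    public static float[] Compute(float worldSize, float[] screenRelativeTransitionHeights)
    {
        // Two float4 fields can hold at most 8 distances.
        if (screenRelativeTransitionHeights.Length > 8)
            throw new ArgumentException("at most 8 LODs fit into two float4 fields");

        var distances = new float[8];
        // Unused slots stay at infinity, like new float4(float.PositiveInfinity).
        Array.Fill(distances, float.PositiveInfinity);

        for (var i = 0; i < screenRelativeTransitionHeights.Length; ++i)
        {
            // A smaller screen-relative height means the switch happens further away.
            distances[i] = worldSize / screenRelativeTransitionHeights[i];
        }
        return distances;
    }
}
```

For a group 10 units across with transitions at 50%, 25% and 12.5% of screen height, this yields switch distances of 20, 40 and 80.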

DrakeRendererManager and modifications of DrakeMeshRenderer

Now DrakeMeshRenderer needs to add a MeshLODComponent to its entities. For that, we need to store the LOD mask at conversion time, and we need the entity spawned by DrakeLodGroup.
Let me introduce DrakeRendererManager, which creates both renderer and LOD group entities. Once the LOD group entity is created, we can pass it along when spawning the renderers.

For simplicity, we maintain a link between DrakeLodGroup and DrakeMeshRenderer:

DrakeMeshRenderer has an optional reference to DrakeLodGroup.
DrakeLodGroup has an array of DrakeMeshRenderer references, requiring at least one entry (otherwise, it’s an invalid LODGroup).

DrakeRendererManager has two Register methods: one for DrakeLodGroup (DLG) and one for DrakeMeshRenderer (DMR). The DMR method is also called by each DLG for every one of its child renderers. Both DLG and DMR call the appropriate method from Start.
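A minimal sketch of that ordering, assuming a hypothetical plain-C# model where entities are ints and renderers are identified by name. The LOD group is registered first, and each child renderer then receives the group entity plus a LOD mask whose bit i is set when the renderer appears in LOD i (the pair of values MeshLODComponent carries). `DrakeRendererManagerSketch`, `RegisterLodGroup` and `RendererLink` are illustrative names, not Drake's API.

```csharp
using System.Collections.Generic;

// Hypothetical link data: what a renderer entity needs to know about its LOD group.
public readonly struct RendererLink
{
    public readonly int Entity;
    public readonly int LodGroupEntity; // -1 when the renderer has no LOD group
    public readonly int LodMask;        // bit i set => visible in LOD i

    public RendererLink(int entity, int lodGroupEntity, int lodMask)
    {
        Entity = entity;
        LodGroupEntity = lodGroupEntity;
        LodMask = lodMask;
    }
}

public class DrakeRendererManagerSketch
{
    int _nextEntity;
    public readonly Dictionary<string, RendererLink> Renderers = new Dictionary<string, RendererLink>();

    // Conversion-time step: bit i of a renderer's mask is set if it is listed in LOD i.
    public static Dictionary<string, int> ComputeLodMasks(string[][] renderersPerLod)
    {
        var masks = new Dictionary<string, int>();
        for (var lod = 0; lod < renderersPerLod.Length; ++lod)
        {
            foreach (var renderer in renderersPerLod[lod])
            {
                masks.TryGetValue(renderer, out var mask); // 0 if not seen yet
                masks[renderer] = mask | (1 << lod);
            }
        }
        return masks;
    }

    // The group entity is created first, so it can be handed to every child renderer.
    public int RegisterLodGroup(string[][] renderersPerLod)
    {
        var groupEntity = _nextEntity++;
        foreach (var pair in ComputeLodMasks(renderersPerLod))
        {
            RegisterRenderer(pair.Key, groupEntity, pair.Value);
        }
        return groupEntity;
    }

    // Standalone renderers (no LOD group) keep the defaults: no group, empty mask.
    public int RegisterRenderer(string renderer, int lodGroupEntity = -1, int lodMask = 0)
    {
        var entity = _nextEntity++;
        Renderers[renderer] = new RendererLink(entity, lodGroupEntity, lodMask);
        return entity;
    }
}
```

Because the group entity exists before any child is registered, children never need a second pass to find their group.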

Streaming

Streaming is critical to load only necessary meshes and materials. RenderMeshArray and MaterialMeshInfo must be added and removed dynamically.
Addressables uses strings as keys, and strings can't be used in Burst-compiled code, so we store them indirectly.

Ladies and gentlemen, DrakeMaterialMeshInfo:

public struct DrakeMaterialMeshInfo : IComponentData
{
    public ushort meshIndex;
    public ushort materialIndex;
    public byte submesh;
}

Marvelous one. This component stores material and mesh indices, which need backing arrays to index into. This led to the creation of DrakeAddressablesManager.
If C# had header files, the one for DrakeAddressablesManager would look like this:

public class DrakeAddressablesManager
{
    Dictionary<string, ushort> _meshKeyToIndex;
    Dictionary<string, ushort> _materialKeyToIndex;

    List<AddressableLoadingData<Mesh>> _meshLoadingData;
    List<AddressableLoadingData<Material>> _materialLoadingData;

public:
    ushort RegisterMaterial(string materialKey);
    void StartLoadingMaterial(ushort materialIndex);
    void StartUnloadingMaterial(ushort materialIndex);

    ushort RegisterMesh(string meshKey);
    void StartLoadingMesh(ushort meshIndex);
    void StartUnloadingMesh(ushort meshIndex);

    Optional<BatchMaterialID> TryGetLoadedMaterial(ushort materialIndex);
    Optional<BatchMeshID> TryGetMesh(ushort meshIndex);
}

And in case you're wondering, here is AddressableLoadingData<T>:

public struct AddressableLoadingData<T> where T : Object
{
    public readonly string key;
    public AsyncOperationHandle<T> loadingHandle;
    public ushort counter;
}
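Register-once indices plus a per-asset counter are the core of this design. Below is a minimal plain-C# sketch that uses a fake synchronous loader in place of Addressables (`RefCountedRegistry`, `StartLoading`, `StartUnloading` and the `load`/`unload` delegates are hypothetical). Registering a key twice returns the same ushort index, loading starts only when the counter goes from 0 to 1, and unloading happens only when it drops back to 0.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the mesh or material half of DrakeAddressablesManager.
public class RefCountedRegistry<T> where T : class
{
    struct Entry
    {
        public string Key;
        public T Asset;        // stands in for the loaded result of AsyncOperationHandle<T>
        public ushort Counter; // how many renderers currently want this asset
    }

    readonly Dictionary<string, ushort> _keyToIndex = new Dictionary<string, ushort>();
    readonly List<Entry> _entries = new List<Entry>();
    readonly Func<string, T> _load;  // hypothetical loader (Addressables in the real system)
    readonly Action<T> _unload;

    public RefCountedRegistry(Func<string, T> load, Action<T> unload)
    {
        _load = load;
        _unload = unload;
    }

    // Same key => same index, so Burst-friendly components only need to store a ushort.
    public ushort Register(string key)
    {
        if (_keyToIndex.TryGetValue(key, out var index)) return index;
        if (_entries.Count > ushort.MaxValue) throw new InvalidOperationException("index space exhausted");
        index = (ushort)_entries.Count;
        _entries.Add(new Entry { Key = key });
        _keyToIndex.Add(key, index);
        return index;
    }

    public void StartLoading(ushort index)
    {
        var entry = _entries[index];
        if (entry.Counter++ == 0) entry.Asset = _load(entry.Key); // first user triggers the load
        _entries[index] = entry; // Entry is a struct: write the modified copy back
    }

    public void StartUnloading(ushort index)
    {
        var entry = _entries[index];
        if (entry.Counter == 0) throw new InvalidOperationException("unbalanced unload");
        if (--entry.Counter == 0)
        {
            _unload(entry.Asset); // last user releases the asset
            entry.Asset = null;
        }
        _entries[index] = entry;
    }

    // Mirrors TryGetLoadedMaterial/TryGetMesh: null means "not loaded (yet)".
    public T TryGetLoaded(ushort index) => _entries[index].Asset;
}
```

In the real system loading is asynchronous, which is why a separate system checks TryGetLoadedMaterial and TryGetMesh on a later frame instead of reading the asset immediately.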

During registration, we register the keys and add DrakeMaterialMeshInfo instead of RenderMeshArray and MaterialMeshInfo. Then a simple check decides whether the renderer should be loaded (by comparing its distance to the camera against the LOD distance). If so, StartLoadingMaterial and StartLoadingMesh are called with the corresponding indices.
On the next frame, another system checks TryGetLoadedMaterial and TryGetMesh. If both succeed, MaterialMeshInfo is created and added to the entity.

You may ask:
What about RenderMeshArray?

It's wasteful here: we already know exactly when resources finish loading. So we replicate RenderMeshArray's functionality manually by obtaining EntitiesGraphicsSystem and calling (Un)RegisterMesh or (Un)RegisterMaterial. This eliminates multiple managed array creations - double win.

Mipmaps streaming

It's not the topic of this post, but it deserves a quick mention.
Once mesh loading is complete, we collect UV distribution data. When material loading is finished, we register the material with the mipmap streaming system, which provides a MaterialMipMapsStreamingHandle. Once both the mesh and material are loaded and the Entity is being spawned, UVDistributionComponent and MipmapsStreamingComponent are added to it. From then on, mipmap streaming systems can process these and update the required mip level.

Let's move it

Another requirement: the ability to link a Transform with a renderer entity.
Transforms are one of Unity’s Four Horsemen of the Apocalypse. To avoid killing performance, we use TransformAccessArray.

We model this as: after the simulation step, run IJobParallelForTransform to write transform data back to linked Entities.
The challenge lies in the near-nonexistent documentation for TransformAccessArray. Let me reiterate: one of the most impactful and essential optimization tools lacks documentation - good job, Unity, as always :)

It's worth mentioning how we store transforms in an ECS unmanaged component:

public struct LinkedTransformComponent : IComponentData, IEquatable<LinkedTransformComponent>
{
    public readonly UnityObjectRef<Transform> transform;

    public LinkedTransformComponent(Transform transform) {
        this.transform = transform;
    }

    public bool Equals(LinkedTransformComponent other) {
        return transform.Equals(other.transform);
    }

    public override int GetHashCode() {
        return transform.GetHashCode();
    }
}

The registration method collects new entities and registers each:

int Register(Entity entity, Transform transform)
{
    if (_freeIds.Length > 0) {
        var lastIndex = _freeIds.Length - 1;
        var index = _freeIds[lastIndex];
        _freeIds.RemoveAtSwapBack(lastIndex);
        _transformsArray[index] = transform;
        _linkedTransformEntities[index] = entity;
        return index;
    }
    var newId = _transformsArray.length;
    _transformsArray.Add(transform);
    _linkedTransformEntities.Add(entity);

    return newId;
}

_freeIds tracks unoccupied indices in _transformsArray. At the time, leaving removed transforms in TransformAccessArray was faster than removing them and adjusting the indices stored on Entities. We also use LinkedTransformIndexComponent, an ICleanupComponentData storing the index to free; because it is a cleanup component, it survives entity destruction until the index has been returned to _freeIds.
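The slot reuse above is a classic free list. Below is a minimal plain-C# sketch with `List<T>` standing in for TransformAccessArray and NativeList (`TransformSlots`, `Unregister` and `ForEachLive` are hypothetical names). Unregistering nulls the slot and pushes its index, so no other entity's index ever changes, and the write-back loop (the IJobParallelForTransform body in the real system) skips empty slots.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model: TTransform stands in for Transform, entities are ints.
public class TransformSlots<TTransform> where TTransform : class
{
    readonly List<TTransform> _transforms = new List<TTransform>(); // ~ _transformsArray
    readonly List<int> _linkedEntities = new List<int>();           // ~ _linkedTransformEntities
    readonly List<int> _freeIds = new List<int>();                  // ~ _freeIds

    public int Register(int entity, TTransform transform)
    {
        if (_freeIds.Count > 0)
        {
            // Reuse the most recently freed slot (pop from the back, O(1)).
            var lastIndex = _freeIds.Count - 1;
            var index = _freeIds[lastIndex];
            _freeIds.RemoveAt(lastIndex);
            _transforms[index] = transform;
            _linkedEntities[index] = entity;
            return index;
        }
        var newId = _transforms.Count;
        _transforms.Add(transform);
        _linkedEntities.Add(entity);
        return newId;
    }

    // Called with the index stored in the cleanup component once the entity is gone.
    public void Unregister(int index)
    {
        _transforms[index] = null; // leave a hole instead of compacting
        _linkedEntities[index] = -1;
        _freeIds.Add(index);
    }

    // Stand-in for the transform job: visit every live (entity, transform) pair.
    public void ForEachLive(Action<int, TTransform> writeToEntity)
    {
        for (var i = 0; i < _transforms.Count; ++i)
        {
            if (_transforms[i] == null) continue; // empty slot
            writeToEntity(_linkedEntities[i], _transforms[i]);
        }
    }
}
```

Leaving holes keeps every stored index stable; the trade-off is that the array never shrinks below its high-water mark.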

Two-way link

We can read managed data (track transform values), but we also need to manipulate Entities from the managed side.
If requested, a LinkedEntitiesAccess MonoBehaviour is added during registration, holding a list of Entities linked to the GameObject. A challenge arises because registration uses an EntityCommandBuffer, so the Entity isn't available immediately. As a workaround, registration creates a LinkedEntitiesAccessRequest with a UnityObjectRef to the LinkedEntitiesAccess MonoBehaviour. After ECB playback, a system processes the requests and registers the now-valid Entities with the given LinkedEntitiesAccess. This requires a managed, non-Burst system, but I'm not sure there's a better approach.
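The deferred-request workaround can be modeled without Unity. In this hypothetical sketch (`DeferredBuffer`, `AccessSketch` and `AddLinkRequest` are illustrative, not Entities API), the buffer hands out placeholder ids (negative here) standing in for deferred ECB entities. Requests recorded against placeholders are only resolved after `Playback` has assigned real ids, which is the step the managed system performs in Drake.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for LinkedEntitiesAccess: a list of real entity ids.
public class AccessSketch
{
    public readonly List<int> Entities = new List<int>();
}

// Hypothetical stand-in for an EntityCommandBuffer with deferred entity creation.
public class DeferredBuffer
{
    int _pendingCount;
    readonly List<(int placeholder, AccessSketch access)> _requests = new List<(int, AccessSketch)>();

    // Like ECB.CreateEntity: the returned id is only a placeholder until playback.
    public int CreateEntity() => -(++_pendingCount);

    // Like adding a LinkedEntitiesAccessRequest that references the MonoBehaviour.
    public void AddLinkRequest(int placeholder, AccessSketch access) => _requests.Add((placeholder, access));

    // Playback assigns real ids; then the requests are resolved against them.
    public void Playback(ref int nextRealEntity)
    {
        var remap = new Dictionary<int, int>();
        for (var i = 1; i <= _pendingCount; ++i) remap[-i] = nextRealEntity++;

        foreach (var (placeholder, access) in _requests)
        {
            // Only now is the entity valid, so it can be handed to the managed side.
            access.Entities.Add(remap[placeholder]);
        }
        _requests.Clear();
        _pendingCount = 0;
    }
}
```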

Enable/Disable/Destroy

With access to the entities, we can react to GameObject state changes: hide the entities when the GameObject is disabled, show them when it's enabled, and destroy the renderers together with the GameObject.

Materials

We often need to modify a renderer's material (per renderer only, not the material asset itself). There are two main kinds of modification:

Replace value

This uses custom components tagged with MaterialPropertyAttribute from Entities Graphics. No custom modifications are required, only a smart editor to display possible changes.

Replace whole material

To replace a material, we first register the new material with DrakeResourcesManager, which returns a material index for creating a DrakeMaterialMeshInfo.
The next step is tricky, as it depends on the current state of the DrakeMeshRenderer. There are five states to address, but all of them involve replacing the old DrakeMaterialMeshInfo with a new one. The additional operations mainly consist of unloading the previously loaded material (decrementing its refcount) and adding or removing certain state components.
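The refcount bookkeeping is the part shared by every state. Below is a hypothetical plain-C# sketch in which a single `isLoaded` flag stands in for the real state components, and `MaterialSwapSketch` with its `Counters` list stands in for the resources manager. It registers the new material and replaces the material index; if the renderer was loaded, it releases the old material and takes a reference on the new one so the counters stay balanced.

```csharp
using System.Collections.Generic;

// Mirrors DrakeMaterialMeshInfo from the post (without IComponentData).
public struct DrakeMaterialMeshInfo
{
    public ushort meshIndex;
    public ushort materialIndex;
    public byte submesh;
}

// Hypothetical model of the material side of the resources manager.
public class MaterialSwapSketch
{
    readonly Dictionary<string, ushort> _keyToIndex = new Dictionary<string, ushort>();
    public readonly List<int> Counters = new List<int>(); // refcount per material index

    public ushort RegisterMaterial(string key)
    {
        if (!_keyToIndex.TryGetValue(key, out var index))
        {
            index = (ushort)Counters.Count;
            _keyToIndex.Add(key, index);
            Counters.Add(0);
        }
        return index;
    }

    // isLoaded stands in for whichever state components the renderer currently has.
    public DrakeMaterialMeshInfo ReplaceMaterial(DrakeMaterialMeshInfo current, string newMaterialKey, bool isLoaded)
    {
        var replacement = current; // mesh and submesh stay the same
        replacement.materialIndex = RegisterMaterial(newMaterialKey);
        if (isLoaded && replacement.materialIndex != current.materialIndex)
        {
            Counters[current.materialIndex]--;     // release the old material
            Counters[replacement.materialIndex]++; // the real system would start loading it here
        }
        return replacement; // the caller swaps the component on the entity
    }
}
```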

Scene unloaded

Scene lifetime management is required.
Startup is easy using MonoBehaviour's Start (preferred, as spawning agents can still edit properties before it runs, e.g., mark a Drake as non-static).
Unloading is much trickier, as no MonoBehaviour remains (see the Optimize/Leftovers section). To solve that, I created a SystemRelatedLifeTime<T> class with a nested IdComponent: an ISharedComponentData holding a single int id.
To make such a generic component work, you need to register every concrete usage with a line like:

[assembly: RegisterGenericComponentType(typeof(SystemRelatedLifeTime<DrakeRendererManager>.IdComponent))]

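Now you can query over that component, filter for a specific IdComponent value, and destroy all matching entities. Since shared component values are stored per chunk, the filter selects whole chunks, which makes this highly efficient and elegant.
For scenes, the id is simply the scene handle value.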
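Conceptually, the shared IdComponent partitions entities into per-scene buckets, so unloading a scene becomes one bulk destroy instead of per-entity bookkeeping. In Entities this would be an EntityQuery with `SetSharedComponentFilter` passed to `EntityManager.DestroyEntity`; the plain-C# model below (`LifetimeBuckets` is hypothetical) only illustrates the bucketing idea.

```csharp
using System.Collections.Generic;

// Hypothetical model: entities grouped by a shared int id (the scene handle).
public class LifetimeBuckets
{
    readonly Dictionary<int, List<int>> _entitiesById = new Dictionary<int, List<int>>();

    public void Add(int sceneId, int entity)
    {
        if (!_entitiesById.TryGetValue(sceneId, out var list))
        {
            list = new List<int>();
            _entitiesById.Add(sceneId, list);
        }
        list.Add(entity);
    }

    // Scene unloaded: drop the whole bucket at once; returns how many entities were destroyed.
    public int DestroyAll(int sceneId)
    {
        if (!_entitiesById.TryGetValue(sceneId, out var list)) return 0;
        _entitiesById.Remove(sceneId);
        return list.Count;
    }

    public int Count(int sceneId) => _entitiesById.TryGetValue(sceneId, out var list) ? list.Count : 0;
}
```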

Vertex snap hack

Our artists often use the vertex snap tool in the editor, so Drake needs to support it; unfortunately, there is no such functionality in Entities. As a workaround, we made a special mode where, instead of creating entities, we spawn the old LODGroups and MeshRenderers as children hidden in the hierarchy. That made the editor flow very complicated (keeping track of all Drakes, spawning in two ways, enter/exit play mode edge cases, prefab editing nightmares, duplication edge cases, undo, and so on).
There isn't much more to say, except that supporting the Unity editor is very painful.

Optimize

Leftovers

Once Drakes are registered, there is no need to keep their corresponding MonoBehaviours. In most cases, these are the only components on a GameObject (apart from its Transform), allowing the GameObject to be destroyed. Often, this leaves the parent GameObject as a leaf node with no components, which can also be purged, and the cycle continues up the hierarchy until it reaches a GameObject that is still needed.

This results in shallower hierarchies and allows many small memory allocations to be reclaimed sooner (after a garbage collector run, of course).
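The upward purge can be sketched as a loop over a hypothetical node type. `Node` stands in for a GameObject, and its component list leaves out the Transform every GameObject has. `PruneUpwards` and its `stopAt` boundary are assumptions. After the Drake component is removed, a node with no components and no children is destroyed, then its parent is checked the same way.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for a GameObject: components (excluding Transform) and children.
public class Node
{
    public Node Parent;
    public readonly List<Node> Children = new List<Node>();
    public readonly List<string> Components = new List<string>();

    public Node AddChild(params string[] components)
    {
        var child = new Node { Parent = this };
        child.Components.AddRange(components);
        Children.Add(child);
        return child;
    }
}

public static class LeftoverPruner
{
    // Removes the Drake component, then walks up while nodes are empty leaves.
    // stopAt marks the ancestor that must survive (the "still needed" GameObject).
    public static int PruneUpwards(Node node, string drakeComponent, Node stopAt)
    {
        node.Components.Remove(drakeComponent);
        var destroyed = 0;
        while (node != null && node != stopAt && node.Components.Count == 0 && node.Children.Count == 0)
        {
            var parent = node.Parent;
            parent?.Children.Remove(node); // "destroy" the empty leaf
            node.Parent = null;
            destroyed++;
            node = parent;
        }
        return destroyed;
    }
}
```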

Static

If you check the Entities Graphics code, you'll find that the LOD group entity exists only to update components on the rendering entities.
For static groups with no moving parts, we can skip spawning the LOD group entity and assign the final values to the renderers at spawn time.
The plan is simple: for a static group, spawn only fully set up rendering entities. First, we need to determine whether the prefab is static. I used the static flag from the GameObject, but since this flag is unavailable in builds, it must be cached during baking.

EntityCommandBuffer

You might hear that EntityManager should be used whenever possible because it's faster than EntityCommandBuffer. We tested this, but EntityManager was significantly slower in our case. At Start, we spawn thousands of renderer entities and hundreds of LOD group entities, each followed by a chain of operations. Most operations occur after a scene loads, though a few take place during gameplay.

For a LOD group, the chain looks like this:

Create Entity
Set LocalToWorld
Set MeshLODGroupComponent
Set LinkedTransformComponent
Set IdComponent (shared component)
[Optionally] Add LinkedEntitiesAccessRequest

For a renderer:

Create Entity
Set LocalToWorld
Set WorldRenderBounds
Set DrakeMeshMaterialComponent
[Optionally] Set MeshLODComponent
[Optionally] Set DrakeRendererVisibleRangeComponent
[Optionally] Set LODRange
[Optionally] Set LODWorldReferencePoint
[Optionally] Set LinkedTransformComponent
[Optionally] Set RenderBounds
[Optionally] Set LinkedTransformLocalToWorldOffsetComponent
Set RenderFilterSettings (shared component)
Set IdComponent (shared component)
[Optionally] Add LinkedEntitiesAccessRequest

Archetypes

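Adding components one at a time is expensive, because each addition moves the entity to a different archetype (a structural change). So we create Entities with the correct archetype up front, using DrakeRendererArchetypeKey. It covers all configurations, and the manager pre-creates an archetype for each.
The main logic looks like this: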

static DrakeRendererArchetypeKey[] CreateAllValues() {
    // _static * _isTransparent * _hasLod * _inMotionPass * _lightProbeUsage * _hasShadowsOverriden * _hasLocalToWorldOffset
    DrakeRendererArchetypeKey[] values = new DrakeRendererArchetypeKey[2*2*2*2*4*2*2];
    int index = 0;
    for (int i = 0; i < 2; i++) {
        var isStatic = i == 1;
        for (int j = 0; j < 2; j++) {
            var isTransparent = j == 1;
            for (int k = 0; k < 2; k++) {
                var hasLod = k == 1;
                for (int l = 0; l < 2; l++) {
                    var inMotion = l == 1;
                    for (int m = 0; m < 4; m++) {
                        var lightProbeUsage = m == 0 ? LightProbeUsage.Off : (LightProbeUsage)(1 << (m-1));
                        for (int n = 0; n < 2; n++) {
                            var hasShadowsOverriden = n == 1;
                            for (int o = 0; o < 2; o++) {
                                var hasLocalToWorldOffset = o == 1;
                                values[index++] = new DrakeRendererArchetypeKey(isStatic, isTransparent, hasLod, inMotion, lightProbeUsage, hasShadowsOverriden, hasLocalToWorldOffset);
                            }
                        }
                    }
                }
            }
        }
    }
    return values;
}

var archetypeKeys = DrakeRendererArchetypeKey.All;
_entityArchetypes = new NativeHashMap<DrakeRendererArchetypeKey, EntityArchetype>(archetypeKeys.Length, Allocator.Domain);
foreach (var archetypeKey in archetypeKeys) {
    _entityArchetypes.Add(archetypeKey, CreateArchetype(archetypeKey, entityManager));
}

// From Unity.Rendering.RenderMeshUtility.EntitiesGraphicsComponentTypes
EntityArchetype CreateArchetype(DrakeRendererArchetypeKey archetypeKey, EntityManager entityManager)
{
    var components = new UnsafeList<ComponentType>(24, ARAlloc.Temp) {
        ComponentType.ReadWrite<WorldRenderBounds>(),
        ComponentType.ReadWrite<DrakeMeshMaterialComponent>(),
        ComponentType.ReadWrite<PerInstanceCullingTag>(),
        ComponentType.ReadWrite<WorldToLocal_Tag>(),
        ComponentType.ReadWrite<LocalToWorld>(),
        ComponentType.ReadWrite<MipmapsFactorComponent>(),

        ComponentType.ChunkComponent<ChunkWorldRenderBounds>(),

        ComponentType.ReadWrite<RenderFilterSettings>(),
        ComponentType.ReadWrite<SystemRelatedLifeTime<DrakeRendererManager>.IdComponent>(),
        ComponentType.ReadWrite<ShadowsProcessedTag>(),
    };

    if (archetypeKey.isStatic)
    {
        components.Add(ComponentType.ReadWrite<Static>());
    }
    else
    {
        components.Add(ComponentType.ReadWrite<LinkedTransformComponent>());
        components.Add(ComponentType.ReadWrite<RenderBounds>());
        if (archetypeKey.inMotionPass)
        {
            components.Add(ComponentType.ReadWrite<BuiltinMaterialPropertyUnity_MatrixPreviousM>());
        }
        if (archetypeKey.hasLocalToWorldOffset)
        {
            components.Add(ComponentType.ReadWrite<LinkedTransformLocalToWorldOffsetComponent>());
        }
    }

    if (archetypeKey.hasLodGroup)
    {
        components.Add(ComponentType.ReadWrite<DrakeRendererVisibleRangeComponent>());
        components.Add(ComponentType.ReadWrite<LODRange>());
        components.Add(ComponentType.ReadWrite<LODWorldReferencePoint>());
        if (!archetypeKey.isStatic)
        {
            components.Add(ComponentType.ReadWrite<MeshLODComponent>());
        }
    }
    else
    {
        components.Add(ComponentType.ReadWrite<DrakeRendererLoadRequestTag>());
    }

    if (archetypeKey.isTransparent)
    {
        components.Add(ComponentType.ReadWrite<DepthSorted_Tag>());
    }
    if (archetypeKey.lightProbeUsage == LightProbeUsage.BlendProbes)
    {
        components.Add(ComponentType.ReadWrite<BlendProbeTag>());
    }
    else if (archetypeKey.lightProbeUsage == LightProbeUsage.CustomProvided)
    {
        components.Add(ComponentType.ReadWrite<CustomProbeTag>());
    }
    if (archetypeKey.hasShadowsOverriden)
    {
        components.Add(ComponentType.ReadWrite<ShadowsChangedTag>());
    }
#if UNITY_EDITOR
#if DEBUG
    components.Add(ComponentType.ReadWrite<CullingDistancePreviewComponent>());
#endif
    if (!UnityEditor.EditorPrefs.GetBool("showEntities", false)) {
        components.Add(ComponentType.ReadWrite<EntityGuid>());
    }

    components.Add(ComponentType.ReadWrite<EditorRenderData>());
#endif

    var archetype = entityManager.CreateArchetype(components.AsNativeArray());
    components.Dispose();
    return archetype;
}

This reduces runtime operations significantly.
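The nested loops enumerate 2·2·2·2·4·2·2 = 256 keys. An alternative, shown here as a swapped-in technique rather than Drake's code, is to treat the key as a mixed-radix number and decode any index in [0, 256) into the seven fields in the same order the loops produce. `ArchetypeKeySketch` is hypothetical. Its light-probe values mirror the Off/BlendProbes/UseProxyVolume/CustomProvided = 0/1/2/4 mapping produced by `(LightProbeUsage)(1 << (m-1))`.

```csharp
using System;

// Hypothetical mirror of DrakeRendererArchetypeKey using plain types.
public readonly struct ArchetypeKeySketch : IEquatable<ArchetypeKeySketch>
{
    public const int Count = 2 * 2 * 2 * 2 * 4 * 2 * 2; // 256

    public readonly bool IsStatic, IsTransparent, HasLod, InMotionPass, HasShadowsOverriden, HasLocalToWorldOffset;
    public readonly int LightProbeUsage; // 0 = Off, 1 = BlendProbes, 2 = UseProxyVolume, 4 = CustomProvided

    ArchetypeKeySketch(int index)
    {
        if (index < 0 || index >= Count) throw new ArgumentOutOfRangeException(nameof(index));
        // Innermost loop variable first, matching the nested-loop order of CreateAllValues.
        HasLocalToWorldOffset = index % 2 == 1; index /= 2;
        HasShadowsOverriden = index % 2 == 1; index /= 2;
        var m = index % 4; index /= 4;
        LightProbeUsage = m == 0 ? 0 : 1 << (m - 1);
        InMotionPass = index % 2 == 1; index /= 2;
        HasLod = index % 2 == 1; index /= 2;
        IsTransparent = index % 2 == 1; index /= 2;
        IsStatic = index % 2 == 1;
    }

    public static ArchetypeKeySketch FromIndex(int index) => new ArchetypeKeySketch(index);

    public bool Equals(ArchetypeKeySketch o) =>
        IsStatic == o.IsStatic && IsTransparent == o.IsTransparent && HasLod == o.HasLod &&
        InMotionPass == o.InMotionPass && LightProbeUsage == o.LightProbeUsage &&
        HasShadowsOverriden == o.HasShadowsOverriden && HasLocalToWorldOffset == o.HasLocalToWorldOffset;

    public override bool Equals(object obj) => obj is ArchetypeKeySketch other && Equals(other);

    public override int GetHashCode() =>
        HashCode.Combine(IsStatic, IsTransparent, HasLod, InMotionPass, LightProbeUsage, HasShadowsOverriden, HasLocalToWorldOffset);
}
```

Several keys produce identical component sets (for instance, static keys that differ only in inMotionPass or hasLocalToWorldOffset, since CreateArchetype reads those flags only in the non-static branch). EntityManager.CreateArchetype returns the same archetype for an identical set of types, so these duplicates only cost extra map entries.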

Scene parts hide/show

In the “Two-Way Link” section, I mentioned that entity access is managed per LOD hierarchy. That isn't always optimal. In the game, we handle several map region changes that involve hiding or showing many such hierarchies, which leads to numerous small operations for each action and requires many LinkedEntitiesAccess MonoBehaviours.
To address it, we introduced SharedLinkedEntitiesAccess: if available in a parent, we register entities to it instead of creating a new LinkedEntitiesAccess.
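The fallback can be sketched as a parent walk over a hypothetical node model (`SceneNode` and `AccessResolver.ResolveAccess` are illustrative). In Unity, a `GetComponentInParent<SharedLinkedEntitiesAccess>()` call could play the same role as the loop. The nearest shared access wins; only when none exists is a per-hierarchy list created.

```csharp
using System.Collections.Generic;

// Hypothetical hierarchy node; entity lists are plain ints.
public class SceneNode
{
    public SceneNode Parent;
    public List<int> SharedAccess; // non-null => this node has a SharedLinkedEntitiesAccess
    public List<int> OwnAccess;    // created on demand => a per-hierarchy LinkedEntitiesAccess
}

public static class AccessResolver
{
    // Prefer the nearest shared access up the hierarchy; otherwise create a local one.
    public static List<int> ResolveAccess(SceneNode node)
    {
        for (var current = node; current != null; current = current.Parent)
        {
            if (current.SharedAccess != null) return current.SharedAccess;
        }
        return node.OwnAccess ??= new List<int>();
    }
}
```

Hiding a region then becomes a single pass over one list instead of one call per LinkedEntitiesAccess.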

Merged drake

We have a lot of static Drakes, and each renderer and LOD group requires a short-lived MonoBehaviour. That means a lot of small allocations.
Lots of small allocations are the very definition of bad memory management.
The solution is fairly simple:

At build:

Collect all static Drakes from scene
Create a single MergedDrake GameObject with GUID
Collect all data required to spawn collected Drakes
Serialize data as binary into file in StreamingAssets, file is addressed by GUID
Remove processed Drakes

At runtime:

Load data into Temp native allocation
Register all meshes and materials
Spawn Drakes via a bursted job
Release data

This requires only one MonoBehaviour with a single field (GUID) and, as a bonus, spawning is now Burst-compiled. A big win for simple batching.
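The build/runtime split boils down to writing fixed-size records once and reading them back in one pass. Below is a hypothetical sketch using `BinaryWriter`/`BinaryReader`. The `SpawnRecord` layout is an assumption, with mesh and material indices taken to point into key tables stored alongside; the post does not describe the real file format, and the runtime version reads into a Temp native allocation for a Burst job rather than a managed array.

```csharp
using System.IO;

// Hypothetical per-renderer record; the real MergedDrake layout is not described in the post.
public struct SpawnRecord
{
    public ushort MeshIndex;
    public ushort MaterialIndex;
    public byte Submesh;
    public float PosX, PosY, PosZ;
}

public static class MergedDrakeFile
{
    // Build time: record count followed by tightly packed records.
    public static byte[] Write(SpawnRecord[] records)
    {
        using var stream = new MemoryStream();
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(records.Length);
            foreach (var r in records)
            {
                writer.Write(r.MeshIndex);
                writer.Write(r.MaterialIndex);
                writer.Write(r.Submesh);
                writer.Write(r.PosX);
                writer.Write(r.PosY);
                writer.Write(r.PosZ);
            }
        }
        return stream.ToArray(); // MemoryStream.ToArray still works after the writer closed it
    }

    // Runtime: read everything in one pass, then hand the records to the spawner.
    public static SpawnRecord[] Read(byte[] bytes)
    {
        using var reader = new BinaryReader(new MemoryStream(bytes));
        var records = new SpawnRecord[reader.ReadInt32()];
        for (var i = 0; i < records.Length; ++i)
        {
            records[i] = new SpawnRecord
            {
                MeshIndex = reader.ReadUInt16(),
                MaterialIndex = reader.ReadUInt16(),
                Submesh = reader.ReadByte(),
                PosX = reader.ReadSingle(),
                PosY = reader.ReadSingle(),
                PosZ = reader.ReadSingle(),
            };
        }
        return records;
    }
}
```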

Closing

As you can see, even a 'simple' runtime entity renderer registration system can be complex.
At the same time, you may notice that even a straightforward system, when scaled, reveals new optimization opportunities.
To see it in action, try playing Tainted Grail: The Fall of Avalon and notice how much rendering is going on (keep in mind there is no occlusion culling) :)

KamilVDono @kamilvdono