filament Rendering Engine Travelogue #4


filament Rendering Engine Travelogue #4 - RenderPass (2)

J'Heel; 2025. 5. 22. 10:00

I spent the holidays being lazy and idle, and time really flew by.

Quite a lot of time has passed since my last post... Since I'm running late, I should probably pick up the pace...?

Anyway, picking up from the previous post, this is part two of RenderPass.

To summarize what we've found so far: rendering in filament basically works in units of a class called RenderPass, and you create a RenderPass object by handing it the data to render through a RenderPassBuilder.

RenderPass::RenderPass(FEngine& engine, RenderPassBuilder const& builder) noexcept
        : mRenderableSoa(*builder.mRenderableSoa),
          mScissorViewport(builder.mScissorViewport),
          mCustomCommands(engine.getPerRenderPassArena()) {

    // compute the number of commands we need
    updateSummedPrimitiveCounts(
            const_cast<FScene::RenderableSoa&>(mRenderableSoa), builder.mVisibleRenderables);

    uint32_t commandCount =
            FScene::getPrimitiveCount(mRenderableSoa, builder.mVisibleRenderables.last);
    const bool colorPass  = bool(builder.mCommandTypeFlags & CommandTypeFlags::COLOR);
    const bool depthPass  = bool(builder.mCommandTypeFlags & CommandTypeFlags::DEPTH);
    commandCount *= uint32_t(colorPass * 2 + depthPass);
    commandCount += 1; // for the sentinel

    uint32_t const customCommandCount =
            builder.mCustomCommands.has_value() ? builder.mCustomCommands->size() : 0;

    Command* const commandBegin = builder.mArena.alloc<Command>(commandCount + customCommandCount);
    Command* commandEnd = commandBegin + (commandCount + customCommandCount);
    assert_invariant(commandBegin);

    if (UTILS_UNLIKELY(builder.mArena.getAllocator().isHeapAllocation(commandBegin))) {
        static bool sLogOnce = true;
        if (UTILS_UNLIKELY(sLogOnce)) {
            sLogOnce = false;
            PANIC_LOG("RenderPass arena is full, using slower system heap. Please increase "
                      "the appropriate constant (e.g. FILAMENT_PER_RENDER_PASS_ARENA_SIZE_IN_MB).");
        }
    }

    appendCommands(engine, { commandBegin, commandCount },
            builder.mUboHandle,
            builder.mVisibleRenderables,
            builder.mCommandTypeFlags,
            builder.mFlags,
            builder.mVisibilityMask,
            builder.mVariant,
            builder.mCameraPosition,
            builder.mCameraForwardVector);

    if (builder.mCustomCommands.has_value()) {
        Command* p = commandBegin + commandCount;
        for (auto [channel, passId, command, order, fn]: builder.mCustomCommands.value()) {
            appendCustomCommand(p++, channel, passId, command, order, fn);
        }
    }

    // sort commands once we're done adding commands
    commandEnd = resize(builder.mArena,
            RenderPass::sortCommands(commandBegin, commandEnd));

    if (engine.isAutomaticInstancingEnabled()) {
        int32_t stereoscopicEyeCount = 1;
        if (builder.mFlags & IS_INSTANCED_STEREOSCOPIC) {
            stereoscopicEyeCount *= engine.getConfig().stereoscopicEyeCount;
        }
        commandEnd = resize(builder.mArena,
                instanceify(engine, commandBegin, commandEnd, stereoscopicEyeCount));
    }

    // these are `const` from this point on...
    mCommandBegin = commandBegin;
    mCommandEnd = commandEnd;
}


In the previous post we looked at the call to updateSummedPrimitiveCounts() and at how that function works.

That post got as far as this: updateSummedPrimitiveCounts() counts the sub-meshes (primitives) to be rendered in this RenderPass and records those counts.

That part is just the warm-up and not very important. The code that follows is the real core of the RenderPass constructor.

uint32_t commandCount =
        FScene::getPrimitiveCount(mRenderableSoa, builder.mVisibleRenderables.last);
const bool colorPass  = bool(builder.mCommandTypeFlags & CommandTypeFlags::COLOR);
const bool depthPass  = bool(builder.mCommandTypeFlags & CommandTypeFlags::DEPTH);
commandCount *= uint32_t(colorPass * 2 + depthPass);
commandCount += 1; // for the sentinel

uint32_t const customCommandCount =
        builder.mCustomCommands.has_value() ? builder.mCustomCommands->size() : 0;

Command* const commandBegin = builder.mArena.alloc<Command>(commandCount + customCommandCount);
Command* commandEnd = commandBegin + (commandCount + customCommandCount);
assert_invariant(commandBegin);

First, FScene::getPrimitiveCount() reads a count from the renderableSoa. This value is the number of sub-meshes to render that updateSummedPrimitiveCounts() computed earlier, and it becomes the command count after the multiplication that follows.

The summed primitive count (the number of sub-meshes) computed by updateSummedPrimitiveCounts() is stored in mRenderableSoa, so this step simply reads it back out of mRenderableSoa.

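Next, the colorPass and depthPass flags are defined. If mCommandTypeFlags contains the COLOR flag, this pass does color rendering, and the same goes for DEPTH.

If you were writing a RenderPass that draws 3D objects to the screen in the usual way, wouldn't both color and depth come out true? Then again, the variable names feel a little off for that. Let's set aside wild guesses for now and look at the next line.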

commandCount *= uint32_t(colorPass * 2 + depthPass);

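The summed primitive count from above is multiplied by colorPass * 2 + depthPass. Since colorPass and depthPass are both bools, colorPass * 2 + depthPass can only be 0, 1, 2 or 3, so commandCount ends up 0 to 3 times the primitive count.

At this point the intent behind colorPass and depthPass starts to make sense.

If only depthPass is used, the multiplier is 1, which presumably means each primitive is rendered once to produce depth data. If both color and depth are used, it would mean one full rendering pass for depth, followed by another full pass over everything for color.

What's odd, though, is that colorPass is multiplied by 2. That would mean every primitive gets rendered twice in a color pass... Let's just keep that in mind and follow along a bit further.

Then 1 is added to commandCount. As the comment says, this is for the sentinel: a dummy command that marks the end of the command list. When processing a list like this, nothing beats a dummy terminator for convenience compared with complicated edge-case handling.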
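The sizing rule and the reason for the sentinel fit in a few lines. `commandSlots` reproduces the arithmetic from the constructor. `countUntilSentinel` is a hypothetical consumer, not filament's code, that shows why a terminator is convenient: the loop needs no separate length or bounds check, only the sentinel comparison. Using the maximum uint64_t as `kSentinel` is an assumption standing in for Pass::SENTINEL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Assumed stand-in for Pass::SENTINEL.
constexpr uint64_t kSentinel = std::numeric_limits<uint64_t>::max();

struct Command { uint64_t key; };

// Worst-case slot count, following the formula in the RenderPass constructor:
// 2 slots per primitive for color, 1 for depth, plus 1 for the sentinel.
uint32_t commandSlots(uint32_t primitiveCount, bool colorPass, bool depthPass) {
    uint32_t const perPrimitive = uint32_t(colorPass * 2 + depthPass);  // 0..3
    return primitiveCount * perPrimitive + 1;
}

// Hypothetical consumer: it needs no size, it just stops at the first sentinel.
size_t countUntilSentinel(Command const* first) {
    size_t n = 0;
    while (first->key != kSentinel) {
        ++first;
        ++n;
    }
    return n;
}
```

For 10 primitives, a depth-only pass reserves 11 slots and a color-only pass reserves 21.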

Command* const commandBegin = builder.mArena.alloc<Command>(commandCount + customCommandCount);
Command* commandEnd = commandBegin + (commandCount + customCommandCount);
assert_invariant(commandBegin);

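Next, storage for the commands is allocated from the builder's arena.

If you keep following the type of builder.mArena, you end up at a long and complicated-looking template class, utils::Arena<utils::LinearAllocatorWithFallback, utils::LockingPolicy::NoLock, utils::TrackingPolicy::HighWatermark, utils::AreaPolicy::StaticArea>... Put simply, it is essentially a fixed-size, preallocated buffer, much like a static array.

Once it is set up, its maximum capacity never changes, and calling alloc hands out a region of memory from it. Because allocation is linear (each request is carved out right after the previous one, and nothing is freed individually), you can allocate from this Arena as many times as you like without fragmenting memory.

In any case, this allocates enough memory to hold the command count computed above, and the begin and end pointers are computed and stored in variables.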
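A linear (bump) allocator is simple enough to sketch. The class below is a hypothetical, stripped-down illustration, not the actual interface of filament's utils allocators: each alloc() just advances a pointer, and memory is only given back all at once through rewind(). Since nothing is freed in the middle, no holes are left behind, and that is why fragmentation cannot occur.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical minimal bump allocator over a caller-provided buffer.
class LinearArena {
public:
    LinearArena(void* base, size_t size)
        : mBegin(static_cast<char*>(base)), mCur(mBegin), mEnd(mBegin + size) {}

    // alignment must be a power of two. Returns nullptr if the request does not fit.
    void* alloc(size_t size, size_t alignment = alignof(std::max_align_t)) {
        uintptr_t const p = reinterpret_cast<uintptr_t>(mCur);
        uintptr_t const aligned = (p + alignment - 1) & ~(uintptr_t(alignment) - 1);
        if (aligned + size > reinterpret_cast<uintptr_t>(mEnd)) {
            return nullptr;
        }
        mCur = reinterpret_cast<char*>(aligned + size);  // just bump the pointer
        return reinterpret_cast<void*>(aligned);
    }

    // Everything is released at once, so no holes are ever left in the middle.
    void rewind() { mCur = mBegin; }

    size_t used() const { return size_t(mCur - mBegin); }

private:
    char* mBegin;
    char* mCur;
    char* mEnd;
};
```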

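Anyone who has run into a few memory access errors while coding will immediately worry about something here.

If builder.mArena is something like a static array, what happens when too many commands are requested? Won't that exceed mArena's maximum capacity and cause an invalid memory access?

A very reasonable suspicion. And, naturally, filament handles this case.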

if (UTILS_UNLIKELY(builder.mArena.getAllocator().isHeapAllocation(commandBegin))) {
    static bool sLogOnce = true;
    if (UTILS_UNLIKELY(sLogOnce)) {
        sLogOnce = false;
        PANIC_LOG("RenderPass arena is full, using slower system heap. Please increase "
                  "the appropriate constant (e.g. FILAMENT_PER_RENDER_PASS_ARENA_SIZE_IN_MB).");
    }
}

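The if statement above checks whether commandBegin, the start of the allocated memory, came from a heap allocation.

That's because the first template argument of builder.mArena specifies an allocator called LinearAllocatorWithFallback. While the request fits within the fixed buffer, this allocator hands out memory from there. If there isn't enough room, it allocates fresh memory from the heap and returns that instead.

So, fortunately, the program won't crash just because there are too many commands. The performance hit can't be avoided, though, which is why filament logs a warning (only once) asking you to increase the corresponding constant, such as FILAMENT_PER_RENDER_PASS_ARENA_SIZE_IN_MB.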
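The fallback behaviour and the log-once warning can be sketched as follows. `LinearWithFallback`, `isHeapAllocation` and `allocCommands` here are hypothetical stand-ins for utils::LinearAllocatorWithFallback and the check in the RenderPass constructor, and std::fprintf takes the place of PANIC_LOG. The underlying idea is the same: any pointer that lies outside the fixed buffer must have come from the heap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <functional>
#include <vector>

// Hypothetical allocator: bump-allocate from a fixed buffer, then fall back to malloc.
class LinearWithFallback {
public:
    explicit LinearWithFallback(size_t capacity) : mBuffer(capacity) {}
    ~LinearWithFallback() { for (void* p : mHeapBlocks) std::free(p); }
    LinearWithFallback(LinearWithFallback const&) = delete;
    LinearWithFallback& operator=(LinearWithFallback const&) = delete;

    void* alloc(size_t size) {
        size_t const a = alignof(std::max_align_t);
        size = (size + a - 1) / a * a;  // keep offsets aligned (assumes an aligned buffer start)
        if (mOffset + size <= mBuffer.size()) {
            void* p = mBuffer.data() + mOffset;
            mOffset += size;
            return p;
        }
        void* p = std::malloc(size);    // slower path: system heap
        mHeapBlocks.push_back(p);
        return p;
    }

    // std::less gives a total order even for pointers into different objects.
    bool isHeapAllocation(void const* p) const {
        std::less<char const*> less;
        char const* c = static_cast<char const*>(p);
        char const* begin = mBuffer.data();
        char const* end = begin + mBuffer.size();
        return less(c, begin) || !less(c, end);
    }

private:
    std::vector<char> mBuffer;
    size_t mOffset = 0;
    std::vector<void*> mHeapBlocks;
};

// Same shape as the sLogOnce check: warn only on the first fallback.
void* allocCommands(LinearWithFallback& arena, size_t bytes) {
    void* p = arena.alloc(bytes);
    if (arena.isHeapAllocation(p)) {
        static bool sLogOnce = true;
        if (sLogOnce) {
            sLogOnce = false;
            std::fprintf(stderr, "arena is full, using slower system heap\n");
        }
    }
    return p;
}
```

The design choice is to degrade rather than fail: the caller always gets usable memory, and the warning is printed once instead of on every fallback.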

appendCommands(engine, { commandBegin, commandCount },
        builder.mUboHandle,
        builder.mVisibleRenderables,
        builder.mCommandTypeFlags,
        builder.mFlags,
        builder.mVisibilityMask,
        builder.mVariant,
        builder.mCameraPosition,
        builder.mCameraForwardVector);

if (builder.mCustomCommands.has_value()) {
    Command* p = commandBegin + commandCount;
    for (auto [channel, passId, command, order, fn]: builder.mCustomCommands.value()) {
        appendCustomCommand(p++, channel, passId, command, order, fn);
    }
}

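Next, appendCommands is called to generate the commands and write them into the allocated memory. If custom commands exist, they are written into the extra customCommandCount slots right after the regular commands. I won't follow that part any further.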

void RenderPass::appendCommands(FEngine& engine,
        Slice<Command> commands,
        backend::BufferObjectHandle const uboHandle,
        utils::Range<uint32_t> const vr,
        CommandTypeFlags const commandTypeFlags,
        RenderFlags const renderFlags,
        FScene::VisibleMaskType const visibilityMask,
        Variant const variant,
        float3 const cameraPosition,
        float3 const cameraForwardVector) noexcept {
    SYSTRACE_CALL();
    SYSTRACE_CONTEXT();

    // trace the number of visible renderables
    SYSTRACE_VALUE32("visibleRenderables", vr.size());
    if (UTILS_UNLIKELY(vr.empty())) {
        // no renderables, we still need the sentinel and the command buffer size should be
        // exactly 1.
        assert_invariant(commands.size() == 1);
        Command* curr = commands.data();
        curr->key = uint64_t(Pass::SENTINEL);
        return;
    }

    JobSystem& js = engine.getJobSystem();

    // up-to-date summed primitive counts needed for generateCommands()
    FScene::RenderableSoa const& soa = mRenderableSoa;

    Command* curr = commands.data();
    size_t const commandCount = commands.size();

    auto stereoscopicEyeCount = engine.getConfig().stereoscopicEyeCount;

    auto work = [commandTypeFlags, curr, &soa,
                 boh = uboHandle,
                 variant, renderFlags, visibilityMask,
                 cameraPosition, cameraForwardVector, stereoscopicEyeCount]
            (uint32_t startIndex, uint32_t indexCount) {
        RenderPass::generateCommands(commandTypeFlags, curr,
                soa, { startIndex, startIndex + indexCount }, boh,
                variant, renderFlags, visibilityMask,
                cameraPosition, cameraForwardVector, stereoscopicEyeCount);
    };

    if (vr.size() <= JOBS_PARALLEL_FOR_COMMANDS_COUNT) {
        work(vr.first, vr.size());
    } else {
        auto* jobCommandsParallel = jobs::parallel_for(js, nullptr, vr.first, (uint32_t)vr.size(),
                std::cref(work), jobs::CountSplitter<JOBS_PARALLEL_FOR_COMMANDS_COUNT>());
        js.runAndWait(jobCommandsParallel);
    }

    // Always add an "eof" command
    // "eof" command. These commands are guaranteed to be sorted last in the
    // command buffer.
    curr[commandCount - 1].key = uint64_t(Pass::SENTINEL);

    // Go over all the commands and call prepareProgram().
    // This must be done from the main thread.
    for (Command const* first = curr, *last = curr + commandCount ; first != last ; ++first) {
        if (UTILS_LIKELY((first->key & CUSTOM_MASK) == uint64_t(CustomCommand::PASS))) {
            auto ma = first->info.mi->getMaterial();
            ma->prepareProgram(first->info.materialVariant);
        }
    }
}


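appendCommands calls RenderPass::generateCommands to build each command and write it into memory.

Looking at the overall flow: even when the range vr it receives is empty, it still writes the sentinel that marks the end of the commands, as the comment says. When there are more than JOBS_PARALLEL_FOR_COMMANDS_COUNT visible renderables, it uses a class called JobSystem to generate the commands in parallel. Otherwise it simply calls the work lambda directly.

There's no need to dig into how JobSystem is built just yet. It generates commands for the input range, from first to last, by running RenderPass::generateCommands() in parallel, and that level of understanding is enough for now.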
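Even without looking inside JobSystem, we can sketch the contract that appendCommands() relies on: split the renderable range into chunks, run the work lambda on each chunk, and wait for all of them to finish. The sketch below swaps in plain std::thread with one thread per chunk. filament's jobs::parallel_for with CountSplitter runs on its own worker pool instead. `parallelFor` and `maxPerJob` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for jobs::parallel_for + CountSplitter<N>: split
// [first, first + count) into chunks of at most maxPerJob indices and call
// work(start, n) for each chunk on its own thread, then wait for all of them.
void parallelFor(uint32_t first, uint32_t count, uint32_t maxPerJob,
                 std::function<void(uint32_t, uint32_t)> const& work) {
    assert(maxPerJob > 0);
    if (count <= maxPerJob) {               // small ranges run inline, as in appendCommands()
        work(first, count);
        return;
    }
    std::vector<std::thread> threads;
    for (uint32_t start = first; start < first + count; start += maxPerJob) {
        uint32_t const n = std::min(maxPerJob, first + count - start);
        threads.emplace_back(work, start, n);   // runs work(start, n)
    }
    for (auto& t : threads) {
        t.join();                            // plays the role of js.runAndWait()
    }
}
```

Because each chunk touches only its own indices, the work function needs no locking. The same property lets generateCommands() write into one shared command buffer from several jobs.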

void RenderPass::generateCommands(CommandTypeFlags commandTypeFlags, Command* const commands,
        FScene::RenderableSoa const& soa, Range<uint32_t> range,
        backend::BufferObjectHandle renderablesUbo,
        Variant variant, RenderFlags renderFlags,
        FScene::VisibleMaskType visibilityMask, float3 cameraPosition, float3 cameraForward,
        uint8_t stereoEyeCount) noexcept {

    SYSTRACE_CALL();

    // generateCommands() writes both the draw and depth commands simultaneously such that
    // we go throw the list of renderables just once.
    // (in principle, we could have split this method into two, at the cost of going through
    // the list twice)

    // compute how much maximum storage we need
    // double the color pass for transparent objects that need to render twice
    const bool colorPass  = bool(commandTypeFlags & CommandTypeFlags::COLOR);
    const bool depthPass  = bool(commandTypeFlags & CommandTypeFlags::DEPTH);
    const size_t commandsPerPrimitive = uint32_t(colorPass * 2 + depthPass);
    const size_t offsetBegin = FScene::getPrimitiveCount(soa, range.first) * commandsPerPrimitive;
    const size_t offsetEnd   = FScene::getPrimitiveCount(soa, range.last) * commandsPerPrimitive;
    Command* curr = commands + offsetBegin;
    Command* const last = commands + offsetEnd;

    /*
     * The switch {} below is to coerce the compiler into generating different versions of
     * "generateCommandsImpl" based on which pass we're processing.
     *
     *  We use a template function (as opposed to just inlining), so that the compiler is
     *  able to generate actual separate versions of generateCommandsImpl<>, which is much
     *  easier to debug and doesn't impact performance (it's just a predicted jump).
     */

    switch (commandTypeFlags & (CommandTypeFlags::COLOR | CommandTypeFlags::DEPTH)) {
        case CommandTypeFlags::COLOR:
            curr = generateCommandsImpl<CommandTypeFlags::COLOR>(commandTypeFlags, curr,
                    soa, range, renderablesUbo,
                    variant, renderFlags, visibilityMask, cameraPosition, cameraForward,
                    stereoEyeCount);
            break;
        case CommandTypeFlags::DEPTH:
            curr = generateCommandsImpl<CommandTypeFlags::DEPTH>(commandTypeFlags, curr,
                    soa, range, renderablesUbo,
                    variant, renderFlags, visibilityMask, cameraPosition, cameraForward,
                    stereoEyeCount);
            break;
        default:
            // we should never end-up here
            break;
    }

    assert_invariant(curr <= last);

    // commands may have been skipped, cancel all of them.
    while (curr != last) {
        curr->key = uint64_t(Pass::SENTINEL);
        ++curr;
    }
}


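generateCommands() fills the allocated memory with commands according to commandTypeFlags. Note that the switch handles only COLOR on its own or DEPTH on its own. If both flags are set, the value falls into the default branch commented "we should never end-up here". So, contrary to my earlier guess, a single RenderPass seems to generate either a color pass or a depth pass, not both.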

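If you read the comment carefully, it explains why commandsPerPrimitive doubles the color pass. A transparent primitive may need to be rendered twice, so the code reserves twice as many slots as there are primitives up front to make sure the buffer never overflows.

Why would a transparent object be rendered twice? Take a hollow, transparent object like a glass bottle. If you render it with ordinary back-face culling, the far side of the bottle, which should be visible through the front, never gets drawn.

Transparent objects have to be blended in order from far to near, so the back faces must be drawn first and the front faces after them for the object to look right.

A very thin pane of glass, on the other hand, has no far side that needs drawing. So a transparent object may need one or two draws, and a color pass therefore reserves (total primitive count × 2) slots in advance.

The detailed contents of each command are filled in inside generateCommandsImpl()... but I won't go into that here.

I don't fully understand that part myself yet, and reading it smoothly really requires knowing at least RenderableSoa and filament's Material system.

Either way, that's how the commands are built, and there is one more bit of cleanup at the end.

generateCommandsImpl() writes commands one at a time starting from the beginning of its slice of the buffer, advancing the pointer each time. When it is done, it returns the current pointer position, which becomes curr, and every command from curr up to last then gets its key set to SENTINEL.

Because memory is reserved for at least as many commands as will actually be executed, some unused slots may be left at the end of the buffer. Filling them with SENTINEL lets command processing skip them. (Commands are written into the buffer sequentially, so there is never an invalid command between the start of the slice and curr.)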
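Putting the last few paragraphs together, here is a hypothetical sketch of what one generateCommands() call does for a color pass. Everything in it is illustrative. `Primitive::twoPasses` stands for "a transparent primitive that needs a back-face draw and a front-face draw", and the key is simply the primitive id. The sketch also assumes that `summed[]` is an exclusive prefix sum of primitive counts. That gives every chunk of renderables its own slice of the buffer, sized for the worst case of 2 commands per primitive.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

constexpr uint64_t kSentinel = std::numeric_limits<uint64_t>::max();  // stand-in for Pass::SENTINEL

struct Command { uint64_t key; };

// Hypothetical per-primitive input: does it need two color draws?
struct Primitive { uint32_t id; bool twoPasses; };

// Color-pass sketch over renderables [first, last). This chunk owns the slots
// [summed[first] * 2, summed[last] * 2) and no other chunk writes there.
void generateColorChunk(Command* commands,
                        std::vector<std::vector<Primitive>> const& renderables,
                        std::vector<uint32_t> const& summed,
                        uint32_t first, uint32_t last) {
    constexpr uint32_t kPerPrimitive = 2;    // worst case for a color pass
    Command* curr = commands + summed[first] * kPerPrimitive;
    Command* const end = commands + summed[last] * kPerPrimitive;
    for (uint32_t r = first; r < last; ++r) {
        for (Primitive const& p : renderables[r]) {
            *curr++ = Command{ uint64_t(p.id) };          // first draw (hypothetical key)
            if (p.twoPasses) {
                *curr++ = Command{ uint64_t(p.id) };      // second draw
            }
        }
    }
    assert(curr <= end);
    while (curr != end) {                    // cancel the unused slots
        curr->key = kSentinel;
        ++curr;
    }
}
```

Running two chunks over the renderables {[1, 2*], [3]}, where 2* is drawn twice, fills slots 0..3 with 1, 2, 2, SENTINEL and slots 4..5 with 3, SENTINEL. Each chunk leaves its own SENTINEL padding in the middle of the buffer.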

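However, generateCommands() runs in parallel through JobSystem, one call per chunk of renderables, and each call pads the tail of its own slice with SENTINEL. So right after appendCommands() completes, the command buffer is a sequence of [valid commands][SENTINEL padding] segments, one per chunk, with runs of SENTINEL scattered through the middle of the buffer and the final sentinel in the last slot.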

But it doesn't end in that state: as a final step, the commands are sorted.

// sort commands once we're done adding commands
commandEnd = resize(builder.mArena,
        RenderPass::sortCommands(commandBegin, commandEnd));

As shown above, sortCommands() is called to sort the commands, and the end position of the commands is updated.

RenderPass::Command* RenderPass::sortCommands(
        Command* const begin, Command* const end) noexcept {
    SYSTRACE_NAME("sort commands");

    std::sort(begin, end);

    // find the last command
    Command* const last = std::partition_point(begin, end,
            [](Command const& c) {
                return c.key != uint64_t(Pass::SENTINEL);
            });

    return last;
}


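Commands are sorted by their key. It isn't visible in the code above, but Command overloads operator< so that commands are ordered by a less-than comparison of their uint64_t key.

The code we've seen so far only sets key to plain mask values like Pass::SENTINEL, but in reality generateCommands() builds up each key by OR-ing (|) various settings into it.

As a result, the commands end up sorted by fixed rules according to whether they are transparent or opaque, part of a depth pass or a color pass, and so on. The sorting rules are fairly involved and I haven't examined them closely yet, but the order appears to be depth, then opaque, then transparent.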
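Here is a minimal sketch of the same sort-then-partition step. It assumes the sentinel key is the largest possible uint64_t. The comment in appendCommands() only guarantees that sentinel commands sort last. `Command` here is a hypothetical one-field struct. filament's Command carries much more.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

constexpr uint64_t kSentinel = std::numeric_limits<uint64_t>::max();  // assumed sentinel value

struct Command {
    uint64_t key;
    // Ordering is defined purely by the 64-bit key.
    bool operator<(Command const& rhs) const noexcept { return key < rhs.key; }
};

// Same shape as RenderPass::sortCommands(): sort, then find where the sentinels
// start. Valid keys sort before kSentinel, so the sorted range is partitioned
// into [valid...][sentinel...], which is what std::partition_point requires.
Command* sortCommands(Command* begin, Command* end) {
    std::sort(begin, end);
    return std::partition_point(begin, end,
            [](Command const& c) { return c.key != kSentinel; });
}
```

Because the sorted range is guaranteed to be [valid keys...][sentinels...], std::partition_point can find the boundary with a binary search instead of a linear scan.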
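To show how OR-ing fields into one key turns a single integer sort into a multi-level sort, here is a sketch with a made-up layout. The bit positions, the `Bucket` values and `makeKey` are all hypothetical and are not filament's real key format. The point is only that the highest bits decide the coarse order (depth, then opaque, then transparent), while lower bits break ties. For example, an inverted distance makes farther transparent objects come first.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 64-bit key layout (NOT filament's real bit assignment):
//   bits 62..63 : bucket    (0 = depth, 1 = opaque color, 2 = transparent)
//   bits 32..61 : distance  (transparent only, inverted for back-to-front)
//   bits  0..31 : material/program id (groups state changes together)
enum class Bucket : uint64_t { Depth = 0, Opaque = 1, Transparent = 2 };

constexpr int kBucketShift = 62;
constexpr int kDistanceShift = 32;
constexpr uint64_t kDistanceMask = (uint64_t(1) << 30) - 1;

uint64_t makeKey(Bucket bucket, uint32_t materialId, uint32_t distance) {
    uint64_t key = uint64_t(bucket) << kBucketShift;
    if (bucket == Bucket::Transparent) {
        // Farther objects must be drawn first, so store the inverted distance.
        uint64_t const d = kDistanceMask - (uint64_t(distance) & kDistanceMask);
        key |= d << kDistanceShift;
    }
    key |= materialId;
    return key;
}
```

With this layout, an all-ones sentinel has the highest bucket bits, so it still sorts after everything else.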

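In the end, after sorting, all valid commands sit at the front of the command buffer in key order, and every SENTINEL is pushed to the back. sortCommands() returns the position of the first SENTINEL, found with std::partition_point, and commandEnd is updated to that position through resize().

With that, constructing the RenderPass is complete.

There is one more step at the end related to instancing, but we can more or less wrap up the logic that builds a RenderPass's commands here.

Now that we've managed to create a RenderPass object, all that's left is to call execute() to issue the rendering commands!

So I'll stop here for this post. Next time I'll look at the execute() function. Diving deep into RenderPass right from the start gives a good view of the overall flow, but there was a lot to cover. I think I'll finish RenderPass quickly with execute() and then move on to more fundamental parts.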