Most frame limiters are wrong. Not broken — they produce a frame rate — but wrong in the sense that they misunderstand what they're actually doing.

The naive approach looks like this:

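while (elapsed < targetFrameTime) {
    elapsed = GetElapsedMs();                  // hypothetical helper: ms since frame start
    double spare = targetFrameTime - elapsed;  // time left in this frame
    if (spare >= 10.0)
        Sleep(1);                              // otherwise fall through and spin
}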

Sleep until we're close, then spin. Seems reasonable. The problem is that Sleep(1) doesn't sleep for 1ms. It sleeps for "at least 1ms, probably more, depending on what the OS scheduler feels like." On Windows, the default timer resolution is 15.6ms. You asked for 1ms. You got a coin flip.
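To see the gap between what you asked for and what you got, you can time the sleep yourself. The sketch below is portable C++ rather than Win32: it uses std::this_thread::sleep_for as a stand-in for Sleep and std::chrono::steady_clock as the stopwatch. The helper names measureSleepMs and worstSleepMs are made up for illustration.

```cpp
#include <chrono>
#include <thread>

// Hypothetical helper: request a sleep and return how long the thread
// was actually away, in milliseconds.
double measureSleepMs(std::chrono::milliseconds requested) {
    using clock = std::chrono::steady_clock;
    const auto before = clock::now();
    std::this_thread::sleep_for(requested);
    const auto after = clock::now();
    return std::chrono::duration<double, std::milli>(after - before).count();
}

// Hypothetical helper: worst observed sleep over a number of attempts.
double worstSleepMs(std::chrono::milliseconds requested, int samples) {
    double worst = 0.0;
    for (int i = 0; i < samples; ++i) {
        const double actual = measureSleepMs(requested);
        if (actual > worst)
            worst = actual;
    }
    return worst;
}
```

Calling worstSleepMs(std::chrono::milliseconds(1), 100) on your own machine shows how far the worst case sits from the request. The exact number depends on the OS, the current timer resolution, and system load, so treat it as a measurement, not a constant.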

The result is frame pacing that looks fine in aggregate — your mean is okay — but your P99 is a disaster. Some frames take twice as long as they should. Not because anything is slow. Because you handed scheduling control to the OS and hoped.
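A small sketch of why the mean hides this. The frame times are synthetic (97 frames at 16.67 ms, 3 that overslept to 31.25 ms), not measurements, and the percentile uses the nearest-rank definition, which is one of several in common use. All function names here are illustrative.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Arithmetic mean of frame times. Assumes a non-empty vector.
double meanMs(const std::vector<double>& frameTimesMs) {
    return std::accumulate(frameTimesMs.begin(), frameTimesMs.end(), 0.0)
         / static_cast<double>(frameTimesMs.size());
}

// Nearest-rank percentile, percent in [1, 100]. Assumes a non-empty vector.
double percentileMs(std::vector<double> frameTimesMs, std::size_t percent) {
    std::sort(frameTimesMs.begin(), frameTimesMs.end());
    const std::size_t n = frameTimesMs.size();
    std::size_t rank = (percent * n + 99) / 100;  // ceil(percent * n / 100)
    rank = std::max<std::size_t>(1, std::min(rank, n));
    return frameTimesMs[rank - 1];
}

// Synthetic sample: 97 on-time frames and 3 that overslept.
std::vector<double> syntheticFrameTimes() {
    std::vector<double> frames(97, 16.67);
    frames.insert(frames.end(), 3, 31.25);
    return frames;
}
```

On this sample the mean comes out near 17.1 ms, which looks healthy, while the 99th percentile is 31.25 ms, close to double the target.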

The actual problem

A frame limiter is a resource contention problem with one resource (the CPU), one requester (your render loop), and a required acquisition lifecycle: prepare, render, present, wait, repeat. The "wait" step is where most implementations fall apart.

What you actually need:

Precise timing: not OS sleep granularity, but hardware counter precision.

Controlled handoff: give up the CPU when you can afford to, spin when you can't.

No drift accumulation: each frame should target an absolute timestamp, not a relative offset from whenever the last wait happened to end.
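The drift point is easy to check with arithmetic alone. The sketch below simulates a run of frames where every wait ends a little late, and compares the two ways of picking the next target. It assumes each overshoot is smaller than the frame period; the function names and the overshoot values are invented for the illustration.

```cpp
#include <vector>

// Relative scheduling: each target is one period after the moment the
// previous wait actually ended. Returns how late the last frame ends
// compared with the ideal grid.
double finalErrorRelative(double periodMs, const std::vector<double>& overshootMs) {
    double now = 0.0;
    for (double overshoot : overshootMs) {
        const double target = now + periodMs;  // measured from the late wake-up
        now = target + overshoot;              // and this wait ends late too
    }
    const double ideal = periodMs * static_cast<double>(overshootMs.size());
    return now - ideal;                        // every overshoot has piled up
}

// Absolute scheduling: each target is one period after the previous target.
double finalErrorAbsolute(double periodMs, const std::vector<double>& overshootMs) {
    double target = 0.0;
    double now = 0.0;
    for (double overshoot : overshootMs) {
        target += periodMs;                    // fixed grid, independent of now
        now = target + overshoot;
    }
    return now - target;                       // only the last overshoot remains
}
```

With a 16 ms period and overshoots of 0.5, 0.25, 0.5, and 0.25 ms, the relative version ends 1.5 ms behind the ideal grid after four frames, and keeps falling further behind. The absolute version is 0.25 ms late, the size of the final overshoot, no matter how many frames have passed.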

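The fix uses QueryPerformanceCounter for calibration, RDTSC for low-overhead spin measurement, a waitable timer for the bulk of the wait when there's enough slack, and NtSetTimerResolution(10000) to push the OS timer to 1ms resolution. The argument is in 100ns units, so 10000 means 1ms. NtSetTimerResolution is an undocumented ntdll export, so treat it as best-effort.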
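The calibration step is just a ratio between two clocks sampled at the same two moments. The sketch below shows only that arithmetic, with the samples passed in as plain numbers. On Windows they would come from __rdtsc(), QueryPerformanceCounter, and QueryPerformanceFrequency. The helper names are hypothetical, and the approach assumes a TSC that ticks at a constant rate, which is typical of current x86 CPUs but worth verifying on your targets.

```cpp
#include <cstdint>

// Hypothetical helper: TSC ticks per millisecond, from a pair of TSC
// samples and a pair of QPC samples taken at the same two moments.
// qpcFrequency is QPC ticks per second.
double tscTicksPerMs(std::uint64_t tscStart, std::uint64_t tscEnd,
                     std::int64_t qpcStart, std::int64_t qpcEnd,
                     std::int64_t qpcFrequency) {
    const double elapsedMs = static_cast<double>(qpcEnd - qpcStart) * 1000.0
                           / static_cast<double>(qpcFrequency);
    return static_cast<double>(tscEnd - tscStart) / elapsedMs;
}

// Hypothetical helper: convert a duration in milliseconds to TSC ticks.
std::uint64_t msToTscTicks(double ms, double ticksPerMs) {
    return static_cast<std::uint64_t>(ms * ticksPerMs);
}
```

For example, 30,000,000 TSC ticks over 100,000 QPC ticks at a 10 MHz QPC frequency is 10 ms of wall time, so 3,000,000 ticks per millisecond. A longer calibration window reduces the effect of sampling jitter at either end.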

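// Target TSC is the ideal frame end, measured from the previous target: no drift accumulation.
// m_tscTicksPerMs is an assumed member holding the QueryPerformanceCounter calibration result.
uint64_t targetTSC = m_lastFrameTSC
    + static_cast<uint64_t>(targetFrameTime * m_tscTicksPerMs);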

if (waitTimeMs > 2.0) {
    SetWaitableTimer(m_hTimer, &dueTime, ...);
    WaitForSingleObject(m_hTimer, INFINITE);
}

// Precision spin for the final approach
do { _mm_pause(); } while (__rdtsc() < targetTSC);

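The frame end timestamp is absolute. If a frame runs long, the next target doesn't shift. It stays where it was supposed to be, so drift doesn't accumulate. This holds only if m_lastFrameTSC is then advanced to targetTSC, not to whatever the counter read when the spin actually ended.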
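Here is a portable sketch of the same shape, for experimenting outside Windows. It is not the mtasa-blue code: it swaps std::chrono::steady_clock in for the TSC and std::this_thread::sleep_until in for the waitable timer, so it keeps the structure (absolute deadline, coarse wait, fine spin) but not the precision. The class name FrameLimiter and the 2 ms default spin margin are assumptions.

```cpp
#include <chrono>
#include <thread>

class FrameLimiter {
public:
    using clock = std::chrono::steady_clock;

    explicit FrameLimiter(std::chrono::microseconds period,
                          std::chrono::microseconds spinMargin = std::chrono::milliseconds(2))
        : m_period(period), m_spinMargin(spinMargin), m_deadline(clock::now()) {}

    // Block until the next frame boundary and return that boundary.
    clock::time_point wait() {
        // Advance from the previous target, never from now(), so a late
        // wake-up does not push every later frame back.
        m_deadline += m_period;

        // Coarse wait: give up the CPU while more than the margin remains.
        const auto coarseUntil = m_deadline - m_spinMargin;
        if (clock::now() < coarseUntil)
            std::this_thread::sleep_until(coarseUntil);

        // Fine wait: spin for the final approach. On x86 a _mm_pause()
        // in the loop body would be kinder to the sibling hardware thread.
        while (clock::now() < m_deadline) {
        }
        return m_deadline;
    }

private:
    std::chrono::microseconds m_period;
    std::chrono::microseconds m_spinMargin;
    clock::time_point m_deadline;
};
```

Usage is one call per frame: construct FrameLimiter limiter(std::chrono::microseconds(16667)) once, then call limiter.wait() after presenting. One design choice to be aware of: if a frame overruns by more than a full period, this sketch returns immediately on the following calls until it has caught up with the grid. A production limiter may prefer to resynchronize the deadline instead.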

The broader point

This is a specific instance of something general: any system where N things compete for 1 thing, and the order and duration of each acquisition matters, is a scheduling problem. Game engines are full of them. The frame loop, the audio thread, the physics tick, the job system — all of it is arbitration.

Most game code doesn't think about it this way. It uses Sleep, or std::this_thread::sleep_for, or busy-polls, and then wonders why the P99 latency is ugly. The fix isn't complicated. It's just precise about what it's actually doing.

Full implementation: mtasa-blue.