Fixing Rust memory allocation slowdown in VS Code on Windows

2025-03-25

On Windows, Rust programs can have surprisingly poor performance when running in VS Code with CodeLLDB, and in some other IDEs. This particularly affects programs that use allocation-heavy containers like HashMap<String, _>. Even a few megabytes of small allocations can lead to terrible performance when the container is deallocated.

There is an easy fix: set the environment variable _NO_DEBUG_HEAP=1, which can speed up deallocation by more than two orders of magnitude. This isn’t a novel discovery, but it’s a fairly obscure feature and I’ve found no discussion of it in the context of Rust. I just got bitten by it yet again, so I’ll try to explain it here.

To reproduce the issue: Start a new project with cargo new --bin. Open in VS Code, with the standard rust-analyzer and CodeLLDB extensions. Use “LLDB: Generate Launch Configurations from Cargo.toml” to build a default launch.json (or you can use the “rust-analyzer: Debug” command). Add this code:

use std::{collections::HashMap, time::Instant};

fn main() {
    let t = Instant::now();
    let mut h = HashMap::new();
    for i in 0..1_000_000 {
        h.insert(i.to_string(), i.to_string());
    }
    println!("Constructed in {:.2} secs", t.elapsed().as_secs_f64());

    let t = Instant::now();
    drop(h);
    println!("Dropped in {:.2} secs", t.elapsed().as_secs_f64());
}

Run it with F5 (“Start Debugging”):

Constructed in 3.04 secs
Dropped in 52.75 secs

The drop is remarkably slow. It’s the same if we don’t explicitly drop and just let the object go out of scope. If we change the number of iterations, the cost seems to increase as roughly O(n²), which is very bad.
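To check the scaling yourself, here is a minimal sketch that times only the drop at a few sizes. The helper name time_drop and the sizes are my own choices for illustration, and the sizes are kept well below a million so that a quadratic slowdown stays tolerable. If the cost really is roughly O(n²), each doubling of n should roughly quadruple the drop time when launched from the IDE, whereas a linear cost would only double it.

```rust
use std::{
    collections::HashMap,
    time::{Duration, Instant},
};

/// Builds a map with `n` String keys and values, then times only the drop.
/// (`time_drop` is a hypothetical helper name, not from the original program.)
fn time_drop(n: u32) -> Duration {
    let mut h = HashMap::new();
    for i in 0..n {
        h.insert(i.to_string(), i.to_string());
    }
    let t = Instant::now();
    drop(h);
    t.elapsed()
}

fn main() {
    // Each step doubles n, so compare consecutive lines of output.
    for n in [125_000, 250_000, 500_000] {
        println!(
            "n = {:>7}: dropped in {:.2} secs",
            n,
            time_drop(n).as_secs_f64()
        );
    }
}
```

The absolute numbers will depend on the machine; only the ratio between consecutive lines is interesting.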

Of course we shouldn’t expect great performance from a debug build. Let’s try adding --release to the build command in launch.json:

Constructed in 2.14 secs
Dropped in 52.30 secs

Hmm, that’s barely any better. Maybe the debugger is interfering? Try Ctrl+F5 (“Run Without Debugging”) – nope, that’s just as slow.

For comparison, let’s try cargo run --release from a command line:

Constructed in 0.44 secs
Dropped in 0.20 secs

Deallocation is 250x faster! Why is it so slow when running from the IDE?

Try pausing the debugger during the drop, and the call stack looks like:

RtlTryEnterCriticalSection (@RtlTryEnterCriticalSection:957)
RtlTryEnterCriticalSection (@RtlTryEnterCriticalSection:602)
RtlTryEnterCriticalSection (@RtlTryEnterCriticalSection:691)
RtlTryEnterCriticalSection (@RtlTryEnterCriticalSection:1234)
RtlDeleteBoundaryDescriptor (@RtlDeleteBoundaryDescriptor:392)
RtlGetCurrentServiceSessionId (@RtlGetCurrentServiceSessionId:1203)
RtlFreeHeap (@RtlFreeHeap:24)
RtlRegisterSecureMemoryCacheCallback (@RtlRegisterSecureMemoryCacheCallback:348)
EtwLogTraceEvent (@EtwLogTraceEvent:201)
RtlGetCurrentServiceSessionId (@RtlGetCurrentServiceSessionId:1203)
RtlFreeHeap (@RtlFreeHeap:24)
<hashbrown::raw::RawTable<T,A> as core::ops::drop::Drop>::drop (@<hashbrown::raw::RawTable<T,A> as core::ops::drop::Drop>::drop:57)
...

Rtl is the “run-time library” from the mostly-undocumented Windows NT native API (the level below the properly-documented Win32 API). If we step into drop with the debugger, we get to GlobalAlloc::dealloc which calls HeapFree which calls RtlFreeHeap. This is simply freeing a single pointer, so it shouldn’t be that expensive.
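To see how many of these single-pointer frees the drop actually performs, one option is to wrap the system allocator and count calls to dealloc. This is only a sketch: CountingAlloc, DEALLOCS and deallocs_when_dropping are names I made up for illustration, and the wrapper simply forwards to std::alloc::System (which is what ends up in HeapFree on Windows).

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper: forwards to the system allocator and counts
/// how many times `dealloc` is called.
struct CountingAlloc;

static DEALLOCS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        DEALLOCS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Builds the same kind of map as the test program and returns the number
/// of `dealloc` calls observed while it is being dropped.
fn deallocs_when_dropping(n: u32) -> usize {
    let mut h = HashMap::new();
    for i in 0..n {
        h.insert(i.to_string(), i.to_string());
    }
    let before = DEALLOCS.load(Ordering::Relaxed);
    drop(h);
    DEALLOCS.load(Ordering::Relaxed) - before
}

fn main() {
    let n = 1_000_000;
    println!("{} deallocs while dropping {} entries", deallocs_when_dropping(n), n);
}
```

Every key and every value owns its own heap buffer, so dropping n entries should produce at least 2n frees, each of which goes through RtlFreeHeap.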

CodeLLDB isn’t great at stack tracing through the OS, so let’s try with WinDbg instead, which gives more sensible output:

ntdll!RtlpHeapFindListLookupEntry+0x1c0
ntdll!RtlpFindEntry+0x3a
ntdll!RtlpFreeHeap+0x94b
ntdll!RtlpFreeHeapInternal+0x7c4
ntdll!RtlFreeHeap+0x51
ntdll!RtlDebugFreeHeap+0x273
ntdll!RtlpFreeHeap+0x83ae6
ntdll!RtlpFreeHeapInternal+0x7c4
ntdll!RtlFreeHeap+0x51
debug_heap_test!hashbrown::raw::RawTableInner::resize_inner+0x327
...

The call to RtlDebugFreeHeap stands out. It turns out this debug heap is an old (~1994) Windows feature with almost zero official documentation. Processes created by a debugger have a few global flags set, causing RtlCreateHeap to set corresponding heap flags that enable some debug features.

In particular HEAP_FREE_CHECKING_ENABLED overwrites freed memory with a fixed pattern (0xFEEEFEEE), and validates the heap by checking that every free block still has the same pattern (indicating it has not been mistakenly overwritten by the application). It appears to perform this validation over the entire heap on every call to RtlFreeHeap, resulting in the O(n²) cost when freeing a large number of objects.
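To see why validating on every free gives quadratic cost, here is a toy model. It is emphatically not the real Windows heap: it assumes that each free re-checks every block that is already free, and that freed blocks never merge, and it just counts the checks. The function name checks_for_frees is made up for this sketch.

```rust
/// Toy model, not the real Windows heap: every free first re-checks all
/// blocks that are already free, then adds one more block to the free list.
fn checks_for_frees(n: u64) -> u64 {
    let mut free_blocks = 0u64;
    let mut checks = 0u64;
    for _ in 0..n {
        checks += free_blocks; // validate every block that is already free
        free_blocks += 1; // the block just freed joins the free list
    }
    checks
}

fn main() {
    // Doubling the number of frees roughly quadruples the number of checks.
    for n in [1_000u64, 2_000, 4_000] {
        println!("{} frees -> {} checks", n, checks_for_frees(n));
    }
}
```

Under these assumptions the total is 0 + 1 + ... + (n-1) = n(n-1)/2 checks, so the three lines report 499500, 1999000 and 7998000 checks.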

The debug heap is meant to help C programmers detect buffer overflows, use after free, etc. We don’t need that help, because we’re using Rust.

(Actually it’s not even helpful for most C programmers, because Microsoft’s C runtime has its own separate CRT debug heap to catch these errors before they reach the Rtl heap. Recently AddressSanitizer came to Windows too; that’s significantly more powerful since it can detect reads from bad addresses, not just writes.)

Fortunately the Rtl debug heap can be disabled by setting the environment variable _NO_DEBUG_HEAP=1 before creating the process. (The process can’t set the variable itself, because the flags will have already been set and the heap already created.)
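If you launch the program from your own tool rather than from an IDE, the same rule applies: the variable has to be in the child’s environment at the moment it is created. A minimal sketch using std::process::Command follows. The helper name without_debug_heap and the executable path are placeholders of mine, and this only matters when whatever starts the child is also debugging it.

```rust
use std::process::Command;

/// Builds a command for `program` with `_NO_DEBUG_HEAP=1` added to the
/// environment the child will be created with.
/// (`without_debug_heap` is a hypothetical helper name.)
fn without_debug_heap(program: &str) -> Command {
    let mut cmd = Command::new(program);
    cmd.env("_NO_DEBUG_HEAP", "1");
    cmd
}

fn main() {
    // Placeholder path: substitute the binary you actually want to run.
    match without_debug_heap("target/release/debug_heap_test.exe").status() {
        Ok(status) => println!("child exited with {status}"),
        Err(e) => eprintln!("could not start child: {e}"),
    }
}
```

Calling std::env::set_var inside the child’s own main would be too late, for the reason given above.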

With CodeLLDB you can set this globally in settings.json (“Preferences: Open User Settings (JSON)”):

"lldb.launch.env": {
    "_NO_DEBUG_HEAP": "1"
},

or set it per project if you prefer. Now let’s try the same program in the debugger:

Constructed in 0.41 secs
Dropped in 0.17 secs

Problem solved.
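If you want to confirm that the setting actually reached your process, a small startup check can print a warning when the variable is missing. This is only a sketch: is_disabled is a name I made up, and comparing against exactly "1" is an assumption that mirrors the setting above rather than anything documented by Windows.

```rust
use std::env;
use std::ffi::OsStr;

/// Returns true if the given value of `_NO_DEBUG_HEAP` is exactly "1".
/// (Assumption: this mirrors the setting used above.)
fn is_disabled(value: Option<&OsStr>) -> bool {
    value == Some(OsStr::new("1"))
}

fn main() {
    // Read the environment this process was created with.
    let value = env::var_os("_NO_DEBUG_HEAP");
    if is_disabled(value.as_deref()) {
        println!("_NO_DEBUG_HEAP=1 is set");
    } else {
        eprintln!("warning: _NO_DEBUG_HEAP is not set to 1; frees may be slow under a debugger");
    }
}
```

The check only reports what was in the environment; it cannot change the heap flags after the fact.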

This issue typically doesn’t affect C/C++ in VS Code, because the C/C++ extension already sets _NO_DEBUG_HEAP by default. That also applies to Rust programs if you configure VS Code with "rust-analyzer.debug.engine": "ms-vscode.cpptools" (or leave it at the default auto and don’t install CodeLLDB) and launch the program with the “rust-analyzer: Debug” command, since that debugs via the C/C++ extension instead of CodeLLDB.

Visual Studio has set _NO_DEBUG_HEAP by default since 2015.

This issue does affect the RustRover IDE, though only when using the “Debug” command. The “Run” command is not affected, because Windows doesn’t think there’s a debugger: IsDebuggerPresent() returns false, which is not the case under VS Code’s “Run Without Debugging”.

WinDbg triggers the debug heap by default, but has the -hd command line option to disable it.

Some profilers might trigger the debug heap, though at least Intel VTune and AMD uProf appear not to (the process runs with IsDebuggerPresent() == false).