The Phantom Closure: Why ‘<>c__DisplayClassX_0’ Is Allocated Even When Your ‘if’ Is False
A war story about a WPF cross-thread diagnostic patch, an animation tick that ran 60 times per second, and an allocation that shouldn’t have happened - until you read the IL.
TL;DR
Roslyn allocates the closure object (<>c__DisplayClassX_0) at the scope of the outermost captured variable, not at the lexical position of the lambda.
If a lambda captures a method parameter (or a pattern variable whose scope spans the entire method), the newobj is emitted in the method prolog before any if guard. The closure is therefore created on every call, unconditionally, even when the branch containing the lambda is never entered.
The fix is mechanical: move the lambda into a separate helper method so the only captured variables live in that helper’s scope. The hot path then becomes allocation-free.
The setup
We had a WPF “hack” class that uses Harmony to patch DependencyObject.GetValue and emit a diagnostic whenever a DP is read from the wrong dispatcher thread:
    private static void TryPrintVisualTreeInfo(DependencyObject obj, DependencyProperty? dp = null)
    {
        if (obj is FrameworkElement element && !element.CheckAccess())
        {
            var thread = Dispatcher.CurrentDispatcher.Thread.Name;
            element.BeginInvoke(() => PrintVisualTreeInfo(element, thread, dp));
        }
    }
Looks innocent. The lambda is inside the if, so we expected:
- When the call is on the right thread => CheckAccess() returns true => no allocation.
- When it’s a cross-thread access => 1 closure + 1 delegate + 1 DispatcherOperation.
In production, the allocation profiler told a different story:
    new WPFPatch.<>c__DisplayClassX_0() ← thousands per second
    WPFPatch.TryPrintVisualTreeInfo()
    WPFPatch.DependencyObjectGetValue.Prefix()
    [Lightweight Method Call]
    Timeline.get_AccelerationRatio()
    Clock.ComputeIntervalsWithParentIntersection()
    ...
    TimeManager.Tick()
    MediaContext.RenderMessageHandlerCore()
Every animation tick was minting a fresh closure - but the BeginInvoke callback never ran. We confirmed via a breakpoint on <>c__DisplayClassX_0..ctor that the closure was being allocated even though the body of the if was apparently skipped.
How?
The IL doesn’t lie
Setting a method breakpoint on the closure’s constructor stopped execution at the if line. The disassembly of the method clarifies why:
    .method private hidebysig static void TryPrintVisualTreeInfo(
        [WindowsBase]System.Windows.DependencyObject obj,
        [opt] [WindowsBase]System.Windows.DependencyProperty dp)
    {
        .locals init ([0] WPFPatch/'<>c__DisplayClassX_0' 'CS$<>8__locals0')
        // === METHOD PROLOG - runs every call ===
        IL_0000: newobj instance void WPFPatch/'<>c__DisplayClassX_0'::.ctor()
        IL_0005: stloc.0
        IL_0006: ldloc.0
        IL_0007: ldarg.1            // dp
        IL_0008: stfld ...::dp      // captured into closure immediately
        // === if (obj is FrameworkElement element && !element.CheckAccess()) ===
        IL_000d: ldloc.0
        IL_000e: ldarg.0            // obj
        IL_000f: isinst FrameworkElement
        IL_0014: stfld ...::element // pattern var also captured here
        IL_0019: ldloc.0
        IL_001a: ldfld ...::element
        IL_001f: brfalse.s IL_005b  // null => return
        IL_0021: ldloc.0
        IL_0022: ldfld ...::element
        IL_0027: callvirt CheckAccess()
        IL_002c: brtrue.s IL_005b   // on right thread => return
        // === Body of the if ===
        IL_002e: ldloc.0
        IL_002f: call Dispatcher::get_CurrentDispatcher
        IL_0034: callvirt Dispatcher::get_Thread
        IL_0039: callvirt Thread::get_Name
        IL_003e: stfld ...::thread
        IL_0043: ldloc.0
        IL_0044: ldfld ...::element
        IL_0049: ldloc.0
        IL_004a: ldftn ...::<TryPrintVisualTreeInfo>b__0
        IL_0050: newobj Action::.ctor
        IL_0055: call DispatcherObjectExtensions::BeginInvoke
        IL_005a: pop
        IL_005b: ret
    }
Three things to notice:
- The newobj for <>c__DisplayClassX_0 is at IL_0000 - the very first instruction of the method.
- dp is stored into the closure immediately (IL_0008), before any check.
- The element pattern variable is also a field on the closure (...::element), and gets written at IL_0014, also before the CheckAccess guard.
So the closure object materialises on every single call. The if only gates the delegate creation and the BeginInvoke, which is why PrintVisualTreeInfo never runs but allocations still climb.
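Read back into C#, the listing corresponds roughly to the hand-written sketch below. It is an approximation for illustration, not compiler output: Closure plays the role of the generated display class, and FakeElement, Post and Print are hypothetical, WPF-free stand-ins for FrameworkElement, BeginInvoke and PrintVisualTreeInfo so the snippet can run on its own. The two counters exist only to make the behaviour observable.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for FrameworkElement; not a WPF type.
public sealed class FakeElement
{
    public bool OnOwnerThread = true;
    public bool CheckAccess() => OnOwnerThread;

    // Stand-in for BeginInvoke: simply runs the callback inline.
    public void Post(Action callback) => callback();
}

public static class LoweredSketch
{
    public static int ClosuresCreated; // incremented by every `new Closure()`
    public static int CallbacksRun;    // incremented when the "lambda" body runs

    // Plays the role of the compiler-generated display class.
    private sealed class Closure
    {
        public object? dp;
        public FakeElement? element;
        public string? thread;

        public Closure() => ClosuresCreated++;

        // Plays the role of <TryPrintVisualTreeInfo>b__0.
        public void Body() => Print(element!, thread, dp);
    }

    private static void Print(FakeElement element, string? thread, object? dp) => CallbacksRun++;

    public static void TryPrintVisualTreeInfo(object obj, object? dp = null)
    {
        var locals = new Closure();              // IL_0000: runs on every call
        locals.dp = dp;                          // IL_0008
        locals.element = obj as FakeElement;     // IL_000f..IL_0014
        if (locals.element != null && !locals.element.CheckAccess())
        {
            locals.thread = Environment.CurrentManagedThreadId.ToString();
            locals.element.Post(new Action(locals.Body)); // only the delegate is guarded
        }
    }
}
```

Calling LoweredSketch.TryPrintVisualTreeInfo(new FakeElement()) leaves CallbacksRun at 0 but still bumps ClosuresCreated, which is the behaviour the profiler reported.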
Why Roslyn does this
Roslyn has an optimisation that places the closure allocation at the innermost block that encloses every captured variable and every lambda that references them. The intent is the obvious one: don’t allocate the closure until control flow has actually entered a scope that needs it.
That optimisation only fires when all captures live inside the same inner block. As soon as one capture has wider scope, the closure must be allocated at that wider scope - otherwise the captured field wouldn’t be live for the whole region in which the source variable is visible.
In our method, the lambda captures three variables:
| Capture | Where it’s declared | Scope |
|---|---|---|
| dp | method parameter | the whole method |
| element | pattern variable in obj is FrameworkElement element | the whole method (*) |
| thread | var thread = ... inside the if | inside the if |
(*) A C# 7+ pattern variable declared in an if condition is scoped to the block that encloses the if statement, not to the if body, so it remains in scope after the if ends. Here the enclosing block is the method body, so for the purposes of closure scoping the compiler treats element as a method local visible to the whole method.
Because dp and element are visible across the whole method, <>c__DisplayClassX_0 is hoisted to the method scope. thread, even though narrowly scoped, is just an extra field assigned later - it doesn’t pull the allocation back down.
Net result: one capture with method-wide scope is enough to defeat the optimisation for every other capture.
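The pattern-variable scoping rule from the footnote can be seen in isolation with a small sketch. PatternScopeDemo and Describe are made-up names for illustration; the point is only that s is still usable after the if statement has ended, which shows it belongs to the enclosing block.

```csharp
using System;

public static class PatternScopeDemo
{
    public static string Describe(object o)
    {
        if (!(o is string s))
        {
            return "not a string";
        }

        // `s` was declared in the if condition, yet it is in scope
        // (and definitely assigned) here, after the if statement.
        return s.ToUpperInvariant();
    }
}
```

PatternScopeDemo.Describe("abc") returns "ABC", and PatternScopeDemo.Describe(42) returns "not a string".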
Why the symptom was so bad
The patched method DependencyObjectGetValue.Prefix is invoked from Timeline.get_AccelerationRatio on the WPF media context’s animation tick. That’s ~60 calls per second, per active timeline, on the render thread.
Each invocation:
- Allocated one <>c__DisplayClassX_0.
- On cross-thread paths, also allocated one Action and queued one DispatcherOperation to a foreign dispatcher.
At market close, a flood of orders poured into the system. Throughput was fine, but allocations spiked and the GC paused every thread to collect. The traders were furious, shouting “It’s frozen again!”
The fix
Split the method so the lambda lives in a helper whose only locals are the captures. Then no capture has scope wider than the helper, and Roslyn’s optimisation can place the newobj where you’d naively expect.
Before
    private static void TryPrintVisualTreeInfo(DependencyObject obj, DependencyProperty? dp = null)
    {
        if (obj is FrameworkElement element && !element.CheckAccess())
        {
            var thread = Dispatcher.CurrentDispatcher.Thread.Name;
            element.BeginInvoke(() => PrintVisualTreeInfo(element, thread, dp));
        }
    }
After
    private static void TryPrintVisualTreeInfo(DependencyObject obj, DependencyProperty? dp = null)
    {
        if (obj is FrameworkElement element && !element.CheckAccess())
        {
            QueuePrintVisualTreeInfo(element, dp);
        }
    }

    private static void QueuePrintVisualTreeInfo(FrameworkElement element, DependencyProperty? dp)
    {
        var thread = Dispatcher.CurrentDispatcher.Thread.Name;
        element.BeginInvoke(() => PrintVisualTreeInfo(element, thread, dp));
    }
TryPrintVisualTreeInfo now has no lambda and no captures => no DisplayClass allocation on the hot path. QueuePrintVisualTreeInfo still allocates one closure per call, but it only runs when the cross-thread condition is actually true - which is what we wanted all along.
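To check the before/after difference without WPF or a profiler, a rough measurement along the following lines may help. This is a minimal sketch under stated assumptions: ClosureAllocationDemo, Guarded, Split, Enqueue and MeasureBytes are hypothetical names, a string-length check stands in for the CheckAccess guard, and a static field stands in for the dispatcher queue. It relies on GC.GetAllocatedBytesForCurrentThread, which is available on .NET Core 3.0+ and .NET Framework 4.8.

```csharp
#nullable enable
using System;

public static class ClosureAllocationDemo
{
    // Keeps the delegate reachable, roughly as a dispatcher queue would.
    private static Action? _last;

    private static void Enqueue(Action callback) => _last = callback;

    // Same shape as the original method: the lambda sits inside the guard and
    // captures a parameter (tag) plus the guard's pattern variable (s).
    public static void Guarded(object obj, int tag)
    {
        if (obj is string s && s.Length > 100)
        {
            Enqueue(() => Console.WriteLine(s + tag));
        }
    }

    // Same shape as the fix: the guard stays here, the lambda moves to a helper.
    public static void Split(object obj, int tag)
    {
        if (obj is string s && s.Length > 100)
        {
            EnqueueSlow(s, tag);
        }
    }

    private static void EnqueueSlow(string s, int tag)
    {
        Enqueue(() => Console.WriteLine(s + tag));
    }

    // Calls `method` repeatedly with an argument that never passes the guard
    // and returns the bytes allocated on the current thread while doing so.
    public static long MeasureBytes(Action<object, int> method, int calls)
    {
        object shortString = "short";   // Length <= 100, so the guard is false
        method(shortString, 0);         // warm-up call outside the measurement

        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < calls; i++)
        {
            method(shortString, i);
        }
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```

Comparing MeasureBytes(ClosureAllocationDemo.Guarded, 100_000) with MeasureBytes(ClosureAllocationDemo.Split, 100_000) should show a figure that grows with the call count for the first and close to nothing for the second. Exact byte counts depend on the runtime and JIT, so treat them as indicative rather than precise.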
Take-aways
- The lexical position of a lambda is not the same as the lexical position of its closure allocation. Roslyn places the newobj based on capture scope.
- Any method parameter captured by a lambda forces the closure to be allocated at method entry, regardless of how deeply nested the lambda is.
- Pattern variables in if (x is T y && ...) participate in method scope for closure purposes - capturing them has the same hoisting effect as capturing a parameter.
- Hot paths must not capture method-wide variables in a lambda. If you need a lambda on a slow path, move both the lambda and the slow-path captures into a separate method. The fast path stays allocation-free.
- Trust the IL, not the source. When the allocation profiler and your mental model disagree, decompile the method. A two-minute look at the IL prolog will tell you exactly where the newobj is and why.
How to spot this in your own code
A quick heuristic during code review:
If a lambda inside a guard captures a parameter of the enclosing method (or a pattern variable from the guard itself), and the enclosing method is on a hot path, refactor it.
Or, even more mechanically: open the assembly in a decompiler and look at the method’s .locals init and the first few IL instructions. If you see a newobj ...<>c__DisplayClassN_M::.ctor() at IL_0000, the closure is unconditional. The most direct way to make it conditional is to move the lambda into a helper.
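Before opening a decompiler, it can help to narrow down which classes contain closures at all. The sketch below uses reflection to list compiler-generated display classes nested in a type. ClosureAudit, FindDisplayClasses and Sample are hypothetical names, and the filter relies on the current Roslyn naming pattern, which is an implementation detail rather than a documented contract. It only tells you that a closure class exists, not where its newobj sits, so the IL check above is still the deciding step.

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Runtime.CompilerServices;

public static class ClosureAudit
{
    // Returns the names of compiler-generated closure classes nested in `type`.
    public static string[] FindDisplayClasses(Type type) =>
        type.GetNestedTypes(BindingFlags.NonPublic | BindingFlags.Public)
            .Where(t => t.Name.Contains("DisplayClass")
                        && t.IsDefined(typeof(CompilerGeneratedAttribute), inherit: false))
            .Select(t => t.Name)
            .ToArray();
}

// Small sample type whose lambda captures a parameter, so the compiler
// has to generate a display class for it.
public static class Sample
{
    public static Func<int> MakeCounter(int start) => () => start++;
}
```

ClosureAudit.FindDisplayClasses(typeof(Sample)) returns at least one name containing DisplayClass; any type on a hot path that shows up in such a list is a candidate for the IL inspection described above.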