Implicit boxing in System.Array.IndexOf

Posted by eagleboost on August 25, 2021

1. The Problem

I was fixing some performance issues for our project last week, and interestingly the profiler shows boxing operation (allocation of char) happened inside a method that calls System.Array.IndexOf<T>(). That method is fairly simple and no boxing at all, so the .net framework source code becomes the top suspect.

2. Analysis

Although System.Array.IndexOf<T>() seems to be a generic method that shouldn’t cause any boxing, but its source code shows the real job is actually done by EqualityComparer<T>.IndexOf().

1
2
3
4
5
public static int IndexOf<T>(T[] array, T value, int startIndex, int count) 
{
  ......
  return EqualityComparer<T>.Default.IndexOf(array, value, startIndex, count);
}

In our case the data type is char, soEqualityComparer<char>.Default would return an instance of GenericEqualityComparer<char>.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
public abstract class EqualityComparer<T> : IEqualityComparer, IEqualityComparer<T>
{
  private static EqualityComparer<T> CreateComparer() 
  {
    ......
    // If T implements IEquatable<T> return a GenericEqualityComparer<T>
    if (typeof(IEquatable<T>).IsAssignableFrom(t)) {
        return an instance of GenericEqualityComparer<T>;
    }

    ......
    // Otherwise return an ObjectEqualityComparer<T>
    return new ObjectEqualityComparer<T>();
}

The problem is that GenericEqualityComparer<T> would check if the data is null before calling Equals, thus boxing would happen when the data is value type:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
internal class GenericEqualityComparer<T>: EqualityComparer<T> where T: IEquatable<T>
{
  internal override int IndexOf(T[] array, T value, int startIndex, int count) 
  {
    int endIndex = startIndex + count;
    if (value == null)  ////Boxing
    {
      for (int i = startIndex; i < endIndex; i++) 
      {
        ////Boxing
        if (array[i] == null) return i;
      }
    }
    else 
    {
      for (int i = startIndex; i < endIndex; i++) 
      {
        ////Boxing
        if (array[i] != null && array[i].Equals(value)) return i;
      }
    }
    return -1;
  }

The IL code of GenericEqualityComparer<T> also prove the existence of the box instruction:

I thought someone else might have encountered the same problem already so I searched around and did find one Collections Without the Boxing. Its problem was that ObjectEqualityComparer<T> is created because of a struct does not implement the IEquatable<T> interface, after reading the source code the author of the article had his struct implemented IEquatable<T> so that GenericEqualityComparer<T> would be created, and he thought problem was solved.

Obviously the author of that article did not realize the boxing problem of the GenericEqualityComparer<T>.IndexOf, and if we look at the source code of ObjectEqualityComparer<T>, we can see its IndexOf implementation is exactly same as GenericEqualityComparer<T>!

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
internal class ObjectEqualityComparer<T>: EqualityComparer<T>
{
  internal override int IndexOf(T[] array, T value, int startIndex, int count) 
  {
    int endIndex = startIndex + count;
    if (value == null) ////Boxing
    {
      for (int i = startIndex; i < endIndex; i++) 
      {
        ////Boxing
        if (array[i] == null) return i;
      }
    }
    else 
    {
      for (int i = startIndex; i < endIndex; i++) 
      {
        ////Boxing
        if (array[i] != null && array[i].Equals(value)) return i;
      }
    }
    return -1;
  }

3. Solution

Actually the default implementation of EqualityComparer<T>.IndexOf() is already what we need. But EqualityComparer<T> is an abstract base class that cannot used directly, so we either subclass it (but also need to implement some methods that are not needed when calling IndexOf), or just copy and paste this piece of codes.

1
2
3
4
5
6
7
8
9
10
11
public abstract class EqualityComparer<T> : IEqualityComparer, IEqualityComparer<T>
{
  internal virtual int IndexOf(T[] array, T value, int startIndex, int count) 
  {
    int endIndex = startIndex + count;
    for (int i = startIndex; i < endIndex; i++) 
    {
      if (Equals(array[i], value)) return i;
    }
    return -1;
  }

The good news is that .net Core does not have such problem anymore, after all a lot of codes are re-written for the sake of performance improvements.

1. 问题

上周在修复项目里一个性能问题时偶然发现一段使用了 System.Array.IndexOf<T>()的代码在profiler下显示有装箱操作,那段代码非常简单没有问题,那么.net framework的代码就成了首要嫌疑犯。

2. 分析

尽管System.Array.IndexOf<T>()看起来是泛型方法,但打开源代码就会发现具体工作实际上是通过EqualityComparer<T>.IndexOf()来完成的。

1
2
3
4
5
public static int IndexOf<T>(T[] array, T value, int startIndex, int count) 
{
  ......
  return EqualityComparer<T>.Default.IndexOf(array, value, startIndex, count);
}

由于我们的数据类型是charEqualityComparer<char>.Default会返回一个GenericEqualityComparer<char>的实例。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
public abstract class EqualityComparer<T> : IEqualityComparer, IEqualityComparer<T>
{
  private static EqualityComparer<T> CreateComparer() 
  {
    ......
    // If T implements IEquatable<T> return a GenericEqualityComparer<T>
    if (typeof(IEquatable<T>).IsAssignableFrom(t)) {
        return an instance of GenericEqualityComparer<T>;
    }

    ......
    // Otherwise return an ObjectEqualityComparer<T>
    return new ObjectEqualityComparer<T>();
}

问题在于GenericEqualityComparer<T>在比较的时候会测试数据是否为空,当数据类型是值类型的时候装箱就发生了,如下所示:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
internal class GenericEqualityComparer<T>: EqualityComparer<T> where T: IEquatable<T>
{
  internal override int IndexOf(T[] array, T value, int startIndex, int count) 
  {
    int endIndex = startIndex + count;
    if (value == null)  ////Boxing
    {
      for (int i = startIndex; i < endIndex; i++) 
      {
        ////Boxing
        if (array[i] == null) return i;
      }
    }
    else 
    {
      for (int i = startIndex; i < endIndex; i++) 
      {
        ////Boxing
        if (array[i] != null && array[i].Equals(value)) return i;
      }
    }
    return -1;
  }

从反编译后的IL代码也能看到box指令的身影:

网上搜索了一下类似的问题,找到一篇Collections Without the Boxing。文章描述的是一个struct由于没有实现IEquatable<T>接口导致ObjectEqualityComparer<T>被创建,作者阅读源代码后实现了IEquatable<T>接口使得GenericEqualityComparer<T>被创建,并认为问题解决了。

显然该作者并未意识到装箱操作的存在,更有趣的是如果我们打开ObjectEqualityComparer<T>的源代码会发现其IndexOf方法的实现跟GenericEqualityComparer<T>是一模一样的,也就是说他刚从坑里面爬出来转身又自己跳进去了还不自知。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
internal class ObjectEqualityComparer<T>: EqualityComparer<T>
{
  internal override int IndexOf(T[] array, T value, int startIndex, int count) 
  {
    int endIndex = startIndex + count;
    if (value == null) ////Boxing
    {
      for (int i = startIndex; i < endIndex; i++) 
      {
        ////Boxing
        if (array[i] == null) return i;
      }
    }
    else 
    {
      for (int i = startIndex; i < endIndex; i++) 
      {
        ////Boxing
        if (array[i] != null && array[i].Equals(value)) return i;
      }
    }
    return -1;
  }

3. 解决方案

EqualityComparer<T>.IndexOf()的默认实现其实就已经我们所需要的。不过EqualityComparer<T>是个虚基类没法直接用,所以要么创建一个子类(但同时也需要实现其它IndexOf用不着的方法),要么直接复制这段代码吧。

1
2
3
4
5
6
7
8
9
10
11
public abstract class EqualityComparer<T> : IEqualityComparer, IEqualityComparer<T>
{
  internal virtual int IndexOf(T[] array, T value, int startIndex, int count) 
  {
    int endIndex = startIndex + count;
    for (int i = startIndex; i < endIndex; i++) 
    {
      if (Equals(array[i], value)) return i;
    }
    return -1;
  }

好在同样的问题在.net Core中不复存在,毕竟为了提升性能代码经过了大幅重写。