No more Enums (1)

Posted by eagleboost on June 12, 2022

1. The problem

As one of the basic data types in many programming languages, Enum provides aliases for a limited set of integer values. It is very easy to use, but it has some potential problems when not used properly, and those problems can cause performance issues, especially in large-scale desktop applications.

  • Converting Enum to string

One scenario most WPF developers run into sooner or later is displaying all possible values of an Enum type in a list control, usually a ListBox or ComboBox, for the user to choose from. Sometimes displaying the name of the value itself is enough, which results in calls to the ToString() method. That is not always the case, though, especially when the Enum names are not user friendly. The common solution is to add an Attribute that defines a display name for each value, then retrieve the display name from the Attribute via reflection. Reflection is not an efficient operation, and the ToString() call is also slow for Enum. This is not a big deal for a small application, or when it only happens occasionally for display.

However, extra attention should be paid when ToString() or reflection is called very frequently. In an order management system, for example, boxing happens if we forget to call ToString() on an Enum when writing order information to the log. Even if we do call ToString() to avoid boxing, the ToString() call itself is still slow; the source code of Enum (.NET Framework) or Enum (.NET Core) shows why. Even worse, if we want to write the display name of the value to the log, reflection is needed.
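One way to keep ToString() off the hot path is to compute each name once and look it up afterwards. The sketch below is only an illustration of that idea: `EnumNames<T>` and the sample `Status` enum are hypothetical names that do not appear elsewhere in this post, and the lookup itself still has a cost.

```csharp
using System;
using System.Collections.Generic;

// Sample enum used only for this illustration
public enum Status { New, Open, Cancelled }

// Hypothetical helper: caches the result of ToString() for every defined value
public static class EnumNames<T> where T : struct, Enum
{
  private static readonly Dictionary<T, string> Names = BuildNames();

  private static Dictionary<T, string> BuildNames()
  {
    var names = new Dictionary<T, string>();
    foreach (T value in Enum.GetValues(typeof(T)))
    {
      // The indexer tolerates enum members that share the same value
      names[value] = value.ToString();
    }
    return names;
  }

  public static string Get(T value)
  {
    // Fall back to ToString() for values that are not defined members
    return Names.TryGetValue(value, out var name) ? name : value.ToString();
  }
}
```

A logging call would then pass `EnumNames<Status>.Get(order.Status)` (with `order` being whatever object carries the status) instead of the Enum value itself.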

  • Parsing string to Enum

Where there is a need to convert an Enum to a string, there is also a need to convert the string back to an Enum, aka the Parse operation. When the string is the name of the value we can call Enum.Parse(). The .NET Framework version of this method allocates temporary strings; the .NET Core version uses Span<char> to avoid allocating temporary strings on the heap, but according to the source code the cost is still not trivial. Enum.Parse() does not work if an Attribute is used to define a custom display name; the common approach is to cache the names for reuse, like this:

public static class EnumParser<T> where T : struct, Enum
{
  private static readonly Dictionary<string, T> StringToEnum;

  static EnumParser()
  {
    StringToEnum = new Dictionary<string, T>(StringComparer.OrdinalIgnoreCase);
    foreach (T value in Enum.GetValues(typeof(T)))
    {
      //GetName returns the display name of the value from the Attribute; implementation details are omitted here
      var name = GetName(value);
      StringToEnum.Add(name, value);
    }
  }
  
  public static T Parse(string value)
  {
    return StringToEnum[value];
  }
  
  public static bool TryParse(string value, out T result)
  {
    return StringToEnum.TryGetValue(value, out result);
  }
}
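The GetName helper is left out above. As a minimal sketch, assuming the display name is stored in a `System.ComponentModel.DescriptionAttribute` (the post does not say which Attribute is used), it could look like this. `EnumDisplayName` and the sample `Status` enum are hypothetical names.

```csharp
using System;
using System.ComponentModel;
using System.Reflection;

// Sample enum used only for this illustration
public enum Status
{
  [Description("New order")] New,
  [Description("Open order")] Open,
  Cancelled
}

// Hypothetical stand-in for the GetName helper referenced by EnumParser<T>
public static class EnumDisplayName
{
  public static string GetName<T>(T value) where T : struct, Enum
  {
    var memberName = value.ToString();

    // Enum members are public static fields on the enum type
    var field = typeof(T).GetField(memberName, BindingFlags.Public | BindingFlags.Static);
    var attribute = field?.GetCustomAttribute<DescriptionAttribute>();

    // Fall back to the member name when no attribute is present
    return attribute?.Description ?? memberName;
  }
}
```

Because this uses reflection, it is only suitable for one-off work such as filling the cache in a static constructor.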

  • Type casting

Say we have a variable status of an Enum type Status. We can convert it to an integer with an explicit cast:

var intValue = (int)status;

But this only works when the cast is explicit, that is, when the concrete Enum type is known at the call site. Note that Enum also implements IConvertible, so generic code is sometimes written this way:

////ToInt32 is an explicit interface implementation, so the value is accessed through IConvertible
var intValue = ((IConvertible)status).ToInt32(null);

////Generic convert
public static class EnumConvert<TIn> where TIn: Enum, IConvertible
{
  public static int From(TIn value)
  {
    return value.ToInt32(null);
  }
}

The result is the same as with the explicit cast, but one fact developers often overlook is the implicit boxing in the IConvertible implementation:

int IConvertible.ToInt32(IFormatProvider? provider)
{
  //Return type of the GetValue() method is System.Object
  return Convert.ToInt32(GetValue());
}
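For comparison, generic code can avoid the IConvertible path entirely by reinterpreting the value. The sketch below assumes the `Unsafe` class from `System.Runtime.CompilerServices` is available (it ships with recent .NET versions and as a NuGet package for .NET Framework) and that the Enum is backed by a 4-byte integer. `EnumInt32<T>` and the sample `Status` enum are hypothetical names.

```csharp
using System;
using System.Runtime.CompilerServices;

// Sample enum used only for this illustration (default underlying type is int)
public enum Status { New = 0, Open = 1, Cancelled = 2 }

// Hypothetical helper: converts an int-sized enum to int without going through IConvertible
public static class EnumInt32<T> where T : struct, Enum
{
  public static int From(T value)
  {
    // Reinterpreting is only valid when T occupies exactly 4 bytes
    if (Unsafe.SizeOf<T>() != sizeof(int))
    {
      throw new NotSupportedException($"'{typeof(T)}' is not backed by a 4-byte integer");
    }

    // Reads the same bytes as an int; no object is allocated
    return Unsafe.As<T, int>(ref value);
  }
}
```

The size check makes the helper refuse enums declared with a smaller or larger underlying type instead of silently reading the wrong number of bytes.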

2. Box the Enum values

One way to avoid repeated boxing is to box the values once and reuse the boxed instances. For example, Boolean has only two possible values, so we can box both in advance and reuse the boxed values where needed (not everywhere). This approach is widely used in WPF when creating Boolean Dependency Properties.

public static class BooleanBox
{
  public static readonly object True = true;
  public static readonly object False = false;
  
  public static object Box(bool value)
  {
    return value ? True : False;
  }
  
  public static object Box(bool? value)
  {
    return value.HasValue ? Box(value.Value) : null;
  }
}

Like Boolean, an Enum has a limited number of possible values, so we can box them once and reuse them later. Below is a simple implementation for demo purposes:

public static class EnumBox<T> where T : struct, Enum
{
  private static readonly Dictionary<T, object> BoxedValues = new();

  static EnumBox()
  {
    foreach (var value in Enum.GetValues(typeof(T)))
    {
      ////Use the indexer rather than Add so that Enum members sharing the same value do not throw
      BoxedValues[(T)value] = value;
    }
  }
  
  public static object Box(T value)
  {
    return BoxedValues[value];
  }
}

3. There are still problems

So far we have fixed these problems:

| Problem | Fix |
| --- | --- |
| Boxing | Use EnumBox<T> or a similar implementation |
| Parse Enum from string | Use EnumParser<T> or a similar implementation |
| Convert to integer | Avoid using IConvertible |

However, although EnumBox<T> avoids boxing, a dictionary lookup still has a cost, unboxing is not free either, and it feels strange to create helper classes like EnumBox<T> and EnumParser<T> just for a type that is basically an integer.

The basic purpose of Enum is really just to act as an alias, but real-world applications, especially modern ones, use it in far more ways. Display and serialization, for instance, cannot avoid boxing/unboxing. Beyond those, there are more advanced scenarios that are hard to implement with a traditional Enum. The examples below come from an order management system:

| Scenario | Description |
| --- | --- |
| Searching and Filtering | One column in the grid is the country where the order originated. The country code is an Enum by definition, but users might want to enter CN or China in the column header to search/filter orders from China. |
| Sorting | The order in which Enum values are defined is not necessarily the order in which they should be displayed. Taking country as an example, the country code can be just an integer, but the column usually should be sorted by the string presented to the user (country short name or full name). |
| Special Displaying | Taking country as an example again, we might want to show the full name, short name, ISO code or flag in different places. |
| Dynamic Loading | Although country is an Enum by definition, it has many values and is barely used in if statements or switch-case statements. So instead of defining all values as static members of the Country class, it is better to store the values in a configuration file or database and load them into memory at runtime. |

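To implement all of the features listed above, creating a dedicated data type instead of using Enum may be a better choice. If the new type is defined as a class rather than a value type, each of the problems above becomes easy to solve.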
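To make the scenarios in the table more concrete, here is a rough sketch of what a class-based Country could look like. Everything in it is an assumption made for illustration: the member names, the `Load` method, and the tuple rows standing in for a configuration file or database. It is not the implementation developed in the next section.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical class-based replacement for a Country enum
public sealed class Country
{
  private static readonly Dictionary<string, Country> ByKey = new(StringComparer.OrdinalIgnoreCase);
  private static readonly List<Country> Items = new();

  private Country(string code, string name)
  {
    Code = code;
    Name = name;
  }

  public string Code { get; }

  public string Name { get; }

  // Dynamic loading: the rows could come from a config file or a database
  public static void Load(IEnumerable<(string Code, string Name)> rows)
  {
    foreach (var (code, name) in rows)
    {
      var country = new Country(code, name);
      ByKey[code] = country;
      ByKey[name] = country;
      Items.Add(country);
    }
  }

  // Searching and filtering: accepts either the code or the name, ignoring case
  public static bool TryParse(string text, out Country country)
  {
    return ByKey.TryGetValue(text, out country);
  }

  // Sorting: ordered by the string shown to the user, not by definition order
  public static IReadOnlyList<Country> SortedByName()
  {
    return Items.OrderBy(c => c.Name, StringComparer.OrdinalIgnoreCase).ToList();
  }

  public override string ToString()
  {
    return Name;
  }
}
```

After `Country.Load(new[] { ("US", "United States"), ("CN", "China") })`, both `Country.TryParse("cn", out var c)` and `Country.TryParse("China", out var c)` resolve to the same instance.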

4. TypeSafeEnum

The base class TypeSafeEnum defines Id and Name. Id is the unique identity of a TypeSafeEnum value. We use string rather than integer as its type for two reasons: first, identities are strings most of the time; second, when serializing we can use the string directly without converting back and forth between strings and integers.

public abstract class TypeSafeEnum
{
  protected TypeSafeEnum(string id, string name)
  {
    Id = id;
    Name = name;
  }
  
  public readonly string Id;
  
  public readonly string Name;
  
  public override string ToString()
  {
    return Name;
  }
}

Then we can define the generic class TypeSafeEnum<T>, whose core functionality is implemented by a nested class, EnumCache.

public abstract partial class TypeSafeEnum<T> : TypeSafeEnum where T : TypeSafeEnum<T>
{
  protected TypeSafeEnum(string name) : this(name, name)
  {
  }

  protected TypeSafeEnum(string id, string name) : base(id, name)
  {
    EnumCache.Add(id, (T)this);
  }
  
  ////Returns the instance of T matching the given Id or Name; throws an exception when no value is found
  public static T Parse(string id)
  {
    return EnumCache.Find(id);
  }

  ////Returns the instance of T matching the given Id or Name; returns false when not found
  public static bool TryParse(string id, out T result)
  {
    return EnumCache.TryFind(id, out result);
  }
  
  ////Returns all possible values of the type T
  public static IReadOnlyCollection<T> AllItems => EnumCache.Items;
}

Below is a simple implementation of the core class EnumCache. Two things are worth mentioning. First, it creates a dictionary to store the mapping from Id and Name to the corresponding value (an array or list could be used instead when the number of values is small, for faster lookup and lower memory usage). Second, its static constructor triggers the static constructor of type T, so that all static instances defined on T are created before any other method of EnumCache is called.

using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

public abstract partial class TypeSafeEnum<T> where T : TypeSafeEnum<T>
{
  private static class EnumCache
  {
    private static readonly Dictionary<string, T> Map = new(StringComparer.OrdinalIgnoreCase);
    private static readonly List<T> Values = new();

    static EnumCache()
    {
      ////Trigger the static constructor of type T and create all instances of T
      RuntimeHelpers.RunClassConstructor(typeof(T).TypeHandle);
    }
    
    public static IReadOnlyCollection<T> Items => Values;
      
    public static T Find(string id)
    {
      ////Look the value up first instead of using an exception for control flow
      if (TryFind(id, out var item))
      {
        return item;
      }

      throw new ArgumentException($"Cannot parse '{id}' for '{typeof(T)}'");
    }

    public static bool TryFind(string id, out T item)
    {
      item = FindCore(id);
      return item != null;
    }
    
    public static void Add(string id, T item)
    {
      Map.Add(id, item);
      ////Id and Name are the same key when the single-argument constructor is used, so add Name only when it differs
      if (!string.Equals(id, item.Name, StringComparison.OrdinalIgnoreCase))
      {
        Map.Add(item.Name, item);
      }
      Values.Add(item);
    }

    private static T FindCore(string id)
    {
      return Map.TryGetValue(id, out var itemById) ? itemById : null;
    }
  }
}

Then we can define a new type based on the generic base class:

public sealed class TypeSafeStatus : TypeSafeEnum<TypeSafeStatus>
{
  public static readonly TypeSafeStatus New = new ("0", "New");
  public static readonly TypeSafeStatus Open = new ("1", "Open");
  public static readonly TypeSafeStatus Cancelled = new ("2", "Cancelled");
  
  private TypeSafeStatus(string id, string name) : base(id, name)
  {
  }
}

Some unit tests:

[Test]
public void Test_01_ParseById()
{
  Assert.That(TypeSafeStatus.Parse("0"), Is.EqualTo(TypeSafeStatus.New));
  Assert.That(TypeSafeStatus.Parse("1"), Is.EqualTo(TypeSafeStatus.Open));
  Assert.That(TypeSafeStatus.Parse("2"), Is.EqualTo(TypeSafeStatus.Cancelled));
}

[Test]
public void Test_02_ParseByName()
{
  Assert.That(TypeSafeStatus.Parse("New"), Is.EqualTo(TypeSafeStatus.New));
  Assert.That(TypeSafeStatus.Parse("Open"), Is.EqualTo(TypeSafeStatus.Open));
  Assert.That(TypeSafeStatus.Parse("Cancelled"), Is.EqualTo(TypeSafeStatus.Cancelled));
}
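Beyond parsing, the same pattern covers the everyday uses of an Enum: listing all values (for a ComboBox, say), comparing values, and handling unknown input. The block below is a condensed, self-contained restatement of the idea so that it can be compiled on its own. `MiniEnum<T>`, `MiniStatus` and `MiniStatusDemo` are hypothetical names, and the lookup is folded into the base class rather than a nested EnumCache.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Condensed, hypothetical stand-in for TypeSafeEnum<T>
public abstract class MiniEnum<T> where T : MiniEnum<T>
{
  private static readonly Dictionary<string, T> Map = new(StringComparer.OrdinalIgnoreCase);
  private static readonly List<T> Values = new();

  static MiniEnum()
  {
    // Make sure the static instances declared on T exist before any lookup
    RuntimeHelpers.RunClassConstructor(typeof(T).TypeHandle);
  }

  protected MiniEnum(string id, string name)
  {
    Id = id;
    Name = name;
    Map.Add(id, (T)this);
    // Register Name as a second key only when it differs from Id
    if (!string.Equals(id, name, StringComparison.OrdinalIgnoreCase))
    {
      Map.Add(name, (T)this);
    }
    Values.Add((T)this);
  }

  public string Id { get; }

  public string Name { get; }

  // All values in declaration order, e.g. as the item source of a list control
  public static IReadOnlyCollection<T> AllItems => Values;

  public static bool TryParse(string text, out T result) => Map.TryGetValue(text, out result);

  public override string ToString() => Name;
}

public sealed class MiniStatus : MiniEnum<MiniStatus>
{
  public static readonly MiniStatus New = new("0", "New");
  public static readonly MiniStatus Open = new("1", "Open");
  public static readonly MiniStatus Cancelled = new("2", "Cancelled");

  private MiniStatus(string id, string name) : base(id, name) { }
}

public static class MiniStatusDemo
{
  // Each value is a single shared instance, so reference comparison plays the role of enum ==
  public static bool IsActive(MiniStatus status)
  {
    return status == MiniStatus.New || status == MiniStatus.Open;
  }
}
```

Since every value is a reference type instance created once, passing it to an API that takes object involves no boxing, and ToString() simply returns the stored Name.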
