Avoid Using Enum Types (2): TypeSafeEnum

TypeSafeEnum

Posted by eagleboost on June 16, 2022

1. Benchmark

Let’s write some benchmarks comparing the traditional Enum, the EnumBox/EnumParser helper classes, and TypeSafeEnum in three respects: parsing from a string, converting to a string, and accessing all values.

  • Parsing from string - The EnumBox/EnumParser helper classes and TypeSafeEnum perform similarly; after all, a dictionary lookup is much faster than Enum.Parse.
public class Benchmark_Parse
{
  [GlobalSetup]
  public void Setup()
  {
    ////Trigger static constructors to warm up
    RuntimeHelpers.RunClassConstructor(typeof(TypeSafeStatus).TypeHandle);
    RuntimeHelpers.RunClassConstructor(typeof(EnumParser<Status>).TypeHandle);
  }
  
  [Benchmark(Baseline = true)]
  [Arguments("New")]
  [Arguments("Open")]
  [Arguments("Cancelled")]
  public void Enum_Parse(string status)
  {
    Enum.Parse(typeof(Status), status);
  }
    
  [Benchmark]
  [Arguments("New")]
  [Arguments("Open")]
  [Arguments("Cancelled")]
  public void EnumBox_Parse(string status)
  {
    EnumParser<Status>.Parse(status);
  }
    
  [Benchmark]
  [Arguments("New")]
  [Arguments("Open")]
  [Arguments("Cancelled")]
  public void TypeSafeEnum_Parse(string status)
  {
    TypeSafeStatus.Parse(status);
  } 
}

  • Converting to string - The helper classes only avoid repeated boxing; Enum.ToString() itself is still slow. TypeSafeEnum is the fastest because it directly accesses the Name property.
public class Benchmark_ToString
{
  [Benchmark(Baseline = true)]
  public void Enum_ToString()
  {
    var str = Status.New.ToString();
  }
    
  [Benchmark]
  public void EnumBox_ToString()
  {
    var str = EnumBox<Status>.Box(Status.New).ToString();
  }
    
  [Benchmark]
  public void TypeSafeEnum_ToString()
  {
    var str = TypeSafeStatus.New.ToString();
  } 
}

  • Accessing all possible values - TypeSafeEnum wins again because it directly accesses the AllItems property.
public class Benchmark_All
{
  [Benchmark(Baseline = true)]
  public void Enum_All()
  {
    var items = Enum.GetValues<Status>();
  }
    
  [Benchmark]
  public void EnumBox_All()
  {
    var items = EnumParser<Status>.AllItems;
  }
    
  [Benchmark]
  public void TypeSafeEnum_All()
  {
    var items = TypeSafeStatus.AllItems;
  } 
}
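
To make the three comparisons above more concrete, here is a minimal sketch of why a type-safe enum can answer all three questions with a plain lookup or property read. TypeSafeEnumSketch<T> and StatusSketch are hypothetical names used only for illustration; the real TypeSafeEnum<T> in the demo project may be implemented differently. The sketch assumes Parse looks items up by name, as the benchmark arguments do.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical base class for illustration; not the demo project's implementation.
public abstract class TypeSafeEnumSketch<T> where T : TypeSafeEnumSketch<T>
{
  // Both collections are filled as the static instances of T are constructed.
  private static readonly Dictionary<string, T> ItemsByName = new Dictionary<string, T>();
  private static readonly List<T> Items = new List<T>();

  static TypeSafeEnumSketch()
  {
    // Parse and AllItems are inherited statics, so calling them does not by
    // itself initialize T; force T's static fields to be created here.
    RuntimeHelpers.RunClassConstructor(typeof(T).TypeHandle);
  }

  protected TypeSafeEnumSketch(string id, string name)
  {
    Id = id;
    Name = name;
    ItemsByName.Add(name, (T)this);
    Items.Add((T)this);
  }

  public string Id { get; }

  public string Name { get; }

  // Accessing all values is a property read, with no reflection involved.
  public static IReadOnlyList<T> AllItems => Items;

  // Parsing is a single dictionary lookup.
  public static T Parse(string name)
  {
    if (ItemsByName.TryGetValue(name, out var item)) return item;
    throw new ArgumentException("Unknown name: " + name, nameof(name));
  }

  // Converting to string returns the stored name.
  public override string ToString() => Name;
}

// Hypothetical counterpart of the TypeSafeStatus class used in the benchmarks.
public sealed class StatusSketch : TypeSafeEnumSketch<StatusSketch>
{
  public static readonly StatusSketch New = new StatusSketch("0", "New");
  public static readonly StatusSketch Open = new StatusSketch("1", "Open");
  public static readonly StatusSketch Cancelled = new StatusSketch("2", "Cancelled");

  private StatusSketch(string id, string name) : base(id, name)
  {
  }
}
```

With this sketch, StatusSketch.Parse("Open") returns the same instance as StatusSketch.Open, so items can be compared by reference.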

2. Use cases

Assume there’s an OrderStatus class. Using it in an if statement is straightforward; to use it in a switch-case statement, we only need to make a minor change to the OrderStatus class:

switch (status.Id)
{
  case OrderStatusId.New:
    ......
    break;
  case OrderStatusId.Open:
    ......
    break;
}

public sealed class OrderStatus : TypeSafeEnum<OrderStatus>
{
  public static readonly OrderStatus New = new (OrderStatusId.New, "New");
  public static readonly OrderStatus Open = new (OrderStatusId.Open, "Open");
  public static readonly OrderStatus Cancelled = new (OrderStatusId.Cancelled, "Cancelled");
  
  private OrderStatus(string id, string name) : base(id, name)
  {
  }
}

////Used in switch-case statement
public static class OrderStatusId
{
  public const string New = "0";
  public const string Open = "1";
  public const string Cancelled = "2";
}

It is even more powerful and efficient in scenarios where a huge amount of data needs to be parsed.

For instance, in a stock trading system each order has many properties. Apart from basic information such as symbol, price, and quantity, it also has Enum-typed properties such as order status and side. When orders are stored in a database or in some other network service, we usually save only the value of each property rather than its name, for example 1 for Buy and 2 for Sell instead of "Buy" and "Sell". When the orders are loaded on the client side, we might get something like this:

55=IBM;11=636730640278898634;15=USD;38=7000;40=1;54=1;39=0;10000=UserId_123

The tag/value pairs mean the following:

| Tag/Value | Meaning |
| --- | --- |
| 55=IBM | The symbol is IBM |
| 11=636730640278898634 | Unique identity of the order |
| 15=USD | The order is traded in USD |
| 38=7000 | The order quantity is 7000 shares |
| 40=1 | It’s a day order |
| 54=1 | It’s a Buy order |
| 39=0 | The order is in New status |
| 10000=UserId_123 | The id of the trader is UserId_123 |
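
As a first step, the raw message can be split into tag/value pairs. The following is a minimal sketch under the assumption that pairs are separated by ';' and that each pair has the form tag=value, as in the sample above; TagValueSketch is a hypothetical helper, not part of the demo project.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only.
public static class TagValueSketch
{
  // Splits "tag=value;tag=value" into a dictionary keyed by tag.
  public static Dictionary<string, string> Split(string message)
  {
    var result = new Dictionary<string, string>();
    var pairs = message.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
    foreach (var pair in pairs)
    {
      var index = pair.IndexOf('=');
      if (index <= 0)
      {
        throw new FormatException("Expected tag=value but got: " + pair);
      }

      // Everything before the first '=' is the tag, the rest is the value.
      result[pair.Substring(0, index)] = pair.Substring(index + 1);
    }

    return result;
  }
}
```

For the sample message, Split returns eight entries, and the entry for tag "39" holds the string "0", which still needs to be converted into a status object.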

Once parsed, we’d like to have something like this in memory:

public class Order
{
  public Symbol Symbol { get; set; } //IBM

  public string OrderId { get; set; } //636730640278898634

  public Currency Currency { get; set; } //USD

  public double Quantity { get; set; } //7000 shares

  public OrderType OrderType { get; set; } //Day order

  public Side Side { get; set; } //Buy

  public OrderStatus OrderStatus { get; set; } //New

  public Trader Trader { get; set; } //Michael Jordan (UserId_123)
}

If Currency, OrderType, etc. are defined as traditional Enum types, the massive number of calls to the slow Enum.Parse may cause performance issues.

TypeSafeEnum solves this problem efficiently. We can attach a TypeConverter to it for data conversion; in the OrderStatus example, OrderStatusConverter eventually calls the fast TypeSafeEnum<T>.Parse method.

[TypeConverter(typeof(OrderStatusConverter))]
public sealed class OrderStatus : TypeSafeEnum<OrderStatus>
{
  ......
}

public class OrderStatusConverter : TypeConverter
{
  public override bool CanConvertFrom(ITypeDescriptorContext context, Type sourceType)
  {
    return sourceType == typeof(string);
  }

  public override object ConvertFrom(ITypeDescriptorContext context, CultureInfo culture, object value)
  {
    var str = (string)value;
    return OrderStatus.Parse(str);
  }
}

////Demo code, if the property here is OrderStatus, stringValue is "0", then after parsing, value would be OrderStatus.New

if (typeof(TypeSafeEnum).IsAssignableFrom(p.PropertyType))
{
  var converter = TypeDescriptor.GetConverter(p.PropertyType);
  var value = converter.ConvertFromString(stringValue);
  ////Set value to the property p
  ......
}
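
To show how the pieces above could fit together end to end, here is a self-contained sketch. All type names ending in Sketch, and the tag-to-property map, are assumptions made for illustration. The status class carries its own dictionary instead of deriving from TypeSafeEnum<T> so that the example compiles on its own, and it assumes ids "0", "1", and "2" for New, Open, and Cancelled.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Globalization;

// Hypothetical stand-in for OrderStatus, looked up by id.
[TypeConverter(typeof(OrderStatusSketchConverter))]
public sealed class OrderStatusSketch
{
  // Declared first so that it exists before the instances below register themselves.
  private static readonly Dictionary<string, OrderStatusSketch> ById =
    new Dictionary<string, OrderStatusSketch>();

  public static readonly OrderStatusSketch New = new OrderStatusSketch("0", "New");
  public static readonly OrderStatusSketch Open = new OrderStatusSketch("1", "Open");
  public static readonly OrderStatusSketch Cancelled = new OrderStatusSketch("2", "Cancelled");

  private OrderStatusSketch(string id, string name)
  {
    Id = id;
    Name = name;
    ById.Add(id, this);
  }

  public string Id { get; }

  public string Name { get; }

  // Throws KeyNotFoundException for an unknown id.
  public static OrderStatusSketch Parse(string id) => ById[id];
}

public sealed class OrderStatusSketchConverter : TypeConverter
{
  public override bool CanConvertFrom(ITypeDescriptorContext context, Type sourceType)
    => sourceType == typeof(string) || base.CanConvertFrom(context, sourceType);

  public override object ConvertFrom(ITypeDescriptorContext context, CultureInfo culture, object value)
    => value is string id ? OrderStatusSketch.Parse(id) : base.ConvertFrom(context, culture, value);
}

public sealed class OrderSketch
{
  public string OrderId { get; set; }

  public OrderStatusSketch OrderStatus { get; set; }
}

public static class OrderLoaderSketch
{
  // Assumed mapping from tags in the sample message to property names.
  private static readonly Dictionary<string, string> PropertyByTag = new Dictionary<string, string>
  {
    { "11", nameof(OrderSketch.OrderId) },
    { "39", nameof(OrderSketch.OrderStatus) },
  };

  public static OrderSketch Load(IDictionary<string, string> fields)
  {
    var order = new OrderSketch();
    foreach (var field in fields)
    {
      if (!PropertyByTag.TryGetValue(field.Key, out var propertyName)) continue;

      var p = typeof(OrderSketch).GetProperty(propertyName);
      // Resolves OrderStatusSketchConverter via the attribute, or the built-in converter for string.
      var converter = TypeDescriptor.GetConverter(p.PropertyType);
      p.SetValue(order, converter.ConvertFromString(field.Value));
    }

    return order;
  }
}
```

The loader never mentions OrderStatusSketch by name: the attribute on the type is enough for TypeDescriptor to find the converter, so adding another type-safe enum property only requires another entry in the map.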

3. Conclusion

TypeSafeEnum isn’t free: it clearly uses more memory than a traditional Enum, which is just an integer. But the return on investment is enormous, and any Enum that is displayed to users can be replaced with TypeSafeEnum.

You may have noticed that we didn’t discuss Enums used as flags for fast flag tests. We can simply add a Flag property to TypeSafeEnum to achieve a similar result.

Please visit GitHub for the demo project: https://github.com/eagleboost/TypeSafeEnum
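
One possible shape for such a Flag property is sketched below. PermissionSketch, its members, and the Combine and IsSetIn helpers are hypothetical names chosen for illustration; the sketch assumes each item owns a single distinct bit.

```csharp
using System;

// Hypothetical type-safe enum whose items each carry one bit.
public sealed class PermissionSketch
{
  public static readonly PermissionSketch Read = new PermissionSketch("Read", 1 << 0);
  public static readonly PermissionSketch Write = new PermissionSketch("Write", 1 << 1);
  public static readonly PermissionSketch Execute = new PermissionSketch("Execute", 1 << 2);

  private PermissionSketch(string name, int flag)
  {
    Name = name;
    Flag = flag;
  }

  public string Name { get; }

  // The bit that represents this item in a combined mask.
  public int Flag { get; }

  // Combines several items into a plain integer mask.
  public static int Combine(params PermissionSketch[] items)
  {
    var mask = 0;
    foreach (var item in items)
    {
      mask |= item.Flag;
    }

    return mask;
  }

  // Tests whether this item's bit is set in the mask.
  public bool IsSetIn(int mask) => (mask & Flag) == Flag;
}
```

The flag test stays a single bitwise operation on integers, while Name remains available for display without calling Enum.ToString().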
