Linux Kernel Notification Chain

⚠ Please credit the source when reposting. Author: ZobinHuang. Last updated: Nov. 22, 2021

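This work is licensed by ZobinHuang under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Please check the license terms before using or sharing it. Infringement will be pursued through legal means to protect the author's legitimate rights. Thank you for your cooperation.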

1. What Notification Chains Are For

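As is well known, the Linux kernel is a monolithic kernel assembled from many subsystems that depend heavily on one another. Unlike a microkernel, which lets its modules interact through a client/server model, Linux uses the Notification Chain: through callback functions, a subsystem publishes an asynchronous event it has detected to the other subsystems that are interested in that event. This article covers how Notification Chains are implemented and how kernel subsystems are tied together through them. We start with an intuitive usage scenario.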

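Consider a network topology in which the host RT has two candidate paths to Network F, and assume the routing table currently holds the route that leaves through the eth3 interface and crosses Network E. When eth3 goes down, the route entry should in principle be switched to the one that leaves through eth0 and crosses Network A. In the Linux kernel, the state of a network interface is the responsibility of the device driver, which can detect whether the device is up or down, while routing information is the responsibility of the routing subsystem, which maintains the routing table. The Notification Chain is the mechanism that lets the device driver tell the routing subsystem to update its routing table entries when an interface goes down. The rest of this article describes it.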

2. Basic Mechanism

2.1 Publish-and-Subscribe Model

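The Notification Chain mechanism involves two roles:

Notifier (active side): the owner of the Notification Chain. When it detects an asynchronous event, it calls the callback functions registered on the chain.
Notified (passive side): a subsystem interested in an event raised by another subsystem. It registers its callback function for that event on the chain.

In terms of implementation, the Notifier manages a linked list on which each Notified subsystem can register its callback function. When the corresponding event occurs, the callbacks registered on the list are invoked. The concrete data structures are shown below.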

2.2 The Notifier Declares a Notification Chain

As described above, a Notification Chain is a linked list of struct notifier_block structures. Each struct notifier_block represents one subsystem that has registered on the chain because it is interested in an event. The structure is defined in include/linux/notifier.h as follows:

struct notifier_block {
  notifier_fn_t notifier_call;
  struct notifier_block __rcu *next;
  int priority;
};

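Let us go through the members of struct notifier_block. notifier_call is the callback that the registering subsystem supplies for the event. Which function runs when an event occurs is therefore decided by the receiver of the event (the Notified), not by the Notifier, which is the sensible division of responsibility. next is the list pointer to the next struct notifier_block. Note the __rcu macro that appears here (p.s. if you are not familiar with Read-Copy Update, see my other article, Read-Copy Update (RCU) Synchronization Mechanism). It is defined in include/linux/compiler_types.h as follows: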

# define __rcu __attribute__((noderef, address_space(4)))

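This macro tells the Sparse static analysis tool explicitly that the annotated pointer is protected by RCU, so that Sparse can detect improper accesses to RCU-protected shared data, such as reader code that accesses the memory the pointer refers to without going through rcu_dereference() or one of its variants. Sparse reports such errors when it finds them.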

Continuing with the members of struct notifier_block, priority gives the priority of this notifier_block within its Notification Chain. A notifier_block with a higher priority runs earlier when the event occurs, so the chain is in fact a sorted list. In practice, priority is usually left at 0 for every notifier_block, in which case the callbacks run in the order in which the blocks were registered.

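2.3 The Notified Registers a Callback on a Notification Chain

The previous section showed that a Notification Chain is essentially a linked list of struct notifier_block. This section looks at how a Notified subsystem registers its callback on the chain it is interested in.

A Notified subsystem registers a callback on a Notification Chain through notifier_chain_register. This is a static core routine, reached through the exported wrappers listed later in this section, and it is implemented in kernel/notifier.c as follows: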

/*
 *	Notifier chain core routines.  The exported routines below
 *	are layered on top of these, with appropriate locking added.
 */
static int notifier_chain_register(struct notifier_block **nl,
    struct notifier_block *n)
{
  while ((*nl) != NULL) {
    if (unlikely((*nl) == n)) {
      WARN(1, "double register detected");
      return 0;
    }
    if (n->priority > (*nl)->priority)
      break;
    nl = &((*nl)->next);
  }
  n->next = *nl;
  rcu_assign_pointer(*nl, n);
  return 0;
}

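This function simply inserts a node into a list, and the insertion position follows the priority ordering described above: the new block is placed in front of the first existing block whose priority is strictly lower, so blocks with equal priority keep their registration order.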
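
To make the ordering rule concrete, here is a minimal userspace sketch of the same insertion walk. It is not kernel code: struct demo_block, its id field, and the demo_ function names are hypothetical stand-ins, and the RCU publication and the duplicate-registration check are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for struct notifier_block (callback omitted). */
struct demo_block {
    int id;                    /* hypothetical label, for inspection only */
    int priority;
    struct demo_block *next;
};

/* Same walk as notifier_chain_register(), minus RCU and the duplicate check. */
static void demo_chain_register(struct demo_block **nl, struct demo_block *n)
{
    assert(n != NULL);
    while (*nl != NULL) {
        if (n->priority > (*nl)->priority)
            break;             /* strictly greater: insert in front of *nl */
        nl = &(*nl)->next;
    }
    n->next = *nl;
    *nl = n;
}

/* Copy the ids in chain order into out[]; returns the number of entries. */
static size_t demo_chain_ids(const struct demo_block *head,
                             int *out, size_t max)
{
    size_t i = 0;
    for (; head != NULL && i < max; head = head->next)
        out[i++] = head->id;
    return i;
}
```

Registering two blocks with priority 0 and then one with priority 10 leaves the priority-10 block at the head, followed by the two priority-0 blocks in the order they were registered.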

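Where there is registration there is also unregistration. A Notified subsystem removes its notifier_block from a Notification Chain through notifier_chain_unregister, also defined in kernel/notifier.c. The definition is shown below; it is simple enough that we do not walk through it here.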

static int notifier_chain_unregister(struct notifier_block **nl,
    struct notifier_block *n)
{
  while ((*nl) != NULL) {
    if ((*nl) == n) {
      rcu_assign_pointer(*nl, n->next);
      return 0;
    }
    nl = &((*nl)->next);
  }
  return -ENOENT;
}
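
The pointer-to-pointer idiom used by the removal loop can be tried outside the kernel. The following is a minimal userspace sketch under simplifying assumptions: struct demo_block and the demo_ functions are hypothetical, a plain assignment replaces rcu_assign_pointer, and there is no locking.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_block {
    int id;                    /* hypothetical label */
    struct demo_block *next;
};

/* Append at the tail; just enough to build a chain for the removal demo. */
static void demo_chain_append(struct demo_block **nl, struct demo_block *n)
{
    assert(n != NULL);
    while (*nl != NULL)
        nl = &(*nl)->next;
    n->next = NULL;
    *nl = n;
}

/*
 * nl always points at the link (head pointer or some ->next field) that
 * leads to the node under inspection, so unlinking is a single store and
 * the head needs no special case.
 */
static int demo_chain_unregister(struct demo_block **nl, struct demo_block *n)
{
    while (*nl != NULL) {
        if (*nl == n) {
            *nl = n->next;
            return 0;
        }
        nl = &(*nl)->next;
    }
    return -ENOENT;            /* n was not on the chain */
}
```

Removing the head, removing the tail, and removing a block that is no longer on the chain all go through the same loop; only the last case returns -ENOENT.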

The kernel has several commonly used network-related Notification Chains, including inetaddr_chain, inet6addr_chain, and netdev_chain. inetaddr_chain announces the addition, removal, or change of an IPv4 address on a local network interface; netdev_chain announces the registration state of network devices. These chains usually come with wrapper functions around notifier_chain_register and notifier_chain_unregister, summarized below:

Operation	Chain	Function Prototype
Registration	core routine	`static int notifier_chain_register(struct notifier_block **nl, struct notifier_block *n)`
Registration	`inetaddr_chain` wrapper	`int register_inetaddr_notifier(struct notifier_block *nb)`
Registration	`inet6addr_chain` wrapper	`int register_inet6addr_notifier(struct notifier_block *nb)`
Registration	`netdev_chain` wrapper	`int register_netdevice_notifier(struct notifier_block *nb)`
Unregistration	core routine	`static int notifier_chain_unregister(struct notifier_block **nl, struct notifier_block *n)`
Unregistration	`inetaddr_chain` wrapper	`int unregister_inetaddr_notifier(struct notifier_block *nb)`
Unregistration	`inet6addr_chain` wrapper	`int unregister_inet6addr_notifier(struct notifier_block *nb)`
Unregistration	`netdev_chain` wrapper	`int unregister_netdevice_notifier(struct notifier_block *nb)`

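2.4 The Notifier Announces an Event on a Notification Chain

The Notifier announces an event on a Notification Chain by calling notifier_call_chain, defined in kernel/notifier.c as shown below. Among its parameters, nl is the head pointer of the chain on which the event is published; val is the event type, usually given as a macro such as NETDEV_REGISTER to keep the code readable; v is the argument passed to the callbacks (it is a void pointer, so it can carry several values through a structure). For example, when a new network device is registered with the kernel, the notification uses this parameter to carry the device's net_device structure (in current kernels wrapped in a struct netdev_notifier_info), so that interested subsystems receive the relevant information. nr_to_call limits how many callbacks are invoked (-1 means no limit), and nr_calls, if it is not NULL, is incremented once for every callback actually called.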

static int notifier_call_chain(struct notifier_block **nl,
	       unsigned long val, void *v,
	       int nr_to_call, int *nr_calls)
{
  int ret = NOTIFY_DONE;
  struct notifier_block *nb, *next_nb;

  nb = rcu_dereference_raw(*nl);

  while (nb && nr_to_call) {
    next_nb = rcu_dereference_raw(nb->next);

#ifdef CONFIG_DEBUG_NOTIFIERS
    if (unlikely(!func_ptr_is_kernel_text(nb->notifier_call))) {
      WARN(1, "Invalid notifier called!");
      nb = next_nb;
      continue;
    }
#endif
    ret = nb->notifier_call(nb, val, v);

    if (nr_calls)
      (*nr_calls)++;

    if (ret & NOTIFY_STOP_MASK)
      break;
    nb = next_nb;
    nr_to_call--;
  }
  return ret;
}

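As the definition shows, the function walks the Notification Chain in list order and calls the callback of each notifier_block. Note that because the Notifier makes these calls itself, the callbacks run in the Notifier's context. A callback can therefore be written to enqueue the event into some memory area and wake up the task that will process it, so that control returns to the Notifier quickly and it can move on to the callbacks of the other subsystems.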
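
The enqueue-and-defer idea can be illustrated with a small userspace sketch. Everything in it is an assumption made for illustration: the fixed-size ring buffer, the DEMO_QUEUE_LEN capacity, and the demo_ names are hypothetical, the callback takes the queue directly instead of a struct notifier_block, and the locking and wake-up that real kernel code would need are omitted.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_QUEUE_LEN 8       /* hypothetical capacity */

struct demo_event {
    unsigned long val;         /* event type, as passed to the callback */
    void *data;                /* payload pointer, as passed to the callback */
};

struct demo_queue {
    struct demo_event slots[DEMO_QUEUE_LEN];
    size_t head;               /* next slot to read */
    size_t count;              /* number of queued events */
};

/* Callback side: runs in the notifier's context, so it only records the event. */
static int demo_event_cb(struct demo_queue *q, unsigned long val, void *data)
{
    assert(q != NULL);
    if (q->count == DEMO_QUEUE_LEN)
        return -1;             /* queue full: this sketch drops the event */
    size_t tail = (q->head + q->count) % DEMO_QUEUE_LEN;
    q->slots[tail].val = val;
    q->slots[tail].data = data;
    q->count++;
    return 0;
}

/* Worker side: called later, outside the notifier's call path.
 * Returns 1 and fills *out if an event was pending, 0 otherwise. */
static int demo_dequeue(struct demo_queue *q, struct demo_event *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0)
        return 0;
    *out = q->slots[q->head];
    q->head = (q->head + 1) % DEMO_QUEUE_LEN;
    q->count--;
    return 1;
}
```

The callback does a constant amount of work no matter how expensive the eventual handling is, which is the point of the pattern. In the kernel the deferred half would typically run from something like a workqueue, with the queue protected by a lock.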

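A callback invoked by notifier_call_chain can return one of the following values:

NOTIFY_OK: the callback ran successfully.
NOTIFY_DONE: the callback is not interested in this event.
NOTIFY_BAD: something went wrong while running the callback; stop calling the remaining callbacks on the chain.
NOTIFY_STOP: the callback ran successfully and the remaining callbacks do not need to be called.

Both NOTIFY_BAD and NOTIFY_STOP include the NOTIFY_STOP_MASK bit. As the code above shows, notifier_call_chain stops calling further callbacks when it sees either return value. The value notifier_call_chain returns is the return value of the last callback it called.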
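
The effect of the stop bit can be seen in a small userspace model of the call loop. This is a sketch, not the kernel routine: struct demo_block, demo_call_chain, and the three callbacks are hypothetical, RCU and the nr_to_call limit are dropped, and the NOTIFY_ constants are copied with the values they have in include/linux/notifier.h.

```c
#include <assert.h>
#include <stddef.h>

/* Return codes, with the values used in include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_BAD       (NOTIFY_STOP_MASK | 0x0002)
#define NOTIFY_STOP      (NOTIFY_OK | NOTIFY_STOP_MASK)

struct demo_block;
typedef int (*demo_fn_t)(struct demo_block *nb, unsigned long val, void *v);

struct demo_block {
    demo_fn_t notifier_call;
    struct demo_block *next;
};

/* Simplified notifier_call_chain(): no RCU, no nr_to_call limit. */
static int demo_call_chain(struct demo_block *nb, unsigned long val, void *v,
                           int *nr_calls)
{
    int ret = NOTIFY_DONE;

    while (nb != NULL) {
        struct demo_block *next_nb = nb->next;

        assert(nb->notifier_call != NULL);
        ret = nb->notifier_call(nb, val, v);
        if (nr_calls)
            (*nr_calls)++;
        if (ret & NOTIFY_STOP_MASK)
            break;             /* NOTIFY_BAD or NOTIFY_STOP ends the walk */
        nb = next_nb;
    }
    return ret;
}

/* Three hypothetical callbacks with fixed return values. */
static int demo_ignore(struct demo_block *nb, unsigned long val, void *v)
{
    (void)nb; (void)val; (void)v;
    return NOTIFY_DONE;
}

static int demo_stop(struct demo_block *nb, unsigned long val, void *v)
{
    (void)nb; (void)val; (void)v;
    return NOTIFY_STOP;
}

static int demo_ok(struct demo_block *nb, unsigned long val, void *v)
{
    (void)nb; (void)val; (void)v;
    return NOTIFY_OK;
}
```

With a chain of demo_ignore, demo_stop, demo_ok in that order, the walk calls the first two callbacks, returns NOTIFY_STOP, and never reaches demo_ok.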