简述

这是kernel处理bpf加载时候的处理逻辑

kernel/bpf/syscall.c

学习一下bpf加载过程

再往上层是static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)

SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
{
	return __sys_bpf(cmd, USER_BPFPTR(uattr), size);
}

bpf系统调用

具体参数在bpf.h里面已经定义好了,可以man bpf来进行查看

其中prog_load需要这些参数

               struct {    /* Used by BPF_PROG_LOAD */
                   __u32         prog_type;
                   __u32         insn_cnt;
                   __aligned_u64 insns;      /* 'const struct bpf_insn *' */
                   __aligned_u64 license;    /* 'const char *' */
                   __u32         log_level;  /* verbosity level of verifier */
                   __u32         log_size;   /* size of user buffer */
                   __aligned_u64 log_buf;    /* user supplied 'char *'
                                                buffer */
                   __u32         kern_version;
                                             /* checked when prog_type=kprobe
                                                (since Linux 4.1) */
               };

前面都是env的一些设置,重点是bpf_check函数,在这个函数中会对bpf指令做各种check

细节

__sys_bpf

第一步先parse用户输入

static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
{
	union bpf_attr attr;
	int err;

	err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size);
	if (err)
		return err;
	size = min_t(u32, size, sizeof(attr));

	/* copy attributes from user space, may be less than sizeof(bpf_attr) */
	memset(&attr, 0, sizeof(attr));
	if (copy_from_bpfptr(&attr, uattr, size) != 0) // copy uattr to addr
		return -EFAULT;

	err = security_bpf(cmd, &attr, size);
	if (err < 0)
		return err;

先分析第一个函数bpf_check_uarg_tail_zero

描述是这样写的

/*
 * If we're handed a bigger struct than we know of, ensure all the unknown bits
 * are 0 - i.e. new user-space does not rely on any kernel feature extensions
 * we don't know about yet.
 *
 * There is a ToCToU between this function call and the following
 * copy_from_user() call. However, this is not a concern since this function is
 * meant to be a future-proofing of bits.
 */

是一个safe check

函数体: