As part of the Summer '22 round of Linux Kernel Mentorship Program, I worked on the bug KASAN: use-after-free Read in v4l2_fh_open amongst multiple other bugs reported by Syzbot. My patch for this bug is still under review at this time. I'll update and post any feedback I will receive from the maintainers.

Note: I will be describing the bug, my analysis, my patch and how I arrived at the fix. The
      purpose is to document how I was thinking about the bug. 

The syzbot dashboard

Syzbot is a cloud based automation system that runs the fuzzer Syzkaller continuously on various Linux trees and reports any bugs found. A syzbot bug report includes a lot of information about the crash like the reproducer for the bug, call trace, the commit and tree that crashed, config file used, number of crashes, cause and fix bisection attempts etc.

We can also issue commands to syzbot(via email) to test a particular kernel tree and commit for a bug and optionally send it a patch to apply on the tree before testing the bug. This is useful to test out patches or even just to check if the bug still exists in the kernel.

Note: Just because a bug is marked as open by syzbot does not automatically mean it
      is still open. 

There may be a fix for the bug already lined up in the linux-next tree or perhaps the fix was patched into the tree but did not have a Fixes /Reported tag or maybe syzbot hasn't tested the bug for a while. Since syzbot fuzzes older Linux releases as well, the bug may not occur in newer releases owing to code changes in the kernel. Syzbot docs explain how syzbot determines a bug as closed.

For both Syzbot and Syzkaller, I found the official docs and the mailing lists to be the best resources for setup and usage info.

Bug Analysis

First off, looking at the name of this KASAN bug raises some questions: Which resource is being read after its freed? Who is freeing the resource? Which codepaths lead to this read-after-free? Are there any checks along such codepaths and if so how are they being bypassed? And of course, which subsystems are involved in this bug?

The stack trace printed in the kernel buffer along with GDB helped in analyzing the codepaths. The kernel buffer has three call traces. First, call trace for the use-after-free read of resource. Second, call trace for allocation of the resource. Third, call trace for freeing of resource.

Note: Scroll to the bottom of the page for the full stack trace decoded using 
      decode_stacktrace.sh. 

Subsystems involved

This bug involves the V4L2(Video for Linux version 2) infrastructure(/drivers/media/v4l2-core) and the em28xx driver(drivers/media/usb/em28xx) in the Linux Media subsystem.

Bug premise

Video device vdev(of type struct video_device *) is being read in em28xx_v4l2_open() (called by v4l2_open()) after it has been freed in a call to em28xx_v4l2_init().

Cause of bug

The bug arises because the em28xx driver registers a V4L2 video device(struct video_device) with the V4L2 interface in em28xx_v4l2_init(), but the V4L2 extension initialization eventually fails mid-way and the video device is unregistered. v4l2_open() in the V4L2 interface views the device as registered (because video_is_registered() in the videodevice module returns true) and allows calls to em28xx_v4l2_open() in the driver code.

There is a race between video_unregister_device() (called by the V4L2 interface to unregister the video device) and v4l2_open(). em28xx_v4l2_open() is accessing the video device after it has been freed(by the release callback when there is no device user left) and is passing it on to v4l2_fh_open() which in turn is passing it onto v4l2_fh_init() where the use-after-free read actually occurs.

Arriving at a bug fix

The process I used to work on the bug was far from perfect. But still, the below section lists out the various strategies I thought out and tried. Clearly, most of them did not fix the bug. However, it was helpful to come up with them because I was able to follow up in a new direction by looking at why a certain approach was not working out. Mostly, I had to refer back to the subsystem documentation and rest of the subsystem code to study up on Linux Media internals.

Try #1

As a brute attempt, I increased the refcount to struct em28xx_v4l2_ *v4l2 in em28xx_v4l2_open() using kref_get() with the mutex dev->lock (part of struct em28xx) locked to prevent video_device_unregister() from freeing up device while em28xx_v4l2_open() uses vdev. However, it doesn't make any sense to do this. The vdev can already be freed before it is extracted from file in em28xx_v4l2_open(). vdev can still be read before kref_get() causing use-after-free read.

Try #2

I attempted to solve the race via additional locking in v4l2-dev.c. I considered locking the call to device_unregister() inside video_device_register() and locking the vdev extraction inside em28xx_v4l2_open() using the mutex videodev_lock (static to v4l2-dev.c). This logic was wrong and caused a deadlock.

Try #3

Used the mutex vdev->lock inside v4l2_open() and em28xx_v4l2_init() to make them mutually exclusive. This also deadlocked.

Try #4

The only viable option(i.e. locks common to both functions whose use would not create a deadlock ) was the mutex available in the struct em28xx device pointer *dev(i.e dev->lock). video_device_unregister() marks vdev as unregistered with dev->lock locked. What if vdev extraction from file in em28xx_v4l2_open() is locked with dev->lock as well? The issue still remained.

Pointer to vdev extracted from file eventually points to a struct video_device object. So, if device_unregister() is called on vdev before extraction, vdev is not and can not be nulled since it is not a pointer. Thus, there is no way to check if vdev has been freed or not.

Try #5 (Worked)

Since making V4L2 extension initialization and v4l2_open() mutually exclusive was not working out, I thought of ways to bar V4L2 file open operation (v4l2_open()) while V4L2 extension was being initialized inem28xx_v4l2_init().

The fix

The patch makes changes in em28xx_v4l2_init(). It temporarily disables V4L2 file operations on struct video_device devices(video, vbi, radio) before registering them and enables the file operations on the video_device devices only at the end of the em28xx_v4l2_init() codepath in which V4L2 extension initialization succeeds.

The following templates present in drivers/media/usb/em28xx/em28xx-video.c are used by em28xx_v4l2_init() to initialize the video, vbi and radio devices (all of type struct video_device) before registering them.

static const struct video_device em28xx_video_template = {
	.fops		= &em28xx_v4l_fops,
 	.ioctl_ops	= &video_ioctl_ops,
 	.release	= video_device_release,
 	.tvnorms	= V4L2_STD_ALL,
 };

static struct video_device em28xx_radio_template = {
	.fops		= &radio_fops,
 	.ioctl_ops	= &radio_ioctl_ops,
 	.release	= video_device_release,
 };

static const struct v4l2_file_operations em28xx_v4l_fops = {
	.owner         = THIS_MODULE,
	.open          = em28xx_v4l2_open,
	.release       = em28xx_v4l2_close,
	.read          = vb2_fop_read,
	.poll          = vb2_fop_poll,
	.mmap          = vb2_fop_mmap,
	.unlocked_ioctl = video_ioctl2,
};

static const struct v4l2_file_operations radio_fops = {
	.owner         = THIS_MODULE,
	.open          = em28xx_v4l2_open,
	.release       = em28xx_v4l2_close,
	.unlocked_ioctl = video_ioctl2,
};

The patch creates two empty v4l2_file_operations templates em28xx_v4l_fops_empty and radio_fops_empty i.e. the functions to be used for V4L2 file ops are not filled in.

static const struct v4l2_file_operations em28xx_v4l_fops_empty = {
	.owner         = THIS_MODULE,
	.open          = NULL,
	.release       = NULL,
	.read          = NULL,
	.poll          = NULL,
	.mmap          = NULL,
	.unlocked_ioctl = NULL,
};

static const struct v4l2_file_operations radio_fops_empty = {
	.owner         = THIS_MODULE,
	.open          = NULL,
	.release       = NULL,
	.unlocked_ioctl = NULL,
};

The video_device templates now use the empty v4l2_file_operations templates for .fops. This means the video_device registered will have the V4L2 operations disabled. Device templates will be changed as below.

static const struct video_device em28xx_video_template = {
	.fops		= &em28xx_v4l_fops_empty,
	.ioctl_ops	= &video_ioctl_ops,
	.release	= video_device_release_empty,
	.tvnorms	= V4L2_STD_ALL,
};

static struct video_device em28xx_radio_template = {
	.fops		= &radio_fops_empty,
	.ioctl_ops	= &radio_ioctl_ops,
	.release	= video_device_release_empty,
};

After V4L2 extension is successfully initialized inside em28xx_v4l2_init(), enable V4L2 file operation by setting .fops to em28xx_v4l_fops for the video and vbi devices, and setting .fops to radio_fops for the radio device.

Note: The fix was successfully tested on syzbot and sent for review. Even though the patch 
      fixes the bug, it makes considerable changes to the em28xx driver. Only the maintainer 
      will determine if it is right to make these changes. If not, another fix for the bug 
      will have to be devised.