As part of the Summer '22 round of Linux Kernel Mentorship Program, I worked on the bug KASAN: use-after-free Read in v4l2_fh_open amongst multiple other bugs reported by Syzbot. My patch for this bug is still under review at this time. I'll update and post any feedback I will receive from the maintainers.
Note: I will be describing the bug, my analysis, my patch and how I arrived at the fix. The
purpose is to document how I was thinking about the bug.
The syzbot dashboard
Syzbot is a cloud based automation system that runs the fuzzer Syzkaller continuously on various Linux trees and reports any bugs found. A syzbot bug report includes a lot of information about the crash like the reproducer for the bug, call trace, the commit and tree that crashed, config file used, number of crashes, cause and fix bisection attempts etc.
We can also issue commands to syzbot(via email) to test a particular kernel tree and commit for a bug and optionally send it a patch to apply on the tree before testing the bug. This is useful to test out patches or even just to check if the bug still exists in the kernel.
Note: Just because a bug is marked as open by syzbot does not automatically mean it
is still open.
There may be a fix for the bug already lined up in the linux-next
tree or perhaps the fix was patched into the tree but did not have a Fixes
/Reported
tag or maybe syzbot hasn't tested the bug for a while.
Since syzbot fuzzes older Linux releases as well, the bug may not occur in newer
releases owing to code changes in the kernel. Syzbot docs explain how syzbot
determines a bug as closed.
For both Syzbot and Syzkaller, I found the official docs and the mailing lists to be the best resources for setup and usage info.
Bug Analysis
Bug Report: https://syzkaller.appspot.com/bug?id=0e3c97f1c4112e102c9988afd5eff079350eab7f
C Reproducer: https://syzkaller.appspot.com/text?tag=ReproC&x=12663ebdd00000
.config: https://syzkaller.appspot.com/text?tag=KernelConfig&x=547a5e42ca601229
First off, looking at the name of this KASAN bug raises some questions: Which resource is being read after its freed? Who is freeing the resource? Which codepaths lead to this read-after-free? Are there any checks along such codepaths and if so how are they being bypassed? And of course, which subsystems are involved in this bug?
The stack trace printed in the kernel buffer along with GDB helped in analyzing the codepaths. The kernel buffer has three call traces. First, call trace for the use-after-free read of resource. Second, call trace for allocation of the resource. Third, call trace for freeing of resource.
Note: Scroll to the bottom of the page for the full stack trace decoded using
decode_stacktrace.sh.
Subsystems involved
This bug involves the
V4L2(Video for Linux version 2) infrastructure(/drivers/media/v4l2-core
)
and the em28xx driver(drivers/media/usb/em28xx
) in the
Linux Media subsystem.
Bug premise
Video device vdev
(of type struct video_device *
)
is being read in em28xx_v4l2_open()
(called by v4l2_open()
)
after it has been freed in a call to em28xx_v4l2_init()
.
Cause of bug
The bug arises because the em28xx driver registers a V4L2 video
device(struct video_device
) with the V4L2 interface in em28xx_v4l2_init()
,
but the V4L2 extension initialization eventually fails mid-way and the video device is unregistered.
v4l2_open()
in the V4L2 interface views the device as registered (because
video_is_registered()
in the videodevice
module returns true) and allows calls
to em28xx_v4l2_open()
in the driver code.
There is a race between video_unregister_device()
(called
by the V4L2 interface to unregister the video device) and v4l2_open()
.
em28xx_v4l2_open()
is accessing the video device after it has been
freed(by the release callback when there is no device user left) and is passing
it on to v4l2_fh_open()
which in turn is passing it onto
v4l2_fh_init()
where the use-after-free read actually occurs.
Arriving at a bug fix
The process I used to work on the bug was far from perfect. But still, the below section lists out the various strategies I thought out and tried. Clearly, most of them did not fix the bug. However, it was helpful to come up with them because I was able to follow up in a new direction by looking at why a certain approach was not working out. Mostly, I had to refer back to the subsystem documentation and rest of the subsystem code to study up on Linux Media internals.
Try #1
As a brute attempt, I increased the refcount tostruct em28xx_v4l2_ *v4l2
in em28xx_v4l2_open()
using kref_get()
with the mutex dev->lock
(part of struct em28xx
) locked to prevent video_device_unregister()
from freeing up device while em28xx_v4l2_open()
uses vdev
. However, it doesn't make any sense to do this. The vdev
can already be freed before it is extracted from file in em28xx_v4l2_open()
. vdev
can still be read before kref_get()
causing use-after-free read.
Try #2
I attempted to solve the race via additional locking inv4l2-dev.c
. I considered locking the call to device_unregister()
inside video_device_register()
and locking the vdev
extraction inside em28xx_v4l2_open()
using the mutex videodev_lock
(static to v4l2-dev.c
). This logic was wrong and caused a deadlock.
Try #3
Used the mutexvdev->lock
inside v4l2_open()
and em28xx_v4l2_init()
to make them mutually exclusive. This also deadlocked.
Try #4
The only viable option(i.e. locks common to both functions whose use would not create a deadlock ) was the mutex available in the struct em28xx device
pointer *dev
(i.e dev->lock
). video_device_unregister()
marks vdev
as unregistered with dev->lock
locked. What if vdev
extraction from file in em28xx_v4l2_open()
is locked with dev->lock
as well? The issue still remained.
Pointer to vdev
extracted from file eventually points to a struct video_device
object. So, if device_unregister()
is called on vdev
before extraction, vdev
is not and can not be nulled since it is not a pointer. Thus, there is no way to check if vdev
has been freed or not.
Try #5 (Worked)
Since making V4L2 extension initialization and v4l2_open()
mutually exclusive was not working out, I thought of ways to bar V4L2 file open operation (v4l2_open()
) while V4L2 extension was being initialized inem28xx_v4l2_init()
.
The fix
The patch makes changes in em28xx_v4l2_init()
. It temporarily disables V4L2 file operations on struct video_device
devices(video, vbi, radio) before registering them and enables the file operations on the video_device
devices only at the end of the em28xx_v4l2_init()
codepath in which V4L2 extension initialization succeeds.
The following templates present in drivers/media/usb/em28xx/em28xx-video.c
are used by em28xx_v4l2_init()
to initialize the video, vbi and radio devices (all of type struct video_device
) before registering them.
static const struct video_device em28xx_video_template = {
.fops = &em28xx_v4l_fops,
.ioctl_ops = &video_ioctl_ops,
.release = video_device_release,
.tvnorms = V4L2_STD_ALL,
};
static struct video_device em28xx_radio_template = {
.fops = &radio_fops,
.ioctl_ops = &radio_ioctl_ops,
.release = video_device_release,
};
static const struct v4l2_file_operations em28xx_v4l_fops = {
.owner = THIS_MODULE,
.open = em28xx_v4l2_open,
.release = em28xx_v4l2_close,
.read = vb2_fop_read,
.poll = vb2_fop_poll,
.mmap = vb2_fop_mmap,
.unlocked_ioctl = video_ioctl2,
};
static const struct v4l2_file_operations radio_fops = {
.owner = THIS_MODULE,
.open = em28xx_v4l2_open,
.release = em28xx_v4l2_close,
.unlocked_ioctl = video_ioctl2,
};
The patch creates two empty v4l2_file_operations
templates em28xx_v4l_fops_empty
and radio_fops_empty
i.e. the functions to be used for V4L2 file ops are not filled in.
static const struct v4l2_file_operations em28xx_v4l_fops_empty = {
.owner = THIS_MODULE,
.open = NULL,
.release = NULL,
.read = NULL,
.poll = NULL,
.mmap = NULL,
.unlocked_ioctl = NULL,
};
static const struct v4l2_file_operations radio_fops_empty = {
.owner = THIS_MODULE,
.open = NULL,
.release = NULL,
.unlocked_ioctl = NULL,
};
The video_device
templates now use the empty v4l2_file_operations
templates for .fops
. This means the video_device
registered will have the V4L2 operations disabled. Device templates will be changed as below.
static const struct video_device em28xx_video_template = {
.fops = &em28xx_v4l_fops_empty,
.ioctl_ops = &video_ioctl_ops,
.release = video_device_release_empty,
.tvnorms = V4L2_STD_ALL,
};
static struct video_device em28xx_radio_template = {
.fops = &radio_fops_empty,
.ioctl_ops = &radio_ioctl_ops,
.release = video_device_release_empty,
};
After V4L2 extension is successfully initialized inside em28xx_v4l2_init()
, enable V4L2 file operation by setting .fops
to em28xx_v4l_fops
for the video and vbi devices, and setting .fops
to radio_fops
for the radio device.
Note: The fix was successfully tested on syzbot and sent for review. Even though the patch
fixes the bug, it makes considerable changes to the em28xx driver. Only the maintainer
will determine if it is right to make these changes. If not, another fix for the bug
will have to be devised.