aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/misc
diff options
context:
space:
mode:
authorDani Liberman <dliberman@habana.ai>2022-04-27 14:14:17 +0300
committerGreg Kroah-Hartman <gregkh@linuxfoundation.org>2022-05-22 21:01:20 +0200
commitc37803388c95833c4728b089e6c94996dc457d95 (patch)
tree25c6f33a781cea93f6734eb6f543e3b98fbf8573 /drivers/misc
parenthabanalabs: add device memory scrub ability through debugfs (diff)
downloadlinux-dev-c37803388c95833c4728b089e6c94996dc457d95.tar.xz
linux-dev-c37803388c95833c4728b089e6c94996dc457d95.zip
habanalabs: handle race in driver fini
Scenario: 1. During hard reset, driver executes device_kill_open_processes. 2. Drivers file descriptor is not closed yet (user process is alive), hence we are starting loop on all open file descriptors. 3. Just before getting task struct of user process, according to pid, SIGKILL is sent to the user process, hence get_pid_task fails, driver prints a warning and device_kill_open_processes returns an error. 4. Returned error causing driver fini do disable the device object of the process which causes a kernel crash. The fix is to handle this case not as an error and continue fini flow as normal, since the killed process (by the SIGKILL) will release its resources just like it will do when the driver sends him the sigkill. Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Diffstat (limited to 'drivers/misc')
-rw-r--r--drivers/misc/habanalabs/common/device.c11
1 files changed, 7 insertions, 4 deletions
diff --git a/drivers/misc/habanalabs/common/device.c b/drivers/misc/habanalabs/common/device.c
index dbec98736a31..15df89b31e1b 100644
--- a/drivers/misc/habanalabs/common/device.c
+++ b/drivers/misc/habanalabs/common/device.c
@@ -1024,10 +1024,13 @@ static int device_kill_open_processes(struct hl_device *hdev, u32 timeout, bool
put_task_struct(task);
} else {
- dev_warn(hdev->dev,
- "Can't get task struct for PID so giving up on killing process\n");
- mutex_unlock(fd_lock);
- return -ETIME;
+ /*
+ * If we got here, it means that process was killed from outside the driver
+ * right after it started looping on fd_list and before get_pid_task, thus
+ * we don't need to kill it.
+ */
+ dev_dbg(hdev->dev,
+ "Can't get task struct for user process, assuming process was killed from outside the driver\n");
}
}