diff options
Diffstat (limited to 'Documentation/fault-injection')
4 files changed, 89 insertions, 46 deletions
diff --git a/Documentation/fault-injection/fault-injection.rst b/Documentation/fault-injection/fault-injection.rst index f51bb21d20e4..17779a2772e5 100644 --- a/Documentation/fault-injection/fault-injection.rst +++ b/Documentation/fault-injection/fault-injection.rst @@ -16,15 +16,23 @@ Available fault injection capabilities injects page allocation failures. (alloc_pages(), get_free_pages(), ...) +- fail_usercopy + + injects failures in user memory access functions. (copy_from_user(), get_user(), ...) + - fail_futex injects futex deadlock and uaddr fault errors. +- fail_sunrpc + + injects kernel RPC client and server failures. + - fail_make_request injects disk IO errors on devices permitted by setting /sys/block/<device>/make-it-fail or - /sys/block/<device>/<partition>/make-it-fail. (generic_make_request()) + /sys/block/<device>/<partition>/make-it-fail. (submit_bio_noacct()) - fail_mmc_request @@ -74,8 +82,10 @@ configuration of fault-injection capabilities. - /sys/kernel/debug/fail*/times: - specifies how many times failures may happen at most. - A value of -1 means "no limit". + specifies how many times failures may happen at most. A value of -1 + means "no limit". Note, though, that this file only accepts unsigned + values. So, if you want to specify -1, you better use 'printf' instead + of 'echo', e.g.: $ printf %#x -1 > times - /sys/kernel/debug/fail*/space: @@ -122,16 +132,16 @@ configuration of fault-injection capabilities. Format: { 'Y' | 'N' } - default is 'N', setting it to 'Y' won't inject failures into - highmem/user allocations. + default is 'Y', setting it to 'N' will also inject failures into + highmem/user allocations (__GFP_HIGHMEM allocations). - /sys/kernel/debug/failslab/ignore-gfp-wait: - /sys/kernel/debug/fail_page_alloc/ignore-gfp-wait: Format: { 'Y' | 'N' } - default is 'N', setting it to 'Y' will inject failures - only into non-sleep allocations (GFP_ATOMIC allocations). + default is 'Y', setting it to 'N' will also inject failures + into allocations that can sleep (__GFP_DIRECT_RECLAIM allocations). - /sys/kernel/debug/fail_page_alloc/min-order: @@ -145,6 +155,27 @@ configuration of fault-injection capabilities. default is 'N', setting it to 'Y' will disable failure injections when dealing with private (address space) futexes. +- /sys/kernel/debug/fail_sunrpc/ignore-client-disconnect: + + Format: { 'Y' | 'N' } + + default is 'N', setting it to 'Y' will disable disconnect + injection on the RPC client. + +- /sys/kernel/debug/fail_sunrpc/ignore-server-disconnect: + + Format: { 'Y' | 'N' } + + default is 'N', setting it to 'Y' will disable disconnect + injection on the RPC server. + +- /sys/kernel/debug/fail_sunrpc/ignore-cache-wait: + + Format: { 'Y' | 'N' } + + default is 'N', setting it to 'Y' will disable cache wait + injection on the RPC server. + - /sys/kernel/debug/fail_function/inject: Format: { 'function-name' | '!function-name' | '' } @@ -163,11 +194,13 @@ configuration of fault-injection capabilities. - ERRNO: retval must be -1 to -MAX_ERRNO (-4096). - ERR_NULL: retval must be 0 or -1 to -MAX_ERRNO (-4096). -- /sys/kernel/debug/fail_function/<functiuon-name>/retval: +- /sys/kernel/debug/fail_function/<function-name>/retval: - specifies the "error" return value to inject to the given - function for given function. This will be created when - user specifies new injection entry. + specifies the "error" return value to inject to the given function. + This will be created when the user specifies a new injection entry. + Note that this file only accepts unsigned values. So, if you want to + use a negative errno, you better use 'printf' instead of 'echo', e.g.: + $ printf %#x -12 > retval Boot option ^^^^^^^^^^^ @@ -177,6 +210,7 @@ use the boot option:: failslab= fail_page_alloc= + fail_usercopy= fail_make_request= fail_futex= mmc_core.fail_request=<interval>,<probability>,<space>,<times> @@ -222,7 +256,7 @@ How to add new fault injection capability - debugfs entries - failslab, fail_page_alloc, and fail_make_request use this way. + failslab, fail_page_alloc, fail_usercopy, and fail_make_request use this way. Helper functions: fault_create_debugfs_attr(name, parent, attr); @@ -250,10 +284,10 @@ Application Examples echo Y > /sys/kernel/debug/$FAILTYPE/task-filter echo 10 > /sys/kernel/debug/$FAILTYPE/probability echo 100 > /sys/kernel/debug/$FAILTYPE/interval - echo -1 > /sys/kernel/debug/$FAILTYPE/times + printf %#x -1 > /sys/kernel/debug/$FAILTYPE/times echo 0 > /sys/kernel/debug/$FAILTYPE/space echo 2 > /sys/kernel/debug/$FAILTYPE/verbose - echo 1 > /sys/kernel/debug/$FAILTYPE/ignore-gfp-wait + echo Y > /sys/kernel/debug/$FAILTYPE/ignore-gfp-wait faulty_system() { @@ -304,11 +338,11 @@ Application Examples echo N > /sys/kernel/debug/$FAILTYPE/task-filter echo 10 > /sys/kernel/debug/$FAILTYPE/probability echo 100 > /sys/kernel/debug/$FAILTYPE/interval - echo -1 > /sys/kernel/debug/$FAILTYPE/times + printf %#x -1 > /sys/kernel/debug/$FAILTYPE/times echo 0 > /sys/kernel/debug/$FAILTYPE/space echo 2 > /sys/kernel/debug/$FAILTYPE/verbose - echo 1 > /sys/kernel/debug/$FAILTYPE/ignore-gfp-wait - echo 1 > /sys/kernel/debug/$FAILTYPE/ignore-gfp-highmem + echo Y > /sys/kernel/debug/$FAILTYPE/ignore-gfp-wait + echo Y > /sys/kernel/debug/$FAILTYPE/ignore-gfp-highmem echo 10 > /sys/kernel/debug/$FAILTYPE/stacktrace-depth trap "echo 0 > /sys/kernel/debug/$FAILTYPE/probability" SIGINT SIGTERM EXIT @@ -331,11 +365,11 @@ Application Examples FAILTYPE=fail_function FAILFUNC=open_ctree echo $FAILFUNC > /sys/kernel/debug/$FAILTYPE/inject - echo -12 > /sys/kernel/debug/$FAILTYPE/$FAILFUNC/retval + printf %#x -12 > /sys/kernel/debug/$FAILTYPE/$FAILFUNC/retval echo N > /sys/kernel/debug/$FAILTYPE/task-filter echo 100 > /sys/kernel/debug/$FAILTYPE/probability echo 0 > /sys/kernel/debug/$FAILTYPE/interval - echo -1 > /sys/kernel/debug/$FAILTYPE/times + printf %#x -1 > /sys/kernel/debug/$FAILTYPE/times echo 0 > /sys/kernel/debug/$FAILTYPE/space echo 1 > /sys/kernel/debug/$FAILTYPE/verbose diff --git a/Documentation/fault-injection/notifier-error-inject.rst b/Documentation/fault-injection/notifier-error-inject.rst index 1668b6e48d3a..fdf2dc433ead 100644 --- a/Documentation/fault-injection/notifier-error-inject.rst +++ b/Documentation/fault-injection/notifier-error-inject.rst @@ -91,8 +91,8 @@ For more usage examples There are tools/testing/selftests using the notifier error injection features for CPU and memory notifiers. - * tools/testing/selftests/cpu-hotplug/on-off-test.sh - * tools/testing/selftests/memory-hotplug/on-off-test.sh + * tools/testing/selftests/cpu-hotplug/cpu-on-off-test.sh + * tools/testing/selftests/memory-hotplug/mem-on-off-test.sh These scripts first do simple online and offline tests and then do fault injection tests if notifier error injection module is available. diff --git a/Documentation/fault-injection/nvme-fault-injection.rst b/Documentation/fault-injection/nvme-fault-injection.rst index cdb2e829228e..1d4427890d75 100644 --- a/Documentation/fault-injection/nvme-fault-injection.rst +++ b/Documentation/fault-injection/nvme-fault-injection.rst @@ -3,7 +3,7 @@ NVMe Fault Injection Linux's fault injection framework provides a systematic way to support error injection via debugfs in the /sys/kernel/debug directory. When enabled, the default NVME_SC_INVALID_OPCODE with no retry will be -injected into the nvme_end_request. Users can change the default status +injected into the nvme_try_complete_req. Users can change the default status code and no retry flag via the debugfs. The list of Generic Command Status can be found in include/linux/nvme.h diff --git a/Documentation/fault-injection/provoke-crashes.rst b/Documentation/fault-injection/provoke-crashes.rst index 9279a3e12278..3abe84225613 100644 --- a/Documentation/fault-injection/provoke-crashes.rst +++ b/Documentation/fault-injection/provoke-crashes.rst @@ -1,16 +1,19 @@ -=============== -Provoke crashes -=============== +.. SPDX-License-Identifier: GPL-2.0 -The lkdtm module provides an interface to crash or injure the kernel at -predefined crashpoints to evaluate the reliability of crash dumps obtained -using different dumping solutions. The module uses KPROBEs to instrument -crashing points, but can also crash the kernel directly without KRPOBE -support. +============================================================ +Provoking crashes with Linux Kernel Dump Test Module (LKDTM) +============================================================ +The lkdtm module provides an interface to disrupt (and usually crash) +the kernel at predefined code locations to evaluate the reliability of +the kernel's exception handling and to test crash dumps obtained using +different dumping solutions. The module uses KPROBEs to instrument the +trigger location, but can also trigger the kernel directly without KPROBE +support via debugfs. -You can provide the way either through module arguments when inserting -the module, or through a debugfs interface. +You can select the location of the trigger ("crash point name") and the +type of action ("crash point type") either through module arguments when +inserting the module, or through the debugfs interface. Usage:: @@ -18,31 +21,37 @@ Usage:: [cpoint_count={>0}] recur_count - Recursion level for the stack overflow test. Default is 10. + Recursion level for the stack overflow test. By default this is + dynamically calculated based on kernel configuration, with the + goal of being just large enough to exhaust the kernel stack. The + value can be seen at `/sys/module/lkdtm/parameters/recur_count`. cpoint_name - Crash point where the kernel is to be crashed. It can be + Where in the kernel to trigger the action. It can be one of INT_HARDWARE_ENTRY, INT_HW_IRQ_EN, INT_TASKLET_ENTRY, - FS_DEVRW, MEM_SWAPOUT, TIMERADD, SCSI_DISPATCH_CMD, - IDE_CORE_CP, DIRECT + FS_DEVRW, MEM_SWAPOUT, TIMERADD, SCSI_QUEUE_RQ, or DIRECT. cpoint_type Indicates the action to be taken on hitting the crash point. - It can be one of PANIC, BUG, EXCEPTION, LOOP, OVERFLOW, - CORRUPT_STACK, UNALIGNED_LOAD_STORE_WRITE, OVERWRITE_ALLOCATION, - WRITE_AFTER_FREE, + These are numerous, and best queried directly from debugfs. Some + of the common ones are PANIC, BUG, EXCEPTION, LOOP, and OVERFLOW. + See the contents of `/sys/kernel/debug/provoke-crash/DIRECT` for + a complete list. cpoint_count Indicates the number of times the crash point is to be hit - to trigger an action. The default is 10. + before triggering the action. The default is 10 (except for + DIRECT, which always fires immediately). You can also induce failures by mounting debugfs and writing the type to -<mountpoint>/provoke-crash/<crashpoint>. E.g.:: +<debugfs>/provoke-crash/<crashpoint>. E.g.:: - mount -t debugfs debugfs /mnt - echo EXCEPTION > /mnt/provoke-crash/INT_HARDWARE_ENTRY + mount -t debugfs debugfs /sys/kernel/debug + echo EXCEPTION > /sys/kernel/debug/provoke-crash/INT_HARDWARE_ENTRY +The special file `DIRECT` will induce the action directly without KPROBE +instrumentation. This mode is the only one available when the module is +built for a kernel without KPROBEs support:: -A special file is `DIRECT` which will induce the crash directly without -KPROBE instrumentation. This mode is the only one available when the module -is built on a kernel without KPROBEs support. + # Instead of having a BUG kill your shell, have it kill "cat": + cat <(echo WRITE_RO) >/sys/kernel/debug/provoke-crash/DIRECT |