Age | Commit message (Collapse) | Author | Files | Lines |
|
In one rare case, Clang generated the following code:
5ca: 83 e0 21 and $0x21,%eax
5cd: b9 04 00 00 00 mov $0x4,%ecx
5d2: ff 24 c5 00 00 00 00 jmpq *0x0(,%rax,8)
5d5: R_X86_64_32S .rodata+0x38
which uses the corresponding jump table relocations:
000000000038 000200000001 R_X86_64_64 0000000000000000 .text + 834
000000000040 000200000001 R_X86_64_64 0000000000000000 .text + 5d9
000000000048 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000050 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000058 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000060 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000068 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000070 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000078 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000080 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000088 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000090 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000098 000200000001 R_X86_64_64 0000000000000000 .text + b96
0000000000a0 000200000001 R_X86_64_64 0000000000000000 .text + b96
0000000000a8 000200000001 R_X86_64_64 0000000000000000 .text + b96
0000000000b0 000200000001 R_X86_64_64 0000000000000000 .text + b96
0000000000b8 000200000001 R_X86_64_64 0000000000000000 .text + b96
0000000000c0 000200000001 R_X86_64_64 0000000000000000 .text + b96
0000000000c8 000200000001 R_X86_64_64 0000000000000000 .text + b96
0000000000d0 000200000001 R_X86_64_64 0000000000000000 .text + b96
0000000000d8 000200000001 R_X86_64_64 0000000000000000 .text + b96
0000000000e0 000200000001 R_X86_64_64 0000000000000000 .text + b96
0000000000e8 000200000001 R_X86_64_64 0000000000000000 .text + b96
0000000000f0 000200000001 R_X86_64_64 0000000000000000 .text + b96
0000000000f8 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000100 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000108 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000110 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000118 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000120 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000128 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000130 000200000001 R_X86_64_64 0000000000000000 .text + b96
000000000138 000200000001 R_X86_64_64 0000000000000000 .text + 82f
000000000140 000200000001 R_X86_64_64 0000000000000000 .text + 828
Since %eax was masked with 0x21, only the first two and the last two
entries are possible.
Objtool doesn't actually emulate all the code, so it isn't smart enough
to know that all the middle entries aren't reachable. They point to the
NOP padding area after the end of the function, so objtool seg faulted
when it tried to dereference a NULL insn->func.
After this fix, objtool still gives an "unreachable" error because it
stops reading the jump table when it encounters the bad addresses:
/home/jpoimboe/objtool-tests/adm1275.o: warning: objtool: adm1275_probe()+0x828: unreachable instruction
While the above code is technically correct, it's very wasteful of
memory -- it uses 34 jump table entries when only 4 are needed. It's
also not possible for objtool to validate this type of switch table
because the unused entries point outside the function and objtool has no
way of determining if that's intentional. Hopefully the Clang folks can
fix it.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/a9db88eec4f1ca089e040989846961748238b6d8.1563413318.git.jpoimboe@redhat.com
|
|
This fixes objtool for both a GCC issue and a Clang issue:
1) GCC issue:
kernel/bpf/core.o: warning: objtool: ___bpf_prog_run()+0x8d5: sibling call from callable instruction with modified stack frame
With CONFIG_RETPOLINE=n, GCC is doing the following optimization in
___bpf_prog_run().
Before:
select_insn:
jmp *jumptable(,%rax,8)
...
ALU64_ADD_X:
...
jmp select_insn
ALU_ADD_X:
...
jmp select_insn
After:
select_insn:
jmp *jumptable(, %rax, 8)
...
ALU64_ADD_X:
...
jmp *jumptable(, %rax, 8)
ALU_ADD_X:
...
jmp *jumptable(, %rax, 8)
This confuses objtool. It has never seen multiple indirect jump
sites which use the same jump table.
For GCC switch tables, the only way of detecting the size of a table
is by continuing to scan for more tables. The size of the previous
table can only be determined after another switch table is found, or
when the scan reaches the end of the function.
That logic was reused for C jump tables, and was based on the
assumption that each jump table only has a single jump site. The
above optimization breaks that assumption.
2) Clang issue:
drivers/usb/misc/sisusbvga/sisusb.o: warning: objtool: sisusb_write_mem_bulk()+0x588: can't find switch jump table
With clang 9, code can be generated where a function contains two
indirect jump instructions which use the same switch table.
The fix is the same for both issues: split the jump table parsing into
two passes.
In the first pass, locate the heads of all switch tables for the
function and mark their locations.
In the second pass, parse the switch tables and add them.
Fixes: e55a73251da3 ("bpf: Fix ORC unwinding in non-JIT BPF code")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/e995befaada9d4d8b2cf788ff3f566ba900d2b4d.1563413318.git.jpoimboe@redhat.com
Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com>
|
|
Now that C jump tables are supported, call them "jump tables" instead of
"switch tables". Also rename some other variables, add comments, and
simplify the code flow a bit.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/cf951b0c0641628e0b9b81f7ceccd9bcabcb4bd8.1563413318.git.jpoimboe@redhat.com
|
|
Simplify the sibling call detection logic a bit.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/8357dbef9e7f5512e76bf83a76c81722fc09eb5e.1563413318.git.jpoimboe@redhat.com
|
|
Even calls to __noreturn functions need the frame pointer setup first.
Such functions often dump the stack.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/aed62fbd60e239280218be623f751a433658e896.1563413318.git.jpoimboe@redhat.com
|
|
dead_end_function() can no longer return an error. Simplify its
interface by making it return boolean.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/9e6679610768fb6e6c51dca23f7d4d0c03b0c910.1563413318.git.jpoimboe@redhat.com
|
|
All callable functions should have an ELF size.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/03d429c4fa87829c61c5dc0e89652f4d9efb62f1.1563413318.git.jpoimboe@redhat.com
|
|
- Add an alias check in validate_functions(). With this change, aliases
no longer need uaccess_safe set.
- Add an alias check in decode_instructions(). With this change, the
"if (!insn->func)" check is no longer needed.
- Don't create aliases for zero-length functions, as it can have
unexpected results. The next patch will spit out a warning for
zero-length functions anyway.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/26a99c31426540f19c9a58b9e10727c385a147bc.1563413318.git.jpoimboe@redhat.com
|
|
If 'insn->func' is NULL, objtool skips some important checks, including
sibling call validation. So if some .fixup code does an invalid sibling
call, objtool ignores it.
Treat all code branches (including alts) as part of the original
function by keeping track of the original func value from
validate_functions().
This improves the usefulness of some clang function fallthrough
warnings, and exposes some additional kernel bugs in the process.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/505df630f33c9717e1ccde6e4b64c5303135c25f.1563413318.git.jpoimboe@redhat.com
|
|
After an objtool improvement, it's reporting that __memcpy_mcsafe() is
calling mcsafe_handle_tail() with AC=1:
arch/x86/lib/memcpy_64.o: warning: objtool: .fixup+0x13: call to mcsafe_handle_tail() with UACCESS enabled
arch/x86/lib/memcpy_64.o: warning: objtool: __memcpy_mcsafe()+0x34: (alt)
arch/x86/lib/memcpy_64.o: warning: objtool: __memcpy_mcsafe()+0xb: (branch)
arch/x86/lib/memcpy_64.o: warning: objtool: __memcpy_mcsafe()+0x0: <=== (func)
mcsafe_handle_tail() is basically an extension of __memcpy_mcsafe(), so
AC=1 is supposed to be set. Add mcsafe_handle_tail() to the uaccess
safe list.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/035c38f7eac845281d3c3d36749144982e06e58c.1563413318.git.jpoimboe@redhat.com
|
|
On x86-64, with CONFIG_RETPOLINE=n, GCC's "global common subexpression
elimination" optimization results in ___bpf_prog_run()'s jumptable code
changing from this:
select_insn:
jmp *jumptable(, %rax, 8)
...
ALU64_ADD_X:
...
jmp *jumptable(, %rax, 8)
ALU_ADD_X:
...
jmp *jumptable(, %rax, 8)
to this:
select_insn:
mov jumptable, %r12
jmp *(%r12, %rax, 8)
...
ALU64_ADD_X:
...
jmp *(%r12, %rax, 8)
ALU_ADD_X:
...
jmp *(%r12, %rax, 8)
The jumptable address is placed in a register once, at the beginning of
the function. The function execution can then go through multiple
indirect jumps which rely on that same register value. This has a few
issues:
1) Objtool isn't smart enough to be able to track such a register value
across multiple recursive indirect jumps through the jump table.
2) With CONFIG_RETPOLINE enabled, this optimization actually results in
a small slowdown. I measured a ~4.7% slowdown in the test_bpf
"tcpdump port 22" selftest.
This slowdown is actually predicted by the GCC manual:
Note: When compiling a program using computed gotos, a GCC
extension, you may get better run-time performance if you
disable the global common subexpression elimination pass by
adding -fno-gcse to the command line.
So just disable the optimization for this function.
Fixes: e55a73251da3 ("bpf: Fix ORC unwinding in non-JIT BPF code")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/30c3ca29ba037afcbd860a8672eef0021addf9fe.1563413318.git.jpoimboe@redhat.com
|
|
The same getuser/putuser error paths are used regardless of whether AC
is set. In non-exception failure cases, this results in an unnecessary
CLAC.
Fixes the following warnings:
arch/x86/lib/getuser.o: warning: objtool: .altinstr_replacement+0x18: redundant UACCESS disable
arch/x86/lib/putuser.o: warning: objtool: .altinstr_replacement+0x18: redundant UACCESS disable
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/bc14ded2755ae75bd9010c446079e113dbddb74b.1563413318.git.jpoimboe@redhat.com
|
|
After adding mcsafe_handle_tail() to the objtool uaccess safe list,
objtool reports:
arch/x86/lib/usercopy_64.o: warning: objtool: mcsafe_handle_tail()+0x0: call to __fentry__() with UACCESS enabled
With SMAP, this function is called with AC=1, so it needs to be careful
about which functions it calls. Disable the ftrace entry hook, which
can potentially pull in a lot of extra code.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/8e13d6f0da1c8a3f7603903da6cbf6d582bbfe10.1563413318.git.jpoimboe@redhat.com
|
|
After an objtool improvement, it's complaining about the CLAC in
copy_user_handle_tail():
arch/x86/lib/copy_user_64.o: warning: objtool: .altinstr_replacement+0x12: redundant UACCESS disable
arch/x86/lib/copy_user_64.o: warning: objtool: copy_user_handle_tail()+0x6: (alt)
arch/x86/lib/copy_user_64.o: warning: objtool: copy_user_handle_tail()+0x2: (alt)
arch/x86/lib/copy_user_64.o: warning: objtool: copy_user_handle_tail()+0x0: <=== (func)
copy_user_handle_tail() is incorrectly marked as a callable function, so
objtool is rightfully concerned about the CLAC with no corresponding
STAC.
Remove the ELF function annotation. The copy_user_handle_tail() code
path is already verified by objtool because it's jumped to by other
callable asm code (which does the corresponding STAC).
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/6b6e436774678b4b9873811ff023bd29935bee5b.1563413318.git.jpoimboe@redhat.com
|
|
After an objtool improvement, it complains about the fact that
start_cpu0() jumps to code which has an LRET instruction.
arch/x86/kernel/head_64.o: warning: objtool: .head.text+0xe4: unsupported instruction in callable function
Technically, start_cpu0() is callable, but it acts nothing like a
callable function. Prevent objtool from treating it like one by
removing its ELF function annotation.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/6b1b4505fcb90571a55fa1b52d71fb458ca24454.1563413318.git.jpoimboe@redhat.com
|
|
Fix the following warnings:
arch/x86/entry/thunk_64.o: warning: objtool: trace_hardirqs_on_thunk() is missing an ELF size annotation
arch/x86/entry/thunk_64.o: warning: objtool: trace_hardirqs_off_thunk() is missing an ELF size annotation
arch/x86/entry/thunk_64.o: warning: objtool: lockdep_sys_exit_thunk() is missing an ELF size annotation
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/89c97adc9f6cc44a0f5d03cde6d0357662938909.1563413318.git.jpoimboe@redhat.com
|
|
After making a change to improve objtool's sibling call detection, it
started showing the following warning:
arch/x86/kvm/vmx/nested.o: warning: objtool: .fixup+0x15: sibling call from callable instruction with modified stack frame
The problem is the ____kvm_handle_fault_on_reboot() macro. It does a
fake call by pushing a fake RIP and doing a jump. That tricks the
unwinder into printing the function which triggered the exception,
rather than the .fixup code.
Instead of the hack to make it look like the original function made the
call, just change the macro so that the original function actually does
make the call. This allows removal of the hack, and also makes objtool
happy.
I triggered a vmx instruction exception and verified that the stack
trace is still sane:
kernel BUG at arch/x86/kvm/x86.c:358!
invalid opcode: 0000 [#1] SMP PTI
CPU: 28 PID: 4096 Comm: qemu-kvm Not tainted 5.2.0+ #16
Hardware name: Lenovo THINKSYSTEM SD530 -[7X2106Z000]-/-[7X2106Z000]-, BIOS -[TEE113Z-1.00]- 07/17/2017
RIP: 0010:kvm_spurious_fault+0x5/0x10
Code: 00 00 00 00 00 8b 44 24 10 89 d2 45 89 c9 48 89 44 24 10 8b 44 24 08 48 89 44 24 08 e9 d4 40 22 00 0f 1f 40 00 0f 1f 44 00 00 <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41
RSP: 0018:ffffbf91c683bd00 EFLAGS: 00010246
RAX: 000061f040000000 RBX: ffff9e159c77bba0 RCX: ffff9e15a5c87000
RDX: 0000000665c87000 RSI: ffff9e15a5c87000 RDI: ffff9e159c77bba0
RBP: 0000000000000000 R08: 0000000000000000 R09: ffff9e15a5c87000
R10: 0000000000000000 R11: fffff8f2d99721c0 R12: ffff9e159c77bba0
R13: ffffbf91c671d960 R14: ffff9e159c778000 R15: 0000000000000000
FS: 00007fa341cbe700(0000) GS:ffff9e15b7400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fdd38356804 CR3: 00000006759de003 CR4: 00000000007606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
loaded_vmcs_init+0x4f/0xe0
alloc_loaded_vmcs+0x38/0xd0
vmx_create_vcpu+0xf7/0x600
kvm_vm_ioctl+0x5e9/0x980
? __switch_to_asm+0x40/0x70
? __switch_to_asm+0x34/0x70
? __switch_to_asm+0x40/0x70
? __switch_to_asm+0x34/0x70
? free_one_page+0x13f/0x4e0
do_vfs_ioctl+0xa4/0x630
ksys_ioctl+0x60/0x90
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x55/0x1c0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fa349b1ee5b
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/64a9b64d127e87b6920a97afde8e96ea76f6524e.1563413318.git.jpoimboe@redhat.com
|
|
Objtool reports the following:
arch/x86/kvm/vmx/vmenter.o: warning: objtool: vmx_vmenter()+0x14: call without frame pointer save/setup
But frame pointers are necessarily broken anyway, because
__vmx_vcpu_run() clobbers RBP with the guest's value before calling
vmx_vmenter(). So calling without a frame pointer doesn't make things
any worse.
Make objtool happy by changing the call to a UD2.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lkml.kernel.org/r/9fc2216c9dc972f95bb65ce2966a682c6bda1cb0.1563413318.git.jpoimboe@redhat.com
|
|
Some of the fastop functions, e.g. em_setcc(), are actually just used as
global labels which point to blocks of functions. The global labels are
incorrectly annotated as functions. Also the functions themselves don't
have size annotations.
Fixes a bunch of warnings like the following:
arch/x86/kvm/emulate.o: warning: objtool: seto() is missing an ELF size annotation
arch/x86/kvm/emulate.o: warning: objtool: em_setcc() is missing an ELF size annotation
arch/x86/kvm/emulate.o: warning: objtool: setno() is missing an ELF size annotation
arch/x86/kvm/emulate.o: warning: objtool: setc() is missing an ELF size annotation
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/c8cc9be60ebbceb3092aa5dd91916039a1f88275.1563413318.git.jpoimboe@redhat.com
|
|
The __raw_callee_save_*() functions have an ELF symbol size of zero,
which confuses objtool and other tools.
Fixes a bunch of warnings like the following:
arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_pte_val() is missing an ELF size annotation
arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_pgd_val() is missing an ELF size annotation
arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_make_pte() is missing an ELF size annotation
arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_make_pgd() is missing an ELF size annotation
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/afa6d49bb07497ca62e4fc3b27a2d0cece545b4e.1563413318.git.jpoimboe@redhat.com
|
|
When walking userspace stacks, USER_DS needs to be set, otherwise
access_ok() will not function as expected.
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Reported-by: Eiichi Tsukata <devel@etsukata.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vegard Nossum <vegard.nossum@oracle.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Link: https://lkml.kernel.org/r/20190718085754.GM3402@hirez.programming.kicks-ass.net
|
|
The elftoolchain version of libelf has a function named elf_open().
The function name isn't quite accurate anyway, since it also reads all
the ELF data. Rename it to elf_read(), which is more accurate.
[ jpoimboe: rename to elf_read(); write commit description ]
Signed-off-by: Michael Forney <mforney@mforney.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/7ce2d1b35665edf19fd0eb6fbc0b17b81a48e62f.1562793604.git.jpoimboe@redhat.com
|
|
The libelf implementation might use a different struct name, and the
Elf_Scn typedef is already used throughout the rest of objtool.
Signed-off-by: Michael Forney <mforney@mforney.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/d270e1be2835fc2a10acf67535ff2ebd2145bf43.1562793448.git.jpoimboe@redhat.com
|
|
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:
struct foo {
int stuff;
struct boo entry[];
};
size = sizeof(struct foo) + count * sizeof(struct boo);
instance = kmalloc(size, GFP_KERNEL);
Instead of leaving these open-coded and prone to type mistakes, we can now
use the new struct_size() helper:
instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL);
Also, notice that variable size is unnecessary, hence it is removed.
This code was detected with the help of Coccinelle.
Link: http://lkml.kernel.org/r/20190604164226.GA13823@embeddedor
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
locked_vm accounting is done roughly the same way in five places, so
unify them in a helper.
Include the helper's caller in the debug print to distinguish between
callsites.
Error codes stay the same, so user-visible behavior does too. The one
exception is that the -EPERM case in tce_account_locked_vm is removed
because Alexey has never seen it triggered.
[daniel.m.jordan@oracle.com: v3]
Link: http://lkml.kernel.org/r/20190529205019.20927-1-daniel.m.jordan@oracle.com
[sfr@canb.auug.org.au: fix mm/util.c]
Link: http://lkml.kernel.org/r/20190524175045.26897-1-daniel.m.jordan@oracle.com
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Cc: Alan Tull <atull@kernel.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Moritz Fischer <mdf@kernel.org>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Steve Sistare <steven.sistare@oracle.com>
Cc: Wu Hao <hao.wu@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In order for things like get_user_pages() to work on ZONE_DEVICE memory,
we need a software PTE bit to identify device-backed PFNs. Hook this up
along with the relevant helpers to join in with ARCH_HAS_PTE_DEVMAP.
[robin.murphy@arm.com: build fixes]
Link: http://lkml.kernel.org/r/13026c4e64abc17133bbfa07d7731ec6691c0bcd.1559050949.git.robin.murphy@arm.com
Link: http://lkml.kernel.org/r/817d92886fc3b33bcbf6e105ee83a74babb3a5aa.1558547956.git.robin.murphy@arm.com
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
ARCH_HAS_ZONE_DEVICE is somewhat meaningless in itself, and combined
with the long-out-of-date comment can lead to the impression than an
architecture may just enable it (since __add_pages() now "comprehends
device memory" for itself) and expect things to work.
In practice, however, ZONE_DEVICE users have little chance of
functioning correctly without __HAVE_ARCH_PTE_DEVMAP, so let's clean
that up the same way as ARCH_HAS_PTE_SPECIAL and make it the proper
dependency so the real situation is clearer.
Link: http://lkml.kernel.org/r/87554aa78478a02a63f2c4cf60a847279ae3eb3b.1558547956.git.robin.murphy@arm.com
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Refactor is_device_{public,private}_page() with is_pci_p2pdma_page() to
make them all consistent in depending on their respective config options
even when CONFIG_DEV_PAGEMAP_OPS is enabled for other reasons. This
allows a little more compile-time optimisation as well as the conceptual
and cosmetic cleanup.
Link: http://lkml.kernel.org/r/187c2ab27dea70635d375a61b2f2076d26c032b0.1558547956.git.robin.murphy@arm.com
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Suggested-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oliver O'Halloran <oohall@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Two architecture that use arch specific MMAP flags are powerpc and
sparc. We still have few flag values common across them and other
architectures. Consolidate this in mman-common.h.
Also update the comment to indicate where to find HugeTLB specific
reserved values
Link: http://lkml.kernel.org/r/20190604090950.31417-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This enables support for synchronous DAX fault on powerpc
The generic changes are added as part of b6fb293f2497 ("mm: Define
MAP_SYNC and VM_SYNC flags")
Without this, mmap returns EOPNOTSUPP for MAP_SYNC with
MAP_SHARED_VALIDATE
Instead of adding MAP_SYNC with same value to
arch/powerpc/include/uapi/asm/mman.h, I am moving the #define to
asm-generic/mman-common.h. Two architectures using mman-common.h
directly are sparc and powerpc. We should be able to consloidate more
#defines to mman-common.h. That can be done as a separate patch.
Link: http://lkml.kernel.org/r/20190528091120.13322-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
It is now allowed to use persistent memory like a regular RAM, but
currently there is no way to remove this memory until machine is
rebooted.
This work expands the functionality to also allows hotremoving
previously hotplugged persistent memory, and recover the device for use
for other purposes.
To hotremove persistent memory, the management software must first
offline all memory blocks of dax region, and than unbind it from
device-dax/kmem driver. So, operations should look like this:
echo offline > /sys/devices/system/memory/memoryN/state
...
echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind
Note: if unbind is done without offlining memory beforehand, it won't be
possible to do dax0.0 hotremove, and dax's memory is going to be part of
System RAM until reboot.
Link: http://lkml.kernel.org/r/20190517215438.6487-4-pasha.tatashin@soleen.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Presently the remove_memory() interface is inherently broken. It tries
to remove memory but panics if some memory is not offline. The problem
is that it is impossible to ensure that all memory blocks are offline as
this function also takes lock_device_hotplug that is required to change
memory state via sysfs.
So, between calling this function and offlining all memory blocks there
is always a window when lock_device_hotplug is released, and therefore,
there is always a chance for a panic during this window.
Make this interface to return an error if memory removal fails. This
way it is safe to call this function without panicking machine, and also
makes it symmetric to add_memory() which already returns an error.
Link: http://lkml.kernel.org/r/20190517215438.6487-3-pasha.tatashin@soleen.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Patch series ""Hotremove" persistent memory", v6.
Recently, adding a persistent memory to be used like a regular RAM was
added to Linux. This work extends this functionality to also allow hot
removing persistent memory.
We (Microsoft) have an important use case for this functionality.
The requirement is for physical machines with small amount of RAM (~8G)
to be able to reboot in a very short period of time (<1s). Yet, there
is a userland state that is expensive to recreate (~2G).
The solution is to boot machines with 2G preserved for persistent
memory.
Copy the state, and hotadd the persistent memory so machine still has
all 8G available for runtime. Before reboot, offline and hotremove
device-dax 2G, copy the memory that is needed to be preserved to pmem0
device, and reboot.
The series of operations look like this:
1. After boot restore /dev/pmem0 to ramdisk to be consumed by apps.
and free ramdisk.
2. Convert raw pmem0 to devdax
ndctl create-namespace --mode devdax --map mem -e namespace0.0 -f
3. Hotadd to System RAM
echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
echo online_movable > /sys/devices/system/memoryXXX/state
4. Before reboot hotremove device-dax memory from System RAM
echo offline > /sys/devices/system/memoryXXX/state
echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind
5. Create raw pmem0 device
ndctl create-namespace --mode raw -e namespace0.0 -f
6. Copy the state that was stored by apps to ramdisk to pmem device
7. Do kexec reboot or reboot through firmware if firmware does not
zero memory in pmem0 region (These machines have only regular
volatile memory). So to have pmem0 device either memmap kernel
parameter is used, or devices nodes in dtb are specified.
This patch (of 3):
When add_memory() fails, the resource and the memory should be freed.
Link: http://lkml.kernel.org/r/20190517215438.6487-2-pasha.tatashin@soleen.com
Fixes: c221c0b0308f ("device-dax: "Hotplug" persistent memory for use like normal RAM")
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Fix a few spelling and grammar errors, and two places where fast/safe in
the documentation did not match the function.
Link: http://lkml.kernel.org/r/20190321014452.13297-1-tomlevy93@gmail.com
Signed-off-by: Tom Levy <tomlevy93@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Kosina <trivial@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Andreas Christoforou reported:
UBSAN: Undefined behaviour in ipc/mqueue.c:414:49 signed integer overflow:
9 * 2305843009213693951 cannot be represented in type 'long int'
...
Call Trace:
mqueue_evict_inode+0x8e7/0xa10 ipc/mqueue.c:414
evict+0x472/0x8c0 fs/inode.c:558
iput_final fs/inode.c:1547 [inline]
iput+0x51d/0x8c0 fs/inode.c:1573
mqueue_get_inode+0x8eb/0x1070 ipc/mqueue.c:320
mqueue_create_attr+0x198/0x440 ipc/mqueue.c:459
vfs_mkobj+0x39e/0x580 fs/namei.c:2892
prepare_open ipc/mqueue.c:731 [inline]
do_mq_open+0x6da/0x8e0 ipc/mqueue.c:771
Which could be triggered by:
struct mq_attr attr = {
.mq_flags = 0,
.mq_maxmsg = 9,
.mq_msgsize = 0x1fffffffffffffff,
.mq_curmsgs = 0,
};
if (mq_open("/testing", 0x40, 3, &attr) == (mqd_t) -1)
perror("mq_open");
mqueue_get_inode() was correctly rejecting the giant mq_msgsize, and
preparing to return -EINVAL. During the cleanup, it calls
mqueue_evict_inode() which performed resource usage tracking math for
updating "user", before checking if there was a valid "user" at all
(which would indicate that the calculations would be sane). Instead,
delay this check to after seeing a valid "user".
The overflow was real, but the results went unused, so while the flaw is
harmless, it's noisy for kernel fuzzers, so just fix it by moving the
calculation under the non-NULL "user" where it actually gets used.
Link: http://lkml.kernel.org/r/201906072207.ECB65450@keescook
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Andreas Christoforou <andreaschristofo@gmail.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
For architectures using __WARN_TAINT, the WARN_ON macro did not print
out the "cut here" string. The other WARN_XXX macros would print "cut
here" inside __warn_printk, which is not called for WARN_ON since it
doesn't have a message to print.
Link: http://lkml.kernel.org/r/20190624154831.163888-1-ddavenport@chromium.org
Fixes: a7bed27af194 ("bug: fix "cut here" location for __WARN_TAINT architectures")
Signed-off-by: Drew Davenport <ddavenport@chromium.org>
Acked-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Add helper commands and functions for finding pointers to struct device
by enumerating linux device bus/class infrastructure. This can be used
to fetch subsystem and driver-specific structs:
(gdb) p *$container_of($lx_device_find_by_class_name("net", "eth0"), "struct net_device", "dev")
(gdb) p *$container_of($lx_device_find_by_bus_name("i2c", "0-004b"), "struct i2c_client", "dev")
(gdb) p *(struct imx_port*)$lx_device_find_by_class_name("tty", "ttymxc1")->parent->driver_data
Several generic "lx-device-list" functions are included to enumerate
devices by bus and class:
(gdb) lx-device-list-bus usb
(gdb) lx-device-list-class
(gdb) lx-device-list-tree &platform_bus
Similar information is available in /sys but pointer values are
deliberately hidden.
Link: http://lkml.kernel.org/r/c948628041311cbf1b9b4cff3dda7d2073cb3eaa.1561492937.git.leonard.crestez@nxp.com
Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Reviewed-by: Stephen Boyd <sboyd@kernel.org>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This is like /sys/kernel/debug/pm/pm_genpd_summary except it's
accessible through a debugger.
This can be useful if the target crashes or hangs because power domains
were not properly enabled.
Link: http://lkml.kernel.org/r/f9ee627a0d4f94b894aa202fee8a98444049bed8.1561492937.git.leonard.crestez@nxp.com
Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Reviewed-by: Stephen Boyd <sboyd@kernel.org>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The PPS assert/clear offset corrections are set by the PPS_SETPARAMS
ioctl in the pps_ktime structs, which also contain flags. The flags are
not initialized by applications (using the timepps.h header) and they
are not used by the kernel for anything except returning them back in
the PPS_GETPARAMS ioctl.
Set the flags to zero to make it clear they are unused and avoid leaking
uninitialized data of the PPS_SETPARAMS caller to other applications
that have a read access to the PPS device.
Link: http://lkml.kernel.org/r/20190702092251.24303-1-mlichvar@redhat.com
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rodolfo Giometti <giometti@enneenne.com>
Cc: Greg KH <greg@kroah.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
struct pid's count is an atomic_t field used as a refcount. Use
refcount_t for it which is basically atomic_t but does additional
checking to prevent use-after-free bugs.
For memory ordering, the only change is with the following:
- if ((atomic_read(&pid->count) == 1) ||
- atomic_dec_and_test(&pid->count)) {
+ if (refcount_dec_and_test(&pid->count)) {
kmem_cache_free(ns->pid_cachep, pid);
Here the change is from: Fully ordered --> RELEASE + ACQUIRE (as per
refcount-vs-atomic.rst) This ACQUIRE should take care of making sure the
free happens after the refcount_dec_and_test().
The above hunk also removes atomic_read() since it is not needed for the
code to work and it is unclear how beneficial it is. The removal lets
refcount_dec_and_test() check for cases where get_pid() happened before
the object was freed.
Link: http://lkml.kernel.org/r/20190701183826.191936-1-joel@joelfernandes.org
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: KJ Tsanaktsidis <ktsanaktsidis@zendesk.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The dev_info.name[] array has space for RIO_MAX_DEVNAME_SZ + 1
characters. But the problem here is that we don't ensure that the user
put a NUL terminator on the end of the string. It could lead to an out
of bounds read.
Link: http://lkml.kernel.org/r/20190529110601.GB19119@mwanda
Fixes: e8de370188d0 ("rapidio: add mport char device driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Alexandre Bounine <alex.bou9@gmail.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Now that restore_saved_sigmask_unless() is always called with the same
argument right before poll_select_copy_remaining() we can move it into
poll_select_copy_remaining() and make it the only caller of restore() in
fs/select.c.
The patch also renames poll_select_copy_remaining(),
poll_select_finish() looks better after this change.
kern_select() doesn't use set_user_sigmask(), so in this case
poll_select_finish() does restore_saved_sigmask_unless() "for no
reason". But this won't hurt, and WARN_ON(!TIF_SIGPENDING) is still
valid.
Link: http://lkml.kernel.org/r/20190606140915.GC13440@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Laight <David.Laight@aculab.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Eric Wong <e@80x24.org>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
do_poll() returns -EINTR if interrupted and after that all its callers
have to translate it into -ERESTARTNOHAND. Change do_poll() to return
-ERESTARTNOHAND and update (simplify) the callers.
Note that this also unifies all users of restore_saved_sigmask_unless(),
see the next patch.
Linus:
: The *right* return value will actually be then chosen by
: poll_select_copy_remaining(), which will turn ERESTARTNOHAND to EINTR
: when it can't update the timeout.
:
: Except for the cases that use restart_block and do that instead and
: don't have the whole timeout restart issue as a result.
Link: http://lkml.kernel.org/r/20190606140852.GB13440@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Laight <David.Laight@aculab.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Eric Wong <e@80x24.org>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
task->saved_sigmask and ->restore_sigmask are only used in the ret-from-
syscall paths. This means that set_user_sigmask() can save ->blocked in
->saved_sigmask and do set_restore_sigmask() to indicate that ->blocked
was modified.
This way the callers do not need 2 sigset_t's passed to set/restore and
restore_user_sigmask() renamed to restore_saved_sigmask_unless() turns
into the trivial helper which just calls restore_saved_sigmask().
Link: http://lkml.kernel.org/r/20190606113206.GA9464@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Eric Wong <e@80x24.org>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: David Laight <David.Laight@aculab.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
struct sighand_struct::siglock field is the most used field by far, put
it first so that is can be accessed without IMM8 or IMM32 encoding on
x86_64.
Space savings (on trimmed down VM test config):
add/remove: 0/0 grow/shrink: 8/68 up/down: 49/-1147 (-1098)
Function old new delta
complete_signal 512 533 +21
do_signalfd4 335 346 +11
__cleanup_sighand 39 43 +4
unhandled_signal 49 52 +3
prepare_signal 692 695 +3
ignore_signals 37 40 +3
__tty_check_change.part 248 251 +3
ksys_unshare 780 781 +1
sighand_ctor 33 29 -4
ptrace_trap_notify 60 56 -4
sigqueue_free 98 91 -7
run_posix_cpu_timers 1389 1382 -7
proc_pid_status 2448 2441 -7
proc_pid_limits 344 337 -7
posix_cpu_timer_rearm 222 215 -7
posix_cpu_timer_get 249 242 -7
kill_pid_info_as_cred 243 236 -7
freeze_task 197 190 -7
flush_old_exec 1873 1866 -7
do_task_stat 3363 3356 -7
do_send_sig_info 98 91 -7
do_group_exit 147 140 -7
init_sighand 2088 2080 -8
do_notify_parent_cldstop 399 391 -8
signalfd_cleanup 50 41 -9
do_notify_parent 557 545 -12
__send_signal 1029 1017 -12
ptrace_stop 590 577 -13
get_signal 1576 1563 -13
__lock_task_sighand 112 99 -13
zap_pid_ns_processes 391 377 -14
update_rlimit_cpu 78 64 -14
tty_signal_session_leader 413 399 -14
tty_open_proc_set_tty 149 135 -14
tty_jobctrl_ioctl 936 922 -14
set_cpu_itimer 339 325 -14
ptrace_resume 226 212 -14
ptrace_notify 110 96 -14
proc_clear_tty 81 67 -14
posix_cpu_timer_del 229 215 -14
kernel_sigaction 156 142 -14
getrusage 977 963 -14
get_current_tty 98 84 -14
force_sigsegv 89 75 -14
force_sig_info 205 191 -14
flush_signals 83 69 -14
flush_itimer_signals 85 71 -14
do_timer_create 1120 1106 -14
do_sigpending 88 74 -14
do_signal_stop 537 523 -14
cgroup_init_fs_context 644 630 -14
call_usermodehelper_exec_async 402 388 -14
calculate_sigpending 58 44 -14
__x64_sys_timer_delete 248 234 -14
__set_current_blocked 80 66 -14
__ptrace_unlink 310 296 -14
__ptrace_detach.part 187 173 -14
send_sigqueue 362 347 -15
get_cpu_itimer 214 199 -15
signalfd_poll 175 159 -16
dequeue_signal 340 323 -17
do_getitimer 192 174 -18
release_task.part 1060 1040 -20
ptrace_peek_siginfo 408 387 -21
posix_cpu_timer_set 827 806 -21
exit_signals 437 416 -21
do_sigaction 541 520 -21
do_setitimer 485 464 -21
disassociate_ctty.part 545 517 -28
__x64_sys_rt_sigtimedwait 721 679 -42
__x64_sys_ptrace 1319 1277 -42
ptrace_request 1828 1782 -46
signalfd_read 507 459 -48
wait_consider_task 2027 1971 -56
do_coredump 3672 3616 -56
copy_process.part 6936 6871 -65
Link: http://lkml.kernel.org/r/20190503192800.GA18004@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Check whether PTRACE_GET_SYSCALL_INFO semantics implemented in the
kernel matches userspace expectations.
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20190510152852.GG28558@altlinux.org
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Acked-by: Shuah Khan <shuah@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Elvira Khabirova <lineprinter@altlinux.org>
Cc: Eugene Syromyatnikov <esyr@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Greentime Hu <greentime@andestech.com>
Cc: Helge Deller <deller@gmx.de> [parisc]
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: kbuild test robot <lkp@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Vincent Chen <deanbo422@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
PTRACE_GET_SYSCALL_INFO is a generic ptrace API that lets ptracer obtain
details of the syscall the tracee is blocked in.
There are two reasons for a special syscall-related ptrace request.
Firstly, with the current ptrace API there are cases when ptracer cannot
retrieve necessary information about syscalls. Some examples include:
* The notorious int-0x80-from-64-bit-task issue. See [1] for details.
In short, if a 64-bit task performs a syscall through int 0x80, its
tracer has no reliable means to find out that the syscall was, in
fact, a compat syscall, and misidentifies it.
* Syscall-enter-stop and syscall-exit-stop look the same for the
tracer. Common practice is to keep track of the sequence of
ptrace-stops in order not to mix the two syscall-stops up. But it is
not as simple as it looks; for example, strace had a (just recently
fixed) long-standing bug where attaching strace to a tracee that is
performing the execve system call led to the tracer identifying the
following syscall-exit-stop as syscall-enter-stop, which messed up
all the state tracking.
* Since the introduction of commit 84d77d3f06e7 ("ptrace: Don't allow
accessing an undumpable mm"), both PTRACE_PEEKDATA and
process_vm_readv become unavailable when the process dumpable flag is
cleared. On such architectures as ia64 this results in all syscall
arguments being unavailable for the tracer.
Secondly, ptracers also have to support a lot of arch-specific code for
obtaining information about the tracee. For some architectures, this
requires a ptrace(PTRACE_PEEKUSER, ...) invocation for every syscall
argument and return value.
ptrace(2) man page:
long ptrace(enum __ptrace_request request, pid_t pid,
void *addr, void *data);
...
PTRACE_GET_SYSCALL_INFO
Retrieve information about the syscall that caused the stop.
The information is placed into the buffer pointed by "data"
argument, which should be a pointer to a buffer of type
"struct ptrace_syscall_info".
The "addr" argument contains the size of the buffer pointed to
by "data" argument (i.e., sizeof(struct ptrace_syscall_info)).
The return value contains the number of bytes available
to be written by the kernel.
If the size of data to be written by the kernel exceeds the size
specified by "addr" argument, the output is truncated.
[ldv@altlinux.org: selftests/seccomp/seccomp_bpf: update for PTRACE_GET_SYSCALL_INFO]
Link: http://lkml.kernel.org/r/20190708182904.GA12332@altlinux.org
Link: http://lkml.kernel.org/r/20190510152842.GF28558@altlinux.org
Signed-off-by: Elvira Khabirova <lineprinter@altlinux.org>
Co-developed-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Eugene Syromyatnikov <esyr@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Greentime Hu <greentime@andestech.com>
Cc: Helge Deller <deller@gmx.de> [parisc]
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: kbuild test robot <lkp@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vincent Chen <deanbo422@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
syscall_get_error() is required to be implemented on this architecture in
addition to already implemented syscall_get_nr(), syscall_get_arguments(),
syscall_get_return_value(), and syscall_get_arch() functions in order to
extend the generic ptrace API with PTRACE_GET_SYSCALL_INFO request.
Link: http://lkml.kernel.org/r/20190510152824.GE28558@altlinux.org
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Elvira Khabirova <lineprinter@altlinux.org>
Cc: Eugene Syromyatnikov <esyr@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greentime Hu <greentime@andestech.com>
Cc: Helge Deller <deller@gmx.de> [parisc]
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: kbuild test robot <lkp@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vincent Chen <deanbo422@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
syscall_get_error() is required to be implemented on all architectures in
addition to already implemented syscall_get_nr(), syscall_get_arguments(),
syscall_get_return_value(), and syscall_get_arch() functions in order to
extend the generic ptrace API with PTRACE_GET_SYSCALL_INFO request.
Link: http://lkml.kernel.org/r/20190510152812.GD28558@altlinux.org
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Acked-by: Helge Deller <deller@gmx.de> [parisc]
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Elvira Khabirova <lineprinter@altlinux.org>
Cc: Eugene Syromyatnikov <esyr@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Greentime Hu <greentime@andestech.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: kbuild test robot <lkp@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vincent Chen <deanbo422@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
syscall_get_error() is required to be implemented on all architectures
in addition to already implemented syscall_get_nr(),
syscall_get_arguments(), syscall_get_return_value(), and
syscall_get_arch() functions in order to extend the generic ptrace API
with PTRACE_GET_SYSCALL_INFO request.
Link: http://lkml.kernel.org/r/20190510152803.GC28558@altlinux.org
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Acked-by: Paul Burton <paul.burton@mips.com>
Cc: Elvira Khabirova <lineprinter@altlinux.org>
Cc: Eugene Syromyatnikov <esyr@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Greentime Hu <greentime@andestech.com>
Cc: Helge Deller <deller@gmx.de> [parisc]
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: kbuild test robot <lkp@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vincent Chen <deanbo422@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|