Age | Commit message (Collapse) | Author | Files | Lines |
|
sysvec_spurious_apic_interrupt() calls into the handling body of
__spurious_interrupt() which is not obvious as that function is declared
inside the DEFINE_IDTENTRY_IRQ(spurious_interrupt) macro.
As __spurious_interrupt() is currently always inlined this ends up with two
copies of the same code for no reason.
Split the handling function out and invoke it from both entry points.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210210002512.469379641@linutronix.de
|
|
The per CPU hardirq_stack_ptr contains the pointer to the irq stack in the
form that it is ready to be assigned to [ER]SP so that the first push ends
up on the top entry of the stack.
But the stack switching on 64 bit has the following rules:
1) Store the current stack pointer (RSP) in the top most stack entry
to allow the unwinder to link back to the previous stack
2) Set RSP to the top most stack entry
3) Invoke functions on the irq stack
4) Pop RSP from the top most stack entry (stored in #1) so it's back
to the original stack.
That requires all stack switching code to decrement the stored pointer by 8
in order to be able to store the current RSP and then set RSP to that
location. That's a pointless exercise.
Do the -8 adjustment right when storing the pointer and make the data type
a void pointer to avoid confusion vs. the struct irq_stack data type which
is on 64bit only used to declare the backing store. Move the definition
next to the inuse flag so they likely end up in the same cache
line. Sticking them into a struct to enforce it is a seperate change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210210002512.354260928@linutronix.de
|
|
The recursion protection for hard interrupt stacks is an unsigned int per
CPU variable initialized to -1 named __irq_count.
The irq stack switching is only done when the variable is -1, which creates
worse code than just checking for 0. When the stack switching happens it
uses this_cpu_add/sub(1), but there is no reason to do so. It simply can
use straight writes. This is a historical leftover from the low level ASM
code which used inc and jz to make a decision.
Rename it to hardirq_stack_inuse, make it a bool and use plain stores.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210210002512.228830141@linutronix.de
|
|
Embracing a callout into instrumentation_begin() / instrumentation_begin()
does not really make sense. Make the latter instrumentation_end().
Fixes: 2f6474e4636b ("x86/entry: Switch XEN/PV hypercall entry to IDTENTRY")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210210002512.106502464@linutronix.de
|
|
Natively support the stack swizzle pattern:
mov %rsp, (%[tos])
mov %[tos], %rsp
...
pop %rsp
It uses the vals[] array to link the first two stack-ops, and detect
the SP to SP_INDIRECT swizzle.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
|
|
Where we already decode: mov %rsp, %reg, also decode mov %rsp, (%reg).
Nothing should match for this new stack-op.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
|
|
Currently REG_SP_INDIRECT is unused but means (%rsp + offset),
change it to mean (%rsp) + offset.
The reason is that we're going to swizzle stack in the middle of a C
function with non-trivial stack footprint. This means that when the
unwinder finds the ToS, it needs to dereference it (%rsp) and then add
the offset to the next frame, resulting in: (%rsp) + offset
This is somewhat unfortunate, since REG_BP_INDIRECT is used (by DRAP)
and thus needs to retain the current (%rbp + offset).
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
|
|
The OBJECT_FILES_NON_STANDARD annotation is used to tell objtool to
ignore a file. File-level ignores won't work when validating vmlinux.o.
Instead, convert restore_image() and core_restore_code() to be ELF
functions. Their code is conventional enough for objtool to be able to
understand them.
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/974f8ceb5385e470f72e93974c70ab5c894bb0dc.1611263462.git.jpoimboe@redhat.com
|
|
Because restore_registers() is page-aligned, the assembler inexplicably
adds an unreachable jump from after the end of the previous function to
the beginning of restore_registers().
That confuses objtool, understandably. It also creates significant text
fragmentation. As a result, most of the object file is wasted text
(nops).
Move restore_registers() to the beginning of the file to both prevent
the text fragmentation and avoid the dead jump instruction.
$ size /tmp/hibernate_asm_64.before.o /tmp/hibernate_asm_64.after.o
text data bss dec hex filename
4415 0 0 4415 113f /tmp/hibernate_asm_64.before.o
524 0 0 524 20c /tmp/hibernate_asm_64.after.o
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/8c7f634201d26453d73fe55032cbbdc05d004387.1611263462.git.jpoimboe@redhat.com
|
|
These indirect jumps are harmless; annotate them to make objtool's
retpoline validation happy.
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/ba7a141c98f2c09c255b19bf78ee4a5f45d4ecb6.1611263462.git.jpoimboe@redhat.com
|
|
The OBJECT_FILES_NON_STANDARD annotation is used to tell objtool to
ignore a file. File-level ignores won't work when validating vmlinux.o.
Instead, tell objtool to ignore do_suspend_lowlevel() directly with the
STACK_FRAME_NON_STANDARD annotation.
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/269eda576c53bc9ecc8167c211989111013a67aa.1611263462.git.jpoimboe@redhat.com
|
|
This indirect jump is harmless; annotate it to keep objtool's retpoline
validation happy.
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/a7288e7043265d95c1a5d64f9fd751ead4854bdc.1611263462.git.jpoimboe@redhat.com
|
|
With objtool vmlinux.o validation of return_to_handler(), now that
objtool has visibility inside the retpoline, jumping from EMPTY state to
a proper function state results in a stack state mismatch.
return_to_handler() is actually quite normal despite the underlying
magic. Just annotate it as a normal function.
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/14f48e623f61dbdcd84cf27a56ed8ccae73199ef.1611263462.git.jpoimboe@redhat.com
|
|
This indirect jump is harmless; annotate it to keep objtool's retpoline
validation happy.
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/4797c72a258b26e06741c58ccd4a75c42db39c1d.1611263462.git.jpoimboe@redhat.com
|
|
The Xen hypercall page is filled with zeros, causing objtool to fall
through all the empty hypercall functions until it reaches a real
function, resulting in a stack state mismatch.
The build-time contents of the hypercall page don't matter because the
page gets rewritten by the hypervisor. Make it more palatable to
objtool by making each hypervisor function a true empty function, with
nops and a return.
Cc: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/0883bde1d7a1fb3b6a4c952bc0200e873752f609.1611263462.git.jpoimboe@redhat.com
|
|
The OBJECT_FILES_NON_STANDARD annotation is used to tell objtool to
ignore a file. File-level ignores won't work when validating vmlinux.o.
Tweak the ELF metadata and unwind hints to allow objtool to follow the
code.
Cc: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/8b042a09c69e8645f3b133ef6653ba28f896807d.1611263462.git.jpoimboe@redhat.com
|
|
xen_start_kernel() doesn't return. Annotate it as such so objtool can
follow the code flow.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/930deafa89256c60b180442df59a1bbae48f30ab.1611263462.git.jpoimboe@redhat.com
|
|
The ORC metadata generated for UNWIND_HINT_FUNC isn't actually very
func-like. With certain usages it can cause stack state mismatches
because it doesn't set the return address (CFI_RA).
Also, users of UNWIND_HINT_RET_OFFSET no longer need to set a custom
return stack offset. Instead they just need to specify a func-like
situation, so the current ret_offset code is hacky for no good reason.
Solve both problems by simplifying the RET_OFFSET handling and
converting it into a more useful UNWIND_HINT_FUNC.
If we end up needing the old 'ret_offset' functionality again in the
future, we should be able to support it pretty easily with the addition
of a custom 'sp_offset' in UNWIND_HINT_FUNC.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/db9d1f5d79dddfbb3725ef6d8ec3477ad199948d.1611263462.git.jpoimboe@redhat.com
|
|
To be used for adding asm functions to the ignore list. The "aw" is
needed to help the ELF section metadata match GCC-created sections.
Otherwise the linker creates duplicate sections instead of combining
them.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/8faa476f9a5ac89af27944ec184c89f95f3c6c49.1611263462.git.jpoimboe@redhat.com
|
|
There's an inconsistency in how sibling calls are detected in
non-function asm code, depending on the scope of the object. If the
target code is external to the object, objtool considers it a sibling
call. If the target code is internal but not a function, objtool
*doesn't* consider it a sibling call.
This can cause some inconsistencies between per-object and vmlinux.o
validation.
Instead, assume only ELF functions can do sibling calls. This generally
matches existing reality, and makes sibling call validation consistent
between vmlinux.o and per-object.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/0e9ab6f3628cc7bf3bde7aa6762d54d7df19ad78.1611263461.git.jpoimboe@redhat.com
|
|
Prevent an unreachable objtool warning after the sibling call detection
gets improved. ftrace_stub() is basically a function, annotate it as
such.
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/6845e1b2fb0723a95740c6674e548ba38c5ea489.1611263461.git.jpoimboe@redhat.com
|
|
Objtool converts direct retpoline jumps to type INSN_JUMP_DYNAMIC, since
that's what they are semantically.
That conversion doesn't work in vmlinux.o validation because the
indirect thunk function is present in the object, so the intra-object
jump check succeeds before the retpoline jump check gets a chance.
Rearrange the checks: check for a retpoline jump before checking for an
intra-object jump.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/4302893513770dde68ddc22a9d6a2a04aca491dd.1611263461.git.jpoimboe@redhat.com
|
|
With my version of GCC 9.3.1 the ".cold" subfunctions no longer have a
numbered suffix, so the trailing period is no longer there.
Presumably this doesn't yet trigger a user-visible bug since most of the
subfunction detection logic is duplicated. I only found it when
testing vmlinux.o validation.
Fixes: 54262aa28301 ("objtool: Fix sibling call detection")
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/ca0b5a57f08a2fbb48538dd915cc253b5edabb40.1611263461.git.jpoimboe@redhat.com
|
|
The JMP_NOSPEC macro branches to __x86_retpoline_*() rather than the
__x86_indirect_thunk_*() wrappers used by C code. Detect jumps to
__x86_retpoline_*() as retpoline dynamic jumps.
Presumably this doesn't trigger a user-visible bug. I only found it
when testing vmlinux.o validation.
Fixes: 39b735332cb8 ("objtool: Detect jumps to retpoline thunks")
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/31f5833e2e4f01e3d755889ac77e3661e906c09f.1611263461.git.jpoimboe@redhat.com
|
|
Actually return an error (and display a backtrace, if requested) for
directional bit warnings.
Fixes: 2f0f9e9ad7b3 ("objtool: Add Direction Flag validation")
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/dc70f2adbc72f09526f7cab5b6feb8bf7f6c5ad4.1611263461.git.jpoimboe@redhat.com
|
|
That logic is unused since
320100a5ffe5 ("x86/entry: Remove the TRACE_IRQS cruft")
Remove it.
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/YAAszZJ2GcIYZmB5@hirez.programming.kicks-ass.net
|
|
Arnd found a randconfig that produces the warning:
arch/x86/entry/thunk_64.o: warning: objtool: missing symbol for insn at
offset 0x3e
when building with LLVM_IAS=1 (Clang's integrated assembler). Josh
notes:
With the LLVM assembler not generating section symbols, objtool has no
way to reference this code when it generates ORC unwinder entries,
because this code is outside of any ELF function.
The limitation now being imposed by objtool is that all code must be
contained in an ELF symbol. And .L symbols don't create such symbols.
So basically, you can use an .L symbol *inside* a function or a code
segment, you just can't use the .L symbol to contain the code using a
SYM_*_START/END annotation pair.
Fangrui notes that this optimization is helpful for reducing image size
when compiling with -ffunction-sections and -fdata-sections. I have
observed on the order of tens of thousands of symbols for the kernel
images built with those flags.
A patch has been authored against GNU binutils to match this behavior
of not generating unused section symbols ([1]), so this will
also become a problem for users of GNU binutils once they upgrade to 2.36.
Omit the .L prefix on a label so that the assembler will emit an entry
into the symbol table for the label, with STB_LOCAL binding. This
enables objtool to generate proper unwind info here with LLVM_IAS=1 or
GNU binutils 2.36+.
[ bp: Massage commit message. ]
Reported-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Suggested-by: Borislav Petkov <bp@alien8.de>
Suggested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20210112194625.4181814-1-ndesaulniers@google.com
Link: https://github.com/ClangBuiltLinux/linux/issues/1209
Link: https://reviews.llvm.org/D93783
Link: https://sourceware.org/binutils/docs/as/Symbol-Names.html
Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1408485ce69f844dcd7ded093a8 [1]
|
|
The ORC unwinder showed a warning [1] which revealed the stack layout
didn't match what was expected. The problem was that paravirt patching
had replaced "CALL *pv_ops.irq.save_fl" with "PUSHF;POP". That changed
the stack layout between the PUSHF and the POP, so unwinding from an
interrupt which occurred between those two instructions would fail.
Part of the agreed upon solution was to rework the custom paravirt
patching code to use alternatives instead, since objtool already knows
how to read alternatives (and converging runtime patching infrastructure
is always a good thing anyway). But the main problem still remains,
which is that runtime patching can change the stack layout.
Making stack layout changes in alternatives was disallowed with commit
7117f16bf460 ("objtool: Fix ORC vs alternatives"), but now that paravirt
is going to be doing it, it needs to be supported.
One way to do so would be to modify the ORC table when the code gets
patched. But ORC is simple -- a good thing! -- and it's best to leave
it alone.
Instead, support stack layout changes by "flattening" all possible stack
states (CFI) from parallel alternative code streams into a single set of
linear states. The only necessary limitation is that CFI conflicts are
disallowed at all possible instruction boundaries.
For example, this scenario is allowed:
Alt1 Alt2 Alt3
0x00 CALL *pv_ops.save_fl CALL xen_save_fl PUSHF
0x01 POP %RAX
0x02 NOP
...
0x05 NOP
...
0x07 <insn>
The unwind information for offset-0x00 is identical for all 3
alternatives. Similarly offset-0x05 and higher also are identical (and
the same as 0x00). However offset-0x01 has deviating CFI, but that is
only relevant for Alt3, neither of the other alternative instruction
streams will ever hit that offset.
This scenario is NOT allowed:
Alt1 Alt2
0x00 CALL *pv_ops.save_fl PUSHF
0x01 NOP6
...
0x07 NOP POP %RAX
The problem here is that offset-0x7, which is an instruction boundary in
both possible instruction patch streams, has two conflicting stack
layouts.
[ The above examples were stolen from Peter Zijlstra. ]
The new flattened CFI array is used both for the detection of conflicts
(like the second example above) and the generation of linear ORC
entries.
BTW, another benefit of these changes is that, thanks to some related
cleanups (new fake nops and alt_group struct) objtool can finally be rid
of fake jumps, which were a constant source of headaches.
[1] https://lkml.kernel.org/r/20201111170536.arx2zbn4ngvjoov7@treble
Cc: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
|
|
Create a new struct associated with each group of alternatives
instructions. This will help with the removal of fake jumps, and more
importantly with adding support for stack layout changes in
alternatives.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
|
|
Decouple ORC entries from instructions. This simplifies the
control/data flow, and is going to make it easier to support alternative
instructions which change the stack layout.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
|
|
Running instruction decoder posttest on an s390 host with an x86 target
with allyesconfig shows errors. Instructions used in a couple of kernel
objects could not be correctly decoded on big endian system.
insn_decoder_test: warning: objdump says 6 bytes, but insn_get_length() says 5
insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
insn_decoder_test: warning: ffffffff831eb4e1: 62 d1 fd 48 7f 04 24 vmovdqa64 %zmm0,(%r12)
insn_decoder_test: warning: objdump says 7 bytes, but insn_get_length() says 6
insn_decoder_test: warning: Found an x86 instruction decoder bug, please report this.
insn_decoder_test: warning: ffffffff831eb4e8: 62 51 fd 48 7f 44 24 01 vmovdqa64 %zmm8,0x40(%r12)
insn_decoder_test: warning: objdump says 8 bytes, but insn_get_length() says 6
This is because in a few places instruction field bytes are set directly
with further usage of "value". To address that introduce and use a
insn_set_byte() helper, which correctly updates "value" on big endian
systems.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
|
|
Currently objtool headers are being included either by their base name
or included via ../ from a parent directory. In case of a base name usage:
#include "warn.h"
#include "arch_elf.h"
it does not make it apparent from which directory the file comes from.
To make it slightly better, and actually to avoid name clashes some arch
specific files have "arch_" suffix. And files from an arch folder have
to revert to including via ../ e.g:
#include "../../elf.h"
With additional architectures support and the code base growth there is
a need for clearer headers naming scheme for multiple reasons:
1. to make it instantly obvious where these files come from (objtool
itself / objtool arch|generic folders / some other external files),
2. to avoid name clashes of objtool arch specific headers, potential
obtool arch generic headers and the system header files (there is
/usr/include/elf.h already),
3. to avoid ../ includes and improve code readability.
4. to give a warm fuzzy feeling to developers who are mostly kernel
developers and are accustomed to linux kernel headers arranging
scheme.
Doesn't this make it instantly obvious where are these files come from?
#include <objtool/warn.h>
#include <arch/elf.h>
And doesn't it look nicer to avoid ugly ../ includes? Which also
guarantees this is elf.h from the objtool and not /usr/include/elf.h.
#include <objtool/elf.h>
This patch defines and implements new objtool headers arranging
scheme. Which is:
- all generic headers go to include/objtool (similar to include/linux)
- all arch headers go to arch/$(SRCARCH)/include/arch (to get arch
prefix). This is similar to linux arch specific "asm/*" headers but we
are not abusing "asm" name and calling it what it is. This also helps
to prevent name clashes (arch is not used in system headers or kernel
exports).
To bring objtool to this state the following things are done:
1. current top level tools/objtool/ headers are moved into
include/objtool/ subdirectory,
2. arch specific headers, currently only arch/x86/include/ are moved into
arch/x86/include/arch/ and were stripped of "arch_" suffix,
3. new -I$(srctree)/tools/objtool/include include path to make
includes like <objtool/warn.h> possible,
4. rewriting file includes,
5. make git not to ignore include/objtool/ subdirectory.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
|
|
Correct objtool orc generation endianness problems to enable fully
functional x86 cross-compiles on big endian hardware.
Introduce bswap_if_needed() macro, which does a byte swap if target
endianness doesn't match the host, i.e. cross-compilation for little
endian on big endian and vice versa. The macro is used for conversion
of multi-byte values which are read from / about to be written to a
target native endianness ELF file.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
|
|
Relocations generated in elf_rebuild_rel[a]_reloc_section() are broken
if objtool is built and run on a big endian system.
The following errors pop up during x86 cross-compilation:
x86_64-9.1.0-ld: fs/efivarfs/inode.o: bad reloc symbol index (0x2000000 >= 0x22) for offset 0 in section `.orc_unwind_ip'
x86_64-9.1.0-ld: final link failed: bad value
Convert those functions to use gelf_update_rel[a](), similar to what
elf_write_reloc() does.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Co-developed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
|
|
The x86 instruction decoder code is shared across the kernel source and
the tools. Currently objtool seems to be the only tool from build tools
needed which breaks x86 cross-compilation on big endian systems. Make
the x86 instruction decoder build host endianness agnostic to support
x86 cross-compilation and enable objtool to implement endianness
awareness for big endian architectures support.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Co-developed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
|
|
Currently the x86 instruction decoder is used from:
- the kernel itself,
- from tools like objtool and perf,
- within x86 tools, i.e. instruction decoder selftests.
The first two cases are similar, because tools headers try to mimic
kernel headers.
Instruction decoder selftests include some of the kernel headers
directly, including uapi headers. This works until headers dependencies
are kept to a minimum and tools are not cross-compiled. Since the goal
of the x86 instruction decoder selftests is not to verify uapi headers,
move it to using tools headers, like is already done for vdso2c tool,
mkpiggy and other tools in arch/x86/boot/.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
|
|
Architectures without PUSH/POP instructions will always access the stack
though memory operations (SRC/DEST_INDIRECT). Make those operations have
the same effect on the CFA as PUSH/POP, with no stack pointer
modification.
Signed-off-by: Julien Thierry <jthierry@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
|
|
On arm64, the compiler can set the frame pointer either
with a move operation or with and add operation like:
add (SP + constant), BP
For a simple move operation, the CFA base is changed from SP to BP.
Handle also changing the CFA base when the frame pointer is set with
an addition instruction.
Signed-off-by: Julien Thierry <jthierry@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
|
|
A valid stack frame should contain both the return address and the
previous frame pointer value.
On x86, the return value is placed on the stack by the calling
instructions. On other architectures, the callee needs to explicitly
save the return address on the stack.
Add the necessary checks to verify a function properly sets up all the
elements of the stack frame.
Signed-off-by: Julien Thierry <jthierry@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
|
|
|
|
Section 3.11 was incorrectly called 3.9, fix it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Change my email contact ahead of a likely painful eleven-month migration
to a certain cobalt enteprisey groupware cloud product that will totally
break my workflow. Some day I may get used to having to email being
sequestered behind both claret and cerulean oath2+sms 2fa layers, but
for now I'll stick with keying in one password to receive an email vs.
the required four.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When the creator of SQPOLL io_uring dies (i.e. sqo_task), we don't want
its internals like ->files and ->mm to be poked by the SQPOLL task, it
have never been nice and recently got racy. That can happen when the
owner undergoes destruction and SQPOLL tasks tries to submit new
requests in parallel, and so calls io_sq_thread_acquire*().
That patch halts SQPOLL submissions when sqo_task dies by introducing
sqo_dead flag. Once set, the SQPOLL task must not do any submission,
which is synchronised by uring_lock as well as the new flag.
The tricky part is to make sure that disabling always happens, that
means either the ring is discovered by creator's do_exit() -> cancel,
or if the final close() happens before it's done by the creator. The
last is guaranteed by the fact that for SQPOLL the creator task and only
it holds exactly one file note, so either it pins up to do_exit() or
removed by the creator on the final put in flush. (see comments in
uring_flush() around file->f_count == 2).
One more place that can trigger io_sq_thread_acquire_*() is
__io_req_task_submit(). Shoot off requests on sqo_dead there, even
though actually we don't need to. That's because cancellation of
sqo_task should wait for the request before going any further.
note 1: io_disable_sqo_submit() does io_ring_set_wakeup_flag() so the
caller would enter the ring to get an error, but it still doesn't
guarantee that the flag won't be cleared.
note 2: if final __userspace__ close happens not from the creator
task, the file note will pin the ring until the task dies.
Fixed: b1b6b5a30dce8 ("kernel/io_uring: cancel io_uring before task works")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
files_cancel() should cancel all relevant requests and drop file notes,
so we should never have file notes after that, including on-exit fput
and flush. Add a WARN_ONCE to be sure.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
A simple preparation change inlining io_uring_attempt_task_drop() into
io_uring_flush().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We expect io_rw_reissue() to take place only during submission with
uring_lock held. Add a lockdep annotation to check that invariant.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET is set in incompat feature
set, it means the cache device is created with obsoleted layout with
obso_bucket_site_hi. Now bcache does not support this feature bit, a new
BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE incompat feature bit is added
for a better layout to support large bucket size.
For the legacy compatibility purpose, if a cache device created with
obsoleted BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET feature bit, all bcache
devices attached to this cache set should be set to read-only. Then the
dirty data can be written back to backing device before re-create the
cache device with BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE feature bit
by the latest bcache-tools.
This patch checks BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET feature bit
when running a cache set and attach a bcache device to the cache set. If
this bit is set,
- When run a cache set, print an error kernel message to indicate all
following attached bcache device will be read-only.
- When attach a bcache device, print an error kernel message to indicate
the attached bcache device will be read-only, and ask users to update
to latest bcache-tools.
Such change is only for cache device whose bucket size >= 32MB, this is
for the zoned SSD and almost nobody uses such large bucket size at this
moment. If you don't explicit set a large bucket size for a zoned SSD,
such change is totally transparent to your bcache device.
Fixes: ffa470327572 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When large bucket feature was added, BCH_FEATURE_INCOMPAT_LARGE_BUCKET
was introduced into the incompat feature set. It used bucket_size_hi
(which was added at the tail of struct cache_sb_disk) to extend current
16bit bucket size to 32bit with existing bucket_size in struct
cache_sb_disk.
This is not a good idea, there are two obvious problems,
- Bucket size is always value power of 2, if store log2(bucket size) in
existing bucket_size of struct cache_sb_disk, it is unnecessary to add
bucket_size_hi.
- Macro csum_set() assumes d[SB_JOURNAL_BUCKETS] is the last member in
struct cache_sb_disk, bucket_size_hi was added after d[] which makes
csum_set calculate an unexpected super block checksum.
To fix the above problems, this patch introduces a new incompat feature
bit BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE, when this bit is set, it
means bucket_size in struct cache_sb_disk stores the order of power-of-2
bucket size value. When user specifies a bucket size larger than 32768
sectors, BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE will be set to
incompat feature set, and bucket_size stores log2(bucket size) more
than store the real bucket size value.
The obsoleted BCH_FEATURE_INCOMPAT_LARGE_BUCKET won't be used anymore,
it is renamed to BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET and still only
recognized by kernel driver for legacy compatible purpose. The previous
bucket_size_hi is renmaed to obso_bucket_size_hi in struct cache_sb_disk
and not used in bcache-tools anymore.
For cache device created with BCH_FEATURE_INCOMPAT_LARGE_BUCKET feature,
bcache-tools and kernel driver still recognize the feature string and
display it as "obso_large_bucket".
With this change, the unnecessary extra space extend of bcache on-disk
super block can be avoided, and csum_set() may generate expected check
sum as well.
Fixes: ffa470327572 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This patch adds the check for features which is incompatible for
current supported feature sets.
Now if the bcache device created by bcache-tools has features that
current kernel doesn't support, read_super() will fail with error
messoage. E.g. if an unsupported incompatible feature detected,
bcache register will fail with dmesg "bcache: register_bcache() error :
Unsupported incompatible feature found".
Fixes: d721a43ff69c ("bcache: increase super block version for cache device and backing device")
Fixes: ffa470327572 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This patch fixes the following typos,
from BCH_FEATURE_COMPAT_SUUP to BCH_FEATURE_COMPAT_SUPP
from BCH_FEATURE_INCOMPAT_SUUP to BCH_FEATURE_INCOMPAT_SUPP
from BCH_FEATURE_INCOMPAT_SUUP to BCH_FEATURE_RO_COMPAT_SUPP
Fixes: d721a43ff69c ("bcache: increase super block version for cache device and backing device")
Fixes: ffa470327572 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|