Internal injection testing crashed with a console log that said:
mce: [Hardware Error]: CPU 7: Machine Check Exception: f Bank 0: bd80000000100134
This caused a lot of head scratching because the MCACOD (bits 15:0) of
that status is a signature from an L1 data cache error. But Linux says
that it found it in "Bank 0", which on this model CPU only reports L1
instruction cache errors.
The answer was that Linux doesn't initialize "m->bank" in the case that
it finds a fatal error in the mce_no_way_out() pre-scan of banks. If
this is a local machine check, that partially initialized struct
mce ends up being passed to mce_panic().
Fix is simple: just initialize m->bank in the case of a fatal error.
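As a minimal sketch (helper names and the mce_severity() signature are
from the v4.18-era MCE code, quoted from memory rather than the patch
itself), the pre-scan loop with the fix applied looks like this:

	for (i = 0; i < mca_cfg.banks; i++) {
		m->status = mce_rdmsrl(msr_ops.status(i));
		if (!(m->status & MCI_STATUS_VALID))
			continue;

		__set_bit(i, validp);
		m->bank = i;	/* the fix: record the bank being inspected */

		/* a fatal severity now panics with a fully set up *m */
		if (mce_severity(m, mca_cfg.tolerant, msg, true) >=
		    MCE_PANIC_SEVERITY)
			ret = 1;
	}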
Fixes: 40c36e2741d7 ("x86/mce: Fix incorrect "Machine check from unknown source" message")
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: x86-ml <x86@kernel.org>
Cc: stable@vger.kernel.org # v4.18 Note pre-v5.0 arch/x86/kernel/cpu/mce/core.c was called arch/x86/kernel/cpu/mcheck/mce.c
Link: https://lkml.kernel.org/r/20190201003341.10638-1-tony.luck@intel.com
|
|
"Resource Control" is a very broad term for this CPU feature, and a term
that is also associated with containers, cgroups etc. This can easily
cause confusion.
Make the user prompt more specific. Match the config symbol name.
[ bp: In the future, the corresponding ARM arch-specific code will be
under ARM_CPU_RESCTRL and the arch-agnostic bits will be carved out
under the CPU_RESCTRL umbrella symbol. ]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Babu Moger <Babu.Moger@amd.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: linux-doc@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190130195621.GA30653@cmpxchg.org
|
|
Kexec-ing a kernel with "efi=noruntime" on the first kernel's command
line causes the following null pointer dereference:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
Call Trace:
efi_runtime_map_copy+0x28/0x30
bzImage64_load+0x688/0x872
arch_kexec_kernel_image_load+0x6d/0x70
kimage_file_alloc_init+0x13e/0x220
__x64_sys_kexec_file_load+0x144/0x290
do_syscall_64+0x55/0x1a0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Just skip the EFI info setup if EFI runtime services are not enabled.
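A sketch of the guard at the top of the bzImage64 EFI setup path
(function context abbreviated; efi_enabled() is the standard test):

	static int setup_efi_state(struct boot_params *params, ...)
	{
		if (!efi_enabled(EFI_RUNTIME_SERVICES))
			return 0;	/* efi=noruntime: nothing to set up */
		...
	}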
[ bp: Massage commit message. ]
Suggested-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: bhe@redhat.com
Cc: David Howells <dhowells@redhat.com>
Cc: erik.schmauss@intel.com
Cc: fanc.fnst@cn.fujitsu.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: kexec@lists.infradead.org
Cc: lenb@kernel.org
Cc: linux-acpi@vger.kernel.org
Cc: Philipp Rudo <prudo@linux.vnet.ibm.com>
Cc: rafael.j.wysocki@intel.com
Cc: robert.moore@intel.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Cc: Yannik Sembritzki <yannik@sembritzki.me>
Link: https://lkml.kernel.org/r/20190118111310.29589-2-kasong@redhat.com
|
|
The load_microcode_amd() function searches for microcode patches and
attempts to apply a microcode patch if it is of different level than the
currently installed level.
While the processor won't actually load a level that is less than
what is already installed, the logic wrongly returns UCODE_NEW thus
signaling to its caller reload_store() that a late loading should be
attempted.
If the file-system contains an older microcode revision than what is
currently running, such a late microcode reload can result in these
misleading messages:
x86/CPU: CPU features have changed after loading microcode, but might not take effect.
x86/CPU: Please consider either early loading through initrd/built-in or a potential BIOS update.
These messages were issued on a system where SME/SEV are not
enabled by the BIOS (MSR C001_0010[23] = 0b) because during boot,
early_detect_mem_encrypt() is called and cleared the SME and SEV
features in this case.
However, after the wrong late load attempt, get_cpu_cap() is called and
reloads the SME and SEV feature bits, resulting in the messages.
Update the microcode level check to not attempt microcode loading if the
current level is greater than (and not only equal to) the patch level
found on the filesystem.
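The shape of the corrected check, as a sketch (field names as in the
AMD loader of that time; not the literal diff):

	p = find_patch(cpu);
	if (!p)
		return ret;

	/* was: p->patch_id == uci->cpu_sig.rev */
	if (p->patch_id <= uci->cpu_sig.rev)
		return ret;	/* nothing newer: don't report UCODE_NEW */

	ret = UCODE_NEW;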
[ bp: massage commit message. ]
Fixes: 2613f36ed965 ("x86/microcode: Attempt late loading only when new microcode is present")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/154894518427.9406.8246222496874202773.stgit@tlendack-t1.amdoffice.net
|
|
... so that they can get CCed on platform patches.
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20190128113619.19025-1-bp@alien8.de
|
|
show_ldttss() shifts desc.base2 by 24 bits, but base2 is 8 bits of a
bitfield in a u16.
Due to the really great idea of integer promotion in C99, base2 is
promoted to an int, because that's the standard-defined behaviour when
all values which can be represented by base2 fit into an int.
Now if bit 7 is set in desc.base2, the shift left by 24 makes the
resulting integer negative and the subsequent conversion to unsigned
long legitimately sign-extends it, causing the upper 32 bits to be set
in the result.
Fix this by casting desc.base2 to unsigned long before the shift.
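With the fix, the base reassembly in show_ldttss() reads roughly:

	addr = desc.base0 | (desc.base1 << 16) |
	       ((unsigned long)desc.base2 << 24);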
Detected by CoverityScan, CID#1475635 ("Unintended sign extension")
[ tglx: Reworded the changelog a bit as I actually had to look up
the standard (again) to decode the original one. ]
Fixes: a1a371c468f7 ("x86/fault: Decode page fault OOPSes better")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: kernel-janitors@vger.kernel.org
Link: https://lkml.kernel.org/r/20181222191116.21831-1-colin.king@canonical.com
|
|
In some old AMD KVM implementations, the guest's EFER.LME bit is cleared
by KVM when the hypervisor detects that the guest sets CR0.PG to 0. This
causes the guest OS to reboot when it tries to return from 32-bit
trampoline code because the CPU is in an incorrect state: CR4.PAE=1,
CR0.PG=1, CS.L=1, but
EFER.LME=0. As a precaution, set EFER.LME=1 as part of long mode
activation procedure. This extra step won't cause any harm when Linux is
booted on a bare-metal machine.
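The actual change is in the 32-bit boot/trampoline assembly; as a
C-level illustration of the extra step using the usual MSR helpers:

	u64 efer;

	rdmsrl(MSR_EFER, efer);
	wrmsrl(MSR_EFER, efer | EFER_LME);	/* force long mode enable */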
Signed-off-by: Wei Huang <wei@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/20190104054411.12489-1-wei@redhat.com
|
|
Add the Atom Tremont model number to the Intel family list.
[ Tony: Also update comment at head of file to say "_X" suffix is
also used for microserver parts. ]
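The addition is a one-liner in arch/x86/include/asm/intel-family.h;
the model value is quoted from memory of the patch, so treat it as
illustrative:

	#define INTEL_FAM6_ATOM_TREMONT_X	0x86 /* Jacobsville */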
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Cc: Megha Dey <megha.dey@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Rajneesh Bhardwaj <rajneesh.bhardwaj@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190125195902.17109-4-tony.luck@intel.com
|
|
After commit
5d32a66541c4 ("PCI/ACPI: Allow ACPI to be built without CONFIG_PCI set")
dependencies on CONFIG_PCI that previously were satisfied implicitly
through dependencies on CONFIG_ACPI have to be specified directly.
PCI_LOCKLESS_CONFIG depends on PCI, but this dependency was not
spelled out in the Kconfig. Add the explicit dependency here and fix:
WARNING: unmet direct dependencies detected for PCI_LOCKLESS_CONFIG
Depends on [n]: PCI [=n]
Selected by [y]:
- X86 [=y]
Fixes: 5d32a66541c46 ("PCI/ACPI: Allow ACPI to be built without CONFIG_PCI set")
Signed-off-by: Sinan Kaya <okaya@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-acpi@vger.kernel.org
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190121231958.28255-2-okaya@kernel.org
|
|
While in the native case entry into the kernel happens on the trampoline
stack, PV Xen kernels get entered with the current thread stack right
away. Hence source and destination stacks are identical in that case,
and special care is needed.
Other than in sync_regs() the copying done on the INT80 path isn't
NMI / #MC safe, as either of these events occurring in the middle of the
stack copying would clobber data on the (source) stack.
There is similar code in interrupt_entry() and nmi(), but there is no fixup
required because those code paths are unreachable in XEN PV guests.
[ tglx: Sanitized subject, changelog, Fixes tag and stable mail address. Sigh ]
Fixes: 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack for IDT entries")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: xen-devel@lists.xenproject.org
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/5C3E1128020000780020DFAD@prv1-mh.provo.novell.com
|
|
Commit
b6664ba42f14 ("s390, kexec_file: drop arch_kexec_mem_walk()")
changed the behavior of kexec_locate_mem_hole(): it will try to allocate
free memory only when kbuf.mem is initialized to zero.
However, x86's kexec_file_load() implementation reuses a struct
kexec_buf allocated on the stack and its kbuf.mem member gets set by
each kexec_add_buffer() invocation.
The second kexec_add_buffer() will reuse the same kbuf but not
reinitialize kbuf.mem.
Therefore, explicitly reset kbuf.mem each time in order for
kexec_locate_mem_hole() to locate a free memory region each time.
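A sketch of the resulting pattern in the x86 loader (surrounding
error handling elided):

	kbuf.mem = 0;	/* reset the hint before every add */
	ret = kexec_add_buffer(&kbuf);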
[ bp: massage commit message. ]
Fixes: b6664ba42f14 ("s390, kexec_file: drop arch_kexec_mem_walk()")
Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Philipp Rudo <prudo@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yannik Sembritzki <yannik@sembritzki.me>
Cc: Yi Wang <wang.yi59@zte.com.cn>
Cc: kexec@lists.infradead.org
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20181228011247.GA9999@dhcp-128-65.nay.redhat.com
|
|
Using sizeof(pointer) for determining the size of a memset() only works
when the size of the pointer and the size of type to which it points are
the same. For pte_t this is only true for 64bit and 32bit-NONPAE. On 32bit
PAE systems this is wrong as the pointer size is 4 bytes but the PTE entry
is 8 bytes. It's actually not a real world issue as this code depends on
64bit, but it's wrong nevertheless.
Use sizeof(*p) for correctness' sake.
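In sme_populate_pgd(), the fixed call then reads roughly (note that on
32-bit PAE sizeof(pte) would be 4 while sizeof(*pte) is 8):

	pte = ppd->pgtable_area;
	memset(pte, 0, sizeof(*pte) * PTRS_PER_PTE);	/* was: sizeof(pte) */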
Fixes: aad983913d77 ("x86/mm/encrypt: Simplify sme_populate_pgd() and sme_populate_pgd_large()")
Signed-off-by: Peng Hao <peng.hao2@zte.com.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: dave.hansen@linux.intel.com
Cc: peterz@infradead.org
Cc: luto@kernel.org
Link: https://lkml.kernel.org/r/1546065252-97996-1-git-send-email-peng.hao2@zte.com.cn
|
|
There was a bug where the per-mm pkey state was not being preserved across
fork() in the child. fork() is performed in the pkey selftests, but all of
the pkey activity is performed in the parent. The child does not perform
any actions sensitive to pkey state.
To make the test more sensitive to these kinds of bugs, add a fork() where
the parent exits, and execution continues in the child.
To achieve this, let the key exhaustion test not terminate at the first
allocation failure; instead, fork after 2*NR_PKEYS loop iterations and
continue in the child.
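The added step, sketched (the 'forked' flag is illustrative, not the
literal test code):

	if (i >= 2 * NR_PKEYS && !forked) {
		pid_t pid = fork();

		if (pid > 0)
			exit(0);	/* parent exits ... */
		forked = 1;		/* ... child continues the test */
	}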
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: peterz@infradead.org
Cc: mpe@ellerman.id.au
Cc: will.deacon@arm.com
Cc: luto@kernel.org
Cc: jroedel@suse.de
Cc: stable@vger.kernel.org
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/20190102215657.585704B7@viggo.jf.intel.com
|
|
Memory protection key behavior should be the same in a child as it was
in the parent before a fork. But, there is a bug that resets the
state in the child at fork instead of preserving it.
The creation of new mm's is a bit convoluted. At fork(), the code
does:
1. memcpy() the parent mm to initialize the child
2. mm_init() to initialize some select stuff
3. dup_mmap() to create true copies that memcpy() did not do right
For pkeys two bits of state need to be preserved across a fork:
'execute_only_pkey' and 'pkey_allocation_map'.
Those are preserved by the memcpy(), but mm_init() invokes
init_new_context() which overwrites 'execute_only_pkey' and
'pkey_allocation_map' with "new" values.
The author of the code erroneously believed that init_new_context is *only*
called at execve()-time. But, alas, init_new_context() is used at execve()
and fork().
The result is that, after a fork(), the child's pkey state ends up looking
like it does after an execve(), which is totally wrong. pkeys that are
already allocated can be allocated again, for instance.
To fix this, add code called by dup_mmap() to copy the pkey state from
parent to child explicitly. Also add a comment above init_new_context() to
make it more clear to the next poor sod what this code is used for.
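The helper ends up roughly like this (a sketch of the x86 side; the
OSPKE check mirrors how other pkey code bails out on unsupported CPUs):

	static inline void arch_dup_pkeys(struct mm_struct *oldmm,
					  struct mm_struct *mm)
	{
		if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
			return;

		/* Duplicate the oldmm pkey state in mm: */
		mm->context.pkey_allocation_map = oldmm->context.pkey_allocation_map;
		mm->context.execute_only_pkey   = oldmm->context.execute_only_pkey;
	}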
Fixes: e8c24d3a23a ("x86/pkeys: Allocation/free syscalls")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: peterz@infradead.org
Cc: mpe@ellerman.id.au
Cc: will.deacon@arm.com
Cc: luto@kernel.org
Cc: jroedel@suse.de
Cc: stable@vger.kernel.org
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/20190102215655.7A69518C@viggo.jf.intel.com
|
|
The outb() function takes parameters value and port, in that order. Fix
the parameters used in the kaslr i8254 fallback code.
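For reference, the corrected call shape (constant names as used by
that code; treat them as illustrative):

	/* outb(value, port): the value comes first, then the port */
	outb(I8254_CMD_READBACK | I8254_SELECT_COUNTER0, I8254_PORT_CONTROL);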
Fixes: 5bfce5ef55cb ("x86, kaslr: Provide randomness functions")
Signed-off-by: Daniel Drake <drake@endlessm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: linux@endlessm.com
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190107034024.15005-1-drake@endlessm.com
|
|
After commit
5d32a66541c4 ("PCI/ACPI: Allow ACPI to be built without CONFIG_PCI set")
dependencies on CONFIG_PCI that previously were satisfied implicitly
through dependencies on CONFIG_ACPI have to be specified directly. LPSS
code relies on PCI infrastructure but this dependency has not been
explicitly called out so do that.
Fixes: 5d32a66541c46 ("PCI/ACPI: Allow ACPI to be built without CONFIG_PCI set")
Signed-off-by: Sinan Kaya <okaya@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: linux-acpi@vger.kernel.org
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190102181038.4418-11-okaya@kernel.org
|
|
Commit
4cd24de3a098 ("x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support")
replaced the RETPOLINE define with CONFIG_RETPOLINE checks. Remove the
remaining pieces.
[ bp: Massage commit message. ]
Fixes: 4cd24de3a098 ("x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support")
Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Kees Cook <keescook@chromium.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: linux-kbuild@vger.kernel.org
Cc: srinivas.eeda@oracle.com
Cc: stable <stable@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20181210163725.95977-1-chao.wang@ucloud.cn
|
|
CONFIG_RESCTRL is too generic. The final goal is to have a generic
option called like this which is selected by the arch-specific ones
CONFIG_X86_RESCTRL and CONFIG_ARM64_RESCTRL. The generic one will
cover the resctrl filesystem and other generic and shared bits of
functionality.
Signed-off-by: Borislav Petkov <bp@suse.de>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Requested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Babu Moger <babu.moger@amd.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: James Morse <james.morse@arm.com>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20190108171401.GC12235@zn.tnic
|
|
Both the .o and the actual executable need to be built with -m32 in order
to link correctly.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morris <jmorris@namei.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: fec7b6690541 ("samples: add an example of seccomp user trap")
Link: http://lkml.kernel.org/r/20190107231631.1849-1-tycho@tycho.ws
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
For some reason, I accidentally got rid of "generic-y += shmparam.h"
from some architectures.
Restore them to fix building c6x, h8300, hexagon, m68k, microblaze,
openrisc, and unicore32.
Fixes: d6e4b3e326d8 ("arch: remove redundant UAPI generic-y defines")
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The semantics of what "in core" means for the mincore() system call are
somewhat unclear, but Linux has always (since 2.3.52, which is when
mincore() was initially done) treated it as "page is available in page
cache" rather than "page is mapped in the mapping".
The problem with that traditional semantic is that it exposes a lot of
system cache state that it really probably shouldn't, and that users
shouldn't really even care about.
So let's try to avoid that information leak by simply changing the
semantics to be that mincore() counts actual mapped pages, not pages
that might be cheaply mapped if they were faulted (note the "might be"
part of the old semantics: being in the cache doesn't actually guarantee
that you can access them without IO anyway, since things like network
filesystems may have to revalidate the cache before use).
In many ways the old semantics were somewhat insane even aside from the
information leak issue. From the very beginning (and that beginning is
a long time ago: 2.3.52 was released in March 2000, I think), the code
had a comment saying
Later we can get more picky about what "in core" means precisely.
and this is that "later". Admittedly it is much later than is really
comfortable.
NOTE! This is a real semantic change, and it is for example known to
change the output of "fincore", since that program literally does an
mmap without populating it, and then does mincore() on a mapping that
doesn't actually have any pages in it.
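An illustrative userspace probe of the difference (not from the patch;
fd refers to a file whose first page may well be in the page cache):

	unsigned char vec[1];
	void *p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);

	mincore(p, 4096, vec);
	/* old: vec[0] == 1 if the page was cached anywhere;
	 * new: vec[0] == 0 until this mapping faults the page in */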
I'm hoping that nobody actually has any workflow that cares, and the
info leak is real.
We may have to do something different if it turns out that people have
valid reasons to want the old semantics, and if we can limit the
information leak sanely.
Cc: Kevin Easton <kevin@guarana.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Masatake YAMATO <yamato@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 594cc251fdd0 ("make 'user_access_begin()' do 'access_ok()'")
broke both alpha and SH booting in qemu, as noticed by Guenter Roeck.
It turns out that the bug wasn't actually in that commit itself (which
would have been surprising: it was mostly a no-op), but in how the
addition of access_ok() to the strncpy_from_user() and strnlen_user()
functions now triggered the case where those functions would test the
access of the very last byte of the user address space.
The string functions actually did that user range test before too, but
they did it manually by just comparing against user_addr_max(). But
with user_access_begin() doing the check (using "access_ok()"), it now
exposed problems in the architecture implementations of that function.
For example, on alpha, the access_ok() helper macro looked like this:
#define __access_ok(addr, size) \
((get_fs().seg & (addr | size | (addr+size))) == 0)
and what it basically tests is if any of the high bits get set (the
USER_DS masking value is 0xfffffc0000000000).
And that's completely wrong for the "addr+size" check. Because it's
off-by-one for the case where we check to the very end of the user
address space, which is exactly what the strn*_user() functions do.
Why? Because "addr+size" will be exactly the size of the address space,
so trying to access the last byte of the user address space will fail
the __access_ok() check, even though it shouldn't. As a result, the
user string accessor functions failed consistently - because they
literally don't know how long the string is going to be, and the max
access is going to be that last byte of the user address space.
Side note: that alpha macro is buggy for another reason too - it
evaluates its arguments twice.
And SH has another version of almost the exact same bug:
#define __addr_ok(addr) \
((unsigned long __force)(addr) < current_thread_info()->addr_limit.seg)
so far so good: yes, a user address must be below the limit. But then:
#define __access_ok(addr, size) \
(__addr_ok((addr) + (size)))
is wrong with the exact same off-by-one case: the case when "addr+size"
is exactly _equal_ to the limit is actually perfectly fine (think "one
byte access at the last address of the user address space").
The SH version is actually seriously buggy in another way: it doesn't
actually check for overflow, even though it did copy the _comment_ that
talks about overflow.
So it turns out that both SH and alpha actually have completely buggy
implementations of access_ok(), but they happened to work in practice
(although the SH overflow one is a serious serious security bug, not
that anybody likely cares about SH security).
This fixes the problems by using a similar macro on both alpha and SH.
It isn't trying to be clever; the end address is based on this logic:
unsigned long __ao_end = __ao_a + __ao_b - !!__ao_b;
which basically says "add start and length, and then subtract one unless
the length was zero". We can't subtract one for a zero length, or we'd
just hit an underflow instead.
For a lot of access_ok() users the length is a constant, so this isn't
actually as expensive as it initially looks.
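On alpha the fixed macro then ends up roughly as follows (a sketch
built from the logic above; note the local copies also cure the
double-evaluation problem mentioned in the side note):

	#define __access_ok(addr, size) ({				\
		unsigned long __ao_a = (addr), __ao_b = (size);		\
		unsigned long __ao_end = __ao_a + __ao_b - !!__ao_b;	\
		(get_fs().seg & (__ao_a | __ao_end)) == 0; })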
Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Add support for the Adiantum encryption mode to fscrypt. Adiantum is a
tweakable, length-preserving encryption mode with security provably
reducible to that of XChaCha12 and AES-256, subject to a security bound.
It's also a true wide-block mode, unlike XTS. See the paper
"Adiantum: length-preserving encryption for entry-level processors"
(https://eprint.iacr.org/2018/720.pdf) for more details. Also see
commit 059c2a4d8e16 ("crypto: adiantum - add Adiantum support").
On sufficiently long messages, Adiantum's bottlenecks are XChaCha12 and
the NH hash function. These algorithms are fast even on processors
without dedicated crypto instructions. Adiantum makes it feasible to
enable storage encryption on low-end mobile devices that lack AES
instructions; currently such devices are unencrypted. On ARM Cortex-A7,
on 4096-byte messages Adiantum encryption is about 4 times faster than
AES-256-XTS encryption; decryption is about 5 times faster.
In fscrypt, Adiantum is suitable for encrypting both file contents and
names. With filenames, it fixes a known weakness: when two filenames in
a directory share a common prefix of >= 16 bytes, with CTS-CBC their
encrypted filenames share a common prefix too, leaking information.
Adiantum does not have this problem.
Since Adiantum also accepts long tweaks (IVs), it's also safe to use the
master key directly for Adiantum encryption rather than deriving
per-file keys, provided that the per-file nonce is included in the IVs
and the master key isn't used for any other encryption mode. This
configuration saves memory and improves performance. A new fscrypt
policy flag is added to allow users to opt-in to this configuration.
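An illustrative opt-in from userspace (constant names as added around
this series; treat the exact identifiers as assumptions):

	struct fscrypt_policy policy = {
		.contents_encryption_mode  = FS_ENCRYPTION_MODE_ADIANTUM,
		.filenames_encryption_mode = FS_ENCRYPTION_MODE_ADIANTUM,
		.flags = FS_POLICY_FLAG_DIRECT_KEY,	/* use master key directly */
	};

	ioctl(dirfd, FS_IOC_SET_ENCRYPTION_POLICY, &policy);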
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Remove the dot-prefixing since it is just a matter of the
.gitignore file.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
Make simply skips a missing rule when it is marked as .PHONY.
Remove the dummy targets.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
You do not have to use 'define ... endef' for filechk_* rules.
For simple cases, the use of assignment looks cleaner, IMHO.
I updated the usage in scripts/Kbuild.include in case somebody
misunderstands that 'define ... endef' is a requirement.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
Now that Kbuild automatically creates asm-generic wrappers for missing
mandatory headers, it is redundant to list the same headers in
generic-y and mandatory-y.
Suggested-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
|
|
Some time ago, Sam pointed out a certain degree of overlap between
generic-y and mandatory-y. (https://lkml.org/lkml/2017/7/10/121)
I tweaked the meaning of mandatory-y a little bit; now it defines the
minimum set of ASM headers that all architectures must have.
If an arch does not have a specific implementation of a mandatory
header, Kbuild will let it fall back to the asm-generic one by
automatically generating a wrapper. This will allow dropping lots of
redundant generic-y defines.
Previously, "mandatory" was used in the context of UAPI, but I guess
this can be extended to kernel space ASM headers.
Suggested-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
|
|
These comments are leftovers of commit fcc8487d477a ("uapi: export all
headers under uapi directories").
Prior to that commit, exported headers must be explicitly added to
header-y. Now, all headers under the uapi/ directories are exported.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
This commit removes redundant generic-y defines in
arch/riscv/include/asm/Kbuild.
[1] It is redundant to define the same generic-y in both
arch/$(ARCH)/include/asm/Kbuild and
arch/$(ARCH)/include/uapi/asm/Kbuild.
Remove the following generic-y:
errno.h
fcntl.h
ioctl.h
ioctls.h
ipcbuf.h
mman.h
msgbuf.h
param.h
poll.h
posix_types.h
resource.h
sembuf.h
setup.h
shmbuf.h
signal.h
socket.h
sockios.h
stat.h
statfs.h
swab.h
termbits.h
termios.h
types.h
[2] It is redundant to define generic-y when arch-specific
implementation exists in arch/$(ARCH)/include/asm/*.h
Remove the following generic-y:
cacheflush.h
module.h
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
filechk_* rules often consist of multiple 'echo' lines. They must be
surrounded with { } or ( ) to work correctly. Otherwise, only the
string from the last 'echo' would be written into the target.
Let's take care of that in the 'filechk' macro in scripts/Kbuild.include
to clean up the filechk_* rules.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
Since commit 9c2af1c7377a ("kbuild: add .DELETE_ON_ERROR special
target"), the target file is automatically deleted on failure.
The boilerplate code
... || { rm -f $@; false; }
is unneeded.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
Commit 3a2429e1faf4 ("kbuild: change if_changed_rule for multi-line
recipe") and commit 4f0e3a57d6eb ("kbuild: Add support for DT binding
schema checks") came in via different sub-systems.
This is a follow-up cleanup.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
The only/last user of UIMAGE_IN/OUT was removed by commit 4722a3e6b716
("microblaze: fix multiple bugs in arch/microblaze/boot/Makefile").
The input and output should always be $< and $@.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label".
The jump label is controlled by HAVE_JUMP_LABEL, which is defined
like this:
#if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
# define HAVE_JUMP_LABEL
#endif
We can improve this by testing 'asm goto' support in Kconfig, then
make JUMP_LABEL depend on CC_HAS_ASM_GOTO.
Ugly '#ifdef HAVE_JUMP_LABEL' will go away, and CONFIG_JUMP_LABEL will
match the real kernel capability.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
|
|
As mentioned in the info pages of gas, the '.align' pseudo-op's
interpretation of the alignment value is architecture specific:
it might either be a byte count or an exponent of two.
On ARM it's actually the latter, which leads to unnecessarily large
alignments of 16 bytes for 32-bit builds or 256 bytes for 64-bit
builds.
Fix this by switching to '.balign' instead, which is consistent
across all architectures.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
Coccinelle doesn't always have access to the values of named
(#define) constants, and they may often be bound to true
and false values anyway, resulting in false positives. So stop
warning about them.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
Avoid reporting on the use of an iterator index variable when
the variable is redeclared.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
This has never been used.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
This commit removes redundant generic-y defines in
arch/nds32/include/asm/Kbuild.
[1] It is redundant to define the same generic-y in both
arch/$(ARCH)/include/asm/Kbuild and
arch/$(ARCH)/include/uapi/asm/Kbuild.
Remove the following generic-y:
bitsperlong.h
bpf_perf_event.h
errno.h
fcntl.h
ioctl.h
ioctls.h
mman.h
shmbuf.h
stat.h
[2] It is redundant to define generic-y when arch-specific
implementation exists in arch/$(ARCH)/include/asm/*.h
Remove the following generic-y:
ftrace.h
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
kernel/dma/Kconfig globally defines HAS_DMA as follows:
config HAS_DMA
bool
depends on !NO_DMA
default y
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Fixes build break on most ARM/ARM64 defconfigs:
lib/genalloc.c: In function 'gen_pool_add_virt':
lib/genalloc.c:190:10: error: implicit declaration of function 'vzalloc_node'; did you mean 'kzalloc_node'?
lib/genalloc.c:190:8: warning: assignment to 'struct gen_pool_chunk *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
lib/genalloc.c: In function 'gen_pool_destroy':
lib/genalloc.c:254:3: error: implicit declaration of function 'vfree'; did you mean 'kfree'?
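The fix amounts to pulling in the vmalloc declarations:

	#include <linux/vmalloc.h>	/* vzalloc_node(), vfree() */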
Fixes: 6862d2fc8185 ('lib/genalloc.c: use vzalloc_node() to allocate the bitmap')
Cc: Huang Shijie <sjhuang@iluvatar.ai>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexey Skidanov <alexey.skidanov@intel.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We need to return a dma_addr_t even if we don't have a kernel mapping.
Do so by consolidating the phys_to_dma() call in a single place and
jumping to it from all the branches that return successfully.
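A sketch of the consolidated exit (the DMA_ATTR_NO_KERNEL_MAPPING
branch is the one that previously returned without a dma_addr_t):

	if (attrs & DMA_ATTR_NO_KERNEL_MAPPING) {
		ret = page;		/* opaque cookie, no kernel mapping */
		goto done;
	}
	/* ... remap the pages and point ret at the vaddr ... */
done:
	*dma_handle = phys_to_dma(dev, page_to_phys(page));
	return ret;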
Fixes: bfd56cd60521 ("dma-mapping: support highmem in the generic remap allocator")
Reported-by: Liviu Dudau <liviu@dudau.co.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Liviu Dudau <liviu@dudau.co.uk>
|
|
In many cases we don't have to create a GART mapping at all, which
also means there is nothing to unmap. Fix the range check that was
incorrectly modified when removing the mapping_error method.
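The corrected check in the unmap path, sketched:

	/* addresses outside the aperture were never GART-mapped */
	if (dma_addr < iommu_bus_base ||
	    dma_addr >= iommu_bus_base + iommu_size)
		return;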
Fixes: 9e8aa6b546 ("x86/amd_gart: remove the mapping_error dma_map_ops method")
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Michal Kubecek <mkubecek@suse.cz>
|
|
Some non-generic ia64 configs don't build swiotlb, and thus should not
pull in the generic non-coherent DMA infrastructure.
Fixes: 68c608345c ("swiotlb: remove dma_mark_clean")
Reported-by: Tony Luck <tony.luck@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This has been broken forever, and nobody ever really noticed because
it's purely a performance issue.
Long long ago, in commit 6175ddf06b61 ("x86: Clean up mem*io functions")
Brian Gerst simplified the memory copies to and from iomem, since on
x86, the instructions to access iomem are exactly the same as the
regular instructions.
That is technically true, and things worked, and nobody said anything.
Besides, back then the regular memcpy was pretty simple and worked fine.
Nobody noticed except for David Laight, that is. David was testing a
TLP monitor he was writing for an FPGA, and has been occasionally
complaining about how memcpy_toio() writes things one byte at a time.
Which is completely unacceptable from a performance standpoint, even if
it happens to technically work.
The reason it's writing one byte at a time is because while it's
technically true that accesses to iomem are the same as accesses to
regular memory on x86, the _granularity_ (and ordering) of accesses
matter to iomem in ways that they don't matter to regular cached memory.
In particular, when ERMS is set, we default to using "rep movsb" for
larger memory copies. That is indeed perfectly fine for real memory,
since the whole point is that the CPU is going to do cacheline
optimizations and execute the memory copy efficiently for cached
memory.
With iomem? Not so much. With iomem, "rep movsb" will indeed work, but
it will copy things one byte at a time. Slowly and ponderously.
Now, originally, back in 2010 when commit 6175ddf06b61 was done, we
didn't use ERMS, and this was much less noticeable.
Our normal memcpy() was simpler in other ways too.
Because in fact, it's not just about using the string instructions. Our
memcpy() these days does things like "read and write overlapping values"
to handle the last bytes of the copy. Again, for normal memory,
overlapping accesses aren't an issue. For iomem? They can be.
So this re-introduces the specialized memcpy_toio(), memcpy_fromio() and
memset_io() functions. It doesn't particularly optimize them, but it
tries to at least not be horrid, or do overlapping accesses. In fact,
this uses the existing __inline_memcpy() function that we still had
lying around that uses our very traditional "rep movsl" loop followed by
movsw/movsb for the final bytes.
Somebody may decide to try to improve on it, but if we've gone almost a
decade with only one person really ever noticing and complaining, maybe
it's not worth worrying about further, once it's not _completely_ broken?
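A sketch of one of the reintroduced helpers (simplified from the
patch):

	void memcpy_toio(volatile void __iomem *dst, const void *src,
			 size_t count)
	{
		/* "rep movsl" word copy plus a movsw/movsb tail: fixed-size,
		 * non-overlapping accesses that are safe for iomem */
		__inline_memcpy((void *)dst, src, count);
	}
	EXPORT_SYMBOL(memcpy_toio);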
Reported-by: David Laight <David.Laight@aculab.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This actually enables the __put_user_goto() functionality in
unsafe_put_user().
For an example of the effect of this, this is the code generated for the
unsafe_put_user(signo, &infop->si_signo, Efault);
in the waitid() system call:
movl %ecx,(%rbx) # signo, MEM[(struct __large_struct *)_2]
It's just one single store instruction, along with generating an
exception table entry pointing to the Efault label case in case that
instruction faults.
Before, we would generate this:
xorl %edx, %edx
movl %ecx,(%rbx) # signo, MEM[(struct __large_struct *)_3]
testl %edx, %edx
jne .L309
with the exception table generated for that 'mov' instruction causing us
to jump to a stub that set %edx to -EFAULT and then jumped back to the
'testl' instruction.
So not only do we now get rid of the extra code in the normal sequence,
we also avoid unnecessarily keeping that extra error register live
across it all.
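For context, the usage pattern this enables, roughly the shape of the
waitid() fast path:

	if (!user_access_begin(infop, sizeof(*infop)))
		return -EFAULT;

	unsafe_put_user(signo, &infop->si_signo, Efault);
	user_access_end();
	return 0;

Efault:
	user_access_end();
	return -EFAULT;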
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This is finally the actual reason for the odd error handling in the
"unsafe_get/put_user()" functions, introduced over three years ago.
Using a "jump to error label" interface is somewhat odd, but very
convenient as a programming interface, and more importantly, it fits
very well with simply making the target be the exception handler address
directly from the inline asm.
The reason it took over three years to actually do this? We need "asm
goto" support for it, which only became the default on x86 last year.
It's now been a year that we've forced asm goto support (see commit
e501ce957a78 "x86: Force asm-goto"), and so let's just do it here too.
[ Side note: this commit was originally done back in 2016. The above
commentary about timing is obviously about it only now getting merged
into my real upstream tree - Linus ]
Sadly, gcc still only supports "asm goto" with asms that do not have any
outputs, so we are limited to only the put_user case for this. Maybe in
several more years we can do the get_user case too.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The alternative coding patch for parisc in kernel 4.20 broke booting
machines with PA8500-PA8700 CPUs. The problem is that for such machines
the parisc kernel automatically utilizes huge pages to access kernel
text code, but the set_kernel_text_rw() function, which is used shortly
before applying any alternative patches, didn't use correctly
hugepage-aligned addresses to remap the kernel text read-writeable.
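A sketch of the alignment fix (macro and symbol usage illustrative,
not the literal parisc patch):

	/* remap the whole huge pages backing the kernel text, not just
	 * the byte range [_text, _etext) */
	unsigned long start = ALIGN_DOWN((unsigned long)&_text, HPAGE_SIZE);
	unsigned long end   = ALIGN((unsigned long)&_etext, HPAGE_SIZE);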
Fixes: 3847dab77421 ("parisc: Add alternative coding infrastructure")
Cc: <stable@vger.kernel.org> [4.20]
Signed-off-by: Helge Deller <deller@gmx.de>
|