Age | Commit message (Collapse) | Author | Files | Lines |
|
__copy_siginfo_to_user32() call reordered a bit. The rest folds
nicely.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Currently we have user_access block, followed by __put_user(),
deciding what the restorer will be and finally a put_user_try
block.
Moving the calculation of restorer first allows the rest
(actual copyout work) to coalesce into a single user_access block.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
What's left is just a sequence of stores to userland addresses, with all
error handling, etc. done out of line. Calling that from user_access block
is safe, but rather than teaching objtool to recognize it as such we can
just make it always_inline - it is small enough and has few enough callers,
for the space savings not to be an issue.
Rename the sucker to __unsafe_setup_sigcontext32() and provide
unsafe_put_sigcontext32() with usual kind of semantics.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Straightforward, except for compat_save_altstack_ex() stuck in those.
Replace that thing with an analogue that would use unsafe_put_user()
instead of put_user_ex() (called unsafe_compat_save_altstack()) and
be done with that...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
no users left
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Just do copyin into a local struct and be done with that - we are
on a shallow stack here.
[reworked by tglx, removing the macro horrors while we are touching that]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Just do copyin into a local struct and be done with that - we are
on a shallow stack here.
[reworked by tglx, removing the macro horrors while we are touching that]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Just do a copyin of what we want into a local variable and
be done with that. We are guaranteed to be on shallow stack
here...
Note that conditional expression for range passed to access_ok()
in mainline had been pointless all along - the only difference
between vm86plus_struct and vm86_struct is that the former has
one extra field in the end and when we get to copyin of that
field (conditional upon 'plus' argument), we use copy_from_user().
Moreover, all fields starting with ->int_revectored are copied
that way, so we only need that check (be it done by access_ok()
or by user_access_begin()) only on the beginning of the structure -
the fields that used to be covered by that get_user_try() block.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Very few call sites where that would be triggered remain, and none
of those is anywhere near hot enough to bother.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
... and consolidate the definition of sigframe_ia32->extramask - it's
always a 1-element array of 32bit unsigned.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
rather than relying upon the magic in raw_copy_from_user()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
|
|
In order to allow the GICv4 code to link properly on 32bit ARM,
make sure we don't use 64bit divisions when it isn't strictly
necessary.
Fixes: 4e6437f12d6e ("irqchip/gic-v4.1: Ensure L2 vPE table is allocated at RD level")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
VirtualBox hosts can share folders with guests, this commit adds a
VFS driver implementing the Linux-guest side of this, allowing folders
exported by the host to be mounted under Linux.
This driver depends on the guest <-> host IPC functions exported by
the vboxguest driver.
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
This is a merge error on my part - the driver was merged into mainline
by commit c5951e7c8ee5 ("Merge tag 'mips_5.6' of git://../mips/linux")
over a week ago, but nobody apparently noticed that it didn't actually
build due to still having a reference to the devm_ioremap_nocache()
function, removed a few days earlier through commit 6a1000bd2703 ("Merge
tag 'ioremap-5.6' of git://../ioremap").
Apparently this didn't get any build testing anywhere. Not perhaps all
that surprising: it's restricted to 64-bit MIPS only, and only with the
new SGI_MFD_IOC3 support enabled.
I only noticed because the ioremap conflicts in the ARM SoC driver
update made me check there weren't any others hiding, and I found this
one.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This makes the pipe code use separate wait-queues and exclusive waiting
for readers and writers, avoiding a nasty thundering herd problem when
there are lots of readers waiting for data on a pipe (or, less commonly,
lots of writers waiting for a pipe to have space).
While this isn't a common occurrence in the traditional "use a pipe as a
data transport" case, where you typically only have a single reader and
a single writer process, there is one common special case: using a pipe
as a source of "locking tokens" rather than for data communication.
In particular, the GNU make jobserver code ends up using a pipe as a way
to limit parallelism, where each job consumes a token by reading a byte
from the jobserver pipe, and releases the token by writing a byte back
to the pipe.
This pattern is fairly traditional on Unix, and works very well, but
will waste a lot of time waking up a lot of processes when only a single
reader needs to be woken up when a writer releases a new token.
A simplified test-case of just this pipe interaction is to create 64
processes, and then pass a single token around between them (this
test-case also intentionally passes another token that gets ignored to
test the "wake up next" logic too, in case anybody wonders about it):
#include <unistd.h>
int main(int argc, char **argv)
{
int fd[2], counters[2];
pipe(fd);
counters[0] = 0;
counters[1] = -1;
write(fd[1], counters, sizeof(counters));
/* 64 processes */
fork(); fork(); fork(); fork(); fork(); fork();
do {
int i;
read(fd[0], &i, sizeof(i));
if (i < 0)
continue;
counters[0] = i+1;
write(fd[1], counters, (1+(i & 1)) *sizeof(int));
} while (counters[0] < 1000000);
return 0;
}
and in a perfect world, passing that token around should only cause one
context switch per transfer, when the writer of a token causes a
directed wakeup of just a single reader.
But with the "writer wakes all readers" model we traditionally had, on
my test box the above case causes more than an order of magnitude more
scheduling: instead of the expected ~1M context switches, "perf stat"
shows
231,852.37 msec task-clock # 15.857 CPUs utilized
11,250,961 context-switches # 0.049 M/sec
616,304 cpu-migrations # 0.003 M/sec
1,648 page-faults # 0.007 K/sec
1,097,903,998,514 cycles # 4.735 GHz
120,781,778,352 instructions # 0.11 insn per cycle
27,997,056,043 branches # 120.754 M/sec
283,581,233 branch-misses # 1.01% of all branches
14.621273891 seconds time elapsed
0.018243000 seconds user
3.611468000 seconds sys
before this commit.
After this commit, I get
5,229.55 msec task-clock # 3.072 CPUs utilized
1,212,233 context-switches # 0.232 M/sec
103,951 cpu-migrations # 0.020 M/sec
1,328 page-faults # 0.254 K/sec
21,307,456,166 cycles # 4.074 GHz
12,947,819,999 instructions # 0.61 insn per cycle
2,881,985,678 branches # 551.096 M/sec
64,267,015 branch-misses # 2.23% of all branches
1.702148350 seconds time elapsed
0.004868000 seconds user
0.110786000 seconds sys
instead. Much better.
[ Note! This kernel improvement seems to be very good at triggering a
race condition in the make jobserver (in GNU make 4.2.1) for me. It's
a long known bug that was fixed back in June 2017 by GNU make commit
b552b0525198 ("[SV 51159] Use a non-blocking read with pselect to
avoid hangs.").
But there wasn't a new release of GNU make until 4.3 on Jan 19 2020,
so a number of distributions may still have the buggy version. Some
have backported the fix to their 4.2.1 release, though, and even
without the fix it's quite timing-dependent whether the bug actually
is hit. ]
Josh Triplett says:
"I've been hammering on your pipe fix patch (switching to exclusive
wait queues) for a month or so, on several different systems, and I've
run into no issues with it. The patch *substantially* improves
parallel build times on large (~100 CPU) systems, both with parallel
make and with other things that use make's pipe-based jobserver.
All current distributions (including stable and long-term stable
distributions) have versions of GNU make that no longer have the
jobserver bug"
Tested-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
My final cleanup patch for sys_compat_ioctl() introduced a regression on
the FIONREAD ioctl command, which is used for both regular and special
files, but only works on regular files after my patch, as I had missed
the warning that Al Viro put into a comment right above it.
Change it back so it can work on any file again by moving the implementation
to do_vfs_ioctl() instead.
Fixes: 77b9040195de ("compat_ioctl: simplify the implementation")
Reported-and-tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Reported-and-tested-by: youling257 <youling257@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
The configuration of the OCTEONTX XCV_DLL_CTL register via
xcv_init_hw() is such that the RGMII RX delay is bypassed
leaving the RGMII TX delay enabled in the MAC:
/* Configure DLL - enable or bypass
* TX no bypass, RX bypass
*/
cfg = readq_relaxed(xcv->reg_base + XCV_DLL_CTL);
cfg &= ~0xFF03;
cfg |= CLKRX_BYP;
writeq_relaxed(cfg, xcv->reg_base + XCV_DLL_CTL);
This would coorespond to a interface type of PHY_INTERFACE_MODE_RGMII_RXID
and not PHY_INTERFACE_MODE_RGMII.
Fixing this allows RGMII PHY drivers to do the right thing (enable
RX delay in the PHY) instead of erroneously enabling both delays in the
PHY.
Signed-off-by: Tim Harvey <tharvey@gateworks.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When CONFIG_PROVE_LOCKING is selected together with (now default)
CONFIG_VMAP_STACK, kernel enter deadlock during boot.
At the point of checking whether interrupts are enabled or not, the
value of MSR saved on stack is read using the physical address of the
stack. But at this point, when using VMAP stack the DATA MMU
translation has already been re-enabled, leading to deadlock.
Don't use the physical address of the stack when
CONFIG_VMAP_STACK is set.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Fixes: 028474876f47 ("powerpc/32: prepare for CONFIG_VMAP_STACK")
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/daeacdc0dec0416d1c587cc9f9e7191ad3068dc0.1581095957.git.christophe.leroy@c-s.fr
|
|
The early versions of our kernel user access prevention (KUAP) were
written by Russell and Christophe, and didn't have separate
read/write access.
At some point I picked up the series and added the read/write access,
but I failed to update the usages in futex.h to correctly allow read
and write.
However we didn't notice because of another bug which was causing the
low-level code to always enable read and write. That bug was fixed
recently in commit 1d8f739b07bd ("powerpc/kuap: Fix set direction in
allow/prevent_user_access()").
futex_atomic_cmpxchg_inatomic() is passed the user address as %3 and
does:
1: lwarx %1, 0, %3
cmpw 0, %1, %4
bne- 3f
2: stwcx. %5, 0, %3
Which clearly loads and stores from/to %3. The logic in
arch_futex_atomic_op_inuser() is similar, so fix both of them to use
allow_read_write_user().
Without this fix, and with PPC_KUAP_DEBUG=y, we see eg:
Bug: Read fault blocked by AMR!
WARNING: CPU: 94 PID: 149215 at arch/powerpc/include/asm/book3s/64/kup-radix.h:126 __do_page_fault+0x600/0xf30
CPU: 94 PID: 149215 Comm: futex_requeue_p Tainted: G W 5.5.0-rc7-gcc9x-g4c25df5640ae #1
...
NIP [c000000000070680] __do_page_fault+0x600/0xf30
LR [c00000000007067c] __do_page_fault+0x5fc/0xf30
Call Trace:
[c00020138e5637e0] [c00000000007067c] __do_page_fault+0x5fc/0xf30 (unreliable)
[c00020138e5638c0] [c00000000000ada8] handle_page_fault+0x10/0x30
--- interrupt: 301 at cmpxchg_futex_value_locked+0x68/0xd0
LR = futex_lock_pi_atomic+0xe0/0x1f0
[c00020138e563bc0] [c000000000217b50] futex_lock_pi_atomic+0x80/0x1f0 (unreliable)
[c00020138e563c30] [c00000000021b668] futex_requeue+0x438/0xb60
[c00020138e563d60] [c00000000021c6cc] do_futex+0x1ec/0x2b0
[c00020138e563d90] [c00000000021c8b8] sys_futex+0x128/0x200
[c00020138e563e20] [c00000000000b7ac] system_call+0x5c/0x68
Fixes: de78a9c42a79 ("powerpc: Add a framework for Kernel Userspace Access Protection")
Cc: stable@vger.kernel.org # v5.2+
Reported-by: syzbot+e808452bad7c375cbee6@syzkaller-ppc64.appspotmail.com
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Link: https://lore.kernel.org/r/20200207122145.11928-1-mpe@ellerman.id.au
|
|
V{PEND,PROP}BASER registers are actually located in VLPI_base frame
of the *redistributor*. Rename their accessors to reflect this fact.
No functional changes.
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200206075711.1275-7-yuzenghui@huawei.com
|
|
"ITS virtual pending table not cleaning" is already complained inside
its_clear_vpend_valid(), there's no need to trigger a WARN_ON again.
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200206075711.1275-6-yuzenghui@huawei.com
|
|
The variable 'tmp' in inherit_vpe_l1_table_from_rd() is actually
not needed, drop it.
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200206075711.1275-5-yuzenghui@huawei.com
|
|
In GICv4, we will ensure that level2 vPE table memory is allocated
for the specified vpe_id on all v4 ITS, in its_alloc_vpe_table().
This still works well for the typical GICv4.1 implementation, where
the new vPE table is shared between the ITSs and the RDs.
To make it explicit, let us introduce allocate_vpe_l2_table() to
make sure that the L2 tables are allocated on all v4.1 RDs. We're
likely not need to allocate memory in it because the vPE table is
shared and (L2 table is) already allocated at ITS level, except
for the case where the ITS doesn't share anything (say SVPET == 0,
practically unlikely but architecturally allowed).
The implementation of allocate_vpe_l2_table() is mostly copied from
its_alloc_table_entry().
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200206075711.1275-4-yuzenghui@huawei.com
|
|
Currently, we will not set vpe_l1_page for the current RD if we can
inherit the vPE configuration table from another RD (or ITS), which
results in an inconsistency between RDs within the same CommonLPIAff
group.
Let's rename it to vpe_l1_base to indicate the base address of the
vPE configuration table of this RD, and set it properly for *all*
v4.1 redistributors.
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200206075711.1275-3-yuzenghui@huawei.com
|
|
The Size field of GICv4.1 VPROPBASER register indicates number of
pages minus one and together Page_Size and Size control the vPEID
width. Let's respect this requirement of the architecture.
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200206075711.1275-2-yuzenghui@huawei.com
|
|
Fix u8 cast reading max_nss from MT_TOP_STRAP_STA register in
mt7615_eeprom_parse_hw_cap routine
Fixes: acf5457fd99db ("mt76: mt7615: read {tx,rx} mask from eeprom")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
It was reported that the max_t, ilog2, and roundup_pow_of_two macros have
exponential effects on the number of states in the sparse checker.
This patch breaks them up by calculating the "nbuckets" first so that the
"bucket_log" only needs to take ilog2().
In addition, Linus mentioned:
Patch looks good, but I'd like to point out that it's not just sparse.
You can see it with a simple
make net/core/bpf_sk_storage.i
grep 'smap->bucket_log = ' net/core/bpf_sk_storage.i | wc
and see the end result:
1 365071 2686974
That's one line (the assignment line) that is 2,686,974 characters in
length.
Now, sparse does happen to react particularly badly to that (I didn't
look to why, but I suspect it's just that evaluating all the types
that don't actually ever end up getting used ends up being much more
expensive than it should be), but I bet it's not good for gcc either.
Fixes: 6ac99e8f23d4 ("bpf: Introduce bpf sk local storage")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Link: https://lore.kernel.org/bpf/20200207081810.3918919-1-kafai@fb.com
|