Age | Commit message (Collapse) | Author | Files | Lines |
|
In commit 170d13ca3a2f ("x86: re-introduce non-generic memcpy_{to,from}io")
I made our copy from IO space use a separate copy routine rather than
rely on the generic memcpy. I did that because our generic memory copy
isn't actually well-defined when it comes to internal access ordering or
alignment, and will in fact depend on various CPUID flags.
In particular, the default memcpy() for a modern Intel CPU will
generally be just a "rep movsb", which works reasonably well for
medium-sized memory copies of regular RAM, since the CPU will turn it
into fairly optimized microcode.
However, for non-cached memory and IO, "rep movs" ends up being
horrendously slow and will just do the architectural "one byte at a
time" accesses implied by the movsb.
At the other end of the spectrum, if you _don't_ end up using the "rep
movsb" code, you'd likely fall back to the software copy, which does
overlapping accesses for the tail, and may copy things backwards.
Again, for regular memory that's fine, for IO memory not so much.
The thinking was that clearly nobody really cared (because things
worked), but some people had seen horrible performance due to the byte
accesses, so let's just revert back to our long ago version that dod
"rep movsl" for the bulk of the copy, and then fixed up the potentially
last few bytes of the tail with "movsw/b".
Interestingly (and perhaps not entirely surprisingly), while that was
our original memory copy implementation, and had been used before for
IO, in the meantime many new users of memcpy_*io() had come about. And
while the access patterns for the memory copy weren't well-defined (so
arguably _any_ access pattern should work), in practice the "rep movsb"
case had been very common for the last several years.
In particular Jarkko Sakkinen reported that the memcpy_*io() change
resuled in weird errors from his Geminilake NUC TPM module.
And it turns out that the TPM TCG accesses according to spec require
that the accesses be
(a) done strictly sequentially
(b) be naturally aligned
otherwise the TPM chip will abort the PCI transaction.
And, in fact, the tpm_crb.c driver did this:
memcpy_fromio(buf, priv->rsp, 6);
...
memcpy_fromio(&buf[6], &priv->rsp[6], expected - 6);
which really should never have worked in the first place, but back
before commit 170d13ca3a2f it *happened* to work, because the
memcpy_fromio() would be expanded to a regular memcpy, and
(a) gcc would expand the first memcpy in-line, and turn it into a
4-byte and a 2-byte read, and they happened to be in the right
order, and the alignment was right.
(b) gcc would call "memcpy()" for the second one, and the machines that
had this TPM chip also apparently ended up always having ERMS
("Enhanced REP MOVSB/STOSB instructions"), so we'd use the "rep
movbs" for that copy.
In other words, basically by pure luck, the code happened to use the
right access sizes in the (two different!) memcpy() implementations to
make it all work.
But after commit 170d13ca3a2f, both of the memcpy_fromio() calls
resulted in a call to the routine with the consistent memory accesses,
and in both cases it started out transferring with 4-byte accesses.
Which worked for the first copy, but resulted in the second copy doing a
32-bit read at an address that was only 2-byte aligned.
Jarkko is actually fixing the fragile code in the TPM driver, but since
this is an excellent example of why we absolutely must not use a generic
memcpy for IO accesses, _and_ an IO-specific one really should strive to
align the IO accesses, let's do exactly that.
Side note: Jarkko also noted that the driver had been used on ARM
platforms, and had worked. That was because on 32-bit ARM, memcpy_*io()
ends up always doing byte accesses, and on 64-bit ARM it first does byte
accesses to align to 8-byte boundaries, and then does 8-byte accesses
for the bulk.
So ARM actually worked by design, and the x86 case worked by pure luck.
We *might* want to make x86-64 do the 8-byte case too. That should be a
pretty straightforward extension, but let's do one thing at a time. And
generally MMIO accesses aren't really all that performance-critical, as
shown by the fact that for a long time we just did them a byte at a
time, and very few people ever noticed.
Reported-and-tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Jerry Snitselaar <jsnitsel@redhat.com>
Cc: David Laight <David.Laight@aculab.com>
Fixes: 170d13ca3a2f ("x86: re-introduce non-generic memcpy_{to,from}io")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
aa_label_merge() can return NULL for memory allocations failures
make sure to handle and set the correct error in this case.
Reported-by: Peng Hao <peng.hao2@zte.com.cn>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
During resume hibernate restores all physical memory. Any memory
that is accessed with the MMU disabled needs to be cleaned to the
PoC.
KVMs __hyp_text was previously ommitted as it runs with the MMU
enabled, but now that the hyp-stub is located in this section,
we must clean __hyp_text too.
This ensures secondary CPUs that come online after hibernate
has finished resuming, and load KVM via the freshly written
hyp-stub see the correct instructions.
Signed-off-by: James Morse <james.morse@arm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The hyp-stub is loaded by the kernel's early startup code at EL2
during boot, before KVM takes ownership later. The hyp-stub's
text is part of the regular kernel text, meaning it can be kprobed.
A breakpoint in the hyp-stub causes the CPU to spin in el2_sync_invalid.
Add it to the __hyp_text.
Signed-off-by: James Morse <james.morse@arm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
On systems with VHE the kernel and KVM's world-switch code run at the
same exception level. Code that is only used on a VHE system does not
need to be annotated as __hyp_text as it can reside anywhere in the
kernel text.
__hyp_text was also used to prevent kprobes from patching breakpoint
instructions into this region, as this code runs at a different
exception level. While this is no longer true with VHE, KVM still
switches VBAR_EL1, meaning a kprobe's breakpoint executed in the
world-switch code will cause a hyp-panic.
Move the __hyp_text check in the kprobes blacklist so it applies on
VHE systems too, to cover the common code and guest enter/exit
assembly.
Fixes: 888b3c8720e0 ("arm64: Treat all entry code as non-kprobe-able")
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Commit 1598ecda7b23 ("arm64: kaslr: ensure randomized quantities are
clean to the PoC") added cache maintenance to ensure that global
variables set by the kaslr init routine are not wiped clean due to
cache invalidation occurring during the second round of page table
creation.
However, if kaslr_early_init() exits early with no randomization
being applied (either due to the lack of a seed, or because the user
has disabled kaslr explicitly), no cache maintenance is performed,
leading to the same issue we attempted to fix earlier, as far as the
module_alloc_base variable is concerned.
Note that module_alloc_base cannot be initialized statically, because
that would cause it to be subject to a R_AARCH64_RELATIVE relocation,
causing it to be overwritten by the second round of KASLR relocation
processing.
Fixes: f80fb3a3d508 ("arm64: add support for kernel ASLR")
Cc: <stable@vger.kernel.org> # v4.6+
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Commit 3b8c9f1cdfc5 ("arm64: IPI each CPU after invalidating the I-cache
for kernel mappings") was aimed at fixing the I-cache invalidation for
kernel mappings. However, it inadvertently caused all cache maintenance
for user mappings via set_pte_at() -> __sync_icache_dcache() ->
sync_icache_aliases() to call kick_all_cpus_sync().
Reported-by: Shijith Thotton <sthotton@marvell.com>
Tested-by: Shijith Thotton <sthotton@marvell.com>
Reported-by: Wandun Chen <chenwandun@huawei.com>
Fixes: 3b8c9f1cdfc5 ("arm64: IPI each CPU after invalidating the I-cache for kernel mappings")
Cc: <stable@vger.kernel.org> # 4.19.x-
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
when compiled without CONFIG_IPV6:
security/apparmor/lsm.c:1601:21: warning: ‘apparmor_ipv6_postroute’ defined but not used [-Wunused-function]
static unsigned int apparmor_ipv6_postroute(void *priv,
^~~~~~~~~~~~~~~~~~~~~~~
Reported-by: Jordan Glover <Golden_Miller83@protonmail.ch>
Tested-by: Jordan Glover <Golden_Miller83@protonmail.ch>
Signed-off-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
Revert commit 3d71746c42 ("PCI: armada8k: Add support for gpio controlled
reset signal").
That commit breaks boot on Macchiatobin board when a Mellanox NIC is
present in the PCIe slot.
It turns out that full reset cycle requires first comphy serdes
initialization. Reset signal toggle without comphy initialization makes
access to PCI configuration registers stall indefinitely. U-Boot toggles
the Macchiatobin PCIe reset line already at boot, after initializing the
comphy serdes.
So while commit 3d71746c42 ("PCI: armada8k: Add support for gpio controlled
reset signal") enables PCIe on platforms that U-Boot does not touch the
reset line (like Clearfog GT-8K), it breaks PCIe (and boot) on the
Macchiatobin board.
Revert commit 3d71746c42 ("PCI: armada8k: Add support for gpio controlled
reset signal") entirely to fix the Macchiatobin regression.
Reported-by: Sven Auhagen <sven.auhagen@voleatech.de>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
|
|
commit 802b7c06adc7 ("ARM: cns3xxx: Convert PCI to use generic config
accessors") reimplemented cns3xxx_pci_read_config() using
pci_generic_config_read32(), which preserved the property of only doing
32-bit reads.
It also replaced cns3xxx_pci_write_config() with pci_generic_config_write(),
so it changed writes from always being 32 bits to being the actual size,
which works just fine.
Given that:
- The documentation does not mention that only 32 bit access is allowed.
- Writes are already executed using the actual size
- Extensive testing shows that 8b, 16b and 32b reads work as intended
Allow read access of any size by replacing pci_generic_config_read32()
with the pci_generic_config_read() accessors.
Fixes: 802b7c06adc7 ("ARM: cns3xxx: Convert PCI to use generic config accessors")
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Koen Vandeputte <koen.vandeputte@ncentric.com>
[lorenzo.pieralisi@arm.com: updated commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Krzysztof Halasa <khalasa@piap.pl>
Acked-by: Arnd Bergmann <arnd@arndb.de>
CC: Krzysztof Halasa <khalasa@piap.pl>
CC: Olof Johansson <olof@lixom.net>
CC: Robin Leblon <robin.leblon@ncentric.com>
CC: Rob Herring <robh@kernel.org>
CC: Russell King <linux@armlinux.org.uk>
CC: Tim Harvey <tharvey@gateworks.com>
|
|
Originally, cns3xxx used its own functions for mapping, reading and
writing config registers.
Commit 802b7c06adc7 ("ARM: cns3xxx: Convert PCI to use generic config
accessors") removed the internal PCI config write function in favor of
the generic one:
cns3xxx_pci_write_config() --> pci_generic_config_write()
cns3xxx_pci_write_config() expected aligned addresses, being produced by
cns3xxx_pci_map_bus() while the generic one pci_generic_config_write()
actually expects the real address as both the function and hardware are
capable of byte-aligned writes.
This currently leads to pci_generic_config_write() writing to the wrong
registers.
For instance, upon ath9k module loading:
- driver ath9k gets loaded
- The driver wants to write value 0xA8 to register PCI_LATENCY_TIMER,
located at 0x0D
- cns3xxx_pci_map_bus() aligns the address to 0x0C
- pci_generic_config_write() effectively writes 0xA8 into register 0x0C
(CACHE_LINE_SIZE)
Fix the bug by removing the alignment in the cns3xxx mapping function.
Fixes: 802b7c06adc7 ("ARM: cns3xxx: Convert PCI to use generic config accessors")
Signed-off-by: Koen Vandeputte <koen.vandeputte@ncentric.com>
[lorenzo.pieralisi@arm.com: updated commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Krzysztof Halasa <khalasa@piap.pl>
Acked-by: Tim Harvey <tharvey@gateworks.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
CC: stable@vger.kernel.org # v4.0+
CC: Bjorn Helgaas <bhelgaas@google.com>
CC: Olof Johansson <olof@lixom.net>
CC: Robin Leblon <robin.leblon@ncentric.com>
CC: Rob Herring <robh@kernel.org>
CC: Russell King <linux@armlinux.org.uk>
|
|
The check on the device_link_add() return value is wrong;
this leads to erroneous code execution, so fix it.
Fixes: 3f7cceeab895 ("PCI: imx: Add multi-pd support")
Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
[lorenzo.pieralisi@arm.com: updated commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
|
|
On chips without a separate power domain for PCI (such as 6q/6qp) the
imx6_pcie_attach_pd() function incorrectly returns an error.
Fix by returning 0 if dev_pm_domain_attach_by_name() does not find
anything.
Fixes: 3f7cceeab895 ("PCI: imx: Add multi-pd support")
Reported-by: Lukas F.Hartmann <lukas@mntmn.com>
Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
[lorenzo.pieralisi@arm.com: updated commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
|
|
This reverts commit 2d29f6b96d8f80322ed2dd895bca590491c38d34.
It turns out that the fix can lead to a ~20 percent performance regression
in initial writes to the page cache according to iozone. Let's revert this
for now to have more time for a proper fix.
Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
To 2.17
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
The request buffers are freed right before copying the pointers.
Use the func args instead which are identical and still valid.
Simple reproducer (requires KASAN enabled) on a cifs mount:
echo foo > foo ; tail -f foo & rm foo
Cc: <stable@vger.kernel.org> # 4.20
Fixes: 179e44d49c2f ("smb3: add tracepoint for sending lease break responses to server")
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
|
|
The default time is declared in units of microsecnds,
but is used as nanoseconds, resulting in significant
accounting errors for idle state 0 time when all idle
states deeper than 0 are disabled.
Under these unusual conditions, we don't really care
about the poll time limit anyhow.
Fixes: 800fb34a99ce ("cpuidle: poll_state: Disregard disable idle states")
Signed-off-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
A deadlock has been seen when swicthing clocksources which use
PM-runtime. The call path is:
change_clocksource
...
write_seqcount_begin
...
timekeeping_update
...
sh_cmt_clocksource_enable
...
rpm_resume
pm_runtime_mark_last_busy
ktime_get
do
read_seqcount_begin
while read_seqcount_retry
....
write_seqcount_end
Although we should be safe because we haven't yet changed the
clocksource at that time, we can't do that because of seqcount
protection.
Use ktime_get_mono_fast_ns() instead which is lock safe for such
cases.
With ktime_get_mono_fast_ns, the timestamp is not guaranteed to be
monotonic across an update and as a result can goes backward.
According to update_fast_timekeeper() description: "In the worst
case, this can result is a slightly wrong timestamp (a few
nanoseconds)". For PM-runtime autosuspend, this means only that
the suspend decision may be slightly suboptimal.
Fixes: 8234f6734c5d ("PM-runtime: Switch autosuspend over to using hrtimers")
Reported-by: Biju Das <biju.das@bp.renesas.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The current dentry number tracking code doesn't distinguish between
positive & negative dentries. It just reports the total number of
dentries in the LRU lists.
As excessive number of negative dentries can have an impact on system
performance, it will be wise to track the number of positive and
negative dentries separately.
This patch adds tracking for the total number of negative dentries in
the system LRU lists and reports it in the 5th field in the
/proc/sys/fs/dentry-state file. The number, however, does not include
negative dentries that are in flight but not in the LRU yet as well as
those in the shrinker lists which are on the way out anyway.
The number of positive dentries in the LRU lists can be roughly found by
subtracting the number of negative dentries from the unused count.
Matthew Wilcox had confirmed that since the introduction of the
dentry_stat structure in 2.1.60, the dummy array was there, probably for
future extension. They were not replacements of pre-existing fields.
So no sane applications that read the value of /proc/sys/fs/dentry-state
will do dummy thing if the last 2 fields of the sysctl parameter are not
zero. IOW, it will be safe to use one of the dummy array entry for
negative dentry count.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The list_lru structure is essentially just a pointer to a table of
per-node LRU lists. Even if CONFIG_MEMCG_KMEM is defined, the list
field is just used for LRU list registration and shrinker_id is set at
initialization. Those fields won't need to be touched that often.
So there is no point to make the list_lru structures to sit in their own
cachelines.
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The nr_dentry_unused per-cpu counter tracks dentries in both the LRU
lists and the shrink lists where the DCACHE_LRU_LIST bit is set.
The shrink_dcache_sb() function moves dentries from the LRU list to a
shrink list and subtracts the dentry count from nr_dentry_unused. This
is incorrect as the nr_dentry_unused count will also be decremented in
shrink_dentry_list() via d_shrink_del().
To fix this double decrement, the decrement in the shrink_dcache_sb()
function is taken out.
Fixes: 4e717f5c1083 ("list_lru: remove special case function list_lru_dispose_all."
Cc: stable@kernel.org
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When doing reads beyound the end of a file the server returns
error STATUS_END_OF_FILE error which is mapped to -ENODATA.
Currently we report it as a failure which confuses read stats.
Change it to not consider -ENODATA as failure for stat purposes.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
|
|
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
|
|
Currently we log success once we send an async IO request to
the server. Instead we need to analyse a response and then log
success or failure for a particular command. Also fix argument
list for read logging.
Cc: <stable@vger.kernel.org> # 4.18
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Allocation of a page array for non-cached IO was separated from
allocation of rdata and wdata structures and this introduced memory
leaks and a possible null pointer dereference. This patch fixes
these problems.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
minus the various headers and blobs that will be part of the reply.
or else we might trigger a session reconnect.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
|
|
The size of the fixed part of the create response is 88 bytes not 56.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
|
|
Ensure that we return the fatal error value that caused us to exit
nfs_page_async_flush().
Fixes: c373fff7bd25 ("NFSv4: Don't special case "launder"")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.12+
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
The vma->vm_mm can become impossible to get before rdma_umap_close() is
called, in this case we must not try to get an mm that is already
undergoing process exit. In this case there is no need to wait for
anything as the VMA will be destroyed by another thread soon and is
already effectively 'unreachable' by userspace.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
PGD 800000012bc50067 P4D 800000012bc50067 PUD 129db5067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 1 PID: 2050 Comm: bash Tainted: G W OE 4.20.0-rc6+ #3
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
RIP: 0010:__rb_erase_color+0xb9/0x280
Code: 84 17 01 00 00 48 3b 68 10 0f 84 15 01 00 00 48 89
58 08 48 89 de 48 89 ef 4c 89 e3 e8 90 84 22 00 e9 60 ff ff ff 48 8b 5d
10 <f6> 03 01 0f 84 9c 00 00 00 48 8b 43 10 48 85 c0 74 09 f6 00 01 0f
RSP: 0018:ffffbecfc090bab8 EFLAGS: 00010246
RAX: ffff97616346cf30 RBX: 0000000000000000 RCX: 0000000000000101
RDX: 0000000000000000 RSI: ffff97623b6ca828 RDI: ffff97621ef10828
RBP: ffff97621ef10828 R08: ffff97621ef10828 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff97623b6ca838
R13: ffffffffbb3fef50 R14: ffff97623b6ca828 R15: 0000000000000000
FS: 00007f7a5c31d740(0000) GS:ffff97623bb00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000011255a000 CR4: 00000000000006e0
Call Trace:
unlink_file_vma+0x3b/0x50
free_pgtables+0xa1/0x110
exit_mmap+0xca/0x1a0
? mlx5_ib_dealloc_pd+0x28/0x30 [mlx5_ib]
mmput+0x54/0x140
uverbs_user_mmap_disassociate+0xcc/0x160 [ib_uverbs]
uverbs_destroy_ufile_hw+0xf7/0x120 [ib_uverbs]
ib_uverbs_remove_one+0xea/0x240 [ib_uverbs]
ib_unregister_device+0xfb/0x200 [ib_core]
mlx5_ib_remove+0x51/0xe0 [mlx5_ib]
mlx5_remove_device+0xc1/0xd0 [mlx5_core]
mlx5_unregister_device+0x3d/0xb0 [mlx5_core]
remove_one+0x2a/0x90 [mlx5_core]
pci_device_remove+0x3b/0xc0
device_release_driver_internal+0x16d/0x240
unbind_store+0xb2/0x100
kernfs_fop_write+0x102/0x180
__vfs_write+0x36/0x1a0
? __alloc_fd+0xa9/0x170
? set_close_on_exec+0x49/0x70
vfs_write+0xad/0x1a0
ksys_write+0x52/0xc0
do_syscall_64+0x5b/0x180
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Cc: <stable@vger.kernel.org> # 4.19
Fixes: 5f9794dc94f5 ("RDMA/ucontext: Add a core API for mmaping driver IO memory")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Add multiple people as maintainers for XDP, sorted alphabetically.
XDP is also tied to driver level support and code, but we cannot add all
drivers to the list. Instead K: and N: match on 'xdp' in hope to catch some
of those changes in drivers.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Assign a default net namespace to netdevs created by init_dummy_netdev().
Fixes a NULL pointer dereference caused by busy-polling a socket bound to
an iwlwifi wireless device, which bumps the per-net BUSYPOLLRXPACKETS stat
if napi_poll() received packets:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000190
IP: napi_busy_loop+0xd6/0x200
Call Trace:
sock_poll+0x5e/0x80
do_sys_poll+0x324/0x5a0
SyS_poll+0x6c/0xf0
do_syscall_64+0x6b/0x1f0
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Fixes: 7db6b048da3b ("net: Commonize busy polling code to focus on napi_id instead of socket")
Signed-off-by: Josh Elsasser <jelsasser@appneta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The skb should be freed by dev_consume_skb_any() in b44_start_xmit()
when bounce_skb is used. The skb is be replaced by bounce_skb, so the
original skb should be consumed(not drop).
dev_consume_skb_irq() should be called in b44_tx() when skb xmit
done. It makes drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The skb shouled be consumed when xmit done, it makes drop profiles
(dropwatch, perf) more friendly.
dev_kfree_skb_irq()/kfree_skb() shouled be replaced by
dev_consume_skb_any(), it makes code cleaner.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
dev_consume_skb_irq() should be called in cp_tx() when skb xmit
done. It makes drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The interrupt handler contains a workaround for RX hang applicable
to Zynq and AT91RM9200 only. Subsequent versions do not need this
workaround. This workaround unnecessarily resets RX whenever RX used
bit read is observed, which can be often under heavy traffic. There
is no other action performed on RX UBR interrupt. Hence introduce a
CAPS mask; enable this interrupt and workaround only on affected
versions.
Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix hp_pin always no value.
[More notes on the changes:
The hp_pin value that is referred in alc294_hp_init() is always zero
at the moment the function gets called, hence this is actually
useless as in the current code.
And, this kind of init sequence should be called from the codec init
callback, instead of the parser function. So, the first fix in this
patch to move the call call into its own init_hook.
OTOH, this function is needed to be called only once after the boot,
and it'd take too long for invoking at each resume (where the init
callback gets called). So we add a new flag and invoke this only
once as an additional fix.
The one case is still not covered, though: S4 resume. But this
change itself won't lead to any regression in that regard, so we
leave S4 issue as is for now and fix it later. -- tiwai ]
Fixes: bde1a7459623 ("ALSA: hda/realtek - Fixed headphone issue for ALC700")
Signed-off-by: Kailang Yang <kailang@realtek.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Add BACKLIGHT_LCD_SUPPORT for SAMSUNG_Q10 to fix the
warning: unmet direct dependencies detected for BACKLIGHT_CLASS_DEVICE.
SAMSUNG_Q10 selects BACKLIGHT_CLASS_DEVICE but BACKLIGHT_CLASS_DEVICE
depends on BACKLIGHT_LCD_SUPPORT.
Copy BACKLIGHT_LCD_SUPPORT dependency into SAMSUNG_Q10 to fix:
WARNING: unmet direct dependencies detected for BACKLIGHT_CLASS_DEVICE
Depends on [n]: HAS_IOMEM [=y] && BACKLIGHT_LCD_SUPPORT [=n]
Selected by [y]:
- SAMSUNG_Q10 [=y] && X86 [=y] && X86_PLATFORM_DEVICES [=y] && ACPI [=y]
Signed-off-by: Sinan Kaya <okaya@kernel.org>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Add BACKLIGHT_LCD_SUPPORT for ACPI_CMPC to fix the
warning: unmet direct dependencies detected for BACKLIGHT_CLASS_DEVICE.
ACPI_CMPC selects BACKLIGHT_CLASS_DEVICE but BACKLIGHT_CLASS_DEVICE
depends on BACKLIGHT_LCD_SUPPORT.
Copy BACKLIGHT_LCD_SUPPORT dependency into ACPI_CMPC to fix
WARNING: unmet direct dependencies detected for BACKLIGHT_CLASS_DEVICE
Depends on [n]: HAS_IOMEM [=y] && BACKLIGHT_LCD_SUPPORT [=n]
Selected by [y]:
- ACPI_CMPC [=y] && X86 [=y] && X86_PLATFORM_DEVICES [=y] && ACPI [=y] && INPUT [=y] && (RFKILL [=n] || RFKILL [=n]=n)
Signed-off-by: Sinan Kaya <okaya@kernel.org>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
After commit 5d32a66541c4 ("PCI/ACPI: Allow ACPI to be built without
CONFIG_PCI set") dependencies on CONFIG_PCI that previously were
satisfied implicitly through dependencies on CONFIG_ACPI have to be
specified directly.
WARNING: unmet direct dependencies detected for I2C_DESIGNWARE_PLATFORM
Depends on [n]: I2C [=y] && HAS_IOMEM [=y] && (ACPI [=y] && COMMON_CLK [=n] || !ACPI [=y])
Selected by [y]:
- MFD_TPS68470 [=y] && HAS_IOMEM [=y] && ACPI [=y] && I2C [=y]=y
MFD_TPS68470 is an ACPI only device and selects I2C_DESIGNWARE_PLATFORM.
I2C_DESIGNWARE_PLATFORM does not have any configuration today for ACPI
support without CONFIG_PCI set.
For sake of a quick fix this introduces a new mandatory dependency to
the driver which may survive without it. Otherwise we need to revisit
the driver architecture to address this properly.
Fixes: 5d32a66541c46 ("PCI/ACPI: Allow ACPI to be built without CONFIG_PCI set")
Signed-off-by: Sinan Kaya <okaya@kernel.org>
Acked-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
dev_consume_skb_irq() should be called in cpmac_end_xmit() when
xmit done. It makes drop profiles more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
dev_consume_skb_irq() should be called in bmac_txdma_intr() when
xmit done. It makes drop profiles more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
dev_consume_skb_irq() should be called in amd8111e_tx() when xmit
done. It makes drop profiles more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
dev_consume_skb_irq() should be called in ace_tx_int() when xmit
done. It makes drop profiles more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If there are outstanding async tx requests (when crypto returns EINPROGRESS),
there is a potential deadlock: the tx work acquires the lock, while we
cancel_delayed_work_sync() while holding the lock. Drop the lock while waiting
for the work to complete.
Fixes: a42055e8d2c30 ("Add support for async encryption of records...")
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
aead_request_set_crypt takes an iv pointer, and we change the iv
soon after setting it. Some async crypto algorithms don't save the iv,
so we need to save it in the tls_rec for async requests.
Found by hardcoding x64 aesni to use async crypto manager (to test the async
codepath), however I don't think this combination can happen in the wild.
Presumably other hardware offloads will need this fix, but there have been
no user reports.
Fixes: a42055e8d2c30 ("Add support for async encryption of records...")
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After batched used ring updating was introduced in commit e2b3b35eb989
("vhost_net: batch used ring update in rx"). We tend to batch heads in
vq->heads for more than one packet. But the quota passed to
get_rx_bufs() was not correctly limited, which can result a OOB write
in vq->heads.
headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
vhost_len, &in, vq_log, &log,
likely(mergeable) ? UIO_MAXIOV : 1);
UIO_MAXIOV was still used which is wrong since we could have batched
used in vq->heads, this will cause OOB if the next buffer needs more
than 960 (1024 (UIO_MAXIOV) - 64 (VHOST_NET_BATCH)) heads after we've
batched 64 (VHOST_NET_BATCH) heads:
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
=============================================================================
BUG kmalloc-8k (Tainted: G B ): Redzone overwritten
-----------------------------------------------------------------------------
INFO: 0x00000000fd93b7a2-0x00000000f0713384. First byte 0xa9 instead of 0xcc
INFO: Allocated in alloc_pd+0x22/0x60 age=3933677 cpu=2 pid=2674
kmem_cache_alloc_trace+0xbb/0x140
alloc_pd+0x22/0x60
gen8_ppgtt_create+0x11d/0x5f0
i915_ppgtt_create+0x16/0x80
i915_gem_create_context+0x248/0x390
i915_gem_context_create_ioctl+0x4b/0xe0
drm_ioctl_kernel+0xa5/0xf0
drm_ioctl+0x2ed/0x3a0
do_vfs_ioctl+0x9f/0x620
ksys_ioctl+0x6b/0x80
__x64_sys_ioctl+0x11/0x20
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
INFO: Slab 0x00000000d13e87af objects=3 used=3 fp=0x (null) flags=0x200000000010201
INFO: Object 0x0000000003278802 @offset=17064 fp=0x00000000e2e6652b
Fixing this by allocating UIO_MAXIOV + VHOST_NET_BATCH iovs for
vhost-net. This is done through set the limitation through
vhost_dev_init(), then set_owner can allocate the number of iov in a
per device manner.
This fixes CVE-2018-16880.
Fixes: e2b3b35eb989 ("vhost_net: batch used ring update in rx")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
KASAN reported following bug in qed_init_qm_get_idx_from_flags
due to inappropriate casting of "pq_flags". Fix the type of "pq_flags".
[ 196.624707] BUG: KASAN: stack-out-of-bounds in qed_init_qm_get_idx_from_flags+0x1a4/0x1b8 [qed]
[ 196.624712] Read of size 8 at addr ffff809b00bc7360 by task kworker/0:9/1712
[ 196.624714]
[ 196.624720] CPU: 0 PID: 1712 Comm: kworker/0:9 Not tainted 4.18.0-60.el8.aarch64+debug #1
[ 196.624723] Hardware name: To be filled by O.E.M. Saber/Saber, BIOS 0ACKL024 09/26/2018
[ 196.624733] Workqueue: events work_for_cpu_fn
[ 196.624738] Call trace:
[ 196.624742] dump_backtrace+0x0/0x2f8
[ 196.624745] show_stack+0x24/0x30
[ 196.624749] dump_stack+0xe0/0x11c
[ 196.624755] print_address_description+0x68/0x260
[ 196.624759] kasan_report+0x178/0x340
[ 196.624762] __asan_report_load_n_noabort+0x38/0x48
[ 196.624786] qed_init_qm_get_idx_from_flags+0x1a4/0x1b8 [qed]
[ 196.624808] qed_init_qm_info+0xec0/0x2200 [qed]
[ 196.624830] qed_resc_alloc+0x284/0x7e8 [qed]
[ 196.624853] qed_slowpath_start+0x6cc/0x1ae8 [qed]
[ 196.624864] __qede_probe.isra.10+0x1cc/0x12c0 [qede]
[ 196.624874] qede_probe+0x78/0xf0 [qede]
[ 196.624879] local_pci_probe+0xc4/0x180
[ 196.624882] work_for_cpu_fn+0x54/0x98
[ 196.624885] process_one_work+0x758/0x1900
[ 196.624888] worker_thread+0x4e0/0xd18
[ 196.624892] kthread+0x2c8/0x350
[ 196.624897] ret_from_fork+0x10/0x18
[ 196.624899]
[ 196.624902] Allocated by task 2:
[ 196.624906] kasan_kmalloc.part.1+0x40/0x108
[ 196.624909] kasan_kmalloc+0xb4/0xc8
[ 196.624913] kasan_slab_alloc+0x14/0x20
[ 196.624916] kmem_cache_alloc_node+0x1dc/0x480
[ 196.624921] copy_process.isra.1.part.2+0x1d8/0x4a98
[ 196.624924] _do_fork+0x150/0xfa0
[ 196.624926] kernel_thread+0x48/0x58
[ 196.624930] kthreadd+0x3a4/0x5a0
[ 196.624932] ret_from_fork+0x10/0x18
[ 196.624934]
[ 196.624937] Freed by task 0:
[ 196.624938] (stack is not available)
[ 196.624940]
[ 196.624943] The buggy address belongs to the object at ffff809b00bc0000
[ 196.624943] which belongs to the cache thread_stack of size 32768
[ 196.624946] The buggy address is located 29536 bytes inside of
[ 196.624946] 32768-byte region [ffff809b00bc0000, ffff809b00bc8000)
[ 196.624948] The buggy address belongs to the page:
[ 196.624952] page:ffff7fe026c02e00 count:1 mapcount:0 mapping:ffff809b4001c000 index:0x0 compound_mapcount: 0
[ 196.624960] flags: 0xfffff8000008100(slab|head)
[ 196.624967] raw: 0fffff8000008100 dead000000000100 dead000000000200 ffff809b4001c000
[ 196.624970] raw: 0000000000000000 0000000000080008 00000001ffffffff 0000000000000000
[ 196.624973] page dumped because: kasan: bad access detected
[ 196.624974]
[ 196.624976] Memory state around the buggy address:
[ 196.624980] ffff809b00bc7200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 196.624983] ffff809b00bc7280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 196.624985] >ffff809b00bc7300: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 04 f2 f2 f2
[ 196.624988] ^
[ 196.624990] ffff809b00bc7380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 196.624993] ffff809b00bc7400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 196.624995] ==================================================================
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Cache number of fragments in the skb locally as in case
of linear skb (with zero fragments), tx completion
(or freeing of skb) may happen before driver tries
to get number of frgaments from the skb which could
lead to stale access to an already freed skb.
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
VFs may hit VF-PF channel timeout while probing, as in some
cases it was observed that VF FLR and VF "acquire" message
transaction (i.e first message from VF to PF in VF's probe flow)
could occur simultaneously which could lead VF to fail sending
"acquire" message to PF as VF is marked disabled from HW perspective
due to FLR, which will result into channel timeout and VF probe failure.
In such cases, try retrying VF "acquire" message so that in later
attempts it could be successful to pass message to PF after the VF
FLR is completed and can be probed successfully.
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
VF is always configured to drop control frames
(with reserved mac addresses) but to work LACP
on the VFs, it would require LACP control frames
to be forwarded or transmitted successfully.
This patch fixes this in such a way that trusted VFs
(marked through ndo_set_vf_trust) would be allowed to
pass the control frames such as LACP pdus.
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|