Age | Commit message (Collapse) | Author | Files | Lines |
|
Fix up the documentation of the struct powercap_control_type members
to match the code.
Also fixup stray whitespace.
Signed-off-by: Amit Kucheria <amitk@kernel.org>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Fix kernel-doc warning in <linux/device.h>:
../include/linux/device.h:613: warning: Function parameter or member 'em_pd' not described in 'device'
Fixes: 1bc138c62295 ("PM / EM: add support for other devices than CPUs in Energy Model")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Add intel_rapl support for the AlderLake platform.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Add intel_rapl support for the RocketLake platform.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Add intel_rapl support for the TigerLake desktop platform.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Currently we allocate rx buffers in a single contiguous buffers for
headers (iser and iscsi) and data trailer. This means that most likely the
data starting offset is aligned to 76 bytes (size of both headers).
This worked fine for years, but at some point this broke, resulting in
data corruptions in isert when a command comes with immediate data and the
underlying backend device assumes 512 bytes buffer alignment.
We assume a hard-requirement for all direct I/O buffers to be 512 bytes
aligned. To fix this, we should avoid passing unaligned buffers for I/O.
Instead, we allocate our recv buffers with some extra space such that we
can have the data portion align to 512 byte boundary. This also means that
we cannot reference headers or data using structure but rather
accessors (as they may move based on alignment). Also, get rid of the
wrong __packed annotation from iser_rx_desc as this has only harmful
effects (not aligned to anything).
This affects the rx descriptors for iscsi login and data plane.
Fixes: 3d75ca0adef4 ("block: introduce multi-page bvec helpers")
Link: https://lore.kernel.org/r/20200904195039.31687-1-sagi@grimberg.me
Reported-by: Stephen Rust <srust@blockbridge.com>
Tested-by: Doug Dumitru <doug@dumitru.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The device .release function was not being set during the device
initialization. This was leading to the below warning, in error cases when
put_srv was called before device_add was called.
Warning:
Device '(null)' does not have a release() function, it is broken and must
be fixed. See Documentation/kobject.txt.
So, set the device .release function during device initialization in the
__alloc_srv() function.
Fixes: baa5b28b7a47 ("RDMA/rtrs-srv: Replace device_register with device_initialize and device_add")
Link: https://lore.kernel.org/r/20200907102216.104041-1-haris.iqbal@cloud.ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
drivers/infiniband/hw/bnxt_re/main.c:1012:25:
warning: variable ‘qplib_ctx’ set but not used [-Wunused-but-set-variable]
Fixes: f86b31c6a28f ("RDMA/bnxt_re: Static NQ depth allocation")
Link: https://lore.kernel.org/r/20200905121624.32776-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
If we hit the UINT_MAX limit of bio->bi_iter.bi_size and so we are anyway
not merging this page in this bio, then it make sense to make same_page
also as false before returning.
Without this patch, we hit below WARNING in iomap.
This mostly happens with very large memory system and / or after tweaking
vm dirty threshold params to delay writeback of dirty data.
WARNING: CPU: 18 PID: 5130 at fs/iomap/buffered-io.c:74 iomap_page_release+0x120/0x150
CPU: 18 PID: 5130 Comm: fio Kdump: loaded Tainted: G W 5.8.0-rc3 #6
Call Trace:
__remove_mapping+0x154/0x320 (unreliable)
iomap_releasepage+0x80/0x180
try_to_release_page+0x94/0xe0
invalidate_inode_page+0xc8/0x110
invalidate_mapping_pages+0x1dc/0x540
generic_fadvise+0x3c8/0x450
xfs_file_fadvise+0x2c/0xe0 [xfs]
vfs_fadvise+0x3c/0x60
ksys_fadvise64_64+0x68/0xe0
sys_fadvise64+0x28/0x40
system_call_exception+0xf8/0x1c0
system_call_common+0xf0/0x278
Fixes: cc90bc68422 ("block: fix "check bi_size overflow before merge"")
Reported-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The pm_runtime_get_sync() can return either 0 or 1 on success but this
code treats 1 as a failure.
Fixes: db96bf976a4f ("spi: stm32: fixes suspend/resume management")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Alain Volmat <alain.volmat@st.com>
Link: https://lore.kernel.org/r/20200909094304.GA420136@mwanda
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
In the prepare_message callback the bus driver has the
opportunity to split a transfer into smaller chunks.
spi_map_msg is done after prepare_message.
Function spi_res_release releases the splited transfers
in the message. Therefore spi_res_release should be called
after spi_map_msg.
The previous try at this was commit c9ba7a16d0f1
which released the splited transfers after
spi_finalize_current_message had been called.
This introduced a race since the message struct could be
out of scope because the spi_sync call got completed.
Fixes this leak on spi bus driver spi-bcm2835.c when transfer
size is greater than 65532:
Kmemleak:
sg_alloc_table+0x28/0xc8
spi_map_buf+0xa4/0x300
__spi_pump_messages+0x370/0x748
__spi_sync+0x1d4/0x270
spi_sync+0x34/0x58
spi_test_execute_msg+0x60/0x340 [spi_loopback_test]
spi_test_run_iter+0x548/0x578 [spi_loopback_test]
spi_test_run_test+0x94/0x140 [spi_loopback_test]
spi_test_run_tests+0x150/0x180 [spi_loopback_test]
spi_loopback_test_probe+0x50/0xd0 [spi_loopback_test]
spi_drv_probe+0x84/0xe0
Signed-off-by: Gustav Wiklander <gustavwi@axis.com>
Link: https://lore.kernel.org/r/20200908151129.15915-1-gustav.wiklander@axis.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
If something goes wrong (such as the SCL being stuck low) then we need
to reset the PCA chip. The issue with this is that on reset we lose all
config settings and the chip ends up in a disabled state which results
in a lock up/high CPU usage. We need to re-apply any configuration that
had previously been set and re-enable the chip.
Signed-off-by: Evan Nimmo <evan.nimmo@alliedtelesis.co.nz>
Reviewed-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
Right now we are failing requests based on the controller state (which
is checked inline in nvmf_check_ready) however we should definitely
accept requests if the queue is live.
When entering controller reset, we transition the controller into
NVME_CTRL_RESETTING, and then return BLK_STS_RESOURCE for non-mpath
requests (have blk_noretry_request set).
This is also the case for NVME_REQ_USER for the wrong reason. There
shouldn't be any reason for us to reject this I/O in a controller reset.
We do want to prevent passthru commands on the admin queue because we
need the controller to fully initialize first before we let user passthru
admin commands to be issued.
In a non-mpath setup, this means that the requests will simply be
requeued over and over forever not allowing the q_usage_counter to drop
its final reference, causing controller reset to hang if running
concurrently with heavy I/O.
Fixes: 35897b920c8a ("nvme-fabrics: fix and refine state checks in __nvmf_check_ready")
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Reading past end of file returns EOF for aligned reads but -EINVAL for
unaligned reads on f2fs. While documentation is not strict about this
corner case, most filesystem returns EOF on this case, like iomap
filesystems. This patch consolidates the behavior for f2fs, by making
it return EOF(0).
it can be verified by a read loop on a file that does a partial read
before EOF (A file that doesn't end at an aligned address). The
following code fails on an unaligned file on f2fs, but not on
btrfs, ext4, and xfs.
while (done < total) {
ssize_t delta = pread(fd, buf + done, total - done, off + done);
if (!delta)
break;
...
}
It is arguable whether filesystems should actually return EOF or
-EINVAL, but since iomap filesystems support it, and so does the
original DIO code, it seems reasonable to consolidate on that.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
If the sbi->ckpt->next_free_nid is not NAT block aligned and if there
are free nids in that NAT block between the start of the block and
next_free_nid, then those free nids will not be scanned in scan_nat_page().
This results into mismatch between nm_i->available_nids and the sum of
nm_i->free_nid_count of all NAT blocks scanned. And nm_i->available_nids
will always be greater than the sum of free nids in all the blocks.
Under this condition, if we use all the currently scanned free nids,
then it will loop forever in f2fs_alloc_nid() as nm_i->available_nids
is still not zero but nm_i->free_nid_count of that partially scanned
NAT block is zero.
Fix this to align the nm_i->next_scan_nid to the first nid of the
corresponding NAT block.
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Commit da52f8ade40b ("f2fs: get the right gc victim section when section
has several segments") added code to count blocks of each section using
variables with type 'unsigned short', which has 2 bytes size in many
systems. However, the counts can be larger than the 2 bytes range and
type conversion results in wrong values. Especially when the f2fs
sections have blocks as many as USHRT_MAX + 1, the count is handled as 0.
This triggers eternal loop in init_dirty_segmap() at mount system call.
Fix this by changing the type of the variables to block_t.
Fixes: da52f8ade40b ("f2fs: get the right gc victim section when section has several segments")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Yang Yang reported the following crash caused by requeueing a flush
request in Kyber:
[ 2.517297] Unable to handle kernel paging request at virtual address ffffffd8071c0b00
...
[ 2.517468] pc : clear_bit+0x18/0x2c
[ 2.517502] lr : sbitmap_queue_clear+0x40/0x228
[ 2.517503] sp : ffffff800832bc60 pstate : 00c00145
...
[ 2.517599] Process ksoftirqd/5 (pid: 51, stack limit = 0xffffff8008328000)
[ 2.517602] Call trace:
[ 2.517606] clear_bit+0x18/0x2c
[ 2.517619] kyber_finish_request+0x74/0x80
[ 2.517627] blk_mq_requeue_request+0x3c/0xc0
[ 2.517637] __scsi_queue_insert+0x11c/0x148
[ 2.517640] scsi_softirq_done+0x114/0x130
[ 2.517643] blk_done_softirq+0x7c/0xb0
[ 2.517651] __do_softirq+0x208/0x3bc
[ 2.517657] run_ksoftirqd+0x34/0x60
[ 2.517663] smpboot_thread_fn+0x1c4/0x2c0
[ 2.517667] kthread+0x110/0x120
[ 2.517669] ret_from_fork+0x10/0x18
This happens because Kyber doesn't track flush requests, so
kyber_finish_request() reads a garbage domain token. Only call the
scheduler's requeue_request() hook if RQF_ELVPRIV is set (like we do for
the finish_request() hook in blk_mq_free_request()). Now that we're
handling it in blk-mq, also remove the check from BFQ.
Reported-by: Yang Yang <yang.yang@vivo.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Cancel async event work in case async event has been queued up, and
nvme_tcp_submit_async_event() runs after event has been freed.
Signed-off-by: David Milburn <dmilburn@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Cancel async event work in case async event has been queued up, and
nvme_rdma_submit_async_event() runs after event has been freed.
Signed-off-by: David Milburn <dmilburn@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Cancel async event work in case async event has been queued up, and
nvme_fc_submit_async_event() runs after event has been freed.
Signed-off-by: David Milburn <dmilburn@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The indicated patch introduced a barrier in the sysfs_delete attribute
for the controller that rejects the request if the controller isn't
created. "Created" is defined as at least 1 call to nvme_start_ctrl().
This is problematic in error-injection testing. If an error occurs on
the initial attempt to create an association and the controller enters
reconnect(s) attempts, the admin cannot delete the controller until
either there is a successful association created or ctrl_loss_tmo
times out.
Where this issue is particularly hurtful is when the "admin" is the
nvme-cli, it is performing a connection to a discovery controller, and
it is initiated via auto-connect scripts. With the FC transport, if the
first connection attempt fails, the controller enters a normal reconnect
state but returns control to the cli thread that created the controller.
In this scenario, the cli attempts to read the discovery log via ioctl,
which fails, causing the cli to see it as an empty log and then proceeds
to delete the discovery controller. The delete is rejected and the
controller is left live. If the discovery controller reconnect then
succeeds, there is no action to delete it, and it sits live doing nothing.
Cc: <stable@vger.kernel.org> # v5.7+
Fixes: ce1518139e69 ("nvme: Fix controller creation races with teardown flow")
Signed-off-by: James Smart <james.smart@broadcom.com>
CC: Israel Rukshin <israelr@mellanox.com>
CC: Max Gurtovoy <maxg@mellanox.com>
CC: Christoph Hellwig <hch@lst.de>
CC: Keith Busch <kbusch@kernel.org>
CC: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Buffers need to mapped to DMA channel's device pointer instead of SPI
controller's device pointer as its system DMA that actually does data
transfer.
Data inconsistencies have been reported when reading from flash
without this fix.
Fixes: ffa639e069fb ("mtd: spi-nor: cadence-quadspi: Add DMA support for direct mode reads")
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Tested-by: Jan Kiszka <jan.kiszka@siemens.com>
Link: https://lore.kernel.org/r/20200831130720.4524-1-vigneshr@ti.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
mdadm relies on the fact that deleting an invalid partition returns
-ENXIO or -ENOTTY to detect if a block device is a partition or a
whole device.
Fixes: 08fc1ab6d748 ("block: fix locking in bdev_del_partition")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
In commit 4f0b4352bd26 ("drm/i915: Extract cdclk requirements checking
to separate function") the order of force_min_cdclk_changed check and
intel_modeset_checks(), was reversed. This broke the mechanism to
immediately force a new CDCLK minimum, and lead to driver probe
errors for display audio on GLK platform with 5.9-rc1 kernel. Fix
the issue by moving intel_modeset_checks() call later.
[vsyrjala: It also broke the ability of planes to bump up the cdclk
and thus could lead to underruns when eg. flipping from 32bpp to
64bpp framebuffer. To be clear, we still compute the new cdclk
correctly but fail to actually program it to the hardware due to
intel_set_cdclk_{pre,post}_plane_update() not getting called on
account of state->modeset==false.]
Fixes: 4f0b4352bd26 ("drm/i915: Extract cdclk requirements checking to separate function")
BugLink: https://github.com/thesofproject/linux/issues/2410
Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200901151036.1312357-1-kai.vehmanen@linux.intel.com
(cherry picked from commit cf696856bc54a31f78e6538b84c8f7a006b6108b)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
timeout_usec value calculation was wrong, the calculated value
was in msec instead of usec.
Fixes: 56a1485b102e ("i2c: npcm7xx: Add Nuvoton NPCM I2C controller driver")
Signed-off-by: Tali Perry <tali.perry1@gmail.com>
Reviewed-by: Avi Fishman <avifishman70@gmail.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Alex Qiu <xqiu@google.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
These commits caused a regression on Lenovo t520 sandybridge
machine belonging to reporter. We are reverting them for 5.10
for other reasons, so just do it for 5.9 as well.
This reverts commit 7ac2d2536dfa71c275a74813345779b1e7522c91.
Reported-by: Harald Arnesen <harald@skogtun.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
These commits caused a regression on Lenovo t520 sandybridge
machine belonging to reporter. We are reverting them for 5.10
for other reasons, so just do it for 5.9 as well.
This reverts commit 9e0f9464e2ab36b864359a59b0e9058fdef0ce47.
Reported-by: Harald Arnesen <harald@skogtun.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
These commits caused a regression on Lenovo t520 sandybridge
machine belonging to reporter. We are reverting them for 5.10
for other reasons, so just do it for 5.9 as well.
This reverts commit 763fedd6a216f94c2eb98d2f7ca21be3d3806e69.
Reported-by: Harald Arnesen <harald@skogtun.org>
Signed-off-by: Dave Airlie <airied@redhat.com>
|
|
The 'spi_stm32 44004000.spi: Communication suspended' message means that
when using PIO, the kernel did not read the FIFO fast enough and so the
SPI controller paused the transfer. Currently, this is printed on every
single such event, so if the kernel is busy and the controller is pausing
the transfers often, the kernel will be all the more busy scrolling this
message into the log buffer every few milliseconds. That is not helpful.
Instead, rate-limit the message and print it every once in a while. It is
not possible to use the default dev_warn_ratelimited(), because that is
still too verbose, as it prints 10 lines (DEFAULT_RATELIMIT_BURST) every
5 seconds (DEFAULT_RATELIMIT_INTERVAL). The policy here is to print 1 line
every 50 seconds (DEFAULT_RATELIMIT_INTERVAL * 10), because 1 line is more
than enough and the cycles saved on printing are better left to the CPU to
handle the SPI. However, dev_warn_once() is also not useful, as the user
should be aware that this condition is possibly recurring or ongoing. Thus
the custom rate-limit policy.
Finally, turn the message from dev_warn() to dev_dbg(), since the system
does not suffer any sort of malfunction if this message appears, it is
just slowing down. This further reduces the printing into the log buffer
and frees the CPU to do useful work.
Fixes: dcbe0d84dfa5 ("spi: add driver for STM32 SPI controller")
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Amelie Delaunay <amelie.delaunay@st.com>
Cc: Antonio Borneo <borneo.antonio@gmail.com>
Cc: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20200905151913.117775-1-marex@denx.de
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
It turns out that currently we rely only on sysfs attribute
permissions:
$ ll /sys/bus/rbd/{add*,remove*}
--w------- 1 root root 4096 Sep 3 20:37 /sys/bus/rbd/add
--w------- 1 root root 4096 Sep 3 20:37 /sys/bus/rbd/add_single_major
--w------- 1 root root 4096 Sep 3 20:37 /sys/bus/rbd/remove
--w------- 1 root root 4096 Sep 3 20:38 /sys/bus/rbd/remove_single_major
This means that images can be mapped and unmapped (i.e. block devices
can be created and deleted) by a UID 0 process even after it drops all
privileges or by any process with CAP_DAC_OVERRIDE in its user namespace
as long as UID 0 is mapped into that user namespace.
Be consistent with other virtual block devices (loop, nbd, dm, md, etc)
and require CAP_SYS_ADMIN in the initial user namespace for mapping and
unmapping, and also for dumping the configuration string and refreshing
the image header.
Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
|
SDHCI changed from using a tasklet to finish requests, to using an IRQ
thread i.e. commit c07a48c2651965 ("mmc: sdhci: Remove finish_tasklet").
Because this increased the latency to complete requests, a preparatory
change was made to complete the request from the IRQ handler if
possible i.e. commit 19d2f695f4e827 ("mmc: sdhci: Call mmc_request_done()
from IRQ handler if possible"). That alleviated the situation for MMC
block devices because the MMC block driver makes use of mmc_pre_req()
and mmc_post_req() so that successful requests are completed in the IRQ
handler and any DMA unmapping is handled separately in mmc_post_req().
However SDIO was still affected, and an example has been reported with
up to 20% degradation in performance.
Looking at SDIO I/O helper functions, sdio_io_rw_ext_helper() appeared
to be a possible candidate for making use of asynchronous requests
within its I/O loops, but analysis revealed that these loops almost
never iterate more than once, so the complexity of the change would not
be warrented.
Instead, mmc_pre_req() and mmc_post_req() are added before and after I/O
submission (mmc_wait_for_req) in mmc_io_rw_extended(). This still has
the potential benefit of reducing the duration of interrupt handlers, as
well as addressing the latency issue for SDHCI. It also seems a more
reasonable solution than forcing drivers to do everything in the IRQ
handler.
Reported-by: Dmitry Osipenko <digetx@gmail.com>
Fixes: c07a48c2651965 ("mmc: sdhci: Remove finish_tasklet")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200903082007.18715-1-adrian.hunter@intel.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Commit b214fe592ab7 ("mmc: sdhci-of-esdhc: add erratum eSDHC7 support")
added code to check for a specific compatible string in the device-tree
on every esdhc interrupat. Instead of doing this record the quirk in
struct sdhci_esdhc and lookup the struct in esdhc_irq.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Link: https://lore.kernel.org/r/20200903012029.25673-1-chris.packham@alliedtelesis.co.nz
Fixes: b214fe592ab7 ("mmc: sdhci-of-esdhc: add erratum eSDHC7 support")
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
The commit cd57d07b1e4e ("sh: don't allow non-coherent DMA for NOMMU") made
CONFIG_NO_DMA to be set for some platforms, for good reasons.
Consequentially, CONFIG_HAS_DMA doesn't get set, which makes the DMA
mapping interface to be built as stub functions, but also prevent the
mmc_spi driver from being built as it depends on CONFIG_HAS_DMA.
It turns out that for some odd cases, the driver still relied on the DMA
mapping interface, even if the DMA was not actively being used.
To fixup the behaviour, let's drop the build dependency for CONFIG_HAS_DMA.
Moreover, as to allow the driver to succeed probing, let's move the DMA
initializations behind "#ifdef CONFIG_HAS_DMA".
Fixes: cd57d07b1e4e ("sh: don't allow non-coherent DMA for NOMMU")
Reported-by: Rich Felker <dalias@libc.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Rich Felker <dalias@libc.org>
Link: https://lore.kernel.org/r/20200901150438.228887-1-ulf.hansson@linaro.org
|
|
As the comments in this patch say, if we tune and find all phases are
valid it's _almost_ as bad as no phases being found valid. Probably
all phases are not really reliable but we didn't detect where the
unreliable place is. That means we'll essentially be guessing and
hoping we get a good phase.
This is not just a problem in theory. It was causing real problems on
a real board. On that board, most often phase 10 is found as the only
invalid phase, though sometimes 10 and 11 are invalid and sometimes
just 11. Some percentage of the time, however, all phases are found
to be valid. When this happens, the current logic will decide to use
phase 11. Since phase 11 is sometimes found to be invalid, this is a
bad choice. Sure enough, when phase 11 is picked we often get mmc
errors later in boot.
I have seen cases where all phases were found to be valid 3 times in a
row, so increase the retry count to 10 just to be extra sure.
Fixes: 415b5a75da43 ("mmc: sdhci-msm: Add platform_execute_tuning implementation")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Veerabhadrarao Badiganti <vbadigan@codeaurora.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20200827075809.1.If179abf5ecb67c963494db79c3bc4247d987419b@changeid
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
The commit 61d7437ed1390 ("mmc: sdhci-acpi: Fix HS400 tuning for AMDI0040")
broke resume for eMMC HS400. When the system suspends the eMMC controller
is powered down. So, on resume we need to reinitialize the controller.
Although, amd_sdhci_host was not getting cleared, so the DLL was never
re-enabled on resume. This results in HS400 being non-functional.
To fix the problem, this change clears the tuned_clock flag, clears the
dll_enabled flag and disables the DLL on reset.
Fixes: 61d7437ed1390 ("mmc: sdhci-acpi: Fix HS400 tuning for AMDI0040")
Signed-off-by: Raul E Rangel <rrangel@chromium.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20200831150517.1.I93c78bfc6575771bb653c9d3fca5eb018a08417d@changeid
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
RHBZ: 1871246
If during cifs_lookup()/get_inode_info() we encounter a DFS link
and we use the cifsacl or modefromsid mount options we must suppress
any -EREMOTE errors that triggers or else we will not be able to follow
the DFS link and automount the target.
This fixes an issue with modefromsid/cifsacl where these mountoptions
would break DFS and we would no longer be able to access the share.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
|
|
While looking for ->files in ->defer_list, consider that requests there
may actually be links.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
While trying to cancel requests with ->files, it also should look for
requests in ->defer_list, otherwise it might end up hanging a thread.
Cancel all requests in ->defer_list up to the last request there with
matching ->files, that's needed to follow drain ordering semantics.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Otherwise gcc generates warnings if the expression is complicated.
Fixes: 312a0c170945 ("[PATCH] LOG2: Alter roundup_pow_of_two() so that it can use a ilog2() on a constant")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/0-v1-8a2697e3c003+41165-log_brackets_jgg@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
collapse_file() in khugepaged passes PAGE_SIZE as the number of pages to
be read to page_cache_sync_readahead(). The intent was probably to read
a single page. Fix it to use the number of pages to the end of the
window instead.
Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Yang Shi <shy828301@gmail.com>
Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Eric Biggers <ebiggers@google.com>
Link: https://lkml.kernel.org/r/20200903140844.14194-2-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There is a race between the assignment of `table->data` and write value
to the pointer of `table->data` in the __do_proc_doulongvec_minmax() on
the other thread.
CPU0: CPU1:
proc_sys_write
hugetlb_sysctl_handler proc_sys_call_handler
hugetlb_sysctl_handler_common hugetlb_sysctl_handler
table->data = &tmp; hugetlb_sysctl_handler_common
table->data = &tmp;
proc_doulongvec_minmax
do_proc_doulongvec_minmax sysctl_head_finish
__do_proc_doulongvec_minmax unuse_table
i = table->data;
*i = val; // corrupt CPU1's stack
Fix this by duplicating the `table`, and only update the duplicate of
it. And introduce a helper of proc_hugetlb_doulongvec_minmax() to
simplify the code.
The following oops was seen:
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
Code: Bad RIP value.
...
Call Trace:
? set_max_huge_pages+0x3da/0x4f0
? alloc_pool_huge_page+0x150/0x150
? proc_doulongvec_minmax+0x46/0x60
? hugetlb_sysctl_handler_common+0x1c7/0x200
? nr_hugepages_store+0x20/0x20
? copy_fd_bitmaps+0x170/0x170
? hugetlb_sysctl_handler+0x1e/0x20
? proc_sys_call_handler+0x2f1/0x300
? unregister_sysctl_table+0xb0/0xb0
? __fd_install+0x78/0x100
? proc_sys_write+0x14/0x20
? __vfs_write+0x4d/0x90
? vfs_write+0xef/0x240
? ksys_write+0xc0/0x160
? __ia32_sys_read+0x50/0x50
? __close_fd+0x129/0x150
? __x64_sys_write+0x43/0x50
? do_syscall_64+0x6c/0x200
? entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: e5ff215941d5 ("hugetlb: multiple hstates for multiple page sizes")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/20200828031146.43035-1-songmuchun@bytedance.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Since commit cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic
hugepages using cma"), the gigantic page would be allocated from node
which is not the preferred node, although there are pages available from
that node. The reason is that the nid parameter has been ignored in
alloc_gigantic_page().
Besides, the __GFP_THISNODE also need be checked if user required to
alloc only from the preferred node.
After this patch, the preferred node is tried first before other allowed
nodes, and don't try to allocate from other nodes if __GFP_THISNODE is
specified. If user don't specify the preferred node, the current node
will be used as preferred node, which makes sure consistent behavior of
allocating gigantic and non-gigantic hugetlb page.
Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Link: https://lkml.kernel.org/r/20200902025016.697260-1-lixinhai.lxh@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The code to remove a migration PTE and replace it with a device private
PTE was not copying the soft dirty bit from the migration entry. This
could lead to page contents not being marked dirty when faulting the page
back from device private memory.
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Link: https://lkml.kernel.org/r/20200831212222.22409-3-rcampbell@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Patch series "mm/migrate: preserve soft dirty in remove_migration_pte()".
I happened to notice this from code inspection after seeing Alistair
Popple's patch ("mm/rmap: Fixup copying of soft dirty and uffd ptes").
This patch (of 2):
The check for is_zone_device_page() and is_device_private_page() is
unnecessary since the latter is sufficient to determine if the page is a
device private page. Simplify the code for easier reading.
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Link: https://lkml.kernel.org/r/20200831212222.22409-1-rcampbell@nvidia.com
Link: https://lkml.kernel.org/r/20200831212222.22409-2-rcampbell@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
During memory migration a pte is temporarily replaced with a migration
swap pte. Some pte bits from the existing mapping such as the soft-dirty
and uffd write-protect bits are preserved by copying these to the
temporary migration swap pte.
However these bits are not stored at the same location for swap and
non-swap ptes. Therefore testing these bits requires using the
appropriate helper function for the given pte type.
Unfortunately several code locations were found where the wrong helper
function is being used to test soft_dirty and uffd_wp bits which leads to
them getting incorrectly set or cleared during page-migration.
Fix these by using the correct tests based on pte type.
Fixes: a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in migration")
Fixes: 8c3328f1f36a ("mm/migrate: migrate_vma() unmap page from vma while collecting pages")
Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration")
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Alistair Popple <alistair@popple.id.au>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200825064232.10023-2-alistair@popple.id.au
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit f45ec5ff16a75 ("userfaultfd: wp: support swap and page migration")
introduced support for tracking the uffd wp bit during page migration.
However the non-swap PTE variant was used to set the flag for zone device
private pages which are a type of swap page.
This leads to corruption of the swap offset if the original PTE has the
uffd_wp flag set.
Fixes: f45ec5ff16a75 ("userfaultfd: wp: support swap and page migration")
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Link: https://lkml.kernel.org/r/20200825064232.10023-1-alistair@popple.id.au
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The syzbot reported the below use-after-free:
BUG: KASAN: use-after-free in madvise_willneed mm/madvise.c:293 [inline]
BUG: KASAN: use-after-free in madvise_vma mm/madvise.c:942 [inline]
BUG: KASAN: use-after-free in do_madvise.part.0+0x1c8b/0x1cf0 mm/madvise.c:1145
Read of size 8 at addr ffff8880a6163eb0 by task syz-executor.0/9996
CPU: 0 PID: 9996 Comm: syz-executor.0 Not tainted 5.9.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x18f/0x20d lib/dump_stack.c:118
print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383
__kasan_report mm/kasan/report.c:513 [inline]
kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
madvise_willneed mm/madvise.c:293 [inline]
madvise_vma mm/madvise.c:942 [inline]
do_madvise.part.0+0x1c8b/0x1cf0 mm/madvise.c:1145
do_madvise mm/madvise.c:1169 [inline]
__do_sys_madvise mm/madvise.c:1171 [inline]
__se_sys_madvise mm/madvise.c:1169 [inline]
__x64_sys_madvise+0xd9/0x110 mm/madvise.c:1169
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Allocated by task 9992:
kmem_cache_alloc+0x138/0x3a0 mm/slab.c:3482
vm_area_alloc+0x1c/0x110 kernel/fork.c:347
mmap_region+0x8e5/0x1780 mm/mmap.c:1743
do_mmap+0xcf9/0x11d0 mm/mmap.c:1545
vm_mmap_pgoff+0x195/0x200 mm/util.c:506
ksys_mmap_pgoff+0x43a/0x560 mm/mmap.c:1596
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 9992:
kmem_cache_free.part.0+0x67/0x1f0 mm/slab.c:3693
remove_vma+0x132/0x170 mm/mmap.c:184
remove_vma_list mm/mmap.c:2613 [inline]
__do_munmap+0x743/0x1170 mm/mmap.c:2869
do_munmap mm/mmap.c:2877 [inline]
mmap_region+0x257/0x1780 mm/mmap.c:1716
do_mmap+0xcf9/0x11d0 mm/mmap.c:1545
vm_mmap_pgoff+0x195/0x200 mm/util.c:506
ksys_mmap_pgoff+0x43a/0x560 mm/mmap.c:1596
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
It is because vma is accessed after releasing mmap_lock, but someone
else acquired the mmap_lock and the vma is gone.
Releasing mmap_lock after accessing vma should fix the problem.
Fixes: 692fe62433d4c ("mm: Handle MADV_WILLNEED through vfs_fadvise()")
Reported-by: syzbot+b90df26038d1d5d85c97@syzkaller.appspotmail.com
Signed-off-by: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org> [5.4+]
Link: https://lkml.kernel.org/r/20200816141204.162624-1-shy828301@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The usage of "capture group (...)" in the immediate condition after `&&`
results in `$1` being uninitialized. This issues a warning "Use of
uninitialized value $1 in regexp compilation at ./scripts/checkpatch.pl
line 2638".
I noticed this bug while running checkpatch on the set of commits from
v5.7 to v5.8-rc1 of the kernel on the commits with a diff content in
their commit message.
This bug was introduced in the script by commit e518e9a59ec3
("checkpatch: emit an error when there's a diff in a changelog"). It
has been in the script since then.
The author intended to store the match made by capture group in variable
`$1`. This should have contained the name of the file as `[\w/]+`
matched. However, this couldn't be accomplished due to usage of capture
group and `$1` in the same regular expression.
Fix this by placing the capture group in the condition before `&&`.
Thus, `$1` can be initialized to the text that capture group matches
thereby setting it to the desired and required value.
Fixes: e518e9a59ec3 ("checkpatch: emit an error when there's a diff in a changelog")
Signed-off-by: Mrinal Pandey <mrinalmni@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Joe Perches <joe@perches.com>
Link: https://lkml.kernel.org/r/20200714032352.f476hanaj2dlmiot@mrinalpandey
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer. Adjust the
definition of sysctl_max_threads to match its prototype in
linux/sysctl.h which fixes the following sparse error/warning:
kernel/fork.c:3050:47: warning: incorrect type in argument 3 (different address spaces)
kernel/fork.c:3050:47: expected void *
kernel/fork.c:3050:47: got void [noderef] __user *buffer
kernel/fork.c:3036:5: error: symbol 'sysctl_max_threads' redeclared with different type (incompatible argument 3 (different address spaces)):
kernel/fork.c:3036:5: int extern [addressable] [signed] [toplevel] sysctl_max_threads( ... )
kernel/fork.c: note: in included file (through include/linux/key.h, include/linux/cred.h, include/linux/sched/signal.h, include/linux/sched/cputime.h):
include/linux/sysctl.h:242:5: note: previously declared as:
include/linux/sysctl.h:242:5: int extern [addressable] [signed] [toplevel] sysctl_max_threads( ... )
Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lkml.kernel.org/r/20200825093647.24263-1-tklauser@distanz.ch
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|