aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/scripts/python/exported-sql-viewer.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2021-09-24block: hold ->invalidate_lock in blkdev_fallocateMing Lei1-11/+10
When running ->fallocate(), blkdev_fallocate() should hold mapping->invalidate_lock to prevent page cache from being accessed, otherwise stale data may be read in page cache. Without this patch, blktests block/009 fails sometimes. With this patch, block/009 can pass always. Also as Jan pointed out, no pages can be created in the discarded area while you are holding the invalidate_lock, so remove the 2nd truncate_bdev_range(). Cc: Jan Kara <jack@suse.cz> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210923023751.1441091-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-24blktrace: Fix uaf in blk_trace access after removing by sysfsZhihao Cheng1-0/+8
There is an use-after-free problem triggered by following process: P1(sda) P2(sdb) echo 0 > /sys/block/sdb/trace/enable blk_trace_remove_queue synchronize_rcu blk_trace_free relay_close rcu_read_lock __blk_add_trace trace_note_tsk (Iterate running_trace_list) relay_close_buf relay_destroy_buf kfree(buf) trace_note(sdb's bt) relay_reserve buf->offset <- nullptr deference (use-after-free) !!! rcu_read_unlock [ 502.714379] BUG: kernel NULL pointer dereference, address: 0000000000000010 [ 502.715260] #PF: supervisor read access in kernel mode [ 502.715903] #PF: error_code(0x0000) - not-present page [ 502.716546] PGD 103984067 P4D 103984067 PUD 17592b067 PMD 0 [ 502.717252] Oops: 0000 [#1] SMP [ 502.720308] RIP: 0010:trace_note.isra.0+0x86/0x360 [ 502.732872] Call Trace: [ 502.733193] __blk_add_trace.cold+0x137/0x1a3 [ 502.733734] blk_add_trace_rq+0x7b/0xd0 [ 502.734207] blk_add_trace_rq_issue+0x54/0xa0 [ 502.734755] blk_mq_start_request+0xde/0x1b0 [ 502.735287] scsi_queue_rq+0x528/0x1140 ... [ 502.742704] sg_new_write.isra.0+0x16e/0x3e0 [ 502.747501] sg_ioctl+0x466/0x1100 Reproduce method: ioctl(/dev/sda, BLKTRACESETUP, blk_user_trace_setup[buf_size=127]) ioctl(/dev/sda, BLKTRACESTART) ioctl(/dev/sdb, BLKTRACESETUP, blk_user_trace_setup[buf_size=127]) ioctl(/dev/sdb, BLKTRACESTART) echo 0 > /sys/block/sdb/trace/enable & // Add delay(mdelay/msleep) before kernel enters blk_trace_free() ioctl$SG_IO(/dev/sda, SG_IO, ...) // Enters trace_note_tsk() after blk_trace_free() returned // Use mdelay in rcu region rather than msleep(which may schedule out) Remove blk_trace from running_list before calling blk_trace_free() by sysfs if blk_trace is at Blktrace_running state. Fixes: c71a896154119f ("blktrace: add ftrace plugin") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Link: https://lore.kernel.org/r/20210923134921.109194-1-chengzhihao1@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-24block: don't call rq_qos_ops->done_bio if the bio isn't trackedMing Lei1-1/+1
rq_qos framework is only applied on request based driver, so: 1) rq_qos_done_bio() needn't to be called for bio based driver 2) rq_qos_done_bio() needn't to be called for bio which isn't tracked, such as bios ended from error handling code. Especially in bio_endio(): 1) request queue is referred via bio->bi_bdev->bd_disk->queue, which may be gone since request queue refcount may not be held in above two cases 2) q->rq_qos may be freed in blk_cleanup_queue() when calling into __rq_qos_done_bio() Fix the potential kernel panic by not calling rq_qos_ops->done_bio if the bio isn't tracked. This way is safe because both ioc_rqos_done_bio() and blkcg_iolatency_done_bio() are nop if the bio isn't tracked. Reported-by: Yu Kuai <yukuai3@huawei.com> Cc: tj@kernel.org Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20210924110704.1541818-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-24io_uring: kill extra checks in io_write()Pavel Begunkov1-3/+0
We don't retry short writes and so we would never get to async setup in io_write() in that case. Thus ret2 > 0 is always false and iov_iter_advance() is never used. Apparently, the same is found by Coverity, which complains on the code. Fixes: cd65869512ab ("io_uring: use iov_iter state save/restore helpers") Reported-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/5b33e61034748ef1022766efc0fb8854cfcf749c.1632500058.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-24io_uring: don't punt files update to io-wq unconditionallyJens Axboe1-5/+2
There's no reason to punt it unconditionally, we just need to ensure that the submit lock grabbing is conditional. Fixes: 05f3fb3c5397 ("io_uring: avoid ring quiesce for fixed file set unregister and update") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-24io_uring: put provided buffer meta data under memcg accountingJens Axboe1-1/+1
For each provided buffer, we allocate a struct io_buffer to hold the data associated with it. As a large number of buffers can be provided, account that data with memcg. Fixes: ddf0322db79c ("io_uring: add IORING_OP_PROVIDE_BUFFERS") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-24io_uring: allow conditional reschedule for intensive iteratorsJens Axboe1-2/+6
If we have a lot of threads and rings, the tctx list can get quite big. This is especially true if we keep creating new threads and rings. Likewise for the provided buffers list. Be nice and insert a conditional reschedule point while iterating the nodes for deletion. Link: https://lore.kernel.org/io-uring/00000000000064b6b405ccb41113@google.com/ Reported-by: syzbot+111d2a03f51f5ae73775@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-24io_uring: fix potential req refcount underflowHao Xu1-2/+7
For multishot mode, there may be cases like: iowq original context io_poll_add _arm_poll() mask = vfs_poll() is not 0 if mask (2) io_poll_complete() compl_unlock (interruption happens tw queued to original context) io_poll_task_func() compl_lock (3) done = io_poll_complete() is true compl_unlock put req ref (1) if (poll->flags & EPOLLONESHOT) put req ref EPOLLONESHOT flag in (1) may be from (2) or (3), so there are multiple combinations that can cause ref underfow. Let's address it by: - check the return value in (2) as done - change (1) to if (done) in this way, we only do ref put in (1) if 'oneshot flag' is from (2) - do poll.done check in io_poll_task_func(), so that we won't put ref for the second time. Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Link: https://lore.kernel.org/r/20210922101238.7177-4-haoxu@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-24io_uring: fix missing set of EPOLLONESHOT for CQ ring overflowHao Xu1-1/+3
We should set EPOLLONESHOT if cqring_fill_event() returns false since io_poll_add() decides to put req or not by it. Fixes: 5082620fb2ca ("io_uring: terminate multishot poll for CQ ring overflow") Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Link: https://lore.kernel.org/r/20210922101238.7177-3-haoxu@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-24io_uring: fix race between poll completion and cancel_hash insertionHao Xu1-3/+3
If poll arming and poll completion runs in parallel, there maybe races. For instance, run io_poll_add in iowq and io_poll_task_func in original context, then: iowq original context io_poll_add vfs_poll (interruption happens tw queued to original context) io_poll_task_func generate cqe del from cancel_hash[] if !poll.done insert to cancel_hash[] The entry left in cancel_hash[], similar case for fast poll. Fix it by set poll.done = true when del from cancel_hash[]. Fixes: 5082620fb2ca ("io_uring: terminate multishot poll for CQ ring overflow") Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Link: https://lore.kernel.org/r/20210922101238.7177-2-haoxu@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-24io-wq: ensure we exit if thread group is exitingJens Axboe1-1/+2
Dave reports that a coredumping workload gets stuck in 5.15-rc2, and identified the culprit in the Fixes line below. The problem is that relying solely on fatal_signal_pending() to gate whether to exit or not fails miserably if a process gets eg SIGILL sent. Don't exclusively rely on fatal signals, also check if the thread group is exiting. Fixes: 15e20db2e0ce ("io-wq: only exit on fatal signals") Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-24USB: serial: option: add device id for Foxconn T99W265Slark Xiao1-0/+2
Adding support for Foxconn device T99W265 for enumeration with PID 0xe0db. usb-devices output for 0xe0db T: Bus=04 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 19 Spd=5000 MxCh= 0 D: Ver= 3.20 Cls=ef(misc ) Sub=02 Prot=01 MxPS= 9 #Cfgs= 1 P: Vendor=0489 ProdID=e0db Rev=05.04 S: Manufacturer=Microsoft S: Product=Generic Mobile Broadband Adapter S: SerialNumber=6c50f452 C: #Ifs= 5 Cfg#= 1 Atr=a0 MxPwr=896mA I: If#=0x0 Alt= 0 #EPs= 1 Cls=02(commc) Sub=0e Prot=00 Driver=cdc_mbim I: If#=0x1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim I: If#=0x2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option I: If#=0x3 Alt= 0 #EPs= 1 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none) I: If#=0x4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option if0/1: MBIM, if2:Diag, if3:GNSS, if4: Modem Signed-off-by: Slark Xiao <slark_xiao@163.com> Link: https://lore.kernel.org/r/20210917110106.9852-1-slark_xiao@163.com [ johan: use USB_DEVICE_INTERFACE_CLASS(), amend comment ] Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold <johan@kernel.org>
2021-09-24USB: serial: cp210x: add ID for GW Instek GDM-834x Digital MultimeterUwe Brandt1-0/+1
Add the USB serial device ID for the GW Instek GDM-834x Digital Multimeter. Signed-off-by: Uwe Brandt <uwe.brandt@gmail.com> Link: https://lore.kernel.org/r/YUxFl3YUCPGJZd8Y@hovoldconsulting.com Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold <johan@kernel.org>
2021-09-23cifs: fix incorrect check for null pointer in header_assembleSteve French1-2/+3
Although very unlikely that the tlink pointer would be null in this case, get_next_mid function can in theory return null (but not an error) so need to check for null (not for IS_ERR, which can not be returned here). Address warning: fs/smbfs_client/connect.c:2392 cifs_match_super() warn: 'tlink' isn't an ERR_PTR Pointed out by Dan Carpenter via smatch code analysis tool CC: stable@vger.kernel.org Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-23smb3: correct server pointer dereferencing check to be more consistentSteve French1-1/+2
Address warning: fs/smbfs_client/misc.c:273 header_assemble() warn: variable dereferenced before check 'treeCon->ses->server' Pointed out by Dan Carpenter via smatch code analysis tool Although the check is likely unneeded, adding it makes the code more consistent and easier to read, as the same check is done elsewhere in the function. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-23smb3: correct smb3 ACL security descriptorSteve French1-2/+2
Address warning: fs/smbfs_client/smb2pdu.c:2425 create_sd_buf() warn: struct type mismatch 'smb3_acl vs cifs_acl' Pointed out by Dan Carpenter via smatch code analysis tool Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-23cifs: Clear modified attribute bit from inode flagsSteve French1-1/+1
Clear CIFS_INO_MODIFIED_ATTR bit from inode flags after updating mtime and ctime Signed-off-by: Rohith Surabattula <rohiths@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Cc: stable@vger.kernel.org # 5.13+ Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-23drm/amdkfd: fix svm_migrate_fini warningPhilip Yang3-15/+4
Device manager releases device-specific resources when a driver disconnects from a device, devm_memunmap_pages and devm_release_mem_region calls in svm_migrate_fini are redundant. It causes below warning trace after patch "drm/amdgpu: Split amdgpu_device_fini into early and late", so remove function svm_migrate_fini. BUG: https://gitlab.freedesktop.org/drm/amd/-/issues/1718 WARNING: CPU: 1 PID: 3646 at drivers/base/devres.c:795 devm_release_action+0x51/0x60 Call Trace: ? memunmap_pages+0x360/0x360 svm_migrate_fini+0x2d/0x60 [amdgpu] kgd2kfd_device_exit+0x23/0xa0 [amdgpu] amdgpu_amdkfd_device_fini_sw+0x1d/0x30 [amdgpu] amdgpu_device_fini_sw+0x45/0x290 [amdgpu] amdgpu_driver_release_kms+0x12/0x30 [amdgpu] drm_dev_release+0x20/0x40 [drm] release_nodes+0x196/0x1e0 device_release_driver_internal+0x104/0x1d0 driver_detach+0x47/0x90 bus_remove_driver+0x7a/0xd0 pci_unregister_driver+0x3d/0x90 amdgpu_exit+0x11/0x20 [amdgpu] Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-09-23drm/amdkfd: handle svm migrate init errorPhilip Yang1-0/+3
If svm migration init failed to create pgmap for device memory, set pgmap type to 0 to disable device SVM support capability. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-09-23drm/amd/pm: Update intermediate power state for SILijo Lazar1-0/+2
Update the current state as boot state during dpm initialization. During the subsequent initialization, set_power_state gets called to transition to the final power state. set_power_state refers to values from the current state and without current state populated, it could result in NULL pointer dereference. For ex: on platforms where PCI speed change is supported through ACPI ATCS method, the link speed of current state needs to be queried before deciding on changing to final power state's link speed. The logic to query ATCS-support was broken on certain platforms. The issue became visible when broken ATCS-support logic got fixed with commit f9b7f3703ff9 ("drm/amdgpu/acpi: make ATPX/ATCS structures global (v2)"). Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1698 Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-09-23drm/amdkfd: fix dma mapping leaking warningPhilip Yang1-4/+14
For xnack off, restore work dma unmap previous system memory page, and dma map the updated system memory page to update GPU mapping, this is not dma mapping leaking, remove the WARN_ONCE for dma mapping leaking. prange->dma_addr store the VRAM page pfn after the range migrated to VRAM, should not dma unmap VRAM page when updating GPU mapping or remove prange. Add helper svm_is_valid_dma_mapping_addr to check VRAM page and error cases. Mask out SVM_RANGE_VRAM_DOMAIN flag in dma_addr before calling amdgpu vm update to avoid BUG_ON(*addr & 0xFFFF00000000003FULL), and set it again immediately after. This flag is used to know the type of page later to dma unmapping system memory page. Fixes: 1d5dbfe6c06a ("drm/amdkfd: classify and map mixed svm range pages in GPU") Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-09-23drm/amdkfd: SVM map to gpus check vma boundaryPhilip Yang1-1/+5
SVM range may includes multiple VMAs with different vm_flags, if prange page index is the last page of the VMA offset + npages, update GPU mapping to create GPU page table with same VMA access permission. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-09-23MAINTAINERS: fix up entry for AMD PowerplayAlex Deucher1-2/+2
Fix the path to cover both the older powerplay infrastructure and the newer SwSMU infrastructure. Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-09-23drm/amd/display: fix empty debug macrosArnd Bergmann1-2/+2
Using an empty macro expansion as a conditional expression produces a W=1 warning: drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_aux.c: In function 'dce_aux_transfer_with_retries': drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_aux.c:775:156: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body] 775 | "dce_aux_transfer_with_retries: AUX_RET_SUCCESS: AUX_TRANSACTION_REPLY_I2C_OVER_AUX_DEFER"); | ^ drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_aux.c:783:155: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body] 783 | "dce_aux_transfer_with_retries: AUX_RET_SUCCESS: AUX_TRANSACTION_REPLY_I2C_OVER_AUX_NACK"); | ^ Expand it to "do { } while (0)" instead to make the expression more robust and avoid the warning. Fixes: 56aca2309301 ("drm/amd/display: Add AUX I2C tracing.") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-09-23cifs: Deal with some warnings from W=1David Howells1-2/+12
Deal with some warnings generated from make W=1: (1) Add/remove/fix kerneldoc parameters descriptions. (2) Turn cifs' rqst_page_get_length()'s banner comment into a kerneldoc comment. It should probably be prefixed with "cifs_" though. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-09-23Revert "ACPI: Add memory semantics to acpi_os_map_memory()"Jia He4-43/+10
This reverts commit 437b38c51162f8b87beb28a833c4d5dc85fa864e. The memory semantics added in commit 437b38c51162 causes SystemMemory Operation region, whose address range is not described in the EFI memory map to be mapped as NormalNC memory on arm64 platforms (through acpi_os_map_memory() in acpi_ex_system_memory_space_handler()). This triggers the following abort on an ARM64 Ampere eMAG machine, because presumably the physical address range area backing the Opregion does not support NormalNC memory attributes driven on the bus. Internal error: synchronous external abort: 96000410 [#1] SMP Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0+ #462 Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 0.14 02/22/2019 pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [...snip...] Call trace: acpi_ex_system_memory_space_handler+0x26c/0x2c8 acpi_ev_address_space_dispatch+0x228/0x2c4 acpi_ex_access_region+0x114/0x268 acpi_ex_field_datum_io+0x128/0x1b8 acpi_ex_extract_from_field+0x14c/0x2ac acpi_ex_read_data_from_field+0x190/0x1b8 acpi_ex_resolve_node_to_value+0x1ec/0x288 acpi_ex_resolve_to_value+0x250/0x274 acpi_ds_evaluate_name_path+0xac/0x124 acpi_ds_exec_end_op+0x90/0x410 acpi_ps_parse_loop+0x4ac/0x5d8 acpi_ps_parse_aml+0xe0/0x2c8 acpi_ps_execute_method+0x19c/0x1ac acpi_ns_evaluate+0x1f8/0x26c acpi_ns_init_one_device+0x104/0x140 acpi_ns_walk_namespace+0x158/0x1d0 acpi_ns_initialize_devices+0x194/0x218 acpi_initialize_objects+0x48/0x50 acpi_init+0xe0/0x498 If the Opregion address range is not present in the EFI memory map there is no way for us to determine the memory attributes to use to map it - defaulting to NormalNC does not work (and it is not correct on a memory region that may have read side-effects) and therefore commit 437b38c51162 should be reverted, which means reverting back to the original behavior whereby address ranges that are mapped using acpi_os_map_memory() default to the safe devicenGnRnE attributes on ARM64 if the mapped address range is not defined in the EFI memory map. Fixes: 437b38c51162 ("ACPI: Add memory semantics to acpi_os_map_memory()") Signed-off-by: Jia He <justin.he@arm.com> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-09-23memcg: flush lruvec stats in the refaultShakeel Butt2-10/+1
Prior to the commit 7e1c0d6f5820 ("memcg: switch lruvec stats to rstat") and the commit aa48e47e3906 ("memcg: infrastructure to flush memcg stats"), each lruvec memcg stats can be off by (nr_cgroups * nr_cpus * 32) at worst and for unbounded amount of time. The commit aa48e47e3906 moved the lruvec stats to rstat infrastructure and the commit 7e1c0d6f5820 bounded the error for all the lruvec stats to (nr_cpus * 32) at worst for at most 2 seconds. More specifically it decoupled the number of stats and the number of cgroups from the error rate. However this reduction in error comes with the cost of triggering the slowpath of stats update more frequently. Previously in the slowpath the kernel adds the stats up the memcg tree. After aa48e47e3906, the kernel triggers the asyn lruvec stats flush through queue_work(). This causes regression reports from 0day kernel bot [1] as well as from phoronix test suite [2]. We tried two options to fix the regression: 1) Increase the threshold to trigger the slowpath in lruvec stats update codepath from 32 to 512. 2) Remove the slowpath from lruvec stats update codepath and instead flush the stats in the page refault codepath. The assumption is that the kernel timely flush the stats, so, the update tree would be small in the refault codepath to not cause the preformance impact. Following are the results of will-it-scale/page_fault[1|2|3] benchmark on four settings i.e. (1) 5.15-rc1 as baseline (2) 5.15-rc1 with aa48e47e3906 and 7e1c0d6f5820 reverted (3) 5.15-rc1 with option-1 (4) 5.15-rc1 with option-2. test (1) (2) (3) (4) pg_f1 368563 406277 (10.23%) 399693 (8.44%) 416398 (12.97%) pg_f2 338399 372133 (9.96%) 369180 (9.09%) 381024 (12.59%) pg_f3 500853 575399 (14.88%) 570388 (13.88%) 576083 (15.02%) From the above result, it seems like the option-2 not only solves the regression but also improves the performance for at least these benchmarks. Feng Tang (intel) ran the aim7 benchmark with these two options and confirms that option-1 reduces the regression but option-2 removes the regression. Michael Larabel (phoronix) ran multiple benchmarks with these options and reported the results at [3] and it shows for most benchmarks option-2 removes the regression introduced by the commit aa48e47e3906 ("memcg: infrastructure to flush memcg stats"). Based on the experiment results, this patch proposed the option-2 as the solution to resolve the regression. Link: https://lore.kernel.org/all/20210726022421.GB21872@xsang-OptiPlex-9020 [1] Link: https://www.phoronix.com/scan.php?page=article&item=linux515-compile-regress [2] Link: https://openbenchmarking.org/result/2109226-DEBU-LINUX5104 [3] Fixes: aa48e47e3906 ("memcg: infrastructure to flush memcg stats") Signed-off-by: Shakeel Butt <shakeelb@google.com> Tested-by: Michael Larabel <Michael@phoronix.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Roman Gushchin <guro@fb.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Hillf Danton <hdanton@sina.com>, Cc: Michal Koutný <mkoutny@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org>, Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-23selinux,smack: fix subjective/objective credential use mixupsPaul Moore2-4/+4
Jann Horn reported a problem with commit eb1231f73c4d ("selinux: clarify task subjective and objective credentials") where some LSM hooks were attempting to access the subjective credentials of a task other than the current task. Generally speaking, it is not safe to access another task's subjective credentials and doing so can cause a number of problems. Further, while looking into the problem, I realized that Smack was suffering from a similar problem brought about by a similar commit 1fb057dcde11 ("smack: differentiate between subjective and objective task credentials"). This patch addresses this problem by restoring the use of the task's objective credentials in those cases where the task is other than the current executing task. Not only does this resolve the problem reported by Jann, it is arguably the correct thing to do in these cases. Cc: stable@vger.kernel.org Fixes: eb1231f73c4d ("selinux: clarify task subjective and objective credentials") Fixes: 1fb057dcde11 ("smack: differentiate between subjective and objective task credentials") Reported-by: Jann Horn <jannh@google.com> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-09-23erofs: clear compacted_2b if compacted_4b_initial > totalidxYue Hu1-1/+2
Currently, the whole indexes will only be compacted 4B if compacted_4b_initial > totalidx. So, the calculated compacted_2b is worthless for that case. It may waste CPU resources. No need to update compacted_4b_initial as mkfs since it's used to fulfill the alignment of the 1st compacted_2b pack and would handle the case above. We also need to clarify compacted_4b_end here. It's used for the last lclusters which aren't fitted in the previous compacted_2b packs. Some messages are from Xiang. Link: https://lore.kernel.org/r/20210914035915.1190-1-zbestahu@gmail.com Signed-off-by: Yue Hu <huyue2@yulong.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> [ Gao Xiang: it's enough to use "compacted_4b_initial < totalidx". ] Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-09-23erofs: fix misbehavior of unsupported chunk format checkGao Xiang1-1/+1
Unsupported chunk format should be checked with "if (vi->chunkformat & ~EROFS_CHUNK_FORMAT_ALL)" Found when checking with 4k-byte blockmap (although currently mkfs uses inode chunk indexes format by default.) Link: https://lore.kernel.org/r/20210922095141.233938-1-hsiangkao@linux.alibaba.com Fixes: c5aa903a59db ("erofs: support reading chunk-based uncompressed files") Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-09-23erofs: fix up erofs_lookup tracepointGao Xiang1-3/+3
Fix up a misuse that the filename pointer isn't always valid in the ring buffer, and we should copy the content instead. Link: https://lore.kernel.org/r/20210921143531.81356-1-hsiangkao@linux.alibaba.com Fixes: 13f06f48f7bf ("staging: erofs: support tracepoint") Cc: stable@vger.kernel.org # 4.19+ Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-09-23arm64: Restore forced disabling of KPTI on ThunderXdann frazier1-2/+6
A noted side-effect of commit 0c6c2d3615ef ("arm64: Generate cpucaps.h") is that cpucaps are now sorted, changing the enumeration order. This assumed no dependencies between cpucaps, which turned out not to be true in one case. UNMAP_KERNEL_AT_EL0 currently needs to be processed after WORKAROUND_CAVIUM_27456. ThunderX systems are incompatible with KPTI, so unmap_kernel_at_el0() bails if WORKAROUND_CAVIUM_27456 is set. But because of the sorting, WORKAROUND_CAVIUM_27456 will not yet have been considered when unmap_kernel_at_el0() checks for it, so the kernel tries to run w/ KPTI - and quickly falls over. Because all ThunderX implementations have homogeneous CPUs, we can remove this dependency by just checking the current CPU for the erratum. Fixes: 0c6c2d3615ef ("arm64: Generate cpucaps.h") Cc: <stable@vger.kernel.org> # 5.13.x Signed-off-by: dann frazier <dann.frazier@canonical.com> Suggested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210923145002.3394558-1-dann.frazier@canonical.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-09-23atlantic: Fix issue in the pm resume flow.Sudarsana Reddy Kalluru1-2/+2
After fixing hibernation resume flow, another usecase was found which should be explicitly handled - resume when device is in "down" state. Invoke aq_nic_init jointly with aq_nic_start only if ndev was already up during suspend/hibernate. We still need to perform nic_deinit() if caller requests for it, to handle the freeze/resume scenarios. Fixes: 57f780f1c433 ("atlantic: Fix driver resume flow.") Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-23net/mlx4_en: Don't allow aRFS for encapsulated packetsAya Levin1-0/+3
Driver doesn't support aRFS for encapsulated packets, return early error in such a case. Fixes: 1eb8c695bda9 ("net/mlx4_en: Add accelerated RFS support") Signed-off-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-23net: mscc: ocelot: fix forwarding from BLOCKING ports remaining enabledVladimir Oltean1-3/+8
The blamed commit made the fatally incorrect assumption that ports which aren't in the FORWARDING STP state should not have packets forwarded towards them, and that is all that needs to be done. However, that logic alone permits BLOCKING ports to forward to FORWARDING ports, which of course allows packet storms to occur when there is an L2 loop. The ocelot_get_bridge_fwd_mask should not only ask "what can the bridge do for you", but "what can you do for the bridge". This way, only FORWARDING ports forward to the other FORWARDING ports from the same bridging domain, and we are still compatible with the idea of multiple bridges. Fixes: df291e54ccca ("net: ocelot: support multiple bridges") Suggested-by: Colin Foster <colin.foster@in-advantage.com> Reported-by: Colin Foster <colin.foster@in-advantage.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-23net: ethernet: mtk_eth_soc: avoid creating duplicate offload entriesFelix Fietkau1-0/+3
Sometimes multiple CLS_REPLACE calls are issued for the same connection. rhashtable_insert_fast does not check for these duplicates, so multiple hardware flow entries can be created. Fix this by checking for an existing entry early Fixes: 502e84e2382d ("net: ethernet: mtk_eth_soc: add flow offloading support") Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-23nfc: st-nci: Add SPI ID matching DT compatibleMark Brown1-0/+1
Currently autoloading for SPI devices does not use the DT ID table, it uses SPI modalises. Supporting OF modalises is going to be difficult if not impractical, an attempt was made but has been reverted, so ensure that module autoloading works for this driver by adding the part name used in the compatible to the list of SPI IDs. Fixes: 96c8395e2166 ("spi: Revert modalias changes") Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-23MAINTAINERS: remove Guvenc Gulce as net/smc maintainerGuvenc Gulce1-1/+0
Remove myself as net/smc maintainer, as I am leaving IBM soon and can not maintain net/smc anymore. Cc: Julian Wiedmann <jwi@linux.ibm.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-23nexthop: Fix memory leaks in nexthop notification chain listenersIdo Schimmel2-6/+15
syzkaller discovered memory leaks [1] that can be reduced to the following commands: # ip nexthop add id 1 blackhole # devlink dev reload pci/0000:06:00.0 As part of the reload flow, mlxsw will unregister its netdevs and then unregister from the nexthop notification chain. Before unregistering from the notification chain, mlxsw will receive delete notifications for nexthop objects using netdevs registered by mlxsw or their uppers. mlxsw will not receive notifications for nexthops using netdevs that are not dismantled as part of the reload flow. For example, the blackhole nexthop above that internally uses the loopback netdev as its nexthop device. One way to fix this problem is to have listeners flush their nexthop tables after unregistering from the notification chain. This is error-prone as evident by this patch and also not symmetric with the registration path where a listener receives a dump of all the existing nexthops. Therefore, fix this problem by replaying delete notifications for the listener being unregistered. This is symmetric to the registration path and also consistent with the netdev notification chain. The above means that unregister_nexthop_notifier(), like register_nexthop_notifier(), will have to take RTNL in order to iterate over the existing nexthops and that any callers of the function cannot hold RTNL. This is true for mlxsw and netdevsim, but not for the VXLAN driver. To avoid a deadlock, change the latter to unregister its nexthop listener without holding RTNL, making it symmetric to the registration path. [1] unreferenced object 0xffff88806173d600 (size 512): comm "syz-executor.0", pid 1290, jiffies 4295583142 (age 143.507s) hex dump (first 32 bytes): 41 9d 1e 60 80 88 ff ff 08 d6 73 61 80 88 ff ff A..`......sa.... 08 d6 73 61 80 88 ff ff 01 00 00 00 00 00 00 00 ..sa............ backtrace: [<ffffffff81a6b576>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline] [<ffffffff81a6b576>] slab_post_alloc_hook+0x96/0x490 mm/slab.h:522 [<ffffffff81a716d3>] slab_alloc_node mm/slub.c:3206 [inline] [<ffffffff81a716d3>] slab_alloc mm/slub.c:3214 [inline] [<ffffffff81a716d3>] kmem_cache_alloc_trace+0x163/0x370 mm/slub.c:3231 [<ffffffff82e8681a>] kmalloc include/linux/slab.h:591 [inline] [<ffffffff82e8681a>] kzalloc include/linux/slab.h:721 [inline] [<ffffffff82e8681a>] mlxsw_sp_nexthop_obj_group_create drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:4918 [inline] [<ffffffff82e8681a>] mlxsw_sp_nexthop_obj_new drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5054 [inline] [<ffffffff82e8681a>] mlxsw_sp_nexthop_obj_event+0x59a/0x2910 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5239 [<ffffffff813ef67d>] notifier_call_chain+0xbd/0x210 kernel/notifier.c:83 [<ffffffff813f0662>] blocking_notifier_call_chain kernel/notifier.c:318 [inline] [<ffffffff813f0662>] blocking_notifier_call_chain+0x72/0xa0 kernel/notifier.c:306 [<ffffffff8384b9c6>] call_nexthop_notifiers+0x156/0x310 net/ipv4/nexthop.c:244 [<ffffffff83852bd8>] insert_nexthop net/ipv4/nexthop.c:2336 [inline] [<ffffffff83852bd8>] nexthop_add net/ipv4/nexthop.c:2644 [inline] [<ffffffff83852bd8>] rtm_new_nexthop+0x14e8/0x4d10 net/ipv4/nexthop.c:2913 [<ffffffff833e9a78>] rtnetlink_rcv_msg+0x448/0xbf0 net/core/rtnetlink.c:5572 [<ffffffff83608703>] netlink_rcv_skb+0x173/0x480 net/netlink/af_netlink.c:2504 [<ffffffff833de032>] rtnetlink_rcv+0x22/0x30 net/core/rtnetlink.c:5590 [<ffffffff836069de>] netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] [<ffffffff836069de>] netlink_unicast+0x5ae/0x7f0 net/netlink/af_netlink.c:1340 [<ffffffff83607501>] netlink_sendmsg+0x8e1/0xe30 net/netlink/af_netlink.c:1929 [<ffffffff832fde84>] sock_sendmsg_nosec net/socket.c:704 [inline] [<ffffffff832fde84>] sock_sendmsg net/socket.c:724 [inline] [<ffffffff832fde84>] ____sys_sendmsg+0x874/0x9f0 net/socket.c:2409 [<ffffffff83304a44>] ___sys_sendmsg+0x104/0x170 net/socket.c:2463 [<ffffffff83304c01>] __sys_sendmsg+0x111/0x1f0 net/socket.c:2492 [<ffffffff83304d5d>] __do_sys_sendmsg net/socket.c:2501 [inline] [<ffffffff83304d5d>] __se_sys_sendmsg net/socket.c:2499 [inline] [<ffffffff83304d5d>] __x64_sys_sendmsg+0x7d/0xc0 net/socket.c:2499 Fixes: 2a014b200bbd ("mlxsw: spectrum_router: Add support for nexthop objects") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-23USB: serial: cp210x: add part-number debug printkJohan Hovold1-0/+2
Add a part-number debug printk to facilitate debugging. Signed-off-by: Johan Hovold <johan@kernel.org>
2021-09-23USB: serial: cp210x: fix dropped characters with CP2102Johan Hovold1-0/+35
Some CP2102 do not support event-insertion mode but return no error when attempting to enable it. This means that any event escape characters in the input stream will not be escaped by the device and consequently regular data may be interpreted as escape sequences and be removed from the stream by the driver. The reporter's device has batch number DCL00X etched into it and as discovered by the SHA2017 Badge team, counterfeit devices with that marking can be detected by sending malformed vendor requests. [1][2] Tests confirm that the possibly counterfeit CP2102 returns a single byte in response to a malformed two-byte part-number request, while an original CP2102 returns two bytes. Assume that every CP2102 that behaves this way also does not support event-insertion mode (e.g. cannot report parity errors). [1] https://mobile.twitter.com/sha2017badge/status/1167902087289532418 [2] https://hackaday.com/2017/08/14/hands-on-with-the-shacamp-2017-badge/#comment-3903376 Reported-by: Malte Di Donato <malte@neo-soft.org> Tested-by: Malte Di Donato <malte@neo-soft.org> Fixes: a7207e9835a4 ("USB: serial: cp210x: add support for line-status events") Cc: stable@vger.kernel.org # 5.9 Link: https://lore.kernel.org/r/20210922113100.20888-1-johan@kernel.org Signed-off-by: Johan Hovold <johan@kernel.org>
2021-09-22init: Revert accidental changes to print irqs_disabled()Geert Uytterhoeven1-3/+3
Commit f8ade8dddb16 ("xsurf100: drop include of lib8390.c") accidentally changed init/main.c. Revert that part. Fixes: f8ade8dddb16 ("xsurf100: drop include of lib8390.c") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-22MAINTAINERS: Update Xen-[PCI,SWIOTLB,Block] maintainershipKonrad Rzeszutek Wilk1-3/+3
Konrad's new job role is putting a serious cramp on him being a responsive maintainer and as such he is handing off the reins to Juergen, Roger, and Stefano. Thank you! Acked-by: Juergen Gross <jgross@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-22MAINTAINERS: Update SWIOTLB maintainershipKonrad Rzeszutek Wilk1-2/+3
Konrad's new job role is putting a serious cramp on him being a responsive maintainer and as such he is handing off the reins to Christoph Hellwig. Thank you! Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-22MAINTAINERS: update entry for NIOS2Dinh Nguyen1-2/+2
Ley Foon has left Intel and will no longer be able to maintain NIOS2. Update the MAINTAINER's entry to Dinh Nguyen. Acked-by: Ley Foon Tan <ley.foon.tan@intel.com> Signed-off-by: Dinh Nguyen <dinguyen@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-22MAINTAINERS: ARM/VT8500, remove defunct e-mailJiri Slaby1-2/+1
linux@prisktech.co.nz is defunct: 4.1.2 <linux@prisktech.co.nz>: Recipient address rejected: Domain not found Remove it from MAINTAINERS and mark the ARM/VT8500 entry orphan. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-22md: fix a lock order reversal in md_allocChristoph Hellwig1-5/+0
Commit b0140891a8cea3 ("md: Fix race when creating a new md device.") not only moved assigning mddev->gendisk before calling add_disk, which fixes the races described in the commit log, but also added a mddev->open_mutex critical section over add_disk and creation of the md kobj. Adding a kobject after add_disk is racy vs deleting the gendisk right after adding it, but md already prevents against that by holding a mddev->active reference. On the other hand taking this lock added a lock order reversal with what is not disk->open_mutex (used to be bdev->bd_mutex when the commit was added) for partition devices, which need that lock for the internal open for the partition scan, and a recent commit also takes it for non-partitioned devices, leading to further lockdep splatter. Fixes: b0140891a8ce ("md: Fix race when creating a new md device.") Fixes: d62633873590 ("block: support delayed holder registration") Reported-by: syzbot+fadc0aaf497e6a493b9f@syzkaller.appspotmail.com Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: syzbot+fadc0aaf497e6a493b9f@syzkaller.appspotmail.com Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Song Liu <songliubraving@fb.com>
2021-09-22KVM: selftests: Remove __NR_userfaultfd syscall fallbackSean Christopherson1-3/+0
Revert the __NR_userfaultfd syscall fallback added for KVM selftests now that x86's unistd_{32,63}.h overrides are under uapi/ and thus not in KVM selftests' search path, i.e. now that KVM gets x86 syscall numbers from the installed kernel headers. No functional change intended. Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210901203030.1292304-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-22KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugsSean Christopherson3-0/+240
Add a test to verify an rseq's CPU ID is updated correctly if the task is migrated while the kernel is handling KVM_RUN. This is a regression test for a bug introduced by commit 72c3c0fe54a3 ("x86/kvm: Use generic xfer to guest work function"), where TIF_NOTIFY_RESUME would be cleared by KVM without updating rseq, leading to a stale CPU ID and other badness. Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Message-Id: <20210901203030.1292304-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-22tools: Move x86 syscall number fallbacks to .../uapi/Sean Christopherson2-0/+0
Move unistd_{32,64}.h from x86/include/asm to x86/include/uapi/asm so that tools/selftests that install kernel headers, e.g. KVM selftests, can include non-uapi tools headers, e.g. to get 'struct list_head', without effectively overriding the installed non-tool uapi headers. Swapping KVM's search order, e.g. to search the kernel headers before tool headers, is not a viable option as doing results in linux/type.h and other core headers getting pulled from the kernel headers, which do not have the kernel-internal typedefs that are used through tools, including many files outside of selftests/kvm's control. Prior to commit cec07f53c398 ("perf tools: Move syscall number fallbacks from perf-sys.h to tools/arch/x86/include/asm/"), the handcoded numbers were actual fallbacks, i.e. overriding unistd_{32,64}.h from the kernel headers was unintentional. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210901203030.1292304-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>