aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2022-09-09sched/psi: Remove NR_ONCPU task accountingJohannes Weiner2-20/+37
We put all fields updated by the scheduler in the first cacheline of struct psi_group_cpu for performance. Since we want add another PSI_IRQ_FULL to track IRQ/SOFTIRQ pressure, we need to reclaim space first. This patch remove NR_ONCPU task accounting in struct psi_group_cpu, use one bit in state_mask to track instead. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Tested-by: Chengming Zhou <zhouchengming@bytedance.com> Link: https://lore.kernel.org/r/20220825164111.29534-7-zhouchengming@bytedance.com
2022-09-09sched/psi: Optimize task switch inside shared cgroups againChengming Zhou1-12/+9
Way back when PSI_MEM_FULL was accounted from the timer tick, task switching could simply iterate next and prev to the common ancestor to update TSK_ONCPU and be done. Then memstall ticks were replaced with checking curr->in_memstall directly in psi_group_change(). That meant that now if the task switch was between a memstall and a !memstall task, we had to iterate through the common ancestors at least ONCE to fix up their state_masks. We added the identical_state filter to make sure the common ancestor elimination was skipped in that case. It seems that was always a little too eager, because it caused us to walk the common ancestors *twice* instead of the required once: the iteration for next could have stopped at the common ancestor; prev could have updated TSK_ONCPU up to the common ancestor, then finish to the root without changing any flags, just to get the new curr->in_memstall into the state_masks. This patch recognizes this and makes it so that we walk to the root exactly once if state_mask needs updating, which is simply catching up on a missed optimization that could have been done in commit 7fae6c8171d2 ("psi: Use ONCPU state tracking machinery to detect reclaim") directly. Apart from this, it's also necessary for the next patch "sched/psi: remove NR_ONCPU task accounting". Suppose we walk the common ancestors twice: (1) psi_group_change(.clear = 0, .set = TSK_ONCPU) (2) psi_group_change(.clear = TSK_ONCPU, .set = 0) We previously used tasks[NR_ONCPU] to record TSK_ONCPU, tasks[NR_ONCPU]++ in (1) then tasks[NR_ONCPU]-- in (2), so tasks[NR_ONCPU] still be correct. The next patch change to use one bit in state mask to record TSK_ONCPU, PSI_ONCPU bit will be set in (1), but then be cleared in (2), which cause the psi_group_cpu has task running on CPU but without PSI_ONCPU bit set! With this patch, we will never walk the common ancestors twice, so won't have above problem. Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: https://lore.kernel.org/r/20220825164111.29534-6-zhouchengming@bytedance.com
2022-09-09sched/psi: Move private helpers to sched/stats.hChengming Zhou2-4/+4
This patch move psi_task_change/psi_task_switch declarations out of PSI public header, since they are only needed for implementing the PSI stats tracking in sched/stats.h psi_task_switch is obvious, psi_task_change can't be public helper since it doesn't check psi_disabled static key. And there is no any user now, so put it in sched/stats.h too. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: https://lore.kernel.org/r/20220825164111.29534-5-zhouchengming@bytedance.com
2022-09-09sched/psi: Save percpu memory when !psi_cgroups_enabledChengming Zhou1-3/+4
We won't use cgroup psi_group when !psi_cgroups_enabled, so don't bother to alloc percpu memory and init for it. Also don't need to migrate task PSI stats between cgroups in cgroup_move_task(). Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: https://lore.kernel.org/r/20220825164111.29534-4-zhouchengming@bytedance.com
2022-09-09sched/psi: Don't create cgroup PSI files when psi_disabledChengming Zhou1-0/+3
commit 3958e2d0c34e ("cgroup: make per-cgroup pressure stall tracking configurable") make PSI can be configured to skip per-cgroup stall accounting. And doesn't expose PSI files in cgroup hierarchy. This patch do the same thing when psi_disabled. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: https://lore.kernel.org/r/20220825164111.29534-3-zhouchengming@bytedance.com
2022-09-09sched/psi: Fix periodic aggregation shut offChengming Zhou1-14/+14
We don't want to wake periodic aggregation work back up if the task change is the aggregation worker itself going to sleep, or we'll ping-pong forever. Previously, we would use psi_task_change() in psi_dequeue() when task going to sleep, so this check was put in psi_task_change(). But commit 4117cebf1a9f ("psi: Optimize task switch inside shared cgroups") defer task sleep handling to psi_task_switch(), won't go through psi_task_change() anymore. So this patch move this check to psi_task_switch(). Fixes: 4117cebf1a9f ("psi: Optimize task switch inside shared cgroups") Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: https://lore.kernel.org/r/20220825164111.29534-2-zhouchengming@bytedance.com
2022-09-04Linux 6.0-rc4Linus Torvalds1-1/+1
2022-09-04Makefile.extrawarn: re-enable -Wformat for clang; take 2Nick Desaulniers1-0/+12
-Wformat was recently re-enabled for builds with clang, then quickly re-disabled, due to concerns stemming from the frequency of default argument promotion related warning instances. commit 258fafcd0683 ("Makefile.extrawarn: re-enable -Wformat for clang") commit 21f9c8a13bb2 ("Revert "Makefile.extrawarn: re-enable -Wformat for clang"") ISO WG14 has ratified N2562 to address default argument promotion explicitly for printf, as part of the upcoming ISO C2X standard. The behavior of clang was changed in clang-16 to not warn for the cited cases in all language modes. Add a version check, so that users of clang-16 now get the full effect of -Wformat. For older clang versions, re-enable flags under the -Wformat group that way users still get some useful checks related to format strings, without noisy default argument promotion warnings. I intentionally omitted -Wformat-y2k and -Wformat-security from being re-enabled, which are also part of -Wformat in clang-16. Link: https://github.com/ClangBuiltLinux/linux/issues/378 Link: https://github.com/llvm/llvm-project/issues/57102 Link: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2562.pdf Suggested-by: Justin Stitt <jstitt007@gmail.com> Suggested-by: Nathan Chancellor <nathan@kernel.org> Suggested-by: Youngmin Nam <youngmin.nam@samsung.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-09-03gpio: ws16c48: Make irq_chip immutableWilliam Breathitt Gray1-3/+7
Kernel warns about mutable irq_chips: "not an immutable chip, please consider fixing!" Make the struct irq_chip const, flag it as IRQCHIP_IMMUTABLE, add the new helper functions, and call the appropriate gpiolib functions. Signed-off-by: William Breathitt Gray <william.gray@linaro.org> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2022-09-03gpio: 104-idio-16: Make irq_chip immutableWilliam Breathitt Gray1-7/+11
Kernel warns about mutable irq_chips: "not an immutable chip, please consider fixing!" Make the struct irq_chip const, flag it as IRQCHIP_IMMUTABLE, add the new helper functions, and call the appropriate gpiolib functions. Signed-off-by: William Breathitt Gray <william.gray@linaro.org> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2022-09-03gpio: 104-idi-48: Make irq_chip immutableWilliam Breathitt Gray1-3/+7
Kernel warns about mutable irq_chips: "not an immutable chip, please consider fixing!" Make the struct irq_chip const, flag it as IRQCHIP_IMMUTABLE, add the new helper functions, and call the appropriate gpiolib functions. Signed-off-by: William Breathitt Gray <william.gray@linaro.org> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2022-09-03gpio: 104-dio-48e: Make irq_chip immutableWilliam Breathitt Gray1-3/+7
Kernel warns about mutable irq_chips: "not an immutable chip, please consider fixing!" Make the struct irq_chip const, flag it as IRQCHIP_IMMUTABLE, add the new helper functions, and call the appropriate gpiolib functions. Signed-off-by: William Breathitt Gray <william.gray@linaro.org> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2022-09-03mm: pagewalk: Fix race between unmap and page walkerSteven Price3-13/+16
The mmap lock protects the page walker from changes to the page tables during the walk. However a read lock is insufficient to protect those areas which don't have a VMA as munmap() detaches the VMAs before downgrading to a read lock and actually tearing down PTEs/page tables. For users of walk_page_range() the solution is to simply call pte_hole() immediately without checking the actual page tables when a VMA is not present. We now never call __walk_page_range() without a valid vma. For walk_page_range_novma() the locking requirements are tightened to require the mmap write lock to be taken, and then walking the pgd directly with 'no_vma' set. This in turn means that all page walkers either have a valid vma, or it's that special 'novma' case for page table debugging. As a result, all the odd '(!walk->vma && !walk->no_vma)' tests can be removed. Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Steven Price <steven.price@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-09-03LoongArch: mm: Remove the unneeded result variableye xingchen1-4/+1
Return the value pa_to_nid() directly instead of storing it in another redundant variable. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-09-03LoongArch: Fix arch_remove_memory() undefined build errorYupeng Li1-12/+10
The kernel build error when unslected CONFIG_MEMORY_HOTREMOVE because arch_remove_memory() is needed by mm/memory_hotplug.c but undefined. Some build error messages like: LD vmlinux.o MODPOST vmlinux.symvers MODINFO modules.builtin.modinfo GEN modules.builtin LD .tmp_vmlinux.kallsyms1 loongarch64-linux-gnu-ld: mm/memory_hotplug.o: in function `.L242': memory_hotplug.c:(.ref.text+0x930): undefined reference to `arch_remove_memory' make: *** [Makefile:1169:vmlinux] 错误 1 Removed CONFIG_MEMORY_HOTREMOVE requirement and rearrange the file refer to the definitions of other platform architectures. Signed-off-by: Yupeng Li <liyupeng@zbhlos.com> Signed-off-by: Caicai <caizp2008@163.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-09-03LoongArch: Fix section mismatch due to acpi_os_ioremap()Huacai Chen3-2/+3
Now acpi_os_ioremap() is marked with __init because it calls memblock_ is_memory() which is also marked with __init in the !ARCH_KEEP_MEMBLOCK case. However, acpi_os_ioremap() is called by ordinary functions such as acpi_os_{read, write}_memory() and causes section mismatch warnings: WARNING: modpost: vmlinux.o: section mismatch in reference: acpi_os_read_memory (section: .text) -> acpi_os_ioremap (section: .init.text) WARNING: modpost: vmlinux.o: section mismatch in reference: acpi_os_write_memory (section: .text) -> acpi_os_ioremap (section: .init.text) Fix these warnings by selecting ARCH_KEEP_MEMBLOCK unconditionally and removing the __init modifier of acpi_os_ioremap(). This can also give a chance to track "memory" and "reserved" memblocks after early boot. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-09-03LoongArch: Improve dump_tlb() output messagesHuacai Chen1-13/+13
1, Use nr/nx to replace ri/xi; 2, Add 0x prefix for hexadecimal data. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-09-03LoongArch: Adjust arch_do_signal_or_restart() to adapt generic entryHuacai Chen1-2/+2
Commit 8ba62d37949e248c69 ("task_work: Call tracehook_notify_signal from get_signal on all architectures") adjust arch_do_signal_or_restart() for all architectures. LoongArch hasn't been upstream yet at that time and can be still built successfully without adjustment because this function has a weak version with the correct prototype. It is obviously that we should convert LoongArch to use new API, otherwise some signal handlings will be lost. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-09-03LoongArch: Avoid orphan input sectionsArd Biesheuvel2-0/+3
Ensure that all input sections are listed explicitly in the linker script, and issue a warning otherwise. This ensures that the binary image matches the PE/COFF and other image metadata exactly, which is important for things like code signing. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-09-02Documentation: document ublkMing Lei3-0/+255
Add documentation for ublk subsystem. It was supposed to be documented when merging the driver, but missing at that time. Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Richard W.M. Jones <rjones@redhat.com> Cc: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> [axboe: correct MAINTAINERS addition] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-02landlock: Fix file reparenting without explicit LANDLOCK_ACCESS_FS_REFERMickaël Salaün2-33/+170
This change fixes a mis-handling of the LANDLOCK_ACCESS_FS_REFER right when multiple rulesets/domains are stacked. The expected behaviour was that an additional ruleset can only restrict the set of permitted operations, but in this particular case, it was potentially possible to re-gain the LANDLOCK_ACCESS_FS_REFER right. With the introduction of LANDLOCK_ACCESS_FS_REFER, we added the first globally denied-by-default access right. Indeed, this lifted an initial Landlock limitation to rename and link files, which was initially always denied when the source or the destination were different directories. This led to an inconsistent backward compatibility behavior which was only taken into account if no domain layer were using the new LANDLOCK_ACCESS_FS_REFER right. However, when restricting a thread with a new ruleset handling LANDLOCK_ACCESS_FS_REFER, all inherited parent rulesets/layers not explicitly handling LANDLOCK_ACCESS_FS_REFER would behave as if they were handling this access right and with all their rules allowing it. This means that renaming and linking files could became allowed by these parent layers, but all the other required accesses must also be granted: all layers must allow file removal or creation, and renaming and linking operations cannot lead to privilege escalation according to the Landlock policy. See detailed explanation in commit b91c3e4ea756 ("landlock: Add support for file reparenting with LANDLOCK_ACCESS_FS_REFER"). To say it another way, this bug may lift the renaming and linking limitations of the initial Landlock version, and a same ruleset can enforce different restrictions depending on previous or next enforced ruleset (i.e. inconsistent behavior). The LANDLOCK_ACCESS_FS_REFER right cannot give access to data not already allowed, but this doesn't follow the contract of the first Landlock ABI. This fix puts back the limitation for sandboxes that didn't opt-in for this additional right. For instance, if a first ruleset allows LANDLOCK_ACCESS_FS_MAKE_REG on /dst and LANDLOCK_ACCESS_FS_REMOVE_FILE on /src, renaming /src/file to /dst/file is denied. However, without this fix, stacking a new ruleset which allows LANDLOCK_ACCESS_FS_REFER on / would now permit the sandboxed thread to rename /src/file to /dst/file . This change fixes the (absolute) rule access rights, which now always forbid LANDLOCK_ACCESS_FS_REFER except when it is explicitly allowed when creating a rule. Making all domain handle LANDLOCK_ACCESS_FS_REFER was an initial approach but there is two downsides: * it makes the code more complex because we still want to check that a rule allowing LANDLOCK_ACCESS_FS_REFER is legitimate according to the ruleset's handled access rights (i.e. ABI v1 != ABI v2); * it would not allow to identify if the user created a ruleset explicitly handling LANDLOCK_ACCESS_FS_REFER or not, which will be an issue to audit Landlock. Instead, this change adds an ACCESS_INITIALLY_DENIED list of denied-by-default rights, which (only) contains LANDLOCK_ACCESS_FS_REFER. All domains are treated as if they are also handling this list, but without modifying their fs_access_masks field. A side effect is that the errno code returned by rename(2) or link(2) *may* be changed from EXDEV to EACCES according to the enforced restrictions. Indeed, we now have the mechanic to identify if an access is denied because of a required right (e.g. LANDLOCK_ACCESS_FS_MAKE_REG, LANDLOCK_ACCESS_FS_REMOVE_FILE) or if it is denied because of missing LANDLOCK_ACCESS_FS_REFER rights. This may result in different errno codes than for the initial Landlock version, but this approach is more consistent and better for rename/link compatibility reasons, and it wasn't possible before (hence no backport to ABI v1). The layout1.rename_file test reflects this change. Add 4 layout1.refer_denied_by_default* test suites to check that the behavior of a ruleset not handling LANDLOCK_ACCESS_FS_REFER (ABI v1) is unchanged even if another layer handles LANDLOCK_ACCESS_FS_REFER (i.e. ABI v1 precedence). Make sure rule's absolute access rights are correct by testing with and without a matching path. Add test_rename() and test_exchange() helpers. Extend layout1.inval tests to check that a denied-by-default access right is not necessarily part of a domain's handled access rights. Test coverage for security/landlock is 95.3% of 599 lines according to gcc/gcov-11. Fixes: b91c3e4ea756 ("landlock: Add support for file reparenting with LANDLOCK_ACCESS_FS_REFER") Reviewed-by: Paul Moore <paul@paul-moore.com> Reviewed-by: Günther Noack <gnoack3000@gmail.com> Link: https://lore.kernel.org/r/20220831203840.1370732-1-mic@digikod.net Cc: stable@vger.kernel.org [mic: Constify and slightly simplify test helpers] Signed-off-by: Mickaël Salaün <mic@digikod.net>
2022-09-02xen/grants: prevent integer overflow in gnttab_dma_alloc_pages()Dan Carpenter1-0/+3
The change from kcalloc() to kvmalloc() means that arg->nr_pages might now be large enough that the "args->nr_pages << PAGE_SHIFT" can result in an integer overflow. Fixes: b3f7931f5c61 ("xen/gntdev: switch from kcalloc() to kvcalloc()") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/YxDROJqu/RPvR0bi@kili Signed-off-by: Juergen Gross <jgross@suse.com>
2022-09-02xen-blkfront: Cache feature_persistent value before advertisementSeongJae Park1-7/+7
Xen blkfront advertises its support of the persistent grants feature when it first setting up and when resuming in 'talk_to_blkback()'. Then, blkback reads the advertised value when it connects with blkfront and decides if it will use the persistent grants feature or not, and advertises its decision to blkfront. Blkfront reads the blkback's decision and it also makes the decision for the use of the feature. Commit 402c43ea6b34 ("xen-blkfront: Apply 'feature_persistent' parameter when connect"), however, made the blkfront's read of the parameter for disabling the advertisement, namely 'feature_persistent', to be done when it negotiate, not when advertise. Therefore blkfront advertises without reading the parameter. As the field for caching the parameter value is zero-initialized, it always advertises as the feature is disabled, so that the persistent grants feature becomes always disabled. This commit fixes the issue by making the blkfront does parmeter caching just before the advertisement. Fixes: 402c43ea6b34 ("xen-blkfront: Apply 'feature_persistent' parameter when connect") Cc: <stable@vger.kernel.org> # 5.10.x Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: SeongJae Park <sj@kernel.org> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20220831165824.94815-4-sj@kernel.org Signed-off-by: Juergen Gross <jgross@suse.com>
2022-09-02xen-blkfront: Advertise feature-persistent as user requestedSeongJae Park1-2/+6
The advertisement of the persistent grants feature (writing 'feature-persistent' to xenbus) should mean not the decision for using the feature but only the availability of the feature. However, commit 74a852479c68 ("xen-blkfront: add a parameter for disabling of persistent grants") made a field of blkfront, which was a place for saving only the negotiation result, to be used for yet another purpose: caching of the 'feature_persistent' parameter value. As a result, the advertisement, which should follow only the parameter value, becomes inconsistent. This commit fixes the misuse of the semantic by making blkfront saves the parameter value in a separate place and advertises the support based on only the saved value. Fixes: 74a852479c68 ("xen-blkfront: add a parameter for disabling of persistent grants") Cc: <stable@vger.kernel.org> # 5.10.x Suggested-by: Juergen Gross <jgross@suse.com> Signed-off-by: SeongJae Park <sj@kernel.org> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20220831165824.94815-3-sj@kernel.org Signed-off-by: Juergen Gross <jgross@suse.com>
2022-09-02xen-blkback: Advertise feature-persistent as user requestedSeongJae Park2-2/+7
The advertisement of the persistent grants feature (writing 'feature-persistent' to xenbus) should mean not the decision for using the feature but only the availability of the feature. However, commit aac8a70db24b ("xen-blkback: add a parameter for disabling of persistent grants") made a field of blkback, which was a place for saving only the negotiation result, to be used for yet another purpose: caching of the 'feature_persistent' parameter value. As a result, the advertisement, which should follow only the parameter value, becomes inconsistent. This commit fixes the misuse of the semantic by making blkback saves the parameter value in a separate place and advertises the support based on only the saved value. Fixes: aac8a70db24b ("xen-blkback: add a parameter for disabling of persistent grants") Cc: <stable@vger.kernel.org> # 5.10.x Suggested-by: Juergen Gross <jgross@suse.com> Signed-off-by: SeongJae Park <sj@kernel.org> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20220831165824.94815-2-sj@kernel.org Signed-off-by: Juergen Gross <jgross@suse.com>
2022-09-02powerpc/papr_scm: Ensure rc is always initialized in papr_scm_pmu_register()Nathan Chancellor1-1/+3
Clang warns: arch/powerpc/platforms/pseries/papr_scm.c:492:6: warning: variable 'rc' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized] if (!p->stat_buffer_len) ^~~~~~~~~~~~~~~~~~~ arch/powerpc/platforms/pseries/papr_scm.c:523:64: note: uninitialized use occurs here dev_info(&p->pdev->dev, "nvdimm pmu didn't register rc=%d\n", rc); ^~ include/linux/dev_printk.h:150:67: note: expanded from macro 'dev_info' dev_printk_index_wrap(_dev_info, KERN_INFO, dev, dev_fmt(fmt), ##__VA_ARGS__) ^~~~~~~~~~~ include/linux/dev_printk.h:110:23: note: expanded from macro 'dev_printk_index_wrap' _p_func(dev, fmt, ##__VA_ARGS__); \ ^~~~~~~~~~~ arch/powerpc/platforms/pseries/papr_scm.c:492:2: note: remove the 'if' if its condition is always false if (!p->stat_buffer_len) ^~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/platforms/pseries/papr_scm.c:484:8: note: initialize the variable 'rc' to silence this warning int rc, nodeid; ^ = 0 1 warning generated. The call to papr_scm_pmu_check_events() was eliminated but a return code was not added to the if statement. Add the same return code from papr_scm_pmu_check_events() for this condition so there is no more warning. Fixes: 9b1ac04698a4 ("powerpc/papr_scm: Fix nvdimm event mappings") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://github.com/ClangBuiltLinux/linux/issues/1701 Link: https://lore.kernel.org/r/20220830151256.1473169-1-nathan@kernel.org
2022-09-02Revert "powerpc/irq: Don't open code irq_soft_mask helpers"Michael Ellerman1-7/+36
This reverts commit ef5b570d3700fbb8628a58da0487486ceeb713cd. Zhouyi reported that commit is causing crashes when running rcutorture with KASAN enabled: BUG: using smp_processor_id() in preemptible [00000000] code: rcu_torture_rea/100 caller is rcu_preempt_deferred_qs_irqrestore+0x74/0xed0 CPU: 4 PID: 100 Comm: rcu_torture_rea Tainted: G W 5.19.0-rc5-next-20220708-dirty #253 Call Trace: dump_stack_lvl+0xbc/0x108 (unreliable) check_preemption_disabled+0x154/0x160 rcu_preempt_deferred_qs_irqrestore+0x74/0xed0 __rcu_read_unlock+0x290/0x3b0 rcu_torture_read_unlock+0x30/0xb0 rcutorture_one_extend+0x198/0x810 rcu_torture_one_read+0x58c/0xc90 rcu_torture_reader+0x12c/0x360 kthread+0x1e8/0x220 ret_from_kernel_thread+0x5c/0x64 KASAN will generate instrumentation instructions around the WRITE_ONCE(local_paca->irq_soft_mask, mask): 0xc000000000295cb0 <+0>: addis r2,r12,774 0xc000000000295cb4 <+4>: addi r2,r2,16464 0xc000000000295cb8 <+8>: mflr r0 0xc000000000295cbc <+12>: bl 0xc00000000008bb4c <mcount> 0xc000000000295cc0 <+16>: mflr r0 0xc000000000295cc4 <+20>: std r31,-8(r1) 0xc000000000295cc8 <+24>: addi r3,r13,2354 0xc000000000295ccc <+28>: mr r31,r13 0xc000000000295cd0 <+32>: std r0,16(r1) 0xc000000000295cd4 <+36>: stdu r1,-48(r1) 0xc000000000295cd8 <+40>: bl 0xc000000000609b98 <__asan_store1+8> 0xc000000000295cdc <+44>: nop 0xc000000000295ce0 <+48>: li r9,1 0xc000000000295ce4 <+52>: stb r9,2354(r31) 0xc000000000295ce8 <+56>: addi r1,r1,48 0xc000000000295cec <+60>: ld r0,16(r1) 0xc000000000295cf0 <+64>: ld r31,-8(r1) 0xc000000000295cf4 <+68>: mtlr r0 If there is a context switch before "stb r9,2354(r31)", r31 may not equal to r13, in such case, irq soft mask will not work. The usual solution of marking the code ineligible for instrumentation forces the code out-of-line, which we would prefer to avoid. Christophe proposed a partial revert, but Nick raised some concerns with that. So for now do a full revert. Reported-by: Zhouyi Zhou <zhouzhouyi@gmail.com> [mpe: Construct change log based on Zhouyi's original report] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220831131052.42250-1-mpe@ellerman.id.au
2022-09-02Revert "usb: gadget: udc-xilinx: replace memcpy with memcpy_toio"Greg Kroah-Hartman1-8/+8
This reverts commit 8cb339f1c1f04baede9d54c1e40ac96247a6393b as it throws up a bunch of sparse warnings as reported by the kernel test robot. Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/r/202209020044.CX2PfZzM-lkp@intel.com Fixes: 8cb339f1c1f0 ("usb: gadget: udc-xilinx: replace memcpy with memcpy_toio") Cc: stable@vger.kernel.org Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Piyush Mehta <piyush.mehta@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-01KVM: x86: check validity of argument to KVM_SET_MP_STATEPaolo Bonzini1-3/+17
An invalid argument to KVM_SET_MP_STATE has no effect other than making the vCPU fail to run at the next KVM_RUN. Since it is extremely unlikely that any userspace is relying on it, fail with -EINVAL just like for other architectures. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-01perf/x86/core: Completely disable guest PEBS via guest's global_ctrlLike Xu1-1/+2
When a guest PEBS counter is cross-mapped by a host counter, software will remove the corresponding bit in the arr[global_ctrl].guest and expect hardware to perform a change of state "from enable to disable" via the msr_slot[] switch during the vmx transaction. The real world is that if user adjust the counter overflow value small enough, it still opens a tiny race window for the previously PEBS-enabled counter to write cross-mapped PEBS records into the guest's PEBS buffer, when arr[global_ctrl].guest has been prioritised (switch_msr_special stuff) to switch into the enabled state, while the arr[pebs_enable].guest has not. Close this window by clearing invalid bits in the arr[global_ctrl].guest. Cc: linux-perf-users@vger.kernel.org Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sean Christopherson <seanjc@google.com> Fixes: 854250329c02 ("KVM: x86/pmu: Disable guest PEBS temporarily in two rare situations") Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20220831033524.58561-1-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-01KVM: x86: fix memoryleak in kvm_arch_vcpu_create()Miaohe Lin1-2/+1
When allocating memory for mci_ctl2_banks fails, KVM doesn't release mce_banks leading to memoryleak. Fix this issue by calling kfree() for it when kcalloc() fails. Fixes: 281b52780b57 ("KVM: x86: Add emulation for MSR_IA32_MCx_CTL2 MSRs.") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Message-Id: <20220901122300.22298-1-linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-01KVM: x86: Mask off unsupported and unknown bits of IA32_ARCH_CAPABILITIESJim Mattson1-4/+21
KVM should not claim to virtualize unknown IA32_ARCH_CAPABILITIES bits. When kvm_get_arch_capabilities() was originally written, there were only a few bits defined in this MSR, and KVM could virtualize all of them. However, over the years, several bits have been defined that KVM cannot just blindly pass through to the guest without additional work (such as virtualizing an MSR promised by the IA32_ARCH_CAPABILITES feature bit). Define a mask of supported IA32_ARCH_CAPABILITIES bits, and mask off any other bits that are set in the hardware MSR. Cc: Paolo Bonzini <pbonzini@redhat.com> Fixes: 5b76a3cff011 ("KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry") Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20220830174947.2182144-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-01devres: Slightly optimize alloc_dr()Christophe JAILLET1-1/+3
If the gfp flag used for the memory allocation already has __GFP_ZERO, then there is no need to explicitly clear the "struct devres_node". It is already zeroed. This saves a few cycles when using devm_zalloc() and co. In the case of devres_alloc() (which calls __devres_alloc_node()), the compiler could remove the test and the memset() because it should be able to see that the __GFP_ZERO flag is set. So this would make the code both faster and smaller. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/d255bd871484e63cdd628e819f929e2df59afb02.1658352383.git.christophe.jaillet@wanadoo.fr Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-01drivers: base: Print error code on synthetic uevent failureBrian Norris1-1/+1
If we're going to log the failure, we might as well log the return code too. Signed-off-by: Brian Norris <briannorris@chromium.org> Link: https://lore.kernel.org/r/20220824165213.1.Ifdb98af3d0c23708a11d8d5ae5697bdb7e96a3cc@changeid Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-01class: use IS_ERR_OR_NULL() helper in class_unregister()Yang Yingliang1-1/+1
Use IS_ERR_OR_NULL() helper in class_unregister() to simplify code. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20220822061922.3884113-1-yangyingliang@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-01driver_core: move from strlcpy with unused retval to strscpyWolfram Sang1-1/+1
Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Link: https://lore.kernel.org/r/20220818205956.6528-1-wsa+renesas@sang-engineering.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-01cgroup: Implement cgroup_file_show()Tejun Heo2-0/+21
Add cgroup_file_show() which allows toggling visibility of a cgroup file using the new kernfs_show(). This will be used to hide psi interface files on cgroups where it's disabled. Cc: Chengming Zhou <zhouchengming@bytedance.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Tested-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220828050440.734579-10-tj@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-01kernfs: Implement kernfs_show()Tejun Heo2-1/+38
Currently, kernfs nodes can be created hidden and activated later by calling kernfs_activate() to allow creation of multiple nodes to succeed or fail as a unit. This is an one-way one-time-only transition. This patch introduces kernfs_show() which can toggle visibility dynamically. As the currently proposed use - toggling the cgroup pressure files - only requires operating on leaf nodes, for the sake of simplicity, restrict it as such for now. Hiding uses the same mechanism as deactivation and likewise guarantees that there are no in-flight operations on completion. KERNFS_ACTIVATED and KERNFS_HIDDEN are used to manage the interactions between activations and show/hide operations. A node is visible iff both activated & !hidden. Cc: Chengming Zhou <zhouchengming@bytedance.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Tested-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220828050440.734579-9-tj@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-01kernfs: Factor out kernfs_activate_one()Tejun Heo1-10/+17
Factor out kernfs_activate_one() from kernfs_activate() and reorder operations so that KERNFS_ACTIVATED now simply indicates whether activation was attempted on the node ignoring whether activation took place. As the flag doesn't have a reader, the refactoring and reordering shouldn't cause any behavior difference. Tested-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220828050440.734579-8-tj@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-01kernfs: Add KERNFS_REMOVING flagsTejun Heo2-14/+8
KERNFS_ACTIVATED tracks whether a given node has ever been activated. As a node was only deactivated on removal, this was used for 1. Drain optimization (removed by the previous patch). 2. To hide !activated nodes 3. To avoid double activations 4. Reject adding children to a node being removed 5. Skip activaing a node which is being removed. We want to decouple deactivation from removal so that nodes can be deactivated and hidden dynamically, which makes KERNFS_ACTIVATED useless for all of the above purposes. #1 is already gone. #2 and #3 can instead test whether the node is currently active. A new flag KERNFS_REMOVING is added to explicitly mark nodes which are being removed for #4 and #5. While this leaves KERNFS_ACTIVATED with no users, leave it be as it will be used in a following patch. Cc: Chengming Zhou <zhouchengming@bytedance.com> Tested-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220828050440.734579-7-tj@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-01kernfs: Improve kernfs_drain() and always call on removalTejun Heo1-12/+12
__kernfs_remove() was skipping draining based on KERNFS_ACTIVATED - whether the node has ever been activated since creation. Instead, update it to always call kernfs_drain() which now drains or skips based on the precise drain conditions. This ensures that the nodes will be deactivated and drained regardless of their states. This doesn't make meaningful difference now but will enable deactivating and draining nodes dynamically by making removals safe when racing those operations. While at it, drop / update comments. v2: Fix the inverted test on kernfs_should_drain_open_files() noted by Chengming. This was fixed by the next unrelated patch in the previous posting. Cc: Chengming Zhou <zhouchengming@bytedance.com> Tested-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220828050440.734579-6-tj@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-01kernfs: Skip kernfs_drain_open_files() more aggressivelyTejun Heo3-21/+48
Track the number of mmapped files and files that need to be released and skip kernfs_drain_open_file() if both are zero, which are the precise conditions which require draining open_files. The early exit test is factored into kernfs_should_drain_open_files() which is now tested by kernfs_drain_open_files()'s caller - kernfs_drain(). This isn't a meaningful optimization on its own but will enable future stand-alone kernfs_deactivate() implementation. v2: Chengming noticed that on->nr_to_release was leaking after ->open() failure. Fix it by telling kernfs_unlink_open_file() that it's called from the ->open() fail path and should dec the counter. Use kzalloc() to allocate kernfs_open_node so that the tracking fields are correctly initialized. Cc: Chengming Zhou <zhouchengming@bytedance.com> Tested-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220828050440.734579-5-tj@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-01kernfs: Refactor kernfs_get_open_node()Tejun Heo1-14/+11
Factor out commont part. This is cleaner and should help with future changes. No functional changes. Tested-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220828050440.734579-4-tj@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-01kernfs: Drop unnecessary "mutex" local variable initializationTejun Heo1-4/+5
These are unnecessary and unconventional. Remove them. Also move variable declaration into the block that it's used. No functional changes. Cc: Imran Khan <imran.f.khan@oracle.com> Tested-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220828050440.734579-3-tj@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-01kernfs: Simply by replacing kernfs_deref_open_node() with of_on()Tejun Heo1-43/+13
kernfs_node->attr.open is an RCU pointer to kernfs_open_node. However, RCU dereference is currently only used in kernfs_notify(). Everywhere else, either we're holding the lock which protects it or know that the kernfs_open_node is pinned becaused we have a pointer to a kernfs_open_file which is hanging off of it. kernfs_deref_open_node() is used for the latter case - accessing kernfs_open_node from kernfs_open_file. The lifetime and visibility rules are simple and clear here. To someone who can access a kernfs_open_file, its kernfs_open_node is pinned and visible through of->kn->attr.open. Replace kernfs_deref_open_node() which simpler of_on(). The former takes both @kn and @of and RCU deref @kn->attr.open while sanity checking with @of. The latter takes @of and uses protected deref on of->kn->attr.open. As the return value can't be NULL, remove the error handling in the callers too. This shouldn't cause any functional changes. Cc: Imran Khan <imran.f.khan@oracle.com> Tested-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220828050440.734579-2-tj@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-01devres: remove devm_ioremap_npChristoph Hellwig3-18/+0
devm_ioremap_np has never been used anywhere since it was added in early 2021, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220822061424.151819-1-hch@lst.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-01platform/x86: p2sb: Fix UAF when caller uses resource nameAndy Shevchenko1-2/+16
We have to copy only selected fields from the original resource. Because a PCI device will be removed immediately after getting its resources, we may not use any allocated data, hence we may not copy any pointers. Consider the following scenario: 1/ a caller of p2sb_bar() gets the resource; 2/ the resource has been copied by platform_device_add_data() in order to create a platform device; 3/ the platform device creation will call for the device driver's ->probe() as soon as a match found; 4/ the ->probe() takes given resources (see 2/) and tries to access one of its field, i.e. 'name', in the __devm_ioremap_resource() to create a pretty looking output; 5/ but the 'name' is a dangling pointer because p2sb_bar() removed a PCI device, which 'name' had been copied to the caller's memory. 6/ UAF (Use-After-Free) as a result. Kudos to Mika for the initial analisys of the issue. Fixes: 9745fb07474f ("platform/x86/intel: Add Primary to Sideband (P2SB) bridge support") Reported-by: kernel test robot <oliver.sang@intel.com> Suggested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Link: https://lore.kernel.org/linux-i2c/YvPCbnKqDiL2XEKp@xsang-OptiPlex-9020/ Link: https://lore.kernel.org/linux-i2c/YtjAswDKfiuDfWYs@xsang-OptiPlex-9020/ Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20220901113406.65876-1-andriy.shevchenko@linux.intel.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-09-01platform/x86: asus-wmi: Increase FAN_CURVE_BUF_LEN to 32Luke D. Jones1-5/+4
Fix for TUF laptops returning with an -ENOSPC on calling asus_wmi_evaluate_method_buf() when fetching default curves. The TUF method requires at least 32 bytes space. This also moves and changes the pr_debug() in fan_curve_check_present() to pr_warn() in fan_curve_get_factory_default() so that there is at least some indication in logs of why it fails. Signed-off-by: Luke D. Jones <luke@ljones.dev> Link: https://lore.kernel.org/r/20220828074638.5473-1-luke@ljones.dev Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2022-09-01firmware_loader: Fix memory leak in firmware uploadRuss Weight3-4/+17
In the case of firmware-upload, an instance of struct fw_upload is allocated in firmware_upload_register(). This data needs to be freed in fw_dev_release(). Create a new fw_upload_free() function in sysfs_upload.c to handle the firmware-upload specific memory frees and incorporate the missing kfree call for the fw_upload structure. Fixes: 97730bbb242c ("firmware_loader: Add firmware-upload support") Cc: stable <stable@kernel.org> Signed-off-by: Russ Weight <russell.h.weight@intel.com> Link: https://lore.kernel.org/r/20220831002518.465274-1-russell.h.weight@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-01firmware_loader: Fix use-after-free during unregisterRuss Weight1-1/+2
In the following code within firmware_upload_unregister(), the call to device_unregister() could result in the dev_release function freeing the fw_upload_priv structure before it is dereferenced for the call to module_put(). This bug was found by the kernel test robot using CONFIG_KASAN while running the firmware selftests. device_unregister(&fw_sysfs->dev); module_put(fw_upload_priv->module); The problem is fixed by copying fw_upload_priv->module to a local variable for use when calling device_unregister(). Fixes: 97730bbb242c ("firmware_loader: Add firmware-upload support") Cc: stable <stable@kernel.org> Reported-by: kernel test robot <oliver.sang@intel.com> Reviewed-by: Matthew Gerlach <matthew.gerlach@linux.intel.com> Signed-off-by: Russ Weight <russell.h.weight@intel.com> Link: https://lore.kernel.org/r/20220829174557.437047-1-russell.h.weight@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>