aboutsummaryrefslogtreecommitdiffstats
path: root/kernel (follow)
AgeCommit message (Collapse)AuthorFilesLines
2021-05-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds3-5/+3
Pull kvm updates from Paolo Bonzini: "This is a large update by KVM standards, including AMD PSP (Platform Security Processor, aka "AMD Secure Technology") and ARM CoreSight (debug and trace) changes. ARM: - CoreSight: Add support for ETE and TRBE - Stage-2 isolation for the host kernel when running in protected mode - Guest SVE support when running in nVHE mode - Force W^X hypervisor mappings in nVHE mode - ITS save/restore for guests using direct injection with GICv4.1 - nVHE panics now produce readable backtraces - Guest support for PTP using the ptp_kvm driver - Performance improvements in the S2 fault handler x86: - AMD PSP driver changes - Optimizations and cleanup of nested SVM code - AMD: Support for virtual SPEC_CTRL - Optimizations of the new MMU code: fast invalidation, zap under read lock, enable/disably dirty page logging under read lock - /dev/kvm API for AMD SEV live migration (guest API coming soon) - support SEV virtual machines sharing the same encryption context - support SGX in virtual machines - add a few more statistics - improved directed yield heuristics - Lots and lots of cleanups Generic: - Rework of MMU notifier interface, simplifying and optimizing the architecture-specific code - a handful of "Get rid of oprofile leftovers" patches - Some selftests improvements" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (379 commits) KVM: selftests: Speed up set_memory_region_test selftests: kvm: Fix the check of return value KVM: x86: Take advantage of kvm_arch_dy_has_pending_interrupt() KVM: SVM: Skip SEV cache flush if no ASIDs have been used KVM: SVM: Remove an unnecessary prototype declaration of sev_flush_asids() KVM: SVM: Drop redundant svm_sev_enabled() helper KVM: SVM: Move SEV VMCB tracking allocation to sev.c KVM: SVM: Explicitly check max SEV ASID during sev_hardware_setup() KVM: SVM: Unconditionally invoke sev_hardware_teardown() KVM: SVM: Enable SEV/SEV-ES functionality by default (when supported) KVM: SVM: Condition sev_enabled and sev_es_enabled on CONFIG_KVM_AMD_SEV=y KVM: SVM: Append "_enabled" to module-scoped SEV/SEV-ES control variables KVM: SEV: Mask CPUID[0x8000001F].eax according to supported features KVM: SVM: Move SEV module params/variables to sev.c KVM: SVM: Disable SEV/SEV-ES if NPT is disabled KVM: SVM: Free sev_asid_bitmap during init if SEV setup fails KVM: SVM: Zero out the VMCB array used to track SEV ASID association x86/sev: Drop redundant and potentially misleading 'sev_enabled' KVM: x86: Move reverse CPUID helpers to separate header file KVM: x86: Rename GPR accessors to make mode-aware variants the defaults ...
2021-04-30Merge branch 'akpm' (patches from Andrew)Linus Torvalds7-91/+118
Merge misc updates from Andrew Morton: "A few misc subsystems and some of MM. 175 patches. Subsystems affected by this patch series: ia64, kbuild, scripts, sh, ocfs2, kfifo, vfs, kernel/watchdog, and mm (slab-generic, slub, kmemleak, debug, pagecache, msync, gup, memremap, memcg, pagemap, mremap, dma, sparsemem, vmalloc, documentation, kasan, initialization, pagealloc, and memory-failure)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (175 commits) mm/memory-failure: unnecessary amount of unmapping mm/mmzone.h: fix existing kernel-doc comments and link them to core-api mm: page_alloc: ignore init_on_free=1 for debug_pagealloc=1 net: page_pool: use alloc_pages_bulk in refill code path net: page_pool: refactor dma_map into own function page_pool_dma_map SUNRPC: refresh rq_pages using a bulk page allocator SUNRPC: set rq_page_end differently mm/page_alloc: inline __rmqueue_pcplist mm/page_alloc: optimize code layout for __alloc_pages_bulk mm/page_alloc: add an array-based interface to the bulk page allocator mm/page_alloc: add a bulk page allocator mm/page_alloc: rename alloced to allocated mm/page_alloc: duplicate include linux/vmalloc.h mm, page_alloc: avoid page_to_pfn() in move_freepages() mm/Kconfig: remove default DISCONTIGMEM_MANUAL mm: page_alloc: dump migrate-failed pages mm/mempolicy: fix mpol_misplaced kernel-doc mm/mempolicy: rewrite alloc_pages_vma documentation mm/mempolicy: rewrite alloc_pages documentation mm/mempolicy: rename alloc_pages_current to alloc_pages ...
2021-04-30Merge tag 'pinctrl-v5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrlLinus Torvalds1-0/+1
Pull pin control updates from Linus Walleij: "There is a lot going on! Core changes: - A semantic change to handle pinmux and pinconf in explicit order while up until now we depended on the semantic order in the device tree. The device tree is a functional programming language and does not imply any order, so the right thing is for the pin control core to provide these semantics. - Add a new pinmux-select debugfs file which makes it possible to go in and select functions for a pin manually (iteratively, at the prompt) for debugging purposes. - Fixes to gpio regmap handling for a new pin control driver making use of regmap-gpio. - Use octal permissions on debugfs files. New drivers: - A massive rewrite of the former custom pin control driver for MIPS Broadcom devices to instead use the pin control subsystem. New pin control drivers for BCM6345, BCM6328, BCM6358, BCM6362, BCM6368, BCM63268 and BCM6318 SoC variants are implemented. - Support for PM8350, PM8350B, PM8350C, PMK8350, PMR735A and PMR735B in the Qualcomm PMIC GPIO driver. Also the two GPIOs on PM8008 are supported. - Support for the Rockchip RK3568/RK3566 pin controller. - Support for Ingenic JZ4730, JZ4750, JZ4755, JZ4775 and X2000. - Support for Mediatek MTK8195. - Add a new Xilinx ZynqMP pin control driver. Driver improvements and non-urgent fixes: - Modularization and improvements of the Rockchip drivers. - Some new pins added to the description of new Renesas SoCs. - Clarifications of the GPIO base calculation in the Intel driver. - Fix the function names for the MPP54 and MPP55 pins in the Armada CP110 pin controller. - GPIO wakeup interrupt map for Qualcomm SC7280 and SM8350. - Support for ACPI probing of the Qualcomm SC8180x. - Fix interrupt clear status on rockchip - Fix some missing pins on the Ingenic JZ4770, some semantic fixes for the behaviour of the Ingenic pin controller. Add DMIC pins for JZ4780, X1000, X1500 and X1830. - A slew of janitorial like of_node_put() calls" * tag 'pinctrl-v5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (99 commits) pinctrl: Add Xilinx ZynqMP pinctrl driver support firmware: xilinx: Add pinctrl support pinctrl: rockchip: do coding style for mux route struct pinctrl: Add PIN_CONFIG_MODE_PWM to enum pin_config_param pinctrl: Introduce MODE group in enum pin_config_param pinctrl: Keep enum pin_config_param ordered by name dt-bindings: pinctrl: Add binding for ZynqMP pinctrl driver pinctrl: core: Fix kernel doc string for pin_get_name() pinctrl: mediatek: use spin lock in mtk_rmw pinctrl: add drive for I2C related pins on MT8195 pinctrl: add pinctrl driver on mt8195 dt-bindings: pinctrl: mt8195: add pinctrl file and binding document pinctrl: Ingenic: Add pinctrl driver for X2000. pinctrl: Ingenic: Add pinctrl driver for JZ4775. pinctrl: Ingenic: Add pinctrl driver for JZ4755. pinctrl: Ingenic: Add pinctrl driver for JZ4750. pinctrl: Ingenic: Add pinctrl driver for JZ4730. dt-bindings: pinctrl: Add bindings for new Ingenic SoCs. pinctrl: Ingenic: Reformat the code. pinctrl: Ingenic: Add DMIC pins support for Ingenic SoCs. ...
2021-04-30Merge tag 'modules-for-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linuxLinus Torvalds1-5/+4
Pull module updates from Jessica Yu: "Fix an age old bug involving jump_calls and static_labels when CONFIG_MODULE_UNLOAD=n. When CONFIG_MODULE_UNLOAD=n, it means you can't unload modules, so normally the __exit sections of a module are not loaded at all. However, dynamic code patching (jump_label, static_call, alternatives) can have sites in __exit sections even if __exit is never executed. Reported by Peter Zijlstra: 'Alternatives, jump_labels and static_call all can have relocations into __exit code. Not loading it at all would be BAD.' Therefore, load the __exit sections even when CONFIG_MODULE_UNLOAD=n, and discard them after init" * tag 'modules-for-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: module: treat exit sections the same as init sections when !CONFIG_MODULE_UNLOAD
2021-04-30irq_work: record irq_work_queue() call stackZqiang1-1/+6
Add the irq_work_queue() call stack into the KASAN auxiliary stack in order to improve KASAN reports. this will let us know where the irq work be queued. Link: https://lkml.kernel.org/r/20210331063202.28770-1-qiang.zhang@windriver.com Signed-off-by: Zqiang <qiang.zhang@windriver.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Walter Wu <walter-zh.wu@mediatek.com> Cc: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30kasan: record task_work_add() call stackWalter Wu1-0/+3
Why record task_work_add() call stack? Syzbot reports many use-after-free issues for task_work, see [1]. After seeing the free stack and the current auxiliary stack, we think they are useless, we don't know where the work was registered. This work may be the free call stack, so we miss the root cause and don't solve the use-after-free. Add the task_work_add() call stack into the KASAN auxiliary stack in order to improve KASAN reports. It helps programmers solve use-after-free issues. [1]: https://groups.google.com/g/syzkaller-bugs/search?q=kasan%20use-after-free%20task_work_run Link: https://lkml.kernel.org/r/20210316024410.19967-1-walter-zh.wu@mediatek.com Signed-off-by: Walter Wu <walter-zh.wu@mediatek.com> Suggested-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30kernel/dma: remove unnecessary unmap_kernel_rangeNicholas Piggin1-1/+0
vunmap will remove ptes. Link: https://lkml.kernel.org/r/20210322021806.892164-3-npiggin@gmail.com Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Cédric Le Goater <clg@kaod.org> Cc: Uladzislau Rezki <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30cgroup: rstat: punt root-level optimization to individual controllersJohannes Weiner1-25/+36
Current users of the rstat code can source root-level statistics from the native counters of their respective subsystem, allowing them to forego aggregation at the root level. This optimization is currently implemented inside the generic rstat code, which doesn't track the root cgroup and doesn't invoke the subsystem flush callbacks on it. However, the memory controller cannot do this optimization, because cgroup1 breaks out memory specifically for the local level, including at the root level. In preparation for the memory controller switching to rstat, move the optimization from rstat core to the controllers. Afterwards, rstat will always track the root cgroup for changes and invoke the subsystem callbacks on it; and it's up to the subsystem to special-case and skip aggregation of the root cgroup if it can source this information through other, cheaper means. This is the case for the io controller and the cgroup base stats. In their respective flush callbacks, check whether the parent is the root cgroup, and if so, skip the unnecessary upward propagation. The extra cost of tracking the root cgroup is negligible: on stat changes, we actually remove a branch that checks for the root. The queueing for a flush touches only per-cpu data, and only the first stat change since a flush requires a (per-cpu) lock. Link: https://lkml.kernel.org/r/20210209163304.77088-6-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30cgroup: rstat: support cgroup1Johannes Weiner2-15/+21
Rstat currently only supports the default hierarchy in cgroup2. In order to replace memcg's private stats infrastructure - used in both cgroup1 and cgroup2 - with rstat, the latter needs to support cgroup1. The initialization and destruction callbacks for regular cgroups are already in place. Remove the cgroup_on_dfl() guards to handle cgroup1. The initialization of the root cgroup is currently hardcoded to only handle cgrp_dfl_root.cgrp. Move those callbacks to cgroup_setup_root() and cgroup_destroy_root() to handle the default root as well as the various cgroup1 roots we may set up during mounting. The linking of css to cgroups happens in code shared between cgroup1 and cgroup2 as well. Simply remove the cgroup_on_dfl() guard. Linkage of the root css to the root cgroup is a bit trickier: per default, the root css of a subsystem controller belongs to the default hierarchy (i.e. the cgroup2 root). When a controller is mounted in its cgroup1 version, the root css is stolen and moved to the cgroup1 root; on unmount, the css moves back to the default hierarchy. Annotate rebind_subsystems() to move the root css linkage along between roots. Link: https://lkml.kernel.org/r/20210209163304.77088-5-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Roman Gushchin <guro@fb.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Michal Koutný <mkoutny@suse.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30mm: memcontrol: fix kernel stack accountMuchun Song1-5/+8
For simplification commit 991e7673859e ("mm: memcontrol: account kernel stack per node") changed the per zone vmalloc backed stack pages accounting to per node. By doing that we have lost a certain precision because those pages might live in different NUMA nodes. In the end NR_KERNEL_STACK_KB exported to the userspace might be over estimated on some nodes while underestimated on others. But this is not a real world problem, just a problem found by reading the code. So there is no actual data to showing how much impact it has on users. This doesn't impose any real problem to correctnes of the kernel behavior as the counter is not used for any internal processing but it can cause some confusion to the userspace. Address the problem by accounting each vmalloc backing page to its own node. Link: https://lkml.kernel.org/r/20210303151843.81156-1-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30watchdog: cleanup handling of false positivesPetr Mladek1-12/+8
Commit d6ad3e286d2c ("softlockup: Add sched_clock_tick() to avoid kernel warning on kgdb resume") introduced touch_softlockup_watchdog_sync(). It solved a problem when the watchdog was touched in an atomic context, the timer callback was proceed right after releasing interrupts, and the local clock has not been updated yet. In this case, sched_clock_tick() was called in watchdog_timer_fn() before updating the timer. So far so good. Later commit 5d1c0f4a80a6 ("watchdog: add check for suspended vm in softlockup detector") added two kvm_check_and_clear_guest_paused() calls. They touch the watchdog when the guest has been sleeping. The code makes my head spin around. Scenario 1: + guest did sleep: + PVCLOCK_GUEST_STOPPED is set + 1st watchdog_timer_fn() invocation: + the watchdog is not touched yet + is_softlockup() returns too big delay + kvm_check_and_clear_guest_paused(): + clear PVCLOCK_GUEST_STOPPED + call touch_softlockup_watchdog_sync() + set SOFTLOCKUP_DELAY_REPORT + set softlockup_touch_sync + return from the timer callback + 2nd watchdog_timer_fn() invocation: + call sched_clock_tick() even though it is not needed. The timer callback was invoked again only because the clock has already been updated in the meantime. + call kvm_check_and_clear_guest_paused() that does nothing because PVCLOCK_GUEST_STOPPED has been cleared already. + call update_report_ts() and return. This is fine. Except that sched_clock_tick() might allow to set it already during the 1st invocation. Scenario 2: + guest did sleep + 1st watchdog_timer_fn() invocation + same as in 1st scenario + guest did sleep again: + set PVCLOCK_GUEST_STOPPED again + 2nd watchdog_timer_fn() invocation + SOFTLOCKUP_DELAY_REPORT is set from 1st invocation + call sched_clock_tick() + call kvm_check_and_clear_guest_paused() + clear PVCLOCK_GUEST_STOPPED + call touch_softlockup_watchdog_sync() + set SOFTLOCKUP_DELAY_REPORT + set softlockup_touch_sync + call update_report_ts() (set real timestamp immediately) + return from the timer callback + 3rd watchdog_timer_fn() invocation + timestamp is set from 2nd invocation + softlockup_touch_sync is set but not checked because the real timestamp is already set Make the code more straightforward: 1. Always call kvm_check_and_clear_guest_paused() at the very beginning to handle PVCLOCK_GUEST_STOPPED. It touches the watchdog when the quest did sleep. 2. Handle the situation when the watchdog has been touched (SOFTLOCKUP_DELAY_REPORT is set). Call sched_clock_tick() when touch_*sync() variant was used. It makes sure that the timestamp will be up to date even when it has been touched in atomic context or quest did sleep. As a result, kvm_check_and_clear_guest_paused() is called on a single location. And the right timestamp is always set when returning from the timer callback. Link: https://lkml.kernel.org/r/20210311122130.6788-7-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Laurence Oberman <loberman@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30watchdog: fix barriers when printing backtraces from all CPUsPetr Mladek1-11/+6
Any parallel softlockup reports are skipped when one CPU is already printing backtraces from all CPUs. The exclusive rights are synchronized using one bit in soft_lockup_nmi_warn. There is also one memory barrier that does not make much sense. Use two barriers on the right location to prevent mixing two reports. [pmladek@suse.com: use bit lock operations to prevent multiple soft-lockup reports] Link: https://lkml.kernel.org/r/YFSVsLGVWMXTvlbk@alley Link: https://lkml.kernel.org/r/20210311122130.6788-6-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Laurence Oberman <loberman@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30watchdog/softlockup: remove logic that tried to prevent repeated reportsPetr Mladek1-12/+2
The softlockup detector does some gymnastic with the variable soft_watchdog_warn. It was added by the commit 58687acba59266735ad ("lockup_detector: Combine nmi_watchdog and softlockup detector"). The purpose is not completely clear. There are the following clues. They describe the situation how it looked after the above mentioned commit: 1. The variable was checked with a comment "only warn once". 2. The variable was set when softlockup was reported. It was cleared only when the CPU was not longer in the softlockup state. 3. watchdog_touch_ts was not explicitly updated when the softlockup was reported. Without this variable, the report would normally be printed again during every following watchdog_timer_fn() invocation. The logic has got even more tangled up by the commit ed235875e2ca98 ("kernel/watchdog.c: print traces for all cpus on lockup detection"). After this commit, soft_watchdog_warn is set only when softlockup_all_cpu_backtrace is enabled. But multiple reports from all CPUs are prevented by a new variable soft_lockup_nmi_warn. Conclusion: The variable probably never worked as intended. In each case, it has not worked last many years because the softlockup was reported repeatedly after the full period defined by watchdog_thresh. The reason is that watchdog gets touched in many known slow paths, for example, in printk_stack_address(). This code is called also when printing the softlockup report. It means that the watchdog timestamp gets updated after each report. Solution: Simply remove the logic. People want the periodic report anyway. Link: https://lkml.kernel.org/r/20210311122130.6788-5-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Laurence Oberman <loberman@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30watchdog/softlockup: report the overall time of softlockupsPetr Mladek1-12/+28
The softlockup detector currently shows the time spent since the last report. As a result it is not clear whether a CPU is infinitely hogged by a single task or if it is a repeated event. The situation can be simulated with a simply busy loop: while (true) cpu_relax(); The softlockup detector produces: [ 168.277520] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [cat:4865] [ 196.277604] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [cat:4865] [ 236.277522] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [cat:4865] But it should be, something like: [ 480.372418] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [cat:4943] [ 508.372359] watchdog: BUG: soft lockup - CPU#2 stuck for 52s! [cat:4943] [ 548.372359] watchdog: BUG: soft lockup - CPU#2 stuck for 89s! [cat:4943] [ 576.372351] watchdog: BUG: soft lockup - CPU#2 stuck for 115s! [cat:4943] For the better output, add an additional timestamp of the last report. Only this timestamp is reset when the watchdog is intentionally touched from slow code paths or when printing the report. Link: https://lkml.kernel.org/r/20210311122130.6788-4-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Laurence Oberman <loberman@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30watchdog: explicitly update timestamp when reporting softlockupPetr Mladek1-0/+3
The softlockup situation might stay for a long time or even forever. When it happens, the softlockup debug messages are printed in regular intervals defined by get_softlockup_thresh(). There is a mystery. The repeated message is printed after the full interval that is defined by get_softlockup_thresh(). But the timer callback is called more often as defined by sample_period. The code looks like the soflockup should get reported in every sample_period when it was once behind the thresh. It works only by chance. The watchdog is touched when printing the stall report, for example, in printk_stack_address(). Make the behavior clear and predictable by explicitly updating the timestamp in watchdog_timer_fn() when the report gets printed. Link: https://lkml.kernel.org/r/20210311122130.6788-3-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Laurence Oberman <loberman@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30watchdog: rename __touch_watchdog() to a better descriptive namePetr Mladek1-4/+4
Patch series "watchdog/softlockup: Report overall time and some cleanup", v2. I dug deep into the softlockup watchdog history when time permitted this year. And reworked the patchset that fixed timestamps and cleaned up the code[2]. I split it into very small steps and did even more code clean up. The result looks quite strightforward and I am pretty confident with the changes. [1] v2: https://lore.kernel.org/r/20201210160038.31441-1-pmladek@suse.com [2] v1: https://lore.kernel.org/r/20191024114928.15377-1-pmladek@suse.com This patch (of 6): There are many touch_*watchdog() functions. They are called in situations where the watchdog could report false positives or create unnecessary noise. For example, when CPU is entering idle mode, a virtual machine is stopped, or a lot of messages are printed in the atomic context. These functions set SOFTLOCKUP_RESET instead of a real timestamp. It allows to call them even in a context where jiffies might be outdated. For example, in an atomic context. The real timestamp is set by __touch_watchdog() that is called from the watchdog timer callback. Rename this callback to update_touch_ts(). It better describes the effect and clearly distinguish is from the other touch_*watchdog() functions. Another motivation is that two timestamps are going to be used. One will be used for the total softlockup time. The other will be used to measure time since the last report. The new function name will help to distinguish which timestamp is being updated. Link: https://lkml.kernel.org/r/20210311122130.6788-1-pmladek@suse.com Link: https://lkml.kernel.org/r/20210311122130.6788-2-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Laurence Oberman <loberman@redhat.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-29Merge tag 'kconfig-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuildLinus Torvalds1-0/+1
Pull Kconfig updates from Masahiro Yamada: - Change 'option defconfig' to the environment variable KCONFIG_DEFCONFIG_LIST - Refactor tinyconfig without using allnoconfig_y - Remove 'option allnoconfig_y' syntax - Change 'option modules' to 'modules' - Do not use /boot/config-* etc. as base config for cross-compilation - Fix a search bug in nconf - Various code cleanups * tag 'kconfig-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits) kconfig: refactor .gitignore kconfig: highlight xconfig 'comment' lines with '***' kconfig: highlight gconfig 'comment' lines with '***' kconfig: gconf: remove unused code kconfig: remove unused PACKAGE definition kconfig: nconf: stop endless search loops kconfig: split menu.c out of parser.y kconfig: nconf: refactor in print_in_middle() kconfig: nconf: remove meaningless wattrset() call from show_menu() kconfig: nconf: change set_config_filename() to void function kconfig: nconf: refactor attributes setup code kconfig: nconf: remove unneeded default for menu prompt kconfig: nconf: get rid of (void) casts from wattrset() calls kconfig: nconf: fix NORMAL attributes kconfig: mconf,nconf: remove unneeded '\0' termination after snprintf() kconfig: use /boot/config-* etc. as DEFCONFIG_LIST only for native build kconfig: change sym_change_count to a boolean flag kconfig: nconf: fix core dump when searching in empty menu kconfig: lxdialog: A spello fix and a punctuation added kconfig: streamline_config.pl: Couple of typo fixes ...
2021-04-29Merge tag 'kbuild-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuildLinus Torvalds1-1/+1
Pull Kbuild updates from Masahiro Yamada: - Evaluate $(call cc-option,...) etc. only for build targets - Add CONFIG_VMLINUX_MAP to generate .map file when linking vmlinux - Remove unnecessary --gcc-toolchains Clang flag because the --prefix flag finds the toolchains - Do not pass Clang's --prefix flag when using the integrated as - Check the assembler version in Kconfig time - Add new CONFIG options, AS_VERSION, AS_IS_GNU, AS_IS_LLVM to clean up some dependencies in Kconfig - Fix invalid Module.symvers creation when building only modules without vmlinux - Fix false-positive modpost warnings when CONFIG_TRIM_UNUSED_KSYMS is set, but there is no module to build - Refactor module installation Makefile - Support zstd for module compression - Convert alpha and ia64 to use generic shell scripts to generate the syscall headers - Add a new elfnote to indicate if the kernel was built with LTO, which will be used by pahole - Flatten the directory structure under include/config/ so CONFIG options and filenames match - Change the deb source package name from linux-$(KERNELRELEASE) to linux-upstream * tag 'kbuild-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (42 commits) kbuild: Add $(KBUILD_HOSTLDFLAGS) to 'has_libelf' test kbuild: deb-pkg: change the source package name to linux-upstream tools: do not include scripts/Kbuild.include kbuild: redo fake deps at include/config/*.h kbuild: remove TMPO from try-run MAINTAINERS: add pattern for dummy-tools kbuild: add an elfnote for whether vmlinux is built with lto ia64: syscalls: switch to generic syscallhdr.sh ia64: syscalls: switch to generic syscalltbl.sh alpha: syscalls: switch to generic syscallhdr.sh alpha: syscalls: switch to generic syscalltbl.sh sysctl: use min() helper for namecmp() kbuild: add support for zstd compressed modules kbuild: remove CONFIG_MODULE_COMPRESS kbuild: merge scripts/Makefile.modsign to scripts/Makefile.modinst kbuild: move module strip/compression code into scripts/Makefile.modinst kbuild: refactor scripts/Makefile.modinst kbuild: rename extmod-prefix to extmod_prefix kbuild: check module name conflict for external modules as well kbuild: show the target directory for depmod log ...
2021-04-29Merge tag 'net-next-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds23-649/+1735
Pull networking updates from Jakub Kicinski: "Core: - bpf: - allow bpf programs calling kernel functions (initially to reuse TCP congestion control implementations) - enable task local storage for tracing programs - remove the need to store per-task state in hash maps, and allow tracing programs access to task local storage previously added for BPF_LSM - add bpf_for_each_map_elem() helper, allowing programs to walk all map elements in a more robust and easier to verify fashion - sockmap: support UDP and cross-protocol BPF_SK_SKB_VERDICT redirection - lpm: add support for batched ops in LPM trie - add BTF_KIND_FLOAT support - mostly to allow use of BTF on s390 which has floats in its headers files - improve BPF syscall documentation and extend the use of kdoc parsing scripts we already employ for bpf-helpers - libbpf, bpftool: support static linking of BPF ELF files - improve support for encapsulation of L2 packets - xdp: restructure redirect actions to avoid a runtime lookup, improving performance by 4-8% in microbenchmarks - xsk: build skb by page (aka generic zerocopy xmit) - improve performance of software AF_XDP path by 33% for devices which don't need headers in the linear skb part (e.g. virtio) - nexthop: resilient next-hop groups - improve path stability on next-hops group changes (incl. offload for mlxsw) - ipv6: segment routing: add support for IPv4 decapsulation - icmp: add support for RFC 8335 extended PROBE messages - inet: use bigger hash table for IP ID generation - tcp: deal better with delayed TX completions - make sure we don't give up on fast TCP retransmissions only because driver is slow in reporting that it completed transmitting the original - tcp: reorder tcp_congestion_ops for better cache locality - mptcp: - add sockopt support for common TCP options - add support for common TCP msg flags - include multiple address ids in RM_ADDR - add reset option support for resetting one subflow - udp: GRO L4 improvements - improve 'forward' / 'frag_list' co-existence with UDP tunnel GRO, allowing the first to take place correctly even for encapsulated UDP traffic - micro-optimize dev_gro_receive() and flow dissection, avoid retpoline overhead on VLAN and TEB GRO - use less memory for sysctls, add a new sysctl type, to allow using u8 instead of "int" and "long" and shrink networking sysctls - veth: allow GRO without XDP - this allows aggregating UDP packets before handing them off to routing, bridge, OvS, etc. - allow specifing ifindex when device is moved to another namespace - netfilter: - nft_socket: add support for cgroupsv2 - nftables: add catch-all set element - special element used to define a default action in case normal lookup missed - use net_generic infra in many modules to avoid allocating per-ns memory unnecessarily - xps: improve the xps handling to avoid potential out-of-bound accesses and use-after-free when XPS change race with other re-configuration under traffic - add a config knob to turn off per-cpu netdev refcnt to catch underflows in testing Device APIs: - add WWAN subsystem to organize the WWAN interfaces better and hopefully start driving towards more unified and vendor- independent APIs - ethtool: - add interface for reading IEEE MIB stats (incl. mlx5 and bnxt support) - allow network drivers to dump arbitrary SFP EEPROM data, current offset+length API was a poor fit for modern SFP which define EEPROM in terms of pages (incl. mlx5 support) - act_police, flow_offload: add support for packet-per-second policing (incl. offload for nfp) - psample: add additional metadata attributes like transit delay for packets sampled from switch HW (and corresponding egress and policy-based sampling in the mlxsw driver) - dsa: improve support for sandwiched LAGs with bridge and DSA - netfilter: - flowtable: use direct xmit in topologies with IP forwarding, bridging, vlans etc. - nftables: counter hardware offload support - Bluetooth: - improvements for firmware download w/ Intel devices - add support for reading AOSP vendor capabilities - add support for virtio transport driver - mac80211: - allow concurrent monitor iface and ethernet rx decap - set priority and queue mapping for injected frames - phy: add support for Clause-45 PHY Loopback - pci/iov: add sysfs MSI-X vector assignment interface to distribute MSI-X resources to VFs (incl. mlx5 support) New hardware/drivers: - dsa: mv88e6xxx: add support for Marvell mv88e6393x - 11-port Ethernet switch with 8x 1-Gigabit Ethernet and 3x 10-Gigabit interfaces. - dsa: support for legacy Broadcom tags used on BCM5325, BCM5365 and BCM63xx switches - Microchip KSZ8863 and KSZ8873; 3x 10/100Mbps Ethernet switches - ath11k: support for QCN9074 a 802.11ax device - Bluetooth: Broadcom BCM4330 and BMC4334 - phy: Marvell 88X2222 transceiver support - mdio: add BCM6368 MDIO mux bus controller - r8152: support RTL8153 and RTL8156 (USB Ethernet) chips - mana: driver for Microsoft Azure Network Adapter (MANA) - Actions Semi Owl Ethernet MAC - can: driver for ETAS ES58X CAN/USB interfaces Pure driver changes: - add XDP support to: enetc, igc, stmmac - add AF_XDP support to: stmmac - virtio: - page_to_skb() use build_skb when there's sufficient tailroom (21% improvement for 1000B UDP frames) - support XDP even without dedicated Tx queues - share the Tx queues with the stack when necessary - mlx5: - flow rules: add support for mirroring with conntrack, matching on ICMP, GTP, flex filters and more - support packet sampling with flow offloads - persist uplink representor netdev across eswitch mode changes - allow coexistence of CQE compression and HW time-stamping - add ethtool extended link error state reporting - ice, iavf: support flow filters, UDP Segmentation Offload - dpaa2-switch: - move the driver out of staging - add spanning tree (STP) support - add rx copybreak support - add tc flower hardware offload on ingress traffic - ionic: - implement Rx page reuse - support HW PTP time-stamping - octeon: support TC hardware offloads - flower matching on ingress and egress ratelimitting. - stmmac: - add RX frame steering based on VLAN priority in tc flower - support frame preemption (FPE) - intel: add cross time-stamping freq difference adjustment - ocelot: - support forwarding of MRP frames in HW - support multiple bridges - support PTP Sync one-step timestamping - dsa: mv88e6xxx, dpaa2-switch: offload bridge port flags like learning, flooding etc. - ipa: add IPA v4.5, v4.9 and v4.11 support (Qualcomm SDX55, SM8350, SC7280 SoCs) - mt7601u: enable TDLS support - mt76: - add support for 802.3 rx frames (mt7915/mt7615) - mt7915 flash pre-calibration support - mt7921/mt7663 runtime power management fixes" * tag 'net-next-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2451 commits) net: selftest: fix build issue if INET is disabled net: netrom: nr_in: Remove redundant assignment to ns net: tun: Remove redundant assignment to ret net: phy: marvell: add downshift support for M88E1240 net: dsa: ksz: Make reg_mib_cnt a u8 as it never exceeds 255 net/sched: act_ct: Remove redundant ct get and check icmp: standardize naming of RFC 8335 PROBE constants bpf, selftests: Update array map tests for per-cpu batched ops bpf: Add batched ops support for percpu array bpf: Implement formatted output helpers with bstr_printf seq_file: Add a seq_bprintf function sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues net:nfc:digital: Fix a double free in digital_tg_recv_dep_req net: fix a concurrency bug in l2tp_tunnel_register() net/smc: Remove redundant assignment to rc mpls: Remove redundant assignment to err llc2: Remove redundant assignment to rc net/tls: Remove redundant initialization of record rds: Remove redundant assignment to nr_sig dt-bindings: net: mdio-gpio: add compatible for microchip,mdio-smi0 ...
2021-04-29Merge tag 'x86-mm-2021-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-161/+89
Pull x86 tlb updates from Ingo Molnar: "The x86 MM changes in this cycle were: - Implement concurrent TLB flushes, which overlaps the local TLB flush with the remote TLB flush. In testing this improved sysbench performance measurably by a couple of percentage points, especially if TLB-heavy security mitigations are active. - Further micro-optimizations to improve the performance of TLB flushes" * tag 'x86-mm-2021-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: smp: Micro-optimize smp_call_function_many_cond() smp: Inline on_each_cpu_cond() and on_each_cpu() x86/mm/tlb: Remove unnecessary uses of the inline keyword cpumask: Mark functions as pure x86/mm/tlb: Do not make is_lazy dirty for no reason x86/mm/tlb: Privatize cpu_tlbstate x86/mm/tlb: Flush remote and local TLBs concurrently x86/mm/tlb: Open-code on_each_cpu_cond_mask() for tlb_is_not_lazy() x86/mm/tlb: Unify flush_tlb_func_local() and flush_tlb_func_remote() smp: Run functions concurrently in smp_call_function_many_cond()
2021-04-29Merge tag 'fsnotify_for_v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fsLinus Torvalds2-1/+15
Pull fsnotify updates from Jan Kara: - support for limited fanotify functionality for unpriviledged users - faster merging of fanotify events - a few smaller fsnotify improvements * tag 'fsnotify_for_v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: shmem: allow reporting fanotify events with file handles on tmpfs fs: introduce a wrapper uuid_to_fsid() fanotify_user: use upper_32_bits() to verify mask fanotify: support limited functionality for unprivileged users fanotify: configurable limits via sysfs fanotify: limit number of event merge attempts fsnotify: use hash table for faster events merge fanotify: mix event info and pid into merge key hash fanotify: reduce event objectid to 29-bit hash fsnotify: allow fsnotify_{peek,remove}_first_event with empty queue
2021-04-29Merge tag 'for_v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fsLinus Torvalds1-0/+1
Pull quota, ext2, reiserfs updates from Jan Kara: - support for path (instead of device) based quotactl syscall (quotactl_path(2)) - ext2 conversion to kmap_local() - other minor cleanups & fixes * tag 'for_v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fs/reiserfs/journal.c: delete useless variables fs/ext2: Replace kmap() with kmap_local_page() ext2: Match up ext2_put_page() with ext2_dotdot() and ext2_find_entry() fs/ext2/: fix misspellings using codespell tool quota: report warning limits for realtime space quotas quota: wire up quotactl_path quota: Add mountpath based quota support
2021-04-28Merge tag 'for-5.13/io_uring-2021-04-27' of git://git.kernel.dk/linux-blockLinus Torvalds2-8/+30
Pull io_uring updates from Jens Axboe: - Support for multi-shot mode for POLL requests - More efficient reference counting. This is shamelessly stolen from the mm side. Even though referencing is mostly single/dual user, the 128 count was retained to keep the code the same. Maybe this should/could be made generic at some point. - Removal of the need to have a manager thread for each ring. The manager threads only job was checking and creating new io-threads as needed, instead we handle this from the queue path. - Allow SQPOLL without CAP_SYS_ADMIN or CAP_SYS_NICE. Since 5.12, this thread is "just" a regular application thread, so no need to restrict use of it anymore. - Cleanup of how internal async poll data lifetime is managed. - Fix for syzbot reported crash on SQPOLL cancelation. - Make buffer registration more like file registrations, which includes flexibility in avoiding full set unregistration and re-registration. - Fix for io-wq affinity setting. - Be a bit more defensive in task->pf_io_worker setup. - Various SQPOLL fixes. - Cleanup of SQPOLL creds handling. - Improvements to in-flight request tracking. - File registration cleanups. - Tons of cleanups and little fixes * tag 'for-5.13/io_uring-2021-04-27' of git://git.kernel.dk/linux-block: (156 commits) io_uring: maintain drain logic for multishot poll requests io_uring: Check current->io_uring in io_uring_cancel_sqpoll io_uring: fix NULL reg-buffer io_uring: simplify SQPOLL cancellations io_uring: fix work_exit sqpoll cancellations io_uring: Fix uninitialized variable up.resv io_uring: fix invalid error check after malloc io_uring: io_sq_thread() no longer needs to reset current->pf_io_worker kernel: always initialize task->pf_io_worker to NULL io_uring: update sq_thread_idle after ctx deleted io_uring: add full-fledged dynamic buffers support io_uring: implement fixed buffers registration similar to fixed files io_uring: prepare fixed rw for dynanic buffers io_uring: keep table of pointers to ubufs io_uring: add generic rsrc update with tags io_uring: add IORING_REGISTER_RSRC io_uring: enumerate dynamic resources io_uring: add generic path for rsrc update io_uring: preparation for rsrc tagging io_uring: decouple CQE filling from requests ...
2021-04-28Merge tag 'sched-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds29-854/+1059
Pull scheduler updates from Ingo Molnar: - Clean up SCHED_DEBUG: move the decades old mess of sysctl, procfs and debugfs interfaces to a unified debugfs interface. - Signals: Allow caching one sigqueue object per task, to improve performance & latencies. - Improve newidle_balance() irq-off latencies on systems with a large number of CPU cgroups. - Improve energy-aware scheduling - Improve the PELT metrics for certain workloads - Reintroduce select_idle_smt() to improve load-balancing locality - but without the previous regressions - Add 'scheduler latency debugging': warn after long periods of pending need_resched. This is an opt-in feature that requires the enabling of the LATENCY_WARN scheduler feature, or the use of the resched_latency_warn_ms=xx boot parameter. - CPU hotplug fixes for HP-rollback, and for the 'fail' interface. Fix remaining balance_push() vs. hotplug holes/races - PSI fixes, plus allow /proc/pressure/ files to be written by CAP_SYS_RESOURCE tasks as well - Fix/improve various load-balancing corner cases vs. capacity margins - Fix sched topology on systems with NUMA diameter of 3 or above - Fix PF_KTHREAD vs to_kthread() race - Minor rseq optimizations - Misc cleanups, optimizations, fixes and smaller updates * tag 'sched-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (61 commits) cpumask/hotplug: Fix cpu_dying() state tracking kthread: Fix PF_KTHREAD vs to_kthread() race sched/debug: Fix cgroup_path[] serialization sched,psi: Handle potential task count underflow bugs more gracefully sched: Warn on long periods of pending need_resched sched/fair: Move update_nohz_stats() to the CONFIG_NO_HZ_COMMON block to simplify the code & fix an unused function warning sched/debug: Rename the sched_debug parameter to sched_verbose sched,fair: Alternative sched_slice() sched: Move /proc/sched_debug to debugfs sched,debug: Convert sysctl sched_domains to debugfs debugfs: Implement debugfs_create_str() sched,preempt: Move preempt_dynamic to debug.c sched: Move SCHED_DEBUG sysctl to debugfs sched: Don't make LATENCYTOP select SCHED_DEBUG sched: Remove sched_schedstats sysctl out from under SCHED_DEBUG sched/numa: Allow runtime enabling/disabling of NUMA balance without SCHED_DEBUG sched: Use cpu_dying() to fix balance_push vs hotplug-rollback cpumask: Introduce DYING mask cpumask: Make cpu_{online,possible,present,active}() inline rseq: Optimise rseq_get_rseq_cs() and clear_rseq_cs() ...
2021-04-28Merge tag 'perf-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds4-111/+292
Pull perf event updates from Ingo Molnar: - Improve Intel uncore PMU support: - Parse uncore 'discovery tables' - a new hardware capability enumeration method introduced on the latest Intel platforms. This table is in a well-defined PCI namespace location and is read via MMIO. It is organized in an rbtree. These uncore tables will allow the discovery of standard counter blocks, but fancier counters still need to be enumerated explicitly. - Add Alder Lake support - Improve IIO stacks to PMON mapping support on Skylake servers - Add Intel Alder Lake PMU support - which requires the introduction of 'hybrid' CPUs and PMUs. Alder Lake is a mix of Golden Cove ('big') and Gracemont ('small' - Atom derived) cores. The CPU-side feature set is entirely symmetrical - but on the PMU side there's core type dependent PMU functionality. - Reduce data loss with CPU level hardware tracing on Intel PT / AUX profiling, by fixing the AUX allocation watermark logic. - Improve ring buffer allocation on NUMA systems - Put 'struct perf_event' into their separate kmem_cache pool - Add support for synchronous signals for select perf events. The immediate motivation is to support low-overhead sampling-based race detection for user-space code. The feature consists of the following main changes: - Add thread-only event inheritance via perf_event_attr::inherit_thread, which limits inheritance of events to CLONE_THREAD. - Add the ability for events to not leak through exec(), via perf_event_attr::remove_on_exec. - Allow the generation of SIGTRAP via perf_event_attr::sigtrap, extend siginfo with an u64 ::si_perf, and add the breakpoint information to ::si_addr and ::si_perf if the event is PERF_TYPE_BREAKPOINT. The siginfo support is adequate for breakpoints right now - but the new field can be used to introduce support for other types of metadata passed over siginfo as well. - Misc fixes, cleanups and smaller updates. * tag 'perf-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits) signal, perf: Add missing TRAP_PERF case in siginfo_layout() signal, perf: Fix siginfo_t by avoiding u64 on 32-bit architectures perf/x86: Allow for 8<num_fixed_counters<16 perf/x86/rapl: Add support for Intel Alder Lake perf/x86/cstate: Add Alder Lake CPU support perf/x86/msr: Add Alder Lake CPU support perf/x86/intel/uncore: Add Alder Lake support perf: Extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE perf/x86/intel: Add Alder Lake Hybrid support perf/x86: Support filter_match callback perf/x86/intel: Add attr_update for Hybrid PMUs perf/x86: Add structures for the attributes of Hybrid PMUs perf/x86: Register hybrid PMUs perf/x86: Factor out x86_pmu_show_pmu_cap perf/x86: Remove temporary pmu assignment in event_init perf/x86/intel: Factor out intel_pmu_check_extra_regs perf/x86/intel: Factor out intel_pmu_check_event_constraints perf/x86/intel: Factor out intel_pmu_check_num_counters perf/x86: Hybrid PMU support for extra_regs perf/x86: Hybrid PMU support for event constraints ...
2021-04-28Merge tag 'locking-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds27-764/+707
Pull locking updates from Ingo Molnar: - rtmutex cleanup & spring cleaning pass that removes ~400 lines of code - Futex simplifications & cleanups - Add debugging to the CSD code, to help track down a tenacious race (or hw problem) - Add lockdep_assert_not_held(), to allow code to require a lock to not be held, and propagate this into the ath10k driver - Misc LKMM documentation updates - Misc KCSAN updates: cleanups & documentation updates - Misc fixes and cleanups - Fix locktorture bugs with ww_mutexes * tag 'locking-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits) kcsan: Fix printk format string static_call: Relax static_call_update() function argument type static_call: Fix unused variable warn w/o MODULE locking/rtmutex: Clean up signal handling in __rt_mutex_slowlock() locking/rtmutex: Restrict the trylock WARN_ON() to debug locking/rtmutex: Fix misleading comment in rt_mutex_postunlock() locking/rtmutex: Consolidate the fast/slowpath invocation locking/rtmutex: Make text section and inlining consistent locking/rtmutex: Move debug functions as inlines into common header locking/rtmutex: Decrapify __rt_mutex_init() locking/rtmutex: Remove pointless CONFIG_RT_MUTEXES=n stubs locking/rtmutex: Inline chainwalk depth check locking/rtmutex: Move rt_mutex_debug_task_free() to rtmutex.c locking/rtmutex: Remove empty and unused debug stubs locking/rtmutex: Consolidate rt_mutex_init() locking/rtmutex: Remove output from deadlock detector locking/rtmutex: Remove rtmutex deadlock tester leftovers locking/rtmutex: Remove rt_mutex_timed_lock() MAINTAINERS: Add myself as futex reviewer locking/mutex: Remove repeated declaration ...
2021-04-28Merge tag 'core-rcu-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds11-166/+457
Pull RCU updates from Ingo Molnar: - Support for "N" as alias for last bit in bitmap parsing library (eg using syntax like "nohz_full=2-N") - kvfree_rcu updates - mm_dump_obj() updates. (One of these is to mm, but was suggested by Andrew Morton.) - RCU callback offloading update - Polling RCU grace-period interfaces - Realtime-related RCU updates - Tasks-RCU updates - Torture-test updates - Torture-test scripting updates - Miscellaneous fixes * tag 'core-rcu-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (77 commits) rcutorture: Test start_poll_synchronize_rcu() and poll_state_synchronize_rcu() rcu: Provide polling interfaces for Tiny RCU grace periods torture: Fix kvm.sh --datestamp regex check torture: Consolidate qemu-cmd duration editing into kvm-transform.sh torture: Print proper vmlinux path for kvm-again.sh runs torture: Make TORTURE_TRUST_MAKE available in kvm-again.sh environment torture: Make kvm-transform.sh update jitter commands torture: Add --duration argument to kvm-again.sh torture: Add kvm-again.sh to rerun a previous torture-test torture: Create a "batches" file for build reuse torture: De-capitalize TORTURE_SUITE torture: Make upper-case-only no-dot no-slash scenario names official torture: Rename SRCU-t and SRCU-u to avoid lowercase characters torture: Remove no-mpstat error message torture: Record kvm-test-1-run.sh and kvm-test-1-run-qemu.sh PIDs torture: Record jitter start/stop commands torture: Extract kvm-test-1-run-qemu.sh from kvm-test-1-run.sh torture: Record TORTURE_KCONFIG_GDB_ARG in qemu-cmd torture: Abstract jitter.sh start/stop into scripts rcu: Provide polling interfaces for Tree RCU grace periods ...
2021-04-27Merge branch 'for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroupLinus Torvalds4-4/+412
Pull cgroup changes from Tejun Heo: "The only notable change is Vipin's new misc cgroup controller. This implements generic support for resources which can be controlled by simply counting and limiting the number of resource instances - ie there's X number of these on the system and this cgroup subtree can have upto Y of those. The first user is the address space IDs used for virtual machine memory encryption and expected future usages are similar - niche hardware features with concrete resource limits and simple usage models" * 'for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: use tsk->in_iowait instead of delayacct_is_task_waiting_on_io() cgroup/cpuset: fix typos in comments cgroup: misc: mark dummy misc_cg_res_total_usage() static inline svm/sev: Register SEV and SEV-ES ASIDs to the misc controller cgroup: Miscellaneous cgroup documentation. cgroup: Add misc cgroup controller
2021-04-27Merge tag 'livepatching-for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatchingLinus Torvalds2-6/+3
Pull livepatching update from Petr Mladek: - Use TIF_NOTIFY_SIGNAL infrastructure instead of the fake signal * tag 'livepatching-for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching: livepatch: Replace the fake signal sending with TIF_NOTIFY_SIGNAL infrastructure
2021-04-27Merge tag 'printk-for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linuxLinus Torvalds4-268/+257
Pull printk updates from Petr Mladek: - Stop synchronizing kernel log buffer readers by logbuf_lock. As a result, the access to the buffer is fully lockless now. Note that printk() itself still uses locks because it tries to flush the messages to the console immediately. Also the per-CPU temporary buffers are still there because they prevent infinite recursion and serialize backtraces from NMI. All this is going to change in the future. - kmsg_dump API rework and cleanup as a side effect of the logbuf_lock removal. - Make bstr_printf() aware that %pf and %pF formats could deference the given pointer. - Show also page flags by %pGp format. - Clarify the documentation for plain pointer printing. - Do not show no_hash_pointers warning multiple times. - Update Senozhatsky email address. - Some clean up. * tag 'printk-for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: (24 commits) lib/vsprintf.c: remove leftover 'f' and 'F' cases from bstr_printf() printk: clarify the documentation for plain pointer printing kernel/printk.c: Fixed mundane typos printk: rename vprintk_func to vprintk vsprintf: dump full information of page flags in pGp mm, slub: don't combine pr_err with INFO mm, slub: use pGp to print page flags MAINTAINERS: update Senozhatsky email address lib/vsprintf: do not show no_hash_pointers message multiple times printk: console: remove unnecessary safe buffer usage printk: kmsg_dump: remove _nolock() variants printk: remove logbuf_lock printk: introduce a kmsg_dump iterator printk: kmsg_dumper: remove @active field printk: add syslog_lock printk: use atomic64_t for devkmsg_user.seq printk: use seqcount_latch for clear_seq printk: introduce CONSOLE_LOG_MAX printk: consolidate kmsg_dump_get_buffer/syslog_print_all code printk: refactor kmsg_dump_get_buffer() ...
2021-04-27Merge tag 'kgdb-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linuxLinus Torvalds5-309/+381
Pull kgdb updates from Daniel Thompson: "Exclusively tidy ups this cycle. Most of them are thanks to Sumit Garg and, as it happens, the clean ups do result in a slight increase in the line count. This is due to registering kdb commands using data structures rather than function calls which, in turn, simplifies the memory management during command registration. In addition to changes to command registration we also have some dead code removal, a clearer implementation of environment variable handling and a typo fix" * tag 'kgdb-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux: kdb: Refactor env variables get/set code kernel: debug: Ordinary typo fixes in the file gdbstub.c kdb: Simplify kdb commands registration kdb: Remove redundant function definitions/prototypes
2021-04-28bpf: Add batched ops support for percpu arrayPedro Tammela1-0/+2
Uses the already in-place infrastructure provided by the 'generic_map_*_batch' functions. No tweak was needed as it transparently handles the percpu variant. As arrays don't have delete operations, let it return a error to user space (default behaviour). Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210424214510.806627-2-pctammela@mojatatu.com
2021-04-27bpf: Implement formatted output helpers with bstr_printfFlorent Revest3-113/+111
BPF has three formatted output helpers: bpf_trace_printk, bpf_seq_printf and bpf_snprintf. Their signatures specify that all arguments are provided from the BPF world as u64s (in an array or as registers). All of these helpers are currently implemented by calling functions such as snprintf() whose signatures take a variable number of arguments, then placed in a va_list by the compiler to call vsnprintf(). "d9c9e4db bpf: Factorize bpf_trace_printk and bpf_seq_printf" introduced a bpf_printf_prepare function that fills an array of u64 sanitized arguments with an array of "modifiers" which indicate what the "real" size of each argument should be (given by the format specifier). The BPF_CAST_FMT_ARG macro consumes these arrays and casts each argument to its real size. However, the C promotion rules implicitely cast them all back to u64s. Therefore, the arguments given to snprintf are u64s and the va_list constructed by the compiler will use 64 bits for each argument. On 64 bit machines, this happens to work well because 32 bit arguments in va_lists need to occupy 64 bits anyway, but on 32 bit architectures this breaks the layout of the va_list expected by the called function and mangles values. In "88a5c690b6 bpf: fix bpf_trace_printk on 32 bit archs", this problem had been solved for bpf_trace_printk only with a "horrid workaround" that emitted multiple calls to trace_printk where each call had different argument types and generated different va_list layouts. One of the call would be dynamically chosen at runtime. This was ok with the 3 arguments that bpf_trace_printk takes but bpf_seq_printf and bpf_snprintf accept up to 12 arguments. Because this approach scales code exponentially, it is not a viable option anymore. Because the promotion rules are part of the language and because the construction of a va_list is an arch-specific ABI, it's best to just avoid variadic arguments and va_lists altogether. Thankfully the kernel's snprintf() has an alternative in the form of bstr_printf() that accepts arguments in a "binary buffer representation". These binary buffers are currently created by vbin_printf and used in the tracing subsystem to split the cost of printing into two parts: a fast one that only dereferences and remembers values, and a slower one, called later, that does the pretty-printing. This patch refactors bpf_printf_prepare to construct binary buffers of arguments consumable by bstr_printf() instead of arrays of arguments and modifiers. This gets rid of BPF_CAST_FMT_ARG and greatly simplifies the bpf_printf_prepare usage but there are a few gotchas that change how bpf_printf_prepare needs to do things. Currently, bpf_printf_prepare uses a per cpu temporary buffer as a generic storage for strings and IP addresses. With this refactoring, the temporary buffers now holds all the arguments in a structured binary format. To comply with the format expected by bstr_printf, certain format specifiers also need to be pre-formatted: %pB and %pi6/%pi4/%pI4/%pI6. Because vsnprintf subroutines for these specifiers are hard to expose, we pre-format these arguments with calls to snprintf(). Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Florent Revest <revest@chromium.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210427174313.860948-3-revest@chromium.org
2021-04-27Merge tag 'audit-pr-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/auditLinus Torvalds2-13/+10
Pull audit updates from Paul Moore: "Another small pull request for audit, most of the patches are documentation updates with only two real code changes: one to fix a compiler warning for a dummy function/macro, and one to cleanup some code since we removed the AUDIT_FILTER_ENTRY ages ago (v4.17)" * tag 'audit-pr-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: drop /proc/PID/loginuid documentation Format field audit: avoid -Wempty-body warning audit: document /proc/PID/sessionid audit: document /proc/PID/loginuid MAINTAINERS: update audit files audit: further cleanup of AUDIT_FILTER_ENTRY deprecation
2021-04-27Merge tag 'selinux-pr-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinuxLinus Torvalds4-8/+10
Pull selinux updates from Paul Moore: - Add support for measuring the SELinux state and policy capabilities using IMA. - A handful of SELinux/NFS patches to compare the SELinux state of one mount with a set of mount options. Olga goes into more detail in the patch descriptions, but this is important as it allows more flexibility when using NFS and SELinux context mounts. - Properly differentiate between the subjective and objective LSM credentials; including support for the SELinux and Smack. My clumsy attempt at a proper fix for AppArmor didn't quite pass muster so John is working on a proper AppArmor patch, in the meantime this set of patches shouldn't change the behavior of AppArmor in any way. This change explains the bulk of the diffstat beyond security/. - Fix a problem where we were not properly terminating the permission list for two SELinux object classes. * tag 'selinux-pr-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: add proper NULL termination to the secclass_map permissions smack: differentiate between subjective and objective task credentials selinux: clarify task subjective and objective credentials lsm: separate security_task_getsecid() into subjective and objective variants nfs: account for selinux security context when deciding to share superblock nfs: remove unneeded null check in nfs_fill_super() lsm,selinux: add new hook to compare new mount to an existing mount selinux: fix misspellings using codespell tool selinux: fix misspellings using codespell tool selinux: measure state and policy capabilities selinux: Allow context mounts for unpriviliged overlayfs
2021-04-27Merge tag 'cfi-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linuxLinus Torvalds6-7/+429
Pull CFI on arm64 support from Kees Cook: "This builds on last cycle's LTO work, and allows the arm64 kernels to be built with Clang's Control Flow Integrity feature. This feature has happily lived in Android kernels for almost 3 years[1], so I'm excited to have it ready for upstream. The wide diffstat is mainly due to the treewide fixing of mismatched list_sort prototypes. Other things in core kernel are to address various CFI corner cases. The largest code portion is the CFI runtime implementation itself (which will be shared by all architectures implementing support for CFI). The arm64 pieces are Acked by arm64 maintainers rather than coming through the arm64 tree since carrying this tree over there was going to be awkward. CFI support for x86 is still under development, but is pretty close. There are a handful of corner cases on x86 that need some improvements to Clang and objtool, but otherwise works well. Summary: - Clean up list_sort prototypes (Sami Tolvanen) - Introduce CONFIG_CFI_CLANG for arm64 (Sami Tolvanen)" * tag 'cfi-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: arm64: allow CONFIG_CFI_CLANG to be selected KVM: arm64: Disable CFI for nVHE arm64: ftrace: use function_nocfi for ftrace_call arm64: add __nocfi to __apply_alternatives arm64: add __nocfi to functions that jump to a physical address arm64: use function_nocfi with __pa_symbol arm64: implement function_nocfi psci: use function_nocfi for cpu_resume lkdtm: use function_nocfi treewide: Change list_sort to use const pointers bpf: disable CFI in dispatcher functions kallsyms: strip ThinLTO hashes from static functions kthread: use WARN_ON_FUNCTION_MISMATCH workqueue: use WARN_ON_FUNCTION_MISMATCH module: ensure __cfi_check alignment mm: add generic function_nocfi macro cfi: add __cficanonical add support for Clang CFI
2021-04-27Merge tag 'seccomp-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linuxLinus Torvalds1-1/+1
Pull seccomp updates from Kees Cook: - Fix "cacheable" typo in comments (Cui GaoSheng) - Fix CONFIG for /proc/$pid/status Seccomp_filters (Kenta.Tada@sony.com) * tag 'seccomp-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: seccomp: Fix "cacheable" typo in comments seccomp: Fix CONFIG tests for Seccomp_filters
2021-04-27bpf, cpumap: Bulk skb using netif_receive_skb_listLorenzo Bianconi1-9/+9
Rely on netif_receive_skb_list routine to send skbs converted from xdp_frames in cpu_map_kthread_run in order to improve i-cache usage. The proposed patch has been tested running xdp_redirect_cpu bpf sample available in the kernel tree that is used to redirect UDP frames from ixgbe driver to a cpumap entry and then to the networking stack. UDP frames are generated using pktgen. Packets are discarded by the UDP layer. $ xdp_redirect_cpu --cpu <cpu> --progname xdp_cpu_map0 --dev <eth> bpf-next: ~2.35Mpps bpf-next + cpumap skb-list: ~2.72Mpps Rename drops counter in kmem_alloc_drops since now it reports just kmem_cache_alloc_bulk failures Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/c729f83e5d7482d9329e0f165bdbe5adcefd1510.1619169700.git.lorenzo@kernel.org
2021-04-27bpf: Fix propagation of 32 bit unsigned bounds from 64 bit boundsDaniel Borkmann1-5/+3
Similarly as b02709587ea3 ("bpf: Fix propagation of 32-bit signed bounds from 64-bit bounds."), we also need to fix the propagation of 32 bit unsigned bounds from 64 bit counterparts. That is, really only set the u32_{min,max}_value when /both/ {umin,umax}_value safely fit in 32 bit space. For example, the register with a umin_value == 1 does /not/ imply that u32_min_value is also equal to 1, since umax_value could be much larger than 32 bit subregister can hold, and thus u32_min_value is in the interval [0,1] instead. Before fix, invalid tracking result of R2_w=inv1: [...] 5: R0_w=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv(id=0) R10=fp0 5: (35) if r2 >= 0x1 goto pc+1 [...] // goto path 7: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,umin_value=1) R10=fp0 7: (b6) if w2 <= 0x1 goto pc+1 [...] // goto path 9: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,smin_value=-9223372036854775807,smax_value=9223372032559808513,umin_value=1,umax_value=18446744069414584321,var_off=(0x1; 0xffffffff00000000),s32_min_value=1,s32_max_value=1,u32_max_value=1) R10=fp0 9: (bc) w2 = w2 10: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv1 R10=fp0 [...] After fix, correct tracking result of R2_w=inv(id=0,umax_value=1,var_off=(0x0; 0x1)): [...] 5: R0_w=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv(id=0) R10=fp0 5: (35) if r2 >= 0x1 goto pc+1 [...] // goto path 7: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,umin_value=1) R10=fp0 7: (b6) if w2 <= 0x1 goto pc+1 [...] // goto path 9: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,smax_value=9223372032559808513,umax_value=18446744069414584321,var_off=(0x0; 0xffffffff00000001),s32_min_value=0,s32_max_value=1,u32_max_value=1) R10=fp0 9: (bc) w2 = w2 10: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv(id=0,umax_value=1,var_off=(0x0; 0x1)) R10=fp0 [...] Thus, same issue as in b02709587ea3 holds for unsigned subregister tracking. Also, align __reg64_bound_u32() similarly to __reg64_bound_s32() as done in b02709587ea3 to make them uniform again. Fixes: 3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking") Reported-by: Manfred Paul (@_manfp) Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-04-27bpf: Lock bpf_trace_printk's tmp buf before it is written toFlorent Revest1-1/+1
bpf_trace_printk uses a shared static buffer to hold strings before they are printed. A recent refactoring moved the locking of that buffer after it gets filled by mistake. Fixes: d9c9e4db186a ("bpf: Factorize bpf_trace_printk and bpf_seq_printf") Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Florent Revest <revest@chromium.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210427112958.773132-1-revest@chromium.org
2021-04-26Merge tag 'pm-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds6-21/+21
Pull power management updates from Rafael Wysocki: "These add some new hardware support (for example, IceLake-D idle states in intel_idle), fix some issues (for example, the handling of negative "sleep length" values in cpuidle governors), add new functionality to the existing drivers (for example, scale-invariance support in the ACPI CPPC cpufreq driver) and clean up code all over. Specifics: - Add idle states table for IceLake-D to the intel_idle driver and update IceLake-X C6 data in it (Artem Bityutskiy). - Fix the C7 idle state on Tegra114 in the tegra cpuidle driver and drop the unused do_idle() firmware call from it (Dmitry Osipenko). - Fix cpuidle-qcom-spm Kconfig entry (He Ying). - Fix handling of possible negative tick_nohz_get_next_hrtimer() return values of in cpuidle governors (Rafael Wysocki). - Add support for frequency-invariance to the ACPI CPPC cpufreq driver and update the frequency-invariance engine (FIE) to use it as needed (Viresh Kumar). - Simplify the default delay_us setting in the ACPI CPPC cpufreq driver (Tom Saeger). - Clean up frequency-related computations in the intel_pstate cpufreq driver (Rafael Wysocki). - Fix TBG parent setting for load levels in the armada-37xx cpufreq driver and drop the CPU PM clock .set_parent method for armada-37xx (Marek Behún). - Fix multiple issues in the armada-37xx cpufreq driver (Pali Rohár). - Fix handling of dev_pm_opp_of_cpumask_add_table() return values in cpufreq-dt to take the -EPROBE_DEFER one into acconut as appropriate (Quanyang Wang). - Fix format string in ia64-acpi-cpufreq (Sergei Trofimovich). - Drop the unused for_each_policy() macro from cpufreq (Shaokun Zhang). - Simplify computations in the schedutil cpufreq governor to avoid unnecessary overhead (Yue Hu). - Fix typos in the s5pv210 cpufreq driver (Bhaskar Chowdhury). - Fix cpufreq documentation links in Kconfig (Alexander Monakov). - Fix PCI device power state handling in pci_enable_device_flags() to avoid issuse in some cases when the device depends on an ACPI power resource (Rafael Wysocki). - Add missing documentation of pm_runtime_resume_and_get() (Alan Stern). - Add missing static inline stub for pm_runtime_has_no_callbacks() to pm_runtime.h and drop the unused try_to_freeze_nowarn() definition (YueHaibing). - Drop duplicate struct device declaration from pm.h and fix a structure type declaration in intel_rapl.h (Wan Jiabing). - Use dev_set_name() instead of an open-coded equivalent of it in the wakeup sources code and drop a redundant local variable initialization from it (Andy Shevchenko, Colin Ian King). - Use crc32 instead of md5 for e820 memory map integrity check during resume from hibernation on x86 (Chris von Recklinghausen). - Fix typos in comments in the system-wide and hibernation support code (Lu Jialin). - Modify the generic power domains (genpd) code to avoid resuming devices in the "prepare" phase of system-wide suspend and hibernation (Ulf Hansson). - Add Hygon Fam18h RAPL support to the intel_rapl power capping driver (Pu Wen). - Add MAINTAINERS entry for the dynamic thermal power management (DTPM) code (Daniel Lezcano). - Add devm variants of operating performance points (OPP) API functions and switch over some users of the OPP framework to the new resource-managed API (Yangtao Li and Dmitry Osipenko). - Update devfreq core: * Register devfreq devices as cooling devices on demand (Daniel Lezcano). * Add missing unlock opeation in devfreq_add_device() (Lukasz Luba). * Use the next frequency as resume_freq instead of the previous frequency when using the opp-suspend property (Dong Aisheng). * Check get_dev_status in devfreq_update_stats() (Dong Aisheng). * Fix set_freq path for the userspace governor in Kconfig (Dong Aisheng). * Remove invalid description of get_target_freq() (Dong Aisheng). - Update devfreq drivers: * imx8m-ddrc: Remove imx8m_ddrc_get_dev_status() and unneeded of_match_ptr() (Dong Aisheng, Fabio Estevam). * rk3399_dmc: dt-bindings: Add rockchip,pmu phandle and drop references to undefined symbols (Enric Balletbo i Serra, Gaël PORTAY). * rk3399_dmc: Use dev_err_probe() to simplify the code (Krzysztof Kozlowski). * imx-bus: Remove unneeded of_match_ptr() (Fabio Estevam). - Fix kernel-doc warnings in three places (Pierre-Louis Bossart). - Fix typo in the pm-graph utility code (Ricardo Ribalda)" * tag 'pm-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (74 commits) PM: wakeup: remove redundant assignment to variable retval PM: hibernate: x86: Use crc32 instead of md5 for hibernation e820 integrity check cpufreq: Kconfig: fix documentation links PM: wakeup: use dev_set_name() directly PM: runtime: Add documentation for pm_runtime_resume_and_get() cpufreq: intel_pstate: Simplify intel_pstate_update_perf_limits() cpufreq: armada-37xx: Fix module unloading cpufreq: armada-37xx: Remove cur_frequency variable cpufreq: armada-37xx: Fix determining base CPU frequency cpufreq: armada-37xx: Fix driver cleanup when registration failed clk: mvebu: armada-37xx-periph: Fix workaround for switching from L1 to L0 clk: mvebu: armada-37xx-periph: Fix switching CPU freq from 250 Mhz to 1 GHz cpufreq: armada-37xx: Fix the AVS value for load L1 clk: mvebu: armada-37xx-periph: remove .set_parent method for CPU PM clock cpufreq: armada-37xx: Fix setting TBG parent for load levels cpuidle: Fix ARM_QCOM_SPM_CPUIDLE configuration cpuidle: tegra: Remove do_idle firmware call cpuidle: tegra: Fix C7 idling state on Tegra114 PM: sleep: fix typos in comments cpufreq: Remove unused for_each_policy macro ...
2021-04-26Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linuxLinus Torvalds1-0/+16
Pull arm64 updates from Catalin Marinas: - MTE asynchronous support for KASan. Previously only synchronous (slower) mode was supported. Asynchronous is faster but does not allow precise identification of the illegal access. - Run kernel mode SIMD with softirqs disabled. This allows using NEON in softirq context for crypto performance improvements. The conditional yield support is modified to take softirqs into account and reduce the latency. - Preparatory patches for Apple M1: handle CPUs that only have the VHE mode available (host kernel running at EL2), add FIQ support. - arm64 perf updates: support for HiSilicon PA and SLLC PMU drivers, new functions for the HiSilicon HHA and L3C PMU, cleanups. - Re-introduce support for execute-only user permissions but only when the EPAN (Enhanced Privileged Access Never) architecture feature is available. - Disable fine-grained traps at boot and improve the documented boot requirements. - Support CONFIG_KASAN_VMALLOC on arm64 (only with KASAN_GENERIC). - Add hierarchical eXecute Never permissions for all page tables. - Add arm64 prctl(PR_PAC_{SET,GET}_ENABLED_KEYS) allowing user programs to control which PAC keys are enabled in a particular task. - arm64 kselftests for BTI and some improvements to the MTE tests. - Minor improvements to the compat vdso and sigpage. - Miscellaneous cleanups. * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (86 commits) arm64/sve: Add compile time checks for SVE hooks in generic functions arm64/kernel/probes: Use BUG_ON instead of if condition followed by BUG. arm64: pac: Optimize kernel entry/exit key installation code paths arm64: Introduce prctl(PR_PAC_{SET,GET}_ENABLED_KEYS) arm64: mte: make the per-task SCTLR_EL1 field usable elsewhere arm64/sve: Remove redundant system_supports_sve() tests arm64: fpsimd: run kernel mode NEON with softirqs disabled arm64: assembler: introduce wxN aliases for wN registers arm64: assembler: remove conditional NEON yield macros kasan, arm64: tests supports for HW_TAGS async mode arm64: mte: Report async tag faults before suspend arm64: mte: Enable async tag check fault arm64: mte: Conditionally compile mte_enable_kernel_*() arm64: mte: Enable TCO in functions that can read beyond buffer limits kasan: Add report for async mode arm64: mte: Drop arch_enable_tagging() kasan: Add KASAN mode kernel parameter arm64: mte: Add asynchronous mode support arm64: Get rid of CONFIG_ARM64_VHE arm64: Cope with CPUs stuck in VHE mode ...
2021-04-26Merge tag 'timers-core-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds19-62/+78
Pull timer updates from Thomas Gleixner: "The time and timers updates contain: Core changes: - Allow runtime power management when the clocksource is changed. - A correctness fix for clock_adjtime32() so that the return value on success is not overwritten by the result of the copy to user. - Allow late installment of broadcast clockevent devices which was broken because nothing switched them over to oneshot mode. This went unnoticed so far because clockevent devices used to be built in, but now people started to make them modular. - Debugfs related simplifications - Small cleanups and improvements here and there Driver changes: - The usual set of device tree binding updates for a wide range of drivers/devices. - The usual updates and improvements for drivers all over the place but nothing outstanding. - No new clocksource/event drivers. They'll come back next time" * tag 'timers-core-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) posix-timers: Preserve return value in clock_adjtime32() tick/broadcast: Allow late registered device to enter oneshot mode tick: Use tick_check_replacement() instead of open coding it time/timecounter: Mark 1st argument of timecounter_cyc2time() as const dt-bindings: timer: nuvoton,npcm7xx: Add wpcm450-timer clocksource/drivers/arm_arch_timer: Add __ro_after_init and __init clocksource/drivers/timer-ti-dm: Handle dra7 timer wrap errata i940 clocksource/drivers/timer-ti-dm: Prepare to handle dra7 timer wrap issue clocksource/drivers/dw_apb_timer_of: Add handling for potential memory leak clocksource/drivers/npcm: Add support for WPCM450 clocksource/drivers/sh_cmt: Don't use CMTOUT_IE with R-Car Gen2/3 clocksource/drivers/pistachio: Fix trivial typo clocksource/drivers/ingenic_ost: Fix return value check in ingenic_ost_probe() clocksource/drivers/timer-ti-dm: Add missing set_state_oneshot_stopped clocksource/drivers/timer-ti-dm: Fix posted mode status check order dt-bindings: timer: renesas,cmt: Document R8A77961 dt-bindings: timer: renesas,cmt: Add r8a779a0 CMT support clocksource/drivers/ingenic-ost: Add support for the JZ4760B clocksource/drivers/ingenic: Add support for the JZ4760 dt-bindings: timer: ingenic: Add compatible strings for JZ4760(B) ...
2021-04-26Merge tag 'irq-core-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds19-157/+360
Pull irq updates from Thomas Gleixner: "The usual updates from the irq departement: Core changes: - Provide IRQF_NO_AUTOEN as a flag for request*_irq() so drivers can be cleaned up which either use a seperate mechanism to prevent auto-enable at request time or have a racy mechanism which disables the interrupt right after request. - Get rid of the last usage of irq_create_identity_mapping() and remove the interface. - An overhaul of tasklet_disable(). Most usage sites of tasklet_disable() are in task context and usually in cleanup, teardown code pathes. tasklet_disable() spinwaits for a tasklet which is currently executed. That's not only a problem for PREEMPT_RT where this can lead to a live lock when the disabling task preempts the softirq thread. It's also problematic in context of virtualization when the vCPU which runs the tasklet is scheduled out and the disabling code has to spin wait until it's scheduled back in. There are a few code pathes which invoke tasklet_disable() from non-sleepable context. For these a new disable variant which still spinwaits is provided which allows to switch tasklet_disable() to a sleep wait mechanism. For the atomic use cases this does not solve the live lock issue on PREEMPT_RT. That is mitigated by blocking on the RT specific softirq lock. - The PREEMPT_RT specific implementation of softirq processing and local_bh_disable/enable(). On RT enabled kernels soft interrupt processing happens always in task context and all interrupt handlers, which are not explicitly marked to be invoked in hard interrupt context are forced into task context as well. This allows to protect against softirq processing with a per CPU lock, which in turn allows to make BH disabled regions preemptible. Most of the softirq handling code is still shared. The RT/non-RT specific differences are addressed with a set of inline functions which provide the context specific functionality. The local_bh_disable() / local_bh_enable() mechanism are obviously seperate. - The usual set of small improvements and cleanups Driver changes: - New drivers for Nuvoton WPCM450 and DT 79rc3243x interrupt controllers - Extended functionality for MStar, STM32 and SC7280 irq chips - Enhanced robustness for ARM GICv3/4.1 drivers - The usual set of cleanups and improvements all over the place" * tag 'irq-core-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits) irqchip/xilinx: Expose Kconfig option for Zynq/ZynqMP irqchip/gic-v3: Do not enable irqs when handling spurious interrups dt-bindings: interrupt-controller: Add IDT 79RC3243x Interrupt Controller irqchip: Add support for IDT 79rc3243x interrupt controller irqdomain: Drop references to recusive irqdomain setup irqdomain: Get rid of irq_create_strict_mappings() irqchip/jcore-aic: Kill use of irq_create_strict_mappings() ARM: PXA: Kill use of irq_create_strict_mappings() irqchip/gic-v4.1: Disable vSGI upon (GIC CPUIF < v4.1) detection irqchip/tb10x: Use 'fallthrough' to eliminate a warning genirq: Reduce irqdebug cacheline bouncing kernel: Initialize cpumask before parsing irqchip/wpcm450: Drop COMPILE_TEST irqchip/irq-mst: Support polarity configuration irqchip: Add driver for WPCM450 interrupt controller dt-bindings: interrupt-controller: Add nuvoton, wpcm450-aic dt-bindings: qcom,pdc: Add compatible for sc7280 irqchip/stm32: Add usart instances exti direct event support irqchip/gic-v3: Fix OF_BAD_ADDR error handling irqchip/sifive-plic: Mark two global variables __ro_after_init ...
2021-04-26Merge tag 'core-entry-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-1/+1
Pull core entry updates from Thomas Gleixner: "A trivial cleanup of typo fixes" * tag 'core-entry-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: entry: Fix typos in comments
2021-04-26Merge tag 'tomoyo-pr-20210426' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1Linus Torvalds2-5/+5
Pull lockdep capacity limit updates from Tetsuo Handa: "syzbot is occasionally reporting that fuzz testing is terminated due to hitting upper limits lockdep can track. Analysis via /proc/lockdep* did not show any obvious culprits, allow tuning tracing capacity constants" * tag 'tomoyo-pr-20210426' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1: lockdep: Allow tuning tracing capacity constants.
2021-04-26Merge branches 'pm-core', 'pm-pci', 'pm-sleep', 'pm-domains' and 'powercap'Rafael J. Wysocki3-3/+3
* pm-core: PM: runtime: Add documentation for pm_runtime_resume_and_get() PM: runtime: Replace inline function pm_runtime_callbacks_present() PM: core: Remove duplicate declaration from header file * pm-pci: PCI: PM: Do not read power state in pci_enable_device_flags() * pm-sleep: PM: wakeup: remove redundant assignment to variable retval PM: hibernate: x86: Use crc32 instead of md5 for hibernation e820 integrity check PM: wakeup: use dev_set_name() directly PM: sleep: fix typos in comments freezer: Remove unused inline function try_to_freeze_nowarn() * pm-domains: PM: domains: Don't runtime resume devices at genpd_prepare() * powercap: powercap: RAPL: Fix struct declaration in header file MAINTAINERS: Add DTPM subsystem maintainer powercap: Add Hygon Fam18h RAPL support
2021-04-26Merge branch 'pm-cpufreq'Rafael J. Wysocki2-17/+13
* pm-cpufreq: (22 commits) cpufreq: Kconfig: fix documentation links cpufreq: intel_pstate: Simplify intel_pstate_update_perf_limits() cpufreq: armada-37xx: Fix module unloading cpufreq: armada-37xx: Remove cur_frequency variable cpufreq: armada-37xx: Fix determining base CPU frequency cpufreq: armada-37xx: Fix driver cleanup when registration failed clk: mvebu: armada-37xx-periph: Fix workaround for switching from L1 to L0 clk: mvebu: armada-37xx-periph: Fix switching CPU freq from 250 Mhz to 1 GHz cpufreq: armada-37xx: Fix the AVS value for load L1 clk: mvebu: armada-37xx-periph: remove .set_parent method for CPU PM clock cpufreq: armada-37xx: Fix setting TBG parent for load levels cpufreq: Remove unused for_each_policy macro cpufreq: dt: dev_pm_opp_of_cpumask_add_table() may return -EPROBE_DEFER cpufreq: intel_pstate: Clean up frequency computations cpufreq: cppc: simplify default delay_us setting cpufreq: Rudimentary typos fix in the file s5pv210-cpufreq.c cpufreq: CPPC: Add support for frequency invariance ia64: fix format string for ia64-acpi-cpu-freq cpufreq: schedutil: Call sugov_update_next_freq() before check to fast_switch_enabled arch_topology: Export arch_freq_scale and helpers ...
2021-04-25bpf: Allow trampoline re-attach for tracing and lsm programsJiri Olsa2-8/+19
Currently we don't allow re-attaching of trampolines. Once it's detached, it can't be re-attach even when the program is still loaded. Adding the possibility to re-attach the loaded tracing and lsm programs. Fixing missing unlock with proper cleanup goto jump reported by Julia. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: KP Singh <kpsingh@kernel.org> Link: https://lore.kernel.org/bpf/20210414195147.1624932-2-jolsa@kernel.org
2021-04-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller6-340/+435
Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-04-23 The following pull-request contains BPF updates for your *net-next* tree. We've added 69 non-merge commits during the last 22 day(s) which contain a total of 69 files changed, 3141 insertions(+), 866 deletions(-). The main changes are: 1) Add BPF static linker support for extern resolution of global, from Andrii. 2) Refine retval for bpf_get_task_stack helper, from Dave. 3) Add a bpf_snprintf helper, from Florent. 4) A bunch of miscellaneous improvements from many developers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>