linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2021-05-12	sched: Inherit task cookie on fork()	Peter Zijlstra	1	-0/+2
	Note that sched_core_fork() is called from under tasklist_lock, and not from sched_fork() earlier. This avoids a few races later. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Don Hiatt <dhiatt@digitalocean.com> Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com> Tested-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20210422123308.980003687@infradead.org
2021-05-12	sched: Trivial core scheduling cookie management	Peter Zijlstra	1	-0/+6
	In order to not have to use pid_struct, create a new, smaller, structure to manage task cookies for core scheduling. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Don Hiatt <dhiatt@digitalocean.com> Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com> Tested-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20210422123308.919768100@infradead.org
2021-05-12	sched: Trivial forced-newidle balancer	Peter Zijlstra	1	-0/+1
	When a sibling is forced-idle to match the core-cookie; search for matching tasks to fill the core. rcu_read_unlock() can incur an infrequent deadlock in sched_core_balance(). Fix this by using the RCU-sched flavor instead. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Don Hiatt <dhiatt@digitalocean.com> Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com> Tested-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20210422123308.800048269@infradead.org
2021-05-12	sched: Basic tracking of matching tasks	Peter Zijlstra	1	-1/+7
	Introduce task_struct::core_cookie as an opaque identifier for core scheduling. When enabled; core scheduling will only allow matching task to be on the core; where idle matches everything. When task_struct::core_cookie is set (and core scheduling is enabled) these tasks are indexed in a second RB-tree, first on cookie value then on scheduling function, such that matching task selection always finds the most elegible match. NOTE: shudder at the overhead... NOTE: sigh, a 3rd copy of the scheduling function; the alternative is per class tracking of cookies and that just duplicates a lot of stuff for no raisin (the 2nd copy lives in the rt-mutex PI code). [Joel: folded fixes] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Don Hiatt <dhiatt@digitalocean.com> Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com> Tested-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20210422123308.496975854@infradead.org
2021-05-12	delayacct: Add sysctl to enable at runtime	Peter Zijlstra	1	-0/+4
	Just like sched_schedstats, allow runtime enabling (and disabling) of delayacct. This is useful if one forgot to add the delayacct boot time option. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/YJkhebGJAywaZowX@hirez.programming.kicks-ass.net
2021-05-12	delayacct: Default disabled	Peter Zijlstra	1	-12/+4
	Assuming this stuff isn't actually used much; disable it by default and avoid allocating and tracking the task_delay_info structure. taskstats is changed to still report the regular sched and sched_info and only skip the missing task_delay_info fields instead of not reporting anything. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20210505111525.308018373@infradead.org
2021-05-12	delayacct: Add static_branch in scheduler hooks	Peter Zijlstra	1	-0/+8
	Cheaper when delayacct is disabled. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Balbir Singh <bsingharora@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: https://lkml.kernel.org/r/20210505111525.248028369@infradead.org
2021-05-12	sched: Simplify sched_info_on()	Peter Zijlstra	1	-8/+2
	The situation around sched_info is somewhat complicated, it is used by sched_stats and delayacct and, indirectly, kvm. If SCHEDSTATS=Y (but disabled by default) sched_info_on() is unconditionally true -- this is the case for all distro kernel configs I checked. If for some reason SCHEDSTATS=N, but TASK_DELAY_ACCT=Y, then sched_info_on() can return false when delayacct is disabled, presumably because there would be no other users left; except kvm is. Instead of complicating matters further by accurately accounting sched_stat and kvm state, simply unconditionally enable when SCHED_INFO=Y, matching the common distro case. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: https://lkml.kernel.org/r/20210505111525.121458839@infradead.org
2021-05-12	drm/modifiers: Enforce consistency between the cap an IN_FORMATS	Daniel Vetter	1	-0/+2
	It's very confusing for userspace to have to deal with inconsistencies here, and some drivers screwed this up a bit. Most just ommitted the format list when they meant to say that only linear modifier is allowed, but some also meant that only implied modifiers are acceptable (because actually none of the planes registered supported modifiers). Now that this is all done consistently across all drivers, document the rules and enforce it in the drm core. v2: - Make the capability a link (Simon) - Note that all is lost before 5.1. v3: - Use drm_WARN_ON (Lyude) Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Acked-by: Maxime Ripard <maxime@cerno.tech> Cc: Simon Ser <contact@emersion.fr> Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Cc: Pekka Paalanen <pekka.paalanen@collabora.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210506132343.2873699-1-daniel.vetter@ffwll.ch
2021-05-11	usb: class: cdc-wdm: WWAN framework integration	Loic Poulain	1	-1/+2
	The WWAN framework provides a unified way to handle WWAN/modems and its control port(s). It has initially been introduced to support MHI/PCI modems, offering the same control protocols as the USB variants such as MBIM, QMI, AT... The WWAN framework exposes these control protocols as character devices, similarly to cdc-wdm, but in a bus agnostic fashion. This change adds registration of the USB modem cdc-wdm control endpoints to the WWAN framework as standard control ports (wwanXpY...). Exposing cdc-wdm through WWAN framework normally maintains backward compatibility, e.g: $ qmicli --device-open-qmi -d /dev/wwan0p1QMI --dms-get-ids instead of $ qmicli --device-open-qmi -d /dev/cdc-wdm0 --dms-get-ids However, some tools may rely on cdc-wdm driver/device name for device detection. It is then safer to keep the 'legacy' cdc-wdm character device to prevent any breakage. This is handled in this change by API mutual exclusion, only one access method can be used at a time, either cdc-wdm chardev or WWAN API. Note that unknown channel types (other than MBIM, AT or MBIM) are not registered to the WWAN framework. Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-11	net: wwan: Add unknown port type	Loic Poulain	1	-1/+3
	Some devices may have ports with unknown type/protocol which need to be tagged (though not supported by WWAN core). This will be the case for cdc-wdm based drivers. Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-11	mac80211: properly handle A-MSDUs that start with an RFC 1042 header	Mathy Vanhoef	1	-2/+2
	Properly parse A-MSDUs whose first 6 bytes happen to equal a rfc1042 header. This can occur in practice when the destination MAC address equals AA:AA:03:00:00:00. More importantly, this simplifies the next patch to mitigate A-MSDU injection attacks. Cc: stable@vger.kernel.org Signed-off-by: Mathy Vanhoef <Mathy.Vanhoef@kuleuven.be> Link: https://lore.kernel.org/r/20210511200110.0b2b886492f0.I23dd5d685fe16d3b0ec8106e8f01b59f499dffed@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-05-11	RDMA: Remove unnecessary struct declaration	Wan Jiabing	1	-1/+0
	The declaration of struct ib_grh is uncessary here, because it is defined at line 766. Link: https://lore.kernel.org/r/20210510062843.15707-1-wanjiabing@vivo.com Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-11	RDMA/core: Remove never used ib_modify_wq function call	Leon Romanovsky	1	-2/+0
	The function ib_modify_wq() is not used, so remove it. Link: https://lore.kernel.org/r/c5e48d517b9163fe4f9ffd224050b83fdb3571c6.1620552935.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-11	kyber: fix out of bounds access when preempted	Omar Sandoval	1	-1/+1
	__blk_mq_sched_bio_merge() gets the ctx and hctx for the current CPU and passes the hctx to ->bio_merge(). kyber_bio_merge() then gets the ctx for the current CPU again and uses that to get the corresponding Kyber context in the passed hctx. However, the thread may be preempted between the two calls to blk_mq_get_ctx(), and the ctx returned the second time may no longer correspond to the passed hctx. This "works" accidentally most of the time, but it can cause us to read garbage if the second ctx came from an hctx with more ctx's than the first one (i.e., if ctx->index_hw[hctx->type] > hctx->nr_ctx). This manifested as this UBSAN array index out of bounds error reported by Jakub: UBSAN: array-index-out-of-bounds in ../kernel/locking/qspinlock.c:130:9 index 13106 is out of range for type 'long unsigned int [128]' Call Trace: dump_stack+0xa4/0xe5 ubsan_epilogue+0x5/0x40 __ubsan_handle_out_of_bounds.cold.13+0x2a/0x34 queued_spin_lock_slowpath+0x476/0x480 do_raw_spin_lock+0x1c2/0x1d0 kyber_bio_merge+0x112/0x180 blk_mq_submit_bio+0x1f5/0x1100 submit_bio_noacct+0x7b0/0x870 submit_bio+0xc2/0x3a0 btrfs_map_bio+0x4f0/0x9d0 btrfs_submit_data_bio+0x24e/0x310 submit_one_bio+0x7f/0xb0 submit_extent_page+0xc4/0x440 __extent_writepage_io+0x2b8/0x5e0 __extent_writepage+0x28d/0x6e0 extent_write_cache_pages+0x4d7/0x7a0 extent_writepages+0xa2/0x110 do_writepages+0x8f/0x180 __writeback_single_inode+0x99/0x7f0 writeback_sb_inodes+0x34e/0x790 __writeback_inodes_wb+0x9e/0x120 wb_writeback+0x4d2/0x660 wb_workfn+0x64d/0xa10 process_one_work+0x53a/0xa80 worker_thread+0x69/0x5b0 kthread+0x20b/0x240 ret_from_fork+0x1f/0x30 Only Kyber uses the hctx, so fix it by passing the request_queue to ->bio_merge() instead. BFQ and mq-deadline just use that, and Kyber can map the queues itself to avoid the mismatch. Fixes: a6088845c2bf ("block: kyber: make kyber more friendly with merging") Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Omar Sandoval <osandov@fb.com> Link: https://lore.kernel.org/r/c7598605401a48d5cfeadebb678abd10af22b83f.1620691329.git.osandov@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-11	Merge drm/drm-next into drm-misc-next	Thomas Zimmermann	640	-6275/+15835
	Backmerging to get v5.12 fixes. Requested for vmwgfx. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2021-05-11	soundwire: add missing kernel-doc description	Pierre-Louis Bossart	1	-0/+1
	For some reason we never added a description for the clk_stop callback. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20210511030048.25622-3-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2021-05-11	soundwire: bus: only use CLOCK_STOP_MODE0 and fix confusions	Pierre-Louis Bossart	1	-2/+0
	Existing devices and implementations only support the required CLOCK_STOP_MODE0. All the code related to CLOCK_STOP_MODE1 has not been tested and is highly questionable, with a clear confusion between CLOCK_STOP_MODE1 and the simple clock stop state machine. This patch removes all usages of CLOCK_STOP_MODE1 - which has no impact on any solution - and fixes the use of the simple clock stop state machine. The resulting code should be a lot more symmetrical and easier to maintain. Note that CLOCK_STOP_MODE1 is not supported in the SoundWire Device Class specification so it's rather unlikely that we need to re-add this mode later. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20210511030048.25622-2-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2021-05-11	spi: Switch to signed types for *_native_cs SPI controller fields	Andy Shevchenko	1	-2/+2
	While fixing undefined behaviour the commit f60d7270c8a3 ("spi: Avoid undefined behaviour when counting unused native CSs") missed the case when all CSs are GPIOs and thus unused_native_cs will be evaluated to -1 in unsigned representation. This will falsely trigger a condition in the spi_get_gpio_descs(). Switch to signed types for *_native_cs SPI controller fields to fix above. Fixes: f60d7270c8a3 ("spi: Avoid undefined behaviour when counting unused native CSs") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20210510131242.49455-1-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-05-11	dt-bindings: interconnect: Add Qualcomm SC7280 DT bindings	Odelu Kukatla	1	-0/+165
	The Qualcomm SC7280 platform has several bus fabrics that could be controlled and tuned dynamically according to the bandwidth demand. Signed-off-by: Odelu Kukatla <okukatla@codeaurora.org> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/1619517059-12109-2-git-send-email-okukatla@codeaurora.org Signed-off-by: Georgi Djakov <djakov@kernel.org>
2021-05-11	spi: pxa2xx: Introduce special type for Merrifield SPIs	Andy Shevchenko	1	-0/+16
	Intel Merrifield SPI is actually more closer to PXA3xx. It has extended FIFO (32 bytes) and additional registers to get or set FIFO thresholds. Introduce new type for Intel Merrifield SPI host controllers and handle bigger FIFO size. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20210510124134.24638-15-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-05-11	spi: pxa2xx: Use pxa_ssp_enable()/pxa_ssp_disable() in the driver	Andy Shevchenko	1	-0/+16
	There are few places that repeat the logic of pxa_ssp_enable() and pxa_ssp_disable(). Use them instead of open coded variants. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20210510124134.24638-10-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-05-11	stack: Replace "o" output with "r" input constraint	Nick Desaulniers	1	-1/+1
	"o" isn't a common asm() constraint to use; it triggers an assertion in assert-enabled builds of LLVM that it's not recognized when targeting aarch64 (though it appears to fall back to "m"). It's fixed in LLVM 13 now, but there isn't really a good reason to use "o" in particular here. To avoid causing build issues for those using assert-enabled builds of earlier LLVM versions, the constraint needs changing. Instead, if the point is to retain the __builtin_alloca(), make ptr appear to "escape" via being an input to an empty inline asm block. This is preferable anyways, since otherwise this looks like a dead store. While the use of "r" was considered in https://lore.kernel.org/lkml/202104011447.2E7F543@keescook/ it was only tested as an output (which looks like a dead store, and wasn't sufficient). Use "r" as an input constraint instead, which behaves correctly across compilers and architectures. Fixes: 39218ff4c625 ("stack: Optionally randomize kernel stack offset each syscall") Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Kees Cook <keescook@chromium.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Link: https://reviews.llvm.org/D100412 Link: https://bugs.llvm.org/show_bug.cgi?id=49956 Link: https://lore.kernel.org/r/20210419231741.4084415-1-keescook@chromium.org
2021-05-10	scsi: ufs: core: Enable power management for wlun	Asutosh Das	1	-0/+20
	During runtime-suspend of ufs host, the SCSI devices are already suspended and so are the queues associated with them. However, the ufs host sends SSU (START_STOP_UNIT) to the wlun during runtime-suspend. During the process blk_queue_enter() checks if the queue is not in suspended state. If so, it waits for the queue to resume, and never comes out of it. Commit 52abca64fd94 ("scsi: block: Do not accept any requests while suspended") adds the check to see if the queue is in suspended state in blk_queue_enter(). Call trace: __switch_to+0x174/0x2c4 __schedule+0x478/0x764 schedule+0x9c/0xe0 blk_queue_enter+0x158/0x228 blk_mq_alloc_request+0x40/0xa4 blk_get_request+0x2c/0x70 __scsi_execute+0x60/0x1c4 ufshcd_set_dev_pwr_mode+0x124/0x1e4 ufshcd_suspend+0x208/0x83c ufshcd_runtime_suspend+0x40/0x154 ufshcd_pltfrm_runtime_suspend+0x14/0x20 pm_generic_runtime_suspend+0x28/0x3c __rpm_callback+0x80/0x2a4 rpm_suspend+0x308/0x614 rpm_idle+0x158/0x228 pm_runtime_work+0x84/0xac process_one_work+0x1f0/0x470 worker_thread+0x26c/0x4c8 kthread+0x13c/0x320 ret_from_fork+0x10/0x18 Fix this by registering ufs device wlun as a SCSI driver and registering it for block runtime-pm. Also make this a supplier for all other LUNs. This way the wlun device suspends after all the consumers and resumes after HBA resumes. This also registers a new SCSI driver for rpmb wlun. This new driver is mostly used to clear rpmb uac. [mkp: resolve merge conflict with 5.13-rc1 and fix doc warning] Fixed smatch warnings: Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/4662c462e79e3e7f541f54f88f8993f421026d83.1619223249.git.asutoshd@codeaurora.org Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Co-developed-by: Can Guo <cang@codeaurora.org> Signed-off-by: Can Guo <cang@codeaurora.org> Signed-off-by: Asutosh Das <asutoshd@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-05-10	selinux: delete selinux_xfrm_policy_lookup() useless argument	Zhongjun Tan	2	-4/+3
	seliunx_xfrm_policy_lookup() is hooks of security_xfrm_policy_lookup(). The dir argument is uselss in security_xfrm_policy_lookup(). So remove the dir argument from selinux_xfrm_policy_lookup() and security_xfrm_policy_lookup(). Signed-off-by: Zhongjun Tan <tanzhongjun@yulong.com> [PM: reformat the subject line] Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-05-10	cgroup: inline cgroup_task_freeze()	Roman Gushchin	1	-18/+0
	After the introduction of the cgroup.kill there is only one call site of cgroup_task_freeze() left: cgroup_exit(). cgroup_task_freeze() is currently taking rcu_read_lock() to read task's cgroup flags, but because it's always called with css_set_lock locked, the rcu protection is excessive. Simplify the code by inlining cgroup_task_freeze(). v2: fix build Signed-off-by: Roman Gushchin <guro@fb.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2021-05-10	rcu: Reject RCU_LOCKDEP_WARN() false positives	Paul E. McKenney	1	-1/+1
	If another lockdep report runs concurrently with an RCU lockdep report from RCU_LOCKDEP_WARN(), the following sequence of events can occur: 1. debug_lockdep_rcu_enabled() sees that lockdep is enabled when called from (say) synchronize_rcu(). 2. Lockdep is disabled by a concurrent lockdep report. 3. debug_lockdep_rcu_enabled() evaluates its lockdep-expression argument, for example, lock_is_held(&rcu_bh_lock_map). 4. Because lockdep is now disabled, lock_is_held() plays it safe and returns the constant 1. 5. But in this case, the constant 1 is not safe, because invoking synchronize_rcu() under rcu_read_lock_bh() is disallowed. 6. debug_lockdep_rcu_enabled() wrongly invokes lockdep_rcu_suspicious(), resulting in a false-positive splat. This commit therefore changes RCU_LOCKDEP_WARN() to check debug_lockdep_rcu_enabled() after checking the lockdep expression, so that any "safe" returns from lock_is_held() are rejected by debug_lockdep_rcu_enabled(). This requires memory ordering, which is supplied by READ_ONCE(debug_locks). The resulting volatile accesses prevent the compiler from reordering and the fact that only one variable is being accessed prevents the underlying hardware from reordering. The combination works for IA64, which can reorder reads to the same location, but this is defeated by the volatile accesses, which compile to load instructions that provide ordering. Reported-by: syzbot+dde0cc33951735441301@syzkaller.appspotmail.com Reported-by: Matthew Wilcox <willy@infradead.org> Reported-by: syzbot+88e4f02896967fe1ab0d@syzkaller.appspotmail.com Reported-by: Thomas Gleixner <tglx@linutronix.de> Suggested-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10	rcu: Remove the unused rcu_irq_exit_preempt() function	Paul E. McKenney	2	-2/+0
	Commit 9ee01e0f69a9 ("x86/entry: Clean up idtentry_enter/exit() leftovers") left the rcu_irq_exit_preempt() in place in order to avoid conflicts with the -rcu tree. Now that this change has long since hit mainline, this commit removes the no-longer-used rcu_irq_exit_preempt() function. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10	bpf: verifier: Allocate idmap scratch in verifier env	Lorenz Bauer	1	-0/+8
	func_states_equal makes a very short lived allocation for idmap, probably because it's too large to fit on the stack. However the function is called quite often, leading to a lot of alloc / free churn. Replace the temporary allocation with dedicated scratch space in struct bpf_verifier_env. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/bpf/20210429134656.122225-4-lmb@cloudflare.com
2021-05-10	srcu: Initialize SRCU after timers	Frederic Weisbecker	1	-0/+6
	Once srcu_init() is called, the SRCU core will make use of delayed workqueues, which rely on timers. However init_timers() is called several steps after rcu_init(). This means that a call_srcu() after rcu_init() but before init_timers() would find itself within a dangerously uninitialized timer core. This commit therefore creates a separate call to srcu_init() after init_timer() completes, which ensures that we stay in early SRCU mode until timers are safe(r). Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10	srcu: Unconditionally embed struct lockdep_map	Frederic Weisbecker	1	-2/+0
	Since struct lockdep_map has zero size when CONFIG_DEBUG_LOCK_ALLOC=n, this commit removes the #ifdef from the srcu_struct structure's ->dep_map. This change will simplify further manipulations of this field. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10	timer: Revert "timer: Add timer_curr_running()"	Frederic Weisbecker	1	-2/+0
	This reverts commit dcd42591ebb8a25895b551a5297ea9c24414ba54. The only user was RCU/nocb. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-05-10	ptp: ptp_clock: make scaled_ppm_to_ppb static inline	Radu Pirea (NXP OSS)	1	-8/+26
	Make scaled_ppm_to_ppb static inline to be able to build drivers that use this function even with PTP_1588_CLOCK disabled. Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-10	scsi: core: Treat device offline as a failure	Jason Yan	1	-26/+28
	When a SCSI device is offline a MODE SENSE command will return a result with only DID_NO_CONNECT set. In sd_read_write_protect_flag() only the status byte of the result is checked. Despite a returned status of DID_NO_CONNECT the command is considered successful and we read sdkp->write_prot from a buffer containing garbage. Modify scsi_status_is_good() to treat DID_NO_CONNECT as a failure case. Link: https://lore.kernel.org/r/20210330114727.234467-1-yanaijie@huawei.com Signed-off-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-05-10	PM: runtime: Fix unpaired parent child_count for force_resume	Tony Lindgren	1	-0/+1
	As pm_runtime_need_not_resume() relies also on usage_count, it can return a different value in pm_runtime_force_suspend() compared to when called in pm_runtime_force_resume(). Different return values can happen if anything calls PM runtime functions in between, and causes the parent child_count to increase on every resume. So far I've seen the issue only for omapdrm that does complicated things with PM runtime calls during system suspend for legacy reasons: omap_atomic_commit_tail() for omapdrm.0 dispc_runtime_get() wakes up 58000000.dss as it's the dispc parent dispc_runtime_resume() rpm_resume() increases parent child_count dispc_runtime_put() won't idle, PM runtime suspend blocked pm_runtime_force_suspend() for 58000000.dss, !pm_runtime_need_not_resume() __update_runtime_status() system suspended pm_runtime_force_resume() for 58000000.dss, pm_runtime_need_not_resume() pm_runtime_enable() only called because of pm_runtime_need_not_resume() omap_atomic_commit_tail() for omapdrm.0 dispc_runtime_get() wakes up 58000000.dss as it's the dispc parent dispc_runtime_resume() rpm_resume() increases parent child_count dispc_runtime_put() won't idle, PM runtime suspend blocked ... rpm_suspend for 58000000.dss but parent child_count is now unbalanced Let's fix the issue by adding a flag for needs_force_resume and use it in pm_runtime_force_resume() instead of pm_runtime_need_not_resume(). Additionally omapdrm system suspend could be simplified later on to avoid lots of unnecessary PM runtime calls and the complexity it adds. The driver can just use internal functions that are shared between the PM runtime and system suspend related functions. Fixes: 4918e1f87c5f ("PM / runtime: Rework pm_runtime_force_suspend/resume()") Signed-off-by: Tony Lindgren <tony@atomide.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Tested-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> Cc: 4.16+ <stable@vger.kernel.org> # 4.16+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-05-10	asm-generic: unaligned always use struct helpers	Arnd Bergmann	2	-79/+2
	As found by Vineet Gupta and Linus Torvalds, gcc has somewhat unexpected behavior when faced with overlapping unaligned pointers. The kernel's unaligned/access-ok.h header technically invokes undefined behavior that happens to usually work on the architectures using it, but if the compiler optimizes code based on the assumption that undefined behavior doesn't happen, it can create output that actually causes data corruption. A related problem was previously found on 32-bit ARMv7, where most instructions can be used on unaligned data, but 64-bit ldrd/strd causes an exception. The workaround was to always use the unaligned/le_struct.h helper instead of unaligned/access-ok.h, in commit 1cce91dfc8f7 ("ARM: 8715/1: add a private asm/unaligned.h"). The same solution should work on all other architectures as well, so remove the access-ok.h variant and use the other one unconditionally on all architectures, picking either the big-endian or little-endian version. With this, the arm specific header can be removed as well, and the only file including linux/unaligned/access_ok.h gets moved to including the normal file. Fortunately, this made almost no difference to the object code produced by gcc-11. On x86, s390, powerpc, and arc, the resulting binary appears to be identical to the previous version, while on arm64 and m68k there are minimal differences that looks like an optimization pass went into a different direction, usually using fewer stack spills on the new version. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363
2021-05-10	asm-generic: unaligned: remove byteshift helpers	Arnd Bergmann	5	-144/+60
	In theory, compilers should be able to work this out themselves so we can use a simpler version based on the swab() helpers. I have verified that this works on all supported compiler versions (gcc-4.9 and up, clang-10 and up). Looking at the object code produced by gcc-11, I found that the impact is mostly a change in inlining decisions that lead to slightly larger code. In other cases, this version produces explicit byte swaps in place of separate byte access, or comparing against pre-swapped constants. While the source code is clearly simpler, I have not seen an indication of the new version actually producing better code on Arm, so maybe we want to skip this after all. From what I can tell, gcc recognizes the byteswap pattern in the byteshift.h header and can turn it into explicit instructions, but it does not turn a __builtin_bswap32() back into individual bytes when that would result in better output, e.g. when storing a byte-reversed constant. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-05-10	openrisc: always use unaligned-struct header	Arnd Bergmann	3	-120/+0
	openrisc is the only architecture using the linux/unaligned/*memmove infrastructure. There is a comment saying that this version is more efficient, but this was added in 2011 before the openrisc gcc port was merged upstream. I checked a couple of files to see what the actual difference is with the mainline gcc (9.4 and 11.1), and found that the generic header seems to produce better code now, regardless of the gcc version. Specifically, the be_memmove leads to allocating a stack slot and copying the data one byte at a time, then reading the whole word from the stack: 00000000 <test_get_unaligned_memmove>: 0: 9c 21 ff f4 l.addi r1,r1,-12 4: d4 01 10 04 l.sw 4(r1),r2 8: 8e 63 00 00 l.lbz r19,0(r3) c: 9c 41 00 0c l.addi r2,r1,12 10: 8e 23 00 01 l.lbz r17,1(r3) 14: db e2 9f f4 l.sb -12(r2),r19 18: db e2 8f f5 l.sb -11(r2),r17 1c: 8e 63 00 02 l.lbz r19,2(r3) 20: 8e 23 00 03 l.lbz r17,3(r3) 24: d4 01 48 08 l.sw 8(r1),r9 28: db e2 9f f6 l.sb -10(r2),r19 2c: db e2 8f f7 l.sb -9(r2),r17 30: 85 62 ff f4 l.lwz r11,-12(r2) 34: 85 21 00 08 l.lwz r9,8(r1) 38: 84 41 00 04 l.lwz r2,4(r1) 3c: 44 00 48 00 l.jr r9 40: 9c 21 00 0c l.addi r1,r1,12 while the be_struct version reads each byte into a register and does a shift to the right position: 00000000 <test_get_unaligned_struct>: 0: 9c 21 ff f8 l.addi r1,r1,-8 4: 8e 63 00 00 l.lbz r19,0(r3) 8: aa 20 00 18 l.ori r17,r0,0x18 c: e2 73 88 08 l.sll r19,r19,r17 10: 8d 63 00 01 l.lbz r11,1(r3) 14: aa 20 00 10 l.ori r17,r0,0x10 18: e1 6b 88 08 l.sll r11,r11,r17 1c: e1 6b 98 04 l.or r11,r11,r19 20: 8e 23 00 02 l.lbz r17,2(r3) 24: aa 60 00 08 l.ori r19,r0,0x8 28: e2 31 98 08 l.sll r17,r17,r19 2c: d4 01 10 00 l.sw 0(r1),r2 30: d4 01 48 04 l.sw 4(r1),r9 34: 9c 41 00 08 l.addi r2,r1,8 38: e2 31 58 04 l.or r17,r17,r11 3c: 8d 63 00 03 l.lbz r11,3(r3) 40: e1 6b 88 04 l.or r11,r11,r17 44: 84 41 00 00 l.lwz r2,0(r1) 48: 85 21 00 04 l.lwz r9,4(r1) 4c: 44 00 48 00 l.jr r9 50: 9c 21 00 08 l.addi r1,r1,8 According to Stafford Horne, the new version should in fact perform better. In the trivial example, the struct version is a few instructions longer, but building a whole kernel shows an overall reduction in code size, presumably because it now has to manage fewer stack slots: text data bss dec hex filename 4792010 181480 82324 5055814 4d2546 vmlinux-unaligned-memmove 4790642 181480 82324 5054446 4d1fee vmlinux-unaligned-struct Remove the memmove version completely and let openrisc use the same code as everyone else, as a simplification. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Stafford Horne <shorne@gmail.com>
2021-05-10	block: uapi: fix comment about block device ioctl	Damien Le Moal	1	-1/+1
	Fix the comment mentioning ioctl command range used for zoned block devices to reflect the range of commands actually implemented. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Link: https://lore.kernel.org/r/20210509234806.3000-1-damien.lemoal@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-10	gpu: ipu-v3: Add Rec.709 limited range support to DP	Philipp Zabel	1	-0/+2
	Add YCbCr encoding and quantization range parameters to ipu_dp_setup_channel() and configure the CSC DP matrix accordingly. Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2021-05-10	cgroup: introduce cgroup.kill	Christian Brauner	1	-0/+3
	Introduce the cgroup.kill file. It does what it says on the tin and allows a caller to kill a cgroup by writing "1" into cgroup.kill. The file is available in non-root cgroups. Killing cgroups is a process directed operation, i.e. the whole thread-group is affected. Consequently trying to write to cgroup.kill in threaded cgroups will be rejected and EOPNOTSUPP returned. This behavior aligns with cgroup.procs where reads in threaded-cgroups are rejected with EOPNOTSUPP. The cgroup.kill file is write-only since killing a cgroup is an event not which makes it different from e.g. freezer where a cgroup transitions between the two states. As with all new cgroup features cgroup.kill is recursive by default. Killing a cgroup is protected against concurrent migrations through the cgroup mutex. To protect against forkbombs and to mitigate the effect of racing forks a new CGRP_KILL css set lock protected flag is introduced that is set prior to killing a cgroup and unset after the cgroup has been killed. We can then check in cgroup_post_fork() where we hold the css set lock already whether the cgroup is currently being killed. If so we send the child a SIGKILL signal immediately taking it down as soon as it returns to userspace. To make the killing of the child semantically clean it is killed after all cgroup attachment operations have been finalized. There are various use-cases of this interface: - Containers usually have a conservative layout where each container usually has a delegated cgroup. For such layouts there is a 1:1 mapping between container and cgroup. If the container in addition uses a separate pid namespace then killing a container usually becomes a simple kill -9 <container-init-pid> from an ancestor pid namespace. However, there are quite a few scenarios where that isn't true. For example, there are containers that share the cgroup with other processes on purpose that are supposed to be bound to the lifetime of the container but are not in the same pidns of the container. Containers that are in a delegated cgroup but share the pid namespace with the host or other containers. - Service managers such as systemd use cgroups to group and organize processes belonging to a service. They usually rely on a recursive algorithm now to kill a service. With cgroup.kill this becomes a simple write to cgroup.kill. - Userspace OOM implementations can make good use of this feature to efficiently take down whole cgroups quickly. - The kill program can gain a new kill --cgroup /sys/fs/cgroup/delegated flag to take down cgroups. A few observations about the semantics: - If parent and child are in the same cgroup and CLONE_INTO_CGROUP is not specified we are not taking cgroup mutex meaning the cgroup can be killed while a process in that cgroup is forking. If the kill request happens right before cgroup_can_fork() and before the parent grabs its siglock the parent is guaranteed to see the pending SIGKILL. In addition we perform another check in cgroup_post_fork() whether the cgroup is being killed and is so take down the child (see above). This is robust enough and protects gainst forkbombs. If userspace really really wants to have stricter protection the simple solution would be to grab the write side of the cgroup threadgroup rwsem which will force all ongoing forks to complete before killing starts. We concluded that this is not necessary as the semantics for concurrent forking should simply align with freezer where a similar check as cgroup_post_fork() is performed. For all other cases CLONE_INTO_CGROUP is required. In this case we will grab the cgroup mutex so the cgroup can't be killed while we fork. Once we're done with the fork and have dropped cgroup mutex we are visible and will be found by any subsequent kill request. - We obviously don't kill kthreads. This means a cgroup that has a kthread will not become empty after killing and consequently no unpopulated event will be generated. The assumption is that kthreads should be in the root cgroup only anyway so this is not an issue. - We skip killing tasks that already have pending fatal signals. - Freezer doesn't care about tasks in different pid namespaces, i.e. if you have two tasks in different pid namespaces the cgroup would still be frozen. The cgroup.kill mechanism consequently behaves the same way, i.e. we kill all processes and ignore in which pid namespace they exist. - If the caller is located in a cgroup that is killed the caller will obviously be killed as well. Link: https://lore.kernel.org/r/20210503143922.3093755-1-brauner@kernel.org Cc: Shakeel Butt <shakeelb@google.com> Cc: Roman Gushchin <guro@fb.com> Cc: Tejun Heo <tj@kernel.org> Cc: cgroups@vger.kernel.org Reviewed-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Serge Hallyn <serge@hallyn.com> Acked-by: Roman Gushchin <guro@fb.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2021-05-10	Merge tag 'misc-habanalabs-fixes-2021-05-08' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux into char-misc-linus	Greg Kroah-Hartman	1	-0/+33
	Oded writes: This tag contains the following fixes for 5.13-rc2: - Expose PLL information per ASIC. This also fixes some casting warnings. - Skip reading further firmware errors in case PCI link is down. - Security firmware error should be handled as error and not warning. - Allow user to ignore firmware errors. - Fix bug in timeout calculation when waiting for interrupt of CS. - Fix bug of potential use-after-free. * tag 'misc-habanalabs-fixes-2021-05-08' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux: habanalabs/gaudi: Fix a potential use after free in gaudi_memset_device_memory habanalabs: wait for interrupt wrong timeout calculation habanalabs: ignore f/w status error habanalabs: change error level of security not ready habanalabs: skip reading f/w errors on bad status habanalabs: expose ASIC specific PLL index
2021-05-10	usb: typec: tcpm: Don't block probing of consumers of "connector" nodes	Saravana Kannan	1	-0/+1
	fw_devlink expects DT device nodes with "compatible" property to have struct devices created for them. Since the connector node might not be populated as a device, mark it as such so that fw_devlink knows not to wait on this fwnode being populated as a struct device. Without this patch, USB functionality can be broken on some boards. Fixes: f7514a663016 ("of: property: fw_devlink: Add support for remote-endpoint") Reported-by: John Stultz <john.stultz@linaro.org> Tested-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Saravana Kannan <saravanak@google.com> Link: https://lore.kernel.org/r/20210506004423.345199-1-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-10	drm: Mark AGP implementation and ioctls as legacy	Thomas Zimmermann	3	-120/+85
	Only UMs drivers use DRM's core AGP code and ioctls. Mark the icotls as legacy. Add the _legacy_ infix to all AGP functions. Move the declarations to the public and internal legacy header files. The agp field in struct drm_device is now located in the structure's legacy section. Adapt drivers to the changes. AGP code now depends on CONFIG_DRM_LEGACY. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210507185709.22797-5-tzimmermann@suse.de
2021-05-10	spi: pxa2xx: Group Intel Quark specific definitions	Andy Shevchenko	1	-1/+3
	DDS_RATE is Intel Quark specific definition. Move it to the rest Intel Quark related. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20210423182441.50272-7-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-05-10	spi: pxa2xx: Unify ifdeffery used in the headers	Andy Shevchenko	2	-6/+7
	The two headers have quite different ifdeffery to prevent multiple inclusion. Unify them with the pattern that in particular reflects their location. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20210423182441.50272-6-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-05-10	spi: pxa2xx: Replace header inclusions by forward declarations	Andy Shevchenko	1	-0/+2
	When the data structure is only referred by pointer, compiler may not need to see the contents of the data type. Thus, we may replace header inclusions by respective forward declarations. Due to above add missed headers as well. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20210423182441.50272-5-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-05-10	mtd: core: add OTP nvmem provider support	Michael Walle	1	-0/+2
	Flash OTP regions can already be read via user space. Some boards have their serial number or MAC addresses stored in the OTP regions. Add support for them being a (read-only) nvmem provider. The API to read the OTP data is already in place. It distinguishes between factory and user OTP, thus there are up to two different providers. Signed-off-by: Michael Walle <michael@walle.cc> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20210424110608.15748-6-michael@walle.cc
2021-05-10	nvmem: core: allow specifying of_node	Michael Walle	1	-0/+2
	Until now, the of_node of the parent device is used. Some devices provide more than just the nvmem provider. To avoid name space clashes, add a way to allow specifying the nvmem cells in subnodes. Consider the following example: flash@0 { compatible = "jedec,spi-nor"; partitions { compatible = "fixed-partitions"; #address-cells = <1>; #size-cells = <1>; partition@0 { reg = <0x000000 0x010000>; }; }; otp { compatible = "user-otp"; #address-cells = <1>; #size-cells = <1>; serial-number@0 { reg = <0x0 0x8>; }; }; }; There the nvmem provider might be the MTD partition or the OTP region of the flash. Add a new config->of_node parameter, which if set, will be used instead of the parent's of_node. Signed-off-by: Michael Walle <michael@walle.cc> Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20210424110608.15748-2-michael@walle.cc
2021-05-10	dt-bindings: add power-domain header for RK3568 SoCs	Elaine Zhang	1	-0/+32
	According to a description from TRM, add all the power domains Signed-off-by: Elaine Zhang <zhangqing@rock-chips.com> Reviewed-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Signed-off-by: Johan Jonker <jbx6244@gmail.com> Acked-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20210417112952.8516-11-jbx6244@gmail.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>