|
Add simple tests for memblock_set_bottom_up() and memblock_bottom_up().
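The gist of such a test is a sketch like the following (using the
suite's ASSERT_EQ macro):

  memblock_set_bottom_up(false);
  ASSERT_EQ(memblock_bottom_up(), false);

  memblock_set_bottom_up(true);
  ASSERT_EQ(memblock_bottom_up(), true);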
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Shaoqin Huang <shaoqin.huang@intel.com>
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/b03701d2faeaf00f7184e4b72903de4e5e939437.1661578349.git.remckee0@gmail.com
|
|
Update memblock_alloc_try_nid() tests so that they test either
memblock_alloc_try_nid() or memblock_alloc_try_nid_raw() depending on the
value of alloc_nid_test_flags. Run through all the existing tests in
alloc_nid_api twice: once for memblock_alloc_try_nid() and once for
memblock_alloc_try_nid_raw().
When the tests run memblock_alloc_try_nid(), they test that the entire
memory region is zero. When the tests run memblock_alloc_try_nid_raw(),
they test that the entire memory region is nonzero. The content of the
memory region is initialized to nonzero, and we expect it to remain
unchanged if running memblock_alloc_try_nid_raw().
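A minimal sketch of the dispatch (the helper name and flag here are
illustrative, not necessarily the exact identifiers used):

  static void *run_memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align,
                                          phys_addr_t min_addr,
                                          phys_addr_t max_addr, int nid)
  {
          if (alloc_nid_test_flags & TEST_F_RAW)  /* second pass: raw variant */
                  return memblock_alloc_try_nid_raw(size, align, min_addr,
                                                    max_addr, nid);
          return memblock_alloc_try_nid(size, align, min_addr, max_addr, nid);
  }

Each test then calls this wrapper and asserts "entirely zero" or
"entirely nonzero" depending on the same flag.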
Reviewed-by: Shaoqin Huang <shaoqin.huang@intel.com>
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/6fa8938f67872841c10a00afb042947d1d280a04.1661578349.git.remckee0@gmail.com
|
|
Update memblock_alloc() tests so that they test either memblock_alloc()
or memblock_alloc_raw() depending on the value of alloc_test_flags. Run
through all the existing tests in memblock_alloc_api twice: once for
memblock_alloc() and once for memblock_alloc_raw().
When the tests run memblock_alloc(), they test that the entire memory
region is zero. When the tests run memblock_alloc_raw(), they test that
the entire memory region is nonzero. The content of the memory region is
initialized to nonzero, and we expect it to remain unchanged if running
memblock_alloc_raw().
Reviewed-by: Shaoqin Huang <shaoqin.huang@intel.com>
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/5a7cfb2f807ee2cb53ee77f9f5c910107b253d6e.1661578349.git.remckee0@gmail.com
|
|
Add tests for memblock_add(), memblock_reserve(), memblock_remove(),
memblock_free(), and memblock_alloc() for the following test scenarios.
memblock_add() and memblock_reserve():
- add/reserve a memory block in the gap between two existing memory
blocks, and check that the blocks are merged into one region
- try to add/reserve memblock regions that extend past PHYS_ADDR_MAX
memblock_remove() and memblock_free():
- remove/free a region when it is the only available region
+ These tests ensure that the first region is overwritten with a
"dummy" region when the last remaining region of that type is
removed or freed.
- remove/free a region that overlaps with two existing regions of the
relevant type
- try to remove/free memblock regions that extend past PHYS_ADDR_MAX
memblock_alloc():
- try to allocate a region that is larger than the total size of available
memory (memblock.memory)
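A sketch of the first add/reserve scenario (region bases and sizes are
illustrative):

  /* two regions with a gap between them */
  memblock_add(r1.base, r1.size);
  memblock_add(r3.base, r3.size);

  /* a third region that exactly fills the gap */
  memblock_add(r2.base, r2.size);

  /* all three should now be merged into a single region */
  ASSERT_EQ(memblock.memory.cnt, 1);
  ASSERT_EQ(memblock.memory.total_size, r1.size + r2.size + r3.size);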
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Shaoqin Huang <shaoqin.huang@intel.com>
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/c23c0393c5b9a53fe7f676996913c629495e9727.1661578349.git.remckee0@gmail.com
|
|
Generic tests for memblock_alloc*() functions do not use separate
functions for testing top-down and bottom-up allocation directions.
Therefore, the function name that is displayed in the verbose testing
output does not include the allocation direction.
Add an additional prefix when running generic tests for
memblock_alloc*() functions that indicates which allocation direction is
set. The prefix will be displayed when the tests are run in verbose mode.
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Shaoqin Huang <shaoqin.huang@intel.com>
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/fb76a42253d2a196a7daea29dd8121a69904f58e.1661578349.git.remckee0@gmail.com
|
|
Update the assert in memblock_alloc_try_nid() and memblock_alloc_from()
tests that checks whether the memory is cleared so that it checks the
entire chunk of allocated memory instead of just the first byte.
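Conceptually, the updated assert walks the whole allocation instead of
dereferencing only the first byte (a sketch; the suite's actual helper
may differ):

  const char *b = allocated_ptr;

  for (phys_addr_t i = 0; i < size; i++)
          ASSERT_EQ(b[i], 0);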
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Shaoqin Huang <shaoqin.huang@intel.com>
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/24b3271751756100142e65b75284d43b4d30c9b7.1661578349.git.remckee0@gmail.com
|
|
Add an assert in memblock_alloc() tests where allocation is expected to
occur. The assert checks whether the entire chunk of allocated memory is
cleared.
The current memblock_alloc() tests do not check whether the allocated
memory was zeroed. memblock_alloc() should zero the allocated memory since
it is a wrapper for memblock_alloc_try_nid().
Reviewed-by: Shaoqin Huang <shaoqin.huang@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/83ffb941b65074f40eb14552f8bfe5b71fe50abd.1661578349.git.remckee0@gmail.com
|
|
The VERBOSE build option was replaced with the --verbose runtime option,
but the comments describing the ASSERT_*() macros still refer to the
VERBOSE build option. Update these comments so that they refer to the
--verbose runtime option.
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/5f8a4c2bde34cc029282c68d47eda982d950f421.1660451025.git.remckee0@gmail.com
|
|
Add a help command line option and document it in the help message. Add
the help option to the short and long options so it will be recognized
as a valid option.
Usage:
$ ./main -h
Or:
$ ./main --help
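A self-contained illustration of the getopt_long() wiring (the real
option table also carries the existing options):

  #include <getopt.h>
  #include <stdio.h>

  static const struct option long_opts[] = {
          { "help", no_argument, NULL, 'h' },
          { NULL, 0, NULL, 0 }
  };

  int main(int argc, char **argv)
  {
          int c;

          while ((c = getopt_long(argc, argv, "h", long_opts, NULL)) != -1) {
                  if (c == 'h') {
                          printf("Usage: %s [-h|--help]\n", argv[0]);
                          return 0;
                  }
          }
          return 0;
  }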
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/0f3b93a79de78c0da1ca90f74fe35e9a85c7cf93.1660451025.git.remckee0@gmail.com
|
|
With 'unevaluatedProperties' support implemented, there's a number of
warnings when running dtbs_check:
arch/arm64/boot/dts/renesas/r9a07g043u11-smarc.dtb: i2c@10058000: Unevaluated properties are not allowed ('resets' was unexpected)
From schema: Documentation/devicetree/bindings/i2c/renesas,riic.yaml
The main problem is that the bindings schema marks resets as a required
property for RZ/G2L (and similar) SoCs, but the resets property is not
part of the schema. So to fix this, just add a resets property with
maxItems set to 1.
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
Merge series from Syed Saba Kareem <Syed.SabaKareem@amd.com>:
The Pink Sardine platform is a new APU series based on the ACP 6.2
design. This patch set adds an ASoC driver for the ACP (Audio
CoProcessor) block on the AMD Pink Sardine APU with DMIC endpoint
support.
|
|
The current driver only supports one clock, but LPI2C requires two
clocks: PER and IPG. To make sure old DTs keep working with newer
kernels, use the bulk clk API.
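A sketch of the bulk pattern (clock names per the binding;
devm_clk_bulk_get_optional() is one way to tolerate old DTs that lack
the ipg clock):

  static struct clk_bulk_data lpi2c_clks[] = {
          { .id = "per" },
          { .id = "ipg" },
  };

  ret = devm_clk_bulk_get_optional(&pdev->dev, ARRAY_SIZE(lpi2c_clks),
                                   lpi2c_clks);
  if (ret)
          return ret;

  ret = clk_bulk_prepare_enable(ARRAY_SIZE(lpi2c_clks), lpi2c_clks);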
Reviewed-by: Dong Aisheng <aisheng.dong@nxp.com>
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
Add i.MX93 LPI2C compatible string.
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Dong Aisheng <aisheng.dong@nxp.com>
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
i.MX LPI2C has DMA capability, so add the dmas property.
Reviewed-by: Dong Aisheng <aisheng.dong@nxp.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
i.MX LPI2C actually requires two clocks, the per clock and the ipg
clock, so add both.
Reviewed-by: Dong Aisheng <aisheng.dong@nxp.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
Consolidate the alloclen and pagedlen calculations for zerocopy and
normal paged requests. The current non-zerocopy paged version can
overallocate a bit and unnecessarily copy a small chunk of data into
the linear part.
Cc: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/netdev/CA+FuTSf0+cJ9_N_xrHmCGX_KoVCWcE0YQBdtgEkzGvcLMSv7Qw@mail.gmail.com/
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b0e4edb7b91f171c7119891d3c61040b8c56596e.1661428921.git.asml.silence@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When user undefined instruction debug is enabled, the pc value is hashed
like kernel pointers for security reasons. But the security benefit of
this hash is very limited because the code goes on to call
__show_regs(), which prints the plain pointer value. pc is a user
pointer anyway, so the kernel does not leak anything. The only result is
confusion about the difference between the pc value on the first printed
line and the value that __show_regs() prints.
Always print the plain value of pc.
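The difference boils down to the printk format specifier (a simplified
illustration):

  pr_err("pc : %p\n", (void *)pc);    /* %p hashes the pointer value */
  pr_err("pc : %08lx\n", pc);         /* plain value, as __show_regs() prints */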
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
The issue is the same as in commit c2999f7fb05b ("net: sched: multiq:
don't call qdisc_put() while holding tree lock"). Qdiscs call
qdisc_put() while holding the sch tree spinlock, which results in a
sleeping-while-atomic BUG.
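The usual fix pattern, mirroring the multiq commit, is to swap the qdisc
under the tree lock but defer the put until the lock is dropped (a
sketch):

  sch_tree_lock(sch);
  old = q->queues[i];
  q->queues[i] = new;
  sch_tree_unlock(sch);

  qdisc_put(old);   /* may sleep, so it must run outside the spinlock */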
Fixes: c266f64dbfa2 ("net: sched: protect block state with mutex")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20220826013930.340121-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Simon Horman says:
====================
nfp: port speed and eeprom get/set updates
This short series contains the initial updates to the NFP driver for the
v6.1 kernel. It covers two enhancements:
1. Patches 1/3 and 2/3:
- Support cases where application firmware does not know port speeds
a priori by relaying this information from the management firmware
to the application firmware.
This allows the existing mechanism, whereby the driver reports port
speeds to user-space as provided by the application firmware, to work
in this case.
2. Patch 3/3:
- Add support for eeprom get and set command
====================
Link: https://lore.kernel.org/r/20220825141223.22346-1-simon.horman@corigine.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add support for eeprom get and set operations with the ethtool command.
With this change, we can support commands such as:
# ethtool -e enp101s0np0 offset 0 length 6
Offset          Values
------          ------
0x0000:         00 15 4d 16 66 33
# ethtool -E enp101s0np0 magic 0x400019ee offset 5 length 1 value 0x88
We make this change to persist MAC changes across driver reload and
system reboot.
Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
A new TLV type is introduced to indicate whether the application
firmware is indifferent to port speed, and to inform the management
firmware of the result. The result is always true for the flower
application firmware, since it is indifferent to port speed from the
start and this will never change.
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
In future releases the NIC application firmware may be indifferent to port
speeds - not built for specific port speeds - and consequently it will not
be able to report VF port speeds to the driver without first learning them.
With this change, the driver will pass the speed of the physical ports
from the management firmware to the application firmware, and the latter
will copy the speed of port 0 to all the active VFs, so that the driver
can get the VF port speed as before.
The port speed of a VF may be requested from userspace using:
ethtool <vf-intf>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Some options like CONFIG_DEBUG_UNCOMPRESS and CONFIG_CMDLINE_FORCE are
fundamentally incompatible with portable kernels but are currently allowed
in all configurations. Other options like XIP_KERNEL are essentially
useless after the completion of the multiplatform conversion.
Repurpose the existing CONFIG_ARCH_MULTIPLATFORM option to decide
whether the resulting kernel image is meant to be portable or not, and
use it to guard all of the known incompatible options.
This is similar to how the RISC-V kernel handles the CONFIG_NONPORTABLE
option (with the opposite polarity).
A few references to CONFIG_ARCH_MULTIPLATFORM were left behind by
earlier cleanups and have to be removed now.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
CONFIG_XIP_KERNEL does not work with any option that involves patching
the read-only kernel .text.
Since at least CONFIG_SMP_ON_UP is required in certain configurations,
flip the dependency to always allow the .text patching options but make
XIP_KERNEL have the dependency instead.
This is a prerequisite for allowing CONFIG_ARCH_MULTIPLATFORM to
be disabled.
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
The value of doorbell_qpn is always equal to qpn on current hardware
versions. So remove it.
Link: https://lore.kernel.org/r/20220829105021.1427804-5-liangwenpeng@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Move the device and service_id match code at the top of
cm_insert_listen() and cm_find_listen() into the final else branch.
Link: https://lore.kernel.org/r/20220819090859.957943-4-markzhang@nvidia.com
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The service_mask is always ~cpu_to_be64(0), so the result is always
a NOP when it is &'d with a service_id. Remove it for simplicity.
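The reasoning is simple bit arithmetic:

  __be64 mask = ~cpu_to_be64(0);   /* all 64 bits set */

  /* (service_id & mask) == service_id for every service_id,
   * so the mask never affects the match */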
Link: https://lore.kernel.org/r/20220819090859.957943-3-markzhang@nvidia.com
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Remove the service_mask parameter of ib_cm_listen(), as all callers
use 0.
Link: https://lore.kernel.org/r/20220819090859.957943-2-markzhang@nvidia.com
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Since process_{read,write} already print direction info when
ctx->ops.rdma_ev fails, there is no need to pass 'dir'.
Link: https://lore.kernel.org/r/20220826081117.21687-1-guoqing.jiang@linux.dev
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
With 'unevaluatedProperties' support implemented, there's a number of
warnings when running dtbs_check:
arch/arm64/boot/dts/renesas/r8a77951-ulcb-kf.dtb: spi@e6e90000: Unevaluated properties are not allowed ('power-domains', 'resets' were unexpected)
From schema: Documentation/devicetree/bindings/spi/renesas,sh-msiof.yaml
The main problem is that the SoC DTSI files include power-domains and
resets properties, whereas renesas,sh-msiof.yaml has
'unevaluatedProperties: false'. So just add optional power-domains and
resets properties.
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20220829220334.6379-1-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
SND_SOC_RK817 uses I2C regmap, so compile testing without the parent MFD_RK808 requires I2C:
WARNING: unmet direct dependencies detected for REGMAP_I2C
Depends on [n]: I2C [=n]
Selected by [y]:
- SND_SOC_RK817 [=y] && SOUND [=y] && !UML && SND [=y] && SND_SOC [=y] && (MFD_RK808 [=n] || COMPILE_TEST [=y])
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 5b7f4e5de61b ("ASoC: codecs: allow compile testing without MFD drivers")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220830075855.278046-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
We can still see that a majority of the time is spent hashing task pointers:
...
16.98% [kernel] [k] rhashtable_jhash2
...
Doing the bookkeeping in toggle_bp_slot() is currently O(#cpus),
calling task_bp_pinned() for each CPU, even if task_bp_pinned() is
CPU-independent. The reason for this is to update the per-CPU
'tsk_pinned' histogram.
To optimize the CPU-independent case to O(1), keep a separate
CPU-independent 'tsk_pinned_all' histogram.
The major source of complexity is the transitions between "all
CPU-independent task breakpoints" and "mixed CPU-independent and
CPU-dependent task breakpoints". The code comments list all cases that
require handling.
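Schematically, the update path becomes (the helper name is illustrative;
'tsk_pinned' and 'tsk_pinned_all' are the histograms named above):

  if (bp->cpu < 0)   /* CPU-independent task breakpoint */
          hist_update(&tsk_pinned_all, old, weight);  /* O(1) */
  else               /* CPU-dependent: touch only that CPU's histogram */
          hist_update(per_cpu_ptr(tsk_pinned, bp->cpu), old, weight);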
After this optimization:
| $> perf bench -r 100 breakpoint thread -b 4 -p 128 -t 512
| # Running 'breakpoint/thread' benchmark:
| # Created/joined 100 threads with 4 breakpoints and 128 parallelism
| Total time: 1.758 [sec]
|
| 34.336621 usecs/op
| 4395.087500 usecs/op/cpu
38.08% [kernel] [k] queued_spin_lock_slowpath
10.81% [kernel] [k] smp_cfm_core_cond
3.01% [kernel] [k] update_sg_lb_stats
2.58% [kernel] [k] osq_lock
2.57% [kernel] [k] llist_reverse_order
1.45% [kernel] [k] find_next_bit
1.21% [kernel] [k] flush_tlb_func_common
1.01% [kernel] [k] arch_install_hw_breakpoint
Showing that the time spent hashing keys has become insignificant.
With the given benchmark parameters, that's an improvement of 12%
compared with the old O(#cpus) version.
And finally, using the less aggressive parameters from the preceding
changes, we now observe:
| $> perf bench -r 30 breakpoint thread -b 4 -p 64 -t 64
| # Running 'breakpoint/thread' benchmark:
| # Created/joined 30 threads with 4 breakpoints and 64 parallelism
| Total time: 0.067 [sec]
|
| 35.292187 usecs/op
| 2258.700000 usecs/op/cpu
Which is an improvement of 12% compared to without the histogram
optimizations (baseline is 40 usecs/op). This is now on par with the
theoretical ideal (constraints disabled), and only 12% slower than no
breakpoints at all.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20220829124719.675715-15-elver@google.com
|
|
Running the perf benchmark with (note: more aggressive parameters than
in the preceding changes, but the same 256-CPU host):
| $> perf bench -r 100 breakpoint thread -b 4 -p 128 -t 512
| # Running 'breakpoint/thread' benchmark:
| # Created/joined 100 threads with 4 breakpoints and 128 parallelism
| Total time: 1.989 [sec]
|
| 38.854160 usecs/op
| 4973.332500 usecs/op/cpu
20.43% [kernel] [k] queued_spin_lock_slowpath
18.75% [kernel] [k] osq_lock
16.98% [kernel] [k] rhashtable_jhash2
8.34% [kernel] [k] task_bp_pinned
4.23% [kernel] [k] smp_cfm_core_cond
3.65% [kernel] [k] bcmp
2.83% [kernel] [k] toggle_bp_slot
1.87% [kernel] [k] find_next_bit
1.49% [kernel] [k] __reserve_bp_slot
We can see that a majority of the time is now spent hashing task
pointers to index into task_bps_ht in task_bp_pinned().
Obtaining the max_bp_pinned_slots() for CPU-independent task targets
currently is O(#cpus), and calls task_bp_pinned() for each CPU, even if
the result of task_bp_pinned() is CPU-independent.
The loop in max_bp_pinned_slots() wants to compute the maximum slots
across all CPUs. If task_bp_pinned() is CPU-independent, we can do so by
obtaining the max slots across all CPUs and adding task_bp_pinned().
To do so in O(1), use a bp_slots_histogram for CPU-pinned slots.
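The histogram makes the per-CPU maximum a scan over at most #slots
buckets, which is constant in the number of CPUs (a sketch of such a
lookup):

  /* highest non-empty bucket == max slots pinned on any CPU */
  static int hist_max(struct bp_slots_histogram *hist, int num_slots)
  {
          int i;

          for (i = num_slots - 1; i >= 0; i--) {
                  if (atomic_read(&hist->count[i]) > 0)
                          return i + 1;
          }
          return 0;
  }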
After this optimization:
| $> perf bench -r 100 breakpoint thread -b 4 -p 128 -t 512
| # Running 'breakpoint/thread' benchmark:
| # Created/joined 100 threads with 4 breakpoints and 128 parallelism
| Total time: 1.930 [sec]
|
| 37.697832 usecs/op
| 4825.322500 usecs/op/cpu
19.13% [kernel] [k] queued_spin_lock_slowpath
18.21% [kernel] [k] rhashtable_jhash2
15.46% [kernel] [k] osq_lock
6.27% [kernel] [k] toggle_bp_slot
5.91% [kernel] [k] task_bp_pinned
5.05% [kernel] [k] smp_cfm_core_cond
1.78% [kernel] [k] update_sg_lb_stats
1.36% [kernel] [k] llist_reverse_order
1.34% [kernel] [k] find_next_bit
1.19% [kernel] [k] bcmp
Suggesting that time spent in task_bp_pinned() has been reduced.
However, we're still hashing too much, which will be addressed in the
subsequent change.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20220829124719.675715-14-elver@google.com
|
|
Factor out the existing `atomic_t count[N]` into its own struct called
'bp_slots_histogram', to generalize and make its intent clearer in
preparation of reusing elsewhere. The basic idea of bucketing "total
uses of N slots" resembles a histogram, so calling it such seems most
intuitive.
No functional change.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20220829124719.675715-13-elver@google.com
|
|
While optimizing task_bp_pinned()'s runtime complexity to O(1) on
average helps reduce time spent in the critical section, we still suffer
due to serializing everything via 'nr_bp_mutex'. Indeed, a profile shows
that now contention is the biggest issue:
95.93% [kernel] [k] osq_lock
0.70% [kernel] [k] mutex_spin_on_owner
0.22% [kernel] [k] smp_cfm_core_cond
0.18% [kernel] [k] task_bp_pinned
0.18% [kernel] [k] rhashtable_jhash2
0.15% [kernel] [k] queued_spin_lock_slowpath
when running the breakpoint benchmark with (system with 256 CPUs):
| $> perf bench -r 30 breakpoint thread -b 4 -p 64 -t 64
| # Running 'breakpoint/thread' benchmark:
| # Created/joined 30 threads with 4 breakpoints and 64 parallelism
| Total time: 0.207 [sec]
|
| 108.267188 usecs/op
| 6929.100000 usecs/op/cpu
The main concern for synchronizing the breakpoint constraints data is
that a consistent snapshot of the per-CPU and per-task data is observed.
The access pattern is as follows:
1. If the target is a task: the task's pinned breakpoints are counted,
checked for space, and then appended to; only bp_cpuinfo::cpu_pinned
is used to check for conflicts with CPU-only breakpoints;
bp_cpuinfo::tsk_pinned are incremented/decremented, but otherwise
unused.
2. If the target is a CPU: bp_cpuinfo::cpu_pinned are counted, along
with bp_cpuinfo::tsk_pinned; after a successful check, cpu_pinned is
incremented. No per-task breakpoints are checked.
Since rhltable safely synchronizes insertions/deletions, we can allow
concurrency as follows:
1. If the target is a task: independent tasks may update and check the
constraints concurrently, but same-task target calls need to be
serialized; since bp_cpuinfo::tsk_pinned is only updated, but not
checked, these modifications can happen concurrently by switching
tsk_pinned to atomic_t.
2. If the target is a CPU: access to the per-CPU constraints needs to
be serialized with other CPU-target and task-target callers (to
stabilize the bp_cpuinfo::tsk_pinned snapshot).
We can allow the above concurrency by introducing a per-CPU constraints
data reader-writer lock (bp_cpuinfo_sem), and per-task mutexes (reuses
task_struct::perf_event_mutex):
1. If the target is a task: acquires perf_event_mutex, and acquires
bp_cpuinfo_sem as a reader. The choice of percpu-rwsem minimizes
contention in the presence of many read-lock but few write-lock
acquisitions: we assume many orders of magnitude more task target
breakpoints creations/destructions than CPU target breakpoints.
2. If the target is a CPU: acquires bp_cpuinfo_sem as a writer.
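In code, the two cases map onto the following acquisition pattern (a
sketch of the scheme above):

  if (bp->hw.target) {    /* task target */
          mutex_lock(&bp->hw.target->perf_event_mutex);
          percpu_down_read(&bp_cpuinfo_sem);
  } else {                /* CPU target */
          percpu_down_write(&bp_cpuinfo_sem);
  }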
With these changes, contention with thousands of tasks is reduced to the
point where waiting on locking no longer dominates the profile:
| $> perf bench -r 30 breakpoint thread -b 4 -p 64 -t 64
| # Running 'breakpoint/thread' benchmark:
| # Created/joined 30 threads with 4 breakpoints and 64 parallelism
| Total time: 0.077 [sec]
|
| 40.201563 usecs/op
| 2572.900000 usecs/op/cpu
21.54% [kernel] [k] task_bp_pinned
20.18% [kernel] [k] rhashtable_jhash2
6.81% [kernel] [k] toggle_bp_slot
5.47% [kernel] [k] queued_spin_lock_slowpath
3.75% [kernel] [k] smp_cfm_core_cond
3.48% [kernel] [k] bcmp
On this particular setup that's a speedup of 2.7x.
We're also getting closer to the theoretical ideal performance through
optimizations in hw_breakpoint.c -- constraints accounting disabled:
| perf bench -r 30 breakpoint thread -b 4 -p 64 -t 64
| # Running 'breakpoint/thread' benchmark:
| # Created/joined 30 threads with 4 breakpoints and 64 parallelism
| Total time: 0.067 [sec]
|
| 35.286458 usecs/op
| 2258.333333 usecs/op/cpu
Which means the current implementation is ~12% slower than the
theoretical ideal.
For reference, performance without any breakpoints:
| $> perf bench -r 30 breakpoint thread -b 0 -p 64 -t 64
| # Running 'breakpoint/thread' benchmark:
| # Created/joined 30 threads with 0 breakpoints and 64 parallelism
| Total time: 0.060 [sec]
|
| 31.365625 usecs/op
| 2007.400000 usecs/op/cpu
On a system with 256 CPUs, the theoretical ideal is only ~12% slower
than no breakpoints at all; the current implementation is ~28% slower.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20220829124719.675715-12-elver@google.com
|
|
Implement simple accessors to probe percpu-rwsem's locked state:
percpu_is_write_locked(), percpu_is_read_locked().
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20220829124719.675715-11-elver@google.com
|
|
Internal data structures (cpu_bps, task_bps) of powerpc's hw_breakpoint
implementation have relied on nr_bp_mutex serializing access to them.
Before overhauling synchronization of kernel/events/hw_breakpoint.c,
introduce 2 spinlocks to synchronize cpu_bps and task_bps respectively,
thus avoiding reliance on callers synchronizing powerpc's hw_breakpoint.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20220829124719.675715-10-elver@google.com
|
|
Commit a92438c5a30a ("soc: mediatek: mtk-svs: Use bitfield access macros
where possible") introduced the use of FIELD_GET and FIELD_PREP macros,
which are defined in the bitfield header. Add an explicit include for it
so we're sure to have the symbols defined independently of the config.
Fixes: a92438c5a30a ("soc: mediatek: mtk-svs: Use bitfield access macros where possible")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20220829204439.3748648-1-nfraprado@collabora.com
Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com>
|
|
Flexible breakpoints have never been implemented, with
bp_cpuinfo::flexible always being 0. Unfortunately, they still occupy 4
bytes in each bp_cpuinfo and bp_busy_slots, and the max flexible count
is still computed in fetch_bp_busy_slots().
This again causes suboptimal code generation, when we always know that
`!!slots.flexible` will be 0.
Just get rid of the flexible "placeholder" and remove all real code
related to it. Make a note in the comment related to the constraints
algorithm but don't remove them from the algorithm, so that if in future
flexible breakpoints need supporting, it should be trivial to revive
them (along with reverting this change).
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20220829124719.675715-9-elver@google.com
|
|
Due to being a __weak function, hw_breakpoint_weight() will cause the
compiler to always emit a call to it. This generates unnecessarily bad
code (register spills etc.) for no good reason; in fact it appears in
profiles of `perf bench -r 100 breakpoint thread -b 4 -p 128 -t 512`:
...
0.70% [kernel] [k] hw_breakpoint_weight
...
While a small percentage, no architecture defines its own
hw_breakpoint_weight() nor are there users outside hw_breakpoint.c,
which makes its current __weak definition a poor choice.
Change hw_breakpoint_weight()'s definition to follow a similar protocol
to hw_breakpoint_slots(), such that if <asm/hw_breakpoint.h> defines
hw_breakpoint_weight(), we'll use it instead.
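The protocol looks roughly like this (a sketch mirroring the
hw_breakpoint_slots() convention):

  /* kernel/events/hw_breakpoint.c */
  #ifndef hw_breakpoint_weight
  static inline int hw_breakpoint_weight(struct perf_event *bp)
  {
          return 1;
  }
  #endif

An architecture that needs a different weight defines
hw_breakpoint_weight as a macro in its <asm/hw_breakpoint.h>, shadowing
the generic inline.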
The result is that it is inlined and no longer shows up in profiles.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20220829124719.675715-8-elver@google.com
|
|
Optimize internal hw_breakpoint state if the architecture's number of
breakpoint slots is constant. This avoids several kmalloc() calls and
potentially unnecessary failures if the allocations fail, as well as
subtly improves code generation and cache locality.
The protocol is that if an architecture defines hw_breakpoint_slots via
the preprocessor, it must be constant and the same for all types.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20220829124719.675715-7-elver@google.com
|
|
Mark read-only data after initialization as __ro_after_init.
While we are here, turn 'constraints_initialized' into a bool.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20220829124719.675715-6-elver@google.com
|
|
On a machine with 256 CPUs, running the recently added perf breakpoint
benchmark results in:
| $> perf bench -r 30 breakpoint thread -b 4 -p 64 -t 64
| # Running 'breakpoint/thread' benchmark:
| # Created/joined 30 threads with 4 breakpoints and 64 parallelism
| Total time: 236.418 [sec]
|
| 123134.794271 usecs/op
| 7880626.833333 usecs/op/cpu
The benchmark tests inherited breakpoint perf events across many
threads.
Looking at a perf profile, we can see that the majority of the time is
spent in various hw_breakpoint.c functions, which execute within the
'nr_bp_mutex' critical sections which then results in contention on that
mutex as well:
37.27% [kernel] [k] osq_lock
34.92% [kernel] [k] mutex_spin_on_owner
12.15% [kernel] [k] toggle_bp_slot
11.90% [kernel] [k] __reserve_bp_slot
The culprit here is task_bp_pinned(), which has a runtime complexity of
O(#tasks) due to storing all task breakpoints in the same list and
iterating through that list looking for a matching task. Clearly, this
does not scale to thousands of tasks.
Instead, make use of the "rhashtable" variant "rhltable" which stores
multiple items with the same key in a list. This results in average
runtime complexity of O(1) for task_bp_pinned().
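With rhltable, all breakpoints of a task share one key, and a lookup
walks only that task's list (a sketch; parameter names abbreviated):

  /* count only the breakpoints belonging to bp's target task */
  head = rhltable_lookup(&task_bps_ht, &bp->hw.target, task_bps_ht_params);
  if (head) {
          rhl_for_each_entry_rcu(iter, pos, head, hw.bp_list)
                  count += hw_breakpoint_weight(iter);
  }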
With the optimization, the benchmark shows:
| $> perf bench -r 30 breakpoint thread -b 4 -p 64 -t 64
| # Running 'breakpoint/thread' benchmark:
| # Created/joined 30 threads with 4 breakpoints and 64 parallelism
| Total time: 0.208 [sec]
|
| 108.422396 usecs/op
| 6939.033333 usecs/op/cpu
On this particular setup that's a speedup of ~1135x.
While one option would be to make task_struct a breakpoint list node,
this would only further bloat task_struct for infrequently used data.
Furthermore, after all optimizations in this series, there's no evidence
it would result in better performance: later optimizations make the time
spent looking up entries in the hash table negligible (we'll reach the
theoretical ideal performance i.e. no constraints).
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20220829124719.675715-5-elver@google.com
|
|
Clean up headers:
- Remove unused <linux/kallsyms.h>
- Remove unused <linux/kprobes.h>
- Remove unused <linux/module.h>
- Remove unused <linux/smp.h>
- Add <linux/export.h> for EXPORT_SYMBOL_GPL().
- Add <linux/mutex.h> for mutex.
- Sort alphabetically.
- Move <linux/hw_breakpoint.h> to top to test it compiles on its own.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20220829124719.675715-4-elver@google.com
|
|
Provide hw_breakpoint_is_used() to check if breakpoints are in use on
the system.
Use it in the KUnit test to verify the global state before and after a
test case.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20220829124719.675715-3-elver@google.com
|
|
Add KUnit test for hw_breakpoint constraints accounting, with various
interesting mixes of breakpoint targets (some care was taken to catch
interesting corner cases via bug-injection).
The test cannot be built as a module because it requires access to
hw_breakpoint_slots(), which is not inlinable or exported on all
architectures.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20220829124719.675715-2-elver@google.com
|
|
The MediaTek xHCI dt-binding expects a specific order for the clocks,
but the mt8192 and mt8195 devicetrees were skipping some of the middle
clocks. These clocks are wired to the controller hardware but aren't
controllable.
Add the missing clocks as handles to fixed clocks, so that the clock
order is respected and the dtbs_check warnings are gone.
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20220708194314.56922-1-nfraprado@collabora.com
Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com>
|
|
Add MODULE_DEVICE_TABLE to enable module autoloading for the respective
device.
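The pattern is a one-liner next to the existing OF match table (table
name and compatible string shown for illustration):

  static const struct of_device_id qcom_pcie_ep_match[] = {
          { .compatible = "qcom,sdx55-pcie-ep" },
          { }
  };
  MODULE_DEVICE_TABLE(of, qcom_pcie_ep_match);

Without the MODULE_DEVICE_TABLE() line, the module carries no alias
information, so udev cannot autoload it when the device is probed.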
Link: https://lore.kernel.org/r/20220430084740.3769925-1-dmitry.baryshkov@linaro.org
Fixes: f55fee56a631 ("PCI: qcom-ep: Add Qualcomm PCIe Endpoint controller driver")
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
|
|
Merge in the BUG_ON() => WARN_ON_ONCE() conversion commit.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
- Use module_init instead of device_initcall.
- Add a function for module_exit to unregister the driver.
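Schematically (driver name is illustrative):

  static int __init foo_driver_init(void)
  {
          return platform_driver_register(&foo_driver);
  }
  module_init(foo_driver_init);        /* was: device_initcall(...) */

  static void __exit foo_driver_exit(void)
  {
          platform_driver_unregister(&foo_driver);
  }
  module_exit(foo_driver_exit);        /* allows clean module unload */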
Signed-off-by: Zhang Jianhua <chris.zjh@huawei.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|