linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2022-09-05	can: rx-offload: can_rx_offload_init_queue(): fix typo	Marc Kleine-Budde	1	-1/+1
	Fix typo "rounted" -> "rounded". Link: https://lore.kernel.org/all/20220811093617.1861938-2-mkl@pengutronix.de Fixes: d254586c3453 ("can: rx-offload: Add support for HW fifo based irq offloading") Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-09-05	rnbd-srv: remove redundant setting of blk_open_flags	Guoqing Jiang	1	-1/+0
	It is not necessary since it is set later just before function return success. Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com> Link: https://lore.kernel.org/r/20220902100055.25724-4-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-05	rnbd-srv: make process_msg_close returns void	Guoqing Jiang	1	-4/+3
	Change the return type to void given it always returns 0. Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com> Link: https://lore.kernel.org/r/20220902100055.25724-3-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-05	rnbd-srv: add comment in rnbd_srv_rdma_ev	Guoqing Jiang	1	-0/+5
	Let's add some explanations here given the err handling is not obvious. Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com> Link: https://lore.kernel.org/r/20220902100055.25724-2-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-05	block: remove unneeded return value of bio_check_ro()	Miaohe Lin	1	-7/+3
	bio_check_ro() always return false now. Remove this unneeded return value and cleanup the sole caller. No functional change intended. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Link: https://lore.kernel.org/r/20220905102754.1942-1-linmiaohe@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-05	blk-mq: remove unneeded needs_restart check	Miaohe Lin	1	-1/+1
	If code reaches here, needs_restart must be true. Remove this unneeded needs_restart check. No functional change intended. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Link: https://lore.kernel.org/r/20220905101950.4606-1-linmiaohe@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-05	block/blk-map: Remove set but unused variable 'added'	Jiapeng Chong	1	-2/+1
	The variable added is not effectively used in the function, so delete it. block/blk-map.c:273:16: warning: variable 'added' set but not used. Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2049 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Link: https://lore.kernel.org/r/20220905063253.120082-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-05	io_uring/notif: Remove the unused function io_notif_complete()	Jiapeng Chong	1	-8/+0
	The function io_notif_complete() is defined in the notif.c file, but not called elsewhere, so delete this unused function. io_uring/notif.c:24:20: warning: unused function 'io_notif_complete' [-Wunused-function]. Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2047 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20220905020436.51894-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-05	iio: pressure: icp10100: Switch from UNIVERSAL to DEFINE_RUNTIME_DEV_PM_OPS().	Jonathan Cameron	1	-5/+5
	The suspend and resume callbacks in this driver appear to be safe to call repeatedly, but why do so when we can use the DEFINE_RUNTIME_DEV_PM_OPS() macro to supply callbacks that check if we are already runtime suspended before doing unnecessary work. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Jean-Baptiste Maneyrol <jean-baptiste.maneyrol@tdk.com> Acked-by: Jean-Baptiste Maneyrol <jean-baptiste.maneyrol@tdk.com> Link: https://lore.kernel.org/r/20220807190414.1039028-2-jic23@kernel.org
2022-09-05	iio: adc: max1363: Drop provision to provide an IIO channel map via platform data	Jonathan Cameron	1	-6/+0
	Back in the days of board files, platform data was used to provide information on the mapping from ADC channel to an analog signal from another device. We've long since moved to doing this via device tree. Hence drop the support from the max1363 driver which is the only driver still providing this. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20220821161058.2207185-1-jic23@kernel.org
2022-09-05	iio: accel: bma400: Add support for single and double tap events	Jagath Jog J	2	-10/+347
	Add support for single and double tap events based on the tap threshold value, minimum quiet time before and after the tap and minimum time between the taps in the double tap. The INT1 pin is used to interrupt and the event is pushed to userspace. Signed-off-by: Jagath Jog J <jagathjog1996@gmail.com> Link: https://lore.kernel.org/r/20220831063117.4141-3-jagathjog1996@gmail.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2022-09-05	iio: Add new event type gesture and use direction for single and double tap	Jagath Jog J	5	-2/+87
	Add new event type for tap called gesture and the direction can be used to differentiate single and double tap. This may be used by accelerometer sensors to express single and double tap events. For directional tap, modifiers like IIO_MOD_(X/Y/Z) can be used along with singletap and doubletap direction. Signed-off-by: Jagath Jog J <jagathjog1996@gmail.com> Link: https://lore.kernel.org/r/20220831063117.4141-2-jagathjog1996@gmail.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2022-09-05	iio: Use per-device lockdep class for mlock	Vincent Whitchurch	2	-0/+7
	If an IIO driver uses callbacks from another IIO driver and calls iio_channel_start_all_cb() from one of its buffer setup ops, then lockdep complains due to the lock nesting, as in the below example with lmp91000. Since the locks are being taken on different IIO devices, there is no actual deadlock. Fix the warning by telling lockdep to use a different class for each iio_device. ============================================ WARNING: possible recursive locking detected -------------------------------------------- python3/23 is trying to acquire lock: (&indio_dev->mlock){+.+.}-{3:3}, at: iio_update_buffers but task is already holding lock: (&indio_dev->mlock){+.+.}-{3:3}, at: enable_store other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&indio_dev->mlock); lock(&indio_dev->mlock); * DEADLOCK * May be due to missing lock nesting notation 5 locks held by python3/23: #0: (sb_writers#5){.+.+}-{0:0}, at: ksys_write #1: (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter #2: (kn->active#14){.+.+}-{0:0}, at: kernfs_fop_write_iter #3: (&indio_dev->mlock){+.+.}-{3:3}, at: enable_store #4: (&iio_dev_opaque->info_exist_lock){+.+.}-{3:3}, at: iio_update_buffers Call Trace: __mutex_lock iio_update_buffers iio_channel_start_all_cb lmp91000_buffer_postenable __iio_update_buffers enable_store Fixes: 67e17300dc1d76 ("iio: potentiostat: add LMP91000 support") Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Link: https://lore.kernel.org/r/20220829091840.2791846-1-vincent.whitchurch@axis.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2022-09-05	iio: adc: add max11205 adc driver	Ramona Bolboaca	3	-0/+198
	Adding support for max11205 16-bit single-channel ultra-low power delta-sigma adc. The MAX11205 is compatible with the 2-wire interface and uses SCLK and RDY/DOUT for serial communications. In this mode, all controls are implemented by timing the high or low phase of the SCLK. The 2-wire serial interface only allows for data to be read out through the RDY/DOUT output. Datasheet: https://datasheets.maximintegrated.com/en/ds/MAX11205.pdf Signed-off-by: Ramona Bolboaca <ramona.bolboaca@analog.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Link: https://lore.kernel.org/r/20220831133021.215625-2-ramona.bolboaca@analog.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2022-09-05	Drivers: hv: Never allocate anything besides framebuffer from framebuffer memory region	Vitaly Kuznetsov	1	-1/+9
	Passed through PCI device sometimes misbehave on Gen1 VMs when Hyper-V DRM driver is also loaded. Looking at IOMEM assignment, we can see e.g. $ cat /proc/iomem ... f8000000-fffbffff : PCI Bus 0000:00 f8000000-fbffffff : 0000:00:08.0 f8000000-f8001fff : bb8c4f33-2ba2-4808-9f7f-02f3b4da22fe ... fe0000000-fffffffff : PCI Bus 0000:00 fe0000000-fe07fffff : bb8c4f33-2ba2-4808-9f7f-02f3b4da22fe fe0000000-fe07fffff : 2ba2:00:02.0 fe0000000-fe07fffff : mlx4_core the interesting part is the 'f8000000' region as it is actually the VM's framebuffer: $ lspci -v ... 0000:00:08.0 VGA compatible controller: Microsoft Corporation Hyper-V virtual VGA (prog-if 00 [VGA controller]) Flags: bus master, fast devsel, latency 0, IRQ 11 Memory at f8000000 (32-bit, non-prefetchable) [size=64M] ... hv_vmbus: registering driver hyperv_drm hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] Synthvid Version major 3, minor 5 hyperv_drm 0000:00:08.0: vgaarb: deactivate vga console hyperv_drm 0000:00:08.0: BAR 0: can't reserve [mem 0xf8000000-0xfbffffff] hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] Cannot request framebuffer, boot fb still active? Note: "Cannot request framebuffer" is not a fatal error in hyperv_setup_gen1() as the code assumes there's some other framebuffer device there but we actually have some other PCI device (mlx4 in this case) config space there! The problem appears to be that vmbus_allocate_mmio() can use dedicated framebuffer region to serve any MMIO request from any device. The semantics one might assume of a parameter named "fb_overlap_ok" aren't implemented because !fb_overlap_ok essentially has no effect. The existing semantics are really "prefer_fb_overlap". This patch implements the expected and needed semantics, which is to not allocate from the frame buffer space when !fb_overlap_ok. Note, Gen2 VMs are usually unaffected by the issue because framebuffer region is already taken by EFI fb (in case kernel supports it) but Gen1 VMs may have this region unclaimed by the time Hyper-V PCI pass-through driver tries allocating MMIO space if Hyper-V DRM/FB drivers load after it. Devices can be brought up in any sequence so let's resolve the issue by always ignoring 'fb_mmio' region for non-FB requests, even if the region is unclaimed. Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20220827130345.1320254-4-vkuznets@redhat.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-09-05	Drivers: hv: Always reserve framebuffer region for Gen1 VMs	Vitaly Kuznetsov	1	-14/+32
	vmbus_reserve_fb() tries reserving framebuffer region iff 'screen_info.lfb_base' is set. Gen2 VMs seem to have it set by EFI and/or by the kernel EFI FB driver (or, in some edge cases like kexec, the address where the buffer was moved, see https://lore.kernel.org/all/20201014092429.1415040-1-kasong@redhat.com/) but on Gen1 VM it depends on bootloader behavior. With grub, it depends on 'gfxpayload=' setting but in some cases it is observed to be zero. That being said, relying on 'screen_info.lfb_base' to reserve framebuffer region is risky. For Gen1 VMs, it should always be possible to get the address from the dedicated PCI device instead. Check for legacy PCI video device presence and reserve the whole region for framebuffer on Gen1 VMs. Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20220827130345.1320254-3-vkuznets@redhat.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-09-05	PCI: Move PCI_VENDOR_ID_MICROSOFT/PCI_DEVICE_ID_HYPERV_VIDEO definitions to pci_ids.h	Vitaly Kuznetsov	4	-11/+3
	There are already three places in kernel which define PCI_VENDOR_ID_MICROSOFT and two for PCI_DEVICE_ID_HYPERV_VIDEO and there's a need to use these from core VMBus code. Move the defines where they belong. No functional change. Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci_ids.h Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20220827130345.1320254-2-vkuznets@redhat.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-09-05	tools: hv: kvp: remove unnecessary (void*) conversions	Zhou jie	1	-2/+2
	Remove unnecessary void* type casting. Signed-off-by: Zhou jie <zhoujie@nfschina.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20220823034552.8596-1-zhoujie@nfschina.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-09-05	Drivers: hv: remove duplicate word in a comment	Shaomin Deng	1	-1/+1
	Signed-off-by: Shaomin Deng <dengshaomin@cdjrlc.com> Link: https://lore.kernel.org/r/20220904154808.26022-1-dengshaomin@cdjrlc.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-09-05	lib/string_helpers: Introduce parse_int_array_user()	Mark Brown	3	-89/+61
	Merge series from Cezary Rojewski <cezary.rojewski@intel.com>: Continuation of recent upstream discussion [1] regarding user string tokenization. First, parse_int_array_user() is introduced to allow for splitting specified user string into a sequence of integers. Makes use of get_options() internally so the parsing logic is not duplicated. With that done, redundant parts of the sound driver are removed. Originally similar functionality was added for the SOF sound driver. As more users are on the horizon, it is desirable to update existing string_helpers code and provide a unified solution.
2022-09-05	Untested TAS2562 power setting fixes	Mark Brown	1	-61/+29
	Merge series from Martin Povišer <povik+lin@cutebit.org>: The tas2562 driver does the same thing with the setting of PWR_CTRL field as the tas2764/tas2770 drivers were doing. Link: https://lore.kernel.org/alsa-devel/20220808141246.5749-1-povik+lin@cutebit.org/T/#t Link: https://lore.kernel.org/alsa-devel/20220825140241.53963-1-povik+lin@cutebit.org/T/#t These are blindly written patches without testing since I don't have the hardware. (I even tried TI's formal sample request program but was refused there. CCing @ti.com addresses I found on other series recently submitted.)
2022-09-05	ASoC: soc-pcm.c: random cleanup	Mark Brown	1	-65/+47
	Merge series from Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>: These are not related, but random cleanup patches for soc-pcm.c
2022-09-05	erofs: fix pcluster use-after-free on UP platforms	Gao Xiang	1	-29/+0
	During stress testing with CONFIG_SMP disabled, KASAN reports as below: ================================================================== BUG: KASAN: use-after-free in __mutex_lock+0xe5/0xc30 Read of size 8 at addr ffff8881094223f8 by task stress/7789 CPU: 0 PID: 7789 Comm: stress Not tainted 6.0.0-rc1-00002-g0d53d2e882f9 #3 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <TASK> .. __mutex_lock+0xe5/0xc30 .. z_erofs_do_read_page+0x8ce/0x1560 .. z_erofs_readahead+0x31c/0x580 .. Freed by task 7787 kasan_save_stack+0x1e/0x40 kasan_set_track+0x20/0x30 kasan_set_free_info+0x20/0x40 __kasan_slab_free+0x10c/0x190 kmem_cache_free+0xed/0x380 rcu_core+0x3d5/0xc90 __do_softirq+0x12d/0x389 Last potentially related work creation: kasan_save_stack+0x1e/0x40 __kasan_record_aux_stack+0x97/0xb0 call_rcu+0x3d/0x3f0 erofs_shrink_workstation+0x11f/0x210 erofs_shrink_scan+0xdc/0x170 shrink_slab.constprop.0+0x296/0x530 drop_slab+0x1c/0x70 drop_caches_sysctl_handler+0x70/0x80 proc_sys_call_handler+0x20a/0x2f0 vfs_write+0x555/0x6c0 ksys_write+0xbe/0x160 do_syscall_64+0x3b/0x90 The root cause is that erofs_workgroup_unfreeze() doesn't reset to orig_val thus it causes a race that the pcluster reuses unexpectedly before freeing. Since UP platforms are quite rare now, such path becomes unnecessary. Let's drop such specific-designed path directly instead. Fixes: 73f5c66df3e2 ("staging: erofs: fix `erofs_workgroup_{try_to_freeze, unfreeze}'") Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20220902045710.109530-1-hsiangkao@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-09-05	erofs: avoid the potentially wrong m_plen for big pcluster	Yue Hu	1	-8/+8
	Actually, 'compressedlcs' stores compressed block count rather than lcluster count. Therefore, the number of bits for shifting the count should be 'LOG_BLOCK_SIZE' rather than 'lclusterbits' although current lcluster size is 4K. The value of 'm_plen' will be wrong once we enable the non 4K-sized lcluster. Signed-off-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20220812060150.8510-1-huyue2@coolpad.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-09-05	erofs: fix error return code in erofs_fscache_{meta_,}read_folio	Sun Ke	1	-2/+6
	If erofs_fscache_alloc_request fail and then goto out, it will return 0. it should return a negative error code instead of 0. Fixes: d435d53228dd ("erofs: change to use asynchronous io for fscache readpage/readahead") Signed-off-by: Sun Ke <sunke32@huawei.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20220815034829.3940803-1-sunke32@huawei.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-09-05	asm-generic: Conditionally enable do_softirq_own_stack() via Kconfig.	Sebastian Andrzej Siewior	10	-10/+13
	Remove the CONFIG_PREEMPT_RT symbol from the ifdef around do_softirq_own_stack() and move it to Kconfig instead. Enable softirq stacks based on SOFTIRQ_ON_OWN_STACK which depends on HAVE_SOFTIRQ_ON_OWN_STACK and its default value is set to !PREEMPT_RT. This ensures that softirq stacks are not used on PREEMPT_RT and avoids a 'select' statement on an option which has a 'depends' statement. Link: https://lore.kernel.org/YvN5E%2FPrHfUhggr7@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-09-05	arm64: defconfig: Config that had RPMSG_CHAR now gets RPMSG_CTRL	Arnaud Pouliquen	1	-0/+1
	In the commit 617d32938d1b ("rpmsg: Move the rpmsg control device from rpmsg_char to rpmsg_ctrl"), we split the rpmsg_char driver in two. By default give everyone who had the old driver enabled the rpmsg_ctrl driver too. Signed-off-by: Arnaud Pouliquen <arnaud.pouliquen@foss.st.com> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-09-05	arm: configs: Configs that had RPMSG_CHAR now get RPMSG_CTRL	Arnaud Pouliquen	1	-0/+1
	In the commit 617d32938d1b ("rpmsg: Move the rpmsg control device from rpmsg_char to rpmsg_ctrl"), we split the rpmsg_char driver in two. By default give everyone who had the old driver enabled the rpmsg_ctrl driver too. Signed-off-by: Arnaud Pouliquen <arnaud.pouliquen@foss.st.com> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-09-05	ASoC: SOF: Remove strsplit_u32() and tokenize_input()	Cezary Rojewski	1	-89/+15
	Make use of global integer-array parsing helper instead of the internal one as both serve same purpose. With that, both strsplit_u32() and tokenize_input() become unused so remove them. Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com> Link: https://lore.kernel.org/r/20220904102840.862395-3-cezary.rojewski@intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-09-05	lib/string_helpers: Introduce parse_int_array_user()	Cezary Rojewski	2	-0/+46
	Add new helper function to allow for splitting specified user string into a sequence of integers. Internally it makes use of get_options() so the returned sequence contains the integers extracted plus an additional element that begins the sequence and specifies the integers count. Suggested-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com> Link: https://lore.kernel.org/r/20220904102840.862395-2-cezary.rojewski@intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-09-05	r8169: remove not needed net_ratelimit() check	Heiner Kallweit	1	-2/+1
	We're not in a hot path and don't want to miss this message, therefore remove the net_ratelimit() check. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-05	netlink: Bounds-check struct nlmsgerr creation	Kees Cook	2	-6/+10
	In preparation for FORTIFY_SOURCE doing bounds-check on memcpy(), switch from __nlmsg_put to nlmsg_put(), and explain the bounds check for dealing with the memcpy() across a composite flexible array struct. Avoids this future run-time warning: memcpy: detected field-spanning write (size 32) of single field "&errmsg->msg" at net/netlink/af_netlink.c:2447 (size 16) Cc: Jakub Kicinski <kuba@kernel.org> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@netfilter.org> Cc: Florian Westphal <fw@strlen.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: syzbot <syzkaller@googlegroups.com> Cc: netfilter-devel@vger.kernel.org Cc: coreteam@netfilter.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220901071336.1418572-1-keescook@chromium.org Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-05	Merge tag 'for-net-2022-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth	David S. Miller	1	-6/+6
	Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - Fix regression preventing ACL packet transmission ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-05	btrfs: zoned: fix mounting with conventional zones	Johannes Thumshirn	1	-41/+40
	Since commit 6a921de58992 ("btrfs: zoned: introduce space_info->active_total_bytes"), we're only counting the bytes of a block group on an active zone as usable for metadata writes. But on a SMR drive, we don't have active zones and short circuit some of the logic. This leads to an error on mount, because we cannot reserve space for metadata writes. Fix this by also setting the BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE bit in the block-group's runtime flag if the zone is a conventional zone. Fixes: 6a921de58992 ("btrfs: zoned: introduce space_info->active_total_bytes") Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-05	Merge branch 'bpf-allocator'	Daniel Borkmann	10	-134/+820
	Alexei Starovoitov says: ==================== Introduce any context BPF specific memory allocator. Tracing BPF programs can attach to kprobe and fentry. Hence they run in unknown context where calling plain kmalloc() might not be safe. Front-end kmalloc() with per-cpu cache of free elements. Refill this cache asynchronously from irq_work. Major achievements enabled by bpf_mem_alloc: - Dynamically allocated hash maps used to be 10 times slower than fully preallocated. With bpf_mem_alloc and subsequent optimizations the speed of dynamic maps is equal to full prealloc. - Tracing bpf programs can use dynamically allocated hash maps. Potentially saving lots of memory. Typical hash map is sparsely populated. - Sleepable bpf programs can used dynamically allocated hash maps. Future work: - Expose bpf_mem_alloc as uapi FD to be used in dynptr_alloc, kptr_alloc - Convert lru map to bpf_mem_alloc - Further cleanup htab code. Example: htab_use_raw_lock can be removed. Changelog: v5->v6: - Debugged the reason for selftests/bpf/test_maps ooming in a small VM that BPF CI is using. Added patch 16 that optimizes the usage of rcu_barrier-s between bpf_mem_alloc and hash map. It drastically improved the speed of htab destruction. v4->v5: - Fixed missing migrate_disable in hash tab free path (Daniel) - Replaced impossible "memory leak" with WARN_ON_ONCE (Martin) - Dropped sysctl kernel.bpf_force_dyn_alloc patch (Daniel) - Added Andrii's ack - Added new patch 15 that removes kmem_cache usage from bpf_mem_alloc. It saves memory, speeds up map create/destroy operations while maintains hash map update/delete performance. v3->v4: - fix build issue due to missing local.h on 32-bit arch - add Kumar's ack - proposal for next steps from Delyan: https://lore.kernel.org/bpf/d3f76b27f4e55ec9e400ae8dcaecbb702a4932e8.camel@fb.com/ v2->v3: - Rewrote the free_list algorithm based on discussions with Kumar. Patch 1. - Allowed sleepable bpf progs use dynamically allocated maps. Patches 13 and 14. - Added sysctl to force bpf_mem_alloc in hash map even if pre-alloc is requested to reduce memory consumption. Patch 15. - Fix: zero-fill percpu allocation - Single rcu_barrier at the end instead of each cpu during bpf_mem_alloc destruction v2 thread: https://lore.kernel.org/bpf/20220817210419.95560-1-alexei.starovoitov@gmail.com/ v1->v2: - Moved unsafe direct call_rcu() from hash map into safe place inside bpf_mem_alloc. Patches 7 and 9. - Optimized atomic_inc/dec in hash map with percpu_counter. Patch 6. - Tuned watermarks per allocation size. Patch 8 - Adopted this approach to per-cpu allocation. Patch 10. - Fully converted hash map to bpf_mem_alloc. Patch 11. - Removed tracing prog restriction on map types. Combination of all patches and final patch 12. v1 thread: https://lore.kernel.org/bpf/20220623003230.37497-1-alexei.starovoitov@gmail.com/ LWN article: https://lwn.net/Articles/899274/ ==================== Link: https://lore.kernel.org/r/ Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2022-09-05	bpf: Optimize rcu_barrier usage between hash map and bpf_mem_alloc.	Alexei Starovoitov	3	-19/+69
	User space might be creating and destroying a lot of hash maps. Synchronous rcu_barrier-s in a destruction path of hash map delay freeing of hash buckets and other map memory and may cause artificial OOM situation under stress. Optimize rcu_barrier usage between bpf hash map and bpf_mem_alloc: - remove rcu_barrier from hash map, since htab doesn't use call_rcu directly and there are no callback to wait for. - bpf_mem_alloc has call_rcu_in_progress flag that indicates pending callbacks. Use it to avoid barriers in fast path. - When barriers are needed copy bpf_mem_alloc into temp structure and wait for rcu barrier-s in the worker to let the rest of hash map freeing to proceed. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220902211058.60789-17-alexei.starovoitov@gmail.com
2022-09-05	bpf: Remove usage of kmem_cache from bpf_mem_cache.	Alexei Starovoitov	1	-36/+14
	For bpf_mem_cache based hash maps the following stress test: for (i = 1; i <= 512; i <<= 1) for (j = 1; j <= 1 << 18; j <<= 1) fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, i, j, 2, 0); creates many kmem_cache-s that are not mergeable in debug kernels and consume unnecessary amount of memory. Turned out bpf_mem_cache's free_list logic does batching well, so usage of kmem_cache for fixes size allocations doesn't bring any performance benefits vs normal kmalloc. Hence get rid of kmem_cache in bpf_mem_cache. That saves memory, speeds up map create/destroy operations, while maintains hash map update/delete performance. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220902211058.60789-16-alexei.starovoitov@gmail.com
2022-09-05	bpf: Remove prealloc-only restriction for sleepable bpf programs.	Alexei Starovoitov	1	-23/+0
	Since hash map is now converted to bpf_mem_alloc and it's waiting for rcu and rcu_tasks_trace GPs before freeing elements into global memory slabs it's safe to use dynamically allocated hash maps in sleepable bpf programs. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220902211058.60789-15-alexei.starovoitov@gmail.com
2022-09-05	bpf: Prepare bpf_mem_alloc to be used by sleepable bpf programs.	Alexei Starovoitov	1	-1/+14
	Use call_rcu_tasks_trace() to wait for sleepable progs to finish. Then use call_rcu() to wait for normal progs to finish and finally do free_one() on each element when freeing objects into global memory pool. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220902211058.60789-14-alexei.starovoitov@gmail.com
2022-09-05	bpf: Remove tracing program restriction on map types	Alexei Starovoitov	1	-42/+0
	The hash map is now fully converted to bpf_mem_alloc. Its implementation is not allocating synchronously and not calling call_rcu() directly. It's now safe to use non-preallocated hash maps in all types of tracing programs including BPF_PROG_TYPE_PERF_EVENT that runs out of NMI context. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220902211058.60789-13-alexei.starovoitov@gmail.com
2022-09-05	bpf: Convert percpu hash map to per-cpu bpf_mem_alloc.	Alexei Starovoitov	1	-26/+19
	Convert dynamic allocations in percpu hash map from alloc_percpu() to bpf_mem_cache_alloc() from per-cpu bpf_mem_alloc. Since bpf_mem_alloc frees objects after RCU gp the call_rcu() is removed. pcpu_init_value() now needs to zero-fill per-cpu allocations, since dynamically allocated map elements are now similar to full prealloc, since alloc_percpu() is not called inline and the elements are reused in the freelist. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220902211058.60789-12-alexei.starovoitov@gmail.com
2022-09-05	bpf: Add percpu allocation support to bpf_mem_alloc.	Alexei Starovoitov	3	-7/+41
	Extend bpf_mem_alloc to cache free list of fixed size per-cpu allocations. Once such cache is created bpf_mem_cache_alloc() will return per-cpu objects. bpf_mem_cache_free() will free them back into global per-cpu pool after observing RCU grace period. per-cpu flavor of bpf_mem_alloc is going to be used by per-cpu hash maps. The free list cache consists of tuples { llist_node, per-cpu pointer } Unlike alloc_percpu() that returns per-cpu pointer the bpf_mem_cache_alloc() returns a pointer to per-cpu pointer and bpf_mem_cache_free() expects to receive it back. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220902211058.60789-11-alexei.starovoitov@gmail.com
2022-09-05	bpf: Batch call_rcu callbacks instead of SLAB_TYPESAFE_BY_RCU.	Alexei Starovoitov	2	-4/+66
	SLAB_TYPESAFE_BY_RCU makes kmem_caches non mergeable and slows down kmem_cache_destroy. All bpf_mem_cache are safe to share across different maps and programs. Convert SLAB_TYPESAFE_BY_RCU to batched call_rcu. This change solves the memory consumption issue, avoids kmem_cache_destroy latency and keeps bpf hash map performance the same. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220902211058.60789-10-alexei.starovoitov@gmail.com
2022-09-05	bpf: Adjust low/high watermarks in bpf_mem_cache	Alexei Starovoitov	1	-14/+36
	The same low/high watermarks for every bucket in bpf_mem_cache consume significant amount of memory. Preallocating 64 elements of 4096 bytes each in the free list is not efficient. Make low/high watermarks and batching value dependent on element size. This change brings significant memory savings. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220902211058.60789-9-alexei.starovoitov@gmail.com
2022-09-05	bpf: Optimize call_rcu in non-preallocated hash map.	Alexei Starovoitov	3	-14/+7
	Doing call_rcu() million times a second becomes a bottle neck. Convert non-preallocated hash map from call_rcu to SLAB_TYPESAFE_BY_RCU. The rcu critical section is no longer observed for one htab element which makes non-preallocated hash map behave just like preallocated hash map. The map elements are released back to kernel memory after observing rcu critical section. This improves 'map_perf_test 4' performance from 100k events per second to 250k events per second. bpf_mem_alloc + percpu_counter + typesafe_by_rcu provide 10x performance boost to non-preallocated hash map and make it within few % of preallocated map while consuming fraction of memory. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220902211058.60789-8-alexei.starovoitov@gmail.com
2022-09-05	bpf: Optimize element count in non-preallocated hash map.	Alexei Starovoitov	1	-8/+62
	The atomic_inc/dec might cause extreme cache line bouncing when multiple cpus access the same bpf map. Based on specified max_entries for the hash map calculate when percpu_counter becomes faster than atomic_t and use it for such maps. For example samples/bpf/map_perf_test is using hash map with max_entries 1000. On a system with 16 cpus the 'map_perf_test 4' shows 14k events per second using atomic_t. On a system with 15 cpus it shows 100k events per second using percpu. map_perf_test is an extreme case where all cpus colliding on atomic_t which causes extreme cache bouncing. Note that the slow path of percpu_counter is 5k events per secound vs 14k for atomic, so the heuristic is necessary. See comment in the code why the heuristic is based on num_online_cpus(). Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220902211058.60789-7-alexei.starovoitov@gmail.com
2022-09-05	bpf: Relax the requirement to use preallocated hash maps in tracing progs.	Alexei Starovoitov	1	-9/+22
	Since bpf hash map was converted to use bpf_mem_alloc it is safe to use from tracing programs and in RT kernels. But per-cpu hash map is still using dynamic allocation for per-cpu map values, hence keep the warning for this map type. In the future alloc_percpu_gfp can be front-end-ed with bpf_mem_cache and this restriction will be completely lifted. perf_event (NMI) bpf programs have to use preallocated hash maps, because free_htab_elem() is using call_rcu which might crash if re-entered. Sleepable bpf programs have to use preallocated hash maps, because life time of the map elements is not protected by rcu_read_lock/unlock. This restriction can be lifted in the future as well. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220902211058.60789-6-alexei.starovoitov@gmail.com
2022-09-05	samples/bpf: Reduce syscall overhead in map_perf_test.	Alexei Starovoitov	2	-17/+29
	Make map_perf_test for preallocated and non-preallocated hash map spend more time inside bpf program to focus performance analysis on the speed of update/lookup/delete operations performed by bpf program. It makes 'perf report' of bpf_mem_alloc look like: 11.76% map_perf_test [k] _raw_spin_lock_irqsave 11.26% map_perf_test [k] htab_map_update_elem 9.70% map_perf_test [k] _raw_spin_lock 9.47% map_perf_test [k] htab_map_delete_elem 8.57% map_perf_test [k] memcpy_erms 5.58% map_perf_test [k] alloc_htab_elem 4.09% map_perf_test [k] __htab_map_lookup_elem 3.44% map_perf_test [k] syscall_exit_to_user_mode 3.13% map_perf_test [k] lookup_nulls_elem_raw 3.05% map_perf_test [k] migrate_enable 3.04% map_perf_test [k] memcmp 2.67% map_perf_test [k] unit_free 2.39% map_perf_test [k] lookup_elem_raw Reduce default iteration count as well to make 'map_perf_test' quick enough even on debug kernels. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220902211058.60789-5-alexei.starovoitov@gmail.com
2022-09-05	selftests/bpf: Improve test coverage of test_maps	Alexei Starovoitov	1	-14/+24
	Make test_maps more stressful with more parallelism in update/delete/lookup/walk including different value sizes. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220902211058.60789-4-alexei.starovoitov@gmail.com
2022-09-05	bpf: Convert hash map to bpf_mem_alloc.	Alexei Starovoitov	1	-5/+16
	Convert bpf hash map to use bpf memory allocator. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220902211058.60789-3-alexei.starovoitov@gmail.com