wireguard-linux - WireGuard for the Linux kernel

Age	Commit message (Collapse)	Author	Files	Lines
2024-12-24	usb: typec: Add driver for Thunderbolt 3 Alternate Mode	Heikki Krogerus	4	-0/+400
	Thunderbolt 3 Alternate Mode entry flow is described in USB Type-C Specification Release 2.0. Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Co-developed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Reviewed-by: Benson Leung <bleung@chromium.org> Link: https://lore.kernel.org/r/20241213153543.v5.2.I3080b036e8de0b9957c57c1c3059db7149c5e549@changeid Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-24	usb: typec: Only use SVID for matching altmodes	Abhishek Pandit-Subedi	6	-16/+8
	Mode in struct typec_altmode is used to indicate the index of the altmode on a port, partner or plug. It is used in enter mode VDMs but doesn't make much sense for matching against altmode drivers or for matching partner to port altmodes. Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Benson Leung <bleung@chromium.org> Link: https://lore.kernel.org/r/20241213153543.v5.1.Ie0d37646f18461234777d88b4c3e21faed92ed4f@changeid Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-24	usb: dwc3: gadget: Fix incorrect UDC state after manual deconfiguration	Roy Luo	1	-0/+2
	The UDC state in sysfs (/sys/class/udc/<udc>/state) should accurately reflect the current state of the USB Device Controller. Currently, the UDC state is not handled consistently during gadget disconnection. While the disconnect interrupt path correctly sets the state to "not-attached", manual deconfiguration leaves the state in "configured", misrepresenting the actual situation. This commit ensures consistent UDC state handling by setting the state to "not-attached" after manual deconfiguration. This accurately reflects the UDC's state and provides a consistent behavior regardless of the disconnection method. Signed-off-by: Roy Luo <royluo@google.com> Reviewed-by: André Draszik <andre.draszik@linaro.org> Tested-by: André Draszik <andre.draszik@linaro.org> Link: https://lore.kernel.org/r/20241223042536.1465299-1-royluo@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-24	usbip: vudc: Constify 'struct bin_attribute'	Thomas Weißschuh	1	-4/+4
	The sysfs core now allows instances of 'struct bin_attribute' to be moved into read-only memory. Make use of that to protect them against accidental or malicious modifications. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20241222-sysfs-const-bin_attr-usbip-v1-1-20d611a9bfa4@weissschuh.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-24	usb: core: sysfs: Constify 'struct bin_attribute'	Thomas Weißschuh	1	-6/+6
	The sysfs core now allows instances of 'struct bin_attribute' to be moved into read-only memory. Make use of that to protect them against accidental or malicious modifications. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20241222-sysfs-const-bin_attr-usb-v1-1-19a137c0f20a@weissschuh.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-24	usbip: Accept arbitrarily long scatter-gather list	Jason Long	1	-6/+2
	Fixes issue where memory will fail to be allocated for larger bulk transfers, ~1 MB or more. This occurs because userland libraries, such as libusb, send the entire USB data buffer when SG support is detected. The assumption is that the driver knows how to properly split the data up before sending it out. By hardcoding a limit, bigger transfers that exceed the SG tablesize limit of 32 will be unable to use SG. This results in an attempt to allocate contiguous pages which, unsurprisingly, will fail too and returns an ENOMEM. It looks like other drivers that support SG allow for any length of SG lists. Accepting any SG size allows the driver to properly handle large bulk transfer situations. Tested bulk read and write operations using the following devices: - Logitech Webcam Pro 9000 - USB 2.0 - SanDisk Ultra - USB 3.0 - Logitech M500s Mouse Signed-off-by: Jason Long <jasonlongball@gmail.com> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Link: https://lore.kernel.org/r/20241218161344.202637-1-jasonlongball@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-24	usb: typec: tcpm: Add new AMS for Get_Revision response	Amit Sunil Dhamne	2	-3/+60
	This commit adds a new AMS for responding to a "Get_Revision" request. Revision message consists of the following fields: +----------------------------------------------------+ \| Header \| RMDO \| \| No. of data objects = 1 \| \| +----------------------------------------------------+ While RMDO consists of: * B31..28 Revision Major * B27..24 Revision Minor * B23..20 Version Major * B19..16 Version Minor * B15..0 Reserved, shall be set to zero. As per the PD spec ("8.3.3.16.2.1 PR_Give_Revision State"), a request is only expected when an explicit contract is established and the port is in ready state. This AMS is only supported for PD >= 3.0. Signed-off-by: Amit Sunil Dhamne <amitsd@google.com> Reviewed-by: Badhri Jagan Sridharan <badhri@google.com> Link: https://lore.kernel.org/r/20241210-get_rev_upstream-v2-3-d0094e52d48f@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-24	usb: typec: tcpm: Add support for parsing pd-revision DT property	Amit Sunil Dhamne	1	-2/+44
	Add support for parsing "pd-revision" DT property in TCPM and store PD revision and version supported by the Type-C connnector. It should be noted that the PD revision is the maximum possible revision supported by the port. This is different from the 2 bit revision set in PD msg headers. The purpose of the 2 bit revision value is to negotiate between Rev 2.X & 3.X spec rev as part of contract negotiation, while this is used for Get_Revision AMS after a contract is in place. Signed-off-by: Amit Sunil Dhamne <amitsd@google.com> Reviewed-by: Badhri Jagan Sridharan <badhri@google.com> Link: https://lore.kernel.org/r/20241210-get_rev_upstream-v2-2-d0094e52d48f@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-24	dt-bindings: connector: Add pd-revision property	Amit Sunil Dhamne	2	-0/+8
	Add pd-revision property definition, to specify the maximum Power Delivery Revision and Version supported by the connector. Signed-off-by: Amit Sunil Dhamne <amitsd@google.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://lore.kernel.org/r/20241210-get_rev_upstream-v2-1-d0094e52d48f@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-24	usb: typec: hd3ss3220: support configuring role preference based on fwnode property and typec_operation	Oliver Facklam	1	-25/+47
	The TI HD3SS3220 Type-C controller supports configuring its role preference when operating as a dual-role port through the SOURCE_PREF field of the General Control Register. The previous driver behavior was to set the role preference based on the dr_set typec_operation. However, the controller does not support swapping the data role on an active connection due to its lack of Power Delivery support. Remove previous dr_set typec_operation, and support setting the role preference based on the corresponding fwnode property, as well as the try_role typec_operation. Signed-off-by: Oliver Facklam <oliver.facklam@zuehlke.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Link: https://lore.kernel.org/r/20241211-usb-typec-controller-enhancements-v3-3-e4bc1b6e1441@zuehlke.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-24	usb: typec: hd3ss3220: support configuring port type	Oliver Facklam	1	-1/+87
	The TI HD3SS3220 Type-C controller supports configuring the port type it will operate as through the MODE_SELECT field of the General Control Register. Configure the port type based on the fwnode property "power-role" during probe, if present. If the property is absent, leave the operation mode at the default, which is defined by the PORT pin of the chip. Support configuring the port type through the port_type_set typec_operation as well. The MODE_SELECT field can only be changed when the controller is in unattached state, so follow the sequence recommended by the datasheet to: 1. disable termination on CC pins to disable the controller 2. change the mode 3. re-enable termination This will effectively cause a connected device to disconnect for the duration of the mode change. Signed-off-by: Oliver Facklam <oliver.facklam@zuehlke.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Link: https://lore.kernel.org/r/20241211-usb-typec-controller-enhancements-v3-2-e4bc1b6e1441@zuehlke.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-24	usb: typec: hd3ss3220: configure advertised power opmode based on fwnode property	Oliver Facklam	1	-0/+53
	The TI HD3SS3220 Type-C controller supports configuring its advertised power operation mode over I2C using the CURRENT_MODE_ADVERTISE field of the Connection Status Register. Configure this power mode based on the existing (optional) property "typec-power-opmode" of /schemas/connector/usb-connector.yaml Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Oliver Facklam <oliver.facklam@zuehlke.com> Link: https://lore.kernel.org/r/20241211-usb-typec-controller-enhancements-v3-1-e4bc1b6e1441@zuehlke.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-24	arm64: dts: qcom: x1e80100-qcp: Enable external DP support	Stephan Gerhold	1	-0/+24
	Now that the FSUSB42 USB switches are described, enable support for DP on the three USB-C ports of the X1E80100 QCP. It supports up to 4 lanes, but for now we need to limit this to 2 lanes due to limitations in the USB/DP combo PHY driver. The same limitation also exists on other boards upstream. Co-developed-by: Abel Vesa <abel.vesa@linaro.org> Signed-off-by: Abel Vesa <abel.vesa@linaro.org> Signed-off-by: Stephan Gerhold <stephan.gerhold@linaro.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://lore.kernel.org/r/20241212-x1e80100-qcp-dp-v1-3-37cb362a0dfe@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-24	arm64: dts: qcom: x1e80100-qcp: Add FSUSB42 USB switches	Stephan Gerhold	1	-0/+154
	Unlike most X1E boards, the QCP does not have Parade PS8830 retimers on the three USB-C ports. Instead, there are FSUSB42 USB switches for each port that handle orientation switching for the SBU lines. The overall setup is similar to the gpio-sbu-mux defined for sc8280xp-crd and the ThinkPad X13s. Co-developed-by: Abel Vesa <abel.vesa@linaro.org> Signed-off-by: Abel Vesa <abel.vesa@linaro.org> Signed-off-by: Stephan Gerhold <stephan.gerhold@linaro.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://lore.kernel.org/r/20241212-x1e80100-qcp-dp-v1-2-37cb362a0dfe@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-24	dt-bindings: usb: gpio-sbu-mux: Add an entry for FSUSB42	Stephan Gerhold	1	-0/+1
	Add a compatible entry for the onsemi FSUSB42 USB switch, which can be used for switching orientation of the SBU lines in USB Type-C applications. Drivers work as-is with the existing fallback compatible. Link to datasheet: https://www.onsemi.com/pdf/datasheet/fsusb42-d.pdf Signed-off-by: Stephan Gerhold <stephan.gerhold@linaro.org> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20241212-x1e80100-qcp-dp-v1-1-37cb362a0dfe@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-23	usb: typec: tcpci: set local CC to Rd only when cc1/cc2 status is Rp	Xu Yang	1	-2/+2
	The cc1 and cc2 status returned by tcpci_get_cc() may be TYPEC_CC_OPEN or TYPEC_CC_RA. So don't assume it's just TYPEC_CC_RD or TYPEC_CC_RP_. This will let local port present Rd on CC only when cc1/cc2 status is TYPEC_CC_RP_. Signed-off-by: Xu Yang <xu.yang_2@nxp.com> Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Link: https://lore.kernel.org/r/20241211105753.1205312-1-xu.yang_2@nxp.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-23	USB: usblp: remove redundant semicolon	Jun Yan	1	-1/+1
	remove redundant semicolon in LPIOC_SOFT_RESET to fix the incorrect macro expansion syntax. Signed-off-by: Jun Yan <jerrysteve1101@gmail.com> Link: https://lore.kernel.org/r/20241213145314.785616-1-jerrysteve1101@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-23	USB: Optimize goto logic in API usb_register_driver()	Zijun Hu	1	-4/+3
	usb_register_driver() uses complex goto statements to handle simple error cases, move down the goto label 'out' a bit to - Simplify goto logic - Leverage pr_err() prompt for driver registering failure. Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Link: https://lore.kernel.org/r/20241212-fix_usb-v1-1-300eb440c753@quicinc.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-23	usb: dwc3: dwc3-am62: Re-initialize controller if lost power in PM suspend	Roger Quadros	1	-27/+55
	If controller looses power during PM suspend then re-initialize it. We use the DEBUG_CFG register to track if controller lost power or was reset in PM suspend. Move all initialization code into dwc3_ti_init() so it can be re-used. Signed-off-by: Roger Quadros <rogerq@kernel.org> Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Link: https://lore.kernel.org/r/20241212-am62-dwc3-io-ddr-v3-1-10b95cd7e9c0@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-23	usb: common: expand documentation for USB functions	Lucy Mielke	1	-0/+14
	This patch adds documentation for two USB functions: - usb_otg_state_string(), which returns a human-readable name for OTG states and - usb_get_dr_mode_from_string(), which returns the dual role mode for a given string. Signed-off-by: Lucy Mielke <lucymielke@icloud.com> Link: https://lore.kernel.org/r/6nvegfmo6d5ak4soaf5nyifsaasfts4qlsnnpsd4sgpnikc2jd@amfmgdr5n3bi Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-23	usb: typec: intel_pmc_mux: Silence snprintf() output truncation warning	Heikki Krogerus	1	-1/+1
	In the function pmc_mux_port_debugfs_init() the buffer for the name of the port is limited to six bytes. That makes the compiler think that the output of "port%d" may be truncated. That can't actually happen as the interface can support maximum of eight ports. To make the compiler happy just increase the buffer to where the warning goes away. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202412031437.vX580pxx-lkp@intel.com/ Cc: Rajat Khandelwal <rajat.khandelwal@linux.intel.com> Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Link: https://lore.kernel.org/r/20241205113919.1182673-1-heikki.krogerus@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-22	Linux 6.13-rc4	Linus Torvalds	1	-1/+1

2024-12-22	KVM: x86: let it be known that ignore_msrs is a bad idea	Paolo Bonzini	1	-0/+7
	When running KVM with ignore_msrs=1 and report_ignored_msrs=0, the user has no clue that that the guest is being lied to. This may cause bug reports such as https://gitlab.com/qemu-project/qemu/-/issues/2571, where enabling a CPUID bit in QEMU caused Linux guests to try reading MSR_CU_DEF_ERR; and being lied about the existence of MSR_CU_DEF_ERR caused the guest to assume other things about the local APIC which were not true: Sep 14 12:02:53 kernel: mce: [Firmware Bug]: Your BIOS is not setting up LVT offset 0x2 for deferred error IRQs correctly. Sep 14 12:02:53 kernel: unchecked MSR access error: RDMSR from 0x852 at rIP: 0xffffffffb548ffa7 (native_read_msr+0x7/0x40) Sep 14 12:02:53 kernel: Call Trace: ... Sep 14 12:02:53 kernel: native_apic_msr_read+0x20/0x30 Sep 14 12:02:53 kernel: setup_APIC_eilvt+0x47/0x110 Sep 14 12:02:53 kernel: mce_amd_feature_init+0x485/0x4e0 ... Sep 14 12:02:53 kernel: [Firmware Bug]: cpu 0, try to use APIC520 (LVT offset 2) for vector 0xf4, but the register is already in use for vector 0x0 on this cpu Without reported_ignored_msrs=0 at least the host kernel log will contain enough information to avoid going on a wild goose chase. But if reports about individual MSR accesses are being silenced too, at least complain loudly the first time a VM is started. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-12-22	KVM: VMX: don't include '<linux/find.h>' directly	Wolfram Sang	1	-1/+1
	The header clearly states that it does not want to be included directly, only via '<linux/bitmap.h>'. Replace the include accordingly. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Message-ID: <20241217070539.2433-2-wsa+renesas@sang-engineering.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-12-21	staging: gpib: Fix allyesconfig build failures	Steven Rostedt	2	-2/+2
	My tests run an allyesconfig build and it failed with the following errors: LD [M] samples/kfifo/dma-example.ko ld.lld: error: undefined symbol: nec7210_board_reset ld.lld: error: undefined symbol: nec7210_read ld.lld: error: undefined symbol: nec7210_write It appears that some modules call the function nec7210_board_reset() that is defined in nec7210.c. In an allyesconfig build, these other modules are built in. But the file that holds nec7210_board_reset() has: obj-m += nec7210.o Where that "-m" means it only gets built as a module. With the other modules built in, they have no access to nec7210_board_reset() and the build fails. This isn't the only function. After fixing that one, I hit another: ld.lld: error: undefined symbol: push_gpib_event ld.lld: error: undefined symbol: gpib_match_device_path Where push_gpib_event() was also used outside of the file it was defined in, and that file too only was built as a module. Since the directory that nec7210.c is only traversed when CONFIG_GPIB_NEC7210 is set, and the directory with gpib_common.c is only traversed when CONFIG_GPIB_COMMON is set, use those configs as the option to build those modules. When it is an allyesconfig, then they will both be built in and their functions will be available to the other modules that are also built in. Fixes: 3ba84ac69b53e ("staging: gpib: Add nec7210 GPIB chip driver") Fixes: 9dde4559e9395 ("staging: gpib: Add GPIB common core driver") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-12-21	modpost: distinguish same module paths from different dump files	Masahiro Yamada	2	-9/+11
	Since commit 13b25489b6f8 ("kbuild: change working directory to external module directory with M="), module paths are always relative to the top of the external module tree. The module paths recorded in Module.symvers are no longer globally unique when they are passed via KBUILD_EXTRA_SYMBOLS for building other external modules, which may result in false-positive "exported twice" errors. Such errors should not occur because external modules should be able to override in-tree modules. To address this, record the dump file path in struct module and check it when searching for a module. Fixes: 13b25489b6f8 ("kbuild: change working directory to external module directory with M=") Reported-by: Jon Hunter <jonathanh@nvidia.com> Closes: https://lore.kernel.org/all/eb21a546-a19c-40df-b821-bbba80f19a3d@nvidia.com/ Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Tested-by: Jon Hunter <jonathanh@nvidia.com>
2024-12-21	kbuild: deb-pkg: Do not install maint scripts for arch 'um'	Nicolas Schier	1	-0/+6
	Stop installing Debian maintainer scripts when building a user-mode-linux Debian package. Debian maintainer scripts are used for e.g. requesting rebuilds of initrd, rebuilding DKMS modules and updating of grub configuration. As all of this is not relevant for UML but also may lead to failures while processing the kernel hooks, do no more install maintainer scripts for the UML package. Suggested-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Nicolas Schier <nicolas@fjasle.eu> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-12-21	kbuild: deb-pkg: add debarch for ARCH=um	Masahiro Yamada	1	-0/+7
	'make ARCH=um bindeb-pkg' shows the following warning. $ make ARCH=um bindeb-pkg [snip] GEN debian WARNING Your architecture doesn't have its equivalent Debian userspace architecture defined! Falling back to the current host architecture (amd64). Please add support for um to ./scripts/package/mkdebian ... This commit hard-codes i386/amd64 because UML is only supported for x86. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
2024-12-21	kbuild: Drop support for include/asm-<arch> in headers_check.pl	Geert Uytterhoeven	2	-8/+3
	"include/asm-<arch>" was replaced by "arch/<arch>/include/asm" a long time ago. All assembler header files are now included using "#include <asm/*>", so there is no longer a need to rewrite paths. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-12-20	selftests/bpf: Test bpf_skb_change_tail() in TC ingress	Cong Wang	2	-0/+168
	Similarly to the previous test, we also need a test case to cover positive offsets as well, TC is an excellent hook for this. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Zijian Zhang <zijianzhang@bytedance.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20241213034057.246437-5-xiyou.wangcong@gmail.com
2024-12-20	selftests/bpf: Introduce socket_helpers.h for TC tests	Cong Wang	2	-384/+395
	Pull socket helpers out of sockmap_helpers.h so that they can be reused for TC tests as well. This prepares for the next patch. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20241213034057.246437-4-xiyou.wangcong@gmail.com
2024-12-20	selftests/bpf: Add a BPF selftest for bpf_skb_change_tail()	Cong Wang	2	-0/+91
	As requested by Daniel, we need to add a selftest to cover bpf_skb_change_tail() cases in skb_verdict. Here we test trimming, growing and error cases, and validate its expected return values and the expected sizes of the payload. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20241213034057.246437-3-xiyou.wangcong@gmail.com
2024-12-20	bpf: Check negative offsets in __bpf_skb_min_len()	Cong Wang	1	-6/+15
	skb_network_offset() and skb_transport_offset() can be negative when they are called after we pull the transport header, for example, when we use eBPF sockmap at the point of ->sk_data_ready(). __bpf_skb_min_len() uses an unsigned int to get these offsets, this leads to a very large number which then causes bpf_skb_change_tail() failed unexpectedly. Fix this by using a signed int to get these offsets and ensure the minimum is at least zero. Fixes: 5293efe62df8 ("bpf: add bpf_skb_change_tail helper") Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20241213034057.246437-2-xiyou.wangcong@gmail.com
2024-12-20	tcp_bpf: Fix copied value in tcp_bpf_sendmsg	Levi Zim	1	-4/+4
	bpf kselftest sockhash::test_txmsg_cork_hangs in test_sockmap.c triggers a kernel NULL pointer dereference: BUG: kernel NULL pointer dereference, address: 0000000000000008 ? __die_body+0x6e/0xb0 ? __die+0x8b/0xa0 ? page_fault_oops+0x358/0x3c0 ? local_clock+0x19/0x30 ? lock_release+0x11b/0x440 ? kernelmode_fixup_or_oops+0x54/0x60 ? __bad_area_nosemaphore+0x4f/0x210 ? mmap_read_unlock+0x13/0x30 ? bad_area_nosemaphore+0x16/0x20 ? do_user_addr_fault+0x6fd/0x740 ? prb_read_valid+0x1d/0x30 ? exc_page_fault+0x55/0xd0 ? asm_exc_page_fault+0x2b/0x30 ? splice_to_socket+0x52e/0x630 ? shmem_file_splice_read+0x2b1/0x310 direct_splice_actor+0x47/0x70 splice_direct_to_actor+0x133/0x300 ? do_splice_direct+0x90/0x90 do_splice_direct+0x64/0x90 ? __ia32_sys_tee+0x30/0x30 do_sendfile+0x214/0x300 __se_sys_sendfile64+0x8e/0xb0 __x64_sys_sendfile64+0x25/0x30 x64_sys_call+0xb82/0x2840 do_syscall_64+0x75/0x110 entry_SYSCALL_64_after_hwframe+0x4b/0x53 This is caused by tcp_bpf_sendmsg() returning a larger value(12289) than size (8192), which causes the while loop in splice_to_socket() to release an uninitialized pipe buf. The underlying cause is that this code assumes sk_msg_memcopy_from_iter() will copy all bytes upon success but it actually might only copy part of it. This commit changes it to use the real copied bytes. Signed-off-by: Levi Zim <rsworktech@outlook.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Björn Töpel <bjorn@kernel.org> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20241130-tcp-bpf-sendmsg-v1-2-bae583d014f3@outlook.com
2024-12-20	skmsg: Return copied bytes in sk_msg_memcopy_from_iter	Levi Zim	1	-2/+3
	Previously sk_msg_memcopy_from_iter returns the copied bytes from the last copy_from_iter{,_nocache} call upon success. This commit changes it to return the total number of copied bytes on success. Signed-off-by: Levi Zim <rsworktech@outlook.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Björn Töpel <bjorn@kernel.org> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20241130-tcp-bpf-sendmsg-v1-1-bae583d014f3@outlook.com
2024-12-20	of: Add coreboot firmware to excluded default cells list	Rob Herring (Arm)	1	-1/+2
	Google Juniper and other Chromebook platforms have a very old bootloader which populates /firmware node without proper address/size-cells leading to warnings: Missing '#address-cells' in /firmware WARNING: CPU: 0 PID: 1 at drivers/of/base.c:106 of_bus_n_addr_cells+0x90/0xf0 Modules linked in: CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0 #1 933ab9971ff4d5dc58cb378a96f64c7f72e3454d Hardware name: Google juniper sku16 board (DT) ... Missing '#size-cells' in /firmware WARNING: CPU: 0 PID: 1 at drivers/of/base.c:133 of_bus_n_size_cells+0x90/0xf0 Modules linked in: CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.12.0 #1 933ab9971ff4d5dc58cb378a96f64c7f72e3454d Tainted: [W]=WARN Hardware name: Google juniper sku16 board (DT) These platform won't receive updated bootloader/firmware, so add an exclusion for platforms with a "coreboot" compatible node. While this is wider than necessary, that's the easiest fix and it doesn't doesn't matter if we miss checking other platforms using coreboot. We may revisit this later and address with a fixup to the DT itself. Reported-by: Sasha Levin <sashal@kernel.org> Closes: https://lore.kernel.org/all/Z0NUdoG17EwuCigT@sashalap/ Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Chen-Yu Tsai <wenst@chromium.org> Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2024-12-20	tcp_bpf: Add sk_rmem_alloc related logic for tcp_bpf ingress redirection	Zijian Zhang	3	-5/+16
	When we do sk_psock_verdict_apply->sk_psock_skb_ingress, an sk_msg will be created out of the skb, and the rmem accounting of the sk_msg will be handled by the skb. For skmsgs in __SK_REDIRECT case of tcp_bpf_send_verdict, when redirecting to the ingress of a socket, although we sk_rmem_schedule and add sk_msg to the ingress_msg of sk_redir, we do not update sk_rmem_alloc. As a result, except for the global memory limit, the rmem of sk_redir is nearly unlimited. Thus, add sk_rmem_alloc related logic to limit the recv buffer. Since the function sk_msg_recvmsg and __sk_psock_purge_ingress_msg are used in these two paths. We use "msg->skb" to test whether the sk_msg is skb backed up. If it's not, we shall do the memory accounting explicitly. Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20241210012039.1669389-3-zijianzhang@bytedance.com
2024-12-20	tcp_bpf: Charge receive socket buffer in bpf_tcp_ingress()	Cong Wang	2	-3/+9
	When bpf_tcp_ingress() is called, the skmsg is being redirected to the ingress of the destination socket. Therefore, we should charge its receive socket buffer, instead of sending socket buffer. Because sk_rmem_schedule() tests pfmemalloc of skb, we need to introduce a wrapper and call it for skmsg. Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20241210012039.1669389-2-zijianzhang@bytedance.com
2024-12-20	arm64/signal: Silence sparse warning storing GCSPR_EL0	Mark Brown	1	-20/+15
	We are seeing a sparse warning in gcs_restore_signal(): arch/arm64/kernel/signal.c:1054:9: sparse: sparse: cast removes address space '__user' of expression when storing the final GCSPR_EL0 value back into the register, caused by the fact that write_sysreg_s() casts the value it writes to a u64 which sparse sees as discarding the __userness of the pointer. Avoid this by treating the address as an integer, casting to a pointer only when using it to write to userspace. While we're at it also inline gcs_signal_cap_valid() into it's one user and make equivalent updates to gcs_signal_entry(). Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202412082005.OBJ0BbWs-lkp@intel.com/ Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20241214-arm64-gcs-signal-sparse-v3-1-5e8d18fffc0c@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-12-19	KVM: x86/mmu: Treat TDP MMU faults as spurious if access is already allowed	Sean Christopherson	3	-12/+22
	Treat slow-path TDP MMU faults as spurious if the access is allowed given the existing SPTE to fix a benign warning (other than the WARN itself) due to replacing a writable SPTE with a read-only SPTE, and to avoid the unnecessary LOCK CMPXCHG and subsequent TLB flush. If a read fault races with a write fault, fast GUP fails for any reason when trying to "promote" the read fault to a writable mapping, and KVM resolves the write fault first, then KVM will end up trying to install a read-only SPTE (for a !map_writable fault) overtop a writable SPTE. Note, it's not entirely clear why fast GUP fails, or if that's even how KVM ends up with a !map_writable fault with a writable SPTE. If something else is going awry, e.g. due to a bug in mmu_notifiers, then treating read faults as spurious in this scenario could effectively mask the underlying problem. However, retrying the faulting access instead of overwriting an existing SPTE is functionally correct and desirable irrespective of the WARN, and fast GUP _can_ legitimately fail with a writable VMA, e.g. if the Accessed bit in primary MMU's PTE is toggled and causes a PTE value mismatch. The WARN was also recently added, specifically to track down scenarios where KVM is unnecessarily overwrites SPTEs, i.e. treating the fault as spurious doesn't regress KVM's bug-finding capabilities in any way. In short, letting the WARN linger because there's a tiny chance it's due to a bug elsewhere would be excessively paranoid. Fixes: 1a175082b190 ("KVM: x86/mmu: WARN and flush if resolving a TDP MMU fault clears MMU-writable") Reported-by: Lei Yang <leiyang@redhat.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219588 Tested-by: Lei Yang <leiyang@redhat.com> Link: https://lore.kernel.org/r/20241218213611.3181643-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-19	KVM: SVM: Allow guest writes to set MSR_AMD64_DE_CFG bits	Sean Christopherson	1	-9/+0
	Drop KVM's arbitrary behavior of making DE_CFG.LFENCE_SERIALIZE read-only for the guest, as rejecting writes can lead to guest crashes, e.g. Windows in particular doesn't gracefully handle unexpected #GPs on the WRMSR, and nothing in the AMD manuals suggests that LFENCE_SERIALIZE is read-only _if it exists_. KVM only allows LFENCE_SERIALIZE to be set, by the guest or host, if the underlying CPU has X86_FEATURE_LFENCE_RDTSC, i.e. if LFENCE is guaranteed to be serializing. So if the guest sets LFENCE_SERIALIZE, KVM will provide the desired/correct behavior without any additional action (the guest's value is never stuffed into hardware). And having LFENCE be serializing even when it's not _required_ to be is a-ok from a functional perspective. Fixes: 74a0e79df68a ("KVM: SVM: Disallow guest from changing userspace's MSR_AMD64_DE_CFG value") Fixes: d1d93fa90f1a ("KVM: SVM: Add MSR-based feature support for serializing LFENCE") Reported-by: Simon Pilkington <simonp.git@mailbox.org> Closes: https://lore.kernel.org/all/52914da7-a97b-45ad-86a0-affdf8266c61@mailbox.org Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20241211172952.1477605-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-19	KVM: x86: Play nice with protected guests in complete_hypercall_exit()	Sean Christopherson	1	-1/+1
	Use is_64_bit_hypercall() instead of is_64_bit_mode() to detect a 64-bit hypercall when completing said hypercall. For guests with protected state, e.g. SEV-ES and SEV-SNP, KVM must assume the hypercall was made in 64-bit mode as the vCPU state needed to detect 64-bit mode is unavailable. Hacking the sev_smoke_test selftest to generate a KVM_HC_MAP_GPA_RANGE hypercall via VMGEXIT trips the WARN: ------------[ cut here ]------------ WARNING: CPU: 273 PID: 326626 at arch/x86/kvm/x86.h:180 complete_hypercall_exit+0x44/0xe0 [kvm] Modules linked in: kvm_amd kvm ... [last unloaded: kvm] CPU: 273 UID: 0 PID: 326626 Comm: sev_smoke_test Not tainted 6.12.0-smp--392e932fa0f3-feat #470 Hardware name: Google Astoria/astoria, BIOS 0.20240617.0-0 06/17/2024 RIP: 0010:complete_hypercall_exit+0x44/0xe0 [kvm] Call Trace: <TASK> kvm_arch_vcpu_ioctl_run+0x2400/0x2720 [kvm] kvm_vcpu_ioctl+0x54f/0x630 [kvm] __se_sys_ioctl+0x6b/0xc0 do_syscall_64+0x83/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e </TASK> ---[ end trace 0000000000000000 ]--- Fixes: b5aead0064f3 ("KVM: x86: Assume a 64-bit hypercall for guests with protected state") Cc: stable@vger.kernel.org Cc: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20241128004344.4072099-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-19	KVM: SVM: Disable AVIC on SNP-enabled system without HvInUseWrAllowed feature	Suravee Suthikulpanit	2	-0/+7
	On SNP-enabled system, VMRUN marks AVIC Backing Page as in-use while the guest is running for both secure and non-secure guest. Any hypervisor write to the in-use vCPU's AVIC backing page (e.g. to inject an interrupt) will generate unexpected #PF in the host. Currently, attempt to run AVIC guest would result in the following error: BUG: unable to handle page fault for address: ff3a442e549cc270 #PF: supervisor write access in kernel mode #PF: error_code(0x80000003) - RMP violation PGD b6ee01067 P4D b6ee02067 PUD 10096d063 PMD 11c540063 PTE 80000001149cc163 SEV-SNP: PFN 0x1149cc unassigned, dumping non-zero entries in 2M PFN region: [0x114800 - 0x114a00] ... Newer AMD system is enhanced to allow hypervisor to modify the backing page for non-secure guest on SNP-enabled system. This enhancement is available when the CPUID Fn8000_001F_EAX bit 30 is set (HvInUseWrAllowed). This table describes AVIC support matrix w.r.t. SNP enablement: \| Non-SNP system \| SNP system ----------------------------------------------------- Non-SNP guest \| AVIC Activate \| AVIC Activate iff \| \| HvInuseWrAllowed=1 ----------------------------------------------------- SNP guest \| N/A \| Secure AVIC Therefore, check and disable AVIC in kvm_amd driver when the feature is not available on SNP-enabled system. See the AMD64 Architecture Programmer’s Manual (APM) Volume 2 for detail. (https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/ programmer-references/40332.pdf) Fixes: 216d106c7ff7 ("x86/sev: Add SEV-SNP host initialization support") Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20241104075845.7583-1-suravee.suthikulpanit@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-19	io_uring: check if iowq is killed before queuing	Pavel Begunkov	1	-1/+5
	task work can be executed after the task has gone through io_uring termination, whether it's the final task_work run or the fallback path. In this case, task work will find ->io_wq being already killed and null'ed, which is a problem if it then tries to forward the request to io_queue_iowq(). Make io_queue_iowq() fail requests in this case. Note that it also checks PF_KTHREAD, because the user can first close a DEFER_TASKRUN ring and shortly after kill the task, in which case ->iowq check would race. Cc: stable@vger.kernel.org Fixes: 50c52250e2d74 ("block: implement async io_uring discard cmd") Fixes: 773af69121ecc ("io_uring: always reissue from task_work context") Reported-by: Will <willsroot@protonmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/63312b4a2c2bb67ad67b857d17a300e1d3b078e8.1734637909.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-19	smb: fix bytes written value in /proc/fs/cifs/Stats	Bharath SM	1	-0/+3
	With recent netfs apis changes, the bytes written value was not getting updated in /proc/fs/cifs/Stats. Fix this by updating tcon->bytes in write operations. Fixes: 3ee1a1fc3981 ("cifs: Cut over to using netfslib") Signed-off-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-12-19	PCI/bwctrl: Enable only if more than one speed is supported	Lukas Wunner	1	-1/+3
	If a PCIe port only supports a single speed, enabling bandwidth control is pointless: There's no need to monitor autonomous speed changes, nor can the speed be changed. Not enabling it saves a small amount of memory and compute resources, but also fixes a boot hang reported by Niklas: It occurs when enabling bandwidth control on Downstream Ports of Intel JHL7540 "Titan Ridge 2018" Thunderbolt controllers. The ports only support 2.5 GT/s in accordance with USB4 v2 sec 11.2.1, so the present commit works around the issue. PCIe r6.2 sec 8.2.1 prescribes that: "A device must support 2.5 GT/s and is not permitted to skip support for any data rates between 2.5 GT/s and the highest supported rate." Consequently, bandwidth control is currently only disabled if a port doesn't support higher speeds than 2.5 GT/s. However the Implementation Note in PCIe r6.2 sec 7.5.3.18 cautions: "It is strongly encouraged that software primarily utilize the Supported Link Speeds Vector instead of the Max Link Speed field, so that software can determine the exact set of supported speeds on current and future hardware. This can avoid software being confused if a future specification defines Links that do not require support for all slower speeds." In other words, future revisions of the PCIe Base Spec may allow gaps in the Supported Link Speeds Vector. To be future-proof, don't just check whether speeds above 2.5 GT/s are supported, but rather check whether more than one speed is supported. Fixes: 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller") Closes: https://lore.kernel.org/r/db8e457fcd155436449b035e8791a8241b0df400.camel@kernel.org Link: https://lore.kernel.org/r/3564908a9c99fc0d2a292473af7a94ebfc8f5820.1734428762.git.lukas@wunner.de Reported-by: Niklas Schnelle <niks@kernel.org> Tested-by: Niklas Schnelle <niks@kernel.org> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Jonathan Cameron <Jonthan.Cameron@huawei.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2024-12-19	PCI: Honor Max Link Speed when determining supported speeds	Lukas Wunner	1	-2/+4
	The Supported Link Speeds Vector in the Link Capabilities 2 Register indicates the supported link speeds. The Max Link Speed field in the Link Capabilities Register indicates the maximum of those speeds. pcie_get_supported_speeds() neglects to honor the Max Link Speed field and will thus incorrectly deem higher speeds as supported. Fix it. One user-visible issue addressed here is an incorrect value in the sysfs attribute "max_link_speed". But the main motivation is a boot hang reported by Niklas: Intel JHL7540 "Titan Ridge 2018" Thunderbolt controllers supports 2.5-8 GT/s speeds, but indicate 2.5 GT/s as maximum. Ilpo recalls seeing this on more devices. It can be explained by the controller's Downstream Ports supporting 8 GT/s if an Endpoint is attached, but limiting to 2.5 GT/s if the port interfaces to a PCIe Adapter, in accordance with USB4 v2 sec 11.2.1: "This section defines the functionality of an Internal PCIe Port that interfaces to a PCIe Adapter. [...] The Logical sub-block shall update the PCIe configuration registers with the following characteristics: [...] Max Link Speed field in the Link Capabilities Register set to 0001b (data rate of 2.5 GT/s only). Note: These settings do not represent actual throughput. Throughput is implementation specific and based on the USB4 Fabric performance." The present commit is not sufficient on its own to fix Niklas' boot hang, but it is a prerequisite: A subsequent commit will fix the boot hang by enabling bandwidth control only if more than one speed is supported. The GENMASK() macro used herein specifies 0 as lowest bit, even though the Supported Link Speeds Vector ends at bit 1. This is done on purpose to avoid a GENMASK(0, 1) macro if Max Link Speed is zero. That macro would be invalid as the lowest bit is greater than the highest bit. Ilpo has witnessed a zero Max Link Speed on Root Complex Integrated Endpoints in particular, so it does occur in practice. (The Link Capabilities Register is optional on RCiEPs per PCIe r6.2 sec 7.5.3.) Fixes: d2bd39c0456b ("PCI: Store all PCIe Supported Link Speeds") Closes: https://lore.kernel.org/r/70829798889c6d779ca0f6cd3260a765780d1369.camel@kernel.org Link: https://lore.kernel.org/r/fe03941e3e1cc42fb9bf4395e302bff53ee2198b.1734428762.git.lukas@wunner.de Reported-by: Niklas Schnelle <niks@kernel.org> Tested-by: Niklas Schnelle <niks@kernel.org> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2024-12-19	io_uring/register: limit ring resizing to DEFER_TASKRUN	Jens Axboe	1	-0/+3
	With DEFER_TASKRUN, we know the ring can't be both waited upon and resized at the same time. This is important for CQ resizing. Allowing SQ ring resizing is more trivial, but isn't the interesting use case. Hence limit ring resizing in general to DEFER_TASKRUN only for now. This isn't a huge problem as CQ ring resizing is generally the most useful on networking type of workloads where it can be hard to size the ring appropriately upfront, and those should be using DEFER_TASKRUN for better performance. Fixes: 79cfe9e59c2a ("io_uring/register: add IORING_REGISTER_RESIZE_RINGS") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-19	smb: client: fix TCP timers deadlock after rmmod	Enzo Matsumiya	1	-10/+26
	Commit ef7134c7fc48 ("smb: client: Fix use-after-free of network namespace.") fixed a netns UAF by manually enabled socket refcounting (sk->sk_net_refcnt=1 and sock_inuse_add(net, 1)). The reason the patch worked for that bug was because we now hold references to the netns (get_net_track() gets a ref internally) and they're properly released (internally, on __sk_destruct()), but only because sk->sk_net_refcnt was set. Problem: (this happens regardless of CONFIG_NET_NS_REFCNT_TRACKER and regardless if init_net or other) Setting sk->sk_net_refcnt=1 manually and after socket creation is not only out of cifs scope, but also technically wrong -- it's set conditionally based on user (=1) vs kernel (=0) sockets. And net/ implementations seem to base their user vs kernel space operations on it. e.g. upon TCP socket close, the TCP timers are not cleared because sk->sk_net_refcnt=1: (cf. commit 151c9c724d05 ("tcp: properly terminate timers for kernel sockets")) net/ipv4/tcp.c: void tcp_close(struct sock *sk, long timeout) { lock_sock(sk); __tcp_close(sk, timeout); release_sock(sk); if (!sk->sk_net_refcnt) inet_csk_clear_xmit_timers_sync(sk); sock_put(sk); } Which will throw a lockdep warning and then, as expected, deadlock on tcp_write_timer(). A way to reproduce this is by running the reproducer from ef7134c7fc48 and then 'rmmod cifs'. A few seconds later, the deadlock/lockdep warning shows up. Fix: We shouldn't mess with socket internals ourselves, so do not set sk_net_refcnt manually. Also change __sock_create() to sock_create_kern() for explicitness. As for non-init_net network namespaces, we deal with it the best way we can -- hold an extra netns reference for server->ssocket and drop it when it's released. This ensures that the netns still exists whenever we need to create/destroy server->ssocket, but is not directly tied to it. Fixes: ef7134c7fc48 ("smb: client: Fix use-after-free of network namespace.") Cc: stable@vger.kernel.org Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-12-19	smb: client: Deduplicate "select NETFS_SUPPORT" in Kconfig	Dragan Simic	1	-1/+0
	Repeating automatically selected options in Kconfig files is redundant, so let's delete repeated "select NETFS_SUPPORT" that was added accidentally. Fixes: 69c3c023af25 ("cifs: Implement netfslib hooks") Signed-off-by: Dragan Simic <dsimic@manjaro.org> Signed-off-by: Steve French <stfrench@microsoft.com>