linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2021-08-27	virtio-mem: fix sleeping in RCU read side section in virtio_mem_online_page_cb()	David Hildenbrand	1	-1/+8
	virtio_mem_set_fake_offline() might sleep now, and we call it under rcu_read_lock(). To fix it, simply move the rcu_read_unlock() further up, as we're done with the device. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 6cc26d77613a: "virtio-mem: use page_offline_(start\|end) when setting PageOffline() Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: virtualization@lists.linux-foundation.org Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-27	cryptoloop: add a deprecation warning	Christoph Hellwig	2	-2/+4
	Support for cryptoloop has been officially marked broken and deprecated in favor of dm-crypt (which supports the same broken algorithms if needed) in Linux 2.6.4 (released in March 2004), and support for it has been entirely removed from losetup in util-linux 2.23 (released in April 2013). Add a warning and a deprecation schedule. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210827163250.255325-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-27	Revert "mmc: sdhci-iproc: Set SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN on BCM2711"	Ulf Hansson	1	-2/+1
	This reverts commit 419dd626e357e89fc9c4e3863592c8b38cfe1571. It turned out that the change from the reverted commit breaks the ACPI based rpi's because it causes the 100Mhz max clock to be overridden to the return from sdhci_iproc_get_max_clock(), which is 0 because there isn't a OF/DT based clock device. Reported-by: Jeremy Linton <jeremy.linton@arm.com> Fixes: 419dd626e357 ("mmc: sdhci-iproc: Set SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN on BCM2711") Acked-by: Stefan Wahren <stefan.wahren@i2se.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-08-27	usb: gadget: u_audio: fix race condition on endpoint stop	Jerome Brunet	1	-4/+4
	If the endpoint completion callback is call right after the ep_enabled flag is cleared and before usb_ep_dequeue() is call, we could do a double free on the request and the associated buffer. Fix this by clearing ep_enabled after all the endpoint requests have been dequeued. Fixes: 7de8681be2cd ("usb: gadget: u_audio: Free requests only after callback") Cc: stable <stable@vger.kernel.org> Reported-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Link: https://lore.kernel.org/r/20210827092927.366482-1-jbrunet@baylibre.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-27	usb: gadget: f_uac2: fixup feedback endpoint stop	Jerome Brunet	1	-4/+11
	When the uac2 function is stopped, there seems to be an issue reported on some platforms (Intel Merrifield at least) BUG: kernel NULL pointer dereference, address: 0000000000000008 ... RIP: 0010:dwc3_gadget_del_and_unmap_request+0x19/0xe0 ... Call Trace: dwc3_remove_requests.constprop.0+0x12f/0x170 __dwc3_gadget_ep_disable+0x7a/0x160 dwc3_gadget_ep_disable+0x3d/0xd0 usb_ep_disable+0x1c/0x70 u_audio_stop_capture+0x79/0x120 [u_audio] afunc_set_alt+0x73/0x80 [usb_f_uac2] composite_setup+0x224/0x1b90 [libcomposite] The issue happens only when the gadget is using the sync type "async", not "adaptive". This indicates that problem is coming from the feedback endpoint, which is only used with async synchronization mode. The problem is that request is freed regardless of usb_ep_dequeue(), which ends up badly if the request is not actually dequeued yet. Update the feedback endpoint free function to release the endpoint the same way it is done for the data endpoint, which takes care of the problem. Fixes: 24f779dac8f3 ("usb: gadget: f_uac2/u_audio: add feedback endpoint support") Reported-by: Ferry Toth <ftoth@exalondelft.nl> Tested-by: Ferry Toth <ftoth@exalondelft.nl> Acked-by: Felipe Balbi <balbi@kernel.org> Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Link: https://lore.kernel.org/r/20210827075853.266912-1-jbrunet@baylibre.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-27	pd: fix a NULL vs IS_ERR() check	Dan Carpenter	1	-1/+1
	blk_mq_alloc_disk() returns error pointers, it doesn't return NULL so correct the check. Fixes: 262d431f9000 ("pd: use blk_mq_alloc_disk and blk_cleanup_disk") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20210827100023.GB9449@kili Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-27	PCI/MSI: Skip masking MSI-X on Xen PV	Marek Marczykowski-Górecki	1	-0/+3
	When running as Xen PV guest, masking MSI-X is a responsibility of the hypervisor. The guest has no write access to the relevant BAR at all - when it tries to, it results in a crash like this: BUG: unable to handle page fault for address: ffffc9004069100c #PF: supervisor write access in kernel mode #PF: error_code(0x0003) - permissions violation RIP: e030:__pci_enable_msix_range.part.0+0x26b/0x5f0 e1000e_set_interrupt_capability+0xbf/0xd0 [e1000e] e1000_probe+0x41f/0xdb0 [e1000e] local_pci_probe+0x42/0x80 (...) The recently introduced function msix_mask_all() does not check the global variable pci_msi_ignore_mask which is set by XEN PV to bypass the masking of MSI[-X] interrupts. Add the check to make this function XEN PV compatible. Fixes: 7d5ec3d36123 ("PCI/MSI: Mask all unused MSI-X entries") Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210826170342.135172-1-marmarek@invisiblethingslab.com
2021-08-26	Revert "block/mq-deadline: Prioritize high-priority requests"	Jens Axboe	1	-37/+5
	This reverts commit fb926032b3209300f9dc454a36b8299582ae545c. Zhen reports that this commit slows down mq-deadline on a 128 thread box, going from 258K IOPS to 170-180K. My testing shows that Optane gen2 IOPS goes from 2.3M IOPS to 1.2M IOPS on a 64 thread box. Looking in detail at the code, the main culprit here is needing to sum percpu counters in the dispatch hot path, leading to very high CPU utilization there. To make matters worse, the code currently needs to sum 2 percpu counters, and it does so in the most naive way of iterating possible CPUs _twice_. Since we're close to release, revert this commit and we can re-do it with regular per-priority counters instead for the 5.15 kernel. Link: https://lore.kernel.org/linux-block/20210826144039.2143-1-thunder.leizhen@huawei.com/ Reported-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-26	Revert "net: really fix the build..."	Kalle Valo	6	-30/+8
	This reverts commit ce78ffa3ef1681065ba451cfd545da6126f5ca88. Wren and Nicolas reported that ath11k was failing to initialise QCA6390 Wi-Fi 6 device with error: qcom_mhi_qrtr: probe of mhi0_IPCR failed with error -22 Commit ce78ffa3ef16 ("net: really fix the build..."), introduced in v5.14-rc5, caused this regression in qrtr. Most likely all ath11k devices are broken, but I only tested QCA6390. Let's revert the broken commit so that ath11k works again. Reported-by: Wren Turkal <wt@penguintechs.org> Reported-by: Nicolas Schichan <nschichan@freebox.fr> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20210826172816.24478-1-kvalo@codeaurora.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26	media: ipu3-cio2: Drop reference on error path in cio2_bridge_connect_sensor()	Andy Shevchenko	1	-1/+1
	The commit 71f642833284 ("ACPI: utils: Fix reference counting in for_each_acpi_dev_match()") moved adev assignment outside of error path and hence made acpi_dev_put(sensor->adev) a no-op. We still need to drop reference count on error path, and to achieve that, replace sensor->adev by locally assigned adev. Fixes: 71f642833284 ("ACPI: utils: Fix reference counting in for_each_acpi_dev_match()") Depends-on: fc68f42aa737 ("ACPI: fix NULL pointer dereference") Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-26	net: hns3: fix get wrong pfc_en when query PFC configuration	Guangbin Huang	1	-11/+2
	Currently, when query PFC configuration by dcbtool, driver will return PFC enable status based on TC. As all priorities are mapped to TC0 by default, if TC0 is enabled, then all priorities mapped to TC0 will be shown as enabled status when query PFC setting, even though some priorities have never been set. for example: $ dcb pfc show dev eth0 pfc-cap 4 macsec-bypass off delay 0 prio-pfc 0:off 1:off 2:off 3:off 4:off 5:off 6:off 7:off $ dcb pfc set dev eth0 prio-pfc 0:on 1:on 2:on 3:on $ dcb pfc show dev eth0 pfc-cap 4 macsec-bypass off delay 0 prio-pfc 0:on 1:on 2:on 3:on 4:on 5:on 6:on 7:on To fix this problem, just returns user's PFC config parameter saved in driver. Fixes: cacde272dd00 ("net: hns3: Add hclge_dcb module for the support of DCB feature") Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26	net: hns3: fix GRO configuration error after reset	Yufeng Mo	4	-10/+30
	The GRO configuration is enabled by default after reset. This is incorrect and should be restored to the user-configured value. So this restoration is added during reset initialization. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26	net: hns3: change the method of getting cmd index in debugfs	Yufeng Mo	2	-7/+8
	Currently, the cmd index is obtained in debugfs by comparing file names. However, this method may cause errors when processing more complex file names. So, change this method by saving cmd in private data and comparing it when getting cmd index in debugfs for optimization. Fixes: 5e69ea7ee2a6 ("net: hns3: refactor the debugfs process") Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26	net: hns3: fix duplicate node in VLAN list	Guojia Liao	1	-1/+5
	VLAN list should not be added duplicate VLAN node, otherwise it would cause "add failed" when restore VLAN from VLAN list, so this patch adds VLAN ID check before adding node into VLAN list. Fixes: c6075b193462 ("net: hns3: Record VF vlan tables") Signed-off-by: Guojia Liao <liaoguojia@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26	net: hns3: fix speed unknown issue in bond 4	Yonglong Liu	3	-3/+3
	In bond 4, when the link goes down and up repeatedly, the bond may get an unknown speed, and then this port can not work. The driver notify netif_carrier_on() before update the link state, when the bond receive carrier on, will query the speed of the port, if the query operation happens before updating the link state, will get an unknown speed. So need to notify netif_carrier_on() after update the link state. Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Fixes: e2cb1dec9779 ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support") Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26	net: hns3: add waiting time before cmdq memory is released	Yufeng Mo	4	-2/+13
	After the cmdq registers are cleared, the firmware may take time to clear out possible left over commands in the cmdq. Driver must release cmdq memory only after firmware has completed processing of left over commands. Fixes: 232d0d55fca6 ("net: hns3: uninitialize command queue while unloading PF driver") Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26	net: hns3: clear hardware resource when loading driver	Yufeng Mo	2	-0/+29
	If a PF is bonded to a virtual machine and the virtual machine exits unexpectedly, some hardware resource cannot be cleared. In this case, loading driver may cause exceptions. Therefore, the hardware resource needs to be cleared when the driver is loaded. Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26	usb: typec: tcpm: Raise vdm_sm_running flag only when VDM SM is running	Kyle Tso	1	-43/+38
	If the port is going to send Discover_Identity Message, vdm_sm_running flag was intentionally set before entering Ready States in order to avoid the conflict because the port and the port partner might start AMS at almost the same time after entering Ready States. However, the original design has a problem. When the port is doing DR_SWAP from Device to Host, it raises the flag. Later in the tcpm_send_discover_work, the flag blocks the procedure of sending the Discover_Identity and it might never be cleared until disconnection. Since there exists another flag send_discover representing that the port is going to send Discover_Identity or not, it is enough to use that flag to prevent the conflict. Also change the timing of the set/clear of vdm_sm_running to indicate whether the VDM SM is actually running or not. Fixes: c34e85fa69b9 ("usb: typec: tcpm: Send DISCOVER_IDENTITY from dedicated work") Cc: stable <stable@vger.kernel.org> Cc: Badhri Jagan Sridharan <badhri@google.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Kyle Tso <kyletso@google.com> Link: https://lore.kernel.org/r/20210826124201.1562502-1-kyletso@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-26	usb: renesas-xhci: Prefer firmware loading on unknown ROM state	Takashi Iwai	1	-12/+23
	The recent attempt to handle an unknown ROM state in the commit d143825baf15 ("usb: renesas-xhci: Fix handling of unknown ROM state") resulted in a regression and reverted later by the commit 44cf53602f5a ("Revert "usb: renesas-xhci: Fix handling of unknown ROM state""). The problem of the former fix was that it treated the failure of firmware loading as a fatal error. Since the firmware files aren't included in the standard linux-firmware tree, most users don't have them, hence they got the non-working system after that. The revert fixed the regression, but also it didn't make the firmware loading triggered even on the devices that do need it. So we need still a fix for them. This is another attempt to handle the unknown ROM state. Like the previous fix, this also tries to load the firmware when ROM shows unknown state. In this patch, however, the failure of a firmware loading (such as a missing firmware file) isn't handled as a fatal error any longer when ROM has been already detected, but it falls back to the ROM mode like before. The error is returned only when no ROM is detected and the firmware loading failed. Along with it, for simplifying the code flow, the detection and the check of ROM is factored out from renesas_fw_check_running() and done in the caller side, renesas_xhci_check_request_fw(). It avoids the redundant ROM checks. The patch was tested on Lenovo Thinkpad T14 gen (BIOS 1.34). Also it was confirmed that no regression is seen on another Thinkpad T14 machine that has worked without the patch, too. Fixes: 44cf53602f5a ("Revert "usb: renesas-xhci: Fix handling of unknown ROM state"") Cc: stable <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> BugLink: https://bugzilla.opensuse.org/show_bug.cgi?id=1189207 Link: https://lore.kernel.org/r/20210826124127.14789-1-tiwai@suse.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-26	usb: dwc3: gadget: Stop EP0 transfers during pullup disable	Wesley Cheng	1	-4/+3
	During a USB cable disconnect, or soft disconnect scenario, a pending SETUP transaction may not be completed, leading to the following error: dwc3 a600000.dwc3: timed out waiting for SETUP phase If this occurs, then the entire pullup disable routine is skipped and proper cleanup and halting of the controller does not complete. Instead of returning an error (which is ignored from the UDC perspective), allow the pullup disable routine to continue, which will also handle disabling of EP0/1. This will end any active transfers as well. Ensure to clear any delayed_status also, as the timeout could happen within the STATUS stage. Fixes: bb0147364850 ("usb: dwc3: gadget: don't clear RUN/STOP when it's invalid to do so") Cc: <stable@vger.kernel.org> Reviewed-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Acked-by: Felipe Balbi <balbi@kernel.org> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org> Link: https://lore.kernel.org/r/20210825042855.7977-1-wcheng@codeaurora.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-26	usb: dwc3: gadget: Fix dwc3_calc_trbs_left()	Thinh Nguyen	1	-8/+8
	We can't depend on the TRB's HWO bit to determine if the TRB ring is "full". A TRB is only available when the driver had processed it, not when the controller consumed and relinquished the TRB's ownership to the driver. Otherwise, the driver may overwrite unprocessed TRBs. This can happen when many transfer events accumulate and the system is slow to process them and/or when there are too many small requests. If a request is in the started_list, that means there is one or more unprocessed TRBs remained. Check this instead of the TRB's HWO bit whether the TRB ring is full. Fixes: c4233573f6ee ("usb: dwc3: gadget: prepare TRBs on update transfers too") Cc: <stable@vger.kernel.org> Acked-by: Felipe Balbi <balbi@kernel.org> Signed-off-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Link: https://lore.kernel.org/r/e91e975affb0d0d02770686afc3a5b9eb84409f6.1629335416.git.Thinh.Nguyen@synopsys.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-26	drm/i915/dp: Drop redundant debug print	Swati Sharma	1	-7/+2
	drm_dp_dpcd_read/write already has debug error message. Drop redundant error messages which gives false status even if correct value is read in drm_dp_dpcd_read(). v2: -Added fixes tag (Ankit) v3: -Fixed build error (CI) Fixes: 9488a030ac91 ("drm/i915: Add support for enabling link status and recovery") Cc: Ankit Nautiyal <ankit.k.nautiyal@intel.com> Cc: Imre Deak <imre.deak@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Cc: Manasi Navare <manasi.d.navare@intel.com> Cc: Sean Paul <seanpaul@chromium.org> Cc: Uma Shankar <uma.shankar@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.12+ Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Swati Sharma <swati2.sharma@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210812131107.5531-1-swati2.sharma@intel.com (cherry picked from commit b6dfa416172939edaa46a5a647457b94c6d94119) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2021-08-26	drm/i915: Fix syncmap memory leak	Matthew Brost	1	-0/+9
	A small race exists between intel_gt_retire_requests_timeout and intel_timeline_exit which could result in the syncmap not getting free'd. Rather than work to hard to seal this race, simply cleanup the syncmap on fini. unreferenced object 0xffff88813bc53b18 (size 96): comm "gem_close_race", pid 5410, jiffies 4294917818 (age 1105.600s) hex dump (first 32 bytes): 01 00 00 00 00 00 00 00 00 00 00 00 0a 00 00 00 ................ 00 00 00 00 00 00 00 00 6b 6b 6b 6b 06 00 00 00 ........kkkk.... backtrace: [<00000000120b863a>] __sync_alloc_leaf+0x1e/0x40 [i915] [<00000000042f6959>] __sync_set+0x1bb/0x240 [i915] [<0000000090f0e90f>] i915_request_await_dma_fence+0x1c7/0x400 [i915] [<0000000056a48219>] i915_request_await_object+0x222/0x360 [i915] [<00000000aaac4ee3>] i915_gem_do_execbuffer+0x1bd0/0x2250 [i915] [<000000003c9d830f>] i915_gem_execbuffer2_ioctl+0x405/0xce0 [i915] [<00000000fd7a8e68>] drm_ioctl_kernel+0xb0/0xf0 [drm] [<00000000e721ee87>] drm_ioctl+0x305/0x3c0 [drm] [<000000008b0d8986>] __x64_sys_ioctl+0x71/0xb0 [<0000000076c362a4>] do_syscall_64+0x33/0x80 [<00000000eb7a4831>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Signed-off-by: Matthew Brost <matthew.brost@intel.com> Fixes: 531958f6f357 ("drm/i915/gt: Track timeline activeness in enter/exit") Cc: <stable@vger.kernel.org> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210730195342.110234-1-matthew.brost@intel.com (cherry picked from commit faf890985e30d5e88cc3a7c50c1bcad32f89ab7c) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2021-08-26	net: fix NULL pointer reference in cipso_v4_doi_free	王贇	1	-8/+10
	In netlbl_cipsov4_add_std() when 'doi_def->map.std' alloc failed, we sometime observe panic: BUG: kernel NULL pointer dereference, address: ... RIP: 0010:cipso_v4_doi_free+0x3a/0x80 ... Call Trace: netlbl_cipsov4_add_std+0xf4/0x8c0 netlbl_cipsov4_add+0x13f/0x1b0 genl_family_rcv_msg_doit.isra.15+0x132/0x170 genl_rcv_msg+0x125/0x240 This is because in cipso_v4_doi_free() there is no check on 'doi_def->map.std' when 'doi_def->type' equal 1, which is possibe, since netlbl_cipsov4_add_std() haven't initialize it before alloc 'doi_def->map.std'. This patch just add the check to prevent panic happen for similar cases. Reported-by: Abaci <abaci@linux.alibaba.com> Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-26	rtnetlink: Return correct error on changing device netns	Andrey Ignatov	1	-1/+2
	Currently when device is moved between network namespaces using RTM_NEWLINK message type and one of netns attributes (FLA_NET_NS_PID, IFLA_NET_NS_FD, IFLA_TARGET_NETNSID) but w/o specifying IFLA_IFNAME, and target namespace already has device with same name, userspace will get EINVAL what is confusing and makes debugging harder. Fix it so that userspace gets more appropriate EEXIST instead what makes debugging much easier. Before: # ./ifname.sh + ip netns add ns0 + ip netns exec ns0 ip link add l0 type dummy + ip netns exec ns0 ip link show l0 8: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 66:90:b5:d5:78:69 brd ff:ff:ff:ff:ff:ff + ip link add l0 type dummy + ip link show l0 10: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 6e:c6:1f:15:20:8d brd ff:ff:ff:ff:ff:ff + ip link set l0 netns ns0 RTNETLINK answers: Invalid argument After: # ./ifname.sh + ip netns add ns0 + ip netns exec ns0 ip link add l0 type dummy + ip netns exec ns0 ip link show l0 8: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 1e:4a:72:e3:e3:8f brd ff:ff:ff:ff:ff:ff + ip link add l0 type dummy + ip link show l0 10: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether f2:fc:fe:2b:7d:a6 brd ff:ff:ff:ff:ff:ff + ip link set l0 netns ns0 RTNETLINK answers: File exists The problem is that do_setlink() passes its `char ifname` argument, that it gets from a caller, to __dev_change_net_namespace() as is (as `const char pat`), but semantics of ifname and pat can be different. For example, __rtnl_newlink() does this: net/core/rtnetlink.c 3270 char ifname[IFNAMSIZ]; ... 3286 if (tb[IFLA_IFNAME]) 3287 nla_strscpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ); 3288 else 3289 ifname[0] = '\0'; ... 3364 if (dev) { ... 3394 return do_setlink(skb, dev, ifm, extack, tb, ifname, status); 3395 } , i.e. do_setlink() gets ifname pointer that is always valid no matter if user specified IFLA_IFNAME or not and then do_setlink() passes this ifname pointer as is to __dev_change_net_namespace() as pat argument. But the pat (pattern) in __dev_change_net_namespace() is used as: net/core/dev.c 11198 err = -EEXIST; 11199 if (__dev_get_by_name(net, dev->name)) { 11200 /* We get here if we can't use the current device name */ 11201 if (!pat) 11202 goto out; 11203 err = dev_get_valid_name(net, dev, pat); 11204 if (err < 0) 11205 goto out; 11206 } As the result the `goto out` path on line 11202 is neven taken and instead of returning EEXIST defined on line 11198, __dev_change_net_namespace() returns an error from dev_get_valid_name() and this, in turn, will be EINVAL for ifname[0] = '\0' set earlier. Fixes: d8a5ec672768 ("[NET]: netlink support for moving devices between network namespaces.") Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-26	net: dsa: hellcreek: Adjust schedule look ahead window	Kurt Kanzenbach	1	-1/+1
	Traffic schedules can only be started up to eight seconds within the future. Therefore, the driver periodically checks every two seconds whether the admin base time provided by the user is inside that window. If so the schedule is started. Otherwise the check is deferred. However, according to the programming manual the look ahead window size should be four - not eight - seconds. By using the proposed value of four seconds starting a schedule at a specified admin base time actually works as expected. Fixes: 24dfc6eb39b2 ("net: dsa: hellcreek: Add TAPRIO offloading support") Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-26	net: dsa: hellcreek: Fix incorrect setting of GCL	Kurt Kanzenbach	1	-3/+3
	Currently the gate control list which is programmed into the hardware is incorrect resulting in wrong traffic schedules. The problem is the loop variables are incremented before they are referenced. Therefore, move the increment to the end of the loop. Fixes: 24dfc6eb39b2 ("net: dsa: hellcreek: Add TAPRIO offloading support") Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-26	cxgb4: dont touch blocked freelist bitmap after free	Rahul Lakkireddy	1	-4/+3
	When adapter init fails, the blocked freelist bitmap is already freed up and should not be touched. So, move the bitmap zeroing closer to where it was successfully allocated. Also handle adapter init failure unwind path immediately and avoid setting up RDMA memory windows. Fixes: 5b377d114f2b ("cxgb4: Add debugfs facility to inject FL starvation") Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-26	ipv4: use siphash instead of Jenkins in fnhe_hashfun()	Eric Dumazet	1	-6/+6
	A group of security researchers brought to our attention the weakness of hash function used in fnhe_hashfun(). Lets use siphash instead of Jenkins Hash, to considerably reduce security risks. Also remove the inline keyword, this really is distracting. Fixes: d546c621542d ("ipv4: harden fnhe_hashfun()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Keyu Man <kman001@ucr.edu> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-26	ipv6: use siphash in rt6_exception_hash()	Eric Dumazet	1	-6/+14
	A group of security researchers brought to our attention the weakness of hash function used in rt6_exception_hash() Lets use siphash instead of Jenkins Hash, to considerably reduce security risks. Following patch deals with IPv4. Fixes: 35732d01fe31 ("ipv6: introduce a hash table to store dst cache") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Keyu Man <kman001@ucr.edu> Cc: Wei Wang <weiwan@google.com> Cc: Martin KaFai Lau <kafai@fb.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-26	perf/x86/amd/power: Assign pmu.module	Kim Phillips	1	-0/+1
	Assign pmu.module so the driver can't be unloaded whilst in use. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210817221048.88063-4-kim.phillips@amd.com
2021-08-26	perf/x86/amd/ibs: Extend PERF_PMU_CAP_NO_EXCLUDE to IBS Op	Kim Phillips	1	-0/+1
	Commit: 2ff40250691e ("perf/core, arch/x86: Use PERF_PMU_CAP_NO_EXCLUDE for exclusion incapable PMUs") neglected to do so. Fixes: 2ff40250691e ("perf/core, arch/x86: Use PERF_PMU_CAP_NO_EXCLUDE for exclusion incapable PMUs") Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210817221048.88063-2-kim.phillips@amd.com
2021-08-26	perf/x86/amd/ibs: Work around erratum #1197	Kim Phillips	1	-0/+8
	Erratum #1197 "IBS (Instruction Based Sampling) Register State May be Incorrect After Restore From CC6" is published in a document: "Revision Guide for AMD Family 19h Models 00h-0Fh Processors" 56683 Rev. 1.04 July 2021 https://bugzilla.kernel.org/show_bug.cgi?id=206537 Implement the erratum's suggested workaround and ignore IBS samples if MSRC001_1031 == 0. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210817221048.88063-3-kim.phillips@amd.com
2021-08-26	perf/x86/intel/uncore: Fix integer overflow on 23 bit left shift of a u32	Colin Ian King	1	-1/+1
	The u32 variable pci_dword is being masked with 0x1fffffff and then left shifted 23 places. The shift is a u32 operation,so a value of 0x200 or more in pci_dword will overflow the u32 and only the bottow 32 bits are assigned to addr. I don't believe this was the original intent. Fix this by casting pci_dword to a resource_size_t to ensure no overflow occurs. Note that the mask and 12 bit left shift operation does not need this because the mask SNR_IMC_MMIO_MEM0_MASK and shift is always a 32 bit value. Fixes: ee49532b38dd ("perf/x86/intel/uncore: Add IMC uncore support for Snow Ridge") Addresses-Coverity: ("Unintentional integer overflow") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20210706114553.28249-1-colin.king@canonical.com
2021-08-26	can: usb: esd_usb2: esd_usb2_rx_event(): fix the interchange of the CAN RX and TX error counters	Stefan Mätje	1	-2/+2
	This patch fixes the interchanged fetch of the CAN RX and TX error counters from the ESD_EV_CAN_ERROR_EXT message. The RX error counter is really in struct rx_msg::data[2] and the TX error counter is in struct rx_msg::data[3]. Fixes: 96d8e90382dc ("can: Add driver for esd CAN-USB/2 device") Link: https://lore.kernel.org/r/20210825215227.4947-2-stefan.maetje@esd.eu Cc: stable@vger.kernel.org Signed-off-by: Stefan Mätje <stefan.maetje@esd.eu> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-08-25	net: usb: asix: ax88772: fix boolconv.cocci warnings	kernel test robot	1	-1/+1
	drivers/net/usb/asix_devices.c:757:60-65: WARNING: conversion to bool not needed here Remove unneeded conversion to bool Semantic patch information: Relational and logical operators evaluate to bool, explicit conversion is overly verbose and unneeded. Generated by: scripts/coccinelle/misc/boolconv.cocci Fixes: 7a141e64cf14 ("net: usb: asix: ax88772: move embedded PHY detection as early as possible") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: kernel test robot <lkp@intel.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20210825183538.13070-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-25	SUNRPC: Fix XPT_BUSY flag leakage in svc_handle_xprt()...	Trond Myklebust	1	-1/+2
	If the attempt to reserve a slot fails, we currently leak the XPT_BUSY flag on the socket. Among other things, this make it impossible to close the socket. Fixes: 82011c80b3ec ("SUNRPC: Move svc_xprt_received() call sites") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-08-25	MAINTAINERS: exfat: update my email address	Namjae Jeon	1	-1/+1
	My email address in exfat entry will be not available in a few days. Update it to my own kernel.org address. Link: https://lkml.kernel.org/r/20210825044833.16806-1-namjae.jeon@samsung.com Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-25	mm/memory_hotplug: fix potential permanent lru cache disable	Miaohe Lin	1	-0/+1
	If offline_pages failed after lru_cache_disable(), it forgot to do lru_cache_enable() in error path. So we would have lru cache disabled permanently in this case. Link: https://lkml.kernel.org/r/20210821094246.10149-3-linmiaohe@huawei.com Fixes: d479960e44f2 ("mm: disable LRU pagevec during the migration temporarily") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Chris Goldsworthy <cgoldswo@codeaurora.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-25	PM: domains: Improve runtime PM performance state handling	Dmitry Osipenko	1	-2/+8
	GENPD core doesn't support handling performance state changes while consumer device is runtime-suspended or when runtime PM is disabled. GENPD core may override performance state that was configured by device driver while RPM of the device was disabled or device was RPM-suspended. Let's close that gap by allowing drivers to control performance state while RPM of a consumer device is disabled and to set up performance state of RPM-suspended device that will be applied by GENPD core on RPM-resume of the device. Fixes: 5937c3ce2122 ("PM: domains: Drop/restore performance state votes for devices at runtime PM") Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-25	pipe: do FASYNC notifications for every pipe IO, not just state changes	Linus Torvalds	1	-12/+8
	It turns out that the SIGIO/FASYNC situation is almost exactly the same as the EPOLLET case was: user space really wants to be notified after every operation. Now, in a perfect world it should be sufficient to only notify user space on "state transitions" when the IO state changes (ie when a pipe goes from unreadable to readable, or from unwritable to writable). User space should then do as much as possible - fully emptying the buffer or what not - and we'll notify it again the next time the state changes. But as with EPOLLET, we have at least one case (stress-ng) where the kernel sent SIGIO due to the pipe being marked for asynchronous notification, but the user space signal handler then didn't actually necessarily read it all before returning (it read more than what was written, but since there could be multiple writes, it could leave data pending). The user space code then expected to get another SIGIO for subsequent writes - even though the pipe had been readable the whole time - and would only then read more. This is arguably a user space bug - and Colin King already fixed the stress-ng code in question - but the kernel regression rules are clear: it doesn't matter if kernel people think that user space did something silly and wrong. What matters is that it used to work. So if user space depends on specific historical kernel behavior, it's a regression when that behavior changes. It's on us: we were silly to have that non-optimal historical behavior, and our old kernel behavior was what user space was tested against. Because of how the FASYNC notification was tied to wakeup behavior, this was first broken by commits f467a6a66419 and 1b6b26ae7053 ("pipe: fix and clarify pipe read/write wakeup logic"), but at the time it seems nobody noticed. Probably because the stress-ng problem case ends up being timing-dependent too. It was then unwittingly fixed by commit 3a34b13a88ca ("pipe: make pipe writes always wake up readers") only to be broken again when by commit 3b844826b6c6 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads"). And at that point the kernel test robot noticed the performance refression in the stress-ng.sigio.ops_per_sec case. So the "Fixes" tag below is somewhat ad hoc, but it matches when the issue was noticed. Fix it for good (knock wood) by simply making the kill_fasync() case separate from the wakeup case. FASYNC is quite rare, and we clearly shouldn't even try to use the "avoid unnecessary wakeups" logic for it. Link: https://lore.kernel.org/lkml/20210824151337.GC27667@xsang-OptiPlex-9020/ Fixes: 3b844826b6c6 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads") Reported-by: kernel test robot <oliver.sang@intel.com> Tested-by: Oliver Sang <oliver.sang@intel.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Colin Ian King <colin.king@canonical.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-25	ceph: fix possible null-pointer dereference in ceph_mdsmap_decode()	Tuo Li	1	-3/+5
	kcalloc() is called to allocate memory for m->m_info, and if it fails, ceph_mdsmap_destroy() behind the label out_err will be called: ceph_mdsmap_destroy(m); In ceph_mdsmap_destroy(), m->m_info is dereferenced through: kfree(m->m_info[i].export_targets); To fix this possible null-pointer dereference, check m->m_info before the for loop to free m->m_info[i].export_targets. [ jlayton: fix up whitespace damage only kfree(m->m_info) if it's non-NULL ] Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Signed-off-by: Tuo Li <islituo@gmail.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-08-25	ceph: correctly handle releasing an embedded cap flush	Xiubo Li	4	-12/+22
	The ceph_cap_flush structures are usually dynamically allocated, but the ceph_cap_snap has an embedded one. When force umounting, the client will try to remove all the session caps. During this, it will free them, but that should not be done with the ones embedded in a capsnap. Fix this by adding a new boolean that indicates that the cap flush is embedded in a capsnap, and skip freeing it if that's set. At the same time, switch to using list_del_init() when detaching the i_list and g_list heads. It's possible for a forced umount to remove these objects but then handle_cap_flushsnap_ack() races in and does the list_del_init() again, corrupting memory. Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/52283 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-08-25	perf/x86/intel/pt: Fix mask of num_address_ranges	Xiaoyao Li	1	-1/+1
	Per SDM, bit 2:0 of CPUID(0x14,1).EAX[2:0] reports the number of configurable address ranges for filtering, not bit 1:0. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Link: https://lkml.kernel.org/r/20210824040622.4081502-1-xiaoyao.li@intel.com
2021-08-25	Revert "btrfs: compression: don't try to compress if we don't have enough pages"	Qu Wenruo	1	-1/+1
	This reverts commit f2165627319ffd33a6217275e5690b1ab5c45763. [BUG] It's no longer possible to create compressed inline extent after commit f2165627319f ("btrfs: compression: don't try to compress if we don't have enough pages"). [CAUSE] For compression code, there are several possible reasons we have a range that needs to be compressed while it's no more than one page. - Compressed inline write The data is always smaller than one sector and the test lacks the condition to properly recognize a non-inline extent. - Compressed subpage write For the incoming subpage compressed write support, we require page alignment of the delalloc range. And for 64K page size, we can compress just one page into smaller sectors. For those reasons, the requirement for the data to be more than one page is not correct, and is already causing regression for compressed inline data writeback. The idea of skipping one page to avoid wasting CPU time could be revisited in the future. [FIX] Fix it by reverting the offending commit. Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> Link: https://lore.kernel.org/linux-btrfs/afa2742.c084f5d6.17b6b08dffc@tnonline.net Fixes: f2165627319f ("btrfs: compression: don't try to compress if we don't have enough pages") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-25	Partially revert "arm64/mm: drop HAVE_ARCH_PFN_VALID"	Will Deacon	3	-0/+39
	This partially reverts commit 16c9afc776608324ca71c0bc354987bab532f51d. Alex Bee reports a regression in 5.14 on their RK3328 SoC when configuring the PL330 DMA controller: \| ------------[ cut here ]------------ \| WARNING: CPU: 2 PID: 373 at kernel/dma/mapping.c:235 dma_map_resource+0x68/0xc0 \| Modules linked in: spi_rockchip(+) fuse \| CPU: 2 PID: 373 Comm: systemd-udevd Not tainted 5.14.0-rc7 #1 \| Hardware name: Pine64 Rock64 (DT) \| pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--) \| pc : dma_map_resource+0x68/0xc0 \| lr : pl330_prep_slave_fifo+0x78/0xd0 This appears to be because dma_map_resource() is being called for a physical address which does not correspond to a memory address yet does have a valid 'struct page' due to the way in which the vmemmap is constructed. Prior to 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID"), the arm64 implementation of pfn_valid() called memblock_is_memory() to return 'false' for such regions and the DMA mapping request would proceed. However, now that we are using the generic implementation where only the presence of the memory map entry is considered, we return 'true' and erroneously fail with DMA_MAPPING_ERROR because we identify the region as DRAM. Although fixing this in the DMA mapping code is arguably the right fix, it is a risky, cross-architecture change at this stage in the cycle. So just revert arm64 back to its old pfn_valid() implementation for v5.14. The change to the generic pfn_valid() code is preserved from the original patch, so as to avoid impacting other architectures. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Christoph Hellwig <hch@lst.de> Reported-by: Alex Bee <knaerzche@gmail.com> Link: https://lore.kernel.org/r/d3a3c828-b777-faf8-e901-904995688437@gmail.com Signed-off-by: Will Deacon <will@kernel.org>
2021-08-25	net/sched: ets: fix crash when flipping from 'strict' to 'quantum'	Davide Caratti	1	-0/+7
	While running kselftests, Hangbin observed that sch_ets.sh often crashes, and splats like the following one are seen in the output of 'dmesg': BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 159f12067 P4D 159f12067 PUD 159f13067 PMD 0 Oops: 0000 [#1] SMP NOPTI CPU: 2 PID: 921 Comm: tc Not tainted 5.14.0-rc6+ #458 Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014 RIP: 0010:__list_del_entry_valid+0x2d/0x50 Code: 48 8b 57 08 48 b9 00 01 00 00 00 00 ad de 48 39 c8 0f 84 ac 6e 5b 00 48 b9 22 01 00 00 00 00 ad de 48 39 ca 0f 84 cf 6e 5b 00 <48> 8b 32 48 39 fe 0f 85 af 6e 5b 00 48 8b 50 08 48 39 f2 0f 85 94 RSP: 0018:ffffb2da005c3890 EFLAGS: 00010217 RAX: 0000000000000000 RBX: ffff9073ba23f800 RCX: dead000000000122 RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff9073ba23fbc8 RBP: ffff9073ba23f890 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000001 R12: dead000000000100 R13: ffff9073ba23fb00 R14: 0000000000000002 R15: 0000000000000002 FS: 00007f93e5564e40(0000) GS:ffff9073bba00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000014ad34000 CR4: 0000000000350ee0 Call Trace: ets_qdisc_reset+0x6e/0x100 [sch_ets] qdisc_reset+0x49/0x1d0 tbf_reset+0x15/0x60 [sch_tbf] qdisc_reset+0x49/0x1d0 dev_reset_queue.constprop.42+0x2f/0x90 dev_deactivate_many+0x1d3/0x3d0 dev_deactivate+0x56/0x90 qdisc_graft+0x47e/0x5a0 tc_get_qdisc+0x1db/0x3e0 rtnetlink_rcv_msg+0x164/0x4c0 netlink_rcv_skb+0x50/0x100 netlink_unicast+0x1a5/0x280 netlink_sendmsg+0x242/0x480 sock_sendmsg+0x5b/0x60 ____sys_sendmsg+0x1f2/0x260 ___sys_sendmsg+0x7c/0xc0 __sys_sendmsg+0x57/0xa0 do_syscall_64+0x3a/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f93e44b8338 Code: 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 43 2c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 41 89 d4 55 RSP: 002b:00007ffc0db737a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000061255c06 RCX: 00007f93e44b8338 RDX: 0000000000000000 RSI: 00007ffc0db73810 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 R10: 000000000000000b R11: 0000000000000246 R12: 0000000000000001 R13: 0000000000687880 R14: 0000000000000000 R15: 0000000000000000 Modules linked in: sch_ets sch_tbf dummy rfkill iTCO_wdt iTCO_vendor_support intel_rapl_msr intel_rapl_common joydev i2c_i801 pcspkr i2c_smbus lpc_ich virtio_balloon ip_tables xfs libcrc32c crct10dif_pclmul crc32_pclmul crc32c_intel ahci libahci ghash_clmulni_intel libata serio_raw virtio_blk virtio_console virtio_net net_failover failover sunrpc dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000000000000 When the change() function decreases the value of 'nstrict', we must take into account that packets might be already enqueued on a class that flips from 'strict' to 'quantum': otherwise that class will not be added to the bandwidth-sharing list. Then, a call to ets_qdisc_reset() will attempt to do list_del(&alist) with 'alist' filled with zero, hence the NULL pointer dereference. For classes flipping from 'strict' to 'quantum', initialize an empty list and eventually add it to the bandwidth-sharing list, if there are packets already enqueued. In this way, the kernel will: a) prevent crashing as described above. b) avoid retaining the backlog packets (for an arbitrarily long time) in case no packet is enqueued after a change from 'strict' to 'quantum'. Reported-by: Hangbin Liu <liuhangbin@gmail.com> Fixes: dcc68b4d8084 ("net: sch_ets: Add a new Qdisc") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-25	qede: Fix memset corruption	Shai Malin	1	-1/+1
	Thanks to Kees Cook who detected the problem of memset that starting from not the first member, but sized for the whole struct. The better change will be to remove the redundant memset and to clear only the msix_cnt member. Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: Shai Malin <smalin@marvell.com> Reported-by: Kees Cook <keescook@chromium.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-25	net: stmmac: fix kernel panic due to NULL pointer dereference of buf->xdp	Song Yoong Siang	1	-4/+4
	Ensure a valid XSK buffer before proceed to free the xdp buffer. The following kernel panic is observed without this patch: RIP: 0010:xp_free+0x5/0x40 Call Trace: stmmac_napi_poll_rxtx+0x332/0xb30 [stmmac] ? stmmac_tx_timer+0x3c/0xb0 [stmmac] net_rx_action+0x13d/0x3d0 __do_softirq+0xfc/0x2fb ? smpboot_register_percpu_thread+0xe0/0xe0 run_ksoftirqd+0x32/0x70 smpboot_thread_fn+0x1d8/0x2c0 kthread+0x169/0x1a0 ? kthread_park+0x90/0x90 ret_from_fork+0x1f/0x30 ---[ end trace 0000000000000002 ]--- Fixes: bba2556efad6 ("net: stmmac: Enable RX via AF_XDP zero-copy") Cc: <stable@vger.kernel.org> # 5.13.x Suggested-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-25	net: stmmac: fix kernel panic due to NULL pointer dereference of xsk_pool	Song Yoong Siang	1	-6/+6
	After free xsk_pool, there is possibility that napi polling is still running in the middle, thus causes a kernel crash due to kernel NULL pointer dereference of rx_q->xsk_pool and tx_q->xsk_pool. Fix this by changing the XDP pool setup sequence to: 1. disable napi before free xsk_pool 2. enable napi after init xsk_pool The following kernel panic is observed without this patch: RIP: 0010:xsk_uses_need_wakeup+0x5/0x10 Call Trace: stmmac_napi_poll_rxtx+0x3a9/0xae0 [stmmac] __napi_poll+0x27/0x130 net_rx_action+0x233/0x280 __do_softirq+0xe2/0x2b6 run_ksoftirqd+0x1a/0x20 smpboot_thread_fn+0xac/0x140 ? sort_range+0x20/0x20 kthread+0x124/0x150 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x1f/0x30 ---[ end trace a77c8956b79ac107 ]--- Fixes: bba2556efad6 ("net: stmmac: Enable RX via AF_XDP zero-copy") Cc: <stable@vger.kernel.org> # 5.13.x Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>