wireguard-linux - WireGuard for the Linux kernel

Age	Commit message (Collapse)	Author	Files	Lines
2021-09-08	io-wq: fix cancellation on create-worker failure	Pavel Begunkov	1	-9/+20
	WARNING: CPU: 0 PID: 10392 at fs/io_uring.c:1151 req_ref_put_and_test fs/io_uring.c:1151 [inline] WARNING: CPU: 0 PID: 10392 at fs/io_uring.c:1151 req_ref_put_and_test fs/io_uring.c:1146 [inline] WARNING: CPU: 0 PID: 10392 at fs/io_uring.c:1151 io_req_complete_post+0xf5b/0x1190 fs/io_uring.c:1794 Modules linked in: Call Trace: tctx_task_work+0x1e5/0x570 fs/io_uring.c:2158 task_work_run+0xe0/0x1a0 kernel/task_work.c:164 tracehook_notify_signal include/linux/tracehook.h:212 [inline] handle_signal_work kernel/entry/common.c:146 [inline] exit_to_user_mode_loop kernel/entry/common.c:172 [inline] exit_to_user_mode_prepare+0x232/0x2a0 kernel/entry/common.c:209 __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline] syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:302 do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x44/0xae When io_wqe_enqueue() -> io_wqe_create_worker() fails, we can't just call io_run_cancel() to clean up the request, it's already enqueued via io_wqe_insert_work() and will be executed either by some other worker during cancellation (e.g. in io_wq_put_and_exit()). Reported-by: Hao Sun <sunhao.th@gmail.com> Fixes: 3146cba99aa28 ("io-wq: make worker creation resilient against signals") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/93b9de0fcf657affab0acfd675d4abcd273ee863.1631092071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-07	Revert "memcg: enable accounting for pollfd and select bits arrays"	Linus Torvalds	1	-2/+2
	This reverts commit b655843444152c0a14b749308e4cb35d91cbcf0b. Just like with the memcg lock accounting, the kernel test robot reports a sizeable performance regression for this commit, and while it clearly does the rigth thing in theory, we'll need to look at just how to avoid or minimize the performance overhead of the memcg accounting. People already have suggestions on how to do that, but it's "future work". So revert it for now. [ Note: the first link below is for this same commit but a different commit ID, because it's the kernel test robot ended up noticing it in Andrew Morton's patch queue ] Link: https://lore.kernel.org/lkml/20210905132732.GC15026@xsang-OptiPlex-9020/ Link: https://lore.kernel.org/lkml/20210907150757.GE17617@xsang-OptiPlex-9020/ Acked-by: Jens Axboe <axboe@kernel.dk> Acked-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <guro@fb.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-07	Revert "memcg: enable accounting for file lock caches"	Linus Torvalds	1	-4/+2
	This reverts commit 0f12156dff2862ac54235fc72703f18770769042. The kernel test robot reports a sizeable performance regression for this commit, and while it clearly does the rigth thing in theory, we'll need to look at just how to avoid or minimize the performance overhead of the memcg accounting. People already have suggestions on how to do that, but it's "future work". So revert it for now. Link: https://lore.kernel.org/lkml/20210907150757.GE17617@xsang-OptiPlex-9020/ Acked-by: Jens Axboe <axboe@kernel.dk> Acked-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <guro@fb.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-07	Revert "mm/gup: remove try_get_page(), call try_get_compound_head() directly"	Linus Torvalds	4	-20/+15
	This reverts commit 9857a17f206ff374aea78bccfb687f145368be2e. That commit was completely broken, and I should have caught on to it earlier. But happily, the kernel test robot noticed the breakage fairly quickly. The breakage is because "try_get_page()" is about avoiding the page reference count overflow case, but is otherwise the exact same as a plain "get_page()". In contrast, "try_get_compound_head()" is an entirely different beast, and uses __page_cache_add_speculative() because it's not just about the page reference count, but also about possibly racing with the underlying page going away. So all the commentary about how "try_get_page() has fallen a little behind in terms of maintenance, try_get_compound_head() handles speculative page references more thoroughly" was just completely wrong: yes, try_get_compound_head() handles speculative page references, but the point is that try_get_page() does not, and must not. So there's no lack of maintainance - there are fundamentally different semantics. A speculative page reference would be entirely wrong in "get_page()", and it's entirely wrong in "try_get_page()". It's not about speculation, it's purely about "uhhuh, you can't get this page because you've tried to increment the reference count too much already". The reason the kernel test robot noticed this bug was that it hit the VM_BUG_ON() in __page_cache_add_speculative(), which is all about verifying that the context of any speculative page access is correct. But since that isn't what try_get_page() is all about, the VM_BUG_ON() tests things that are not correct to test for try_get_page(). Reported-by: kernel test robot <oliver.sang@intel.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-07	ieee802154: Remove redundant initialization of variable ret	Colin Ian King	1	-1/+1
	The variable ret is being initialized with a value that is never read, it is being updated later on. The assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-07	net: stmmac: fix MAC not working when system resume back with WoL active	Joakim Zhang	1	-18/+18
	We can reproduce this issue with below steps: 1) enable WoL on the host 2) host system suspended 3) remote client send out wakeup packets We can see that host system resume back, but can't work, such as ping failed. After a bit digging, this issue is introduced by the commit 46f69ded988d ("net: stmmac: Use resolved link config in mac_link_up()"), which use the finalised link parameters in mac_link_up() rather than the parameters in mac_config(). There are two scenarios for MAC suspend/resume in STMMAC driver: 1) MAC suspend with WoL inactive, stmmac_suspend() call phylink_mac_change() to notify phylink machine that a change in MAC state, then .mac_link_down callback would be invoked. Further, it will call phylink_stop() to stop the phylink instance. When MAC resume back, firstly phylink_start() is called to start the phylink instance, then call phylink_mac_change() which will finally trigger phylink machine to invoke .mac_config and .mac_link_up callback. All is fine since configuration in these two callbacks will be initialized, that means MAC can restore the state. 2) MAC suspend with WoL active, phylink_mac_change() will put link down, but there is no phylink_stop() to stop the phylink instance, so it will link up again, that means .mac_config and .mac_link_up would be invoked before system suspended. After system resume back, it will do DMA initialization and SW reset which let MAC lost the hardware setting (i.e MAC_Configuration register(offset 0x0) is reset). Since link is up before system suspended, so .mac_link_up would not be invoked after system resume back, lead to there is no chance to initialize the configuration in .mac_link_up callback, as a result, MAC can't work any longer. After discussed with Russell King [1], we confirm that phylink framework have not take WoL into consideration yet. This patch calls phylink_suspend()/phylink_resume() functions which is newly introduced by Russell King to fix this issue. [1] https://lore.kernel.org/netdev/20210901090228.11308-1-qiangqing.zhang@nxp.com/ Fixes: 46f69ded988d ("net: stmmac: Use resolved link config in mac_link_up()") Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-07	net: phylink: add suspend/resume support	Russell King (Oracle)	2	-0/+85
	Joakim Zhang reports that Wake-on-Lan with the stmmac ethernet driver broke when moving the incorrect handling of mac link state out of mac_config(). This reason this breaks is because the stmmac's WoL is handled by the MAC rather than the PHY, and phylink doesn't cater for that scenario. This patch adds the necessary phylink code to handle suspend/resume events according to whether the MAC still needs a valid link or not. This is the barest minimum for this support. Reported-by: Joakim Zhang <qiangqing.zhang@nxp.com> Tested-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-07	net: renesas: sh_eth: Fix freeing wrong tx descriptor	Yoshihiro Shimoda	1	-0/+1
	The cur_tx counter must be incremented after TACT bit of txdesc->status was set. However, a CPU is possible to reorder instructions and/or memory accesses between cur_tx and txdesc->status. And then, if TX interrupt happened at such a timing, the sh_eth_tx_free() may free the descriptor wrongly. So, add wmb() before cur_tx++. Otherwise NETDEV WATCHDOG timeout is possible to happen. Fixes: 86a74ff21a7a ("net: sh_eth: add support for Renesas SuperH Ethernet") Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-07	bonding: 3ad: pass parameter bond_params by reference	Colin Ian King	1	-4/+4
	The parameter bond_params is a relatively large 192 byte sized struct so pass it by reference rather than by value to reduce copying. Addresses-Coverity: ("Big parameter passed by value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-07	cxgb3: fix oops on module removal	Heiner Kallweit	1	-0/+3
	When removing the driver module w/o bringing an interface up before the error below occurs. Reason seems to be that cancel_work_sync() is called in t3_sge_stop() for a queue that hasn't been initialized yet. [10085.941785] ------------[ cut here ]------------ [10085.941799] WARNING: CPU: 1 PID: 5850 at kernel/workqueue.c:3074 __flush_work+0x3ff/0x480 [10085.941819] Modules linked in: vfat snd_hda_codec_hdmi fat snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio led_class ee1004 iTCO_ wdt intel_tcc_cooling x86_pkg_temp_thermal coretemp aesni_intel crypto_simd cryptd snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core r 8169 snd_pcm realtek mdio_devres snd_timer snd i2c_i801 i2c_smbus libphy i915 i2c_algo_bit cxgb3(-) intel_gtt ttm mdio drm_kms_helper mei_me s yscopyarea sysfillrect sysimgblt mei fb_sys_fops acpi_pad sch_fq_codel crypto_user drm efivarfs ext4 mbcache jbd2 crc32c_intel [10085.941944] CPU: 1 PID: 5850 Comm: rmmod Not tainted 5.14.0-rc7-next-20210826+ #6 [10085.941974] Hardware name: System manufacturer System Product Name/PRIME H310I-PLUS, BIOS 2603 10/21/2019 [10085.941992] RIP: 0010:__flush_work+0x3ff/0x480 [10085.942003] Code: c0 74 6b 65 ff 0d d1 bd 78 75 e8 bc 2f 06 00 48 c7 c6 68 b1 88 8a 48 c7 c7 e0 5f b4 8b 45 31 ff e8 e6 66 04 00 e9 4b fe ff ff <0f> 0b 45 31 ff e9 41 fe ff ff e8 72 c1 79 00 85 c0 74 87 80 3d 22 [10085.942036] RSP: 0018:ffffa1744383fc08 EFLAGS: 00010246 [10085.942048] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000923 [10085.942062] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff91c901710a88 [10085.942076] RBP: ffffa1744383fce8 R08: 0000000000000001 R09: 0000000000000001 [10085.942090] R10: 00000000000000c2 R11: 0000000000000000 R12: ffff91c901710a88 [10085.942104] R13: 0000000000000000 R14: ffff91c909a96100 R15: 0000000000000001 [10085.942118] FS: 00007fe417837740(0000) GS:ffff91c969d00000(0000) knlGS:0000000000000000 [10085.942134] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [10085.942146] CR2: 000055a8d567ecd8 CR3: 0000000121690003 CR4: 00000000003706e0 [10085.942160] Call Trace: [10085.942166] ? __lock_acquire+0x3af/0x22e0 [10085.942177] ? cancel_work_sync+0xb/0x10 [10085.942187] __cancel_work_timer+0x128/0x1b0 [10085.942197] ? __pm_runtime_resume+0x5b/0x90 [10085.942208] cancel_work_sync+0xb/0x10 [10085.942217] t3_sge_stop+0x2f/0x50 [cxgb3] [10085.942234] remove_one+0x26/0x190 [cxgb3] [10085.942248] pci_device_remove+0x39/0xa0 [10085.942258] __device_release_driver+0x15e/0x240 [10085.942269] driver_detach+0xd9/0x120 [10085.942278] bus_remove_driver+0x53/0xd0 [10085.942288] driver_unregister+0x2c/0x50 [10085.942298] pci_unregister_driver+0x31/0x90 [10085.942307] cxgb3_cleanup_module+0x10/0x18c [cxgb3] [10085.942324] __do_sys_delete_module+0x191/0x250 [10085.942336] ? syscall_enter_from_user_mode+0x21/0x60 [10085.942347] ? trace_hardirqs_on+0x2a/0xe0 [10085.942357] __x64_sys_delete_module+0x13/0x20 [10085.942368] do_syscall_64+0x40/0x90 [10085.942377] entry_SYSCALL_64_after_hwframe+0x44/0xae [10085.942389] RIP: 0033:0x7fe41796323b Fixes: 5e0b8928927f ("net:cxgb3: replace tasklets with works") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-07	mfd: lpc_sch: Rename GPIOBASE to prevent build error	Randy Dunlap	1	-2/+2
	One MIPS platform (mach-rc32434) defines GPIOBASE. This macro conflicts with one of the same name in lpc_sch.c. Rename the latter one to prevent the build error. ../drivers/mfd/lpc_sch.c:25: error: "GPIOBASE" redefined [-Werror] 25 \| #define GPIOBASE 0x44 ../arch/mips/include/asm/mach-rc32434/rb.h:32: note: this is the location of the previous definition 32 \| #define GPIOBASE 0x050000 Cc: Denis Turischev <denis@compulab.co.il> Fixes: e82c60ae7d3a ("mfd: Introduce lpc_sch for Intel SCH LPC bridge") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2021-09-07	mfd: syscon: Use of_iomap() instead of ioremap()	Hector Martin	1	-1/+1
	This automatically selects between ioremap() and ioremap_np() on platforms that require it, such as Apple SoCs. Signed-off-by: Hector Martin <marcan@marcan.st> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2021-09-07	can: c_can: fix null-ptr-deref on ioctl()	Tong Zhang	1	-3/+1
	The pdev maybe not a platform device, e.g. c_can_pci device, in this case, calling to_platform_device() would not make sense. Also, per the comment in drivers/net/can/c_can/c_can_ethtool.c, @bus_info should match dev_name() string, so I am replacing this with dev_name() to fix this issue. [ 1.458583] BUG: unable to handle page fault for address: 0000000100000000 [ 1.460921] RIP: 0010:strnlen+0x1a/0x30 [ 1.466336] ? c_can_get_drvinfo+0x65/0xb0 [c_can] [ 1.466597] ethtool_get_drvinfo+0xae/0x360 [ 1.466826] dev_ethtool+0x10f8/0x2970 [ 1.467880] sock_ioctl+0xef/0x300 Fixes: 2722ac986e93 ("can: c_can: add ethtool support") Link: https://lore.kernel.org/r/20210906233704.1162666-1-ztong0001@gmail.com Cc: stable@vger.kernel.org # 5.14+ Signed-off-by: Tong Zhang <ztong0001@gmail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-09-07	can: rcar_canfd: add __maybe_unused annotation to silence warning	Marc Kleine-Budde	1	-1/+1
	Since commit \| dd3bd23eb438 ("can: rcar_canfd: Add Renesas R-Car CAN FD driver") the rcar_canfd driver can be compile tested on all architectures. On non OF enabled archs, or archs where OF is optional (and disabled in the .config) the compilation throws the following warning: \| drivers/net/can/rcar/rcar_canfd.c:2020:34: warning: unused variable 'rcar_canfd_of_table' [-Wunused-const-variable] \| static const struct of_device_id rcar_canfd_of_table[] = { \| ^ This patch fixes the warning by marking the variable rcar_canfd_of_table as __maybe_unused. Fixes: ac4224087312 ("can: rcar: Kconfig: Add helper dependency on COMPILE_TEST") Fixes: dd3bd23eb438 ("can: rcar_canfd: Add Renesas R-Car CAN FD driver") Link: https://lore.kernel.org/all/20210907064537.1054268-1-mkl@pengutronix.de Cc: linux-renesas-soc@vger.kernel.org Cc: Cai Huoqing <caihuoqing@baidu.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-09-06	thunderbolt: test: split up test cases in tb_test_credit_alloc_all	Linus Torvalds	1	-17/+81
	The tb_test_credit_alloc_all() function had a huge number of KUNIT_ASSERT() statements, all of which (though the magic of many many layers of inscrutable macros) ended up allocating and initializing various test assertion structures on the stack. Don't do that. The kernel stack isn't infinite, and we have compiler warnings (now errors) for the case where a stack frame grows too large. Like it did here, by not an inconsiderable margin: drivers/thunderbolt/test.c: In function ‘tb_test_credit_alloc_all’: drivers/thunderbolt/test.c:2367:1: error: the frame size of 4500 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] 2367 \| } \| ^ Solve this similarly to the lib/test_scanf case: split out the tests into several smaller functions, each just testing one particular tunnel credit allocation. This makes the i386 allyesconfig build work for me again. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-06	lib/test_scanf: split up number parsing test routines	Linus Torvalds	1	-8/+71
	It turns out that gcc has real trouble merging all the temporary on-stack buffer allocation. So despite the fact that their lifetimes do not overlap, gcc will allocate stack for all of them when they have different types. Which they do in the number scanning test routines. This is unfortunate in general, but with lots of test-cases in one function, it becomes a real problem. gcc will allocate a huge stack frame for no actual good reason. We have tried to counteract this tendency of gcc not merging stack slots (see "-fconserve-stack"), but that has limited effect (and should be on by default these days, iirc). So with all the debug options enabled on an i386 allmodconfig build, we end up with overly big stack frames, and the resulting stack frame size warnings (now errors): lib/test_scanf.c: In function ‘numbers_list_field_width_val_width’: lib/test_scanf.c:530:1: error: the frame size of 2088 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] 530 \| } \| ^ lib/test_scanf.c: In function ‘numbers_list_field_width_typemax’: lib/test_scanf.c:488:1: error: the frame size of 2568 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] 488 \| } \| ^ lib/test_scanf.c: In function ‘numbers_list’: lib/test_scanf.c:437:1: error: the frame size of 2088 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] 437 \| } \| ^ In this particular case, the reasonably straightforward solution is to just split out the test routines into multiple more targeted versions. That way we don't have one huge stack, but several smaller ones, and they aren't active all at the same time. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-06	iwl: fix debug printf format strings	Linus Torvalds	1	-2/+2
	The variable 'package_size' is an unsigned long, and should be printed out using '%lu', not '%zd' (that would be for a size_t). Yes, on many architectures (including x86-64), 'size_t' is in fact the same type as 'long', but that's a fairly random architecture definition, and on some platforms 'size_t' is in fact 'int' rather than 'long'. That is the case on traditional 32-bit x86. Yes, both types are the exact same 32-bit size, and it would all print out perfectly correctly, but '%zd' ends up still being wrong. And we can't make 'package_size' be a 'size_t', because we get the actual value using efivar_entry_get() that takes a pointer to an 'unsigned long'. So '%lu' it is. This fixes two of the i386 allmodconfig build warnings (that is now an error due to -Werror). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-06	don't make the syscall checking produce errors from warnings	Stephen Rothwell	1	-1/+1
	Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-06	net: wwan: iosm: Unify IO accessors used in the driver	Andy Shevchenko	1	-10/+12
	Currently we have readl()/writel()/ioread()/iowrite() APIs in use. Let's unify to use only ioread()/iowrite() variants. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-06	net: wwan: iosm: Replace io.*64_lo_hi() with regular accessors	Andy Shevchenko	1	-4/+4
	The io.*_lo_hi() variants are not strictly needed on the x86 hardware and especially the PCI bus. Replace them with regular accessors, but leave headers in place in case of 32-bit build. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-06	net: qcom/emac: Replace strlcpy with strscpy	Jason Wang	1	-1/+1
	The strlcpy should not be used because it doesn't limit the source length. As linus says, it's a completely useless function if you can't implicitly trust the source string - but that is almost always why people think they should use it! All in all the BSD function will lead some potential bugs. But the strscpy doesn't require reading memory from the src string beyond the specified "count" bytes, and since the return value is easier to error-check than strlcpy()'s. In addition, the implementation is robust to the string changing out from underneath it, unlike the current strlcpy() implementation. Thus, We prefer using strscpy instead of strlcpy. Signed-off-by: Jason Wang <wangborong@cdjrlc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-06	ip6_gre: Revert "ip6_gre: add validation for csum_start"	Willem de Bruijn	1	-2/+0
	This reverts commit 9cf448c200ba9935baa94e7a0964598ce947db9d. This commit was added for equivalence with a similar fix to ip_gre. That fix proved to have a bug. Upon closer inspection, ip6_gre is not susceptible to the original bug. So revert the unnecessary extra check. In short, ipgre_xmit calls skb_pull to remove ipv4 headers previously inserted by dev_hard_header. ip6gre_tunnel_xmit does not. Link: https://lore.kernel.org/netdev/CA+FuTSe+vJgTVLc9SojGuN-f9YQ+xWLPKE_S4f=f+w+_P2hgUg@mail.gmail.com/#t Fixes: 9cf448c200ba ("ip6_gre: add validation for csum_start") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-06	kernel: debug: Convert to SPDX identifier	Cai Huoqing	2	-8/+2
	use SPDX-License-Identifier instead of a verbose license text Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Link: https://lore.kernel.org/r/20210906112302.937-1-caihuoqing@baidu.com Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
2021-09-06	KVM: Drop unused kvm_dirty_gfn_invalid()	Peter Xu	1	-5/+0
	Drop the unused function as reported by test bot. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210901230904.15164-1-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-06	net: hns3: make hclgevf_cmd_caps_bit_map0 and hclge_cmd_caps_bit_map0 static	chongjiapeng	2	-2/+2
	This symbols is not used outside of hclge_cmd.c and hclgevf_cmd.c, so marks it static. Fix the following sparse warning: drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c:345:35: warning: symbol 'hclgevf_cmd_caps_bit_map0' was not declared. Should it be static? drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c:365:33: warning: symbol 'hclge_cmd_caps_bit_map0' was not declared. Should it be static? Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: chongjiapeng <jiapeng.chong@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-06	selftests/bpf: Test XDP bonding nest and unwind	Jussi Maki	1	-10/+64
	Modify the test to check that enslaving a bond slave with a XDP program is now allowed. Extend attach test to exercise the program unwinding in bond_xdp_set and add a new test for loading XDP program on doubly nested bond device to verify that static key incr/decr is correct. Signed-off-by: Jussi Maki <joamaki@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-06	bonding: Fix negative jump label count on nested bonding	Jussi Maki	1	-6/+5
	With nested bonding devices the nested bond device's ndo_bpf was called without a program causing it to decrement the static key without a prior increment leading to negative count. Fix the issue by 1) only calling slave's ndo_bpf when there's a program to be loaded and 2) only decrement the count when a program is unloaded. Fixes: 9e2ee5c7e7c3 ("net, bonding: Add XDP support to the bonding driver") Reported-by: syzbot+30622fb04ddd72a4d167@syzkaller.appspotmail.com Signed-off-by: Jussi Maki <joamaki@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-06	MAINTAINERS: add VM SOCKETS (AF_VSOCK) entry	Stefano Garzarella	1	-7/+13
	Add a new entry for VM Sockets (AF_VSOCK) that covers vsock core, tests, and headers. Move some general vsock stuff from virtio-vsock entry into this new more general vsock entry. I've been reviewing and contributing for the last few years, so I'm available to help maintain this code. Cc: Dexuan Cui <decui@microsoft.com> Cc: Jorgen Hansen <jhansen@vmware.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-06	stmmac: dwmac-loongson:Fix missing return value	zhaoxiao	1	-1/+3
	Add the return value when phy_mode < 0. Signed-off-by: zhaoxiao <long870912@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-06	fuse: remove unused arg in fuse_write_file_get()	Miklos Szeredi	1	-9/+6
	The struct fuse_conn argument is not used and can be removed. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-09-06	fuse: wait for writepages in syncfs	Miklos Szeredi	3	-0/+100
	In case of fuse the MM subsystem doesn't guarantee that page writeback completes by the time ->sync_fs() is called. This is because fuse completes page writeback immediately to prevent DoS of memory reclaim by the userspace file server. This means that fuse itself must ensure that writes are synced before sending the SYNCFS request to the server. Introduce sync buckets, that hold a counter for the number of outstanding write requests. On syncfs replace the current bucket with a new one and wait until the old bucket's counter goes down to zero. It is possible to have multiple syncfs calls in parallel, in which case there could be more than one waited-on buckets. Descendant buckets must not complete until the parent completes. Add a count to the child (new) bucket until the (parent) old bucket completes. Use RCU protection to dereference the current bucket and to wake up an emptied bucket. Use fc->lock to protect against parallel assignments to the current bucket. This leaves just the counter to be a possible scalability issue. The fc->num_waiting counter has a similar issue, so both should be addressed at the same time. Reported-by: Amir Goldstein <amir73il@gmail.com> Fixes: 2d82ab251ef0 ("virtiofs: propagate sync() to file server") Cc: <stable@vger.kernel.org> # v5.14 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-09-06	KVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted	Zelin Deng	1	-0/+4
	When MSR_IA32_TSC_ADJUST is written by guest due to TSC ADJUST feature especially there's a big tsc warp (like a new vCPU is hot-added into VM which has been up for a long time), tsc_offset is added by a large value then go back to guest. This causes system time jump as tsc_timestamp is not adjusted in the meantime and pvclock monotonic character. To fix this, just notify kvm to update vCPU's guest time before back to guest. Cc: stable@vger.kernel.org Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1619576521-81399-2-git-send-email-zelin.deng@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-06	KVM: MMU: mark role_regs and role accessors as maybe unused	Paolo Bonzini	1	-2/+2
	It is reasonable for these functions to be used only in some configurations, for example only if the host is 64-bits (and therefore supports 64-bit guests). It is also reasonable to keep the role_regs and role accessors in sync even though some of the accessors may be used only for one of the two sets (as is the case currently for CR4.LA57).. Because clang reports warnings for unused inlines declared in a .c file, mark both sets of accessors as __maybe_unused. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-06	KVM: MIPS: Remove a "set but not used" variable	Huacai Chen	1	-2/+1
	This fix a build warning: arch/mips/kvm/vz.c: In function '_kvm_vz_restore_htimer': >> arch/mips/kvm/vz.c:392:10: warning: variable 'freeze_time' set but not used [-Wunused-but-set-variable] 392 \| ktime_t freeze_time; \| ^~~~~~~~~~~ Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Message-Id: <20210406024911.2008046-1-chenhuacai@loongson.cn> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-06	x86/kvm: Don't enable IRQ when IRQ enabled in kvm_wait	Lai Jiangshan	1	-2/+3
	Commit f4e61f0c9add3 ("x86/kvm: Fix broken irq restoration in kvm_wait") replaced "local_irq_restore() when IRQ enabled" with "local_irq_enable() when IRQ enabled" to suppress a warnning. Although there is no similar debugging warnning for doing local_irq_enable() when IRQ enabled as doing local_irq_restore() in the same IRQ situation. But doing local_irq_enable() when IRQ enabled is no less broken as doing local_irq_restore() and we'd better avoid it. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20210814035129.154242-1-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-06	KVM: stats: Add VM stat for remote tlb flush requests	Jing Zhang	4	-1/+5
	Add a new stat that counts the number of times a remote TLB flush is requested, regardless of whether it kicks vCPUs out of guest mode. This allows us to look at how often flushes are initiated. Unlike remote_tlb_flush, this one applies to ARM's instruction-set-based TLB flush implementation, so apply it there too. Original-by: David Matlack <dmatlack@google.com> Signed-off-by: Jing Zhang <jingzhangos@google.com> Message-Id: <20210817002639.3856694-1-jingzhangos@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-06	KVM: Remove unnecessary export of kvm_{inc,dec}_notifier_count()	Sean Christopherson	1	-3/+0
	Don't export KVM's MMU notifier count helpers, under no circumstance should any downstream module, including x86's vendor code, have a legitimate reason to piggyback KVM's MMU notifier logic. E.g in the x86 case, only KVM's MMU should be elevating the notifier count, and that code is always built into the core kvm.ko module. Fixes: edb298c663fc ("KVM: x86/mmu: bump mmu notifier count in kvm_zap_gfn_range") Cc: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210902175951.1387989-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-06	KVM: x86/mmu: Move lpage_disallowed_link further "down" in kvm_mmu_page	Sean Christopherson	1	-1/+5
	Move "lpage_disallowed_link" out of the first 64 bytes, i.e. out of the first cache line, of kvm_mmu_page so that "spt" and to a lesser extent "gfns" land in the first cache line. "lpage_disallowed_link" is accessed relatively infrequently compared to "spt", which is accessed any time KVM is walking and/or manipulating the shadow page tables. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210901221023.1303578-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-06	KVM: x86/mmu: Relocate kvm_mmu_page.tdp_mmu_page for better cache locality	Sean Christopherson	1	-2/+1
	Move "tdp_mmu_page" into the 1-byte void left by the recently removed "mmio_cached" so that it resides in the first 64 bytes of kvm_mmu_page, i.e. in the same cache line as the most commonly accessed fields. Don't bother wrapping tdp_mmu_page in CONFIG_X86_64, including the field in 32-bit builds doesn't affect the size of kvm_mmu_page, and a future patch can always wrap the field in the unlikely event KVM gains a 1-byte flag that is 32-bit specific. Note, the size of kvm_mmu_page is also unchanged on CONFIG_X86_64=y due to it previously sharing an 8-byte chunk with write_flooding_count. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210901221023.1303578-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-06	Revert "KVM: x86: mmu: Add guest physical address check in translate_gpa()"	Sean Christopherson	1	-6/+0
	Revert a misguided illegal GPA check when "translating" a non-nested GPA. The check is woefully incomplete as it does not fill in @exception as expected by all callers, which leads to KVM attempting to inject a bogus exception, potentially exposing kernel stack information in the process. WARNING: CPU: 0 PID: 8469 at arch/x86/kvm/x86.c:525 exception_type+0x98/0xb0 arch/x86/kvm/x86.c:525 CPU: 1 PID: 8469 Comm: syz-executor531 Not tainted 5.14.0-rc7-syzkaller #0 RIP: 0010:exception_type+0x98/0xb0 arch/x86/kvm/x86.c:525 Call Trace: x86_emulate_instruction+0xef6/0x1460 arch/x86/kvm/x86.c:7853 kvm_mmu_page_fault+0x2f0/0x1810 arch/x86/kvm/mmu/mmu.c:5199 handle_ept_misconfig+0xdf/0x3e0 arch/x86/kvm/vmx/vmx.c:5336 __vmx_handle_exit arch/x86/kvm/vmx/vmx.c:6021 [inline] vmx_handle_exit+0x336/0x1800 arch/x86/kvm/vmx/vmx.c:6038 vcpu_enter_guest+0x2a1c/0x4430 arch/x86/kvm/x86.c:9712 vcpu_run arch/x86/kvm/x86.c:9779 [inline] kvm_arch_vcpu_ioctl_run+0x47d/0x1b20 arch/x86/kvm/x86.c:10010 kvm_vcpu_ioctl+0x49e/0xe50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3652 The bug has escaped notice because practically speaking the GPA check is useless. The GPA check in question only comes into play when KVM is walking guest page tables (or "translating" CR3), and KVM already handles illegal GPA checks by setting reserved bits in rsvd_bits_mask for each PxE, or in the case of CR3 for loading PTDPTRs, manually checks for an illegal CR3. This particular failure doesn't hit the existing reserved bits checks because syzbot sets guest.MAXPHYADDR=1, and IA32 architecture simply doesn't allow for such an absurd MAXPHYADDR, e.g. 32-bit paging doesn't define any reserved PA bits checks, which KVM emulates by only incorporating the reserved PA bits into the "high" bits, i.e. bits 63:32. Simply remove the bogus check. There is zero meaningful value and no architectural justification for supporting guest.MAXPHYADDR < 32, and properly filling the exception would introduce non-trivial complexity. This reverts commit ec7771ab471ba6a945350353617e2e3385d0e013. Fixes: ec7771ab471b ("KVM: x86: mmu: Add guest physical address check in translate_gpa()") Cc: stable@vger.kernel.org Reported-by: syzbot+200c08e88ae818f849ce@syzkaller.appspotmail.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210831164224.1119728-2-seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-06	KVM: x86/mmu: Remove unused field mmio_cached in struct kvm_mmu_page	Jia He	1	-1/+0
	After reverting and restoring the fast tlb invalidation patch series, the mmio_cached is not removed. Hence a unused field is left in kvm_mmu_page. Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Jia He <justin.he@arm.com> Message-Id: <20210830145336.27183-1-justin.he@arm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-06	kvm: x86: Increase KVM_SOFT_MAX_VCPUS to 710	Eduardo Habkost	1	-1/+1
	Support for 710 VCPUs was tested by Red Hat since RHEL-8.4, so increase KVM_SOFT_MAX_VCPUS to 710. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20210903211600.2002377-4-ehabkost@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-06	kvm: x86: Increase MAX_VCPUS to 1024	Eduardo Habkost	1	-1/+1
	Increase KVM_MAX_VCPUS to 1024, so we can test larger VMs. I'm not changing KVM_SOFT_MAX_VCPUS yet because I'm afraid it might involve complicated questions around the meaning of "supported" and "recommended" in the upstream tree. KVM_SOFT_MAX_VCPUS will be changed in a separate patch. For reference, visible effects of this change are: - KVM_CAP_MAX_VCPUS will now return 1024 (of course) - Default value for CPUID[HYPERV_CPUID_IMPLEMENT_LIMITS (00x40000005)].EAX will now be 1024 - KVM_MAX_VCPU_ID will change from 1151 to 4096 - Size of struct kvm will increase from 19328 to 22272 bytes (in x86_64) - Size of struct kvm_ioapic will increase from 1780 to 5084 bytes (in x86_64) - Bitmap stack variables that will grow: - At kvm_hv_flush_tlb() kvm_hv_send_ipi(), vp_bitmap[] and vcpu_bitmap[] will now be 128 bytes long - vcpu_bitmap at bioapic_write_indirect() will be 128 bytes long once patch "KVM: x86: Fix stack-out-of-bounds memory access from ioapic_write_indirect()" is applied Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20210903211600.2002377-3-ehabkost@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-06	kvm: x86: Set KVM_MAX_VCPU_ID to 4*KVM_MAX_VCPUS	Eduardo Habkost	1	-1/+13
	Instead of requiring KVM_MAX_VCPU_ID to be manually increased every time we increase KVM_MAX_VCPUS, set it to 4*KVM_MAX_VCPUS. This should be enough for CPU topologies where Cores-per-Package and Packages-per-Socket are not powers of 2. In practice, this increases KVM_MAX_VCPU_ID from 1023 to 1152. The only side effect of this change is making some fields in struct kvm_ioapic larger, increasing the struct size from 1628 to 1780 bytes (in x86_64). Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20210903211600.2002377-2-ehabkost@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-06	iwlwifi: fix printk format warnings in uefi.c	Randy Dunlap	1	-2/+2
	The kernel test robot reports printk format warnings in uefi.c, so correct them. ../drivers/net/wireless/intel/iwlwifi/fw/uefi.c: In function 'iwl_uefi_get_pnvm': ../drivers/net/wireless/intel/iwlwifi/fw/uefi.c:52:30: warning: format '%zd' expects argument of type 'signed size_t', but argument 7 has type 'long unsigned int' [-Wformat=] 52 \| "PNVM UEFI variable not found %d (len %zd)\n", \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 53 \| err, package_size); \| ~~~~~~~~~~~~ \| \| \| long unsigned int ../drivers/net/wireless/intel/iwlwifi/fw/uefi.c:59:29: warning: format '%zd' expects argument of type 'signed size_t', but argument 6 has type 'long unsigned int' [-Wformat=] 59 \| IWL_DEBUG_FW(trans, "Read PNVM from UEFI with size %zd\n", package_size); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~ \| \| \| long unsigned int Fixes: 84c3c9952afb ("iwlwifi: move UEFI code to a separate file") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Cc: Kalle Valo <kvalo@codeaurora.org> Cc: Luca Coelho <luciano.coelho@intel.com> Cc: linux-wireless@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20210821020901.25901-1-rdunlap@infradead.org
2021-09-06	KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation	Maxim Levitsky	1	-0/+3
	If we are emulating an invalid guest state, we don't have a correct exit reason, and thus we shouldn't do anything in this function. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210826095750.1650467-2-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Fixes: 95b5a48c4f2b ("KVM: VMX: Handle NMIs, #MCs and async #PFs in common irqs-disabled fn", 2019-06-18) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-06	KVM: x86/mmu: Don't freak out if pml5_root is NULL on 4-level host	Sean Christopherson	1	-3/+11
	Include pml5_root in the set of special roots if and only if the host, and thus NPT, is using 5-level paging. mmu_alloc_special_roots() expects special roots to be allocated as a bundle, i.e. they're either all valid or all NULL. But for pml5_root, that expectation only holds true if the host uses 5-level paging, which causes KVM to WARN about pml5_root being NULL when the other special roots are valid. The silver lining of 4-level vs. 5-level NPT being tied to the host kernel's paging level is that KVM's shadow root level is constant; unlike VMX's EPT, KVM can't choose 4-level NPT based on guest.MAXPHYADDR. That means KVM can still expect pml5_root to be bundled with the other special roots, it just needs to be conditioned on the shadow root level. Fixes: cb0f722aff6e ("KVM: x86/mmu: Support shadowing NPT when 5-level paging is enabled in host") Reported-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210824005824.205536-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>