aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2022-06-09nfp: avoid unnecessary check warnings in nfp_app_get_vf_configFei Qin1-13/+15
nfp_net_sriov_check is added in nfp_app_get_vf_config which intends to ensure ivi->vlan_proto and ivi->max_tx_rate/min_tx_rate can be read from VF config table only when firmware supports corresponding capability. However, "nfp_app_get_vf_config" can be called by commands like "ip a", "ip link set $DEV up" and "ip link set $DEV vf $NUM vlan $param" (with VF). When using commands above, many warnings "ndo_set_vf_<cap_x> not supported" would appear if firmware doesn't support VF rate limit and 802.1ad VLAN assingment. If more VFs are created, things could get worse. Thus, this patch add an extra bool parameter for nfp_net_sriov_check to enable/disable the cap check warning report. Unnecessary warnings in nfp_app_get_vf_config can be avoided. Valid warnings in kinds of vf setting function can be reserved. Fixes: e0d0e1fdf1ed ("nfp: VF rate limit support") Fixes: 59359597b010 ("nfp: support 802.1ad VLAN assingment to VF") Signed-off-by: Fei Qin <fei.qin@corigine.com> Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09tls: Rename TLS_INFO_ZC_SENDFILE to TLS_INFO_ZC_TXMaxim Mikityanskiy2-6/+6
To embrace possible future optimizations of TLS, rename zerocopy sendfile definitions to more generic ones: * setsockopt: TLS_TX_ZEROCOPY_SENDFILE- > TLS_TX_ZEROCOPY_RO * sock_diag: TLS_INFO_ZC_SENDFILE -> TLS_INFO_ZC_RO_TX RO stands for readonly and emphasizes that the application shouldn't modify the data being transmitted with zerocopy to avoid potential disconnection. Fixes: c1318b39c7d3 ("tls: Add opt-in zerocopy mode of sendfile()") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Link: https://lore.kernel.org/r/20220608153425.3151146-1-maximmi@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09netfs: gcc-12: temporarily disable '-Wattribute-warning' for nowLinus Torvalds2-0/+6
This is a pure band-aid so that I can continue merging stuff from people while some of the gcc-12 fallout gets sorted out. In particular, gcc-12 is very unhappy about the kinds of pointer arithmetic tricks that netfs does, and that makes the fortify checks trigger in afs and ceph: In function ‘fortify_memset_chk’, inlined from ‘netfs_i_context_init’ at include/linux/netfs.h:327:2, inlined from ‘afs_set_netfs_context’ at fs/afs/inode.c:61:2, inlined from ‘afs_root_iget’ at fs/afs/inode.c:543:2: include/linux/fortify-string.h:258:25: warning: call to ‘__write_overflow_field’ declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning] 258 | __write_overflow_field(p_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ and the reason is that netfs_i_context_init() is passed a 'struct inode' pointer, and then it does struct netfs_i_context *ctx = netfs_i_context(inode); memset(ctx, 0, sizeof(*ctx)); where that netfs_i_context() function just does pointer arithmetic on the inode pointer, knowing that the netfs_i_context is laid out immediately after it in memory. This is all truly disgusting, since the whole "netfs_i_context is laid out immediately after it in memory" is not actually remotely true in general, but is just made to be that way for afs and ceph. See for example fs/cifs/cifsglob.h: struct cifsInodeInfo { struct { /* These must be contiguous */ struct inode vfs_inode; /* the VFS's inode record */ struct netfs_i_context netfs_ctx; /* Netfslib context */ }; [...] and realize that this is all entirely wrong, and the pointer arithmetic that netfs_i_context() is doing is also very very wrong and wouldn't give the right answer if netfs_ctx had different alignment rules from a 'struct inode', for example). Anyway, that's just a long-winded way to say "the gcc-12 warning is actually quite reasonable, and our code happens to work but is pretty disgusting". This is getting fixed properly, but for now I made the mistake of thinking "the week right after the merge window tends to be calm for me as people take a breather" and I did a sustem upgrade. And I got gcc-12 as a result, so to continue merging fixes from people and not have the end result drown in warnings, I am fixing all these gcc-12 issues I hit. Including with these kinds of temporary fixes. Cc: Kees Cook <keescook@chromium.org> Cc: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/all/AEEBCF5D-8402-441D-940B-105AA718C71F@chromium.org/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-06-09gcc-12: disable '-Warray-bounds' universally for nowLinus Torvalds4-9/+12
In commit 8b202ee21839 ("s390: disable -Warray-bounds") the s390 people disabled the '-Warray-bounds' warning for gcc-12, because the new logic in gcc would cause warnings for their use of the S390_lowcore macro, which accesses absolute pointers. It turns out gcc-12 has many other issues in this area, so this takes that s390 warning disable logic, and turns it into a kernel build config entry instead. Part of the intent is that we can make this all much more targeted, and use this conflig flag to disable it in only particular configurations that cause problems, with the s390 case as an example: select GCC12_NO_ARRAY_BOUNDS and we could do that for other configuration cases that cause issues. Or we could possibly use the CONFIG_CC_NO_ARRAY_BOUNDS thing in a more targeted way, and disable the warning only for particular uses: again the s390 case as an example: KBUILD_CFLAGS_DECOMPRESSOR += $(if $(CONFIG_CC_NO_ARRAY_BOUNDS),-Wno-array-bounds) but this ends up just doing it globally in the top-level Makefile, since the current issues are spread fairly widely all over: KBUILD_CFLAGS-$(CONFIG_CC_NO_ARRAY_BOUNDS) += -Wno-array-bounds We'll try to limit this later, since the gcc-12 problems are rare enough that *much* of the kernel can be built with it without disabling this warning. Cc: Kees Cook <keescook@chromium.org> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-06-09mellanox: mlx5: avoid uninitialized variable warning with gcc-12Linus Torvalds1-1/+1
gcc-12 started warning about 'tracker' being used uninitialized: drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c: In function ‘mlx5_do_bond’: drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c:786:28: warning: ‘tracker’ is used uninitialized [-Wuninitialized] 786 | struct lag_tracker tracker; | ^~~~~~~ which seems to be because it doesn't track how the use (and initialization) is bound by the 'do_bond' flag. But admittedly that 'do_bond' usage is fairly complicated, and involves passing it around as an argument to helper functions, so it's somewhat understandable that gcc doesn't see how that all works. This function could be rewritten to make the use of that tracker variable more obviously safe, but for now I'm just adding the forced initialization of it. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-06-09gcc-12: disable '-Wdangling-pointer' warning for nowLinus Torvalds1-0/+3
While the concept of checking for dangling pointers to local variables at function exit is really interesting, the gcc-12 implementation is not compatible with reality, and results in false positives. For example, gcc sees us putting things on a local list head allocated on the stack, which involves exactly those kinds of pointers to the local stack entry: In function ‘__list_add’, inlined from ‘list_add_tail’ at include/linux/list.h:102:2, inlined from ‘rebuild_snap_realms’ at fs/ceph/snap.c:434:2: include/linux/list.h:74:19: warning: storing the address of local variable ‘realm_queue’ in ‘*&realm_27(D)->rebuild_item.prev’ [-Wdangling-pointer=] 74 | new->prev = prev; | ~~~~~~~~~~^~~~~~ But then gcc - understandably - doesn't really understand the big picture how the doubly linked list works, so doesn't see how we then end up emptying said list head in a loop and the pointer we added has been removed. Gcc also complains about us (intentionally) using this as a way to store a kind of fake stack trace, eg drivers/acpi/acpica/utdebug.c:40:38: warning: storing the address of local variable ‘current_sp’ in ‘acpi_gbl_entry_stack_pointer’ [-Wdangling-pointer=] 40 | acpi_gbl_entry_stack_pointer = &current_sp; | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~ which is entirely reasonable from a compiler standpoint, and we may want to change those kinds of patterns, but not not. So this is one of those "it would be lovely if the compiler were to complain about us leaving dangling pointers to the stack", but not this way. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-06-09drm: imx: fix compiler warning with gcc-12Linus Torvalds1-1/+1
Gcc-12 correctly warned about this code using a non-NULL pointer as a truth value: drivers/gpu/drm/imx/ipuv3-crtc.c: In function ‘ipu_crtc_disable_planes’: drivers/gpu/drm/imx/ipuv3-crtc.c:72:21: error: the comparison will always evaluate as ‘true’ for the address of ‘plane’ will never be NULL [-Werror=address] 72 | if (&ipu_crtc->plane[1] && plane == &ipu_crtc->plane[1]->base) | ^ due to the extraneous '&' address-of operator. Philipp Zabel points out that The mistake had no adverse effect since the following condition doesn't actually dereference the NULL pointer, but the intent of the code was obviously to check for it, not to take the address of the member. Fixes: eb8c88808c83 ("drm/imx: add deferred plane disabling") Acked-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-06-08net: amd-xgbe: fix clang -Wformat warningJustin Stitt1-1/+1
see warning: | drivers/net/ethernet/amd/xgbe/xgbe-drv.c:2787:43: warning: format specifies | type 'unsigned short' but the argument has type 'int' [-Wformat] | netdev_dbg(netdev, "Protocol: %#06hx\n", ntohs(eth->h_proto)); | ~~~~~~ ^~~~~~~~~~~~~~~~~~~ Variadic functions (printf-like) undergo default argument promotion. Documentation/core-api/printk-formats.rst specifically recommends using the promoted-to-type's format flag. Also, as per C11 6.3.1.1: (https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf) `If an int can represent all values of the original type ..., the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.` Since the argument is a u16 it will get promoted to an int and thus it is most accurate to use the %x format specifier here. It should be noted that the `#06` formatting sugar does not alter the promotion rules. Link: https://github.com/ClangBuiltLinux/linux/issues/378 Signed-off-by: Justin Stitt <jstitt007@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20220607191119.20686-1-jstitt007@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08tcp: use alloc_large_system_hash() to allocate table_perturbMuchun Song1-4/+6
In our server, there may be no high order (>= 6) memory since we reserve lots of HugeTLB pages when booting. Then the system panic. So use alloc_large_system_hash() to allocate table_perturb. Fixes: e9261476184b ("tcp: dynamically allocate the perturb table used by source ports") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220607070214.94443-1-songmuchun@bytedance.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08net: dsa: realtek: rtl8365mb: fix GMII caps for ports with internal PHYAlvin Šipraga1-29/+9
Since commit a18e6521a7d9 ("net: phylink: handle NA interface mode in phylink_fwnode_phy_connect()"), phylib defaults to GMII when no phy-mode or phy-connection-type property is specified in a DSA port node of the device tree. The same commit caused a regression in rtl8365mb whereby phylink would fail to connect, because the driver did not advertise support for GMII for ports with internal PHY. It should be noted that the aforementioned regression is not because the blamed commit was incorrect: on the contrary, the blamed commit is correcting the previous behaviour whereby unspecified phy-mode would cause the internal interface mode to be PHY_INTERFACE_MODE_NA. The rtl8365mb driver only worked by accident before because it _did_ advertise support for PHY_INTERFACE_MODE_NA, despite NA being reserved for internal use by phylink. With one mistake fixed, the other was exposed. Commit a5dba0f207e5 ("net: dsa: rtl8365mb: add GMII as user port mode") then introduced implicit support for GMII mode on ports with internal PHY to allow a PHY connection for device trees where the phy-mode is not explicitly set to "internal". At this point everything was working OK again. Subsequently, commit 6ff6064605e9 ("net: dsa: realtek: convert to phylink_generic_validate()") broke this behaviour again by discarding the usage of rtl8365mb_phy_mode_supported() - where this GMII support was indicated - while switching to the new .phylink_get_caps API. With the new API, rtl8365mb_phy_mode_supported() is no longer needed. Remove it altogether and add back the GMII capability - this time to rtl8365mb_phylink_get_caps() - so that the above default behaviour works for ports with internal PHY again. Fixes: 6ff6064605e9 ("net: dsa: realtek: convert to phylink_generic_validate()") Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20220607184624.417641-1-alvin@pqrs.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08net: dsa: mv88e6xxx: correctly report serdes link failureRussell King (Oracle)1-0/+8
Phylink wants to know if the link has dropped since the last time state was retrieved, and the BMSR gives us that. Read the BMSR and use it when deciding the link state. Fill in the an_complete member as well for the emulated PHY state. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08net: dsa: mv88e6xxx: fix BMSR error to be consistent with othersRussell King (Oracle)1-1/+1
Other errors accessing the registers in mv88e6352_serdes_pcs_get_state() print "PHY " before the register name, except for the BMSR. Make this consistent with the other error messages. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08net: dsa: mv88e6xxx: use BMSR_ANEGCOMPLETE bit for filling an_completeMarek Behún1-16/+11
Commit ede359d8843a ("net: dsa: mv88e6xxx: Link in pcs_get_state() if AN is bypassed") added the ability to link if AN was bypassed, and added filling of state->an_complete field, but set it to true if AN was enabled in BMCR, not when AN was reported complete in BMSR. This was done because for some reason, when I wanted to use BMSR value to infer an_complete, I was looking at BMSR_ANEGCAPABLE bit (which was always 1), instead of BMSR_ANEGCOMPLETE bit. Use BMSR_ANEGCOMPLETE for filling state->an_complete. Fixes: ede359d8843a ("net: dsa: mv88e6xxx: Link in pcs_get_state() if AN is bypassed") Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08net: altera: Fix refcount leak in altera_tse_mdio_createMiaoqian Lin1-1/+5
Every iteration of for_each_child_of_node() decrements the reference count of the previous node. When break from a for_each_child_of_node() loop, we need to explicitly call of_node_put() on the child node when not need anymore. Add missing of_node_put() to avoid refcount leak. Fixes: bbd2190ce96d ("Altera TSE: Add main and header file for Altera Ethernet Driver") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Link: https://lore.kernel.org/r/20220607041144.7553-1-linmq006@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08net: openvswitch: fix misuse of the cached connection on tuple changesIlya Maximets2-1/+9
If packet headers changed, the cached nfct is no longer relevant for the packet and attempt to re-use it leads to the incorrect packet classification. This issue is causing broken connectivity in OpenStack deployments with OVS/OVN due to hairpin traffic being unexpectedly dropped. The setup has datapath flows with several conntrack actions and tuple changes between them: actions:ct(commit,zone=8,mark=0/0x1,nat(src)), set(eth(src=00:00:00:00:00:01,dst=00:00:00:00:00:06)), set(ipv4(src=172.18.2.10,dst=192.168.100.6,ttl=62)), ct(zone=8),recirc(0x4) After the first ct() action the packet headers are almost fully re-written. The next ct() tries to re-use the existing nfct entry and marks the packet as invalid, so it gets dropped later in the pipeline. Clearing the cached conntrack entry whenever packet tuple is changed to avoid the issue. The flow key should not be cleared though, because we should still be able to match on the ct_state if the recirculation happens after the tuple change but before the next ct() action. Cc: stable@vger.kernel.org Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Reported-by: Frode Nordahl <frode.nordahl@canonical.com> Link: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-May/051829.html Link: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967856 Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Link: https://lore.kernel.org/r/20220606221140.488984-1-i.maximets@ovn.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08net: ethernet: mtk_eth_soc: fix misuse of mem alloc interface netdev[napi]_alloc_fragChen Lin1-2/+19
When rx_flag == MTK_RX_FLAGS_HWLRO, rx_data_len = MTK_MAX_LRO_RX_LENGTH(4096 * 3) > PAGE_SIZE. netdev_alloc_frag is for alloction of page fragment only. Reference to other drivers and Documentation/vm/page_frags.rst Branch to use __get_free_pages when ring->frag_size > PAGE_SIZE. Signed-off-by: Chen Lin <chen45464546@163.com> Link: https://lore.kernel.org/r/1654692413-2598-1-git-send-email-chen45464546@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08ip_gre: test csum_start instead of transport headerWillem de Bruijn1-6/+5
GRE with TUNNEL_CSUM will apply local checksum offload on CHECKSUM_PARTIAL packets. ipgre_xmit must validate csum_start after an optional skb_pull, else lco_csum may trigger an overflow. The original check was if (csum && skb_checksum_start(skb) < skb->data) return -EINVAL; This had false positives when skb_checksum_start is undefined: when ip_summed is not CHECKSUM_PARTIAL. A discussed refinement was straightforward if (csum && skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_start(skb) < skb->data) return -EINVAL; But was eventually revised more thoroughly: - restrict the check to the only branch where needed, in an uncommon GRE path that uses header_ops and calls skb_pull. - test skb_transport_header, which is set along with csum_start in skb_partial_csum_set in the normal header_ops datapath. Turns out skbs can arrive in this branch without the transport header set, e.g., through BPF redirection. Revise the check back to check csum_start directly, and only if CHECKSUM_PARTIAL. Do leave the check in the updated location. Check field regardless of whether TUNNEL_CSUM is configured. Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/ Link: https://lore.kernel.org/all/20210902193447.94039-2-willemdebruijn.kernel@gmail.com/T/#u Fixes: 8a0ed250f911 ("ip_gre: validate csum_start only on pull") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/20220606132107.3582565-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08cert host tools: Stop complaining about deprecated OpenSSL functionsLinus Torvalds2-0/+14
OpenSSL 3.0 deprecated the OpenSSL's ENGINE API. That is as may be, but the kernel build host tools still use it. Disable the warning about deprecated declarations until somebody who cares fixes it. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-06-08au1000_eth: stop using virt_to_bus()Arnd Bergmann2-13/+13
The conversion to the dma-mapping API in linux-2.6.11 was incomplete and left a virt_to_bus() call around. There have been a number of fixes for DMA mapping API abuse in this driver, but this one always slipped through. Change it to just use the existing dma_addr_t pointer, and make it use the correct types throughout the driver to make it easier to understand the virtual vs dma address spaces. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Manuel Lauss <manuel.lauss@gmail.com> Link: https://lore.kernel.org/r/20220607090206.19830-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08ipv6: Fix signed integer overflow in l2tp_ip6_sendmsgWang Yufen1-2/+3
When len >= INT_MAX - transhdrlen, ulen = len + transhdrlen will be overflow. To fix, we can follow what udpv6 does and subtract the transhdrlen from the max. Signed-off-by: Wang Yufen <wangyufen@huawei.com> Link: https://lore.kernel.org/r/20220607120028.845916-2-wangyufen@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08ipv6: Fix signed integer overflow in __ip6_append_dataWang Yufen2-5/+5
Resurrect ubsan overflow checks and ubsan report this warning, fix it by change the variable [length] type to size_t. UBSAN: signed-integer-overflow in net/ipv6/ip6_output.c:1489:19 2147479552 + 8567 cannot be represented in type 'int' CPU: 0 PID: 253 Comm: err Not tainted 5.16.0+ #1 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x214/0x230 show_stack+0x30/0x78 dump_stack_lvl+0xf8/0x118 dump_stack+0x18/0x30 ubsan_epilogue+0x18/0x60 handle_overflow+0xd0/0xf0 __ubsan_handle_add_overflow+0x34/0x44 __ip6_append_data.isra.48+0x1598/0x1688 ip6_append_data+0x128/0x260 udpv6_sendmsg+0x680/0xdd0 inet6_sendmsg+0x54/0x90 sock_sendmsg+0x70/0x88 ____sys_sendmsg+0xe8/0x368 ___sys_sendmsg+0x98/0xe0 __sys_sendmmsg+0xf4/0x3b8 __arm64_sys_sendmmsg+0x34/0x48 invoke_syscall+0x64/0x160 el0_svc_common.constprop.4+0x124/0x300 do_el0_svc+0x44/0xc8 el0_svc+0x3c/0x1e8 el0t_64_sync_handler+0x88/0xb0 el0t_64_sync+0x16c/0x170 Changes since v1: -Change the variable [length] type to unsigned, as Eric Dumazet suggested. Changes since v2: -Don't change exthdrlen type in ip6_make_skb, as Paolo Abeni suggested. Changes since v3: -Don't change ulen type in udpv6_sendmsg and l2tp_ip6_sendmsg, as Jakub Kicinski suggested. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Yufen <wangyufen@huawei.com> Link: https://lore.kernel.org/r/20220607120028.845916-1-wangyufen@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08nfc: nfcmrvl: Fix memory leak in nfcmrvl_play_deferredXiaohui Zhang1-2/+14
Similar to the handling of play_deferred in commit 19cfe912c37b ("Bluetooth: btusb: Fix memory leak in play_deferred"), we thought a patch might be needed here as well. Currently usb_submit_urb is called directly to submit deferred tx urbs after unanchor them. So the usb_giveback_urb_bh would failed to unref it in usb_unanchor_urb and cause memory leak. Put those urbs in tx_anchor to avoid the leak, and also fix the error handling. Signed-off-by: Xiaohui Zhang <xiaohuizhang@ruc.edu.cn> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20220607083230.6182-1-xiaohuizhang@ruc.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08nfc: st21nfca: fix incorrect sizing calculations in EVT_TRANSACTIONMartin Faltesek1-30/+30
The transaction buffer is allocated by using the size of the packet buf, and subtracting two which seem intended to remove the two tags which are not present in the target structure. This calculation leads to under counting memory because of differences between the packet contents and the target structure. The aid_len field is a u8 in the packet, but a u32 in the structure, resulting in at least 3 bytes always being under counted. Further, the aid data is a variable length field in the packet, but fixed in the structure, so if this field is less than the max, the difference is added to the under counting. The last validation check for transaction->params_len is also incorrect since it employs the same accounting error. To fix, perform validation checks progressively to safely reach the next field, to determine the size of both buffers and verify both tags. Once all validation checks pass, allocate the buffer and copy the data. This eliminates freeing memory on the error path, as those checks are moved ahead of memory allocation. Fixes: 26fc6c7f02cb ("NFC: st21nfca: Add HCI transaction event support") Fixes: 4fbcc1a4cb20 ("nfc: st21nfca: Fix potential buffer overflows in EVT_TRANSACTION") Cc: stable@vger.kernel.org Signed-off-by: Martin Faltesek <mfaltesek@google.com> Reviewed-by: Guenter Roeck <groeck@chromium.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handlingMartin Faltesek1-3/+10
Error paths do not free previously allocated memory. Add devm_kfree() to those failure paths. Fixes: 26fc6c7f02cb ("NFC: st21nfca: Add HCI transaction event support") Fixes: 4fbcc1a4cb20 ("nfc: st21nfca: Fix potential buffer overflows in EVT_TRANSACTION") Cc: stable@vger.kernel.org Signed-off-by: Martin Faltesek <mfaltesek@google.com> Reviewed-by: Guenter Roeck <groeck@chromium.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08nfc: st21nfca: fix incorrect validating logic in EVT_TRANSACTIONMartin Faltesek1-1/+1
The first validation check for EVT_TRANSACTION has two different checks tied together with logical AND. One is a check for minimum packet length, and the other is for a valid aid_tag. If either condition is true (fails), then an error should be triggered. The fix is to change && to ||. Fixes: 26fc6c7f02cb ("NFC: st21nfca: Add HCI transaction event support") Cc: stable@vger.kernel.org Signed-off-by: Martin Faltesek <mfaltesek@google.com> Reviewed-by: Guenter Roeck <groeck@chromium.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08net: ipv6: unexport __init-annotated seg6_hmac_init()Masahiro Yamada1-1/+0
EXPORT_SYMBOL and __init is a bad combination because the .init.text section is freed up after the initialization. Hence, modules cannot use symbols annotated __init. The access to a freed symbol may end up with kernel panic. modpost used to detect it, but it has been broken for a decade. Recently, I fixed modpost so it started to warn it again, then this showed up in linux-next builds. There are two ways to fix it: - Remove __init - Remove EXPORT_SYMBOL I chose the latter for this case because the caller (net/ipv6/seg6.c) and the callee (net/ipv6/seg6_hmac.c) belong to the same module. It seems an internal function call in ipv6.ko. Fixes: bf355b8d2c30 ("ipv6: sr: add core files for SR HMAC support") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08net: xfrm: unexport __init-annotated xfrm4_protocol_init()Masahiro Yamada1-1/+0
EXPORT_SYMBOL and __init is a bad combination because the .init.text section is freed up after the initialization. Hence, modules cannot use symbols annotated __init. The access to a freed symbol may end up with kernel panic. modpost used to detect it, but it has been broken for a decade. Recently, I fixed modpost so it started to warn it again, then this showed up in linux-next builds. There are two ways to fix it: - Remove __init - Remove EXPORT_SYMBOL I chose the latter for this case because the only in-tree call-site, net/ipv4/xfrm4_policy.c is never compiled as modular. (CONFIG_XFRM is boolean) Fixes: 2f32b51b609f ("xfrm: Introduce xfrm_input_afinfo to access the the callbacks properly") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08net: mdio: unexport __init-annotated mdio_bus_init()Masahiro Yamada1-1/+0
EXPORT_SYMBOL and __init is a bad combination because the .init.text section is freed up after the initialization. Hence, modules cannot use symbols annotated __init. The access to a freed symbol may end up with kernel panic. modpost used to detect it, but it has been broken for a decade. Recently, I fixed modpost so it started to warn it again, then this showed up in linux-next builds. There are two ways to fix it: - Remove __init - Remove EXPORT_SYMBOL I chose the latter for this case because the only in-tree call-site, drivers/net/phy/phy_device.c is never compiled as modular. (CONFIG_PHYLIB is boolean) Fixes: 90eff9096c01 ("net: phy: Allow splitting MDIO bus/device support from PHYs") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08MAINTAINERS: Add a maintainer for bpftoolQuentin Monnet1-0/+7
I've been contributing and reviewing patches for bpftool for some time, and I'm taking care of its external mirror. On Alexei, KP, and Daniel's suggestion, I would like to step forwards and become a maintainer for the tool. This patch adds a dedicated entry to MAINTAINERS. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: KP Singh <kpsingh@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220608121428.69708-1-quentin@isovalent.com
2022-06-08xsk: Fix handling of invalid descriptors in XSK TX batching APIMaciej Fijalkowski2-10/+3
xdpxceiver run on a AF_XDP ZC enabled driver revealed a problem with XSK Tx batching API. There is a test that checks how invalid Tx descriptors are handled by AF_XDP. Each valid descriptor is followed by invalid one on Tx side whereas the Rx side expects only to receive a set of valid descriptors. In current xsk_tx_peek_release_desc_batch() function, the amount of available descriptors is hidden inside xskq_cons_peek_desc_batch(). This can be problematic in cases where invalid descriptors are present due to the fact that xskq_cons_peek_desc_batch() returns only a count of valid descriptors. This means that it is impossible to properly update XSK ring state when calling xskq_cons_release_n(). To address this issue, pull out the contents of xskq_cons_peek_desc_batch() so that callers (currently only xsk_tx_peek_release_desc_batch()) will always be able to update the state of ring properly, as total count of entries is now available and use this value as an argument in xskq_cons_release_n(). By doing so, xskq_cons_peek_desc_batch() can be dropped altogether. Fixes: 9349eb3a9d2a ("xsk: Introduce batched Tx descriptor interfaces") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20220607142200.576735-1-maciej.fijalkowski@intel.com
2022-06-08KEYS: trusted: tpm2: Fix migratable logicDavid Safford1-2/+2
When creating (sealing) a new trusted key, migratable trusted keys have the FIXED_TPM and FIXED_PARENT attributes set, and non-migratable keys don't. This is backwards, and also causes creation to fail when creating a migratable key under a migratable parent. (The TPM thinks you are trying to seal a non-migratable blob under a migratable parent.) The following simple patch fixes the logic, and has been tested for all four combinations of migratable and non-migratable trusted keys and parent storage keys. With this logic, you will get a proper failure if you try to create a non-migratable trusted key under a migratable parent storage key, and all other combinations work correctly. Cc: stable@vger.kernel.org # v5.13+ Fixes: e5fb5d2c5a03 ("security: keys: trusted: Make sealed key properly interoperable") Signed-off-by: David Safford <david.safford@gmail.com> Reviewed-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2022-06-08KVM: x86: do not report a vCPU as preempted outside instruction boundariesPaolo Bonzini4-0/+28
If a vCPU is outside guest mode and is scheduled out, it might be in the process of making a memory access. A problem occurs if another vCPU uses the PV TLB flush feature during the period when the vCPU is scheduled out, and a virtual address has already been translated but has not yet been accessed, because this is equivalent to using a stale TLB entry. To avoid this, only report a vCPU as preempted if sure that the guest is at an instruction boundary. A rescheduling request will be delivered to the host physical CPU as an external interrupt, so for simplicity consider any vmexit *not* instruction boundary except for external interrupts. It would in principle be okay to report the vCPU as preempted also if it is sleeping in kvm_vcpu_block(): a TLB flush IPI will incur the vmentry/vmexit overhead unnecessarily, and optimistic spinning is also unlikely to succeed. However, leave it for later because right now kvm_vcpu_check_block() is doing memory accesses. Even though the TLB flush issue only applies to virtual memory address, it's very much preferrable to be conservative. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08KVM: x86: do not set st->preempted when going back to user spacePaolo Bonzini2-14/+18
Similar to the Xen path, only change the vCPU's reported state if the vCPU was actually preempted. The reason for KVM's behavior is that for example optimistic spinning might not be a good idea if the guest is doing repeated exits to userspace; however, it is confusing and unlikely to make a difference, because well-tuned guests will hardly ever exit KVM_RUN in the first place. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-07net/mlx4_en: Fix wrong return value on ioctl EEPROM query failureGal Pressman1-1/+1
The ioctl EEPROM query wrongly returns success on read failures, fix that by returning the appropriate error code. Fixes: 7202da8b7f71 ("ethtool, net/mlx4_en: Cable info, get_module_info/eeprom ethtool support") Signed-off-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20220606115718.14233-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-07net: dsa: lantiq_gswip: Fix refcount leak in gswip_gphy_fw_listMiaoqian Lin1-1/+3
Every iteration of for_each_available_child_of_node() decrements the reference count of the previous node. when breaking early from a for_each_available_child_of_node() loop, we need to explicitly call of_node_put() on the gphy_fw_np. Add missing of_node_put() to avoid refcount leak. Fixes: 14fceff4771e ("net: dsa: Add Lantiq / Intel DSA driver for vrx200") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Link: https://lore.kernel.org/r/20220605072335.11257-1-linmq006@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-07Input: soc_button_array - also add Lenovo Yoga Tablet2 1051F to dmi_use_low_level_irqMarius Hoch1-2/+2
Commit 223f61b8c5ad ("Input: soc_button_array - add Lenovo Yoga Tablet2 1051L to the dmi_use_low_level_irq list") added the 1051L to this list already, but the same problem applies to the 1051F. As there are no further 1051 variants (just the F/L), we can just DMI match 1051. Tested on a Lenovo Yoga Tablet2 1051F: Without this patch the home-button stops working after a wakeup from suspend. Signed-off-by: Marius Hoch <mail@mariushoch.de> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20220603120246.3065-1-mail@mariushoch.de Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2022-06-07Input: bcm5974 - set missing URB_NO_TRANSFER_DMA_MAP urb flagMathias Nyman1-1/+6
The bcm5974 driver does the allocation and dma mapping of the usb urb data buffer, but driver does not set the URB_NO_TRANSFER_DMA_MAP flag to let usb core know the buffer is already mapped. usb core tries to map the already mapped buffer, causing a warning: "xhci_hcd 0000:00:14.0: rejecting DMA map of vmalloc memory" Fix this by setting the URB_NO_TRANSFER_DMA_MAP, letting usb core know buffer is already mapped by bcm5974 driver Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=215890 Link: https://lore.kernel.org/r/20220606113636.588955-1-mathias.nyman@linux.intel.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2022-06-07ixgbe: fix unexpected VLAN Rx in promisc mode on VFOlivier Matz1-2/+2
When the promiscuous mode is enabled on a VF, the IXGBE_VMOLR_VPE bit (VLAN Promiscuous Enable) is set. This means that the VF will receive packets whose VLAN is not the same than the VLAN of the VF. For instance, in this situation: ┌────────┐ ┌────────┐ ┌────────┐ │ │ │ │ │ │ │ │ │ │ │ │ │ VF0├────┤VF1 VF2├────┤VF3 │ │ │ │ │ │ │ └────────┘ └────────┘ └────────┘ VM1 VM2 VM3 vf 0: vlan 1000 vf 1: vlan 1000 vf 2: vlan 1001 vf 3: vlan 1001 If we tcpdump on VF3, we see all the packets, even those transmitted on vlan 1000. This behavior prevents to bridge VF1 and VF2 in VM2, because it will create a loop: packets transmitted on VF1 will be received by VF2 and vice-versa, and bridged again through the software bridge. This patch remove the activation of VLAN Promiscuous when a VF enables the promiscuous mode. However, the IXGBE_VMOLR_UPE bit (Unicast Promiscuous) is kept, so that a VF receives all packets that has the same VLAN, whatever the destination MAC address. Fixes: 8443c1a4b192 ("ixgbe, ixgbevf: Add new mbox API xcast mode") Cc: stable@vger.kernel.org Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-07ixgbe: fix bcast packets Rx on VF after promisc removalOlivier Matz1-2/+2
After a VF requested to remove the promiscuous flag on an interface, the broadcast packets are not received anymore. This breaks some protocols like ARP. In ixgbe_update_vf_xcast_mode(), we should keep the IXGBE_VMOLR_BAM bit (Broadcast Accept) on promiscuous removal. This flag is already set by default in ixgbe_set_vmolr() on VF reset. Fixes: 8443c1a4b192 ("ixgbe, ixgbevf: Add new mbox API xcast mode") Cc: stable@vger.kernel.org Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-07selftests/bpf: Add selftest for calling global functions from freplaceToke Høiland-Jørgensen2-0/+32
Add a selftest that calls a global function with a context object parameter from an freplace function to check that the program context type is correctly converted to the freplace target when fetching the context type from the kernel BTF. v2: - Trim includes - Get rid of global function - Use __noinline Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20220606075253.28422-2-toke@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-07bpf: Fix calling global functions from BPF_PROG_TYPE_EXT programsToke Høiland-Jørgensen1-1/+2
The verifier allows programs to call global functions as long as their argument types match, using BTF to check the function arguments. One of the allowed argument types to such global functions is PTR_TO_CTX; however the check for this fails on BPF_PROG_TYPE_EXT functions because the verifier uses the wrong type to fetch the vmlinux BTF ID for the program context type. This failure is seen when an XDP program is loaded using libxdp (which loads it as BPF_PROG_TYPE_EXT and attaches it to a global XDP type program). Fix the issue by passing in the target program type instead of the BPF_PROG_TYPE_EXT type to bpf_prog_get_ctx() when checking function argument compatibility. The first Fixes tag refers to the latest commit that touched the code in question, while the second one points to the code that first introduced the global function call verification. v2: - Use resolve_prog_type() Fixes: 3363bd0cfbb8 ("bpf: Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support") Fixes: 51c39bb1d5d1 ("bpf: Introduce function-by-function verification") Reported-by: Simon Sundberg <simon.sundberg@kau.se> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20220606075253.28422-1-toke@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-07bpf: Use safer kvmalloc_array() where possibleDan Carpenter1-4/+4
The kvmalloc_array() function is safer because it has a check for integer overflows. These sizes come from the user and I was not able to see any bounds checking so an integer overflow seems like a realistic concern. Fixes: 0dcac2725406 ("bpf: Add multi kprobe link") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/Yo9VRVMeHbALyjUH@kili Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-07bpf, arm64: Clear prog->jited_len along prog->jitedEric Dumazet1-0/+1
syzbot reported an illegal copy_to_user() attempt from bpf_prog_get_info_by_fd() [1] There was no repro yet on this bug, but I think that commit 0aef499f3172 ("mm/usercopy: Detect vmalloc overruns") is exposing a prior bug in bpf arm64. bpf_prog_get_info_by_fd() looks at prog->jited_len to determine if the JIT image can be copied out to user space. My theory is that syzbot managed to get a prog where prog->jited_len has been set to 43, while prog->bpf_func has ben cleared. It is not clear why copy_to_user(uinsns, NULL, ulen) is triggering this particular warning. I thought find_vma_area(NULL) would not find a vm_struct. As we do not hold vmap_area_lock spinlock, it might be possible that the found vm_struct was garbage. [1] usercopy: Kernel memory exposure attempt detected from vmalloc (offset 792633534417210172, size 43)! kernel BUG at mm/usercopy.c:101! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 25002 Comm: syz-executor.1 Not tainted 5.18.0-syzkaller-10139-g8291eaafed36 #0 Hardware name: linux,dummy-virt (DT) pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : usercopy_abort+0x90/0x94 mm/usercopy.c:101 lr : usercopy_abort+0x90/0x94 mm/usercopy.c:89 sp : ffff80000b773a20 x29: ffff80000b773a30 x28: faff80000b745000 x27: ffff80000b773b48 x26: 0000000000000000 x25: 000000000000002b x24: 0000000000000000 x23: 00000000000000e0 x22: ffff80000b75db67 x21: 0000000000000001 x20: 000000000000002b x19: ffff80000b75db3c x18: 00000000fffffffd x17: 2820636f6c6c616d x16: 76206d6f72662064 x15: 6574636574656420 x14: 74706d6574746120 x13: 2129333420657a69 x12: 73202c3237313031 x11: 3237313434333533 x10: 3336323937207465 x9 : 657275736f707865 x8 : ffff80000a30c550 x7 : ffff80000b773830 x6 : ffff80000b773830 x5 : 0000000000000000 x4 : ffff00007fbbaa10 x3 : 0000000000000000 x2 : 0000000000000000 x1 : f7ff000028fc0000 x0 : 0000000000000064 Call trace: usercopy_abort+0x90/0x94 mm/usercopy.c:89 check_heap_object mm/usercopy.c:186 [inline] __check_object_size mm/usercopy.c:252 [inline] __check_object_size+0x198/0x36c mm/usercopy.c:214 check_object_size include/linux/thread_info.h:199 [inline] check_copy_size include/linux/thread_info.h:235 [inline] copy_to_user include/linux/uaccess.h:159 [inline] bpf_prog_get_info_by_fd.isra.0+0xf14/0xfdc kernel/bpf/syscall.c:3993 bpf_obj_get_info_by_fd+0x12c/0x510 kernel/bpf/syscall.c:4253 __sys_bpf+0x900/0x2150 kernel/bpf/syscall.c:4956 __do_sys_bpf kernel/bpf/syscall.c:5021 [inline] __se_sys_bpf kernel/bpf/syscall.c:5019 [inline] __arm64_sys_bpf+0x28/0x40 kernel/bpf/syscall.c:5019 __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline] invoke_syscall+0x48/0x114 arch/arm64/kernel/syscall.c:52 el0_svc_common.constprop.0+0x44/0xec arch/arm64/kernel/syscall.c:142 do_el0_svc+0xa0/0xc0 arch/arm64/kernel/syscall.c:206 el0_svc+0x44/0xb0 arch/arm64/kernel/entry-common.c:624 el0t_64_sync_handler+0x1ac/0x1b0 arch/arm64/kernel/entry-common.c:642 el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:581 Code: aa0003e3 d00038c0 91248000 97fff65f (d4210000) Fixes: db496944fdaa ("bpf: arm64: add JIT support for multi-function programs") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20220531215113.1100754-1-eric.dumazet@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-07KVM: SVM: fix tsc scaling cache logicMaxim Levitsky3-15/+23
SVM uses a per-cpu variable to cache the current value of the tsc scaling multiplier msr on each cpu. Commit 1ab9287add5e2 ("KVM: X86: Add vendor callbacks for writing the TSC multiplier") broke this caching logic. Refactor the code so that all TSC scaling multiplier writes go through a single function which checks and updates the cache. This fixes the following scenario: 1. A CPU runs a guest with some tsc scaling ratio. 2. New guest with different tsc scaling ratio starts on this CPU and terminates almost immediately. This ensures that the short running guest had set the tsc scaling ratio just once when it was set via KVM_SET_TSC_KHZ. Due to the bug, the per-cpu cache is not updated. 3. The original guest continues to run, it doesn't restore the msr value back to its own value, because the cache matches, and thus continues to run with a wrong tsc scaling ratio. Fixes: 1ab9287add5e2 ("KVM: X86: Add vendor callbacks for writing the TSC multiplier") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606181149.103072-1-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-07KVM: selftests: Make hyperv_clock selftest more stableVitaly Kuznetsov1-3/+7
hyperv_clock doesn't always give a stable test result, especially with AMD CPUs. The test compares Hyper-V MSR clocksource (acquired either with rdmsr() from within the guest or KVM_GET_MSRS from the host) against rdtsc(). To increase the accuracy, increase the measured delay (done with nop loop) by two orders of magnitude and take the mean rdtsc() value before and after rdmsr()/KVM_GET_MSRS. Reported-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Tested-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220601144322.1968742-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-07KVM: x86/MMU: Zap non-leaf SPTEs when disabling dirty loggingBen Gardon3-6/+42
Currently disabling dirty logging with the TDP MMU is extremely slow. On a 96 vCPU / 96G VM backed with gigabyte pages, it takes ~200 seconds to disable dirty logging with the TDP MMU, as opposed to ~4 seconds with the shadow MMU. When disabling dirty logging, zap non-leaf parent entries to allow replacement with huge pages instead of recursing and zapping all of the child, leaf entries. This reduces the number of TLB flushes required. and reduces the disable dirty log time with the TDP MMU to ~3 seconds. Opportunistically add a WARN() to catch GFNs that are mapped at a higher level than their max level. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20220525230904.1584480-1-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-07x86: drop bogus "cc" clobber from __try_cmpxchg_user_asm()Jan Beulich1-1/+1
As noted (and fixed) a couple of times in the past, "=@cc<cond>" outputs and clobbering of "cc" don't work well together. The compiler appears to mean to reject such, but doesn't - in its upstream form - quite manage to yet for "cc". Furthermore two similar macros don't clobber "cc", and clobbering "cc" is pointless in asm()-s for x86 anyway - the compiler always assumes status flags to be clobbered there. Fixes: 989b5db215a2 ("x86/uaccess: Implement macros for CMPXCHG on user addresses") Signed-off-by: Jan Beulich <jbeulich@suse.com> Message-Id: <485c0c0b-a3a7-0b7c-5264-7d00c01de032@suse.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-07KVM: x86/mmu: Check every prev_roots in __kvm_mmu_free_obsolete_roots()Shaoqin Huang1-1/+1
When freeing obsolete previous roots, check prev_roots as intended, not the current root. Signed-off-by: Shaoqin Huang <shaoqin.huang@intel.com> Fixes: 527d5cd7eece ("KVM: x86/mmu: Zap only obsolete roots if a root shadow page is zapped") Message-Id: <20220607005905.2933378-1-shaoqin.huang@intel.com> Cc: stable@vger.kernel.org Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-07entry/kvm: Exit to user mode when TIF_NOTIFY_SIGNAL is setSeth Forshee1-6/+0
A livepatch transition may stall indefinitely when a kvm vCPU is heavily loaded. To the host, the vCPU task is a user thread which is spending a very long time in the ioctl(KVM_RUN) syscall. During livepatch transition, set_notify_signal() will be called on such tasks to interrupt the syscall so that the task can be transitioned. This interrupts guest execution, but when xfer_to_guest_mode_work() sees that TIF_NOTIFY_SIGNAL is set but not TIF_SIGPENDING it concludes that an exit to user mode is unnecessary, and guest execution is resumed without transitioning the task for the livepatch. This handling of TIF_NOTIFY_SIGNAL is incorrect, as set_notify_signal() is expected to break tasks out of interruptible kernel loops and cause them to return to userspace. Change xfer_to_guest_mode_work() to handle TIF_NOTIFY_SIGNAL the same as TIF_SIGPENDING, signaling to the vCPU run loop that an exit to userpsace is needed. Any pending task_work will be run when get_signal() is called from exit_to_user_mode_loop(), so there is no longer any need to run task work from xfer_to_guest_mode_work(). Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Petr Mladek <pmladek@suse.com> Signed-off-by: Seth Forshee <sforshee@digitalocean.com> Message-Id: <20220504180840.2907296-1-sforshee@digitalocean.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-07KVM: Don't null dereference ops->destroyAlexey Kardashevskiy1-1/+4
A KVM device cleanup happens in either of two callbacks: 1) destroy() which is called when the VM is being destroyed; 2) release() which is called when a device fd is closed. Most KVM devices use 1) but Book3s's interrupt controller KVM devices (XICS, XIVE, XIVE-native) use 2) as they need to close and reopen during the machine execution. The error handling in kvm_ioctl_create_device() assumes destroy() is always defined which leads to NULL dereference as discovered by Syzkaller. This adds a checks for destroy!=NULL and adds a missing release(). This is not changing kvm_destroy_devices() as devices with defined release() should have been removed from the KVM devices list by then. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>