|
It's possible for an interrupt returning to an irqs-disabled context to
lose a pending soft-masked irq because it branches to part of the exit
code for irqs-enabled contexts, which is meant to clear only the
PACA_IRQS_HARD_DIS flag from PACAIRQHAPPENED by zeroing the byte. This
just looks like a simple thinko from a recent commit (if there was no
hard mask pending, there would be no reason to clear it anyway).
This also adds a comment to the code that actually does need to clear the
flag.
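As a rough illustration (the actual return path is in assembly; the
local_paca->irq_happened field and PACA_IRQ_HARD_DIS flag are assumed here,
and the C form is only a sketch):

  /* irqs-disabled exit: clear only the hard-disable flag, keeping any
   * pending soft-masked interrupt bits */
  local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;

  /* irqs-enabled exit: zeroing the whole byte is fine there, but taking
   * this path from a disabled context throws pending bits away */
  local_paca->irq_happened = 0;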
Fixes: e485f6c751e0a ("powerpc/64/interrupt: Fix return to masked context after hard-mask irq becomes pending")
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221013064418.1311104-1-npiggin@gmail.com
|
|
powerpc 32-bit system call (and function) calling convention for 64-bit
arguments requires the next available odd-pair (two sequential registers
with the first being odd-numbered) from the standard register argument
allocation.
The first argument register is r3, so a 64-bit argument that appears at
an even position in the argument list must skip a register (unless
preceding 64-bit arguments have already shifted the allocation). This
requires non-standard compat definitions to deal with the holes in the
argument register allocation.
With pt_regs syscall wrappers which use a standard mapper to map pt_regs
GPRs to function arguments, 32-bit kernels hit the same basic problem:
the standard definitions don't cope with the unused argument registers.
Fix this by having 32-bit kernels share those syscall definitions with
compat.
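Roughly, for a hypothetical 64-bit length in the second argument slot (the
wrapper name is made up and ksys_truncate() is used purely for illustration):

  long ppc32_sys_truncate64(const char __user *path, /* r3              */
                            u32 unused,              /* r4: the hole    */
                            u32 len_hi, u32 len_lo)  /* r5/r6 odd pair  */
  {
          return ksys_truncate(path, ((loff_t)len_hi << 32) | len_lo);
  }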
Thanks to Jason for spending a lot of time finding and bisecting this
and developing a trivial reproducer. The perfect bug report.
Reported-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Fixes: 7e92e01b72452 ("powerpc: Provide syscall wrapper")
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221012035335.866440-1-npiggin@gmail.com
|
|
The merge of the kbuild tree dropped the renaming of the FSL_BOOKE
kconfig option.
Fixes: 8afc66e8d43b ("Merge tag 'kbuild-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild")
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This ensures that no module record or entry is added to the
unloaded_tainted_modules list if the module does not carry a taint.
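A minimal sketch of the intended guard (assuming the try_add_tainted_module()
helper and the mod->taints bitmask; the exact placement is illustrative):

  static int try_add_tainted_module(struct module *mod)
  {
          /* nothing to track for modules without any taint */
          if (!mod->taints)
                  return 0;

          /* ... otherwise record/update the unloaded_tainted_modules entry ... */
          return 0;
  }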
Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
Fixes: 99bd9956551b ("module: Introduce module unload taint tracking")
Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The shash descriptor was not being initialized in one code path in
smb3_calc_signature() and smb2_calc_signature().
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Acked-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
The struct sdesc is just a wrapper around shash_desc, with the exact same
memory layout. Replace the hashing TFMs with shash_desc as it's what's
passed to the crypto API anyway.
Also remove the crypto_shash pointers as they can be accessed via
shash_desc->tfm (and are actually only used in the setkey calls).
Adapt cifs_{alloc,free}_hash functions to this change.
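A rough sketch of the reworked helper (signature assumed; the allocation
details are illustrative, not the exact patch):

  int cifs_alloc_hash(const char *name, struct shash_desc **sdesc)
  {
          struct crypto_shash *tfm;
          struct shash_desc *desc;

          tfm = crypto_alloc_shash(name, 0, 0);
          if (IS_ERR(tfm))
                  return PTR_ERR(tfm);

          desc = kzalloc(sizeof(*desc) + crypto_shash_descsize(tfm), GFP_KERNEL);
          if (!desc) {
                  crypto_free_shash(tfm);
                  return -ENOMEM;
          }

          desc->tfm = tfm;        /* the TFM stays reachable via shash_desc */
          *sdesc = desc;
          return 0;
  }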
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Detach the TFM name from a specific algorithm (AES-CCM) as
AES-GCM is also supported, making the name misleading.
s/ccmaesencrypt/enc/
s/ccmaesdecrypt/dec/
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Replace kfree with kfree_sensitive, or prepend memzero_explicit() in
other cases, when freeing sensitive material that could still be left
in memory.
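For instance (illustrative call sites, not the exact hunks):

  /* before: key material may linger in the freed buffer */
  kfree(ses->auth_key.response);

  /* after: wipe it before the allocator reuses the memory */
  kfree_sensitive(ses->auth_key.response);

  /* or, where the buffer is not freed at this point, wipe it explicitly */
  memzero_explicit(signature, sizeof(signature));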
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/r/202209201529.ec633796-oliver.sang@intel.com
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
The background is that we use DPUs in cloud computing; the arch is x86 with
80 cores. We will have a lot of virtio devices, like 512 or more.
When we probe about 200 virtio_blk devices, probing fails and
the following stack is printed:
[25338.485128] virtio-pci 0000:b3:00.0: virtio_pci: leaving for legacy driver
[25338.496174] genirq: Flags mismatch irq 0. 00000080 (virtio418) vs. 00015a00 (timer)
[25338.503822] CPU: 20 PID: 5431 Comm: kworker/20:0 Kdump: loaded Tainted: G OE --------- - - 4.18.0-305.30.1.el8.x86_64
[25338.516403] Hardware name: Inspur NF5280M5/YZMB-00882-10E, BIOS 4.1.21 08/25/2021
[25338.523881] Workqueue: events work_for_cpu_fn
[25338.528235] Call Trace:
[25338.530687] dump_stack+0x5c/0x80
[25338.534000] __setup_irq.cold.53+0x7c/0xd3
[25338.538098] request_threaded_irq+0xf5/0x160
[25338.542371] vp_find_vqs+0xc7/0x190
[25338.545866] init_vq+0x17c/0x2e0 [virtio_blk]
[25338.550223] ? ncpus_cmp_func+0x10/0x10
[25338.554061] virtblk_probe+0xe6/0x8a0 [virtio_blk]
[25338.558846] virtio_dev_probe+0x158/0x1f0
[25338.562861] really_probe+0x255/0x4a0
[25338.566524] ? __driver_attach_async_helper+0x90/0x90
[25338.571567] driver_probe_device+0x49/0xc0
[25338.575660] bus_for_each_drv+0x79/0xc0
[25338.579499] __device_attach+0xdc/0x160
[25338.583337] bus_probe_device+0x9d/0xb0
[25338.587167] device_add+0x418/0x780
[25338.590654] register_virtio_device+0x9e/0xe0
[25338.595011] virtio_pci_probe+0xb3/0x140
[25338.598941] local_pci_probe+0x41/0x90
[25338.602689] work_for_cpu_fn+0x16/0x20
[25338.606443] process_one_work+0x1a7/0x360
[25338.610456] ? create_worker+0x1a0/0x1a0
[25338.614381] worker_thread+0x1cf/0x390
[25338.618132] ? create_worker+0x1a0/0x1a0
[25338.622051] kthread+0x116/0x130
[25338.625283] ? kthread_flush_work_fn+0x10/0x10
[25338.629731] ret_from_fork+0x1f/0x40
[25338.633395] virtio_blk: probe of virtio418 failed with error -16
The log line:
"genirq: Flags mismatch irq 0. 00000080 (virtio418) vs. 00015a00 (timer)"
was printed because irq 0 is used exclusively by the timer, and when
vp_find_vqs() calls vp_find_vqs_msix() and it fails twice (for
whatever reason), it then calls vp_find_vqs_intx() as a fallback.
Because vp_dev->pci_dev->irq is zero, we request irq 0 with the
IRQF_SHARED flag and get a backtrace like the above.
According to PCI spec about "Interrupt Pin" Register (Offset 3Dh):
"The Interrupt Pin register is a read-only register that identifies the
legacy interrupt Message(s) the Function uses. Valid values are 01h, 02h,
03h, and 04h that map to legacy interrupt Messages for INTA,
INTB, INTC, and INTD respectively. A value of 00h indicates that the
Function uses no legacy interrupt Message(s)."
So if vp_dev->pci_dev->pin is zero, we should not request the legacy
interrupt.
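A minimal sketch of the check (the placement inside vp_find_vqs() and the
error code are illustrative):

  /* no legacy interrupt pin: INTx cannot work, so don't fall back to
   * requesting irq 0 with IRQF_SHARED */
  if (!vp_dev->pci_dev->pin)
          return -ENODEV;

  return vp_find_vqs_intx(vdev, nvqs, vqs, callbacks, names, ctx);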
Signed-off-by: Angus Chen <angus.chen@jaguarmicro.com>
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20220930000915.548-1-angus.chen@jaguarmicro.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
The spec says:
mtu only exists if VIRTIO_NET_F_MTU is set
The mac address field always exists (though it is only valid if
VIRTIO_NET_F_MAC is set).
So vdpa_dev_net_config_fill() should read MTU and MAC
conditionally on the feature bits.
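A sketch of the conditional fill (attribute names from the vDPA netlink
UAPI; the surrounding helper context is assumed):

  u16 val_u16;

  if (features & BIT_ULL(VIRTIO_NET_F_MTU)) {
          val_u16 = __virtio16_to_cpu(true, config->mtu);
          if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, val_u16))
                  return -EMSGSIZE;
  }

  if ((features & BIT_ULL(VIRTIO_NET_F_MAC)) &&
      nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR,
              sizeof(config->mac), config->mac))
          return -EMSGSIZE;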
Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220929014555.112323-7-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
This commit fixes sparse warnings ("cast to restricted __le16")
in the function vdpa_dev_net_mq_config_fill().
Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220929014555.112323-6-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
vdpa_dev_net_mq_config_fill() should check the device features
for MQ rather than the driver features.
Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220929014555.112323-5-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
virtio 1.2 spec says:
max_virtqueue_pairs only exists if VIRTIO_NET_F_MQ or
VIRTIO_NET_F_RSS is set.
So when reporting MQ to userspace, it should check both
VIRTIO_NET_F_MQ and VIRTIO_NET_F_RSS.
The unused parameter struct vdpa_device *vdev is removed.
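A sketch of the check described above (the surrounding helper context is
assumed):

  u16 val_u16;

  if (!(features & (BIT_ULL(VIRTIO_NET_F_MQ) |
                    BIT_ULL(VIRTIO_NET_F_RSS))))
          return 0;

  val_u16 = __virtio16_to_cpu(true, config->max_virtqueue_pairs);
  return nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP, val_u16);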
Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220929014555.112323-4-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
This commit reports driver features to user space
only after FEATURES_OK is set, i.e. after feature negotiation is done.
Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20220929014555.112323-3-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
This commit adds a new vDPA netlink attribute
VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES. Userspace can query
features of vDPA devices through this new attr.
This commit invokes vdpa_config_ops.get_config()
rather than vdpa_get_config_unlocked() to read
the device config space, so there are no races with
vdpa_set_features_unlocked().
Userspace tool iproute2 example:
$ vdpa dev config show vdpa0
vdpa0: mac 00:e8:ca:11:be:05 link up link_announce false max_vq_pairs 4 mtu 1500
negotiated_features MRG_RXBUF CTRL_VQ MQ VERSION_1 ACCESS_PLATFORM
dev_features MTU MAC MRG_RXBUF CTRL_VQ MQ ANY_LAYOUT VERSION_1 ACCESS_PLATFORM
Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20220929014555.112323-2-lingshan.zhu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
The hugetlb vma lock was originally designed to synchronize pmd sharing.
As such, it was only necessary to allocate the lock for vmas that were
capable of pmd sharing. Later in the development cycle, it was discovered
that it could also be used to simplify fault/truncation races as described
in [1]. However, a subsequent change to allocate the lock for all vmas
that use the page cache was never made. A fault/truncation race could
leave pages in a file past i_size until the file is removed.
Remove the previous restriction and allocate lock for all VM_MAYSHARE
vmas. Warn in the unlikely event of allocation failure.
[1] https://lore.kernel.org/lkml/Yxiv0SkMkZ0JWGGp@monkey/#t
Link: https://lkml.kernel.org/r/20221005011707.514612-4-mike.kravetz@oracle.com
Fixes: "hugetlb: clean up code checking for fault/truncation races"
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: James Houghton <jthoughton@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
hugetlb file truncation/hole punch code may need to back out and take
locks in order in the routine hugetlb_unmap_file_folio(). This code could
race with vma freeing as pointed out in [1] and result in accessing a
stale vma pointer. To address this, take the vma_lock when clearing the
vma_lock->vma pointer.
[1] https://lore.kernel.org/linux-mm/01f10195-7088-4462-6def-909549c75ef4@huawei.com/
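The idea, as a minimal sketch (assuming the rw_sema and vma fields of the
hugetlb vma_lock structure used by this series):

  /* clear the back-pointer only while holding the lock, so a racing
   * hugetlb_unmap_file_folio() cannot see a stale vma */
  down_write(&vma_lock->rw_sema);
  vma_lock->vma = NULL;
  up_write(&vma_lock->rw_sema);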
[mike.kravetz@oracle.com: address build issues]
Link: https://lkml.kernel.org/r/Yz5L1uxQYR1VqFtJ@monkey
Link: https://lkml.kernel.org/r/20221005011707.514612-3-mike.kravetz@oracle.com
Fixes: "hugetlb: use new vma_lock for pmd sharing synchronization"
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: James Houghton <jthoughton@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "hugetlb: fixes for new vma lock series".
In review of the series "hugetlb: Use new vma lock for huge pmd sharing
synchronization", Miaohe Lin pointed out two key issues:
1) There is a race in the routine hugetlb_unmap_file_folio when locks
are dropped and reacquired in the correct order [1].
2) With the switch to using vma lock for fault/truncate synchronization,
we need to make sure lock exists for all VM_MAYSHARE vmas, not just
vmas capable of pmd sharing.
These two issues are addressed here. In addition, having a vma lock
present in all VM_MAYSHARE vmas uncovered some issues around vma
splitting. Those are also addressed.
[1] https://lore.kernel.org/linux-mm/01f10195-7088-4462-6def-909549c75ef4@huawei.com/
This patch (of 3):
The hugetlb vma lock hangs off the vm_private_data field and is specific
to the vma. When vm_area_dup() is called as part of vma splitting, the
vma lock pointer is copied to the new vma. This will result in issues
such as double freeing of the structure. Update the hugetlb open vm_ops
to allocate a new vma lock for the new vma.
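Roughly (a sketch; hugetlb_vma_lock_alloc() is the allocation helper added
earlier in the series, and the rest of the open handler is elided):

  static void hugetlb_vm_op_open(struct vm_area_struct *vma)
  {
          /* ... existing reservation map handling ... */

          /*
           * vm_area_dup() copied the parent's lock pointer; drop it and
           * give the new vma a lock of its own.
           */
          if (vma->vm_flags & VM_MAYSHARE) {
                  vma->vm_private_data = NULL;
                  hugetlb_vma_lock_alloc(vma);
          }
  }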
The routine __unmap_hugepage_range_final unconditionally unsets VM_MAYSHARE
to prevent subsequent pmd sharing. hugetlb_vma_lock_free attempted to
anticipate this by checking both VM_MAYSHARE and VM_SHARED. However, if
only VM_MAYSHARE was set we would miss the free. With the introduction of
the vma lock, a vma cannot participate in pmd sharing if vm_private_data
is NULL. Instead of clearing VM_MAYSHARE in __unmap_hugepage_range_final,
free the vma lock to prevent sharing. Also, update the sharing code to
make sure vma lock is indeed a condition for pmd sharing.
hugetlb_vma_lock_free can then key off VM_MAYSHARE and not miss any vmas.
Link: https://lkml.kernel.org/r/20221005011707.514612-1-mike.kravetz@oracle.com
Link: https://lkml.kernel.org/r/20221005011707.514612-2-mike.kravetz@oracle.com
Fixes: "hugetlb: add vma based lock for pmd sharing"
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: James Houghton <jthoughton@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Link: https://lkml.kernel.org/r/YzSWfFI+MOeb1ils@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
wakeup_flusher_threads() was added under the assumption that if a system
runs out of clean cold pages, it might want to write back dirty pages more
aggressively so that they can become clean and be dropped.
However, doing so can breach the rate limit a system wants to impose on
writeback, resulting in early SSD wearout.
Link: https://lkml.kernel.org/r/YzSiWq9UEER5LKup@google.com
Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reported-by: Axel Rasmussen <axelrasmussen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
inode_lock[ATOMIC_FILE] was used to protect sbi->atomic_files. Change the
atomic_files variable's type from unsigned int to atomic_t; then
inode_lock[ATOMIC_FILE] can be obsoleted.
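For example (illustrative call sites only; the exact f2fs locking may differ):

  /* before: counter updates took sbi->inode_lock[ATOMIC_FILE] */
  spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
  sbi->atomic_files++;
  spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);

  /* after: the counter is updated and read atomically */
  atomic_inc(&sbi->atomic_files);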
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In order to check the count of opened swapfile inodes.
Signed-off-by: Chao Yu <chao.yu@oppo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Add support for the VIRTIO_BLK_F_SECURE_ERASE VirtIO feature.
A device that offers this feature can receive VIRTIO_BLK_T_SECURE_ERASE
commands.
A device which supports this feature has the following fields in the
virtio config:
- max_secure_erase_sectors
- max_secure_erase_seg
- secure_erase_sector_alignment
max_secure_erase_sectors and secure_erase_sector_alignment are expressed
in 512-byte units.
Every secure erase command has the following fields:
- sectors: The starting offset in 512-byte units.
- num_sectors: The number of sectors.
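As a sketch of the layouts described above (field widths are assumed for
illustration; the per-segment payload reuses the existing
discard/write-zeroes format):

  /* config space additions, guarded by VIRTIO_BLK_F_SECURE_ERASE */
  __virtio32 max_secure_erase_sectors;      /* 512-byte units */
  __virtio32 max_secure_erase_seg;          /* max segments per request */
  __virtio32 secure_erase_sector_alignment; /* 512-byte units */

  /* per-segment payload of a VIRTIO_BLK_T_SECURE_ERASE request */
  struct virtio_blk_discard_write_zeroes {
          __le64 sector;          /* starting offset, 512-byte units */
          __le32 num_sectors;     /* number of sectors to erase */
          __le32 flags;
  };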
Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com>
Message-Id: <20220921082729.2516779-1-alvaro.karsz@solid-run.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
|
|
This patch allows the device features to be provisioned via
netlink. This is done by:
1) validating the provisioned features to be a subset of the parent
features.
2) clearing the features that are not wanted by the userspace
For example:
# vdpa mgmtdev show
pci/0000:02:00.0:
supported_classes net
max_supported_vqs 3
dev_features CSUM GUEST_CSUM CTRL_GUEST_OFFLOADS MAC GUEST_TSO4
GUEST_TSO6 GUEST_ECN GUEST_UFO HOST_TSO4 HOST_TSO6 HOST_ECN HOST_UFO
MRG_RXBUF STATUS CTRL_VQ CTRL_RX CTRL_VLAN CTRL_RX_EXTRA
GUEST_ANNOUNCE CTRL_MAC_ADDR RING_INDIRECT_DESC RING_EVENT_IDX
VERSION_1 ACCESS_PLATFORM
1) provision vDPA device with all features that are supported by the virtio-pci
# vdpa dev add name dev1 mgmtdev pci/0000:02:00.0
# vdpa dev config show
dev1: mac 52:54:00:12:34:56 link up link_announce false mtu 65535
negotiated_features CSUM GUEST_CSUM CTRL_GUEST_OFFLOADS MAC
GUEST_TSO4 GUEST_TSO6 GUEST_ECN GUEST_UFO HOST_TSO4 HOST_TSO6
HOST_ECN HOST_UFO MRG_RXBUF STATUS CTRL_VQ CTRL_RX CTRL_VLAN
GUEST_ANNOUNCE CTRL_MAC_ADDR RING_INDIRECT_DESC RING_EVENT_IDX
VERSION_1 ACCESS_PLATFORM
2) provision vDPA device with a subset of the features
# vdpa dev add name dev1 mgmtdev pci/0000:02:00.0 device_features 0x300020000
# dev1: mac 52:54:00:12:34:56 link up link_announce false mtu 65535
negotiated_features CTRL_VQ VERSION_1 ACCESS_PLATFORM
Reviewed-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220927074810.28627-4-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
This patch implements features provisioning for vdpa_sim_net.
1) validating the provisioned features to be a subset of the parent
features.
2) clearing the features that are not wanted by the userspace
For example:
vdpasim_net:
supported_classes net
max_supported_vqs 3
dev_features MTU MAC CTRL_VQ CTRL_MAC_ADDR ANY_LAYOUT VERSION_1 ACCESS_PLATFORM
1) provision vDPA device with all features that are supported by the
net simulator
dev1: mac 00:00:00:00:00:00 link up link_announce false mtu 1500
negotiated_features MTU MAC CTRL_VQ CTRL_MAC_ADDR VERSION_1 ACCESS_PLATFORM
2) provision vDPA device with a subset of the features
dev1: mac 00:00:00:00:00:00 link up link_announce false mtu 1500
negotiated_features CTRL_VQ VERSION_1 ACCESS_PLATFORM
Reviewed-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220927074810.28627-3-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
|
|
This patch allows the device features to be provisioned through
netlink. A new attribute is introduced to allow the userspace to pass
64-bit device features when adding a device.
This provides several advantages:
- Allow to provision a subset of the features to ease the cross vendor
live migration.
- Better debug-ability for vDPA framework and parent.
Reviewed-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220927074810.28627-2-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
packets even when GUEST_* offloads are not present on the device.
However, if guest GSO is not supported, it would be sufficient to
allocate segments to cover just up to the MTU size and no further.
Allocating the maximum amount of segments results in a large waste of
buffer space in the queue, which limits the number of packets that can
be buffered and can result in reduced performance.
Therefore, if guest GSO is not supported, use the MTU to calculate the
optimal amount of segments required.
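A sketch of the sizing logic (field and helper names are illustrative, not
the exact patch; virtnet_check_guest_gso() is the helper factored out in the
next entry below):

  static void virtnet_set_big_packets(struct virtnet_info *vi, const int mtu)
  {
          bool guest_gso = virtnet_check_guest_gso(vi);

          /* without guest GSO, a big buffer only ever has to hold one
           * MTU-sized packet rather than MAX_SKB_FRAGS pages */
          vi->big_packets = guest_gso || mtu > ETH_DATA_LEN;
          vi->big_packets_num_skbfrags = guest_gso ?
                  MAX_SKB_FRAGS : DIV_ROUND_UP(mtu, PAGE_SIZE);
  }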
Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
1 VQ, queue size 1024, before and after the change, with the iperf
server running over the virtio-net interface.
MTU(Bytes)/Bandwidth (Gbit/s)
Before After
1500 22.5 22.4
9000 12.8 25.9
And the result with a queue size of 256:
MTU(Bytes)/Bandwidth (Gbit/s)
Before After
9000 2.15 11.9
With this patch, no degradation is observed with the multiple tests below
and feature bit combinations. Results are summarized below for a queue depth of
1024. Interface MTU is 1500 if MTU feature is disabled. MTU is set to 9000
in other tests.
Features/ Bandwidth (Gbit/s)
Before After
mtu off 20.1 20.2
mtu/indirect on 17.4 17.3
mtu/indirect/packed on 17.2 17.2
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Reviewed-by: Gavi Teitz <gavi@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Message-Id: <20220914144911.56422-3-gavinl@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
|
|
The probe routine is already several hundred lines long.
Use a helper function for the guest GSO support check.
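Something along these lines (a sketch; the exact set of GUEST_* features
checked is assumed):

  static bool virtnet_check_guest_gso(const struct virtnet_info *vi)
  {
          return virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
                 virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
                 virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN)  ||
                 virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO);
  }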
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Reviewed-by: Gavi Teitz <gavi@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Message-Id: <20220914144911.56422-2-gavinl@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
There's actually no way to set queue size on legacy virtio pci.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20220815220447.155860-1-mst@redhat.com>
|
|
Add some spaces to vring_alloc_queue() to make it look prettier.
Signed-off-by: Deming Wang <wangdeming@inspur.com>
Message-Id: <20220926183306.4535-1-wangdeming@inspur.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
The operators in vring_alloc_queue_split() should use a unified style.
Add spaces around the '|' to make it look prettier.
Signed-off-by: Deming Wang <wangdeming@inspur.com>
Message-Id: <20220926022202.1516-1-wangdeming@inspur.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Add missing __init/__exit annotations to module init/exit funcs.
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Message-Id: <20220917083803.21521-1-xiujianfeng@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
parse_opts() will set invalid opts.rfd/wfd in case of failure, which we
already check for, but it is not clear to readers that parse_opts()
errors are handled in p9_fd_create(): clarify this by explicitly checking
the return value.
Link: https://lkml.kernel.org/r/20220921210921.1654735-1-floridsleeves@gmail.com
Signed-off-by: Li Zhong <floridsleeves@gmail.com>
[Dominique: reworded commit message to clarify this is NOOP]
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
|
|
xen transport was missing annotations
Link: https://lkml.kernel.org/r/20220909103546.73015-1-xiujianfeng@huawei.com
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
|
|
Shamelessly copying the explanation from Tetsuo Handa's suggested
patch[1] (slightly reworded):
syzbot is reporting inconsistent lock state in p9_req_put()[2],
for p9_tag_remove() from p9_req_put() from IRQ context is using
spin_lock_irqsave() on "struct p9_client"->lock but trans_fd
(not from IRQ context) is using spin_lock().
Since the locks actually protect different things in client.c and in
trans_fd.c, just replace trans_fd.c's lock by a new one specific to the
transport (client.c's protect the idr for fid/tag allocations,
while trans_fd.c's protects its own req list and request status field
that acts as the transport's state machine)
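One possible shape of the split (the lock lives in trans_fd's per-connection
state; the field name here is an assumption):

  struct p9_conn {
          /* ... */
          struct list_head req_list;
          struct list_head unsent_req_list;
          /* protects the two lists above and the per-request status that
           * acts as the transport's state machine */
          spinlock_t req_lock;
          /* ... */
  };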
Link: https://lore.kernel.org/r/20220904112928.1308799-1-asmadeus@codewreck.org
Link: https://lkml.kernel.org/r/2470e028-9b05-2013-7198-1fdad071d999@I-love.SAKURA.ne.jp [1]
Link: https://syzkaller.appspot.com/bug?extid=2f20b523930c32c160cc [2]
Reported-by: syzbot <syzbot+2f20b523930c32c160cc@syzkaller.appspotmail.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
|
|
The hard-coded marker is out of date now, fix it using the nice define.
Fixes: 17773afdcd15 ("powerpc/64: use 32-bit immediate for STACK_FRAME_REGS_MARKER")
Reported-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221006143345.129077-1-npiggin@gmail.com
|
|
I have a Sipeed Lichee RV dock board which only has 512MB DDR, so
memory optimizations such as swap on zram are helpful. As is seen
in commit d0637c505f8a ("arm64: enable THP_SWAP for arm64") and
commit bd4c82c22c367e ("mm, THP, swap: delay splitting THP after
swapped out"), THP_SWAP can improve the swap throughput significantly.
Enable THP_SWAP for RV64; testing with the micro-benchmark introduced by
commit d0637c505f8a ("arm64: enable THP_SWAP for arm64")
shows the numbers below on the Lichee RV dock board:
swp out bandwidth w/o patch: 66908 bytes/ms (mean of 10 tests)
swp out bandwidth w/ patch: 322638 bytes/ms (mean of 10 tests)
Improved by 382%!
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20220829145742.3139-1-jszhang@kernel.org/
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
This got out of order during a merge conflict, fix it by putting the
entries in the correct order.
Fixes: 7ab52f75a9cf ("RISC-V: Add Sstc extension support")
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220920204518.10988-1-palmer@rivosinc.com/
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
This reverts commit e4dc45b1848bc6bcac31eb1b4ccdd7f6718b3c86.
This is causing instability on Linus' desktop, and I'm seeing
oopses with VK CTS runs.
netconsole got me the following oops:
[ 1234.778760] BUG: kernel NULL pointer dereference, address: 0000000000000088
[ 1234.778782] #PF: supervisor read access in kernel mode
[ 1234.778787] #PF: error_code(0x0000) - not-present page
[ 1234.778791] PGD 0 P4D 0
[ 1234.778798] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 1234.778803] CPU: 7 PID: 805 Comm: systemd-journal Not tainted 6.0.0+ #2
[ 1234.778809] Hardware name: System manufacturer System Product
Name/PRIME X370-PRO, BIOS 5603 07/28/2020
[ 1234.778813] RIP: 0010:drm_sched_job_done.isra.0+0xc/0x140 [gpu_sched]
[ 1234.778828] Code: aa 0f 1d ce e9 57 ff ff ff 48 89 d7 e8 9d 8f 3f
ce e9 4a ff ff ff 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 55 53
48 89 fb <48> 8b af 88 00 00 00 f0 ff 8d f0 00 00 00 48 8b 85 80 01 00
00 f0
[ 1234.778834] RSP: 0000:ffffabe680380de0 EFLAGS: 00010087
[ 1234.778839] RAX: ffffffffc04e9230 RBX: 0000000000000000 RCX: 0000000000000018
[ 1234.778897] RDX: 00000ba278e8977a RSI: ffff953fb288b460 RDI: 0000000000000000
[ 1234.778901] RBP: ffff953fb288b598 R08: 00000000000000e0 R09: ffff953fbd98b808
[ 1234.778905] R10: 0000000000000000 R11: ffffabe680380ff8 R12: ffffabe680380e00
[ 1234.778908] R13: 0000000000000001 R14: 00000000ffffffff R15: ffff953fbd9ec458
[ 1234.778912] FS: 00007f35e7008580(0000) GS:ffff95428ebc0000(0000)
knlGS:0000000000000000
[ 1234.778916] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1234.778919] CR2: 0000000000000088 CR3: 000000010147c000 CR4: 00000000003506e0
[ 1234.778924] Call Trace:
[ 1234.778981] <IRQ>
[ 1234.778989] dma_fence_signal_timestamp_locked+0x6a/0xe0
[ 1234.778999] dma_fence_signal+0x2c/0x50
[ 1234.779005] amdgpu_fence_process+0xc8/0x140 [amdgpu]
[ 1234.779234] sdma_v3_0_process_trap_irq+0x70/0x80 [amdgpu]
[ 1234.779395] amdgpu_irq_dispatch+0xa9/0x1d0 [amdgpu]
[ 1234.779609] amdgpu_ih_process+0x80/0x100 [amdgpu]
[ 1234.779783] amdgpu_irq_handler+0x1f/0x60 [amdgpu]
[ 1234.779940] __handle_irq_event_percpu+0x46/0x190
[ 1234.779946] handle_irq_event+0x34/0x70
[ 1234.779949] handle_edge_irq+0x9f/0x240
[ 1234.779954] __common_interrupt+0x66/0x100
[ 1234.779960] common_interrupt+0xa0/0xc0
[ 1234.779965] </IRQ>
[ 1234.779968] <TASK>
[ 1234.779971] asm_common_interrupt+0x22/0x40
[ 1234.779976] RIP: 0010:finish_mkwrite_fault+0x22/0x110
[ 1234.779981] Code: 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 41 55 41
54 55 48 89 fd 53 48 8b 07 f6 40 50 08 0f 84 eb 00 00 00 48 8b 45 30
48 8b 18 <48> 89 df e8 66 bd ff ff 48 85 c0 74 0d 48 89 c2 83 e2 01 48
83 ea
[ 1234.779985] RSP: 0000:ffffabe680bcfd78 EFLAGS: 00000202
Revert it for now and figure it out later.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
syzbot is reporting hung task at p9_fd_close() [1], for p9_mux_poll_stop()
from p9_conn_destroy() from p9_fd_close() is failing to interrupt already
started kernel_read() from p9_fd_read() from p9_read_work() and/or
kernel_write() from p9_fd_write() from p9_write_work() requests.
Since p9_socket_open() sets O_NONBLOCK flag, p9_mux_poll_stop() does not
need to interrupt kernel_read()/kernel_write(). However, since p9_fd_open()
does not set O_NONBLOCK flag, but pipe blocks unless signal is pending,
p9_mux_poll_stop() needs to interrupt kernel_read()/kernel_write() when
the file descriptor refers to a pipe. In other words, a pipe file descriptor
needs to be handled as if it were a socket file descriptor.
We somehow need to interrupt kernel_read()/kernel_write() on pipes.
A minimal change, which this patch is doing, is to set O_NONBLOCK flag
from p9_fd_open(), for O_NONBLOCK flag does not affect reading/writing
of regular files. But this approach changes O_NONBLOCK flag on userspace-
supplied file descriptors (which might break userspace programs), and
O_NONBLOCK flag could be changed by userspace. It would be possible to set
the O_NONBLOCK flag every time p9_fd_read()/p9_fd_write() is invoked, but a
small race window for clearing the O_NONBLOCK flag would still remain.
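The minimal change amounts to something like this in p9_fd_open() (a sketch;
ts->rd/ts->wr are the transport's read/write struct file pointers):

  /* make reads/writes on a pipe return -EAGAIN instead of blocking
   * forever; regular files are unaffected by O_NONBLOCK */
  ts->rd->f_flags |= O_NONBLOCK;
  ts->wr->f_flags |= O_NONBLOCK;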
If we don't want to manipulate O_NONBLOCK flag, we might be able to
surround kernel_read()/kernel_write() with set_thread_flag(TIF_SIGPENDING)
and recalc_sigpending(). Since p9_read_work()/p9_write_work() works are
processed by kernel threads which process global system_wq workqueue,
signals could not be delivered from remote threads when p9_mux_poll_stop()
from p9_conn_destroy() from p9_fd_close() is called. Therefore, calling
set_thread_flag(TIF_SIGPENDING)/recalc_sigpending() every time would be
needed if we count on signals for making kernel_read()/kernel_write()
non-blocking.
Link: https://lkml.kernel.org/r/345de429-a88b-7097-d177-adecf9fed342@I-love.SAKURA.ne.jp
Link: https://syzkaller.appspot.com/bug?extid=8b41a1365f1106fd0f33 [1]
Reported-by: syzbot <syzbot+8b41a1365f1106fd0f33@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: syzbot <syzbot+8b41a1365f1106fd0f33@syzkaller.appspotmail.com>
Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com>
[Dominique: add comment at Christian's suggestion]
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
|
|
The commit that added the new get_random_{u8,u16}() functions neglected
to update the code that clears the batches when bringing up a new CPU.
It also forgot a few comments and helper defines, so add those in too.
Fixes: 585cd5fe9f73 ("random: add 8-bit and 16-bit batches")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
The function hooks (ftrace) subsystem is completely different from the
general tracing subsystem. It manages how to attach callbacks to most functions in
the kernel. It is also used by live kernel patching. It really is not part
of tracing, although tracing uses it.
Create a separate entry for FUNCTION HOOKS (FTRACE) to be separate from
tracing itself in the MAINTAINERS file.
Perhaps it should be moved out of the kernel/trace directory, but that's
for another time.
Link: https://lkml.kernel.org/r/20221006144439.459272364@goodmis.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The tracing git repo will no longer be housed in my personal git repo,
but instead live in trace/linux-trace.git.
Update the MAINTAINERS file appropriately.
Link: https://lkml.kernel.org/r/20221006144439.282193367@goodmis.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
When using syscall wrappers the __SYSCALL_DEFINEx() and related macros
add a "__powerpc_" prefix to all syscall entry points.
So for example sys_mmap becomes __powerpc_sys_mmap.
This risks breaking workflows and tools that expect the old naming
scheme. At a minimum, setting a breakpoint on e.g. sys_mmap with gdb no
longer works.
There seems to be no compelling reason to add the "__powerpc_" prefix,
other than that it follows what some other arches do (x86, arm64, s390).
But unlike other arches powerpc doesn't always enable syscall wrappers,
so the syscall entry points can change name depending on CONFIG options.
For those reasons drop the "__powerpc_" prefix, reverting to the
existing naming.
Doing so reveals two prototypes in signal.h that have the incorrect type
when syscall wrappers are enabled. There are already prototypes for both
functions in syscalls.h, so drop the ones from signal.h.
Fixes: 7e92e01b7245 ("powerpc: Provide syscall wrapper")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221006135940.1223988-1-mpe@ellerman.id.au
|
|
This removes the second use of the sched_core_mask temporary mask.
Suggested-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
|
|
Following the recent introduction of for_each_andnot(), add some tests to
ensure for_each_cpu_and(not) yields the same result as iterating over the
result of cpumask_and(not)().
Suggested-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
|
|
for_each_cpu_and() is very convenient as it saves having to allocate a
temporary cpumask to store the result of cpumask_and(). The same issue
applies to cpumask_andnot() which doesn't actually need temporary storage
for iteration purposes.
Following what has been done for for_each_cpu_and(), introduce
for_each_cpu_andnot().
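Usage then looks like this (sketch; do_something() is a placeholder):

  unsigned int cpu;

  /* visit every CPU set in 'mask' but not in 'exclude', with no
   * temporary cpumask needed for the intermediate result */
  for_each_cpu_andnot(cpu, mask, exclude)
          do_something(cpu);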
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
|
|
In preparation of introducing for_each_cpu_andnot(), add a variant of
find_next_bit() that negates the bits in @addr2 when ANDing them with the
bits in @addr1.
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
|
|
Since commit ae626eb97376 ("ARM/dma-mapping: use dma-direct
unconditionally") only the dma_coherent flag in struct device is used,
so remove the now write-only flag in struct dev_archdata.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
Commit ae626eb97376 ("ARM/dma-mapping: use dma-direct unconditionally")
caused a regression on the mvebu platform, wherein devices that are
dma-coherent are marked as dma-noncoherent, because although
mvebu_hwcc_notifier() after that commit still marks then as coherent,
the arm_coherent_dma_ops() function, which is called later, overwrites
this setting, since it is being called from drivers/of/device.c with
coherency parameter determined by of_dma_is_coherent(), and the
device-trees do not declare the 'dma-coherent' property.
Fix this by defaulting to not clearing the dma_coherent flag in
arm_coherent_dma_ops().
Fixes: ae626eb97376 ("ARM/dma-mapping: use dma-direct unconditionally")
Reported-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Marek Behún <kabel@kernel.org>
|