aboutsummaryrefslogtreecommitdiffstats
path: root/include/uapi/linux (follow)
AgeCommit message (Collapse)AuthorFilesLines
2021-11-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-3/+26
Pull KVM updates from Paolo Bonzini: "ARM: - More progress on the protected VM front, now with the full fixed feature set as well as the limitation of some hypercalls after initialisation. - Cleanup of the RAZ/WI sysreg handling, which was pointlessly complicated - Fixes for the vgic placement in the IPA space, together with a bunch of selftests - More memcg accounting of the memory allocated on behalf of a guest - Timer and vgic selftests - Workarounds for the Apple M1 broken vgic implementation - KConfig cleanups - New kvmarm.mode=none option, for those who really dislike us RISC-V: - New KVM port. x86: - New API to control TSC offset from userspace - TSC scaling for nested hypervisors on SVM - Switch masterclock protection from raw_spin_lock to seqcount - Clean up function prototypes in the page fault code and avoid repeated memslot lookups - Convey the exit reason to userspace on emulation failure - Configure time between NX page recovery iterations - Expose Predictive Store Forwarding Disable CPUID leaf - Allocate page tracking data structures lazily (if the i915 KVM-GT functionality is not compiled in) - Cleanups, fixes and optimizations for the shadow MMU code s390: - SIGP Fixes - initial preparations for lazy destroy of secure VMs - storage key improvements/fixes - Log the guest CPNC Starting from this release, KVM-PPC patches will come from Michael Ellerman's PPC tree" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits) RISC-V: KVM: fix boolreturn.cocci warnings RISC-V: KVM: remove unneeded semicolon RISC-V: KVM: Fix GPA passed to __kvm_riscv_hfence_gvma_xyz() functions RISC-V: KVM: Factor-out FP virtualization into separate sources KVM: s390: add debug statement for diag 318 CPNC data KVM: s390: pv: properly handle page flags for protected guests KVM: s390: Fix handle_sske page fault handling KVM: x86: SGX must obey the KVM_INTERNAL_ERROR_EMULATION protocol KVM: x86: On emulation failure, convey the exit reason, etc. to userspace KVM: x86: Get exit_reason as part of kvm_x86_ops.get_exit_info KVM: x86: Clarify the kvm_run.emulation_failure structure layout KVM: s390: Add a routine for setting userspace CPU state KVM: s390: Simplify SIGP Set Arch handling KVM: s390: pv: avoid stalls when making pages secure KVM: s390: pv: avoid stalls for kvm_s390_pv_init_vm KVM: s390: pv: avoid double free of sida page KVM: s390: pv: add macros for UVC CC values s390/mm: optimize reset_guest_reference_bit() s390/mm: optimize set_guest_storage_key() s390/mm: no need for pte_alloc_map_lock() if we know the pmd is present ...
2021-11-02Merge tag 'net-next-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds24-41/+594
Pull networking updates from Jakub Kicinski: "Core: - Remove socket skb caches - Add a SO_RESERVE_MEM socket op to forward allocate buffer space and avoid memory accounting overhead on each message sent - Introduce managed neighbor entries - added by control plane and resolved by the kernel for use in acceleration paths (BPF / XDP right now, HW offload users will benefit as well) - Make neighbor eviction on link down controllable by userspace to work around WiFi networks with bad roaming implementations - vrf: Rework interaction with netfilter/conntrack - fq_codel: implement L4S style ce_threshold_ect1 marking - sch: Eliminate unnecessary RCU waits in mini_qdisc_pair_swap() BPF: - Add support for new btf kind BTF_KIND_TAG, arbitrary type tagging as implemented in LLVM14 - Introduce bpf_get_branch_snapshot() to capture Last Branch Records - Implement variadic trace_printk helper - Add a new Bloomfilter map type - Track <8-byte scalar spill and refill - Access hw timestamp through BPF's __sk_buff - Disallow unprivileged BPF by default - Document BPF licensing Netfilter: - Introduce egress hook for looking at raw outgoing packets - Allow matching on and modifying inner headers / payload data - Add NFT_META_IFTYPE to match on the interface type either from ingress or egress Protocols: - Multi-Path TCP: - increase default max additional subflows to 2 - rework forward memory allocation - add getsockopts: MPTCP_INFO, MPTCP_TCPINFO, MPTCP_SUBFLOW_ADDRS - MCTP flow support allowing lower layer drivers to configure msg muxing as needed - Automatic Multicast Tunneling (AMT) driver based on RFC7450 - HSR support the redbox supervision frames (IEC-62439-3:2018) - Support for the ip6ip6 encapsulation of IOAM - Netlink interface for CAN-FD's Transmitter Delay Compensation - Support SMC-Rv2 eliminating the current same-subnet restriction, by exploiting the UDP encapsulation feature of RoCE adapters - TLS: add SM4 GCM/CCM crypto support - Bluetooth: initial support for link quality and audio/codec offload Driver APIs: - Add a batched interface for RX buffer allocation in AF_XDP buffer pool - ethtool: Add ability to control transceiver modules' power mode - phy: Introduce supported interfaces bitmap to express MAC capabilities and simplify PHY code - Drop rtnl_lock from DSA .port_fdb_{add,del} callbacks New drivers: - WiFi driver for Realtek 8852AE 802.11ax devices (rtw89) - Ethernet driver for ASIX AX88796C SPI device (x88796c) Drivers: - Broadcom PHYs - support 72165, 7712 16nm PHYs - support IDDQ-SR for additional power savings - PHY support for QCA8081, QCA9561 PHYs - NXP DPAA2: support for IRQ coalescing - NXP Ethernet (enetc): support for software TCP segmentation - Renesas Ethernet (ravb) - support DMAC and EMAC blocks of Gigabit-capable IP found on RZ/G2L SoC - Intel 100G Ethernet - support for eswitch offload of TC/OvS flow API, including offload of GRE, VxLAN, Geneve tunneling - support application device queues - ability to assign Rx and Tx queues to application threads - PTP and PPS (pulse-per-second) extensions - Broadcom Ethernet (bnxt) - devlink health reporting and device reload extensions - Mellanox Ethernet (mlx5) - offload macvlan interfaces - support HW offload of TC rules involving OVS internal ports - support HW-GRO and header/data split - support application device queues - Marvell OcteonTx2: - add XDP support for PF - add PTP support for VF - Qualcomm Ethernet switch (qca8k): support for QCA8328 - Realtek Ethernet DSA switch (rtl8366rb) - support bridge offload - support STP, fast aging, disabling address learning - support for Realtek RTL8365MB-VC, a 4+1 port 10M/100M/1GE switch - Mellanox Ethernet/IB switch (mlxsw) - multi-level qdisc hierarchy offload (e.g. RED, prio and shaping) - offload root TBF qdisc as port shaper - support multiple routing interface MAC address prefixes - support for IP-in-IP with IPv6 underlay - MediaTek WiFi (mt76) - mt7921 - ASPM, 6GHz, SDIO and testmode support - mt7915 - LED and TWT support - Qualcomm WiFi (ath11k) - include channel rx and tx time in survey dump statistics - support for 80P80 and 160 MHz bandwidths - support channel 2 in 6 GHz band - spectral scan support for QCN9074 - support for rx decapsulation offload (data frames in 802.3 format) - Qualcomm phone SoC WiFi (wcn36xx) - enable Idle Mode Power Save (IMPS) to reduce power consumption during idle - Bluetooth driver support for MediaTek MT7922 and MT7921 - Enable support for AOSP Bluetooth extension in Qualcomm WCN399x and Realtek 8822C/8852A - Microsoft vNIC driver (mana) - support hibernation and kexec - Google vNIC driver (gve) - support for jumbo frames - implement Rx page reuse Refactor: - Make all writes to netdev->dev_addr go thru helpers, so that we can add this address to the address rbtree and handle the updates - Various TCP cleanups and optimizations including improvements to CPU cache use - Simplify the gnet_stats, Qdisc stats' handling and remove qdisc->running sequence counter - Driver changes and API updates to address devlink locking deficiencies" * tag 'net-next-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2122 commits) Revert "net: avoid double accounting for pure zerocopy skbs" selftests: net: add arp_ndisc_evict_nocarrier net: ndisc: introduce ndisc_evict_nocarrier sysctl parameter net: arp: introduce arp_evict_nocarrier sysctl parameter libbpf: Deprecate AF_XDP support kbuild: Unify options for BTF generation for vmlinux and modules selftests/bpf: Add a testcase for 64-bit bounds propagation issue. bpf: Fix propagation of signed bounds from 64-bit min/max into 32-bit. bpf: Fix propagation of bounds from 64-bit min/max into 32-bit and var_off. net: vmxnet3: remove multiple false checks in vmxnet3_ethtool.c net: avoid double accounting for pure zerocopy skbs tcp: rename sk_wmem_free_skb netdevsim: fix uninit value in nsim_drv_configure_vfs() selftests/bpf: Fix also no-alu32 strobemeta selftest bpf: Add missing map_delete_elem method to bloom filter map selftests/bpf: Add bloom map success test for userspace calls bpf: Add alignment padding for "map_extra" + consolidate holes bpf: Bloom filter map naming fixups selftests/bpf: Add test cases for struct_ops prog bpf: Add dummy BPF STRUCT_OPS for test purpose ...
2021-11-01Merge tag 'audit-pr-20211101' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/auditLinus Torvalds1-0/+1
Pull audit updates from Paul Moore: "Add some additional audit logging to capture the openat2() syscall open_how struct info. Previous variations of the open()/openat() syscalls allowed audit admins to inspect the syscall args to get the information contained in the new open_how struct used in openat2()" * tag 'audit-pr-20211101' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: return early if the filter rule has a lower priority audit: add OPENAT2 record to list "how" info audit: add support for the openat2 syscall audit: replace magic audit syscall class numbers with macros lsm_audit: avoid overloading the "key" audit field audit: Convert to SPDX identifier audit: rename struct node to struct audit_node to prevent future name collisions
2021-11-01Merge tag 'selinux-pr-20211101' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinuxLinus Torvalds1-1/+3
Pull selinux updates from Paul Moore: - Add LSM/SELinux/Smack controls and auditing for io-uring. As usual, the individual commit descriptions have more detail, but we were basically missing two things which we're adding here: + establishment of a proper audit context so that auditing of io-uring ops works similarly to how it does for syscalls (with some io-uring additions because io-uring ops are *not* syscalls) + additional LSM hooks to enable access control points for some of the more unusual io-uring features, e.g. credential overrides. The additional audit callouts and LSM hooks were done in conjunction with the io-uring folks, based on conversations and RFC patches earlier in the year. - Fixup the binder credential handling so that the proper credentials are used in the LSM hooks; the commit description and the code comment which is removed in these patches are helpful to understand the background and why this is the proper fix. - Enable SELinux genfscon policy support for securityfs, allowing improved SELinux filesystem labeling for other subsystems which make use of securityfs, e.g. IMA. * tag 'selinux-pr-20211101' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: security: Return xattr name from security_dentry_init_security() selinux: fix a sock regression in selinux_ip_postroute_compat() binder: use cred instead of task for getsecid binder: use cred instead of task for selinux checks binder: use euid from cred instead of using task LSM: Avoid warnings about potentially unused hook variables selinux: fix all of the W=1 build warnings selinux: make better use of the nf_hook_state passed to the NF hooks selinux: fix race condition when computing ocontext SIDs selinux: remove unneeded ipv6 hook wrappers selinux: remove the SELinux lockdown implementation selinux: enable genfscon labeling for securityfs Smack: Brutalist io_uring support selinux: add support for the io_uring access controls lsm,io_uring: add LSM hooks to io_uring io_uring: convert io_uring to the secure anon inode interface fs: add anon_inode_getfile_secure() similar to anon_inode_getfd_secure() audit: add filtering for io_uring records audit,io_uring,io-wq: add some basic audit support to io_uring audit: prepare audit_context for use in calling contexts beyond syscalls
2021-11-01Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski2-4/+38
Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-11-01 We've added 181 non-merge commits during the last 28 day(s) which contain a total of 280 files changed, 11791 insertions(+), 5879 deletions(-). The main changes are: 1) Fix bpf verifier propagation of 64-bit bounds, from Alexei. 2) Parallelize bpf test_progs, from Yucong and Andrii. 3) Deprecate various libbpf apis including af_xdp, from Andrii, Hengqi, Magnus. 4) Improve bpf selftests on s390, from Ilya. 5) bloomfilter bpf map type, from Joanne. 6) Big improvements to JIT tests especially on Mips, from Johan. 7) Support kernel module function calls from bpf, from Kumar. 8) Support typeless and weak ksym in light skeleton, from Kumar. 9) Disallow unprivileged bpf by default, from Pawan. 10) BTF_KIND_DECL_TAG support, from Yonghong. 11) Various bpftool cleanups, from Quentin. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (181 commits) libbpf: Deprecate AF_XDP support kbuild: Unify options for BTF generation for vmlinux and modules selftests/bpf: Add a testcase for 64-bit bounds propagation issue. bpf: Fix propagation of signed bounds from 64-bit min/max into 32-bit. bpf: Fix propagation of bounds from 64-bit min/max into 32-bit and var_off. selftests/bpf: Fix also no-alu32 strobemeta selftest bpf: Add missing map_delete_elem method to bloom filter map selftests/bpf: Add bloom map success test for userspace calls bpf: Add alignment padding for "map_extra" + consolidate holes bpf: Bloom filter map naming fixups selftests/bpf: Add test cases for struct_ops prog bpf: Add dummy BPF STRUCT_OPS for test purpose bpf: Factor out helpers for ctx access checking bpf: Factor out a helper to prepare trampoline for struct_ops prog selftests, bpf: Fix broken riscv build riscv, libbpf: Add RISC-V (RV64) support to bpf_tracing.h tools, build: Add RISC-V to HOSTARCH parsing riscv, bpf: Increase the maximum number of iterations selftests, bpf: Add one test for sockmap with strparser selftests, bpf: Fix test_txmsg_ingress_parser error ... ==================== Link: https://lore.kernel.org/r/20211102013123.9005-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-01net: ndisc: introduce ndisc_evict_nocarrier sysctl parameterJames Prestwood1-0/+1
In most situations the neighbor discovery cache should be cleared on a NOCARRIER event which is currently done unconditionally. But for wireless roams the neighbor discovery cache can and should remain intact since the underlying network has not changed. This patch introduces a sysctl option ndisc_evict_nocarrier which can be disabled by a wireless supplicant during a roam. This allows packets to be sent after a roam immediately without having to wait for neighbor discovery. A user reported roughly a 1 second delay after a roam before packets could be sent out (note, on IPv4). This delay was due to the ARP cache being cleared. During testing of this same scenario using IPv6 no delay was noticed, but regardless there is no reason to clear the ndisc cache for wireless roams. Signed-off-by: James Prestwood <prestwoj@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-01net: arp: introduce arp_evict_nocarrier sysctl parameterJames Prestwood2-0/+2
This change introduces a new sysctl parameter, arp_evict_nocarrier. When set (default) the ARP cache will be cleared on a NOCARRIER event. This new option has been defaulted to '1' which maintains existing behavior. Clearing the ARP cache on NOCARRIER is relatively new, introduced by: commit 859bd2ef1fc1110a8031b967ee656c53a6260a76 Author: David Ahern <dsahern@gmail.com> Date: Thu Oct 11 20:33:49 2018 -0700 net: Evict neighbor entries on carrier down The reason for this changes is to prevent the ARP cache from being cleared when a wireless device roams. Specifically for wireless roams the ARP cache should not be cleared because the underlying network has not changed. Clearing the ARP cache in this case can introduce significant delays sending out packets after a roam. A user reported such a situation here: https://lore.kernel.org/linux-wireless/CACsRnHWa47zpx3D1oDq9JYnZWniS8yBwW1h0WAVZ6vrbwL_S0w@mail.gmail.com/ After some investigation it was found that the kernel was holding onto packets until ARP finished which resulted in this 1 second delay. It was also found that the first ARP who-has was never responded to, which is actually what caues the delay. This change is more or less working around this behavior, but again, there is no reason to clear the cache on a roam anyways. As for the unanswered who-has, we know the packet made it OTA since it was seen while monitoring. Why it never received a response is unknown. In any case, since this is a problem on the AP side of things all that can be done is to work around it until it is solved. Some background on testing/reproducing the packet delay: Hardware: - 2 access points configured for Fast BSS Transition (Though I don't see why regular reassociation wouldn't have the same behavior) - Wireless station running IWD as supplicant - A device on network able to respond to pings (I used one of the APs) Procedure: - Connect to first AP - Ping once to establish an ARP entry - Start a tcpdump - Roam to second AP - Wait for operstate UP event, and note the timestamp - Start pinging Results: Below is the tcpdump after UP. It was recorded the interface went UP at 10:42:01.432875. 10:42:01.461871 ARP, Request who-has 192.168.254.1 tell 192.168.254.71, length 28 10:42:02.497976 ARP, Request who-has 192.168.254.1 tell 192.168.254.71, length 28 10:42:02.507162 ARP, Reply 192.168.254.1 is-at ac:86:74:55:b0:20, length 46 10:42:02.507185 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 1, length 64 10:42:02.507205 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 2, length 64 10:42:02.507212 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 3, length 64 10:42:02.507219 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 4, length 64 10:42:02.507225 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 5, length 64 10:42:02.507232 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 6, length 64 10:42:02.515373 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 1, length 64 10:42:02.521399 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 2, length 64 10:42:02.521612 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 3, length 64 10:42:02.521941 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 4, length 64 10:42:02.522419 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 5, length 64 10:42:02.523085 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 6, length 64 You can see the first ARP who-has went out very quickly after UP, but was never responded to. Nearly a second later the kernel retries and gets a response. Only then do the ping packets go out. If an ARP entry is manually added prior to UP (after the cache is cleared) it is seen that the first ping is never responded to, so its not only an issue with ARP but with data packets in general. As mentioned prior, the wireless interface was also monitored to verify the ping/ARP packet made it OTA which was observed to be true. Signed-off-by: James Prestwood <prestwoj@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-01Merge tag 'for-linus-5.16-1' of https://github.com/cminyard/linux-ipmiLinus Torvalds1-1/+15
Pull IPMI driver updates from Corey Minyard: "A new type of low-level IPMI driver is added for direct communication over the IPMI message bus without a BMC between the driver and the bus. Other than that, lots of little bug fixes and enhancements" * tag 'for-linus-5.16-1' of https://github.com/cminyard/linux-ipmi: ipmi: kcs_bmc: Fix a memory leak in the error handling path of 'kcs_bmc_serio_add_device()' char: ipmi: replace snprintf in show functions with sysfs_emit ipmi: ipmb: fix dependencies to eliminate build error ipmi:ipmb: Add OF support ipmi: bt: Add ast2600 compatible string ipmi: bt-bmc: Use registers directly ipmi: ipmb: Fix off-by-one size check on rcvlen ipmi:ssif: Use depends on, not select, for I2C ipmi: Add docs for the IPMI IPMB driver ipmi: Add docs for IPMB direct addressing ipmi:ipmb: Add initial support for IPMI over IPMB ipmi: Add support for IPMB direct messages ipmi: Export ipmb_checksum() ipmi: Fix a typo ipmi: Check error code before processing BMC response ipmi:devintf: Return a proper error when recv buffer too small ipmi: Disable some operations during a panic ipmi:watchdog: Set panic count to proper value on a panic
2021-11-01Merge tag 'media/v5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-mediaLinus Torvalds2-6/+31
Pull media updates from Mauro Carvalho Chehab: - New driver for SK Hynix Hi-846 8M pixel camera - New driver for the ov13b10 camera - New driver for Renesas R-Car ISP - mtk-vcodec gained support for version 2 of decoder firmware ABI - The legacy sir_ir driver got removed - videobuf2: the vb2_mem_ops kAPI had some improvements - lots of cleanups, fixes and new features at device drivers * tag 'media/v5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (328 commits) media: venus: core: Add sdm660 DT compatible and resource struct media: dt-bindings: media: venus: Add sdm660 dt schema media: venus: vdec: decoded picture buffer handling during reconfig sequence media: venus: Handle fatal errors during encoding and decoding media: venus: helpers: Add helper to mark fatal vb2 error media: venus: hfi: Check for sys error on session hfi functions media: venus: Make sys_error flag an atomic bitops media: venus: venc: Use pmruntime autosuspend media: allegro: write vui parameters for HEVC media: allegro: nal-hevc: implement generator for vui media: allegro: write correct colorspace into SPS media: allegro: extract nal value lookup functions to header media: allegro: correctly scale the bit rate in SPS media: allegro: remove external QP table media: allegro: fix row and column in response message media: allegro: add control to disable encoder buffer media: allegro: add encoder buffer support media: allegro: add pm_runtime support media: allegro: lookup VCU settings media: allegro: fix module removal if initialization failed ...
2021-11-01Merge tag 'overflow-v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linuxLinus Torvalds2-2/+39
Pull overflow updates from Kees Cook: "The end goal of the current buffer overflow detection work[0] is to gain full compile-time and run-time coverage of all detectable buffer overflows seen via array indexing or memcpy(), memmove(), and memset(). The str*() family of functions already have full coverage. While much of the work for these changes have been on-going for many releases (i.e. 0-element and 1-element array replacements, as well as avoiding false positives and fixing discovered overflows[1]), this series contains the foundational elements of several related buffer overflow detection improvements by providing new common helpers and FORTIFY_SOURCE changes needed to gain the introspection required for compiler visibility into array sizes. Also included are a handful of already Acked instances using the helpers (or related clean-ups), with many more waiting at the ready to be taken via subsystem-specific trees[2]. The new helpers are: - struct_group() for gaining struct member range introspection - memset_after() and memset_startat() for clearing to the end of structures - DECLARE_FLEX_ARRAY() for using flex arrays in unions or alone in structs Also included is the beginning of the refactoring of FORTIFY_SOURCE to support memcpy() introspection, fix missing and regressed coverage under GCC, and to prepare to fix the currently broken Clang support. Finishing this work is part of the larger series[0], but depends on all the false positives and buffer overflow bug fixes to have landed already and those that depend on this series to land. As part of the FORTIFY_SOURCE refactoring, a set of both a compile-time and run-time tests are added for FORTIFY_SOURCE and the mem*()-family functions respectively. The compile time tests have found a legitimate (though corner-case) bug[6] already. Please note that the appearance of "panic" and "BUG" in the FORTIFY_SOURCE refactoring are the result of relocating existing code, and no new use of those code-paths are expected nor desired. Finally, there are two tree-wide conversions for 0-element arrays and flexible array unions to gain sane compiler introspection coverage that result in no known object code differences. After this series (and the changes that have now landed via netdev and usb), we are very close to finally being able to build with -Warray-bounds and -Wzero-length-bounds. However, due corner cases in GCC[3] and Clang[4], I have not included the last two patches that turn on these options, as I don't want to introduce any known warnings to the build. Hopefully these can be solved soon" Link: https://lore.kernel.org/lkml/20210818060533.3569517-1-keescook@chromium.org/ [0] Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?qt=grep&q=FORTIFY_SOURCE [1] Link: https://lore.kernel.org/lkml/202108220107.3E26FE6C9C@keescook/ [2] Link: https://lore.kernel.org/lkml/3ab153ec-2798-da4c-f7b1-81b0ac8b0c5b@roeck-us.net/ [3] Link: https://bugs.llvm.org/show_bug.cgi?id=51682 [4] Link: https://lore.kernel.org/lkml/202109051257.29B29745C0@keescook/ [5] Link: https://lore.kernel.org/lkml/20211020200039.170424-1-keescook@chromium.org/ [6] * tag 'overflow-v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (30 commits) fortify: strlen: Avoid shadowing previous locals compiler-gcc.h: Define __SANITIZE_ADDRESS__ under hwaddress sanitizer treewide: Replace 0-element memcpy() destinations with flexible arrays treewide: Replace open-coded flex arrays in unions stddef: Introduce DECLARE_FLEX_ARRAY() helper btrfs: Use memset_startat() to clear end of struct string.h: Introduce memset_startat() for wiping trailing members and padding xfrm: Use memset_after() to clear padding string.h: Introduce memset_after() for wiping trailing members/padding lib: Introduce CONFIG_MEMCPY_KUNIT_TEST fortify: Add compile-time FORTIFY_SOURCE tests fortify: Allow strlen() and strnlen() to pass compile-time known lengths fortify: Prepare to improve strnlen() and strlen() warnings fortify: Fix dropped strcpy() compile-time write overflow check fortify: Explicitly disable Clang support fortify: Move remaining fortify helpers into fortify-string.h lib/string: Move helper functions out of string.c compiler_types.h: Remove __compiletime_object_size() cm4000_cs: Use struct_group() to zero struct cm4000_dev region can: flexcan: Use struct_group() to zero struct flexcan_regs regions ...
2021-11-01bpf: Add alignment padding for "map_extra" + consolidate holesJoanne Koong1-0/+1
This patch makes 2 changes regarding alignment padding for the "map_extra" field. 1) In the kernel header, "map_extra" and "btf_value_type_id" are rearranged to consolidate the hole. Before: struct bpf_map { ... u32 max_entries; /* 36 4 */ u32 map_flags; /* 40 4 */ /* XXX 4 bytes hole, try to pack */ u64 map_extra; /* 48 8 */ int spin_lock_off; /* 56 4 */ int timer_off; /* 60 4 */ /* --- cacheline 1 boundary (64 bytes) --- */ u32 id; /* 64 4 */ int numa_node; /* 68 4 */ ... bool frozen; /* 117 1 */ /* XXX 10 bytes hole, try to pack */ /* --- cacheline 2 boundary (128 bytes) --- */ ... struct work_struct work; /* 144 72 */ /* --- cacheline 3 boundary (192 bytes) was 24 bytes ago --- */ struct mutex freeze_mutex; /* 216 144 */ /* --- cacheline 5 boundary (320 bytes) was 40 bytes ago --- */ u64 writecnt; /* 360 8 */ /* size: 384, cachelines: 6, members: 26 */ /* sum members: 354, holes: 2, sum holes: 14 */ /* padding: 16 */ /* forced alignments: 2, forced holes: 1, sum forced holes: 10 */ } __attribute__((__aligned__(64))); After: struct bpf_map { ... u32 max_entries; /* 36 4 */ u64 map_extra; /* 40 8 */ u32 map_flags; /* 48 4 */ int spin_lock_off; /* 52 4 */ int timer_off; /* 56 4 */ u32 id; /* 60 4 */ /* --- cacheline 1 boundary (64 bytes) --- */ int numa_node; /* 64 4 */ ... bool frozen /* 113 1 */ /* XXX 14 bytes hole, try to pack */ /* --- cacheline 2 boundary (128 bytes) --- */ ... struct work_struct work; /* 144 72 */ /* --- cacheline 3 boundary (192 bytes) was 24 bytes ago --- */ struct mutex freeze_mutex; /* 216 144 */ /* --- cacheline 5 boundary (320 bytes) was 40 bytes ago --- */ u64 writecnt; /* 360 8 */ /* size: 384, cachelines: 6, members: 26 */ /* sum members: 354, holes: 1, sum holes: 14 */ /* padding: 16 */ /* forced alignments: 2, forced holes: 1, sum forced holes: 14 */ } __attribute__((__aligned__(64))); 2) Add alignment padding to the bpf_map_info struct More details can be found in commit 36f9814a494a ("bpf: fix uapi hole for 32 bit compat applications") Signed-off-by: Joanne Koong <joannekoong@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20211029224909.1721024-3-joannekoong@fb.com
2021-11-01Merge tag 'locking-core-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-0/+25
Pull locking updates from Thomas Gleixner: - Move futex code into kernel/futex/ and split up the kitchen sink into seperate files to make integration of sys_futex_waitv() simpler. - Add a new sys_futex_waitv() syscall which allows to wait on multiple futexes. The main use case is emulating Windows' WaitForMultipleObjects which allows Wine to improve the performance of Windows Games. Also native Linux games can benefit from this interface as this is a common wait pattern for this kind of applications. - Add context to ww_mutex_trylock() to provide a path for i915 to rework their eviction code step by step without making lockdep upset until the final steps of rework are completed. It's also useful for regulator and TTM to avoid dropping locks in the non contended path. - Lockdep and might_sleep() cleanups and improvements - A few improvements for the RT substitutions. - The usual small improvements and cleanups. * tag 'locking-core-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits) locking: Remove spin_lock_flags() etc locking/rwsem: Fix comments about reader optimistic lock stealing conditions locking: Remove rcu_read_{,un}lock() for preempt_{dis,en}able() locking/rwsem: Disable preemption for spinning region docs: futex: Fix kernel-doc references futex: Fix PREEMPT_RT build futex2: Documentation: Document sys_futex_waitv() uAPI selftests: futex: Test sys_futex_waitv() wouldblock selftests: futex: Test sys_futex_waitv() timeout selftests: futex: Add sys_futex_waitv() test futex,arm: Wire up sys_futex_waitv() futex,x86: Wire up sys_futex_waitv() futex: Implement sys_futex_waitv() futex: Simplify double_lock_hb() futex: Split out wait/wake futex: Split out requeue futex: Rename mark_wake_futex() futex: Rename: match_futex() futex: Rename: hb_waiter_{inc,dec,pending}() futex: Split out PI futex ...
2021-11-01Merge tag 'perf-core-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-3/+31
Pull perf updates from Thomas Gleixner: "Core: - Allow ftrace to instrument parts of the perf core code - Add a new mem_hops field to perf_mem_data_src which allows to represent intra-node/package or inter-node/off-package details to prepare for next generation systems which have more hieararchy within the node/pacakge level. Tools: - Update for the new mem_hops field in perf_mem_data_src Arch: - A set of constraints fixes for the Intel uncore PMU - The usual set of small fixes and improvements for x86 and PPC" * tag 'perf-core-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Fix ICL/SPR INST_RETIRED.PREC_DIST encodings powerpc/perf: Fix data source encodings for L2.1 and L3.1 accesses tools/perf: Add mem_hops field in perf_mem_data_src structure perf: Add mem_hops field in perf_mem_data_src structure perf: Add comment about current state of PERF_MEM_LVL_* namespace and remove an extra line perf/core: Allow ftrace for functions in kernel/event/core.c perf/x86: Add new event for AUX output counter index perf/x86: Add compiler barrier after updating BTS perf/x86/intel/uncore: Fix Intel SPR M3UPI event constraints perf/x86/intel/uncore: Fix Intel SPR M2PCIE event constraints perf/x86/intel/uncore: Fix Intel SPR IIO event constraints perf/x86/intel/uncore: Fix Intel SPR CHA event constraints perf/x86/intel/uncore: Fix Intel ICX IIO event constraints perf/x86/intel/uncore: Fix invalid unit check perf/x86/intel/uncore: Support extra IMC channel on Ice Lake server
2021-11-01Merge tag 'for-5.16-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linuxLinus Torvalds1-2/+9
Pull btrfs updates from David Sterba: "The updates this time are more under the hood and enhancing existing features (subpage with compression and zoned namespaces). Performance related: - misc small inode logging improvements (+3% throughput, -11% latency on sample dbench workload) - more efficient directory logging: bulk item insertion, less tree searches and locking - speed up bulk insertion of items into a b-tree, which is used when logging directories, when running delayed items for directories (fsync and transaction commits) and when running the slow path (full sync) of an fsync (bulk creation run time -4%, deletion -12%) Core: - continued subpage support - make defragmentation work - make compression write work - zoned mode - support ZNS (zoned namespaces), zone capacity is number of usable blocks in each zone - add dedicated block group (zoned) for relocation, to prevent out of order writes in some cases - greedy block group reclaim, pick the ones with least usable space first - preparatory work for send protocol updates - error handling improvements - cleanups and refactoring Fixes: - lockdep warnings - in show_devname callback, on seeding device - device delete on loop device due to conversions to workqueues - fix deadlock between chunk allocation and chunk btree modifications - fix tracking of missing device count and status" * tag 'for-5.16-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (140 commits) btrfs: remove root argument from check_item_in_log() btrfs: remove root argument from add_link() btrfs: remove root argument from btrfs_unlink_inode() btrfs: remove root argument from drop_one_dir_item() btrfs: clear MISSING device status bit in btrfs_close_one_device btrfs: call btrfs_check_rw_degradable only if there is a missing device btrfs: send: prepare for v2 protocol btrfs: fix comment about sector sizes supported in 64K systems btrfs: update device path inode time instead of bd_inode fs: export an inode_update_time helper btrfs: fix deadlock when defragging transparent huge pages btrfs: sysfs: convert scnprintf and snprintf to sysfs_emit btrfs: make btrfs_super_block size match BTRFS_SUPER_INFO_SIZE btrfs: update comments for chunk allocation -ENOSPC cases btrfs: fix deadlock between chunk allocation and chunk btree modifications btrfs: zoned: use greedy gc for auto reclaim btrfs: check-integrity: stop storing the block device name in btrfsic_dev_state btrfs: use btrfs_get_dev_args_from_path in dev removal ioctls btrfs: add a btrfs_get_dev_args_from_path helper btrfs: handle device lookup with btrfs_dev_lookup_args ...
2021-11-01Merge tag 'for-5.16/cdrom-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds1-0/+19
Pull CDROM updates from Jens Axboe: "On behalf of Phillip, here are the CDROM updates for the 5.16-rc1 merge window: - Add ioctl for improved media change detection (Lukas) - Reformat some documentation (Phillip) - Redundant variable removal (luo)" * tag 'for-5.16/cdrom-2021-10-29' of git://git.kernel.dk/linux-block: cdrom: Remove redundant variable and its assignment cdrom: docs: reformat table in Documentation/userspace-api/ioctl/cdrom.rst drivers/cdrom: improved ioctl for media change detection
2021-11-01Merge tag 'for-5.16/io_uring-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds1-0/+1
Pull io_uring updates from Jens Axboe: "Light on new features - basically just the hybrid mode support. Outside of that it's just fixes, cleanups, and performance improvements. In detail: - Add ring related information to the fdinfo output (Hao) - Hybrid async mode (Hao) - Support for batched issue on block (me) - sqe error trace improvement (me) - IOPOLL efficiency improvements (Pavel) - submit state cleanups and improvements (Pavel) - Completion side improvements (Pavel) - Drain improvements (Pavel) - Buffer selection cleanups (Pavel) - Fixed file node improvements (Pavel) - io-wq setup cancelation fix (Pavel) - Various other performance improvements and cleanups (Pavel) - Misc fixes (Arnd, Bixuan, Changcheng, Hao, me, Noah)" * tag 'for-5.16/io_uring-2021-10-29' of git://git.kernel.dk/linux-block: (97 commits) io-wq: remove worker to owner tw dependency io_uring: harder fdinfo sq/cq ring iterating io_uring: don't assign write hint in the read path io_uring: clusterise ki_flags access in rw_prep io_uring: kill unused param from io_file_supports_nowait io_uring: clean up timeout async_data allocation io_uring: don't try io-wq polling if not supported io_uring: check if opcode needs poll first on arming io_uring: clean iowq submit work cancellation io_uring: clean io_wq_submit_work()'s main loop io-wq: use helper for worker refcounting io_uring: implement async hybrid mode for pollable requests io_uring: Use ERR_CAST() instead of ERR_PTR(PTR_ERR()) io_uring: split logic of force_nonblock io_uring: warning about unused-but-set parameter io_uring: inform block layer of how many requests we are submitting io_uring: simplify io_file_supports_nowait() io_uring: combine REQ_F_NOWAIT_{READ,WRITE} flags io_uring: arm poll for non-nowait files fs/io_uring: Prioritise checking faster conditions first in io_write ...
2021-11-01Merge tag 'for-5.16/drivers-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds1-445/+0
Pull block driver updates from Jens Axboe: - paride driver cleanups (Christoph) - Remove cryptoloop support (Christoph) - null_blk poll support (me) - Now that add_disk() supports proper error handling, add it to various drivers (Luis) - Make ataflop actually work again (Michael) - s390 dasd fixes (Stefan, Heiko) - nbd fixes (Yu, Ye) - Remove redundant wq flush in mtip32xx (Christophe) - NVMe updates - fix a multipath partition scanning deadlock (Hannes Reinecke) - generate uevent once a multipath namespace is operational again (Hannes Reinecke) - support unique discovery controller NQNs (Hannes Reinecke) - fix use-after-free when a port is removed (Israel Rukshin) - clear shadow doorbell memory on resets (Keith Busch) - use struct_size (Len Baker) - add error handling support for add_disk (Luis Chamberlain) - limit the maximal queue size for RDMA controllers (Max Gurtovoy) - use a few more symbolic names (Max Gurtovoy) - fix error code in nvme_rdma_setup_ctrl (Max Gurtovoy) - add support for ->map_queues on FC (Saurav Kashyap) - support the current discovery subsystem entry (Hannes Reinecke) - use flex_array_size and struct_size (Len Baker) - bcache fixes (Christoph, Coly, Chao, Lin, Qing) - MD updates (Christoph, Guoqing, Xiao) - Misc fixes (Dan, Ding, Jiapeng, Shin'ichiro, Ye) * tag 'for-5.16/drivers-2021-10-29' of git://git.kernel.dk/linux-block: (117 commits) null_blk: Fix handling of submit_queues and poll_queues attributes block: ataflop: Fix warning comparing pointer to 0 bcache: replace snprintf in show functions with sysfs_emit bcache: move uapi header bcache.h to bcache code directory nvmet: use flex_array_size and struct_size nvmet: register discovery subsystem as 'current' nvmet: switch check for subsystem type nvme: add new discovery log page entry definitions block: ataflop: more blk-mq refactoring fixes block: remove support for cryptoloop and the xor transfer mtd: add add_disk() error handling rnbd: add error handling support for add_disk() um/drivers/ubd_kern: add error handling support for add_disk() m68k/emu/nfblock: add error handling support for add_disk() xen-blkfront: add error handling support for add_disk() bcache: add error handling support for add_disk() dm: add add_disk() error handling block: aoe: fixup coccinelle warnings nvmet: use struct_size over open coded arithmetic nvme: drop scan_lock and always kick requeue list when removing namespaces ...
2021-11-01amt: add control plane of amt interfaceTaehee Yoo1-0/+62
It adds definitions and control plane code for AMT. this is very similar to udp tunneling interfaces such as gtp, vxlan, etc. In the next patch, data plane code will be added. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller1-1/+5
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Use array_size() in ebtables, from Gustavo A. R. Silva. 2) Attach IPS_ASSURED to internal UDP stream state, reported by Maciej Zenczykowski. 3) Add NFT_META_IFTYPE to match on the interface type either from ingress or egress. 4) Generalize pktinfo->tprot_set to flags field. 5) Allow to match on inner headers / payload data. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01vdpa: Introduce query of device config layoutParav Pandit1-0/+6
Introduce a command to query a device config layout. An example query of network vdpa device: $ vdpa dev add name bar mgmtdev vdpasim_net $ vdpa dev config show bar: mac 00:35:09:19:48:05 link up link_announce false mtu 1500 $ vdpa dev config show -jp { "config": { "bar": { "mac": "00:35:09:19:48:05", "link ": "up", "link_announce ": false, "mtu": 1500, } } } Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20211026175519.87795-3-parav@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01i2c: virtio: Add support for zero-length requestsViresh Kumar1-0/+6
The virtio specification received a new mandatory feature (VIRTIO_I2C_F_ZERO_LENGTH_REQUEST) for zero length requests. Fail if the feature isn't offered by the device. For each read-request, set the VIRTIO_I2C_FLAGS_M_RD flag, as required by the VIRTIO_I2C_F_ZERO_LENGTH_REQUEST feature. This allows us to support zero length requests, like SMBUS Quick, where the buffer need not be sent anymore. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Link: https://lore.kernel.org/r/7c58868cd26d2fc4bd82d0d8b0dfb55636380110.1634808714.git.viresh.kumar@linaro.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jie Deng <jie.deng@intel.com> # once the spec is merged
2021-11-01netfilter: nft_payload: support for inner header matching / manglingPablo Neira Ayuso1-0/+2
Allow to match and mangle on inner headers / payload data after the transport header. There is a new field in the pktinfo structure that stores the inner header offset which is calculated only when requested. Only TCP and UDP supported at this stage. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-11-01vdpa: add new attribute VDPA_ATTR_DEV_MIN_VQ_SIZEWu Zongyong1-0/+1
This attribute advertises the min value of virtqueue size. The value is 1 by default. Signed-off-by: Wu Zongyong <wuzongyong@linux.alibaba.com> Link: https://lore.kernel.org/r/2bbc417355c4d22298050b1ba887cecfbde3e85d.1635493219.git.wuzongyong@linux.alibaba.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01netfilter: nft_meta: add NFT_META_IFTYPEPablo Neira Ayuso1-1/+3
Generalize NFT_META_IIFTYPE to NFT_META_IFTYPE which allows you to match on the interface type of the skb->dev field. This field is used by the netdev family to add an implicit dependency to skip non-ethernet packets when matching on layer 3 and 4 TCP/IP header fields. For backward compatibility, add the NFT_META_IIFTYPE alias to NFT_META_IFTYPE. Add __NFT_META_IIFTYPE, to be used by userspace in the future to match specifically on the iiftype. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-10-31Merge tag 'kvmarm-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEADPaolo Bonzini2-1/+7
KVM/arm64 updates for Linux 5.16 - More progress on the protected VM front, now with the full fixed feature set as well as the limitation of some hypercalls after initialisation. - Cleanup of the RAZ/WI sysreg handling, which was pointlessly complicated - Fixes for the vgic placement in the IPA space, together with a bunch of selftests - More memcg accounting of the memory allocated on behalf of a guest - Timer and vgic selftests - Workarounds for the Apple M1 broken vgic implementation - KConfig cleanups - New kvmarm.mode=none option, for those who really dislike us
2021-10-29bcache: move uapi header bcache.h to bcache code directoryColy Li1-445/+0
The header file include/uapi/linux/bcache.h is not really a user space API heaer. This file defines the ondisk format of bcache internal meta data but no one includes it from user space, bcache-tools has its own copy of this header with minor modification. Therefore, this patch moves include/uapi/linux/bcache.h to bcache code directory as drivers/md/bcache/bcache_ondisk.h. Suggested-by: Arnd Bergmann <arnd@kernel.org> Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20211029060930.119923-2-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-29btrfs: send: prepare for v2 protocolDavid Sterba1-2/+9
This is preparatory work for send protocol update to version 2 and higher. We have many pending protocol update requests but still don't have the basic protocol rev in place, the first thing that must happen is to do the actual versioning support. The protocol version is u32 and is a new member in the send ioctl struct. Validity of the version field is backed by a new flag bit. Old kernels would fail when a higher version is requested. Version protocol 0 will pick the highest supported version, BTRFS_SEND_STREAM_VERSION, that's also exported in sysfs. The version is still unchanged and will be increased once we have new incompatible commands or stream updates. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-28bpf: Add bpf_kallsyms_lookup_name helperKumar Kartikeya Dwivedi1-0/+16
This helper allows us to get the address of a kernel symbol from inside a BPF_PROG_TYPE_SYSCALL prog (used by gen_loader), so that we can relocate typeless ksym vars. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20211028063501.2239335-2-memxor@gmail.com
2021-10-28bpf: Add bloom filter map implementationJoanne Koong1-0/+9
This patch adds the kernel-side changes for the implementation of a bpf bloom filter map. The bloom filter map supports peek (determining whether an element is present in the map) and push (adding an element to the map) operations.These operations are exposed to userspace applications through the already existing syscalls in the following way: BPF_MAP_LOOKUP_ELEM -> peek BPF_MAP_UPDATE_ELEM -> push The bloom filter map does not have keys, only values. In light of this, the bloom filter map's API matches that of queue stack maps: user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM which correspond internally to bpf_map_peek_elem/bpf_map_push_elem, and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem APIs to query or add an element to the bloom filter map. When the bloom filter map is created, it must be created with a key_size of 0. For updates, the user will pass in the element to add to the map as the value, with a NULL key. For lookups, the user will pass in the element to query in the map as the value, with a NULL key. In the verifier layer, this requires us to modify the argument type of a bloom filter's BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE; as well, in the syscall layer, we need to copy over the user value so that in bpf_map_peek_elem, we know which specific value to query. A few things to please take note of: * If there are any concurrent lookups + updates, the user is responsible for synchronizing this to ensure no false negative lookups occur. * The number of hashes to use for the bloom filter is configurable from userspace. If no number is specified, the default used will be 5 hash functions. The benchmarks later in this patchset can help compare the performance of using different number of hashes on different entry sizes. In general, using more hashes decreases both the false positive rate and the speed of a lookup. * Deleting an element in the bloom filter map is not supported. * The bloom filter map may be used as an inner map. * The "max_entries" size that is specified at map creation time is used to approximate a reasonable bitmap size for the bloom filter, and is not otherwise strictly enforced. If the user wishes to insert more entries into the bloom filter than "max_entries", they may do so but they should be aware that this may lead to a higher false positive rate. Signed-off-by: Joanne Koong <joannekoong@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211027234504.30744-2-joannekoong@fb.com
2021-10-28fuse: add FOPEN_NOFLUSHAmir Goldstein1-1/+6
Add flag returned by FUSE_OPEN and FUSE_CREATE requests to avoid flushing data cache on close. Different filesystems implement ->flush() is different ways: - Most disk filesystems do not implement ->flush() at all - Some network filesystem (e.g. nfs) flush local write cache of FMODE_WRITE file and send a "flush" command to server - Some network filesystem (e.g. cifs) flush local write cache of FMODE_WRITE file without sending an additional command to server FUSE flushes local write cache of ANY file, even non FMODE_WRITE and sends a "flush" command to server (if server implements it). The FUSE implementation of ->flush() seems over agressive and arbitrary and does not make a lot of sense when writeback caching is disabled. Instead of deciding on another arbitrary implementation that makes sense, leave the choice of per-file flush behavior in the hands of the server. Link: https://lore.kernel.org/linux-fsdevel/CAJfpegspE8e6aKd47uZtSYX8Y-1e1FWS0VL0DH2Skb9gQP5RJQ@mail.gmail.com/ Suggested-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-28BackMerge tag 'v5.15-rc7' into drm-nextDave Airlie5-10/+22
The msm next tree is based on rc3, so let's just backmerge rc7 before pulling it in. Signed-off-by: Dave Airlie <airlied@redhat.com>
2021-10-27dm: introduce audit event module for device mapperMichael Weiß1-0/+2
To be able to send auditing events to user space, we introduce a generic dm-audit module. It provides helper functions to emit audit events through the kernel audit subsystem. We claim the AUDIT_DM_CTRL type=1336 and AUDIT_DM_EVENT type=1337 out of the audit event messages range in the corresponding userspace api in 'include/uapi/linux/audit.h' for those events. AUDIT_DM_CTRL is used to provide information about creation and destruction of device mapper targets which are triggered by user space admin control actions. AUDIT_DM_EVENT is used to provide information about actual errors during operation of the mapped device, showing e.g. integrity violations in audit log. Following commits to device mapper targets actually will make use of this to emit those events in relevant cases. The audit logs look like this if executing the following simple test: # dd if=/dev/zero of=test.img bs=1M count=1024 # losetup -f test.img # integritysetup -vD format --integrity sha256 -t 32 /dev/loop0 # integritysetup open -D /dev/loop0 --integrity sha256 integritytest # integritysetup status integritytest # integritysetup close integritytest # integritysetup open -D /dev/loop0 --integrity sha256 integritytest # integritysetup status integritytest # dd if=/dev/urandom of=/dev/loop0 bs=512 count=1 seek=100000 # dd if=/dev/mapper/integritytest of=/dev/null ------------------------- audit.log from auditd type=UNKNOWN[1336] msg=audit(1630425039.363:184): module=integrity op=ctr ppid=3807 pid=3819 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts2 ses=3 comm="integritysetup" exe="/sbin/integritysetup" subj==unconfined dev=254:3 error_msg='success' res=1 type=UNKNOWN[1336] msg=audit(1630425039.471:185): module=integrity op=dtr ppid=3807 pid=3819 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts2 ses=3 comm="integritysetup" exe="/sbin/integritysetup" subj==unconfined dev=254:3 error_msg='success' res=1 type=UNKNOWN[1336] msg=audit(1630425039.611:186): module=integrity op=ctr ppid=3807 pid=3819 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts2 ses=3 comm="integritysetup" exe="/sbin/integritysetup" subj==unconfined dev=254:3 error_msg='success' res=1 type=UNKNOWN[1336] msg=audit(1630425054.475:187): module=integrity op=dtr ppid=3807 pid=3819 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts2 ses=3 comm="integritysetup" exe="/sbin/integritysetup" subj==unconfined dev=254:3 error_msg='success' res=1 type=UNKNOWN[1336] msg=audit(1630425073.171:191): module=integrity op=ctr ppid=3807 pid=3883 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts2 ses=3 comm="integritysetup" exe="/sbin/integritysetup" subj==unconfined dev=254:3 error_msg='success' res=1 type=UNKNOWN[1336] msg=audit(1630425087.239:192): module=integrity op=dtr ppid=3807 pid=3902 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts2 ses=3 comm="integritysetup" exe="/sbin/integritysetup" subj==unconfined dev=254:3 error_msg='success' res=1 type=UNKNOWN[1336] msg=audit(1630425093.755:193): module=integrity op=ctr ppid=3807 pid=3906 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts2 ses=3 comm="integritysetup" exe="/sbin/integritysetup" subj==unconfined dev=254:3 error_msg='success' res=1 type=UNKNOWN[1337] msg=audit(1630425112.119:194): module=integrity op=integrity-checksum dev=254:3 sector=77480 res=0 type=UNKNOWN[1337] msg=audit(1630425112.119:195): module=integrity op=integrity-checksum dev=254:3 sector=77480 res=0 type=UNKNOWN[1337] msg=audit(1630425112.119:196): module=integrity op=integrity-checksum dev=254:3 sector=77480 res=0 type=UNKNOWN[1337] msg=audit(1630425112.119:197): module=integrity op=integrity-checksum dev=254:3 sector=77480 res=0 type=UNKNOWN[1337] msg=audit(1630425112.119:198): module=integrity op=integrity-checksum dev=254:3 sector=77480 res=0 type=UNKNOWN[1337] msg=audit(1630425112.119:199): module=integrity op=integrity-checksum dev=254:3 sector=77480 res=0 type=UNKNOWN[1337] msg=audit(1630425112.119:200): module=integrity op=integrity-checksum dev=254:3 sector=77480 res=0 type=UNKNOWN[1337] msg=audit(1630425112.119:201): module=integrity op=integrity-checksum dev=254:3 sector=77480 res=0 type=UNKNOWN[1337] msg=audit(1630425112.119:202): module=integrity op=integrity-checksum dev=254:3 sector=77480 res=0 type=UNKNOWN[1337] msg=audit(1630425112.119:203): module=integrity op=integrity-checksum dev=254:3 sector=77480 res=0 Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de> Signed-off-by: Paul Moore <paul@paul-moore.com> # fix audit.h numbering Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-10-27fanotify: Emit generic error info for error eventGabriel Krisman Bertazi1-0/+7
The error info is a record sent to users on FAN_FS_ERROR events documenting the type of error. It also carries an error count, documenting how many errors were observed since the last reporting. Link: https://lore.kernel.org/r/20211025192746.66445-28-krisman@collabora.com Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Jan Kara <jack@suse.cz>
2021-10-27fanotify: Reserve UAPI bits for FAN_FS_ERRORGabriel Krisman Bertazi1-0/+1
FAN_FS_ERROR allows reporting of event type FS_ERROR to userspace, which is a mechanism to report file system wide problems via fanotify. This commit preallocate userspace visible bits to match the FS_ERROR event. Link: https://lore.kernel.org/r/20211025192746.66445-19-krisman@collabora.com Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Jan Kara <jack@suse.cz>
2021-10-26mctp: Implement extended addressingJeremy Kerr1-0/+11
This change allows an extended address struct - struct sockaddr_mctp_ext - to be passed to sendmsg/recvmsg. This allows userspace to specify output ifindex and physical address information (for sendmsg) or receive the input ifindex/physaddr for incoming messages (for recvmsg). This is typically used by userspace for MCTP address discovery and assignment operations. The extended addressing facility is conditional on a new sockopt: MCTP_OPT_ADDR_EXT; userspace must explicitly enable addressing before the kernel will consume/populate the extended address data. Includes a fix for an uninitialised var: Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25net: phy: add constants for fast retrain related registerLuo Jie1-0/+9
Add the constants for 2.5G fast retrain capability in 10G AN control register, fast retrain status and control register and THP bypass register into mdio.h. Signed-off-by: Luo Jie <luoj@codeaurora.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25KVM: x86: On emulation failure, convey the exit reason, etc. to userspaceDavid Edmondson1-0/+6
Should instruction emulation fail, include the VM exit reason, etc. in the emulation_failure data passed to userspace, in order that the VMM can report it as a debugging aid when describing the failure. Suggested-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: David Edmondson <david.edmondson@oracle.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210920103737.2696756-4-david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-25KVM: x86: Clarify the kvm_run.emulation_failure structure layoutDavid Edmondson1-2/+6
Until more flags for kvm_run.emulation_failure flags are defined, it is undetermined whether new payload elements corresponding to those flags will be additive or alternative. As a hint to userspace that an alternative is possible, wrap the current payload elements in a union. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Edmondson <david.edmondson@oracle.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210920103737.2696756-2-david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-24can: netlink: add interface for CAN-FD Transmitter Delay Compensation (TDC)Vincent Mailhol1-2/+27
Add the netlink interface for TDC parameters of struct can_tdc_const and can_tdc. Contrary to the can_bittiming(_const) structures for which there is just a single IFLA_CAN(_DATA)_BITTMING(_CONST) entry per structure, here, we create a nested entry IFLA_CAN_TDC. Within this nested entry, additional IFLA_CAN_TDC_TDC* entries are added for each of the TDC parameters of the newly introduced struct can_tdc_const and struct can_tdc. For struct can_tdc_const, these are: IFLA_CAN_TDC_TDCV_MIN IFLA_CAN_TDC_TDCV_MAX IFLA_CAN_TDC_TDCO_MIN IFLA_CAN_TDC_TDCO_MAX IFLA_CAN_TDC_TDCF_MIN IFLA_CAN_TDC_TDCF_MAX For struct can_tdc, these are: IFLA_CAN_TDC_TDCV IFLA_CAN_TDC_TDCO IFLA_CAN_TDC_TDCF This is done so that changes can be applied in the future to the structures without breaking the netlink interface. The TDC netlink logic works as follow: * CAN_CTRLMODE_FD is not provided: - if any TDC parameters are provided: error. - TDC parameters not provided: TDC parameters unchanged. * CAN_CTRLMODE_FD is provided and is false: - TDC is deactivated: both the structure and the CAN_CTRLMODE_TDC_{AUTO,MANUAL} flags are flushed. * CAN_CTRLMODE_FD provided and is true: - CAN_CTRLMODE_TDC_{AUTO,MANUAL} and tdc{v,o,f} not provided: call can_calc_tdco() to automatically decide whether TDC should be activated and, if so, set CAN_CTRLMODE_TDC_AUTO and uses the calculated tdco value. - CAN_CTRLMODE_TDC_AUTO and tdco provided: set CAN_CTRLMODE_TDC_AUTO and use the provided tdco value. Here, tdcv is illegal and tdcf is optional. - CAN_CTRLMODE_TDC_MANUAL and both of tdcv and tdco provided: set CAN_CTRLMODE_TDC_MANUAL and use the provided tdcv and tdco value. Here, tdcf is optional. - CAN_CTRLMODE_TDC_{AUTO,MANUAL} are mutually exclusive. Whenever one flag is turned on, the other will automatically be turned off. Providing both returns an error. - Combination other than the one listed above are illegal and will return an error. N.B. above rules mean that whenever CAN_CTRLMODE_FD is provided, the previous TDC values will be overwritten. The only option to reuse previous TDC value is to not provide CAN_CTRLMODE_FD. All the new parameters are defined as u32. This arbitrary choice is done to mimic the other bittiming values with are also all of type u32. An u16 would have been sufficient to hold the TDC values. This patch completes below series (c.f. [1]): - commit 289ea9e4ae59 ("can: add new CAN FD bittiming parameters: Transmitter Delay Compensation (TDC)") - commit c25cc7993243 ("can: bittiming: add calculation for CAN FD Transmitter Delay Compensation (TDC)") [1] https://lore.kernel.org/linux-can/20210224002008.4158-1-mailhol.vincent@wanadoo.fr/T/#t Link: https://lore.kernel.org/all/20210918095637.20108-5-mailhol.vincent@wanadoo.fr Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-10-24can: bittiming: allow TDC{V,O} to be zero and add can_tdc_const::tdc{v,o,f}_minVincent Mailhol1-0/+2
ISO 11898-1 specifies in section 11.3.3 "Transmitter delay compensation" that "the configuration range for [the] SSP position shall be at least 0 to 63 minimum time quanta." Because SSP = TDCV + TDCO, it means that we should allow both TDCV and TDCO to hold zero value in order to honor SSP's minimum possible value. However, current implementation assigned special meaning to TDCV and TDCO's zero values: * TDCV = 0 -> TDCV is automatically measured by the transceiver. * TDCO = 0 -> TDC is off. In order to allow for those values to really be zero and to maintain current features, we introduce two new flags: * CAN_CTRLMODE_TDC_AUTO indicates that the controller support automatic measurement of TDCV. * CAN_CTRLMODE_TDC_MANUAL indicates that the controller support manual configuration of TDCV. N.B.: current implementation failed to provide an option for the driver to indicate that only manual mode was supported. TDC is disabled if both CAN_CTRLMODE_TDC_AUTO and CAN_CTRLMODE_TDC_MANUAL flags are off, c.f. the helper function can_tdc_is_enabled() which is also introduced in this patch. Also, this patch adds three fields: tdcv_min, tdco_min and tdcf_min to struct can_tdc_const. While we are not convinced that those three fields could be anything else than zero, we can imagine that some controllers might specify a lower bound on these. Thus, those minimums are really added "just in case". Comments of struct can_tdc and can_tdc_const are updated accordingly. Finally, the changes are applied to the etas_es58x driver. Link: https://lore.kernel.org/all/20210918095637.20108-2-mailhol.vincent@wanadoo.fr Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-10-22Merge tag 'mac80211-next-for-net-next-2021-10-21' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-nextJakub Kicinski2-1/+143
Johannes Berg says: ==================== Quite a few changes: * the applicable eth_hw_addr_set() and const hw_addr changes * various code cleanups/refactorings * stack usage reductions across the wireless stack * some unstructured find_ie() -> structured find_element() changes * a few more pieces of multi-BSSID support * some 6 GHz regulatory support * 6 GHz support in hwsim, for testing userspace code * Light Communications (LC, 802.11bb) early band definitions to be able to add a first driver soon * tag 'mac80211-next-for-net-next-2021-10-21' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next: (35 commits) cfg80211: fix kernel-doc for MBSSID EMA mac80211: Prevent AP probing during suspend nl80211: Add LC placeholder band definition to nl80211_band ... ==================== Link: https://lore.kernel.org/r/20211021154953.134849-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-2/+5
Lots of simnple overlapping additions. With a build fix from Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21bpf: Add verified_insns to bpf_prog_info and fdinfoDave Marchevsky1-0/+1
This stat is currently printed in the verifier log and not stored anywhere. To ease consumption of this data, add a field to bpf_prog_aux so it can be exposed via BPF_OBJ_GET_INFO_BY_FD and fdinfo. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20211020074818.1017682-2-davemarchevsky@fb.com
2021-10-21bpf: Add bpf_skc_to_unix_sock() helperHengqi Chen1-0/+7
The helper is used in tracing programs to cast a socket pointer to a unix_sock pointer. The return value could be NULL if the casting is illegal. Suggested-by: Yonghong Song <yhs@fb.com> Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20211021134752.1223426-2-hengqi.chen@gmail.com
2021-10-21uapi: Add <linux/map_to_14segment.h>Geert Uytterhoeven1-0/+241
Add a header file providing translation primitives and tables for the conversion of (ASCII) characters to a 14-segments notation, as used by 14-segment alphanumeric displays. This follows the spirit of include/uapi/linux/map_to_7segment.h. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2021-10-21cfg80211: fix kernel-doc for MBSSID EMAJohannes Berg1-2/+2
The struct member ema_max_profile_periodicity was listed with the wrong name in the kernel-doc, fix that. Link: https://lore.kernel.org/r/20211021173038.18ec2030c66b.Iac731bb299525940948adad2c41f514b7dd81c47@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-10-21nl80211: Add LC placeholder band definition to nl80211_bandSrinivasan Raju1-0/+2
Define LC band which is a draft under IEEE 802.11bb. Current NL80211_BAND_LC is a placeholder band and will be more defined IEEE 802.11bb progresses. Signed-off-by: Srinivasan Raju <srini.raju@purelifi.com> Link: https://lore.kernel.org/r/20211018100143.7565-2-srini.raju@purelifi.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-10-21nl80211: vendor-cmd: intel: add more details for IWL_MVM_VENDOR_CMD_HOST_GET_OWNERSHIPEmmanuel Grumbach1-0/+29
Explain more the expected flow for this command. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Link: https://lore.kernel.org/r/20211020051147.29297-1-emmanuel.grumbach@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-10-20fq_codel: generalise ce_threshold marking for subset of trafficToke Høiland-Jørgensen1-1/+2
Commit e72aeb9ee0e3 ("fq_codel: implement L4S style ce_threshold_ect1 marking") expanded the ce_threshold feature of FQ-CoDel so it can be applied to a subset of the traffic, using the ECT(1) bit of the ECN field as the classifier. However, hard-coding ECT(1) as the only classifier for this feature seems limiting, so let's expand it to be more general. To this end, change the parameter from a ce_threshold_ect1 boolean, to a one-byte selector/mask pair (ce_threshold_{selector,mask}) which is applied to the whole diffserv/ECN field in the IP header. This makes it possible to classify packets by any value in either the ECN field or the diffserv field. In particular, setting a selector of INET_ECN_ECT_1 and a mask of INET_ECN_MASK corresponds to the functionality before this patch, and a mask of ~INET_ECN_MASK allows using the selector as a straight-forward match against a diffserv code point: # apply ce_threshold to ECT(1) traffic tc qdisc replace dev eth0 root fq_codel ce_threshold 1ms ce_threshold_selector 0x1/0x3 # apply ce_threshold to ECN-capable traffic marked as diffserv AF22 tc qdisc replace dev eth0 root fq_codel ce_threshold 1ms ce_threshold_selector 0x50/0xfc Regardless of the selector chosen, the normal rules for ECN-marking of packets still apply, i.e., the flow must still declare itself ECN-capable by setting one of the bits in the ECN field to get marked at all. v2: - Add tc usage examples to patch description Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20211019174709.69081-1-toke@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-20media: allegro: add control to disable encoder bufferMichael Tretter1-0/+5
The encoder buffer can have a negative impact on the quality of the encoded video. Add a control to allow user space to disable the encoder buffer per channel if the VCU supports the encoder buffer but the quality is not sufficient. Signed-off-by: Michael Tretter <m.tretter@pengutronix.de> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>