aboutsummaryrefslogtreecommitdiffstats
path: root/include/uapi/linux (follow)
AgeCommit message (Collapse)AuthorFilesLines
2022-09-29net/sched: taprio: allow user input of per-tc max SDUVladimir Oltean1-0/+11
IEEE 802.1Q clause 12.29.1.1 "The queueMaxSDUTable structure and data types" and 8.6.8.4 "Enhancements for scheduled traffic" talk about the existence of a per traffic class limitation of maximum frame sizes, with a fallback on the port-based MTU. As far as I am able to understand, the 802.1Q Service Data Unit (SDU) represents the MAC Service Data Unit (MSDU, i.e. L2 payload), excluding any number of prepended VLAN headers which may be otherwise present in the MSDU. Therefore, the queueMaxSDU is directly comparable to the device MTU (1500 means L2 payload sizes are accepted, or frame sizes of 1518 octets, or 1522 plus one VLAN header). Drivers which offload this are directly responsible of translating into other units of measurement. To keep the fast path checks optimized, we keep 2 arrays in the qdisc, one for max_sdu translated into frame length (so that it's comparable to skb->len), and another for offloading and for dumping back to the user. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-29docs: netlink: clarify the historical baggage of Netlink flagsJakub Kicinski1-1/+1
nlmsg_flags are full of historical baggage, inconsistencies and strangeness. Try to document it more thoroughly. Explain the meaning of the ECHO flag (and while at it clarify the comment in the uAPI). Handwave a little about the NEW request flags and how they make sense on the surface but cater to really old paradigm before commands were a thing. I will add more notes on how to make use of ECHO and discouragement for reuse of flags to the kernel-side documentation. Link: https://lore.kernel.org/r/20220927212306.823862-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-0/+2
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-29landlock: Fix documentation styleMickaël Salaün1-5/+5
It seems that all code should use double backquotes, which is also used to convert "%" defines. Let's use an homogeneous style and remove all use of simple backquotes (which should only be used for emphasis). Cc: Günther Noack <gnoack3000@gmail.com> Cc: Paul Moore <paul@paul-moore.com> Signed-off-by: Mickaël Salaün <mic@digikod.net> Link: https://lore.kernel.org/r/20220923154207.3311629-4-mic@digikod.net
2022-09-29perf/uapi: Define PERF_MEM_SNOOPX_PEER in kernel header fileRavi Bangoria1-1/+1
PERF_MEM_SNOOPX_PEER is defined only in tools uapi header. Although it's used only by perf tool, not defining it in kernel header can create problems in future. Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220928095805.596-8-ravi.bangoria@amd.com
2022-09-29perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}Ravi Bangoria1-1/+3
PERF_MEM_LVLNUM_EXTN_MEM which can be used to indicate accesses to extension memory like CXL etc. PERF_MEM_LVL_IO can be used for IO accesses but it can not distinguish between local and remote IO. Introduce new field PERF_MEM_LVLNUM_IO which can be clubbed with PERF_MEM_REMOTE_REMOTE to indicate Remote IO accesses. Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220928095805.596-2-ravi.bangoria@amd.com
2022-09-29Merge branch 'v6.0-rc7'Peter Zijlstra4-31/+23
Merge upstream to get RAPTORLAKE_S Signed-off-by: Peter Zijlstra <peterz@infradead.org>
2022-09-29KVM: Add KVM_CAP_DIRTY_LOG_RING_ACQ_REL capability and config optionMarc Zyngier1-0/+1
In order to differenciate between architectures that require no extra synchronisation when accessing the dirty ring and those who do, add a new capability (KVM_CAP_DIRTY_LOG_RING_ACQ_REL) that identify the latter sort. TSO architectures can obviously advertise both, while relaxed architectures must only advertise the ACQ_REL version. This requires some configuration symbol rejigging, with HAVE_KVM_DIRTY_RING being only indirectly selected by two top-level config symbols: - HAVE_KVM_DIRTY_RING_TSO for strongly ordered architectures (x86) - HAVE_KVM_DIRTY_RING_ACQ_REL for weakly ordered architectures (arm64) Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20220926145120.27974-3-maz@kernel.org
2022-09-28Input: add ABS_PROFILE to uapi and documentationNate Yocom1-0/+1
Define new ABS_PROFILE axis for input devices which need it, e.g. X-Box Adaptive Controller and X-Box Elite 2. Signed-off-by: Nate Yocom <nate@yocom.org> Link: https://lore.kernel.org/r/20220908173930.28940-4-nate@yocom.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2022-09-28bpf: Handle bpf_link_info for the parameterized task BPF iterators.Kui-Feng Lee1-0/+4
Add new fields to bpf_link_info that users can query it through bpf_obj_get_info_by_fd(). Signed-off-by: Kui-Feng Lee <kuifeng@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/bpf/20220926184957.208194-3-kuifeng@fb.com
2022-09-28bpf: Parameterize task iterators.Kui-Feng Lee1-0/+6
Allow creating an iterator that loops through resources of one thread/process. People could only create iterators to loop through all resources of files, vma, and tasks in the system, even though they were interested in only the resources of a specific task or process. Passing the additional parameters, people can now create an iterator to go through all resources or only the resources of a task. Signed-off-by: Kui-Feng Lee <kuifeng@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/bpf/20220926184957.208194-2-kuifeng@fb.com
2022-09-28firmware/psci: Add debugfs support to ease debuggingDmitry Baryshkov1-0/+14
To ease debugging of PSCI supported features, add debugfs file called 'psci' describing PSCI and SMC CC versions, enabled features and options. Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20220926110758.666922-1-dmitry.baryshkov@linaro.org' Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-09-27net: tls: Add ARIA-GCM algorithmTaehee Yoo1-0/+30
RFC 6209 describes ARIA for TLS 1.2. ARIA-128-GCM and ARIA-256-GCM are defined in RFC 6209. This patch would offer performance increment and an opportunity for hardware offload. Benchmark results: iperf-ssl are used. CPU: intel i3-12100. TLS(openssl-3.0-dev) [ 3] 0.0- 1.0 sec 185 MBytes 1.55 Gbits/sec [ 3] 1.0- 2.0 sec 186 MBytes 1.56 Gbits/sec [ 3] 2.0- 3.0 sec 186 MBytes 1.56 Gbits/sec [ 3] 3.0- 4.0 sec 186 MBytes 1.56 Gbits/sec [ 3] 4.0- 5.0 sec 186 MBytes 1.56 Gbits/sec [ 3] 0.0- 5.0 sec 927 MBytes 1.56 Gbits/sec kTLS(aria-generic) [ 3] 0.0- 1.0 sec 198 MBytes 1.66 Gbits/sec [ 3] 1.0- 2.0 sec 194 MBytes 1.62 Gbits/sec [ 3] 2.0- 3.0 sec 194 MBytes 1.63 Gbits/sec [ 3] 3.0- 4.0 sec 194 MBytes 1.63 Gbits/sec [ 3] 4.0- 5.0 sec 194 MBytes 1.62 Gbits/sec [ 3] 0.0- 5.0 sec 974 MBytes 1.63 Gbits/sec kTLS(aria-avx wirh GFNI) [ 3] 0.0- 1.0 sec 632 MBytes 5.30 Gbits/sec [ 3] 1.0- 2.0 sec 657 MBytes 5.51 Gbits/sec [ 3] 2.0- 3.0 sec 657 MBytes 5.51 Gbits/sec [ 3] 3.0- 4.0 sec 656 MBytes 5.50 Gbits/sec [ 3] 4.0- 5.0 sec 656 MBytes 5.50 Gbits/sec [ 3] 0.0- 5.0 sec 3.18 GBytes 5.47 Gbits/sec Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Vadim Fedorenko <vfedorenko@novek.ru> Link: https://lore.kernel.org/r/20220925150033.24615-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-27headers: Remove some left-over license textChristophe JAILLET5-34/+1
Remove some left-over from commit e2be04c7f995 ("License cleanup: add SPDX license identifier to uapi header files with a license") When the SPDX-License-Identifier tag has been added, the corresponding license text has not been removed. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Alexander Duyck <alexanderduyck@fb.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/88410cddd31197ea26840d7dd71612bece8c6acf.1663871981.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26bpf: Return value in kprobe get_func_ip only for entry addressJiri Olsa1-0/+1
Changing return value of kprobe's version of bpf_get_func_ip to return zero if the attach address is not on the function's entry point. For kprobes attached in the middle of the function we can't easily get to the function address especially now with the CONFIG_X86_KERNEL_IBT support. If user cares about current IP for kprobes attached within the function body, they can get it with PT_REGS_IP(ctx). Suggested-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220926153340.1621984-6-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-26btrfs: introduce BTRFS_QGROUP_STATUS_FLAGS_MASK for later expansionQu Wenruo1-0/+4
Currently we only have 3 qgroup flags: - BTRFS_QGROUP_STATUS_FLAG_ON - BTRFS_QGROUP_STATUS_FLAG_RESCAN - BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT These flags match the on-disk flags used in btrfs_qgroup_status. But we're going to introduce extra runtime flags which will not reach disks. So here we introduce a new mask, BTRFS_QGROUP_STATUS_FLAGS_MASK, to make sure only those flags can reach disks. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-26btrfs: separate BLOCK_GROUP_TREE compat RO flag from EXTENT_TREE_V2Qu Wenruo1-0/+6
The problem of long mount time caused by block group item search is already known for some time, and the solution of block group tree has been proposed. There is really no need to bound this feature into extent tree v2, just introduce compat RO flag, BLOCK_GROUP_TREE, to correctly solve the problem. All the code handling block group root is already in the upstream kernel, thus this patch really only needs to introduce the new compat RO flag. This patch introduces one extra artificial limitation on block group tree feature, that free space cache v2 and no-holes feature must be enabled to use this new compat RO feature. This artificial requirement is mostly to reduce the test combinations, and can be a guideline for future features, to mostly rely on the latest default features. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-24media: cec: add support for Absolute Volume ControlHans Verkuil2-0/+16
Add support for this new CEC message. This was added in HDMI 2.1a. Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-09-24media: rockchip: rkisp1: Define macros for DPCC configurations in UAPILaurent Pinchart1-16/+61
Extend the UAPI rkisp1-config.h header with macros for all DPCC configuration fields. While at it, clarify of fix issues in the DPCC documentation. Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Reviewed-by: Paul Elder <paul.elder@ideasonboard.com> Reviewed-by: Dafna Hirschfeld <dafna@fastmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-09-24fuse: implement ->tmpfile()Miklos Szeredi1-1/+5
This is basically equivalent to the FUSE_CREATE operation which creates and opens a regular file. Add a new FUSE_TMPFILE operation, otherwise just reuse the protocol and the code for FUSE_CREATE. Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-09-23ublk_drv: add START_USER_RECOVERY and END_USER_RECOVERY supportZiyangZhang1-1/+2
START_USER_RECOVERY and END_USER_RECOVERY are two new control commands to support user recovery feature. After a crash, user should send START_USER_RECOVERY, it will: (1) check if (a)current ublk_device is UBLK_S_DEV_QUIESCED which was set by quiesce_work and (b)chardev is released (2) reinit all ubqs, including: (a) put the task_struct and reset ->ubq_daemon to NULL. (b) reset all ublk_io. (3) reset ub->mm to NULL. Then, user should start a new process and send FETCH_REQ on each ubq_daemon. Finally, user should send END_USER_RECOVERY, it will: (1) wait for all new ubq_daemons getting ready. (2) update ublksrv_pid (3) unquiesce the request queue and expect incoming ublk_queue_rq() (4) convert ub's state to UBLK_S_DEV_LIVE Note: we can handle STOP_DEV between START_USER_RECOVERY and END_USER_RECOVERY. This is helpful to users who cannot start new process after sending START_USER_RECOVERY ctrl-cmd. Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220923153919.44078-7-ZiyangZhang@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-23ublk_drv: support UBLK_F_USER_RECOVERY_REISSUEZiyangZhang1-0/+2
UBLK_F_USER_RECOVERY_REISSUE implies that: With a dying ubq_daemon, ublk_drv let monitor_work requeues rq issued to userspace(ublksrv) before the ubq_daemon is dying. UBLK_F_USER_RECOVERY_REISSUE is designed for backends which: (1) tolerate double-write since ublk_drv may issue the same rq twice. (2) does not let frontend users get I/O error, such as read-only FS and VM backend. Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220923153919.44078-6-ZiyangZhang@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-23ublk_drv: define macros for recovery feature and check themZiyangZhang1-0/+3
Define some macros for recovery feature. UBLK_S_DEV_QUIESCED implies that ublk_device is quiesced and is ready for recovery. This state can be observed by userspace. UBLK_F_USER_RECOVERY implies that: (1) ublk_drv enables recovery feature. It won't let monitor_work to automatically abort rqs and release the device. (2) With a dying ubq_daemon, ublk_drv ends(aborts) rqs issued to userspace(ublksrv) before crash. (3) With a dying ubq_daemon, in task work and ublk_queue_rq(), ublk_drv requeues rqs. Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220923153919.44078-3-ZiyangZhang@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-23tun: support not enabling carrier in TUNSETIFFPatrick Rohr1-0/+2
This change adds support for not enabling carrier during TUNSETIFF interface creation by specifying the IFF_NO_CARRIER flag. Our tests make heavy use of tun interfaces. In some scenarios, the test process creates the interface but another process brings it up after the interface is discovered via netlink notification. In that case, it is not possible to create a tun/tap interface with carrier off without it racing against the bring up. Immediately setting carrier off via TUNSETCARRIER is still too late. Signed-off-by: Patrick Rohr <prohr@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Maciej Żenczykowski <maze@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-23net: phy: Add support for rate matchingSean Anderson2-2/+17
This adds support for rate matching (also known as rate adaptation) to the phy subsystem. The general idea is that the phy interface runs at one speed, and the MAC throttles the rate at which it sends packets to the link speed. There's a good overview of several techniques for achieving this at [1]. This patch adds support for three: pause-frame based (such as in Aquantia phys), CRS-based (such as in 10PASS-TS and 2BASE-TL), and open-loop-based (such as in 10GBASE-W). This patch makes a few assumptions and a few non assumptions about the types of rate matching available. First, it assumes that different phys may use different forms of rate matching. Second, it assumes that phys can use rate matching for any of their supported link speeds (e.g. if a phy supports 10BASE-T and XGMII, then it can adapt XGMII to 10BASE-T). Third, it does not assume that all interface modes will use the same form of rate matching. Fourth, it does not assume that all phy devices will support rate matching (even if some do). Relaxing or strengthening these (non-)assumptions could result in a different API. For example, if all interface modes were assumed to use the same form of rate matching, then a bitmask of interface modes supportting rate matching would suffice. For some better visibility into the process, the current rate matching mode is exposed as part of the ethtool ksettings. For the moment, only read access is supported. I'm not sure what userspace might want to configure yet (disable it altogether, disable just one mode, specify the mode to use, etc.). For the moment, since only pause-based rate adaptation support is added in the next few commits, rate matching can be disabled altogether by adjusting the advertisement. 802.3 calls this feature "rate adaptation" in clause 49 (10GBASE-R) and "rate matching" in clause 61 (10PASS-TL and 2BASE-TS). Aquantia also calls this feature "rate adaptation". I chose "rate matching" because it is shorter, and because Russell doesn't think "adaptation" is correct in this context. Signed-off-by: Sean Anderson <sean.anderson@seco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-21bpf: Add bpf_user_ringbuf_drain() helperDavid Vernet1-0/+38
In a prior change, we added a new BPF_MAP_TYPE_USER_RINGBUF map type which will allow user-space applications to publish messages to a ring buffer that is consumed by a BPF program in kernel-space. In order for this map-type to be useful, it will require a BPF helper function that BPF programs can invoke to drain samples from the ring buffer, and invoke callbacks on those samples. This change adds that capability via a new BPF helper function: bpf_user_ringbuf_drain(struct bpf_map *map, void *callback_fn, void *ctx, u64 flags) BPF programs may invoke this function to run callback_fn() on a series of samples in the ring buffer. callback_fn() has the following signature: long callback_fn(struct bpf_dynptr *dynptr, void *context); Samples are provided to the callback in the form of struct bpf_dynptr *'s, which the program can read using BPF helper functions for querying struct bpf_dynptr's. In order to support bpf_ringbuf_drain(), a new PTR_TO_DYNPTR register type is added to the verifier to reflect a dynptr that was allocated by a helper function and passed to a BPF program. Unlike PTR_TO_STACK dynptrs which are allocated on the stack by a BPF program, PTR_TO_DYNPTR dynptrs need not use reference tracking, as the BPF helper is trusted to properly free the dynptr before returning. The verifier currently only supports PTR_TO_DYNPTR registers that are also DYNPTR_TYPE_LOCAL. Note that while the corresponding user-space libbpf logic will be added in a subsequent patch, this patch does contain an implementation of the .map_poll() callback for BPF_MAP_TYPE_USER_RINGBUF maps. This .map_poll() callback guarantees that an epoll-waiting user-space producer will receive at least one event notification whenever at least one sample is drained in an invocation of bpf_user_ringbuf_drain(), provided that the function is not invoked with the BPF_RB_NO_WAKEUP flag. If the BPF_RB_FORCE_WAKEUP flag is provided, a wakeup notification is sent even if no sample was drained. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220920000100.477320-3-void@manifault.com
2022-09-21bpf: Define new BPF_MAP_TYPE_USER_RINGBUF map typeDavid Vernet1-0/+1
We want to support a ringbuf map type where samples are published from user-space, to be consumed by BPF programs. BPF currently supports a kernel -> user-space circular ring buffer via the BPF_MAP_TYPE_RINGBUF map type. We'll need to define a new map type for user-space -> kernel, as none of the helpers exported for BPF_MAP_TYPE_RINGBUF will apply to a user-space producer ring buffer, and we'll want to add one or more helper functions that would not apply for a kernel-producer ring buffer. This patch therefore adds a new BPF_MAP_TYPE_USER_RINGBUF map type definition. The map type is useless in its current form, as there is no way to access or use it for anything until we one or more BPF helpers. A follow-on patch will therefore add a new helper function that allows BPF programs to run callbacks on samples that are published to the ring buffer. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220920000100.477320-2-void@manifault.com
2022-09-21io_uring/net: zerocopy sendmsgPavel Begunkov1-0/+1
Add a zerocopy version of sendmsg. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6aabc4bdfc0ec78df6ec9328137e394af9d4e7ef.1663668091.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-21bpf, cgroup: Reject prog_attach_flags array when effective queryPu Lehui1-2/+5
Attach flags is only valid for attached progs of this layer cgroup, but not for effective progs. For querying with EFFECTIVE flags, exporting attach flags does not make sense. So when effective query, we reject prog_attach_flags array and don't need to populate it. Also we limit attach_flags to output 0 during effective query. Fixes: b79c9fc9551b ("bpf: implement BPF_PROG_QUERY for BPF_LSM_CGROUP") Signed-off-by: Pu Lehui <pulehui@huawei.com> Link: https://lore.kernel.org/r/20220921104604.2340580-2-pulehui@huaweicloud.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-09-21iio: add modifers for pitch, yaw, rollAndrea Merello1-0/+3
Add modifiers for reporting rotations as euler angles (i.e. yaw, pitch and roll). Signed-off-by: Andrea Merello <andrea.merello@iit.it> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Link: https://lore.kernel.org/r/20220907132205.28021-5-andrea.merello@iit.it Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2022-09-21iio: add modifiers for linear accelerationAndrea Merello1-1/+3
Add IIO_MOD_LINEAR_X, IIO_MOD_LINEAR_Y and IIO_MOD_LINEAR_Z modifiers to te IIO core, which is preparatory for adding the Bosch BNO055 IMU driver. Bosch BNO055 IMU can report raw accelerations (among x, y and z axis) as well as the so called "linear accelerations" (again, among x, y and z axis) which is basically the acceleration after subtracting gravity and for which those new modifiers are for. Signed-off-by: Andrea Merello <andrea.merello@iit.it> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Link: https://lore.kernel.org/r/20220907132205.28021-2-andrea.merello@iit.it Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2022-09-21io_uring: add IORING_SETUP_DEFER_TASKRUNDylan Yudaken1-0/+7
Allow deferring async tasks until the user calls io_uring_enter(2) with the IORING_ENTER_GETEVENTS flag. Enable this mode with a flag at io_uring_setup time. This functionality requires that the later io_uring_enter will be called from the same submission task, and therefore restrict this flag to work only when IORING_SETUP_SINGLE_ISSUER is also set. Being able to hand pick when tasks are run prevents the problem where there is current work to be done, however task work runs anyway. For example, a common workload would obtain a batch of CQEs, and process each one. Interrupting this to additional taskwork would add latency but not gain anything. If instead task work is deferred to just before more CQEs are obtained then no additional latency is added. The way this is implemented is by trying to keep task work local to a io_ring_ctx, rather than to the submission task. This is required, as the application will want to wake up only a single io_ring_ctx at a time to process work, and so the lists of work have to be kept separate. This has some other benefits like not having to check the task continually in handle_tw_list (and potentially unlocking/locking those), and reducing locks in the submit & process completions path. There are networking cases where using this option can reduce request latency by 50%. For example a contrived example using [1] where the client sends 2k data and receives the same data back while doing some system calls (to trigger task work) shows this reduction. The reason ends up being that if sending responses is delayed by processing task work, then the client side sits idle. Whereas reordering the sends first means that the client runs it's workload in parallel with the local task work. [1]: Using https://github.com/DylanZA/netbench/tree/defer_run Client: ./netbench --client_only 1 --control_port 10000 --host <host> --tx "epoll --threads 16 --per_thread 1 --size 2048 --resp 2048 --workload 1000" Server: ./netbench --server_only 1 --control_port 10000 --rx "io_uring --defer_taskrun 0 --workload 100" --rx "io_uring --defer_taskrun 1 --workload 100" Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220830125013.570060-5-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-21Merge tag 'iio-for-6.1a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-nextGreg Kroah-Hartman1-0/+3
Jonathan writes: 1st set of IIO new device support, features and cleanup for 6.1 This includes Nuno Sa's work to move the IIO core over to generic firmware properties rather than having DT specific code paths. Combined with Andy Shevchenko's long term work on drivers, this leaves IIO in a good state for handling other firmware types. New device support - liteon,ltrf216a * New driver and dt bindings to support this Light sensor. - maxim,max11205 * New driver for this 16bit single channel ADC. - memsensing,msa311 * New driver for this accelerometer. Includes a string helper for read/write. - richtek,rtq6056 * New driver and dt binding to support this current monitor used to measure power usage. - yamaha,yas530 * Support the YAS537 variant (series includes several fixes for other parts and new driver features). Staging graduation - adi,ad7746 CDC. Cleanup conducted against set of roadtest tests using the posted RFC of that framework. Features - core * Large rework to make all the core IIO code use generic firmware properties. Includes switching some drivers over as well using newly provided generic interfaces and allowing removal of DT specific ones. * Support for gesture event types for single and double tap. Used in bosch,bma400. - atmel,at91-sama5d2 * Add support for temperature sensor which uses two muxed inputs to estimate the temperature. * Handle trackx bits of EMR register to improve temp sampling accuracy. * Runtime PM support. - liteon,ltrf216a * Add a _raw channel output to allow working around an issue with differing conversions equations that breaks some user space controls. - mexelis,mlx90632 * Support regulator control. - ti,tsc2046 * External reference voltage support. Clean up and minor fixes - Tree-wide * devm_clk_get_enabled() replacements of opencoded equivalent. * Remaining IIO_DMA_MINALIGN conversions (the staging/iio drivers). * Various minor warning and similar cleanup such as missing static markings. * strlcpy() to strscpy() for cases where return value not checked. * provide units.h entries for more HZ units and use them in drivers. - dt-bindings cleanup * Drop maintainers listss where the email address is bouncing. * Switch spi devices over to using spi-peripheral.yaml * Add some missing unevaluatedProperties / additionalProperties: false entries. - ABI docs * Add some missing channel type specific sampling frequency entries. * Add parameter names for callback parameters. - MAINTAINERS * Fix wrong ADI forum links. - core * lockdep class per device, to avoid an issue with nest when one IIO device is the consumer of another. * White space tweaks. - asc,dlhl60d * Use get_unaligned_be24 to avoid some unusual data manipulation and masking. - atmel,at91-sama5d2 * Fix wrong max value. * Improve error handling when measuring pressure and touch. * Add locks to remove races on updating oversampling / sampling freq. * Add missing calls in suspend and resume path to ensure state is correctly brought up if buffered capture was in use when suspend happened. * Error out of write_raw() callback if buffered capture enabled to avoid unpredictable behavior. * Handle different versions having different oversampling ratio support and drop excess error checking. * Cleanup magic value defines where the name is just the value and hence hurts readability. * Use read_avail() callback to provide info on possible oversampling ratios. * Correctly handle variable bit depth when doing oversampling on different supported parts. Also handle higher oversampling ratios. - fsl,imx8qxp * Don't ignore errors from regulator_get_voltage() so as to avoid some very surprising scaling. - invensense,icp10100 * Switch from UNIVERSAL to DEFINE_RUNTIME_DEV_PM_OPS. UNIVERSAL rarely made sense and is now deprecated. In this driver we just avoid double disabling in some paths. - maxim,max1363 * Drop consumer channel map provision by platform data. There have been better ways of doing this for years and there are no in tree users. - microchip,mcp3911 * Update status to maintained. - qcom,spmi-adc5 * Support measurement of LDO output voltage. - qcom,spmi-adc * Add missing channel available on SM6125 SoC. - st,stmpe * Drop requirement on node name in binding now that driver correctly doesn't enforce it. - stx104 * Move to more appropriate addac directory - ti,am335x * Document ti,am654-adc compatible already in use in tree. - ti,hmc5843 * Move dev_pm_ops out of header and use new pm macros to handle export. - yamaha,yas530 * Minor cleanups. * tag 'iio-for-6.1a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (142 commits) iio: pressure: icp10100: Switch from UNIVERSAL to DEFINE_RUNTIME_DEV_PM_OPS(). iio: adc: max1363: Drop provision to provide an IIO channel map via platform data iio: accel: bma400: Add support for single and double tap events iio: Add new event type gesture and use direction for single and double tap iio: Use per-device lockdep class for mlock iio: adc: add max11205 adc driver dt-bindings: iio: adc: Add max11205 documentation file iio: magnetometer: yamaha-yas530: Use dev_err_probe() iio: magnetometer: yamaha-yas530: Make strings const in chip info iio: magnetometer: yamaha-yas530: Use pointers as driver data iio: adc: tsc2046: silent spi_device_id warning iio: adc: tsc2046: add vref support dt-bindings: iio: adc: ti,tsc2046: add vref-supply property iio: light: ltrf216a: Add raw attribute dt-bindings: iio: Add missing (unevaluated|additional)Properties on child nodes MAINTAINERS: fix Analog Devices forum links iio/accel: fix repeated words in comments dt-bindings: iio: accel: add dt-binding schema for msa311 accel driver iio: add MEMSensing MSA311 3-axis accelerometer driver dt-bindings: vendor-prefixes: add MEMSensing Microsystems Co., Ltd. ...
2022-09-21headers: Remove some left-over license text in include/uapi/linux/netfilter/Christophe JAILLET4-31/+4
When the SPDX-License-Identifier tag has been added, the corresponding license text has not been removed. Remove it now. Also, in xt_connmark.h, move the copyright text at the top of the file which is a much more common pattern. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Florian Westphal <fw@strlen.de>
2022-09-20HID: convert defines of HID class requests into a proper enumBenjamin Tissoires1-6/+8
This allows to export the type in BTF and so in the automatically generated vmlinux.h. It will also add some static checks on the users when we change the ll driver API (see not below). Note that we need to also do change in the ll_driver API, but given that this will have a wider impact outside of this tree, we leave this as a TODO for the future. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Link: https://lore.kernel.org/r/20220902132938.2409206-11-benjamin.tissoires@redhat.com
2022-09-20HID: export hid_report_type to uapiBenjamin Tissoires1-0/+12
When we are dealing with eBPF, we need to have access to the report type. Currently our implementation differs from the USB standard, making it impossible for users to know the exact value besides hardcoding it themselves. And instead of a blank define, convert it as an enum. Note that we need to also do change in the ll_driver API, but given that this will have a wider impact outside of this tree, we leave this as a TODO for the future. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Link: https://lore.kernel.org/r/20220902132938.2409206-10-benjamin.tissoires@redhat.com
2022-09-20seg6: add NEXT-C-SID support for SRv6 End behaviorAndrea Mayer1-0/+24
The NEXT-C-SID mechanism described in [1] offers the possibility of encoding several SRv6 segments within a single 128 bit SID address. Such a SID address is called a Compressed SID (C-SID) container. In this way, the length of the SID List can be drastically reduced. A SID instantiated with the NEXT-C-SID flavor considers an IPv6 address logically structured in three main blocks: i) Locator-Block; ii) Locator-Node Function; iii) Argument. C-SID container +------------------------------------------------------------------+ | Locator-Block |Loc-Node| Argument | | |Function| | +------------------------------------------------------------------+ <--------- B -----------> <- NF -> <------------- A ---------------> (i) The Locator-Block can be any IPv6 prefix available to the provider; (ii) The Locator-Node Function represents the node and the function to be triggered when a packet is received on the node; (iii) The Argument carries the remaining C-SIDs in the current C-SID container. The NEXT-C-SID mechanism relies on the "flavors" framework defined in [2]. The flavors represent additional operations that can modify or extend a subset of the existing behaviors. This patch introduces the support for flavors in SRv6 End behavior implementing the NEXT-C-SID one. An SRv6 End behavior with NEXT-C-SID flavor works as an End behavior but it is capable of processing the compressed SID List encoded in C-SID containers. An SRv6 End behavior with NEXT-C-SID flavor can be configured to support user-provided Locator-Block and Locator-Node Function lengths. In this implementation, such lengths must be evenly divisible by 8 (i.e. must be byte-aligned), otherwise the kernel informs the user about invalid values with a meaningful error code and message through netlink_ext_ack. If Locator-Block and/or Locator-Node Function lengths are not provided by the user during configuration of an SRv6 End behavior instance with NEXT-C-SID flavor, the kernel will choose their default values i.e., 32-bit Locator-Block and 16-bit Locator-Node Function. [1] - https://datatracker.ietf.org/doc/html/draft-ietf-spring-srv6-srh-compression [2] - https://datatracker.ietf.org/doc/html/rfc8986 Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20net: dsa: allow the DSA master to be seen and changed through rtnetlinkVladimir Oltean1-0/+10
Some DSA switches have multiple CPU ports, which can be used to improve CPU termination throughput, but DSA, through dsa_tree_setup_cpu_ports(), sets up only the first one, leading to suboptimal use of hardware. The desire is to not change the default configuration but to permit the user to create a dynamic mapping between individual user ports and the CPU port that they are served by, configurable through rtnetlink. It is also intended to permit load balancing between CPU ports, and in that case, the foreseen model is for the DSA master to be a bonding interface whose lowers are the physical DSA masters. To that end, we create a struct rtnl_link_ops for DSA user ports with the "dsa" kind. We expose the IFLA_DSA_MASTER link attribute that contains the ifindex of the newly desired DSA master. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20net/sched: flower: Add L2TPv3 filterWojciech Drewek1-0/+2
Add support for matching on L2TPv3 session ID. Session ID can be specified only when ip proto was set to IPPROTO_L2TP. Example filter: # tc filter add dev $PF1 ingress prio 1 protocol ip \ flower \ ip_proto l2tp \ l2tpv3_sid 1234 \ skip_sw \ action mirred egress redirect dev $VF1_PR Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20uapi: move IPPROTO_L2TP to in.hWojciech Drewek2-2/+2
IPPROTO_L2TP is currently defined in l2tp.h, but most of ip protocols are defined in in.h file. Move it there in order to keep code clean. Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-16Merge tag 'linux-can-next-for-6.1-20220915' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-nextDavid S. Miller3-2/+55
Marc Kleine-Budde says: ==================== Sept. 15, 2022, 8:19 a.m. UTC Hello Jakub, hello David, this is a pull request of 23 patches for net-next/master. the first 2 patches are by me and fix a typo in the rx-offload helper and the flexcan driver. Christophe JAILLET's patch cleans up the error handling in rcar_canfd driver's probe function. Kenneth Lee's patch converts the kvaser_usb driver from kcalloc() to kzalloc(). Biju Das contributes 2 patches to the sja1000 driver which update the DT bindings and support for the RZ/N1 SJA1000 CAN controller. Jinpeng Cui provides 2 patches that remove redundant variables from the sja1000 and kvaser_pciefd driver. 2 patches by John Whittington and me add hardware timestamp support to the gs_usb driver. Gustavo A. R. Silva's patch converts the etas_es58x driver to make use of DECLARE_FLEX_ARRAY(). Krzysztof Kozlowski's patch cleans up the sja1000 DT bindings. Dario Binacchi fixes his invalid email in the flexcan driver documentation. Ziyang Xuan contributes 2 patches that clean up the CAN RAW protocol. Yang Yingliang's patch switches the flexcan driver to dev_err_probe(). The last 7 patches are by Oliver Hartkopp and add support for the next generation of the CAN protocol: CAN with eXtended data Length (CAN XL). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-16rtnetlink: advertise allmulti counterNicolas Dichtel1-0/+1
Like what was done with IFLA_PROMISCUITY, add IFLA_ALLMULTI to advertise the allmulti counter. The flag IFF_ALLMULTI is advertised only if it was directly set by a userland app. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-15can: raw: add CAN XL supportOliver Hartkopp1-0/+1
Enable CAN_RAW sockets to read and write CAN XL frames analogue to the CAN FD extension (new CAN_RAW_XL_FRAMES sockopt). A CAN XL network interface is capable to handle Classical CAN, CAN FD and CAN XL frames. When CAN_RAW_XL_FRAMES is enabled, the CAN_RAW socket checks whether the addressed CAN network interface is capable to handle the provided CAN frame. In opposite to the fixed number of bytes for - CAN frames (CAN_MTU = sizeof(struct can_frame)) - CAN FD frames (CANFD_MTU = sizeof(struct can_frame)) the number of bytes when reading/writing CAN XL frames depends on the number of data bytes. For efficiency reasons the length of the struct canxl_frame is truncated to the needed size for read/write operations. This leads to a calculated size of CANXL_HDR_SIZE + canxl_frame::len which is enforced on write() operations and guaranteed on read() operations. NB: Valid length values are 1 .. 2048 (CANXL_MIN_DLEN .. CANXL_MAX_DLEN). Acked-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/all/20220912170725.120748-8-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-09-15can: canxl: update CAN infrastructure for CAN XL framesOliver Hartkopp1-0/+1
- add new ETH_P_CANXL ethernet protocol type - update skb checks for CAN XL - add alloc_canxl_skb() which now needs a data length parameter - introduce init_can_skb_reserve() to reduce code duplication Acked-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/all/20220912170725.120748-6-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-09-15can: canxl: introduce CAN XL data structureOliver Hartkopp1-0/+51
This patch adds defines for data structures and length information for CAN XL (CAN with eXtended data Length) which can transfer up to 2048 byte inside a single frame. Notable changes from CAN FD: - the 11 bit arbitration field is now named 'priority' instead of 'can_id' (there are no 29 bit identifiers nor RTR frames anymore) - the data length needs a uint16 value to cover up to 2048 byte (the length element position is different to struct can[fd]_frame) - new fields (SDT, AF) and a SEC bit have been introduced - the virtual CAN interface identifier is not part if the CAN XL frame struct as this VCID value is stored in struct skbuff (analog to vlan id) Acked-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/all/20220912170725.120748-5-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-09-15can: set CANFD_FDF flag in all CAN FD frame structuresOliver Hartkopp1-2/+2
To simplify the testing in user space all struct canfd_frame's provided by the CAN subsystem of the Linux kernel now have the CANFD_FDF flag set in canfd_frame::flags. NB: Handcrafted ETH_P_CANFD frames introduced via PF_PACKET socket might not set this bit correctly. During the check for sufficient headroom in PF_PACKET sk_buffs the uninitialized CAN sk_buff data structures are filled. In the case of a CAN FD frame the CANFD_FDF flag is set accordingly. As the CAN frame content is already zero initialized in alloc_canfd_skb() the obsolete initialization of cf->flags in the CTU CAN FD driver has been removed as it would overwrite the already set CANFD_FDF flag. Acked-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/all/20220912170725.120748-4-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-09-13perf: Kill __PERF_SAMPLE_CALLCHAIN_EARLYNamhyung Kim1-2/+0
There's no in-tree user anymore. Let's get rid of it. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220908214104.3851807-3-namhyung@kernel.org
2022-09-11userfaultfd: add /dev/userfaultfd for fine grained access controlAxel Rasmussen1-0/+4
Historically, it has been shown that intercepting kernel faults with userfaultfd (thereby forcing the kernel to wait for an arbitrary amount of time) can be exploited, or at least can make some kinds of exploits easier. So, in 37cd0575b8 "userfaultfd: add UFFD_USER_MODE_ONLY" we changed things so, in order for kernel faults to be handled by userfaultfd, either the process needs CAP_SYS_PTRACE, or this sysctl must be configured so that any unprivileged user can do it. In a typical implementation of a hypervisor with live migration (take QEMU/KVM as one such example), we do indeed need to be able to handle kernel faults. But, both options above are less than ideal: - Toggling the sysctl increases attack surface by allowing any unprivileged user to do it. - Granting the live migration process CAP_SYS_PTRACE gives it this ability, but *also* the ability to "observe and control the execution of another process [...], and examine and change [its] memory and registers" (from ptrace(2)). This isn't something we need or want to be able to do, so granting this permission violates the "principle of least privilege". This is all a long winded way to say: we want a more fine-grained way to grant access to userfaultfd, without granting other additional permissions at the same time. To achieve this, add a /dev/userfaultfd misc device. This device provides an alternative to the userfaultfd(2) syscall for the creation of new userfaultfds. The idea is, any userfaultfds created this way will be able to handle kernel faults, without the caller having any special capabilities. Access to this mechanism is instead restricted using e.g. standard filesystem permissions. [axelrasmussen@google.com: Handle misc_register() failure properly] Link: https://lkml.kernel.org/r/20220819205201.658693-3-axelrasmussen@google.com Link: https://lkml.kernel.org/r/20220808175614.3885028-3-axelrasmussen@google.com Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Acked-by: Nadav Amit <namit@vmware.com> Acked-by: Peter Xu <peterx@redhat.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dmitry V. Levin <ldv@altlinux.org> Cc: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zhang Yi <yi.zhang@huawei.com> Cc: Mike Rapoport <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-11statx: add direct I/O alignment informationEric Biggers1-1/+3
Traditionally, the conditions for when DIO (direct I/O) is supported were fairly simple. For both block devices and regular files, DIO had to be aligned to the logical block size of the block device. However, due to filesystem features that have been added over time (e.g. multi-device support, data journalling, inline data, encryption, verity, compression, checkpoint disabling, log-structured mode), the conditions for when DIO is allowed on a regular file have gotten increasingly complex. Whether a particular regular file supports DIO, and with what alignment, can depend on various file attributes and filesystem mount options, as well as which block device(s) the file's data is located on. Moreover, the general rule of DIO needing to be aligned to the block device's logical block size was recently relaxed to allow user buffers (but not file offsets) aligned to the DMA alignment instead. See commit bf8d08532bc1 ("iomap: add support for dma aligned direct-io"). XFS has an ioctl XFS_IOC_DIOINFO that exposes DIO alignment information. Uplifting this to the VFS is one possibility. However, as discussed (https://lore.kernel.org/linux-fsdevel/20220120071215.123274-1-ebiggers@kernel.org/T/#u), this ioctl is rarely used and not known to be used outside of XFS-specific code. It was also never intended to indicate when a file doesn't support DIO at all, nor was it intended for block devices. Therefore, let's expose this information via statx(). Add the STATX_DIOALIGN flag and two new statx fields associated with it: * stx_dio_mem_align: the alignment (in bytes) required for user memory buffers for DIO, or 0 if DIO is not supported on the file. * stx_dio_offset_align: the alignment (in bytes) required for file offsets and I/O segment lengths for DIO, or 0 if DIO is not supported on the file. This will only be nonzero if stx_dio_mem_align is nonzero, and vice versa. Note that as with other statx() extensions, if STATX_DIOALIGN isn't set in the returned statx struct, then these new fields won't be filled in. This will happen if the file is neither a regular file nor a block device, or if the file is a regular file and the filesystem doesn't support STATX_DIOALIGN. It might also happen if the caller didn't include STATX_DIOALIGN in the request mask, since statx() isn't required to return unrequested information. This commit only adds the VFS-level plumbing for STATX_DIOALIGN. For regular files, individual filesystems will still need to add code to support it. For block devices, a separate commit will wire it up too. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20220827065851.135710-2-ebiggers@kernel.org
2022-09-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-nextDavid S. Miller1-0/+2
Florian Westphal says: ==================== The following set contains changes for your *net-next* tree: - make conntrack ignore packets that are delayed (containing data already acked). The current behaviour to flag them as INVALID causes more harm than good, let them pass so peer can send an immediate ACK for the most recent sequence number. - make conntrack recognize when both peers have sent 'invalid' FINs: This helps cleaning out stale connections faster for those cases where conntrack is no longer in sync with the actual connection state. - Now that DECNET is gone, we don't need to reserve space for DECNET related information. - compact common 'find a free port number for the new inbound connection' code and move it to a helper, then cap number of tries the new helper will make until it gives up. - replace various instances of strlcpy with strscpy, from Wolfram Sang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>