| Age | Commit message (Collapse) | Author | Files | Lines |
|
There sre 3 bits in member high of struct bkey are never used, and no
plan to support them in future,
- HEADER_SIZE, start at bit 58, length 2 bits
- KEY_PINNED, start at bit 55, length 1 bit
No any kernel code, or user space tool references or accesses the three
bits. Therefore it is possible and feasible to reserve the valuable bits
from bkey.high. They can be used in future for other purpose.
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20211020143812.6403-3-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Going forward, future generation systems can have more hierarchy
within the node/package level but currently we don't have any data source
encoding field in perf, which can be used to represent this level of data.
Add a new field called 'mem_hops' in the perf_mem_data_src structure
which can be used to represent intra-node/package or inter-node/off-package
details. This field is of size 3 bits where PERF_MEM_HOPS_{NA, 0..6} value
can be used to present different hop levels data.
Also add corresponding macros to define mem_hop field values
and shift value.
Currently we define macro for HOPS_0 which corresponds
to data coming from another core but same node.
For ex: Encodings for mem_hops fields with L2 cache:
L2 - local L2
L2 | REMOTE | HOPS_0 - remote core, same node L2
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20211006140654.298352-3-kjain@linux.ibm.com
|
|
Add a comment about PERF_MEM_LVL_* namespace being depricated
to some extent in favour of added PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_}
fields.
Remove an extra line present in perf_mem__lvl_scnprintf function.
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20211006140654.298352-2-kjain@linux.ibm.com
|
|
For some reason non-off IORING_OP_TIMEOUT always fails links, it's
pretty inconvenient and unnecessary limits chaining after it to hard
linking, which is far from ideal, e.g. doesn't pair well with timeout
cancellation. Add a flag forcing it to not fail links on -ETIME.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/17c7ec0fb7a6113cc6be8cdaedcada0ba836ac0e.1633199723.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Jonathan writes:
First set of counter subsystem new feature support for the 5.16 cycle
Most interesting element this time is the new chrdev based interface
for the counter subsystem. Affects all drivers. Some minor precursor
patches.
Major parts:
* Bring all the sysfs attribute setup into the counter core rather than
leaving it to individual drivers. Docs updates accompany these changes.
* Move various definitions to a uapi header as now needed from userspace.
* Add the chardev interface + extensive documentation and example tool
* Add new ABI needed to identify indexes needed for chrdev interface
* Implement new interface for the 104-quad-8
* Follow up deals with wrong path for documentation build
* Various trivial cleanups and missing feature additions related to this
series
* tag 'counter-for-5.16a-take2' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio:
docs: counter: Include counter-chrdev kernel-doc to generic-counter.rst
counter: fix docum. build problems after filename change
counter: microchip-tcb-capture: Tidy up a false kernel-doc /** marking.
counter: 104-quad-8: Add IRQ support for the ACCES 104-QUAD-8
counter: 104-quad-8: Replace mutex with spinlock
counter: Implement events_queue_size sysfs attribute
counter: Implement *_component_id sysfs attributes
counter: Implement signalZ_action_component_id sysfs attribute
tools/counter: Create Counter tools
docs: counter: Document character device interface
counter: Add character device interface
counter: Move counter enums to uapi header
docs: counter: Update to reflect sysfs internalization
counter: Update counter.h comments to reflect sysfs internalization
counter: Internalize sysfs interface code
counter: stm32-timer-cnt: Provide defines for slave mode selection
counter: stm32-lptimer-cnt: Provide defines for clock polarities
|
|
Patch set [1] introduced BTF_KIND_TAG to allow tagging
declarations for struct/union, struct/union field, var, func
and func arguments and these tags will be encoded into
dwarf. They are also encoded to btf by llvm for the bpf target.
After BTF_KIND_TAG is introduced, we intended to use it
for kernel __user attributes. But kernel __user is actually
a type attribute. Upstream and internal discussion showed
it is not a good idea to mix declaration attribute and
type attribute. So we proposed to introduce btf_type_tag
as a type attribute and existing btf_tag renamed to
btf_decl_tag ([2]).
This patch renamed BTF_KIND_TAG to BTF_KIND_DECL_TAG and some
other declarations with *_tag to *_decl_tag to make it clear
the tag is for declaration. In the future, BTF_KIND_TYPE_TAG
might be introduced per [3].
[1] https://lore.kernel.org/bpf/20210914223004.244411-1-yhs@fb.com/
[2] https://reviews.llvm.org/D111588
[3] https://reviews.llvm.org/D111199
Fixes: b5ea834dde6b ("bpf: Support for new btf kind BTF_KIND_TAG")
Fixes: 5b84bd10363e ("libbpf: Add support for BTF_KIND_TAG")
Fixes: 5c07f2fec003 ("bpftool: Add support for BTF_KIND_TAG")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211012164838.3345699-1-yhs@fb.com
|
|
The 0-element arrays that are used as memcpy() destinations are actually
flexible arrays. Adjust their structures accordingly so that memcpy()
can better reason able their destination size (i.e. they need to be seen
as "unknown" length rather than "zero").
In some cases, use of the DECLARE_FLEX_ARRAY() helper is needed when a
flexible array is alone in a struct.
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Nilesh Javali <njavali@marvell.com>
Cc: Manish Rangankar <mrangankar@marvell.com>
Cc: GR-QLogic-Storage-Upstream@marvell.com
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Phillip Potter <phil@philpotter.co.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Florian Schilhabel <florian.c.schilhabel@googlemail.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Cc: Ross Schmidt <ross.schm.dev@gmail.com>
Cc: Marco Cesati <marcocesati@gmail.com>
Cc: ath10k@lists.infradead.org
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Cc: linux-staging@lists.linux.dev
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
There are many places where kernel code wants to have several different
typed trailing flexible arrays. This would normally be done with multiple
flexible arrays in a union, but since GCC and Clang don't (on the surface)
allow this, there have been many open-coded workarounds, usually involving
neighboring 0-element arrays at the end of a structure. For example,
instead of something like this:
struct thing {
...
union {
struct type1 foo[];
struct type2 bar[];
};
};
code works around the compiler with:
struct thing {
...
struct type1 foo[0];
struct type2 bar[];
};
Another case is when a flexible array is wanted as the single member
within a struct (which itself is usually in a union). For example, this
would be worked around as:
union many {
...
struct {
struct type3 baz[0];
};
};
These kinds of work-arounds cause problems with size checks against such
zero-element arrays (for example when building with -Warray-bounds and
-Wzero-length-bounds, and with the coming FORTIFY_SOURCE improvements),
so they must all be converted to "real" flexible arrays, avoiding warnings
like this:
fs/hpfs/anode.c: In function 'hpfs_add_sector_to_btree':
fs/hpfs/anode.c:209:27: warning: array subscript 0 is outside the bounds of an interior zero-length array 'struct bplus_internal_node[0]' [-Wzero-length-bounds]
209 | anode->btree.u.internal[0].down = cpu_to_le32(a);
| ~~~~~~~~~~~~~~~~~~~~~~~^~~
In file included from fs/hpfs/hpfs_fn.h:26,
from fs/hpfs/anode.c:10:
fs/hpfs/hpfs.h:412:32: note: while referencing 'internal'
412 | struct bplus_internal_node internal[0]; /* (internal) 2-word entries giving
| ^~~~~~~~
drivers/net/can/usb/etas_es58x/es58x_fd.c: In function 'es58x_fd_tx_can_msg':
drivers/net/can/usb/etas_es58x/es58x_fd.c:360:35: warning: array subscript 65535 is outside the bounds of an interior zero-length array 'u8[0]' {aka 'unsigned char[]'} [-Wzero-length-bounds]
360 | tx_can_msg = (typeof(tx_can_msg))&es58x_fd_urb_cmd->raw_msg[msg_len];
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from drivers/net/can/usb/etas_es58x/es58x_core.h:22,
from drivers/net/can/usb/etas_es58x/es58x_fd.c:17:
drivers/net/can/usb/etas_es58x/es58x_fd.h:231:6: note: while referencing 'raw_msg'
231 | u8 raw_msg[0];
| ^~~~~~~
However, it _is_ entirely possible to have one or more flexible arrays
in a struct or union: it just has to be in another struct. And since it
cannot be alone in a struct, such a struct must have at least 1 other
named member -- but that member can be zero sized. Wrap all this nonsense
into the new DECLARE_FLEX_ARRAY() in support of having flexible arrays
in unions (or alone in a struct).
As with struct_group(), since this is needed in UAPI headers as well,
implement the core there, with a non-UAPI wrapper.
Additionally update kernel-doc to understand its existence.
https://github.com/KSPP/linux/issues/137
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Handling the migration of TSCs correctly is difficult, in part because
Linux does not provide userspace with the ability to retrieve a (TSC,
realtime) clock pair for a single instant in time. In lieu of a more
convenient facility, KVM can report similar information in the kvm_clock
structure.
Provide userspace with a host TSC & realtime pair iff the realtime clock
is based on the TSC. If userspace provides KVM_SET_CLOCK with a valid
realtime value, advance the KVM clock by the amount of elapsed time. Do
not step the KVM clock backwards, though, as it is a monotonic
oscillator.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210916181538.968978-5-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
BSM or Backup Switch Mode is a common feature on RTCs, allowing to select
how the RTC will decide when to switch from its primary power supply to the
backup power supply. It is necessary to be able to set it from userspace as
there are uses cases where it has to be done dynamically.
Supported values are:
RTC_BSM_DISABLED: disabled
RTC_BSM_DIRECT: switching will happen as soon as Vbackup > Vdd
RTC_BSM_LEVEL: switching will happen around a threshold, usually with an
hysteresis
RTC_BSM_STANDBY: switching will not happen until Vdd > Vbackup, this is
useful to ensure the RTC doesn't draw any power until the device is first
powered on.
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20211018151933.76865-6-alexandre.belloni@bootlin.com
|
|
Add a new parameter allowing the get and set the correction using ioctls
instead of just sysfs.
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20211018151933.76865-5-alexandre.belloni@bootlin.com
|
|
Add a new feature for RTCs able to correct the oscillator imprecision. This
is also called offset or trimming. Such drivers have a .set_offset callback,
use that to set the feature bit from the core.
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20211018151933.76865-4-alexandre.belloni@bootlin.com
|
|
Add an ioctl allowing to get and set extra parameters for an RTC. For now,
only handle getting available features.
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20211018151933.76865-3-alexandre.belloni@bootlin.com
|
|
Add more alarm related features to be declared by drivers.
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20211018151933.76865-2-alexandre.belloni@bootlin.com
|
|
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS for net-next:
1) Add new run_estimation toggle to IPVS to stop the estimation_timer
logic, from Dust Li.
2) Relax superfluous dynset check on NFT_SET_TIMEOUT.
3) Add egress hook, from Lukas Wunner.
4) Nowadays, almost all hook functions in x_table land just call the hook
evaluation loop. Remove remaining hook wrappers from iptables and IPVS.
From Florian Westphal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a new EtherType ETH_P_REALTEK to the if_ether.h uapi header. The
EtherType 0x8899 is used in a number of different protocols from Realtek
Semiconductor Corp [1], so no general assumptions should be made when
trying to decode such packets. Observed protocols include:
0x1 - Realtek Remote Control protocol [2]
0x2 - Echo protocol [2]
0x3 - Loop detection protocol [2]
0x4 - RTL8365MB 4- and 8-byte switch CPU tag protocols [3]
0x9 - RTL8306 switch CPU tag protocol [4]
0xA - RTL8366RB switch CPU tag protocol [4]
[1] https://lore.kernel.org/netdev/CACRpkdYQthFgjwVzHyK3DeYUOdcYyWmdjDPG=Rf9B3VrJ12Rzg@mail.gmail.com/
[2] https://www.wireshark.org/lists/ethereal-dev/200409/msg00090.html
[3] https://lore.kernel.org/netdev/20210822193145.1312668-4-alvin@pqrs.dk/
[4] https://lore.kernel.org/netdev/20200708122537.1341307-2-linus.walleij@linaro.org/
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We currently have some implicit padding in struct sockaddr_mctp. This
patch makes this padding explicit, and ensures we have consistent
layout on platforms with <32bit alignmnent.
Fixes: 60fc63981693 ("mctp: Add sockaddr_mctp to uapi")
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use the more precise __kernel_sa_family_t for smctp_family, to match
struct sockaddr.
Also, use an unsigned int for the network member; negative networks
don't make much sense. We're already using unsigned for mctp_dev and
mctp_skb_cb, but need to change mctp_sock to suit.
Fixes: 60fc63981693 ("mctp: Add sockaddr_mctp to uapi")
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Acked-by: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We need the char/misc fixes in here for merging and testing.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This patch introduces a character device interface for the Counter
subsystem. Device data is exposed through standard character device read
operations. Device data is gathered when a Counter event is pushed by
the respective Counter device driver. Configuration is handled via ioctl
operations on the respective Counter character device node.
Cc: David Lechner <david@lechnology.com>
Cc: Gwendal Grignou <gwendal@chromium.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com>
Link: https://lore.kernel.org/r/b8b8c64b4065aedff43699ad1f0e2f8d1419c15b.1632884256.git.vilhelm.gray@gmail.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
|
|
This is in preparation for a subsequent patch implementing a character
device interface for the Counter subsystem.
Reviewed-by: David Lechner <david@lechnology.com>
Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com>
Link: https://lore.kernel.org/r/962a5f2027fafcf4f77c10e1baf520463960d1ee.1632884256.git.vilhelm.gray@gmail.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
|
|
Implement the netlink support for SMC-Rv2 related attributes that are
provided to user space.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add TCA_FQ_CODEL_CE_THRESHOLD_ECT1 boolean option to select Low Latency,
Low Loss, Scalable Throughput (L4S) style marking, along with ce_threshold.
If enabled, only packets with ECT(1) can be transformed to CE
if their sojourn time is above the ce_threshold.
Note that this new option does not change rules for codel law.
In particular, if TCA_FQ_CODEL_ECN is left enabled (this is
the default when fq_codel qdisc is created), ECT(0) packets can
still get CE if codel law (as governed by limit/target) decides so.
Section 4.3.b of current draft [1] states:
b. A scheduler with per-flow queues such as FQ-CoDel or FQ-PIE can
be used for L4S. For instance within each queue of an FQ-CoDel
system, as well as a CoDel AQM, there is typically also ECN
marking at an immediate (unsmoothed) shallow threshold to support
use in data centres (see Sec.5.2.7 of [RFC8290]). This can be
modified so that the shallow threshold is solely applied to
ECT(1) packets. Then if there is a flow of non-ECN or ECT(0)
packets in the per-flow-queue, the Classic AQM (e.g. CoDel) is
applied; while if there is a flow of ECT(1) packets in the queue,
the shallower (typically sub-millisecond) threshold is applied.
Tested:
tc qd replace dev eth1 root fq_codel ce_threshold_ect1 50usec
netperf ... -t TCP_STREAM -- K dctcp
tc -s -d qd sh dev eth1
qdisc fq_codel 8022: root refcnt 32 limit 10240p flows 1024 quantum 9212 target 5ms ce_threshold_ect1 49us interval 100ms memory_limit 32Mb ecn drop_batch 64
Sent 14388596616 bytes 9543449 pkt (dropped 0, overlimits 0 requeues 152013)
backlog 0b 0p requeues 152013
maxpacket 68130 drop_overlimit 0 new_flow_count 95678 ecn_mark 0 ce_mark 7639
new_flows_len 0 old_flows_len 0
[1] L4S current draft:
https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-l4s-arch
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Ingemar Johansson S <ingemar.s.johansson@ericsson.com>
Cc: Tom Henderson <tomh@tomh.org>
Cc: Bob Briscoe <in@bobbriscoe.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
PEBS-via-PT records contain a mask of applicable counters. To identify
which event belongs to which counter, a side-band event is needed. Until
now, there has been no side-band event, and consequently users were limited
to using a single event.
Add such a side-band event. Note the event is optimised to output only
when the counter index changes for an event. That works only so long as
all PEBS-via-PT events are scheduled together, which they are for a
recording session because they are in a single group.
Also no attribute bit is used to select the new event, so a new
kernel is not compatible with older perf tools. The assumption
being that PEBS-via-PT is sufficiently esoteric that users will not
be troubled by this.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210907163903.11820-2-adrian.hunter@intel.com
|
|
Support classifying packets with netfilter on egress to satisfy user
requirements such as:
* outbound security policies for containers (Laura)
* filtering and mangling intra-node Direct Server Return (DSR) traffic
on a load balancer (Laura)
* filtering locally generated traffic coming in through AF_PACKET,
such as local ARP traffic generated for clustering purposes or DHCP
(Laura; the AF_PACKET plumbing is contained in a follow-up commit)
* L2 filtering from ingress and egress for AVB (Audio Video Bridging)
and gPTP with nftables (Pablo)
* in the future: in-kernel NAT64/NAT46 (Pablo)
The egress hook introduced herein complements the ingress hook added by
commit e687ad60af09 ("netfilter: add netfilter ingress hook after
handle_ing() under unique static key"). A patch for nftables to hook up
egress rules from user space has been submitted separately, so users may
immediately take advantage of the feature.
Alternatively or in addition to netfilter, packets can be classified
with traffic control (tc). On ingress, packets are classified first by
tc, then by netfilter. On egress, the order is reversed for symmetry.
Conceptually, tc and netfilter can be thought of as layers, with
netfilter layered above tc.
Traffic control is capable of redirecting packets to another interface
(man 8 tc-mirred). E.g., an ingress packet may be redirected from the
host namespace to a container via a veth connection:
tc ingress (host) -> tc egress (veth host) -> tc ingress (veth container)
In this case, netfilter egress classifying is not performed when leaving
the host namespace! That's because the packet is still on the tc layer.
If tc redirects the packet to a physical interface in the host namespace
such that it leaves the system, the packet is never subjected to
netfilter egress classifying. That is only logical since it hasn't
passed through netfilter ingress classifying either.
Packets can alternatively be redirected at the netfilter layer using
nft fwd. Such a packet *is* subjected to netfilter egress classifying
since it has reached the netfilter layer.
Internally, the skb->nf_skip_egress flag controls whether netfilter is
invoked on egress by __dev_queue_xmit(). Because __dev_queue_xmit() may
be called recursively by tunnel drivers such as vxlan, the flag is
reverted to false after sch_handle_egress(). This ensures that
netfilter is applied both on the overlay and underlying network.
Interaction between tc and netfilter is possible by setting and querying
skb->mark.
If netfilter egress classifying is not enabled on any interface, it is
patched out of the data path by way of a static_key and doesn't make a
performance difference that is discernible from noise:
Before: 1537 1538 1538 1537 1538 1537 Mb/sec
After: 1536 1534 1539 1539 1539 1540 Mb/sec
Before + tc accept: 1418 1418 1418 1419 1419 1418 Mb/sec
After + tc accept: 1419 1424 1418 1419 1422 1420 Mb/sec
Before + tc drop: 1620 1619 1619 1619 1620 1620 Mb/sec
After + tc drop: 1616 1624 1625 1624 1622 1619 Mb/sec
When netfilter egress classifying is enabled on at least one interface,
a minimal performance penalty is incurred for every egress packet, even
if the interface it's transmitted over doesn't have any netfilter egress
rules configured. That is caused by checking dev->nf_hooks_egress
against NULL.
Measurements were performed on a Core i7-3615QM. Commands to reproduce:
ip link add dev foo type dummy
ip link set dev foo up
modprobe pktgen
echo "add_device foo" > /proc/net/pktgen/kpktgend_3
samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -i foo -n 400000000 -m "11:11:11:11:11:11" -d 1.1.1.1
Accept all traffic with tc:
tc qdisc add dev foo clsact
tc filter add dev foo egress bpf da bytecode '1,6 0 0 0,'
Drop all traffic with tc:
tc qdisc add dev foo clsact
tc filter add dev foo egress bpf da bytecode '1,6 0 0 2,'
Apply this patch when measuring packet drops to avoid errors in dmesg:
https://lore.kernel.org/netdev/a73dda33-57f4-95d8-ea51-ed483abd6a7a@iogearbox.net/
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Laura García Liébana <nevola@gmail.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Allow a user space control plane to insert entries with a new NTF_EXT_MANAGED
flag. The flag then indicates to the kernel that the neighbor entry should be
periodically probed for keeping the entry in NUD_REACHABLE state iff possible.
The use case for this is targeting XDP or tc BPF load-balancers which use
the bpf_fib_lookup() BPF helper in order to piggyback on neighbor resolution
for their backends. Given they cannot be resolved in fast-path, a control
plane inserts the L3 (without L2) entries manually into the neighbor table
and lets the kernel do the neighbor resolution either on the gateway or on
the backend directly in case the latter resides in the same L2. This avoids
to deal with L2 in the control plane and to rebuild what the kernel already
does best anyway.
NTF_EXT_MANAGED can be combined with NTF_EXT_LEARNED in order to avoid GC
eviction. The kernel then adds NTF_MANAGED flagged entries to a per-neighbor
table which gets triggered by the system work queue to periodically call
neigh_event_send() for performing the resolution. The implementation allows
migration from/to NTF_MANAGED neighbor entries, so that already existing
entries can be converted by the control plane if needed. Potentially, we could
make the interval for periodically calling neigh_event_send() configurable;
right now it's set to DELAY_PROBE_TIME which is also in line with mlxsw which
has similar driver-internal infrastructure c723c735fa6b ("mlxsw: spectrum_router:
Periodically update the kernel's neigh table"). In future, the latter could
possibly reuse the NTF_MANAGED neighbors as well.
Example:
# ./ip/ip n replace 192.168.178.30 dev enp5s0 managed extern_learn
# ./ip/ip n
192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a managed extern_learn REACHABLE
[...]
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Roopa Prabhu <roopa@nvidia.com>
Link: https://linuxplumbersconf.org/event/11/contributions/953/
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, all bits in struct ndmsg's ndm_flags are used up with the most
recent addition of 435f2e7cc0b7 ("net: bridge: add support for sticky fdb
entries"). This makes it impossible to extend the neighboring subsystem
with new NTF_* flags:
struct ndmsg {
__u8 ndm_family;
__u8 ndm_pad1;
__u16 ndm_pad2;
__s32 ndm_ifindex;
__u16 ndm_state;
__u8 ndm_flags;
__u8 ndm_type;
};
There are ndm_pad{1,2} attributes which are not used. However, due to
uncareful design, the kernel does not enforce them to be zero upon new
neighbor entry addition, and given they've been around forever, it is
not possible to reuse them today due to risk of breakage. One option to
overcome this limitation is to add a new NDA_FLAGS_EXT attribute for
extended flags.
In struct neighbour, there is a 3 byte hole between protocol and ha_lock,
which allows neigh->flags to be extended from 8 to 32 bits while still
being on the same cacheline as before. This also allows for all future
NTF_* flags being in neigh->flags rather than yet another flags field.
Unknown flags in NDA_FLAGS_EXT will be rejected by the kernel.
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Roopa Prabhu <roopa@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
drm-misc-next for v5.16:
UAPI Changes:
- Allow empty drm leases for creating separate GEM namespaces.
Cross-subsystem Changes:
- Slightly rework dma_buf_poll.
- Add dma_resv_for_each_fence_unlocked to iterate, and use it inside
the lockless dma-resv functions.
Core Changes:
- Allow devm_drm_of_get_bridge to build without CONFIG_OF for compile testing.
- Add more DP2 headers.
- fix CONFIG_FB dependency in fb_helper.
- Add DRM_FORMAT_R8 to drm_format_info, and helpers for RGB332 and RGB888.
- Fix crash on a 0 or invalid EDID.
Driver Changes:
- Apply and revert DRM_MODESET_LOCK_ALL_BEGIN.
- Add mode_valid to ti-sn65dsi86 bridge.
- Support multiple syncobjs in v3d.
- Add R8, RGB332 and RGB888 pixel formats to GUD.
- Use devm_add_action_or_reset in dw-hdmi-cec.
Signed-off-by: Dave Airlie <airlied@redhat.com>
# gpg: Signature made Wed 06 Oct 2021 20:48:12 AEST
# gpg: using RSA key B97BD6A80CAC4981091AE547FE558C72A67013C3
# gpg: Good signature from "Maarten Lankhorst <maarten.lankhorst@linux.intel.com>" [expired]
# gpg: aka "Maarten Lankhorst <maarten@debian.org>" [expired]
# gpg: aka "Maarten Lankhorst <maarten.lankhorst@canonical.com>" [expired]
# gpg: Note: This key has expired!
# Primary key fingerprint: B97B D6A8 0CAC 4981 091A E547 FE55 8C72 A670 13C3
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/2602f4e9-a8ac-83f8-6c2a-39fd9ca2e1ba@linux.intel.com
|
|
Reuse the timeval compat code from core/sock to handle 32-bit and
64-bit timeval structures. Also introduce a new socket option define
to allow using y2038 safe timeval under 32-bit.
The existing behavior of sock_set_timeout and vsock's timeout setter
differ when the time value is out of bounds. vsocks current behavior
is retained at the expense of not being able to share the full
implementation.
This allows the LTP test vsock01 to pass under 32-bit compat mode.
Fixes: fe0c72f3db11 ("socket: move compat timeout handling into sock.c")
Signed-off-by: Richard Palethorpe <rpalethorpe@suse.com>
Cc: Richard Palethorpe <rpalethorpe@richiejp.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
No conflicts.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull networking fixes from Jakub Kicinski:
"Including fixes from xfrm, bpf, netfilter, and wireless.
Current release - regressions:
- xfrm: fix XFRM_MSG_MAPPING ABI breakage caused by inserting a new
value in the middle of an enum
- unix: fix an issue in unix_shutdown causing the other end
read/write failures
- phy: mdio: fix memory leak
Current release - new code bugs:
- mlx5e: improve MQPRIO resiliency against bad configs
Previous releases - regressions:
- bpf: fix integer overflow leading to OOB access in map element
pre-allocation
- stmmac: dwmac-rk: fix ethernet on rk3399 based devices
- netfilter: conntrack: fix boot failure with
nf_conntrack.enable_hooks=1
- brcmfmac: revert using ISO3166 country code and 0 rev as fallback
- i40e: fix freeing of uninitialized misc IRQ vector
- iavf: fix double unlock of crit_lock
Previous releases - always broken:
- bpf, arm: fix register clobbering in div/mod implementation
- netfilter: nf_tables: correct issues in netlink rule change event
notifications
- dsa: tag_dsa: fix mask for trunked packets
- usb: r8152: don't resubmit rx immediately to avoid soft lockup on
device unplug
- i40e: fix endless loop under rtnl if FW fails to correctly respond
to capability query
- mlx5e: fix rx checksum offload coexistence with ipsec offload
- mlx5: force round second at 1PPS out start time and allow it only
in supported clock modes
- phy: pcs: xpcs: fix incorrect CL37 AN sequence, EEE disable
sequence
Misc:
- xfrm: slightly rejig the new policy uAPI to make it less cryptic"
* tag 'net-5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (66 commits)
net: prefer socket bound to interface when not in VRF
iavf: fix double unlock of crit_lock
i40e: Fix freeing of uninitialized misc IRQ vector
i40e: fix endless loop under rtnl
dt-bindings: net: dsa: marvell: fix compatible in example
ionic: move filter sync_needed bit set
gve: report 64bit tx_bytes counter from gve_handle_report_stats()
gve: fix gve_get_stats()
rtnetlink: fix if_nlmsg_stats_size() under estimation
gve: Properly handle errors in gve_assign_qpl
gve: Avoid freeing NULL pointer
gve: Correct available tx qpl check
unix: Fix an issue in unix_shutdown causing the other end read/write failures
net: stmmac: trigger PCS EEE to turn off on link down
net: pcs: xpcs: fix incorrect steps on disable EEE
netlink: annotate data races around nlk->bound
net: pcs: xpcs: fix incorrect CL37 AN sequence
net: sfp: Fix typo in state machine debug string
net/sched: sch_taprio: properly cancel timer from taprio_destroy()
net: bridge: fix under estimation in br_get_linkxstats_size()
...
|
|
Pull hyperv fixes from Wei Liu:
- Replace uuid.h with types.h in a header (Andy Shevchenko)
- Avoid sleeping in atomic context in PCI driver (Long Li)
- Avoid sending IPI to self when it shouldn't (Vitaly Kuznetsov)
* tag 'hyperv-fixes-signed-20211007' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/hyperv: Avoid erroneously sending IPI to 'self'
hyper-v: Replace uuid.h with types.h
PCI: hv: Fix sleep while in non-sleep context when removing child devices from the bus
|
|
Define a macro PCI_EXP_DEVCTL_PAYLOAD_* for every possible Max Payload
Size in linux/pci_regs.h, in the same style as PCI_EXP_DEVCTL_READRQ_*.
Link: https://lore.kernel.org/r/20211005180952.6812-2-kabel@kernel.org
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Add support to wait on multiple futexes. This is the interface
implemented by this syscall:
futex_waitv(struct futex_waitv *waiters, unsigned int nr_futexes,
unsigned int flags, struct timespec *timeout, clockid_t clockid)
struct futex_waitv {
__u64 val;
__u64 uaddr;
__u32 flags;
__u32 __reserved;
};
Given an array of struct futex_waitv, wait on each uaddr. The thread
wakes if a futex_wake() is performed at any uaddr. The syscall returns
immediately if any waiter has *uaddr != val. *timeout is an optional
absolute timeout value for the operation. This syscall supports only
64bit sized timeout structs. The flags argument of the syscall should be
empty, but it can be used for future extensions. Flags for shared
futexes, sizes, etc. should be used on the individual flags of each
waiter.
__reserved is used for explicit padding and should be 0, but it might be
used for future extensions. If the userspace uses 32-bit pointers, it
should make sure to explicitly cast it when assigning to waitv::uaddr.
Returns the array index of one of the woken futexes. There’s no given
information of how many were woken, or any particular attribute of it
(if it’s the first woken, if it is of the smaller index...).
Signed-off-by: André Almeida <andrealmeid@collabora.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210923171111.300673-17-andrealmeid@collabora.com
|
|
ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2021-10-07
1) Fix a sysbot reported shift-out-of-bounds in xfrm_get_default.
From Pavel Skripkin.
2) Fix XFRM_MSG_MAPPING ABI breakage. The new XFRM_MSG_MAPPING
messages were accidentally not paced at the end.
Fix by Eugene Syromiatnikov.
3) Fix the uapi for the default policy, use explicit field and macros
and make it accessible to userland.
From Nicolas Dichtel.
4) Fix a missing rcu lock in xfrm_notify_userpolicy().
From Nicolas Dichtel.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add an extended state and sub-state to describe link issues related to
transceiver modules.
The 'ETHTOOL_LINK_EXT_SUBSTATE_MODULE_CMIS_NOT_READY' extended sub-state
tells user space that port is unable to gain a carrier because the CMIS
Module State Machine did not reach the ModuleReady (Fully Operational)
state. For example, if the module is stuck at ModuleLowPwr or
ModuleFault state. In case of the latter, user space can read the fault
reason from the module's EEPROM and potentially reset it.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a pair of new ethtool messages, 'ETHTOOL_MSG_MODULE_SET' and
'ETHTOOL_MSG_MODULE_GET', that can be used to control transceiver
modules parameters and retrieve their status.
The first parameter to control is the power mode of the module. It is
only relevant for paged memory modules, as flat memory modules always
operate in low power mode.
When a paged memory module is in low power mode, its power consumption
is reduced to the minimum, the management interface towards the host is
available and the data path is deactivated.
User space can choose to put modules that are not currently in use in
low power mode and transition them to high power mode before putting the
associated ports administratively up. This is useful for user space that
favors reduced power consumption and lower temperatures over reduced
link up times. In QSFP-DD modules the transition from low power mode to
high power mode can take a few seconds and this transition is only
expected to get longer with future / more complex modules.
User space can control the power mode of the module via the power mode
policy attribute ('ETHTOOL_A_MODULE_POWER_MODE_POLICY'). Possible
values:
* high: Module is always in high power mode.
* auto: Module is transitioned by the host to high power mode when the
first port using it is put administratively up and to low power mode
when the last port using it is put administratively down.
The operational power mode of the module is available to user space via
the 'ETHTOOL_A_MODULE_POWER_MODE' attribute. The attribute is not
reported to user space when a module is not plugged-in.
The user API is designed to be generic enough so that it could be used
for modules with different memory maps (e.g., SFF-8636, CMIS).
The only implementation of the device driver API in this series is for a
MAC driver (mlxsw) where the module is controlled by the device's
firmware, but it is designed to be generic enough so that it could also
be used by implementations where the module is controlled by the CPU.
CMIS testing
============
# ethtool -m swp11
Identifier : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
...
Module State : 0x03 (ModuleReady)
LowPwrAllowRequestHW : Off
LowPwrRequestSW : Off
The module is not in low power mode, as it is not forced by hardware
(LowPwrAllowRequestHW is off) or by software (LowPwrRequestSW is off).
The power mode can be queried from the kernel. In case
LowPwrAllowRequestHW was on, the kernel would need to take into account
the state of the LowPwrRequestHW signal, which is not visible to user
space.
$ ethtool --show-module swp11
Module parameters for swp11:
power-mode-policy high
power-mode high
Change the power mode policy to 'auto':
# ethtool --set-module swp11 power-mode-policy auto
Query the power mode again:
$ ethtool --show-module swp11
Module parameters for swp11:
power-mode-policy auto
power-mode low
Verify with the data read from the EEPROM:
# ethtool -m swp11
Identifier : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
...
Module State : 0x01 (ModuleLowPwr)
LowPwrAllowRequestHW : Off
LowPwrRequestSW : On
Put the associated port administratively up which will instruct the host
to transition the module to high power mode:
# ip link set dev swp11 up
Query the power mode again:
$ ethtool --show-module swp11
Module parameters for swp11:
power-mode-policy auto
power-mode high
Verify with the data read from the EEPROM:
# ethtool -m swp11
Identifier : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
...
Module State : 0x03 (ModuleReady)
LowPwrAllowRequestHW : Off
LowPwrRequestSW : Off
Put the associated port administratively down which will instruct the
host to transition the module to low power mode:
# ip link set dev swp11 down
Query the power mode again:
$ ethtool --show-module swp11
Module parameters for swp11:
power-mode-policy auto
power-mode low
Verify with the data read from the EEPROM:
# ethtool -m swp11
Identifier : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
...
Module State : 0x01 (ModuleLowPwr)
LowPwrAllowRequestHW : Off
LowPwrRequestSW : On
SFF-8636 testing
================
# ethtool -m swp13
Identifier : 0x11 (QSFP28)
...
Extended identifier description : 5.0W max. Power consumption, High Power Class (> 3.5 W) enabled
Power set : Off
Power override : On
...
Transmit avg optical power (Channel 1) : 0.7733 mW / -1.12 dBm
Transmit avg optical power (Channel 2) : 0.7649 mW / -1.16 dBm
Transmit avg optical power (Channel 3) : 0.7790 mW / -1.08 dBm
Transmit avg optical power (Channel 4) : 0.7837 mW / -1.06 dBm
Rcvr signal avg optical power(Channel 1) : 0.9302 mW / -0.31 dBm
Rcvr signal avg optical power(Channel 2) : 0.9079 mW / -0.42 dBm
Rcvr signal avg optical power(Channel 3) : 0.8993 mW / -0.46 dBm
Rcvr signal avg optical power(Channel 4) : 0.8778 mW / -0.57 dBm
The module is not in low power mode, as it is not forced by hardware
(Power override is on) or by software (Power set is off).
The power mode can be queried from the kernel. In case Power override
was off, the kernel would need to take into account the state of the
LPMode signal, which is not visible to user space.
$ ethtool --show-module swp13
Module parameters for swp13:
power-mode-policy high
power-mode high
Change the power mode policy to 'auto':
# ethtool --set-module swp13 power-mode-policy auto
Query the power mode again:
$ ethtool --show-module swp13
Module parameters for swp13:
power-mode-policy auto
power-mode low
Verify with the data read from the EEPROM:
# ethtool -m swp13
Identifier : 0x11 (QSFP28)
Extended identifier description : 5.0W max. Power consumption, High Power Class (> 3.5 W) not enabled
Power set : On
Power override : On
...
Transmit avg optical power (Channel 1) : 0.0000 mW / -inf dBm
Transmit avg optical power (Channel 2) : 0.0000 mW / -inf dBm
Transmit avg optical power (Channel 3) : 0.0000 mW / -inf dBm
Transmit avg optical power (Channel 4) : 0.0000 mW / -inf dBm
Rcvr signal avg optical power(Channel 1) : 0.0000 mW / -inf dBm
Rcvr signal avg optical power(Channel 2) : 0.0000 mW / -inf dBm
Rcvr signal avg optical power(Channel 3) : 0.0000 mW / -inf dBm
Rcvr signal avg optical power(Channel 4) : 0.0000 mW / -inf dBm
Put the associated port administratively up which will instruct the host
to transition the module to high power mode:
# ip link set dev swp13 up
Query the power mode again:
$ ethtool --show-module swp13
Module parameters for swp13:
power-mode-policy auto
power-mode high
Verify with the data read from the EEPROM:
# ethtool -m swp13
Identifier : 0x11 (QSFP28)
...
Extended identifier description : 5.0W max. Power consumption, High Power Class (> 3.5 W) enabled
Power set : Off
Power override : On
...
Transmit avg optical power (Channel 1) : 0.7934 mW / -1.01 dBm
Transmit avg optical power (Channel 2) : 0.7859 mW / -1.05 dBm
Transmit avg optical power (Channel 3) : 0.7885 mW / -1.03 dBm
Transmit avg optical power (Channel 4) : 0.7985 mW / -0.98 dBm
Rcvr signal avg optical power(Channel 1) : 0.9325 mW / -0.30 dBm
Rcvr signal avg optical power(Channel 2) : 0.9034 mW / -0.44 dBm
Rcvr signal avg optical power(Channel 3) : 0.9086 mW / -0.42 dBm
Rcvr signal avg optical power(Channel 4) : 0.8885 mW / -0.51 dBm
Put the associated port administratively down which will instruct the
host to transition the module to low power mode:
# ip link set dev swp13 down
Query the power mode again:
$ ethtool --show-module swp13
Module parameters for swp13:
power-mode-policy auto
power-mode low
Verify with the data read from the EEPROM:
# ethtool -m swp13
Identifier : 0x11 (QSFP28)
...
Extended identifier description : 5.0W max. Power consumption, High Power Class (> 3.5 W) not enabled
Power set : On
Power override : On
...
Transmit avg optical power (Channel 1) : 0.0000 mW / -inf dBm
Transmit avg optical power (Channel 2) : 0.0000 mW / -inf dBm
Transmit avg optical power (Channel 3) : 0.0000 mW / -inf dBm
Transmit avg optical power (Channel 4) : 0.0000 mW / -inf dBm
Rcvr signal avg optical power(Channel 1) : 0.0000 mW / -inf dBm
Rcvr signal avg optical power(Channel 2) : 0.0000 mW / -inf dBm
Rcvr signal avg optical power(Channel 3) : 0.0000 mW / -inf dBm
Rcvr signal avg optical power(Channel 4) : 0.0000 mW / -inf dBm
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There is no user of anything in uuid.h in the hyperv.h. Replace it with
more appropriate types.h.
Fixes: f081bbb3fd03 ("hyper-v: Remove internal types from UAPI header")
Reported-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://lore.kernel.org/r/20211001135544.1823-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
The ACRN hypervisor can emulate a virtual device within hypervisor for a
Guest VM. The emulated virtual device can work without the ACRN
userspace after creation. The hypervisor do the emulation of that device.
To support the virtual device creating/destroying, HSM provides the
following ioctls:
- ACRN_IOCTL_CREATE_VDEV
Pass data struct acrn_vdev from userspace to the hypervisor, and inform
the hypervisor to create a virtual device for a User VM.
- ACRN_IOCTL_DESTROY_VDEV
Pass data struct acrn_vdev from userspace to the hypervisor, and inform
the hypervisor to destroy a virtual device of a User VM.
These new APIs will be used by user space code vm_add_hv_vdev and
vm_remove_hv_vdev in
https://github.com/projectacrn/acrn-hypervisor/blob/master/devicemodel/core/vmmapi.c
Signed-off-by: Shuo Liu <shuo.a.liu@intel.com>
Signed-off-by: Fei Li <fei1.li@intel.com>
Link: https://lore.kernel.org/r/20210923084128.18902-3-fei1.li@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
MMIO device passthrough enables an OS in a virtual machine to directly
access a MMIO device in the host. It promises almost the native
performance, which is required in performance-critical scenarios of
ACRN.
HSM provides the following ioctls:
- Assign - ACRN_IOCTL_ASSIGN_MMIODEV
Pass data struct acrn_mmiodev from userspace to the hypervisor, and
inform the hypervisor to assign a MMIO device to a User VM.
- De-assign - ACRN_IOCTL_DEASSIGN_PCIDEV
Pass data struct acrn_mmiodev from userspace to the hypervisor, and
inform the hypervisor to de-assign a MMIO device from a User VM.
These new APIs will be used by user space code vm_assign_mmiodev and
vm_deassign_mmiodev in
https://github.com/projectacrn/acrn-hypervisor/blob/master/devicemodel/core/vmmapi.c
Signed-off-by: Shuo Liu <shuo.a.liu@intel.com>
Signed-off-by: Fei Li <fei1.li@intel.com>
Link: https://lore.kernel.org/r/20210923084128.18902-2-fei1.li@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
An application has come up that has a device sitting right on the IPMB
that would like to communicate with the BMC on the IPMB using normal
IPMI commands.
Sending these commands and handling the responses is easy enough, no
modifications are needed to the IPMI infrastructure. But if this is an
application that also needs to receive IPMB commands and respond, some
way is needed to handle these incoming commands and send the responses.
Currently, the IPMI message handler only sends commands to the interface
and only receives responses from interface. This change extends the
interface to receive commands/responses and send commands/responses.
These are formatted differently in support of receiving/sending IPMB
messages directly.
Signed-off-by: Corey Minyard <minyard@acm.org>
Tested-by: Andrew Manley <andrew.manley@sealingtech.com>
Reviewed-by: Andrew Manley <andrew.manley@sealingtech.com>
|
|
Spell "RESPONSE" correctly in a comment.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
|
|
Since the openat2(2) syscall uses a struct open_how pointer to communicate
its parameters they are not usefully recorded by the audit SYSCALL record's
four existing arguments.
Add a new audit record type OPENAT2 that reports the parameters in its
third argument, struct open_how with fields oflag, mode and resolve.
The new record in the context of an event would look like:
time->Wed Mar 17 16:28:53 2021
type=PROCTITLE msg=audit(1616012933.531:184): proctitle=
73797363616C6C735F66696C652F6F70656E617432002F746D702F61756469742D
7465737473756974652D737641440066696C652D6F70656E617432
type=PATH msg=audit(1616012933.531:184): item=1 name="file-openat2"
inode=29 dev=00:1f mode=0100600 ouid=0 ogid=0 rdev=00:00
obj=unconfined_u:object_r:user_tmp_t:s0 nametype=CREATE
cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=PATH msg=audit(1616012933.531:184):
item=0 name="/root/rgb/git/audit-testsuite/tests"
inode=25 dev=00:1f mode=040700 ouid=0 ogid=0 rdev=00:00
obj=unconfined_u:object_r:user_tmp_t:s0 nametype=PARENT
cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=CWD msg=audit(1616012933.531:184):
cwd="/root/rgb/git/audit-testsuite/tests"
type=OPENAT2 msg=audit(1616012933.531:184):
oflag=0100302 mode=0600 resolve=0xa
type=SYSCALL msg=audit(1616012933.531:184): arch=c000003e syscall=437
success=yes exit=4 a0=3 a1=7ffe315f1c53 a2=7ffe315f1550 a3=18
items=2 ppid=528 pid=540 auid=0 uid=0 gid=0 euid=0 suid=0
fsuid=0 egid=0 sgid=0 fsgid=0 tty=ttyS0 ses=1 comm="openat2"
exe="/root/rgb/git/audit-testsuite/tests/syscalls_file/openat2"
subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
key="testsuite-1616012933-bjAUcEPO"
Link: https://lore.kernel.org/r/d23fbb89186754487850367224b060e26f9b7181.1621363275.git.rgb@redhat.com
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
[PM: tweak subject, wrap example, move AUDIT_OPENAT2 to 1337]
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
This patch adds support for the ip6ip6 encapsulation by providing three encap
modes: inline, encap and auto.
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The KVM host kernel is running in HS-mode needs so we need to handle
the SBI calls coming from guest kernel running in VS-mode.
This patch adds SBI v0.1 support in KVM RISC-V. Almost all SBI v0.1
calls are implemented in KVM kernel module except GETCHAR and PUTCHART
calls which are forwarded to user space because these calls cannot be
implemented in kernel space. In future, when we implement SBI v0.2 for
Guest, we will forward SBI v0.2 experimental and vendor extension calls
to user space.
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
A small part of the declaration concerning filehandle format are
currently in the "uapi" include directory:
include/uapi/linux/nfsd/nfsfh.h
There is a lot more to the filehandle format, including "enum fid_type"
and "enum nfsd_fsid" which are not exported via "uapi".
This small part of the filehandle definition is of minimal use outside
of the kernel, and I can find no evidence that an other code is using
it. Certainly nfs-utils and wireshark (The most likely candidates) do not
use these declarations.
So move it out of "uapi" by copying the content from
include/uapi/linux/nfsd/nfsfh.h
into
fs/nfsd/nfsfh.h
A few unnecessary "#include" directives are not copied, and neither is
the #define of fh_auth, which is annotated as being for userspace only.
The copyright claims in the uapi file are identical to those in the nfsd
file, so there is no need to copy those.
The "__u32" style integer types are only needed in "uapi". In
kernel-only code we can use the more familiar "u32" style.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Daniel Borkmann says:
====================
bpf-next 2021-10-02
We've added 85 non-merge commits during the last 15 day(s) which contain
a total of 132 files changed, 13779 insertions(+), 6724 deletions(-).
The main changes are:
1) Massive update on test_bpf.ko coverage for JITs as preparatory work for
an upcoming MIPS eBPF JIT, from Johan Almbladh.
2) Add a batched interface for RX buffer allocation in AF_XDP buffer pool,
with driver support for i40e and ice from Magnus Karlsson.
3) Add legacy uprobe support to libbpf to complement recently merged legacy
kprobe support, from Andrii Nakryiko.
4) Add bpf_trace_vprintk() as variadic printk helper, from Dave Marchevsky.
5) Support saving the register state in verifier when spilling <8byte bounded
scalar to the stack, from Martin Lau.
6) Add libbpf opt-in for stricter BPF program section name handling as part
of libbpf 1.0 effort, from Andrii Nakryiko.
7) Add a document to help clarifying BPF licensing, from Alexei Starovoitov.
8) Fix skel_internal.h to propagate errno if the loader indicates an internal
error, from Kumar Kartikeya Dwivedi.
9) Fix build warnings with -Wcast-function-type so that the option can later
be enabled by default for the kernel, from Kees Cook.
10) Fix libbpf to ignore STT_SECTION symbols in legacy map definitions as it
otherwise errors out when encountering them, from Toke Høiland-Jørgensen.
11) Teach libbpf to recognize specialized maps (such as for perf RB) and
internally remove BTF type IDs when creating them, from Hengqi Chen.
12) Various fixes and improvements to BPF selftests.
====================
Link: https://lore.kernel.org/r/20211002001327.15169-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Each region has an independently configurable number of maximum
snapshots. This information is not reported to userspace, making it not
very discoverable. Fix this by adding a new
DEVLINK_ATTR_REGION_MAX_SNAPSHOST attribute which is used to report this
maximum.
Ex:
$devlink region
pci/0000:af:00.0/nvm-flash: size 10485760 snapshot [] max 1
pci/0000:af:00.0/device-caps: size 4096 snapshot [] max 10
pci/0000:af:00.1/nvm-flash: size 10485760 snapshot [] max 1
pci/0000:af:00.1/device-caps: size 4096 snapshot [] max 10
This information enables users to understand why a new region command
may fail due to having too many existing snapshots.
Reported-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
drivers/net/phy/bcm7xxx.c
d88fd1b546ff ("net: phy: bcm7xxx: Fixed indirect MMD operations")
f68d08c437f9 ("net: phy: bcm7xxx: Add EPHY entry for 72165")
net/sched/sch_api.c
b193e15ac69d ("net: prevent user from passing illegal stab size")
69508d43334e ("net_sched: Use struct_size() and flex_array_size() helpers")
Both cases trivial - adjacent code additions.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch lets user-space request a non-coherent memory
allocation during CREATE_BUFS and REQBUFS ioctl calls.
= CREATE_BUFS
struct v4l2_create_buffers has seven 4-byte reserved areas,
so reserved[0] is renamed to ->flags. The struct, thus, now
has six reserved 4-byte regions.
= CREATE_BUFS32
struct v4l2_create_buffers32 has seven 4-byte reserved areas,
so reserved[0] is renamed to ->flags. The struct, thus, now
has six reserved 4-byte regions.
= REQBUFS
We use one byte of a 4 byte ->reserved[1] member of struct
v4l2_requestbuffers. The struct, thus, now has reserved 3 bytes.
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
|