Age | Commit message (Collapse) | Author | Files | Lines |
|
This is a defense-in-depth measure in response to bugs like
4d6fa57b4dab ("macsec: avoid heap overflow in skb_to_sgvec"). There's
not only a potential overflow of sglist items, but also a stack overflow
potential, so we fix this by limiting the amount of recursion this function
is allowed to do. Not actually providing a bounded base case is a future
disaster that we can easily avoid here.
As a small matter of house keeping, we take this opportunity to move the
documentation comment over the actual function the documentation is for.
While this could be implemented by using an explicit stack of skbuffs,
when implementing this, the function complexity increased considerably,
and I don't think such complexity and bloat is actually worth it. So,
instead I built this and tested it on x86, x86_64, ARM, ARM64, and MIPS,
and measured the stack usage there. I also reverted the recent MIPS
changes that give it a separate IRQ stack, so that I could experience
some worst-case situations. I found that limiting it to 24 layers deep
yielded a good stack usage with room for safety, as well as being much
deeper than any driver actually ever creates.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Howells <dhowells@redhat.com>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This commit updates documentation of the bpf_perf_event_output and
bpf_perf_event_read helpers to match their implementation.
Signed-off-by: Teng Qin <qinteng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allow BPF_PROG_TYPE_PERF_EVENT program types to attach to all
perf_event types, including HW_CACHE, RAW, and dynamic pmu events.
Only tracepoint/kprobe events are treated differently which require
BPF_PROG_TYPE_TRACEPOINT/BPF_PROG_TYPE_KPROBE program types accordingly.
Also add support for reading all event counters using
bpf_perf_event_read() helper.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The command
# arp -s 62.2.0.1 a:b:c:d:e:f dev eth2
adds an entry like the following (listed by "arp -an")
? (62.2.0.1) at 0a:0b:0c:0d:0e:0f [ether] PERM on eth2
but the symmetric deletion command
# arp -i eth2 -d 62.2.0.1
does not remove the PERM entry from the table, and instead leaves behind
? (62.2.0.1) at <incomplete> on eth2
The reason is that there is a refcnt of 1 for the arp_tbl itself
(neigh_alloc starts off the entry with a refcnt of 1), thus
the neigh_release() call from arp_invalidate() will (at best) just
decrement the ref to 1, but will never actually free it from the
table.
To fix this, we need to do something like neigh_forced_gc: if
the refcnt is 1 (i.e., on the table's ref), remove the entry from
the table and free it. This patch refactors and shares common code
between neigh_forced_gc and the newly added neigh_remove_one.
A similar issue exists for IPv6 Neighbor Cache entries, and is fixed
in a similar manner by this patch.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There was no reason for duplicating the code that initializes
ds->enabled_port_mask in both dsa_parse_ports_dn() and
dsa_parse_ports(), instead move this to dsa_ds_parse() which is early
enough before ops->setup() has run.
While at it, we can now make dsa_is_cpu_port() check ds->cpu_port_mask
which is a step towards being multi-CPU port capable.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Benefit from the support of ip header fields dissection and
allow users to set rules matching on ipv4 tos and ttl or
ipv6 traffic-class and hoplimit.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for dissection of ip tos and ttl and ipv6 traffic-class
and hoplimit. Both are dissected into the same struct.
Uses similar call to ip dissection function as with tcp, arp and others.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since last patch, sctp doesn't need to alloc memory for asoc->stream any
more. sctp_stream_new and sctp_stream_init both are used to alloc memory
for stream.in or stream.out, and their names are also confusing.
This patch is to merge them into sctp_stream_init, and only pass stream
and streamcnt parameters into it, instead of the whole asoc.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As Marcelo's suggestion, stream is a fixed size member of asoc and would
not grow with more streams. To avoid an allocation for it, this patch is
to define it as an object instead of pointer and update the places using
it, also create sctp_stream_update() called in sctp_assoc_update() to
migrate the stream info from one stream to another.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2017-06-01
This series contains updates to i40e, i40evf and the "new" AVF virtchnl.
This is the introduction of the Intel(R) Ethernet Adaptive Virtual
Function driver code and device ID, as presented at the NetDEV 1.2
conference in 2016.
http://netdevconf.org/1.2/session.html?anjali-singhai
The idea is to convert the interface between the i40evf driver
and the parent i40e PF driver to be generic, as the i40evf driver
should in the future be able to run on top of other Intel PF
drivers, and negotiate any features beyond a "base expected" set.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Enhance API between qedi and qed, allowing qedi to inform device's
firmware when the iSCSI mac is to be changed.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Storage drivers require images from the nvram in boot-from-SAN
scenarios. This provides the necessary API between qed and the
protocol drivers to perform such reads.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Share several new tidbits with qedf:
- wwpn & wwnn
- Absolute pf-id [this one is actually meant for qedi as well]
- Number of available CQs
While we're at it, now that qedf will be aware of the available CQs
we can add some validation on the inputs it provides.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a new macro MODULE_ALIAS_TEAM_MODE to unify and simplify the
declaration of team mode alias.
Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since dev->dsa_ptr is a pointer to a dsa_switch_tree, there is no need
to have another inline helper just to check rcv.
Remove dsa_uses_tagged_protocol and check dsa_ptr && dsa_ptr->rcv
together at the same time.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The DSA layer uses inline helpers and copy of the tagging functions for
faster access in hot path. Add comments to detail that.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This uses preprocessor tricks to make sure that a divide by zero
occurs if a struct changes size outside the expected number of
bytes.
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
This removes holes and makes structure sizes consistent across
32 and 64 bit builds.
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
This moves a function that is needed for the virtchnl interface
from the i40e PF driver over to the virtchnl.h file.
It was manually verified that the function in question is unchanged
except for the function name and function header, which explains
the slight difference in the number of lines removed/added.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
This patch implements the complete version of the virtchnl.h file
with final renames, and fixes the related code in i40e and i40evf.
It also expands comments, and adds details on the usage of
certain fields.
In addition, due to the changes a couple of casts are needed
to prevent errors found by sparse after renaming some fields.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
This patch fixes up a bunch of whitespace issues introduced
by the previous automated change of name from i40e to virtchnl.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
This change updates the arguments passed to the validate function
and fixes the caller, as well as uses the new return values added to
virtchnl.h
One other minor tweak, remove a duplicate set to zero of valid_len.
This is in preparation for moving the function to virtchnl.h.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
As part of the conversion, change the arguments
to VF_IS_V1[01] macros and move them to virtchnl.h
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
This morphs all the i40e and i40evf references to/in virtchnl.h
to be generic, using only automated methods. Updates all the
callers to use the new names. A followup patch provides separate
clean ups for messy line conversions from these "automatic"
changes, to make them more reviewable.
Was executed with the following sed script:
sed -i -f transform_script drivers/net/ethernet/intel/i40e/i40e_client.c
sed -i -f transform_script drivers/net/ethernet/intel/i40e/i40e_prototype.h
sed -i -f transform_script drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
sed -i -f transform_script drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
sed -i -f transform_script drivers/net/ethernet/intel/i40evf/i40e_common.c
sed -i -f transform_script drivers/net/ethernet/intel/i40evf/i40e_prototype.h
sed -i -f transform_script drivers/net/ethernet/intel/i40evf/i40evf.h
sed -i -f transform_script drivers/net/ethernet/intel/i40evf/i40evf_client.c
sed -i -f transform_script drivers/net/ethernet/intel/i40evf/i40evf_main.c
sed -i -f transform_script drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
sed -i -f transform_script include/linux/avf/virtchnl.h
transform_script:
----8<----
s/I40E_VIRTCHNL_SUPPORTED_QTYPES/SAVE_ME_SUPPORTED_QTYPES/g
s/I40E_VIRTCHNL_VF_CAP/SAVE_ME_VF_CAP/g
s/I40E_VIRTCHNL_/VIRTCHNL_/g
s/i40e_virtchnl_/virtchnl_/g
s/i40e_vfr_/virtchnl_vfr_/g
s/I40E_VFR_/VIRTCHNL_VFR_/g
s/VIRTCHNL_OP_ADD_ETHER_ADDRESS/VIRTCHNL_OP_ADD_ETH_ADDR/g
s/VIRTCHNL_OP_DEL_ETHER_ADDRESS/VIRTCHNL_OP_DEL_ETH_ADDR/g
s/VIRTCHNL_OP_FCOE/VIRTCHNL_OP_RSVD/g
s/SAVE_ME_SUPPORTED_QTYPES/I40E_VIRTCHNL_SUPPORTED_QTYPES/g
s/SAVE_ME_VF_CAP/I40E_VIRTCHNL_VF_CAP/g
----8<----
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
This moves a header for i40evf to include/linux/avf/virtchnl.h.
The directory name AVF is an acronym for the Intel(R) Adaptive
Virtual Function.
This first step creates the new file, which is a rename of
drivers/net/ethernet/intel/i40evf/i40e_virtchnl.h to
include/linux/avf/virtchnl.h, and should show up in git
as a rename when using git log --follow.
To keep things building after the move, the changes to the i40evf
driver are made to point to the new include file location.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
The dwmac-sun8i is a heavy hacked version of stmmac hardware by
allwinner.
In fact the only common part is the descriptor management and the first
register function.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Instead of adding more ifthen logic for adding a new mac_device_info
setup function, it is easier to add a function pointer to the function
needed.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We already have an API struct that contains interrupt-related
numbers. Use it to encapsulate all information relating to the
status of SBs as (used|free).
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The KSZ9477 is a fully integrated layer 2, managed, 7 ports GigE switch
with numerous advanced features. 5 ports incorporate 10/100/1000 Mbps PHYs.
The other 2 ports have interfaces that can be configured as SGMII, RGMII, MII
or RMII. Either of these may connect directly to a host processor or
to an external PHY. The SGMII port may interface to a fiber optic transceiver.
This driver currently supports vlan, fdb, mdb & mirror dsa switch operations.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Woojung Huh <Woojung.Huh@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Adding Microchip 9477 Phy included in KSZ9477 Switch.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Woojung Huh <Woojung.Huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Adding support for the Microchip KSZ switch family tail tagging.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Woojung Huh <Woojung.Huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
teach verifier to track bpf program stack depth
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
free up BPF_JMP | BPF_CALL | BPF_X opcode to be used by actual
indirect call by register and use kernel internal opcode to
mark call instruction into bpf_tail_call() helper.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Forgetting to disable preemption around tcf_action_stats_update()
seems to be a common mistake. Add a helper function for updating
stats on all actions of a filter.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The current dsa_register_switch function takes a useless struct device
pointer argument, which always equals ds->dev.
Drivers either call it with ds->dev, or with the same device pointer
passed to dsa_switch_alloc, which ends up being assigned to ds->dev.
This patch removes the second argument of the dsa_register_switch and
_dsa_register_switch functions.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pass extack arg down to lwtunnel_build_state and the build_state callbacks.
Add messages for failures in lwtunnel_build_state, and add the extarg to
nla_parse where possible in the build_state callbacks.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pass extack down to lwtunnel_valid_encap_type and
lwtunnel_valid_encap_type_attr. Add messages for unknown
or unsupported encap types.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add extack error message for invalid prefix length and invalid prefix.
Example of the latter is a route spec containing 172.16.100.1/24, where
the /24 mask means the lower 8-bits should be 0. Amazing how easy that
one is to overlook when an EINVAL is returned.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some Ethernet drivers will attach/connect to a PHY device before calling
register_netdevice() which is responsible for calling netdev_register_kobject()
which would do the network device's kobject initialization. In such a case,
sysfs_create_link() would return -ENOENT because the network device's kobject
is not ready yet, and we would fail to connect to the PHY device.
In order to keep things simple and symetrical, we just take the success path as
indicative of the ability to access the network device's kobject, and create
the second link if that's the case.
Fixes: 5568363f0cb3 ("net: phy: Create sysfs reciprocal links for attached_dev/phydev")
Reported-by: Woojung Hung <Woojung.Huh@microchip.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Whenever a user changes bonding options, a NETDEV_CHANGEINFODATA
notificatin is generated which results in a rtnelink message to
be sent. While runnig 'ip monitor', we can actually see 2 messages,
one a result of the event, and the other a result of state change
that is generated bo netdev_state_change(). However, this is not
always the case. If bonding changes were done via sysfs or ifenslave
(old ioctl interface), then only 1 message is seen.
This patch removes duplicate messages in the case of using netlink
to configure bonding. It introduceds a separte function that
triggers a netdev event and uses that function in the syfs and ioctl
cases.
This was discovered while auditing all the different envents and
continues the effort of cleaning up duplicated netlink messages.
CC: David Ahern <dsa@cumulusnetworks.com>
CC: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When netdev events happen, a rtnetlink_event() handler will send
messages for every event in it's white list. These messages contain
current information about a particular device, but they do not include
the iformation about which event just happened. So, it is impossible
to tell what just happend for these events.
This patch adds a new extension to RTM_NEWLINK message called IFLA_EVENT
that would have an encoding of event that triggered this
message. This would allow the the message consumer to easily determine
if it needs to perform certain actions.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Overlapping changes in drivers/net/phy/marvell.c, bug fix in 'net'
restricting a HW workaround alongside cleanups in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull networking fixes from David Miller:
1) Fix state pruning in bpf verifier wrt. alignment, from Daniel
Borkmann.
2) Handle non-linear SKBs properly in SCTP ICMP parsing, from Davide
Caratti.
3) Fix bit field definitions for rss_hash_type of descriptors in mlx5
driver, from Jesper Brouer.
4) Defer slave->link updates until bonding is ready to do a full commit
to the new settings, from Nithin Sujir.
5) Properly reference count ipv4 FIB metrics to avoid use after free
situations, from Eric Dumazet and several others including Cong Wang
and Julian Anastasov.
6) Fix races in llc_ui_bind(), from Lin Zhang.
7) Fix regression of ESP UDP encapsulation for TCP packets, from
Steffen Klassert.
8) Fix mdio-octeon driver Kconfig deps, from Randy Dunlap.
9) Fix regression in setting DSCP on ipv6/GRE encapsulation, from Peter
Dawson.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
ipv4: add reference counting to metrics
net: ethernet: ax88796: don't call free_irq without request_irq first
ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets
sctp: fix ICMP processing if skb is non-linear
net: llc: add lock_sock in llc_ui_bind to avoid a race condition
bonding: Don't update slave->link until ready to commit
test_bpf: Add a couple of tests for BPF_JSGE.
bpf: add various verifier test cases
bpf: fix wrong exposure of map_flags into fdinfo for lpm
bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data
bpf: properly reset caller saved regs after helper call and ld_abs/ind
bpf: fix incorrect pruning decision when alignment must be tracked
arp: fixed -Wuninitialized compiler warning
tcp: avoid fastopen API to be used on AF_UNSPEC
net: move somaxconn init from sysctl code
net: fix potential null pointer dereference
geneve: fix fill_info when using collect_metadata
virtio-net: enable TSO/checksum offloads for Q-in-Q vlans
be2net: Fix offload features for Q-in-Q packets
vlan: Fix tcp checksum offloads in Q-in-Q vlans
...
|
|
During enslavement to a bridge, after the CHANGEUPPER is sent, the
multicast enabled state of the bridge isn't propagated down to the
offloading driver unless it's changed.
This patch allows such drivers to query the multicast enabled state from
the bridge, so that they'll be able to correctly configure their flood
tables during port enslavement.
In case multicast is disabled, unregistered multicast packets can be
treated as broadcast and be flooded through all the bridge ports.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It's useful for drivers supporting bridge offload to be able to query
the bridge's VLAN filtering state.
Currently, upon enslavement to a bridge master, the offloading driver
will only learn about the bridge's VLAN filtering state after the bridge
device was already linked with its slave.
Being able to query the bridge's VLAN filtering state allows such
drivers to forbid enslavement in case resource couldn't be allocated for
a VLAN-aware bridge and also choose the correct initialization routine
for the enslaved port, which is dependent on the bridge type.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Andrey Konovalov reported crashes in ipv4_mtu()
I could reproduce the issue with KASAN kernels, between
10.246.7.151 and 10.246.7.152 :
1) 20 concurrent netperf -t TCP_RR -H 10.246.7.152 -l 1000 &
2) At the same time run following loop :
while :
do
ip ro add 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
ip ro del 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
done
Cong Wang attempted to add back rt->fi in commit
82486aa6f1b9 ("ipv4: restore rt->fi for reference counting")
but this proved to add some issues that were complex to solve.
Instead, I suggested to add a refcount to the metrics themselves,
being a standalone object (in particular, no reference to other objects)
I tried to make this patch as small as possible to ease its backport,
instead of being super clean. Note that we believe that only ipv4 dst
need to take care of the metric refcount. But if this is wrong,
this patch adds the basic infrastructure to extend this to other
families.
Many thanks to Julian Anastasov for reviewing this patch, and Cong Wang
for his efforts on this problem.
Fixes: 2860583fe840 ("ipv4: Kill rt->fi")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This flag when specified will return matched fib result in
response to a RTM_GETROUTE query.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Prefix is needed for returning matching route spec on get route request.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A later patch wants access to the fib result on an input route lookup
with the rcu lock held. Refactor ip_route_input_noref pushing the logic
between rcu_read_lock ... rcu_read_unlock into a new helper that takes
the fib_result as an input arg.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A later patch wants access to the fib result on an output route lookup
with the rcu lock held. Refactor __ip_route_output_key_hash, pushing
the logic between rcu_read_lock ... rcu_read_unlock into a new helper
with the fib_result as an input arg.
To keep the name length under control remove the leading underscores
from the name and add _rcu to the name of the new helper indicating it
is called with the rcu read lock held.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|