aboutsummaryrefslogtreecommitdiffstats
path: root/net (follow)
AgeCommit message (Collapse)AuthorFilesLines
2021-10-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller16-99/+122
Lots of simnple overlapping additions. With a build fix from Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21net/core: Remove unused assignment operations and variableluo penghao1-2/+1
Although if_info_size is assigned, it has not been used. And the variable should also be deleted. The clang_analyzer complains as follows: net/core/rtnetlink.c:3806: warning: Although the value stored to 'if_info_size' is used in the enclosing expression, the value is never actually read from 'if_info_size'. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: luo penghao <luo.penghao@zte.com.cn> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21net: stats: Read the statistics in ___gnet_stats_copy_basic() instead of adding.Sebastian Andrzej Siewior1-6/+37
Since the rework, the statistics code always adds up the byte and packet value(s). On 32bit architectures a seqcount_t is used in gnet_stats_basic_sync to ensure that the 64bit values are not modified during the read since two 32bit loads are required. The usage of a seqcount_t requires a lock to ensure that only one writer is active at a time. This lock leads to disabled preemption during the update. The lack of disabling preemption is now creating a warning as reported by Naresh since the query done by gnet_stats_copy_basic() is in preemptible context. For ___gnet_stats_copy_basic() there is no need to disable preemption since the update is performed on stack and can't be modified by another writer. Instead of disabling preemption, to avoid the warning, simply create a read function to just read the values and return as u64. Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Fixes: 67c9e6270f301 ("net: sched: Protect Qdisc::bstats with u64_stats") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21net: dsa: tag_8021q: make dsa_8021q_{rx,tx}_vid take dp as argumentVladimir Oltean3-19/+19
Pass a single argument to dsa_8021q_rx_vid and dsa_8021q_tx_vid that contains the necessary information from the two arguments that are currently provided: the switch and the port number. Also rename those functions so that they have a dsa_port_* prefix, since they operate on a struct dsa_port *. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21net: dsa: tag_sja1105: do not open-code dsa_switch_for_each_portVladimir Oltean1-4/+1
Find the remaining iterators over dst->ports that only filter for the ports belonging to a certain switch, and replace those with the dsa_switch_for_each_port helper that we have now. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21net: dsa: convert cross-chip notifiers to iterate using dpVladimir Oltean2-102/+112
The majority of cross-chip switch notifiers need to filter in some way over the type of ports: some install VLANs etc on all cascade ports. The difference is that the matching function, which filters by port type, is separate from the function where the iteration happens. So this patch needs to refactor the matching functions' prototypes as well, to take the dp as argument. In a future patch/series, I might convert dsa_towards_port to return a struct dsa_port *dp too, but at the moment it is a bit entangled with dsa_routing_port which is also used by mv88e6xxx and they both return an int port. So keep dsa_towards_port the way it is and convert it into a dp using dsa_to_port. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21net: dsa: remove gratuitous use of dsa_is_{user,dsa,cpu}_portVladimir Oltean1-2/+2
Find the occurrences of dsa_is_{user,dsa,cpu}_port where a struct dsa_port *dp was already available in the function scope, and replace them with the dsa_port_is_{user,dsa,cpu} equivalent function which uses that dp directly and does not perform another hidden dsa_to_port(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21net: dsa: do not open-code dsa_switch_for_each_portVladimir Oltean1-30/+14
Find the remaining iterators over dst->ports that only filter for the ports belonging to a certain switch, and replace those with the dsa_switch_for_each_port helper that we have now. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21net: dsa: remove the "dsa_to_port in a loop" antipattern from the coreVladimir Oltean5-52/+42
Ever since Vivien's conversion of the ds->ports array into a dst->ports list, and the introduction of dsa_to_port, iterations through the ports of a switch became quadratic whenever dsa_to_port was needed. dsa_to_port can either be called directly, or indirectly through the dsa_is_{user,cpu,dsa,unused}_port helpers. Use the newly introduced dsa_switch_for_each_port() iteration macro that works with the iterator variable being a struct dsa_port *dp directly, and not an int i. It is an expensive variable to go from i to dp, but cheap to go from dp to i. This macro iterates through the entire ds->dst->ports list and filters by the ports belonging just to the switch provided as argument. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller6-51/+19
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter fixes for net: 1) Crash due to missing initialization of timer data in xt_IDLETIMER, from Juhee Kang. 2) NF_CONNTRACK_SECMARK should be bool in Kconfig, from Vegard Nossum. 3) Skip netdev events on netns removal, from Florian Westphal. 4) Add testcase to show port shadowing via UDP, also from Florian. 5) Remove pr_debug() code in ip6t_rt, this fixes a crash due to unsafe access to non-linear skbuff, from Xin Long. 6) Make net/ipv4/vs/debug_level read-only from non-init netns, from Antoine Tenart. 7) Remove bogus invocation to bash in selftests/netfilter/nft_flowtable.sh also from Florian. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-20fq_codel: generalise ce_threshold marking for subset of trafficToke Høiland-Jørgensen2-5/+11
Commit e72aeb9ee0e3 ("fq_codel: implement L4S style ce_threshold_ect1 marking") expanded the ce_threshold feature of FQ-CoDel so it can be applied to a subset of the traffic, using the ECT(1) bit of the ECN field as the classifier. However, hard-coding ECT(1) as the only classifier for this feature seems limiting, so let's expand it to be more general. To this end, change the parameter from a ce_threshold_ect1 boolean, to a one-byte selector/mask pair (ce_threshold_{selector,mask}) which is applied to the whole diffserv/ECN field in the IP header. This makes it possible to classify packets by any value in either the ECN field or the diffserv field. In particular, setting a selector of INET_ECN_ECT_1 and a mask of INET_ECN_MASK corresponds to the functionality before this patch, and a mask of ~INET_ECN_MASK allows using the selector as a straight-forward match against a diffserv code point: # apply ce_threshold to ECT(1) traffic tc qdisc replace dev eth0 root fq_codel ce_threshold 1ms ce_threshold_selector 0x1/0x3 # apply ce_threshold to ECN-capable traffic marked as diffserv AF22 tc qdisc replace dev eth0 root fq_codel ce_threshold 1ms ce_threshold_selector 0x50/0xfc Regardless of the selector chosen, the normal rules for ECN-marking of packets still apply, i.e., the flow must still declare itself ECN-capable by setting one of the bits in the ECN field to get marked at all. v2: - Add tc usage examples to patch description Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20211019174709.69081-1-toke@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-20net-core: use netdev_* calls for kernel messagesJesse Brandeburg1-12/+10
While loading a driver and changing the number of queues, I noticed this message in the kernel log: "[253489.070080] Number of in use tx queues changed invalidating tc mappings. Priority traffic classification disabled!" But I had no idea what interface was being talked about because this message used pr_warn(). After investigating, it appears we can use the netdev_* helpers already defined to create predictably formatted messages, and that already handle <unknown netdev> cases, in more of the messages in dev.c. After this change, this message (and others) will look like this: "[ 170.181093] ice 0000:3b:00.0 ens785f0: Number of in use tx queues changed invalidating tc mappings. Priority traffic classification disabled!" One goal here was not to change the message significantly from the original format so as to not break user's expectations, so I just changed messages that used pr_* and generally started with %s == dev->name. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-20batman-adv: use eth_hw_addr_set() instead of ether_addr_copy()Jakub Kicinski1-1/+1
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it got through appropriate helpers. Convert batman from ether_addr_copy() to eth_hw_addr_set(): @@ expression dev, np; @@ - ether_addr_copy(dev->dev_addr, np) + eth_hw_addr_set(dev, np) Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-20mac802154: use dev_addr_set() - manualJakub Kicinski2-8/+9
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it got through appropriate helpers. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-20mac802154: use dev_addr_set()Jakub Kicinski1-1/+1
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it got through appropriate helpers. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-20batman-adv: prepare for const netdev->dev_addrJakub Kicinski5-13/+14
netdev->dev_addr will be constant soon, make sure the qualifier is propagated thru batman-adv. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19net: dsa: Fix an error handling path in 'dsa_switch_parse_ports_of()'Christophe JAILLET1-2/+7
If we return before the end of the 'for_each_child_of_node()' iterator, the reference taken on 'port' must be released. Add the missing 'of_node_put()' calls. Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/15d5310d1d55ad51c1af80775865306d92432e03.1634587046.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-19devlink: Remove extra device_lock assert checksLeon Romanovsky1-2/+0
PCI core code in the pci_call_probe() has a path that doesn't hold device_lock. It happens because the ->probe() is called through the workqueue mechanism. 349 static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev, 350 const struct pci_device_id *id) 351 { 352 .... 377 if (cpu < nr_cpu_ids) 378 error = work_on_cpu(cpu, local_pci_probe, &ddi); Luckily enough, the core still ensures that only single flow is executed, so it safe to remove the assert checks that anyway were added for annotations purposes. Fixes: b88f7b1203bf ("devlink: Annotate devlink API calls") Reported-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19net: sched: Allow statistics reads from softirq.Sebastian Andrzej Siewior1-1/+1
Eric reported that the rate estimator reads statics from the softirq which in turn triggers a warning introduced in the statistics rework. The warning is too cautious. The updates happen in the softirq context so reads from softirq are fine since the writes can not be preempted. The updates/writes happen during qdisc_run() which ensures one writer and the softirq context. The remaining bad context for reading statistics remains in hard-IRQ because it may preempt a writer. Fixes: 29cbcd8582837 ("net: sched: Remove Qdisc::running sequence counter") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19net: sch_tbf: Add a graft commandPetr Machata1-0/+16
As another qdisc is linked to the TBF, the latter should issue an event to give drivers a chance to react to the grafting. In other qdiscs, this event is called GRAFT, so follow suit with TBF as well. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-19can: isotp: isotp_sendmsg(): fix return error on FC timeout on TX pathMarc Kleine-Budde1-0/+3
When the a large chunk of data send and the receiver does not send a Flow Control frame back in time, the sendmsg() does not return a error code, but the number of bytes sent corresponding to the size of the packet. If a timeout occurs the isotp_tx_timer_handler() is fired, sets sk->sk_err and calls the sk->sk_error_report() function. It was wrongly expected that the error would be propagated to user space in every case. For isotp_sendmsg() blocking on wait_event_interruptible() this is not the case. This patch fixes the problem by checking if sk->sk_err is set and returning the error to user space. Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol") Link: https://github.com/hartkopp/can-isotp/issues/42 Link: https://github.com/hartkopp/can-isotp/pull/43 Link: https://lore.kernel.org/all/20210507091839.1366379-1-mkl@pengutronix.de Cc: stable@vger.kernel.org Reported-by: Sottas Guillaume (LMB) <Guillaume.Sottas@liebherr.com> Tested-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-10-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller28-279/+200
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS for net-next: 1) Add new run_estimation toggle to IPVS to stop the estimation_timer logic, from Dust Li. 2) Relax superfluous dynset check on NFT_SET_TIMEOUT. 3) Add egress hook, from Lukas Wunner. 4) Nowadays, almost all hook functions in x_table land just call the hook evaluation loop. Remove remaining hook wrappers from iptables and IPVS. From Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18net: dsa: tag_rtl8_4: add realtek 8 byte protocol 4 tagAlvin Šipraga3-0/+185
This commit implements a basic version of the 8 byte tag protocol used in the Realtek RTL8365MB-VC unmanaged switch, which carries with it a protocol version of 0x04. The implementation itself only handles the parsing of the EtherType value and Realtek protocol version, together with the source or destination port fields. The rest is left unimplemented for now. The tag format is described in a confidential document provided to my company by Realtek Semiconductor Corp. Permission has been granted by the vendor to publish this driver based on that material, together with an extract from the document describing the tag format and its fields. It is hoped that this will help future implementors who do not have access to the material but who wish to extend the functionality of drivers for chips which use this protocol. In addition, two possible values of the REASON field are specified, based on experiments on my end. Realtek does not specify what value this field can take. Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18net: dsa: move NET_DSA_TAG_RTL4_A to right place in Kconfig/MakefileAlvin Šipraga2-8/+8
Move things around a little so that this tag driver is alphabetically ordered. The Kconfig file is sorted based on the tristate text. Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18net: dsa: allow reporting of standard ethtool stats for slave devicesAlvin Šipraga1-0/+34
Jakub pointed out that we have a new ethtool API for reporting device statistics in a standardized way, via .get_eth_{phy,mac,ctrl}_stats. Add a small amount of plumbing to allow DSA drivers to take advantage of this when exposing statistics. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18net/sched: act_ct: Fix byte count on fragmented packetsPaul Blakey1-1/+1
First fragmented packets (frag offset = 0) byte len is zeroed when stolen by ip_defrag(). And since act_ct update the stats only afterwards (at end of execute), bytes aren't correctly accounted for such packets. To fix this, move stats update to start of action execute. Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct") Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18net: sched: Remove Qdisc::running sequence counterAhmed S. Darwish18-100/+80
The Qdisc::running sequence counter has two uses: 1. Reliably reading qdisc's tc statistics while the qdisc is running (a seqcount read/retry loop at gnet_stats_add_basic()). 2. As a flag, indicating whether the qdisc in question is running (without any retry loops). For the first usage, the Qdisc::running sequence counter write section, qdisc_run_begin() => qdisc_run_end(), covers a much wider area than what is actually needed: the raw qdisc's bstats update. A u64_stats sync point was thus introduced (in previous commits) inside the bstats structure itself. A local u64_stats write section is then started and stopped for the bstats updates. Use that u64_stats sync point mechanism for the bstats read/retry loop at gnet_stats_add_basic(). For the second qdisc->running usage, a __QDISC_STATE_RUNNING bit flag, accessed with atomic bitops, is sufficient. Using a bit flag instead of a sequence counter at qdisc_run_begin/end() and qdisc_is_running() leads to the SMP barriers implicitly added through raw_read_seqcount() and write_seqcount_begin/end() getting removed. All call sites have been surveyed though, and no required ordering was identified. Now that the qdisc->running sequence counter is no longer used, remove it. Note, using u64_stats implies no sequence counter protection for 64-bit architectures. This can lead to the qdisc tc statistics "packets" vs. "bytes" values getting out of sync on rare occasions. The individual values will still be valid. Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18net: sched: Merge Qdisc::bstats and Qdisc::cpu_bstats data typesAhmed S. Darwish24-105/+114
The only factor differentiating per-CPU bstats data type (struct gnet_stats_basic_cpu) from the packed non-per-CPU one (struct gnet_stats_basic_packed) was a u64_stats sync point inside the former. The two data types are now equivalent: earlier commits added a u64_stats sync point to the latter. Combine both data types into "struct gnet_stats_basic_sync". This eliminates redundancy and simplifies the bstats read/write APIs. Use u64_stats_t for bstats "packets" and "bytes" data types. On 64-bit architectures, u64_stats sync points do not use sequence counter protection. Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18net: sched: Use _bstats_update/set() instead of raw writesAhmed S. Darwish5-21/+26
The Qdisc::running sequence counter, used to protect Qdisc::bstats reads from parallel writes, is in the process of being removed. Qdisc::bstats read/writes will synchronize using an internal u64_stats sync point instead. Modify all bstats writes to use _bstats_update(). This ensures that the internal u64_stats sync point is always acquired and released as appropriate. Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18net: sched: Protect Qdisc::bstats with u64_statsAhmed S. Darwish15-10/+35
The not-per-CPU variant of qdisc tc (traffic control) statistics, Qdisc::gnet_stats_basic_packed bstats, is protected with Qdisc::running sequence counter. This sequence counter is used for reliably protecting bstats reads from parallel writes. Meanwhile, the seqcount's write section covers a much wider area than bstats update: qdisc_run_begin() => qdisc_run_end(). That read/write section asymmetry can lead to needless retries of the read section. To prepare for removing the Qdisc::running sequence counter altogether, introduce a u64_stats sync point inside bstats instead. Modify _bstats_update() to start/end the bstats u64_stats write section. For bisectability, and finer commits granularity, the bstats read section is still protected with a Qdisc::running read/retry loop and qdisc_run_begin/end() still starts/ends that seqcount write section. Once all call sites are modified to use _bstats_update(), the Qdisc::running seqcount will be removed and bstats read/retry loop will be modified to utilize the internal u64_stats sync point. Note, using u64_stats implies no sequence counter protection for 64-bit architectures. This can lead to the statistics "packets" vs. "bytes" values getting out of sync on rare occasions. The individual values will still be valid. [bigeasy: Minor commit message edits, init all gnet_stats_basic_packed.] Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18gen_stats: Move remaining users to gnet_stats_add_queue().Sebastian Andrzej Siewior1-37/+2
The gnet_stats_queue::qlen member is only used in the SMP-case. qdisc_qstats_qlen_backlog() needs to add qdisc_qlen() to qstats.qlen to have the same value as that provided by qdisc_qlen_sum(). gnet_stats_copy_queue() needs to overwritte the resulting qstats.qlen field whith the caller submitted qlen value. It might be differ from the submitted value. Let both functions use gnet_stats_add_queue() and remove unused __gnet_stats_copy_queue(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18mq, mqprio: Use gnet_stats_add_queue().Sebastian Andrzej Siewior2-57/+18
gnet_stats_add_basic() and gnet_stats_add_queue() add up the statistics so they can be used directly for both the per-CPU and global case. gnet_stats_add_queue() copies either Qdisc's per-CPU gnet_stats_queue::qlen or the global member. The global gnet_stats_queue::qlen isn't touched in the per-CPU case so there is no need to consider it in the global-case. In the per-CPU case, the sum of global gnet_stats_queue::qlen and the per-CPU gnet_stats_queue::qlen was assigned to sch->q.qlen and sch->qstats.qlen. Now both fields are copied individually. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18gen_stats: Add gnet_stats_add_queue().Sebastian Andrzej Siewior1-0/+32
This function will replace __gnet_stats_copy_queue(). It reads all arguments and adds them into the passed gnet_stats_queue argument. In contrast to __gnet_stats_copy_queue() it also copies the qlen member. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18gen_stats: Add instead Set the value in __gnet_stats_copy_basic().Sebastian Andrzej Siewior4-23/+24
__gnet_stats_copy_basic() always assigns the value to the bstats argument overwriting the previous value. The later added per-CPU version always accumulated the values in the returning gnet_stats_basic_packed argument. Based on review there are five users of that function as of today: - est_fetch_counters(), ___gnet_stats_copy_basic() memsets() bstats to zero, single invocation. - mq_dump(), mqprio_dump(), mqprio_dump_class_stats() memsets() bstats to zero, multiple invocation but does not use the function due to !qdisc_is_percpu_stats(). Add the values in __gnet_stats_copy_basic() instead overwriting. Rename the function to gnet_stats_add_basic() to make it more obvious. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18netfilter: ebtables: allocate chainstack on CPU local nodesDavidlohr Bueso1-1/+3
Keep the per-CPU memory allocated for chainstacks local. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-10-17netfilter: core: Fix clang warnings about unused static inlinesLukas Wunner1-2/+4
Unlike gcc, clang warns about unused static inlines that are not in an include file: net/netfilter/core.c:344:20: error: unused function 'nf_ingress_hook' [-Werror,-Wunused-function] static inline bool nf_ingress_hook(const struct nf_hook_ops *reg, int pf) ^ net/netfilter/core.c:353:20: error: unused function 'nf_egress_hook' [-Werror,-Wunused-function] static inline bool nf_egress_hook(const struct nf_hook_ops *reg, int pf) ^ According to commit 6863f5643dd7 ("kbuild: allow Clang to find unused static inline functions for W=1 build"), the proper resolution is to mark the affected functions as __maybe_unused. An alternative approach would be to move them to include/linux/netfilter_netdev.h, but since Pablo didn't do that in commit ddcfa710d40b ("netfilter: add nf_ingress_hook() helper function"), I'm guessing __maybe_unused is preferred. This fixes both the warning introduced by Pablo in v5.10 as well as the one recently introduced by myself with commit 42df6e1d221d ("netfilter: Introduce egress hook"). Fixes: ddcfa710d40b ("netfilter: add nf_ingress_hook() helper function") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-10-17can: isotp: isotp_sendmsg(): fix TX buffer concurrent access in isotp_sendmsg()Ziyang Xuan1-15/+31
When isotp_sendmsg() concurrent, tx.state of all TX processes can be ISOTP_IDLE. The conditions so->tx.state != ISOTP_IDLE and wq_has_sleeper(&so->wait) can not protect TX buffer from being accessed by multiple TX processes. We can use cmpxchg() to try to modify tx.state to ISOTP_SENDING firstly. If the modification of the previous process succeed, the later process must wait tx.state to ISOTP_IDLE firstly. Thus, we can ensure TX buffer is accessed by only one process at the same time. And we should also restore the original tx.state at the subsequent error processes. Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol") Link: https://lore.kernel.org/all/c2517874fbdf4188585cf9ddf67a8fa74d5dbde5.1633764159.git.william.xuanziyang@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-10-17can: isotp: isotp_sendmsg(): add result check for wait_event_interruptible()Ziyang Xuan1-1/+3
Using wait_event_interruptible() to wait for complete transmission, but do not check the result of wait_event_interruptible() which can be interrupted. It will result in TX buffer has multiple accessors and the later process interferes with the previous process. Following is one of the problems reported by syzbot. ============================================================= WARNING: CPU: 0 PID: 0 at net/can/isotp.c:840 isotp_tx_timer_handler+0x2e0/0x4c0 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.13.0-rc7+ #68 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014 RIP: 0010:isotp_tx_timer_handler+0x2e0/0x4c0 Call Trace: <IRQ> ? isotp_setsockopt+0x390/0x390 __hrtimer_run_queues+0xb8/0x610 hrtimer_run_softirq+0x91/0xd0 ? rcu_read_lock_sched_held+0x4d/0x80 __do_softirq+0xe8/0x553 irq_exit_rcu+0xf8/0x100 sysvec_apic_timer_interrupt+0x9e/0xc0 </IRQ> asm_sysvec_apic_timer_interrupt+0x12/0x20 Add result check for wait_event_interruptible() in isotp_sendmsg() to avoid multiple accessers for tx buffer. Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol") Link: https://lore.kernel.org/all/10ca695732c9dd267c76a3c30f37aefe1ff7e32f.1633764159.git.william.xuanziyang@huawei.com Cc: stable@vger.kernel.org Reported-by: syzbot+78bab6958a614b0c80b9@syzkaller.appspotmail.com Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-10-17can: j1939: j1939_xtp_rx_rts_session_new(): abort TP less than 9 bytesZhang Changzhong2-0/+3
The receiver should abort TP if 'total message size' in TP.CM_RTS and TP.CM_BAM is less than 9 or greater than 1785 [1], but currently the j1939 stack only checks the upper bound and the receiver will accept the following broadcast message: vcan1 18ECFF00 [8] 20 08 00 02 FF 00 23 01 vcan1 18EBFF00 [8] 01 00 00 00 00 00 00 00 vcan1 18EBFF00 [8] 02 00 FF FF FF FF FF FF This patch adds check for the lower bound and abort illegal TP. [1] SAE-J1939-82 A.3.4 Row 2 and A.3.6 Row 6. Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Link: https://lore.kernel.org/all/1634203601-3460-1-git-send-email-zhangchangzhong@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-10-17can: j1939: j1939_xtp_rx_dat_one(): cancel session if receive TP.DT with error lengthZhang Changzhong1-2/+5
According to SAE-J1939-21, the data length of TP.DT must be 8 bytes, so cancel session when receive unexpected TP.DT message. Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Link: https://lore.kernel.org/all/1632972800-45091-1-git-send-email-zhangchangzhong@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-10-17can: j1939: j1939_netdev_start(): fix UAF for rx_kref of j1939_privZiyang Xuan1-2/+5
It will trigger UAF for rx_kref of j1939_priv as following. cpu0 cpu1 j1939_sk_bind(socket0, ndev0, ...) j1939_netdev_start j1939_sk_bind(socket1, ndev0, ...) j1939_netdev_start j1939_priv_set j1939_priv_get_by_ndev_locked j1939_jsk_add ..... j1939_netdev_stop kref_put_lock(&priv->rx_kref, ...) kref_get(&priv->rx_kref, ...) REFCOUNT_WARN("addition on 0;...") ==================================================== refcount_t: addition on 0; use-after-free. WARNING: CPU: 1 PID: 20874 at lib/refcount.c:25 refcount_warn_saturate+0x169/0x1e0 RIP: 0010:refcount_warn_saturate+0x169/0x1e0 Call Trace: j1939_netdev_start+0x68b/0x920 j1939_sk_bind+0x426/0xeb0 ? security_socket_bind+0x83/0xb0 The rx_kref's kref_get() and kref_put() should use j1939_netdev_lock to protect. Fixes: 9d71dd0c70099 ("can: add support of SAE J1939 protocol") Link: https://lore.kernel.org/all/20210926104757.2021540-1-william.xuanziyang@huawei.com Cc: stable@vger.kernel.org Reported-by: syzbot+85d9878b19c94f9019ad@syzkaller.appspotmail.com Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-10-17can: j1939: j1939_tp_rxtimer(): fix errant alert in j1939_tp_rxtimerZiyang Xuan1-3/+2
When the session state is J1939_SESSION_DONE, j1939_tp_rxtimer() will give an alert "rx timeout, send abort", but do nothing actually. Move the alert into session active judgment condition, it is more reasonable. One of the scenarios is that j1939_tp_rxtimer() execute followed by j1939_xtp_rx_abort_one(). After j1939_xtp_rx_abort_one(), the session state is J1939_SESSION_DONE, then j1939_tp_rxtimer() give an alert. Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Link: https://lore.kernel.org/all/20210906094219.95924-1-william.xuanziyang@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-10-16net: bridge: mcast: use multicast_membership_interval for IGMPv3Nikolay Aleksandrov1-3/+1
When I added IGMPv3 support I decided to follow the RFC for computing the GMI dynamically: " 8.4. Group Membership Interval The Group Membership Interval is the amount of time that must pass before a multicast router decides there are no more members of a group or a particular source on a network. This value MUST be ((the Robustness Variable) times (the Query Interval)) plus (one Query Response Interval)." But that actually is inconsistent with how the bridge used to compute it for IGMPv2, where it was user-configurable that has a correct default value but it is up to user-space to maintain it. This would make it consistent with the other timer values which are also maintained correct by the user instead of being dynamically computed. It also changes back to the previous user-expected GMI behaviour for IGMPv3 queries which were supported before IGMPv3 was added. Note that to properly compute it dynamically we would need to add support for "Robustness Variable" which is currently missing. Reported-by: Hangbin Liu <liuhangbin@gmail.com> Fixes: 0436862e417e ("net: bridge: mcast: support for IGMPv3/MLDv2 ALLOW_NEW_SOURCES report") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-16net: make use of helper netif_is_bridge_master()Kyungrok Chung9-14/+14
Make use of netdev helper functions to improve code readability. Replace 'dev->priv_flags & IFF_EBRIDGE' with netif_is_bridge_master(dev). Signed-off-by: Kyungrok Chung <acadx0@gmail.com> Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-16net/smc: stop links when their GID is removedKarsten Graul1-0/+53
With SMC-Rv2 the GID is an IP address that can be deleted from the device. When an IB_EVENT_GID_CHANGE event is provided then iterate over all active links and check if their GID is still defined. Otherwise stop the affected link. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-16net/smc: add netlink support for SMC-Rv2Karsten Graul1-25/+73
Implement the netlink support for SMC-Rv2 related attributes that are provided to user space. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-16net/smc: extend LLC layer for SMC-Rv2Karsten Graul7-115/+531
Add support for large v2 LLC control messages in smc_llc.c. The new large work request buffer allows to combine control messages into one packet that had to be spread over several packets before. Add handling of the new v2 LLC messages. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-16net/smc: add v2 support to the work request layerKarsten Graul5-21/+199
In the work request layer define one large v2 buffer for each link group that is used to transmit and receive large LLC control messages. Add the completion queue handling for this buffer. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-16net/smc: retrieve v2 gid from IB deviceKarsten Graul4-30/+95
In smc_ib.c, scan for RoCE devices that support UDP encapsulation. Find an eligible device and check that there is a route to the remote peer. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-16net/smc: add v2 format of CLC decline messageKarsten Graul2-10/+55
The CLC decline message changed with SMC-Rv2 and supports up to 4 additional diagnosis codes. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>