aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/net (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-05-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller35-174/+316
Conflicts were all overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds32-163/+285
Pull networking fixes from David Miller: 1) Fix reference count leaks in various parts of batman-adv, from Xiyu Yang. 2) Update NAT checksum even when it is zero, from Guillaume Nault. 3) sk_psock reference count leak in tls code, also from Xiyu Yang. 4) Sanity check TCA_FQ_CODEL_DROP_BATCH_SIZE netlink attribute in fq_codel, from Eric Dumazet. 5) Fix panic in choke_reset(), also from Eric Dumazet. 6) Fix VLAN accel handling in bnxt_fix_features(), from Michael Chan. 7) Disallow out of range quantum values in sch_sfq, from Eric Dumazet. 8) Fix crash in x25_disconnect(), from Yue Haibing. 9) Don't pass pointer to local variable back to the caller in nf_osf_hdr_ctx_init(), from Arnd Bergmann. 10) Wireguard should use the ECN decap helper functions, from Toke Høiland-Jørgensen. 11) Fix command entry leak in mlx5 driver, from Moshe Shemesh. 12) Fix uninitialized variable access in mptcp's subflow_syn_recv_sock(), from Paolo Abeni. 13) Fix unnecessary out-of-order ingress frame ordering in macsec, from Scott Dial. 14) IPv6 needs to use a global serial number for dst validation just like ipv4, from David Ahern. 15) Fix up PTP_1588_CLOCK deps, from Clay McClure. 16) Missing NLM_F_MULTI flag in gtp driver netlink messages, from Yoshiyuki Kurauchi. 17) Fix a regression in that dsa user port errors should not be fatal, from Florian Fainelli. 18) Fix iomap leak in enetc driver, from Dejin Zheng. 19) Fix use after free in lec_arp_clear_vccs(), from Cong Wang. 20) Initialize protocol value earlier in neigh code paths when generating events, from Roman Mashak. 21) netdev_update_features() must be called with RTNL mutex in macsec driver, from Antoine Tenart. 22) Validate untrusted GSO packets even more strictly, from Willem de Bruijn. 23) Wireguard decrypt worker needs a cond_resched(), from Jason Donenfeld. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits) net: flow_offload: skip hw stats check for FLOW_ACTION_HW_STATS_DONT_CARE MAINTAINERS: put DYNAMIC INTERRUPT MODERATION in proper order wireguard: send/receive: use explicit unlikely branch instead of implicit coalescing wireguard: selftests: initalize ipv6 members to NULL to squelch clang warning wireguard: send/receive: cond_resched() when processing worker ringbuffers wireguard: socket: remove errant restriction on looping to self wireguard: selftests: use normal kernel stack size on ppc64 net: ethernet: ti: am65-cpsw-nuss: fix irqs type ionic: Use debugfs_create_bool() to export bool net: dsa: Do not leave DSA master with NULL netdev_ops net: dsa: remove duplicate assignment in dsa_slave_add_cls_matchall_mirred net: stricter validation of untrusted gso packets seg6: fix SRH processing to comply with RFC8754 net: mscc: ocelot: ANA_AUTOAGE_AGE_PERIOD holds a value in seconds, not ms net: dsa: ocelot: the MAC table on Felix is twice as large net: dsa: sja1105: the PTP_CLK extts input reacts on both edges selftests: net: tcp_mmap: fix SO_RCVLOWAT setting net: hsr: fix incorrect type usage for protocol variable net: macsec: fix rtnl locking issue net: mvpp2: cls: Prevent buffer overflow in mvpp2_ethtool_cls_rule_del() ...
2020-05-06net: flow_offload: skip hw stats check for FLOW_ACTION_HW_STATS_DONT_CAREPablo Neira Ayuso1-2/+12
This patch adds FLOW_ACTION_HW_STATS_DONT_CARE which tells the driver that the frontend does not need counters, this hw stats type request never fails. The FLOW_ACTION_HW_STATS_DISABLED type explicitly requests the driver to disable the stats, however, if the driver cannot disable counters, it bails out. TCA_ACT_HW_STATS_* maintains the 1:1 mapping with FLOW_ACTION_HW_STATS_* except by disabled which is mapped to FLOW_ACTION_HW_STATS_DISABLED (this is 0 in tc). Add tc_act_hw_stats() to perform the mapping between TCA_ACT_HW_STATS_* and FLOW_ACTION_HW_STATS_*. Fixes: 319a1d19471e ("flow_offload: check for basic action hw stats type") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06ethtool: provide UAPI for PHY master/slave configuration.Oleksij Rempel2-0/+59
This UAPI is needed for BroadR-Reach 100BASE-T1 devices. Due to lack of auto-negotiation support, we needed to be able to configure the MASTER-SLAVE role of the port manually or from an application in user space. The same UAPI can be used for 1000BASE-T or MultiGBASE-T devices to force MASTER or SLAVE role. See IEEE 802.3-2018: 22.2.4.3.7 MASTER-SLAVE control register (Register 9) 22.2.4.3.8 MASTER-SLAVE status register (Register 10) 40.5.2 MASTER-SLAVE configuration resolution 45.2.1.185.1 MASTER-SLAVE config value (1.2100.14) 45.2.7.10 MultiGBASE-T AN control 1 register (Register 7.32) The MASTER-SLAVE role affects the clock configuration: ------------------------------------------------------------------------------- When the PHY is configured as MASTER, the PMA Transmit function shall source TX_TCLK from a local clock source. When configured as SLAVE, the PMA Transmit function shall source TX_TCLK from the clock recovered from data stream provided by MASTER. iMX6Q KSZ9031 XXX ------\ /-----------\ /------------\ | | | | | MAC |<----RGMII----->| PHY Slave |<------>| PHY Master | |<--- 125 MHz ---+-<------/ | | \ | ------/ \-----------/ \------------/ ^ \-TX_TCLK ------------------------------------------------------------------------------- Since some clock or link related issues are only reproducible in a specific MASTER-SLAVE-role, MAC and PHY configuration, it is beneficial to provide generic (not 100BASE-T1 specific) interface to the user space for configuration flexibility and trouble shooting. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06net: dsa: Do not leave DSA master with NULL netdev_opsFlorian Fainelli1-1/+2
When ndo_get_phys_port_name() for the CPU port was added we introduced an early check for when the DSA master network device in dsa_master_ndo_setup() already implements ndo_get_phys_port_name(). When we perform the teardown operation in dsa_master_ndo_teardown() we would not be checking that cpu_dp->orig_ndo_ops was successfully allocated and non-NULL initialized. With network device drivers such as virtio_net, this leads to a NPD as soon as the DSA switch hanging off of it gets torn down because we are now assigning the virtio_net device's netdev_ops a NULL pointer. Fixes: da7b9e9b00d4 ("net: dsa: Add ndo_get_phys_port_name() for CPU port") Reported-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06net: dsa: remove duplicate assignment in dsa_slave_add_cls_matchall_mirredVladimir Oltean1-5/+3
This was caused by a poor merge conflict resolution on my side. The "act = &cls->rule->action.entries[0];" assignment was already present in the code prior to the patch mentioned below. Fixes: e13c2075280e ("net: dsa: refactor matchall mirred action to separate function") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06tcp: defer xmit timer reset in tcp_xmit_retransmit_queue()Eric Dumazet1-6/+10
As hinted in prior change ("tcp: refine tcp_pacing_delay() for very low pacing rates"), it is probably best arming the xmit timer only when all the packets have been scheduled, rather than when the head of rtx queue has been re-sent. This does matter for flows having extremely low pacing rates, since their tp->tcp_wstamp_ns could be far in the future. Note that the regular xmit path has a stronger limit in tcp_small_queue_check(), meaning it is less likely to go beyond the pacing horizon. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06tcp: refine tcp_pacing_delay() for very low pacing ratesEric Dumazet2-7/+5
With the addition of horizon feature to sch_fq, we noticed some suboptimal behavior of extremely low pacing rate TCP flows, especially when TCP is not aware of a drop happening in lower stacks. Back in commit 3f80e08f40cd ("tcp: add tcp_reset_xmit_timer() helper"), tcp_pacing_delay() was added to estimate an extra delay to add to standard rto timers. This patch removes the skb argument from this helper and tcp_reset_xmit_timer() because it makes more sense to simply consider the time at which next packet is allowed to be sent, instead of the time of whatever packet has been sent. This avoids arming RTO timer too soon and removes spurious horizon drops. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06seg6: fix SRH processing to comply with RFC8754Ahmed Abdelsalam1-2/+8
The Segment Routing Header (SRH) which defines the SRv6 dataplane is defined in RFC8754. RFC8754 (section 4.1) defines the SR source node behavior which encapsulates packets into an outer IPv6 header and SRH. The SR source node encodes the full list of Segments that defines the packet path in the SRH. Then, the first segment from list of Segments is copied into the Destination address of the outer IPv6 header and the packet is sent to the first hop in its path towards the destination. If the Segment list has only one segment, the SR source node can omit the SRH as he only segment is added in the destination address. RFC8754 (section 4.1.1) defines the Reduced SRH, when a source does not require the entire SID list to be preserved in the SRH. A reduced SRH does not contain the first segment of the related SR Policy (the first segment is the one already in the DA of the IPv6 header), and the Last Entry field is set to n-2, where n is the number of elements in the SR Policy. RFC8754 (section 4.3.1.1) defines the SRH processing and the logic to validate the SRH (S09, S10, S11) which works for both reduced and non-reduced behaviors. This patch updates seg6_validate_srh() to validate the SRH as per RFC8754. Signed-off-by: Ahmed Abdelsalam <ahabdels@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06ipv6: Implement draft-ietf-6man-rfc4941bisFernando Gont1-52/+39
Implement the upcoming rev of RFC4941 (IPv6 temporary addresses): https://tools.ietf.org/html/draft-ietf-6man-rfc4941bis-09 * Reduces the default Valid Lifetime to 2 days The number of extra addresses employed when Valid Lifetime was 7 days exacerbated the stress caused on network elements/devices. Additionally, the motivation for temporary addresses is indeed privacy and reduced exposure. With a default Valid Lifetime of 7 days, an address that becomes revealed by active communication is reachable and exposed for one whole week. The only use case for a Valid Lifetime of 7 days could be some application that is expecting to have long lived connections. But if you want to have a long lived connections, you shouldn't be using a temporary address in the first place. Additionally, in the era of mobile devices, general applications should nevertheless be prepared and robust to address changes (e.g. nodes swap wifi <-> 4G, etc.) * Employs different IIDs for different prefixes To avoid network activity correlation among addresses configured for different prefixes * Uses a simpler algorithm for IID generation No need to store "history" anywhere Signed-off-by: Fernando Gont <fgont@si6networks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06net: hsr: fix incorrect type usage for protocol variableMurali Karicheri1-1/+1
Fix following sparse checker warning:- net/hsr/hsr_slave.c:38:18: warning: incorrect type in assignment (different base types) net/hsr/hsr_slave.c:38:18: expected unsigned short [unsigned] [usertype] protocol net/hsr/hsr_slave.c:38:18: got restricted __be16 [usertype] h_proto net/hsr/hsr_slave.c:39:25: warning: restricted __be16 degrades to integer net/hsr/hsr_slave.c:39:57: warning: restricted __be16 degrades to integer Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06net: bridge: return false in br_mrp_enabled()Jason Yan1-1/+1
Fix the following coccicheck warning: net/bridge/br_private.h:1334:8-9: WARNING: return of 0/1 in function 'br_mrp_enabled' with return type bool Fixes: 6536993371fab ("bridge: mrp: Integrate MRP into the bridge") Signed-off-by: Jason Yan <yanaijie@huawei.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-05neigh: send protocol value in neighbor create notificationRoman Mashak1-3/+3
When a new neighbor entry has been added, event is generated but it does not include protocol, because its value is assigned after the event notification routine has run, so move protocol assignment code earlier. Fixes: df9b0e30d44c ("neighbor: Add protocol attribute") Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Roman Mashak <mrv@mojatatu.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-05erspan: Add type I version 0 support.William Tu1-15/+43
The Type I ERSPAN frame format is based on the barebones IP + GRE(4-byte) encapsulation on top of the raw mirrored frame. Both type I and II use 0x88BE as protocol type. Unlike type II and III, no sequence number or key is required. To creat a type I erspan tunnel device: $ ip link add dev erspan11 type erspan \ local 172.16.1.100 remote 172.16.1.200 \ erspan_ver 0 Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-05net/smc: remove unused inline function smc_curs_readYueHaibing1-17/+0
commit bac6de7b6370 ("net/smc: eliminate cursor read and write calls") left behind this. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-05net/smc: log important pnetid and state change eventsKarsten Graul8-20/+113
Print to system log when SMC links are available or go down, link group state changes or pnetids are applied to and removed from devices. The log entries are triggered by either user configuration actions or adapter activation/deactivation events and are not expected to happen often. The entries help SMC users to keep track of the SMC link group status and to detect when actions are needed (like to add replacements for failed adapters). Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-05sch_choke: Remove classid from choke_skb_cb.David S. Miller1-1/+0
Suggested by Cong Wang. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-05net: sched: choke: Remove unused inline function choke_set_classidYueHaibing1-5/+0
There's no callers in-tree anymore since commit 5952fde10c35 ("net: sched: choke: remove dead filter classify code") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04net: partially revert dynamic lockdep key changesCong Wang10-25/+204
This patch reverts the folowing commits: commit 064ff66e2bef84f1153087612032b5b9eab005bd "bonding: add missing netdev_update_lockdep_key()" commit 53d374979ef147ab51f5d632dfe20b14aebeccd0 "net: avoid updating qdisc_xmit_lock_key in netdev_update_lockdep_key()" commit 1f26c0d3d24125992ab0026b0dab16c08df947c7 "net: fix kernel-doc warning in <linux/netdevice.h>" commit ab92d68fc22f9afab480153bd82a20f6e2533769 "net: core: add generic lockdep keys" but keeps the addr_list_lock_key because we still lock addr_list_lock nestedly on stack devices, unlikely xmit_lock this is safe because we don't take addr_list_lock on any fast path. Reported-and-tested-by: syzbot+aaa6fa4949cc5d9b7b25@syzkaller.appspotmail.com Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04atm: fix a memory leak of vcc->user_backCong Wang1-0/+6
In lec_arp_clear_vccs() only entry->vcc is freed, but vcc could be installed on entry->recv_vcc too in lec_vcc_added(). This fixes the following memory leak: unreferenced object 0xffff8880d9266b90 (size 16): comm "atm2", pid 425, jiffies 4294907980 (age 23.488s) hex dump (first 16 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 6b 6b 6b a5 ............kkk. backtrace: [<(____ptrval____)>] kmem_cache_alloc_trace+0x10e/0x151 [<(____ptrval____)>] lane_ioctl+0x4b3/0x569 [<(____ptrval____)>] do_vcc_ioctl+0x1ea/0x236 [<(____ptrval____)>] svc_ioctl+0x17d/0x198 [<(____ptrval____)>] sock_do_ioctl+0x47/0x12f [<(____ptrval____)>] sock_ioctl+0x2f9/0x322 [<(____ptrval____)>] vfs_ioctl+0x1e/0x2b [<(____ptrval____)>] ksys_ioctl+0x61/0x80 [<(____ptrval____)>] __x64_sys_ioctl+0x16/0x19 [<(____ptrval____)>] do_syscall_64+0x57/0x65 [<(____ptrval____)>] entry_SYSCALL_64_after_hwframe+0x49/0xb3 Cc: Gengming Liu <l.dmxcsnsbh@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04atm: fix a UAF in lec_arp_clear_vccs()Cong Wang1-11/+11
Gengming reported a UAF in lec_arp_clear_vccs(), where we add a vcc socket to an entry in a per-device list but free the socket without removing it from the list when vcc->dev is NULL. We need to call lec_vcc_close() to search and remove those entries contain the vcc being destroyed. This can be done by calling vcc->push(vcc, NULL) unconditionally in vcc_destroy_socket(). Another issue discovered by Gengming's reproducer is the vcc->dev may point to the static device lecatm_dev, for which we don't need to register/unregister device, so we can just check for vcc->dev->ops->owner. Reported-by: Gengming Liu <l.dmxcsnsbh@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04devlink: let kernel allocate region snapshot idJakub Kicinski1-13/+44
Currently users have to choose a free snapshot id before calling DEVLINK_CMD_REGION_NEW. This is potentially racy and inconvenient. Make the DEVLINK_ATTR_REGION_SNAPSHOT_ID optional and try to allocate id automatically. Send a message back to the caller with the snapshot info. Example use: $ devlink region new netdevsim/netdevsim1/dummy netdevsim/netdevsim1/dummy: snapshot 1 $ id=$(devlink -j region new netdevsim/netdevsim1/dummy | \ jq '.[][][][]') $ devlink region dump netdevsim/netdevsim1/dummy snapshot $id [...] $ devlink region del netdevsim/netdevsim1/dummy snapshot $id v4: - inline the notification code v3: - send the notification only once snapshot creation completed. v2: - don't wrap the line containing extack; - add a few sentences to the docs. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04devlink: factor out building a snapshot notificationJakub Kicinski1-11/+28
We'll need to send snapshot info back on the socket which requested a snapshot to be created. Factor out constructing a snapshot description from the broadcast notification code. v3: new patch Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04net_sched: sch_fq: add horizon attributeEric Dumazet1-5/+54
QUIC servers would like to use SO_TXTIME, without having CAP_NET_ADMIN, to efficiently pace UDP packets. As far as sch_fq is concerned, we need to add safety checks, so that a buggy application does not fill the qdisc with packets having delivery time far in the future. This patch adds a configurable horizon (default: 10 seconds), and a configurable policy when a packet is beyond the horizon at enqueue() time: - either drop the packet (default policy) - or cap its delivery time to the horizon. $ tc -s -d qd sh dev eth0 qdisc fq 8022: root refcnt 257 limit 10000p flow_limit 100p buckets 1024 orphan_mask 1023 quantum 10Kb initial_quantum 51160b low_rate_threshold 550Kbit refill_delay 40.0ms timer_slack 10.000us horizon 10.000s Sent 1234215879 bytes 837099 pkt (dropped 21, overlimits 0 requeues 6) backlog 0b 0p requeues 6 flows 1191 (inactive 1177 throttled 0) gc 0 highprio 0 throttled 692 latency 11.480us pkts_too_long 0 alloc_errors 0 horizon_drops 21 horizon_caps 0 v2: fixed an overflow on 32bit kernels in fq_init(), reported by kbuild test robot <lkp@intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04net_sched: fix tcm_parent in tc filter dumpCong Wang1-4/+4
When we tell kernel to dump filters from root (ffff:ffff), those filters on ingress (ffff:0000) are matched, but their true parents must be dumped as they are. However, kernel dumps just whatever we tell it, that is either ffff:ffff or ffff:0000: $ nl-cls-list --dev=dummy0 --parent=root cls basic dev dummy0 id none parent root prio 49152 protocol ip match-all cls basic dev dummy0 id :1 parent root prio 49152 protocol ip match-all $ nl-cls-list --dev=dummy0 --parent=ffff: cls basic dev dummy0 id none parent ffff: prio 49152 protocol ip match-all cls basic dev dummy0 id :1 parent ffff: prio 49152 protocol ip match-all This is confusing and misleading, more importantly this is a regression since 4.15, so the old behavior must be restored. And, when tc filters are installed on a tc class, the parent should be the classid, rather than the qdisc handle. Commit edf6711c9840 ("net: sched: remove classid and q fields from tcf_proto") removed the classid we save for filters, we can just restore this classid in tcf_block. Steps to reproduce this: ip li set dev dummy0 up tc qd add dev dummy0 ingress tc filter add dev dummy0 parent ffff: protocol arp basic action pass tc filter show dev dummy0 root Before this patch: filter protocol arp pref 49152 basic filter protocol arp pref 49152 basic handle 0x1 action order 1: gact action pass random type none pass val 0 index 1 ref 1 bind 1 After this patch: filter parent ffff: protocol arp pref 49152 basic filter parent ffff: protocol arp pref 49152 basic handle 0x1 action order 1: gact action pass random type none pass val 0 index 1 ref 1 bind 1 Fixes: a10fa20101ae ("net: sched: propagate q and parent from caller down to tcf_fill_node") Fixes: edf6711c9840 ("net: sched: remove classid and q fields from tcf_proto") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04net: sched: fallback to qdisc noqueue if default qdisc setup failJesper Dangaard Brouer1-3/+14
Currently if the default qdisc setup/init fails, the device ends up with qdisc "noop", which causes all TX packets to get dropped. With the introduction of sysctl net/core/default_qdisc it is possible to change the default qdisc to be more advanced, which opens for the possibility that Qdisc_ops->init() can fail. This patch detect these kind of failures, and choose to fallback to qdisc "noqueue", which is so simple that its init call will not fail. This allows the interface to continue functioning. V2: As this also captures memory failures, which are transient, the device is not kept in IFF_NO_QUEUE state. This allows the net_device to retry to default qdisc assignment. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04net/smc: save SMC-R peer link_uidKarsten Graul4-0/+13
During SMC-R link establishment the peers exchange the link_uid that is used for debugging purposes. Save the peer link_uid in smc_link so it can be retrieved by the smc_diag netlink interface. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04net/smc: create improved SMC-R link_uidKarsten Graul4-5/+19
The link_uid of an SMC-R link is exchanged between SMC peers and its value can be used for debugging purposes. Create a unique link_uid during link initialization and use it in communication with SMC-R peers. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04net/smc: improve termination processingKarsten Graul1-30/+31
Add helper smcr_lgr_link_deactivate_all() and eliminate duplicate code. In smc_lgr_free(), clear the smc-r links before smc_lgr_free_bufs() is called so buffers are already prepared for free. The usage of the soft parameter in __smc_lgr_terminate() is no longer needed, smc_lgr_free() can be called directly. smc_lgr_terminate_sched() and smc_smcd_terminate() set lgr->freeing to indicate that the link group will be freed soon to avoid unnecessary schedules of the free worker. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04net/smc: add termination reason and handle LLC protocol violationKarsten Graul4-2/+30
Allow to set the reason code for the link group termination, and set meaningful values before termination processing is triggered. This reason code is sent to the peer in the final delete link message. When the LLC request or response layer receives a message type that was not handled, drop a warning and terminate the link group. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04net/smc: asymmetric link taggingKarsten Graul3-9/+41
New connections must not be assigned to asymmetric links. Add asymmetric link tagging using new link variable link_is_asym. The new helpers smcr_lgr_set_type() and smcr_lgr_set_type_asym() are called to set the state of the link group, and tag all links accordingly. smcr_lgr_conn_assign_link() respects the link tagging and will not assign new connections to links tagged as asymmetric link. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04net/smc: assign link to a new connectionKarsten Graul1-19/+46
For new connections, assign a link from the link group, using some simple load balancing. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04net/smc: send DELETE_LINK, ALL message and wait for send to completeKarsten Graul3-0/+51
Add smc_llc_send_message_wait() which uses smc_wr_tx_send_wait() to send an LLC message and waits for the message send to complete. smc_llc_send_link_delete_all() calls the new function to send an DELETE_LINK,ALL LLC message. The RFC states that the sender of this type of message needs to wait for the completion event of the message transmission and can terminate the link afterwards. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04net/smc: wait for departure of an IB messageKarsten Graul3-0/+42
Introduce smc_wr_tx_send_wait() to send an IB message and wait for the tx completion event of the message. This makes sure that the message is no longer in-flight when the function returns. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04net/smc: handle incoming CDC validation messageKarsten Graul3-6/+48
Call smc_cdc_msg_validate() when a CDC message with the failover validation bit enabled was received. Validate that the sequence number sent with the message is one we already have received. If not, messages were lost and the connection is terminated using a new abort_work. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04net/smc: send failover validation messageKarsten Graul3-1/+27
When a connection is switched to a new link then a link validation message must be sent to the peer over the new link, containing the sequence number of the last CDC message that was sent over the old link. The peer will validate if this sequence number is the same or lower then the number he received, and abort the connection if messages were lost. Add smcr_cdc_msg_send_validation() to send the message validation message and call it when a connection was switched in smc_switch_cursor(). Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04net/smc: switch connections to alternate linkKarsten Graul6-9/+162
Add smc_switch_conns() to switch all connections from a link that is going down. Find an other link to switch the connections to, and switch each connection to the new link. smc_switch_cursor() updates the cursors of a connection to the state of the last successfully sent CDC message. When there is no link to switch to, terminate the link group. Call smc_switch_conns() when a link is going down. And with the possibility that links of connections can switch adapt CDC and TX functions to detect and handle link switches. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04net/smc: save state of last sent CDC messageKarsten Graul2-0/+10
When a link goes down and all connections of this link need to be switched to an other link then the producer cursor and the sequence of the last successfully sent CDC message must be known. Add the two fields to the SMC connection and update it in the tx completion handler. And to allow matching of sequences in error cases reset the seqno to the old value in smc_cdc_msg_send() when the actual send failed. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04devlink: Fix reporter's recovery conditionAya Levin1-2/+5
Devlink health core conditions the reporter's recovery with the expiration of the grace period. This is not relevant for the first recovery. Explicitly demand that the grace period will only apply to recoveries other than the first. Fixes: c8e1da0bf923 ("devlink: Add health report functionality") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04tipc: fix partial topology connection closureTuong Lien1-2/+3
When an application connects to the TIPC topology server and subscribes to some services, a new connection is created along with some objects - 'tipc_subscription' to store related data correspondingly... However, there is one omission in the connection handling that when the connection or application is orderly shutdown (e.g. via SIGQUIT, etc.), the connection is not closed in kernel, the 'tipc_subscription' objects are not freed too. This results in: - The maximum number of subscriptions (65535) will be reached soon, new subscriptions will be rejected; - TIPC module cannot be removed (unless the objects are somehow forced to release first); The commit fixes the issue by closing the connection if the 'recvmsg()' returns '0' i.e. when the peer is shutdown gracefully. It also includes the other unexpected cases. Acked-by: Jon Maloy <jmaloy@redhat.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04net: dsa: Do not make user port errors fatalFlorian Fainelli1-1/+1
Prior to 1d27732f411d ("net: dsa: setup and teardown ports"), we would not treat failures to set-up an user port as fatal, but after this commit we would, which is a regression for some systems where interfaces may be declared in the Device Tree, but the underlying hardware may not be present (pluggable daughter cards for instance). Fixes: 1d27732f411d ("net: dsa: setup and teardown ports") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03net/smc: enqueue local LLC messagesKarsten Graul3-2/+33
As SMC server, when a second link was deleted, trigger the setup of an asymmetric link. Do this by enqueueing a local ADD_LINK message which is processed by the LLC layer as if it were received from peer. Do the same when a new IB port became active and a new link could be created. smc_llc_srv_add_link_local() enqueues a local ADD_LINK message. And smc_llc_srv_delete_link_local() is used the same way to enqueue a local DELETE_LINK message. This is used when an IB port is no longer active. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03net/smc: delete link processing as SMC serverKarsten Graul1-0/+72
Add smc_llc_process_srv_delete_link() to process a DELETE_LINK request as SMC server. When the request is to delete ALL links then terminate the whole link group. If not, find the link to delete by its link_id, send the DELETE_LINK request LLC message and wait for the response. No matter if a response was received, clear the deleted link and update the link group state. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03net/smc: delete link processing as SMC clientKarsten Graul1-0/+72
Add smc_llc_process_cli_delete_link() to process a DELETE_LINK request as SMC client. When the request is to delete ALL links then terminate the whole link group. If not, find the link to delete by its link_id, send the DELETE_LINK response LLC message and then clear the deleted link. Finally determine and update the link group state. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03net/smc: llc_del_link_work and use the LLC flow for delete linkKarsten Graul3-34/+45
Introduce a work that is scheduled when a new DELETE_LINK LLC request is received. The work will call either the SMC client or SMC server DELETE_LINK processing. And use the LLC flow framework to process incoming DELETE_LINK LLC messages, scheduling the llc_del_link_work for those events. With these changes smc_lgr_forget() is only called by one function and can be migrated into smc_lgr_cleanup_early(). Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03net/smc: delete an asymmetric link as SMC serverKarsten Graul3-2/+82
When a link group moved from asymmetric to symmetric state then the dangling asymmetric link can be deleted. Add smc_llc_find_asym_link() to find the respective link and add smc_llc_delete_asym_link() to delete it. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03net/smc: final part of add link processing as SMC serverKarsten Graul1-1/+28
This patch finalizes the ADD_LINK processing of new links. Send the CONFIRM_LINK request to the peer, receive the response and set link state to ACTIVE. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03net/smc: rkey processing for a new link as SMC serverKarsten Graul1-1/+42
Part of SMC server new link establishment is the exchange of rkeys for used buffers. Loop over all used RMB buffers and send ADD_LINK_CONTINUE LLC messages to the peer. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03net/smc: first part of add link processing as SMC serverKarsten Graul3-2/+92
First set of functions to process an ADD_LINK LLC request as an SMC server. Find an alternate IB device, determine the new link group type and get the index for the new link. Then initialize the link and send the ADD_LINK LLC message to the peer. Save the contents of the response, ready the link, map all used buffers and register the buffers with the IB device. If any error occurs, stop the processing and clear the link. And call smc_llc_srv_add_link() in af_smc.c to start second link establishment after the initial link of a link group was created. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-03net/smc: final part of add link processing as SMC clientKarsten Graul3-4/+72
This patch finalizes the ADD_LINK processing of new links. Receive the CONFIRM_LINK request from peer, complete the link initialization, register all used buffers with the IB device and finally send the CONFIRM_LINK response, which completes the ADD_LINK processing. And activate smc_llc_cli_add_link() in af_smc.c. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>