aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/net (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-05-10net: dsa: sja1105: implement cross-chip bridging operationsVladimir Oltean1-0/+151
sja1105 uses dsa_8021q for DSA tagging, a format which is VLAN at heart and which is compatible with cascading. A complete description of this tagging format is in net/dsa/tag_8021q.c, but a quick summary is that each external-facing port tags incoming frames with a unique pvid, and this special VLAN is transmitted as tagged towards the inside of the system, and as untagged towards the exterior. The tag encodes the switch id and the source port index. This means that cross-chip bridging for dsa_8021q only entails adding the dsa_8021q pvids of one switch to the RX filter of the other switches. Everything else falls naturally into place, as long as the bottom-end of ports (the leaves in the tree) is comprised exclusively of dsa_8021q-compatible (i.e. sja1105 switches). Otherwise, there would be a chance that a front-panel switch transmits a packet tagged with a dsa_8021q header, header which it wouldn't be able to remove, and which would hence "leak" out. The only use case I tested (due to lack of board availability) was when the sja1105 switches are part of disjoint trees (however, this doesn't change the fact that multiple sja1105 switches still need unique switch identifiers in such a system). But in principle, even "true" single-tree setups (with DSA links) should work just as fine, except for a small change which I can't test: dsa_towards_port should be used instead of dsa_upstream_port (I made the assumption that the routing port that any sja1105 should use towards its neighbours is the CPU port. That might not hold true in other setups). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-10net: dsa: introduce a dsa_switch_find functionVladimir Oltean1-0/+21
Somewhat similar to dsa_tree_find, dsa_switch_find returns a dsa_switch structure pointer by searching for its tree index and switch index (the parameters from dsa,member). To be used, for example, by drivers who implement .crosschip_bridge_join and need a reference to the other switch indicated to by the tree_index and sw_index arguments. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-10net: dsa: permit cross-chip bridging between all trees in the systemVladimir Oltean3-8/+37
One way of utilizing DSA is by cascading switches which do not all have compatible taggers. Consider the following real-life topology: +---------------------------------------------------------------+ | LS1028A | | +------------------------------+ | | | DSA master for Felix | | | |(internal ENETC port 2: eno2))| | | +------------+------------------------------+-------------+ | | | Felix embedded L2 switch | | | | | | | | +--------------+ +--------------+ +--------------+ | | | | |DSA master for| |DSA master for| |DSA master for| | | | | | SJA1105 1 | | SJA1105 2 | | SJA1105 3 | | | | | |(Felix port 1)| |(Felix port 2)| |(Felix port 3)| | | +--+-+--------------+---+--------------+---+--------------+--+--+ +-----------------------+ +-----------------------+ +-----------------------+ | SJA1105 switch 1 | | SJA1105 switch 2 | | SJA1105 switch 3 | +-----+-----+-----+-----+ +-----+-----+-----+-----+ +-----+-----+-----+-----+ |sw1p0|sw1p1|sw1p2|sw1p3| |sw2p0|sw2p1|sw2p2|sw2p3| |sw3p0|sw3p1|sw3p2|sw3p3| +-----+-----+-----+-----+ +-----+-----+-----+-----+ +-----+-----+-----+-----+ The above can be described in the device tree as follows (obviously not complete): mscc_felix { dsa,member = <0 0>; ports { port@4 { ethernet = <&enetc_port2>; }; }; }; sja1105_switch1 { dsa,member = <1 1>; ports { port@4 { ethernet = <&mscc_felix_port1>; }; }; }; sja1105_switch2 { dsa,member = <2 2>; ports { port@4 { ethernet = <&mscc_felix_port2>; }; }; }; sja1105_switch3 { dsa,member = <3 3>; ports { port@4 { ethernet = <&mscc_felix_port3>; }; }; }; Basically we instantiate one DSA switch tree for every hardware switch in the system, but we still give them globally unique switch IDs (will come back to that later). Having 3 disjoint switch trees makes the tagger drivers "just work", because net devices are registered for the 3 Felix DSA master ports, and they are also DSA slave ports to the ENETC port. So packets received on the ENETC port are stripped of their stacked DSA tags one by one. Currently, hardware bridging between ports on the same sja1105 chip is possible, but switching between sja1105 ports on different chips is handled by the software bridge. This is fine, but we can do better. In fact, the dsa_8021q tag used by sja1105 is compatible with cascading. In other words, a sja1105 switch can correctly parse and route a packet containing a dsa_8021q tag. So if we could enable hardware bridging on the Felix DSA master ports, cross-chip bridging could be completely offloaded. Such as system would be used as follows: ip link add dev br0 type bridge && ip link set dev br0 up for port in sw0p0 sw0p1 sw0p2 sw0p3 \ sw1p0 sw1p1 sw1p2 sw1p3 \ sw2p0 sw2p1 sw2p2 sw2p3; do ip link set dev $port master br0 done The above makes switching between ports on the same row be performed in hardware, and between ports on different rows in software. Now assume the Felix switch ports are called swp0, swp1, swp2. By running the following extra commands: ip link add dev br1 type bridge && ip link set dev br1 up for port in swp0 swp1 swp2; do ip link set dev $port master br1 done the CPU no longer sees packets which traverse sja1105 switch boundaries and can be forwarded directly by Felix. The br1 bridge would not be used for any sort of traffic termination. For this to work, we need to give drivers an opportunity to listen for bridging events on DSA trees other than their own, and pass that other tree index as argument. I have made the assumption, for the moment, that the other existing DSA notifiers don't need to be broadcast to other trees. That assumption might turn out to be incorrect. But in the meantime, introduce a dsa_broadcast function, similar in purpose to dsa_port_notify, which is used only by the bridging notifiers. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-10net: bridge: allow enslaving some DSA master network devicesVladimir Oltean3-13/+48
Commit 8db0a2ee2c63 ("net: bridge: reject DSA-enabled master netdevices as bridge members") added a special check in br_if.c in order to check for a DSA master network device with a tagging protocol configured. This was done because back then, such devices, once enslaved in a bridge would become inoperative and would not pass DSA tagged traffic anymore due to br_handle_frame returning RX_HANDLER_CONSUMED. But right now we have valid use cases which do require bridging of DSA masters. One such example is when the DSA master ports are DSA switch ports themselves (in a disjoint tree setup). This should be completely equivalent, functionally speaking, from having multiple DSA switches hanging off of the ports of a switchdev driver. So we should allow the enslaving of DSA tagged master network devices. Instead of the regular br_handle_frame(), install a new function br_handle_frame_dummy() on these DSA masters, which returns RX_HANDLER_PASS in order to call into the DSA specific tagging protocol handlers, and lift the restriction from br_add_if. Suggested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Suggested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-10net: phy: Send notifier when starting the cable testAndrew Lunn1-0/+41
Given that it takes time to run a cable test, send a notify message at the start, as well as when it is completed. v3: EMSGSIZE when ethnl_bcastmsg_put() fails Print an error message on failure, since this is a void function. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-10net: ethtool: Add helpers for reporting test resultsAndrew Lunn1-1/+52
The PHY drivers can use these helpers for reporting the results. The results get translated into netlink attributes which are added to the pre-allocated skbuf. v3: Poison phydev->skb Return -EMSGSIZE when ethnl_bcastmsg_put() fails Return valid error code when nla_nest_start() fails Use u8 for results Actually put u32 length into message v4: s/ENOTSUPP/EOPNOTSUPP/g Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-10net: ethtool: Add infrastructure for reporting cable test resultsAndrew Lunn1-0/+55
Provide infrastructure for PHY drivers to report the cable test results. A netlink skb is associated to the phydev. Helpers will be added which can add results to this skb. Once the test has finished the results are sent to user space. When netlink ethtool is not part of the kernel configuration stubs are provided. It is also impossible to trigger a cable test, so the error code returned by the alloc function is of no consequence. v2: Include the status complete in the netlink notification message v4: Replace -EINVAL with -EMSGSIZE Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-10net: ethtool: Make helpers publicAndrew Lunn2-2/+4
Make some helpers for building ethtool netlink messages available outside the compilation unit, so they can be used for building messages which are not simple get/set. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-10net: ethtool: netlink: Add support for triggering a cable testAndrew Lunn5-1/+62
Add new ethtool netlink calls to trigger the starting of a PHY cable test. Add Kconfig'ury to ETHTOOL_NETLINK so that PHYLIB is not a module when ETHTOOL_NETLINK is builtin, which would result in kernel linking errors. v2: Remove unwanted white space change Remove ethnl_cable_test_act_ops and use doit handler Rename cable_test_set_policy cable_test_act_policy Remove ETHTOOL_MSG_CABLE_TEST_ACT_REPLY v3: Remove ETHTOOL_MSG_CABLE_TEST_ACT_REPLY from documentation Remove unused cable_test_get_policy Add Reviewed-by tags v4: Remove unwanted blank line Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09net: bpf: Add netlink and ipv6_route bpf_iter targetsYonghong Song3-4/+185
This patch added netlink and ipv6_route targets, using the same seq_ops (except show() and minor changes for stop()) for /proc/net/{netlink,ipv6_route}. The net namespace for these targets are the current net namespace at file open stage, similar to /proc/net/{netlink,ipv6_route} reference counting the net namespace at seq_file open stage. Since module is not supported for now, ipv6_route is supported only if the IPV6 is built-in, i.e., not compiled as a module. The restriction can be lifted once module is properly supported for bpf_iter. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175910.2476329-1-yhs@fb.com
2020-05-09Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linuxSaeed Mahameed1-0/+22
This merge includes updates to bonding driver needed for the rdma stack, to avoid conflicts with the RDMA branch. Maor Gottlieb Says: ==================== Bonding: Add support to get xmit slave The following series adds support to get the LAG master xmit slave by introducing new .ndo - ndo_get_xmit_slave. Every LAG module can implement it and it first implemented in the bond driver. This is follow-up to the RFC discussion [1]. The main motivation for doing this is for drivers that offload part of the LAG functionality. For example, Mellanox Connect-X hardware implements RoCE LAG which selects the TX affinity when the resources are created and port is remapped when it goes down. The first part of this patchset introduces the new .ndo and add the support to the bonding module. The second part adds support to get the RoCE LAG xmit slave by building skb of the RoCE packet based on the AH attributes and call to the new .ndo. The third part change the mlx5 driver driver to set the QP's affinity port according to the slave which found by the .ndo. ==================== Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-08ipv6: use DST_NOCOUNT in ip6_rt_pcpu_alloc()Eric Dumazet1-1/+1
We currently have to adjust ipv6 route gc_thresh/max_size depending on number of cpus on a server, this makes very little sense. If the kernels sets /proc/sys/net/ipv6/route/gc_thresh to 1024 and /proc/sys/net/ipv6/route/max_size to 4096, then we better not track the percpu dst that our implementation uses. Only routes not added (directly or indirectly) by the admin should be tracked and limited. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: David Ahern <dsahern@kernel.org> Cc: Maciej Żenczykowski <maze@google.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-08ieee802154: 6lowpan: remove unnecessary comparisonYang Yingliang1-1/+1
The type of dispatch is u8 which is always '<=' 0xff, so the dispatch <= 0xff is always true, we can remove this comparison. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-08net/dst: use a smaller percpu_counter batch for dst entries accountingEric Dumazet2-4/+7
percpu_counter_add() uses a default batch size which is quite big on platforms with 256 cpus. (2*256 -> 512) This means dst_entries_get_fast() can be off by +/- 2*(nr_cpus^2) (131072 on servers with 256 cpus) Reduce the batch size to something more reasonable, and add logic to ip6_dst_gc() to call dst_entries_get_slow() before calling the _very_ expensive fib6_run_gc() function. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-09bpf: Allow any port in bpf_bind helperStanislav Fomichev3-20/+20
We want to have a tighter control on what ports we bind to in the BPF_CGROUP_INET{4,6}_CONNECT hooks even if it means connect() becomes slightly more expensive. The expensive part comes from the fact that we now need to call inet_csk_get_port() that verifies that the port is not used and allocates an entry in the hash table for it. Since we can't rely on "snum || !bind_address_no_port" to prevent us from calling POST_BIND hook anymore, let's add another bind flag to indicate that the call site is BPF program. v5: * fix wrong AF_INET (should be AF_INET6) in the bpf program for v6 v3: * More bpf_bind documentation refinements (Martin KaFai Lau) * Add UDP tests as well (Martin KaFai Lau) * Don't start the thread, just do socket+bind+listen (Martin KaFai Lau) v2: * Update documentation (Andrey Ignatov) * Pass BIND_FORCE_ADDRESS_NO_PORT conditionally (Andrey Ignatov) Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrey Ignatov <rdna@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200508174611.228805-5-sdf@google.com
2020-05-09net: Refactor arguments of inet{,6}_bindStanislav Fomichev3-12/+14
The intent is to add an additional bind parameter in the next commit. Instead of adding another argument, let's convert all existing flag arguments into an extendable bit field. No functional changes. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrey Ignatov <rdna@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200508174611.228805-4-sdf@google.com
2020-05-07net: relax SO_TXTIME CAP_NET_ADMIN checkEric Dumazet1-10/+18
Now sch_fq has horizon feature, we want to allow QUIC/UDP applications to use EDT model so that pacing can be offloaded to the kernel (sch_fq) or the NIC. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-07netpoll: accept NULL np argument in netpoll_send_skb()Eric Dumazet4-13/+13
netpoll_send_skb() callers seem to leak skb if the np pointer is NULL. While this should not happen, we can make the code more robust. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-07netpoll: netpoll_send_skb() returns transmit statusEric Dumazet1-4/+7
Some callers want to know if the packet has been sent or dropped, to inform upper stacks. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-07netpoll: move netpoll_send_skb() out of lineEric Dumazet1-2/+11
There is no need to inline this helper, as we intend to add more code in this function. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-07netpoll: remove dev argument from netpoll_send_skb_on_dev()Eric Dumazet1-4/+6
netpoll_send_skb_on_dev() can get the device pointer directly from np->dev Rename it to __netpoll_send_skb() Following patch will move netpoll_send_skb() out-of-line. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-07net/smc: remove set but not used variables 'del_llc, del_llc_resp'YueHaibing1-7/+1
Fixes gcc '-Wunused-but-set-variable' warning: net/smc/smc_llc.c: In function 'smc_llc_cli_conf_link': net/smc/smc_llc.c:753:31: warning: variable 'del_llc' set but not used [-Wunused-but-set-variable] struct smc_llc_msg_del_link *del_llc; ^ net/smc/smc_llc.c: In function 'smc_llc_process_srv_delete_link': net/smc/smc_llc.c:1311:33: warning: variable 'del_llc_resp' set but not used [-Wunused-but-set-variable] struct smc_llc_msg_del_link *del_llc_resp; ^ Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-07tcp: tcp_mark_head_lost is only valid for sack-tcpzhang kai1-25/+7
so tcp_is_sack/reno checks are removed from tcp_mark_head_lost. Signed-off-by: zhang kai <zhangkaiheb@126.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-07net: remove newlines in NL_SET_ERR_MSG_MODJacob Keller3-5/+5
The NL_SET_ERR_MSG_MOD macro is used to report a string describing an error message to userspace via the netlink extended ACK structure. It should not have a trailing newline. Add a cocci script which catches cases where the newline marker is present. Using this script, fix the handful of cases which accidentally included a trailing new line. I couldn't figure out a way to get a patch mode working, so this script only implements context, report, and org. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Andy Whitcroft <apw@canonical.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-07hsr: remove WARN_ONCE() in hsr_fill_frame_info()Taehee Yoo1-1/+1
When VLAN frame is being sent, hsr calls WARN_ONCE() because hsr doesn't support VLAN. But using WARN_ONCE() is overdoing. Using netdev_warn_once() is enough. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-07net: dsa: introduce a dsa_port_from_netdev public helperVladimir Oltean1-0/+9
As its implementation shows, this is synonimous with calling dsa_slave_dev_check followed by dsa_slave_to_port, so it is quite simple already and provides functionality which is already there. However there is now a need for these functions outside dsa_priv.h, for example in drivers that perform mirroring and redirection through tc-flower offloads (they are given raw access to the flow_cls_offload structure), where they need to call this function on act->dev. But simply exporting dsa_slave_to_port would make it non-inline and would result in an extra function call in the hotpath, as can be seen for example in sja1105: Before: 000006dc <sja1105_xmit>: { 6dc: e92d4ff0 push {r4, r5, r6, r7, r8, r9, sl, fp, lr} 6e0: e1a04000 mov r4, r0 6e4: e591958c ldr r9, [r1, #1420] ; 0x58c <- Inline dsa_slave_to_port 6e8: e1a05001 mov r5, r1 6ec: e24dd004 sub sp, sp, #4 u16 tx_vid = dsa_8021q_tx_vid(dp->ds, dp->index); 6f0: e1c901d8 ldrd r0, [r9, #24] 6f4: ebfffffe bl 0 <dsa_8021q_tx_vid> 6f4: R_ARM_CALL dsa_8021q_tx_vid u8 pcp = netdev_txq_to_tc(netdev, queue_mapping); 6f8: e1d416b0 ldrh r1, [r4, #96] ; 0x60 u16 tx_vid = dsa_8021q_tx_vid(dp->ds, dp->index); 6fc: e1a08000 mov r8, r0 After: 000006e4 <sja1105_xmit>: { 6e4: e92d4ff0 push {r4, r5, r6, r7, r8, r9, sl, fp, lr} 6e8: e1a04000 mov r4, r0 6ec: e24dd004 sub sp, sp, #4 struct dsa_port *dp = dsa_slave_to_port(netdev); 6f0: e1a00001 mov r0, r1 { 6f4: e1a05001 mov r5, r1 struct dsa_port *dp = dsa_slave_to_port(netdev); 6f8: ebfffffe bl 0 <dsa_slave_to_port> 6f8: R_ARM_CALL dsa_slave_to_port 6fc: e1a09000 mov r9, r0 u16 tx_vid = dsa_8021q_tx_vid(dp->ds, dp->index); 700: e1c001d8 ldrd r0, [r0, #24] 704: ebfffffe bl 0 <dsa_8021q_tx_vid> 704: R_ARM_CALL dsa_8021q_tx_vid Because we want to avoid possible performance regressions, introduce this new function which is designed to be public. Suggested-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-07net: qrtr: Do not depend on ARCH_QCOMManivannan Sadhasivam1-1/+0
IPC Router protocol is also used by external modems for exchanging the QMI messages. Hence, it doesn't always depend on Qualcomm platforms. One such instance is the QCA6390 WLAN device connected to x86 machine. Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-07net: qrtr: Add MHI transport layerManivannan Sadhasivam3-0/+136
MHI is the transport layer used for communicating to the external modems. Hence, this commit adds MHI transport layer support to QRTR for transferring the QMI messages over IPC Router. Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller35-174/+316
Conflicts were all overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds32-163/+285
Pull networking fixes from David Miller: 1) Fix reference count leaks in various parts of batman-adv, from Xiyu Yang. 2) Update NAT checksum even when it is zero, from Guillaume Nault. 3) sk_psock reference count leak in tls code, also from Xiyu Yang. 4) Sanity check TCA_FQ_CODEL_DROP_BATCH_SIZE netlink attribute in fq_codel, from Eric Dumazet. 5) Fix panic in choke_reset(), also from Eric Dumazet. 6) Fix VLAN accel handling in bnxt_fix_features(), from Michael Chan. 7) Disallow out of range quantum values in sch_sfq, from Eric Dumazet. 8) Fix crash in x25_disconnect(), from Yue Haibing. 9) Don't pass pointer to local variable back to the caller in nf_osf_hdr_ctx_init(), from Arnd Bergmann. 10) Wireguard should use the ECN decap helper functions, from Toke Høiland-Jørgensen. 11) Fix command entry leak in mlx5 driver, from Moshe Shemesh. 12) Fix uninitialized variable access in mptcp's subflow_syn_recv_sock(), from Paolo Abeni. 13) Fix unnecessary out-of-order ingress frame ordering in macsec, from Scott Dial. 14) IPv6 needs to use a global serial number for dst validation just like ipv4, from David Ahern. 15) Fix up PTP_1588_CLOCK deps, from Clay McClure. 16) Missing NLM_F_MULTI flag in gtp driver netlink messages, from Yoshiyuki Kurauchi. 17) Fix a regression in that dsa user port errors should not be fatal, from Florian Fainelli. 18) Fix iomap leak in enetc driver, from Dejin Zheng. 19) Fix use after free in lec_arp_clear_vccs(), from Cong Wang. 20) Initialize protocol value earlier in neigh code paths when generating events, from Roman Mashak. 21) netdev_update_features() must be called with RTNL mutex in macsec driver, from Antoine Tenart. 22) Validate untrusted GSO packets even more strictly, from Willem de Bruijn. 23) Wireguard decrypt worker needs a cond_resched(), from Jason Donenfeld. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits) net: flow_offload: skip hw stats check for FLOW_ACTION_HW_STATS_DONT_CARE MAINTAINERS: put DYNAMIC INTERRUPT MODERATION in proper order wireguard: send/receive: use explicit unlikely branch instead of implicit coalescing wireguard: selftests: initalize ipv6 members to NULL to squelch clang warning wireguard: send/receive: cond_resched() when processing worker ringbuffers wireguard: socket: remove errant restriction on looping to self wireguard: selftests: use normal kernel stack size on ppc64 net: ethernet: ti: am65-cpsw-nuss: fix irqs type ionic: Use debugfs_create_bool() to export bool net: dsa: Do not leave DSA master with NULL netdev_ops net: dsa: remove duplicate assignment in dsa_slave_add_cls_matchall_mirred net: stricter validation of untrusted gso packets seg6: fix SRH processing to comply with RFC8754 net: mscc: ocelot: ANA_AUTOAGE_AGE_PERIOD holds a value in seconds, not ms net: dsa: ocelot: the MAC table on Felix is twice as large net: dsa: sja1105: the PTP_CLK extts input reacts on both edges selftests: net: tcp_mmap: fix SO_RCVLOWAT setting net: hsr: fix incorrect type usage for protocol variable net: macsec: fix rtnl locking issue net: mvpp2: cls: Prevent buffer overflow in mvpp2_ethtool_cls_rule_del() ...
2020-05-06net: flow_offload: skip hw stats check for FLOW_ACTION_HW_STATS_DONT_CAREPablo Neira Ayuso1-2/+12
This patch adds FLOW_ACTION_HW_STATS_DONT_CARE which tells the driver that the frontend does not need counters, this hw stats type request never fails. The FLOW_ACTION_HW_STATS_DISABLED type explicitly requests the driver to disable the stats, however, if the driver cannot disable counters, it bails out. TCA_ACT_HW_STATS_* maintains the 1:1 mapping with FLOW_ACTION_HW_STATS_* except by disabled which is mapped to FLOW_ACTION_HW_STATS_DISABLED (this is 0 in tc). Add tc_act_hw_stats() to perform the mapping between TCA_ACT_HW_STATS_* and FLOW_ACTION_HW_STATS_*. Fixes: 319a1d19471e ("flow_offload: check for basic action hw stats type") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06ethtool: provide UAPI for PHY master/slave configuration.Oleksij Rempel2-0/+59
This UAPI is needed for BroadR-Reach 100BASE-T1 devices. Due to lack of auto-negotiation support, we needed to be able to configure the MASTER-SLAVE role of the port manually or from an application in user space. The same UAPI can be used for 1000BASE-T or MultiGBASE-T devices to force MASTER or SLAVE role. See IEEE 802.3-2018: 22.2.4.3.7 MASTER-SLAVE control register (Register 9) 22.2.4.3.8 MASTER-SLAVE status register (Register 10) 40.5.2 MASTER-SLAVE configuration resolution 45.2.1.185.1 MASTER-SLAVE config value (1.2100.14) 45.2.7.10 MultiGBASE-T AN control 1 register (Register 7.32) The MASTER-SLAVE role affects the clock configuration: ------------------------------------------------------------------------------- When the PHY is configured as MASTER, the PMA Transmit function shall source TX_TCLK from a local clock source. When configured as SLAVE, the PMA Transmit function shall source TX_TCLK from the clock recovered from data stream provided by MASTER. iMX6Q KSZ9031 XXX ------\ /-----------\ /------------\ | | | | | MAC |<----RGMII----->| PHY Slave |<------>| PHY Master | |<--- 125 MHz ---+-<------/ | | \ | ------/ \-----------/ \------------/ ^ \-TX_TCLK ------------------------------------------------------------------------------- Since some clock or link related issues are only reproducible in a specific MASTER-SLAVE-role, MAC and PHY configuration, it is beneficial to provide generic (not 100BASE-T1 specific) interface to the user space for configuration flexibility and trouble shooting. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06net: dsa: Do not leave DSA master with NULL netdev_opsFlorian Fainelli1-1/+2
When ndo_get_phys_port_name() for the CPU port was added we introduced an early check for when the DSA master network device in dsa_master_ndo_setup() already implements ndo_get_phys_port_name(). When we perform the teardown operation in dsa_master_ndo_teardown() we would not be checking that cpu_dp->orig_ndo_ops was successfully allocated and non-NULL initialized. With network device drivers such as virtio_net, this leads to a NPD as soon as the DSA switch hanging off of it gets torn down because we are now assigning the virtio_net device's netdev_ops a NULL pointer. Fixes: da7b9e9b00d4 ("net: dsa: Add ndo_get_phys_port_name() for CPU port") Reported-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06net: dsa: remove duplicate assignment in dsa_slave_add_cls_matchall_mirredVladimir Oltean1-5/+3
This was caused by a poor merge conflict resolution on my side. The "act = &cls->rule->action.entries[0];" assignment was already present in the code prior to the patch mentioned below. Fixes: e13c2075280e ("net: dsa: refactor matchall mirred action to separate function") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06tcp: defer xmit timer reset in tcp_xmit_retransmit_queue()Eric Dumazet1-6/+10
As hinted in prior change ("tcp: refine tcp_pacing_delay() for very low pacing rates"), it is probably best arming the xmit timer only when all the packets have been scheduled, rather than when the head of rtx queue has been re-sent. This does matter for flows having extremely low pacing rates, since their tp->tcp_wstamp_ns could be far in the future. Note that the regular xmit path has a stronger limit in tcp_small_queue_check(), meaning it is less likely to go beyond the pacing horizon. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06tcp: refine tcp_pacing_delay() for very low pacing ratesEric Dumazet2-7/+5
With the addition of horizon feature to sch_fq, we noticed some suboptimal behavior of extremely low pacing rate TCP flows, especially when TCP is not aware of a drop happening in lower stacks. Back in commit 3f80e08f40cd ("tcp: add tcp_reset_xmit_timer() helper"), tcp_pacing_delay() was added to estimate an extra delay to add to standard rto timers. This patch removes the skb argument from this helper and tcp_reset_xmit_timer() because it makes more sense to simply consider the time at which next packet is allowed to be sent, instead of the time of whatever packet has been sent. This avoids arming RTO timer too soon and removes spurious horizon drops. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06seg6: fix SRH processing to comply with RFC8754Ahmed Abdelsalam1-2/+8
The Segment Routing Header (SRH) which defines the SRv6 dataplane is defined in RFC8754. RFC8754 (section 4.1) defines the SR source node behavior which encapsulates packets into an outer IPv6 header and SRH. The SR source node encodes the full list of Segments that defines the packet path in the SRH. Then, the first segment from list of Segments is copied into the Destination address of the outer IPv6 header and the packet is sent to the first hop in its path towards the destination. If the Segment list has only one segment, the SR source node can omit the SRH as he only segment is added in the destination address. RFC8754 (section 4.1.1) defines the Reduced SRH, when a source does not require the entire SID list to be preserved in the SRH. A reduced SRH does not contain the first segment of the related SR Policy (the first segment is the one already in the DA of the IPv6 header), and the Last Entry field is set to n-2, where n is the number of elements in the SR Policy. RFC8754 (section 4.3.1.1) defines the SRH processing and the logic to validate the SRH (S09, S10, S11) which works for both reduced and non-reduced behaviors. This patch updates seg6_validate_srh() to validate the SRH as per RFC8754. Signed-off-by: Ahmed Abdelsalam <ahabdels@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06ipv6: Implement draft-ietf-6man-rfc4941bisFernando Gont1-52/+39
Implement the upcoming rev of RFC4941 (IPv6 temporary addresses): https://tools.ietf.org/html/draft-ietf-6man-rfc4941bis-09 * Reduces the default Valid Lifetime to 2 days The number of extra addresses employed when Valid Lifetime was 7 days exacerbated the stress caused on network elements/devices. Additionally, the motivation for temporary addresses is indeed privacy and reduced exposure. With a default Valid Lifetime of 7 days, an address that becomes revealed by active communication is reachable and exposed for one whole week. The only use case for a Valid Lifetime of 7 days could be some application that is expecting to have long lived connections. But if you want to have a long lived connections, you shouldn't be using a temporary address in the first place. Additionally, in the era of mobile devices, general applications should nevertheless be prepared and robust to address changes (e.g. nodes swap wifi <-> 4G, etc.) * Employs different IIDs for different prefixes To avoid network activity correlation among addresses configured for different prefixes * Uses a simpler algorithm for IID generation No need to store "history" anywhere Signed-off-by: Fernando Gont <fgont@si6networks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06net: hsr: fix incorrect type usage for protocol variableMurali Karicheri1-1/+1
Fix following sparse checker warning:- net/hsr/hsr_slave.c:38:18: warning: incorrect type in assignment (different base types) net/hsr/hsr_slave.c:38:18: expected unsigned short [unsigned] [usertype] protocol net/hsr/hsr_slave.c:38:18: got restricted __be16 [usertype] h_proto net/hsr/hsr_slave.c:39:25: warning: restricted __be16 degrades to integer net/hsr/hsr_slave.c:39:57: warning: restricted __be16 degrades to integer Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06net: bridge: return false in br_mrp_enabled()Jason Yan1-1/+1
Fix the following coccicheck warning: net/bridge/br_private.h:1334:8-9: WARNING: return of 0/1 in function 'br_mrp_enabled' with return type bool Fixes: 6536993371fab ("bridge: mrp: Integrate MRP into the bridge") Signed-off-by: Jason Yan <yanaijie@huawei.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-05neigh: send protocol value in neighbor create notificationRoman Mashak1-3/+3
When a new neighbor entry has been added, event is generated but it does not include protocol, because its value is assigned after the event notification routine has run, so move protocol assignment code earlier. Fixes: df9b0e30d44c ("neighbor: Add protocol attribute") Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Roman Mashak <mrv@mojatatu.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-05erspan: Add type I version 0 support.William Tu1-15/+43
The Type I ERSPAN frame format is based on the barebones IP + GRE(4-byte) encapsulation on top of the raw mirrored frame. Both type I and II use 0x88BE as protocol type. Unlike type II and III, no sequence number or key is required. To creat a type I erspan tunnel device: $ ip link add dev erspan11 type erspan \ local 172.16.1.100 remote 172.16.1.200 \ erspan_ver 0 Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-05net/smc: remove unused inline function smc_curs_readYueHaibing1-17/+0
commit bac6de7b6370 ("net/smc: eliminate cursor read and write calls") left behind this. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-05net/smc: log important pnetid and state change eventsKarsten Graul8-20/+113
Print to system log when SMC links are available or go down, link group state changes or pnetids are applied to and removed from devices. The log entries are triggered by either user configuration actions or adapter activation/deactivation events and are not expected to happen often. The entries help SMC users to keep track of the SMC link group status and to detect when actions are needed (like to add replacements for failed adapters). Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-05sch_choke: Remove classid from choke_skb_cb.David S. Miller1-1/+0
Suggested by Cong Wang. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-05net: sched: choke: Remove unused inline function choke_set_classidYueHaibing1-5/+0
There's no callers in-tree anymore since commit 5952fde10c35 ("net: sched: choke: remove dead filter classify code") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04xsk: Remove unnecessary member in xdp_umemMagnus Karlsson1-4/+3
Remove the unnecessary member of address in struct xdp_umem as it is only used during the umem registration. No need to carry this around as it is not used during run-time nor when unregistering the umem. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/bpf/1588599232-24897-3-git-send-email-magnus.karlsson@intel.com
2020-05-04xsk: Change two variable names for increased clarityMagnus Karlsson4-17/+17
Change two variables names so that it is clearer what they represent. The first one is xsk_list that in fact only contains the list of AF_XDP sockets with a Tx component. Change this to xsk_tx_list for improved clarity. The second variable is size in the ring structure. One might think that this is the size of the ring, but it is in fact the size of the umem, copied into the ring structure to improve performance. Rename this variable umem_size to avoid any confusion. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/bpf/1588599232-24897-2-git-send-email-magnus.karlsson@intel.com
2020-05-04net: partially revert dynamic lockdep key changesCong Wang10-25/+204
This patch reverts the folowing commits: commit 064ff66e2bef84f1153087612032b5b9eab005bd "bonding: add missing netdev_update_lockdep_key()" commit 53d374979ef147ab51f5d632dfe20b14aebeccd0 "net: avoid updating qdisc_xmit_lock_key in netdev_update_lockdep_key()" commit 1f26c0d3d24125992ab0026b0dab16c08df947c7 "net: fix kernel-doc warning in <linux/netdevice.h>" commit ab92d68fc22f9afab480153bd82a20f6e2533769 "net: core: add generic lockdep keys" but keeps the addr_list_lock_key because we still lock addr_list_lock nestedly on stack devices, unlikely xmit_lock this is safe because we don't take addr_list_lock on any fast path. Reported-and-tested-by: syzbot+aaa6fa4949cc5d9b7b25@syzkaller.appspotmail.com Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04atm: fix a memory leak of vcc->user_backCong Wang1-0/+6
In lec_arp_clear_vccs() only entry->vcc is freed, but vcc could be installed on entry->recv_vcc too in lec_vcc_added(). This fixes the following memory leak: unreferenced object 0xffff8880d9266b90 (size 16): comm "atm2", pid 425, jiffies 4294907980 (age 23.488s) hex dump (first 16 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 6b 6b 6b a5 ............kkk. backtrace: [<(____ptrval____)>] kmem_cache_alloc_trace+0x10e/0x151 [<(____ptrval____)>] lane_ioctl+0x4b3/0x569 [<(____ptrval____)>] do_vcc_ioctl+0x1ea/0x236 [<(____ptrval____)>] svc_ioctl+0x17d/0x198 [<(____ptrval____)>] sock_do_ioctl+0x47/0x12f [<(____ptrval____)>] sock_ioctl+0x2f9/0x322 [<(____ptrval____)>] vfs_ioctl+0x1e/0x2b [<(____ptrval____)>] ksys_ioctl+0x61/0x80 [<(____ptrval____)>] __x64_sys_ioctl+0x16/0x19 [<(____ptrval____)>] do_syscall_64+0x57/0x65 [<(____ptrval____)>] entry_SYSCALL_64_after_hwframe+0x49/0xb3 Cc: Gengming Liu <l.dmxcsnsbh@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>