aboutsummaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2017-09-29bpf: libbpf: Provide basic API support to specify BPF obj nameMartin KaFai Lau7-47/+157
This patch extends the libbpf to provide API support to allow specifying BPF object name. In tools/lib/bpf/libbpf, the C symbol of the function and the map is used. Regarding section name, all maps are under the same section named "maps". Hence, section name is not a good choice for map's name. To be consistent with map, bpf_prog also follows and uses its function symbol as the prog's name. This patch adds logic to collect function's symbols in libbpf. There is existing codes to collect the map's symbols and no change is needed. The bpf_load_program_name() and bpf_map_create_name() are added to take the name argument. For the other bpf_map_create_xxx() variants, a name argument is directly added to them. In samples/bpf, bpf_load.c in particular, the symbol is also used as the map's name and the map symbols has already been collected in the existing code. For bpf_prog, bpf_load.c does not collect the function symbol name. We can consider to collect them later if there is a need to continue supporting the bpf_load.c. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-29bpf: Add map_name to bpf_map_infoMartin KaFai Lau3-1/+9
This patch allows userspace to specify a name for a map during BPF_MAP_CREATE. The map's name can later be exported to user space via BPF_OBJ_GET_INFO_BY_FD. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-29bpf: Add name, load_time, uid and map_ids to bpf_prog_infoMartin KaFai Lau3-1/+60
The patch adds name and load_time to struct bpf_prog_aux. They are also exported to bpf_prog_info. The bpf_prog's name is passed by userspace during BPF_PROG_LOAD. The kernel only stores the first (BPF_PROG_NAME_LEN - 1) bytes and the name stored in the kernel is always \0 terminated. The kernel will reject name that contains characters other than isalnum() and '_'. It will also reject name that is not null terminated. The existing 'user->uid' of the bpf_prog_aux is also exported to the bpf_prog_info as created_by_uid. The existing 'used_maps' of the bpf_prog_aux is exported to the newly added members 'nr_map_ids' and 'map_ids' of the bpf_prog_info. On the input, nr_map_ids tells how big the userspace's map_ids buffer is. On the output, nr_map_ids tells the exact user_map_cnt and it will only copy up to the userspace's map_ids buffer is allowed. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-29tcp: fix under-evaluated ssthresh in TCP VegasHoang Tran1-1/+1
With the commit 76174004a0f19785 (tcp: do not slow start when cwnd equals ssthresh), the comparison to the reduced cwnd in tcp_vegas_ssthresh() would under-evaluate the ssthresh. Signed-off-by: Hoang Tran <hoang.tran@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-29net: bridge: add per-port group_fwd_mask with less restrictionsNikolay Aleksandrov5-2/+42
We need to be able to transparently forward most link-local frames via tunnels (e.g. vxlan, qinq). Currently the bridge's group_fwd_mask has a mask which restricts the forwarding of STP and LACP, but we need to be able to forward these over tunnels and control that forwarding on a per-port basis thus add a new per-port group_fwd_mask option which only disallows mac pause frames to be forwarded (they're always dropped anyway). The patch does not change the current default situation - all of the others are still restricted unless configured for forwarding. We have successfully tested this patch with LACP and STP forwarding over VxLAN and qinq tunnels. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28Merge branch 'hns3-dcb'David S. Miller13-119/+927
Yunsheng Lin says: ==================== Add support for DCB feature in hns3 driver The patchset contains some enhancement related to DCB before adding support for DCB feature. This patchset depends on the following patchset: https://patchwork.ozlabs.org/cover/815646/ https://patchwork.ozlabs.org/cover/816145/ High Level Architecture: [ lldpad ] | | | [ hns3_dcbnl ] | | | [ hclge_dcb ] / \ / \ / \ [ hclge_main ] [ hclge_tm ] Current patch-set support following functionality: Use of lldptool to configure the tc schedule mode, tc bandwidth(if schedule mode is ETS), prio_tc_map and PFC parameter. V3: Drop mqprio support V2: Fix for not defining variables in local loop. V1: Initial Submit. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28net: hns3: Add DCB support when interacting with network stackYunsheng Lin1-15/+87
When using lldptool to configure DCB parameter, hclge_dcb module call the client_ops->setup_tc to tell network stack which queue and priority is using for specific tc. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28net: hns3: Setting for fc_mode and dcb enable flag in TM moduleYunsheng Lin1-4/+30
After the DCB feature is supported, fc_mode and dcb enable flag must be set according to the DCB parameter. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28net: hns3: Add dcb netlink interface for the support of DCB featureYunsheng Lin4-0/+117
This patch add dcb netlink interface by calling the interface from hclge_dcb module. This patch also update Makefile in order to build hns3_dcbnl module. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28net: hns3: Add hclge_dcb module for the support of DCB featureYunsheng Lin7-6/+375
The hclge_dcb module calls the interface from hclge_main/tm and provide interface for the dcb netlink interface. This patch also update Makefiles required to build the DCB supported code in HNS3 Ethernet driver and update the existing Kconfig file in the hisilicon folder. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28net: hns3: Add some interface for the support of DCB featureYunsheng Lin4-5/+55
This patch add some interface and export some interface from hclge_tm and hclgc_main to support the upcoming DCB feature. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28net: hns3: Add tc-based TM support for sriov enabled portYunsheng Lin1-18/+31
When sriov is enabled and TM is in tc-based mode, vf's TM parameters is not set in TM initialization process. This patch add the tc_based TM support for sriov enabled using the information in vport struct. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28net: hns3: Add support for port shaper setting in TM moduleYunsheng Lin2-0/+36
This patch add a tm_port_shaper cmd and set port shaper to HCLGE_ETHER_MAX_RATE on TM initialization process. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28net: hns3: Add support for PFC setting in TM moduleYunsheng Lin2-5/+68
This patch add a pfc_pause_en cmd, and use it to configure PFC option according to fc_mode in hdev->tm_info. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28net: hns3: Add support for dynamically buffer reallocationYunsheng Lin3-70/+87
Current buffer allocation can only happen at init, when doing buffer reallocation after init, care must be taken care of memory which priv_buf points to. This patch fixes it by using a dynamic allocated temporary memory. Because we only do buffer reallocation at init or when setting up the DCB parameter, and priv_buf is only used at buffer allocation process, so it is ok to use a dynamic allocated temporary memory. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28net: hns3: Support for dynamically assigning tx buffer to TCYunsheng Lin2-10/+55
This patch add support of dynamically assigning tx buffer to TC when the TC is enabled. It will save buffer for rx direction to avoid packet loss. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28arp: make arp_hdr_len() return unsigned intAlexey Dobriyan2-2/+3
Negative ARP header length are not a thing. Constify arguments while I'm at it. Space savings: add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-3 (-3) function old new delta arpt_do_table 1163 1160 -3 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28net-next/hinic: Fix a case of Tx Queue is Stopped foreverAviad Krawczyk1-2/+12
Fix the following scenario: 1. tx_free_poll is running on cpu X 2. xmit function is running on cpu Y and fails to get sq wqe 3. tx_free_poll frees wqes on cpu X and checks the queue is not stopped 4. xmit function stops the queue after failed to get sq wqe 5. The queue is stopped forever Signed-off-by: Aviad Krawczyk <aviad.krawczyk@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28net-next/hinic: Set Rxq irq to specific cpu for NUMAAviad Krawczyk1-9/+9
Set Rxq irq to specific cpu for allocating and receiving the skb from the same node. Signed-off-by: Aviad Krawczyk <aviad.krawczyk@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28Merge branch 'bpf-verifier-disassembly-improvements'David S. Miller1-2/+21
Edward Cree says: ==================== bpf/verifier: disassembly improvements Fix the output of print_bpf_insn() for ALU ops that don't look like compound assignment (i.e. BPF_END and BPF_NEG). Sample output for a short test program: 0: (b4) (u32) r0 = (u32) 0 1: (dc) r0 = be32 r0 2: (84) r0 = (u32) -r0 3: (95) exit processed 4 insns, stack depth 0 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28bpf/verifier: improve disassembly of BPF_NEG instructionsEdward Cree1-0/+5
BPF_NEG takes only one operand, unlike the bulk of BPF_ALU[64] which are compound-assignments. So give it its own format in print_bpf_insn(). Signed-off-by: Edward Cree <ecree@solarflare.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28bpf/verifier: improve disassembly of BPF_END instructionsEdward Cree1-2/+16
print_bpf_insn() was treating all BPF_ALU[64] the same, but BPF_END has a different structure: it has a size in insn->imm (even if it's BPF_X) and uses the BPF_SRC (X or K) to indicate which endianness to use. So it needs different code to print it. Signed-off-by: Edward Cree <ecree@solarflare.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28cxgb4: make function ch_flower_stats_cb, fixes warningColin Ian King1-1/+1
The function ch_flower_stats_cb is local to the source and does not need to be in global scope, so make it static. Cleans up sparse warnings: symbol 'ch_flower_stats_cb' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28Merge branch 'rtnl-pushdown-prep'David S. Miller1-38/+84
Florian Westphal says: ==================== rtnetlink: preparation patches for further rtnl lock pushdown/removal Patches split large rtnl_fill_ifinfo into smaller chunks to better see which parts 1. require rtnl 2. do not require it at all 3. rely on rtnl locking now but could be converted Changes since v3: I dropped the 'ifalias' patch, I have a change to decouple ifalias and rtnl mutex, I will send it once this series has been merged. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28rtnetlink: rtnl_have_link_slave_info doesn't need rtnlFlorian Westphal1-3/+7
it can be switched to rcu. Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28rtnetlink: add helpers to dump netnsid informationFlorian Westphal1-11/+19
Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28rtnetlink: add helpers to dump vf informationFlorian Westphal1-19/+31
similar to earlier patches, split out more parts of this function to better see what is happening and where we assume rtnl is locked. Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28rtnetlink: add helper to put master and link ifindexesFlorian Westphal1-5/+27
rtnl_fill_ifinfo currently requires caller to hold the rtnl mutex. Unfortunately the function is quite large which makes it harder to see which spots require the lock, which spots assume it and which ones could do without. Add helpers to factor out the ifindex dumping, one can use rcu to avoid rtnl dependency. Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28selftests: rtnetlink.sh: add rudimentary vrf testFlorian Westphal1-0/+42
Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28net_sched: use idr to allocate u32 filter handlesCong Wang1-41/+67
Instead of calling u32_lookup_ht() in a loop to find a unused handle, just switch to idr API to allocate new handles. u32 filters are special as the handle could contain a hash table id and a key id, so we need two IDR to allocate each of them. Cc: Chris Mi <chrism@mellanox.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28net_sched: use idr to allocate basic filter handlesCong Wang1-15/+22
Instead of calling basic_get() in a loop to find a unused handle, just switch to idr API to allocate new handles. Cc: Chris Mi <chrism@mellanox.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28net_sched: use idr to allocate bpf filter handlesCong Wang1-29/+28
Instead of calling cls_bpf_get() in a loop to find a unused handle, just switch to idr API to allocate new handles. Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Chris Mi <chrism@mellanox.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-28inetpeer: speed up inetpeer_invalidate_tree()Eric Dumazet1-4/+7
As measured in my prior patch ("sch_netem: faster rb tree removal"), rbtree_postorder_for_each_entry_safe() is nice looking but much slower than using rb_next() directly, except when tree is small enough to fit in CPU caches (then the cost is the same) From: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-27Merge branch 'mlxsw-Add-support-for-offloading-IPv4-multicast-routes'David S. Miller11-11/+2480
Jiri Pirko says: ==================== mlxsw: Add support for offloading IPv4 multicast routes Yotam says: This patch-set introduces offloading of the kernel IPv4 multicast router logic in the Spectrum driver. The first patch makes the Spectrum driver ignore FIB notifications that are not of address family IPv4 or IPv6. This is needed in order to prevent crashes while the next patches introduce the RTNL_FAMILY_IPMR FIB notifications. Patches 2-5 update ipmr to use the FIB notification chain for both MFC and VIF notifications, and patches 8-12 update the Spectrum driver to register to these notifications and offload the routes. Similarly to IPv4 and IPv6, any failure will trigger the abort mechanism which is updated in this patch-set to eject multicast route tables too. At this stage, the following limitations apply: - A multicast MFC route will be offloaded by the driver if all the output interfaces are Spectrum router interfaces (RIFs). In any other case (which includes pimreg device, tunnel devices and management ports) the route will be trapped to the CPU and the packets will be forwarded by software. - ipmr proxy routes are not supported and will trigger the abort mechanism. - The MFC TTL values are currently treated as boolean: if the value is different than 255, the traffic is forwarded to the interface and if the value is 255 it is not forwarded. Dropping packets based on their TTL isn't currently supported. To allow users to have visibility on which of the routes are offloaded and which are not, patch 6 introduces a per-route offload indication similar to IPv4 and IPv6 routes which is sent to the user via the RTNetlink interface. The Spectrum driver multicast router offloading support, which is introduced in patches 8 and 9, is divided into two parts: - The hardware logic which abstracts the Spectrum hardware and provides a simple API for the upper levels. - The offloading logic which gets the MFC and VIF notifications from the kernel and updates the hardware using the hardware logic part. Finally, the last patch makes the Spectrum router logic not ignore the multicast FIB notifications and call the corresponding functions in the multicast router offloading logic. --- v2->v3: - Move the ipmr_rule_default function definition to be inside the already existing CONFIG_IP_MROUTE_MULTIPLE_TABLES ifdef block (patch 6) - Remove double =0 initialization in spectrum_mr.c (patch 7) - Fix route4 allocation size (patch 7) v1->v2: - Add comments for struct fields in mroute.h - Take the mrt_lock while dumping VIFs in the fib_notifier dump callback - Update the MFC lastuse field too ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-27mlxsw: spectrum: router: Don't ignore IPMR notificationsYotam Gigi1-1/+2
Make the Spectrum router logic not ignore the RTNL_FAMILY_IPMR FIB notifications. Past commits added the IPMR VIF and MFC add/del notifications via the fib_notifier chain. In addition, a code for handling these notifications in the Spectrum router logic was added. Make the Spectrum router logic not ignore these notifications and forward the requests to the Spectrum multicast router offloading logic. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-27mlxsw: spectrum: Notify multicast router on RIF MTU changesYotam Gigi1-0/+11
Due to the fact that multicast routes hold the minimum MTU of all the egress RIFs and trap packets that don't meet it, notify the mulitcast router code on RIF MTU changes. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-27mlxsw: spectrum_router: Add multicast routes notification handling functionalityYotam Gigi1-2/+185
Add functionality for calling the multicast routing offloading logic upon MFC and VIF add and delete notifications. In addition, call the multicast routing upon RIF addition and deletion events. As the multicast routing offload logic may sleep, the actual calls are done in a deferred work. To ensure the MFC object is not freed in that interval, a reference is held to it. In case of a failure, the abort mechanism is used, which ejects all the routes from the hardware and triggers the traffic to flow through the kernel. Note: At that stage, the FIB notifications are still ignored, and will be enabled in a further patch. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-27mlxsw: spectrum: router: Squash the default route table to mainYotam Gigi1-2/+2
Currently, the mlxsw Spectrum driver offloads only either the RT_TABLE_MAIN FIB table or the VRF tables, so the RT_TABLE_LOCAL table is squashed to the RT_TABLE_MAIN table to allow local routes to be offloaded too. By default, multicast MFC routes which are not assigned to any user requested table are put in the RT_TABLE_DEFAULT table. Due to the fact that offloading multicast MFC routes support in Spectrum router logic is going to be introduced soon, squash the default table to MAIN too. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-27mlxsw: spectrum: Add the multicast routing hardware logicYotam Gigi4-1/+873
Implement the multicast routing hardware API introduced in previous patch for the specific spectrum hardware. The spectrum hardware multicast routes are written using the RMFT2 register and point to an ACL flexible action set. The actions used for multicast routes are: - Counter action, which allows counting bytes and packets on multicast routes. - Multicast route action, which provide RPF check and do the actual packet duplication to a list of RIFs. - Trap action, in the case the route action specified by the called is trap. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-27mlxsw: spectrum: Add the multicast routing offloading logicYotam Gigi4-1/+1150
Add the multicast router offloading logic, which is in charge of handling the VIF and MFC notifications and translating it to the hardware logic API. The offloading logic has to overcome several obstacles in order to safely comply with the kernel multicast router user API: - It must keep track of the mapping between VIFs to netdevices. The user can add an MFC cache entry pointing to a VIF, delete the VIF and add re-add it with a different netdevice. The offloading logic has to handle this in order to be compatible with the kernel logic. - It must keep track of the mapping between netdevices to spectrum RIFs, as the current hardware implementation assume having a RIF for every port in a multicast router. - It must handle routes pointing to pimreg device to be trapped to the kernel, as the packet should be delivered to userspace. - It must handle routes pointing tunnel VIFs. The current implementation does not support multicast forwarding to tunnels, thus routes that point to a tunnel should be trapped to the kernel. - It must be aware of proxy multicast routes, which include both (*,*) routes and duplicate routes. Currently proxy routes are not offloaded and trigger the abort mechanism: removal of all routes from hardware and triggering the traffic to go through the kernel. The multicast routing offloading logic also updates the counters of the offloaded MFC routes in a periodic work. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-27net: mroute: Check if rule is a default ruleYotam Gigi2-0/+19
When the ipmr starts, it adds one default FIB rule that matches all packets and sends them to the DEFAULT (multicast) FIB table. A more complex rule can be added by user to specify that for a specific interface, a packet should be look up at either an arbitrary table or according to the l3mdev of the interface. For drivers willing to offload the ipmr logic into a hardware but don't want to offload all the FIB rules functionality, provide a function that can indicate whether the FIB rule is the default multicast rule, thus only one routing table is needed. This way, a driver can register to the FIB notification chain, get notifications about FIB rules added and trigger some kind of an internal abort mechanism when a non default rule is added by the user. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-27net: ipmr: Add MFC offload indicationYotam Gigi2-0/+5
Allow drivers, registered to the fib notification chain indicate whether a multicast MFC route is offloaded or not, similarly to unicast routes. The indication of whether a route is offloaded is done using the mfc_flags field on an mfc_cache struct, and the information is sent to the userspace via the RTNetlink interface only. Currently, MFC routes are either offloaded or not, thus there is no need to add per-VIF offload indication. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-27ipmr: Send FIB notifications on MFC and VIF entriesYotam Gigi1-0/+53
Use the newly introduced notification chain to send events upon VIF and MFC addition and deletion. The MFC notifications are sent only on resolved MFC entries, as unresolved cannot be offloaded. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-27ipmr: Add FIB notification access functionsYotam Gigi3-2/+153
Make the ipmr module register as a FIB notifier. To do that, implement both the ipmr_seq_read and ipmr_dump ops. The ipmr_seq_read op returns a sequence counter that is incremented on every notification related operation done by the ipmr. To implement that, add a sequence counter in the netns_ipv4 struct and increment it whenever a new MFC route or VIF are added or deleted. The sequence operations are protected by the RTNL lock. The ipmr_dump iterates the list of MFC routes and the list of VIF entries and sends notifications about them. The entries dump is done under RCU where the VIF dump uses the mrt_lock too, as the vif->dev field can change under RCU. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-27ipmr: Add reference count to MFC entriesYotam Gigi2-3/+26
Next commits will introduce MFC notifications through the atomic fib_notification chain, thus allowing modules to be aware of MFC entries. Due to the fact that modules may need to hold a reference to an MFC entry, add reference count to MFC entries to prevent them from being freed while these modules use them. The reference counting is done only on resolved MFC entries currently. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-27fib: notifier: Add VIF add and delete event typesYotam Gigi1-0/+2
In order for an interface to forward packets according to the kernel multicast routing table, it must be configured with a VIF index according to the mroute user API. The VIF index is then used to refer to that interface in the mroute user API, for example, to set the iif and oifs of an MFC entry. In order to allow drivers to be aware and offload multicast routes, they have to be aware of the VIF add and delete notifications. Due to the fact that a specific VIF can be deleted and re-added pointing to another netdevice, and the MFC routes that point to it will forward the matching packets to the new netdevice, a driver willing to offload MFC cache entries must be aware of the VIF add and delete events in addition to MFC routes notifications. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26Merge branch 'nfp-flower-vxlan-tunnel-offload'David S. Miller10-42/+1243
Simon Horman says: ==================== nfp: flower vxlan tunnel offload John says: This patch set allows offloading of TC flower match and set tunnel fields to the NFP. The initial focus is on VXLAN traffic. Due to the current state of the NFP firmware, only VXLAN traffic on well known port 4789 is handled. The match and action fields must explicity set this value to be supported. Tunnel end point information is also offloaded to the NFP for both encapsulation and decapsulation. The NFP expects 3 separate data sets to be supplied. For decapsulation, 2 separate lists exist; a list of MAC addresses referenced by an index comprised of the port number, and a list of IP addresses. These IP addresses are not connected to a MAC or port. The MAC addresses can be written as a block or one at a time (because they have an index, previous values can be overwritten) while the IP addresses are always written as a list of all the available IPs. Because the MAC address used as a tunnel end point may be associated with a physical port or may be a virtual netdev like an OVS bridge, we do not know which addresses should be offloaded. For this reason, all MAC addresses of active netdevs are offloaded to the NFP. A notifier checks for changes to any currently offloaded MACs or any new netdevs that may occur. For IP addresses, the tunnel end point used in the rules is known as the destination IP address must be specified in the flower classifier rule. When a new IP address appears in a rule, the IP address is offloaded. The IP is removed from the offloaded list when all rules matching on that IP are deleted. For encapsulation, a next hop table is updated on the NFP that contains the source/dest IPs, MACs and egress port. These are written individually when requested. If the NFP tries to encapsulate a packet but does not know the next hop, then is sends a request to the host. The host carries out a route lookup and populates the given entry on the NFP table. A notifier also exists to check for any links changing or going down in the kernel next hop table. If an offloaded next hop entry is removed from the kernel then it is also removed on the NFP. The NFP periodically sends a message to the host telling it which tunnel ports have packets egressing the system. The host uses this information to update the used value in the neighbour entry. This means that, rather than expire when it times out, the kernel will send an ARP to check if the link is still live. From an NFP perspective, this means that valid entries will not be removed from its next hop table. ==================== Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26nfp: flower vxlan neighbour keep-aliveJohn Hurley4-0/+69
Periodically receive messages containing the destination IPs of tunnels that have recently forwarded traffic. Update the neighbour entries 'used' value for these IPs next hop. This prevents the neighbour entry from expiring on timeout but rather signals an ARP to verify the connection. From an NFP perspective, packets will not fall back mid-flow unless the link is verified to be down. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26nfp: flower vxlan neighbour offloadJohn Hurley4-0/+268
Receive a request when the NFP does not know the next hop for a packet that is to be encapsulated in a VXLAN tunnel. Do a route lookup, determine the next hop entry and update neighbour table on NFP. Monitor the kernel neighbour table for link changes and update NFP with relevant information. Overwrite routes with zero values on the NFP when they expire. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26nfp: offload vxlan IPv4 endpoints of flower rulesJohn Hurley5-3/+143
Maintain a list of IPv4 addresses used as the tunnel destination IP match fields in currently active flower rules. Offload the entire list of NFP_FL_IPV4_ADDRS_MAX (even if some are unused) when new IPs are added or removed. The NFP should only be aware of tunnel end points that are currently used by rules on the device Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>