aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2018-06-01net/mlx5e: RX, Always prefer Linear SKB configurationTariq Toukan1-3/+10
Prefer the linear SKB configuration of Legacy RQ over the non-linear one of Striding RQ. This implies that ConnectX-4 LX now uses legacy RQ by default, as it does not support the linear configuration of Striding RQ. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01net/mlx5e: RX, Enhance legacy Receive Queue memory schemeTariq Toukan3-127/+362
Enhance the memory scheme of the legacy RQ, such that only order-0 pages are used. Whenever possible, prefer using a linear SKB, and build it wrapping the WQE buffer. Otherwise (for example, jumbo frames on x86), use non-linear SKB, with as many frags as needed. In this case, multiple WQE scatter entries are used, up to a maximum of 4 frags and 10KB of MTU. This implied to remove support of HW LRO in legacy RQ, as it would require large number of page allocations and scatter entries per WQE on archs with PAGE_SIZE = 4KB, yielding bad performance. In earlier patches, we guaranteed that all completions are in-order, and that we use a cyclic WQ. This creates an oppurtunity for a performance optimization: The mapping between a "struct mlx5e_dma_info", and the WQEs (struct mlx5e_wqe_frag_info) pointing to it, is constant across different cycles of a WQ. This allows initializing the mapping in the time of RQ creation, and not handle it in datapath. A struct mlx5e_dma_info that is shared between different WQEs is allocated by the first WQE, and freed by the last one. This implies an important requirement: WQEs that share the same struct mlx5e_dma_info must be posted within the same NAPI. Otherwise, upon completion, struct mlx5e_wqe_frag_info would mistakenly point to the new struct mlx5e_dma_info, not the one that was posted (and the HW wrote to). This bulking requirement is actually good also for performance reasons, hence we extend the bulk beyong the minimal requirement above. With this memory scheme, the RQs memory footprint is reduce by a factor of 2 on x86, and by a factor of 32 on PowerPC. Same factors apply for the number of pages in a GRO session. Performance tests: ConnectX-4, single core, single RX ring, default MTU. x86: CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz Packet rate (early drop in TC): no degradation TCP streams: ~5% improvement PowerPC: CPU: POWER8 (raw), altivec supported Packet rate (early drop in TC): 20% gain TCP streams: 25% gain Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01net/mlx5e: RX, Use cyclic WQ in legacy RQTariq Toukan6-111/+161
Now that LRO is not supported for Legacy RQ, there is no source of out-of-order completions in the WQ, and we can use a cyclic one. This has multiple advantages: - reduces the WQE size (smaller PCI transactions). - lower overhead in datapath (no handling of 'next' pointers). - no reserved WQE for the WQ head (was need in linked-list). - allows using a constant map between frag and dma_info struct, in downstream patch. Performance tests: ConnectX-4, single core, single RX ring. Major gain in packet rate of single ring XDP drop. Bottleneck is shifted form HW (at 16Mpps) to SW (at 20Mpps). Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01net/mlx5e: RX, Split WQ objects for different RQ typesTariq Toukan3-57/+110
Replace the common RQ WQ object with two separate ones for the different RQ types. This is in preparation for switching to using a cyclic WQ type in Legacy RQ. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01net/mlx5e: RX, Remove HW LRO support in legacy RQTariq Toukan2-14/+26
Current LRO implementation in Legacy RQ uses high-order pages. In downstream patches of this series we complete the transition to using only order-0 pages in RX datapath (which was already done in Striding RQ). Unlike the more advanced Striding RQ, Legacy RQ does not make reuse of any non-consumed buffers of non-full LRO sessions, and combining it with order-0 pages has many performance drawbacks. Hence, here we totally remove LRO support in Legacy RQ. This guarantees having no out-of-order completions, which allows using a cyclic work queue (instead of a linked-list) in a downstream patch. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01net/mlx5e: RX, Dedicate a function for copying SKB headerTariq Toukan1-13/+17
Get the logic of copying the packet header into the SKB linear part into a generic function. Function does copy length alignment and dma buffer sync. It is currently called only within the MPWQE flow. In a downstream patch, it will be called within the legacy RQ flow as well. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01net/mlx5e: RX, Generalise function of SKB frag additionTariq Toukan1-8/+8
Rename it and pass truesize as an extra argument, as it will be used also in Legacy RQ in a downstream patch. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01net/mlx5e: RX, Generalise name of non-linear SKB head sizeTariq Toukan3-4/+4
Make name more generic by dropping MPWRQ from it, as it will be used also in Legacy RQ in a downstream patch. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01net/mlx5e: TX, Obsolete maintaining local copies of skb->len/dataTariq Toukan1-30/+12
Instead of maintaining a local copy of skb->len/data and updating it upon every copy to the WQE inline part, just calculate it once when needed, using the ihs. This obsoletes the function mlx5e_tx_skb_pull_inline. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01net/mlx5: FPGA, Handle QP error eventIlan Tayari1-4/+24
Add handlers for this event to perform graceful teardown of the device. Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Adi Nissim <adin@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01net/mlx5e: Support configurable MTU for vport representorsAdi Nissim3-4/+28
The representor MTU was hard coded to 1500 bytes. Allow setting arbitrary MTU values up to the max supported by the FW. Signed-off-by: Adi Nissim <adin@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01net/mlx5e: Increase aRFS flow tables sizeMaor Gottlieb1-1/+1
Increase the aRFS flow table size to 64k so it could contain up to 64k different streams. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01net/mlx5e: Remove redundant active_channels indicationEran Ben Elisha4-20/+0
Now, when all channels stats are saved regardless of the channel's state {open, closed}, we can safely remove this indication and the stats spin lock which protects it. Fixes: 76c3810bade3 ("net/mlx5e: Avoid reset netdev stats on configuration changes") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01net/mlx5e: Present SW stats when state is not openedEran Ben Elisha2-16/+13
The driver can present all SW stats even when the state not opened. Fixed get strings, count and stats to support it. In addition, fix tc2txq to hold a static mapping which doesn't depend on the amount of open channels, and cannot have the same value on two different cells while moving between configurations. Example: - OOB 16 channels - Change to 2 channels, 8 TCs - tc2txq[15][0] == tc2txq[1][7] == 15 This will cause multiple appearances of the same TX index in statistics output. Fixes: 76c3810bade3 ("net/mlx5e: Avoid reset netdev stats on configuration changes") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01net/mlx5e: IPOIB, Add a missing skb_pullTariq Toukan1-0/+1
A call to mlx5e_tx_skb_pull_inline was mistakenly dropped in the cited patch. Get it back. Fixes: 043dc78ecf07 ("net/mlx5e: TX, Use actual WQE size for SQ edge fill") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01net/mlx5e: IPOIB, Fix overflowing SQ WQE memsetTariq Toukan1-3/+3
IPoIB WQE size is larger than a single WQEBB. Must not fetch the WQE, and surely not memset it, until it is guaranteed that there are enough WQEBBs available before getting to SQ/frag edge. Fixes: 043dc78ecf07 ("net/mlx5e: TX, Use actual WQE size for SQ edge fill") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01net: hns3: Optimize the VF's process of updating multicast MACXi Wang4-5/+187
In the update flow of the new PF driver, if a multicast address is in mta table, the VF deletion action will not take effect. This patch adds the VF adaptation according to the new flow of PF'driver. Signed-off-by: Xi Wang <wangxi11@huawei.com> Reviewed-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01net: hns3: Optimize the PF's process of updating multicast MACXi Wang4-21/+136
In the current process, the multicast MAC is added to both MAC_VLAN table and MTA table, this will reduce the utilization of the resource. This patch improves the process of adding multicast MAC address, the new process starts using the MTA table to add multicast MAC after the MAC_VLAN table is full, and the MTA is disable if it is no longer used. Signed-off-by: Xi Wang <wangxi11@huawei.com> Reviewed-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01net: hns3: Fix for vxlan tx checksum bugYunsheng Lin1-0/+29
when skb->encapsulation is 0, skb->ip_summed is CHECKSUM_PARTIAL and it is udp packet, which has a dest port as the IANA assigned. the hardware is expected to do the checksum offload, but the hardware will not do the checksum offload when udp dest port is 4789. This patch fixes it by doing the checksum in software. Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01net: hns3: Add missing break in misc_irq_handleYunsheng Lin1-3/+3
There is a break missing in the switch/case handling in hclge_misc_irq_handle, which causes the log to output uncorrectly. This patch adds the missing break, and change the dev_dbg to dev_warn in order to better catch the error. Fixes: c1a81619d73a ("net: hns3: Add mailbox interrupt handling to PF driver") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01net: hns3: Fix for phy not link up problem after resettingYunsheng Lin1-4/+3
When resetting, phy_state_machine may be accessing the phy through firmware if the phy is not stopped or disconnected, which will cause firemware timeout problem because the firmware is busy processing the reset request. This patch fixes it by disabling the phy when resetting. Fixes: b940aeae0ed6 ("net: hns3: never send command queue message to IMP when reset") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01net: hns3: Fix for hclge_reset running repeatly problemYunsheng Lin1-6/+34
When hardware sends the HCLGE_VECTOR0_EVENT_RST event through hclge_misc_irq_handle, currently driver enables misc_vector in the interrupt handle, and hardware generates the same interrupt for the same reset event again and again until the reset is complete, which causes hclge_reset running repeatly problem. This patch fixes by enabling the misc_vector after reset is complete. Fixes: 4ed340ab8f49 ("net: hns3: Add reset process in hclge_main") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01net: hns3: Fix for service_task not running problem after resettingYunsheng Lin2-0/+2
When hclge_ae_stop is called during resetting, it will cancel the service_task by calling cancel_work_sync, which may cause the service_task to exit without clearing HCLGE_STATE_SERVICE_SCHED bit. If this happens, the service_task will never run again. This patch fixes this problem by clearing it after calling cancel_work_sync in hclge_ae_stop. Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01net: hns3: Fix setting mac address errorJian Shen1-0/+13
When doing function reset or insmod hns3 dirver after rmmod, the entries of mac vlan table are not cleared, which may cause init mac address failed. This patch fixes it by clearing the old mac address when doing function reset or rmmod hns3 driver. Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01net: hns3: Add repeat address checking for setting mac addressJian Shen1-0/+6
Add checking for new mac address. It doesn't need to config the mac vlan table if it's already in use. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01net: hns3: Add support for IFF_ALLMULTI flagPeng Li6-12/+21
This patch adds support for IFF_ALLMULTI flag to HNS3 PF and VF driver. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01net: hns3: Disable vf vlan filter when vf vlan table is fullYunsheng Lin1-0/+7
This is only 128 entries for hardware's vf vlan table, when the vf table is full, the firmware will disable the vf vlan filter and return a resp_code of HCLGE_VF_VLAN_NO_ENTRY to driver. This patch checks the if resp_code from firmware is HCLGE_VF_VLAN_NO_ENTRY, if yes, then print a warning and return ok to the caller. Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01selftests: forwarding: mirror_gre_bridge_1d_vlan: Add STP testPetr Machata1-0/+12
To test offloading of mirror-to-gretap in mlxsw for cases that a VLAN-unaware bridge is in underlay packet path, test that the STP status of bridge egress port is reflected. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01selftests: forwarding: mirror_gre_vlan_bridge_1q: Add more testsPetr Machata1-0/+129
Offloading of mirror-to-gretap in mlxsw is tricky especially in cases when the gretap underlay involves bridges. Add more tests that exercise the bridge handling code: - forbidden_egress tests that check vlan removal on bridge port in the underlay packet path - untagged_egress tests that similarly check "egress untagged" - fdb_roaming tests that check whether learning FDB on a different port is reflected - stp tests for handling port STP status of bridge egress port Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01selftests: forwarding: mirror_gre_vlan_bridge_1q: Rename two testsPetr Machata1-7/+7
Rename test_gretap_forbidden() and test_ip6gretap_forbidden() to a more specific test_gretap_forbidden_cpu() and test_ip6gretap_forbidden_cpu(). This will make it clearer which is which when further down a patch is introduced that forbids a VLAN on regular bridge port. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01selftests: forwarding: mirror_gre_vlan_bridge_1q: Test final configPetr Machata1-2/+3
After the final change reestablishes the original configuration, make sure the traffic flows again as it should. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01selftests: forwarding: mirror_gre_vlan_bridge_1q: Fix tunnel namePetr Machata1-1/+1
The "ip6gretap" in the test name refers to the tunnel device type that the test is supposed to be testing. However test_ip6gretap_forbidden() tests, due to a typo, a gretap tunnel. Fix the typo. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01selftests: forwarding: mirror_gre_lib: Add STP testPetr Machata1-0/+32
Add a reusable full test that toggles STP state of a given bridge port and checks that the mirroring reacts appropriately. The test will be used by bridge tests in follow-up patches. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01selftests: forwarding: mirror_lib: skip_hw the VLAN capturePetr Machata1-1/+4
When the VLAN capture is installed on a front panel device and not a soft device, the packets are counted twice: once in fast path, and once after they are trapped to the kernel. Resolve the problem by passing skip_hw flag to vlan_capture_install(). Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01selftests: forwarding: mirror_lib: Move here do_test_span_vlan_dir_ips()Petr Machata2-15/+35
Move the function do_test_span_vlan_dir_ips() from mirror_vlan.sh test to a library file mirror_lib.sh to allow reuse. Fill in other entry points similar to other testing functions in mirror_lib.sh, they will be useful in following patches. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01selftests: forwarding: lib: Move here vlan_capture_{, un}install()Petr Machata2-23/+23
Move vlan_capture_install() and vlan_capture_uninstall() from mirror_vlan.sh test to lib.sh so that it can be reused in other tests. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01net: mvpp2: Split the PPv2 driver to a dedicated directoryMaxime Chevallier8-3889/+4030
As the mvpp2 driver is growing, move this driver to a dedicated directory and split it into several files. Since this driver has a lot of register defines and structure definitions, it can benefit from having all of this into a dedicated header file, named mvpp2.h. A good chunk of the mvpp2 code is dedicated to Header Parser handling, so we introduce mvpp2_prs.h where all Header Parser definitions are located, and mvpp2_prs.c containing the related code. In the same way, mvpp2_cls.h and mvpp2_cls.c are created to contain Classifier and RSS related code. The former 'mvpp2.c' file is renamed 'mvpp2_main.c' so that we can keep the driver binary named 'mvpp2'. This commit is only about spliting the driver into multiple files and doesn't introduce any new function, feature or fix besides removing 'static' keywords when needed. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01net: sched: split tc_ctl_tfilter into three handlersVlad Buslov1-145/+293
tc_ctl_tfilter handles three netlink message types: RTM_NEWTFILTER, RTM_DELTFILTER, RTM_GETTFILTER. However, implementation of this function involves a lot of branching on specific message type because most of the code is message-specific. This significantly complicates adding new functionality and doesn't provide much benefit of code reuse. Split tc_ctl_tfilter to three standalone functions that handle filter new, delete and get requests. The only truly protocol independent part of tc_ctl_tfilter is code that looks up queue, class, and block. Refactor this code to standalone tcf_block_find function that is used by all three new handlers. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01rtnetlink: Fix null-ptr-deref in rtnl_newlinkPrashant Bhole1-1/+1
In rtnl_newlink(), NULL check is performed on m_ops however member of ops is accessed. Fixed by accessing member of m_ops instead of ops. [ 345.432629] BUG: KASAN: null-ptr-deref in rtnl_newlink+0x400/0x1110 [ 345.432629] Read of size 4 at addr 0000000000000088 by task ip/986 [ 345.432629] [ 345.432629] CPU: 1 PID: 986 Comm: ip Not tainted 4.17.0-rc6+ #9 [ 345.432629] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 345.432629] Call Trace: [ 345.432629] dump_stack+0xc6/0x150 [ 345.432629] ? dump_stack_print_info.cold.0+0x1b/0x1b [ 345.432629] ? kasan_report+0xb4/0x410 [ 345.432629] kasan_report.cold.4+0x8f/0x91 [ 345.432629] ? rtnl_newlink+0x400/0x1110 [ 345.432629] rtnl_newlink+0x400/0x1110 [...] Fixes: ccf8dbcd062a ("rtnetlink: Remove VLA usage") Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Tested-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01ipvs: add ipv6 support to ftpJulian Anastasov6-182/+331
Add support for FTP commands with extended format (RFC 2428): - FTP EPRT: IPv4 and IPv6, active mode, similar to PORT - FTP EPSV: IPv4 and IPv6, passive mode, similar to PASV. EPSV response usually contains only port but we allow real server to provide different address We restrict control and data connection to be from same address family. Allow the "(" and ")" to be optional in PASV response. Also, add ipvsh argument to the pkt_in/pkt_out handlers to better access the payload after transport header. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-01ipvs: add full ipv6 support to nfctJulian Anastasov1-52/+49
Prepare NFCT to support IPv6 for FTP: - Do not restrict the expectation callback to PF_INET - Split the debug messages, so that the 160-byte limitation in IP_VS_DBG_BUF is not exceeded when printing many IPv6 addresses. This means no more than 3 addresses in one message, i.e. 1 tuple with 2 addresses or 1 connection with 3 addresses. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-01netfilter: nft_fwd_netdev: allow to forward packets via neighbour layerPablo Neira Ayuso2-1/+149
This allows us to forward packets from the netdev family via neighbour layer, so you don't need an explicit link-layer destination when using this expression from rules. The ttl/hop_limit field is decremented. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-01netfilter: nfnetlink: Remove VLA usageKees Cook1-2/+23
In the quest to remove all stack VLA usage from the kernel[1], this allocates the maximum size expected for all possible attrs and adds sanity-checks at both registration and usage to make sure nothing gets out of sync. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-01netfilter: nf_flow_table: attach dst to skbsJason A. Donenfeld1-2/+4
Some drivers, such as vxlan and wireguard, use the skb's dst in order to determine things like PMTU. They therefore loose functionality when flow offloading is enabled. So, we ensure the skb has it before xmit'ing it in the offloading path. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-01netfilter: nf_tables: fix chain dependency validationPablo Neira Ayuso8-32/+208
The following ruleset: add table ip filter add chain ip filter input { type filter hook input priority 4; } add chain ip filter ap add rule ip filter input jump ap add rule ip filter ap masquerade results in a panic, because the masquerade extension should be rejected from the filter chain. The existing validation is missing a chain dependency check when the rule is added to the non-base chain. This patch fixes the problem by walking down the rules from the basechains, searching for either immediate or lookup expressions, then jumping to non-base chains and again walking down the rules to perform the expression validation, so we make sure the full ruleset graph is validated. This is done only once from the commit phase, in case of problem, we abort the transaction and perform fine grain validation for error reporting. This patch requires 003087911af2 ("netfilter: nfnetlink: allow commit to fail") to achieve this behaviour. This patch also adds a cleanup callback to nfnl batch interface to reset the validate state from the exit path. As a result of this patch, nf_tables_check_loops() doesn't use ->validate to check for loops, instead it just checks for immediate expressions. Reported-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-01netfilter: nf_tables: Add audit support to log statementPhil Sutter2-1/+96
This extends log statement to support the behaviour achieved with AUDIT target in iptables. Audit logging is enabled via a pseudo log level 8. In this case any other settings like log prefix are ignored since audit log format is fixed. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-01netfilter: nf_tables: add support for native socket matchingMáté Eckl4-0/+178
Now it can only match the transparent flag of an ip/ipv6 socket. Signed-off-by: Máté Eckl <ecklm94@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-01netfilter: fix ptr_ret.cocci warningskbuild test robot2-12/+3
net/netfilter/nft_numgen.c:117:1-3: WARNING: PTR_ERR_OR_ZERO can be used net/netfilter/nft_hash.c:180:1-3: WARNING: PTR_ERR_OR_ZERO can be used net/netfilter/nft_hash.c:223:1-3: WARNING: PTR_ERR_OR_ZERO can be used Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR Generated by: scripts/coccinelle/api/ptr_ret.cocci Fixes: b9ccc07e3f31 ("netfilter: nft_hash: add map lookups for hashing operations") Fixes: d734a2888922 ("netfilter: nft_numgen: add map lookups for numgen statements") CC: Laura Garcia Liebana <nevola@gmail.com> Signed-off-by: kbuild test robot <fengguang.wu@intel.com> Acked-by: Laura Garcia Liebana <nevola@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-31virtio_net: fix error return code in virtnet_probe()Wei Yongjun1-1/+3
Fix to return a negative error code from the failover create fail error handling case instead of 0, as done elsewhere in this function. Fixes: ba5e4426e80e ("virtio_net: Extend virtio to use VF datapath when available") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-31rtnetlink: Remove VLA usageKees Cook2-4/+18
In the quest to remove all stack VLA usage from the kernel[1], this allocates the maximum size expected for all possible types and adds sanity-checks at both registration and usage to make sure nothing gets out of sync. This matches the proposed VLA solution for nfnetlink[2]. The values chosen here were based on finding assignments for .maxtype and .slave_maxtype and manually counting the enums: slave_maxtype (max 33): IFLA_BRPORT_MAX 33 IFLA_BOND_SLAVE_MAX 9 maxtype (max 45): IFLA_BOND_MAX 28 IFLA_BR_MAX 45 __IFLA_CAIF_HSI_MAX 8 IFLA_CAIF_MAX 4 IFLA_CAN_MAX 16 IFLA_GENEVE_MAX 12 IFLA_GRE_MAX 25 IFLA_GTP_MAX 5 IFLA_HSR_MAX 7 IFLA_IPOIB_MAX 4 IFLA_IPTUN_MAX 21 IFLA_IPVLAN_MAX 3 IFLA_MACSEC_MAX 15 IFLA_MACVLAN_MAX 7 IFLA_PPP_MAX 2 __IFLA_RMNET_MAX 4 IFLA_VLAN_MAX 6 IFLA_VRF_MAX 2 IFLA_VTI_MAX 7 IFLA_VXLAN_MAX 28 VETH_INFO_MAX 2 VXCAN_INFO_MAX 2 This additionally changes maxtype and slave_maxtype fields to unsigned, since they're only ever using positive values. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com [2] https://patchwork.kernel.org/patch/10439647/ Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>