aboutsummaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2019-11-05net: hns3: remove unused macrosColin Ian King2-4/+0
The macros HCLGE_MPF_ENBALE and HCLGEVF_MPF_ENBALE are defined but never used. I was going to fix the spelling mistake "ENBALE" -> "ENABLE" but found these macros are not used, so they can be removed. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-05vsock: Simplify '__vsock_release()'Christophe JAILLET1-3/+1
Use 'skb_queue_purge()' instead of re-implementing it. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-05net: dsa: Fix use after free in dsa_switch_remove()Florian Fainelli1-1/+2
The order in which the ports are deleted from the list and freed and the call to dsa_switch_remove() is done is reversed, which leads to an use after free condition. Reverse the two: first tear down the ports and switch from the fabric, then free the ports associated with that switch fabric. Fixes: 05f294a85235 ("net: dsa: allocate ports on touch") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-05tc-testing: added tests with cookie for mpls TC actionRoman Mashak1-0/+145
Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-05Merge branch 'icmp-move-duplicate-code-in-helper-functions'David S. Miller8-30/+35
Matteo Croce says: ==================== icmp: move duplicate code in helper functions Remove some duplicate code by moving it in two helper functions. First patch adds the helpers, the second one uses it. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-05icmp: remove duplicate codeMatteo Croce6-30/+6
The same code which recognizes ICMP error packets is duplicated several times. Use the icmp_is_err() and icmpv6_is_err() helpers instead, which do the same thing. ip_multipath_l3_keys() and tcf_nat_act() didn't check for all the error types, assume that they should instead. Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-05icmp: add helpers to recognize ICMP error packetsMatteo Croce2-0/+29
Add two helper functions, one for IPv4 and one for IPv6, to recognize the ICMP packets which are error responses. This packets are special because they have as payload the original header of the packet which generated it (RFC 792 says at least 8 bytes, but Linux actually includes much more than that). Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-05Merge branch 'netvsc-RSS-related-patches'David S. Miller3-5/+15
Stephen Hemminger says: ==================== netvsc: RSS related patches Address a couple of issues related to recording RSS hash value in skb. These were found by reviewing RSS support. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-05hv_netvsc: record hardware hash in skbStephen Hemminger3-1/+12
Since RSS hash is available from the host, record it in the skb. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-05hv_netvsc: flag software created hash valueStephen Hemminger1-4/+3
When the driver needs to create a hash value because it was not done at higher level, then the hash should be marked as a software not hardware hash. Fixes: f72860afa2e3 ("hv_netvsc: Exclude non-TCP port numbers from vRSS hashing") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-05Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queueDavid S. Miller16-1150/+3553
Jeff Kirsher says: ==================== 100GbE Intel Wired LAN Driver Updates 2019-11-04 This series contains updates to the ice driver only. Anirudh refactors the code to reduce the kernel configuration flags and introduces ice_base.c file. Maciej does additional refactoring on the configuring of transmit rings so that we are not configuring per each traffic class flow. Added support for XDP in the ice driver. Provides additional re-organizing of the code in preparation for adding build_skb() support in the driver. Adjusted the computational padding logic for headroom and tailroom to better support build_skb(), which also aligns with the logic in other Intel LAN drivers. Added build_skb support and make use of the XDP's data_meta. Krzysztof refactors the driver to prepare for AF_XDP support in the driver and then adds support for AF_XDP. v2: Updated patch 3 of the series based on community feedback with the following changes... - return -EOPNOTSUPP instead of ENOTSUPP for too large MTU which makes it impossible to attach XDP prog - don't check for case when there's no XDP prog currently on interface and ice_xdp() is called with NULL bpf_prog; this happens when user does "ip link set eth0 xdp off" and no prog is present on VSI; no need for that as it is handled by higher layer - drop the extack message for unknown xdp->command - use the smp_processor_id() for accessing the XDP Tx ring for XDP_TX action - don't leave the interface in downed state in case of any failure during the XDP Tx resources handling - undo rename of ice_build_ctob The above changes caused a ripple effect in patches 4 & 5 to update references to ice_build_ctob() which are now build_ctob() ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-05Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queueDavid S. Miller13-10/+118
Jeff Kirsher says: ==================== 10GbE Intel Wired LAN Driver Updates 2019-11-04 This series contains old Halloween candy updates, yet still sweet, to fm10k, ixgbe and i40e. Jake adds the missing initializers for a couple of the TLV attribute macros. Added support for capturing and reporting statistics for all of the VFs in a given PF. Lastly, bump the version of the fm10k driver to reflect the recent changes. Alex addresses locality issues in the ixgbe driver when it is loaded on a system supporting multiple NUMA nodes. Manjunath Patil provides changes to the ixgbe driver, similar to those made to igb, to prevent transmit packets to request a hardware timestamp when the NIC has not been setup via the SIOCSHWTSTAMP ioctl. Alice adds support for x710 by adding the missing device id's in the appropriate places to ensure all the features are enabled in i40e. Jesse adds support for VF stats gathering in the i40e via the kernel via ndo_get_vf_stats function. v2: Fixed up commit id references in patch 5's description to align with how commit id's should be referenced. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-04i40e: implement VF stats NDOJesse Brandeburg3-0/+51
Implement the VF stats gathering via the kernel via ndo_get_vf_stats(). The driver will show per-VF stats in the output of the command: ip -s link show dev <PF> Testing Hints: ip -s link show dev eth0 will return non-zero VF stats. ... vf 0 MAC 00:55:aa:00:55:aa, spoof checking on, link-state enable, trust off RX: bytes packets mcast bcast 128000 1000 104 104 TX: bytes packets 128000 1000 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-04i40e: enable X710 supportAlice Michael1-0/+2
The I40E_DEV_ID_10G_BASE_T_BC device id was added previously, but was not enabled in all the appropriate places. Adding it to enable it's use. Signed-off-by: Alice Michael <alice.michael@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-04ixgbe: protect TX timestamping from API misuseManjunath Patil1-1/+2
HW timestamping can only be requested for a packet if the NIC is first setup via ioctl(SIOCSHWTSTAMP). If this step was skipped, then the ixgbe driver still allowed TX packets to request HW timestamping. In this situation, we see 'clearing Tx Timestamp hang' noise in the log. Fix this by checking that the NIC is configured for HW TX timestamping before accepting a HW TX timestamping request. Similar-to: commit 26bd4e2db06b ("igb: protect TX timestamping from API misuse") commit 0a6f2f05a2f5 ("igb: Fix a test with HWTSTAMP_TX_ON") Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-04fm10k: update driver version to match out-of-treeJacob Keller1-1/+1
An upcoming out-of-tree release will be occurring which will include the recent functionality to support virtual function statistics. Update the kernel driver version to match this. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-04ixgbe: Make use of cpumask_local_spread to improve RSS localityAlexander Duyck1-5/+3
This patch is meant to address locality issues present in the ixgbe driver when it is loaded on a system supporting multiple NUMA nodes and more CPUs then the device can map in a 1:1 fashion. Instead of just arbitrarily mapping itself to CPUs 0-62 it would make much more sense to map itself to the local CPUs first, and then map itself to any remaining CPUs that might be used. The first effect of this is that queue 0 should always be allocated on the local CPU/NUMA node. This is important as it is the default destination if a packet doesn't match any existing flow director filter or RSS rule and as such having it local should help to reduce QPI cross-talk in the event of an unrecognized traffic type. In addition this should increase the likelihood of the RSS queues being allocated and used on CPUs local to the device while the ATR/Flow Director queues would be able to route traffic directly to the CPU that is likely to be processing it. Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-04fm10k: add support for ndo_get_vf_stats operationJacob Keller5-0/+56
Support capturing and reporting statistics for all of the VFs associated with a given PF device via the ndo_get_vf_stats callback. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-04fm10k: add missing field initializers to TLV attributes)Jacob Keller1-3/+3
Add the missing field initializers for a couple of the TLV attribute macros. This resolves the last few -Wmissing-field-initializers warnings for the fm10k Linux driver. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-04ice: allow 3k MTU for XDPMaciej Fijalkowski1-2/+14
At this point ice driver is able to work on order 1 pages that are split onto two 3k buffers. Let's reflect that when user is setting new MTU size and XDP is present on interface. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-04ice: add build_skb() supportMaciej Fijalkowski1-1/+59
Driver is now prepared for building the skb around the existing Rx buffer, so introduce the ice_build_skb responsible for it. Make use of XDP's data_meta as well. I've observed around 30% less CPU consumption with build_skb Rx path, in comparison to legacy Rx. What stands behind such result is the avoidance of flow_dissector (which we were diving into via eth_get_headlen) and no memcpy calls. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-04ice: introduce frame padding computation logicMaciej Fijalkowski5-20/+114
Take into account the underlying architecture specific settings and based on that calculate the possible padding that can be supplied. Typically, for x86 and standard MTU size we will end up with 192 bytes of headroom. This is the same behavior as our other drivers have and we can dedicate it for XDP purposes. Furthermore, introduce the Rx ring flag for indicating whether build_skb is used on particular. Based on that invoke the routines for padding calculation. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-04ice: introduce legacy Rx flagMaciej Fijalkowski5-25/+63
Add an ethtool "legacy-rx" priv flag for toggling the Rx path. This control knob will be mainly used for build_skb usage as well as buffer size/MTU manipulation. In preparation for adding build_skb support in a way that it takes care of how we set the values of max_frame and rx_buf_len fields of struct ice_vsi. Specifically, in this patch mentioned fields are set to values that will allow us to provide headroom and tailroom in-place. This can be mostly broken down onto following: - for legacy-rx "on" ethtool control knob, old behaviour is kept; - for standard 1500 MTU size configure the buffer of size 1536, as network stack is expecting the NET_SKB_PAD to be provided and NET_IP_ALIGN can have a non-zero value (these can be typically equal to 32 and 2, respectively); - for larger MTUs go with max_frame set to 9k and configure the 3k buffer in case when PAGE_SIZE of underlying arch is less than 8k; 3k buffer is implying the need for order 1 page, so that our page recycling scheme can still be applied; With that said, substitute the hardcoded ICE_RXBUF_2048 and PAGE_SIZE values in DMA API that we're making use of with rx_ring->rx_buf_len and ice_rx_pg_size(rx_ring). The latter is an introduced helper for determining the page size based on its order (which was figured out via ice_rx_pg_order). Last but not least, take care of truesize calculation. In the followup patch the headroom/tailroom computation logic will be introduced. This change aligns the buffer and frame configuration with other Intel drivers, most importantly with iavf. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-04ice: Add support for AF_XDPKrzysztof Kazimierczak11-27/+1456
Add zero copy AF_XDP support. This patch adds zero copy support for Tx and Rx; code for zero copy is added to ice_xsk.h and ice_xsk.c. For Tx, implement ndo_xsk_wakeup. As with other drivers, reuse existing XDP Tx queues for this task, since XDP_REDIRECT guarantees mutual exclusion between different NAPI contexts based on CPU ID. In turn, a netdev can XDP_REDIRECT to another netdev with a different NAPI context, since the operation is bound to a specific core and each core has its own hardware ring. For Rx, allocate frames as MEM_TYPE_ZERO_COPY on queues that AF_XDP is enabled. Signed-off-by: Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com> Co-developed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-04ice: Move common functions to ice_txrx_lib.cKrzysztof Kazimierczak5-312/+334
In preparation of AF XDP, move functions that will be used both by skb and zero-copy paths to a new file called ice_txrx_lib.c. This allows us to avoid using ifdefs to control the staticness of said functions. Move other functions (ice_rx_csum, ice_rx_hash and ice_ptype_to_htype) called only by the moved ones to the new file as well. Signed-off-by: Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-04bpf: re-fix skip write only files in debugfsDaniel Borkmann1-1/+4
Commit 5bc60de50dfe ("selftests: bpf: Don't try to read files without read permission") got reverted as the fix was not working as expected and real fix came in via 8101e069418d ("selftests: bpf: Skip write only files in debugfs"). When bpf-next got merged into net-next, the test_offload.py had a small conflict. Fix the resolution in ae8a76fb8b5d iby not reintroducing 5bc60de50dfe again. Fixes: ae8a76fb8b5d ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Jakub Kicinski <jakub.kicinski@netronome.com> Cc: Alexei Starovoitov <ast@kernel.org> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-04net: ethernet: stmmac: drop unused variable in stm32mp1_set_mode()Christophe Roullier1-3/+3
Building with W=1 (cf.scripts/Makefile.extrawarn) outputs: warning: variable ‘ret’ set but not used [-Wunused-but-set-variable] Drop the unused 'ret' variable. Signed-off-by: Christophe Roullier <christophe.roullier@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-04net: sgi: ioc3-eth: ensure tx ring is 16k aligned.Thomas Bogendoerfer1-6/+10
IOC3 hardware needs a 16k aligned TX ring. Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-04net: sgi: ioc3-eth: fix setting NETIF_F_HIGHDMAChristoph Hellwig1-3/+1
Set NETIF_F_HIGHDMA together with the NETIF_F_IP_CSUM flag instead of letting the second assignment overwrite it. Probably doesn't matter in practice as none of the systems an IOC3 is usually found in has highmem to start with. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-04net: sgi: ioc3-eth: simplify setting the DMA maskChristoph Hellwig1-20/+7
There is no need to fall back to a lower mask these days, the DMA mask just communicates the hardware supported features. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-04net: sgi: ioc3-eth: fix usage of GFP_* flagsChristoph Hellwig1-2/+2
dma_alloc_coherent always zeroes memory, there is no need for __GFP_ZERO. Also doing a GFP_ATOMIC allocation just before a GFP_KERNEL one is clearly bogus. Fixes: ed870f6a7aa2 ("net: sgi: ioc3-eth: use dma-direct for dma allocations") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-04net: sgi: ioc3-eth: don't abuse dma_direct_* callsChristoph Hellwig1-14/+11
dma_direct_ is a low-level API that must never be used by drivers directly. Switch to use the proper DMA API instead. Fixes: ed870f6a7aa2 ("net: sgi: ioc3-eth: use dma-direct for dma allocations") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-04ipv6: use jhash2() in rt6_exception_hash()Eric Dumazet1-2/+2
Faster jhash2() can be used instead of jhash(), since IPv6 addresses have the needed alignment requirement. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-04net: of_get_phy_mode: Change API to solve int/unit warningsAndrew Lunn53-149/+201
Before this change of_get_phy_mode() returned an enum, phy_interface_t. On error, -ENODEV etc, is returned. If the result of the function is stored in a variable of type phy_interface_t, and the compiler has decided to represent this as an unsigned int, comparision with -ENODEV etc, is a signed vs unsigned comparision. Fix this problem by changing the API. Make the function return an error, or 0 on success, and pass a pointer, of type phy_interface_t, where the phy mode should be stored. v2: Return with *interface set to PHY_INTERFACE_MODE_NA on error. Add error checks to all users of of_get_phy_mode() Fixup a few reverse christmas tree errors Fixup a few slightly malformed reverse christmas trees v3: Fix 0-day reported errors. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-04net: bridge: fdb: eliminate extra port state tests from fast-pathNikolay Aleksandrov3-5/+9
When commit df1c0b8468b3 ("[BRIDGE]: Packets leaking out of disabled/blocked ports.") introduced the port state tests in br_fdb_update() it was to avoid learning/refreshing from STP BPDUs, it was also used to avoid learning/refreshing from user-space with NTF_USE. Those two tests are done for every packet entering the bridge if it's learning, but for the fast-path we already have them checked in br_handle_frame() and is unnecessary to do it again. Thus push the checks to the unlikely cases and drop them from br_fdb_update(), the new nbp_state_should_learn() helper is used to determine if the port state allows br_fdb_update() to be called. The two places which need to do it manually are: - user-space add call with NTF_USE set - link-local packet learning done in __br_handle_local_finish() Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-04ice: Add support for XDPMaciej Fijalkowski8-57/+825
Add support for XDP. Implement ndo_bpf and ndo_xdp_xmit. Upon load of an XDP program, allocate additional Tx rings for dedicated XDP use. The following actions are supported: XDP_TX, XDP_DROP, XDP_REDIRECT, XDP_PASS, and XDP_ABORTED. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-04ice: get rid of per-tc flow in Tx queue configuration routinesMaciej Fijalkowski3-49/+47
There's no reason for treating DCB as first class citizen when configuring the Tx queues and going through TCs. Reverse the logic and base the configuration logic on rings, which is the object of interest anyway. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-04ice: Introduce ice_base.cAnirudh Venkataramanan10-827/+811
Remove a few uses of kernel configuration flags from ice_lib.c by introducing a new source file ice_base.c. Also move corresponding function prototypes from ice_lib.h to ice_base.h and include ice_base.h where required. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-03Merge tag 'mlx5-updates-2019-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linuxDavid S. Miller25-359/+393
Saeed Mahameed says: ==================== mlx5-updates-2019-11-01 Misc updates for mlx5 netdev and core driver 1) Steering Core: Replace CRC32 internal implementation with standard kernel lib. 2) Steering Core: Support IPv4 and IPv6 mixed matcher. 3) Steering Core: Lockless FTE read lookups 4) TC: Bit sized fields rewrite support. 5) FPGA: Standalone FPGA support. 6) SRIOV: Reset VF parameters configurations on SRIOV disable. 7) netdev: Dump WQs wqe descriptors on CQE with error events. 8) MISC Cleanups. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03mISDN: remove unused variable 'faxmodulation_s'YueHaibing1-1/+0
drivers/isdn/hardware/mISDN/mISDNisar.c:30:17: warning: faxmodulation_s defined but not used [-Wunused-const-variable=] It is never used, so can be removed. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03ptp: Add a ptp clock driver for IDT ClockMatrix.Vincent Cheng5-0/+2201
The IDT ClockMatrix (TM) family includes integrated devices that provide eight PLL channels. Each PLL channel can be independently configured as a frequency synthesizer, jitter attenuator, digitally controlled oscillator (DCO), or a digital phase lock loop (DPLL). Typically these devices are used as timing references and clock sources for PTP applications. This patch adds support for the device. Co-developed-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03dt-bindings: ptp: Add device tree binding for IDT ClockMatrix based PTP clockVincent Cheng1-0/+69
Add device tree binding doc for the IDT ClockMatrix PTP clock. Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03net: icmp6: provide input address for traceroute6Francesco Ruggeri1-5/+17
traceroute6 output can be confusing, in that it shows the address that a router would use to reach the sender, rather than the address the packet used to reach the router. Consider this case: ------------------------ N2 | | ------ ------ N3 ---- | R1 | | R2 |------|H2| ------ ------ ---- | | ------------------------ N1 | ---- |H1| ---- where H1's default route is through R1, and R1's default route is through R2 over N2. traceroute6 from H1 to H2 shows R2's address on N1 rather than on N2. The script below can be used to reproduce this scenario. traceroute6 output without this patch: traceroute to 2000:103::4 (2000:103::4), 30 hops max, 80 byte packets 1 2000:101::1 (2000:101::1) 0.036 ms 0.008 ms 0.006 ms 2 2000:101::2 (2000:101::2) 0.011 ms 0.008 ms 0.007 ms 3 2000:103::4 (2000:103::4) 0.013 ms 0.010 ms 0.009 ms traceroute6 output with this patch: traceroute to 2000:103::4 (2000:103::4), 30 hops max, 80 byte packets 1 2000:101::1 (2000:101::1) 0.056 ms 0.019 ms 0.006 ms 2 2000:102::2 (2000:102::2) 0.013 ms 0.008 ms 0.008 ms 3 2000:103::4 (2000:103::4) 0.013 ms 0.009 ms 0.009 ms #!/bin/bash # # ------------------------ N2 # | | # ------ ------ N3 ---- # | R1 | | R2 |------|H2| # ------ ------ ---- # | | # ------------------------ N1 # | # ---- # |H1| # ---- # # N1: 2000:101::/64 # N2: 2000:102::/64 # N3: 2000:103::/64 # # R1's host part of address: 1 # R2's host part of address: 2 # H1's host part of address: 3 # H2's host part of address: 4 # # For example: # the IPv6 address of R1's interface on N2 is 2000:102::1/64 # # Nets are implemented by macvlan interfaces (bridge mode) over # dummy interfaces. # # Create net namespaces ip netns add host1 ip netns add host2 ip netns add rtr1 ip netns add rtr2 # Create nets ip link add net1 type dummy; ip link set net1 up ip link add net2 type dummy; ip link set net2 up ip link add net3 type dummy; ip link set net3 up # Add interfaces to net1, move them to their nemaspaces ip link add link net1 dev host1net1 type macvlan mode bridge ip link set host1net1 netns host1 ip link add link net1 dev rtr1net1 type macvlan mode bridge ip link set rtr1net1 netns rtr1 ip link add link net1 dev rtr2net1 type macvlan mode bridge ip link set rtr2net1 netns rtr2 # Add interfaces to net2, move them to their nemaspaces ip link add link net2 dev rtr1net2 type macvlan mode bridge ip link set rtr1net2 netns rtr1 ip link add link net2 dev rtr2net2 type macvlan mode bridge ip link set rtr2net2 netns rtr2 # Add interfaces to net3, move them to their nemaspaces ip link add link net3 dev rtr2net3 type macvlan mode bridge ip link set rtr2net3 netns rtr2 ip link add link net3 dev host2net3 type macvlan mode bridge ip link set host2net3 netns host2 # Configure interfaces and routes in host1 ip netns exec host1 ip link set lo up ip netns exec host1 ip link set host1net1 up ip netns exec host1 ip -6 addr add 2000:101::3/64 dev host1net1 ip netns exec host1 ip -6 route add default via 2000:101::1 # Configure interfaces and routes in rtr1 ip netns exec rtr1 ip link set lo up ip netns exec rtr1 ip link set rtr1net1 up ip netns exec rtr1 ip -6 addr add 2000:101::1/64 dev rtr1net1 ip netns exec rtr1 ip link set rtr1net2 up ip netns exec rtr1 ip -6 addr add 2000:102::1/64 dev rtr1net2 ip netns exec rtr1 ip -6 route add default via 2000:102::2 ip netns exec rtr1 sysctl net.ipv6.conf.all.forwarding=1 # Configure interfaces and routes in rtr2 ip netns exec rtr2 ip link set lo up ip netns exec rtr2 ip link set rtr2net1 up ip netns exec rtr2 ip -6 addr add 2000:101::2/64 dev rtr2net1 ip netns exec rtr2 ip link set rtr2net2 up ip netns exec rtr2 ip -6 addr add 2000:102::2/64 dev rtr2net2 ip netns exec rtr2 ip link set rtr2net3 up ip netns exec rtr2 ip -6 addr add 2000:103::2/64 dev rtr2net3 ip netns exec rtr2 sysctl net.ipv6.conf.all.forwarding=1 # Configure interfaces and routes in host2 ip netns exec host2 ip link set lo up ip netns exec host2 ip link set host2net3 up ip netns exec host2 ip -6 addr add 2000:103::4/64 dev host2net3 ip netns exec host2 ip -6 route add default via 2000:103::2 # Ping host2 from host1 ip netns exec host1 ping6 -c5 2000:103::4 # Traceroute host2 from host1 ip netns exec host1 traceroute6 2000:103::4 # Delete nets ip link del net3 ip link del net2 ip link del net1 # Delete namespaces ip netns del rtr2 ip netns del rtr1 ip netns del host2 ip netns del host1 Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Original-patch-by: Honggang Xu <hxu@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03tipc: improve message bundling algorithmTuong Lien3-104/+113
As mentioned in commit e95584a889e1 ("tipc: fix unlimited bundling of small messages"), the current message bundling algorithm is inefficient that can generate bundles of only one payload message, that causes unnecessary overheads for both the sender and receiver. This commit re-designs the 'tipc_msg_make_bundle()' function (now named as 'tipc_msg_try_bundle()'), so that when a message comes at the first place, we will just check & keep a reference to it if the message is suitable for bundling. The message buffer will be put into the link backlog queue and processed as normal. Later on, when another one comes we will make a bundle with the first message if possible and so on... This way, a bundle if really needed will always consist of at least two payload messages. Otherwise, we let the first buffer go its way without any need of bundling, so reduce the overheads to zero. Moreover, since now we have both the messages in hand, we can even optimize the 'tipc_msg_bundle()' function, make bundle of a very large (size ~ MSS) and small messages which is not with the current algorithm e.g. [1400-byte message] + [10-byte message] (MTU = 1500). Acked-by: Ying Xue <ying.xue@windreiver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03net: icmp: use input address in tracerouteFrancesco Ruggeri1-1/+2
Even with icmp_errors_use_inbound_ifaddr set, traceroute returns the primary address of the interface the packet was received on, even if the path goes through a secondary address. In the example: 1.0.3.1/24 ---- 1.0.1.3/24 1.0.1.1/24 ---- 1.0.2.1/24 1.0.2.4/24 ---- |H1|--------------------------|R1|--------------------------|H2| ---- N1 ---- N2 ---- where 1.0.3.1/24 is R1's primary address on N1, traceroute from H1 to H2 returns: traceroute to 1.0.2.4 (1.0.2.4), 30 hops max, 60 byte packets 1 1.0.3.1 (1.0.3.1) 0.018 ms 0.006 ms 0.006 ms 2 1.0.2.4 (1.0.2.4) 0.021 ms 0.007 ms 0.007 ms After applying this patch, it returns: traceroute to 1.0.2.4 (1.0.2.4), 30 hops max, 60 byte packets 1 1.0.1.1 (1.0.1.1) 0.033 ms 0.007 ms 0.006 ms 2 1.0.2.4 (1.0.2.4) 0.011 ms 0.007 ms 0.007 ms Original-patch-by: Bill Fenner <fenner@arista.com> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03Merge branch 'optimize-openvswitch-flow-looking-up'David S. Miller4-105/+361
Tonghao Zhang says: ==================== optimize openvswitch flow looking up This series patch optimize openvswitch for performance or simplify codes. Patch 1, 2, 4: Port Pravin B Shelar patches to linux upstream with little changes. Patch 5, 6, 7: Optimize the flow looking up and simplify the flow hash. Patch 8, 9: are bugfix. The performance test is on Intel Xeon E5-2630 v4. The test topology is show as below: +-----------------------------------+ | +---------------------------+ | | | eth0 ovs-switch eth1 | | Host0 | +---------------------------+ | +-----------------------------------+ ^ | | | | | | | | v +-----+----+ +----+-----+ | netperf | Host1 | netserver| Host2 +----------+ +----------+ We use netperf send the 64B packets, and insert 255+ flow-mask: $ ovs-dpctl add-flow ovs-switch "in_port(1),eth(dst=00:01:00:00:00:00/ff:ff:ff:ff:ff:01),eth_type(0x0800),ipv4(frag=no)" 2 ... $ ovs-dpctl add-flow ovs-switch "in_port(1),eth(dst=00:ff:00:00:00:00/ff:ff:ff:ff:ff:ff),eth_type(0x0800),ipv4(frag=no)" 2 $ $ netperf -t UDP_STREAM -H 2.2.2.200 -l 40 -- -m 18 * Without series patch, throughput 8.28Mbps * With series patch, throughput 46.05Mbps v6: some coding style fixes v5: rewrite patch 8, release flow-mask when freeing flow v4: access ma->count with READ_ONCE/WRITE_ONCE API. More information, see patch 5 comments. v3: update ma point when realloc mask_array in patch 5 v2: simplify codes. e.g. use kfree_rcu instead of call_rcu ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03net: openvswitch: simplify the ovs_dp_cmd_newTonghao Zhang1-22/+38
use the specified functions to init resource. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03net: openvswitch: don't unlock mutex when changing the user_features failsTonghao Zhang1-1/+1
Unlocking of a not locked mutex is not allowed. Other kernel thread may be in critical section while we unlock it because of setting user_feature fail. Fixes: 95a7233c4 ("net: openvswitch: Set OvS recirc_id from tc chain index") Cc: Paul Blakey <paulb@mellanox.com> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03net: openvswitch: fix possible memleak on destroy flow-tableTonghao Zhang1-88/+98
When we destroy the flow tables which may contain the flow_mask, so release the flow mask struct. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03net: openvswitch: add likely in flow_lookupTonghao Zhang1-2/+2
The most case *index < ma->max, and flow-mask is not NULL. We add un/likely for performance. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>