linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2021-12-23	netfilter: nft_payload: WARN_ON_ONCE instead of BUG	Pablo Neira Ayuso	1	-2/+4
	BUG() is too harsh for unknown payload base, use WARN_ON_ONCE() instead. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-23	netfilter: nf_tables: remove rcu read-size lock	Pablo Neira Ayuso	1	-2/+0
	Chain stats are updated from the Netfilter hook path which already run under rcu read-size lock section. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-16	netfilter: nf_nat_masquerade: add netns refcount tracker to masq_dev_work	Eric Dumazet	1	-1/+3
	If compiled with CONFIG_NET_NS_REFCNT_TRACKER=y, using put_net_track() in iterate_cleanup_work() and netns_tracker_alloc() in nf_nat_masq_schedule() might help us finding netns refcount imbalances. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-16	netfilter: nfnetlink: add netns refcount tracker to struct nfulnl_instance	Eric Dumazet	1	-2/+3
	If compiled with CONFIG_NET_NS_REFCNT_TRACKER=y, using put_net_track() in nfulnl_instance_free_rcu() and get_net_track() in instance_create() might help us finding netns refcount imbalances. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-16	net: prestera: flower template support	Volodymyr Mytnyk	6	-3/+145
	Add user template explicit support. At this moment, max TCAM rule size is utilized for all rules, doesn't matter which and how much flower matches are provided by user. It means that some of TCAM space is wasted, which impacts the number of filters that can be offloaded. Introducing the template, allows to have more HW offloaded filters by specifying the template explicitly. Example: tc qd add dev PORT clsact tc chain add dev PORT ingress protocol ip \ flower dst_ip 0.0.0.0/16 tc filter add dev PORT ingress protocol ip \ flower skip_sw dst_ip 1.2.3.4/16 action drop NOTE: chain 0 is the default chain id for "tc chain" & "tc filter" command, so it is omitted in the example above. This patch adds only template support for default chain 0 suppoerted by prestera driver at this moment. Chains are not supported yet, and will be added later. Signed-off-by: Volodymyr Mytnyk <vmytnyk@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16	net: dsa: rtl8365mb: add GMII as user port mode	Luiz Angelo Daros de Luca	1	-1/+2
	Recent net-next fails to initialize ports with: realtek-smi switch: phy mode gmii is unsupported on port 0 realtek-smi switch lan5 (uninitialized): validation of gmii with support 0000000,00000000,000062ef and advertisement 0000000,00000000,000062ef failed: -22 realtek-smi switch lan5 (uninitialized): failed to connect to PHY: -EINVAL realtek-smi switch lan5 (uninitialized): error -22 setting up PHY for tree 1, switch 0, port 0 Current net branch(3dd7d40b43663f58d11ee7a3d3798813b26a48f1) is not affected. I also noticed the same issue before with older versions but using a MDIO interface driver, not realtek-smi. Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16	gve: Add tx\|rx-coalesce-usec for DQO	Tao Liu	4	-11/+91
	Adding ethtool support for changing rx-coalesce-usec and tx-coalesce-usec when using the DQO queue format. Signed-off-by: Tao Liu <xliutaox@google.com> Signed-off-by: Jeroen de Borst <jeroendb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16	gve: Add consumed counts to ethtool stats	Jordan Kim	1	-9/+12
	Being able to see how many descriptors are in-use is helpful when diagnosing certain issues. Signed-off-by: Jeroen de Borst <jeroendb@google.com> Signed-off-by: Jordan Kim <jrkim@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16	gve: Implement suspend/resume/shutdown	Catherine Sullivan	2	-0/+60
	Add support for suspend, resume and shutdown. Signed-off-by: Catherine Sullivan <csully@google.com> Signed-off-by: David Awogbemila <awogbemila@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16	gve: Add optional metadata descriptor type GVE_TXD_MTD	Willem de Bruijn	3	-20/+74
	Allow drivers to pass metadata along with packet data to the device. Introduce a new metadata descriptor type * GVE_TXD_MTD This descriptor is optional. If present it immediate follows the packet descriptor and precedes the segment descriptor. This descriptor may be repeated. Multiple metadata descriptors may follow. There are no immediate uses for this, this is for future proofing. At present devices allow only 1 MTD descriptor. The lower four bits of the type_flags field encode GVE_TXD_MTD. The upper four bits of the type_flags field encodes a subtype. Introduce one such metadata descriptor subtype * GVE_MTD_SUBTYPE_PATH This shares path information with the device for network failure discovery and robust response: Linux derives ipv6 flowlabel and ECMP multipath from sk->sk_txhash, and updates this field on error with sk_rethink_txhash. Allow the host stack to do the same. Pass the tx_hash value if set. Also communicate whether the path hash is set, or more exactly, what its type is. Define two common types GVE_MTD_PATH_HASH_NONE GVE_MTD_PATH_HASH_L4 Concrete examples of error conditions that are resolved are mentioned in the commits that add sk_rethink_txhash calls. Such as commit 7788174e8726 ("tcp: change IPv6 flow-label upon receiving spurious retransmission"). Experimental results mirror what the theory suggests: where IPv6 FlowLabel is included in path selection (e.g., LAG/ECMP), flowlabel rotation on TCP timeout avoids the vast majority of TCP disconnects that would otherwise have occurred during link failures in long-haul backbones, when an alternative path is available. Rotation can be applied to various bad connection signals, such as timeouts and spurious retransmissions. In aggregate, such flow level signals can help locate network issues. Define initial common states: GVE_MTD_PATH_STATE_DEFAULT GVE_MTD_PATH_STATE_TIMEOUT GVE_MTD_PATH_STATE_CONGESTION GVE_MTD_PATH_STATE_RETRANSMIT Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David Awogbemila <awogbemila@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16	gve: remove memory barrier around seqno	Catherine Sullivan	1	-2/+0
	No longer needed after we introduced the barrier in gve_napi_poll. Signed-off-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16	gve: Update gve_free_queue_page_list signature	Catherine Sullivan	1	-2/+1
	The id field should be a u32 not a signed int. Signed-off-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16	gve: Move the irq db indexes out of the ntfy block struct	Catherine Sullivan	4	-17/+36
	Giving the device access to other kernel structs is not ideal. Move the indexes into their own array and just keep pointers to them in the ntfy block struct. Signed-off-by: Catherine Sullivan <csully@google.com> Signed-off-by: David Awogbemila <awogbemila@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16	gve: Correct order of processing device options	Jeroen de Borst	1	-4/+4
	The legacy raw addressing device option was processed before the new RDA queue format option. This caused the supported features mask, which is provided only on the RDA queue format option, not to be set. This disabled jumbo-frame support when using raw adressing. Fixes: 255489f5b33c ("gve: Add a jumbo-frame device option") Signed-off-by: Jeroen de Borst <jeroendb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16	net: mvneta: convert to pcs_validate() and phylink_generic_validate()	Russell King (Oracle)	1	-19/+18
	Convert mvneta to validate the autoneg state for 1000base-X in the pcs_validate() operation, rather than the MAC validate() operation. This allows us to switch the MAC validate() to use phylink_generic_validate(). Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16	net: mvneta: convert to phylink pcs operations	Russell King	1	-52/+91
	An initial stab at converting mvneta to PCS operations. There's a few FIXMEs to be solved. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16	net: mvneta: convert to use mac_prepare()/mac_finish()	Russell King	1	-35/+68
	Convert mvneta to use the mac_prepare() and mac_finish() methods in preparation to converting mvneta to split-PCS support. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16	net: mvpp2: convert to pcs_validate() and phylink_generic_validate()	Russell King (Oracle)	1	-20/+17
	Convert mvpp2 to validate the autoneg state for 1000base-X in the pcs_validate() operation, rather than the MAC validate() operation. This allows us to switch the MAC validate() to use phylink_generic_validate(). Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16	net: mvpp2: use .mac_select_pcs() interface	Russell King (Oracle)	2	-36/+42
	Use the mac_select_pcs() method to choose between the GMAC and XLG PCS implementations. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16	net: phylink: add pcs_validate() method	Russell King (Oracle)	2	-0/+51
	Add a hook for PCS to validate the link parameters. This avoids MAC drivers having to have knowledge of their PCS in their validate() method, thereby allowing several MAC drivers to be simplfied. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16	net: phylink: add mac_select_pcs() method to phylink_mac_ops	Russell King (Oracle)	2	-9/+77
	mac_select_pcs() allows us to have an explicit point to query which PCS the MAC wishes to use for a particular PHY interface mode, thereby allowing us to add support to validate the link settings with the PCS. Phylink will also use this to select the PCS to be used during a major configuration event without the MAC driver needing to call phylink_set_pcs(). Note that if mac_select_pcs() is present, the supported_interfaces bitmap must be filled in; this avoids mac_select_pcs() being called with PHY_INTERFACE_MODE_NA when we want to get support for all interface types. Phylink will return an error in phylink_create() unless this condition is satisfied. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16	netfilter: conntrack: Remove useless assignment statements	luo penghao	1	-1/+0
	The old_size assignment here will not be used anymore The clang_analyzer complains as follows: Value stored to 'old_size' is never read Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: luo penghao <luo.penghao@zte.com.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-15	net/mlx5: Introduce log_max_current_uc_list_wr_supported bit	Shay Drory	1	-1/+1
	Downstream patch will use this bit in order to know whether the device supports changing of max_uc_list. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-15	ice: use modern kernel API for kick	Jesse Brandeburg	1	-4/+5
	The kernel gained a new interface for drivers to use to combine tail bump (doorbell) and BQL updates, attempt to use those new interfaces. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15	ice: tighter control over VSI_DOWN state	Jesse Brandeburg	2	-5/+8
	The driver had comments to the effect of: This flag should be set before calling this function. While reviewing code it was found that there were several violations of this policy, which could introduce hard to find bugs or races. Fix the violations of the "VSI DOWN state must be set before calling ice_down" and make checking the state into code with a WARN_ON. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15	ice: use prefetch methods	Jesse Brandeburg	1	-1/+12
	The kernel provides some prefetch mechanisms to speed up commonly cold cache line accesses during receive processing. Since these are software structures it helps to have these strategically placed prefetches. Be careful to call BQL prefetch complete only for non XDP queues. Co-developed-by: Piotr Raczynski <piotr.raczynski@intel.com> Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15	ice: update to newer kernel API	Jesse Brandeburg	1	-9/+9
	Use the netif_tx_* API from netdevice.h which has simpler parameters. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15	ice: support immediate firmware activation via devlink reload	Jacob Keller	11	-29/+330
	The ice hardware contains an embedded chip with firmware which can be updated using devlink flash. The firmware which runs on this chip is referred to as the Embedded Management Processor firmware (EMP firmware). Activating the new firmware image currently requires that the system be rebooted. This is not ideal as rebooting the system can cause unwanted downtime. In practical terms, activating the firmware does not always require a full system reboot. In many cases it is possible to activate the EMP firmware immediately. There are a couple of different scenarios to cover. * The EMP firmware itself can be reloaded by issuing a special update to the device called an Embedded Management Processor reset (EMP reset). This reset causes the device to reset and reload the EMP firmware. * PCI configuration changes are only reloaded after a cold PCIe reset. Unfortunately there is no generic way to trigger this for a PCIe device without a system reboot. When performing a flash update, firmware is capable of responding with some information about the specific update requirements. The driver updates the flash by programming a secondary inactive bank with the contents of the new image, and then issuing a command to request to switch the active bank starting from the next load. The response to the final command for updating the inactive NVM flash bank includes an indication of the minimum reset required to fully update the device. This can be one of the following: * A full power on is required * A cold PCIe reset is required * An EMP reset is required The response to the command to switch flash banks includes an indication of whether or not the firmware will allow an EMP reset request. For most updates, an EMP reset is sufficient to load the new EMP firmware without issues. In some cases, this reset is not sufficient because the PCI configuration space has changed. When this could cause incompatibility with the new EMP image, the firmware is capable of rejecting the EMP reset request. Add logic to ice_fw_update.c to handle the response data flash update AdminQ commands. For the reset level, issue a devlink status notification informing the user of how to complete the update with a simple suggestion like "Activate new firmware by rebooting the system". Cache the status of whether or not firmware will restrict the EMP reset for use in implementing devlink reload. Implement support for devlink reload with the "fw_activate" flag. This allows user space to request the firmware be activated immediately. For the .reload_down handler, we will issue a request for the EMP reset using the appropriate firmware AdminQ command. If we know that the firmware will not allow an EMP reset, simply exit with a suitable netlink extended ACK message indicating that the EMP reset is not available. For the .reload_up handler, simply wait until the driver has finished resetting. Logic to handle processing of an EMP reset already exists in the driver as part of its reset and rebuild flows. Implement support for the devlink reload interface with the "fw_activate" action. This allows userspace to request activation of firmware without a reboot. Note that support for indicating the required reset and EMP reset restriction is not supported on old versions of firmware. The driver can determine if the two features are supported by checking the device capabilities report. I confirmed support has existed since at least version 5.5.2 as reported by the 'fw.mgmt' version. Support to issue the EMP reset request has existed in all version of the EMP firmware for the ice hardware. Check the device capabilities report to determine whether or not the indications are reported by the running firmware. If the reset requirement indication is not supported, always assume a full power on is necessary. If the reset restriction capability is not supported, always assume the EMP reset is available. Users can verify if the EMP reset has activated the firmware by using the devlink info report to check that the 'running' firmware version has updated. For example a user might do the following: # Check current version $ devlink dev info # Update the device $ devlink dev flash pci/0000:af:00.0 file firmware.bin # Confirm stored version updated $ devlink dev info # Reload to activate new firmware $ devlink dev reload pci/0000:af:00.0 action fw_activate # Confirm running version updated $ devlink dev info Finally, this change does not implement basic driver-only reload support. I did look into trying to do this. However, it requires significant refactor of how the ice driver probes and loads everything. The ice driver probe and allocation flows were not designed with such a reload in mind. Refactoring the flow to support this is beyond the scope of this change. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15	ice: reduce time to read Option ROM CIVD data	Jacob Keller	1	-12/+36
	During probe and device reset, the ice driver reads some data from the NVM image as part of ice_init_nvm. Part of this data includes a section of the Option ROM which contains version information. The function ice_get_orom_civd_data is used to locate the '$CIV' data section of the Option ROM. Timing of ice_probe and ice_rebuild indicate that the ice_get_orom_civd_data function takes about 10 seconds to finish executing. The function locates the section by scanning the Option ROM every 512 bytes. This requires a significant number of NVM read accesses, since the Option ROM bank is 500KB. In the worst case it would take about 1000 reads. Worse, all PFs serialize this operation during reload because of acquiring the NVM semaphore. The CIVD section is located at the end of the Option ROM image data. Unfortunately, the driver has no easy method to determine the offset manually. Practical experiments have shown that the data could be at a variety of locations, so simply reversing the scanning order is not sufficient to reduce the overall read time. Instead, copy the entire contents of the Option ROM into memory. This allows reading the data using 4Kb pages instead of 512 bytes at a time. This reduces the total number of firmware commands by a factor of 8. In addition, reading the whole section together at once allows better indication to firmware of when we're "done". Re-write ice_get_orom_civd_data to allocate virtual memory to store the Option ROM data. Copy the entire OptionROM contents at once using ice_read_flash_module. Finally, use this memory copy to scan for the '$CIV' section. This change significantly reduces the time to read the Option ROM CIVD section from ~10 seconds down to ~1 second. This has a significant impact on the total time to complete a driver rebuild or probe. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15	ice: move ice_devlink_flash_update and merge with ice_flash_pldm_image	Jacob Keller	3	-72/+39
	The ice_devlink_flash_update function performs a few upfront checks and then calls ice_flash_pldm_image. Most if these checks make more sense in the context of code within ice_flash_pldm_image. Merge ice_devlink_flash_update and ice_flash_pldm_image into one function, placing it in ice_fw_update.c Since this is still the entry point for devlink, call the function ice_devlink_flash_update instead of ice_flash_pldm_image. This leaves a single function which handles the devlink parameters and then initiates a PLDM update. With this change, the ice_devlink_flash_update function in ice_fw_update.c becomes the main entry point for flash update. It elimintes some unnecessary boiler plate code between the two previous functions. The ultimate motivation for this is that it eases supporting a dry run with the PLDM library in a future change. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15	ice: move and rename ice_check_for_pending_update	Jacob Keller	3	-77/+77
	The ice_devlink_flash_update function performs a few checks and then calls ice_flash_pldm_image. One of these checks is to call ice_check_for_pending_update. This function checks if the device has a pending update, and cancels it if so. This is necessary to allow a new flash update to proceed. We want to refactor the ice code to eliminate ice_devlink_flash_update, moving its checks into ice_flash_pldm_image. To do this, ice_check_for_pending_update will become static, and only called by ice_flash_pldm_image. To make this change easier to review, first just move the function up within the ice_fw_update.c file. While at it, note that the function has a misleading name. Its primary action is to cancel a pending update. Using the verb "check" does not imply this. Rename it to ice_cancel_pending_update. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15	ice: devlink: add shadow-ram region to snapshot Shadow RAM	Jacob Keller	2	-5/+89
	We have a region for reading the contents of the NVM flash as a snapshot. This region does not allow reading the Shadow RAM, as it always passes the FLASH_ONLY bit to the low level firmware interface. Add a separate shadow-ram region which will allow snapshot of the current contents of the Shadow RAM. This data is built from the NVM contents but is distinct as the device builds up the Shadow RAM during initialization, so being able to snapshot its contents can be useful when attempting to debug flash related issues. Fix the comment description of the nvm-flash region which incorrectly stated that it filled the shadow-ram region, and add a comment explaining that the nvm-flash region does not actually read the Shadow RAM. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15	ethtool: always write dev in ethnl_parse_header_dev_get	Jakub Kicinski	1	-3/+2
	Commit 0976b888a150 ("ethtool: fix null-ptr-deref on ref tracker") made the write to req_info.dev conditional, but as Eric points out in a different follow up the structure is often allocated on the stack and not kzalloc()'d so seems safer to always write the dev, in case it's garbage on input. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15	net: add net device refcount tracker to struct packet_type	Eric Dumazet	3	-5/+14
	Most notable changes are in af_packet, tipc ones are trivial. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jon Maloy <jmaloy@redhat.com> Cc: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15	selftests: mlxsw: vxlan: Remove IPv6 test case	Amit Cohen	1	-18/+0
	Currently, there is a test case to verify that VxLAN with IPv6 underlay is forbidden. Remove this test case as support for VxLAN with IPv6 underlay was added by the previous patch. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15	mlxsw: Add support for VxLAN with IPv6 underlay	Amit Cohen	4	-10/+168
	Currently, mlxsw driver supports VxLAN with IPv4 underlay only. Add support for IPv6 underlay. The main differences are: * Learning is not supported for IPv6 FDB entries, use static entries and do not allow 'learning' flag for IPv6 VxLAN. * IPv6 addresses for FDB entries should be saved as part of KVDL. Use the new API to allocate and release entries for IPv6 addresses. * Spectrum ASICs do not fill UDP checksum, while in software IPv6 UDP packets with checksum zero are dropped. Force the relevant flags which allow the VxLAN device to generate UDP packets with zero checksum and also receive them. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15	mlxsw: spectrum_nve: Keep track of IPv6 addresses used by FDB entries	Amit Cohen	3	-2/+162
	FDB entries that perform VxLAN encapsulation with an IPv6 underlay hold a reference on a resource. Namely, the KVDL entry where the IPv6 underlay destination IP is stored. When such an FDB entry is deleted, it needs to drop the reference from the corresponding KVDL entry. To that end, maintain a hash table that maps an FDB entry (i.e., {MAC, FID}) to the IPv6 address used by it. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15	mlxsw: reg: Add a function to fill IPv6 unicast FDB entries	Amit Cohen	1	-0/+13
	Add a function to fill IPv6 unicast FDB entries. Use the common function for common fields. Unlike IPv4 entries, the underlay IP address is not filled in the register payload, but instead a pointer to KVDL is used. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15	mlxsw: Split handling of FDB tunnel entries between address families	Amit Cohen	2	-24/+38
	Currently, the function which adds/removes unicast tunnel FDB entries is shared between IPv4 and IPv6, while for IPv6 it warns because there is no support for it. The code for IPv6 will be more complicated because it needs to allocate/release a KVDL pointer for the underlay IPv6 address. As a preparation for IPv6 underlay support, split the code according to address family. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15	mlxsw: spectrum_nve_vxlan: Make VxLAN flags check per address family	Amit Cohen	1	-9/+22
	As part of 'can_offload' checks, there is a check of VxLAN flags. The supported flags for IPv6 VxLAN will be different from the existing flags because of some limitations. As preparation for IPv6 underlay support, make this check per address family. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15	mlxsw: spectrum_ipip: Use common hash table for IPv6 address mapping	Amit Cohen	1	-22/+6
	Use the common hash table introduced by the previous patch instead of the IP-in-IP specific implementation. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15	mlxsw: spectrum: Add hash table for IPv6 address mapping	Amit Cohen	2	-0/+150
	The device supports forwarding entries such as routes and FDBs that perform tunnel (e.g., VXLAN, IP-in-IP) encapsulation or decapsulation. When the underlay is IPv6, these entries do not encode the 128 bit IPv6 address used for encapsulation / decapsulation. Instead, these entries encode a 24 bit pointer to an array called KVDL where the IPv6 address is stored. Currently, only IP-in-IP with IPv6 underlay is supported, but subsequent patches will add support for VxLAN with IPv6 underlay. To avoid duplicating the logic required to store and retrieve these IPv6 addresses, introduce a hash table that will store the mapping between IPv6 addresses and their KVDL index. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15	net: fec: fix system hang during suspend/resume	Joakim Zhang	1	-12/+34
	1. During normal suspend (WoL not enabled) process, system has posibility to hang. The root cause is TXF interrupt coming after clocks disabled, system hang when accessing registers from interrupt handler. To fix this issue, disable all interrupts when system suspend. 2. System also has posibility to hang with WoL enabled during suspend, after entering stop mode, then magic pattern coming after clocks disabled, system will be waked up, and interrupt handler will be called, system hang when access registers. To fix this issue, disable wakeup irq in .suspend(), and enable it in .resume(). Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15	net: ocelot: add support to get port mac from device-tree	Clément Léger	1	-1/+4
	Add support to get mac from device-tree using of_get_ethdev_address. Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Clément Léger <clement.leger@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15	sun4i-emac.c: remove unnecessary branch	Conley Lee	1	-18/+0
	According to the current implementation of emac_rx, every arrived packet will be processed in the while loop. So, there is no remain packet last time. The skb_last field and this branch for dealing with it is unnecessary. Signed-off-by: Conley Lee <conleylee@foxmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-15	ethtool: use ethnl_parse_header_dev_put()	Eric Dumazet	16	-18/+23
	It seems I missed that most ethnl_parse_header_dev_get() callers declare an on-stack struct ethnl_req_info, and that they simply call dev_put(req_info.dev) when about to return. Add ethnl_parse_header_dev_put() helper to properly untrack reference taken by ethnl_parse_header_dev_get(). Fixes: e4b8954074f6 ("netlink: add net device refcount tracker to struct ethnl_req_info") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14	net/mlx5e: Move goto action checks into tc_action goto post parse op	Roi Dayan	2	-19/+35
	Move goto action checks from parse nic/fdb funcs into the tc action infra goto post parse op. While moving this part also use NL_SET_ERR_MSG_MOD() instead of NL_SET_ERR_MSG(). Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14	net/mlx5e: Move vlan action chunk into tc action vlan post parse op	Roi Dayan	2	-38/+51
	Move vlan prio tag rewrite handling into tc action infra vlan post parse op. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14	net/mlx5e: Add post_parse() op to tc action infrastructure	Roi Dayan	2	-0/+15
	The post_parse() op should be called after the parse op was called for all actions. It could be an action state is dependent on other actions. In the new op an action can fail the parse if the state is not valid anymore. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-14	net/mlx5e: Move sample attr allocation to tc_action sample parse op	Roi Dayan	3	-12/+6
	There is no reason to wait with the kmalloc to after parsing all other actions. There could still be a failure later and before offloading the rule. So alloc the mem when parsing. The memory is being released on mlx5e_flow_put() which is called also on error flow. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>