linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2018-10-01	net/mlx5e: Change VF representors' RQ type	Gavi Teitz	3	-18/+25
	The representors' RQ size was not large enough for them to achieve high enough performance, and therefore needed to be enlarged, while suffering a minimum hit to its memory usage. To achieve this the representors RQ size was increased, and its type was changed to be a striding RQ if it is supported. Towards that goal the following changes were made: * Extracted the sequence for setting the standard netdev's RQ parmas into a function * Replaced the sequence for setting the representor's RQ params with the standard sequence The impact of this change can be seen in the following measurements taken on a setup of a VM over a VF, connected to OVS via the VF representor, to an external host: Before current change: TCP Throughput [Gb/s] VM to external host ~ 7.2 With the current change (measured with a striding RQ): TCP Throughput [Gb/s] VM to external host ~ 23.5 Each representor now consumes 2 [MB] of memory for its packet buffers. Signed-off-by: Gavi Teitz <gavi@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-01	net/mlx5e: Ethtool steering, Support masks for l3/l4 filters	Or Gerlitz	1	-40/+16
	Allow using partial masks for L3 addresses and L4 ports across the place. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-29	openvswitch: Use correct reply values in datapath and vport ops	Yifeng Sun	1	-10/+10
	This patch fixes the bug that all datapath and vport ops are returning wrong values (OVS_FLOW_CMD_NEW or OVS_DP_CMD_NEW) in their replies. Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-29	tls: Remove redundant vars from tls record structure	Vakul Garg	2	-53/+45
	Structure 'tls_rec' contains sg_aead_in and sg_aead_out which point to a aad_space and then chain scatterlists sg_plaintext_data, sg_encrypted_data respectively. Rather than using chained scatterlists for plaintext and encrypted data in aead_req, it is efficient to store aad_space in sg_encrypted_data and sg_plaintext_data itself in the first index and get rid of sg_aead_in, sg_aead_in and further chaining. This requires increasing size of sg_encrypted_data & sg_plaintext_data arrarys by 1 to accommodate entry for aad_space. The code which uses sg_encrypted_data and sg_plaintext_data has been modified to skip first index as it points to aad_space. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-29	tipc: buffer overflow handling in listener socket	Tung Nguyen	3	-6/+64
	Default socket receive buffer size for a listener socket is 2Mb. For each arriving empty SYN, the linux kernel allocates a 768 bytes buffer. This means that a listener socket can serve maximum 2700 simultaneous empty connection setup requests before it hits a receive buffer overflow, and much fewer if the SYN is carrying any significant amount of data. When this happens the setup request is rejected, and the client receives an ECONNREFUSED error. This commit mitigates this problem by letting the client socket try to retransmit the SYN message multiple times when it sees it rejected with the code TIPC_ERR_OVERLOAD. Retransmission is done at random intervals in the range of [100 ms, setup_timeout / 4], as many times as there is room for within the setup timeout limit. Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-29	tipc: add SYN bit to connection setup messages	Jon Maloy	3	-5/+22
	Messages intended for intitating a connection are currently indistinguishable from regular datagram messages. The TIPC protocol specification defines bit 17 in word 0 as a SYN bit to allow sanity check of such messages in the listening socket, but this has so far never been implemented. We do that in this commit. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-29	tipc: refactor function tipc_sk_filter_connect()	Jon Maloy	1	-58/+43
	We refactor the function tipc_sk_filter_connect(), both to make it more readable and as a preparation for the next commit. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-29	tipc: refactor function tipc_sk_timeout()	Jon Maloy	1	-24/+38
	We refactor this function as a preparation for the coming commits in the same series. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-29	tipc: refactor function tipc_msg_reverse()	Jon Maloy	1	-30/+28
	The function tipc_msg_reverse() is reversing the header of a message while reusing the original buffer. We have seen at several occasions that this may have unfortunate side effects when the buffer to be reversed is a clone. In one of the following commits we will again need to reverse cloned buffers, so this is the right time to permanently eliminate this problem. In this commit we let the said function always consume the original buffer and replace it with a new one when applicable. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-29	tcp: up initial rmem to 128KB and SYN rwin to around 64KB	Yuchung Cheng	3	-46/+8
	Previously TCP initial receive buffer is ~87KB by default and the initial receive window is ~29KB (20 MSS). This patch changes the two numbers to 128KB and ~64KB (rounding down to the multiples of MSS) respectively. The patch also simplifies the calculations s.t. the two numbers are directly controlled by sysctl tcp_rmem[1]: 1) Initial receiver buffer budget (sk_rcvbuf): while this should be configured via sysctl tcp_rmem[1], previously tcp_fixup_rcvbuf() always override and set a larger size when a new connection establishes. 2) Initial receive window in SYN: previously it is set to 20 packets if MSS <= 1460. The number 20 was based on the initial congestion window of 10: the receiver needs twice amount to avoid being limited by the receive window upon out-of-order delivery in the first window burst. But since this only applies if the receiving MSS <= 1460, connection using large MTU (e.g. to utilize receiver zero-copy) may be limited by the receive window. With this patch TCP memory configuration is more straight-forward and more properly sized to modern high-speed networks by default. Several popular stacks have been announcing 64KB rwin in SYNs as well. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-29	hns3: Another build fix.	David S. Miller	1	-1/+1
	drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c: In function ‘hclge_get_sset_count’: drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c:496:31: error: ‘HNAE3_REVISION_ID_21’ undeclared (first use in this function); did you mean ‘FADT2_REVISION_ID’? if (hdev->pdev->revision >= HNAE3_REVISION_ID_21 \|\| ^~~~~~~~~~~~~~~~~~~~ FADT2_REVISION_ID Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-29	hns3: Fix the build.	David S. Miller	1	-1/+1
	drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c: In function ‘hns3_self_test’: drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c:278:15: error: ‘HNS3_SELF_TEST_TYPE_NUM’ undeclared (first use in this function); did you mean ‘HNS3_SELF_TEST_TPYE_NUM’? int st_param[HNS3_SELF_TEST_TYPE_NUM][2]; ^~~~~~~~~~~~~~~~~~~~~~~ HNS3_SELF_TEST_TPYE_NUM Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-28	ice: fix changing of ring descriptor size (ethtool -G)	Bruce Allan	2	-5/+16
	rx_mini_pending was set to an incorrect value. This was causing EINVAL to always be returned to 'ethtool -G'. The driver does not support mini or jumbo rings so the respective settings should be zero. Also, change the valid range of the number of descriptors in the rings to make the code simpler and easier for users to understand (this removes the valid settings of 8 and 16). Add a system log message indicating when the number is rounded-up from what the user specifies with the 'ethtool -G' command (i.e. when it is not a multiple of 32), and update the log message when a user-provided value is out of range to also indicate the stride. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-09-28	ice: Update to capabilities admin queue command	Anirudh Venkataramanan	1	-16/+34
	This patch makes a couple of changes in the way the driver uses the "get capabilities" command. 1. Get device capabilities in addition to function capabilities 2. Align to latest spec by using cap_count to determine size of the buffer in case of length error. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-09-28	ice: Query the Tx scheduler node before adding it	Anirudh Venkataramanan	2	-1/+71
	Query the Tx scheduler tree node information from FW before adding it to the driver's software database. This will keep the node information current in driver. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-09-28	ice: Update comment for ice_fltr_mgmt_list_entry	Brett Creeley	1	-1/+2
	Previously the comment stated that VSI lists should be used when a second VSI becomes a subscriber to the "VLAN address". VSI lists are always used for VLAN membership, so replace "VLAN address" with "MAC address". Also note that VLAN(s) always use VSI list rules. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-09-28	ice: update fw version check logic	Jacob Keller	1	-11/+19
	We have MAX_FW_API_VER_BRANCH, MAX_FW_API_VER_MAJOR, and MAX_FW_API_VER_MINOR that we use in ice_controlq.h to test when a firmware version is newer than expected. This is currently tested by comparing each field separately. Thus, we compare the branch field against the MAX_FW_API_VER_BRANCH, and so forth. This means that currently, if we suppose that the max firmware version is defined as 0.2.1, i.e. Then firmware 0.1.3 will fail to load. This is because the minor version 3 is greater than the max minor version 1. This is not intuitive, because of the notion that increasing the major firmware version to 2 should mean any firmware version with a major version is less than 2 should be considered older than 2... In order to allow both 0.2.1 and 0.1.3 to load, you would have to define the "max" firmware version as 0.2.3.. It is possible that such a firmware version doesn't even exist yet! Fix this by replacing the current logic with an updated check that behaves as follows: First, we check the major version. If it is greater than the expected version, then we prevent driver load. Additionally, a warning message is logged to indicate to the system administrator that they need to update their driver. This is now the only case where the driver will refuse to load. Second, if the major version is less than the expected version, we log an information message indicating the NVM should be updated. Third, if the major version is exact, we'll then check the minor version. If the minor version is more than two versions less than expected, we log an information message indicating the NVM should be updated. If it is more than two versions greater than the expected version, we log an information message that the driver should be updated. To support this, the ice_aq_ver_check function needs its signature updated to pass the HW structure. Since we now pass this structure, there is no need to pass the firmware API versions separately. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-09-28	ice: update branding strings and supported device ids	Bruce Allan	2	-9/+3
	Update branding strings and remove device ids 0x1594 and 0x1595. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-09-28	ice: replace unnecessary memcpy with direct assignment	Bruce Allan	1	-1/+1
	Direct assignment is preferred over a memcpy() Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-09-28	ice: use [sr]q.count when checking if queue is initialized	Jacob Keller	1	-4/+4
	When shutting down the controlqs, we check if they are initialized before we shut them down and destroy the lock. This is important, as it prevents attempts to access the lock of an already shutdown queue. Unfortunately, we checked rq.head and sq.head as the value to determine if the queue was initialized. This doesn't work, because head is not reset when the queue is shutdown. In some flows, the adminq will have already been shut down prior to calling ice_shutdown_all_ctrlqs. This can result in a crash due to attempting to access the already destroyed mutex. Fix this by using rq.count and sq.count instead. Indeed, ice_shutdown_sq and ice_shutdown_rq already indicate that this is the value we should be using to determine of the queue was initialized. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-09-28	qed: fix spelling mistake "b_cb_registred" -> "b_cb_registered"	Colin Ian King	2	-8/+8
	Trivial fix to spelling mistake struct field name, rename it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-28	net: sched: make function qdisc_free_cb() static	Wei Yongjun	1	-1/+1
	Fixes the following sparse warning: net/sched/sch_generic.c:944:6: warning: symbol 'qdisc_free_cb' was not declared. Should it be static? Fixes: 3a7d0d07a386 ("net: sched: extend Qdisc with rcu") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>