linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2017-06-21	sock: avoid dirtying incoming_cpu if not needed	Paolo Abeni	1	-1/+4
	for connected socket, the incoming_cpu field in the sock struct is not going to change frequently, but we are setting it unconditionally for each packet. Since sk_incoming_cpu and sk_flags share the same cacheline, and the latter is access by udp_recvmsg(), this cause a cache miss for each packet for UDP connected socket. With this patch, we set the incoming cpu field only when the ingress cpu really changes. This gives a small but measurable performance improvement for connected UDP socket. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-21	net: introduce SO_PEERGROUPS getsockopt	David Herrmann	12	-0/+55
	This adds the new getsockopt(2) option SO_PEERGROUPS on SOL_SOCKET to retrieve the auxiliary groups of the remote peer. It is designed to naturally extend SO_PEERCRED. That is, the underlying data is from the same credentials. Regarding its syntax, it is based on SO_PEERSEC. That is, if the provided buffer is too small, ERANGE is returned and @optlen is updated. Otherwise, the information is copied, @optlen is set to the actual size, and 0 is returned. While SO_PEERCRED (and thus `struct ucred') already returns the primary group, it lacks the auxiliary group vector. However, nearly all access controls (including kernel side VFS and SYSVIPC, but also user-space polkit, DBus, ...) consider the entire set of groups, rather than just the primary group. But this is currently not possible with pure SO_PEERCRED. Instead, user-space has to work around this and query the system database for the auxiliary groups of a UID retrieved via SO_PEERCRED. Unfortunately, there is no race-free way to query the auxiliary groups of the PID/UID retrieved via SO_PEERCRED. Hence, the current user-space solution is to use getgrouplist(3p), which itself falls back to NSS and whatever is configured in nsswitch.conf(3). This effectively checks which groups we would assign to the user if it logged in now. On normal systems it is as easy as reading /etc/group, but with NSS it can resort to quering network databases (eg., LDAP), using IPC or network communication. Long story short: Whenever we want to use auxiliary groups for access checks on IPC, we need further IPC to talk to the user/group databases, rather than just relying on SO_PEERCRED and the incoming socket. This is unfortunate, and might even result in dead-locks if the database query uses the same IPC as the original request. So far, those recursions / dead-locks have been avoided by using primitive IPC for all crucial NSS modules. However, we want to avoid re-inventing the wheel for each NSS module that might be involved in user/group queries. Hence, we would preferably make DBus (and other IPC that supports access-management based on groups) work without resorting to the user/group database. This new SO_PEERGROUPS ioctl would allow us to make dbus-daemon work without ever calling into NSS. Cc: Michal Sekletar <msekleta@redhat.com> Cc: Simon McVittie <simon.mcvittie@collabora.co.uk> Reviewed-by: Tom Gundersen <teg@jklm.no> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-21	udp: prefetch rmem_alloc in udp_queue_rcv_skb()	Paolo Abeni	1	-0/+1
	On UDP packets processing, if the BH is the bottle-neck, it always sees a cache miss while updating rmem_alloc; try to avoid it prefetching the value as soon as we have the socket available. Performances under flood with multiple NIC rx queues used are unaffected, but when a single NIC rx queue is in use, this gives ~10% performance improvement. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-21	qede: Fix compilation without QED_RDMA	Chad Dupuis	1	-1/+1
	When CONFIG_QED_RDMA isn't defined, we'd hit the following: /include/linux/qed/qede_rdma.h:84:19: warning: ‘qede_rdma_dev_add’ used but never defined [enabled by default] static inline int qede_rdma_dev_add(struct qede_dev dev); Fixes: bbfcd1e8e167 ("qed: Set rdma generic functions prefix") Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-21	r8152: correct the definition	hayeswang	1	-11/+11
	Replace VLAN_HLEN and CRC_SIZE with ETH_FCS_LEN. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-21	ip6mr: add netlink notifications on mrt6msg cache reports	Julien Gomes	2	-2/+81
	Add Netlink notifications on cache reports in ip6mr, in addition to the existing mrt6msg sent to mroute6_sk. Send RTM_NEWCACHEREPORT notifications to RTNLGRP_IPV6_MROUTE_R. MSGTYPE, MIF_ID, SRC_ADDR and DST_ADDR Netlink attributes contain the same data as their equivalent fields in the mrt6msg header. PKT attribute is the packet sent to mroute6_sk, without the added mrt6msg header. Suggested-by: Ryan Halbrook <halbrook@arista.com> Signed-off-by: Julien Gomes <julien@arista.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-21	ipmr: add netlink notifications on igmpmsg cache reports	Julien Gomes	2	-2/+79
	Add Netlink notifications on cache reports in ipmr, in addition to the existing igmpmsg sent to mroute_sk. Send RTM_NEWCACHEREPORT notifications to RTNLGRP_IPV4_MROUTE_R. MSGTYPE, VIF_ID, SRC_ADDR and DST_ADDR Netlink attributes contain the same data as their equivalent fields in the igmpmsg header. PKT attribute is the packet sent to mroute_sk, without the added igmpmsg header. Suggested-by: Ryan Halbrook <halbrook@arista.com> Signed-off-by: Julien Gomes <julien@arista.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-21	rtnetlink: add restricted rtnl groups for ipv4 and ipv6 mroute	Julien Gomes	2	-0/+17
	Add RTNLGRP_{IPV4,IPV6}_MROUTE_R as two new restricted groups for the NETLINK_ROUTE family. Binding to these groups specifically requires CAP_NET_ADMIN to allow multicast of sensitive messages (e.g. mroute cache reports). Suggested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Julien Gomes <julien@arista.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-21	rtnetlink: add NEWCACHEREPORT message type	Julien Gomes	2	-1/+5
	New NEWCACHEREPORT message type to be used for cache reports sent via Netlink, effectively allowing splitting cache report reception from mroute programming. Suggested-by: Ryan Halbrook <halbrook@arista.com> Signed-off-by: Julien Gomes <julien@arista.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-21	tcp: md5: hide unused variable	Arnd Bergmann	1	-5/+1
	Changing from a memcpy to per-member comparison left the size variable unused: net/ipv4/tcp_ipv4.c: In function 'tcp_md5_do_lookup': net/ipv4/tcp_ipv4.c:910:15: error: unused variable 'size' [-Werror=unused-variable] This does not show up when CONFIG_IPV6 is enabled, but the variable can be removed either way, along with the now unused assignment. Fixes: 6797318e623d ("tcp: md5: add an address prefix for key lookup") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-21	idsn: fix wrong skb_put() used	yuan linyu	1	-1/+1
	in my commit b952f4dff2751252db073c27c0f8a16a416a2ddc, - (u8 )skb_put(skb_out, 1) = (u8)(accm >> 24); \ + skb_put(skb_out, (u8)(accm >> 24)); \ it should skb_put_u8() Fixes: b952f4dff275 ("net: manual clean code which call skb_put_[data:zero])") Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20	i40e: don't hold RTNL lock for the entire reset	Jacob Keller	1	-20/+7
	We recently refactored i40e_do_reset() and its friends to be able to hold the RTNL lock only for the portions that actually need to be protected. However, a separate refactoring added several new callers of these functions during the PCIe error recovery and suspend/resume cycles. When merging the changes together, it was not noticed that we could reduce the RTNL scope by letting the reset function handle the lock itself, as previously it was not possible. Fix this by replacing these call sites to indicate that the reset function should handle its own lock. This enables multiple PFs to reset or resume simultaneously without serializing the resets via the RTNL lock. The end result is that on systems with lots of PFs and VFs the resets don't stall waiting for each other to finish. It is probable that we can also do the same for i40e_do_reset_safe, but this author did not research that change carefully enough to be confident. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-20	i40e: Handle PE_CRITERR properly with IWARP enabled	Catherine Sullivan	1	-2/+2
	When IWARP is enabled, we weren't clearing the PE_CRITERR, just logging it and removing it from the mask. We need to do a corer to reset the PE_CRITERR register, so set the bit for that as we handle the interrupt. We should also be checking for the error against the PFINT_ICR0 register, and only need to clear it in the value getting written to PFINT_ICR0_ENA. Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-20	i40e: clear only cause_ena bit	Shannon Nelson	1	-2/+12
	When disabling interrupts, we should only be clearing the CAUSE_ENA bit, not clearing the whole register. Clearing the whole register sets the NEXTQ_IDX field to 0 instead of 0x7ff which can confuse the Firmware in some reset sequences. Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-20	i40e: fix disabling overflow promiscuous mode	Alan Brady	1	-3/+2
	There exists a bug in which the driver does not correctly exit overflow promiscuous mode. This can occur if "too many" mac filters are added, putting the driver into overflow promiscuous mode, and the filters are then removed. When the failed filters are removed, the driver reports exiting overflow promiscuous mode which is correct, however traffic continues to be received as if in promiscuous mode still. The bug occurs because the conditional for toggling promiscuous mode was set to only execute when promiscuous mode was enabled and not when it was disabled as well. This patch fixes the conditional to correctly execute when promiscuous mode is toggled and not just enabled. Without this patch, the driver is unable to correctly exit overflow promiscuous mode. Signed-off-by: Alan Brady <alan.brady@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-20	i40e: Add support for OEM firmware version	Filip Sadowski	2	-14/+81
	This patch adds support for OEM firmware version. If OEM specific adapter is detected ethtool reports OEM product version in firmware version string instead of etrack id. Signed-off-by: Filip Sadowski <filip.sadowski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-20	i40e: genericize the partition bandwidth control	Shannon Nelson	2	-28/+26
	Partition bandwidth control is not in just one form of MFP (multi-function partitioning), so make the code more generic and be sure to nudge the Tx scheduler for all MFP. Copyright updated to 2017. Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-20	i40e: Add message for unsupported MFP mode	Carolyn Wyborny	1	-0/+6
	This patch adds a check and message if the device is in MFP mode as changing RSS input set is not supported in MFP mode. Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-20	i40e: Support firmware CEE DCB UP to TC map re-definition	Greg Bowers	1	-4/+7
	Changes parsing of FW 4.33 AQ command Get CEE DCBX OPER CFG (0x0A07). Change is required because FW now creates the oper_prio_tc nibbles reversed from those in the CEE Priority Group sub-TLV. This change will only apply to FW 4.33 as future FW versions will use a different function to parse the CEE data. Signed-off-by: Greg Bowers <gregory.j.bowers@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-20	i40e: Fix potential out of bound array access	Sudheer Mogilappagari	1	-1/+3
	This is a fix for the static code analysis issue where dcbcfg->numapps could be greater than size of array (i.e dcbcfg->app[I40E_DCBX_MAX_APPS]). The fix makes sure that the array is not accessed past the size of of the array (i.e. I40E_DCBX_MAX_APPS). Copyright updated to 2017. Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-20	i40e: comment that udp_port must be in host byte order	Jacob Keller	1	-1/+5
	The firmware expects the port number passed when setting up the UDP tunnel configuration to be in Little Endian format. The i40e_aq_add_udp_tunnel command byte swaps the value from host order to Little Endian. Since commit fe0b0cd97b4f ("i40e: send correct port number to AdminQ when enabling UDP tunnels") we've correctly sent the value in host order. Let's also add a comment to the function explaining that it must be in host order, as the port numbers are commonly stored as Big Endian values. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-20	i40e: use dev_dbg instead of dev_info when warning about missing routine	Jacob Keller	1	-3/+3
	When searching for the vf_capability client routine, dev_info() was used, instead of the normal dev_dbg(). This causes the message to be displayed at standard log levels which can cause administrators to worry. Avoid this by using dev_dbg instead. Copyright updated to 2017. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-20	i40e/i40evf: update WOL and I40E_AQC_ADDR_VALID_MASK flags	Alice Michael	2	-4/+5
	Update a few flags related to FW interactions. Copyright updated to 2017. Signed-off-by: Alice Michael <alice.michael@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-20	i40evf: assign num_active_queues inside i40evf_alloc_queues	Jacob Keller	1	-7/+11
	The variable num_active_queues represents the number of active queues we have for the device. We assign this pretty early in i40evf_init_subtask. Several code locations are written with loops over the tx_rings and rx_rings structures, which don't get allocated until i40evf_alloc_queues, and which get freed by i40evf_free_queues. These call sites were written under the assumption that tx_rings and rx_rings would always be allocated at least when num_active_queues is non-zero. Lets fix this by moving the assignment into the function where we allocate queues. We'll use a temporary variable for storage so that we don't assign the value in the adapter structure until after the rings have been set up. Finally, when we free the queues, we'll clear the value to ensure that we do not loop over the rings memory that no longer exists. This resolves a possible NULL pointer dereference in i40evf_get_ethtool_stats which could occur if the VF fails to recover from a reset, and then a user requests statistics. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-20	i40e: add support for XDP_TX action	Björn Töpel	5	-87/+384
	This patch adds proper XDP_TX action support. For each Tx ring, an additional XDP Tx ring is allocated and setup. This version does the DMA mapping in the fast-path, which will penalize performance for IOMMU enabled systems. Further, debugfs support is not wired up for the XDP Tx rings. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-20	i40e: add XDP support for pass and drop actions	Björn Töpel	4	-31/+194
	This commit adds basic XDP support for i40e derived NICs. All XDP actions will end up in XDP_DROP. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-20	s390/qeth: use diag26c to get MAC address on L2	Julian Wiedmann	3	-2/+76
	When a s390 guest runs on a z/VM host that's part of a SSI cluster, it can be migrated to a different host. In this case, the MAC address it originally obtained on the old host may be re-assigned to a new guest. This would result in address conflicts between the two guests. When running as z/VM guest, use the diag26c MAC Service to obtain a hypervisor-managed MAC address. The MAC Service is SSI-aware, and won't re-assign the address after the guest is migrated to a new host. This patch adds support for the z/VM MAC Service on L2 devices. Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Acked-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20	s390/diag: add diag26c support	Julian Wiedmann	2	-0/+55
	Implement support for the hypervisor diagnose 0x26c ('Access Certain System Information'). It passes a request buffer and a subfunction code, and receives a response buffer and a return code. Also add the scaffolding for the 'MAC Services' subfunction. It may be used by network devices to obtain a hypervisor-managed MAC address. Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20	s390/qeth: fix packing buffer statistics	Julian Wiedmann	1	-7/+10
	There's two spots in qeth_send_packet() where we don't accurately account for transmitted packing buffers in qeth's performance statistics: 1) when flushing the current buffer due to insufficient size, and the next buffer is not EMPTY, we need to account for that flushed buffer. 2) when synchronizing with the TX completion code, we reset flush_count and thus forget to account for any previously flushed buffers. Reported-by: Nils Hoppmann <niho@de.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20	s390/qeth: add ipa return codes for bridgeport	Kittipon Meesompop	3	-17/+49
	add ipa return codes for Bridgeport (HiperSockets and OSA) according to system level design. Signed-off-by: Kittipon Meesompop <kmeesomp@linux.vnet.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20	sctp: handle errors when updating asoc	Xin Long	3	-14/+39
	It's a bad thing not to handle errors when updating asoc. The memory allocation failure in any of the functions called in sctp_assoc_update() would cause sctp to work unexpectedly. This patch is to fix it by aborting the asoc and reporting the error when any of these functions fails. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20	sctp: uncork the old asoc before changing to the new one	Xin Long	1	-0/+4
	local_cork is used to decide if it should uncork asoc outq after processing some cmds, and it is set when replying or sending msgs. local_cork should always have the same value with current asoc q->cork in some way. The thing is when changing to a new asoc by cmd SET_ASOC, local_cork may not be consistent with the current asoc any more. The cmd seqs can be: SCTP_CMD_UPDATE_ASSOC (asoc) SCTP_CMD_REPLY (asoc) SCTP_CMD_SET_ASOC (new_asoc) SCTP_CMD_DELETE_TCB (new_asoc) SCTP_CMD_SET_ASOC (asoc) SCTP_CMD_REPLY (asoc) The 1st REPLY makes OLD asoc q->cork and local_cork both are 1, and the cmd DELETE_TCB clears NEW asoc q->cork and local_cork. After asoc goes back to OLD asoc, q->cork is still 1 while local_cork is 0. The 2nd REPLY will not set local_cork because q->cork is already set and it can't be uncorked and sent out because of this. To keep local_cork consistent with the current asoc q->cork, this patch is to uncork the old asoc if local_cork is set before changing to the new one. Note that the above cmd seqs will be used in the next patch when updating asoc and handling errors in it. Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20	dccp: call inet_add_protocol after register_pernet_subsys in dccp_v6_init	Xin Long	1	-10/+10
	Patch "call inet_add_protocol after register_pernet_subsys in dccp_v4_init" fixed a null pointer dereference issue for dccp_ipv4 module. The same fix is needed for dccp_ipv6 module. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20	dccp: call inet_add_protocol after register_pernet_subsys in dccp_v4_init	Xin Long	1	-8/+9
	Now dccp_ipv4 works as a kernel module. During loading this module, if one dccp packet is being recieved after inet_add_protocol but before register_pernet_subsys in which v4_ctl_sk is initialized, a null pointer dereference may be triggered because of init_net.dccp.v4_ctl_sk is 0x0. Jianlin found this issue when the following call trace occurred: [ 171.950177] BUG: unable to handle kernel NULL pointer dereference at 0000000000000110 [ 171.951007] IP: [<ffffffffc0558364>] dccp_v4_ctl_send_reset+0xc4/0x220 [dccp_ipv4] [...] [ 171.984629] Call Trace: [ 171.984859] <IRQ> [ 171.985061] [ 171.985213] [<ffffffffc0559a53>] dccp_v4_rcv+0x383/0x3f9 [dccp_ipv4] [ 171.985711] [<ffffffff815ca054>] ip_local_deliver_finish+0xb4/0x1f0 [ 171.986309] [<ffffffff815ca339>] ip_local_deliver+0x59/0xd0 [ 171.986852] [<ffffffff810cd7a4>] ? update_curr+0x104/0x190 [ 171.986956] [<ffffffff815c9cda>] ip_rcv_finish+0x8a/0x350 [ 171.986956] [<ffffffff815ca666>] ip_rcv+0x2b6/0x410 [ 171.986956] [<ffffffff810c83b4>] ? task_cputime+0x44/0x80 [ 171.986956] [<ffffffff81586f22>] __netif_receive_skb_core+0x572/0x7c0 [ 171.986956] [<ffffffff810d2c51>] ? trigger_load_balance+0x61/0x1e0 [ 171.986956] [<ffffffff81587188>] __netif_receive_skb+0x18/0x60 [ 171.986956] [<ffffffff8158841e>] process_backlog+0xae/0x180 [ 171.986956] [<ffffffff8158799d>] net_rx_action+0x16d/0x380 [ 171.986956] [<ffffffff81090b7f>] __do_softirq+0xef/0x280 [ 171.986956] [<ffffffff816b6a1c>] call_softirq+0x1c/0x30 This patch is to move inet_add_protocol after register_pernet_subsys in dccp_v4_init, so that v4_ctl_sk is initialized before any incoming dccp packets are processed. Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20	enic: Fix format truncation warning	Govindarajulu Varadarajan	2	-6/+6
	With -Wformat-truncation, gcc throws the following warning. Fix this by increasing the size of devname to accommodate 15 character netdev interface name and description. Remove length format precision for %s. We can fit entire name. Also increment the version. drivers/net/ethernet/cisco/enic/enic_main.c: In function ‘enic_open’: drivers/net/ethernet/cisco/enic/enic_main.c:1740:15: warning: ‘%u’ directive output may be truncated writing between 1 and 2 bytes into a region of size between 1 and 12 [-Wformat-truncation=] "%.11s-rx-%u", netdev->name, i); ^~ drivers/net/ethernet/cisco/enic/enic_main.c:1740:5: note: directive argument in the range [0, 16] "%.11s-rx-%u", netdev->name, i); ^~~~~~~~~~~~~ drivers/net/ethernet/cisco/enic/enic_main.c:1738:4: note: ‘snprintf’ output between 6 and 18 bytes into a destination of size 16 snprintf(enic->msix[intr].devname, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ sizeof(enic->msix[intr].devname), ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ "%.11s-rx-%u", netdev->name, i); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20	net: stmmac: enable TSO for IPv6	Niklas Cassel	1	-2/+2
	There is nothing in the IP that prevents us from enabling TSO for IPv6. Before patch: ftp fe80::2aa:bbff:fecc:1336%eth0 ftp> get /dev/zero 882512708 bytes received in 00:14 (56.11 MiB/s) After patch: ftp fe80::2aa:bbff:fecc:1336%eth0 ftp> get /dev/zero 1203326784 bytes received in 00:12 (94.52 MiB/s) Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20	ibmvnic: Return from ibmvnic_resume if not in VNIC_OPEN state	John Allen	1	-0/+3
	If the ibmvnic driver is not in the VNIC_OPEN state, return from ibmvnic_resume callback. If we are not in the VNIC_OPEN state, interrupts may not be initialized and directly calling the interrupt handler will cause a crash. Signed-off-by: John Allen <jallen@linux.vnet.ibm.com> Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20	net: phy: lxt: Export link partner advertising	Thomas Bogendoerfer	1	-7/+4
	Provide link partner advertising information. Removed testing for gigabit modes, which is useless for a fast ethernet phy. Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20	net-next: mediatek: set the rx_queue to 0	John Crispin	1	-0/+1
	The get_rps_cpu() function will not do any RPS on the data flow when no queue is setup and always use the current cpu where the IRQ was handled to also handle the backlog. As we only have one physical queue we always set this to 0 unconditionally. Signed-off-by: John Crispin <john@phrozen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20	net-next: mediatek: split IRQ register locking into TX and RX	John Crispin	2	-30/+54
	Originally the driver only utilised the new QDMA engine. The current code still assumes this is the case when locking the IRQ mask register. Since RX now runs on the old style PDMA engine we can add a second lock. This patch reduces the IRQ latency as the TX and RX path no longer need to wait on each other under heavy load. Signed-off-by: John Crispin <john@phrozen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20	net-next: mediatek: add RX IRQ delay support	John Crispin	2	-4/+13
	The PDMA engine used for RX allows IRQ aggregation. The patch sets up the corresponding registers to aggregate 4 IRQs into one. Using aggregation reduces the load on the core handling to a quarter thus reducing IRQ latency and increasing RX performance by around 10%. Signed-off-by: John Crispin <john@phrozen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20	net-next: mediatek: print phy status changes for non DSA GMACs	John Crispin	1	-0/+3
	Currently PHY status changes are only printed for DSA ports. This patch adds code to also print status changes for non-fixed links. Signed-off-by: John Crispin <john@phrozen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20	vxlan: allow multiple VXLANs with same VNI for IPv6 link-local addresses	Matthias Schiffer	1	-23/+45
	As link-local addresses are only valid for a single interface, we can allow to use the same VNI for multiple independent VXLANs, as long as the used interfaces are distinct. This way, VXLANs can always be used as a drop-in replacement for VLANs with greater ID space. This also extends VNI lookup to respect the ifindex when link-local IPv6 addresses are used, so using the same VNI on multiple interfaces can actually work. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20	vxlan: fix snooping for link-local IPv6 addresses	Matthias Schiffer	1	-5/+15
	If VXLAN is run over link-local IPv6 addresses, it is necessary to store the ifindex in the FDB entries. Otherwise, the used interface is undefined and unicast communication will most likely fail. Support for link-local IPv4 addresses should be possible as well, but as the semantics aren't as well defined as for IPv6, and there doesn't seem to be much interest in having the support, it's not implemented for now. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20	vxlan: check valid combinations of address scopes	Matthias Schiffer	2	-0/+31
	* Multicast addresses are never valid as local address * Link-local IPv6 unicast addresses may only be used as remote when the local address is link-local as well * Don't allow link-local IPv6 local/remote addresses without interface We also store in the flags field if link-local addresses are used for the follow-up patches that actually make VXLAN over link-local IPv6 work. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20	vxlan: improve validation of address family configuration	Matthias Schiffer	1	-11/+28
	Address families of source and destination addresses must match, and changelink operations can't change the address family. In addition, always use the VXLAN_F_IPV6 to check if a VXLAN device uses IPv4 or IPv6. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20	vxlan: get rid of redundant vxlan_dev.flags	Matthias Schiffer	3	-42/+39
	There is no good reason to keep the flags twice in vxlan_dev and vxlan_config. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20	vxlan: refactor verification and application of configuration	Matthias Schiffer	1	-97/+111
	The vxlan_dev_configure function was mixing validation and application of the vxlan configuration; this could easily lead to bugs with the changelink operation, as it was hard to see if the function wcould return an error after parts of the configuration had already been applied. This commit splits validation and application out of vxlan_dev_configure as separate functions to make it clearer where error returns are allowed and where the vxlan_dev or net_device may be configured. Log messages in these functions are removed, as it is generally unexpected to find error output for netlink requests in the kernel log. Userspace should be able to handle errors based on the error codes returned via netlink just fine. In addition, some validation and initialization is moved to vxlan_validate and vxlan_setup respectively to improve grouping of similar settings. Finally, this also fixes two actual bugs: * if set, conf->mtu would overwrite dev->mtu in each changelink operation, reverting other changes of dev->mtu * the "if (!conf->dst_port)" branch would never be run, as conf->dst_port was set in vxlan_setup before. This caused VXLAN-GPE to use the same default port as other VXLAN sockets instead of the intended IANA-assigned 4790. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20	net: manual clean code which call skb_put_[data:zero]	yuan linyu	39	-135/+93
	Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20	net: replace more place to skb_put_[data:zero]	yuan linyu	7	-16/+9
	spatch file, @@ expression skb, len, data; type t; @@ -memcpy((t )skb_put(skb, len), data, len); +skb_put_data(skb, data, len); @@ identifier p; expression skb, len, data; type t; @@ -p = (t )memset(skb_put(skb, len), data, len); +p = skb_put_zero(skb, len); @@ expression skb, len, data; type t; @@ -memcpy((t )__skb_put(skb, len), data, len); +__skb_put_data(skb, data, len); @@ identifier p; expression skb, len, data; type t; @@ -p = (t )memset(__skb_put(skb, len), data, len); +p = __skb_put_zero(skb, len); Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>