aboutsummaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2018-08-11net: sched: act_pedit: remove dependency on rtnl lockVlad Buslov1-20/+20
Rearrange pedit init code to only access pedit action data while holding tcf spinlock. Change keys allocation type to atomic to allow it to execute while holding tcf spinlock. Take tcf spinlock in dump function when accessing pedit action data. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11net: sched: act_ipt: remove dependency on rtnl lockVlad Buslov1-0/+3
Use tcf spinlock to protect ipt action private data from concurrent modification during dump. Ipt init already takes tcf spinlock when modifying ipt state. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11net: sched: act_ife: remove dependency on rtnl lockVlad Buslov1-15/+25
Use tcf spinlock and rcu to protect params pointer from concurrent modification during dump and init. Use rcu swap operation to reassign params pointer under protection of tcf lock. (old params value is not used by init, so there is no need of standalone rcu dereference step) Ife action has meta-actions that are compiled as standalone modules. Rtnl mutex must be released while loading a kernel module. In order to support execution without rtnl mutex, propagate 'rtnl_held' argument to meta action loading functions. When requesting meta action module, conditionally release rtnl lock depending on 'rtnl_held' argument. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11net: sched: act_gact: remove dependency on rtnl lockVlad Buslov1-2/+8
Use tcf spinlock to protect gact action private state from concurrent modification during dump and init. Remove rtnl assertion that is no longer necessary. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11net: sched: act_csum: remove dependency on rtnl lockVlad Buslov1-9/+15
Use tcf lock to protect csum action struct private data from concurrent modification in init and dump. Use rcu swap operation to reassign params pointer under protection of tcf lock. (old params value is not used by init, so there is no need of standalone rcu dereference step) Remove rtnl assertion that is no longer necessary. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11net: sched: act_bpf: remove dependency on rtnl lockVlad Buslov1-3/+7
Use tcf spinlock to protect bpf action private data from concurrent modification during dump and init. Remove rtnl lock assertion that is no longer necessary. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11Merge branch 'net-sctp-Avoid-allocating-high-order-memory-with-kmalloc'David S. Miller9-105/+172
Konstantin Khorenko says: ==================== net/sctp: Avoid allocating high order memory with kmalloc() Each SCTP association can have up to 65535 input and output streams. For each stream type an array of sctp_stream_in or sctp_stream_out structures is allocated using kmalloc_array() function. This function allocates physically contiguous memory regions, so this can lead to allocation of memory regions of very high order, i.e.: sizeof(struct sctp_stream_out) == 24, ((65535 * 24) / 4096) == 383 memory pages (4096 byte per page), which means 9th memory order. This can lead to a memory allocation failures on the systems under a memory stress. We actually do not need these arrays of memory to be physically contiguous. Possible simple solution would be to use kvmalloc() instread of kmalloc() as kvmalloc() can allocate physically scattered pages if contiguous pages are not available. But the problem is that the allocation can happed in a softirq context with GFP_ATOMIC flag set, and kvmalloc() cannot be used in this scenario. So the other possible solution is to use flexible arrays instead of contiguios arrays of memory so that the memory would be allocated on a per-page basis. This patchset replaces kvmalloc() with flex_array usage. It consists of two parts: * First patch is preparatory - it mechanically wraps all direct access to assoc->stream.out[] and assoc->stream.in[] arrays with SCTP_SO() and SCTP_SI() wrappers so that later a direct array access could be easily changed to an access to a flex_array (or any other possible alternative). * Second patch replaces kmalloc_array() with flex_array usage. v2 changes: sctp_stream_in() users are updated to provide stream as an argument, sctp_stream_{in,out}_ptr() are now just sctp_stream_{in,out}(). v3 changes: Move type chages struct sctp_stream_out -> flex_array to next patch. Make sctp_stream_{in,out}() static incline and move them to a header. Performance results (single stream): ==================================== * Kernel: v4.18-rc6 - stock and with 2 patches from Oleg (earlier in this thread) * Node: CPU (8 cores): Intel(R) Xeon(R) CPU E31230 @ 3.20GHz RAM: 32 Gb * netperf: taken from https://github.com/HewlettPackard/netperf.git, compiled from sources with sctp support * netperf server and client are run on the same node * ip link set lo mtu 1500 The script used to run tests: # cat run_tests.sh #!/bin/bash for test in SCTP_STREAM SCTP_STREAM_MANY SCTP_RR SCTP_RR_MANY; do echo "TEST: $test"; for i in `seq 1 3`; do echo "Iteration: $i"; set -x netperf -t $test -H localhost -p 22222 -S 200000,200000 -s 200000,200000 \ -l 60 -- -m 1452; set +x done done ================================================ Results (a bit reformatted to be more readable): Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec v4.18-rc7 v4.18-rc7 + fixes TEST: SCTP_STREAM 212992 212992 1452 60.21 1125.52 1247.04 212992 212992 1452 60.20 1376.38 1149.95 212992 212992 1452 60.20 1131.40 1163.85 TEST: SCTP_STREAM_MANY 212992 212992 1452 60.00 1111.00 1310.05 212992 212992 1452 60.00 1188.55 1130.50 212992 212992 1452 60.00 1108.06 1162.50 =========== Local /Remote Socket Size Request Resp. Elapsed Trans. Send Recv Size Size Time Rate bytes Bytes bytes bytes secs. per sec v4.18-rc7 v4.18-rc7 + fixes TEST: SCTP_RR 212992 212992 1 1 60.00 45486.98 46089.43 212992 212992 1 1 60.00 45584.18 45994.21 212992 212992 1 1 60.00 45703.86 45720.84 TEST: SCTP_RR_MANY 212992 212992 1 1 60.00 40.75 40.77 212992 212992 1 1 60.00 40.58 40.08 212992 212992 1 1 60.00 39.98 39.97 Performance results for many streams: ===================================== * Kernel: v4.18-rc8 - stock and with 2 patches v3 * Node: CPU (8 cores): Intel(R) Xeon(R) CPU E31230 @ 3.20GHz RAM: 32 Gb * sctp_test: https://github.com/sctp/lksctp-tools * both server and client are run on the same node * ip link set lo mtu 1500 * sysctl -w vm.max_map_count=65530000 (need it to make memory fragmented) The script used to run tests: ============================= # cat run_sctp_test.sh #!/bin/bash set -x uname -r ip link set lo mtu 1500 swapoff -a free cat /proc/buddyinfo ./src/apps/sctp_test -H 127.0.0.1 -P 22222 -l -d 0 & sleep 3 time ./src/apps/sctp_test -H 127.0.0.1 -P 22221 -h 127.0.0.1 -p 22222 \ -s -c 1 -M 65535 -T -t 1 -x 100000 -d 0 1>/dev/null killall -9 lt-sctp_test =============================== Results (a bit reformatted to be more readable): 1) ms stock kernel v4.18-rc8, no memory fragmentation test 1 test 2 test 3 real 0m14.715s 0m14.593s 0m15.954s user 0m0.954s 0m0.955s 0m0.854s sys 0m13.388s 0m12.537s 0m13.749s 2) kernel with fixes, no memory fragmentation test 1 test 2 test 3 real 0m14.959s 0m14.693s 0m14.762s user 0m0.948s 0m0.921s 0m0.929s sys 0m13.538s 0m13.225s 0m13.217s 3) kernel with fixes, memory fragmented 'free': total used free shared buff/cache available Mem: 32906008 30555200 302740 764 2048068 266452 Mem: 32906008 30379948 541436 764 1984624 442376 Mem: 32906008 30717312 262380 764 1926316 109908 /proc/buddyinfo: Node 0, zone Normal 40773 37 34 29 0 0 0 0 0 0 0 Node 0, zone Normal 100332 68 8 4 2 1 1 0 0 0 0 Node 0, zone Normal 31113 7 2 1 0 0 0 0 0 0 0 test 1 test 2 test 3 real 0m14.159s 0m15.252s 0m15.826s user 0m0.839s 0m1.004s 0m1.048s sys 0m11.827s 0m14.240s 0m14.778s ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11net/sctp: Replace in/out stream arrays with flex_arrayKonstantin Khorenko2-26/+71
This path replaces physically contiguous memory arrays allocated using kmalloc_array() with flexible arrays. This enables to avoid memory allocation failures on the systems under a memory stress. Signed-off-by: Oleg Babin <obabin@virtuozzo.com> Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11net/sctp: Make wrappers for accessing in/out streamsKonstantin Khorenko9-81/+103
This patch introduces wrappers for accessing in/out streams indirectly. This will enable to replace physically contiguous memory arrays of streams with flexible arrays (or maybe any other appropriate mechanism) which do memory allocation on a per-page basis. Signed-off-by: Oleg Babin <obabin@virtuozzo.com> Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11tc: Update README and add configKeara Leibovitz2-5/+59
Updated README. Added config file that contains the minimum required features enabled to run the tests currently present in the kernel. This must be updated when new unittests are created and require their own modules. Signed-off-by: Keara Leibovitz <kleib@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11Merge branch 'l2tp-rework-pppol2tp-ioctl-handling'David S. Miller7-199/+133
Guillaume Nault says: ==================== l2tp: rework pppol2tp ioctl handling The current ioctl() handling code can be simplified. It tests for non-relevant conditions and uselessly holds sockets. Once useless code is removed, it becomes even simpler to let pppol2tp_ioctl() handle commands directly, rather than dispatch them to pppol2tp_tunnel_ioctl() or pppol2tp_session_ioctl(). That is the approach taken by this series. Patch #1 and #2 define helper functions aimed at simplifying the rest of the patch set. Patch #3 drops useless tests in pppol2p_ioctl() and avoid holding a refcount on the socket. Patches #4, #5 and #6 are the core of the series. They let pppol2tp_ioctl() handle all ioctls and drop the tunnel and session specific functions. Then patch #6 brings a little bit of consolidation. Finally, patch #7 takes advantage of the simplified code to make pppol2tp sockets compatible with dev_ioctl(). Certainly not a killer feature, but it is trivial and it is always nice to see l2tp getting better integration with the rest of the stack. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11l2tp: let pppol2tp_ioctl() fallback to dev_ioctl()Guillaume Nault1-1/+1
Return -ENOIOCTLCMD for unknown ioctl commands. This lets dev_ioctl() handle generic socket ioctls like SIOCGIFNAME or SIOCGIFINDEX. PF_PPPOX/PX_PROTO_OL2TP was one of the few socket types not honouring this mechanism. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11l2tp: zero out stats in pppol2tp_copy_stats()Guillaume Nault1-4/+3
Integrate memset(0) in pppol2tp_copy_stats() to avoid calling it manually every time. While there, constify 'stats'. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11l2tp: remove pppol2tp_session_ioctl()Guillaume Nault2-48/+4
pppol2tp_ioctl() has everything in place for handling PPPIOCGL2TPSTATS on session sockets. We just need to copy the stats and set ->session_id. As a side effect of sharing session and tunnel code, ->using_ipsec is properly set even when the request was made using a session socket. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11l2tp: remove pppol2tp_tunnel_ioctl()Guillaume Nault1-79/+53
Handle PPPIOCGL2TPSTATS in pppol2tp_ioctl() if the socket represents a tunnel. This one is a bit special because the caller may use the tunnel socket to retrieve statistics of one of its sessions. If the session_id is set, the corresponding session's statistics are returned, instead of those of the tunnel. This is handled by the new pppol2tp_tunnel_copy_stats() helper function. Set ->tunnel_id and ->using_ipsec out of the conditional, so that it can be used by the 'else' branch in the following patch. We cannot do that for ->session_id, because tunnel sockets have to report the value that was originally passed in 'stats.session_id', while session sockets have to report their own session_id. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11l2tp: handle PPPIOC[GS]MRU and PPPIOC[GS]FLAGS in pppol2tp_ioctl()Guillaume Nault1-29/+44
Let pppol2tp_ioctl() handle ioctl commands directly. It still relies on pppol2tp_{session,tunnel}_ioctl() for PPPIOCGL2TPSTATS. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11l2tp: simplify pppol2tp_ioctl()Guillaume Nault1-27/+6
* Drop test on 'sk': sock->sk cannot be NULL, or pppox_ioctl() could not have called us. * Drop test on 'SOCK_DEAD' state: if this flag was set, the socket would be in the process of being released and no ioctl could be running anymore. * Drop test on 'PPPOX_*' state: we depend on ->sk_user_data to get the session structure. If it is non-NULL, then the socket is connected. Testing for PPPOX_* is redundant. * Retrieve session using ->sk_user_data directly, instead of going through pppol2tp_sock_to_session(). This avoids grabbing a useless reference on the socket. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11l2tp: split l2tp_session_get()Guillaume Nault6-36/+36
l2tp_session_get() is used for two different purposes. If 'tunnel' is NULL, the session is searched globally in the supplied network namespace. Otherwise it is searched exclusively in the tunnel context. Callers always know the context in which they need to search the session. But some of them do provide both a namespace and a tunnel, making the semantic of the call unclear. This patch defines l2tp_tunnel_get_session() for lookups done in a tunnel and restricts l2tp_session_get() to namespace searches. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11l2tp: define l2tp_tunnel_uses_xfrm()Guillaume Nault3-10/+21
Use helper function to figure out if a tunnel is using ipsec. Also, avoid accessing ->sk_policy directly since it's RCU protected. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11Merge branch 'netsec-driver-improvements'David S. Miller1-18/+12
Ilias Apalodimas says: ==================== netsec driver improvements This patchset introduces some improvements on socionext netsec driver. - patch 1/2, avoids unneeded MMIO reads on the Rx path - patch 2/2, is adjusting the numbers of descriptors used Changes since v1: - Move dma_rmb() to protect descriptor accesses until the device has updated the NETSEC_RX_PKT_OWN_FIELD bit ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11net: socionext: Increase descriptors to 256Ilias Apalodimas1-3/+2
Increasing descriptors to 256 from 128 and adjusting the NAPI weight to 64 increases performace on Rx by ~20% on 64byte packets Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11net: socionext: Use descriptor info instead of MMIO reads on RxIlias Apalodimas1-15/+10
MMIO reads for remaining packets in queue occur (at least)twice per invocation of netsec_process_rx(). We can use the packet descriptor to identify if it's owned by the hardware and break out, avoiding the more expensive MMIO read operations. This has a ~2% increase on the pps of the Rx path when tested with 64byte packets Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11vxge: remove set but not used variable 'req_out', 'status' and 'ret'YueHaibing1-21/+6
Fixes gcc '-Wunused-but-set-variable' warning: drivers/net/ethernet/neterion/vxge/vxge-config.c:1097:6: warning: variable 'ret' set but not used [-Wunused-but-set-variable] drivers/net/ethernet/neterion/vxge/vxge-config.c:2263:6: warning: variable 'req_out' set but not used [-Wunused-but-set-variable] drivers/net/ethernet/neterion/vxge/vxge-config.c:2262:22: warning: variable 'status' set but not used [-Wunused-but-set-variable] drivers/net/ethernet/neterion/vxge/vxge-config.c:2360:22: warning: variable 'status' set but not used [-Wunused-but-set-variable] enum vxge_hw_status status = VXGE_HW_OK; Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11Merge branch 'virtio_net-Expand-affinity-to-arbitrary-numbers-of-cpu-and-vq'David S. Miller5-27/+39
Caleb Raitto says: ==================== virtio_net: Expand affinity to arbitrary numbers of cpu and vq Virtio-net tries to pin each virtual queue rx and tx interrupt to a cpu if there are as many queues as cpus. Expand this heuristic to configure a reasonable affinity setting also when the number of cpus != the number of virtual queues. Patch 1 allows vqs to take an affinity mask with more than 1 cpu. Patch 2 generalizes the algorithm in virtnet_set_affinity beyond the case where #cpus == #vqs. v2 changes: Renamed "virtio_net: Make vp_set_vq_affinity() take a mask." to "virtio: Make vp_set_vq_affinity() take a mask." Tested: [InstanceSetup] set_multiqueue = false $ cd /proc/irq $ for i in `seq 24 60` ; do sudo grep ".*" $i/smp_affinity_list; done 0-15 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 11 12 12 13 13 14 14 15 15 0-15 0-15 0-15 0-15 $ cd /sys/class/net/eth0/queues/ $ for i in `seq 0 15` ; do sudo grep ".*" tx-$i/xps_cpus; done 0001 0002 0004 0008 0010 0020 0040 0080 0100 0200 0400 0800 1000 2000 4000 8000 $ sudo ethtool -L eth0 combined 15 $ cd /proc/irq $ for i in `seq 24 60` ; do sudo grep ".*" $i/smp_affinity_list; done 0-15 0-1 0-1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 11 12 12 13 13 14 14 15 15 15 15 0-15 0-15 0-15 0-15 $ cd /sys/class/net/eth0/queues/ $ for i in `seq 0 14` ; do sudo grep ".*" tx-$i/xps_cpus; done 0003 0004 0008 0010 0020 0040 0080 0100 0200 0400 0800 1000 2000 4000 8000 $ sudo ethtool -L eth0 combined 8 $ cd /proc/irq $ for i in `seq 24 60` ; do sudo grep ".*" $i/smp_affinity_list; done 0-15 0-1 0-1 2-3 2-3 4-5 4-5 6-7 6-7 8-9 8-9 10-11 10-11 12-13 12-13 14-15 14-15 9 9 10 10 11 11 12 12 13 13 14 14 15 15 15 15 0-15 0-15 0-15 0-15 $ cd /sys/class/net/eth0/queues/ $ for i in `seq 0 7` ; do sudo grep ".*" tx-$i/xps_cpus; done 0003 000c 0030 00c0 0300 0c00 3000 c000 $ sudo ethtool -L eth0 combined 16 $ sudo sh -c "echo 0 > /sys/devices/system/cpu/cpu15/online" $ cd /proc/irq $ for i in `seq 24 60` ; do sudo grep ".*" $i/smp_affinity_list; done 0-15 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 11 12 12 13 13 14 14 0 0 0-15 0-15 0-15 0-15 $ cd /sys/class/net/eth0/queues/ $ for i in `seq 0 15` ; do sudo grep ".*" tx-$i/xps_cpus; done 0001 0002 0004 0008 0010 0020 0040 0080 0100 0200 0400 0800 1000 2000 4000 0001 $ for i in `seq 8 15`; \ do sudo sh -c "echo 0 > /sys/devices/system/cpu/cpu$i/online"; done $ cd /proc/irq $ for i in `seq 24 60` ; do sudo grep ".*" $i/smp_affinity_list; done 0-15 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 0-15 0-15 0-15 0-15 $ cd /sys/class/net/eth0/queues/ $ for i in `seq 0 15` ; do sudo grep ".*" tx-$i/xps_cpus; done 0001 0002 0004 0008 0010 0020 0040 0080 0001 0002 0004 0008 0010 0020 0040 0080 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11virtio_net: Stripe queue affinities across cores.Caleb Raitto1-15/+27
Always set the affinity hint, even if #cpu != #vq. Handle the case where #cpu > #vq (including when #cpu % #vq != 0) and when #vq > #cpu (including when #vq % #cpu != 0). Signed-off-by: Caleb Raitto <caraitto@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Jon Olson <jonolson@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11virtio: Make vp_set_vq_affinity() take a mask.Caleb Raitto5-14/+14
Make vp_set_vq_affinity() take a cpumask instead of taking a single CPU. If there are fewer queues than cores, queue affinity should be able to map to multiple cores. Link: https://patchwork.ozlabs.org/patch/948149/ Suggested-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Caleb Raitto <caraitto@google.com> Acked-by: Gonglei <arei.gonglei@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11lan743x: lan743x: Add PTP supportBryan Whitehead6-5/+1443
PTP support includes: Ingress, and egress timestamping. One step timestamping available. PTP clock support. Periodic output support. Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11Merge branch 'tcp-new-mechanism-to-ACK-immediately'David S. Miller3-10/+13
Yuchung Cheng says: ==================== tcp: new mechanism to ACK immediately This patch is a follow-up feature improvement to the recent fixes on the performance issues in ECN (delayed) ACKs. Many of the fixes use tcp_enter_quickack_mode routine to force immediate ACKs. However the routine also reset tracking interactive session. This is not ideal because these immediate ACKs are required by protocol specifics unrelated to the interactiveness nature of the application. This patch set introduces a new flag to send a one-time immediate ACK without changing the status of interactive session tracking. With this patch set the immediate ACKs are generated upon these protocol states: 1) When a hole is repaired 2) When CE status changes between subsequent data packets received 3) When a data packet carries CWR flag ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11tcp: avoid resetting ACK timer upon receiving packet with ECN CWR flagYuchung Cheng1-4/+4
Previously commit 9aee40006190 ("tcp: ack immediately when a cwr packet arrives") calls tcp_enter_quickack_mode to force sending two immediate ACKs upon receiving a packet w/ CWR flag. The side effect is it'll also reset the delayed ACK timer and interactive session tracking. This patch removes that side effect by using the new ACK_NOW flag to force an immmediate ACK. Packetdrill to demonstrate: 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +0 < [ect0] SEW 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> +0 > SE. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8> +.1 < [ect0] . 1:1(0) ack 1 win 257 +0 accept(3, ..., ...) = 4 +0 < [ect0] . 1:1001(1000) ack 1 win 257 +0 > [ect01] . 1:1(0) ack 1001 +0 write(4, ..., 1) = 1 +0 > [ect01] P. 1:2(1) ack 1001 +0 < [ect0] . 1001:2001(1000) ack 2 win 257 +0 write(4, ..., 1) = 1 +0 > [ect01] P. 2:3(1) ack 2001 +0 < [ect0] . 2001:3001(1000) ack 3 win 257 +0 < [ect0] . 3001:4001(1000) ack 3 win 257 // Ack delayed ... +.01 < [ce] P. 4001:4501(500) ack 3 win 257 +0 > [ect01] . 3:3(0) ack 4001 +0 > [ect01] E. 3:3(0) ack 4501 +.001 read(4, ..., 4500) = 4500 +0 write(4, ..., 1) = 1 +0 > [ect01] PE. 3:4(1) ack 4501 win 100 +.01 < [ect0] W. 4501:5501(1000) ack 4 win 257 // No delayed ACK on CWR flag +0 > [ect01] . 4:4(0) ack 5501 +.31 < [ect0] . 5501:6501(1000) ack 4 win 257 +0 > [ect01] . 4:4(0) ack 6501 Fixes: 9aee40006190 ("tcp: ack immediately when a cwr packet arrives") Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11tcp: always ACK immediately on hole repairsYuchung Cheng1-2/+2
RFC 5681 sec 4.2: To provide feedback to senders recovering from losses, the receiver SHOULD send an immediate ACK when it receives a data segment that fills in all or part of a gap in the sequence space. When a gap is partially filled, __tcp_ack_snd_check already checks the out-of-order queue and correctly send an immediate ACK. However when a gap is fully filled, the previous implementation only resets pingpong mode which does not guarantee an immediate ACK because the quick ACK counter may be zero. This patch addresses this issue by marking the one-time immediate ACK flag instead. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11tcp: avoid resetting ACK timer in DCTCPYuchung Cheng1-2/+2
The recent fix of acking immediately in DCTCP on CE status change has an undesirable side-effect: it also resets TCP ack timer and disables pingpong mode (interactive session). But the CE status change has nothing to do with them. This patch addresses that by using the new one-time immediate ACK flag instead of calling tcp_enter_quickack_mode(). Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11tcp: mandate a one-time immediate ACKYuchung Cheng2-2/+5
Add a new flag to indicate a one-time immediate ACK. This flag is occasionaly set under specific TCP protocol states in addition to the more common quickack mechanism for interactive application. In several cases in the TCP code we want to force an immediate ACK but do not want to call tcp_enter_quickack_mode() because we do not want to forget the icsk_ack.pingpong or icsk_ack.ato state. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11wimax: usb-tx: mark expected switch fall-throughGustavo A. R. Silva1-1/+1
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Notice that in this particular case, I placed the "fall through" annotation at the bottom of the case, which is what GCC is expecting to find. Addresses-Coverity-ID: 115075 ("Missing break in switch") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11wimax: usb-fw: mark expected switch fall-throughGustavo A. R. Silva1-1/+1
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Notice that in this particular case, I placed the "fall through" annotation at the bottom of the case, which is what GCC is expecting to find. Addresses-Coverity-ID: 1369529 ("Missing break in switch") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11net: dp83640: Mark expected switch fall-throughsGustavo A. R. Silva1-1/+4
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Notice that in this particular case, I replaced the code comment at the top of the switch statement with a proper "fall through" annotation for each case, which is what GCC is expecting to find. Addresses-Coverity-ID: 1056542 ("Missing break in switch") Addresses-Coverity-ID: 1339579 ("Missing break in switch") Addresses-Coverity-ID: 1369526 ("Missing break in switch") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11rxrpc: remove redundant static int 'zero'Colin Ian King1-1/+0
The static int 'zero' is defined but is never used hence it is redundant and can be removed. The use of this variable was removed with commit a158bdd3247b ("rxrpc: Fix call timeouts"). Cleans up clang warning: warning: 'zero' defined but not used [-Wunused-const-variable=] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-11drivers/net/usb/r8152: remove the unneeded variable "ret" in rtl8152_system_suspendzhong jiang1-2/+1
rtl8152_system_suspend defines the variable "ret", but it is not modified after initialization. So just remove it. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-10bnxt_en: Fix strcpy() warnings in bnxt_ethtool.cVasundhara Volam1-3/+3
This patch fixes following smatch warnings: drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c:2826 bnxt_fill_coredump_seg_hdr() error: strcpy() '"sEgM"' too large for 'seg_hdr->signature' (5 vs 4) drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c:2858 bnxt_fill_coredump_record() error: strcpy() '"cOrE"' too large for 'record->signature' (5 vs 4) drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c:2879 bnxt_fill_coredump_record() error: strcpy() 'utsname()->sysname' too large for 'record->os_name' (65 vs 32) Fixes: 6c5657d085ae ("bnxt_en: Add support for ethtool get dump.") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-10liquidio: copperhead LED identificationRaghu Vatsavayi2-5/+23
Add LED identification support for liquidio TP copperhead cards. Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-10qed/qede: qede_setup_tc() can be statickbuild test robot1-1/+1
Fixes: 5e7baf0fcb2a ("qed/qede: Multi CoS support.") Signed-off-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-10mlxsw: core: remove unnecessary function mlxsw_core_driver_putYueHaibing1-12/+0
The function mlxsw_core_driver_put only traverse mlxsw_core_driver_list to find the matched mlxsw_driver,but never used it. So it can be removed safely. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-10net: mvneta: fix mvneta_config_rss on armada 3700Jisheng Zhang1-11/+20
The mvneta Ethernet driver is used on a few different Marvell SoCs. Some SoCs have per cpu interrupts for Ethernet events, the driver uses a per CPU napi structure for this case. Some SoCs such as armada 3700 have a single interrupt for Ethernet events, the driver uses a global napi structure for this case. Current mvneta_config_rss() always operates the per cpu napi structure. Fix it by operating a global napi for "single interrupt" case, and per cpu napi structure for remaining cases. Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com> Fixes: 2636ac3cc2b4 ("net: mvneta: Add network support for Armada 3700 SoC") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-10net/smc: send response to test link signalUrsula Braun1-0/+34
With SMC-D z/OS sends a test link signal every 10 seconds. Linux is supposed to answer, otherwise the SMC-D connection breaks. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-10Merge branch 'r8169-smaller-improvements'David S. Miller1-148/+106
Heiner Kallweit says: ==================== r8169: smaller improvements This series includes smaller improvements, no functional change intended. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-10r8169: don't configure max jumbo frame size per chip versionHeiner Kallweit1-118/+83
We don't have to configure the max jumbo frame size per chip (sub-)version. It can be easily determined based on the chip family. And new members of the RTL8168 family (if there are any) should be automatically covered. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-10r8169: don't configure csum function per chip versionHeiner Kallweit1-67/+67
We don't have to configure the csum function per chip (sub-)version. The distinction is simple, versions RTL8102e and from RTL8168c onwards support csum_v2. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-10r8169: simplify interrupt handlerHeiner Kallweit1-12/+7
Simplify the interrupt handler a little and make it better readable. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-10r8169: don't include asm headers directlyHeiner Kallweit1-3/+1
The asm headers shouldn't be included directly. asm/irq.h is implicitly included by linux/interrupt.h, and instead of asm/io.h include linux/io.h. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-10r8169: remove version infoHeiner Kallweit1-8/+0
The version number hasn't changed for ages and in general I doubt it provides any benefit. The message in rtl_init_one() may even be misleading because it's printed also if something fails in probe. Therefore let's remove the version information. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-10Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-nextDavid S. Miller10-9/+737
Johan Hedberg says: ==================== pull request: bluetooth-next 2018-08-10 Here's one more (most likely last) bluetooth-next pull request for the 4.19 kernel. - Added support for MediaTek serial Bluetooth devices - Initial skeleton for controller-side address resolution support - Fix BT_HCIUART_RTL related Kconfig dependencies - A few other minor fixes/cleanups Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>