aboutsummaryrefslogtreecommitdiffstats
path: root/net (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-10-15tcp: mitigate scheduling jitter in EDT pacing modelEric Dumazet1-6/+13
In commit fefa569a9d4b ("net_sched: sch_fq: account for schedule/timers drifts") we added a mitigation for scheduling jitter in fq packet scheduler. This patch does the same in TCP stack, now it is using EDT model. Note that this mitigation is valid for both external (fq packet scheduler) or internal TCP pacing. This uses the same strategy than the above commit, allowing a time credit of half the packet currently sent. Consider following case : An skb is sent, after an idle period of 300 usec. The air-time (skb->len/pacing_rate) is 500 usec Instead of setting the pacing timer to now+500 usec, it will use now+min(500/2, 300) -> now+250usec This is like having a token bucket with a depth of half an skb. Tested: tc qdisc replace dev eth0 root pfifo_fast Before netperf -P0 -H remote -- -q 1000000000 # 8000Mbit 540000 262144 262144 10.00 7710.43 After : netperf -P0 -H remote -- -q 1000000000 # 8000 Mbit 540000 262144 262144 10.00 7999.75 # Much closer to 8000Mbit target Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15net: extend sk_pacing_rate to unsigned longEric Dumazet6-30/+38
sk_pacing_rate has beed introduced as a u32 field in 2013, effectively limiting per flow pacing to 34Gbit. We believe it is time to allow TCP to pace high speed flows on 64bit hosts, as we now can reach 100Gbit on one TCP flow. This patch adds no cost for 32bit kernels. The tcpi_pacing_rate and tcpi_max_pacing_rate were already exported as 64bit, so iproute2/ss command require no changes. Unfortunately the SO_MAX_PACING_RATE socket option will stay 32bit and we will need to add a new option to let applications control high pacing rates. State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 1787144 10.246.9.76:49992 10.246.9.77:36741 timer:(on,003ms,0) ino:91863 sk:2 <-> skmem:(r0,rb540000,t66440,tb2363904,f605944,w1822984,o0,bl0,d0) ts sack bbr wscale:8,8 rto:201 rtt:0.057/0.006 mss:1448 rcvmss:536 advmss:1448 cwnd:138 ssthresh:178 bytes_acked:256699822585 segs_out:177279177 segs_in:3916318 data_segs_out:177279175 bbr:(bw:31276.8Mbps,mrtt:0,pacing_gain:1.25,cwnd_gain:2) send 28045.5Mbps lastrcv:73333 pacing_rate 38705.0Mbps delivery_rate 22997.6Mbps busy:73333ms unacked:135 retrans:0/157 rcv_space:14480 notsent:2085120 minrtt:0.013 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15tcp: do not change tcp_wstamp_ns in tcp_mstamp_refreshEric Dumazet2-4/+7
In EDT design, I made the mistake of using tcp_wstamp_ns to store the last tcp_clock_ns() sample and to store the pacing virtual timer. This causes major regressions at high speed flows. Introduce tcp_clock_cache to store last tcp_clock_ns(). This is needed because some arches have slow high-resolution kernel time service. tcp_wstamp_ns is only updated when a packet is sent. Note that we can remove tcp_mstamp in the future since tcp_mstamp is essentially tcp_clock_cache/1000, so the apparent socket size increase is temporary. Fixes: 9799ccb0e984 ("tcp: add tcp_wstamp_ns socket field") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15net: bridge: fix a possible memory leak in __vlan_addLi RongQing1-0/+4
After per-port vlan stats, vlan stats should be released when fail to add vlan Fixes: 9163a0fc1f0c0 ("net: bridge: add support for per-port vlan stats") CC: bridge@lists.linux-foundation.org cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> CC: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15rxrpc: Add /proc/net/rxrpc/peers to display peer listDavid Howells3-0/+130
Add /proc/net/rxrpc/peers to display the list of peers currently active. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15net/ncsi: Extend NC-SI Netlink interface to allow user space to send NC-SI commandJustin.Lee1@Dell.com6-5/+309
The new command (NCSI_CMD_SEND_CMD) is added to allow user space application to send NC-SI command to the network card. Also, add a new attribute (NCSI_ATTR_DATA) for transferring request and response. The work flow is as below. Request: User space application -> Netlink interface (msg) -> new Netlink handler - ncsi_send_cmd_nl() -> ncsi_xmit_cmd() Response: Response received - ncsi_rcv_rsp() -> internal response handler - ncsi_rsp_handler_xxx() -> ncsi_rsp_handler_netlink() -> ncsi_send_netlink_rsp () -> Netlink interface (msg) -> user space application Command timeout - ncsi_request_timeout() -> ncsi_send_netlink_timeout () -> Netlink interface (msg with zero data length) -> user space application Error: Error detected -> ncsi_send_netlink_err () -> Netlink interface (err msg) -> user space application Signed-off-by: Justin Lee <justin.lee1@dell.com> Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15tipc: support binding to specific ip address when activating UDP bearerHoang Le1-3/+15
INADDR_ANY is hard-coded when activating UDP bearer. So, we could not bind to a specific IP address even with replicast mode using - given remote ip address instead of using multicast ip address. In this commit, we fixed it by checking and switch to use appropriate local ip address. before: $netstat -plu Active Internet connections (only servers) Proto Recv-Q Send-Q Local Address Foreign Address udp 0 0 **0.0.0.0:6118** 0.0.0.0:* after: $netstat -plu Active Internet connections (only servers) Proto Recv-Q Send-Q Local Address Foreign Address udp 0 0 **10.0.0.2:6118** 0.0.0.0:* Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15FDDI: defza: Support capturing outgoing SMT trafficMaciej W. Rozycki1-1/+12
DEC FDDIcontroller 700 (DEFZA) uses a Tx/Rx queue pair to communicate SMT frames with adapter's firmware. Any SMT frame received from the RMC via the Rx queue is queued back by the driver to the SMT Rx queue for the firmware to process. Similarly the firmware uses the SMT Tx queue to supply the driver with SMT frames which are queued back to the Tx queue for the RMC to send to the ring. When a network tap is attached to an FDDI interface handled by `defza' any incoming SMT frames captured are queued to our usual processing of network data received, which in turn delivers them to any listening taps. However the outgoing SMT frames produced by the firmware bypass our network protocol stack and are therefore not delivered to taps. This in turn means that taps are missing a part of network traffic sent by the adapter, which may make it more difficult to track down network problems or do general traffic analysis. Call `dev_queue_xmit_nit' then in the SMT Tx path, having checked that a network tap is attached, with a newly-created `dev_nit_active' helper wrapping the usual condition used in the transmit path. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller22-210/+414
Conflicts were easy to resolve using immediate context mostly, except the cls_u32.c one where I simply too the entire HEAD chunk. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12Merge tag 'mac80211-next-for-davem-2018-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-nextDavid S. Miller26-1123/+1136
Johannes Berg says: ==================== Highlights: * merge net-next, so I can finish the hwsim workqueue removal * fix TXQ NULL pointer issue that was reported multiple times * minstrel cleanups from Felix * simplify lib80211 code by not using skcipher, note that this will conflict with the crypto tree (and this new code here should be used) * use new netlink policy validation in nl80211 * fix up SAE (part of WPA3) in client-mode * FTM responder support in the stack ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12net: bridge: add support for per-port vlan statsNikolay Aleksandrov5-4/+80
This patch adds an option to have per-port vlan stats instead of the default global stats. The option can be set only when there are no port vlans in the bridge since we need to allocate the stats if it is set when vlans are being added to ports (and respectively free them when being deleted). Also bump RTNL_MAX_TYPE as the bridge is the largest user of options. The current stats design allows us to add these without any changes to the fast-path, it all comes down to the per-vlan stats pointer which, if this option is enabled, will be allocated for each port vlan instead of using the global bridge-wide one. CC: bridge@lists.linux-foundation.org CC: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12net: Evict neighbor entries on carrier downDavid Ahern3-4/+27
When a link's carrier goes down it could be a sign of the port changing networks. If the new network has overlapping addresses with the old one, then the kernel will continue trying to use neighbor entries established based on the old network until the entries finally age out - meaning a potentially long delay with communications not working. This patch evicts neighbor entries on carrier down with the exception of those marked permanent. Permanent entries are managed by userspace (either an admin or a routing daemon such as FRR). Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12net/ipv6: Add knob to skip DELROUTE message on device downDavid Ahern2-6/+34
Another difference between IPv4 and IPv6 is the generation of RTM_DELROUTE notifications when a device is taken down (admin down) or deleted. IPv4 does not generate a message for routes evicted by the down or delete; IPv6 does. A NOS at scale really needs to avoid these messages and have IPv4 and IPv6 behave similarly, relying on userspace to handle link notifications and evict the routes. At this point existing user behavior needs to be preserved. Since notifications are a global action (not per app) the only way to preserve existing behavior and allow the messages to be skipped is to add a new sysctl (net/ipv6/route/skip_notify_on_dev_down) which can be set to disable the notificatioons. IPv6 route code already supports the option to skip the message (it is used for multipath routes for example). Besides the new sysctl we need to pass the skip_notify setting through the generic fib6_clean and fib6_walk functions to fib6_clean_node and to set skip_notify on calls to __ip_del_rt for the addrconf_ifdown path. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12mac80211: implement ieee80211_tx_rate_update to update rateAnilkumar Kolli1-0/+19
Current mac80211 has provision to update tx status through ieee80211_tx_status() and ieee80211_tx_status_ext(). But drivers like ath10k updates the tx status from the skb except txrate, txrate will be updated from a different path, peer stats. Using ieee80211_tx_status_ext() in two different paths (one for the stats, one for the tx rate) would duplicate the stats instead. To avoid this stats duplication, ieee80211_tx_rate_update() is implemented. Signed-off-by: Anilkumar Kolli <akolli@codeaurora.org> [minor commit message editing, use initializers in code] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-12nl80211: Add per peer statistics to compute FCS error rateAnkita Bajaj1-0/+2
Add support for drivers to report the total number of MPDUs received and the number of MPDUs received with an FCS error from a specific peer. These counters will be incremented only when the TA of the frame matches the MAC address of the peer irrespective of FCS error. It should be noted that the TA field in the frame might be corrupted when there is an FCS error and TA matching logic would fail in such cases. Hence, FCS error counter might not be fully accurate, but it can provide help in detecting bad RX links in significant number of cases. This FCS error counter without full accuracy can be used, e.g., to trigger a kick-out of a connected client with a bad link in AP mode to force such a client to roam to another AP. Signed-off-by: Ankita Bajaj <bankita@codeaurora.org> Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-12mac80211: support FTM responder configuration/statisticsPradeep Kumar Chitrapu4-0/+129
New bss param ftm_responder is used to notify the driver to enable fine timing request (FTM) responder role in AP mode. Plumb the new cfg80211 API for FTM responder statistics through to the driver API in mac80211. Signed-off-by: David Spinadel <david.spinadel@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Pradeep Kumar Chitrapu <pradeepc@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netGreg Kroah-Hartman21-212/+416
David writes: "Networking 1) RXRPC receive path fixes from David Howells. 2) Re-export __skb_recv_udp(), from Jiri Kosina. 3) Fix refcounting in u32 classificer, from Al Viro. 4) Userspace netlink ABI fixes from Eugene Syromiatnikov. 5) Don't double iounmap on rmmod in ena driver, from Arthur Kiyanovski. 6) Fix devlink string attribute handling, we must pull a copy into a kernel buffer if the lifetime extends past the netlink request. From Moshe Shemesh. 7) Fix hangs in RDS, from Ka-Cheong Poon. 8) Fix recursive locking lockdep warnings in tipc, from Ying Xue. 9) Clear RX irq correctly in socionext, from Ilias Apalodimas. 10) bcm_sf2 fixes from Florian Fainelli." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (38 commits) net: dsa: bcm_sf2: Call setup during switch resume net: dsa: bcm_sf2: Fix unbind ordering net: phy: sfp: remove sfp_mutex's definition r8169: set RX_MULTI_EN bit in RxConfig for 8168F-family chips net: socionext: clear rx irq correctly net/mlx4_core: Fix warnings during boot on driverinit param set failures tipc: eliminate possible recursive locking detected by LOCKDEP selftests: udpgso_bench.sh explicitly requires bash selftests: rtnetlink.sh explicitly requires bash. qmi_wwan: Added support for Gemalto's Cinterion ALASxx WWAN interface tipc: queue socket protocol error messages into socket receive buffer tipc: set link tolerance correctly in broadcast link net: ipv4: don't let PMTU updates increase route MTU net: ipv4: update fnhe_pmtu when first hop's MTU changes net/ipv6: stop leaking percpu memory in fib6 info rds: RDS (tcp) hangs on sendto() to unresponding address net: make skb_partial_csum_set() more robust against overflows devlink: Add helper function for safely copy string param devlink: Fix param cmode driverinit for string type devlink: Fix param set handling for string type ...
2018-10-11tipc: eliminate possible recursive locking detected by LOCKDEPYing Xue1-2/+9
When booting kernel with LOCKDEP option, below warning info was found: WARNING: possible recursive locking detected 4.19.0-rc7+ #14 Not tainted -------------------------------------------- swapper/0/1 is trying to acquire lock: 00000000dcfc0fc8 (&(&list->lock)->rlock#4){+...}, at: spin_lock_bh include/linux/spinlock.h:334 [inline] 00000000dcfc0fc8 (&(&list->lock)->rlock#4){+...}, at: tipc_link_reset+0x125/0xdf0 net/tipc/link.c:850 but task is already holding lock: 00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at: spin_lock_bh include/linux/spinlock.h:334 [inline] 00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at: tipc_link_reset+0xfa/0xdf0 net/tipc/link.c:849 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&list->lock)->rlock#4); lock(&(&list->lock)->rlock#4); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by swapper/0/1: #0: 00000000f7539d34 (pernet_ops_rwsem){+.+.}, at: register_pernet_subsys+0x19/0x40 net/core/net_namespace.c:1051 #1: 00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at: spin_lock_bh include/linux/spinlock.h:334 [inline] #1: 00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at: tipc_link_reset+0xfa/0xdf0 net/tipc/link.c:849 stack backtrace: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc7+ #14 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1af/0x295 lib/dump_stack.c:113 print_deadlock_bug kernel/locking/lockdep.c:1759 [inline] check_deadlock kernel/locking/lockdep.c:1803 [inline] validate_chain kernel/locking/lockdep.c:2399 [inline] __lock_acquire+0xf1e/0x3c60 kernel/locking/lockdep.c:3411 lock_acquire+0x1db/0x520 kernel/locking/lockdep.c:3900 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline] _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168 spin_lock_bh include/linux/spinlock.h:334 [inline] tipc_link_reset+0x125/0xdf0 net/tipc/link.c:850 tipc_link_bc_create+0xb5/0x1f0 net/tipc/link.c:526 tipc_bcast_init+0x59b/0xab0 net/tipc/bcast.c:521 tipc_init_net+0x472/0x610 net/tipc/core.c:82 ops_init+0xf7/0x520 net/core/net_namespace.c:129 __register_pernet_operations net/core/net_namespace.c:940 [inline] register_pernet_operations+0x453/0xac0 net/core/net_namespace.c:1011 register_pernet_subsys+0x28/0x40 net/core/net_namespace.c:1052 tipc_init+0x83/0x104 net/tipc/core.c:140 do_one_initcall+0x109/0x70a init/main.c:885 do_initcall_level init/main.c:953 [inline] do_initcalls init/main.c:961 [inline] do_basic_setup init/main.c:979 [inline] kernel_init_freeable+0x4bd/0x57f init/main.c:1144 kernel_init+0x13/0x180 init/main.c:1063 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:413 The reason why the noise above was complained by LOCKDEP is because we nested to hold l->wakeupq.lock and l->inputq->lock in tipc_link_reset function. In fact it's unnecessary to move skb buffer from l->wakeupq queue to l->inputq queue while holding the two locks at the same time. Instead, we can move skb buffers in l->wakeupq queue to a temporary list first and then move the buffers of the temporary list to l->inputq queue, which is also safe for us. Fixes: 3f32d0be6c16 ("tipc: lock wakeup & inputq at tipc_link_reset()") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-11Merge tag 'alloc-args-v4.19-rc8' of https://git.kernel.org/pub/scm/linux/kernel/git/kees/linuxGreg Kroah-Hartman1-1/+1
Kees writes: "Fix open-coded multiplication arguments to allocators - Fixes several new open-coded multiplications added in the 4.19 merge window." * tag 'alloc-args-v4.19-rc8' of https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: treewide: Replace more open-coded allocation size multiplications
2018-10-11mac80211: Extend SAE authentication in infra BSS STA modeJouni Malinen2-23/+48
Previous implementation of SAE authentication in infrastructure BSS was somewhat restricting and not exactly clean way of handling the two auth() operations. This ended up removing and re-adding the STA entry for the AP in the middle of authentication and also messing up authentication state tracking through the sequence of four Authentication frames. Furthermore, this did not work if the AP ended up sending out SAE Confirm (auth trans #2) immediately after SAE Commit (auth trans #1) before the station had time to transmit its SAE Confirm. Clean up authentication state handling for the SAE case to allow two rounds of auth() calls without dropping all state between those operations. Track peer Confirmed status and mark authentication completed only once both ends have confirmed. ieee80211_mgd_auth() check for EBUSY cases is now handling only the pending association (ifmgd->assoc_data) while all pending authentication (ifmgd->auth_data) cases are allowed to proceed to allow user space to start a new connection attempt from scratch even if the previously requested authentication is still waiting completion. This is needed to avoid making SAE error cases with retries take excessive amount of time with no means for the user space to stop that (apart from setting the netdev down). As an extra bonus, the end of ieee80211_rx_mgmt_auth() can be cleaned up to avoid the extra copy of the cfg80211_rx_mlme_mgmt() call for ongoing SAE authentication since the new ieee80211_mark_sta_auth() helper function can handle both completion of authentication and updates to the STA entry under the same condition and there is no need to return from the function between those operations. Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11mac80211: Move ieee80211_mgd_auth() EBUSY check to be before allocationJouni Malinen1-7/+4
This makes it easier to conditionally replace full allocation of auth_data to use reallocation for the case of continuing SAE authentication. Furthermore, there was not really any point in having this check done so late in the function after having already completed number of steps that cannot be used anyway in the error case. Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11mac80211: Helper function for marking STA authenticatedJouni Malinen1-12/+22
Authentication exchange can be completed in both TX and RX paths for SAE, so move this common functionality into a helper function to avoid having to implement practically the same operations in two places when extending SAE implementation in the following commits. Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11mac80211: rc80211_minstrel: remove variance / stddev calculationFelix Fietkau4-51/+9
When there are few packets (e.g. for sampling attempts), the exponentially weighted variance is usually vastly overestimated, making the resulting data essentially useless. As far as I know, there has not been any practical use for this, so let's not waste any cycles on it. Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11mac80211: minstrel: do not sample rates 3 times slower than max_prob_rateFelix Fietkau1-4/+6
These rates are highly unlikely to be used quickly, even if the link deteriorates rapidly. This improves throughput in cases where CCK rates are not reliable enough to be skipped entirely during sampling. Sampling these rates regularly can cost a lot of airtime. Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11mac80211: minstrel: fix sampling/reporting of CCK rates in HT modeFelix Fietkau1-4/+10
Long/short preamble selection cannot be sampled separately, since it depends on the BSS state. Because of that, sampling attempts to currently not used preamble modes are not counted in the statistics, which leads to CCK rates being sampled too often. Fix statistics accounting for long/short preamble by increasing the index where necessary. Fix excessive CCK rate sampling by dropping unsupported sample attempts. This improves throughput on 2.4 GHz channels Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11mac80211: minstrel: fix CCK rate group streams valueFelix Fietkau1-1/+1
Fixes a harmless underflow issue when CCK rates are actively being used Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11mac80211: minstrel: fix using short preamble CCK rates on HT clientsFelix Fietkau1-3/+1
mi->supported[MINSTREL_CCK_GROUP] needs to be updated short preamble rates need to be marked as supported regardless of whether it's currently enabled. Its state can change at any time without a rate_update call. Fixes: 782dda00ab8e ("mac80211: minstrel_ht: move short preamble check out of get_rate") Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11mac80211: minstrel: reduce minstrel_mcs_groups sizeFelix Fietkau3-78/+93
By storing a shift value for all duration values of a group, we can reduce precision by a neglegible amount to make it fit into a u16 value. This improves cache footprint and reduces size: Before: text data bss dec hex filename 10024 116 0 10140 279c rc80211_minstrel_ht.o After: text data bss dec hex filename 9368 116 0 9484 250c rc80211_minstrel_ht.o Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11mac80211: minstrel: merge with minstrel_ht, always enable VHT supportFelix Fietkau10-258/+101
Legacy-only devices are not very common and the overhead of the extra code for HT and VHT rates is not big enough to justify all those extra lines of code to make it optional. Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11mac80211: minstrel: remove unnecessary debugfs cleanup codeFelix Fietkau6-47/+9
debugfs entries are cleaned up by debugfs_remove_recursive already. Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11mac80211: minstrel: Enable STBC and LDPC for VHT RatesChaitanya T K1-8/+15
If peer support reception of STBC and LDPC, enable them for better performance. Signed-off-by: Chaitanya TK <chaitanya.mgit@gmail.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11mac80211: avoid reflecting frames back to the clientJohannes Berg1-6/+6
I'm not really sure exactly _why_ I've been carrying a note for what's probably _years_ to check that we don't do this, but we clearly do reflect frames back to the station itself if it sends such. One way or the other, it's useless since the station doesn't really need the AP to talk to itself, so suppress it. While at it, clarify some of the logic by removing skb->data references in favour of the destination address (pointer) we already have separately. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11nl80211: use netlink policy validation function for elementsJohannes Berg1-76/+46
Instead of open-coding a lot of calls to is_valid_ie_attr(), add this validation directly to the policy, now that we can. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11nl80211: use policy range validation where applicableJohannes Berg1-289/+180
Many range checks can be done in the policy, move them there. A few in mesh are added in the code (taken out of the macros) because they don't fit into the s16 range in the policy validation. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-10tipc: queue socket protocol error messages into socket receive bufferParthasarathy Bhuvaragan1-2/+12
In tipc_sk_filter_rcv(), when we detect protocol messages with error we call tipc_sk_conn_proto_rcv() and let it reset the connection and notify the socket by calling sk->sk_state_change(). However, tipc_sk_filter_rcv() may have been called from the function tipc_backlog_rcv(), in which case the socket lock is held and the socket already awake. This means that the sk_state_change() call is ignored and the error notification lost. Now the receive queue will remain empty and the socket sleeps forever. In this commit, we convert the protocol message into a connection abort message and enqueue it into the socket's receive queue. By this addition to the above state change we cover all conditions. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10tipc: set link tolerance correctly in broadcast linkJon Maloy1-5/+11
In the patch referred to below we added link tolerance as an additional criteria for declaring broadcast transmission "stale" and resetting the affected links. However, the 'tolerance' field of the broadcast link is never set, and remains at zero. This renders the whole commit without the intended improving effect, but luckily also with no negative effect. In this commit we add the missing initialization. Fixes: a4dc70d46cf1 ("tipc: extend link reset criteria for stale packet retransmission") Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10net: sched: avoid writing on noop_qdiscEric Dumazet1-2/+12
While noop_qdisc.gso_skb and noop_qdisc.skb_bad_txq are not used in other places, it seems not correct to overwrite their fields in dev_init_scheduler_queue(). noop_qdisc is essentially a shared and read-only object, even if it is not marked as const because of some implementation detail. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10net/mpls: Implement handler for strict data checking on dumpsDavid Ahern1-1/+35
Without CONFIG_INET enabled compiles fail with: net/mpls/af_mpls.o: In function `mpls_dump_routes': af_mpls.c:(.text+0xed0): undefined reference to `ip_valid_fib_dump_req' The preference is for MPLS to use the same handler as ipv4 and ipv6 to allow consistency when doing a dump for AF_UNSPEC which walks all address families invoking the route dump handler. If INET is disabled then fallback to an MPLS version which can be tighter on the data checks. Fixes: e8ba330ac0c5 ("rtnetlink: Update fib dumps for strict data checking") Reported-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10net: ipv4: don't let PMTU updates increase route MTUSabrina Dubroca1-3/+4
When an MTU update with PMTU smaller than net.ipv4.route.min_pmtu is received, we must clamp its value. However, we can receive a PMTU exception with PMTU < old_mtu < ip_rt_min_pmtu, which would lead to an increase in PMTU. To fix this, take the smallest of the old MTU and ip_rt_min_pmtu. Before this patch, in case of an update, the exception's MTU would always change. Now, an exception can have only its lock flag updated, but not the MTU, so we need to add a check on locking to the following "is this exception getting updated, or close to expiring?" test. Fixes: d52e5a7e7ca4 ("ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtu") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10net: ipv4: update fnhe_pmtu when first hop's MTU changesSabrina Dubroca3-6/+84
Since commit 5aad1de5ea2c ("ipv4: use separate genid for next hop exceptions"), exceptions get deprecated separately from cached routes. In particular, administrative changes don't clear PMTU anymore. As Stefano described in commit e9fa1495d738 ("ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes"), the PMTU discovered before the local MTU change can become stale: - if the local MTU is now lower than the PMTU, that PMTU is now incorrect - if the local MTU was the lowest value in the path, and is increased, we might discover a higher PMTU Similarly to what commit e9fa1495d738 did for IPv6, update PMTU in those cases. If the exception was locked, the discovered PMTU was smaller than the minimal accepted PMTU. In that case, if the new local MTU is smaller than the current PMTU, let PMTU discovery figure out if locking of the exception is still needed. To do this, we need to know the old link MTU in the NETDEV_CHANGEMTU notifier. By the time the notifier is called, dev->mtu has been changed. This patch adds the old MTU as additional information in the notifier structure, and a new call_netdevice_notifiers_u32() function. Fixes: 5aad1de5ea2c ("ipv4: use separate genid for next hop exceptions") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10net/ipv6: stop leaking percpu memory in fib6 infoMike Rapoport1-0/+2
The fib6_info_alloc() function allocates percpu memory to hold per CPU pointers to rt6_info, but this memory is never freed. Fix it. Fixes: a64efe142f5e ("net/ipv6: introduce fib6_info struct and helpers") Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10Merge tag 'rxrpc-fixes-20181008' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fsDavid S. Miller9-174/+234
David Howells says: ==================== rxrpc: Fix packet reception code Here are a set of patches that prepares for and fix problems in rxrpc's package reception code. There serious problems are: (A) There's a window between binding the socket and setting the data_ready hook in which packets can find their way into the UDP socket's receive queues. (B) The skb_recv_udp() will return an error (and clear the error state) if there was an error on the Tx side. rxrpc doesn't handle this. (C) The rxrpc data_ready handler doesn't fully drain the UDP receive queue. (D) The rxrpc data_ready handler assumes it is called in a non-reentrant state. The second patch fixes (A) - (C); the third patch renders (B) and (C) non-issues by using the recap_rcv hook instead of data_ready - and the final patch fixes (D). That last is the most complex. The preparatory patches are: (1) Fix some places that are doing things in the wrong net namespace. (2) Stop taking the rcu read lock as it's held by the IP input routine in the call chain. (3) Only end the Tx phase if *we* rotated the final packet out of the Tx buffer. (4) Don't assume that the call state won't change after dropping the call_state lock. (5) Only take receive window and MTU suze parameters from an ACK packet if it's the latest ACK packet. (6) Record connection-level abort information correctly. (7) Fix a trace line. And then there are three main patches - note that these are mixed in with the preparatory patches somewhat: (1) Fix the setup window (A), skb_recv_udp() error check (B) and packet drainage (C). (2) Switch to using the encap_rcv instead of data_ready to cut out the effects of the UDP read queues and get the packets delivered directly. (3) Add more locking into the various packet input paths to defend against re-entrance (D). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10tcp: refactor DCTCP ECN ACK handlingYuchung Cheng2-51/+44
DCTCP has two parts - a new ECN signalling mechanism and the response function to it. The first part can be used by other congestion control for DCTCP-ECN deployed networks. This patch moves that part into a separate tcp_dctcp.h to be used by other congestion control module (like how Yeah uses Vegas algorithmas). For example, BBR is experimenting such ECN signal currently https://tinyurl.com/ietf-102-iccrg-bbr2 Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Yousuk Seung <ysseung@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10net/ipv6: Make ipv6_route_table_template staticDavid Ahern1-1/+1
ipv6_route_table_template is exported but there are no users outside of route.c. Make it static. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10rtnetlink: Update comment in rtnl_stats_dump regarding strict data checkingDavid Ahern1-2/+2
The NLM_F_DUMP_PROPER_HDR netlink flag was replaced by a setsockopt. Update the comment in rtnl_stats_dump. Fixes: 841891ec0c65 ("rtnetlink: Update rtnl_stats_dump for strict data checking") Reported-by: Christian Brauner <christian@brauner.io> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10rtnetlink: Move ifm in valid_fdb_dump_legacy to closer to useDavid Ahern1-1/+3
Move setting of local variable ifm to after the message parsing in valid_fdb_dump_legacy. Avoid potential future use of unchecked variable. Fixes: 8dfbda19a21b ("rtnetlink: Move input checking for rtnl_fdb_dump to helper") Reported-by: Christian Brauner <christian@brauner.io> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10rds: RDS (tcp) hangs on sendto() to unresponding addressKa-Cheong Poon1-3/+10
In rds_send_mprds_hash(), if the calculated hash value is non-zero and the MPRDS connections are not yet up, it will wait. But it should not wait if the send is non-blocking. In this case, it should just use the base c_path for sending the message. Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10net: make skb_partial_csum_set() more robust against overflowsEric Dumazet1-5/+7
syzbot managed to crash in skb_checksum_help() [1] : BUG_ON(offset + sizeof(__sum16) > skb_headlen(skb)); Root cause is the following check in skb_partial_csum_set() if (unlikely(start > skb_headlen(skb)) || unlikely((int)start + off > skb_headlen(skb) - 2)) return false; If skb_headlen(skb) is 1, then (skb_headlen(skb) - 2) becomes 0xffffffff and the check fails to detect that ((int)start + off) is off the limit, since the compare is unsigned. When we fix that, then the first condition (start > skb_headlen(skb)) becomes obsolete. Then we should also check that (skb_headroom(skb) + start) wont overflow 16bit field. [1] kernel BUG at net/core/dev.c:2880! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 7330 Comm: syz-executor4 Not tainted 4.19.0-rc6+ #253 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:skb_checksum_help+0x9e3/0xbb0 net/core/dev.c:2880 Code: 85 00 ff ff ff 48 c1 e8 03 42 80 3c 28 00 0f 84 09 fb ff ff 48 8b bd 00 ff ff ff e8 97 a8 b9 fb e9 f8 fa ff ff e8 2d 09 76 fb <0f> 0b 48 8b bd 28 ff ff ff e8 1f a8 b9 fb e9 b1 f6 ff ff 48 89 cf RSP: 0018:ffff8801d83a6f60 EFLAGS: 00010293 RAX: ffff8801b9834380 RBX: ffff8801b9f8d8c0 RCX: ffffffff8608c6d7 RDX: 0000000000000000 RSI: ffffffff8608cc63 RDI: 0000000000000006 RBP: ffff8801d83a7068 R08: ffff8801b9834380 R09: 0000000000000000 R10: ffff8801d83a76d8 R11: 0000000000000000 R12: 0000000000000001 R13: 0000000000010001 R14: 000000000000ffff R15: 00000000000000a8 FS: 00007f1a66db5700(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7d77f091b0 CR3: 00000001ba252000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: skb_csum_hwoffload_help+0x8f/0xe0 net/core/dev.c:3269 validate_xmit_skb+0xa2a/0xf30 net/core/dev.c:3312 __dev_queue_xmit+0xc2f/0x3950 net/core/dev.c:3797 dev_queue_xmit+0x17/0x20 net/core/dev.c:3838 packet_snd net/packet/af_packet.c:2928 [inline] packet_sendmsg+0x422d/0x64c0 net/packet/af_packet.c:2953 Fixes: 5ff8dda3035d ("net: Ensure partial checksum offset is inside the skb head") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10devlink: Add helper function for safely copy string paramMoshe Shemesh1-1/+18
Devlink string param buffer is allocated at the size of DEVLINK_PARAM_MAX_STRING_VALUE. Add helper function which makes sure this size is not exceeded. Renamed DEVLINK_PARAM_MAX_STRING_VALUE to __DEVLINK_PARAM_MAX_STRING_VALUE to emphasize that it should be used by devlink only. The driver should use the helper function instead to verify it doesn't exceed the allowed length. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10devlink: Fix param cmode driverinit for string typeMoshe Shemesh1-3/+12
Driverinit configuration mode value is held by devlink to enable the driver fetch the value after reload command. In case the param type is string devlink should copy the value from driver string buffer to devlink string buffer on devlink_param_driverinit_value_set() and vice-versa on devlink_param_driverinit_value_get(). Fixes: ec01aeb1803e ("devlink: Add support for get/set driverinit value") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>