aboutsummaryrefslogtreecommitdiffstats
path: root/include (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2017-06-17ipv4: take dst->__refcnt when caching dst in fibWei Wang2-4/+20
In IPv4 routing code, fib_nh and fib_nh_exception can hold pointers to struct rtable but they never increment dst->__refcnt. This leads to the need of the dst garbage collector because when user is done with this dst and calls dst_release(), it can only decrement dst->__refcnt and can not free the dst even it sees dst->__refcnt drops from 1 to 0 (unless DST_NOCACHE flag is set) because the routing code might still hold reference to it. And when the routing code tries to delete a route, it has to put the dst to the gc_list if dst->__refcnt is not yet 0 and have a gc thread running periodically to check on dst->__refcnt and finally to free dst when refcnt becomes 0. This patch increments dst->__refcnt when fib_nh/fib_nh_exception holds reference to this dst and properly release the dst when fib_nh/fib_nh_exception has been updated with a new dst. This patch is a preparation in order to fully get rid of dst gc later. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17net: introduce a new function dst_dev_put()Wei Wang2-0/+25
This function should be called when removing routes from fib tree after the dst gc is no longer in use. We first mark DST_OBSOLETE_DEAD on this dst to make sure next dst_ops->check() fails and returns NULL. Secondly, as we no longer keep the gc_list, we need to properly release dst->dev right at the moment when the dst is removed from the fib/fib6 tree. It does the following: 1. change dst->input and output pointers to dst_discard/dst_dscard_out to discard all packets 2. replace dst->dev with loopback interface Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17net: introduce DST_NOGC in dst_release() to destroy dst based on refcntWei Wang2-3/+22
The current mechanism of freeing dst is a bit complicated. dst has its ref count and when user grabs the reference to the dst, the ref count is properly taken in most cases except in IPv4/IPv6/decnet/xfrm routing code due to some historic reasons. If the reference to dst is always taken properly, we should be able to simplify the logic in dst_release() to destroy dst when dst->__refcnt drops from 1 to 0. And this should be the only condition to determine if we can call dst_destroy(). And as dst is always ref counted, there is no need for a dst garbage list to hold the dst entries that already get removed by the routing code but are still held by other users. And the task to periodically check the list to free dst if ref count become 0 is also not needed anymore. This patch introduces a temporary flag DST_NOGC(no garbage collector). If it is set in the dst, dst_release() will call dst_destroy() when dst->__refcnt drops to 0. dst_hold_safe() will also check for this flag and do atomic_inc_not_zero() similar as DST_NOCACHE to avoid double free issue. This temporary flag is mainly used so that we can make the transition component by component without breaking other parts. This flag will be removed after all components are properly transitioned. This patch also introduces a new function dst_release_immediate() which destroys dst without waiting on the rcu when refcnt drops to 0. It will be used in later patches. Follow-up patches will correct all the places to properly take ref count on dst and mark DST_NOGC. dst_release() or dst_release_immediate() will be used to release the dst instead of dst_free() and its related functions. And final clean-up patch will remove the DST_NOGC flag. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17net: use loopback dev when generating blackhole routeWei Wang2-5/+6
Existing ipv4/6_blackhole_route() code generates a blackhole route with dst->dev pointing to the passed in dst->dev. It is not necessary to hold reference to the passed in dst->dev because the packets going through this route are dropped anyway. A loopback interface is good enough so that we don't need to worry about releasing this dst->dev when this dev is going down. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17udp: call dst_hold_safe() in udp_sk_rx_set_dst()Wei Wang2-16/+14
In udp_v4/6_early_demux() code, we try to hold dst->__refcnt for dst with DST_NOCACHE flag. This is because later in udp_sk_rx_dst_set() function, we will try to cache this dst in sk for connected case. However, a better way to achieve this is to not try to hold dst in early_demux(), but in udp_sk_rx_dst_set(), call dst_hold_safe(). This approach is also more consistant with how tcp is handling it. And it will make later changes simpler. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-17ipv6: remove unnecessary dst_hold() in ip6_fragment()Wei Wang1-4/+0
In ipv6 tx path, rcu_read_lock() is taken so that dst won't get freed during the execution of ip6_fragment(). Hence, no need to hold dst in it. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16net: dsa: add cross-chip multicast supportVivien Didelot1-10/+20
Similarly to how cross-chip VLAN works, define a bitmap of multicast group members for a switch, now including its DSA ports, so that multicast traffic can be sent to all switches of the fabric. A switch may drop the frames if no user port is a member. This brings support for multicast in a multi-chip environment. As of now, all switches of the fabric must support the multicast operations in order to program a single fabric port. Reported-by: Jason Cobham <jcobham@questertangent.com> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Tested-by: Jason Cobham <jcobham@questertangent.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16ibmvnic: driver initialization for kdump/kexecNathan Fontenot1-5/+19
When booting into the kdump/kexec kernel, pHyp and vios are not prepared for the initialization crq request and a failover transport event is generated. This is not handled correctly. At this point in initialization the driver is still in the 'probing' state and cannot handle a full reset of the driver as is normally done for a failover transport event. To correct this we catch driver resets while still in the 'probing' state and return EAGAIN. This results in the driver tearing down the main crq and calling ibmvnic_init() again. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16decnet: always not take dst->__refcnt when inserting dst into hash tableWei Wang1-10/+4
In the existing dn_route.c code, dn_route_output_slow() takes dst->__refcnt before calling dn_insert_route() while dn_route_input_slow() does not take dst->__refcnt before calling dn_insert_route(). This makes the whole routing code very buggy. In dn_dst_check_expire(), dnrt_free() is called when rt expires. This makes the routes inserted by dn_route_output_slow() not able to be freed as the refcnt is not released. In dn_dst_gc(), dnrt_drop() is called to release rt which could potentially cause the dst->__refcnt to be dropped to -1. In dn_run_flush(), dst_free() is called to release all the dst. Again, it makes the dst inserted by dn_route_output_slow() not able to be released and also, it does not wait on the rcu and could potentially cause crash in the path where other users still refer to this dst. This patch makes sure both input and output path do not take dst->__refcnt before calling dn_insert_route() and also makes sure dnrt_free()/dst_free() is called when removing dst from the hash table. The only difference between those 2 calls is that dnrt_free() waits on the rcu while dst_free() does not. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16rds: tcp: Set linger when rejecting an incoming conn in rds_tcp_accept_oneSowmini Varadhan1-1/+18
Each time we get an incoming SYN to the RDS_TCP_PORT, the TCP layer accepts the connection and then the rds_tcp_accept_one() callback is invoked to process the incoming connection. rds_tcp_accept_one() may reject the incoming syn for a number of reasons, e.g., commit 1a0e100fb2c9 ("RDS: TCP: Force every connection to be initiated by numerically smaller IP address"), or because we are getting spammed by a malicious node that is triggering a flood of connection attempts to RDS_TCP_PORT. If the incoming syn is rejected, no data would have been sent on the TCP socket, and we do not need to be in TIME_WAIT state, so we set linger on the TCP socket before closing, thereby closing the socket efficiently with a RST. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Tested-by: Imanti Mendez <imanti.mendez@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16rds: tcp: various endian-ness fixesSowmini Varadhan6-13/+21
Found when testing between sparc and x86 machines on different subnets, so the address comparison patterns hit the corner cases and brought out some bugs fixed by this patch. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Tested-by: Imanti Mendez <imanti.mendez@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16rds: tcp: remove cp_outgoingSowmini Varadhan4-23/+4
After commit 1a0e100fb2c9 ("RDS: TCP: Force every connection to be initiated by numerically smaller IP address") we no longer need the logic associated with cp_outgoing, so clean up usage of this field. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Tested-by: Imanti Mendez <imanti.mendez@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16net: dsa: loop: Implement ethtool statisticsFlorian Fainelli1-2/+77
When a DSA driver implements ethtool statistics, we also override the master network device's ethtool statistics with the CPU port's statistics and this has proven to be a possible source of bugs in the past. Enhance the dsa_loop.c driver to provide statistics under the forme of ok/error reads and writes from the per-port PHY read/writes. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16net: dsa: loop: Inline unregister_fixed_phys()Florian Fainelli1-10/+5
This is a simple function that only gets used in the driver's remove function, inline it there. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16pktgen: Specify the index of first threadTariq Toukan8-18/+27
Use "-f <num>", to specify the index of the first sender thread. In default first thread is #0. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16pktgen: Specify num packets per threadTariq Toukan9-8/+15
Use -n <num>, to specify the number of packets every thread sends. Zero means indefinitely. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16dt-bindings: orion-mdio: document the new xmdio compatibleAntoine Ténart1-4/+6
A new compatible for Marvell xMDIO interfaces was added into the Marvell MDIO driver. Document this new compatible. Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16net: mvmdio: simplify the smi read and write error pathsAntoine Ténart1-10/+6
Cosmetic patch simplifying the smi read and write error paths. It also align their error paths with the ones of the xsmi functions. Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16net: mvmdio: add xmdio xsmi supportAntoine Ténart1-7/+105
This patch adds the xmdio xsmi interface support in the mvmdio driver. This interface is used in Ethernet controllers on Marvell 370, 7k and 8k (as of now). The xsmi interface supported by this driver complies with the IEEE 802.3 clause 45. The xSMI interface is used by 10GbE devices. Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16net: mvmdio: check the MII_ADDR_C45 bit is not set for smi operationsAntoine Ténart1-0/+6
Add a check for the read and write smi operations, to ensure the MII_ADDR_C45 bit isn't set. This will be needed as soon as the xSMI support is added to the mvmdio driver. Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16net: mvmdio: put the poll intervals in the ops structureAntoine Ténart1-2/+6
Put the two poll intervals (min and max) in the driver's ops structure. This is needed to add the xmdio support later. Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16net: mvmdio: introduce an ops structureAntoine Ténart1-11/+19
Introduce an ops structure to add an indirection on the is_done function, as this is needed to add the xMDIO support later. Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16net: mvmdio: remove duplicate lockingRussell King1-10/+0
The MDIO layer already provides per-bus locking, so there's no need for MDIO bus drivers to do their own internal locking. Remove this. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16net: mvmdio: use GENMASK for masksAntoine Ténart1-1/+1
Cosmetic patch to use the GENMASK helper for masks. Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16net: mvmdio: use tabs for definesAntoine Ténart1-13/+13
Cosmetic patch replacing spaces by tabs for defined values. Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16net: mvmdio: reorder headers alphabeticallyAntoine Ténart1-5/+5
Cosmetic fix reordering headers alphabetically. Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16bpf: qede: Report bpf_prog ID during XDP_QUERY_PROGMartin KaFai Lau1-0/+1
Add support to qede to report bpf_prog ID during XDP_QUERY_PROG. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Mintz Yuval <Yuval.Mintz@cavium.com> Acked-by: Alexei Starovoitov <ast@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16bpf: nfp: Report bpf_prog ID during XDP_QUERY_PROGMartin KaFai Lau1-0/+1
Add support to nfp to report bpf_prog ID during XDP_QUERY_PROG. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Alexei Starovoitov <ast@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16bpf: ixgbe: Report bpf_prog ID during XDP_QUERY_PROGMartin KaFai Lau1-0/+2
Add support to ixgbe to report bpf_prog ID during XDP_QUERY_PROG. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Cc: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16bpf: thunderx: Report bpf_prog ID during XDP_QUERY_PROGMartin KaFai Lau1-0/+1
Add support to thunderx to report bpf_prog ID during XDP_QUERY_PROG. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Sunil Goutham <sgoutham@cavium.com> Acked-by: Alexei Starovoitov <ast@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16bpf: bnxt: Report bpf_prog ID during XDP_QUERY_PROGMartin KaFai Lau1-0/+1
Add support to bnxt to report bpf_prog ID during XDP_QUERY_PROG. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Michael Chan <michael.chan@broadcom.com> Acked-by: Alexei Starovoitov <ast@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16bpf: virtio_net: Report bpf_prog ID during XDP_QUERY_PROGMartin KaFai Lau1-5/+8
Add support to virtio_net to report bpf_prog ID during XDP_QUERY_PROG. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jason Wang <jasowang@redhat.com> Acked-by: Alexei Starovoitov <ast@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16bpf: mlx5e: Report bpf_prog ID during XDP_QUERY_PROGMartin KaFai Lau1-3/+12
Add support to mlx5e to report bpf_prog ID during XDP_QUERY_PROG. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Alexei Starovoitov <ast@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16bpf: mlx4: Report bpf_prog ID during XDP_QUERY_PROGMartin KaFai Lau1-3/+18
Add support to mlx4 to report bpf_prog ID during XDP_QUERY_PROG. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Alexei Starovoitov <ast@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16net: Add IFLA_XDP_PROG_IDMartin KaFai Lau4-16/+38
Expose prog_id through IFLA_XDP_PROG_ID. This patch makes modification to generic_xdp. The later patches will modify other xdp-supported drivers. prog_id is added to struct net_dev_xdp. iproute2 patch will be followed. Here is how the 'ip link' will look like: > ip link show eth0 3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp(prog_id:1) qdisc fq_codel state UP mode DEFAULT group default qlen 1000 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16networking: add and use skb_put_u8()Johannes Berg39-100/+106
Joe and Bjørn suggested that it'd be nicer to not have the cast in the fairly common case of doing *(u8 *)skb_put(skb, 1) = c; Add skb_put_u8() for this case, and use it across the code, using the following spatch: @@ expression SKB, C, S; typedef u8; identifier fn = {skb_put}; fresh identifier fn2 = fn ## "_u8"; @@ - *(u8 *)fn(SKB, S) = C; + fn2(SKB, C); Note that due to the "S", the spatch isn't perfect, it should have checked that S is 1, but there's also places that use a sizeof expression like sizeof(var) or sizeof(u8) etc. Turns out that nobody ever did something like *(u8 *)skb_put(skb, 2) = c; which would be wrong anyway since the second byte wouldn't be initialized. Suggested-by: Joe Perches <joe@perches.com> Suggested-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16networking: make skb_push & __skb_push return void pointersJohannes Berg126-234/+204
It seems like a historic accident that these return unsigned char *, and in many places that means casts are required, more often than not. Make these functions return void * and remove all the casts across the tree, adding a (u8 *) cast only where the unsigned char pointer was used directly, all done with the following spatch: @@ expression SKB, LEN; typedef u8; identifier fn = { skb_push, __skb_push, skb_push_rcsum }; @@ - *(fn(SKB, LEN)) + *(u8 *)fn(SKB, LEN) @@ expression E, SKB, LEN; identifier fn = { skb_push, __skb_push, skb_push_rcsum }; type T; @@ - E = ((T *)(fn(SKB, LEN))) + E = fn(SKB, LEN) @@ expression SKB, LEN; identifier fn = { skb_push, __skb_push, skb_push_rcsum }; @@ - fn(SKB, LEN)[0] + *(u8 *)fn(SKB, LEN) Note that the last part there converts from push(...)[0] to the more idiomatic *(u8 *)push(...). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16networking: make skb_pull & friends return void pointersJohannes Berg13-27/+29
It seems like a historic accident that these return unsigned char *, and in many places that means casts are required, more often than not. Make these functions return void * and remove all the casts across the tree, adding a (u8 *) cast only where the unsigned char pointer was used directly, all done with the following spatch: @@ expression SKB, LEN; typedef u8; identifier fn = { skb_pull, __skb_pull, skb_pull_inline, __pskb_pull_tail, __pskb_pull, pskb_pull }; @@ - *(fn(SKB, LEN)) + *(u8 *)fn(SKB, LEN) @@ expression E, SKB, LEN; identifier fn = { skb_pull, __skb_pull, skb_pull_inline, __pskb_pull_tail, __pskb_pull, pskb_pull }; type T; @@ - E = ((T *)(fn(SKB, LEN))) + E = fn(SKB, LEN) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16networking: make skb_put & friends return void pointersJohannes Berg145-547/+486
It seems like a historic accident that these return unsigned char *, and in many places that means casts are required, more often than not. Make these functions (skb_put, __skb_put and pskb_put) return void * and remove all the casts across the tree, adding a (u8 *) cast only where the unsigned char pointer was used directly, all done with the following spatch: @@ expression SKB, LEN; typedef u8; identifier fn = { skb_put, __skb_put }; @@ - *(fn(SKB, LEN)) + *(u8 *)fn(SKB, LEN) @@ expression E, SKB, LEN; identifier fn = { skb_put, __skb_put }; type T; @@ - E = ((T *)(fn(SKB, LEN))) + E = fn(SKB, LEN) which actually doesn't cover pskb_put since there are only three users overall. A handful of stragglers were converted manually, notably a macro in drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many instances in net/bluetooth/hci_sock.c. In the former file, I also had to fix one whitespace problem spatch introduced. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16networking: introduce and use skb_put_data()Johannes Berg252-741/+622
A common pattern with skb_put() is to just want to memcpy() some data into the new space, introduce skb_put_data() for this. An spatch similar to the one for skb_put_zero() converts many of the places using it: @@ identifier p, p2; expression len, skb, data; type t, t2; @@ ( -p = skb_put(skb, len); +p = skb_put_data(skb, data, len); | -p = (t)skb_put(skb, len); +p = skb_put_data(skb, data, len); ) ( p2 = (t2)p; -memcpy(p2, data, len); | -memcpy(p, data, len); ) @@ type t, t2; identifier p, p2; expression skb, data; @@ t *p; ... ( -p = skb_put(skb, sizeof(t)); +p = skb_put_data(skb, data, sizeof(t)); | -p = (t *)skb_put(skb, sizeof(t)); +p = skb_put_data(skb, data, sizeof(t)); ) ( p2 = (t2)p; -memcpy(p2, data, sizeof(*p)); | -memcpy(p, data, sizeof(*p)); ) @@ expression skb, len, data; @@ -memcpy(skb_put(skb, len), data, len); +skb_put_data(skb, data, len); (again, manually post-processed to retain some comments) Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16networking: convert many more places to skb_put_zero()Johannes Berg49-169/+89
There were many places that my previous spatch didn't find, as pointed out by yuan linyu in various patches. The following spatch found many more and also removes the now unnecessary casts: @@ identifier p, p2; expression len; expression skb; type t, t2; @@ ( -p = skb_put(skb, len); +p = skb_put_zero(skb, len); | -p = (t)skb_put(skb, len); +p = skb_put_zero(skb, len); ) ... when != p ( p2 = (t2)p; -memset(p2, 0, len); | -memset(p, 0, len); ) @@ type t, t2; identifier p, p2; expression skb; @@ t *p; ... ( -p = skb_put(skb, sizeof(t)); +p = skb_put_zero(skb, sizeof(t)); | -p = (t *)skb_put(skb, sizeof(t)); +p = skb_put_zero(skb, sizeof(t)); ) ... when != p ( p2 = (t2)p; -memset(p2, 0, sizeof(*p)); | -memset(p, 0, sizeof(*p)); ) @@ expression skb, len; @@ -memset(skb_put(skb, len), 0, len); +skb_put_zero(skb, len); Apply it to the tree (with one manual fixup to keep the comment in vxlan.c, which spatch removed.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16r8152: move calling delay_autosuspend functionhayeswang1-7/+5
Move calling delay_autosuspend() in rtl8152_runtime_suspend(). Calling delay_autosuspend() as late as possible. The original flows are 1. check if the driver/device is busy now. 2. set wake events. 3. enter runtime suspend. If the wake event occurs between (1) and (2), the device may miss it. Besides, to avoid the runtime resume occurs after runtime suspend immediately, move the checking to the end of rtl8152_runtime_suspend(). Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16r8152: split rtl8152_resume functionhayeswang1-38/+61
Split rtl8152_resume() into rtl8152_runtime_resume() and rtl8152_system_resume(). Besides, replace GFP_KERNEL with GFP_NOIO for usb_submit_urb(). Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16tls: Depend upon INET not plain NET.David S. Miller1-1/+1
We refer to TCP et al. symbols so have to use INET as the dependency. ERROR: "tcp_prot" [net/tls/tls.ko] undefined! >> ERROR: "tcp_rate_check_app_limited" [net/tls/tls.ko] undefined! ERROR: "tcp_register_ulp" [net/tls/tls.ko] undefined! ERROR: "tcp_unregister_ulp" [net/tls/tls.ko] undefined! ERROR: "do_tcp_sendpages" [net/tls/tls.ko] undefined! Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15net/mlx4_en: Refactor mlx4_en_free_tx_descTariq Toukan1-29/+16
Some code re-ordering, functionally equivalent. - The !tx_info->inl check is evaluated anyway in both flows (common case/end case). Run it first, this might finish the flows earlier. - dma_unmap calls are identical in both flows, get it out of the if block into the common area. Performance tests: Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz Gain is too small to be measurable, no degradation sensed. Results are similar for IPv4 and IPv6. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Cc: kernel-team@fb.com Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15net/mlx4_en: Replace TXBB_SIZE multiplications with shift operationsTariq Toukan2-13/+16
Define LOG_TXBB_SIZE, log of TXBB_SIZE, and use it with a shift operation instead of a multiplication with TXBB_SIZE. Operations are equivalent as TXBB_SIZE is a power of two. Performance tests: Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz Gain is too small to be measurable, no degradation sensed. Results are similar for IPv4 and IPv6. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Cc: kernel-team@fb.com Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15net/mlx4_en: Increase default TX ring sizeTariq Toukan1-1/+1
Increase the default TX ring size (from 512 to 1024) to match the RX ring size. This gives the XDP TX ring a better chance to keep up with the rate of its RX ring in case of a high load of XDP_TX actions. Tested: Ethtool counter rx_xdp_tx_full used to increase, after applying this patch it stopped. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Cc: kernel-team@fb.com Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15net/mlx4_en: Poll XDP TX completion queue in RX NAPITariq Toukan5-16/+51
Instead of having their own NAPIs, XDP TX completion queues get polled within the corresponding RX NAPI. This prevents any possible race on TX ring prod/cons indices, between the context that issues the transmits (RX NAPI) and the context that handles the completions (was previously done in a separate NAPI). This also improves performance, as it decreases the number of NAPIs running on a CPU, saving the overhead of syncing and switching between the contexts. Performance tests: Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz Single queue no-RSS optimization ON. XDP_TX packet rate: ------------------------------------- | Before | After | Gain | IPv4 | 12.0 Mpps | 13.8 Mpps | 15% | IPv6 | 12.0 Mpps | 13.8 Mpps | 15% | ------------------------------------- Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Cc: kernel-team@fb.com Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15net/mlx4_en: Improve XDP xmit functionTariq Toukan3-43/+21
Several performance improvements in XDP TX datapath, including: - Ring a single doorbell for XDP TX ring per NAPI budget, instead of doing it per a lower threshold (was 8). This includes removing the flow of immediate doorbell ringing in case of a full TX ring. - Compiler branch predictor hints. - Calculate values in compile time rather than in runtime. Performance tests: Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz Single queue no-RSS optimization ON. XDP_TX packet rate: ------------------------------------- | Before | After | Gain | IPv4 | 10.3 Mpps | 12.0 Mpps | 17% | IPv6 | 10.3 Mpps | 12.0 Mpps | 17% | ------------------------------------- Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Cc: kernel-team@fb.com Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15net/mlx4_en: Improve stack xmit functionTariq Toukan1-64/+87
Several small code and performance improvements in stack TX datapath, including: - Compiler branch predictor hints. - Minimize variables scope. - Move tx_info non-inline flow handling to a separate function. - Calculate data_offset in compile time rather than in runtime (for !lso_header_size branch). - Avoid trinary-operator ("?") when value can be preset in a matching branch. Performance tests: Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz Gain is too small to be measurable, no degradation sensed. Results are similar for IPv4 and IPv6. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Cc: kernel-team@fb.com Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>