Age | Commit message | Author | Files | Lines
2017-01-27 | net/ipv6: support more tunnel interfaces for EUI64 link-local generation | Felix Jia | 3 | -0/+12
Signed-off-by: Felix Jia <felix.jia@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-27 | net/ipv6: allow sysctl to change link-local address generation mode | Felix Jia | 4 | -21/+86
The address generation mode for IPv6 link-local addresses can only be configured by netlink messages. This patch adds the ability to change the address generation mode via sysctl.
v1 -> v2: Removed the rtnl lock and switched to the RCU lock to iterate through the netdev list.
v2 -> v3: Removed the addrgenmode variable from the idev structure and use the sysctl storage for the flag. Simplified the logic for sysctl handling by removing the support for the all operation. Added support for more types of tunnel interfaces for link-local address generation. Based the patches on net-next.
v3 -> v4: Removed unnecessary whitespace changes.
Signed-off-by: Felix Jia <felix.jia@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
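The sysctl access described above can be sketched from user space as follows. This is a minimal illustration, not part of the patch: the file name addr_gen_mode under /proc/sys/net/ipv6/conf/<iface>/ and the mode meanings in ADDR_GEN_MODES are assumptions based on the kernel's ip-sysctl documentation, and the helper names are hypothetical.

```python
from pathlib import Path

# Assumed meaning of the sysctl values (check ip-sysctl.txt for your kernel).
ADDR_GEN_MODES = {
    0: "eui64",
    1: "none",
    2: "stable-privacy",
    3: "random",
}


def addr_gen_mode_path(ifname, root="/proc/sys"):
    """Build the assumed per-interface sysctl path (hypothetical helper)."""
    return Path(root) / "net" / "ipv6" / "conf" / ifname / "addr_gen_mode"


def read_addr_gen_mode(ifname, root="/proc/sys"):
    """Return the integer mode, or None if the sysctl is missing or unreadable."""
    try:
        return int(addr_gen_mode_path(ifname, root).read_text().strip())
    except (OSError, ValueError):
        return None


def write_addr_gen_mode(ifname, mode, root="/proc/sys"):
    """Write a new mode; on a real system this normally needs root privileges."""
    if mode not in ADDR_GEN_MODES:
        raise ValueError(f"unknown addr_gen_mode: {mode}")
    addr_gen_mode_path(ifname, root).write_text(f"{mode}\n")
```

The root parameter exists only so the helpers can be exercised against a scratch directory instead of the live /proc tree.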
2017-01-27 | net: Fix ndo_setup_tc comment | Florian Fainelli | 1 | -5/+6
Commit 16e5cc647173 ("net: rework setup_tc ndo op to consume general tc operand") changed the ndo_setup_tc() signature, but did not update the comments in netdevice.h, so do that now. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 | net: ipv6: ignore null_entry on route dumps | David Ahern | 1 | -1/+5
lkp-robot reported a BUG:
[ 10.151226] BUG: unable to handle kernel NULL pointer dereference at 00000198
[ 10.152525] IP: rt6_fill_node+0x164/0x4b8
[ 10.153307] *pdpt = 0000000012ee5001 *pde = 0000000000000000
[ 10.153309]
[ 10.154492] Oops: 0000 [#1]
[ 10.154987] CPU: 0 PID: 909 Comm: netifd Not tainted 4.10.0-rc4-00722-g41e8c70ee162-dirty #10
[ 10.156482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 10.158254] task: d0deb000 task.stack: d0e0c000
[ 10.159059] EIP: rt6_fill_node+0x164/0x4b8
[ 10.159780] EFLAGS: 00010296 CPU: 0
[ 10.160404] EAX: 00000000 EBX: d10c2358 ECX: c1f7c6cc EDX: c1f6ff44
[ 10.161469] ESI: 00000000 EDI: c2059900 EBP: d0e0dc4c ESP: d0e0dbe4
[ 10.162534] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[ 10.163482] CR0: 80050033 CR2: 00000198 CR3: 10d94660 CR4: 000006b0
[ 10.164535] Call Trace:
[ 10.164993] ? paravirt_sched_clock+0x9/0xd
[ 10.165727] ? sched_clock+0x9/0xc
[ 10.166329] ? sched_clock_cpu+0x19/0xe9
[ 10.166991] ? lock_release+0x13e/0x36c
[ 10.167652] rt6_dump_route+0x4c/0x56
[ 10.168276] fib6_dump_node+0x1d/0x3d
[ 10.168913] fib6_walk_continue+0xab/0x167
[ 10.169611] fib6_walk+0x2a/0x40
[ 10.170182] inet6_dump_fib+0xfb/0x1e0
[ 10.170855] netlink_dump+0xcd/0x21f
This happens when the loopback device is set down and an ipv6 fib route dump is requested. ip6_null_entry is the root of all ipv6 fib tables making it integrated into the table and hence passed to the ipv6 route dump code. The null_entry route uses the loopback device for dst.dev but may not have rt6i_idev set because of the order in which initializations are done -- ip6_route_net_init is run before addrconf_init has initialized the loopback device. Fixing the initialization order is a much bigger problem with no obvious solution thus far.
The BUG is triggered when the loopback is set down and the netif_running check added by a1a22c1206 fails. The fill_node descends to checking rt->rt6i_idev for ignore_routes_with_linkdown and since rt6i_idev is NULL it faults.
The null_entry route should not be processed in a dump request. Catch and ignore. This check is done in rt6_dump_route as it is the highest place in the callchain with knowledge of both the route and the network namespace.
Fixes: a1a22c1206 ("net: ipv6: Keep nexthop of multipath route on admin down") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 | net: ipv6: remove skb_reserve in getroute | David Ahern | 1 | -6/+0
Remove skb_reserve and skb_reset_mac_header from inet6_rtm_getroute. The allocated skb is not passed through the routing engine (like it is for IPv4) and has not since the beginning of git time. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 | Merge branch 'dsa2-pdata-prepatory-patches' | David S. Miller | 7 | -55/+64
Florian Fainelli says: ==================== net: dsa: Preparatory patches This patch series extracts the 4 patches of the larger: net: dsa: Support for pdata in dsa2 while we wait for feedback from Greg KH on the device references. Changes in v2: - rebased properly after the multi-MDIO bus support added to mv88e6xxx ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 | net: dsa: Move ports assignment closer to error checking | Florian Fainelli | 1 | -1/+2
Move the assignment of ports in _dsa_register_switch() closer to where it is checked, no functional change. Re-order declarations to preserve the inverted christmas tree style. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 | net: dsa: Suffix function manipulating device_node with _dn | Florian Fainelli | 1 | -8/+8
Make it clear that these functions take a device_node structure pointer Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 | net: dsa: Make most functions take a dsa_port argument | Florian Fainelli | 3 | -36/+44
In preparation for allowing platform data, and therefore no valid device_node pointer, make most DSA functions take a pointer to a dsa_port structure whenever possible. While at it, introduce a dsa_port_is_valid() helper function which checks whether port->dn is NULL or not at the moment. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 | net: dsa: Pass device pointer to dsa_register_switch | Florian Fainelli | 5 | -10/+10
In preparation for allowing dsa_register_switch() to be supplied with device/platform data, pass down a struct device pointer instead of a struct device_node. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 | liquidio: Avoid accessing skb after submitting to input queue | Satanand Burla | 2 | -6/+6
Accessing skb after submitting to input queue can cause access to stale pointers if the skb ends up being transmitted and freed by that time. Signed-off-by: Satanand Burla <satananda.burla@cavium.com> Signed-off-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 | Merge tag 'batadv-next-for-davem-20170126' of git://git.open-mesh.org/linux-merge | David S. Miller | 61 | -70/+87
Simon Wunderlich says: ==================== This feature/cleanup patchset includes the following patches:
 - bump version strings, by Simon Wunderlich
 - ignore self-generated loop detect MAC addresses in translation table, by Simon Wunderlich
 - install uapi batman_adv.h header, by Sven Eckelmann
 - bump copyright years, by Sven Eckelmann
 - Remove an unused variable in translation table code, by Sven Eckelmann
 - Handle NET_XMIT_CN like NET_XMIT_SUCCESS (revised according to David's suggestion), and a follow up code clean up, by Gao Feng (2 patches)
==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 | ISDN: eicon: reduce stack size of sig_ind function | Arnd Bergmann | 1 | -8/+8
I noticed that this function uses a lot of kernel stack when the "latent entropy" plugin is enabled: drivers/isdn/hardware/eicon/message.c: In function 'sig_ind': drivers/isdn/hardware/eicon/message.c:6113:1: error: the frame size of 1168 bytes is larger than 1152 bytes [-Werror=frame-larger-than=] We currently don't warn about this, as we raise the warning limit to 2048 bytes in mainline, but I'd like to lower that limit again in the future, and this function can easily be changed to be more efficient and avoid that warning, by making some of its local variables 'const'. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 | batman-adv: Treat NET_XMIT_CN as transmit successfully | Gao Feng | 1 | -1/+1
The tc could return NET_XMIT_CN as one congestion notification, but it does not mean the packet is lost. Other modules like ipvlan, macvlan, and others treat NET_XMIT_CN as success too. So batman-adv should handle NET_XMIT_CN also as NET_XMIT_SUCCESS. Signed-off-by: Gao Feng <gfree.wind@gmail.com> [sven@narfation.org: Moved NET_XMIT_CN handling to batadv_send_skb_packet] Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
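The return-code handling described in this message can be modelled in a few lines. This is an illustrative sketch only, not the batman-adv code: the constant values mirror the NET_XMIT_* definitions in the kernel's netdevice.h, while xmit_succeeded() and normalize_xmit() are hypothetical helper names.

```python
# Values as defined in include/linux/netdevice.h.
NET_XMIT_SUCCESS = 0x00
NET_XMIT_DROP = 0x01  # skb dropped
NET_XMIT_CN = 0x02    # congestion notification, packet was still queued


def xmit_succeeded(ret):
    """Congestion notification is a warning, not a loss, so count it as success."""
    return ret in (NET_XMIT_SUCCESS, NET_XMIT_CN)


def normalize_xmit(ret):
    """Map NET_XMIT_CN onto NET_XMIT_SUCCESS and pass other codes through."""
    return NET_XMIT_SUCCESS if ret == NET_XMIT_CN else ret
```

A caller that only distinguishes "sent" from "dropped" can then test the normalized value against NET_XMIT_SUCCESS without special-casing congestion.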
2017-01-26 | batman-adv: Remove one condition check in batadv_route_unicast_packet | Gao Feng | 1 | -5/+4
Collecting some statements in the first condition block removes one condition check. Signed-off-by: Gao Feng <gfree.wind@gmail.com> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-01-26 | batman-adv: Remove unused variable in batadv_tt_local_set_flags | Sven Eckelmann | 1 | -2/+0
Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-01-26 | batman-adv: update copyright years for 2017 | Sven Eckelmann | 60 | -60/+60
Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-01-26 | uapi: install batman_adv.h header | Sven Eckelmann | 1 | -0/+1
09748a22f4ab ("batman-adv: add generic netlink family for batman-adv") introduced the new batman_adv.h which describes the netlink attributes and commands of batman-adv. But the Kbuild entry to install the header was not added. All currently known tools ship their own copy of batman_adv.h but it should be installed anyway to later be able to migrate to the system batman_adv.h. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-01-26 | batman-adv: don't add loop detect macs to TT | Simon Wunderlich | 2 | -1/+20
The bridge loop avoidance (BLA) feature of batman-adv sends packets to probe for Mesh/LAN packet loops. Those packets are not sent by real clients and should therefore not be added to the translation table (TT). Signed-off-by: Simon Wunderlich <simon.wunderlich@open-mesh.com>
2017-01-25 | bridge: move maybe_deliver_addr() inside #ifdef | Arnd Bergmann | 1 | -25/+25
The only caller of this new function is inside of an #ifdef checking for CONFIG_BRIDGE_IGMP_SNOOPING, so we should move the implementation there too, in order to avoid this harmless warning: net/bridge/br_forward.c:177:13: error: 'maybe_deliver_addr' defined but not used [-Werror=unused-function] Fixes: 6db6f0eae605 ("bridge: multicast to unicast") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 | bpf: use prefix_len in test_tag when reading fdinfo | Daniel Borkmann | 1 | -1/+1
We currently use len instead of prefix_len for the strncmp() in fdinfo on the prog_tag. It still worked as we matched on the correct output line also with first 8 instead of 10 chars, but let's fix it properly to use the intended length. Fixes: 62b64660262a ("bpf: add prog tag test case to bpf selftests") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
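The difference between comparing 8 and 10 characters can be shown with a small model of the fdinfo parsing. This is a sketch under assumptions, not the selftest itself: the "prog_tag:" plus tab prefix and the 8-byte tag length are inferred from the message above, and the sample fdinfo text in the usage note is invented for illustration.

```python
PREFIX = "prog_tag:\t"  # 10 characters, the intended prefix_len
TAG_LEN = 8             # tag size in bytes, i.e. 16 hex digits


def tag_from_fdinfo(text, tag_len=TAG_LEN):
    """Return the program tag as bytes, or None if no prog_tag line exists."""
    for line in text.splitlines():
        # Compare the full prefix, not just the first tag_len characters.
        if not line.startswith(PREFIX):
            continue
        start = len(PREFIX)
        return bytes.fromhex(line[start:start + 2 * tag_len])
    return None
```

With a made-up fdinfo such as "prog_type:\t1\nprog_tag:\ta5ea8fa30ea6849c\n", both an 8-character and a 10-character comparison select the same line, because "prog_typ" and "prog_tag" already differ; the full prefix simply states the intent correctly.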
2017-01-25 | Merge branch 'broadcom-phy-cleanup' | David S. Miller | 2 | -14/+10
Rafał Miłecki says: ==================== net-next: Broadcom PHY driver cleanup I will probably need to use broadcom.ko for PHY connected to interface of bgmac supported device so I started looking at it willing to understand it better. I found AUXCTL part of the driver / lib a bit confusing and hard to read so I'm trying to clean it up a bit. I hope this patchset makes following AUXCTL operations much easier making it clear which defines are for registers and which for values. There is no functional change in this patchset. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 | net: phy: bcm-phy-lib: clean up remaining AUXCTL register defines | Rafał Miłecki | 1 | -7/+7
1) Use 0x%02x format for register number. This follows some other defines and makes it easier to distinguish registers from values.
2) Put register define above values and sort the values. It makes reading header code easier.
3) Use 0x%04x format for all values. It's about consistency with other values (and most of the header) not a personal preference.
4) Separate define for reading shift value with an extra empty line. It's used for all AUXCTL registers in a bcm54xx_auxctl_read.
Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 | net: phy: broadcom: drop duplicated define for RGMII SKEW delay | Rafał Miłecki | 2 | -2/+1
We had two defines for the same bit (both were used with the MII_BCM54XX_AUXCTL_SHDWSEL_MISC register). Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 | net: phy: broadcom: use auxctl reading helper in BCM54612E code | Rafał Miłecki | 2 | -5/+2
Starting with commit 5b4e29005123 ("net: phy: broadcom: add bcm54xx_auxctl_read") we have a reading helper so use it and avoid code duplication. It also means we don't need MII_BCM54XX_AUXCTL_SHDWSEL_MISC define as it's the same as MII_BCM54XX_AUXCTL_SHDWSEL_MISC just for reading needs (same value shifted by 12 bits). Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 | net: dsa: Mop up remaining NET_DSA_HWMON references | Andrew Lunn | 2 | -32/+0
Previous patches have moved the temperature sensor code into the Marvell PHYs. A few now dead references to NET_DSA_HWMON were left behind. Go reap them. Reported-by: Valentin Rothberg <valentinrothberg@gmail.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 | sfc: reduce severity of PIO buffer alloc failures | Tomáš Pilař | 1 | -3/+15
PIO buffer allocation can fail for two valid reasons:
 - we've run out of them (results in -ENOSPC)
 - the NIC configuration doesn't support them (results in -EPERM)
Since both these failures are expected netif_err is excessive. Signed-off-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 | Merge branch 'thunderx-ethtool' | David S. Miller | 4 | -86/+83
Sunil Goutham says: ==================== thunderx: More ethtool support and BGX configuration changes These patches add support to set queue sizes from ethtool and change the way serdes lane configuration is done by the BGX driver on 81/83xx platforms. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 | net: thunderx: Leave serdes lane config on 81/83xx to firmware | Sunil Goutham | 1 | -77/+18
For DLMs and SLMs on 80/81/83xx, many lane configurations across different boards are coming up. Also kernel doesn't have any way to identify board type/info and since firmware does, just get rid of figuring out lane to serdes config and take whatever has been programmed by low level firmware. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 | net: thunderx: Support to configure queue sizes from ethtool | Sunil Goutham | 3 | -9/+65
Adds support to set Rx/Tx queue sizes from ethtool. Fixes an issue with retrieving queue size. Also sets SQ's CQ_LIMIT based on configured Tx queue size such that HW doesn't process SQEs when there is no sufficient space in CQ. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 | net/tcp-fastopen: make connect()'s return case more consistent with non-TFO | Willy Tarreau | 3 | -5/+5
Without TFO, any subsequent connect() call after a successful one returns -1 EISCONN. The last API update ensured that __inet_stream_connect() can return -1 EINPROGRESS in response to sendmsg() when TFO is in use to indicate that the connection is now in progress. Unfortunately since this function is used both for connect() and sendmsg(), it has the undesired side effect of making connect() now return -1 EINPROGRESS as well after a successful call, while at the same time poll() returns POLLOUT. This can confuse some applications which happen to call connect() and to check for -1 EISCONN to ensure the connection is usable, and for which EINPROGRESS indicates a need to poll, causing a loop. This problem was encountered in haproxy where a call to connect() is precisely used in certain cases to confirm a connection's readiness. While arguably haproxy's behaviour should be improved here, it seems important to aim at a more robust behaviour when the goal of the new API is to make it easier to implement TFO in existing applications. This patch simply ensures that we preserve the same semantics as in the non-TFO case on the connect() syscall when using TFO, while still returning -1 EINPROGRESS on sendmsg(). For this we simply tell __inet_stream_connect() whether we're doing a regular connect() or in fact connecting for a sendmsg() call. Cc: Wei Wang <weiwan@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 | Merge branch 'tcp-fastopen-new-API' | David S. Miller | 10 | -26/+136
Wei Wang says: ==================== net/tcp-fastopen: Add new userspace API support The patch series is to add support for new userspace API for TCP fastopen sockets. In the current code, user has to call sendto()/sendmsg() with special flag MSG_FASTOPEN for TCP fastopen sockets. This API is quite different from the normal TCP socket API and can be cumbersome for applications to make use of fastopen sockets. So this new patch introduces a new way of using TCP fastopen sockets which is similar to normal TCP sockets with a new sockopt TCP_FASTOPEN_CONNECT. More details about it are described in the third patch. (First 2 patches are preparations for the third patch.) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 | net/tcp-fastopen: Add new API support | Wei Wang | 9 | -11/+111
This patch adds a new socket option, TCP_FASTOPEN_CONNECT, as an alternative way to perform Fast Open on the active side (client). Prior to this patch, a client needs to replace the connect() call with sendto(MSG_FASTOPEN). This can be cumbersome for applications that want to use Fast Open: these socket operations are often done in lower layer libraries used by many other applications. Changing these libraries and/or the socket call sequences is not trivial. A more convenient approach is to perform Fast Open by simply enabling a socket option when the socket is created w/o changing other socket calls sequence:
  s = socket()
    create a new socket
  setsockopt(s, IPPROTO_TCP, TCP_FASTOPEN_CONNECT …);
    newly introduced sockopt
    If set, new functionality described below will be used.
    Return ENOTSUPP if TFO is not supported or not enabled in the kernel.
  connect()
    With cookie present, return 0 immediately.
    With no cookie, initiate 3WHS with TFO cookie-request option and return -1 with errno = EINPROGRESS.
  write()/sendmsg()
    With cookie present, send out SYN with data and return the number of bytes buffered.
    With no cookie, and 3WHS not yet completed, return -1 with errno = EINPROGRESS.
    No MSG_FASTOPEN flag is needed.
  read()
    Return -1 with errno = EWOULDBLOCK/EAGAIN if connect() is called but write() is not called yet.
    Return -1 with errno = EWOULDBLOCK/EAGAIN if connection is established but no msg is received yet.
    Return number of bytes read if socket is established and there is msg received.
The new API simplifies life for applications that always perform a write() immediately after a successful connect(). Such applications can now take advantage of Fast Open by merely making one new setsockopt() call at the time of creating the socket. Nothing else about the application's socket call sequence needs to change. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
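The call sequence in this message can be sketched from a client's point of view. This is a minimal illustration for Linux only and makes a few assumptions: the numeric fallback 30 for TCP_FASTOPEN_CONNECT is taken from the kernel's uapi tcp.h for interpreters that do not export the constant, and enable_tfo_connect() and open_with_tfo() are hypothetical helper names. The sketch uses a blocking socket, so the EINPROGRESS cases listed above do not surface here.

```python
import socket

# Use the interpreter's constant when it exists; 30 is the Linux uapi value.
TCP_FASTOPEN_CONNECT = getattr(socket, "TCP_FASTOPEN_CONNECT", 30)


def enable_tfo_connect(sock):
    """Try to turn on TCP_FASTOPEN_CONNECT; return False if the kernel refuses."""
    try:
        sock.setsockopt(socket.IPPROTO_TCP, TCP_FASTOPEN_CONNECT, 1)
    except OSError:
        return False
    return True


def open_with_tfo(host, port, payload):
    """Ordinary connect-then-write sequence; only the setsockopt() call is new."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_tfo_connect(sock)  # fall back silently to a regular handshake
    sock.connect((host, port))
    sock.sendall(payload)     # with a cached cookie, data can ride on the SYN
    return sock
```

The point of the design is visible in open_with_tfo(): apart from one extra setsockopt() call, the connect() and write sequence is the same as for a socket without Fast Open.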
2017-01-25 | net: Remove __sk_dst_reset() in tcp_v6_connect() | Wei Wang | 1 | -1/+0
Remove __sk_dst_reset() in the failure handling because __sk_dst_reset() will eventually get called when sk is released. No need to handle it in the protocol specific connect call. This is also to make the code path consistent with ipv4. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 | net/tcp-fastopen: refactor cookie check logic | Wei Wang | 3 | -14/+25
Refactor the cookie check logic in tcp_send_syn_data() into a function. This function will be called elsewhere in later changes. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 | r8152: fix the wrong spelling | hayeswang | 1 | -2/+2
Replace rumtime with runtime. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 | Doc: DT: bindings: net: dsa: marvell.txt: Tabification | Andrew Lunn | 1 | -56/+56
Replace spaces with tabs. Fix indentation to be multiples of tabs, not a mixture of tabs and spaces. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 | Merge branch 'bpf-tracepoints' | David S. Miller | 18 | -23/+530
Daniel Borkmann says: ==================== BPF tracepoints This set adds tracepoints to BPF for better introspection and debugging. The first two patches are prerequisite for the actual third patch that adds the tracepoints. I think the first two are small and straight forward enough that they could ideally go via net-next, but I'm also open to other suggestions on how to route them in case that's not applicable (it would reduce potential merge conflicts on BPF side, though). For details, please see individual patches. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 | bpf: add initial bpf tracepoints | Daniel Borkmann | 11 | -15/+483
This work adds a number of tracepoints to paths that are either considered slow-path or exception-like states, where monitoring or inspecting them would be desirable. For bpf(2) syscall, tracepoints have been placed for main commands when they succeed. In XDP case, tracepoint is for exceptions, that is, f.e. on abnormal BPF program exit such as unknown or XDP_ABORTED return code, or when error occurs during XDP_TX action and the packet could not be forwarded. Both have been split into separate event headers, and can be further extended. Worst case, if they unexpectedly should get into our way in future, they can also be removed [1]. Of course, these tracepoints (like any other) can be analyzed by eBPF itself, etc. Example output:
# ./perf record -a -e bpf:* sleep 10
# ./perf script
sock_example 6197 [005] 283.980322: bpf:bpf_map_create: map type=ARRAY ufd=4 key=4 val=8 max=256 flags=0
sock_example 6197 [005] 283.980721: bpf:bpf_prog_load: prog=a5ea8fa30ea6849c type=SOCKET_FILTER ufd=5
sock_example 6197 [005] 283.988423: bpf:bpf_prog_get_type: prog=a5ea8fa30ea6849c type=SOCKET_FILTER
sock_example 6197 [005] 283.988443: bpf:bpf_map_lookup_elem: map type=ARRAY ufd=4 key=[06 00 00 00] val=[00 00 00 00 00 00 00 00]
[...]
sock_example 6197 [005] 288.990868: bpf:bpf_map_lookup_elem: map type=ARRAY ufd=4 key=[01 00 00 00] val=[14 00 00 00 00 00 00 00]
swapper 0 [005] 289.338243: bpf:bpf_prog_put_rcu: prog=a5ea8fa30ea6849c type=SOCKET_FILTER
[1] https://lwn.net/Articles/705270/
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 | lib, traceevent: add PRINT_HEX_STR variant | Daniel Borkmann | 4 | -3/+34
Add support for the __print_hex_str() macro that was added for tracing, so that user space tools such as perf can understand it as well. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 | trace: add variant without spacing in trace_print_hex_seq | Daniel Borkmann | 3 | -5/+13
For upcoming tracepoint support for BPF, we want to dump the program's tag. Format should be similar to __print_hex(), but without spacing. Add a __print_hex_str() variant for exactly that purpose that reuses trace_print_hex_seq(). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
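The two output formats can be compared with a short model. This is only an illustration of the spacing difference, not the kernel's trace_print_hex_seq(): print_hex() and print_hex_str() are hypothetical Python stand-ins named after the tracing macros.

```python
def print_hex(buf):
    """Model of __print_hex(): two hex digits per byte, separated by spaces."""
    return " ".join(f"{b:02x}" for b in buf)


def print_hex_str(buf):
    """Model of __print_hex_str(): the same digits with no spacing."""
    return "".join(f"{b:02x}" for b in buf)
```

For an 8-byte program tag, the unspaced form gives a compact identifier such as a5ea8fa30ea6849c, while the spaced form suits short keys and values such as 06 00 00 00.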
2017-01-25 | tcp: reduce skb overhead in selected places | Eric Dumazet | 2 | -2/+2
tcp_add_backlog() can use skb_condense() helper to get better gains and less SKB_TRUESIZE() magic. This only happens when socket backlog has to be used. Some attacks involve specially crafted out of order tiny TCP packets, clogging the ofo queue of (many) sockets. Then later, expensive collapse happens, trying to copy all these skbs into single ones. This unfortunately does not work if each skb has no neighbor in TCP sequence order. By using skb_condense() if the skb could not be coalesced to a prior one, we defeat these kind of threats, potentially saving 4K per skb (or more, since this is one page fragment). A typical NAPI driver allocates gro packets with GRO_MAX_HEAD bytes in skb->head, meaning the copy done by skb_condense() is limited to about 200 bytes. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 | Merge tag 'mlx5-updates-2017-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux | David S. Miller | 15 | -120/+389
Saeed Mahameed says: ==================== mlx5-updates-2017-24-01 The first seven patches from Or Gerlitz in this series further enhance the mlx5 SRIOV switchdev mode to support offloading IPv6 tunnels using the TC tunnel key set (encap) and unset (decap) actions.
Or Gerlitz says: ======================== As part of doing this change, a few cleanups are done in the IPv4 code, later we move to use the full tunnel key info provided to the driver as the key for our internal hashing which is used to identify cases where the same tunnel is used for encapsulating multiple flows. As done in the IPv4 case, the control path for offloading IPv6 tunnels uses route/neigh lookups and construction of the IPv6 tunnel headers on the encap path and matching on the outer headers in the decap path. The last patch of the series enlarges the HW FDB size for the switchdev mode, so it has now room to contain offloaded flows as many as min(max number of HW flow counters supported, max HW table size supported). ========================
Next to Or's series you can find several patches handling several topics.
From Mohamad, add support for SRIOV VF min rate guarantee by using the TSAR BW share weights mechanism.
From Or, two patches to enable Eth VFs to query their min-inline value for user-space. For that we move a mlx5 low level min inline helper function from mlx5 ethernet driver into the core driver and then use it in mlx5_ib to expose the inline mode to rdma applications through libmlx5.
From Kamal Heib, Reduce memory consumption on kdump kernel.
From Shaker Daibes, code reuse in CQE compression control logic
==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 | tipc: uninitialized return code in tipc_setsockopt() | Dan Carpenter | 1 | -2/+1
We shuffled some code around and added some new case statements here and now "res" isn't initialized on all paths. Fixes: 01fd12bb189a ("tipc: make replicast a user selectable option") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 | net sched actions: Add support for user cookies | Jamal Hadi Salim | 4 | -0/+57
Introduce optional 128-bit action cookie. Like all other cookie schemes in the networking world (eg in protocols like http or existing kernel fib protocol field, etc) the idea is to save user state that when retrieved serves as a correlator. The kernel _should not_ interpret it. The user can store whatever they wish in the 128 bits.
Sample exercise (showing variable length use of cookie)
.. create an accept action with cookie a1b2c3d4
sudo $TC actions add action ok index 1 cookie a1b2c3d4
.. dump all gact actions..
sudo $TC -s actions ls action gact
  action order 0: gact action pass
  random type none pass val 0
  index 1 ref 1 bind 0 installed 5 sec used 5 sec
  Action statistics:
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
  cookie a1b2c3d4
.. bind the accept action to a filter..
sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1
... send some traffic..
$ ping 127.0.0.1 -c 3
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms
Signed-off-by: David S. Miller <davem@davemloft.net>
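The size constraint on the cookie can be illustrated with a small user-space check. This is a sketch under assumptions rather than the tc or kernel code: it treats the cookie as a hex string of 1 to 16 bytes, as in the a1b2c3d4 example above, and parse_cookie() is a hypothetical helper name.

```python
MAX_COOKIE_BYTES = 16  # 128 bits


def parse_cookie(hex_str):
    """Convert a hex cookie into opaque bytes, enforcing the 128-bit limit."""
    raw = bytes.fromhex(hex_str)  # raises ValueError on malformed hex
    if not 1 <= len(raw) <= MAX_COOKIE_BYTES:
        raise ValueError(f"cookie must be 1..{MAX_COOKIE_BYTES} bytes, got {len(raw)}")
    return raw
```

The bytes are returned unmodified because the cookie is opaque: only the user who stored it assigns it any meaning.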
2017-01-24 | Merge branch 'netvsc-enhancements' | David S. Miller | 4 | -594/+850
Stephen Hemminger says: ==================== netvsc driver enhancements for net-next Lots of little things in here. Support for a little more ethtool control, negotiation of offload parameters with host (based on FreeBSD) and several cleanups. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-24 | netvsc: call netif_receive_skb | stephen hemminger | 1 | -1/+1
To improve performance, netvsc can call network stack directly and avoid the local backlog queue. This is safe since incoming packets are handled in softirq context already because the receive function callback is called from a tasklet. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-24 | netvsc: simplify get next send section | stephen hemminger | 1 | -20/+8
Use kernel for_each_clear_bit macro to simplify finding next available send section. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-24 | netvsc: report per-channel stats in ethtool statistics | Simon Xiao | 3 | -57/+93
Report packets and bytes transferred through a vmbus channel via ethtool. This supersedes need for per-cpu statistics. Example:
$ ethtool -S eth0
NIC statistics:
  ...
  tx_queue_0_packets: 3523179
  tx_queue_0_bytes: 505370920
  rx_queue_0_packets: 41430490
  rx_queue_0_bytes: 62714661254
  tx_queue_1_packets: 0
  tx_queue_1_bytes: 0
  rx_queue_1_packets: 0
  rx_queue_1_bytes: 0
  ...
Reviewed-by: Long Li <longli@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Simon Xiao <sixiao@microsoft.com> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-24 | netvsc: account for packets/bytes transmitted after completion | stephen hemminger | 3 | -14/+22
Most drivers do not increment transmit statistics until after the transmit is completed. This will also be necessary for BQL support. Slight additional complexity because the netvsc driver aggregates multiple packets into one transmit. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>