aboutsummaryrefslogtreecommitdiffstats
path: root/net/ipv4/ping.c (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2019-07-08ipv6: elide flowlabel check if no exclusive leases existWillem de Bruijn8-14/+45
Processes can request ipv6 flowlabels with cmsg IPV6_FLOWINFO. If not set, by default an autogenerated flowlabel is selected. Explicit flowlabels require a control operation per label plus a datapath check on every connection (every datagram if unconnected). This is particularly expensive on unconnected sockets multiplexing many flows, such as QUIC. In the common case, where no lease is exclusive, the check can be safely elided, as both lease request and check trivially succeed. Indeed, autoflowlabel does the same even with exclusive leases. Elide the check if no process has requested an exclusive lease. fl6_sock_lookup previously returns either a reference to a lease or NULL to denote failure. Modify to return a real error and update all callers. On return NULL, they can use the label and will elide the atomic_dec in fl6_sock_release. This is an optimization. Robust applications still have to revert to requesting leases if the fast path fails due to an exclusive lease. Changes RFC->v1: - use static_key_false_deferred to rate limit jump label operations - call static_key_deferred_flush to stop timers on exit - move decrement out of RCU context - defer optimization also if opt data is associated with a lease - updated all fp6_sock_lookup callers, not just udp Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08tcp: Reset bytes_acked and bytes_received when disconnectingChristoph Paasch1-0/+2
If an app is playing tricks to reuse a socket via tcp_disconnect(), bytes_acked/received needs to be reset to 0. Otherwise tcp_info will report the sum of the current and the old connection.. Cc: Eric Dumazet <edumazet@google.com> Fixes: 0df48c26d841 ("tcp: add tcpi_bytes_acked to tcp_info") Fixes: bdd1f9edacb5 ("tcp: add tcpi_bytes_received to tcp_info") Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08bonding: fix value exported by Netlink for peer_notif_delayVincent Bernat1-1/+1
IFLA_BOND_PEER_NOTIF_DELAY was set to the value of downdelay instead of peer_notif_delay. After this change, the correct value is exported. Fixes: 07a4ddec3ce9 ("bonding: add an option to specify a delay between peer notifications") Signed-off-by: Vincent Bernat <vincent@bernat.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08coallocate socket_wq with socket itselfAl Viro7-28/+15
socket->wq is assign-once, set when we are initializing both struct socket it's in and struct socket_wq it points to. As the matter of fact, the only reason for separate allocation was the ability to RCU-delay freeing of socket_wq. RCU-delaying the freeing of socket itself gets rid of that need, so we can just fold struct socket_wq into the end of struct socket and simplify the life both for sock_alloc_inode() (one allocation instead of two) and for tun/tap oddballs, where we used to embed struct socket and struct socket_wq into the same structure (now - embedding just the struct socket). Note that reference to struct socket_wq in struct sock does remain a reference - that's unchanged. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08sockfs: switch to ->free_inode()Al Viro1-3/+3
we do have an RCU-delayed part there already (freeing the wq), so it's not like the pipe situation; moreover, it might be worth considering coallocating wq with the rest of struct sock_alloc. ->sk_wq in struct sock would remain a pointer as it is, but the object it normally points to would be coallocated with struct socket... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09xdp: fix race on generic receive pathIlya Maximets2-9/+24
Unlike driver mode, generic xdp receive could be triggered by different threads on different CPU cores at the same time leading to the fill and rx queue breakage. For example, this could happen while sending packets from two processes to the first interface of veth pair while the second part of it is open with AF_XDP socket. Need to take a lock for each generic receive to avoid race. Fixes: c497176cb2e4 ("xsk: add Rx receive functions and poll support") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: William Tu <u9012063@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08selftests: forwarding: Test multipath hashing on inner IP pkts for GRE tunnelStephen Suryaputra4-0/+1220
Add selftest scripts for multipath hashing on inner IP pkts when there is a single GRE tunnel but there are multiple underlay routes to reach the other end of the tunnel. Four cases are covered in these scripts: - IPv4 inner, IPv4 outer - IPv6 inner, IPv4 outer - IPv4 inner, IPv6 outer - IPv6 inner, IPv6 outer Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08ipv6: Support multipath hashing on inner IP pktsStephen Suryaputra2-0/+37
Make the same support as commit 363887a2cdfe ("ipv4: Support multipath hashing on inner IP pkts for GRE tunnel") for outer IPv6. The hashing considers both IPv4 and IPv6 pkts when they are tunneled by IPv6 GRE. Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08ipv4: Multipath hashing on inner L3 needs to consider inner IPv6 pktsStephen Suryaputra1-4/+17
Commit 363887a2cdfe ("ipv4: Support multipath hashing on inner IP pkts for GRE tunnel") supports multipath policy value of 2, Layer 3 or inner Layer 3 if present, but it only considers inner IPv4. There is a use case of IPv6 is tunneled by IPv4 GRE, thus add the ability to hash on inner IPv6 addresses. Fixes: 363887a2cdfe ("ipv4: Support multipath hashing on inner IP pkts for GRE tunnel") Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08net: pasemi: fix an use-after-free in pasemi_mac_phy_init()Wen Yang1-1/+1
The phy_dn variable is still being used in of_phy_connect() after the of_node_put() call, which may result in use-after-free. Fixes: 1dd2d06c0459 ("net: Rework pasemi_mac driver to use of_mdio infrastructure") Signed-off-by: Wen Yang <wen.yang99@zte.com.cn> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08net: axienet: fix a potential double free in axienet_probe()Wen Yang1-1/+0
There is a possible use-after-free issue in the axienet_probe(): 1701: np = of_parse_phandle(pdev->dev.of_node, "axistream-connected", 0); 1702: if (np) { ... 1787: of_node_put(np); ---> released here 1788: lp->eth_irq = platform_get_irq(pdev, 0); 1789: } else { ... 1801: } 1802: if (IS_ERR(lp->dma_regs)) { ... 1805: of_node_put(np); ---> double released here 1806: goto free_netdev; 1807: } We solve this problem by removing the unnecessary of_node_put(). Fixes: 28ef9ebdb64c ("net: axienet: make use of axistream-connected attribute optional") Signed-off-by: Wen Yang <wen.yang99@zte.com.cn> Cc: Anirudha Sarangi <anirudh@xilinx.com> Cc: John Linn <John.Linn@xilinx.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michal Simek <michal.simek@xilinx.com> Cc: Robert Hancock <hancock@sedsystems.ca> Cc: netdev@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Robert Hancock <hancock@sedsystems.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-09selftests/bpf: fix test_reuseport_array on s390Ilya Leoshkevich1-6/+15
Fix endianness issue: passing a pointer to 64-bit fd as a 32-bit key does not work on big-endian architectures. So cast fd to 32-bits when necessary. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08net: stmmac: enable clause 45 mdio supportKweh Hock Leong2-8/+37
DWMAC4 is capable to support clause 45 mdio communication. This patch enable the feature on stmmac_mdio_write() and stmmac_mdio_read() by following phy_write_mmd() and phy_read_mmd() mdiobus read write implementation format. Reviewed-by: Li, Yifan <yifan2.li@intel.com> Signed-off-by: Kweh Hock Leong <hock.leong.kweh@intel.com> Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Voon Weifeng <weifeng.voon@intel.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08net: openvswitch: use netif_ovs_is_port() instead of opencodeTaehee Yoo2-4/+4
Use netif_ovs_is_port() function instead of open code. This patch doesn't change logic. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08MAINTAINERS: Add page_pool maintainer entryJesper Dangaard Brouer1-0/+8
In this release cycle the number of NIC drivers using page_pool will likely reach 4 drivers. It is about time to add a maintainer entry. Add myself and Ilias. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08net: mvpp2: cls: Add support for ETHER_FLOWMaxime Chevallier1-0/+2
Users can specify classification actions based on the 'ether' flow type. In that case, this will apply to all ethernet traffic, superseeding flows such as 'udp4' or 'tcp6'. Add support for this flow type in the PPv2 classifier, by mapping the ETHER_FLOW value to the corresponding entries in the classifier. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08net: mvpp2: cls: Report an error for unsupported flow typesMaxime Chevallier1-0/+4
Add a missing check to detect flow types that we don't support, so that user can be informed of this. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08selftests: txring_overwrite: fix incorrect test of mmap() return valueFrank de Brabander1-1/+1
If mmap() fails it returns MAP_FAILED, which is defined as ((void *) -1). The current if-statement incorrectly tests if *ring is NULL. Fixes: 358be656406d ("selftests/net: add txring_overwrite") Signed-off-by: Frank de Brabander <debrabander@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08vsock/virtio: fix flush of works during the .remove()Stefano Garzarella1-6/+9
This patch moves the flush of works after vdev->config->del_vqs(vdev), because we need to be sure that no workers run before to free the 'vsock' object. Since we stopped the workers using the [tx|rx|event]_run flags, we are sure no one is accessing the device while we are calling vdev->config->reset(vdev), so we can safely move the workers' flush. Before the vdev->config->del_vqs(vdev), workers can be scheduled by VQ callbacks, so we must flush them after del_vqs(), to avoid use-after-free of 'vsock' object. Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08vsock/virtio: stop workers during the .remove()Stefano Garzarella1-1/+50
Before to call vdev->config->reset(vdev) we need to be sure that no one is accessing the device, for this reason, we add new variables in the struct virtio_vsock to stop the workers during the .remove(). This patch also add few comments before vdev->config->reset(vdev) and vdev->config->del_vqs(vdev). Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsockStefano Garzarella1-24/+46
Some callbacks used by the upper layers can run while we are in the .remove(). A potential use-after-free can happen, because we free the_virtio_vsock without knowing if the callbacks are over or not. To solve this issue we move the assignment of the_virtio_vsock at the end of .probe(), when we finished all the initialization, and at the beginning of .remove(), before to release resources. For the same reason, we do the same also for the vdev->priv. We use RCU to be sure that all callbacks that use the_virtio_vsock ended before freeing it. This is not required for callbacks that use vdev->priv, because after the vdev->config->del_vqs() we are sure that they are ended and will no longer be invoked. We also take the mutex during the .remove() to avoid that .probe() can run while we are resetting the device. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08Documentation: net: dsa: b53: Describe b53 configurationBenedikt Spranger2-0/+184
Document the different needs of documentation for the b53 driver. Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08Documentation: net: dsa: Describe DSA switch configurationBenedikt Spranger2-0/+293
Document DSA tagged and VLAN based switch configuration by showcases. Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08nfp: tls: fix error return code in nfp_net_tls_add()Wei Yongjun1-0/+1
Fix to return negative error code -EINVAL from the error handling case instead of 0, as done elsewhere in this function. Fixes: 1f35a56cf586 ("nfp: tls: add/delete TLS TX connections") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08bnxt_en: add page_pool supportAndy Gospodarek4-7/+47
This removes contention over page allocation for XDP_REDIRECT actions by adding page_pool support per queue for the driver. The performance for XDP_REDIRECT actions scales linearly with the number of cores performing redirect actions when using the page pools instead of the standard page allocator. v2: Fix up the error path from XDP registration, noted by Ilias Apalodimas. Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08bnxt_en: optimized XDP_REDIRECT supportAndy Gospodarek4-10/+140
This adds basic support for XDP_REDIRECT in the bnxt_en driver. Next patch adds the more optimized page pool support. Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08bnxt_en: Refactor __bnxt_xmit_xdp().Michael Chan4-13/+28
__bnxt_xmit_xdp() is used by XDP_TX and ethtool loopback packet transmit. Refactor it so that it can be re-used by the XDP_REDIRECT logic. Restructure the TX interrupt handler logic to cleanly separate XDP_TX logic in preparation for XDP_REDIRECT. Acked-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08bnxt_en: rename some xdp functionsAndy Gospodarek3-7/+7
Renaming bnxt_xmit_xdp to __bnxt_xmit_xdp to get ready for XDP_REDIRECT support and reduce confusion/namespace collision. Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08net: ethernet: ti: cpsw: add XDP supportIvan Khoronzhuk4-59/+488
Add XDP support based on rx page_pool allocator, one frame per page. Page pool allocator is used with assumption that only one rx_handler is running simultaneously. DMA map/unmap is reused from page pool despite there is no need to map whole page. Due to specific of cpsw, the same TX/RX handler can be used by 2 network devices, so special fields in buffer are added to identify an interface the frame is destined to. Thus XDP works for both interfaces, that allows to test xdp redirect between two interfaces easily. Also, each rx queue have own page pools, but common for both netdevs. XDP prog is common for all channels till appropriate changes are added in XDP infrastructure. Also, once page_pool recycling becomes part of skb netstack some simplifications can be added, like removing page_pool_release_page() before skb receive. In order to keep rx_dev while redirect, that can be somehow used in future, do flush in rx_handler, that allows to keep rx dev the same while redirect. It allows to conform with tracing rx_dev pointed by Jesper. Also, there is probability, that XDP generic code can be extended to support multi ndev drivers like this one, using same rx queue for several ndevs, based on switchdev for instance or else. In this case, driver can be modified like exposed here: https://lkml.org/lkml/2019/7/3/243 Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08net: ethernet: ti: cpsw_ethtool: allow res split while downIvan Khoronzhuk1-2/+1
That's possible to set channel num while interfaces are down. When interface gets up it should resplit budget. This resplit can happen after phy is up but only if speed is changed, so should be set before this, for this allow it to happen while changing number of channels, when interfaces are down. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08net: ethernet: ti: davinci_cpdma: allow desc split while downIvan Khoronzhuk3-9/+28
That's possible to set ring params while interfaces are down. When interface gets up it uses number of descs to fill rx queue and on later on changes to create rx pools. Usually, this resplit can happen after phy is up, but it can be needed before this, so allow it to happen while setting number of rx descs, when interfaces are down. Also, if no dependency on intf state, move it to cpdma layer, where it should be. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08net: ethernet: ti: davinci_cpdma: add dma mapped submitIvan Khoronzhuk2-10/+83
In case if dma mapped packet needs to be sent, like with XDP page pool, the "mapped" submit can be used. This patch adds dma mapped submit based on regular one. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08net: core: page_pool: add user refcnt and reintroduce page_pool_destroyIvan Khoronzhuk5-8/+40
Jesper recently removed page_pool_destroy() (from driver invocation) and moved shutdown and free of page_pool into xdp_rxq_info_unreg(), in-order to handle in-flight packets/pages. This created an asymmetry in drivers create/destroy pairs. This patch reintroduce page_pool_destroy and add page_pool user refcnt. This serves the purpose to simplify drivers error handling as driver now drivers always calls page_pool_destroy() and don't need to track if xdp_rxq_info_reg_mem_model() was unsuccessful. This could be used for a special cases where a single RX-queue (with a single page_pool) provides packets for two net_device'es, and thus needs to register the same page_pool twice with two xdp_rxq_info structures. This patch is primarily to ease API usage for drivers. The recently merged netsec driver, actually have a bug in this area, which is solved by this API change. This patch is a modified version of Ivan Khoronzhuk's original patch. Link: https://lore.kernel.org/netdev/20190625175948.24771-2-ivan.khoronzhuk@linaro.org/ Fixes: 5c67bf0ec4d0 ("net: netsec: Use page_pool API") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08nfc: fix potential illegal memory accessYang Wei1-1/+1
The frags_q is not properly initialized, it may result in illegal memory access when conn_info is NULL. The "goto free_exit" should be replaced by "goto exit". Signed-off-by: Yang Wei <albin_yang@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08macb: fix build warning for !CONFIG_OFArnd Bergmann1-2/+2
When CONFIG_OF is disabled, we get a harmless warning about the newly added variable: drivers/net/ethernet/cadence/macb_main.c:48:39: error: 'mgmt' defined but not used [-Werror=unused-variable] static struct sifive_fu540_macb_mgmt *mgmt; Move the variable closer to its use inside of the #ifdef. Fixes: c218ad559020 ("macb: Add support for SiFive FU540-C000") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08gve: fix unused variable/label warningsArnd Bergmann1-33/+33
On unusual page sizes, we get harmless warnings: drivers/net/ethernet/google/gve/gve_rx.c:283:6: error: unused variable 'pagecount' [-Werror,-Wunused-variable] drivers/net/ethernet/google/gve/gve_rx.c:336:1: error: unused label 'have_skb' [-Werror,-Wunused-label] Change the preprocessor #if to regular if() to avoid this. Fixes: f5cedc84a30d ("gve: Add transmit and receive support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08net: stmmac: Re-work the queue selection for TSO packetsJose Abreu1-10/+18
Ben Hutchings says: "This is the wrong place to change the queue mapping. stmmac_xmit() is called with a specific TX queue locked, and accessing a different TX queue results in a data race for all of that queue's state. I think this commit should be reverted upstream and in all stable branches. Instead, the driver should implement the ndo_select_queue operation and override the queue mapping there." Fixes: c5acdbee22a1 ("net: stmmac: Send TSO packets always from Queue 0") Suggested-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Jose Abreu <joabreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08sfc: Remove 'PCIE error reporting unavailable'Martin Habets1-5/+1
This is only at notice level but it was pointed out that no other driver does this. Also there is no action the user can take as it is really a property of the server. Signed-off-by: Martin Habets <mhabets@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08net: netsec: Sync dma for device on buffer allocationIlias Apalodimas1-3/+1
cd1973a9215a ("net: netsec: Sync dma for device on buffer allocation") was merged on it's v1 instead of the v3. Merge the proper patch version Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08tools: bpftool: add completion for bpftool prog "loadall"Quentin Monnet1-1/+8
Bash completion for proposing the "loadall" subcommand is missing. Let's add it to the completion script. Add a specific case to propose "load" and "loadall" for completing: $ bpftool prog load ^ cursor is here Otherwise, completion considers that $command is in load|loadall and starts making related completions (file or directory names, as the number of words on the command line is below 6), when the only suggested keywords should be "load" and "loadall" until one has been picked and a space entered after that to move to the next word. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08bpf: avoid unused variable warning in tcp_bpf_rtt()Arnd Bergmann1-3/+1
When CONFIG_BPF is disabled, we get a warning for an unused variable: In file included from drivers/target/target_core_device.c:26: include/net/tcp.h:2226:19: error: unused variable 'tp' [-Werror,-Wunused-variable] struct tcp_sock *tp = tcp_sk(sk); The variable is only used in one place, so it can be replaced with its value there to avoid the warning. Fixes: 23729ff23186 ("bpf: add BPF_CGROUP_SOCK_OPS callback that is executed on every RTT") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08bpf: cgroup: Fix build error without CONFIG_NETYueHaibing1-0/+4
If CONFIG_NET is not set and CONFIG_CGROUP_BPF=y, gcc building fails: kernel/bpf/cgroup.o: In function `cg_sockopt_func_proto': cgroup.c:(.text+0x237e): undefined reference to `bpf_sk_storage_get_proto' cgroup.c:(.text+0x2394): undefined reference to `bpf_sk_storage_delete_proto' kernel/bpf/cgroup.o: In function `__cgroup_bpf_run_filter_getsockopt': (.text+0x2a1f): undefined reference to `lock_sock_nested' (.text+0x2ca2): undefined reference to `release_sock' kernel/bpf/cgroup.o: In function `__cgroup_bpf_run_filter_setsockopt': (.text+0x3006): undefined reference to `lock_sock_nested' (.text+0x32bb): undefined reference to `release_sock' Reported-by: Hulk Robot <hulkci@huawei.com> Suggested-by: Stanislav Fomichev <sdf@fomichev.me> Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08selftests/bpf: fix test_attach_probe map definitionAndrii Nakryiko1-8/+5
ef99b02b23ef ("libbpf: capture value in BTF type info for BTF-defined map defs") changed BTF-defined maps syntax, while independently merged 1e8611bbdfc9 ("selftests/bpf: add kprobe/uprobe selftests") added new test using outdated syntax of maps. This patch fixes this test after corresponding patch sets were merged. Fixes: ef99b02b23ef ("libbpf: capture value in BTF type info for BTF-defined map defs") Fixes: 1e8611bbdfc9 ("selftests/bpf: add kprobe/uprobe selftests") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08selftests/bpf: add verifier tests for wide storesStanislav Fomichev2-3/+50
Make sure that wide stores are allowed at proper (aligned) addresses. Note that user_ip6 is naturally aligned on 8-byte boundary, so correct addresses are user_ip6[0] and user_ip6[2]. msg_src_ip6 is, however, aligned on a 4-byte bondary, so only msg_src_ip6[1] can be wide-stored. Cc: Andrii Nakryiko <andriin@fb.com> Cc: Yonghong Song <yhs@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08bpf: sync bpf.h to tools/Stanislav Fomichev1-3/+3
Sync user_ip6 & msg_src_ip6 comments. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08bpf: allow wide (u64) aligned stores for some fields of bpf_sock_addrStanislav Fomichev3-11/+23
Since commit cd17d7770578 ("bpf/tools: sync bpf.h") clang decided that it can do a single u64 store into user_ip6[2] instead of two separate u32 ones: # 17: (18) r2 = 0x100000000000000 # ; ctx->user_ip6[2] = bpf_htonl(DST_REWRITE_IP6_2); # 19: (7b) *(u64 *)(r1 +16) = r2 # invalid bpf_context access off=16 size=8 >From the compiler point of view it does look like a correct thing to do, so let's support it on the kernel side. Credit to Andrii Nakryiko for a proper implementation of bpf_ctx_wide_store_ok. Cc: Andrii Nakryiko <andriin@fb.com> Cc: Yonghong Song <yhs@fb.com> Fixes: cd17d7770578 ("bpf/tools: sync bpf.h") Reported-by: kernel test robot <rong.a.chen@intel.com> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08libbpf: add perf_buffer_ prefix to READMEAndrii Nakryiko1-1/+2
perf_buffer "object" is part of libbpf API now, add it to the list of libbpf function prefixes. Suggested-by: Daniel Borkman <daniel@iogearbox.net> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08tools/bpftool: switch map event_pipe to libbpf's perf_bufferAndrii Nakryiko1-137/+64
Switch event_pipe implementation to rely on new libbpf perf buffer API (it's raw low-level variant). Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08selftests/bpf: test perf buffer APIAndrii Nakryiko2-0/+125
Add test verifying perf buffer API functionality. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08libbpf: auto-set PERF_EVENT_ARRAY size to number of CPUsAndrii Nakryiko1-7/+24
For BPF_MAP_TYPE_PERF_EVENT_ARRAY typically correct size is number of possible CPUs. This is impossible to specify at compilation time. This change adds automatic setting of PERF_EVENT_ARRAY size to number of system CPUs, unless non-zero size is specified explicitly. This allows to adjust size for advanced specific cases, while providing convenient and logical defaults. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>