aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-06-04xsk: new descriptor addressing schemeBjörn Töpel1-43/+58
Currently, AF_XDP only supports a fixed frame-size memory scheme where each frame is referenced via an index (idx). A user passes the frame index to the kernel, and the kernel acts upon the data. Some NICs, however, do not have a fixed frame-size model, instead they have a model where a memory window is passed to the hardware and multiple frames are filled into that window (referred to as the "type-writer" model). By changing the descriptor format from the current frame index addressing scheme, AF_XDP can in the future be extended to support these kinds of NICs. In the index-based model, an idx refers to a frame of size frame_size. Addressing a frame in the UMEM is done by offseting the UMEM starting address by a global offset, idx * frame_size + offset. Communicating via the fill- and completion-rings are done by means of idx. In this commit, the idx is removed in favor of an address (addr), which is a relative address ranging over the UMEM. To convert an idx-based address to the new addr is simply: addr = idx * frame_size + offset. We also stop referring to the UMEM "frame" as a frame. Instead it is simply called a chunk. To transfer ownership of a chunk to the kernel, the addr of the chunk is passed in the fill-ring. Note, that the kernel will mask addr to make it chunk aligned, so there is no need for userspace to do that. E.g., for a chunk size of 2k, passing an addr of 2048, 2050 or 3000 to the fill-ring will refer to the same chunk. On the completion-ring, the addr will match that of the Tx descriptor, passed to the kernel. Changing the descriptor format to use chunks/addr will allow for future changes to move to a type-writer based model, where multiple frames can reside in one chunk. In this model passing one single chunk into the fill-ring, would potentially result in multiple Rx descriptors. This commit changes the uapi of AF_XDP sockets, and updates the documentation. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller5-6/+141
Lots of easy overlapping changes in the confict resolutions here. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-6/+0
Pull networking fixes from David Miller: "Let's begin the holiday weekend with some networking fixes: 1) Whoops need to restrict cfg80211 wiphy names even more to 64 bytes. From Eric Biggers. 2) Fix flags being ignored when using kernel_connect() with SCTP, from Xin Long. 3) Use after free in DCCP, from Alexey Kodanev. 4) Need to check rhltable_init() return value in ipmr code, from Eric Dumazet. 5) XDP handling fixes in virtio_net from Jason Wang. 6) Missing RTA_TABLE in rtm_ipv4_policy[], from Roopa Prabhu. 7) Need to use IRQ disabling spinlocks in mlx4_qp_lookup(), from Jack Morgenstein. 8) Prevent out-of-bounds speculation using indexes in BPF, from Daniel Borkmann. 9) Fix regression added by AF_PACKET link layer cure, from Willem de Bruijn. 10) Correct ENIC dma mask, from Govindarajulu Varadarajan. 11) Missing config options for PMTU tests, from Stefano Brivio" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (48 commits) ibmvnic: Fix partial success login retries selftests/net: Add missing config options for PMTU tests mlx4_core: allocate ICM memory in page size chunks enic: set DMA mask to 47 bit ppp: remove the PPPIOCDETACH ioctl ipv4: remove warning in ip_recv_error net : sched: cls_api: deal with egdev path only if needed vhost: synchronize IOTLB message with dev cleanup packet: fix reserve calculation net/mlx5: IPSec, Fix a race between concurrent sandbox QP commands net/mlx5e: When RXFCS is set, add FCS data into checksum calculation bpf: properly enforce index mask to prevent out-of-bounds speculation net/mlx4: Fix irq-unsafe spinlock usage net: phy: broadcom: Fix bcm_write_exp() net: phy: broadcom: Fix auxiliary control register reads net: ipv4: add missing RTA_TABLE to rtm_ipv4_policy net/mlx4: fix spelling mistake: "Inrerface" -> "Interface" and rephrase message ibmvnic: Only do H_EOI for mobility events tuntap: correctly set SOCKWQ_ASYNC_NOSPACE virtio-net: fix leaking page for gso packet during mergeable XDP ...
2018-05-24ppp: remove the PPPIOCDETACH ioctlEric Biggers1-6/+0
The PPPIOCDETACH ioctl effectively tries to "close" the given ppp file before f_count has reached 0, which is fundamentally a bad idea. It does check 'f_count < 2', which excludes concurrent operations on the file since they would only be possible with a shared fd table, in which case each fdget() would take a file reference. However, it fails to account for the fact that even with 'f_count == 1' the file can still be linked into epoll instances. As reported by syzbot, this can trivially be used to cause a use-after-free. Yet, the only known user of PPPIOCDETACH is pppd versions older than ppp-2.4.2, which was released almost 15 years ago (November 2003). Also, PPPIOCDETACH apparently stopped working reliably at around the same time, when the f_count check was added to the kernel, e.g. see https://lkml.org/lkml/2002/12/31/83. Also, the current 'f_count < 2' check makes PPPIOCDETACH only work in single-threaded applications; it always fails if called from a multithreaded application. All pppd versions released in the last 15 years just close() the file descriptor instead. Therefore, instead of hacking around this bug by exporting epoll internals to modules, and probably missing other related bugs, just remove the PPPIOCDETACH ioctl and see if anyone actually notices. Leave a stub in place that prints a one-time warning and returns EINVAL. Reported-by: syzbot+16363c99d4134717c05b@syzkaller.appspotmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Guillaume Nault <g.nault@alphalink.fr> Tested-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23net: dsa: qca8k: Add QCA8334 binding documentationMichal Vokáč1-1/+22
Add support for the four-port variant of the Qualcomm QCA833x switch. The CPU port default link settings can be reconfigured using a fixed-link sub-node. Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23Documentation/bindings: net: the sfp i2c-bus property is now mandatoryAntoine Tenart1-2/+2
The i2c-bus property for sfp modules was made mandatory. Update the documentation to keep it in sync with the driver's behaviour. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller4-7/+23
S390 bpf_jit.S is removed in net-next and had changes in 'net', since that code isn't used any more take the removal. TLS data structures split the TX and RX components in 'net-next', put the new struct members from the bug fix in 'net' into the RX part. The 'net-next' tree had some reworking of how the ERSPAN code works in the GRE tunneling code, overlapping with a one-line headroom calculation fix in 'net'. Overlapping changes in __sock_map_ctx_update_elem(), keep the bits that read the prog members via READ_ONCE() into local variables before using them. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-21Merge branch 'speck-v20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds4-0/+141
Merge speculative store buffer bypass fixes from Thomas Gleixner: - rework of the SPEC_CTRL MSR management to accomodate the new fancy SSBD (Speculative Store Bypass Disable) bit handling. - the CPU bug and sysfs infrastructure for the exciting new Speculative Store Bypass 'feature'. - support for disabling SSB via LS_CFG MSR on AMD CPUs including Hyperthread synchronization on ZEN. - PRCTL support for dynamic runtime control of SSB - SECCOMP integration to automatically disable SSB for sandboxed processes with a filter flag for opt-out. - KVM integration to allow guests fiddling with SSBD including the new software MSR VIRT_SPEC_CTRL to handle the LS_CFG based oddities on AMD. - BPF protection against SSB .. this is just the core and x86 side, other architecture support will come separately. * 'speck-v20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits) bpf: Prevent memory disambiguation attack x86/bugs: Rename SSBD_NO to SSB_NO KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD x86/speculation, KVM: Implement support for VIRT_SPEC_CTRL/LS_CFG x86/bugs: Rework spec_ctrl base and mask logic x86/bugs: Remove x86_spec_ctrl_set() x86/bugs: Expose x86_spec_ctrl_base directly x86/bugs: Unify x86_spec_ctrl_{set_guest,restore_host} x86/speculation: Rework speculative_store_bypass_update() x86/speculation: Add virtualized speculative store bypass disable support x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL x86/speculation: Handle HT correctly on AMD x86/cpufeatures: Add FEATURE_ZEN x86/cpufeatures: Disentangle SSBD enumeration x86/cpufeatures: Disentangle MSR_SPEC_CTRL enumeration from IBRS x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP KVM: SVM: Move spec control call after restore of GS x86/cpu: Make alternative_msr_write work for 32-bit code x86/bugs: Fix the parameters alignment and missing void x86/bugs: Make cpu_show_common() static ...
2018-05-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-0/+7
Pull networking fixes from David Miller: 1) Fix refcounting bug for connections in on-packet scheduling mode of IPVS, from Julian Anastasov. 2) Set network header properly in AF_PACKET's packet_snd, from Willem de Bruijn. 3) Fix regressions in 3c59x by converting to generic DMA API. It was relying upon the hack that the PCI DMA interfaces would accept NULL for EISA devices. From Christoph Hellwig. 4) Remove RDMA devices before unregistering netdev in QEDE driver, from Michal Kalderon. 5) Use after free in TUN driver ptr_ring usage, from Jason Wang. 6) Properly check for missing netlink attributes in SMC_PNETID requests, from Eric Biggers. 7) Set DMA mask before performaing any DMA operations in vmxnet3 driver, from Regis Duchesne. 8) Fix mlx5 build with SMP=n, from Saeed Mahameed. 9) Classifier fixes in bcm_sf2 driver from Florian Fainelli. 10) Tuntap use after free during release, from Jason Wang. 11) Don't use stack memory in scatterlists in tls code, from Matt Mullins. 12) Not fully initialized flow key object in ipv4 routing code, from David Ahern. 13) Various packet headroom bug fixes in ip6_gre driver, from Petr Machata. 14) Remove queues from XPS maps using correct index, from Amritha Nambiar. 15) Fix use after free in sock_diag, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (64 commits) net: ip6_gre: fix tunnel metadata device sharing. cxgb4: fix offset in collecting TX rate limit info net: sched: red: avoid hashing NULL child sock_diag: fix use-after-free read in __sk_free sh_eth: Change platform check to CONFIG_ARCH_RENESAS net: dsa: Do not register devlink for unused ports net: Fix a bug in removing queues from XPS map bpf: fix truncated jump targets on heavy expansions bpf: parse and verdict prog attach may race with bpf map update bpf: sockmap update rollback on error can incorrectly dec prog refcnt net: test tailroom before appending to linear skb net: ip6_gre: Fix ip6erspan hlen calculation net: ip6_gre: Split up ip6gre_changelink() net: ip6_gre: Split up ip6gre_newlink() net: ip6_gre: Split up ip6gre_tnl_change() net: ip6_gre: Split up ip6gre_tnl_link_config() net: ip6_gre: Fix headroom request in ip6erspan_tunnel_xmit() net: ip6_gre: Request headroom in __gre6_xmit() selftests/bpf: check return value of fopen in test_verifier.c erspan: fix invalid erspan version. ...
2018-05-20Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-nextDavid S. Miller1-0/+30
Johan Hedberg says: ==================== pull request: bluetooth-next 2018-05-18 Here's the first bluetooth-next pull request for the 4.18 kernel: - Refactoring of the btbcm driver - New USB IDs for QCA_ROME and LiteOn controllers - Buffer overflow fix if the controller sends invalid advertising data length - Various cleanups & fixes for Qualcomm controllers Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-19sh_eth: add R8A77980 supportSergei Shtylyov1-0/+1
Finally, add support for the DT probing of the R-Car V3H (AKA R8A77980) -- it's the only R-Car gen3 SoC having the GEther controller -- others have only EtherAVB... Based on the original (and large) patch by Vladimir Barinov. Signed-off-by: Vladimir Barinov <vladimir.barinov@cogentembedded.com> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-19Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-socLinus Torvalds1-4/+5
Pull ARM SoC fixes from Olof Johansson: "A handful of fixes. I've been queuing them up a bit too long so the list is longer than it otherwise would have been spread out across a few -rcs. In general, it's a scattering of fixes across several platforms, nothing truly serious enough to point out. There's a slightly larger batch of them for the Davinci platforms due to work to bring them back to life after some time, so there's a handful of regressions, some of them going back very far, others more recent. There's also a few patches fixing DT on Renesas platforms since they changed some bindings without remaining backwards compatible, splitting up describing LVDS as a proper bridge instead of having it as part of the display unit. We could push for them to be backwards compatible with old device trees, but it's likely to regress eventually if nobody's actually using said compatibility" * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (36 commits) ARM: davinci: board-dm646x-evm: set VPIF capture card name ARM: davinci: board-dm646x-evm: pass correct I2C adapter id for VPIF ARM: davinci: dm646x: fix timer interrupt generation ARM: keystone: fix platform_domain_notifier array overrun arm64: dts: exynos: Fix interrupt type for I2S1 device on Exynos5433 ARM: dts: imx51-zii-rdu1: fix touchscreen bindings firmware: arm_scmi: Use after free in scmi_create_protocol_device() ARM: dts: cygnus: fix irq type for arm global timer Revert "ARM: dts: logicpd-som-lv: Fix pinmux controller references" tee: check shm references are consistent in offset/size tee: shm: fix use-after-free via temporarily dropped reference ARM: dts: imx7s: Pass the 'fsl,sec-era' property ARM: dts: tegra20: Revert "Fix ULPI regression on Tegra20" ARM: dts: correct missing "compatible" entry for ti81xx SoCs ARM: OMAP1: ams-delta: fix deferred_fiq handler arm64: tegra: Make BCM89610 PHY interrupt as active low ARM: davinci: fix GPIO lookup for I2C ARM: dts: logicpd-som-lv: Fix pinmux controller references ARM: dts: logicpd-som-lv: Fix Audio Mute ARM: dts: logicpd-som-lv: Fix WL127x Startup Issues ...
2018-05-18Merge tag 'powerpc-4.17-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds1-0/+8
Pull powerpc fixes from Michael Ellerman: "Just three commits. The two cxl ones are not fixes per se, but they modify code that was added this cycle so that it will work with a recent firmware change. And then a fix for a recent commit that added sleeps in the NVRAM code, which needs to be more careful and not sleep if eg. we're called in the panic() path. Thanks to Nicholas Piggin, Philippe Bergheaud, Christophe Lombard" * tag 'powerpc-4.17-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/powernv: Fix NVRAM sleep in invalid context when crashing cxl: Report the tunneled operations status cxl: Set the PBCQ Tunnel BAR register when enabling capi mode
2018-05-18tcp: add tcp_comp_sack_nr sysctlEric Dumazet1-0/+6
This per netns sysctl allows for TCP SACK compression fine-tuning. This limits number of SACK that can be compressed. Using 0 disables SACK compression. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18tcp: add tcp_comp_sack_delay_ns sysctlEric Dumazet1-0/+7
This per netns sysctl allows for TCP SACK compression fine-tuning. Its default value is 1,000,000, or 1 ms to meet TSO autosizing period. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-18dt-bindings: net: bluetooth: Add qualcomm-bluetoothThierry Escande1-0/+30
Add binding document for serial bluetooth chips using Qualcomm protocol. Signed-off-by: Thierry Escande <thierry.escande@linaro.org> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-05-17Merge tag 'wireless-drivers-next-for-davem-2018-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-nextDavid S. Miller1-0/+31
Kalle Valo says: ==================== wireless-drivers-next patches for 4.18 The first pull request for 4.18. As usual new features and bug fixes but nothing really special. I also merged wireless-drivers due to an iwlwifi patch dependency. Major changes: iwlwifi * implement Traffic Condition Monitor and use it for scan, BT coex and to detect when the AP doesn't support UAPSD properly * some more work for the 22000 family of devices; * introduce AMSDU rate control offload qtnfmac * DFS offload support rsi * roaming enhancements * increase max supported aggregation subframes * don't advertise 5 GHz support if the device doesn't support it brcmfmac * add support for BCM4366E chipset * add support for bcm43364 wireless chipset ath10k * enable temperature reads for QCA6174 and QCA9377 * add firmware memory dump support for QCA9984 * continue adding WCN3990 support via SNOC bus ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17tcp: disable RFC6675 loss detectionYuchung Cheng1-1/+2
This patch disables RFC6675 loss detection and make sysctl net.ipv4.tcp_recovery = 1 controls a binary choice between RACK (1) or RFC6675 (0). Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17tcp: support DUPACK threshold in RACKYuchung Cheng1-0/+1
This patch adds support for the classic DUPACK threshold rule (#DupThresh) in RACK. When the number of packets SACKed is greater or equal to the threshold, RACK sets the reordering window to zero which would immediately mark all the unsacked packets below the highest SACKed sequence lost. Since this approach is known to not work well with reordering, RACK only uses it if no reordering has been observed. The DUPACK threshold rule is a particularly useful extension to the fast recoveries triggered by RACK reordering timer. For example data-center transfers where the RTT is much smaller than a timer tick, or high RTT path where the default RTT/4 may take too long. Note that this patch differs slightly from RFC6675. RFC6675 considers a packet lost when at least #DupThresh higher-sequence packets are SACKed. With RACK, for connections that have seen reordering, RACK continues to use a dynamically-adaptive time-based reordering window to detect losses. But for connections on which we have not yet seen reordering, this patch considers a packet lost when at least one higher sequence packet is SACKed and the total number of SACKed packets is at least DupThresh. For example, suppose a connection has not seen reordering, and sends 10 packets, and packets 3, 5, 7 are SACKed. RFC6675 considers packets 1 and 2 lost. RACK considers packets 1, 2, 4, 6 lost. There is some small risk of spurious retransmits here due to reordering. However, this is mostly limited to the first flight of a connection on which the sender receives SACKs from reordering. And RFC 6675 and FACK loss detection have a similar risk on the first flight with reordering (it's just that the risk of spurious retransmits from reordering was slightly narrower for those older algorithms due to the margin of 3*MSS). Also the minimum reordering window is reduced from 1 msec to 0 to recover quicker on short RTT transfers. Therefore RACK is more aggressive in marking packets lost during recovery to reduce the reordering window timeouts. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-17Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-3/+3
Pull kvm fixes from Paolo Bonzini: - ARM/ARM64 locking fixes - x86 fixes: PCID, UMIP, locking - improved support for recent Windows version that have a 2048 Hz APIC timer - rename KVM_HINTS_DEDICATED CPUID bit to KVM_HINTS_REALTIME - better behaved selftests * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: rename KVM_HINTS_DEDICATED to KVM_HINTS_REALTIME KVM: arm/arm64: VGIC/ITS save/restore: protect kvm_read_guest() calls KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock KVM: arm/arm64: VGIC/ITS: Promote irq_lock() in update_affinity KVM: arm/arm64: Properly protect VGIC locks from IRQs KVM: X86: Lower the default timer frequency limit to 200us KVM: vmx: update sec exec controls for UMIP iff emulating UMIP kvm: x86: Suppress CR3_PCID_INVD bit only when PCIDs are enabled KVM: selftests: exit with 0 status code when tests cannot be run KVM: hyperv: idr_find needs RCU protection x86: Delay skip of emulated hypercall instruction KVM: Extend MAX_IRQ_ROUTES to 4096 for all archs
2018-05-17kvm: rename KVM_HINTS_DEDICATED to KVM_HINTS_REALTIMEMichael S. Tsirkin1-3/+3
KVM_HINTS_DEDICATED seems to be somewhat confusing: Guest doesn't really care whether it's the only task running on a host CPU as long as it's not preempted. And there are more reasons for Guest to be preempted than host CPU sharing, for example, with memory overcommit it can get preempted on a memory access, post copy migration can cause preemption, etc. Let's call it KVM_HINTS_REALTIME which seems to better match what guests expect. Also, the flag most be set on all vCPUs - current guests assume this. Note so in the documentation. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-05-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller6-732/+906
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-05-17 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Provide a new BPF helper for doing a FIB and neighbor lookup in the kernel tables from an XDP or tc BPF program. The helper provides a fast-path for forwarding packets. The API supports IPv4, IPv6 and MPLS protocols, but currently IPv4 and IPv6 are implemented in this initial work, from David (Ahern). 2) Just a tiny diff but huge feature enabled for nfp driver by extending the BPF offload beyond a pure host processing offload. Offloaded XDP programs are allowed to set the RX queue index and thus opening the door for defining a fully programmable RSS/n-tuple filter replacement. Once BPF decided on a queue already, the device data-path will skip the conventional RSS processing completely, from Jakub. 3) The original sockmap implementation was array based similar to devmap. However unlike devmap where an ifindex has a 1:1 mapping into the map there are use cases with sockets that need to be referenced using longer keys. Hence, sockhash map is added reusing as much of the sockmap code as possible, from John. 4) Introduce BTF ID. The ID is allocatd through an IDR similar as with BPF maps and progs. It also makes BTF accessible to user space via BPF_BTF_GET_FD_BY_ID and adds exposure of the BTF data through BPF_OBJ_GET_INFO_BY_FD, from Martin. 5) Enable BPF stackmap with build_id also in NMI context. Due to the up_read() of current->mm->mmap_sem build_id cannot be parsed. This work defers the up_read() via a per-cpu irq_work so that at least limited support can be enabled, from Song. 6) Various BPF JIT follow-up cleanups and fixups after the LD_ABS/LD_IND JIT conversion as well as implementation of an optimized 32/64 bit immediate load in the arm64 JIT that allows to reduce the number of emitted instructions; in case of tested real-world programs they were shrinking by three percent, from Daniel. 7) Add ifindex parameter to the libbpf loader in order to enable BPF offload support. Right now only iproute2 can load offloaded BPF and this will also enable libbpf for direct integration into other applications, from David (Beckett). 8) Convert the plain text documentation under Documentation/bpf/ into RST format since this is the appropriate standard the kernel is moving to for all documentation. Also add an overview README.rst, from Jesper. 9) Add __printf verification attribute to the bpf_verifier_vlog() helper. Though it uses va_list we can still allow gcc to check the format string, from Mathieu. 10) Fix a bash reference in the BPF selftest's Makefile. The '|& ...' is a bash 4.0+ feature which is not guaranteed to be available when calling out to shell, therefore use a more portable variant, from Joe. 11) Fix a 64 bit division in xdp_umem_reg() by using div_u64() instead of relying on the gcc built-in, from Björn. 12) Fix a sock hashmap kmalloc warning reported by syzbot when an overly large key size is used in hashmap then causing overflows in htab->elem_size. Reject bogus attr->key_size early in the sock_hash_alloc(), from Yonghong. 13) Ensure in BPF selftests when urandom_read is being linked that --build-id is always enabled so that test_stacktrace_build_id[_nmi] won't be failing, from Alexei. 14) Add bitsperlong.h as well as errno.h uapi headers into the tools header infrastructure which point to one of the arch specific uapi headers. This was needed in order to fix a build error on some systems for the BPF selftests, from Sirio. 15) Allow for short options to be used in the xdp_monitor BPF sample code. And also a bpf.h tools uapi header sync in order to fix a selftest build failure. Both from Prashant. 16) More formally clarify the meaning of ID in the direct packet access section of the BPF documentation, from Wang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16Revert "bonding: allow carrier and link status to determine link state"Debabrata Banerjee1-2/+2
This reverts commit 1386c36b30388f46a95100924bfcae75160db715. We don't want to encourage drivers to not report carrier status correctly, therefore remove this commit. Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16net: phy: micrel: add 125MHz reference clock workaroundMarkus Niebel1-0/+7
The micrel KSZ9031 phy has a optional clock pin (CLK125_NDO) which can be used as reference clock for the MAC unit. The clock signal must meet the RGMII requirements to ensure the correct data transmission between the MAC and the PHY. The KSZ9031 phy does not fulfill the duty cycle requirement if the phy is configured as slave. For a complete describtion look at the errata sheets: DS80000691D or DS80000692D. The errata sheet recommends to force the phy into master mode whenever there is a 1000Base-T link-up as work around. Only set the "micrel,force-master" property if you use the phy reference clock provided by CLK125_NDO pin as MAC reference clock in your application. Attenation, this workaround is only usable if the link partner can be configured to slave mode for 1000Base-T. Signed-off-by: Markus Niebel <Markus.Niebel@tqs.de> [m.felsch@pengutronix.de: fix dt-binding documentation] [m.felsch@pengutronix.de: use already existing result var for read/write] [m.felsch@pengutronix.de: add error handling] [m.felsch@pengutronix.de: add more comments] Signed-off-by: Marco Felsch <m.felsch@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-16bonding: allow carrier and link status to determine link stateDebabrata Banerjee1-2/+2
In a mixed environment it may be difficult to tell if your hardware support carrier, if it does not it can always report true. With a new use_carrier option of 2, we can check both carrier and link status sequentially, instead of one or the other Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-15dt-bindings: net: add DT bindings for Microsemi Ocelot SwitchAlexandre Belloni1-0/+82
DT bindings for the Ethernet switch found on Microsemi Ocelot platforms. Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-15dt-bindings: net: add DT bindings for Microsemi MIIMAlexandre Belloni1-0/+26
DT bindings for the Microsemi MII Management Controller found on Microsemi SoCs Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-15cxl: Report the tunneled operations statusPhilippe Bergheaud1-0/+8
Failure to synchronize the tunneled operations does not prevent the initialization of the cxl card. This patch reports the tunneled operations status via /sys. Signed-off-by: Philippe Bergheaud <felix@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-14bpf, doc: howto use/run the BPF selftestsJesper Dangaard Brouer1-0/+29
I always forget howto run the BPF selftests. Thus, lets add that info to the QA document. Documentation was based on Cilium's documentation: http://cilium.readthedocs.io/en/latest/bpf/#verifying-the-setup Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-14bpf, doc: convert bpf_devel_QA.rst to use RST formattingJesper Dangaard Brouer1-379/+420
Same story as bpf_design_QA.rst RST format conversion. Again thanks to Quentin Monnet <quentin.monnet@netronome.com> for fixes and patches that have been squashed. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-14bpf, doc: convert bpf_design_QA.rst to use RST formattingJesper Dangaard Brouer1-79/+144
The RST formatting is done such that that when rendered or converted to different formats, an automatic index with links are created to the subsections. Thus, the questions are created as sections (or subsections), in-order to get the wanted auto-generated FAQ/QA index. Special thanks to Quentin Monnet <quentin.monnet@netronome.com> who have reviewed and corrected both RST formatting and GitHub rendering issues in this file. Those commits have been squashed. I've manually tested that this also renders nicely if included as part of the kernel 'make htmldocs'. As the end-goal is for this to become more integrated with kernel-doc project/movement. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-14bpf, doc: rename txt files to rst filesJesper Dangaard Brouer3-2/+2
This will cause them to get auto rendered, e.g. when viewing them on GitHub. Followup patches will correct the content to be RST compliant. Also adjust README.rst to point to the renamed files. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-14bpf, doc: add basic README.rst fileJesper Dangaard Brouer1-0/+36
A README.rst file in a directory have special meaning for sites like github, which auto renders the contents. Plus search engines like Google also index these README.rst files. Auto rendering allow us to use links, for (re)directing eBPF users to other places where docs live. The end-goal would be to direct users towards https://www.kernel.org/doc/html/latest but we haven't written the full docs yet, so we start out small and take this incrementally. This directory itself contains some useful docs, which can be linked to from the README.rst file (verified this works for github). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-14dt-bindings: net: dwmac-sun8i: Add binding for GMAC on Allwinner R40 SoCChen-Yu Tsai1-0/+3
The Allwinner R40 SoC has the EMAC controller supported by dwmac-sun8i. It is named "GMAC", while EMAC refers to the 10/100 Mbps Ethernet controller supported by sun4i-emac. The controller is the same, but the R40 has the glue layer controls in the clock control unit (CCU), with a reduced RX delay chain, and no TX delay chain. This patch adds the R40 specific bits to the dwmac-sun8i binding. Signed-off-by: Chen-Yu Tsai <wens@csie.org> Reviewed-by: Rob Herring <robh@kernel.org> Acked-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14dt-bindings: net: dwmac-sun8i: simplify description of syscon propertyChen-Yu Tsai1-6/+1
The syscon property is used to point to the device that holds the glue layer control register known as the "EMAC (or GMAC) clock register". We do not need to explicitly list what compatible strings are needed, as this information is readily available in the user manuals. Also the "syscon" device type is more of an implementation detail. There are many ways to access a register not in a device's address range, the syscon interface being the most generic and unrestricted one. Simplify the description so that it says what it is supposed to describe. Signed-off-by: Chen-Yu Tsai <wens@csie.org> Reviewed-by: Rob Herring <robh@kernel.org> Acked-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14dt-bindings: net: dwmac-sun8i: Sort syscon compatibles by alphabetical orderChen-Yu Tsai1-1/+1
The A83T syscon compatible was appended to the syscon compatibles list, instead of inserted in to preserve the ordering. Move it to the proper place to keep the list sorted. Signed-off-by: Chen-Yu Tsai <wens@csie.org> Reviewed-by: Rob Herring <robh@kernel.org> Acked-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14dt-bindings: net: dwmac-sun8i: Clean up clock delay chain descriptionsChen-Yu Tsai1-4/+7
The clock delay chains found in the glue layer for dwmac-sun8i are only used with RGMII PHYs. They are not intended for non-RGMII PHYs, such as MII external PHYs or the internal PHY. Also, a recent SoC has a smaller range of possible values for the delay chain. This patch reformats the delay chain section of the device tree binding to make it clear that the delay chains only apply to RGMII PHYs, and make it easier to add the R40-specific bits later. Signed-off-by: Chen-Yu Tsai <wens@csie.org> Reviewed-by: Rob Herring <robh@kernel.org> Acked-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14Merge tag 'mvebu-fixes-4.17-1' of git://git.infradead.org/linux-mvebu into fixesOlof Johansson1-4/+5
mvebu fixes for 4.17 (part 1) Declare missing clocks needed for network on Armada 8040 base boards (such as the McBin) * tag 'mvebu-fixes-4.17-1' of git://git.infradead.org/linux-mvebu: ARM64: dts: marvell: armada-cp110: Add mg_core_clk for ethernet node ARM64: dts: marvell: armada-cp110: Add clocks for the xmdio node Signed-off-by: Olof Johansson <olof@lixom.net>
2018-05-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller17-17/+34
The bpf syscall and selftests conflicts were trivial overlapping changes. The r8169 change involved moving the added mdelay from 'net' into a different function. A TLS close bug fix overlapped with the splitting of the TLS state into separate TX and RX parts. I just expanded the tests in the bug fix from "ctx->conf == X" into "ctx->tx_conf == X && ctx->rx_conf == X". Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-1/+3
Pull networking fixes from David Miller: 1) Verify lengths of keys provided by the user is AF_KEY, from Kevin Easton. 2) Add device ID for BCM89610 PHY. Thanks to Bhadram Varka. 3) Add Spectre guards to some ATM code, courtesy of Gustavo A. R. Silva. 4) Fix infinite loop in NSH protocol code. To Eric Dumazet we are most grateful for this fix. 5) Line up /proc/net/netlink headers properly. This fix from YU Bo, we do appreciate. 6) Use after free in TLS code. Once again we are blessed by the honorable Eric Dumazet with this fix. 7) Fix regression in TLS code causing stalls on partial TLS records. This fix is bestowed upon us by Andrew Tomt. 8) Deal with too small MTUs properly in LLC code, another great gift from Eric Dumazet. 9) Handle cached route flushing properly wrt. MTU locking in ipv4, to Hangbin Liu we give thanks for this. 10) Fix regression in SO_BINDTODEVIC handling wrt. UDP socket demux. Paolo Abeni, he gave us this. 11) Range check coalescing parameters in mlx4 driver, thank you Moshe Shemesh. 12) Some ipv6 ICMP error handling fixes in rxrpc, from our good brother David Howells. 13) Fix kexec on mlx5 by freeing IRQs in shutdown path. Daniel Juergens, you're the best! 14) Don't send bonding RLB updates to invalid MAC addresses. Debabrata Benerjee saved us! 15) Uh oh, we were leaking in udp_sendmsg and ping_v4_sendmsg. The ship is now water tight, thanks to Andrey Ignatov. 16) IPSEC memory leak in ixgbe from Colin Ian King, man we've got holes everywhere! 17) Fix error path in tcf_proto_create, Jiri Pirko what would we do without you! * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (92 commits) net sched actions: fix refcnt leak in skbmod net: sched: fix error path in tcf_proto_create() when modules are not configured net sched actions: fix invalid pointer dereferencing if skbedit flags missing ixgbe: fix memory leak on ipsec allocation ixgbevf: fix ixgbevf_xmit_frame()'s return type ixgbe: return error on unsupported SFP module when resetting ice: Set rq_last_status when cleaning rq ipv4: fix memory leaks in udp_sendmsg, ping_v4_sendmsg mlxsw: core: Fix an error handling path in 'mlxsw_core_bus_device_register()' bonding: send learning packets for vlans on slave bonding: do not allow rlb updates to invalid mac net/mlx5e: Err if asked to offload TC match on frag being first net/mlx5: E-Switch, Include VF RDMA stats in vport statistics net/mlx5: Free IRQs in shutdown path rxrpc: Trace UDP transmission failure rxrpc: Add a tracepoint to log ICMP/ICMP6 and error messages rxrpc: Fix the min security level for kernel calls rxrpc: Fix error reception on AF_INET6 sockets rxrpc: Fix missing start of call timeout qed: fix spelling mistake: "taskelt" -> "tasklet" ...
2018-05-11net: doc: fix spelling mistake: "modrobe.d" -> "modprobe.d"Tonghao Zhang1-1/+1
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-11Merge tag 'pm-4.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds2-2/+2
Pull power management fixes from Rafael Wysocki: "These fix two PCI power management regressions from the 4.13 cycle and one cpufreq schedutil governor bug introduced during the 4.12 cycle, drop a stale comment from the schedutil code and fix two mistakes in docs. Specifics: - Restore device_may_wakeup() check in pci_enable_wake() removed inadvertently during the 4.13 cycle to prevent systems from drawing excessive power when suspended or off, among other things (Rafael Wysocki). - Fix pci_dev_run_wake() to properly handle devices that only can signal PME# when in the D3cold power state (Kai Heng Feng). - Fix the schedutil cpufreq governor to avoid using UINT_MAX as the new CPU frequency in some cases due to a missing check (Rafael Wysocki). - Remove a stale comment regarding worker kthreads from the schedutil cpufreq governor (Juri Lelli). - Fix a copy-paste mistake in the intel_pstate driver documentation (Juri Lelli). - Fix a typo in the system sleep states documentation (Jonathan Neuschäfer)" * tag 'pm-4.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PCI / PM: Check device_may_wakeup() in pci_enable_wake() PCI / PM: Always check PME wakeup capability for runtime wakeup support cpufreq: schedutil: Avoid using invalid next_freq cpufreq: schedutil: remove stale comment PM: docs: intel_pstate: fix Active Mode w/o HWP paragraph PM: docs: sleep-states: Fix a typo ("includig")
2018-05-11bpf, doc: clarification for the meaning of 'id'Wang YanQing1-6/+9
For me, as a reader whose mother language isn't English, the old words bring a little difficulty to catch the meaning, this patch rewords the subsection in a more clarificatory way. This patch also add blank lines as separator at two places to improve readability. Signed-off-by: Wang YanQing <udknight@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-10Merge tag 'for-4.17/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dmLinus Torvalds1-1/+4
Pull device mapper fixes from Mike Snitzer: - a stable fix for DM integrity to use kvfree - fix for a 4.17-rc1 change to dm-bufio's buffer alignment - fixes for a few sparse warnings - remove VLA usage in DM mirror target - improve DM thinp Documentation for the "read_only" feature * tag 'for-4.17/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm thin: update Documentation to clarify when "read_only" is valid dm mirror: remove VLA usage dm: fix some sparse warnings and whitespace in dax methods dm cache background tracker: fix sparse warning dm bufio: fix buffer alignment dm integrity: use kvfree for kvmalloc'd memory
2018-05-10dm thin: update Documentation to clarify when "read_only" is validMike Snitzer1-1/+4
Due to user confusion, clarify that it doesn't make sense to try to create a thin-pool with "read_only" mode enabled. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-05-09Documentation/spec_ctrl: Do some minor cleanupsBorislav Petkov1-12/+12
Fix some typos, improve formulations, end sentences with a fullstop. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-09PM: docs: intel_pstate: fix Active Mode w/o HWP paragraphJuri Lelli1-1/+1
P-state selection algorithm (powersave or performance) is selected by echoing the desired choice to scaling_governor sysfs attribute and not to scaling_cur_freq (as currently stated). Fix it. Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-05-09PM: docs: sleep-states: Fix a typo ("includig")Jonathan Neuschäfer1-1/+1
Fix a typo in admin-guide/pm/sleep-states.rst. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-05-08dt-bindings: dsa: Remove unnecessary #address/#size-cellsFabio Estevam1-6/+0
If the example binding is used on a real dts file, the following DTC warning is seen with W=1: arch/arm/boot/dts/imx6q-b450v3.dtb: Warning (avoid_unnecessary_addr_size): /mdio-gpio/switch@0: unnecessary #address-cells/#size-cells without "ranges" or child "reg" property Remove unnecessary #address-cells/#size-cells to improve the binding document examples. Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-08Merge branch 'for-4.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libataLinus Torvalds1-1/+0
Pull libata fixes from Tejun Heo: "An earlier commit to add reset control for embedded ahci controllers affected some of the hardware specific drivers and got reverted for now. Other than that, just per-device workarounds and trivial changes" * 'for-4.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: driver core: add __printf verification to __ata_ehi_pushv_desc ata: fix spelling mistake: "directon" -> "direction" libata: Apply NOLPM quirk for SanDisk SD7UB3Q*G1001 SSDs libata: Apply NOLPM quirk for SAMSUNG MZMPC128HBFU-000MV SSD ata: ahci: mvebu: override ahci_stop_engine for mvebu AHCI libahci: Allow drivers to override stop_engine Revert "ata: ahci-platform: add reset control support"