aboutsummaryrefslogtreecommitdiffstats
path: root/include (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-07-26Merge tag 'mlx5-fixes-2019-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linuxDavid S. Miller2-2/+5
Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2019-07-25 This series introduces some fixes to mlx5 driver. 1) Ariel is addressing an issue with enacp flow counter race condition 2) Aya fixes ethtool speed handling 3) Edward fixes modify_cq hw bits alignment 4) Maor fixes RDMA_RX capabilities handling 5) Mark reverses unregister devices order to address an issue with LAG 6) From Tariq, - wrong max num channels indication regression - TLS counters naming and documentation as suggested by Jakub - kTLS, Call WARN_ONCE on netdev mismatch There is one patch in this series that touches nfp driver to align TLS statistics names with latest documentation, Jakub is CC'ed. Please pull and let me know if there is any problem. For -stable v4.9: ('net/mlx5: Use reversed order when unregister devices') For -stable v4.20 ('net/mlx5e: Prevent encap flow counter update async to user query') ('net/mlx5: Fix modify_cq_in alignment') For -stable v5.1 ('net/mlx5e: Fix matching of speed to PRM link modes') For -stable v5.2 ('net/mlx5: Add missing RDMA_RX capabilities') ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-26net: qualcomm: rmnet: Fix incorrect UL checksum offload logicSubash Abhinov Kasiviswanathan1-2/+2
The udp_ip4_ind bit is set only for IPv4 UDP non-fragmented packets so that the hardware can flip the checksum to 0xFFFF if the computed checksum is 0 per RFC768. However, this bit had to be set for IPv6 UDP non fragmented packets as well per hardware requirements. Otherwise, IPv6 UDP packets with computed checksum as 0 were transmitted by hardware and were dropped in the network. In addition to setting this bit for IPv6 UDP, the field is also appropriately renamed to udp_ind as part of this change. Fixes: 5eb5f8608ef1 ("net: qualcomm: rmnet: Add support for TX checksum offload") Cc: Sean Tranchetti <stranche@codeaurora.org> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller4-5/+34
Alexei Starovoitov says: ==================== pull-request: bpf 2019-07-25 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) fix segfault in libbpf, from Andrii. 2) fix gso_segs access, from Eric. 3) tls/sockmap fixes, from Jakub and John. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-25net/mlx5e: Prevent encap flow counter update async to user queryAriel Levkovich1-0/+1
This patch prevents a race between user invoked cached counters query and a neighbor last usage updater. The cached flow counter stats can be queried by calling "mlx5_fc_query_cached" which provides the number of bytes and packets that passed via this flow since the last time this counter was queried. It does so by reducting the last saved stats from the current, cached stats and then updating the last saved stats with the cached stats. It also provide the lastuse value for that flow. Since "mlx5e_tc_update_neigh_used_value" needs to retrieve the last usage time of encapsulation flows, it calls the flow counter query method periodically and async to user queries of the flow counter using cls_flower. This call is causing the driver to update the last reported bytes and packets from the cache and therefore, future user queries of the flow stats will return lower than expected number for bytes and packets since the last saved stats in the driver was updated async to the last saved stats in cls_flower. This causes wrong stats presentation of encapsulation flows to user. Since the neighbor usage updater only needs the lastuse stats from the cached counter, the fix is to use a dedicated lastuse query call that returns the lastuse value without synching between the cached stats and the last saved stats. Fixes: f6dfb4c3f216 ("net/mlx5e: Update neighbour 'used' state using HW flow rules counters") Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-25net/mlx5: Fix modify_cq_in alignmentEdward Srouji1-2/+4
Fix modify_cq_in alignment to match the device specification. After this fix the 'cq_umem_valid' field will be in the right offset. Cc: <stable@vger.kernel.org> # 4.19 Fixes: bd37197554eb ("net/mlx5: Update mlx5_ifc with DEVX UID bits") Signed-off-by: Edward Srouji <edwards@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-25lib/dim: Fix -Wunused-const-variable warningsLeon Romanovsky1-56/+0
DIM causes to the following warnings during kernel compilation which indicates that tx_profile and rx_profile are supposed to be declared in *.c and not in *.h files. In file included from ./include/rdma/ib_verbs.h:64, from ./include/linux/mlx5/device.h:37, from ./include/linux/mlx5/driver.h:51, from ./include/linux/mlx5/vport.h:36, from drivers/infiniband/hw/mlx5/ib_virt.c:34: ./include/linux/dim.h:326:1: warning: _tx_profile_ defined but not used [-Wunused-const-variable=] 326 | tx_profile[DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = { | ^~~~~~~~~~ ./include/linux/dim.h:320:1: warning: _rx_profile_ defined but not used [-Wunused-const-variable=] 320 | rx_profile[DIM_CQ_PERIOD_NUM_MODES][NET_DIM_PARAMS_NUM_PROFILES] = { | ^~~~~~~~~~ Fixes: 4f75da3666c0 ("linux/dim: Move implementation to .c files") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-23bpf: fix narrower loads on s390Ilya Leoshkevich1-0/+13
The very first check in test_pkt_md_access is failing on s390, which happens because loading a part of a struct __sk_buff field produces an incorrect result. The preprocessed code of the check is: { __u8 tmp = *((volatile __u8 *)&skb->len + ((sizeof(skb->len) - sizeof(__u8)) / sizeof(__u8))); if (tmp != ((*(volatile __u32 *)&skb->len) & 0xFF)) return 2; }; clang generates the following code for it: 0: 71 21 00 03 00 00 00 00 r2 = *(u8 *)(r1 + 3) 1: 61 31 00 00 00 00 00 00 r3 = *(u32 *)(r1 + 0) 2: 57 30 00 00 00 00 00 ff r3 &= 255 3: 5d 23 00 1d 00 00 00 00 if r2 != r3 goto +29 <LBB0_10> Finally, verifier transforms it to: 0: (61) r2 = *(u32 *)(r1 +104) 1: (bc) w2 = w2 2: (74) w2 >>= 24 3: (bc) w2 = w2 4: (54) w2 &= 255 5: (bc) w2 = w2 The problem is that when verifier emits the code to replace a partial load of a struct __sk_buff field (*(u8 *)(r1 + 3)) with a full load of struct sk_buff field (*(u32 *)(r1 + 104)), an optional shift and a bitwise AND, it assumes that the machine is little endian and incorrectly decides to use a shift. Adjust shift count calculation to account for endianness. Fixes: 31fd85816dbe ("bpf: permits narrower load from bpf program context fields") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-07-22Merge tag 'media/v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-mediaLinus Torvalds1-1/+7
Pull media fixes from Mauro Carvalho Chehab: "For two regressions in media core: - v4l2-subdev: fix regression in check_pad() - videodev2.h: change V4L2_PIX_FMT_BGRA444 define: fourcc was already in use" * tag 'media/v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: videodev2.h: change V4L2_PIX_FMT_BGRA444 define: fourcc was already in use media: v4l2-subdev: fix regression in check_pad()
2019-07-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds11-26/+48
Pull networking fixes from David Miller: 1) Several netfilter fixes including a nfnetlink deadlock fix from Florian Westphal and fix for dropping VRF packets from Miaohe Lin. 2) Flow offload fixes from Pablo Neira Ayuso including a fix to restore proper block sharing. 3) Fix r8169 PHY init from Thomas Voegtle. 4) Fix memory leak in mac80211, from Lorenzo Bianconi. 5) Missing NULL check on object allocation in cxgb4, from Navid Emamdoost. 6) Fix scaling of RX power in sfp phy driver, from Andrew Lunn. 7) Check that there is actually an ip header to access in skb->data in VRF, from Peter Kosyh. 8) Remove spurious rcu unlock in hv_netvsc, from Haiyang Zhang. 9) One more tweak the the TCP fragmentation memory limit changes, to be less harmful to applications setting small SO_SNDBUF values. From Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (40 commits) tcp: be more careful in tcp_fragment() hv_netvsc: Fix extra rcu_read_unlock in netvsc_recv_callback() vrf: make sure skb->data contains ip header to make routing connector: remove redundant input callback from cn_dev qed: Prefer pcie_capability_read_word() igc: Prefer pcie_capability_read_word() cxgb4: Prefer pcie_capability_read_word() be2net: Synchronize be_update_queues with dev_watchdog bnx2x: Prevent load reordering in tx completion processing net: phy: sfp: hwmon: Fix scaling of RX power net: sched: verify that q!=NULL before setting q->flags chelsio: Fix a typo in a function name allocate_flower_entry: should check for null deref net: hns3: typo in the name of a constant kbuild: add net/netfilter/nf_tables_offload.h to header-test blacklist. tipc: Fix a typo mac80211: don't warn about CW params when not using them mac80211: fix possible memory leak in ieee80211_assign_beacon nl80211: fix NL80211_HE_MAX_CAPABILITY_LEN nl80211: fix VENDOR_CMD_RAW_DATA ...
2019-07-22bpf: sockmap/tls, close can race with map freeJohn Fastabend2-1/+10
When a map free is called and in parallel a socket is closed we have two paths that can potentially reset the socket prot ops, the bpf close() path and the map free path. This creates a problem with which prot ops should be used from the socket closed side. If the map_free side completes first then we want to call the original lowest level ops. However, if the tls path runs first we want to call the sockmap ops. Additionally there was no locking around prot updates in TLS code paths so the prot ops could be changed multiple times once from TLS path and again from sockmap side potentially leaving ops pointed at either TLS or sockmap when psock and/or tls context have already been destroyed. To fix this race first only update ops inside callback lock so that TLS, sockmap and lowest level all agree on prot state. Second and a ULP callback update() so that lower layers can inform the upper layer when they are being removed allowing the upper layer to reset prot ops. This gets us close to allowing sockmap and tls to be stacked in arbitrary order but will save that patch for *next trees. v4: - make sure we don't free things for device; - remove the checks which swap the callbacks back only if TLS is at the top. Reported-by: syzbot+06537213db7ba2745c4a@syzkaller.appspotmail.com Fixes: 02c558b2d5d6 ("bpf: sockmap, support for msg_peek in sk_msg with redirect ingress") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22net/tls: fix transition through disconnect with closeJohn Fastabend1-1/+4
It is possible (via shutdown()) for TCP socks to go through TCP_CLOSE state via tcp_disconnect() without actually calling tcp_close which would then call the tls close callback. Because of this a user could disconnect a socket then put it in a LISTEN state which would break our assumptions about sockets always being ESTABLISHED state. More directly because close() can call unhash() and unhash is implemented by sockmap if a sockmap socket has TLS enabled we can incorrectly destroy the psock from unhash() and then call its close handler again. But because the psock (sockmap socket representation) is already destroyed we call close handler in sk->prot. However, in some cases (TLS BASE/BASE case) this will still point at the sockmap close handler resulting in a circular call and crash reported by syzbot. To fix both above issues implement the unhash() routine for TLS. v4: - add note about tls offload still needing the fix; - move sk_proto to the cold cache line; - split TX context free into "release" and "free", otherwise the GC work itself is in already freed memory; - more TX before RX for consistency; - reuse tls_ctx_free(); - schedule the GC work after we're done with context to avoid UAF; - don't set the unhash in all modes, all modes "inherit" TLS_BASE's callbacks anyway; - disable the unhash hook for TLS_HW. Fixes: 3c4d7559159bf ("tls: kernel TLS support") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22net/tls: remove sock unlock/lock around strp_done()John Fastabend1-3/+4
The tls close() callback currently drops the sock lock to call strp_done(). Split up the RX cleanup into stopping the strparser and releasing most resources, syncing strparser and finally freeing the context. To avoid the need for a strp_done() call on the cleanup path of device offload make sure we don't arm the strparser until we are sure init will be successful. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22net/tls: remove close callback sock unlock/lock around TX work flushJohn Fastabend1-0/+2
The tls close() callback currently drops the sock lock, makes a cancel_delayed_work_sync() call, and then relocks the sock. By restructuring the code we can avoid droping lock and then reclaiming it. To simplify this we do the following, tls_sk_proto_close set_bit(CLOSING) set_bit(SCHEDULE) cancel_delay_work_sync() <- cancel workqueue lock_sock(sk) ... release_sock(sk) strp_done() Setting the CLOSING bit prevents the SCHEDULE bit from being cleared by any workqueue items e.g. if one happens to be scheduled and run between when we set SCHEDULE bit and cancel work. Then because SCHEDULE bit is set now no new work will be scheduled. Tested with net selftests and bpf selftests. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22net/tls: don't arm strparser immediately in tls_set_sw_offload()Jakub Kicinski1-0/+1
In tls_set_device_offload_rx() we prepare the software context for RX fallback and proceed to add the connection to the device. Unfortunately, software context prep includes arming strparser so in case of a later error we have to release the socket lock to call strp_done(). In preparation for not releasing the socket lock half way through callbacks move arming strparser into a separate function. Following patches will make use of that. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-21tcp: be more careful in tcp_fragment()Eric Dumazet1-0/+5
Some applications set tiny SO_SNDBUF values and expect TCP to just work. Recent patches to address CVE-2019-11478 broke them in case of losses, since retransmits might be prevented. We should allow these flows to make progress. This patch allows the first and last skb in retransmit queue to be split even if memory limits are hit. It also adds the some room due to the fact that tcp_sendmsg() and tcp_sendpage() might overshoot sk_wmem_queued by about one full TSO skb (64KB size). Note this allowance was already present in stable backports for kernels < 4.15 Note for < 4.15 backports : tcp_rtx_queue_tail() will probably look like : static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk) { struct sk_buff *skb = tcp_send_head(sk); return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk); } Fixes: f070ef2ac667 ("tcp: tcp_fragment() should apply sane memory limits") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrew Prout <aprout@ll.mit.edu> Tested-by: Andrew Prout <aprout@ll.mit.edu> Tested-by: Jonathan Lemon <jonathan.lemon@gmail.com> Tested-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Christoph Paasch <cpaasch@apple.com> Cc: Jonathan Looney <jtl@netflix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21connector: remove redundant input callback from cn_devVasily Averin1-1/+0
A small cleanup: this callback is never used. Originally fixed by Stanislav Kinsburskiy <skinsbursky@virtuozzo.com> for OpenVZ7 bug OVZ-6877 cc: stanislav.kinsburskiy@gmail.com Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21kbuild: add net/netfilter/nf_tables_offload.h to header-test blacklist.Jeremy Sowden1-0/+1
net/netfilter/nf_tables_offload.h includes net/netfilter/nf_tables.h which is itself on the blacklist. Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21Merge tag 'ntb-5.3' of git://github.com/jonmason/ntbLinus Torvalds3-3/+214
Pull NTB updates from Jon Mason: "New feature to add support for NTB virtual MSI interrupts, the ability to test and use this feature in the NTB transport layer. Also, bug fixes for the AMD and Switchtec drivers, as well as some general patches" * tag 'ntb-5.3' of git://github.com/jonmason/ntb: (22 commits) NTB: Describe the ntb_msi_test client in the documentation. NTB: Add MSI interrupt support to ntb_transport NTB: Add ntb_msi_test support to ntb_test NTB: Introduce NTB MSI Test Client NTB: Introduce MSI library NTB: Rename ntb.c to support multiple source files in the module NTB: Introduce functions to calculate multi-port resource index NTB: Introduce helper functions to calculate logical port number PCI/switchtec: Add module parameter to request more interrupts PCI/MSI: Support allocating virtual MSI interrupts ntb_hw_switchtec: Fix setup MW with failure bug ntb_hw_switchtec: Skip unnecessary re-setup of shared memory window for crosslink case ntb_hw_switchtec: Remove redundant steps of switchtec_ntb_reinit_peer() function NTB: correct ntb_dev_ops and ntb_dev comment typos NTB: amd: Silence shift wrapping warning in amd_ntb_db_vector_mask() ntb_hw_switchtec: potential shift wrapping bug in switchtec_ntb_init_sndev() NTB: ntb_transport: Ensure qp->tx_mw_dma_addr is initaliazed NTB: ntb_hw_amd: set peer limit register NTB: ntb_perf: Clear stale values in doorbell and command SPAD register NTB: ntb_perf: Disable NTB link after clearing peer XLAT registers ...
2019-07-20nl80211: fix NL80211_HE_MAX_CAPABILITY_LENJohn Crispin1-1/+1
NL80211_HE_MAX_CAPABILITY_LEN has changed between D2.0 and D4.0. It is now MAC (6) + PHY (11) + MCS (12) + PPE (25) = 54. Signed-off-by: John Crispin <john@phrozen.org> Link: https://lore.kernel.org/r/20190627095832.19445-1-john@phrozen.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-20nl80211: fix VENDOR_CMD_RAW_DATAJohannes Berg1-1/+1
Since ERR_PTR() is an inline, not a macro, just open-code it here so it's usable as an initializer, fixing the build in brcmfmac. Reported-by: Arend Van Spriel <arend.vanspriel@broadcom.com> Fixes: 901bb9891855 ("nl80211: require and validate vendor command policy") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-07-20Merge tag 'dma-mapping-5.3-1' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds2-0/+23
Pull dma-mapping fixes from Christoph Hellwig: "Fix various regressions: - force unencrypted dma-coherent buffers if encryption bit can't fit into the dma coherent mask (Tom Lendacky) - avoid limiting request size if swiotlb is not used (me) - fix swiotlb handling in dma_direct_sync_sg_for_cpu/device (Fugang Duan)" * tag 'dma-mapping-5.3-1' of git://git.infradead.org/users/hch/dma-mapping: dma-direct: correct the physical addr in dma_direct_sync_sg_for_cpu/device dma-direct: only limit the mapping size if swiotlb could be used dma-mapping: add a dma_addressing_limited helper dma-direct: Force unencrypted DMA under SME for certain DMA masks
2019-07-20Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds3-0/+11
Pull core fixes from Thomas Gleixner: - A collection of objtool fixes which address recent fallout partially exposed by newer toolchains, clang, BPF and general code changes. - Force USER_DS for user stack traces [ Note: the "objtool fixes" are not all to objtool itself, but for kernel code that triggers objtool warnings. Things like missing function size annotations, or code that confuses the unwinder etc. - Linus] * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits) objtool: Support conditional retpolines objtool: Convert insn type to enum objtool: Fix seg fault on bad switch table entry objtool: Support repeated uses of the same C jump table objtool: Refactor jump table code objtool: Refactor sibling call detection logic objtool: Do frame pointer check before dead end check objtool: Change dead_end_function() to return boolean objtool: Warn on zero-length functions objtool: Refactor function alias logic objtool: Track original function across branches objtool: Add mcsafe_handle_tail() to the uaccess safe list bpf: Disable GCC -fgcse optimization for ___bpf_prog_run() x86/uaccess: Remove redundant CLACs in getuser/putuser error paths x86/uaccess: Don't leak AC flag into fentry from mcsafe_handle_tail() x86/uaccess: Remove ELF function annotation from copy_user_handle_tail() x86/head/64: Annotate start_cpu0() as non-callable x86/entry: Fix thunk function ELF sizes x86/kvm: Don't call kvm_spurious_fault() from .fixup x86/kvm: Replace vmx_vmenter()'s call to kvm_spurious_fault() with UD2 ...
2019-07-20Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2-0/+7
Pull more KVM updates from Paolo Bonzini: "Mostly bugfixes, but also: - s390 support for KVM selftests - LAPIC timer offloading to housekeeping CPUs - Extend an s390 optimization for overcommitted hosts to all architectures - Debugging cleanups and improvements" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits) KVM: x86: Add fixed counters to PMU filter KVM: nVMX: do not use dangling shadow VMCS after guest reset KVM: VMX: dump VMCS on failed entry KVM: x86/vPMU: refine kvm_pmu err msg when event creation failed KVM: s390: Use kvm_vcpu_wake_up in kvm_s390_vcpu_wakeup KVM: Boost vCPUs that are delivering interrupts KVM: selftests: Remove superfluous define from vmx.c KVM: SVM: Fix detection of AMD Errata 1096 KVM: LAPIC: Inject timer interrupt via posted interrupt KVM: LAPIC: Make lapic timer unpinned KVM: x86/vPMU: reset pmc->counter to 0 for pmu fixed_counters KVM: nVMX: Ignore segment base for VMX memory operand when segment not FS or GS kvm: x86: ioapic and apic debug macros cleanup kvm: x86: some tsc debug cleanup kvm: vmx: fix coccinelle warnings x86: kvm: avoid constant-conversion warning x86: kvm: avoid -Wsometimes-uninitized warning KVM: x86: expose AVX512_BF16 feature to guest KVM: selftests: enable pgste option for the linker on s390 KVM: selftests: Move kvm_create_max_vcpus test to generic code ...
2019-07-20Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds1-0/+3
Pull SCSI fixes from James Bottomley: "This is the final round of mostly small fixes in our initial submit. It's mostly minor fixes and driver updates. The only change of note is adding a virt_boundary_mask to the SCSI host and host template to parametrise this for NVMe devices instead of having them do a call in slave_alloc. It's a fairly straightforward conversion except in the two NVMe handling drivers that didn't set it who now have a virtual infinity parameter added" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (24 commits) scsi: megaraid_sas: set an unlimited max_segment_size scsi: mpt3sas: set an unlimited max_segment_size for SAS 3.0 HBAs scsi: IB/srp: set virt_boundary_mask in the scsi host scsi: IB/iser: set virt_boundary_mask in the scsi host scsi: storvsc: set virt_boundary_mask in the scsi host template scsi: ufshcd: set max_segment_size in the scsi host template scsi: core: take the DMA max mapping size into account scsi: core: add a host / host template field for the virt boundary scsi: core: Fix race on creating sense cache scsi: sd_zbc: Fix compilation warning scsi: libfc: fix null pointer dereference on a null lport scsi: zfcp: fix GCC compiler warning emitted with -Wmaybe-uninitialized scsi: zfcp: fix request object use-after-free in send path causing wrong traces scsi: zfcp: fix request object use-after-free in send path causing seqno errors scsi: megaraid_sas: Update driver version to 07.710.50.00 scsi: megaraid_sas: Add module parameter for FW Async event logging scsi: megaraid_sas: Enable msix_load_balance for Invader and later controllers scsi: megaraid_sas: Fix calculation of target ID scsi: lpfc: reduce stack size with CONFIG_GCC_PLUGIN_STRUCTLEAK_VERBOSE scsi: devinfo: BLIST_TRY_VPD_PAGES for SanDisk Cruzer Blade ...
2019-07-20Merge tag 'kbuild-v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuildLinus Torvalds1-8/+6
Pull more Kbuild updates from Masahiro Yamada: - match the directory structure of the linux-libc-dev package to that of Debian-based distributions - fix incorrect include/config/auto.conf generation when Kconfig creates it along with the .config file - remove misleading $(AS) from documents - clean up precious tag files by distclean instead of mrproper - add a new coccinelle patch for devm_platform_ioremap_resource migration - refactor module-related scripts to read modules.order instead of $(MODVERDIR)/*.mod files to get the list of created modules - remove MODVERDIR - update list of header compile-test - add -fcf-protection=none flag to avoid conflict with the retpoline flags when CONFIG_RETPOLINE=y - misc cleanups * tag 'kbuild-v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (25 commits) kbuild: add -fcf-protection=none when using retpoline flags kbuild: update compile-test header list for v5.3-rc1 kbuild: split out *.mod out of {single,multi}-used-m rules kbuild: remove 'prepare1' target kbuild: remove the first line of *.mod files kbuild: create *.mod with full directory path and remove MODVERDIR kbuild: export_report: read modules.order instead of .tmp_versions/*.mod kbuild: modpost: read modules.order instead of $(MODVERDIR)/*.mod kbuild: modsign: read modules.order instead of $(MODVERDIR)/*.mod kbuild: modinst: read modules.order instead of $(MODVERDIR)/*.mod scsi: remove pointless $(MODVERDIR)/$(obj)/53c700.ver kbuild: remove duplication from modules.order in sub-directories kbuild: get rid of kernel/ prefix from in-tree modules.{order,builtin} kbuild: do not create empty modules.order in the prepare stage coccinelle: api: add devm_platform_ioremap_resource script kbuild: compile-test headers listed in header-test-m as well kbuild: remove unused hostcc-option kbuild: remove tag files by distclean instead of mrproper kbuild: add --hash-style= and --build-id unconditionally kbuild: get rid of misleading $(AS) from documents ...
2019-07-20Merge branch 'work.dcache2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-1/+0
Pull dcache and mountpoint updates from Al Viro: "Saner handling of refcounts to mountpoints. Transfer the counting reference from struct mount ->mnt_mountpoint over to struct mountpoint ->m_dentry. That allows us to get rid of the convoluted games with ordering of mount shutdowns. The cost is in teaching shrink_dcache_{parent,for_umount} to cope with mixed-filesystem shrink lists, which we'll also need for the Slab Movable Objects patchset" * 'work.dcache2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: switch the remnants of releasing the mountpoint away from fs_pin get rid of detach_mnt() make struct mountpoint bear the dentry reference to mountpoint, not struct mount Teach shrink_dcache_parent() to cope with mixed-filesystem shrink lists fs/namespace.c: shift put_mountpoint() to callers of unhash_mnt() __detach_mounts(): lookup_mountpoint() can't return ERR_PTR() anymore nfs: dget_parent() never returns NULL ceph: don't open-code the check for dead lockref
2019-07-20KVM: Boost vCPUs that are delivering interruptsWanpeng Li1-0/+1
Inspired by commit 9cac38dd5d (KVM/s390: Set preempted flag during vcpu wakeup and interrupt delivery), we want to also boost not just lock holders but also vCPUs that are delivering interrupts. Most smp_call_function_many calls are synchronous, so the IPI target vCPUs are also good yield candidates. This patch introduces vcpu->ready to boost vCPUs during wakeup and interrupt delivery time; unlike s390 we do not reuse vcpu->preempted so that voluntarily preempted vCPUs are taken into account by kvm_vcpu_on_spin, but vmx_vcpu_pi_put is not affected (VT-d PI handles voluntary preemption separately, in pi_pre_block). Testing on 80 HT 2 socket Xeon Skylake server, with 80 vCPUs VM 80GB RAM: ebizzy -M vanilla boosting improved 1VM 21443 23520 9% 2VM 2800 8000 180% 3VM 1800 3100 72% Testing on my Haswell desktop 8 HT, with 8 vCPUs VM 8GB RAM, two VMs, one running ebizzy -M, the other running 'stress --cpu 2': w/ boosting + w/o pv sched yield(vanilla) vanilla boosting improved 1570 4000 155% w/ boosting + w/ pv sched yield(vanilla) vanilla boosting improved 1844 5157 179% w/o boosting, perf top in VM: 72.33% [kernel] [k] smp_call_function_many 4.22% [kernel] [k] call_function_i 3.71% [kernel] [k] async_page_fault w/ boosting, perf top in VM: 38.43% [kernel] [k] smp_call_function_many 6.31% [kernel] [k] async_page_fault 6.13% libc-2.23.so [.] __memcpy_avx_unaligned 4.88% [kernel] [k] call_function_interrupt Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Marc Zyngier <maz@kernel.org> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20KVM: LAPIC: Inject timer interrupt via posted interruptWanpeng Li1-0/+6
Dedicated instances are currently disturbed by unnecessary jitter due to the emulated lapic timers firing on the same pCPUs where the vCPUs reside. There is no hardware virtual timer on Intel for guest like ARM, so both programming timer in guest and the emulated timer fires incur vmexits. This patch tries to avoid vmexit when the emulated timer fires, at least in dedicated instance scenario when nohz_full is enabled. In that case, the emulated timers can be offload to the nearest busy housekeeping cpus since APICv has been found for several years in server processors. The guest timer interrupt can then be injected via posted interrupts, which are delivered by the housekeeping cpu once the emulated timer fires. The host should tuned so that vCPUs are placed on isolated physical processors, and with several pCPUs surplus for busy housekeeping. If disabled mwait/hlt/pause vmexits keep the vCPUs in non-root mode, ~3% redis performance benefit can be observed on Skylake server, and the number of external interrupt vmexits drops substantially. Without patch VM-EXIT Samples Samples% Time% Min Time Max Time Avg time EXTERNAL_INTERRUPT 42916 49.43% 39.30% 0.47us 106.09us 0.71us ( +- 1.09% ) While with patch: VM-EXIT Samples Samples% Time% Min Time Max Time Avg time EXTERNAL_INTERRUPT 6871 9.29% 2.96% 0.44us 57.88us 0.72us ( +- 4.02% ) Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-19net: flow_offload: add flow_block structure and use itPablo Neira Ayuso3-4/+15
This object stores the flow block callbacks that are attached to this block. Update flow_block_cb_lookup() to take this new object. This patch restores the block sharing feature. Fixes: da3eeb904ff4 ("net: flow_offload: add list handling functions") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-19net: flow_offload: rename tc_setup_cb_t to flow_setup_cb_tPablo Neira Ayuso3-13/+15
Rename this type definition and adapt users. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-19net: flow_offload: remove netns parameter from flow_block_cb_alloc()Pablo Neira Ayuso1-2/+1
No need to annotate the netns on the flow block callback object, flow_block_cb_is_busy() already checks for used blocks. Fixes: d63db30c8537 ("net: flow_offload: add flow_block_cb_alloc() and flow_block_cb_free()") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller3-5/+11
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Fix a deadlock when module is requested via netlink_bind() in nfnetlink, from Florian Westphal. 2) Fix ipt_rpfilter and ip6t_rpfilter with VRF, from Miaohe Lin. 3) Skip master comparison in SIP helper to fix expectation clash under two valid scenarios, from xiao ruizhu. 4) Remove obsolete comments in nf_conntrack codebase, from Yonatan Goldschmidt. 5) Fix redirect extension module autoload, from Christian Hesse. 6) Fix incorrect mssg option sent to client in synproxy, from Fernando Fernandez. 7) Fix incorrect window calculations in TCP conntrack, from Florian Westphal. 8) Don't bail out when updating basechain policy due to recent offload works, also from Florian. 9) Allow symhash to use modulus 1 as other hash extensions do, from Laura.Garcia. 10) Missing NAT chain module autoload for the inet family, from Phil Sutter. 11) Fix missing adjustment of TCP RST packet in synproxy, from Fernando Fernandez. 12) Skip EAGAIN path when nft_meta_bridge is built-in or not selected. 13) Conntrack bridge does not depend on nf_tables_bridge. 14) Turn NF_TABLES_BRIDGE into tristate to fix possible link break of nft_meta_bridge, from Arnd Bergmann. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-20kbuild: update compile-test header list for v5.3-rc1Masahiro Yamada1-8/+6
- Some headers graduated from the blacklist - hyperv_timer.h joined the header-test when CONFIG_X86=y - nf_tables*.h joined the header-test when CONFIG_NF_TABLES is enabled. - The entry for nf_tables_offload.h was added to fix build error for the combination of CONFIG_NF_TABLES=n and CONFIG_KERNEL_HEADER_TEST=y. - The entry for iomap.h was added because this header is supposed to be included only when CONFIG_BLOCK=y Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-07-19Merge tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/socLinus Torvalds1-41/+0
Pull ARM Devicetree updates from Olof Johansson: "We continue to see a lot of new material. I've highlighted some of it below, but there's been more beyond that as well. One of the sweeping changes is that many boards have seen their ARM Mali GPU devices added to device trees, since the DRM drivers have now been merged. So, with the caveat that I have surely missed several great contributions, here's a collection of the material this time around: New SoCs: - Mediatek mt8183 (4x Cortex-A73 + 4x Cortex-A53) - TI J721E (2x Cortex-A72 + 3x Cortex-R5F + 3 DSPs + MMA) - Amlogic G12B (4x Cortex-A73 + 2x Cortex-A53) New Boards / platforms: - Aspeed BMC support for a number of new server platforms - Kontron SMARC SoM (several i.MX6 versions) - Novtech's Meerkat96 (i.MX7) - ST Micro Avenger96 board - Hardkernel ODROID-N2 (Amlogic G12B) - Purism Librem5 devkit (i.MX8MQ) - Google Cheza (Qualcomm SDM845) - Qualcomm Dragonboard 845c (Qualcomm SDM845) - Hugsun X99 TV Box (Rockchip RK3399) - Khadas Edge/Edge-V/Captain (Rockchip RK3399) Updated / expanded boards and platforms: - Renesas r7s9210 has a lot of new peripherals added - Fixes and polish for Rockchip-based Chromebooks - Amlogic G12A has a lot of peripherals added - Nvidia Jetson Nano sees various fixes and improvements, and is now at feature parity with TX1" * tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (586 commits) ARM: dts: gemini: Set DIR-685 SPI CS as active low ARM: dts: exynos: Adjust buck[78] regulators to supported values on Arndale Octa ARM: dts: exynos: Adjust buck[78] regulators to supported values on Odroid XU3 family ARM: dts: exynos: Move Mali400 GPU node to "/soc" ARM: dts: exynos: Fix imprecise abort on Mali GPU probe on Exynos4210 arm64: dts: qcom: qcs404: Add missing space for cooling-cells property arm64: dts: rockchip: Fix USB3 Type-C on rk3399-sapphire arm64: dts: rockchip: Update DWC3 modules on RK3399 SoCs arm64: dts: rockchip: enable rk3328 watchdog clock ARM: dts: rockchip: add display nodes for rk322x ARM: dts: rockchip: fix vop iommu-cells on rk322x arm64: dts: rockchip: Add support for Hugsun X99 TV Box arm64: dts: rockchip: Define values for the IPA governor for rock960 arm64: dts: rockchip: Fix multiple thermal zones conflict in rk3399.dtsi arm64: dts: rockchip: add core dtsi file for RK3399Pro SoCs arm64: dts: rockchip: improve rk3328-roc-cc rgmii performance. Revert "ARM: dts: rockchip: set PWM delay backlight settings for Minnie" ARM: dts: rockchip: Configure BT_DEV_WAKE in on rk3288-veyron arm64: dts: qcom: sdm845-cheza: add initial cheza dt ARM: dts: msm8974-FP2: Add vibration motor ...
2019-07-19Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/socLinus Torvalds9-172/+375
Pull ARM SoC-related driver updates from Olof Johansson: "Various driver updates for platforms and a couple of the small driver subsystems we merge through our tree: - A driver for SCU (system control) on NXP i.MX8QXP - Qualcomm Always-on Subsystem messaging driver (AOSS QMP) - Qualcomm PM support for MSM8998 - Support for a newer version of DRAM PHY driver for Broadcom (DPFE) - Reset controller support for Bitmain BM1880 - TI SCI (System Control Interface) support for CPU control on AM654 processors - More TI sysc refactoring and rework" * tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (84 commits) reset: remove redundant null check on pointer dev soc: rockchip: work around clang warning dt-bindings: reset: imx7: Fix the spelling of 'indices' soc: imx: Add i.MX8MN SoC driver support soc: aspeed: lpc-ctrl: Fix probe error handling soc: qcom: geni: Add support for ACPI firmware: ti_sci: Fix gcc unused-but-set-variable warning firmware: ti_sci: Use the correct style for SPDX License Identifier soc: imx8: Use existing of_root directly soc: imx8: Fix potential kernel dump in error path firmware/psci: psci_checker: Park kthreads before stopping them memory: move jedec_ddr.h from include/memory to drivers/memory/ memory: move jedec_ddr_data.c from lib/ to drivers/memory/ MAINTAINERS: Remove myself as qcom maintainer soc: aspeed: lpc-ctrl: make parameter optional soc: qcom: apr: Don't use reg for domain id soc: qcom: fix QCOM_AOSS_QMP dependency and build errors memory: tegra: Fix -Wunused-const-variable firmware: tegra: Early resume BPMP soc/tegra: Select pinctrl for Tegra194 ...
2019-07-19Merge tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/socLinus Torvalds1-28/+0
Pull ARM SoC platform updates from Olof Johansson: "SoC platform changes. Main theme this merge window: - The Netx platform (Netx 100/500) platform is removed by Linus Walleij-- the SoC doesn't have active maintainers with hardware, and in discussions with the vendor the agreement was that it's OK to remove. - Russell King has a series of patches that cleans up and refactors SA1101 and RiscPC support" * tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (47 commits) ARM: stm32: use "depends on" instead of "if" after prompt ARM: sa1100: convert to common clock framework ARM: exynos: Cleanup cppcheck shifting warning ARM: pxa/lubbock: remove lubbock_set_misc_wr() from global view ARM: exynos: Only build MCPM support if used arm: add missing include platform-data/atmel.h ARM: davinci: Use GPIO lookup table for DA850 LEDs ARM: OMAP2: drop explicit assembler architecture ARM: use arch_extension directive instead of arch argument ARM: imx: Switch imx7d to imx-cpufreq-dt for speed-grading ARM: bcm: Enable PINCTRL for ARCH_BRCMSTB ARM: bcm: Enable ARCH_HAS_RESET_CONTROLLER for ARCH_BRCMSTB ARM: riscpc: enable chained scatterlist support ARM: riscpc: reduce IRQ handling code ARM: riscpc: move RiscPC assembly files from arch/arm/lib to mach-rpc ARM: riscpc: parse video information from tagged list ARM: riscpc: add ecard quirk for Atomwide 3port serial card MAINTAINERS: mvebu: Add git entry soc: ti: pm33xx: Add a print while entering RTC only mode with DDR in self-refresh ARM: OMAP2+: Make some variables static ...
2019-07-19Merge tag 'drm-next-2019-07-19' of git://anongit.freedesktop.org/drm/drmLinus Torvalds2-2/+7
Pull drm fixes from Daniel Vetter: "Dave is back in shape, but now family got it so I'm doing the pull. Two things worthy of note: - nouveau feature pull was way too late, Dave&me decided to not take that, so Ben spun up a pull with just the fixes. - after some chatting with the arm display maintainers we decided to change a bit how that's maintained, for more oversight/review and cross vendor collab. More details below: nouveau: - bugfixes - TU116 enabling (minor iteration) :w amdgpu: - large pile of fixes for new hw support this release (navi, vega20) - audio hotplug fix - bunch of corner cases and small fixes all over for amdgpu/kfd komeda: - back out some new properties (from this merge window) that needs more pondering. bochs: - fb pitch setup core: - a new panel quirk - misc fixes" * tag 'drm-next-2019-07-19' of git://anongit.freedesktop.org/drm/drm: (73 commits) drm/nouveau/secboot/gp102-: remove WAR for SEC2 RTOS start bug drm/nouveau/flcn/gp102-: improve implementation of bind_context() on SEC2/GSP drm/nouveau: fix memory leak in nouveau_conn_reset() drm/nouveau/dmem: missing mutex_lock in error path drm/nouveau/hwmon: return EINVAL if the GPU is powered down for sensors reads drm/nouveau: fix bogus GPL-2 license header drm/nouveau: fix bogus GPL-2 license header drm/nouveau/i2c: Enable i2c pads & busses during preinit drm/nouveau/disp/tu102-: wire up scdc parameter setter drm/nouveau/core: recognise TU116 chipset drm/nouveau/kms: disallow dual-link harder if hdmi connection detected drm/nouveau/disp/nv50-: fix center/aspect-corrected scaling drm/nouveau/disp/nv50-: force scaler for any non-default LVDS/eDP modes drm/nouveau/mcp89/mmu: Use mcp77_mmu_new instead of g84_mmu_new on MCP89. drm/amd/display: init res_pool dccg_ref, dchub_ref with xtalin_freq drm/amdgpu/pm: remove check for pp funcs in freq sysfs handlers drm/amd/display: Force uclk to max for every state drm/amdkfd: Remove GWS from process during uninit drm/amd/amdgpu: Fix offset for vmid selection in debugfs interface drm/amd/powerplay: update vega20 driver if to fit latest SMU firmware ...
2019-07-19Merge tag 'for-linus-5.3a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tipLinus Torvalds3-29/+2
Pull xen updates from Juergen Gross: "Fixes and features: - A series to introduce a common command line parameter for disabling paravirtual extensions when running as a guest in virtualized environment - A fix for int3 handling in Xen pv guests - Removal of the Xen-specific tmem driver as support of tmem in Xen has been dropped (and it was experimental only) - A security fix for running as Xen dom0 (XSA-300) - A fix for IRQ handling when offlining cpus in Xen guests - Some small cleanups" * tag 'for-linus-5.3a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: let alloc_xenballooned_pages() fail if not enough memory free xen/pv: Fix a boot up hang revealed by int3 self test x86/xen: Add "nopv" support for HVM guest x86/paravirt: Remove const mark from x86_hyper_xen_hvm variable xen: Map "xen_nopv" parameter to "nopv" and mark it obsolete x86: Add "nopv" parameter to disable PV extensions x86/xen: Mark xen_hvm_need_lapic() and xen_x2apic_para_available() as __init xen: remove tmem driver Revert "x86/paravirt: Set up the virt_spin_lock_key after static keys get initialized" xen/events: fix binding user event channels to cpus
2019-07-19Merge tag 'iomap-5.3-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-0/+17
Pull iomap split/cleanup from Darrick Wong: "As promised, here's the second part of the iomap merge for 5.3, in which we break up iomap.c into smaller files grouped by functional area so that it'll be easier in the long run to maintain cohesiveness of code units and to review incoming patches. There are no functional changes and fs/iomap.c split cleanly. Summary: - Regroup the fs/iomap.c code by major functional area so that we can start development for 5.4 from a more stable base" * tag 'iomap-5.3-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: move internal declarations into fs/iomap/ iomap: move the main iteration code into a separate file iomap: move the buffered IO code into a separate file iomap: move the direct IO code into a separate file iomap: move the SEEK_HOLE code into a separate file iomap: move the file mapping reporting code into a separate file iomap: move the swapfile code into a separate file iomap: start moving code to fs/iomap/
2019-07-19Merge branch 'work.adfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-3/+3
Pull adfs updates from Al Viro: "More ADFS patches from Russell King" * 'work.adfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs/adfs: add time stamp and file type helpers fs/adfs: super: limit idlen according to directory type fs/adfs: super: fix use-after-free bug fs/adfs: super: safely update options on remount fs/adfs: super: correct superblock flags fs/adfs: clean up indirect disc addresses and fragment IDs fs/adfs: clean up error message printing fs/adfs: use %pV for error messages fs/adfs: use format_version from disc_record fs/adfs: add helper to get filesystem size fs/adfs: add helper to get discrecord from map fs/adfs: correct disc record structure
2019-07-19Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds8-26/+30
Pull vfs mount updates from Al Viro: "The first part of mount updates. Convert filesystems to use the new mount API" * 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) mnt_init(): call shmem_init() unconditionally constify ksys_mount() string arguments don't bother with registering rootfs init_rootfs(): don't bother with init_ramfs_fs() vfs: Convert smackfs to use the new mount API vfs: Convert selinuxfs to use the new mount API vfs: Convert securityfs to use the new mount API vfs: Convert apparmorfs to use the new mount API vfs: Convert openpromfs to use the new mount API vfs: Convert xenfs to use the new mount API vfs: Convert gadgetfs to use the new mount API vfs: Convert oprofilefs to use the new mount API vfs: Convert ibmasmfs to use the new mount API vfs: Convert qib_fs/ipathfs to use the new mount API vfs: Convert efivarfs to use the new mount API vfs: Convert configfs to use the new mount API vfs: Convert binfmt_misc to use the new mount API convenience helper: get_tree_single() convenience helper get_tree_nodev() vfs: Kill sget_userns() ...
2019-07-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds4-5/+6
Pull networking fixes from David Miller: 1) Fix AF_XDP cq entry leak, from Ilya Maximets. 2) Fix handling of PHY power-down on RTL8411B, from Heiner Kallweit. 3) Add some new PCI IDs to iwlwifi, from Ihab Zhaika. 4) Fix handling of neigh timers wrt. entries added by userspace, from Lorenzo Bianconi. 5) Various cases of missing of_node_put(), from Nishka Dasgupta. 6) The new NET_ACT_CT needs to depend upon NF_NAT, from Yue Haibing. 7) Various RDS layer fixes, from Gerd Rausch. 8) Fix some more fallout from TCQ_F_CAN_BYPASS generalization, from Cong Wang. 9) Fix FIB source validation checks over loopback, also from Cong Wang. 10) Use promisc for unsupported number of filters, from Justin Chen. 11) Missing sibling route unlink on failure in ipv6, from Ido Schimmel. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (90 commits) tcp: fix tcp_set_congestion_control() use from bpf hook ag71xx: fix return value check in ag71xx_probe() ag71xx: fix error return code in ag71xx_probe() usb: qmi_wwan: add D-Link DWM-222 A2 device ID bnxt_en: Fix VNIC accounting when enabling aRFS on 57500 chips. net: dsa: sja1105: Fix missing unlock on error in sk_buff() gve: replace kfree with kvfree selftests/bpf: fix test_xdp_noinline on s390 selftests/bpf: fix "valid read map access into a read-only array 1" on s390 net/mlx5: Replace kfree with kvfree MAINTAINERS: update netsec driver ipv6: Unlink sibling route in case of failure liquidio: Replace vmalloc + memset with vzalloc udp: Fix typo in net/ipv4/udp.c net: bcmgenet: use promisc for unsupported filters ipv6: rt6_check should return NULL if 'from' is NULL tipc: initialize 'validated' field of received packets selftests: add a test case for rp_filter fib: relax source validation check for loopback packets mlxsw: spectrum: Do not process learned records with a dummy FID ...
2019-07-19Merge branch 'akpm' (patches from Andrew)Linus Torvalds8-73/+130
Merge yet more updates from Andrew Morton: "The rest of MM and a kernel-wide procfs cleanup. Summary of the more significant patches: - Patch series "mm/memory_hotplug: Factor out memory block devicehandling", v3. David Hildenbrand. Some spring-cleaning of the memory hotplug code, notably in drivers/base/memory.c - "mm: thp: fix false negative of shmem vma's THP eligibility". Yang Shi. Fix /proc/pid/smaps output for THP pages used in shmem. - "resource: fix locking in find_next_iomem_res()" + 1. Nadav Amit. Bugfix and speedup for kernel/resource.c - Patch series "mm: Further memory block device cleanups", David Hildenbrand. More spring-cleaning of the memory hotplug code. - Patch series "mm: Sub-section memory hotplug support". Dan Williams. Generalise the memory hotplug code so that pmem can use it more completely. Then remove the hacks from the libnvdimm code which were there to work around the memory-hotplug code's constraints. - "proc/sysctl: add shared variables for range check", Matteo Croce. We have about 250 instances of int zero; ... .extra1 = &zero, in the tree. This is a tree-wide sweep to make all those private "zero"s and "one"s use global variables. Alas, it isn't practical to make those two global integers const" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (38 commits) proc/sysctl: add shared variables for range check mm: migrate: remove unused mode argument mm/sparsemem: cleanup 'section number' data types libnvdimm/pfn: stop padding pmem namespaces to section alignment libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields mm/devm_memremap_pages: enable sub-section remap mm: document ZONE_DEVICE memory-model implications mm/sparsemem: support sub-section hotplug mm/sparsemem: prepare for sub-section ranges mm: kill is_dev_zone() helper mm/hotplug: kill is_dev_zone() usage in __remove_pages() mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap() mm/hotplug: prepare shrink_{zone, pgdat}_span for sub-section removal mm/sparsemem: add helpers track active portions of a section at boot mm/sparsemem: introduce a SECTION_IS_EARLY flag mm/sparsemem: introduce struct mem_section_usage drivers/base/memory.c: get rid of find_memory_block_hinted() mm/memory_hotplug: move and simplify walk_memory_blocks() mm/memory_hotplug: rename walk_memory_range() and pass start+size instead of pfns mm: make register_mem_sect_under_node() static ...
2019-07-19Merge tag 'drm-next-5.3-2019-07-18' of git://people.freedesktop.org/~agd5f/linux into drm-nextDave Airlie1-1/+6
drm-next-5.3-2019-07-18: amdgpu: - Navi DC fix for secondary adapters - Fix Navi flickering with high res panels - Navi SMU fixes - Vega20 SMU fixes - Fixes for audio hotplug on HG systems - Fix for potential integer overflows on large buffer migrations - debugfs fixes for umr - Various other small fixes amdkfd: - Apply noretry setting consistently - Fix hang in eviction - Properly clean up GWS on uninit UAPI: - clarify a comment on ctx priority Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190718211525.3374-1-alexander.deucher@amd.com
2019-07-18tcp: fix tcp_set_congestion_control() use from bpf hookEric Dumazet1-1/+2
Neal reported incorrect use of ns_capable() from bpf hook. bpf_setsockopt(...TCP_CONGESTION...) -> tcp_set_congestion_control() -> ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN) -> ns_capable_common() -> current_cred() -> rcu_dereference_protected(current->cred, 1) Accessing 'current' in bpf context makes no sense, since packets are processed from softirq context. As Neal stated : The capability check in tcp_set_congestion_control() was written assuming a system call context, and then was reused from a BPF call site. The fix is to add a new parameter to tcp_set_congestion_control(), so that the ns_capable() call is only performed under the right context. Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Lawrence Brakmo <brakmo@fb.com> Reported-by: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-18proc/sysctl: add shared variables for range checkMatteo Croce1-0/+7
In the sysctl code the proc_dointvec_minmax() function is often used to validate the user supplied value between an allowed range. This function uses the extra1 and extra2 members from struct ctl_table as minimum and maximum allowed value. On sysctl handler declaration, in every source file there are some readonly variables containing just an integer which address is assigned to the extra1 and extra2 members, so the sysctl range is enforced. The special values 0, 1 and INT_MAX are very often used as range boundary, leading duplication of variables like zero=0, one=1, int_max=INT_MAX in different source files: $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l 248 Add a const int array containing the most commonly used values, some macros to refer more easily to the correct array member, and use them instead of creating a local one for every object file. This is the bloat-o-meter output comparing the old and new binary compiled with the default Fedora config: # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164) Data old new delta sysctl_vals - 12 +12 __kstrtab_sysctl_vals - 12 +12 max 14 10 -4 int_max 16 - -16 one 68 - -68 zero 128 28 -100 Total: Before=20583249, After=20583085, chg -0.00% [mcroce@redhat.com: tipc: remove two unused variables] Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c] [arnd@arndb.de: proc/sysctl: make firmware loader table conditional] Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de [akpm@linux-foundation.org: fix fs/eventpoll.c] Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18mm: migrate: remove unused mode argumentKeith Busch1-2/+1
migrate_page_move_mapping() doesn't use the mode argument. Remove it and update callers accordingly. Link: http://lkml.kernel.org/r/20190508210301.8472-1-keith.busch@intel.com Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18libnvdimm/pfn: stop padding pmem namespaces to section alignmentDan Williams1-0/+3
Now that the mm core supports section-unaligned hotplug of ZONE_DEVICE memory, we no longer need to add padding at pfn/dax device creation time. The kernel will still honor padding established by older kernels. Link: http://lkml.kernel.org/r/156092356588.979959.6793371748950931916.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Jeff Moyer <jmoyer@redhat.com> Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [ppc64] Cc: David Hildenbrand <david@redhat.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richardw.yang@linux.intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18mm/sparsemem: support sub-section hotplugDan Williams1-1/+1
The libnvdimm sub-system has suffered a series of hacks and broken workarounds for the memory-hotplug implementation's awkward section-aligned (128MB) granularity. For example the following backtrace is emitted when attempting arch_add_memory() with physical address ranges that intersect 'System RAM' (RAM) with 'Persistent Memory' (PMEM) within a given section: # cat /proc/iomem | grep -A1 -B1 Persistent\ Memory 100000000-1ffffffff : System RAM 200000000-303ffffff : Persistent Memory (legacy) 304000000-43fffffff : System RAM 440000000-23ffffffff : Persistent Memory 2400000000-43bfffffff : Persistent Memory 2400000000-43bfffffff : namespace2.0 WARNING: CPU: 38 PID: 928 at arch/x86/mm/init_64.c:850 add_pages+0x5c/0x60 [..] RIP: 0010:add_pages+0x5c/0x60 [..] Call Trace: devm_memremap_pages+0x460/0x6e0 pmem_attach_disk+0x29e/0x680 [nd_pmem] ? nd_dax_probe+0xfc/0x120 [libnvdimm] nvdimm_bus_probe+0x66/0x160 [libnvdimm] It was discovered that the problem goes beyond RAM vs PMEM collisions as some platform produce PMEM vs PMEM collisions within a given section. The libnvdimm workaround for that case revealed that the libnvdimm section-alignment-padding implementation has been broken for a long while. A fix for that long-standing breakage introduces as many problems as it solves as it would require a backward-incompatible change to the namespace metadata interpretation. Instead of that dubious route [1], address the root problem in the memory-hotplug implementation. Note that EEXIST is no longer treated as success as that is how sparse_add_section() reports subsection collisions, it was also obviated by recent changes to perform the request_region() for 'System RAM' before arch_add_memory() in the add_memory() sequence. [1] https://lore.kernel.org/r/155000671719.348031.2347363160141119237.stgit@dwillia2-desk3.amr.corp.intel.com [osalvador@suse.de: fix deactivate_section for early sections] Link: http://lkml.kernel.org/r/20190715081549.32577-2-osalvador@suse.de Link: http://lkml.kernel.org/r/156092354368.979959.6232443923440952359.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Oscar Salvador <osalvador@suse.de> Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [ppc64] Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Wei Yang <richardw.yang@linux.intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18mm/sparsemem: prepare for sub-section rangesDan Williams1-2/+3
Prepare the memory hot-{add,remove} paths for handling sub-section ranges by plumbing the starting page frame and number of pages being handled through arch_{add,remove}_memory() to sparse_{add,remove}_one_section(). This is simply plumbing, small cleanups, and some identifier renames. No intended functional changes. Link: http://lkml.kernel.org/r/156092353780.979959.9713046515562743194.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [ppc64] Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Wei Yang <richardw.yang@linux.intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>