path: root/include
Age | Commit message | Author | Files | Lines
2018-12-12 | net: ndo_bridge_setlink: Add extack | Petr Machata | 1 | -2/+3
Drivers may not be able to implement a VLAN addition or reconfiguration. In those cases it's desirable to explain to the user that it was rejected (and why). To that end, add extack argument to ndo_bridge_setlink. Adapt all users to that change. Following patches will use the new argument in the bridge driver. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
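The idea of handing a rejection reason back through an extra argument can be sketched in userspace C. This is only an illustration under stated assumptions: `struct ext_ack` and `mock_bridge_setlink()` are hypothetical stand-ins, not the kernel's `struct netlink_ext_ack` or the real `ndo_bridge_setlink` signature, and the VLAN range check is invented for the example.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's extended-ack structure. */
struct ext_ack {
	const char *msg; /* human-readable rejection reason, or NULL */
};

/* Hypothetical driver callback: rejects VLAN IDs it cannot configure. */
static int mock_bridge_setlink(unsigned int vid, struct ext_ack *extack)
{
	if (vid == 0 || vid >= 4095) {
		/* The caller may pass NULL when it has no extack to fill. */
		if (extack)
			extack->msg = "VLAN ID out of range";
		return -22; /* same value as -EINVAL */
	}
	return 0;
}
```

A caller that has an extack gets both the error code and the reason; a caller that passes NULL still gets the error code.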
2018-12-11 | bpf: fix up uapi helper description and sync bpf header with tools | Daniel Borkmann | 1 | -6/+6
Minor markup fixup from bpf-next into net-next merge in the BPF helper description of bpf_sk_lookup_tcp() and bpf_sk_lookup_udp(). Also sync up the copy of bpf.h from tooling infrastructure. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-10 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next | David S. Miller | 5 | -43/+125
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-12-11 The following pull-request contains BPF updates for your *net-next* tree. It has three minor merge conflicts, resolutions: 1) tools/testing/selftests/bpf/test_verifier.c Take first chunk with alignment_prevented_execution. 2) net/core/filter.c [...] case bpf_ctx_range_ptr(struct __sk_buff, flow_keys): case bpf_ctx_range(struct __sk_buff, wire_len): return false; [...] 3) include/uapi/linux/bpf.h Take the second chunk for the two cases each. The main changes are: 1) Add support for BPF line info via BTF and extend libbpf as well as bpftool's program dump to annotate output with BPF C code to facilitate debugging and introspection, from Martin. 2) Add support for BPF_ALU | BPF_ARSH | BPF_{K,X} in interpreter and all JIT backends, from Jiong. 3) Improve BPF test coverage on archs with no efficient unaligned access by adding an "any alignment" flag to the BPF program load to forcefully disable verifier alignment checks, from David. 4) Add a new bpf_prog_test_run_xattr() API to libbpf which allows for proper use of BPF_PROG_TEST_RUN with data_out, from Lorenz. 5) Extend tc BPF programs to use a new __sk_buff field called wire_len for more accurate accounting of packets going to wire, from Petar. 6) Improve bpftool to allow dumping the trace pipe from it and add several improvements in bash completion and map/prog dump, from Quentin. 7) Optimize arm64 BPF JIT to always emit movn/movk/movk sequence for kernel addresses and add a dedicated BPF JIT backend allocator, from Ard. 8) Add a BPF helper function for IR remotes to report mouse movements, from Sean. 9) Various cleanups in BPF prog dump e.g. to make UAPI bpf_prog_info member naming consistent with existing conventions, from Yonghong and Song. 10) Misc cleanups and improvements in allowing to pass interface name via cmdline for xdp1 BPF example, from Matteo. 11) Fix a potential segfault in BPF sample loader's kprobes handling, from Daniel T. 
12) Fix SPDX license in libbpf's README.rst, from Andrey. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-10 | net/sched: Remove egdev mechanism | Oz Shlomo | 1 | -30/+0
The egdev mechanism was replaced by the TC indirect block notifications platform. Signed-off-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Cc: John Hurley <john.hurley@netronome.com> Cc: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-10 | net: Add netif_is_gretap()/netif_is_ip6gretap() | Oz Shlomo | 1 | -2/+11
Changed the is_gretap_dev and is_ip6gretap_dev logic from structure comparison to string comparison of the rtnl_link_ops kind field. This approach aligns with the current identification methods and function names of vxlan and geneve network devices. Convert mlxsw to use these helpers and use them in downstream mlx5 patch. Signed-off-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
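Identifying a device by the `kind` string of its link ops, as described above, might look roughly like the following. The structures are hypothetical, trimmed-down mocks rather than the kernel's `struct net_device` and `struct rtnl_link_ops`, and the literal `"gretap"` is assumed to be the kind string being compared.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mocks that keep only the fields the check needs. */
struct mock_link_ops {
	const char *kind;
};

struct mock_net_device {
	const struct mock_link_ops *rtnl_link_ops;
};

/* A device counts as gretap when its link ops carry that kind string. */
static bool mock_netif_is_gretap(const struct mock_net_device *dev)
{
	return dev->rtnl_link_ops &&
	       dev->rtnl_link_ops->kind &&
	       strcmp(dev->rtnl_link_ops->kind, "gretap") == 0;
}
```

Comparing strings instead of structure addresses means the caller does not need a reference to the tunnel module's ops structure.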
2018-12-10 | Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux | Saeed Mahameed | 10 | -332/+263
mlx5-next shared branch with rdma subtree to avoid mlx5 rdma vs. netdev conflicts. Highlights: 1) RDMA ODP (On Demand Paging) improvements and moving ODP logic to mlx5 RDMA driver 2) Improved mlx5 core driver and device events handling and provided API for upper layers to subscribe to device events. 3) RDMA only code cleanup from mlx5 core 4) Add helper to get CQE opcode 5) Rework handling of port module events 6) shared mlx5_ifc.h updates to avoid conflicts Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-10 | bpf: rename *_info_cnt to nr_*_info in bpf_prog_info | Yonghong Song | 1 | -3/+3
In uapi bpf.h, currently we have the following fields in the struct bpf_prog_info: __u32 func_info_cnt; __u32 line_info_cnt; __u32 jited_line_info_cnt; The above field names "func_info_cnt" and "line_info_cnt" also appear in union bpf_attr for program loading. The original intention is to keep the names the same between bpf_prog_info and bpf_attr so it will imply what we returned to user space will be the same as what the user space passed to the kernel. Such a naming convention in bpf_prog_info is not consistent with other fields like: __u32 nr_jited_ksyms; __u32 nr_jited_func_lens; This patch made this adjustment so in bpf_prog_info newly introduced *_info_cnt becomes nr_*_info. Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-10 | net/mlx5: Remove the get protocol device interface entry | Or Gerlitz | 1 | -2/+0
This isn't used anywhere across the mlx5 driver stack, remove it. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-10 | net/mlx5: Support extended destination format in flow steering command | Eli Britstein | 1 | -0/+2
Update the flow steering command formatting according to the extended destination API. Note that the FW dictates that multi destination FTEs that involve at least one encap must use the extended destination format, while single destination ones must use the legacy format. Using extended destination format requires FW support. Check for its capabilities and return error if not supported. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-10 | net/mlx5: E-Switch, Change vhca id valid bool field to bit flag | Eli Britstein | 1 | -1/+5
Change the driver flow destination struct to use bit flags with the vhca id valid being the 1st one. The flags field is more extendable and will be used in downstream patch. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
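Replacing a dedicated bool with a flags word could be sketched as below. All names are hypothetical (`mock_flow_dest`, `MOCK_DEST_FLAG_VHCA_ID_VALID`) and the struct is not the driver's actual destination layout; the sketch only shows why a flags field extends more easily than a bool.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag names; bit 0 marks the vhca id as valid. */
enum {
	MOCK_DEST_FLAG_VHCA_ID_VALID = 1 << 0
	/* later flags can take bits 1, 2, ... without changing the struct */
};

struct mock_flow_dest {
	uint16_t vhca_id;
	uint32_t flags; /* replaces a former "bool vhca_id_valid" member */
};

static bool mock_vhca_id_valid(const struct mock_flow_dest *dest)
{
	return (dest->flags & MOCK_DEST_FLAG_VHCA_ID_VALID) != 0;
}
```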
2018-12-10 | net/mlx5: Introduce extended destination fields | Eli Britstein | 1 | -3/+16
Extended destinations provide the ability to configure different encapsulation properties per destination on a single FTE. This is needed for use-cases such as remote mirroring over tunneled networks. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-10 | net/mlx5: Revise gre and nvgre key formats | Oz Shlomo | 1 | -2/+11
GRE RFC defines a 32 bit key field. NVGRE RFC splits the 32 bit key field to 24 bit VSID (gre_key_h) and 8 bit flow entropy (gre_key_l). Define the two key parsing alternatives in a union, thus enabling both access methods. Signed-off-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
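The two views of the key can be illustrated with plain shifts and masks. This sketch deliberately avoids a union with bitfields, since bitfield layout depends on the compiler and endianness. The helper names are hypothetical, and the key is assumed to be held in host byte order.

```c
#include <stdint.h>

/* Upper 24 bits of the 32-bit GRE key: the NVGRE VSID (gre_key_h). */
static uint32_t mock_gre_key_h(uint32_t key)
{
	return key >> 8;
}

/* Lower 8 bits of the key: the NVGRE flow entropy (gre_key_l). */
static uint8_t mock_gre_key_l(uint32_t key)
{
	return (uint8_t)(key & 0xff);
}

/* Rebuild the full 32-bit key from its two NVGRE parts. */
static uint32_t mock_gre_key_join(uint32_t vsid, uint8_t flow)
{
	return ((vsid & 0xffffff) << 8) | flow;
}
```

Splitting a key and joining the parts again yields the original 32-bit value, which is the property the union in the header relies on.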
2018-12-10 | net/mlx5: Add monitor commands layout and event data | Eyal Davidovich | 2 | -1/+87
Will be used in downstream patch to monitor counter changes by the HCA and report it to the driver by an event. The driver will update its counters cached data accordingly. Signed-off-by: Eyal Davidovich <eyald@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-09 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 32 | -86/+179
Several conflicts, seemingly all over the place. I used Stephen Rothwell's sample resolutions for many of these, if not just to double check my own work, so definitely the credit largely goes to him. The NFP conflict consisted of a bug fix (moving operations past the rhashtable operation) while changing the initial argument in the function call in the moved code. The net/dsa/master.c conflict had to do with a bug fix intermixing of making dsa_master_set_mtu() static with the fixing of the tagging attribute location. cls_flower had a conflict because the dup reject fix from Or overlapped with the addition of port range classification. __set_phy_supported()'s conflict was relatively easy to resolve because Andrew fixed it in both trees, so it was just a matter of taking the net-next copy. Or at least I think it was :-) Joe Stringer's fix to the handling of netns id 0 in bpf_sk_lookup() intermixed with changes on how the sdif and caller_net are calculated in these code paths in net-next. The remaining BPF conflicts were largely about the addition of the __bpf_md_ptr stuff in 'net' overlapping with adjustments and additions to the relevant data structure where the MD pointer macros are used. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-09 | net/mlx5: Use helper to get CQE opcode | Tariq Toukan | 1 | -0/+5
Introduce and use a helper that extracts the opcode from a CQE (completion queue entry) structure. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
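A helper of this kind might look like the sketch below. The layout is an assumption made for illustration: a single byte (called `op_own` here) with the opcode in its upper four bits. The struct is a hypothetical mock and not the driver's CQE definition.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down CQE: only the byte holding the opcode. */
struct mock_cqe {
	uint8_t op_own; /* assumed layout: opcode in bits 7..4 */
};

/* Extract the opcode so callers do not repeat the shift themselves. */
static uint8_t mock_get_cqe_opcode(const struct mock_cqe *cqe)
{
	return cqe->op_own >> 4;
}
```

Centralising the shift in one accessor means a layout change only has to be made in one place.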
2018-12-09 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | Linus Torvalds | 6 | -25/+75
Pull networking fixes from David Miller: "A decent batch of fixes here. I'd say about half are for problems that have existed for a while, and half are for new regressions added in the 4.20 merge window. 1) Fix 10G SFP phy module detection in mvpp2, from Baruch Siach. 2) Revert bogus emac driver change, from Benjamin Herrenschmidt. 3) Handle BPF exported data structure with pointers when building 32-bit userland, from Daniel Borkmann. 4) Memory leak fix in act_police, from Davide Caratti. 5) Check RX checksum offload in RX descriptors properly in aquantia driver, from Dmitry Bogdanov. 6) SKB unlink fix in various spots, from Edward Cree. 7) ndo_dflt_fdb_dump() only works with ethernet, enforce this, from Eric Dumazet. 8) Fix FID leak in mlxsw driver, from Ido Schimmel. 9) IOTLB locking fix in vhost, from Jean-Philippe Brucker. 10) Fix SKB truesize accounting in ipv4/ipv6/netfilter frag memory limits otherwise namespace exit can hang. From Jiri Wiesner. 11) Address block parsing length fixes in x25 from Martin Schiller. 12) IRQ and ring accounting fixes in bnxt_en, from Michael Chan. 13) For tun interfaces, only iface delete works with rtnl ops, enforce this by disallowing add. From Nicolas Dichtel. 14) Use after free in liquidio, from Pan Bian. 15) Fix SKB use after passing to netif_receive_skb(), from Prashant Bhole. 16) Static key accounting and other fixes in XPS from Sabrina Dubroca. 17) Partially initialized flow key passed to ip6_route_output(), from Shmulik Ladkani. 18) Fix RTNL deadlock during reset in ibmvnic driver, from Thomas Falcon. 19) Several small TCP fixes (off-by-one on window probe abort, NULL deref in tail loss probe, SNMP mis-estimations) from Yuchung Cheng" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (93 commits) net/sched: cls_flower: Reject duplicated rules also under skip_sw bnxt_en: Fix _bnxt_get_max_rings() for 57500 chips. bnxt_en: Fix NQ/CP rings accounting on the new 57500 chips. bnxt_en: Keep track of reserved IRQs. 
bnxt_en: Fix CNP CoS queue regression. net/mlx4_core: Correctly set PFC param if global pause is turned off. Revert "net/ibm/emac: wrong bit is used for STA control" neighbour: Avoid writing before skb->head in neigh_hh_output() ipv6: Check available headroom in ip6_xmit() even without options tcp: lack of available data can also cause TSO defer ipv6: sr: properly initialize flowi6 prior passing to ip6_route_output mlxsw: spectrum_switchdev: Fix VLAN device deletion via ioctl mlxsw: spectrum_router: Relax GRE decap matching check mlxsw: spectrum_switchdev: Avoid leaking FID's reference count mlxsw: spectrum_nve: Remove easily triggerable warnings ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changes sctp: frag_point sanity check tcp: fix NULL ref in tail loss probe tcp: Do not underestimate rwnd_limited net: use skb_list_del_init() to remove from RX sublists ...
2018-12-09 | media: bpf: add bpf function to report mouse movement | Sean Young | 1 | -1/+16
Some IR remotes have a directional pad or other pointer-like thing that can be used as a mouse. Make it possible to decode these types of IR protocols in BPF. Cc: netdev@vger.kernel.org Signed-off-by: Sean Young <sean@mess.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-09 | bpf: Add bpf_line_info support | Martin KaFai Lau | 5 | -0/+49
This patch adds bpf_line_info support. It accepts an array of bpf_line_info objects during BPF_PROG_LOAD. The "line_info", "line_info_cnt" and "line_info_rec_size" are added to the "union bpf_attr". The "line_info_rec_size" makes bpf_line_info extensible in the future. The new "check_btf_line()" ensures the userspace line_info is valid for the kernel to use. When the verifier is translating/patching the bpf_prog (through "bpf_patch_insn_single()"), the line_infos' insn_off is also adjusted by the newly added "bpf_adj_linfo()". If the bpf_prog is jited, this patch also provides the jited addrs (in aux->jited_linfo) for the corresponding line_info.insn_off. "bpf_prog_fill_jited_linfo()" is added to fill the aux->jited_linfo. It is currently called by the x86 jit. Other jits can also use "bpf_prog_fill_jited_linfo()" and it will be done in the followup patches. In the future, if it is deemed necessary, a particular jit could also provide its own "bpf_prog_fill_jited_linfo()" implementation. A few "*line_info*" fields are added to the bpf_prog_info such that the user can get the xlated line_info back (i.e. the line_info with its insn_off reflecting the translated prog). The jited_line_info is available if the prog is jited. It is an array of __u64. If the prog is not jited, jited_line_info_cnt is 0. The verifier's verbose log with line_info will be done in a follow up patch. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
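The offset adjustment that the message attributes to "bpf_adj_linfo()" can be pictured as follows. This is a sketch of the idea only, with a hypothetical `mock_line_info` record and helper. It assumes that when `delta` extra instructions are inserted at instruction offset `off`, every line-info entry pointing past `off` must move by `delta`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: only the instruction offset matters here. */
struct mock_line_info {
	uint32_t insn_off;
};

/*
 * After `delta` instructions are inserted at offset `off`, shift every
 * entry that refers to an instruction located after `off`.
 */
static void mock_adj_linfo(struct mock_line_info *linfo, size_t n,
			   uint32_t off, uint32_t delta)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (linfo[i].insn_off > off)
			linfo[i].insn_off += delta;
}
```

Entries at or before the patched offset keep their value, so source lines still map to the instructions they described before patching.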
2018-12-09 | Merge tag 'char-misc-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc | Linus Torvalds | 1 | -0/+7
Pull char/misc driver fixes from Greg KH: "Here are some small driver fixes for 4.20-rc6. There is a hyperv fix that for some reason took forever to get into a shape that could be applied to the tree properly, but resolves a much reported issue. The others are some gnss patches, one a bugfix and the two others updates to the MAINTAINERS file to properly match the gnss files in the tree. All have been in linux-next for a while with no reported issues" * tag 'char-misc-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: MAINTAINERS: exclude gnss from SIRFPRIMA2 regex matching MAINTAINERS: add gnss scm tree gnss: sirf: fix activation retry handling Drivers: hv: vmbus: Offload the handling of channels to two workqueues
2018-12-09 | Merge tag 'usb-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb | Linus Torvalds | 2 | -2/+3
Pull USB fixes from Greg KH: "Here are some small USB fixes for 4.20-rc6 The "largest" here are some xhci fixes for reported issues. Also here is a USB core fix, some quirk additions, and a usb-serial fix which required the export of one of the tty layer's functions to prevent code duplication. The tty maintainer agreed with this change. All of these have been in linux-next with no reported issues" * tag 'usb-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: xhci: Prevent U1/U2 link pm states if exit latency is too long xhci: workaround CSS timeout on AMD SNPS 3.0 xHC USB: check usb_get_extra_descriptor for proper size USB: serial: console: fix reported terminal settings usb: quirk: add no-LPM quirk on SanDisk Ultra Flair device USB: Fix invalid-free bug in port_over_current_notify() usb: appledisplay: Add 27" Apple Cinema Display
2018-12-09 | Merge tag 'dax-fixes-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm | Linus Torvalds | 1 | -6/+8
Pull dax fixes from Dan Williams: "The last of the known regression fixes and fallout from the Xarray conversion of the filesystem-dax implementation. On the path to debugging why the dax memory-failure injection test started failing after the Xarray conversion a couple more fixes for the dax_lock_mapping_entry(), now called dax_lock_page(), surfaced. Those plus the bug that started the hunt are now addressed. These patches have appeared in a -next release with no issues reported. Note the touches to mm/memory-failure.c are just the conversion to the new function signature for dax_lock_page(). Summary: - Fix the Xarray conversion of fsdax to properly handle dax_lock_mapping_entry() in the presence of pmd entries - Fix inode destruction racing a new lock request" * tag 'dax-fixes-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax: Fix unlock mismatch with updated API dax: Don't access a freed inode dax: Check page->mapping isn't NULL
2018-12-08 | net: phy: mdio-gpio: Add phy_ignore_ta_mask to platform data | Andrew Lunn | 1 | -0/+1
The Marvell 6390 Ethernet switch family does not perform MDIO turnaround correctly. Many hardware MDIO bus masters don't care about this, but the bit-banging implementation in Linux does by default. Add phy_ignore_ta_mask to the platform data so that the bit-banging code can be told which devices are known to get TA wrong. v2 Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-08 | net: phy: mdio-gpio: Add platform_data support for phy_mask | Andrew Lunn | 1 | -0/+13
It is sometimes necessary to instantiate a bit-banging MDIO bus as a platform device, without the aid of device tree. When device tree is being used, the bus is not scanned for devices, only those devices which are in device tree are probed. Without device tree, by default, all addresses on the bus are scanned. This may then find a device which is not a PHY, e.g. a switch. And the switch may have registers containing values which look like a PHY. So during the scan, a PHY device is wrongly created. After the bus has been registered, a search is made for mdio_board_info structures which indicate devices on the bus, and the driver which should be used for them. This is typically used to instantiate Ethernet switches from platform drivers. However, if the scanning of the bus has created a PHY device at the same location as indicated in the board info for a switch, the switch device is not created, since the address is already busy. This can be avoided by setting the phy_mask of the mdio bus. This mask prevents addresses on the bus being scanned. v2 Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
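How a mask keeps addresses out of a bus scan can be shown with a small sketch. It assumes a bus with 32 addresses and that a set bit in `phy_mask` means "skip this address". The function name and the constant are hypothetical and the loop body only counts instead of probing hardware.

```c
#include <stdint.h>

#define MOCK_PHY_MAX_ADDR 32

/* Count the addresses a scan would probe for a given phy_mask. */
static int mock_scan_count(uint32_t phy_mask)
{
	int addr, probed = 0;

	for (addr = 0; addr < MOCK_PHY_MAX_ADDR; addr++) {
		/* A set bit means this address must not be scanned. */
		if (phy_mask & (UINT32_C(1) << addr))
			continue;
		probed++;
	}
	return probed;
}
```

Masking the address where a switch sits leaves that address free, so the board-info entry for the switch can claim it after the bus is registered.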
2018-12-08 | Merge tag 'asm-generic-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic | Linus Torvalds | 1 | -0/+4
Pull asm-generic fix from Arnd Bergmann: "Multiple people reported a bug I introduced in asm-generic/unistd.h in 4.20, this is the obvious bugfix to get glibc and others to correctly build again on new architectures that no longer provide the old fstatat64() family of system calls" * tag 'asm-generic-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: asm-generic: unistd.h: fixup broken macro include.
2018-12-08 | Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask" | David Rientjes | 1 | -4/+8
This reverts commit 89c83fb539f95491be80cdd5158e6f0ce329e317. This should have been done as part of 2f0799a0ffc0 ("mm, thp: restore node-local hugepage allocations"). The movement of the thp allocation policy from alloc_pages_vma() to alloc_hugepage_direct_gfpmask() was intended to only set __GFP_THISNODE for mempolicies that are not MPOL_BIND whereas the revert could set this regardless of mempolicy. While the check for MPOL_BIND between alloc_hugepage_direct_gfpmask() and alloc_pages_vma() was racy, that has since been removed since the revert. What is left is the possibility to use __GFP_THISNODE in policy_node() when it is unexpected because the special handling for hugepages in alloc_pages_vma() was removed as part of the consolidation. Secondly, prior to 89c83fb539f9, alloc_pages_vma() implemented a somewhat different policy for hugepage allocations, which were allocated through alloc_hugepage_vma(). For hugepage allocations, if the allocating process's node is in the set of allowed nodes, allocate with __GFP_THISNODE for that node (for MPOL_PREFERRED, use that node with __GFP_THISNODE instead). This was changed for shmem_alloc_hugepage() to allow fallback to other nodes in 89c83fb539f9 as it did for new_page() in mm/mempolicy.c which is functionally different behavior and removes the requirement to only allocate hugepages locally. So this commit does a full revert of 89c83fb539f9 instead of the partial revert that was done in 2f0799a0ffc0. The result is the same thp allocation policy for 4.20 that was in 4.19. 
Fixes: 89c83fb539f9 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask") Fixes: 2f0799a0ffc0 ("mm, thp: restore node-local hugepage allocations") Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-07 | neighbour: Avoid writing before skb->head in neigh_hh_output() | Stefano Brivio | 1 | -5/+23
While skb_push() makes the kernel panic if the skb headroom is less than the unaligned hardware header size, it will proceed normally in case we copy more than that because of alignment, and we'll silently corrupt adjacent slabs. In the case fixed by the previous patch, "ipv6: Check available headroom in ip6_xmit() even without options", we end up in neigh_hh_output() with 14 bytes headroom, 14 bytes hardware header and write 16 bytes, starting 2 bytes before the allocated buffer. Always check we're not writing before skb->head and, if the headroom is not enough, warn and drop the packet. v2: - instead of panicking with BUG_ON(), WARN_ON_ONCE() and drop the packet (Eric Dumazet) - if we avoid the panic, though, we need to explicitly check the headroom before the memcpy(), otherwise we'll have corrupted slabs on a running kernel, after we warn - use __skb_push() instead of skb_push(), as the headroom check is already implemented here explicitly (Eric Dumazet) Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
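The arithmetic behind the 14-byte example above can be sketched as follows. The sketch assumes the cached header is copied in 16-byte aligned chunks, which is consistent with "14 bytes hardware header and write 16 bytes". The helper names and the constant are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MOCK_HH_MOD 16u /* assumed copy granularity */

/* Round the hardware header length up to the copy granularity. */
static size_t mock_hh_aligned_len(size_t hh_len)
{
	return (hh_len + MOCK_HH_MOD - 1) & ~(size_t)(MOCK_HH_MOD - 1);
}

/*
 * The copy is safe only if the headroom covers the aligned length,
 * not just the unaligned header length.
 */
static bool mock_hh_fits(size_t headroom, size_t hh_len)
{
	return headroom >= mock_hh_aligned_len(hh_len);
}
```

With 14 bytes of headroom and a 14-byte header the aligned copy needs 16 bytes, so the check fails and the packet would be dropped with a warning instead of corrupting memory before the buffer.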
2018-12-07 | neighbor: Improve garbage collection | David Ahern | 1 | -0/+3
The existing garbage collection algorithm has a number of problems: 1. The gc algorithm will not evict PERMANENT entries as those entries are managed by userspace, yet the existing algorithm walks the entire hash table which means it always considers PERMANENT entries when looking for entries to evict. In some use cases (e.g., EVPN) there can be tens of thousands of PERMANENT entries leading to wasted CPU cycles when gc kicks in. As an example, with 32k permanent entries, neigh_alloc has been observed taking more than 4 msec per invocation. 2. Currently, when the number of neighbor entries hits gc_thresh2 and the last flush for the table was more than 5 seconds ago, gc kicks in and walks the entire hash table evicting *all* entries not in PERMANENT or REACHABLE state and not marked as externally learned. There is no discriminator on when the neigh entry was created or if it just moved from REACHABLE to another NUD_VALID state (e.g., NUD_STALE). It is possible for entries to be created or for established neighbor entries to be moved to STALE (e.g., an external node sends an ARP request) right before the 5 second window lapses: -----|---------x|----------|----- t-5 t t+5 If that happens those entries are evicted during gc causing unnecessary thrashing on neighbor entries and userspace caches trying to track them. Further, this contradicts the description of gc_thresh2 which says "Entries older than 5 seconds will be cleared". One workaround is to make gc_thresh2 == gc_thresh3 but that negates the whole point of having separate thresholds. 3. Clearing *all* neigh non-PERMANENT/REACHABLE/externally learned entries when gc_thresh2 is exceeded is overkill and contributes to thrashing especially during startup. This patch addresses these problems as follows: 1. Use of a separate list_head to track entries that can be garbage collected along with a separate counter. PERMANENT entries are not added to this list. 
The gc_thresh parameters are only compared to the new counter, not the total entries in the table. The forced_gc function is updated to only walk this new gc_list looking for entries to evict. 2. Entries are added to the list head at the tail and removed from the front. 3. Entries are only evicted if they were last updated more than 5 seconds ago, adhering to the original intent of gc_thresh2. 4. Forced gc is stopped once the number of gc_entries drops below gc_thresh2. 5. Since gc checks do not apply to PERMANENT entries, gc levels are skipped when allocating a new neighbor for a PERMANENT entry. By extension this means there are no explicit limits on the number of PERMANENT entries that can be created, but this is no different than FIB entries or FDB entries. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
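The eviction rules listed above (walk only the collectable list from the front, evict entries last updated more than 5 seconds ago, stop once the count drops below gc_thresh2) can be sketched with an array standing in for the list. Everything here is a hypothetical mock: the real code uses a `list_head`, jiffies and neighbour state checks that are not reproduced.

```c
#include <stddef.h>

/* Hypothetical collectable entry; PERMANENT entries never get here. */
struct mock_neigh {
	long updated; /* time of last update, in seconds */
};

/*
 * q[0..*count) is ordered oldest-first. Evict stale entries from the
 * front while the remaining count is still at or above gc_thresh2.
 * Returns the number of entries evicted and compacts the array.
 */
static size_t mock_forced_gc(struct mock_neigh *q, size_t *count,
			     long now, size_t gc_thresh2)
{
	size_t evicted = 0, kept = 0, i;

	for (i = 0; i < *count; i++) {
		int over_thresh = (*count - evicted) >= gc_thresh2;
		int stale = (now - q[i].updated) > 5;

		if (over_thresh && stale)
			evicted++;
		else
			q[kept++] = q[i];
	}
	*count = kept;
	return evicted;
}
```

Entries updated within the last 5 seconds survive even when the table is over the threshold, which is the behaviour the gc_thresh2 description promises.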
2018-12-07 | bridge: Add br_fdb_clear_offload() | Petr Machata | 1 | -0/+6
When a driver unoffloads all FDB entries en bloc, it's inefficient to send the switchdev notification one by one. Add a helper that unsets the offload flag on FDB entries on a given bridge port and VLAN. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-07 | vxlan: Add vxlan_fdb_clear_offload() | Petr Machata | 1 | -0/+6
When a driver unoffloads all FDB entries en bloc, it's inefficient to send the switchdev notification one by one. Add a helper that walks the FDB table, unsetting the offload flag on RDST with a given VNI. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-07 | vxlan: Add vxlan_fdb_replay() | Petr Machata | 1 | -0/+9
When a VXLAN device becomes relevant to a driver (such as when it is attached to an offloaded bridge), the driver will generally need to walk the existing FDB entries and offload them. Add a function vxlan_fdb_replay() to call a given notifier block for each FDB entry with a given VNI. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-07 | net/mlx5: Expose packet based credit mode | Danit Goldberg | 1 | -2/+4
Packet based credit mode bit determines whether the credit mode is done per message or packet. Expose the QP creation flag and the HCA capability. Signed-off-by: Danit Goldberg <danitg@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-06 | Merge tag 'nfs-for-4.20-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs | Linus Torvalds | 1 | -1/+0
Pull NFS client bugfixes from Trond Myklebust: "This is mainly fallout from the updates to the SUNRPC code that is being triggered from less common combinations of NFS mount options. Highlights include: Stable fixes: - Fix a page leak when using RPCSEC_GSS/krb5p to encrypt data. Bugfixes: - Fix a regression that causes the RPC receive code to hang - Fix call_connect_status() so that it handles tasks that got transmitted while queued waiting for the socket lock. - Fix a memory leak in call_encode() - Fix several other connect races. - Fix receive code error handling. - Use the discard iterator rather than MSG_TRUNC for compatibility with AF_UNIX/AF_LOCAL sockets. - nfs: don't dirty kernel pages read by direct-io - pnfs/Flexfiles fix to enforce per-mirror stateid only for NFSv4 data servers" * tag 'nfs-for-4.20-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: Don't force a redundant disconnection in xs_read_stream() SUNRPC: Fix up socket polling SUNRPC: Use the discard iterator rather than MSG_TRUNC SUNRPC: Treat EFAULT as a truncated message in xs_read_stream_request() SUNRPC: Fix up handling of the XDRBUF_SPARSE_PAGES flag SUNRPC: Fix RPC receive hangs SUNRPC: Fix a potential race in xprt_connect() SUNRPC: Fix a memory leak in call_encode() SUNRPC: Fix leak of krb5p encode pages SUNRPC: call_connect_status() must handle tasks that got transmitted nfs: don't dirty kernel pages read by direct-io flexfiles: enforce per-mirror stateid only for v4 DSes
2018-12-06 | net: core: dev: Add extack argument to __dev_change_flags() | Petr Machata | 1 | -1/+2
In order to pass extack together with NETDEV_PRE_UP notifications, it's necessary to route the extack to __dev_open() from diverse (possibly indirect) callers. The last missing API is __dev_change_flags(). Therefore extend __dev_change_flags() with an extra extack argument and update the two existing users. Since the function declaration line is changed anyway, name the struct net_device argument to placate checkpatch. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-06 | net: core: dev: Add extack argument to dev_change_flags() | Petr Machata | 1 | -1/+2
In order to pass extack together with NETDEV_PRE_UP notifications, it's necessary to route the extack to __dev_open() from diverse (possibly indirect) callers. One prominent API through which the notification is invoked is dev_change_flags(). Therefore extend dev_change_flags() with an extra extack argument and update all users. Most of the calls end up just encoding NULL, but several sites (VLAN, ipvlan, VRF, rtnetlink) do have extack available. Since the function declaration line is changed anyway, name the other function arguments to placate checkpatch. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-06 | net: core: dev: Add extack argument to dev_open() | Petr Machata | 1 | -1/+1
In order to pass extack together with NETDEV_PRE_UP notifications, it's necessary to route the extack to __dev_open() from diverse (possibly indirect) callers. One prominent API through which the notification is invoked is dev_open(). Therefore extend dev_open() with an extra extack argument and update all users. Most of the calls end up just encoding NULL, but bond and team drivers have the extack readily available. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-06net: dsa: Add overhead to tag protocol ops.Andrew Lunn1-0/+1
Each DSA tag protocol needs to add additional headers to the Ethernet frame in order to direct it towards a specific switch egress port. It must also remove the header from a frame received from a switch. Indicate the maximum size of these headers in the tag protocol ops structure, so the core can take these overheads into account. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
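As a rough sketch of what an overhead field in an ops structure makes possible, the following assumes a hypothetical `struct tag_ops_sketch` and a hypothetical helper `master_mtu_sketch()`. The commit does not say how the core uses the value, so the MTU calculation here is an assumption for illustration only.

```c
#include <stddef.h>

/* Hypothetical stand-in for a tag protocol ops structure. */
struct tag_ops_sketch {
	const char *name;
	unsigned int overhead;	/* maximum bytes of tag header added to a frame */
};

/*
 * Assumed use: the link facing the switch must carry the payload MTU
 * plus the largest tag header the protocol may add.
 */
static unsigned int master_mtu_sketch(const struct tag_ops_sketch *ops,
				      unsigned int port_mtu)
{
	return port_mtu + ops->overhead;
}
```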
2018-12-06Merge tag 'sound-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/soundLinus Torvalds1-1/+3
Pull sound fixes from Takashi Iwai: "Still more incoming fixes than wished at this stage, but all look like small and reasonable fixes. In addition to the usual HD-audio and USB-audio quirks for various devices, two notable changes are included: - a fix for USB-audio UAF at probing a malformed descriptor - workarounds for PCM rwsem mutex starvation" * tag 'sound-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda/realtek: Fix mic issue on Acer AIO Veriton Z4860G/Z6860G ALSA: hda/realtek: Fix mic issue on Acer AIO Veriton Z4660G ALSA: hda/realtek - Add support for Acer Aspire C24-860 headset mic ALSA: hda/realtek: ALC286 mic and headset-mode fixups for Acer Aspire U27-880 ALSA: usb-audio: Fix UAF decrement if card has no live interfaces in card.c ALSA: hda/realtek - Fix speaker output regression on Thinkpad T570 ALSA: pcm: Fix interval evaluation with openmin/max ALSA: hda: Add support for AMD Stoney Ridge ALSA: usb-audio: Add SMSL D1 to quirks for native DSD support ALSA: pcm: Fix starvation on down_write_nonblock() ALSA: pcm: Call snd_pcm_unlink() conditionally at closing
2018-12-06Merge tag 'usb-serial-4.20-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linusGreg Kroah-Hartman1-0/+1
Johan writes: USB-serial fix for v4.20-rc6 Here's a fix for a reported USB-console regression in 4.18 which revealed a long-standing bug in the console implementation. The patch has been in linux-next over night with no reported issues. Signed-off-by: Johan Hovold <johan@kernel.org> * tag 'usb-serial-4.20-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial: USB: serial: console: fix reported terminal settings
2018-12-06asm-generic: unistd.h: fixup broken macro include.Guo Ren1-0/+4
The broken macros cause a glibc compile error. If there is no __NR3264_fstat*, we should also remove the related definitions. Reported-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org> Fixes: bf4b6a7d371e ("y2038: Remove stat64 family from default syscall set") [arnd: Both Marcin and Guo provided this patch to fix up my clearly broken commit, I applied the version with the better changelog.] Signed-off-by: Guo Ren <ren_guo@c-sky.com> Signed-off-by: Mao Han <han_mao@c-sky.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-12-05sctp: frag_point sanity checkJakub Audykowicz1-0/+5
If for some reason an association's fragmentation point is zero, sctp_datamsg_from_user will endlessly try to divide a message into zero-sized chunks. This eventually causes a kernel panic due to running out of memory. Although this situation is quite unlikely, it has occurred before as reported. I propose to add this simple last-ditch sanity check due to the severity of the potential consequences. Signed-off-by: Jakub Audykowicz <jakub.audykowicz@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
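The failure mode can be shown with a small user-space sketch. `count_chunks_sketch()` is a hypothetical function, not the SCTP code: it splits a message length into chunks of at most `frag_point` bytes and refuses a zero fragmentation point, which would otherwise keep the loop from ever making progress.

```c
#include <stddef.h>

/*
 * Returns the number of chunks needed for msg_len bytes,
 * or -1 if frag_point is zero (the sanity check).
 */
static long count_chunks_sketch(size_t msg_len, size_t frag_point)
{
	long chunks = 0;

	/* Without this check, len below would always be 0 and the loop
	 * would never consume any of the message. */
	if (frag_point == 0)
		return -1;

	while (msg_len > 0) {
		size_t len = msg_len < frag_point ? msg_len : frag_point;

		msg_len -= len;
		chunks++;
	}
	return chunks;
}
```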
2018-12-05bpf: Change insn_offset to insn_off in bpf_func_infoMartin KaFai Lau1-1/+1
The later patch will introduce "struct bpf_line_info" which has member "line_off" and "file_off" referring back to the string section in btf. The line_"off" and file_"off" are more consistent to the naming convention in btf.h that means "offset" (e.g. name_off in "struct btf_type"). The to-be-added "struct bpf_line_info" also has another member, "insn_off" which is the same as the "insn_offset" in "struct bpf_func_info". Hence, this patch renames "insn_offset" to "insn_off" for "struct bpf_func_info". Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller2-19/+44
Alexei Starovoitov says: ==================== pull-request: bpf 2018-12-05 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) fix bpf uapi pointers for 32-bit architectures, from Daniel. 2) improve verifier ability to handle progs with a lot of branches, from Alexei. 3) strict btf checks, from Yonghong. 4) bpf_sk_lookup api cleanup, from Joe. 5) other misc fixes ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-05net: mii: mii_lpa_mod_linkmode_lpa_t: Make use of linkmode_mod_bit helperAndrew Lunn1-6/+2
Replace the if else code structure with a call to the helper linkmode_mod_bit. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-05net: mii: Add mii_lpa_mod_linkmode_lpa_tAndrew Lunn1-16/+52
Add a _mod_ variant of mii_lpa_to_linkmode_lpa_t. Use this to fix the genphy_read_status() where the 1G link partner features are getting lost. Fixes: c0ec3c273677 ("net: phy: Convert u32 phydev->lp_advertising to linkmode") Reported-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-05net: mii: Rename mii_stat1000_to_linkmode_lpa_tAndrew Lunn2-10/+19
Rename mii_stat1000_to_linkmode_lpa_t to mii_stat1000_mod_linkmode_lpa_t to indicate it modifies the passed linkmode bitmap, without clearing any other bits. Add a helper to set/clear bits in a linkmode. Use this helper to ensure that bits which the stat1000 indicates should not be set are cleared. Fixes: c0ec3c273677 ("net: phy: Convert u32 phydev->lp_advertising to linkmode") Suggested-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
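A "_mod_" style conversion built on a set/clear helper might look like the sketch below. All names ending in `_sketch` are hypothetical, and the register masks and bit positions are assumptions chosen for illustration rather than values taken from the patch.

```c
#include <stdbool.h>
#include <limits.h>

#define BITS_PER_WORD_SKETCH	(sizeof(unsigned long) * CHAR_BIT)

/* Assumed register masks and bitmap positions, for illustration only. */
#define LPA_1000HALF_SKETCH	0x0400u
#define LPA_1000FULL_SKETCH	0x0800u
#define BIT_1000_HALF_SKETCH	4u
#define BIT_1000_FULL_SKETCH	5u

/* Set or clear a single bit, leaving every other bit untouched. */
static void mod_bit_sketch(unsigned int nr, unsigned long *bitmap, bool set)
{
	unsigned long mask = 1UL << (nr % BITS_PER_WORD_SKETCH);
	unsigned long *word = &bitmap[nr / BITS_PER_WORD_SKETCH];

	if (set)
		*word |= mask;
	else
		*word &= ~mask;
}

/* "_mod_" variant: updates only the 1G bits according to the register. */
static void stat1000_mod_sketch(unsigned long *lp_adv, unsigned int lpa)
{
	mod_bit_sketch(BIT_1000_HALF_SKETCH, lp_adv,
		       (lpa & LPA_1000HALF_SKETCH) != 0);
	mod_bit_sketch(BIT_1000_FULL_SKETCH, lp_adv,
		       (lpa & LPA_1000FULL_SKETCH) != 0);
}
```

Because each bit is explicitly set or cleared, a stale 1G bit from an earlier read is removed, while unrelated bits in the bitmap survive.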
2018-12-05net: mii: Fix autoneg in mii_lpa_to_linkmode_lpa_t()Andrew Lunn1-3/+6
mii_adv_to_linkmode_adv_t() clears all bits before setting the ones it needs to set. This means the freshly set Autoneg gets cleared. Change the order, and add comments about it clearing the old content of the bitmap. Fixes: c0ec3c273677 ("net: phy: Convert u32 phydev->lp_advertising to linkmode") Reported-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
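The ordering issue can be sketched as follows. The functions and constants are hypothetical stand-ins with assumed values, and only one speed bit is modelled. The point is that a clearing "_to_" conversion must run first, with Autoneg set afterwards.

```c
/* Assumed register masks and bitmap positions, for illustration only. */
#define LPA_10HALF_SKETCH	0x0020u
#define LPA_LPACK_SKETCH	0x4000u
#define BIT_10_HALF_SKETCH	0u
#define BIT_AUTONEG_SKETCH	6u

/* "_to_" style conversion: clears the old content, then sets bits. */
static void adv_to_linkmode_sketch(unsigned long *bitmap, unsigned int adv)
{
	*bitmap = 0;	/* any bit set before this call is lost */
	if (adv & LPA_10HALF_SKETCH)
		*bitmap |= 1UL << BIT_10_HALF_SKETCH;
}

/* Corrected order: convert first, then add Autoneg on top. */
static void lpa_to_linkmode_sketch(unsigned long *lp_adv, unsigned int lpa)
{
	adv_to_linkmode_sketch(lp_adv, lpa);
	if (lpa & LPA_LPACK_SKETCH)
		*lp_adv |= 1UL << BIT_AUTONEG_SKETCH;
}
```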
2018-12-05mm, thp: restore node-local hugepage allocationsDavid Rientjes1-2/+0
This is a full revert of ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings") and a partial revert of 89c83fb539f9 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"). By not setting __GFP_THISNODE, applications can allocate remote hugepages when the local node is fragmented or low on memory when either the thp defrag setting is "always" or the vma has been madvised with MADV_HUGEPAGE. Remote access to hugepages often has much higher latency than local pages of the native page size. On Haswell, ac5b2c18911f was shown to have a 13.9% access regression after this commit for binaries that remap their text segment to be backed by transparent hugepages. The intent of ac5b2c18911f is to address an issue where a local node is low on memory or fragmented such that a hugepage cannot be allocated. In every scenario where this was described as a fix, there is abundant and unfragmented remote memory available to allocate from, even with a greater access latency. If remote memory is also low or fragmented, not setting __GFP_THISNODE was also measured on Haswell to have a 40% regression in allocation latency. Restore __GFP_THISNODE for thp allocations. Fixes: ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings") Fixes: 89c83fb539f9 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask") Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-05USB: check usb_get_extra_descriptor for proper sizeMathias Payer1-2/+2
When reading an extra descriptor, we need to properly check the minimum and maximum size allowed, to guard against invalid data being sent by a device. Reported-by: Hui Peng <benquike@gmail.com> Reported-by: Mathias Payer <mathias.payer@nebelwelt.net> Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Hui Peng <benquike@gmail.com> Signed-off-by: Mathias Payer <mathias.payer@nebelwelt.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
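A minimal sketch of size-checked descriptor scanning, assuming each descriptor starts with a one-byte length followed by a one-byte type. `find_descriptor_sketch()` is a hypothetical user-space function, not the kernel helper, and the `minsize` parameter is an illustrative way of expressing the minimum-size check.

```c
#include <stddef.h>

/*
 * Scan a buffer of length-prefixed descriptors for one of the given
 * type that is at least minsize bytes long. Returns a pointer to it,
 * or NULL if none is found or the buffer is malformed.
 */
static const unsigned char *find_descriptor_sketch(const unsigned char *buf,
						   size_t size,
						   unsigned char type,
						   size_t minsize)
{
	while (size >= 2) {
		size_t len = buf[0];

		/* Maximum check: a descriptor may not claim more bytes
		 * than remain; a length below 2 cannot hold the header. */
		if (len < 2 || len > size)
			return NULL;

		/* Minimum check: the caller will read minsize bytes. */
		if (buf[1] == type && len >= minsize)
			return buf;

		buf += len;
		size -= len;
	}
	return NULL;
}
```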
2018-12-05USB: serial: console: fix reported terminal settingsJohan Hovold1-0/+1
The USB-serial console implementation has never reported the actual terminal settings used. Despite storing the corresponding cflags in its struct console, these were never honoured on later tty open() where the tty termios would be left initialised to the driver defaults. Unlike the serial console implementation, the USB-serial code calls subdriver open() already at console setup. While calling set_termios() and write() before open() looks like it could work for some USB-serial drivers, others definitely do not expect this, so modelling this after serial core is going to be intrusive, if at all possible. Instead, use a (renamed) tty helper to save the termios data used at console setup so that the tty termios reflects the actual terminal settings after a subsequent tty open(). Note that the calls to tty_init_termios() (tty_driver_install()) and tty_save_termios() are serialised using the disconnect mutex. This specifically fixes a regression that was triggered by a recent change adding software flow control to the pl2303 driver: a getty trying to disable flow control while leaving the baud rate unchanged would now also set the baud rate to the driver default (prior to the flow-control change this had been a noop). Fixes: 7041d9c3f01b ("USB: serial: pl2303: add support for tx xon/xoff flow control") Cc: stable <stable@vger.kernel.org> # 4.18 Cc: Florian Zumbiehl <florz@florz.de> Reported-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johan Hovold <johan@kernel.org>
2018-12-04dax: Fix unlock mismatch with updated APIMatthew Wilcox1-6/+8
Internal to dax_unlock_mapping_entry(), dax_unlock_entry() is used to store a replacement entry in the Xarray at the given xas-index with the DAX_LOCKED bit clear. When called, dax_unlock_entry() expects the unlocked value of the entry relative to the current Xarray state to be specified. In most contexts dax_unlock_entry() is operating in the same scope as the matched dax_lock_entry(). However, in the dax_unlock_mapping_entry() case the implementation needs to recall the original entry. In the case where the original entry is a 'pmd' entry it is possible that the pfn performed to do the lookup is misaligned to the value retrieved in the Xarray. Change the api to return the unlock cookie from dax_lock_page() and pass it to dax_unlock_page(). This fixes a bug where dax_unlock_page() was assuming that the page was PMD-aligned if the entry was a PMD entry with signatures like: WARNING: CPU: 38 PID: 1396 at fs/dax.c:340 dax_insert_entry+0x2b2/0x2d0 RIP: 0010:dax_insert_entry+0x2b2/0x2d0 [..] Call Trace: dax_iomap_pte_fault.isra.41+0x791/0xde0 ext4_dax_huge_fault+0x16f/0x1f0 ? up_read+0x1c/0xa0 __do_fault+0x1f/0x160 __handle_mm_fault+0x1033/0x1490 handle_mm_fault+0x18b/0x3d0 Link: https://lkml.kernel.org/r/20181130154902.GL10377@bombadil.infradead.org Fixes: 9f32d221301c ("dax: Convert dax_lock_mapping_entry to XArray") Reported-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Matthew Wilcox <willy@infradead.org> Tested-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
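The cookie pattern described above can be sketched in user-space C: the lock call returns the unlocked value it replaced, and the unlock call stores that value back instead of recomputing it. The types and functions here are hypothetical stand-ins with an assumed lock bit, not the DAX or XArray API, and returning 0 for an already-locked slot is a simplification made for the sketch.

```c
#include <stdint.h>

#define LOCKED_BIT_SKETCH	((uintptr_t)1)

/* Hypothetical stand-in for one slot in the lookup structure. */
struct slot_sketch {
	uintptr_t entry;
};

/*
 * Lock the slot and return the unlocked entry as a cookie.
 * Returns 0 if the slot is already locked.
 */
static uintptr_t lock_slot_sketch(struct slot_sketch *slot)
{
	uintptr_t unlocked = slot->entry;

	if (unlocked & LOCKED_BIT_SKETCH)
		return 0;

	slot->entry = unlocked | LOCKED_BIT_SKETCH;
	return unlocked;
}

/* Unlock by restoring exactly the value the matching lock call saw. */
static void unlock_slot_sketch(struct slot_sketch *slot, uintptr_t cookie)
{
	slot->entry = cookie;
}
```

Handing the cookie back to the caller means the unlock side never has to reconstruct the entry from a possibly misaligned pfn.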