aboutsummaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2019-08-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds14-19/+43
Pull networking fixes from David Miller: 1) Fix jmp to 1st instruction in x64 JIT, from Alexei Starovoitov. 2) Severl kTLS fixes in mlx5 driver, from Tariq Toukan. 3) Fix severe performance regression due to lack of SKB coalescing of fragments during local delivery, from Guillaume Nault. 4) Error path memory leak in sch_taprio, from Ivan Khoronzhuk. 5) Fix batched events in skbedit packet action, from Roman Mashak. 6) Propagate VLAN TX offload to hw_enc_features in bond and team drivers, from Yue Haibing. 7) RXRPC local endpoint refcounting fix and read after free in rxrpc_queue_local(), from David Howells. 8) Fix endian bug in ibmveth multicast list handling, from Thomas Falcon. 9) Oops, make nlmsg_parse() wrap around the correct function, __nlmsg_parse not __nla_parse(). Fix from David Ahern. 10) Memleak in sctp_scend_reset_streams(), fro Zheng Bin. 11) Fix memory leak in cxgb4, from Wenwen Wang. 12) Yet another race in AF_PACKET, from Eric Dumazet. 13) Fix false detection of retransmit failures in tipc, from Tuong Lien. 14) Use after free in ravb_tstamp_skb, from Tho Vu. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (101 commits) ravb: Fix use-after-free ravb_tstamp_skb netfilter: nf_tables: map basechain priority to hardware priority net: sched: use major priority number as hardware priority wimax/i2400m: fix a memory leak bug net: cavium: fix driver name ibmvnic: Unmap DMA address of TX descriptor buffers after use bnxt_en: Fix to include flow direction in L2 key bnxt_en: Use correct src_fid to determine direction of the flow bnxt_en: Suppress HWRM errors for HWRM_NVM_GET_VARIABLE command bnxt_en: Fix handling FRAG_ERR when NVM_INSTALL_UPDATE cmd fails bnxt_en: Improve RX doorbell sequence. bnxt_en: Fix VNIC clearing logic for 57500 chips. net: kalmia: fix memory leaks cx82310_eth: fix a memory leak bug bnx2x: Fix VF's VLAN reconfiguration in reload. Bluetooth: Add debug setting for changing minimum encryption key size tipc: fix false detection of retransmit failures lan78xx: Fix memory leaks MAINTAINERS: r8169: Update path to the driver MAINTAINERS: PHY LIBRARY: Update files in the record ...
2019-08-19keys: Fix description sizeDavid Howells1-4/+4
The maximum key description size is 4095. Commit f771fde82051 ("keys: Simplify key description management") inadvertantly reduced that to 255 and made sizes between 256 and 4095 work weirdly, and any size whereby size & 255 == 0 would cause an assertion in __key_link_begin() at the following line: BUG_ON(index_key->desc_len == 0); This can be fixed by simply increasing the size of desc_len in struct keyring_index_key to a u16. Note the argument length test in keyutils only checked empty descriptions and descriptions with a size around the limit (ie. 4095) and not for all the values in between, so it missed this. This has been addressed and https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/commit/?id=066bf56807c26cd3045a25f355b34c1d8a20a5aa now exhaustively tests all possible lengths of type, description and payload and then some. The assertion failure looks something like: kernel BUG at security/keys/keyring.c:1245! ... RIP: 0010:__key_link_begin+0x88/0xa0 ... Call Trace: key_create_or_update+0x211/0x4b0 __x64_sys_add_key+0x101/0x200 do_syscall_64+0x5b/0x1e0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 It can be triggered by: keyctl add user "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" a @s Fixes: f771fde82051 ("keys: Simplify key description management") Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-19media: uapi: h264: Get rid of the p0/b0/b1 ref-listsBoris Brezillon1-3/+0
Those lists can be extracted from the dpb, let's simplify userspace life and build that list kernel-side (generic helpers will be provided for drivers that need this list). Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Reviewed-by: Ezequiel Garcia <ezequiel@collabora.com> Reviewed-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com> Tested-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-08-19media: uapi: h264: Add the concept of start codeEzequiel Garcia1-0/+6
Stateless decoders have different expectations about the start code that is prepended on H264 slices. Add a menu control to express the supported start code types (including no start code). Drivers are allowed to support only one start code type, but they can support both too. Note that this is independent of the H264 decoding mode, which specifies the granularity of the decoding operations. Either in frame-based or slice-based mode, this new control will allow to define the start code expected on H264 slices. Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com> Tested-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-08-19media: uapi: h264: Add the concept of decoding modeBoris Brezillon1-0/+10
Some stateless decoders don't support per-slice decoding granularity (or at least not in a way that would make them efficient or easy to use). Expose a menu to control the supported decoding modes. Drivers are allowed to support only one decoding but they can support both too. To fully specify the decoding operation, we need to introduce a start_byte_offset, to indicate where slices start. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com> Tested-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-08-19media: uapi: h264: Rename pixel formatEzequiel Garcia1-1/+1
The V4L2_PIX_FMT_H264_SLICE_RAW name was originally suggested because the pixel format would represent H264 slices without any start code. However, as we will now introduce a start code menu control, give the pixel format a more meaningful name, while it's still early enough to do so. Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com> Tested-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-08-19media: lib/sort.c: implement sort() variant taking context argumentRasmus Villemoes1-0/+5
Our list_sort() utility has always supported a context argument that is passed through to the comparison routine. Now there's a use case for the similar thing for sort(). This implements sort_r by simply extending the existing sort function in the obvious way. To avoid code duplication, we want to implement sort() in terms of sort_r(). The naive way to do that is static int cmp_wrapper(const void *a, const void *b, const void *ctx) { int (*real_cmp)(const void*, const void*) = ctx; return real_cmp(a, b); } sort(..., cmp) { sort_r(..., cmp_wrapper, cmp) } but this would do two indirect calls for each comparison. Instead, do as is done for the default swap functions - that only adds a cost of a single easily predicted branch to each comparison call. Aside from introducing support for the context argument, this also serves as preparation for patches that will eliminate the indirect comparison calls in common cases. Requested-by: Boris Brezillon <boris.brezillon@collabora.com> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-08-19notify: export symbols for use by the knfsd file cacheTrond Myklebust1-0/+2
The knfsd file cache will need to detect when files are unlinked, so that it can close the associated cached files. Export a minimal set of notifier functions to allow it to do so. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-08-19locks: create a new notifier chain for lease attemptsJeff Layton1-0/+5
With the new file caching infrastructure in nfsd, we can end up holding files open for an indefinite period of time, even when they are still idle. This may prevent the kernel from handing out leases on the file, which is something we don't want to block. Fix this by running a SRCU notifier call chain whenever on any lease attempt. nfsd can then purge the cache for that inode before returning. Since SRCU is only conditionally compiled in, we must only define the new chain if it's enabled, and users of the chain must ensure that SRCU is enabled. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-08-19sunrpc: add a new cache_detail operation for when a cache is flushedJeff Layton1-0/+1
When the exports table is changed, exportfs will usually write a new time to the "flush" file in the nfsd.export cache procfile. This tells the kernel to flush any entries that are older than that value. This gives us a mechanism to tell whether an unexport might have occurred. Add a new ->flush cache_detail operation that is called after flushing the cache whenever someone writes to a "flush" file. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-08-19svcrdma: Use llist for managing cache of recv_ctxtsChuck Lever1-2/+3
Use a wait-free mechanism for managing the svc_rdma_recv_ctxts free list. Subsequently, sc_recv_lock can be eliminated. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-08-19svcrdma: Remove svc_rdma_wqChuck Lever1-1/+0
Clean up: the system workqueue will work just as well. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-08-19block: remove struct request_queue queue_headJunxiao Bi1-4/+0
The dispatch list is not used any more, as the legacy block IO stack has been removed. Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-19genirq: Force interrupt threading on RTThomas Gleixner1-0/+4
Switch force_irqthreads from a boot time modifiable variable to a compile time constant when CONFIG_PREEMPT_RT is enabled. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190816160923.12855-1-bigeasy@linutronix.de
2019-08-19netfilter: add include guard to nf_conntrack_h323_types.hMasahiro Yamada1-0/+5
Add a header include guard just in case. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-08-19clk: imx8mn: Add GIC clockLeonard Crestez1-1/+2
This is enabled by default but if it's not explicitly defined and marked as critical then its parent might get turned off. Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2019-08-19signal: Allow cifs and drbd to receive their terminating signalsEric W. Biederman1-1/+14
My recent to change to only use force_sig for a synchronous events wound up breaking signal reception cifs and drbd. I had overlooked the fact that by default kthreads start out with all signals set to SIG_IGN. So a change I thought was safe turned out to have made it impossible for those kernel thread to catch their signals. Reverting the work on force_sig is a bad idea because what the code was doing was very much a misuse of force_sig. As the way force_sig ultimately allowed the signal to happen was to change the signal handler to SIG_DFL. Which after the first signal will allow userspace to send signals to these kernel threads. At least for wake_ack_receiver in drbd that does not appear actively wrong. So correct this problem by adding allow_kernel_signal that will allow signals whose siginfo reports they were sent by the kernel through, but will not allow userspace generated signals, and update cifs and drbd to call allow_kernel_signal in an appropriate place so that their thread can receive this signal. Fixing things this way ensures that userspace won't be able to send signals and cause problems, that it is clear which signals the threads are expecting to receive, and it guarantees that nothing else in the system will be affected. This change was partly inspired by similar cifs and drbd patches that added allow_signal. Reported-by: ronnie sahlberg <ronniesahlberg@gmail.com> Reported-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Tested-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Cc: Steve French <smfrench@gmail.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: David Laight <David.Laight@ACULAB.COM> Fixes: 247bc9470b1e ("cifs: fix rmmod regression in cifs.ko caused by force_sig changes") Fixes: 72abe3bcf091 ("signal/cifs: Fix cifs_put_tcp_session to call send_sig instead of force_sig") Fixes: fee109901f39 ("signal/drbd: Use send_sig not force_sig") Fixes: 3cf5d076fb4d ("signal: Remove task parameter from force_sig") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2019-08-19netfilter: xt_nfacct: Fix alignment mismatch in xt_nfacct_match_infoJuliana Rodrigueiro1-0/+5
When running a 64-bit kernel with a 32-bit iptables binary, the size of the xt_nfacct_match_info struct diverges. kernel: sizeof(struct xt_nfacct_match_info) : 40 iptables: sizeof(struct xt_nfacct_match_info)) : 36 Trying to append nfacct related rules results in an unhelpful message. Although it is suggested to look for more information in dmesg, nothing can be found there. # iptables -A <chain> -m nfacct --nfacct-name <acct-object> iptables: Invalid argument. Run `dmesg' for more information. This patch fixes the memory misalignment by enforcing 8-byte alignment within the struct's first revision. This solution is often used in many other uapi netfilter headers. Signed-off-by: Juliana Rodrigueiro <juliana.rodrigueiro@intra2net.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-08-19Merge 5.3-rc5 into usb-nextGreg Kroah-Hartman12-61/+71
We need the usb fixes in here as well for other patches to build on. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-19Merge 5.3-rc5 into staging-nextGreg Kroah-Hartman12-61/+71
We need the staging fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-19Merge 5.3-rc5 into char-misc-nextGreg Kroah-Hartman12-61/+71
We need the char/misc fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-18netfilter: nf_tables: map basechain priority to hardware priorityPablo Neira Ayuso1-0/+2
This patch adds initial support for offloading basechains using the priority range from 1 to 65535. This is restricting the netfilter priority range to 16-bit integer since this is what most drivers assume so far from tc. It should be possible to extend this range of supported priorities later on once drivers are updated to support for 32-bit integer priorities. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-18net: sched: use major priority number as hardware priorityPablo Neira Ayuso1-1/+1
tc transparently maps the software priority number to hardware. Update it to pass the major priority which is what most drivers expect. Update drivers too so they do not need to lshift the priority field of the flow_cls_common_offload object. The stmmac driver is an exception, since this code assumes the tc software priority is fine, therefore, lshift it just to be conservative. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-18KVM: arm/arm64: vgic: Add LPI translation cache definitionMarc Zyngier1-0/+3
Add the basic data structure that expresses an MSI to LPI translation as well as the allocation/release hooks. The size of the cache is arbitrarily defined as 16*nr_vcpus. Tested-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-08-18Merge tag 'usb-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usbLinus Torvalds2-1/+4
Pull USB fixes from Greg KH: "Here are number of small USB fixes for 5.3-rc5. Syzbot has been on a tear recently now that it has some good USB debugging hooks integrated, so there's a number of fixes in here found by those tools for some _very_ old bugs. Also a handful of gadget driver fixes for reported issues, some hopefully-final dma fixes for host controller drivers, and some new USB serial gadget driver ids. All of these have been in linux-next this week with no reported issues (the usb-serial ones were in linux-next in its own branch, but merged into mine on Friday)" * tag 'usb-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: add a hcd_uses_dma helper usb: don't create dma pools for HCDs with a localmem_pool usb: chipidea: imx: fix EPROBE_DEFER support during driver probe usb: host: fotg2: restart hcd after port reset USB: CDC: fix sanity checks in CDC union parser usb: cdc-acm: make sure a refcount is taken early enough USB: serial: option: add the BroadMobi BM818 card USB: serial: option: Add Motorola modem UARTs USB: core: Fix races in character device registration and deregistraion usb: gadget: mass_storage: Fix races between fsg_disable and fsg_set_alt usb: gadget: composite: Clear "suspended" on reset/disconnect usb: gadget: udc: renesas_usb3: Fix sysfs interface of "role" USB: serial: option: add D-Link DWM-222 device ID USB: serial: option: Add support for ZTE MF871A
2019-08-17Merge tag 'for-linus-2019-08-17' of git://git.kernel.dk/linux-blockLinus Torvalds1-4/+1
Pull block fixes from Jens Axboe: "A collection of fixes that should go into this series. This contains: - Revert of the REQ_NOWAIT_INLINE and associated dio changes. There were still corner cases there, and even though I had a solution for it, it's too involved for this stage. (me) - Set of NVMe fixes (via Sagi) - io_uring fix for fixed buffers (Anthony) - io_uring defer issue fix (Jackie) - Regression fix for queue sync at exit time (zhengbin) - xen blk-back memory leak fix (Wenwen)" * tag 'for-linus-2019-08-17' of git://git.kernel.dk/linux-block: io_uring: fix an issue when IOSQE_IO_LINK is inserted into defer list block: remove REQ_NOWAIT_INLINE io_uring: fix manual setup of iov_iter for fixed buffers xen/blkback: fix memory leaks blk-mq: move cancel of requeue_work to the front of blk_exit_queue nvme-pci: Fix async probe remove race nvme: fix controller removal race with scan work nvme-rdma: fix possible use-after-free in connect error flow nvme: fix a possible deadlock when passthru commands sent to a multipath device nvme-core: Fix extra device_put() call on error path nvmet-file: fix nvmet_file_flush() always returning an error nvmet-loop: Flush nvme_delete_wq when removing the port nvmet: Fix use-after-free bug when a port is removed nvme-multipath: revalidate nvme_ns_head gendisk in nvme_validate_ns
2019-08-17xsk: remove AF_XDP socket from map when the socket is releasedBjörn Töpel1-0/+18
When an AF_XDP socket is released/closed the XSKMAP still holds a reference to the socket in a "released" state. The socket will still use the netdev queue resource, and block newly created sockets from attaching to that queue, but no user application can access the fill/complete/rx/tx queues. This results in that all applications need to explicitly clear the map entry from the old "zombie state" socket. This should be done automatically. In this patch, the sockets tracks, and have a reference to, which maps it resides in. When the socket is released, it will remove itself from all maps. Suggested-by: Bruce Richardson <bruce.richardson@intel.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-17bpf: support cloning sk storage on accept()Stanislav Fomichev2-0/+13
Add new helper bpf_sk_storage_clone which optionally clones sk storage and call it from sk_clone_lock. Cc: Martin KaFai Lau <kafai@fb.com> Cc: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-17bpf: export bpf_map_inc_not_zeroStanislav Fomichev1-0/+2
Rename existing bpf_map_inc_not_zero to __bpf_map_inc_not_zero to indicate that it's caller's responsibility to do proper locking. Create and export bpf_map_inc_not_zero wrapper that properly locks map_idr_lock. Will be used in the next commit to hold a map while cloning a socket. Cc: Martin KaFai Lau <kafai@fb.com> Cc: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-17xsk: add support for need_wakeup flag in AF_XDP ringsMagnus Karlsson2-1/+45
This commit adds support for a new flag called need_wakeup in the AF_XDP Tx and fill rings. When this flag is set, it means that the application has to explicitly wake up the kernel Rx (for the bit in the fill ring) or kernel Tx (for bit in the Tx ring) processing by issuing a syscall. Poll() can wake up both depending on the flags submitted and sendto() will wake up tx processing only. The main reason for introducing this new flag is to be able to efficiently support the case when application and driver is executing on the same core. Previously, the driver was just busy-spinning on the fill ring if it ran out of buffers in the HW and there were none on the fill ring. This approach works when the application is running on another core as it can replenish the fill ring while the driver is busy-spinning. Though, this is a lousy approach if both of them are running on the same core as the probability of the fill ring getting more entries when the driver is busy-spinning is zero. With this new feature the driver now sets the need_wakeup flag and returns to the application. The application can then replenish the fill queue and then explicitly wake up the Rx processing in the kernel using the syscall poll(). For Tx, the flag is only set to one if the driver has no outstanding Tx completion interrupts. If it has some, the flag is zero as it will be woken up by a completion interrupt anyway. As a nice side effect, this new flag also improves the performance of the case where application and driver are running on two different cores as it reduces the number of syscalls to the kernel. The kernel tells user space if it needs to be woken up by a syscall, and this eliminates many of the syscalls. This flag needs some simple driver support. If the driver does not support this, the Rx flag is always zero and the Tx flag is always one. This makes any application relying on this feature default to the old behaviour of not requiring any syscalls in the Rx path and always having to call sendto() in the Tx path. For backwards compatibility reasons, this feature has to be explicitly turned on using a new bind flag (XDP_USE_NEED_WAKEUP). I recommend that you always turn it on as it so far always have had a positive performance impact. The name and inspiration of the flag has been taken from io_uring by Jens Axboe. Details about this feature in io_uring can be found in http://kernel.dk/io_uring.pdf, section 8.3. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-17xsk: replace ndo_xsk_async_xmit with ndo_xsk_wakeupMagnus Karlsson1-2/+12
This commit replaces ndo_xsk_async_xmit with ndo_xsk_wakeup. This new ndo provides the same functionality as before but with the addition of a new flags field that is used to specifiy if Rx, Tx or both should be woken up. The previous ndo only woke up Tx, as implied by the name. The i40e and ixgbe drivers (which are all the supported ones) are updated with this new interface. This new ndo will be used by the new need_wakeup functionality of XDP sockets that need to be able to wake up both Rx and Tx driver processing. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-17Documentation: Add devlink-trap documentationIdo Schimmel1-0/+6
Add initial documentation of the devlink-trap mechanism, explaining the background, motivation and the semantics of the interface. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17devlink: Add generic packet traps and groupsIdo Schimmel1-0/+40
Add generic packet traps and groups that can report dropped packets as well as exceptions such as TTL error. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17devlink: Add packet trap infrastructureIdo Schimmel2-0/+191
Add the basic packet trap infrastructure that allows device drivers to register their supported packet traps and trap groups with devlink. Each driver is expected to provide basic information about each supported trap, such as name and ID, but also the supported metadata types that will accompany each packet trapped via the trap. The currently supported metadata type is just the input port, but more will be added in the future. For example, output port and traffic class. Trap groups allow users to set the action of all member traps. In addition, users can retrieve per-group statistics in case per-trap statistics are too narrow. In the future, the trap group object can be extended with more attributes, such as policer settings which will limit the amount of traffic generated by member traps towards the CPU. Beside registering their packet traps with devlink, drivers are also expected to report trapped packets to devlink along with relevant metadata. devlink will maintain packets and bytes statistics for each packet trap and will potentially report the trapped packet with its metadata to user space via drop monitor netlink channel. The interface towards the drivers is simple and allows devlink to set the action of the trap. Currently, only two actions are supported: 'trap' and 'drop'. When set to 'trap', the device is expected to provide the sole copy of the packet to the driver which will pass it to devlink. When set to 'drop', the device is expected to drop the packet and not send a copy to the driver. In the future, more actions can be added, such as 'mirror'. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17drop_monitor: Allow user to start monitoring hardware dropsIdo Schimmel1-0/+2
Drop monitor has start and stop commands, but so far these were only used to start and stop monitoring of software drops. Now that drop monitor can also monitor hardware drops, we should allow the user to control these as well. Do that by adding SW and HW flags to these commands. If no flag is specified, then only start / stop monitoring software drops. This is done in order to maintain backward-compatibility with existing user space applications. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17drop_monitor: Add support for summary alert mode for hardware dropsIdo Schimmel1-0/+3
In summary alert mode a notification is sent with a list of recent drop reasons and a count of how many packets were dropped due to this reason. To avoid expensive operations in the context in which packets are dropped, each CPU holds an array whose number of entries is the maximum number of drop reasons that can be encoded in the netlink notification. Each entry stores the drop reason and a count. When a packet is dropped the array is traversed and a new entry is created or the count of an existing entry is incremented. Later, in process context, the array is replaced with a newly allocated copy and the old array is encoded in a netlink notification. To avoid breaking user space, the notification includes the ancillary header, which is 'struct net_dm_alert_msg' with number of entries set to '0'. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17drop_monitor: Add support for packet alert mode for hardware dropsIdo Schimmel1-0/+10
In a similar fashion to software drops, extend drop monitor to send netlink events when packets are dropped by the underlying hardware. The main difference is that instead of encoding the program counter (PC) from which kfree_skb() was called in the netlink message, we encode the hardware trap name. The two are mostly equivalent since they should both help the user understand why the packet was dropped. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17drop_monitor: Add basic infrastructure for hardware dropsIdo Schimmel1-0/+33
Export a function that can be invoked in order to report packets that were dropped by the underlying hardware along with metadata. Subsequent patches will add support for the different alert modes. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetoothDavid S. Miller1-0/+1
Johan Hedberg says: ==================== pull request: bluetooth 2019-08-17 Here's a set of Bluetooth fixes for the 5.3-rc series: - Multiple fixes for Qualcomm (btqca & hci_qca) drivers - Minimum encryption key size debugfs setting (this is required for Bluetooth Qualification) - Fix hidp_send_message() to have a meaningful return value ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17net: phy: remove genphy_config_initHeiner Kallweit1-1/+0
Now that all users have been removed we can remove genphy_config_init. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17dma-fence: Store the timestamp in the same union as the cb_listChris Wilson1-5/+19
The timestamp and the cb_list are mutually exclusive, the cb_list can only be added to prior to being signaled (and once signaled we drain), while the timestamp is only valid upon being signaled. Both the timestamp and the cb_list are only valid while the fence is alive, and as soon as no references are held can be replaced by the rcu_head. By reusing the union for the timestamp, we squeeze the base dma_fence struct to 64 bytes on x86-64. v2: Sort the union chronologically Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Christian König <christian.koenig@amd.com> Acked-by: Christian König <christian.koenig@amd.com>. Link: https://patchwork.freedesktop.org/patch/msgid/20190817153022.5749-1-chris@chris-wilson.co.uk
2019-08-17dma-fence: Shrink size of struct dma_fenceChris Wilson1-3/+3
Rearrange the couple of 32-bit atomics hidden amongst the field of pointers that unnecessarily caused the compiler to insert some padding, shrinks the size of the base struct dma_fence from 80 to 72 bytes on x86-64. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190817144736.7826-3-chris@chris-wilson.co.uk
2019-08-17Bluetooth: Add debug setting for changing minimum encryption key sizeMarcel Holtmann1-0/+1
For testing and qualification purposes it is useful to allow changing the minimum encryption key size value that the host stack is going to enforce. This adds a new debugfs setting min_encrypt_key_size to achieve this functionality. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2019-08-16drivers: remove the SGI SN2 IOC4 base supportChristoph Hellwig2-185/+0
The IOC4 is a multi-function chip seen on SGI SN2 and some SGI MIPS systems. This removes the base driver, which while not having an SN2 Kconfig dependency was only for sub-drivers that had one. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lkml.kernel.org/r/20190813072514.23299-15-hch@lst.de Signed-off-by: Tony Luck <tony.luck@intel.com>
2019-08-16clk: Overwrite clk_hw::init with NULL during clk_register()Stephen Boyd1-1/+2
We don't want clk provider drivers to use the init structure after clk registration time, but we leave a dangling reference to it by means of clk_hw::init. Let's overwrite the member with NULL during clk_register() so that this can't be used anymore after registration time. Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Cc: Doug Anderson <dianders@chromium.org> Signed-off-by: Stephen Boyd <sboyd@kernel.org> Link: https://lkml.kernel.org/r/20190731193517.237136-10-sboyd@kernel.org Reviewed-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
2019-08-16Merge tag 'pm-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds1-0/+2
Pull power management fixes from Rafael Wysocki: "These add a check to avoid recent suspend-to-idle power regression on systems with NVMe drives where the PCIe ASPM policy is "performance" (or when the kernel is built without ASPM support), fix an issue related to frequency limits in the schedutil cpufreq governor and fix a mistake related to the PM QoS usage in the cpufreq core introduced recently. Specifics: - Disable NVMe power optimization related to suspend-to-idle added recently on systems where PCIe ASPM is not able to put PCIe links into low-power states to prevent excess power from being drawn by the system while suspended (Rafael Wysocki). - Make the schedutil governor handle frequency limits changes properly in all cases (Viresh Kumar). - Prevent the cpufreq core from treating positive values returned by dev_pm_qos_update_request() as errors (Viresh Kumar)" * tag 'pm-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled PCI/ASPM: Add pcie_aspm_enabled() cpufreq: schedutil: Don't skip freq update when limits change cpufreq: dev_pm_qos_update_request() can return 1 on success
2019-08-16mm/mmu_notifiers: add a get/put scheme for the registrationJason Gunthorpe1-0/+35
Many places in the kernel have a flow where userspace will create some object and that object will need to connect to the subsystem's mmu_notifier subscription for the duration of its lifetime. In this case the subsystem is usually tracking multiple mm_structs and it is difficult to keep track of what struct mmu_notifier's have been allocated for what mm's. Since this has been open coded in a variety of exciting ways, provide core functionality to do this safely. This approach uses the struct mmu_notifier_ops * as a key to determine if the subsystem has a notifier registered on the mm or not. If there is a registration then the existing notifier struct is returned, otherwise the ops->alloc_notifiers() is used to create a new per-subsystem notifier for the mm. The destroy side incorporates an async call_srcu based destruction which will avoid bugs in the callers such as commit 6d7c3cde93c1 ("mm/hmm: fix use after free with struct hmm in the mmu notifiers"). Since we are inside the mmu notifier core locking is fairly simple, the allocation uses the same approach as for mmu_notifier_mm, the write side of the mmap_sem makes everything deterministic and we only need to do hlist_add_head_rcu() under the mm_take_all_locks(). The new users count and the discoverability in the hlist is fully serialized by the mmu_notifier_mm->lock. Link: https://lore.kernel.org/r/20190806231548.25242-4-jgg@ziepe.ca Co-developed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Tested-by: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-16PCI/P2PDMA: Introduce pci_p2pdma_unmap_sg()Logan Gunthorpe1-0/+13
Add pci_p2pdma_unmap_sg() to the two places that call pci_p2pdma_map_sg(). This is a prep patch to introduce correct mappings for p2pdma transactions that go through the root complex. Link: https://lore.kernel.org/r/20190730163545.4915-10-logang@deltatee.com Link: https://lore.kernel.org/r/20190812173048.9186-10-logang@deltatee.com Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-08-16PCI/P2PDMA: Add attrs argument to pci_p2pdma_map_sg()Logan Gunthorpe1-4/+11
This is to match the dma_map_sg() API which this function will have to call in an future patch. Add a pci_p2pdma_map_sg_attrs() function and helper to call it with no attributes just like the dma_map_sg() function. Link: https://lore.kernel.org/r/20190730163545.4915-9-logang@deltatee.com Link: https://lore.kernel.org/r/20190812173048.9186-9-logang@deltatee.com Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-08-16PCI/P2PDMA: Introduce private pagemap structureLogan Gunthorpe1-1/+0
Move the PCI bus offset from the generic dev_pagemap structure to a specific pci_p2pdma_pagemap structure. This structure will grow in subsequent patches. Link: https://lore.kernel.org/r/20190730163545.4915-2-logang@deltatee.com Link: https://lore.kernel.org/r/20190812173048.9186-2-logang@deltatee.com Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de>