aboutsummaryrefslogtreecommitdiffstats
path: root/ipc (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2018-12-11seccomp: add a return code to trap to userspaceTycho Andersen6-10/+1017
This patch introduces a means for syscalls matched in seccomp to notify some other task that a particular filter has been triggered. The motivation for this is primarily for use with containers. For example, if a container does an init_module(), we obviously don't want to load this untrusted code, which may be compiled for the wrong version of the kernel anyway. Instead, we could parse the module image, figure out which module the container is trying to load and load it on the host. As another example, containers cannot mount() in general since various filesystems assume a trusted image. However, if an orchestrator knows that e.g. a particular block device has not been exposed to a container for writing, it want to allow the container to mount that block device (that is, handle the mount for it). This patch adds functionality that is already possible via at least two other means that I know about, both of which involve ptrace(): first, one could ptrace attach, and then iterate through syscalls via PTRACE_SYSCALL. Unfortunately this is slow, so a faster version would be to install a filter that does SECCOMP_RET_TRACE, which triggers a PTRACE_EVENT_SECCOMP. Since ptrace allows only one tracer, if the container runtime is that tracer, users inside the container (or outside) trying to debug it will not be able to use ptrace, which is annoying. It also means that older distributions based on Upstart cannot boot inside containers using ptrace, since upstart itself uses ptrace to monitor services while starting. The actual implementation of this is fairly small, although getting the synchronization right was/is slightly complex. Finally, it's worth noting that the classic seccomp TOCTOU of reading memory data from the task still applies here, but can be avoided with careful design of the userspace handler: if the userspace handler reads all of the task memory that is necessary before applying its security policy, the tracee's subsequent memory edits will not be read by the tracer. Signed-off-by: Tycho Andersen <tycho@tycho.ws> CC: Kees Cook <keescook@chromium.org> CC: Andy Lutomirski <luto@amacapital.net> CC: Oleg Nesterov <oleg@redhat.com> CC: Eric W. Biederman <ebiederm@xmission.com> CC: "Serge E. Hallyn" <serge@hallyn.com> Acked-by: Serge Hallyn <serge@hallyn.com> CC: Christian Brauner <christian@brauner.io> CC: Tyler Hicks <tyhicks@canonical.com> CC: Akihiro Suda <suda.akihiro@lab.ntt.co.jp> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-11seccomp: switch system call argument type to void *Tycho Andersen3-6/+6
The const qualifier causes problems for any code that wants to write to the third argument of the seccomp syscall, as we will do in a future patch in this series. The third argument to the seccomp syscall is documented as void *, so rather than just dropping the const, let's switch everything to use void * as well. I believe this is safe because of 1. the documentation above, 2. there's no real type information exported about syscalls anywhere besides the man pages. Signed-off-by: Tycho Andersen <tycho@tycho.ws> CC: Kees Cook <keescook@chromium.org> CC: Andy Lutomirski <luto@amacapital.net> CC: Oleg Nesterov <oleg@redhat.com> CC: Eric W. Biederman <ebiederm@xmission.com> CC: "Serge E. Hallyn" <serge@hallyn.com> Acked-by: Serge Hallyn <serge@hallyn.com> CC: Christian Brauner <christian@brauner.io> CC: Tyler Hicks <tyhicks@canonical.com> CC: Akihiro Suda <suda.akihiro@lab.ntt.co.jp> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-11seccomp: hoist struct seccomp_data recalculation higherTycho Andersen1-6/+6
In the next patch, we're going to use the sd pointer passed to __seccomp_filter() as the data to pass to userspace. Except that in some cases (__seccomp_filter(SECCOMP_RET_TRACE), emulate_vsyscall(), every time seccomp is inovked on power, etc.) the sd pointer will be NULL in order to force seccomp to recompute the register data. Previously this recomputation happened one level lower, in seccomp_run_filters(); this patch just moves it up a level higher to __seccomp_filter(). Thanks Oleg for spotting this. Signed-off-by: Tycho Andersen <tycho@tycho.ws> CC: Kees Cook <keescook@chromium.org> CC: Andy Lutomirski <luto@amacapital.net> CC: Oleg Nesterov <oleg@redhat.com> CC: Eric W. Biederman <ebiederm@xmission.com> CC: "Serge E. Hallyn" <serge@hallyn.com> Acked-by: Serge Hallyn <serge@hallyn.com> CC: Christian Brauner <christian@brauner.io> CC: Tyler Hicks <tyhicks@canonical.com> CC: Akihiro Suda <suda.akihiro@lab.ntt.co.jp> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-11-11Linux 4.20-rc2Linus Torvalds1-1/+1
2018-11-11act_mirred: clear skb->tstamp on redirectEric Dumazet2-10/+2
If sch_fq is used at ingress, skbs that might have been timestamped by net_timestamp_set() if a packet capture is requesting timestamps could be delayed by arbitrary amount of time, since sch_fq time base is MONOTONIC. Fix this problem by moving code from sch_netem.c to act_mirred.c. Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-11net: dsa: mv88e6xxx: Fix clearing of stats countersAndrew Lunn1-0/+2
The mv88e6161 would sometime fail to probe with a timeout waiting for the switch to complete an operation. This operation is supposed to clear the statistics counters. However, due to a read/modify/write, without the needed mask, the operation actually carried out was more random, with invalid parameters, resulting in the switch not responding. We need to preserve the histogram mode bits, so apply a mask to keep them. Reported-by: Chris Healy <Chris.Healy@zii.aero> Fixes: 40cff8fca9e3 ("net: dsa: mv88e6xxx: Fix stats histogram mode") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-11tipc: fix link re-establish failureJon Maloy1-4/+7
When a link failure is detected locally, the link is reset, the flag link->in_session is set to false, and a RESET_MSG with the 'stopping' bit set is sent to the peer. The purpose of this bit is to inform the peer that this endpoint just is going down, and that the peer should handle the reception of this particular RESET message as a local failure. This forces the peer to accept another RESET or ACTIVATE message from this endpoint before it can re-establish the link. This again is necessary to ensure that link session numbers are properly exchanged before the link comes up again. If a failure is detected locally at the same time at the peer endpoint this will do the same, which is also a correct behavior. However, when receiving such messages, the endpoints will not distinguish between 'stopping' RESETs and ordinary ones when it comes to updating session numbers. Both endpoints will copy the received session number and set their 'in_session' flags to true at the reception, while they are still expecting another RESET from the peer before they can go ahead and re-establish. This is contradictory, since, after applying the validation check referred to below, the 'in_session' flag will cause rejection of all such messages, and the link will never come up again. We now fix this by not only handling received RESET/STOPPING messages as a local failure, but also by omitting to set a new session number and the 'in_session' flag in such cases. Fixes: 7ea817f4e832 ("tipc: check session number before accepting link protocol messages") Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-11builddeb: Fix inclusion of dtbs in debian packageRob Herring1-2/+2
Commit 37c8a5fafa3b ("kbuild: consolidate Devicetree dtb build rules") moved the location of 'dtbs_install' target which caused dtbs to not be installed when building debian package with 'bindeb-pkg' target. Update the builddeb script to use the same logic that determines if there's a 'dtbs_install' target which is presence of the arch dts directory. Also, use CONFIG_OF_EARLY_FLATTREE instead of CONFIG_OF as that's a better indication of whether we are building dtbs. This commit will also have the side effect of installing dtbs on any arch that has dts files. Previously, it was dependent on whether the arch defined 'dtbs_install'. Fixes: 37c8a5fafa3b ("kbuild: consolidate Devicetree dtb build rules") Reported-by: Nuno Gonçalves <nunojpg@gmail.com> Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-11-11Revert "scripts/setlocalversion: git: Make -dirty check more robust"Guenter Roeck1-1/+1
This reverts commit 6147b1cf19651c7de297e69108b141fb30aa2349. The reverted patch results in attempted write access to the source repository, even if that repository is mounted read-only. Output from "strace git status -uno --porcelain": getcwd("/tmp/linux-test", 129) = 16 open("/tmp/linux-test/.git/index.lock", O_RDWR|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = -1 EROFS (Read-only file system) While git appears to be able to handle this situation, a monitored build environment (such as the one used for Chrome OS kernel builds) may detect it and bail out with an access violation error. On top of that, the attempted write access suggests that git _will_ write to the file even if a build output directory is specified. Users may have the reasonable expectation that the source repository remains untouched in that situation. Fixes: 6147b1cf19651 ("scripts/setlocalversion: git: Make -dirty check more robust" Cc: Genki Sky <sky@genki.is> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-11-11kbuild: deb-pkg: fix too low build version numberMasahiro Yamada1-2/+5
Since commit b41d920acff8 ("kbuild: deb-pkg: split generating packaging and build"), the build version of the kernel contained in a deb package is too low by 1. Prior to the bad commit, the kernel was built first, then the number in .version file was read out, and written into the debian control file. Now, the debian control file is created before the kernel is actually compiled, which is causing the version number mismatch. Let the mkdebian script pass KBUILD_BUILD_VERSION=${revision} to require the build system to use the specified version number. Fixes: b41d920acff8 ("kbuild: deb-pkg: split generating packaging and build") Reported-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Tested-by: Doug Smythies <dsmythies@telus.net>
2018-11-11kconfig: merge_config: avoid false positive matches from comment linesMasahiro Yamada1-3/+4
The current SED_CONFIG_EXP could match to comment lines in config fragment files, especially when CONFIG_PREFIX_ is empty. For example, Buildroot uses empty prefixing; starting symbols with BR2_ is just convention. Make the sed expression more robust against false positives from comment lines. The new sed expression matches to only valid patterns. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Petr Vorel <petr.vorel@gmail.com> Reviewed-by: Arnout Vandecappelle (Essensium/Mind) <arnout@mind.be>
2018-11-10net: sched: cls_flower: validate nested enc_opts_policy to avoid warningJakub Kicinski1-1/+13
TCA_FLOWER_KEY_ENC_OPTS and TCA_FLOWER_KEY_ENC_OPTS_MASK can only currently contain further nested attributes, which are parsed by hand, so the policy is never actually used resulting in a W=1 build warning: net/sched/cls_flower.c:492:1: warning: ‘enc_opts_policy’ defined but not used [-Wunused-const-variable=] enc_opts_policy[TCA_FLOWER_KEY_ENC_OPTS_MAX + 1] = { Add the validation anyway to avoid potential bugs when other attributes are added and to make the attribute structure slightly more clear. Validation will also set extact to point to bad attribute on error. Fixes: 0a6e77784f49 ("net/sched: allow flower to match tunnel options") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Simon Horman <simon.horman@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-09net: mvneta: correct typoAlexandre Belloni1-2/+2
The reserved variable should be named reserved1. Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-09flow_dissector: do not dissect l4 ports for fragments배석진1-2/+2
Only first fragment has the sport/dport information, not the following ones. If we want consistent hash for all fragments, we need to ignore ports even for first fragment. This bug is visible for IPv6 traffic, if incoming fragments do not have a flow label, since skb_get_hash() will give different results for first fragment and following ones. It is also visible if any routing rule wants dissection and sport or dport. See commit 5e5d6fed3741 ("ipv6: route: dissect flow in input path if fib rules need it") for details. [edumazet] rewrote the changelog completely. Fixes: 06635a35d13d ("flow_dissect: use programable dissector in skb_flow_dissect and friends") Signed-off-by: 배석진 <soukjin.bae@samsung.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-09net: qualcomm: rmnet: Fix incorrect assignment of real_devSubash Abhinov Kasiviswanathan1-3/+3
A null dereference was observed when a sysctl was being set from userspace and rmnet was stuck trying to complete some actions in the NETDEV_REGISTER callback. This is because the real_dev is set only after the device registration handler completes. sysctl call stack - <6> Unable to handle kernel NULL pointer dereference at virtual address 00000108 <2> pc : rmnet_vnd_get_iflink+0x1c/0x28 <2> lr : dev_get_iflink+0x2c/0x40 <2> rmnet_vnd_get_iflink+0x1c/0x28 <2> inet6_fill_ifinfo+0x15c/0x234 <2> inet6_ifinfo_notify+0x68/0xd4 <2> ndisc_ifinfo_sysctl_change+0x1b8/0x234 <2> proc_sys_call_handler+0xac/0x100 <2> proc_sys_write+0x3c/0x4c <2> __vfs_write+0x54/0x14c <2> vfs_write+0xcc/0x188 <2> SyS_write+0x60/0xc0 <2> el0_svc_naked+0x34/0x38 device register call stack - <2> notifier_call_chain+0x84/0xbc <2> raw_notifier_call_chain+0x38/0x48 <2> call_netdevice_notifiers_info+0x40/0x70 <2> call_netdevice_notifiers+0x38/0x60 <2> register_netdevice+0x29c/0x3d8 <2> rmnet_vnd_newlink+0x68/0xe8 <2> rmnet_newlink+0xa0/0x160 <2> rtnl_newlink+0x57c/0x6c8 <2> rtnetlink_rcv_msg+0x1dc/0x328 <2> netlink_rcv_skb+0xac/0x118 <2> rtnetlink_rcv+0x24/0x30 <2> netlink_unicast+0x158/0x1f0 <2> netlink_sendmsg+0x32c/0x338 <2> sock_sendmsg+0x44/0x60 <2> SyS_sendto+0x150/0x1ac <2> el0_svc_naked+0x34/0x38 Fixes: b752eff5be24 ("net: qualcomm: rmnet: Implement ndo_get_iflink") Signed-off-by: Sean Tranchetti <stranche@codeaurora.org> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-09net: aquantia: allow rx checksum offload configurationDmitry Bogdanov5-6/+18
RX Checksum offloads could not be configured and ignored netdev features flag for checksumming. Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-09net: aquantia: invalid checksumm offload implementationDmitry Bogdanov2-30/+41
Packets with marked invalid IP/UDP/TCP checksums were considered as good by the driver. The error was in a logic, processing offload bits in RX descriptor. Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-09net: aquantia: fixed enable unicast on 32 macvlanIgor Russkikh1-1/+1
Fixed a condition mistake due to which macvlans unicast item number 32 was not added in the unicast filter. The consequence is that when exactly 32 macvlans are created on NIC, the last created macvlan receives no traffic because its MAC was not registered in HW. Fixes: 94b3b542303f ("net: aquantia: vlan unicast address list correct handling") Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Tested-by: Nikita Danilov <nikita.danilov@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-09net: aquantia: fix potential IOMMU fault after driver unbindDmitry Bogdanov4-0/+35
IOMMU fault may occurr on unbind/bind or if_down/if_up sequence. Although driver disables the rings on down, this is not enough. Due to internal HW design, during subsequent initialization NIC sometimes may reuse RX descriptors cache and write to the host memory from the descriptor cache. That's get catched by IOMMU on host. This patch invalidates the descriptor cache in NIC on interface down to prevent writing to the cached descriptors and to the memory pointed in those descriptors. Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-09net: aquantia: synchronized flow control between mac/phyIgor Russkikh5-8/+50
Flow control statuses were not synchronized between blocks, that caused packets/link drop on some corner cases, when MAC sent PFC although Phy was not expecting these to come. Driver should readout the negotiated FC from phy and configure RX block accordigly. This is done on each link change event with information from FW. Fixes: 288551de45aa ("net: aquantia: Implement rx/tx flow control ethtools callback") Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-09clk: qcom: gcc: Fix board clock node nameVinod Koul1-1/+1
Device tree node name are not supposed to have "_" in them so fix the node name use of xo_board to xo-board Fixes: 652f1813c113 ("clk: qcom: gcc: Add global clock controller driver for QCS404") Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2018-11-09x86/cpu/vmware: Do not trace vmware_sched_clock()Steven Rostedt (VMware)1-1/+1
When running function tracing on a Linux guest running on VMware Workstation, the guest would crash. This is due to tracing of the sched_clock internal call of the VMware vmware_sched_clock(), which causes an infinite recursion within the tracing code (clock calls must not be traced). Make vmware_sched_clock() not traced by ftrace. Fixes: 80e9a4f21fd7c ("x86/vmware: Add paravirt sched clock") Reported-by: GwanYeong Kim <gy741.kim@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Borislav Petkov <bp@suse.de> CC: Alok Kataria <akataria@vmware.com> CC: GwanYeong Kim <gy741.kim@gmail.com> CC: "H. Peter Anvin" <hpa@zytor.com> CC: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org CC: Thomas Gleixner <tglx@linutronix.de> CC: virtualization@lists.linux-foundation.org CC: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/20181109152207.4d3e7d70@gandalf.local.home
2018-11-09usb: typec: ucsi: add support for Cypress CCGxAjay Gupta3-0/+319
Latest NVIDIA GPU cards have a Cypress CCGx Type-C controller over I2C interface. This UCSI I2C driver uses I2C bus driver interface for communicating with Type-C controller. Signed-off-by: Ajay Gupta <ajayg@nvidia.com> Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-11-09serial: sh-sci: Fix could not remove dev_attr_rx_fifo_timeoutYoshihiro Shimoda1-2/+2
This patch fixes an issue that the sci_remove() could not remove dev_attr_rx_fifo_timeout because uart_remove_one_port() set the port->port.type to PORT_UNKNOWN. Reported-by: Hiromitsu Yamasaki <hiromitsu.yamasaki.ym@renesas.com> Fixes: 5d23188a473d ("serial: sh-sci: make RX FIFO parameters tunable via sysfs") Cc: <stable@vger.kernel.org> # v4.11+ Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Ulrich Hecht <uli+renesas@fpond.eu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-09i2c: nvidia-gpu: make pm_ops staticWolfram Sang1-1/+1
sparse rightfully says: warning: symbol 'gpu_i2c_driver_pm' was not declared. Should it be static? Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-11-09i2c: add i2c bus driver for NVIDIA GPUAjay Gupta5-0/+403
Latest NVIDIA GPU card has USB Type-C interface. There is a Type-C controller which can be accessed over I2C. This driver adds I2C bus driver to communicate with Type-C controller. I2C client driver will be part of USB Type-C UCSI driver. Signed-off-by: Ajay Gupta <ajayg@nvidia.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> [wsa: kept Makefile sorting] Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-11-09ext4: missing !bh check in ext4_xattr_inode_write()Vasily Averin1-0/+6
According to Ted Ts'o ext4_getblk() called in ext4_xattr_inode_write() should not return bh = NULL The only time that bh could be NULL, then, would be in the case of something really going wrong; a programming error elsewhere (perhaps a wild pointer dereference) or I/O error causing on-disk file system corruption (although that would be highly unlikely given that we had *just* allocated the blocks and so the metadata blocks in question probably would still be in the cache). Fixes: e50e5129f384 ("ext4: xattr-in-inode support") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 4.13
2018-11-09i2c: qcom-geni: Fix runtime PM mismatch with child devicesStephen Boyd1-7/+8
We need to enable runtime PM on this i2c controller before populating child devices with i2c_add_adapter(). Otherwise, if a child device uses runtime PM and stays runtime PM enabled we'll get the following warning at boot. Enabling runtime PM for inactive device (a98000.i2c) with active children [...] Call trace: pm_runtime_enable+0xd8/0xf8 geni_i2c_probe+0x440/0x460 platform_drv_probe+0x74/0xc8 [...] Let's move the runtime PM enabling and setup to before we add the adapter, so that this device can respond to runtime PM requests from children. Fixes: 37692de5d523 ("i2c: i2c-qcom-geni: Add bus driver for the Qualcomm GENI I2C controller") Signed-off-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-11-09MAINTAINERS: Add entry for i2c-omap driverVignesh R1-0/+8
Add separate entry for i2c-omap and add my name as maintainer for this driver. Signed-off-by: Vignesh R <vigneshr@ti.com> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-11-09i2c: omap: Enable for ARCH_K3Vignesh R1-1/+1
Allow I2C_OMAP to be built for K3 platforms. Signed-off-by: Vignesh R <vigneshr@ti.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-11-09dt-bindings: i2c: omap: Add new compatible for AM654 SoCsVignesh R1-2/+6
AM654 SoCs have same I2C IP as OMAP SoCs. Add new compatible to handle AM654 SoCs. While at that reformat the existing compatible list for older SoCs to list one valid compatible per line. Signed-off-by: Vignesh R <vigneshr@ti.com> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-11-09xen: remove size limit of privcmd-buf mapping interfaceJuergen Gross1-18/+4
Currently the size of hypercall buffers allocated via /dev/xen/hypercall is limited to a default of 64 memory pages. For live migration of guests this might be too small as the page dirty bitmask needs to be sized according to the size of the guest. This means migrating a 8GB sized guest is already exhausting the default buffer size for the dirty bitmap. There is no sensible way to set a sane limit, so just remove it completely. The device node's usage is limited to root anyway, so there is no additional DOS scenario added by allowing unlimited buffers. While at it make the error path for the -ENOMEM case a little bit cleaner by setting n_pages to the number of successfully allocated pages instead of the target size. Fixes: c51b3c639e01f2 ("xen: add new hypercall buffer mapping device") Cc: <stable@vger.kernel.org> #4.18 Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2018-11-09xen: fix xen_qlock_wait()Juergen Gross1-6/+8
Commit a856531951dc80 ("xen: make xen_qlock_wait() nestable") introduced a regression for Xen guests running fully virtualized (HVM or PVH mode). The Xen hypervisor wouldn't return from the poll hypercall with interrupts disabled in case of an interrupt (for PV guests it does). So instead of disabling interrupts in xen_qlock_wait() use a nesting counter to avoid calling xen_clear_irq_pending() in case xen_qlock_wait() is nested. Fixes: a856531951dc80 ("xen: make xen_qlock_wait() nestable") Cc: stable@vger.kernel.org Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Tested-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Juergen Gross <jgross@suse.com>
2018-11-09block: make sure writesame bio is aligned with logical block sizeMing Lei1-1/+1
Obviously the created writesame bio has to be aligned with logical block size, and use bio_allowed_max_sectors() to retrieve this number. Cc: stable@vger.kernel.org Cc: Mike Snitzer <snitzer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Xiao Ni <xni@redhat.com> Cc: Mariusz Dabrowski <mariusz.dabrowski@intel.com> Fixes: b49a0871be31a745b2ef ("block: remove split code in blkdev_issue_{discard,write_same}") Tested-by: Rui Salvaterra <rsalvaterra@gmail.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-09block: cleanup __blkdev_issue_discard()Ming Lei1-17/+6
Cleanup __blkdev_issue_discard() a bit: - remove local variable of 'end_sect' - remove code block of 'fail' Cc: Mike Snitzer <snitzer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Xiao Ni <xni@redhat.com> Cc: Mariusz Dabrowski <mariusz.dabrowski@intel.com> Tested-by: Rui Salvaterra <rsalvaterra@gmail.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-09block: make sure discard bio is aligned with logical block sizeMing Lei3-3/+13
Obviously the created discard bio has to be aligned with logical block size. This patch introduces the helper of bio_allowed_max_sectors() for this purpose. Cc: stable@vger.kernel.org Cc: Mike Snitzer <snitzer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Xiao Ni <xni@redhat.com> Cc: Mariusz Dabrowski <mariusz.dabrowski@intel.com> Fixes: 744889b7cbb56a6 ("block: don't deal with discard limit in blkdev_issue_discard()") Fixes: a22c4d7e34402cc ("block: re-add discard_granularity and alignment checks") Reported-by: Rui Salvaterra <rsalvaterra@gmail.com> Tested-by: Rui Salvaterra <rsalvaterra@gmail.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-09Revert "nvmet-rdma: use a private workqueue for delete"Christoph Hellwig1-15/+4
This reverts commit 2acf70ade79d26b97611a8df52eb22aa33814cd4. The commit never really fixed the intended issue and caused all kinds of other issues, including a use before initialization. Suggested-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-09nvme: make sure ns head inherits underlying device limitsSagi Grimberg2-1/+4
Whenever we update ns_head info, we need to make sure it is still compatible with all underlying backing devices because although nvme multipath doesn't have any explicit use of these limits, other devices can still be stacked on top of it which may rely on the underlying limits. Start with unlimited stacking limits, and every info update iterate over siblings and adjust queue limits. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-09nvmet: don't try to add ns to p2p map unless it actually uses itSagi Grimberg1-1/+1
Even without CONFIG_P2PDMA this results in a error print: nvmet: no peer-to-peer memory is available that's supported by rxe0 and /dev/nullb0 Fixes: c6925093d0b2 ("nvmet: Optionally use PCI P2P memory") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-09x86/xen: fix pv bootJuergen Gross2-6/+32
Commit 9da3f2b7405440 ("x86/fault: BUG() when uaccess helpers fault on kernel addresses") introduced a regression for booting Xen PV guests. Xen PV guests are using __put_user() and __get_user() for accessing the p2m map (physical to machine frame number map) as accesses might fail in case of not populated areas of the map. With above commit using __put_user() and __get_user() for accessing kernel pages is no longer valid. So replace the Xen hack by adding appropriate p2m access functions using the default fixup handler. Fixes: 9da3f2b7405440 ("x86/fault: BUG() when uaccess helpers fault on kernel addresses") Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2018-11-08net: smsc95xx: Fix MTU rangeStefan Wahren1-0/+2
The commit f77f0aee4da4 ("net: use core MTU range checking in USB NIC drivers") introduce a common MTU handling for usbnet. But it's missing the necessary changes for smsc95xx. So set the MTU range accordingly. This patch has been tested on a Raspberry Pi 3. Fixes: f77f0aee4da4 ("net: use core MTU range checking in USB NIC drivers") Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08net: stmmac: Fix RX packet size > 8191Thor Thayer4-4/+5
Ping problems with packets > 8191 as shown: PING 192.168.1.99 (192.168.1.99) 8150(8178) bytes of data. 8158 bytes from 192.168.1.99: icmp_seq=1 ttl=64 time=0.669 ms wrong data byte 8144 should be 0xd0 but was 0x0 16 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f %< ---------------snip-------------------------------------- 8112 b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 ba bb bc bd be bf c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 ca cb cc cd ce cf 8144 0 0 0 0 d0 d1 ^^^^^^^ Notice the 4 bytes of 0 before the expected byte of d0. Databook notes that the RX buffer must be a multiple of 4/8/16 bytes [1]. Update the DMA Buffer size define to 8188 instead of 8192. Remove the -1 from the RX buffer size allocations and use the new DMA Buffer size directly. [1] Synopsys DesignWare Cores Ethernet MAC Universal v3.70a [section 8.4.2 - Table 8-24] Tested on SoCFPGA Stratix10 with ping sweep from 100 to 8300 byte packets. Fixes: 286a83721720 ("stmmac: add CHAINED descriptor mode support (V4)") Suggested-by: Jose Abreu <jose.abreu@synopsys.com> Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08qed: Fix potential memory corruptionSagiv Ozeri1-7/+5
A stuck ramrod should be deleted from the completion_pending list, otherwise it will be added again in the future and corrupt the list. Return error value to inform that ramrod is stuck and should be deleted. Signed-off-by: Sagiv Ozeri <sagiv.ozeri@cavium.com> Signed-off-by: Denis Bolotin <denis.bolotin@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08qed: Fix SPQ entries not returned to pool in error flowsDenis Bolotin8-15/+45
qed_sp_destroy_request() API was added for SPQ users that need to free/return the entry they acquired in their error flows. Signed-off-by: Denis Bolotin <denis.bolotin@cavium.com> Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08qed: Fix blocking/unlimited SPQ entries leakDenis Bolotin2-27/+33
When there are no SPQ entries left in the free_pool, new entries are allocated and are added to the unlimited list. When an entry in the pool is available, the content is copied from the original entry, and the new entry is sent to the device. qed_spq_post() is not aware of that, so the additional entry is stored in the original entry as p_post_ent, which can later be returned to the pool. Signed-off-by: Denis Bolotin <denis.bolotin@cavium.com> Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08qed: Fix memory/entry leak in qed_init_sp_request()Denis Bolotin1-2/+14
Free the allocated SPQ entry or return the acquired SPQ entry to the free list in error flows. Signed-off-by: Denis Bolotin <denis.bolotin@cavium.com> Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08inet: frags: better deal with smp racesEric Dumazet1-14/+15
Multiple cpus might attempt to insert a new fragment in rhashtable, if for example RPS is buggy, as reported by 배석진 in https://patchwork.ozlabs.org/patch/994601/ We use rhashtable_lookup_get_insert_key() instead of rhashtable_insert_fast() to let cpus losing the race free their own inet_frag_queue and use the one that was inserted by another cpu. Fixes: 648700f76b03 ("inet: frags: use rhashtables for reassembly units") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: 배석진 <soukjin.bae@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08net: hns3: bugfix for not checking return valueHuazhong Tan1-1/+2
hns3_reset_notify_init_enet() only return error early if the return value of hns3_restore_vlan() is not 0. This patch adds checking for the return value of hns3_restore_vlan. Fixes: 7fa6be4fd2f6 ("net: hns3: fix incorrect return value/type of some functions") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08of, numa: Validate some distance map rulesJohn Garry1-2/+7
Currently the NUMA distance map parsing does not validate the distance table for the distance-matrix rules 1-2 in [1]. However the arch NUMA code may enforce some of these rules, but not all. Such is the case for the arm64 port, which does not enforce the rule that the distance between separates nodes cannot equal LOCAL_DISTANCE. The patch adds the following rules validation: - distance of node to self equals LOCAL_DISTANCE - distance of separate nodes > LOCAL_DISTANCE This change avoids a yet-unresolved crash reported in [2]. A note on dealing with symmetrical distances between nodes: Validating symmetrical distances between nodes is difficult. If it were mandated in the bindings that every distance must be recorded in the table, then it would be easy. However, it isn't. In addition to this, it is also possible to record [b, a] distance only (and not [a, b]). So, when processing the table for [b, a], we cannot assert that current distance of [a, b] != [b, a] as invalid, as [a, b] distance may not be present in the table and current distance would be default at REMOTE_DISTANCE. As such, we maintain the policy that we overwrite distance [a, b] = [b, a] for b > a. This policy is different to kernel ACPI SLIT validation, which allows non-symmetrical distances (ACPI spec SLIT rules allow it). However, the distance debug message is dropped as it may be misleading (for a distance which is later overwritten). Some final notes on semantics: - It is implied that it is the responsibility of the arch NUMA code to reset the NUMA distance map for an error in distance map parsing. - It is the responsibility of the FW NUMA topology parsing (whether OF or ACPI) to enforce NUMA distance rules, and not arch NUMA code. [1] Documents/devicetree/bindings/numa.txt [2] https://www.spinics.net/lists/arm-kernel/msg683304.html Cc: stable@vger.kernel.org # 4.7 Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Rob Herring <robh@kernel.org>
2018-11-08of/device: Really only set bus DMA mask when appropriateRobin Murphy1-1/+3
of_dma_configure() was *supposed* to be following the same logic as acpi_dma_configure() and only setting bus_dma_mask if some range was specified by the firmware. However, it seems that subtlety got lost in the process of fitting it into the differently-shaped control flow, and as a result the force_dma==true case ends up always setting the bus mask to the 32-bit default, which is not what anyone wants. Make sure we only touch it if the DT actually said so. Fixes: 6c2fb2ea7636 ("of/device: Set bus DMA mask as appropriate") Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi> Reported-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi> Tested-by: John Stultz <john.stultz@linaro.org> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Robert Richter <robert.richter@cavium.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Rob Herring <robh@kernel.org>