aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/vhost (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-12-12Revert "net: vhost: lock the vqs one by one"Jason Wang1-4/+17
This reverts commit 78139c94dc8c96a478e67dab3bee84dc6eccb5fd. We don't protect device IOTLB with vq mutex, which will lead e.g use after free for device IOTLB entries. And since we've switched to use mutex_trylock() in previous patch, it's safe to revert it without having deadlock. Fixes: commit 78139c94dc8c ("net: vhost: lock the vqs one by one") Cc: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-12vhost_net: switch to use mutex_trylock() in vhost_net_busy_poll()Jason Wang1-1/+7
We used to hold the mutex of paired virtqueue in vhost_net_busy_poll(). But this will results an inconsistent lock order which may cause deadlock if we try to bring back the protection of device IOTLB with vq mutex that requires to hold mutex of all virtqueues at the same time. Fix this simply by switching to use mutex_trylock(), when fail just skip the busy polling. This can happen when device IOTLB is under updating which should be rare. Fixes: commit 78139c94dc8c ("net: vhost: lock the vqs one by one") Cc: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-12vhost: make sure used idx is seen before log in vhost_add_used_n()Jason Wang1-0/+2
We miss a write barrier that guarantees used idx is updated and seen before log. This will let userspace sync and copy used ring before used idx is update. Fix this by adding a barrier before log_write(). Fixes: 8dd014adfea6f ("vhost-net: mergeable buffers support") Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-3/+0
Pull networking fixes from David Miller: "A decent batch of fixes here. I'd say about half are for problems that have existed for a while, and half are for new regressions added in the 4.20 merge window. 1) Fix 10G SFP phy module detection in mvpp2, from Baruch Siach. 2) Revert bogus emac driver change, from Benjamin Herrenschmidt. 3) Handle BPF exported data structure with pointers when building 32-bit userland, from Daniel Borkmann. 4) Memory leak fix in act_police, from Davide Caratti. 5) Check RX checksum offload in RX descriptors properly in aquantia driver, from Dmitry Bogdanov. 6) SKB unlink fix in various spots, from Edward Cree. 7) ndo_dflt_fdb_dump() only works with ethernet, enforce this, from Eric Dumazet. 8) Fix FID leak in mlxsw driver, from Ido Schimmel. 9) IOTLB locking fix in vhost, from Jean-Philippe Brucker. 10) Fix SKB truesize accounting in ipv4/ipv6/netfilter frag memory limits otherwise namespace exit can hang. From Jiri Wiesner. 11) Address block parsing length fixes in x25 from Martin Schiller. 12) IRQ and ring accounting fixes in bnxt_en, from Michael Chan. 13) For tun interfaces, only iface delete works with rtnl ops, enforce this by disallowing add. From Nicolas Dichtel. 14) Use after free in liquidio, from Pan Bian. 15) Fix SKB use after passing to netif_receive_skb(), from Prashant Bhole. 16) Static key accounting and other fixes in XPS from Sabrina Dubroca. 17) Partially initialized flow key passed to ip6_route_output(), from Shmulik Ladkani. 18) Fix RTNL deadlock during reset in ibmvnic driver, from Thomas Falcon. 19) Several small TCP fixes (off-by-one on window probe abort, NULL deref in tail loss probe, SNMP mis-estimations) from Yuchung Cheng" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (93 commits) net/sched: cls_flower: Reject duplicated rules also under skip_sw bnxt_en: Fix _bnxt_get_max_rings() for 57500 chips. bnxt_en: Fix NQ/CP rings accounting on the new 57500 chips. bnxt_en: Keep track of reserved IRQs. bnxt_en: Fix CNP CoS queue regression. net/mlx4_core: Correctly set PFC param if global pause is turned off. Revert "net/ibm/emac: wrong bit is used for STA control" neighbour: Avoid writing before skb->head in neigh_hh_output() ipv6: Check available headroom in ip6_xmit() even without options tcp: lack of available data can also cause TSO defer ipv6: sr: properly initialize flowi6 prior passing to ip6_route_output mlxsw: spectrum_switchdev: Fix VLAN device deletion via ioctl mlxsw: spectrum_router: Relax GRE decap matching check mlxsw: spectrum_switchdev: Avoid leaking FID's reference count mlxsw: spectrum_nve: Remove easily triggerable warnings ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changes sctp: frag_point sanity check tcp: fix NULL ref in tail loss probe tcp: Do not underestimate rwnd_limited net: use skb_list_del_init() to remove from RX sublists ...
2018-12-06vhost/vsock: fix use-after-free in network stack callersStefan Hajnoczi1-24/+33
If the network stack calls .send_pkt()/.cancel_pkt() during .release(), a struct vhost_vsock use-after-free is possible. This occurs because .release() does not wait for other CPUs to stop using struct vhost_vsock. Switch to an RCU-enabled hashtable (indexed by guest CID) so that .release() can wait for other CPUs by calling synchronize_rcu(). This also eliminates vhost_vsock_lock acquisition in the data path so it could have a positive effect on performance. This is CVE-2018-14625 "kernel: use-after-free Read in vhost_transport_send_pkt". Cc: stable@vger.kernel.org Reported-and-tested-by: syzbot+bd391451452fb0b93039@syzkaller.appspotmail.com Reported-by: syzbot+e3e074963495f92a89ed@syzkaller.appspotmail.com Reported-by: syzbot+d5a0a170c5069658b141@syzkaller.appspotmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2018-12-06vhost/vsock: fix reset orphans race with close timeoutStefan Hajnoczi1-7/+15
If a local process has closed a connected socket and hasn't received a RST packet yet, then the socket remains in the table until a timeout expires. When a vhost_vsock instance is released with the timeout still pending, the socket is never freed because vhost_vsock has already set the SOCK_DONE flag. Check if the close timer is pending and let it close the socket. This prevents the race which can leak sockets. Reported-by: Maximilian Riemensberger <riemensberger@cadami.net> Cc: Graham Whaley <graham.whaley@gmail.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-12-03vhost: fix IOTLB lockingJean-Philippe Brucker1-3/+0
Commit 78139c94dc8c ("net: vhost: lock the vqs one by one") moved the vq lock to improve scalability, but introduced a possible deadlock in vhost-iotlb. vhost_iotlb_notify_vq() now takes vq->mutex while holding the device's IOTLB spinlock. And on the vhost_iotlb_miss() path, the spinlock is taken while holding vq->mutex. Since calling vhost_poll_queue() doesn't require any lock, avoid the deadlock by not taking vq->mutex. Fixes: 78139c94dc8c ("net: vhost: lock the vqs one by one") Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-01Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-97/+329
Pull virtio/vhost updates from Michael Tsirkin: "Fixes and tweaks: - virtio balloon page hinting support - vhost scsi control queue - misc fixes" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: MAINTAINERS: remove reference to bogus vsock file vhost/scsi: Use common handling code in request queue handler vhost/scsi: Extract common handling code from control queue handler vhost/scsi: Respond to control queue operations vhost/scsi: truncate T10 PI iov_iter to prot_bytes virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON mm/page_poison: expose page_poisoning_enabled to kernel modules virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT kvm_config: add CONFIG_VIRTIO_MENU
2018-10-31vhost: Fix Spectre V1 vulnerabilityJason Wang1-0/+2
The idx in vhost_vring_ioctl() was controlled by userspace, hence a potential exploitation of the Spectre variant 1 vulnerability. Fixing this by sanitizing idx before using it to index d->vqs. Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-24vhost/scsi: Use common handling code in request queue handlerBijan Mottahedeh1-197/+164
Change the request queue handler to use common handling routines same as the control queue handler. Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-10-24vhost/scsi: Extract common handling code from control queue handlerBijan Mottahedeh1-99/+172
Prepare to change the request queue handler to use common handling routines. Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-10-24vhost/scsi: Respond to control queue operationsBijan Mottahedeh1-0/+190
The vhost-scsi driver currently does not handle any control queue operations. In particular, vhost_scsi_ctl_handle_kick, merely prints out a debug message but does nothing else. This can cause guest VMs to hang. As part of SCSI recovery from an error, e.g., an I/O timeout, the SCSI midlayer attempts to abort the failed operation. The SCSI virtio driver translates the abort to a SCSI TMF request that gets put on the control queue (virtscsi_abort -> virtscsi_tmf). The SCSI virtio driver then waits indefinitely for this request to be completed, but it never will because vhost-scsi never responds to that request. To avoid a hang, always respond to control queue operations; explicitly reject TMF requests, and return a no-op response to event requests. Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-10-24vhost/scsi: truncate T10 PI iov_iter to prot_bytesGreg Edwards1-1/+3
Commands with protection information included were not truncating the protection iov_iter to the number of protection bytes in the command. This resulted in vhost_scsi mis-calculating the size of the protection SGL in vhost_scsi_calc_sgls(), and including both the protection and data SG entries in the protection SGL. Fixes: 09b13fa8c1a1 ("vhost/scsi: Add ANY_LAYOUT support in vhost_scsi_handle_vq") Signed-off-by: Greg Edwards <gedwards@ddn.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Fixes: 09b13fa8c1a1093e9458549ac8bb203a7c65c62a Cc: stable@vger.kernel.org Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-07net: vhost: remove bad code lineTonghao Zhang1-1/+0
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26net: vhost: add rx busy polling in tx pathTonghao Zhang1-21/+14
This patch improves the guest receive performance. On the handle_tx side, we poll the sock receive queue at the same time. handle_rx do that in the same way. We set the poll-us=100us and use the netperf to test throughput and mean latency. When running the tests, the vhost-net kthread of that VM, is alway 100% CPU. The commands are shown as below. Rx performance is greatly improved by this patch. There is not notable performance change on tx with this series though. This patch is useful for bi-directional traffic. netperf -H IP -t TCP_STREAM -l 20 -- -O "THROUGHPUT, THROUGHPUT_UNITS, MEAN_LATENCY" Topology: [Host] ->linux bridge -> tap vhost-net ->[Guest] TCP_STREAM: * Without the patch: 19842.95 Mbps, 6.50 us mean latency * With the patch: 37598.20 Mbps, 3.43 us mean latency Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26net: vhost: factor out busy polling logic to vhost_net_busy_poll()Tonghao Zhang1-40/+70
Factor out generic busy polling logic and will be used for in tx path in the next patch. And with the patch, qemu can set differently the busyloop_timeout for rx queue. To avoid duplicate codes, introduce the helper functions: * sock_has_rx_data(changed from sk_has_rx_data) * vhost_net_busy_poll_try_queue Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26net: vhost: replace magic number of lock annotationTonghao Zhang1-3/+3
Use the VHOST_NET_VQ_XXX as a subclass for mutex_lock_nested. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26net: vhost: lock the vqs one by oneTonghao Zhang1-17/+7
This patch changes the way that lock all vqs at the same, to lock them one by one. It will be used for next patch to avoid the deadlock. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21vhost_net: add a missing error returnDan Carpenter1-0/+1
We accidentally left out this error return so it leads to some use after free bugs later on. Fixes: 0a0be13b8fe2 ("vhost_net: batch submitting XDP buffers to underlayer sockets") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13vhost_net: batch submitting XDP buffers to underlayer socketsJason Wang1-13/+161
This patch implements XDP batching for vhost_net. The idea is first to try to do userspace copy and build XDP buff directly in vhost. Instead of submitting the packet immediately, vhost_net will batch them in an array and submit every 64 (VHOST_NET_BATCH) packets to the under layer sockets through msg_control of sendmsg(). When XDP is enabled on the TUN/TAP, TUN/TAP can process XDP inside a loop without caring GUP thus it can do batch map flushing. When XDP is not enabled or not supported, the underlayer socket need to build skb and pass it to network core. The batched packet submission allows us to do batching like netif_receive_skb_list() in the future. This saves lots of indirect calls for better cache utilization. For the case that we can't so batching e.g when sndbuf is limited or packet size is too large, we will go for usual one packet per sendmsg() way. Doing testpmd on various setups gives us: Test /+pps% XDP_DROP on TAP /+44.8% XDP_REDIRECT on TAP /+29% macvtap (skb) /+26% Netperf tests shows obvious improvements for small packet transmission: size/session/+thu%/+normalize% 64/ 1/ +2%/ 0% 64/ 2/ +3%/ +1% 64/ 4/ +7%/ +5% 64/ 8/ +8%/ +6% 256/ 1/ +3%/ 0% 256/ 2/ +10%/ +7% 256/ 4/ +26%/ +22% 256/ 8/ +27%/ +23% 512/ 1/ +3%/ +2% 512/ 2/ +19%/ +14% 512/ 4/ +43%/ +40% 512/ 8/ +45%/ +41% 1024/ 1/ +4%/ 0% 1024/ 2/ +27%/ +21% 1024/ 4/ +38%/ +73% 1024/ 8/ +15%/ +24% 2048/ 1/ +10%/ +7% 2048/ 2/ +16%/ +12% 2048/ 4/ 0%/ +2% 2048/ 8/ 0%/ +2% 4096/ 1/ +36%/ +60% 4096/ 2/ -11%/ -26% 4096/ 4/ 0%/ +14% 4096/ 8/ 0%/ +4% 16384/ 1/ -1%/ +5% 16384/ 2/ 0%/ +2% 16384/ 4/ 0%/ -3% 16384/ 8/ 0%/ +4% 65535/ 1/ 0%/ +10% 65535/ 2/ 0%/ +8% 65535/ 4/ 0%/ +1% 65535/ 8/ 0%/ +3% Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13tun: switch to new type of msg_controlJason Wang1-2/+5
This patch introduces to a new tun/tap specific msg_control: #define TUN_MSG_UBUF 1 #define TUN_MSG_PTR 2 struct tun_msg_ctl { int type; void *ptr; }; This allows us to pass different kinds of msg_control through sendmsg(). The first supported type is ubuf (TUN_MSG_UBUF) which will be used by the existed vhost_net zerocopy code. The second is XDP buff, which allows vhost_net to pass XDP buff to TUN. This could be used to implement accepting an array of XDP buffs from vhost_net in the following patches. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-1/+1
Pull networking fixes from David Miller: 1) ICE, E1000, IGB, IXGBE, and I40E bug fixes from the Intel folks. 2) Better fix for AB-BA deadlock in packet scheduler code, from Cong Wang. 3) bpf sockmap fixes (zero sized key handling, etc.) from Daniel Borkmann. 4) Send zero IPID in TCP resets and SYN-RECV state ACKs, to prevent attackers using it as a side-channel. From Eric Dumazet. 5) Memory leak in mediatek bluetooth driver, from Gustavo A. R. Silva. 6) Hook up rt->dst.input of ipv6 anycast routes properly, from Hangbin Liu. 7) hns and hns3 bug fixes from Huazhong Tan. 8) Fix RIF leak in mlxsw driver, from Ido Schimmel. 9) iova range check fix in vhost, from Jason Wang. 10) Fix hang in do_tcp_sendpages() with tls, from John Fastabend. 11) More r8152 chips need to disable RX aggregation, from Kai-Heng Feng. 12) Memory exposure in TCA_U32_SEL handling, from Kees Cook. 13) TCP BBR congestion control fixes from Kevin Yang. 14) hv_netvsc, ignore non-PCI devices, from Stephen Hemminger. 15) qed driver fixes from Tomer Tayar. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (77 commits) net: sched: Fix memory exposure from short TCA_U32_SEL qed: fix spelling mistake "comparsion" -> "comparison" vhost: correctly check the iova range when waking virtqueue qlge: Fix netdev features configuration. net: macb: do not disable MDIO bus at open/close time Revert "net: stmmac: fix build failure due to missing COMMON_CLK dependency" net: macb: Fix regression breaking non-MDIO fixed-link PHYs mlxsw: spectrum_switchdev: Do not leak RIFs when removing bridge i40e: fix condition of WARN_ONCE for stat strings i40e: Fix for Tx timeouts when interface is brought up if DCB is enabled ixgbe: fix driver behaviour after issuing VFLR ixgbe: Prevent unsupported configurations with XDP ixgbe: Replace GFP_ATOMIC with GFP_KERNEL igb: Replace mdelay() with msleep() in igb_integrated_phy_loopback() igb: Replace GFP_ATOMIC with GFP_KERNEL in igb_sw_init() igb: Use an advanced ctx descriptor for launchtime e1000: ensure to free old tx/rx rings in set_ringparam() e1000: check on netif_running() before calling e1000_up() ixgb: use dma_zalloc_coherent instead of allocator/memset ice: Trivial formatting fixes ...
2018-08-25vhost: correctly check the iova range when waking virtqueueJason Wang1-1/+1
We don't wakeup the virtqueue if the first byte of pending iova range is the last byte of the range we just got updated. This will lead a virtqueue to wait for IOTLB updating forever. Fixing by correct the check and wake up the virtqueue in this case. Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API") Reported-by: Peter Xu <peterx@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Peter Xu <peterx@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-24Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2-2/+2
Pull virtio updates from Michael Tsirkin: "virtio, vhost: fixes, tweaks No new features but a bunch of tweaks such as switching balloon from oom notifier to shrinker" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost/scsi: increase VHOST_SCSI_PREALLOC_PROT_SGLS to 2048 vhost: allow vhost-scsi driver to be built-in virtio: pci-legacy: Validate queue pfn virtio: mmio-v1: Validate queue PFN virtio_balloon: replace oom notifier with shrinker virtio-balloon: kzalloc the vb struct virtio-balloon: remove BUG() in init_vqs
2018-08-22vhost/scsi: increase VHOST_SCSI_PREALLOC_PROT_SGLS to 2048Greg Edwards1-1/+1
The current value of VHOST_SCSI_PREALLOC_PROT_SGLS is too small to accommodate larger I/Os, e.g. 16-32 MiB, when the VIRTIO_SCSI_F_T10_PI feature bit is negotiated and the backing store supports T10 PI. vhost-scsi rejects the command with errors like: [ 59.581317] vhost_scsi_calc_sgls: requested sgl_count: 1820 exceeds pre-allocated max_sgls: 512 Signed-off-by: Greg Edwards <gedwards@ddn.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-08-22vhost: allow vhost-scsi driver to be built-inGreg Edwards1-1/+1
It's useful to allow vhost-scsi to be built-in when testing vhost in L1 + L2 VMs and booting L1 VM with QEMU '-kernel' option. Signed-off-by: Greg Edwards <gedwards@ddn.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-08-15Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds1-9/+7
Pull SCSI updates from James Bottomley: "This is mostly updates to the usual drivers: mpt3sas, lpfc, qla2xxx, hisi_sas, smartpqi, megaraid_sas, arcmsr. In addition, with the continuing absence of Nic we have target updates for tcmu and target core (all with reviews and acks). The biggest observable change is going to be that we're (again) trying to switch to mulitqueue as the default (a user can still override the setting on the kernel command line). Other major core stuff is the removal of the remaining Microchannel drivers, an update of the internal timers and some reworks of completion and result handling" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (203 commits) scsi: core: use blk_mq_run_hw_queues in scsi_kick_queue scsi: ufs: remove unnecessary query(DM) UPIU trace scsi: qla2xxx: Fix issue reported by static checker for qla2x00_els_dcmd2_sp_done() scsi: aacraid: Spelling fix in comment scsi: mpt3sas: Fix calltrace observed while running IO & reset scsi: aic94xx: fix an error code in aic94xx_init() scsi: st: remove redundant pointer STbuffer scsi: qla2xxx: Update driver version to 10.00.00.08-k scsi: qla2xxx: Migrate NVME N2N handling into state machine scsi: qla2xxx: Save frame payload size from ICB scsi: qla2xxx: Fix stalled relogin scsi: qla2xxx: Fix race between switch cmd completion and timeout scsi: qla2xxx: Fix Management Server NPort handle reservation logic scsi: qla2xxx: Flush mailbox commands on chip reset scsi: qla2xxx: Fix unintended Logout scsi: qla2xxx: Fix session state stuck in Get Port DB scsi: qla2xxx: Fix redundant fc_rport registration scsi: qla2xxx: Silent erroneous message scsi: qla2xxx: Prevent sysfs access when chip is down scsi: qla2xxx: Add longer window for chip reset ...
2018-08-09Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-3/+6
Overlapping changes in RXRPC, changing to ktime_get_seconds() whilst adding some tracepoints. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-08vhost: reset metadata cache when initializing new IOTLBJason Wang1-3/+6
We need to reset metadata cache during new IOTLB initialization, otherwise the stale pointers to previous IOTLB may be still accessed which will lead a use after free. Reported-by: syzbot+c51e6736a1bf614b3272@syzkaller.appspotmail.com Fixes: f88949138058 ("vhost: introduce O(1) vq metadata cache") Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-06vhost: switch to use new message formatJason Wang3-19/+93
We use to have message like: struct vhost_msg { int type; union { struct vhost_iotlb_msg iotlb; __u8 padding[64]; }; }; Unfortunately, there will be a hole of 32bit in 64bit machine because of the alignment. This leads a different formats between 32bit API and 64bit API. What's more it will break 32bit program running on 64bit machine. So fixing this by introducing a new message type with an explicit 32bit reserved field after type like: struct vhost_msg_v2 { __u32 type; __u32 reserved; union { struct vhost_iotlb_msg iotlb; __u8 padding[64]; }; }; We will have a consistent ABI after switching to use this. To enable this capability, introduce a new ioctl (VHOST_SET_BAKCEND_FEATURE) for userspace to enable this feature (VHOST_BACKEND_F_IOTLB_V2). Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API") Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-02scsi: target: loop, usb, vhost, xen: use target_remove_sessionMike Christie1-1/+1
This converts drivers that were only calling transport_deregister_session to use target_remove_session. The calling of transport_deregister_session_configfs via target_remove_session for these types of drivers is ok, because they were not exporting info from fields like sess_acl_list, sess->se_tpg and sess->fabric_sess_ptr from configfs accessible functions, so they will see no difference. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Felipe Balbi <balbi@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Andrzej Pietrasiewicz <andrzej.p@samsung.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Juergen Gross <jgross@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-08-02scsi: target: rename target_alloc_sessionMike Christie1-1/+1
Rename target_alloc_session to target_setup_session to avoid confusion with the other transport session allocation function that only allocates the session and because the target_alloc_session does so much more. It allocates the session, sets up the nacl and registers the session. The next patch will then add a remove function to match the setup in this one, so it should make sense for all drivers, except iscsi, to just call those 2 functions to setup and remove a session. iscsi will continue to be the odd driver. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Chris Boot <bootc@bootc.net> Cc: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Cc: Michael Cyr <mikecyr@linux.vnet.ibm.com> Cc: <qla2xxx-upstream@qlogic.com> Cc: Johannes Thumshirn <jth@kernel.org> Cc: Felipe Balbi <balbi@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Andrzej Pietrasiewicz <andrzej.p@samsung.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Juergen Gross <jgross@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-07-22vhost_net: batch update used ring for datacopy TXJason Wang1-15/+25
Like commit e2b3b35eb989 ("vhost_net: batch used ring update in rx"), this patches implements batch used ring update for datacopy TX (zerocopy has already done some kind of batching). Testpmd transmission from guest to host (XDP_DROP on tap) shows 25.8% improvement (from ~3.1Mpps to ~3.9Mpps) on Broadwell i7-5600U CPU @ 2.60GHz machine. Netperf TCP tests does not show obvious differences. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-22vhost_net: rename VHOST_RX_BATCH to VHOST_NET_BATCHJason Wang1-4/+4
A more generic name which could be used for TX as well. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-22vhost_net: rename vhost_rx_signal_used() to vhost_net_signal_used()Jason Wang1-4/+4
Rename for reusing this for TX. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-22vhost_net: split out datacopy logicJason Wang1-20/+90
Instead of mixing zerocopy and datacopy logics, this patch tries to split datacopy logic out. This results for a more compact code and ad-hoc optimization could be done on top more easily. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-22vhost_net: introduce tx_can_batch()Jason Wang1-2/+7
Introduce tx_can_batch() to determine whether TX could be batched. This will help to reduce the code duplication in the future. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-22vhost_net: introduce get_tx_bufs()Jason Wang1-17/+32
Factor out logic of getting tx buffer and iov iter initialization. This will be used for reducing codes duplication in the future. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-22vhost_net: introduce vhost_exceeds_weight()Jason Wang1-5/+8
Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-22vhost_net: introduce helper to initialize tx iov iterJason Wang1-9/+17
Introduce init_iov_iter() in order to be reused by future patch. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-22vhost_net: drop unnecessary parameterJason Wang1-4/+2
Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04vhost_net: Avoid rx vring kicks during busyloopToshiaki Makita1-3/+7
We may run out of avail rx ring descriptor under heavy load but busypoll did not detect it so busypoll may have exited prematurely. Avoid this by checking rx ring full during busypoll. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04vhost_net: Avoid rx queue wake-ups during busypollToshiaki Makita1-7/+19
We may run handle_rx() while rx work is queued. For example a packet can push the rx work during the window before handle_rx calls vhost_net_disable_vq(). In that case busypoll immediately exits due to vhost_has_work() condition and enables vq again. This can lead to another unnecessary rx wake-ups, so poll rx work instead of enabling the vq. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04vhost_net: Avoid tx vring kicks during busyloopToshiaki Makita1-13/+22
Under heavy load vhost busypoll may run without suppressing notification. For example tx zerocopy callback can push tx work while handle_tx() is running, then busyloop exits due to vhost_has_work() condition and enables notification but immediately reenters handle_tx() because the pushed work was tx. In this case handle_tx() tries to disable notification again, but when using event_idx it by design cannot. Then busyloop will run without suppressing notification. Another example is the case where handle_tx() tries to enable notification but avail idx is advanced so disables it again. This case also leads to the same situation with event_idx. The problem is that once we enter this situation busyloop does not work under heavy load for considerable amount of time, because notification is likely to happen during busyloop and handle_tx() immediately enables notification after notification happens. Specifically busyloop detects notification by vhost_has_work() and then handle_tx() calls vhost_enable_notify(). Because the detected work was the tx work, it enters handle_tx(), and enters busyloop without suppression again. This is likely to be repeated, so with event_idx we are almost not able to suppress notification in this case. To fix this, poll the work instead of enabling notification when busypoll is interrupted by something. IMHO vhost_has_work() is kind of interruption rather than a signal to completely cancel the busypoll, so let's run busypoll after the necessary work is done. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04vhost_net: Rename local variables in vhost_net_rx_peek_head_lenToshiaki Makita1-17/+17
So we can easily see which variable is for which, tx or rx. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-02scsi: target: Remove second argument from fabric_make_tpg()Bart Van Assche1-3/+1
Since most target drivers do not use the second fabric_make_tpg() argument ("group") and since it is trivial to derive the group pointer from the wwn pointer, do not pass the group pointer to fabric_make_tpg(). Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Mike Christie <mchristi@redhat.com> Cc: Felipe Balbi <felipe.balbi@linux.intel.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-06-23vhost_net: validate sock before trying to put its fdJason Wang1-1/+2
Sock will be NULL if we pass -1 to vhost_net_set_backend(), but when we meet errors during ubuf allocation, the code does not check for NULL before calling sockfd_put(), this will lead NULL dereferencing. Fixing by checking sock pointer before. Fixes: bab632d69ee4 ("vhost: vhost TX zero-copy support") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-19scsi: target: Convert target drivers to use sbitmapMatthew Wilcox1-3/+3
The sbitmap and the percpu_ida perform essentially the same task, allocating tags for commands. The sbitmap outperforms the percpu_ida as documented here: https://lkml.org/lkml/2014/4/22/553 The sbitmap interface is a little harder to use, but being able to remove the percpu_ida code and getting better performance justifies the additional complexity. Signed-off-by: Matthew Wilcox <willy@infradead.org> Acked-by: Felipe Balbi <felipe.balbi@linux.intel.com> # f_tcm Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-06-19scsi: target: Abstract tag freeingMatthew Wilcox1-1/+1
Introduce target_free_tag() and convert all drivers to use it. Signed-off-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-06-16Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-0/+3
Pull virtio updates from Michael Tsirkin: "virtio, vhost: features, fixes - PCI virtual function support for virtio - DMA barriers for virtio strong barriers - bugfixes" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio: update the comments for transport features virtio_pci: support enabling VFs vhost: fix info leak due to uninitialized memory virtio_ring: switch to dma_XX barriers for rpmsg