path: root/drivers/virtio
Age | Commit message | Author | Files | Lines
2014-12-09 | virtio: assert 32 bit features in transports | Michael S. Tsirkin | 2 | -0/+6
At this point, no transports set any of the high 32 feature bits. Since transports generally can't (yet) cope with such bits, add BUG_ON checks to make sure they are not set by mistake. Based on rproc patch by Rusty. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
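The check described above can be sketched in userspace C. This is only an illustration under stated assumptions: `features_fit_in_32_bits()` and `transport_finalize_features()` are hypothetical names, and `assert()` stands in for the kernel's `BUG_ON()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when no feature bit at index 32 or above is set. */
bool features_fit_in_32_bits(uint64_t features)
{
	return features == (uint32_t)features;
}

/*
 * Hypothetical stand-in for a transport's feature hand-off. The kernel
 * patch uses BUG_ON(); assert() plays that role in this userspace sketch.
 */
uint32_t transport_finalize_features(uint64_t features)
{
	assert(features_fit_in_32_bits(features));
	return (uint32_t)features;
}
```

A transport that can only write 32 feature bits would silently drop the upper half without such a check; failing loudly makes the mistake visible.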
2014-12-09 | virtio: add support for 64 bit features. | Michael S. Tsirkin | 3 | -6/+6
Change u32 to u64, and use BIT_ULL and 1ULL everywhere. Note: transports are unchanged, and only set low 32 bit. This guarantees that no transport sets e.g. VERSION_1 by mistake without proper support. Based on patch by Rusty. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
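A minimal userspace sketch of why the message insists on BIT_ULL and 1ULL: shifting a plain int by 32 or more is undefined, so bits such as VERSION_1 (bit 32 in the virtio 1.0 specification) need a 64-bit constant. `FEATURE_BIT`, `has_feature()` and `set_feature()` are hypothetical names, not the kernel's helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Plays the role of BIT_ULL(): the constant is 64 bits wide before the shift. */
#define FEATURE_BIT(n) (1ULL << (n))

/* Test one feature bit in a 64-bit feature word. */
bool has_feature(uint64_t features, unsigned int fbit)
{
	assert(fbit < 64);
	return (features & FEATURE_BIT(fbit)) != 0;
}

/* Return the feature word with one more bit set. */
uint64_t set_feature(uint64_t features, unsigned int fbit)
{
	assert(fbit < 64);
	return features | FEATURE_BIT(fbit);
}
```

With a u32 feature word, bit 32 could not be represented at all, which is why the type change and the constant change go together.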
2014-12-09 | virtio: use u32, not bitmap for features | Michael S. Tsirkin | 4 | -14/+9
It seemed like a good idea to use bitmap for features in struct virtio_device, but it's actually a pain, and seems to become even more painful when we get more than 32 feature bits. Just change it to a u32 for now. Based on patch by Rusty. Suggested-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2014-11-11 | virtio_balloon: free some memory from balloon on OOM | Raushaniya Maksudova | 1 | -0/+52
Excessive virtio_balloon inflation can cause invocation of the OOM killer when Linux is under severe memory pressure. Various mechanisms are responsible for correct virtio_balloon memory management. Nevertheless, it is often the case that these control tools do not have enough time to react to a fast-changing memory load. As a result, the OS runs out of memory and invokes the OOM killer.

The balancing of memory by use of the virtio balloon should not cause the termination of processes while there are pages in the balloon. Currently there is no way for the virtio balloon driver to free some memory at the last moment before some process gets killed by the OOM killer.

This does not create a security breach, as the balloon itself runs inside the guest OS and works in cooperation with the host. Thus some improvements from the guest side should be considered normal.

To solve the problem, introduce a virtio_balloon callback which is expected to be called from the oom notifier call chain in the out_of_memory() function. If the virtio balloon can release some memory, the system returns and retries the allocation that forced the out-of-memory killer to run.

Allocate a virtio feature bit for this: it is not set by default, so the guest will not deflate the virtio balloon on OOM without explicit permission from the host.

Signed-off-by: Raushaniya Maksudova <rmaksudova@parallels.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
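The control flow described above can be modelled roughly as follows. This is a userspace sketch, not the driver code: `struct balloon_model`, `model_leak_balloon()` and `model_oom_notify()` are hypothetical names, and the notifier chain is reduced to a direct call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the balloon state relevant to the OOM path. */
struct balloon_model {
	size_t num_pages;	/* pages currently held by the balloon */
	bool deflate_on_oom;	/* true only if the host granted the feature */
};

/* Release up to `want` pages and report how many were actually released. */
size_t model_leak_balloon(struct balloon_model *b, size_t want)
{
	size_t n = want < b->num_pages ? want : b->num_pages;

	b->num_pages -= n;
	return n;
}

/*
 * OOM callback: adds the released page count to *freed so the caller
 * can retry the failed allocation instead of killing a process.
 */
void model_oom_notify(struct balloon_model *b, size_t want, size_t *freed)
{
	assert(b && freed);
	if (!b->deflate_on_oom)
		return;		/* no permission from the host: do nothing */
	*freed += model_leak_balloon(b, want);
}
```

Gating on the feature flag keeps the default behaviour unchanged, which matches the message's point that the guest only deflates with explicit host permission.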
2014-11-11 | virtio_balloon: return the amount of freed memory from leak_balloon() | Raushaniya Maksudova | 1 | -1/+4
This value would be useful in the next patch to provide the amount of the freed memory for OOM killer. Signed-off-by: Raushaniya Maksudova <rmaksudova@parallels.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Rusty Russell <rusty@rustcorp.com.au> CC: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-10-20 | virtio: drop owner assignment from platform_drivers | Wolfram Sang | 1 | -1/+0
A platform_driver does not need to set an owner, it will be populated by the driver core. Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2014-10-18 | Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux | Linus Torvalds | 4 | -35/+110
Pull virtio updates from Rusty Russell:
 "One cc: stable commit, the rest are a series of minor cleanups which have been sitting in MST's tree during my vacation. I changed a function name and made one trivial change, then they spent two days in linux-next"

* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (25 commits)
  virtio-rng: refactor probe error handling
  virtio_scsi: drop scan callback
  virtio_balloon: enable VQs early on restore
  virtio_scsi: fix race on device removal
  virito_scsi: use freezable WQ for events
  virtio_net: enable VQs early on restore
  virtio_console: enable VQs early on restore
  virtio_scsi: enable VQs early on restore
  virtio_blk: enable VQs early on restore
  virtio_scsi: move kick event out from virtscsi_init
  virtio_net: fix use after free on allocation failure
  9p/trans_virtio: enable VQs early
  virtio_console: enable VQs early
  virtio_blk: enable VQs early
  virtio_net: enable VQs early
  virtio: add API to enable VQs early
  virtio_net: minor cleanup
  virtio-net: drop config_mutex
  virtio_net: drop config_enable
  virtio-blk: drop config_mutex
  ...
2014-10-15 | virtio_balloon: enable VQs early on restore | Michael S. Tsirkin | 1 | -0/+2
The virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after resume returns; virtio balloon violated this rule by adding bufs, which causes the VQ to be used directly within restore. To fix, call virtio_device_ready before using the VQ. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-10-15 | virtio: defer config changed notifications | Michael S. Tsirkin | 1 | -9/+49
Defer config changed notifications that arrive during probe/scan/freeze/restore. This will allow drivers to set DRIVER_OK earlier, without worrying about racing with config change interrupts. This change will also benefit old hypervisors (before 2009) that send interrupts without checking DRIVER_OK: previously, the callback could race with driver-specific initialization. This will also help simplify drivers. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (cosmetic changes)
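One way to picture the deferral is a pair of flags: one saying whether callbacks may run, one remembering that a change arrived in the meantime. The sketch below is a single-threaded userspace model with hypothetical names (`struct dev_model`, `model_config_*`); the real code also needs locking, which is left out here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device model: just the state needed for deferral. */
struct dev_model {
	bool config_enabled;		/* may the driver callback run now? */
	bool config_change_pending;	/* did a change arrive while disabled? */
	int callbacks_run;		/* how often the driver callback ran */
};

/* Stand-in for the driver's config_changed callback. */
void model_run_callback(struct dev_model *d)
{
	d->callbacks_run++;
}

/* Called from the (modelled) config change interrupt. */
void model_config_changed(struct dev_model *d)
{
	assert(d);
	if (!d->config_enabled) {
		d->config_change_pending = true;	/* remember, do not call */
		return;
	}
	model_run_callback(d);
}

/* Entering probe/scan/freeze/restore: stop delivering callbacks. */
void model_config_disable(struct dev_model *d)
{
	d->config_enabled = false;
}

/* Leaving the critical phase: deliver at most one deferred callback. */
void model_config_enable(struct dev_model *d)
{
	d->config_enabled = true;
	if (d->config_change_pending) {
		d->config_change_pending = false;
		model_run_callback(d);
	}
}
```

In this model several changes that arrive while disabled collapse into a single callback, which is sufficient when the callback re-reads the whole configuration.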
2014-10-15 | virtio-pci: move freeze/restore to virtio core | Michael S. Tsirkin | 2 | -52/+56
This is in preparation for extending config changed event handling in core. Wrapping these in an API also seems to make for cleaner code. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-10-15 | virtio: unify config_changed handling | Michael S. Tsirkin | 3 | -10/+12
Replace duplicated code in all transports with a single wrapper in virtio.c. The only functional change is in virtio_mmio.c: if a buggy device sends us an interrupt before driver is set, we previously returned IRQ_NONE, now we return IRQ_HANDLED. As this must not happen in practice, this does not look like a big deal. See also commit 3fff0179e33cd7d0a688dab65700c46ad089e934 virtio-pci: do not oops on config change if driver not loaded. for the original motivation behind the driver check. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-10-15 | virtio_pci: fix virtio spec compliance on restore | Michael S. Tsirkin | 1 | -3/+30
On restore, virtio pci does the following:
+ set features
+ init vqs etc - device can be used at this point!
+ set ACKNOWLEDGE,DRIVER and DRIVER_OK status bits

This is in violation of the virtio spec, which requires the following order:
- ACKNOWLEDGE
- DRIVER
- init vqs
- DRIVER_OK

This behaviour will break with hypervisors that assume spec compliant behaviour.

It seems like a good idea to have this patch applied to stable branches to reduce the support burden for the hypervisors.

Cc: stable@vger.kernel.org Cc: Amit Shah <amit.shah@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
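The required ordering can be written down as a small userspace model. The status bit values (1, 2, 4) follow the device status field of the virtio specification; everything else (`struct restore_model`, `add_status()`, `init_vqs()`, `restore_in_spec_order()`) is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Device status bits as defined by the virtio specification. */
#define S_ACKNOWLEDGE	1u
#define S_DRIVER	2u
#define S_DRIVER_OK	4u

/* Hypothetical device model: status byte plus a "queues set up" flag. */
struct restore_model {
	uint8_t status;
	bool vqs_ready;
};

void add_status(struct restore_model *d, uint8_t bit)
{
	d->status |= bit;
}

/* Queue setup is only legal after ACKNOWLEDGE and DRIVER, before DRIVER_OK. */
void init_vqs(struct restore_model *d)
{
	assert((d->status & (S_ACKNOWLEDGE | S_DRIVER)) ==
	       (S_ACKNOWLEDGE | S_DRIVER));
	assert(!(d->status & S_DRIVER_OK));
	d->vqs_ready = true;
}

/* Restore sequence in the order the spec requires. */
void restore_in_spec_order(struct restore_model *d)
{
	d->status = 0;			/* reset */
	d->vqs_ready = false;
	add_status(d, S_ACKNOWLEDGE);
	add_status(d, S_DRIVER);
	init_vqs(d);
	add_status(d, S_DRIVER_OK);	/* device may be used from here on */
}
```

Running the old order through this model (queues first, status bits last) would trip the first assertion in `init_vqs()`, which is the violation the patch removes.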
2014-10-09 | mm/balloon_compaction: add vmstat counters and kpageflags bit | Konstantin Khlebnikov | 2 | -0/+2
Always mark pages with PageBalloon even if balloon compaction is disabled and expose this mark in /proc/kpageflags as KPF_BALLOON. Also this patch adds three counters into /proc/vmstat: "balloon_inflate", "balloon_deflate" and "balloon_migrate". They accumulate balloon activity. The current size of the balloon is (balloon_inflate - balloon_deflate) pages. All generic balloon code is now gathered under option CONFIG_MEMORY_BALLOON. It should be selected by a ballooning driver which wants to use this feature. Currently virtio-balloon is the only user. Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com> Cc: Rafael Aquini <aquini@redhat.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
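As a usage illustration, the balloon size formula above can be applied to text in the "name value" layout of /proc/vmstat. This is a sketch that parses a string rather than opening the file; `vmstat_lookup()` and `balloon_size_pages()` are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Find a "<name> <value>" line in vmstat-style text.
 * Returns 0 and stores the value on success, -1 if the counter is absent.
 */
int vmstat_lookup(const char *text, const char *name, unsigned long long *out)
{
	size_t len = strlen(name);
	const char *p = text;

	assert(text && name && out);
	while (p && *p) {
		if (strncmp(p, name, len) == 0 && p[len] == ' ')
			return sscanf(p + len, "%llu", out) == 1 ? 0 : -1;
		p = strchr(p, '\n');	/* advance to the next line */
		if (p)
			p++;
	}
	return -1;
}

/* Current balloon size in pages, or -1 if the counters are missing. */
long long balloon_size_pages(const char *text)
{
	unsigned long long inflate, deflate;

	if (vmstat_lookup(text, "balloon_inflate", &inflate) ||
	    vmstat_lookup(text, "balloon_deflate", &deflate))
		return -1;
	return (long long)(inflate - deflate);
}
```

On a kernel without CONFIG_MEMORY_BALLOON the counters would presumably be absent, which is why the sketch reports -1 instead of assuming zero.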
2014-10-09 | mm/balloon_compaction: remove balloon mapping and flag AS_BALLOON_MAP | Konstantin Khlebnikov | 1 | -47/+13
Now ballooned pages are detected using PageBalloon(). Fake mapping is no longer required. This patch links ballooned pages to balloon device using field page->private instead of page->mapping. Also this patch embeds balloon_dev_info directly into struct virtio_balloon. Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com> Cc: Rafael Aquini <aquini@redhat.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09 | mm/balloon_compaction: redesign ballooned pages management | Konstantin Khlebnikov | 1 | -8/+7
Sasha Levin reported KASAN splash inside isolate_migratepages_range(). The problem is in the function __is_movable_balloon_page(), which tests AS_BALLOON_MAP in page->mapping->flags. This function has no protection against anonymous pages. As a result, it tried to check address space flags inside struct anon_vma.

Further investigation shows more problems in the current implementation:

* Special branch in __unmap_and_move() never works: balloon_page_movable() checks page flags and page_count. In __unmap_and_move() the page is locked and the reference counter is elevated, thus balloon_page_movable() always fails. As a result, execution goes to the normal migration path. virtballoon_migratepage() returns MIGRATEPAGE_BALLOON_SUCCESS instead of MIGRATEPAGE_SUCCESS, move_to_new_page() thinks this is an error code and assigns newpage->mapping to NULL. The newly migrated page loses connectivity with the balloon and all ability for further migration.

* lru_lock is erroneously required in isolate_migratepages_range() for isolating a ballooned page. This function releases lru_lock periodically, which makes migration mostly impossible for some pages.

* balloon_page_dequeue has a tight race with balloon_page_isolate: balloon_page_isolate could be executed in parallel with dequeue between picking the page from the list and locking page_lock. The race is rare because they use trylock_page() for locking.

This patch fixes all of them.

Instead of a fake mapping with a special flag, this patch uses a special state of page->_mapcount: PAGE_BALLOON_MAPCOUNT_VALUE = -256. The buddy allocator uses PAGE_BUDDY_MAPCOUNT_VALUE = -128 for a similar purpose. Storing the mark directly in struct page makes everything safer and easier.

PagePrivate is used to mark pages present in the page list (i.e. not isolated, like PageLRU for normal pages). It replaces special rules for the reference counter and makes balloon migration similar to migration of normal pages. This flag is protected by page_lock together with the link to the balloon device.
Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Link: http://lkml.kernel.org/p/53E6CEAA.9020105@oracle.com Cc: Rafael Aquini <aquini@redhat.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: <stable@vger.kernel.org> [3.8+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
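The marking scheme can be sketched with a toy page structure. Only the value -256 comes from the message above; `struct page_model` and the `model_*` functions are hypothetical, and the sketch assumes that an unmapped page carries a mapcount of -1 before it is marked. Locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_BALLOON_MAPCOUNT_VALUE (-256)	/* value quoted in the commit */

/* Toy stand-in for struct page: only the two fields the scheme uses. */
struct page_model {
	int mapcount;		/* assumed to be -1 for an unmapped page */
	bool private_flag;	/* models PagePrivate: page is on the balloon list */
};

/* A page belongs to the balloon if its mapcount holds the special value. */
bool model_page_is_balloon(const struct page_model *p)
{
	return p->mapcount == MODEL_BALLOON_MAPCOUNT_VALUE;
}

/* Put a page into the balloon: mark it and flag it as present in the list. */
void model_balloon_insert(struct page_model *p)
{
	assert(p->mapcount == -1);	/* assumption: page is not mapped */
	p->mapcount = MODEL_BALLOON_MAPCOUNT_VALUE;
	p->private_flag = true;
}

/* Isolation succeeds only for balloon pages that are still on the list. */
bool model_balloon_isolate(struct page_model *p)
{
	if (!model_page_is_balloon(p) || !p->private_flag)
		return false;
	p->private_flag = false;	/* now isolated, like clearing PageLRU */
	return true;
}
```

Because both marks live in the page itself, the test never dereferences page->mapping, which is what made the old check unsafe for anonymous pages.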
2014-09-13 | virtio_ring: unify direct/indirect code paths. | Rusty Russell | 1 | -76/+52
virtqueue_add() populates the virtqueue descriptor table from the sgs given. If it uses an indirect descriptor table, then it puts a single descriptor in the descriptor table pointing to the kmalloc'ed indirect table where the sg is populated.

Previously vring_add_indirect() did the allocation and the simple linear layout. We replace that with alloc_indirect() which allocates the indirect table then chains it like the normal descriptor table so we can reuse the core logic.

This slows down pktgen by less than 1/2 a percent (which uses direct descriptors), as well as vring_bench, but it's far neater.

vring_bench before:
  1061485790-1104800648(1.08254e+09+/-6.6e+06)ns
vring_bench after:
  1125610268-1183528965(1.14172e+09+/-8e+06)ns

pktgen before:
  787781-796334(793165+/-2.4e+03)pps 365-369(367.5+/-1.2)Mb/sec (365530384-369498976(3.68028e+08+/-1.1e+06)bps) errors: 0
pktgen after:
  779988-790404(786391+/-2.5e+03)pps 361-366(364.35+/-1.3)Mb/sec (361914432-366747456(3.64885e+08+/-1.2e+06)bps) errors: 0

Now, if we force indirect descriptors by turning off any_header_sg in virtio_net.c:

pktgen before:
  713773-721062(718374+/-2.1e+03)pps 331-334(332.95+/-0.92)Mb/sec (331190672-334572768(3.33325e+08+/-9.6e+05)bps) errors: 0
pktgen after:
  710542-719195(714898+/-2.4e+03)pps 329-333(331.15+/-1.1)Mb/sec (329691488-333706480(3.31713e+08+/-1.1e+06)bps) errors: 0

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
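The idea of chaining the indirect table "like the normal descriptor table" can be illustrated in userspace. The descriptor layout and flag values (NEXT = 1, INDIRECT = 4) follow the virtio ring format; the `model_*` functions are hypothetical and are not the kernel's alloc_indirect() or virtqueue_add(). A real driver stores a physical address in the ring, whereas this sketch stores a plain pointer value.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DESC_F_NEXT	1u	/* descriptor continues via the next field */
#define DESC_F_INDIRECT	4u	/* descriptor points at an indirect table */

/* Same layout as a vring descriptor: 16 bytes. */
struct desc_model {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/* Allocate an indirect table and pre-chain it: entry i links to i + 1. */
struct desc_model *model_alloc_indirect(unsigned int n)
{
	struct desc_model *t = calloc(n, sizeof(*t));

	if (!t)
		return NULL;
	for (unsigned int i = 0; i < n; i++)
		t[i].next = (uint16_t)(i + 1);
	return t;
}

/*
 * Fill n buffers by following next links, starting at head. The same loop
 * serves the main ring and an indirect table. Returns the last index used.
 */
unsigned int model_fill_chain(struct desc_model *table, unsigned int head,
			      const uint64_t *addrs, const uint32_t *lens,
			      unsigned int n)
{
	unsigned int i = head, last = head;

	assert(n > 0);
	for (unsigned int k = 0; k < n; k++) {
		table[i].addr = addrs[k];
		table[i].len = lens[k];
		table[i].flags = DESC_F_NEXT;
		last = i;
		i = table[i].next;
	}
	table[last].flags &= (uint16_t)~DESC_F_NEXT;	/* terminate the chain */
	return last;
}

/* Make one main-ring slot refer to the whole indirect table. */
void model_point_at_indirect(struct desc_model *slot,
			     const struct desc_model *table, unsigned int n)
{
	slot->addr = (uint64_t)(uintptr_t)table;
	slot->len = (uint32_t)(n * sizeof(*table));
	slot->flags = DESC_F_INDIRECT;
}
```

Pre-chaining costs a few extra stores per allocation, which is consistent with the small slowdown the message reports in exchange for a single fill loop.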
2014-09-13 | virtio_ring: assume sgs are always well-formed. | Rusty Russell | 1 | -49/+19
We used to have several callers which just used arrays. They're gone, so we can use sg_next() everywhere, simplifying the code.

On my laptop, this slowed down vring_bench by 15%:

vring_bench before:
  936153354-967745359(9.44739e+08+/-6.1e+06)ns
vring_bench after:
  1061485790-1104800648(1.08254e+09+/-6.6e+06)ns

However, a more realistic test using pktgen on an AMD FX(tm)-8320 saw a few percent improvement:

pktgen before:
  767390-792966(785159+/-6.5e+03)pps 356-367(363.75+/-2.9)Mb/sec (356068960-367936224(3.64314e+08+/-3e+06)bps) errors: 0
pktgen after:
  787781-796334(793165+/-2.4e+03)pps 365-369(367.5+/-1.2)Mb/sec (365530384-369498976(3.68028e+08+/-1.1e+06)bps) errors: 0

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-27 | virtio: Replace DEFINE_PCI_DEVICE_TABLE macro use | Benoit Taine | 1 | -1/+1
We should prefer `struct pci_device_id` over `DEFINE_PCI_DEVICE_TABLE` to meet kernel coding style guidelines. This issue was reported by checkpatch. Signed-off-by: Benoit Taine <benoit.taine@lip6.fr> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-04-28 | virtio: virtio_break_device() to mark all virtqueues broken. | Rusty Russell | 1 | -0/+15
Good for post-apocalyptic scenarios, like S/390 hotplug. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-03-13 | virtio: fail adding buffer on broken queues. | Rusty Russell | 1 | -3/+8
Heinz points out that adding buffers to a broken virtqueue (which should "never happen") still works. Failing allows drivers to detect and complain about broken devices. Now that drivers are robust, we can add this extra check. Reported-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
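A rough userspace model of the extra check follows. `struct vq_model` and `model_add_buf()` are hypothetical, and the specific error codes (-EIO for a broken queue, -ENOSPC for a full one) are an assumption of this sketch rather than something the message states.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical queue model: broken flag plus a count of free descriptors. */
struct vq_model {
	bool broken;
	unsigned int num_free;
};

/*
 * Add one buffer. A broken queue refuses the buffer outright instead of
 * quietly accepting something the device will never process.
 */
int model_add_buf(struct vq_model *vq)
{
	assert(vq);
	if (vq->broken)
		return -EIO;		/* assumed error code for "queue is broken" */
	if (vq->num_free == 0)
		return -ENOSPC;		/* assumed error code for "ring is full" */
	vq->num_free--;
	return 0;
}
```

The broken check comes first so that a caller can tell a dead device from a merely full ring and stop retrying.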
2014-03-13 | virtio_balloon: don't crash if virtqueue is broken. | Rusty Russell | 1 | -5/+3
A bad implementation of virtio might cause us to mark the virtqueue broken: we'll dev_err() in that case, and the device is useless, but let's not BUG(). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-03-13 | virtio_balloon: don't softlockup on huge balloon changes. | Rusty Russell | 1 | -0/+6
When adding or removing 100G from a balloon: BUG: soft lockup - CPU#0 stuck for 22s! [vballoon:367] We have a wait_event_interruptible(), but the condition is always true (more ballooning to do) so we don't ever sleep. We also have a wait_event() for the host to ack, but that is also always true as QEMU is synchronous for balloon operations. Reported-by: Gopesh Kumar Chaudhary <gopchaud@in.ibm.com> Cc: stable@kernel.org Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-03-13 | virtio: Use pci_enable_msix_exact() instead of pci_enable_msix() | Alexander Gordeev | 1 | -4/+2
As a result of the deprecation of the MSI-X/MSI enablement functions pci_enable_msix() and pci_enable_msi_block(), all drivers using these two interfaces need to be updated to use the new pci_enable_msi_range() or pci_enable_msi_exact() and pci_enable_msix_range() or pci_enable_msix_exact() interfaces. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-03-13 | tools/virtio: fix missing kmemleak_ignore symbol | Joel Stanley | 1 | -0/+1
In commit bb478d8b167 virtio_ring: plug kmemleak false positive, kmemleak_ignore was introduced. This broke compilation of virtio_test:

  cc -g -O2 -Wall -I. -I ../../usr/include/ -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -c -o virtio_ring.o ../../drivers/virtio/virtio_ring.c
  ../../drivers/virtio/virtio_ring.c: In function ‘vring_add_indirect’:
  ../../drivers/virtio/virtio_ring.c:177:2: warning: implicit declaration of function ‘kmemleak_ignore’ [-Wimplicit-function-declaration]
    kmemleak_ignore(desc);
    ^
  cc virtio_test.o virtio_ring.o -o virtio_test
  virtio_ring.o: In function `vring_add_indirect':
  tools/virtio/../../drivers/virtio/virtio_ring.c:177: undefined reference to `kmemleak_ignore'

Add a dummy header for tools/virtio, and add #include <linux/kmemleak.h> to drivers/virtio/virtio_ring.c so it is picked up by the userspace tools.

Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-01-22 | Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux | Linus Torvalds | 2 | -3/+1
Pull virtio update from Rusty Russell:
 "A few simple fixes. Quiet cycle"

* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  drivers: virtio: Mark function virtballoon_migratepage() as static in virtio_balloon.c
  virtio-scsi: Fix hotcpu_notifier use-after-free with virtscsi_freeze
  virtio: pci: remove unnecessary pci_set_drvdata()
2014-01-16 | drivers: virtio: Mark function virtballoon_migratepage() as static in virtio_balloon.c | Rashika Kheria | 1 | -1/+1
Mark the function virtballoon_migratepage() as static in virtio_balloon.c because it is not used outside this file. This eliminates the following warning in virtio_balloon.c: drivers/virtio/virtio_balloon.c:372:5: warning: no previous prototype for ‘virtballoon_migratepage’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-12-05 | virtio_balloon: update_balloon_size(): update correct field | Luiz Capitulino | 1 | -1/+1
According to the virtio spec, the device configuration field that should be updated after an inflation or deflation operation is the 'actual' field, not the 'num_pages' one. Commit 855e0c5288177bcb193f6f6316952d2490478e1c swapped them in update_balloon_size(). Fix it. Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Fixes: 855e0c5288177bcb193f6f6316952d2490478e1c
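The distinction between the two fields can be shown with a small model. The field names `num_pages` and `actual` come from the message above; `struct balloon_config_model` and `model_update_balloon_size()` are hypothetical, and endianness handling is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Toy copy of the balloon config space: a target and a report. */
struct balloon_config_model {
	uint32_t num_pages;	/* target size, set by the host */
	uint32_t actual;	/* current size, reported by the driver */
};

/* After inflating or deflating, the driver reports through 'actual' only. */
void model_update_balloon_size(struct balloon_config_model *cfg,
			       uint32_t pages_in_balloon)
{
	assert(cfg);
	cfg->actual = pages_in_balloon;	/* never overwrite the host's target */
}
```

Writing the report into `num_pages` instead would overwrite the host's request, which is the swap the fix undoes.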
2013-12-04 | virtio: pci: remove unnecessary pci_set_drvdata() | Jingoo Han | 1 | -2/+0
The driver core clears the driver data to NULL after device_release or on probe failure. Thus, it is not needed to manually clear the device driver data to NULL. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-11-15 | Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux | Linus Torvalds | 4 | -17/+39
Pull virtio updates from Rusty Russell:
 "Nothing really exciting: some groundwork for changing virtio endian, and some robustness fixes for broken virtio devices, plus minor tweaks"

* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  virtio_scsi: verify if queue is broken after virtqueue_get_buf()
  x86, asmlinkage, lguest: Pass in globals into assembler statement
  virtio: mmio: fix signature checking for BE guests
  virtio_ring: adapt to notify() returning bool
  virtio_net: verify if queue is broken after virtqueue_get_buf()
  virtio_console: verify if queue is broken after virtqueue_get_buf()
  virtio_blk: verify if queue is broken after virtqueue_get_buf()
  virtio_ring: add new function virtqueue_is_broken()
  virtio_test: verify if virtqueue_kick() succeeded
  virtio_net: verify if virtqueue_kick() succeeded
  virtio_ring: let virtqueue_{kick()/notify()} return a bool
  virtio_ring: change host notification API
  virtio_config: remove virtio_config_val
  virtio: use size-based config accessors.
  virtio_config: introduce size-based accessors.
  virtio_ring: plug kmemleak false positive.
  virtio: pm: use CONFIG_PM_SLEEP instead of CONFIG_PM
2013-11-07 | virtio: mmio: fix signature checking for BE guests | Marc Zyngier | 1 | -1/+1
As virtio-mmio config registers are specified to be little-endian, using readl() to read the magic value and then memcmp() to check it fails on BE (as readl() has an implicit swab). Fix it by encoding the magic value as an integer instead of a string. Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Pawel Moll <pawel.moll@arm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
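The fix can be illustrated without hardware. The sketch decodes four little-endian register bytes into a native integer, which is the value readl() yields on either host endianness, and compares it with the magic encoded as an integer. `MODEL_MAGIC`, `model_readl_le()` and `model_magic_ok()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* "virt" as a little-endian 32-bit integer: 'v' is the lowest byte. */
#define MODEL_MAGIC ((uint32_t)'v' | (uint32_t)'i' << 8 | \
		     (uint32_t)'r' << 16 | (uint32_t)'t' << 24)

/* Decode a little-endian register into a CPU-native value. */
uint32_t model_readl_le(const uint8_t reg[4])
{
	assert(reg);
	return (uint32_t)reg[0] | (uint32_t)reg[1] << 8 |
	       (uint32_t)reg[2] << 16 | (uint32_t)reg[3] << 24;
}

/* Integer comparison: independent of how the host lays out bytes in memory. */
bool model_magic_ok(const uint8_t reg[4])
{
	return model_readl_le(reg) == MODEL_MAGIC;
}
```

A memcmp() against the string "virt" inspects the native integer's bytes in memory order, which on a big-endian host is reversed relative to the register layout; comparing integers avoids that dependency.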
2013-11-05 | virtio_ring: adapt to notify() returning bool | Heinz Graalfs | 1 | -1/+1
Correct if statement to check for bool returned by notify() (introduced in 5b1bf7cb673a). Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-10-29 | virtio_ring: add new function virtqueue_is_broken() | Heinz Graalfs | 1 | -0/+8
Add new function virtqueue_is_broken(). Callers of virtqueue_get_buf() should check for a broken queue. Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-10-29 | virtio_ring: let virtqueue_{kick()/notify()} return a bool | Heinz Graalfs | 1 | -4/+16
virtqueue_{kick()/notify()} should exploit the new host notification API. If the notify call returned with a negative value the host kick failed (e.g. a kick triggered after a device was hot-unplugged). In this case the virtqueue is set to 'broken' and false is returned, otherwise true. Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
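The behaviour described above can be sketched as follows, using the bool-returning notify callback that the series settles on. `struct kick_vq`, `model_kick()` and the two sample callbacks are hypothetical names for this userspace model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical queue model with a transport-provided notify hook. */
struct kick_vq {
	bool broken;
	bool (*notify)(struct kick_vq *vq);	/* false means the host kick failed */
};

/* Kick the host; a failed kick marks the queue broken and is reported. */
bool model_kick(struct kick_vq *vq)
{
	assert(vq && vq->notify);
	if (vq->broken)
		return false;		/* already known to be unusable */
	if (!vq->notify(vq)) {
		vq->broken = true;	/* e.g. device was hot-unplugged */
		return false;
	}
	return true;
}

/* Sample transports: one that works, one that models a vanished device. */
bool notify_ok(struct kick_vq *vq)
{
	(void)vq;
	return true;
}

bool notify_unplugged(struct kick_vq *vq)
{
	(void)vq;
	return false;
}
```

In this model the broken state is sticky: once a kick has failed, later kicks fail without calling the transport again.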
2013-10-29 | virtio_ring: change host notification API | Heinz Graalfs | 3 | -4/+6
Currently a host kick error is silently ignored and not reflected in the virtqueue of a particular virtio device. Changing the notify API for guest->host notification seems to be one prerequisite in order to be able to handle such errors in the context where the kick is triggered. This patch changes the notify API. The notify function must return a bool return value. It returns false if the host notification failed. Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-10-16 | virtio: convert bus code to use dev_groups | Greg Kroah-Hartman | 1 | -8/+19
The dev_attrs field of struct bus_type is going away soon, dev_groups should be used instead. This converts the virtio bus code to use the correct field. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: <virtualization@lists.linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-17 | virtio: use size-based config accessors. | Rusty Russell | 1 | -6/+4
This lets the transport do endian conversion if necessary, and insulates the drivers from the difference. Most drivers can use the simple helpers virtio_cread() and virtio_cwrite(). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-10-17 | virtio_ring: plug kmemleak false positive. | Rusty Russell | 1 | -0/+2
unreferenced object 0xffff88003d467e20 (size 32):
  comm "softirq", pid 0, jiffies 4295197765 (age 6.364s)
  hex dump (first 32 bytes):
    28 19 bf 3d 00 00 00 00 0c 00 00 00 01 00 01 00  (..=............
    02 dc 51 3c 00 00 00 00 56 00 00 00 00 00 00 00  ..Q<....V.......
  backtrace:
    [<ffffffff8152db19>] kmemleak_alloc+0x59/0xc0
    [<ffffffff81102e93>] __kmalloc+0xf3/0x180
    [<ffffffff812db5d6>] vring_add_indirect+0x36/0x280
    [<ffffffff812dc59f>] virtqueue_add_outbuf+0xbf/0x4e0
    [<ffffffff813a8b30>] start_xmit+0x1a0/0x3b0
    [<ffffffff81445861>] dev_hard_start_xmit+0x2d1/0x4d0
    [<ffffffff81460052>] sch_direct_xmit+0xf2/0x1c0
    [<ffffffff81445c28>] dev_queue_xmit+0x1c8/0x460
    [<ffffffff814e3187>] ip6_finish_output2+0x1d7/0x470
    [<ffffffff814e34b0>] ip6_finish_output+0x90/0xb0
    [<ffffffff814e3507>] ip6_output+0x37/0xb0
    [<ffffffff815021eb>] igmp6_send+0x2db/0x470
    [<ffffffff81502645>] igmp6_timer_handler+0x95/0xa0
    [<ffffffff8104b57c>] call_timer_fn+0x2c/0x90
    [<ffffffff8104b7ba>] run_timer_softirq+0x1da/0x1f0
    [<ffffffff81045721>] __do_softirq+0xd1/0x1b0

Address gets embedded in a descriptor via virt_to_phys(). See detach_buf, which frees it:

	if (vq->vring.desc[i].flags & VRING_DESC_F_INDIRECT)
		kfree(phys_to_virt(vq->vring.desc[i].addr));

Reported-by: Christoph Paasch <christoph.paasch@uclouvain.be> Fix-suggested-by: Christoph Paasch <christoph.paasch@uclouvain.be> Typing-done-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-09-23 | virtio: pm: use CONFIG_PM_SLEEP instead of CONFIG_PM | Aaron Lu | 1 | -2/+2
The freeze and restore functions defined in virtio drivers are used for suspend and hibernate, so CONFIG_PM_SLEEP is more appropriate than CONFIG_PM. This patch replaces all CONFIG_PM with CONFIG_PM_SLEEP for virtio drivers that implement freeze and restore callbacks. Signed-off-by: Aaron Lu <aaron.lu@intel.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-09-09 | virtio_pci: pm: Use CONFIG_PM_SLEEP instead of CONFIG_PM | Aaron Lu | 1 | -2/+2
The virtio_pci_freeze/restore functions are defined under CONFIG_PM but are used by the SET_SYSTEM_SLEEP_PM_OPS macro, which is defined under CONFIG_PM_SLEEP. So if CONFIG_PM_SLEEP is not configured but CONFIG_PM_RUNTIME is, the following warning messages appear:

  drivers/virtio/virtio_pci.c:770:12: warning: ‘virtio_pci_freeze’ defined but not used [-Wunused-function]
   static int virtio_pci_freeze(struct device *dev)
              ^
  drivers/virtio/virtio_pci.c:790:12: warning: ‘virtio_pci_restore’ defined but not used [-Wunused-function]
   static int virtio_pci_restore(struct device *dev)
              ^

Fix it by changing CONFIG_PM to CONFIG_PM_SLEEP.

Signed-off-by: Aaron Lu <aaron.lu@intel.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-07-10 | Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux | Linus Torvalds | 2 | -3/+5
Pull virtio updates from Rusty Russell:
 "No real surprises"

* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  MAINTAINERS: add tools/virtio/ under virtio
  tools/virtio: move module license stub to module.h
  virtio: include asm/barrier explicitly
  virtio: VIRTIO_F_ANY_LAYOUT feature
  lguest: fix example launcher compilation for broken glibc headers.
  virtio-net: fix the race between channels setting and refill
  tools/lguest: real barriers.
  tools/lguest: fix missing rmb().
  virtio_balloon: leak_balloon(): only tell host if we got pages deflated
  virtio-pci: fix leaks of msix_affinity_masks
  Fix comment typo "CONFIG_PAE"
2013-07-09 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next | Linus Torvalds | 1 | -12/+44
Pull networking updates from David Miller: "This is a re-do of the net-next pull request for the current merge window. The only difference from the one I made the other day is that this has Eliezer's interface renames and the timeout handling changes made based upon your feedback, as well as a few bug fixes that have trickeled in. Highlights: 1) Low latency device polling, eliminating the cost of interrupt handling and context switches. Allows direct polling of a network device from socket operations, such as recvmsg() and poll(). Currently ixgbe, mlx4, and bnx2x support this feature. Full high level description, performance numbers, and design in commit 0a4db187a999 ("Merge branch 'll_poll'") From Eliezer Tamir. 2) With the routing cache removed, ip_check_mc_rcu() gets exercised more than ever before in the case where we have lots of multicast addresses. Use a hash table instead of a simple linked list, from Eric Dumazet. 3) Add driver for Atheros CQA98xx 802.11ac wireless devices, from Bartosz Markowski, Janusz Dziedzic, Kalle Valo, Marek Kwaczynski, Marek Puzyniak, Michal Kazior, and Sujith Manoharan. 4) Support reporting the TUN device persist flag to userspace, from Pavel Emelyanov. 5) Allow controlling network device VF link state using netlink, from Rony Efraim. 6) Support GRE tunneling in openvswitch, from Pravin B Shelar. 7) Adjust SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF for modern times, from Daniel Borkmann and Eric Dumazet. 8) Allow controlling of TCP quickack behavior on a per-route basis, from Cong Wang. 9) Several bug fixes and improvements to vxlan from Stephen Hemminger, Pravin B Shelar, and Mike Rapoport. In particular, support receiving on multiple UDP ports. 10) Major cleanups, particular in the area of debugging and cookie lifetime handline, to the SCTP protocol code. From Daniel Borkmann. 11) Allow packets to cross network namespaces when traversing tunnel devices. From Nicolas Dichtel. 
12) Allow monitoring netlink traffic via AF_PACKET sockets, in a manner akin to how we monitor real network traffic via ptype_all. From Daniel Borkmann. 13) Several bug fixes and improvements for the new alx device driver, from Johannes Berg. 14) Fix scalability issues in the netem packet scheduler's time queue, by using an rbtree. From Eric Dumazet. 15) Several bug fixes in TCP loss recovery handling, from Yuchung Cheng. 16) Add support for GSO segmentation of MPLS packets, from Simon Horman. 17) Make network notifiers have a real data type for the opaque pointer that's passed into them. Use this to properly handle network device flag changes in arp_netdev_event(). From Jiri Pirko and Timo Teräs. 18) Convert several drivers over to module_pci_driver(), from Peter Huewe. 19) tcp_fixup_rcvbuf() can loop 500 times over loopback, just use a O(1) calculation instead. From Eric Dumazet. 20) Support setting of explicit tunnel peer addresses in ipv6, just like ipv4. From Nicolas Dichtel. 21) Protect x86 BPF JIT against spraying attacks, from Eric Dumazet. 22) Prevent a single high rate flow from overruning an individual cpu during RX packet processing via selective flow shedding. From Willem de Bruijn. 23) Don't use spinlocks in TCP md5 signing fast paths, from Eric Dumazet. 24) Don't just drop GSO packets which are above the TBF scheduler's burst limit, chop them up so they are in-bounds instead. Also from Eric Dumazet. 25) VLAN offloads are missed when configured on top of a bridge, fix from Vlad Yasevich. 26) Support IPV6 in ping sockets. From Lorenzo Colitti. 27) Receive flow steering targets should be updated at poll() time too, from David Majnemer. 28) Fix several corner case regressions in PMTU/redirect handling due to the routing cache removal, from Timo Teräs. 29) We have to be mindful of ipv4 mapped ipv6 sockets in upd_v6_push_pending_frames(). From Hannes Frederic Sowa. 30) Fix L2TP sequence number handling bugs, from James Chapman." 
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1214 commits) drivers/net: caif: fix wrong rtnl_is_locked() usage drivers/net: enic: release rtnl_lock on error-path vhost-net: fix use-after-free in vhost_net_flush net: mv643xx_eth: do not use port number as platform device id net: sctp: confirm route during forward progress virtio_net: fix race in RX VQ processing virtio: support unlocked queue poll net/cadence/macb: fix bug/typo in extracting gem_irq_read_clear bit Documentation: Fix references to defunct linux-net@vger.kernel.org net/fs: change busy poll time accounting net: rename low latency sockets functions to busy poll bridge: fix some kernel warning in multicast timer sfc: Fix memory leak when discarding scattered packets sit: fix tunnel update via netlink dt:net:stmmac: Add dt specific phy reset callback support. dt:net:stmmac: Add support to dwmac version 3.610 and 3.710 dt:net:stmmac: Allocate platform data only if its NULL. net:stmmac: fix memleak in the open method ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available net: ipv6: fix wrong ping_v6_sendmsg return value ...
2013-07-09virtio: support unlocked queue pollMichael S. Tsirkin1-12/+44
This adds a way to check ring empty state after enable_cb outside any locks. Will be used by virtio_net. Note: there's room for more optimization: caller is likely to have a memory barrier already, which means we might be able to get rid of a barrier here. Deferring this optimization until we do some benchmarking. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
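The two-step pattern this commit describes (re-enable callbacks, then check later whether the ring is still empty) can be sketched in userspace C. This is a toy model only: `struct toy_vq`, `toy_enable_cb_prepare()` and `toy_poll()` are hypothetical names, not the kernel's virtio_ring API, and the memory barrier mentioned in the commit message is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a virtqueue's used-ring state. */
struct toy_vq {
    uint16_t used_idx;      /* advanced by the "device" on each completion */
    uint16_t last_used_idx; /* next completion the "driver" has yet to consume */
    bool cb_enabled;        /* whether the driver asked to be called back */
};

/* Step 1: re-enable callbacks and hand back a snapshot of our position. */
static unsigned toy_enable_cb_prepare(struct toy_vq *vq)
{
    assert(vq != NULL);
    vq->cb_enabled = true;
    return vq->last_used_idx;
}

/*
 * Step 2: report whether the device completed anything past the snapshot.
 * This only reads state, which is why, in this model, it needs no lock.
 * A real implementation would also need a memory barrier here.
 */
static bool toy_poll(const struct toy_vq *vq, unsigned snapshot)
{
    assert(vq != NULL);
    return (uint16_t)snapshot != vq->used_idx;
}
```

A caller would take the snapshot while holding its own lock, drop the lock, and then call the poll step to decide whether more work arrived in the meantime.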
2013-07-03Merge branch 'akpm' (updates from Andrew Morton)Linus Torvalds1-3/+4
Merge first patch-bomb from Andrew Morton: - various misc bits - I've been patchmonkeying ocfs2 for a while, as Joel and Mark have been distracted. There has been quite a bit of activity. - About half the MM queue - Some backlight bits - Various lib/ updates - checkpatch updates - zillions more little rtc patches - ptrace - signals - exec - procfs - rapidio - nbd - aoe - pps - memstick - tools/testing/selftests updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (445 commits) tools/testing/selftests: don't assume the x bit is set on scripts selftests: add .gitignore for kcmp selftests: fix clean target in kcmp Makefile selftests: add .gitignore for vm selftests: add hugetlbfstest self-test: fix make clean selftests: exit 1 on failure kernel/resource.c: remove the unneeded assignment in function __find_resource aio: fix wrong comment in aio_complete() drivers/w1/slaves/w1_ds2408.c: add magic sequence to disable P0 test mode drivers/memstick/host/r592.c: convert to module_pci_driver drivers/memstick/host/jmb38x_ms: convert to module_pci_driver pps-gpio: add device-tree binding and support drivers/pps/clients/pps-gpio.c: convert to module_platform_driver drivers/pps/clients/pps-gpio.c: convert to devm_* helpers drivers/parport/share.c: use kzalloc Documentation/accounting/getdelays.c: avoid strncpy in accounting tool aoe: update internal version number to v83 aoe: update copyright date aoe: perform I/O completions in parallel ...
2013-07-03mm: correctly update zone->managed_pagesJiang Liu1-3/+4
Enhance adjust_managed_page_count() to adjust totalhigh_pages for highmem pages. And change code which directly adjusts totalram_pages to use adjust_managed_page_count() because it adjusts totalram_pages, totalhigh_pages and zone->managed_pages altogether in a safe way. Remove inc_totalhigh_pages() and dec_totalhigh_pages() from xen/balloon driver because adjust_managed_page_count() has already adjusted totalhigh_pages. This patch also fixes two bugs: 1) enhance virtio_balloon driver to adjust totalhigh_pages when reserving/unreserving pages. 2) enhance memory_hotplug.c to adjust totalhigh_pages when hot-removing memory. We still need to deal with modifications of totalram_pages in file arch/powerpc/platforms/pseries/cmm.c, but need help from PPC experts. [akpm@linux-foundation.org: remove ifdef, per Wanpeng Li, virtio_balloon.c cleanup, per Sergei] [akpm@linux-foundation.org: export adjust_managed_page_count() to modules, for drivers/virtio/virtio_balloon.c] Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Minchan Kim <minchan@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <sworddragon2@aol.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-02virtio_balloon: leak_balloon(): only tell host if we got pages deflatedLuiz Capitulino1-1/+2
balloon_page_dequeue() can return NULL. If it does for the first page being freed then leak_balloon() will create a scatter list with len=0. Which in turn seems to generate an invalid virtio request. I didn't get this in practice, I found it by code review. On the other hand, such an invalid virtio request will cause errors in QEMU and fill_balloon() also performs the same check implemented by this commit. This bug was introduced in e2250429. Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Acked-by: Rafael Aquini <aquini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: stable@kernel.org # 3.9
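The check this commit adds can be illustrated with a small userspace model. Everything here is an assumption made for illustration: `struct toy_balloon` and `toy_leak_balloon()` are hypothetical and only mimic the idea that the host is told about a deflate only when at least one page was actually dequeued.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a balloon: a page count plus a notification counter. */
struct toy_balloon {
    size_t pages_in_balloon;   /* pages currently held by the balloon */
    size_t host_notifications; /* how many requests were sent to the "host" */
};

/*
 * Try to release up to `want` pages. The dequeue step can come back empty,
 * so the host is only notified when something was really released; a
 * zero-length request is never generated.
 */
static size_t toy_leak_balloon(struct toy_balloon *b, size_t want)
{
    size_t got = 0;

    assert(b != NULL);
    while (got < want && b->pages_in_balloon > 0) {
        b->pages_in_balloon--;
        got++;
    }
    if (got != 0)
        b->host_notifications++;
    return got;
}
```

Calling this on an empty balloon returns 0 and leaves the notification counter untouched, which is the case the commit message found by code review.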
2013-07-02virtio-pci: fix leaks of msix_affinity_masksAndrew Vagin1-2/+3
vp_dev->msix_vectors should be initialized before allocating msix_affinity_masks, otherwise vp_free_vectors will not free these objects. unreferenced object 0xffff88010f969d88 (size 512): comm "systemd-udevd", pid 158, jiffies 4294673645 (age 80.545s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff816e455e>] kmemleak_alloc+0x5e/0xc0 [<ffffffff811aa7f1>] kmem_cache_alloc_node_trace+0x141/0x2c0 [<ffffffff8133ba23>] alloc_cpumask_var_node+0x23/0x80 [<ffffffff8133ba8e>] alloc_cpumask_var+0xe/0x10 [<ffffffff813fdb3d>] vp_try_to_find_vqs+0x25d/0x810 [<ffffffff813fe171>] vp_find_vqs+0x81/0xb0 [<ffffffffa00d2a05>] init_vqs+0x85/0x120 [virtio_balloon] [<ffffffffa00d2c29>] virtballoon_probe+0xf9/0x1a0 [virtio_balloon] [<ffffffff813fb61e>] virtio_dev_probe+0xde/0x140 [<ffffffff814452b8>] driver_probe_device+0x98/0x3a0 [<ffffffff8144566b>] __driver_attach+0xab/0xb0 [<ffffffff814432f4>] bus_for_each_dev+0x94/0xb0 [<ffffffff81444f4e>] driver_attach+0x1e/0x20 [<ffffffff81444910>] bus_add_driver+0x200/0x280 [<ffffffff81445c14>] driver_register+0x74/0x160 [<ffffffff813fb7d0>] register_virtio_driver+0x20/0x40 v2: change msix_vectors unconditionally in vp_free_vectors Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Signed-off-by: Andrew Vagin <avagin@openvz.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
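The ordering problem can be reproduced in a userspace sketch. The names below (`struct toy_dev`, `toy_request_vectors()`, `toy_free_vectors()`, the `live_masks` counter, and the `fail_at` knob) are all hypothetical; they only model the idea that the cleanup loop is bounded by a count, so that count has to be set before the per-vector allocations begin.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical device state: nvectors plays the role of msix_vectors,
 * masks the role of msix_affinity_masks. */
struct toy_dev {
    unsigned nvectors;
    void **masks;
};

static int live_masks; /* outstanding per-vector allocations, for leak checks */

/* Cleanup walks masks[0..nvectors), then resets the count unconditionally. */
static void toy_free_vectors(struct toy_dev *d)
{
    assert(d != NULL);
    if (d->masks) {
        for (unsigned i = 0; i < d->nvectors; i++) {
            if (d->masks[i]) {
                free(d->masks[i]);
                live_masks--;
            }
        }
        free(d->masks);
    }
    d->masks = NULL;
    d->nvectors = 0;
}

/* fail_at simulates an allocation failure at that index (-1: no failure). */
static int toy_request_vectors(struct toy_dev *d, unsigned n, int fail_at)
{
    assert(d != NULL && n > 0);
    d->nvectors = n; /* set first, so the error path frees what was allocated */
    d->masks = calloc(n, sizeof(*d->masks));
    if (!d->masks)
        goto err;
    for (unsigned i = 0; i < n; i++) {
        if ((int)i == fail_at)
            goto err;
        d->masks[i] = malloc(64);
        if (!d->masks[i])
            goto err;
        live_masks++;
    }
    return 0;
err:
    toy_free_vectors(d);
    return -1;
}
```

If the assignment to `nvectors` were moved after the loop, the error path would iterate zero times and the masks allocated so far would leak, which is the shape of the kmemleak report quoted in the commit message.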
2013-05-20virtio: remove virtqueue_add_buf().Rusty Russell1-34/+3
All users changed to virtqueue_add_sg() or virtqueue_add_outbuf/inbuf. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-03-20virtio_balloon: use simplified virtqueue accessors.Rusty Russell1-3/+3
We never add buffers with input and output parts, so use the new accessors. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-03-20virtio_ring: virtqueue_add_outbuf / virtqueue_add_inbuf.Rusty Russell1-0/+44
These are specialized versions of virtqueue_add_buf(), which cover over 80% of cases and are far clearer. In particular, the scatterlists passed to these functions don't have to be clean (i.e. we ignore end markers). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-03-20virtio_ring: virtqueue_add_sgs, to add multiple sgs.Rusty Russell1-63/+157
virtio_scsi can really use this, to avoid the current hack of copying the whole sg array. Some other things get slightly neater, too. This causes a slowdown in virtqueue_add_buf(), which is implemented as a wrapper. This is addressed in the next patches. for i in `seq 50`; do /usr/bin/time -f 'Wall time:%e' ./vringh_test --indirect --eventidx --parallel --fast-vringh; done 2>&1 | stats --trim-outliers: Before: Using CPUS 0 and 3 Guest: notified 0, pinged 39009-39063(39062) Host: notified 39009-39063(39062), pinged 0 Wall time:1.700000-1.950000(1.723542) After: Using CPUS 0 and 3 Guest: notified 0, pinged 39062-39063(39063) Host: notified 39062-39063(39063), pinged 0 Wall time:1.760000-2.220000(1.789167) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Reviewed-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Reviewed-by: Asias He <asias@redhat.com>
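The benefit of accepting several scatterlists at once, rather than one pre-merged array, can be sketched as follows. This is a userspace toy under stated assumptions: `struct toy_seg`, `struct toy_list`, `struct toy_desc` and `toy_add_sgs()` are hypothetical, and the convention that device-readable lists come before device-writable ones is assumed for the model.

```c
#include <assert.h>
#include <stddef.h>

struct toy_seg  { const void *addr; size_t len; };             /* one buffer segment */
struct toy_list { const struct toy_seg *segs; size_t nsegs; }; /* one scatterlist */
struct toy_desc { const void *addr; size_t len; int device_writable; };

/*
 * Walk out_lists device-readable lists followed by in_lists device-writable
 * lists and emit one descriptor per segment, without first copying all the
 * segments into a single array. Returns the number of descriptors written,
 * or 0 if there is nothing to add or the ring is too small.
 */
static size_t toy_add_sgs(struct toy_desc *ring, size_t ring_cap,
                          const struct toy_list *lists,
                          size_t out_lists, size_t in_lists)
{
    size_t total = 0, n = 0;

    assert(ring != NULL && lists != NULL);
    for (size_t i = 0; i < out_lists + in_lists; i++)
        total += lists[i].nsegs;
    if (total == 0 || total > ring_cap)
        return 0;

    for (size_t i = 0; i < out_lists + in_lists; i++) {
        for (size_t j = 0; j < lists[i].nsegs; j++) {
            ring[n].addr = lists[i].segs[j].addr;
            ring[n].len = lists[i].segs[j].len;
            ring[n].device_writable = (i >= out_lists);
            n++;
        }
    }
    return n;
}
```

A caller with a request header, a payload and a response buffer can pass three separate lists, which is the kind of copy the commit message says virtio_scsi previously had to work around.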