aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/scripts/python/call-graph-from-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2016-03-08bcache: fix cache_set_flush() NULL pointer dereference on OOMEric Wheeler1-0/+3
When bch_cache_set_alloc() fails to kzalloc the cache_set, the asyncronous closure handling tries to dereference a cache_set that hadn't yet been allocated inside of cache_set_flush() which is called by __cache_set_unregister() during cleanup. This appears to happen only during an OOM condition on bcache_register. Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net> Cc: stable@vger.kernel.org
2016-03-08bcache: cleaned up error handling around register_cache()Eric Wheeler1-12/+22
Fix null pointer dereference by changing register_cache() to return an int instead of being void. This allows it to return -ENOMEM or -ENODEV and enables upper layers to handle the OOM case without NULL pointer issues. See this thread: http://thread.gmane.org/gmane.linux.kernel.bcache.devel/3521 Fixes this error: gargamel:/sys/block/md5/bcache# echo /dev/sdh2 > /sys/fs/bcache/register bcache: register_cache() error opening sdh2: cannot allocate memory BUG: unable to handle kernel NULL pointer dereference at 00000000000009b8 IP: [<ffffffffc05a7e8d>] cache_set_flush+0x102/0x15c [bcache] PGD 120dff067 PUD 1119a3067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: veth ip6table_filter ip6_tables (...) CPU: 4 PID: 3371 Comm: kworker/4:3 Not tainted 4.4.2-amd64-i915-volpreempt-20160213bc1 #3 Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013 Workqueue: events cache_set_flush [bcache] task: ffff88020d5dc280 ti: ffff88020b6f8000 task.ti: ffff88020b6f8000 RIP: 0010:[<ffffffffc05a7e8d>] [<ffffffffc05a7e8d>] cache_set_flush+0x102/0x15c [bcache] Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net> Tested-by: Marc MERLIN <marc@merlins.org> Cc: <stable@vger.kernel.org>
2016-03-08bcache: fix race of writeback thread starting before complete initializationEric Wheeler1-1/+8
The bch_writeback_thread might BUG_ON in read_dirty() if dc->sb==BDEV_STATE_DIRTY and bch_sectors_dirty_init has not yet completed its related initialization. This patch downs the dc->writeback_lock until after initialization is complete, thus preventing bch_writeback_thread from proceeding prematurely. See this thread: http://thread.gmane.org/gmane.linux.kernel.bcache.devel/3453 Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net> Tested-by: Marc MERLIN <marc@merlins.org> Cc: <stable@vger.kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-08NVMe: Create discard zero quirk white listKeith Busch3-2/+15
The NVMe specification does not require discarded blocks return zeroes on read, but provides that behavior as a possibility. Some applications more efficiently use an SSD if reads on discarded blocks were deterministically zero, based on the "discard_zeroes_data" queue attribute. There is no specification defined way to determine device behavior on discarded blocks, so the driver always left the queue setting disabled. We can only know behavior based on individual device models, so this patch adds a flag to the NVMe "quirk" list that vendors may set if they know their controller works that way. The patch also sets the new flag for one such known device. Signed-off-by: Keith Busch <keith.busch@intel.com> Suggested-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-04nbd: use correct div_s64 helperArnd Bergmann1-2/+1
The do_div() macro now checks its arguments for the correct type, and refuses anything other than u64, so we get a warning about nbd_ioctl passing in an loff_t: drivers/block/nbd.c: In function '__nbd_ioctl': drivers/block/nbd.c:757:77: error: comparison of distinct pointer types lacks a cast [-Werror] This changes the nbd code to use div_s64() instead, which takes a signed argument. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 37091fdd831f ("nbd: Create size change events for userspace") Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-04mtip32xx: remove unneeded variable in mtip_cmd_timeout()Jens Axboe1-2/+1
We always return BLK_EH_RESET_TIMER, so no point in storing that in an integer. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03lightnvm: generalize rrpc ppa calculationsJavier González2-17/+40
In rrpc, some calculations assume a certain configuration (e.g., 1 LUN, 1 sector per page). The reason behind this was that LightNVM used a simple configuration with QEMU to test core features in the beginning. This patch relaxes these assumptions and generalizes calculation, allowing multiple luns to be used. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03lightnvm: remove struct nvm_dev->total_blocksMatias Bjørling1-5/+1
The struct nvm_dev->total_blocks was only used for calculating total sectors. Remove and instead calculate total sectors from the number of luns and its sectors. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03lightnvm: rename ->nr_pages to ->nr_sectsMatias Bjørling5-24/+22
The struct rrpc->nr_pages can easily be interpreted as the number of flash pages allocated to rrpc, while it is the nr_sects. Make sure that this is reflected from the variable name. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03lightnvm: update closed list outside of intr contextJavier González1-13/+10
When an I/O finishes, full blocks are moved from the open to the closed list - a lock is taken to protect the list. This happens at the moment in the interrupt context, which is not correct. This patch moves this logic to the block workqueue instead, avoiding holding a spinlock without interrupt save in an interrupt context. Signed-off-by: Javier González <javier@cnexlabs.com> Fixes: ff0e498bfa18 ("lightnvm: manage open and closed blocks sepa...") Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03xen/blback: Fit the important information of the thread in 17 charactersKonrad Rzeszutek Wilk1-4/+3
The processes names are truncated to 17, while we had the length of the process as name 20 - which meant that while we filled it out with various details - the last 3 characters (which had the queue number) never surfaced to the user-space. To simplify this and be able to fit the device name, domain id, and the queue number we remove the 'blkback' from the name. Prior to this patch the device name is "blkback.<domid>.<name>" for example: blkback.8.xvda, blkback.11.hda. With the multiqueue block backend we add "-%d" for the queue. But sadly this is already way past the limit so it gets stripped. Possible solution had been identified by Ian: http://lists.xenproject.org/archives/html/xen-devel/2015-05/msg03516.html " If you are pressed for space then the "xvd" is probably a bit redundant in a string which starts blkbk. The guest may not even call the device xvdN (iirc BSD has another prefix) any how, so having blkback say so seems of limited use anyway. Since this seems to not include a partition number how does this work in the split partition scheme? (i.e. one where the guest is given xvda1 and xvda2 rather than xvda with a partition table) [It will be 'blkback.8.xvda1', and 'blkback.11.xvda2'] Perhaps something derived from one of the schemes in http://xenbits.xen.org/docs/unstable/misc/vbd-interface.txt might be a better fit? After a bit of discussion (see http://lists.xenproject.org/archives/html/xen-devel/2015-12/msg01588.html) we settled on dropping the "blback" part. This will make it possible to have the <domid>.<name>-<queue>: [1.xvda-0] [1.xvda-1] And we enough space to make it go up to: [32100.xvdfg9-5] Acked-by: Roger Pau Monné <roger.pau@citrix.com> Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-03-03lightnvm: fold get bb tbl when using dual/quad plane modeMatias Bjørling3-9/+45
When the media manager runs in dual or quad plane mode, lightnvm abstracts away plane specific commands. This poses a problem for get bad block table, as it reports bad blocks per plane, making the table either two or four times bigger than expected. Fold the bad block list before returning. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03lightnvm: fix up nonsensical configure overrun checkingAlan1-6/+5
Instead of checking a constant 0 actually check the space available. Even better remember to allow for the header and also check the right amount of space is needed. Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03xen-blkback: advertise indirect segment support earlierJan Beulich1-5/+8
There's no reason to defer this until the connect phase, and in fact there are frontend implementations expecting this to be available earlier. Move it into the probe function. Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-03-03xen-blkfront: rename indirect descriptor parameterJan Beulich1-2/+4
"max" is rather ambiguous and carries pretty little meaning, the more that there are also "max_queues" and "max_ring_page_order". Make this "max_indirect_segments" instead, and at once change the type from int to uint (to match the respective variable's type). Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-03-03mtip32xx: Cleanup queued requests after surprise removalAsai Thambi SP1-18/+60
Fail all pending requests after surprise removal of a drive. Signed-off-by: Vignesh Gunasekaran <vgunasekaran@micron.com> Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03mtip32xx: Implement timeout handlerAsai Thambi SP2-10/+92
Added timeout handler. Replaced blk_mq_end_request() with blk_mq_complete_request() to avoid double completion of a request. Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Rajesh Kumar Sambandam <rsambandam@micron.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03mtip32xx: Handle FTL rebuild failure state during device initializationAsai Thambi SP1-5/+6
Allow device initialization to finish gracefully when it is in FTL rebuild failure state. Also, recover device out of this state after successfully secure erasing it. Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Vignesh Gunasekaran <vgunasekaran@micron.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03mtip32xx: Handle safe removal during IOAsai Thambi SP2-2/+33
Flush inflight IOs using fsync_bdev() when the device is safely removed. Also, block further IOs in device open function. Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Rajesh Kumar Sambandam <rsambandam@micron.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03mtip32xx: Fix for rmmod crash when drive is in FTL rebuildAsai Thambi SP1-7/+7
When FTL rebuild is in progress, alloc_disk() initializes the disk but device node will be created by add_disk() only after successful completion of FTL rebuild. So, skip deletion of device node in removal path when FTL rebuild is in progress. Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03mtip32xx: Avoid issuing standby immediate cmd during FTL rebuildAsai Thambi SP1-8/+12
Prevent standby immediate command from being issued in remove, suspend and shutdown paths, while drive is in FTL rebuild process. Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Vignesh Gunasekaran <vgunasekaran@micron.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03mtip32xx: Print exact time when an internal command is interruptedAsai Thambi SP1-2/+6
Print exact time when an internal command is interrupted. Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Rajesh Kumar Sambandam <rsambandam@micron.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03mtip32xx: Remove unwanted code from taskfile error handlerAsai Thambi SP1-8/+1
Remove setting and clearing MTIP_PF_EH_ACTIVE_BIT flag in mtip_handle_tfe() as they are redundant. Also avoid waking up service thread from mtip_handle_tfe() because it is already woken up in case of taskfile error. Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Rajesh Kumar Sambandam <rsambandam@micron.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03mtip32xx: Fix broken service thread handlingAsai Thambi SP2-3/+8
Service thread does not detect the need for taskfile error hanlding. Fixed the flag condition to process taskfile error. Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-29nvme: expose cntlid in sysfsMing Lin2-4/+17
For NVMe over Fabrics, the cntlid will be used by systemd/udev to create link to the device, for example, /dev/disk/by-path/<fabrics-info>-<cntlid>-<namespace> -> /dev/nvme0n1 Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-29nvme: return the whole CQE through the request passthrough interfaceChristoph Hellwig3-17/+24
Both LighNVM and NVMe over Fabrics need to look at more than just the status and result field. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matias Bj?rling <m@bjorling.me> Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-29nvme: replace the kthread with a per-device watchdog timerChristoph Hellwig1-89/+23
The only work left in the kthread is the periodic health check for each controller. There is no need to run this from process context or keep a thread context around for it, so replace it with a simpler timer. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-29nvme: don't poll the CQ from the kthreadChristoph Hellwig1-12/+0
There is no reason to do unconditional polling of CQs per the NVMe spec. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-29nvme: use a work item to submit async event requestsChristoph Hellwig1-7/+18
Use a dedicated work item to submit async event requests instead of the global kthread. This simplifies the code and reduces the latencies to resubmit a request once an even notification happened. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-15nbd: Create size change events for userspaceMarkus Pargmann1-21/+58
The userspace needs to know when nbd devices are ready for use. Currently no events are created for the userspace which doesn't work for systemd. See the discussion here: https://github.com/systemd/systemd/pull/358 This patch uses a central point to setup the nbd-internal sizes. A ioctl to set a size does not lead to a visible size change. The size of the block device will be kept at 0 until nbd is connected. As soon as it connects, the size will be changed to the real value and a uevent is created. When disconnecting, the blockdevice is set to 0 size and another uevent is generated. Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
2016-02-10nvme: split pci module out of core moduleMing Lin4-18/+35
NVMe over Fabrics drivers are going to reuse the core, so splits nvme.ko into 2 modules: nvme-core.ko: the core part nvme.ko: the PCI driver Export symbols from nvme-core.ko. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-10nvme: split dev_list_lockMing Lin3-3/+2
Split dev_list_lock into one in the core and one in the PCI driver. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-10nvme: move timeout variables to core.cMing Lin2-12/+12
These variables are used by PCI driver and will also be used in the forthcoming NVMe over Fabrics drivers. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-10nvme/host: reference the fabric module for each bdev open calloutSagi Grimberg3-3/+18
We don't want to be able to unload the fabric driver when we have openened referenced to our namespaces. Thus, for each nvme_open we take a reference on the fabric driver and put it in nvme_release. This behavior is consistent with the scsi model. This resolves the panic when unloading a fabric module with mpath holders. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ian Bakshan <ianb@mellanox.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-10nvme: Log the ctrl device name instead of the underlying pci device nameSagi Grimberg2-28/+33
Having the ctrl name "nvmeX" seems much more friendly than the underlying device name. Also, with other nvme transports such as the soon to come nvme-loop we don't have an underlying device so it doesn't makes sense to make up one. In order to help matching an instance name to a pci function, we add a info print in nvme_probe. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Keith Busch <keith.busch@intel.com> Manually fixed up the hunk in nvme_cancel_queue_ios(). Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-09nvme: fix drvdata setup for the nvme deviceChristoph Hellwig1-2/+1
Pass the right private data to device_create_with_groups from the beginning, and remove the superflous call to dev_set_drvdata. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-09NVMe: Fix possible queue use after freedKeith Busch1-5/+9
This notifies blk-mq when the tag set contains a different number of queues prior to freeing unused ones that the request queue points to. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-09blk-mq: dynamic h/w context countKeith Busch4-73/+112
The hardware's provided queue count may change at runtime with resource provisioning. This patch allows a block driver to alter the number of h/w queues available when its resource count changes. The main part is a new blk-mq API to request a new number of h/w queues for a given live tag set. The new API freezes all queues using that set, then adjusts the allocated count prior to remapping these to CPUs. The bulk of the rest just shifts where h/w contexts and all their artifacts are allocated and freed. The number of max h/w contexts is capped to the number of possible cpus since there is no use for more than that. As such, all pre-allocated memory for pointers need to account for the max possible rather than the initial number of queues. A side effect of this is that the blk-mq will proceed successfully as long as it can allocate at least one h/w context. Previously it would fail request queue initialization if less than the requested number was allocated. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-05nbd: ratelimit error msgs after socket closeDan Streetman1-2/+2
Make the "Attempted send on closed socket" error messages generated in nbd_request_handler() ratelimited. When the nbd socket is shutdown, the nbd_request_handler() function emits an error message for every request remaining in its queue. If the queue is large, this will spam a large amount of messages to the log. There's no need for a separate error message for each request, so this patch ratelimits it. In the specific case this was found, the system was virtual and the error messages were logged to the serial port, which overwhelmed it. Fixes: 4d48a542b427 ("nbd: fix I/O hang on disconnected nbds") Signed-off-by: Dan Streetman <dan.streetman@canonical.com> Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
2016-02-05nbd: Move flag parsing to a functionMarkus Pargmann1-9/+13
nbd changes properties of the blockdevice depending on flags that were received. This patch moves this flag parsing into a separate function nbd_parse_flags(). Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
2016-02-05nbd: Cleanup reset of nbd and bdev after a disconnectMarkus Pargmann1-11/+29
Group all variables that are reset after a disconnect into reset functions. This patch adds two of these functions, nbd_reset() and nbd_bdev_reset(). Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
2016-02-05nbd: Timeouts are not user requested disconnectsMarkus Pargmann1-3/+6
It may be useful to know in the client that a connection timed out. The current code returns success for a timeout. This patch reports the error code -ETIMEDOUT for a timeout. Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
2016-02-05nbd: Remove signal usageMarkus Pargmann1-78/+48
As discussed on the mailing list, the usage of signals for timeout handling has a lot of potential issues. The nbd driver used for some time signals for timeouts. These signals where able to get the threads out of the blocking socket operations. This patch removes all signal usage and uses a socket shutdown instead. The socket descriptor itself is cleared later when the whole nbd device is closed. The tasks_lock is removed as we do not depend on this anymore. Instead a new lock for the socket is introduced so we can safely work with the socket in the timeout handler outside of the two main threads. Cc: Oleg Nesterov <oleg@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Markus Pargmann <mpa@pengutronix.de> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-02-04cfq-iosched: Allow parent cgroup to preempt its childJan Kara1-1/+18
Currently we don't allow sync workload of one cgroup to preempt sync workload of any other cgroup. This is because we want to achieve service separation between cgroups. However in cases where cgroup preempting is ancestor of the current cgroup, there is no need of separation and idling introduces unnecessary overhead. This hurts for example the case when workload is isolated within a cgroup but journalling threads are in root cgroup. Simple way to demostrate the issue is using: dbench4 -c /usr/share/dbench4/client.txt -t 10 -D /mnt 1 on ext4 filesystem on plain SATA drive (mounted with barrier=0 to make difference more visible). When all processes are in the root cgroup, reported throughput is 153.132 MB/sec. When dbench process gets its own blkio cgroup, reported throughput drops to 26.1006 MB/sec. Fix the problem by making check in cfq_should_preempt() more benevolent and allow preemption by ancestor cgroup. This improves the throughput reported by dbench4 to 48.9106 MB/sec. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-04cfq-iosched: Allow sync noidle workloads to preempt each otherJan Kara1-1/+0
The original idea with preemption of sync noidle queues (introduced in commit 718eee0579b8 "cfq-iosched: fairness for sync no-idle queues") was that we service all sync noidle queues together, we don't idle on any of the queues individually and we idle only if there is no sync noidle queue to be served. This intention also matches the original test: if (cfqd->serving_type == SYNC_NOIDLE_WORKLOAD && new_cfqq->service_tree == cfqq->service_tree) return true; However since at that time cfqq->service_tree was not set for idling queues, this test was unreliable and was replaced in commit e4a229196a7c "cfq-iosched: fix no-idle preemption logic" by: if (cfqd->serving_type == SYNC_NOIDLE_WORKLOAD && cfqq_type(new_cfqq) == SYNC_NOIDLE_WORKLOAD && new_cfqq->service_tree->count == 1) return true; That was a reliable test but was actually doing something different - now we preempt sync noidle queue only if the new queue is the only one busy in the service tree. These days cfq queue is kept in service tree even if it is idling and thus the original check would be safe again. But since we actually check that cfq queues are in the same cgroup, of the same priority class and workload type (sync noidle), we know that new_cfqq is fine to preempt cfqq. So just remove the service tree check. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-04cfq-iosched: Reorder checks in cfq_should_preempt()Jan Kara1-6/+7
Move check for preemption by rt class up. There is no functional change but it makes arguing about conditions simpler since we can be sure both cfq queues are from the same ioprio class. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-04cfq-iosched: Don't group_idle if cfqq has big thinktimeJan Kara1-2/+8
There is no point in idling on a cfq group if the only cfq queue that is there has too big thinktime. Signed-off-by: Jan Kara <jack@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-02-03nbd: Fix debugfs error handlingMarkus Pargmann1-41/+14
Static checker complains about the implemented error handling. It is indeed wrong. We don't care about the return values of created debugfs files. We only have to check the return values of created dirs for NULL pointer. If we use a null pointer as parent directory for files, this may lead to debugfs files in wrong places. Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
2016-01-31Linux 4.5-rc2Linus Torvalds1-1/+1
2016-01-30pid: Fix spelling in commentsZhen Lei1-1/+1
Accidentally discovered this typo when I studied this module. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tianhong Ding <dingtianhong@huawei.com> Cc: Xinwei Hu <huxinwei@huawei.com> Cc: Zefan Li <lizefan@huawei.com> Link: http://lkml.kernel.org/r/1454119457-11272-1-git-send-email-thunder.leizhen@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>