aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2021-10-26block: schedule queue restart after BLK_STS_ZONE_RESOURCENaohiro Aota1-4/+9
When dispatching a zone append write request to a SCSI zoned block device, if the target zone of the request is already locked, the device driver will return BLK_STS_ZONE_RESOURCE and the request will be pushed back to the hctx dipatch queue. The queue will be marked as RESTART in dd_finish_request() and restarted in __blk_mq_free_request(). However, this restart applies to the hctx of the completed request. If the requeued request is on a different hctx, dispatch will no be retried until another request is submitted or the next periodic queue run triggers, leading to up to 30 seconds latency for the requeued request. Fix this problem by scheduling a queue restart similarly to the BLK_STS_RESOURCE case or when we cannot get the budget. Also, consolidate the checks into the "need_resource" variable to simplify the condition. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Niklas Cassel <Niklas.Cassel@wdc.com> Link: https://lore.kernel.org/r/20211026165127.4151055-1-naohiro.aota@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-26block: drain queue after disk is removed from sysfsMing Lei1-10/+12
Before removing disk from sysfs, userspace still may change queue via sysfs, such as switching elevator or setting wbt latency, both may reinitialize wbt, then the warning in blk_free_queue_stats() will be triggered since rq_qos_exit() is moved to del_gendisk(). Fixes the issue by moving draining queue & tearing down after disk is removed from sysfs, at that time no one can come into queue's store()/show(). Reported-by: Yi Zhang <yi.zhang@redhat.com> Tested-by: Yi Zhang <yi.zhang@redhat.com> Fixes: 8e141f9eb803 ("block: drain file system I/O on del_gendisk") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211026101204.2897166-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: fix incorrect references to disk objectsZqiang1-0/+1
When adding partitions to the disk, the reference count of the disk object is increased. then alloc partition device and called device_add(), if the device_add() return error, the reference count of the disk object will be reduced twice, at put_device(pdev) and put_disk(disk). this leads to the end of the object's life cycle prematurely, and trigger following calltrace. __init_work+0x2d/0x50 kernel/workqueue.c:519 synchronize_rcu_expedited+0x3af/0x650 kernel/rcu/tree_exp.h:847 bdi_remove_from_list mm/backing-dev.c:938 [inline] bdi_unregister+0x17f/0x5c0 mm/backing-dev.c:946 release_bdi+0xa1/0xc0 mm/backing-dev.c:968 kref_put include/linux/kref.h:65 [inline] bdi_put+0x72/0xa0 mm/backing-dev.c:976 bdev_free_inode+0x11e/0x220 block/bdev.c:408 i_callback+0x3f/0x70 fs/inode.c:226 rcu_do_batch kernel/rcu/tree.c:2508 [inline] rcu_core+0x76d/0x16c0 kernel/rcu/tree.c:2743 __do_softirq+0x1d7/0x93b kernel/softirq.c:558 invoke_softirq kernel/softirq.c:432 [inline] __irq_exit_rcu kernel/softirq.c:636 [inline] irq_exit_rcu+0xf2/0x130 kernel/softirq.c:648 sysvec_apic_timer_interrupt+0x93/0xc0 making disk is NULL when calling put_disk(). Reported-by: Hao Sun <sunhao.th@gmail.com> Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211018103422.2043-1-qiang.zhang1211@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-17blk-cgroup: blk_cgroup_bio_start() should use irq-safe operations on blkg->iostat_cpuTejun Heo1-2/+3
c3df5fb57fe8 ("cgroup: rstat: fix A-A deadlock on 32bit around u64_stats_sync") made u64_stats updates irq-safe to avoid A-A deadlocks. Unfortunately, the conversion missed one in blk_cgroup_bio_start(). Fix it. Fixes: 2d146aa3aa84 ("mm: memcontrol: switch to rstat") Cc: stable@vger.kernel.org # v5.13+ Reported-by: syzbot+9738c8815b375ce482a1@syzkaller.appspotmail.com Signed-off-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/YWi7NrQdVlxD6J9W@slm.duckdns.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-17block, bfq: reset last_bfqq_created on group changePaolo Valente1-0/+6
Since commit 430a67f9d616 ("block, bfq: merge bursts of newly-created queues"), BFQ maintains a per-group pointer to the last bfq_queue created. If such a queue, say bfqq, happens to move to a different group, then bfqq is no more a valid last bfq_queue created for its previous group. That pointer must then be cleared. Not resetting such a pointer may also cause UAF, if bfqq happens to also be freed after being moved to a different group. This commit performs this missing reset. As such it fixes commit 430a67f9d616 ("block, bfq: merge bursts of newly-created queues"). Such a missing reset is most likely the cause of the crash reported in [1]. With some analysis, we found that this crash was due to the above UAF. And such UAF did go away with this commit applied [1]. Anyway, before this commit, that crash happened to be triggered in conjunction with commit 2d52c58b9c9b ("block, bfq: honor already-setup queue merges"). The latter was then reverted by commit ebc69e897e17 ("Revert "block, bfq: honor already-setup queue merges""). Yet commit 2d52c58b9c9b ("block, bfq: honor already-setup queue merges") contains no error related with the above UAF, and can then be restored. [1] https://bugzilla.kernel.org/show_bug.cgi?id=214503 Fixes: 430a67f9d616 ("block, bfq: merge bursts of newly-created queues") Tested-by: Grzegorz Kowal <custos.mentis@gmail.com> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Link: https://lore.kernel.org/r/20211015144336.45894-2-paolo.valente@linaro.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-17block: warn when putting the final reference on a registered diskChristoph Hellwig1-0/+1
Warn when the last reference on a live disk is put without calling del_gendisk first. There are some BDI related bug reports that look like a case of this, so make sure we have the proper instrumentation to catch it. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211014130231.1468538-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-17brd: reduce the brd_devices_mutex scopeTetsuo Handa1-22/+22
As with commit 8b52d8be86d72308 ("loop: reorder loop_exit"), unregister_blkdev() needs to be called first in order to avoid calling brd_alloc() from brd_probe() after brd_del_one() from brd_exit(). Then, we can avoid holding global mutex during add_disk()/del_gendisk() as with commit 1c500ad706383f1a ("loop: reduce the loop_ctl_mutex scope"). Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/e205f13d-18ff-a49c-0988-7de6ea5ff823@i-love.sakura.ne.jp Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-15kyber: avoid q->disk dereferences in trace pointsChristoph Hellwig2-14/+15
q->disk becomes invalid after the gendisk is removed. Work around this by caching the dev_t for the tracepoints. The real fix would be to properly tear down the I/O schedulers with the gendisk, but that is a much more invasive change. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211012093301.GA27795@lst.de Tested-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-15block: keep q_usage_counter in atomic mode after del_gendiskChristoph Hellwig3-2/+11
Don't switch back to percpu mode to avoid the double RCU grace period when tearing down SCSI devices. After removing the disk only passthrough commands can be send anyway. Suggested-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20210929071241.934472-6-hch@lst.de Tested-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-15block: drain file system I/O on del_gendiskChristoph Hellwig4-15/+35
Instead of delaying draining of file system I/O related items like the blk-qos queues, the integrity read workqueue and timeouts only when the request_queue is removed, do that when del_gendisk is called. This is important for SCSI where the upper level drivers that control the gendisk are separate entities, and the disk can be freed much earlier than the request_queue, or can even be unbound without tearing down the queue. Fixes: edb0872f44ec ("block: move the bdi from the request_queue to the gendisk") Reported-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20210929071241.934472-5-hch@lst.de Tested-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-15block: split bio_queue_enter from blk_queue_enterChristoph Hellwig1-8/+25
To prepare for fixing a gendisk shutdown race, open code the blk_queue_enter logic in bio_queue_enter. This also removes the pointless flags translation. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20210929071241.934472-4-hch@lst.de Tested-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-15block: factor out a blk_try_enter_queue helperChristoph Hellwig1-28/+32
Factor out the code to try to get q_usage_counter without blocking into a separate helper. Both to improve code readability and to prepare for splitting bio_queue_enter from blk_queue_enter. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20210929071241.934472-3-hch@lst.de Tested-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-15block: call submit_bio_checks under q_usage_counterChristoph Hellwig1-22/+12
Ensure all bios check the current values of the queue under freeze protection, i.e. to make sure the zero capacity set by del_gendisk is actually seen before dispatching to the driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210929071241.934472-2-hch@lst.de Tested-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-14nvme: fix per-namespace chardev deletionAdam Manzanares2-11/+12
Decrease reference count of chardevice during char device deletion in order to fix a memory leak. Add a release callabck for the device associated chardev and move ida_simple_remove into the release function. Fixes: 2637baed7801 ("nvme: introduce generic per-namespace chardev") Reported-by: Yi Zhang <yi.zhang@redhat.com> Suggested-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Adam Manzanares <a.manzanares@samsung.com> Reviewed-by: Javier González <javier@javigon.com> Tested-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-10-12block/rnbd-clt-sysfs: fix a couple uninitialized variable bugsDan Carpenter1-1/+3
These variables are printed on the error path if match_int() fails so they have to be initialized. Fixes: 2958a995edc9 ("block/rnbd-clt: Support polling mode for IO latency optimization") Fixes: 1eb54f8f5dd8 ("block/rnbd: client: sysfs interface functions") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Gioh Kim <gi-oh.kim@ionos.com> Link: https://lore.kernel.org/r/20211012084443.GA31472@kili Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-07nvme-pci: Fix abort command idKeith Busch1-1/+1
The request tag is no longer the only component of the command id. Fixes: e7006de6c2380 ("nvme: code command_id with a genctr for use-after-free validation") Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2021-10-04block: decode QUEUE_FLAG_HCTX_ACTIVE in debugfs outputJohannes Thumshirn1-0/+1
While debugging an issue we've found that $DEBUGFS/block/$disk/state doesn't decode QUEUE_FLAG_HCTX_ACTIVE but only displays its numerical value. Add QUEUE_FLAG(HCTX_ACTIVE) to the blk_queue_flag_name array so it'll get decoded properly. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/4351076388918075bd80ef07756f9d2ce63be12c.1633332053.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-02block: genhd: fix double kfree() in __alloc_disk_node()Tetsuo Handa2-1/+2
syzbot is reporting use-after-free read at bdev_free_inode() [1], for kfree() from __alloc_disk_node() is called before bdev_free_inode() (which is called after RCU grace period) reads bdev->bd_disk and calls kfree(bdev->bd_disk). Fix use-after-free read followed by double kfree() problem by making sure that bdev->bd_disk is NULL when calling iput(). Link: https://syzkaller.appspot.com/bug?extid=8281086e8a6fbfbd952a [1] Reported-by: syzbot <syzbot+8281086e8a6fbfbd952a@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/e6dd13c5-8db0-4392-6e78-a42ee5d2a1c4@i-love.sakura.ne.jp Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-29nbd: use shifts rather than multipliesNick Desaulniers1-12/+17
commit fad7cd3310db ("nbd: add the check to prevent overflow in __nbd_ioctl()") raised an issue from the fallback helpers added in commit f0907827a8a9 ("compiler.h: enable builtin overflow checkers and add fallback code") ERROR: modpost: "__divdi3" [drivers/block/nbd.ko] undefined! As Stephen Rothwell notes: The added check_mul_overflow() call is being passed 64 bit values. COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW is not set for this build (see include/linux/overflow.h). Specifically, the helpers for checking whether the results of a multiplication overflowed (__unsigned_mul_overflow, __signed_add_overflow) use the division operator when !COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW. This is problematic for 64b operands on 32b hosts. This was fixed upstream by commit 76ae847497bc ("Documentation: raise minimum supported version of GCC to 5.1") which is not suitable to be backported to stable. Further, __builtin_mul_overflow() would emit a libcall to a compiler-rt-only symbol when compiling with clang < 14 for 32b targets. ld.lld: error: undefined symbol: __mulodi4 In order to keep stable buildable with GCC 4.9 and clang < 14, modify struct nbd_config to instead track the number of bits of the block size; reconstructing the block size using runtime checked shifts that are not problematic for those compilers and in a ways that can be backported to stable. In nbd_set_size, we do validate that the value of blksize must be a power of two (POT) and is in the range of [512, PAGE_SIZE] (both inclusive). This does modify the debugfs interface. Cc: stable@vger.kernel.org Cc: Arnd Bergmann <arnd@kernel.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Link: https://github.com/ClangBuiltLinux/linux/issues/1438 Link: https://lore.kernel.org/all/20210909182525.372ee687@canb.auug.org.au/ Link: https://lore.kernel.org/stable/CAHk-=whiQBofgis_rkniz8GBP9wZtSZdcDEffgSLO62BUGV3gg@mail.gmail.com/ Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reported-by: Nathan Chancellor <nathan@kernel.org> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Suggested-by: Kees Cook <keescook@chromium.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Suggested-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20210920232533.4092046-1-ndesaulniers@google.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-28Revert "block, bfq: honor already-setup queue merges"Jens Axboe1-13/+3
This reverts commit 2d52c58b9c9bdae0ca3df6a1eab5745ab3f7d80b. We have had several folks complain that this causes hangs for them, which is especially problematic as the commit has also hit stable already. As no resolution seems to be forthcoming right now, revert the patch. Link: https://bugzilla.kernel.org/show_bug.cgi?id=214503 Fixes: 2d52c58b9c9b ("block, bfq: honor already-setup queue merges") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-27nvme: add command id quirk for apple controllersKeith Busch3-2/+11
Some apple controllers use the command id as an index to implementation specific data structures and will fail if the value is out of bounds. The nvme driver's recently introduced command sequence number breaks this controller. Provide a quirk so these spec incompliant controllers can function as before. The driver will not have the ability to detect bad completions when this quirk is used, but we weren't previously checking this anyway. The quirk bit was selected so that it can readily apply to stable. Link: https://bugzilla.kernel.org/show_bug.cgi?id=214509 Cc: Sven Peter <sven@svenpeter.dev> Reported-by: Orlando Chamberlain <redecorating@protonmail.com> Reported-by: Aditya Garg <gargaditya08@live.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Sven Peter <sven@svenpeter.dev> Link: https://lore.kernel.org/r/20210927154306.387437-1-kbusch@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-24block: hold ->invalidate_lock in blkdev_fallocateMing Lei1-11/+10
When running ->fallocate(), blkdev_fallocate() should hold mapping->invalidate_lock to prevent page cache from being accessed, otherwise stale data may be read in page cache. Without this patch, blktests block/009 fails sometimes. With this patch, block/009 can pass always. Also as Jan pointed out, no pages can be created in the discarded area while you are holding the invalidate_lock, so remove the 2nd truncate_bdev_range(). Cc: Jan Kara <jack@suse.cz> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210923023751.1441091-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-24blktrace: Fix uaf in blk_trace access after removing by sysfsZhihao Cheng1-0/+8
There is an use-after-free problem triggered by following process: P1(sda) P2(sdb) echo 0 > /sys/block/sdb/trace/enable blk_trace_remove_queue synchronize_rcu blk_trace_free relay_close rcu_read_lock __blk_add_trace trace_note_tsk (Iterate running_trace_list) relay_close_buf relay_destroy_buf kfree(buf) trace_note(sdb's bt) relay_reserve buf->offset <- nullptr deference (use-after-free) !!! rcu_read_unlock [ 502.714379] BUG: kernel NULL pointer dereference, address: 0000000000000010 [ 502.715260] #PF: supervisor read access in kernel mode [ 502.715903] #PF: error_code(0x0000) - not-present page [ 502.716546] PGD 103984067 P4D 103984067 PUD 17592b067 PMD 0 [ 502.717252] Oops: 0000 [#1] SMP [ 502.720308] RIP: 0010:trace_note.isra.0+0x86/0x360 [ 502.732872] Call Trace: [ 502.733193] __blk_add_trace.cold+0x137/0x1a3 [ 502.733734] blk_add_trace_rq+0x7b/0xd0 [ 502.734207] blk_add_trace_rq_issue+0x54/0xa0 [ 502.734755] blk_mq_start_request+0xde/0x1b0 [ 502.735287] scsi_queue_rq+0x528/0x1140 ... [ 502.742704] sg_new_write.isra.0+0x16e/0x3e0 [ 502.747501] sg_ioctl+0x466/0x1100 Reproduce method: ioctl(/dev/sda, BLKTRACESETUP, blk_user_trace_setup[buf_size=127]) ioctl(/dev/sda, BLKTRACESTART) ioctl(/dev/sdb, BLKTRACESETUP, blk_user_trace_setup[buf_size=127]) ioctl(/dev/sdb, BLKTRACESTART) echo 0 > /sys/block/sdb/trace/enable & // Add delay(mdelay/msleep) before kernel enters blk_trace_free() ioctl$SG_IO(/dev/sda, SG_IO, ...) // Enters trace_note_tsk() after blk_trace_free() returned // Use mdelay in rcu region rather than msleep(which may schedule out) Remove blk_trace from running_list before calling blk_trace_free() by sysfs if blk_trace is at Blktrace_running state. Fixes: c71a896154119f ("blktrace: add ftrace plugin") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Link: https://lore.kernel.org/r/20210923134921.109194-1-chengzhihao1@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-24block: don't call rq_qos_ops->done_bio if the bio isn't trackedMing Lei1-1/+1
rq_qos framework is only applied on request based driver, so: 1) rq_qos_done_bio() needn't to be called for bio based driver 2) rq_qos_done_bio() needn't to be called for bio which isn't tracked, such as bios ended from error handling code. Especially in bio_endio(): 1) request queue is referred via bio->bi_bdev->bd_disk->queue, which may be gone since request queue refcount may not be held in above two cases 2) q->rq_qos may be freed in blk_cleanup_queue() when calling into __rq_qos_done_bio() Fix the potential kernel panic by not calling rq_qos_ops->done_bio if the bio isn't tracked. This way is safe because both ioc_rqos_done_bio() and blkcg_iolatency_done_bio() are nop if the bio isn't tracked. Reported-by: Yu Kuai <yukuai3@huawei.com> Cc: tj@kernel.org Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20210924110704.1541818-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-22md: fix a lock order reversal in md_allocChristoph Hellwig1-5/+0
Commit b0140891a8cea3 ("md: Fix race when creating a new md device.") not only moved assigning mddev->gendisk before calling add_disk, which fixes the races described in the commit log, but also added a mddev->open_mutex critical section over add_disk and creation of the md kobj. Adding a kobject after add_disk is racy vs deleting the gendisk right after adding it, but md already prevents against that by holding a mddev->active reference. On the other hand taking this lock added a lock order reversal with what is not disk->open_mutex (used to be bdev->bd_mutex when the commit was added) for partition devices, which need that lock for the internal open for the partition scan, and a recent commit also takes it for non-partitioned devices, leading to further lockdep splatter. Fixes: b0140891a8ce ("md: Fix race when creating a new md device.") Fixes: d62633873590 ("block: support delayed holder registration") Reported-by: syzbot+fadc0aaf497e6a493b9f@syzkaller.appspotmail.com Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: syzbot+fadc0aaf497e6a493b9f@syzkaller.appspotmail.com Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Song Liu <songliubraving@fb.com>
2021-09-21nvme: keep ctrl->namespaces orderedChristoph Hellwig1-16/+17
Various places in the nvme code that rely on ctrl->namespace to be ordered. Ensure that the namespae is inserted into the list at the right position from the start instead of sorting it after the fact. Fixes: 540c801c65eb ("NVMe: Implement namespace list scanning") Reported-by: Anton Eidelman <anton.eidelman@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
2021-09-21nvme-tcp: fix incorrect h2cdata pdu offset accountingSagi Grimberg1-3/+10
When the controller sends us multiple r2t PDUs in a single request we need to account for it correctly as our send/recv context run concurrently (i.e. we get a new r2t with r2t_offset before we updated our iterator and req->data_sent marker). This can cause wrong offsets to be sent to the controller. To fix that, we will first know that this may happen only in the send sequence of the last page, hence we will take the r2t_offset to the h2c PDU data_offset, and in nvme_tcp_try_send_data loop, we make sure to increment the request markers also when we completed a PDU but we are expecting more r2t PDUs as we still did not send the entire data of the request. Fixes: 825619b09ad3 ("nvme-tcp: fix possible use-after-completion") Reported-by: Nowak, Lukasz <Lukasz.Nowak@Dell.com> Tested-by: Nowak, Lukasz <Lukasz.Nowak@Dell.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-09-21nvme-fc: remove freeze/unfreeze around update_nr_hw_queuesJames Smart1-2/+0
Remove the freeze/unfreeze around changes to the number of hardware queues. Study and retest has indicated there are no ios that can be active at this point so there is nothing to freeze. nvme-fc is draining the queues in the shutdown and error recovery path in __nvme_fc_abort_outstanding_ios. This patch primarily reverts 88e837ed0f1f "nvme-fc: wait for queues to freeze before calling update_hr_hw_queues". It's not an exact revert as it leaves the adjusting of hw queues only if the count changes. Signed-off-by: James Smart <jsmart2021@gmail.com> [dwagner: added explanation why no IO is pending] Signed-off-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-09-21nvme-fc: avoid race between time out and tear downJames Smart1-0/+2
To avoid race between time out and tear down, in tear down process, first we quiesce the queue, and then delete the timer and cancel the time out work for the queue. This patch merges the admin and io sync ops into the queue teardown logic as shown in the RDMA patch 3017013dcc "nvme-rdma: avoid race between time out and tear down". There is no teardown_lock in nvme-fc. Signed-off-by: James Smart <jsmart2021@gmail.com> Tested-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-09-21nvme-fc: update hardware queues before using themDaniel Wagner1-8/+8
In case the number of hardware queues changes, we need to update the tagset and the mapping of ctx to hctx first. If we try to create and connect the I/O queues first, this operation will fail (target will reject the connect call due to the wrong number of queues) and hence we bail out of the recreate function. Then we will to try the very same operation again, thus we don't make any progress. Signed-off-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-09-15blk-cgroup: fix UAF by grabbing blkcg lock before destroying blkg pdLi Jinlin1-0/+8
KASAN reports a use-after-free report when doing fuzz test: [693354.104835] ================================================================== [693354.105094] BUG: KASAN: use-after-free in bfq_io_set_weight_legacy+0xd3/0x160 [693354.105336] Read of size 4 at addr ffff888be0a35664 by task sh/1453338 [693354.105607] CPU: 41 PID: 1453338 Comm: sh Kdump: loaded Not tainted 4.18.0-147 [693354.105610] Hardware name: Huawei 2288H V5/BC11SPSCB0, BIOS 0.81 07/02/2018 [693354.105612] Call Trace: [693354.105621] dump_stack+0xf1/0x19b [693354.105626] ? show_regs_print_info+0x5/0x5 [693354.105634] ? printk+0x9c/0xc3 [693354.105638] ? cpumask_weight+0x1f/0x1f [693354.105648] print_address_description+0x70/0x360 [693354.105654] kasan_report+0x1b2/0x330 [693354.105659] ? bfq_io_set_weight_legacy+0xd3/0x160 [693354.105665] ? bfq_io_set_weight_legacy+0xd3/0x160 [693354.105670] bfq_io_set_weight_legacy+0xd3/0x160 [693354.105675] ? bfq_cpd_init+0x20/0x20 [693354.105683] cgroup_file_write+0x3aa/0x510 [693354.105693] ? ___slab_alloc+0x507/0x540 [693354.105698] ? cgroup_file_poll+0x60/0x60 [693354.105702] ? 0xffffffff89600000 [693354.105708] ? usercopy_abort+0x90/0x90 [693354.105716] ? mutex_lock+0xef/0x180 [693354.105726] kernfs_fop_write+0x1ab/0x280 [693354.105732] ? cgroup_file_poll+0x60/0x60 [693354.105738] vfs_write+0xe7/0x230 [693354.105744] ksys_write+0xb0/0x140 [693354.105749] ? __ia32_sys_read+0x50/0x50 [693354.105760] do_syscall_64+0x112/0x370 [693354.105766] ? syscall_return_slowpath+0x260/0x260 [693354.105772] ? do_page_fault+0x9b/0x270 [693354.105779] ? prepare_exit_to_usermode+0xf9/0x1a0 [693354.105784] ? enter_from_user_mode+0x30/0x30 [693354.105793] entry_SYSCALL_64_after_hwframe+0x65/0xca [693354.105875] Allocated by task 1453337: [693354.106001] kasan_kmalloc+0xa0/0xd0 [693354.106006] kmem_cache_alloc_node_trace+0x108/0x220 [693354.106010] bfq_pd_alloc+0x96/0x120 [693354.106015] blkcg_activate_policy+0x1b7/0x2b0 [693354.106020] bfq_create_group_hierarchy+0x1e/0x80 [693354.106026] bfq_init_queue+0x678/0x8c0 [693354.106031] blk_mq_init_sched+0x1f8/0x460 [693354.106037] elevator_switch_mq+0xe1/0x240 [693354.106041] elevator_switch+0x25/0x40 [693354.106045] elv_iosched_store+0x1a1/0x230 [693354.106049] queue_attr_store+0x78/0xb0 [693354.106053] kernfs_fop_write+0x1ab/0x280 [693354.106056] vfs_write+0xe7/0x230 [693354.106060] ksys_write+0xb0/0x140 [693354.106064] do_syscall_64+0x112/0x370 [693354.106069] entry_SYSCALL_64_after_hwframe+0x65/0xca [693354.106114] Freed by task 1453336: [693354.106225] __kasan_slab_free+0x130/0x180 [693354.106229] kfree+0x90/0x1b0 [693354.106233] blkcg_deactivate_policy+0x12c/0x220 [693354.106238] bfq_exit_queue+0xf5/0x110 [693354.106241] blk_mq_exit_sched+0x104/0x130 [693354.106245] __elevator_exit+0x45/0x60 [693354.106249] elevator_switch_mq+0xd6/0x240 [693354.106253] elevator_switch+0x25/0x40 [693354.106257] elv_iosched_store+0x1a1/0x230 [693354.106261] queue_attr_store+0x78/0xb0 [693354.106264] kernfs_fop_write+0x1ab/0x280 [693354.106268] vfs_write+0xe7/0x230 [693354.106271] ksys_write+0xb0/0x140 [693354.106275] do_syscall_64+0x112/0x370 [693354.106280] entry_SYSCALL_64_after_hwframe+0x65/0xca [693354.106329] The buggy address belongs to the object at ffff888be0a35580 which belongs to the cache kmalloc-1k of size 1024 [693354.106736] The buggy address is located 228 bytes inside of 1024-byte region [ffff888be0a35580, ffff888be0a35980) [693354.107114] The buggy address belongs to the page: [693354.107273] page:ffffea002f828c00 count:1 mapcount:0 mapping:ffff888107c17080 index:0x0 compound_mapcount: 0 [693354.107606] flags: 0x17ffffc0008100(slab|head) [693354.107760] raw: 0017ffffc0008100 ffffea002fcbc808 ffffea0030bd3a08 ffff888107c17080 [693354.108020] raw: 0000000000000000 00000000001c001c 00000001ffffffff 0000000000000000 [693354.108278] page dumped because: kasan: bad access detected [693354.108511] Memory state around the buggy address: [693354.108671] ffff888be0a35500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [693354.116396] ffff888be0a35580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [693354.124473] >ffff888be0a35600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [693354.132421] ^ [693354.140284] ffff888be0a35680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [693354.147912] ffff888be0a35700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [693354.155281] ================================================================== blkgs are protected by both queue and blkcg locks and holding either should stabilize them. However, the path of destroying blkg policy data is only protected by queue lock in blkcg_activate_policy()/blkcg_deactivate_policy(). Other tasks can get the blkg policy data before the blkg policy data is destroyed, and use it after destroyed, which will result in a use-after-free. CPU0 CPU1 blkcg_deactivate_policy spin_lock_irq(&q->queue_lock) bfq_io_set_weight_legacy spin_lock_irq(&blkcg->lock) blkg_to_bfqg(blkg) pd_to_bfqg(blkg->pd[pol->plid]) ^^^^^^blkg->pd[pol->plid] != NULL bfqg != NULL pol->pd_free_fn(blkg->pd[pol->plid]) pd_to_bfqg(blkg->pd[pol->plid]) bfqg_put(bfqg) kfree(bfqg) blkg->pd[pol->plid] = NULL spin_unlock_irq(q->queue_lock); bfq_group_set_weight(bfqg, val, 0) bfqg->entity.new_weight ^^^^^^trigger uaf here spin_unlock_irq(&blkcg->lock); Fix by grabbing the matching blkcg lock before trying to destroy blkg policy data. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Li Jinlin <lijinlin3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20210914042605.3260596-1-lijinlin3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-15blkcg: fix memory leak in blk_iolatency_initYanfei Xu1-4/+6
BUG: memory leak unreferenced object 0xffff888129acdb80 (size 96): comm "syz-executor.1", pid 12661, jiffies 4294962682 (age 15.220s) hex dump (first 32 bytes): 20 47 c9 85 ff ff ff ff 20 d4 8e 29 81 88 ff ff G...... ..).... 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff82264ec8>] kmalloc include/linux/slab.h:591 [inline] [<ffffffff82264ec8>] kzalloc include/linux/slab.h:721 [inline] [<ffffffff82264ec8>] blk_iolatency_init+0x28/0x190 block/blk-iolatency.c:724 [<ffffffff8225b8c4>] blkcg_init_queue+0xb4/0x1c0 block/blk-cgroup.c:1185 [<ffffffff822253da>] blk_alloc_queue+0x22a/0x2e0 block/blk-core.c:566 [<ffffffff8223b175>] blk_mq_init_queue_data block/blk-mq.c:3100 [inline] [<ffffffff8223b175>] __blk_mq_alloc_disk+0x25/0xd0 block/blk-mq.c:3124 [<ffffffff826a9303>] loop_add+0x1c3/0x360 drivers/block/loop.c:2344 [<ffffffff826a966e>] loop_control_get_free drivers/block/loop.c:2501 [inline] [<ffffffff826a966e>] loop_control_ioctl+0x17e/0x2e0 drivers/block/loop.c:2516 [<ffffffff81597eec>] vfs_ioctl fs/ioctl.c:51 [inline] [<ffffffff81597eec>] __do_sys_ioctl fs/ioctl.c:874 [inline] [<ffffffff81597eec>] __se_sys_ioctl fs/ioctl.c:860 [inline] [<ffffffff81597eec>] __x64_sys_ioctl+0xfc/0x140 fs/ioctl.c:860 [<ffffffff843fa745>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff843fa745>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 [<ffffffff84600068>] entry_SYSCALL_64_after_hwframe+0x44/0xae Once blk_throtl_init() queue init failed, blkcg_iolatency_exit() will not be invoked for cleanup. That leads a memory leak. Swap the blk_throtl_init() and blk_iolatency_init() calls can solve this. Reported-by: syzbot+01321b15cc98e6bf96d6@syzkaller.appspotmail.com Fixes: 19688d7f9592 (block/blk-cgroup: Swap the blk_throtl_init() and blk_iolatency_init() calls) Signed-off-by: Yanfei Xu <yanfei.xu@windriver.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20210915072426.4022924-1-yanfei.xu@windriver.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-14nvme: remove the call to nvme_update_disk_info in nvme_ns_removeChristoph Hellwig1-2/+0
There is no need to explicitly unregister the integrity profile when deleting the gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Link: https://lore.kernel.org/r/20210914070657.87677-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-14block: flush the integrity workqueue in blk_integrity_unregisterLihong Kou1-0/+3
When the integrity profile is unregistered there can still be integrity reads queued up which could see a NULL verify_fn as shown by the race window below: CPU0 CPU1 process_one_work nvme_validate_ns bio_integrity_verify_fn nvme_update_ns_info nvme_update_disk_info blk_integrity_unregister ---set queue->integrity as 0 bio_integrity_process --access bi->profile->verify_fn(bi is a pointer of queue->integity) Before calling blk_integrity_unregister in nvme_update_disk_info, we must make sure that there is no work item in the kintegrityd_wq. Just call blk_flush_integrity to flush the work queue so the bug can be resolved. Signed-off-by: Lihong Kou <koulihong@huawei.com> [hch: split up and shortened the changelog] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Link: https://lore.kernel.org/r/20210914070657.87677-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-14block: check if a profile is actually registered in blk_integrity_unregisterChristoph Hellwig1-1/+5
While clearing the profile itself is harmless, we really should not clear the stable writes flag if it wasn't set due to a registered integrity profile. Reported-by: Lihong Kou <koulihong@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Link: https://lore.kernel.org/r/20210914070657.87677-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-14nvme-tcp: fix io_work priority inversionKeith Busch1-10/+10
Dispatching requests inline with the .queue_rq() call may block while holding the send_mutex. If the tcp io_work also happens to schedule, it may see the req_list is non-empty, leaving "pending" true and remaining in TASK_RUNNING. Since io_work is of higher scheduling priority, the .queue_rq task may not get a chance to run, blocking forward progress and leading to io timeouts. Instead of checking for pending requests within io_work, let the queueing restart io_work outside the send_mutex lock if there is more work to be done. Fixes: a0fdd1418007f ("nvme-tcp: rerun io_work if req_list is not empty") Reported-by: Samuel Jones <sjones@kalrayinc.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-09-14nvme-rdma: destroy cm id before destroy qp to avoid use after freeRuozhu Li1-13/+3
We should always destroy cm_id before destroy qp to avoid to get cma event after qp was destroyed, which may lead to use after free. In RDMA connection establishment error flow, don't destroy qp in cm event handler.Just report cm_error to upper level, qp will be destroy in nvme_rdma_alloc_queue() after destroy cm id. Signed-off-by: Ruozhu Li <liruozhu@huawei.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-09-14nvme-multipath: fix ANA state updates when a namespace is not presentAnton Eidelman1-2/+5
nvme_update_ana_state() has a deficiency that results in a failure to properly update the ana state for a namespace in the following case: NSIDs in ctrl->namespaces: 1, 3, 4 NSIDs in desc->nsids: 1, 2, 3, 4 Loop iteration 0: ns index = 0, n = 0, ns->head->ns_id = 1, nsid = 1, MATCH. Loop iteration 1: ns index = 1, n = 1, ns->head->ns_id = 3, nsid = 2, NO MATCH. Loop iteration 2: ns index = 2, n = 2, ns->head->ns_id = 4, nsid = 4, MATCH. Where the update to the ANA state of NSID 3 is missed. To fix this increment n and retry the update with the same ns when ns->head->ns_id is higher than nsid, Signed-off-by: Anton Eidelman <anton@lightbitslabs.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2021-09-13nvme: avoid race in shutdown namespace removalDaniel Wagner1-8/+7
When we remove the siblings entry, we update ns->head->list, hence we can't separate the removal and test for being empty. They have to be in the same critical section to avoid a race. To avoid breaking the refcounting imbalance again, add a list empty check to nvme_find_ns_head. Fixes: 5396fdac56d8 ("nvme: fix refcounting imbalance when all paths are down") Signed-off-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Tested-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-09-13nvmet: fix a width vs precision bug in nvmet_subsys_attr_serial_show()Dan Carpenter1-1/+1
This was intended to limit the number of characters printed from "subsys->serial" to NVMET_SN_MAX_SIZE. But accidentally the width specifier was used instead of the precision specifier so it only affects the alignment and not the number of characters printed. Fixes: f04064814c2a ("nvmet: fixup buffer overrun in nvmet_subsys_attr_serial()") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-09-12blk-mq: avoid to iterate over stale requestMing Lei1-1/+1
blk-mq can't run allocating driver tag and updating ->rqs[tag] atomically, meantime blk-mq doesn't clear ->rqs[tag] after the driver tag is released. So there is chance to iterating over one stale request just after the tag is allocated and before updating ->rqs[tag]. scsi_host_busy_iter() calls scsi_host_check_in_flight() to count scsi in-flight requests after scsi host is blocked, so no new scsi command can be marked as SCMD_STATE_INFLIGHT. However, driver tag allocation still can be run by blk-mq core. One request is marked as SCMD_STATE_INFLIGHT, but this request may have been kept in another slot of ->rqs[], meantime the slot can be allocated out but ->rqs[] isn't updated yet. Then this in-flight request is counted twice as SCMD_STATE_INFLIGHT. This way causes trouble in handling scsi error. Fixes the issue by not iterating over stale request. Cc: linux-scsi@vger.kernel.org Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Reported-by: luojiaxing <luojiaxing@huawei.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210906065003.439019-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-12Linux 5.15-rc1Linus Torvalds1-2/+2
2021-09-11tools headers UAPI: Update tools's copy of drm.h headersArnaldo Carvalho de Melo1-2/+12
Picking the changes from: 17ce9c61c71cbc0d ("drm: document DRM_IOCTL_MODE_RMFB") Doesn't result in any tooling changes: $ tools/perf/trace/beauty/drm_ioctl.sh > before $ cp include/uapi/drm/drm.h tools/include/uapi/drm/drm.h $ tools/perf/trace/beauty/drm_ioctl.sh > after $ diff -u before after Silencing these perf build warnings: Warning: Kernel ABI header at 'tools/include/uapi/drm/drm.h' differs from latest version at 'include/uapi/drm/drm.h' diff -u tools/include/uapi/drm/drm.h include/uapi/drm/drm.h Cc: Simon Ser <contact@emersion.fr> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-09-11tools headers UAPI: Sync drm/i915_drm.h with the kernel sourcesArnaldo Carvalho de Melo1-81/+417
To pick the changes in: b65a9489730a2494 ("drm/i915/userptr: Probe existence of backing struct pages upon creation") ee242ca704d38699 ("drm/i915/guc: Implement GuC priority management") 81340cf3bddded4f ("drm/i915/uapi: reject set_domain for discrete") 7961c5b60f23dff5 ("drm/i915: Add TTM offset argument to mmap.") aef7b67a79564f6c ("drm/i915/uapi: convert drm_i915_gem_userptr to kernel doc") e7737b67ab46ee0e ("drm/i915/uapi: reject caching ioctls for discrete") 3aa8c57fe25a9247 ("drm/i915/uapi: convert drm_i915_gem_set_domain to kernel doc") 289f5a72009b8f67 ("drm/i915/uapi: convert drm_i915_gem_caching to kernel doc") 4a766ae40ec83301 ("drm/i915: Drop the CONTEXT_CLONE API (v2)") 6ff6d61dd2a943bd ("drm/i915: Drop I915_CONTEXT_PARAM_NO_ZEROMAP") fe4751c3d513ff4f ("drm/i915: Drop I915_CONTEXT_PARAM_RINGSIZE") 577729533cdc4e37 ("drm/i915: Document the Virtual Engine uAPI") c649432e86ca677d ("drm/i915: Fix busy ioctl commentary") That doesn't result in any changes to tooling as no new ioctl were added (at least not perceived by tools/perf/trace/beauty/drm_ioctl.sh). Addressing this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h' diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Jason Ekstrand <jason@jlekstrand.net> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-09-11tools headers UAPI: Sync linux/fs.h with the kernel sourcesArnaldo Carvalho de Melo1-0/+1
To pick the change in: 7957d93bf32bc211 ("block: add ioctl to read the disk sequence number") It adds a new ioctl, but we are still not using that to generate tables for 'perf trace', so no changes in tooling. This silences this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/fs.h' differs from latest version at 'include/uapi/linux/fs.h' diff -u tools/include/uapi/linux/fs.h include/uapi/linux/fs.h Cc: Jens Axboe <axboe@kernel.dk> Cc: Matteo Croce <mcroce@microsoft.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-09-11tools headers UAPI: Sync linux/in.h copy with the kernel sourcesArnaldo Carvalho de Melo1-10/+32
To pick the changes in: db243b796439c0ca ("net/ipv4/ipv6: Replace one-element arraya with flexible-array members") 2d3e5caf96b9449a ("net/ipv4: Replace one-element array with flexible-array member") That don't result in any change in tooling, the structs changed remains with the same layout. This addresses this build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/in.h' differs from latest version at 'include/uapi/linux/in.h' diff -u tools/include/uapi/linux/in.h include/uapi/linux/in.h Cc: David S. Miller <davem@davemloft.net> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-09-11perf tools: Add an option to build without libbfdIan Rogers1-22/+25
Some distributions, like debian, don't link perf with libbfd. Add a build flag to make this configuration buildable and testable. This was inspired by: https://lore.kernel.org/linux-perf-users/20210910102307.2055484-1-tonyg@leastfixedpoint.com/T/#u Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: tony garnock-jones <tonyg@leastfixedpoint.com> Link: http://lore.kernel.org/lkml/20210910225756.729087-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-09-11perf tools: Allow build-id with trailing zerosNamhyung Kim1-0/+10
Currently perf saves a build-id with size but old versions assumes the size of 20. In case the build-id is less than 20 (like for MD5), it'd fill the rest with 0s. I saw a problem when old version of perf record saved a binary in the build-id cache and new version of perf reads the data. The symbols should be read from the build-id cache (as the path no longer has the same binary) but it failed due to mismatch in the build-id. symsrc__init: build id mismatch for /home/namhyung/.debug/.build-id/53/e4c2f42a4c61a2d632d92a72afa08f00000000/elf. The build-id event in the data has 20 byte build-ids, but it saw a different size (16) when it reads the build-id of the elf file in the build-id cache. $ readelf -n ~/.debug/.build-id/53/e4c2f42a4c61a2d632d92a72afa08f00000000/elf Displaying notes found in: .note.gnu.build-id Owner Data size Description GNU 0x00000010 NT_GNU_BUILD_ID (unique build ID bitstring) Build ID: 53e4c2f42a4c61a2d632d92a72afa08f Let's fix this by allowing trailing zeros if the size is different. Fixes: 39be8d0115b321ed ("perf tools: Pass build_id object to dso__build_id_equal()") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210910224630.1084877-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-09-11perf tools: Fix hybrid config terms list corruptionAdrian Hunter2-9/+27
A config terms list was spliced twice, resulting in a never-ending loop when the list was traversed. Fix by using list_splice_init() and copying and freeing the lists as necessary. This patch also depends on patch "perf tools: Factor out copy_config_terms() and free_config_terms()" Example on ADL: Before: # perf record -e '{intel_pt//,cycles/aux-sample-size=4096/pp}' uname & # jobs [1]+ Running perf record -e "{intel_pt//,cycles/aux-sample-size=4096/pp}" uname # perf top -E 10 PerfTop: 4071 irqs/sec kernel: 6.9% exact: 100.0% lost: 0/0 drop: 0/0 [4000Hz cycles], (all, 24 CPUs) --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 97.60% perf [.] __evsel__get_config_term 0.25% [kernel] [k] kallsyms_expand_symbol.constprop.13 0.24% perf [.] kallsyms__parse 0.15% [kernel] [k] _raw_spin_lock 0.14% [kernel] [k] number 0.13% [kernel] [k] advance_transaction 0.08% [kernel] [k] format_decode 0.08% perf [.] map__process_kallsym_symbol 0.08% perf [.] rb_insert_color 0.08% [kernel] [k] vsnprintf exiting. # kill %1 After: # perf record -e '{intel_pt//,cycles/aux-sample-size=4096/pp}' uname & Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.060 MB perf.data ] # perf script | head perf-exec 604 [001] 1827.312293: psb: psb offs: 0 ffffffffb8415e87 pt_config_start+0x37 ([kernel.kallsyms]) perf-exec 604 1827.312293: 1 branches: ffffffffb856a3bd event_sched_in.isra.133+0xfd ([kernel.kallsyms]) => ffffffffb856a9a0 perf_pmu_nop_void+0x0 ([kernel.kallsyms]) perf-exec 604 1827.312293: 1 branches: ffffffffb856b10e merge_sched_in+0x26e ([kernel.kallsyms]) => ffffffffb856a2c0 event_sched_in.isra.133+0x0 ([kernel.kallsyms]) perf-exec 604 1827.312293: 1 branches: ffffffffb856a45d event_sched_in.isra.133+0x19d ([kernel.kallsyms]) => ffffffffb8568b80 perf_event_set_state.part.61+0x0 ([kernel.kallsyms]) perf-exec 604 1827.312293: 1 branches: ffffffffb8568b86 perf_event_set_state.part.61+0x6 ([kernel.kallsyms]) => ffffffffb85662a0 perf_event_update_time+0x0 ([kernel.kallsyms]) perf-exec 604 1827.312293: 1 branches: ffffffffb856a35c event_sched_in.isra.133+0x9c ([kernel.kallsyms]) => ffffffffb8567610 perf_log_itrace_start+0x0 ([kernel.kallsyms]) perf-exec 604 1827.312293: 1 branches: ffffffffb856a377 event_sched_in.isra.133+0xb7 ([kernel.kallsyms]) => ffffffffb8403b40 x86_pmu_add+0x0 ([kernel.kallsyms]) perf-exec 604 1827.312293: 1 branches: ffffffffb8403b86 x86_pmu_add+0x46 ([kernel.kallsyms]) => ffffffffb8403940 collect_events+0x0 ([kernel.kallsyms]) perf-exec 604 1827.312293: 1 branches: ffffffffb8403a7b collect_events+0x13b ([kernel.kallsyms]) => ffffffffb8402cd0 collect_event+0x0 ([kernel.kallsyms]) Fixes: 30def61f64bac5 ("perf parse-events Create two hybrid cache events") Fixes: 94da591b1c7913 ("perf parse-events Create two hybrid raw events") Fixes: 9cbfa2f64c04d9 ("perf parse-events Create two hybrid hardware events") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Link: https //lore.kernel.org/r/20210909125508.28693-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-09-11perf tools: Factor out copy_config_terms() and free_config_terms()Adrian Hunter3-13/+19
Factor out copy_config_terms() and free_config_terms() so that they can be reused. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Link: https //lore.kernel.org/r/20210909125508.28693-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>