Age | Commit message (Collapse) | Author | Files | Lines |
|
Inside timeout handler, blk_mq_tag_to_rq() is called
to retrieve the request from one tag. This way is obviously
wrong because the request can be freed any time and some
fiedds of the request can't be trusted, then kernel oops
might be triggered[1].
Currently wrt. blk_mq_tag_to_rq(), the only special case is
that the flush request can share same tag with the request
cloned from, and the two requests can't be active at the same
time, so this patch fixes the above issue by updating tags->rqs[tag]
with the active request(either flush rq or the request cloned
from) of the tag.
Also blk_mq_tag_to_rq() gets much simplified with this patch.
Given blk_mq_tag_to_rq() is mainly for drivers and the caller must
make sure the request can't be freed, so in bt_for_each() this
helper is replaced with tags->rqs[tag].
[1] kernel oops log
[ 439.696220] BUG: unable to handle kernel NULL pointer dereference at 0000000000000158^M
[ 439.697162] IP: [<ffffffff812d89ba>] blk_mq_tag_to_rq+0x21/0x6e^M
[ 439.700653] PGD 7ef765067 PUD 7ef764067 PMD 0 ^M
[ 439.700653] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC ^M
[ 439.700653] Dumping ftrace buffer:^M
[ 439.700653] (ftrace buffer empty)^M
[ 439.700653] Modules linked in: nbd ipv6 kvm_intel kvm serio_raw^M
[ 439.700653] CPU: 6 PID: 2779 Comm: stress-ng-sigfd Not tainted 4.2.0-rc5-next-20150805+ #265^M
[ 439.730500] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011^M
[ 439.730500] task: ffff880605308000 ti: ffff88060530c000 task.ti: ffff88060530c000^M
[ 439.730500] RIP: 0010:[<ffffffff812d89ba>] [<ffffffff812d89ba>] blk_mq_tag_to_rq+0x21/0x6e^M
[ 439.730500] RSP: 0018:ffff880819203da0 EFLAGS: 00010283^M
[ 439.730500] RAX: ffff880811b0e000 RBX: ffff8800bb465f00 RCX: 0000000000000002^M
[ 439.730500] RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000000^M
[ 439.730500] RBP: ffff880819203db0 R08: 0000000000000002 R09: 0000000000000000^M
[ 439.730500] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000202^M
[ 439.730500] R13: ffff880814104800 R14: 0000000000000002 R15: ffff880811a2ea00^M
[ 439.730500] FS: 00007f165b3f5740(0000) GS:ffff880819200000(0000) knlGS:0000000000000000^M
[ 439.730500] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b^M
[ 439.730500] CR2: 0000000000000158 CR3: 00000007ef766000 CR4: 00000000000006e0^M
[ 439.730500] Stack:^M
[ 439.730500] 0000000000000008 ffff8808114eed90 ffff880819203e00 ffffffff812dc104^M
[ 439.755663] ffff880819203e40 ffffffff812d9f5e 0000020000000000 ffff8808114eed80^M
[ 439.755663] Call Trace:^M
[ 439.755663] <IRQ> ^M
[ 439.755663] [<ffffffff812dc104>] bt_for_each+0x6e/0xc8^M
[ 439.755663] [<ffffffff812d9f5e>] ? blk_mq_rq_timed_out+0x6a/0x6a^M
[ 439.755663] [<ffffffff812d9f5e>] ? blk_mq_rq_timed_out+0x6a/0x6a^M
[ 439.755663] [<ffffffff812dc1b3>] blk_mq_tag_busy_iter+0x55/0x5e^M
[ 439.755663] [<ffffffff812d88b4>] ? blk_mq_bio_to_request+0x38/0x38^M
[ 439.755663] [<ffffffff812d8911>] blk_mq_rq_timer+0x5d/0xd4^M
[ 439.755663] [<ffffffff810a3e10>] call_timer_fn+0xf7/0x284^M
[ 439.755663] [<ffffffff810a3d1e>] ? call_timer_fn+0x5/0x284^M
[ 439.755663] [<ffffffff812d88b4>] ? blk_mq_bio_to_request+0x38/0x38^M
[ 439.755663] [<ffffffff810a46d6>] run_timer_softirq+0x1ce/0x1f8^M
[ 439.755663] [<ffffffff8104c367>] __do_softirq+0x181/0x3a4^M
[ 439.755663] [<ffffffff8104c76e>] irq_exit+0x40/0x94^M
[ 439.755663] [<ffffffff81031482>] smp_apic_timer_interrupt+0x33/0x3e^M
[ 439.755663] [<ffffffff815559a4>] apic_timer_interrupt+0x84/0x90^M
[ 439.755663] <EOI> ^M
[ 439.755663] [<ffffffff81554350>] ? _raw_spin_unlock_irq+0x32/0x4a^M
[ 439.755663] [<ffffffff8106a98b>] finish_task_switch+0xe0/0x163^M
[ 439.755663] [<ffffffff8106a94d>] ? finish_task_switch+0xa2/0x163^M
[ 439.755663] [<ffffffff81550066>] __schedule+0x469/0x6cd^M
[ 439.755663] [<ffffffff8155039b>] schedule+0x82/0x9a^M
[ 439.789267] [<ffffffff8119b28b>] signalfd_read+0x186/0x49a^M
[ 439.790911] [<ffffffff8106d86a>] ? wake_up_q+0x47/0x47^M
[ 439.790911] [<ffffffff811618c2>] __vfs_read+0x28/0x9f^M
[ 439.790911] [<ffffffff8117a289>] ? __fget_light+0x4d/0x74^M
[ 439.790911] [<ffffffff811620a7>] vfs_read+0x7a/0xc6^M
[ 439.790911] [<ffffffff8116292b>] SyS_read+0x49/0x7f^M
[ 439.790911] [<ffffffff81554c17>] entry_SYSCALL_64_fastpath+0x12/0x6f^M
[ 439.790911] Code: 48 89 e5 e8 a9 b8 e7 ff 5d c3 0f 1f 44 00 00 55 89
f2 48 89 e5 41 54 41 89 f4 53 48 8b 47 60 48 8b 1c d0 48 8b 7b 30 48 8b
53 38 <48> 8b 87 58 01 00 00 48 85 c0 75 09 48 8b 97 88 0c 00 00 eb 10
^M
[ 439.790911] RIP [<ffffffff812d89ba>] blk_mq_tag_to_rq+0x21/0x6e^M
[ 439.790911] RSP <ffff880819203da0>^M
[ 439.790911] CR2: 0000000000000158^M
[ 439.790911] ---[ end trace d40af58949325661 ]---^M
Cc: <stable@vger.kernel.org>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
There may be lots of pending requests so that the buffer of PAGE_SIZE
can't hold them at all.
One typical example is scsi-mq, the queue depth(.can_queue) of
scsi_host and blk-mq is quite big but scsi_device's queue_depth
is a bit small(.cmd_per_lun), then it is quite easy to have lots
of pending requests in hw queue.
This patch fixes the following warning and the related memory
destruction.
[ 359.025101] fill_read_buffer: blk_mq_hw_sysfs_show+0x0/0x7d returned bad count^M
[ 359.055595] irq event stamp: 15537^M
[ 359.055606] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC ^M
[ 359.055614] Dumping ftrace buffer:^M
[ 359.055660] (ftrace buffer empty)^M
[ 359.055672] Modules linked in: nbd ipv6 kvm_intel kvm serio_raw^M
[ 359.055678] CPU: 4 PID: 21631 Comm: stress-ng-sysfs Not tainted 4.2.0-rc5-next-20150805 #434^M
[ 359.055679] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011^M
[ 359.055682] task: ffff8802161cc000 ti: ffff88021b4a8000 task.ti: ffff88021b4a8000^M
[ 359.055693] RIP: 0010:[<ffffffff811541c5>] [<ffffffff811541c5>] __kmalloc+0xe8/0x152^M
Cc: <stable@vger.kernel.org>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Update block/biovecs.txt so that it includes a note on what kind of
effects arbitrarily sized bios would bring to the block layer.
Also fix a trivial typo, bio_iter_iovec.
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Dongsu Park <dpark@posteo.net>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
We can always fill up the bio now, no need to estimate the possible
size based on queue parameters.
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
[hch: rebased and wrote a changelog]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Call pre-defined helper bio_add_page() instead of open coding for
iterating through bi_io_vec[]. Doing that, it's possible to make some
parts in filesystems and mm/page_io.c simpler than before.
Acked-by: Dave Kleikamp <shaggy@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park <dpark@posteo.net>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
As generic_make_request() is now able to handle arbitrarily sized bios,
it's no longer necessary for each individual block driver to define its
own ->merge_bvec_fn() callback. Remove every invocation completely.
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: drbd-user@lists.linbit.com
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Yehuda Sadeh <yehuda@inktank.com>
Cc: Sage Weil <sage@inktank.com>
Cc: Alex Elder <elder@kernel.org>
Cc: ceph-devel@vger.kernel.org
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: Neil Brown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Acked-by: NeilBrown <neilb@suse.de> (for the 'md' bits)
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
[dpark: also remove ->merge_bvec_fn() in dm-thin as well as
dm-era-target, and resolve merge conflicts]
Signed-off-by: Dongsu Park <dpark@posteo.net>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Remove bio_fits_rdev() as sufficient merge_bvec_fn() handling is now
performed by blk_queue_split() in md_make_request().
Cc: Neil Brown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park <dpark@posteo.net>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
If a read request fits entirely in a chunk, it will be passed directly to the
underlying device (providing it hasn't failed of course). If it doesn't fit,
the slightly less efficient path that uses the stripe_cache is used.
Requests that get to the stripe cache are always completely split up as
necessary.
So with RAID5, ripping out the merge_bvec_fn doesn't cause it to stop work,
but could cause it to take the less efficient path more often.
All that is needed to manage this is for 'chunk_aligned_read' do some bio
splitting, much like the RAID0 code does.
Cc: Neil Brown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
The split code in blkdev_issue_{discard,write_same} can go away
now that any driver that cares does the split. We have to make
sure bio size doesn't overflow.
For discard, we set max discard sectors to (1<<31)>>9 to ensure
it doesn't overflow bi_size and hopefully it is of the proper
granularity as long as the granularity is a power of two.
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Btrfs has been doing bio splitting from btrfs_map_bio(), by checking
device limits as well as calling ->merge_bvec_fn() etc. That is not
necessary any more, because generic_make_request() is now able to
handle arbitrarily sized bios. So clean up unnecessary code paths.
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: linux-btrfs@vger.kernel.org
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park <dpark@posteo.net>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
The bcache driver has always accepted arbitrarily large bios and split
them internally. Now that every driver must accept arbitrarily large
bios this code isn't nessecary anymore.
Cc: linux-bcache@vger.kernel.org
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park <dpark@posteo.net>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Since generic_make_request() can now handle arbitrary size bios, all we
have to do is make sure the bvec array doesn't overflow.
__bio_add_page() doesn't need to call ->merge_bvec_fn(), where
we can get rid of unnecessary code paths.
Removing the call to ->merge_bvec_fn() is also fine, as no driver that
implements support for BLOCK_PC commands even has a ->merge_bvec_fn()
method.
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
[dpark: rebase and resolve merge conflicts, change a couple of comments,
make bio_add_page() warn once upon a cloned bio.]
Signed-off-by: Dongsu Park <dpark@posteo.net>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
The way the block layer is currently written, it goes to great lengths
to avoid having to split bios; upper layer code (such as bio_add_page())
checks what the underlying device can handle and tries to always create
bios that don't need to be split.
But this approach becomes unwieldy and eventually breaks down with
stacked devices and devices with dynamic limits, and it adds a lot of
complexity. If the block layer could split bios as needed, we could
eliminate a lot of complexity elsewhere - particularly in stacked
drivers. Code that creates bios can then create whatever size bios are
convenient, and more importantly stacked drivers don't have to deal with
both their own bio size limitations and the limitations of the
(potentially multiple) devices underneath them. In the future this will
let us delete merge_bvec_fn and a bunch of other code.
We do this by adding calls to blk_queue_split() to the various
make_request functions that need it - a few can already handle arbitrary
size bios. Note that we add the call _after_ any call to
blk_queue_bounce(); this means that blk_queue_split() and
blk_recalc_rq_segments() don't need to be concerned with bouncing
affecting segment merging.
Some make_request_fn() callbacks were simple enough to audit and verify
they don't need blk_queue_split() calls. The skipped ones are:
* nfhd_make_request (arch/m68k/emu/nfblock.c)
* axon_ram_make_request (arch/powerpc/sysdev/axonram.c)
* simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c)
* brd_make_request (ramdisk - drivers/block/brd.c)
* mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c)
* loop_make_request
* null_queue_bio
* bcache's make_request fns
Some others are almost certainly safe to remove now, but will be left
for future patches.
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: drbd-user@lists.linbit.com
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Jim Paris <jim@jtan.com>
Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Acked-by: NeilBrown <neilb@suse.de> (for the 'md/md.c' bits)
Acked-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
[dpark: skip more mq-based drivers, resolve merge conflicts, etc.]
Signed-off-by: Dongsu Park <dpark@posteo.net>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
IS_ERR(_OR_NULL) already contain an 'unlikely' compiler flag and there
is no need to do that again from its callers. Drop it.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Commit 4246a0b6 ("block: add a bi_error field to struct bio") has added a few
dereferences of 'bio' after a call to bio_put(). This causes use-after-frees
such as:
[521120.719695] BUG: KASan: use after free in dio_bio_complete+0x2b3/0x320 at addr ffff880f36b38714
[521120.720638] Read of size 4 by task mount.ocfs2/9644
[521120.721212] =============================================================================
[521120.722056] BUG kmalloc-256 (Not tainted): kasan: bad access detected
[521120.722968] -----------------------------------------------------------------------------
[521120.722968]
[521120.723915] Disabling lock debugging due to kernel taint
[521120.724539] INFO: Slab 0xffffea003cdace00 objects=32 used=25 fp=0xffff880f36b38600 flags=0x46fffff80004080
[521120.726037] INFO: Object 0xffff880f36b38700 @offset=1792 fp=0xffff880f36b38800
[521120.726037]
[521120.726974] Bytes b4 ffff880f36b386f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.727898] Object ffff880f36b38700: 00 88 b3 36 0f 88 ff ff 00 00 d8 de 0b 88 ff ff ...6............
[521120.728822] Object ffff880f36b38710: 02 00 00 f0 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.729705] Object ffff880f36b38720: 01 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 ................
[521120.730623] Object ffff880f36b38730: 00 00 00 00 00 00 00 00 01 00 00 00 00 02 00 00 ................
[521120.731621] Object ffff880f36b38740: 00 02 00 00 01 00 00 00 d0 f7 87 ad ff ff ff ff ................
[521120.732776] Object ffff880f36b38750: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.733640] Object ffff880f36b38760: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.734508] Object ffff880f36b38770: 01 00 03 00 01 00 00 00 88 87 b3 36 0f 88 ff ff ...........6....
[521120.735385] Object ffff880f36b38780: 00 73 22 ad 02 88 ff ff 40 13 e0 3c 00 ea ff ff .s".....@..<....
[521120.736667] Object ffff880f36b38790: 00 02 00 00 00 04 00 00 00 00 00 00 00 00 00 00 ................
[521120.737596] Object ffff880f36b387a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.738524] Object ffff880f36b387b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.739388] Object ffff880f36b387c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.740277] Object ffff880f36b387d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.741187] Object ffff880f36b387e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.742233] Object ffff880f36b387f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.743229] CPU: 41 PID: 9644 Comm: mount.ocfs2 Tainted: G B 4.2.0-rc6-next-20150810-sasha-00039-gf909086 #2420
[521120.744274] ffff880f36b38000 ffff880d89c8f638 ffffffffb6e9ba8a ffff880101c0e5c0
[521120.745025] ffff880d89c8f668 ffffffffad76a313 ffff880101c0e5c0 ffffea003cdace00
[521120.745908] ffff880f36b38700 ffff880f36b38798 ffff880d89c8f690 ffffffffad772854
[521120.747063] Call Trace:
[521120.747520] dump_stack (lib/dump_stack.c:52)
[521120.748053] print_trailer (mm/slub.c:653)
[521120.748582] object_err (mm/slub.c:660)
[521120.749079] kasan_report_error (include/linux/kasan.h:20 mm/kasan/report.c:152 mm/kasan/report.c:194)
[521120.750834] __asan_report_load4_noabort (mm/kasan/report.c:250)
[521120.753580] dio_bio_complete (fs/direct-io.c:478)
[521120.755752] do_blockdev_direct_IO (fs/direct-io.c:494 fs/direct-io.c:1291)
[521120.759765] __blockdev_direct_IO (fs/direct-io.c:1322)
[521120.761658] blkdev_direct_IO (fs/block_dev.c:162)
[521120.762993] generic_file_read_iter (mm/filemap.c:1738)
[521120.767405] blkdev_read_iter (fs/block_dev.c:1649)
[521120.768556] __vfs_read (fs/read_write.c:423 fs/read_write.c:434)
[521120.772126] vfs_read (fs/read_write.c:454)
[521120.773118] SyS_pread64 (fs/read_write.c:607 fs/read_write.c:594)
[521120.776062] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:186)
[521120.777375] Memory state around the buggy address:
[521120.778118] ffff880f36b38600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[521120.779211] ffff880f36b38680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[521120.780315] >ffff880f36b38700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[521120.781465] ^
[521120.782083] ffff880f36b38780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[521120.783717] ffff880f36b38800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[521120.784818] ==================================================================
This patch fixes a few of those places that I caught while auditing the patch, but the
original patch should be audited further for more occurences of this issue since I'm
not too familiar with the code.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Commit bcf2843b3f8f added ->bi_error to cleanup the error passing
for struct bio, but that ended up adding 4 bytes and a 4 byte hole
to the size of struct bio. For a clean config, that bumped it from
128 bytes, to 136 bytes, on x86-64.
The ->bi_flags member is currently an unsigned long, but it fits
easily within an int. Change it to an unsigned int, adjust the
the pool offset code, and move ->bi_error into the new hole. Then
we end up with a 128 byte bio again.
Change the bio flag set/clear to use cmpxchg to ensure we don't
lose any flags when manipulating them.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Some places use helpers now, others don't. We only have the 'is set'
helper, add helpers for setting and clearing flags too.
It was a bit of a mess of atomic vs non-atomic access. With
BIO_UPTODATE gone, we don't have any risk of concurrent access to the
flags. So relax the restriction and don't make any of them atomic. The
flags that do have serialization issues (reffed and chained), we
already handle those separately.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Currently we have two different ways to signal an I/O error on a BIO:
(1) by clearing the BIO_UPTODATE flag
(2) by returning a Linux errno value to the bi_end_io callback
The first one has the drawback of only communicating a single possible
error (-EIO), and the second one has the drawback of not beeing persistent
when bios are queued up, and are not passed along from child to parent
bio in the ever more popular chaining scenario. Having both mechanisms
available has the additional drawback of utterly confusing driver authors
and introducing bugs where various I/O submitters only deal with one of
them, and the others have to add boilerplate code to deal with both kinds
of error returns.
So add a new bi_error field to store an errno value directly in struct
bio and remove the existing mechanisms to clean all this up.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Lots of devices support huge discard sizes these days. Depending
on how the device handles them internally, huge discards can
introduce massive latencies (hundreds of msec) on the device side.
We have a sysfs file, discard_max_bytes, that advertises the max
hardware supported discard size. Make this writeable, and split
the settings into a soft and hard limit. This can be set from
'discard_granularity' and up to the hardware limit.
Add a new sysfs file, 'discard_max_hw_bytes', that shows the hw
set limit.
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Some drivers use it now, others just set the limits field manually.
But in preparation for splitting this into a hard and soft limit,
ensure that they all call the proper function for setting the hw
limit for discards.
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Percpu refcount is the perfect match for partition's case,
and the conversion is quite straight.
With the convertion, one pair of atomic inc/dec can be saved
for accounting block I/O, which is run in hot path of block I/O.
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
So the helper can be used in both generic partition
case and part0 case.
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
It is reasonable to set default timeout of request as 30 seconds instead of
30000 ticks, which may be 300 seconds if HZ is 100, for example, some arm64
based systems may choose 100 HZ.
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Fixes: c76cbbcf4044 ("blk-mq: put blk_queue_rq_timeout together in blk_mq_init_queue()"
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
This patch has the driver automatically reread partitions if a namespace
has a separate metadata format. Previously revalidating a disk was
sufficient to get the correct capacity set on such formatted drives,
but partitions that may exist would not have been surfaced.
Reported-by: Paul Grabinar <paul.grabinar@ranbarg.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Paul Grabinar <paul.grabinar@ranbarg.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
The end of jfs_rename(), which is also used by the error paths,
included a call to IWRITE_UNLOCK(new_ip) after labels out1, out2
and out3. If we come in through these labels, IWRITE_LOCK() has not
been called yet.
In moving that call to the correct spot, I also moved some
exceptional truncate code earlier as well, since the early error
paths don't need to deal with it, and I renamed out4: to out_tx: so
a future patch by Jan Kara doesn't need to deal with renumbering or
confusing out-of-order labels.
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
'{ }' and memset will both reset the cbuf buffer.
Only once is enough and this can be done outside fo the mutex.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
|
|
Destroy ocrdma_dev_id IDR on module exit, reclaiming the allocated memory.
This was detected by the following semantic patch (written by Luis Rodriguez
<mcgrof@suse.com>)
<SmPL>
@ defines_module_init @
declarer name module_init, module_exit;
declarer name DEFINE_IDR;
identifier init;
@@
module_init(init);
@ defines_module_exit @
identifier exit;
@@
module_exit(exit);
@ declares_idr depends on defines_module_init && defines_module_exit @
identifier idr;
@@
DEFINE_IDR(idr);
@ on_exit_calls_destroy depends on declares_idr && defines_module_exit @
identifier declares_idr.idr, defines_module_exit.exit;
@@
exit(void)
{
...
idr_destroy(&idr);
...
}
@ missing_module_idr_destroy depends on declares_idr && defines_module_exit && !on_exit_calls_destroy @
identifier declares_idr.idr, defines_module_exit.exit;
@@
exit(void)
{
...
+idr_destroy(&idr);
}
</SmPL>
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Destroy multcast_idr on module exit, reclaiming the allocated memory.
This was detected by the following semantic patch (written by Luis Rodriguez
<mcgrof@suse.com>)
<SmPL>
@ defines_module_init @
declarer name module_init, module_exit;
declarer name DEFINE_IDR;
identifier init;
@@
module_init(init);
@ defines_module_exit @
identifier exit;
@@
module_exit(exit);
@ declares_idr depends on defines_module_init && defines_module_exit @
identifier idr;
@@
DEFINE_IDR(idr);
@ on_exit_calls_destroy depends on declares_idr && defines_module_exit @
identifier declares_idr.idr, defines_module_exit.exit;
@@
exit(void)
{
...
idr_destroy(&idr);
...
}
@ missing_module_idr_destroy depends on declares_idr && defines_module_exit && !on_exit_calls_destroy @
identifier declares_idr.idr, defines_module_exit.exit;
@@
exit(void)
{
...
+idr_destroy(&idr);
}
</SmPL>
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
There is little chance our memory allocation will fail, so we can
combine initializing the work structs with allocating them instead of
looping through all of them once to allocate and again to initialize.
Then when we need to actually find out if our device is up or in the
process of going down, have all of our work structs batched up, take the
spin_lock once and only once, and do all of the batch under the one
spin_lock invocation instead of incurring all of the locked memory cycles
we would otherwise incur to take/release the spin_lock over and over
again.
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
We create a number of work structs to be queued up to a workqueue, and
on completion of the workqueue handler, the workqueue handler frees the
allocated memory. If, however, we don't queue the work struct because
the device is going down, then we need to free the memory ourselves.
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
On failure, we loop through all possible pointers and test them before
calling kfree. But really, why even attempt to free items we didn't
allocate when we can easily loop through exactly and only the devices
for which the original memory allocation succeeded and free just those.
Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
For IB links, reading HCA flow counters through iboe_process_mad() should
be used when mlx4_ib_process_mad() is invoked only for VFs PMA queries and
exactly nothing else.
Fixes: 7193a141eb74 ('IB/mlx4: Set VF to read from QP counters')
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
In little endian cases, the macros be16_to_cpu and cpu_to_be64
unfolds to __swab{16,64} which provides special case for constants.
In big endian cases, __constant_be16_to_cpu and be16_to_cpu
expand directly to the same expression. The same applies for
__constant_cpu_to_be64 and cpu_to_be64.
So, replace __constant_be16_to_cpu with be16_to_cpu and
__constant_cpu_to_be64 with cpu_to_be64, with the goal of getting
rid of the definition of __constant_be16_to_cpu and
__constant_cpu_to_be64 completely.
Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
When switching between modes (datagram / connected) change the MTU
accordingly.
datagram mode up to 4K, connected mode up to (64K - 0x10).
Signed-off-by: ELi Cohen <eli@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
By default, IPoIB-CM driver uses 64k MTU. Larger MTU gives better
performance.
This MTU plus overhead puts the memory allocation for IP based packets at
32 4k pages (order 5), which have to be contiguous.
When the system memory under pressure, it was observed that allocating 128k
contiguous physical memory is difficult and causes serious errors (such as
system becomes unusable).
This enhancement resolve the issue by removing the physically contiguous
memory requirement using Scatter/Gather feature that exists in Linux stack.
With this fix Scatter-Gather will be supported also in connected mode.
This change reverts some of the change made in commit e112373fd6aa
("IPoIB/cm: Reduce connected mode TX object size").
The ability to use SG in IPoIB CM is possible because the coupling
between NETIF_F_SG and NETIF_F_CSUM was removed in commit
ec5f06156423 ("net: Kill link between CSUM and SG features.")
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: Christian Marie <christian@ponies.io>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
ib_ucm_release_dev clears the wrong bit if devnum is greater
than IB_UCM_MAX_DEVICES.
Signed-off-by: Carol L Soto <clsoto@linux.vnet.ibm.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
__ipoib_ib_dev_flush calls itself recursively on child devices, and lockdep
complains about locking vlan_rwsem twice (see below). Use down_read_nested
instead of down_read to prevent the warning.
=============================================
[ INFO: possible recursive locking detected ]
4.1.0-rc4+ #36 Tainted: G O
---------------------------------------------
kworker/u20:2/261 is trying to acquire lock:
(&priv->vlan_rwsem){.+.+..}, at: [<ffffffffa0791e2a>] __ipoib_ib_dev_flush+0x3a/0x2b0 [ib_ipoib]
but task is already holding lock:
(&priv->vlan_rwsem){.+.+..}, at: [<ffffffffa0791e2a>] __ipoib_ib_dev_flush+0x3a/0x2b0 [ib_ipoib]
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&priv->vlan_rwsem);
lock(&priv->vlan_rwsem);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by kworker/u20:2/261:
#0: ("%s""ipoib_flush"){.+.+..}, at: [<ffffffff810827cc>] process_one_work+0x15c/0x760
#1: ((&priv->flush_heavy)){+.+...}, at: [<ffffffff810827cc>] process_one_work+0x15c/0x760
#2: (&priv->vlan_rwsem){.+.+..}, at: [<ffffffffa0791e2a>] __ipoib_ib_dev_flush+0x3a/0x2b0 [ib_ipoib]
stack backtrace:
CPU: 3 PID: 261 Comm: kworker/u20:2 Tainted: G O 4.1.0-rc4+ #36
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
Workqueue: ipoib_flush ipoib_ib_dev_flush_heavy [ib_ipoib]
ffff8801c6c54790 ffff8801c9927af8 ffffffff81665238 0000000000000001
ffffffff825b5b30 ffff8801c9927bd8 ffffffff810bba51 ffff880100000000
ffffffff00000001 ffff880100000001 ffff8801c6c55428 ffff8801c6c54790
Call Trace:
[<ffffffff81665238>] dump_stack+0x4f/0x6f
[<ffffffff810bba51>] __lock_acquire+0x741/0x1820
[<ffffffff810bcbf8>] lock_acquire+0xc8/0x240
[<ffffffffa0791e2a>] ? __ipoib_ib_dev_flush+0x3a/0x2b0 [ib_ipoib]
[<ffffffff81669d2c>] down_read+0x4c/0x70
[<ffffffffa0791e2a>] ? __ipoib_ib_dev_flush+0x3a/0x2b0 [ib_ipoib]
[<ffffffffa0791e2a>] __ipoib_ib_dev_flush+0x3a/0x2b0 [ib_ipoib]
[<ffffffffa0791e4a>] __ipoib_ib_dev_flush+0x5a/0x2b0 [ib_ipoib]
[<ffffffffa07920ba>] ipoib_ib_dev_flush_heavy+0x1a/0x20 [ib_ipoib]
[<ffffffff81082871>] process_one_work+0x201/0x760
[<ffffffff810827cc>] ? process_one_work+0x15c/0x760
[<ffffffff81082ef0>] worker_thread+0x120/0x4d0
[<ffffffff81082dd0>] ? process_one_work+0x760/0x760
[<ffffffff81082dd0>] ? process_one_work+0x760/0x760
[<ffffffff81088b7e>] kthread+0xfe/0x120
[<ffffffff81088a80>] ? __init_kthread_worker+0x70/0x70
[<ffffffff8166c6e2>] ret_from_fork+0x42/0x70
[<ffffffff81088a80>] ? __init_kthread_worker+0x70/0x70
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
The ucma_lock_files() locks the mut mutex on two files, e.g. for migrating
an ID. Use mutex_lock_nested() to prevent the warning below.
=============================================
[ INFO: possible recursive locking detected ]
4.1.0-rc6-hmm+ #40 Tainted: G O
---------------------------------------------
pingpong_rpc_se/10260 is trying to acquire lock:
(&file->mut){+.+.+.}, at: [<ffffffffa047ac55>] ucma_migrate_id+0xc5/0x248 [rdma_ucm]
but task is already holding lock:
(&file->mut){+.+.+.}, at: [<ffffffffa047ac4b>] ucma_migrate_id+0xbb/0x248 [rdma_ucm]
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&file->mut);
lock(&file->mut);
*** DEADLOCK ***
May be due to missing lock nesting notation
1 lock held by pingpong_rpc_se/10260:
#0: (&file->mut){+.+.+.}, at: [<ffffffffa047ac4b>] ucma_migrate_id+0xbb/0x248 [rdma_ucm]
stack backtrace:
CPU: 0 PID: 10260 Comm: pingpong_rpc_se Tainted: G O 4.1.0-rc6-hmm+ #40
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
ffff8801f85b63d0 ffff880195677b58 ffffffff81668f49 0000000000000001
ffffffff825cbbe0 ffff880195677c38 ffffffff810bb991 ffff880100000000
ffff880100000000 ffff880100000001 ffff8801f85b7010 ffffffff8121bee9
Call Trace:
[<ffffffff81668f49>] dump_stack+0x4f/0x6e
[<ffffffff810bb991>] __lock_acquire+0x741/0x1820
[<ffffffff8121bee9>] ? dput+0x29/0x320
[<ffffffff810bcb38>] lock_acquire+0xc8/0x240
[<ffffffffa047ac55>] ? ucma_migrate_id+0xc5/0x248 [rdma_ucm]
[<ffffffff8166b901>] ? mutex_lock_nested+0x291/0x3e0
[<ffffffff8166b6d5>] mutex_lock_nested+0x65/0x3e0
[<ffffffffa047ac55>] ? ucma_migrate_id+0xc5/0x248 [rdma_ucm]
[<ffffffff810baeed>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff8166b66e>] ? mutex_unlock+0xe/0x10
[<ffffffffa047ac55>] ucma_migrate_id+0xc5/0x248 [rdma_ucm]
[<ffffffffa0478474>] ucma_write+0xa4/0xb0 [rdma_ucm]
[<ffffffff81200674>] __vfs_write+0x34/0x100
[<ffffffff8112427c>] ? __audit_syscall_entry+0xac/0x110
[<ffffffff810ec055>] ? current_kernel_time+0xc5/0xe0
[<ffffffff812aa4d3>] ? security_file_permission+0x23/0x90
[<ffffffff8120088d>] ? rw_verify_area+0x5d/0xe0
[<ffffffff812009bb>] vfs_write+0xab/0x120
[<ffffffff81201519>] SyS_write+0x59/0xd0
[<ffffffff8112427c>] ? __audit_syscall_entry+0xac/0x110
[<ffffffff8166ffee>] system_call_fastpath+0x12/0x76
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Fixes: 3e0249f9c05c ("RDS/IB: add refcount tracking to struct rds_ib_device")
There lacks a dropping on rds_ib_device.refcount in case rds_ib_alloc_fmr
failed(mr pool running out). this lead to the refcount overflow.
A complain in line 117(see following) is seen. From vmcore:
s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is -2147475448.
That is the evidence the mr pool is used up. so rds_ib_alloc_fmr is very likely
to return ERR_PTR(-EAGAIN).
115 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
116 {
117 BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0);
118 if (atomic_dec_and_test(&rds_ibdev->refcount))
119 queue_work(rds_wq, &rds_ibdev->free_work);
120 }
fix is to drop refcount when rds_ib_alloc_fmr failed.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Fix for incorrect recording of the MAC address
Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Neighbor resolution doesn't work without this fix
Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Fixes to allow clients to make remove mapping requests, after
they have provided the user space service with the mapping
information, they are using when the service is restarted.
1) Adding IWPM_REG_VALID, IWPM_REG_INCOMPL and IWPM_REG_UNDEF
registration types for the port mapper clients and functions
to set/check the registration type.
2) If the port mapper user space service is not available to register
the client, then its registration stays IWPM_REG_UNDEF and the
registration isn't checked until the service becomes available
(no mappings are possible, if the user space service isn't running).
3) After the service is restarted, the user space port mapper pid is set
to valid and the client registration is set to IWPM_REG_INCOMPL
to allow the client to make remove mapping requests.
Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Error values of ib_query_port() and ib_query_device() weren't propagated
correctly. Because of that, ipoib_add_port() could return NULL value,
which escaped the IS_ERR() check in ipoib_add_one() and we crashed.
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
mlx4 VFs can provide CQE raw time-stamping services, but they
don't have the hca core clock mapped to their PCI bars.
As such, we should not attempt to query and report the clock offset
to user space for VFs. Doing so causes query_device over VFs to fail
with -ENOSUPP.
Fixes: 4b664c4355b2 ('IB/mlx4: Add support for CQ time-stamping')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Whenever ib_cm gets remove_one call, like when there is a hot-unplug
event, the driver should mark itself as going_down and confirm that no
new works are going to be queued for that device.
so, the order of the actions are:
1. mark the going_down bit.
2. flush the wq.
3. [make sure no new works for that device.]
4. unregister mad agent.
otherwise, works that are already queued can be scheduled after the mad
agent was freed.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
We might return res which is not initialized. Also
reduce code duplication by exporting srp_parse_tmo so
srp_tmo_set can reuse it.
Detected by Coverity.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jenny Falkovich <jennyf@mellanox.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
In little endian cases, the macro cpu_to_be{16,32,64} unfolds to
__swab{16,32,64} which provides special case for constants. In
big endian cases, __constant_cpu_to_be{16,32,64} and
cpu_to_be{16,32,64} expand directly to the same expression. So,
replace __constant_cpu_to_be{16,32,64} with cpu_to_be{16,32,64}
with the goal of getting rid of the definitions of
__constant_cpu_to_be{16,32,64} completely.
The Coccinelle semantic patch that performs this transformation
is as follows:
@@expression x;@@
(
- __constant_cpu_to_be16(x)
+ cpu_to_be16(x)
|
- __constant_cpu_to_be32(x)
+ cpu_to_be32(x)
|
- __constant_cpu_to_be64(x)
+ cpu_to_be64(x)
)
Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
We recently added BUG_ON's which were inappropriate for a condition which
should never happen. Change these to be WARN_ON_ONCE as a debugging aid.
Fixes: 4cd7c9479aff ('IB/mad: Add support for additional MAD info to/from drivers')
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
The define OPA_LID_PERMISSIVE is big endian and was compared to the
cpu endian variable opa_drslid.
Problem caught by 0-day build infrastructure.
Fixes: 8e4349d13f33 (IB/mad: Add final OPA MAD processing)
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: John, Jubin <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Persuant to Liran's comments on node_type on linux-rdma
mailing list:
In an effort to reform the RDMA core and ULPs to minimize use of
node_type in struct ib_device, an additional bit is added to
struct ib_device for is_switch (IB switch). This is needed
to be initialized by any IB switch device driver. This is a
NEW requirement on such device drivers which are all
"out of tree".
In addition, an ib_switch helper was added to ib_verbs.h
based on the is_switch device bit rather than node_type
(although those should be consistent).
The RDMA core (MAD, SMI, agent, sa_query, multicast, sysfs)
as well as (IPoIB and SRP) ULPs are updated where
appropriate to use this new helper. In some cases,
the helper is now used under the covers of using
rdma_[start end]_port rather than the open coding
previously used.
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|