Age | Commit message (Collapse) | Author | Files | Lines |
|
We only allow non critical writeback if the origin is idle. It is up
to the policy to decide what writeback work is critical.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
A little class that keeps track of the volume of io that is in flight,
and the length of time that a device has been idle for.
FIXME: rather than jiffes, may be best to use ktime_t (to support faster
devices).
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
There is a race between a policy deciding to replace a cache entry,
the core target writing back any dirty data from this block, and other
IO threads doing IO to the same block.
This sort of problem is avoided most of the time by the core target
grabbing a bio prison cell before making the request to the policy.
But for a demotion the core target doesn't know which block will be
demoted, so can't do this in advance.
Fix this demotion race by introducing a callback to the policy interface
that allows the policy to grab the cell on behalf of the core target.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
|
|
A crypto driver can process requests synchronously or asynchronously
and can use an internal driver queue to backlog requests.
Add some comments to clarify internal logic and completion return codes.
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Currently if there is a leg failure, the bio will be put into the hold
list until userspace does a remove/replace on the leg. Doing so in a
cluster config (clvmd) is problematic because there may be a temporary
path failure that results in cluster raid1 remove/replace. Such
recovery takes a long time due to a full resync.
Update dm-raid1 to optionally ignore these failures so bios continue
being issued without interrupton. To enable this feature userspace
must pass "keep_log" when creating the dm-raid1 device.
Signed-off-by: Lidong Zhong <lzhong@suse.com>
Tested-by: Liuhua Wang <lwang@suse.com>
Acked-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
On 32-bit:
drivers/md/dm-log-writes.c: In function ‘log_super’:
drivers/md/dm-log-writes.c:323: warning: integer constant is too large for ‘long’ type
Add a ULL suffix to WRITE_LOG_MAGIC to fix this.
Also add a ULL suffix to WRITE_LOG_VERSION as it's stored in a __le64
field.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Add dm-raid access to the MD RAID0 personality to enable single zone
striping.
The following changes enable that access:
- add type definition to raid_types array
- make bitmap creation conditonal in super_validate(), because
bitmaps are not allowed in raid0
- set rdev->sectors to the data image size in super_validate()
to allow the raid0 personality to calculate the MD array
size properly
- use mdddev(un)lock() functions instead of direct mutex_(un)lock()
(wrapped in here because it's a trivial change)
- enhance raid_status() to always report full sync for raid0
so that userspace checks for 100% sync will succeed and allow
for resize (and takeover/reshape once added in future paches)
- enhance raid_resume() to not load bitmap in case of raid0
- add merge function to avoid data corruption (seen with readahead)
that resulted from bio payloads that grew too large. This problem
did not occur with the other raid levels because it either did not
apply without striping (raid1) or was avoided via stripe caching.
- raise version to 1.7.0 because of the raid0 API change
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
- ensure maximum device limit in superblock
- rename DMPF_* (print flags) to CTR_FLAG_* (constructor flags)
and their respective struct raid_set member
- use strcasecmp() in raid10_format_to_md_layout() as in the constructor
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Remove comment above parse_raid_params() that claims
"devices_handle_discard_safely" is a table line argument when it is
actually is a module parameter.
Also, backfill dm-raid target version 1.6.0 documentation.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Leverage the block manager's read_only flag instead of duplicating it;
access with new dm_bm_is_read_only() method.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
The overwrite has only ever about optimizing away the need to zero a
block if the entire block was being overwritten. As such it is only
relevant when zeroing is enabled.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
|
|
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Introduce a single common method for cleaning up a DM device's
mapped_device. No functional change, just eliminates duplication of
delicate mapped_device cleanup code.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
More often than not a request that is requeued _is_ mapped (meaning the
clone request is allocated and clone->q is initialized). Rename
dm_requeue_unmapped_original_request() to avoid potential confusion due
to function name containing "unmapped".
Also, remove dm_requeue_unmapped_request() since callers can easily call
the dm_requeue_original_request() directly.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Do not allocate the io_pool mempool for blk-mq request-based DM
(DM_TYPE_MQ_REQUEST_BASED) in dm_alloc_rq_mempools().
Also refine __bind_mempools() to have more precise awareness of which
mempools each type of DM device uses -- avoids mempool churn when
reloading DM tables (particularly for DM_TYPE_REQUEST_BASED).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
dm_merge_bvec() was originally added in f6fccb ("dm: introduce
merge_bvec_fn"). In that commit a value in sectors is converted to
bytes using << 9, and then assigned to an int. This code made
assumptions about the value of BIO_MAX_SECTORS.
A later commit 148e51 ("dm: improve documentation and code clarity in
dm_merge_bvec") was meant to have no functional change but it removed
the use of BIO_MAX_SECTORS in favor of using queue_max_sectors(). At
this point the cast from sector_t to int resulted in a zero value. The
fallout being dm_merge_bvec() would only allow a single page to be added
to a bio.
This interim fix is minimal for the benefit of stable@ because the more
comprehensive cleanup of passing a sector_t to all DM targets' merge
function will impact quite a few DM targets.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.19+
|
|
dm-multipath accepts 0 path mapping.
# echo '0 2097152 multipath 0 0 0 0' | dmsetup create newdev
Such a mapping can be used to release underlying devices while still
holding requests in its queue until working paths come back.
However, once the multipath device is created over blk-mq devices,
it rejects reloading of 0 path mapping:
# echo '0 2097152 multipath 0 0 1 1 queue-length 0 1 1 /dev/sda 1' \
| dmsetup create mpath1
# echo '0 2097152 multipath 0 0 0 0' | dmsetup load mpath1
device-mapper: reload ioctl on mpath1 failed: Invalid argument
Command failed
With following kernel message:
device-mapper: ioctl: can't change device type after initial table load.
DM tries to inherit the current table type using dm_table_set_type()
but it doesn't work as expected because of unnecessary check about
whether the target type is hybrid or not.
Hybrid type is for targets that work as either request-based or bio-based
and not required for blk-mq or non blk-mq checking.
Fixes: 65803c205983 ("dm table: train hybrid target type detection to select blk-mq if appropriate")
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
When stacking request-based dm device on non blk-mq device and
device-mapper target could not map the request (error target is used,
multipath target with all paths down, etc), the WARN_ON_ONCE() in
free_rq_clone() will trigger when it shouldn't.
The warning was added by commit aa6df8d ("dm: fix free_rq_clone() NULL
pointer when requeueing unmapped request"). But free_rq_clone() with
clone->q == NULL is valid usage for the case where
dm_kill_unmapped_request() initiates request cleanup.
Fix this false warning by just removing the WARN_ON -- it only generated
false positives and was never useful in catching the intended case
(completing clone request not being mapped e.g. clone->q being NULL).
Fixes: aa6df8d ("dm: fix free_rq_clone() NULL pointer when requeueing unmapped request")
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Use BLK_MQ_RQ_QUEUE_BUSY to requeue a blk-mq request directly from the
DM blk-mq device's .queue_rq. This cleans up the previous convoluted
handling of request requeueing that would return BLK_MQ_RQ_QUEUE_OK
(even though it wasn't) and then run blk_mq_requeue_request() followed
by blk_mq_kick_requeue_list().
Also, document that DM blk-mq ontop of old request_fn devices cannot
fail in clone_rq() since the clone request is preallocated as part of
the pdu.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Otherwise kmemleak reported:
unreferenced object 0xffff88009b14e2b0 (size 16):
comm "fio", pid 4274, jiffies 4294978034 (age 1253.210s)
hex dump (first 16 bytes):
40 12 f3 99 01 88 ff ff 00 10 00 00 00 00 00 00 @...............
backtrace:
[<ffffffff81600029>] kmemleak_alloc+0x49/0xb0
[<ffffffff811679a8>] kmem_cache_alloc+0xf8/0x160
[<ffffffff8111c950>] mempool_alloc_slab+0x10/0x20
[<ffffffff8111cb37>] mempool_alloc+0x57/0x150
[<ffffffffa04d2b61>] __multipath_map.isra.17+0xe1/0x220 [dm_multipath]
[<ffffffffa04d2cb5>] multipath_clone_and_map+0x15/0x20 [dm_multipath]
[<ffffffffa02889b5>] map_request.isra.39+0xd5/0x220 [dm_mod]
[<ffffffffa028b0e4>] dm_mq_queue_rq+0x134/0x240 [dm_mod]
[<ffffffff812cccb5>] __blk_mq_run_hw_queue+0x1d5/0x380
[<ffffffff812ccaa5>] blk_mq_run_hw_queue+0xc5/0x100
[<ffffffff812ce350>] blk_sq_make_request+0x240/0x300
[<ffffffff812c0f30>] generic_make_request+0xc0/0x110
[<ffffffff812c0ff2>] submit_bio+0x72/0x150
[<ffffffff811c07cb>] do_blockdev_direct_IO+0x1f3b/0x2da0
[<ffffffff811c166e>] __blockdev_direct_IO+0x3e/0x40
[<ffffffff8120aa1a>] ext4_direct_IO+0x1aa/0x390
Fixes: e5863d9ad ("dm: allocate requests in target when stacking on blk-mq devices")
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 4.0+
|
|
When stacking request-based DM on blk_mq device, request cloning and
remapping are done in a single call to target's clone_and_map_rq().
The clone is allocated and valid only if clone_and_map_rq() returns
DM_MAPIO_REMAPPED.
The "IS_ERR(clone)" check in map_request() does not cover all the
!DM_MAPIO_REMAPPED cases that are possible (E.g. if underlying devices
are not ready or unavailable, clone_and_map_rq() may return
DM_MAPIO_REQUEUE without ever having established an ERR_PTR). Fix this
by explicitly checking for a return that is not DM_MAPIO_REMAPPED in
map_request().
Without this fix, DM core may call setup_clone() for a NULL clone
and oops like this:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000068
IP: [<ffffffff81227525>] blk_rq_prep_clone+0x7d/0x137
...
CPU: 2 PID: 5793 Comm: kdmwork-253:3 Not tainted 4.0.0-nm #1
...
Call Trace:
[<ffffffffa01d1c09>] map_tio_request+0xa9/0x258 [dm_mod]
[<ffffffff81071de9>] kthread_worker_fn+0xfd/0x150
[<ffffffff81071cec>] ? kthread_parkme+0x24/0x24
[<ffffffff81071cec>] ? kthread_parkme+0x24/0x24
[<ffffffff81071fdd>] kthread+0xe6/0xee
[<ffffffff81093a59>] ? put_lock_stats+0xe/0x20
[<ffffffff81071ef7>] ? __init_kthread_worker+0x5b/0x5b
[<ffffffff814c2d98>] ret_from_fork+0x58/0x90
[<ffffffff81071ef7>] ? __init_kthread_worker+0x5b/0x5b
Fixes: e5863d9ad ("dm: allocate requests in target when stacking on blk-mq devices")
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 4.0+
|
|
Remove unneeded variable used to store return value.
Generated by: scripts/coccinelle/misc/returnvar.cocci
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Without kicking queue, requeued request may stay forever in
the queue if there are no other I/O activities to the device.
The original error had been in v2.6.39 with commit 7eaceaccab5f
("block: remove per-queue plugging"), which replaced conditional
plugging by periodic runqueue.
Commit 9d1deb83d489 in v4.1-rc1 removed the periodic runqueue
and the problem started to manifest.
Fixes: 9d1deb83d489 ("dm: don't schedule delayed run of the queue if nothing to do")
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
|
|
Following lockdep splat was reported :
[ 29.382286] ===============================
[ 29.382315] [ INFO: suspicious RCU usage. ]
[ 29.382344] 4.1.0-0.rc0.git11.1.fc23.x86_64 #1 Not tainted
[ 29.382380] -------------------------------
[ 29.382409] net/bridge/br_private.h:626 suspicious
rcu_dereference_check() usage!
[ 29.382455]
other info that might help us debug this:
[ 29.382507]
rcu_scheduler_active = 1, debug_locks = 0
[ 29.382549] 2 locks held by swapper/0/0:
[ 29.382576] #0: (((&p->forward_delay_timer))){+.-...}, at:
[<ffffffff81139f75>] call_timer_fn+0x5/0x4f0
[ 29.382660] #1: (&(&br->lock)->rlock){+.-...}, at:
[<ffffffffa0450dc1>] br_forward_delay_timer_expired+0x31/0x140
[bridge]
[ 29.382754]
stack backtrace:
[ 29.382787] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
4.1.0-0.rc0.git11.1.fc23.x86_64 #1
[ 29.382838] Hardware name: LENOVO 422916G/LENOVO, BIOS A1KT53AUS 04/07/2015
[ 29.382882] 0000000000000000 3ebfc20364115825 ffff880666603c48
ffffffff81892d4b
[ 29.382943] 0000000000000000 ffffffff81e124e0 ffff880666603c78
ffffffff8110bcd7
[ 29.383004] ffff8800785c9d00 ffff88065485ac58 ffff880c62002800
ffff880c5fc88ac0
[ 29.383065] Call Trace:
[ 29.383084] <IRQ> [<ffffffff81892d4b>] dump_stack+0x4c/0x65
[ 29.383130] [<ffffffff8110bcd7>] lockdep_rcu_suspicious+0xe7/0x120
[ 29.383178] [<ffffffffa04520f9>] br_fill_ifinfo+0x4a9/0x6a0 [bridge]
[ 29.383225] [<ffffffffa045266b>] br_ifinfo_notify+0x11b/0x4b0 [bridge]
[ 29.383271] [<ffffffffa0450d90>] ? br_hold_timer_expired+0x70/0x70 [bridge]
[ 29.383320] [<ffffffffa0450de8>]
br_forward_delay_timer_expired+0x58/0x140 [bridge]
[ 29.383371] [<ffffffffa0450d90>] ? br_hold_timer_expired+0x70/0x70 [bridge]
[ 29.383416] [<ffffffff8113a033>] call_timer_fn+0xc3/0x4f0
[ 29.383454] [<ffffffff81139f75>] ? call_timer_fn+0x5/0x4f0
[ 29.383493] [<ffffffff8110a90f>] ? lock_release_holdtime.part.29+0xf/0x200
[ 29.383541] [<ffffffffa0450d90>] ? br_hold_timer_expired+0x70/0x70 [bridge]
[ 29.383587] [<ffffffff8113a6a4>] run_timer_softirq+0x244/0x490
[ 29.383629] [<ffffffff810b68cc>] __do_softirq+0xec/0x670
[ 29.383666] [<ffffffff810b70d5>] irq_exit+0x145/0x150
[ 29.383703] [<ffffffff8189f506>] smp_apic_timer_interrupt+0x46/0x60
[ 29.383744] [<ffffffff8189d523>] apic_timer_interrupt+0x73/0x80
[ 29.383782] <EOI> [<ffffffff816f131f>] ? cpuidle_enter_state+0x5f/0x2f0
[ 29.383832] [<ffffffff816f131b>] ? cpuidle_enter_state+0x5b/0x2f0
Problem here is that br_forward_delay_timer_expired() is a timer
handler, calling br_ifinfo_notify() which assumes either rcu_read_lock()
or RTNL are held.
Simplest fix seems to add rcu read lock section.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
Reported-by: Dominick Grift <dac.override@gmail.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When trying to configure the settings for PHY1, using commands
like 'ethtool -s eth0 phyad 1 speed 100', the 'ethtool' seems to
modify other settings apart from the speed of the PHY1, in the
above case.
The ethtool seems to query the settings for PHY0, and use this
as the base to apply the new settings to the PHY1. This is
causing the other settings of the PHY 1 to be wrongly
configured.
The issue is caused by the '_ethtool_get_settings()' API, which
gets called because of the 'ETHTOOL_GSET' command, is clearing
the 'cmd' pointer (of type 'struct ethtool_cmd') by calling
memset. This clears all the parameters (if any) passed for the
'ETHTOOL_GSET' cmd. So the driver's callback is always invoked
with 'cmd->phy_address' as '0'.
The '_ethtool_get_settings()' is called from other files in the
'net/core'. So the fix is applied to the 'ethtool_get_settings()'
which is only called in the context of the 'ethtool'.
Signed-off-by: Arun Parameswaran <aparames@broadcom.com>
Reviewed-by: Ray Jui <rjui@broadcom.com>
Reviewed-by: Scott Branden <sbranden@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When more than a multicast address is present in a MLDv2 report, all but
the first address is ignored, because the code breaks out of the loop if
there has not been an error adding that address.
This has caused failures when two guests connected through the bridge
tried to communicate using IPv6. Neighbor discoveries would not be
transmitted to the other guest when both used a link-local address and a
static address.
This only happens when there is a MLDv2 querier in the network.
The fix will only break out of the loop when there is a failure adding a
multicast address.
The mdb before the patch:
dev ovirtmgmt port vnet0 grp ff02::1:ff7d:6603 temp
dev ovirtmgmt port vnet1 grp ff02::1:ff7d:6604 temp
dev ovirtmgmt port bond0.86 grp ff02::2 temp
After the patch:
dev ovirtmgmt port vnet0 grp ff02::1:ff7d:6603 temp
dev ovirtmgmt port vnet1 grp ff02::1:ff7d:6604 temp
dev ovirtmgmt port bond0.86 grp ff02::fb temp
dev ovirtmgmt port bond0.86 grp ff02::2 temp
dev ovirtmgmt port bond0.86 grp ff02::d temp
dev ovirtmgmt port vnet0 grp ff02::1:ff00:76 temp
dev ovirtmgmt port bond0.86 grp ff02::16 temp
dev ovirtmgmt port vnet1 grp ff02::1:ff00:77 temp
dev ovirtmgmt port bond0.86 grp ff02::1:ff00:def temp
dev ovirtmgmt port bond0.86 grp ff02::1:ffa1:40bf temp
Fixes: 08b202b67264 ("bridge br_multicast: IPv6 MLD support.")
Reported-by: Rik Theys <Rik.Theys@esat.kuleuven.be>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Tested-by: Rik Theys <Rik.Theys@esat.kuleuven.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use the new zynq binding for macb ethernet, since it will disable half
duplex gigabit like the Zynq TRM says to do.
Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
According to the Zynq TRM, gigabit half duplex is not supported. Add a
new cap and compatible string so Zynq can avoid advertising that mode.
Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When replacing an IPv4 route, tb_id member of the new fib_alias
structure is not set in the replace code path so that the new route is
ignored.
Fixes: 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The tx_curr_frame_payload field is u32. When we try to calculate a
small negative delta based on it, we end up with a positive integer
close to 2^32 instead. So the tx_bytes pointer increases by about
2^32 for every transmitted frame.
Fix by calculating the delta as a signed long.
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Reported-by: Florian Bruhin <me@the-compiler.org>
Fixes: 7a1e890e2168 ("usbnet: Fix tx_bytes statistic running backward in cdc_ncm")
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ip_error does not check if in_dev is NULL before dereferencing it.
IThe following sequence of calls is possible:
CPU A CPU B
ip_rcv_finish
ip_route_input_noref()
ip_route_input_slow()
inetdev_destroy()
dst_input()
With the result that a network device can be destroyed while processing
an input packet.
A crash was triggered with only unicast packets in flight, and
forwarding enabled on the only network device. The error condition
was created by the removal of the network device.
As such it is likely the that error code was -EHOSTUNREACH, and the
action taken by ip_error (if in_dev had been accessible) would have
been to not increment any counters and to have tried and likely failed
to send an icmp error as the network device is going away.
Therefore handle this weird case by just dropping the packet if
!in_dev. It will result in dropping the packet sooner, and will not
result in an actual change of behavior.
Fixes: 251da4130115b ("ipv4: Cache ip_error() routes even when not forwarding.")
Reported-by: Vittorio Gambaletta <linuxbugs@vittgam.net>
Tested-by: Vittorio Gambaletta <linuxbugs@vittgam.net>
Signed-off-by: Vittorio Gambaletta <linuxbugs@vittgam.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Taking socket spinlock in tcp_get_info() can deadlock, as
inet_diag_dump_icsk() holds the &hashinfo->ehash_locks[i],
while packet processing can use the reverse locking order.
We could avoid this locking for TCP_LISTEN states, but lockdep would
certainly get confused as all TCP sockets share same lockdep classes.
[ 523.722504] ======================================================
[ 523.728706] [ INFO: possible circular locking dependency detected ]
[ 523.734990] 4.1.0-dbg-DEV #1676 Not tainted
[ 523.739202] -------------------------------------------------------
[ 523.745474] ss/18032 is trying to acquire lock:
[ 523.750002] (slock-AF_INET){+.-...}, at: [<ffffffff81669d44>] tcp_get_info+0x2c4/0x360
[ 523.758129]
[ 523.758129] but task is already holding lock:
[ 523.763968] (&(&hashinfo->ehash_locks[i])->rlock){+.-...}, at: [<ffffffff816bcb75>] inet_diag_dump_icsk+0x1d5/0x6c0
[ 523.774661]
[ 523.774661] which lock already depends on the new lock.
[ 523.774661]
[ 523.782850]
[ 523.782850] the existing dependency chain (in reverse order) is:
[ 523.790326]
-> #1 (&(&hashinfo->ehash_locks[i])->rlock){+.-...}:
[ 523.796599] [<ffffffff811126bb>] lock_acquire+0xbb/0x270
[ 523.802565] [<ffffffff816f5868>] _raw_spin_lock+0x38/0x50
[ 523.808628] [<ffffffff81665af8>] __inet_hash_nolisten+0x78/0x110
[ 523.815273] [<ffffffff816819db>] tcp_v4_syn_recv_sock+0x24b/0x350
[ 523.822067] [<ffffffff81684d41>] tcp_check_req+0x3c1/0x500
[ 523.828199] [<ffffffff81682d09>] tcp_v4_do_rcv+0x239/0x3d0
[ 523.834331] [<ffffffff816842fe>] tcp_v4_rcv+0xa8e/0xc10
[ 523.840202] [<ffffffff81658fa3>] ip_local_deliver_finish+0x133/0x3e0
[ 523.847214] [<ffffffff81659a9a>] ip_local_deliver+0xaa/0xc0
[ 523.853440] [<ffffffff816593b8>] ip_rcv_finish+0x168/0x5c0
[ 523.859624] [<ffffffff81659db7>] ip_rcv+0x307/0x420
Lets use u64_sync infrastructure instead. As a bonus, 64bit
arches get optimized, as these are nop for them.
Fixes: 0df48c26d841 ("tcp: add tcpi_bytes_acked to tcp_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently dm-multipath has to clone the bios for every request sent
to the lower devices, which wastes cpu cycles and ties down memory.
This patch instead adds a new REQ_CLONE flag that instructs req_bio_endio
to not complete bios attached to a request, which we set on clone
requests similar to bios in a flush sequence. With this change I/O
errors on a path failure only get propagated to dm-multipath, which
can then either resubmit the I/O or complete the bios on the original
request.
I've done some basic testing of this on a Linux target with ALUA support,
and it survives path failures during I/O nicely.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Commit c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for
non-chains") regressed all existing callers that followed this pattern:
1) saving a bio's original bi_end_io
2) wiring up an intermediate bi_end_io
3) restoring the original bi_end_io from intermediate bi_end_io
4) calling bio_endio() to execute the restored original bi_end_io
The regression was due to BIO_CHAIN only ever getting set if
bio_inc_remaining() is called. For the above pattern it isn't set until
step 3 above (step 2 would've needed to establish BIO_CHAIN). As such
the first bio_endio(), in step 2 above, never decremented __bi_remaining
before calling the intermediate bi_end_io -- leaving __bi_remaining with
the value 1 instead of 0. When bio_inc_remaining() occurred during step
3 it brought it to a value of 2. When the second bio_endio() was
called, in step 4 above, it should've called the original bi_end_io but
it didn't because there was an extra reference that wasn't dropped (due
to atomic operations being optimized away since BIO_CHAIN wasn't set
upfront).
Fix this issue by removing the __bi_remaining management complexity for
all callers that use the above pattern -- bio_chain() is the only
interface that _needs_ to be concerned with __bi_remaining. For the
above pattern callers just expect the bi_end_io they set to get called!
Remove bio_endio_nodec() and also remove all bio_inc_remaining() calls
that aren't associated with the bio_chain() interface.
Also, the bio_inc_remaining() interface has been moved local to bio.c.
Fixes: c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for non-chains")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Multitheaded tests showed that the icv buffer in the current ghash
implementation is not handled correctly. A move of this working ghash
buffer value to the descriptor context fixed this. Code is tested and
verified with an multithreaded application via af_alg interface.
Cc: stable@vger.kernel.org
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Gerald Schaefer <geraldsc@linux.vnet.ibm.com>
Reported-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Wait a little bit longer, 50mS instead of 20mS, until the driver starts
polling for pen-up. The problematic behavior before this patch is applied
is as follows. The behavior was observed on the STMPE610QTR controller.
Upon a physical pen-down event, the touchscreen reports one set of x-y-p
coordinates and a pen-down event. After that, the pen-up polling is
triggered and since the controller is not ready yet, the polling mistakenly
detects a pen-up event while the physical state is still such that the pen
is down on the touch surface.
The pen-up handling flushes the controller FIFO, so after that, all the
samples in the controller are discarded. The controller becomes ready
shortly after this bogus pen-up handling and does generate again a pen-down
interrupt. This time, the controller contains x-y-p samples which all read
as zero. Since pressure value is zero, this set of samples is effectively
ignored by userland.
In the end, the driver just bounces between pen-down and bogus pen-up
handling, generating no useful results. Fix this by giving the controller a
bit more time before polling it for pen-up.
Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
Use msecs_to_jiffies(20) instead of plain (HZ / 50), as the former is much
more explicit about it's behavior. We want to schedule the task 20 mS from
now, so make it explicit in the code.
Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
Joydev is currently thinking some absolute mice are joystick, and that
messes up games in VMware guests, as the cursor typically gets stuck in
the top left corner.
Try to detect the event signature of a VMmouse input device and back off
for such devices. We're still incorrectly detecting, for example, the
VMware absolute USB mouse as a joystick, but adding an event signature
matching also that device would be considerably more risky, so defer that
to a later merge window.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
Vijay reported that a loop as simple as ...
while true; do
tc qdisc add dev foo root handle 1: prio
tc filter add dev foo parent 1: u32 match u32 0 0 flowid 1
tc qdisc del dev foo root
rmmod cls_u32
done
... will panic the kernel. Moreover, he bisected the change
apparently introducing it to 78fd1d0ab072 ("netlink: Re-add
locking to netlink_lookup() and seq walker").
The removal of synchronize_net() from the netlink socket
triggering the qdisc to be removed, seems to have uncovered
an RCU resp. module reference count race from the tc API.
Given that RCU conversion was done after e341694e3eb5 ("netlink:
Convert netlink_lookup() to use RCU protected hash table")
which added the synchronize_net() originally, occasion of
hitting the bug was less likely (not impossible though):
When qdiscs that i) support attaching classifiers and,
ii) have at least one of them attached, get deleted, they
invoke tcf_destroy_chain(), and thus call into ->destroy()
handler from a classifier module.
After RCU conversion, all classifier that have an internal
prio list, unlink them and initiate freeing via call_rcu()
deferral.
Meanhile, tcf_destroy() releases already reference to the
tp->ops->owner module before the queued RCU callback handler
has been invoked.
Subsequent rmmod on the classifier module is then not prevented
since all module references are already dropped.
By the time, the kernel invokes the RCU callback handler from
the module, that function address is then invalid.
One way to fix it would be to add an rcu_barrier() to
unregister_tcf_proto_ops() to wait for all pending call_rcu()s
to complete.
synchronize_rcu() is not appropriate as under heavy RCU
callback load, registered call_rcu()s could be deferred
longer than a grace period. In case we don't have any pending
call_rcu()s, the barrier is allowed to return immediately.
Since we came here via unregister_tcf_proto_ops(), there
are no users of a given classifier anymore. Further nested
call_rcu()s pointing into the module space are not being
done anywhere.
Only cls_bpf_delete_prog() may schedule a work item, to
unlock pages eventually, but that is not in the range/context
of cls_bpf anymore.
Fixes: 25d8c0d55f24 ("net: rcu-ify tcf_proto")
Fixes: 9888faefe132 ("net: sched: cls_basic use RCU")
Reported-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
That atom table does not check these bits. Fixes aux
regressions on some boards.
Reported-by: Malte Schröder <malte@tnxip.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
Retry the dpcd fetch several times. Some eDP panels
fail several times before the fetch is successful.
bug:
https://bugs.freedesktop.org/show_bug.cgi?id=73530
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
The index of ->planes[] array (3rd parameter) cannot be equal to MAX_PLANE.
This looks like a typo that is now fixed.
Signed-off-by: Stephane Viau <sviau@codeaurora.org>
Acked-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Evaluating "&mddev->disks" is simple pointer arithmetic, so
it does not need 'rcu' annotations - no dereferencing is happening.
Also enhance the comment to explain that 'rdev' in that case
is not actually a pointer to an rdev.
Reported-by: Patrick Marlier <patrick.marlier@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
The variable "sector" in "raid0_make_request()" was improperly updated
by a call to "sector_div()" which modifies its first argument in place.
Commit 47d68979cc968535cb87f3e5f2e6a3533ea48fbd restored this variable
after the call for later re-use. Unfortunetly the restore was done after
the referenced variable "bio" was advanced. This lead to the original
value and the restored value being different. Here we move this line to
the proper place.
One observed side effect of this bug was discarding a file though
unlinking would cause an unrelated file's contents to be discarded.
Signed-off-by: NeilBrown <neilb@suse.de>
Fixes: 47d68979cc96 ("md/raid0: fix bug with chunksize not a power of 2.")
Cc: stable@vger.kernel.org (any that received above backport)
URL: https://bugzilla.kernel.org/show_bug.cgi?id=98501
|
|
ops_run_reconstruct6() doesn't correctly chain asyn operations. The tx returned
by async_gen_syndrome should be added as the dependent tx of next stripe.
The issue is introduced by commit 59fc630b8b5f9f21c8ce3ba153341c107dce1b0c
RAID5: batch adjacent full stripe write
Reported-and-tested-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
The vmmouse Kconfig help text was referring to an incorrect user-space
driver version. Fix this.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|