Age | Commit message (Collapse) | Author | Files | Lines |
|
Now that we've established (again!) that empty xattr leaf buffers are
ok, we no longer need to bhold them to transactions when we're creating
new leaf blocks. Get rid of the entire mechanism, which should simplify
the xattr code quite a bit.
The original justification for using bhold here was to prevent the AIL
from trying to write the empty leaf block into the fs during the brief
time that we release the buffer lock. The reason for /that/ was to
prevent recovery from tripping over the empty ondisk block.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
TLDR: Revert commit 51e6104fdb95 ("xfs: detect empty attr leaf blocks in
xfs_attr3_leaf_verify") because it was wrong.
Every now and then we get a corruption report from the kernel or
xfs_repair about empty leaf blocks in the extended attribute structure.
We've long thought that these shouldn't be possible, but prior to 5.18
one would shake loose in the recoveryloop fstests about once a month.
A new addition to the xattr leaf block verifier in 5.19-rc1 makes this
happen every 7 minutes on my testing cloud. I added a ton of logging to
detect any time we set the header count on an xattr leaf block to zero.
This produced the following dmesg output on generic/388:
XFS (sda4): ino 0x21fcbaf leaf 0x129bf78 hdcount==0!
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x44
xfs_attr3_leaf_create+0x187/0x230
xfs_attr_shortform_to_leaf+0xd1/0x2f0
xfs_attr_set_iter+0x73e/0xa90
xfs_xattri_finish_update+0x45/0x80
xfs_attr_finish_item+0x1b/0xd0
xfs_defer_finish_noroll+0x19c/0x770
__xfs_trans_commit+0x153/0x3e0
xfs_attr_set+0x36b/0x740
xfs_xattr_set+0x89/0xd0
__vfs_setxattr+0x67/0x80
__vfs_setxattr_noperm+0x6e/0x120
vfs_setxattr+0x97/0x180
setxattr+0x88/0xa0
path_setxattr+0xc3/0xe0
__x64_sys_setxattr+0x27/0x30
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
So now we know that someone is creating empty xattr leaf blocks as part
of converting a sf xattr structure into a leaf xattr structure. The
conversion routine logs any existing sf attributes in the same
transaction that creates the leaf block, so we know this is a setxattr
to a file that has no attributes at all.
Next, g/388 calls the shutdown ioctl and cycles the mount to trigger log
recovery. I also augmented buffer item recovery to call ->verify_struct
on any attr leaf blocks and complain if it finds a failure:
XFS (sda4): Unmounting Filesystem
XFS (sda4): Mounting V5 Filesystem
XFS (sda4): Starting recovery (logdev: internal)
XFS (sda4): xattr leaf daddr 0x129bf78 hdrcount == 0!
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x44
xfs_attr3_leaf_verify+0x3b8/0x420
xlog_recover_buf_commit_pass2+0x60a/0x6c0
xlog_recover_items_pass2+0x4e/0xc0
xlog_recover_commit_trans+0x33c/0x350
xlog_recovery_process_trans+0xa5/0xe0
xlog_recover_process_data+0x8d/0x140
xlog_do_recovery_pass+0x19b/0x720
xlog_do_log_recovery+0x62/0xc0
xlog_do_recover+0x33/0x1d0
xlog_recover+0xda/0x190
xfs_log_mount+0x14c/0x360
xfs_mountfs+0x517/0xa60
xfs_fs_fill_super+0x6bc/0x950
get_tree_bdev+0x175/0x280
vfs_get_tree+0x1a/0x80
path_mount+0x6f5/0xaa0
__x64_sys_mount+0x103/0x140
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fc61e241eae
And a moment later, the _delwri_submit of the recovered buffers trips
the same verifier and recovery fails:
XFS (sda4): Metadata corruption detected at xfs_attr3_leaf_verify+0x393/0x420 [xfs], xfs_attr3_leaf block 0x129bf78
XFS (sda4): Unmount and run xfs_repair
XFS (sda4): First 128 bytes of corrupted metadata buffer:
00000000: 00 00 00 00 00 00 00 00 3b ee 00 00 00 00 00 00 ........;.......
00000010: 00 00 00 00 01 29 bf 78 00 00 00 00 00 00 00 00 .....).x........
00000020: a5 1b d0 02 b2 9a 49 df 8e 9c fb 8d f8 31 3e 9d ......I......1>.
00000030: 00 00 00 00 02 1f cb af 00 00 00 00 10 00 00 00 ................
00000040: 00 50 0f b0 00 00 00 00 00 00 00 00 00 00 00 00 .P..............
00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
XFS (sda4): Corruption of in-memory data (0x8) detected at _xfs_buf_ioapply+0x37f/0x3b0 [xfs] (fs/xfs/xfs_buf.c:1518). Shutting down filesystem.
XFS (sda4): Please unmount the filesystem and rectify the problem(s)
XFS (sda4): log mount/recovery failed: error -117
XFS (sda4): log mount failed
I think I see what's going on here -- setxattr is racing with something
that shuts down the filesystem:
Thread 1 Thread 2
-------- --------
xfs_attr_sf_addname
xfs_attr_shortform_to_leaf
<create empty leaf>
xfs_trans_bhold(leaf)
xattri_dela_state = XFS_DAS_LEAF_ADD
<roll transaction>
<flush log>
<shut down filesystem>
xfs_trans_bhold_release(leaf)
<discover fs is dead, bail>
Thread 3
--------
<cycle mount, start recovery>
xlog_recover_buf_commit_pass2
xlog_recover_do_reg_buffer
<replay empty leaf buffer from recovered buf item>
xfs_buf_delwri_queue(leaf)
xfs_buf_delwri_submit
_xfs_buf_ioapply(leaf)
xfs_attr3_leaf_write_verify
<trip over empty leaf buffer>
<fail recovery>
As you can see, the bhold keeps the leaf buffer locked and thus prevents
the *AIL* from tripping over the ichdr.count==0 check in the write
verifier. Unfortunately, it doesn't prevent the log from getting
flushed to disk, which sets up log recovery to fail.
So. It's clear that the kernel has always had the ability to persist
attr leaf blocks with ichdr.count==0, which means that it's part of the
ondisk format now.
Unfortunately, this check has been added and removed multiple times
throughout history. It first appeared in[1] kernel 3.10 as part of the
early V5 format patches. The check was later discovered to break log
recovery and hence disabled[2] during log recovery in kernel 4.10.
Simultaneously, the check was added[3] to xfs_repair 4.9.0 to try to
weed out the empty leaf blocks. This was still not correct because log
recovery would recover an empty attr leaf block successfully only for
regular xattr operations to trip over the empty block during of the
block during regular operation. Therefore, the check was removed
entirely[4] in kernel 5.7 but removal of the xfs_repair check was
forgotten. The continued complaints from xfs_repair lead to us
mistakenly re-adding[5] the verifier check for kernel 5.19. Remove it
once again.
[1] 517c22207b04 ("xfs: add CRCs to attr leaf blocks")
[2] 2e1d23370e75 ("xfs: ignore leaf attr ichdr.count in verifier
during log replay")
[3] f7140161 ("xfs_repair: junk leaf attribute if count == 0")
[4] f28cef9e4dac ("xfs: don't fail verifier on empty attr3 leaf
block")
[5] 51e6104fdb95 ("xfs: detect empty attr leaf blocks in
xfs_attr3_leaf_verify")
Looking at the rest of the xattr code, it seems that files with empty
leaf blocks behave as expected -- listxattr reports no attributes;
getxattr on any xattr returns nothing as expected; removexattr does
nothing; and setxattr can add attributes just fine.
Original-bug: 517c22207b04 ("xfs: add CRCs to attr leaf blocks")
Still-not-fixed-by: 2e1d23370e75 ("xfs: ignore leaf attr ichdr.count in verifier during log replay")
Removed-in: f28cef9e4dac ("xfs: don't fail verifier on empty attr3 leaf block")
Fixes: 51e6104fdb95 ("xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
The end of this function could use some cleanup -- the EAGAIN
conditionals make it harder to figure out what's going on with the
disposal of xattri_leaf_bp, and the dual error/ret variables aren't
needed. Turn the EAGAIN case into a separate block documenting all the
subtleties of recovering in the middle of an xattr update chain, which
makes the rest of the prologue much simpler.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
While running the following fstest with logged xattrs DISabled, I
noticed the following:
# FSSTRESS_AVOID="-z -f unlink=1 -f rmdir=1 -f creat=2 -f mkdir=2 -f
getfattr=3 -f listfattr=3 -f attr_remove=4 -f removefattr=4 -f
setfattr=20 -f attr_set=60" ./check generic/475
INFO: task u9:1:40 blocked for more than 61 seconds.
Tainted: G O 5.19.0-rc2-djwx #rc2
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:u9:1 state:D stack:12872 pid: 40 ppid: 2 flags:0x00004000
Workqueue: xfs-cil/dm-0 xlog_cil_push_work [xfs]
Call Trace:
<TASK>
__schedule+0x2db/0x1110
schedule+0x58/0xc0
schedule_timeout+0x115/0x160
__down_common+0x126/0x210
down+0x54/0x70
xfs_buf_lock+0x2d/0xe0 [xfs 0532c1cb1d67dd81d15cb79ac6e415c8dec58f73]
xfs_buf_item_unpin+0x227/0x3a0 [xfs 0532c1cb1d67dd81d15cb79ac6e415c8dec58f73]
xfs_trans_committed_bulk+0x18e/0x320 [xfs 0532c1cb1d67dd81d15cb79ac6e415c8dec58f73]
xlog_cil_committed+0x2ea/0x360 [xfs 0532c1cb1d67dd81d15cb79ac6e415c8dec58f73]
xlog_cil_push_work+0x60f/0x690 [xfs 0532c1cb1d67dd81d15cb79ac6e415c8dec58f73]
process_one_work+0x1df/0x3c0
worker_thread+0x53/0x3b0
kthread+0xea/0x110
ret_from_fork+0x1f/0x30
</TASK>
This appears to be the result of shortform_to_leaf creating a new leaf
buffer as part of adding an xattr to a file. The new leaf buffer is
held and attached to the xfs_attr_intent structure, but then the
filesystem shuts down. Instead of the usual path (which adds the attr
to the held leaf buffer which releases the hold), we instead cancel the
entire deferred operation.
Unfortunately, xfs_attr_cancel_item doesn't release any attached leaf
buffers, so we leak the locked buffer. The CIL cannot do anything
about that, and hangs. Fix this by teaching it to release leaf buffers,
and make XFS a little more careful about not leaving a dangling
reference.
The prologue of xfs_attri_item_recover is (in this author's opinion) a
little hard to figure out, so I'll clean that up in the next patch.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
We should use invalidate_lock and XFS_MMAPLOCK_SHARED to check the state
of mmap_lock rw_semaphore in xfs_isilocked(), rather than i_rwsem and
XFS_IOLOCK_SHARED.
Fixes: 2433480a7e1d ("xfs: Convert to use invalidate_lock")
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
There are similar lock flags assert in xfs_ilock(), xfs_ilock_nowait(),
xfs_iunlock(), thus we can factor it out into a helper that is clear.
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
The current blocking mechanism for pushing the inodegc queue out to
disk can result in systems becoming unusable when there is a long
running inodegc operation. This is because the statfs()
implementation currently issues a blocking flush of the inodegc
queue and a significant number of common system utilities will call
statfs() to discover something about the underlying filesystem.
This can result in userspace operations getting stuck on inodegc
progress, and when trying to remove a heavily reflinked file on slow
storage with a full journal, this can result in delays measuring in
hours.
Avoid this problem by adding "push" function that expedites the
flushing of the inodegc queue, but doesn't wait for it to complete.
Convert xfs_fs_statfs() and xfs_qm_scall_getquota() to use this
mechanism so they don't block but still ensure that queued
operations are expedited.
Fixes: ab23a7768739 ("xfs: per-cpu deferred inode inactivation queues")
Reported-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
[djwong: fix _getquota_next to use _inodegc_push too]
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Currently inodegc work can sit queued on the per-cpu queue until
the workqueue is either flushed of the queue reaches a depth that
triggers work queuing (and later throttling). This means that we
could queue work that waits for a long time for some other event to
trigger flushing.
Hence instead of just queueing work at a specific depth, use a
delayed work that queues the work at a bound time. We can still
schedule the work immediately at a given depth, but we no long need
to worry about leaving a number of items on the list that won't get
processed until external events prevail.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
It is vitally important that we preserve the state of the NREXT64 inode
flag when we're changing the other flags2 fields.
Fixes: 9b7d16e34bbe ("xfs: Introduce XFS_DIFLAG2_NREXT64 and associated helpers")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
|
|
The variable @args is fed to a tracepoint, and that's the only place
it's used. This is fine for the kernel, but for userspace, tracepoints
are #define'd out of existence, which results in this warning on gcc
11.2:
xfs_attr.c: In function ‘xfs_attr_node_try_addname’:
xfs_attr.c:1440:42: warning: unused variable ‘args’ [-Wunused-variable]
1440 | struct xfs_da_args *args = attr->xattri_da_args;
| ^~~~
Clean this up.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
|
|
I found a race involving the larp control knob, aka the debugging knob
that lets developers enable logging of extended attribute updates:
Thread 1 Thread 2
echo 0 > /sys/fs/xfs/debug/larp
setxattr(REPLACE)
xfs_has_larp (returns false)
xfs_attr_set
echo 1 > /sys/fs/xfs/debug/larp
xfs_attr_defer_replace
xfs_attr_init_replace_state
xfs_has_larp (returns true)
xfs_attr_init_remove_state
<oops, wrong DAS state!>
This isn't a particularly severe problem right now because xattr logging
is only enabled when CONFIG_XFS_DEBUG=y, and developers *should* know
what they're doing.
However, the eventual intent is that callers should be able to ask for
the assistance of the log in persisting xattr updates. This capability
might not be required for /all/ callers, which means that dynamic
control must work correctly. Once an xattr update has decided whether
or not to use logged xattrs, it needs to stay in that mode until the end
of the operation regardless of what subsequent parallel operations might
do.
Therefore, it is an error to continue sampling xfs_globals.larp once
xfs_attr_change has made a decision about larp, and it was not correct
for me to have told Allison that ->create_intent functions can sample
the global log incompat feature bitfield to decide to elide a log item.
Instead, create a new op flag for the xfs_da_args structure, and convert
all other callers of xfs_has_larp and xfs_sb_version_haslogxattrs within
the attr update state machine to look for the operations flag.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
|
|
|
|
The Surface Go reports Chassis Type 9 (Laptop,) so the device needs to be
added to dmi_vgbs_allow_list to enable tablet mode when an attached Type
Cover is folded back.
BugLink: https://github.com/linux-surface/linux-surface/issues/837
Signed-off-by: Duke Lee <krnhotwings@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20220607213654.5567-1-krnhotwings@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
commit be9d73e64957 ("platform/x86: hp-wmi: Fix 0x05 error code reported by
several WMI calls") and commit 12b19f14a21a ("platform/x86: hp-wmi: Fix
hp_wmi_read_int() reporting error (0x05)") cause ACPI BIOS Error (bug):
Attempt to CreateField of length zero (20211217/dsopcode-133) because of
the ACPI method HWMC, which unconditionally creates a Field of
size (insize*8) bits:
CreateField (Arg1, 0x80, (Local5 * 0x08), DAIN)
In cases where args->insize = 0, the Field size is 0, resulting in
an error.
Fix this by using zero insize only if 0x5 error code is returned
Tested on Omen 15 AMD (2020) board ID: 8786.
Fixes: be9d73e64957 ("platform/x86: hp-wmi: Fix 0x05 error code reported by several WMI calls")
Signed-off-by: Bedant Patnaik <bedant.patnaik@gmail.com>
Tested-by: Jorge Lopez <jorge.lopez2@hp.com>
Link: https://lore.kernel.org/r/41be46743d21c78741232a47bbb5f1cdbcc3d21e.camel@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
WMI queries fail on some devices where the ACPI method HWMC
unconditionally attempts to create Fields beyond the buffer
if the buffer is too small, this breaks essential features
such as power profiles:
CreateByteField (Arg1, 0x10, D008)
CreateByteField (Arg1, 0x11, D009)
CreateByteField (Arg1, 0x12, D010)
CreateDWordField (Arg1, 0x10, D032)
CreateField (Arg1, 0x80, 0x0400, D128)
In cases where args->data had zero length, ACPI BIOS Error
(bug): AE_AML_BUFFER_LIMIT, Field [D008] at bit
offset/length 128/8 exceeds size of target Buffer (128 bits)
(20211217/dsopcode-198) was obtained.
ACPI BIOS Error (bug): AE_AML_BUFFER_LIMIT, Field [D009] at bit
offset/length 136/8 exceeds size of target Buffer (136bits)
(20211217/dsopcode-198)
The original code created a buffer size of 128 bytes regardless if
the WMI call required a smaller buffer or not. This particular
behavior occurs in older BIOS and reproduced in OMEN laptops. Newer
BIOS handles buffer sizes properly and meets the latest specification
requirements. This is the reason why testing with a dynamically
allocated buffer did not uncover any failures with the test systems at
hand.
This patch was tested on several OMEN, Elite, and Zbooks. It was
confirmed the patch resolves HPWMI_FAN GET/SET calls in an OMEN
Laptop 15-ek0xxx. No problems were reported when testing on several Elite
and Zbooks notebooks.
Fixes: 4b4967cbd268 ("platform/x86: hp-wmi: Changing bios_args.data to be dynamically allocated")
Signed-off-by: Jorge Lopez <jorge.lopez2@hp.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20220608212923.8585-2-jorge.lopez2@hp.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
The syntax without dots is available since commit 43756e347f21
("scripts/kernel-doc: Add support for named variable macro arguments").
The same HTML output is produced with and without this patch.
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Commit 6c77676645ad ("iov_iter: Fix iter_xarray_get_pages{,_alloc}()")
introduced a problem on some 32-bit architectures (at least arm, xtensa,
csky,sparc and mips), that have a 'size_t' that is 'unsigned int'.
The reason is that we now do
min(nr * PAGE_SIZE - offset, maxsize);
where 'nr' and 'offset' and both 'unsigned int', and PAGE_SIZE is
'unsigned long'. As a result, the normal C type rules means that the
first argument to 'min()' ends up being 'unsigned long'.
In contrast, 'maxsize' is of type 'size_t'.
Now, 'size_t' and 'unsigned long' are always the same physical type in
the kernel, so you'd think this doesn't matter, and from an actual
arithmetic standpoint it doesn't.
But on 32-bit architectures 'size_t' is commonly 'unsigned int', even if
it could also be 'unsigned long'. In that situation, both are unsigned
32-bit types, but they are not the *same* type.
And as a result 'min()' will complain about the distinct types (ignore
the "pointer types" part of the error message: that's an artifact of the
way we have made 'min()' check types for being the same):
lib/iov_iter.c: In function 'iter_xarray_get_pages':
include/linux/minmax.h:20:35: error: comparison of distinct pointer types lacks a cast [-Werror]
20 | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
| ^~
lib/iov_iter.c:1464:16: note: in expansion of macro 'min'
1464 | return min(nr * PAGE_SIZE - offset, maxsize);
| ^~~
This was not visible on 64-bit architectures (where we always define
'size_t' to be 'unsigned long').
Force these cases to use 'min_t(size_t, x, y)' to make the type explicit
and avoid the issue.
[ Nit-picky note: technically 'size_t' doesn't have to match 'unsigned
long' arithmetically. We've certainly historically seen environments
with 16-bit address spaces and 32-bit 'unsigned long'.
Similarly, even in 64-bit modern environments, 'size_t' could be its
own type distinct from 'unsigned long', even if it were arithmetically
identical.
So the above type commentary is only really descriptive of the kernel
environment, not some kind of universal truth for the kinds of wild
and crazy situations that are allowed by the C standard ]
Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Link: https://lore.kernel.org/all/YqRyL2sIqQNDfky2@debian/
Cc: Jeff Layton <jlayton@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
By forcing the maximum CPU that QEMU has available, we expose additional
capabilities, such as the RNDR instruction, which increases test
coverage. This then allows the CI to skip the fake seeding step in some
cases. Also enable STRICT_KERNEL_RWX to catch issues related to early
jump labels when the RNG is initialized at boot.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
MAGIC_START("IKCFG_ST") and MAGIC_END("IKCFG_ED") are moved out
from the kernel_config_data variable.
Thus, we parse kernel_config_data directly instead of considering
offset of MAGIC_START and MAGIC_END.
Fixes: 13610aa908dc ("kernel/configs: use .incbin directive to embed config_data.gz")
Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Call virtio_device_ready() to make this driver work after commit
b4ec69d7e09 ("virtio: harden vring IRQ"), since the driver uses the
virtqueues in the probe function. (The virtio core sets the device
ready when probe returns.)
Fixes: 8b4ec69d7e09 ("virtio: harden vring IRQ")
Fixes: 68f5d3f3b654 ("um: add PCI over virtio emulation driver")
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Message-Id: <20220610151203.3492541-1-vincent.whitchurch@axis.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Johannes Berg <johannes@sipsolutions.net>
|
|
Currently, the secondary channels of a multichannel session
also get hostname populated based on the info in primary channel.
However, this will end up with a wrong resolution of hostname to
IP address during reconnect.
This change fixes this by not populating hostname info for all
secondary channels.
Fixes: 5112d80c162f ("cifs: populate server_hostname for extra channels")
Cc: stable@vger.kernel.org
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Tested and works on my system.
Signed-off-by: August Wikerfors <git@augustwikerfors.se>
Link: https://lore.kernel.org/r/20220608212028.28307-1-git@augustwikerfors.se
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
Add dmi_system_id of Gigabyte Z690M AORUS ELITE AX DDR4 board.
Tested on my PC.
Signed-off-by: Piotr Chmura <chmooreck@gmail.com>
Link: https://lore.kernel.org/r/bd83567e-ebf5-0b31-074b-5f6dc7f7c147@gmail.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
As platform_driver_register() could fail, it should be better
to deal with the return value in order to maintain the code
consisitency.
Fixes: 86af1d02d458 ("platform/x86: Support for EC-connected GPIOs for identify LED/button on Barco P50 board")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Acked-by: Peter Korsgaard <peter.korsgaard@barco.com>
Link: https://lore.kernel.org/r/20220526090345.1444172-1-jiasheng@iscas.ac.cn
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
Add Raptorlake P to the list of the platforms that intel_pmc_core driver
supports for pmc_core device. Raptorlake P PCH is based on Alderlake P
PCH.
Signed-off-by: George D Sworo <george.d.sworo@intel.com>
Reviewed-by: David E. Box <david.e.box@linux.intel.com>
Link: https://lore.kernel.org/r/20220602012617.20100-1-george.d.sworo@intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
The probe function pmt_crashlog_probe() may incorrectly reference
the 'priv->entry array' as it uses 'i' to reference the array instead
of 'priv->num_entries' as it should. This is similar to the problem
that was addressed in pmt_telemetry_probe via commit 2cdfa0c20d58
("platform/x86/intel: Fix 'rmmod pmt_telemetry' panic").
Cc: "David E. Box" <david.e.box@linux.intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Mark Gross <markgross@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David Arcari <darcari@redhat.com>
Reviewed-by: David E. Box <david.e.box@linux.intel.com>
Link: https://lore.kernel.org/r/20220526203140.339120-1-darcari@redhat.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
Fix problem of missing static in struct declaration.
Fixes: 662f24826f954 ("platform/mellanox: Add support for new SN2201 system")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael Shych <michaelsh@nvidia.com>
Link: https://lore.kernel.org/r/20220602145103.11859-1-michaelsh@nvidia.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
The maths at the end of iter_xarray_get_pages() to calculate the actual
size doesn't work under some circumstances, such as when it's been asked to
extract a partial single page. Various terms of the equation cancel out
and you end up with actual == offset. The same issue exists in
iter_xarray_get_pages_alloc().
Fix these to just use min() to select the lesser amount from between the
amount of page content transcribed into the buffer, minus the offset, and
the size limit specified.
This doesn't appear to have caused a problem yet upstream because network
filesystems aren't getting the pages from an xarray iterator, but rather
passing it directly to the socket, which just iterates over it. Cachefiles
*does* do DIO from one to/from ext4/xfs/btrfs/etc. but it always asks for
whole pages to be written or read.
Fixes: 7ff5062079ef ("iov_iter: Add ITER_XARRAY")
Reported-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Mike Marshall <hubcap@omnibond.com>
cc: Gao Xiang <xiang@kernel.org>
cc: linux-afs@lists.infradead.org
cc: v9fs-developer@lists.sourceforge.net
cc: devel@lists.orangefs.org
cc: linux-erofs@lists.ozlabs.org
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
The netfs_io_request cleanup op is now always in a position to be given a
pointer to a netfs_io_request struct, so this can be passed in instead of
the mapping and private data arguments (both of which are included in the
struct).
So rename the ->cleanup op to ->free_request (to match ->init_request) and
pass in the I/O pointer.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
|
|
Change the signature of netfs helper functions to take a struct netfs_inode
pointer rather than a struct inode pointer where appropriate, thereby
relieving the need for the network filesystem to convert its internal inode
format down to the VFS inode only for netfslib to bounce it back up. For
type safety, it's better not to do that (and it's less typing too).
Give netfs_write_begin() an extra argument to pass in a pointer to the
netfs_inode struct rather than deriving it internally from the file
pointer. Note that the ->write_begin() and ->write_end() ops are intended
to be replaced in the future by netfslib code that manages this without the
need to call in twice for each page.
netfs_readpage() and similar are intended to be pointed at directly by the
address_space_operations table, so must stick to the signature dictated by
the function pointers there.
Changes
=======
- Updated the kerneldoc comments and documentation [DH].
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/CAHk-=wgkwKyNmNdKpQkqZ6DnmUL-x9hp0YBnUGjaPFEAdxDTbw@mail.gmail.com/
|
|
Remove an unused global variable and make another static as reported by
make C=1.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
|
|
After the commit ca522482e3ea ("dm: pass NULL bdev to bio_alloc_clone"),
clone_endio() only calls dm_zone_endio() when DM targets remap the
clone bio's bdev to something other than the md->disk->part0 default.
However, if a DM target (e.g. dm-crypt) stacked ontop of a dm-zoned
does not remap the clone bio using bio_set_dev() then dm_zone_endio()
is not called at completion of the bios and zone locks are not
properly unlocked. This triggers a hang, in dm_zone_map_bio(), when
blktests block/004 is run for dm-crypt on zoned block devices. To
avoid the hang, simply remove the clone_endio() check that verifies
the target remapped the clone bio to a device other than the default.
Fixes: ca522482e3ea ("dm: pass NULL bdev to bio_alloc_clone")
Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
Fix a misspelling of the word "platform".
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Michael Shych <michaelsh@nvidia.com>
Link: https://lore.kernel.org/r/9c8edde31e271311b7832d7677fe84aba917da8d.1653376503.git.geert@linux-m68k.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
There's a rule in certs/Makefile for which the command begins with eight
spaces. This results in:
../certs/Makefile:21: FORCE prerequisite is missing
../certs/Makefile:21: *** missing separator. Stop.
Fix this by turning the spaces into a tab.
Fixes: addf466389d9 ("certs: Check that builtin blacklist hashes are valid")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Mickaël Salaün <mic@linux.microsoft.com>
cc: keyrings@vger.kernel.org
Link: https://lore.kernel.org/r/486b1b80-9932-aab6-138d-434c541c934a@digikod.net/ # v1
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
As Liviu pointed out, the arm,malidp-arqos-high-level property
mentioned in the original .txt binding was a mistake, and
arm,malidp-arqos-value needs to take its place.
The binding commit ce6eb0253cba ("dt/bindings: display: Add optional
property node define for Mali DP500") mentions the right name in the
commit message, but has the wrong name in the diff.
Commit d298e6a27a81 ("drm/arm/mali-dp: Add display QoS interface
configuration for Mali DP500") uses the property in the driver, but uses
the shorter name.
Remove the wrong property from the binding, and use the proper name in
the example. The actual property was already documented properly.
Fixes: 2c8b082a3ab1 ("dt-bindings: display: convert Arm Mali-DP to DT schema")
Link: https://lore.kernel.org/linux-arm-kernel/YnumGEilUblhBx8E@e110455-lin.cambridge.arm.com/
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reported-by: Liviu Dudau <liviu.dudau@arm.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220609162729.1441760-1-andre.przywara@arm.com
|
|
There's no reason to list the same value twice in an 'enum'. This was fixed
treewide in commit c3b006819426 ("dt-bindings: Fix 'enum' lists with
duplicate entries"), but this one got added in the merge window.
A meta-schema change will catch future cases.
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Link: https://lore.kernel.org/r/20220606212239.1360877-1-robh@kernel.org
|
|
This function is only called from assembly, no need for a prototype
declaration in a header file. In addition, add #ifdef around the
function since it is only used when CONFIG_KASAN_HW_TAGS.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: kernel test robot <lkp@intel.com>
|
|
The EFI save/restore code is confused. When saving the check for saving
FFR is inverted due to confusion with the streaming mode check, and when
restoring we check if we need to restore FFR by checking the percpu
efi_sm_state without the required wrapper rather than based on the
combination of FA64 support and streaming mode.
Fixes: e0838f6373e5 ("arm64/sme: Save and restore streaming mode over EFI runtime calls")
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220602124132.3528951-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Delete the redundant word 'in'.
Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com>
Link: https://lore.kernel.org/r/20220610070543.59338-1-wangxiang@cdjrlc.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
In the awk script, there was a typo with the comparison operator when
checking if the matched pattern is inside an Enum block.
This prevented the generation of the whole sysreg-defs.h header.
Fixes: 66847e0618d7 ("arm64: Add sysreg header generation scripting")
Signed-off-by: Alejandro Tafalla <atafalla@dnyon.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220609204220.12112-1-atafalla@dnyon.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Currently if the APB or Debounce clocks aren't yet ready to be requested
the DW GPIO driver will correctly handle that by deferring the probe
procedure, but the error is still printed to the system log. It needlessly
pollutes the log since there was no real error but a request to postpone
the clock request procedure since the clocks subsystem hasn't been fully
initialized yet. Let's fix that by using the dev_err_probe method to print
the APB/clock request error status. It will correctly handle the deferred
probe situation and print the error if it actually happens.
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
|
|
With arch randomness being used by every distro and enabled in
defconfigs, the distinction between rng_has_arch_random() and
rng_is_initialized() is now rather small. In fact, the places where they
differ are now places where paranoid users and system builders really
don't want arch randomness to be used, in which case we should respect
that choice, or places where arch randomness is known to be broken, in
which case that choice is all the more important. So this commit just
removes the function and its one user.
Reviewed-by: Petr Mladek <pmladek@suse.com> # for vsprintf.c
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This commit changes the default Kconfig values of RANDOM_TRUST_CPU and
RANDOM_TRUST_BOOTLOADER to be Y by default. It does not change any
existing configs or change any kernel behavior. The reason for this is
several fold.
As background, I recently had an email thread with the kernel
maintainers of Fedora/RHEL, Debian, Ubuntu, Gentoo, Arch, NixOS, Alpine,
SUSE, and Void as recipients. I noted that some distros trust RDRAND,
some trust EFI, and some trust both, and I asked why or why not. There
wasn't really much of a "debate" but rather an interesting discussion of
what the historical reasons have been for this, and it came up that some
distros just missed the introduction of the bootloader Kconfig knob,
while another didn't want to enable it until there was a boot time
switch to turn it off for more concerned users (which has since been
added). The result of the rather uneventful discussion is that every
major Linux distro enables these two options by default.
While I didn't have really too strong of an opinion going into this
thread -- and I mostly wanted to learn what the distros' thinking was
one way or another -- ultimately I think their choice was a decent
enough one for a default option (which can be disabled at boot time).
I'll try to summarize the pros and cons:
Pros:
- The RNG machinery gets initialized super quickly, and there's no
messing around with subsequent blocking behavior.
- The bootloader mechanism is used by kexec in order for the prior
kernel to initialize the RNG of the next kernel, which increases
the entropy available to early boot daemons of the next kernel.
- Previous objections related to backdoors centered around
Dual_EC_DRBG-like kleptographic systems, in which observing some
amount of the output stream enables an adversary holding the right key
to determine the entire output stream.
This used to be a partially justified concern, because RDRAND output
was mixed into the output stream in varying ways, some of which may
have lacked pre-image resistance (e.g. XOR or an LFSR).
But this is no longer the case. Now, all usage of RDRAND and
bootloader seeds go through a cryptographic hash function. This means
that the CPU would have to compute a hash pre-image, which is not
considered to be feasible (otherwise the hash function would be
terribly broken).
- More generally, if the CPU is backdoored, the RNG is probably not the
realistic vector of choice for an attacker.
- These CPU or bootloader seeds are far from being the only source of
entropy. Rather, there is generally a pretty huge amount of entropy,
not all of which is credited, especially on CPUs that support
instructions like RDRAND. In other words, assuming RDRAND outputs all
zeros, an attacker would *still* have to accurately model every single
other entropy source also in use.
- The RNG now reseeds itself quite rapidly during boot, starting at 2
seconds, then 4, then 8, then 16, and so forth, so that other sources
of entropy get used without much delay.
- Paranoid users can set random.trust_{cpu,bootloader}=no in the kernel
command line, and paranoid system builders can set the Kconfig options
to N, so there's no reduction or restriction of optionality.
- It's a practical default.
- All the distros have it set this way. Microsoft and Apple trust it
too. Bandwagon.
Cons:
- RDRAND *could* still be backdoored with something like a fixed key or
limited space serial number seed or another indexable scheme like
that. (However, it's hard to imagine threat models where the CPU is
backdoored like this, yet people are still okay making *any*
computations with it or connecting it to networks, etc.)
- RDRAND *could* be defective, rather than backdoored, and produce
garbage that is in one way or another insufficient for crypto.
- Suggesting a *reduction* in paranoia, as this commit effectively does,
may cause some to question my personal integrity as a "security
person".
- Bootloader seeds and RDRAND are generally very difficult if not all
together impossible to audit.
Keep in mind that this doesn't actually change any behavior. This
is just a change in the default Kconfig value. The distros already are
shipping kernels that set things this way.
Ard made an additional argument in [1]:
We're at the mercy of firmware and micro-architecture anyway, given
that we are also relying on it to ensure that every instruction in
the kernel's executable image has been faithfully copied to memory,
and that the CPU implements those instructions as documented. So I
don't think firmware or ISA bugs related to RNGs deserve special
treatment - if they are broken, we should quirk around them like we
usually do. So enabling these by default is a step in the right
direction IMHO.
In [2], Phil pointed out that having this disabled masked a bug that CI
otherwise would have caught:
A clean 5.15.45 boots cleanly, whereas a downstream kernel shows the
static key warning (but it does go on to boot). The significant
difference is that our defconfigs set CONFIG_RANDOM_TRUST_BOOTLOADER=y
defining that on top of multi_v7_defconfig demonstrates the issue on
a clean 5.15.45. Conversely, not setting that option in a
downstream kernel build avoids the warning
[1] https://lore.kernel.org/lkml/CAMj1kXGi+ieviFjXv9zQBSaGyyzeGW_VpMpTLJK8PJb2QHEQ-w@mail.gmail.com/
[2] https://lore.kernel.org/lkml/c47c42e3-1d56-5859-a6ad-976a1a3381c6@raspberrypi.com/
Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Stephen reported that a static key warning splat appears during early
boot on systems that credit randomness from device trees that contain an
"rng-seed" property, because because setup_machine_fdt() is called
before jump_label_init() during setup_arch():
static_key_enable_cpuslocked(): static key '0xffffffe51c6fcfc0' used before call to jump_label_init()
WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:166 static_key_enable_cpuslocked+0xb0/0xb8
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.18.0+ #224 44b43e377bfc84bc99bb5ab885ff694984ee09ff
pstate: 600001c9 (nZCv dAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : static_key_enable_cpuslocked+0xb0/0xb8
lr : static_key_enable_cpuslocked+0xb0/0xb8
sp : ffffffe51c393cf0
x29: ffffffe51c393cf0 x28: 000000008185054c x27: 00000000f1042f10
x26: 0000000000000000 x25: 00000000f10302b2 x24: 0000002513200000
x23: 0000002513200000 x22: ffffffe51c1c9000 x21: fffffffdfdc00000
x20: ffffffe51c2f0831 x19: ffffffe51c6fcfc0 x18: 00000000ffff1020
x17: 00000000e1e2ac90 x16: 00000000000000e0 x15: ffffffe51b710708
x14: 0000000000000066 x13: 0000000000000018 x12: 0000000000000000
x11: 0000000000000000 x10: 00000000ffffffff x9 : 0000000000000000
x8 : 0000000000000000 x7 : 61632065726f6665 x6 : 6220646573752027
x5 : ffffffe51c641d25 x4 : ffffffe51c13142c x3 : ffff0a00ffffff05
x2 : 40000000ffffe003 x1 : 00000000000001c0 x0 : 0000000000000065
Call trace:
static_key_enable_cpuslocked+0xb0/0xb8
static_key_enable+0x2c/0x40
crng_set_ready+0x24/0x30
execute_in_process_context+0x80/0x90
_credit_init_bits+0x100/0x154
add_bootloader_randomness+0x64/0x78
early_init_dt_scan_chosen+0x140/0x184
early_init_dt_scan_nodes+0x28/0x4c
early_init_dt_scan+0x40/0x44
setup_machine_fdt+0x7c/0x120
setup_arch+0x74/0x1d8
start_kernel+0x84/0x44c
__primary_switched+0xc0/0xc8
---[ end trace 0000000000000000 ]---
random: crng init done
Machine model: Google Lazor (rev1 - 2) with LTE
A trivial fix went in to address this on arm64, 73e2d827a501 ("arm64:
Initialize jump labels before setup_machine_fdt()"). I wrote patches as
well for arm32 and risc-v. But still patches are needed on xtensa,
powerpc, arc, and mips. So that's 7 platforms where things aren't quite
right. This sort of points to larger issues that might need a larger
solution.
Instead, this commit just defers setting the static branch until later
in the boot process. random_init() is called after jump_label_init() has
been called, and so is always a safe place from which to adjust the
static branch.
Fixes: f5bda35fba61 ("random: use static branch for crng_ready()")
Reported-by: Stephen Boyd <swboyd@chromium.org>
Reported-by: Phil Elwell <phil@raspberrypi.com>
Tested-by: Phil Elwell <phil@raspberrypi.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Rather than accounting in bytes and multiplying (shifting), we can just
account in bits and avoid the shift. The main motivation for this is
there are other patches in flux that expand this code a bit, and
avoiding the duplication of "* 8" everywhere makes things a bit clearer.
Cc: stable@vger.kernel.org
Fixes: 12e45a2a6308 ("random: credit architectural init the exact amount")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
add_bootloader_randomness() and the variables it touches are only used
during __init and not after, so mark these as __init. At the same time,
unexport this, since it's only called by other __init code that's
built-in.
Cc: stable@vger.kernel.org
Fixes: 428826f5358c ("fdt: add support for rng-seed")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
The current flow expands to:
if (crng_ready())
...
else if (...)
if (!crng_ready())
...
The second crng_ready() call is redundant, but can't so easily be
optimized out by the compiler.
This commit simplifies that to:
if (crng_ready()
...
else if (...)
...
Fixes: 560181c27b58 ("random: move initialization functions out of hot pages")
Cc: stable@vger.kernel.org
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Commit 40867d74c374 ("net: Add l3mdev index to flow struct and avoid oif
reset for port devices") adds a new entry (flowi_l3mdev) in the common
flow struct used for indicating the l3mdev index for later rule and
table matching.
The l3mdev_update_flow() has been adapted to properly set the
flowi_l3mdev based on the flowi_oif/flowi_iif. In fact, when a valid
flowi_iif is supplied to the l3mdev_update_flow(), this function can
update the flowi_l3mdev entry only if it has not yet been set (i.e., the
flowi_l3mdev entry is equal to 0).
The SRv6 End.DT6 behavior in VRF mode leverages a VRF device in order to
force the routing lookup into the associated routing table. This routing
operation is performed by seg6_lookup_any_nextop() preparing a flowi6
data structure used by ip6_route_input_lookup() which, in turn,
(indirectly) invokes l3mdev_update_flow().
However, seg6_lookup_any_nexthop() does not initialize the new
flowi_l3mdev entry which is filled with random garbage data. This
prevents l3mdev_update_flow() from properly updating the flowi_l3mdev
with the VRF index, and thus SRv6 End.DT6 (VRF mode)/DT46 behaviors are
broken.
This patch correctly initializes the flowi6 instance allocated and used
by seg6_lookup_any_nexhtop(). Specifically, the entire flowi6 instance
is wiped out: in case new entries are added to flowi/flowi6 (as happened
with the flowi_l3mdev entry), we should no longer have incorrectly
initialized values. As a result of this operation, the value of
flowi_l3mdev is also set to 0.
The proposed fix can be tested easily. Starting from the commit
referenced in the Fixes, selftests [1],[2] indicate that the SRv6
End.DT6 (VRF mode)/DT46 behaviors no longer work correctly. By applying
this patch, those behaviors are back to work properly again.
[1] - tools/testing/selftests/net/srv6_end_dt46_l3vpn_test.sh
[2] - tools/testing/selftests/net/srv6_end_dt6_l3vpn_test.sh
Fixes: 40867d74c374 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices")
Reported-by: Anton Makarov <am@3a-alliance.com>
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220608091917.20345-1-andrea.mayer@uniroma2.it
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Swap around the GRE and VLAN parts in the flow-key offloaded by
the driver to fit in with other tunnel types and the firmware.
Without this change used cases with GRE+VLAN on the outer header
does not get offloaded as the flow-key mismatches what the
firmware expect.
Fixes: 0d630f58989a ("nfp: flower: add support to offload QinQ match")
Fixes: 5a2b93041646 ("nfp: flower-ct: compile match sections of flow_payload")
Signed-off-by: Etienne van der Linde <etienne.vanderlinde@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
nfp_net_sriov_check is added in nfp_app_get_vf_config which intends
to ensure ivi->vlan_proto and ivi->max_tx_rate/min_tx_rate can be
read from VF config table only when firmware supports corresponding
capability.
However, "nfp_app_get_vf_config" can be called by commands like
"ip a", "ip link set $DEV up" and "ip link set $DEV vf $NUM vlan
$param" (with VF). When using commands above, many warnings
"ndo_set_vf_<cap_x> not supported" would appear if firmware doesn't
support VF rate limit and 802.1ad VLAN assingment. If more VFs are
created, things could get worse.
Thus, this patch add an extra bool parameter for nfp_net_sriov_check
to enable/disable the cap check warning report. Unnecessary warnings
in nfp_app_get_vf_config can be avoided. Valid warnings in kinds of
vf setting function can be reserved.
Fixes: e0d0e1fdf1ed ("nfp: VF rate limit support")
Fixes: 59359597b010 ("nfp: support 802.1ad VLAN assingment to VF")
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|