Age | Commit message (Collapse) | Author | Files | Lines |
|
Resctrl assumes that control and monitor operations on a resource are
performed at the same scope.
Prepare for systems that use different scope (specifically Intel needs
to split the RDT_RESOURCE_L3 resource to use L3 scope for cache control
and NODE scope for cache occupancy and memory bandwidth monitoring).
Create separate domain lists for control and monitor operations.
Note that errors during initialization of either control or monitor
functions on a domain would previously result in that domain being
excluded from both control and monitor operations. Now the domains are
allocated independently it is no longer required to disable both control
and monitor operations if either fail.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20240628215619.76401-4-tony.luck@intel.com
|
|
The rdt_domain structure is used for both control and monitor features.
It is about to be split into separate structures for these two usages
because the scope for control and monitoring features for a resource
will be different for future resources.
To allow for common code that scans a list of domains looking for a
specific domain id, move all the common fields ("list", "id", "cpu_mask")
into their own structure within the rdt_domain structure.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20240628215619.76401-3-tony.luck@intel.com
|
|
Resctrl resources operate on subsets of CPUs in the system with the
defining attribute of each subset being an instance of a particular
level of cache. E.g. all CPUs sharing an L3 cache would be part of the
same domain.
In preparation for features that are scoped at the NUMA node level,
change the code from explicit references to "cache_level" to a more
generic scope. At this point the only options for this scope are groups
of CPUs that share an L2 cache or L3 cache.
Clean up the error handling when looking up domains. Report invalid ids
before calling rdt_find_domain() in preparation for better messages when
scope can be other than cache scope. This means that rdt_find_domain()
will never return an error. So remove checks for error from the call sites.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20240628215619.76401-2-tony.luck@intel.com
|
|
pseudo_lock_region_init() and rdtgroup_cbm_to_size() open code a search for
details of a particular cache level.
Replace with get_cpu_cacheinfo_level().
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20240610003927.341707-5-tony.luck@intel.com
|
|
Resctrl open codes a search for information about a given cache level in
a couple of places (and more are on the way).
Provide a new inline function get_cpu_cacheinfo_level() in
<linux/cacheinfo.h> to do the search and return a pointer to the
cacheinfo structure.
Add lockdep_assert_cpus_held() to enforce the comment that cpuhp lock
must be held.
Simplify the existing get_cpu_cacheinfo_id() by using this new function
to do the search.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20240610003927.341707-4-tony.luck@intel.com
|
|
This file was created with a direct cut and paste from cpu.h so
kept the legacy declaration style.
But the Linux coding standard for function declarations in header
files is to avoid use of "extern".
Drop "extern" from all function declarations.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240610003927.341707-3-tony.luck@intel.com
|
|
Avoid upcoming #include hell when <linux/cachinfo.h> wants to use
lockdep_assert_cpus_held() and creates a #include loop that would
break the build for arch/riscv.
[ bp: s/cpu/CPU/g ]
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240610003927.341707-2-tony.luck@intel.com
|
|
|
|
At least ASUS Zenbook 14 (2023) and ASUS Zenbook 14 Pro (2023) are affected.
The touchscreen reports a battery status of 0% and jumps to 1% when a
stylus is used.
The device ID was added and the battery ignore quirk was enabled for it.
[jkosina@suse.com: reformatted changelog a bit]
Signed-off-by: Louis Dalibard <ontake@ontake.dev>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
|
|
The Elan eKTH5015M touch controller found on the Lenovo ThinkPad X13s
shares the VCC33 supply with other peripherals that may remain powered
during suspend (e.g. when enabled as wakeup sources).
The reset line is also wired so that it can be left deasserted when the
supply is off.
This is important as it avoids holding the controller in reset for
extended periods of time when it remains powered, which can lead to
increased power consumption, and also avoids leaking current through the
X13s reset circuitry during suspend (and after driver unbind).
Use the new 'no-reset-on-power-off' devicetree property to determine
when reset needs to be asserted on power down.
Notably this also avoids wasting power on machine variants without a
touchscreen for which the driver would otherwise exit probe with reset
asserted.
Fixes: bd3cba00dcc6 ("HID: i2c-hid: elan: Add support for Elan eKTH6915 i2c-hid touchscreens")
Cc: <stable@vger.kernel.org> # 6.0
Cc: Douglas Anderson <dianders@chromium.org>
Tested-by: Steev Klimaszewski <steev@kali.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20240507144821.12275-5-johan+linaro@kernel.org
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
|
|
When the power supply is shared with other peripherals the reset line
can be wired in such a way that it can remain deasserted regardless of
whether the supply is on or not.
This is important as it can be used to avoid holding the controller in
reset for extended periods of time when it remains powered, something
which can lead to increased power consumption. Leaving reset deasserted
also avoids leaking current through the reset circuitry pull-up
resistors.
Add a new 'no-reset-on-power-off' devicetree property which can be used
by the OS to determine when reset needs to be asserted on power down.
Note that this property can also be used when the supply cannot be
turned off by the OS at all.
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20240507144821.12275-4-johan+linaro@kernel.org
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
|
|
Add a compatible string for the Elan eKTH5015M touch controller.
Judging from the current binding and commit bd3cba00dcc6 ("HID: i2c-hid:
elan: Add support for Elan eKTH6915 i2c-hid touchscreens"), eKTH5015M
appears to be compatible with eKTH6915. Notably the power-on sequence is
the same.
While at it, drop a redundant label from the example.
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/r/20240507144821.12275-3-johan+linaro@kernel.org
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
|
|
The Ilitek ILI2901 touch screen controller was apparently incorrectly
added to the Elan eKTH6915 schema simply because it also has a reset
gpio and is currently managed by the Elan driver in Linux.
The two controllers are not related even if an unfortunate wording in
the commit message adding the Ilitek compatible made it sound like they
were.
Add a dedicated schema for the ILI2901 which does not specify the I2C
address (which is likely 0x41 rather than 0x10 as for other Ilitek touch
controllers) to avoid cluttering the Elan schema with unrelated devices
and to make it easier to find the correct schema when adding further
Ilitek controllers.
Fixes: d74ac6f60a7e ("dt-bindings: HID: i2c-hid: elan: Introduce Ilitek ili2901")
Cc: Zhengqiao Xia <xiazhengqiao@huaqin.corp-partner.google.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/r/20240507144821.12275-2-johan+linaro@kernel.org
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
|
|
HUTRR94 added support for a new usage titled "System Do Not Disturb"
which toggles a system-wide Do Not Disturb setting. This commit simply
adds a new event code for the usage.
Signed-off-by: Aseda Aboagye <aaboagye@chromium.org>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Link: https://lore.kernel.org/r/Zl-gUHE70s7wCAoB@google.com
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
|
|
HUTRR116 added support for a new usage titled "System Accessibility
Binding" which toggles a system-wide bound accessibility UI or command.
This commit simply adds a new event code for the usage.
Signed-off-by: Aseda Aboagye <aaboagye@chromium.org>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Link: https://lore.kernel.org/r/Zl-e97O9nvudco5z@google.com
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
|
|
syzbot reported a potential read out of bounds in asus_report_fixup.
this patch adds checks so that a read out of bounds will not occur
Signed-off-by: Andrew Ballance <andrewjballance@gmail.com>
Reported-by: <syzbot+07762f019fd03d01f04c@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=07762f019fd03d01f04c
Fixes: 59d2f5b7392e ("HID: asus: fix more n-key report descriptors if n-key quirked")
Link: https://lore.kernel.org/r/20240602085023.1720492-1-andrewjballance@gmail.com
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
|
|
On x86, make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/gpio/gpio-gw-pld.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/gpio/gpio-mc33880.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/gpio/gpio-pcf857x.o
Add the missing invocations of the MODULE_DESCRIPTION() macro,
including the one missing in gpio-pl061.c, which is not built for x86.
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20240606-md-drivers-gpio-v1-1-cb42d240ca5c@quicinc.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
There's now no need to make sure subreq->io_iter is advanced to match
subreq->transferred before calling one of the netfs subrequest termination
functions as the check has been removed netfslib and the iterator is reset
prior to retrying a subreq.
Fixes: 3ee1a1fc3981 ("cifs: Cut over to using netfslib")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Unlock cifs_tcp_ses_lock before calling cifs_put_smb_ses() to avoid such
deadlock.
Cc: stable@vger.kernel.org
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Building with W=1 incorrectly emits the following warning:
WARNING: modpost: missing MODULE_DESCRIPTION() in vmlinux.o
This check should apply only to modules.
Fixes: 1fffe7a34c89 ("script: modpost: emit a warning when the description is missing")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
|
|
In commit b18b047002b7 ("kbuild: change scripts/mksysmap into sed
script"), the mksysmap script was transformed into a sed script,
made directly executable with "#!/bin/sed -f". Apparently, the path to
sed is different on NixOS.
The shebang can't use the env command, otherwise the "sed -f" command
would be treated as a single argument. This can be solved with the -S
flag, but that is a GNU extension. Explicitly use sed instead of relying
on the executable shebang to fix NixOS builds without breaking build
environments using Busybox.
Fixes: b18b047002b7 ("kbuild: change scripts/mksysmap into sed script")
Reported-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Richard Acayan <mailingradian@gmail.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
[BUG]
Since v6.8 there are rare kernel crashes reported by various people,
the common factor is bad page status error messages like this:
BUG: Bad page state in process kswapd0 pfn:d6e840
page: refcount:0 mapcount:0 mapping:000000007512f4f2 index:0x2796c2c7c
pfn:0xd6e840
aops:btree_aops ino:1
flags: 0x17ffffe0000008(uptodate|node=0|zone=2|lastcpupid=0x3fffff)
page_type: 0xffffffff()
raw: 0017ffffe0000008 dead000000000100 dead000000000122 ffff88826d0be4c0
raw: 00000002796c2c7c 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: non-NULL mapping
[CAUSE]
Commit 09e6cef19c9f ("btrfs: refactor alloc_extent_buffer() to
allocate-then-attach method") changes the sequence when allocating a new
extent buffer.
Previously we always called grab_extent_buffer() under
mapping->i_private_lock, to ensure the safety on modification on
folio::private (which is a pointer to extent buffer for regular
sectorsize).
This can lead to the following race:
Thread A is trying to allocate an extent buffer at bytenr X, with 4
4K pages, meanwhile thread B is trying to release the page at X + 4K
(the second page of the extent buffer at X).
Thread A | Thread B
-----------------------------------+-------------------------------------
| btree_release_folio()
| | This is for the page at X + 4K,
| | Not page X.
| |
alloc_extent_buffer() | |- release_extent_buffer()
|- filemap_add_folio() for the | | |- atomic_dec_and_test(eb->refs)
| page at bytenr X (the first | | |
| page). | | |
| Which returned -EEXIST. | | |
| | | |
|- filemap_lock_folio() | | |
| Returned the first page locked. | | |
| | | |
|- grab_extent_buffer() | | |
| |- atomic_inc_not_zero() | | |
| | Returned false | | |
| |- folio_detach_private() | | |- folio_detach_private() for X
| |- folio_test_private() | | |- folio_test_private()
| Returned true | | | Returned true
|- folio_put() | |- folio_put()
Now there are two puts on the same folio at folio X, leading to refcount
underflow of the folio X, and eventually causing the BUG_ON() on the
page->mapping.
The condition is not that easy to hit:
- The release must be triggered for the middle page of an eb
If the release is on the same first page of an eb, page lock would kick
in and prevent the race.
- folio_detach_private() has a very small race window
It's only between folio_test_private() and folio_clear_private().
That's exactly when mapping->i_private_lock is used to prevent such race,
and commit 09e6cef19c9f ("btrfs: refactor alloc_extent_buffer() to
allocate-then-attach method") screwed that up.
At that time, I thought the page lock would kick in as
filemap_release_folio() also requires the page to be locked, but forgot
the filemap_release_folio() only locks one page, not all pages of an
extent buffer.
[FIX]
Move all the code requiring i_private_lock into
attach_eb_folio_to_filemap(), so that everything is done with proper
lock protection.
Furthermore to prevent future problems, add an extra
lockdep_assert_locked() to ensure we're holding the proper lock.
To reproducer that is able to hit the race (takes a few minutes with
instrumented code inserting delays to alloc_extent_buffer()):
#!/bin/sh
drop_caches () {
while(true); do
echo 3 > /proc/sys/vm/drop_caches
echo 1 > /proc/sys/vm/compact_memory
done
}
run_tar () {
while(true); do
for x in `seq 1 80` ; do
tar cf /dev/zero /mnt > /dev/null &
done
wait
done
}
mkfs.btrfs -f -d single -m single /dev/vda
mount -o noatime /dev/vda /mnt
# create 200,000 files, 1K each
./simoop -n 200000 -E -f 1k /mnt
drop_caches &
(run_tar)
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/linux-btrfs/CAHk-=wgt362nGfScVOOii8cgKn2LVVHeOvOA7OBwg1OwbuJQcw@mail.gmail.com/
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Link: https://lore.kernel.org/lkml/CABXGCsPktcHQOvKTbPaTwegMExije=Gpgci5NW=hqORo-s7diA@mail.gmail.com/
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Link: https://lore.kernel.org/linux-btrfs/e8b3311c-9a75-4903-907f-fc0f7a3fe423@gmx.de/
Reported-by: syzbot+f80b066392366b4af85e@syzkaller.appspotmail.com
Fixes: 09e6cef19c9f ("btrfs: refactor alloc_extent_buffer() to allocate-then-attach method")
CC: stable@vger.kernel.org # 6.8+
CC: Chris Mason <clm@fb.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Without this, the 'i' variable declared before could be overridden by
accident, e.g.
for i in "${@}"; do
__ksft_status_merge "${i}" ## 'i' has been modified
foo "${i}" ## using 'i' with an unexpected value
done
After a quick look, it looks like 'i' is currently not used after having
been modified in __ksft_status_merge(), but still, better be safe than
sorry. I saw this while modifying the same file, not because I suspected
an issue somewhere.
Fixes: 596c8819cb78 ("selftests: forwarding: Have RET track kselftest framework constants")
Acked-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20240605-upstream-net-20240605-selftests-net-lib-fixes-v1-3-b3afadd368c9@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If there is an error to create the first netns with 'setup_ns()',
'cleanup_ns()' will be called with an empty string as first parameter.
The consequences is that 'cleanup_ns()' will try to delete an invalid
netns, and wait 20 seconds if the netns list is empty.
Instead of just checking if the name is not empty, convert the string
separated by spaces to an array. Manipulating the array is cleaner, and
calling 'cleanup_ns()' with an empty array will be a no-op.
Fixes: 25ae948b4478 ("selftests/net: add lib.sh")
Cc: stable@vger.kernel.org
Acked-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20240605-upstream-net-20240605-selftests-net-lib-fixes-v1-2-b3afadd368c9@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If errexit is enabled ('set -e'), loopy_wait -- or busywait and others
using it -- will stop after the first failure.
Note that if the returned status of loopy_wait is checked, and even if
errexit is enabled, Bash will not stop at the first error.
Fixes: 25ae948b4478 ("selftests/net: add lib.sh")
Cc: stable@vger.kernel.org
Acked-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20240605-upstream-net-20240605-selftests-net-lib-fixes-v1-1-b3afadd368c9@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The pata_macio driver advertises a max_segment_size of 0xff00, because
the hardware doesn't cope with requests >= 64K.
However the SCSI core requires max_segment_size to be at least
PAGE_SIZE, which is a problem for pata_macio when the kernel is built
with 64K pages.
In older kernels the SCSI core would just increase the segment size to
be equal to PAGE_SIZE, however since the commit tagged below it causes a
warning and the device fails to probe:
WARNING: CPU: 0 PID: 26 at block/blk-settings.c:202 .blk_validate_limits+0x2f8/0x35c
CPU: 0 PID: 26 Comm: kworker/u4:1 Not tainted 6.10.0-rc1 #1
Hardware name: PowerMac7,2 PPC970 0x390202 PowerMac
...
NIP .blk_validate_limits+0x2f8/0x35c
LR .blk_alloc_queue+0xc0/0x2f8
Call Trace:
.blk_alloc_queue+0xc0/0x2f8
.blk_mq_alloc_queue+0x60/0xf8
.scsi_alloc_sdev+0x208/0x3c0
.scsi_probe_and_add_lun+0x314/0x52c
.__scsi_add_device+0x170/0x1a4
.ata_scsi_scan_host+0x2bc/0x3e4
.async_port_probe+0x6c/0xa0
.async_run_entry_fn+0x60/0x1bc
.process_one_work+0x228/0x510
.worker_thread+0x360/0x530
.kthread+0x134/0x13c
.start_kernel_thread+0x10/0x14
...
scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
Although the hardware can't cope with a 64K segment, the driver
already deals with that internally by splitting large requests in
pata_macio_qc_prep(). That is how the driver has managed to function
until now on 64K kernels.
So fix the driver to advertise a max_segment_size of 64K, which avoids
the warning and keeps the SCSI core happy.
Fixes: afd53a3d8528 ("scsi: core: Initialize scsi midlayer limits before allocating the queue")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/all/ce2bf6af-4382-4fe1-b392-cc6829f5ceb2@roeck-us.net/
Reported-by: Doru Iorgulescu <doru.iorgulescu1@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218858
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
|
|
Clang static checker (scan-build) warning:
net/ethtool/ioctl.c:line 2233, column 2
Called function pointer is null (null dereference).
Return '-EOPNOTSUPP' when 'ops->get_ethtool_phy_stats' is NULL to fix
this typo error.
Fixes: 201ed315f967 ("net/ethtool/ioctl: split ethtool_get_phy_stats into multiple helpers")
Signed-off-by: Su Hui <suhui@nfschina.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://lore.kernel.org/r/20240605034742.921751-1-suhui@nfschina.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
expr_trans_bool() performs an incorrect transformation.
[Test Code]
config MODULES
def_bool y
modules
config A
def_bool y
select C if B != n
config B
def_tristate m
config C
tristate
[Result]
CONFIG_MODULES=y
CONFIG_A=y
CONFIG_B=m
CONFIG_C=m
This output is incorrect because CONFIG_C=y is expected.
Documentation/kbuild/kconfig-language.rst clearly explains the function
of the '!=' operator:
If the values of both symbols are equal, it returns 'n',
otherwise 'y'.
Therefore, the statement:
select C if B != n
should be equivalent to:
select C if y
Or, more simply:
select C
Hence, the symbol C should be selected by the value of A, which is 'y'.
However, expr_trans_bool() wrongly transforms it to:
select C if B
Therefore, the symbol C is selected by (A && B), which is 'm'.
The comment block of expr_trans_bool() correctly explains its intention:
* bool FOO!=n => FOO
^^^^
If FOO is bool, FOO!=n can be simplified into FOO. This is correct.
However, the actual code performs this transformation when FOO is
tristate:
if (e->left.sym->type == S_TRISTATE) {
^^^^^^^^^^
While it can be fixed to S_BOOLEAN, there is no point in doing so
because expr_tranform() already transforms FOO!=n to FOO when FOO is
bool. (see the "case E_UNEQUAL" part)
expr_trans_bool() is wrong and unnecessary.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
|
|
syzbot found a race in __fib6_drop_pcpu_from() [1]
If compiler reads more than once (*ppcpu_rt),
second read could read NULL, if another cpu clears
the value in rt6_get_pcpu_route().
Add a READ_ONCE() to prevent this race.
Also add rcu_read_lock()/rcu_read_unlock() because
we rely on RCU protection while dereferencing pcpu_rt.
[1]
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000012: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000090-0x0000000000000097]
CPU: 0 PID: 7543 Comm: kworker/u8:17 Not tainted 6.10.0-rc1-syzkaller-00013-g2bfcfd584ff5 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
Workqueue: netns cleanup_net
RIP: 0010:__fib6_drop_pcpu_from.part.0+0x10a/0x370 net/ipv6/ip6_fib.c:984
Code: f8 48 c1 e8 03 80 3c 28 00 0f 85 16 02 00 00 4d 8b 3f 4d 85 ff 74 31 e8 74 a7 fa f7 49 8d bf 90 00 00 00 48 89 f8 48 c1 e8 03 <80> 3c 28 00 0f 85 1e 02 00 00 49 8b 87 90 00 00 00 48 8b 0c 24 48
RSP: 0018:ffffc900040df070 EFLAGS: 00010206
RAX: 0000000000000012 RBX: 0000000000000001 RCX: ffffffff89932e16
RDX: ffff888049dd1e00 RSI: ffffffff89932d7c RDI: 0000000000000091
RBP: dffffc0000000000 R08: 0000000000000005 R09: 0000000000000007
R10: 0000000000000001 R11: 0000000000000006 R12: ffff88807fa080b8
R13: fffffbfff1a9a07d R14: ffffed100ff41022 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff8880b9200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b32c26000 CR3: 000000005d56e000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__fib6_drop_pcpu_from net/ipv6/ip6_fib.c:966 [inline]
fib6_drop_pcpu_from net/ipv6/ip6_fib.c:1027 [inline]
fib6_purge_rt+0x7f2/0x9f0 net/ipv6/ip6_fib.c:1038
fib6_del_route net/ipv6/ip6_fib.c:1998 [inline]
fib6_del+0xa70/0x17b0 net/ipv6/ip6_fib.c:2043
fib6_clean_node+0x426/0x5b0 net/ipv6/ip6_fib.c:2205
fib6_walk_continue+0x44f/0x8d0 net/ipv6/ip6_fib.c:2127
fib6_walk+0x182/0x370 net/ipv6/ip6_fib.c:2175
fib6_clean_tree+0xd7/0x120 net/ipv6/ip6_fib.c:2255
__fib6_clean_all+0x100/0x2d0 net/ipv6/ip6_fib.c:2271
rt6_sync_down_dev net/ipv6/route.c:4906 [inline]
rt6_disable_ip+0x7ed/0xa00 net/ipv6/route.c:4911
addrconf_ifdown.isra.0+0x117/0x1b40 net/ipv6/addrconf.c:3855
addrconf_notify+0x223/0x19e0 net/ipv6/addrconf.c:3778
notifier_call_chain+0xb9/0x410 kernel/notifier.c:93
call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1992
call_netdevice_notifiers_extack net/core/dev.c:2030 [inline]
call_netdevice_notifiers net/core/dev.c:2044 [inline]
dev_close_many+0x333/0x6a0 net/core/dev.c:1585
unregister_netdevice_many_notify+0x46d/0x19f0 net/core/dev.c:11193
unregister_netdevice_many net/core/dev.c:11276 [inline]
default_device_exit_batch+0x85b/0xae0 net/core/dev.c:11759
ops_exit_list+0x128/0x180 net/core/net_namespace.c:178
cleanup_net+0x5b7/0xbf0 net/core/net_namespace.c:640
process_one_work+0x9fb/0x1b60 kernel/workqueue.c:3231
process_scheduled_works kernel/workqueue.c:3312 [inline]
worker_thread+0x6c8/0xf70 kernel/workqueue.c:3393
kthread+0x2c1/0x3a0 kernel/kthread.c:389
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
Fixes: d52d3997f843 ("ipv6: Create percpu rt6_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20240604193549.981839-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Documentation/kbuild/kconfig-language.rst explains the behavior of
'select' as follows:
reverse dependencies can be used to force a lower limit of
another symbol. The value of the current menu symbol is used as the
minimal value <symbol> can be set to.
This is not true when the 'select' property is followed by 'if'.
[Test Code]
config MODULES
def_bool y
modules
config A
def_tristate y
select C if B
config B
def_tristate m
config C
tristate
[Result]
CONFIG_MODULES=y
CONFIG_A=y
CONFIG_B=m
CONFIG_C=m
If "the value of A is used as the minimal value C can be set to",
C must be 'y'.
The actual behavior is "C is selected by (A && B)". The lower limit of
C is downgraded due to B being 'm'.
This behavior is kind of weird, and this has arisen several times in
the mailing list.
I do not know whether it is a bug or intended behavior. Anyway, it is
not feasible to change it now because many Kconfig files are written
based on this behavior. The same applies to 'imply'.
Document this (but reserve the possibility for a future change).
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
|
|
This sentence does not make sense due to a typo. Fix it.
Fixes: def2fbffe62c ("kconfig: allow symbols implied by y to become m")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
|
|
Currently, the initial state of the "Save" button is always active.
If none of the CONFIG options are changed while loading the .config
file, the "Save" button should be greyed out.
This can be fixed by calling conf_read() after widget initialization.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
This is a leftover from commit ce1fc9345a59 ("kconfig: do not clear
SYMBOL_DEF_USER when the value is out of range").
This code is now redundant because if a user-supplied value is out
of range, the value adjusted by sym_validate_range() differs, and
conf_unsaved has already been incremented a few lines above.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
While dumping sockets via UNIX_DIAG, we do not hold unix_state_lock().
Let's use READ_ONCE() to read sk->sk_shutdown.
Fixes: e4e541a84863 ("sock-diag: Report shutdown for inet and unix sockets (v2)")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
We can dump the socket queue length via UNIX_DIAG by specifying
UDIAG_SHOW_RQLEN.
If sk->sk_state is TCP_LISTEN, we return the recv queue length,
but here we do not hold recvq lock.
Let's use skb_queue_len_lockless() in sk_diag_show_rqlen().
Fixes: c9da99e6475f ("unix_diag: Fixup RQLEN extension report")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
If the socket type is SOCK_STREAM or SOCK_SEQPACKET, unix_release_sock()
checks the length of the peer socket's recvq under unix_state_lock().
However, unix_stream_read_generic() calls skb_unlink() after releasing
the lock. Also, for SOCK_SEQPACKET, __skb_try_recv_datagram() unlinks
skb without unix_state_lock().
Thues, unix_state_lock() does not protect qlen.
Let's use skb_queue_empty_lockless() in unix_release_sock().
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Once sk->sk_state is changed to TCP_LISTEN, it never changes.
unix_accept() takes advantage of this characteristics; it does not
hold the listener's unix_state_lock() and only acquires recvq lock
to pop one skb.
It means unix_state_lock() does not prevent the queue length from
changing in unix_stream_connect().
Thus, we need to use unix_recvq_full_lockless() to avoid data-race.
Now we remove unix_recvq_full() as no one uses it.
Note that we can remove READ_ONCE() for sk->sk_max_ack_backlog in
unix_recvq_full_lockless() because of the following reasons:
(1) For SOCK_DGRAM, it is a written-once field in unix_create1()
(2) For SOCK_STREAM and SOCK_SEQPACKET, it is changed under the
listener's unix_state_lock() in unix_listen(), and we hold
the lock in unix_stream_connect()
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
net->unx.sysctl_max_dgram_qlen is exposed as a sysctl knob and can be
changed concurrently.
Let's use READ_ONCE() in unix_create1().
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
sk_setsockopt() changes sk->sk_sndbuf under lock_sock(), but it's
not used in af_unix.c.
Let's use READ_ONCE() to read sk->sk_sndbuf in unix_writable(),
unix_dgram_sendmsg(), and unix_stream_sendmsg().
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
While dumping AF_UNIX sockets via UNIX_DIAG, sk->sk_state is read
locklessly.
Let's use READ_ONCE() there.
Note that the result could be inconsistent if the socket is dumped
during the state change. This is common for other SOCK_DIAG and
similar interfaces.
Fixes: c9da99e6475f ("unix_diag: Fixup RQLEN extension report")
Fixes: 2aac7a2cb0d9 ("unix_diag: Pending connections IDs NLA")
Fixes: 45a96b9be6ec ("unix_diag: Dumping all sockets core")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
unix_stream_read_skb() is called from sk->sk_data_ready() context
where unix_state_lock() is not held.
Let's use READ_ONCE() there.
Fixes: 77462de14a43 ("af_unix: Add read_sock for stream socket types")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The following functions read sk->sk_state locklessly and proceed only if
the state is TCP_ESTABLISHED.
* unix_stream_sendmsg
* unix_stream_read_generic
* unix_seqpacket_sendmsg
* unix_seqpacket_recvmsg
Let's use READ_ONCE() there.
Fixes: a05d2ad1c1f3 ("af_unix: Only allow recv on connected seqpacket sockets.")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Once sk->sk_state is changed to TCP_LISTEN, it never changes.
unix_accept() takes the advantage and reads sk->sk_state without
holding unix_state_lock().
Let's use READ_ONCE() there.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
As small optimisation, unix_stream_connect() prefetches the client's
sk->sk_state without unix_state_lock() and checks if it's TCP_CLOSE.
Later, sk->sk_state is checked again under unix_state_lock().
Let's use READ_ONCE() for the first check and TCP_CLOSE directly for
the second check.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
unix_poll() and unix_dgram_poll() read sk->sk_state locklessly and
calls unix_writable() which also reads sk->sk_state without holding
unix_state_lock().
Let's use READ_ONCE() in unix_poll() and unix_dgram_poll() and pass
it to unix_writable().
While at it, we remove TCP_SYN_SENT check in unix_dgram_poll() as
that state does not exist for AF_UNIX socket since the code was added.
Fixes: 1586a5877db9 ("af_unix: do not report POLLOUT on listeners")
Fixes: 3c73419c09a5 ("af_unix: fix 'poll for write'/ connected DGRAM sockets")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
ioctl(SIOCINQ) calls unix_inq_len() that checks sk->sk_state first
and returns -EINVAL if it's TCP_LISTEN.
Then, for SOCK_STREAM sockets, unix_inq_len() returns the number of
bytes in recvq.
However, unix_inq_len() does not hold unix_state_lock(), and the
concurrent listen() might change the state after checking sk->sk_state.
If the race occurs, 0 is returned for the listener, instead of -EINVAL,
because the length of skb with embryo is 0.
We could hold unix_state_lock() in unix_inq_len(), but it's overkill
given the result is true for pre-listen() TCP_CLOSE state.
So, let's use READ_ONCE() for sk->sk_state in unix_inq_len().
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
sk->sk_state is changed under unix_state_lock(), but it's read locklessly
in many places.
This patch adds WRITE_ONCE() on the writer side.
We will add READ_ONCE() to the lockless readers in the following patches.
Fixes: 83301b5367a9 ("af_unix: Set TCP_ESTABLISHED for datagram sockets too")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When a SOCK_DGRAM socket connect()s to another socket, the both sockets'
sk->sk_state are changed to TCP_ESTABLISHED so that we can register them
to BPF SOCKMAP.
When the socket disconnects from the peer by connect(AF_UNSPEC), the state
is set back to TCP_CLOSE.
Then, the peer's state is also set to TCP_CLOSE, but the update is done
locklessly and unconditionally.
Let's say socket A connect()ed to B, B connect()ed to C, and A disconnects
from B.
After the first two connect()s, all three sockets' sk->sk_state are
TCP_ESTABLISHED:
$ ss -xa
Netid State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
u_dgr ESTAB 0 0 @A 641 * 642
u_dgr ESTAB 0 0 @B 642 * 643
u_dgr ESTAB 0 0 @C 643 * 0
And after the disconnect, B's state is TCP_CLOSE even though it's still
connected to C and C's state is TCP_ESTABLISHED.
$ ss -xa
Netid State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
u_dgr UNCONN 0 0 @A 641 * 0
u_dgr UNCONN 0 0 @B 642 * 643
u_dgr ESTAB 0 0 @C 643 * 0
In this case, we cannot register B to SOCKMAP.
So, when a socket disconnects from the peer, we should not set TCP_CLOSE to
the peer if the peer is connected to yet another socket, and this must be
done under unix_state_lock().
Note that we use WRITE_ONCE() for sk->sk_state as there are many lockless
readers. These data-races will be fixed in the following patches.
Fixes: 83301b5367a9 ("af_unix: Set TCP_ESTABLISHED for datagram sockets too")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
In case of region creation fail in ipc_devlink_create_region(), previously
created regions delete process starts from tainted pointer which actually
holds error code value.
Fix this bug by decreasing region index before delete.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 4dcd183fbd67 ("net: wwan: iosm: devlink registration")
Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
Acked-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240604082500.20769-1-amishin@t-argos.ru
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
These pointers are frequently the same and memcmp does not compare the
pointers before comparing their contents so this was wasting cycles
comparing 16 KiB of memory which will always be equal.
Fixes: bb6780aa5a1d ("drm/vmwgfx: Diff cursors when using cmds")
Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240328190716.27367-1-ian.forbes@broadcom.com
|