| Age | Commit message (Collapse) | Author | Files | Lines |
|
Examples should be complete and should not have a 'status' property,
especially a disabled one because this disables the dt_binding_check of
the example against the schema. Dropping 'status' property shows
missing other properties - phy-mode and phy-handle.
Fixes: 114508a89ddc ("dt-bindings: net: Add support for Sophgo SG2044 dwmac")
Cc: <stable@vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Reviewed-by: Chen Wang <unicorn_wang@outlook.com>
Link: https://patch.msgid.link/20250701063621.23808-2-krzysztof.kozlowski@linaro.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
For NGBE devices, the queue number is limited to be 1 when SRIOV is
enabled. In this case, IRQ vector[0] is used for MISC and vector[1] is
used for queue, based on the previous patches. But for the hardware
design, the IRQ vector[1] must be allocated for use by the VF[6] when
the number of VFs is 7. So the IRQ vector[0] should be shared for PF
MISC and QUEUE interrupts.
+-----------+----------------------+
| Vector | Assigned To |
+-----------+----------------------+
| Vector 0 | PF MISC and QUEUE |
| Vector 1 | VF 6 |
| Vector 2 | VF 5 |
| Vector 3 | VF 4 |
| Vector 4 | VF 3 |
| Vector 5 | VF 2 |
| Vector 6 | VF 1 |
| Vector 7 | VF 0 |
+-----------+----------------------+
Minimize code modifications, only adjust the IRQ vector number for this
case.
Fixes: 877253d2cbf2 ("net: ngbe: add sriov function support")
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://patch.msgid.link/20250701063030.59340-4-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Due to hardware limitations of NGBE, queue IRQs can only be requested
on vector 0 to 7. When the number of queues is set to the maximum 8,
the PCI IRQ vectors are allocated from 0 to 8. The vector 0 is used by
MISC interrupt, and althrough the vector 8 is used by queue interrupt,
it is unable to receive packets. This will cause some packets to be
dropped when RSS is enabled and they are assigned to queue 8.
So revert the adjustment of the MISC IRQ location, to make it be the
last one in IRQ vectors.
Fixes: 937d46ecc5f9 ("net: wangxun: add ethtool_ops for channel number")
Cc: stable@vger.kernel.org
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://patch.msgid.link/20250701063030.59340-3-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Move the creating of irq_domain for MISC IRQ from .probe to .ndo_open,
and free it in .ndo_stop, to maintain consistency with the queue IRQs.
This it for subsequent adjustments to the IRQ vectors.
Fixes: aefd013624a1 ("net: txgbe: use irq_domain for interrupt controller")
Cc: stable@vger.kernel.org
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250701063030.59340-2-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The `tx_may_stop()` logic stops TX queues if free descriptors
(`sq->vq->num_free`) fall below the threshold of (`MAX_SKB_FRAGS` + 2).
If the total ring size (`ring_num`) is not strictly greater than this
value, queues can become persistently stopped or stop after minimal
use, severely degrading performance.
A single sk_buff transmission typically requires descriptors for:
- The virtio_net_hdr (1 descriptor)
- The sk_buff's linear data (head) (1 descriptor)
- Paged fragments (up to MAX_SKB_FRAGS descriptors)
This patch enforces that the TX ring size ('ring_num') must be strictly
greater than (MAX_SKB_FRAGS + 2). This ensures that the ring is
always large enough to hold at least one maximally-fragmented packet
plus at least one additional slot.
Reported-by: Lei Yang <leiyang@redhat.com>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20250521092236.661410-4-lvivier@redhat.com
Tested-by: Lei Yang <leiyang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Improve consistency by using everywhere it is needed
'MAX_SKB_FRAGS + 2' rather than '2+MAX_SKB_FRAGS' or
'2 + MAX_SKB_FRAGS'.
No functional change.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20250521092236.661410-3-lvivier@redhat.com
Tested-by: Lei Yang <leiyang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The virtqueue_resize() function was not correctly propagating error codes
from its internal resize helper functions, specifically
virtqueue_resize_packet() and virtqueue_resize_split(). If these helpers
returned an error, but the subsequent call to virtqueue_enable_after_reset()
succeeded, the original error from the resize operation would be masked.
Consequently, virtqueue_resize() could incorrectly report success to its
caller despite an underlying resize failure.
This change restores the original code behavior:
if (vdev->config->enable_vq_after_reset(_vq))
return -EBUSY;
return err;
Fix: commit ad48d53b5b3f ("virtio_ring: separate the logic of reset/enable from virtqueue_resize")
Cc: xuanzhuo@linux.alibaba.com
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20250521092236.661410-2-lvivier@redhat.com
Tested-by: Lei Yang <leiyang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When calling buf_to_xdp, the len argument is the frame data's length
without virtio header's length (vi->hdr_len). We check that len with
xsk_pool_get_rx_frame_size() + vi->hdr_len
to ensure the provided len does not larger than the allocated chunk
size. The additional vi->hdr_len is because in virtnet_add_recvbuf_xsk,
we use part of XDP_PACKET_HEADROOM for virtio header and ask the vhost
to start placing data from
hard_start + XDP_PACKET_HEADROOM - vi->hdr_len
not
hard_start + XDP_PACKET_HEADROOM
But the first buffer has virtio_header, so the maximum frame's length in
the first buffer can only be
xsk_pool_get_rx_frame_size()
not
xsk_pool_get_rx_frame_size() + vi->hdr_len
like in the current check.
This commit adds an additional argument to buf_to_xdp differentiate
between the first buffer and other ones to correctly calculate the maximum
frame's length.
Cc: stable@vger.kernel.org
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Fixes: a4e7ba702701 ("virtio_net: xsk: rx: support recv small mode")
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Link: https://patch.msgid.link/20250630151315.86722-2-minhquangbui99@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Replace the current repeated code to check received length in mergeable
mode with the new check_mergeable_len helper.
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20250630144212.48471-4-minhquangbui99@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The truesize is guaranteed not to exceed PAGE_SIZE in
get_mergeable_buf_len(). It is saved in mergeable context, which is not
changeable by the host side, so the check in receive path is quite
redundant.
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Link: https://patch.msgid.link/20250630144212.48471-3-minhquangbui99@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
In xdp_linearize_page, when reading the following buffers from the ring,
we forget to check the received length with the true allocate size. This
can lead to an out-of-bound read. This commit adds that missing check.
Cc: <stable@vger.kernel.org>
Fixes: 4941d472bf95 ("virtio-net: do not reset during XDP set")
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20250630144212.48471-2-minhquangbui99@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
udp_v4_early_demux now returns drop reasons as it either returns 0 or
ip_mc_validate_source, which returns itself a drop reason. However its
use was not converted in ip_rcv_finish_core and the drop reason is
ignored, leading to potentially skipping increasing LINUX_MIB_IPRPFILTER
if the drop reason is SKB_DROP_REASON_IP_RPFILTER.
This is a fix and we're not converting udp_v4_early_demux to explicitly
return a drop reason to ease backports; this can be done as a follow-up.
Fixes: d46f827016d8 ("net: ip: make ip_mc_validate_source() return drop reason")
Cc: Menglong Dong <menglong8.dong@gmail.com>
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250701074935.144134-1-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When setting "ethtool -L eth0 combined 1", the number of RX/TX queue is
changed to be 1. RSS is disabled at this moment, and the indices of FDIR
have not be changed in wx_set_rss_queues(). So the combined count still
shows the previous value. This issue was introduced when supporting
FDIR. Fix it for those devices that support FDIR.
Fixes: 34744a7749b3 ("net: txgbe: add FDIR info to ethtool ops")
Cc: stable@vger.kernel.org
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/A5C8FE56D6C04608+20250701070625.73680-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The link status is latched low so that momentary link drops
can be detected. Always double-reading the status defeats this
design feature. Only double read if link was already down
This prevents unnecessary duplicate readings of the link status.
Fixes: 4f3b20bfbb75 ("amd-xgbe: add support for rx-adaptation")
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250701065016.4140707-1-Raju.Rangoju@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Certain classful qdiscs may invoke their classes' dequeue handler on an
enqueue operation. This may unexpectedly empty the child qdisc and thus
make an in-flight class passive via qlen_notify(). Most qdiscs do not
expect such behaviour at this point in time and may re-activate the
class eventually anyways which will lead to a use-after-free.
The referenced fix commit attempted to fix this behavior for the HFSC
case by moving the backlog accounting around, though this turned out to
be incomplete since the parent's parent may run into the issue too.
The following reproducer demonstrates this use-after-free:
tc qdisc add dev lo root handle 1: drr
tc filter add dev lo parent 1: basic classid 1:1
tc class add dev lo parent 1: classid 1:1 drr
tc qdisc add dev lo parent 1:1 handle 2: hfsc def 1
tc class add dev lo parent 2: classid 2:1 hfsc rt m1 8 d 1 m2 0
tc qdisc add dev lo parent 2:1 handle 3: netem
tc qdisc add dev lo parent 3:1 handle 4: blackhole
echo 1 | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888
tc class delete dev lo classid 1:1
echo 1 | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888
Since backlog accounting issues leading to a use-after-frees on stale
class pointers is a recurring pattern at this point, this patch takes
a different approach. Instead of trying to fix the accounting, the patch
ensures that qdisc_tree_reduce_backlog always calls qlen_notify when
the child qdisc is empty. This solves the problem because deletion of
qdiscs always involves a call to qdisc_reset() and / or
qdisc_purge_queue() which ultimately resets its qlen to 0 thus causing
the following qdisc_tree_reduce_backlog() to report to the parent. Note
that this may call qlen_notify on passive classes multiple times. This
is not a problem after the recent patch series that made all the
classful qdiscs qlen_notify() handlers idempotent.
Fixes: 3f981138109f ("sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()")
Signed-off-by: Lion Ackermann <nnamrec@gmail.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/d912cbd7-193b-4269-9857-525bee8bbb6a@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
dma_map_XXX() functions return values DMA_MAPPING_ERROR as error values
which is often ~0. The error value should be tested with
dma_mapping_error().
This patch creates a new function in niu_ops to test if the mapping
failed. The test is fixed in niu_rbr_add_page(), added in
niu_start_xmit() and the successfully mapped pages are unmaped upon error.
Fixes: ec2deec1f352 ("niu: Fix to check for dma mapping errors.")
Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are two bugs in rose_rt_device_down() that can cause
use-after-free:
1. The loop bound `t->count` is modified within the loop, which can
cause the loop to terminate early and miss some entries.
2. When removing an entry from the neighbour array, the subsequent entries
are moved up to fill the gap, but the loop index `i` is still
incremented, causing the next entry to be skipped.
For example, if a node has three neighbours (A, A, B) with count=3 and A
is being removed, the second A is not checked.
i=0: (A, A, B) -> (A, B) with count=2
^ checked
i=1: (A, B) -> (A, B) with count=2
^ checked (B, not A!)
i=2: (doesn't occur because i < count is false)
This leaves the second A in the array with count=2, but the rose_neigh
structure has been freed. Code that accesses these entries assumes that
the first `count` entries are valid pointers, causing a use-after-free
when it accesses the dangling pointer.
Fix both issues by iterating over the array in reverse order with a fixed
loop bound. This ensures that all entries are examined and that the removal
of an entry doesn't affect subsequent iterations.
Reported-by: syzbot+e04e2c007ba2c80476cb@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=e04e2c007ba2c80476cb
Tested-by: syzbot+e04e2c007ba2c80476cb@syzkaller.appspotmail.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kohei Enju <enjuk@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250629030833.6680-1-enjuk@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The comparison in enic_change_mtu() incorrectly used the current
netdev->mtu instead of the new new_mtu value when warning about
an MTU exceeding the port MTU. This could suppress valid warnings
or issue incorrect ones.
Fix the condition and log to properly reflect the new_mtu.
Fixes: ab123fe071c9 ("enic: handle mtu change for vf properly")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Acked-by: John Daley <johndale@cisco.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250628145612.476096-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Update the Clause 37 Auto-Negotiation implementation to properly align
with the PCS hardware specifications:
- Fix incorrect bit settings in Link Status and Link Duplex fields
- Implement missing sequence steps 2 and 7
These changes ensure CL37 auto-negotiation protocol follows the exact
sequence patterns as specified in the hardware databook.
Fixes: 1bf40ada6290 ("amd-xgbe: Add support for clause 37 auto-negotiation")
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Link: https://patch.msgid.link/20250630192636.3838291-1-Raju.Rangoju@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Smatch complains that the error message isn't set in the caller:
lib/test_objagg.c:923 test_hints_case2()
error: uninitialized symbol 'errmsg'.
This static checker warning only showed up after a recent refactoring
but the bug dates back to when the code was originally added. This
likely doesn't affect anything in real life.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202506281403.DsuyHFTZ-lkp@intel.com/
Fixes: 0a020d416d0a ("lib: introduce initial implementation of object aggregation manager")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/8548f423-2e3b-4bb7-b816-5041de2762aa@sabinyo.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
I226 devices advertise support for the PCI-E link L1.2 substate. However,
due to a hardware limitation, the exit latency from this low-power state
is longer than the packet buffer can tolerate under high traffic
conditions. This can lead to packet loss and degraded performance.
To mitigate this, disable the L1.2 substate. The increased power draw
between L1.1 and L1.2 is insignificant.
Fixes: 43546211738e ("igc: Add new device ID's")
Link: https://lore.kernel.org/intel-wired-lan/15248b4f-3271-42dd-8e35-02bfc92b25e1@intel.com
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
With VIRTCHNL2_CAP_MACFILTER enabled, the following warning is generated
on module load:
[ 324.701677] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:578
[ 324.701684] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1582, name: NetworkManager
[ 324.701689] preempt_count: 201, expected: 0
[ 324.701693] RCU nest depth: 0, expected: 0
[ 324.701697] 2 locks held by NetworkManager/1582:
[ 324.701702] #0: ffffffff9f7be770 (rtnl_mutex){....}-{3:3}, at: rtnl_newlink+0x791/0x21e0
[ 324.701730] #1: ff1100216c380368 (_xmit_ETHER){....}-{2:2}, at: __dev_open+0x3f0/0x870
[ 324.701749] Preemption disabled at:
[ 324.701752] [<ffffffff9cd23b9d>] __dev_open+0x3dd/0x870
[ 324.701765] CPU: 30 UID: 0 PID: 1582 Comm: NetworkManager Not tainted 6.15.0-rc5+ #2 PREEMPT(voluntary)
[ 324.701771] Hardware name: Intel Corporation M50FCP2SBSTD/M50FCP2SBSTD, BIOS SE5C741.86B.01.01.0001.2211140926 11/14/2022
[ 324.701774] Call Trace:
[ 324.701777] <TASK>
[ 324.701779] dump_stack_lvl+0x5d/0x80
[ 324.701788] ? __dev_open+0x3dd/0x870
[ 324.701793] __might_resched.cold+0x1ef/0x23d
<..>
[ 324.701818] __mutex_lock+0x113/0x1b80
<..>
[ 324.701917] idpf_ctlq_clean_sq+0xad/0x4b0 [idpf]
[ 324.701935] ? kasan_save_track+0x14/0x30
[ 324.701941] idpf_mb_clean+0x143/0x380 [idpf]
<..>
[ 324.701991] idpf_send_mb_msg+0x111/0x720 [idpf]
[ 324.702009] idpf_vc_xn_exec+0x4cc/0x990 [idpf]
[ 324.702021] ? rcu_is_watching+0x12/0xc0
[ 324.702035] idpf_add_del_mac_filters+0x3ed/0xb50 [idpf]
<..>
[ 324.702122] __hw_addr_sync_dev+0x1cf/0x300
[ 324.702126] ? find_held_lock+0x32/0x90
[ 324.702134] idpf_set_rx_mode+0x317/0x390 [idpf]
[ 324.702152] __dev_open+0x3f8/0x870
[ 324.702159] ? __pfx___dev_open+0x10/0x10
[ 324.702174] __dev_change_flags+0x443/0x650
<..>
[ 324.702208] netif_change_flags+0x80/0x160
[ 324.702218] do_setlink.isra.0+0x16a0/0x3960
<..>
[ 324.702349] rtnl_newlink+0x12fd/0x21e0
The sequence is as follows:
rtnl_newlink()->
__dev_change_flags()->
__dev_open()->
dev_set_rx_mode() - > # disables BH and grabs "dev->addr_list_lock"
idpf_set_rx_mode() -> # proceed only if VIRTCHNL2_CAP_MACFILTER is ON
__dev_uc_sync() ->
idpf_add_mac_filter ->
idpf_add_del_mac_filters ->
idpf_send_mb_msg() ->
idpf_mb_clean() ->
idpf_ctlq_clean_sq() # mutex_lock(cq_lock)
Fix by converting cq_lock to a spinlock. All operations under the new
lock are safe except freeing the DMA memory, which may use vunmap(). Fix
by requesting a contiguous physical memory for the DMA mapping.
Fixes: a251eee62133 ("idpf: add SRIOV support and other ndo_ops")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Returning -EOPNOTSUPP from function returning u32 is leading to
cast and invalid size value as a result.
-EOPNOTSUPP as a size probably will lead to allocation fail.
Command: ethtool -x eth0
It is visible on all devices that don't have RSS caps set.
[ 136.615917] Call Trace:
[ 136.615921] <TASK>
[ 136.615927] ? __warn+0x89/0x130
[ 136.615942] ? __alloc_frozen_pages_noprof+0x322/0x330
[ 136.615953] ? report_bug+0x164/0x190
[ 136.615968] ? handle_bug+0x58/0x90
[ 136.615979] ? exc_invalid_op+0x17/0x70
[ 136.615987] ? asm_exc_invalid_op+0x1a/0x20
[ 136.616001] ? rss_prepare_get.constprop.0+0xb9/0x170
[ 136.616016] ? __alloc_frozen_pages_noprof+0x322/0x330
[ 136.616028] __alloc_pages_noprof+0xe/0x20
[ 136.616038] ___kmalloc_large_node+0x80/0x110
[ 136.616072] __kmalloc_large_node_noprof+0x1d/0xa0
[ 136.616081] __kmalloc_noprof+0x32c/0x4c0
[ 136.616098] ? rss_prepare_get.constprop.0+0xb9/0x170
[ 136.616105] rss_prepare_get.constprop.0+0xb9/0x170
[ 136.616114] ethnl_default_doit+0x107/0x3d0
[ 136.616131] genl_family_rcv_msg_doit+0x100/0x160
[ 136.616147] genl_rcv_msg+0x1b8/0x2c0
[ 136.616156] ? __pfx_ethnl_default_doit+0x10/0x10
[ 136.616168] ? __pfx_genl_rcv_msg+0x10/0x10
[ 136.616176] netlink_rcv_skb+0x58/0x110
[ 136.616186] genl_rcv+0x28/0x40
[ 136.616195] netlink_unicast+0x19b/0x290
[ 136.616206] netlink_sendmsg+0x222/0x490
[ 136.616215] __sys_sendto+0x1fd/0x210
[ 136.616233] __x64_sys_sendto+0x24/0x30
[ 136.616242] do_syscall_64+0x82/0x160
[ 136.616252] ? __sys_recvmsg+0x83/0xe0
[ 136.616265] ? syscall_exit_to_user_mode+0x10/0x210
[ 136.616275] ? do_syscall_64+0x8e/0x160
[ 136.616282] ? __count_memcg_events+0xa1/0x130
[ 136.616295] ? count_memcg_events.constprop.0+0x1a/0x30
[ 136.616306] ? handle_mm_fault+0xae/0x2d0
[ 136.616319] ? do_user_addr_fault+0x379/0x670
[ 136.616328] ? clear_bhb_loop+0x45/0xa0
[ 136.616340] ? clear_bhb_loop+0x45/0xa0
[ 136.616349] ? clear_bhb_loop+0x45/0xa0
[ 136.616359] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 136.616369] RIP: 0033:0x7fd30ba7b047
[ 136.616376] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 80 3d bd d5 0c 00 00 41 89 ca 74 10 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 71 c3 55 48 83 ec 30 44 89 4c 24 2c 4c 89 44
[ 136.616381] RSP: 002b:00007ffde1796d68 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[ 136.616388] RAX: ffffffffffffffda RBX: 000055d7bd89f2a0 RCX: 00007fd30ba7b047
[ 136.616392] RDX: 0000000000000028 RSI: 000055d7bd89f3b0 RDI: 0000000000000003
[ 136.616396] RBP: 00007ffde1796e10 R08: 00007fd30bb4e200 R09: 000000000000000c
[ 136.616399] R10: 0000000000000000 R11: 0000000000000202 R12: 000055d7bd89f340
[ 136.616403] R13: 000055d7bd89f3b0 R14: 000055d78943f200 R15: 0000000000000000
Fixes: 02cbfba1add5 ("idpf: add ethtool callbacks")
Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Commit d53320aeef18 ("dt-bindings: net: Rename
renesas,r9a09g057-gbeth.yaml") renames the net devicetree binding
renesas,r9a09g057-gbeth.yaml to renesas,rzv2h-gbeth.yaml, but misses to
adjust the file entry in the RENESAS RZ/V2H(P) DWMAC GBETH GLUE LAYER
DRIVER section in MAINTAINERS.
Adjust the file entry after this file renaming.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://patch.msgid.link/20250627134453.51780-1-lukas.bulwahn@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Remove redundant netif_napi_del() call from disconnect path.
A WARN may be triggered in __netif_napi_del_locked() during USB device
disconnect:
WARNING: CPU: 0 PID: 11 at net/core/dev.c:7417 __netif_napi_del_locked+0x2b4/0x350
This happens because netif_napi_del() is called in the disconnect path while
NAPI is still enabled. However, it is not necessary to call netif_napi_del()
explicitly, since unregister_netdev() will handle NAPI teardown automatically
and safely. Removing the redundant call avoids triggering the warning.
Full trace:
lan78xx 1-1:1.0 enu1: Failed to read register index 0x000000c4. ret = -ENODEV
lan78xx 1-1:1.0 enu1: Failed to set MAC down with error -ENODEV
lan78xx 1-1:1.0 enu1: Link is Down
lan78xx 1-1:1.0 enu1: Failed to read register index 0x00000120. ret = -ENODEV
------------[ cut here ]------------
WARNING: CPU: 0 PID: 11 at net/core/dev.c:7417 __netif_napi_del_locked+0x2b4/0x350
Modules linked in: flexcan can_dev fuse
CPU: 0 UID: 0 PID: 11 Comm: kworker/0:1 Not tainted 6.16.0-rc2-00624-ge926949dab03 #9 PREEMPT
Hardware name: SKOV IMX8MP CPU revC - bd500 (DT)
Workqueue: usb_hub_wq hub_event
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __netif_napi_del_locked+0x2b4/0x350
lr : __netif_napi_del_locked+0x7c/0x350
sp : ffffffc085b673c0
x29: ffffffc085b673c0 x28: ffffff800b7f2000 x27: ffffff800b7f20d8
x26: ffffff80110bcf58 x25: ffffff80110bd978 x24: 1ffffff0022179eb
x23: ffffff80110bc000 x22: ffffff800b7f5000 x21: ffffff80110bc000
x20: ffffff80110bcf38 x19: ffffff80110bcf28 x18: dfffffc000000000
x17: ffffffc081578940 x16: ffffffc08284cee0 x15: 0000000000000028
x14: 0000000000000006 x13: 0000000000040000 x12: ffffffb0022179e8
x11: 1ffffff0022179e7 x10: ffffffb0022179e7 x9 : dfffffc000000000
x8 : 0000004ffdde8619 x7 : ffffff80110bcf3f x6 : 0000000000000001
x5 : ffffff80110bcf38 x4 : ffffff80110bcf38 x3 : 0000000000000000
x2 : 0000000000000000 x1 : 1ffffff0022179e7 x0 : 0000000000000000
Call trace:
__netif_napi_del_locked+0x2b4/0x350 (P)
lan78xx_disconnect+0xf4/0x360
usb_unbind_interface+0x158/0x718
device_remove+0x100/0x150
device_release_driver_internal+0x308/0x478
device_release_driver+0x1c/0x30
bus_remove_device+0x1a8/0x368
device_del+0x2e0/0x7b0
usb_disable_device+0x244/0x540
usb_disconnect+0x220/0x758
hub_event+0x105c/0x35e0
process_one_work+0x760/0x17b0
worker_thread+0x768/0xce8
kthread+0x3bc/0x690
ret_from_fork+0x10/0x20
irq event stamp: 211604
hardirqs last enabled at (211603): [<ffffffc0828cc9ec>] _raw_spin_unlock_irqrestore+0x84/0x98
hardirqs last disabled at (211604): [<ffffffc0828a9a84>] el1_dbg+0x24/0x80
softirqs last enabled at (211296): [<ffffffc080095f10>] handle_softirqs+0x820/0xbc8
softirqs last disabled at (210993): [<ffffffc080010288>] __do_softirq+0x18/0x20
---[ end trace 0000000000000000 ]---
lan78xx 1-1:1.0 enu1: failed to kill vid 0081/0
Fixes: ec4c7e12396b ("lan78xx: Introduce NAPI polling support")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20250627051346.276029-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There is a occasional problem that ping is failed between AML devices.
That is because the manual enablement of the security Tx path on the
hardware is missing, no matter what its previous state was.
Fixes: 6f8b4c01a8cd ("net: txgbe: Implement PHYLINK for AML 25G/10G devices")
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/5BDFB14C57D1C42A+20250626085153.86122-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
I have been working on mlx5 related code for several years,
contributing features, code reviews, and occasional maintainer tasks
when needed. This patch makes my maintainer role official.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://patch.msgid.link/20250627014252.1262592-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add FALLOC_FL_ALLOCATE_RANGE to the set of supported fallocate flags in
XFS_FALLOC_FL_SUPPORTED. This change improves code clarity and maintains
by explicitly showing this flag in the supported flags mask.
Note that since FALLOC_FL_ALLOCATE_RANGE is defined as 0x00, this addition
has no functional modifications.
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
io_uring marks a request as dealing with a regular file on S_ISREG. This
drives things like retries on short reads or writes, which is generally
not expected on a regular file (or bdev). Applications tend to not
expect that, so io_uring tries hard to ensure it doesn't deliver short
IO on regular files.
However, a recent commit added S_IFREG to anonymous inodes. When
io_uring is used to read from various things that are backed by anon
inodes, like eventfd, timerfd, etc, then it'll now all of a sudden wait
for more data when rather than deliver what was read or written in a
single operation. This breaks applications that issue reads on anon
inodes, if they ask for more data than a single read delivers.
Add a check for !S_ANON_INODE as well before setting REQ_F_ISREG to
prevent that.
Cc: Christian Brauner <brauner@kernel.org>
Cc: stable@vger.kernel.org
Link: https://github.com/ghostty-org/ghostty/discussions/7720
Fixes: cfd86ef7e8e7 ("anon_inode: use a proper mode internally")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
According to Documentation/PCI/pci-error-recovery.rst only the
error_detected() callback in the err_handler struct is mandatory for
a driver to support error recovery. So far s390's error recovery chose
a stricter approach also requiring slot_reset() and resume().
Relax this requirement and only require error_detected(). If a callback
is not implemented EEH and AER treat this as PCI_ERS_RESULT_NONE. This
return value is otherwise used by drivers abstaining from their vote
on how to proceed with recovery and currently also not supported by
s390's recovery code.
So to support missing callbacks in-line with other implementors of the
recovery flow, also handle PCI_ERS_RESULT_NONE. Since s390 only does per
PCI function recovery and does not do voting, treat PCI_ERS_RESULT_NONE
optimistically and proceed through recovery unless other failures
prevent this.
Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
Reviewed-by: Julian Ruess <julianr@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
If a device is disabled unblocking load/store on its own is not useful
as a full re-enable of the function is necessary anyway. Note that SCLP
Write Event Data Action Qualifier 0 (Reset) leaves the device disabled
and triggers this case unless the driver already requests a reset.
Cc: stable@vger.kernel.org
Fixes: 4cdf2f4e24ff ("s390/pci: implement minimal PCI error recovery")
Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
The error event information for PCI error events contains a function
handle for the respective function. This handle is generally captured at
the time the error event was recorded. Due to delays in processing or
cascading issues, it may happen that during firmware recovery multiple
events are generated. When processing these events in order Linux may
already have recovered an affected function making the event information
stale. Fix this by doing an unconditional CLP List PCI function
retrieving the current function handle with the zdev->state_lock held
and ignoring the event if its function handle is stale.
Cc: stable@vger.kernel.org
Fixes: 4cdf2f4e24ff ("s390/pci: implement minimal PCI error recovery")
Reviewed-by: Julian Ruess <julianr@linux.ibm.com>
Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
The driver registered xdp_rxq_info structures via xdp_rxq_info_reg()
but failed to properly unregister them in error paths and during
removal.
Fixes: d678be1dc1ec ("dpaa2-eth: add XDP_REDIRECT support")
Signed-off-by: Fushuai Wang <wangfushuai@baidu.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://patch.msgid.link/20250626133003.80136-1-wangfushuai@baidu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The `dma_map_XXX()` functions can fail and must be checked using
`dma_mapping_error()`. This patch adds proper error handling for all
DMA mapping calls.
In `atl1_alloc_rx_buffers()`, if DMA mapping fails, the buffer is
deallocated and marked accordingly.
In `atl1_tx_map()`, previously mapped buffers are unmapped and the
packet is dropped on failure.
If `atl1_xmit_frame()` drops the packet, increment the tx_error counter.
Fixes: f3cc28c79760 ("Add Attansic L1 ethernet driver.")
Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com>
Link: https://patch.msgid.link/20250625141629.114984-2-fourier.thomas@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
To enable TLS ulp socket needs to be in established state.
This was added in commit d91c3e17f75f ("net/tls: Only attach
to sockets in ESTABLISHED state"), in 2018.
Signed-off-by: Ulrich Weber <ulrich.weber@gmail.com>
Link: https://patch.msgid.link/20250626145618.15464-1-ulrich.weber@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
"Co-posting selftests" belongs in the "netdev patch review" section,
same as "co-posting changes to user space components". It was
erroneously added as its own section.
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://patch.msgid.link/20250626182055.4161905-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Due to changes of my responsibilities within IBM i
can no longer act as maintainer for smc.
As a result of the co-operation with Alibaba over
the last years we decided to, once more, give them
more responsibility for smc by appointing
D. Wythe <alibuda@linux.alibaba.com> and
Dust Li <dust.li@linux.alibaba.com>
as maintainers as well.
Within IBM Sidraya Jayagond <sidraya@linux.ibm.com>
and Mahanta Jambigi <mjambigi@linux.ibm.com>
are going to take over the maintainership for smc.
Signed-off-by: Jan Karcher <jaka@linux.ibm.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Link: https://patch.msgid.link/20250626051653.4259-1-jaka@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If the processing of the tr->events loop fails, the filter that has been
added to filter_head will be released twice in free_filter_list(&head->rcu)
and __free_filter(filter).
After adding the filter of tr->events, add the filter to the filter_head
process to avoid triggering uaf.
Link: https://lore.kernel.org/tencent_4EF87A626D702F816CD0951CE956EC32CD0A@qq.com
Fixes: a9d0aab5eb33 ("tracing: Fix regression of filter waiting a long time on RCU synchronization")
Reported-by: syzbot+daba72c4af9915e9c894@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=daba72c4af9915e9c894
Tested-by: syzbot+daba72c4af9915e9c894@syzkaller.appspotmail.com
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Currently, for controllers with extended advertising, the advertising
data is set in the asynchronous response handler for extended
adverstising params. As most advertising settings are performed in a
synchronous context, the (asynchronous) setting of the advertising data
is done too late (after enabling the advertising).
Move setting of adverstising data from asynchronous response handler
into synchronous context to fix ordering of HCI commands.
Signed-off-by: Christian Eggers <ceggers@arri.de>
Fixes: a0fb3726ba55 ("Bluetooth: Use Set ext adv/scan rsp data if controller supports")
Cc: stable@vger.kernel.org
v2: https://lore.kernel.org/linux-bluetooth/20250626115209.17839-1-ceggers@arri.de/
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
The unconditional call of hci_disable_advertising_sync() in
mesh_send_done_sync() also disables other LE advertisings (non mesh
related).
I am not sure whether this call is required at all, but checking the
adv_instances list (like done at other places) seems to solve the
problem.
Fixes: b338d91703fa ("Bluetooth: Implement support for Mesh")
Cc: stable@vger.kernel.org
Signed-off-by: Christian Eggers <ceggers@arri.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
According to the message of commit b338d91703fa ("Bluetooth: Implement
support for Mesh"), MGMT_OP_SET_MESH_RECEIVER should set the passive scan
parameters. Currently the scan interval and window parameters are
silently ignored, although user space (bluetooth-meshd) expects that
they can be used [1]
[1] https://git.kernel.org/pub/scm/bluetooth/bluez.git/tree/mesh/mesh-io-mgmt.c#n344
Fixes: b338d91703fa ("Bluetooth: Implement support for Mesh")
Cc: stable@vger.kernel.org
Signed-off-by: Christian Eggers <ceggers@arri.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
This reverts minor parts of the changes made in commit b338d91703fa
("Bluetooth: Implement support for Mesh"). It looks like these changes
were only made for development purposes but shouldn't have been part of
the commit.
Fixes: b338d91703fa ("Bluetooth: Implement support for Mesh")
Cc: stable@vger.kernel.org
Signed-off-by: Christian Eggers <ceggers@arri.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
We are setting the parent directory's last_unlink_trans directly which
may result in a concurrent task starting to log the directory not see the
update and therefore can log the directory after we removed a child
directory which had a snapshot within instead of falling back to a
transaction commit. Replaying such a log tree would result in a mount
failure since we can't currently delete snapshots (and subvolumes) during
log replay. This is the type of failure described in commit 1ec9a1ae1e30
("Btrfs: fix unreplayable log after snapshot delete + parent dir fsync").
Fix this by using btrfs_record_snapshot_destroy() which updates the
last_unlink_trans field while holding the inode's log_mutex lock.
Fixes: 44f714dae50a ("Btrfs: improve performance on fsync against new inode after rename/unlink")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
In case the removed directory had a snapshot that was deleted, we are
propagating its inode's last_unlink_trans to the parent directory after
we removed the entry from the parent directory. This leaves a small race
window where someone can log the parent directory after we removed the
entry and before we updated last_unlink_trans, and as a result if we ever
try to replay such a log tree, we will fail since we will attempt to
remove a snapshot during log replay, which is currently not possible and
results in the log replay (and mount) to fail. This is the type of failure
described in commit 1ec9a1ae1e30 ("Btrfs: fix unreplayable log after
snapshot delete + parent dir fsync").
So fix this by propagating the last_unlink_trans to the parent directory
before we remove the entry from it.
Fixes: 44f714dae50a ("Btrfs: improve performance on fsync against new inode after rename/unlink")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Instead of recording that a new subvolume was created in a directory after
we add the entry do the directory, record it before adding the entry. This
is to avoid races where after creating the entry and before recording the
new subvolume in the directory (the call to btrfs_record_new_subvolume()),
another task logs the directory, so we end up with a log tree where we
logged a directory that has an entry pointing to a root that was not yet
committed, resulting in an invalid entry if the log is persisted and
replayed later due to a power failure or crash.
Also state this requirement in the function comment for
btrfs_record_new_subvolume(), similar to what we do for the
btrfs_record_unlink_dir() and btrfs_record_snapshot_destroy().
Fixes: 45c4102f0d82 ("btrfs: avoid transaction commit on any fsync after subvolume creation")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When replaying log trees we use read_one_inode() to get an inode, which is
just a wrapper around btrfs_iget_logging(), which in turn is a wrapper for
btrfs_iget(). But read_one_inode() always returns NULL for any error
that btrfs_iget_logging() / btrfs_iget() may return and this is a problem
because:
1) In many callers of read_one_inode() we convert the NULL into -EIO,
which is not accurate since btrfs_iget() may return -ENOMEM and -ENOENT
for example, besides -EIO and other errors. So during log replay we
may end up reporting a false -EIO, which is confusing since we may
not have had any IO error at all;
2) When replaying directory deletes, at replay_dir_deletes(), we assume
the NULL returned from read_one_inode() means that the inode doesn't
exist and then proceed as if no error had happened. This is wrong
because unless btrfs_iget() returned ERR_PTR(-ENOENT), we had an
actual error and the target inode may exist in the target subvolume
root - this may later result in the log replay code failing at a
later stage (if we are "lucky") or succeed but leaving some
inconsistency in the filesystem.
So fix this by not ignoring errors from btrfs_iget_logging() and as
a consequence remove the read_one_inode() wrapper and just use
btrfs_iget_logging() directly. Also since btrfs_iget_logging() is
supposed to be called only against subvolume roots, just like
read_one_inode() which had a comment about it, add an assertion to
btrfs_iget_logging() to check that the target root corresponds to a
subvolume root.
Fixes: 5d4f98a28c7d ("Btrfs: Mixed back reference (FORWARD ROLLING FORMAT CHANGE)")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
At __inode_add_ref() when processing extrefs, if we jump into the next
label we have an undefined value of victim_name.len, since we haven't
initialized it before we did the goto. This results in an invalid memory
access in the next iteration of the loop since victim_name.len was not
initialized to the length of the name of the current extref.
Fix this by initializing victim_name.len with the current extref's name
length.
Fixes: e43eec81c516 ("btrfs: use struct qstr instead of name and namelen pairs")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
During log replay, at __add_inode_ref(), when we are searching for inode
ref keys we totally ignore if btrfs_search_slot() returns an error. This
may make a log replay succeed when there was an actual error and leave
some metadata inconsistency in a subvolume tree. Fix this by checking if
an error was returned from btrfs_search_slot() and if so, return it to
the caller.
Fixes: e02119d5a7b4 ("Btrfs: Add a write ahead tree log to optimize synchronous operations")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
If we are rebuilding a free space tree, while modifying the free space
tree we may need to allocate a new metadata block group.
If we end up using multiple transactions for the rebuild, when we call
btrfs_end_transaction() we enter btrfs_create_pending_block_groups()
which calls add_block_group_free_space() to add items to the free space
tree for the block group.
Then later during the free space tree rebuild, at
btrfs_rebuild_free_space_tree(), we may find such new block groups
and call populate_free_space_tree() for them, which fails with -EEXIST
because there are already items in the free space tree. Then we abort the
transaction with -EEXIST at btrfs_rebuild_free_space_tree().
Notice that we say "may find" the new block groups because a new block
group may be inserted in the block groups rbtree, which is being iterated
by the rebuild process, before or after the current node where the rebuild
process is currently at.
Syzbot recently reported such case which produces a trace like the
following:
------------[ cut here ]------------
BTRFS: Transaction aborted (error -17)
WARNING: CPU: 1 PID: 7626 at fs/btrfs/free-space-tree.c:1341 btrfs_rebuild_free_space_tree+0x470/0x54c fs/btrfs/free-space-tree.c:1341
Modules linked in:
CPU: 1 UID: 0 PID: 7626 Comm: syz.2.25 Not tainted 6.15.0-rc7-syzkaller-00085-gd7fa1af5b33e-dirty #0 PREEMPT
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : btrfs_rebuild_free_space_tree+0x470/0x54c fs/btrfs/free-space-tree.c:1341
lr : btrfs_rebuild_free_space_tree+0x470/0x54c fs/btrfs/free-space-tree.c:1341
sp : ffff80009c4f7740
x29: ffff80009c4f77b0 x28: ffff0000d4c3f400 x27: 0000000000000000
x26: dfff800000000000 x25: ffff70001389eee8 x24: 0000000000000003
x23: 1fffe000182b6e7b x22: 0000000000000000 x21: ffff0000c15b73d8
x20: 00000000ffffffef x19: ffff0000c15b7378 x18: 1fffe0003386f276
x17: ffff80008f31e000 x16: ffff80008adbe98c x15: 0000000000000001
x14: 1fffe0001b281550 x13: 0000000000000000 x12: 0000000000000000
x11: ffff60001b281551 x10: 0000000000000003 x9 : 1c8922000a902c00
x8 : 1c8922000a902c00 x7 : ffff800080485878 x6 : 0000000000000000
x5 : 0000000000000001 x4 : 0000000000000001 x3 : ffff80008047843c
x2 : 0000000000000001 x1 : ffff80008b3ebc40 x0 : 0000000000000001
Call trace:
btrfs_rebuild_free_space_tree+0x470/0x54c fs/btrfs/free-space-tree.c:1341 (P)
btrfs_start_pre_rw_mount+0xa78/0xe10 fs/btrfs/disk-io.c:3074
btrfs_remount_rw fs/btrfs/super.c:1319 [inline]
btrfs_reconfigure+0x828/0x2418 fs/btrfs/super.c:1543
reconfigure_super+0x1d4/0x6f0 fs/super.c:1083
do_remount fs/namespace.c:3365 [inline]
path_mount+0xb34/0xde0 fs/namespace.c:4200
do_mount fs/namespace.c:4221 [inline]
__do_sys_mount fs/namespace.c:4432 [inline]
__se_sys_mount fs/namespace.c:4409 [inline]
__arm64_sys_mount+0x3e8/0x468 fs/namespace.c:4409
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x58/0x17c arch/arm64/kernel/entry-common.c:767
el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:786
el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:600
irq event stamp: 330
hardirqs last enabled at (329): [<ffff80008048590c>] raw_spin_rq_unlock_irq kernel/sched/sched.h:1525 [inline]
hardirqs last enabled at (329): [<ffff80008048590c>] finish_lock_switch+0xb0/0x1c0 kernel/sched/core.c:5130
hardirqs last disabled at (330): [<ffff80008adb9e60>] el1_dbg+0x24/0x80 arch/arm64/kernel/entry-common.c:511
softirqs last enabled at (10): [<ffff8000801fbf10>] local_bh_enable+0x10/0x34 include/linux/bottom_half.h:32
softirqs last disabled at (8): [<ffff8000801fbedc>] local_bh_disable+0x10/0x34 include/linux/bottom_half.h:19
---[ end trace 0000000000000000 ]---
Fix this by flagging new block groups which had their free space tree
entries already added and then skip them in the rebuild process. Also,
since the rebuild may be triggered when doing a remount, make sure that
when we clear an existing free space tree that we clear such flag from
every existing block group, otherwise we would skip those block groups
during the rebuild.
Reported-by: syzbot+d0014fb0fc39c5487ae5@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/68460a54.050a0220.daf97.0af5.GAE@google.com/
Fixes: 882af9f13e83 ("btrfs: handle free space tree rebuild in multiple transactions")
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|