Age | Commit message (Collapse) | Author | Files | Lines |
|
syzbot reported one use-after-free in pfifo_fast_enqueue() [1]
Issue here is that we can not reuse skb after a successful skb_array_produce()
since another cpu might have consumed it already.
I believe a similar problem exists in try_bulk_dequeue_skb_slow()
in case we put an skb into qdisc_enqueue_skb_bad_txq() for lockless qdisc.
[1]
BUG: KASAN: use-after-free in qdisc_pkt_len include/net/sch_generic.h:610 [inline]
BUG: KASAN: use-after-free in qdisc_qstats_cpu_backlog_inc include/net/sch_generic.h:712 [inline]
BUG: KASAN: use-after-free in pfifo_fast_enqueue+0x4bc/0x5e0 net/sched/sch_generic.c:639
Read of size 4 at addr ffff8801cede37e8 by task syzkaller717588/5543
CPU: 1 PID: 5543 Comm: syzkaller717588 Not tainted 4.16.0-rc4+ #265
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x24d lib/dump_stack.c:53
print_address_description+0x73/0x250 mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report+0x23c/0x360 mm/kasan/report.c:412
__asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
qdisc_pkt_len include/net/sch_generic.h:610 [inline]
qdisc_qstats_cpu_backlog_inc include/net/sch_generic.h:712 [inline]
pfifo_fast_enqueue+0x4bc/0x5e0 net/sched/sch_generic.c:639
__dev_xmit_skb net/core/dev.c:3216 [inline]
Fixes: c5ad119fb6c0 ("net: sched: pfifo_fast use skb_array")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot+ed43b6903ab968b16f54@syzkaller.appspotmail.com
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Checking for 0 is insufficient: when an SKB without a batadv header, but
with a VLAN header is received, hdr_size will be 4, making the following
code interpret the Ethernet header as a batadv header.
Fixes: be1db4f6615b ("batman-adv: make the Distributed ARP Table vlan aware")
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
|
|
batadv_check_unicast_ttvn() calls skb_cow(), so pointers into the SKB data
must be (re)set after calling it. The ethhdr variable is dropped
altogether.
Fixes: 7cdcf6dddc42 ("batman-adv: add UNICAST_4ADDR packet type")
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
|
|
The MDIO busses are switch properties and so should be inside the
switch node. Fix the examples in the binding document.
Reported-by: 尤晓杰 <yxj790222@163.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Fixes: a3c53be55c95 ("net: dsa: mv88e6xxx: Support multiple MDIO busses")
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When errors are enqueued to the error queue via sock_queue_err_skb()
function, it is possible that the waiting application is not notified.
Calling 'sk->sk_data_ready()' would not notify applications that
selected only POLLERR events in poll() (for example).
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Randy E. Witt <randy.e.witt@intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
nlmsg_multicast() consumes always the skb, thus the original skb must be
freed only when this function is called with a clone.
Fixes: cb9f7a9a5c96 ("netlink: ensure to loop over all netns in genlmsg_multicast_allns()")
Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Link updates were not reported to qedr correctly.
Leading to cases where a link could be down, but qedr
would see it as up.
In addition, once qede was loaded, link state would be up,
regardless of the actual link state.
Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
FW workaround. The iWARP LL2 connection did not expect TCP packets
to arrive on it's connection. The fix drops any non-tcp packets
Fixes b5c29ca ("qed: iWARP CM - setup a ll2 connection for handling
SYN packets")
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is a corner case in the MPA unalign flow where a FPDU header is
split over two tcp segments. The length of the first fragment in this
case was not initialized properly and should be '1'
Fixes: c7d1d839 ("qed: Add support for MPA header being split over two tcp packets")
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Free memory by calling put_device(), if afiucv_iucv_init is not
successful.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is no need for complex checking between the last consumed index
and current consumed index, a simple subtraction will do.
This also eliminates the possibility of a permanent transmit queue stall
under the following conditions:
- one CPU bursts ring->size worth of traffic (up to 256 buffers), to the
point where we run out of free descriptors, so we stop the transmit
queue at the end of bcm_sysport_xmit()
- because of our locking, we have the transmit process disable
interrupts which means we can be blocking the TX reclamation process
- when TX reclamation finally runs, we will be computing the difference
between ring->c_index (last consumed index by SW) and what the HW
reports through its register
- this register is masked with (ring->size - 1) = 0xff, which will lead
to stripping the upper bits of the index (register is 16-bits wide)
- we will be computing last_tx_cn as 0, which means there is no work to
be done, and we never wake-up the transmit queue, leaving it
permanently disabled
A practical example is e.g: ring->c_index aka last_c_index = 12, we
pushed 256 entries, HW consumer index = 268, we mask it with 0xff = 12,
so last_tx_cn == 0, nothing happens.
Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Need to lock lower socket in order to provide mutual exclusion
with kcm_unattach.
v2: Add Reported-by for syzbot
Fixes: ab7ac4eb9832e32a09f4e804 ("kcm: Kernel Connection Multiplexor module")
Reported-by: syzbot+ea75c0ffcd353d32515f064aaebefc5279e6161e@syzkaller.appspotmail.com
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With reorder header off, received packets are untagged in skb_vlan_untag()
called from within __netif_receive_skb_core(), and later the tag will be
inserted back in vlan_do_receive().
This caused out of order vlan headers when we create a vlan device on top
of another vlan device, because vlan_do_receive() inserts a tag as the
outermost vlan tag. E.g. the outer tag is first removed in skb_vlan_untag()
and inserted back in vlan_do_receive(), then the inner tag is next removed
and inserted back as the outermost tag.
This patch fixes the behaviour by inserting the inner tag at the right
position.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When we have a bridge with vlan_filtering on and a vlan device on top of
it, packets would be corrupted in skb_vlan_untag() called from
br_dev_xmit().
The problem sits in skb_reorder_vlan_header() used in skb_vlan_untag(),
which makes use of skb->mac_len. In this function mac_len is meant for
handling rx path with vlan devices with reorder_header disabled, but in
tx path mac_len is typically 0 and cannot be used, which is the problem
in this case.
The current code even does not properly handle rx path (skb_vlan_untag()
called from __netif_receive_skb_core()) with reorder_header off actually.
In rx path single tag case, it works as follows:
- Before skb_reorder_vlan_header()
mac_header data
v v
+-------------------+-------------+------+----
| ETH | VLAN | ETH |
| ADDRS | TPID | TCI | TYPE |
+-------------------+-------------+------+----
<-------- mac_len --------->
<------------->
to be removed
- After skb_reorder_vlan_header()
mac_header data
v v
+-------------------+------+----
| ETH | ETH |
| ADDRS | TYPE |
+-------------------+------+----
<-------- mac_len --------->
This is ok, but in rx double tag case, it corrupts packets:
- Before skb_reorder_vlan_header()
mac_header data
v v
+-------------------+-------------+-------------+------+----
| ETH | VLAN | VLAN | ETH |
| ADDRS | TPID | TCI | TPID | TCI | TYPE |
+-------------------+-------------+-------------+------+----
<--------------- mac_len ---------------->
<------------->
should be removed
<--------------------------->
actually will be removed
- After skb_reorder_vlan_header()
mac_header data
v v
+-------------------+------+----
| ETH | ETH |
| ADDRS | TYPE |
+-------------------+------+----
<--------------- mac_len ---------------->
So, two of vlan tags are both removed while only inner one should be
removed and mac_header (and mac_len) is broken.
skb_vlan_untag() is meant for removing the vlan header at (skb->data - 2),
so use skb->data and skb->mac_header to calculate the right offset.
Reported-by: Brandon Carpenter <brandon.carpenter@cypherpath.com>
Fixes: a6e18ff11170 ("vlan: Fix untag operations of stacked vlans with REORDER_HEADER off")
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If set/unset mode of the tunnel_key action is not provided, ->init() still
returns 0, and the caller proceeds with bogus 'struct tc_action *' object,
this results in crash:
% tc actions add action tunnel_key src_ip 1.1.1.1 dst_ip 2.2.2.1 id 7 index 1
[ 35.805515] general protection fault: 0000 [#1] SMP PTI
[ 35.806161] Modules linked in: act_tunnel_key kvm_intel kvm irqbypass
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64
crypto_simd glue_helper cryptd serio_raw
[ 35.808233] CPU: 1 PID: 428 Comm: tc Not tainted 4.16.0-rc4+ #286
[ 35.808929] RIP: 0010:tcf_action_init+0x90/0x190
[ 35.809457] RSP: 0018:ffffb8edc068b9a0 EFLAGS: 00010206
[ 35.810053] RAX: 1320c000000a0003 RBX: 0000000000000001 RCX: 0000000000000000
[ 35.810866] RDX: 0000000000000070 RSI: 0000000000007965 RDI: ffffb8edc068b910
[ 35.811660] RBP: ffffb8edc068b9d0 R08: 0000000000000000 R09: ffffb8edc068b808
[ 35.812463] R10: ffffffffc02bf040 R11: 0000000000000040 R12: ffffb8edc068bb38
[ 35.813235] R13: 0000000000000000 R14: 0000000000000000 R15: ffffb8edc068b910
[ 35.814006] FS: 00007f3d0d8556c0(0000) GS:ffff91d1dbc40000(0000)
knlGS:0000000000000000
[ 35.814881] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 35.815540] CR2: 000000000043f720 CR3: 0000000019248001 CR4: 00000000001606a0
[ 35.816457] Call Trace:
[ 35.817158] tc_ctl_action+0x11a/0x220
[ 35.817795] rtnetlink_rcv_msg+0x23d/0x2e0
[ 35.818457] ? __slab_alloc+0x1c/0x30
[ 35.819079] ? __kmalloc_node_track_caller+0xb1/0x2b0
[ 35.819544] ? rtnl_calcit.isra.30+0xe0/0xe0
[ 35.820231] netlink_rcv_skb+0xce/0x100
[ 35.820744] netlink_unicast+0x164/0x220
[ 35.821500] netlink_sendmsg+0x293/0x370
[ 35.822040] sock_sendmsg+0x30/0x40
[ 35.822508] ___sys_sendmsg+0x2c5/0x2e0
[ 35.823149] ? pagecache_get_page+0x27/0x220
[ 35.823714] ? filemap_fault+0xa2/0x640
[ 35.824423] ? page_add_file_rmap+0x108/0x200
[ 35.825065] ? alloc_set_pte+0x2aa/0x530
[ 35.825585] ? finish_fault+0x4e/0x70
[ 35.826140] ? __handle_mm_fault+0xbc1/0x10d0
[ 35.826723] ? __sys_sendmsg+0x41/0x70
[ 35.827230] __sys_sendmsg+0x41/0x70
[ 35.827710] do_syscall_64+0x68/0x120
[ 35.828195] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 35.828859] RIP: 0033:0x7f3d0ca4da67
[ 35.829331] RSP: 002b:00007ffc9f284338 EFLAGS: 00000246 ORIG_RAX:
000000000000002e
[ 35.830304] RAX: ffffffffffffffda RBX: 00007ffc9f284460 RCX: 00007f3d0ca4da67
[ 35.831247] RDX: 0000000000000000 RSI: 00007ffc9f2843b0 RDI: 0000000000000003
[ 35.832167] RBP: 000000005aa6a7a9 R08: 0000000000000001 R09: 0000000000000000
[ 35.833075] R10: 00000000000005f1 R11: 0000000000000246 R12: 0000000000000000
[ 35.833997] R13: 00007ffc9f2884c0 R14: 0000000000000001 R15: 0000000000674640
[ 35.834923] Code: 24 30 bb 01 00 00 00 45 31 f6 eb 5e 8b 50 08 83 c2 07 83 e2
fc 83 c2 70 49 8b 07 48 8b 40 70 48 85 c0 74 10 48 89 14 24 4c 89 ff <ff> d0 48
8b 14 24 48 01 c2 49 01 d6 45 85 ed 74 05 41 83 47 2c
[ 35.837442] RIP: tcf_action_init+0x90/0x190 RSP: ffffb8edc068b9a0
[ 35.838291] ---[ end trace a095c06ee4b97a26 ]---
Fixes: d0f6dd8a914f ("net/sched: Introduce act_tunnel_key")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The IRQ output of the bcm bt-device is really a level IRQ signal, which
signals a logical high as long as the device's buffer contains data. Since
the draining in the buffer is done in the tty driver, we cannot (easily)
wait in a threaded interrupt handler for the draining, after which the
IRQ should go low again.
So instead we treat the IRQ as an edge interrupt. This opens the window
for a theoretical race where we wakeup, read some data and then autosuspend
*before* the IRQ has gone (logical) low, followed by the device just at
that moment receiving more data, causing the IRQ to stay high and we never
see an edge.
Since we call pm_runtime_mark_last_busy() on every received byte, there
should be plenty time for the IRQ to go (logical) low before we ever
suspend, so this should never happen, but after commit 43fff7683468
("Bluetooth: hci_bcm: Streamline runtime PM code"), which has been reverted
since, this was actually happening causing the device to get stuck in
runtime suspend.
The bcm bt-device actually has a workaround for this, if we set the
pulsed_host_wake flag in the sleep parameters, then the device monitors
if the host is draining the buffer and if not then after a timeout the
device will pulse the IRQ line, causing us to see an edge, fixing the
stuck in suspend condition.
This commit sets the pulsed_host_wake flag to fix the (mostly theoretical)
race caused by us treating the IRQ as an edge IRQ.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
This reverts commit 43fff7683468 ("Bluetooth: hci_bcm: Streamline runtime
PM code"). The commit msg for this commit states "No functional change
intended.", but replacing:
pm_runtime_get();
pm_runtime_mark_last_busy();
pm_runtime_put_autosuspend();
with:
pm_request_resume();
Does result in a functional change, pm_request_resume() only calls
pm_runtime_mark_last_busy() if the device was suspended before the call.
This results in the following happening:
1) Device is runtime suspended
2) Device drives host_wake IRQ logically high as it starts receiving data
3) bcm_host_wake() gets called, causes the device to runtime-resume,
current time gets marked as last_busy time
4) After 5 seconds the autosuspend timer expires and the dev autosuspends
as no one has been calling pm_runtime_mark_last_busy(), the device was
resumed during those 5 seconds, so all the pm_request_resume() calls
while receiving data and/or bcm_host_wake() calls were nops
5) If 4) happens while the device has (just received) data in its buffer to
be read by the host the IRQ line is *already* / still logically high
when we autosuspend and since we use an edge triggered IRQ, the IRQ
will never trigger, causing the device to get stuck in suspend
Therefor this commit has to be reverted, so that we avoid the device
getting stuck in suspend.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
The Atheros 1525/QCA6174 BT doesn't seem working properly on the
recent kernels, as it tries to load a wrong firmware
ar3k/AthrBT_0x00000200.dfu and it fails.
This seems to have been a problem for some time, and the known
workaround is to apply BTUSB_QCA_ROM quirk instead of BTUSB_ATH3012.
The device in question is:
T: Bus=01 Lev=01 Prnt=01 Port=09 Cnt=03 Dev#= 4 Spd=12 MxCh= 0
D: Ver= 1.10 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1
P: Vendor=0cf3 ProdID=3004 Rev= 0.01
C:* #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E: Ad=81(I) Atr=03(Int.) MxPS= 16 Ivl=1ms
E: Ad=82(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms
E: Ad=02(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E: Ad=83(I) Atr=01(Isoc) MxPS= 0 Ivl=1ms
E: Ad=03(O) Atr=01(Isoc) MxPS= 0 Ivl=1ms
I: If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E: Ad=83(I) Atr=01(Isoc) MxPS= 9 Ivl=1ms
E: Ad=03(O) Atr=01(Isoc) MxPS= 9 Ivl=1ms
I: If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E: Ad=83(I) Atr=01(Isoc) MxPS= 17 Ivl=1ms
E: Ad=03(O) Atr=01(Isoc) MxPS= 17 Ivl=1ms
I: If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E: Ad=83(I) Atr=01(Isoc) MxPS= 25 Ivl=1ms
E: Ad=03(O) Atr=01(Isoc) MxPS= 25 Ivl=1ms
I: If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E: Ad=83(I) Atr=01(Isoc) MxPS= 33 Ivl=1ms
E: Ad=03(O) Atr=01(Isoc) MxPS= 33 Ivl=1ms
I: If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E: Ad=83(I) Atr=01(Isoc) MxPS= 49 Ivl=1ms
E: Ad=03(O) Atr=01(Isoc) MxPS= 49 Ivl=1ms
Bugzilla: http://bugzilla.opensuse.org/show_bug.cgi?id=1082504
Reported-by: Ivan Levshin <ivan.levshin@microfocus.com>
Tested-by: Ivan Levshin <ivan.levshin@microfocus.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
Closing of a listen socket wakes up kernel_accept() of
smc_tcp_listen_worker(), and then has to wait till smc_tcp_listen_worker()
gives up the internal clcsock. The wait logic introduced with
commit 127f49705823 ("net/smc: release clcsock from tcp_listen_worker")
might wait longer than necessary. This patch implements the idea to
implement the wait just with flush_work(), and gets rid of the extra
smc_close_wait_listen_clcsock() function.
Fixes: 127f49705823 ("net/smc: release clcsock from tcp_listen_worker")
Reported-by: Hans Wippel <hwippel@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The NETIF_F_GSO_SOFTWARE implies support for GSO on SCTP, but the
sunvnet driver does not support GSO for sctp. Here we remove the
NETIF_F_GSO_SOFTWARE feature flag and only report NETIF_F_ALL_TSO
instead.
Signed-off-by: Cathy Zhou <Cathy.Zhou@Oracle.COM>
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The problem was introduced in commit
506b0a395f26 ("[netdrv] tg3: APE heartbeat changes"). The bug occurs
because tp->lock spinlock is held which is obtained in tg3_start
by way of tg3_full_lock(), line 11571. The documentation for usleep_range()
specifically states it cannot be used inside a spinlock.
Fixes: 506b0a395f26 ("[netdrv] tg3: APE heartbeat changes")
Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Prior to the rework of PMTU information storage in commit
2c8cec5c10bc ("ipv4: Cache learned PMTU information in inetpeer."),
when a PMTU event advertising a PMTU smaller than
net.ipv4.route.min_pmtu was received, we would disable setting the DF
flag on packets by locking the MTU metric, and set the PMTU to
net.ipv4.route.min_pmtu.
Since then, we don't disable DF, and set PMTU to
net.ipv4.route.min_pmtu, so the intermediate router that has this link
with a small MTU will have to drop the packets.
This patch reestablishes pre-2.6.39 behavior by splitting
rtable->rt_pmtu into a bitfield with rt_mtu_locked and rt_pmtu.
rt_mtu_locked indicates that we shouldn't set the DF bit on that path,
and is checked in ip_dont_fragment().
One possible workaround is to set net.ipv4.route.min_pmtu to a value low
enough to accommodate the lowest MTU encountered.
Fixes: 2c8cec5c10bc ("ipv4: Cache learned PMTU information in inetpeer.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The tx_errors counter is incremented by the dpaa_xmit caller.
Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The fd_format has already been initialized at this point.
Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The recent changes that make the driver probing compatible with DSA
were not propagated in the dpa_remove() function, breaking the
module unload function. Using the proper device to address the issue.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The wait_for_completion() call in qman_delete_cgr_safe()
was triggering a scheduling while atomic bug, replacing the
kthread with a smp_call_function_single() call to fix it.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: Roy Pledge <roy.pledge@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Andrei Vagin reported a KASAN: slab-out-of-bounds error in
skb_update_prio()
Since SYNACK might be attached to a request socket, we need to
get back to the listener socket.
Since this listener is manipulated without locks, add const
qualifiers to sock_cgroup_prioidx() so that the const can also
be used in skb_update_prio()
Also add the const qualifier to sock_cgroup_classid() for consistency.
Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
While waiting for the TX object to send an RTR, an external message with a
matching id can overwrite the TX data. In this case we must call the rx
routine and then try transmitting the message that was overwritten again.
The queue was being stalled because the RX event did not generate an
interrupt to wake up the queue again and the TX event did not happen
because the TXRQST flag is reset by the chip when new data is received.
According to the CC770 datasheet the id of a message object should not be
changed while the MSGVAL bit is set. This has been fixed by resetting the
MSGVAL bit before modifying the object in the transmit function and setting
it after. It is not enough to set & reset CPUUPD.
It is important to keep the MSGVAL bit reset while the message object is
being modified. Otherwise, during RTR transmission, a frame with matching
id could trigger an rx-interrupt, which would cause a race condition
between the interrupt routine and the transmit function.
Signed-off-by: Andri Yngvason <andri.yngvason@marel.com>
Tested-by: Richard Weinberger <richard@nod.at>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
This has been reported to cause stalls on rt-linux.
Suggested-by: Richard Weinberger <richard@nod.at>
Tested-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Andri Yngvason <andri.yngvason@marel.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
We're dereferencing "p_hwfn->p_rdma_info" but that is freed on the line
before in qed_rdma_resc_free(p_hwfn).
Fixes: 9de506a547c0 ("qed: Free RoCE ILT Memory on rmmod qedr")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
f7c83bcbfaf5 ("net: xfrm: use __this_cpu_read per-cpu helper") added a
__this_cpu_read() call inside ipcomp_alloc_tfms().
At the time, __this_cpu_read() required the caller to either not care
about races or to handle preemption/interrupt issues. 3.15 tightened
the rules around some per-cpu operations, and now __this_cpu_read()
should never be used in a preemptible context. On 3.15 and later, we
need to use this_cpu_read() instead.
syzkaller reported this leading to the following kernel BUG while
fuzzing sendmsg:
BUG: using __this_cpu_read() in preemptible [00000000] code: repro/3101
caller is ipcomp_init_state+0x185/0x990
CPU: 3 PID: 3101 Comm: repro Not tainted 4.16.0-rc4-00123-g86f84779d8e9 #154
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Call Trace:
dump_stack+0xb9/0x115
check_preemption_disabled+0x1cb/0x1f0
ipcomp_init_state+0x185/0x990
? __xfrm_init_state+0x876/0xc20
? lock_downgrade+0x5e0/0x5e0
ipcomp4_init_state+0xaa/0x7c0
__xfrm_init_state+0x3eb/0xc20
xfrm_init_state+0x19/0x60
pfkey_add+0x20df/0x36f0
? pfkey_broadcast+0x3dd/0x600
? pfkey_sock_destruct+0x340/0x340
? pfkey_seq_stop+0x80/0x80
? __skb_clone+0x236/0x750
? kmem_cache_alloc+0x1f6/0x260
? pfkey_sock_destruct+0x340/0x340
? pfkey_process+0x62a/0x6f0
pfkey_process+0x62a/0x6f0
? pfkey_send_new_mapping+0x11c0/0x11c0
? mutex_lock_io_nested+0x1390/0x1390
pfkey_sendmsg+0x383/0x750
? dump_sp+0x430/0x430
sock_sendmsg+0xc0/0x100
___sys_sendmsg+0x6c8/0x8b0
? copy_msghdr_from_user+0x3b0/0x3b0
? pagevec_lru_move_fn+0x144/0x1f0
? find_held_lock+0x32/0x1c0
? do_huge_pmd_anonymous_page+0xc43/0x11e0
? lock_downgrade+0x5e0/0x5e0
? get_kernel_page+0xb0/0xb0
? _raw_spin_unlock+0x29/0x40
? do_huge_pmd_anonymous_page+0x400/0x11e0
? __handle_mm_fault+0x553/0x2460
? __fget_light+0x163/0x1f0
? __sys_sendmsg+0xc7/0x170
__sys_sendmsg+0xc7/0x170
? SyS_shutdown+0x1a0/0x1a0
? __do_page_fault+0x5a0/0xca0
? lock_downgrade+0x5e0/0x5e0
SyS_sendmsg+0x27/0x40
? __sys_sendmsg+0x170/0x170
do_syscall_64+0x19f/0x640
entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x7f0ee73dfb79
RSP: 002b:00007ffe14fc15a8 EFLAGS: 00000207 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0ee73dfb79
RDX: 0000000000000000 RSI: 00000000208befc8 RDI: 0000000000000004
RBP: 00007ffe14fc15b0 R08: 00007ffe14fc15c0 R09: 00007ffe14fc15c0
R10: 0000000000000000 R11: 0000000000000207 R12: 0000000000400440
R13: 00007ffe14fc16b0 R14: 0000000000000000 R15: 0000000000000000
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
During the conversion to dsa_is_user_port(), a condition ended up being
reversed, which would prevent the creation of any user port when using
the legacy binding and/or platform data, fix that.
Fixes: 4a5b85ffe2a0 ("net: dsa: use dsa_is_user_port everywhere")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The l2tp_tunnel_create() function checks for v4mapped ipv6
sockets and cache that flag, so that l2tp core code can
reusing it at xmit time.
If the socket is provided by the userspace, the connection
status of the tunnel sockets can change between the tunnel
creation and the xmit call, so that syzbot is able to
trigger the following splat:
BUG: KASAN: use-after-free in ip6_dst_idev include/net/ip6_fib.h:192
[inline]
BUG: KASAN: use-after-free in ip6_xmit+0x1f76/0x2260
net/ipv6/ip6_output.c:264
Read of size 8 at addr ffff8801bd949318 by task syz-executor4/23448
CPU: 0 PID: 23448 Comm: syz-executor4 Not tainted 4.16.0-rc4+ #65
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x24d lib/dump_stack.c:53
print_address_description+0x73/0x250 mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report+0x23c/0x360 mm/kasan/report.c:412
__asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
ip6_dst_idev include/net/ip6_fib.h:192 [inline]
ip6_xmit+0x1f76/0x2260 net/ipv6/ip6_output.c:264
inet6_csk_xmit+0x2fc/0x580 net/ipv6/inet6_connection_sock.c:139
l2tp_xmit_core net/l2tp/l2tp_core.c:1053 [inline]
l2tp_xmit_skb+0x105f/0x1410 net/l2tp/l2tp_core.c:1148
pppol2tp_sendmsg+0x470/0x670 net/l2tp/l2tp_ppp.c:341
sock_sendmsg_nosec net/socket.c:630 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:640
___sys_sendmsg+0x767/0x8b0 net/socket.c:2046
__sys_sendmsg+0xe5/0x210 net/socket.c:2080
SYSC_sendmsg net/socket.c:2091 [inline]
SyS_sendmsg+0x2d/0x50 net/socket.c:2087
do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x453e69
RSP: 002b:00007f819593cc68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f819593d6d4 RCX: 0000000000453e69
RDX: 0000000000000081 RSI: 000000002037ffc8 RDI: 0000000000000004
RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000000004c3 R14: 00000000006f72e8 R15: 0000000000000000
This change addresses the issues:
* explicitly checking for TCP_ESTABLISHED for user space provided sockets
* dropping the v4mapped flag usage - it can become outdated - and
explicitly invoking ipv6_addr_v4mapped() instead
The issue is apparently there since ancient times.
v1 -> v2: (many thanks to Guillaume)
- with csum issue introduced in v1
- replace pr_err with pr_debug
- fix build issue with IPV6 disabled
- move l2tp_sk_is_v4mapped in l2tp_core.c
v2 -> v3:
- don't update inet_daddr for v4mapped address, unneeded
- drop rendundant check at creation time
Reported-and-tested-by: syzbot+92fa328176eb07e4ac1a@syzkaller.appspotmail.com
Fixes: 3557baabf280 ("[L2TP]: PPP over L2TP driver core")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On unsuccesful ip6_datagram_connect(), if the failure is caused by
ip6_datagram_dst_update(), the sk peer information are cleared, but
the sk->sk_state is preserved.
If the socket was already in an established status, the overall sk
status is inconsistent and fouls later checks in datagram code.
Fix this saving the old peer information and restoring them in
case of failure. This also aligns ipv6 datagram connect() behavior
with ipv4.
v1 -> v2:
- added missing Fixes tag
Fixes: 85cb73ff9b74 ("net: ipv6: reset daddr and dport in sk if connect() fails")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Alex reported the following race condition:
/* link goes up... interrupt... schedule watchdog */
\ e1000_watchdog_task
\ e1000e_has_link
\ hw->mac.ops.check_for_link() === e1000e_check_for_copper_link
\ e1000e_phy_has_link_generic(..., &link)
link = true
/* link goes down... interrupt */
\ e1000_msix_other
hw->mac.get_link_status = true
/* link is up */
mac->get_link_status = false
link_active = true
/* link_active is true, wrongly, and stays so because
* get_link_status is false */
Avoid this problem by making sure that we don't set get_link_status = false
after having checked the link.
It seems this problem has been present since the introduction of e1000e.
Link: https://lkml.org/lkml/2018/1/29/338
Reported-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
This reverts commit 19110cfbb34d4af0cdfe14cd243f3b09dc95b013.
This reverts commit 4110e02eb45ea447ec6f5459c9934de0a273fb91.
This reverts commit d3604515c9eda464a92e8e67aae82dfe07fe3c98.
Commit 19110cfbb34d ("e1000e: Separate signaling for link check/link up")
changed what happens to the link status when there is an error which
happens after "get_link_status = false" in the copper check_for_link
callbacks. Previously, such an error would be ignored and the link
considered up. After that commit, any error implies that the link is down.
Revert commit 19110cfbb34d ("e1000e: Separate signaling for link check/link
up") and its followups. After reverting, the race condition described in
the log of commit 19110cfbb34d is reintroduced. It may still be triggered
by LSC events but this should keep the link down in case the link is
electrically unstable, as discussed. The race may no longer be
triggered by RXO events because commit 4aea7a5c5e94 ("e1000e: Avoid
receiver overrun interrupt bursts") restored reading icr in the Other
handler.
Link: https://lkml.org/lkml/2018/3/1/789
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Since the kprobe which was optimized by jump can not change
the execution path, the kprobe for error-injection must not
be optimized. To prohibit it, set a dummy post-handler as
officially stated in Documentation/kprobes.txt.
Fixes: 4b1a29a7f542 ("error-injection: Support fault injection framework")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Now when using 'ss' in iproute, kernel would try to load all _diag
modules, which also causes corresponding family and proto modules
to be loaded as well due to module dependencies.
Like after running 'ss', sctp, dccp, af_packet (if it works as a module)
would be loaded.
For example:
$ lsmod|grep sctp
$ ss
$ lsmod|grep sctp
sctp_diag 16384 0
sctp 323584 5 sctp_diag
inet_diag 24576 4 raw_diag,tcp_diag,sctp_diag,udp_diag
libcrc32c 16384 3 nf_conntrack,nf_nat,sctp
As these family and proto modules are loaded unintentionally, it
could cause some problems, like:
- Some debug tools use 'ss' to collect the socket info, which loads all
those diag and family and protocol modules. It's noisy for identifying
issues.
- Users usually expect to drop sctp init packet silently when they
have no sense of sctp protocol instead of sending abort back.
- It wastes resources (especially with multiple netns), and SCTP module
can't be unloaded once it's loaded.
...
In short, it's really inappropriate to have these family and proto
modules loaded unexpectedly when just doing debugging with inet_diag.
This patch is to introduce sock_load_diag_module() where it loads
the _diag module only when it's corresponding family or proto has
been already registered.
Note that we can't just load _diag module without the family or
proto loaded, as some symbols used in _diag module are from the
family or proto module.
v1->v2:
- move inet proto check to inet_diag to avoid a compiling err.
v2->v3:
- define sock_load_diag_module in sock.c and export one symbol
only.
- improve the changelog.
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
During initialization, if we encounter errors, there is a code path that
calls bnxt_hwrm_vnic_set_tpa() with invalid VNIC ID. This may cause a
warning in firmware logs.
Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
bnxt_restore_pf_fw_resources routine frees PF resources by calling
close_nic and allocates the resources back, by doing open_nic. However,
this is not needed, if the PF is already in closed state.
This bug causes the driver to call open the device and call request_irq()
when it is not needed. Ultimately, pci_disable_msix() will crash
when bnxt_en is unloaded.
This patch fixes the problem by skipping __bnxt_close_nic and
__bnxt_open_nic inside bnxt_restore_pf_fw_resources routine, if the
interface is not running.
Fixes: 80fcaf46c092 ("bnxt_en: Restore MSIX after disabling SRIOV.")
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, internal error value is returned by the driver, when
hwrm_cfa_flow_alloc() fails due lack of resources. We should be returning
Linux errno value -ENOSPC instead.
This patch also converts other similar command errors to standard Linux errno
code (-EIO) in bnxt_tc.c
Fixes: db1d36a27324 ("bnxt_en: add TC flower offload flow_alloc/free FW cmds")
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Recent changes added the bnxt_init_int_mode() call in the driver's open
path whenever ring reservations are changed. This call was previously
only called in the probe path. In the open path, if MQPRIO TC has been
setup, the bnxt_init_int_mode() call would reset and mess up the MQPRIO
per TC rings.
Fix it by not re-initilizing bp->tx_nr_rings_per_tc in
bnxt_init_int_mode(). Instead, initialize it in the probe path only
after the bnxt_init_int_mode() call.
Fixes: 674f50a5b026 ("bnxt_en: Implement new method to reserve rings.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When receiving a packet with VLAN tag, pass the entire 16-bit TCI to the
stack when calling __vlan_hwaccel_put_tag(). The current code is only
passing the 12-bit tag and it is missing the priority bits.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In some conditions when the driver fails to add a flow in HW and returns
an error back to the stack, the stack continues to invoke get_flow_stats()
and/or del_flow() on it. The driver fails these APIs with an error message
"no flow_node for cookie". The message gets logged repeatedly as long as
the stack keeps invoking these functions.
Fix this by removing the corresponding netdev_info() calls from these
functions.
Fixes: d7bc73053024 ("bnxt_en: add code to query TC flower offload stats")
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The number of vnics to check must be determined ahead of time because
only standard RX rings require vnics to support RFS. The logic is
similar to the ring reservation logic and we can now use the
refactored common functions to do most of the work in setting up
the firmware message.
Fixes: 8f23d638b36b ("bnxt_en: Expand bnxt_check_rings() to check all resources.")
Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The bnxt_hwrm_reserve_{pf|vf}_rings() functions are very similar to
the bnxt_hwrm_check_{pf|vf}_rings() functions. Refactor the former
so that the latter can make use of common code in the next patch.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In 664fcf123a30e (net: phy: Threaded interrupts allow some simplification)
the phy_interrupt system was changed to use a traditional threaded
interrupt scheme instead of a workqueue approach.
With this change, the phy status check moved into phy_change, which
did not report back to the caller whether or not the interrupt was
handled. This means that, in the case of a shared phy interrupt,
only the first phydev's interrupt registers are checked (since
phy_interrupt() would always return IRQ_HANDLED). This leads to
interrupt storms when it is a secondary device that's actually the
interrupt source.
Signed-off-by: Brad Mouring <brad.mouring@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Make sure to apply the correct pin state in suspend/resume callbacks.
Putting pins in sleep state saves power.
Signed-off-by: Bich Hemon <bich.hemon@st.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
When an interface starts, the echo_skb array is empty and the network
queue should be started only. This patch replaces useless code and locks
when the internal RX_BARRIER message is received from the IP core, telling
the driver that tx may start.
Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|