Age | Commit message (Collapse) | Author | Files | Lines |
|
There is a race between arc_emac_tx() and arc_emac_tx_clean().
sk_buff got freed by arc_emac_tx_clean() while arc_emac_tx()
submitting sk_buff.
In order to free sk_buff arc_emac_tx_clean() checks:
if ((info & FOR_EMAC) || !txbd->data)
break;
...
dev_kfree_skb_irq(skb);
If condition false, arc_emac_tx_clean() free sk_buff.
In order to submit txbd, arc_emac_tx() do:
priv->tx_buff[*txbd_curr].skb = skb;
...
priv->txbd[*txbd_curr].data = cpu_to_le32(addr);
...
... <== arc_emac_tx_clean() check condition here
... <== (info & FOR_EMAC) is false
... <== !txbd->data is false
...
*info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len);
In order to reproduce the situation,
run device:
# iperf -s
run on host:
# iperf -t 600 -c <device-ip-addr>
[ 28.396284] ------------[ cut here ]------------
[ 28.400912] kernel BUG at .../net/core/skbuff.c:1355!
[ 28.414019] Internal error: Oops - BUG: 0 [#1] SMP ARM
[ 28.419150] Modules linked in:
[ 28.422219] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G B 4.4.0+ #120
[ 28.429516] Hardware name: Rockchip (Device Tree)
[ 28.434216] task: c0665070 ti: c0660000 task.ti: c0660000
[ 28.439622] PC is at skb_put+0x10/0x54
[ 28.443381] LR is at arc_emac_poll+0x260/0x474
[ 28.447821] pc : [<c03af580>] lr : [<c028fec4>] psr: a0070113
[ 28.447821] sp : c0661e58 ip : eea68502 fp : ef377000
[ 28.459280] r10: 0000012c r9 : f08b2000 r8 : eeb57100
[ 28.464498] r7 : 00000000 r6 : ef376594 r5 : 00000077 r4 : ef376000
[ 28.471015] r3 : 0030488b r2 : ef13e880 r1 : 000005ee r0 : eeb57100
[ 28.477534] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 28.484658] Control: 10c5387d Table: 8eaf004a DAC: 00000051
[ 28.490396] Process swapper/0 (pid: 0, stack limit = 0xc0660210)
[ 28.496393] Stack: (0xc0661e58 to 0xc0662000)
[ 28.500745] 1e40: 00000002 00000000
[ 28.508913] 1e60: 00000000 ef376520 00000028 f08b23b8 00000000 ef376520 ef7b6900 c028fc64
[ 28.517082] 1e80: 2f158000 c0661ea8 c0661eb0 0000012c c065e900 c03bdeac ffff95e9 c0662100
[ 28.525250] 1ea0: c0663924 00000028 c0661ea8 c0661ea8 c0661eb0 c0661eb0 0000001e c0660000
[ 28.533417] 1ec0: 40000003 00000008 c0695a00 0000000a c066208c 00000100 c0661ee0 c0027410
[ 28.541584] 1ee0: ef0fb700 2f158000 00200000 ffff95e8 00000004 c0662100 c0662080 00000003
[ 28.549751] 1f00: 00000000 00000000 00000000 c065b45c 0000001e ef005000 c0647a30 00000000
[ 28.557919] 1f20: 00000000 c0027798 00000000 c005cf40 f0802100 c0662ffc c0661f60 f0803100
[ 28.566088] 1f40: c0661fb8 c00093bc c000ffb4 60070013 ffffffff c0661f94 c0661fb8 c00137d4
[ 28.574267] 1f60: 00000001 00000000 00000000 c001ffa0 00000000 c0660000 00000000 c065a364
[ 28.582441] 1f80: c0661fb8 c0647a30 00000000 00000000 00000000 c0661fb0 c000ffb0 c000ffb4
[ 28.590608] 1fa0: 60070013 ffffffff 00000051 00000000 00000000 c005496c c0662400 c061bc40
[ 28.598776] 1fc0: ffffffff ffffffff 00000000 c061b680 00000000 c0647a30 00000000 c0695294
[ 28.606943] 1fe0: c0662488 c0647a2c c066619c 6000406a 413fc090 6000807c 00000000 00000000
[ 28.615127] [<c03af580>] (skb_put) from [<ef376520>] (0xef376520)
[ 28.621218] Code: e5902054 e590c090 e3520000 0a000000 (e7f001f2)
[ 28.627307] ---[ end trace 4824734e2243fdb6 ]---
[ 34.377068] Internal error: Oops: 17 [#1] SMP ARM
[ 34.382854] Modules linked in:
[ 34.385947] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.4.0+ #120
[ 34.392219] Hardware name: Rockchip (Device Tree)
[ 34.396937] task: ef02d040 ti: ef05c000 task.ti: ef05c000
[ 34.402376] PC is at __dev_kfree_skb_irq+0x4/0x80
[ 34.407121] LR is at arc_emac_poll+0x130/0x474
[ 34.411583] pc : [<c03bb640>] lr : [<c028fd94>] psr: 60030013
[ 34.411583] sp : ef05de68 ip : 0008e83c fp : ef377000
[ 34.423062] r10: c001bec4 r9 : 00000000 r8 : f08b24c8
[ 34.428296] r7 : f08b2400 r6 : 00000075 r5 : 00000019 r4 : ef376000
[ 34.434827] r3 : 00060000 r2 : 00000042 r1 : 00000001 r0 : 00000000
[ 34.441365] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 34.448507] Control: 10c5387d Table: 8f25c04a DAC: 00000051
[ 34.454262] Process ksoftirqd/0 (pid: 3, stack limit = 0xef05c210)
[ 34.460449] Stack: (0xef05de68 to 0xef05e000)
[ 34.464827] de60: ef376000 c028fd94 00000000 c0669480 c0669480 ef376520
[ 34.473022] de80: 00000028 00000001 00002ae4 ef376520 ef7b6900 c028fc64 2f158000 ef05dec0
[ 34.481215] dea0: ef05dec8 0000012c c065e900 c03bdeac ffff983f c0662100 c0663924 00000028
[ 34.489409] dec0: ef05dec0 ef05dec0 ef05dec8 ef05dec8 ef7b6000 ef05c000 40000003 00000008
[ 34.497600] dee0: c0695a00 0000000a c066208c 00000100 ef05def8 c0027410 ef7b6000 40000000
[ 34.505795] df00: 04208040 ffff983e 00000004 c0662100 c0662080 00000003 ef05c000 ef027340
[ 34.513985] df20: ef05c000 c0666c2c 00000000 00000001 00000002 00000000 00000000 c0027568
[ 34.522176] df40: ef027340 c003ef48 ef027300 00000000 ef027340 c003edd4 00000000 00000000
[ 34.530367] df60: 00000000 c003c37c ffffff7f 00000001 00000000 ef027340 00000000 00030003
[ 34.538559] df80: ef05df80 ef05df80 00000000 00000000 ef05df90 ef05df90 ef05dfac ef027300
[ 34.546750] dfa0: c003c2a4 00000000 00000000 c000f578 00000000 00000000 00000000 00000000
[ 34.554939] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 34.563129] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 ffffffff dfff7fff
[ 34.571360] [<c03bb640>] (__dev_kfree_skb_irq) from [<c028fd94>] (arc_emac_poll+0x130/0x474)
[ 34.579840] [<c028fd94>] (arc_emac_poll) from [<c03bdeac>] (net_rx_action+0xdc/0x28c)
[ 34.587712] [<c03bdeac>] (net_rx_action) from [<c0027410>] (__do_softirq+0xcc/0x1f8)
[ 34.595482] [<c0027410>] (__do_softirq) from [<c0027568>] (run_ksoftirqd+0x2c/0x50)
[ 34.603168] [<c0027568>] (run_ksoftirqd) from [<c003ef48>] (smpboot_thread_fn+0x174/0x18c)
[ 34.611466] [<c003ef48>] (smpboot_thread_fn) from [<c003c37c>] (kthread+0xd8/0xec)
[ 34.619075] [<c003c37c>] (kthread) from [<c000f578>] (ret_from_fork+0x14/0x3c)
[ 34.626317] Code: e8bd8010 e3a00000 e12fff1e e92d4010 (e59030a4)
[ 34.632572] ---[ end trace cca5a3d86a82249a ]---
Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch corrects the unaligned accesses seen on GRE TEB tunnels when
generating hash keys. Specifically what this patch does is make it so that
we force the use of skb_copy_bits when the GRE inner headers will be
unaligned due to NET_IP_ALIGNED being a non-zero value.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently our netdevice ops is a one static global variable which
is referenced by all mlx5e netdevice instances. This can be
problematic when different driver instances do not share same
HW capabilities (e.g SRIOV PF and VFs probed to the host).
Now we have two constant global netdevice ops variables, one
for basic netdevice ops and the other with extended SRIOV ops,
on netdevice construction we choose the one suitable for
current device capabilities.
Fixes: 66e49dedada6 ("net/mlx5e: Add support for SR-IOV ndos")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently mlx5e_select_queue is redundant since num_tc is always 1.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
mlx5_ifc.h is a header file representing the API and ABI between
the driver to the firmware and hardware. This file is used from
both the mlx5_ib and mlx5_core drivers.
Previously, this file used incrementing counter to indicate
reserved fields, for example:
struct mlx5_ifc_odp_per_transport_service_cap_bits {
u8 send[0x1];
u8 receive[0x1];
u8 write[0x1];
u8 read[0x1];
u8 reserved_0[0x1];
u8 srq_receive[0x1];
u8 reserved_1[0x1a];
};
If one developer implements through net-next feature A that uses
reserved_0, they replace it with featureA and renames reserved_1 to
reserved_0. In the same kernel cycle, a 2nd developer could implement
feature B through the rdma tree, that uses reserved_1 and split it to
featureB and a smaller reserved_1 field. This will cause a conflict
when the two trees are merged.
The source of this conflict is that the 1st developer changed *all*
reserved fields.
As Linus suggested, we change the layout of structs to:
struct mlx5_ifc_odp_per_transport_service_cap_bits {
u8 send[0x1];
u8 receive[0x1];
u8 write[0x1];
u8 read[0x1];
u8 reserved_at_4[0x1];
u8 srq_receive[0x1];
u8 reserved_at_6[0x1a];
};
This makes the conflicts much more rare and preserves the locality of
changes.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is presently a race condition between the bonding periodic
link monitor and the updating of a slave's speed and duplex. The former
occurs on a periodic basis, and the latter in response to a driver's
calling of netif_carrier_on.
It is possible for the periodic monitor to run between the
driver call of netif_carrier_on and the receipt of the NETDEV_CHANGE
event that causes bonding to update the slave's speed and duplex. This
manifests most notably as a report that a slave is up and "0 Mbps full
duplex" after enslavement, but in principle could report an incorrect
speed and duplex after any link up event if the device comes up with a
different speed or duplex. This affects the 802.3ad aggregator
selection, as the speed and duplex are selection criteria.
This is fixed by updating the speed and duplex in the periodic
monitor, prior to using that information.
This was done historically in bonding, but the call to
bond_update_speed_duplex was removed in commit 876254ae2758 ("bonding:
don't call update_speed_duplex() under spinlocks"), as it might sleep
under lock. Later, the locking was changed to only hold RTNL, and so
after commit 876254ae2758 ("bonding: don't call update_speed_duplex()
under spinlocks") this call is again safe.
Tested-by: "Tantilov, Emil S" <emil.s.tantilov@intel.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: dingtianhong <dingtianhong@huawei.com>
Fixes: 876254ae2758 ("bonding: don't call update_speed_duplex() under spinlocks")
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Acked-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The am79c961a.c driver fails to build with clang because of an
unusual inline assembly construct:
drivers/net/ethernet/amd/am79c961a.c:53:7: error: invalid % escape in inline assembly string
"str%?h %1, [%2] @ NET_RAP\n\t"
The same change has been done a decade ago in arch/arm as of
6a39dd6222dd ("[ARM] 3759/2: Remove uses of %?"), but apparently
some drivers were missed.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The smc91x driver doesn't honor the probe deferral mechanism when the
interrupt source is not yet available, such as one provided by a gpio
controller not probed.
Fix this by propagating the platform_get_irq() error code as the probe
return value.
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove the two wildcard entries, they serve no purpose and will match way too
many devices, some of them being covered by the driver in
drivers/net/phy/broadcom.c. Remove the now unused bcm7xxx_dummy_config_init()
function which would produce a warning.
Fixes: b560a58c45c6 ("net: phy: add Broadcom BCM7xxx internal PHY driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since we were wrongly advertising gigabit features for these 10/100 only
Ethernet PHYs, bcm7xxx_config_init() which is supposed to apply workaround
would have not run since the check would be true, now that we have fixed the
PHY features, remove that check since it has no reasoning to be there anymore.
Fixes: e18556ee3bd83 ("net: phy: bcm7xxx: do not use PHY_BRCM_100MBPS_WAR")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The PHY entries for BCM7425/29/35 declare the 40nm Ethernet PHY as being
10/100/1000 capable, while this is just a 10/100 capable PHY device, fix that.
Fixes: d068b02cfdfc2 ("net: phy: add BCM7425 and BCM7429 PHYs")
Fixes: 9458ceab4917 ("net: phy: bcm7xxx: Add entry for BCM7435")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The clear and set masks in the call to phy_set_clr_bits() called from
bcm7xxx_config_init() are inverted. We need to fix this by swapping the two
arguments, that is, set 0 bits, but clear the shade mode 2 enable bit.
Fixes: b560a58c45c66 ("net: phy: add Broadcom BCM7xxx internal PHY driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When adding support for the R-Car gen3 gPTP active in configuration mode,
some call sites of ravb_ptp_{init|stop}() were missed due to an oversight.
Add checks for the R-Car gen2 SoCs around these...
Fixes: f5d7837f96e5 ("ravb: ptp: Add CONFIG mode support")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When adding support for the R-Car gen3 gPTP active in configuration mode,
the code setting the CCC.CSEL field was duplicated due to an oversight.
For R-Car gen 2 it's just redundant and for R-Car gen3 the write at this
time is probably ignored due to CCC.GAC bit being already set...
Fixes: f5d7837f96e5 ("ravb: ptp: Add CONFIG mode support")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The unix_dgram_sendmsg routine use the following test
if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
to determine if sk and other are in an n:1 association (either
established via connect or by using sendto to send messages to an
unrelated socket identified by address). This isn't correct as the
specified address could have been bound to the sending socket itself or
because this socket could have been connected to itself by the time of
the unix_peer_get but disconnected before the unix_state_lock(other). In
both cases, the if-block would be entered despite other == sk which
might either block the sender unintentionally or lead to trying to unlock
the same spin lock twice for a non-blocking send. Add a other != sk
check to guard against this.
Fixes: 7d267278a9ec ("unix: avoid use-after-free in ep_remove_wait_queue")
Reported-By: Philipp Hahn <pmhahn@pmhahn.de>
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Tested-by: Philipp Hahn <pmhahn@pmhahn.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The present unix_stream_read_generic contains various code sequences of
the form
err = -EDISASTER;
if (<test>)
goto out;
This has the unfortunate side effect of possibly causing the error code
to bleed through to the final
out:
return copied ? : err;
and then to be wrongly returned if no data was copied because the caller
didn't supply a data buffer, as demonstrated by the program available at
http://pad.lv/1540731
Change it such that err is only set if an error condition was detected.
Fixes: 3822b5c2fc62 ("af_unix: Revert 'lock_interruptible' in stream receive code")
Reported-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
My analysis in the below mail applies, although the second part is
unnecessary because i isn't used in arithmetic operations here:
https://marc.info/?l=openbsd-tech&m=145377854103866&w=2
Thanks for your time.
Signed-off-by: Michael McConville <mmcco@mykolab.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
BRIDGE_VLAN_FILTERING automatically adds a newly bridged port to the
VLAN with the bridge's default_pvid.
The mv88e6xxx driver currently reserves VLANs 4000+ for unbridged ports
isolation. When a port joins a bridge, it leaves its reserved VLAN. When
a port leaves a bridge, it joins again its reserved VLAN.
But if the VLAN filtering is disabled, or if this hardware VLAN is
already in use, the bridged port ends up with no default VLAN, and the
communication with the CPU is thus broken.
To fix this, make a port join its reserved VLAN once on setup, never
leave it, and restore its PVID after another one was eventually used.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The current bridge code calls switchdev_port_obj_del on a VLAN port even
if the corresponding switchdev_port_obj_add call returned -EOPNOTSUPP.
If the DSA driver doesn't return -EOPNOTSUPP for a software port VLAN in
its port_vlan_del function, the VLAN is not deleted. Unbridging the port
also generates a stack trace for the same reason.
This can be quickly tested on a VLAN filtering enabled system with:
# brctl addbr br0
# brctl addif br0 lan0
# brctl addbr br1
# brctl addif br1 lan1
# brctl delif br1 lan1
Both bridges have a default default_pvid set to 1. lan0 uses the
hardware VLAN 1 while lan1 falls back to the software VLAN 1.
Unbridging lan1 does not delete its software VLAN, and thus generates
the following stack trace:
[ 2991.681705] device lan1 left promiscuous mode
[ 2991.686237] br1: port 1(lan1) entered disabled state
[ 2991.725094] ------------[ cut here ]------------
[ 2991.729761] WARNING: CPU: 0 PID: 869 at net/bridge/br_vlan.c:314 __vlan_group_free+0x4c/0x50()
[ 2991.738437] Modules linked in:
[ 2991.741546] CPU: 0 PID: 869 Comm: ip Not tainted 4.4.0 #16
[ 2991.747039] Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
[ 2991.753511] Backtrace:
[ 2991.756008] [<80014450>] (dump_backtrace) from [<8001469c>] (show_stack+0x20/0x24)
[ 2991.763604] r6:80512644 r5:00000009 r4:00000000 r3:00000000
[ 2991.769343] [<8001467c>] (show_stack) from [<80268e44>] (dump_stack+0x24/0x28)
[ 2991.776618] [<80268e20>] (dump_stack) from [<80025568>] (warn_slowpath_common+0x98/0xc4)
[ 2991.784750] [<800254d0>] (warn_slowpath_common) from [<80025650>] (warn_slowpath_null+0x2c/0x34)
[ 2991.793557] r8:00000000 r7:9f786a8c r6:9f76c440 r5:9f786a00 r4:9f68ac00
[ 2991.800366] [<80025624>] (warn_slowpath_null) from [<80512644>] (__vlan_group_free+0x4c/0x50)
[ 2991.808946] [<805125f8>] (__vlan_group_free) from [<80514488>] (nbp_vlan_flush+0x44/0x68)
[ 2991.817147] r4:9f68ac00 r3:9ec70000
[ 2991.820772] [<80514444>] (nbp_vlan_flush) from [<80506f08>] (del_nbp+0xac/0x130)
[ 2991.828201] r5:9f56f800 r4:9f786a00
[ 2991.831841] [<80506e5c>] (del_nbp) from [<8050774c>] (br_del_if+0x40/0xbc)
[ 2991.838724] r7:80590f68 r6:00000000 r5:9ec71c38 r4:9f76c440
[ 2991.844475] [<8050770c>] (br_del_if) from [<80503dc0>] (br_del_slave+0x1c/0x20)
[ 2991.851802] r5:9ec71c38 r4:9f56f800
[ 2991.855428] [<80503da4>] (br_del_slave) from [<80484a34>] (do_setlink+0x324/0x7b8)
[ 2991.863043] [<80484710>] (do_setlink) from [<80485e90>] (rtnl_newlink+0x508/0x6f4)
[ 2991.870616] r10:00000000 r9:9ec71ba8 r8:00000000 r7:00000000 r6:9f6b0400 r5:9f56f800
[ 2991.878548] r4:8076278c
[ 2991.881110] [<80485988>] (rtnl_newlink) from [<80484048>] (rtnetlink_rcv_msg+0x18c/0x22c)
[ 2991.889315] r10:9f7d4e40 r9:00000000 r8:00000000 r7:00000000 r6:9f7d4e40 r5:9f6b0400
[ 2991.897250] r4:00000000
[ 2991.899814] [<80483ebc>] (rtnetlink_rcv_msg) from [<80497c74>] (netlink_rcv_skb+0xb0/0xcc)
[ 2991.908104] r8:00000000 r7:9f7d4e40 r6:9f7d4e40 r5:80483ebc r4:9f6b0400
[ 2991.914928] [<80497bc4>] (netlink_rcv_skb) from [<80483eb4>] (rtnetlink_rcv+0x34/0x3c)
[ 2991.922874] r6:9f5ea000 r5:00000028 r4:9f7d4e40 r3:80483e80
[ 2991.928622] [<80483e80>] (rtnetlink_rcv) from [<80497604>] (netlink_unicast+0x180/0x200)
[ 2991.936742] r4:9f4edc00 r3:80483e80
[ 2991.940362] [<80497484>] (netlink_unicast) from [<80497a88>] (netlink_sendmsg+0x33c/0x350)
[ 2991.948648] r8:00000000 r7:00000028 r6:00000000 r5:9f5ea000 r4:9ec71f4c
[ 2991.955481] [<8049774c>] (netlink_sendmsg) from [<80457ff0>] (sock_sendmsg+0x24/0x34)
[ 2991.963342] r10:00000000 r9:9ec71e28 r8:00000000 r7:9f1e2140 r6:00000000 r5:00000000
[ 2991.971276] r4:9ec71f4c
[ 2991.973849] [<80457fcc>] (sock_sendmsg) from [<80458af0>] (___sys_sendmsg+0x1fc/0x204)
[ 2991.981809] [<804588f4>] (___sys_sendmsg) from [<804598d0>] (__sys_sendmsg+0x4c/0x7c)
[ 2991.989640] r10:00000000 r9:9ec70000 r8:80010824 r7:00000128 r6:7ee946c4 r5:00000000
[ 2991.997572] r4:9f1e2140
[ 2992.000128] [<80459884>] (__sys_sendmsg) from [<80459918>] (SyS_sendmsg+0x18/0x1c)
[ 2992.007725] r6:00000000 r5:7ee9c7b8 r4:7ee946e0
[ 2992.012430] [<80459900>] (SyS_sendmsg) from [<80010660>] (ret_fast_syscall+0x0/0x3c)
[ 2992.020182] ---[ end trace 5d4bc29f4da04280 ]---
To fix this, return -EOPNOTSUPP in _mv88e6xxx_port_vlan_del instead of
-ENOENT if the hardware VLAN doesn't exist or the port is not a member.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
smatch detected a suspicious looking bitop condition:
drivers/net/ethernet/cavium/liquidio/lio_main.c:2529
handle_timestamp() warn: suspicious bitop condition
(skb_shinfo(skb)->tx_flags | SKBTX_IN_PROGRESS is always non-zero,
so the logic is definitely not correct. Use & to mask the correct
bit.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit c0eb454034aa ("hv_netvsc: Don't ask for additional head room in the
skb") got rid of needed_headroom setting for the driver. With the change I
hit the following issue trying to use ptkgen module:
[ 57.522021] kernel BUG at net/core/skbuff.c:1128!
[ 57.522021] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
...
[ 58.721068] Call Trace:
[ 58.721068] [<ffffffffa0144e86>] netvsc_start_xmit+0x4c6/0x8e0 [hv_netvsc]
...
[ 58.721068] [<ffffffffa02f87fc>] ? pktgen_finalize_skb+0x25c/0x2a0 [pktgen]
[ 58.721068] [<ffffffff814f5760>] ? __netdev_alloc_skb+0xc0/0x100
[ 58.721068] [<ffffffffa02f9907>] pktgen_thread_worker+0x257/0x1920 [pktgen]
Basically, we're calling skb_cow_head(skb, RNDIS_AND_PPI_SIZE) and crash on
if (skb_shared(skb))
BUG();
We probably need to restore needed_headroom setting (but shrunk to
RNDIS_AND_PPI_SIZE as we don't need more) to request the required headroom
space. In theory, it should not give us performance penalty.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When stopping the port, the CPU notifier are still there whereas the
mvneta_stop_dev function calls mvneta_percpu_disable() on each CPUs.
It was possible to have a new CPU coming at this point which could be
racy.
This patch adds a flag preventing executing the code notifier for a new
CPU when the port is stopping. It also uses the spinlock introduces
previously. To avoid the deadlock, the lock has been moved outside the
mvneta_percpu_elect function.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Electing a CPU must be done in an atomic way: it should be done after or
before the removal/insertion of a CPU and this function is not reentrant.
During the loop of mvneta_percpu_elect we associates the queues to the
CPUs, if there is a topology change during this loop, then the mapping
between the CPUs and the queues could be wrong. During this loop the
interrupt mask is also updating for each CPUs, It should not be changed
in the same time by other part of the driver.
This patch adds spinlock to create the needed critical sections.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the MVNETA_INTR_* registers, the queues related fields are per cpu,
according to the datasheet (comment in [] are added by me):
"In a multi-CPU system, bits of RX[or TX] queues for which the access by
the reading[or writing] CPU is disabled are read as 0, and cannot be
cleared[or written]."
That means that each time we want to manipulate these bits we had to do
it on each cpu and not only on the current cpu.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since the commit 2dcf75e2793c ("net: mvneta: Associate RX queues with
each CPU") all the percpu irq are used and disabled at initialization, so
there is no point to disable them first.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Instead of using a for_each_* loop in which we just call the
smp_call_function_single macro, it is more simple to directly use the
on_each_cpu macro. Moreover, this macro ensures that the calls will be
done all at once.
Suggested-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When passing to the management of multiple RX queue, the
mvneta_percpu_elect function was broken. The use of the modulo can lead
to elect the wrong cpu. For example with rxq_def=2, if the CPU 2 goes
offline and then online, we ended with the third RX queue activated in
the same time on CPU 0 and CPU2, which lead to a kernel crash.
With this fix, we don't try to get "the closer" CPU if the default CPU is
gone, now we just use CPU 0 which always be there. Thanks to this, the
code becomes more readable, easier to maintain and more predicable.
Cc: stable@vger.kernel.org
Fixes: 2dcf75e2793c ("net: mvneta: Associate RX queues with each CPU")
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch convert the for_each_present in on_each_cpu, instead of
applying on the present cpus it will be applied only on the online cpus.
This fix a bug reported on
http://thread.gmane.org/gmane.linux.ports.arm.kernel/468173.
Using the macro on_each_cpu (instead of a for_each_* loop) also ensures
that all the calls will be done all at once.
Fixes: f86428854480 ("net: mvneta: Statically assign queues to CPUs")
Reported-by: Stefan Roese <stefan.roese@gmail.com>
Suggested-by: Jisheng Zhang <jszhang@marvell.com>
Suggested-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We receoved a bug report from someone using vmware:
WARNING: CPU: 3 PID: 660 at kernel/sched/core.c:7389
__might_sleep+0x7d/0x90()
do not call blocking ops when !TASK_RUNNING; state=1 set at
[<ffffffff810fa68d>] prepare_to_wait+0x2d/0x90
Modules linked in: vmw_vsock_vmci_transport vsock snd_seq_midi
snd_seq_midi_event snd_ens1371 iosf_mbi gameport snd_rawmidi
snd_ac97_codec ac97_bus snd_seq coretemp snd_seq_device snd_pcm
snd_timer snd soundcore ppdev crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel vmw_vmci vmw_balloon i2c_piix4 shpchp parport_pc
parport acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc btrfs
xor raid6_pq 8021q garp stp llc mrp crc32c_intel serio_raw mptspi vmwgfx
drm_kms_helper ttm drm scsi_transport_spi mptscsih e1000 ata_generic
mptbase pata_acpi
CPU: 3 PID: 660 Comm: vmtoolsd Not tainted
4.2.0-0.rc1.git3.1.fc23.x86_64 #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop
Reference Platform, BIOS 6.00 05/20/2014
0000000000000000 0000000049e617f3 ffff88006ac37ac8 ffffffff818641f5
0000000000000000 ffff88006ac37b20 ffff88006ac37b08 ffffffff810ab446
ffff880068009f40 ffffffff81c63bc0 0000000000000061 0000000000000000
Call Trace:
[<ffffffff818641f5>] dump_stack+0x4c/0x65
[<ffffffff810ab446>] warn_slowpath_common+0x86/0xc0
[<ffffffff810ab4d5>] warn_slowpath_fmt+0x55/0x70
[<ffffffff8112551d>] ? debug_lockdep_rcu_enabled+0x1d/0x20
[<ffffffff810fa68d>] ? prepare_to_wait+0x2d/0x90
[<ffffffff810fa68d>] ? prepare_to_wait+0x2d/0x90
[<ffffffff810da2bd>] __might_sleep+0x7d/0x90
[<ffffffff812163b3>] __might_fault+0x43/0xa0
[<ffffffff81430477>] copy_from_iter+0x87/0x2a0
[<ffffffffa039460a>] __qp_memcpy_to_queue+0x9a/0x1b0 [vmw_vmci]
[<ffffffffa0394740>] ? qp_memcpy_to_queue+0x20/0x20 [vmw_vmci]
[<ffffffffa0394757>] qp_memcpy_to_queue_iov+0x17/0x20 [vmw_vmci]
[<ffffffffa0394d50>] qp_enqueue_locked+0xa0/0x140 [vmw_vmci]
[<ffffffffa039593f>] vmci_qpair_enquev+0x4f/0xd0 [vmw_vmci]
[<ffffffffa04847bb>] vmci_transport_stream_enqueue+0x1b/0x20
[vmw_vsock_vmci_transport]
[<ffffffffa047ae05>] vsock_stream_sendmsg+0x2c5/0x320 [vsock]
[<ffffffff810fabd0>] ? wake_atomic_t_function+0x70/0x70
[<ffffffff81702af8>] sock_sendmsg+0x38/0x50
[<ffffffff81702ff4>] SYSC_sendto+0x104/0x190
[<ffffffff8126e25a>] ? vfs_read+0x8a/0x140
[<ffffffff817042ee>] SyS_sendto+0xe/0x10
[<ffffffff8186d9ae>] entry_SYSCALL_64_fastpath+0x12/0x76
transport->stream_enqueue may call copy_to_user so it should
not be called inside a prepare_to_wait. Narrow the scope of
the prepare_to_wait to avoid the bad call. This also applies
to vsock_stream_recvmsg as well.
Reported-by: Vinson Lee <vlee@freedesktop.org>
Tested-by: Vinson Lee <vlee@freedesktop.org>
Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are typos in setting RTL8168H hardware parameters. If system install
another version driver that may cuase system hang.
Signed-off-by: Chunhao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Dmitry reported memory leaks of IP options allocated in
ip_cmsg_send() when/if this function returns an error.
Callers are responsible for the freeing.
Many thanks to Dmitry for the report and diagnostic.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The return value of kzalloc on failure of allocation of memory should
be -ENOMEM and not -1.
Found using Coccinelle. A simplified version of the semantic patch
used is:
//<smpl>
@@
expression *e;
position p,q;
@@
e@q = kzalloc(...);
if@p (e == NULL) {
...
return
- -1
+ -ENOMEM
;
}
//</smpl>
This function may also return -1 after calling mpp2_prs_tcam_port_map_get.
So that the function consistently returns meaningful error values on
failure, the -1 is changed to -EINVAL.
Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The return value of vmalloc on failure of allocation of memory should
be -ENOMEM and not -1.
Found using Coccinelle. A simplified version of the semantic patch
used is:
//<smpl>
@@
expression *e;
identifier l1;
position p,q;
@@
e@q = vmalloc(...);
if@p (e == NULL) {
...
goto l1;
}
l1:
...
return -1
+ -ENOMEM
;
//</smpl
The single call site of the containing function checks whether the
returned value is -1, so this check is changed as well. The single call
site of this call site, however, only checks whether the value is not 0,
so no further change was required.
Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The current logic in bond_arp_rcv will accept an incoming ARP for
validation if (a) the receiving slave is either "active" (which includes
the currently active slave, or the current ARP slave) or, (b) there is a
currently active slave, and it has received an ARP since it became active.
For case (b), the receiving slave isn't the currently active slave, and is
receiving the original broadcast ARP request, not an ARP reply from the
target.
This logic can fail if there is no currently active slave. In
this situation, the ARP probe logic cycles through all slaves, assigning
each in turn as the "current_arp_slave" for one arp_interval, then setting
that one as "active," and sending an ARP probe from that slave. The
current logic expects the ARP reply to arrive on the sending
current_arp_slave, however, due to switch FDB updating delays, the reply
may be directed to another slave.
This can arise if the bonding slaves and switch are working, but
the ARP target is not responding. When the ARP target recovers, a
condition may result wherein the ARP target host replies faster than the
switch can update its forwarding table, causing each ARP reply to be sent
to the previous current_arp_slave. This will never pass the logic in
bond_arp_rcv, as neither of the above conditions (a) or (b) are met.
Some experimentation on a LAN shows ARP reply round trips in the
200 usec range, but my available switches never update their FDB in less
than 4000 usec.
This patch changes the logic in bond_arp_rcv to additionally
accept an ARP reply for validation on any slave if there is a current ARP
slave and it sent an ARP probe during the previous arp_interval.
Fixes: aeea64ac717a ("bonding: don't trust arp requests unless active slave really works")
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When ctx access is used, the kernel often needs to expand/rewrite
instructions, so after that patching, branch offsets have to be
adjusted for both forward and backward jumps in the new eBPF program,
but for backward jumps it fails to account the delta. Meaning, for
example, if the expansion happens exactly on the insn that sits at
the jump target, it doesn't fix up the back jump offset.
Analysis on what the check in adjust_branches() is currently doing:
/* adjust offset of jmps if necessary */
if (i < pos && i + insn->off + 1 > pos)
insn->off += delta;
else if (i > pos && i + insn->off + 1 < pos)
insn->off -= delta;
First condition (forward jumps):
Before: After:
insns[0] insns[0]
insns[1] <--- i/insn insns[1] <--- i/insn
insns[2] <--- pos insns[P] <--- pos
insns[3] insns[P] `------| delta
insns[4] <--- target_X insns[P] `-----|
insns[5] insns[3]
insns[4] <--- target_X
insns[5]
First case is if we cross pos-boundary and the jump instruction was
before pos. This is handeled correctly. I.e. if i == pos, then this
would mean our jump that we currently check was the patchlet itself
that we just injected. Since such patchlets are self-contained and
have no awareness of any insns before or after the patched one, the
delta is correctly not adjusted. Also, for the second condition in
case of i + insn->off + 1 == pos, means we jump to that newly patched
instruction, so no offset adjustment are needed. That part is correct.
Second condition (backward jumps):
Before: After:
insns[0] insns[0]
insns[1] <--- target_X insns[1] <--- target_X
insns[2] <--- pos <-- target_Y insns[P] <--- pos <-- target_Y
insns[3] insns[P] `------| delta
insns[4] <--- i/insn insns[P] `-----|
insns[5] insns[3]
insns[4] <--- i/insn
insns[5]
Second interesting case is where we cross pos-boundary and the jump
instruction was after pos. Backward jump with i == pos would be
impossible and pose a bug somewhere in the patchlet, so the first
condition checking i > pos is okay only by itself. However, i +
insn->off + 1 < pos does not always work as intended to trigger the
adjustment. It works when jump targets would be far off where the
delta wouldn't matter. But, for example, where the fixed insn->off
before pointed to pos (target_Y), it now points to pos + delta, so
that additional room needs to be taken into account for the check.
This means that i) both tests here need to be adjusted into pos + delta,
and ii) for the second condition, the test needs to be <= as pos
itself can be a target in the backjump, too.
Fixes: 9bac3d6d548e ("bpf: allow extended BPF programs access skb fields")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When looking up the pool_workqueue to use for an unbound workqueue,
workqueue assumes that the target CPU is always bound to a valid NUMA
node. However, currently, when a CPU goes offline, the mapping is
destroyed and cpu_to_node() returns NUMA_NO_NODE.
This has always been broken but hasn't triggered often enough before
874bbfe600a6 ("workqueue: make sure delayed work run in local cpu").
After the commit, workqueue forcifully assigns the local CPU for
delayed work items without explicit target CPU to fix a different
issue. This widens the window where CPU can go offline while a
delayed work item is pending causing delayed work items dispatched
with target CPU set to an already offlined CPU. The resulting
NUMA_NO_NODE mapping makes workqueue try to queue the work item on a
NULL pool_workqueue and thus crash.
While 874bbfe600a6 has been reverted for a different reason making the
bug less visible again, it can still happen. Fix it by mapping
NUMA_NO_NODE to the default pool_workqueue from unbound_pwq_by_node().
This is a temporary workaround. The long term solution is keeping CPU
-> NODE mapping stable across CPU off/online cycles which is being
worked on.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/g/1454424264.11183.46.camel@gmail.com
Link: http://lkml.kernel.org/g/1453702100-2597-1-git-send-email-tangchen@cn.fujitsu.com
|
|
Adding Intel codename DNV platform device IDs for SATA.
Signed-off-by: Alexandra Yates <alexandra.yates@linux.intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
|
|
Prior to 4.3, openvswitch tunnel vports (vxlan, gre and geneve) could
transmit vxlan packets of any size, constrained only by the ability to
send out the resulting packets. 4.3 introduced netdevs corresponding
to tunnel vports. These netdevs have an MTU, which limits the size of
a packet that can be successfully encapsulated. The default MTU
values are low (1500 or less), which is awkwardly small in the context
of physical networks supporting jumbo frames, and leads to a
conspicuous change in behaviour for userspace.
Instead, set the MTU on openvswitch-created netdevs to be the relevant
maximum (i.e. the maximum IP packet size minus any relevant overhead),
effectively restoring the behaviour prior to 4.3.
Signed-off-by: David Wragg <david@weave.works>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allow the MTU of geneve devices to be set to large values, in order to
exploit underlying networks with larger frame sizes.
GENEVE does not have a fixed encapsulation overhead (an openvswitch
rule can add variable length options), so there is no relevant maximum
MTU to enforce. A maximum of IP_MAX_MTU is used instead.
Encapsulated packets that are too big for the underlying network will
get dropped on the floor.
Signed-off-by: David Wragg <david@weave.works>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allow the MTU of vxlan devices without an underlying device to be set
to larger values (up to a maximum based on IP packet limits and vxlan
overhead).
Previously, their MTUs could not be set to higher than the
conventional ethernet value of 1500. This is a very arbitrary value
in the context of vxlan, and prevented vxlan devices from being able
to take advantage of jumbo frames etc.
The default MTU remains 1500, for compatibility.
Signed-off-by: David Wragg <david@weave.works>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Driver only needs to allocate for [ngpio / 32] controllers,
as each controller handles 32 gpios. But the current driver
allocates for ngpio of which the extra allocated are unused.
Fix it be registering only the required number of controllers.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Keerthy <j-keerthy@ti.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
Currently the first parameter of irq_domain_add_legacy is NULL.
irq_find_host function returns NULL when we do not populate the of_node
and hence irq_of_parse_and_map call fails whenever we want to request a
gpio irq. This fixes the request_irq failures for gpio interrupts.
Signed-off-by: Keerthy <j-keerthy@ti.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
drivers/input/touchscreen/colibri-vf50-ts.c: In function ‘vf50_ts_probe’:
drivers/input/touchscreen/colibri-vf50-ts.c:302: error: implicit declaration of function ‘of_property_read_u32’
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
The adp5589 has row 5, don't skip it when creating the GPIO mapping.
Otherwise the pin gets reserved as used and it is not possible to use it as
a GPIO.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
A recent patch broke parsing the gain, offset, and threshold parameters
from device tree. Instead of setting the cached values and writing them
to the correct registers during probe, it would write the values from DT
into the register address variables and never write them to the chip
during normal operation.
Fixes: 2e23b7a96372 ("Input: edt-ft5x06 - use generic properties API")
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
Workqueue used to guarantee local execution for work items queued
without explicit target CPU. The guarantee is gone now which can
break some usages in subtle ways. To flush out those cases, this
patch implements a debug feature which forces round-robin CPU
selection for all such work items.
The debug feature defaults to off and can be enabled with a kernel
parameter. The default can be flipped with a debug config option.
If you hit this commit during bisection, please refer to 041bd12e272c
("Revert "workqueue: make sure delayed work run in local cpu"") for
more information and ping me.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
WORK_CPU_UNBOUND work items queued to a bound workqueue always run
locally. This is a good thing normally, but not when the user has
asked us to keep unbound work away from certain CPUs. Round robin
these to wq_unbound_cpumask CPUs instead, as perturbation avoidance
trumps performance.
tj: Cosmetic and comment changes. WARN_ON_ONCE() dropped from empty
(wq_unbound_cpumask AND cpu_online_mask). If we want that, it
should be done when config changes.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
This reverts commit 874bbfe600a660cba9c776b3957b1ce393151b76.
Workqueue used to implicity guarantee that work items queued without
explicit CPU specified are put on the local CPU. Recent changes in
timer broke the guarantee and led to vmstat breakage which was fixed
by 176bed1de5bf ("vmstat: explicitly schedule per-cpu work on the CPU
we need it to run on").
vmstat is the most likely to expose the issue and it's quite possible
that there are other similar problems which are a lot more difficult
to trigger. As a preventive measure, 874bbfe600a6 ("workqueue: make
sure delayed work run in local cpu") was applied to restore the local
CPU guarnatee. Unfortunately, the change exposed a bug in timer code
which got fixed by 22b886dd1018 ("timers: Use proper base migration in
add_timer_on()"). Due to code restructuring, the commit couldn't be
backported beyond certain point and stable kernels which only had
874bbfe600a6 started crashing.
The local CPU guarantee was accidental more than anything else and we
want to get rid of it anyway. As, with the vmstat case fixed,
874bbfe600a6 is causing more problems than it's fixing, it has been
decided to take the chance and officially break the guarantee by
reverting the commit. A debug feature will be added to force foreign
CPU assignment to expose cases relying on the guarantee and fixes for
the individual cases will be backported to stable as necessary.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 874bbfe600a6 ("workqueue: make sure delayed work run in local cpu")
Link: http://lkml.kernel.org/g/20160120211926.GJ10810@quack.suse.cz
Cc: stable@vger.kernel.org
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: Daniel Bilik <daniel.bilik@neosystem.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Shaohua Li <shli@fb.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Bilik <daniel.bilik@neosystem.cz>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
|