Age | Commit message (Collapse) | Author | Files | Lines |
|
Update hns to drop the hns_nic_get_headlen function in favour of
eth_get_headlen, and hence also removes now redundant hns_nic_get_headlen.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
skb->truesize is not meant to be tracking amount of used bytes in a skb,
but amount of reserved/consumed bytes in memory.
For instance, if we use a single byte in last page fragment, we have to
account the full size of the fragment.
So skb_add_rx_frag needs to calculate the length of the entire buffer into
turesize.
Fixes: 9cbe9fd5214e ("net: hns: optimize XGE capability by reducing cpu usage")
Signed-off-by: Huazhong tan <tanhuazhong@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
'truesize' is supposed to be u32, not int, so fix it.
Signed-off-by: Huazhong tan <tanhuazhong@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When enable the config item "CONFIG_ARM64_64K_PAGES", the size of PAGE_SIZE
is 65536(64K). But the type of length and page_offset are u16, they will
overflow. So change them to u32.
Fixes: 6fe6611ff275 ("net: add Hisilicon Network Subsystem hnae framework support")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This commit fixes a corner case where TCP BBR would enter PROBE_RTT
mode but not reduce its cwnd. If a TCP receiver ACKed less than one
full segment, the number of delivered/acked packets was 0, so that
bbr_set_cwnd() would short-circuit and exit early, without cutting
cwnd to the value we want for PROBE_RTT.
The fix is to instead make sure that even when 0 full packets are
ACKed, we do apply all the appropriate caps, including the cap that
applies in PROBE_RTT mode.
Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Kevin Yang <yyd@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch fix the case where BBR does not exit PROBE_RTT mode when
it restarts from idle. When BBR restarts from idle and if BBR is in
PROBE_RTT mode, BBR should check if it's time to exit PROBE_RTT. If
yes, then BBR should exit PROBE_RTT mode and restore the cwnd to its
full value.
Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Kevin Yang <yyd@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch add a helper function bbr_check_probe_rtt_done() to
1. check the condition to see if bbr should exit probe_rtt mode;
2. process the logic of exiting probe_rtt mode.
Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Kevin Yang <yyd@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tcp uses per-cpu (and per namespace) sockets (net->ipv4.tcp_sk) internally
to send some control packets.
1) RST packets, through tcp_v4_send_reset()
2) ACK packets in SYN-RECV and TIME-WAIT state, through tcp_v4_send_ack()
These packets assert IP_DF, and also use the hashed IP ident generator
to provide an IPv4 ID number.
Geoff Alexander reported this could be used to build off-path attacks.
These packets should not be fragmented, since their size is smaller than
IPV4_MIN_MTU. Only some tunneled paths could eventually have to fragment,
regardless of inner IPID.
We really can use zero IPID, to address the flaw, and as a bonus,
avoid a couple of atomic operations in ip_idents_reserve()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Geoff Alexander <alexandg@cs.unm.edu>
Tested-by: Geoff Alexander <alexandg@cs.unm.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
All the 3 callers of addrconf_add_mroute() assert RTNL
lock, they don't take any additional lock either, so
it is safe to convert it to GFP_KERNEL.
Same for sit_add_v4_addrs().
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The new tcf_exts_for_each_action() macro doesn't reference its
arguments when CONFIG_NET_CLS_ACT is disabled, which leads to
a harmless warning in at least one driver:
drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c: In function 'tc_fill_actions':
drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c:64:6: error: unused variable 'i' [-Werror=unused-variable]
Adding a cast to void lets us avoid this kind of warning.
To be on the safe side, do it for all three arguments, not
just the one that caused the warning.
Fixes: 244cd96adb5f ("net_sched: remove list_head from tc_action")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The TC filter flow mapping override completely skipped the call to
cake_hash(); however that meant that the internal state was not being
updated, which ultimately leads to deadlocks in some configurations. Fix
that by passing the overridden flow ID into cake_hash() instead so it can
react appropriately.
In addition, the major number of the class ID can now be set to override
the host mapping in host isolation mode. If both host and flow are
overridden (or if the respective modes are disabled), flow dissection and
hashing will be skipped entirely; otherwise, the hashing will be kept for
the portions that are not set by the filter.
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The ncsi_pkg_info_all_nl() .dumpit handler is missing the NLM_F_MULTI
flag, causing additional package information after the first to be lost.
Also fixup a sanity check in ncsi_write_package_info() to reject out of
range package IDs.
Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
use_all_metadata() acquires read_lock(&ife_mod_lock), then calls
add_metainfo() which calls find_ife_oplist() which acquires the same
lock again. Deadlock!
Introduce __add_metainfo() which accepts struct tcf_meta_ops *ops
as an additional parameter and let its callers to decide how
to find it. For use_all_metadata(), it already has ops, no
need to find it again, just call __add_metainfo() directly.
And, as ife_mod_lock is only needed for find_ife_oplist(),
this means we can make non-atomic allocation for populate_metalist()
now.
Fixes: 817e9f2c5c26 ("act_ife: acquire ife_mod_lock before reading ifeoplist")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The only time we need to take tcfa_lock is when adding
a new metainfo to an existing ife->metalist. We don't need
to take tcfa_lock so early and so broadly in tcf_ife_init().
This means we can always take ife_mod_lock first, avoid the
reverse locking ordering warning as reported by Vlad.
Reported-by: Vlad Buslov <vladbu@mellanox.com>
Tested-by: Vlad Buslov <vladbu@mellanox.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit 42c625a486f3 ("net: sched: act_ife: disable bh
when taking ife_mod_lock"), because what ife_mod_lock protects
is absolutely not touched in rate est timer BH context, they have
no race.
A better fix is following up.
Cc: Vlad Buslov <vladbu@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After commit 90b73b77d08e, list_head is no longer needed.
Now we just need to convert the list iteration to array
iteration for drivers.
Fixes: 90b73b77d08e ("net: sched: change action API to use array of pointers to actions")
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tcf_idr_check() is replaced by tcf_idr_check_alloc(),
and __tcf_idr_check() now can be folded into tcf_idr_search().
Fixes: 0190c1d452a9 ("net: sched: atomically check-allocate action")
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fixes: 16af6067392c ("net: sched: implement reference counted action release")
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
All ops->delete() wants is getting the tn->idrinfo, but we already
have tc_action before calling ops->delete(), and tc_action has
a pointer ->idrinfo.
More importantly, each type of action does the same thing, that is,
just calling tcf_idr_delete_index().
So it can be just removed.
Fixes: b409074e6693 ("net: sched: add 'delete' function to action ops")
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tcf_action_put_many() is mostly called to clean up actions on
failure path, but tcf_action_put_many(&actions[acts_deleted]) is
used in the ugliest way: it passes a slice of the array and
uses an additional NULL at the end to avoid out-of-bound
access.
acts_deleted is completely unnecessary since we can teach
tcf_action_put_many() scan the whole array and checks against
NULL pointer. Which also means tcf_action_delete() should
set deleted action pointers to NULL to avoid double free.
Fixes: 90b73b77d08e ("net: sched: change action API to use array of pointers to actions")
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Registering another device with same MAC address (such as TAP, VPN or
DPDK KNI) will confuse the VF autobinding logic. Restrict the search
to only run if the device is known to be a PCI attached VF.
Fixes: e8ff40d4bff1 ("hv_netvsc: improve VF device matching")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove duplicated include.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove including <linux/version.h> that don't need it.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove duplicated include.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Prior to the introduction of fib6_info lwtstate was managed by the dst
code. With fib6_info releasing lwtstate needs to be done when the struct
is freed.
Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There's a new Dell TB16 dock with a different iSerialNumber.
Apply the same fix from commit 0b1655143df0 ("r8152: disable RX
aggregation on Dell TB16 dock") to this model.
BugLink: https://bugs.launchpad.net/bugs/1785780
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Keep sending mailbox commands to the MFW when it is not responsive ends up
with a redundant amount of timeout expiries.
This patch prints the MCP status on the first command which is not
responded, and blocks the following commands.
Since the (un)load request commands might be not responded due to other
PFs, the patch also adds the option to skip the blocking upon a failure.
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The MFW manages an internal lock to prevent concurrent hardware
(de)initialization of different PFs.
This, together with the busy-waiting for the MFW's responses for commands,
might lead to a deadlock during concurrent load or unload of PFs.
This patch adds the option to sleep within the busy-waiting, and uses it
for the (un)load requests (which are not sent from an interrupt context) to
prevent the possible deadlock.
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Successive iterations of halting and resuming the management chip (MCP)
might fail, since currently the driver doesn't wait for these operations to
actually take place.
This patch prevents the driver from moving forward before the operations
are reflected in the state register.
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The MFW might be reset and re-update its shared memory.
Upon the detection of such a reset the driver rereads this memory, but it
has to wait till the data is valid.
This patch adds the missing wait for a data ready indication.
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If load ip6_vti module and create a network namespace when set
fb_tunnels_only_for_init_net to 1, then exit the namespace will
cause following crash:
[ 6601.677036] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 6601.679057] PGD 8000000425eca067 P4D 8000000425eca067 PUD 424292067 PMD 0
[ 6601.680483] Oops: 0000 [#1] SMP PTI
[ 6601.681223] CPU: 7 PID: 93 Comm: kworker/u16:1 Kdump: loaded Tainted: G E 4.18.0+ #3
[ 6601.683153] Hardware name: Fedora Project OpenStack Nova, BIOS seabios-1.7.5-11.el7 04/01/2014
[ 6601.684919] Workqueue: netns cleanup_net
[ 6601.685742] RIP: 0010:vti6_exit_batch_net+0x87/0xd0 [ip6_vti]
[ 6601.686932] Code: 7b 08 48 89 e6 e8 b9 ea d3 dd 48 8b 1b 48 85 db 75 ec 48 83 c5 08 48 81 fd 00 01 00 00 75 d5 49 8b 84 24 08 01 00 00 48 89 e6 <48> 8b 78 08 e8 90 ea d3 dd 49 8b 45 28 49 39 c6 4c 8d 68 d8 75 a1
[ 6601.690735] RSP: 0018:ffffa897c2737de0 EFLAGS: 00010246
[ 6601.691846] RAX: 0000000000000000 RBX: 0000000000000000 RCX: dead000000000200
[ 6601.693324] RDX: 0000000000000015 RSI: ffffa897c2737de0 RDI: ffffffff9f2ea9e0
[ 6601.694824] RBP: 0000000000000100 R08: 0000000000000000 R09: 0000000000000000
[ 6601.696314] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8dc323c07e00
[ 6601.697812] R13: ffff8dc324a63100 R14: ffffa897c2737e30 R15: ffffa897c2737e30
[ 6601.699345] FS: 0000000000000000(0000) GS:ffff8dc33fdc0000(0000) knlGS:0000000000000000
[ 6601.701068] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6601.702282] CR2: 0000000000000008 CR3: 0000000424966002 CR4: 00000000001606e0
[ 6601.703791] Call Trace:
[ 6601.704329] cleanup_net+0x1b4/0x2c0
[ 6601.705268] process_one_work+0x16c/0x370
[ 6601.706145] worker_thread+0x49/0x3e0
[ 6601.706942] kthread+0xf8/0x130
[ 6601.707626] ? rescuer_thread+0x340/0x340
[ 6601.708476] ? kthread_bind+0x10/0x10
[ 6601.709266] ret_from_fork+0x35/0x40
Reproduce:
modprobe ip6_vti
echo 1 > /proc/sys/net/core/fb_tunnels_only_for_init_net
unshare -n
exit
This because ip6n->tnls_wc[0] point to fallback device in default, but
in non-default namespace, ip6n->tnls_wc[0] will be NULL, so add the NULL
check comparatively.
Fixes: e2948e5af8ee ("ip6_vti: fix creating fallback tunnel device for vti6")
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When set fb_tunnels_only_for_init_net to 1, don't create fallback tunnel
device for vti6 when a new namespace is created.
Tested:
[root@builder2 ~]# modprobe ip6_tunnel
[root@builder2 ~]# modprobe ip6_vti
[root@builder2 ~]# echo 1 > /proc/sys/net/core/fb_tunnels_only_for_init_net
[root@builder2 ~]# unshare -n
[root@builder2 ~]# ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group
default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After set fb_tunnels_only_for_init_net to 1, the itn->fb_tunnel_dev will
be NULL and will cause following crash:
[ 2742.849298] BUG: unable to handle kernel NULL pointer dereference at 0000000000000941
[ 2742.851380] PGD 800000042c21a067 P4D 800000042c21a067 PUD 42aaed067 PMD 0
[ 2742.852818] Oops: 0002 [#1] SMP PTI
[ 2742.853570] CPU: 7 PID: 2484 Comm: unshare Kdump: loaded Not tainted 4.18.0-rc8+ #2
[ 2742.855163] Hardware name: Fedora Project OpenStack Nova, BIOS seabios-1.7.5-11.el7 04/01/2014
[ 2742.856970] RIP: 0010:vti_init_net+0x3a/0x50 [ip_vti]
[ 2742.858034] Code: 90 83 c0 48 c7 c2 20 a1 83 c0 48 89 fb e8 6e 3b f6 ff 85 c0 75 22 8b 0d f4 19 00 00 48 8b 93 00 14 00 00 48 8b 14 ca 48 8b 12 <c6> 82 41 09 00 00 04 c6 82 38 09 00 00 45 5b c3 66 0f 1f 44 00 00
[ 2742.861940] RSP: 0018:ffff9be28207fde0 EFLAGS: 00010246
[ 2742.863044] RAX: 0000000000000000 RBX: ffff8a71ebed4980 RCX: 0000000000000013
[ 2742.864540] RDX: 0000000000000000 RSI: 0000000000000013 RDI: ffff8a71ebed4980
[ 2742.866020] RBP: ffff8a71ea717000 R08: ffffffffc083903c R09: ffff8a71ea717000
[ 2742.867505] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a71ebed4980
[ 2742.868987] R13: 0000000000000013 R14: ffff8a71ea5b49c0 R15: 0000000000000000
[ 2742.870473] FS: 00007f02266c9740(0000) GS:ffff8a71ffdc0000(0000) knlGS:0000000000000000
[ 2742.872143] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2742.873340] CR2: 0000000000000941 CR3: 000000042bc20006 CR4: 00000000001606e0
[ 2742.874821] Call Trace:
[ 2742.875358] ops_init+0x38/0xf0
[ 2742.876078] setup_net+0xd9/0x1f0
[ 2742.876789] copy_net_ns+0xb7/0x130
[ 2742.877538] create_new_namespaces+0x11a/0x1d0
[ 2742.878525] unshare_nsproxy_namespaces+0x55/0xa0
[ 2742.879526] ksys_unshare+0x1a7/0x330
[ 2742.880313] __x64_sys_unshare+0xe/0x20
[ 2742.881131] do_syscall_64+0x5b/0x180
[ 2742.881933] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Reproduce:
echo 1 > /proc/sys/net/core/fb_tunnels_only_for_init_net
modprobe ip_vti
unshare -n
Fixes: 79134e6ce2c9 ("net: do not create fallback tunnels for non-default namespaces")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Found the ethernet network on ASUS X441UAR doesn't come back on resume
from suspend when using MSI-X. The chip is RTL8106e - version 39.
[ 21.848357] libphy: r8169: probed
[ 21.848473] r8169 0000:02:00.0 eth0: RTL8106e, 0c:9d:92:32:67:b4, XID
44900000, IRQ 127
[ 22.518860] r8169 0000:02:00.0 enp2s0: renamed from eth0
[ 29.458041] Generic PHY r8169-200:00: attached PHY driver [Generic
PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
[ 63.227398] r8169 0000:02:00.0 enp2s0: Link is Up - 100Mbps/Full -
flow control off
[ 124.514648] Generic PHY r8169-200:00: attached PHY driver [Generic
PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
Here is the ethernet controller in detail:
02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd.
RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller [10ec:8136]
(rev 07)
Subsystem: ASUSTeK Computer Inc. RTL810xE PCI Express Fast
Ethernet controller [1043:200f]
Flags: bus master, fast devsel, latency 0, IRQ 16
I/O ports at e000 [size=256]
Memory at ef100000 (64-bit, non-prefetchable) [size=4K]
Memory at e0000000 (64-bit, prefetchable) [size=16K]
Capabilities: <access denied>
Kernel driver in use: r8169
Kernel modules: r8169
Falling back to MSI fixes the issue.
Fixes: 6c6aa15fdea5 ("r8169: improve interrupt handling")
Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
timekeeping_clocktai64() has been renamed to ktime_get_clocktai_ts64()
for consistency with the other ktime_get_* access functions.
Rename the new caller that has come up as well.
Question: this is the only ptp driver that sets the hardware time
to the current system time in TAI. Why does it do that?
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Recently, ops->init() and ops->dump() of all actions were modified to
always obtain tcf_lock when accessing private action state. Actions that
don't depend on tcf_lock for synchronization with their data path use
non-bh locking API. However, tcf_lock is also used to protect rate
estimator stats in softirq context by timer callback.
Change ops->init() and ops->dump() of all actions to disable bh when using
tcf_lock to prevent deadlock reported by following lockdep warning:
[ 105.470398] ================================
[ 105.475014] WARNING: inconsistent lock state
[ 105.479628] 4.18.0-rc8+ #664 Not tainted
[ 105.483897] --------------------------------
[ 105.488511] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 105.494871] swapper/16/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
[ 105.500449] 00000000f86c012e (&(&p->tcfa_lock)->rlock){+.?.}, at: est_fetch_counters+0x3c/0xa0
[ 105.509696] {SOFTIRQ-ON-W} state was registered at:
[ 105.514925] _raw_spin_lock+0x2c/0x40
[ 105.519022] tcf_bpf_init+0x579/0x820 [act_bpf]
[ 105.523990] tcf_action_init_1+0x4e4/0x660
[ 105.528518] tcf_action_init+0x1ce/0x2d0
[ 105.532880] tcf_exts_validate+0x1d8/0x200
[ 105.537416] fl_change+0x55a/0x268b [cls_flower]
[ 105.542469] tc_new_tfilter+0x748/0xa20
[ 105.546738] rtnetlink_rcv_msg+0x56a/0x6d0
[ 105.551268] netlink_rcv_skb+0x18d/0x200
[ 105.555628] netlink_unicast+0x2d0/0x370
[ 105.559990] netlink_sendmsg+0x3b9/0x6a0
[ 105.564349] sock_sendmsg+0x6b/0x80
[ 105.568271] ___sys_sendmsg+0x4a1/0x520
[ 105.572547] __sys_sendmsg+0xd7/0x150
[ 105.576655] do_syscall_64+0x72/0x2c0
[ 105.580757] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 105.586243] irq event stamp: 489296
[ 105.590084] hardirqs last enabled at (489296): [<ffffffffb507e639>] _raw_spin_unlock_irq+0x29/0x40
[ 105.599765] hardirqs last disabled at (489295): [<ffffffffb507e745>] _raw_spin_lock_irq+0x15/0x50
[ 105.609277] softirqs last enabled at (489292): [<ffffffffb413a6a3>] irq_enter+0x83/0xa0
[ 105.618001] softirqs last disabled at (489293): [<ffffffffb413a800>] irq_exit+0x140/0x190
[ 105.626813]
other info that might help us debug this:
[ 105.633976] Possible unsafe locking scenario:
[ 105.640526] CPU0
[ 105.643325] ----
[ 105.646125] lock(&(&p->tcfa_lock)->rlock);
[ 105.650747] <Interrupt>
[ 105.653717] lock(&(&p->tcfa_lock)->rlock);
[ 105.658514]
*** DEADLOCK ***
[ 105.665349] 1 lock held by swapper/16/0:
[ 105.669629] #0: 00000000a640ad99 ((&est->timer)){+.-.}, at: call_timer_fn+0x10b/0x550
[ 105.678200]
stack backtrace:
[ 105.683194] CPU: 16 PID: 0 Comm: swapper/16 Not tainted 4.18.0-rc8+ #664
[ 105.690249] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[ 105.698626] Call Trace:
[ 105.701421] <IRQ>
[ 105.703791] dump_stack+0x92/0xeb
[ 105.707461] print_usage_bug+0x336/0x34c
[ 105.711744] mark_lock+0x7c9/0x980
[ 105.715500] ? print_shortest_lock_dependencies+0x2e0/0x2e0
[ 105.721424] ? check_usage_forwards+0x230/0x230
[ 105.726315] __lock_acquire+0x923/0x26f0
[ 105.730597] ? debug_show_all_locks+0x240/0x240
[ 105.735478] ? mark_lock+0x493/0x980
[ 105.739412] ? check_chain_key+0x140/0x1f0
[ 105.743861] ? __lock_acquire+0x836/0x26f0
[ 105.748323] ? lock_acquire+0x12e/0x290
[ 105.752516] lock_acquire+0x12e/0x290
[ 105.756539] ? est_fetch_counters+0x3c/0xa0
[ 105.761084] _raw_spin_lock+0x2c/0x40
[ 105.765099] ? est_fetch_counters+0x3c/0xa0
[ 105.769633] est_fetch_counters+0x3c/0xa0
[ 105.773995] est_timer+0x87/0x390
[ 105.777670] ? est_fetch_counters+0xa0/0xa0
[ 105.782210] ? lock_acquire+0x12e/0x290
[ 105.786410] call_timer_fn+0x161/0x550
[ 105.790512] ? est_fetch_counters+0xa0/0xa0
[ 105.795055] ? del_timer_sync+0xd0/0xd0
[ 105.799249] ? __lock_is_held+0x93/0x110
[ 105.803531] ? mark_held_locks+0x20/0xe0
[ 105.807813] ? _raw_spin_unlock_irq+0x29/0x40
[ 105.812525] ? est_fetch_counters+0xa0/0xa0
[ 105.817069] ? est_fetch_counters+0xa0/0xa0
[ 105.821610] run_timer_softirq+0x3c4/0x9f0
[ 105.826064] ? lock_acquire+0x12e/0x290
[ 105.830257] ? __bpf_trace_timer_class+0x10/0x10
[ 105.835237] ? __lock_is_held+0x25/0x110
[ 105.839517] __do_softirq+0x11d/0x7bf
[ 105.843542] irq_exit+0x140/0x190
[ 105.847208] smp_apic_timer_interrupt+0xac/0x3b0
[ 105.852182] apic_timer_interrupt+0xf/0x20
[ 105.856628] </IRQ>
[ 105.859081] RIP: 0010:cpuidle_enter_state+0xd8/0x4d0
[ 105.864395] Code: 46 ff 48 89 44 24 08 0f 1f 44 00 00 31 ff e8 cf ec 46 ff 80 7c 24 07 00 0f 85 1d 02 00 00 e8 9f 90 4b ff fb 66 0f 1f 44 00 00 <4c> 8b 6c 24 08 4d 29 fd 0f 80 36 03 00 00 4c 89 e8 48 ba cf f7 53
[ 105.884288] RSP: 0018:ffff8803ad94fd20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[ 105.892494] RAX: 0000000000000000 RBX: ffffe8fb300829c0 RCX: ffffffffb41e19e1
[ 105.899988] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8803ad9358ac
[ 105.907503] RBP: ffffffffb6636300 R08: 0000000000000004 R09: 0000000000000000
[ 105.914997] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000004
[ 105.922487] R13: ffffffffb6636140 R14: ffffffffb66362d8 R15: 000000188d36091b
[ 105.929988] ? trace_hardirqs_on_caller+0x141/0x2d0
[ 105.935232] do_idle+0x28e/0x320
[ 105.938817] ? arch_cpu_idle_exit+0x40/0x40
[ 105.943361] ? mark_lock+0x8c1/0x980
[ 105.947295] ? _raw_spin_unlock_irqrestore+0x32/0x60
[ 105.952619] cpu_startup_entry+0xc2/0xd0
[ 105.956900] ? cpu_in_idle+0x20/0x20
[ 105.960830] ? _raw_spin_unlock_irqrestore+0x32/0x60
[ 105.966146] ? trace_hardirqs_on_caller+0x141/0x2d0
[ 105.971391] start_secondary+0x2b5/0x360
[ 105.975669] ? set_cpu_sibling_map+0x1330/0x1330
[ 105.980654] secondary_startup_64+0xa5/0xb0
Taking tcf_lock in sample action with bh disabled causes lockdep to issue a
warning regarding possible irq lock inversion dependency between tcf_lock,
and psample_groups_lock that is taken when holding tcf_lock in sample init:
[ 162.108959] Possible interrupt unsafe locking scenario:
[ 162.116386] CPU0 CPU1
[ 162.121277] ---- ----
[ 162.126162] lock(psample_groups_lock);
[ 162.130447] local_irq_disable();
[ 162.136772] lock(&(&p->tcfa_lock)->rlock);
[ 162.143957] lock(psample_groups_lock);
[ 162.150813] <Interrupt>
[ 162.153808] lock(&(&p->tcfa_lock)->rlock);
[ 162.158608]
*** DEADLOCK ***
In order to prevent potential lock inversion dependency between tcf_lock
and psample_groups_lock, extract call to psample_group_get() from tcf_lock
protected section in sample action init function.
Fixes: 4e232818bd32 ("net: sched: act_mirred: remove dependency on rtnl lock")
Fixes: 764e9a24480f ("net: sched: act_vlan: remove dependency on rtnl lock")
Fixes: 729e01260989 ("net: sched: act_tunnel_key: remove dependency on rtnl lock")
Fixes: d77284956656 ("net: sched: act_sample: remove dependency on rtnl lock")
Fixes: e8917f437006 ("net: sched: act_gact: remove dependency on rtnl lock")
Fixes: b6a2b971c0b0 ("net: sched: act_csum: remove dependency on rtnl lock")
Fixes: 2142236b4584 ("net: sched: act_bpf: remove dependency on rtnl lock")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Same as ip_vti, use iptunnel_xmit_stats to updates stats in tunnel xmit
code path.
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This function was created as a deprecated fallback case back in 2010 by
commit eb14120f743d ("pcmcia: re-work pcmcia_request_irq()") for legacy
cases.
Actual in-kernel users haven't been around for a long while. The last
in-kernel user was apparently removed four years ago by commit
5f5316fcd08e ("am2150: Update nmclan_cs.c to use update PCMCIA API").
Just remove it entirely.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We haven't had lots of deprecation warnings lately, but the rdma use of
it made them flare up again.
They are not useful. They annoy everybody, and nobody ever does
anything about them, because it's always "somebody elses problem". And
when people start thinking that warnings are normal, they stop looking
at them, and the real warnings that mean something go unnoticed.
If you want to get rid of a function, just get rid of it. Convert every
user to the new world order.
And if you can't do that, then don't annoy everybody else with your
marking that says "I couldn't be bothered to fix this, so I'll just spam
everybody elses build logs with warnings about my laziness".
Make a kernelnewbies wiki page about things that could be cleaned up,
write a blog post about it, or talk to people on the mailing lists. But
don't add warnings to the kernel build about cleanup that you think
should happen but you aren't doing yourself.
Don't. Just don't.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Variables align_start and align_end are being assigned but are never
used hence they are redundant and can be removed.
Cleans up clang warnings:
warning: variable 'align_start' set but not used [-Wunused-but-set-variable]
warning: variable 'align_size' set but not used [-Wunused-but-set-variable]
Link: http://lkml.kernel.org/r/20180714161124.3923-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pointer uwq is being assigned but is never used hence it is redundant
and can be removed.
Cleans up clang warning:
warning: variable 'uwq' set but not used [-Wunused-but-set-variable]
Link: http://lkml.kernel.org/r/20180717090802.18357-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When perf profiling a wide variety of different workloads, it was found
that vmacache_find() had higher than expected cost: up to 0.08% of cpu
utilization in some cases. This was found to rival other core VM
functions such as alloc_pages_vma() with thp enabled and default
mempolicy, and the conditionals in __get_vma_policy().
VMACACHE_HASH() determines which of the four per-task_struct slots a vma
is cached for a particular address. This currently depends on the pfn,
so pfn 5212 occupies a different vmacache slot than its neighboring pfn
5213.
vmacache_find() iterates through all four of current's vmacache slots
when looking up an address. Hashing based on pfn, an address has
~1/VMACACHE_SIZE chance of being cached in the first vmacache slot, or
about 25%, *if* the vma is cached.
This patch hashes an address by its pmd instead of pte to optimize for
workloads with good spatial locality. This results in a higher
probability of vmas being cached in the first slot that is checked:
normally ~70% on the same workloads instead of 25%.
[rientjes@google.com: various updates]
Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1807231532290.109445@chino.kir.corp.google.com
Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1807091749150.114630@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Provide list_lru_shrink_walk_irq() and let it behave like
list_lru_walk_one() except that it locks the spinlock with
spin_lock_irq(). This is used by scan_shadow_nodes() because its lock
nests within the i_pages lock which is acquired with IRQ. This change
allows to use proper locking promitives instead hand crafted
lock_irq_disable() plus spin_lock().
There is no EXPORT_SYMBOL provided because the current user is in-kernel
only.
Add list_lru_shrink_walk_irq() which acquires the spinlock with the
proper locking primitives.
Link: http://lkml.kernel.org/r/20180716111921.5365-5-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
__list_lru_walk_one() is invoked with struct list_lru *lru, int nid as
the first two argument. Those two are only used to retrieve struct
list_lru_node. Since this is already done by the caller of the function
for the locking, we can pass struct list_lru_node* directly and avoid
the dance around it.
Link: http://lkml.kernel.org/r/20180716111921.5365-4-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Move the locking inside __list_lru_walk_one() to its caller. This is a
preparation step in order to introduce list_lru_walk_one_irq() which
does spin_lock_irq() instead of spin_lock() for the locking.
Link: http://lkml.kernel.org/r/20180716111921.5365-3-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Patch series "mm/list_lru: Add list_lru_shrink_walk_irq() and a user".
This series removes the local_irq_disable() around
list_lru_shrink_walk() (as used by mm/workingset) by adding
list_lru_shrink_walk_irq().
Vladimir Davydov preferred this over `irq' argument which I added to
struct list_lru.
The initial post (of this series) received a Reviewed-by tag by Vladimir
Davydov which I added to each patch of the series. The series applies
on top of akpm's tree which has Kirill's shrink_slab series and does not
clash with it (akpm asked me to wait a week or so and repost it then).
I tested the code paths by triggering the OOM-killer via memory over
commit and lockdep did not complain (nor did I see any warnings).
This patch (of 4):
list_lru_walk_node() invokes __list_lru_walk_one() with -1 as the
memcg_idx parameter. The same can be achieved by list_lru_walk_one() and
passing NULL as memcg argument which then gets converted into -1. This is
a preparation step when the spin_lock() function is lifted to the caller
of __list_lru_walk_one(). Invoke list_lru_walk_one() instead
__list_lru_walk_one() when possible.
Link: http://lkml.kernel.org/r/20180716111921.5365-2-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
CONFIG_THP_SWAP should depend on CONFIG_SWAP, because it's unreasonable
to optimize swapping for THP (Transparent Huge Page) without basic
swapping support.
In original code, when CONFIG_SWAP=n and CONFIG_THP_SWAP=y,
split_swap_cluster() will not be built because it is in swapfile.c, but
it will be called in huge_memory.c. This doesn't trigger a build error
in practice because the call site is enclosed by PageSwapCache(), which
is defined to be constant 0 when CONFIG_SWAP=n. But this is fragile and
should be fixed.
The comments are fixed too to reflect the latest progress.
Link: http://lkml.kernel.org/r/20180713021228.439-1-ying.huang@intel.com
Fixes: 38d8b4e6bdc8 ("mm, THP, swap: delay splitting THP during swap out")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shaohua Li <shli@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Rename new_sparse_init() to sparse_init() which enables it. Delete old
sparse_init() and all the code that became obsolete with.
[pasha.tatashin@oracle.com: remove unused sparse_mem_maps_populate_node()]
Link: http://lkml.kernel.org/r/20180716174447.14529-6-pasha.tatashin@oracle.com
Link: http://lkml.kernel.org/r/20180712203730.8703-6-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Tested-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Tested-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|