aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/cavium (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2020-02-26net: core: devlink.c: Use built-in RCU list checkingMadhuparna Bhowmik1-9/+10
list_for_each_entry_rcu() has built-in RCU and lock checking. Pass cond argument to list_for_each_entry_rcu() to silence false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled. The devlink->lock is held when devlink_dpipe_table_find() is called in non RCU read side section. Therefore, pass struct devlink to devlink_dpipe_table_find() for lockdep checking. Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26net: dsa: bcm_sf2: Forcibly configure IMP port for 1Gb/secFlorian Fainelli1-2/+1
We are still experiencing some packet loss with the existing advanced congestion buffering (ACB) settings with the IMP port configured for 2Gb/sec, so revert to conservative link speeds that do not produce packet loss until this is resolved. Fixes: 8f1880cbe8d0 ("net: dsa: bcm_sf2: Configure IMP port for 2Gb/sec") Fixes: de34d7084edd ("net: dsa: bcm_sf2: Only 7278 supports 2Gb/sec IMP port") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26bnxt_en: add newline to netdev_*() format stringsJonathan Lemon5-38/+38
Add missing newlines to netdev_* format strings so the lines aren't buffered by the printk subsystem. Nitpicked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26netfilter: xt_hashlimit: unregister proc file before releasing mutexCong Wang1-10/+6
Before releasing the global mutex, we only unlink the hashtable from the hash list, its proc file is still not unregistered at this point. So syzbot could trigger a race condition where a parallel htable_create() could register the same file immediately after the mutex is released. Move htable_remove_proc_entry() back to mutex protection to fix this. And, fold htable_destroy() into htable_put() to make the code slightly easier to understand. Reported-and-tested-by: syzbot+d195fd3b9a364ddd6731@syzkaller.appspotmail.com Fixes: c4a3922d2d20 ("netfilter: xt_hashlimit: reduce hashlimit_mutex scope for htable_put()") Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-02-26ethtool: limit bitset sizeMichal Kubecek2-1/+4
Syzbot reported that ethnl_compact_sanity_checks() can be tricked into reading past the end of ETHTOOL_A_BITSET_VALUE and ETHTOOL_A_BITSET_MASK attributes and even the message by passing a value between (u32)(-31) and (u32)(-1) as ETHTOOL_A_BITSET_SIZE. The problem is that DIV_ROUND_UP(attr_nbits, 32) is 0 for such values so that zero length ETHTOOL_A_BITSET_VALUE will pass the length check but ethnl_bitmap32_not_zero() check would try to access up to 512 MB of attribute "payload". Prevent this overflow byt limiting the bitset size. Technically, compact bitset format would allow bitset sizes up to almost 2^18 (so that the nest size does not exceed U16_MAX) but bitsets used by ethtool are much shorter. S16_MAX, the largest value which can be directly used as an upper limit in policy, should be a reasonable compromise. Fixes: 10b518d4e6dd ("ethtool: netlink bitset handling") Reported-by: syzbot+7fd4ed5b4234ab1fdccd@syzkaller.appspotmail.com Reported-by: syzbot+709b7a64d57978247e44@syzkaller.appspotmail.com Reported-by: syzbot+983cb8fb2d17a7af549d@syzkaller.appspotmail.com Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26net: Fix Tx hash bound checkingAmritha Nambiar1-0/+2
Fixes the lower and upper bounds when there are multiple TCs and traffic is on the the same TC on the same device. The lower bound is represented by 'qoffset' and the upper limit for hash value is 'qcount + qoffset'. This gives a clean Rx to Tx queue mapping when there are multiple TCs, as the queue indices for upper TCs will be offset by 'qoffset'. v2: Fixed commit description based on comments. Fixes: 1b837d489e06 ("net: Revoke export for __skb_tx_hash, update it to just be static skb_tx_hash") Fixes: eadec877ce9c ("net: Add support for subordinate traffic classes to netdev_pick_tx") Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26selftests: nft_concat_range: Add test for reported add/flush/add issueStefano Brivio1-4/+39
Add a specific test for the crash reported by Phil Sutter and addressed in the previous patch. The test cases that, in my intention, should have covered these cases, that is, the ones from the 'concurrency' section, don't run these sequences tightly enough and spectacularly failed to catch this. While at it, define a convenient way to add these kind of tests, by adding a "reported issues" test section. It's more convenient, for this particular test, to execute the set setup in its own function. However, future test cases like this one might need to call setup functions, and will typically need no tools other than nft, so allow for this in check_tools(). The original form of the reproducer used here was provided by Phil. Reported-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-02-26nft_set_pipapo: Actually fetch key data in nft_pipapo_remove()Stefano Brivio1-2/+4
Phil reports that adding elements, flushing and re-adding them right away: nft add table t '{ set s { type ipv4_addr . inet_service; flags interval; }; }' nft add element t s '{ 10.0.0.1 . 22-25, 10.0.0.1 . 10-20 }' nft flush set t s nft add element t s '{ 10.0.0.1 . 10-20, 10.0.0.1 . 22-25 }' triggers, almost reliably, a crash like this one: [ 71.319848] general protection fault, probably for non-canonical address 0x6f6b6e696c2e756e: 0000 [#1] PREEMPT SMP PTI [ 71.321540] CPU: 3 PID: 1201 Comm: kworker/3:2 Not tainted 5.6.0-rc1-00377-g2bb07f4e1d861 #192 [ 71.322746] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190711_202441-buildvm-armv7-10.arm.fedoraproject.org-2.fc31 04/01/2014 [ 71.324430] Workqueue: events nf_tables_trans_destroy_work [nf_tables] [ 71.325387] RIP: 0010:nft_set_elem_destroy+0xa5/0x110 [nf_tables] [ 71.326164] Code: 89 d4 84 c0 74 0e 8b 77 44 0f b6 f8 48 01 df e8 41 ff ff ff 45 84 e4 74 36 44 0f b6 63 08 45 84 e4 74 2c 49 01 dc 49 8b 04 24 <48> 8b 40 38 48 85 c0 74 4f 48 89 e7 4c 8b [ 71.328423] RSP: 0018:ffffc9000226fd90 EFLAGS: 00010282 [ 71.329225] RAX: 6f6b6e696c2e756e RBX: ffff88813ab79f60 RCX: ffff88813931b5a0 [ 71.330365] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88813ab79f9a [ 71.331473] RBP: ffff88813ab79f60 R08: 0000000000000008 R09: 0000000000000000 [ 71.332627] R10: 000000000000021c R11: 0000000000000000 R12: ffff88813ab79fc2 [ 71.333615] R13: ffff88813b3adf50 R14: dead000000000100 R15: ffff88813931b8a0 [ 71.334596] FS: 0000000000000000(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000 [ 71.335780] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 71.336577] CR2: 000055ac683710f0 CR3: 000000013a222003 CR4: 0000000000360ee0 [ 71.337533] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 71.338557] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 71.339718] Call Trace: [ 71.340093] nft_pipapo_destroy+0x7a/0x170 [nf_tables_set] [ 71.340973] nft_set_destroy+0x20/0x50 [nf_tables] [ 71.341879] nf_tables_trans_destroy_work+0x246/0x260 [nf_tables] [ 71.342916] process_one_work+0x1d5/0x3c0 [ 71.343601] worker_thread+0x4a/0x3c0 [ 71.344229] kthread+0xfb/0x130 [ 71.344780] ? process_one_work+0x3c0/0x3c0 [ 71.345477] ? kthread_park+0x90/0x90 [ 71.346129] ret_from_fork+0x35/0x40 [ 71.346748] Modules linked in: nf_tables_set nf_tables nfnetlink 8021q [last unloaded: nfnetlink] [ 71.348153] ---[ end trace 2eaa8149ca759bcc ]--- [ 71.349066] RIP: 0010:nft_set_elem_destroy+0xa5/0x110 [nf_tables] [ 71.350016] Code: 89 d4 84 c0 74 0e 8b 77 44 0f b6 f8 48 01 df e8 41 ff ff ff 45 84 e4 74 36 44 0f b6 63 08 45 84 e4 74 2c 49 01 dc 49 8b 04 24 <48> 8b 40 38 48 85 c0 74 4f 48 89 e7 4c 8b [ 71.350017] RSP: 0018:ffffc9000226fd90 EFLAGS: 00010282 [ 71.350019] RAX: 6f6b6e696c2e756e RBX: ffff88813ab79f60 RCX: ffff88813931b5a0 [ 71.350019] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88813ab79f9a [ 71.350020] RBP: ffff88813ab79f60 R08: 0000000000000008 R09: 0000000000000000 [ 71.350021] R10: 000000000000021c R11: 0000000000000000 R12: ffff88813ab79fc2 [ 71.350022] R13: ffff88813b3adf50 R14: dead000000000100 R15: ffff88813931b8a0 [ 71.350025] FS: 0000000000000000(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000 [ 71.350026] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 71.350027] CR2: 000055ac683710f0 CR3: 000000013a222003 CR4: 0000000000360ee0 [ 71.350028] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 71.350028] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 71.350030] Kernel panic - not syncing: Fatal exception [ 71.350412] Kernel Offset: disabled [ 71.365922] ---[ end Kernel panic - not syncing: Fatal exception ]--- which is caused by dangling elements that have been deactivated, but never removed. On a flush operation, nft_pipapo_walk() walks through all the elements in the mapping table, which are then deactivated by nft_flush_set(), one by one, and added to the commit list for removal. Element data is then freed. On transaction commit, nft_pipapo_remove() is called, and failed to remove these elements, leading to the stale references in the mapping. The first symptom of this, revealed by KASan, is a one-byte use-after-free in subsequent calls to nft_pipapo_walk(), which is usually not enough to trigger a panic. When stale elements are used more heavily, though, such as double-free via nft_pipapo_destroy() as in Phil's case, the problem becomes more noticeable. The issue comes from that fact that, on a flush operation, nft_pipapo_remove() won't get the actual key data via elem->key, elements to be deleted upon commit won't be found by the lookup via pipapo_get(), and removal will be skipped. Key data should be fetched via nft_set_ext_key(), instead. Reported-by: Phil Sutter <phil@nwl.cc> Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-02-25icmp: allow icmpv6_ndo_send to work with CONFIG_IPV6=nJason A. Donenfeld1-6/+10
The icmpv6_send function has long had a static inline implementation with an empty body for CONFIG_IPV6=n, so that code calling it doesn't need to be ifdef'd. The new icmpv6_ndo_send function, which is intended for drivers as a drop-in replacement with an identical function signature, should follow the same pattern. Without this patch, drivers that used to work with CONFIG_IPV6=n now result in a linker error. Cc: Chen Zhou <chenzhou10@huawei.com> Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: 0b41713b6066 ("icmp: introduce helper for nat'd source address in network device context") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-25selftests: nft_concat_range: Move option for 'list ruleset' before commandStefano Brivio1-6/+6
Before nftables commit fb9cea50e8b3 ("main: enforce options before commands"), 'nft list ruleset -a' happened to work, but it's wrong and won't work anymore. Replace it by 'nft -a list ruleset'. Reported-by: Chen Yi <yiche@redhat.com> Fixes: 611973c1e06f ("selftests: netfilter: Introduce tests for sets with range concatenation") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-02-24net: bridge: fix stale eth hdr pointer in br_dev_xmitNikolay Aleksandrov1-4/+2
In br_dev_xmit() we perform vlan filtering in br_allowed_ingress() but if the packet has the vlan header inside (e.g. bridge with disabled tx-vlan-offload) then the vlan filtering code will use skb_vlan_untag() to extract the vid before filtering which in turn calls pskb_may_pull() and we may end up with a stale eth pointer. Moreover the cached eth header pointer will generally be wrong after that operation. Remove the eth header caching and just use eth_hdr() directly, the compiler does the right thing and calculates it only once so we don't lose anything. Fixes: 057658cb33fb ("bridge: suppress arp pkts on BR_NEIGH_SUPPRESS ports") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24net: ll_temac: Handle DMA halt condition caused by buffer underrunEsben Haabendal2-5/+56
The SDMA engine used by TEMAC halts operation when it has finished processing of the last buffer descriptor in the buffer ring. Unfortunately, no interrupt event is generated when this happens, so we need to setup another mechanism to make sure DMA operation is restarted when enough buffers have been added to the ring. Fixes: 92744989533c ("net: add Xilinx ll_temac device driver") Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24net: ll_temac: Fix RX buffer descriptor handling on GFP_ATOMIC pressureEsben Haabendal2-31/+82
Failures caused by GFP_ATOMIC memory pressure have been observed, and due to the missing error handling, results in kernel crash such as [1876998.350133] kernel BUG at mm/slub.c:3952! [1876998.350141] invalid opcode: 0000 [#1] PREEMPT SMP PTI [1876998.350147] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.3.0-scnxt #1 [1876998.350150] Hardware name: N/A N/A/COMe-bIP2, BIOS CCR2R920 03/01/2017 [1876998.350160] RIP: 0010:kfree+0x1ca/0x220 [1876998.350164] Code: 85 db 74 49 48 8b 95 68 01 00 00 48 31 c2 48 89 10 e9 d7 fe ff ff 49 8b 04 24 a9 00 00 01 00 75 0b 49 8b 44 24 08 a8 01 75 02 <0f> 0b 49 8b 04 24 31 f6 a9 00 00 01 00 74 06 41 0f b6 74 24 5b [1876998.350172] RSP: 0018:ffffc900000f0df0 EFLAGS: 00010246 [1876998.350177] RAX: ffffea00027f0708 RBX: ffff888008d78000 RCX: 0000000000391372 [1876998.350181] RDX: 0000000000000000 RSI: ffffe8ffffd01400 RDI: ffff888008d78000 [1876998.350185] RBP: ffff8881185a5d00 R08: ffffc90000087dd8 R09: 000000000000280a [1876998.350189] R10: 0000000000000002 R11: 0000000000000000 R12: ffffea0000235e00 [1876998.350193] R13: ffff8881185438a0 R14: 0000000000000000 R15: ffff888118543870 [1876998.350198] FS: 0000000000000000(0000) GS:ffff88811f300000(0000) knlGS:0000000000000000 [1876998.350203] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 s#1 Part1 [1876998.350206] CR2: 00007f8dac7b09f0 CR3: 000000011e20a006 CR4: 00000000001606e0 [1876998.350210] Call Trace: [1876998.350215] <IRQ> [1876998.350224] ? __netif_receive_skb_core+0x70a/0x920 [1876998.350229] kfree_skb+0x32/0xb0 [1876998.350234] __netif_receive_skb_core+0x70a/0x920 [1876998.350240] __netif_receive_skb_one_core+0x36/0x80 [1876998.350245] process_backlog+0x8b/0x150 [1876998.350250] net_rx_action+0xf7/0x340 [1876998.350255] __do_softirq+0x10f/0x353 [1876998.350262] irq_exit+0xb2/0xc0 [1876998.350265] do_IRQ+0x77/0xd0 [1876998.350271] common_interrupt+0xf/0xf [1876998.350274] </IRQ> In order to handle such failures more graceful, this change splits the receive loop into one for consuming the received buffers, and one for allocating new buffers. When GFP_ATOMIC allocations fail, the receive will continue with the buffers that is still there, and with the expectation that the allocations will succeed in a later call to receive. Fixes: 92744989533c ("net: add Xilinx ll_temac device driver") Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24net: ll_temac: Add more error handling of dma_map_single() callsEsben Haabendal1-2/+24
This adds error handling to the remaining dma_map_single() calls, so that behavior is well defined if/when we run out of DMA memory. Fixes: 92744989533c ("net: add Xilinx ll_temac device driver") Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24net: ll_temac: Fix race condition causing TX hangEsben Haabendal1-3/+16
It is possible that the interrupt handler fires and frees up space in the TX ring in between checking for sufficient TX ring space and stopping the TX queue in temac_start_xmit. If this happens, the queue wake from the interrupt handler will occur before the queue is stopped, causing a lost wakeup and the adapter's transmit hanging. To avoid this, after stopping the queue, check again whether there is sufficient space in the TX ring. If so, wake up the queue again. This is a port of the similar fix in axienet driver, commit 7de44285c1f6 ("net: axienet: Fix race condition causing TX hang"). Fixes: 23ecc4bde21f ("net: ll_temac: fix checksum offload logic") Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24mac80211: rx: avoid RCU list traversal under mutexMadhuparna Bhowmik1-1/+1
local->sta_mtx is held in __ieee80211_check_fast_rx_iface(). No need to use list_for_each_entry_rcu() as it also requires a cond argument to avoid false lockdep warnings when not used in RCU read-side section (with CONFIG_PROVE_RCU_LIST). Therefore use list_for_each_entry(); Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Link: https://lore.kernel.org/r/20200223143302.15390-1-madhuparnabhowmik10@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24nl80211: explicitly include if_vlan.hJohannes Berg1-0/+1
We use that here, and do seem to get it through some recursive include, but better include it explicitly. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20200224093814.1b9c258fec67.I45ac150d4e11c72eb263abec9f1f0c7add9bef2b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-23net: core: devlink.c: Hold devlink->lock from the beginning of devlink_dpipe_table_register()Madhuparna Bhowmik1-7/+14
devlink_dpipe_table_find() should be called under either rcu_read_lock() or devlink->lock. devlink_dpipe_table_register() calls devlink_dpipe_table_find() without holding the lock and acquires it later. Therefore hold the devlink->lock from the beginning of devlink_dpipe_table_register(). Suggested-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-23net: phy: Avoid multiple suspendsFlorian Fainelli1-2/+3
It is currently possible for a PHY device to be suspended as part of a network device driver's suspend call while it is still being attached to that net_device, either via phy_suspend() or implicitly via phy_stop(). Later on, when the MDIO bus controller get suspended, we would attempt to suspend again the PHY because it is still attached to a network device. This is both a waste of time and creates an opportunity for improper clock/power management bugs to creep in. Fixes: 803dd9c77ac3 ("net: phy: avoid suspending twice a PHY") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-23net: ks8851-ml: Fix IRQ handling and lockingMarek Vasut1-6/+8
The KS8851 requires that packet RX and TX are mutually exclusive. Currently, the driver hopes to achieve this by disabling interrupt from the card by writing the card registers and by disabling the interrupt on the interrupt controller. This however is racy on SMP. Replace this approach by expanding the spinlock used around the ks_start_xmit() TX path to ks_irq() RX path to assure true mutual exclusion and remove the interrupt enabling/disabling, which is now not needed anymore. Furthermore, disable interrupts also in ks_net_stop(), which was missing before. Note that a massive improvement here would be to re-use the KS8851 driver approach, which is to move the TX path into a worker thread, interrupt handling to threaded interrupt, and synchronize everything with mutexes, but that would be a much bigger rework, for a separate patch. Signed-off-by: Marek Vasut <marex@denx.de> Cc: David S. Miller <davem@davemloft.net> Cc: Lukas Wunner <lukas@wunner.de> Cc: Petr Stetiar <ynezz@true.cz> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-23docs: networking: phy: Rephrase paragraph for clarityJonathan Neuschäfer1-2/+3
Let's make it a little easier to read. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-23tcp: fix TFO SYNACK undo to avoid double-timestamp-undoNeal Cardwell1-1/+5
In a rare corner case the new logic for undo of SYNACK RTO could result in triggering the warning in tcp_fastretrans_alert() that says: WARN_ON(tp->retrans_out != 0); The warning looked like: WARNING: CPU: 1 PID: 1 at net/ipv4/tcp_input.c:2818 tcp_ack+0x13e0/0x3270 The sequence that tickles this bug is: - Fast Open server receives TFO SYN with data, sends SYNACK - (client receives SYNACK and sends ACK, but ACK is lost) - server app sends some data packets - (N of the first data packets are lost) - server receives client ACK that has a TS ECR matching first SYNACK, and also SACKs suggesting the first N data packets were lost - server performs TS undo of SYNACK RTO, then immediately enters recovery - buggy behavior then performed a *second* undo that caused the connection to be in CA_Open with retrans_out != 0 Basically, the incoming ACK packet with SACK blocks causes us to first undo the cwnd reduction from the SYNACK RTO, but then immediately enters fast recovery, which then makes us eligible for undo again. And then tcp_rcv_synrecv_state_fastopen() accidentally performs an undo using a "mash-up" of state from two different loss recovery phases: it uses the timestamp info from the ACK of the original SYNACK, and the undo_marker from the fast recovery. This fix refines the logic to only invoke the tcp_try_undo_loss() inside tcp_rcv_synrecv_state_fastopen() if the connection is still in CA_Loss. If peer SACKs triggered fast recovery, then tcp_rcv_synrecv_state_fastopen() can't safely undo. Fixes: 794200d66273 ("tcp: undo cwnd on Fast Open spurious SYNACK retransmit") Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-23hv_netvsc: Fix unwanted wakeup in netvsc_attach()Haiyang Zhang2-1/+4
When netvsc_attach() is called by operations like changing MTU, etc., an extra wakeup may happen while netvsc_attach() calling rndis_filter_device_add() which sends rndis messages when queue is stopped in netvsc_detach(). The completion message will wake up queue 0. We can reproduce the issue by changing MTU etc., then the wake_queue counter from "ethtool -S" will increase beyond stop_queue counter: stop_queue: 0 wake_queue: 1 The issue causes queue wake up, and counter increment, no other ill effects in current code. So we didn't see any network problem for now. To fix this, initialize tx_disable to true, and set it to false when the NIC is ready to be attached or registered. Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-23net: usb: qmi_wwan: restore mtu min/max values after raw_ip switchDaniele Palmas1-0/+3
usbnet creates network interfaces with min_mtu = 0 and max_mtu = ETH_MAX_MTU. These values are not modified by qmi_wwan when the network interface is created initially, allowing, for example, to set mtu greater than 1500. When a raw_ip switch is done (raw_ip set to 'Y', then set to 'N') the mtu values for the network interface are set through ether_setup, with min_mtu = ETH_MIN_MTU and max_mtu = ETH_DATA_LEN, not allowing anymore to set mtu greater than 1500 (error: mtu greater than device maximum). The patch restores the original min/max mtu values set by usbnet after a raw_ip switch. Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-22net: genetlink: return the error code when attribute parsing fails.Paolo Abeni1-2/+3
Currently if attribute parsing fails and the genl family does not support parallel operation, the error code returned by __nlmsg_parse() is discarded by genl_family_rcv_msg_attrs_parse(). Be sure to report the error for all genl families. Fixes: c10e6cf85e7d ("net: genetlink: push attrbuf allocation and parsing to a separate function") Fixes: ab5b526da048 ("net: genetlink: always allocate separate attrs for dumpit ops") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-22ipv4: ensure rcu_read_lock() in cipso_v4_error()Matteo Croce1-1/+6
Similarly to commit c543cb4a5f07 ("ipv4: ensure rcu_read_lock() in ipv4_link_failure()"), __ip_options_compile() must be called under rcu protection. Fixes: 3da1ed7ac398 ("net: avoid use IPCB in cipso_v4_error") Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Matteo Croce <mcroce@redhat.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-22vhost: Check docket sk_family instead of call getnameEugenio Pérez1-9/+1
Doing so, we save one call to get data we already have in the struct. Also, since there is no guarantee that getname use sockaddr_ll parameter beyond its size, we add a little bit of security here. It should do not do beyond MAX_ADDR_LEN, but syzbot found that ax25_getname writes more (72 bytes, the size of full_sockaddr_ax25, versus 20 + 32 bytes of sockaddr_ll + MAX_ADDR_LEN in syzbot repro). Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server") Reported-by: syzbot+f2a62d07a5198c819c7b@syzkaller.appspotmail.com Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-22netfilter: ipset: Fix forceadd evaluation pathJozsef Kadlecsik1-0/+2
When the forceadd option is enabled, the hash:* types should find and replace the first entry in the bucket with the new one if there are no reuseable (deleted or timed out) entries. However, the position index was just not set to zero and remained the invalid -1 if there were no reuseable entries. Reported-by: syzbot+6a86565c74ebe30aea18@syzkaller.appspotmail.com Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7") Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
2020-02-22netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reportsJozsef Kadlecsik3-206/+472
In the case of huge hash:* types of sets, due to the single spinlock of a set the processing of the whole set under spinlock protection could take too long. There were four places where the whole hash table of the set was processed from bucket to bucket under holding the spinlock: - During resizing a set, the original set was locked to exclude kernel side add/del element operations (userspace add/del is excluded by the nfnetlink mutex). The original set is actually just read during the resize, so the spinlocking is replaced with rcu locking of regions. However, thus there can be parallel kernel side add/del of entries. In order not to loose those operations a backlog is added and replayed after the successful resize. - Garbage collection of timed out entries was also protected by the spinlock. In order not to lock too long, region locking is introduced and a single region is processed in one gc go. Also, the simple timer based gc running is replaced with a workqueue based solution. The internal book-keeping (number of elements, size of extensions) is moved to region level due to the region locking. - Adding elements: when the max number of the elements is reached, the gc was called to evict the timed out entries. The new approach is that the gc is called just for the matching region, assuming that if the region (proportionally) seems to be full, then the whole set does. We could scan the other regions to check every entry under rcu locking, but for huge sets it'd mean a slowdown at adding elements. - Listing the set header data: when the set was defined with timeout support, the garbage collector was called to clean up timed out entries to get the correct element numbers and set size values. Now the set is scanned to check non-timed out entries, without actually calling the gc for the whole set. Thanks to Florian Westphal for helping me to solve the SOFTIRQ-safe -> SOFTIRQ-unsafe lock order issues during working on the patch. Reported-by: syzbot+4b0e9d4ff3cf117837e5@syzkaller.appspotmail.com Reported-by: syzbot+c27b8d5010f45c666ed1@syzkaller.appspotmail.com Reported-by: syzbot+68a806795ac89df3aa1c@syzkaller.appspotmail.com Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7") Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
2020-02-21MAINTAINERS: use tabs for SAFESETIDRandy Dunlap1-4/+4
Use tabs for indentation instead of spaces for SAFESETID. All (!) other entries in MAINTAINERS use tabs (according to my simple grepping). Link: http://lkml.kernel.org/r/2bb2e52a-2694-816d-57b4-6cabfadd6c1a@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Micah Morton <mortonm@chromium.org> Cc: James Morris <jmorris@namei.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21lib/stackdepot.c: fix global out-of-bounds in stack_slabsAlexander Potapenko1-2/+6
Walter Wu has reported a potential case in which init_stack_slab() is called after stack_slabs[STACK_ALLOC_MAX_SLABS - 1] has already been initialized. In that case init_stack_slab() will overwrite stack_slabs[STACK_ALLOC_MAX_SLABS], which may result in a memory corruption. Link: http://lkml.kernel.org/r/20200218102950.260263-1-glider@google.com Fixes: cd11016e5f521 ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB") Signed-off-by: Alexander Potapenko <glider@google.com> Reported-by: Walter Wu <walter-zh.wu@mediatek.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21mm/sparsemem: pfn_to_page is not valid yet on SPARSEMEMWei Yang1-1/+1
When we use SPARSEMEM instead of SPARSEMEM_VMEMMAP, pfn_to_page() doesn't work before sparse_init_one_section() is called. This leads to a crash when hotplug memory: BUG: unable to handle page fault for address: 0000000006400000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] SMP PTI CPU: 3 PID: 221 Comm: kworker/u16:1 Tainted: G W 5.5.0-next-20200205+ #343 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015 Workqueue: kacpi_hotplug acpi_hotplug_work_fn RIP: 0010:__memset+0x24/0x30 Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89 d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 <f3> 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3 RSP: 0018:ffffb43ac0373c80 EFLAGS: 00010a87 RAX: ffffffffffffffff RBX: ffff8a1518800000 RCX: 0000000000050000 RDX: 0000000000000000 RSI: 00000000000000ff RDI: 0000000006400000 RBP: 0000000000140000 R08: 0000000000100000 R09: 0000000006400000 R10: 0000000000000000 R11: 0000000000000002 R12: 0000000000000000 R13: 0000000000000028 R14: 0000000000000000 R15: ffff8a153ffd9280 FS: 0000000000000000(0000) GS:ffff8a153ab00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000006400000 CR3: 0000000136fca000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: sparse_add_section+0x1c9/0x26a __add_pages+0xbf/0x150 add_pages+0x12/0x60 add_memory_resource+0xc8/0x210 __add_memory+0x62/0xb0 acpi_memory_device_add+0x13f/0x300 acpi_bus_attach+0xf6/0x200 acpi_bus_scan+0x43/0x90 acpi_device_hotplug+0x275/0x3d0 acpi_hotplug_work_fn+0x1a/0x30 process_one_work+0x1a7/0x370 worker_thread+0x30/0x380 kthread+0x112/0x130 ret_from_fork+0x35/0x40 We should use memmap as it did. On x86 the impact is limited to x86_32 builds, or x86_64 configurations that override the default setting for SPARSEMEM_VMEMMAP. Other memory hotplug archs (arm64, ia64, and ppc) also default to SPARSEMEM_VMEMMAP=y. [dan.j.williams@intel.com: changelog update] {rppt@linux.ibm.com: changelog update] Link: http://lkml.kernel.org/r/20200219030454.4844-1-bhe@redhat.com Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug") Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Baoquan He <bhe@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21mm/vmscan.c: don't round up scan size for online memory cgroupGavin Shan1-3/+6
Commit 68600f623d69 ("mm: don't miss the last page because of round-off error") makes the scan size round up to @denominator regardless of the memory cgroup's state, online or offline. This affects the overall reclaiming behavior: the corresponding LRU list is eligible for reclaiming only when its size logically right shifted by @sc->priority is bigger than zero in the former formula. For example, the inactive anonymous LRU list should have at least 0x4000 pages to be eligible for reclaiming when we have 60/12 for swappiness/priority and without taking scan/rotation ratio into account. After the roundup is applied, the inactive anonymous LRU list becomes eligible for reclaiming when its size is bigger than or equal to 0x1000 in the same condition. (0x4000 >> 12) * 60 / (60 + 140 + 1) = 1 ((0x1000 >> 12) * 60) + 200) / (60 + 140 + 1) = 1 aarch64 has 512MB huge page size when the base page size is 64KB. The memory cgroup that has a huge page is always eligible for reclaiming in that case. The reclaiming is likely to stop after the huge page is reclaimed, meaing the further iteration on @sc->priority and the silbing and child memory cgroups will be skipped. The overall behaviour has been changed. This fixes the issue by applying the roundup to offlined memory cgroups only, to give more preference to reclaim memory from offlined memory cgroup. It sounds reasonable as those memory is unlikedly to be used by anyone. The issue was found by starting up 8 VMs on a Ampere Mustang machine, which has 8 CPUs and 16 GB memory. Each VM is given with 2 vCPUs and 2GB memory. It took 264 seconds for all VMs to be completely up and 784MB swap is consumed after that. With this patch applied, it took 236 seconds and 60MB swap to do same thing. So there is 10% performance improvement for my case. Note that KSM is disable while THP is enabled in the testing. total used free shared buff/cache available Mem: 16196 10065 2049 16 4081 3749 Swap: 8175 784 7391 total used free shared buff/cache available Mem: 16196 11324 3656 24 1215 2936 Swap: 8175 60 8115 Link: http://lkml.kernel.org/r/20200211024514.8730-1-gshan@redhat.com Fixes: 68600f623d69 ("mm: don't miss the last page because of round-off error") Signed-off-by: Gavin Shan <gshan@redhat.com> Acked-by: Roman Gushchin <guro@fb.com> Cc: <stable@vger.kernel.org> [4.20+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21lib/string.c: update match_string() doc-strings with correct behaviorAlexandru Ardelean1-0/+16
There were a few attempts at changing behavior of the match_string() helpers (i.e. 'match_string()' & 'sysfs_match_string()'), to change & extend the behavior according to the doc-string. But the simplest approach is to just fix the doc-strings. The current behavior is fine as-is, and some bugs were introduced trying to fix it. As for extending the behavior, new helpers can always be introduced if needed. The match_string() helpers behave more like 'strncmp()' in the sense that they go up to n elements or until the first NULL element in the array of strings. This change updates the doc-strings with this info. Link: http://lkml.kernel.org/r/20200213072722.8249-1-alexandru.ardelean@analog.com Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: "Tobin C . Harding" <tobin@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21mm/memcontrol.c: lost css_put in memcg_expand_shrinker_maps()Vasily Averin1-1/+3
for_each_mem_cgroup() increases css reference counter for memory cgroup and requires to use mem_cgroup_iter_break() if the walk is cancelled. Link: http://lkml.kernel.org/r/c98414fb-7e1f-da0f-867a-9340ec4bd30b@virtuozzo.com Fixes: 0a4465d34028 ("mm, memcg: assign memcg-aware shrinkers bitmap to memcg") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Acked-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Roman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21mm/swapfile.c: fix a comment in sys_swapon()Christoph Hellwig1-1/+1
claim_swapfile now always takes i_rwsem. Link: http://lkml.kernel.org/r/20200114161225.309792-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21scripts/get_maintainer.pl: deprioritize old Fixes: addressesDouglas Anderson1-4/+4
Recently, I found that get_maintainer was causing me to send emails to the old addresses for maintainers. Since I usually just trust the output of get_maintainer to know the right email address, I didn't even look carefully and fired off two patch series that went to the wrong place. Oops. The problem was introduced recently when trying to add signatures from Fixes. The problem was that these email addresses were added too early in the process of compiling our list of places to send. Things added to the list earlier are considered more canonical and when we later added maintainer entries we ended up deduplicating to the old address. Here are two examples using mainline commits (to make it easier to replicate) for the two maintainers that I messed up recently: $ git format-patch d8549bcd0529~..d8549bcd0529 $ ./scripts/get_maintainer.pl 0001-clk-Add-clk_hw*.patch | grep Boyd Stephen Boyd <sboyd@codeaurora.org>... $ git format-patch 6d1238aa3395~..6d1238aa3395 $ ./scripts/get_maintainer.pl 0001-arm64-dts-qcom-qcs404*.patch | grep Andy Andy Gross <andy.gross@linaro.org> Let's move the adding of addresses from Fixes: to the end since the email addresses from these are much more likely to be older. After this patch the above examples get the right addresses for the two examples. Link: http://lkml.kernel.org/r/20200127095001.1.I41fba9f33590bfd92cd01960161d8384268c6569@changeid Fixes: 2f5bd343694e ("scripts/get_maintainer.pl: add signatures from Fixes: <badcommit> lines in commit message") Signed-off-by: Douglas Anderson <dianders@chromium.org> Acked-by: Joe Perches <joe@perches.com> Cc: Stephen Boyd <sboyd@kernel.org> Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Cc: Andy Gross <agross@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21get_maintainer: remove uses of P: for maintainer nameJoe Perches1-24/+0
Commit 1ca84ed6425f ("MAINTAINERS: Reclaim the P: tag for Maintainer Entry Profile") changed the use of the "P:" tag from "Person" to "Profile (ie: special subsystem coding styles and characteristics)" Change how get_maintainer.pl parses the "P:" tag to match. Link: http://lkml.kernel.org/r/ca53823fc5d25c0be32ad937d0207a0589c08643.camel@perches.com Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Dan Williams <dan.j.william@intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21selftests/vm: add missed tests in run_vmtestsSeongJae Park1-0/+33
The commits introducing 'mlock-random-test'[1], 'map_fiex_noreplace'[2], and 'thuge-gen'[3] have not added those in the 'run_vmtests' script and thus the 'run_tests' command of kselftests doesn't run those. This commit adds those in the script. 'gup_benchmark' and 'transhuge-stress' are also not included in the 'run_vmtests', but this commit does not add those because those are for performance measurement rather than pass/fail tests. [1] commit 26b4224d9961 ("selftests: expanding more mlock selftest") [2] commit 91cbacc34512 ("tools/testing/selftests/vm/map_fixed_noreplace.c: add test for MAP_FIXED_NOREPLACE") [3] commit fcc1f2d5dd34 ("selftests: add a test program for variable huge page sizes in mmap/shmget") Link: http://lkml.kernel.org/r/20200206085144.29126-1-sj38.park@gmail.com Signed-off-by: SeongJae Park <sjpark@amazon.de> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21include/uapi/linux/swab.h: fix userspace breakage, use __BITS_PER_LONG for swapChristian Borntraeger1-2/+2
QEMU has a funny new build error message when I use the upstream kernel headers: CC block/file-posix.o In file included from /home/cborntra/REPOS/qemu/include/qemu/timer.h:4, from /home/cborntra/REPOS/qemu/include/qemu/timed-average.h:29, from /home/cborntra/REPOS/qemu/include/block/accounting.h:28, from /home/cborntra/REPOS/qemu/include/block/block_int.h:27, from /home/cborntra/REPOS/qemu/block/file-posix.c:30: /usr/include/linux/swab.h: In function `__swab': /home/cborntra/REPOS/qemu/include/qemu/bitops.h:20:34: error: "sizeof" is not defined, evaluates to 0 [-Werror=undef] 20 | #define BITS_PER_LONG (sizeof (unsigned long) * BITS_PER_BYTE) | ^~~~~~ /home/cborntra/REPOS/qemu/include/qemu/bitops.h:20:41: error: missing binary operator before token "(" 20 | #define BITS_PER_LONG (sizeof (unsigned long) * BITS_PER_BYTE) | ^ cc1: all warnings being treated as errors make: *** [/home/cborntra/REPOS/qemu/rules.mak:69: block/file-posix.o] Error 1 rm tests/qemu-iotests/socket_scm_helper.o This was triggered by commit d5767057c9a ("uapi: rename ext2_swab() to swab() and share globally in swab.h"). That patch is doing #include <asm/bitsperlong.h> but it uses BITS_PER_LONG. The kernel file asm/bitsperlong.h provide only __BITS_PER_LONG. Let us use the __ variant in swap.h Link: http://lkml.kernel.org/r/20200213142147.17604-1-borntraeger@de.ibm.com Fixes: d5767057c9a ("uapi: rename ext2_swab() to swab() and share globally in swab.h") Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Yury Norov <yury.norov@gmail.com> Cc: Allison Randal <allison@lohutok.net> Cc: Joe Perches <joe@perches.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: William Breathitt Gray <vilhelm.gray@gmail.com> Cc: Torsten Hilbrich <torsten.hilbrich@secunet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21Revert "ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()"Ioanna Alifieraki1-4/+2
This reverts commit a97955844807e327df11aa33869009d14d6b7de0. Commit a97955844807 ("ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()") removes a lock that is needed. This leads to a process looping infinitely in exit_sem() and can also lead to a crash. There is a reproducer available in [1] and with the commit reverted the issue does not reproduce anymore. Using the reproducer found in [1] is fairly easy to reach a point where one of the child processes is looping infinitely in exit_sem between for(;;) and if (semid == -1) block, while it's trying to free its last sem_undo structure which has already been freed by freeary(). Each sem_undo struct is on two lists: one per semaphore set (list_id) and one per process (list_proc). The list_id list tracks undos by semaphore set, and the list_proc by process. Undo structures are removed either by freeary() or by exit_sem(). The freeary function is invoked when the user invokes a syscall to remove a semaphore set. During this operation freeary() traverses the list_id associated with the semaphore set and removes the undo structures from both the list_id and list_proc lists. For this case, exit_sem() is called at process exit. Each process contains a struct sem_undo_list (referred to as "ulp") which contains the head for the list_proc list. When the process exits, exit_sem() traverses this list to remove each sem_undo struct. As in freeary(), whenever a sem_undo struct is removed from list_proc, it is also removed from the list_id list. Removing elements from list_id is safe for both exit_sem() and freeary() due to sem_lock(). Removing elements from list_proc is not safe; freeary() locks &un->ulp->lock when it performs list_del_rcu(&un->list_proc) but exit_sem() does not (locking was removed by commit a97955844807 ("ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()"). This can result in the following situation while executing the reproducer [1] : Consider a child process in exit_sem() and the parent in freeary() (because of semctl(sid[i], NSEM, IPC_RMID)). - The list_proc for the child contains the last two undo structs A and B (the rest have been removed either by exit_sem() or freeary()). - The semid for A is 1 and semid for B is 2. - exit_sem() removes A and at the same time freeary() removes B. - Since A and B have different semid sem_lock() will acquire different locks for each process and both can proceed. The bug is that they remove A and B from the same list_proc at the same time because only freeary() acquires the ulp lock. When exit_sem() removes A it makes ulp->list_proc.next to point at B and at the same time freeary() removes B setting B->semid=-1. At the next iteration of for(;;) loop exit_sem() will try to remove B. The only way to break from for(;;) is for (&un->list_proc == &ulp->list_proc) to be true which is not. Then exit_sem() will check if B->semid=-1 which is and will continue looping in for(;;) until the memory for B is reallocated and the value at B->semid is changed. At that point, exit_sem() will crash attempting to unlink B from the lists (this can be easily triggered by running the reproducer [1] a second time). To prove this scenario instrumentation was added to keep information about each sem_undo (un) struct that is removed per process and per semaphore set (sma). CPU0 CPU1 [caller holds sem_lock(sma for A)] ... freeary() exit_sem() ... ... ... sem_lock(sma for B) spin_lock(A->ulp->lock) ... list_del_rcu(un_A->list_proc) list_del_rcu(un_B->list_proc) Undo structures A and B have different semid and sem_lock() operations proceed. However they belong to the same list_proc list and they are removed at the same time. This results into ulp->list_proc.next pointing to the address of B which is already removed. After reverting commit a97955844807 ("ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()") the issue was no longer reproducible. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1694779 Link: http://lkml.kernel.org/r/20191211191318.11860-1-ioanna-maria.alifieraki@canonical.com Fixes: a97955844807 ("ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()") Signed-off-by: Ioanna Alifieraki <ioanna-maria.alifieraki@canonical.com> Acked-by: Manfred Spraul <manfred@colorfullife.com> Acked-by: Herton R. Krzesinski <herton@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: <malat@debian.org> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jay Vosburgh <jay.vosburgh@canonical.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21y2038: hide timeval/timespec/itimerval/itimerspec typesArnd Bergmann2-10/+14
There are no in-kernel users remaining, but there may still be users that include linux/time.h instead of sys/time.h from user space, so leave the types available to user space while hiding them from kernel space. Only the __kernel_old_* versions of these types remain now. Link: http://lkml.kernel.org/r/20200110154232.4104492-4-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21y2038: remove unused time32 interfacesArnd Bergmann6-326/+1
No users remain, so kill these off before we grow new ones. Link: http://lkml.kernel.org/r/20200110154232.4104492-3-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21y2038: remove ktime to/from timespec/timeval conversionArnd Bergmann1-37/+0
A couple of helpers are now obsolete and can be removed, so drivers can no longer start using them and instead use y2038-safe interfaces. Link: http://lkml.kernel.org/r/20200110154232.4104492-2-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21ACPI: PM: s2idle: Check fixed wakeup events in acpi_s2idle_wake()Rafael J. Wysocki3-0/+53
Commit fdde0ff8590b ("ACPI: PM: s2idle: Prevent spurious SCIs from waking up the system") overlooked the fact that fixed events can wake up the system too and broke RTC wakeup from suspend-to-idle as a result. Fix this issue by checking the fixed events in acpi_s2idle_wake() in addition to checking wakeup GPEs and break out of the suspend-to-idle loop if the status bits of any enabled fixed events are set then. Fixes: fdde0ff8590b ("ACPI: PM: s2idle: Prevent spurious SCIs from waking up the system") Reported-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: 5.4+ <stable@vger.kernel.org> # 5.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21mac80211: Remove a redundant mutex unlockAndrei Otcheretianski1-5/+1
The below-mentioned commit changed the code to unlock *inside* the function, but previously the unlock was *outside*. It failed to remove the outer unlock, however, leading to double unlock. Fix this. Fixes: 33483a6b88e4 ("mac80211: fix missing unlock on error in ieee80211_mark_sta_auth()") Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Link: https://lore.kernel.org/r/20200221104719.cce4741cf6eb.I671567b185c8a4c2409377e483fd149ce590f56d@changeid [rewrite commit message to better explain what happened] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-21cfg80211: check reg_rule for NULL in handle_channel_custom()Johannes Berg1-1/+1
We may end up with a NULL reg_rule after the loop in handle_channel_custom() if the bandwidth didn't fit, check if this is the case and bail out if so. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20200221104449.3b558a50201c.I4ad3725c4dacaefd2d18d3cc65ba6d18acd5dbfe@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-21nl80211: fix potential leak in AP startJohannes Berg1-2/+2
If nl80211_parse_he_obss_pd() fails, we leak the previously allocated ACL memory. Free it in this case. Fixes: 796e90f42b7e ("cfg80211: add support for parsing OBBS_PD attributes") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20200221104142.835aba4cdd14.I1923b55ba9989c57e13978f91f40bfdc45e60cbd@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-20bnxt_en: Issue PCIe FLR in kdump kernel to cleanup pending DMAs.Vasundhara Volam1-0/+8
If crashed kernel does not shutdown the NIC properly, PCIe FLR is required in the kdump kernel in order to initialize all the functions properly. Fixes: d629522e1d66 ("bnxt_en: Reduce memory usage when running in kdump kernel.") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20bnxt_en: Improve device shutdown method.Vasundhara Volam1-2/+2
Especially when bnxt_shutdown() is called during kexec, we need to disable MSIX and disable Bus Master to completely quiesce the device. Make these 2 calls unconditionally in the shutdown method. Fixes: c20dc142dd7b ("bnxt_en: Disable bus master during PCI shutdown and driver unload.") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>