aboutsummaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2020-02-26net: bcmgenet: Clear ID_MODE_DIS in EXT_RGMII_OOB_CTRL when not neededNicolas Saenz Julienne1-0/+1
Outdated Raspberry Pi 4 firmware might configure the external PHY as rgmii although the kernel currently sets it as rgmii-rxid. This makes connections unreliable as ID_MODE_DIS is left enabled. To avoid this, explicitly clear that bit whenever we don't need it. Fixes: da38802211cc ("net: bcmgenet: Add RGMII_RXID support") Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26sched: act: count in the size of action flags bitfieldJiri Pirko1-0/+1
The put of the flags was added by the commit referenced in fixes tag, however the size of the message was not extended accordingly. Fix this by adding size of the flags bitfield to the message size. Fixes: e38226786022 ("net: sched: update action implementations to support flags") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26net: core: devlink.c: Use built-in RCU list checkingMadhuparna Bhowmik1-9/+10
list_for_each_entry_rcu() has built-in RCU and lock checking. Pass cond argument to list_for_each_entry_rcu() to silence false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled. The devlink->lock is held when devlink_dpipe_table_find() is called in non RCU read side section. Therefore, pass struct devlink to devlink_dpipe_table_find() for lockdep checking. Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26net: dsa: bcm_sf2: Forcibly configure IMP port for 1Gb/secFlorian Fainelli1-2/+1
We are still experiencing some packet loss with the existing advanced congestion buffering (ACB) settings with the IMP port configured for 2Gb/sec, so revert to conservative link speeds that do not produce packet loss until this is resolved. Fixes: 8f1880cbe8d0 ("net: dsa: bcm_sf2: Configure IMP port for 2Gb/sec") Fixes: de34d7084edd ("net: dsa: bcm_sf2: Only 7278 supports 2Gb/sec IMP port") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller6-228/+529
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes: 1) Perform garbage collection from workqueue to fix rcu detected stall in ipset hash set types, from Jozsef Kadlecsik. 2) Fix the forceadd evaluation path, also from Jozsef. 3) Fix nft_set_pipapo selftest, from Stefano Brivio. 4) Crash when add-flush-add element in pipapo set, also from Stefano. Add test to cover this crash. 5) Remove sysctl entry under mutex in hashlimit, from Cong Wang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26bnxt_en: add newline to netdev_*() format stringsJonathan Lemon5-38/+38
Add missing newlines to netdev_* format strings so the lines aren't buffered by the printk subsystem. Nitpicked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26netfilter: xt_hashlimit: unregister proc file before releasing mutexCong Wang1-10/+6
Before releasing the global mutex, we only unlink the hashtable from the hash list, its proc file is still not unregistered at this point. So syzbot could trigger a race condition where a parallel htable_create() could register the same file immediately after the mutex is released. Move htable_remove_proc_entry() back to mutex protection to fix this. And, fold htable_destroy() into htable_put() to make the code slightly easier to understand. Reported-and-tested-by: syzbot+d195fd3b9a364ddd6731@syzkaller.appspotmail.com Fixes: c4a3922d2d20 ("netfilter: xt_hashlimit: reduce hashlimit_mutex scope for htable_put()") Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-02-26ethtool: limit bitset sizeMichal Kubecek2-1/+4
Syzbot reported that ethnl_compact_sanity_checks() can be tricked into reading past the end of ETHTOOL_A_BITSET_VALUE and ETHTOOL_A_BITSET_MASK attributes and even the message by passing a value between (u32)(-31) and (u32)(-1) as ETHTOOL_A_BITSET_SIZE. The problem is that DIV_ROUND_UP(attr_nbits, 32) is 0 for such values so that zero length ETHTOOL_A_BITSET_VALUE will pass the length check but ethnl_bitmap32_not_zero() check would try to access up to 512 MB of attribute "payload". Prevent this overflow byt limiting the bitset size. Technically, compact bitset format would allow bitset sizes up to almost 2^18 (so that the nest size does not exceed U16_MAX) but bitsets used by ethtool are much shorter. S16_MAX, the largest value which can be directly used as an upper limit in policy, should be a reasonable compromise. Fixes: 10b518d4e6dd ("ethtool: netlink bitset handling") Reported-by: syzbot+7fd4ed5b4234ab1fdccd@syzkaller.appspotmail.com Reported-by: syzbot+709b7a64d57978247e44@syzkaller.appspotmail.com Reported-by: syzbot+983cb8fb2d17a7af549d@syzkaller.appspotmail.com Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26net: Fix Tx hash bound checkingAmritha Nambiar1-0/+2
Fixes the lower and upper bounds when there are multiple TCs and traffic is on the the same TC on the same device. The lower bound is represented by 'qoffset' and the upper limit for hash value is 'qcount + qoffset'. This gives a clean Rx to Tx queue mapping when there are multiple TCs, as the queue indices for upper TCs will be offset by 'qoffset'. v2: Fixed commit description based on comments. Fixes: 1b837d489e06 ("net: Revoke export for __skb_tx_hash, update it to just be static skb_tx_hash") Fixes: eadec877ce9c ("net: Add support for subordinate traffic classes to netdev_pick_tx") Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26selftests: nft_concat_range: Add test for reported add/flush/add issueStefano Brivio1-4/+39
Add a specific test for the crash reported by Phil Sutter and addressed in the previous patch. The test cases that, in my intention, should have covered these cases, that is, the ones from the 'concurrency' section, don't run these sequences tightly enough and spectacularly failed to catch this. While at it, define a convenient way to add these kind of tests, by adding a "reported issues" test section. It's more convenient, for this particular test, to execute the set setup in its own function. However, future test cases like this one might need to call setup functions, and will typically need no tools other than nft, so allow for this in check_tools(). The original form of the reproducer used here was provided by Phil. Reported-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-02-26nft_set_pipapo: Actually fetch key data in nft_pipapo_remove()Stefano Brivio1-2/+4
Phil reports that adding elements, flushing and re-adding them right away: nft add table t '{ set s { type ipv4_addr . inet_service; flags interval; }; }' nft add element t s '{ 10.0.0.1 . 22-25, 10.0.0.1 . 10-20 }' nft flush set t s nft add element t s '{ 10.0.0.1 . 10-20, 10.0.0.1 . 22-25 }' triggers, almost reliably, a crash like this one: [ 71.319848] general protection fault, probably for non-canonical address 0x6f6b6e696c2e756e: 0000 [#1] PREEMPT SMP PTI [ 71.321540] CPU: 3 PID: 1201 Comm: kworker/3:2 Not tainted 5.6.0-rc1-00377-g2bb07f4e1d861 #192 [ 71.322746] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190711_202441-buildvm-armv7-10.arm.fedoraproject.org-2.fc31 04/01/2014 [ 71.324430] Workqueue: events nf_tables_trans_destroy_work [nf_tables] [ 71.325387] RIP: 0010:nft_set_elem_destroy+0xa5/0x110 [nf_tables] [ 71.326164] Code: 89 d4 84 c0 74 0e 8b 77 44 0f b6 f8 48 01 df e8 41 ff ff ff 45 84 e4 74 36 44 0f b6 63 08 45 84 e4 74 2c 49 01 dc 49 8b 04 24 <48> 8b 40 38 48 85 c0 74 4f 48 89 e7 4c 8b [ 71.328423] RSP: 0018:ffffc9000226fd90 EFLAGS: 00010282 [ 71.329225] RAX: 6f6b6e696c2e756e RBX: ffff88813ab79f60 RCX: ffff88813931b5a0 [ 71.330365] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88813ab79f9a [ 71.331473] RBP: ffff88813ab79f60 R08: 0000000000000008 R09: 0000000000000000 [ 71.332627] R10: 000000000000021c R11: 0000000000000000 R12: ffff88813ab79fc2 [ 71.333615] R13: ffff88813b3adf50 R14: dead000000000100 R15: ffff88813931b8a0 [ 71.334596] FS: 0000000000000000(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000 [ 71.335780] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 71.336577] CR2: 000055ac683710f0 CR3: 000000013a222003 CR4: 0000000000360ee0 [ 71.337533] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 71.338557] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 71.339718] Call Trace: [ 71.340093] nft_pipapo_destroy+0x7a/0x170 [nf_tables_set] [ 71.340973] nft_set_destroy+0x20/0x50 [nf_tables] [ 71.341879] nf_tables_trans_destroy_work+0x246/0x260 [nf_tables] [ 71.342916] process_one_work+0x1d5/0x3c0 [ 71.343601] worker_thread+0x4a/0x3c0 [ 71.344229] kthread+0xfb/0x130 [ 71.344780] ? process_one_work+0x3c0/0x3c0 [ 71.345477] ? kthread_park+0x90/0x90 [ 71.346129] ret_from_fork+0x35/0x40 [ 71.346748] Modules linked in: nf_tables_set nf_tables nfnetlink 8021q [last unloaded: nfnetlink] [ 71.348153] ---[ end trace 2eaa8149ca759bcc ]--- [ 71.349066] RIP: 0010:nft_set_elem_destroy+0xa5/0x110 [nf_tables] [ 71.350016] Code: 89 d4 84 c0 74 0e 8b 77 44 0f b6 f8 48 01 df e8 41 ff ff ff 45 84 e4 74 36 44 0f b6 63 08 45 84 e4 74 2c 49 01 dc 49 8b 04 24 <48> 8b 40 38 48 85 c0 74 4f 48 89 e7 4c 8b [ 71.350017] RSP: 0018:ffffc9000226fd90 EFLAGS: 00010282 [ 71.350019] RAX: 6f6b6e696c2e756e RBX: ffff88813ab79f60 RCX: ffff88813931b5a0 [ 71.350019] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88813ab79f9a [ 71.350020] RBP: ffff88813ab79f60 R08: 0000000000000008 R09: 0000000000000000 [ 71.350021] R10: 000000000000021c R11: 0000000000000000 R12: ffff88813ab79fc2 [ 71.350022] R13: ffff88813b3adf50 R14: dead000000000100 R15: ffff88813931b8a0 [ 71.350025] FS: 0000000000000000(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000 [ 71.350026] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 71.350027] CR2: 000055ac683710f0 CR3: 000000013a222003 CR4: 0000000000360ee0 [ 71.350028] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 71.350028] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 71.350030] Kernel panic - not syncing: Fatal exception [ 71.350412] Kernel Offset: disabled [ 71.365922] ---[ end Kernel panic - not syncing: Fatal exception ]--- which is caused by dangling elements that have been deactivated, but never removed. On a flush operation, nft_pipapo_walk() walks through all the elements in the mapping table, which are then deactivated by nft_flush_set(), one by one, and added to the commit list for removal. Element data is then freed. On transaction commit, nft_pipapo_remove() is called, and failed to remove these elements, leading to the stale references in the mapping. The first symptom of this, revealed by KASan, is a one-byte use-after-free in subsequent calls to nft_pipapo_walk(), which is usually not enough to trigger a panic. When stale elements are used more heavily, though, such as double-free via nft_pipapo_destroy() as in Phil's case, the problem becomes more noticeable. The issue comes from that fact that, on a flush operation, nft_pipapo_remove() won't get the actual key data via elem->key, elements to be deleted upon commit won't be found by the lookup via pipapo_get(), and removal will be skipped. Key data should be fetched via nft_set_ext_key(), instead. Reported-by: Phil Sutter <phil@nwl.cc> Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-02-26Merge branch 'master' of git://blackhole.kfki.hu/nfPablo Neira Ayuso3-206/+474
Jozsef Kadlecsik says: ==================== ipset patches for nf The first one is larger than usual, but the issue could not be solved simpler. Also, it's a resend of the patch I submitted a few days ago, with a one line fix on top of that: the size of the comment extensions was not taken into account at reporting the full size of the set. - Fix "INFO: rcu detected stall in hash_xxx" reports of syzbot by introducing region locking and using workqueue instead of timer based gc of timed out entries in hash types of sets in ipset. - Fix the forceadd evaluation path - the bug was also uncovered by the syzbot. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-02-25icmp: allow icmpv6_ndo_send to work with CONFIG_IPV6=nJason A. Donenfeld1-6/+10
The icmpv6_send function has long had a static inline implementation with an empty body for CONFIG_IPV6=n, so that code calling it doesn't need to be ifdef'd. The new icmpv6_ndo_send function, which is intended for drivers as a drop-in replacement with an identical function signature, should follow the same pattern. Without this patch, drivers that used to work with CONFIG_IPV6=n now result in a linker error. Cc: Chen Zhou <chenzhou10@huawei.com> Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: 0b41713b6066 ("icmp: introduce helper for nat'd source address in network device context") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-25selftests: nft_concat_range: Move option for 'list ruleset' before commandStefano Brivio1-6/+6
Before nftables commit fb9cea50e8b3 ("main: enforce options before commands"), 'nft list ruleset -a' happened to work, but it's wrong and won't work anymore. Replace it by 'nft -a list ruleset'. Reported-by: Chen Yi <yiche@redhat.com> Fixes: 611973c1e06f ("selftests: netfilter: Introduce tests for sets with range concatenation") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-02-24Merge tag 'mac80211-for-net-2020-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211David S. Miller4-9/+6
Johannes Berg ==================== A few fixes: * remove a double mutex-unlock * fix a leak in an error path * NULL pointer check * include if_vlan.h where needed * avoid RCU list traversal when not under RCU ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24net: bridge: fix stale eth hdr pointer in br_dev_xmitNikolay Aleksandrov1-4/+2
In br_dev_xmit() we perform vlan filtering in br_allowed_ingress() but if the packet has the vlan header inside (e.g. bridge with disabled tx-vlan-offload) then the vlan filtering code will use skb_vlan_untag() to extract the vid before filtering which in turn calls pskb_may_pull() and we may end up with a stale eth pointer. Moreover the cached eth header pointer will generally be wrong after that operation. Remove the eth header caching and just use eth_hdr() directly, the compiler does the right thing and calculates it only once so we don't lose anything. Fixes: 057658cb33fb ("bridge: suppress arp pkts on BR_NEIGH_SUPPRESS ports") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24Merge branch 'net-ll_temac-Bugfixes'David S. Miller2-38/+175
Esben Haabendal says: ==================== net: ll_temac: Bugfixes Fix a number of bugs which have been present since the first commit. The bugs fixed in patch 1,2 and 4 have all been observed in real systems, and was relatively easy to reproduce given an appropriate stress setup. Changes since v1: - Changed error handling of of dma_map_single() in temac_start_xmit() to drop packet instead of returning NETDEV_TX_BUSY. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24net: ll_temac: Handle DMA halt condition caused by buffer underrunEsben Haabendal2-5/+56
The SDMA engine used by TEMAC halts operation when it has finished processing of the last buffer descriptor in the buffer ring. Unfortunately, no interrupt event is generated when this happens, so we need to setup another mechanism to make sure DMA operation is restarted when enough buffers have been added to the ring. Fixes: 92744989533c ("net: add Xilinx ll_temac device driver") Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24net: ll_temac: Fix RX buffer descriptor handling on GFP_ATOMIC pressureEsben Haabendal2-31/+82
Failures caused by GFP_ATOMIC memory pressure have been observed, and due to the missing error handling, results in kernel crash such as [1876998.350133] kernel BUG at mm/slub.c:3952! [1876998.350141] invalid opcode: 0000 [#1] PREEMPT SMP PTI [1876998.350147] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.3.0-scnxt #1 [1876998.350150] Hardware name: N/A N/A/COMe-bIP2, BIOS CCR2R920 03/01/2017 [1876998.350160] RIP: 0010:kfree+0x1ca/0x220 [1876998.350164] Code: 85 db 74 49 48 8b 95 68 01 00 00 48 31 c2 48 89 10 e9 d7 fe ff ff 49 8b 04 24 a9 00 00 01 00 75 0b 49 8b 44 24 08 a8 01 75 02 <0f> 0b 49 8b 04 24 31 f6 a9 00 00 01 00 74 06 41 0f b6 74 24 5b [1876998.350172] RSP: 0018:ffffc900000f0df0 EFLAGS: 00010246 [1876998.350177] RAX: ffffea00027f0708 RBX: ffff888008d78000 RCX: 0000000000391372 [1876998.350181] RDX: 0000000000000000 RSI: ffffe8ffffd01400 RDI: ffff888008d78000 [1876998.350185] RBP: ffff8881185a5d00 R08: ffffc90000087dd8 R09: 000000000000280a [1876998.350189] R10: 0000000000000002 R11: 0000000000000000 R12: ffffea0000235e00 [1876998.350193] R13: ffff8881185438a0 R14: 0000000000000000 R15: ffff888118543870 [1876998.350198] FS: 0000000000000000(0000) GS:ffff88811f300000(0000) knlGS:0000000000000000 [1876998.350203] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 s#1 Part1 [1876998.350206] CR2: 00007f8dac7b09f0 CR3: 000000011e20a006 CR4: 00000000001606e0 [1876998.350210] Call Trace: [1876998.350215] <IRQ> [1876998.350224] ? __netif_receive_skb_core+0x70a/0x920 [1876998.350229] kfree_skb+0x32/0xb0 [1876998.350234] __netif_receive_skb_core+0x70a/0x920 [1876998.350240] __netif_receive_skb_one_core+0x36/0x80 [1876998.350245] process_backlog+0x8b/0x150 [1876998.350250] net_rx_action+0xf7/0x340 [1876998.350255] __do_softirq+0x10f/0x353 [1876998.350262] irq_exit+0xb2/0xc0 [1876998.350265] do_IRQ+0x77/0xd0 [1876998.350271] common_interrupt+0xf/0xf [1876998.350274] </IRQ> In order to handle such failures more graceful, this change splits the receive loop into one for consuming the received buffers, and one for allocating new buffers. When GFP_ATOMIC allocations fail, the receive will continue with the buffers that is still there, and with the expectation that the allocations will succeed in a later call to receive. Fixes: 92744989533c ("net: add Xilinx ll_temac device driver") Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24net: ll_temac: Add more error handling of dma_map_single() callsEsben Haabendal1-2/+24
This adds error handling to the remaining dma_map_single() calls, so that behavior is well defined if/when we run out of DMA memory. Fixes: 92744989533c ("net: add Xilinx ll_temac device driver") Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24net: ll_temac: Fix race condition causing TX hangEsben Haabendal1-3/+16
It is possible that the interrupt handler fires and frees up space in the TX ring in between checking for sufficient TX ring space and stopping the TX queue in temac_start_xmit. If this happens, the queue wake from the interrupt handler will occur before the queue is stopped, causing a lost wakeup and the adapter's transmit hanging. To avoid this, after stopping the queue, check again whether there is sufficient space in the TX ring. If so, wake up the queue again. This is a port of the similar fix in axienet driver, commit 7de44285c1f6 ("net: axienet: Fix race condition causing TX hang"). Fixes: 23ecc4bde21f ("net: ll_temac: fix checksum offload logic") Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24mac80211: rx: avoid RCU list traversal under mutexMadhuparna Bhowmik1-1/+1
local->sta_mtx is held in __ieee80211_check_fast_rx_iface(). No need to use list_for_each_entry_rcu() as it also requires a cond argument to avoid false lockdep warnings when not used in RCU read-side section (with CONFIG_PROVE_RCU_LIST). Therefore use list_for_each_entry(); Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Link: https://lore.kernel.org/r/20200223143302.15390-1-madhuparnabhowmik10@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24nl80211: explicitly include if_vlan.hJohannes Berg1-0/+1
We use that here, and do seem to get it through some recursive include, but better include it explicitly. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20200224093814.1b9c258fec67.I45ac150d4e11c72eb263abec9f1f0c7add9bef2b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-23net: core: devlink.c: Hold devlink->lock from the beginning of devlink_dpipe_table_register()Madhuparna Bhowmik1-7/+14
devlink_dpipe_table_find() should be called under either rcu_read_lock() or devlink->lock. devlink_dpipe_table_register() calls devlink_dpipe_table_find() without holding the lock and acquires it later. Therefore hold the devlink->lock from the beginning of devlink_dpipe_table_register(). Suggested-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-23net: phy: Avoid multiple suspendsFlorian Fainelli1-2/+3
It is currently possible for a PHY device to be suspended as part of a network device driver's suspend call while it is still being attached to that net_device, either via phy_suspend() or implicitly via phy_stop(). Later on, when the MDIO bus controller get suspended, we would attempt to suspend again the PHY because it is still attached to a network device. This is both a waste of time and creates an opportunity for improper clock/power management bugs to creep in. Fixes: 803dd9c77ac3 ("net: phy: avoid suspending twice a PHY") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-23net: ks8851-ml: Fix IRQ handling and lockingMarek Vasut1-6/+8
The KS8851 requires that packet RX and TX are mutually exclusive. Currently, the driver hopes to achieve this by disabling interrupt from the card by writing the card registers and by disabling the interrupt on the interrupt controller. This however is racy on SMP. Replace this approach by expanding the spinlock used around the ks_start_xmit() TX path to ks_irq() RX path to assure true mutual exclusion and remove the interrupt enabling/disabling, which is now not needed anymore. Furthermore, disable interrupts also in ks_net_stop(), which was missing before. Note that a massive improvement here would be to re-use the KS8851 driver approach, which is to move the TX path into a worker thread, interrupt handling to threaded interrupt, and synchronize everything with mutexes, but that would be a much bigger rework, for a separate patch. Signed-off-by: Marek Vasut <marex@denx.de> Cc: David S. Miller <davem@davemloft.net> Cc: Lukas Wunner <lukas@wunner.de> Cc: Petr Stetiar <ynezz@true.cz> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-23docs: networking: phy: Rephrase paragraph for clarityJonathan Neuschäfer1-2/+3
Let's make it a little easier to read. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-23tcp: fix TFO SYNACK undo to avoid double-timestamp-undoNeal Cardwell1-1/+5
In a rare corner case the new logic for undo of SYNACK RTO could result in triggering the warning in tcp_fastretrans_alert() that says: WARN_ON(tp->retrans_out != 0); The warning looked like: WARNING: CPU: 1 PID: 1 at net/ipv4/tcp_input.c:2818 tcp_ack+0x13e0/0x3270 The sequence that tickles this bug is: - Fast Open server receives TFO SYN with data, sends SYNACK - (client receives SYNACK and sends ACK, but ACK is lost) - server app sends some data packets - (N of the first data packets are lost) - server receives client ACK that has a TS ECR matching first SYNACK, and also SACKs suggesting the first N data packets were lost - server performs TS undo of SYNACK RTO, then immediately enters recovery - buggy behavior then performed a *second* undo that caused the connection to be in CA_Open with retrans_out != 0 Basically, the incoming ACK packet with SACK blocks causes us to first undo the cwnd reduction from the SYNACK RTO, but then immediately enters fast recovery, which then makes us eligible for undo again. And then tcp_rcv_synrecv_state_fastopen() accidentally performs an undo using a "mash-up" of state from two different loss recovery phases: it uses the timestamp info from the ACK of the original SYNACK, and the undo_marker from the fast recovery. This fix refines the logic to only invoke the tcp_try_undo_loss() inside tcp_rcv_synrecv_state_fastopen() if the connection is still in CA_Loss. If peer SACKs triggered fast recovery, then tcp_rcv_synrecv_state_fastopen() can't safely undo. Fixes: 794200d66273 ("tcp: undo cwnd on Fast Open spurious SYNACK retransmit") Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-23hv_netvsc: Fix unwanted wakeup in netvsc_attach()Haiyang Zhang2-1/+4
When netvsc_attach() is called by operations like changing MTU, etc., an extra wakeup may happen while netvsc_attach() calling rndis_filter_device_add() which sends rndis messages when queue is stopped in netvsc_detach(). The completion message will wake up queue 0. We can reproduce the issue by changing MTU etc., then the wake_queue counter from "ethtool -S" will increase beyond stop_queue counter: stop_queue: 0 wake_queue: 1 The issue causes queue wake up, and counter increment, no other ill effects in current code. So we didn't see any network problem for now. To fix this, initialize tx_disable to true, and set it to false when the NIC is ready to be attached or registered. Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-23net: usb: qmi_wwan: restore mtu min/max values after raw_ip switchDaniele Palmas1-0/+3
usbnet creates network interfaces with min_mtu = 0 and max_mtu = ETH_MAX_MTU. These values are not modified by qmi_wwan when the network interface is created initially, allowing, for example, to set mtu greater than 1500. When a raw_ip switch is done (raw_ip set to 'Y', then set to 'N') the mtu values for the network interface are set through ether_setup, with min_mtu = ETH_MIN_MTU and max_mtu = ETH_DATA_LEN, not allowing anymore to set mtu greater than 1500 (error: mtu greater than device maximum). The patch restores the original min/max mtu values set by usbnet after a raw_ip switch. Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-22net: genetlink: return the error code when attribute parsing fails.Paolo Abeni1-2/+3
Currently if attribute parsing fails and the genl family does not support parallel operation, the error code returned by __nlmsg_parse() is discarded by genl_family_rcv_msg_attrs_parse(). Be sure to report the error for all genl families. Fixes: c10e6cf85e7d ("net: genetlink: push attrbuf allocation and parsing to a separate function") Fixes: ab5b526da048 ("net: genetlink: always allocate separate attrs for dumpit ops") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-22ipv4: ensure rcu_read_lock() in cipso_v4_error()Matteo Croce1-1/+6
Similarly to commit c543cb4a5f07 ("ipv4: ensure rcu_read_lock() in ipv4_link_failure()"), __ip_options_compile() must be called under rcu protection. Fixes: 3da1ed7ac398 ("net: avoid use IPCB in cipso_v4_error") Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Matteo Croce <mcroce@redhat.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-22vhost: Check docket sk_family instead of call getnameEugenio Pérez1-9/+1
Doing so, we save one call to get data we already have in the struct. Also, since there is no guarantee that getname use sockaddr_ll parameter beyond its size, we add a little bit of security here. It should do not do beyond MAX_ADDR_LEN, but syzbot found that ax25_getname writes more (72 bytes, the size of full_sockaddr_ax25, versus 20 + 32 bytes of sockaddr_ll + MAX_ADDR_LEN in syzbot repro). Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server") Reported-by: syzbot+f2a62d07a5198c819c7b@syzkaller.appspotmail.com Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-22netfilter: ipset: Fix forceadd evaluation pathJozsef Kadlecsik1-0/+2
When the forceadd option is enabled, the hash:* types should find and replace the first entry in the bucket with the new one if there are no reuseable (deleted or timed out) entries. However, the position index was just not set to zero and remained the invalid -1 if there were no reuseable entries. Reported-by: syzbot+6a86565c74ebe30aea18@syzkaller.appspotmail.com Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7") Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
2020-02-22netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reportsJozsef Kadlecsik3-206/+472
In the case of huge hash:* types of sets, due to the single spinlock of a set the processing of the whole set under spinlock protection could take too long. There were four places where the whole hash table of the set was processed from bucket to bucket under holding the spinlock: - During resizing a set, the original set was locked to exclude kernel side add/del element operations (userspace add/del is excluded by the nfnetlink mutex). The original set is actually just read during the resize, so the spinlocking is replaced with rcu locking of regions. However, thus there can be parallel kernel side add/del of entries. In order not to loose those operations a backlog is added and replayed after the successful resize. - Garbage collection of timed out entries was also protected by the spinlock. In order not to lock too long, region locking is introduced and a single region is processed in one gc go. Also, the simple timer based gc running is replaced with a workqueue based solution. The internal book-keeping (number of elements, size of extensions) is moved to region level due to the region locking. - Adding elements: when the max number of the elements is reached, the gc was called to evict the timed out entries. The new approach is that the gc is called just for the matching region, assuming that if the region (proportionally) seems to be full, then the whole set does. We could scan the other regions to check every entry under rcu locking, but for huge sets it'd mean a slowdown at adding elements. - Listing the set header data: when the set was defined with timeout support, the garbage collector was called to clean up timed out entries to get the correct element numbers and set size values. Now the set is scanned to check non-timed out entries, without actually calling the gc for the whole set. Thanks to Florian Westphal for helping me to solve the SOFTIRQ-safe -> SOFTIRQ-unsafe lock order issues during working on the patch. Reported-by: syzbot+4b0e9d4ff3cf117837e5@syzkaller.appspotmail.com Reported-by: syzbot+c27b8d5010f45c666ed1@syzkaller.appspotmail.com Reported-by: syzbot+68a806795ac89df3aa1c@syzkaller.appspotmail.com Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7") Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
2020-02-21Merge tag 'linux-watchdog-5.6-rc3' of git://www.linux-watchdog.org/linux-watchdogLinus Torvalds2-7/+14
Pull watchdog fixes from Wim Van Sebroeck: - mtk_wdt needs RESET_CONTROLLER to build - da9062 driver fixes: - fix power management ops - do not ping the hw during stop() - add dependency on I2C * tag 'linux-watchdog-5.6-rc3' of git://www.linux-watchdog.org/linux-watchdog: watchdog: da9062: Add dependency on I2C watchdog: da9062: fix power management ops watchdog: da9062: do not ping the hw during stop() watchdog: fix mtk_wdt.c RESET_CONTROLLER build error
2020-02-21Merge tag 'char-misc-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-miscLinus Torvalds7-19/+65
Pull char/misc driver fixes from Greg KH: "Here are some small char/misc driver fixes for 5.6-rc3. Also included in here are some updates for some documentation files that I seem to be maintaining these days. The driver fixes are: - small fixes for the habanalabs driver - fsi driver bugfix All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: Documentation/process: Swap out the ambassador for Canonical habanalabs: patched cb equals user cb in device memset habanalabs: do not halt CoreSight during hard reset habanalabs: halt the engines before hard-reset MAINTAINERS: remove unnecessary ':' characters fsi: aspeed: add unspecified HAS_IOMEM dependency COPYING: state that all contributions really are covered by this file Documentation/process: Change Microsoft contact for embargoed hardware issues embargoed-hardware-issues: drop Amazon contact as the email address now bounces Documentation/process: Add Arm contact for embargoed HW issues
2020-02-21Merge tag 'staging-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/stagingLinus Torvalds11-1530/+56
Pull staging driver fixes from Greg KH: "Here are some small staging driver fixes for 5.6-rc3, along with the removal of an unused/unneeded driver as well. The android vsoc driver is not needed anymore by anyone, so it was removed. The other driver fixes are: - ashmem bugfixes - greybus audio driver bugfix - wireless driver bugfixes and tiny cleanups to error paths All of these have been in linux-next for a while now with no reported issues" * tag 'staging-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: staging: rtl8723bs: Remove unneeded goto statements staging: rtl8188eu: Remove some unneeded goto statements staging: rtl8723bs: Fix potential overuse of kernel memory staging: rtl8188eu: Fix potential overuse of kernel memory staging: rtl8723bs: Fix potential security hole staging: rtl8188eu: Fix potential security hole staging: greybus: use after free in gb_audio_manager_remove_all() staging: android: Delete the 'vsoc' driver staging: rtl8723bs: fix copy of overlapping memory staging: android: ashmem: Disallow ashmem memory from being remapped staging: vt6656: fix sign of rx_dbm to bb_pre_ed_rssi.
2020-02-21Merge tag 'tty-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/ttyLinus Torvalds16-51/+104
Pull tty/serial driver fixes from Greg KH: "Here are a number of small tty and serial driver fixes for 5.6-rc3 that resolve a bunch of reported issues. They are: - vt selection and ioctl fixes - serdev bugfix - atmel serial driver fixes - qcom serial driver fixes - other minor serial driver fixes All of these have been in linux-next for a while with no reported issues" * tag 'tty-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: vt: selection, close sel_buffer race vt: selection, handle pending signals in paste_selection serial: cpm_uart: call cpm_muram_init before registering console tty: serial: qcom_geni_serial: Fix RX cancel command failure serial: 8250: Check UPF_IRQ_SHARED in advance tty: serial: imx: setup the correct sg entry for tx dma vt: vt_ioctl: fix race in VT_RESIZEX vt: fix scrollback flushing on background consoles tty: serial: tegra: Handle RX transfer in PIO mode if DMA wasn't started tty/serial: atmel: manage shutdown in case of RS485 or ISO7816 mode serdev: ttyport: restore client ops on deregistration serial: ar933x_uart: set UART_CS_{RX,TX}_READY_ORIDE
2020-02-21Merge tag 'usb-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usbLinus Torvalds26-140/+327
Pull USB/Thunderbolt fixes from Greg KH: "Here are a number of small USB driver fixes for 5.6-rc3. Included in here are: - MAINTAINER file updates - USB gadget driver fixes - usb core quirk additions and fixes for regressions - xhci driver fixes - usb serial driver id additions and fixes - thunderbolt bugfix Thunderbolt patches come in through here now that USB4 is really thunderbolt. All of these have been in linux-next for a while with no reported issues" * tag 'usb-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (34 commits) USB: misc: iowarrior: add support for the 100 device thunderbolt: Prevent crash if non-active NVMem file is read usb: gadget: udc-xilinx: Fix xudc_stop() kernel-doc format USB: misc: iowarrior: add support for the 28 and 28L devices USB: misc: iowarrior: add support for 2 OEMed devices USB: Fix novation SourceControl XL after suspend xhci: Fix memory leak when caching protocol extended capability PSI tables - take 2 Revert "xhci: Fix memory leak when caching protocol extended capability PSI tables" MAINTAINERS: Sort entries in database for THUNDERBOLT usb: dwc3: debug: fix string position formatting mixup with ret and len usb: gadget: serial: fix Tx stall after buffer overflow usb: gadget: ffs: ffs_aio_cancel(): Save/restore IRQ flags usb: dwc2: Fix SET/CLEAR_FEATURE and GET_STATUS flows usb: dwc2: Fix in ISOC request length checking usb: gadget: composite: Support more than 500mA MaxPower usb: gadget: composite: Fix bMaxPower for SuperSpeedPlus usb: gadget: u_audio: Fix high-speed max packet size usb: dwc3: gadget: Check for IOC/LST bit in TRB->ctrl fields USB: core: clean up endpoint-descriptor parsing USB: quirks: blacklist duplicate ep on Sound Devices USBPre2 ...
2020-02-21Merge tag 'drm-fixes-2020-02-21' of git://anongit.freedesktop.org/drm/drmLinus Torvalds50-243/+486
Pull drm fixes from Dave Airlie: "Varied fixes for rc3. i915 is the largest, they are seeing some ACPI problems with their CI which hopefully get solved soon [1]. msm has a bunch of fixes for new hw added in the merge, a bunch of amdgpu fixes, and nouveau adds support for some new firmwares for turing tu11x GPUs that were just released into linux-firmware by nvidia, they operate the same as the ones we already have for tu10x so should be fine to hook up. Otherwise it's just misc fixes for panfrost and sun4i. core: - Allow only one rotation argument, and allow zero rotation in video cmdline. i915: - Workaround missing Display Stream Compression (DSC) state readout by forcing modeset when its enabled at probe - Fix EHL port clock voltage level requirements - Fix queuing retire workers on the virtual engine - Fix use of partially initialized waiters - Stop using drm_pci_alloc/drm_pci/free - Fix rewind of RING_TAIL by forcing a context reload - Fix locking on resetting ring->head - Propagate our bug filing URL change to stable kernels panfrost: - Small compiler warning fix for panfrost. - Fix when using performance counters in panfrost when using per fd address space. sun4xi: - Fix dt binding nouveau: - tu11x modesetting fix - ACR/GR firmware support for tu11x (fw is public now) msm: - fix UBWC on GPU and display side for sc7180 - fix DSI suspend/resume issue encountered on sc7180 - fix some breakage on so called "linux-android" devices (fallout from sc7180/a618 support, not seen earlier due to bootloader/firmware differences) - couple other misc fixes amdgpu: - HDCP fixes - xclk fix for raven - GFXOFF fixes" [1] The Intel suspend testing should now be fixed by commit 63fb9623427f ("ACPI: PM: s2idle: Check fixed wakeup events in acpi_s2idle_wake()") * tag 'drm-fixes-2020-02-21' of git://anongit.freedesktop.org/drm/drm: (39 commits) drm/amdgpu/display: clean up hdcp workqueue handling drm/amdgpu: add is_raven_kicker judgement for raven1 drm/i915/gt: Avoid resetting ring->head outside of its timeline mutex drm/i915/execlists: Always force a context reload when rewinding RING_TAIL drm/i915: Wean off drm_pci_alloc/drm_pci_free drm/i915/gt: Protect defer_request() from new waiters drm/i915/gt: Prevent queuing retire workers on the virtual engine drm/i915/dsc: force full modeset whenever DSC is enabled at probe drm/i915/ehl: Update port clock voltage level requirements drm/i915: Update drm/i915 bug filing URL MAINTAINERS: Update drm/i915 bug filing URL drm/i915: Initialise basic fence before acquiring seqno drm/i915/gem: Require per-engine reset support for non-persistent contexts drm/nouveau/kms/gv100-: Re-set LUT after clearing for modesets drm/nouveau/gr/tu11x: initial support drm/nouveau/acr/tu11x: initial support drm/amdgpu/gfx10: disable gfxoff when reading rlc clock drm/amdgpu/gfx9: disable gfxoff when reading rlc clock drm/amdgpu/soc15: fix xclk for raven drm/amd/powerplay: always refetch the enabled features status on dpm enablement ...
2020-02-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds104-506/+1132
Pull networking fixes from David Miller: 1) Limit xt_hashlimit hash table size to avoid OOM or hung tasks, from Cong Wang. 2) Fix deadlock in xsk by publishing global consumer pointers when NAPI is finished, from Magnus Karlsson. 3) Set table field properly to RT_TABLE_COMPAT when necessary, from Jethro Beekman. 4) NLA_STRING attributes are not necessary NULL terminated, deal wiht that in IFLA_ALT_IFNAME. From Eric Dumazet. 5) Fix checksum handling in atlantic driver, from Dmitry Bezrukov. 6) Handle mtu==0 devices properly in wireguard, from Jason A. Donenfeld. 7) Fix several lockdep warnings in bonding, from Taehee Yoo. 8) Fix cls_flower port blocking, from Jason Baron. 9) Sanitize internal map names in libbpf, from Toke Høiland-Jørgensen. 10) Fix RDMA race in qede driver, from Michal Kalderon. 11) Fix several false lockdep warnings by adding conditions to list_for_each_entry_rcu(), from Madhuparna Bhowmik. 12) Fix sleep in atomic in mlx5 driver, from Huy Nguyen. 13) Fix potential deadlock in bpf_map_do_batch(), from Yonghong Song. 14) Hey, variables declared in switch statement before any case statements are not initialized. I learn something every day. Get rids of this stuff in several parts of the networking, from Kees Cook. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (99 commits) bnxt_en: Issue PCIe FLR in kdump kernel to cleanup pending DMAs. bnxt_en: Improve device shutdown method. net: netlink: cap max groups which will be considered in netlink_bind() net: thunderx: workaround BGX TX Underflow issue ionic: fix fw_status read net: disable BRIDGE_NETFILTER by default net: macb: Properly handle phylink on at91rm9200 s390/qeth: fix off-by-one in RX copybreak check s390/qeth: don't warn for napi with 0 budget s390/qeth: vnicc Fix EOPNOTSUPP precedence openvswitch: Distribute switch variables for initialization net: ip6_gre: Distribute switch variables for initialization net: core: Distribute switch variables for initialization udp: rehash on disconnect net/tls: Fix to avoid gettig invalid tls record bpf: Fix a potential deadlock with bpf_map_do_batch bpf: Do not grab the bucket spinlock by default on htab batch ops ice: Wait for VF to be reset/ready before configuration ice: Don't tell the OS that link is going down ice: Don't reject odd values of usecs set by user ...
2020-02-21Merge branch 'akpm' (patches from Andrew)Linus Torvalds20-419/+93
Merge misc fixes from Andrew Morton: - A few y2038 fixes which missed the merge window while dependencies in NFS were being sorted out. - A bunch of fixes. Some minor, some not. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: MAINTAINERS: use tabs for SAFESETID lib/stackdepot.c: fix global out-of-bounds in stack_slabs mm/sparsemem: pfn_to_page is not valid yet on SPARSEMEM mm/vmscan.c: don't round up scan size for online memory cgroup lib/string.c: update match_string() doc-strings with correct behavior mm/memcontrol.c: lost css_put in memcg_expand_shrinker_maps() mm/swapfile.c: fix a comment in sys_swapon() scripts/get_maintainer.pl: deprioritize old Fixes: addresses get_maintainer: remove uses of P: for maintainer name selftests/vm: add missed tests in run_vmtests include/uapi/linux/swab.h: fix userspace breakage, use __BITS_PER_LONG for swap Revert "ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()" y2038: hide timeval/timespec/itimerval/itimerspec types y2038: remove unused time32 interfaces y2038: remove ktime to/from timespec/timeval conversion
2020-02-21MAINTAINERS: use tabs for SAFESETIDRandy Dunlap1-4/+4
Use tabs for indentation instead of spaces for SAFESETID. All (!) other entries in MAINTAINERS use tabs (according to my simple grepping). Link: http://lkml.kernel.org/r/2bb2e52a-2694-816d-57b4-6cabfadd6c1a@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Micah Morton <mortonm@chromium.org> Cc: James Morris <jmorris@namei.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21lib/stackdepot.c: fix global out-of-bounds in stack_slabsAlexander Potapenko1-2/+6
Walter Wu has reported a potential case in which init_stack_slab() is called after stack_slabs[STACK_ALLOC_MAX_SLABS - 1] has already been initialized. In that case init_stack_slab() will overwrite stack_slabs[STACK_ALLOC_MAX_SLABS], which may result in a memory corruption. Link: http://lkml.kernel.org/r/20200218102950.260263-1-glider@google.com Fixes: cd11016e5f521 ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB") Signed-off-by: Alexander Potapenko <glider@google.com> Reported-by: Walter Wu <walter-zh.wu@mediatek.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21mm/sparsemem: pfn_to_page is not valid yet on SPARSEMEMWei Yang1-1/+1
When we use SPARSEMEM instead of SPARSEMEM_VMEMMAP, pfn_to_page() doesn't work before sparse_init_one_section() is called. This leads to a crash when hotplug memory: BUG: unable to handle page fault for address: 0000000006400000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] SMP PTI CPU: 3 PID: 221 Comm: kworker/u16:1 Tainted: G W 5.5.0-next-20200205+ #343 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015 Workqueue: kacpi_hotplug acpi_hotplug_work_fn RIP: 0010:__memset+0x24/0x30 Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89 d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 <f3> 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3 RSP: 0018:ffffb43ac0373c80 EFLAGS: 00010a87 RAX: ffffffffffffffff RBX: ffff8a1518800000 RCX: 0000000000050000 RDX: 0000000000000000 RSI: 00000000000000ff RDI: 0000000006400000 RBP: 0000000000140000 R08: 0000000000100000 R09: 0000000006400000 R10: 0000000000000000 R11: 0000000000000002 R12: 0000000000000000 R13: 0000000000000028 R14: 0000000000000000 R15: ffff8a153ffd9280 FS: 0000000000000000(0000) GS:ffff8a153ab00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000006400000 CR3: 0000000136fca000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: sparse_add_section+0x1c9/0x26a __add_pages+0xbf/0x150 add_pages+0x12/0x60 add_memory_resource+0xc8/0x210 __add_memory+0x62/0xb0 acpi_memory_device_add+0x13f/0x300 acpi_bus_attach+0xf6/0x200 acpi_bus_scan+0x43/0x90 acpi_device_hotplug+0x275/0x3d0 acpi_hotplug_work_fn+0x1a/0x30 process_one_work+0x1a7/0x370 worker_thread+0x30/0x380 kthread+0x112/0x130 ret_from_fork+0x35/0x40 We should use memmap as it did. On x86 the impact is limited to x86_32 builds, or x86_64 configurations that override the default setting for SPARSEMEM_VMEMMAP. Other memory hotplug archs (arm64, ia64, and ppc) also default to SPARSEMEM_VMEMMAP=y. [dan.j.williams@intel.com: changelog update] {rppt@linux.ibm.com: changelog update] Link: http://lkml.kernel.org/r/20200219030454.4844-1-bhe@redhat.com Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug") Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Baoquan He <bhe@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21mm/vmscan.c: don't round up scan size for online memory cgroupGavin Shan1-3/+6
Commit 68600f623d69 ("mm: don't miss the last page because of round-off error") makes the scan size round up to @denominator regardless of the memory cgroup's state, online or offline. This affects the overall reclaiming behavior: the corresponding LRU list is eligible for reclaiming only when its size logically right shifted by @sc->priority is bigger than zero in the former formula. For example, the inactive anonymous LRU list should have at least 0x4000 pages to be eligible for reclaiming when we have 60/12 for swappiness/priority and without taking scan/rotation ratio into account. After the roundup is applied, the inactive anonymous LRU list becomes eligible for reclaiming when its size is bigger than or equal to 0x1000 in the same condition. (0x4000 >> 12) * 60 / (60 + 140 + 1) = 1 ((0x1000 >> 12) * 60) + 200) / (60 + 140 + 1) = 1 aarch64 has 512MB huge page size when the base page size is 64KB. The memory cgroup that has a huge page is always eligible for reclaiming in that case. The reclaiming is likely to stop after the huge page is reclaimed, meaing the further iteration on @sc->priority and the silbing and child memory cgroups will be skipped. The overall behaviour has been changed. This fixes the issue by applying the roundup to offlined memory cgroups only, to give more preference to reclaim memory from offlined memory cgroup. It sounds reasonable as those memory is unlikedly to be used by anyone. The issue was found by starting up 8 VMs on a Ampere Mustang machine, which has 8 CPUs and 16 GB memory. Each VM is given with 2 vCPUs and 2GB memory. It took 264 seconds for all VMs to be completely up and 784MB swap is consumed after that. With this patch applied, it took 236 seconds and 60MB swap to do same thing. So there is 10% performance improvement for my case. Note that KSM is disable while THP is enabled in the testing. total used free shared buff/cache available Mem: 16196 10065 2049 16 4081 3749 Swap: 8175 784 7391 total used free shared buff/cache available Mem: 16196 11324 3656 24 1215 2936 Swap: 8175 60 8115 Link: http://lkml.kernel.org/r/20200211024514.8730-1-gshan@redhat.com Fixes: 68600f623d69 ("mm: don't miss the last page because of round-off error") Signed-off-by: Gavin Shan <gshan@redhat.com> Acked-by: Roman Gushchin <guro@fb.com> Cc: <stable@vger.kernel.org> [4.20+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21lib/string.c: update match_string() doc-strings with correct behaviorAlexandru Ardelean1-0/+16
There were a few attempts at changing behavior of the match_string() helpers (i.e. 'match_string()' & 'sysfs_match_string()'), to change & extend the behavior according to the doc-string. But the simplest approach is to just fix the doc-strings. The current behavior is fine as-is, and some bugs were introduced trying to fix it. As for extending the behavior, new helpers can always be introduced if needed. The match_string() helpers behave more like 'strncmp()' in the sense that they go up to n elements or until the first NULL element in the array of strings. This change updates the doc-strings with this info. Link: http://lkml.kernel.org/r/20200213072722.8249-1-alexandru.ardelean@analog.com Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: "Tobin C . Harding" <tobin@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21mm/memcontrol.c: lost css_put in memcg_expand_shrinker_maps()Vasily Averin1-1/+3
for_each_mem_cgroup() increases css reference counter for memory cgroup and requires to use mem_cgroup_iter_break() if the walk is cancelled. Link: http://lkml.kernel.org/r/c98414fb-7e1f-da0f-867a-9340ec4bd30b@virtuozzo.com Fixes: 0a4465d34028 ("mm, memcg: assign memcg-aware shrinkers bitmap to memcg") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Acked-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Roman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21mm/swapfile.c: fix a comment in sys_swapon()Christoph Hellwig1-1/+1
claim_swapfile now always takes i_rwsem. Link: http://lkml.kernel.org/r/20200114161225.309792-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>