aboutsummaryrefslogtreecommitdiffstats
path: root/include/linux (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-03-22Merge branch 'akpm' (patches from Andrew)Linus Torvalds2-3/+4
Merge misc fixes from Andrew Morton: "10 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: x86/mm: split vmalloc_sync_all() mm, slub: prevent kmalloc_node crashes and memory leaks mm/mmu_notifier: silence PROVE_RCU_LIST warnings epoll: fix possible lost wakeup on epoll_ctl() path mm: do not allow MADV_PAGEOUT for CoW pages mm, memcg: throttle allocators based on ancestral memory.high mm, memcg: fix corruption on 64-bit divisor in memory.high throttling page-flags: fix a crash at SetPageError(THP_SWAP) mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event
2020-03-21x86/mm: split vmalloc_sync_all()Joerg Roedel1-2/+3
Commit 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()") introduced a call to vmalloc_sync_all() in the vunmap() code-path. While this change was necessary to maintain correctness on x86-32-pae kernels, it also adds additional cycles for architectures that don't need it. Specifically on x86-64 with CONFIG_VMAP_STACK=y some people reported severe performance regressions in micro-benchmarks because it now also calls the x86-64 implementation of vmalloc_sync_all() on vunmap(). But the vmalloc_sync_all() implementation on x86-64 is only needed for newly created mappings. To avoid the unnecessary work on x86-64 and to gain the performance back, split up vmalloc_sync_all() into two functions: * vmalloc_sync_mappings(), and * vmalloc_sync_unmappings() Most call-sites to vmalloc_sync_all() only care about new mappings being synchronized. The only exception is the new call-site added in the above mentioned commit. Shile Zhang directed us to a report of an 80% regression in reaim throughput. Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()") Reported-by: kernel test robot <oliver.sang@intel.com> Reported-by: Shile Zhang <shile.zhang@linux.alibaba.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Borislav Petkov <bp@suse.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [GHES] Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20191009124418.8286-1-joro@8bytes.org Link: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/ Link: http://lkml.kernel.org/r/20191113095530.228959-1-shile.zhang@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-21page-flags: fix a crash at SetPageError(THP_SWAP)Qian Cai1-1/+1
Commit bd4c82c22c36 ("mm, THP, swap: delay splitting THP after swapped out") supported writing THP to a swap device but forgot to upgrade an older commit df8c94d13c7e ("page-flags: define behavior of FS/IO-related flags on compound pages") which could trigger a crash during THP swapping out with DEBUG_VM_PGFLAGS=y, kernel BUG at include/linux/page-flags.h:317! page dumped because: VM_BUG_ON_PAGE(1 && PageCompound(page)) page:fffff3b2ec3a8000 refcount:512 mapcount:0 mapping:000000009eb0338c index:0x7f6e58200 head:fffff3b2ec3a8000 order:9 compound_mapcount:0 compound_pincount:0 anon flags: 0x45fffe0000d8454(uptodate|lru|workingset|owner_priv_1|writeback|head|reclaim|swapbacked) end_swap_bio_write() SetPageError(page) VM_BUG_ON_PAGE(1 && PageCompound(page)) <IRQ> bio_endio+0x297/0x560 dec_pending+0x218/0x430 [dm_mod] clone_endio+0xe4/0x2c0 [dm_mod] bio_endio+0x297/0x560 blk_update_request+0x201/0x920 scsi_end_request+0x6b/0x4b0 scsi_io_completion+0x509/0x7e0 scsi_finish_command+0x1ed/0x2a0 scsi_softirq_done+0x1c9/0x1d0 __blk_mqnterrupt+0xf/0x20 </IRQ> Fix by checking PF_NO_TAIL in those places instead. Fixes: bd4c82c22c36 ("mm, THP, swap: delay splitting THP after swapped out") Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: "Huang, Ying" <ying.huang@intel.com> Acked-by: Rafael Aquini <aquini@redhat.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200310235846.1319-1-cai@lca.pw Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-21Merge tag 'io_uring-5.6-20200320' of git://git.kernel.dk/linux-blockLinus Torvalds2-1/+3
Pull io_uring fixes from Jens Axboe: "Two different fixes in here: - Fix for a potential NULL pointer deref for links with async or drain marked (Pavel) - Fix for not properly checking RLIMIT_NOFILE for async punted operations. This affects openat/openat2, which were added this cycle, and accept4. I did a full audit of other cases where we might check current->signal->rlim[] and found only RLIMIT_FSIZE for buffered writes and fallocate. That one is fixed and queued for 5.7 and marked stable" * tag 'io_uring-5.6-20200320' of git://git.kernel.dk/linux-block: io_uring: make sure accept honor rlimit nofile io_uring: make sure openat/openat2 honor rlimit nofile io_uring: NULL-deref for IOSQE_{ASYNC,DRAIN}
2020-03-20io_uring: make sure accept honor rlimit nofileJens Axboe1-1/+2
Just like commit 4022e7af86be, this fixes the fact that IORING_OP_ACCEPT ends up using get_unused_fd_flags(), which checks current->signal->rlim[] for limits. Add an extra argument to __sys_accept4_file() that allows us to pass in the proper nofile limit, and grab it at request prep time. Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-20io_uring: make sure openat/openat2 honor rlimit nofileJens Axboe1-0/+1
Dmitry reports that a test case shows that io_uring isn't honoring a modified rlimit nofile setting. get_unused_fd_flags() checks the task signal->rlimi[] for the limits. As this isn't easily inheritable, provide a __get_unused_fd_flags() that takes the value instead. Then we can grab it when the request is prepared (from the original task), and pass that in when we do the async part part of the open. Reported-by: Dmitry Kadashev <dkadashev@gmail.com> Tested-by: Dmitry Kadashev <dkadashev@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-15Merge tag 'locking-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-7/+11
Pull futex fix from Thomas Gleixner: "Fix for yet another subtle futex issue. The futex code used ihold() to prevent inodes from vanishing, but ihold() does not guarantee inode persistence. Replace the inode pointer with a per boot, machine wide, unique inode identifier. The second commit fixes the breakage of the hash mechanism which causes a 100% performance regression" * tag 'locking-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Unbreak futex hashing futex: Fix inode life-time issue
2020-03-15Merge tag 'iommu-fixes-v5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommuLinus Torvalds2-5/+11
Pull IOMMU fixes from Joerg Roedel: - Intel VT-d fixes: - RCU list handling fixes - Replace WARN_TAINT with pr_warn + add_taint for reporting firmware issues - DebugFS fixes - Fix for hugepage handling in iova_to_phys implementation - Fix for handling VMD devices, which have a domain number which doesn't fit into 16 bits - Warning message fix - MSI allocation fix for iommu-dma code - Sign-extension fix for io page-table code - Fix for AMD-Vi to properly update the is-running bit when AVIC is used * tag 'iommu-fixes-v5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/vt-d: Populate debugfs if IOMMUs are detected iommu/amd: Fix IOMMU AVIC not properly update the is_run bit in IRTE iommu/vt-d: Ignore devices with out-of-spec domain number iommu/vt-d: Fix the wrong printing in RHSA parsing iommu/vt-d: Fix debugfs register reads iommu/vt-d: quirk_ioat_snb_local_iommu: replace WARN_TAINT with pr_warn + add_taint iommu/vt-d: dmar_parse_one_rmrr: replace WARN_TAINT with pr_warn + add_taint iommu/vt-d: dmar: replace WARN_TAINT with pr_warn + add_taint iommu/vt-d: Silence RCU-list debugging warnings iommu/vt-d: Fix RCU-list bugs in intel_iommu_init() iommu/dma: Fix MSI reservation allocation iommu/io-pgtable-arm: Fix IOVA validation for 32-bit iommu/vt-d: Fix a bug in intel_iommu_iova_to_phys() for huge page iommu/vt-d: Fix RCU list debugging warnings
2020-03-14Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linuxLinus Torvalds1-4/+4
Pull clk fixes from Stephen Boyd: "A small collection of fixes. I'll make another sweep soon to look for more fixes for this -rc series. - Mark device node const in of_clk_get_parent APIs to ease landing changes in users later - Fix flag for Qualcomm SC7180 video clocks where we thought it would never turn off but actually hardware takes care of it - Remove disp_cc_mdss_rscc_ahb_clk on Qualcomm SC7180 SoCs because this clk is always on anyway - Correct some bad dt-binding numbers for i.MX8MN SoCs" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: imx8mn: Fix incorrect clock defines clk: qcom: dispcc: Remove support of disp_cc_mdss_rscc_ahb_clk clk: qcom: videocc: Update the clock flag for video_cc_vcodec0_core_clk of: clk: Make of_clk_get_parent_{count,name}() parameter const
2020-03-13Merge tag 'block-5.6-2020-03-13' of git://git.kernel.dk/linux-blockLinus Torvalds1-12/+1
Pull block fixes from Jens Axboe: "A few fixes that should go into this release. This contains: - Fix for a corruption issue with the s390 dasd driver (Stefan) - Fixup/improvement for the flush insertion change that we had in this series (Ming) - Fix for the partition suppor for host aware zoned devices (Shin'ichiro) - Fix incorrect blk-iocost comparison (Tejun) The diffstat looks large, but that's a) mostly dasd, and b) the flush fix from Ming adds a big comment" * tag 'block-5.6-2020-03-13' of git://git.kernel.dk/linux-block: block: Fix partition support for host aware zoned block devices blk-mq: insert flush request to the front of dispatch queue s390/dasd: fix data corruption for thin provisioned devices blk-iocost: fix incorrect vtime comparison in iocg_is_idle()
2020-03-13Merge tag 'mmc-v5.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmcLinus Torvalds1-0/+1
Pull MMC fixes from Ulf Hansson: "MMC core: - Fix HW busy detection support for host controllers requiring the MMC_RSP_BUSY response flag (R1B) to be set for the command. In particular for CMD6 (eMMC), erase/trim/discard (SD/eMMC) and CMD5 (eMMC sleep). MMC host: - sdhci-omap|tegra: Fix support for HW busy detection" * tag 'mmc-v5.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: core: Respect MMC_CAP_NEED_RSP_BUSY for eMMC sleep command mmc: sdhci-tegra: Fix busy detection by enabling MMC_CAP_NEED_RSP_BUSY mmc: sdhci-omap: Fix busy detection by enabling MMC_CAP_NEED_RSP_BUSY mmc: core: Respect MMC_CAP_NEED_RSP_BUSY for erase/trim/discard mmc: core: Allow host controllers to require R1B for CMD6
2020-03-13iommu/vt-d: Fix debugfs register readsMegha Dey1-0/+2
Commit 6825d3ea6cde ("iommu/vt-d: Add debugfs support to show register contents") dumps the register contents for all IOMMU devices. Currently, a 64 bit read(dmar_readq) is done for all the IOMMU registers, even though some of the registers are 32 bits, which is incorrect. Use the correct read function variant (dmar_readl/dmar_readq) while reading the contents of 32/64 bit registers respectively. Signed-off-by: Megha Dey <megha.dey@linux.intel.com> Link: https://lore.kernel.org/r/1583784587-26126-2-git-send-email-megha.dey@linux.intel.com Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-03-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds3-7/+16
Pull networking fixes from David Miller: "It looks like a decent sized set of fixes, but a lot of these are one liner off-by-one and similar type changes: 1) Fix netlink header pointer to calcular bad attribute offset reported to user. From Pablo Neira Ayuso. 2) Don't double clear PHY interrupts when ->did_interrupt is set, from Heiner Kallweit. 3) Add missing validation of various (devlink, nl802154, fib, etc.) attributes, from Jakub Kicinski. 4) Missing *pos increments in various netfilter seq_next ops, from Vasily Averin. 5) Missing break in of_mdiobus_register() loop, from Dajun Jin. 6) Don't double bump tx_dropped in veth driver, from Jiang Lidong. 7) Work around FMAN erratum A050385, from Madalin Bucur. 8) Make sure ARP header is pulled early enough in bonding driver, from Eric Dumazet. 9) Do a cond_resched() during multicast processing of ipvlan and macvlan, from Mahesh Bandewar. 10) Don't attach cgroups to unrelated sockets when in interrupt context, from Shakeel Butt. 11) Fix tpacket ring state management when encountering unknown GSO types. From Willem de Bruijn. 12) Fix MDIO bus PHY resume by checking mdio_bus_phy_may_suspend() only in the suspend context. From Heiner Kallweit" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (112 commits) net: systemport: fix index check to avoid an array out of bounds access tc-testing: add ETS scheduler to tdc build configuration net: phy: fix MDIO bus PM PHY resuming net: hns3: clear port base VLAN when unload PF net: hns3: fix RMW issue for VLAN filter switch net: hns3: fix VF VLAN table entries inconsistent issue net: hns3: fix "tc qdisc del" failed issue taprio: Fix sending packets without dequeueing them net: mvmdio: avoid error message for optional IRQ net: dsa: mv88e6xxx: Add missing mask of ATU occupancy register net: memcg: fix lockdep splat in inet_csk_accept() s390/qeth: implement smarter resizing of the RX buffer pool s390/qeth: refactor buffer pool code s390/qeth: use page pointers to manage RX buffer pool seg6: fix SRv6 L2 tunnels to use IANA-assigned protocol number net: dsa: Don't instantiate phylink for CPU/DSA ports unless needed net/packet: tpacket_rcv: do not increment ring index on drop sxgbe: Fix off by one in samsung driver strncpy size arg net: caif: Add lockdep expression to RCU traversal primitive MAINTAINERS: remove Sathya Perla as Emulex NIC maintainer ...
2020-03-12net: phy: fix MDIO bus PM PHY resumingHeiner Kallweit1-0/+2
So far we have the unfortunate situation that mdio_bus_phy_may_suspend() is called in suspend AND resume path, assuming that function result is the same. After the original change this is no longer the case, resulting in broken resume as reported by Geert. To fix this call mdio_bus_phy_may_suspend() in the suspend path only, and let the phy_device store the info whether it was suspended by MDIO bus PM. Fixes: 503ba7c69610 ("net: phy: Avoid multiple suspends") Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12block: Fix partition support for host aware zoned block devicesShin'ichiro Kawasaki1-12/+1
Commit b72053072c0b ("block: allow partitions on host aware zone devices") introduced the helper function disk_has_partitions() to check if a given disk has valid partitions. However, since this function result directly depends on the disk partition table length rather than the actual existence of valid partitions in the table, it returns true even after all partitions are removed from the disk. For host aware zoned block devices, this results in zone management support to be kept disabled even after removing all partitions. Fix this by changing disk_has_partitions() to walk through the partition table entries and return true if and only if a valid non-zero size partition is found. Fixes: b72053072c0b ("block: allow partitions on host aware zone devices") Cc: stable@vger.kernel.org # 5.5 Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-11driver code: clarify and fix platform device DMA mask allocationChristoph Hellwig1-1/+1
This does three inter-related things to clarify the usage of the platform device dma_mask field. In the process, fix the bug introduced by cdfee5623290 ("driver core: initialize a default DMA mask for platform device") that caused Artem Tashkinov's laptop to not boot with newer Fedora kernels. This does: - First off, rename the field to "platform_dma_mask" to make it greppable. We have way too many different random fields called "dma_mask" in various data structures, where some of them are actual masks, and some of them are just pointers to the mask. And the structures all have pointers to each other, or embed each other inside themselves, and "pdev" sometimes means "platform device" and sometimes it means "PCI device". So to make it clear in the code when you actually use this new field, give it a unique name (it really should be something even more unique like "platform_device_dma_mask", since it's per platform device, not per platform, but that gets old really fast, and this is unique enough in context). To further clarify when the field gets used, initialize it when we actually start using it with the default value. - Then, use this field instead of the random one-off allocation in platform_device_register_full() that is now unnecessary since we now already have a perfectly fine allocation for it in the platform device structure. - The above then allows us to fix the actual bug, where the error path of platform_device_register_full() would unconditionally free the platform device DMA allocation with 'kfree()'. That kfree() was dont regardless of whether the allocation had been done earlier with the (now removed) kmalloc, or whether setup_pdev_dma_masks() had already been used and the dma_mask pointer pointed to the mask that was part of the platform device. It seems most people never triggered the error path, or only triggered it from a call chain that set an explicit pdevinfo->dma_mask value (and thus caused the unnecessary allocation that was "cleaned up" in the error path) before calling platform_device_register_full(). Robin Murphy points out that in Artem's case the wdat_wdt driver failed in platform_device_add(), and that was the one that had called platform_device_register_full() with pdevinfo.dma_mask = 0, and would have caused that kfree() of pdev.dma_mask corrupting the heap. A later unrelated kmalloc() then oopsed due to the heap corruption. Fixes: cdfee5623290 ("driver core: initialize a default DMA mask for platform device") Reported-bisected-and-tested-by: Artem S. Tashkinov <aros@gmx.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-11mmc: core: Allow host controllers to require R1B for CMD6Ulf Hansson1-0/+1
It has turned out that some host controllers can't use R1B for CMD6 and other commands that have R1B associated with them. Therefore invent a new host cap, MMC_CAP_NEED_RSP_BUSY to let them specify this. In __mmc_switch(), let's check the flag and use it to prevent R1B responses from being converted into R1. Note that, this also means that the host are on its own, when it comes to manage the busy timeout. Suggested-by: Sowjanya Komatineni <skomatineni@nvidia.com> Cc: <stable@vger.kernel.org> Tested-by: Anders Roxell <anders.roxell@linaro.org> Tested-by: Sowjanya Komatineni <skomatineni@nvidia.com> Tested-by: Faiz Abbas <faiz_abbas@ti.com> Tested-By: Peter Geis <pgwipeout@gmail.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2020-03-10Merge branch 'for-5.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroupLinus Torvalds1-0/+1
Pull cgroup fixes from Tejun Heo: - cgroup.procs listing related fixes. It didn't interlock properly with exiting tasks leaving a short window where a cgroup has empty cgroup.procs but still can't be removed and misbehaved on short reads. - psi_show() crash fix on 32bit ino archs - Empty release_agent handling fix * 'for-5.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup1: don't call release_agent when it is "" cgroup: fix psi_show() crash on 32bit ino archs cgroup: Iterate tasks that did not finish do_exit() cgroup: cgroup_procs_next should increase position index cgroup-v1: cgroup_pidlist_next should update position index
2020-03-10Merge branch 'for-5.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds1-0/+16
Pull workqueue fixes from Tejun Heo: "Workqueue has been incorrectly round-robining per-cpu work items. Hillf's patch fixes that. The other patch documents memory-ordering properties of workqueue operations" * 'for-5.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: don't use wq_select_unbound_cpu() for bound works workqueue: Document (some) memory-ordering properties of {queue,schedule}_work()
2020-03-10iommu/vt-d: Silence RCU-list debugging warningsQian Cai1-2/+4
Similar to the commit 02d715b4a818 ("iommu/vt-d: Fix RCU list debugging warnings"), there are several other places that call list_for_each_entry_rcu() outside of an RCU read side critical section but with dmar_global_lock held. Silence those false positives as well. drivers/iommu/intel-iommu.c:4288 RCU-list traversed in non-reader section!! 1 lock held by swapper/0/1: #0: ffffffff935892c8 (dmar_global_lock){+.+.}, at: intel_iommu_init+0x1ad/0xb97 drivers/iommu/dmar.c:366 RCU-list traversed in non-reader section!! 1 lock held by swapper/0/1: #0: ffffffff935892c8 (dmar_global_lock){+.+.}, at: intel_iommu_init+0x125/0xb97 drivers/iommu/intel-iommu.c:5057 RCU-list traversed in non-reader section!! 1 lock held by swapper/0/1: #0: ffffffffa71892c8 (dmar_global_lock){++++}, at: intel_iommu_init+0x61a/0xb13 Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-03-08inet_diag: return classid for all socket typesDmitry Yakunin1-6/+12
In commit 1ec17dbd90f8 ("inet_diag: fix reporting cgroup classid and fallback to priority") croup classid reporting was fixed. But this works only for TCP sockets because for other socket types icsk parameter can be NULL and classid code path is skipped. This change moves classid handling to inet_diag_msg_attrs_fill() function. Also inet_diag_msg_attrs_size() helper was added and addends in nlmsg_new() were reordered to save order from inet_sk_diag_fill(). Fixes: 1ec17dbd90f8 ("inet_diag: fix reporting cgroup classid and fallback to priority") Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-08Merge tag 'driver-core-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-coreLinus Torvalds2-7/+17
Pull driver core and debugfs fixes from Greg KH: "Here are four small driver core / debugfs patches for 5.6-rc3: - debugfs api cleanup now that all debugfs_create_regset32() callers have been fixed up. This was waiting until after the -rc1 merge as these fixes came in through different trees - driver core sync state fixes based on reports of minor issues found in the feature All of these have been in linux-next with no reported issues" * tag 'driver-core-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: driver core: Skip unnecessary work when device doesn't have sync_state() driver core: Add dev_has_sync_state() driver core: Call sync_state() even if supplier has no consumers debugfs: remove return value of debugfs_create_regset32()
2020-03-07Merge tag 'block-5.6-2020-03-07' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+0
Pull block fixes from Jens Axboe: "Here are a few fixes that should go into this release. This contains: - Revert of a bad bcache patch from this merge window - Removed unused function (Daniel) - Fixup for the blktrace fix from Jan from this release (Cengiz) - Fix of deeper level bfqq overwrite in BFQ (Carlo)" * tag 'block-5.6-2020-03-07' of git://git.kernel.dk/linux-block: block, bfq: fix overwrite of bfq_group pointer in bfq_find_set_group() blktrace: fix dereference after null check Revert "bcache: ignore pending signals when creating gc and allocator thread" block: Remove used kblockd_schedule_work_on()
2020-03-06rhashtable: Document the right function parametersJonathan Neuschäfer1-1/+1
rhashtable_lookup_get_insert_key doesn't have a parameter `data`. It does have a parameter `key`, however. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-06Merge tag 'spi-fix-v5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spiLinus Torvalds1-0/+1
Pull spi fixes from Mark Brown: "A selection of small fixes, mostly for drivers, that have arrived since the merge window. None of them are earth shattering in themselves but all useful for affected systems" * tag 'spi-fix-v5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: spi_register_controller(): free bus id on error paths spi: bcm63xx-hsspi: Really keep pll clk enabled spi: atmel-quadspi: fix possible MMIO window size overrun spi/zynqmp: remove entry that causes a cs glitch spi: pxa2xx: Add CS control clock quirk spi: spidev: Fix CS polarity if GPIO descriptors are used spi: qup: call spi_qup_pm_resume_runtime before suspending spi: spi-omap2-mcspi: Support probe deferral for DMA channels spi: spi-omap2-mcspi: Handle DMA size restriction on AM65x
2020-03-06mm, hotplug: fix page online with DEBUG_PAGEALLOC compiled but not enabledVlastimil Babka1-0/+4
Commit cd02cf1aceea ("mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC") fixed memory hotplug with debug_pagealloc enabled, where onlining a page goes through page freeing, which removes the direct mapping. Some arches don't like when the page is not mapped in the first place, so generic_online_page() maps it first. This is somewhat wasteful, but better than special casing page freeing fast paths. The commit however missed that DEBUG_PAGEALLOC configured doesn't mean it's actually enabled. One has to test debug_pagealloc_enabled() since 031bc5743f15 ("mm/debug-pagealloc: make debug-pagealloc boottime configurable"), or alternatively debug_pagealloc_enabled_static() since 8e57f8acbbd1 ("mm, debug_pagealloc: don't rely on static keys too early"), but this is not done. As a result, a s390 kernel with DEBUG_PAGEALLOC configured but not enabled will crash: Unable to handle kernel pointer dereference in virtual kernel address space Failing address: 0000000000000000 TEID: 0000000000000483 Fault in home space mode while using kernel ASCE. AS:0000001ece13400b R2:000003fff7fd000b R3:000003fff7fcc007 S:000003fff7fd7000 P:000000000000013d Oops: 0004 ilc:2 [#1] SMP CPU: 1 PID: 26015 Comm: chmem Kdump: loaded Tainted: GX 5.3.18-5-default #1 SLE15-SP2 (unreleased) Krnl PSW : 0704e00180000000 0000001ecd281b9e (__kernel_map_pages+0x166/0x188) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3 Krnl GPRS: 0000000000000000 0000000000000800 0000400b00000000 0000000000000100 0000000000000001 0000000000000000 0000000000000002 0000000000000100 0000001ece139230 0000001ecdd98d40 0000400b00000100 0000000000000000 000003ffa17e4000 001fffe0114f7d08 0000001ecd4d93ea 001fffe0114f7b20 Krnl Code: 0000001ecd281b8e: ec17ffff00d8 ahik %r1,%r7,-1 0000001ecd281b94: ec111dbc0355 risbg %r1,%r1,29,188,3 >0000001ecd281b9e: 94fb5006 ni 6(%r5),251 0000001ecd281ba2: 41505008 la %r5,8(%r5) 0000001ecd281ba6: ec51fffc6064 cgrj %r5,%r1,6,1ecd281b9e 0000001ecd281bac: 1a07 ar %r0,%r7 0000001ecd281bae: ec03ff584076 crj %r0,%r3,4,1ecd281a5e Call Trace: [<0000001ecd281b9e>] __kernel_map_pages+0x166/0x188 [<0000001ecd4d9516>] online_pages_range+0xf6/0x128 [<0000001ecd2a8186>] walk_system_ram_range+0x7e/0xd8 [<0000001ecda28aae>] online_pages+0x2fe/0x3f0 [<0000001ecd7d02a6>] memory_subsys_online+0x8e/0xc0 [<0000001ecd7add42>] device_online+0x5a/0xc8 [<0000001ecd7d0430>] state_store+0x88/0x118 [<0000001ecd5b9f62>] kernfs_fop_write+0xc2/0x200 [<0000001ecd5064b6>] vfs_write+0x176/0x1e0 [<0000001ecd50676a>] ksys_write+0xa2/0x100 [<0000001ecda315d4>] system_call+0xd8/0x2c8 Fix this by checking debug_pagealloc_enabled_static() before calling kernel_map_pages(). Backports for kernel before 5.5 should use debug_pagealloc_enabled() instead. Also add comments. Fixes: cd02cf1aceea ("mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC") Reported-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: <stable@vger.kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Qian Cai <cai@lca.pw> Link: http://lkml.kernel.org/r/20200224094651.18257-1-vbabka@suse.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-06futex: Fix inode life-time issuePeter Zijlstra2-7/+11
As reported by Jann, ihold() does not in fact guarantee inode persistence. And instead of making it so, replace the usage of inode pointers with a per boot, machine wide, unique inode identifier. This sequence number is global, but shared (file backed) futexes are rare enough that this should not become a performance issue. Reported-by: Jann Horn <jannh@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2020-03-04driver core: Add dev_has_sync_state()Saravana Kannan1-0/+11
Add an API to check if a device has sync_state support in its driver or bus. Signed-off-by: Saravana Kannan <saravanak@google.com> Link: https://lore.kernel.org/r/20200221080510.197337-3-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-02iommu/vt-d: Fix RCU list debugging warningsAmol Grover1-3/+5
dmar_drhd_units is traversed using list_for_each_entry_rcu() outside of an RCU read side critical section but under the protection of dmar_global_lock. Hence add corresponding lockdep expression to silence the following false-positive warnings: [ 1.603975] ============================= [ 1.603976] WARNING: suspicious RCU usage [ 1.603977] 5.5.4-stable #17 Not tainted [ 1.603978] ----------------------------- [ 1.603980] drivers/iommu/intel-iommu.c:4769 RCU-list traversed in non-reader section!! [ 1.603869] ============================= [ 1.603870] WARNING: suspicious RCU usage [ 1.603872] 5.5.4-stable #17 Not tainted [ 1.603874] ----------------------------- [ 1.603875] drivers/iommu/dmar.c:293 RCU-list traversed in non-reader section!! Tested-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Signed-off-by: Amol Grover <frextrite@gmail.com> Cc: stable@vger.kernel.org Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-03-02block: Remove used kblockd_schedule_work_on()Daniel Wagner1-1/+0
Commit ee63cfa7fc19 ("block: add kblockd_schedule_work_on()") introduced the helper in 2016. Remove it because since then no caller was added. Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-01net: phy: avoid clearing PHY interrupts twice in irq handlerHeiner Kallweit1-0/+1
On all PHY drivers that implement did_interrupt() reading the interrupt status bits clears them. This means we may loose an interrupt that is triggered between calling did_interrupt() and phy_clear_interrupt(). As part of the fix make it a requirement that did_interrupt() clears the interrupt. The Fixes tag refers to the first commit where the patch applies cleanly. Fixes: 49644e68f472 ("net: phy: add callback for custom interrupt handler to struct phy_driver") Reported-by: Michael Walle <michael@walle.cc> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-1/+1
Pull KVM fixes from Paolo Bonzini: "More bugfixes, including a few remaining "make W=1" issues such as too large frame sizes on some configurations. On the ARM side, the compiler was messing up shadow stacks between EL1 and EL2 code, which is easily fixed with __always_inline" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: VMX: check descriptor table exits on instruction emulation kvm: x86: Limit the number of "kvm: disabled by bios" messages KVM: x86: avoid useless copy of cpufreq policy KVM: allow disabling -Werror KVM: x86: allow compiling as non-module with W=1 KVM: Pre-allocate 1 cpumask variable per cpu for both pv tlb and pv ipis KVM: Introduce pv check helpers KVM: let declaration of kvm_get_running_vcpus match implementation KVM: SVM: allocate AVIC data structures based on kvm_amd module parameter arm64: Ask the compiler to __always_inline functions used by KVM at HYP KVM: arm64: Define our own swab32() to avoid a uapi static inline KVM: arm64: Ask the compiler to __always_inline functions used at HYP kvm: arm/arm64: Fold VHE entry/exit work into kvm_vcpu_run_vhe() KVM: arm/arm64: Fix up includes for trace.h
2020-02-28Merge tag 'block-5.6-2020-02-28' of git://git.kernel.dk/linux-blockLinus Torvalds2-6/+14
Pull block fixes from Jens Axboe: - Passthrough insertion fix (Ming) - Kill off some unused arguments (John) - blktrace RCU fix (Jan) - Dead fields removal for null_blk (Dongli) - NVMe polled IO fix (Bijan) * tag 'block-5.6-2020-02-28' of git://git.kernel.dk/linux-block: nvme-pci: Hold cq_poll_lock while completing CQEs blk-mq: Remove some unused function arguments null_blk: remove unused fields in 'nullb_cmd' blktrace: Protect q->blk_trace with RCU blk-mq: insert passthrough request into hctx->dispatch directly
2020-02-28KVM: let declaration of kvm_get_running_vcpus match implementationChristian Borntraeger1-1/+1
Sparse notices that declaration and implementation do not match: arch/s390/kvm/../../../virt/kvm/kvm_main.c:4435:17: warning: incorrect type in return expression (different address spaces) arch/s390/kvm/../../../virt/kvm/kvm_main.c:4435:17: expected struct kvm_vcpu [noderef] <asn:3> ** arch/s390/kvm/../../../virt/kvm/kvm_main.c:4435:17: got struct kvm_vcpu *[noderef] <asn:3> * Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds2-7/+20
Pull networking fixes from David Miller: 1) Fix leak in nl80211 AP start where we leak the ACL memory, from Johannes Berg. 2) Fix double mutex unlock in mac80211, from Andrei Otcheretianski. 3) Fix RCU stall in ipset, from Jozsef Kadlecsik. 4) Fix devlink locking in devlink_dpipe_table_register, from Madhuparna Bhowmik. 5) Fix race causing TX hang in ll_temac, from Esben Haabendal. 6) Stale eth hdr pointer in br_dev_xmit(), from Nikolay Aleksandrov. 7) Fix TX hash calculation bounds checking wrt. tc rules, from Amritha Nambiar. 8) Size netlink responses properly in schedule action code to take into consideration TCA_ACT_FLAGS. From Jiri Pirko. 9) Fix firmware paths for mscc PHY driver, from Antoine Tenart. 10) Don't register stmmac notifier multiple times, from Aaro Koskinen. 11) Various rmnet bug fixes, from Taehee Yoo. 12) Fix vsock deadlock in vsock transport release, from Stefano Garzarella. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits) net: dsa: mv88e6xxx: Fix masking of egress port mlxsw: pci: Wait longer before accessing the device after reset sfc: fix timestamp reconstruction at 16-bit rollover points vsock: fix potential deadlock in transport->release() unix: It's CONFIG_PROC_FS not CONFIG_PROCFS net: rmnet: fix packet forwarding in rmnet bridge mode net: rmnet: fix bridge mode bugs net: rmnet: use upper/lower device infrastructure net: rmnet: do not allow to change mux id if mux id is duplicated net: rmnet: remove rcu_read_lock in rmnet_force_unassociate_device() net: rmnet: fix suspicious RCU usage net: rmnet: fix NULL pointer dereference in rmnet_changelink() net: rmnet: fix NULL pointer dereference in rmnet_newlink() net: phy: marvell: don't interpret PHY status unless resolved mlx5: register lag notifier for init network namespace only unix: define and set show_fdinfo only if procfs is enabled hinic: fix a bug of rss configuration hinic: fix a bug of setting hw_ioctxt hinic: fix a irq affinity bug net/smc: check for valid ib_client_data ...
2020-02-27Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hidLinus Torvalds1-1/+1
Pull HID subsystem fixes from Jiri Kosina: - syzkaller-reported error handling fixes in various drivers, from various people - increase of HID report buffer size to 8K, which is apparently needed by certain modern devices - a few new device-ID-specific fixes / quirks - battery charging status reporting fix in logitech-hidpp, from Filipe Laíns * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: hid-bigbenff: fix race condition for scheduled work during removal HID: hid-bigbenff: call hid_hw_stop() in case of error HID: hid-bigbenff: fix general protection fault caused by double kfree HID: i2c-hid: add Trekstor Surfbook E11B to descriptor override HID: alps: Fix an error handling path in 'alps_input_configured()' HID: hiddev: Fix race in in hiddev_disconnect() HID: core: increase HID report buffer size to 8KiB HID: core: fix off-by-one memset in hid_report_raw_event() HID: apple: Add support for recent firmware on Magic Keyboards HID: ite: Only bind to keyboard USB interface on Acer SW5-012 keyboard dock HID: logitech-hidpp: BatteryVoltage: only read chargeStatus if extPower is active
2020-02-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller1-1/+10
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes: 1) Perform garbage collection from workqueue to fix rcu detected stall in ipset hash set types, from Jozsef Kadlecsik. 2) Fix the forceadd evaluation path, also from Jozsef. 3) Fix nft_set_pipapo selftest, from Stefano Brivio. 4) Crash when add-flush-add element in pipapo set, also from Stefano. Add test to cover this crash. 5) Remove sysctl entry under mutex in hashlimit, from Cong Wang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26Merge tag 'trace-v5.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-traceLinus Torvalds1-0/+3
Pull tracing and bootconfig updates: "Fixes and changes to bootconfig before it goes live in a release. Change in API of bootconfig (before it comes live in a release): - Have a magic value "BOOTCONFIG" in initrd to know a bootconfig exists - Set CONFIG_BOOT_CONFIG to 'n' by default - Show error if "bootconfig" on cmdline but not compiled in - Prevent redefining the same value - Have a way to append values - Added a SELECT BLK_DEV_INITRD to fix a build failure Synthetic event fixes: - Switch to raw_smp_processor_id() for recording CPU value in preempt section. (No care for what the value actually is) - Fix samples always recording u64 values - Fix endianess - Check number of values matches number of fields - Fix a printing bug Fix of trace_printk() breaking postponed start up tests Make a function static that is only used in a single file" * tag 'trace-v5.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: bootconfig: Fix CONFIG_BOOTTIME_TRACING dependency issue bootconfig: Add append value operator support bootconfig: Prohibit re-defining value on same key bootconfig: Print array as multiple commands for legacy command line bootconfig: Reject subkey and value on same parent key tools/bootconfig: Remove unneeded error message silencer bootconfig: Add bootconfig magic word for indicating bootconfig explicitly bootconfig: Set CONFIG_BOOT_CONFIG=n by default tracing: Clear trace_state when starting trace bootconfig: Mark boot_config_checksum() static tracing: Disable trace_printk() on post poned tests tracing: Have synthetic event test use raw_smp_processor_id() tracing: Fix number printing bug in print_synth_event() tracing: Check that number of vals matches number of synth event fields tracing: Make synth_event trace functions endian-correct tracing: Make sure synth_event_trace() example always uses u64
2020-02-26Merge branch 'master' of git://blackhole.kfki.hu/nfPablo Neira Ayuso1-1/+10
Jozsef Kadlecsik says: ==================== ipset patches for nf The first one is larger than usual, but the issue could not be solved simpler. Also, it's a resend of the patch I submitted a few days ago, with a one line fix on top of that: the size of the comment extensions was not taken into account at reporting the full size of the set. - Fix "INFO: rcu detected stall in hash_xxx" reports of syzbot by introducing region locking and using workqueue instead of timer based gc of timed out entries in hash types of sets in ipset. - Fix the forceadd evaluation path - the bug was also uncovered by the syzbot. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-02-25icmp: allow icmpv6_ndo_send to work with CONFIG_IPV6=nJason A. Donenfeld1-6/+10
The icmpv6_send function has long had a static inline implementation with an empty body for CONFIG_IPV6=n, so that code calling it doesn't need to be ifdef'd. The new icmpv6_ndo_send function, which is intended for drivers as a drop-in replacement with an identical function signature, should follow the same pattern. Without this patch, drivers that used to work with CONFIG_IPV6=n now result in a linker error. Cc: Chen Zhou <chenzhou10@huawei.com> Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: 0b41713b6066 ("icmp: introduce helper for nat'd source address in network device context") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-25blktrace: Protect q->blk_trace with RCUJan Kara2-6/+14
KASAN is reporting that __blk_add_trace() has a use-after-free issue when accessing q->blk_trace. Indeed the switching of block tracing (and thus eventual freeing of q->blk_trace) is completely unsynchronized with the currently running tracing and thus it can happen that the blk_trace structure is being freed just while __blk_add_trace() works on it. Protect accesses to q->blk_trace by RCU during tracing and make sure we wait for the end of RCU grace period when shutting down tracing. Luckily that is rare enough event that we can afford that. Note that postponing the freeing of blk_trace to an RCU callback should better be avoided as it could have unexpected user visible side-effects as debugfs files would be still existing for a short while block tracing has been shut down. Link: https://bugzilla.kernel.org/show_bug.cgi?id=205711 CC: stable@vger.kernel.org Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Tested-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reported-by: Tristan Madani <tristmd@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-02-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+2
Pull kvm fixes from Paolo Bonzini: "Bugfixes, including the fix for CVE-2020-2732 and a few issues found by 'make W=1'" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: s390: rstify new ioctls in api.rst KVM: nVMX: Check IO instruction VM-exit conditions KVM: nVMX: Refactor IO bitmap checks into helper function KVM: nVMX: Don't emulate instructions in guest mode KVM: nVMX: Emulate MTF when performing instruction emulation KVM: fix error handling in svm_hardware_setup KVM: SVM: Fix potential memory leak in svm_cpu_init() KVM: apic: avoid calculating pending eoi from an uninitialized val KVM: nVMX: clear PIN_BASED_POSTED_INTR from nested pinbased_ctls only when apicv is globally disabled KVM: nVMX: handle nested posted interrupts when apicv is disabled for L1 kvm: x86: svm: Fix NULL pointer dereference when AVIC not enabled KVM: VMX: Add VMX_FEATURE_USR_WAIT_PAUSE KVM: nVMX: Hold KVM's srcu lock when syncing vmcs12->shadow KVM: x86: don't notify userspace IOAPIC on edge-triggered interrupt EOI kvm/emulate: fix a -Werror=cast-function-type KVM: x86: fix incorrect comparison in trace event KVM: nVMX: Fix some obsolete comments and grammar error KVM: x86: fix missing prototypes KVM: x86: enable -Werror
2020-02-22Merge tag 'irq-urgent-2020-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-1/+1
Pull irq fixes from Thomas Gleixner: "Two fixes for the irq core code which are follow ups to the recent MSI fixes: - The WARN_ON which was put into the MSI setaffinity callback for paranoia reasons actually triggered via a callchain which escaped when all the possible ways to reach that code were analyzed. The proc/irq/$N/*affinity interfaces have a quirk which came in when ALPHA moved to the generic interface: In case that the written affinity mask does not contain any online CPU it calls into ALPHAs magic auto affinity setting code. A few years later this mechanism was also made available to x86 for no good reasons and in a way which circumvents all sanity checks for interrupts which cannot have their affinity set from process context on X86 due to the way the X86 interrupt delivery works. It would be possible to make this work properly, but there is no point in doing so. If the interrupt is not yet started then the affinity setting has no effect and if it is started already then it is already assigned to an online CPU so there is no point to randomly move it to some other CPU. Just return EINVAL as the code has done before that change forever. - The new MSI quirk bit in the irq domain flags turned out to be already occupied, which escaped the author and the reviewers because the already in use bits were 0,6,2,3,4,5 listed in that order. That bit 6 was simply overlooked because the ordering was straight forward linear otherwise. So the new bit ended up being a duplicate. Fix it up by switching the oddball 6 to the obvious 1" * tag 'irq-urgent-2020-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/irqdomain: Make sure all irq domain flags are distinct genirq/proc: Reject invalid affinity masks (again)
2020-02-22netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reportsJozsef Kadlecsik1-1/+10
In the case of huge hash:* types of sets, due to the single spinlock of a set the processing of the whole set under spinlock protection could take too long. There were four places where the whole hash table of the set was processed from bucket to bucket under holding the spinlock: - During resizing a set, the original set was locked to exclude kernel side add/del element operations (userspace add/del is excluded by the nfnetlink mutex). The original set is actually just read during the resize, so the spinlocking is replaced with rcu locking of regions. However, thus there can be parallel kernel side add/del of entries. In order not to loose those operations a backlog is added and replayed after the successful resize. - Garbage collection of timed out entries was also protected by the spinlock. In order not to lock too long, region locking is introduced and a single region is processed in one gc go. Also, the simple timer based gc running is replaced with a workqueue based solution. The internal book-keeping (number of elements, size of extensions) is moved to region level due to the region locking. - Adding elements: when the max number of the elements is reached, the gc was called to evict the timed out entries. The new approach is that the gc is called just for the matching region, assuming that if the region (proportionally) seems to be full, then the whole set does. We could scan the other regions to check every entry under rcu locking, but for huge sets it'd mean a slowdown at adding elements. - Listing the set header data: when the set was defined with timeout support, the garbage collector was called to clean up timed out entries to get the correct element numbers and set size values. Now the set is scanned to check non-timed out entries, without actually calling the gc for the whole set. Thanks to Florian Westphal for helping me to solve the SOFTIRQ-safe -> SOFTIRQ-unsafe lock order issues during working on the patch. Reported-by: syzbot+4b0e9d4ff3cf117837e5@syzkaller.appspotmail.com Reported-by: syzbot+c27b8d5010f45c666ed1@syzkaller.appspotmail.com Reported-by: syzbot+68a806795ac89df3aa1c@syzkaller.appspotmail.com Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7") Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
2020-02-21Merge tag 'tty-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/ttyLinus Torvalds1-0/+2
Pull tty/serial driver fixes from Greg KH: "Here are a number of small tty and serial driver fixes for 5.6-rc3 that resolve a bunch of reported issues. They are: - vt selection and ioctl fixes - serdev bugfix - atmel serial driver fixes - qcom serial driver fixes - other minor serial driver fixes All of these have been in linux-next for a while with no reported issues" * tag 'tty-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: vt: selection, close sel_buffer race vt: selection, handle pending signals in paste_selection serial: cpm_uart: call cpm_muram_init before registering console tty: serial: qcom_geni_serial: Fix RX cancel command failure serial: 8250: Check UPF_IRQ_SHARED in advance tty: serial: imx: setup the correct sg entry for tx dma vt: vt_ioctl: fix race in VT_RESIZEX vt: fix scrollback flushing on background consoles tty: serial: tegra: Handle RX transfer in PIO mode if DMA wasn't started tty/serial: atmel: manage shutdown in case of RS485 or ISO7816 mode serdev: ttyport: restore client ops on deregistration serial: ar933x_uart: set UART_CS_{RX,TX}_READY_ORIDE
2020-02-21Merge tag 'usb-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usbLinus Torvalds1-0/+3
Pull USB/Thunderbolt fixes from Greg KH: "Here are a number of small USB driver fixes for 5.6-rc3. Included in here are: - MAINTAINER file updates - USB gadget driver fixes - usb core quirk additions and fixes for regressions - xhci driver fixes - usb serial driver id additions and fixes - thunderbolt bugfix Thunderbolt patches come in through here now that USB4 is really thunderbolt. All of these have been in linux-next for a while with no reported issues" * tag 'usb-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (34 commits) USB: misc: iowarrior: add support for the 100 device thunderbolt: Prevent crash if non-active NVMem file is read usb: gadget: udc-xilinx: Fix xudc_stop() kernel-doc format USB: misc: iowarrior: add support for the 28 and 28L devices USB: misc: iowarrior: add support for 2 OEMed devices USB: Fix novation SourceControl XL after suspend xhci: Fix memory leak when caching protocol extended capability PSI tables - take 2 Revert "xhci: Fix memory leak when caching protocol extended capability PSI tables" MAINTAINERS: Sort entries in database for THUNDERBOLT usb: dwc3: debug: fix string position formatting mixup with ret and len usb: gadget: serial: fix Tx stall after buffer overflow usb: gadget: ffs: ffs_aio_cancel(): Save/restore IRQ flags usb: dwc2: Fix SET/CLEAR_FEATURE and GET_STATUS flows usb: dwc2: Fix in ISOC request length checking usb: gadget: composite: Support more than 500mA MaxPower usb: gadget: composite: Fix bMaxPower for SuperSpeedPlus usb: gadget: u_audio: Fix high-speed max packet size usb: dwc3: gadget: Check for IOC/LST bit in TRB->ctrl fields USB: core: clean up endpoint-descriptor parsing USB: quirks: blacklist duplicate ep on Sound Devices USBPre2 ...
2020-02-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds4-5/+44
Pull networking fixes from David Miller: 1) Limit xt_hashlimit hash table size to avoid OOM or hung tasks, from Cong Wang. 2) Fix deadlock in xsk by publishing global consumer pointers when NAPI is finished, from Magnus Karlsson. 3) Set table field properly to RT_TABLE_COMPAT when necessary, from Jethro Beekman. 4) NLA_STRING attributes are not necessary NULL terminated, deal wiht that in IFLA_ALT_IFNAME. From Eric Dumazet. 5) Fix checksum handling in atlantic driver, from Dmitry Bezrukov. 6) Handle mtu==0 devices properly in wireguard, from Jason A. Donenfeld. 7) Fix several lockdep warnings in bonding, from Taehee Yoo. 8) Fix cls_flower port blocking, from Jason Baron. 9) Sanitize internal map names in libbpf, from Toke Høiland-Jørgensen. 10) Fix RDMA race in qede driver, from Michal Kalderon. 11) Fix several false lockdep warnings by adding conditions to list_for_each_entry_rcu(), from Madhuparna Bhowmik. 12) Fix sleep in atomic in mlx5 driver, from Huy Nguyen. 13) Fix potential deadlock in bpf_map_do_batch(), from Yonghong Song. 14) Hey, variables declared in switch statement before any case statements are not initialized. I learn something every day. Get rids of this stuff in several parts of the networking, from Kees Cook. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (99 commits) bnxt_en: Issue PCIe FLR in kdump kernel to cleanup pending DMAs. bnxt_en: Improve device shutdown method. net: netlink: cap max groups which will be considered in netlink_bind() net: thunderx: workaround BGX TX Underflow issue ionic: fix fw_status read net: disable BRIDGE_NETFILTER by default net: macb: Properly handle phylink on at91rm9200 s390/qeth: fix off-by-one in RX copybreak check s390/qeth: don't warn for napi with 0 budget s390/qeth: vnicc Fix EOPNOTSUPP precedence openvswitch: Distribute switch variables for initialization net: ip6_gre: Distribute switch variables for initialization net: core: Distribute switch variables for initialization udp: rehash on disconnect net/tls: Fix to avoid gettig invalid tls record bpf: Fix a potential deadlock with bpf_map_do_batch bpf: Do not grab the bucket spinlock by default on htab batch ops ice: Wait for VF to be reset/ready before configuration ice: Don't tell the OS that link is going down ice: Don't reject odd values of usecs set by user ...
2020-02-21y2038: remove unused time32 interfacesArnd Bergmann4-219/+1
No users remain, so kill these off before we grow new ones. Link: http://lkml.kernel.org/r/20200110154232.4104492-3-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21y2038: remove ktime to/from timespec/timeval conversionArnd Bergmann1-37/+0
A couple of helpers are now obsolete and can be removed, so drivers can no longer start using them and instead use y2038-safe interfaces. Link: http://lkml.kernel.org/r/20200110154232.4104492-2-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21genirq/irqdomain: Make sure all irq domain flags are distinctZenghui Yu1-1/+1
This was noticed when printing debugfs for MSIs on my ARM64 server. The new dstate IRQD_MSI_NOMASK_QUIRK came out surprisingly while it should only be the x86 stuff for the time being... The new MSI quirk flag uses the same bit as IRQ_DOMAIN_NAME_ALLOCATED which is oddly defined as bit 6 for no good reason. Switch it to the non used bit 1. Fixes: 6f1a4891a592 ("x86/apic/msi: Plug non-maskable MSI affinity race") Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200221020725.2038-1-yuzenghui@huawei.com