wireguard-linux - WireGuard for the Linux kernel

Age	Commit message (Collapse)	Author	Files	Lines
2020-09-26	mm, THP, swap: fix allocating cluster for swapfile by mistake	Gao Xiang	1	-1/+1
	SWP_FS is used to make swap_{read,write}page() go through the filesystem, and it's only used for swap files over NFS. So, !SWP_FS means non NFS for now, it could be either file backed or device backed. Something similar goes with legacy SWP_FILE. So in order to achieve the goal of the original patch, SWP_BLKDEV should be used instead. FS corruption can be observed with SSD device + XFS + fragmented swapfile due to CONFIG_THP_SWAP=y. I reproduced the issue with the following details: Environment: QEMU + upstream kernel + buildroot + NVMe (2 GB) Kernel config: CONFIG_BLK_DEV_NVME=y CONFIG_THP_SWAP=y Some reproducible steps: mkfs.xfs -f /dev/nvme0n1 mkdir /tmp/mnt mount /dev/nvme0n1 /tmp/mnt bs="32k" sz="1024m" # doesn't matter too much, I also tried 16m xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw xfs_io -f -c "pwrite -F -S 0 -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fsync" /tmp/mnt/sw mkswap /tmp/mnt/sw swapon /tmp/mnt/sw stress --vm 2 --vm-bytes 600M # doesn't matter too much as well Symptoms: - FS corruption (e.g. checksum failure) - memory corruption at: 0xd2808010 - segfault Fixes: f0eea189e8e9 ("mm, THP, swap: Don't allocate huge cluster for file backed swap device") Fixes: 38d8b4e6bdc8 ("mm, THP, swap: delay splitting THP during swap out") Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Acked-by: Rafael Aquini <aquini@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Eric Sandeen <esandeen@redhat.com> Cc: Dave Chinner <david@fromorbit.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200820045323.7809-1-hsiangkao@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-25	KVM: SVM: Add a dedicated INVD intercept routine	Tom Lendacky	1	-1/+7
	The INVD instruction intercept performs emulation. Emulation can't be done on an SEV guest because the guest memory is encrypted. Provide a dedicated intercept routine for the INVD intercept. And since the instruction is emulated as a NOP, just skip it instead. Fixes: 1654efcbc431 ("KVM: SVM: Add KVM_SEV_INIT command") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <a0b9a19ffa7fef86a3cc700c7ea01cb2731e04e5.1600972918.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-25	KVM: x86: Reset MMU context if guest toggles CR4.SMAP or CR4.PKE	Sean Christopherson	1	-1/+2
	Reset the MMU context during kvm_set_cr4() if SMAP or PKE is toggled. Recent commits to (correctly) not reload PDPTRs when SMAP/PKE are toggled inadvertantly skipped the MMU context reset due to the mask of bits that triggers PDPTR loads also being used to trigger MMU context resets. Fixes: 427890aff855 ("kvm: x86: Toggling CR4.SMAP does not load PDPTEs in PAE mode") Fixes: cb957adb4ea4 ("kvm: x86: Toggling CR4.PKE does not load PDPTEs in PAE mode") Cc: Jim Mattson <jmattson@google.com> Cc: Peter Shier <pshier@google.com> Cc: Oliver Upton <oupton@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923215352.17756-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-24	KVM: x86: fix MSR_IA32_TSC read for nested migration	Maxim Levitsky	1	-2/+15
	MSR reads/writes should always access the L1 state, since the (nested) hypervisor should intercept all the msrs it wants to adjust, and these that it doesn't should be read by the guest as if the host had read it. However IA32_TSC is an exception. Even when not intercepted, guest still reads the value + TSC offset. The write however does not take any TSC offset into account. This is documented in Intel's SDM and seems also to happen on AMD as well. This creates a problem when userspace wants to read the IA32_TSC value and then write it. (e.g for migration) In this case it reads L2 value but write is interpreted as an L1 value. To fix this make the userspace initiated reads of IA32_TSC return L1 value as well. Huge thanks to Dave Gilbert for helping me understand this very confusing semantic of MSR writes. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200921103805.9102-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-24	mm: fix misplaced unlock_page in do_wp_page()	Linus Torvalds	1	-1/+1
	Commit 09854ba94c6a ("mm: do_wp_page() simplification") reorganized all the code around the page re-use vs copy, but in the process also moved the final unlock_page() around to after the wp_page_reuse() call. That normally doesn't matter - but it means that the unlock_page() is now done after releasing the page table lock. Again, not a big deal, you'd think. But it turns out that it's very wrong indeed, because once we've released the page table lock, we've basically lost our only reference to the page - the page tables - and it could now be free'd at any time. We do hold the mmap_sem, so no actual unmap() can happen, but madvise can come in and a MADV_DONTNEED will zap the page range - and free the page. So now the page may be free'd just as we're unlocking it, which in turn will usually trigger a "Bad page state" error in the freeing path. To make matters more confusing, by the time the debug code prints out the page state, the unlock has typically completed and everything looks fine again. This all doesn't happen in any normal situations, but it does trigger with the dirtyc0w_child LTP test. And it seems to trigger much more easily (but not expclusively) on s390 than elsewhere, probably because s390 doesn't do the "batch pages up for freeing after the TLB flush" that gives the unlock_page() more time to complete and makes the race harder to hit. Fixes: 09854ba94c6a ("mm: do_wp_page() simplification") Link: https://lore.kernel.org/lkml/a46e9bbef2ed4e17778f5615e818526ef848d791.camel@redhat.com/ Link: https://lore.kernel.org/linux-mm/c41149a8-211e-390b-af1d-d5eee690fecb@linux.alibaba.com/ Reported-by: Qian Cai <cai@redhat.com> Reported-by: Alex Shi <alex.shi@linux.alibaba.com> Bisected-and-analyzed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Tested-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-24	spi: bcm-qspi: Fix probe regression on iProc platforms	Ray Jui	1	-1/+1
	iProc chips have QSPI controller that does not have the MSPI_REV offset. Reading from that offset will cause a bus error. Fix it by having MSPI_REV query disabled in the generic compatible string. Fixes: 3a01f04d74ef ("spi: bcm-qspi: Handle lack of MSPI_REV offset") Link: https://lore.kernel.org/linux-arm-kernel/20200909211857.4144718-1-f.fainelli@gmail.com/T/#u Signed-off-by: Ray Jui <ray.jui@broadcom.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20200910152539.45584-3-ray.jui@broadcom.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-09-23	mm: move the copy_one_pte() pte_present check into the caller	Linus Torvalds	1	-12/+12
	This completes the split of the non-present and present pte cases by moving the check for the source pte being present into the single caller, which also means that we clearly separate out the very different return value case for a non-present pte. The present pte case currently always succeeds. This is a pure code re-organization with no semantic change: the intent is to make it much easier to add a new return case to the present pte case for when we do early COW at page table copy time. This was split out from the previous commit simply to make it easy to visually see that there were no semantic changes from this code re-organization. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-23	mm: split out the non-present case from copy_one_pte()	Linus Torvalds	1	-67/+79
	This is a purely mechanical split of the copy_one_pte() function. It's not immediately obvious when looking at the diff because of the indentation change, but the way to see what is going on in this commit is to use the "-w" flag to not show pure whitespace changes, and you see how the first part of copy_one_pte() is simply lifted out into a separate function. And since the non-present case is marked unlikely, don't make the new function be inlined. Not that gcc really seems to care, since it looks like it will inline it anyway due to the whole "single callsite for static function" logic. In fact, code generation with the function split is almost identical to before. But not marking it inline is the right thing to do. This is pure prep-work and cleanup for subsequent changes. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-23	spi: fsl-dspi: fix use-after-free in remove path	Sascha Hauer	1	-5/+7
	spi_unregister_controller() not only unregisters the controller, but also frees the controller. This will free the driver data with it, so we must not access it later dspi_remove(). Solve this by allocating the driver data separately from the SPI controller. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Link: https://lore.kernel.org/r/20200923131026.20707-1-s.hauer@pengutronix.de Signed-off-by: Mark Brown <broonie@kernel.org>
2020-09-23	regulator: axp20x: fix LDO2/4 description	Icenowy Zheng	1	-3/+4
	Currently we wrongly set the mask of value of LDO2/4 both to the mask of LDO2, and the LDO4 voltage configuration is left untouched. This leads to conflict when LDO2/4 are both in use. Fix this issue by setting different vsel_mask to both regulators. Fixes: db4a555f7c4c ("regulator: axp20x: use defines for masks") Signed-off-by: Icenowy Zheng <icenowy@aosc.io> Link: https://lore.kernel.org/r/20200923005142.147135-1-icenowy@aosc.io Signed-off-by: Mark Brown <broonie@kernel.org>
2020-09-23	selftests: kvm: Fix assert failure in single-step test	Yang Weijiang	1	-1/+1
	This is a follow-up patch to fix an issue left in commit: 98b0bf02738004829d7e26d6cb47b2e469aaba86 selftests: kvm: Use a shorter encoding to clear RAX With the change in the commit, we also need to modify "xor" instruction length from 3 to 2 in array ss_size accordingly to pass below check: for (i = 0; i < (sizeof(ss_size) / sizeof(ss_size[0])); i++) { target_rip += ss_size[i]; CLEAR_DEBUG(); debug.control = KVM_GUESTDBG_ENABLE \| KVM_GUESTDBG_SINGLESTEP; debug.arch.debugreg[7] = 0x00000400; APPLY_DEBUG(); vcpu_run(vm, VCPU_ID); TEST_ASSERT(run->exit_reason == KVM_EXIT_DEBUG && run->debug.arch.exception == DB_VECTOR && run->debug.arch.pc == target_rip && run->debug.arch.dr6 == target_dr6, "SINGLE_STEP[%d]: exit %d exception %d rip 0x%llx " "(should be 0x%llx) dr6 0x%llx (should be 0x%llx)", i, run->exit_reason, run->debug.arch.exception, run->debug.arch.pc, target_rip, run->debug.arch.dr6, target_dr6); } Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com> Message-Id: <20200826015524.13251-1-weijiang.yang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-23	KVM: x86: VMX: Make smaller physical guest address space support user-configurable	Mohammed Gamal	3	-7/+15
	This patch exposes allow_smaller_maxphyaddr to the user as a module parameter. Since smaller physical address spaces are only supported on VMX, the parameter is only exposed in the kvm_intel module. For now disable support by default, and let the user decide if they want to enable it. Modifications to VMX page fault and EPT violation handling will depend on whether that parameter is enabled. Signed-off-by: Mohammed Gamal <mgamal@redhat.com> Message-Id: <20200903141122.72908-1-mgamal@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-23	MIPS: BCM47XX: Remove the needless check with the 1074K	Wei Li	1	-1/+1
	As there is no known soc powered by mips 1074K in bcm47xx series, the check with 1074K is needless. So just remove it. Link: https://wireless.wiki.kernel.org/en/users/Drivers/b43/soc Fixes: 442e14a2c55e ("MIPS: Add 1074K CPU support explicitly.") Signed-off-by: Wei Li <liwei391@huawei.com> Acked-by: Rafał Miłecki <rafal@milecki.pl> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-09-23	MIPS: Add the missing 'CPU_1074K' into __get_cpu_type()	Wei Li	1	-0/+1
	Commit 442e14a2c55e ("MIPS: Add 1074K CPU support explicitly.") split 1074K from the 74K as an unique CPU type, while it missed to add the 'CPU_1074K' in __get_cpu_type(). So let's add it back. Fixes: 442e14a2c55e ("MIPS: Add 1074K CPU support explicitly.") Signed-off-by: Wei Li <liwei391@huawei.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-09-23	MIPS: Loongson2ef: Disable Loongson MMI instructions	Jiaxun Yang	1	-0/+4
	It was missed when I was forking Loongson2ef from Loongson64 but should be applied to Loongson2ef as march=loongson2f will also enable Loongson MMI in GCC-9+. Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Fixes: 71e2f4dd5a65 ("MIPS: Fork loongson2ef from loongson64") Reported-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: stable@vger.kernel.org # v5.8+ Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-09-23	ACPI: processor: Fix build for ARCH_APICTIMER_STOPS_ON_C3 unset	Rafael J. Wysocki	1	-0/+1
	Fix the lapic_timer_needs_broadcast() stub for ARCH_APICTIMER_STOPS_ON_C3 unset to actually return a value. Fixes: aa6b43d57f99 ("ACPI: processor: Use CPUIDLE_FLAG_TIMER_STOP") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-23	drm/i915/selftests: Push the fake iommu device from the stack to data	Chris Wilson	1	-7/+5
	Since we store a pointer to the fake iommu device that is allocated on the stack, as soon as we leave the function it goes out of scope and any future dereference is undefined behaviour. Just in case we may need to look at the fake iommu device after initialiation, move the allocation from the stack into the data. Fixes: 01b9d4e21148 ("iommu/vt-d: Use dev_iommu_priv_get/set()") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200916105022.28316-2-chris@chris-wilson.co.uk (cherry picked from commit 9f9f4101fc98db56714e71676d5a1e2d27e01f7e) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-09-23	PM / devfreq: tegra30: Disable clock on error in probe	Dan Carpenter	1	-1/+3
	This error path needs to call clk_disable_unprepare(). Fixes: 7296443b900e ("PM / devfreq: tegra30: Handle possible round-rate error") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2020-09-23	PM / devfreq: Add timer type to devfreq_summary debugfs	Chanwoo Choi	1	-3/+8
	The commit 4dc3bab8687f ("PM / devfreq: Add support delayed timer for polling mode") supports the delayed timer but this commit missed the adding the timer type to devfreq_summary debugfs node. Add the timer type to devfreq_summary debugfs. Fixes: 4dc3bab8687f ("PM / devfreq: Add support delayed timer for polling mode") Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2020-09-22	cpuidle: Drop misleading comments about RCU usage	Ulf Hansson	1	-10/+0
	The commit 1098582a0f6c ("sched,idle,rcu: Push rcu_idle deeper into the idle path"), moved the calls rcu_idle_enter\|exit() into the cpuidle core. However, it forgot to remove a couple of comments in enter_s2idle_proper() about why RCU_NONIDLE earlier was needed. So, let's drop them as they have become a bit misleading. Fixes: 1098582a0f6c ("sched,idle,rcu: Push rcu_idle deeper into the idle path") Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-22	dm crypt: document encrypted keyring key option	Milan Broz	1	-1/+1
	Commit 27f5411a718c4 ("dm crypt: support using encrypted keys") introduced support for encrypted keyring type. Fix documentation in admin guide to mention this type. Fixes: 27f5411a718c4 ("dm crypt: support using encrypted keys") Signed-off-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-09-22	dm crypt: document new no_workqueue flags	Milan Broz	1	-0/+8
	Commit 39d42fa96ba1 ("dm crypt: add flags to optionally bypass kcryptd workqueues") introduced new dm-crypt 'no_read_workqueue' and 'no_write_workqueue' flags. Add documentation to admin guide for them. Fixes: 39d42fa96ba1 ("dm crypt: add flags to optionally bypass kcryptd workqueues") Signed-off-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-09-22	Revert "ALSA: usb-audio: Disable Lenovo P620 Rear line-in volume control"	Kai-Heng Feng	1	-1/+0
	This reverts commit 34dedd2a83b241ba6aeb290260313c65dc58660e. According to Realtek, volume FU works for line-in. I can confirm volume control works after device firmware is updated. Fixes: 34dedd2a83b2 ("ALSA: usb-audio: Disable Lenovo P620 Rear line-in volume control") Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Link: https://lore.kernel.org/r/20200915103925.12777-1-kai.heng.feng@canonical.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-09-22	btrfs: fix put of uninitialized kobject after seed device delete	Anand Jain	1	-6/+10
	The following test case leads to NULL kobject free error: mount seed /mnt add sprout to /mnt umount /mnt mount sprout to /mnt delete seed kobject: '(null)' (00000000dd2b87e4): is not initialized, yet kobject_put() is being called. WARNING: CPU: 1 PID: 15784 at lib/kobject.c:736 kobject_put+0x80/0x350 RIP: 0010:kobject_put+0x80/0x350 :: Call Trace: btrfs_sysfs_remove_devices_dir+0x6e/0x160 [btrfs] btrfs_rm_device.cold+0xa8/0x298 [btrfs] btrfs_ioctl+0x206c/0x22a0 [btrfs] ksys_ioctl+0xe2/0x140 __x64_sys_ioctl+0x1e/0x29 do_syscall_64+0x96/0x150 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f4047c6288b :: This is because, at the end of the seed device-delete, we try to remove the seed's devid sysfs entry. But for the seed devices under the sprout fs, we don't initialize the devid kobject yet. So add a kobject state check, which takes care of the bug. Fixes: 668e48af7a94 ("btrfs: sysfs, add devid/dev_state kobject and device attributes") CC: stable@vger.kernel.org # 5.6+ Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-09-22	MIPS: Loongson-3: Fix fp register access if MSA enabled	Huacai Chen	1	-16/+8
	If MSA is enabled, FPU_REG_WIDTH is 128 rather than 64, then get_fpr64() /set_fpr64() in the original unaligned instruction emulation code access the wrong fp registers. This is because the current code doesn't specify the correct index field, so fix it. Fixes: f83e4f9896eff614d0f2547a ("MIPS: Loongson-3: Add some unaligned instructions emulation") Signed-off-by: Huacai Chen <chenhc@lemote.com> Signed-off-by: Pei Huang <huangpei@loongson.cn> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2020-09-22	media: dt-bindings: media: imx274: Convert to json-schema	Jacopo Mondi	3	-39/+77
	Convert the imx274 bindings document to json-schema and update the MAINTAINERS file accordingly. Reviewed-by: Luca Ceresoli <luca@lucaceresoli.net> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Jacopo Mondi <jacopo+renesas@jmondi.org> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-09-21	tools/bootconfig: Add testcase for tailing space	Masami Hiramatsu	1	-0/+13
	Add testcases for removing/keeping tailing space in the value. Link: https://lkml.kernel.org/r/160068151151.1088739.3469541807296024227.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-09-21	tools/bootconfig: Add testcases for repeated key with brace	Masami Hiramatsu	1	-0/+12
	Add a testcase for repeated key with brace parsing issue. Link: https://lkml.kernel.org/r/160068150176.1088739.409481347784771987.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-09-21	lib/bootconfig: Fix to remove tailing spaces after value	Masami Hiramatsu	1	-1/+1
	Fix to remove tailing spaces after value. If there is a space after value, the bootconfig failed to remove it because it applies strim() before replacing the delimiter with null. For example, foo = var # comment was parsed as below. foo="var " but user will expect foo="var" This fixes it by applying strim() after removing the delimiter. Link: https://lkml.kernel.org/r/160068149134.1088739.8868306567670058853.stgit@devnote2 Fixes: 76db5a27a827 ("bootconfig: Add Extra Boot Config support") Cc: Ingo Molnar <mingo@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-09-21	lib/bootconfig: Fix a bug of breaking existing tree nodes	Masami Hiramatsu	1	-13/+23
	Fix a bug of breaking existing tree nodes by parsing the second and subsequent braces. Since the bootconfig parser uses the node.next field as a flag of current parent node, but this will break the existing tree if the same key node is specified again in the bootconfig. For example, the following bootconfig should be foo.buz and bar. foo bar foo { buz } However, when parsing the brace "{", it breaks foo->bar link by marking open-brace node. So the bootconfig unlinks bar from the bootconfig internal tree. This introduces a stack outside of the tree and record the last open-brace on the stack instead of using node.next field. Link: https://lkml.kernel.org/r/160068148267.1088739.8264704338030168660.stgit@devnote2 Fixes: 76db5a27a827 ("bootconfig: Add Extra Boot Config support") Cc: Ingo Molnar <mingo@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-09-21	net: mscc: ocelot: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries	Vladimir Oltean	1	-8/+8
	The IS2 IP4_TCP_UDP key offsets do not correspond to the VSC7514 datasheet. Whether they work or not is unknown to me. On VSC9959 and VSC9953, with the same mistake and same discrepancy from the documentation, tc-flower src_port and dst_port rules did not work, so I am assuming the same is true here. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-21	net: dsa: seville: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries	Vladimir Oltean	1	-8/+8
	Since these were copied from the Felix VCAP IS2 code, and only the offsets were adjusted, the order of the bit fields is still wrong. Fix it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-21	net: dsa: felix: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries	Xiaoliang Yang	1	-8/+8
	Some of the IS2 IP4_TCP_UDP keys are not correct, like L4_DPORT, L4_SPORT and other L4 keys. This prevents offloaded tc-flower rules from matching on src_port and dst_port for TCP and UDP packets. Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-21	inet_diag: validate INET_DIAG_REQ_PROTOCOL attribute	Eric Dumazet	1	-5/+15
	User space could send an invalid INET_DIAG_REQ_PROTOCOL attribute as caught by syzbot. BUG: KMSAN: uninit-value in inet_diag_lock_handler net/ipv4/inet_diag.c:55 [inline] BUG: KMSAN: uninit-value in __inet_diag_dump+0x58c/0x720 net/ipv4/inet_diag.c:1147 CPU: 0 PID: 8505 Comm: syz-executor174 Not tainted 5.9.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x21c/0x280 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:122 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:219 inet_diag_lock_handler net/ipv4/inet_diag.c:55 [inline] __inet_diag_dump+0x58c/0x720 net/ipv4/inet_diag.c:1147 inet_diag_dump_compat+0x2a5/0x380 net/ipv4/inet_diag.c:1254 netlink_dump+0xb73/0x1cb0 net/netlink/af_netlink.c:2246 __netlink_dump_start+0xcf2/0xea0 net/netlink/af_netlink.c:2354 netlink_dump_start include/linux/netlink.h:246 [inline] inet_diag_rcv_msg_compat+0x5da/0x6c0 net/ipv4/inet_diag.c:1288 sock_diag_rcv_msg+0x24f/0x620 net/core/sock_diag.c:256 netlink_rcv_skb+0x6d7/0x7e0 net/netlink/af_netlink.c:2470 sock_diag_rcv+0x63/0x80 net/core/sock_diag.c:275 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline] netlink_unicast+0x11c8/0x1490 net/netlink/af_netlink.c:1330 netlink_sendmsg+0x173a/0x1840 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg net/socket.c:671 [inline] ____sys_sendmsg+0xc82/0x1240 net/socket.c:2353 ___sys_sendmsg net/socket.c:2407 [inline] __sys_sendmsg+0x6d1/0x820 net/socket.c:2440 __do_sys_sendmsg net/socket.c:2449 [inline] __se_sys_sendmsg+0x97/0xb0 net/socket.c:2447 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447 do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x441389 Code: e8 fc ab 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 1b 09 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fff3b02ce98 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441389 RDX: 0000000000000000 RSI: 0000000020001500 RDI: 0000000000000003 RBP: 00000000006cb018 R08: 00000000004002c8 R09: 00000000004002c8 R10: 0000000000000004 R11: 0000000000000246 R12: 0000000000402130 R13: 00000000004021c0 R14: 0000000000000000 R15: 0000000000000000 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:143 [inline] kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:126 kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:80 slab_alloc_node mm/slub.c:2907 [inline] __kmalloc_node_track_caller+0x9aa/0x12f0 mm/slub.c:4511 __kmalloc_reserve net/core/skbuff.c:142 [inline] __alloc_skb+0x35f/0xb30 net/core/skbuff.c:210 alloc_skb include/linux/skbuff.h:1094 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1176 [inline] netlink_sendmsg+0xdb9/0x1840 net/netlink/af_netlink.c:1894 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg net/socket.c:671 [inline] ____sys_sendmsg+0xc82/0x1240 net/socket.c:2353 ___sys_sendmsg net/socket.c:2407 [inline] __sys_sendmsg+0x6d1/0x820 net/socket.c:2440 __do_sys_sendmsg net/socket.c:2449 [inline] __se_sys_sendmsg+0x97/0xb0 net/socket.c:2447 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447 do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 3f935c75eb52 ("inet_diag: support for wider protocol numbers") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Christoph Paasch <cpaasch@apple.com> Cc: Mat Martineau <mathew.j.martineau@linux.intel.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-21	net: bridge: br_vlan_get_pvid_rcu() should dereference the VLAN group under RCU	Vladimir Oltean	1	-10/+17
	When calling the RCU brother of br_vlan_get_pvid(), lockdep warns: ============================= WARNING: suspicious RCU usage 5.9.0-rc3-01631-g13c17acb8e38-dirty #814 Not tainted ----------------------------- net/bridge/br_private.h:1054 suspicious rcu_dereference_protected() usage! Call trace: lockdep_rcu_suspicious+0xd4/0xf8 __br_vlan_get_pvid+0xc0/0x100 br_vlan_get_pvid_rcu+0x78/0x108 The warning is because br_vlan_get_pvid_rcu() calls nbp_vlan_group() which calls rtnl_dereference() instead of rcu_dereference(). In turn, rtnl_dereference() calls rcu_dereference_protected() which assumes operation under an RCU write-side critical section, which obviously is not the case here. So, when the incorrect primitive is used to access the RCU-protected VLAN group pointer, READ_ONCE() is not used, which may cause various unexpected problems. I'm sad to say that br_vlan_get_pvid() and br_vlan_get_pvid_rcu() cannot share the same implementation. So fix the bug by splitting the 2 functions, and making br_vlan_get_pvid_rcu() retrieve the VLAN groups under proper locking annotations. Fixes: 7582f5b70f9a ("bridge: add br_vlan_get_pvid_rcu()") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-21	net: Update MAINTAINERS for MediaTek switch driver	Sean Wang	1	-0/+1
	Update maintainers for MediaTek switch driver with Landen Chao who is familiar with MediaTek MT753x switch devices and will help maintenance from the vendor side. Cc: Steven Liu <steven.liu@mediatek.com> Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: Landen Chao <Landen.Chao@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-21	net/mlx5e: mlx5e_fec_in_caps() returns a boolean	Saeed Mahameed	1	-5/+2
	Returning errno is a bug, fix that. Also fixes smatch warnings: drivers/net/ethernet/mellanox/mlx5/core/en/port.c:453 mlx5e_fec_in_caps() warn: signedness bug returning '(-95)' Fixes: 2132b71f78d2 ("net/mlx5e: Advertise globaly supported FEC modes") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com>
2020-09-21	net/mlx5e: kTLS, Avoid kzalloc(GFP_KERNEL) under spinlock	Saeed Mahameed	1	-15/+10
	The spinlock only needed when accessing the channel's icosq, grab the lock after the buf allocation in resync_post_get_progress_params() to avoid kzalloc(GFP_KERNEL) in atomic context. Fixes: 0419d8c9d8f8 ("net/mlx5e: kTLS, Add kTLS RX resync support") Reported-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
2020-09-21	net/mlx5e: kTLS, Fix leak on resync error flow	Saeed Mahameed	1	-2/+6
	Resync progress params buffer and dma weren't released on error, Add missing error unwinding for resync_post_get_progress_params(). Fixes: 0419d8c9d8f8 ("net/mlx5e: kTLS, Add kTLS RX resync support") Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
2020-09-21	net/mlx5e: kTLS, Add missing dma_unmap in RX resync	Saeed Mahameed	1	-3/+5
	Progress params dma address is never unmapped, unmap it when completion handling is over. Fixes: 0419d8c9d8f8 ("net/mlx5e: kTLS, Add kTLS RX resync support") Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
2020-09-21	net/mlx5e: kTLS, Fix napi sync and possible use-after-free	Tariq Toukan	1	-1/+1
	Using synchronize_rcu() is sufficient to wait until running NAPI quits. See similar upstream fix with detailed explanation: ("net/mlx5e: Use synchronize_rcu to sync with NAPI") This change also fixes a possible use-after-free as the NAPI might be already released at this stage. Fixes: 0419d8c9d8f8 ("net/mlx5e: kTLS, Add kTLS RX resync support") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-09-21	net/mlx5e: TLS, Do not expose FPGA TLS counter if not supported	Tariq Toukan	1	-4/+8
	The set of TLS TX global SW counters in mlx5e_tls_sw_stats_desc is updated from all rings by using atomic ops. This set of stats is used only in the FPGA TLS use case, not in the Connect-X TLS one, where regular per-ring counters are used. Do not expose them in the Connect-X use case, as this would cause counter duplication. For example, tx_tls_drop_no_sync_data would appear twice in the ethtool stats. Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-09-21	net/mlx5e: Fix using wrong stats_grps in mlx5e_update_ndo_stats()	Alaa Hleihel	6	-15/+17
	The cited commit started to reuse function mlx5e_update_ndo_stats() for the representors as well. However, the function is hard-coded to work on mlx5e_nic_stats_grps only. Due to this issue, the representors statistics were not updated in the output of "ip -s". Fix it to work with the correct group by extracting it from the caller's profile. Also, while at it and since this function became generic, move it to en_stats.c and rename it accordingly. Fixes: 8a236b15144b ("net/mlx5e: Convert rep stats to mlx5e_stats_grp-based infra") Signed-off-by: Alaa Hleihel <alaa@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-09-21	net/mlx5e: Fix multicast counter not up-to-date in "ip -s"	Ron Diskin	4	-7/+12
	Currently the FW does not generate events for counters other than error counters. Unlike ".get_ethtool_stats", ".ndo_get_stats64" (which ip -s uses) might run in atomic context, while the FW interface is non atomic. Thus, 'ip' is not allowed to issue FW commands, so it will only display cached counters in the driver. Add a SW counter (mcast_packets) in the driver to count rx multicast packets. The counter also counts broadcast packets, as we consider it a special case of multicast. Use the counter value when calling "ip -s"/"ifconfig". Fixes: f62b8bb8f2d3 ("net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality") Signed-off-by: Ron Diskin <rondi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-09-21	net/mlx5e: Fix endianness when calculating pedit mask first bit	Maor Dickman	1	-13/+21
	The field mask value is provided in network byte order and has to be converted to host byte order before calculating pedit mask first bit. Fixes: 88f30bbcbaaa ("net/mlx5e: Bit sized fields rewrite support") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-09-21	net/mlx5e: Enable adding peer miss rules only if merged eswitch is supported	Maor Dickman	1	-24/+28
	The cited commit creates peer miss group during switchdev mode initialization in order to handle miss packets correctly while in VF LAG mode. This is done regardless of FW support of such groups which could cause rules setups failure later on. Fix by adding FW capability check before creating peer groups/rule. Fixes: ac004b832128 ("net/mlx5e: E-Switch, Add peer miss rules") Signed-off-by: Maor Dickman <maord@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Raed Salem <raeds@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-09-21	net/mlx5e: CT: Fix freeing ct_label mapping	Roi Dayan	3	-17/+36
	Add missing mapping remove call when removing ct rule, as the mapping was allocated when ct rule was adding with ct_label. Also there is a missing mapping remove call in error flow. Fixes: 54b154ecfb8c ("net/mlx5e: CT: Map 128 bits labels to 32 bit map ID") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-09-21	net/mlx5e: Fix memory leak of tunnel info when rule under multipath not ready	Jianbo Liu	1	-4/+1
	When deleting vxlan flow rule under multipath, tun_info in parse_attr is not freed when the rule is not ready. Fixes: ef06c9ee8933 ("net/mlx5e: Allow one failure when offloading tc encap rules under multipath") Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-09-21	net/mlx5e: Use synchronize_rcu to sync with NAPI	Maxim Mikityanskiy	5	-36/+22
	As described in the previous commit, napi_synchronize doesn't quite fit the purpose when we just need to wait until the currently running NAPI quits. Its implementation waits until NAPI is not running by polling and waiting for 1ms in between. In cases where we need to deactivate one queue (e.g., recovery flows) or where we deactivate them one-by-one (deactivate channel flow), we may get stuck in napi_synchronize forever if other queues keep NAPI active, causing a soft lockup. Depending on kernel configuration (CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC), it may result in a kernel panic. To fix the issue, use synchronize_rcu to wait for NAPI to quit, and wrap the whole NAPI in rcu_read_lock. Fixes: acc6c5953af1 ("net/mlx5e: Split open/close channels to stages") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-09-21	net/mlx5e: Use RCU to protect rq->xdp_prog	Maxim Mikityanskiy	3	-30/+27
	Currently, the RQs are temporarily deactivated while hot-replacing the XDP program, and napi_synchronize is used to make sure rq->xdp_prog is not in use. However, napi_synchronize is not ideal: instead of waiting till the end of a NAPI cycle, it polls and waits until NAPI is not running, sleeping for 1ms between the periodic checks. Under heavy workloads, this loop will never end, which may even lead to a kernel panic if the kernel detects the hangup. Such workloads include XSK TX and possibly also heavy RX (XSK or normal). The fix is inspired by commit 326fe02d1ed6 ("net/mlx4_en: protect ring->xdp_prog with rcu_read_lock"). As mlx5e_xdp_handle is already protected by rcu_read_lock, and bpf_prog_put uses call_rcu to free the program, there is no need for additional synchronization if proper RCU functions are used to access the pointer. This patch converts all accesses to rq->xdp_prog to use RCU functions. Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support") Fixes: db05815b36cb ("net/mlx5e: Add XSK zero-copy support") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>