wireguard-linux - WireGuard for the Linux kernel

Age	Commit message (Collapse)	Author	Files	Lines
2020-03-16	svcrdma: Rename svcrdma_encode trace points in send routines	Chuck Lever	2	-11/+16
	These trace points are misnamed: trace_svcrdma_encode_wseg trace_svcrdma_encode_write trace_svcrdma_encode_reply trace_svcrdma_encode_rseg trace_svcrdma_encode_read trace_svcrdma_encode_pzr Because they actually trace posting on the Send Queue. Let's rename them so that I can add trace points in the chunk list encoders that actually do trace chunk list encoding events. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16	svcrdma: Update synopsis of svc_rdma_send_reply_msg()	Chuck Lever	1	-6/+4
	Preparing for subsequent patches, no behavior change expected. Pass the RPC Call's svc_rdma_recv_ctxt deeper into the sendto() path. This enables passing more information about Requester- provided Write and Reply chunks into those lower-level functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16	svcrdma: Update synopsis of svc_rdma_map_reply_msg()	Chuck Lever	3	-36/+53
	Preparing for subsequent patches, no behavior change expected. Pass the RPC Call's svc_rdma_recv_ctxt deeper into the sendto() path. This enables passing more information about Requester- provided Write and Reply chunks into those lower-level functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16	svcrdma: Update synopsis of svc_rdma_send_reply_chunk()	Chuck Lever	3	-8/+8
	Preparing for subsequent patches, no behavior change expected. Pass the RPC Call's svc_rdma_recv_ctxt deeper into the sendto() path. This enables passing more information about Requester- provided Write and Reply chunks into the lower-level send functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16	svcrdma: De-duplicate code that locates Write and Reply chunks	Chuck Lever	3	-35/+14
	Cache the locations of the Requester-provided Write list and Reply chunk so that the Send path doesn't need to parse the Call header again. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16	svcrdma: Use struct xdr_stream to decode ingress transport headers	Chuck Lever	4	-87/+130
	The logic that checks incoming network headers has to be scrupulous. De-duplicate: replace open-coded buffer overflow checks with the use of xdr_stream helpers that are used most everywhere else XDR decoding is done. One minor change to the sanity checks: instead of checking the length of individual segments, cap the length of the whole chunk to be sure it can fit in the set of pages available in rq_pages. This should be a better test of whether the server can handle the chunks in each request. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16	svcrdma: Remove svcrdma_cm_event() trace point	Chuck Lever	2	-35/+0
	Clean up. This trace point is no longer needed because the RDMA/core CMA code has an equivalent trace point that was added by commit ed999f820a6c ("RDMA/cma: Add trace points in RDMA Connection Manager"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16	svcrdma: Create a generic tracing class for displaying xdr_buf layout	Chuck Lever	4	-2/+48
	This class can be used to create trace points in either the RPC client or RPC server paths. It simply displays the length of each part of an xdr_buf, which is useful to determine that the transport and XDR codecs are operating correctly. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16	SUNRPC: Clean up: Replace dprintk and BUG_ON call sites in svcauth_gss.c	Chuck Lever	2	-29/+73
	Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16	SUNRPC: Add xdr_pad_size() helper	Chuck Lever	4	-10/+21
	Introduce a helper function to compute the XDR pad size of a variable-length XDR object. Clean up: Replace open-coded calculation of XDR pad sizes. I'm sure I haven't found every instance of this calculation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16	svcrdma: Fix double svc_rdma_send_ctxt_put() in an error path	Chuck Lever	1	-8/+1
	This error path is almost never executed. Found by code inspection. Fixes: 99722fe4d5a6 ("svcrdma: Persistently allocate and DMA-map Send buffers") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16	NFSD: Clean up nfsd4_encode_readv	Chuck Lever	1	-5/+4
	Address some minor nits I noticed while working on this function. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16	nfsd: Fix NFSv4 READ on RDMA when using readv	Chuck Lever	10	-23/+106
	svcrdma expects that the payload falls precisely into the xdr_buf page vector. This does not seem to be the case for nfsd4_encode_readv(). This code is called only when fops->splice_read is missing or when RQ_SPLICE_OK is clear, so it's not a noticeable problem in many common cases. Add new transport method: ->xpo_read_payload so that when a READ payload does not fit exactly in rq_res's page vector, the XDR encoder can inform the RPC transport exactly where that payload is, without the payload's XDR pad. That way, when a Write chunk is present, the transport knows what byte range in the Reply message is supposed to be matched with the chunk. Note that the Linux NFS server implementation of NFS/RDMA can currently handle only one Write chunk per RPC-over-RDMA message. This simplifies the implementation of this fix. Fixes: b04209806384 ("nfsd4: allow exotic read compounds") Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=198053 Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16	fs: nfsd: fileache.c: Use built-in RCU list checking	Madhuparna Bhowmik	1	-1/+1
	list_for_each_entry_rcu() has built-in RCU and lock checking. Pass cond argument to list_for_each_entry_rcu() to silence false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled by default. Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16	fs: nfsd: nfs4state.c: Use built-in RCU list checking	Madhuparna Bhowmik	1	-1/+2
	list_for_each_entry_rcu() has built-in RCU and lock checking. Pass cond argument to list_for_each_entry_rcu() to silence false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled by default. Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16	svcrdma: Replace zero-length array with flexible-array member	Gustavo A. R. Silva	1	-1/+1
	The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16	sunrpc: Pass lockdep expression to RCU lists	Amol Grover	1	-1/+2
	detail->hash_table[] is traversed using hlist_for_each_entry_rcu outside an RCU read-side critical section but under the protection of detail->hash_lock. Hence, add corresponding lockdep expression to silence false-positive warnings, and harden RCU lists. Signed-off-by: Amol Grover <frextrite@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16	sunrpc: Replace zero-length array with flexible-array member	Gustavo A. R. Silva	1	-1/+1
	The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-16	nfsd: set the server_scope during service startup	Scott Mayhew	3	-4/+10
	Currently, nfsd4_encode_exchange_id() encodes the utsname nodename string in the server_scope field. In a multi-host container environemnt, if an nfsd container is restarted on a different host than it was originally running on, clients will see a server_scope mismatch and will not attempt to reclaim opens. Instead, set the server_scope while we're in a process context during service startup, so we get the utsname nodename of the current process and store that in nfsd_net. Signed-off-by: Scott Mayhew <smayhew@redhat.com> [bfields: fix up major_id too] Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-03-15	Linux 5.6-rc6	Linus Torvalds	1	-1/+1

2020-03-14	iommu/vt-d: Populate debugfs if IOMMUs are detected	Megha Dey	2	-2/+13
	Currently, the intel iommu debugfs directory(/sys/kernel/debug/iommu/intel) gets populated only when DMA remapping is enabled (dmar_disabled = 0) irrespective of whether interrupt remapping is enabled or not. Instead, populate the intel iommu debugfs directory if any IOMMUs are detected. Cc: Dan Carpenter <dan.carpenter@oracle.com> Fixes: ee2636b8670b1 ("iommu/vt-d: Enable base Intel IOMMU debugfs support") Signed-off-by: Megha Dey <megha.dey@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-03-14	KVM: nVMX: avoid NULL pointer dereference with incorrect EVMCS GPAs	Vitaly Kuznetsov	1	-2/+3
	When an EVMCS enabled L1 guest on KVM will tries doing enlightened VMEnter with EVMCS GPA = 0 the host crashes because the evmcs_gpa != vmx->nested.hv_evmcs_vmptr condition in nested_vmx_handle_enlightened_vmptrld() will evaluate to false (as nested.hv_evmcs_vmptr is zeroed after init). The crash will happen on vmx->nested.hv_evmcs pointer dereference. Another problematic EVMCS ptr value is '-1' but it only causes host crash after nested_release_evmcs() invocation. The problem is exactly the same as with '0', we mistakenly think that the EVMCS pointer hasn't changed and thus nested.hv_evmcs_vmptr is valid. Resolve the issue by adding an additional !vmx->nested.hv_evmcs check to nested_vmx_handle_enlightened_vmptrld(), this way we will always be trying kvm_vcpu_map() when nested.hv_evmcs is NULL and this is supposed to catch all invalid EVMCS GPAs. Also, initialize hv_evmcs_vmptr to '0' in nested_release_evmcs() to be consistent with initialization where we don't currently set hv_evmcs_vmptr to '-1'. Cc: stable@vger.kernel.org Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-14	irqchip/gic-v3: Workaround Cavium erratum 38539 when reading GICD_TYPER2	Marc Zyngier	2	-1/+31
	Despite the architecture spec requiring that reserved registers in the GIC distributor memory map are RES0 (and thus are not allowed to generate an exception), the Cavium ThunderX (aka TX1) SoC explodes as such: [ 0.000000] GICv3: GIC: Using split EOI/Deactivate mode [ 0.000000] GICv3: 128 SPIs implemented [ 0.000000] GICv3: 0 Extended SPIs implemented [ 0.000000] Internal error: synchronous external abort: 96000210 [#1] SMP [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc4-00035-g3cf6a3d5725f #7956 [ 0.000000] Hardware name: cavium,thunder-88xx (DT) [ 0.000000] pstate: 60000085 (nZCv daIf -PAN -UAO) [ 0.000000] pc : __raw_readl+0x0/0x8 [ 0.000000] lr : gic_init_bases+0x110/0x560 [ 0.000000] sp : ffff800011243d90 [ 0.000000] x29: ffff800011243d90 x28: 0000000000000000 [ 0.000000] x27: 0000000000000018 x26: 0000000000000002 [ 0.000000] x25: ffff8000116f0000 x24: ffff000fbe6a2c80 [ 0.000000] x23: 0000000000000000 x22: ffff010fdc322b68 [ 0.000000] x21: ffff800010a7a208 x20: 00000000009b0404 [ 0.000000] x19: ffff80001124dad0 x18: 0000000000000010 [ 0.000000] x17: 000000004d8d492b x16: 00000000f67eb9af [ 0.000000] x15: ffffffffffffffff x14: ffff800011249908 [ 0.000000] x13: ffff800091243ae7 x12: ffff800011243af4 [ 0.000000] x11: ffff80001126e000 x10: ffff800011243a70 [ 0.000000] x9 : 00000000ffffffd0 x8 : ffff80001069c828 [ 0.000000] x7 : 0000000000000059 x6 : ffff8000113fb4d1 [ 0.000000] x5 : 0000000000000001 x4 : 0000000000000000 [ 0.000000] x3 : 0000000000000000 x2 : 0000000000000000 [ 0.000000] x1 : 0000000000000000 x0 : ffff8000116f000c [ 0.000000] Call trace: [ 0.000000] __raw_readl+0x0/0x8 [ 0.000000] gic_of_init+0x188/0x224 [ 0.000000] of_irq_init+0x200/0x3cc [ 0.000000] irqchip_init+0x1c/0x40 [ 0.000000] init_IRQ+0x160/0x1d0 [ 0.000000] start_kernel+0x2ec/0x4b8 [ 0.000000] Code: a8c47bfd d65f03c0 d538d080 d65f03c0 (b9400000) when reading the GICv4.1 GICD_TYPER2 register, which is unexpected... Work around it by adding a new quirk for the following variants: ThunderX: CN88xx OCTEON TX: CN83xx, CN81xx OCTEON TX2: CN93xx, CN96xx, CN98xx, CNF95xx* and use this flag to avoid accessing GICD_TYPER2. Note that all reserved registers (including redistributors and ITS) are impacted by this erratum, but that only GICD_TYPER2 has to be worked around so far. Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Robert Richter <rrichter@marvell.com> Tested-by: Mark Salter <msalter@redhat.com> Tested-by: Tim Harvey <tharvey@gateworks.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Robert Richter <rrichter@marvell.com> Link: https://lore.kernel.org/r/20191027144234.8395-11-maz@kernel.org Link: https://lore.kernel.org/r/20200311115649.26060-1-maz@kernel.org
2020-03-14	KVM: x86: Initializing all kvm_lapic_irq fields in ioapic_write_indirect	Nitesh Narayan Lal	1	-2/+5
	Previously all fields of structure kvm_lapic_irq were not initialized before it was passed to kvm_bitmap_or_dest_vcpus(). Which will cause an issue when any of those fields are used for processing a request. For example not initializing the msi_redir_hint field before passing to the kvm_bitmap_or_dest_vcpus(), may lead to a misbehavior of kvm_apic_map_get_dest_lapic(). This will specifically happen when the kvm_lowest_prio_delivery() returns TRUE due to a non-zero garbage value of msi_redir_hint, which should not happen as the request belongs to APIC fixed delivery mode and we do not want to deliver the interrupt only to the lowest priority candidate. This patch initializes all the fields of kvm_lapic_irq based on the values of ioapic redirect_entry object before passing it on to kvm_bitmap_or_dest_vcpus(). Fixes: 7ee30bc132c6 ("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs") Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> [Set level to false since the value doesn't really matter. Suggested by Vitaly Kuznetsov. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-14	KVM: VMX: Condition ENCLS-exiting enabling on CPU support for SGX1	Sean Christopherson	1	-2/+14
	Enable ENCLS-exiting (and thus set vmcs.ENCLS_EXITING_BITMAP) only if the CPU supports SGX1. Per Intel's SDM, all ENCLS leafs #UD if SGX1 is not supported[], i.e. intercepting ENCLS to inject a #UD is unnecessary. Avoiding ENCLS-exiting even when it is reported as supported by the CPU works around a reported issue where SGX is "hard" disabled after an S3 suspend/resume cycle, i.e. CPUID.0x7.SGX=0 and the VMCS field/control are enumerated as unsupported. While the root cause of the S3 issue is unknown, it's definitely _not_ a KVM (or kernel) bug, i.e. this is a workaround for what is most likely a hardware or firmware issue. As a bonus side effect, KVM saves a VMWRITE when first preparing vmcs01 and vmcs02. Note, SGX must be disabled in BIOS to take advantage of this workaround [] The additional ENCLS CPUID check on SGX1 exists so that SGX can be globally "soft" disabled post-reset, e.g. if #MC bits in MCi_CTL are cleared. Soft disabled meaning disabling SGX without clearing the primary CPUID bit (in leaf 0x7) and without poking into non-SGX CPU paths, e.g. for the VMCS controls. Fixes: 0b665d304028 ("KVM: vmx: Inject #UD for SGX ENCLS instruction in guest") Reported-by: Toni Spets <toni.spets@iki.fi> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-14	iommu/amd: Fix IOMMU AVIC not properly update the is_run bit in IRTE	Suravee Suthikulpanit	1	-2/+2
	Commit b9c6ff94e43a ("iommu/amd: Re-factor guest virtual APIC (de-)activation code") accidentally left out the ir_data pointer when calling modity_irte_ga(), which causes the function amd_iommu_update_ga() to return prematurely due to struct amd_ir_data.ref is NULL and the "is_run" bit of IRTE does not get updated properly. This results in bad I/O performance since IOMMU AVIC always generate GA Log entry and notify IOMMU driver and KVM when it receives interrupt from the PCI pass-through device instead of directly inject interrupt to the vCPU. Fixes by passing ir_data when calling modify_irte_ga() as done previously. Fixes: b9c6ff94e43a ("iommu/amd: Re-factor guest virtual APIC (de-)activation code") Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-03-14	iommu/vt-d: Ignore devices with out-of-spec domain number	Daniel Drake	1	-0/+8
	VMD subdevices are created with a PCI domain ID of 0x10000 or higher. These subdevices are also handled like all other PCI devices by dmar_pci_bus_notifier(). However, when dmar_alloc_pci_notify_info() take records of such devices, it will truncate the domain ID to a u16 value (in info->seg). The device at (e.g.) 10000:00:02.0 is then treated by the DMAR code as if it is 0000:00:02.0. In the unlucky event that a real device also exists at 0000:00:02.0 and also has a device-specific entry in the DMAR table, dmar_insert_dev_scope() will crash on: BUG_ON(i >= devices_cnt); That's basically a sanity check that only one PCI device matches a single DMAR entry; in this case we seem to have two matching devices. Fix this by ignoring devices that have a domain number higher than what can be looked up in the DMAR table. This problem was carefully diagnosed by Jian-Hong Pan. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Daniel Drake <drake@endlessm.com> Fixes: 59ce0515cdaf3 ("iommu/vt-d: Update DRHD/RMRR/ATSR device scope caches when PCI hotplug happens") Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-03-14	iommu/vt-d: Fix the wrong printing in RHSA parsing	Zhenzhong Duan	1	-1/+1
	When base address in RHSA structure doesn't match base address in each DRHD structure, the base address in last DRHD is printed out. This doesn't make sense when there are multiple DRHD units, fix it by printing the buggy RHSA's base address. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@gmail.com> Fixes: fd0c8894893cb ("intel-iommu: Set a more specific taint flag for invalid BIOS DMAR tables") Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-03-13	afs: Use kfree_rcu() instead of casting kfree() to rcu_callback_t	Jann Horn	2	-2/+2
	afs_put_addrlist() casts kfree() to rcu_callback_t. Apart from being wrong in theory, this might also blow up when people start enforcing function types via compiler instrumentation, and it means the rcu_head has to be first in struct afs_addr_list. Use kfree_rcu() instead, it's simpler and more correct. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-13	ovl: fix lockdep warning for async write	Miklos Szeredi	1	-0/+6
	Lockdep reports "WARNING: lock held when returning to user space!" due to async write holding freeze lock over the write. Apparently aio.c already deals with this by lying to lockdep about the state of the lock. Do the same here. No need to check for S_IFREG() here since these file ops are regular-only. Reported-by: syzbot+9331a354f4f624a52a55@syzkaller.appspotmail.com Fixes: 2406a307ac7d ("ovl: implement async IO routines") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-13	ovl: fix some xino configurations	Amir Goldstein	2	-1/+9
	Fix up two bugs in the coversion to xino_mode: 1. xino=off does not always end up in disabled mode 2. xino=auto on 32bit arch should end up in disabled mode Take a proactive approach to disabling xino on 32bit kernel: 1. Disable XINO_AUTO config during build time 2. Disable xino with a warning on mount time As a by product, xino=on on 32bit arch also ends up in disabled mode. We never intended to enable xino on 32bit arch and this will make the rest of the logic simpler. Fixes: 0f831ec85eda ("ovl: simplify ovl_same_sb() helper") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-03-13	x86/vector: Remove warning on managed interrupt migration	Peter Xu	1	-6/+8
	The vector management code assumes that managed interrupts cannot be migrated away from an online CPU. free_moved_vector() has a WARN_ON_ONCE() which triggers when a managed interrupt vector association on a online CPU is cleared. The CPU offline code uses a different mechanism which cannot trigger this. This assumption is not longer correct because the new CPU isolation feature which affects the placement of managed interrupts must be able to move a managed interrupt away from an online CPU. There are two reasons why this can happen: 1) When the interrupt is activated the affinity mask which was established in irq_create_affinity_masks() is handed in to the vector allocation code. This mask contains all CPUs to which the interrupt can be made affine to, but this does not take the CPU isolation 'managed_irq' mask into account. When the interrupt is finally requested by the device driver then the affinity is checked again and the CPU isolation 'managed_irq' mask is taken into account, which moves the interrupt to a non-isolated CPU if possible. 2) The interrupt can be affine to an isolated CPU because the non-isolated CPUs in the calculated affinity mask are not online. Once a non-isolated CPU which is in the mask comes online the interrupt is migrated to this non-isolated CPU In both cases the regular online migration mechanism is used which triggers the WARN_ON_ONCE() in free_moved_vector(). Case #1 could have been addressed by taking the isolation mask into account, but that would require a massive code change in the activation logic and the eventual migration event was accepted as a reasonable tradeoff when the isolation feature was developed. But even if #1 would be addressed, #2 would still trigger it. Of course the warning in free_moved_vector() was overlooked at that time and the above two cases which have been discussed during patch review have obviously never been tested before the final submission. So keep it simple and remove the warning. [ tglx: Rewrote changelog and added a comment to free_moved_vector() ] Fixes: 11ea68f553e2 ("genirq, sched/isolation: Isolate from handling managed interrupts") Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lkml.kernel.org/r/20200312205830.81796-1-peterx@redhat.com
2020-03-13	i2c: acpi: put device when verifying client fails	Wolfram Sang	1	-1/+9
	i2c_verify_client() can fail, so we need to put the device when that happens. Fixes: 525e6fabeae2 ("i2c / ACPI: add support for ACPI reconfigure notifications") Reported-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2020-03-13	iommu/vt-d: Fix debugfs register reads	Megha Dey	2	-15/+27
	Commit 6825d3ea6cde ("iommu/vt-d: Add debugfs support to show register contents") dumps the register contents for all IOMMU devices. Currently, a 64 bit read(dmar_readq) is done for all the IOMMU registers, even though some of the registers are 32 bits, which is incorrect. Use the correct read function variant (dmar_readl/dmar_readq) while reading the contents of 32/64 bit registers respectively. Signed-off-by: Megha Dey <megha.dey@linux.intel.com> Link: https://lore.kernel.org/r/1583784587-26126-2-git-send-email-megha.dey@linux.intel.com Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-03-13	iommu/vt-d: quirk_ioat_snb_local_iommu: replace WARN_TAINT with pr_warn + add_taint	Hans de Goede	1	-3/+4
	Quoting from the comment describing the WARN functions in include/asm-generic/bug.h: * WARN(), WARN_ON(), WARN_ON_ONCE, and so on can be used to report * significant kernel issues that need prompt attention if they should ever * appear at runtime. * * Do not use these macros when checking for invalid external inputs The (buggy) firmware tables which the dmar code was calling WARN_TAINT for really are invalid external inputs. They are not under the kernel's control and the issues in them cannot be fixed by a kernel update. So logging a backtrace, which invites bug reports to be filed about this, is not helpful. Fixes: 556ab45f9a77 ("ioat2: catch and recover from broken vtd configurations v6") Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20200309182510.373875-1-hdegoede@redhat.com BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=701847 Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-03-13	iommu/vt-d: dmar_parse_one_rmrr: replace WARN_TAINT with pr_warn + add_taint	Hans de Goede	1	-2/+4
	Quoting from the comment describing the WARN functions in include/asm-generic/bug.h: * WARN(), WARN_ON(), WARN_ON_ONCE, and so on can be used to report * significant kernel issues that need prompt attention if they should ever * appear at runtime. * * Do not use these macros when checking for invalid external inputs The (buggy) firmware tables which the dmar code was calling WARN_TAINT for really are invalid external inputs. They are not under the kernel's control and the issues in them cannot be fixed by a kernel update. So logging a backtrace, which invites bug reports to be filed about this, is not helpful. Some distros, e.g. Fedora, have tools watching for the kernel backtraces logged by the WARN macros and offer the user an option to file a bug for this when these are encountered. The WARN_TAINT in dmar_parse_one_rmrr + another iommu WARN_TAINT, addressed in another patch, have lead to over a 100 bugs being filed this way. This commit replaces the WARN_TAINT("...") call, with a pr_warn(FW_BUG "...") + add_taint(TAINT_FIRMWARE_WORKAROUND, ...) call avoiding the backtrace and thus also avoiding bug-reports being filed about this against the kernel. Fixes: f5a68bb0752e ("iommu/vt-d: Mark firmware tainted if RMRR fails sanity check") Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: stable@vger.kernel.org Cc: Barret Rhoden <brho@google.com> Link: https://lore.kernel.org/r/20200309140138.3753-3-hdegoede@redhat.com BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1808874
2020-03-13	iommu/vt-d: dmar: replace WARN_TAINT with pr_warn + add_taint	Hans de Goede	1	-5/+6
	Quoting from the comment describing the WARN functions in include/asm-generic/bug.h: * WARN(), WARN_ON(), WARN_ON_ONCE, and so on can be used to report * significant kernel issues that need prompt attention if they should ever * appear at runtime. * * Do not use these macros when checking for invalid external inputs The (buggy) firmware tables which the dmar code was calling WARN_TAINT for really are invalid external inputs. They are not under the kernel's control and the issues in them cannot be fixed by a kernel update. So logging a backtrace, which invites bug reports to be filed about this, is not helpful. Some distros, e.g. Fedora, have tools watching for the kernel backtraces logged by the WARN macros and offer the user an option to file a bug for this when these are encountered. The WARN_TAINT in warn_invalid_dmar() + another iommu WARN_TAINT, addressed in another patch, have lead to over a 100 bugs being filed this way. This commit replaces the WARN_TAINT("...") calls, with pr_warn(FW_BUG "...") + add_taint(TAINT_FIRMWARE_WORKAROUND, ...) calls avoiding the backtrace and thus also avoiding bug-reports being filed about this against the kernel. Fixes: fd0c8894893c ("intel-iommu: Set a more specific taint flag for invalid BIOS DMAR tables") Fixes: e625b4a95d50 ("iommu/vt-d: Parse ANDD records") Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200309140138.3753-2-hdegoede@redhat.com BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1564895
2020-03-12	drm/dp_mst: Rewrite and fix bandwidth limit checks	Lyude Paul	1	-26/+93
	Sigh, this is mostly my fault for not giving commit cd82d82cbc04 ("drm/dp_mst: Add branch bandwidth validation to MST atomic check") enough scrutiny during review. The way we're checking bandwidth limitations here is mostly wrong: For starters, drm_dp_mst_atomic_check_bw_limit() determines the pbn_limit of a branch by simply scanning each port on the current branch device, then uses the last non-zero full_pbn value that it finds. It then counts the sum of the PBN used on each branch device for that level, and compares against the full_pbn value it found before. This is wrong because ports can and will have different PBN limitations on many hubs, especially since a number of DisplayPort hubs out there will be clever and only use the smallest link rate required for each downstream sink - potentially giving every port a different full_pbn value depending on what link rate it's trained at. This means with our current code, which max PBN value we end up with is not well defined. Additionally, we also need to remember when checking bandwidth limitations that the top-most device in any MST topology is a branch device, not a port. This means that the first level of a topology doesn't technically have a full_pbn value that needs to be checked. Instead, we should assume that so long as our VCPI allocations fit we're within the bandwidth limitations of the primary MSTB. We do however, want to check full_pbn on every port including those of the primary MSTB. However, it's important to keep in mind that this value represents the minimum link rate /between a port's sink or mstb, and the mstb itself/. A quick diagram to explain: MSTB #1 / \ / \ Port #1 Port #2 full_pbn for Port #1 → \| \| ← full_pbn for Port #2 Sink #1 MSTB #2 \| etc... Note that in the above diagram, the combined PBN from all VCPI allocations on said hub should not exceed the full_pbn value of port #2, and the display configuration on sink #1 should not exceed the full_pbn value of port #1. However, port #1 and port #2 can otherwise consume as much bandwidth as they want so long as their VCPI allocations still fit. And finally - our current bandwidth checking code also makes the mistake of not checking whether something is an end device or not before trying to traverse down it. So, let's fix it by rewriting our bandwidth checking helpers. We split the function into one part for handling branches which simply adds up the total PBN on each branch and returns it, and one for checking each port to ensure we're not going over its PBN limit. Phew. This should fix regressions seen, where we erroneously reject display configurations due to thinking they're going over our bandwidth limits when they're not. Changes since v1: * Took an even closer look at how PBN limitations are supposed to be handled, and did some experimenting with Sean Paul. Ended up rewriting these helpers again, but this time they should actually be correct! Changes since v2: * Small indenting fix * Fix pbn_used check in drm_dp_mst_atomic_check_port_bw_limit() Signed-off-by: Lyude Paul <lyude@redhat.com> Fixes: cd82d82cbc04 ("drm/dp_mst: Add branch bandwidth validation to MST atomic check") Cc: Sean Paul <seanpaul@google.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Mikita Lipski <mikita.lipski@amd.com> Tested-by: Hans de Goede <hdegoede@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200309210131.1497545-1-lyude@redhat.com
2020-03-12	drm/dp_mst: Reprobe path resources in CSN handler	Lyude Paul	1	-23/+25
	We used to punt off reprobing path resources to the link address probe work, but now that we handle CSNs asynchronously from the driver's HPD handling we can do whatever the heck we want from the CSN! So, reprobe the path resources from drm_dp_mst_handle_conn_stat(). Also, get rid of the path resource reprobing code in drm_dp_check_and_send_link_address() since it's needlessly complicated when we already reprobe path resources from drm_dp_handle_link_address_port(). And finally, teach drm_dp_send_enum_path_resources() to return 1 on PBN changes so we know if we need to send another hotplug or not. This fixes issues where we've indicated to userspace that a port has just been connected, before we actually probed it's available PBN - something that results in unexpected atomic check failures. Signed-off-by: Lyude Paul <lyude@redhat.com> Fixes: cd82d82cbc04 ("drm/dp_mst: Add branch bandwidth validation to MST atomic check") Cc: Mikita Lipski <mikita.lipski@amd.com> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Sean Paul <sean@poorly.run> Link: https://patchwork.freedesktop.org/patch/msgid/20200306234623.547525-4-lyude@redhat.com Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Tested-by: Hans de Goede <hdegoede@redhat.com>
2020-03-12	drm/dp_mst: Use full_pbn instead of available_pbn for bandwidth checks	Lyude Paul	2	-10/+9
	DisplayPort specifications are fun. For a while, it's been really unclear to us what available_pbn actually does. There's a somewhat vague explanation in the DisplayPort spec (starting from 1.2) that partially explains it: The minimum payload bandwidth number supported by the path. Each node updates this number with its available payload bandwidth number if its payload bandwidth number is less than that in the Message Transaction reply. So, it sounds like available_pbn represents the smallest link rate in use between the source and the branch device. Cool, so full_pbn is just the highest possible PBN that the branch device supports right? Well, we assumed that for quite a while until Sean Paul noticed that on some MST hubs, available_pbn will actually get set to 0 whenever there's any active payloads on the respective branch device. This caused quite a bit of confusion since clearing the payload ID table would end up fixing the available_pbn value. So, we just went with that until commit cd82d82cbc04 ("drm/dp_mst: Add branch bandwidth validation to MST atomic check") started breaking people's setups due to us getting erroneous available_pbn values. So, we did some more digging and got confused until we finally looked at the definition for full_pbn: The bandwidth of the link at the trained link rate and lane count between the DP Source device and the DP Sink device with no time slots allocated to VC Payloads, represented as a Payload Bandwidth Number. As with the Available_Payload_Bandwidth_Number, this number is determined by the link with the lowest lane count and link rate. That's what we get for not reading specs closely enough, hehe. So, since full_pbn is definitely what we want for doing bandwidth restriction checks - let's start using that instead and ignore available_pbn entirely. Signed-off-by: Lyude Paul <lyude@redhat.com> Fixes: cd82d82cbc04 ("drm/dp_mst: Add branch bandwidth validation to MST atomic check") Cc: Mikita Lipski <mikita.lipski@amd.com> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Sean Paul <sean@poorly.run> Reviewed-by: Mikita Lipski <mikita.lipski@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200306234623.547525-3-lyude@redhat.com Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Tested-by: Hans de Goede <hdegoede@redhat.com>
2020-03-12	drm/dp_mst: Rename drm_dp_mst_is_dp_mst_end_device() to be less redundant	Lyude Paul	1	-5/+5
	It's already prefixed by dp_mst, so we don't really need to repeat ourselves here. One of the changes I should have picked up originally when reviewing MST DSC support. There should be no functional changes here Cc: Mikita Lipski <mikita.lipski@amd.com> Cc: Sean Paul <seanpaul@google.com> Cc: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Tested-by: Hans de Goede <hdegoede@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200306234623.547525-2-lyude@redhat.com
2020-03-12	net: systemport: fix index check to avoid an array out of bounds access	Colin Ian King	1	-1/+1
	Currently the bounds check on index is off by one and can lead to an out of bounds access on array priv->filters_loc when index is RXCHK_BRCM_TAG_MAX. Fixes: bb9051a2b230 ("net: systemport: Add support for WAKE_FILTER") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12	tc-testing: add ETS scheduler to tdc build configuration	Davide Caratti	1	-0/+1
	add CONFIG_NET_SCH_ETS to 'config', otherwise test suites using this file to perform a full tdc run will encounter the following warning: ok 645 e90e - Add ETS qdisc using bands # skipped - "-----> teardown stage" did not complete successfully Fixes: 82c664b69c8b ("selftests: qdiscs: Add test coverage for ETS Qdisc") Reported-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12	net: phy: fix MDIO bus PM PHY resuming	Heiner Kallweit	2	-1/+7
	So far we have the unfortunate situation that mdio_bus_phy_may_suspend() is called in suspend AND resume path, assuming that function result is the same. After the original change this is no longer the case, resulting in broken resume as reported by Geert. To fix this call mdio_bus_phy_may_suspend() in the suspend path only, and let the phy_device store the info whether it was suspended by MDIO bus PM. Fixes: 503ba7c69610 ("net: phy: Avoid multiple suspends") Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12	cifs_atomic_open(): fix double-put on late allocation failure	Al Viro	3	-4/+8
	several iterations of ->atomic_open() calling conventions ago, we used to need fput() if ->atomic_open() failed at some point after successful finish_open(). Now (since 2016) it's not needed - struct file carries enough state to make fput() work regardless of the point in struct file lifecycle and discarding it on failure exits in open() got unified. Unfortunately, I'd missed the fact that we had an instance of ->atomic_open() (cifs one) that used to need that fput(), as well as the stale comment in finish_open() demanding such late failure handling. Trivially fixed... Fixes: fe9ec8291fca "do_last(): take fput() on error after opening to out:" Cc: stable@kernel.org # v4.7+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-12	gfs2_atomic_open(): fix O_EXCL\|O_CREAT handling on cold dcache	Al Viro	1	-1/+1
	with the way fs/namei.c:do_last() had been done, ->atomic_open() instances needed to recognize the case when existing file got found with O_EXCL\|O_CREAT, either by falling back to finish_no_open() or failing themselves. gfs2 one didn't. Fixes: 6d4ade986f9c (GFS2: Add atomic_open support) Cc: stable@kernel.org # v3.11 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-12	net: hns3: clear port base VLAN when unload PF	Jian Shen	1	-0/+23
	Currently, PF missed to clear the port base VLAN for VF when unload. In this case, the VLAN id will remain in the VLAN table. This patch fixes it. Fixes: 92f11ea177cd ("net: hns3: fix set port based VLAN issue for VF") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12	net: hns3: fix RMW issue for VLAN filter switch	Jian Shen	1	-4/+15
	According to the user manual, the ingress and egress VLAN filter are configured at the same time. Currently, hclge_init_vlan_config() and hclge_set_vlan_spoofchk() will both change the VLAN filter switch. So it's necessary to read the old configuration before modifying it. Fixes: 22044f95faa0 ("net: hns3: add support for spoof check setting") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12	net: hns3: fix VF VLAN table entries inconsistent issue	Jian Shen	4	-0/+6
	Currently, if VF is loaded on the host side, the host doesn't clear the VF's VLAN table entries when VF removing. In this case, when doing reset and disabling sriov at the same time the VLAN device over VF will be removed, but the VLAN table entries in hardware are remained. This patch fixes it by asking PF to clear the VLAN table entries for VF when VF is removing. It also clears the VLAN table full bit after VF VLAN table entries being cleared. Fixes: c6075b193462 ("net: hns3: Record VF vlan tables") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12	net: hns3: fix "tc qdisc del" failed issue	Yonglong Liu	1	-1/+1
	The HNS3 driver supports to configure TC numbers and TC to priority map via "tc" tool. But when delete the rule, will fail, because the HNS3 driver needs at least one TC, but the "tc" tool sets TC number to zero when delete. This patch makes sure that the TC number is at least one. Fixes: 30d240dfa2e8 ("net: hns3: Add mqprio hardware offload support in hns3 driver") Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>