linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2019-12-11	s390/kasan: add KASAN_VMALLOC support	Vasily Gorbik	2	-12/+57
	Add KASAN_VMALLOC support which now enables vmalloc memory area access checks as well as enables usage of VMAP_STACK under kasan. KASAN_VMALLOC changes the way vmalloc and modules areas shadow memory is handled. With this new approach only top level page tables are pre-populated and lower levels are filled dynamically upon memory allocation. Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-12-11	s390: remove last diag 0x44 caller	Heiko Carstens	3	-26/+5
	diag 0x44 is a voluntary undirected yield of a virtual CPU. This has caused a lot of performance issues in the past. There is only one caller left, and that one is only executed if diag 0x9c (directed yield) is not present. Given that all hypervisors implement diag 0x9c anyway, remove the last diag 0x44 to avoid that more callers will be added. Worst case that could happen now, if diag 0x9c is not present, is that a virtual CPU would loop a bit instead of giving its time slice up. diag 0x44 statistics in debugfs are kept and will always be zero, so that user space can tell that there are no calls. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-12-11	s390/uv: use EOPNOTSUPP instead of ENOTSUPP	Christian Borntraeger	1	-1/+1
	ENOTSUP is just an internal kernel error and should never reach userspace. The return value of the share function is not exported to userspace, but to avoid giving bad examples let us use EOPNOTSUPP: Suggested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-12-11	s390/cpum_sf: Avoid SBD overflow condition in irq handler	Thomas Richter	1	-6/+0
	The s390 CPU Measurement sampling facility has an overflow condition which fires when all entries in a SBD are used. The measurement alert interrupt is triggered and reads out all samples in this SDB. It then tests the successor SDB, if this SBD is not full, the interrupt handler does not read any samples at all from this SDB The design waits for the hardware to fill this SBD and then trigger another meassurement alert interrupt. This scheme works nicely until an perf_event_overflow() function call discards the sample due to a too high sampling rate. The interrupt handler has logic to read out a partially filled SDB when the perf event overflow condition in linux common code is met. This causes the CPUM sampling measurement hardware and the PMU device driver to operate on the same SBD's trailer entry. This should not happen. This can be seen here using this trace: cpumsf_pmu_add: tear:0xb5286000 hw_perf_event_update: sdbt 0xb5286000 full 1 over 0 flush_all:0 hw_perf_event_update: sdbt 0xb5286008 full 0 over 0 flush_all:0 above shows 1. interrupt hw_perf_event_update: sdbt 0xb5286008 full 1 over 0 flush_all:0 hw_perf_event_update: sdbt 0xb5286008 full 0 over 0 flush_all:0 above shows 2. interrupt ... this goes on fine until... hw_perf_event_update: sdbt 0xb5286068 full 1 over 0 flush_all:0 perf_push_sample1: overflow one or more samples read from the IRQ handler are rejected by perf_event_overflow() and the IRQ handler advances to the next SDB and modifies the trailer entry of a partially filled SDB. hw_perf_event_update: sdbt 0xb5286070 full 0 over 0 flush_all:1 timestamp: 14:32:52.519953 Next time the IRQ handler is called for this SDB the trailer entry shows an overflow count of 19 missed entries. hw_perf_event_update: sdbt 0xb5286070 full 1 over 19 flush_all:1 timestamp: 14:32:52.970058 Remove access to a follow on SDB when event overflow happened. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-12-11	s390/cpum_sf: Adjust sampling interval to avoid hitting sample limits	Thomas Richter	1	-0/+16
	Function perf_event_ever_overflow() and perf_event_account_interrupt() are called every time samples are processed by the interrupt handler. However function perf_event_account_interrupt() has checks to avoid being flooded with interrupts (more then 1000 samples are received per task_tick). Samples are then dropped and a PERF_RECORD_THROTTLED is added to the perf data. The perf subsystem limit calculation is: maximum sample frequency := 100000 --> 1 samples per 10 us task_tick = 10ms = 10000us --> 1000 samples per task_tick The work flow is measurement_alert() uses SDBT head and each SBDT points to 511 SDB pages, each with 126 sample entries. After processing 8 SBDs and for each valid sample calling: perf_event_overflow() perf_event_account_interrupts() there is a considerable amount of samples being dropped, especially when the sample frequency is very high and near the 100000 limit. To avoid the high amount of samples being dropped near the end of a task_tick time frame, increment the sampling interval in case of dropped events. The CPU Measurement sampling facility on the s390 supports only intervals, specifiing how many CPU cycles have to be executed before a sample is generated. Increase the interval when the samples being generated hit the task_tick limit. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-12-11	s390/test_unwind: fix spelling mistake "reqister" -> "register"	Colin Ian King	1	-1/+1
	There is a spelling mistake in a pr_info message. Fix it. Link: https://lkml.kernel.org/r/20191202090215.28766-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-12-11	s390/spinlock: remove confusing comment in arch_spin_lock_wait	Vasily Gorbik	1	-1/+0
	arch_spin_lock_wait does not take steal time into consideration. Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-12-11	md: make sure desc_nr less than MD_SB_DISKS	Yufen Yu	1	-0/+1
	For super_90_load, we need to make sure 'desc_nr' less than MD_SB_DISKS, avoiding invalid memory access of 'sb->disks'. Fixes: 228fc7d76db6 ("md: avoid invalid memory access for array sb->dev_roles") Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2019-12-11	md: raid1: check rdev before reference in raid1_sync_request func	Zhiqiang Liu	1	-1/+1
	In raid1_sync_request func, rdev should be checked before reference. Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2019-12-11	raid5: need to set STRIPE_HANDLE for batch head	Guoqing Jiang	1	-1/+1
	With commit 6ce220dd2f8ea71d6afc29b9a7524c12e39f374a ("raid5: don't set STRIPE_HANDLE to stripe which is in batch list"), we don't want to set STRIPE_HANDLE flag for sh which is already in batch list. However, the stripe which is the head of batch list should set this flag, otherwise panic could happen inside init_stripe at BUG_ON(sh->batch_head), it is reproducible with raid5 on top of nvdimm devices per Xiao oberserved. Thanks for Xiao's effort to verify the change. Fixes: 6ce220dd2f8ea ("raid5: don't set STRIPE_HANDLE to stripe which is in batch list") Reported-by: Xiao Ni <xni@redhat.com> Tested-by: Xiao Ni <xni@redhat.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2019-12-11	afs: Show volume name in /proc/net/afs/<cell>/volumes	David Howells	1	-3/+4
	Show the name of each volume in /proc/net/afs/<cell>/volumes to make it easier to work out the name corresponding to a volume ID. This makes it easier to work out which mounts in /proc/mounts correspond to which volume ID. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
2019-12-11	afs: Fix missing cell comparison in afs_test_super()	David Howells	1	-0/+1
	Fix missing cell comparison in afs_test_super(). Without this, any pair volumes that have the same volume ID will share a superblock, no matter the cell, unless they're in different network namespaces. Normally, most users will only deal with a single cell and so they won't see this. Even if they do look into a second cell, they won't see a problem unless they happen to hit a volume with the same ID as one they've already got mounted. Before the patch: # ls /afs/grand.central.org/archive linuxdev/ mailman/ moin/ mysql/ pipermail/ stage/ twiki/ # ls /afs/kth.se/ linuxdev/ mailman/ moin/ mysql/ pipermail/ stage/ twiki/ # cat /proc/mounts \| grep afs none /afs afs rw,relatime,dyn,autocell 0 0 #grand.central.org:root.cell /afs/grand.central.org afs ro,relatime 0 0 #grand.central.org:root.archive /afs/grand.central.org/archive afs ro,relatime 0 0 #grand.central.org:root.archive /afs/kth.se afs ro,relatime 0 0 After the patch: # ls /afs/grand.central.org/archive linuxdev/ mailman/ moin/ mysql/ pipermail/ stage/ twiki/ # ls /afs/kth.se/ admin/ common/ install/ OldFiles/ service/ system/ bakrestores/ home/ misc/ pkg/ src/ wsadmin/ # cat /proc/mounts \| grep afs none /afs afs rw,relatime,dyn,autocell 0 0 #grand.central.org:root.cell /afs/grand.central.org afs ro,relatime 0 0 #grand.central.org:root.archive /afs/grand.central.org/archive afs ro,relatime 0 0 #kth.se:root.cell /afs/kth.se afs ro,relatime 0 0 Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Carsten Jacobi <jacobi@de.ibm.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: Jonathan Billings <jsbillings@jsbillings.org> cc: Todd DeSantis <atd@us.ibm.com>
2019-12-11	afs: Fix creation calls in the dynamic root to fail with EOPNOTSUPP	David Howells	1	-0/+3
	Fix the lookup method on the dynamic root directory such that creation calls, such as mkdir, open(O_CREAT), symlink, etc. fail with EOPNOTSUPP rather than failing with some odd error (such as EEXIST). lookup() itself tries to create automount directories when it is invoked. These are cached locally in RAM and not committed to storage. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: Jonathan Billings <jsbillings@jsbillings.org>
2019-12-11	afs: Fix mountpoint parsing	David Howells	1	-2/+4
	Each AFS mountpoint has strings that define the target to be mounted. This is required to end in a dot that is supposed to be stripped off. The string can include suffixes of ".readonly" or ".backup" - which are supposed to come before the terminal dot. To add to the confusion, the "fs lsmount" afs utility does not show the terminal dot when displaying the string. The kernel mount source string parser, however, assumes that the terminal dot marks the suffix and that the suffix is always "" and is thus ignored. In most cases, there is no suffix and this is not a problem - but if there is a suffix, it is lost and this affects the ability to mount the correct volume. The command line mount command, on the other hand, is expected not to include a terminal dot - so the problem doesn't arise there. Fix this by making sure that the dot exists and then stripping it when passing the string to the mount configuration. Fixes: bec5eb614130 ("AFS: Implement an autocell mount capability [ver #2]") Reported-by: Jonathan Billings <jsbillings@jsbillings.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: Jonathan Billings <jsbillings@jsbillings.org>
2019-12-11	dt-bindings: net: ti: cpsw-switch: update to fix comments	Grygorii Strashko	1	-15/+7
	After original patch was merged there were additional comments/requests provided by Rob Herring [1]. Mostly they are related to json-schema usage, and this patch fixes them. Also SPDX-License-Identifier has been changed to (GPL-2.0-only OR BSD-2-Clause) as requested. [1] https://lkml.org/lkml/2019/11/21/875 Fixes: ef63fe72f698 ("dt-bindings: net: ti: add new cpsw switch driver bindings") Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> [robh: Remove 2 more maxItems that aren't necessary] Signed-off-by: Rob Herring <robh@kernel.org>
2019-12-11	dt-bindings: remoteproc: stm32: add wakeup-source property	Arnaud Pouliquen	1	-0/+2
	If the optional wdg interrupt is defined, then this property may be defined. Signed-off-by: Arnaud Pouliquen <arnaud.pouliquen@st.com> Signed-off-by: Rob Herring <robh@kernel.org>
2019-12-11	xhci: make sure interrupts are restored to correct state	Mathias Nyman	1	-6/+6
	spin_unlock_irqrestore() might be called with stale flags after reading port status, possibly restoring interrupts to a incorrect state. If a usb2 port just finished resuming while the port status is read the spin lock will be temporary released and re-acquired in a separate function. The flags parameter is passed as value instead of a pointer, not updating flags properly before the final spin_unlock_irqrestore() is called. Cc: <stable@vger.kernel.org> # v3.12+ Fixes: 8b3d45705e54 ("usb: Fix xHCI host issues on remote wakeup.") Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20191211142007.8847-7-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-11	xhci: handle some XHCI_TRUST_TX_LENGTH quirks cases as default behaviour.	Mathias Nyman	1	-1/+2
	xhci driver claims it needs XHCI_TRUST_TX_LENGTH quirk for both Broadcom/Cavium and a Renesas xHC controllers. The quirk was inteded for handling false "success" complete event for transfers that had data left untransferred. These transfers should complete with "short packet" events instead. In these two new cases the false "success" completion is reported after a "short packet" if the TD consists of several TRBs. xHCI specs 4.10.1.1.2 say remaining TRBs should report "short packet" as well after the first short packet in a TD, but this issue seems so common it doesn't make sense to add the quirk for all vendors. Turn these events into short packets automatically instead. This gets rid of the "The WARN Successful completion on short TX for slot 1 ep 1: needs XHCI_TRUST_TX_LENGTH quirk" warning in many cases. Cc: <stable@vger.kernel.org> Reported-by: Eli Billauer <eli.billauer@gmail.com> Reported-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Eli Billauer <eli.billauer@gmail.com> Tested-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20191211142007.8847-6-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-11	xhci: Increase STS_HALT timeout in xhci_suspend()	Kai-Heng Feng	1	-1/+1
	I've recently observed failed xHCI suspend attempt on AMD Raven Ridge system: kernel: xhci_hcd 0000:04:00.4: WARN: xHC CMD_RUN timeout kernel: PM: suspend_common(): xhci_pci_suspend+0x0/0xd0 returns -110 kernel: PM: pci_pm_suspend(): hcd_pci_suspend+0x0/0x30 returns -110 kernel: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x150 returns -110 kernel: PM: Device 0000:04:00.4 failed to suspend async: error -110 Similar to commit ac343366846a ("xhci: Increase STS_SAVE timeout in xhci_suspend()") we also need to increase the HALT timeout to make it be able to suspend again. Cc: <stable@vger.kernel.org> # 5.2+ Fixes: f7fac17ca925 ("xhci: Convert xhci_handshake() to use readl_poll_timeout_atomic()") Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20191211142007.8847-5-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-11	usb: xhci: only set D3hot for pci device	Henry Lin	3	-5/+16
	Xhci driver cannot call pci_set_power_state() on non-pci xhci host controllers. For example, NVIDIA Tegra XHCI host controller which acts as platform device with XHCI_SPURIOUS_WAKEUP quirk set in some platform hits this issue during shutdown. Cc: <stable@vger.kernel.org> Fixes: 638298dc66ea ("xhci: Fix spurious wakeups after S5 on Haswell") Signed-off-by: Henry Lin <henryl@nvidia.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20191211142007.8847-4-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-11	xhci: fix USB3 device initiated resume race with roothub autosuspend	Mathias Nyman	2	-2/+11
	A race in xhci USB3 remote wake handling may force device back to suspend after it initiated resume siganaling, causing a missed resume event or warm reset of device. When a USB3 link completes resume signaling and goes to enabled (UO) state a interrupt is issued and the interrupt handler will clear the bus_state->port_remote_wakeup resume flag, allowing bus suspend. If the USB3 roothub thread just finished reading port status before the interrupt, finding ports still in suspended (U3) state, but hasn't yet started suspending the hub, then the xhci interrupt handler will clear the flag that prevented roothub suspend and allow bus to suspend, forcing all port links back to suspended (U3) state. Example case: usb_runtime_suspend() # because all ports still show suspended U3 usb_suspend_both() hub_suspend(); # successful as hub->wakeup_bits not set yet ==> INTERRUPT xhci_irq() handle_port_status() clear bus_state->port_remote_wakeup usb_wakeup_notification() sets hub->wakeup_bits; kick_hub_wq() <== END INTERRUPT hcd_bus_suspend() xhci_bus_suspend() # success as port_remote_wakeup bits cleared Fix this by increasing roothub usage count during port resume to prevent roothub autosuspend, and by making sure bus_state->port_remote_wakeup flag is only cleared after resume completion is visible, i.e. after xhci roothub returned U0 or other non-U3 link state link on a get port status request. Issue rootcaused by Chiasheng Lee Cc: <stable@vger.kernel.org> Cc: Lee, Hou-hsun <hou-hsun.lee@intel.com> Reported-by: Lee, Chiasheng <chiasheng.lee@intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20191211142007.8847-3-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-11	xhci: Fix memory leak in xhci_add_in_port()	Mika Westerberg	1	-0/+4
	When xHCI is part of Alpine or Titan Ridge Thunderbolt controller and the xHCI device is hot-removed as a result of unplugging a dock for example, the driver leaks memory it allocates for xhci->usb3_rhub.psi and xhci->usb2_rhub.psi in xhci_add_in_port() as reported by kmemleak: unreferenced object 0xffff922c24ef42f0 (size 16): comm "kworker/u16:2", pid 178, jiffies 4294711640 (age 956.620s) hex dump (first 16 bytes): 21 00 0c 00 12 00 dc 05 23 00 e0 01 00 00 00 00 !.......#....... backtrace: [<000000007ac80914>] xhci_mem_init+0xcf8/0xeb7 [<0000000001b6d775>] xhci_init+0x7c/0x160 [<00000000db443fe3>] xhci_gen_setup+0x214/0x340 [<00000000fdffd320>] xhci_pci_setup+0x48/0x110 [<00000000541e1e03>] usb_add_hcd.cold+0x265/0x747 [<00000000ca47a56b>] usb_hcd_pci_probe+0x219/0x3b4 [<0000000021043861>] xhci_pci_probe+0x24/0x1c0 [<00000000b9231f25>] local_pci_probe+0x3d/0x70 [<000000006385c9d7>] pci_device_probe+0xd0/0x150 [<0000000070241068>] really_probe+0xf5/0x3c0 [<0000000061f35c0a>] driver_probe_device+0x58/0x100 [<000000009da11198>] bus_for_each_drv+0x79/0xc0 [<000000009ce45f69>] __device_attach+0xda/0x160 [<00000000df201aaf>] pci_bus_add_device+0x46/0x70 [<0000000088a1bc48>] pci_bus_add_devices+0x27/0x60 [<00000000ad9ee708>] pci_bus_add_devices+0x52/0x60 unreferenced object 0xffff922c24ef3318 (size 8): comm "kworker/u16:2", pid 178, jiffies 4294711640 (age 956.620s) hex dump (first 8 bytes): 34 01 05 00 35 41 0a 00 4...5A.. backtrace: [<000000007ac80914>] xhci_mem_init+0xcf8/0xeb7 [<0000000001b6d775>] xhci_init+0x7c/0x160 [<00000000db443fe3>] xhci_gen_setup+0x214/0x340 [<00000000fdffd320>] xhci_pci_setup+0x48/0x110 [<00000000541e1e03>] usb_add_hcd.cold+0x265/0x747 [<00000000ca47a56b>] usb_hcd_pci_probe+0x219/0x3b4 [<0000000021043861>] xhci_pci_probe+0x24/0x1c0 [<00000000b9231f25>] local_pci_probe+0x3d/0x70 [<000000006385c9d7>] pci_device_probe+0xd0/0x150 [<0000000070241068>] really_probe+0xf5/0x3c0 [<0000000061f35c0a>] driver_probe_device+0x58/0x100 [<000000009da11198>] bus_for_each_drv+0x79/0xc0 [<000000009ce45f69>] __device_attach+0xda/0x160 [<00000000df201aaf>] pci_bus_add_device+0x46/0x70 [<0000000088a1bc48>] pci_bus_add_devices+0x27/0x60 [<00000000ad9ee708>] pci_bus_add_devices+0x52/0x60 Fix this by calling kfree() for the both psi objects in xhci_mem_cleanup(). Cc: <stable@vger.kernel.org> # 4.4+ Fixes: 47189098f8be ("xhci: parse xhci protocol speed ID list for usb 3.1 usage") Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20191211142007.8847-2-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-11	drm/i915: Serialise with remote retirement	Chris Wilson	1	-3/+23
	Since retirement may be running in a worker on another CPU, it may be skipped in the local intel_gt_wait_for_idle(). To ensure the state is consistent for our sanity checks upon load, serialise with the remote retirer by waiting on the timeline->mutex. Outside of this use case, e.g. on suspend or module unload, we expect the slack to be picked up by intel_gt_pm_wait_for_idle() and so prefer to put the special case serialisation with retirement in its single user, for now at least. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191121071044.97798-2-chris@chris-wilson.co.uk (cherry picked from commit 2d0fb251360ab7eccbffd99f6933a2a4de678d52) Fixes: 093b92287363 ("drm/i915: Split i915_active.mutex into an irq-safe spinlock for the rbtree") Closes: https://gitlab.freedesktop.org/drm/intel/issues/754 Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2019-12-11	virtio_balloon: divide/multiply instead of shifts	Michael S. Tsirkin	1	-4/+5
	We managed to get confused about the shift direction at least once. Let's switch to division/multiplcation instead. Add a number of pages macro for this purpose. We still keep the order macro around too since this is what alloc/free pages want. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Wei Wang <wei.w.wang@intel.com> Reviewed-by: David Hildenbrand <david@redhat.com>
2019-12-11	virtio_balloon: name cleanups	Michael S. Tsirkin	1	-12/+12
	free_page_order is a confusing name. It's not a page order actually, it's the order of the block of memory we are hinting. Rename to hint_block_order. Also, rename SIZE to BYTES to make it clear it's the block size in bytes. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Wei Wang <wei.w.wang@intel.com> Reviewed-by: David Hildenbrand <david@redhat.com>
2019-12-11	virtio-balloon: fix managed page counts when migrating pages between zones	David Hildenbrand	1	-0/+11
	In case we have to migrate a ballon page to a newpage of another zone, the managed page count of both zones is wrong. Paired with memory offlining (which will adjust the managed page count), we can trigger kernel crashes and all kinds of different symptoms. One way to reproduce: 1. Start a QEMU guest with 4GB, no NUMA 2. Hotplug a 1GB DIMM and online the memory to ZONE_NORMAL 3. Inflate the balloon to 1GB 4. Unplug the DIMM (be quick, otherwise unmovable data ends up on it) 5. Observe /proc/zoneinfo Node 0, zone Normal pages free 16810 min 24848885473806 low 18471592959183339 high 36918337032892872 spanned 262144 present 262144 managed 18446744073709533486 6. Do anything that requires some memory (e.g., inflate the balloon some more). The OOM goes crazy and the system crashes [ 238.324946] Out of memory: Killed process 537 (login) total-vm:27584kB, anon-rss:860kB, file-rss:0kB, shmem-rss:00 [ 238.338585] systemd invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0 [ 238.339420] CPU: 0 PID: 1 Comm: systemd Tainted: G D W 5.4.0-next-20191204+ #75 [ 238.340139] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu4 [ 238.341121] Call Trace: [ 238.341337] dump_stack+0x8f/0xd0 [ 238.341630] dump_header+0x61/0x5ea [ 238.341942] oom_kill_process.cold+0xb/0x10 [ 238.342299] out_of_memory+0x24d/0x5a0 [ 238.342625] __alloc_pages_slowpath+0xd12/0x1020 [ 238.343024] __alloc_pages_nodemask+0x391/0x410 [ 238.343407] pagecache_get_page+0xc3/0x3a0 [ 238.343757] filemap_fault+0x804/0xc30 [ 238.344083] ? ext4_filemap_fault+0x28/0x42 [ 238.344444] ext4_filemap_fault+0x30/0x42 [ 238.344789] __do_fault+0x37/0x1a0 [ 238.345087] __handle_mm_fault+0x104d/0x1ab0 [ 238.345450] handle_mm_fault+0x169/0x360 [ 238.345790] do_user_addr_fault+0x20d/0x490 [ 238.346154] do_page_fault+0x31/0x210 [ 238.346468] async_page_fault+0x43/0x50 [ 238.346797] RIP: 0033:0x7f47eba4197e [ 238.347110] Code: Bad RIP value. [ 238.347387] RSP: 002b:00007ffd7c0c1890 EFLAGS: 00010293 [ 238.347834] RAX: 0000000000000002 RBX: 000055d196a20a20 RCX: 00007f47eba4197e [ 238.348437] RDX: 0000000000000033 RSI: 00007ffd7c0c18c0 RDI: 0000000000000004 [ 238.349047] RBP: 00007ffd7c0c1c20 R08: 0000000000000000 R09: 0000000000000033 [ 238.349660] R10: 00000000ffffffff R11: 0000000000000293 R12: 0000000000000001 [ 238.350261] R13: ffffffffffffffff R14: 0000000000000000 R15: 00007ffd7c0c18c0 [ 238.350878] Mem-Info: [ 238.351085] active_anon:3121 inactive_anon:51 isolated_anon:0 [ 238.351085] active_file:12 inactive_file:7 isolated_file:0 [ 238.351085] unevictable:0 dirty:0 writeback:0 unstable:0 [ 238.351085] slab_reclaimable:5565 slab_unreclaimable:10170 [ 238.351085] mapped:3 shmem:111 pagetables:155 bounce:0 [ 238.351085] free:720717 free_pcp:2 free_cma:0 [ 238.353757] Node 0 active_anon:12484kB inactive_anon:204kB active_file:48kB inactive_file:28kB unevictable:0kB iss [ 238.355979] Node 0 DMA free:11556kB min:36kB low:48kB high:60kB reserved_highatomic:0KB active_anon:152kB inactivB [ 238.358345] lowmem_reserve[]: 0 2955 2884 2884 2884 [ 238.358761] Node 0 DMA32 free:2677864kB min:7004kB low:10028kB high:13052kB reserved_highatomic:0KB active_anon:0B [ 238.361202] lowmem_reserve[]: 0 0 72057594037927865 72057594037927865 72057594037927865 [ 238.361888] Node 0 Normal free:193448kB min:99395541895224kB low:73886371836733356kB high:147673348131571488kB reB [ 238.364765] lowmem_reserve[]: 0 0 0 0 0 [ 238.365101] Node 0 DMA: 74kB (U) 58kB (UE) 616kB (UME) 232kB (UM) 164kB (U) 2128kB (UE) 3256kB (UME) 2512B [ 238.366379] Node 0 DMA32: 04kB 18kB (U) 216kB (UM) 232kB (UM) 264kB (UM) 1128kB (U) 1256kB (U) 1512kB (U)B [ 238.367654] Node 0 Normal: 19854kB (UME) 13218kB (UME) 84416kB (UME) 52432kB (UME) 30064kB (UME) 138128kB (B [ 238.369184] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 238.369915] 130 total pagecache pages [ 238.370241] 0 pages in swap cache [ 238.370533] Swap cache stats: add 0, delete 0, find 0/0 [ 238.370981] Free swap = 0kB [ 238.371239] Total swap = 0kB [ 238.371488] 1048445 pages RAM [ 238.371756] 0 pages HighMem/MovableOnly [ 238.372090] 306992 pages reserved [ 238.372376] 0 pages cma reserved [ 238.372661] 0 pages hwpoisoned In another instance (older kernel), I was able to observe this (negative page count :/): [ 180.896971] Offlined Pages 32768 [ 182.667462] Offlined Pages 32768 [ 184.408117] Offlined Pages 32768 [ 186.026321] Offlined Pages 32768 [ 187.684861] Offlined Pages 32768 [ 189.227013] Offlined Pages 32768 [ 190.830303] Offlined Pages 32768 [ 190.833071] Built 1 zonelists, mobility grouping on. Total pages: -36920272750453009 In another instance (older kernel), I was no longer able to start any process: [root@vm ~]# [ 214.348068] Offlined Pages 32768 [ 215.973009] Offlined Pages 32768 cat /proc/meminfo -bash: fork: Cannot allocate memory [root@vm ~]# cat /proc/meminfo -bash: fork: Cannot allocate memory Fix it by properly adjusting the managed page count when migrating if the zone changed. The managed page count of the zones now looks after unplug of the DIMM (and after deflating the balloon) just like before inflating the balloon (and plugging+onlining the DIMM). We'll temporarily modify the totalram page count. If this ever becomes a problem, we can fine tune by providing helpers that don't touch the totalram pages (e.g., adjust_zone_managed_page_count()). Please note that fixing up the managed page count is only necessary when we adjusted the managed page count when inflating - only if we don't have VIRTIO_BALLOON_F_DEFLATE_ON_OOM. With that feature, the managed page count is not touched when inflating/deflating. Reported-by: Yumei Huang <yuhuang@redhat.com> Fixes: 3dcc0571cd64 ("mm: correctly update zone->managed_pages") Cc: <stable@vger.kernel.org> # v3.11+ Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Igor Mammedov <imammedo@redhat.com> Cc: virtualization@lists.linux-foundation.org Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-12-11	USB: Fix incorrect DMA allocations for local memory pool drivers	Fredrik Noring	2	-22/+23
	Fix commit 7b81cb6bddd2 ("usb: add a HCD_DMA flag instead of guestimating DMA capabilities") where local memory USB drivers erroneously allocate DMA memory instead of pool memory, causing OHCI Unrecoverable Error, disabled HC died; cleaning up The order between hcd_uses_dma() and hcd->localmem_pool is now arranged as in hcd_buffer_alloc() and hcd_buffer_free(), with the test for hcd->localmem_pool placed first. As an alternative, one might consider adjusting hcd_uses_dma() with static inline bool hcd_uses_dma(struct usb_hcd *hcd) { - return IS_ENABLED(CONFIG_HAS_DMA) && (hcd->driver->flags & HCD_DMA); + return IS_ENABLED(CONFIG_HAS_DMA) && + (hcd->driver->flags & HCD_DMA) && + (hcd->localmem_pool == NULL); } One can also consider unsetting HCD_DMA for local memory pool drivers. Fixes: 7b81cb6bddd2 ("usb: add a HCD_DMA flag instead of guestimating DMA capabilities") Cc: stable <stable@vger.kernel.org> Signed-off-by: Fredrik Noring <noring@nocrew.org> Link: https://lore.kernel.org/r/20191210172905.GA52526@sx9 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-11	i2c: add helper to check if a client has a driver attached	Wolfram Sang	1	-0/+5
	As a preparation for an API conversion, factor out something frequently used in the media subsystem. As an improvement, it bails out on both, NULL and ERRPTR to handle the old and new API. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2019-12-11	ALSA: hda/realtek - Line-out jack doesn't work on a Dell AIO	Hui Wang	1	-5/+3
	After applying the fixup ALC274_FIXUP_DELL_AIO_LINEOUT_VERB, the Line-out jack works well. And instead of adding a new set of pin definition in the pin_fixup_tbl, we put a more generic matching entry in the fallback_pin_fixup_tbl. Cc: <stable@vger.kernel.org> Signed-off-by: Hui Wang <hui.wang@canonical.com> Link: https://lore.kernel.org/r/20191211051321.5883-1-hui.wang@canonical.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-12-10	io_uring: add sockets to list of files that support non-blocking issue	Jens Axboe	1	-2/+4
	In chasing a performance issue between using IORING_OP_RECVMSG and IORING_OP_READV on sockets, tracing showed that we always punt the socket reads to async offload. This is due to io_file_supports_async() not checking for S_ISSOCK on the inode. Since sockets supports the O_NONBLOCK (or MSG_DONTWAIT) flag just fine, add sockets to the list of file types that we can do a non-blocking issue to. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10	net: make socket read/write_iter() honor IOCB_NOWAIT	Jens Axboe	1	-2/+2
	The socket read/write helpers only look at the file O_NONBLOCK. not the iocb IOCB_NOWAIT flag. This breaks users like preadv2/pwritev2 and io_uring that rely on not having the file itself marked nonblocking, but rather the iocb itself. Cc: netdev@vger.kernel.org Acked-by: David Miller <davem@davemloft.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10	io_uring: only hash regular files for async work execution	Jens Axboe	1	-1/+3
	We hash regular files to avoid having multiple threads hammer on the inode mutex, but it should not be needed on other types of files (like sockets). Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10	io_uring: run next sqe inline if possible	Jens Axboe	1	-4/+11
	One major use case of linked commands is the ability to run the next link inline, if at all possible. This is done correctly for async offload, but somewhere along the line we lost the ability to do so when we were able to complete a request without having to punt it. Ensure that we do so correctly. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10	io_uring: don't dynamically allocate poll data	Jens Axboe	1	-16/+11
	This essentially reverts commit e944475e6984. For high poll ops workloads, like TAO, the dynamic allocation of the wait_queue entry for IORING_OP_POLL_ADD adds considerable extra overhead. Go back to embedding the wait_queue_entry, but keep the usage of wait->private for the pointer stashing. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10	io_uring: deferred send/recvmsg should assign iov	Jens Axboe	1	-2/+2
	Don't just assign it from the main call path, that can miss the case when we're called from issue deferral. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10	io_uring: sqthread should grab ctx->uring_lock for submissions	Jens Axboe	1	-5/+2
	We use the mutex to guard against registered file updates, for instance. Ensure we're safe in accessing that state against concurrent updates. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10	io-wq: briefly spin for new work after finishing work	Jens Axboe	2	-5/+26
	To avoid going to sleep only to get woken shortly thereafter, spin briefly for new work upon completion of work. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10	io-wq: remove worker->wait waitqueue	Jens Axboe	1	-8/+2
	We only have one cases of using the waitqueue to wake the worker, the rest are using wake_up_process(). Since we can save some cycles not fiddling with the waitqueue io_wqe_worker(), switch the work activation to task wakeup and get rid of the now unused wait_queue_head_t in struct io_worker. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10	io_uring: allow unbreakable links	Jens Axboe	2	-38/+47
	Some commands will invariably end in a failure in the sense that the completion result will be less than zero. One such example is timeouts that don't have a completion count set, they will always complete with -ETIME unless cancelled. For linked commands, we sever links and fail the rest of the chain if the result is less than zero. Since we have commands where we know that will happen, add IOSQE_IO_HARDLINK as a stronger link that doesn't sever regardless of the completion result. Note that the link will still sever if we fail submitting the parent request, hard links are only resilient in the presence of completion results for requests that did submit correctly. Cc: stable@vger.kernel.org # v5.4 Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Reported-by: 李通洲 <carter.li@eoitek.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10	cpuidle: Fix cpuidle_driver_state_disabled()	Rafael J. Wysocki	1	-0/+10
	It turns out that cpuidle_driver_state_disabled() can be called before registering the cpufreq driver on some platforms, which was not expected when it was introduced and which leads to a NULL pointer dereference when trying to walk the CPUs associated with the given cpuidle driver. Fix the problem by making cpuidle_driver_state_disabled() check if the driver's mask of CPUs associated with it is present and to set CPUIDLE_FLAG_UNUSABLE for the given idle state in the driver's states list if that is not the case to cause __cpuidle_register_device() to set CPUIDLE_STATE_DISABLED_BY_DRIVER for that state for all cpuidle devices registered by it later. Fixes: cbda56d5fefc ("cpuidle: Introduce cpuidle_driver_state_disabled() for driver quirks") Reported-by: Daniel Lezcano <daniel.lezcano@linaro.org> Tested-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-12-10	drm/amd/display: include linux/slab.h where needed	Arnd Bergmann	1	-0/+2
	Calling kzalloc() and related functions requires the linux/slab.h header to be included: drivers/gpu/drm/amd/amdgpu/../display/dc/dcn21/dcn21_resource.c: In function 'dcn21_ipp_create': drivers/gpu/drm/amd/amdgpu/../display/dc/dcn21/dcn21_resource.c:679:3: error: implicit declaration of function 'kzalloc'; did you mean 'd_alloc'? [-Werror=implicit-function-declaration] kzalloc(sizeof(struct dcn10_ipp), GFP_KERNEL); A lot of other headers also miss a direct include in this file, but this is the only one that causes a problem for now. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-10	i2c: fix header file kernel-doc warning	Randy Dunlap	1	-0/+1
	Fix kernel-doc warning in <linux/i2c.h>. ../include/linux/i2c.h:337: warning: Function parameter or member 'init_irq' not described in 'i2c_client' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2019-12-10	i2c: remove i2c_new_dummy() API	Wolfram Sang	2	-29/+0
	All in-kernel users have been converted to {devm_}i2c_new_dummy_device(). Remove the old API. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Tested-by: Luca Ceresoli <luca@lucaceresoli.net> Reviewed-by: Luca Ceresoli <luca@lucaceresoli.net> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2019-12-10	drm/amd/display: fix undefined struct member reference	Arnd Bergmann	1	-0/+2
	An initialization was added for two optional struct members. One of these is always present in the dcn20_resource file, but the other one depends on CONFIG_DRM_AMD_DC_DSC_SUPPORT and causes a build failure if that is missing: drivers/gpu/drm/amd/amdgpu/../display/dc/dcn20/dcn20_resource.c:926:14: error: excess elements in struct initializer [-Werror] .num_dsc = 5, Add another #ifdef around the assignment. Fixes: c3d03c5a196f ("drm/amd/display: Include num_vmid and num_dsc within NV14's resource caps") Reviewed-by: Zhan Liu <zhan.liu@amd.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-10	sh: kgdb: Mark expected switch fall-throughs	Kuninori Morimoto	1	-0/+1
	Mark switch cases where we are expecting to fall through. This patch fixes the following error: LINUX/arch/sh/kernel/kgdb.c: In function 'kgdb_arch_handle_exception': LINUX/arch/sh/kernel/kgdb.c:267:6: error: this statement may fall through [-Werror=implicit-fallthrough=] if (kgdb_hex2long(&ptr, &addr)) ^ LINUX/arch/sh/kernel/kgdb.c:269:2: note: here case 'D': ^~~~ Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Acked-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
2019-12-10	ftrace: Fix function_graph tracer interaction with BPF trampoline	Alexei Starovoitov	4	-26/+21
	Depending on type of BPF programs served by BPF trampoline it can call original function. In such case the trampoline will skip one stack frame while returning. That will confuse function_graph tracer and will cause crashes with bad RIP. Teach graph tracer to skip functions that have BPF trampoline attached. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-12-10	tracing: remove set but not used variable 'buffer'	YueHaibing	1	-2/+0
	kernel/trace/trace_events_inject.c: In function trace_inject_entry: kernel/trace/trace_events_inject.c:20:22: warning: variable buffer set but not used [-Wunused-but-set-variable] It is never used, so remove it. Link: http://lkml.kernel.org/r/20191207034409.25668-1-yuehaibing@huawei.com Reported-by: Hulk Robot <hulkci@huawei.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-12-10	module: Remove accidental change of module_enable_x()	Steven Rostedt (VMware)	1	-5/+1
	When pulling in Divya Indi's patch, I made a minor fix to remove unneeded braces. I commited my fix up via "git commit -a --amend". Unfortunately, I didn't realize I had some changes I was testing in the module code, and those changes were applied to Divya's patch as well. This reverts the accidental updates to the module code. Cc: Jessica Yu <jeyu@kernel.org> Cc: Divya Indi <divya.indi@oracle.com> Reported-by: Peter Zijlstra <peterz@infradead.org> Fixes: e585e6469d6f ("tracing: Verify if trace array exists before destroying it.") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-12-10	ALSA: hda/hdmi - Fix duplicate unref of pci_dev	Lukas Wunner	1	-1/+0
	Nicholas Johnson reports a null pointer deref as well as a refcount underflow upon hot-removal of a Thunderbolt-attached AMD eGPU. He's bisected the issue down to commit 586bc4aab878 ("ALSA: hda/hdmi - fix vgaswitcheroo detection for AMD"). The commit iterates over PCI devices using pci_get_class() and unreferences each device found, even though pci_get_class() subsequently unreferences the device as well. Fix it. Fixes: 586bc4aab878 ("ALSA: hda/hdmi - fix vgaswitcheroo detection for AMD") Link: https://lore.kernel.org/r/PSXP216MB0438BFEAA0617283A834E11580580@PSXP216MB0438.KORP216.PROD.OUTLOOK.COM/ Reported-and-tested-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au> Signed-off-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Alexander Deucher <alexander.deucher@amd.com> Cc: Bjorn Helgaas <helgaas@kernel.org> Link: https://lore.kernel.org/r/77aa6c01aefe1ebc4004e87b0bc714f2759f15c4.1575985006.git.lukas@wunner.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-12-10	docs: dm-integrity: remove reference to ARC4	Eric Biggers	1	-1/+1
	ARC4 is no longer considered secure, so it shouldn't be used, even as just an example. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>