aboutsummaryrefslogtreecommitdiffstats
path: root/arch/um (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-12-17Merge tag 'for-linus-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/umlLinus Torvalds35-588/+897
Pull UML updates from Richard Weinberger: - IRQ handling cleanups - Support for suspend - Various fixes for UML specific drivers: ubd, vector, xterm * tag 'for-linus-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: (32 commits) um: Fix build w/o CONFIG_PM_SLEEP um: time-travel: Correct time event IRQ delivery um: irq/sigio: Support suspend/resume handling of workaround IRQs um: time-travel: Actually apply "free-until" optimisation um: chan_xterm: Fix fd leak um: tty: Fix handling of close in tty lines um: Monitor error events in IRQ controller um: allocate a guard page to helper threads um: support some of ARCH_HAS_SET_MEMORY um: time-travel: avoid multiple identical propagations um: Fetch registers only for signals which need them um: Support suspend to RAM um: Allow PM with suspend-to-idle um: time: Fix read_persistent_clock64() in time-travel um: Simplify os_idle_sleep() and sleep longer um: Simplify IRQ handling code um: Remove IRQ_NONE type um: irq: Reduce irq_reg allocation um: irq: Clean up and rename struct irq_fd um: Clean up alarm IRQ chip name ...
2020-12-16Merge tag 'tif-task_work.arch-2020-12-14' of git://git.kernel.dk/linux-blockLinus Torvalds2-1/+4
Pull TIF_NOTIFY_SIGNAL updates from Jens Axboe: "This sits on top of of the core entry/exit and x86 entry branch from the tip tree, which contains the generic and x86 parts of this work. Here we convert the rest of the archs to support TIF_NOTIFY_SIGNAL. With that done, we can get rid of JOBCTL_TASK_WORK from task_work and signal.c, and also remove a deadlock work-around in io_uring around knowing that signal based task_work waking is invoked with the sighand wait queue head lock. The motivation for this work is to decouple signal notify based task_work, of which io_uring is a heavy user of, from sighand. The sighand lock becomes a huge contention point, particularly for threaded workloads where it's shared between threads. Even outside of threaded applications it's slower than it needs to be. Roman Gershman <romger@amazon.com> reported that his networked workload dropped from 1.6M QPS at 80% CPU to 1.0M QPS at 100% CPU after io_uring was changed to use TIF_NOTIFY_SIGNAL. The time was all spent hammering on the sighand lock, showing 57% of the CPU time there [1]. There are further cleanups possible on top of this. One example is TIF_PATCH_PENDING, where a patch already exists to use TIF_NOTIFY_SIGNAL instead. Hopefully this will also lead to more consolidation, but the work stands on its own as well" [1] https://github.com/axboe/liburing/issues/215 * tag 'tif-task_work.arch-2020-12-14' of git://git.kernel.dk/linux-block: (28 commits) io_uring: remove 'twa_signal_ok' deadlock work-around kernel: remove checking for TIF_NOTIFY_SIGNAL signal: kill JOBCTL_TASK_WORK io_uring: JOBCTL_TASK_WORK is no longer used by task_work task_work: remove legacy TWA_SIGNAL path sparc: add support for TIF_NOTIFY_SIGNAL riscv: add support for TIF_NOTIFY_SIGNAL nds32: add support for TIF_NOTIFY_SIGNAL ia64: add support for TIF_NOTIFY_SIGNAL h8300: add support for TIF_NOTIFY_SIGNAL c6x: add support for TIF_NOTIFY_SIGNAL alpha: add support for TIF_NOTIFY_SIGNAL xtensa: add support for TIF_NOTIFY_SIGNAL arm: add support for TIF_NOTIFY_SIGNAL microblaze: add support for TIF_NOTIFY_SIGNAL hexagon: add support for TIF_NOTIFY_SIGNAL csky: add support for TIF_NOTIFY_SIGNAL openrisc: add support for TIF_NOTIFY_SIGNAL sh: add support for TIF_NOTIFY_SIGNAL um: add support for TIF_NOTIFY_SIGNAL ...
2020-12-16Merge tag 'asm-generic-timers-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-genericLinus Torvalds1-1/+0
Pull asm-generic cross-architecture timer cleanup from Arnd Bergmann: "This cleans up two ancient timer features that were never completed in the past, CONFIG_GENERIC_CLOCKEVENTS and CONFIG_ARCH_USES_GETTIMEOFFSET. There was only one user left for the ARCH_USES_GETTIMEOFFSET variant of clocksource implementations, the ARM EBSA110 platform. Rather than changing to use modern timekeeping, we remove the platform entirely as Russell no longer uses his machine and nobody else seems to have one any more. The conditional code for using arch_gettimeoffset() is removed as a result. For CONFIG_GENERIC_CLOCKEVENTS, there are still a couple of platforms not using clockevent drivers: parisc, ia64, most of m68k, and one Arm platform. These all do timer ticks slighly differently, and this gets cleaned up to the point they at least all call the same helper function. Instead of most platforms using 'select GENERIC_CLOCKEVENTS' in Kconfig, the polarity is now reversed, with the few remaining ones selecting LEGACY_TIMER_TICK instead" * tag 'asm-generic-timers-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: timekeeping: default GENERIC_CLOCKEVENTS to enabled timekeeping: remove xtime_update m68k: remove timer_interrupt() function m68k: change remaining timers to legacy_timer_tick m68k: m68328: use legacy_timer_tick() m68k: sun3/sun3c: use legacy_timer_tick m68k: split heartbeat out of timer function m68k: coldfire: use legacy_timer_tick() parisc: use legacy_timer_tick ARM: rpc: use legacy_timer_tick ia64: convert to legacy_timer_tick timekeeping: add CONFIG_LEGACY_TIMER_TICK timekeeping: remove arch_gettimeoffset net: remove am79c961a driver ARM: remove ebsa110 platform
2020-12-15Merge tag 'asm-generic-mmu-context-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-genericLinus Torvalds1-7/+5
Pull asm-generic mmu-context cleanup from Arnd Bergmann: "This is a cleanup series from Nicholas Piggin, preparing for later changes. The asm/mmu_context.h header are generalized and common code moved to asm-gneneric/mmu_context.h. This saves a bit of code and makes it easier to change in the future" * tag 'asm-generic-mmu-context-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (25 commits) h8300: Fix generic mmu_context build m68k: mmu_context: Fix Sun-3 build xtensa: use asm-generic/mmu_context.h for no-op implementations x86: use asm-generic/mmu_context.h for no-op implementations um: use asm-generic/mmu_context.h for no-op implementations sparc: use asm-generic/mmu_context.h for no-op implementations sh: use asm-generic/mmu_context.h for no-op implementations s390: use asm-generic/mmu_context.h for no-op implementations riscv: use asm-generic/mmu_context.h for no-op implementations powerpc: use asm-generic/mmu_context.h for no-op implementations parisc: use asm-generic/mmu_context.h for no-op implementations openrisc: use asm-generic/mmu_context.h for no-op implementations nios2: use asm-generic/mmu_context.h for no-op implementations nds32: use asm-generic/mmu_context.h for no-op implementations mips: use asm-generic/mmu_context.h for no-op implementations microblaze: use asm-generic/mmu_context.h for no-op implementations m68k: use asm-generic/mmu_context.h for no-op implementations ia64: use asm-generic/mmu_context.h for no-op implementations hexagon: use asm-generic/mmu_context.h for no-op implementations csky: use asm-generic/mmu_context.h for no-op implementations ...
2020-12-15Merge tag 'irq-core-2020-12-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-16/+1
Pull irq updates from Thomas Gleixner: "Generic interrupt and irqchips subsystem updates. Unusually, there is not a single completely new irq chip driver, just new DT bindings and extensions of existing drivers to accomodate new variants! Core: - Consolidation and robustness changes for irq time accounting - Cleanup and consolidation of irq stats - Remove the fasteoi IPI flow which has been proved useless - Provide an interface for converting legacy interrupt mechanism into irqdomains Drivers: - Preliminary support for managed interrupts on platform devices - Correctly identify allocation of MSIs proxyied by another device - Generalise the Ocelot support to new SoCs - Improve GICv4.1 vcpu entry, matching the corresponding KVM optimisation - Work around spurious interrupts on Qualcomm PDC - Random fixes and cleanups" * tag 'irq-core-2020-12-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) irqchip/qcom-pdc: Fix phantom irq when changing between rising/falling driver core: platform: Add devm_platform_get_irqs_affinity() ACPI: Drop acpi_dev_irqresource_disabled() resource: Add irqresource_disabled() genirq/affinity: Add irq_update_affinity_desc() irqchip/gic-v3-its: Flag device allocation as proxied if behind a PCI bridge irqchip/gic-v3-its: Tag ITS device as shared if allocating for a proxy device platform-msi: Track shared domain allocation irqchip/ti-sci-intr: Fix freeing of irqs irqchip/ti-sci-inta: Fix printing of inta id on probe success drivers/irqchip: Remove EZChip NPS interrupt controller Revert "genirq: Add fasteoi IPI flow" irqchip/hip04: Make IPIs use handle_percpu_devid_irq() irqchip/bcm2836: Make IPIs use handle_percpu_devid_irq() irqchip/armada-370-xp: Make IPIs use handle_percpu_devid_irq() irqchip/gic, gic-v3: Make SGIs use handle_percpu_devid_irq() irqchip/ocelot: Add support for Jaguar2 platforms irqchip/ocelot: Add support for Serval platforms irqchip/ocelot: Add support for Luton platforms irqchip/ocelot: prepare to support more SoC ...
2020-12-15Merge tag 'irqchip-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/coreThomas Gleixner2-2/+8
Pull irqchip updates for 5.11 from Marc Zyngier: - Preliminary support for managed interrupts on platform devices - Correctly identify allocation of MSIs proxyied by another device - Remove the fasteoi IPI flow which has been proved useless - Generalise the Ocelot support to new SoCs - Improve GICv4.1 vcpu entry, matching the corresponding KVM optimisation - Work around spurious interrupts on Qualcomm PDC - Random fixes and cleanups Link: https://lore.kernel.org/r/20201212135626.1479884-1-maz@kernel.org
2020-12-14Merge tag 'core-mm-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-14/+0
Pull kmap updates from Thomas Gleixner: "The new preemtible kmap_local() implementation: - Consolidate all kmap_atomic() internals into a generic implementation which builds the base for the kmap_local() API and make the kmap_atomic() interface wrappers which handle the disabling/enabling of preemption and pagefaults. - Switch the storage from per-CPU to per task and provide scheduler support for clearing mapping when scheduling out and restoring them when scheduling back in. - Merge the migrate_disable/enable() code, which is also part of the scheduler pull request. This was required to make the kmap_local() interface available which does not disable preemption when a mapping is established. It has to disable migration instead to guarantee that the virtual address of the mapped slot is the same across preemption. - Provide better debug facilities: guard pages and enforced utilization of the mapping mechanics on 64bit systems when the architecture allows it. - Provide the new kmap_local() API which can now be used to cleanup the kmap_atomic() usage sites all over the place. Most of the usage sites do not require the implicit disabling of preemption and pagefaults so the penalty on 64bit and 32bit non-highmem systems is removed and quite some of the code can be simplified. A wholesale conversion is not possible because some usage depends on the implicit side effects and some need to be cleaned up because they work around these side effects. The migrate disable side effect is only effective on highmem systems and when enforced debugging is enabled. On 64bit and 32bit non-highmem systems the overhead is completely avoided" * tag 'core-mm-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) ARM: highmem: Fix cache_is_vivt() reference x86/crashdump/32: Simplify copy_oldmem_page() io-mapping: Provide iomap_local variant mm/highmem: Provide kmap_local* sched: highmem: Store local kmaps in task struct x86: Support kmap_local() forced debugging mm/highmem: Provide CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP mm/highmem: Provide and use CONFIG_DEBUG_KMAP_LOCAL microblaze/mm/highmem: Add dropped #ifdef back xtensa/mm/highmem: Make generic kmap_atomic() work correctly mm/highmem: Take kmap_high_get() properly into account highmem: High implementation details and document API Documentation/io-mapping: Remove outdated blurb io-mapping: Cleanup atomic iomap mm/highmem: Remove the old kmap_atomic cruft highmem: Get rid of kmap_types.h xtensa/mm/highmem: Switch to generic kmap atomic sparc/mm/highmem: Switch to generic kmap atomic powerpc/mm/highmem: Switch to generic kmap atomic nds32/mm/highmem: Switch to generic kmap atomic ...
2020-12-14um: Fix build w/o CONFIG_PM_SLEEPJohannes Berg1-1/+1
uml_pm_wake() is unconditionally called from the SIGUSR1 wakeup handler since that's in the userspace portion of UML, and thus a bit tricky to ifdef out. Since pm_system_wakeup() can always be called (but may be an empty inline), also simply always have uml_pm_wake() to fix the build. Reported-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: time-travel: Correct time event IRQ deliveryJohannes Berg4-0/+47
Lockdep (on 5.10-rc) points out that we're delivering IRQs while IRQs are not even enabled, which clearly shouldn't happen. Defer the time event IRQ delivery until they actually are enabled. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: irq/sigio: Support suspend/resume handling of workaround IRQsJohannes Berg3-16/+56
If the sigio workaround needed to be applied to a file descriptor, set_irq_wake() wouldn't work for it since it would get polled by the thread instead of causing SIGIO, and thus could never really cause a wakeup, since the thread notification FD wasn't marked as being able to wake up the system. Fix this by marking the thread's notification FD explicitly as a wake source FD, i.e. not suppressing SIGIO for it in suspend. In order to not cause spurious wakeups, we then need to remove all FDs that shouldn't wake up the system from the polling thread. In order to do this, add unlocked versions of ignore_sigio_fd() and add_sigio_fd() (nothing else is happening in suspend, so this is fine), and also modify ignore_sigio_fd() to return -ENOENT if the FD wasn't originally in there. This doesn't matter because nothing else currently checks the return value, but the irq code needs to know which ones to restore the workaround for. All told, this lets us use a timerfd for the RTC clock in the next patch, which doesn't send SIGIO. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: time-travel: Actually apply "free-until" optimisationJohannes Berg1-1/+12
Due a bug - we never checked the time_travel_ext_free_until value - we were always requesting time for every single scheduling. This adds up since we make reading time cost 256ns, and it's a fairly common call. Fix this. While at it, also make reading time only cost something when we're not currently waiting for our scheduling turn - otherwise things get mixed up in a very confusing way. We should never get here, since we're not actually running, but it's possible if you stick printk() or such into the virtio code that must handle the external interrupts. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: chan_xterm: Fix fd leakAnton Ivanov1-0/+5
xterm serial channel was leaking a fd used in setting up the port helper This bug is prehistoric - it predates switching to git. The "fixes" header here is really just to mark all the versions we would like this to apply to which is "Anything from the Cretaceous period onwards". No dinosaurs were harmed in fixing this bug. Fixes: b40997b872cd ("um: drivers/xterm.c: fix a file descriptor leak") Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: tty: Fix handling of close in tty linesAnton Ivanov1-2/+2
Fix a logical error in tty reading. We get 0 and errno == EAGAIN on the first attempt to read from a closed file descriptor. Compared to that a true EAGAIN is EAGAIN and -1. If we check errno for EAGAIN first, before checking the return value we miss the fact that the descriptor is closed. This bug is as old as the driver. It was not showing up with the original POLL based IRQ controller, because it was producing multiple events. Switching to EPOLL unmasked it. Fixes: ff6a17989c08 ("Epoll based IRQ controller") Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: Monitor error events in IRQ controllerAnton Ivanov1-1/+1
Ensure that file closes, connection closes, etc are propagated as interrupts in the interrupt controller. Fixes: ff6a17989c08 ("Epoll based IRQ controller") Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: allocate a guard page to helper threadsJohannes Berg4-8/+11
We've been running into stack overflows in helper threads corrupting memory (e.g. because somebody put printf() or os_info() there), so to avoid those causing hard-to-debug issues later on, allocate a guard page for helper thread stacks and mark it read-only. Unfortunately, the crash dump at that point is useless as the stack tracer will try to backtrace the *kernel* thread, not the helper thread, but at least we don't survive to a random issue caused by corruption. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: support some of ARCH_HAS_SET_MEMORYJohannes Berg4-0/+59
For now, only support set_memory_ro()/rw() which we need for the stack protection in the next patch. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: time-travel: avoid multiple identical propagationsJohannes Berg1-0/+6
If there is some kind of interrupt negotation or such then it may happen that we send an update message multiple times, avoid that in the interest of efficiency by storing the last transmitted value and only sending a new update if it's not the same as the last update. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: Fetch registers only for signals which need themAnton Ivanov1-1/+14
UML userspace fetches siginfo and passes it to signal handlers in UML. This is needed only for some of the signals, because key handlers like SIGIO make no use of this variable. Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: Support suspend to RAMJohannes Berg6-2/+139
With all the previous bits in place, we can now also support suspend to RAM, in the sense that everything is suspended, not just most, including userspace, processes like in s2idle. Since um_idle_sleep() now waits forever, we can simply call that to "suspend" the system. As before, you can wake it up using SIGUSR1 since we're just in a pause() call that only needs to return. In order to implement selective resume from certain devices, and not have any arbitrary device interrupt wake up, suspend interrupts by removing SIGIO notification (O_ASYNC) from all the FDs that are not supposed to wake up the system. However, swap out the handler so we don't actually handle the SIGIO as an interrupt. Since we're in pause(), the mere act of receiving SIGIO wakes us up, and then after things have been restored enough, re-set O_ASYNC for all previously suspended FDs, reinstall the proper SIGIO handler, and send SIGIO to self to process anything that might now be pending. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: Allow PM with suspend-to-idleJohannes Berg5-1/+46
In order to be able to experiment with suspend in UML, add the minimal work to be able to suspend (s2idle) an instance of UML, and be able to wake it back up from that state with the USR1 signal sent to the main UML process. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: time: Fix read_persistent_clock64() in time-travelJohannes Berg1-3/+20
In time-travel mode, we've relied on read_persistent_clock64() being called only once at system startup, but this is both the right thing to call from the pseudo-RTC, and also gets called by the timekeeping core during suspend/resume. Thus, fix this to always fall make use of the time_travel_time in any time-travel mode, initializing time_travel_start at boot to the right value depending on the time-travel mode. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: Simplify os_idle_sleep() and sleep longerJohannes Berg5-25/+21
There really is no reason to pass the amount of time we should sleep, especially since it's just hard-coded to one second. Additionally, one second isn't really all that long, and as we are expecting to be woken up by a signal, we can sleep longer and avoid doing some work every second, so replace the current clock_nanosleep() with just an empty select() that can _only_ be woken up by a signal. We can also remove the deliver_alarm() since we don't need to do that when we got e.g. SIGIO that woke us up, and if we got SIGALRM the signal handler will actually (have) run, so it's just unnecessary extra work. Similarly, in time-travel mode, just program the wakeup event from idle to be S64_MAX, which is basically the most you could ever simulate to. Of course, you should already have an event in the list that's earlier and will cause a wakeup, normally that's the regular timer interrupt, though in suspend it may (later) also be an RTC event. Since actually getting to this point would be a bug and you can't ever get out again, panic() on it in the time control code. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: Simplify IRQ handling codeJohannes Berg1-242/+128
Reduce dynamic allocations (and thereby cache misses) by simply embedding the registration data for IRQs in the irq_entry, we never supported these being really dynamic anyway as only one was ever allowed ("Trying to reregister ..."). Lockless behaviour is preserved by removing the FD from the poll set appropriately, but we use reg->events to indicate whether or not this entry is used, rather than dynamically allocating them. Also port the list of IRQ entries to list_head instead of the current open-coded singly-linked list implementation, just for sanity. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: Remove IRQ_NONE typeJohannes Berg8-45/+26
We don't actually use this in um_request_irq(), so it can never be assigned. It's also not clear what that would be useful for, so just remove it. This results in quite a number of cleanups, all the way to removing the "SIGIO on close" startup check, since the data it assigns (pty_close_sigio) is not used anymore. While at it, also make this an enum so we get a minimum of type checking, and remove the IRQ_NONE hack in virtio since we now no longer have the name twice. Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: irq: Reduce irq_reg allocationJohannes Berg2-7/+7
We don't need an array of 4 entries to capture three and the name 'MAX_IRQ_TYPE' really gets confusing as well. Remove it and add a correct NUM_IRQ_TYPES, and use that correctly. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: irq: Clean up and rename struct irq_fdJohannes Berg2-26/+22
This really shouldn't be called "irq_fd" since it doesn't carry an fd. Well, it used to, apparently, but that struct member is unused. Rename it to "irq_reg" since it more accurately reflects a registered interrupt, and remove the unused 'next' and 'fd' members from the struct as well. While at it, also move it to the implementation, it's not used anywhere else, and the header file is shared with the userspace components. Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: Clean up alarm IRQ chip nameJohannes Berg1-5/+4
We don't use "SIGVTALRM", it's just SIGALRM. Clean up the naming. While at it, fix the comment's grammar. Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: virtio: Use dynamic IRQ allocationJohannes Berg2-11/+16
This separates the devices, which is better for debug and for later suspend/resume and wakeup support, since there we'll have to separate which IRQs can wake up the system and which cannot. Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: Support dynamic IRQ allocationJohannes Berg13-32/+65
It's cumbersome and error-prone to keep adding fixed IRQ numbers, and for proper device wakeup support for the virtio/vhost-user support we need to have different IRQs for each device. Even if in theory two IRQs (with and without wake) might be sufficient, it's much easier to reason about it when we have dynamic number assignment. It also makes it easier to add new devices that may dynamically exist or depending on the configuration, etc. Add support for this, up to 64 IRQs (the same limit as epoll FDs we have right now). Since it's not easy to port all the existing places to dynamic allocation (some data is statically initialized) keep the low numbers are reserved for the existing hard-coded IRQ numbers. Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: sigio: Return error from add_sigio_fd()Johannes Berg1-2/+4
If we run out of space, return an error instead of 0. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: ubd: Set device serial attribute from cmdlineChristopher Obbard1-16/+62
Adds the ability to set the UBD device serial number from the commandline, disabling the serial number functionality by default. In some cases it may be useful to set a serial to the UBD device, such that downstream users (i.e. udev) can use this information to better describe the hardware to the user from the UML cmdline. In our case we use this parameter to create some entries under /dev/disk/by-ubd-id/ for each of the UBD devices passed through the UML cmdline. Signed-off-by: Christopher Obbard <chris.obbard@collabora.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: Increase stack frame size threshold for signal.cAndy Shevchenko1-0/+2
The signal.c can't use heap for bit data located on stack. However, by default a compiler warns us about overstepping stack frame size threshold: arch/um/os-Linux/signal.c: In function ‘sig_handler_common’: arch/um/os-Linux/signal.c:51:1: warning: the frame size of 2960 bytes is larger than 2048 bytes [-Wframe-larger-than=] 51 | } | ^ arch/um/os-Linux/signal.c: In function ‘timer_real_alarm_handler’: arch/um/os-Linux/signal.c:95:1: warning: the frame size of 2960 bytes is larger than 2048 bytes [-Wframe-larger-than=] 95 | } | ^ Due to above increase stack frame size threshold explicitly for signal.c to avoid unnecessary warning. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Tested-by: David Gow <davidgow@google.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: line: Don't free winch (with IRQ) under spinlockJohannes Berg1-4/+8
Lockdep correctly complains that one shouldn't call um_free_irq() with free_irq() inside under a spinlock since that will attempt to acquire a mutex. Rearrange the code to keep the list manipulations under the lock while moving the actual freeing outside of it, to avoid this. In particular, this removes the lockdep complaint at shutdown that I was seeing with lockdep enabled. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-By: anton.ivanov@cambridgegreys.com Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: ubd: Submit all data segments atomicallyGabriel Krisman Bertazi1-76/+115
Internally, UBD treats each physical IO segment as a separate command to be submitted in the execution pipe. If the pipe returns a transient error after a few segments have already been written, UBD will tell the block layer to requeue the request, but there is no way to reclaim the segments already submitted. When a new attempt to dispatch the request is done, those segments already submitted will get duplicated, causing the WARN_ON below in the best case, and potentially data corruption. In my system, running a UML instance with 2GB of RAM and a 50M UBD disk, I can reproduce the WARN_ON by simply running mkfs.fvat against the disk on a freshly booted system. There are a few ways to around this, like reducing the pressure on the pipe by reducing the queue depth, which almost eliminates the occurrence of the problem, increasing the pipe buffer size on the host system, or by limiting the request to one physical segment, which causes the block layer to submit way more requests to resolve a single operation. Instead, this patch modifies the format of a UBD command, such that all segments are sent through a single element in the communication pipe, turning the command submission atomic from the point of view of the block layer. The new format has a variable size, depending on the number of elements, and looks like this: +------------+-----------+-----------+------------ | cmd_header | segment 0 | segment 1 | segment ... +------------+-----------+-----------+------------ With this format, we push a pointer to cmd_header in the submission pipe. This has the advantage of reducing the memory footprint of executing a single request, since it allow us to merge some fields in the header. It is possible to reduce even further each segment memory footprint, by merging bitmap_words and cow_offset, for instance, but this is not the focus of this patch and is left as future work. One issue with the patch is that for a big number of segments, we now perform one big memory allocation instead of multiple small ones, but I wasn't able to trigger any real issues or -ENOMEM because of this change, that wouldn't be reproduced otherwise. This was tested using fio with the verify-crc32 option, and by running an ext4 filesystem over this UBD device. The original WARN_ON was: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 0 at lib/refcount.c:28 refcount_warn_saturate+0x13f/0x141 refcount_t: underflow; use-after-free. Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 5.5.0-rc6-00002-g2a5bb2cf75c8 #346 Stack: 6084eed0 6063dc77 00000009 6084ef60 00000000 604b8d9f 6084eee0 6063dcbc 6084ef40 6006ab8d e013d780 1c00000000 Call Trace: [<600a0c1c>] ? printk+0x0/0x94 [<6004a888>] show_stack+0x13b/0x155 [<6063dc77>] ? dump_stack_print_info+0xdf/0xe8 [<604b8d9f>] ? refcount_warn_saturate+0x13f/0x141 [<6063dcbc>] dump_stack+0x2a/0x2c [<6006ab8d>] __warn+0x107/0x134 [<6008da6c>] ? wake_up_process+0x17/0x19 [<60487628>] ? blk_queue_max_discard_sectors+0x0/0xd [<6006b05f>] warn_slowpath_fmt+0xd1/0xdf [<6006af8e>] ? warn_slowpath_fmt+0x0/0xdf [<600acc14>] ? raw_read_seqcount_begin.constprop.0+0x0/0x15 [<600619ae>] ? os_nsecs+0x1d/0x2b [<604b8d9f>] refcount_warn_saturate+0x13f/0x141 [<6048bc8f>] refcount_sub_and_test.constprop.0+0x2f/0x37 [<6048c8de>] blk_mq_free_request+0xf1/0x10d [<6048ca06>] __blk_mq_end_request+0x10c/0x114 [<6005ac0f>] ubd_intr+0xb5/0x169 [<600a1a37>] __handle_irq_event_percpu+0x6b/0x17e [<600a1b70>] handle_irq_event_percpu+0x26/0x69 [<600a1bd9>] handle_irq_event+0x26/0x34 [<600a1bb3>] ? handle_irq_event+0x0/0x34 [<600a5186>] ? unmask_irq+0x0/0x37 [<600a57e6>] handle_edge_irq+0xbc/0xd6 [<600a131a>] generic_handle_irq+0x21/0x29 [<60048f6e>] do_IRQ+0x39/0x54 [...] ---[ end trace c6e7444e55386c0f ]--- Cc: Christopher Obbard <chris.obbard@collabora.com> Reported-by: Martyn Welch <martyn@collabora.com> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Tested-by: Christopher Obbard <chris.obbard@collabora.com> Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: Fix time-travel modeJohannes Berg1-5/+0
Since the time-travel rework, basic time-travel mode hasn't worked properly, but there's no longer a need for this WARN_ON() so just remove it and thereby fix things. Cc: stable@vger.kernel.org Fixes: 4b786e24ca80 ("um: time-travel: Rewrite as an event scheduler") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: Remove use of asprinf in umid.cAnton Ivanov1-12/+5
asprintf is not compatible with the existing uml memory allocation mechanism. Its use on the "user" side of UML results in a corrupt slab state. Fixes: 0d4e5ac7e780 ("um: remove uses of variable length arrays") Cc: stable@vger.kernel.org Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: Add support for TIF_NOTIFY_SIGNALJens Axboe2-1/+4
Wire up TIF_NOTIFY_SIGNAL handling for um. Cc: linux-um@lists.infradead.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: random: Register random as hwrng-core deviceChristopher Obbard1-76/+25
The UML random driver creates a dummy device under the guest, /dev/hw_random. When this file is read from the guest, the driver reads from the host machine's /dev/random, in-turn reading from the host kernel's entropy pool. This entropy pool could have been filled by a hardware random number generator or just the host kernel's internal software entropy generator. Currently the driver does not fill the guests kernel entropy pool, this requires a userspace tool running inside the guest (like rng-tools) to read from the dummy device provided by this driver, which then would fill the guest's internal entropy pool. This all seems quite pointless when we are already reading from an entropy pool, so this patch aims to register the device as a hwrng device using the hwrng-core framework. This not only improves and cleans up the driver, but also fills the guest's entropy pool without having to resort to using extra userspace tools in the guest. This is typically a nuisance when booting a guest: the random pool takes a long time (~200s) to build up enough entropy since the dummy hwrng is not used to fill the guest's pool. This port was originally attempted by Alexander Neville "dark" (in CC, discussion in Link), but the conversation there stalled since the handling of -EAGAIN errors were no removed and longer handled by the driver. This patch attempts to use the existing method of error handling but utilises the new hwrng core. The issue can be noticed when booting a UML guest: [ 2.560000] random: fast init done [ 214.000000] random: crng init done With the patch applied, filling the pool becomes a lot quicker: [ 2.560000] random: fast init done [ 12.000000] random: crng init done Cc: Alexander Neville <dark@volatile.bz> Link: https://lore.kernel.org/lkml/20190828204609.02a7ff70@TheDarkness/ Link: https://lore.kernel.org/lkml/20190829135001.6a5ff940@TheDarkness.local/ Cc: Sjoerd Simons <sjoerd.simons@collabora.co.uk> Signed-off-by: Christopher Obbard <chris.obbard@collabora.com> Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-12-13um: Convert tasklets to use new tasklet_setup() APIAllen Pais1-3/+3
In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Signed-off-by: Romain Perier <romain.perier@gmail.com> Signed-off-by: Allen Pais <allen.lkml@gmail.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-11-29Merge tag 'locking-urgent-2020-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-1/+1
Pull locking fixes from Thomas Gleixner: "Two more places which invoke tracing from RCU disabled regions in the idle path. Similar to the entry path the low level idle functions have to be non-instrumentable" * tag 'locking-urgent-2020-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: intel_idle: Fix intel_idle() vs tracing sched/idle: Fix arch_cpu_idle() vs tracing
2020-11-24sched/idle: Fix arch_cpu_idle() vs tracingPeter Zijlstra1-1/+1
We call arch_cpu_idle() with RCU disabled, but then use local_irq_{en,dis}able(), which invokes tracing, which relies on RCU. Switch all arch_cpu_idle() implementations to use raw_local_irq_{en,dis}able() and carefully manage the lockdep,rcu,tracing state like we do in entry. (XXX: we really should change arch_cpu_idle() to not return with interrupts enabled) Reported-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lkml.kernel.org/r/20201120114925.594122626@infradead.org
2020-11-23um/irqstat: Get rid of the duplicated declarationsThomas Gleixner1-16/+1
irq_cpustat_t and ack_bad_irq() are exactly the same as the asm-generic ones. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20201113141733.156361337@linutronix.de
2020-11-10um: Call pgtable_pmd_page_dtor() in __pmd_free_tlb()Richard Weinberger1-1/+7
Commit b2b29d6d0119 ("mm: account PMD tables like PTE tables") uncovered a bug in uml, we forgot to call the destructor. While we are here, give x a sane name. Reported-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Co-developed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Richard Weinberger <richard@nod.at> Tested-by: Christopher Obbard <chris.obbard@collabora.com>
2020-11-09um: add support for TIF_NOTIFY_SIGNALJens Axboe2-1/+4
Wire up TIF_NOTIFY_SIGNAL handling for um. Cc: linux-um@lists.infradead.org Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-11-06highmem: Get rid of kmap_types.hThomas Gleixner2-14/+0
The header is not longer used and on alpha, ia64, openrisc, parisc and um it was completely unused anyway as these architectures have no highmem support. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20201103095858.422094352@linutronix.de
2020-10-30timekeeping: default GENERIC_CLOCKEVENTS to enabledArnd Bergmann1-1/+0
Almost all machines use GENERIC_CLOCKEVENTS, so it feels wrong to require each one to select that symbol manually. Instead, enable it whenever CONFIG_LEGACY_TIMER_TICK is disabled as a simplification. It should be possible to select both GENERIC_CLOCKEVENTS and LEGACY_TIMER_TICK from an architecture now and decide at runtime between the two. For the clockevents arch-support.txt file, this means that additional architectures are marked as TODO when they have at least one machine that still uses LEGACY_TIMER_TICK, rather than being marked 'ok' when at least one machine has been converted. This means that both m68k and arm (for riscpc) revert to TODO. At this point, we could just always enable CONFIG_GENERIC_CLOCKEVENTS rather than leaving it off when not needed. I built an m68k defconfig kernel (using gcc-10.1.0) and found that this would add around 5.5KB in kernel image size: text data bss dec hex filename 3861936 1092236 196656 5150828 4e986c obj-m68k/vmlinux-no-clockevent 3866201 1093832 196184 5156217 4ead79 obj-m68k/vmlinux-clockevent On Arm (MACH_RPC), that difference appears to be twice as large, around 11KB on top of an 6MB vmlinux. Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-10-27um: use asm-generic/mmu_context.h for no-op implementationsNicholas Piggin1-7/+5
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: linux-um@lists.infradead.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-10-26arch/um: partially revert the conversion to __section() macroLinus Torvalds1-1/+1
A couple of um files ended up not including the header file that defines the __section() macro, and the simplest fix is to just revert the change for those files. Fixes: 33def8498fdd treewide: Convert macro and uses of __section(foo) to __section("foo") Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net> Cc: Joe Perches <joe@perches.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-25treewide: Convert macro and uses of __section(foo) to __section("foo")Joe Perches3-13/+13
Use a more generic form for __section that requires quotes to avoid complications with clang and gcc differences. Remove the quote operator # from compiler_attributes.h __section macro. Convert all unquoted __section(foo) uses to quoted __section("foo"). Also convert __attribute__((section("foo"))) uses to __section("foo") even if the __attribute__ has multiple list entry forms. Conversion done using the script at: https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Nick Desaulniers <ndesaulniers@gooogle.com> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-23Merge tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+1
Pull arch task_work cleanups from Jens Axboe: "Two cleanups that don't fit other categories: - Finally get the task_work_add() cleanup done properly, so we don't have random 0/1/false/true/TWA_SIGNAL confusing use cases. Updates all callers, and also fixes up the documentation for task_work_add(). - While working on some TIF related changes for 5.11, this TIF_NOTIFY_RESUME cleanup fell out of that. Remove some arch duplication for how that is handled" * tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block: task_work: cleanup notification modes tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()