Age | Commit message (Collapse) | Author | Files | Lines |
|
The unsafe doorbell and scratchpad access should display reason when
WARN is called. Otherwise we get a stack dump without any explanation.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Printouts driver name and version to indicate what is being loaded.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Benchmarking showed a significant performance increase with the MTU size
to 64k instead of 16k. Change the driver default to 64k.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Instead of using the platform code names, use the correct platform names
to identify the respective Intel NTB hardware.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Disable DMA usage by default, since the CPU provides much better
performance with write combining. Provide a module parameter to enable
DMA usage when offloading the memcpy is preferred.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Changing the memory window BAR mappings to write combining significantly
boosts the performance. We will also use memcpy that uses non-temporal
store, which showed performance improvement when doing non-cached
memcpys.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Allocate memory for the NUMA node of the NTB device.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Allocate memory and request the DMA channel for the same NUMA node as
the NTB device.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
When the ntb transport is connecting and waiting for the peer, the debug
console receives lots of debug level messages about the remote qp link
status being down. Rate limit those messages.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
This is a simple debugging driver that enables the doorbell and
scratch pad registers to be read and written from the debugfs. This
tool enables more complicated debugging to be scripted from user space.
This driver may be used to test that your ntb hardware and drivers are
functioning at a basic level.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
This is a simple ping pong driver that exercises the scratch pads and
doorbells of the ntb hardware. This driver may be used to test that
your ntb hardware and drivers are functioning at a basic level.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Add module parameters for the addresses to be used in B2B topology.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Reset the link stats when the link goes down. In particular, the TX and
RX index and count must be reset, or else the TX side will be sending
packets to the RX side where the RX side is not expecting them. Reset
all the stats, to be consistent.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
On link down, don't advance RX index to the next entry. The next entry
should never be valid after receiving the link down flag.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
The same message "qp %d: Link Down\n" was printed at two locations in
ntb_transport. Change the messages so they are distinct.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Set errata flags for the specific device IDs to which they apply,
instead of the whole Xeon hardware class.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Link training should be enabled in the driver probe for root port mode.
We should not have to wait for transport to be loaded for this to
happen. Otherwise the ntb device will not show up on the transparent
bridge side of the link.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
The transport was writing and then reading the peer scratch pad,
essentially reading what it just wrote instead of exchanging any
information with the peer. The transport expects the peer values to be
the same as the local values, so this issue was not obvious.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Change ntb_hw_intel to use the new NTB hardware abstraction layer.
Split ntb_transport into its own driver. Change it to use the new NTB
hardware abstraction layer.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Abstract the NTB device behind a programming interface, so that it can
support different hardware and client drivers.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Jan Kara and Thomas Gleixner reported boot crashes in the FPU
code:
general protection fault: 0000 [#1] SMP
RIP: 0010:[<ffffffff81048a6c>] [<ffffffff81048a6c>] mxcsr_feature_mask_init+0x1c/0x40
2b:* 0f ae 85 00 fe ff ff fxsave -0x200(%rbp)
and bisected it down to the following FPU commit:
91a8c2a5b43f ("x86/fpu: Clean up and fix MXCSR handling")
The reason is that the on-stack FPU registers state variable,
used by the FXSAVE instruction, did not have the required
minimum alignment of 16 bytes, causing the general protection
fault.
This is most likely a GCC bug in older GCC versions, but the
offending commit also added a bogus extra 32-byte alignment
(which GCC ignored too).
So fix this bug by making the variable static again, but also
mark it __initdata this time, because fpu__init_system_mxcsr()
is now an __init function.
Reported-and-bisected-by: Jan Kara <jack@suse.cz>
Reported-bisected-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150704075819.GA9201@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Commit 44dba3d5d6a1 ("sched: Refactor task_struct to use
numa_faults instead of numa_* pointers") modified the way
tsk->numa_faults stats are accounted.
However that commit never touched show_numa_stats() that is displayed
in /proc/pid/sched and thus the numbers displayed in /proc/pid/sched
don't match the actual numbers.
Fix it by making sure that /proc/pid/sched reflects the task
fault numbers. Also add group fault stats too.
Also couple of more modifications are added here:
1. Format changes:
- Previously we would list two entries per node, one for private
and one for shared. Also the home node info was listed in each entry.
- Now preferred node, total_faults and current node are
displayed separately.
- Now there is one entry per node, that lists private,shared task and
group faults.
2. Unit changes:
- p->numa_pages_migrated was getting reset after every read of
/proc/pid/sched. It's more useful to have absolute numbers since
differential migrations between two accesses can be more easily
calculated.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Iulia Manda <iulia.manda21@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1435252903-1081-4-git-send-email-srikar@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Having the numa group ID in /proc/sched_debug helps to see how
the numa groups have spread across the system.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Iulia Manda <iulia.manda21@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1435252903-1081-3-git-send-email-srikar@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Currently print_cfs_rq() is declared in include/linux/sched.h.
However it's not used outside kernel/sched. Hence move the
declaration to kernel/sched/sched.h
Also some functions are only available for CONFIG_SCHED_DEBUG=y.
Hence move the declarations to within the #ifdef.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Iulia Manda <iulia.manda21@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1435252903-1081-2-git-send-email-srikar@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Expand /proc/pid/schedstat output:
- enable it on CONFIG_TASK_DELAY_ACCT=y && !CONFIG_SCHEDSTATS kernels.
- dump all zeroes on kernels that are booted with the 'nodelayacct'
option, which boot option disables delay accounting on
CONFIG_TASK_DELAY_ACCT=y kernels.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: a.p.zijlstra@chello.nl
Cc: ricklind@us.ibm.com
Link: http://lkml.kernel.org/r/5ccbef17d4bc841084ea6e6421d4e4a23b7b806f.1435654789.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Both CONFIG_SCHEDSTATS=y and CONFIG_TASK_DELAY_ACCT=y track task
sched_info, which results in ugly #if clauses.
Simplify the code by introducing a synthethic CONFIG_SCHED_INFO
switch, selected by both.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: a.p.zijlstra@chello.nl
Cc: ricklind@us.ibm.com
Link: http://lkml.kernel.org/r/8d19eef800811a94b0f91bcbeb27430a884d7433.1435255405.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The length of each EDID block is EDID_LENGTH, and number of blocks is
(1 + edid->extensions) - we need to multiply not add them.
This causes wrong EDID to be passed on, and is a regression introduced
by d2ed34362a52 (drm: Introduce helper for replacing blob properties)
Signed-off-by: Shixin Zeng <zeng.shixin@gmail.com>
Cc: Daniel Stone <daniels@collabora.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
[danvet: Add Cc: and fix commit summary.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
|
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Added Hyper-V crash msrs values - HV_X64_MSR_CRASH*.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Peter Hornyack <peterhornyack@google.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Commit 609e36d372ad ("KVM: x86: pass host_initiated to functions that
read MSRs") modified kvm_get_msr_common function to use msr_info->data
instead of data but missed one occurrence. Replace it and remove the
unused local variable.
Fixes: 609e36d372ad ("KVM: x86: pass host_initiated to functions that
read MSRs")
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Eric noticed problems with vhost-scsi and virtio-ccw: vhost-scsi
complained about overwriting values in the config space, which
was triggered by a broken implementation of virtio-ccw's config
get/set routines. It was probably sheer luck that we did not hit
this before.
When writing a value to the config space, the WRITE_CONF ccw will
always write from the beginning of the config space up to and
including the value to be set. If the config space up to the value
has not yet been retrieved from the device, however, we'll end up
overwriting values. Keep track of the known config space and update
if needed to avoid this.
Moreover, READ_CONF will only read the number of bytes it has been
instructed to retrieve, so we must not copy more than that to the
buffer, or we might overwrite trailing values.
Reported-by: Eric Farman <farman@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Eric Farman <farman@linux.vnet.ibm.com>
Tested-by: Eric Farman <farman@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Memory-mapped LVT0 register already contains the new value when APICv
traps so we can't directly detect a change.
Memorize a bit we are interested in to enable legacy NMI watchdog.
Suggested-by: Yoshida Nobuo <yoshida.nb@ncos.nec.co.jp>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Legacy NMI watchdog didn't work after migration/resume, because
vapics_in_nmi_mode was left at 0.
Cc: stable@vger.kernel.org
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Writes were a bit racy, but hard to turn into a bug at the same time.
(Particularly because modern Linux doesn't use this feature anymore.)
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
[Actually the next patch makes it much, much easier to trigger the race
so I'm including this one for stable@ as well. - Paolo]
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Commit 1cde2930e154 ("sched/preempt: Add static_key() to preempt_notifiers")
had two problems. First, the preempt-notifier API needs to sleep with the
addition of the static_key, we do however need to hold off preemption
while modifying the preempt notifier list, otherwise a preemption could
observe an inconsistent list state. KVM correctly registers and
unregisters preempt notifiers with preemption disabled, so the sleep
caused dmesg splats.
Second, KVM registers and unregisters preemption notifiers very often
(in vcpu_load/vcpu_put). With a single uniprocessor guest the static key
would move between 0 and 1 continuously, hitting the slow path on every
userspace exit.
To fix this, wrap the static_key inc/dec in a new API, and call it from
KVM.
Fixes: 1cde2930e154 ("sched/preempt: Add static_key() to preempt_notifiers")
Reported-by: Pontus Fuchs <pontus.fuchs@gmail.com>
Reported-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Commit 86dca36e6ba introduced ratelimited usage for
'unhandled_signal' messages.
The commit checks the ratelimit irrespective of whether
the signal is handled or not, which is wrong and leads
to false reports like the below in dmesg :
__do_user_fault: 127 callbacks suppressed
Do the ratelimit check only if the signal is unhandled.
Fixes: 86dca36e6ba0 ("arm64: use private ratelimit state along with show_unhandled_signals")
Cc: Vladimir Murzin <Vladimir.Murzin@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Use kernel.h macro definition.
Thanks to Julia Lawall for Coccinelle scripting support.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jean Delvare <jdelvare@suse.de>
|
|
Add an item to the checklist when submitting a new hwmon driver: only
some I2C addresses can be probed, others should not for safety
reasons.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
|
|
Add pwm[4-7] and the associated pwm[4-7]_mode attributes.
Signed-off-by: Roger Lucas <vt8231@hiddenengine.co.uk>
Signed-off-by: Jean Delvare <jdelvare@suse.de>
|
|
It is normal that firmware presents GICC entry or entries (processors)
with disabled flag in ACPI MADT, taking a system of 16 cpus for example,
ACPI firmware may present 8 ebabled first with another 8 cpus disabled
in MADT, the disabled cpus can be hot-added later.
Firmware may also present more cpus than the hardware actually has, but
disabled the unused ones, and easily enable it when the hardware has such
cpus to make the firmware code scalable.
So that's not an error for disabled cpus in MADT, we can switch pr_err()
to pr_debug() to make the boot a little quieter by default.
Since hwid for disabled cpus often are invalid, and we check invalid hwid
first in the code, for use case that hot add cpus later will be filtered
out and will not be counted in possible cups, so move this check before
the hwid one to prepare the code to count for disabeld cpus when cpu
hot-plug is introduced.
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Reviewed-by: Al Stone <ahs3@redhat.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
It's a bug in our Makefile rules, make it show what the changing
certificate list was, and make it a warning so that people actually see
it.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Revert commit ff284f37fc0e (Revert "ACPICA: Permanently set _REV to
the value '2'.) as the regression introduced by commit b1ef29725865
reverted by it is now addressed via a blacklist entry.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The platform firmware on some systems expects Linux to return "5" as
the supported ACPI revision which makes it expose system configuration
information in a special way.
For example, based on what ACPI exports as the supported revision,
Dell XPS 13 (2015) configures its audio device to either work in HDA
mode or in I2S mode, where the former is supposed to be used on Linux
until the latter is fully supported (in the kernel as well as in user
space).
Since ACPI 6 mandates that _REV should return "2" if ACPI 2 or later
is supported by the OS, a subsequent change will make that happen, so
make it possible to override that on systems where "5" is expected to
be returned for Linux to work correctly one them (such as the Dell
machine mentioned above).
Original-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
In commit 92923ca3aace "mm: meminit: only set page reserved in the memblock region"
we dropped setting the reserved bits for all pages. This results in some warnings
on ia64:
put_kernel_page: page at 0xe000000005588000 not in reserved memory
put_kernel_page: page at 0xe000000005588000 not in reserved memory
put_kernel_page: page at 0xe000000005580000 not in reserved memory
put_kernel_page: page at 0xe000000005580000 not in reserved memory
put_kernel_page: page at 0xe000000005580000 not in reserved memory
put_kernel_page: page at 0xe000000005580000 not in reserved memory
the two different pages match up with two objects from the loaded kernel
that get mapped by arch/ia64/mm/init.c:setup_gate()
a000000101588000 D __start_gate_section
a000000101580000 D empty_zero_page
In a discussion with Mel Gorman:
http://lkml.kernel.org/r/20150526102219.GB13750%40suse.de
he suggested that while the preferred approach might be to
set the reserved bit for these pages, it would also be OK
to just drop the test:
"as it's a debugging check that is ia-64 specific"
After hunting around a bit and failin to find a good place to mark these
pages as reserved - I decided to just delete the test.
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
At the moment the IRQCHIP_DECLARE macro is only declared locally in
drivers/irqchip/irqchip.h. It prevents from using it directly in arch/*
directories whenever irqchip drivers only exist there, which happens in a few
cases (e.g. arc, arm, microblaze and mips).
This patch makes the macro to be globally defined, i.e. in
include/linux/irqchip.h, and thus usable for arch-specific declarations of
irqchip drivers. In this way, it is very similar to what clocksource does (ie
CLOCKSOURCE_OF_DECLARE is defined in include/linux/clocksource.h).
For now, this patch only moves the declaration of the macro
IRQCHIP_DECLARE to the global header 'include/linux/irqchip.h' and make
'drivers/irqchip/irqchip.h' include 'include/linux/irqchip.h'. Later, other
patches will get rid of 'drivers/irqchip/irqchip.h' and modify all the impacted
irqchip drivers.
Signed-off-by: Joel Porquet <joel@porquet.org>
Cc: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/1435865565-14114-1-git-send-email-joel@porquet.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
It is not needed after booting, this patch moves the arm_cpuidle_init()
function to the __init section.
Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Fixes an issue when queue_reuest_irq fails in nvme_setup_io_queues. This
patch initializes all vectors to -1 and resets the vector to -1 in the
case of a failure in queue_request_irq. This avoids the free_irq in
nvme_suspend_queue if the queue did not get an irq.
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
52ebea749aae ("writeback: make backing_dev_info host cgroup-specific
bdi_writebacks") made bdi (backing_dev_info) host per-cgroup wb's
(bdi_writeback's). As the congested state needs to be per-wb and
referenced from blkcg side and multiple wbs, the patch made all
non-root cong's (bdi_writeback_congested's) reference counted and
indexed on bdi.
When a bdi is destroyed, cgwb_bdi_destroy() tries to drain all
non-root cong's; however, this can hang indefinitely because wb's can
also be referenced from blkcg_gq's which are destroyed after bdi
destruction is complete.
This patch fixes the bug by updating bdi destruction to not wait for
cong's to drain. A cong is unlinked from bdi->cgwb_congested_tree on
bdi destuction regardless of its reference count as the bdi may go
away any point after destruction. wb_congested_put() checks whether
the cong is already unlinked on release.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jon Christopherson <jon@jons.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=100681
Fixes: 52ebea749aae ("writeback: make backing_dev_info host cgroup-specific bdi_writebacks")
Tested-by: Jon Christopherson <jon@jons.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
52ebea749aae ("writeback: make backing_dev_info host cgroup-specific
bdi_writebacks") made bdi (backing_dev_info) host per-cgroup wb's
(bdi_writeback's). As the congested state needs to be per-wb and
referenced from blkcg side and multiple wbs, the patch made all
non-root cong's (bdi_writeback_congested's) reference counted and
indexed on bdi.
When a bdi is destroyed, cgwb_bdi_destroy() tries to drain all
non-root cong's; however, this can hang indefinitely because wb's can
also be referenced from blkcg_gq's which are destroyed after bdi
destruction is complete.
To fix the bug, bdi destruction will be updated to not wait for cong's
to drain, which naturally means that cong's may outlive the associated
bdi. This is fine for non-root cong's but is problematic for the root
cong's which are embedded in their bdi's as they may end up getting
dereferenced after the containing bdi's are freed.
This patch makes root cong's behave the same as non-root cong's. They
are no longer embedded in their bdi's but allocated separately during
bdi initialization, indexed and reference counted the same way.
* As cong handling is the same for all wb's, wb->congested
initialization is moved into wb_init().
* When !CONFIG_CGROUP_WRITEBACK, there was no indexing or refcnting.
bdi->wb_congested is now a pointer pointing to the root cong
allocated during bdi init and minimal refcnting operations are
implemented.
* The above makes root wb init paths diverge depending on
CONFIG_CGROUP_WRITEBACK. root wb init is moved to cgwb_bdi_init().
This patch in itself shouldn't cause any consequential behavior
differences but prepares for the actual fix.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jon Christopherson <jon@jons.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=100681
Tested-by: Jon Christopherson <jon@jons.org>
Added <linux/slab.h> include to backing-dev.h for kfree() definition.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
This patch only moves files to their new locations, before applying the
next two patches adding the NTB Abstraction layer. Splitting this patch
from the next is intended make distinct which code is changed only due
to moving the files, versus which are substantial code changes in adding
the NTB Abstraction layer.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|