Age | Commit message (Collapse) | Author | Files | Lines |
|
The fact that tasks could be migrated from leaf to root rcu_node
structures meant that synchronize_rcu_expedited() had to disable
CPU hotplug. However, tasks now stay put, so this commit removes the
CPU-hotplug disabling from synchronize_rcu_expedited().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
Currently if the rcu_fanout_leaf boot parameter is out of bounds (that
is, less than RCU_FANOUT_LEAF or greater than the number of bits in an
unsigned long), a warning is issued and execution continues with the
out-of-bounds value. This can result in all manner of failures, so this
patch resets rcu_fanout_leaf to RCU_FANOUT_LEAF when an out-of-bounds
condition is detected.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
Because gcc does not realize a loop would not be entered ever
(i.e. in case of rcu_num_lvls == 1):
for (i = 1; i < rcu_num_lvls; i++)
rsp->level[i] = rsp->level[i - 1] + levelcnt[i - 1];
some compiler (pre- 5.x?) versions give a bogus warning:
kernel/rcu/tree.c: In function ‘rcu_init_one.isra.55’:
kernel/rcu/tree.c:4108:13: warning: array subscript is above array bounds [-Warray-bounds]
rsp->level[i] = rsp->level[i - 1] + rsp->levelcnt[i - 1];
^
Fix that warning by adding an extra item to rcu_state::level[]
array. Once the bogus warning is fixed in gcc and kernel drops
support of older versions, the dummy item may be removed from
the array.
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Suggested-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
This update makes arithmetic to calculate number of RCU nodes
more straight and easy to read.
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
Although a number of RCU levels may be less than the current
maximum of four, some static data associated with each level
are allocated for all four levels. As result, the extra data
never get accessed and just wast memory. This update limits
count of allocated items to the number of used RCU levels.
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
Members rcu_state::levelcnt[] and rcu_state::levelspread[]
are only used at init. There is no reason to keep them
afterwards.
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
Number of items in rcu_capacity[] array is defined by macro
MAX_RCU_LVLS. However, that array is never accessed beyond
RCU_NUM_LVLS index. Therefore, we can limit the array to
RCU_NUM_LVLS items and eliminate MAX_RCU_LVLS. As result,
in most cases the memory is conserved.
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
Variable rcu_num_lvls is limited by RCU_NUM_LVLS macro.
In turn, rcu_state::levelcnt[] array is never accessed
beyond rcu_num_lvls. Thus, rcu_state::levelcnt[] is safe
to limit to RCU_NUM_LVLS items.
Since rcu_num_lvls could be changed during boot (as result
of rcutree.rcu_fanout_leaf kernel parameter update) one might
assume a new value could overflow the value of RCU_NUM_LVLS.
However, that is not the case, since leaf-level fanout is only
permitted to increase, resulting in rcu_num_lvls possibly to
decrease.
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
Current code suggests that introducing the extra level to
rcu_capacity[] array makes some of the arithmetic easier.
Well, in fact it appears rather confusing and unnecessary.
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
This update simplifies rcu_init_geometry() code flow
and makes calculation of the total number of rcu_node
structures more easy to read.
The update relies on the fact num_rcu_lvl[] is never
accessed beyond rcu_num_lvls index by the rest of the
code. Therefore, there is no need initialize the whole
num_rcu_lvl[].
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
Local variable 'n' mimics 'nr_cpu_ids' while the both are
used within one function. There is no reason for 'n' to
exist whatsoever.
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
Currently a condition when RCU tree is unable to accommodate
the configured number of CPUs is not permitted and causes
a fall back to compile-time values. However, the code has no
means to exceed the RCU tree capacity neither at compile-time
nor in run-time. Therefore, if the condition is met in run-
time then it indicates a serios problem elsewhere and should
be handled with a panic.
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
|
|
Commit 835a6a2f8603 ("Bluetooth: Stop sabotaging list poisoning")
thought that the code was sabotaging the list poisoning when NULL'ing
out the list pointers and removed it.
But what was going on was that the bluetooth code was using NULL
pointers for the list as a way to mark it empty, and that commit just
broke it (and replaced the test with NULL with a "list_empty()" test on
a uninitialized list instead, breaking things even further).
So fix it all up to use the regular and real list_empty() handling
(which does not use NULL, but a pointer to itself), also making sure to
initialize the list properly (the previous NULL case was initialized
implicitly by the session being allocated with kzalloc())
This is a combination of patches by Marcel Holtmann and Tedd Ho-Jeong
An.
[ I would normally expect to get this through the bt tree, but I'm going
to release -rc1, so I'm just committing this directly - Linus ]
Reported-and-tested-by: Jörg Otte <jrg.otte@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Original-by: Tedd Ho-Jeong An <tedd.an@intel.com>
Original-by: Marcel Holtmann <marcel@holtmann.org>:
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
if server claims to have written/read more than we'd told it to,
warn and cap the claimed byte count to avoid advancing more than
we are ready to.
|
|
Braino in "9p: switch p9_client_write() to passing it struct iov_iter *";
if response is impossible to parse and we discard the request, get the
out of the loop right there.
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
If we'd already sent a request and decide to abort it, we *must*
issue TFLUSH properly and not just blindly reuse the tag, or
we'll get seriously screwed when response eventually arrives
and we confuse it for response to later request that had reused
the same tag.
Cc: stable@vger.kernel.org # v3.2 and later
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
The brd driver is the only in-tree driver that may sleep currently.
After some discussion on linux-fsdevel, we decided that any driver
may choose to sleep in its ->direct_access method. To ensure that all
callers of bdev_direct_access() are prepared for this, add a call
to might_sleep().
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
If a block device supports the ->direct_access methods, bypass the normal
DIO path and use DAX to go straight to memcpy() instead of allocating
a DIO and a BIO.
Includes support for the DIO_SKIP_DIO_COUNT flag in DAX, as is done in
do_blockdev_direct_IO().
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
When userspace does a write, there's no need for the written data to
pollute the CPU cache. This matches the original XIP code.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
For block devices which are small enough, mkfs will default to creating
a filesystem with block sizes smaller than page size.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
When split BAR is enabled, the driver needs to dump out the split BAR
registers rather than the original 64bit BAR registers.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
The unsafe doorbell and scratchpad access should display reason when
WARN is called. Otherwise we get a stack dump without any explanation.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Printouts driver name and version to indicate what is being loaded.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Benchmarking showed a significant performance increase with the MTU size
to 64k instead of 16k. Change the driver default to 64k.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Instead of using the platform code names, use the correct platform names
to identify the respective Intel NTB hardware.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Disable DMA usage by default, since the CPU provides much better
performance with write combining. Provide a module parameter to enable
DMA usage when offloading the memcpy is preferred.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Changing the memory window BAR mappings to write combining significantly
boosts the performance. We will also use memcpy that uses non-temporal
store, which showed performance improvement when doing non-cached
memcpys.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Allocate memory for the NUMA node of the NTB device.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Allocate memory and request the DMA channel for the same NUMA node as
the NTB device.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
When the ntb transport is connecting and waiting for the peer, the debug
console receives lots of debug level messages about the remote qp link
status being down. Rate limit those messages.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
This is a simple debugging driver that enables the doorbell and
scratch pad registers to be read and written from the debugfs. This
tool enables more complicated debugging to be scripted from user space.
This driver may be used to test that your ntb hardware and drivers are
functioning at a basic level.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
This is a simple ping pong driver that exercises the scratch pads and
doorbells of the ntb hardware. This driver may be used to test that
your ntb hardware and drivers are functioning at a basic level.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Add module parameters for the addresses to be used in B2B topology.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Reset the link stats when the link goes down. In particular, the TX and
RX index and count must be reset, or else the TX side will be sending
packets to the RX side where the RX side is not expecting them. Reset
all the stats, to be consistent.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
On link down, don't advance RX index to the next entry. The next entry
should never be valid after receiving the link down flag.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
The same message "qp %d: Link Down\n" was printed at two locations in
ntb_transport. Change the messages so they are distinct.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Set errata flags for the specific device IDs to which they apply,
instead of the whole Xeon hardware class.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Link training should be enabled in the driver probe for root port mode.
We should not have to wait for transport to be loaded for this to
happen. Otherwise the ntb device will not show up on the transparent
bridge side of the link.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
The transport was writing and then reading the peer scratch pad,
essentially reading what it just wrote instead of exchanging any
information with the peer. The transport expects the peer values to be
the same as the local values, so this issue was not obvious.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Change ntb_hw_intel to use the new NTB hardware abstraction layer.
Split ntb_transport into its own driver. Change it to use the new NTB
hardware abstraction layer.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Abstract the NTB device behind a programming interface, so that it can
support different hardware and client drivers.
Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
|
|
Jan Kara and Thomas Gleixner reported boot crashes in the FPU
code:
general protection fault: 0000 [#1] SMP
RIP: 0010:[<ffffffff81048a6c>] [<ffffffff81048a6c>] mxcsr_feature_mask_init+0x1c/0x40
2b:* 0f ae 85 00 fe ff ff fxsave -0x200(%rbp)
and bisected it down to the following FPU commit:
91a8c2a5b43f ("x86/fpu: Clean up and fix MXCSR handling")
The reason is that the on-stack FPU registers state variable,
used by the FXSAVE instruction, did not have the required
minimum alignment of 16 bytes, causing the general protection
fault.
This is most likely a GCC bug in older GCC versions, but the
offending commit also added a bogus extra 32-byte alignment
(which GCC ignored too).
So fix this bug by making the variable static again, but also
mark it __initdata this time, because fpu__init_system_mxcsr()
is now an __init function.
Reported-and-bisected-by: Jan Kara <jack@suse.cz>
Reported-bisected-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150704075819.GA9201@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Commit 44dba3d5d6a1 ("sched: Refactor task_struct to use
numa_faults instead of numa_* pointers") modified the way
tsk->numa_faults stats are accounted.
However that commit never touched show_numa_stats() that is displayed
in /proc/pid/sched and thus the numbers displayed in /proc/pid/sched
don't match the actual numbers.
Fix it by making sure that /proc/pid/sched reflects the task
fault numbers. Also add group fault stats too.
Also couple of more modifications are added here:
1. Format changes:
- Previously we would list two entries per node, one for private
and one for shared. Also the home node info was listed in each entry.
- Now preferred node, total_faults and current node are
displayed separately.
- Now there is one entry per node, that lists private,shared task and
group faults.
2. Unit changes:
- p->numa_pages_migrated was getting reset after every read of
/proc/pid/sched. It's more useful to have absolute numbers since
differential migrations between two accesses can be more easily
calculated.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Iulia Manda <iulia.manda21@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1435252903-1081-4-git-send-email-srikar@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Having the numa group ID in /proc/sched_debug helps to see how
the numa groups have spread across the system.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Iulia Manda <iulia.manda21@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1435252903-1081-3-git-send-email-srikar@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Currently print_cfs_rq() is declared in include/linux/sched.h.
However it's not used outside kernel/sched. Hence move the
declaration to kernel/sched/sched.h
Also some functions are only available for CONFIG_SCHED_DEBUG=y.
Hence move the declarations to within the #ifdef.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Iulia Manda <iulia.manda21@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1435252903-1081-2-git-send-email-srikar@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Expand /proc/pid/schedstat output:
- enable it on CONFIG_TASK_DELAY_ACCT=y && !CONFIG_SCHEDSTATS kernels.
- dump all zeroes on kernels that are booted with the 'nodelayacct'
option, which boot option disables delay accounting on
CONFIG_TASK_DELAY_ACCT=y kernels.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: a.p.zijlstra@chello.nl
Cc: ricklind@us.ibm.com
Link: http://lkml.kernel.org/r/5ccbef17d4bc841084ea6e6421d4e4a23b7b806f.1435654789.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Both CONFIG_SCHEDSTATS=y and CONFIG_TASK_DELAY_ACCT=y track task
sched_info, which results in ugly #if clauses.
Simplify the code by introducing a synthethic CONFIG_SCHED_INFO
switch, selected by both.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: a.p.zijlstra@chello.nl
Cc: ricklind@us.ibm.com
Link: http://lkml.kernel.org/r/8d19eef800811a94b0f91bcbeb27430a884d7433.1435255405.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The length of each EDID block is EDID_LENGTH, and number of blocks is
(1 + edid->extensions) - we need to multiply not add them.
This causes wrong EDID to be passed on, and is a regression introduced
by d2ed34362a52 (drm: Introduce helper for replacing blob properties)
Signed-off-by: Shixin Zeng <zeng.shixin@gmail.com>
Cc: Daniel Stone <daniels@collabora.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
[danvet: Add Cc: and fix commit summary.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|