Age | Commit message (Collapse) | Author | Files | Lines |
|
Disable the jprobes APIs and comment out the jprobes API function
code. This is in preparation of removing all jprobes related
code (including kprobe's break_handler).
Nowadays ftrace and other tracing features are mature enough
to replace jprobes use-cases. Users can safely use ftrace and
perf probe etc. for their use cases.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Ian McDonald <ian.mcdonald@jandi.co.nz>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Link: http://lkml.kernel.org/r/150724527741.5014.15465541485637899227.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
We want to wait for all potentially preempted kprobes trampoline
execution to have completed. This guarantees that any freed
trampoline memory is not in use by any task in the system anymore.
synchronize_rcu_tasks() gives such a guarantee, so use it.
Also, this guarantees to wait for all potentially preempted tasks
on the instructions which will be replaced with a jump.
Since this becomes a problem only when CONFIG_PREEMPT=y, enable
CONFIG_TASKS_RCU=y for synchronize_rcu_tasks() in that case.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naveen N . Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/150845661962.5443.17724352636247312231.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Because many of RCU's files have not been included into docbook, a
number of errors have accumulated. This commit fixes them.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 764f80798b95 ("doc: Add RCU files to docbook-generation files")
added :external: options for RCU source files in the file
Documentation/core-api/kernel-api.rst. However, this now means nothing,
so this commit removes them.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Akira Yokosawa <akiyks@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This introduces a "register private expedited" membarrier command which
allows eventual removal of important memory barrier constraints on the
scheduler fast-paths. It changes how the "private expedited" membarrier
command (new to 4.14) is used from user-space.
This new command allows processes to register their intent to use the
private expedited command. This affects how the expedited private
command introduced in 4.14-rc is meant to be used, and should be merged
before 4.14 final.
Processes are now required to register before using
MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM.
This fixes a problem that arose when designing requested extensions to
sys_membarrier() to allow JITs to efficiently flush old code from
instruction caches. Several potential algorithms are much less painful
if the user register intent to use this functionality early on, for
example, before the process spawns the second thread. Registering at
this time removes the need to interrupt each and every thread in that
process at the first expedited sys_membarrier() system call.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
For CPUs which have an unknown or invalid CPU location (physical location)
assume that their cycle counters aren't syncronized across CPUs.
Signed-off-by: Helge Deller <deller@gmx.de>
Fixes: c8c3735997a3 ("parisc: Enhance detection of synchronous cr16 clocksources")
Cc: stable@vger.kernel.org # 4.13+
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
__cmpxchg_u64 is built and used outside CONFIG_64BIT and thus needs to
be exported. This fixes the following build error seen when building
parisc:allmodconfig.
ERROR: "__cmpxchg_u64" [drivers/net/ethernet/intel/i40e/i40e.ko] undefined!
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
As discussed on the debian-hppa list, double-wordcompare and exchange
operations fail on 32-bit kernels. Looking at the code, I realized that
the ",ma" completer does the wrong thing in the "ldw,ma 4(%r26), %r29"
instruction. This increments %r26 and causes the following store to
write to the wrong location.
Note by Helge Deller:
The patch applies cleanly to stable kernel series if this upstream
commit is merged in advance:
f4125cfdb300 ("parisc: Avoid trashing sr2 and sr3 in LWS code").
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Tested-by: Christoph Biedl <debian.axhn@manchmal.in-ulm.de>
Fixes: 89206491201c ("parisc: Implement new LWS CAS supporting 64 bit operations.")
Cc: stable@vger.kernel.org # 3.13+
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
This reverts commit:
e863d539614641 ("kprobes: Warn if optprobe handler tries to change execution path")
On PowerPC, we place a probe at kretprobe_trampoline to catch function
returns and with CONFIG_OPTPROBES=y, this probe gets optimized. This
works for us due to the way we handle the optprobe as described in
commit:
762df10bad6954 ("powerpc/kprobes: Optimize kprobe in kretprobe_trampoline()")
With the above commit, we end up with a warning. As such, revert this change.
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171017081834.3629-1-naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
In debian/ubuntu, libc.so is located at a different place,
/lib/x86_64-linux-gnu/libc-2.23.so, so it outputs like this when testing:
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.040 ms
--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.040/0.040/0.040/0.000 ms
0.000 probe_libc:inet_pton:(7f0e2db741c0))
__GI___inet_pton (/lib/x86_64-linux-gnu/libc-2.23.so)
getaddrinfo (/lib/x86_64-linux-gnu/libc-2.23.so)
[0xffffa9d40f34ff4d] (/bin/ping)
Fix up the libc path to make sure this test works in more OSes.
Committer testing:
When this test fails one can use 'perf test -v', i.e. in verbose mode, where
it'll show the expected backtrace, so, after applying this test:
On Fedora 26:
# perf test -v ping
62: probe libc's inet_pton & backtrace it with ping :
--- start ---
test child forked, pid 23322
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.058 ms
--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.058/0.058/0.058/0.000 ms
0.000 probe_libc:inet_pton:(7fe344310d80))
__GI___inet_pton (/usr/lib64/libc-2.25.so)
getaddrinfo (/usr/lib64/libc-2.25.so)
_init (/usr/bin/ping)
test child finished with 0
---- end ----
probe libc's inet_pton & backtrace it with ping: Ok
#
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kim Phillips <kim.phillips@arm.com>
Cc: Li Zhijian <lizhijian@cn.fujitsu.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Philip Li <philip.li@intel.com>
Link: http://lkml.kernel.org/r/1508315649-18836-1-git-send-email-lizhijian@cn.fujitsu.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
In current xyarray code, xyarray__max_x() returns max_y, and xyarray__max_y()
returns max_x.
It's confusing and for code logic it looks not correct.
Error happens when closing evsel fd. Let's see this scenario:
1. Allocate an fd (pseudo-code)
perf_evsel__alloc_fd(struct perf_evsel *evsel, int ncpus, int nthreads)
{
evsel->fd = xyarray__new(ncpus, nthreads, sizeof(int));
}
xyarray__new(int xlen, int ylen, size_t entry_size)
{
size_t row_size = ylen * entry_size;
struct xyarray *xy = zalloc(sizeof(*xy) + xlen * row_size);
xy->entry_size = entry_size;
xy->row_size = row_size;
xy->entries = xlen * ylen;
xy->max_x = xlen;
xy->max_y = ylen;
......
}
So max_x is ncpus, max_y is nthreads and row_size = nthreads * 4.
2. Use perf syscall and get the fd
int perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
struct thread_map *threads)
{
for (cpu = 0; cpu < cpus->nr; cpu++) {
for (thread = 0; thread < nthreads; thread++) {
int fd, group_fd;
fd = sys_perf_event_open(&evsel->attr, pid, cpus->map[cpu],
group_fd, flags);
FD(evsel, cpu, thread) = fd;
}
}
static inline void *xyarray__entry(struct xyarray *xy, int x, int y)
{
return &xy->contents[x * xy->row_size + y * xy->entry_size];
}
These codes don't have issues. The issue happens in the closing of fd.
3. Close fd.
void perf_evsel__close_fd(struct perf_evsel *evsel)
{
int cpu, thread;
for (cpu = 0; cpu < xyarray__max_x(evsel->fd); cpu++)
for (thread = 0; thread < xyarray__max_y(evsel->fd); ++thread) {
close(FD(evsel, cpu, thread));
FD(evsel, cpu, thread) = -1;
}
}
Since xyarray__max_x() returns max_y (nthreads) and xyarry__max_y()
returns max_x (ncpus), so above code is actually to be:
for (cpu = 0; cpu < nthreads; cpu++)
for (thread = 0; thread < ncpus; ++thread) {
close(FD(evsel, cpu, thread));
FD(evsel, cpu, thread) = -1;
}
It's not correct!
This change is introduced by "475fb533fb7d" ("perf evsel: Fix buffer overflow
while freeing events")
This fix is to let xyarray__max_x() return max_x (ncpus) and
let xyarry__max_y() return max_y (nthreads)
Committer note:
This was also fixed by Ravi Bangoria, who provided the same patch,
noticing the problem with 'perf record':
<quote Ravi>
I see 'perf record -p <pid>' crashes with following log:
*** Error in `./perf': free(): invalid next size (normal): 0x000000000298b340 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f7fd85c87e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7f7fd85d137a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f7fd85d553c]
./perf(perf_evsel__close+0xb4)[0x4b7614]
./perf(perf_evlist__delete+0x100)[0x4ab180]
./perf(cmd_record+0x1d9)[0x43a5a9]
./perf[0x49aa2f]
./perf(main+0x631)[0x427841]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f7fd8571830]
./perf(_start+0x29)[0x427a59]
</>
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Fixes: d74be4767367 ("perf xyarray: Save max_x, max_y")
Link: http://lkml.kernel.org/r/1508339478-26674-1-git-send-email-yao.jin@linux.intel.com
Link: http://lkml.kernel.org/r/1508327446-15302-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The commit 99b5c5bb9a54 ("ALSA: hda - Remove the use of set_fs()")
converted the get_kctl_0dB_offset() call for killing set_fs() usage in
HD-audio codec code. The conversion assumed that the TLV callback
used in HD-audio code is only snd_hda_mixer_amp() and applies the TLV
calculation locally.
Although this assumption is correct, and all slave kctls are actually
with that callback, the current code is still utterly buggy; it
doesn't hit this condition and falls back to the next check. It's
because the function gets called after adding slave kctls to vmaster.
By assigning a slave kctl, the slave kctl object is faked inside
vmaster code, and the whole kctl ops are overridden. Thus the
callback op points to a different value from what we've assumed.
More badly, as reported by the KERNEXEC and UDEREF features of PaX,
the code flow turns into the unexpected pitfall. The next fallback
check is SNDRV_CTL_ELEM_ACCESS_TLV_READ access bit, and this always
hits for each kctl with TLV. Then it evaluates the callback function
pointer wrongly as if it were a TLV array. Although currently its
side-effect is fairly limited, this incorrect reference may lead to an
unpleasant result.
For addressing the regression, this patch introduces a new helper to
vmaster code, snd_ctl_apply_vmaster_slaves(). This works similarly
like the existing map_slaves() in hda_codec.c: it loops over the slave
list of the given master, and applies the given function to each
slave. Then the initializer function receives the right kctl object
and we can compare the correct pointer instead of the faked one.
Also, for catching the similar breakage in future, give an error
message when the unexpected TLV callback is found and bail out
immediately.
Fixes: 99b5c5bb9a54 ("ALSA: hda - Remove the use of set_fs()")
Reported-by: PaX Team <pageexec@freemail.hu>
Cc: <stable@vger.kernel.org> # v4.13
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
While converting the error messages to the standard macros in the
commit 4e76a8833fac ("ALSA: hda - Replace with standard printk"), a
superfluous '-' slipped in the code mistakenly. Its influence is
almost negligible, merely shows a dB value as negative integer instead
of positive integer (or vice versa) in the rare error message.
So let's kill this embarrassing byte to show more correct value.
Fixes: 4e76a8833fac ("ALSA: hda - Replace with standard printk")
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
The loop in snd_hdac_bus_parse_capabilities() may go to nirvana when
it hits an invalid register value read:
BUG: unable to handle kernel paging request at ffffad5dc41f3fff
IP: pci_azx_readl+0x5/0x10 [snd_hda_intel]
Call Trace:
snd_hdac_bus_parse_capabilities+0x3c/0x1f0 [snd_hda_core]
azx_probe_continue+0x7d5/0x940 [snd_hda_intel]
.....
This happened on a new Intel machine, and we need to check the value
and abort the loop accordingly.
[Note: the fixes tag below indicates only the commit where this patch
can be applied; the original problem was introduced even before that
commit]
Fixes: 6720b38420a0 ("ALSA: hda - move bus_parse_capabilities to core")
Cc: <stable@vger.kernel.org>
Acked-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
The 'use' locking macros are no-ops if neither SMP or SND_DEBUG is
enabled. This might once have been OK in non-preemptible
configurations, but even in that case snd_seq_read() may sleep while
relying on a 'use' lock. So always use the proper implementations.
Cc: stable@vger.kernel.org
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
This reverts commit c91fc8519d87715a3a173475ea3778794c139996.
That change caused a C6 and PC6 residency regression on large idle systems.
Users also complained about new output indicating jitter:
turbostat: cpu6 jitter 3794 9142
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: 4.13+ <stable@vger.kernel.org> # v4.13+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Commit 7496946a8 ("tracing: Add samples of DECLARE_EVENT_CLASS() and
DEFINE_EVENT()") added template examples for all the events. It created a
DEFINE_EVENT_FN() example which reused the foo_bar_reg and foo_bar_unreg
functions.
Enabling both the TRACE_EVENT_FN() and DEFINE_EVENT_FN() example trace
events caused the foo_bar_reg to be called twice, creating the test thread
twice. The foo_bar_unreg would remove it only once, even if it was called
multiple times, leaving a thread existing when the module is unloaded,
causing an oops.
Add a ref count and allow foo_bar_reg() and foo_bar_unreg() be called by
multiple trace events.
Cc: stable@vger.kernel.org
Fixes: 7496946a8 ("tracing: Add samples of DECLARE_EVENT_CLASS() and DEFINE_EVENT()")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Currently we try to defer completion of async DIO to the process context
in case there are any mapped pages associated with the inode so that we
can invalidate the pages when the IO completes. However the check is racy
and the pages can be mapped afterwards. If this happens we might end up
calling invalidate_inode_pages2_range() in dio_complete() in interrupt
context which could sleep. This can be reproduced by generic/451.
Fix this by passing the information whether we can or can't invalidate
to the dio_complete(). Thanks Eryu Guan for reporting this and Jan Kara
for suggesting a fix.
Fixes: 332391a9935d ("fs: Fix page cache inconsistency when mixing buffered and AIO DIO")
Reported-by: Eryu Guan <eguan@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Thomas reported that 'perf buildid-list' gets a SEGFAULT due to NULL
pointer deref when he ran it on a data with namespace events. It was
because the buildid_id__mark_dso_hit_ops lacks the namespace event
handler and perf_too__fill_default() didn't set it.
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
Missing separate debuginfos, use: dnf debuginfo-install audit-libs-2.7.7-1.fc25.s390x bzip2-libs-1.0.6-21.fc25.s390x elfutils-libelf-0.169-1.fc25.s390x
+elfutils-libs-0.169-1.fc25.s390x libcap-ng-0.7.8-1.fc25.s390x numactl-libs-2.0.11-2.ibm.fc25.s390x openssl-libs-1.1.0e-1.1.ibm.fc25.s390x perl-libs-5.24.1-386.fc25.s390x
+python-libs-2.7.13-2.fc25.s390x slang-2.3.0-7.fc25.s390x xz-libs-5.2.3-2.fc25.s390x zlib-1.2.8-10.fc25.s390x
(gdb) where
#0 0x0000000000000000 in ?? ()
#1 0x00000000010fad6a in machines__deliver_event (machines=<optimized out>, machines@entry=0x2c6fd18,
evlist=<optimized out>, event=event@entry=0x3fffdf00470, sample=0x3ffffffe880, sample@entry=0x3ffffffe888,
tool=tool@entry=0x1312968 <build_id.mark_dso_hit_ops>, file_offset=1136) at util/session.c:1287
#2 0x00000000010fbf4e in perf_session__deliver_event (file_offset=1136, tool=0x1312968 <build_id.mark_dso_hit_ops>,
sample=0x3ffffffe888, event=0x3fffdf00470, session=0x2c6fc30) at util/session.c:1340
#3 perf_session__process_event (session=0x2c6fc30, session@entry=0x0, event=event@entry=0x3fffdf00470,
file_offset=file_offset@entry=1136) at util/session.c:1522
#4 0x00000000010fddde in __perf_session__process_events (file_size=11880, data_size=<optimized out>,
data_offset=<optimized out>, session=0x0) at util/session.c:1899
#5 perf_session__process_events (session=0x0, session@entry=0x2c6fc30) at util/session.c:1953
#6 0x000000000103b2ac in perf_session__list_build_ids (with_hits=<optimized out>, force=<optimized out>)
at builtin-buildid-list.c:83
#7 cmd_buildid_list (argc=<optimized out>, argv=<optimized out>) at builtin-buildid-list.c:115
#8 0x00000000010a026c in run_builtin (p=0x1311f78 <commands+24>, argc=argc@entry=2, argv=argv@entry=0x3fffffff3c0)
at perf.c:296
#9 0x000000000102bc00 in handle_internal_command (argv=<optimized out>, argc=2) at perf.c:348
#10 run_argv (argcp=<synthetic pointer>, argv=<synthetic pointer>) at perf.c:392
#11 main (argc=<optimized out>, argv=0x3fffffff3c0) at perf.c:536
(gdb)
Fix it by adding a stub event handler for namespace event.
Committer testing:
Further clarifying, plain using 'perf buildid-list' will not end up in a
SEGFAULT when processing a perf.data file with namespace info:
# perf record -a --namespaces sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.024 MB perf.data (1058 samples) ]
# perf buildid-list | wc -l
38
# perf buildid-list | head -5
e2a171c7b905826fc8494f0711ba76ab6abbd604 /lib/modules/4.14.0-rc3+/build/vmlinux
874840a02d8f8a31cedd605d0b8653145472ced3 /lib/modules/4.14.0-rc3+/kernel/arch/x86/kvm/kvm-intel.ko
ea7223776730cd8a22f320040aae4d54312984bc /lib/modules/4.14.0-rc3+/kernel/drivers/gpu/drm/i915/i915.ko
5961535e6732a8edb7f22b3f148bb2fa2e0be4b9 /lib/modules/4.14.0-rc3+/kernel/drivers/gpu/drm/drm.ko
f045f54aa78cf1931cc893f78b6cbc52c72a8cb1 /usr/lib64/libc-2.25.so
#
It is only when one asks for checking what of those entries actually had
samples, i.e. when we use either -H or --with-hits, that we will process
all the PERF_RECORD_ events, and since tools/perf/builtin-buildid-list.c
neither explicitely set a perf_tool.namespaces() callback nor the
default stub was set that we end up, when processing a
PERF_RECORD_NAMESPACE record, causing a SEGFAULT:
# perf buildid-list -H
Segmentation fault (core dumped)
^C
#
Reported-and-Tested-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Fixes: f3b3614a284d ("perf tools: Add PERF_RECORD_NAMESPACES to include namespaces related info")
Link: http://lkml.kernel.org/r/20171017132900.11043-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
'perf record' had a '-l' option that meant "scale counter values" a very
long time ago, but it currently belongs to 'perf stat' as '-c'. So
remove it. I found this problem in the below case.
$ perf record -e cycles -l sleep 3
Error: unknown switch `l
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1507907412-19813-1-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The last cleanup introduced two harmless warnings:
fs/xfs/xfs_fsmap.c:480:1: warning: '__xfs_getfsmap_rtdev' defined but not used
fs/xfs/xfs_fsmap.c:372:1: warning: 'xfs_getfsmap_rtdev_rtbitmap_helper' defined but not used
This moves those two functions as well.
Fixes: bb9c2e543325 ("xfs: move more RT specific code under CONFIG_XFS_RT")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
The writeback rework in commit fbcc02561359 ("xfs: Introduce
writeback context for writepages") introduced a subtle change in
behavior with regard to the block mapping used across the
->writepages() sequence. The previous xfs_cluster_write() code would
only flush pages up to EOF at the time of the writepage, thus
ensuring that any pages due to file-extending writes would be
handled on a separate cycle and with a new, updated block mapping.
The updated code establishes a block mapping in xfs_writepage_map()
that could extend beyond EOF if the file has post-eof preallocation.
Because we now use the generic writeback infrastructure and pass the
cached mapping to each writepage call, there is no implicit EOF
limit in place. If eofblocks trimming occurs during ->writepages(),
any post-eof portion of the cached mapping becomes invalid. The
eofblocks code has no means to serialize against writeback because
there are no pages associated with post-eof blocks. Therefore if an
eofblocks trim occurs and is followed by a file-extending buffered
write, not only has the mapping become invalid, but we could end up
writing a page to disk based on the invalid mapping.
Consider the following sequence of events:
- A buffered write creates a delalloc extent and post-eof
speculative preallocation.
- Writeback starts and on the first writepage cycle, the delalloc
extent is converted to real blocks (including the post-eof blocks)
and the mapping is cached.
- The file is closed and xfs_release() trims post-eof blocks. The
cached writeback mapping is now invalid.
- Another buffered write appends the file with a delalloc extent.
- The concurrent writeback cycle picks up the just written page
because the writeback range end is LLONG_MAX. xfs_writepage_map()
attributes it to the (now invalid) cached mapping and writes the
data to an incorrect location on disk (and where the file offset is
still backed by a delalloc extent).
This problem is reproduced by xfstests test generic/464, which
triggers racing writes, appends, open/closes and writeback requests.
To address this problem, trim the mapping used during writeback to
within EOF when the mapping is validated. This ensures the mapping
is revalidated for any pages encountered beyond EOF as of the time
the current mapping was cached or last validated.
Reported-by: Eryu Guan <eguan@redhat.com>
Diagnosed-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Commit 332391a9935d ("fs: Fix page cache inconsistency when mixing
buffered and AIO DIO") moved page cache invalidation from
iomap_dio_rw() to iomap_dio_complete() for iomap based direct write
path, but before the dio->end_io() call, and it re-introdued the bug
fixed by commit c771c14baa33 ("iomap: invalidate page caches should
be after iomap_dio_complete() in direct write").
I found this because fstests generic/418 started failing on XFS with
v4.14-rc3 kernel, which is the regression test for this specific
bug.
So similarly, fix it by moving dio->end_io() (which does the
unwritten extent conversion) before page cache invalidation, to make
sure next buffer read reads the final real allocations not unwritten
extents. I also add some comments about why should end_io() go first
in case we get it wrong again in the future.
Note that, there's no such problem in the non-iomap based direct
write path, because we didn't remove the page cache invalidation
after the ->direct_IO() in generic_file_direct_write() call, but I
decided to fix dio_complete() too so we don't leave a landmine
there, also be consistent with iomap_dio_complete().
Fixes: 332391a9935d ("fs: Fix page cache inconsistency when mixing buffered and AIO DIO")
Signed-off-by: Eryu Guan <eguan@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
|
|
Recently we've had warnings arise from the vm handing us pages
without bufferheads attached to them. This should not ever occur
in XFS, but we don't defend against it properly if it does. The only
place where we remove bufferheads from a page is in
xfs_vm_releasepage(), but we can't tell the difference here between
"page is dirty so don't release" and "page is dirty but is being
invalidated so release it".
In some places that are invalidating pages ask for pages to be
released and follow up afterward calling ->releasepage by checking
whether the page was dirty and then aborting the invalidation. This
is a possible vector for releasing buffers from a page but then
leaving it in the mapping, so we really do need to avoid dirty pages
in xfs_vm_releasepage().
To differentiate between invalidated pages and normal pages, we need
to clear the page dirty flag when invalidating the pages. This can
be done through xfs_vm_invalidatepage(), and will result
xfs_vm_releasepage() seeing the page as clean which matches the
bufferhead state on the page after calling block_invalidatepage().
Hence we can re-add the page dirty check in xfs_vm_releasepage to
catch the case where we might be releasing a page that is actually
dirty and so should not have the bufferheads on it removed. This
will remove one possible vector of "dirty page with no bufferheads"
and so help narrow down the search for the root cause of that
problem.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Jiri and Namhyung have long contributed a lot of code and time reviewing
patches to tools/, so lets make that reflected in the MAINTAINERS file
to encourage patch submitters to add them to the CC list, speeding up
the process of tools/perf/ patch processing.
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-onicopw68bg6kn56lnybfpns@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add native DSD support quirk for Pro-Ject Pre Box S2 Digital USB id
2772:0230.
Signed-off-by: Jussi Laako <jussi@sonarnerd.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
This adds a short document describing the views of how the Linux kernel
community feels about enforcing the license of the kernel.
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Alex Elder (Linaro) <elder@linaro.org>
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Andy Gross <andy.gross@linaro.org>
Acked-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Anna Schumaker <schumaker.anna@gmail.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Acked-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Bhumika Goyal <bhumirks@gmail.com>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Darrick J. Wong (Oracle) <darrick.wong@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Acked-by: David Kershner <david.kershner@unisys.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Acked-by: Fabio Estevam <festevam@gmail.com>
Acked-by: Felipe Balbi <balbi@kernel.org>
Acked-by: Florian Westphal <fw@strlen.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Hannes Reinecke <hare@suse.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Acked-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Ivan Safonov <insafonov@gmail.com>
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Acked-by: Jan Kara (SUSE) <jack@suse.cz>
Acked-by: Javier Martinez Canillas <javier@dowhile0.org>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Joe Perches <joe@perches.com>
Acked-by: Joerg Roedel (SUSE) <jroedel@suse.de>
Acked-by: Johan Hovold <johan@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Acked-by: Khalid Aziz <khalid@gonehiking.org>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Acked-by: Laura Abbott <laura@labbott.name>
Acked-by: Lee Jones <lee.jones@linaro.org>
Acked-by: Leon Romanovsky <leon@kernel.org>
Acked-by: Linus Walleij (Linaro) <linus.walleij@linaro.org>
Acked-by: Lv Zheng <zetalog@gmail.com>
Acked-by: Martin K. Petersen (Oracle) <martin.petersen@oracle.com>
Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mike Marshall <hubcap@omnibond.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Neil Armstrong <narmstrong@baylibre.com>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Paul Burton <paul.burton@mips.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Richard Weinberger <richard@nod.at>
Acked-by: Rik van Riel <riel@surriel.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Sebastian Reichel (Collabora) <sre@kernel.org>
Acked-by: Shawn Guo <shawnguo@kernel.org>
Acked-by: Shuah Khan <shuahkh@osg.samsung.com>
Acked-by: Simon Horman <horms@verge.net.au>
Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Sven Eckelmann <sven@narfation.org>
Acked-by: Takashi Iwai (SUSE) <tiwai@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Thierry Reding <thierry.reding@gmail.com>
Acked-by: Tony Luck <tony.luck@gmail.com>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Vinod Koul <vkoul@kernel.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
zipl from s390-tools generates root=/dev/ram0 kernel cmdline for
zfcpdump, thus BLK_DEV_RAM is required.
zfcpdump initrd mounts DEBUG_FS, thus is also required.
Bug-Ubuntu: https://launchpad.net/bugs/1722735
Bug-Ubuntu: https://launchpad.net/bugs/1719290
Signed-off-by: Dimitri John Ledkov <xnox@ubuntu.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
On CPU hotplug some cpu stats contain bogus values:
$ cat /proc/stat
cpu 0 0 49 1280 0 0 0 3 0 0
cpu0 0 0 49 618 0 0 0 3 0 0
cpu1 0 0 0 662 0 0 0 0 0 0
[...]
$ echo 0 > /sys/devices/system/cpu/cpu1/online
$ echo 1 > /sys/devices/system/cpu/cpu1/online
$ cat /proc/stat
cpu 0 0 49 3200 0 450359962737 450359962737 3 0 0
cpu0 0 0 49 1956 0 0 0 3 0 0
cpu1 0 0 0 1244 0 450359962737 450359962737 0 0 0
[...]
pcpu_attach_task() needs the same assignments as vtime_task_switch.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: b7394a5f4ce9 ("sched/cputime, s390: Implement delayed accounting of system time")
Cc: stable@vger.kernel.org # 4.11+
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
On CPUs like AMD's Geode, for example, we shouldn't even try to load
microcode because they do not support the modern microcode loading
interface.
However, we do the family check *after* the other checks whether the
loader has been disabled on the command line or whether we're running in
a guest.
So move the family checks first in order to exit early if we're being
loaded on an unsupported family.
Reported-and-tested-by: Sven Glodowski <glodi1@arcor.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # 4.11..
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://bugzilla.suse.com/show_bug.cgi?id=1061396
Link: http://lkml.kernel.org/r/20171012112316.977-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Johan Hovold reported a big lockdep slowdown on his system, caused by lockdep:
> I had noticed that the BeagleBone Black boot time appeared to have
> increased significantly with 4.14 and yesterday I finally had time to
> investigate it.
>
> Boot time (from "Linux version" to login prompt) had in fact doubled
> since 4.13 where it took 17 seconds (with my current config) compared to
> the 35 seconds I now see with 4.14-rc4.
>
> I quick bisect pointed to lockdep and specifically the following commit:
>
> 28a903f63ec0 ("locking/lockdep: Handle non(or multi)-acquisition of a crosslock")
Because the final v4.14 release is close, disable the cross-release lockdep
features for now.
Bisected-by: Johan Hovold <johan@kernel.org>
Debugged-by: Johan Hovold <johan@kernel.org>
Reported-by: Johan Hovold <johan@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Lindgren <tony@atomide.com>
Cc: kernel-team@lge.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mm@kvack.org
Cc: linux-omap@vger.kernel.org
Link: http://lkml.kernel.org/r/20171014072659.f2yr6mhm5ha3eou7@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Since commit:
94b1b03b519b ("x86/mm: Rework lazy TLB mode and TLB freshness tracking")
x86's lazy TLB mode has been all the way lazy: when running a kernel thread
(including the idle thread), the kernel keeps using the last user mm's
page tables without attempting to maintain user TLB coherence at all.
From a pure semantic perspective, this is fine -- kernel threads won't
attempt to access user pages, so having stale TLB entries doesn't matter.
Unfortunately, I forgot about a subtlety. By skipping TLB flushes,
we also allow any paging-structure caches that may exist on the CPU
to become incoherent. This means that we can have a
paging-structure cache entry that references a freed page table, and
the CPU is within its rights to do a speculative page walk starting
at the freed page table.
I can imagine this causing two different problems:
- A speculative page walk starting from a bogus page table could read
IO addresses. I haven't seen any reports of this causing problems.
- A speculative page walk that involves a bogus page table can install
garbage in the TLB. Such garbage would always be at a user VA, but
some AMD CPUs have logic that triggers a machine check when it notices
these bogus entries. I've seen a couple reports of this.
Boris further explains the failure mode:
> It is actually more of an optimization which assumes that paging-structure
> entries are in WB DRAM:
>
> "TlbCacheDis: cacheable memory disable. Read-write. 0=Enables
> performance optimization that assumes PML4, PDP, PDE, and PTE entries
> are in cacheable WB-DRAM; memory type checks may be bypassed, and
> addresses outside of WB-DRAM may result in undefined behavior or NB
> protocol errors. 1=Disables performance optimization and allows PML4,
> PDP, PDE and PTE entries to be in any memory type. Operating systems
> that maintain page tables in memory types other than WB- DRAM must set
> TlbCacheDis to insure proper operation."
>
> The MCE generated is an NB protocol error to signal that
>
> "Link: A specific coherent-only packet from a CPU was issued to an
> IO link. This may be caused by software which addresses page table
> structures in a memory type other than cacheable WB-DRAM without
> properly configuring MSRC001_0015[TlbCacheDis]. This may occur, for
> example, when page table structure addresses are above top of memory. In
> such cases, the NB will generate an MCE if it sees a mismatch between
> the memory operation generated by the core and the link type."
>
> I'm assuming coherent-only packets don't go out on IO links, thus the
> error.
To fix this, reinstate TLB coherence in lazy mode. With this patch
applied, we do it in one of two ways:
- If we have PCID, we simply switch back to init_mm's page tables
when we enter a kernel thread -- this seems to be quite cheap
except for the cost of serializing the CPU.
- If we don't have PCID, then we set a flag and switch to init_mm
the first time we would otherwise need to flush the TLB.
The /sys/kernel/debug/x86/tlb_use_lazy_mode debug switch can be changed
to override the default mode for benchmarking.
In theory, we could optimize this better by only flushing the TLB in
lazy CPUs when a page table is freed. Doing that would require
auditing the mm code to make sure that all page table freeing goes
through tlb_remove_page() as well as reworking some data structures
to implement the improved flush logic.
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Reported-by: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Johannes Hirte <johannes.hirte@datenkhaos.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 94b1b03b519b ("x86/mm: Rework lazy TLB mode and TLB freshness tracking")
Link: http://lkml.kernel.org/r/20171009170231.fkpraqokz6e4zeco@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
When the VMA based swap readahead was introduced, a new knob
/sys/kernel/mm/swap/vma_ra_max_order
was added as the max window of VMA swap readahead. This is to make it
possible to use different max window for VMA based readahead and
original physical readahead. But Minchan Kim pointed out that this will
cause a regression because setting page-cluster sysctl to zero cannot
disable swap readahead with the change.
To fix the regression, the page-cluster sysctl is used as the max window
of both the VMA based swap readahead and original physical swap
readahead. If more fine grained control is needed in the future, more
knobs can be added as the subordinate knobs of the page-cluster sysctl.
The vma_ra_max_order knob is deleted. Because the knob was introduced
in v4.14-rc1, and this patch is targeting being merged before v4.14
releasing, there should be no existing users of this newly added ABI.
Link: http://lkml.kernel.org/r/20171011070847.16003-1-ying.huang@intel.com
Fixes: ec560175c0b6fce ("mm, swap: VMA based swap readahead")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reported-by: Minchan Kim <minchan@kernel.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Loading the pmd without holding the pmd_lock exposes us to races with
concurrent updaters of the page tables but, worse still, it also allows
the compiler to cache the pmd value in a register and reuse it later on,
even if we've performed a READ_ONCE in between and seen a more recent
value.
In the case of page_vma_mapped_walk, this leads to the following crash
when the pmd loaded for the initial pmd_trans_huge check is all zeroes
and a subsequent valid table entry is loaded by check_pmd. We then
proceed into map_pte, but the compiler re-uses the zero entry inside
pte_offset_map, resulting in a junk pointer being installed in
pvmw->pte:
PC is at check_pte+0x20/0x170
LR is at page_vma_mapped_walk+0x2e0/0x540
[...]
Process doio (pid: 2463, stack limit = 0xffff00000f2e8000)
Call trace:
check_pte+0x20/0x170
page_vma_mapped_walk+0x2e0/0x540
page_mkclean_one+0xac/0x278
rmap_walk_file+0xf0/0x238
rmap_walk+0x64/0xa0
page_mkclean+0x90/0xa8
clear_page_dirty_for_io+0x84/0x2a8
mpage_submit_page+0x34/0x98
mpage_process_page_bufs+0x164/0x170
mpage_prepare_extent_to_map+0x134/0x2b8
ext4_writepages+0x484/0xe30
do_writepages+0x44/0xe8
__filemap_fdatawrite_range+0xbc/0x110
file_write_and_wait_range+0x48/0xd8
ext4_sync_file+0x80/0x4b8
vfs_fsync_range+0x64/0xc0
SyS_msync+0x194/0x1e8
This patch fixes the problem by ensuring that READ_ONCE is used before
the initial checks on the pmd, and this value is subsequently used when
checking whether or not the pmd is present. pmd_check is removed and
the pmd_present check is inlined directly.
Link: http://lkml.kernel.org/r/1507222630-5839-1-git-send-email-will.deacon@arm.com
Fixes: f27176cfc363 ("mm: convert page_mkclean_one() to use page_vma_mapped_walk()")
Signed-off-by: Will Deacon <will.deacon@arm.com>
Tested-by: Yury Norov <ynorov@caviumnetworks.com>
Tested-by: Richard Ruigrok <rruigrok@codeaurora.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Kmemleak considers any pointers on task stacks as references. This
patch clears newly allocated and reused vmap stacks.
Link: http://lkml.kernel.org/r/150728990124.744199.8403409836394318684.stgit@buzz
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
inode->i_private is assigned by a Node pointer only after registering a
new binary format, so it could be NULL if inode was created by
bm_fill_super() (or iput() was called by the error path in
bm_register_write()), and this could result in NULL pointer dereference
when evicting such an inode. e.g. mount binfmt_misc filesystem then
umount it immediately:
mount -t binfmt_misc binfmt_misc /proc/sys/fs/binfmt_misc
umount /proc/sys/fs/binfmt_misc
will result in
BUG: unable to handle kernel NULL pointer dereference at 0000000000000013
IP: bm_evict_inode+0x16/0x40 [binfmt_misc]
...
Call Trace:
evict+0xd3/0x1a0
iput+0x17d/0x1d0
dentry_unlink_inode+0xb9/0xf0
__dentry_kill+0xc7/0x170
shrink_dentry_list+0x122/0x280
shrink_dcache_parent+0x39/0x90
do_one_tree+0x12/0x40
shrink_dcache_for_umount+0x2d/0x90
generic_shutdown_super+0x1f/0x120
kill_litter_super+0x29/0x40
deactivate_locked_super+0x43/0x70
deactivate_super+0x45/0x60
cleanup_mnt+0x3f/0x70
__cleanup_mnt+0x12/0x20
task_work_run+0x86/0xa0
exit_to_usermode_loop+0x6d/0x99
syscall_return_slowpath+0xba/0xf0
entry_SYSCALL_64_fastpath+0xa3/0xa
Fix it by making sure Node (e) is not NULL.
Link: http://lkml.kernel.org/r/20171010100642.31786-1-eguan@redhat.com
Fixes: 83f918274e4b ("exec: binfmt_misc: shift filp_close(interp_file) from kill_node() to bm_evict_inode()")
Signed-off-by: Eryu Guan <eguan@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When using FAT on a block device which supports rw_page, we can hit
BUG_ON(!PageLocked(page)) in try_to_free_buffers(). This is because we
call clean_buffers() after unlocking the page we've written. Introduce
a new clean_page_buffers() which cleans all buffers associated with a
page and call it from within bdev_write_page().
[akpm@linux-foundation.org: s/PAGE_SIZE/~0U/ per Linus and Matthew]
Link: http://lkml.kernel.org/r/20171006211541.GA7409@bombadil.infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reported-by: Toshi Kani <toshi.kani@hpe.com>
Reported-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Tested-by: Toshi Kani <toshi.kani@hpe.com>
Acked-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Add kernel-doc notation for some macros. Correct kernel-doc comments &
typos for a few macros.
Link: http://lkml.kernel.org/r/76fa1403-1511-be4c-e9c4-456b43edfad3@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We have seen NULL-pointer dereference crashes in tty->disc_data when the
N_TTY fallback driver failed to open during hangup. The immediate cause
of this open to fail has been addressed in the preceding patch to
vmalloc(), but this code could be more robust.
As Alan pointed out in commit 8a8dabf2dd68 ("tty: handle the case where
we cannot restore a line discipline"), the N_TTY driver, historically
the safe fallback that could never fail, can indeed fail, but the
surrounding code is not prepared to handle this. To avoid crashes he
added a new N_NULL driver to take N_TTY's place as the last resort.
Hook that fallback up to the hangup path. Update tty_ldisc_reinit() to
reflect the reality that n_tty_open can indeed fail.
Link: http://lkml.kernel.org/r/20171004185959.GC2136@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Alan Cox <alan@llwyncelyn.cymru>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This reverts commits 5d17a73a2ebe ("vmalloc: back off when the current
task is killed") and 171012f56127 ("mm: don't warn when vmalloc() fails
due to a fatal signal").
Commit 5d17a73a2ebe ("vmalloc: back off when the current task is
killed") made all vmalloc allocations from a signal-killed task fail.
We have seen crashes in the tty driver from this, where a killed task
exiting tries to switch back to N_TTY, fails n_tty_open because of the
vmalloc failing, and later crashes when dereferencing tty->disc_data.
Arguably, relying on a vmalloc() call to succeed in order to properly
exit a task is not the most robust way of doing things. There will be a
follow-up patch to the tty code to fall back to the N_NULL ldisc.
But the justification to make that vmalloc() call fail like this isn't
convincing, either. The patch mentions an OOM victim exhausting the
memory reserves and thus deadlocking the machine. But the OOM killer is
only one, improbable source of fatal signals. It doesn't make sense to
fail allocations preemptively with plenty of memory in most cases.
The patch doesn't mention real-life instances where vmalloc sites would
exhaust memory, which makes it sound more like a theoretical issue to
begin with. But just in case, the OOM access to memory reserves has
been restricted on the allocator side in cd04ae1e2dc8 ("mm, oom: do not
rely on TIF_MEMDIE for memory reserves access"), which should take care
of any theoretical concerns on that front.
Revert this patch, and the follow-up that suppresses the allocation
warnings when we fail the allocations due to a signal.
Link: http://lkml.kernel.org/r/20171004185906.GB2136@cmpxchg.org
Fixes: 171012f56127 ("mm: don't warn when vmalloc() fails due to a fatal signal")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alan Cox <alan@llwyncelyn.cymru>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
cma_alloc() unconditionally prints an INFO message when the CMA
allocation fails. Make this message conditional on the non-presence of
__GFP_NOWARN in gfp_mask.
This patch aims at removing INFO messages that are displayed when the
VC4 driver tries to allocate buffer objects. From the driver
perspective an allocation failure is acceptable, and the driver can
possibly do something to make following allocation succeed (like
flushing the VC4 internal cache).
Link: http://lkml.kernel.org/r/20171004125447.15195-1-boris.brezillon@free-electrons.com
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Acked-by: Laura Abbott <labbott@redhat.com>
Cc: Jaewon Kim <jaewon31.kim@samsung.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Eric Anholt <eric@anholt.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
gcc on aarch64 may emit synbols of type 'n' if the kernel is built with
'-frecord-gcc-switches'. In most cases, those symbols are reported with
nm as
000000000000000e n $d
and with objdump as
0000000000000000 l d .GCC.command.line 0000000000000000 .GCC.command.line
000000000000000e l .GCC.command.line 0000000000000000 $d
Those symbols are detected in is_arm_mapping_symbol() and ignored.
However, if "--prefix-symbols=<prefix>" is configured as well, the
situation is different. For example, in efi/libstub, arm64 images are
built with
'--prefix-alloc-sections=.init --prefix-symbols=__efistub_'.
In combination with '-frecord-gcc-switches', the symbols are now reported
by nm as:
000000000000000e n __efistub_$d
and by objdump as:
0000000000000000 l d .GCC.command.line 0000000000000000 .GCC.command.line
000000000000000e l .GCC.command.line 0000000000000000 __efistub_$d
Those symbols are no longer ignored and included in the base address
calculation. This results in a base address of 000000000000000e, which
in turn causes kallsyms to abort with
kallsyms failure:
relative symbol value 0xffffff900800a000 out of range in relative mode
The problem is seen in little endian arm64 builds with CONFIG_EFI
enabled and with '-frecord-gcc-switches' set in KCFLAGS.
Explicitly ignore symbols of type 'n' since those are clearly debug
symbols.
Link: http://lkml.kernel.org/r/1507136063-3139-1-git-send-email-linux@roeck-us.net
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
I was stress testing some backports and with high load, after some time,
the latest version of the selftest showed some false positive in
connection with the uffdio_copy_retry. This seems to fix it while still
exercising -EEXIST in the background transfer once in a while.
The fork child will quit after the last UFFDIO_COPY is run, so a
repeated UFFDIO_COPY may not return -EEXIST. This change restricts the
-EEXIST stress to the background transfer where the memory can't go away
from under it.
Also updated uffdio_zeropage, so the interface is consistent.
Link: http://lkml.kernel.org/r/20171004171541.1495-2-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When I execute numactl -H (which reads /sys/devices/system/node/nodeX/cpumap
and displays cpumask_of_node for each node), I get different result
on X86 and arm64. For each numa node, the former only displayed online
CPUs, and the latter displayed all possible CPUs. Unfortunately, both
Linux documentation and numactl manual have not described it clear.
I sent a mail to ask for help, and Michal Hocko replied that he
preferred to print online cpus because it doesn't really make much sense
to bind anything on offline nodes.
Will said:
"I suspect the vast majority (if not all) code that reads this file was
developed for x86, so having the same behaviour for arm64 sounds like
something we should do ASAP before people try to special case with
things like #ifdef __aarch64__. I'd rather have this in 4.14 if
possible."
Link: http://lkml.kernel.org/r/1506678805-15392-2-git-send-email-thunder.leizhen@huawei.com
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Tianhong Ding <dingtianhong@huawei.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Libin <huawei.libin@huawei.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
A non present pmd entry can appear after pmd_lock is taken in
page_vma_mapped_walk(), even if THP migration is not enabled. The
WARN_ONCE is unnecessary.
Link: http://lkml.kernel.org/r/20171003142606.12324-1-zi.yan@sent.com
Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path")
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Tested-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 3a321d2a3dde ("mm: change the call sites of numa statistics
items") separated NUMA counters from zone counters, but the
NUMA_INTERLEAVE_HIT call site wasn't updated to use the new interface.
So alloc_page_interleave() actually increments NR_ZONE_INACTIVE_FILE
instead of NUMA_INTERLEAVE_HIT.
Fix this by using __inc_numa_state() interface to increment
NUMA_INTERLEAVE_HIT.
Link: http://lkml.kernel.org/r/20171003191003.8573-1-aryabinin@virtuozzo.com
Fixes: 3a321d2a3dde ("mm: change the call sites of numa statistics items")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Kemi Wang <kemi.wang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The pci-rcar driver is enabled for compile tests, and this has shown that
the driver cannot build without CONFIG_OF, following the inclusion of
commit f8f2fe7355fb ("PCI: rcar: Use new OF interrupt mapping when possible"):
drivers/pci/host/pcie-rcar.c: In function 'pci_dma_range_parser_init':
drivers/pci/host/pcie-rcar.c:1039:2: error: implicit declaration of function 'of_n_addr_cells' [-Werror=implicit-function-declaration]
parser->pna = of_n_addr_cells(node);
^
As pointed out by Ben Dooks and Geert Uytterhoeven, this is actually
supposed to build fine, which we can achieve if we make the declaration
of of_irq_parse_and_map_pci conditional on CONFIG_OF and provide an
empty inline function otherwise, as we do for a lot of other of
interfaces.
This lets us build the rcar_pci driver again without CONFIG_OF for build
testing. All platforms using this driver select OF, so this doesn't
change anything for the users.
[akpm@linux-foundation.org: be consistent with surrounding code]
Link: http://lkml.kernel.org/r/20170911200805.3363318-1-arnd@arndb.de
Fixes: c25da4778803 ("PCI: rcar: Add Renesas R-Car PCIe driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Frank Rowand <frank.rowand@sony.com>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Magnus Damm <damm@opensource.se>
Cc: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
mm/madvise.c has a brief description about all MADV_ flags. Add a
description for the newly added MADV_WIPEONFORK and MADV_KEEPONFORK.
Although man page has the similar information, but it'd better to keep
the consistent with other flags.
Link: http://lkml.kernel.org/r/1506117328-88228-1-git-send-email-yang.s@alibaba-inc.com
Signed-off-by: Yang Shi <yang.s@alibaba-inc.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Expand the "Runtime testing" menu by including more entries inside it
instead of after it. This is just Kconfig symbol movement.
This causes the (arch-independent) Runtime tests to be presented
(listed) all in one place instead of in multiple places.
Link: http://lkml.kernel.org/r/c194e5c4-2042-bf94-a2d8-7aa13756e257@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|