path: root/tools/scripts
Age  Commit message  (Author; files changed, lines removed/added)
2019-08-31  tools/power turbostat: Add support for Hygon Fam 18h (Dhyana) RAPL  (Pu Wen; 1 file, -2/+7)
Commit 9392bd98bba760be96ee ("tools/power turbostat: Add support for AMD Fam 17h (Zen) RAPL") and the commit 3316f99a9f1b68c578c5 ("tools/power turbostat: Also read package power on AMD F17h (Zen)") add AMD Fam 17h RAPL support. Hygon Family 18h(Dhyana) support RAPL in bit 14 of CPUID 0x80000007 EDX, and has MSRs RAPL_PWR_UNIT/CORE_ENERGY_STAT/PKG_ENERGY_STAT. So add Hygon Dhyana Family 18h support for RAPL. Already tested on Hygon multi-node systems and it shows correct per-core energy usage and the total package power. Signed-off-by: Pu Wen <puwen@hygon.cn> Reviewed-by: Calvin Walton <calvin.walton@kepstin.ca> Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31  tools/power turbostat: Fix caller parameter of get_tdp_amd()  (Pu Wen; 1 file, -1/+1)
Commit 9392bd98bba760be96ee ("tools/power turbostat: Add support for AMD Fam 17h (Zen) RAPL") add a function get_tdp_amd(), the parameter is CPU family. But the rapl_probe_amd() function use wrong model parameter. Fix the wrong caller parameter of get_tdp_amd() to use family. Cc: <stable@vger.kernel.org> # v5.1+ Signed-off-by: Pu Wen <puwen@hygon.cn> Reviewed-by: Calvin Walton <calvin.walton@kepstin.ca> Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31  tools/power turbostat: Fix CPU%C1 display value  (Srinivas Pandruvada; 1 file, -6/+17)
In some cases CPU%c1 shows a wrong value when the platform doesn't have an MSR for C1 residency. For example:

Core CPU CPU%c1
-    -   100.00
0    0   100.00
0    2   100.00
1    1   100.00
1    3   100.00

But adding Busy% fixes this:

Core CPU Busy% CPU%c1
-    -   99.77 0.23
0    0   99.77 0.23
0    2   99.77 0.23
1    1   99.77 0.23
1    3   99.77 0.23

This issue can be reproduced on most of the recent systems including Broadwell, Skylake and later. This is because if we don't select Busy%, Avg_MHz or Bzy_MHz, the mperf value will not be read from the MSR, so it will be 0. But it is required for the CPU%c1 calculation when the MSR for C1 residency is not present. The same is true for the C3, C6 and C7 column selection. So add another define, DO_BIC_READ(), which doesn't depend on user column selection, and use it for the mperf, C3, C6 and C7 related counters. So when there is no platform support for C1 residency counters, we still read these counters if the CPU supports them and the user selected display of CPU%c1. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
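A minimal sketch of the derived-C1 calculation (illustrative names, not necessarily turbostat's exact code):

  /* C1 is roughly whatever part of the interval is not accounted to
   * C0 (mperf) or the deeper C-state residency counters; all values
   * are per-interval deltas. */
  static unsigned long long derived_c1(unsigned long long tsc,
                                       unsigned long long mperf,
                                       unsigned long long c3,
                                       unsigned long long c6,
                                       unsigned long long c7)
  {
          long long c1 = (long long)(tsc - mperf - c3 - c6 - c7);

          return c1 > 0 ? (unsigned long long)c1 : 0;  /* clamp rounding noise */
  }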
2019-08-31  tools/power turbostat: do not enforce 1ms  (Artem Bityutskiy; 1 file, -5/+0)
Turbostat works by taking a snapshot of counters, sleeping, taking another snapshot, calculating deltas, and printing out the table. The sleep time is controlled via the -i option or by the user sending a signal or a character to stdin. In the latter case, turbostat always adds a 1 ms sleep before it reads the counters, in order to avoid larger imprecisions in the results it prints. While the 1 ms delay may be a good idea for a "dumb" user, it is a problem for an "aware" user. I do thousands and thousands of measurements over a short period of time (like 2 ms), and turbostat unconditionally adds 1 ms to my interval, so I cannot get what I really need. This patch removes the unconditional 1 ms sleep. This is an expert user tool, after all, and non-experts are unlikely to ever use it in the non-fixed interval mode anyway, so I think it is OK to remove the 1 ms delay. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31  tools/power turbostat: read from pipes too  (Artem Bityutskiy; 1 file, -4/+16)
Commit 47936f944e78 ("tools/power turbostat: fix printing on input") made a valid fix, but it completely disabled piped stdin support, which is a valuable use-case. Indeed, if stdin is a pipe, turbostat won't read anything from it, so it becomes impossible to get turbostat output at user-defined moments instead of at the regular intervals. There is no reason why this should work for terminals but not for pipes. This patch improves the situation. Instead of ignoring pipes, we read data from them but gracefully handle the EOF case. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31  tools/power turbostat: Add Ice Lake NNPI support  (Rajneesh Bhardwaj; 1 file, -0/+1)
This enables turbostat utility on Ice Lake NNPI SoC. Link: https://lkml.org/lkml/2019/6/5/1034 Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31  tools/power turbostat: rename has_hsw_msrs()  (Len Brown; 1 file, -4/+4)
Perhaps if this more descriptive name had been used, then we wouldn't have had the HSW ULT vs HSW CORE bug, fixed by the previous commit. Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31  tools/power turbostat: Fix Haswell Core systems  (Len Brown; 1 file, -4/+6)
turbostat: cpu0: msr offset 0x630 read failed: Input/output error

This happens because Haswell Core does not have C8-C10. Output C8-C10 only on Haswell ULT. Fixes: f5a4c76ad7de ("tools/power turbostat: consolidate duplicate model numbers") Reported-by: Prarit Bhargava <prarit@redhat.com> Suggested-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com> Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31  tools/power turbostat: add Jacobsville support  (Zhang Rui; 1 file, -0/+3)
Jacobsville behaves like Denverton. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31  tools/power turbostat: fix buffer overrun  (Naoya Horiguchi; 1 file, -1/+1)
turbostat could be terminated by a general protection fault on some recent hardware which (for example) supports 9 levels of C-states and shows 18 "tADDED" lines. That bloats the total output and finally causes a buffer overrun. So let's extend the buffer to avoid this. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31  tools/power turbostat: fix file descriptor leaks  (Gustavo A. R. Silva; 1 file, -0/+1)
Fix file descriptor leaks by closing fp before return. Addresses-Coverity-ID: 1444591 ("Resource leak") Addresses-Coverity-ID: 1444592 ("Resource leak") Fixes: 5ea7647b333f ("tools/power turbostat: Warn on bad ACPI LPIT data") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31  tools/power turbostat: fix leak of file descriptor on error return path  (Colin Ian King; 1 file, -0/+1)
Currently the error return path does not close the file fp and leaks a file descriptor. Fix this by closing the file. Fixes: 5ea7647b333f ("tools/power turbostat: Warn on bad ACPI LPIT data") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Len Brown <len.brown@intel.com>
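Both LPIT leak fixes above follow the same pattern; a minimal user-space sketch of it (the path and format string are made up for illustration):

  #include <stdio.h>

  static int read_counter(const char *path, unsigned long long *val)
  {
          FILE *fp = fopen(path, "r");

          if (!fp)
                  return -1;

          if (fscanf(fp, "%llx", val) != 1) {
                  fclose(fp);     /* the error path previously leaked fp */
                  return -1;
          }

          fclose(fp);
          return 0;
  }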
2019-08-31  tools/power turbostat: Make interval calculation per thread to reduce jitter  (Yazen Ghannam; 1 file, -3/+10)
Turbostat currently normalizes TSC and other values by dividing by an interval. This interval is the delta between the start of one global (all counters on all CPUs) sampling and the start of another. However, this introduces a lot of jitter into the data. In order to reduce jitter, the interval calculation should be based on timestamps taken per thread and close to the start of the thread's sampling. Define a per thread time value to hold the delta between samples taken on the thread. Use the timestamp taken at the beginning of sampling to calculate the delta. Move the thread's beginning timestamp to after the CPU migration to avoid jitter due to the migration. Use the global time delta for the average time delta. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31  tools/power turbostat: remove duplicate pc10 column  (Len Brown; 1 file, -1/+0)
Remove the duplicate pc10 column. Fixes: be0e54c4ebbf ("turbostat: Build-in "Low Power Idle" counters support") Reported-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31  tools/power x86_energy_perf_policy: Fix argument parsing  (Zephaniah E. Loss-Cutler-Hull; 1 file, -1/+1)
The -w argument in x86_energy_perf_policy currently triggers an unconditional segfault. This is because the argument string reads: "+a:c:dD:E:e:f:m:M:rt:u:vw" and yet the argument handler expects an argument. When parse_optarg_string is called with a null argument, we then proceed to crash in strncmp, not horribly friendly. The man page describes -w as taking an argument, the long form (--hwp-window) is correctly marked as taking a required argument, and the code expects it. As such, this patch simply marks the short form (-w) as requiring an argument. Signed-off-by: Zephaniah E. Loss-Cutler-Hull <zephaniah@gmail.com> Signed-off-by: Len Brown <len.brown@intel.com>
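A minimal sketch of the getopt() behaviour involved (simplified, not the tool's real option table): a short option only receives an argument when it is followed by ':' in the option string; without it, optarg stays NULL and strncmp() on it faults.

  #include <stdio.h>
  #include <unistd.h>

  int main(int argc, char **argv)
  {
          int opt;

          /* "w:" makes -w take a required argument; a plain "w" would
           * leave optarg == NULL, which is what caused the segfault. */
          while ((opt = getopt(argc, argv, "w:")) != -1) {
                  if (opt == 'w')
                          printf("hwp-window = %s\n", optarg);
          }
          return 0;
  }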
2019-08-31  tools/power: Fix typo in man page  (Matt Lupfer; 1 file, -1/+1)
From context, we mean EPB (Energy Performance Bias). Signed-off-by: Matt Lupfer <mlupfer@ddn.com> Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31  tools/power/x86: Enable compiler optimisations and Fortify by default  (Ben Hutchings; 2 files, -2/+4)
Compiling without optimisations is silly, especially since some warnings depend on the optimiser. Use -O2. Fortify adds warnings for unchecked I/O (among other things), which seems to be a good idea for user-space code. Enable that too. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Len Brown <len.brown@intel.com>
2019-08-31tools/power x86_energy_perf_policy: Fix "uninitialized variable" warnings at -O2Ben Hutchings1-11/+15
x86_energy_perf_policy first uses __get_cpuid() to check the maximum CPUID level and exits if it is too low. It then assumes that later calls will succeed (which I think is architecturally guaranteed). It also assumes that CPUID works at all (which is not guaranteed on x86_32). If optimisations are enabled, gcc warns about potentially uninitialized variables. Fix this by adding an exit-on-error after every call to __get_cpuid() instead of just checking the maximum level. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Len Brown <len.brown@intel.com>
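A minimal sketch of the exit-on-error pattern with __get_cpuid() (the leaf number is only an example):

  #include <cpuid.h>
  #include <err.h>

  static unsigned int cpuid_leaf6_eax(void)
  {
          unsigned int eax, ebx, ecx, edx;

          /* __get_cpuid() returns 0 if the leaf is unsupported; bail out
           * instead of using uninitialized output values. */
          if (!__get_cpuid(0x6, &eax, &ebx, &ecx, &edx))
                  errx(1, "CPUID leaf 0x6 not supported");

          return eax;
  }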
2019-08-31  tracing: Correct kdoc formats  (Jakub Kicinski; 1 file, -12/+14)
Fix the following kdoc warnings:

kernel/trace/trace.c:1579: warning: Function parameter or member 'tr' not described in 'update_max_tr_single'
kernel/trace/trace.c:1579: warning: Function parameter or member 'tsk' not described in 'update_max_tr_single'
kernel/trace/trace.c:1579: warning: Function parameter or member 'cpu' not described in 'update_max_tr_single'
kernel/trace/trace.c:1776: warning: Function parameter or member 'type' not described in 'register_tracer'
kernel/trace/trace.c:2239: warning: Function parameter or member 'task' not described in 'tracing_record_taskinfo'
kernel/trace/trace.c:2239: warning: Function parameter or member 'flags' not described in 'tracing_record_taskinfo'
kernel/trace/trace.c:2269: warning: Function parameter or member 'prev' not described in 'tracing_record_taskinfo_sched_switch'
kernel/trace/trace.c:2269: warning: Function parameter or member 'next' not described in 'tracing_record_taskinfo_sched_switch'
kernel/trace/trace.c:2269: warning: Function parameter or member 'flags' not described in 'tracing_record_taskinfo_sched_switch'
kernel/trace/trace.c:3078: warning: Function parameter or member 'ip' not described in 'trace_vbprintk'
kernel/trace/trace.c:3078: warning: Function parameter or member 'fmt' not described in 'trace_vbprintk'
kernel/trace/trace.c:3078: warning: Function parameter or member 'args' not described in 'trace_vbprintk'

Link: http://lkml.kernel.org/r/20190828052549.2472-2-jakub.kicinski@netronome.com Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-08-31  ftrace/x86: Remove mcount() declaration  (Jisheng Zhang; 1 file, -1/+0)
Commit 562e14f72292 ("ftrace/x86: Remove mcount support") removed the support for using mcount, so we could remove the mcount() declaration to clean up. Link: http://lkml.kernel.org/r/20190826170150.10f101ba@xhacker.debian Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-08-31  tracing/probe: Fix null pointer dereference  (Xinpeng Liu; 1 file, -1/+2)
BUG: KASAN: null-ptr-deref in trace_probe_cleanup+0x8d/0xd0
Read of size 8 at addr 0000000000000000 by task syz-executor.0/9746

 trace_probe_cleanup+0x8d/0xd0
 free_trace_kprobe.part.14+0x15/0x50
 alloc_trace_kprobe+0x23e/0x250

Link: http://lkml.kernel.org/r/1565220563-980-1-git-send-email-danielliu861@gmail.com Fixes: e3dc9f898ef9c ("tracing/probe: Add trace_event_call accesses APIs") Signed-off-by: Xinpeng Liu <danielliu861@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-08-31  tracing: Make exported ftrace_set_clr_event non-static  (Denis Efremov; 2 files, -1/+2)
The function ftrace_set_clr_event is declared static and marked EXPORT_SYMBOL_GPL(), which is at best an odd combination. Because the function was decided to be a part of API, this commit removes the static attribute and adds the declaration to the header. Link: http://lkml.kernel.org/r/20190704172110.27041-1-efremov@linux.com Fixes: f45d1225adb04 ("tracing: Kernel access to Ftrace instances") Reviewed-by: Joe Jin <joe.jin@oracle.com> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-08-30Partially revert "kfifo: fix kfifo_alloc() and kfifo_init()"Linus Torvalds1-1/+2
Commit dfe2a77fd243 ("kfifo: fix kfifo_alloc() and kfifo_init()") made the kfifo code round the number of elements up. That was good for __kfifo_alloc(), but it's actually wrong for __kfifo_init(). The difference? __kfifo_alloc() will allocate the rounded-up number of elements, but __kfifo_init() uses an allocation done by the caller. We can't just say "use more elements than the caller allocated", and have to round down. The good news? All the normal cases will be using power-of-two arrays anyway, and most users of kfifo's don't use kfifo_init() at all, but one of the helper macros to declare a KFIFO that enforce the proper power-of-two behavior. But it looks like at least ibmvscsis might be affected. The bad news? Will Deacon refers to an old thread and points points out that the memory ordering in kfifo's is questionable. See https://lore.kernel.org/lkml/20181211034032.32338-1-yuleixzhang@tencent.com/ for more. Fixes: dfe2a77fd243 ("kfifo: fix kfifo_alloc() and kfifo_init()") Reported-by: laokz <laokz@foxmail.com> Cc: Stefani Seibold <stefani@seibold.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Greg KH <greg@kroah.com> Cc: Kees Cook <keescook@chromium.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-30  mm: memcontrol: fix percpu vmstats and vmevents flush  (Shakeel Butt; 1 file, -5/+5)
Instead of using raw_cpu_read() use per_cpu() to read the actual data of the corresponding cpu otherwise we will be reading the data of the current cpu for the number of online CPUs. Link: http://lkml.kernel.org/r/20190829203110.129263-1-shakeelb@google.com Fixes: bb65f89b7d3d ("mm: memcontrol: flush percpu vmevents before releasing memcg") Fixes: c350a99ea2b1 ("mm: memcontrol: flush percpu vmstats before releasing memcg") Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
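The difference, sketched with a made-up percpu counter (not the memcg code itself):

  #include <linux/percpu.h>
  #include <linux/cpumask.h>

  static DEFINE_PER_CPU(long, demo_stat);         /* hypothetical counter */

  static long demo_stat_sum(void)
  {
          long sum = 0;
          int cpu;

          /* per_cpu() reads each CPU's own copy; raw_cpu_read() would read
           * the *current* CPU's copy on every iteration - the bug fixed here. */
          for_each_online_cpu(cpu)
                  sum += per_cpu(demo_stat, cpu);

          return sum;
  }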
2019-08-30  mm, memcg: do not set reclaim_state on soft limit reclaim  (Michal Hocko; 1 file, -2/+3)
Adric Blake has noticed [1] the following warning:

WARNING: CPU: 7 PID: 175 at mm/vmscan.c:245 set_task_reclaim_state+0x1e/0x40
[...]
Call Trace:
 mem_cgroup_shrink_node+0x9b/0x1d0
 mem_cgroup_soft_limit_reclaim+0x10c/0x3a0
 balance_pgdat+0x276/0x540
 kswapd+0x200/0x3f0
 ? wait_woken+0x80/0x80
 kthread+0xfd/0x130
 ? balance_pgdat+0x540/0x540
 ? kthread_park+0x80/0x80
 ret_from_fork+0x35/0x40
---[ end trace 727343df67b2398a ]---

This tells us that soft limit reclaim is about to overwrite the reclaim_state configured further up the call chain (kswapd in this case, but direct reclaim is equally possible). This means that the reclaim stats would be misleading once the soft reclaim returns and another reclaim is done. Fix the warning by dropping set_task_reclaim_state from the soft reclaim path, which is always called with reclaim_state already set up. [1] http://lkml.kernel.org/r/CAE1jjeePxYPvw1mw2B3v803xHVR_BNnz0hQUY_JDMN8ny29M6w@mail.gmail.com Link: http://lkml.kernel.org/r/20190828071808.20410-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Adric Blake <promarbler14@gmail.com> Acked-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Yang Shi <yang.shi@linux.alibaba.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Hillf Danton <hdanton@sina.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-30  mailmap: add aliases for Dmitry Safonov  (Dmitry Safonov; 1 file, -0/+3)
I don't work for Virtuozzo or Samsung anymore and I've noticed that they have started sending annoying html email-replies. And I prioritize my personal emails over work email box, so while at it add an entry for Arista too - so I can reply faster when needed. Link: http://lkml.kernel.org/r/20190827220346.11123-1-dima@arista.com Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-30  mm/z3fold.c: fix lock/unlock imbalance in z3fold_page_isolate  (Gustavo A. R. Silva; 1 file, -0/+1)
Fix lock/unlock imbalance by unlocking *zhdr* before return. Addresses Coverity ID 1452811 ("Missing unlock") Link: http://lkml.kernel.org/r/20190826030634.GA4379@embeddedor Fixes: d776aaa9895e ("mm/z3fold.c: fix race between migration and destruction") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Henry Burns <henrywolfeburns@gmail.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-30  mm, memcg: partially revert "mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones"  (Roman Gushchin; 1 file, -5/+3)
Commit 766a4c19d880 ("mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones") effectively decreased the precision of per-memcg vmstats_local and per-memcg-per-node lruvec percpu counters. That's good for displaying in memory.stat, but brings a serious regression into the reclaim process. One issue I've discovered and debugged is the following: lruvec_lru_size() can return 0 instead of the actual number of pages in the lru list, preventing the kernel to reclaim last remaining pages. Result is yet another dying memory cgroups flooding. The opposite is also happening: scanning an empty lru list is the waste of cpu time. Also, inactive_list_is_low() can return incorrect values, preventing the active lru from being scanned and freed. It can fail both because the size of active and inactive lists are inaccurate, and because the number of workingset refaults isn't precise. In other words, the result is pretty random. I'm not sure, if using the approximate number of slab pages in count_shadow_number() is acceptable, but issues described above are enough to partially revert the patch. Let's keep per-memcg vmstat_local batched (they are only used for displaying stats to the userspace), but keep lruvec stats precise. This change fixes the dead memcg flooding on my setup. Link: http://lkml.kernel.org/r/20190817004726.2530670-1-guro@fb.com Fixes: 766a4c19d880 ("mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones") Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Yafang Shao <laoar.shao@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-30  mm/zsmalloc.c: fix build when CONFIG_COMPACTION=n  (Andrew Morton; 1 file, -0/+2)
Fixes: 701d678599d0c1 ("mm/zsmalloc.c: fix race condition in zs_destroy_pool") Link: http://lkml.kernel.org/r/201908251039.5oSbEEUT%25lkp@intel.com Reported-by: kbuild test robot <lkp@intel.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Henry Burns <henrywolfeburns@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Jonathan Adams <jwadams@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-30  mm: memcontrol: flush percpu slab vmstats on kmem offlining  (Roman Gushchin; 2 files, -10/+30)
I've noticed that the "slab" value in memory.stat is sometimes 0, even if some children memory cgroups have a non-zero "slab" value. The following investigation showed that this is the result of the kmem_cache reparenting in combination with the per-cpu batching of slab vmstats. At the offlining some vmstat value may leave in the percpu cache, not being propagated upwards by the cgroup hierarchy. It means that stats on ancestor levels are lower than actual. Later when slab pages are released, the precise number of pages is substracted on the parent level, making the value negative. We don't show negative values, 0 is printed instead. To fix this issue, let's flush percpu slab memcg and lruvec stats on memcg offlining. This guarantees that numbers on all ancestor levels are accurate and match the actual number of outstanding slab pages. Link: http://lkml.kernel.org/r/20190819202338.363363-3-guro@fb.com Fixes: fb2f2b0adb98 ("mm: memcg/slab: reparent memcg kmem_caches on cgroup removal") Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-30  ftrace: Check for successful allocation of hash  (Naveen N. Rao; 1 file, -0/+5)
In register_ftrace_function_probe(), we are not checking the return value of alloc_and_copy_ftrace_hash(). The subsequent call to ftrace_match_records() may end up dereferencing the same. Add a check to ensure this doesn't happen. Link: http://lkml.kernel.org/r/26e92574f25ad23e7cafa3cf5f7a819de1832cbe.1562249521.git.naveen.n.rao@linux.vnet.ibm.com Cc: stable@vger.kernel.org Fixes: 1ec3a81a0cf42 ("ftrace: Have each function probe use its own ftrace_ops") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-08-30  ftrace: Check for empty hash and comment the race with registering probes  (Steven Rostedt (VMware); 1 file, -1/+9)
The race between adding a function probe and reading the probes that exist is very subtle. It needs a comment. Also, the issue can also happen if the probe has the EMPTY_HASH as its func_hash. Cc: stable@vger.kernel.org Fixes: 7b60f3d876156 ("ftrace: Dynamically create the probe ftrace_ops for the trace_array") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-08-30  ftrace: Fix NULL pointer dereference in t_probe_next()  (Naveen N. Rao; 1 file, -0/+4)
LTP testsuite on powerpc results in the below crash:

Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc00000000029d800
Oops: Kernel access of bad area, sig: 11 [#1]
LE SMP NR_CPUS=2048 NUMA PowerNV
...
CPU: 68 PID: 96584 Comm: cat Kdump: loaded Tainted: G W
NIP: c00000000029d800 LR: c00000000029dac4 CTR: c0000000001e6ad0
REGS: c0002017fae8ba10 TRAP: 0300 Tainted: G W
MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28022422 XER: 20040000
CFAR: c00000000029d90c DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0
...
NIP [c00000000029d800] t_probe_next+0x60/0x180
LR [c00000000029dac4] t_mod_start+0x1a4/0x1f0
Call Trace:
[c0002017fae8bc90] [c000000000cdbc40] _cond_resched+0x10/0xb0 (unreliable)
[c0002017fae8bce0] [c0000000002a15b0] t_start+0xf0/0x1c0
[c0002017fae8bd30] [c0000000004ec2b4] seq_read+0x184/0x640
[c0002017fae8bdd0] [c0000000004a57bc] sys_read+0x10c/0x300
[c0002017fae8be30] [c00000000000b388] system_call+0x5c/0x70

The test (ftrace_set_ftrace_filter.sh) is part of ftrace stress tests and the crash happens when the test does 'cat $TRACING_PATH/set_ftrace_filter'. The address points to the second line below, in t_probe_next(), where filter_hash is dereferenced:

  hash = iter->probe->ops.func_hash->filter_hash;
  size = 1 << hash->size_bits;

This happens due to a race with register_ftrace_function_probe(). A new ftrace_func_probe is created and added into the func_probes list in trace_array under ftrace_lock. However, before initializing the filter, we drop ftrace_lock, and re-acquire it after acquiring regex_lock. If another process is trying to read set_ftrace_filter, it will be able to acquire ftrace_lock during this window and it will end up seeing a NULL filter_hash. Fix this by just checking for a NULL filter_hash in t_probe_next(). If the filter_hash is NULL, then this probe is just being added and we can simply return from here. Link: http://lkml.kernel.org/r/05e021f757625cbbb006fad41380323dbe4e3b43.1562249521.git.naveen.n.rao@linux.vnet.ibm.com Cc: stable@vger.kernel.org Fixes: 7b60f3d876156 ("ftrace: Dynamically create the probe ftrace_ops for the trace_array") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
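The shape of the added check, as a sketch (simplified from the patch):

  hash = iter->probe->ops.func_hash->filter_hash;
  if (!hash)      /* probe is still being registered; nothing to iterate yet */
          return NULL;
  size = 1 << hash->size_bits;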
2019-08-30  keys: ensure that ->match_free() is called in request_key_and_link()  (Eric Biggers; 1 file, -1/+1)
If check_cached_key() returns a non-NULL value, we still need to call key_type::match_free() to undo key_type::match_preparse(). Fixes: 7743c48e54ee ("keys: Cache result of request_key*() temporarily in task_struct") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-30  i2c: mediatek: disable zero-length transfers for mt8183  (Hsin-Yi Wang; 1 file, -1/+10)
Quoting from the mt8183 datasheet, the number of transfers to be transferred in one transaction should be set to a value bigger than 1, so we should forbid zero-length transfers and update the reported functionality accordingly. Reported-by: Alexandru M Stan <amstan@chromium.org> Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org> Reviewed-by: Qii Wang <qii.wang@mediatek.com> [wsa: shortened commit message a little] Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
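A sketch of how such a restriction is typically expressed through the I2C core's adapter quirks (illustrative; the struct name and how it is wired up may differ from the actual patch):

  #include <linux/i2c.h>

  /* Have the i2c core reject zero-length transfers for this adapter. */
  static const struct i2c_adapter_quirks mt8183_i2c_quirks = {
          .flags = I2C_AQ_NO_ZERO_LEN,
  };

  /* in probe(): i2c->adap.quirks = &mt8183_i2c_quirks; */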
2019-08-30  i2c: iproc: Stop advertising support of SMBUS quick cmd  (Lori Hikichi; 1 file, -1/+4)
The driver does not support the SMBUS Quick command so remove the flag that indicates that level of support. By default the i2c_detect tool uses the quick command to try and detect devices at some bus addresses. If the quick command is used then we will not detect the device, even though it is present. Fixes: e6e5dd3566e0 (i2c: iproc: Add Broadcom iProc I2C Driver) Signed-off-by: Lori Hikichi <lori.hikichi@broadcom.com> Signed-off-by: Rayagonda Kokatanur <rayagonda.kokatanur@broadcom.com> Reviewed-by: Ray Jui <ray.jui@broadcom.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2019-08-30  MAINTAINERS: i2c mv64xxx: Update documentation path  (Denis Efremov; 1 file, -1/+1)
Update MAINTAINERS record to reflect the file move from i2c-mv64xxx.txt to marvell,mv64xxx-i2c.yaml. Fixes: f8bbde72ef44 ("dt-bindings: i2c: mv64xxx: Add YAML schemas") Signed-off-by: Denis Efremov <efremov@linux.com> Acked-by: Maxime Ripard <mripard@kernel.org> Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2019-08-30  perf/x86/amd/ibs: Fix sample bias for dispatched micro-ops  (Kim Phillips; 2 files, -7/+18)
When counting dispatched micro-ops with cnt_ctl=1, in order to prevent sample bias, IBS hardware preloads the least significant 7 bits of current count (IbsOpCurCnt) with random values, such that, after the interrupt is handled and counting resumes, the next sample taken will be slightly perturbed. The current count bitfield is in the IBS execution control h/w register, alongside the maximum count field. Currently, the IBS driver writes that register with the maximum count, leaving zeroes to fill the current count field, thereby overwriting the random bits the hardware preloaded for itself. Fix the driver to actually retain and carry those random bits from the read of the IBS control register, through to its write, instead of overwriting the lower current count bits with zeroes.

Tested with:

  perf record -c 100001 -e ibs_op/cnt_ctl=1/pp -a -C 0 taskset -c 0 <workload>

'perf annotate' output before:

 15.70  65:   addsd  %xmm0,%xmm1
 17.30        add    $0x1,%rax
 15.88        cmp    %rdx,%rax
              je     82
 17.32  72:   test   $0x1,%al
              jne    7c
  7.52        movapd %xmm1,%xmm0
  5.90        jmp    65
  8.23  7c:   sqrtsd %xmm1,%xmm0
 12.15        jmp    65

'perf annotate' output after:

 16.63  65:   addsd  %xmm0,%xmm1
 16.82        add    $0x1,%rax
 16.81        cmp    %rdx,%rax
              je     82
 16.69  72:   test   $0x1,%al
              jne    7c
  8.30        movapd %xmm1,%xmm0
  8.13        jmp    65
  8.24  7c:   sqrtsd %xmm1,%xmm0
  8.39        jmp    65

Tested on Family 15h and 17h machines. Machines prior to family 10h Rev. C don't have the RDWROPCNT capability, and have the IbsOpCurCnt bitfield reserved, so this patch shouldn't affect their operation. It is unknown why commit db98c5faf8cb ("perf/x86: Implement 64-bit counter support for IBS") ignored the lower 4 bits of the IbsOpCurCnt field; the number of preloaded random bits has always been 7, AFAICT. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "Arnaldo Carvalho de Melo" <acme@kernel.org> Cc: <x86@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "Borislav Petkov" <bp@alien8.de> Cc: Stephane Eranian <eranian@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: "Namhyung Kim" <namhyung@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lkml.kernel.org/r/20190826195730.30614-1-kim.phillips@amd.com
2019-08-30  perf/x86/intel: Restrict period on Nehalem  (Josh Hunt; 1 file, -0/+6)
We see our Nehalem machines reporting 'perfevents: irq loop stuck!' in some cases when using perf:

perfevents: irq loop stuck!
WARNING: CPU: 0 PID: 3485 at arch/x86/events/intel/core.c:2282 intel_pmu_handle_irq+0x37b/0x530
...
RIP: 0010:intel_pmu_handle_irq+0x37b/0x530
...
Call Trace:
 <NMI>
 ? perf_event_nmi_handler+0x2e/0x50
 ? intel_pmu_save_and_restart+0x50/0x50
 perf_event_nmi_handler+0x2e/0x50
 nmi_handle+0x6e/0x120
 default_do_nmi+0x3e/0x100
 do_nmi+0x102/0x160
 end_repeat_nmi+0x16/0x50
 ...
 ? native_write_msr+0x6/0x20
 ? native_write_msr+0x6/0x20
 </NMI>
 intel_pmu_enable_event+0x1ce/0x1f0
 x86_pmu_start+0x78/0xa0
 x86_pmu_enable+0x252/0x310
 __perf_event_task_sched_in+0x181/0x190
 ? __switch_to_asm+0x41/0x70
 ? __switch_to_asm+0x35/0x70
 ? __switch_to_asm+0x41/0x70
 ? __switch_to_asm+0x35/0x70
 finish_task_switch+0x158/0x260
 __schedule+0x2f6/0x840
 ? hrtimer_start_range_ns+0x153/0x210
 schedule+0x32/0x80
 schedule_hrtimeout_range_clock+0x8a/0x100
 ? hrtimer_init+0x120/0x120
 ep_poll+0x2f7/0x3a0
 ? wake_up_q+0x60/0x60
 do_epoll_wait+0xa9/0xc0
 __x64_sys_epoll_wait+0x1a/0x20
 do_syscall_64+0x4e/0x110
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fdeb1e96c03
...

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: acme@kernel.org Cc: Josh Hunt <johunt@akamai.com> Cc: bpuranda@akamai.com Cc: mingo@redhat.com Cc: jolsa@redhat.com Cc: tglx@linutronix.de Cc: namhyung@kernel.org Cc: alexander.shishkin@linux.intel.com Link: https://lkml.kernel.org/r/1566256411-18820-1-git-send-email-johunt@akamai.com
2019-08-30  mmc: sdhci-cadence: enable v4_mode to fix ADMA 64-bit addressing  (Masahiro Yamada; 1 file, -0/+1)
The IP datasheet says this controller is compatible with SD Host Specification Version v4.00. As it turned out, the ADMA of this IP does not work with 64-bit mode when it is in the Version 3.00 compatible mode; it understands the old 64-bit descriptor table (as defined in SDHCI v2), but the ADMA System Address Register (SDHCI_ADMA_ADDRESS) cannot point to the 64-bit address. I noticed this issue only after commit bd2e75633c80 ("dma-contiguous: use fallback alloc_pages for single pages"). Prior to that commit, dma_set_mask_and_coherent() returned the dma address that fits in 32-bit range, at least for the default arm64 configuration (arch/arm64/configs/defconfig). Now the host->adma_addr exceeds the 32-bit limit, causing the real problem for the Socionext SoCs. (As a side-note, I was also able to reproduce the issue for older kernels by turning off CONFIG_DMA_CMA.) Call sdhci_enable_v4_mode() to fix this. Cc: <stable@vger.kernel.org> # v4.20+ Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2019-08-30  mmc: sdhci-sprd: clear the UHS-I modes read from registers  (Chunyan Zhang; 1 file, -1/+12)
Though sprd's SD host controller supports SDR50/SDR104/DDR50, the UHS-I mode used by a specific card can be selected via devicetree only. Fixes: fb8bd90f83c4 ("mmc: sdhci-sprd: Add Spreadtrum's initial host controller") Signed-off-by: Chunyan Zhang <chunyan.zhang@unisoc.com> Signed-off-by: Chunyan Zhang <zhang.lyra@gmail.com> Reviewed-by: Baolin Wang <baolin.wang@linaro.org> Tested-by: Baolin Wang <baolin.wang@linaro.org> Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2019-08-30  mmc: sdhci-sprd: add SDHCI_QUIRK_BROKEN_CARD_DETECTION  (Chunyan Zhang; 1 file, -1/+2)
sprd's SD host controller doesn't support detecting card insertion or removal. Fixes: fb8bd90f83c4 ("mmc: sdhci-sprd: Add Spreadtrum's initial host controller") Signed-off-by: Chunyan Zhang <chunyan.zhang@unisoc.com> Signed-off-by: Chunyan Zhang <zhang.lyra@gmail.com> Reviewed-by: Baolin Wang <baolin.wang@linaro.org> Tested-by: Baolin Wang <baolin.wang@linaro.org> Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2019-08-30  mmc: sdhci-sprd: add SDHCI_QUIRK2_PRESET_VALUE_BROKEN  (Chunyan Zhang; 1 file, -1/+2)
The PRESET_VAL_ENABLE bit in the HOST_CONTROL2 register is reserved on sprd's SD host controller, so set quirk2 to disable configuring it. Fixes: fb8bd90f83c4 ("mmc: sdhci-sprd: Add Spreadtrum's initial host controller") Signed-off-by: Chunyan Zhang <chunyan.zhang@unisoc.com> Signed-off-by: Chunyan Zhang <zhang.lyra@gmail.com> Reviewed-by: Baolin Wang <baolin.wang@linaro.org> Tested-by: Baolin Wang <baolin.wang@linaro.org> Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2019-08-30  mmc: sdhci-sprd: add get_ro hook function  (Chunyan Zhang; 1 file, -0/+6)
sprd's SD host controller doesn't support write protection for SD cards. Fixes: fb8bd90f83c4 ("mmc: sdhci-sprd: Add Spreadtrum's initial host controller") Signed-off-by: Chunyan Zhang <chunyan.zhang@unisoc.com> Signed-off-by: Chunyan Zhang <zhang.lyra@gmail.com> Reviewed-by: Baolin Wang <baolin.wang@linaro.org> Tested-by: Baolin Wang <baolin.wang@linaro.org> Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2019-08-30  mmc: sdhci-sprd: fixed incorrect clock divider  (Chunyan Zhang; 1 file, -3/+4)
The register SDHCI_CLOCK_CONTROL should be cleared before configuring the clock divider, otherwise the configured frequency may be lower than expected. Fixes: fb8bd90f83c4 ("mmc: sdhci-sprd: Add Spreadtrum's initial host controller") Signed-off-by: Chunyan Zhang <chunyan.zhang@unisoc.com> Signed-off-by: Chunyan Zhang <zhang.lyra@gmail.com> Reviewed-by: Baolin Wang <baolin.wang@linaro.org> Tested-by: Baolin Wang <baolin.wang@linaro.org> Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
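A hedged sketch of the resulting set_clock flow, loosely modeled on the sdhci core helpers (assumes the driver-local sdhci.h; not the driver's exact code):

  /* Zero SDHCI_CLOCK_CONTROL first so stale divider bits cannot make the
   * programmed frequency lower than requested. */
  static void example_set_clock(struct sdhci_host *host, unsigned int clock)
  {
          u16 clk;

          sdhci_writew(host, 0, SDHCI_CLOCK_CONTROL);
          if (clock == 0)
                  return;

          clk = sdhci_calc_clk(host, clock, &host->mmc->actual_clock);
          sdhci_writew(host, clk | SDHCI_CLOCK_CARD_EN, SDHCI_CLOCK_CONTROL);
  }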
2019-08-30  mmc: core: Fix init of SD cards reporting an invalid VDD range  (Ulf Hansson; 1 file, -0/+6)
The OCR register defines the supported range of VDD voltages for SD cards. However, it has turned out that some SD cards report an invalid voltage range, for example having bit7 set. When a host supports MMC_CAP2_FULL_PWR_CYCLE and some of the voltages from the invalid VDD range, this triggers the core to run a power cycle of the card to try to initialize it at the lowest common supported voltage. Obviously this fails, since the card can't support it. Let's fix this problem by clearing invalid bits from the read OCR register for SD cards, before proceeding with the VDD voltage negotiation. Cc: stable@vger.kernel.org Reported-by: Philip Langdale <philipl@overt.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Philip Langdale <philipl@overt.org> Tested-by: Philip Langdale <philipl@overt.org> Tested-by: Manuel Presnitz <mail@mpy.de>
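A minimal sketch of the masking idea (the exact mask is an assumption based on the SD spec, where the OCR bits below the 2.7-3.6V window, including bit7, are reserved):

  static u32 sd_sanitize_ocr(u32 ocr)
  {
          /* clear reserved/invalid low bits before VDD negotiation */
          return ocr & ~0x7FFFu;
  }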
2019-08-29  i2c: piix4: Fix port selection for AMD Family 16h Model 30h  (Andrew Cooks; 1 file, -7/+5)
Family 16h Model 30h SMBus controller needs the same port selection fix as described and fixed in commit 0fe16195f891 ("i2c: piix4: Fix SMBus port selection for AMD Family 17h chips").

commit 6befa3fde65f ("i2c: piix4: Support alternative port selection register") also fixed the port selection for Hudson2, but unfortunately this is not the exact same device and the AMD naming and PCI Device IDs aren't particularly helpful here.

The SMBus port selection register is common to the following Families and models, as documented in AMD's publicly available BIOS and Kernel Developer Guides:

  50742 - Family 15h Model 60h-6Fh (PCI_DEVICE_ID_AMD_KERNCZ_SMBUS)
  55072 - Family 15h Model 70h-7Fh (PCI_DEVICE_ID_AMD_KERNCZ_SMBUS)
  52740 - Family 16h Model 30h-3Fh (PCI_DEVICE_ID_AMD_HUDSON2_SMBUS)

The Hudson2 PCI Device ID (PCI_DEVICE_ID_AMD_HUDSON2_SMBUS) is shared between Bolton FCH and Family 16h Model 30h, but the location of the SmBus0Sel port selection bits are different:

  51192 - Bolton Register Reference Guide

We distinguish between Bolton and Family 16h Model 30h using the PCI Revision ID: Bolton is device 0x780b, revision 0x15; Family 16h Model 30h is device 0x780b, revision 0x1F. Family 15h Model 60h and 70h are both device 0x790b, revision 0x4A.

The following additional public AMD BKDG documents were checked and do not share the same port selection register:

  42301 - Family 15h Model 00h-0Fh doesn't mention any
  42300 - Family 15h Model 10h-1Fh doesn't mention any
  49125 - Family 15h Model 30h-3Fh doesn't mention any
  48751 - Family 16h Model 00h-0Fh uses the previously supported index register SB800_PIIX4_PORT_IDX_ALT at 0x2e

Signed-off-by: Andrew Cooks <andrew.cooks@opengear.com> Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: stable@vger.kernel.org [v4.6+] Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2019-08-29  x86/mm/cpa: Prevent large page split when ftrace flips RW on kernel text  (Thomas Gleixner; 1 file, -8/+18)
ftrace does not use text_poke() for enabling trace functionality. It uses its own mechanism and flips the whole kernel text to RW and back to RO. The CPA rework removed a loop-based check of 4k pages which tried to preserve a large page by checking each 4k page whether the change would actually cover all pages in the large page. This resulted in endless loops for nothing, as in testing it turned out that it actually never preserved anything. Of course, testing failed to include ftrace, which is the one and only case that benefitted from the 4k loop. As a consequence, enabling function tracing or ftrace-based kprobes results in a full 4k split of the kernel text, which affects iTLB performance. The kernel RO protection is the only valid case where this can actually preserve large pages. All other static protections (RO data, data NX, PCI, BIOS) are truly static. So a conflict with those protections which results in a split should only ever happen when a change of memory next to a protected region is attempted. But these conflicts are rightfully splitting the large page to preserve the protected regions. In fact, a change to the protected regions itself is a bug and is warned about. Add an exception to the static protection check for kernel text RO when the to-be-changed region spans a full large page, which allows the large mappings to be preserved. This also prevents the syslog from being spammed with CPA violations when ftrace is used. The exception needs to be removed once ftrace has switched over to text_poke(), which avoids the whole issue. Fixes: 585948f4f695 ("x86/mm/cpa: Avoid the 4k pages check completely") Reported-by: Song Liu <songliubraving@fb.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Reviewed-by: Song Liu <songliubraving@fb.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908282355340.1938@nanos.tec.linutronix.de
2019-08-29  i2c: designware: Synchronize IRQs when unregistering slave client  (Jarkko Nikula; 1 file, -0/+1)
Make sure the interrupt handler i2c_dw_irq_handler_slave() has finished before clearing the dev->slave pointer in i2c_dw_unreg_slave(). There is a possibility of a race if i2c_dw_irq_handler_slave() is running on another CPU while the dev->slave pointer is being cleared. Reported-by: Krzysztof Adamski <krzysztof.adamski@nokia.com> Reported-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
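The fix pattern, sketched with a stand-in device structure (field names are illustrative, not the driver's actual struct):

  #include <linux/interrupt.h>
  #include <linux/i2c.h>

  struct demo_dev {                       /* stand-in for the driver's device struct */
          int irq;
          struct i2c_client *slave;
  };

  static void demo_unreg_slave(struct demo_dev *dev)
  {
          synchronize_irq(dev->irq);      /* wait for a running slave ISR to finish */
          dev->slave = NULL;              /* now the ISR cannot see a stale pointer */
  }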
2019-08-29  i2c: i801: Avoid memory leak in check_acpi_smo88xx_device()  (Andy Shevchenko; 1 file, -3/+12)
check_acpi_smo88xx_device() utilizes acpi_get_object_info(), which in its turn allocates a buffer. The user is responsible for cleaning up the allocated resources, which the original code missed to do. Fix it here. While here, replace !ACPI_SUCCESS() with ACPI_FAILURE(). Fixes: 19b07cb4a187 ("i2c: i801: Register optional lis3lv02d I2C device on Dell machines") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Reviewed-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
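A sketch of the lifetime rule involved (simplified; the actual HID comparison is elided):

  #include <linux/acpi.h>
  #include <linux/slab.h>

  static bool demo_check_device(acpi_handle handle)
  {
          struct acpi_device_info *info;
          bool match = false;

          /* acpi_get_object_info() allocates *info; the caller must free it. */
          if (ACPI_FAILURE(acpi_get_object_info(handle, &info)))
                  return false;

          if (info->valid & ACPI_VALID_HID)
                  match = true;           /* real code compares the HID string */

          kfree(info);                    /* the fix: don't leak the buffer */
          return match;
  }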