aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2017-07-02moduleparam: fix doc: hwparam_irq configures an IRQSylvain 'ythier' Hitier1-1/+1
Signed-off-by: Sylvain 'ythier' Hitier <sylvain.hitier@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-02locking/refcount: Remove the half-implemented refcount_sub() APIKees Cook1-6/+0
CONFIG_REFCOUNT_FULL=y (correctly) does not provide a refcount_sub(), which should not be part of proper refcount design patterns. Remove the erroneous extern and the later !CONFIG_REFCOUNT_FULL accidental implementation. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Elena Reshetova <elena.reshetova@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 29dee3c03abc ("locking/refcounts: Out-of-line everything") Link: http://lkml.kernel.org/r/20170701180129.GA17405@beast Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-30uapi/linux/a.out.h: don't use deprecated system-specific predefines.Zack Weinberg1-25/+1
uapi/linux/a.out.h uses a number of predefined macros that are deprecated because they're in the application namespace (e.g. '#ifdef linux' instead of '#ifdef __linux__'). This patch either corrects or just removes them if they are not applicable to Linux. The primary reason this is worth bothering to fix, considering how obsolete a.out binary support is, is that the GCC build process considers this such a severe error that it will copy the header into a private directory and change the macro names, which causes future updates to the header to be masked. This header probably doesn't get updated very often anymore, but it is the _only_ uapi header that gets this treatment, so IMHO it is worth patching just to drive that number all the way to zero. Signed-off-by: Zack Weinberg <zackw@panix.com> [hch: removed dead conditionals] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-30hashtable: remove repeated phrase from a commentJakub Kicinski1-1/+0
"in a rcu enabled hashtable" is repeated twice in a comment. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-30x86/intel_rdt: Fix memory leak on mount failureVikas Shivappa1-1/+3
If mount fails, the kn_info directory is not freed causing memory leak. Add the missing error handling path. Fixes: 4e978d06dedb ("x86/intel_rdt: Add "info" files to resctrl file system") Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: vikas.shivappa@intel.com Cc: andi.kleen@intel.com Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1498503368-20173-3-git-send-email-vikas.shivappa@linux.intel.com
2017-06-30objtool: Silence warnings for functions which use IRETJosh Poimboeuf1-8/+6
Previously, objtool ignored functions which have the IRET instruction in them. That's because it assumed that such functions know what they're doing with respect to frame pointers. With the new "objtool 2.0" changes, it stopped ignoring such functions, and started complaining about them: arch/x86/kernel/alternative.o: warning: objtool: do_sync_core()+0x1b: unsupported instruction in callable function arch/x86/kernel/alternative.o: warning: objtool: text_poke()+0x1a8: unsupported instruction in callable function arch/x86/kernel/ftrace.o: warning: objtool: do_sync_core()+0x16: unsupported instruction in callable function arch/x86/kernel/cpu/mcheck/mce.o: warning: objtool: machine_check_poll()+0x166: unsupported instruction in callable function arch/x86/kernel/cpu/mcheck/mce.o: warning: objtool: do_machine_check()+0x147: unsupported instruction in callable function Silence those warnings for now. They can be re-enabled later, once we have unwind hints which will allow the code to annotate the IRET usages. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: live-patching@vger.kernel.org Fixes: baa41469a7b9 ("objtool: Implement stack validation 2.0") Link: http://lkml.kernel.org/r/20170630140934.mmwtpockvpupahro@treble Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-30lightnvm: pblk: set line bitmap check under debugJavier González1-1/+4
Do bitmap checks only when debug mode is enable. The line bitmap used for mapping to physical addresses is fairly large (~512KB) and it is expensive to do this checks on the fast path. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-30lightnvm: pblk: verify that cache read is still validJavier González3-6/+22
When a read is directed to the cache, we risk that the lba has been updated during the time we made the L2P table lookup and the time we are actually reading form the cache. We intentionally not hold the L2P lock not to block other threads. While strict ordering is not a guarantee at this level (unless REQ_FLUSH has been previously issued), we have experience that some databases that have recently implemented direct I/O support, issue metadata reads very close to the writes, without issuing a fsync in the middle. An easy way to support them while they is to make an extra effort and check the L2P map right before reading the cache. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-30lightnvm: pblk: add initialization checkJavier González1-0/+6
Add a sanity check to the pblk initialization sequence in order to ensure that enough LUNs have been allocated to store the line metadata. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-30lightnvm: pblk: remove target using async. I/OsJavier González6-73/+122
When removing a pblk instance, pad the current line using asynchronous I/O. This reduces the removal time from ~1 minute in the worst case to a couple of seconds. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-30lightnvm: pblk: use vmalloc for GC data bufferJavier González5-12/+14
For now, we allocate a per I/O buffer for GC data. Since the potential size of the buffer is 256KB and GC is not in the fast path, do this allocation with vmalloc. This puts lets pressure on the memory allocator at no performance cost. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-30lightnvm: pblk: use right metadata buffer for recoveryJavier González1-3/+3
Fix bad metadata buffer assignations introduced when refactoring the medatada write path. Fixes: dd2a43437337 lightnvm: pblk: sched. metadata on write thread Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-30lightnvm: pblk: schedule if data is not readyJavier González1-1/+3
When user threads place data into the write buffer, they reserve space and do the memory copy out of the lock. As a consequence, when the write thread starts persisting data, there is a chance that it is not copied yet. In this case, avoid polling, and schedule before retrying. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-30lightnvm: pblk: remove unused return variableJavier González1-2/+2
Remove unused variable. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-30lightnvm: pblk: fix double-free on pblk initJavier González1-2/+0
Prevent pblk->lines being double freed in case of an error during pblk initialization. Fixes: dd2a43437337: "lightnvm: pblk: sched. metadata on write thread" Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-30lightnvm: pblk: fix bad le64 assignationsJavier González4-4/+7
Use the right types and conversions on le64 variables. Reported by sparse. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-30perf auxtrace: Add CPU filter supportAdrian Hunter4-0/+14
Decoding auxtrace data can take a long time. To avoid decoding unnecessarily, filter auxtrace data that is collected per-cpu before it is decoded. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1495786658-18063-38-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-06-30perf intel-pt: Do not use TSC packets for calculating CPU cycles to TSCAdrian Hunter1-0/+14
CBR (core-to-bus ratio) packets provide an indication of CPU frequency. A more accurate measure can be made by counting the cycles (given by CYC packets) in between other timing packets (either MTC or TSC). Using TSC packets has at least 2 issues: 1) timing might have stopped (e.g. mwait) or 2) TSC packets within PSB+ might slip past CYC packets. For now, simply do not use TSC packets for calculating CPU cycles to TSC. That leaves the case where 2 MTC packets are used, otherwise falling back to the CBR value. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1495786658-18063-37-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-06-30perf intel-pt: Update documentation to include new ptwrite and power eventsAdrian Hunter1-2/+40
Update documentation to include new ptwrite and power events. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1495786658-18063-36-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-06-30perf intel-pt: Add example script for power events and PTWRITEAdrian Hunter3-0/+144
Add script intel-pt-events.py that provides an example of how to unpack the raw data for power events and PTWRITE. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1495786658-18063-35-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-06-30perf intel-pt: Synthesize new power and "ptwrite" eventsAdrian Hunter1-0/+283
Synthesize new power and ptwrite events. Power events report changes to C-state but I have also added support for the existing CBR (core-to-bus ratio) packet and included that when outputting power events. The PTWRITE packet is associated with the new "ptwrite" instruction, which is essentially just a way to stuff a 32 or 64 bit value into the PT trace. More details can be found in the patches that add documentation and in the Intel SDM. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1498811805-2335-1-git-send-email-adrian.hunter@intel.com [ Copy the description of such packet from the patchkit cover message ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-06-30perf intel-pt: Move code in intel_pt_synth_events() to simplify attr settingAdrian Hunter1-23/+22
intel_pt_synth_events() uses the same attr structure to create each event. Move the code around a bit to simplify that. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1495786658-18063-33-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-06-30perf intel-pt: Factor out intel_pt_set_event_name()Adrian Hunter1-8/+16
Factor out intel_pt_set_event_name() so it can be reused. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1495786658-18063-32-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-06-30perf intel-pt: Tidy messages into called function intel_pt_synth_event()Adrian Hunter1-24/+18
Tidy print messages into called function intel_pt_synth_event(). Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1495786658-18063-31-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-06-30perf intel-pt: Tidy Intel PT evsel lookup into separate functionAdrian Hunter1-10/+15
Tidy the lookup of the Intel PT selected event (perf_evsel) into a separate function. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1495786658-18063-30-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-06-30perf intel-pt: Join needlessly wrapped linesAdrian Hunter1-4/+2
Join needlessly wrapped lines. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1495786658-18063-29-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-06-30perf intel-pt: Remove unused instructions_sample_periodAdrian Hunter1-2/+0
Remove unused struct intel_pt member instructions_sample_period. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1495786658-18063-28-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-06-30perf intel-pt: Factor out common code synthesizing event samplesAdrian Hunter1-122/+100
Factor out common code in functions synthesizing event samples i.e. intel_pt_synth_branch_sample(), intel_pt_synth_instruction_sample() and intel_pt_synth_transaction_sample(). Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1495786658-18063-27-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-06-30perf script: Add synthesized Intel PT power and ptwrite eventsAdrian Hunter2-1/+231
Add definitions for synthesized Intel PT events for power and ptwrite. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1498811802-2301-1-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-06-30objtool: Implement stack validation 2.0Josh Poimboeuf11-320/+1130
This is a major rewrite of objtool. Instead of only tracking frame pointer changes, it now tracks all stack-related operations, including all register saves/restores. In addition to making stack validation more robust, this also paves the way for undwarf generation. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: live-patching@vger.kernel.org Link: http://lkml.kernel.org/r/678bd94c0566c6129bcc376cddb259c4c5633004.1498659915.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-30objtool, x86: Add several functions and files to the objtool whitelistJosh Poimboeuf15-6/+39
In preparation for an objtool rewrite which will have broader checks, whitelist functions and files which cause problems because they do unusual things with the stack. These whitelists serve as a TODO list for which functions and files don't yet have undwarf unwinder coverage. Eventually most of the whitelists can be removed in favor of manual CFI hint annotations or objtool improvements. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: live-patching@vger.kernel.org Link: http://lkml.kernel.org/r/7f934a5d707a574bda33ea282e9478e627fb1829.1498659915.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-30objtool: Move checking code to check.cJosh Poimboeuf4-1268/+1328
In preparation for the new 'objtool undwarf generate' command, which will rely on 'objtool check', move the checking code from builtin-check.c to check.c where it can be used by other commands. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Jiri Slaby <jslaby@suse.cz> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: live-patching@vger.kernel.org Link: http://lkml.kernel.org/r/294c5c695fd73c1a5000bbe5960a7c9bec4ee6b4.1498659915.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-30x86/uaccess: Optimize copy_user_enhanced_fast_string() for short stringsPaolo Abeni1-2/+5
According to the Intel datasheet, the REP MOVSB instruction exposes a pretty heavy setup cost (50 ticks), which hurts short string copy operations. This change tries to avoid this cost by calling the explicit loop available in the unrolled code for strings shorter than 64 bytes. The 64 bytes cutoff value is arbitrary from the code logic point of view - it has been selected based on measurements, as the largest value that still ensures a measurable gain. Micro benchmarks of the __copy_from_user() function with lengths in the [0-63] range show this performance gain (shorter the string, larger the gain): - in the [55%-4%] range on Intel Xeon(R) CPU E5-2690 v4 - in the [72%-9%] range on Intel Core i7-4810MQ Other tested CPUs - namely Intel Atom S1260 and AMD Opteron 8216 - show no difference, because they do not expose the ERMS feature bit. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/4533a1d101fd460f80e21329a34928fad521c1d4.1498744345.git.pabeni@redhat.com [ Clarified the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-30sched/cputime: Refactor the cputime_adjust() codeGustavo A. R. Silva1-11/+5
Address a Coverity false positive, which is caused by overly convoluted code: Value assigned to variable 'utime' at line 619:utime = rtime; is overwritten at line 642:utime = rtime - stime; before it can be used. This makes such variable assignment useless. Remove this variable assignment and refactor the code related. Addresses-Coverity-ID: 1371643 Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Cc: Frans Klaver <fransklaver@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/20170629184128.GA5271@embeddedgus Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-30sched/debug: Expose the number of RT/DL tasks that can migrateDaniel Bristot de Oliveira1-2/+15
Add the value of the rt_rq.rt_nr_migratory and dl_rq.dl_nr_migratory to the sched_debug output, for instance: rt_rq[0]: .rt_nr_running : 2 .rt_nr_migratory : 1 <--- Like this .rt_throttled : 0 .rt_time : 828.645877 .rt_runtime : 1000.000000 This is useful to debug problems related to the RT/DL schedulers. This also fixes the format of some variables, that were unsigned, rather than signed. Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-rt-users <linux-rt-users@vger.kernel.org> Link: http://lkml.kernel.org/r/7896f71cada54ee7dd8507bb666063a2e051c3d4.1498482127.git.bristot@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-30perf/x86/intel: Constify the 'lbr_desc[]' array and make a function staticColin Ian King1-2/+2
A few minor clean-ups: constify the lbr_desc[] array and make local function lbr_from_signext_quirk_rd() static to fix a sparse warning: "symbol 'lbr_from_signext_quirk_rd' was not declared. Should it be static?" Signed-off-by: Colin Ian King <colin.king@canonical.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-janitors@vger.kernel.org Link: http://lkml.kernel.org/r/20170629091406.9870-1-colin.king@canonical.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-30x86/boot/KASLR: Fix kexec crash due to 'virt_addr' calculation bugBaoquan He3-7/+2
Kernel text KASLR is separated into physical address and virtual address randomization. And for virtual address randomization, we only randomiza to get an offset between 16M and KERNEL_IMAGE_SIZE. So the initial value of 'virt_addr' should be LOAD_PHYSICAL_ADDR, but not the original kernel loading address 'output'. The bug will cause kernel boot failure if kernel is loaded at a different position than the address, 16M, which is decided at compiled time. Kexec/kdump is such practical case. To fix it, just assign LOAD_PHYSICAL_ADDR to virt_addr as initial value. Tested-by: Dave Young <dyoung@redhat.com> Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 8391c73 ("x86/KASLR: Randomize virtual address separately") Link: http://lkml.kernel.org/r/1498567146-11990-3-git-send-email-bhe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-30x86/boot/KASLR: Add checking for the offset of kernel virtual address randomizationBaoquan He1-0/+2
For kernel text KASLR, the virtual address is confined to area of 1G, [0xffffffff80000000, 0xffffffffc0000000). For the implemenataion of virtual address randomization, we only randomize to get an offset between 16M and 1G, then add this offset to the starting address, 0xffffffff80000000. Here 16M is the offset which is decided at linking stage. So the amount of the local variable 'virt_addr' which respresents the offset plus the kernel output size can not exceed KERNEL_IMAGE_SIZE. Add a debug check for the offset. If out of bounds, print error message and hang there. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1498567146-11990-2-git-send-email-bhe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-29tracing/kprobes: Allow to create probe with a module name starting with a digitSabrina Dubroca1-9/+5
Always try to parse an address, since kstrtoul() will safely fail when given a symbol as input. If that fails (which will be the case for a symbol), try to parse a symbol instead. This allows creating a probe such as: p:probe/vlan_gro_receive 8021q:vlan_gro_receive+0 Which is necessary for this command to work: perf probe -m 8021q -a vlan_gro_receive Link: http://lkml.kernel.org/r/fd72d666f45b114e2c5b9cf7e27b91de1ec966f1.1498122881.git.sd@queasysnail.net Cc: stable@vger.kernel.org Fixes: 413d37d1e ("tracing: Add kprobe-based event tracer") Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-06-30MIPS: Avoid accidental raw backtraceJames Hogan1-0/+2
Since commit 81a76d7119f6 ("MIPS: Avoid using unwind_stack() with usermode") show_backtrace() invokes the raw backtracer when cp0_status & ST0_KSU indicates user mode to fix issues on EVA kernels where user and kernel address spaces overlap. However this is used by show_stack() which creates its own pt_regs on the stack and leaves cp0_status uninitialised in most of the code paths. This results in the non deterministic use of the raw back tracer depending on the previous stack content. show_stack() deals exclusively with kernel mode stacks anyway, so explicitly initialise regs.cp0_status to KSU_KERNEL (i.e. 0) to ensure we get a useful backtrace. Fixes: 81a76d7119f6 ("MIPS: Avoid using unwind_stack() with usermode") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: linux-mips@linux-mips.org Cc: <stable@vger.kernel.org> # 3.15+ Patchwork: https://patchwork.linux-mips.org/patch/16656/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-06-30MIPS: Perform post-DMA cache flushes on systems with MAARsPaul Burton1-5/+18
Recent CPUs from Imagination Technologies such as the I6400 or P6600 are able to speculatively fetch data from memory into caches. This means that if used in a system with non-coherent DMA they require that caches be invalidated after a device performs DMA, and before the CPU reads the DMA'd data, in order to ensure that stale values weren't speculatively prefetched. Such CPUs also introduced Memory Accessibility Attribute Registers (MAARs) in order to control the regions in which they are allowed to speculate. Thus we can use the presence of MAARs as a good indication that the CPU requires the above cache maintenance. Use the presence of MAARs to determine the result of cpu_needs_post_dma_flush() in the default case, in order to handle these recent CPUs correctly. Note that the return type of cpu_needs_post_dma_flush() is changed to bool, such that it's clearer what's happening when cpu_has_maar is cast to bool for the return value. If this patch were backported to a pre-v4.7 kernel then MIPS_CPU_MAAR was 1ull<<34, so when cast to an int we would incorrectly return 0. It so happens that MIPS_CPU_MAAR is currently 1ull<<30, so when truncated to an int gives a non-zero value anyway, but even so the implicit conversion from long long int to bool makes it clearer to understand what will happen than the implicit conversion from long long int to int would. The bool return type also fits this usage better semantically, so seems like an all-round win. Thanks to Ed for spotting the issue for pre-v4.7 kernels & suggesting the return type change. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Reviewed-by: Bryan O'Donoghue <pure.logic@nexus-software.ie> Tested-by: Bryan O'Donoghue <pure.logic@nexus-software.ie> Cc: Ed Blake <ed.blake@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/16363/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-06-30MIPS: Fix IRQ tracing & lockdep when reschedulingPaul Burton1-0/+3
When the scheduler sets TIF_NEED_RESCHED & we call into the scheduler from arch/mips/kernel/entry.S we disable interrupts. This is true regardless of whether we reach work_resched from syscall_exit_work, resume_userspace or by looping after calling schedule(). Although we disable interrupts in these paths we don't call trace_hardirqs_off() before calling into C code which may acquire locks, and we therefore leave lockdep with an inconsistent view of whether interrupts are disabled or not when CONFIG_PROVE_LOCKING & CONFIG_DEBUG_LOCKDEP are both enabled. Without tracing this interrupt state lockdep will print warnings such as the following once a task returns from a syscall via syscall_exit_partial with TIF_NEED_RESCHED set: [ 49.927678] ------------[ cut here ]------------ [ 49.934445] WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:3687 check_flags.part.41+0x1dc/0x1e8 [ 49.946031] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled) [ 49.946355] CPU: 0 PID: 1 Comm: init Not tainted 4.10.0-00439-gc9fd5d362289-dirty #197 [ 49.963505] Stack : 0000000000000000 ffffffff81bb5d6a 0000000000000006 ffffffff801ce9c4 [ 49.974431] 0000000000000000 0000000000000000 0000000000000000 000000000000004a [ 49.985300] ffffffff80b7e487 ffffffff80a24498 a8000000ff160000 ffffffff80ede8b8 [ 49.996194] 0000000000000001 0000000000000000 0000000000000000 0000000077c8030c [ 50.007063] 000000007fd8a510 ffffffff801cd45c 0000000000000000 a8000000ff127c88 [ 50.017945] 0000000000000000 ffffffff801cf928 0000000000000001 ffffffff80a24498 [ 50.028827] 0000000000000000 0000000000000001 0000000000000000 0000000000000000 [ 50.039688] 0000000000000000 a8000000ff127bd0 0000000000000000 ffffffff805509bc [ 50.050575] 00000000140084e0 0000000000000000 0000000000000000 0000000000040a00 [ 50.061448] 0000000000000000 ffffffff8010e1b0 0000000000000000 ffffffff805509bc [ 50.072327] ... [ 50.076087] Call Trace: [ 50.079869] [<ffffffff8010e1b0>] show_stack+0x80/0xa8 [ 50.086577] [<ffffffff805509bc>] dump_stack+0x10c/0x190 [ 50.093498] [<ffffffff8015dde0>] __warn+0xf0/0x108 [ 50.099889] [<ffffffff8015de34>] warn_slowpath_fmt+0x3c/0x48 [ 50.107241] [<ffffffff801c15b4>] check_flags.part.41+0x1dc/0x1e8 [ 50.114961] [<ffffffff801c239c>] lock_is_held_type+0x8c/0xb0 [ 50.122291] [<ffffffff809461b8>] __schedule+0x8c0/0x10f8 [ 50.129221] [<ffffffff80946a60>] schedule+0x30/0x98 [ 50.135659] [<ffffffff80106278>] work_resched+0x8/0x34 [ 50.142397] ---[ end trace 0cb4f6ef5b99fe21 ]--- [ 50.148405] possible reason: unannotated irqs-off. [ 50.154600] irq event stamp: 400463 [ 50.159566] hardirqs last enabled at (400463): [<ffffffff8094edc8>] _raw_spin_unlock_irqrestore+0x40/0xa8 [ 50.171981] hardirqs last disabled at (400462): [<ffffffff8094eb98>] _raw_spin_lock_irqsave+0x30/0xb0 [ 50.183897] softirqs last enabled at (400450): [<ffffffff8016580c>] __do_softirq+0x4ac/0x6a8 [ 50.195015] softirqs last disabled at (400425): [<ffffffff80165e78>] irq_exit+0x110/0x128 Fix this by using the TRACE_IRQS_OFF macro to call trace_hardirqs_off() when CONFIG_TRACE_IRQFLAGS is enabled. This is done before invoking schedule() following the work_resched label because: 1) Interrupts are disabled regardless of the path we take to reach work_resched() & schedule(). 2) Performing the tracing here avoids the need to do it in paths which disable interrupts but don't call out to C code before hitting a path which uses the RESTORE_SOME macro that will call trace_hardirqs_on() or trace_hardirqs_off() as appropriate. We call trace_hardirqs_on() using the TRACE_IRQS_ON macro before calling syscall_trace_leave() for similar reasons, ensuring that lockdep has a consistent view of state after we re-enable interrupts. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: linux-mips@linux-mips.org Cc: stable <stable@vger.kernel.org> Patchwork: https://patchwork.linux-mips.org/patch/15385/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-06-30MIPS: pm-cps: Drop manual cache-line alignment of ready_countPaul Burton1-8/+1
We allocate memory for a ready_count variable per-CPU, which is accessed via a cached non-coherent TLB mapping to perform synchronisation between threads within the core using LL/SC instructions. In order to ensure that the variable is contained within its own data cache line we allocate 2 lines worth of memory & align the resulting pointer to a line boundary. This is however unnecessary, since kmalloc is guaranteed to return memory which is at least cache-line aligned (see ARCH_DMA_MINALIGN). Stop the redundant manual alignment. Besides cleaning up the code & avoiding needless work, this has the side effect of avoiding an arithmetic error found by Bryan on 64 bit systems due to the 32 bit size of the former dlinesz. This led the ready_count variable to have its upper 32b cleared erroneously for MIPS64 kernels, causing problems when ready_count was later used on MIPS64 via cpuidle. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Fixes: 3179d37ee1ed ("MIPS: pm-cps: add PM state entry code for CPS systems") Reported-by: Bryan O'Donoghue <bryan.odonoghue@imgtec.com> Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@imgtec.com> Tested-by: Bryan O'Donoghue <bryan.odonoghue@imgtec.com> Cc: linux-mips@linux-mips.org Cc: stable <stable@vger.kernel.org> # v3.16+ Patchwork: https://patchwork.linux-mips.org/patch/15383/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2017-06-29ARM: 8685/1: ensure memblock-limit is pmd-alignedDoug Berger1-4/+4
The pmd containing memblock_limit is cleared by prepare_page_table() which creates the opportunity for early_alloc() to allocate unmapped memory if memblock_limit is not pmd aligned causing a boot-time hang. Commit 965278dcb8ab ("ARM: 8356/1: mm: handle non-pmd-aligned end of RAM") attempted to resolve this problem, but there is a path through the adjust_lowmem_bounds() routine where if all memory regions start and end on pmd-aligned addresses the memblock_limit will be set to arm_lowmem_limit. Since arm_lowmem_limit can be affected by the vmalloc early parameter, the value of arm_lowmem_limit may not be pmd-aligned. This commit corrects this oversight such that memblock_limit is always rounded down to pmd-alignment. Fixes: 965278dcb8ab ("ARM: 8356/1: mm: handle non-pmd-aligned end of RAM") Signed-off-by: Doug Berger <opendmb@gmail.com> Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2017-06-29sfc: fix attempt to translate invalid filter IDEdward Cree1-3/+4
When filter insertion fails with no rollback, we were trying to convert EFX_EF10_FILTER_ID_INVALID to an id to store in 'ids' (which is either vlan->uc or vlan->mc). This would WARN_ON_ONCE and then record a bogus filter ID of 0x1fff, neither of which is a good thing. Fixes: 0ccb998bf46d ("sfc: fix filter_id misinterpretation in edge case") Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29net: handle NAPI_GRO_FREE_STOLEN_HEAD case also in napi_frags_finish()Michal Kubeček1-7/+17
Recently I started seeing warnings about pages with refcount -1. The problem was traced to packets being reused after their head was merged into a GRO packet by skb_gro_receive(). While bisecting the issue pointed to commit c21b48cc1bbf ("net: adjust skb->truesize in ___pskb_trim()") and I have never seen it on a kernel with it reverted, I believe the real problem appeared earlier when the option to merge head frag in GRO was implemented. Handling NAPI_GRO_FREE_STOLEN_HEAD state was only added to GRO_MERGED_FREE branch of napi_skb_finish() so that if the driver uses napi_gro_frags() and head is merged (which in my case happens after the skb_condense() call added by the commit mentioned above), the skb is reused including the head that has been merged. As a result, we release the page reference twice and eventually end up with negative page refcount. To fix the problem, handle NAPI_GRO_FREE_STOLEN_HEAD in napi_frags_finish() the same way it's done in napi_skb_finish(). Fixes: d7e8883cfcf4 ("net: make GRO aware of skb->head_frag") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29bpf: prevent leaking pointer via xadd on unpriviledgedDaniel Borkmann2-0/+71
Leaking kernel addresses on unpriviledged is generally disallowed, for example, verifier rejects the following: 0: (b7) r0 = 0 1: (18) r2 = 0xffff897e82304400 3: (7b) *(u64 *)(r1 +48) = r2 R2 leaks addr into ctx Doing pointer arithmetic on them is also forbidden, so that they don't turn into unknown value and then get leaked out. However, there's xadd as a special case, where we don't check the src reg for being a pointer register, e.g. the following will pass: 0: (b7) r0 = 0 1: (7b) *(u64 *)(r1 +48) = r0 2: (18) r2 = 0xffff897e82304400 ; map 4: (db) lock *(u64 *)(r1 +48) += r2 5: (95) exit We could store the pointer into skb->cb, loose the type context, and then read it out from there again to leak it eventually out of a map value. Or more easily in a different variant, too: 0: (bf) r6 = r1 1: (7a) *(u64 *)(r10 -8) = 0 2: (bf) r2 = r10 3: (07) r2 += -8 4: (18) r1 = 0x0 6: (85) call bpf_map_lookup_elem#1 7: (15) if r0 == 0x0 goto pc+3 R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R6=ctx R10=fp 8: (b7) r3 = 0 9: (7b) *(u64 *)(r0 +0) = r3 10: (db) lock *(u64 *)(r0 +0) += r6 11: (b7) r0 = 0 12: (95) exit from 7 to 11: R0=inv,min_value=0,max_value=0 R6=ctx R10=fp 11: (b7) r0 = 0 12: (95) exit Prevent this by checking xadd src reg for pointer types. Also add a couple of test cases related to this. Fixes: 1be7f75d1668 ("bpf: enable non-root eBPF programs") Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29perf/x86/intel/uncore: Fix wrong box pointer checkKan Liang1-1/+1
Should not init a NULL box. It will cause system crash. The issue looks like caused by a typo. This was not noticed because there is no NULL box. Also, for most boxes, they are enabled by default. The init code is not critical. Fixes: fff4b87e594a ("perf/x86/intel/uncore: Make package handling more robust") Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20170629190926.2456-1-kan.liang@intel.com
2017-06-29arcnet: com20020-pci: add missing pdev setup in netdev structureMichael Grzeschik1-0/+1
We add the pdev data to the pci devices netdev structure. This way the interface get consistent device names in the userspace (udev). Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29arcnet: com20020-pci: fix dev_id calculationMichael Grzeschik1-2/+3
The dev_id was miscalculated. Only the two bits 4-5 are relevant for the MA1 card. PCIARC1 and PCIFB2 use the four bits 4-7 for id selection. Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>