aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2015-03-27watchdog: imgpdc: Fix probe NULL pointer dereferenceJames Hogan1-1/+1
The IMG PDC watchdog probe function calls pdc_wdt_stop() prior to watchdog_set_drvdata(), causing a NULL pointer dereference when pdc_wdt_stop() retrieves the struct pdc_wdt_dev pointer using watchdog_get_drvdata() and reads the register base address through it. Fix by moving the watchdog_set_drvdata() call earlier, to where various other pdc_wdt->wdt_dev fields are initialised. Fixes: 93937669e9b5 ("watchdog: ImgTec PDC Watchdog Timer Driver") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Ezequiel Garcia <ezequiel.garcia@imgtec.com> Cc: Naidu Tellapati <Naidu.Tellapati@imgtec.com> Cc: Jude Abraham <Jude.Abraham@imgtec.com> Cc: linux-watchdog@vger.kernel.org Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2015-03-27watchdog: mtk_wdt: signedness bug in mtk_wdt_start()Dan Carpenter1-1/+1
"ret" should be signed for the error handling to work correctly. This doesn't matter much in real life since mtk_wdt_set_timeout() always succeeds. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2015-03-26drm/i915: Fixup legacy plane->crtc link for initial fb configDaniel Vetter1-0/+2
This is a very similar bug in the load detect code fixed in commit 9128b040eb774e04bc23777b005ace2b66ab2a85 Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Tue Mar 3 17:31:21 2015 +0100 drm/i915: Fix modeset state confusion in the load detect code But this time around it was the initial fb code that forgot to update the plane->crtc pointer. Otherwise it's the exact same bug, with the exact same restrains (any set_config call/ioctl that doesn't disable the pipe papers over the bug for free, so fairly hard to hit in normal testing). So if you want the full explanation just go read that one over there - it's rather long ... Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Josh Boyer <jwboyer@fedoraproject.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Reported-and-tested-by: Josh Boyer <jwboyer@fedoraproject.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> [Jani: backported to drm-intel-fixes for v4.0-rc] Reference: http://mid.gmane.org/CA+5PVA7ChbtJrknqws1qvZcbrg1CW2pQAFkSMURWWgyASRyGXg@mail.gmail.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2015-03-26drm/i915: Fix atomic state when reusing the firmware fbDamien Lespiau1-2/+9
Right now, we get a warning when taking over the firmware fb: [drm:drm_atomic_plane_check] FB set but no CRTC with the following backtrace: [<ffffffffa010339d>] drm_atomic_check_only+0x35d/0x510 [drm] [<ffffffffa0103567>] drm_atomic_commit+0x17/0x60 [drm] [<ffffffffa00a6ccd>] drm_atomic_helper_plane_set_property+0x8d/0xd0 [drm_kms_helper] [<ffffffffa00f1fed>] drm_mode_plane_set_obj_prop+0x2d/0x90 [drm] [<ffffffffa00a8a1b>] restore_fbdev_mode+0x6b/0xf0 [drm_kms_helper] [<ffffffffa00aa969>] drm_fb_helper_restore_fbdev_mode_unlocked+0x29/0x80 [drm_kms_helper] [<ffffffffa00aa9e2>] drm_fb_helper_set_par+0x22/0x50 [drm_kms_helper] [<ffffffffa050a71a>] intel_fbdev_set_par+0x1a/0x60 [i915] [<ffffffff813ad444>] fbcon_init+0x4f4/0x580 That's because we update the plane state with the fb from the firmware, but we never associate the plane to that CRTC. We don't quite have the full DRM take over from HW state just yet, so fake enough of the plane atomic state to pass the checks. v2: Fix the state on which we set the CRTC in the case we're sharing the initial fb with another pipe. (Matt) Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> [Jani: backported to drm-intel-fixes for v4.0-rc] Reference: http://mid.gmane.org/CA+5PVA7yXH=U757w8V=Zj2U1URG4nYNav20NpjtQ4svVueyPNw@mail.gmail.com Reference: http://lkml.kernel.org/r/CA+55aFweWR=nDzc2Y=rCtL_H8JfdprQiCimN5dwc+TgyD4Bjsg@mail.gmail.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2015-03-26ALSA: hda - Add one more node in the EAPD supporting candidate listHui Wang1-1/+1
We have a HP machine which use the codec node 0x17 connecting the internal speaker, and from the node capability, we saw the EAPD, if we don't set the EAPD on for this node, the internal speaker can't output any sound. Cc: <stable@vger.kernel.org> BugLink: https://bugs.launchpad.net/bugs/1436745 Signed-off-by: Hui Wang <hui.wang@canonical.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2015-03-26drm/i915: Keep ring->active_list and ring->requests_list consistentChris Wilson1-17/+21
If we retire requests last, we may use a later seqno and so clear the requests lists without clearing the active list, leading to confusion. Hence we should retire requests first for consistency with the early return. The order used to be important as the lifecycle for the object on the active list was determined by request->seqno. However, the requests themselves are now reference counted removing the constraint from the order of retirement. Fixes regression from commit 1b5a433a4dd967b125131da42b89b5cc0d5b1f57 Author: John Harrison <John.C.Harrison@Intel.com> Date: Mon Nov 24 18:49:42 2014 +0000 drm/i915: Convert 'i915_seqno_passed' calls into 'i915_gem_request_completed ' and a WARNING: CPU: 0 PID: 1383 at drivers/gpu/drm/i915/i915_gem_evict.c:279 i915_gem_evict_vm+0x10c/0x140() WARN_ON(!list_empty(&vm->active_list)) Identified by updating WATCH_LISTS: [drm:i915_verify_lists] *ERROR* blitter ring: active list not empty, but no requests WARNING: CPU: 0 PID: 681 at drivers/gpu/drm/i915/i915_gem.c:2751 i915_gem_retire_requests_ring+0x149/0x230() WARN_ON(i915_verify_lists(ring->dev)) Note that this is only a problem in evict_vm where the following happens after a retire_request has cleaned out all requests, but not all active bo: - intel_ring_idle called from i915_gpu_idle notices that no requests are outstanding and immediately returns. - i915_gem_retire_requests_ring called from i915_gem_retire_requests also immediately returns when there's no request, still leaving the bo on the active list. - evict_vm hits the WARN_ON(!list_empty(&vm->active_list)) after evicting all active objects that there's still stuff left that shouldn't be there. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2015-03-26ALSA: hda_intel: apply the Seperate stream_tag for Sunrise PointLibin Yang1-1/+1
The total stream number of Sunrise Point's input and output stream exceeds 15, which will cause some streams do not work because of the overflow on SDxCTL.STRM field if using the legacy stream tag allocation method. This patch uses the new stream tag allocation method by add the flag AZX_DCAPS_SEPARATE_STREAM_TAG for Skylake platform. Signed-off-by: Libin Yang <libin.yang@intel.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2015-03-26ARC: signal handling robustifyVineet Gupta1-4/+16
A malicious signal handler / restorer can DOS the system by fudging the user regs saved on stack, causing weird things such as sigreturn returning to user mode PC but cpu state still being kernel mode.... Ensure that in sigreturn path status32 always has U bit; any other bogosity (gargbage PC etc) will be taken care of by normal user mode exceptions mechanisms. Reproducer signal handler: void handle_sig(int signo, siginfo_t *info, void *context) { ucontext_t *uc = context; struct user_regs_struct *regs = &(uc->uc_mcontext.regs); regs->scratch.status32 = 0; } Before the fix, kernel would go off to weeds like below: --------->8----------- [ARCLinux]$ ./signal-test Path: /signal-test CPU: 0 PID: 61 Comm: signal-test Not tainted 4.0.0-rc5+ #65 task: 8f177880 ti: 5ffe6000 task.ti: 8f15c000 [ECR ]: 0x00220200 => Invalid Write @ 0x00000010 by insn @ 0x00010698 [EFA ]: 0x00000010 [BLINK ]: 0x2007c1ee [ERET ]: 0x10698 [STAT32]: 0x00000000 : <-------- BTA: 0x00010680 SP: 0x5ffe7e48 FP: 0x00000000 LPS: 0x20003c6c LPE: 0x20003c70 LPC: 0x00000000 ... --------->8----------- Reported-by: Alexey Brodkin <abrodkin@synopsys.com> Cc: <stable@vger.kernel.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-03-26ARC: SA_SIGINFO ucontext regs off-by-oneVineet Gupta1-2/+2
The regfile provided to SA_SIGINFO signal handler as ucontext was off by one due to pt_regs gutter cleanups in 2013. Before handling signal, user pt_regs are copied onto user_regs_struct and copied back later. Both structs are binary compatible. This was all fine until commit 2fa919045b72 (ARC: pt_regs update #2) which removed the empty stack slot at top of pt_regs (corresponding to first pad) and made the corresponding fixup in struct user_regs_struct (the pad in there was moved out of @scratch - not removed altogether as it is part of ptrace ABI) struct user_regs_struct { + long pad; struct { - long pad; long bta, lp_start, lp_end,.... } scratch; ... } This meant that now user_regs_struct was off by 1 reg w.r.t pt_regs and signal code needs to user_regs_struct.scratch to reflect it as pt_regs, which is what this commit does. This problem was hidden for 2 years, because both save/restore, despite using wrong location, were using the same location. Only an interim inspection (reproducer below) exposed the issue. void handle_segv(int signo, siginfo_t *info, void *context) { ucontext_t *uc = context; struct user_regs_struct *regs = &(uc->uc_mcontext.regs); printf("regs %x %x\n", <=== prints 7 8 (vs. 8 9) regs->scratch.r8, regs->scratch.r9); } int main() { struct sigaction sa; sa.sa_sigaction = handle_segv; sa.sa_flags = SA_SIGINFO; sigemptyset(&sa.sa_mask); sigaction(SIGSEGV, &sa, NULL); asm volatile( "mov r7, 7 \n" "mov r8, 8 \n" "mov r9, 9 \n" "mov r10, 10 \n" :::"r7","r8","r9","r10"); *((unsigned int*)0x10) = 0; } Fixes: 2fa919045b72ec892e "ARC: pt_regs update #2: Remove unused gutter at start of pt_regs" CC: <stable@vger.kernel.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-03-25mm: numa: mark huge PTEs young when clearing NUMA hinting faultsMel Gorman1-0/+1
Base PTEs are marked young when the NUMA hinting information is cleared but the same does not happen for huge pages which this patch addresses. Note that migrated pages are not marked young as the base page migration code does not assume that migrated pages have been referenced. This could be addressed but beyond the scope of this series which is aimed at Dave Chinners shrink workload that is unlikely to be affected by this issue. Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-03-25mm: numa: slow PTE scan rate if migration failures occurMel Gorman4-8/+15
Dave Chinner reported the following on https://lkml.org/lkml/2015/3/1/226 Across the board the 4.0-rc1 numbers are much slower, and the degradation is far worse when using the large memory footprint configs. Perf points straight at the cause - this is from 4.0-rc1 on the "-o bhash=101073" config: - 56.07% 56.07% [kernel] [k] default_send_IPI_mask_sequence_phys - default_send_IPI_mask_sequence_phys - 99.99% physflat_send_IPI_mask - 99.37% native_send_call_func_ipi smp_call_function_many - native_flush_tlb_others - 99.85% flush_tlb_page ptep_clear_flush try_to_unmap_one rmap_walk try_to_unmap migrate_pages migrate_misplaced_page - handle_mm_fault - 99.73% __do_page_fault trace_do_page_fault do_async_page_fault + async_page_fault 0.63% native_send_call_func_single_ipi generic_exec_single smp_call_function_single This is showing excessive migration activity even though excessive migrations are meant to get throttled. Normally, the scan rate is tuned on a per-task basis depending on the locality of faults. However, if migrations fail for any reason then the PTE scanner may scan faster if the faults continue to be remote. This means there is higher system CPU overhead and fault trapping at exactly the time we know that migrations cannot happen. This patch tracks when migration failures occur and slows the PTE scanner. Signed-off-by: Mel Gorman <mgorman@suse.de> Reported-by: Dave Chinner <david@fromorbit.com> Tested-by: Dave Chinner <david@fromorbit.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-03-25mm: numa: preserve PTE write permissions across a NUMA hinting faultMel Gorman3-6/+14
Protecting a PTE to trap a NUMA hinting fault clears the writable bit and further faults are needed after trapping a NUMA hinting fault to set the writable bit again. This patch preserves the writable bit when trapping NUMA hinting faults. The impact is obvious from the number of minor faults trapped during the basis balancing benchmark and the system CPU usage; autonumabench 4.0.0-rc4 4.0.0-rc4 baseline preserve Time System-NUMA01 107.13 ( 0.00%) 103.13 ( 3.73%) Time System-NUMA01_THEADLOCAL 131.87 ( 0.00%) 83.30 ( 36.83%) Time System-NUMA02 8.95 ( 0.00%) 10.72 (-19.78%) Time System-NUMA02_SMT 4.57 ( 0.00%) 3.99 ( 12.69%) Time Elapsed-NUMA01 515.78 ( 0.00%) 517.26 ( -0.29%) Time Elapsed-NUMA01_THEADLOCAL 384.10 ( 0.00%) 384.31 ( -0.05%) Time Elapsed-NUMA02 48.86 ( 0.00%) 48.78 ( 0.16%) Time Elapsed-NUMA02_SMT 47.98 ( 0.00%) 48.12 ( -0.29%) 4.0.0-rc4 4.0.0-rc4 baseline preserve User 44383.95 43971.89 System 252.61 201.24 Elapsed 998.68 1000.94 Minor Faults 2597249 1981230 Major Faults 365 364 There is a similar drop in system CPU usage using Dave Chinner's xfsrepair workload 4.0.0-rc4 4.0.0-rc4 baseline preserve Amean real-xfsrepair 454.14 ( 0.00%) 442.36 ( 2.60%) Amean syst-xfsrepair 277.20 ( 0.00%) 204.68 ( 26.16%) The patch looks hacky but the alternatives looked worse. The tidest was to rewalk the page tables after a hinting fault but it was more complex than this approach and the performance was worse. It's not generally safe to just mark the page writable during the fault if it's a write fault as it may have been read-only for COW so that approach was discarded. Signed-off-by: Mel Gorman <mgorman@suse.de> Reported-by: Dave Chinner <david@fromorbit.com> Tested-by: Dave Chinner <david@fromorbit.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-03-25mm: numa: group related processes based on VMA flags instead of page table flagsMel Gorman2-19/+13
These are three follow-on patches based on the xfsrepair workload Dave Chinner reported was problematic in 4.0-rc1 due to changes in page table management -- https://lkml.org/lkml/2015/3/1/226. Much of the problem was reduced by commit 53da3bc2ba9e ("mm: fix up numa read-only thread grouping logic") and commit ba68bc0115eb ("mm: thp: Return the correct value for change_huge_pmd"). It was known that the performance in 3.19 was still better even if is far less safe. This series aims to restore the performance without compromising on safety. For the test of this mail, I'm comparing 3.19 against 4.0-rc4 and the three patches applied on top autonumabench 3.19.0 4.0.0-rc4 4.0.0-rc4 4.0.0-rc4 4.0.0-rc4 vanilla vanilla vmwrite-v5r8 preserve-v5r8 slowscan-v5r8 Time System-NUMA01 124.00 ( 0.00%) 161.86 (-30.53%) 107.13 ( 13.60%) 103.13 ( 16.83%) 145.01 (-16.94%) Time System-NUMA01_THEADLOCAL 115.54 ( 0.00%) 107.64 ( 6.84%) 131.87 (-14.13%) 83.30 ( 27.90%) 92.35 ( 20.07%) Time System-NUMA02 9.35 ( 0.00%) 10.44 (-11.66%) 8.95 ( 4.28%) 10.72 (-14.65%) 8.16 ( 12.73%) Time System-NUMA02_SMT 3.87 ( 0.00%) 4.63 (-19.64%) 4.57 (-18.09%) 3.99 ( -3.10%) 3.36 ( 13.18%) Time Elapsed-NUMA01 570.06 ( 0.00%) 567.82 ( 0.39%) 515.78 ( 9.52%) 517.26 ( 9.26%) 543.80 ( 4.61%) Time Elapsed-NUMA01_THEADLOCAL 393.69 ( 0.00%) 384.83 ( 2.25%) 384.10 ( 2.44%) 384.31 ( 2.38%) 380.73 ( 3.29%) Time Elapsed-NUMA02 49.09 ( 0.00%) 49.33 ( -0.49%) 48.86 ( 0.47%) 48.78 ( 0.63%) 50.94 ( -3.77%) Time Elapsed-NUMA02_SMT 47.51 ( 0.00%) 47.15 ( 0.76%) 47.98 ( -0.99%) 48.12 ( -1.28%) 49.56 ( -4.31%) 3.19.0 4.0.0-rc4 4.0.0-rc4 4.0.0-rc4 4.0.0-rc4 vanilla vanillavmwrite-v5r8preserve-v5r8slowscan-v5r8 User 46334.60 46391.94 44383.95 43971.89 44372.12 System 252.84 284.66 252.61 201.24 249.00 Elapsed 1062.14 1050.96 998.68 1000.94 1026.78 Overall the system CPU usage is comparable and the test is naturally a bit variable. The slowing of the scanner hurts numa01 but on this machine it is an adverse workload and patches that dramatically help it often hurt absolutely everything else. Due to patch 2, the fault activity is interesting 3.19.0 4.0.0-rc4 4.0.0-rc4 4.0.0-rc4 4.0.0-rc4 vanilla vanillavmwrite-v5r8preserve-v5r8slowscan-v5r8 Minor Faults 2097811 2656646 2597249 1981230 1636841 Major Faults 362 450 365 364 365 Note the impact preserving the write bit across protection updates and fault reduces faults. NUMA alloc hit 1229008 1217015 1191660 1178322 1199681 NUMA alloc miss 0 0 0 0 0 NUMA interleave hit 0 0 0 0 0 NUMA alloc local 1228514 1216317 1190871 1177448 1199021 NUMA base PTE updates 245706197 240041607 238195516 244704842 115012800 NUMA huge PMD updates 479530 468448 464868 477573 224487 NUMA page range updates 491225557 479886983 476207932 489222218 229950144 NUMA hint faults 659753 656503 641678 656926 294842 NUMA hint local faults 381604 373963 360478 337585 186249 NUMA hint local percent 57 56 56 51 63 NUMA pages migrated 5412140 6374899 6266530 5277468 5755096 AutoNUMA cost 5121% 5083% 4994% 5097% 2388% Here the impact of slowing the PTE scanner on migratrion failures is obvious as "NUMA base PTE updates" and "NUMA huge PMD updates" are massively reduced even though the headline performance is very similar. As xfsrepair was the reported workload here is the impact of the series on it. xfsrepair 3.19.0 4.0.0-rc4 4.0.0-rc4 4.0.0-rc4 4.0.0-rc4 vanilla vanilla vmwrite-v5r8 preserve-v5r8 slowscan-v5r8 Min real-fsmark 1183.29 ( 0.00%) 1165.73 ( 1.48%) 1152.78 ( 2.58%) 1153.64 ( 2.51%) 1177.62 ( 0.48%) Min syst-fsmark 4107.85 ( 0.00%) 4027.75 ( 1.95%) 3986.74 ( 2.95%) 3979.16 ( 3.13%) 4048.76 ( 1.44%) Min real-xfsrepair 441.51 ( 0.00%) 463.96 ( -5.08%) 449.50 ( -1.81%) 440.08 ( 0.32%) 439.87 ( 0.37%) Min syst-xfsrepair 195.76 ( 0.00%) 278.47 (-42.25%) 262.34 (-34.01%) 203.70 ( -4.06%) 143.64 ( 26.62%) Amean real-fsmark 1188.30 ( 0.00%) 1177.34 ( 0.92%) 1157.97 ( 2.55%) 1158.21 ( 2.53%) 1182.22 ( 0.51%) Amean syst-fsmark 4111.37 ( 0.00%) 4055.70 ( 1.35%) 3987.19 ( 3.02%) 3998.72 ( 2.74%) 4061.69 ( 1.21%) Amean real-xfsrepair 450.88 ( 0.00%) 468.32 ( -3.87%) 454.14 ( -0.72%) 442.36 ( 1.89%) 440.59 ( 2.28%) Amean syst-xfsrepair 199.66 ( 0.00%) 290.60 (-45.55%) 277.20 (-38.84%) 204.68 ( -2.51%) 150.55 ( 24.60%) Stddev real-fsmark 4.12 ( 0.00%) 10.82 (-162.29%) 4.14 ( -0.28%) 5.98 (-45.05%) 4.60 (-11.53%) Stddev syst-fsmark 2.63 ( 0.00%) 20.32 (-671.82%) 0.37 ( 85.89%) 16.47 (-525.59%) 15.05 (-471.79%) Stddev real-xfsrepair 6.87 ( 0.00%) 4.55 ( 33.75%) 3.46 ( 49.58%) 1.78 ( 74.12%) 0.52 ( 92.50%) Stddev syst-xfsrepair 3.02 ( 0.00%) 10.30 (-241.37%) 13.17 (-336.37%) 0.71 ( 76.63%) 5.00 (-65.61%) CoeffVar real-fsmark 0.35 ( 0.00%) 0.92 (-164.73%) 0.36 ( -2.91%) 0.52 (-48.82%) 0.39 (-12.10%) CoeffVar syst-fsmark 0.06 ( 0.00%) 0.50 (-682.41%) 0.01 ( 85.45%) 0.41 (-543.22%) 0.37 (-478.78%) CoeffVar real-xfsrepair 1.52 ( 0.00%) 0.97 ( 36.21%) 0.76 ( 49.94%) 0.40 ( 73.62%) 0.12 ( 92.33%) CoeffVar syst-xfsrepair 1.51 ( 0.00%) 3.54 (-134.54%) 4.75 (-214.31%) 0.34 ( 77.20%) 3.32 (-119.63%) Max real-fsmark 1193.39 ( 0.00%) 1191.77 ( 0.14%) 1162.90 ( 2.55%) 1166.66 ( 2.24%) 1188.50 ( 0.41%) Max syst-fsmark 4114.18 ( 0.00%) 4075.45 ( 0.94%) 3987.65 ( 3.08%) 4019.45 ( 2.30%) 4082.80 ( 0.76%) Max real-xfsrepair 457.80 ( 0.00%) 474.60 ( -3.67%) 457.82 ( -0.00%) 444.42 ( 2.92%) 441.03 ( 3.66%) Max syst-xfsrepair 203.11 ( 0.00%) 303.65 (-49.50%) 294.35 (-44.92%) 205.33 ( -1.09%) 155.28 ( 23.55%) The really relevant lines as syst-xfsrepair which is the system CPU usage when running xfsrepair. Note that on my machine the overhead was 45% higher on 4.0-rc4 which may be part of what Dave is seeing. Once we preserve the write bit across faults, it's only 2.51% higher on average. With the full series applied, system CPU usage is 24.6% lower on average. Again, the impact of preserving the write bit on minor faults is obvious and the impact of slowing scanning after migration failures is obvious on the PTE updates. Note also that the number of pages migrated is much reduced even though the headline performance is comparable. 3.19.0 4.0.0-rc4 4.0.0-rc4 4.0.0-rc4 4.0.0-rc4 vanilla vanillavmwrite-v5r8preserve-v5r8slowscan-v5r8 Minor Faults 153466827 254507978 249163829 153501373 105737890 Major Faults 610 702 690 649 724 NUMA base PTE updates 217735049 210756527 217729596 216937111 144344993 NUMA huge PMD updates 129294 85044 106921 127246 79887 NUMA pages migrated 21938995 29705270 28594162 22687324 16258075 3.19.0 4.0.0-rc4 4.0.0-rc4 4.0.0-rc4 4.0.0-rc4 vanilla vanillavmwrite-v5r8preserve-v5r8slowscan-v5r8 Mean sdb-avgqusz 13.47 2.54 2.55 2.47 2.49 Mean sdb-avgrqsz 202.32 140.22 139.50 139.02 138.12 Mean sdb-await 25.92 5.09 5.33 5.02 5.22 Mean sdb-r_await 4.71 0.19 0.83 0.51 0.11 Mean sdb-w_await 104.13 5.21 5.38 5.05 5.32 Mean sdb-svctm 0.59 0.13 0.14 0.13 0.14 Mean sdb-rrqm 0.16 0.00 0.00 0.00 0.00 Mean sdb-wrqm 3.59 1799.43 1826.84 1812.21 1785.67 Max sdb-avgqusz 111.06 12.13 14.05 11.66 15.60 Max sdb-avgrqsz 255.60 190.34 190.01 187.33 191.78 Max sdb-await 168.24 39.28 49.22 44.64 65.62 Max sdb-r_await 660.00 52.00 280.00 76.00 12.00 Max sdb-w_await 7804.00 39.28 49.22 44.64 65.62 Max sdb-svctm 4.00 2.82 2.86 1.98 2.84 Max sdb-rrqm 8.30 0.00 0.00 0.00 0.00 Max sdb-wrqm 34.20 5372.80 5278.60 5386.60 5546.15 FWIW, I also checked SPECjbb in different configurations but it's similar observations -- minor faults lower, PTE update activity lower and performance is roughly comparable against 3.19. This patch (of 3): Threads that share writable data within pages are grouped together as related tasks. This decision is based on whether the PTE is marked dirty which is subject to timing races between the PTE scanner update and when the application writes the page. If the page is file-backed, then background flushes and sync also affect placement. This is unpredictable behaviour which is impossible to reason about so this patch makes grouping decisions based on the VMA flags. Signed-off-by: Mel Gorman <mgorman@suse.de> Reported-by: Dave Chinner <david@fromorbit.com> Tested-by: Dave Chinner <david@fromorbit.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-03-25hfsplus: fix B-tree corruption after insertion at position 0Sergei Antonov1-9/+11
Fix B-tree corruption when a new record is inserted at position 0 in the node in hfs_brec_insert(). In this case a hfs_brec_update_parent() is called to update the parent index node (if exists) and it is passed hfs_find_data with a search_key containing a newly inserted key instead of the key to be updated. This results in an inconsistent index node. The bug reproduces on my machine after an extents overflow record for the catalog file (CNID=4) is inserted into the extents overflow B-tree. Because of a low (reserved) value of CNID=4, it has to become the first record in the first leaf node. The resulting first leaf node is correct: ---------------------------------------------------- | key0.CNID=4 | key1.CNID=123 | key2.CNID=456, ... | ---------------------------------------------------- But the parent index key0 still contains the previous key CNID=123: ----------------------- | key0.CNID=123 | ... | ----------------------- A change in hfs_brec_insert() makes hfs_brec_update_parent() work correctly by preventing it from getting fd->record=-1 value from __hfs_brec_find(). Along the way, I removed duplicate code with unification of the if condition. The resulting code is equivalent to the original code because node is never 0. Also hfs_brec_update_parent() will now return an error after getting a negative fd->record value. However, the return value of hfs_brec_update_parent() is not checked anywhere in the file and I'm leaving it unchanged by this patch. brec.c lacks error checking after some other calls too, but this issue is of less importance than the one being fixed by this patch. Signed-off-by: Sergei Antonov <saproj@gmail.com> Cc: Joe Perches <joe@perches.com> Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com> Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Anton Altaparmakov <aia21@cam.ac.uk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-03-25MAINTAINERS: add Jan as DMI/SMBIOS support maintainerJean Delvare1-0/+7
I am familiar with these drivers and I care about them so let me add myself as their maintainer. Signed-off-by: Jean Delvare <jdelvare@suse.de> Acked-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-03-25fs/affs/file.c: unlock/release page on errorTaesoo Kim1-7/+12
When affs_bread_ino() fails, correctly unlock the page and release the page cache with proper error value. All write_end() should unlock/release the page that was locked by write_beg(). Signed-off-by: Taesoo Kim <tsgatesv@gmail.com> Cc: Fabian Frederick <fabf@skynet.be> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-03-25mm/page_alloc.c: call kernel_map_pages in unset_migrateype_isolateLaura Abbott1-0/+1
Commit 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated pageblock") changed the logic of unset_migratetype_isolate to check the buddy allocator and explicitly call __free_pages to merge. The page that is being freed in this path never had prep_new_page called so set_page_refcounted is called explicitly but there is no call to kernel_map_pages. With the default kernel_map_pages this is mostly harmless but if kernel_map_pages does any manipulation of the page tables (unmapping or setting pages to read only) this may trigger a fault: alloc_contig_range test_pages_isolated(ceb00, ced00) failed Unable to handle kernel paging request at virtual address ffffffc0cec00000 pgd = ffffffc045fc4000 [ffffffc0cec00000] *pgd=0000000000000000 Internal error: Oops: 9600004f [#1] PREEMPT SMP Modules linked in: exfatfs CPU: 1 PID: 23237 Comm: TimedEventQueue Not tainted 3.10.49-gc72ad36-dirty #1 task: ffffffc03de52100 ti: ffffffc015388000 task.ti: ffffffc015388000 PC is at memset+0xc8/0x1c0 LR is at kernel_map_pages+0x1ec/0x244 Fix this by calling kernel_map_pages to ensure the page is set in the page table properly Fixes: 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated pageblock") Signed-off-by: Laura Abbott <lauraa@codeaurora.org> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Gioh Kim <gioh.kim@lge.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-03-25mm/slub: fix lockups on PREEMPT && !SMP kernelsMark Rutland1-2/+4
Commit 9aabf810a67c ("mm/slub: optimize alloc/free fastpath by removing preemption on/off") introduced an occasional hang for kernels built with CONFIG_PREEMPT && !CONFIG_SMP. The problem is the following loop the patch introduced to slab_alloc_node and slab_free: do { tid = this_cpu_read(s->cpu_slab->tid); c = raw_cpu_ptr(s->cpu_slab); } while (IS_ENABLED(CONFIG_PREEMPT) && unlikely(tid != c->tid)); GCC 4.9 has been observed to hoist the load of c and c->tid above the loop for !SMP kernels (as in this case raw_cpu_ptr(x) is compile-time constant and does not force a reload). On arm64 the generated assembly looks like: ldr x4, [x0,#8] loop: ldr x1, [x0,#8] cmp x1, x4 b.ne loop If the thread is preempted between the load of c->tid (into x1) and tid (into x4), and an allocation or free occurs in another thread (bumping the cpu_slab's tid), the thread will be stuck in the loop until s->cpu_slab->tid wraps, which may be forever in the absence of allocations/frees on the same CPU. This patch changes the loop condition to access c->tid with READ_ONCE. This ensures that the value is reloaded even when the compiler would otherwise assume it could cache the value, and also ensures that the load will not be torn. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Steve Capper <steve.capper@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-03-25mm/memory hotplug: postpone the reset of obsolete pgdatGu Zheng1-9/+4
Qiu Xishi reported the following BUG when testing hot-add/hot-remove node under stress condition: BUG: unable to handle kernel paging request at 0000000000025f60 IP: next_online_pgdat+0x1/0x50 PGD 0 Oops: 0000 [#1] SMP ACPI: Device does not support D3cold Modules linked in: fuse nls_iso8859_1 nls_cp437 vfat fat loop dm_mod coretemp mperf crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 pcspkr microcode igb dca i2c_algo_bit ipv6 megaraid_sas iTCO_wdt i2c_i801 i2c_core iTCO_vendor_support tg3 sg hwmon ptp lpc_ich pps_core mfd_core acpi_pad rtc_cmos button ext3 jbd mbcache sd_mod crc_t10dif scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh ahci libahci libata scsi_mod [last unloaded: rasf] CPU: 23 PID: 238 Comm: kworker/23:1 Tainted: G O 3.10.15-5885-euler0302 #1 Hardware name: HUAWEI TECHNOLOGIES CO.,LTD. Huawei N1/Huawei N1, BIOS V100R001 03/02/2015 Workqueue: events vmstat_update task: ffffa800d32c0000 ti: ffffa800d32ae000 task.ti: ffffa800d32ae000 RIP: 0010: next_online_pgdat+0x1/0x50 RSP: 0018:ffffa800d32afce8 EFLAGS: 00010286 RAX: 0000000000001440 RBX: ffffffff81da53b8 RCX: 0000000000000082 RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000000 RBP: ffffa800d32afd28 R08: ffffffff81c93bfc R09: ffffffff81cbdc96 R10: 00000000000040ec R11: 00000000000000a0 R12: ffffa800fffb3440 R13: ffffa800d32afd38 R14: 0000000000000017 R15: ffffa800e6616800 FS: 0000000000000000(0000) GS:ffffa800e6600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000025f60 CR3: 0000000001a0b000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: refresh_cpu_vm_stats+0xd0/0x140 vmstat_update+0x11/0x50 process_one_work+0x194/0x3d0 worker_thread+0x12b/0x410 kthread+0xc6/0xd0 ret_from_fork+0x7c/0xb0 The cause is the "memset(pgdat, 0, sizeof(*pgdat))" at the end of try_offline_node, which will reset all the content of pgdat to 0, as the pgdat is accessed lock-free, so that the users still using the pgdat will panic, such as the vmstat_update routine. process A: offline node XX: vmstat_updat() refresh_cpu_vm_stats() for_each_populated_zone() find online node XX cond_resched() offline cpu and memory, then try_offline_node() node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat)) zone = next_zone(zone) pg_data_t *pgdat = zone->zone_pgdat; // here pgdat is NULL now next_online_pgdat(pgdat) next_online_node(pgdat->node_id); // NULL pointer access So the solution here is postponing the reset of obsolete pgdat from try_offline_node() to hotadd_new_pgdat(), and just resetting pgdat->nr_zones and pgdat->classzone_idx to be 0 rather than the memset 0 to avoid breaking pointer information in pgdat. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Reported-by: Xishi Qiu <qiuxishi@huawei.com> Suggested-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Xie XiuQi <xiexiuqi@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-03-25MAINTAINERS: correct rtc armada38x pattern entryJoe Perches1-1/+1
Commit c6a95dbee793 ("MAINTAINERS: add the RTC driver for the Armada38x") typoed the pattern, fix it. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-03-25mm/pagewalk.c: prevent positive return value of walk_page_test() from being passed to callersNaoya Horiguchi1-1/+8
walk_page_test() is purely pagewalk's internal stuff, and its positive return values are not intended to be passed to the callers of pagewalk. However, in the current code if the last vma in the do-while loop in walk_page_range() happens to return a positive value, it leaks outside walk_page_range(). So the user visible effect is invalid/unexpected return value (according to the reporter, mbind() causes it.) This patch fixes it simply by reinitializing the return value after checked. Another exposed interface, walk_page_vma(), already returns 0 for such cases so no problem. Fixes: fafaa4264eba ("pagewalk: improve vma handling") Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Kazutomo Yoshii <kazutomo.yoshii@gmail.com> Reported-by: Kazutomo Yoshii <kazutomo.yoshii@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-03-25mm: fix anon_vma->degree underflow in anon_vma endless growing preventionLeon Yu2-3/+8
I have constantly stumbled upon "kernel BUG at mm/rmap.c:399!" after upgrading to 3.19 and had no luck with 4.0-rc1 neither. So, after looking into new logic introduced by commit 7a3ef208e662 ("mm: prevent endless growth of anon_vma hierarchy"), I found chances are that unlink_anon_vmas() is called without incrementing dst->anon_vma->degree in anon_vma_clone() due to allocation failure. If dst->anon_vma is not NULL in error path, its degree will be incorrectly decremented in unlink_anon_vmas() and eventually underflow when exiting as a result of another call to unlink_anon_vmas(). That's how "kernel BUG at mm/rmap.c:399!" is triggered for me. This patch fixes the underflow by dropping dst->anon_vma when allocation fails. It's safe to do so regardless of original value of dst->anon_vma because dst->anon_vma doesn't have valid meaning if anon_vma_clone() fails. Besides, callers don't care dst->anon_vma in such case neither. Also suggested by Michal Hocko, we can clean up vma_adjust() a bit as anon_vma_clone() now does the work. [akpm@linux-foundation.org: tweak comment] Fixes: 7a3ef208e662 ("mm: prevent endless growth of anon_vma hierarchy") Signed-off-by: Leon Yu <chianglungyu@gmail.com> Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-03-25drivers/rtc/rtc-mrst: fix suspend/resumeLars-Peter Clausen1-8/+9
The Moorestown RTC driver implements suspend and resume callbacks and assigns them to the suspend and resume fields of the device_driver struct. These callbacks are never actually called by anything though. Modify the driver to properly use dev_pm_ops so that the suspend and resume functions are actually executed upon suspend/resume. [akpm@linux-foundation.org: device_driver.name is const char *] Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Feng Tang <feng.tang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-03-25aoe: update aoe maintainer informationEd Cashin1-2/+2
The coraid.com email address is defunct. The old aoe support area hosted at coraid.com is no longer up. These changes update the email and website to current ones. Signed-off-by: Ed Cashin <ed.cashin@acm.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-03-25selinux: fix sel_write_enforce broken return valueJoe Perches1-1/+1
Return a negative error value like the rest of the entries in this function. Cc: <stable@vger.kernel.org> Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> [PM: tweaked subject line] Signed-off-by: Paul Moore <pmoore@redhat.com>
2015-03-25s390/smp: reenable smt after resumeHeiko Carstens1-0/+11
After a suspend/resume cycle we missed to enable smt again, which leads to all sorts of bugs, since the kernel assumes smt is enabled, while the hardware thinks it is not. Reported-and-tested-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Reported-by: Stefan Haberland <stefan.haberland@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-03-25drm/i915: Don't try to reference the fb in get_initial_plane_config()Damien Lespiau1-4/+3
Tvrtko noticed a new warning on boot: WARNING: CPU: 1 PID: 353 at include/linux/kref.h:47 drm_framebuffer_reference+0x6c/0x80 [drm]() Call Trace: [<ffffffff8161f10c>] dump_stack+0x4f/0x7b [<ffffffff81052caa>] warn_slowpath_common+0xaa/0xd0 [<ffffffff81052d8a>] warn_slowpath_null+0x1a/0x20 [<ffffffffa00d035c>] drm_framebuffer_reference+0x6c/0x80 [drm] [<ffffffffa01c0df7>] update_state_fb.isra.54+0x47/0x50 [i915] [<ffffffffa01ccd5c>] skylake_get_initial_plane_config+0x93c/0x950 [i915] [<ffffffffa01e8721>] intel_modeset_init+0x1551/0x17c0 [i915] [<ffffffffa02476e0>] i915_driver_load+0xed0/0x11e0 [i915] [<ffffffff81627aa1>] ? _raw_spin_unlock_irqrestore+0x51/0x70 [<ffffffffa00ca8b7>] drm_dev_register+0x77/0x110 [drm] [<ffffffffa00cda3b>] drm_get_pci_dev+0x11b/0x1f0 [drm] [<ffffffff81098e3d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff81627aa1>] ? _raw_spin_unlock_irqrestore+0x51/0x70 [<ffffffffa0145276>] i915_pci_probe+0x56/0x60 [i915] [<ffffffff813ad59c>] pci_device_probe+0x7c/0x100 [<ffffffff81466aad>] driver_probe_device+0x16d/0x380 We cannot take a reference at this point, not before intel_framebuffer_init() and the underlying drm_framebuffer_init(). Introduced in: commit 706dc7b549175e47f23e913b7f1e52874a7d0f56 Author: Matt Roper <matthew.d.roper@intel.com> Date: Tue Feb 3 13:10:04 2015 -0800 drm/i915: Ensure plane->state->fb stays in sync with plane->fb v2: Don't move update_state_fb(). It was moved around because I originally put update_state_fb() in intel_alloc_plane_obj() before finding a better place. (Matt) Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From drm-next: (cherry picked from commit f55548b5af87ebfc586ca75748947f1c1b1a4a52) Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-03-24arm64: percpu: Make this_cpu accessors pre-empt safeSteve Capper2-19/+57
this_cpu operations were implemented for arm64 in: 5284e1b arm64: xchg: Implement cmpxchg_double f97fc81 arm64: percpu: Implement this_cpu operations Unfortunately, it is possible for pre-emption to take place between address generation and data access. This can lead to cases where data is being manipulated by this_cpu for a different CPU than it was called on. Which effectively breaks the spec. This patch disables pre-emption for the this_cpu operations guaranteeing that address generation and data manipulation take place without a pre-emption in-between. Fixes: 5284e1b4bc8a ("arm64: xchg: Implement cmpxchg_double") Fixes: f97fc810798c ("arm64: percpu: Implement this_cpu operations") Reported-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Steve Capper <steve.capper@linaro.org> [catalin.marinas@arm.com: remove space after type cast] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-03-24drm: Fixup racy refcounting in plane_force_disableDaniel Vetter1-12/+1
Originally it was impossible to be dropping the last refcount in this function since there was always one around still from the idr. But in commit 83f45fc360c8e16a330474860ebda872d1384c8c Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Wed Aug 6 09:10:18 2014 +0200 drm: Don't grab an fb reference for the idr we've switched to weak references, broke that assumption but forgot to fix it up. Since we still force-disable planes it's only possible to hit this when racing multiple rmfb with fbdev restoring or similar evil things. As long as userspace is nice it's impossible to hit the BUG_ON. But the BUG_ON would most likely be hit from fbdev code, which usually invovles the console_lock besides all modeset locks. So very likely we'd never get the bug reports if this was hit in the wild, hence better be safe than sorry and backport. Spotted by Matt Roper while reviewing other patches. [airlied: pull this back into 4.0 - the oops happens there] Cc: stable@vger.kernel.org Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-03-23kvm: avoid page allocation failure in kvm_set_memory_region()Igor Mammedov1-7/+7
KVM guest can fail to startup with following trace on host: qemu-system-x86: page allocation failure: order:4, mode:0x40d0 Call Trace: dump_stack+0x47/0x67 warn_alloc_failed+0xee/0x150 __alloc_pages_direct_compact+0x14a/0x150 __alloc_pages_nodemask+0x776/0xb80 alloc_kmem_pages+0x3a/0x110 kmalloc_order+0x13/0x50 kmemdup+0x1b/0x40 __kvm_set_memory_region+0x24a/0x9f0 [kvm] kvm_set_ioapic+0x130/0x130 [kvm] kvm_set_memory_region+0x21/0x40 [kvm] kvm_vm_ioctl+0x43f/0x750 [kvm] Failure happens when attempting to allocate pages for 'struct kvm_memslots', however it doesn't have to be present in physically contiguous (kmalloc-ed) address space, change allocation to kvm_kvzalloc() so that it will be vmalloc-ed when its size is more then a page. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-23KVM: x86: call irq notifiers with directed EOIRadim Krčmář2-3/+4
kvm_ioapic_update_eoi() wasn't called if directed EOI was enabled. We need to do that for irq notifiers. (Like with edge interrupts.) Fix it by skipping EOI broadcast only. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=82211 Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Tested-by: Bandan Das <bsd@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2015-03-23dm: fix add_disk() NULL pointer due to race with free_dev()Mike Snitzer1-10/+16
Commit c4db59d31e39 ("fs: don't reassign dirty inodes to default_backing_dev_info") exposed DM to a latent race in free_dev() vs add_disk() in relation to management of the device's minor number. Fix this by refactoring free_dev() to match cleanup order of the alloc_dev() error path. Move cleanup of the gendisk, queue, and bdev to _before_ the cleanup of the idr managed minor number. Also, purely due to cleanup that fell out during the free_dev() audit: - adjust dm_blk_close() to access the gendisk's private_data under the _minor_lock spinlock. - move __dm_destroy()'s dm_get_live_table() call out from under the _minor_lock spinlock. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1202449 Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Reported-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-03-23arm64: Use the reserved TTBR0 if context switching to the init_mmCatalin Marinas1-0/+9
The idle_task_exit() function may call switch_mm() with next == &init_mm. On arm64, init_mm.pgd cannot be used for user mappings, so this patch simply sets the reserved TTBR0. Cc: <stable@vger.kernel.org> Reported-by: Jon Medhurst (Tixy) <tixy@linaro.org> Tested-by: Jon Medhurst (Tixy) <tixy@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-03-23ALSA: hda - Add dock support for Thinkpad T450s (17aa:5036)Sebastian Wicki1-0/+1
This model uses the same dock port as the previous generation. Signed-off-by: Sebastian Wicki <gandro@gmx.net> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2015-03-23sparc64: Fix several bugs in memmove().David S. Miller1-3/+32
Firstly, handle zero length calls properly. Believe it or not there are a few of these happening during early boot. Next, we can't just drop to a memcpy() call in the forward copy case where dst <= src. The reason is that the cache initializing stores used in the Niagara memcpy() implementations can end up clearing out cache lines before we've sourced their original contents completely. For example, considering NG4memcpy, the main unrolled loop begins like this: load src + 0x00 load src + 0x08 load src + 0x10 load src + 0x18 load src + 0x20 store dst + 0x00 Assume dst is 64 byte aligned and let's say that dst is src - 8 for this memcpy() call. That store at the end there is the one to the first line in the cache line, thus clearing the whole line, which thus clobbers "src + 0x28" before it even gets loaded. To avoid this, just fall through to a simple copy only mildly optimized for the case where src and dst are 8 byte aligned and the length is a multiple of 8 as well. We could get fancy and call GENmemcpy() but this is good enough for how this thing is actually used. Reported-by: David Ahern <david.ahern@oracle.com> Reported-by: Bob Picco <bpicco@meloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23writeback: fix possible underflow in write bandwidth calculationTejun Heo1-1/+4
From 1ebf33901ecc75d9496862dceb1ef0377980587c Mon Sep 17 00:00:00 2001 From: Tejun Heo <tj@kernel.org> Date: Mon, 23 Mar 2015 00:08:19 -0400 2f800fbd777b ("writeback: fix dirtied pages accounting on redirty") introduced account_page_redirty() which reverts stat updates for a redirtied page, making BDI_DIRTIED no longer monotonically increasing. bdi_update_write_bandwidth() uses the delta in BDI_DIRTIED as the basis for bandwidth calculation. While unlikely, since the above patch, the newer value may be lower than the recorded past value and underflow the bandwidth calculation leading to a wild result. Fix it by subtracing min of the old and new values when calculating delta. AFAIK, there hasn't been any report of it happening but the resulting erratic behavior would be non-critical and temporary, so it's possible that the issue is happening without being reported. The risk of the fix is very low, so tagged for -stable. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jan Kara <jack@suse.cz> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Greg Thelen <gthelen@google.com> Fixes: 2f800fbd777b ("writeback: fix dirtied pages accounting on redirty") Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
2015-03-23NVMe: Initialize device list head before startingKeith Busch1-0/+1
Driver recovery requires the device's list node to have been initialized. Fixes: https://lkml.org/lkml/2015/3/22/262 Reported-by: Steven Noonan <steven@uplinklabs.net> Signed-off-by: Keith Busch <keith.busch@intel.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Jens Axboe <axboe@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-03-23metag: Fix ioremap_wc/ioremap_cached build errorsJames Hogan3-94/+106
When ioremap_wc() or ioremap_cached() are used without first including asm/pgtable.h, the _PAGE_CACHEABLE or _PAGE_WR_COMBINE definitions aren't found, resulting in build errors like the following (in next-20150323 due to "lib: devres: add a helper function for ioremap_wc"): lib/devres.c: In function ‘devm_ioremap_wc’: lib/devres.c:91: error: ‘_PAGE_WR_COMBINE’ undeclared We can't easily include asm/pgtable.h in asm/io.h due to dependency problems, so split out the _PAGE_* definitions from asm/pgtable.h into a separate asm/pgtable-bits.h header (as a couple of other architectures already do), and include that in io.h instead. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: linux-metag@vger.kernel.org Cc: Abhilash Kesavan <a.kesavan@samsung.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-23parisc: Fix pmd code to depend on PT_NLEVELS value, not on CONFIG_64BITHelge Deller1-5/+3
Make the code which sets up the pmd depend on PT_NLEVELS == 3, not on CONFIG_64BIT. The reason is, that a 64bit kernel with a page size greater than 4k doesn't need the pmd and thus has PT_NLEVELS = 2. Signed-off-by: Helge Deller <deller@gmx.de>
2015-03-23parisc: mm: don't count preallocated pmdsMikulas Patocka1-2/+7
The patch dc6c9a35b66b520cf67e05d8ca60ebecad3b0479 that counts pmds allocated for a process introduced a bug on 64-bit PA-RISC kernels. The PA-RISC architecture preallocates one pmd with each pgd. This preallocated pmd can never be freed - pmd_free does nothing when it is called with this pmd. When the kernel attempts to free this preallocated pmd, it decreases the count of allocated pmds. The result is that the counter underflows and this error is reported. This patch fixes the bug by artifically incrementing the counter in pmd_free when the kernel tries to free the preallocated pmd. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Helge Deller <deller@gmx.de>
2015-03-23parisc: Add compile-time check when adding new syscallsHelge Deller1-3/+6
Signed-off-by: Helge Deller <deller@gmx.de>
2015-03-23lockdep: Fix the module unload key range freeing logicPeter Zijlstra2-30/+59
Module unload calls lockdep_free_key_range(), which removes entries from the data structures. Most of the lockdep code OTOH assumes the data structures are append only; in specific see the comments in add_lock_to_list() and look_up_lock_class(). Clearly this has only worked by accident; make it work proper. The actual scenario to make it go boom would involve the memory freed by the module unlock being re-allocated and re-used for a lock inside of a rcu-sched grace period. This is a very unlikely scenario, still better plug the hole. Use RCU list iteration in all places and ammend the comments. Change lockdep_free_key_range() to issue a sync_sched() between removal from the lists and returning -- which results in the memory being freed. Further ensure the callers are placed correctly and comment the requirements. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Tsyvarev <tsyvarev@ispras.ru> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23sched: Fix RLIMIT_RTTIME when PI-boosting to RTBrian Silverman1-0/+2
When non-realtime tasks get priority-inheritance boosted to a realtime scheduling class, RLIMIT_RTTIME starts to apply to them. However, the counter used for checking this (the same one used for SCHED_RR timeslices) was not getting reset. This meant that tasks running with a non-realtime scheduling class which are repeatedly boosted to a realtime one, but never block while they are running realtime, eventually hit the timeout without ever running for a time over the limit. This patch resets the realtime timeslice counter when un-PI-boosting from an RT to a non-RT scheduling class. I have some test code with two threads and a shared PTHREAD_PRIO_INHERIT mutex which induces priority boosting and spins while boosted that gets killed by a SIGXCPU on non-fixed kernels but doesn't with this patch applied. It happens much faster with a CONFIG_PREEMPT_RT kernel, and does happen eventually with PREEMPT_VOLUNTARY kernels. Signed-off-by: Brian Silverman <brian@peloton-tech.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: austin@peloton-tech.com Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1424305436-6716-1-git-send-email-brian@peloton-tech.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23perf: Fix irq_work 'tail' recursionPeter Zijlstra1-0/+10
Vince reported a watchdog lockup like: [<ffffffff8115e114>] perf_tp_event+0xc4/0x210 [<ffffffff810b4f8a>] perf_trace_lock+0x12a/0x160 [<ffffffff810b7f10>] lock_release+0x130/0x260 [<ffffffff816c7474>] _raw_spin_unlock_irqrestore+0x24/0x40 [<ffffffff8107bb4d>] do_send_sig_info+0x5d/0x80 [<ffffffff811f69df>] send_sigio_to_task+0x12f/0x1a0 [<ffffffff811f71ce>] send_sigio+0xae/0x100 [<ffffffff811f72b7>] kill_fasync+0x97/0xf0 [<ffffffff8115d0b4>] perf_event_wakeup+0xd4/0xf0 [<ffffffff8115d103>] perf_pending_event+0x33/0x60 [<ffffffff8114e3fc>] irq_work_run_list+0x4c/0x80 [<ffffffff8114e448>] irq_work_run+0x18/0x40 [<ffffffff810196af>] smp_trace_irq_work_interrupt+0x3f/0xc0 [<ffffffff816c99bd>] trace_irq_work_interrupt+0x6d/0x80 Which is caused by an irq_work generating new irq_work and therefore not allowing forward progress. This happens because processing the perf irq_work triggers another perf event (tracepoint stuff) which in turn generates an irq_work ad infinitum. Avoid this by raising the recursion counter in the irq_work -- which effectively disables all software events (including tracepoints) from actually triggering again. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20150219170311.GH21418@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23powerpc/book3s: Fix the MCE code to use CONFIG_KVM_BOOK3S_64_HANDLERMahesh Salgaonkar1-1/+1
commit id 2ba9f0d has changed CONFIG_KVM_BOOK3S_64_HV to tristate to allow HV/PR bits to be built as modules. But the MCE code still depends on CONFIG_KVM_BOOK3S_64_HV which is wrong. When user selects CONFIG_KVM_BOOK3S_64_HV=m to build HV/PR bits as a separate module the relevant MCE code gets excluded. This patch fixes the MCE code to use CONFIG_KVM_BOOK3S_64_HANDLER. This makes sure that the relevant MCE code is included when HV/PR bits are built as a separate modules. Fixes: 2ba9f0d88750 ("kvm: powerpc: book3s: Support building HV and PR KVM as module") Cc: stable@vger.kernel.org # v3.14+ Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-22Linux 4.0-rc5Linus Torvalds1-1/+1
2015-03-22netfilter: nft_compat: set IP6T_F_PROTO flag if protocol is setPablo Neira Ayuso1-0/+6
ip6tables extensions check for this flag to restrict match/target to a given protocol. Without this flag set, SYNPROXY6 returns an error. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Patrick McHardy <kaber@trash.net>
2015-03-21cx82310_eth: wait for firmware to become readyOndrej Zary1-6/+24
When the device is powered up, some (older) firmware versions fail to work properly if we send commands before the boot is complete (everything is OK when the device is hot-plugged). The firmware indicates its ready status by putting the link up. Newer firmwares delay the first command so they don't suffer from this problem. They also report the link being always up. Wait for firmware to become ready (link up) before sending any commands and/or data. This also allows lowering CMD_TIMEOUT value to a reasonable time. Tested with 4.1.0.9 (old) and 4.1.0.30 (new) firmware versions. Signed-off-by: Ondrej Zary <linux@rainbow-software.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-21md: fix problems with freeing private data after ->run failure.NeilBrown2-3/+2
If ->run() fails, it can either free the data structures it allocated, or leave that task to ->free() which will be called on failures. However: md.c calls ->free() even if ->private_data is NULL, which causes problems in some personalities. raid0.c frees the data, but doesn't clear ->private_data, which will become a problem when we fix md.c So better fix both these issues at once. Reported-by: Richard W.M. Jones <rjones@redhat.com> Fixes: 5aa61f427e4979be733e4847b9199ff9cc48a47e URL: https://bugzilla.kernel.org/show_bug.cgi?id=94381 Signed-off-by: NeilBrown <neilb@suse.de>
2015-03-20net: validate the range we feed to iov_iter_init() in sys_sendto/sys_recvfromAl Viro1-0/+4
Cc: stable@vger.kernel.org # v3.19 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>