linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2015-01-15	drm/i915: PSR HSW/BDW: Fix inverted logic at sink main_link_active bit.	Rodrigo Vivi	1	-2/+2
	We have only two possible states with so many names and combinations that might be confusing. 1 - Main link active / enabled / stand by / on 2 - Main link disabled / off / full off Let's start organizing it by fixing a inverted logic when setting the sink bit. Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Durgadoss R <durgadoss.r@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-15	drm/i915: PSR VLV/CHV: Remove condition checks that only applies to Haswell.	Rodrigo Vivi	1	-8/+5
	These conditions applies only to Haswell and we were also checking for them on Valleyview/Cherryview. Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Durgadoss R <durgadoss.r@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-15	drm/i915: VLV/CHV PSR needs to exit PSR on every flush.	Rodrigo Vivi	1	-4/+2
	ON these platforms we don't have hardware tracking working for any case. So we need to fake this on software by forcing psr to exit on every flush. Manual tests indicated this was needed. Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Durgadoss R <durgadoss.r@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-13	drm/i915: Fix kerneldoc for i915 atomic plane code	Matt Roper	1	-1/+2
	Description of the 'state' parameter for intel_plane_destroy_state() was missing and the intel_atomic_plane.c file section heading did not match drm.tmpl. Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-13	drm/i915: Don't pretend SDVO hotplug works on 915	Ville Syrjälä	1	-0/+3
	915 doens't support hotplug at all, so we shouldn't try to pretend otherwise in the SDVO code. Note: i915 does have hotplug support in hw, we simply never enabled it in i915.ko: There's only one hpd bit for all outputs, so not worth the bother to add this special case for this rather old platform. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> [danvet: Clarify that only i915.ko doesn't support hpd on i915g.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-13	drm/i915: Don't register HDMI connectors for eDP ports on VLV/CHV	Ville Syrjälä	1	-2/+4
	If we determine that a specific port is eDP, don't register the HDMI connector/encoder for it. The reason being that we want to disable HPD interrupts for eDP ports when the display is off, but the presence of the extra HDMI connector would demand the HPD interrupt to remain enabled all the time. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-13	drm/i915: Remove I915_HAS_HOTPLUG() check from i915_hpd_irq_setup()	Ville Syrjälä	1	-22/+20
	The dev_priv->display.hpd_irq_setup hook is optional, so we can move the I915_HAS_HOTPLUG() check out of i915_hpd_irq_setup() and only set up the hook when hotplug support is present. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-13	drm/i915: Make hpd arrays big enough to avoid out of bounds access	Ville Syrjälä	1	-6/+6
	intel_hpd_irq_handler() walks the passed in hpd[] array assuming it contains HPD_NUM_PINS elements. Currently that's not true as we don't specify an explicit size for the arrays when initializing them. Avoid the out of bounds accesses by specifying the size for the arrays. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-13	Revert "drm/i915/chv: Use timeout mode for RC6 on chv"	Rodrigo Vivi	1	-3/+2
	This reverts commit 5a0afd4b78ec23f27f5d486ac3d102c2e8d66bd7. Although timeout mode allows higher residency it impact badly on performance. I believe while we don't have a way to balance between performance and power savings at runtime I believe we have to revert and prioritize performance that was impacted a lot. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88103 Cc: Deepak S <deepak.s@linux.intel.com> Cc: Wendy Wang <wendy.wang@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-13	drm/i915: Improve HiZ throughput on Cherryview.	Kenneth Graunke	2	-0/+6
	Found by reading the HIZ_CHICKEN documentation. Improves performance in a HiZ microbenchmark by around 50%. Improves performance in OglZBuffer by around 18%. Thanks to Chris Wilson for helping me figure out where to put this. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-13	drm/i915: Reset CSB read pointer in ring init	Thomas Daniel	1	-1/+1
	A previous commit enabled execlists by default: commit 27401d126b5b ("drm/i915/bdw: Enable execlists by default where supported") This allowed routine testing of execlists which exposed a regression when resuming from suspend. The cause was tracked down the to recent changes to the ring init sequence: commit 35a57ffbb108 ("drm/i915: Only init engines once") During a suspend/resume cycle the hardware Context Status Buffer write pointer is reset. However since the recent changes to the init sequence the software CSB read pointer is no longer reset. This means that context status events are not handled correctly and new contexts are not written to the ELSP, resulting in an apparent GPU hang. Pending further changes to the ring init code, just move the ring->next_context_status_buffer initialization into gen8_init_common_ring to fix this regression. v2: Moved init into gen8_init_common_ring rather than context_enable after feedback from Daniel Vetter. Updated commit msg to reflect this and also cite commits related to the regression. Fixed bz link to correct bug. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88096 Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Gordon <david.s.gordon@intel.com> Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> Reviewed-by: Dave Gordon <david.s.gordon@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-12	drm/i915: Drop unused position fields (v2)	Matt Roper	3	-45/+5
	The userspace-requested plane coordinates are now always available via plane->state.base (and the i915-adjusted values are stored in plane->state), so we no longer use the coordinate fields in intel_plane and can drop them. Also, note that the error case for pageflip calls update_plane() to program the values from plane->state; it's simpler to just call intel_plane_restore() which does the same thing. v2: Replace manual update_plane() with intel_plane_restore() in pageflip error handler. Reviewed-by(v1): Bob Paauwe <bob.j.paauwe@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-12	drm/i915: Move to atomic plane helpers (v9)	Matt Roper	6	-165/+303
	Switch plane handling to use the atomic plane helpers. This means that rather than provide our own implementations of .update_plane() and .disable_plane(), we expose the lower-level check/prepare/commit/cleanup entrypoints and let the DRM core implement update/disable for us using those entrypoints. The other main change that falls out of this patch is that our drm_plane's will now always have a valid plane->state that contains the relevant plane state (initial state is allocated at plane creation). The base drm_plane_state pointed to holds the requested source/dest coordinates, and the subclassed intel_plane_state holds the adjusted values that our driver actually uses. v2: - Renamed file from intel_atomic.c to intel_atomic_plane.c (Daniel) - Fix a copy/paste comment mistake (Bob) v3: - Use prepare/cleanup functions that we've already factored out - Use newly refactored pre_commit/commit/post_commit to avoid sleeping during vblank evasion v4: - Rebase to latest di-nightly requires adding an 'old_state' parameter to atomic_update; v5: - Must have botched a rebase somewhere and lost some work. Restore state 'dirty' flag to let begin/end code know which planes to run the pre_commit/post_commit hooks for. This would have actually shown up as broken in the next commit rather than this one. v6: - Squash kerneldoc patch into this one. - Previous patches have now already taken care of most of the infrastructure that used to be in this patch. All we're adding here now is some thin wrappers. v7: - Check return of intel_plane_duplicate_state() for allocation failures. v8: - Drop unused drm_plane_state -> intel_plane_state cast. (Ander) - Squash in actual transition to plane helpers. Significant refactoring earlier in the patchset has made the combined prep+transition much easier to swallow than it was in earlier iterations. (Ander) v9: - s/track_fbs/disabled_planes/ in the atomic crtc flags. The only fb's we need to update frontbuffer tracking for are those on a plane about to be disabled (since the atomic helpers never call prepare_fb() when disabling a plane), so the new name more accurately describes what we're actually tracking. Testcase: igt/kms_plane Testcase: igt/kms_universal_plane Testcase: igt/kms_cursor_crc Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-12	drm/i915: Clarify sprite plane function names (v4)	Matt Roper	3	-17/+16
	A few of the sprite-related function names in i915 are very similar (e.g., intel_enable_planes() vs intel_crtc_enable_planes()) and don't make it clear whether they only operate on sprite planes, or whether they also apply to all universal plane types. Rename a few functions to be more consistent with our function naming for primary/cursor planes or to clarify that they apply specifically to sprite planes: - s/intel_disable_planes/intel_disable_sprite_planes/ - s/intel_enable_planes/intel_enable_sprite_planes/ Also, drop the sprite-specific intel_destroy_plane() and just use the type-agnostic intel_plane_destroy() function. The extra 'disable' call that intel_destroy_plane() did is unnecessary since the plane will already be disabled due to framebuffer destruction by the point it gets called. v2: Earlier consolidation patches have reduced the number of functions we need to rename here. v3: Also rename intel_plane_funcs vtable to intel_sprite_plane_funcs for consistency with primary/cursor. (Ander) v4: Convert comment for intel_plane_destroy() to kerneldoc now that it is no longer a static function. (Ander) Reviewed-by(v1): Bob Paauwe <bob.j.paauwe@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-12	drm/i915: Move vblank evasion to commit (v4)	Matt Roper	3	-42/+14
	Move the vblank evasion up from the low-level, hw-specific update_plane() handlers to the general plane commit operation. Everything inside commit should now be non-sleeping, so this brings us closer to how vblank evasion will behave once we move over to atomic. v2: - Restore lost intel_crtc->active check on vblank evasion v3: - Replace assert_pipe_enabled() in intel_disable_primary_hw_plane() with an intel_crtc->active test; it turns out assert_pipe_enabled() grabs some mutexes and can sleep, which we can't do with interrupts disabled. v4: - Equivalent to v2; v3 change is now squashed into an earlier patch of the series. (Ander). Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-12	drm/i915: Refactor work that can sleep out of commit (v7)	Matt Roper	3	-119/+209
	Once we integrate our work into the atomic pipeline, plane commit operations will need to happen with interrupts disabled, due to vblank evasion. Our commit functions today include sleepable work, so those operations need to be split out and run either before or after the atomic register programming. The solution here calculates which of those operations will need to be performed during the 'check' phase and sets flags in an intel_crtc sub-struct. New intel_begin_crtc_commit() and intel_finish_crtc_commit() functions are added before and after the actual register programming; these will eventually be called from the atomic plane helper's .atomic_begin() and .atomic_end() entrypoints. v2: Fix broken sprite code split v3: Make the pre/post commit work crtc-based to match how we eventually want this to be called from the atomic plane helpers. v4: Some platforms that haven't had their watermark code reworked were waiting for vblank, then calling update_sprite_watermarks in their platform-specific disable code. These also need to be flagged out of the critical section. v5: Sprite plane test for primary show/hide should just set the flag to wait for pending flips, not actually perform the wait. (Ander) v6: - Rebase onto latest di-nightly; picks up an important runtime PM fix. - Handle 'wait_for_flips' flag in intel_begin_crtc_commit(). (Ander) - Use wait_for_flips flag for primary plane update rather than performing the wait in the check routine. - Added kerneldoc to pre_disable/post_enable functions that are no longer static. (Ander) - Replace assert_pipe_enabled() in intel_disable_primary_hw_plane() with an intel_crtc->active test; it turns out assert_pipe_enabled() grabs some mutexes and can sleep, which we can't do with interrupts disabled. v7: - Check for fb != NULL when deciding whether the sprite plane hides the primary plane during a sprite update. (PRTS) Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-12	drm/i915: fix build for CONFIG_BUG=n	Jani Nikula	1	-2/+2
	If CONFIG_BUG=n __WARN_printf won't be defined leading to the below build failure. The double underscores should have told us to steer clear of it anyway. drivers/gpu/drm/i915/intel_display.c: In function ‘assert_pll’: drivers/gpu/drm/i915/intel_display.c:1027:2: error: implicit declaration of function ‘__WARN_printf’ [-Werror=implicit-function-declaration] I915_STATE_WARN(cur_state != state, Use WARN(1, ...) instead. It handles CONFIG_BUG=n gracefully and, with the constant condition, a sane compiler should reduce it to __WARN_printf. This is a regression introduced by commit e2c719b75c8c186deb86570d8466df9e9eff919b Author: Rob Clark <robdclark@gmail.com> Date: Mon Dec 15 13:56:32 2014 -0500 drm/i915: tame the chattermouth (v2) Reported-by: Jim Davis <jim.epost@gmail.com> Reference: http://mid.gmane.org/CA+r1ZhgHTi7bS2irhtuSUs9aO=Br1dumN8=oAOeaMJDZ_ZhwBw@mail.gmail.com Cc: Rob Clark <robdclark@gmail.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-12	drm/i915: remove unused power_well/get_cdclk_freq api	Imre Deak	2	-93/+0
	After switching to using the component interface this API isn't needed any more. v2-3: unchanged v4: - move the removal of i915_powerwell.h to this patch (Takashi) Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-12	ALSA: hda: add component support	Imre Deak	3	-45/+106
	Register a component master to be used to interface with the i915 driver. This is meant to replace the current interface which is based on module symbol lookups. Note that currently we keep the existing behavior and pin the i915 module while the hda driver is loaded. Using the component interface allows us to remove this dependency once support for dynamically enabling / disabling the HDMI functionality is added to the driver. v2: - change roles between the hda and i915 components (Daniel) v3: - rename display_component to audio_component (Daniel) v4: - move removal of i915_powerwell.h from this patch to the next (Takashi) - request_module fails if module support isn't enabled, so ignore any error it returns and depend on the following NULL check of the component ops (Takashi) - change over to using dev_* instead of pr_* (Takashi) Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-12	ALSA: hda: pass intel_hda to all i915 interface functions	Imre Deak	3	-26/+33
	chip is already passed to most of the i915 interface functions. Unify the interface by passing intel_hda instead of chip and passing it to all functions. Passing intel_hda instead of chip makes more sense since this is an intel specific interface. Also in an upcoming patch we will use intel_hda in all of these functions so by passing intel_hda we can save on some pointer casts from chip to intel_hda. This will be needed by an upcoming patch adding component support. No functional change. v2-3: unchanged v4: - pass intel_hda instead of chip Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-12	ALSA: hda: export struct hda_intel	Imre Deak	3	-29/+31
	This struct will be needed by the component code added in an upcoming patch, so export it into a new hda_intel.h file. At the same time also merge hda_i915.h into this new header, there is no reason to keep two separate intel specific header file. Suggested-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-12	drm/i915: add component support	Imre Deak	5	-0/+157
	Register a component to be used to interface with the snd_hda_intel driver. This is meant to replace the same interface that is currently based on module symbol lookup. v2: - change roles between the hda and i915 components (Daniel) - add the implementation to a new file (Jani) - use better namespacing (Jani) v3: - move the implementation to intel_audio.c (Daniel) - rename display_component to audio_component (Daniel) - add kerneldoc (Daniel) v4: - run forgotten git rm i915_component.c (Jani) Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-12	drm/i915: add dev_to_i915 helper	Imre Deak	2	-6/+8
	This will be needed by later patches, so factor it out. No functional change. v2: - s/dev_to_i915_priv/dev_to_i915/ (Jani) - don't use the helper in i915_pm_suspend (Chris) - simplify the helper (Chris) v3: - remove redundant upcasting in the helper (Daniel) Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-11	linux 3.19-rc4	Linus Torvalds	1	-1/+1

2015-01-11	mm: fix corner case in anon_vma endless growing prevention	Konstantin Khlebnikov	1	-2/+4
	Fix for BUG_ON(anon_vma->degree) splashes in unlink_anon_vmas() ("kernel BUG at mm/rmap.c:399!") caused by commit 7a3ef208e662 ("mm: prevent endless growth of anon_vma hierarchy") Anon_vma_clone() is usually called for a copy of source vma in destination argument. If source vma has anon_vma it should be already in dst->anon_vma. NULL in dst->anon_vma is used as a sign that it's called from anon_vma_fork(). In this case anon_vma_clone() finds anon_vma for reusing. Vma_adjust() calls it differently and this breaks anon_vma reusing logic: anon_vma_clone() links vma to old anon_vma and updates degree counters but vma_adjust() overrides vma->anon_vma right after that. As a result final unlink_anon_vmas() decrements degree for wrong anon_vma. This patch assigns ->anon_vma before calling anon_vma_clone(). Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Reported-and-tested-by: Chris Clayton <chris2553@googlemail.com> Reported-and-tested-by: Oded Gabbay <oded.gabbay@amd.com> Reported-and-tested-by: Chih-Wei Huang <cwhuang@android-x86.org> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Daniel Forrest <dan.forrest@ssec.wisc.edu> Cc: Michal Hocko <mhocko@suse.cz> Cc: stable@vger.kernel.org # to match back-porting of 7a3ef208e662 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-11	mm: Don't count the stack guard page towards RLIMIT_STACK	Linus Torvalds	1	-2/+5
	Commit fee7e49d4514 ("mm: propagate error from stack expansion even for guard page") made sure that we return the error properly for stack growth conditions. It also theorized that counting the guard page towards the stack limit might break something, but also said "Let's see if anybody notices". Somebody did notice. Apparently android-x86 sets the stack limit very close to the limit indeed, and including the guard page in the rlimit check causes the android 'zygote' process problems. So this adds the (fairly trivial) code to make the stack rlimit check be against the actual real stack size, rather than the size of the vma that includes the guard page. Reported-and-tested-by: Chih-Wei Huang <cwhuang@android-x86.org> Cc: Jay Foad <jay.foad@gmail.com> Cc: stable@kernel.org # to match back-porting of fee7e49d4514 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-09	ARM: 8275/1: mm: fix PMD_SECT_RDONLY undeclared compile error	Victor Kamensky	1	-2/+2
	In v3.19-rc3 tree when CONFIG_ARM_LPAE and CONFIG_DEBUG_RODATA are enabled image failed to compile with the following error: arch/arm/mm/init.c:661:14: error: ‘PMD_SECT_RDONLY’ undeclared here (not in a function) It seems that '80d6b0c ARM: mm: allow text and rodata sections to be read-only' and 'ded9477 ARM: 8109/1: mm: Modify pte_write and pmd_write logic for LPAE' commits crossed. 80d6b0c uses PMD_SECT_RDONLY macro but ded9477 renames it and uses software bits L_PMD_SECT_RDONLY instead. Fix is to use L_PMD_SECT_RDONLY instead PMD_SECT_RDONLY as ded9477 does in another places. Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-01-09	HID: roccat: potential out of bounds in pyra_sysfs_write_settings()	Dan Carpenter	1	-2/+6
	This is a static checker fix. We write some binary settings to the sysfs file. One of the settings is the "->startup_profile". There isn't any checking to make sure it fits into the pyra->profile_settings[] array in the profile_activated() function. I added a check to pyra_sysfs_write_settings() in both places because I wasn't positive that the other callers were correct. Cc: <stable@vger.kernel.org> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-01-09	mutex: Always clear owner field upon mutex_unlock()	Chris Wilson	1	-1/+1
	Currently if DEBUG_MUTEXES is enabled, the mutex->owner field is only cleared iff debug_locks is active. This exposes a race to other users of the field where the mutex->owner may be still set to a stale value, potentially upsetting mutex_spin_on_owner() among others. References: https://bugs.freedesktop.org/show_bug.cgi?id=87955 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1420540175-30204-1-git-send-email-chris@chris-wilson.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-09	sched/fair: Fix RCU stall upon -ENOMEM in sched_create_group()	Tetsuo Handa	1	-0/+4
	When alloc_fair_sched_group() in sched_create_group() fails, free_sched_group() is called, and free_fair_sched_group() is called by free_sched_group(). Since destroy_cfs_bandwidth() is called by free_fair_sched_group() without calling init_cfs_bandwidth(), RCU stall occurs at hrtimer_cancel(): INFO: rcu_sched self-detected stall on CPU { 1} (t=60000 jiffies g=13074 c=13073 q=0) Task dump for CPU 1: (fprintd) R running task 0 6249 1 0x00000088 ... Call Trace: <IRQ> [<ffffffff81094988>] sched_show_task+0xa8/0x110 [<ffffffff81097acd>] dump_cpu_task+0x3d/0x50 [<ffffffff810c3a80>] rcu_dump_cpu_stacks+0x90/0xd0 [<ffffffff810c7751>] rcu_check_callbacks+0x491/0x700 [<ffffffff810cbf2b>] update_process_times+0x4b/0x80 [<ffffffff810db046>] tick_sched_handle.isra.20+0x36/0x50 [<ffffffff810db0a2>] tick_sched_timer+0x42/0x70 [<ffffffff810ccb19>] __run_hrtimer+0x69/0x1a0 [<ffffffff810db060>] ? tick_sched_handle.isra.20+0x50/0x50 [<ffffffff810ccedf>] hrtimer_interrupt+0xef/0x230 [<ffffffff810452cb>] local_apic_timer_interrupt+0x3b/0x70 [<ffffffff8164a465>] smp_apic_timer_interrupt+0x45/0x60 [<ffffffff816485bd>] apic_timer_interrupt+0x6d/0x80 <EOI> [<ffffffff810cc588>] ? lock_hrtimer_base.isra.23+0x18/0x50 [<ffffffff81193cf1>] ? __kmalloc+0x211/0x230 [<ffffffff810cc9d2>] hrtimer_try_to_cancel+0x22/0xd0 [<ffffffff81193cf1>] ? __kmalloc+0x211/0x230 [<ffffffff810ccaa2>] hrtimer_cancel+0x22/0x30 [<ffffffff810a3cb5>] free_fair_sched_group+0x25/0xd0 [<ffffffff8108df46>] free_sched_group+0x16/0x40 [<ffffffff810971bb>] sched_create_group+0x4b/0x80 [<ffffffff810aa383>] sched_autogroup_create_attach+0x43/0x1c0 [<ffffffff8107dc9c>] sys_setsid+0x7c/0x110 [<ffffffff81647729>] system_call_fastpath+0x12/0x17 Check whether init_cfs_bandwidth() was called before calling destroy_cfs_bandwidth(). Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> [ Move the check into destroy_cfs_bandwidth() to aid compilability. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Paul Turner <pjt@google.com> Cc: Ben Segall <bsegall@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/201412252210.GCC30204.SOMVFFOtQJFLOH@I-love.SAKURA.ne.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-09	sched/deadline: Avoid double-accounting in case of missed deadlines	Luca Abeni	1	-18/+1
	The dl_runtime_exceeded() function is supposed to ckeck if a SCHED_DEADLINE task must be throttled, by checking if its current runtime is <= 0. However, it also checks if the scheduling deadline has been missed (the current time is larger than the current scheduling deadline), further decreasing the runtime if this happens. This "double accounting" is wrong: - In case of partitioned scheduling (or single CPU), this happens if task_tick_dl() has been called later than expected (due to small HZ values). In this case, the current runtime is also negative, and replenish_dl_entity() can take care of the deadline miss by recharging the current runtime to a value smaller than dl_runtime - In case of global scheduling on multiple CPUs, scheduling deadlines can be missed even if the task did not consume more runtime than expected, hence penalizing the task is wrong This patch fix this problem by throttling a SCHED_DEADLINE task only when its runtime becomes negative, and not modifying the runtime Signed-off-by: Luca Abeni <luca.abeni@unitn.it> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juri Lelli <juri.lelli@gmail.com> Cc: <stable@vger.kernel.org> Cc: Dario Faggioli <raistlin@linux.it> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1418813432-20797-3-git-send-email-luca.abeni@unitn.it Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-09	sched/deadline: Fix migration of SCHED_DEADLINE tasks	Luca Abeni	1	-3/+3
	According to global EDF, tasks should be migrated between runqueues without checking if their scheduling deadlines and runtimes are valid. However, SCHED_DEADLINE currently performs such a check: a migration happens doing: deactivate_task(rq, next_task, 0); set_task_cpu(next_task, later_rq->cpu); activate_task(later_rq, next_task, 0); which ends up calling dequeue_task_dl(), setting the new CPU, and then calling enqueue_task_dl(). enqueue_task_dl() then calls enqueue_dl_entity(), which calls update_dl_entity(), which can modify scheduling deadline and runtime, breaking global EDF scheduling. As a result, some of the properties of global EDF are not respected: for example, a taskset {(30, 80), (40, 80), (120, 170)} scheduled on two cores can have unbounded response times for the third task even if 30/80+40/80+120/170 = 1.5809 < 2 This can be fixed by invoking update_dl_entity() only in case of wakeup, or if this is a new SCHED_DEADLINE task. Signed-off-by: Luca Abeni <luca.abeni@unitn.it> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juri Lelli <juri.lelli@gmail.com> Cc: <stable@vger.kernel.org> Cc: Dario Faggioli <raistlin@linux.it> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1418813432-20797-2-git-send-email-luca.abeni@unitn.it Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-09	sched: Fix odd values in effective_load() calculations	Yuyang Du	1	-1/+1
	In effective_load, we have (long w * unsigned long tg->shares) / long W, when w is negative, it is cast to unsigned long and hence the product is insanely large. Fix this by casting tg->shares to long. Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Yuyang Du <yuyang.du@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Dave Jones <davej@redhat.com> Cc: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20141219002956.GA25405@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-09	sched, fanotify: Deal with nested sleeps	Peter Zijlstra	1	-5/+5
	As per e23738a7300a ("sched, inotify: Deal with nested sleeps"). fanotify_read is a wait loop with sleeps in. Wait loops rely on task_struct::state and sleeps do too, since that's the only means of actually sleeping. Therefore the nested sleeps destroy the wait loop state and the wait loop breaks the sleep functions that assume TASK_RUNNING (mutex_lock). Fix this by using the new woken_wake_function and wait_woken() stuff, which registers wakeups in wait and thereby allows shrinking the task_state::state changes to the actual sleep part. Reported-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Takashi Iwai <tiwai@suse.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Eric Paris <eparis@redhat.com> Link: http://lkml.kernel.org/r/20141216152838.GZ3337@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-09	perf/x86/uncore/hsw-ep: Handle systems with only two SBOXes	Andi Kleen	2	-1/+18
	There was another report of a boot failure with a #GP fault in the uncore SBOX initialization. The earlier work around was not enough for this system. The boot was failing while trying to initialize the third SBOX. This patch detects parts with only two SBOXes and limits the number of SBOX units to two there. Stable material, as it affects boot problems on 3.18. Tested-by: Andreas Oehler <andreas@oehler-net.de> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Yan, Zheng <zheng.z.yan@intel.com> Link: http://lkml.kernel.org/r/1420583675-9163-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-09	perf/x86_64: Improve user regs sampling	Andy Lutomirski	1	-2/+76
	Perf reports user regs for kernel-mode samples so that samples can be backtraced through user code. The old code was very broken in syscall context, resulting in useless backtraces. The new code, in contrast, is still dangerously racy, but it should at least work most of the time. Tested-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: chenggang.qcg@taobao.com Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/243560c26ff0f739978e2459e203f6515367634d.1420396372.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-09	perf: Move task_pt_regs sampling into arch code	Andy Lutomirski	6	-16/+63
	On x86_64, at least, task_pt_regs may be only partially initialized in many contexts, so x86_64 should not use it without extra care from interrupt context, let alone NMI context. This will allow x86_64 to override the logic and will supply some scratch space to use to make a cleaner copy of user regs. Tested-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: chenggang.qcg@taobao.com Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Jean Pihet <jean.pihet@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Salter <msalter@redhat.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/e431cd4c18c2e1c44c774f10758527fb2d1025c4.1420396372.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-09	x86: Fix off-by-one in instruction decoder	Peter Zijlstra	1	-1/+1
	Stephane reported that the PEBS fixup was broken by the recent commit to the instruction decoder. The thing had an off-by-one which resulted in not being able to decode the last instruction and always bail. Reported-by: Stephane Eranian <eranian@google.com> Fixes: 6ba48ff46f76 ("x86: Remove arbitrary instruction size limit in instruction decoder") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org # 3.18 Cc: <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Liang Kan <kan.liang@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Link: http://lkml.kernel.org/r/20141216104614.GV3337@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-08	mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed	Vlastimil Babka	1	-11/+13
	Charles Shirron and Paul Cassella from Cray Inc have reported kswapd stuck in a busy loop with nothing left to balance, but kswapd_try_to_sleep() failing to sleep. Their analysis found the cause to be a combination of several factors: 1. A process is waiting in throttle_direct_reclaim() on pgdat->pfmemalloc_wait 2. The process has been killed (by OOM in this case), but has not yet been scheduled to remove itself from the waitqueue and die. 3. kswapd checks for throttled processes in prepare_kswapd_sleep(): if (waitqueue_active(&pgdat->pfmemalloc_wait)) { wake_up(&pgdat->pfmemalloc_wait); return false; // kswapd will not go to sleep } However, for a process that was already killed, wake_up() does not remove the process from the waitqueue, since try_to_wake_up() checks its state first and returns false when the process is no longer waiting. 4. kswapd is running on the same CPU as the only CPU that the process is allowed to run on (through cpus_allowed, or possibly single-cpu system). 5. CONFIG_PREEMPT_NONE=y kernel is used. If there's nothing to balance, kswapd encounters no voluntary preemption points and repeatedly fails prepare_kswapd_sleep(), blocking the process from running and removing itself from the waitqueue, which would let kswapd sleep. So, the source of the problem is that we prevent kswapd from going to sleep until there are processes waiting on the pfmemalloc_wait queue, and a process waiting on a queue is guaranteed to be removed from the queue only when it gets scheduled. This was done to make sure that no process is left sleeping on pfmemalloc_wait when kswapd itself goes to sleep. However, it isn't necessary to postpone kswapd sleep until the pfmemalloc_wait queue actually empties. To prevent processes from being left sleeping, it's actually enough to guarantee that all processes waiting on pfmemalloc_wait queue have been woken up by the time we put kswapd to sleep. This patch therefore fixes this issue by substituting 'wake_up' with 'wake_up_all' and removing 'return false' in the code snippet from prepare_kswapd_sleep() above. Note that if any process puts itself in the queue after this waitqueue_active() check, or after the wake up itself, it means that the process will also wake up kswapd - and since we are under prepare_to_wait(), the wake up won't be missed. Also we update the comment prepare_kswapd_sleep() to hopefully more clearly describe the races it is preventing. Fixes: 5515061d22f0 ("mm: throttle direct reclaimers if PF_MEMALLOC reserves are low and swap is backed by network storage") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Rik van Riel <riel@redhat.com> Cc: <stable@vger.kernel.org> [3.6+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08	memcg: fix destination cgroup leak on task charges migration	Vladimir Davydov	1	-12/+0
	We are supposed to take one css reference per each memory page and per each swap entry accounted to a memory cgroup. However, during task charges migration we take a reference to the destination cgroup twice per each swap entry: first in mem_cgroup_do_precharge()->try_charge() and then in mem_cgroup_move_swap_account(), permanently leaking the destination cgroup. The hunk taking the second reference seems to be a leftover from the pre-00501b531c472 ("mm: memcontrol: rewrite charge API") era. Remove it to fix the leak. Fixes: e8ea14cc6ead (mm: memcontrol: take a css reference for each charged page) Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08	mm: memcontrol: switch soft limit default back to infinity	Johannes Weiner	1	-1/+4
	Commit 3e32cb2e0a12 ("mm: memcontrol: lockless page counters") accidentally switched the soft limit default from infinity to zero, which turns all memcgs with even a single page into soft limit excessors and engages soft limit reclaim on all of them during global memory pressure. This makes global reclaim generally more aggressive, but also inverts the meaning of existing soft limit configurations where unset soft limits are usually more generous than set ones. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Vladimir Davydov <vdavydov@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08	mm/debug_pagealloc: remove obsolete Kconfig options	Joonsoo Kim	1	-9/+0
	These are obsolete since commit e30825f1869a ("mm/debug-pagealloc: prepare boottime configurable") was merged. So remove them. [pebolle@tiscali.nl: find obsolete Kconfig options] Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Paul Bolle <pebolle@tiscali.nl> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dave Hansen <dave@sr71.net> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Jungsoo Son <jungsoo.son@lge.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08	vfs: renumber FMODE_NONOTIFY and add to uniqueness check	David Drysdale	3	-4/+5
	Fix clashing values for O_PATH and FMODE_NONOTIFY on sparc. The clashing O_PATH value was added in commit 5229645bdc35 ("vfs: add nonconflicting values for O_PATH") but this can't be changed as it is user-visible. FMODE_NONOTIFY is only used internally in the kernel, but it is in the same numbering space as the other O_* flags, as indicated by the comment at the top of include/uapi/asm-generic/fcntl.h (and its use in fs/notify/fanotify/fanotify_user.c). So renumber it to avoid the clash. All of this has happened before (commit 12ed2e36c98a: "fanotify: FMODE_NONOTIFY and __O_SYNC in sparc conflict"), and all of this will happen again -- so update the uniqueness check in fcntl_init() to include __FMODE_NONOTIFY. Signed-off-by: David Drysdale <drysdale@google.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Jan Kara <jack@suse.cz> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08	arch/blackfin/mach-bf533/boards/stamp.c: add linux/delay.h	Oleg Nesterov	1	-0/+1
	build error arch/blackfin/mach-bf533/boards/stamp.c:834:2: error: implicit declaration of function 'mdelay' Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Wu Fengguang <fengguang.wu@intel.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08	ocfs2: fix the wrong directory passed to ocfs2_lookup_ino_from_name() when link file	Xue jiufei	1	-8/+35
	In ocfs2_link(), the parent directory inode passed to function ocfs2_lookup_ino_from_name() is wrong. Parameter dir is the parent of new_dentry not old_dentry. We should get old_dir from old_dentry and lookup old_dentry in old_dir in case another node remove the old dentry. With this change, hard linking works again, when paths are relative with at least one subdirectory. This is how the problem was reproducable: # mkdir a # mkdir b # touch a/test # ln a/test b/test ln: failed to create hard link `b/test' => `a/test': No such file or directory However when creating links in the same dir, it worked well. Now the link gets created. Fixes: 0e048316ff57 ("ocfs2: check existence of old dentry in ocfs2_link()") Signed-off-by: joyce.xue <xuejiufei@huawei.com> Reported-by: Szabo Aron - UBIT <aron@ubit.hu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Tested-by: Aron Szabo <aron@ubit.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08	MAINTAINERS: update rydberg's addresses	Henrik Rydberg	2	-6/+7
	My ISP finally gave up on the old mail address, so I am moving things over to bitmath.org instead. Also change the status fields to better reflect reality. Signed-off-by: Henrik Rydberg <rydberg@bitmath.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08	mm: protect set_page_dirty() from ongoing truncation	Johannes Weiner	3	-42/+29
	Tejun, while reviewing the code, spotted the following race condition between the dirtying and truncation of a page: __set_page_dirty_nobuffers() __delete_from_page_cache() if (TestSetPageDirty(page)) page->mapping = NULL if (PageDirty()) dec_zone_page_state(page, NR_FILE_DIRTY); dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE); if (page->mapping) account_page_dirtied(page) __inc_zone_page_state(page, NR_FILE_DIRTY); __inc_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE); which results in an imbalance of NR_FILE_DIRTY and BDI_RECLAIMABLE. Dirtiers usually lock out truncation, either by holding the page lock directly, or in case of zap_pte_range(), by pinning the mapcount with the page table lock held. The notable exception to this rule, though, is do_wp_page(), for which this race exists. However, do_wp_page() already waits for a locked page to unlock before setting the dirty bit, in order to prevent a race where clear_page_dirty() misses the page bit in the presence of dirty ptes. Upgrade that wait to a fully locked set_page_dirty() to also cover the situation explained above. Afterwards, the code in set_page_dirty() dealing with a truncation race is no longer needed. Remove it. Reported-by: Tejun Heo <tj@kernel.org> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08	mm: prevent endless growth of anon_vma hierarchy	Konstantin Khlebnikov	2	-1/+51
	Constantly forking task causes unlimited grow of anon_vma chain. Each next child allocates new level of anon_vmas and links vma to all previous levels because pages might be inherited from any level. This patch adds heuristic which decides to reuse existing anon_vma instead of forking new one. It adds counter anon_vma->degree which counts linked vmas and directly descending anon_vmas and reuses anon_vma if counter is lower than two. As a result each anon_vma has either vma or at least two descending anon_vmas. In such trees half of nodes are leafs with alive vmas, thus count of anon_vmas is no more than two times bigger than count of vmas. This heuristic reuses anon_vmas as few as possible because each reuse adds false aliasing among vmas and rmap walker ought to scan more ptes when it searches where page is might be mapped. Link: http://lkml.kernel.org/r/20120816024610.GA5350@evergreen.ssec.wisc.edu Fixes: 5beb49305251 ("mm: change anon_vma linking to fix multi-process server scalability issue") [akpm@linux-foundation.org: fix typo, per Rik] Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Reported-by: Daniel Forrest <dan.forrest@ssec.wisc.edu> Tested-by: Michal Hocko <mhocko@suse.cz> Tested-by: Jerome Marchand <jmarchan@redhat.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: <stable@vger.kernel.org> [2.6.34+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08	exit: fix race between wait_consider_task() and wait_task_zombie()	Oleg Nesterov	1	-3/+9
	wait_consider_task() checks EXIT_ZOMBIE after EXIT_DEAD/EXIT_TRACE and both checks can fail if we race with EXIT_ZOMBIE -> EXIT_DEAD/EXIT_TRACE change in between, gcc needs to reload p->exit_state after security_task_wait(). In this case ->notask_error will be wrongly cleared and do_wait() can hang forever if it was the last eligible child. Many thanks to Arne who carefully investigated the problem. Note: this bug is very old but it was pure theoretical until commit b3ab03160dfa ("wait: completely ignore the EXIT_DEAD tasks"). Before this commit "-O2" was probably enough to guarantee that compiler won't read ->exit_state twice. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Arne Goedeke <el@laramies.com> Tested-by: Arne Goedeke <el@laramies.com> Cc: <stable@vger.kernel.org> [3.15+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08	ocfs2: remove bogus check in dlm_process_recovery_data	Joseph Qi	1	-4/+1
	In dlm_process_recovery_data, only when dlm_new_lock failed the ret will be set to -ENOMEM. And in this case, newlock is definitely NULL. So test newlock is meaningless, remove it. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Alex Chen <alex.chen@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>