path: root/mm/vmstat.c
Age  Commit message  Author  Files  Lines
2013-12-18  mm/mempolicy: fix !vma in new_vma_page()  Wanpeng Li  1  -6/+8

The BUG_ON(!vma) assumption was introduced by commit 0bf598d863e3 ("mbind:
add BUG_ON(!vma) in new_vma_page()"). However, even if
address = __vma_address(page, vma) and vma->start < address < vma->end,
page_address_in_vma() may still return -EFAULT because of many other
conditions in it. As a result, the while loop in new_vma_page() may end
with vma == NULL. This patch reverts that commit and also fixes the
potential NULL pointer dereference reported by Dan.

http://marc.info/?l=linux-mm&m=137689530323257&w=2

  kernel BUG at mm/mempolicy.c:1204!
  invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  CPU: 3 PID: 7056 Comm: trinity-child3 Not tainted 3.13.0-rc3+ #2
  task: ffff8801ca5295d0 ti: ffff88005ab20000 task.ti: ffff88005ab20000
  RIP: new_vma_page+0x70/0x90
  RSP: 0000:ffff88005ab21db0  EFLAGS: 00010246
  RAX: fffffffffffffff2 RBX: 0000000000000000 RCX: 0000000000000000
  RDX: 0000000008040075 RSI: ffff8801c3d74600 RDI: ffffea00079a8b80
  RBP: ffff88005ab21dc8 R08: 0000000000000004 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: fffffffffffffff2
  R13: ffffea00079a8b80 R14: 0000000000400000 R15: 0000000000400000
  FS:  00007ff49c6f4740(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007ff49c68f994 CR3: 000000005a205000 CR4: 00000000001407e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Stack:
   ffffea00079a8b80 ffffea00079a8bc0 ffffea00079a8ba0 ffff88005ab21e50
   ffffffff811adc7a 0000000000000000 ffff8801ca5295d0 0000000464e224f8
   0000000000000000 0000000000000002 0000000000000000 ffff88020ce75c00
  Call Trace:
   migrate_pages+0x12a/0x850
   SYSC_mbind+0x513/0x6a0
   SyS_mbind+0xe/0x10
   ia32_do_call+0x13/0x13
  Code: 85 c0 75 2f 4c 89 e1 48 89 da 31 f6 bf da 00 02 00 65 44 8b 04 25 08 f7 1c 00 e8 ec fd ff ff 5b 41 5c 41 5d 5d c3 0f 1f 44 00 00 <0f> 0b 66 0f 1f 44 00 00 4c 89 e6 48 89 df ba 01 00 00 00 e8 48
  RIP  [<ffffffff8119f200>] new_vma_page+0x70/0x90
   RSP <ffff88005ab21db0>

Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reported-by: Dave Jones <davej@redhat.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-18  MAINTAINERS: add Davidlohr as GPT maintainer  Davidlohr Bueso  1  -0/+6

Add a new entry for the GPT standard. Any future changes will now be
routed through linux-efi.

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Acked-by: Matt Fleming <matt.fleming@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Acked-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-18  mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfully  Jianguo Wu  1  -4/+10

After a successful hugetlb page migration by soft offline, the source page
will either be freed into the hugepage_freelists or into the buddy
allocator (if it was an over-committed page). If the page is in buddy,
page_hstate(page) will be NULL, and we hit a NULL pointer dereference in
dequeue_hwpoisoned_huge_page():

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
  IP: [<ffffffff81163761>] dequeue_hwpoisoned_huge_page+0x131/0x1d0
  PGD c23762067 PUD c24be2067 PMD 0
  Oops: 0000 [#1] SMP

So recheck PageHuge(page) after migrate_pages() completes successfully.

Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Tested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-18  mm/compaction: respect ignore_skip_hint in update_pageblock_skip  Joonsoo Kim  1  -0/+4

update_pageblock_skip() only fits compaction, which isolates in pageblock
units. If isolate_migratepages_range() is called by CMA, it tries to
isolate regardless of pageblock unit and doesn't consult
get_pageblock_skip(), because of ignore_skip_hint.
update_pageblock_skip() should also respect that flag, to prevent
recording wrong skip information.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: <stable@vger.kernel.org> [3.7+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-18  mm/mempolicy: correct putback method for isolate pages if failed  Joonsoo Kim  1  -1/+1

queue_pages_range() isolates hugetlbfs pages and putback_lru_pages() can't
handle these, so change it to putback_movable_pages(). Naoya said this is
worth going into stable, because it can break the in-use hugepage list.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: <stable@vger.kernel.org> [3.12.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-18  mm: add missing dependency in Kconfig  Sima Baymani  1  -1/+1

Eliminate the following (rand)config warning by adding the missing PROC_FS
dependency:

  warning: (HWPOISON_INJECT && MEM_SOFT_DIRTY) selects PROC_PAGE_MONITOR
  which has unmet direct dependencies (PROC_FS && MMU)

Signed-off-by: Sima Baymani <sima.baymani@gmail.com>
Suggested-by: David Rientjes <rientjes@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-18  sh: always link in helper functions extracted from libgcc  Geert Uytterhoeven  1  -1/+1

E.g. landisk_defconfig, which has CONFIG_NTFS_FS=m:

  ERROR: "__ashrdi3" [fs/ntfs/ntfs.ko] undefined!

For "lib-y", if no symbols in a compilation unit are referenced by other
units, the compilation unit will not be included in vmlinux. This breaks
modules that do reference those symbols.

Use "obj-y" instead to fix this.

http://kisskb.ellerman.id.au/kisskb/buildresult/8838077/

This doesn't fix all cases. There are others, e.g. udivsi3. This is also
not limited to sh; many architectures handle this in the same way.

A simple solution is to unconditionally include all helper functions. A
more complex solution is to make the choice of "lib-y" or "obj-y" depend
on CONFIG_MODULES:

  obj-$(CONFIG_MODULES) += ...
  lib-y($CONFIG_MODULES) += ...

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Tested-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
Reviewed-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-18  mm: page_alloc: exclude unreclaimable allocations from zone fairness policy  Johannes Weiner  1  -1/+2

Dave Hansen noted a regression in a microbenchmark that loops around
open() and close() on an 8-node NUMA machine and bisected it down to
commit 81c0a2bb515f ("mm: page_alloc: fair zone allocator policy"). That
change forces the slab allocations of the file descriptor to spread out
to all 8 nodes, causing remote references in the page allocator and slab.

The round-robin policy is only there to provide fairness among memory
allocations that are reclaimed involuntarily based on pressure in each
zone. It does not make sense to apply it to unreclaimable kernel
allocations that are freed manually, in this case instantly after the
allocation, and incur the remote reference costs twice for no reason.
Only round-robin allocations that are usually freed through page reclaim
or slab shrinking.

Bisected by Dave Hansen.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-18  mm: numa: defer TLB flush for THP migration as long as possible  Mel Gorman  2  -7/+3

THP migration can fail for a variety of reasons. Avoid flushing the TLB
to deal with THP migration races until the copy is ready to start.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-18  mm: numa: guarantee that tlb_flush_pending updates are visible before page table updates  Mel Gorman  1  -1/+6

According to the documentation on barriers, stores issued before a LOCK
operation can complete after the lock, implying that it's possible for
tlb_flush_pending to become visible only after a page table update. As
per the revised documentation, this patch adds an smp_mb__before_spinlock()
to guarantee the correct ordering.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
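[Editor's note: a minimal sketch of the ordering fix described above. The
helper and field names follow the surrounding mm code, but treat the exact
placement as an assumption, not the literal hunk.]

    /*
     * Sketch: publish tlb_flush_pending before the page table lock is
     * taken, so a racing thread that acquires the lock is guaranteed
     * to observe the flag.
     */
    static inline void set_tlb_flush_pending(struct mm_struct *mm)
    {
        mm->tlb_flush_pending = true;

        /*
         * A plain store before a LOCK operation may leak past the lock
         * acquisition; this barrier orders the store above against any
         * subsequent spin_lock() of the page table lock.
         */
        smp_mb__before_spinlock();
    }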
2013-12-18  mm: fix TLB flush race between migration, and change_protection_range  Rik van Riel  8  -7/+69

There are a few subtle races between change_protection_range (used by
mprotect and change_prot_numa) on one side, and NUMA page migration and
compaction on the other side.

The basic race is that there is a time window between when the PTE gets
made non-present (PROT_NONE or NUMA), and the TLB is flushed. During that
time, a CPU may continue writing to the page. This is fine most of the
time, however compaction or the NUMA migration code may come in, and
migrate the page away. When that happens, the CPU may continue writing,
through the cached translation, to what is no longer the current memory
location of the process.

This only affects x86, which has a somewhat optimistic pte_accessible.
All other architectures appear to be safe, and will either always flush,
or flush whenever there is a valid mapping, even with no permissions
(SPARC).

The basic race looks like this (one plausible assignment of the events to
the three CPUs):

  CPU A                    CPU B                    CPU C

                           load TLB entry
  make entry PTE/PMD_NUMA
                                                    fault on entry
                           read/write old page
                                                    start migrating page
                                                    change PTE/PMD to new page
                           read/write old page [*]
  flush TLB
                           reload TLB from new entry
                           read/write new page
                           lose data

  [*] the old page may belong to a new user at this point!

The obvious fix is to flush remote TLB entries, by making sure that
pte_accessible is aware of the fact that PROT_NONE and PROT_NUMA memory
may still be accessible if there is a TLB flush pending for the mm. This
should fix both NUMA migration and compaction.

[mgorman@suse.de: fix build]
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-18  mm: numa: avoid unnecessary disruption of NUMA hinting during migration  Mel Gorman  3  -6/+37

do_huge_pmd_numa_page() handles the case where there is parallel THP
migration. However, by the time it is checked the NUMA hinting
information has already been disrupted. This patch adds an earlier check
with some helpers.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-18  mm: numa: clear numa hinting information on mprotect  Mel Gorman  2  -0/+4

On a protection change it is no longer clear if the page should still be
accessible. This patch clears the NUMA hinting fault bits on a protection
change.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-18  sched: numa: skip inaccessible VMAs  Mel Gorman  1  -0/+7

Inaccessible VMAs should not be trapping NUMA hinting faults. Skip them.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-18  mm: numa: avoid unnecessary work on the failure path  Mel Gorman  1  -1/+3

If a PMD changes during a THP migration then migration aborts, but the
failure path is doing more work than is necessary.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-18  mm: numa: ensure anon_vma is locked to prevent parallel THP splits  Mel Gorman  1  -0/+7

The anon_vma lock prevents parallel THP splits and any associated
complexity that arises when handling splits during THP migration. This
patch checks if the lock was successfully acquired and bails from THP
migration if it failed for any reason.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-18  mm: numa: do not clear PTE for pte_numa update  Mel Gorman  1  -2/+7

The TLB must be flushed if a PTE is updated, but change_pte_range was
clearing the PTE while marking PTEs pte_numa and then reinserting the
same entry without necessarily flushing the TLB. Without the flush, it's
conceivable that two processors have different TLB entries for the same
virtual address, and at the very least it would generate spurious faults.

This patch only unmaps the pages in change_pte_range for a full
protection change.

[riel@redhat.com: write pte_numa pte back to the page tables]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Chegu Vinod <chegu_vinod@hp.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-18  mm: numa: do not clear PMD during PTE update scan  Mel Gorman  1  -1/+1

If the PMD is flushed then a parallel fault in handle_mm_fault() will
enter the pmd_none and do_huge_pmd_anonymous_page() path, where it'll
attempt to insert a huge zero page. This is wasteful, so the patch avoids
clearing the PMD when setting pmd_numa.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-18  mm: clear pmd_numa before invalidating  Mel Gorman  1  -0/+3

On x86, PMD entries are similar to _PAGE_PROTNONE protection and are
handled as NUMA hinting faults. The following two page table protection
bits are what define them:

  _PAGE_NUMA:    set
  _PAGE_PRESENT: clear

A PMD is considered present if any of the _PAGE_PRESENT, _PAGE_PROTNONE,
_PAGE_PSE or _PAGE_NUMA bits are set. If pmdp_invalidate encounters a
pmd_numa, it clears the present bit, leaving _PAGE_NUMA set, which will
be considered not present by the CPU but present by pmd_present. The
existing caller of pmdp_invalidate should handle it, but it's an
inconsistent state for a PMD. This patch keeps the state consistent when
calling pmdp_invalidate.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-18  mm: numa: call MMU notifiers on THP migration  Mel Gorman  1  -8/+14

MMU notifiers must be called on THP page migration or secondary MMUs
will get very confused.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-18  mm: numa: serialise parallel get_user_page against THP migration  Mel Gorman  3  -15/+60

Base pages are unmapped and flushed from cache and TLB during normal page
migration and replaced with a migration entry that causes any parallel
NUMA hinting fault or gup to block until migration completes.

THP does not unmap pages due to a lack of support for migration entries
at a PMD level. This allows races with get_user_pages and
get_user_pages_fast, which commit 3f926ab945b6 ("mm: Close races between
THP migration and PMD numa clearing") made worse by introducing a
pmd_clear_flush().

This patch forces get_user_page (fast and normal) on a pmd_numa page to
go through the slow get_user_page path, where it will serialise against
THP migration and properly account for the NUMA hinting fault. On the
migration side the page table lock is taken for each PTE update.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-18  kexec: migrate to reboot cpu  Vivek Goyal  3  -1/+3

Commit 1b3a5d02ee07 ("reboot: move arch/x86 reboot= handling to generic
kernel") moved reboot= handling to generic code. In the process it also
removed the code in native_machine_shutdown() which moved the reboot
process to reboot_cpu/cpu0.

I guess the thought must have been that all reboot paths are calling
migrate_to_reboot_cpu(), so we don't need this special handling. But the
kexec reboot path (kernel_kexec()) is not calling migrate_to_reboot_cpu(),
so the above change broke kexec. Now reboot can happen on a non-boot cpu,
and when INIT is sent in the second kernel to bring up the BP, it brings
down the machine.

So start calling migrate_to_reboot_cpu() in the kexec reboot path to
avoid this problem.

Bisected by WANG Chao.

Reported-by: Matthew Whitehead <mwhitehe@redhat.com>
Reported-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Tested-by: Baoquan He <bhe@redhat.com>
Tested-by: WANG Chao <chaowang@redhat.com>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-17  cpupower: Fix segfault due to incorrect getopt_long arguments  Josh Boyer  1  -3/+3

If a user calls 'cpupower set --perf-bias 15', the process will end with
a SIGSEGV in libc, because cpupower-set passes a NULL optarg to the atoi
call. This is because the getopt_long structure currently has all of the
options marked as taking an optional_argument when they really have a
required argument. We change the structure to use required_argument to
match the short options, and it resolves the issue.

This fixes https://bugzilla.redhat.com/show_bug.cgi?id=1000439

Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Thomas Renninger <trenn@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
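[Editor's note: a self-contained illustration of the getopt_long pitfall
behind this fix, not the cpupower code itself. With optional_argument, a
space-separated value such as "--perf-bias 15" is not consumed into
optarg, so optarg stays NULL; required_argument consumes it.]

    #include <getopt.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        /* was optional_argument, which leaves optarg NULL for
         * "--perf-bias 15"; required_argument consumes the value */
        static struct option opts[] = {
            { "perf-bias", required_argument, NULL, 'b' },
            { NULL, 0, NULL, 0 }
        };
        int c, perf_bias = 0;

        while ((c = getopt_long(argc, argv, "b:", opts, NULL)) != -1) {
            if (c == 'b')
                perf_bias = atoi(optarg); /* safe: optarg is non-NULL here */
        }
        printf("perf-bias = %d\n", perf_bias);
        return 0;
    }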
2013-12-16  mfd: s2mps11: Fix build after regmap field rename in sec-core.c  Krzysztof Kozlowski  2  -4/+4

Fix building of the s2mps11 regulator and clock drivers after renaming
the regmap field in struct sec_pmic_dev in commit:

  "mfd/rtc: s5m: Fix register updating by adding regmap for RTC"

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Mark Brown <broonie@linaro.org>
2013-12-15  Linux 3.13-rc4  Linus Torvalds  1  -1/+1
2013-12-15  null_blk: mem garbage on NUMA systems during init  Matias Bjorling  1  -4/+4

On NUMA systems, the driver initializes the blk-mq layer with one hctx
per node. We initialize submit_queues to 1, while blk-mq's nr_hw_queues
is initialized to the number of NUMA nodes. This makes the null_init_hctx
function overwrite memory outside of what it allocated. In my case it
led to writing garbage into struct request_queue's mq_map.

Signed-off-by: Matias Bjorling <m@bjorling.me>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-15  radeon_pm: fix oops in hwmon_attributes_visible() and radeon_hwmon_show_temp_thresh()  Sergey Senozhatsky  1  -4/+2

Since commit ec39f64bba34 ("drm/radeon/dpm: Convert to use
devm_hwmon_register_with_groups") radeon_hwmon_init() is using
hwmon_device_register_with_groups(), which sets `rdev' as the device
private driver_data, while hwmon_attributes_visible() and
radeon_hwmon_show_temp_thresh() still expect a `drm_device'. Fix them by
using dev_get_drvdata(), in order to avoid this oops:

  BUG: unable to handle kernel paging request at 0000000000001e28
  IP: [<ffffffffa02ae8b4>] hwmon_attributes_visible+0x18/0x3d [radeon]
  PGD 15057e067 PUD 151a8e067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP
  Call Trace:
   internal_create_group+0x114/0x1d9
   sysfs_create_group+0xe/0x10
   sysfs_create_groups+0x22/0x5f
   device_add+0x34f/0x501
   device_register+0x15/0x18
   hwmon_device_register_with_groups+0xb5/0xed
   radeon_hwmon_init+0x56/0x7c [radeon]
   radeon_pm_init+0x134/0x7e5 [radeon]
   radeon_modeset_init+0x75f/0x8ed [radeon]
   radeon_driver_load_kms+0xc6/0x187 [radeon]
   drm_dev_register+0xf9/0x1b4 [drm]
   drm_get_pci_dev+0x98/0x129 [drm]
   radeon_pci_probe+0xa3/0xac [radeon]
   pci_device_probe+0x6e/0xcf
   driver_probe_device+0x98/0x1c4
   __driver_attach+0x5c/0x7e
   bus_for_each_dev+0x7b/0x85
   driver_attach+0x19/0x1b
   bus_add_driver+0x104/0x1ce
   driver_register+0x89/0xc5
   __pci_register_driver+0x58/0x5b
   drm_pci_init+0x86/0xea [drm]
   radeon_init+0x97/0x1000 [radeon]
   do_one_initcall+0x7f/0x117
   load_module+0x1583/0x1bb4
   SyS_init_module+0xa0/0xaf

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
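[Editor's note: a sketch of the shape of the fix. Only the drvdata lookup
is the point; the visibility test itself is simplified and the thermal
check is assumed, not taken from the actual diff.]

    static umode_t hwmon_attributes_visible(struct kobject *kobj,
                                            struct attribute *attr, int index)
    {
        struct device *dev = container_of(kobj, struct device, kobj);
        /* driver_data now holds the radeon_device itself, stored there
         * by hwmon_device_register_with_groups(), not the drm_device */
        struct radeon_device *rdev = dev_get_drvdata(dev);

        /* simplified: hide temperature attributes when there is no
         * internal thermal sensor (real check assumed to differ) */
        if (rdev->pm.int_thermal_type == THERMAL_TYPE_NONE)
            return 0;

        return attr->mode;
    }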
2013-12-15Revert "selinux: consider filesystem subtype in policies"Linus Torvalds2-60/+22
This reverts commit 102aefdda4d8275ce7d7100bc16c88c74272b260. Tom London reports that it causes sync() to hang on Fedora rawhide: https://bugzilla.redhat.com/show_bug.cgi?id=1033965 and Josh Boyer bisected it down to this commit. Reverting the commit in the rawhide kernel fixes the problem. Eric Paris root-caused it to incorrect subtype matching in that commit breaking fuse, and has a tentative patch, but by now we're better off retrying this in 3.14 rather than playing with it any more. Reported-by: Tom London <selinux@gmail.com> Bisected-by: Josh Boyer <jwboyer@fedoraproject.org> Acked-by: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Anand Avati <avati@redhat.com> Cc: Paul Moore <paul@paul-moore.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-14  igb: Fix for issue where values could be too high for udelay function.  Carolyn Wyborny  1  -1/+4

This patch changes the igb_phy_has_link function to check the value of
the parameter before deciding whether to use udelay or mdelay, in order
to be sure that the value is not too high for the udelay function.

CC: stable kernel <stable@vger.kernel.org> # 3.9+
Signed-off-by: Sunil K Pandey <sunil.k.pandey@intel.com>
Signed-off-by: Kevin B Smith <kevin.b.smith@intel.com>
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
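[Editor's note: the guard it describes is essentially the following, a
sketch of the pattern rather than the exact driver hunk.]

    /* udelay() cannot safely handle millisecond-scale values, so split
     * the wait by magnitude */
    if (usec_interval >= 1000)
        mdelay(usec_interval / 1000);
    else
        udelay(usec_interval);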
2013-12-14  i40e: fix null dereference  Jesse Brandeburg  1  -0/+3

If the vsi->tx_rings structure is NULL, we don't want to panic.

Change-Id: Ic694f043701738c434e8ebe0caf0673f4410dc10
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
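[Editor's note: presumably an early-return guard of roughly this shape;
the placement within the driver is an assumption.]

    /* bail out early: the rings were never allocated, nothing to do */
    if (!vsi->tx_rings)
        return;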
2013-12-13  ARM: fix asm/memory.h build error  Russell King  3  -20/+17

Jason Gunthorpe reports a build failure when ARM_PATCH_PHYS_VIRT is not
defined:

  In file included from arch/arm/include/asm/page.h:163:0,
                   from include/linux/mm_types.h:16,
                   from include/linux/sched.h:24,
                   from arch/arm/kernel/asm-offsets.c:13:
  arch/arm/include/asm/memory.h: In function '__virt_to_phys':
  arch/arm/include/asm/memory.h:244:40: error: 'PHYS_OFFSET' undeclared (first use in this function)
  arch/arm/include/asm/memory.h:244:40: note: each undeclared identifier is reported only once for each function it appears in
  arch/arm/include/asm/memory.h: In function '__phys_to_virt':
  arch/arm/include/asm/memory.h:249:13: error: 'PHYS_OFFSET' undeclared (first use in this function)

Fixes: ca5a45c06cd4 ("ARM: mm: use phys_addr_t appropriately in p2v and v2p conversions")
Tested-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-12-13  dm array: fix a reference counting bug in shadow_ablock  Joe Thornber  1  -1/+9

An old array block could have its reference count decremented below zero
when it is being replaced in the btree by a new array block. The fix is
to increment the old ablock's reference count just before inserting a
new ablock into the btree.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.9+
2013-12-13  dm space map: disallow decrementing a reference count below zero  Joe Thornber  1  -9/+23

The old behaviour, returning -EINVAL if a ref_count of 0 would be
decremented, was removed in commit f722063 ("dm space map: optimise
sm_ll_dec and sm_ll_inc"). To fix this regression we return an error code
from the mutator function pointer passed to sm_ll_mutate() and have
dec_ref_count() return -EINVAL if the old ref_count is 0. Add a DMERR to
reflect the potential seriousness of this error.

Also, add the missing dm_tm_unlock() to sm_ll_mutate()'s error path.

With this fix the following dmts regression test now passes:

  dmtest run --suite cache -n /metadata_use_kernel/

The next patch fixes the higher-level dm-array code that exposed this
regression.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.12+
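[Editor's note: the mutator change reads roughly like this, a sketch based
on the description above; the exact callback signature used by
sm_ll_mutate() is assumed.]

    /* mutator passed to sm_ll_mutate(): refuse to wrap below zero */
    static int dec_ref_count(void *context, uint32_t old, uint32_t *new)
    {
        if (!old) {
            DMERR("unable to decrement a reference count below 0");
            return -EINVAL;
        }

        *new = old - 1;
        return 0;
    }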
2013-12-13  ceph: Avoid data inconsistency due to d-cache aliasing in readpage()  Li Wang  1  -2/+6

If the length of data to be read in readpage() is exactly
PAGE_CACHE_SIZE, the original code does not flush the d-cache for data
consistency after finishing the read. This patch fixes that.

Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Signed-off-by: Sage Weil <sage@inktank.com>
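[Editor's note: in readpage() terms the fix looks something like this
sketch, where `err` is assumed to be the number of bytes read.]

    if (err < PAGE_CACHE_SIZE)
        /* short read: zeroing the tail also flushes the d-cache */
        zero_user_segment(page, err, PAGE_CACHE_SIZE);
    else
        /* full-page read: the flush must now be done explicitly */
        flush_dcache_page(page);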
2013-12-13  ceph: initialize inode before instantiating dentry  Yan, Zheng  1  -78/+58

Commit b18825a7c8 ("Put a small type field into struct dentry::d_flags")
added a type field to struct dentry::d_flags, and __d_instantiate() sets
that field by checking inode->i_mode. So we should initialize the inode
before instantiating the dentry when handling an MDS reply.

Fixes: http://tracker.ceph.com/issues/6930
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-12-13  powerpc/powernv: Fix OPAL LPC access in Little Endian  Benjamin Herrenschmidt  2  -7/+7

We are passing pointers to the firmware for reads, so we need to properly
convert the result, as OPAL is always big-endian.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-13  powerpc/powernv: Fix endian issue in opal_xscom_read  Anton Blanchard  2  -2/+4

opal_xscom_read uses a pointer to return the data, so we need to
byteswap it on LE builds.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-13  powerpc: Fix endian issues in crash dump code  Anton Blanchard  1  -3/+3

A couple more device tree properties that need byte swapping.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-13  powerpc/pseries: Fix endian issues in MSI code  Anton Blanchard  1  -13/+15

The MSI code is miscalculating quotas in little endian mode. Add the
required byteswaps to fix this. Before the patch we claimed a quota of
65536; after it we see the correct value of 256 (65536 is 0x00010000,
i.e. the byte-swapped 256).

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-13  powerpc/pseries: Fix PCIE link speed endian issue  Anton Blanchard  1  -4/+4

We need to byteswap ibm,pcie-link-speed-stats.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-13  powerpc/pseries: Fix endian issues in nvram code  Anton Blanchard  1  -23/+23

The NVRAM code has a number of endian issues. I noticed a very confused
error log count:

  RTAS: 100663330 -------- RTAS event begin --------

100663330 == 0x06000022, i.e. 0x6 LE error logs and 0x22 BE error logs.

The pstore code has similar issues: if we write an oops in one endianness
and attempt to read it in another, we get junk.

Make both of these formats big endian, and byteswap as required.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-13  powerpc/pseries: Fix endian issues in /proc/ppc64/lparcfg  Anton Blanchard  1  -6/+6

Some obvious issues:

  cat /proc/ppc64/lparcfg
  ...
  partition_id=16777216
  ...
  partition_potential_processors=268435456

(16777216 is 0x01000000 and 268435456 is 0x10000000, i.e. the
byte-swapped values 1 and 16.)

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-13  powerpc: Fix topology core_id endian issue on LE builds  Anton Blanchard  1  -2/+2

cpu_to_core_id() is missing a byteswap:

  cat /sys/devices/system/cpu/cpu63/topology/core_id
  201326592

(201326592 is 0x0C000000, the byte-swapped 12.)

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
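[Editor's note: the recurring fix across this series of endian patches is
the same one-liner. A generic sketch follows; the property name is
illustrative, not the one cpu_to_core_id() actually reads.]

    static u32 read_dt_cell(struct device_node *np, const char *name)
    {
        const __be32 *prop = of_get_property(np, name, NULL);

        /* device-tree cells are always big-endian; convert on load so
         * the value is correct on both BE and LE kernels */
        return prop ? be32_to_cpup(prop) : 0;
    }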
2013-12-13  powerpc: Fix endian issue in setup-common.c  Anton Blanchard  1  -2/+2

During an LE boot we see:

  Partition configured for 1073741824 cpus, operating system maximum is 2048.

Clearly missing a byteswap here: 1073741824 is 0x40000000, the
byte-swapped 64.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-13  powerpc: PTRACE_PEEKUSR always returns FPR0  Ulrich Weigand  1  -2/+2

There is a bug in using ptrace to access FPRs via PTRACE_PEEKUSR /
PTRACE_POKEUSR. In effect, trying to access any of the FPRs always
really accesses FPR0, which does seriously break debugging :-)

The problem seems to have been introduced by commit 3ad26e5c4459d (Merge
branch 'for-kvm' into next).

[ It is indeed a merge conflict between Paul's FPU/VSX state rework and
  my LE patches - Anton ]

Signed-off-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-12  mm: memcg: do not allow task about to OOM kill to bypass the limit  Johannes Weiner  1  -1/+1

Commit 4942642080ea ("mm: memcg: handle non-error OOM situations more
gracefully") allowed tasks that already entered a memcg OOM condition to
bypass the memcg limit on subsequent allocation attempts, hoping this
would expedite finishing the page fault and executing the kill.

David Rientjes is worried that this breaks memcg isolation guarantees,
and since there is no evidence that the bypass actually speeds up fault
processing, just change it so that these subsequent charge attempts fail
outright. The notable exception being __GFP_NOFAIL charges, which are
required to bypass the limit regardless.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-12  mm: memcg: fix race condition between memcg teardown and swapin  Johannes Weiner  1  -0/+36

There is a race condition between a memcg being torn down and a swapin
triggered from a different memcg of a page that was recorded to belong to
the exiting memcg on swapout (with the CONFIG_MEMCG_SWAP extension). The
result is unreclaimable pages pointing to dead memcgs, which can lead to
anything from endless loops in later memcg teardown (the page is charged
to all hierarchical parents but is not on any LRU list) to crashes from
following the dangling memcg pointer.

Memcgs with tasks in them can not be torn down and usually charges don't
show up in memcgs without tasks. Swapin with the CONFIG_MEMCG_SWAP
extension is the notable exception because it charges the cgroup that
was recorded as owner during swapout, which may be empty and in the
process of being torn down when a task in another memcg triggers the
swapin:

  teardown:                     swapin:

                                lookup_swap_cgroup_id()
                                rcu_read_lock()
                                mem_cgroup_lookup()
                                css_tryget()
                                rcu_read_unlock()
  disable css_tryget()
  call_rcu()
    offline_css()
    reparent_charges()
                                res_counter_charge() (hierarchical!)
                                css_put()
    css_free()
                                pc->mem_cgroup = dead memcg
                                add page to dead lru

Add a final reparenting step into css_free() to make sure any such raced
charges are moved out of the memcg before it's finally freed.

In the longer term it would be cleaner to have the css_tryget() and the
res_counter charge under the same RCU lock section so that the charge
reparenting is deferred until the last charge whose tryget succeeded is
visible. But this will require more invasive changes that will be harder
to evaluate and backport into stable, so better defer them to a separate
change set.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-12  thp: move preallocated PTE page table on move_huge_pmd()  Kirill A. Shutemov  1  -1/+11

Andrey Vagin reported a crash on the VM_BUG_ON() in
pgtable_pmd_page_dtor() with the following backtrace:

  free_pgd_range+0x2bf/0x410
  free_pgtables+0xce/0x120
  unmap_region+0xe0/0x120
  do_munmap+0x249/0x360
  move_vma+0x144/0x270
  SyS_mremap+0x3b9/0x510
  system_call_fastpath+0x16/0x1b

The crash can be reproduced with this test case:

    #define _GNU_SOURCE
    #include <sys/mman.h>
    #include <stdio.h>
    #include <unistd.h>

    #define MB (1024 * 1024UL)
    #define GB (1024 * MB)

    int main(int argc, char **argv)
    {
        char *p;
        int i;

        p = mmap((void *) GB, 10 * MB, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
        for (i = 0; i < 10 * MB; i += 4096)
            p[i] = 1;
        mremap(p, 10 * MB, 10 * MB, MREMAP_FIXED | MREMAP_MAYMOVE, 2 * GB);
        return 0;
    }

Due to the split PMD lock, we now store preallocated PTE tables for THP
pages per PMD table. It means we need to move them to the other PMD
table if the huge PMD moved there.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Andrey Vagin <avagin@openvz.org>
Tested-by: Andrey Vagin <avagin@openvz.org>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-12  mfd/rtc: s5m: fix register updating by adding regmap for RTC  Krzysztof Kozlowski  5  -14/+29

Rename the old regmap field of "struct sec_pmic_dev" to "regmap_pmic" and
add a new regmap for the RTC. On the S5M8767A, registers were not
properly updated and read because the same regmap as the PMIC's was used.
This could be observed as various hangs, e.g. an infinite loop while
waiting for the UDR field to change. In this chip family the RTC has a
different I2C address than the PMIC, so an additional regmap is needed.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Mark Brown <broonie@linaro.org>
Acked-by: Sangbeom Kim <sbkim73@samsung.com>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-12  rtc: s5m: enable IRQ wake during suspend  Krzysztof Kozlowski  1  -0/+25

Add PM suspend/resume ops to the rtc-s5m driver and enable IRQ wake
during suspend, so the RTC acts as a wakeup source. This allows waking
up from suspend-to-RAM on an RTC alarm interrupt.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Mark Brown <broonie@linaro.org>
Acked-by: Sangbeom Kim <sbkim73@samsung.com>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
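[Editor's note: the wakeup plumbing this describes typically looks like
the following sketch, under the assumption that the driver keeps its IRQ
number in its drvdata; names are illustrative.]

    static int s5m_rtc_suspend(struct device *dev)
    {
        struct s5m_rtc_info *info = dev_get_drvdata(dev);

        /* keep the RTC alarm IRQ armed across suspend so it can wake
         * the system */
        if (device_may_wakeup(dev))
            return enable_irq_wake(info->irq);
        return 0;
    }

    static int s5m_rtc_resume(struct device *dev)
    {
        struct s5m_rtc_info *info = dev_get_drvdata(dev);

        if (device_may_wakeup(dev))
            return disable_irq_wake(info->irq);
        return 0;
    }

    static SIMPLE_DEV_PM_OPS(s5m_rtc_pm_ops, s5m_rtc_suspend, s5m_rtc_resume);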