linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2013-12-16	arm64: enable generic clockevent broadcast	Lorenzo Pieralisi	3	-1/+20
	On platforms with power management capabilities, timers that are shut down when a CPU enters deep C-states must be emulated using an always-on timer and a timer IPI to relay the timer IRQ to target CPUs on an SMP system. This patch enables the generic clockevents broadcast infrastructure for arm64, by providing the required Kconfig entries and adding the timer IPI infrastructure. Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2013-12-16	arm64: kernel: implement HW breakpoints CPU PM notifier	Lorenzo Pieralisi	1	-13/+66
	When a CPU is shutdown either through CPU idle or suspend to RAM, the content of HW breakpoint registers must be reset or restored to proper values when CPU resume from low power states. This patch adds debug register restore operations to the HW breakpoint control function and implements a CPU PM notifier that allows to restore the content of HW breakpoint registers to allow proper suspend/resume operations. Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2013-12-16	arm64: kernel: refactor code to install/uninstall breakpoints	Lorenzo Pieralisi	1	-54/+88
	Most of the code executed to install and uninstall breakpoints is common and can be factored out in a function that through a runtime operations type provides the requested implementation. This patch creates a common function that can be used to install/uninstall breakpoints and defines the set of operations that can be carried out through it. Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2013-12-16	arm: kvm: implement CPU PM notifier	Lorenzo Pieralisi	1	-0/+30
	Upon CPU shutdown and consequent warm-reboot, the hypervisor CPU state must be re-initialized. This patch implements a CPU PM notifier that upon warm-boot calls a KVM hook to reinitialize properly the hypervisor state so that the CPU can be safely resumed. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2013-12-16	arm64: kernel: implement fpsimd CPU PM notifier	Lorenzo Pieralisi	1	-0/+36
	When a CPU enters a low power state, its FP register content is lost. This patch adds a notifier to save the FP context on CPU shutdown and restore it on CPU resume. The context is saved and restored only if the suspending thread is not a kernel thread, mirroring the current context switch behaviour. Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2013-12-16	arm64: kernel: cpu_{suspend/resume} implementation	Lorenzo Pieralisi	5	-0/+319
	Kernel subsystems like CPU idle and suspend to RAM require a generic mechanism to suspend a processor, save its context and put it into a quiescent state. The cpu_{suspend}/{resume} implementation provides such a framework through a kernel interface allowing to save/restore registers, flush the context to DRAM and suspend/resume to/from low-power states where processor context may be lost. The CPU suspend implementation relies on the suspend protocol registered in CPU operations to carry out a suspend request after context is saved and flushed to DRAM. The cpu_suspend interface: int cpu_suspend(unsigned long arg); allows to pass an opaque parameter that is handed over to the suspend CPU operations back-end so that it can take action according to the semantics attached to it. The arg parameter allows suspend to RAM and CPU idle drivers to communicate to suspend protocol back-ends; it requires standardization so that the interface can be reused seamlessly across systems, paving the way for generic drivers. Context memory is allocated on the stack, whose address is stashed in a per-cpu variable to keep track of it and passed to core functions that save/restore the registers required by the architecture. Even though, upon successful execution, the cpu_suspend function shuts down the suspending processor, the warm boot resume mechanism, based on the cpu_resume function, makes the resume path operate as a cpu_suspend function return, so that cpu_suspend can be treated as a C function by the caller, which simplifies coding the PM drivers that rely on the cpu_suspend API. Upon context save, the minimal amount of memory is flushed to DRAM so that it can be retrieved when the MMU is off and caches are not searched. The suspend CPU operation, depending on the required operations (eg CPU vs Cluster shutdown) is in charge of flushing the cache hierarchy either implicitly (by calling firmware implementations like PSCI) or explicitly by executing the required cache maintainance functions. Debug exceptions are disabled during cpu_{suspend}/{resume} operations so that debug registers can be saved and restored properly preventing preemption from debug agents enabled in the kernel. Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2013-12-16	arm64: kernel: suspend/resume registers save/restore	Lorenzo Pieralisi	3	-0/+90
	Power management software requires the kernel to save and restore CPU registers while going through suspend and resume operations triggered by kernel subsystems like CPU idle and suspend to RAM. This patch implements code that provides save and restore mechanism for the arm v8 implementation. Memory for the context is passed as parameter to both cpu_do_suspend and cpu_do_resume functions, and allows the callers to implement context allocation as they deem fit. The registers that are saved and restored correspond to the registers set actually required by the kernel to be up and running which represents a subset of v8 ISA. Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2013-12-16	arm64: kernel: build MPIDR_EL1 hash function data structure	Lorenzo Pieralisi	2	-0/+83
	On ARM64 SMP systems, cores are identified by their MPIDR_EL1 register. The MPIDR_EL1 guidelines in the ARM ARM do not provide strict enforcement of MPIDR_EL1 layout, only recommendations that, if followed, split the MPIDR_EL1 on ARM 64 bit platforms in four affinity levels. In multi-cluster systems like big.LITTLE, if the affinity guidelines are followed, the MPIDR_EL1 can not be considered a linear index. This means that the association between logical CPU in the kernel and the HW CPU identifier becomes somewhat more complicated requiring methods like hashing to associate a given MPIDR_EL1 to a CPU logical index, in order for the look-up to be carried out in an efficient and scalable way. This patch provides a function in the kernel that starting from the cpu_logical_map, implement collision-free hashing of MPIDR_EL1 values by checking all significative bits of MPIDR_EL1 affinity level bitfields. The hashing can then be carried out through bits shifting and ORing; the resulting hash algorithm is a collision-free though not minimal hash that can be executed with few assembly instructions. The mpidr_el1 is filtered through a mpidr mask that is built by checking all bits that toggle in the set of MPIDR_EL1s corresponding to possible CPUs. Bits that do not toggle do not carry information so they do not contribute to the resulting hash. Pseudo code: /* check all bits that toggle, so they are required / for (i = 1, mpidr_el1_mask = 0; i < num_possible_cpus(); i++) mpidr_el1_mask \|= (cpu_logical_map(i) ^ cpu_logical_map(0)); / * Build shifts to be applied to aff0, aff1, aff2, aff3 values to hash the * mpidr_el1 * fls() returns the last bit set in a word, 0 if none * ffs() returns the first bit set in a word, 0 if none */ fs0 = mpidr_el1_mask[7:0] ? ffs(mpidr_el1_mask[7:0]) - 1 : 0; fs1 = mpidr_el1_mask[15:8] ? ffs(mpidr_el1_mask[15:8]) - 1 : 0; fs2 = mpidr_el1_mask[23:16] ? ffs(mpidr_el1_mask[23:16]) - 1 : 0; fs3 = mpidr_el1_mask[39:32] ? ffs(mpidr_el1_mask[39:32]) - 1 : 0; ls0 = fls(mpidr_el1_mask[7:0]); ls1 = fls(mpidr_el1_mask[15:8]); ls2 = fls(mpidr_el1_mask[23:16]); ls3 = fls(mpidr_el1_mask[39:32]); bits0 = ls0 - fs0; bits1 = ls1 - fs1; bits2 = ls2 - fs2; bits3 = ls3 - fs3; aff0_shift = fs0; aff1_shift = 8 + fs1 - bits0; aff2_shift = 16 + fs2 - (bits0 + bits1); aff3_shift = 32 + fs3 - (bits0 + bits1 + bits2); u32 hash(u64 mpidr_el1) { u32 l[4]; u64 mpidr_el1_masked = mpidr_el1 & mpidr_el1_mask; l[0] = mpidr_el1_masked & 0xff; l[1] = mpidr_el1_masked & 0xff00; l[2] = mpidr_el1_masked & 0xff0000; l[3] = mpidr_el1_masked & 0xff00000000; return (l[0] >> aff0_shift \| l[1] >> aff1_shift \| l[2] >> aff2_shift \| l[3] >> aff3_shift); } The hashing algorithm relies on the inherent properties set in the ARM ARM recommendations for the MPIDR_EL1. Exotic configurations, where for instance the MPIDR_EL1 values at a given affinity level have large holes, can end up requiring big hash tables since the compression of values that can be achieved through shifting is somewhat crippled when holes are present. Kernel warns if the number of buckets of the resulting hash table exceeds the number of possible CPUs by a factor of 4, which is a symptom of a very sparse HW MPIDR_EL1 configuration. The hash algorithm is quite simple and can easily be implemented in assembly code, to be used in code paths where the kernel virtual address space is not set-up (ie cpu_resume) and instruction and data fetches are strongly ordered so code must be compact and must carry out few data accesses. Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2013-12-16	arm64: kernel: add MPIDR_EL1 accessors macros	Lorenzo Pieralisi	1	-0/+10
	In order to simplify access to different affinity levels within the MPIDR_EL1 register values, this patch implements some preprocessor macros that allow to retrieve the MPIDR_EL1 affinity level value according to the level passed as input parameter. Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2013-12-15	Linux 3.13-rc4	Linus Torvalds	1	-1/+1

2013-12-15	null_blk: mem garbage on NUMA systems during init	Matias Bjorling	1	-4/+4
	For NUMA systems, initializing the blk-mq layer and using per node hctx. We initialize submit queues to 1, while blk-mq nr_hw_queues is initialized to the number of NUMA nodes. This makes the null_init_hctx function overwrite memory outside of what it allocated. In my case it lead to writing garbage into struct request_queue's mq_map. Signed-off-by: Matias Bjorling <m@bjorling.me> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-15	radeon_pm: fix oops in hwmon_attributes_visible() and radeon_hwmon_show_temp_thresh()	Sergey Senozhatsky	1	-4/+2
	Since commit ec39f64bba34 ("drm/radeon/dpm: Convert to use devm_hwmon_register_with_groups") radeon_hwmon_init() is using hwmon_device_register_with_groups(), which sets `rdev' as a device private driver_data, while hwmon_attributes_visible() and radeon_hwmon_show_temp_thresh() are still waiting for `drm_device'. Fix them by using dev_get_drvdata(), in order to avoid this oops: BUG: unable to handle kernel paging request at 0000000000001e28 IP: [<ffffffffa02ae8b4>] hwmon_attributes_visible+0x18/0x3d [radeon] PGD 15057e067 PUD 151a8e067 PMD 0 Oops: 0000 [#1] PREEMPT SMP Call Trace: internal_create_group+0x114/0x1d9 sysfs_create_group+0xe/0x10 sysfs_create_groups+0x22/0x5f device_add+0x34f/0x501 device_register+0x15/0x18 hwmon_device_register_with_groups+0xb5/0xed radeon_hwmon_init+0x56/0x7c [radeon] radeon_pm_init+0x134/0x7e5 [radeon] radeon_modeset_init+0x75f/0x8ed [radeon] radeon_driver_load_kms+0xc6/0x187 [radeon] drm_dev_register+0xf9/0x1b4 [drm] drm_get_pci_dev+0x98/0x129 [drm] radeon_pci_probe+0xa3/0xac [radeon] pci_device_probe+0x6e/0xcf driver_probe_device+0x98/0x1c4 __driver_attach+0x5c/0x7e bus_for_each_dev+0x7b/0x85 driver_attach+0x19/0x1b bus_add_driver+0x104/0x1ce driver_register+0x89/0xc5 __pci_register_driver+0x58/0x5b drm_pci_init+0x86/0xea [drm] radeon_init+0x97/0x1000 [radeon] do_one_initcall+0x7f/0x117 load_module+0x1583/0x1bb4 SyS_init_module+0xa0/0xaf Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Alexander Deucher <Alexander.Deucher@amd.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-15	Revert "selinux: consider filesystem subtype in policies"	Linus Torvalds	2	-60/+22
	This reverts commit 102aefdda4d8275ce7d7100bc16c88c74272b260. Tom London reports that it causes sync() to hang on Fedora rawhide: https://bugzilla.redhat.com/show_bug.cgi?id=1033965 and Josh Boyer bisected it down to this commit. Reverting the commit in the rawhide kernel fixes the problem. Eric Paris root-caused it to incorrect subtype matching in that commit breaking fuse, and has a tentative patch, but by now we're better off retrying this in 3.14 rather than playing with it any more. Reported-by: Tom London <selinux@gmail.com> Bisected-by: Josh Boyer <jwboyer@fedoraproject.org> Acked-by: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Anand Avati <avati@redhat.com> Cc: Paul Moore <paul@paul-moore.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-14	igb: Fix for issue where values could be too high for udelay function.	Carolyn Wyborny	1	-1/+4
	This patch changes the igb_phy_has_link function to check the value of the parameter before deciding to use udelay or mdelay in order to be sure that the value is not too high for udelay function. CC: stable kernel <stable@vger.kernel.org> # 3.9+ Signed-off-by: Sunil K Pandey <sunil.k.pandey@intel.com> Signed-off-by: Kevin B Smith <kevin.b.smith@intel.com> Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14	i40e: fix null dereference	Jesse Brandeburg	1	-0/+3
	If the vsi->tx_rings structure is NULL we don't want to panic. Change-Id: Ic694f043701738c434e8ebe0caf0673f4410dc10 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-13	ARM: fix asm/memory.h build error	Russell King	3	-20/+17
	Jason Gunthorpe reports a build failure when ARM_PATCH_PHYS_VIRT is not defined: In file included from arch/arm/include/asm/page.h:163:0, from include/linux/mm_types.h:16, from include/linux/sched.h:24, from arch/arm/kernel/asm-offsets.c:13: arch/arm/include/asm/memory.h: In function '__virt_to_phys': arch/arm/include/asm/memory.h:244:40: error: 'PHYS_OFFSET' undeclared (first use in this function) arch/arm/include/asm/memory.h:244:40: note: each undeclared identifier is reported only once for each function it appears in arch/arm/include/asm/memory.h: In function '__phys_to_virt': arch/arm/include/asm/memory.h:249:13: error: 'PHYS_OFFSET' undeclared (first use in this function) Fixes: ca5a45c06cd4 ("ARM: mm: use phys_addr_t appropriately in p2v and v2p conversions") Tested-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-12-13	dm array: fix a reference counting bug in shadow_ablock	Joe Thornber	1	-1/+9
	An old array block could have its reference count decremented below zero when it is being replaced in the btree by a new array block. The fix is to increment the old ablock's reference count just before inserting a new ablock into the btree. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.9+
2013-12-13	dm space map: disallow decrementing a reference count below zero	Joe Thornber	1	-9/+23
	The old behaviour, returning -EINVAL if a ref_count of 0 would be decremented, was removed in commit f722063 ("dm space map: optimise sm_ll_dec and sm_ll_inc"). To fix this regression we return an error code from the mutator function pointer passed to sm_ll_mutate() and have dec_ref_count() return -EINVAL if the old ref_count is 0. Add a DMERR to reflect the potential seriousness of this error. Also, add missing dm_tm_unlock() to sm_ll_mutate()'s error path. With this fix the following dmts regression test now passes: dmtest run --suite cache -n /metadata_use_kernel/ The next patch fixes the higher-level dm-array code that exposed this regression. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.12+
2013-12-12	mm: memcg: do not allow task about to OOM kill to bypass the limit	Johannes Weiner	1	-1/+1
	Commit 4942642080ea ("mm: memcg: handle non-error OOM situations more gracefully") allowed tasks that already entered a memcg OOM condition to bypass the memcg limit on subsequent allocation attempts hoping this would expedite finishing the page fault and executing the kill. David Rientjes is worried that this breaks memcg isolation guarantees and since there is no evidence that the bypass actually speeds up fault processing just change it so that these subsequent charge attempts fail outright. The notable exception being __GFP_NOFAIL charges which are required to bypass the limit regardless. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-bt: David Rientjes <rientjes@google.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-12	mm: memcg: fix race condition between memcg teardown and swapin	Johannes Weiner	1	-0/+36
	There is a race condition between a memcg being torn down and a swapin triggered from a different memcg of a page that was recorded to belong to the exiting memcg on swapout (with CONFIG_MEMCG_SWAP extension). The result is unreclaimable pages pointing to dead memcgs, which can lead to anything from endless loops in later memcg teardown (the page is charged to all hierarchical parents but is not on any LRU list) or crashes from following the dangling memcg pointer. Memcgs with tasks in them can not be torn down and usually charges don't show up in memcgs without tasks. Swapin with the CONFIG_MEMCG_SWAP extension is the notable exception because it charges the cgroup that was recorded as owner during swapout, which may be empty and in the process of being torn down when a task in another memcg triggers the swapin: teardown: swapin: lookup_swap_cgroup_id() rcu_read_lock() mem_cgroup_lookup() css_tryget() rcu_read_unlock() disable css_tryget() call_rcu() offline_css() reparent_charges() res_counter_charge() (hierarchical!) css_put() css_free() pc->mem_cgroup = dead memcg add page to dead lru Add a final reparenting step into css_free() to make sure any such raced charges are moved out of the memcg before it's finally freed. In the longer term it would be cleaner to have the css_tryget() and the res_counter charge under the same RCU lock section so that the charge reparenting is deferred until the last charge whose tryget succeeded is visible. But this will require more invasive changes that will be harder to evaluate and backport into stable, so better defer them to a separate change set. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-12	thp: move preallocated PTE page table on move_huge_pmd()	Kirill A. Shutemov	1	-1/+11
	Andrey Wagin reported crash on VM_BUG_ON() in pgtable_pmd_page_dtor() with fallowing backtrace: free_pgd_range+0x2bf/0x410 free_pgtables+0xce/0x120 unmap_region+0xe0/0x120 do_munmap+0x249/0x360 move_vma+0x144/0x270 SyS_mremap+0x3b9/0x510 system_call_fastpath+0x16/0x1b The crash can be reproduce with this test case: #define _GNU_SOURCE #include <sys/mman.h> #include <stdio.h> #include <unistd.h> #define MB (1024 * 1024UL) #define GB (1024 * MB) int main(int argc, char *argv) { char p; int i; p = mmap((void ) GB, 10 MB, PROT_READ \| PROT_WRITE, MAP_PRIVATE \| MAP_ANONYMOUS \| MAP_FIXED, -1, 0); for (i = 0; i < 10 * MB; i += 4096) p[i] = 1; mremap(p, 10 * MB, 10 * MB, MREMAP_FIXED \| MREMAP_MAYMOVE, 2 * GB); return 0; } Due to split PMD lock, we now store preallocated PTE tables for THP pages per-PMD table. It means we need to move them to other PMD table if huge PMD moved there. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Andrey Vagin <avagin@openvz.org> Tested-by: Andrey Vagin <avagin@openvz.org> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-12	mfd/rtc: s5m: fix register updating by adding regmap for RTC	Krzysztof Kozlowski	5	-14/+29
	Rename old regmap field of "struct sec_pmic_dev" to "regmap_pmic" and add new regmap for RTC. On S5M8767A registers were not properly updated and read due to usage of the same regmap as the PMIC. This could be observed in various hangs, e.g. in infinite loop during waiting for UDR field change. On this chip family the RTC has different I2C address than PMIC so additional regmap is needed. Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Reviewed-by: Mark Brown <broonie@linaro.org> Acked-by: Sangbeom Kim <sbkim73@samsung.com> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-12	rtc: s5m: enable IRQ wake during suspend	Krzysztof Kozlowski	1	-0/+25
	Add PM suspend/resume ops to rtc-s5m driver and enable IRQ wake during suspend so the RTC would act like a wake up source. This allows waking up from suspend to RAM on RTC alarm interrupt. Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Cc: Mark Brown <broonie@linaro.org> Acked-by: Sangbeom Kim <sbkim73@samsung.com> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-12	rtc: s5m: limit endless loop waiting for register update	Krzysztof Kozlowski	1	-6/+31
	After setting alarm or time the driver is waiting for UDR register to be cleared indicating that registers data have been transferred. Limit the endless loop to only 5 retries. Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Reviewed-by: Mark Brown <broonie@linaro.org> Acked-by: Sangbeom Kim <sbkim73@samsung.com> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-12	rtc: s5m: fix unsuccesful IRQ request during probe	Krzysztof Kozlowski	1	-2/+4
	Probe failed for rtc-s5m: s5m-rtc s5m-rtc: Failed to request alarm IRQ: 12: -22 s5m-rtc: probe of s5m-rtc failed with error -22 Fix rtc-s5m interrupt request by using regmap_irq_get_virq() for mapping the IRQ. Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Reviewed-by: Mark Brown <broonie@linaro.org> Acked-by: Sangbeom Kim <sbkim73@samsung.com> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-12	drivers/rtc/rtc-s5m.c: fix info->rtc assignment	Geert Uytterhoeven	1	-27/+27
	Fix this warning: drivers/rtc/rtc-s5m.c: In function `s5m_rtc_probe': drivers/rtc/rtc-s5m.c:545: warning: assignment from incompatible pointer type struct s5m_rtc_info.rtc has type "struct regmap ", while struct sec_pmic_dev.rtc has type "struct i2c_client ". Probably the author wanted to assign "struct sec_pmic_dev.regmap", which has the correct type. Also, as "rtc" doesn't make much sense as a name for a regmap, rename it to "regmap". Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Sangbeom Kim <sbkim73@samsung.com> Cc: Sachin Kamat <sachin.kamat@linaro.org> Tested-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-12	include/linux/kernel.h: make might_fault() a nop for !MMU	Axel Lin	1	-1/+2
	The machine cannot fault if !MUU, so make might_fault() a nop for !MMU. This fixes below build error if !CONFIG_MMU && (CONFIG_PROVE_LOCKING=y \|\| CONFIG_DEBUG_ATOMIC_SLEEP=y): arch/arm/kernel/built-in.o: In function `arch_ptrace': arch/arm/kernel/ptrace.c:852: undefined reference to `might_fault' arch/arm/kernel/built-in.o: In function `restore_sigframe': arch/arm/kernel/signal.c:173: undefined reference to `might_fault' ... arch/arm/kernel/built-in.o:arch/arm/kernel/signal.c:177: more undefined references to `might_fault' follow make: *** [vmlinux] Error 1 Signed-off-by: Axel Lin <axel.lin@ingics.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-12	drivers/rtc/rtc-at91rm9200.c: correct alarm over day/month wrap	Linus Pizunski	1	-0/+2
	Update month and day of month to the alarm month/day instead of current day/month when setting the RTC alarm mask. Signed-off-by: Linus Pizunski <linus@narrativeteam.com> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-12	procfs: also fix proc_reg_get_unmapped_area() for !MMU case	Jan Beulich	1	-5/+9
	Commit fad1a86e25e0 ("procfs: call default get_unmapped_area on MMU-present architectures"), as its title says, took care of only the MMU case, leaving the !MMU side still in the regressed state (returning -EIO in all cases where pde->proc_fops->get_unmapped_area is NULL). From the fad1a86e25e0 changelog: "Commit c4fe24485729 ("sparc: fix PCI device proc file mmap(2)") added proc_reg_get_unmapped_area in proc_reg_file_ops and proc_reg_file_ops_no_compat, by which now mmap always returns EIO if get_unmapped_area method is not defined for the target procfs file, which causes regression of mmap on /proc/vmcore. To address this issue, like get_unmapped_area(), call default current->mm->get_unmapped_area on MMU-present architectures if pde->proc_fops->get_unmapped_area, i.e. the one in actual file operation in the procfs file, is not defined" Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: <stable@vger.kernel.org> [3.12.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-12	mm: memcg: do not declare OOM from __GFP_NOFAIL allocations	Johannes Weiner	1	-0/+3
	Commit 84235de394d9 ("fs: buffer: move allocation failure loop into the allocator") started recognizing __GFP_NOFAIL in memory cgroups but forgot to disable the OOM killer. Any task that does not fail allocation will also not enter the OOM completion path. So don't declare an OOM state in this case or it'll be leaked and the task be able to bypass the limit until the next userspace-triggered page fault cleans up the OOM state. Reported-by: William Dauchy <wdauchy@gmail.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> [3.12.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-12	include/linux/hugetlb.h: make isolate_huge_page() an inline	Naoya Horiguchi	1	-1/+4
	With CONFIG_HUGETLBFS=n: mm/migrate.c: In function `do_move_page_to_node_array': include/linux/hugetlb.h:140:33: warning: statement with no effect [-Wunused-value] #define isolate_huge_page(p, l) false ^ mm/migrate.c:1170:4: note: in expansion of macro `isolate_huge_page' isolate_huge_page(page, &pagelist); Reported-by: Borislav Petkov <bp@alien8.de> Tested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-12	mtd: nand: pxa3xx: Use info->use_dma to release DMA resources	Ezequiel Garcia	1	-1/+1
	In commit: commit 62e8b851783138a11da63285be0fbf69530ff73d Author: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Date: Fri Oct 4 15:30:38 2013 -0300 mtd: nand: pxa3xx: Allocate data buffer on detected flash size the way the buffer is allocated was changed: the first READ_ID is issued with a small kmalloc'ed buffer. Only once the flash page size is detected the DMA buffers are allocated, and info->use_dma is set. Currently, if the device detection fails, the driver checks the 'use_dma' module parameter and tries to release unallocated DMA resources. Fix this by checking the proper indicator of the DMA allocation, which is 'info->use_dma'. Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
2013-12-12	Partially revert "mtd: nand: pxa3xx: Introduce 'marvell,armada370-nand' compatible string"	Ezequiel Garcia	1	-4/+0
	This partially reverts c0f3b8643a6fa2461d70760ec49d21d2b031d611. The "armada370-nand" compatible support is not complete, and it was mistake to add it. Revert it and postpone the support until the infrastructure is in place. Cc: <stable@vger.kernel.org> # 3.12 Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Acked-by: Jason Cooper <jason@lakedaemon.net> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
2013-12-12	selinux: process labeled IPsec TCP SYN-ACK packets properly in selinux_ip_postroute()	Paul Moore	1	-7/+35
	Due to difficulty in arriving at the proper security label for TCP SYN-ACK packets in selinux_ip_postroute(), we need to check packets while/before they are undergoing XFRM transforms instead of waiting until afterwards so that we can determine the correct security label. Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu> Cc: stable@vger.kernel.org Signed-off-by: Paul Moore <pmoore@redhat.com>
2013-12-12	selinux: look for IPsec labels on both inbound and outbound packets	Paul Moore	3	-14/+47
	Previously selinux_skb_peerlbl_sid() would only check for labeled IPsec security labels on inbound packets, this patch enables it to check both inbound and outbound traffic for labeled IPsec security labels. Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu> Cc: stable@vger.kernel.org Signed-off-by: Paul Moore <pmoore@redhat.com>
2013-12-12	selinux: handle TCP SYN-ACK packets correctly in selinux_ip_postroute()	Paul Moore	1	-15/+53
	In selinux_ip_postroute() we perform access checks based on the packet's security label. For locally generated traffic we get the packet's security label from the associated socket; this works in all cases except for TCP SYN-ACK packets. In the case of SYN-ACK packet's the correct security label is stored in the connection's request_sock, not the server's socket. Unfortunately, at the point in time when selinux_ip_postroute() is called we can't query the request_sock directly, we need to recreate the label using the same logic that originally labeled the associated request_sock. See the inline comments for more explanation. Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu> Tested-by: Janak Desai <Janak.Desai@gtri.gatech.edu> Cc: stable@vger.kernel.org Signed-off-by: Paul Moore <pmoore@redhat.com>
2013-12-12	selinux: handle TCP SYN-ACK packets correctly in selinux_ip_output()	Paul Moore	1	-2/+23
	In selinux_ip_output() we always label packets based on the parent socket. While this approach works in almost all cases, it doesn't work in the case of TCP SYN-ACK packets when the correct label is not the label of the parent socket, but rather the label of the larval socket represented by the request_sock struct. Unfortunately, since the request_sock isn't queued on the parent socket until after the SYN-ACK packet is sent, we can't lookup the request_sock to determine the correct label for the packet; at this point in time the best we can do is simply pass/NF_ACCEPT the packet. It must be said that simply passing the packet without any explicit labeling action, while far from ideal, is not terrible as the SYN-ACK packet will inherit any IP option based labeling from the initial connection request so the label should be correct and all our access controls remain in place so we shouldn't have to worry about information leaks. Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu> Tested-by: Janak Desai <Janak.Desai@gtri.gatech.edu> Cc: stable@vger.kernel.org Signed-off-by: Paul Moore <pmoore@redhat.com>
2013-12-12	i2c: imx: Check the return value from clk_prepare_enable()	Fabio Estevam	1	-1/+3
	clk_prepare_enable() may fail, so let's check its return value and propagate it in the case of error. Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2013-12-12	KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)	Gleb Natapov	1	-1/+4
	A guest can cause a BUG_ON() leading to a host kernel crash. When the guest writes to the ICR to request an IPI, while in x2apic mode the following things happen, the destination is read from ICR2, which is a register that the guest can control. kvm_irq_delivery_to_apic_fast uses the high 16 bits of ICR2 as the cluster id. A BUG_ON is triggered, which is a protection against accessing map->logical_map with an out-of-bounds access and manages to avoid that anything really unsafe occurs. The logic in the code is correct from real HW point of view. The problem is that KVM supports only one cluster with ID 0 in clustered mode, but the code that has the bug does not take this into account. Reported-by: Lars Bull <larsbull@google.com> Cc: stable@vger.kernel.org Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-12-12	KVM: x86: Convert vapic synchronization to _cached functions (CVE-2013-6368)	Andy Honig	3	-53/+18
	In kvm_lapic_sync_from_vapic and kvm_lapic_sync_to_vapic there is the potential to corrupt kernel memory if userspace provides an address that is at the end of a page. This patches concerts those functions to use kvm_write_guest_cached and kvm_read_guest_cached. It also checks the vapic_address specified by userspace during ioctl processing and returns an error to userspace if the address is not a valid GPA. This is generally not guest triggerable, because the required write is done by firmware that runs before the guest. Also, it only affects AMD processors and oldish Intel that do not have the FlexPriority feature (unless you disable FlexPriority, of course; then newer processors are also affected). Fixes: b93463aa59d6 ('KVM: Accelerated apic support') Reported-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-12-12	KVM: x86: Fix potential divide by 0 in lapic (CVE-2013-6367)	Andy Honig	1	-1/+2
	Under guest controllable circumstances apic_get_tmcct will execute a divide by zero and cause a crash. If the guest cpuid support tsc deadline timers and performs the following sequence of requests the host will crash. - Set the mode to periodic - Set the TMICT to 0 - Set the mode bits to 11 (neither periodic, nor one shot, nor tsc deadline) - Set the TMICT to non-zero. Then the lapic_timer.period will be 0, but the TMICT will not be. If the guest then reads from the TMCCT then the host will perform a divide by 0. This patch ensures that if the lapic_timer.period is 0, then the division does not occur. Reported-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-12-12	KVM: Improve create VCPU parameter (CVE-2013-4587)	Andy Honig	1	-0/+3
	In multiple functions the vcpu_id is used as an offset into a bitfield. Ag malicious user could specify a vcpu_id greater than 255 in order to set or clear bits in kernel memory. This could be used to elevate priveges in the kernel. This patch verifies that the vcpu_id provided is less than 255. The api documentation already specifies that the vcpu_id must be less than max_vcpus, but this is currently not checked. Reported-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-12-12	i2c: mux: Inherit retry count and timeout from parent for muxed bus	Elie De Brauwer	1	-0/+2
	If a muxed i2c bus gets created the default retry count and timeout of the muxed bus is zero. Hence it it possible that you end up with a situation where the parent controller sets a default retry count and timeout which gets applied and used while the muxed bus (using the same controller) has a default retry count of zero and a default timeout of 1s (set in i2c_add_adapter()). This can be solved by initializing the retry count and timeout of the muxed bus with the values used by the the parent at creation time. Signed-off-by: Elie De Brauwer <eliedebrauwer@gmail.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2013-12-12	xen-netback: fix gso_prefix check	Paul Durrant	1	-1/+1
	There is a mistake in checking the gso_prefix mask when passing large packets to a guest. The wrong shift is applied to the bit - the raw skb gso type is used rather then the translated one. This leads to large packets being handed to the guest without the GSO metadata. This patch fixes the check. The mistake manifested as errors whilst running Microsoft HCK large packet offload tests between a pair of Windows 8 VMs. I have verified this patch fixes those errors. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-12	net: make neigh_priv_len in struct net_device 16bit instead of 8bit	Sebastian Siewior	1	-1/+1
	neigh_priv_len is defined as u8. With all debug enabled struct ipoib_neigh has 200 bytes. The largest part is sk_buff_head with 96 bytes and here the spinlock with 72 bytes. The size value still fits in this u8 leaving some room for more. On -RT struct ipoib_neigh put on weight and has 392 bytes. The main reason is sk_buff_head with 288 and the fatty here is spinlock with 192 bytes. This does no longer fit into into neigh_priv_len and gcc complains. This patch changes neigh_priv_len from being 8bit to 16bit. Since the following element (dev_id) is 16bit followed by a spinlock which is aligned, the struct remains with a total size of 3200 (allmodconfig) / 2048 (with as much debug off as possible) bytes on x86-64. On x86-32 the struct is 1856 (allmodconfig) / 1216 (with as much debug off as possible) bytes long. The numbers were gained with and without the patch to prove that this change does not increase the size of the struct. Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-12	drivers: net: cpsw: fix for cpsw crash when build as modules	Mugunthan V N	1	-3/+14
	When CPSW and Davinci MDIO are build as modules, CPSW crashes when accessing CPSW registers in CPSW probe. The same is working in built-in as the CPSW clocks are enabled in Davindi MDIO probe, SO Enabling the clocks before accessing the version register and moving out the other register access to cpsw device open. Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-12	word-at-a-time: provide generic big-endian zero_bytemask implementation	Will Deacon	1	-0/+8
	Whilst architectures may be able to do better than this (which they can, by simply defining their own macro), this is a generic stab at a zero_bytemask implementation for the asm-generic, big-endian word-at-a-time implementation. On arm64, a clz instruction is used to implement the fls efficiently. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-12	dcache: allow word-at-a-time name hashing with big-endian CPUs	Will Deacon	3	-7/+4
	When explicitly hashing the end of a string with the word-at-a-time interface, we have to be careful which end of the word we pick up. On big-endian CPUs, the upper-bits will contain the data we're after, so ensure we generate our masks accordingly (and avoid hashing whatever random junk may have been sitting after the string). This patch adds a new dcache helper, bytemask_from_count, which creates a mask appropriate for the CPU endianness. Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-12	xen-netback: napi: don't prematurely request a tx event	Paul Durrant	1	-1/+1
	This patch changes the RING_FINAL_CHECK_FOR_REQUESTS in xenvif_build_tx_gops to a check for RING_HAS_UNCONSUMED_REQUESTS as the former call has the side effect of advancing the ring event pointer and therefore inviting another interrupt from the frontend before the napi poll has actually finished, thereby defeating the point of napi. The event pointer is updated by RING_FINAL_CHECK_FOR_REQUESTS in xenvif_poll, the napi poll function, if the work done is less than the budget i.e. when actually transitioning back to interrupt mode. Reported-by: Malcolm Crossley <malcolm.crossley@citrix.com> Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-12	xen-netback: napi: fix abuse of budget	Paul Durrant	1	-7/+7
	netback seems to be somewhat confused about the napi budget parameter. The parameter is supposed to limit the number of skbs processed in each poll, but netback has this confused with grant operations. This patch fixes that, properly limiting the work done in each poll. Note that this limit makes sure we do not process any more data from the shared ring than we intend to pass back from the poll. This is important to prevent tx_queue potentially growing without bound. Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>