aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/kernel (follow)
AgeCommit message (Collapse)AuthorFilesLines
2014-08-08shm: add memfd_create() syscallDavid Herrmann1-0/+1
memfd_create() is similar to mmap(MAP_ANON), but returns a file-descriptor that you can pass to mmap(). It can support sealing and avoids any connection to user-visible mount-points. Thus, it's not subject to quotas on mounted file-systems, but can be used like malloc()'ed memory, but with a file-descriptor to it. memfd_create() returns the raw shmem file, so calls like ftruncate() can be used to modify the underlying inode. Also calls like fstat() will return proper information and mark the file as regular file. If you want sealing, you can specify MFD_ALLOW_SEALING. Otherwise, sealing is not supported (like on all other regular files). Compared to O_TMPFILE, it does not require a tmpfs mount-point and is not subject to a filesystem size limit. It is still properly accounted to memcg limits, though, and to the same overcommit or no-overcommit accounting as all user memory. Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Ryan Lortie <desrt@desrt.ca> Cc: Lennart Poettering <lennart@poettering.net> Cc: Daniel Mack <zonque@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08mm: allow drivers to prevent new writable mappingsDavid Herrmann1-1/+1
This patch (of 6): The i_mmap_writable field counts existing writable mappings of an address_space. To allow drivers to prevent new writable mappings, make this counter signed and prevent new writable mappings if it is negative. This is modelled after i_writecount and DENYWRITE. This will be required by the shmem-sealing infrastructure to prevent any new writable mappings after the WRITE seal has been set. In case there exists a writable mapping, this operation will fail with EBUSY. Note that we rely on the fact that iff you already own a writable mapping, you can increase the counter without using the helpers. This is the same that we do for i_writecount. Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Ryan Lortie <desrt@desrt.ca> Cc: Lennart Poettering <lennart@poettering.net> Cc: Daniel Mack <zonque@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08kernel/acct.c: fix coding style warnings and errorsIonut Alexa1-13/+17
Signed-off-by: Ionut Alexa <ionut.m.alexa@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08shm: make exit_shm work proportional to task activityJack Miller1-0/+6
This is small set of patches our team has had kicking around for a few versions internally that fixes tasks getting hung on shm_exit when there are many threads hammering it at once. Anton wrote a simple test to cause the issue: http://ozlabs.org/~anton/junkcode/bust_shm_exit.c Before applying this patchset, this test code will cause either hanging tracebacks or pthread out of memory errors. After this patchset, it will still produce output like: root@somehost:~# ./bust_shm_exit 1024 160 ... INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 116, t=2111 jiffies, g=241, c=240, q=7113) INFO: Stall ended before state dump start ... But the task will continue to run along happily, so we consider this an improvement over hanging, even if it's a bit noisy. This patch (of 3): exit_shm obtains the ipc_ns shm rwsem for write and holds it while it walks every shared memory segment in the namespace. Thus the amount of work is related to the number of shm segments in the namespace not the number of segments that might need to be cleaned. In addition, this occurs after the task has been notified the thread has exited, so the number of tasks waiting for the ns shm rwsem can grow without bound until memory is exausted. Add a list to the task struct of all shmids allocated by this task. Init the list head in copy_process. Use the ns->rwsem for locking. Add segments after id is added, remove before removing from id. On unshare of NEW_IPCNS orphan any ids as if the task had exited, similar to handling of semaphore undo. I chose a define for the init sequence since its a simple list init, otherwise it would require a function call to avoid include loops between the semaphore code and the task struct. Converting the list_del to list_del_init for the unshare cases would remove the exit followed by init, but I left it blow up if not inited. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Jack Miller <millerjo@us.ibm.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08panic: add TAINT_SOFTLOCKUPJosh Hunt2-0/+2
This taint flag will be set if the system has ever entered a softlockup state. Similar to TAINT_WARN it is useful to know whether or not the system has been in a softlockup state when debugging. [akpm@linux-foundation.org: apply the taint before calling panic()] Signed-off-by: Josh Hunt <johunt@akamai.com> Cc: Jason Baron <jbaron@akamai.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08kernel/gcov/fs.c: remove unnecessary null test before debugfs_removeFabian Frederick1-2/+1
This fixes checkpatch warning: WARNING: debugfs_remove(NULL) is safe this check is probably not required Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08kernel/fork.c: make mm_init_owner staticVladimir Davydov1-7/+7
It's only used in fork.c:mm_init(). Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08fork: copy mm's vm usage counters under mmap_semVladimir Davydov1-0/+5
If a forking process has a thread calling (un)mmap (silly but still), the child process may have some of its mm's vm usage counters (total_vm and friends) screwed up, because currently they are copied from oldmm w/o holding any locks (memcpy in dup_mm). This patch moves the counters initialization to dup_mmap() to be called under oldmm->mmap_sem, which eliminates any possibility of race. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08fork: reset mm->pinned_vmVladimir Davydov1-0/+1
mm->pinned_vm counts pages of mm's address space that were permanently pinned in memory by increasing their reference counter. The counter was introduced by commit bc3e53f682d9 ("mm: distinguish between mlocked and pinned pages"), while before it locked_vm had been used for such pages. Obviously, we should reset the counter on fork if !CLONE_VM, just like we do with locked_vm, but currently we don't. Let's fix it. This patch will fix the contents of /proc/pid/status:VmPin. ib_umem_get[infiniband] and perf_mmap still check pinned_vm against RLIMIT_MEMLOCK. It's left from the times when pinned pages were accounted under locked_vm, but today it looks wrong. It isn't clear how we should deal with it. We still have some drivers accounting pinned pages under mm->locked_vm - this is what commit bc3e53f682d9 was fighting against. It's infiniband/usnic and vfio. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08fork/exec: cleanup mm initializationVladimir Davydov1-27/+20
mm initialization on fork/exec is spread all over the place, which makes the code look inconsistent. We have mm_init(), which is supposed to init/nullify mm's internals, but it doesn't init all the fields it should: - on fork ->mmap,mm_rb,vmacache_seqnum,map_count,mm_cpumask,locked_vm are zeroed in dup_mmap(); - on fork ->pmd_huge_pte is zeroed in dup_mm(), immediately before calling mm_init(); - ->cpu_vm_mask_var ptr is initialized by mm_init_cpumask(), which is called before mm_init() on both fork and exec; - ->context is initialized by init_new_context(), which is called after mm_init() on both fork and exec; Let's consolidate all the initializations in mm_init() to make the code look cleaner. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08proc: constify seq_operationsFabian Frederick1-3/+3
proc_uid_seq_operations, proc_gid_seq_operations and proc_projid_seq_operations are only called in proc_id_map_open with seq_open as const struct seq_operations so we can constify the 3 structures and update proc_id_map_open prototype. text data bss dec hex filename 6817 404 1984 9205 23f5 kernel/user_namespace.o-before 6913 308 1984 9205 23f5 kernel/user_namespace.o-after Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08kernel/exit.c: fix coding style warnings and errorsIonut Alexa1-23/+26
Fixed coding style warnings and errors. Signed-off-by: Ionut Alexa <ionut.m.alexa@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08kernel/test_kprobes.c: use current logging functionsFabian Frederick1-56/+31
- Add pr_fmt - Coalesce formats - Use current pr_foo() functions instead of printk - Remove unnecessary "failed" display (already in log level). Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08kernel/kallsyms.c: fix %pB when there's no symbol at the addressNamhyung Kim1-1/+1
__sprint_symbol() should restore original address when kallsyms_lookup() failed to find a symbol. It's reported when dumpstack shows an address in a dynamically allocated trampoline for ftrace. [ 1314.612287] [<ffffffff81700312>] dump_stack+0x45/0x56 [ 1314.612290] [<ffffffff8125f5b0>] ? meminfo_proc_open+0x30/0x30 [ 1314.612293] [<ffffffffa080a494>] kpatch_ftrace_handler+0x14/0xf0 [kpatch] [ 1314.612306] [<ffffffffa00160c4>] 0xffffffffa00160c3 You can see a difference in the hex address - c4 and c3. Fix it. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Reported-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08page-cgroup: get rid of NR_PCG_FLAGSVladimir Davydov1-2/+0
It's not used anywhere today, so let's remove it. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08mm: memcontrol: use page lists for uncharge batchingJohannes Weiner1-4/+0
Pages are now uncharged at release time, and all sources of batched uncharges operate on lists of pages. Directly use those lists, and get rid of the per-task batching state. This also batches statistics accounting, in addition to the res counter charges, to reduce IRQ-disabling and re-enabling. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vladimir Davydov <vdavydov@parallels.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Vladimir Davydov <vdavydov@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08mm: memcontrol: rewrite charge APIJohannes Weiner1-7/+8
These patches rework memcg charge lifetime to integrate more naturally with the lifetime of user pages. This drastically simplifies the code and reduces charging and uncharging overhead. The most expensive part of charging and uncharging is the page_cgroup bit spinlock, which is removed entirely after this series. Here are the top-10 profile entries of a stress test that reads a 128G sparse file on a freshly booted box, without even a dedicated cgroup (i.e. executing in the root memcg). Before: 15.36% cat [kernel.kallsyms] [k] copy_user_generic_string 13.31% cat [kernel.kallsyms] [k] memset 11.48% cat [kernel.kallsyms] [k] do_mpage_readpage 4.23% cat [kernel.kallsyms] [k] get_page_from_freelist 2.38% cat [kernel.kallsyms] [k] put_page 2.32% cat [kernel.kallsyms] [k] __mem_cgroup_commit_charge 2.18% kswapd0 [kernel.kallsyms] [k] __mem_cgroup_uncharge_common 1.92% kswapd0 [kernel.kallsyms] [k] shrink_page_list 1.86% cat [kernel.kallsyms] [k] __radix_tree_lookup 1.62% cat [kernel.kallsyms] [k] __pagevec_lru_add_fn After: 15.67% cat [kernel.kallsyms] [k] copy_user_generic_string 13.48% cat [kernel.kallsyms] [k] memset 11.42% cat [kernel.kallsyms] [k] do_mpage_readpage 3.98% cat [kernel.kallsyms] [k] get_page_from_freelist 2.46% cat [kernel.kallsyms] [k] put_page 2.13% kswapd0 [kernel.kallsyms] [k] shrink_page_list 1.88% cat [kernel.kallsyms] [k] __radix_tree_lookup 1.67% cat [kernel.kallsyms] [k] __pagevec_lru_add_fn 1.39% kswapd0 [kernel.kallsyms] [k] free_pcppages_bulk 1.30% cat [kernel.kallsyms] [k] kfree As you can see, the memcg footprint has shrunk quite a bit. text data bss dec hex filename 37970 9892 400 48262 bc86 mm/memcontrol.o.old 35239 9892 400 45531 b1db mm/memcontrol.o This patch (of 4): The memcg charge API charges pages before they are rmapped - i.e. have an actual "type" - and so every callsite needs its own set of charge and uncharge functions to know what type is being operated on. Worse, uncharge has to happen from a context that is still type-specific, rather than at the end of the page's lifetime with exclusive access, and so requires a lot of synchronization. Rewrite the charge API to provide a generic set of try_charge(), commit_charge() and cancel_charge() transaction operations, much like what's currently done for swap-in: mem_cgroup_try_charge() attempts to reserve a charge, reclaiming pages from the memcg if necessary. mem_cgroup_commit_charge() commits the page to the charge once it has a valid page->mapping and PageAnon() reliably tells the type. mem_cgroup_cancel_charge() aborts the transaction. This reduces the charge API and enables subsequent patches to drastically simplify uncharging. As pages need to be committed after rmap is established but before they are added to the LRU, page_add_new_anon_rmap() must stop doing LRU additions again. Revive lru_cache_add_active_or_unevictable(). [hughd@google.com: fix shmem_unuse] [hughd@google.com: Add comments on the private use of -EAGAIN] Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Vladimir Davydov <vdavydov@parallels.com> Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06Merge branch 'akpm' (patchbomb from Andrew Morton)Linus Torvalds6-69/+114
Merge incoming from Andrew Morton: - Various misc things. - arch/sh updates. - Part of ocfs2. Review is slow. - Slab updates. - Most of -mm. - printk updates. - lib/ updates. - checkpatch updates. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (226 commits) checkpatch: update $declaration_macros, add uninitialized_var checkpatch: warn on missing spaces in broken up quoted checkpatch: fix false positives for --strict "space after cast" test checkpatch: fix false positive MISSING_BREAK warnings with --file checkpatch: add test for native c90 types in unusual order checkpatch: add signed generic types checkpatch: add short int to c variable types checkpatch: add for_each tests to indentation and brace tests checkpatch: fix brace style misuses of else and while checkpatch: add --fix option for a couple OPEN_BRACE misuses checkpatch: use the correct indentation for which() checkpatch: add fix_insert_line and fix_delete_line helpers checkpatch: add ability to insert and delete lines to patch/file checkpatch: add an index variable for fixed lines checkpatch: warn on break after goto or return with same tab indentation checkpatch: emit a warning on file add/move/delete checkpatch: add test for commit id formatting style in commit log checkpatch: emit fewer kmalloc_array/kcalloc conversion warnings checkpatch: improve "no space after cast" test checkpatch: allow multiple const * types ...
2014-08-06Merge tag 'pm+acpi-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds7-222/+475
Pull ACPI and power management updates from Rafael Wysocki: "Again, ACPICA leads the pack (47 commits), followed by cpufreq (18 commits) and system suspend/hibernation (9 commits). From the new code perspective, the ACPICA update brings ACPI 5.1 to the table, including a new device configuration object called _DSD (Device Specific Data) that will hopefully help us to operate device properties like Device Trees do (at least to some extent) and changes related to supporting ACPI on ARM. Apart from that we have hibernation changes making it use radix trees to store memory bitmaps which should speed up some operations carried out by it quite significantly. We also have some power management changes related to suspend-to-idle (the "freeze" sleep state) support and more preliminary changes needed to support ACPI on ARM (outside of ACPICA). The rest is fixes and cleanups pretty much everywhere. Specifics: - ACPICA update to upstream version 20140724. That includes ACPI 5.1 material (support for the _CCA and _DSD predefined names, changes related to the DMAR and PCCT tables and ARM support among other things) and cleanups related to using ACPICA's header files. A major part of it is related to acpidump and the core code used by that utility. Changes from Bob Moore, David E Box, Lv Zheng, Sascha Wildner, Tomasz Nowicki, Hanjun Guo. - Radix trees for memory bitmaps used by the hibernation core from Joerg Roedel. - Support for waking up the system from suspend-to-idle (also known as the "freeze" sleep state) using ACPI-based PCI wakeup signaling (Rafael J Wysocki). - Fixes for issues related to ACPI button events (Rafael J Wysocki). - New device ID for an ACPI-enumerated device included into the Wildcat Point PCH from Jie Yang. - ACPI video updates related to backlight handling from Hans de Goede and Linus Torvalds. - Preliminary changes needed to support ACPI on ARM from Hanjun Guo and Graeme Gregory. - ACPI PNP core cleanups from Arjun Sreedharan and Zhang Rui. - Cleanups related to ACPI_COMPANION() and ACPI_HANDLE() macros (Rafael J Wysocki). - ACPI-based device hotplug cleanups from Wei Yongjun and Rafael J Wysocki. - Cleanups and improvements related to system suspend from Lan Tianyu, Randy Dunlap and Rafael J Wysocki. - ACPI battery cleanup from Wei Yongjun. - cpufreq core fixes from Viresh Kumar. - Elimination of a deadband effect from the cpufreq ondemand governor and intel_pstate driver cleanups from Stratos Karafotis. - 350MHz CPU support for the powernow-k6 cpufreq driver from Mikulas Patocka. - Fix for the imx6 cpufreq driver from Anson Huang. - cpuidle core and governor cleanups from Daniel Lezcano, Sandeep Tripathy and Mohammad Merajul Islam Molla. - Build fix for the big_little cpuidle driver from Sachin Kamat. - Configuration fix for the Operation Performance Points (OPP) framework from Mark Brown. - APM cleanup from Jean Delvare. - cpupower utility fixes and cleanups from Peter Senna Tschudin, Andrey Utkin, Himangi Saraogi, Rickard Strandqvist, Thomas Renninger" * tag 'pm+acpi-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (118 commits) ACPI / LPSS: add LPSS device for Wildcat Point PCH ACPI / PNP: Replace faulty is_hex_digit() by isxdigit() ACPICA: Update version to 20140724. ACPICA: ACPI 5.1: Update for PCCT table changes. ACPICA/ARM: ACPI 5.1: Update for GTDT table changes. ACPICA/ARM: ACPI 5.1: Update for MADT changes. ACPICA/ARM: ACPI 5.1: Update for FADT changes. ACPICA: ACPI 5.1: Support for the _CCA predifined name. ACPICA: ACPI 5.1: New notify value for System Affinity Update. ACPICA: ACPI 5.1: Support for the _DSD predefined name. ACPICA: Debug object: Add current value of Timer() to debug line prefix. ACPICA: acpihelp: Add UUID support, restructure some existing files. ACPICA: Utilities: Fix local printf issue. ACPICA: Tables: Update for DMAR table changes. ACPICA: Remove some extraneous printf arguments. ACPICA: Update for comments/formatting. No functional changes. ACPICA: Disassembler: Add support for the ToUUID opererator (macro). ACPICA: Remove a redundant cast to acpi_size for ACPI_OFFSET() macro. ACPICA: Work around an ancient GCC bug. ACPI / processor: Make it possible to get local x2apic id via _MAT ...
2014-08-06Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds1-0/+1
Pull SCSI updates from James Bottomley: "This patch set consists of the usual driver updates (ufs, storvsc, pm8001 hpsa). It also has removal of the user space target driver code (everyone is using LIO now), a partial PCI MSI-X update, more multi-queue updates, conversion to 64 bit LUNs (so we could theoretically cope with any LUN returned by a device) and placeholder support for the ZBC device type (Shingle drives), plus an assortment of minor updates and bug fixes" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (143 commits) scsi: do not issue SCSI RSOC command to Promise Vtrak E610f vmw_pvscsi: Use pci_enable_msix_exact() instead of pci_enable_msix() pm8001: Fix invalid return when request_irq() failed lpfc: Remove superfluous call to pci_disable_msix() isci: Use pci_enable_msix_exact() instead of pci_enable_msix() bfa: Use pci_enable_msix_exact() instead of pci_enable_msix() bfa: Cleanup bfad_setup_intr() function bfa: Do not call pci_enable_msix() after it failed once fnic: Use pci_enable_msix_exact() instead of pci_enable_msix() scsi: use short driver name for per-driver cmd slab caches scsi_debug: support scsi-mq, queues and locks Drivers: add blist flags scsi: ufs: fix endianness sparse warnings scsi: ufs: make undeclared functions static bnx2i: Update driver version to 2.7.10.1 pm8001: fix a memory leak in nvmd_resp pm8001: fix update_flash pm8001: fix a memory leak in flash_update pm8001: Cleaning up uninitialized variables pm8001: Fix to remove null pointer checks that could never happen ...
2014-08-06kernel/printk/printk.c: fix bool assignementsNeil Zhang1-3/+3
Fix coccinelle warnings. Signed-off-by: Neil Zhang <zhangwm@marvell.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06printk: enable interrupts before calling console_trylock_for_printk()Jan Kara1-10/+21
We need interrupts disabled when calling console_trylock_for_printk() only so that cpu id we pass to can_use_console() remains valid (for other things console_sem provides all the exclusion we need and deadlocks on console_sem due to interrupts are impossible because we use down_trylock()). However if we are rescheduled, we are guaranteed to run on an online cpu so we can easily just get the cpu id in can_use_console(). We can lose a bit of performance when we enable interrupts in vprintk_emit() and then disable them again in console_unlock() but OTOH it can somewhat reduce interrupt latency caused by console_unlock(). We differ from (reverted) commit 939f04bec1a4 in that we avoid calling console_unlock() from vprintk_emit() with lockdep enabled as that has unveiled quite some bugs leading to system freezes during boot (e.g. https://lkml.org/lkml/2014/5/30/242, https://lkml.org/lkml/2014/6/28/521). Signed-off-by: Jan Kara <jack@suse.cz> Tested-by: Andreas Bombe <aeb@debian.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06printk: miscellaneous cleanupsAlex Elder1-13/+13
Some small cleanups to kernel/printk/printk.c. None of them should cause any change in behavior. - When CONFIG_PRINTK is defined, parenthesize the value of LOG_LINE_MAX. - When CONFIG_PRINTK is *not* defined, there is an extra LOG_LINE_MAX definition; delete it. - Pull an assignment out of a conditional expression in console_setup(). - Use isdigit() in console_setup() rather than open coding it. - In update_console_cmdline(), drop a NUL-termination assignment; the strlcpy() call that precedes it guarantees it's not needed. - Simplify some logic in printk_timed_ratelimit(). Signed-off-by: Alex Elder <elder@linaro.org> Reviewed-by: Petr Mladek <pmladek@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Jan Kara <jack@suse.cz> Cc: John Stultz <john.stultz@linaro.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06printk: use a clever macroAlex Elder1-10/+2
Use the IS_ENABLED() macro rather than #ifdef blocks to set certain global values. Signed-off-by: Alex Elder <elder@linaro.org> Acked-by: Borislav Petkov <bp@suse.de> Reviewed-by: Petr Mladek <pmladek@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jan Kara <jack@suse.cz> Cc: John Stultz <john.stultz@linaro.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06printk: fix some commentsAlex Elder1-11/+12
Fix a few comments that don't accurately describe their corresponding code. It also fixes some minor typographical errors. Signed-off-by: Alex Elder <elder@linaro.org> Reviewed-by: Petr Mladek <pmladek@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Jan Kara <jack@suse.cz> Cc: John Stultz <john.stultz@linaro.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06printk: rename DEFAULT_MESSAGE_LOGLEVELAlex Elder1-1/+1
Commit a8fe19ebfbfd ("kernel/printk: use symbolic defines for console loglevels") makes consistent use of symbolic values for printk() log levels. The naming scheme used is different from the one used for DEFAULT_MESSAGE_LOGLEVEL though. Change that symbol name to be MESSAGE_LOGLEVEL_DEFAULT for consistency. And because the value of that symbol comes from a similarly-named config option, rename CONFIG_DEFAULT_MESSAGE_LOGLEVEL as well. Signed-off-by: Alex Elder <elder@linaro.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Jan Kara <jack@suse.cz> Cc: John Stultz <john.stultz@linaro.org> Cc: Petr Mladek <pmladek@suse.cz> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06printk: tweak do_syslog() to match commentsAlex Elder1-1/+1
In do_syslog() there's a path used by kmsg_poll() and kmsg_read() that only needs to know whether there's any data available to read (and not its size). These callers only check for non-zero return. As a shortcut, do_syslog() returns the difference between what has been logged and what has been "seen." The comments say that the "count of records" should be returned but it's not. Instead it returns (log_next_idx - syslog_idx), which is a difference between buffer offsets--and the result could be negative. The behavior is the same (it'll be zero or not in the same cases), but the count of records is more meaningful and it matches what the comments say. So change the code to return that. Signed-off-by: Alex Elder <elder@linaro.org> Cc: Petr Mladek <pmladek@suse.cz> Cc: Jan Kara <jack@suse.cz> Cc: Joe Perches <joe@perches.com> Cc: John Stultz <john.stultz@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06printk: allow increasing the ring buffer depending on the number of CPUsLuis R. Rodriguez1-0/+34
The default size of the ring buffer is too small for machines with a large amount of CPUs under heavy load. What ends up happening when debugging is the ring buffer overlaps and chews up old messages making debugging impossible unless the size is passed as a kernel parameter. An idle system upon boot up will on average spew out only about one or two extra lines but where this really matters is on heavy load and that will vary widely depending on the system and environment. There are mechanisms to help increase the kernel ring buffer for tracing through debugfs, and those interfaces even allow growing the kernel ring buffer per CPU. We also have a static value which can be passed upon boot. Relying on debugfs however is not ideal for production, and relying on the value passed upon bootup is can only used *after* an issue has creeped up. Instead of being reactive this adds a proactive measure which lets you scale the amount of contributions you'd expect to the kernel ring buffer under load by each CPU in the worst case scenario. We use num_possible_cpus() to avoid complexities which could be introduced by dynamically changing the ring buffer size at run time, num_possible_cpus() lets us use the upper limit on possible number of CPUs therefore avoiding having to deal with hotplugging CPUs on and off. This introduces the kernel configuration option LOG_CPU_MAX_BUF_SHIFT which is used to specify the maximum amount of contributions to the kernel ring buffer in the worst case before the kernel ring buffer flips over, the size is specified as a power of 2. The total amount of contributions made by each CPU must be greater than half of the default kernel ring buffer size (1 << LOG_BUF_SHIFT bytes) in order to trigger an increase upon bootup. The kernel ring buffer is increased to the next power of two that would fit the required minimum kernel ring buffer size plus the additional CPU contribution. For example if LOG_BUF_SHIFT is 18 (256 KB) you'd require at least 128 KB contributions by other CPUs in order to trigger an increase of the kernel ring buffer. With a LOG_CPU_BUF_SHIFT of 12 (4 KB) you'd require at least anything over > 64 possible CPUs to trigger an increase. If you had 128 possible CPUs the amount of minimum required kernel ring buffer bumps to: ((1 << 18) + ((128 - 1) * (1 << 12))) / 1024 = 764 KB Since we require the ring buffer to be a power of two the new required size would be 1024 KB. This CPU contributions are ignored when the "log_buf_len" kernel parameter is used as it forces the exact size of the ring buffer to an expected power of two value. [pmladek@suse.cz: fix build] Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.cz> Tested-by: Davidlohr Bueso <davidlohr@hp.com> Tested-by: Petr Mladek <pmladek@suse.cz> Reviewed-by: Davidlohr Bueso <davidlohr@hp.com> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Stephen Warren <swarren@wwwdotorg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Petr Mladek <pmladek@suse.cz> Cc: Joe Perches <joe@perches.com> Cc: Arun KS <arunks.linux@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06printk: make dynamic units clear for the kernel ring bufferLuis R. Rodriguez1-1/+1
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Suggested-by: Davidlohr Bueso <davidlohr@hp.com> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Stephen Warren <swarren@wwwdotorg.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Petr Mladek <pmladek@suse.cz> Cc: Joe Perches <joe@perches.com> Cc: Arun KS <arunks.linux@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06printk: move power of 2 practice of ring buffer size to a helperLuis R. Rodriguez1-4/+10
In practice the power of 2 practice of the size of the kernel ring buffer remains purely historical but not a requirement, specially now that we have LOG_ALIGN and use it for both static and dynamic allocations. It could have helped with implicit alignment back in the days given the even the dynamically sized ring buffer was guaranteed to be aligned so long as CONFIG_LOG_BUF_SHIFT was set to produce a __LOG_BUF_LEN which is architecture aligned, since log_buf_len=n would be allowed only if it was > __LOG_BUF_LEN and we always ended up rounding the log_buf_len=n to the next power of 2 with roundup_pow_of_two(), any multiple of 2 then should be also architecture aligned. These assumptions of course relied heavily on CONFIG_LOG_BUF_SHIFT producing an aligned value but users can always change this. We now have precise alignment requirements set for the log buffer size for both static and dynamic allocations, but lets upkeep the old practice of using powers of 2 for its size to help with easy expected scalable values and the allocators for dynamic allocations. We'll reuse this later so move this into a helper. Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Stephen Warren <swarren@wwwdotorg.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Petr Mladek <pmladek@suse.cz> Cc: Joe Perches <joe@perches.com> Cc: Arun KS <arunks.linux@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06printk: make dynamic kernel ring buffer alignment explicitLuis R. Rodriguez1-2/+3
We have to consider alignment for the ring buffer both for the default static size, and then also for when an dynamic allocation is made when the log_buf_len=n kernel parameter is passed to set the size specifically to a size larger than the default size set by the architecture through CONFIG_LOG_BUF_SHIFT. The default static kernel ring buffer can be aligned properly if architectures set CONFIG_LOG_BUF_SHIFT properly, we provide ranges for the size though so even if CONFIG_LOG_BUF_SHIFT has a sensible aligned value it can be reduced to a non aligned value. Commit 6ebb017de9 ("printk: Fix alignment of buf causing crash on ARM EABI") by Andrew Lunn ensures the static buffer is always aligned and the decision of alignment is done by the compiler by using __alignof__(struct log). When log_buf_len=n is used we allocate the ring buffer dynamically. Dynamic allocation varies, for the early allocation called before setup_arch() memblock_virt_alloc() requests a page aligment and for the default kernel allocation memblock_virt_alloc_nopanic() requests no special alignment, which in turn ends up aligning the allocation to SMP_CACHE_BYTES, which is L1 cache aligned. Since we already have the required alignment for the kernel ring buffer though we can do better and request explicit alignment for LOG_ALIGN. This does that to be safe and make dynamic allocation alignment explicit. Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Tested-by: Petr Mladek <pmladek@suse.cz> Acked-by: Petr Mladek <pmladek@suse.cz> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Stephen Warren <swarren@wwwdotorg.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Petr Mladek <pmladek@suse.cz> Cc: Joe Perches <joe@perches.com> Cc: Arun KS <arunks.linux@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06kernel/smp.c:on_each_cpu_cond(): fix warning in fallback pathSasha Levin1-1/+1
The rarely-executed memry-allocation-failed callback path generates a WARN_ON_ONCE() when smp_call_function_single() succeeds. Presumably it's supposed to warn on failures. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Christoph Lameter <cl@gentwo.org> Cc: Gilad Ben-Yossef <gilad@benyossef.com> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Tejun Heo <htejun@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06mm, oom: remove unnecessary exit_state checkDavid Rientjes1-0/+1
The oom killer scans each process and determines whether it is eligible for oom kill or whether the oom killer should abort because of concurrent memory freeing. It will abort when an eligible process is found to have TIF_MEMDIE set, meaning it has already been oom killed and we're waiting for it to exit. Processes with task->mm == NULL should not be considered because they are either kthreads or have already detached their memory and killing them would not lead to memory freeing. That memory is only freed after exit_mm() has returned, however, and not when task->mm is first set to NULL. Clear TIF_MEMDIE after exit_mm()'s mmput() so that an oom killed process is no longer considered for oom kill, but only until exit_mm() has returned. This was fragile in the past because it relied on exit_notify() to be reached before no longer considering TIF_MEMDIE processes. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06mm, hugetlb: remove hugetlb_zero and hugetlb_infinityDavid Rientjes1-6/+3
They are unnecessary: "zero" can be used in place of "hugetlb_zero" and passing extra2 == NULL is equivalent to infinity. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06kernel/watchdog.c: convert printk/pr_warning to pr_foo()Fabian Frederick1-4/+6
Replace some obsolete functions. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06kernel/auditfilter.c: replace count*size kmalloc by kcallocFabian Frederick1-2/+2
kcalloc manages count*sizeof overflow. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds5-10/+547
Pull networking updates from David Miller: "Highlights: 1) Steady transitioning of the BPF instructure to a generic spot so all kernel subsystems can make use of it, from Alexei Starovoitov. 2) SFC driver supports busy polling, from Alexandre Rames. 3) Take advantage of hash table in UDP multicast delivery, from David Held. 4) Lighten locking, in particular by getting rid of the LRU lists, in inet frag handling. From Florian Westphal. 5) Add support for various RFC6458 control messages in SCTP, from Geir Ola Vaagland. 6) Allow to filter bridge forwarding database dumps by device, from Jamal Hadi Salim. 7) virtio-net also now supports busy polling, from Jason Wang. 8) Some low level optimization tweaks in pktgen from Jesper Dangaard Brouer. 9) Add support for ipv6 address generation modes, so that userland can have some input into the process. From Jiri Pirko. 10) Consolidate common TCP connection request code in ipv4 and ipv6, from Octavian Purdila. 11) New ARP packet logger in netfilter, from Pablo Neira Ayuso. 12) Generic resizable RCU hash table, with intial users in netlink and nftables. From Thomas Graf. 13) Maintain a name assignment type so that userspace can see where a network device name came from (enumerated by kernel, assigned explicitly by userspace, etc.) From Tom Gundersen. 14) Automatic flow label generation on transmit in ipv6, from Tom Herbert. 15) New packet timestamping facilities from Willem de Bruijn, meant to assist in measuring latencies going into/out-of the packet scheduler, latency from TCP data transmission to ACK, etc" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1536 commits) cxgb4 : Disable recursive mailbox commands when enabling vi net: reduce USB network driver config options. tg3: Modify tg3_tso_bug() to handle multiple TX rings amd-xgbe: Perform phy connect/disconnect at dev open/stop amd-xgbe: Use dma_set_mask_and_coherent to set DMA mask net: sun4i-emac: fix memory leak on bad packet sctp: fix possible seqlock seadlock in sctp_packet_transmit() Revert "net: phy: Set the driver when registering an MDIO bus device" cxgb4vf: Turn off SGE RX/TX Callback Timers and interrupts in PCI shutdown routine team: Simplify return path of team_newlink bridge: Update outdated comment on promiscuous mode net-timestamp: ACK timestamp for bytestreams net-timestamp: TCP timestamping net-timestamp: SCHED timestamp on entering packet scheduler net-timestamp: add key to disambiguate concurrent datagrams net-timestamp: move timestamp flags out of sk_flags net-timestamp: extend SCM_TIMESTAMPING ancillary data struct cxgb4i : Move stray CPL definitions to cxgb4 driver tcp: reduce spurious retransmits due to transient SACK reneging qlcnic: Initialize dcbnl_ops before register_netdev ...
2014-08-06Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-securityLinus Torvalds7-63/+412
Pull security subsystem updates from James Morris: "In this release: - PKCS#7 parser for the key management subsystem from David Howells - appoint Kees Cook as seccomp maintainer - bugfixes and general maintenance across the subsystem" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (94 commits) X.509: Need to export x509_request_asymmetric_key() netlabel: shorter names for the NetLabel catmap funcs/structs netlabel: fix the catmap walking functions netlabel: fix the horribly broken catmap functions netlabel: fix a problem when setting bits below the previously lowest bit PKCS#7: X.509 certificate issuer and subject are mandatory fields in the ASN.1 tpm: simplify code by using %*phN specifier tpm: Provide a generic means to override the chip returned timeouts tpm: missing tpm_chip_put in tpm_get_random() tpm: Properly clean sysfs entries in error path tpm: Add missing tpm_do_selftest to ST33 I2C driver PKCS#7: Use x509_request_asymmetric_key() Revert "selinux: fix the default socket labeling in sock_graft()" X.509: x509_request_asymmetric_keys() doesn't need string length arguments PKCS#7: fix sparse non static symbol warning KEYS: revert encrypted key change ima: add support for measuring and appraising firmware firmware_class: perform new LSM checks security: introduce kernel_fw_from_file hook PKCS#7: Missing inclusion of linux/err.h ...
2014-08-05Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds25-748/+1024
Pull timer and time updates from Thomas Gleixner: "A rather large update of timers, timekeeping & co - Core timekeeping code is year-2038 safe now for 32bit machines. Now we just need to fix all in kernel users and the gazillion of user space interfaces which rely on timespec/timeval :) - Better cache layout for the timekeeping internal data structures. - Proper nanosecond based interfaces for in kernel users. - Tree wide cleanup of code which wants nanoseconds but does hoops and loops to convert back and forth from timespecs. Some of it definitely belongs into the ugly code museum. - Consolidation of the timekeeping interface zoo. - A fast NMI safe accessor to clock monotonic for tracing. This is a long standing request to support correlated user/kernel space traces. With proper NTP frequency correction it's also suitable for correlation of traces accross separate machines. - Checkpoint/restart support for timerfd. - A few NOHZ[_FULL] improvements in the [hr]timer code. - Code move from kernel to kernel/time of all time* related code. - New clocksource/event drivers from the ARM universe. I'm really impressed that despite an architected timer in the newer chips SoC manufacturers insist on inventing new and differently broken SoC specific timers. [ Ed. "Impressed"? I don't think that word means what you think it means ] - Another round of code move from arch to drivers. Looks like most of the legacy mess in ARM regarding timers is sorted out except for a few obnoxious strongholds. - The usual updates and fixlets all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits) timekeeping: Fixup typo in update_vsyscall_old definition clocksource: document some basic timekeeping concepts timekeeping: Use cached ntp_tick_length when accumulating error timekeeping: Rework frequency adjustments to work better w/ nohz timekeeping: Minor fixup for timespec64->timespec assignment ftrace: Provide trace clocks monotonic timekeeping: Provide fast and NMI safe access to CLOCK_MONOTONIC seqcount: Add raw_write_seqcount_latch() seqcount: Provide raw_read_seqcount() timekeeping: Use tk_read_base as argument for timekeeping_get_ns() timekeeping: Create struct tk_read_base and use it in struct timekeeper timekeeping: Restructure the timekeeper some more clocksource: Get rid of cycle_last clocksource: Move cycle_last validation to core code clocksource: Make delta calculation a function wireless: ath9k: Get rid of timespec conversions drm: vmwgfx: Use nsec based interfaces drm: i915: Use nsec based interfaces timekeeping: Provide ktime_get_raw() hangcheck-timer: Use ktime_get_ns() ...
2014-08-05Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-3/+4
Pull irq updates from Thomas Gleixner: "Nothing spectacular from the irq department this time: - overhaul of the crossbar chip driver - overhaul of the spear shirq chip driver - support for the atmel-aic chip - code move from arch to drivers - the usual tiny fixlets - two reverts worth to mention which undo the too simple attempt of supporting wakeup interrupts on shared interrupt lines" * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits) Revert "irq: Warn when shared interrupts do not match on NO_SUSPEND" Revert "PM / sleep / irq: Do not suspend wakeup interrupts" irq: Warn when shared interrupts do not match on NO_SUSPEND irqchip: atmel-aic: Define irq fixups for atmel SoCs irqchip: atmel-aic: Implement RTC irq fixup irqchip: atmel-aic: Add irq fixup infrastructure irqchip: atmel-aic: Add atmel AIC/AIC5 drivers irqchip: atmel-aic: Move binding doc to interrupt-controller directory genirq: generic chip: Export irq_map_generic_chip function PM / sleep / irq: Do not suspend wakeup interrupts irqchip: or1k-pic: Migrate from arch/openrisc/ irqchip: crossbar: Allow for quirky hardware with direct hardwiring of GIC documentation: dt: omap: crossbar: Add description for interrupt consumer irqchip: crossbar: Introduce centralized check for crossbar write irqchip: crossbar: Introduce ti, max-crossbar-sources to identify valid crossbar mapping irqchip: crossbar: Add kerneldoc for crossbar_domain_unmap callback irqchip: crossbar: Set cb pointer to null in case of error irqchip: crossbar: Change the goto naming irqchip: crossbar: Return proper error value irqchip: crossbar: Fix kerneldoc warning ...
2014-08-05Merge branch 'pm-sleep'Rafael J. Wysocki1-127/+367
* pm-sleep: PM / Hibernate: Touch Soft Lockup Watchdog in rtree_next_node PM / Hibernate: Remove the old memory-bitmap implementation PM / Hibernate: Iterate over set bits instead of PFNs in swsusp_free() PM / Hibernate: Implement position keeping in radix tree PM / Hibernate: Add memory_rtree_find_bit function PM / Hibernate: Create a Radix-Tree to store memory bitmap PM / sleep: fix kernel-doc warnings in drivers/base/power/main.c
2014-08-05Merge branch 'pm-cpuidle'Rafael J. Wysocki1-4/+0
* pm-cpuidle: cpuidle: Remove time measurement in poll state cpuidle: Remove manual selection of the multiple driver support cpuidle: ladder governor - use macro instead of hardcoded value cpuidle: big_little: Fix build error cpuidle: menu governor - remove unused macro STDDEV_THRESH cpuidle: fix permission for driver name sysfs node cpuidle: move idle traces to cpuidle_enter_state()
2014-08-04Merge tag 'staging-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/stagingLinus Torvalds1-1/+1
Pull staging driver updates from Greg KH: "Here's the big pull request for the staging driver tree for 3.17-rc1. Lots of things in here, over 2000 patches, but the best part is this: 1480 files changed, 39070 insertions(+), 254659 deletions(-) Thanks to the great work of Kristina Martšenko, 14 different staging drivers have been removed from the tree as they were obsolete and no one was willing to work on cleaning them up. Other than the driver removals, loads of cleanups are in here (comedi, lustre, etc.) as well as the usual IIO driver updates and additions. All of this has been in the linux-next tree for a while" * tag 'staging-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (2199 commits) staging: comedi: addi_apci_1564: remove diagnostic interrupt support code staging: comedi: addi_apci_1564: add subdevice to check diagnostic status staging: wlan-ng: coding style problem fix staging: wlan-ng: fixing coding style problems staging: comedi: ii_pci20kc: request and ioremap memory staging: lustre: bitwise vs logical typo staging: dgnc: Remove unneeded dgnc_trace.c and dgnc_trace.h staging: dgnc: rephrase comment staging: comedi: ni_tio: remove some dead code staging: rtl8723au: Fix static symbol sparse warning staging: rtl8723au: usb_dvobj_init(): Remove unused variable 'pdev_desc' staging: rtl8723au: Do not duplicate kernel provided USB macros staging: rtl8723au: Remove never set struct pwrctrl_priv.bHWPowerdown staging: rtl8723au: Remove two never set variables staging: rtl8723au: RSSI_test is never set staging:r8190: coding style: Fixed checkpatch reported Error staging:r8180: coding style: Fixed too long lines staging:r8180: coding style: Fixed commenting style staging: lustre: ptlrpc: lproc_ptlrpc.c - fix dereferenceing user space buffer staging: lustre: ldlm: ldlm_resource.c - fix dereferenceing user space buffer ...
2014-08-04Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds14-274/+382
Pull scheduler updates from Ingo Molnar: - Move the nohz kick code out of the scheduler tick to a dedicated IPI, from Frederic Weisbecker. This necessiated quite some background infrastructure rework, including: * Clean up some irq-work internals * Implement remote irq-work * Implement nohz kick on top of remote irq-work * Move full dynticks timer enqueue notification to new kick * Move multi-task notification to new kick * Remove unecessary barriers on multi-task notification - Remove proliferation of wait_on_bit() action functions and allow wait_on_bit_action() functions to support a timeout. (Neil Brown) - Another round of sched/numa improvements, cleanups and fixes. (Rik van Riel) - Implement fast idling of CPUs when the system is partially loaded, for better scalability. (Tim Chen) - Restructure and fix the CPU hotplug handling code that may leave cfs_rq and rt_rq's throttled when tasks are migrated away from a dead cpu. (Kirill Tkhai) - Robustify the sched topology setup code. (Peterz Zijlstra) - Improve sched_feat() handling wrt. static_keys (Jason Baron) - Misc fixes. * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits) sched/fair: Fix 'make xmldocs' warning caused by missing description sched: Use macro for magic number of -1 for setparam sched: Robustify topology setup sched: Fix sched_setparam() policy == -1 logic sched: Allow wait_on_bit_action() functions to support a timeout sched: Remove proliferation of wait_on_bit() action functions sched/numa: Revert "Use effective_load() to balance NUMA loads" sched: Fix static_key race with sched_feat() sched: Remove extra static_key*() function indirection sched/rt: Fix replenish_dl_entity() comments to match the current upstream code sched: Transform resched_task() into resched_curr() sched/deadline: Kill task_struct->pi_top_task sched: Rework check_for_tasks() sched/rt: Enqueue just unthrottled rt_rq back on the stack in __disable_runtime() sched/fair: Disable runtime_enabled on dying rq sched/numa: Change scan period code to match intent sched/numa: Rework best node setting in task_numa_migrate() sched/numa: Examine a task move when examining a task swap sched/numa: Simplify task_numa_compare() sched/numa: Use effective_load() to balance NUMA loads ...
2014-08-04Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-1/+19
Pull perf changes from Ingo Molnar: "Kernel side changes: - Consolidate the PMU interrupt-disabled code amongst architectures (Vince Weaver) - misc fixes Tooling changes (new features, user visible changes): - Add support for pagefault tracing in 'trace', please see multiple examples in the changeset messages (Stanislav Fomichev). - Add pagefault statistics in 'trace' (Stanislav Fomichev) - Add header for columns in 'top' and 'report' TUI browsers (Jiri Olsa) - Add pagefault statistics in 'trace' (Stanislav Fomichev) - Add IO mode into timechart command (Stanislav Fomichev) - Fallback to syscalls:* when raw_syscalls:* is not available in the perl and python perf scripts. (Daniel Bristot de Oliveira) - Add --repeat global option to 'perf bench' to be used in benchmarks such as the existing 'futex' one, that was modified to use it instead of a local option. (Davidlohr Bueso) - Fix fd -> pathname resolution in 'trace', be it using /proc or a vfs_getname probe point. (Arnaldo Carvalho de Melo) - Add suggestion of how to set perf_event_paranoid sysctl, to help non-root users trying tools like 'trace' to get a working environment. (Arnaldo Carvalho de Melo) - Updates from trace-cmd for traceevent plugin_kvm plus args cleanup (Steven Rostedt, Jan Kiszka) - Support S/390 in 'perf kvm stat' (Alexander Yarygin) Tooling infrastructure changes: - Allow reserving a row for header purposes in the hists browser (Arnaldo Carvalho de Melo) - Various fixes and prep work related to supporting Intel PT (Adrian Hunter) - Introduce multiple debug variables control (Jiri Olsa) - Add callchain and additional sample information for python scripts (Joseph Schuchart) - More prep work to support Intel PT: (Adrian Hunter) - Polishing 'script' BTS output - 'inject' can specify --kallsym - VDSO is per machine, not a global var - Expose data addr lookup functions previously private to 'script' - Large mmap fixes in events processing - Include standard stringify macros in power pc code (Sukadev Bhattiprolu) Tooling cleanups: - Convert open coded equivalents to asprintf() (Andy Shevchenko) - Remove needless reassignments in 'trace' (Arnaldo Carvalho de Melo) - Cache the is_exit syscall test in 'trace) (Arnaldo Carvalho de Melo) - No need to reimplement err() in 'perf bench sched-messaging', drop barf(). (Davidlohr Bueso). - Remove ev_name argument from perf_evsel__hists_browse, can be obtained from the other parameters. (Jiri Olsa) Tooling fixes: - Fix memory leak in the 'sched-messaging' perf bench test. (Davidlohr Bueso) - The -o and -n 'perf bench mem' options are mutually exclusive, emit error when both are specified. (Davidlohr Bueso) - Fix scrollbar refresh row index in the ui browser, problem exposed now that headers will be added and will be allowed to be switched on/off. (Jiri Olsa) - Handle the num array type in python properly (Sebastian Andrzej Siewior) - Fix wrong condition for allocation failure (Jiri Olsa) - Adjust callchain based on DWARF debug info on powerpc (Sukadev Bhattiprolu) - Fix a risk for doing free on uninitialized pointer in traceevent lib (Rickard Strandqvist) - Update attr test with PERF_FLAG_FD_CLOEXEC flag (Jiri Olsa) - Enable close-on-exec flag on perf file descriptor (Yann Droneaud) - Fix build on gcc 4.4.7 (Arnaldo Carvalho de Melo) - Event ordering fixes (Jiri Olsa)" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (123 commits) Revert "perf tools: Fix jump label always changing during tracing" perf tools: Fix perf usage string leftover perf: Check permission only for parent tracepoint event perf record: Store PERF_RECORD_FINISHED_ROUND only for nonempty rounds perf record: Always force PERF_RECORD_FINISHED_ROUND event perf inject: Add --kallsyms parameter perf tools: Expose 'addr' functions so they can be reused perf session: Fix accounting of ordered samples queue perf powerpc: Include util/util.h and remove stringify macros perf tools: Fix build on gcc 4.4.7 perf tools: Add thread parameter to vdso__dso_findnew() perf tools: Add dso__type() perf tools: Separate the VDSO map name from the VDSO dso name perf tools: Add vdso__new() perf machine: Fix the lifetime of the VDSO temporary file perf tools: Group VDSO global variables into a structure perf session: Add ability to skip 4GiB or more perf session: Add ability to 'skip' a non-piped event stream perf tools: Pass machine to vdso__dso_findnew() perf tools: Add dso__data_size() ...
2014-08-04Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds12-407/+664
Pull locking updates from Ingo Molnar: "The main changes in this cycle are: - big rtmutex and futex cleanup and robustification from Thomas Gleixner - mutex optimizations and refinements from Jason Low - arch_mutex_cpu_relax() removal and related cleanups - smaller lockdep tweaks" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) arch, locking: Ciao arch_mutex_cpu_relax() locking/lockdep: Only ask for /proc/lock_stat output when available locking/mutexes: Optimize mutex trylock slowpath locking/mutexes: Try to acquire mutex only if it is unlocked locking/mutexes: Delete the MUTEX_SHOW_NO_WAITER macro locking/mutexes: Correct documentation on mutex optimistic spinning rtmutex: Make the rtmutex tester depend on BROKEN futex: Simplify futex_lock_pi_atomic() and make it more robust futex: Split out the first waiter attachment from lookup_pi_state() futex: Split out the waiter check from lookup_pi_state() futex: Use futex_top_waiter() in lookup_pi_state() futex: Make unlock_pi more robust rtmutex: Avoid pointless requeueing in the deadlock detection chain walk rtmutex: Cleanup deadlock detector debug logic rtmutex: Confine deadlock logic to futex rtmutex: Simplify remove_waiter() rtmutex: Document pi chain walk rtmutex: Clarify the boost/deboost part rtmutex: No need to keep task ref for lock owner check rtmutex: Simplify and document try_to_take_rtmutex() ...
2014-08-04Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds9-99/+329
Pull RCU changes from Ingo Molar: "The main changes: - torture-test updates - callback-offloading changes - maintainership changes - update RCU documentation - miscellaneous fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits) rcu: Allow for NULL tick_nohz_full_mask when nohz_full= missing rcu: Fix a sparse warning in rcu_report_unblock_qs_rnp() rcu: Fix a sparse warning in rcu_initiate_boost() rcu: Fix __rcu_reclaim() to use true/false for bool rcu: Remove CONFIG_PROVE_RCU_DELAY rcu: Use __this_cpu_read() instead of per_cpu_ptr() rcu: Don't use NMIs to dump other CPUs' stacks rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs rcu: Simplify priority boosting by putting rt_mutex in rcu_node rcu: Check both root and current rcu_node when setting up future grace period rcu: Allow post-unlock reference for rt_mutex rcu: Loosen __call_rcu()'s rcu_head alignment constraint rcu: Eliminate read-modify-write ACCESS_ONCE() calls rcu: Remove redundant ACCESS_ONCE() from tick_do_timer_cpu rcu: Make rcu node arrays static const char * const signal: Explain local_irq_save() call rcu: Handle obsolete references to TINY_PREEMPT_RCU rcu: Document deadlock-avoidance information for rcu_read_unlock() scripts: Teach get_maintainer.pl about the new "R:" tag rcu: Update rcu torture maintainership filename patterns ...
2014-08-04Merge tag 'trace-3.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-traceLinus Torvalds3-61/+19
Pull tracing filter cleanups from Steven Rostedt: "Oleg Nesterov did several clean ups with the tracing filter code. As he found some small bugs that went into 3.16, and these changes were based on that, I had to apply his changes to a separate branch than my main development branch. This was based on work that was already pulled into 3.16, and is a separate pull request to keep from having local merges in my pull request" * tag 'trace-3.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Kill "filter_string" arg of replace_preds() tracing: Change apply_subsystem_event_filter() paths to check file->system == dir tracing: Kill ftrace_event_call->files tracing/uprobes: Kill the dead TRACE_EVENT_FL_USE_CALL_FILTER logic tracing: Kill call_filter_disable() tracing: Kill destroy_call_preds() tracing: Kill destroy_preds() and destroy_file_preds()
2014-08-04Merge tag 'trace-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-traceLinus Torvalds13-464/+932
Pull tracing updates from Steven Rostedt: "This pull request has a lot of work done. The main thing is the changes to the ftrace function callback infrastructure. It's introducing a way to allow different functions to call directly different trampolines instead of all calling the same "mcount" one. The only user of this for now is the function graph tracer, which always had a different trampoline, but the function tracer trampoline was called and did basically nothing, and then the function graph tracer trampoline was called. The difference now, is that the function graph tracer trampoline can be called directly if a function is only being traced by the function graph trampoline. If function tracing is also happening on the same function, the old way is still done. The accounting for this takes up more memory when function graph tracing is activated, as it needs to keep track of which functions it uses. I have a new way that wont take as much memory, but it's not ready yet for this merge window, and will have to wait for the next one. Another big change was the removal of the ftrace_start/stop() calls that were used by the suspend/resume code that stopped function tracing when entering into suspend and resume paths. The stop of ftrace was done because there was some function that would crash the system if one called smp_processor_id()! The stop/start was a big hammer to solve the issue at the time, which was when ftrace was first introduced into Linux. Now ftrace has better infrastructure to debug such issues, and I found the problem function and labeled it with "notrace" and function tracing can now safely be activated all the way down into the guts of suspend and resume Other changes include clean ups of uprobe code, clean up of the trace_seq() code, and other various small fixes and clean ups to ftrace and tracing" * tag 'trace-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (57 commits) ftrace: Add warning if tramp hash does not match nr_trampolines ftrace: Fix trampoline hash update check on rec->flags ring-buffer: Use rb_page_size() instead of open coded head_page size ftrace: Rename ftrace_ops field from trampolines to nr_trampolines tracing: Convert local function_graph functions to static ftrace: Do not copy old hash when resetting tracing: let user specify tracing_thresh after selecting function_graph ring-buffer: Always run per-cpu ring buffer resize with schedule_work_on() tracing: Remove function_trace_stop and HAVE_FUNCTION_TRACE_MCOUNT_TEST s390/ftrace: remove check of obsolete variable function_trace_stop arm64, ftrace: Remove check of obsolete variable function_trace_stop Blackfin: ftrace: Remove check of obsolete variable function_trace_stop metag: ftrace: Remove check of obsolete variable function_trace_stop microblaze: ftrace: Remove check of obsolete variable function_trace_stop MIPS: ftrace: Remove check of obsolete variable function_trace_stop parisc: ftrace: Remove check of obsolete variable function_trace_stop sh: ftrace: Remove check of obsolete variable function_trace_stop sparc64,ftrace: Remove check of obsolete variable function_trace_stop tile: ftrace: Remove check of obsolete variable function_trace_stop ftrace: x86: Remove check of obsolete variable function_trace_stop ...
2014-08-04Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroupLinus Torvalds5-332/+627
Pull cgroup changes from Tejun Heo: "Mostly changes to get the v2 interface ready. The core features are mostly ready now and I think it's reasonable to expect to drop the devel mask in one or two devel cycles at least for a subset of controllers. - cgroup added a controller dependency mechanism so that block cgroup can depend on memory cgroup. This will be used to finally support IO provisioning on the writeback traffic, which is currently being implemented. - The v2 interface now uses a separate table so that the interface files for the new interface are explicitly declared in one place. Each controller will explicitly review and add the files for the new interface. - cpuset is getting ready for the hierarchical behavior which is in the similar style with other controllers so that an ancestor's configuration change doesn't change the descendants' configurations irreversibly and processes aren't silently migrated when a CPU or node goes down. All the changes are to the new interface and no behavior changed for the multiple hierarchies" * 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (29 commits) cpuset: fix the WARN_ON() in update_nodemasks_hier() cgroup: initialize cgrp_dfl_root_inhibit_ss_mask from !->dfl_files test cgroup: make CFTYPE_ONLY_ON_DFL and CFTYPE_NO_ internal to cgroup core cgroup: distinguish the default and legacy hierarchies when handling cftypes cgroup: replace cgroup_add_cftypes() with cgroup_add_legacy_cftypes() cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes cgroup: split cgroup_base_files[] into cgroup_{dfl|legacy}_base_files[] cpuset: export effective masks to userspace cpuset: allow writing offlined masks to cpuset.cpus/mems cpuset: enable onlined cpu/node in effective masks cpuset: refactor cpuset_hotplug_update_tasks() cpuset: make cs->{cpus, mems}_allowed as user-configured masks cpuset: apply cs->effective_{cpus,mems} cpuset: initialize top_cpuset's configured masks at mount cpuset: use effective cpumask to build sched domains cpuset: inherit ancestor's masks if effective_{cpus, mems} becomes empty cpuset: update cs->effective_{cpus, mems} when config changes cpuset: update cpuset->effective_{cpus,mems} at hotplug cpuset: add cs->effective_cpus and cs->effective_mems cgroup: clean up sane_behavior handling ...