aboutsummaryrefslogtreecommitdiffstats
path: root/kernel (follow)
AgeCommit message (Collapse)AuthorFilesLines
2014-01-21fsnotify: remove .should_send_event callbackJan Kara2-20/+1
After removing event structure creation from the generic layer there is no reason for separate .should_send_event and .handle_event callbacks. So just remove the first one. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Eric Paris <eparis@parisplace.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21fsnotify: do not share events between notification groupsJan Kara2-10/+12
Currently fsnotify framework creates one event structure for each notification event and links this event into all interested notification groups. This is done so that we save memory when several notification groups are interested in the event. However the need for event structure shared between inotify & fanotify bloats the event structure so the result is often higher memory consumption. Another problem is that fsnotify framework keeps path references with outstanding events so that fanotify can return open file descriptors with its events. This has the undesirable effect that filesystem cannot be unmounted while there are outstanding events - a regression for inotify compared to a situation before it was converted to fsnotify framework. For fanotify this problem is hard to avoid and users of fanotify should kind of expect this behavior when they ask for file descriptors from notified files. This patch changes fsnotify and its users to create separate event structure for each group. This allows for much simpler code (~400 lines removed by this patch) and also smaller event structures. For example on 64-bit system original struct fsnotify_event consumes 120 bytes, plus additional space for file name, additional 24 bytes for second and each subsequent group linking the event, and additional 32 bytes for each inotify group for private data. After the conversion inotify event consumes 48 bytes plus space for file name which is considerably less memory unless file names are long and there are several groups interested in the events (both of which are uncommon). Fanotify event fits in 56 bytes after the conversion (fanotify doesn't care about file names so its events don't have to have it allocated). A win unless there are four or more fanotify groups interested in the event. The conversion also solves the problem with unmount when only inotify is used as we don't have to grab path references for inotify events. [hughd@google.com: fanotify: fix corruption preventing startup] Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Eric Paris <eparis@parisplace.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21module: Add missing newline in printk call.Tetsuo Handa1-4/+2
Add missing \n and also follow commit bddb12b3 "kernel/module.c: use pr_foo()". Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-01-20Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds8-254/+190
Pull timer changes from Ingo Molnar: - ARM clocksource/clockevent improvements and fixes - generic timekeeping updates: TAI fixes/improvements, cleanups - Posix cpu timer cleanups and improvements - dynticks updates: full dynticks bugfixes, optimizations and cleanups * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits) clocksource: Timer-sun5i: Switch to sched_clock_register() timekeeping: Remove comment that's mostly out of date rtc-cmos: Add an alarm disable quirk timekeeper: fix comment typo for tk_setup_internals() timekeeping: Fix missing timekeeping_update in suspend path timekeeping: Fix CLOCK_TAI timer/nanosleep delays tick/timekeeping: Call update_wall_time outside the jiffies lock timekeeping: Avoid possible deadlock from clock_was_set_delayed timekeeping: Fix potential lost pv notification of time change timekeeping: Fix lost updates to tai adjustment clocksource: sh_cmt: Add clk_prepare/unprepare support clocksource: bcm_kona_timer: Remove unused bcm_timer_ids clocksource: vt8500: Remove deprecated IRQF_DISABLED clocksource: tegra: Remove deprecated IRQF_DISABLED clocksource: misc drivers: Remove deprecated IRQF_DISABLED clocksource: sh_mtu2: Remove unnecessary platform_set_drvdata() clocksource: sh_tmu: Remove unnecessary platform_set_drvdata() clocksource: armada-370-xp: Enable timer divider only when needed clocksource: clksrc-of: Warn if no clock sources are found clocksource: orion: Switch to sched_clock_register() ...
2014-01-20Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds24-323/+3087
Pull scheduler changes from Ingo Molnar: - Add the initial implementation of SCHED_DEADLINE support: a real-time scheduling policy where tasks that meet their deadlines and periodically execute their instances in less than their runtime quota see real-time scheduling and won't miss any of their deadlines. Tasks that go over their quota get delayed (Available to privileged users for now) - Clean up and fix preempt_enable_no_resched() abuse all around the tree - Do sched_clock() performance optimizations on x86 and elsewhere - Fix and improve auto-NUMA balancing - Fix and clean up the idle loop - Apply various cleanups and fixes * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits) sched: Fix __sched_setscheduler() nice test sched: Move SCHED_RESET_ON_FORK into attr::sched_flags sched: Fix up attr::sched_priority warning sched: Fix up scheduler syscall LTP fails sched: Preserve the nice level over sched_setscheduler() and sched_setparam() calls sched/core: Fix htmldocs warnings sched/deadline: No need to check p if dl_se is valid sched/deadline: Remove unused variables sched/deadline: Fix sparse static warnings m68k: Fix build warning in mac_via.h sched, thermal: Clean up preempt_enable_no_resched() abuse sched, net: Fixup busy_loop_us_clock() sched, net: Clean up preempt_enable_no_resched() abuse sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding sched/preempt, locking: Rework local_bh_{dis,en}able() sched/clock, x86: Avoid a runtime condition in native_sched_clock() sched/clock: Fix up clear_sched_clock_stable() sched/clock, x86: Use a static_key for sched_clock_stable sched/clock: Remove local_irq_disable() from the clocks sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs ...
2014-01-20Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds3-37/+96
Pull perf updates from Ingo Molnar: "Kernel side changes: - Add Intel RAPL energy counter support (Stephane Eranian) - Clean up uprobes (Oleg Nesterov) - Optimize ring-buffer writes (Peter Zijlstra) Tooling side changes, user visible: - 'perf diff': - Add column colouring improvements (Ramkumar Ramachandra) - 'perf kvm': - Add guest related improvements, including allowing to specify a directory with guest specific /proc information (Dongsheng Yang) - Add shell completion support (Ramkumar Ramachandra) - Add '-v' option (Dongsheng Yang) - Support --guestmount (Dongsheng Yang) - 'perf probe': - Support showing source code, asking for variables to be collected at probe time and other 'perf probe' operations that use DWARF information. This supports only binaries with debugging information at this time, detached debuginfo (aka debuginfo packages) support should come in later patches (Masami Hiramatsu) - 'perf record': - Rename --no-delay option to --no-buffering, better reflecting its purpose and freeing up '--delay' to take the place of '--initial-delay', so that 'record' and 'stat' are consistent (Arnaldo Carvalho de Melo) - Default the -t/--thread option to no inheritance (Adrian Hunter) - Make per-cpu mmaps the default (Adrian Hunter) - 'perf report': - Improve callchain processing performance (Frederic Weisbecker) - Retain bfd reference to lookup source line numbers, greatly optimizing, among other use cases, 'perf report -s srcline' (Adrian Hunter) - Improve callchain processing performance even more (Namhyung Kim) - Add a perf.data file header window in the 'perf report' TUI, associated with the 'i' hotkey, providing a counterpart to the --header option in the stdio UI (Namhyung Kim) - 'perf script': - Add an option in 'perf script' to print the source line number (Adrian Hunter) - Add --header/--header-only options to 'script' and 'report', the default is not tho show the header info, but as this has been the default for some time, leave a single line explaining how to obtain that information (Jiri Olsa) - Add options to show comm, fork, exit and mmap PERF_RECORD_ events (Namhyung Kim) - Print callchains and symbols if they exist (David Ahern) - 'perf timechart' - Add backtrace support to CPU info - Print pid along the name - Add support for CPU topology - Add new option --highlight'ing threads, be it by name or, if a numeric value is provided, that run more than given duration (Stanislav Fomichev) - 'perf top': - Make 'perf top -g' refer to callchains, for consistency with other tools (David Ahern) - 'perf trace': - Handle old kernels where the "raw_syscalls" tracepoints were called plain "syscalls" (David Ahern) - Remove thread summary coloring, by Pekka Enberg. - Honour -m option in 'trace', the tool was offering the option to set the mmap size, but wasn't using it when doing the actual mmap on the events file descriptors (Jiri Olsa) - generic: - Backport libtraceevent plugin support (trace-cmd repository, with plugins for jbd2, hrtimer, kmem, kvm, mac80211, sched_switch, function, xen, scsi, cfg80211 (Jiri Olsa) - Print session information only if --stdio is given (Namhyung Kim) Tooling side changes, developer visible (plumbing): - Improve 'perf probe' exit path, release resources (Masami Hiramatsu) - Improve libtraceevent plugins exit path, allowing the registering of an unregister handler to be called at exit time (Namhyung Kim) - Add an alias to the build test makefile (make -C tools/perf build-test) (Namhyung Kim) - Get rid of die() and friends (good riddance!) in libtraceevent (Namhyung Kim) - Fix cross build problems related to pkgconfig and CROSS_COMPILE not being propagated to the feature tests, leading to features being tested in the host and then being enabled on the target (Mark Rutland) - Improve forked workload error reporting by sending the errno in the signal data queueing integer field, using sigqueue and by doing the signal setup in the evlist methods, removing open coded equivalents in various tools (Arnaldo Carvalho de Melo) - Do more auto exit cleanup chores in the 'evlist' destructor, so that the tools don't have to all do that sequence (Arnaldo Carvalho de Melo) - Pack 'struct perf_session_env' and 'struct trace' (Arnaldo Carvalho de Melo) - Add test for building detached source tarballs (Arnaldo Carvalho de Melo) - Move some header files (tools/perf/ to tools/include/ to make them available to other tools/ dwelling codebases (Namhyung Kim) - Move logic to warn about kptr_restrict'ed kernels to separate function in 'report' (Arnaldo Carvalho de Melo) - Move hist browser selection code to separate function (Arnaldo Carvalho de Melo) - Move histogram entries collapsing to separate function (Arnaldo Carvalho de Melo) - Introduce evlist__for_each() & friends (Arnaldo Carvalho de Melo) - Automate setup of FEATURE_CHECK_(C|LD)FLAGS-all variables (Jiri Olsa) - Move arch setup into seprate Makefile (Jiri Olsa) - Make libtraceevent install target quieter (Jiri Olsa) - Make tests/make output more compact (Jiri Olsa) - Ignore generated files in feature-checks (Chunwei Chen) - Introduce pevent_filter_strerror() in libtraceevent, similar in purpose to libc's strerror() function (Namhyung Kim) - Use perf_data_file methods to write output file in 'record' and 'inject' (Jiri Olsa) - Use pr_*() functions where applicable in 'report' (Namhyumg Kim) - Add 'machine' 'addr_location' struct to have full picture (machine, thread, map, symbol, addr) for a (partially) resolved address, reducing function signatures (Arnaldo Carvalho de Melo) - Reduce code duplication in the histogram entry creation/insertion (Arnaldo Carvalho de Melo) - Auto allocate annotation histogram data structures (Arnaldo Carvalho de Melo) - No need to test against NULL before calling free, also set freed memory in struct pointers to NULL, to help fixing use after free bugs (Arnaldo Carvalho de Melo) - Rename some struct DSO binary_type related members and methods, to clarify its purpose and need for differentiation (symtab_type, ie one is about the files .text, CFI, etc, i.e. its binary contents, and the other is about where the symbol table came from (Arnaldo Carvalho de Melo) - Convert to new topic libraries, starting with an API one (sysfs, debugfs, etc), renaming liblk in the process (Borislav Petkov) - Get rid of some more panic() like error handling in libtraceevent. (Namhyung Kim) - Get rid of panic() like calls in libtraceevent (Namyung Kim) - Start carving out symbol parsing routines (perf, just moving routines to topic files in tools/lib/symbol/, tools that want to use it need to integrate it directly, ie no tools/lib/symbol/Makefile is provided (Arnaldo Carvalho de Melo) - Assorted refactoring patches, moving code around and adding utility evlist methods that will be used in the IPT patchset (Adrian Hunter) - Assorted mmap_pages handling fixes (Adrian Hunter) - Several man pages typo fixes (Dongsheng Yang) - Get rid of several die() calls in libtraceevent (Namhyung Kim) - Use basename() in a more robust way, to avoid problems related to different system library implementations for that function (Stephane Eranian) - Remove open coded management of short_name_allocated member (Adrian Hunter) - Several cleanups in the "dso" methods, constifying some parameters and renaming some fields to clarify its purpose (Arnaldo Carvalho de Melo) - Add per-feature check flags, fixing libunwind related build problems on some architectures (Jean Pihet) - Do not disable source line lookup just because of one failure. (Adrian Hunter) - Several 'perf kvm' man page corrections (Dongsheng Yang) - Correct the message in feature-libnuma checking, swowing the right devel package names for various distros (Dongsheng Yang) - Polish 'readn()' function and introduce its counterpart, 'writen()' (Jiri Olsa) - Start moving timechart state from global variables to a 'perf_tool' derived 'timechart' struct (Arnaldo Carvalho de Melo) ... and lots of fixes and improvements I forgot to list" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (282 commits) perf tools: Remove unnecessary callchain cursor state restore on unmatch perf callchain: Spare double comparison of callchain first entry perf tools: Do proper comm override error handling perf symbols: Export elf_section_by_name and reuse perf probe: Release all dynamically allocated parameters perf probe: Release allocated probe_trace_event if failed perf tools: Add 'build-test' make target tools lib traceevent: Unregister handler when xen plugin is unloaded tools lib traceevent: Unregister handler when scsi plugin is unloaded tools lib traceevent: Unregister handler when jbd2 plugin is is unloaded tools lib traceevent: Unregister handler when cfg80211 plugin is unloaded tools lib traceevent: Unregister handler when mac80211 plugin is unloaded tools lib traceevent: Unregister handler when sched_switch plugin is unloaded tools lib traceevent: Unregister handler when kvm plugin is unloaded tools lib traceevent: Unregister handler when kmem plugin is unloaded tools lib traceevent: Unregister handler when hrtimer plugin is unloaded tools lib traceevent: Unregister handler when function plugin is unloaded tools lib traceevent: Add pevent_unregister_print_function() tools lib traceevent: Add pevent_unregister_event_handler() tools lib traceevent: fix pointer-integer size mismatch ...
2014-01-20Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds8-74/+251
Pull RCU updates from Ingo Molnar: - add RCU torture scripts/tooling - static analysis improvements - update RCU documentation - miscellaneous fixes * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits) rcu: Remove "extern" from function declarations in kernel/rcu/rcu.h rcu: Remove "extern" from function declarations in include/linux/*rcu*.h rcu/torture: Dynamically allocate SRCU output buffer to avoid overflow rcu: Don't activate RCU core on NO_HZ_FULL CPUs rcu: Warn on allegedly impossible rcu_read_unlock_special() from irq rcu: Add an RCU_INITIALIZER for global RCU-protected pointers rcu: Make rcu_assign_pointer's assignment volatile and type-safe bonding: Use RCU_INIT_POINTER() for better overhead and for sparse rcu: Add comment on evaluate-once properties of rcu_assign_pointer(). rcu: Provide better diagnostics for blocking in RCU callback functions rcu: Improve SRCU's grace-period comments rcu: Fix CONFIG_RCU_FANOUT_EXACT for odd fanout/leaf values rcu: Fix coccinelle warnings rcutorture: Stop tracking FSF's postal address rcutorture: Move checkarg to functions.sh rcutorture: Flag errors and warnings with color coding rcutorture: Record results from repeated runs of the same test scenario rcutorture: Test summary at end of run with less chattiness rcutorture: Update comment in kvm.sh listing typical RCU trace events rcutorture: Add tracing-enabled version of TREE08 ...
2014-01-20Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds6-50/+242
Pull core locking changes from Ingo Molnar: - futex performance increases: larger hashes, smarter wakeups - mutex debugging improvements - lots of SMP ordering documentation updates - introduce the smp_load_acquire(), smp_store_release() primitives. (There are WIP patches that make use of them - not yet merged) - lockdep micro-optimizations - lockdep improvement: better cover IRQ contexts - liblockdep at last. We'll continue to monitor how useful this is * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits) futexes: Fix futex_hashsize initialization arch: Re-sort some Kbuild files to hopefully help avoid some conflicts futexes: Avoid taking the hb->lock if there's nothing to wake up futexes: Document multiprocessor ordering guarantees futexes: Increase hash table size for better performance futexes: Clean up various details arch: Introduce smp_load_acquire(), smp_store_release() arch: Clean up asm/barrier.h implementations using asm-generic/barrier.h arch: Move smp_mb__{before,after}_atomic_{inc,dec}.h into asm/atomic.h locking/doc: Rename LOCK/UNLOCK to ACQUIRE/RELEASE mutexes: Give more informative mutex warning in the !lock->owner case powerpc: Full barrier for smp_mb__after_unlock_lock() rcu: Apply smp_mb__after_unlock_lock() to preserve grace periods Documentation/memory-barriers.txt: Downgrade UNLOCK+BLOCK locking: Add an smp_mb__after_unlock_lock() for UNLOCK+BLOCK barrier Documentation/memory-barriers.txt: Document ACCESS_ONCE() Documentation/memory-barriers.txt: Prohibit speculative writes Documentation/memory-barriers.txt: Add long atomic examples to memory-barriers.txt Documentation/memory-barriers.txt: Add needed ACCESS_ONCE() calls to memory-barriers.txt Revert "smp/cpumask: Make CONFIG_CPUMASK_OFFSTACK=y usable without debug dependency" ...
2014-01-20Merge branch 'core-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-1/+1
Pull core debug changes from Ingo Molnar: "Currently there are two methods to set the panic_timeout: via 'panic=X' boot commandline option, or via /proc/sys/kernel/panic. This tree adds a third panic_timeout configuration method: configuration via Kconfig, via CONFIG_PANIC_TIMEOUT=X - useful to distros that generally want their kernel defaults to come with the .config. CONFIG_PANIC_TIMEOUT defaults to 0, which was the previous default value of panic_timeout. Doing that unearthed a few arch trickeries regarding arch-special panic_timeout values and related complications - hopefully all resolved to the satisfaction of everyone" * 'core-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: powerpc: Clean up panic_timeout usage MIPS: Remove panic_timeout settings panic: Make panic_timeout configurable
2014-01-19tracing: Fix buggered tee(2) on tracing_pipeAl Viro1-7/+1
In kernel/trace/trace.c we have this: static void tracing_pipe_buf_release(struct pipe_inode_info *pipe, struct pipe_buffer *buf) { __free_page(buf->page); } static const struct pipe_buf_operations tracing_pipe_buf_ops = { .can_merge = 0, .map = generic_pipe_buf_map, .unmap = generic_pipe_buf_unmap, .confirm = generic_pipe_buf_confirm, .release = tracing_pipe_buf_release, .steal = generic_pipe_buf_steal, .get = generic_pipe_buf_get, }; with void generic_pipe_buf_get(struct pipe_inode_info *pipe, struct pipe_buffer *buf) { page_cache_get(buf->page); } and I don't see anything that would've prevented tee(2) called on the pipe that got stuff spliced into it from that sucker. ->ops->get() will be called, then buf gets copied into target pipe's ->bufs[] and eventually readers get to both copies of the buffer. With get_page(page) look at that page __free_page(page) look at that page __free_page(page) which is not a good thing, to put it mildly. AFAICS, that ought to use the normal generic_pipe_buf_release() (aka page_cache_release(buf->page)), shouldn't it? [ SDR - As trace_pipe just allocates the page with alloc_page(GFP_KERNEL), and doesn't do anything special with it (no LRU logic). The __free_page() should be fine, as it wont actually free a page with reference count. Maybe there's a chance to leak memory? Anyway, This change is at a minimum good for being symmetric with generic_pipe_buf_get, it is fine to add. ] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [ SDR - Removed no longer used tracing_pipe_buf_release ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-01-18cgroup: trivial style updatesSeongJae Park1-2/+3
* Place newline before function opening brace in cgroup_kill_sb(). * Insert space before assignment in attach_task_by_pid() tj: merged two patches into one. Signed-off-by: SeongJae Park <sj38.park@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-01-17Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespaceLinus Torvalds1-1/+1
Pull namespace fixes from Eric Biederman: "This is a set of 3 regression fixes. This fixes /proc/mounts when using "ip netns add <netns>" to display the actual mount point. This fixes a regression in clone that broke lxc-attach. This fixes a regression in the permission checks for mounting /proc that made proc unmountable if binfmt_misc was in use. Oops. My apologies for sending this pull request so late. Al Viro gave interesting review comments about the d_path fix that I wanted to address in detail before I sent this pull request. Unfortunately a bad round of colds kept from addressing that in detail until today. The executive summary of the review was: Al: Is patching d_path really sufficient? The prepend_path, d_path, d_absolute_path, and __d_path family of functions is a really mess. Me: Yes, patching d_path is really sufficient. Yes, the code is mess. No it is not appropriate to rewrite all of d_path for a regression that has existed for entirely too long already, when a two line change will do" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: vfs: Fix a regression in mounting proc fork: Allow CLONE_PARENT after setns(CLONE_NEWPID) vfs: In d_path don't call d_dname on a mount point
2014-01-17audit: fix location of __net_initdata for audit_net_opsRichard Guy Briggs1-1/+1
Fixup caught by checkpatch. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-17audit: remove pr_info for every network namespaceEric Paris1-2/+0
A message about creating the audit socket might be fine at startup, but a pr_info for every single network namespace created on a system isn't useful. Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-16sched: Fix __sched_setscheduler() nice testPeter Zijlstra1-1/+2
With the introduction of sched_attr::sched_nice we need to check if we've got permission to actually change the nice value. Daniel found that can_nice() would always fail; and upon inspection it turns out that can_nice() only tests to see if we can lower the nice value, but it doesn't validate if we're lowering or not. Therefore amend the test to only call can_nice() when we lower the nice value. Reported-and-Tested-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: raistlin@linux.it Cc: juri.lelli@gmail.com Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Fixes: d50dde5a10f3 ("sched: Add new scheduler syscalls to support an extended scheduling parameters ABI") Link: http://lkml.kernel.org/r/20140116165425.GA9481@laptop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16futexes: Fix futex_hashsize initializationHeiko Carstens1-2/+4
"futexes: Increase hash table size for better performance" introduces a new alloc_large_system_hash() call. alloc_large_system_hash() however may allocate less memory than requested, e.g. limited by MAX_ORDER. Hence pass a pointer to alloc_large_system_hash() which will contain the hash shift when the function returns. Afterwards correctly set futex_hashsize. Fixes a crash on s390 where the requested allocation size was 4MB but only 1MB was allocated. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Darren Hart <dvhart@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Waiman Long <Waiman.Long@hp.com> Cc: Jason Low <jason.low2@hp.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Link: http://lkml.kernel.org/r/20140116135450.GA4345@osiris Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16Merge branch 'perf/urgent' into perf/coreIngo Molnar15-116/+162
Pick up the latest fixes, refresh the development tree. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16sched: Move SCHED_RESET_ON_FORK into attr::sched_flagsPeter Zijlstra1-14/+28
I noticed the new sched_{set,get}attr() calls didn't properly deal with the SCHED_RESET_ON_FORK hack. Instead of propagating the flags in high bits nonsense use the brand spanking new attr::sched_flags field. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Dario Faggioli <raistlin@linux.it> Link: http://lkml.kernel.org/r/20140115162242.GJ31570@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16sched: Fix up attr::sched_priority warningPeter Zijlstra1-2/+1
Fengguang Wu reported the following build warning: > kernel/sched/core.c:3067 __sched_setscheduler() warn: unsigned 'attr->sched_priority' is never less than zero. Since it doesn't make sense for attr::sched_priority to be negative, remove the check, since we already test for an upper limit any actual negative values passed in through the old param::sched_priority field will still be detected. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Dario Faggioli <raistlin@linux.it> Fixes: d50dde5a10f3 ("sched: Add new scheduler syscalls to support an extended scheduling parameters ABI") Link: http://lkml.kernel.org/n/tip-fid9nalzii2r5voxtf4eh5kz@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16sched: Fix up scheduler syscall LTP failsPeter Zijlstra1-3/+11
Wu reported LTP failures: > ltp.sched_setparam02.1.TFAIL > ltp.sched_setparam02.2.TFAIL > ltp.sched_setparam02.3.TFAIL > ltp.sched_setparam03.1.TFAIL There were 2 things wrong; firstly __setscheduler() failed on sched_setparam()'s policy = -1, fix that by reading from p->policy in that case. Secondly, getparam() (and getattr()) would still report !0 sched_priority for !FIFO/RR tasks after having been such. So unconditionally set p->rt_priority. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Dario Faggioli <raistlin@linux.it> Fixes: d50dde5a10f3 ("sched: Add new scheduler syscalls to support an extended scheduling parameters ABI") Link: http://lkml.kernel.org/r/20140115153320.GH31570@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16sched: Preserve the nice level over sched_setscheduler() and sched_setparam() callsPeter Zijlstra1-2/+4
Previously sched_setscheduler() and sched_setparam() would not affect the nice value of a task, restore this behaviour. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: raistlin@linux.it Cc: juri.lelli@gmail.com Cc: Michael wang <wangyun@linux.vnet.ibm.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Fixes: d50dde5a10f3 ("sched: Add new scheduler syscalls to support an extended scheduling parameters ABI") Link: http://lkml.kernel.org/r/20140115113015.GB31570@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16sched/core: Fix htmldocs warningsJuri Lelli1-2/+2
Fengguang Wu's kbuild test robot reported the following new htmldocs warnings: >>> Warning(kernel/sched/core.c:3380): No description found for parameter 'uattr' >>> Warning(kernel/sched/core.c:3380): Excess function parameter 'attr' description in 'sys_sched_setattr' >>> Warning(kernel/sched/core.c:3520): No description found for parameter 'uattr' >>> Warning(kernel/sched/core.c:3520): Excess function parameter 'attr' description in 'sys_sched_getattr' The second argument to sys_sched_{setattr,getattr}() is named uattr (not attr). Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Juri Lelli <juri.lelli@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Dario Faggioli <raistlin@linux.it> Fixes: d50dde5a10f3 ("sched: Add new scheduler syscalls to support an extended scheduling parameters ABI") Link: http://lkml.kernel.org/r/52D5552D.5000102@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16sched/deadline: No need to check p if dl_se is validJuri Lelli1-2/+1
Dan Carpenter reported new 'Smatch' warnings: > tree: git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core > head: 130816ce4d5f69167324f7272e70aa3d641677c6 > commit: 1baca4ce16b8cc7d4f50be1f7914799af30a2861 [17/50] sched/deadline: Add SCHED_DEADLINE SMP-related data structures & logic > > kernel/sched/deadline.c:937 pick_next_task_dl() warn: variable dereferenced before check 'p' (see line 934) BUG_ON() already fires if pick_next_dl_entity() doesn't return a valid dl_se. No need to check if p is valid afterward. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Juri Lelli <juri.lelli@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Fixes: 1baca4ce16b8 ("sched/deadline: Add SCHED_DEADLINE SMP-related data structures & logic") Link: http://lkml.kernel.org/r/52D54E25.6060100@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16sched/deadline: Remove unused variablesPeter Zijlstra1-11/+0
fix these new sparse warnings: >> kernel/sched/core.c:305:14: sparse: symbol 'sysctl_sched_dl_period' was not declared. Should it be static? >> kernel/sched/core.c:306:5: sparse: symbol 'sysctl_sched_dl_runtime' was not declared. Should it be static? Better still, they're completely unused so remove them. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@gmail.com> Link: http://lkml.kernel.org/n/tip-ke0shkG7vMnzmcdqhhiymyem@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16sched/deadline: Fix sparse static warningsFengguang Wu1-3/+3
new sparse warnings: >> kernel/sched/cpudeadline.c:38:6: sparse: symbol 'cpudl_exchange' was not declared. Should it be static? >> kernel/sched/cpudeadline.c:46:6: sparse: symbol 'cpudl_heapify' was not declared. Should it be static? >> kernel/sched/cpudeadline.c:71:6: sparse: symbol 'cpudl_change_key' was not declared. Should it be static? >> kernel/sched/cpudeadline.c:195:15: sparse: memset with byte count of 163928 Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@gmail.com> Fixes: 6bfd6d72f51c ("sched/deadline: speed up SCHED_DEADLINE pushes with a push-heap") Link: http://lkml.kernel.org/r/52d47f8c.EYJsA5+mELPBk4t6\%fengguang.wu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16Merge branches 'sched-urgent-for-linus' and 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-1/+1
Pull scheduler and timer fixes from Ingo Molnar: "Contains a fix for a scheduler bug that manifested itself as a 3D performance regression and a crash fix for the ARM Cadence TTC clock driver" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Calculate effective load even if local weight is 0 * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource: cadence_ttc: Fix mutex taken inside interrupt context
2014-01-16sched/nohz: Fix overflow error in scheduler_tick_max_deferment()Kevin Hilman1-1/+1
While calculating the scheduler tick max deferment, the delta is converted from microseconds to nanoseconds through a multiplication against NSEC_PER_USEC. But this microseconds operand is an unsigned int, thus the result may likely overflow. The result is cast to u64 but only once the operation is completed, which is too late to avoid overflown result. This is currently not a problem because the scheduler tick max deferment is 1 second. But this may become an issue as we plan to make this value tunable. So lets fix this by casting the usecs value to u64 before multiplying by NSECS_PER_USEC. Also to prevent from this kind of mistake to happen again, move this ad-hoc jiffies -> nsecs conversion to a new helper. Signed-off-by: Kevin Hilman <khilman@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alex Shi <alex.shi@linaro.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Kevin Hilman <khilman@linaro.org> Link: http://lkml.kernel.org/r/1387315388-31676-2-git-send-email-khilman@linaro.org [move ad-hoc conversion to jiffies_to_nsecs helper] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-01-15nohz_full: fix code style issue of tick_nohz_full_stop_tickAlex Shi1-8/+8
Code usually starts with 'tab' instead of 7 'space' in kernel Signed-off-by: Alex Shi <alex.shi@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alex Shi <alex.shi@linaro.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Kevin Hilman <khilman@linaro.org> Link: http://lkml.kernel.org/r/1386074112-30754-2-git-send-email-alex.shi@linaro.org Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-01-15nohz: Get timekeeping max deferment outside jiffies_lockFrederic Weisbecker1-1/+2
We don't need to fetch the timekeeping max deferment under the jiffies_lock seqlock. If the clocksource is updated concurrently while we stop the tick, stop machine is called and the tick will be reevaluated again along with uptodate jiffies and its related values. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alex Shi <alex.shi@linaro.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Kevin Hilman <khilman@linaro.org> Link: http://lkml.kernel.org/r/1387320692-28460-9-git-send-email-fweisbec@gmail.com Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-01-15tick: Rename tick_check_idle() to tick_irq_enter()Frederic Weisbecker2-5/+5
This makes the code more symetric against the existing tick functions called on irq exit: tick_irq_exit() and tick_nohz_irq_exit(). These function are also symetric as they mirror each other's action: we start to account idle time on irq exit and we stop this accounting on irq entry. Also the tick is stopped on irq exit and timekeeping catches up with the tickless time elapsed until we reach irq entry. This rename was suggested by Peter Zijlstra a long while ago but it got forgotten in the mass. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alex Shi <alex.shi@linaro.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Kevin Hilman <khilman@linaro.org> Link: http://lkml.kernel.org/r/1387320692-28460-2-git-send-email-fweisbec@gmail.com Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-01-14audit: Convert int limit uses to u32Joe Perches2-25/+26
The equivalent uapi struct uses __u32 so make the kernel uses u32 too. This can prevent some oddities where the limit is logged/emitted as a negative value. Convert kstrtol to kstrtouint to disallow negative values. Signed-off-by: Joe Perches <joe@perches.com> [eparis: do not remove static from audit_default declaration]
2014-01-14audit: Use more current logging styleJoe Perches1-20/+18
Add pr_fmt to prefix "audit: " to output Convert printk(KERN_<LEVEL> to pr_<level> Coalesce formats Use pr_cont Move a brace after switch Signed-off-by: Joe Perches <joe@perches.com>
2014-01-14audit: Use hex_byte_pack_upperJoe Perches1-5/+2
Using the generic kernel function causes the object size to increase with gcc 4.8.1. $ size kernel/audit.o* text data bss dec hex filename 18577 6079 8436 33092 8144 kernel/audit.o.new 18579 6015 8420 33014 80f6 kernel/audit.o.old Unsigned...
2014-01-14tracing: Have trace buffer point back to trace_arraySteven Rostedt (Red Hat)1-0/+2
The trace buffer has a descriptor pointer that goes back to the trace array. But it was never assigned. Luckily, nothing uses it (yet), but it will in the future. Although nothing currently uses this, if any of the new features get backported to older kernels, and because this is such a simple change, I'm marking it for stable too. Cc: stable@vger.kernel.org # v3.10+ Fixes: 12883efb670c "tracing: Consolidate max_tr into main trace_array structure" Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-01-13audit: reorder AUDIT_TTY_SET argumentsEric Paris1-7/+4
An admin is likely to want to see old and new values next to each other. Putting all of the old values followed by all of the new values is just hard to read as a human. Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13audit: rework AUDIT_TTY_SET to only grab spin_lock onceEric Paris1-15/+13
We can simplify the AUDIT_TTY_SET code to only grab the spin_lock one time. We need to determine if the new values are valid and if so, set the new values at the same time we grab the old onces. While we are here get rid of 'res' and just use err. Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13audit: remove needless switch in AUDIT_SETEric Paris1-16/+9
If userspace specified that it was setting values via the mask we do not need a second check to see if they also set the version field high enough to understand those values. (clearly if they set the mask they knew those values). Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13audit: use define's for audit versionEric Paris1-1/+1
Give names to the audit versions. Just something for a userspace programmer to know what the version provides. Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13audit: wait_for_auditd rework for readabilityEric Paris1-9/+7
We had some craziness with signed to unsigned long casting which appears wholely unnecessary. Just use signed long. Even though 2 values of the math equation are unsigned longs the result is expected to be a signed long. So why keep casting the result to signed long? Just make it signed long and use it. We also remove the needless "timeout" variable. We already have the stack "sleep_time" variable. Just use that... Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13audit: log task info on feature changeRichard Guy Briggs1-0/+1
Add task information to the log when changing a feature state. Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13audit: fix incorrect set of audit_sockGao feng1-1/+1
NETLINK_CB(skb).sk is the socket of user space process, netlink_unicast in kauditd_send_skb wants the kernel side socket. Since the sk_state of audit netlink socket is not NETLINK_CONNECTED, so the netlink_getsockbyportid doesn't return -ECONNREFUSED. And the socket of userspace process can be released anytime, so the audit_sock may point to invalid socket. this patch sets the audit_sock to the kernel side audit netlink socket. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13audit: print error message when fail to create audit socketGao feng1-5/+4
print the error message and then return -ENOMEM. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13audit: fix dangling keywords in audit_log_set_loginuid() outputRichard Guy Briggs1-6/+9
Remove spaces between "new", "old" label modifiers and "auid", "ses" labels in log output since userspace tools can't parse orphaned keywords. Make variable names more consistent and intuitive. Make audit_log_format() argument code easier to read. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13audit: log on errors from filter user rulesRichard Guy Briggs2-5/+8
An error on an AUDIT_NEVER rule disabled logging on that rule. On error on AUDIT_NEVER rules, log. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13audit: audit_log_start running on auditd should not stopToshiyuki Okajima1-6/+8
The backlog cannot be consumed when audit_log_start is running on auditd even if audit_log_start calls wait_for_auditd to consume it. The situation is the deadlock because only auditd can consume the backlog. If the other process needs to send the backlog, it can be also stopped by the deadlock. So, audit_log_start running on auditd should not stop. You can see the deadlock with the following reproducer: # auditctl -a exit,always -S all # reboot Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Reviewed-by: gaofeng@cn.fujitsu.com Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13audit: drop audit_cmd_lock in AUDIT_USER family of casesRichard Guy Briggs1-0/+2
We do not need to hold the audit_cmd_mutex for this family of cases. The possible exception to this is the call to audit_filter_user(), so drop the lock immediately after. To help in fixing the race we are trying to avoid, make sure that nothing called by audit_filter_user() calls audit_log_start(). In particular, watch out for *_audit_rule_match(). This fix will take care of systemd and anything USING audit. It still means that we could race with something configuring audit and auditd shutting down. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Reported-by: toshi.okajima@jp.fujitsu.com Tested-by: toshi.okajima@jp.fujitsu.com Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13audit: convert all sessionid declaration to unsigned intEric Paris3-3/+3
Right now the sessionid value in the kernel is a combination of u32, int, and unsigned int. Just use unsigned int throughout. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13audit: Added exe field to audit core dump signal logPaul Davies C2-1/+10
Currently when the coredump signals are logged by the audit system, the actual path to the executable is not logged. Without details of exe, the system admin may not have an exact idea on what program failed. This patch changes the audit_log_task() so that the path to the exe is also logged. This was copied from audit_log_task_info() and the latter enhanced to avoid disappearing text fields. Signed-off-by: Paul Davies C <pauldaviesc@gmail.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13audit: prevent an older auditd shutdown from orphaning a newer auditd startupRichard Guy Briggs1-0/+2
There have been reports of auditd restarts resulting in kaudit not being able to find a newly registered auditd. It results in reports such as: kernel: [ 2077.233573] audit: *NO* daemon at audit_pid=1614 kernel: [ 2077.234712] audit: audit_lost=97 audit_rate_limit=0 audit_backlog_limit=320 kernel: [ 2077.234718] audit: auditd disappeared (previously mis-spelled "dissapeared") One possible cause is a race between the shutdown of an older auditd and a newer one. If the newer one sets the daemon pid to itself in kauditd before the older one has cleared the daemon pid, the newer daemon pid will be erased. This could be caused by an automated system, or by manual intervention, but in either case, there is no use in having the older daemon clear the daemon pid reference since its old pid is no longer being referenced. This patch will prevent that specific case, returning an error of EACCES. The case for preventing a newer auditd from registering itself if there is an existing auditd is a more difficult case that is beyond the scope of this patch. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13audit: refactor audit_receive_msg() to clarify AUDIT_*_RULE* casesRichard Guy Briggs2-33/+45
audit_receive_msg() needlessly contained a fallthrough case that called audit_receive_filter(), containing no common code between the cases. Separate them to make the logic clearer. Refactor AUDIT_LIST_RULES, AUDIT_ADD_RULE, AUDIT_DEL_RULE cases to create audit_rule_change(), audit_list_rules_send() functions. This should not functionally change the logic. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>