aboutsummaryrefslogtreecommitdiffstats
path: root/fs (follow)
AgeCommit message (Collapse)AuthorFilesLines
2015-11-11vfs: remove unused wrapper block_page_mkwrite()Ross Zwisler4-26/+6
The function currently called "__block_page_mkwrite()" used to be called "block_page_mkwrite()" until a wrapper for this function was added by: commit 24da4fab5a61 ("vfs: Create __block_page_mkwrite() helper passing error values back") This wrapper, the current "block_page_mkwrite()", is currently unused. __block_page_mkwrite() is used directly by ext4, nilfs2 and xfs. Remove the unused wrapper, rename __block_page_mkwrite() back to block_page_mkwrite() and update the comment above block_page_mkwrite(). Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.com> Cc: Jan Kara <jack@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11binfmt_elf: Correct `arch_check_elf's descriptionMaciej W. Rozycki1-1/+1
Correct `arch_check_elf's description, mistakenly copied and pasted from `arch_elf_pt_proc'. Signed-off-by: Maciej W. Rozycki <macro@imgtec.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11fs: fix writeback.c kernel-doc warningsRandy Dunlap1-1/+2
Fix kernel-doc warnings in fs/fs-writeback.c by moving a #define macro to after the function's opening brace. Also #undef this macro at the end of the function. ..//fs/fs-writeback.c:1984: warning: Excess function parameter 'inode' description in 'I_DIRTY_INODE' ..//fs/fs-writeback.c:1984: warning: Excess function parameter 'flags' description in 'I_DIRTY_INODE' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11fs: fix inode.c kernel-doc warningRandy Dunlap1-0/+1
Fix kernel-doc warning in fs/inode.c: ..//fs/inode.c:1606: warning: No description found for parameter 'inode' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11fs/pipe.c: return error code rather than 0 in pipe_write()Eric Biggers1-5/+4
pipe_write() would return 0 if it failed to merge the beginning of the data to write with the last, partially filled pipe buffer. It should return an error code instead. Userspace programs could be confused by write() returning 0 when called with a nonzero 'count'. The EFAULT error case was a regression from f0d1bec9d5 ("new helper: copy_page_from_iter()"), while the ops->confirm() error case was a much older bug. Test program: #include <assert.h> #include <errno.h> #include <unistd.h> int main(void) { int fd[2]; char data[1] = {0}; assert(0 == pipe(fd)); assert(1 == write(fd[1], data, 1)); /* prior to this patch, write() returned 0 here */ assert(-1 == write(fd[1], NULL, 1)); assert(errno == EFAULT); } Cc: stable@vger.kernel.org # at least v3.15+ Signed-off-by: Eric Biggers <ebiggers3@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11fs/pipe.c: preserve alloc_file() error codeEric Biggers1-3/+6
If sys_pipe() was unable to allocate a 'struct file', it always failed with ENFILE, which means "The number of simultaneously open files in the system would exceed a system-imposed limit." However, alloc_file() actually returns an ERR_PTR value and might fail with other error codes. Currently, in addition to ENFILE, it can fail with ENOMEM, potentially when there are few open files in the system. Update sys_pipe() to preserve this error code. In a prior submission of a similar patch (1) some concern was raised about introducing a new error code for sys_pipe(). However, for most system calls, programs cannot assume that new error codes will never be introduced. In addition, ENOMEM was, in fact, already a possible error code for sys_pipe(), in the case where the file descriptor table could not be expanded due to insufficient memory. (1) http://comments.gmane.org/gmane.linux.kernel/1357942 Signed-off-by: Eric Biggers <ebiggers3@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11binfmt_elf: Don't clobber passed executable's file headerMaciej W. Rozycki1-5/+5
Do not clobber the buffer space passed from `search_binary_handler' and originally preloaded by `prepare_binprm' with the executable's file header by overwriting it with its interpreter's file header. Instead keep the buffer space intact and directly use the data structure locally allocated for the interpreter's file header, fixing a bug introduced in 2.1.14 with loadable module support (linux-mips.org commit beb11695 [Import of Linux/MIPS 2.1.14], predating kernel.org repo's history). Adjust the amount of data read from the interpreter's file accordingly. This was not an issue before loadable module support, because back then `load_elf_binary' was executed only once for a given ELF executable, whether the function succeeded or failed. With loadable module support supported and enabled, upon a failure of `load_elf_binary' -- which may for example be caused by architecture code rejecting an executable due to a missing hardware feature requested in the file header -- a module load is attempted and then the function reexecuted by `search_binary_handler'. With the executable's file header replaced with its interpreter's file header the executable can then be erroneously accepted in this subsequent attempt. Cc: stable@vger.kernel.org # all the way back Signed-off-by: Maciej W. Rozycki <macro@imgtec.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11FS-Cache: Handle a write to the page immediately beyond the EOF markerDavid Howells2-31/+38
Handle a write being requested to the page immediately beyond the EOF marker on a cache object. Currently this gets an assertion failure in CacheFiles because the EOF marker is used there to encode information about a partial page at the EOF - which could lead to an unknown blank spot in the file if we extend the file over it. The problem is actually in fscache where we check the index of the page being written against store_limit. store_limit is set to the number of pages that we're allowed to store by fscache_set_store_limit() - which means it's one more than the index of the last page we're allowed to store. The problem is that we permit writing to a page with an index _equal_ to the store limit - when we should reject that case. Whilst we're at it, change the triggered assertion in CacheFiles to just return -ENOBUFS instead. The assertion failure looks something like this: CacheFiles: Assertion failed 1000 < 7b1 is false ------------[ cut here ]------------ kernel BUG at fs/cachefiles/rdwr.c:962! ... RIP: 0010:[<ffffffffa02c9e83>] [<ffffffffa02c9e83>] cachefiles_write_page+0x273/0x2d0 [cachefiles] Cc: stable@vger.kernel.org # v2.6.31+; earlier - that + backport of a17754f (at least) Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11cachefiles: perform test on s_blocksize when opening cache file.NeilBrown2-6/+2
cachefiles requires that s_blocksize in the cache is not greater than PAGE_SIZE, and performs the check every time a block is accessed. Move the test to the place where the file is "opened", where other file-validity tests are performed. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11FS-Cache: Don't override netfs's primary_index if registering failedKinglong Mee1-18/+17
Only override netfs->primary_index when registering success. Cc: stable@vger.kernel.org # v2.6.30+ Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11FS-Cache: Increase reference of parent after registering, netfs successKinglong Mee1-5/+4
If netfs exist, fscache should not increase the reference of parent's usage and n_children, otherwise, never be decreased. v2: thanks David's suggest, move increasing reference of parent if success use kmem_cache_free() freeing primary_index directly v3: don't move "netfs->primary_index->parent = &fscache_fsdef_index;" Cc: stable@vger.kernel.org # v2.6.30+ Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-11debugfs: fix refcount imbalance in start_creatingDaniel Borkmann1-1/+5
In debugfs' start_creating(), we pin the file system to safely access its root. When we failed to create a file, we unpin the file system via failed_creating() to release the mount count and eventually the reference of the vfsmount. However, when we run into an error during lookup_one_len() when still in start_creating(), we only release the parent's mutex but not so the reference on the mount. Looks like it was done in the past, but after splitting portions of __create_file() into start_creating() and end_creating() via 190afd81e4a5 ("debugfs: split the beginning and the end of __create_file() off"), this seemed missed. Noticed during code review. Fixes: 190afd81e4a5 ("debugfs: split the beginning and the end of __create_file() off") Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-07Merge branch 'akpm' (patches from Andrew)Linus Torvalds43-264/+418
Merge second patch-bomb from Andrew Morton: - most of the rest of MM - procfs - lib/ updates - printk updates - bitops infrastructure tweaks - checkpatch updates - nilfs2 update - signals - various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc, dma-debug, dma-mapping, ... * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits) ipc,msg: drop dst nil validation in copy_msg include/linux/zutil.h: fix usage example of zlib_adler32() panic: release stale console lock to always get the logbuf printed out dma-debug: check nents in dma_sync_sg* dma-mapping: tidy up dma_parms default handling pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode kexec: use file name as the output message prefix fs, seqfile: always allow oom killer seq_file: reuse string_escape_str() fs/seq_file: use seq_* helpers in seq_hex_dump() coredump: change zap_threads() and zap_process() to use for_each_thread() coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT) signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread() signal: turn dequeue_signal_lock() into kernel_dequeue_signal() signals: kill block_all_signals() and unblock_all_signals() nilfs2: fix gcc uninitialized-variable warnings in powerpc build nilfs2: fix gcc unused-but-set-variable warnings MAINTAINERS: nilfs2: add header file for tracing nilfs2: add tracepoints for analyzing reading and writing metadata files ...
2015-11-07Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivialLinus Torvalds8-11/+11
Pull trivial updates from Jiri Kosina: "Trivial stuff from trivial tree that can be trivially summed up as: - treewide drop of spurious unlikely() before IS_ERR() from Viresh Kumar - cosmetic fixes (that don't really affect basic functionality of the driver) for pktcdvd and bcache, from Julia Lawall and Petr Mladek - various comment / printk fixes and updates all over the place" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: bcache: Really show state of work pending bit hwmon: applesmc: fix comment typos Kconfig: remove comment about scsi_wait_scan module class_find_device: fix reference to argument "match" debugfs: document that debugfs_remove*() accepts NULL and error values net: Drop unlikely before IS_ERR(_OR_NULL) mm: Drop unlikely before IS_ERR(_OR_NULL) fs: Drop unlikely before IS_ERR(_OR_NULL) drivers: net: Drop unlikely before IS_ERR(_OR_NULL) drivers: misc: Drop unlikely before IS_ERR(_OR_NULL) UBI: Update comments to reflect UBI_METAONLY flag pktcdvd: drop null test before destroy functions
2015-11-06fs, seqfile: always allow oom killerGreg Thelen1-3/+8
Since 5cec38ac866b ("fs, seq_file: fallback to vmalloc instead of oom kill processes") seq_buf_alloc() avoids calling the oom killer for PAGE_SIZE or smaller allocations; but larger allocations can use the oom killer via vmalloc(). Thus reads of small files can return ENOMEM, but larger files use the oom killer to avoid ENOMEM. The effect of this bug is that reads from /proc and other virtual filesystems can return ENOMEM instead of the preferred behavior - oom killing something (possibly the calling process). I don't know of anyone except Google who has noticed the issue. I suspect the fix is more needed in smaller systems where there isn't any reclaimable memory. But these seem like the kinds of systems which probably don't use the oom killer for production situations. Memory overcommit requires use of the oom killer to select a victim regardless of file size. Enable oom killer for small seq_buf_alloc() allocations. Fixes: 5cec38ac866b ("fs, seq_file: fallback to vmalloc instead of oom kill processes") Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Greg Thelen <gthelen@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06seq_file: reuse string_escape_str()Andy Shevchenko1-19/+6
strint_escape_str() escapes input string by given criteria. In case of seq_escape() the criteria is to convert some characters to their octal representation. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06fs/seq_file: use seq_* helpers in seq_hex_dump()Andy Shevchenko1-8/+7
This improves code readability. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06coredump: change zap_threads() and zap_process() to use for_each_thread()Oleg Nesterov1-14/+13
Change zap_threads() paths to use for_each_thread() rather than while_each_thread(). While at it, change zap_threads() to avoid the nested if's to make the code more readable and lessen the indentation. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Kyle Walker <kwalker@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Stanislav Kozina <skozina@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMPOleg Nesterov1-6/+6
task_will_free_mem() is wrong in many ways, and in particular the SIGNAL_GROUP_COREDUMP check is not reliable: a task can participate in the coredumping without SIGNAL_GROUP_COREDUMP bit set. change zap_threads() paths to always set SIGNAL_GROUP_COREDUMP even if other CLONE_VM processes can't react to SIGKILL. Fortunately, at least oom-kill case if fine; it kills all tasks sharing the same mm, so it should also kill the process which actually dumps the core. The change in prepare_signal() is not strictly necessary, it just ensures that the patch does not bring another subtle behavioural change. But it reminds us that this SIGNAL_GROUP_EXIT/COREDUMP case needs more changes. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Kyle Walker <kwalker@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Stanislav Kozina <skozina@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)Oleg Nesterov1-1/+0
jffs2_garbage_collect_thread() does allow_signal(SIGCONT) for no reason, SIGCONT will wake a stopped task up even if it is ignored. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Tejun Heo <tj@kernel.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Felipe Balbi <balbi@ti.com> Cc: Markus Pargmann <mpa@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()Oleg Nesterov1-2/+1
jffs2_garbage_collect_thread() can race with SIGCONT and sleep in TASK_STOPPED state after it was already sent. Add the new helper, kernel_signal_stop(), which does this correctly. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Tejun Heo <tj@kernel.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Felipe Balbi <balbi@ti.com> Cc: Markus Pargmann <mpa@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06signal: turn dequeue_signal_lock() into kernel_dequeue_signal()Oleg Nesterov1-2/+1
1. Rename dequeue_signal_lock() to kernel_dequeue_signal(). This matches another "for kthreads only" kernel_sigaction() helper. 2. Remove the "tsk" and "mask" arguments, they are always current and current->blocked. And it is simply wrong if tsk != current. 3. We could also remove the 3rd "siginfo_t *info" arg but it looks potentially useful. However we can simplify the callers if we change kernel_dequeue_signal() to accept info => NULL. 4. Remove _irqsave, it is never called from atomic context. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Tejun Heo <tj@kernel.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Felipe Balbi <balbi@ti.com> Cc: Markus Pargmann <mpa@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06nilfs2: fix gcc uninitialized-variable warnings in powerpc buildRyusuke Konishi3-4/+7
Some false positive warnings are reported for powerpc build. The following warnings are reported in http://kisskb.ellerman.id.au/kisskb/buildresult/12519703/ CC fs/nilfs2/super.o fs/nilfs2/super.c: In function 'nilfs_resize_fs': fs/nilfs2/super.c:376:2: warning: 'blocknr' may be used uninitialized in this function [-Wuninitialized] fs/nilfs2/super.c:362:11: note: 'blocknr' was declared here CC fs/nilfs2/recovery.o fs/nilfs2/recovery.c: In function 'nilfs_salvage_orphan_logs': fs/nilfs2/recovery.c:631:21: warning: 'sum' may be used uninitialized in this function [-Wuninitialized] fs/nilfs2/recovery.c:585:32: note: 'sum' was declared here fs/nilfs2/recovery.c: In function 'nilfs_search_super_root': fs/nilfs2/recovery.c:873:11: warning: 'sum' may be used uninitialized in this function [-Wuninitialized] Another similar warning is reported in http://kisskb.ellerman.id.au/kisskb/buildresult/12520079/ CC fs/nilfs2/btree.o fs/nilfs2/btree.c: In function 'nilfs_btree_convert_and_insert': include/asm-generic/bitops/non-atomic.h:105:20: warning: 'bh' may be used uninitialized in this function [-Wuninitialized] fs/nilfs2/btree.c:1859:22: note: 'bh' was declared here This cleans out these warnings by forcing the variables to be initialized. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06nilfs2: fix gcc unused-but-set-variable warningsRyusuke Konishi5-13/+3
Fix the following build warnings: $ make W=1 [...] CC [M] fs/nilfs2/btree.o fs/nilfs2/btree.c: In function 'nilfs_btree_split': fs/nilfs2/btree.c:923:8: warning: variable 'newptr' set but not used [-Wunused-but-set-variable] __u64 newptr; ^ fs/nilfs2/btree.c:922:8: warning: variable 'newkey' set but not used [-Wunused-but-set-variable] __u64 newkey; ^ CC [M] fs/nilfs2/dat.o fs/nilfs2/dat.c: In function 'nilfs_dat_prepare_end': fs/nilfs2/dat.c:158:8: warning: variable 'start' set but not used [-Wunused-but-set-variable] __u64 start; ^ CC [M] fs/nilfs2/segment.o fs/nilfs2/segment.c: In function 'nilfs_segctor_do_immediate_flush': fs/nilfs2/segment.c:2433:6: warning: variable 'err' set but not used [-Wunused-but-set-variable] int err; ^ CC [M] fs/nilfs2/sufile.o fs/nilfs2/sufile.c: In function 'nilfs_sufile_alloc': fs/nilfs2/sufile.c:320:27: warning: variable 'ncleansegs' set but not used [-Wunused-but-set-variable] unsigned long nsegments, ncleansegs, nsus, cnt; ^ CC [M] fs/nilfs2/alloc.o fs/nilfs2/alloc.c: In function 'nilfs_palloc_prepare_alloc_entry': fs/nilfs2/alloc.c:478:38: warning: variable 'groups_per_desc_block' set but not used [-Wunused-but-set-variable] unsigned long n, entries_per_group, groups_per_desc_block; ^ Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06nilfs2: add tracepoints for analyzing reading and writing metadata filesHitoshi Mitake1-0/+6
This patch adds tracepoints for analyzing requests of reading and writing metadata files. The tracepoints cover every in-place mdt files (cpfile, sufile, and datfile). Example of tracing mdt_insert_new_block(): cp-14635 [000] ...1 30598.199309: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 155 cp-14635 [000] ...1 30598.199520: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 5 cp-14635 [000] ...1 30598.200828: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 253 Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: TK Kato <TK.Kato@wdc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06nilfs2: add tracepoints for analyzing sufile manipulationHitoshi Mitake1-0/+8
This patch adds tracepoints which would be useful for analyzing segment usage from a perspective of high level sufile manipulation (check, alloc, free). sufile is an important in-place updated metadata file, so analyzing the behavior would be useful for performance turning. example of usage (a case of allocation): $ sudo bin/tpoint nilfs2:nilfs2_segment_usage_allocated Tracing nilfs2:nilfs2_segment_usage_allocated. Ctrl-C to end. segctord-17800 [002] ...1 10671.867294: nilfs2_segment_usage_allocated: sufile = ffff880054f908a8 segnum = 2 segctord-17800 [002] ...1 10675.073477: nilfs2_segment_usage_allocated: sufile = ffff880054f908a8 segnum = 3 Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Benixon Dhas <benixon.dhas@wdc.com> Cc: TK Kato <TK.Kato@wdc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06nilfs2: add a tracepoint for transaction eventsHitoshi Mitake1-1/+32
This patch adds a tracepoint for transaction events of nilfs. With the tracepoint, these events can be tracked: begin, abort, commit, trylock, lock, and unlock. Basically, these events have corresponding functions e.g. begin event corresponds nilfs_transaction_begin(). The unlock event is an exception. It corresponds to the iteration in nilfs_transaction_lock(). Only one tracepoint is introcued: nilfs2_transaction_transition. The above events are distinguished with newly introduced enum. With this tracepoint, we can analyse a critical section of segment constructoin. Sample output by tpoint of perf-tools: cp-4457 [000] ...1 63.266220: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 1 flags = 9 state = BEGIN cp-4457 [000] ...1 63.266221: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 0 flags = 9 state = COMMIT cp-4457 [000] ...1 63.266221: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 0 flags = 9 state = COMMIT segctord-4371 [001] ...1 68.261196: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = TRYLOCK segctord-4371 [001] ...1 68.261280: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = LOCK segctord-4371 [001] ...1 68.261877: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 1 flags = 10 state = BEGIN segctord-4371 [001] ...1 68.262116: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 18 state = COMMIT segctord-4371 [001] ...1 68.265032: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 18 state = UNLOCK segctord-4371 [001] ...1 132.376847: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = TRYLOCK This patch also does trivial cleaning of comma usage in collection stage transition event for consistent coding style. Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06nilfs2: add a tracepoint for tracking stage transition of segment constructionHitoshi Mitake2-21/+53
This patch adds a tracepoint for tracking stage transition of block collection in segment construction. With the tracepoint, we can analysis the behavior of segment construction in depth. It would be useful for bottleneck detection and debugging, etc. The tracepoint is created with the standard trace API of linux (like ext3, ext4, f2fs and btrfs). So we can analysis with existing tools easily. Of course, more detailed analysis will be possible if we can create nilfs specific analysis tools. Below is an example of event dump with Brendan Gregg's perf-tools (https://github.com/brendangregg/perf-tools). Time consumption between each stage can be obtained. $ sudo bin/tpoint nilfs2:nilfs2_collection_stage_transition Tracing nilfs2:nilfs2_collection_stage_transition. Ctrl-C to end. segctord-14875 [003] ...1 28311.067794: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_INIT segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_GC segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_FILE segctord-14875 [003] ...1 28311.068486: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_IFILE segctord-14875 [003] ...1 28311.068540: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_CPFILE segctord-14875 [003] ...1 28311.068561: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_SUFILE segctord-14875 [003] ...1 28311.068565: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_DAT segctord-14875 [003] ...1 28311.068573: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_SR segctord-14875 [003] ...1 28311.068574: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_DONE For capturing transition correctly, this patch adds wrappers for the member scnt of nilfs_cstage. With this change, every transition of the stage can produce trace event in a correct manner. Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06nilfs2: free unused dat file blocks during garbage collectionRyusuke Konishi2-17/+75
As a nilfs2 volume ages, the amount of available disk space decreases little by little due to bloat of DAT (disk address translation) metadata file. Even if we delete all files in a file system and free their block addresses from the DAT file through a garbage collection, empty DAT blocks are not freed. This fixes the issue by extending the deallocator of block addresses so that empty data blocks and empty bitmap blocks of DAT are deleted. The following comparison shows the effect of this patch. Each shows disk amount information of a nilfs2 volume that we cleaned out by deleting all files and running gc after having filled 90% of its capacity. Before: Filesystem 1K-blocks Used Available Use% Mounted on /dev/sda1 500105212 3022844 472072192 1% /test After: Filesystem 1K-blocks Used Available Use% Mounted on /dev/sda1 500105212 16380 475078656 1% /test Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06nilfs2: add helper functions to delete blocks from dat fileRyusuke Konishi1-0/+50
This adds delete functions for data blocks of metadata files using bitmap based allocator. nilfs_palloc_delete_entry_block() deletes an entry block (e.g. block storing dat entries), and nilfs_palloc_delete_bitmap_block() deletes a bitmap block, respectively. These helpers are intended to be used in the successive change on deallocator of block addresses ("nilfs2: free unused dat file blocks during garbage collection"). Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06nilfs2: get rid of nilfs_palloc_group_is_in()Ryusuke Konishi1-19/+9
This unfolds nilfs_palloc_group_is_in() helper function into nilfs_palloc_freev() function to simplify a range check and an index calculation repeatedy performed in a loop of the function. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06nilfs2: refactor nilfs_palloc_find_available_slot()Ryusuke Konishi1-27/+21
The current implementation of nilfs_palloc_find_available_slot() function is overkill. The underlying bit search routine is well optimized, so this uses it more simply in nilfs_palloc_find_available_slot(). Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06nilfs2: do not call nilfs_mdt_bgl_lock() needlesslyRyusuke Konishi1-44/+40
In the bitmap based allocator implementation, nilfs_mdt_bgl_lock() helper is frequently used to get a spinlock protecting a target block group. This reduces its usage and simplifies arguments of some related functions by directly passing a pointer to the spinlock. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06nilfs2: use nilfs_warning() in allocator implementationRyusuke Konishi1-8/+12
This uses nilfs_warning() to replace "printk(KERN_WARNING ...);" in the bitmap based allocator implementation of nilfs2. The warning messages are modified to include the device name and the inode number in each message. This makes it clear which metadata file of which device has output warnings such as "entry number xxxx already freed". Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06nilfs2: drop null test before destroy functionsJulia Lawall1-8/+4
Remove unneeded NULL test. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression x; @@ -if (x != NULL) \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x); // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06fs/jffs2/wbuf.c: remove stray semicolonAndrew Morton1-1/+1
Reported-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06proc: actually make proc_fd_permission() thread-friendlyOleg Nesterov1-3/+11
The commit 96d0df79f264 ("proc: make proc_fd_permission() thread-friendly") fixed the access to /proc/self/fd from sub-threads, but introduced another problem: a sub-thread can't access /proc/<tid>/fd/ or /proc/thread-self/fd if generic_permission() fails. Change proc_fd_permission() to check same_thread_group(pid_task(), current). Fixes: 96d0df79f264 ("proc: make proc_fd_permission() thread-friendly") Reported-by: "Jin, Yihua" <yihua.jin@intel.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06fs/proc/array.c: set overflow flag in case of errorAndy Shevchenko1-5/+5
For now in task_name() we ignore the return code of string_escape_str() call. This is not good if buffer suddenly becomes not big enough. Do the proper error handling there. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE writebackJan Kara1-1/+2
sync_file_range(2) is documented to issue writeback only for pages that are not currently being written. After all the system call has been created for userspace to be able to issue background writeout and so waiting for in-flight IO is undesirable there. However commit ee53a891f474 ("mm: do_sync_mapping_range integrity fix") switched do_sync_mapping_range() and thus sync_file_range() to issue writeback in WB_SYNC_ALL mode since do_sync_mapping_range() was used by other code relying on WB_SYNC_ALL semantics. These days do_sync_mapping_range() went away and we can switch sync_file_range(2) back to issuing WB_SYNC_NONE writeback. That should help PostgreSQL avoid large latency spikes when flushing data in the background. Andres measured a 20% increase in transactions per second on an SSD disk. Signed-off-by: Jan Kara <jack@suse.com> Reported-by: Andres Freund <andres@anarazel.de> Tested-By: Andres Freund <andres@anarazel.de> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06mm, fs: introduce mapping_gfp_constraint()Michal Hocko14-23/+23
There are many places which use mapping_gfp_mask to restrict a more generic gfp mask which would be used for allocations which are not directly related to the page cache but they are performed in the same context. Let's introduce a helper function which makes the restriction explicit and easier to track. This patch doesn't introduce any functional changes. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Michal Hocko <mhocko@suse.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06mm, page_alloc: rename __GFP_WAIT to __GFP_RECLAIMMel Gorman3-3/+3
__GFP_WAIT was used to signal that the caller was in atomic context and could not sleep. Now it is possible to distinguish between true atomic context and callers that are not willing to sleep. The latter should clear __GFP_DIRECT_RECLAIM so kswapd will still wake. As clearing __GFP_WAIT behaves differently, there is a risk that people will clear the wrong flags. This patch renames __GFP_WAIT to __GFP_RECLAIM to clearly indicate what it does -- setting it allows all reclaim activity, clearing them prevents it. [akpm@linux-foundation.org: fix build] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Lameter <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapdMel Gorman9-21/+21
__GFP_WAIT has been used to identify atomic context in callers that hold spinlocks or are in interrupts. They are expected to be high priority and have access one of two watermarks lower than "min" which can be referred to as the "atomic reserve". __GFP_HIGH users get access to the first lower watermark and can be called the "high priority reserve". Over time, callers had a requirement to not block when fallback options were available. Some have abused __GFP_WAIT leading to a situation where an optimisitic allocation with a fallback option can access atomic reserves. This patch uses __GFP_ATOMIC to identify callers that are truely atomic, cannot sleep and have no alternative. High priority users continue to use __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify callers that want to wake kswapd for background reclaim. __GFP_WAIT is redefined as a caller that is willing to enter direct reclaim and wake kswapd for background reclaim. This patch then converts a number of sites o __GFP_ATOMIC is used by callers that are high priority and have memory pools for those requests. GFP_ATOMIC uses this flag. o Callers that have a limited mempool to guarantee forward progress clear __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall into this category where kswapd will still be woken but atomic reserves are not used as there is a one-entry mempool to guarantee progress. o Callers that are checking if they are non-blocking should use the helper gfpflags_allow_blocking() where possible. This is because checking for __GFP_WAIT as was done historically now can trigger false positives. Some exceptions like dm-crypt.c exist where the code intent is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to flag manipulations. o Callers that built their own GFP flags instead of starting with GFP_KERNEL and friends now also need to specify __GFP_KSWAPD_RECLAIM. The first key hazard to watch out for is callers that removed __GFP_WAIT and was depending on access to atomic reserves for inconspicuous reasons. In some cases it may be appropriate for them to use __GFP_HIGH. The second key hazard is callers that assembled their own combination of GFP flags instead of starting with something like GFP_KERNEL. They may now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless if it's missed in most cases as other activity will wake kswapd. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06Merge branch 'for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfsLinus Torvalds42-1065/+2627
Pull btrfs updates from Chris Mason: "We have a lot of subvolume quota improvements in here, along with big piles of cleanups from Dave Sterba and Anand Jain and others. Josef pitched in a batch of allocator fixes based on production use here at FB. We found that mount -o ssd_spread greatly improved our performance on hardware raid5/6, but it exposed some CPU bottlenecks in the allocator. These patches make a huge difference" * 'for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (100 commits) Btrfs: fix hole punching when using the no-holes feature Btrfs: find_free_extent: Do not erroneously skip LOOP_CACHING_WAIT state btrfs: Fix a data space underflow warning btrfs: qgroup: Fix a rebase bug which will cause qgroup double free btrfs: qgroup: Fix a race in delayed_ref which leads to abort trans btrfs: clear PF_NOFREEZE in cleaner_kthread() btrfs: qgroup: Don't copy extent buffer to do qgroup rescan btrfs: add balance filters limits, stripes and usage to supported mask btrfs: extend balance filter usage to take minimum and maximum btrfs: add balance filter for stripes btrfs: extend balance filter limit to take minimum and maximum btrfs: fix use after free iterating extrefs btrfs: check unsupported filters in balance arguments Btrfs: fix regression running delayed references when using qgroups Btrfs: fix regression when running delayed references Btrfs: don't do extra bitmap search in one bit case Btrfs: keep track of largest extent in bitmaps Btrfs: don't keep trying to build clusters if we are fragmented Btrfs: cut down on loops through the allocator Btrfs: don't continue setting up space cache when enospc ...
2015-11-06Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4Linus Torvalds36-992/+1199
Pull ext4 updates from Ted Ts'o: "Add support for the CSUM_SEED feature which will allow future userspace utilities to change the file system's UUID without rewriting all of the file system metadata. A number of miscellaneous fixes, the most significant of which are in the ext4 encryption support. Anyone wishing to use the encryption feature should backport all of the ext4 crypto patches up to 4.4 to get fixes to a memory leak and file system corruption bug. There are also cleanups in ext4's feature test macros and in ext4's sysfs support code" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (26 commits) fs/ext4: remove unnecessary new_valid_dev check ext4: fix abs() usage in ext4_mb_check_group_pa ext4: do not allow journal_opts for fs w/o journal ext4: explicit mount options parsing cleanup ext4, jbd2: ensure entering into panic after recording an error in superblock [PATCH] fix calculation of meta_bg descriptor backups ext4: fix potential use after free in __ext4_journal_stop jbd2: fix checkpoint list cleanup ext4: fix xfstest generic/269 double revoked buffer bug with bigalloc ext4: make the bitmap read routines return real error codes jbd2: clean up feature test macros with predicate functions ext4: clean up feature test macros with predicate functions ext4: call out CRC and corruption errors with specific error codes ext4: store checksum seed in superblock ext4: reserve code points for the project quota feature ext4: promote ext4 over ext2 in the default probe order jbd2: gate checksum calculations on crc driver presence, not sb flags ext4: use private version of page_zero_new_buffers() for data=journal mode ext4 crypto: fix bugs in ext4_encrypted_zeroout() ext4 crypto: replace some BUG_ON()'s with error checks ...
2015-11-06Merge tag 'trace-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-traceLinus Torvalds1-1/+5
Pull tracking updates from Steven Rostedt: "Most of the changes are clean ups and small fixes. Some of them have stable tags to them. I searched through my INBOX just as the merge window opened and found lots of patches to pull. I ran them through all my tests and they were in linux-next for a few days. Features added this release: ---------------------------- - Module globbing. You can now filter function tracing to several modules. # echo '*:mod:*snd*' > set_ftrace_filter (Dmitry Safonov) - Tracer specific options are now visible even when the tracer is not active. It was rather annoying that you can only see and modify tracer options after enabling the tracer. Now they are in the options/ directory even when the tracer is not active. Although they are still only visible when the tracer is active in the trace_options file. - Trace options are now per instance (although some of the tracer specific options are global) - New tracefs file: set_event_pid. If any pid is added to this file, then all events in the instance will filter out events that are not part of this pid. sched_switch and sched_wakeup events handle next and the wakee pids" * tag 'trace-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (68 commits) tracefs: Fix refcount imbalance in start_creating() tracing: Put back comma for empty fields in boot string parsing tracing: Apply tracer specific options from kernel command line. tracing: Add some documentation about set_event_pid ring_buffer: Remove unneeded smp_wmb() before wakeup of reader benchmark tracing: Allow dumping traces without tracking trace started cpus ring_buffer: Fix more races when terminating the producer in the benchmark ring_buffer: Do no not complete benchmark reader too early tracing: Remove redundant TP_ARGS redefining tracing: Rename max_stack_lock to stack_trace_max_lock tracing: Allow arch-specific stack tracer recordmcount: arm64: Replace the ignored mcount call into nop recordmcount: Fix endianness handling bug for nop_mcount tracepoints: Fix documentation of RCU lockdep checks tracing: ftrace_event_is_function() can return boolean tracing: is_legal_op() can return boolean ring-buffer: rb_event_is_commit() can return boolean ring-buffer: rb_per_cpu_empty() can return boolean ring_buffer: ring_buffer_empty{cpu}() can return boolean ring-buffer: rb_is_reader_page() can return boolean ...
2015-11-06Merge tag 'for-linus-20151106' of git://git.infradead.org/linux-mtdLinus Torvalds4-23/+14
Pull MTD updates from Brian Norris: "Core: - WARN (in some cases) when a struct mtd_info is registered multiple times; in the past this was "supported", but it's still error prone for future development. There's only one ugly case of this left in the tree (that we're aware of) and the owners are aware of the problems there. - fix potential deadlock in the blkdev removal path NOTE: the (potential) deadlock was introduced in a for-stable patch. This one is also marked for -stable. - ioctl(BLKPG) compat_ioctl support; resolves issues with 32-bit user space vs 64-bit kernel space - Set MTD parent device correctly throughout the tree, so the tree structure appears correctly in sysfs; many drivers were missing this (soft) requirement - Move device tree partitions (ofpart) into a dedicated 'partitions' subnode; this helps to disambiguate whether a node is a partition or some other auxiliary data - Improve error handling for partitioning failures NAND: - General: Increase timeout period, for corner-case systems with less-than-accurate jiffies - Fix OF-based autoloading of several NAND drivers when built as modules - pxa3xx_nand: - Rework timing configuration to be more dynamic - Refactor PM support - brcmnand: prepare for NorthStar 2 support (ARM64, 16-bit NAND chips) - sunxi_nand: refactoring and a few bug fixes - vf610: new NAND driver - FSMC: add SW BCH support; support common NAND DT bindings - lpc32xx_slc: refactor and improve timing calculations logic - denali: support for rev 5.1 SPI NOR: - Layering improvements - Added Winbond lock/unlock support - Added mtd_is_locked() (i.e., ioctl(MEMISLOCKED)) support - Increase full-chip-erase timeout linearly with flash size - fsl-quadspi: fix compile for non-ARM architectures - New flash support" * tag 'for-linus-20151106' of git://git.infradead.org/linux-mtd: (169 commits) mtd: don't WARN about overloaded users of mtd->reboot_notifier.notifier_call mtd: nand: sunxi: avoid retrieving data before ECC pass mtd: nand: sunxi: fix sunxi_nfc_hw_ecc_read/write_chunk() mtd: blkdevs: fix potential deadlock + lockdep warnings mtd: ofpart: move ofpart partitions to a dedicated dt node doc: dt: mtd: support partitions in a special 'partitions' subnode mtd: brcmnand: Force 8bit mode before doing nand_scan_ident() mtd: brcmnand: factor out CFG and CFG_EXT bitfields mtd: mtdpart: Do not fail mtd probe when parsing partitions fails mtd: fsl-quadspi: fix macro collision problems with READ/WRITE mtd: warn when registering the same master many times mtd: fixup corner case error handling in mtd_device_parse_register() mtd: tests: Replace timeval with ktime_t mtd: fsmc_nand: Add BCH4 SW ECC support for SPEAr600 mtd: nand: vf610_nfc: use nand_check_erased_ecc_chunk() helper mtd: nand: increase ready wait timeout and report timeouts mtd: docg3: off by one in doc_register_sysfs() mtd: pxa3xx_nand: clean up the pxa3xx timings mtd: pxa3xx_nand: rework flash detection and timing setup mtd: pxa3xx_nand: add helpers to setup the timings ...
2015-11-05Merge branch 'akpm' (patches from Andrew)Linus Torvalds20-76/+204
Merge patch-bomb from Andrew Morton: - inotify tweaks - some ocfs2 updates (many more are awaiting review) - various misc bits - kernel/watchdog.c updates - Some of mm. I have a huge number of MM patches this time and quite a lot of it is quite difficult and much will be held over to next time. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (162 commits) selftests: vm: add tests for lock on fault mm: mlock: add mlock flags to enable VM_LOCKONFAULT usage mm: introduce VM_LOCKONFAULT mm: mlock: add new mlock system call mm: mlock: refactor mlock, munlock, and munlockall code kasan: always taint kernel on report mm, slub, kasan: enable user tracking by default with KASAN=y kasan: use IS_ALIGNED in memory_is_poisoned_8() kasan: Fix a type conversion error lib: test_kasan: add some testcases kasan: update reference to kasan prototype repo kasan: move KASAN_SANITIZE in arch/x86/boot/Makefile kasan: various fixes in documentation kasan: update log messages kasan: accurately determine the type of the bad access kasan: update reported bug types for kernel memory accesses kasan: update reported bug types for not user nor kernel memory accesses mm/kasan: prevent deadlock in kasan reporting mm/kasan: don't use kasan shadow pointer in generic functions mm/kasan: MODULE_VADDR is not available on all archs ...
2015-11-05vfs: clear remainder of 'full_fds_bits' in dup_fd()Eric Biggers1-32/+32
This fixes a bug from commit f3f86e33dc3d ("vfs: Fix pathological performance case for __alloc_fd()"). v2: refactor to share fd bitmap copying code Signed-off-by: Eric Biggers <ebiggers3@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05mm, oom: add comment for why oom_adj existsDavid Rientjes1-0/+10
/proc/pid/oom_adj exists solely to avoid breaking existing userspace binaries that write to the tunable. Add a comment in the only possible location within the kernel tree to describe the situation and motivation for keeping it around. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05mm: clear_soft_dirty_pmd() requires THPLaurent Dufour1-7/+7
Don't build clear_soft_dirty_pmd() if transparent huge pages are not enabled. Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>