linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2016-10-11	autofs: use autofs4_free_ino() to kfree dentry data	Tomohiro Kusumi	1	-1/+1
	kfree dentry data allocated by autofs4_new_ino() with autofs4_free_ino() instead of raw kfree. (since we have the interface to free autofs_info*) This patch was modified to remove the need to set the dentry info field to NULL dew to a change in the previous patch. Link: http://lkml.kernel.org/r/20160812024805.12352.43650.stgit@pluto.themaw.net Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	autofs: remove ino free in autofs4_dir_symlink()	Ian Kent	1	-2/+0
	The inode allocation failure case in autofs4_dir_symlink() frees the autofs dentry info of the dentry without setting ->d_fsdata to NULL. That could lead to a double free so just get rid of the free and leave it to ->d_release(). Link: http://lkml.kernel.org/r/20160812024759.12352.10653.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Cc: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	autofs: add WARN_ON(1) for non dir/link inode case	Tomohiro Kusumi	1	-1/+2
	It's invalid if the given mode is neither dir nor link, so warn on else case. Link: http://lkml.kernel.org/r/20160812024754.12352.8536.stgit@pluto.themaw.net Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	autofs: fix autofs4_fill_super() error exit handling	Ian Kent	1	-3/+3
	Somewhere along the line the error handling gotos have become incorrect. Link: http://lkml.kernel.org/r/20160812024749.12352.15100.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Cc: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	autofs: test autofs versions first on sb initialization	Tomohiro Kusumi	1	-17/+17
	This patch does what the below comment says. It could be and it's considered better to do this first before various functions get called during initialization. /* Couldn't this be tested earlier? */ Link: http://lkml.kernel.org/r/20160812024744.12352.43075.stgit@pluto.themaw.net Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	autofs: drop unnecessary extern in autofs_i.h	Tomohiro Kusumi	1	-1/+1
	autofs4_kill_sb() doesn't need to be declared as extern, and no other functions in .h are explicitly declared as extern. Link: http://lkml.kernel.org/r/20160812024739.12352.99354.stgit@pluto.themaw.net Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	autofs: fix typos in Documentation/filesystems/autofs4.txt	Tomohiro Kusumi	1	-4/+4
	plus minor whitespace fixes. Link: http://lkml.kernel.org/r/20160812024734.12352.17122.stgit@pluto.themaw.net Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	kprobes: include <asm/sections.h> instead of <asm-generic/sections.h>	Christoph Hellwig	1	-1/+1
	asm-generic headers are generic implementations for architecture specific code and should not be included by common code. Thus use the asm/ version of sections.h to get at the linker sections. Link: http://lkml.kernel.org/r/1473602302-6208-1-git-send-email-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	checkpatch: improve the octal permissions tests	Joe Perches	1	-16/+44
	The function calls with octal permissions commonly span multiple lines. The current test is line oriented and fails to find some matches. Make the test use the $stat variable instead of the $line variable to span multiple lines. Also add a few functions to the known functions with permissions list. Move the SYMBOLIC_PERMS test to a separate section to find all the S_<FOO> permissions in any form not just those that have specific function names. This can now find and fix permissions uses like: .mode = S_<FOO> \| S_<BAR>; Link: http://lkml.kernel.org/r/b51bab60530912aae4ac420119d465c5b206f19f.1475030406.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Tested-by: Ramiro Oliveira <roliveir@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	checkpatch: add warning for unnamed function definition arguments	Joe Perches	1	-0/+13
	Function definitions without identifiers like int foo(int) are not preferred. Emit a warning when they occur. Link: http://lkml.kernel.org/r/94fe6378504745991b650f48fc92bb4648f25706.1474925354.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	checkpatch: improve MACRO_ARG_PRECEDENCE test	Joe Perches	1	-1/+19
	It is possible for a multiple line macro definition to have a false positive report when an argument is used on a line after a continuation \. This line might have a leading '+' as the initial character that could be confused by checkpatch as an operator. Avoid the leading character on multiple line macro definitions. Link: http://lkml.kernel.org/r/60229d13399f9b6509db5a32e30d4c16951a60cd.1473836073.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	checkpatch: add --strict test for precedence challenged macro arguments	Joe Perches	1	-1/+8
	Add a test for macro arguents that have a non-comma leading or trailing operator where the argument isn't parenthesized to avoid possible precedence issues. Link: http://lkml.kernel.org/r/47715508972f8d786f435e583ff881dbeee3a114.1473745855.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Julia Lawall <julia.lawall@lip6.fr> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	checkpatch: add --strict test for macro argument reuse	Joe Perches	1	-8/+35
	If a macro argument is used multiple times in the macro definition, the macro argument may have an unexpected side-effect. Add a test (MACRO_ARG_REUSE) for that condition which is only emitted with command-line option --strict. Link: http://lkml.kernel.org/r/b6d67a87cafcafd15499e91780dc63b15dec0aa0.1473744906.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Julia Lawall <julia.lawall@lip6.fr> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	checkpatch: improve the block comment * alignment test	Joe Perches	1	-7/+12
	An "uninitialized value" is emitted when a block comment starts on the same line as a statement. Fix this and make the test use a little fewer cpu cycles too. Link: http://lkml.kernel.org/r/3c9993320c2182d37f53ac540878cfef59c3f62d.1473365956.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Reported-by: Charlemagne Lasse <charlemagnelasse@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	checkpatch: speed up checking for filenames in sections marked obsolete	Joe Perches	1	-1/+1
	Adding -f to the get_maintainer.pl invocation means git isn't invoked by get_maintainer.pl for known filenames. This reduces the overall time to run checkpatch. Link: http://lkml.kernel.org/r/22991e3a295aeb399b43af0478b6e5809106ccee.1472684066.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	const_structs.checkpatch: add frequently used from Julia Lawall's list	Joe Perches	1	-0/+25
	Using const is generally a good idea. Julia Lawall has created a list of always const and almost always const structs in the kernel sources. Link: https://lkml.org/lkml/2016/8/28/95 Add the most frequently used (> 50 cases) that are almost always or always const. Link: http://lkml.kernel.org/r/1e16020f8027654db0095bbfbcc11da51025365c.1472664220.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Julia Lawall <julia.lawall@lip6.fr> Cc: Andy Whitcroft <apw@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	checkpatch: externalize the structs that should be const	Joe Perches	2	-40/+63
	Make it easier to add new structs that should be const. Link: http://lkml.kernel.org/r/e5a8da43e7c11525bafbda1ca69a8323614dd942.1472664220.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Cc: Julia Lawall <julia.lawall@lip6.fr> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Whitcroft <apw@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	checkpatch: don't test for prefer ether_addr_<foo>	Joe Perches	1	-35/+35
	< sigh > Comment these tests out. These are just too enticing to people that don't verify that both source and dest addresses really must be __aligned(2). It helps make Dan Carpenter happy too. Link: http://lkml.kernel.org/r/dc32ec66d24647f4cdf824c8dfbbc59aa7ce7b7d.1472665676.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Greg <gvrose8192@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	checkpatch: test multiple line block comment alignment	Joe Perches	1	-0/+19
	Warn when block comments are not aligned on the * /* * block comment, no warning / / * block comment, emit warning */ Link: http://lkml.kernel.org/r/edb57bd330adfe024b95ec2a807d4aa7f0c8b112.1472261299.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	checkpatch: look for symbolic permissions and suggest octal instead	Joe Perches	1	-6/+43
	S_<FOO> uses should be avoided where octal is more intelligible. Linus didst say: : It's much easier to parse and understand the octal numbers, while the : symbolic macro names are just random line noise and hard as hell to : understand. You really have to think about it. : : So we should rather go the other way: convert existing bad symbolic : permission bit macro use to just use the octal numbers. : : The symbolic names are good for the other bits (ie sticky bit, and the : inode mode _type_ numbers etc), but for the permission bits, the symbolic : names are just insane crap. Nobody sane should ever use them. Not in the : kernel, not in user space. (http://lkml.kernel.org/r/CA+55aFw5v23T-zvDZp-MmD_EYxF8WbafwwB59934FV7g21uMGQ@mail.gmail.com) Link: http://lkml.kernel.org/r/7232ef011d05a92f4caa86a5e9830d87966a2eaf.1470180926.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	checkpatch: see if modified files are marked obsolete in MAINTAINERS	Joe Perches	1	-0/+14
	Use get_maintainer to check the status of individual files. If "obsolete", suggest leaving the files alone. Link: http://lkml.kernel.org/r/7ceaa510dc9d2df05ec4b456baed7bb1415550b3.1471889575.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Cc: SF Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	lib/bitmap.c: enhance bitmap syntax	Noam Camus	2	-18/+82
	Today there are platforms with many CPUs (up to 4K). Trying to boot only part of the CPUs may result in too long string. For example lets take NPS platform that is part of arch/arc. This platform have SMP system with 256 cores each with 16 HW threads (SMT machine) where HW thread appears as CPU to the kernel. In this example there is total of 4K CPUs. When one tries to boot only part of the HW threads from each core the string representing the map may be long... For example if for sake of performance we decided to boot only first half of HW threads of each core the map will look like: 0-7,16-23,32-39,...,4080-4087 This patch introduce new syntax to accommodate with such use case. I added an optional postfix to a range of CPUs which will choose according to given modulo the desired range of reminders i.e.: <cpus range>:sed_size/group_size For example, above map can be described in new syntax like this: 0-4095:8/16 Note that this patch is backward compatible with current syntax. [akpm@linux-foundation.org: rework documentation] Link: http://lkml.kernel.org/r/1473579629-4283-1-git-send-email-noamca@mellanox.com Signed-off-by: Noam Camus <noamca@mellanox.com> Cc: David Decotigny <decot@googlers.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: David S. Miller <davem@davemloft.net> Cc: Pan Xinhui <xinhui@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	lib/kstrtox.c: smaller _parse_integer()	Alexey Dobriyan	1	-5/+1
	Set "overflow" bit upon encountering it instead of postponing to the end of the conversion. Somehow gcc unwedges itself and generates better code: $ ./scripts/bloat-o-meter ../vmlinux-000 ../obj/vmlinux _parse_integer 177 139 -38 Inspired by patch from Zhaoxiu Zeng. Link: http://lkml.kernel.org/r/20160826221920.GA1909@p183.telecom.by Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	include/linux/ctype.h: make isdigit() table lookupless	Alexey Dobriyan	1	-1/+4
	Make isdigit into a simple range checking inline function: return '0' <= c && c <= '9'; This code is 1 branch, not 2 because any reasonable compiler can optimize this code into SUB+CMP, so the code while (isdigit((c = *s++))) ... remains 1 branch per iteration HOWEVER it suddenly doesn't do table lookup priming cacheline nobody cares about. Link: http://lkml.kernel.org/r/20160826190047.GA12536@p183.telecom.by Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	lib: harden strncpy_from_user	Mark Rutland	1	-0/+2
	The strncpy_from_user() accessor is effectively a copy_from_user() specialised to copy strings, terminating early at a NUL byte if possible. In other respects it is identical, and can be used to copy an arbitrarily large buffer from userspace into the kernel. Conceptually, it exposes a similar attack surface. As with copy_from_user(), we check the destination range when the kernel is built with KASAN, but unlike copy_from_user() we do not check the destination buffer when using HARDENED_USERCOPY. As strncpy_from_user() calls get_user() in a loop, we must call check_object_size() explicitly. This patch adds this instrumentation to strncpy_from_user(), per the same rationale as with the regular copy_from_user(). In the absence of hardened usercopy this will have no impact as the instrumentation expands to an empty static inline function. Link: http://lkml.kernel.org/r/1472221903-31181-1-git-send-email-mark.rutland@arm.com Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	radix-tree tests: properly initialize mutex	Ross Zwisler	1	-1/+1
	The pthread_mutex_t in regression1.c wasn't being initialized properly. Link: http://lkml.kernel.org/r/20160815194237.25967-4-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	radix-tree tests: add iteration test	Ross Zwisler	4	-1/+184
	There are four cases I can see where we could end up with a NULL 'slot' in radix_tree_next_slot(). This unit test exercises all four of them, making sure that if in the future we have an unsafe path through radix_tree_next_slot(), we'll catch it. Here are details on the four cases: 1) radix_tree_iter_retry() via a non-tagged iteration like radix_tree_for_each_slot(). In this case we currently aren't seeing a bug because radix_tree_iter_retry() sets iter->next_index = iter->index; which means that in in the else case in radix_tree_next_slot(), 'count' is zero, so we skip over the while() loop and effectively just return NULL without ever dereferencing 'slot'. 2) radix_tree_iter_retry() via tagged iteration like radix_tree_for_each_tagged(). This case was giving us NULL pointer dereferences in testing, and was fixed with this commit: commit 3cb9185c6730 ("radix-tree: fix radix_tree_iter_retry() for tagged iterators.") This fix doesn't explicitly check for 'slot' being NULL, though, it works around the NULL pointer dereference by instead zeroing iter->tags in radix_tree_iter_retry(), which makes us bail out of the if() case in radix_tree_next_slot() before we dereference 'slot'. 3) radix_tree_iter_next() via via a non-tagged iteration like radix_tree_for_each_slot(). This currently happens in shmem_tag_pins() and shmem_partial_swap_usage(). As with non-tagged iteration, 'count' in the else case of radix_tree_next_slot() is zero, so we skip over the while() loop and effectively just return NULL without ever dereferencing 'slot'. 4) radix_tree_iter_next() via tagged iteration like radix_tree_for_each_tagged(). This happens in shmem_wait_for_pins(). radix_tree_iter_next() zeros out iter->tags, so we end up exiting radix_tree_next_slot() here: if (flags & RADIX_TREE_ITER_TAGGED) { void *canon = slot; iter->tags >>= 1; if (unlikely(!iter->tags)) return NULL; Link: http://lkml.kernel.org/r/20160815194237.25967-3-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	radix-tree: 'slot' can be NULL in radix_tree_next_slot()	Ross Zwisler	1	-0/+8
	There are four cases I can see where we could end up with a NULL 'slot' in radix_tree_next_slot(). Yet radix_tree_next_slot() never actually checks whether 'slot' is NULL. It just happens that for the cases where 'slot' is NULL, some other combination of factors prevents us from dereferencing it. It would be very easy for someone to unwittingly change one of these factors without realizing that we are implicitly depending on it to save us from a NULL pointer dereference. Add a comment documenting the things that allow 'slot' to be safely passed as NULL to radix_tree_next_slot(). Here are details on the four cases: 1) radix_tree_iter_retry() via a non-tagged iteration like radix_tree_for_each_slot(). In this case we currently aren't seeing a bug because radix_tree_iter_retry() sets iter->next_index = iter->index; which means that in in the else case in radix_tree_next_slot(), 'count' is zero, so we skip over the while() loop and effectively just return NULL without ever dereferencing 'slot'. 2) radix_tree_iter_retry() via tagged iteration like radix_tree_for_each_tagged(). This case was giving us NULL pointer dereferences in testing, and was fixed with this commit: commit 3cb9185c6730 ("radix-tree: fix radix_tree_iter_retry() for tagged iterators.") This fix doesn't explicitly check for 'slot' being NULL, though, it works around the NULL pointer dereference by instead zeroing iter->tags in radix_tree_iter_retry(), which makes us bail out of the if() case in radix_tree_next_slot() before we dereference 'slot'. 3) radix_tree_iter_next() via via a non-tagged iteration like radix_tree_for_each_slot(). This currently happens in shmem_tag_pins() and shmem_partial_swap_usage(). As with non-tagged iteration, 'count' in the else case of radix_tree_next_slot() is zero, so we skip over the while() loop and effectively just return NULL without ever dereferencing 'slot'. 4) radix_tree_iter_next() via tagged iteration like radix_tree_for_each_tagged(). This happens in shmem_wait_for_pins(). radix_tree_iter_next() zeros out iter->tags, so we end up exiting radix_tree_next_slot() here: if (flags & RADIX_TREE_ITER_TAGGED) { void *canon = slot; iter->tags >>= 1; if (unlikely(!iter->tags)) return NULL; Link: http://lkml.kernel.org/r/20160815194237.25967-2-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	fs/select: add vmalloc fallback for select(2)	Vlastimil Babka	1	-3/+11
	The select(2) syscall performs a kmalloc(size, GFP_KERNEL) where size grows with the number of fds passed. We had a customer report page allocation failures of order-4 for this allocation. This is a costly order, so it might easily fail, as the VM expects such allocation to have a lower-order fallback. Such trivial fallback is vmalloc(), as the memory doesn't have to be physically contiguous and the allocation is temporary for the duration of the syscall only. There were some concerns, whether this would have negative impact on the system by exposing vmalloc() to userspace. Although an excessive use of vmalloc can cause some system wide performance issues - TLB flushes etc. - a large order allocation is not for free either and an excessive reclaim/compaction can have a similar effect. Also note that the size is effectively limited by RLIMIT_NOFILE which defaults to 1024 on the systems I checked. That means the bitmaps will fit well within single page and thus the vmalloc() fallback could be only excercised for processes where root allows a higher limit. Note that the poll(2) syscall seems to use a linked list of order-0 pages, so it doesn't need this kind of fallback. [eric.dumazet@gmail.com: fix failure path logic] [akpm@linux-foundation.org: use proper type for size] Link: http://lkml.kernel.org/r/20160927084536.5923-1-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Jason Baron <jbaron@akamai.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	block: implement (some of) fallocate for block devices	Darrick J. Wong	2	-1/+79
	After much discussion, it seems that the fallocate feature flag FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been whitelisted for zeroing SCSI UNMAP. Punch still requires that FALLOC_FL_KEEP_SIZE is set. A length that goes past the end of the device will be clamped to the device size if KEEP_SIZE is set; or will return -EINVAL if not. Both start and length must be aligned to the device's logical block size. Since the semantics of fallocate are fairly well established already, wire up the two pieces. The other fallocate variants (collapse range, insert range, and allocate blocks) are not supported. Link: http://lkml.kernel.org/r/147518379992.22791.8849838163218235007.stgit@birch.djwong.org Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Mike Snitzer <snitzer@redhat.com> # tweaked header Cc: Brian Foster <bfoster@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hannes Reinecke <hare@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	block: require write_same and discard requests align to logical block size	Darrick J. Wong	1	-0/+15
	Make sure that the offset and length arguments that we're using to construct WRITE SAME and DISCARD requests are actually aligned to the logical block size. Failure to do this causes other errors in other parts of the block layer or the SCSI layer because disks don't support partial logical block writes. Link: http://lkml.kernel.org/r/147518379026.22791.4437508871355153928.stgit@birch.djwong.org Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Mike Snitzer <snitzer@redhat.com> # tweaked header Cc: Brian Foster <bfoster@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	block: invalidate the page cache when issuing BLKZEROOUT	Darrick J. Wong	1	-6/+12
	Patch series "fallocate for block devices", v11. This is a patchset to fix page cache coherency with BLKZEROOUT and implement fallocate for block devices. The first patch is a fix to the existing BLKZEROOUT ioctl to invalidate the page cache if the zeroing command to the underlying device succeeds. Without this patch we still have the pagecache coherence bug that's been in the kernel forever. The second patch changes the internal block device functions to reject attempts to discard or zeroout that are not aligned to the logical block size. Previously, we only checked that the start/len parameters were 512-byte aligned, which caused kernel BUG_ONs for unaligned IOs to 4k-LBA devices. The third patch creates an fallocate handler for block devices, wires up the FALLOC_FL_PUNCH_HOLE flag to zeroing-discard, and connects FALLOC_FL_ZERO_RANGE to write-same so that we can have a consistent fallocate interface between files and block devices. It also allows the combination of PUNCH_HOLE and NO_HIDE_STALE to invoke non-zeroing discard. Test cases for the new block device fallocate are now in xfstests as generic/349-351. This patch (of 3): Invalidate the page cache (as a regular O_DIRECT write would do) to avoid returning stale cache contents at a later time. Link: http://lkml.kernel.org/r/147518378313.22791.16649519283678515021.stgit@birch.djwong.org Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Brian Foster <bfoster@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	ocfs2: fix memory leak in dlm_migrate_request_handler()	Guozhonghua	1	-0/+3
	In the dlm_migrate_request_handler(), when `ret' is -EEXIST, the mle should be freed, otherwise the memory will be leaked. Link: http://lkml.kernel.org/r/71604351584F6A4EBAE558C676F37CA4A3D3522A@H3CMLB12-EX.srv.huawei-3com.com Signed-off-by: Guozhonghua <guozhonghua@h3c.com> Reviewed-by: Mark Fasheh <mfasheh@versity.com> Cc: Eric Ren <zren@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	Fix off-by-one in __pipe_get_pages()	Al Viro	1	-2/+2
	it actually worked only when requested area ended on the page boundary... Reported-by: Marco Grassi <marco.gra@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11	netfilter: Fix slab corruption.	Linus Torvalds	1	-75/+33
	Use the correct pattern for singly linked list insertion and deletion. We can also calculate the list head outside of the mutex. Fixes: e3b37f11e6e4 ("netfilter: replace list_head with single linked list") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: David S. Miller <davem@davemloft.net> net/netfilter/core.c \| 108 ++++++++++++++++----------------------------------- 1 file changed, 33 insertions(+), 75 deletions(-)
2016-10-10	[btrfs] fix check_direct_IO() for non-iovec iterators	Al Viro	1	-1/+1
	looking for duplicate ->iov_base makes sense only for iovec-backed iterators; for kvec-backed ones it's pointless, for bvec-backed ones it's pointless and broken on 32bit (we walk through an array of struct bio_vec accessing them as if they were struct iovec; works by accident on 64bit, but on 32bit it'll blow up) and for pipe-backed ones it's pointless and ends up oopsing. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-10	constify iov_iter_count() and iter_is_iovec()	Al Viro	1	-2/+2
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-10	fix ITER_PIPE interaction with direct_IO	Al Viro	3	-11/+5
	by making sure we call iov_iter_advance() on original iov_iter even if direct_IO (done on its copy) has returned 0. It's a no-op for old iov_iter flavours and does the right thing (== truncation of the stuff we'd allocated, but not filled) in ITER_PIPE case. Failures (e.g. -EIO) get caught and dealt with by cleanup in generic_file_read_iter(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-10	dlm: free workqueues after the connections	Marcelo Ricardo Leitner	1	-6/+2
	After backporting commit ee44b4bc054a ("dlm: use sctp 1-to-1 API") series to a kernel with an older workqueue which didn't use RCU yet, it was noticed that we are freeing the workqueues in dlm_lowcomms_stop() too early as free_conn() will try to access that memory for canceling the queued works if any. This issue was introduced by commit 0d737a8cfd83 as before it such attempt to cancel the queued works wasn't performed, so the issue was not present. This patch fixes it by simply inverting the free order. Cc: stable@vger.kernel.org Fixes: 0d737a8cfd83 ("dlm: fix race while closing connections") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David Teigland <teigland@redhat.com>
2016-10-09	printk: make reading the kernel log flush pending lines	Linus Torvalds	1	-0/+11
	That will mean that any possible subsequent continuation will now be broken up onto a line of its own (since reading the log has finalized the beginning og the line), but if user space has activated system logging (or if there's a kernel message dump going on) that is the right thing to do. And now that we actually get the continuation flags _right_ for this all, the user space logger that is reading the kernel messages can actually see the continuation marker. Not that anybody seems to really bother with it (or care), but in theory user space can do its own message stitching. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-09	printk: re-organize log_output() to be more legible	Linus Torvalds	1	-35/+27
	Avoid some duplicate logic now that we can return early, and update the comments for the new LOG_CONT world order. This also stops the continuation flushing from just using random record flags for the flushing action, instead taking the flags from the proper original line and updating them as we add continuations to it. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-09	printk: split out core logging code into helper function	Linus Torvalds	1	-39/+39
	The code that actually decides how to log the message (whether to put it directly into the record log, whether to append it to an existing buffered log, or whether to start a new buffered log) is fairly non-obvious code in the middle of the vprintk_emit() function. Splitting that code up into a helper function makes it easier to understand, but perhaps more importantly also allows for the code to just return early out of the helper function once it has made the decision about where the new log content goes. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-09	printk: reinstate KERN_CONT for printing continuation lines	Linus Torvalds	5	-23/+19
	Long long ago the kernel log buffer was a buffered stream of bytes, very much like stdio in user space. It supported log levels by scanning the stream and noticing the log level markers at the beginning of each line, but if you wanted to print a partial line in multiple chunks, you just did multiple printk() calls, and it just automatically worked. Except when it didn't, and you had very confusing output when different lines got all mixed up with each other. Then you got fragment lines mixing with each other, or with non-fragment lines, because it was traditionally impossible to tell whether a printk() call was a continuation or not. To at least help clarify the issue of continuation lines, we added a KERN_CONT marker back in 2007 to mark continuation lines: 474925277671 ("printk: add KERN_CONT annotation"). That continuation marker was initially an empty string, and didn't actuall make any semantic difference. But it at least made it possible to annotate the source code, and have check-patch notice that a printk() didn't need or want a log level marker, because it was a continuation of a previous line. To avoid the ambiguity between a continuation line that had that KERN_CONT marker, and a printk with no level information at all, we then in 2009 made KERN_CONT be a real log level marker which meant that we could now reliably tell the difference between the two cases. 5fd29d6ccbc9 ("printk: clean up handling of log-levels and newlines") and we could take advantage of that to make sure we didn't mix up continuation lines with lines that just didn't have any loglevel at all. Then, in 2012, the kernel log buffer was changed to be a "record" based log, where each line was a record that has a loglevel and a timestamp. You can see the beginning of that conversion in commits e11fea92e13f ("kmsg: export printk records to the /dev/kmsg interface") 7ff9554bb578 ("printk: convert byte-buffer to variable-length record buffer") with a number of follow-up commits to fix some painful fallout from that conversion. Over all, it took a couple of months to sort out most of it. But the upside was that you could have concurrent readers (and writers) of the kernel log and not have lines with mixed output in them. And one particular pain-point for the record-based kernel logging was exactly the fragmentary lines that are generated in smaller chunks. In order to still log them as one recrod, the continuation lines need to be attached to the previous record properly. However the explicit continuation record marker that is actually useful for this exact case was actually removed in aroundm the same time by commit 61e99ab8e35a ("printk: remove the now unnecessary "C" annotation for KERN_CONT") due to the incorrect belief that KERN_CONT wasn't meaningful. The ambiguity between "is this a continuation line" or "is this a plain printk with no log level information" was reintroduced, and in fact became an even bigger pain point because there was now the whole record-level merging of kernel messages going on. This patch reinstates the KERN_CONT as a real non-empty string marker, so that the ambiguity is fixed once again. But it's not a plain revert of that original removal: in the four years since we made KERN_CONT an empty string again, not only has the format of the log level markers changed, we've also had some usage changes in this area. For example, some ACPI code seems to use KERN_CONT _together_ with a log level, and now uses both the KERN_CONT marker and (for example) a KERN_INFO marker to show that it's an informational continuation of a line. Which is actually not a bad idea - if the continuation line cannot be attached to its predecessor, without the log level information we don't know what log level to assign to it (and we traditionally just assigned it the default loglevel). So having both a log level and the KERN_CONT marker is not necessarily a bad idea, but it does mean that we need to actually iterate over potentially multiple markers, rather than just a single one. Also, since KERN_CONT was still conceptually needed, and encouraged, but didn't actually _do_ anything, we've also had the reverse problem: rather than having too many annotations it has too few, and there is bit rot with code that no longer marks the continuation lines with the KERN_CONT marker. So this patch not only re-instates the non-empty KERN_CONT marker, it also fixes up the cases of bit-rot I noticed in my own logs. There are probably other cases where KERN_CONT will be needed to be added, either because it is new code that never dealt with the need for KERN_CONT, or old code that has bitrotted without anybody noticing. That said, we should strive to avoid the need for KERN_CONT. It does result in real problems for logging, and should generally not be seen as a good feature. If we some day can get rid of the feature entirely, because nobody does any fragmented printk calls, that would be lovely. But until that point, let's at mark the code that relies on the hacky multi-fragment kernel printk's. Not only does it avoid the ambiguity, it also annotates code as "maybe this would be good to fix some day". (That said, particularly during single-threaded bootup, the downsides of KERN_CONT are very limited. Things get much hairier when you have multiple threads going on and user level reading and writing logs too). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-09	be2net: Enable VF link state setting for BE3	Suresh Reddy	1	-1/+1
	The VF link state setting feature now works on BE3 chips too from FW ver 11.1.192.0 onwards. Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com> Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-09	be2net: Fix TX stats for TSO packets	Sriharsha Basavapatna	1	-2/+12
	TX stats update does not take into account headers which get duplicated when the TSO packet is split into segments by HW. Fix this for both tunneled (vxlan) and non-tunneled TSO packets. Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-09	be2net: Update Copyright string in be_hw.h	Sriharsha Basavapatna	1	-1/+1
	This patch updates the year and company name in the copyright string in be_hw.h. Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-09	be2net: NCSI FW section should be properly updated with ethtool for BE3	Sriharsha Basavapatna	1	-1/+23
	The driver has a check to ensure that NCSI FW section is updated only if the current FW version in the card supports it. This FW version check is done using memcmp() which obviously fails in some cases. Fix this by breaking up the version string into integer version components and comparing them. Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-09	be2net: Provide an alternate way to read pf_num for BEx chips	Sriharsha Basavapatna	2	-1/+10
	The driver gets the pf_num for Skyhawk and Lancer using GET_FUNC_CONFIG FW command. But since that command is not supported in BEx, we need to get it from some other command. Otherwise TPE recovery would fail since all NIC PFs would end up with a func num of 0. There's a pci function number field in the response of GET_CNTL_ATTRIBUTES command that can be read to get the same info for BEx adapters. Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-08	x86/pkeys: Make protection keys an "eager" feature	Dave Hansen	1	-3/+4
	Our XSAVE features are divided into two categories: those that generate FPU exceptions, and those that do not. MPX and pkeys do not generate FPU exceptions and thus can not be used lazily. We disable them when lazy mode is forced on. We have a pair of masks to collect these two sets of features, but XFEATURE_MASK_PKRU was added to the wrong mask: XFEATURE_MASK_LAZY. Fix it by moving the feature to XFEATURE_MASK_EAGER. Note: this only causes problem if you boot with lazy FPU mode (eagerfpu=off) which is not the default. It also only affects hardware which is not currently publicly available. It looks like eager mode is going away, but we still need this patch applied to any kernel that has protection keys and lazy mode, which is 4.6 through 4.8 at this point, and 4.9 if the lazy removal isn't sent to Linus for 4.9. Fixes: c8df40098451 ("x86/fpu, x86/mm/pkeys: Add PKRU xsave fields and data structures") Signed-off-by: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave@sr71.net> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20161007162342.28A49813@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-08	x86/apic: Prevent pointless warning messages	Thomas Gleixner	1	-3/+5
	Markus reported that he sees new warnings: APIC: NR_CPUS/possible_cpus limit of 4 reached. Processor 4/0x84 ignored. APIC: NR_CPUS/possible_cpus limit of 4 reached. Processor 5/0x85 ignored. This comes from the recent persistant cpuid - nodeid changes. The code which emits the warning has been called prior to these changes only for enabled processors. Now it's called for disabled processors as well to get the possible cpu accounting correct. So if the kernel is compiled for the number of actual available/enabled CPUs and the BIOS reports disabled CPUs as well then the above warnings are printed. That's a pointless exercise as it only makes sense if there are more CPUs enabled than the kernel supports. Nake the warning conditional on enabled processors so we are back to the state before these changes. Fixes: 8f54969dc8d6 ("x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping") Reported-and-tested-by: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: linux-acpi@vger.kernel.org Cc: Gu Zheng <guz.fnst@cn.fujitsu.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610071549330.19804@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>