linux-dev - Linux kernel development work

Age	Commit message	Author	Files	Lines
2016-05-02	GFS2: Remove allocation parms from gfs2_rbm_find	Bob Peterson	1	-10/+6
	Struct gfs2_alloc_parms ap is never referenced in function gfs2_rbm_find, so this patch removes it. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-05-02	gfs2: use inode_lock/unlock instead of accessing i_mutex directly	Abhi Das	1	-3/+3
	i_mutex has been replaced by i_rwsem and directly accessing the non-existent i_mutex breaks the kernel build. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-04-19	GFS2: Add calls to gfs2_holder_uninit in two error handlers	Daniel DeFreez	2	-2/+4
	This patch fixes two locations that do not call gfs2_holder_uninit if gfs2_glock_nq returns an error. Signed-off-by: Daniel DeFreez <dcdefreez@ucdavis.edu> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
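The error-path pattern this patch restores can be sketched in plain C. This is only an illustration under assumed names: holder_init, holder_uninit, lock_enqueue, lock_dequeue and the live_holders counter are hypothetical stand-ins, not the GFS2 API. The point is that every exit taken after the holder is initialized, including the one where the enqueue fails, passes through the uninit call.

```c
/* Hypothetical stand-ins for a lock holder; not the GFS2 API. */
struct holder {
	int initialized;
};

/* Counts holders that were initialized but not yet torn down. */
static int live_holders;

static void holder_init(struct holder *h)
{
	h->initialized = 1;
	live_holders++;
}

static void holder_uninit(struct holder *h)
{
	h->initialized = 0;
	live_holders--;
}

/* Pretend enqueue: returns a negative error code when asked to fail. */
static int lock_enqueue(struct holder *h, int should_fail)
{
	(void)h;
	return should_fail ? -5 : 0;
}

static void lock_dequeue(struct holder *h)
{
	(void)h;
}

static int do_locked_work(int should_fail)
{
	struct holder h;
	int error;

	holder_init(&h);
	error = lock_enqueue(&h, should_fail);
	if (error)
		goto out_uninit;	/* enqueue failed: still tear the holder down */

	/* ... work done while holding the lock ... */
	lock_dequeue(&h);
out_uninit:
	holder_uninit(&h);
	return error;
}
```

With the goto in place, live_holders returns to zero whether or not the enqueue succeeds; returning directly from the error branch would leave it at one.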
2016-04-14	GFS2: Don't dereference inode in gfs2_inode_lookup until it's valid	Bob Peterson	1	-3/+3
	Function gfs2_inode_lookup dereferenced the inode before checking whether it was NULL. This patch moves the NULL check ahead of the dereference. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-04-12	GFS2: fs/gfs2/glock.c: Deinline do_error, save 1856 bytes	Denys Vlasenko	1	-1/+1
	This function compiles to 522 bytes of machine code. Error paths are not very time critical. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-04-05	gfs2: Use gfs2 wrapper to sync inode before calling generic_file_splice_read()	Abhi Das	1	-2/+26
	gfs2_file_splice_read() f_op grabs and releases the cluster-wide inode glock to sync the inode size to the latest. Without this, generic_file_splice_read() uses an older i_size value and can return EOF for valid offsets in the inode. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-04-05	GFS2: Get rid of dead code in inode_go_demote_ok	Bob Peterson	1	-7/+0
	Function inode_go_demote_ok had some code that was only executed if gl_holders was not empty. However, if gl_holders was not empty, the only caller, demote_ok(), returns before inode_go_demote_ok would ever be called. Therefore, it's dead code, so I removed it. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2016-03-24	GFS2: ignore unlock failures after withdraw	Benjamin Marzinski	2	-1/+9
	After gfs2 has withdrawn the filesystem, it may still have many locks not in the unlocked state. If it is using lock_dlm, it will fail trying the unlocks since it has already unmounted the lock manager. Instead, it should set the SDF_SKIP_DLM_UNLOCK flag on withdraw, to signal that it can skip the lock_manager on unlocks, and fall back to lock_nolock style unlocking. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-03-17	livepatch/module: remove livepatch module notifier	Jessica Yu	3	-76/+94
	Remove the livepatch module notifier in favor of directly enabling and disabling patches to modules in the module loader. Hard-coding the function calls ensures that ftrace_module_enable() is run before klp_module_coming() during module load, and that klp_module_going() is run before ftrace_release_mod() during module unload. This way, ftrace and livepatch code is run in the correct order during the module load/unload sequence without dependence on the module notifier call chain. Signed-off-by: Jessica Yu <jeyu@redhat.com> Reviewed-by: Petr Mladek <pmladek@suse.cz> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-03-17	modules: split part of complete_formation() into prepare_coming_module()	Jessica Yu	1	-8/+18
	Put all actions in complete_formation() that are performed after module->state is set to MODULE_STATE_COMING into a separate function prepare_coming_module(). This split prepares for the removal of the livepatch module notifiers in favor of hard-coding function calls to klp_module_{coming,going} in the module loader. The complete_formation -> prepare_coming_module split will also make error handling easier since we can jump to the appropriate error label to do any module GOING cleanup after all the COMING-actions have completed. Signed-off-by: Jessica Yu <jeyu@redhat.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Petr Mladek <pmladek@suse.cz> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-03-17	Revert "Share upstreaming patches"	Linus Walleij	1	-2/+0
	This reverts commit a101ad945113be3d7f283a181810d76897f0a0d6.
2016-03-17	livepatch: Update maintainers	Josh Poimboeuf	1	-2/+3
	Seth and Vojtech are no longer active maintainers of livepatch, so remove them in favor of Jessica and Miroslav. Also add Petr as a designated reviewer. [jikos@kernel.org: Petr is the only one affected who hasn't provided his Ack by the time this patch has been applied, as he is offline currently; but I hereby assert that I've talked to him and he's OK with this change] Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Seth Jennings <sjenning@redhat.com> Acked-by: Jessica Yu <jeyu@redhat.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Acked-by: Vojtech Pavlik <vojtech@suse.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-03-16	gpio: mcp23s08: Fix clearing of interrupt.	Phil Reid	1	-0/+4
	The mcp23s18 can be configured to clear the interrupt on a read of either INTCAP or GPIO. Since the driver reads INTCAP in the IRQ handler and not the GPIO register, the control byte needs to be set for this mode. Signed-off-by: Phil Reid <preid@electromag.com.au> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2016-03-16	gpiolib: Fix comment referring to gpio_*() in gpiod_*()	Geert Uytterhoeven	1	-2/+2
	Fixes: 79a9becda8940deb ("gpiolib: export descriptor-based GPIO interface") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2016-03-16	gpio: pca953x: Fix pca953x_gpio_set_multiple() on 64-bit	Geert Uytterhoeven	1	-2/+4
	pca953x_gpio_set_multiple() divides by 4 to convert from longs to bytes, which assumes a 32-bit platform, and is not correct on 64-bit platforms. Use "sizeof(...)" instead to fix this. Cc: stable@vger.kernel.org Fixes: b4818afeacbd8182 ("gpio: pca953x: Add set_multiple to allow multiple bits to be set in one write.") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Phil Reid <preid@electromag.com.au> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
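The pitfall can be reproduced outside the driver. The following is a minimal sketch rather than the driver code: the helper names bank_byte_assume32 and bank_byte_portable are hypothetical, and the bitmap is assumed to be an array of unsigned long from which one byte per 8-bit bank is extracted.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: return byte number `bank` of a bitmap stored as an
 * array of unsigned long. It hard-codes 4 bytes per long, so it is only
 * correct where sizeof(unsigned long) == 4.
 */
static unsigned int bank_byte_assume32(const unsigned long *mask, size_t bank)
{
	return (mask[bank / 4] >> ((bank % 4) * 8)) & 0xff;
}

/* Same lookup, with the word size taken from the type instead of a constant. */
static unsigned int bank_byte_portable(const unsigned long *mask, size_t bank)
{
	size_t bytes_per_word = sizeof(*mask);

	return (mask[bank / bytes_per_word] >>
		((bank % bytes_per_word) * CHAR_BIT)) & 0xff;
}
```

On a platform with 64-bit longs, banks 4 to 7 live in the upper half of the first word. The hard-coded version looks for them in the next array element instead, while the sizeof-based version finds them on both 32-bit and 64-bit platforms.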
2016-03-16	gpio: xgene: Fix kconfig for standby GIPO contoller	Matthias Brugger	1	-0/+1
	The standby GPIO controller can be used as an interrupt controller. Select GPIOLIB_IRQCHIP when compiling this driver. Otherwise we get a compilation error: drivers/gpio/gpio-xgene-sb.c: In function 'xgene_gpio_sb_probe': drivers/gpio/gpio-xgene-sb.c:312:10: error: 'struct gpio_chip' has no member named 'irqdomain' priv->gc.irqdomain = priv->irq_domain; ^ scripts/Makefile.build:295: recipe for target 'drivers/gpio/gpio-xgene-sb.o' failed make[2]: *** [drivers/gpio/gpio-xgene-sb.o] Error 1 Fixes: 1013fc41 "gpio: xgene: Enable X-Gene standby GPIO as interrupt controller" Signed-off-by: Matthias Brugger <mbrugger@suse.com> Acked-by: Quan Nguyen <qnguyen@apm.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2016-03-16	HID: microsoft: Add ID for MS Wireless Comfort Keyboard	Slava Bacherikov	3	-0/+4
	Microsoft Wireless Comfort Keyboard has vendor specific My Favorites 1-5 keys. Linux already supports these buttons on other MS keyboards via the MS_ERGONOMY quirk. So apply the MS_ERGONOMY quirk to USB PID 0x00e3 (Microsoft Wireless Optical Desktop Receiver 3.0A). After this, the My Favorites 1..5 keys will be reported as KEY_F14..KEY_F18 events. Signed-off-by: Slava Bacherikov <slava@bacher09.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-03-15	ARM: pxa/raumfeld: use PROPERTY_ENTRY_INTEGER to define props	Arnd Bergmann	1	-11/+4
	gcc-6.0 notices that the use of the property_entry in this file that was recently introduced cannot work right, as we initialize the wrong field: raumfeld.c:387:3: error: the address of 'raumfeld_rotary_encoder_steps' will always evaluate as 'true' [-Werror=address] DEV_PROP_U32, 1, &raumfeld_rotary_encoder_steps, }, ^~~~~~~~~~~~ raumfeld.c:389:3: error: the address of 'raumfeld_rotary_encoder_axis' will always evaluate as 'true' [-Werror=address] DEV_PROP_U32, 1, &raumfeld_rotary_encoder_axis, }, ^~~~~~~~~~~~ raumfeld.c:391:3: error: the address of 'raumfeld_rotary_encoder_relative_axis' will always evaluate as 'true' [-Werror=address] DEV_PROP_U32, 1, &raumfeld_rotary_encoder_relative_axis, }, ^~~~~~~~~~~~ The problem appears to stem from relying on an old definition of 'struct property', but it has changed several times since the code could have last been correct. This changes the code to use the PROPERTY_ENTRY_INTEGER() macro instead, which works fine for the current definition and is a safer way of doing the initialization. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: a9e340dce3c3 ("Input: rotary_encoder - move away from platform data structure") Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2016-03-15	autofs4: fix string.h include in auto_dev-ioctl.h	Ian Kent	1	-5/+0
	Since including linux/string.h will now do the right thing remove the conditional check. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15	autofs4: use pr_xxx() macros directly for logging	Ian Kent	6	-75/+68
	Use the standard pr_xxx() log macros directly for log prints instead of the AUTOFS_XXX() macros. Signed-off-by: Ian Kent <ikent@redhat.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15	autofs4: change log print macros to not insert newline	Ian Kent	6	-49/+49
	Common kernel coding practice is to include the newline of log prints within the log text rather than hidden away in a macro. To avoid introducing inconsistencies as changes are made change the log macros to not include the newline. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15	autofs4: make autofs log prints consistent	Ian Kent	4	-12/+12
	Use the pr_*() print in AUTOFS_*() macros instead of printks and include the module name in log message macros. Also use the AUTOFS_*() macros everywhere instead of raw printks. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15	autofs4: fix some white space errors	Ian Kent	6	-10/+8
	Fix some white space format errors. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15	autofs4: fix invalid ioctl return in autofs4_root_ioctl_unlocked()	Ian Kent	1	-1/+1
	The return from an ioctl if an invalid ioctl is passed in should be EINVAL not ENOSYS. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15	autofs4: fix coding style line length in autofs4_wait()	Ian Kent	1	-2/+4
	The need for this is questionable but checkpatch.pl complains about the line length and it's a straightforward change. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15	autofs4: fix coding style problem in autofs4_get_set_timeout()	Ian Kent	1	-8/+20
	Refactor autofs4_get_set_timeout() to eliminate coding style error. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15	autofs4: coding style fixes	Ian Kent	12	-188/+190
	Try to make the coding style completely consistent throughout the autofs module and in line with kernel coding style recommendations. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15	autofs: show pipe inode in mount options	Stanislav Kinsburskiy	1	-1/+6
	This is required for CRIU (Checkpoint Restart In Userspace) to migrate a mount point when the write end in user space is closed. Below is a brief description of the problem. To migrate a non-catatonic autofs mount point, one has to restore the control pipe between the kernel and the autofs master process. One of the autofs masters is systemd, which closes the pipe write end after passing it to the kernel with the mount call. To be able to restore the systemd control pipe one has to know which read pipe end in systemd corresponds to the write pipe end in the kernel. The pipe "fd" in mount options is not enough because it was closed and probably replaced by some other descriptor. Thus, some other attribute is required to be able to find the read pipe end. The best attribute to use to find the correct pipe end is the inode number because it's unique for the whole system and can't be reused while the autofs mount exists. This attribute can also be used to recognize a situation where an autofs mount has no master (no process with specified "pgrp" or no file descriptor with "pipe_ino", specified in autofs mount options). Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15	kallsyms: add support for relative offsets in kallsyms address table	Ard Biesheuvel	5	-19/+126
	Similar to how relative extables are implemented, it is possible to emit the kallsyms table in such a way that it contains offsets relative to some anchor point in the kernel image rather than absolute addresses. On 64-bit architectures, it cuts the size of the kallsyms address table in half, since offsets between kernel symbols can typically be expressed in 32 bits. This saves several hundreds of kilobytes of permanent .rodata on average. In addition, the kallsyms address table is no longer subject to dynamic relocation when CONFIG_RELOCATABLE is in effect, so the relocation work done after decompression now doesn't have to do relocation updates for all these values. This saves up to 24 bytes (i.e., the size of an ELF64 RELA relocation table entry) per value, which easily adds up to a couple of megabytes of uncompressed __init data on ppc64 or arm64. Even if these relocation entries typically compress well, the combined size reduction of 2.8 MB uncompressed for a ppc64_defconfig build (of which 2.4 MB is __init data) results in a ~500 KB space saving in the compressed image. Since it is useful for some architectures (like x86) to retain the ability to emit absolute values as well, this patch also adds support for capturing both absolute and relative values when KALLSYMS_ABSOLUTE_PERCPU is in effect, by emitting absolute per-cpu addresses as positive 32-bit values, and addresses relative to the lowest encountered relative symbol as negative values, which are subtracted from the runtime address of this base symbol to produce the actual address. Support for the above is enabled by default for all architectures except IA-64 and Tile-GX, whose symbols are too far apart to capture in this manner. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Kees Cook <keescook@chromium.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
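The mixed absolute/relative encoding described in this commit message can be illustrated with a small sketch. It is not the kernel's implementation: encode_entry and decode_entry are hypothetical names, and the exact bias applied to negative entries (an extra -1, so that the base symbol itself still encodes as a negative value) is an assumption of this sketch.

```c
#include <stdint.h>

/*
 * Hypothetical encoder for one 32-bit table entry.
 * Absolute values (such as per-cpu offsets) are stored as non-negative
 * numbers; the caller must ensure they fit in 31 bits.
 * Relative symbols are stored as negative numbers derived from their
 * distance above `base`; that distance must also fit in 31 bits.
 */
static int32_t encode_entry(uint64_t addr, uint64_t base, int is_absolute)
{
	int64_t distance;

	if (is_absolute)
		return (int32_t)addr;

	distance = (int64_t)(addr - base);	/* 0 .. INT32_MAX */
	return (int32_t)(-1 - distance);	/* -1 .. INT32_MIN */
}

/* Inverse of encode_entry(), given the runtime address of the base symbol. */
static uint64_t decode_entry(int32_t entry, uint64_t base)
{
	if (entry >= 0)
		return (uint64_t)entry;		/* absolute value, used as is */

	return base + (uint64_t)(-1 - (int64_t)entry);
}
```

Each entry occupies 4 bytes instead of the 8 needed for a 64-bit address, which is where the halving of the table comes from. Because only the base is an address, only the base needs adjusting when the kernel is relocated.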
2016-03-15	kallsyms: don't overload absolute symbol type for percpu symbols	Ard Biesheuvel	1	-2/+12
	Commit c6bda7c988a5 ("kallsyms: fix percpu vars on x86-64 with relocation") overloaded the 'A' (absolute) symbol type to signify that a symbol is not subject to dynamic relocation. However, the original A type does not imply that at all, and depending on the version of the toolchain, many A type symbols are emitted that are in fact relative to the kernel text, i.e., if the kernel is relocated at runtime, these symbols should be updated as well. For instance, on sparc32, the following symbols are emitted as absolute (kindly provided by Guenter Roeck): f035a420 A _etext f03d9000 A _sdata f03de8c4 A jiffies f03f8860 A _edata f03fc000 A __init_begin f041bdc8 A __init_text_end f0423000 A __bss_start f0423000 A __init_end f044457d A __bss_stop f044457d A _end On x86_64, similar behavior can be observed: ffffffff81a00000 A __end_rodata_hpage_align ffffffff81b19000 A __vvar_page ffffffff81d3d000 A _end Even if only a couple of them pass the symbol range check that results in them being taken into account for the final kallsyms symbol table, it is obvious that 'A' does not mean the symbol does not need to be updated at relocation time, and overloading its meaning to signify that is perhaps not a good idea. So instead, add a new percpu_absolute member to struct sym_entry, and when --absolute-percpu is in effect, use it to record symbols whose addresses should be emitted as final values rather than values that still require relocation at runtime. That way, we can drop the check against the 'A' type. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Kees Cook <keescook@chromium.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michal Marek <mmarek@suse.cz> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15	x86: kallsyms: disable absolute percpu symbols on !SMP	Ard Biesheuvel	2	-1/+5
	scripts/kallsyms.c has a special --absolute-percpu command line option which deals with the zero based per cpu offsets that are used when building for SMP on x86_64. This means that the option should only be passed in that case, so add a Kconfig symbol with the correct predicate, and use that instead. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Kees Cook <keescook@chromium.org> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15	checkpatch: fix another left brace warning	Geyslan G. Bem	1	-1/+1
	This patch escapes a regex that uses left brace. Using checkpatch.pl with Perl 5.22.0 generates the warning: "Unescaped left brace in regex is deprecated, passed through in regex;" Comment from regcomp.c in Perl source: "Currently we don't warn when the lbrace is at the start of a construct. This catches it in the middle of a literal string, or when it's the first thing after something like "\b"." This works as a complement to 4e5d56bd ("checkpatch: fix left brace warning"). Signed-off-by: Geyslan G. Bem <geyslan@gmail.com> Signed-off-by: Joe Perches <joe@perches.com> Suggested-by: Peter Senna Tschudin <peter.senna@gmail.com> Cc: Eddie Kovsky <ewk@edkovsky.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15	checkpatch: improve UNSPECIFIED_INT test for bare signed/unsigned uses	Joe Perches	1	-4/+8
	Improve the test to allow casts to (unsigned) or (signed) to be found and fixed if desired. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15	checkpatch: warn on bare unsigned or signed declarations without int	Joe Perches	1	-0/+20
	Kernel style prefers "unsigned int <foo>" over "unsigned <foo>" and "signed int <foo>" over "signed <foo>". Emit a warning for these simple signed/unsigned <foo> declarations. Fix it too if desired. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15	checkpatch: exclude asm volatile from complex macro check	Joe Perches	1	-0/+3
	asm volatile and all its variants like __asm__ __volatile__ ("<foo>") are reported as errors with "Macros with complex values should be enclosed in parentheses". Make an exception for these asm volatile macro definitions by converting the "asm volatile" to "asm_volatile" so it appears as a single function call and the error isn't reported. Signed-off-by: Joe Perches <joe@perches.com> Reported-by: Jeff Merkey <linux.mdb@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15	mm: memcontrol: drop unnecessary lru locking from mem_cgroup_migrate()	Johannes Weiner	1	-2/+1
	Migration accounting in the memory controller used to have to handle both oldpage and newpage being on the LRU already; fuse's page cache replacement used to pass a recycled newpage that had been uncharged but not freed and removed from the LRU, and the memcg migration code used to uncharge oldpage to "pass on" the existing charge to newpage. Nowadays, pages are no longer uncharged when truncated from the page cache, but rather only at free time, so if a LRU page is recycled in page cache replacement it'll also still be charged. And we bail out of the charge transfer altogether in that case. Tell commit_charge() that we know newpage is not on the LRU, to avoid taking the zone->lru_lock unnecessarily from the migration path. But also, oldpage is no longer uncharged inside migration. We only use oldpage for its page->mem_cgroup and page size, so we don't care about its LRU state anymore either. Remove any mention from the kernel doc. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Suggested-by: Hugh Dickins <hughd@google.com> Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15	mm: migrate: consolidate mem_cgroup_migrate() calls	Johannes Weiner	1	-7/+2
	Rather than scattering mem_cgroup_migrate() calls all over the place, have a single call from a safe place where every migration operation eventually ends up in - migrate_page_copy(). Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Suggested-by: Hugh Dickins <hughd@google.com> Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15	mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous	Joonsoo Kim	7	-52/+105
	There is a report of a performance drop due to hugepage allocation, in which half of the CPU time is spent in pageblock_pfn_to_page() during compaction [1]. In that workload, compaction is triggered to make hugepages, but most pageblocks are unavailable for compaction due to pageblock type and skip bit, so compaction usually fails. The most costly operation in this case is finding a valid pageblock while scanning the whole zone range. To check whether a pageblock is valid to compact, a valid pfn within the pageblock is required, and we can obtain it by calling pageblock_pfn_to_page(). This function checks whether the pageblock is in a single zone and returns a valid pfn if possible. The problem is that we need to check this every time before scanning a pageblock, even if we re-visit it, and this turns out to be very expensive in this workload. Although we have no way to skip this pageblock check on systems where holes exist at arbitrary positions, we can use a cached value for zone contiguity and just do pfn_to_page() on systems where no hole exists. This optimization considerably speeds up the above workload. Before vs After: Max: 1096 MB/s vs 1325 MB/s; Min: 635 MB/s vs 1015 MB/s; Avg: 899 MB/s vs 1194 MB/s. Avg is improved by roughly 30% [2]. [1]: http://www.spinics.net/lists/linux-mm/msg97378.html [2]: https://lkml.org/lkml/2015/12/9/23 [akpm@linux-foundation.org: don't forget to restore zone->contiguous on error path, per Vlastimil] Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reported-by: Aaron Lu <aaron.lu@intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Aaron Lu <aaron.lu@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15	mm/compaction: pass only pageblock aligned range to pageblock_pfn_to_page	Joonsoo Kim	1	-11/+30
	pageblock_pfn_to_page() is used to check that there is a valid pfn and that all pages in the pageblock are in a single zone. If there is a hole in the pageblock, passing an arbitrary position to pageblock_pfn_to_page() could cause the whole pageblock scan to be skipped, instead of just skipping the hole page. For deterministic behaviour, it's better to always pass a pageblock aligned range to pageblock_pfn_to_page(). It will also help further optimization of pageblock_pfn_to_page() in the following patch. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Aaron Lu <aaron.lu@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15	mm/compaction: fix invalid free_pfn and compact_cached_free_pfn	Joonsoo Kim	1	-4/+5
	free_pfn and compact_cached_free_pfn are the pointers that remember the restart position of the freepage scanner. When they are reset or invalid, we set them to zone_end_pfn because the freepage scanner works in the reverse direction. But, because the zone range is defined as [zone_start_pfn, zone_end_pfn), zone_end_pfn is invalid to access. Therefore, we should not store it in free_pfn and compact_cached_free_pfn. Instead, we need to store zone_end_pfn - 1 in them. There is one more thing we should consider. The freepage scanner scans in reverse by pageblock unit. If free_pfn and compact_cached_free_pfn are set to the middle of a pageblock, the scanner treats that situation as if it had already scanned the front part of the pageblock, so we lose the opportunity to scan there. To fix this, the patch does round_down() to guarantee that the reset position will be pageblock aligned. Note that thanks to the current pageblock_pfn_to_page() implementation, actual access to zone_end_pfn doesn't happen until now. But the following patch will change pageblock_pfn_to_page(), so this patch is needed from now on. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
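The reset position described in this commit message comes down to a short calculation. The sketch below is illustrative only: PAGEBLOCK_NR_PAGES is an assumed value (the real pageblock size depends on architecture and configuration), and round_down_pow2 and reset_free_pfn are hypothetical helper names.

```c
/* Assumed pageblock size in pages; must be a power of two for this sketch. */
#define PAGEBLOCK_NR_PAGES 512UL

/* Round x down to a multiple of align, where align is a power of two. */
static unsigned long round_down_pow2(unsigned long x, unsigned long align)
{
	return x & ~(align - 1);
}

/*
 * Compute the restart position for a scanner that walks the zone backwards.
 * The zone covers [zone_start_pfn, zone_end_pfn), so the last valid pfn is
 * zone_end_pfn - 1; rounding that down yields the start of the last
 * pageblock. Assumes spanned_pages > 0.
 */
static unsigned long reset_free_pfn(unsigned long zone_start_pfn,
				    unsigned long spanned_pages)
{
	unsigned long zone_end_pfn = zone_start_pfn + spanned_pages;

	return round_down_pow2(zone_end_pfn - 1, PAGEBLOCK_NR_PAGES);
}
```

For a zone spanning pfns 0 to 1023, the result is 512, the first pfn of the last pageblock. Subtracting 1 before rounding matters when the zone ends exactly on a pageblock boundary: rounding zone_end_pfn itself would return 1024, which lies outside the zone.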
2016-03-15	mm/memblock.c: remove unnecessary memblock_type variable	Alexander Kuleshov	1	-6/+2
	We define struct memblock_type *type in the memblock_add_region() and memblock_reserve_region() functions only to pass it to the memblock_add_range() and memblock_reserve_range() functions. Let's remove these variables and pass the type directly. Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15	x86: also use debug_pagealloc_enabled() for free_init_pages	Christian Borntraeger	1	-14/+15
	We want to couple all debugging features with debug_pagealloc_enabled() and not with the config option CONFIG_DEBUG_PAGEALLOC. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Suggested-by: David Rientjes <rientjes@google.com> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Laura Abbott <labbott@fedoraproject.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15	s390: query dynamic DEBUG_PAGEALLOC setting	Christian Borntraeger	2	-9/+7
	We can use debug_pagealloc_enabled() to check if we can map the identity mapping with 1MB/2GB pages as well as to print the current setting in dump_stack. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: David Rientjes <rientjes@google.com> Cc: Laura Abbott <labbott@fedoraproject.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15	x86: query dynamic DEBUG_PAGEALLOC setting	Christian Borntraeger	3	-16/+10
	We can use debug_pagealloc_enabled() to check if we can map the identity mapping with 2MB pages. We can also add the state into the dump_stack output. The patch does not touch the code for the 1GB pages, which ignored CONFIG_DEBUG_PAGEALLOC. Do we need to fence this as well? Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: David Rientjes <rientjes@google.com> Cc: Laura Abbott <labbott@fedoraproject.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15	thp: cleanup split_huge_page()	Kirill A. Shutemov	1	-13/+7
	After one of the bugfixes to freeze_page(), we no longer have frozen pages in rmap, so the mapcount of all subpages of a frozen THP is zero. And we have an assert for that. Let's drop the code which deals with non-zero mapcount of subpages. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15	mm: use linear_page_index() in do_fault()	Matthew Wilcox	1	-2/+1
	do_fault() assumes that PAGE_SIZE is the same as PAGE_CACHE_SIZE. Use linear_page_index() to calculate pgoff in the correct units. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15	mm: remove unnecessary uses of lock_page_memcg()	Johannes Weiner	5	-20/+8
	There are several users that nest lock_page_memcg() inside lock_page() to prevent page->mem_cgroup from changing. But the page lock already prevents pages from moving between cgroups, so that is unnecessary overhead. Remove lock_page_memcg() in contexts where the page is already locked, and fix the debug code in the page stat functions to be okay with the page lock. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15	mm: simplify lock_page_memcg()	Johannes Weiner	12	-117/+88
	Now that migration doesn't clear page->mem_cgroup of live pages anymore, it's safe to make lock_page_memcg() and the memcg stat functions take pages, and spare the callers from memcg objects. [akpm@linux-foundation.org: fix warnings] Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Suggested-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15	mm: migrate: do not touch page->mem_cgroup of live pages	Johannes Weiner	6	-25/+20
	Changing a page's memcg association complicates dealing with the page, so we want to limit this as much as possible. Page migration, for example, does not have to do that. Just like page cache replacement, it can forcibly charge a replacement page, and then uncharge the old page when it gets freed. Temporarily overcharging the cgroup by a single page is not an issue in practice, and charging is so cheap nowadays that this is much preferable to the headache of messing with live pages. The only place that still changes the page->mem_cgroup binding of live pages is when pages move along with a task to another cgroup. But that path isolates the page from the LRU, takes the page lock, and takes the move lock (lock_page_memcg()). That means page->mem_cgroup is always stable in callers that have the page isolated from the LRU or locked. Lighter unlocked paths, like writeback accounting, can use lock_page_memcg(). [akpm@linux-foundation.org: fix build] [vdavydov@virtuozzo.com: fix lockdep splat] Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15	mm: workingset: per-cgroup cache thrash detection	Johannes Weiner	5	-57/+134
	Cache thrash detection (see a528910e12ec "mm: thrash detection-based file cache sizing" for details) currently only works on the system level, not inside cgroups. Worse, as the refaults are compared to the global number of active cache, cgroups might wrongfully get all their refaults activated when their pages are hotter than those of others. Move the refault machinery from the zone to the lruvec, and then tag eviction entries with the memcg ID. This makes the thrash detection work correctly inside cgroups. [sergey.senozhatsky@gmail.com: do not return from workingset_activation() with locked rcu and page] Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
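	Tagging an eviction entry with the memcg ID amounts to packing several small fields into one word. The following is a rough userspace sketch of that idea only: the field widths, `struct shadow_fields`, and the `*_sketch()` helpers are hypothetical, and the real shadow entry layout in mm/workingset.c may differ.

```c
#include <stdint.h>

/* Hypothetical field widths, chosen for illustration only. */
#define SKETCH_MEMCG_BITS 16
#define SKETCH_NODE_BITS   6
#define SKETCH_ZONE_BITS   2

#define SKETCH_MASK(bits) ((UINT64_C(1) << (bits)) - 1)

/* Fields carried by one (hypothetical) eviction entry. */
struct shadow_fields {
	uint64_t eviction;	/* eviction counter snapshot */
	unsigned int memcg_id;	/* ID of the cgroup the page belonged to */
	unsigned int node;	/* NUMA node index */
	unsigned int zone;	/* zone index within the node */
};

/* Pack the fields into one word; the eviction counter takes the high bits. */
static uint64_t pack_shadow_sketch(struct shadow_fields f)
{
	uint64_t v = f.eviction;

	v = (v << SKETCH_MEMCG_BITS) | (f.memcg_id & SKETCH_MASK(SKETCH_MEMCG_BITS));
	v = (v << SKETCH_NODE_BITS) | (f.node & SKETCH_MASK(SKETCH_NODE_BITS));
	v = (v << SKETCH_ZONE_BITS) | (f.zone & SKETCH_MASK(SKETCH_ZONE_BITS));
	return v;
}

/* Reverse of pack_shadow_sketch(): peel the fields off from the low bits. */
static struct shadow_fields unpack_shadow_sketch(uint64_t v)
{
	struct shadow_fields f;

	f.zone = (unsigned int)(v & SKETCH_MASK(SKETCH_ZONE_BITS));
	v >>= SKETCH_ZONE_BITS;
	f.node = (unsigned int)(v & SKETCH_MASK(SKETCH_NODE_BITS));
	v >>= SKETCH_NODE_BITS;
	f.memcg_id = (unsigned int)(v & SKETCH_MASK(SKETCH_MEMCG_BITS));
	v >>= SKETCH_MEMCG_BITS;
	f.eviction = v;
	return f;
}
```

	In this sketch the eviction counter loses its top 24 bits to the tag fields, so a refault can recover which cgroup the page was evicted from at the cost of a shorter counter range.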