linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2012-01-12	memcg: make mem_cgroup_split_huge_fixup() more efficient	KAMEZAWA Hiroyuki	1	-3/+2
	In split_huge_page(), mem_cgroup_split_huge_fixup() is called to handle page_cgroup modifcations. It takes move_lock_page_cgroup() and modifies page_cgroup and LRU accounting jobs and called HPAGE_PMD_SIZE - 1 times. But thinking again, - compound_lock() is held at move_accout...then, it's not necessary to take move_lock_page_cgroup(). - LRU is locked and all tail pages will go into the same LRU as head is now on. - page_cgroup is contiguous in huge page range. This patch fixes mem_cgroup_split_huge_fixup() as to be called once per hugepage and reduce costs for spliting. [akpm@linux-foundation.org: fix typo, per Michal] Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: Balbir Singh <bsingharora@gmail.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12	mm: memcg: remove unused node/section info from pc->flags	Johannes Weiner	1	-33/+0
	To find the page corresponding to a certain page_cgroup, the pc->flags encoded the node or section ID with the base array to compare the pc pointer to. Now that the per-memory cgroup LRU lists link page descriptors directly, there is no longer any code that knows the struct page_cgroup of a PFN but not the struct page. [hughd@google.com: remove unused node/section info from pc->flags fix] Signed-off-by: Johannes Weiner <jweiner@redhat.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Ying Han <yinghan@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12	mm: make per-memcg LRU lists exclusive	Johannes Weiner	3	-43/+30
	Now that all code that operated on global per-zone LRU lists is converted to operate on per-memory cgroup LRU lists instead, there is no reason to keep the double-LRU scheme around any longer. The pc->lru member is removed and page->lru is linked directly to the per-memory cgroup LRU lists, which removes two pointers from a descriptor that exists for every page frame in the system. Signed-off-by: Johannes Weiner <jweiner@redhat.com> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Ying Han <yinghan@google.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Greg Thelen <gthelen@google.com> Cc: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12	mm: collect LRU list heads into struct lruvec	Johannes Weiner	2	-5/+7
	Having a unified structure with a LRU list set for both global zones and per-memcg zones allows to keep that code simple which deals with LRU lists and does not care about the container itself. Once the per-memcg LRU lists directly link struct pages, the isolation function and all other list manipulations are shared between the memcg case and the global LRU case. Signed-off-by: Johannes Weiner <jweiner@redhat.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Ying Han <yinghan@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12	mm: move memcg hierarchy reclaim to generic reclaim code	Johannes Weiner	1	-0/+24
	Memory cgroup limit reclaim and traditional global pressure reclaim will soon share the same code to reclaim from a hierarchical tree of memory cgroups. In preparation of this, move the two right next to each other in shrink_zone(). The mem_cgroup_hierarchical_reclaim() polymath is split into a soft limit reclaim function, which still does hierarchy walking on its own, and a limit (shrinking) reclaim function, which relies on generic reclaim code to walk the hierarchy. Signed-off-by: Johannes Weiner <jweiner@redhat.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Ying Han <yinghan@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12	memcg: add mem_cgroup_replace_page_cache() to fix LRU issue	KAMEZAWA Hiroyuki	1	-0/+6
	Commit ef6a3c6311 ("mm: add replace_page_cache_page() function") added a function replace_page_cache_page(). This function replaces a page in the radix-tree with a new page. WHen doing this, memory cgroup needs to fix up the accounting information. memcg need to check PCG_USED bit etc. In some(many?) cases, 'newpage' is on LRU before calling replace_page_cache(). So, memcg's LRU accounting information should be fixed, too. This patch adds mem_cgroup_replace_page_cache() and removes the old hooks. In that function, old pages will be unaccounted without touching res_counter and new page will be accounted to the memcg (of old page). WHen overwriting pc->mem_cgroup of newpage, take zone->lru_lock and avoid races with LRU handling. Background: replace_page_cache_page() is called by FUSE code in its splice() handling. Here, 'newpage' is replacing oldpage but this newpage is not a newly allocated page and may be on LRU. LRU mis-accounting will be critical for memory cgroup because rmdir() checks the whole LRU is empty and there is no account leak. If a page is on the other LRU than it should be, rmdir() will fail. This bug was added in March 2011, but no bug report yet. I guess there are not many people who use memcg and FUSE at the same time with upstream kernels. The result of this bug is that admin cannot destroy a memcg because of account leak. So, no panic, no deadlock. And, even if an active cgroup exist, umount can succseed. So no problem at shutdown. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Miklos Szeredi <mszeredi@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12	epoll: limit paths	Jason Baron	2	-0/+2
	The current epoll code can be tickled to run basically indefinitely in both loop detection path check (on ep_insert()), and in the wakeup paths. The programs that tickle this behavior set up deeply linked networks of epoll file descriptors that cause the epoll algorithms to traverse them indefinitely. A couple of these sample programs have been previously posted in this thread: https://lkml.org/lkml/2011/2/25/297. To fix the loop detection path check algorithms, I simply keep track of the epoll nodes that have been already visited. Thus, the loop detection becomes proportional to the number of epoll file descriptor and links. This dramatically decreases the run-time of the loop check algorithm. In one diabolical case I tried it reduced the run-time from 15 mintues (all in kernel time) to .3 seconds. Fixing the wakeup paths could be done at wakeup time in a similar manner by keeping track of nodes that have already been visited, but the complexity is harder, since there can be multiple wakeups on different cpus...Thus, I've opted to limit the number of possible wakeup paths when the paths are created. This is accomplished, by noting that the end file descriptor points that are found during the loop detection pass (from the newly added link), are actually the sources for wakeup events. I keep a list of these file descriptors and limit the number and length of these paths that emanate from these 'source file descriptors'. In the current implemetation I allow 1000 paths of length 1, 500 of length 2, 100 of length 3, 50 of length 4 and 10 of length 5. Note that it is sufficient to check the 'source file descriptors' reachable from the newly added link, since no other 'source file descriptors' will have newly added links. This allows us to check only the wakeup paths that may have gotten too long, and not re-check all possible wakeup paths on the system. In terms of the path limit selection, I think its first worth noting that the most common case for epoll, is probably the model where you have 1 epoll file descriptor that is monitoring n number of 'source file descriptors'. In this case, each 'source file descriptor' has a 1 path of length 1. Thus, I believe that the limits I'm proposing are quite reasonable and in fact may be too generous. Thus, I'm hoping that the proposed limits will not prevent any workloads that currently work to fail. In terms of locking, I have extended the use of the 'epmutex' to all epoll_ctl add and remove operations. Currently its only used in a subset of the add paths. I need to hold the epmutex, so that we can correctly traverse a coherent graph, to check the number of paths. I believe that this additional locking is probably ok, since its in the setup/teardown paths, and doesn't affect the running paths, but it certainly is going to add some extra overhead. Also, worth noting is that the epmuex was recently added to the ep_ctl add operations in the initial path loop detection code using the argument that it was not on a critical path. Another thing to note here, is the length of epoll chains that is allowed. Currently, eventpoll.c defines: /* Maximum number of nesting allowed inside epoll sets */ #define EP_MAX_NESTS 4 This basically means that I am limited to a graph depth of 5 (EP_MAX_NESTS + 1). However, this limit is currently only enforced during the loop check detection code, and only when the epoll file descriptors are added in a certain order. Thus, this limit is currently easily bypassed. The newly added check for wakeup paths, stricly limits the wakeup paths to a length of 5, regardless of the order in which ep's are linked together. Thus, a side-effect of the new code is a more consistent enforcement of the graph depth. Thus far, I've tested this, using the sample programs previously mentioned, which now either return quickly or return -EINVAL. I've also testing using the piptest.c epoll tester, which showed no difference in performance. I've also created a number of different epoll networks and tested that they behave as expectded. I believe this solves the original diabolical test cases, while still preserving the sane epoll nesting. Signed-off-by: Jason Baron <jbaron@redhat.com> Cc: Nelson Elhage <nelhage@ksplice.com> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12	mm,slub,x86: decouple size of struct page from CONFIG_CMPXCHG_LOCAL	Heiko Carstens	1	-5/+4
	While implementing cmpxchg_double() on s390 I realized that we don't set CONFIG_CMPXCHG_LOCAL despite the fact that we have support for it. However setting that option will increase the size of struct page by eight bytes on 64 bit, which we certainly do not want. Also, it doesn't make sense that a present cpu feature should increase the size of struct page. Besides that it looks like the dependency to CMPXCHG_LOCAL is wrong and that it should depend on CMPXCHG_DOUBLE instead. This patch: If an architecture supports CMPXCHG_LOCAL this shouldn't result automatically in larger struct pages if the SLUB allocator is used. Instead introduce a new config option "HAVE_ALIGNED_STRUCT_PAGE" which can be selected if a double word aligned struct page is required. Also update x86 Kconfig so that it should work as before. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12	include/linux/linkage.h: remove unused ATTRIB_NORET macro	Joe Perches	1	-2/+0
	The uses have been renamed so delete the unused macro. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12	treewide: convert uses of ATTRIB_NORETURN to __noreturn	Joe Perches	1	-3/+3
	Use the more commonly used __noreturn instead of ATTRIB_NORETURN. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Joe Perches <joe@perches.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12	treewide: remove useless NORET_TYPE macro and uses	Joe Perches	3	-5/+4
	It's a very old and now unused prototype marking so just delete it. Neaten panic pointer argument style to keep checkpatch quiet. Signed-off-by: Joe Perches <joe@perches.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12	include/linux/linkage.h: remove unused NORET_AND macro	Joe Perches	1	-1/+0
	The only use in kernel.h is gone so remove the macro. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12	kernel.h: neaten panic prototype	Joe Perches	1	-2/+3
	Use __printf macro. Convert NORET_AND to ATTRIB_NORET. Use the normal kernel style for pointer arguments. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12	net_sched: sfq: add optional RED on top of SFQ	Eric Dumazet	2	-1/+22
	Adds an optional Random Early Detection on each SFQ flow queue. Traditional SFQ limits count of packets, while RED permits to also control number of bytes per flow, and adds ECN capability as well. 1) We dont handle the idle time management in this RED implementation, since each 'new flow' begins with a null qavg. We really want to address backlogged flows. 2) if headdrop is selected, we try to ecn mark first packet instead of currently enqueued packet. This gives faster feedback for tcp flows compared to traditional RED [ marking the last packet in queue ] Example of use : tc qdisc add dev $DEV parent 1:1 handle 10: est 1sec 4sec sfq \ limit 3000 headdrop flows 512 divisor 16384 \ redflowlimit 100000 min 8000 max 60000 probability 0.20 ecn qdisc sfq 10: parent 1:1 limit 3000p quantum 1514b depth 127 headdrop flows 512/16384 divisor 16384 ewma 6 min 8000b max 60000b probability 0.2 ecn prob_mark 0 prob_mark_head 4876 prob_drop 6131 forced_mark 0 forced_mark_head 0 forced_drop 0 Sent 1175211782 bytes 777537 pkt (dropped 6131, overlimits 11007 requeues 0) rate 99483Kbit 8219pps backlog 689392b 456p requeues 0 In this test, with 64 netperf TCP_STREAM sessions, 50% using ECN enabled flows, we can see number of packets CE marked is smaller than number of drops (for non ECN flows) If same test is run, without RED, we can check backlog is much bigger. qdisc sfq 10: parent 1:1 limit 3000p quantum 1514b depth 127 headdrop flows 512/16384 divisor 16384 Sent 1148683617 bytes 795006 pkt (dropped 0, overlimits 0 requeues 0) rate 98429Kbit 8521pps backlog 1221290b 841p requeues 0 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Stephen Hemminger <shemminger@vyatta.com> CC: Dave Taht <dave.taht@gmail.com> Tested-by: Dave Taht <dave.taht@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-12	phylib: introduce mdiobus_alloc_size()	Timur Tabi	1	-1/+6
	Introduce function mdiobus_alloc_size() as an alternative to mdiobus_alloc(). Most callers of mdiobus_alloc() also allocate a private data structure, and then manually point bus->priv to this object. mdiobus_alloc_size() combines the two operations into one, which simplifies memory management. The original mdiobus_alloc() now just calls mdiobus_alloc_size(0). Signed-off-by: Timur Tabi <timur@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-13	module_param: check that bool parameters really are bool.	Rusty Russell	1	-8/+2
	module_param(bool) used to counter-intuitively take an int. In fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy trick. This tightens the check (you'll get a warning about incompatible return type) but still allows it. Next kernel version, we'll remove it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13	module_param: make bool parameters really bool (drivers & misc)	Rusty Russell	5	-6/+6
	module_param(bool) used to counter-intuitively take an int. In fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy trick. It's time to remove the int/unsigned int option. For this version it'll simply give a warning, but it'll break next kernel version. Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13	module_param: make bool parameters really bool (core code)	Rusty Russell	1	-1/+2
	module_param(bool) used to counter-intuitively take an int. In fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy trick. It's time to remove the int/unsigned int option. For this version it'll simply give a warning, but it'll break next kernel version. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13	module_param: avoid bool abuse, add bint for special cases.	Rusty Russell	1	-0/+6
	For historical reasons, we allow module_param(bool) to take an int (or an unsigned int). That's going away. A few drivers really want an int: they set it to -1 and a parameter will set it to 0 or 1. This sucks: reading them from sysfs will give 'Y' for both -1 and 1, but if we change it to an int, then the users might be broken (if they did "param" instead of "param=1"). Use a new 'bint' parser for them. (ntfs has a different problem: it needs an int for debug_msgs because it's also exposed via sysctl.) Cc: Steve Glendinning <steve.glendinning@smsc.com> Cc: Jean Delvare <khali@linux-fr.org> Cc: Guenter Roeck <guenter.roeck@ericsson.com> Cc: Hoang-Nam Nguyen <hnguyen@de.ibm.com> Cc: Christoph Raisch <raisch@de.ibm.com> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: linux390@de.ibm.com Cc: Anton Altaparmakov <anton@tuxera.com> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Takashi Iwai <tiwai@suse.de> Cc: lm-sensors@lm-sensors.org Cc: linux-rdma@vger.kernel.org Cc: linux-s390@vger.kernel.org Cc: linux-ntfs-dev@lists.sourceforge.net Cc: alsa-devel@alsa-project.org Acked-by: Takashi Iwai <tiwai@suse.de> (For the sound part) Acked-by: Guenter Roeck <guenter.roeck@ericsson.com> (For the hwmon driver) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13	module_param: check type correctness for module_param_array	Rusty Russell	1	-0/+1
	module_param_array(), unlike its non-array cousins, didn't check the type of the variable. Fixing this found two bugs. Cc: Luca Risolia <luca.risolia@studio.unibo.it> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: Eric Piel <eric.piel@tremplin-utc.net> Cc: linux-media@vger.kernel.org Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13	module: struct module_ref should contains long fields	Eric Dumazet	1	-5/+16
	module_ref contains two "unsigned int" fields. Thats now too small, since some machines can open more than 2^32 files. Check commit 518de9b39e8 (fs: allow for more than 2^31 files) for reference. We can add an aligned(2 * sizeof(unsigned long)) attribute to force alloc_percpu() allocating module_ref areas in single cache lines. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Rusty Russell <rusty@rustcorp.com.au> CC: Tejun Heo <tj@kernel.org> CC: Robin Holt <holt@sgi.com> CC: David Miller <davem@davemloft.net> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-12	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse	Linus Torvalds	1	-1/+15
	* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: FUSE: Notifying the kernel of deletion. fuse: support ioctl on directories fuse: Use kcalloc instead of kzalloc to allocate array fuse: llseek optimize SEEK_CUR and SEEK_SET
2012-01-12	Merge tag 'to-linus' of git://github.com/rustyrussell/linux	Linus Torvalds	2	-61/+15
	* tag 'to-linus' of git://github.com/rustyrussell/linux: (24 commits) lguest: Make sure interrupt is allocated ok by lguest_setup_irq lguest: move the lguest tool to the tools directory lguest: switch segment-voodoo-numbers to readable symbols virtio: balloon: Add freeze, restore handlers to support S4 virtio: balloon: Move vq initialization into separate function virtio: net: Add freeze, restore handlers to support S4 virtio: net: Move vq and vq buf removal into separate function virtio: net: Move vq initialization into separate function virtio: blk: Add freeze, restore handlers to support S4 virtio: blk: Move vq initialization to separate function virtio: console: Disable callbacks for virtqueues at start of S4 freeze virtio: console: Add freeze and restore handlers to support S4 virtio: console: Move vq and vq buf removal into separate functions virtio: pci: add PM notification handlers for restore, freeze, thaw, poweroff virtio: pci: switch to new PM API virtio_blk: fix config handler race virtio: add debugging if driver doesn't kick. virtio: expose added descriptors immediately. virtio: avoid modulus operation. virtio: support unlocked queue kick ...
2012-01-12	mmc: host: Adds support for eMMC 4.5 HS200 mode	Girish K S	1	-0/+1
	This patch adds support for the HS200 mode on the host side. Also enables the tuning feature required when the HS200 mode is selected. Signed-off-by: Girish K S <girish.shivananjappa@linaro.org> Signed-off-by: Chris Ball <cjb@laptop.org>
2012-01-12	mmc: core: HS200 mode support for eMMC 4.5	Girish K S	3	-2/+78
	This patch adds the support of the HS200 bus speed for eMMC 4.5 devices. The eMMC 4.5 devices have support for 200MHz bus speed. The function prototype of the tuning function is modified to handle the tuning command number which is different in sd and mmc case. Signed-off-by: Girish K S <girish.shivananjappa@linaro.org> Signed-off-by: Philip Rakity <prakity@marvell.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2012-01-12	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless	David S. Miller	1	-2/+2

2012-01-12	vfs: export symbol d_find_any_alias()	Sage Weil	1	-0/+1
	Ceph needs this. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sage Weil <sage@newdream.net>
2012-01-12	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound	Linus Torvalds	16	-71/+960
	* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (526 commits) ASoC: twl6040 - Add method to query optimum PDM_DL1 gain ALSA: hda - Fix the lost power-setup of seconary pins after PM resume ALSA: usb-audio: add Yamaha MOX6/MOX8 support ALSA: virtuoso: add S/PDIF input support for all Xonars ALSA: ice1724 - Support for ooAoo SQ210a ALSA: ice1724 - Allow card info based on model only ALSA: ice1724 - Create capture pcm only for ADC-enabled configurations ALSA: hdspm - Provide unique driver id based on card serial ASoC: Dynamically allocate the rtd device for a non-empty release() ASoC: Fix recursive dependency due to select ATMEL_SSC in SND_ATMEL_SOC_SSC ALSA: hda - Fix the detection of "Loopback Mixing" control for VIA codecs ALSA: hda - Return the error from get_wcaps_type() for invalid NIDs ALSA: hda - Use auto-parser for HP laptops with cx20459 codec ALSA: asihpi - Fix potential Oops in snd_asihpi_cmode_info() ALSA: hdsp - Fix potential Oops in snd_hdsp_info_pref_sync_ref() ALSA: hda/cirrus - support for iMac12,2 model ASoC: cx20442: add bias control over a platform provided regulator ALSA: usb-audio - Avoid flood of frame-active debug messages ALSA: snd-usb-us122l: Delete calls to preempt_disable mfd: Put WM8994 into cache only mode when suspending ... Fix up trivial conflicts in: - arch/arm/mach-s3c64xx/mach-crag6410.c: renamed speyside_wm8962 to tobermory, added littlemill right next to it - drivers/base/regmap/{regcache.c,regmap.c}: duplicate diff that had already come in with other changes in the regmap tree
2012-01-12	Merge branch 'topic/hda' into for-linus	Takashi Iwai	1	-0/+8

2012-01-12	Merge branch 'topic/misc' into for-linus	Takashi Iwai	6	-1/+731

2012-01-12	blockdev: convert some macros to static inlines	Stephen Rothwell	1	-13/+64
	We prefer to program in C rather than preprocessor and it fixes this warning when CONFIG_BLK_DEV_INTEGRITY is not set: drivers/md/dm-table.c: In function 'dm_table_set_integrity': drivers/md/dm-table.c:1285:3: warning: statement with no effect [-Wunused-value] Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-01-11	Merge tag 'rmobile-for-linus' of git://github.com/pmundt/linux-sh	Linus Torvalds	1	-1/+10
	SH/R-Mobile updates for 3.3 merge window. * tag 'rmobile-for-linus' of git://github.com/pmundt/linux-sh: (32 commits) arm: mach-shmobile: add a resource name for shdma ARM: mach-shmobile: r8a7779 SMP support V3 ARM: mach-shmobile: Add kota2 defconfig. ARM: mach-shmobile: Add marzen defconfig. ARM: mach-shmobile: r8a7779 power domain support V2 ARM: mach-shmobile: Fix up marzen build for recent GIC changes. ARM: mach-shmobile: r8a7779 PFC function support ARM: mach-shmobile: Flush caches in platform_cpu_die() ARM: mach-shmobile: Allow SoC specific CPU kill code ARM: mach-shmobile: Fix headsmp.S code to use CPUINIT ARM: mach-shmobile: clock-r8a7779: clkz/clkzs support ARM: mach-shmobile: clock-r8a7779: add DIV4 clock support ARM: mach-shmobile: Marzen LAN89218 support ARM: mach-shmobile: Marzen SCIF2/SCIF4 support ARM: mach-shmobile: r8a7779 PFC GPIO-only support V2 ARM: mach-shmobile: r8a7779 and Marzen base support V2 sh: pfc: Unlock register support sh: pfc: Variable bitfield width config register support sh: pfc: Add config_reg_helper() function sh: pfc: Convert index to field and value pair ...
2012-01-11	Merge tag 'sh-for-linus' of git://github.com/pmundt/linux-sh	Linus Torvalds	3	-3/+40
	SuperH updates for 3.3 merge window. * tag 'sh-for-linus' of git://github.com/pmundt/linux-sh: (38 commits) sh: magicpanelr2: Update for parse_mtd_partitions() fallout. sh: mach-rsk: Update for parse_mtd_partitions() fallout. sh: sh2a: Improve cache flush/invalidate functions sh: also without PM_RUNTIME pm_runtime.o must be built sh: add a resource name for shdma sh: Remove redundant try_to_freeze() invocations. sh: Ensure IRQs are enabled across do_notify_resume(). sh: Fix up store queue code for subsys_interface changes. sh: clkfwk: sh_clk_init_parent() should be called after clk_register() sh: add platform_device for renesas_usbhs in board-sh7757lcr sh: modify clock-sh7757 for renesas_usbhs sh: pfc: ioremap() support sh: use ioread32/iowrite32 and mapped_reg for div6 sh: use ioread32/iowrite32 and mapped_reg for div4 sh: use ioread32/iowrite32 and mapped_reg for mstp32 sh: extend clock struct with mapped_reg member sh: clkfwk: clock-sh73a0: all div6_clks use SH_CLK_DIV6_EXT() sh: clkfwk: clock-sh7724: all div6_clks use SH_CLK_DIV6_EXT() sh: clock-sh7723: add CLKDEV_ICK_ID for cleanup serial: sh-sci: Handle GPIO function requests. ...
2012-01-12	virtio: pci: add PM notification handlers for restore, freeze, thaw, poweroff	Amit Shah	1	-0/+5
	Handle thaw, restore and freeze notifications from the PM core. Expose these to individual virtio drivers that can quiesce and resume vq operations. For drivers not implementing the thaw() method, use the restore method instead. These functions also save device-specific data so that the device can be put in pre-suspend state after resume, and disable and enable the PCI device in the freeze and resume functions, respectively. Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-12	virtio: support unlocked queue kick	Rusty Russell	1	-0/+4
	Based on patch by Christoph for virtio_blk speedup: Split virtqueue_kick to be able to do the actual notification outside the lock protecting the virtqueue. This patch was originally done by Stefan Hajnoczi, but I can't find the original one anymore and had to recreated it from memory. Pointers to the original or corrections for the commit message are welcome. Stefan's patch was here: https://github.com/stefanha/linux/commit/a6d06644e3a58e57a774e77d7dc34c4a5a2e7496 http://www.spinics.net/lists/linux-virtualization/msg14616.html Third time's the charm! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-12	virtio: rename virtqueue_add_buf_gfp to virtqueue_add_buf	Rusty Russell	1	-15/+6
	Remove wrapper functions. This makes the allocation type explicit in all callers; I used GPF_KERNEL where it seemed obvious, left it at GFP_ATOMIC otherwise. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Reviewed-by: Christoph Hellwig <hch@lst.de>
2012-01-12	virtio: document functions better.	Rusty Russell	1	-47/+0
	The old documentation is left over from when we used a structure with strategy pointers. And move the documentation to the C file as per kernel practice. Though I disagree... Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Reviewed-by: Christoph Hellwig <hch@lst.de>
2012-01-12	virtio: harsher barriers for rpmsg.	Rusty Russell	1	-0/+1
	We were cheating with our barriers; using the smp ones rather than the real device ones. That was fine, until rpmsg came along, which is used to talk to a real device (a non-SMP CPU). Unfortunately, just putting back the real barriers (reverting d57ed95d) causes a performance regression on virtio-pci. In particular, Amos reports netbench's TCP_RR over virtio_net CPU utilization increased up to 35% while throughput went down by up to 14%. By comparison, this branch is in the noise. Reference: https://lkml.org/lkml/2011/12/11/22 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-11	mmc: core: Add option to prevent eMMC sleep command	Ulf Hansson	1	-0/+1
	Host may now use MMC_CAP2_NO_SLEEP_CMD to disable the use of eMMC sleep/awake command. This option can be used when your platform has a buggy kernel crash dump software, which is supposed to store the dump on the eMMC, but is not able to wake up the eMMC from sleep state. In particular, failures have been seen with u-boot; even if it is fixed there, platforms will be slow to update their bootloader binaries. Signed-off-by: Ulf Hansson <ulf.hansson@stericsson.com> Reviewed-by: Hanumath Prasad <hanumath.prasad@stericsson.com> Reviewed-by: Srinidhi Kasagar <srinidhi.kasagar@stericsson.com> Acked-by: Subhash Jadavani <subhashj@codeaurora.org> Signed-off-by: Chris Ball <cjb@laptop.org>
2012-01-11	mmc: sdhci-pci: remove SDHCI_QUIRK2_OWN_CARD_DETECTION	Adrian Hunter	1	-2/+0
	Even if a driver provides separate card detection, an interrupt is still needed to abort mmc requests that are in progress. SDHCI_QUIRK2_OWN_CARD_DETECTION prevents that, so remove it. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2012-01-11	mmc: sdhci-pci: add platform data	Adrian Hunter	1	-0/+18
	Add a means of getting platform data for the SDHCI PCI devices. The data is stored against the slot not the device in order to support multi-slot devices. The data allows platform-specific setup (such as getting GPIO numbers from firmware or setting up wl12xx for SDIO) to be done in platform support files instead of the sdhci-pci driver. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2012-01-11	mmc: add a generic GPIO card-detect helper	Guennadi Liakhovetski	1	-0/+19
	This patch adds a primitive helper to support card hotplug detection on platforms, where a GPIO, capable of producing interrupts, is used for detection of card-insertion and -removal events. Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Signed-off-by: Chris Ball <cjb@laptop.org>
2012-01-11	mmc: add a card hotplug handler context	Guennadi Liakhovetski	1	-0/+6
	SD/MMC controllers provide different card insertion and removal detection methods. On some of them the controller itself issues an interrupt, on others polling is used, on yet others auxiliary means are used for this purpose, e.g., a GPIO IRQ. Further, on some systems one of those methods can be chosen at driver probing time and configured in software. E.g., on some systems the SD/MMC controller card hot-plug detection pin can be configured either as a respective controller functions, or an IRQ-capable GPIO. To support such flexible configurations a card hot-plug context is added by this patch. Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Signed-off-by: Chris Ball <cjb@laptop.org>
2012-01-11	mmc: dw_mmc: Add more capabilities field	Seungwon Jeon	1	-0/+1
	This patch adds another capabilities field for MMC_CAPS2_XXX. Signed-off-by: Seungwon Jeon <tgih.jun@samsung.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2012-01-11	mmc: boot partition ro lock support	Johan Rudholm	2	-1/+15
	Enable boot partitions to be read-only locked until next power on via a sysfs entry. There will be one sysfs entry for each boot partition: /sys/block/mmcblkXbootY/ro_lock_until_next_power_on Each boot partition is locked by writing 1 to its file. Signed-off-by: Johan Rudholm <johan.rudholm@stericsson.com> Signed-off-by: John Beckett <john.beckett@stericsson.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2012-01-11	mmc: allow upper layers to know immediately if card has been removed	Adrian Hunter	3	-0/+6
	Add a function mmc_detect_card_removed() which upper layers can use to determine immediately if a card has been removed. This function should be called after an I/O request fails so that all queued I/O requests can be errored out immediately instead of waiting for the card device to be removed. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Sujit Reddy Thumma <sthumma@codeaurora.org> Signed-off-by: Chris Ball <cjb@laptop.org>
2012-01-11	mmc: mmci: add capabilities2 for MMC_CAP2	Per Forlin	1	-0/+2
	Signed-off-by: Per Forlin <per.forlin@stericsson.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2012-01-11	mmc: sdio: support SDIO UHS cards	Philip Rakity	2	-3/+30
	This patch adds support for sdio UHS cards per the version 3.0 spec. UHS mode is only enabled for version 3.0 cards when both the host and the controller support UHS modes. 1.8v signaling support is removed if both the card and the host do not support UHS. This is done to maintain compatibility and some system/card combinations break when 1.8v signaling is enabled when the host does not support UHS. Signed-off-by: Philip Rakity <prakity@marvell.com> Signed-off-by: Aaron Lu <Aaron.lu@amd.com> Reviewed-by: Arindam Nath <arindam.nath@amd.com> Tested-by: Bing Zhao <bzhao@marvell.com> Signed-off-by: Chris Ball <cjb@laptop.org>
2012-01-11	mmc: core: Use delayed work in clock gating framework	Sujit Reddy Thumma	1	-1/+3
	Current clock gating framework disables the MCI clock as soon as the request is completed and enables it when a request arrives. This aggressive clock gating framework, when enabled, cause following issues: When there are back-to-back requests from the Queue layer, we unnecessarily end up disabling and enabling the clocks between these requests since 8MCLK clock cycles is a very short duration compared to the time delay between back to back requests reaching the MMC layer. This overhead can effect the overall performance depending on how long the clock enable and disable calls take which is platform dependent. For example on some platforms we can have clock control not on the local processor, but on a different subsystem and the time taken to perform the clock enable/disable can add significant overhead. Also if the host controller driver decides to disable the host clock too when mmc_set_ios function is called with ios.clock=0, it adds additional delay and it is highly possible that the next request had already arrived and unnecessarily blocked in enabling the clocks. This is seen frequently when the processor is executing at high speeds and in multi-core platforms thus reduces the overall throughput compared to if clock gating is disabled. Fix this by delaying turning off the clocks by posting request on delayed workqueue. Also cancel the unscheduled pending work, if any, when there is access to card. sysfs entry is provided to tune the delay as needed, default value set to 200ms. Signed-off-by: Sujit Reddy Thumma <sthumma@codeaurora.org> Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Chris Ball <cjb@laptop.org>
2012-01-11	mmc: debugfs: expose the SDCLK frq in sys ios	Giuseppe CAVALLARO	1	-0/+2
	This patch is to expose the actual SDCLK frequency in /sys/kernel/debug/mmcX/ios entry. For example, if the max clk for a normal speed card is 20MHz this is reported in /sys/kernel/debug/mmcX/ios. Unfortunately the actual SDCLK frequency (i.e. Baseclock / divisor) is not reported at all: for example, in that case, on Arasan HC, it should be 48/4=12 (MHz). Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Chris Ball <cjb@laptop.org>