linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2012-12-12	thp: implement splitting pmd for huge zero page	Kirill A. Shutemov	1	-1/+42
	We can't split huge zero page itself (and it's bug if we try), but we can split the pmd which points to it. On splitting the pmd we create a table with all ptes set to normal zero page. [akpm@linux-foundation.org: fix build error] Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12	thp: change split_huge_page_pmd() interface	Kirill A. Shutemov	10	-16/+37
	Pass vma instead of mm and add address parameter. In most cases we already have vma on the stack. We provides split_huge_page_pmd_mm() for few cases when we have mm, but not vma. This change is preparation to huge zero pmd splitting implementation. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12	thp: change_huge_pmd(): make sure we don't try to make a page writable	Kirill A. Shutemov	1	-0/+1
	mprotect core never tries to make page writable using change_huge_pmd(). Let's add an assert that the assumption is true. It's important to be sure we will not make huge zero page writable. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12	thp: do_huge_pmd_wp_page(): handle huge zero page	Kirill A. Shutemov	3	-22/+104
	On write access to huge zero page we alloc a new huge page and clear it. If ENOMEM, graceful fallback: we create a new pmd table and set pte around fault address to newly allocated normal (4k) page. All other ptes in the pmd set to normal zero page. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12	thp: copy_huge_pmd(): copy huge zero page	Kirill A. Shutemov	1	-0/+22
	It's easy to copy huge zero page. Just set destination pmd to huge zero page. It's safe to copy huge zero page since we have none yet :-p [rientjes@google.com: fix comment] Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12	thp: zap_huge_pmd(): zap huge zero pmd	Kirill A. Shutemov	1	-8/+13
	We don't have a mapped page to zap in huge zero page case. Let's just clear pmd and remove it from tlb. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12	thp: huge zero page: basic preparation	Kirill A. Shutemov	1	-0/+30
	During testing I noticed big (up to 2.5 times) memory consumption overhead on some workloads (e.g. ft.A from NPB) if THP is enabled. The main reason for that big difference is lacking zero page in THP case. We have to allocate a real page on read page fault. A program to demonstrate the issue: #include <assert.h> #include <stdlib.h> #include <unistd.h> #define MB 10241024 int main(int argc, char argv) { char p; int i; posix_memalign((void *)&p, 2 MB, 200 * MB); for (i = 0; i < 200 * MB; i+= 4096) assert(p[i] == 0); pause(); return 0; } With thp-never RSS is about 400k, but with thp-always it's 200M. After the patcheset thp-always RSS is 400k too. Design overview. Huge zero page (hzp) is a non-movable huge page (2M on x86-64) filled with zeros. The way how we allocate it changes in the patchset: - [01/10] simplest way: hzp allocated on boot time in hugepage_init(); - [09/10] lazy allocation on first use; - [10/10] lockless refcounting + shrinker-reclaimable hzp; We setup it in do_huge_pmd_anonymous_page() if area around fault address is suitable for THP and we've got read page fault. If we fail to setup hzp (ENOMEM) we fallback to handle_pte_fault() as we normally do in THP. On wp fault to hzp we allocate real memory for the huge page and clear it. If ENOMEM, graceful fallback: we create a new pmd table and set pte around fault address to newly allocated normal (4k) page. All other ptes in the pmd set to normal zero page. We cannot split hzp (and it's bug if we try), but we can split the pmd which points to it. On splitting the pmd we create a table with all ptes set to normal zero page. === By hpa's request I've tried alternative approach for hzp implementation (see Virtual huge zero page patchset): pmd table with all entries set to zero page. This way should be more cache friendly, but it increases TLB pressure. The problem with virtual huge zero page: it requires per-arch enabling. We need a way to mark that pmd table has all ptes set to zero page. Some numbers to compare two implementations (on 4s Westmere-EX): Mirobenchmark1 ============== test: posix_memalign((void *)&p, 2 MB, 8 * GB); for (i = 0; i < 100; i++) { assert(memcmp(p, p + 4GB, 4GB) == 0); asm volatile ("": : :"memory"); } hzp: Performance counter stats for './test_memcmp' (5 runs): 32356.272845 task-clock # 0.998 CPUs utilized ( +- 0.13% ) 40 context-switches # 0.001 K/sec ( +- 0.94% ) 0 CPU-migrations # 0.000 K/sec 4,218 page-faults # 0.130 K/sec ( +- 0.00% ) 76,712,481,765 cycles # 2.371 GHz ( +- 0.13% ) [83.31%] 36,279,577,636 stalled-cycles-frontend # 47.29% frontend cycles idle ( +- 0.28% ) [83.35%] 1,684,049,110 stalled-cycles-backend # 2.20% backend cycles idle ( +- 2.96% ) [66.67%] 134,355,715,816 instructions # 1.75 insns per cycle # 0.27 stalled cycles per insn ( +- 0.10% ) [83.35%] 13,526,169,702 branches # 418.039 M/sec ( +- 0.10% ) [83.31%] 1,058,230 branch-misses # 0.01% of all branches ( +- 0.91% ) [83.36%] 32.413866442 seconds time elapsed ( +- 0.13% ) vhzp: Performance counter stats for './test_memcmp' (5 runs): 30327.183829 task-clock # 0.998 CPUs utilized ( +- 0.13% ) 38 context-switches # 0.001 K/sec ( +- 1.53% ) 0 CPU-migrations # 0.000 K/sec 4,218 page-faults # 0.139 K/sec ( +- 0.01% ) 71,964,773,660 cycles # 2.373 GHz ( +- 0.13% ) [83.35%] 31,191,284,231 stalled-cycles-frontend # 43.34% frontend cycles idle ( +- 0.40% ) [83.32%] 773,484,474 stalled-cycles-backend # 1.07% backend cycles idle ( +- 6.61% ) [66.67%] 134,982,215,437 instructions # 1.88 insns per cycle # 0.23 stalled cycles per insn ( +- 0.11% ) [83.32%] 13,509,150,683 branches # 445.447 M/sec ( +- 0.11% ) [83.34%] 1,017,667 branch-misses # 0.01% of all branches ( +- 1.07% ) [83.32%] 30.381324695 seconds time elapsed ( +- 0.13% ) Mirobenchmark2 ============== test: posix_memalign((void *)&p, 2 MB, 8 * GB); for (i = 0; i < 1000; i++) { char _p = p; while (_p < p+4GB) { assert(_p == (_p+4*GB)); _p += 4096; asm volatile ("": : :"memory"); } } hzp: Performance counter stats for 'taskset -c 0 ./test_memcmp2' (5 runs): 3505.727639 task-clock # 0.998 CPUs utilized ( +- 0.26% ) 9 context-switches # 0.003 K/sec ( +- 4.97% ) 4,384 page-faults # 0.001 M/sec ( +- 0.00% ) 8,318,482,466 cycles # 2.373 GHz ( +- 0.26% ) [33.31%] 5,134,318,786 stalled-cycles-frontend # 61.72% frontend cycles idle ( +- 0.42% ) [33.32%] 2,193,266,208 stalled-cycles-backend # 26.37% backend cycles idle ( +- 5.51% ) [33.33%] 9,494,670,537 instructions # 1.14 insns per cycle # 0.54 stalled cycles per insn ( +- 0.13% ) [41.68%] 2,108,522,738 branches # 601.451 M/sec ( +- 0.09% ) [41.68%] 158,746 branch-misses # 0.01% of all branches ( +- 1.60% ) [41.71%] 3,168,102,115 L1-dcache-loads # 903.693 M/sec ( +- 0.11% ) [41.70%] 1,048,710,998 L1-dcache-misses # 33.10% of all L1-dcache hits ( +- 0.11% ) [41.72%] 1,047,699,685 LLC-load # 298.854 M/sec ( +- 0.03% ) [33.38%] 2,287 LLC-misses # 0.00% of all LL-cache hits ( +- 8.27% ) [33.37%] 3,166,187,367 dTLB-loads # 903.147 M/sec ( +- 0.02% ) [33.35%] 4,266,538 dTLB-misses # 0.13% of all dTLB cache hits ( +- 0.03% ) [33.33%] 3.513339813 seconds time elapsed ( +- 0.26% ) vhzp: Performance counter stats for 'taskset -c 0 ./test_memcmp2' (5 runs): 27313.891128 task-clock # 0.998 CPUs utilized ( +- 0.24% ) 62 context-switches # 0.002 K/sec ( +- 0.61% ) 4,384 page-faults # 0.160 K/sec ( +- 0.01% ) 64,747,374,606 cycles # 2.370 GHz ( +- 0.24% ) [33.33%] 61,341,580,278 stalled-cycles-frontend # 94.74% frontend cycles idle ( +- 0.26% ) [33.33%] 56,702,237,511 stalled-cycles-backend # 87.57% backend cycles idle ( +- 0.07% ) [33.33%] 10,033,724,846 instructions # 0.15 insns per cycle # 6.11 stalled cycles per insn ( +- 0.09% ) [41.65%] 2,190,424,932 branches # 80.195 M/sec ( +- 0.12% ) [41.66%] 1,028,630 branch-misses # 0.05% of all branches ( +- 1.50% ) [41.66%] 3,302,006,540 L1-dcache-loads # 120.891 M/sec ( +- 0.11% ) [41.68%] 271,374,358 L1-dcache-misses # 8.22% of all L1-dcache hits ( +- 0.04% ) [41.66%] 20,385,476 LLC-load # 0.746 M/sec ( +- 1.64% ) [33.34%] 76,754 LLC-misses # 0.38% of all LL-cache hits ( +- 2.35% ) [33.34%] 3,309,927,290 dTLB-loads # 121.181 M/sec ( +- 0.03% ) [33.34%] 2,098,967,427 dTLB-misses # 63.41% of all dTLB cache hits ( +- 0.03% ) [33.34%] 27.364448741 seconds time elapsed ( +- 0.24% ) === I personally prefer implementation present in this patchset. It doesn't touch arch-specific code. This patch: Huge zero page (hzp) is a non-movable huge page (2M on x86-64) filled with zeros. For now let's allocate the page on hugepage_init(). We'll switch to lazy allocation later. We are not going to map the huge zero page until we can handle it properly on all code paths. is_huge_zero_{pfn,pmd}() functions will be used by following patches to check whether the pfn/pmd is huge zero page. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12	bootmem: remove alloc_arch_preferred_bootmem()	Joonsoo Kim	1	-16/+4
	The name of this function is not suitable, and removing the function and open-coding it into each call sites makes the code more understandable. Additionally, we shouldn't do an allocation from bootmem when slab_is_available(), so directly return kmalloc()'s return value. Signed-off-by: Joonsoo Kim <js1304@gmail.com> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12	bootmem: remove not implemented function call, bootmem_arch_preferred_node()	Joonsoo Kim	1	-12/+0
	There is no implementation of bootmem_arch_preferred_node() and a call to this function will cause a compilation error. So remove it. Signed-off-by: Joonsoo Kim <js1304@gmail.com> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12	net/mlx4_en: Add support for destination MAC in steering rules	Yan Burman	1	-6/+15
	Implement destination MAC rule extension for L3/L4 rules in flow steering. Usefull for vSwitch/macvlan configurations. Signed-off-by: Yan Burman <yanb@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-12	net/mlx4_en: Use generic etherdevice.h functions.	Yan Burman	1	-4/+2
	Get rid of full_mac, zero_mac in favour of is_zero_ether_addr and is_broadcast_ether_addr. Signed-off-by: Yan Burman <yanb@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-12	net: ethtool: Add destination MAC address to flow steering API	Yan Burman	1	-4/+7
	Add ability to specify destination MAC address for L3/L4 flow spec in order to be able to specify action for different VM's under vSwitch configuration. This change is transparent to older userspace. Signed-off-by: Yan Burman <yanb@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-12	bridge: add support of adding and deleting mdb entries	Cong Wang	4	-29/+297
	This patch implents adding/deleting mdb entries via netlink. Currently all entries are temp, we probably need a flag to distinguish permanent entries too. Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-12	bridge: notify mdb changes via netlink	Cong Wang	4	-0/+90
	As Stephen mentioned, we need to monitor the mdb changes in user-space, so add notifications via netlink too. Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-12	ndisc: Unexport ndisc_{build,send}_skb().	YOSHIFUJI Hideaki	2	-31/+11
	These symbols were exported for bonding device by commit 305d552a ("bonding: send IPv6 neighbor advertisement on failover"). It bacame obsolete by commit 7c899432 ("bonding, ipv4, ipv6, vlan: Handle NETDEV_BONDING_FAILOVER like NETDEV_NOTIFY_PEERS") and removed by commit 4f5762ec ("bonding: Remove obsolete source file 'bond_ipv6.c'"). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-12	uapi: add missing netconf.h to export list	stephen hemminger	1	-0/+1
	Add netconf.h for use by iproute2. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-12	HID: i2c-hid: add mutex protecting open/close race	Benjamin Tissoires	1	-3/+11
	We should not enter close function while someone else is in open. This mutex prevents this race. There is also no need to override the ret value with -EIO in case of a failure of i2c_hid_set_power. Signed-off-by: Benjamin Tissoires <benjamin.tissoires@gmail.com> Reviewed-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-12-12	Revert "HID: sensors: add to special driver list"	Alexander Holler	2	-11/+0
	Those IDs aren't necessary anymore. This reverts commit c8147d9ea19bfe7d8e569351bc7239e118dd6997. Signed-off-by: Alexander Holler <holler@ahsoftware.de> Acked-by: "Pandruvada, Srinivas" <srinivas.pandruvada@intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-12-12	HID: sensors: autodetect USB HID sensor hubs	Alexander Holler	4	-33/+13
	It should not be necessary to add IDs for HID sensor hubs to lists in hid-core.c and hid-sensor-hub.c. So instead of a whitelist, autodetect such USB HID sensor hubs, based on a collection of type physical inside a useage page of type sensor. If some sensor hubs stil must be usable as raw devices, a blacklist might be created. Signed-off-by: Alexander Holler <holler@ahsoftware.de> Acked-by: "Pandruvada, Srinivas" <srinivas.pandruvada@intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-12-12	MN10300: Use asm-generic/pci_iomap.h	David Howells	1	-0/+1
	The declarations from MN10300's pci_iomap() was removed by commit 34f1bdee1910f7efe3c32e1a891dba4fd21cb3b6 but asm-generic/pci_iomap.h wasn't then #included from asm/io.h. Signed-off-by: David Howells <dhowells@redhat.com>
2012-12-12	MN10300: Get rid of unused variable from ASB2305 PCI code	David Howells	1	-3/+1
	Get rid of an unused variable in pcibios_fixup_device_resources() which leads to the following warning: arch/mn10300/unit-asb2305/pci.c: In function 'pcibios_fixup_device_resources': arch/mn10300/unit-asb2305/pci.c:324:24: warning: unused variable 'region' [-Wunused-variable] Whilst we're at it, merge the two integer variable declarations into one line. Signed-off-by: David Howells <dhowells@redhat.com>
2012-12-12	MN10300: ASB2305 PCI code needs linux/irq.h	David Howells	1	-0/+1
	ASB2305 PCI code needs to #include linux/irq.h for XIRQ1 so that it can set the CPU interrupt priority level on the PCI interrupt. Signed-off-by: David Howells <dhowells@redhat.com>
2012-12-12	mn10300/mm/fault.c: Port OOM changes to do_page_fault	Kautuk Consul	1	-8/+25
	Commit d065bd810b6deb67d4897a14bfe21f8eb526ba99 (mm: retry page fault when blocking on disk transfer) and commit 37b23e0525d393d48a7d59f870b3bc061a30ccdb (x86,mm: make pagefault killable) The above commits introduced changes into the x86 pagefault handler for making the page fault handler retryable as well as killable. These changes reduce the mmap_sem hold time, which is crucial during OOM killer invocation. Port these changes to mn10300. Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>
2012-12-12	MN10300: Handle cacheable PCI regions in pci_iomap()	David Howells	2	-1/+36
	Handle cacheable PCI regions in pci_iomap(). If IORESOURCE_CACHEABLE is set then we AND away the 0x20000000 "flag". Signed-off-by: David Howells <dhowells@redhat.com>
2012-12-12	MN10300: fix debug polling in ttySM driver	Mark Salter	1	-12/+23
	The debug polling interface for the SoC serial ports did not work in the case where the serial ports were not also used as a console. In that case, the uart driver startup function will not be called so tx and rx would not be enabled in the hardware control register. Also, vdma interrupts would not be enabled which the poll_get_char function relied on. This patch makes sure that the rx and tx enables are set as a consequence of the uart set_termios call which is the only initialization done for the debug polling interface. Also, the poll_get_char now handles the case where vdma interrupts are not enabled. Signed-off-by: Mark Salter <msalter@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com>
2012-12-12	MN10300: ttySM: clean up unnecessary casting	Mark Salter	2	-6/+6
	The ttySM uart data register pointers are declared as void* pointers. Change them to u8* pointers so we don't need to use casts in the code. Signed-off-by: Mark Salter <msalter@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com>
2012-12-12	MN10300: fix SMP synchronization between txdma and serial driver	Mark Salter	4	-48/+90
	The SoC serial port driver uses a high priority interrupt to handle tx of characters in the tx ring buffer. The driver needs to disable/enable this IRQ from outside of irq context. The original code to do this is not foolproof on SMP machines because the IRQ running on one core could still access the serial port for a short time after the driver running on another core disables the interrupt. This patch adds a flag to tell the IRQ handler that the driver wants to disable the interrupt. After seeing the flag, the IRQ handler will immediately disable the interrupt and exit. After setting the flag, the driver will wait for interrupt to be disabled by the IRQ handler. Signed-off-by: Mark Salter <msalter@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com>
2012-12-12	MN10300: fix serial port vdma irq setup for SMP	Mark Salter	1	-4/+30
	The builtin SoC serial ports have no FIFOs and use a virtual DMA mechanism based on high priority IRQs to avoid overruns. These high priority interrupts are pinned to cpu#0 on SMP systems. This patch fixes a problem with SMP where the set_intr_level() interface is used to set the priority for these IRQs. The set_intr_level() function sets priority on the local cpu but on SMP systems, this code may be run on some other cpu than the one handling the interrupts. Instead of setting interrupt level explicitly, this patch uses a special irq_chip for these interrupts so that the mask/unmask methods can set the interrupt level implicitly. Signed-off-by: Mark Salter <msalter@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com>
2012-12-12	MN10300: cleanup IRQ affinity setting	Mark Salter	3	-52/+17
	The irq_set_affinity handler for the mn10300 cpu pic had some hard-coded IRQs which were not to be migrated from one cpu to another. This patch cleans those up by using a combination of IRQF_NOBALANCING and specialized irq chips with no irq_set_affinity handler. This maintains the previous behavior by using generic IRQ interfaces rather than hard coding IRQ numbers in the default irq_set_affinity handler. Signed-off-by: Mark Salter <msalter@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com>
2012-12-12	MN10300: ttySM: Use memory barriers correctly in circular buffer logic	David Howells	1	-6/+8
	Use memory barriers correctly in the circular buffer logic used in the driver, as documented in Documentation/circular-buffers.txt. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Mark Salter <msalter@redhat.com>
2012-12-12	ALSA: hda - Move runtime PM check to runtime_idle callback	Takashi Iwai	1	-5/+14
	The runtime_idle callback is the right place to check the suspend capability, but currently we do it wrongly in the runtime_suspend callback. This leads to a kernel error message like: pci_pm_runtime_suspend(): azx_runtime_suspend+0x0/0x50 [snd_hda_intel] returns -11 and the runtime PM core would even repeat the attempts. Reported-and-tested-by: Borislav Petkov <bp@alien8.de> Cc: <stable@vger.kernel.org> [v3.7] Signed-off-by: Takashi Iwai <tiwai@suse.de>
2012-12-12	ALSA: hda - Add stereo-dmic fixup for Acer Aspire One 522	Takashi Iwai	1	-0/+1
	Acer Aspire One 522 has the infamous digital mic unit that needs the phase inversion fixup for stereo. Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=715737 Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2012-12-12	ALSA: hda - Avoid doubly suspend after vga switcheroo	Takashi Iwai	1	-0/+6
	The HD-audio driver artificially calls the suspend and the resume code path in the VGA switcheroo state changes. When a machine goes to suspend, it tries to suspend the device again, and it stalls at snd_power_wait(). This patch adds checks whether the devices were already in (forced) suspend in PM callbacks for avoiding the doubly suspend. Reported-by: Daniel J Blueman <daniel@quora.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2012-12-12	ALSA: usb-audio: Enable S/PDIF on the ASUS Xonar U3	Denis Washington	1	-2/+5
	The only required change is to extend the existing Xonar U1 mixer quirks to the U3, which seems to be controlled the same way. Signed-off-by: Denis Washington <denisw@online.de> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2012-12-12	ALSA: hda - Check validity of CORB/RIRB WP reads	Takashi Iwai	1	-2/+12
	When the HD-audio controller is disabled (e.g. via vga switcheroo) but the driver is still accessing it, it spews floods of "spurious response" kernel messages. It's because CORB/RIRB WP reads 0xff, and the driver tries to fill up until this number. This patch changes the CORB/RIRB WP reads to word instead of byte, and add the check of the read value. If it's 0xffff, the controller is supposed to be disabled, so the further action will be skipped. Signed-off-by: Takashi Iwai <tiwai@suse.de>
2012-12-12	ALSA: hda - use usleep_range in link reset and change timeout check	Mengdong Lin	1	-9/+11
	Reducing the time on HDA link reset can help to reduce the driver loading time. So we replace msleep with usleep_range to get more accurate time control and change the value to a smaller one. And a 100ms timeout is set for both entering and exiting the link reset. Signed-off-by: Xingchao Wang <xingchao.wang@intel.com> Signed-off-by: Mengdong Lin <mengdong.lin@intel.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2012-12-12	Thermal: Fix DEFAULT_THERMAL_GOVERNOR	Zhang Rui	2	-3/+9
	Fix DEFAULT_THERMAL_GOVERNOR to be consistant with the default governor selected in kernel config file. Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2012-12-12	Thermal: fix a NULL pointer dereference when generic thermal layer is built as a module	Zhang Rui	1	-1/+2
	[ 12.761956] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 [ 12.762016] IP: [<ffffffffa0005277>] handle_thermal_trip+0x47/0x130 [thermal_sys] [ 12.762060] PGD 1fec74067 PUD 1fee5b067 PMD 0 [ 12.762127] Oops: 0000 [#1] SMP [ 12.762177] Modules linked in: hid_generic crc32c_intel usbhid hid firewire_ohci(+) e1000e(+) firewire_core crc_itu_t xhci_hcd(+) thermal(+) fan thermal_sys hwmon [ 12.762423] CPU 1 [ 12.762443] Pid: 187, comm: modprobe Tainted: G A 3.7.0-thermal-module+ #25 /DH77DF [ 12.762496] RIP: 0010:[<ffffffffa0005277>] [<ffffffffa0005277>] handle_thermal_trip+0x47/0x130 [thermal_sys] [ 12.762682] RSP: 0018:ffff8801fe7ddc18 EFLAGS: 00010282 [ 12.762704] RAX: 0000000000000000 RBX: ffff8801ff3e9c00 RCX: ffff8801fdc39800 [ 12.762728] RDX: ffff8801fe7ddc24 RSI: 0000000000000001 RDI: ffff8801ff3e9c00 [ 12.762764] RBP: ffff8801fe7ddc48 R08: 0000000004000000 R09: ffffffffa001f568 [ 12.762797] R10: ffffffff81363083 R11: 0000000000000001 R12: 0000000000000001 [ 12.762832] R13: 0000000000000000 R14: 0000000000000001 R15: ffff8801fde73e68 [ 12.762866] FS: 00007f5548516700(0000) GS:ffff88021f240000(0000) knlGS:0000000000000000 [ 12.762912] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 12.762946] CR2: 0000000000000018 CR3: 00000001fefe2000 CR4: 00000000001407e0 [ 12.762979] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 12.763014] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 12.763048] Process modprobe (pid: 187, threadinfo ffff8801fe7dc000, task ffff8801fe5bdb40) [ 12.763095] Stack: [ 12.763122] 0000000000019640 00000000fdc39800 ffff8801fe7ddc48 ffff8801ff3e9c00 [ 12.763225] 0000000000000002 0000000000000000 ffff8801fe7ddc78 ffffffffa00053e7 [ 12.763338] ffff8801ff3e9c00 0000000000006c98 ffffffffa0007480 ffff8801ff3e9c00 [ 12.763440] Call Trace: [ 12.763470] [<ffffffffa00053e7>] thermal_zone_device_update+0x77/0xa0 [thermal_sys] [ 12.763515] [<ffffffffa0006d38>] thermal_zone_device_register+0x788/0xa88 [thermal_sys] [ 12.763562] [<ffffffffa001f394>] acpi_thermal_add+0x360/0x4c8 [thermal] [ 12.763598] [<ffffffff8133902a>] acpi_device_probe+0x50/0x190 [ 12.763632] [<ffffffff811bd793>] ? sysfs_create_link+0x13/0x20 [ 12.763666] [<ffffffff813cc41b>] driver_probe_device+0x7b/0x240 [ 12.763699] [<ffffffff813cc68b>] __driver_attach+0xab/0xb0 [ 12.763732] [<ffffffff813cc5e0>] ? driver_probe_device+0x240/0x240 [ 12.763766] [<ffffffff813ca836>] bus_for_each_dev+0x56/0x90 [ 12.763799] [<ffffffff813cbf4e>] driver_attach+0x1e/0x20 [ 12.763831] [<ffffffff813cbac0>] bus_add_driver+0x190/0x290 [ 12.763864] [<ffffffffa0022000>] ? 0xffffffffa0021fff [ 12.763896] [<ffffffff813ccbea>] driver_register+0x7a/0x160 [ 12.763928] [<ffffffffa0022000>] ? 0xffffffffa0021fff [ 12.763960] [<ffffffff813399fb>] acpi_bus_register_driver+0x43/0x45 [ 12.763995] [<ffffffffa002203a>] acpi_thermal_init+0x3a/0x42 [thermal] [ 12.764029] [<ffffffff8100207f>] do_one_initcall+0x3f/0x170 [ 12.764063] [<ffffffff810b1a5f>] sys_init_module+0x8f/0x200 [ 12.764097] [<ffffffff815ff259>] system_call_fastpath+0x16/0x1b [ 12.764129] Code: 48 8b 87 c8 02 00 00 41 89 f4 48 8d 55 dc ff 50 28 44 8b 6d dc 41 8d 45 fe 83 f8 01 76 5e 48 8b 83 d8 02 00 00 44 89 e6 48 89 df <ff> 50 18 4c 8d a3 10 03 00 00 4c 89 e7 e8 87 f1 5e e1 8b 83 bc [ 12.765164] RIP [<ffffffffa0005277>] handle_thermal_trip+0x47/0x130 [thermal_sys] [ 12.765223] RSP <ffff8801fe7ddc18> [ 12.765252] CR2: 0000000000000018 [ 12.765284] ---[ end trace 7723294cdfb00d2a ]--- This is because thermal_zone_device_update() is invoked before any thermal governors being registered. Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2012-12-12	pkt_sched: avoid requeues if possible	Eric Dumazet	5	-6/+22
	With BQL being deployed, we can more likely have following behavior : We dequeue a packet from qdisc in dequeue_skb(), then we realize target tx queue is in XOFF state in sch_direct_xmit(), and we have to hold the skb into gso_skb for later. This shows in stats (tc -s qdisc dev eth0) as requeues. Problem of these requeues is that high priority packets can not be dequeued as long as this (possibly low prio and big TSO packet) is not removed from gso_skb. At 1Gbps speed, a full size TSO packet is 500 us of extra latency. In some cases, we know that all packets dequeued from a qdisc are for a particular and known txq : - If device is non multi queue - For all MQ/MQPRIO slave qdiscs This patch introduces a new qdisc flag, TCQ_F_ONETXQUEUE to mark this capability, so that dequeue_skb() is allowed to dequeue a packet only if the associated txq is not stopped. This indeed reduce latencies for high prio packets (or improve fairness with sfq/fq_codel), and almost remove qdisc 'requeues'. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-12	solos-pci: fix double-free of TX skb in DMA mode	David Woodhouse	1	-2/+3
	We weren't clearing card->tx_skb[port] when processing the TX done interrupt. If there wasn't another skb ready to transmit immediately, this led to a double-free because we'd free it again next time we did have a packet to send. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Cc: stable@kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-11	bnx2: Fix accidental reversions.	Michael Chan	1	-2/+2
	Commit 4ce45e02469c382699f4c5f6df727aea9dd2e1ca "bnx2: Add BNX2 prefix to CHIP ID and name macros" accidentally reverted 2 commits to use pci_ioumap() and to make pci_error_handlers const. This fixes those mistakes. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-11	ktest: Test if target machine is up before install	Steven Rostedt	1	-0/+10
	Sometimes a test kernel will crash or hang on reboot (this is even more apparent when testing a config without CGROUPS on a box running systemd). When this happens, on the next iteration of installing a kernel, ktest will fail when it tries to install. Have ktest do a check to see if the target can be connected to via ssh before it tries to install. If it can't connect, then reboot again. This time the reboot will fail because it can't connect and will force a power cycle. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-12-11	ktest: Fix breakage from change of oldnoconfig to olddefconfig	Steven Rostedt	1	-4/+8
	Commit fb16d891 "kconfig: replace 'oldnoconfig' with 'olddefconfig', and keep the old name", changed ktest's default config update from oldnoconfig to olddefconfig without adding oldnoconfig as a backup. The make oldnoconfig works much better than its backup of: yes '' \| make oldconfig But due to this change, and the fact that ktest is used to build lots of older kernels (and for bisects), it forgoes the oldnoconfig completely. Cc: Adam Lee <adam8157@gmail.com> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-12-11	memory_hotplug: ensure every online node has NORMAL memory	Lai Jiangshan	1	-0/+40
	Old memory hotplug code and new online/movable may cause a online node don't have any normal memory, but memory-management acts bad when we have nodes which is online but don't have any normal memory. Example: it may cause a bound task fail on all kernel allocation and cause the task can't create task or create other kernel object. So we disable non-normal-memory-node here, we will enable it when we prepared. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Mel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11	memory_hotplug: handle empty zone when online_movable/online_kernel	Lai Jiangshan	1	-6/+45
	Make online_movable/online_kernel can empty a zone or can move memory to a empty zone. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Mel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11	mm, memory-hotplug: dynamic configure movable memory and portion memory	Lai Jiangshan	4	-14/+146
	Add online_movable and online_kernel for logic memory hotplug. This is the dynamic version of "movablecore" & "kernelcore". We have the same reason to introduce it as to introduce "movablecore" & "kernelcore". It has the same motive as "movablecore" & "kernelcore", but it is dynamic/running-time: o We can configure memory as kernelcore or movablecore after boot. Userspace workload is increased, we need more hugepage, we can't use "online_movable" to add memory and allow the system use more THP(transparent-huge-page), vice-verse when kernel workload is increase. Also help for virtualization to dynamic configure host/guest's memory, to save/(reduce waste) memory. Memory capacity on Demand o When a new node is physically online after boot, we need to use "online_movable" or "online_kernel" to configure/portion it as we expected when we logic-online it. This configuration also helps for physically-memory-migrate. o all benefit as the same as existed "movablecore" & "kernelcore". o Preparing for movable-node, which is very important for power-saving, hardware partitioning and high-available-system(hardware fault management). (Note, we don't introduce movable-node here.) Action behavior: When a memoryblock/memorysection is onlined by "online_movable", the kernel will not have directly reference to the page of the memoryblock, thus we can remove that memory any time when needed. When it is online by "online_kernel", the kernel can use it. When it is online by "online", the zone type doesn't changed. Current constraints: Only the memoryblock which is adjacent to the ZONE_MOVABLE can be online from ZONE_NORMAL to ZONE_MOVABLE. [akpm@linux-foundation.org: use min_t, cleanups] Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Mel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11	drivers/base/node.c: cleanup node_state_attr[]	Lai Jiangshan	1	-10/+10
	use [index] = init_value use N_xxxxx instead of hardcode. Make it more readability and easier to add new state. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11	bootmem: fix wrong call parameter for free_bootmem()	Joonsoo Kim	5	-16/+16
	It is strange that alloc_bootmem() returns a virtual address and free_bootmem() requires a physical address. Anyway, free_bootmem()'s first parameter should be physical address. There are some call sites for free_bootmem() with virtual address. So fix them. [akpm@linux-foundation.org: improve free_bootmem() and free_bootmem_pate() documentation] Signed-off-by: Joonsoo Kim <js1304@gmail.com> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11	avr32, kconfig: remove HAVE_ARCH_BOOTMEM	Joonsoo Kim	1	-3/+0
	There is no code for CONFIG_HAVE_ARCH_BOOTMEM, so remove it. Signed-off-by: Joonsoo Kim <js1304@gmail.com> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11	mm: cma: remove watermark hacks	Marek Szyprowski	2	-67/+0
	Commits 2139cbe627b8 ("cma: fix counting of isolated pages") and d95ea5d18e69 ("cma: fix watermark checking") introduced a reliable method of free page accounting when memory is being allocated from CMA regions, so the workaround introduced earlier by commit 49f223a9cd96 ("mm: trigger page reclaim in alloc_contig_range() to stabilise watermarks") can be finally removed. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Mel Gorman <mel@csn.ul.ie> Acked-by: Michal Nazarewicz <mina86@mina86.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>