linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2011-12-16	target: Set additional sense length field in sense data	Roland Dreier	2	-0/+15
	The target code was not setting the additional sense length field in the sense data it returned, which meant that at least the Linux stack ignored the ASC/ASCQ fields. For example, without this patch, on a tcm_loop device: # sg_raw -v /dev/sda 2 0 0 0 0 0 gives cdb to send: 02 00 00 00 00 00 SCSI Status: Check Condition Sense Information: Fixed format, current; Sense key: Illegal Request Raw sense data (in hex): 70 00 05 00 00 00 00 00 while after the patch we correctly get the following (which matches what a regular disk returns): cdb to send: 02 00 00 00 00 00 SCSI Status: Check Condition Sense Information: Fixed format, current; Sense key: Illegal Request Additional sense: Invalid command operation code Raw sense data (in hex): 70 00 05 00 00 00 00 0a 00 00 00 00 20 00 00 00 00 00 Signed-off-by: Roland Dreier <roland@purestorage.com> Cc: stable@kernel.org Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-14	target: Remove legacy device status check from transport_execute_tasks	Nicholas Bellinger	1	-7/+0
	This patch removes a legacy se_dev_check_online() check from within transport_execute_tasks() that should no longer be necessary as transport_lookup_cmd_lun() is already making this call. Using transport_cmd_check_stop() from transport_execute_tasks() should already be checking per se_cmd context for each descriptor upon active I/O shutdown, so no need to acquire dev->dev_status_lock again while executing se_task submission. Cc: Christoph Hellwig <hch@lst.de> Cc: Roland Dreier <roland@purestorage.com> Cc: Joern Engel <joern@logfs.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-14	target: Remove __transport_execute_tasks() for each processing context	Nicholas Bellinger	1	-2/+0
	This patch removes the original usage of __transport_execute_tasks() ahead of every transport_get_cmd_from_queue() call in transport_processing_thread(). This helps reduce se_device->execute_task_lock contention between qla2xxx wq with target_submit_cmd() for READs and transport_processing_thread() context servicing WRITEs with full payloads for I/O submission. It also adds a __transport_execute_tasks() to kick the task queue again without a *se_cmd descriptor with existing queue full logic, but this may end up not being necessary. Cc: Christoph Hellwig <hch@lst.de> Cc: Roland Dreier <roland@purestorage.com> Cc: Joern Engel <joern@logfs.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-14	target: Remove extra se_device->execute_task_lock access in fast path	Nicholas Bellinger	2	-17/+24
	This patch makes __transport_execute_tasks() perform the addition of tasks to dev->execute_task_list via __transport_add_tasks_from_cmd() while holding dev->execute_task_lock during normal I/O fast path submission. It effectively removes the unnecessary re-acquire of dev->execute_task_lock during transport_execute_tasks() -> transport_add_tasks_from_cmd() ahead of calling __transport_execute_tasks() to queue tasks for the passed *se_cmd descriptor. (v2: Re-add goto check_depth usage for multi-task submission for now..) Cc: Christoph Hellwig <hch@lst.de> Cc: Roland Dreier <roland@purestorage.com> Cc: Joern Engel <joern@logfs.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-14	target: Drop se_device TCQ queue_depth usage from I/O path	Nicholas Bellinger	4	-46/+2
	Historically, pSCSI devices have been the ones that required target-core to enforce a per se_device->depth_left. This patch changes target-core to no longer (by default) enforce a per se_device->depth_left or sleep in transport_tcq_window_closed() when we out of queue slots for all backend export cases. Cc: Christoph Hellwig <hch@lst.de> Cc: Roland Dreier <roland@purestorage.com> Cc: Joern Engel <joern@logfs.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-14	target: Fix possible NULL pointer with __transport_execute_tasks	Nicholas Bellinger	1	-1/+2
	This patch makes __transport_execute_tasks() use a local *se_dev reference to prevent direct se_cmd->se_dev access after transport_cmd_check_stop() -> transport_add_tasks_from_cmd() has been called, as in the current implementation we can expect __transport_execute_tasks() may be called from another context that may have already completed the I/O. Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-14	target: Remove TFO->check_release_cmd() fabric API caller	Nicholas Bellinger	1	-4/+0
	Remove the now unused target_core_fabric_ops->check_release_cmd() as target_core handles this directly for se_cmd->cmd_kref objects now. Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-14	tcm_fc: Convert ft_send_work to use target_submit_cmd	Nicholas Bellinger	1	-38/+13
	This patch converts the main ft_send_work() I/O path to use target_submit_cmd() with a single se_cmd->cmd_kref reference that is released via the existing ft_check_stop_free() response path callback. It also makes ft_send_tm() use transport_init_se_cmd() and target_get_sess_cmd() to also use single se_cmd->cmd_kref reference. Cc: Christoph Hellwig <hch@lst.de> Cc: Kiran Patil <kiran.patil@intel.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-14	target: Add target_submit_cmd() for process context fabric submission	Nicholas Bellinger	3	-3/+92
	This patch adds a target_submit_cmd() caller that can be used by fabrics to submit an uninitialized se_cmd descriptor to an struct se_session + unpacked_lun from workqueue process context. This call will invoke the following steps: - transport_init_se_cmd() to setup se_cmd specific pointers - Obtain se_cmd->cmd_kref references with target_get_sess_cmd() - set se_cmd->t_tasks_bidi - transport_lookup_cmd_lun() to setup struct se_cmd->se_lun from the passed unpacked_lun - transport_generic_allocate_tasks() to setup the passed cdb, and - transport_handle_cdb_direct() handle READ dispatch or WRITE ready-to-transfer callback to fabric v2 changes from hch feedback: ) Add target_sc_flags_table for target_submit_cmd flags ) Rename bidi parameter to flags, add TARGET_SCF_BIDI_OP ) Convert checks to BUG_ON ) Add out_check_cond for transport_send_check_condition_and_sense usage v3 changes: ) Add TARGET_SCF_ACK_KREF for target_submit_cmd into target_get_sess_cmd to determine when the fabric caller is expecting a second kref_put() from fabric packet acknowledgement. Cc: Christoph Hellwig <hch@lst.de> Cc: Roland Dreier <roland@purestorage.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-14	target: Make target_put_sess_cmd use target_release_cmd_kref	Nicholas Bellinger	2	-15/+25
	This patch moves target_put_sess_cmd() to use a se_cmd->cmd_kref callback target_release_cmd_kref when performing driver release of fabric->se_cmd descriptor memory. It sets the default cmd_kref count value to '2' within target_get_sess_cmd() setup, and currently assumes TFO->check_stop_free() usage. It drops se_tfo->check_release_cmd() usage in the main transport_release_cmd codepath. Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-14	target: Set response format in INQUIRY response	Roland Dreier	1	-0/+12
	Current SCSI specs say that the "response format" field in the standard INQUIRY response should be set to 2, and all the real SCSI devices I have do put 2 here. So let's do that too. Signed-off-by: Roland Dreier <roland@purestorage.com> Cc: stable@kernel.org Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-14	target: tcm_mod_builder: small fixups	Sebastian Andrzej Siewior	1	-12/+10
	This includes: - remove on _ in "__NAMELEN" in $fabric _make_tport - target_fabric_configfs_init() returns an error pointer and not NULL anymore. Consider that. - replace (!(var_name)) with (!var_name). The extra () are not required - remove #ifdef MODULE. If the code is builtin it needs an init function or the code is useless - put exit/clean functions into __exit Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-14	Documentation/target: Fix tcm_mod_builder.py build breakage	Nicholas Bellinger	1	-21/+2
	This patch fixes TFO->release_cmd() and removes legacy pack_lun() usage and new_cmd_failure when generating new TCM fabric skeleton from the tcm_mod_builder.py script. Reported-by: Stefan Bergstrand <stefan.bergstrand@sdsab.se> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-14	target: remove overagressive ____cacheline_aligned annoations	Christoph Hellwig	1	-31/+31
	If we want dynamically allocated objects to be cacheline aligned we need to tell that to the slab allocator by using the proper flags and not by liberally sprinkling annotations onto all structures. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-14	tcm_loop: bump max_sectors	Christoph Hellwig	2	-14/+5
	There is not reason to artifically limit max_sectors in tcm_loop, set it to UINT_MAX to allow stressing the large I/O handling in the target core using the loopback driver. Also remove various superflous defines hiding the values set in the host template. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-14	target/configs: remove trailing newline from udev_path and alias	Sebastian Andrzej Siewior	1	-0/+6
	This patch strips the trailing newline from backend device udev_path and alias attributes. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-14	iscsi-target: fix chap identifier simple_strtoul usage	Nicholas Bellinger	1	-3/+7
	This patch makes chap_server_compute_md5() use proper unsigned long usage for the CHAP_I (identifier) and check for values beyond 255 as per RFC-1994. Reported-by: Joern Engel <joern@logfs.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-14	target: remove useless casts	Jörn Engel	18	-101/+90
	A reader should spend an extra moment whenever noticing a cast, because either something special is going on that deserves extra attention or, as is all too often the case, the code is wrong. These casts, afaics, have all been useless. They cast a foo* to a foo, cast a void to the assigned type, cast a foo* to void, before assigning it to a void variable, etc. In a few cases I also removed an additional &...[0], which is equally useless. Lastly I added three FIXMEs where, to the best of my judgement, the code appears to have a bug. It would be good if someone could check these. Signed-off-by: Joern Engel <joern@logfs.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-14	target: simplify target_check_cdb_and_preempt	Jörn Engel	1	-16/+10
	- rename to target_check_cdb_and_preempt - use non-safe list_for_each_entry - move common check into callee (simplifying callers) Signed-off-by: Joern Engel <joern@logfs.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-14	target: Move core_scsi3_check_cdb_abort_and_preempt	Jörn Engel	3	-17/+15
	And make it static afterwards. Signed-off-by: Joern Engel <joern@logfs.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-14	target: use \n as a separator for configuration	Sebastian Andrzej Siewior	5	-5/+5
	The command \| echo rd_pages=32768 > ramdisk/control Does not work because it writes "rd_pages=32768\n" and the parser which matches for "rd_pages=%d" does not recognize it due to the \n. One way of fixing this would be using "echo -n" instead. This patch adds \n to the list of separators so we don't have to use the -n argument which I find is more convinient. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-14	target: make the se_task task_state_active a normal bool	Christoph Hellwig	3	-23/+22
	There is no need to make task_state_active an atomic_t given that it is always set under the execute_task_lock so we can make it a simple bool. Also rename it to t_state_active to be closer to the list it guards, and make sure all checks before the list addion/removal actually happen under execute_task_lock. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-14	target: remove the se_task task_error_status field	Christoph Hellwig	2	-8/+1
	We only reach transport_complete_task once per task, so the test and set on task_error_status is never going to have an effect. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-14	target: fold se_task.task_sense into task_flags	Christoph Hellwig	2	-4/+4
	Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-14	target: header reshuffle, part2	Christoph Hellwig	44	-462/+288
	This reorganized the headers under include/target into: - target_core_base.h stays as is with all target-wide data stuctures and defines - target_core_backend.h contains the whole interface to I/O backends - target_core_fabric.h contains the whole interface to fabric modules Except for those only the various configfs macro headers stay around. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-14	target: reshuffle headers	Christoph Hellwig	21	-186/+170
	Create a new headers, drivers/target/target_core_internal.h that is supposed to hold all target_core-internal prototypes. Move all non-exported includes from include/target to it, and merge the smaller prototype-only includes inside drivers/target into it as well. Mark functions that were found to not be called outside their implementation file static. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2011-12-09	Linux 3.2-rc5	Linus Torvalds	1	-1/+1

2011-12-09	sys_getppid: add missing rcu_dereference	Mandeep Singh Baines	1	-1/+1
	In order to safely dereference current->real_parent inside an rcu_read_lock, we need an rcu_dereference. Signed-off-by: Mandeep Singh Baines <msb@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-09	rapidio/tsi721: modify PCIe capability settings	Alexandre Bounine	2	-5/+17
	Modify initialization of PCIe capability registers in Tsi721 mport driver: - change Completion Timeout value to avoid unexpected data transfer aborts during intensive traffic. - replace hardcoded offset of PCIe capability block by making it use the common function. This patch is applicable to kernel versions starting from 3.2-rc1. Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-09	rapidio/tsi721: fix mailbox resource reporting	Alexandre Bounine	1	-2/+2
	Bug fix for Tsi721 RapidIO mport driver: Tsi721 supports four RapidIO mailboxes (MBOX0 - MBOX3) as defined by RapidIO specification. Mailbox resources has to be properly reported to allow use of all available mailboxes (initial version reports only MBOX0). This patch is applicable to kernel versions staring from 3.2-rc1. Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-09	rapidio/tsi721: switch to dma_zalloc_coherent	Alexandre Bounine	1	-13/+4
	Replace the pair dma_alloc_coherent()+memset() with the new dma_zalloc_coherent() added by Andrew Morton for kernel version 3.2 Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-09	procfs: do not overflow get_{idle,iowait}_time for nohz	Michal Hocko	1	-2/+2
	Since commit a25cac5198d4 ("proc: Consider NO_HZ when printing idle and iowait times") we are reporting idle/io_wait time also while a CPU is tickless. We rely on get_{idle,iowait}_time functions to retrieve proper data. These functions, however, use usecs_to_cputime to translate micro seconds time to cputime64_t. This is just an alias to usecs_to_jiffies which reduces the data type from u64 to unsigned int and also checks whether the given parameter overflows jiffies_to_usecs(MAX_JIFFY_OFFSET) and returns MAX_JIFFY_OFFSET in that case. When we overflow depends on CONFIG_HZ but especially for CONFIG_HZ_300 it is quite low (1431649781) so we are getting MAX_JIFFY_OFFSET for >3000s! until we overflow unsigned int. Just for reference CONFIG_HZ_100 has an overflow window around 20s, CONFIG_HZ_250 ~8s and CONFIG_HZ_1000 ~2s. This results in a bug when people saw [h]top going mad reporting 100% CPU usage even though there was basically no CPU load. The reason was simply that /proc/stat stopped reporting idle/io_wait changes (and reported MAX_JIFFY_OFFSET) and so the only change happening was for user system time. Let's use nsecs_to_jiffies64 instead which doesn't reduce the precision to 32b type and it is much more appropriate for cumulative time values (unlike usecs_to_jiffies which intended for timeout calculations). Signed-off-by: Michal Hocko <mhocko@suse.cz> Tested-by: Artem S. Tashkinov <t.artem@mailcity.com> Cc: Dave Jones <davej@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-09	mm: vmalloc: check for page allocation failure before vmlist insertion	Mel Gorman	1	-0/+2
	Commit f5252e00 ("mm: avoid null pointer access in vm_struct via /proc/vmallocinfo") adds newly allocated vm_structs to the vmlist after it is fully initialised. Unfortunately, it did not check that __vmalloc_area_node() successfully populated the area. In the event of allocation failure, the vmalloc area is freed but the pointer to freed memory is inserted into the vmlist leading to a a crash later in get_vmalloc_info(). This patch adds a check for ____vmalloc_area_node() failure within __vmalloc_node_range. It does not use "goto fail" as in the previous error path as a warning was already displayed by __vmalloc_area_node() before it called vfree in its failure path. Credit goes to Luciano Chavez for doing all the real work of identifying exactly where the problem was. Signed-off-by: Mel Gorman <mgorman@suse.de> Reported-by: Luciano Chavez <lnx1138@linux.vnet.ibm.com> Tested-by: Luciano Chavez <lnx1138@linux.vnet.ibm.com> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> [3.1.x+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-09	mm: Ensure that pfn_valid() is called once per pageblock when reserving pageblocks	Michal Hocko	1	-1/+7
	setup_zone_migrate_reserve() expects that zone->start_pfn starts at pageblock_nr_pages aligned pfn otherwise we could access beyond an existing memblock resulting in the following panic if CONFIG_HOLES_IN_ZONE is not configured and we do not check pfn_valid: IP: [<c02d331d>] setup_zone_migrate_reserve+0xcd/0x180 pdpt = 0000000000000000 pde = f000ff53f000ff53 Oops: 0000 [#1] SMP Pid: 1, comm: swapper Not tainted 3.0.7-0.7-pae #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform EIP: 0060:[<c02d331d>] EFLAGS: 00010006 CPU: 0 EIP is at setup_zone_migrate_reserve+0xcd/0x180 EAX: 000c0000 EBX: f5801fc0 ECX: 000c0000 EDX: 00000000 ESI: 000c01fe EDI: 000c01fe EBP: 00140000 ESP: f2475f58 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 Process swapper (pid: 1, ti=f2474000 task=f2472cd0 task.ti=f2474000) Call Trace: [<c02d389c>] __setup_per_zone_wmarks+0xec/0x160 [<c02d3a1f>] setup_per_zone_wmarks+0xf/0x20 [<c08a771c>] init_per_zone_wmark_min+0x27/0x86 [<c020111b>] do_one_initcall+0x2b/0x160 [<c086639d>] kernel_init+0xbe/0x157 [<c05cae26>] kernel_thread_helper+0x6/0xd Code: a5 39 f5 89 f7 0f 46 fd 39 cf 76 40 8b 03 f6 c4 08 74 32 eb 91 90 89 c8 c1 e8 0e 0f be 80 80 2f 86 c0 8b 14 85 60 2f 86 c0 89 c8 <2b> 82 b4 12 00 00 c1 e0 05 03 82 ac 12 00 00 8b 00 f6 c4 08 0f EIP: [<c02d331d>] setup_zone_migrate_reserve+0xcd/0x180 SS:ESP 0068:f2475f58 CR2: 00000000000012b4 We crashed in pageblock_is_reserved() when accessing pfn 0xc0000 because highstart_pfn = 0x36ffe. The issue was introduced in 3.0-rc1 by 6d3163ce ("mm: check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE"). Make sure that start_pfn is always aligned to pageblock_nr_pages to ensure that pfn_valid s always called at the start of each pageblock. Architectures with holes in pageblocks will be correctly handled by pfn_valid_within in pageblock_is_reserved. Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Mel Gorman <mgorman@suse.de> Tested-by: Dang Bo <bdang@vmware.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Arve Hjnnevg <arve@android.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> [3.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-09	mm/migrate.c: pair unlock_page() and lock_page() when migrating huge pages	Hillf Danton	1	-1/+1
	Avoid unlocking and unlocked page if we failed to lock it. Signed-off-by: Hillf Danton <dhillf@gmail.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-09	thp: set compound tail page _count to zero	Youquan Song	2	-1/+2
	Commit 70b50f94f1644 ("mm: thp: tail page refcounting fix") keeps all page_tail->_count zero at all times. But the current kernel does not set page_tail->_count to zero if a 1GB page is utilized. So when an IOMMU 1GB page is used by KVM, it wil result in a kernel oops because a tail page's _count does not equal zero. kernel BUG at include/linux/mm.h:386! invalid opcode: 0000 [#1] SMP Call Trace: gup_pud_range+0xb8/0x19d get_user_pages_fast+0xcb/0x192 ? trace_hardirqs_off+0xd/0xf hva_to_pfn+0x119/0x2f2 gfn_to_pfn_memslot+0x2c/0x2e kvm_iommu_map_pages+0xfd/0x1c1 kvm_iommu_map_memslots+0x7c/0xbd kvm_iommu_map_guest+0xaa/0xbf kvm_vm_ioctl_assigned_device+0x2ef/0xa47 kvm_vm_ioctl+0x36c/0x3a2 do_vfs_ioctl+0x49e/0x4e4 sys_ioctl+0x5a/0x7c system_call_fastpath+0x16/0x1b RIP gup_huge_pud+0xf2/0x159 Signed-off-by: Youquan Song <youquan.song@intel.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-09	thp: add compound tail page _mapcount when mapped	Youquan Song	1	-0/+2
	With the 3.2-rc kernel, IOMMU 2M pages in KVM works. But when I tried to use IOMMU 1GB pages in KVM, I encountered an oops and the 1GB page failed to be used. The root cause is that 1GB page allocation calls gup_huge_pud() while 2M page calls gup_huge_pmd. If compound pages are used and the page is a tail page, gup_huge_pmd() increases _mapcount to record tail page are mapped while gup_huge_pud does not do that. So when the mapped page is relesed, it will result in kernel oops because the page is not marked mapped. This patch add tail process for compound page in 1GB huge page which keeps the same process as 2M page. Reproduce like: 1. Add grub boot option: hugepagesz=1G hugepages=8 2. mount -t hugetlbfs -o pagesize=1G hugetlbfs /dev/hugepages 3. qemu-kvm -m 2048 -hda os-kvm.img -cpu kvm64 -smp 4 -mem-path /dev/hugepages -net none -device pci-assign,host=07:00.1 kernel BUG at mm/swap.c:114! invalid opcode: 0000 [#1] SMP Call Trace: put_page+0x15/0x37 kvm_release_pfn_clean+0x31/0x36 kvm_iommu_put_pages+0x94/0xb1 kvm_iommu_unmap_memslots+0x80/0xb6 kvm_assign_device+0xba/0x117 kvm_vm_ioctl_assigned_device+0x301/0xa47 kvm_vm_ioctl+0x36c/0x3a2 do_vfs_ioctl+0x49e/0x4e4 sys_ioctl+0x5a/0x7c system_call_fastpath+0x16/0x1b RIP put_compound_page+0xd4/0x168 Signed-off-by: Youquan Song <youquan.song@intel.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-09	printk: avoid double lock acquire	Peter Zijlstra	1	-1/+2
	Commit 4f2a8d3cf5e ("printk: Fix console_sem vs logbuf_lock unlock race") introduced another silly bug where we would want to acquire an already held lock. Avoid this. Reported-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-09	memcg: update maintainers	KAMEZAWA Hiroyuki	1	-1/+2
	More players joined to memory cgroup developments and Johannes' great work changed internal design of memory cgroup dramatically. And he will do more works. Michal Hokko did many bug fixes and know memory cgroup very well. Daisuke Nishimura helped us very much but he seems busy now. Thanks to his works. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-09	drivers/rtc/rtc-s3c.c: fix driver clock enable/disable balance issues	Jonghwan Choi	1	-1/+1
	If an error occurs after the clock is enabled, the enable/disable state can become unbalanced. Signed-off-by: Jonghwan Choi <jhbird.choi@samsung.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Acked-by: Kukjin Kim <kgene.kim@samsung.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-09	CREDITS: update Kees's expired fingerprint and fix details	Kees Cook	1	-3/+6
	Small clean-up for my CREDITS entry; the GPG fingerprint was not up to date, so I fixed other details at the same time too. Signed-off-by: Kees Cook <kees@outflux.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-09	thp: reduce khugepaged freezing latency	Andrea Arcangeli	1	-12/+4
	khugepaged can sometimes cause suspend to fail, requiring that the user retry the suspend operation. Use wait_event_freezable_timeout() instead of schedule_timeout_interruptible() to avoid missing freezer wakeups. A try_to_freeze() would have been needed in the khugepaged_alloc_hugepage tight loop too in case of the allocation failing repeatedly, and wait_event_freezable_timeout will provide it too. khugepaged would still freeze just fine by trying again the next minute but it's better if it freezes immediately. Reported-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Tested-by: Jiri Slaby <jslaby@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Cc: "Rafael J. Wysocki" <rjw@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-09	fs/proc/meminfo.c: fix compilation error	Claudio Scordino	1	-3/+4
	Fix the error message "directives may not be used inside a macro argument" which appears when the kernel is compiled for the cris architecture. Signed-off-by: Claudio Scordino <claudio@evidence.eu.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-09	vmscan: use atomic-long for shrinker batching	Konstantin Khlebnikov	4	-12/+10
	Use atomic-long operations instead of looping around cmpxchg(). [akpm@linux-foundation.org: massage atomic.h inclusions] Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-09	vmscan: fix initial shrinker size handling	Konstantin Khlebnikov	1	-3/+6
	A shrinker function can return -1, means that it cannot do anything without a risk of deadlock. For example prune_super() does this if it cannot grab a superblock refrence, even if nr_to_scan=0. Currently we interpret this -1 as a ULONG_MAX size shrinker and evaluate `total_scan' according to this. So the next time around this shrinker can cause really big pressure. Let's skip such shrinkers instead. Also make total_scan signed, otherwise the check (total_scan < 0) below never works. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-09	MAINTAINERS: Update amd-iommu F: patterns	Joe Perches	1	-2/+2
	Commit 29b68415e335 ("x86: amd_iommu: move to drivers/iommu/") moved the files, update the patterns. CC: Ohad Ben-Cohen <ohad@wizery.com> CC: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2011-12-09	x86, efi: Calling __pa() with an ioremap()ed address is invalid	Matt Fleming	6	-35/+48
	If we encounter an efi_memory_desc_t without EFI_MEMORY_WB set in ->attribute we currently call set_memory_uc(), which in turn calls __pa() on a potentially ioremap'd address. On CONFIG_X86_32 this is invalid, resulting in the following oops on some machines: BUG: unable to handle kernel paging request at f7f22280 IP: [<c10257b9>] reserve_ram_pages_type+0x89/0x210 [...] Call Trace: [<c104f8ca>] ? page_is_ram+0x1a/0x40 [<c1025aff>] reserve_memtype+0xdf/0x2f0 [<c1024dc9>] set_memory_uc+0x49/0xa0 [<c19334d0>] efi_enter_virtual_mode+0x1c2/0x3aa [<c19216d4>] start_kernel+0x291/0x2f2 [<c19211c7>] ? loglevel+0x1b/0x1b [<c19210bf>] i386_start_kernel+0xbf/0xc8 A better approach to this problem is to map the memory region with the correct attributes from the start, instead of modifying it after the fact. The uncached case can be handled by ioremap_nocache() and the cached by ioremap_cache(). Despite first impressions, it's not possible to use ioremap_cache() to map all cached memory regions on CONFIG_X86_64 because EFI_RUNTIME_SERVICES_DATA regions really don't like being mapped into the vmalloc space, as detailed in the following bug report, https://bugzilla.redhat.com/show_bug.cgi?id=748516 Therefore, we need to ensure that any EFI_RUNTIME_SERVICES_DATA regions are covered by the direct kernel mapping table on CONFIG_X86_64. To accomplish this we now map E820_RESERVED_EFI regions via the direct kernel mapping with the initial call to init_memory_mapping() in setup_arch(), whereas previously these regions wouldn't be mapped if they were after the last E820_RAM region until efi_ioremap() was called. Doing it this way allows us to delete efi_ioremap() completely. Signed-off-by: Matt Fleming <matt.fleming@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg@redhat.com> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Huang Ying <huang.ying.caritas@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1321621751-3650-1-git-send-email-matt@console-pimps.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-08	cifs: check for NULL last_entry before calling cifs_save_resume_key	Jeff Layton	1	-2/+8
	Prior to commit eaf35b1, cifs_save_resume_key had some NULL pointer checks at the top. It turns out that at least one of those NULL pointer checks is needed after all. When the LastNameOffset in a FIND reply appears to be beyond the end of the buffer, CIFSFindFirst and CIFSFindNext will set srch_inf.last_entry to NULL. Since eaf35b1, the code will now oops in this situation. Fix this by having the callers check for a NULL last entry pointer before calling cifs_save_resume_key. No change is needed for the call site in cifs_readdir as it's not reachable with a NULL current_entry pointer. This should fix: https://bugzilla.redhat.com/show_bug.cgi?id=750247 Cc: stable@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Reported-by: Adam G. Metzler <adamgmetzler@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
2011-12-08	cifs: attempt to freeze while looping on a receive attempt	Jeff Layton	1	-0/+2
	In the recent overhaul of the demultiplex thread receive path, I neglected to ensure that we attempt to freeze on each pass through the receive loop. Reported-and-Tested-by: Woody Suwalski <terraluna977@gmail.com> Reported-and-Tested-by: Adam Williamson <awilliam@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
2011-12-08	cifs: Fix sparse warning when calling cifs_strtoUCS	Steve French	1	-3/+3
	Fix sparse endian check warning while calling cifs_strtoUCS CHECK fs/cifs/smbencrypt.c fs/cifs/smbencrypt.c:216:37: warning: incorrect type in argument 1 (different base types) fs/cifs/smbencrypt.c:216:37: expected restricted __le16 [usertype] <noident> fs/cifs/smbencrypt.c:216:37: got unsigned short <noident> Signed-off-by: Steve French <smfrench@gmail.com> Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com