linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2015-02-19	x86: pte_protnone() and pmd_protnone() must check entry is not present	David Vrabel	1	-2/+4
	Since _PAGE_PROTNONE aliases _PAGE_GLOBAL it is only valid if _PAGE_PRESENT is clear. Make pte_protnone() and pmd_protnone() check for this. This fixes a 64-bit Xen PV guest regression introduced by 8a0516ed8b90 ("mm: convert p[te\|md]_numa users to p[te\|md]_protnone_numa"). Any userspace process would endlessly fault. In a 64-bit PV guest, userspace page table entries have _PAGE_GLOBAL set by the hypervisor. This meant that any fault on a present userspace entry (e.g., a write to a read-only mapping) would be misinterpreted as a NUMA hinting fault and the fault would not be correctly handled, resulting in the access endlessly faulting. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-19	MAINTAINERS: update Ceph and RBD maintainers	Sage Weil	1	-3/+4
	- add Ilya, drop Yehuda as an RBD maintainer - add Zheng as a Ceph maintainer - update Yehuda and Sage's emails Signed-off-by: Sage Weil <sage@redhat.com>
2015-02-19	libceph: kfree() in put_osd() shouldn't depend on authorizer	Ilya Dryomov	1	-2/+3
	a255651d4cad ("ceph: ensure auth ops are defined before use") made kfree() in put_osd() conditional on the authorizer. A mechanical mistake most likely - fix it. Cc: Alex Elder <elder@linaro.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>
2015-02-19	libceph: fix double __remove_osd() problem	Ilya Dryomov	1	-8/+18
	It turns out it's possible to get __remove_osd() called twice on the same OSD. That doesn't sit well with rb_erase() - depending on the shape of the tree we can get a NULL dereference, a soft lockup or a random crash at some point in the future as we end up touching freed memory. One scenario that I was able to reproduce is as follows: <osd3 is idle, on the osd lru list> <con reset - osd3> con_fault_finish() osd_reset() <osdmap - osd3 down> ceph_osdc_handle_map() <takes map_sem> kick_requests() <takes request_mutex> reset_changed_osds() __reset_osd() __remove_osd() <releases request_mutex> <releases map_sem> <takes map_sem> <takes request_mutex> __kick_osd_requests() __reset_osd() __remove_osd() <-- !!! A case can be made that osd refcounting is imperfect and reworking it would be a proper resolution, but for now Sage and I decided to fix this by adding a safe guard around __remove_osd(). Fixes: http://tracker.ceph.com/issues/8087 Cc: Sage Weil <sage@redhat.com> Cc: stable@vger.kernel.org # 3.9+: 7c6e6fc53e73: libceph: assert both regular and lingering lists in __remove_osd() Cc: stable@vger.kernel.org # 3.9+: cc9f1f518cec: libceph: change from BUG to WARN for __remove_osd() asserts Cc: stable@vger.kernel.org # 3.9+ Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>
2015-02-19	rbd: convert to blk-mq	Christoph Hellwig	1	-54/+68
	This converts the rbd driver to use the blk-mq infrastructure. Except for switching to a per-request work item this is almost mechanical. This was tested by Alexandre DERUMIER in November, and found to give him 120000 iops, although the only comparism available was an old 3.10 kernel which gave 80000iops. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <elder@linaro.org> [idryomov@gmail.com: context, blk_mq_init_queue() EH] Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-02-19	ceph: return error for traceless reply race	Yan, Zheng	1	-6/+9
	When we receives traceless reply for request that created new inode, we re-send a lookup request to MDS get information of the newly created inode. (VFS expects FS' callback return an inode in create case) This breaks one request into two requests. Other client may modify or move to the new inode in the middle. When the race happens, ceph_handle_notrace_create() unconditionally links the dentry for 'create' operation to the inode returned by lookup. This may confuse VFS when the inode is a directory (VFS does not allow multiple linkages for directory inode). This patch makes ceph_handle_notrace_create() when it detect a race. This event should be rare and it happens only when we talk to old MDS. Recent MDS does not send traceless reply for request that creates new inode. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19	ceph: fix dentry leaks	Yan, Zheng	2	-3/+6
	Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19	ceph: re-send requests when MDS enters reconnecting stage	Yan, Zheng	1	-3/+26
	So that MDS can check if any request is already completed and process completed requests in clientreplay stage. When completed requests are processed in clientreplay stage, MDS can avoid sending traceless replies. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19	ceph: show nocephx_require_signatures and notcp_nodelay options	Ilya Dryomov	1	-0/+4
	Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2015-02-19	libceph: tcp_nodelay support	Chaitanya Huilgol	4	-4/+33
	TCP_NODELAY socket option set on connection sockets, disables Nagle’s algorithm and improves latency characteristics. tcp_nodelay(default)/notcp_nodelay option flags provided to enable/disable setting the socket option. Signed-off-by: Chaitanya Huilgol <chaitanya.huilgol@sandisk.com> [idryomov@redhat.com: NO_TCP_NODELAY -> TCP_NODELAY, minor adjustments] Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2015-02-19	rbd: do not treat standalone as flatten	Ilya Dryomov	1	-20/+10
	If the clone is resized down to 0, it becomes standalone. If such resize is carried over while an image is mapped we would detect this and call rbd_dev_parent_put() which means "let go of all parent state, including the spec(s) of parent images(s)". This leads to a mismatch between "rbd info" and sysfs parent fields, so a fix is in order. # rbd create --image-format 2 --size 1 foo # rbd snap create foo@snap # rbd snap protect foo@snap # rbd clone foo@snap bar # DEV=$(rbd map bar) # rbd resize --allow-shrink --size 0 bar # rbd resize --size 1 bar # rbd info bar \| grep parent parent: rbd/foo@snap Before: # cat /sys/bus/rbd/devices/0/parent (no parent image) After: # cat /sys/bus/rbd/devices/0/parent pool_id 0 pool_name rbd image_id 10056b8b4567 image_name foo snap_id 2 snap_name snap overlap 0 Signed-off-by: Ilya Dryomov <idryomov@redhat.com> Reviewed-by: Josh Durgin <jdurgin@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>
2015-02-19	ceph: fix atomic_open snapdir	Yan, Zheng	1	-1/+1
	ceph_handle_snapdir() checks ceph_mdsc_do_request()'s return value and creates snapdir inode if it's -ENOENT Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19	ceph: properly mark empty directory as complete	Yan, Zheng	1	-14/+15
	ceph_add_cap() calls __check_cap_issue(), which clears directory inode' complete flag. so we should set the complete flag for empty directory should be set after calling ceph_add_cap(). Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19	client: include kernel version in client metadata	Yan, Zheng	1	-1/+2
	Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19	ceph: provide seperate {inode,file}_operations for snapdir	Yan, Zheng	3	-4/+19
	remove all unsupported operations from {inode,file}_operations. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19	ceph: fix request time stamp encoding	Yan, Zheng	1	-2/+10
	struct timespec uses 'long' to present second and nanosecond. 'long' is 64 bits on 64bits machine. ceph MDS expects time stamp to be encoded as struct ceph_timespec, which uses 'u32' to present second and nanosecond. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19	ceph: fix reading inline data when i_size > PAGE_SIZE	Yan, Zheng	2	-15/+26
	when inode has inline data but its size > PAGE_SIZE (it was truncated to larger size), previous direct read code return -EIO. This patch adds code to return zeros for data whose offset > PAGE_SIZE. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19	ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_close_sessions)	Yan, Zheng	2	-9/+5
	use an atomic variable to track number of sessions, this can avoid block operation inside wait loops. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19	ceph: avoid block operation when !TASK_RUNNING (ceph_get_caps)	Yan, Zheng	1	-44/+42
	we should not do block operation in wait_event_interruptible()'s condition check function, but reading inline data can block. so move the read inline data code to ceph_get_caps() Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19	ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_sync)	Yan, Zheng	2	-18/+35
	check_cap_flush() calls mutex_lock(), which may block. So we can't use it as condition check function for wait_event(); Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19	rbd: fix error paths in rbd_dev_refresh()	Ilya Dryomov	1	-7/+6
	header_rwsem should be released on errors. Also remove useless rbd_dev->mapping.size != rbd_dev->header.image_size test. Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2015-02-19	ceph: improve reference tracking for snaprealm	Yan, Zheng	4	-27/+63
	When snaprealm is created, its initial reference count is zero. But in some rare cases, the newly created snaprealm is not referenced by anyone. This causes snaprealm with zero reference count not freed. The fix is set reference count of newly snaprealm to 1. The reference is return the function who requests to create the snaprealm. When the function finishes its job, it releases the reference. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19	ceph: properly zero data pages for file holes.	Yan, Zheng	1	-6/+7
	A bug is found in striped_read() of fs/ceph/file.c. striped_read() calls ceph_zero_pape_vector_range(). The first argument, page_align + read + ret, passed to ceph_zero_pape_vector_range() is wrong. When a file has holes, this wrong parameter may cause memory corruption either in kernal space or user space. Kernel space memory may be corrupted in the case of non direct IO; user space memory may be corrupted in the case of direct IO. In the latter case, the application doing direct IO may crash due to memory corruption, as we have experienced. The correct value should be initial_align + read + ret, where intial_align = o_direct ? buf_align : io_align. Compared with page_align, the current page offest, initial_align is the initial page offest, which should be used to calculate the page and offset in ceph_zero_pape_vector_range(). Reported-by: caifeng zhu <zhucaifeng@unissoft-nj.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19	ceph: acl: Remove unused function	Rickard Strandqvist	1	-14/+0
	Remove the function ceph_get_cached_acl() that is not used anywhere. This was partially found by using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Reviewed-by: Yan, Zheng <zyan@redhat.com>
2015-02-19	rbd: nuke copy_token()	Rickard Strandqvist	1	-30/+0
	It's been largely superseded by dup_token() and unused for over 2 years, identified by cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> [idryomov@redhat.com: changelog] Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2015-02-19	ceph: handle SESSION_FORCE_RO message	Yan, Zheng	4	-0/+27
	mark session as readonly and wake up all cap waiters. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19	libceph: use mon_client.c/put_generic_request() more	Ilya Dryomov	1	-2/+2
	Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2015-02-19	libceph: nuke pool op infrastructure	Ilya Dryomov	5	-192/+4
	On Mon, Dec 22, 2014 at 5:35 PM, Sage Weil <sage@newdream.net> wrote: > On Mon, 22 Dec 2014, Ilya Dryomov wrote: >> Actually, pool op stuff has been unused for over two years - looks like >> it was added for rbd create_snap and that got ripped out in 2012. It's >> unlikely we'd ever need to manage pools or snaps from the kernel client >> so I think it makes sense to nuke it. Sage? > > Yep! Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2015-02-18	pwm: tegra: Use NSEC_PER_SEC	Thierry Reding	1	-1/+1
	Instead of using the literal value for the number of nanoseconds per second, use the macro instead to increase readability. Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
2015-02-18	md/raid5: Fix livelock when array is both resyncing and degraded.	NeilBrown	1	-1/+2
	Commit a7854487cd7128a30a7f4f5259de9f67d5efb95f: md: When RAID5 is dirty, force reconstruct-write instead of read-modify-write. Causes an RCW cycle to be forced even when the array is degraded. A degraded array cannot support RCW as that requires reading all data blocks, and one may be missing. Forcing an RCW when it is not possible causes a live-lock and the code spins, repeatedly deciding to do something that cannot succeed. So change the condition to only force RCW on non-degraded arrays. Reported-by: Manibalan P <pmanibalan@amiindia.co.in> Bisected-by: Jes Sorensen <Jes.Sorensen@redhat.com> Tested-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de> Fixes: a7854487cd7128a30a7f4f5259de9f67d5efb95f Cc: stable@vger.kernel.org (v3.7+)
2015-02-17	seccomp: cap SECCOMP_RET_ERRNO data to MAX_ERRNO	Kees Cook	1	-1/+3
	The value resulting from the SECCOMP_RET_DATA mask could exceed MAX_ERRNO when setting errno during a SECCOMP_RET_ERRNO filter action. This makes sure we have a reliable value being set, so that an invalid errno will not be ignored by userspace. Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Dmitry V. Levin <ldv@altlinux.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17	samples/seccomp: improve label helper	Kees Cook	2	-1/+9
	Fixes a potential corruption with uninitialized stack memory in the seccomp BPF sample program. [akpm@linux-foundation.org: coding-style fixlet] Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Robert Swiecki <swiecki@google.com> Tested-by: Robert Swiecki <swiecki@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17	ipc,sem: use current->state helpers	Davidlohr Bueso	1	-1/+1
	Call __set_current_state() instead of assigning the new state directly. These interfaces also aid CONFIG_DEBUG_ATOMIC_SLEEP environments, keeping track of who changed the state. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17	scripts/gdb: disable pagination while printing from breakpoint handler	Jan Kiszka	1	-0/+11
	While reporting the (refreshed) list of modules on automatic updates we may hit the page boundary of the output console and cause a stop if pagination is enabled. However, gdb does not accept user input while running over the breakpoint handler. So we get stuck, and the user is forced to interrupt gdb. Resolve this by disabling pagination during automatic symbol updates. We restore the user's configuration once done. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ben Widawsky <ben@bwidawsk.net> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17	scripts/gdb: define maintainer	Jan Kiszka	1	-0/+5
	I'm proposing myself for keeping an eye on these scripts and integrating contributions. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ben Widawsky <ben@bwidawsk.net> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17	scripts/gdb: convert CpuList to generator function	Jan Kiszka	2	-40/+33
	Yet another code simplification. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ben Widawsky <ben@bwidawsk.net> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17	scripts/gdb: convert ModuleList to generator function	Jan Kiszka	2	-23/+12
	Analogously to the task list, convert the module list to a generator function. It noticeably simplifies the code. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ben Widawsky <ben@bwidawsk.net> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17	scripts/gdb: use a generator instead of iterator for task list	Daniel Wagner	1	-30/+20
	The iterator does not return any task_struct from the thread_group list because the first condition in the 'if not t or ...' will only be the first time None. Instead of keeping track of the state ourself in the next() function, we fall back using Python's generator. Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ben Widawsky <ben@bwidawsk.net> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17	scripts/gdb: ignore byte-compiled python files	Daniel Thompson	2	-0/+3
	Using the gdb scripts leaves byte-compiled python files in the scripts/ directory. These should be ignored by git. [jan.kiszka@siemens.com: drop redundant mrproper rule as suggested by Michal] Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ben Widawsky <ben@bwidawsk.net> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-17	scripts/gdb: port to python3 / gdb7.7	Pantelis Koukousoulas	6	-9/+18
	I tried to use these scripts in an ubuntu 14.04 host (gdb 7.7 compiled against python 3.3) but there were several errors. I believe this patch fixes these issues so that the commands now work (I tested lx-symbols, lx-dmesg, lx-lsmod). Main issues that needed to be resolved: * In python 2 iterators have a "next()" method. In python 3 it is __next__() instead (so let's just add both). * In older python versions there was an implicit conversion in object.__format__() (used when an object is in string.format()) where it was converting the object to str first and then calling str's __format__(). This has now been removed so we must explicitly convert to str the objects for which we need to keep this behavior. * In dmesg.py: in python 3 log_buf is now a "memoryview" object which needs to be converted to a string in order to use string methods like "splitlines()". Luckily memoryview exists in python 2.7.6 as well, so we can convert log_buf to memoryview and use the same code in both python 2 and python 3. This version of the patch has now been tested with gdb 7.7 and both python 3.4 and python 2.7.6 (I think asking for at least python 2.7.6 is a reasonable requirement instead of complicating the code with version checks etc). Signed-off-by: Pantelis Koukousoulas <pktoss@gmail.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ben Widawsky <ben@bwidawsk.net> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>