linux-dev - Linux kernel development work

Age	Commit message	Author	Files	Lines
2015-07-01	crush: fix a bug in tree bucket decode	Ilya Dryomov	1	-1/+1
	struct crush_bucket_tree::num_nodes is u8, so ceph_decode_8_safe() should be used. -Wconversion catches this, but I guess it went unnoticed in all the noise it spews. The actual problem (at least for common crushmaps) isn't the u32 -> u8 truncation though - it's the advancement by 4 bytes instead of 1 in the crushmap buffer. Fixes: http://tracker.ceph.com/issues/2759 Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Josh Durgin <jdurgin@redhat.com>
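The buffer-advancement problem described above can be sketched in userspace. This is only an illustration under stated assumptions: `decode_u8()` and `decode_u32()` are hypothetical stand-ins for cursor-style decoders (they are not the kernel's `ceph_decode_*_safe()` macros), and a little-endian wire format is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cursor-style decoders: each reads a value at *p and
 * advances *p by the number of bytes it consumed. */
static uint8_t decode_u8(const uint8_t **p, const uint8_t *end)
{
	assert(*p + 1 <= end);	/* bounds check before reading */
	return *(*p)++;
}

static uint32_t decode_u32(const uint8_t **p, const uint8_t *end)
{
	uint32_t v;

	assert(*p + 4 <= end);	/* bounds check before reading */
	v = (uint32_t)(*p)[0] | (uint32_t)(*p)[1] << 8 |
	    (uint32_t)(*p)[2] << 16 | (uint32_t)(*p)[3] << 24;
	*p += 4;
	return v;
}

/* Bytes consumed when a one-byte field is decoded with the right width. */
static size_t consumed_with_u8(const uint8_t *buf, size_t len)
{
	const uint8_t *p = buf;

	(void)decode_u8(&p, buf + len);
	return (size_t)(p - buf);
}

/* Bytes consumed when the same field is decoded as a 32-bit value:
 * the cursor ends up 3 bytes too far, so later fields are misread. */
static size_t consumed_with_u32(const uint8_t *buf, size_t len)
{
	const uint8_t *p = buf;

	(void)decode_u32(&p, buf + len);
	return (size_t)(p - buf);
}
```

With a one-byte field, the 32-bit decoder leaves the cursor three bytes past where the next field starts, which is the misalignment the commit message calls the actual problem.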
2015-06-29	libceph: Fix ceph_tcp_sendpage()'s more boolean usage	Benoît Canet	1	-1/+1
	From struct ceph_msg_data_cursor in include/linux/ceph/messenger.h: bool last_piece; /* current is last piece */ In ceph_msg_data_next(): *last_piece = cursor->last_piece; A call to ceph_msg_data_next() is followed by: ret = ceph_tcp_sendpage(con->sock, page, page_offset, length, last_piece); while ceph_tcp_sendpage() is: static int ceph_tcp_sendpage(struct socket *sock, struct page *page, int offset, size_t size, bool more) The logic is inverted: correct it. Signed-off-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Alex Elder <elder@linaro.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
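The relationship between the two booleans can be sketched as follows. This is a userspace illustration, not the kernel code: `piece_send_flags()` is a hypothetical helper, and it assumes a Linux system where `MSG_MORE` and `MSG_NOSIGNAL` are available from `<sys/socket.h>`.

```c
#include <stdbool.h>
#include <sys/socket.h>

/* Hypothetical helper: flags for sending one piece of a larger message.
 * MSG_MORE is set on every piece except the last one, so "more" is the
 * logical negation of "last_piece". */
static int piece_send_flags(bool last_piece)
{
	bool more = !last_piece;

	return MSG_NOSIGNAL | (more ? MSG_MORE : 0);
}
```

Passing `last_piece` directly where `more` is expected would set the hint on exactly the wrong pieces, which is the inversion the commit describes.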
2015-06-25	libceph: Remove spurious kunmap() of the zero page	Benoît Canet	1	-1/+0
	ceph_tcp_sendpage already does the work of mapping/unmapping the zero page if needed. Signed-off-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Alex Elder <elder@linaro.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-06-25	rbd: queue_depth map option	Ilya Dryomov	1	-3/+14
	nr_requests (/sys/block/rbd<id>/queue/nr_requests) is pretty much irrelevant in blk-mq case because each driver sets its own max depth that it can handle and that's the number of tags that gets preallocated on setup. Users can't increase queue depth beyond that value via writing to nr_requests. For rbd we are happy with the default BLKDEV_MAX_RQ (128) for most cases but we want to give users the opportunity to increase it. Introduce a new per-device queue_depth option to do just that: $ sudo rbd map -o queue_depth=1024 ... Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
2015-06-25	rbd: store rbd_options in rbd_device	Ilya Dryomov	1	-7/+11
	Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
2015-06-25	rbd: terminate rbd_opts_tokens with Opt_err	Ilya Dryomov	1	-16/+8
	Also nuke useless Opt_last_bool and don't break lines unnecessarily. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
2015-06-25	ceph: fix ceph_writepages_start()	Yan, Zheng	1	-14/+23
	Before a page gets locked, someone else can write data to the page and increase the i_size. So we should re-check the i_size after pages are locked. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25	rbd: bump queue_max_segments	Ilya Dryomov	1	-0/+1
	The default queue_limits::max_segments value (BLK_MAX_SEGMENTS = 128) unnecessarily limits bio sizes to 512k (assuming 4k pages). rbd, being a virtual block device, doesn't have any restrictions on the number of physical segments, so bump max_segments to max_hw_sectors, in theory allowing a sector per segment (although the only case this matters that I can think of is some readv/writev style thing). In practice this is going to give us 1M bios - the number of segments in a bio is limited in bio_get_nr_vecs() by BIO_MAX_PAGES = 256. Note that this doesn't result in any improvement on a typical direct sequential test. This is because on a box with a not too badly fragmented memory the default BLK_MAX_SEGMENTS is enough to see nice rbd object size sized requests. The only difference is the size of bios being merged - 512k vs 1M for something like $ dd if=/dev/zero of=/dev/rbd0 oflag=direct bs=$RBD_OBJ_SIZE $ dd if=/dev/rbd0 iflag=direct of=/dev/null bs=$RBD_OBJ_SIZE Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
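The 512k and 1M figures above follow from simple arithmetic, sketched here under the commit's own assumption of 4k pages and one page per segment. `max_bio_bytes()` is a hypothetical helper for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: upper bound on a bio's size when every segment
 * is exactly one page. */
static size_t max_bio_bytes(size_t max_segments, size_t page_size)
{
	assert(page_size > 0);
	return max_segments * page_size;
}
```

With 128 segments (BLK_MAX_SEGMENTS) this gives 128 * 4096 = 524288 bytes (512k), and with 256 pages (BIO_MAX_PAGES) it gives 256 * 4096 = 1048576 bytes (1M).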
2015-06-25	ceph: rework dcache readdir	Yan, Zheng	6	-215/+295
	Previously our dcache readdir code relied on the child dentries in the directory dentry's d_subdir list being sorted by dentry offset in descending order. When adding dentries to the dcache, if a dentry already exists, our readdir code moves it to the head of the directory dentry's d_subdir list. This design relies on dcache internals. Al Viro suggests using ncpfs's approach: keeping an array of pointers to dentries in the page cache of the directory inode. The validity of those pointers is represented by the directory inode's complete and ordered flags. When a dentry gets pruned, we clear the directory inode's complete flag in the d_prune() callback. Before moving a dentry to another directory, we clear the ordered flag for both the old and new directories. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25	crush: sync up with userspace	Ilya Dryomov	7	-78/+160
	.. up to ceph.git commit 1db1abc8328d ("crush: eliminate ad hoc diff between kernel and userspace"). This fixes a bunch of recently pulled coding style issues and makes includes a bit cleaner. A patch "crush:Make the function crush_ln static" from Nicholas Krause <xerofoify@gmail.com> is folded in as crush_ln() has been made static in userspace as well. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-06-25	crush: fix crash from invalid 'take' argument	Ilya Dryomov	1	-2/+9
	Verify that the 'take' argument is a valid device or bucket. Otherwise ignore it (do not add the value to the working vector). Reflects ceph.git commit 9324d0a1af61e1c234cc48e2175b4e6320fff8f4. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
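A check of this kind can be sketched as below. The structures are hypothetical, trimmed-down stand-ins (not the real `struct crush_map`), and the sketch assumes the usual CRUSH convention that devices have non-negative ids while a bucket with negative id b is stored at index -1 - b.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified map for illustration only. */
struct toy_bucket {
	int id;
};

struct toy_map {
	struct toy_bucket **buckets;	/* slot may be NULL if unused */
	int max_buckets;
	int max_devices;
};

/* True if arg names an in-range device or an existing bucket. */
static bool take_arg_is_valid(const struct toy_map *map, int arg)
{
	int idx;

	assert(map != NULL);

	if (arg >= 0)
		return arg < map->max_devices;

	idx = -1 - arg;	/* cannot overflow for negative arg */
	return idx < map->max_buckets && map->buckets[idx] != NULL;
}
```

A caller would skip the 'take' step (leave the working vector untouched) whenever this returns false, instead of dereferencing an out-of-range or missing bucket.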
2015-06-25	ceph: switch some GFP_NOFS memory allocation to GFP_KERNEL	Yan, Zheng	5	-14/+15
	GFP_NOFS memory allocation is required for the page writeback path. But there is no need to use GFP_NOFS in the syscall path and the readpage path. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25	ceph: pre-allocate data structure that tracks caps flushing	Yan, Zheng	9	-16/+103
	Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25	ceph: re-send flushing caps (which are revoked) in reconnect stage	Yan, Zheng	3	-6/+61
	If flushing caps were revoked, we should re-send the cap flush in the client reconnect stage. This guarantees that the MDS processes the cap flush message before issuing the flushing caps to another client. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25	ceph: send TID of the oldest pending caps flush to MDS	Yan, Zheng	1	-18/+49
	Using this information, the MDS can trim its completed caps flush list (which is used to detect duplicate cap flushes). Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25	ceph: track pending caps flushing globally	Yan, Zheng	5	-57/+91
	This way we know the TID of the oldest pending caps flush. A later patch will send this information to the MDS, so that the MDS can trim its completed caps flush list. Tracking pending caps flushing globally also simplifies the syncfs code. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25	ceph: track pending caps flushing accurately	Yan, Zheng	5	-88/+192
	Previously we did not track accurate TIDs for flushing caps. When the MDS fails over, we have no choice but to re-send all flushing caps with a new TID. This can cause problems because the MDS may have already flushed some caps and issued the same caps to another client. The re-sent cap flush has a new TID, which makes the MDS unable to detect whether it has already processed the cap flush. This patch adds code to track pending caps flushing accurately. When re-sending a cap flush is needed, we use its original flush TID. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25	libceph: fix wrong name "Ceph filesystem for Linux"	Hong Zhiguo	1	-1/+1
	modinfo libceph prints the module name "Ceph filesystem for Linux", which is the same as that of the real fs module, ceph. It's confusing. Signed-off-by: Hong Zhiguo <zhiguohong@tencent.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-06-25	ceph: fix directory fsync	Yan, Zheng	2	-64/+65
	fsync() on a directory should flush dirty caps and wait for any uncommitted directory operations to commit. But ceph_dir_fsync() only waits for uncommitted directory operations. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25	ceph: fix flushing caps	Yan, Zheng	1	-24/+25
	Currently ceph_fsync() only flushes dirty caps and waits for them to be flushed. It doesn't wait for caps that are already being flushed. This patch makes ceph_fsync() wait for pending flushing caps too. Besides, this patch also makes caps_are_flushed() properly handle tid wrapping. Signed-off-by: Yan, Zheng <zyan@redhat.com>
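Wrap-safe comparison of sequence numbers is commonly done with a signed difference. The sketch below shows that general technique; it is not taken from the patch, the 16-bit width is an assumption made for illustration, and `tid_before_eq()` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if tid a is at or before tid b in a
 * wrapping 16-bit sequence space. Valid while the two tids are less
 * than half the range (32768) apart. The narrowing conversion to
 * int16_t relies on two's-complement behaviour, which mainstream
 * compilers provide. */
static bool tid_before_eq(uint16_t a, uint16_t b)
{
	return (int16_t)(uint16_t)(a - b) <= 0;
}
```

A plain `a <= b` would report that tid 65530 comes after tid 3, even though 3 was issued later once the counter wrapped; the signed difference orders them correctly.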
2015-06-25	ceph: don't include used caps in cap_wanted	Yan, Zheng	1	-3/+3
	When copying files to cephfs, file data may stay in the page cache after the corresponding file is closed. Cached data uses the Fc capability. If we include the Fc capability in cap_wanted, the MDS will treat files with cached data as open files, and journal them in an EOpen event when trimming a log segment. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25	ceph: ratelimit warn messages for MDS closes session	Yan, Zheng	1	-3/+7
	Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25	rbd: timeout watch teardown on unmap with mount_timeout	Ilya Dryomov	1	-10/+28
	As part of the unmap sequence, the kernel client has to talk to the OSDs to tear down the watch on the header object. If none of the OSDs are available it would hang forever, until interrupted by a signal - when that happens we follow through with the rest of the unmap procedure (i.e. unregister the device and put all the data structures) and the unmap is still considered successful (rbd cli tool exits with 0). The watch on the userspace side should eventually time out so that's fine. This isn't very nice, because various userspace tools (pacemaker rbd resource agent, for example) then have to worry about setting up their own timeouts. Time it out with mount_timeout (60 seconds by default). Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org> Reviewed-by: Sage Weil <sage@redhat.com>
2015-06-25	ceph: simplify two mount_timeout sites	Ilya Dryomov	2	-18/+14
	No need to bifurcate wait now that we've got ceph_timeout_jiffies(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org> Reviewed-by: Yan, Zheng <zyan@redhat.com>
2015-06-25	libceph: a couple tweaks for wait loops	Ilya Dryomov	2	-5/+4
	- return -ETIMEDOUT instead of -EIO in case of timeout - wait_event_interruptible_timeout() returns time left until timeout and since it can be almost LONG_MAX we had better assign it to long Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
2015-06-25	libceph: store timeouts in jiffies, verify user input	Ilya Dryomov	9	-38/+71
	There are currently three libceph-level timeouts that the user can specify on mount: mount_timeout, osd_idle_ttl and osdkeepalive. All of these are in seconds and no checking is done on user input: negative values are accepted, we multiply them all by HZ which may or may not overflow, arbitrarily large jiffies then get added together, etc. There is also a bug in the way mount_timeout=0 is handled. It's supposed to mean "infinite timeout", but that's not how wait.h APIs treat it and so __ceph_open_session() for example will busy loop without much chance of being interrupted if none of ceph-mons are there. Fix all this by verifying user input, storing timeouts capped by msecs_to_jiffies() in jiffies and using the new ceph_timeout_jiffies() helper for all user-specified waits to handle infinite timeouts correctly. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
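The two ideas above (validate user input before converting it, and treat 0 as "wait forever") can be sketched in userspace. Everything here is hypothetical: `TOY_HZ`, `toy_parse_timeout()` and `toy_timeout_ticks()` are illustrative names, the tick rate is assumed, and the overflow check is a simplification rather than the capping via msecs_to_jiffies() that the commit describes.

```c
#include <limits.h>
#include <stdbool.h>

#define TOY_HZ 250L	/* assumed tick rate, for illustration only */

/* Validate a user-supplied timeout in seconds and convert it to ticks.
 * Rejects negative values and values whose tick count would overflow. */
static bool toy_parse_timeout(long secs, long *ticks)
{
	if (secs < 0 || secs > LONG_MAX / TOY_HZ)
		return false;

	*ticks = secs * TOY_HZ;
	return true;
}

/* Map a stored timeout of 0 ("infinite") to the largest wait value,
 * so a wait primitive does not return immediately and busy loop. */
static long toy_timeout_ticks(long ticks)
{
	return ticks ? ticks : LONG_MAX;
}
```

Callers would store the validated tick count once at mount time and pass it through the second helper at every user-specified wait.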
2015-06-25	libceph: nuke time_sub()	Ilya Dryomov	1	-9/+0
	Unused since ceph got merged into mainline I guess. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
2015-06-25	ceph: exclude setfilelock requests when calculating oldest tid	Yan, Zheng	2	-7/+25
	setfilelock requests can block for a long time, which can prevent client from advancing its oldest tid. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25	ceph: don't pre-allocate space for cap release messages	Yan, Zheng	4	-202/+129
	Previously we pre-allocated cap release messages for each cap. This wastes lots of memory when there is a large number of caps. This patch makes the code not pre-allocate the cap release messages. Instead, we add the corresponding ceph_cap struct to a list when releasing a cap. Later, when flushing cap releases is needed, we allocate the cap release messages dynamically. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25	ceph: make sure syncfs flushes all cap snaps	Yan, Zheng	4	-31/+76
	Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25	ceph: don't trim auth cap when there are cap snaps	Yan, Zheng	1	-1/+2
	Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25	ceph: take snap_rwsem when accessing snap realm's cached_context	Yan, Zheng	3	-7/+57
	When the ceph inode's i_head_snapc is NULL, __ceph_mark_dirty_caps() accesses the snap realm's cached_context. So we need to take the read lock of snap_rwsem. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25	ceph: avoid sending unnessesary FLUSHSNAP message	Yan, Zheng	3	-45/+78
	When a snap notification contains no new snapshot, we can avoid sending a FLUSHSNAP message to the MDS. But we still need to create a cap_snap in some cases because it's required by the write path and the page writeback path. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25	ceph: set i_head_snapc when getting CEPH_CAP_FILE_WR reference	Yan, Zheng	4	-137/+212
	In most cases where the snap context is needed, we are holding a reference to CEPH_CAP_FILE_WR. So we can set the ceph inode's i_head_snapc when getting the CEPH_CAP_FILE_WR reference, and make the code get the snap context from i_head_snapc. This makes the code simpler. Another benefit of this change is that we can handle snap notification more elegantly, especially when the snap context is updated while someone else is doing a write. The old queue cap_snap code may set cap_snap's context to either the old context or the new snap context, depending on whether i_head_snapc is set. The new queue cap_snap code always sets cap_snap's context to the old snap context. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25	ceph: use empty snap context for uninline_data and get_pool_perm	Yan, Zheng	3	-14/+14
	Cached_context in ceph_snap_realm is directly accessed by uninline_data() and get_pool_perm(). This is racy in theory. Neither uninline_data() nor get_pool_perm() modifies an existing object; they only create new objects. So we can pass the empty snap context to them. Unlike cached_context in ceph_snap_realm, we do not need to protect the empty snap context. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25	libceph: use kvfree() instead of open-coding it	Ilya Dryomov	1	-4/+1
	This one sneaked in through vfs tree with commit 2b777c9dd9eb ("ceph_sync_read: stop poking into iov_iter guts"). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-06-25	ceph: check OSD caps before read/write	Yan, Zheng	7	-6/+249
	Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25	libceph: allow setting osd_req_op's flags	Yan, Zheng	5	-14/+21
	Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>
2015-06-25	libceph: properly release STAT request's raw_data_in	Yan, Zheng	1	-0/+3
	Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>
2015-06-21	Linux 4.1	Linus Torvalds	1	-1/+1

2015-06-19	clk: at91: fix h32mx prototype inclusion in pmc header	Nicolas Ferre	1	-1/+1
	Trivial fix for an issue that prevents this pmc clock driver from compiling if h32mx clock is present but smd clock isn't. Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Fixes: bcc5fd49a0fd ("clk: at91: add a driver for the h32mx clock") Cc: <stable@vger.kernel.org> # 3.18+
2015-06-19	clk: at91: trivial: typo in peripheral clock description	Nicolas Ferre	1	-1/+1
	Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
2015-06-19	clk: at91: fix PERIPHERAL_MAX_SHIFT definition	Boris Brezillon	1	-4/+4
	Fix the PERIPHERAL_MAX_SHIFT definition (3 instead of 4) and adapt the round_rate and set_rate logic accordingly. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Reported-by: "Wu, Songjun" <Songjun.Wu@atmel.com>
2015-06-19	clk: at91: pll: fix input range validity check	Boris Brezillon	1	-2/+10
	The PLL imposes a certain input range to work correctly, but it appears that this input range does not apply to the input clock (or parent clock) but to the input clock after it has passed the PLL divisor. Fix the implementation accordingly. Cc: <stable@vger.kernel.org> # v3.14+ Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Reported-by: Jonas Andersson <jonas@microbit.se>
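The corrected check can be sketched as below: the range is tested against the parent rate after division, not against the raw parent rate. The names `toy_range` and `pll_input_ok()` are hypothetical and do not come from the at91 driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical allowed input range of the PLL, in Hz. */
struct toy_range {
	unsigned long min;
	unsigned long max;
};

/* True if the parent rate, after passing the divisor, falls inside the
 * PLL's allowed input range. A divisor of 0 is rejected. */
static bool pll_input_ok(unsigned long parent_rate, unsigned long div,
			 const struct toy_range *in)
{
	unsigned long divided;

	assert(in != NULL);

	if (div == 0)
		return false;

	divided = parent_rate / div;
	return divided >= in->min && divided <= in->max;
}
```

With an illustrative range of 1 MHz to 32 MHz, a 48 MHz parent is out of range on its own but acceptable with a divisor of 2, which a check against the undivided parent rate would wrongly reject.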
2015-06-18	revert "cpumask: don't perform while loop in cpumask_next_and()"	Andrew Morton	1	-5/+4
	Revert commit 534b483a86e6 ("cpumask: don't perform while loop in cpumask_next_and()"). This was a minor optimization, but it puts a `struct cpumask' on the stack, which consumes too much stack space. Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reported-by: Peter Zijlstra <peterz@infradead.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Amir Vadai <amirv@mellanox.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-18	drm/radeon: don't probe MST on hw we don't support it on	Dave Airlie	1	-0/+5
	If you do radeon.mst=1 on a gpu without mst hw, and then plug some mst hw it will oops instead of falling back. So check we have DCE5 at least before proceeding. Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Christian König <christian.koenig@amd.com>
2015-06-18	drm/radeon: Add RADEON_INFO_VA_UNMAP_WORKING query	Michel Dänzer	2	-0/+4
	This tells userspace that it's safe to use the RADEON_VA_UNMAP operation of the DRM_RADEON_GEM_VA ioctl. Cc: stable@vger.kernel.org (NOTE: Backporting this commit requires at least backports of commits 26d4d129b6042197b4cbc8341c0618f99231af2f, 48afbd70ac7b6aa62e8d452091023941d8085f8a and c29c0876ec05d51a93508a39b90b92c29ba6423d as well, otherwise using RADEON_VA_UNMAP runs into trouble) Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com>
2015-06-17	Kconfig: disable Media Controller for DVB	Mauro Carvalho Chehab	1	-0/+1
	Since we started discussions about the usage of Media Controller for complex hardware, one thing has become clear: the way it is, MC fails to map anything different than capture/output/m2m video-only streaming. The point is that MC has entities named as devnodes, but the only devnode used (before the DVB patches) is MEDIA_ENT_T_DEVNODE_V4L. Due to the way MC got implemented, however, this entity actually doesn't represent the devnode, but the hardware I/O engine that receives data via DMA. By coincidence, such DMA is associated with the V4L device node on webcam hardware, but this is not true even for other V4L2 devices. For example, on USB hardware, the DMA is done via the USB controller. The data passes through an in-kernel filter that strips off the URB headers. Other V4L2 devices like radio may not even have DMA. When they do, the DMA is done via ALSA, and not via the V4L devnode. In other words, MC is broken as a whole, but tagging it as BROKEN right now would do more harm than good. So, instead, let's mark, for now, the DVB part as broken and block all new changes to MC while we fix this mess, which we hopefully will do for the next Kernel version. Requested-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-17	mm: shmem_zero_setup skip security check and lockdep conflict with XFS	Hugh Dickins	1	-1/+7
	It appears that, at some point last year, XFS made directory handling changes which bring it into lockdep conflict with shmem_zero_setup(): it is surprising that mmap() can clone an inode while holding mmap_sem, but that has been so for many years. Since those few lockdep traces that I've seen all implicated selinux, I'm hoping that we can use the __shmem_file_setup(,,,S_PRIVATE) which v3.13's commit c7277090927a ("security: shmem: implement kernel private shmem inodes") introduced to avoid LSM checks on kernel-internal inodes: the mmap("/dev/zero") cloned inode is indeed a kernel-internal detail. This also covers the !CONFIG_SHMEM use of ramfs to support /dev/zero (and MAP_SHARED|MAP_ANONYMOUS). I thought there were also drivers which cloned inode in mmap(), but if so, I cannot locate them now. Reported-and-tested-by: Prarit Bhargava <prarit@redhat.com> Reported-and-tested-by: Daniel Wagner <wagi@monom.org> Reported-and-tested-by: Morten Stevens <mstevens@fedoraproject.org> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-17	i2c: slave: fix the example how to instantiate from userspace	Wolfram Sang	1	-3/+3
	I copied the wrong shell code into the documentation. Sorry to all who tried to get sense out of this current example :/ Slight rewording while we are here. Reported-by: Tim Bakker <bakkert@mymail.vcu.edu> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org