aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/scripts/python/futex-contention.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2014-10-14ceph: additional debugfs outputJohn Spray2-0/+47
MDS session state and client global ID is useful instrumentation when testing. Signed-off-by: John Spray <john.spray@redhat.com>
2014-10-14ceph: export ceph_session_state_name functionJohn Spray2-7/+9
...so that it can be used from the ceph debugfs code when dumping session info. Signed-off-by: John Spray <john.spray@redhat.com>
2014-10-14ceph: include the initial ACL in create/mkdir/mknod MDS requestsYan, Zheng4-47/+170
Current code set new file/directory's initial ACL in a non-atomic manner. Client first sends request to MDS to create new file/directory, then set the initial ACL after the new file/directory is successfully created. The fix is include the initial ACL in create/mkdir/mknod MDS requests. So MDS can handle creating file/directory and setting the initial ACL in one request. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>
2014-10-14ceph: use pagelist to present MDS request dataYan, Zheng3-39/+28
Current code uses page array to present MDS request data. Pages in the array are allocated/freed by caller of ceph_mdsc_do_request(). If request is interrupted, the pages can be freed while they are still being used by the request message. The fix is use pagelist to present MDS request data. Pagelist is reference counted. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>
2014-10-14libceph: reference counting pagelistYan, Zheng4-7/+10
this allow pagelist to present data that may be sent multiple times. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>
2014-10-14ceph: fix llistxattr on symlinkYan, Zheng1-2/+1
only regular file and directory have vxattrs. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-10-14ceph: send client metadata to MDSJohn Spray1-1/+70
Implement version 2 of CEPH_MSG_CLIENT_SESSION syntax, which includes additional client metadata to allow the MDS to report on clients by user-sensible names like hostname. Signed-off-by: John Spray <john.spray@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
2014-10-14ceph: remove redundant code for max file size verificationChao Yu2-15/+0
Both ceph_update_writeable_page and ceph_setattr will verify file size with max size ceph supported. There are two caller for ceph_update_writeable_page, ceph_write_begin and ceph_page_mkwrite. For ceph_write_begin, we have already verified the size in generic_write_checks of ceph_write_iter; for ceph_page_mkwrite, we have no chance to change file size when mmap. Likewise we have already verified the size in inode_change_ok when we call ceph_setattr. So let's remove the redundant code for max file size verification. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
2014-10-14ceph: remove redundant io_iter_advance()Yan, Zheng1-1/+0
ceph_sync_read and generic_file_read_iter() have already advanced the IO iterator. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-10-14ceph: move ceph_find_inode() outside the s_mutexYan, Zheng2-8/+10
ceph_find_inode() may wait on freeing inode, using it inside the s_mutex may cause deadlock. (the freeing inode is waiting for OSD read reply, but dispatch thread is blocked by the s_mutex) Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>
2014-10-14ceph: request xattrs if xattr_version is zeroYan, Zheng5-30/+20
Following sequence of events can happen. - Client releases an inode, queues cap release message. - A 'lookup' reply brings the same inode back, but the reply doesn't contain xattrs because MDS didn't receive the cap release message and thought client already has up-to-data xattrs. The fix is force sending a getattr request to MDS if xattrs_version is 0. The getattr mask is set to CEPH_STAT_CAP_XATTR, so MDS knows client does not have xattr. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-10-14rbd: set the remaining discard properties to enable supportJosh Durgin1-0/+2
max_discard_sectors must be set for the queue to support discard. Operations implementing discard for rbd zero data, so report that. Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-10-14rbd: use helpers to handle discard for layered images correctlyJosh Durgin1-32/+22
Only allocate two osd ops for discard requests, since the preallocation hint is only added for regular writes. Use rbd_img_obj_request_fill() to recreate the original write or discard osd operations, isolating that logic to one place, and change the assert in rbd_osd_req_create_copyup() to accept discard requests as well. Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-10-14rbd: extract a method for adding object operationsJosh Durgin1-55/+78
rbd_img_request_fill() creates a ceph_osd_request and has logic for adding the appropriate osd ops to it based on the request type and image properties. For layered images, the original rbd_obj_request is resent with a copyup operation in front, using a new ceph_osd_request. The logic for adding the original operations should be the same as when first sending them, so move it to a helper function. op_type only needs to be checked once, so create a helper for that as well and call it outside the loop in rbd_img_request_fill(). Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-10-14rbd: make discard trigger copy-on-writeJosh Durgin1-1/+2
Discard requests are a form of write, so they should go through the same process as plain write requests and trigger copy-on-write for layered images. Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-10-14rbd: tolerate -ENOENT for discard operationsJosh Durgin1-0/+3
Discard may try to delete an object from a non-layered image that does not exist. If this occurs, the image already has no data in that range, so change the result to success. Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-10-14rbd: fix snapshot context reference count for discardsJosh Durgin1-1/+2
Discards take a reference to the snapshot context of an image when they are created. This reference needs to be cleaned up when the request is done just as it is for regular writes. Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-10-14rbd: read image size for discard check safelyJosh Durgin1-6/+12
In rbd_img_request_fill() the image size is only checked to determine whether we can truncate an object instead of zeroing it for discard requests. Take rbd_dev->header_rwsem while reading the image size, and move this read into the discard check, so that non-discard ops don't need to take the semaphore in this function. Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-10-14rbd: initial discard bits from Guangliang ZhaoGuangliang Zhao1-15/+89
This patch add the discard support for rbd driver. There are three types operation in the driver: 1. The objects would be removed if they completely contained within the discard range. 2. The objects would be truncated if they partly contained within the discard range, and align with their boundary. 3. Others would be zeroed. A discard request from blkdev_issue_discard() is defined which REQ_WRITE and REQ_DISCARD both marked and no data, so we must check the REQ_DISCARD first when getting the request type. This resolve: http://tracker.ceph.com/issues/190 [ Ilya Dryomov: This is incomplete and somewhat buggy, see follow up commits by Josh Durgin for refinements and fixes which weren't folded in to preserve authorship. ] Signed-off-by: Guangliang Zhao <lucienchao@gmail.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
2014-10-14rbd: extend the operation typeGuangliang Zhao1-34/+63
It could only handle the read and write operations now, extend it for the coming discard support. Signed-off-by: Guangliang Zhao <lucienchao@gmail.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
2014-10-14rbd: skip the copyup when an entire object writingGuangliang Zhao1-0/+8
It need to copyup the parent's content when layered writing, but an entire object write would overwrite it, so skip it. Signed-off-by: Guangliang Zhao <lucienchao@gmail.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
2014-10-14rbd: add img_obj_request_simple() helperIlya Dryomov1-16/+28
To clarify the conditions and make it easier to add new ones. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-10-14rbd: access snapshot context and mapping size safelyJosh Durgin1-13/+21
These fields may both change while the image is mapped if a snapshot is created or deleted or the image is resized. They are guarded by rbd_dev->header_rwsem, so hold that while reading them, and store a local copy to refer to outside of the critical section. The local copy will stay consistent since the snapshot context is reference counted, and the mapping size is just a u64. This prevents torn loads from giving us inconsistent values. Move reading header.snapc into the caller of rbd_img_request_create() so that we only need to take the semaphore once. The read-only caller, rbd_parent_request_create() can just pass NULL for snapc, since the snapshot context is only relevant for writes. Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
2014-10-14rbd: do not return -ERANGE on auth failuresIlya Dryomov1-3/+1
Trying to map an image out of a pool for which we don't have an 'x' permission bit fails with -ERANGE from ceph_extract_encoded_string() due to an unsigned vs signed bug. Fix it and get rid of the -EINVAL sink, thus propagating rbd::get_id cls method errors. (I've seen a bunch of unexplained -ERANGE reports, I bet this is it). Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
2014-10-14libceph: don't try checking queue_work() return valueIlya Dryomov1-10/+5
queue_work() doesn't "fail to queue", it returns false if work was already on a queue, which can't happen here since we allocate event_work right before we queue it. So don't bother at all. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
2014-10-14ceph: make sure request isn't in any waiting list when kicking request.Yan, Zheng1-0/+1
we may corrupt waiting list if a request in the waiting list is kicked. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>
2014-10-14ceph: protect kick_requests() with mdsc->mutexYan, Zheng1-2/+3
Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>
2014-10-14libceph: Convert pr_warning to pr_warnJoe Perches5-37/+39
Use the more common pr_warn. Other miscellanea: o Coalesce formats o Realign arguments Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-10-14ceph: trim unused inodes before reconnecting to recovering MDSYan, Zheng1-10/+13
So the recovering MDS does not need to fetch these ununsed inodes during cache rejoin. This may reduce MDS recovery time. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-10-14libceph: fix a use after free issue in osdmap_set_max_osdLi RongQing1-16/+16
If the state variable is krealloced successfully, map->osd_state will be freed, once following two reallocation failed, and exit the function without resetting map->osd_state, map->osd_state become a wild pointer. fix it by resetting them after krealloc successfully. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-10-14libceph: select CRYPTO_CBC in addition to CRYPTO_AESIlya Dryomov1-0/+1
We want "cbc(aes)" algorithm, so select CRYPTO_CBC too, not just CRYPTO_AES. Otherwise on !CRYPTO_CBC kernels we fail rbd map/mount with libceph: error -2 building auth method x request Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-10-14libceph: resend lingering requests with a new tidIlya Dryomov1-19/+54
Both not yet registered (r_linger && list_empty(&r_linger_item)) and registered linger requests should use the new tid on resend to avoid the dup op detection logic on the OSDs, yet we were doing this only for "registered" case. Factor out and simplify the "registered" logic and use the new helper for "not registered" case as well. Fixes: http://tracker.ceph.com/issues/8806 Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
2014-10-14libceph: abstract out ceph_osd_request enqueue logicIlya Dryomov1-7/+17
Introduce __enqueue_request() and switch to it. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
2014-10-05Linux 3.17Linus Torvalds1-1/+1
2014-10-03init/Kconfig: Fix HAVE_FUTEX_CMPXCHG to not break up the EXPERT menuJosh Triplett1-0/+1
commit 03b8c7b623c80af264c4c8d6111e5c6289933666 ("futex: Allow architectures to skip futex_atomic_cmpxchg_inatomic() test") added the HAVE_FUTEX_CMPXCHG symbol right below FUTEX. This placed it right in the middle of the options for the EXPERT menu. However, HAVE_FUTEX_CMPXCHG does not depend on EXPERT or FUTEX, so Kconfig stops placing items in the EXPERT menu, and displays the remaining several EXPERT items (starting with EPOLL) directly in the General Setup menu. Since both users of HAVE_FUTEX_CMPXCHG only select it "if FUTEX", make HAVE_FUTEX_CMPXCHG itself depend on FUTEX. With this change, the subsequent items display as part of the EXPERT menu again; the EMBEDDED menu now appears as the next top-level item in the General Setup menu, which makes General Setup much shorter and more usable. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Cc: stable <stable@vger.kernel.org>
2014-10-03init/Kconfig: Hide printk log config if CONFIG_PRINTK=nJosh Triplett1-0/+2
The buffers sized by CONFIG_LOG_BUF_SHIFT and CONFIG_LOG_CPU_MAX_BUF_SHIFT do not exist if CONFIG_PRINTK=n, so don't ask about their size at all. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Cc: stable <stable@vger.kernel.org>
2014-10-03[SCSI] uas: disable use of blk-mq I/O pathChristoph Hellwig1-0/+7
The uas driver uses the block layer tag for USB3 stream IDs. With blk-mq we can get larger tag numbers that the queue depth, which breaks this assumption. A fix is under way for 3.18, but sits on top of large changes so can't easily be backported. Set the disable_blk_mq path so that a uas device can't easily crash the system when using blk-mq for SCSI. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2014-10-03i2c: qup: Fix order of runtime pm initializationAndy Gross1-4/+8
The runtime pm calls need to be done before populating the children via the i2c_add_adapter call. If this is not done, a child can run into issues trying to do i2c read/writes due to the pm_runtime_sync failing. Signed-off-by: Andy Gross <agross@codeaurora.org> Reviewed-by: Felipe Balbi <balbi@ti.com> Acked-by: Bjorn Andersson <bjorn.andersson@sonymobile.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org
2014-10-03i2c: rk3x: fix 0 length write transfersAlexandru M Stan1-1/+1
i2cdetect -q was broken (everything was a false positive, and no transfers were actually being sent over i2c). The way it works is by sending a 0 length write request and checking for NACK. This patch fixes the 0 length writes and actually sends them. Reported-by: Doug Anderson <dianders@chromium.org> Signed-off-by: Alexandru M Stan <amstan@chromium.org> Tested-by: Doug Anderson <dianders@chromium.org> Tested-by: Max Schwarz <max.schwarz@online.de> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org
2014-10-02mm: page_alloc: fix zone allocation fairness on UPJohannes Weiner1-4/+3
The zone allocation batches can easily underflow due to higher-order allocations or spills to remote nodes. On SMP that's fine, because underflows are expected from concurrency and dealt with by returning 0. But on UP, zone_page_state will just return a wrapped unsigned long, which will get past the <= 0 check and then consider the zone eligible until its watermarks are hit. Commit 3a025760fc15 ("mm: page_alloc: spill to remote nodes before waking kswapd") already made the counter-resetting use atomic_long_read() to accomodate underflows from remote spills, but it didn't go all the way with it. Make it clear that these batches are expected to go negative regardless of concurrency, and use atomic_long_read() everywhere. Fixes: 81c0a2bb515f ("mm: page_alloc: fair zone allocator policy") Reported-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Leon Romanovsky <leon@leon.nu> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Mel Gorman <mgorman@suse.de> Cc: <stable@vger.kernel.org> [3.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-02perf: fix perf bug in fork()Peter Zijlstra2-3/+6
Oleg noticed that a cleanup by Sylvain actually uncovered a bug; by calling perf_event_free_task() when failing sched_fork() we will not yet have done the memset() on ->perf_event_ctxp[] and will therefore try and 'free' the inherited contexts, which are still in use by the parent process. This is bad.. Suggested-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Sylvain 'ythier' Hitier <sylvain.hitier@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-02MAINTAINERS: change git URL for mpc5xxx treeAnatolij Gustschin1-1/+1
The repository for mpc5xxx has been moved, update git URL to new location. Signed-off-by: Anatolij Gustschin <agust@denx.de> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-02mm: memcontrol: do not iterate uninitialized memcgsJohannes Weiner1-5/+31
The cgroup iterators yield css objects that have not yet gone through css_online(), but they are not complete memcgs at this point and so the memcg iterators should not return them. Commit d8ad30559715 ("mm/memcg: iteration skip memcgs not yet fully initialized") set out to implement exactly this, but it uses CSS_ONLINE, a cgroup-internal flag that does not meet the ordering requirements for memcg, and so the iterator may skip over initialized groups, or return partially initialized memcgs. The cgroup core can not reasonably provide a clear answer on whether the object around the css has been fully initialized, as that depends on controller-specific locking and lifetime rules. Thus, introduce a memcg-specific flag that is set after the memcg has been initialized in css_online(), and read before mem_cgroup_iter() callers access the memcg members. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> [3.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-02ocfs2/dlm: should put mle when goto kill in dlm_assert_master_handleralex chen1-0/+4
In dlm_assert_master_handler, the mle is get in dlm_find_mle, should be put when goto kill, otherwise, this mle will never be released. Signed-off-by: Alex Chen <alex.chen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: joyce.xue <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-02ring-buffer: Fix infinite spin in reading bufferSteven Rostedt (Red Hat)1-1/+1
Commit 651e22f2701b "ring-buffer: Always reset iterator to reader page" fixed one bug but in the process caused another one. The reset is to update the header page, but that fix also changed the way the cached reads were updated. The cache reads are used to test if an iterator needs to be updated or not. A ring buffer iterator, when created, disables writes to the ring buffer but does not stop other readers or consuming reads from happening. Although all readers are synchronized via a lock, they are only synchronized when in the ring buffer functions. Those functions may be called by any number of readers. The iterator continues down when its not interrupted by a consuming reader. If a consuming read occurs, the iterator starts from the beginning of the buffer. The way the iterator sees that a consuming read has happened since its last read is by checking the reader "cache". The cache holds the last counts of the read and the reader page itself. Commit 651e22f2701b changed what was saved by the cache_read when the rb_iter_reset() occurred, making the iterator never match the cache. Then if the iterator calls rb_iter_reset(), it will go into an infinite loop by checking if the cache doesn't match, doing the reset and retrying, just to see that the cache still doesn't match! Which should never happen as the reset is suppose to set the cache to the current value and there's locks that keep a consuming reader from having access to the data. Fixes: 651e22f2701b "ring-buffer: Always reset iterator to reader page" Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-10-02CIFS: Fix readpages retrying on reconnectsPavel Shilovsky1-7/+1
If we got a reconnect error from async readv we re-add pages back to page_list and continue loop. That is wrong because these pages have been already added to the pagecache but page_list has pages that have not been added to the pagecache yet. This ends up with a general protection fault in put_pages after readpages. Fix it by not retrying the read of these pages and falling back to readpage instead. Fixes debian bug 762306 Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com> Tested-by: Arthur Marsh <arthur.marsh@internode.on.net>
2014-10-02Fix problem recognizing symlinksSteve French2-1/+3
Changeset eb85d94bd introduced a problem where if a cifs open fails during query info of a file we will still try to close the file (happens with certain types of reparse points) even though the file handle is not valid. In addition for SMB2/SMB3 we were not mapping the return code returned by Windows when trying to open a file (like a Windows NFS symlink) which is a reparse point. Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org> CC: stable <stable@vger.kernel.org> #v3.13+
2014-10-02mm: numa: Do not mark PTEs pte_numa when splitting huge pagesMel Gorman1-2/+5
This patch reverts 1ba6e0b50b ("mm: numa: split_huge_page: transfer the NUMA type from the pmd to the pte"). If a huge page is being split due a protection change and the tail will be in a PROT_NONE vma then NUMA hinting PTEs are temporarily created in the protected VMA. VM_RW|VM_PROTNONE |-----------------| ^ split here In the specific case above, it should get fixed up by change_pte_range() but there is a window of opportunity for weirdness to happen. Similarly, if a huge page is shrunk and split during a protection update but before pmd_numa is cleared then a pte_numa can be left behind. Instead of adding complexity trying to deal with the case, this patch will not mark PTEs NUMA when splitting a huge page. NUMA hinting faults will not be triggered which is marginal in comparison to the complexity in dealing with the corner cases during THP split. Cc: stable@vger.kernel.org Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-02mm: migrate: Close race between migration completion and mprotectMel Gorman1-1/+4
A migration entry is marked as write if pte_write was true at the time the entry was created. The VMA protections are not double checked when migration entries are being removed as mprotect marks write-migration-entries as read. It means that potentially we take a spurious fault to mark PTEs write again but it's straight-forward. However, there is a race between write migrations being marked read and migrations finishing. This potentially allows a PTE to be write that should have been read. Close this race by double checking the VMA permissions using maybe_mkwrite when migration completes. [torvalds@linux-foundation.org: use maybe_mkwrite] Cc: stable@vger.kernel.org Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-02md/raid5: disable 'DISCARD' by default due to safety concerns.NeilBrown1-1/+17
It has come to my attention (thanks Martin) that 'discard_zeroes_data' is only a hint. Some devices in some cases don't do what it says on the label. The use of DISCARD in RAID5 depends on reads from discarded regions being predictably zero. If a write to a previously discarded region performs a read-modify-write cycle it assumes that the parity block was consistent with the data blocks. If all were zero, this would be the case. If some are and some aren't this would not be the case. This could lead to data corruption after a device failure when data needs to be reconstructed from the parity. As we cannot trust 'discard_zeroes_data', ignore it by default and so disallow DISCARD on all raid4/5/6 arrays. As many devices are trustworthy, and as there are benefits to using DISCARD, add a module parameter to over-ride this caution and cause DISCARD to work if discard_zeroes_data is set. If a site want to enable DISCARD on some arrays but not on others they should select DISCARD support at the filesystem level, and set the raid456 module parameter. raid456.devices_handle_discard_safely=Y As this is a data-safety issue, I believe this patch is suitable for -stable. DISCARD support for RAID456 was added in 3.7 Cc: Shaohua Li <shli@kernel.org> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Heinz Mauelshagen <heinzm@redhat.com> Cc: stable@vger.kernel.org (3.7+) Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Fixes: 620125f2bf8ff0c4969b79653b54d7bcc9d40637 Signed-off-by: NeilBrown <neilb@suse.de>