aboutsummaryrefslogtreecommitdiffstats
path: root/fs/gfs2 (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-08-28Merge tag 'gfs2-v5.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2Linus Torvalds2-0/+32
Pull gfs2 fix from Andreas Gruenbacher: "Fix a memory leak on filesystem withdraw. We didn't detect this bug because we have slab merging on by default (CONFIG_SLAB_MERGE_DEFAULT). Adding 'slub_nomerge' to the kernel command line exposed the problem" * tag 'gfs2-v5.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: add some much needed cleanup for log flushes that fail
2020-08-24gfs2: add some much needed cleanup for log flushes that failBob Peterson2-0/+32
When a log flush fails due to io errors, it signals the failure but does not clean up after itself very well. This is because buffers are added to the transaction tr_buf and tr_databuf queue, but the io error causes gfs2_log_flush to bypass the "after_commit" functions responsible for dequeueing the bd elements. If the bd elements are added to the ail list before the error, function ail_drain takes care of dequeueing them. But if they haven't gotten that far, the elements are forgotten and make the transactions unable to be freed. This patch introduces new function trans_drain which drains the bd elements from the transaction so they can be freed properly. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-08-23treewide: Use fallthrough pseudo-keywordGustavo A. R. Silva2-3/+3
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-10Merge tag 'gfs2-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2Linus Torvalds6-61/+81
Pull gfs2 updates from Andreas Gruenbacher: - Make sure transactions won't be started recursively in gfs2_block_zero_range (bug introduced in 5.4 when switching to iomap_zero_range) - Fix a glock holder refcount leak introduced in the iopen glock locking scheme rework merged in 5.8. - A few other small improvements (debugging, stack usage, comment fixes). * tag 'gfs2-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: When gfs2_dirty_inode gets a glock error, dump the glock gfs2: Never call gfs2_block_zero_range with an open transaction gfs2: print details on transactions that aren't properly ended gfs2: Fix inaccurate comment fs: Fix typo in comment gfs2: Fix refcount leak in gfs2_glock_poke gfs2: Pass glock holder to gfs2_file_direct_{read,write} gfs2: Add some flags missing from glock output
2020-08-07gfs2: When gfs2_dirty_inode gets a glock error, dump the glockBob Peterson1-0/+1
Before this patch, if function gfs2_dirty_inode got an error when trying to lock the inode glock, it complained, but it didn't say what glock or inode had the problem. In this case, it almost always means that dinode_in found an error with the dinode in the file system. So it makes sense to dump the glock, which tells us the location of the dinode in the file system. That will allow us to analyze the corruption from the metadata. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-08-07gfs2: Never call gfs2_block_zero_range with an open transactionBob Peterson1-30/+39
Before this patch, some functions started transactions then they called gfs2_block_zero_range. However, gfs2_block_zero_range, like writes, can start transactions, which results in a recursive transaction error. For example: do_shrink trunc_start gfs2_trans_begin <------------------------------------------------ gfs2_block_zero_range iomap_zero_range(inode, from, length, NULL, &gfs2_iomap_ops); iomap_apply ... iomap_zero_range_actor iomap_begin gfs2_iomap_begin gfs2_iomap_begin_write actor (iomap_zero_range_actor) iomap_zero iomap_write_begin gfs2_iomap_page_prepare gfs2_trans_begin <------------------------ This patch reorders the callers of gfs2_block_zero_range so that they only start their transactions after the call. It also adds a BUG_ON to ensure this doesn't happen again. Fixes: 2257e468a63b ("gfs2: implement gfs2_block_zero_range using iomap_zero_range") Cc: stable@vger.kernel.org # v5.5+ Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-08-07gfs2: print details on transactions that aren't properly endedBob Peterson1-13/+16
If function gfs2_trans_begin is called with another transaction active it BUGs out, but it doesn't give any details about the duplicate. This patch moves function gfs2_print_trans and calls it when this situation arises for better debugging. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-08-07gfs2: Fix inaccurate commentBob Peterson1-1/+1
The comment regarding journal flush thresholds is wrong. This patch fixes it. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-08-06Merge tag 'iomap-5.9-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-1/+2
Pull iomap updates from Darrick Wong: "The most notable changes are: - iomap no longer invalidates the page cache when performing a direct read, since doing so is unnecessary and the old directio code doesn't do that either. - iomap embraced the use of returning ENOTBLK from a direct write to trigger falling back to a buffered write since ext4 already did this and btrfs wants it for their port. - iomap falls back to buffered writes if we're doing a direct write and the page cache invalidation after the flush fails; this was necessary to handle a corner case in the btrfs port. - Remove email virus scanner detritus that was accidentally included in yesterday's pull request. Clearly I need(ed) to update my git branch checker scripts. :(" * tag 'iomap-5.9-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: fall back to buffered writes for invalidation failures xfs: use ENOTBLK for direct I/O to buffered I/O fallback iomap: Only invalidate page cache pages on direct IO writes iomap: Make sure iomap_end is called after iomap_begin
2020-08-05iomap: fall back to buffered writes for invalidation failuresChristoph Hellwig1-1/+2
Failing to invalid the page cache means data in incoherent, which is a very bad state for the system. Always fall back to buffered I/O through the page cache if we can't invalidate mappings. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Theodore Ts'o <tytso@mit.edu> # for ext4 Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> # for gfs2 Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
2020-08-04Merge tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linuxLinus Torvalds3-3/+3
Pull uninitialized_var() macro removal from Kees Cook: "This is long overdue, and has hidden too many bugs over the years. The series has several "by hand" fixes, and then a trivial treewide replacement. - Clean up non-trivial uses of uninitialized_var() - Update documentation and checkpatch for uninitialized_var() removal - Treewide removal of uninitialized_var()" * tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: compiler: Remove uninitialized_var() macro treewide: Remove uninitialized_var() usage checkpatch: Remove awareness of uninitialized_var() macro mm/debug_vm_pgtable: Remove uninitialized_var() usage f2fs: Eliminate usage of uninitialized_var() macro media: sur40: Remove uninitialized_var() usage KVM: PPC: Book3S PR: Remove uninitialized_var() usage clk: spear: Remove uninitialized_var() usage clk: st: Remove uninitialized_var() usage spi: davinci: Remove uninitialized_var() usage ide: Remove uninitialized_var() usage rtlwifi: rtl8192cu: Remove uninitialized_var() usage b43: Remove uninitialized_var() usage drbd: Remove uninitialized_var() usage x86/mm/numa: Remove uninitialized_var() usage docs: deprecated.rst: Add uninitialized_var()
2020-08-03gfs2: Fix refcount leak in gfs2_glock_pokeAndreas Gruenbacher1-1/+3
In gfs2_glock_poke, make sure gfs2_holder_uninit is called on the local glock holder. Without that, we're leaking a glock and a pid reference. Fixes: 9e8990dea926 ("gfs2: Smarter iopen glock waiting") Cc: stable@vger.kernel.org # v5.8+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-08-03gfs2: Pass glock holder to gfs2_file_direct_{read,write}Andreas Gruenbacher1-16/+15
Pass a pointer to the existing glock holder from gfs2_file_{read,write}_iter to gfs2_file_direct_{read,write} to save some stack space. Suggested-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-08-03gfs2: Add some flags missing from glock outputBob Peterson1-0/+6
Before this patch, three flags were not represented in the glock output. This patch adds them in: c - GLF_INODE_CREATING P - GLF_PENDING_DELETE x - GLF_FREEING (both f and F are already used) Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-07-16treewide: Remove uninitialized_var() usageKees Cook3-3/+3
Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script: git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \ xargs perl -pi -e \ 's/\buninitialized_var\(([^\)]+)\)/\1/g; s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;' drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-07gfs2: Rework read and page fault lockingAndreas Gruenbacher2-46/+51
So far, gfs2 has taken the inode glocks inside the ->readpage and ->readahead address space operations. Since commit d4388340ae0b ("fs: convert mpage_readpages to mpage_readahead"), gfs2_readahead is passed the pages to read ahead locked. With that, the current holder of the inode glock may be trying to lock one of those pages while gfs2_readahead is trying to take the inode glock, resulting in a deadlock. Fix that by moving the lock taking to the higher-level ->read_iter file and ->fault vm operations. This also gets rid of an ugly lock inversion workaround in gfs2_readpage. The cache consistency model of filesystems like gfs2 is such that if data is found in the page cache, the data is up to date and can be used without taking any filesystem locks. If a page is not cached, filesystem locks must be taken before populating the page cache. To avoid taking the inode glock when the data is already cached, gfs2_file_read_iter first tries to read the data with the IOCB_NOIO flag set. If that fails, the inode glock is taken and the operation is retried with the IOCB_NOIO flag cleared. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-07-03gfs2: The freeze glock should never be frozenBob Peterson2-8/+11
Before this patch, some gfs2 code locked the freeze glock with LM_FLAG_NOEXP (Do not freeze) flag, and some did not. We never want to freeze the freeze glock, so this patch makes it consistently use LM_FLAG_NOEXP always. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-07-03gfs2: When freezing gfs2, use GL_EXACT and not GL_NOCACHEBob Peterson2-11/+9
Before this patch, the freeze code in gfs2 specified GL_NOCACHE in several places. That's wrong because we always want to know the state of whether the file system is frozen. There was also a problem with freeze/thaw transitioning the glock from frozen (EX) to thawed (SH) because gfs2 will normally grant glocks in EX to processes that request it in SH mode, unless GL_EXACT is specified. Therefore, the freeze/thaw code, which tried to reacquire the glock in SH mode would get the glock in EX mode, and miss the transition from EX to SH. That made it think the thaw had completed normally, but since the glock was still cached in EX, other nodes could not freeze again. This patch removes the GL_NOCACHE flag to allow the freeze glock to be cached. It also adds the GL_EXACT flag so the glock is fully transitioned from EX to SH, thereby allowing future freeze operations. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-07-03gfs2: read-only mounts should grab the sd_freeze_gl glockBob Peterson1-1/+11
Before this patch, only read-write mounts would grab the freeze glock in read-only mode, as part of gfs2_make_fs_rw. So the freeze glock was never initialized. That meant requests to freeze, which request the glock in EX, were granted without any state transition. That meant you could mount a gfs2 file system, which is currently frozen on a different cluster node, in read-only mode. This patch makes read-only mounts lock the freeze glock in SH mode, which will block for file systems that are frozen on another node. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-07-03gfs2: freeze should work on read-only mountsBob Peterson1-4/+6
Before this patch, function freeze_go_sync, called when promoting the freeze glock, was testing for the SDF_JOURNAL_LIVE superblock flag. That's only set for read-write mounts. Read-only mounts don't use a journal, so the bit is never set, so the freeze never happened. This patch removes the check for SDF_JOURNAL_LIVE for freeze requests but still checks it when deciding whether to flush a journal. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-07-03gfs2: eliminate GIF_ORDERED in favor of list_emptyBob Peterson4-9/+12
In several places, we used the GIF_ORDERED inode flag to determine if an inode was on the ordered writes list. However, since we always held the sd_ordered_lock spin_lock during the manipulation, we can just as easily check list_empty(&ip->i_ordered) instead. This allows us to keep more than one ordered writes list to make journal writing improvements. This patch eliminates GIF_ORDERED in favor of checking list_empty. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-06-30gfs2: Don't sleep during glock hash walkAndreas Gruenbacher1-1/+4
In flush_delete_work, instead of flushing each individual pending delayed work item, cancel and re-queue them for immediate execution. The waiting isn't needed here because we're already waiting for all queued work items to complete in gfs2_flush_delete_work. This makes the code more efficient, but more importantly, it avoids sleeping during a rhashtable walk, inside rcu_read_lock(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-30gfs2: fix trans slab error when withdraw occurs inside log_flushBob Peterson1-0/+10
Log flush operations (gfs2_log_flush()) can target a specific transaction. But if the function encounters errors (e.g. io errors) and withdraws, the transaction was only freed it if was queued to one of the ail lists. If the withdraw occurred before the transaction was queued to the ail1 list, function ail_drain never freed it. The result was: BUG gfs2_trans: Objects remaining in gfs2_trans on __kmem_cache_shutdown() This patch makes log_flush() add the targeted transaction to the ail1 list so that function ail_drain() will find and free it properly. Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-30gfs2: Don't return NULL from gfs2_inode_lookupAndreas Gruenbacher1-1/+2
Callers expect gfs2_inode_lookup to return an inode pointer or ERR_PTR(error). Commit b66648ad6dcf caused it to return NULL instead of ERR_PTR(-ESTALE) in some cases. Fix that. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: b66648ad6dcf ("gfs2: Move inode generation number check into gfs2_inode_lookup") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-08Merge tag 'gfs2-for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2Linus Torvalds16-76/+396
Pull gfs2 updates from Andreas Gruenbacher: - An iopen glock locking scheme rework that speeds up deletes of inodes accessed from multiple nodes - Various bug fixes and debugging improvements - Convert gfs2-glocks.txt to ReST * tag 'gfs2-for-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: fix use-after-free on transaction ail lists gfs2: new slab for transactions gfs2: initialize transaction tr_ailX_lists earlier gfs2: Smarter iopen glock waiting gfs2: Wake up when setting GLF_DEMOTE gfs2: Check inode generation number in delete_work_func gfs2: Move inode generation number check into gfs2_inode_lookup gfs2: Minor gfs2_lookup_by_inum cleanup gfs2: Try harder to delete inodes locally gfs2: Give up the iopen glock on contention gfs2: Turn gl_delete into a delayed work gfs2: Keep track of deleted inode generations in LVBs gfs2: Allow ASPACE glocks to also have an lvb gfs2: instrumentation wrt log_flush stuck gfs2: introduce new gfs2_glock_assert_withdraw gfs2: print mapping->nrpages in glock dump for address space glocks gfs2: Only do glock put in gfs2_create_inode for free inodes gfs2: Allow lock_nolock mount to specify jid=X gfs2: Don't ignore inode write errors during inode_go_sync docs: filesystems: convert gfs2-glocks.txt to ReST
2020-06-05Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4Linus Torvalds1-0/+1
Pull ext4 updates from Ted Ts'o: "A lot of bug fixes and cleanups for ext4, including: - Fix performance problems found in dioread_nolock now that it is the default, caused by transaction leaks. - Clean up fiemap handling in ext4 - Clean up and refactor multiple block allocator (mballoc) code - Fix a problem with mballoc with a smaller file systems running out of blocks because they couldn't properly use blocks that had been reserved by inode preallocation. - Fixed a race in ext4_sync_parent() versus rename() - Simplify the error handling in the extent manipulation code - Make sure all metadata I/O errors are felected to ext4_ext_dirty()'s and ext4_make_inode_dirty()'s callers. - Avoid passing an error pointer to brelse in ext4_xattr_set() - Fix race which could result to freeing an inode on the dirty last in data=journal mode. - Fix refcount handling if ext4_iget() fails - Fix a crash in generic/019 caused by a corrupted extent node" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (58 commits) ext4: avoid unnecessary transaction starts during writeback ext4: don't block for O_DIRECT if IOCB_NOWAIT is set ext4: remove the access_ok() check in ext4_ioctl_get_es_cache fs: remove the access_ok() check in ioctl_fiemap fs: handle FIEMAP_FLAG_SYNC in fiemap_prep fs: move fiemap range validation into the file systems instances iomap: fix the iomap_fiemap prototype fs: move the fiemap definitions out of fs.h fs: mark __generic_block_fiemap static ext4: remove the call to fiemap_check_flags in ext4_fiemap ext4: split _ext4_fiemap ext4: fix fiemap size checks for bitmap files ext4: fix EXT4_MAX_LOGICAL_BLOCK macro add comment for ext4_dir_entry_2 file_type member jbd2: avoid leaking transaction credits when unreserving handle ext4: drop ext4_journal_free_reserved() ext4: mballoc: use lock for checking free blocks while retrying ext4: mballoc: refactor ext4_mb_good_group() ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling ext4: mballoc: refactor ext4_mb_discard_preallocations() ...
2020-06-05Merge branch 'gfs2-iopen' into for-nextAndreas Gruenbacher9-38/+289
2020-06-05gfs2: fix use-after-free on transaction ail listsBob Peterson1-2/+9
Before this patch, transactions could be merged into the system transaction by function gfs2_merge_trans(), but the transaction ail lists were never merged. Because the ail flushing mechanism can run separately, bd elements can be attached to the transaction's buffer list during the transaction (trans_add_meta, etc) but quickly moved to its ail lists. Later, in function gfs2_trans_end, the transaction can be freed (by gfs2_trans_end) while it still has bd elements queued to its ail lists, which can cause it to either lose track of the bd elements altogether (memory leak) or worse, reference the bd elements after the parent transaction has been freed. Although I've not seen any serious consequences, the problem becomes apparent with the previous patch's addition of: gfs2_assert_warn(sdp, list_empty(&tr->tr_ail1_list)); to function gfs2_trans_free(). This patch adds logic into gfs2_merge_trans() to move the merged transaction's ail lists to the sdp transaction. This prevents the use-after-free. To do this properly, we need to hold the ail lock, so we pass sdp into the function instead of the transaction itself. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05gfs2: new slab for transactionsBob Peterson6-8/+32
This patch adds a new slab for gfs2 transactions. That allows us to reduce kernel memory fragmentation, have better organization of data for analysis of vmcore dumps. A new centralized function is added to free the slab objects, and it exposes use-after-free by giving warnings if a transaction is freed while it still has bd elements attached to its buffers or ail lists. We make sure to initialize those transaction ail lists so we can check their integrity when freeing. At a later time, we should add a slab initialization function to make it more efficient, but for this initial patch I wanted to minimize the impact. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05gfs2: initialize transaction tr_ailX_lists earlierBob Peterson3-2/+4
Since transactions may be freed shortly after they're created, before a log_flush occurs, we need to initialize their ail1 and ail2 lists earlier. Before this patch, the ail1 list was initialized in gfs2_log_flush(). This moves the initialization to the point when the transaction is first created. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05gfs2: Smarter iopen glock waitingAndreas Gruenbacher2-5/+40
When trying to upgrade the iopen glock from a shared to an exclusive lock in gfs2_evict_inode, abort the wait if there is contention on the corresponding inode glock: in that case, the inode must still be in active use on another node, and we're not guaranteed to get the iopen glock anytime soon. To make this work even better, when we notice contention on the iopen glock and we can't evict the corresponsing inode and release the iopen glock immediately, poke the inode glock. The other node(s) trying to acquire the lock can then abort instead of timing out. Thanks to Heinz Mauelshagen for pointing out a locking bug in a previous version of this patch. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05gfs2: Wake up when setting GLF_DEMOTEAndreas Gruenbacher1-4/+14
Wake up the sdp->sd_async_glock_wait wait queue when setting the GLF_DEMOTE flag. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05gfs2: Check inode generation number in delete_work_funcAndreas Gruenbacher2-2/+7
In delete_work_func, if the iopen glock still has an inode attached, limit the inode lookup to that specific generation number: in the likely case that the inode was deleted on the node on which the inode's link count dropped to zero, we can skip verifying the on-disk block type and reading in the inode. The same applies if another node that had the inode open managed to delete the inode before us. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05gfs2: Move inode generation number check into gfs2_inode_lookupAndreas Gruenbacher1-9/+24
Move the inode generation number check from gfs2_lookup_by_inum into gfs2_inode_lookup: gfs2_inode_lookup may be able to decide that an inode with the given inode generation number cannot exist without having to verify the block type or reading the inode from disk. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05gfs2: Minor gfs2_lookup_by_inum cleanupAndreas Gruenbacher4-5/+14
Use a zero no_formal_ino instead of a NULL pointer to indicate that any inode generation number will qualify: a valid inode never has a zero no_formal_ino. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05gfs2: Try harder to delete inodes locallyAndreas Gruenbacher1-6/+47
When an inode's link count drops to zero and the inode is cached on other nodes, the current behavior of gfs2 is to immediately give up and to rely on the other node(s) to delete the inode if there is iopen glock contention. This leads to resource group glock bouncing and the loss of caching. With the previous patches in place, we can fix that by not giving up immediately. When the inode is still open on other nodes, those nodes won't be able to evict the inode and give up the iopen glock. In that case, our lock conversion request will time out. The unlink system call will block for the duration of the iopen lock conversion request. We're also holding the inode glock in EX mode for an extended duration, so other nodes won't be able to make progress on the inode, either. This is worse than what we had before, but we can prevent other nodes from getting stuck by aborting our iopen locking request if there is contention on the inode glock. This will the the subject of a future patch. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05gfs2: Give up the iopen glock on contentionAndreas Gruenbacher3-2/+57
When there's contention on the iopen glock, it means that the link count of the corresponding inode has dropped to zero on a remote node which is now trying to delete the inode. In that case, try to evict the inode so that the iopen glock will be released, which will allow the remote node to do its job. When the inode is still open locally, the inode's reference count won't drop to zero and so we'll keep holding the inode and its iopen glock. The remote node will time out its request to grab the iopen glock, and when the inode is finally closed locally, we'll try to delete it ourself. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05gfs2: Turn gl_delete into a delayed workAndreas Gruenbacher7-8/+65
This requires flushing delayed work items in gfs2_make_fs_ro (which is called before unmounting a filesystem). When inodes are deleted and then recreated, pending gl_delete work items would have no effect because the inode generations will have changed, so we can cancel any pending gl_delete works before reusing iopen glocks. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05gfs2: Keep track of deleted inode generations in LVBsAndreas Gruenbacher4-1/+26
When deleting an inode, keep track of the generation of the deleted inode in the inode glock Lock Value Block (LVB). When trying to delete an inode remotely, check the last-known inode generation against the deleted inode generation to skip duplicate remote deletes. This avoids taking the resource group glock in order to verify the block type. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05gfs2: Allow ASPACE glocks to also have an lvbBob Peterson1-4/+3
Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05gfs2: instrumentation wrt log_flush stuckBob Peterson1-9/+25
This adds checks for gfs2_log_flush being stuck, similarly to the check in gfs2_ail1_flush. To faciliate this and make the strings easy to grep we move the ail1 emptying to its own function, empty_ail1_list. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05gfs2: introduce new gfs2_glock_assert_withdrawBob Peterson2-3/+13
Before this patch, asserts based on glocks did not print the glock with the error. This patch introduces a new macro, gfs2_glock_assert_withdraw which first prints the glock, then takes the assert. This also changes a few glock asserts to the new macro. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05gfs2: print mapping->nrpages in glock dump for address space glocksBob Peterson1-9/+16
This patch makes the glock dumps in debugfs print the number of pages (nrpages) for address space glocks. This will aid in debugging. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-03fs: move the fiemap definitions out of fs.hChristoph Hellwig1-0/+1
No need to pull the fiemap definitions into almost every file in the kernel build. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Link: https://lore.kernel.org/r/20200523073016.2944131-5-hch@lst.de Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-02gfs2: Only do glock put in gfs2_create_inode for free inodesBob Peterson1-1/+2
Before this patch, the error path of function gfs2_create_inode would always calls gfs2_glock_put for the inode glock. That's good for inodes that are free. But after they've been added to the vfs inodes, errors will cause the inode to be evicted, and the evict will do the glock put for us. If we do a glock put again, we can try to free the glock while there are still references to it, e.g. revokes pending for the transaction that created it. This patch adds a check: if (free_vfs_inode) before the put, thus solving the problem. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-02mm: remove the pgprot argument to __vmallocChristoph Hellwig2-6/+5
The pgprot argument to __vmalloc is always PAGE_KERNEL now, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> [hyperv] Acked-by: Gao Xiang <xiang@kernel.org> [erofs] Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Wei Liu <wei.liu@kernel.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-22-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02fs: convert mpage_readpages to mpage_readaheadMatthew Wilcox (Oracle)1-15/+8
Implement the new readahead aop and convert all callers (block_dev, exfat, ext2, fat, gfs2, hpfs, isofs, jfs, nilfs2, ocfs2, omfs, qnx6, reiserfs & udf). The callers are all trivial except for GFS2 & OCFS2. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> # ocfs2 Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> # ocfs2 Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Cc: Chao Yu <yuchao0@huawei.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-17-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02gfs2: Allow lock_nolock mount to specify jid=XBob Peterson1-1/+1
Before this patch, a simple typo accidentally added \n to the jid= string for lock_nolock mounts. This made it impossible to mount a gfs2 file system with a journal other than journal0. Thus: mount -tgfs2 -o hostdata="jid=1" <device> <mount pt> Resulted in: mount: wrong fs type, bad option, bad superblock on <device> In most cases this is not a problem. However, for debugging and testing purposes we sometimes want to test the integrity of other journals. This patch removes the unnecessary \n and thus allows lock_nolock users to specify an alternate journal. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-02gfs2: Don't ignore inode write errors during inode_go_syncBob Peterson1-3/+5
Before for this patch, function inode_go_sync ignored io errors during inode_go_sync, overwriting them with metadata write errors: error = filemap_fdatawait(mapping); mapping_set_error(mapping, error); } error = filemap_fdatawait(metamapping); ... return error; So any errors returned by the inode write would be forgotten if the metadata write succeeded. This patch still does both writes, but only sets error if it's still zero. That way, any errors will be reported by to the caller, do_xmote, which will take appropriate action and report the error. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-05-29gfs2: Even more gfs2_find_jhead fixesAndreas Gruenbacher1-10/+5
Fix several issues in the previous gfs2_find_jhead fix: * When updating @blocks_submitted, @block refers to the first block block not submitted yet, not the last block submitted, so fix an off-by-one error. * We want to ensure that @blocks_submitted is far enough ahead of @blocks_read to guarantee that there is in-flight I/O. Otherwise, we'll eventually end up waiting for pages that haven't been submitted, yet. * It's much easier to compare the number of blocks added with the number of blocks submitted to limit the maximum bio size. * Even with bio chaining, we can keep adding blocks until we reach the maximum bio size, as long as we stop at a page boundary. This simplifies the logic. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Bob Peterson <rpeterso@redhat.com>