aboutsummaryrefslogtreecommitdiffstats
path: root/fs/gfs2/rgrp.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-11-25gfs2: check for empty rgrp tree in gfs2_ri_updateBob Peterson1-0/+4
If gfs2 tries to mount a (corrupt) file system that has no resource groups it still tries to set preferences on the first one, which causes a kernel null pointer dereference. This patch adds a check to function gfs2_ri_update so this condition is detected and reported back as an error. Reported-by: syzbot+e3f23ce40269a4c9053a@syzkaller.appspotmail.com Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-11-12gfs2: fix possible reference leak in gfs2_check_blk_typeZhang Qilong1-5/+5
In the fail path of gfs2_check_blk_type, forgetting to call gfs2_glock_dq_uninit will result in rgd_gh reference leak. Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-10-29gfs2: check for live vs. read-only file system in gfs2_fitrimBob Peterson1-0/+3
Before this patch, gfs2_fitrim was not properly checking for a "live" file system. If the file system had something to trim and the file system was read-only (or spectator) it would start the trim, but when it starts the transaction, gfs2_trans_begin returns -EROFS (read-only file system) and it errors out. However, if the file system was already trimmed so there's no work to do, it never called gfs2_trans_begin. That code is bypassed so it never returns the error. Instead, it returns a good return code with 0 work. All this makes for inconsistent behavior: The same fstrim command can return -EROFS in one case and 0 in another. This tripped up xfstests generic/537 which reports the error as: +fstrim with unrecovered metadata just ate your filesystem This patch adds a check for a "live" (iow, active journal, iow, RW) file system, and if not, returns the error properly. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-10-29gfs2: Free rd_bits later in gfs2_clear_rgrpd to fix use-after-freeBob Peterson1-1/+1
Function gfs2_clear_rgrpd calls kfree(rgd->rd_bits) before calling return_all_reservations, but return_all_reservations still dereferences rgd->rd_bits in __rs_deltree. Fix that by moving the call to kfree below the call to return_all_reservations. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-10-20gfs2: Eliminate gl_vmBob Peterson1-4/+0
The gfs2_glock structure has a gl_vm member, introduced in commit 7005c3e4ae428 ("GFS2: Use range based functions for rgrp sync/invalidation"), which stores the location of resource groups within their address space. This structure is in a union with iopen glock specific fields. It was introduced because at unmount time, the resource group objects were destroyed before flushing out any pending resource group glock work, and flushing out such work could require flushing / truncating the address space. Since commit b3422cacdd7e6 ("gfs2: Rework how rgrp buffer_heads are managed"), any pending resource group glock work is flushed out before destroying the resource group objects. So the resource group objects will now always exist in rgrp_go_sync and rgrp_go_inval, and we now simply compute the gl_vm values where needed instead of caching them. This also eliminates the union. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-10-15gfs2: Wipe jdata and ail1 in gfs2_journal_wipe, formerly gfs2_meta_wipeBob Peterson1-3/+3
Before this patch, when blocks were freed, it called gfs2_meta_wipe to take the metadata out of the pending journal blocks. It did this mostly by calling another function called gfs2_remove_from_journal. This is shortsighted because it does not do anything with jdata blocks which may also be in the journal. This patch expands the function so that it wipes out jdata blocks from the journal as well, and it wipes it from the ail1 list if it hasn't been written back yet. Since it now processes jdata blocks as well, the function has been renamed from gfs2_meta_wipe to gfs2_journal_wipe. New function gfs2_ail1_wipe wants a static view of the ail list, so it locks the sd_ail_lock when removing items. To accomplish this, function gfs2_remove_from_journal no longer locks the sd_ail_lock, and it's now the caller's responsibility to do so. I was going to make sd_ail_lock locking conditional, but the practice is generally frowned upon. For details, see: https://lwn.net/Articles/109066/ Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-10-14gfs2: Fix NULL pointer dereference in gfs2_rgrp_dumpAndrew Price1-6/+3
When an rindex entry is found to be corrupt, compute_bitstructs() calls gfs2_consist_rgrpd() which calls gfs2_rgrp_dump() like this: gfs2_rgrp_dump(NULL, rgd->rd_gl, fs_id_buf); gfs2_rgrp_dump then dereferences the gl without checking it and we get BUG: KASAN: null-ptr-deref in gfs2_rgrp_dump+0x28/0x280 because there's no rgrp glock involved while reading the rindex on mount. Fix this by changing gfs2_rgrp_dump to take an rgrp argument. Reported-by: syzbot+43fa87986bdd31df9de6@syzkaller.appspotmail.com Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05gfs2: Turn gl_delete into a delayed workAndreas Gruenbacher1-1/+1
This requires flushing delayed work items in gfs2_make_fs_ro (which is called before unmounting a filesystem). When inodes are deleted and then recreated, pending gl_delete work items would have no effect because the inode generations will have changed, so we can cancel any pending gl_delete works before reusing iopen glocks. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-03-27gfs2: don't lock sd_log_flush_lock in try_rgrp_unlinkBob Peterson1-2/+0
In function try_rgrp_unlink, we added a temporary lock of the sd_log_flush_lock while searching the bitmaps. This protected us from problems in which dinodes being freed were still in a state of flux because the rgrp was in an active transaction. It was a kludge. Now that we've straightened out the code for inode eviction, deletes, and all the recovery mess, we no longer need this kludge. This patch removes it, and should improve performance. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-03-27gfs2: Split gfs2_rsqa_delete into gfs2_rs_delete and gfs2_qa_putAndreas Gruenbacher1-3/+2
Keeping reservations and quotas separate helps reviewing the code. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-03-27gfs2: Change inode qa_data to allow multiple usersBob Peterson1-1/+1
Before this patch, multiple users called gfs2_qa_alloc which allocated a qadata structure to the inode, if quotas are turned on. Later, in file close or evict, the structure was deleted with gfs2_qa_delete. But there can be several competing processes who need access to the structure. There were races between file close (release) and the others. Thus, a release could delete the structure out from under a process that relied upon its existence. For example, chown. This patch changes the management of the qadata structures to be a get/put scheme. Function gfs2_qa_alloc has been changed to gfs2_qa_get and if the structure is allocated, the count essentially starts out at 1. Function gfs2_qa_delete has been renamed to gfs2_qa_put, and the last guy to decrement the count to 0 frees the memory. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-03-27gfs2: eliminate gfs2_rsqa_alloc in favor of gfs2_qa_allocBob Peterson1-10/+0
Before this patch, multiple callers called gfs2_rsqa_alloc to force the existence of a reservations structure and a quota data structure if needed. However, now the reservations are handled separately, so the quota data is only the quota data. So we eliminate the one in favor of just calling gfs2_qa_alloc directly. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-02-10gfs2: Rework how rgrp buffer_heads are managedBob Peterson1-18/+5
Before this patch, the rgrp code had a serious problem related to how it managed buffer_heads for resource groups. The problem caused file system corruption, especially in cases of journal replay. When an rgrp glock was demoted to transfer ownership to a different cluster node, do_xmote() first calls rgrp_go_sync and then rgrp_go_inval, as expected. When it calls rgrp_go_sync, that called gfs2_rgrp_brelse() that dropped the buffer_head reference count. In most cases, the reference count went to zero, which is right. However, there were other places where the buffers are handled differently. After rgrp_go_sync, do_xmote called rgrp_go_inval which called gfs2_rgrp_brelse a second time, then rgrp_go_inval's call to truncate_inode_pages_range would get rid of the pages in memory, but only if the reference count drops to 0. Unfortunately, gfs2_rgrp_brelse was setting bi->bi_bh = NULL. So when rgrp_go_sync called gfs2_rgrp_brelse, it lost the pointer to the buffer_heads in cases where the reference count was still 1. Therefore, when rgrp_go_inval called gfs2_rgrp_brelse a second time, it failed the check for "if (bi->bi_bh)" and thus failed to call brelse a second time. Because of that, the reference count on those buffers sometimes failed to drop from 1 to 0. And that caused function truncate_inode_pages_range to keep the pages in page cache rather than freeing them. The next time the rgrp glock was acquired, the metadata read of the rgrp buffers re-used the pages in memory, which were now wrong because they were likely modified by the other node who acquired the glock in EX (which is why we demoted the glock). This re-use of the page cache caused corruption because changes made by the other nodes were never seen, so the bitmaps were inaccurate. For some reason, the problem became most apparent when journal replay forced the replay of rgrps in memory, which caused newer rgrp data to be overwritten by the older in-core pages. A big part of the problem was that the rgrp buffer were released in multiple places: The go_unlock function would release them when the glock was released rather than when the glock is demoted, which is clearly wrong because our intent was to cache them until the glock is demoted from SH or EX. This patch attempts to clean up the mess and make one consistent and centralized mechanism for managing the rgrp buffer_heads by implementing several changes: 1. It eliminates the call to gfs2_rgrp_brelse() from rgrp_go_sync. We don't want to release the buffers or zero the pointers when syncing for the reasons stated above. It only makes sense to release them when the glock is actually invalidated (go_inval). And when we do, then we set the bh pointers to NULL. 2. The go_unlock function (which was only used for rgrps) is eliminated, as we've talked about doing many times before. The go_unlock function was called too early in the glock dq process, and should not happen until the glock is invalidated. 3. It also eliminates the call to rgrp_brelse in gfs2_clear_rgrpd. That will now happen automatically when the rgrp glocks are demoted, and shouldn't happen any sooner or later than that. Instead, function gfs2_clear_rgrpd has been modified to demote the rgrp glocks, and therefore, free those pages, before the remaining glocks are culled by gfs2_gl_hash_clear. This prevents the gl_object from hanging around when the glocks are culled. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-02-10gfs2: Report errors before withdrawAndreas Gruenbacher1-25/+23
In gfs2_rgrp_verify and compute_bitstructs, make sure to report errors before withdrawing the filesystem: otherwise, when we withdraw first and withdraw is configured to panic, we'll never get to the error reporting. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-01-21gfs2: remove unused LBIT macrosAlex Shi1-10/+0
Since commit 223b2b889f37 ("GFS2: Fix alignment issue and tidy gfs2_bitfit"), these 3 macros aren't used anymore, so remove them. Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-09-03gfs2: Fix possible fs name overflowsBob Peterson1-1/+1
This patch fixes three places in which temporary character buffers could overflow due to the addition of the file system id from patch 3792ce973f07. Thanks to Dan Carpenter for pointing it out. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27gfs2: replace more printk with calls to fs_info and friendsBob Peterson1-13/+14
This patch replaces a few leftover printk errors with calls to fs_info and similar, so that the file system having the error is properly logged. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27gfs2: dump fsid when dumping glock problemsBob Peterson1-7/+14
Before this patch, if a glock error was encountered, the glock with the problem was dumped. But sometimes you may have lots of file systems mounted, and that doesn't tell you which file system it was for. This patch adds a new boolean parameter fsid to the dump_glock family of functions. For non-error cases, such as dumping the glocks debugfs file, the fsid is not dumped in order to keep lock dumps and glocktop as clean as possible. For all error cases, such as GLOCK_BUG_ON, the file system id is now printed. This will make it easier to debug. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398Thomas Gleixner1-4/+1
Based on 1 normalized pattern(s): this copyrighted material is made available to anyone wishing to use modify copy or redistribute it subject to the terms and conditions of the gnu general public license version 2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 44 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190531081038.653000175@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-07gfs2: Rename gfs2_trans_{add_unrevoke => remove_revoke}Andreas Gruenbacher1-1/+1
Rename gfs2_trans_add_unrevoke to gfs2_trans_remove_revoke: there is no such thing as an "unrevoke" object; all this function does is remove existing revoke objects plus some bookkeeping. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-05-07gfs2: Fix loop in gfs2_rbm_find (v2)Andreas Gruenbacher1-29/+25
Fix the resource group wrap-around logic in gfs2_rbm_find that commit e579ed4f44 broke. The bug can lead to unnecessary repeated scanning of the same bitmaps; there is a risk that future changes will turn this into an endless loop. This is an updated version of commit 2d29f6b96d ("gfs2: Fix loop in gfs2_rbm_find") which ended up being reverted because it introduced a performance regression in iozone (see commit e74c98ca2d). Changes since v1: - Simplify the wrap-around logic. - Handle the case where each resource group only has a single bitmap block (small filesystem). - Update rd_extfail_pt whenever we scan the entire bitmap, even when we don't start the scan at the very beginning of the bitmap. Fixes: e579ed4f446e ("GFS2: Introduce rbm field bii") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-01-31gfs2: Revert "Fix loop in gfs2_rbm_find"Andreas Gruenbacher1-1/+1
This reverts commit 2d29f6b96d8f80322ed2dd895bca590491c38d34. It turns out that the fix can lead to a ~20 percent performance regression in initial writes to the page cache according to iozone. Let's revert this for now to have more time for a proper fix. Cc: stable@vger.kernel.org # v3.13+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-12gfs2: Dump nrpages for inodes and their glocksBob Peterson1-1/+1
This patch is based on an idea from Steve Whitehouse. The idea is to dump the number of pages for inodes in the glock dumps. The additional locking required me to drop const from quite a few places. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-12-12gfs2: Fix loop in gfs2_rbm_findAndreas Gruenbacher1-1/+1
Fix the resource group wrap-around logic in gfs2_rbm_find that commit e579ed4f44 broke. The bug can lead to unnecessary repeated scanning of the same bitmaps; there is a risk that future changes will turn this into an endless loop. Fixes: e579ed4f44 ("GFS2: Introduce rbm field bii") Cc: stable@vger.kernel.org # v3.13+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-11-09gfs2: Put bitmap buffers in put_superAndreas Gruenbacher1-1/+2
gfs2_put_super calls gfs2_clear_rgrpd to destroy the gfs2_rgrpd objects attached to the resource group glocks. That function should release the buffers attached to the gfs2_bitmap objects (bi_bh), but the call to gfs2_rgrp_brelse for doing that is missing. When gfs2_releasepage later runs across these buffers which are still referenced, it refuses to free them. This causes the pages the buffers are attached to to remain referenced as well. With enough mount/unmount cycles, the system will eventually run out of memory. Fix this by adding the missing call to gfs2_rgrp_brelse in gfs2_clear_rgrpd. (Also fix a gfs2_rgrp_relse -> gfs2_rgrp_brelse typo in a comment.) Fixes: 39b0f1e92908 ("GFS2: Don't brelse rgrp buffer_heads every allocation") Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-10-12gfs2: Pass resource group to rgblk_freeAndreas Gruenbacher1-28/+16
Function rgblk_free can only deal with one resource group at a time, so pass that resource group is as a parameter. Several of the callers already have the resource group at hand, so we only need additional lookup code in a few places. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Steven Whitehouse <swhiteho@redhat.com>
2018-10-12gfs2: Remove unnecessary gfs2_rlist_alloc parameterBob Peterson1-3/+2
The state parameter of gfs2_rlist_alloc is set to LM_ST_EXCLUSIVE in all calls, so remove it and hardcode that state in gfs2_rlist_alloc instead. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Steven Whitehouse <swhiteho@redhat.com>
2018-10-12gfs2: Fix marking bitmaps non-fullAndreas Gruenbacher1-2/+11
Reservations in gfs can span multiple gfs2_bitmaps (but they won't span multiple resource groups). When removing a reservation, we want to clear the GBF_FULL flags of all involved gfs2_bitmaps, not just that of the first bitmap. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Steven Whitehouse <swhiteho@redhat.com>
2018-10-12gfs2: Fix some minor typosAndreas Gruenbacher1-2/+2
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Steven Whitehouse <swhiteho@redhat.com>
2018-10-12gfs2: Rename bitmap.bi_{len => bytes}Andreas Gruenbacher1-16/+16
This field indicates the size of the bitmap in bytes, similar to how the bi_blocks field indicates the size of the bitmap in blocks. In count_unlinked, replace an instance of bi_bytes * GFS2_NBBY by bi_blocks. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Steven Whitehouse <swhiteho@redhat.com>
2018-10-12gfs2: Remove unused RGRP_RSRV_MINBYTES definitionAndreas Gruenbacher1-1/+1
This definition is only used to define RGRP_RSRV_MINBLKS, with no benefit over defining RGRP_RSRV_MINBLKS directly. In addition, instead of forcing RGRP_RSRV_MINBLKS to be of type u32, cast it to that type where that type is required. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Steven Whitehouse <swhiteho@redhat.com>
2018-10-12gfs2: Move rs_{sizehint, rgd_gh} fields into the inodeAndreas Gruenbacher1-9/+7
Move the rs_sizehint and rs_rgd_gh fields from struct gfs2_blkreserv into the inode: they are more closely related to the inode than to a particular reservation. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Steven Whitehouse <swhiteho@redhat.com>
2018-10-12gfs2: Clean up out-of-bounds check in gfs2_rbm_from_blockAndreas Gruenbacher1-7/+2
We already have a function that checks if a block is within a resource group, so use that in gfs2_rbm_from_block as well. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Steven Whitehouse <swhiteho@redhat.com>
2018-10-12gfs2: Always check the result of gfs2_rbm_from_blockAndreas Gruenbacher1-3/+8
When gfs2_rbm_from_block fails, the rbm it returns is undefined, so we always want to make sure gfs2_rbm_from_block has succeeded before looking at the rbm. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Steven Whitehouse <swhiteho@redhat.com>
2018-10-05gfs2: Use fs_* functions instead of pr_* function where we canBob Peterson1-11/+17
Before this patch, various errors and messages were reported using the pr_* functions: pr_err, pr_warn, pr_info, etc., but that does not tell you which gfs2 mount had the problem, which is often vital to debugging. This patch changes the calls from pr_* to fs_* in most of the messages so that the file system id is printed along with the message. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-08-28gfs2: Don't set GFS2_RDF_UPTODATE when the lvb is updatedBob Peterson1-1/+1
The GFS2_RDF_UPTODATE flag in the rgrp is used to determine when a rgrp buffer is valid. It's cleared when the glock is invalidated, signifying that the buffer data is now invalid. But before this patch, function update_rgrp_lvb was setting the flag when it determined it had a valid lvb. But that's an invalid assumption: just because you have a valid lvb doesn't mean you have valid buffers. After all, another node may have made the lvb valid, and this node just fetched it from the glock via dlm. Consider this scenario: 1. The file system is mounted with RGRPLVB option. 2. In gfs2_inplace_reserve it locks the rgrp glock EX, but thanks to GL_SKIP, it skips the gfs2_rgrp_bh_get. 3. Since loops == 0 and the allocation target (ap->target) is bigger than the largest known chunk of blocks in the rgrp (rs->rs_rbm.rgd->rd_extfail_pt) it skips that rgrp and bypasses the call to gfs2_rgrp_bh_get there as well. 4. update_rgrp_lvb sees the lvb MAGIC number is valid, so bypasses gfs2_rgrp_bh_get, but it still sets sets GFS2_RDF_UPTODATE due to this invalid assumption. 5. The next time update_rgrp_lvb is called, it sees the bit is set and just returns 0, assuming both the lvb and rgrp are both uptodate. But since this is a smaller allocation, or space has been freed by another node, thus adjusting the lvb values, it decides to use the rgrp for allocations, with invalid rd_free due to the fact it was never updated. This patch changes update_rgrp_lvb so it doesn't set the UPTODATE flag anymore. That way, it has no choice but to fetch the latest values. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-08-28gfs2: improve debug information when lvb mismatches are foundBob Peterson1-5/+36
Before this patch, gfs2_rgrp_bh_get would check for lvb mismatches, but it wouldn't tell you what was actually wrong. This patch adds more information to help us debug it. It also makes rgrp consistency checks dump any bad rgrps, and the rgrp dump code dump any lvbs as well as the rgrp itself. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2018-08-08gfs2: eliminate update_rgrp_lvb_unlinkedBob Peterson1-9/+2
Function update_rgrp_lvb_unlinked used to do the same thing as be32_add_cpu. This patch removes it in favor of using be32_add_cpu directly. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andrew Price <anprice@redhat.com>
2018-08-07gfs2: Fix gfs2_testbit to use clone bitmapsBob Peterson1-25/+19
Function gfs2_testbit is called in three places. Two of those places, gfs2_alloc_extent and gfs2_unaligned_extlen, should be using the clone bitmaps, not the "real" bitmaps. Function gfs2_unaligned_extlen is used by the block reservations scheme to determine the length of an extent of free blocks. Before this patch, it wasn't using the clone bitmap, which means recently-freed blocks were treated as free blocks for the purposes of an allocation. This patch adds a new parameter to gfs2_testbit to indicate whether or not the clone bitmaps should be used (if available). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-07-26gfs2: cleanup: call gfs2_rgrp_ondisk2lvb from gfs2_rgrp_outBob Peterson1-17/+13
Before this patch gfs2_rgrp_ondisk2lvb was called after every call to gfs2_rgrp_out. This patch just calls it directly from within gfs2_rgrp_out, and moves the function to be before it so we don't need a function prototype. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-07-25GFS2: rgrp free blocks used incorrectlyBob Peterson1-5/+34
Before this patch, several functions in rgrp.c checked the value of rgd->rd_free_clone. That does not take into account blocks that were reserved by a multi-block reservation. This causes a problem when space gets tight in the file system. For example, when function gfs2_inplace_reserve checks to see if a rgrp has enough blocks to satisfy the request, it can accept a rgrp that it should reject because, although there are enough blocks to satisfy the request _now_, those blocks may be reserved for another running process. A second problem with this occurs when we've reserved the remaining blocks in an rgrp: function rg_mblk_search() can reject an rgrp improperly because it calculates: u32 free_blocks = rgd->rd_free_clone - rgd->rd_reserved; But rd_reserved includes blocks that the current process just reserved in its own call to inplace_reserve. For example, it can reserve the last 128 blocks of an rgrp, then reject that same rgrp because the above calculates out to free_blocks = 0; Consequences include, but are not limited to, (1) leaving holes, and thus increasing file system fragmentation, and (2) reporting file system is full long before it actually is. This patch introduces a new function, rgd_free, which returns the number of clone-free blocks (blocks that are truly free as opposed to blocks that are still being used because an unlinked file is still open) minus the number of blocks reserved by processes, but not counting the blocks we ourselves reserved (because obviously we need to allocate them). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-07-24Merge branch 'iomap-write' into linux-gfs2/for-nextAndreas Gruenbacher1-2/+3
Pull in the gfs2 iomap-write changes: Tweak the existing code to properly support iomap write and eliminate an unnecessary special case in gfs2_block_map. Implement iomap write support for buffered and direct I/O. Simplify some of the existing code and eliminate code that is no longer used: gfs2: Remove gfs2_write_{begin,end} gfs2: iomap direct I/O support gfs2: gfs2_extent_length cleanup gfs2: iomap buffered write support gfs2: Further iomap cleanups This is based on the following changes on the xfs 'iomap-4.19-merge' branch: iomap: add private pointer to struct iomap iomap: add a page_done callback iomap: generic inline data handling iomap: complete partial direct I/O writes synchronously iomap: mark newly allocated buffer heads as new fs: factor out a __generic_write_end helper Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-07-24gfs2: Don't reject a supposedly full bitmap if we have blocks reservedBob Peterson1-1/+2
Before this patch, you could get into situations like this: 1. Process 1 searches for X free blocks, finds them, makes a reservation 2. Process 2 searches for free blocks in the same rgrp, but now the bitmap is full because process 1's reservation is skipped over. So it marks the bitmap as GBF_FULL. 3. Process 1 tries to allocate blocks from its own reservation, but since the GBF_FULL bit is set, it skips over the rgrp and searches elsewhere, thus not using its own reservation. This patch adds an additional check to allow processes to use their own reservations. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-07-05gfs2: Eliminate redundant ip->i_rgdAndreas Gruenbacher1-7/+6
GFS2 remembers the last rgrp used for allocations in ip->i_rgd. However, block allocations are made by way of a reservations structure, ip->i_res, which keeps the last rgrp in ip->i_res.rs_rgd, and ip->i_res is kept in sync with ip->i_res.rs_rgd, so it's redundant. Get rid of ip->i_rgd and just use ip->i_res.rs_rgd in its place. Based on patches by Robert Peterson. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-07-04gfs2: Stop messing with ip->i_rgd in the rlist codeAndreas Gruenbacher1-6/+22
In the resource group list code, keep the last resource group added in the last position in the array. Check against that instead of messing with ip->i_rgd. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-06-21gfs2: eliminate rs_inum and reduce the size of gfs2 inodesBob Peterson1-2/+3
Before this patch, block reservations kept track of the inode number. At one point, that was a valid thing to do. However, since we made the reservation a part of the inode (rather than a pointer to a separate allocated object) the reservation can determine the inode number by using container_of. This saves us a little memory in our inode. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-06-12treewide: kmalloc() -> kmalloc_array()Kees Cook1-2/+3
The kmalloc() function has a 2-factor argument form, kmalloc_array(). This patch replaces cases of: kmalloc(a * b, gfp) with: kmalloc_array(a * b, gfp) as well as handling cases of: kmalloc(a * b * c, gfp) with: kmalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kmalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kmalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The tools/ directory was manually excluded, since it has its own implementation of kmalloc(). The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(char) * COUNT + COUNT , ...) | kmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kmalloc + kmalloc_array ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kmalloc(C1 * C2 * C3, ...) | kmalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kmalloc(sizeof(THING) * C2, ...) | kmalloc(sizeof(TYPE) * C2, ...) | kmalloc(C1 * C2 * C3, ...) | kmalloc(C1 * C2, ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - (E1) * E2 + E1, E2 , ...) | - kmalloc + kmalloc_array ( - (E1) * (E2) + E1, E2 , ...) | - kmalloc + kmalloc_array ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-04GFS2: gfs2_free_extlen can return an extent that is too longBob Peterson1-1/+1
Function gfs2_free_extlen calculates the length of an extent of free blocks that may be reserved. The end pointer was calculated as end = start + bh->b_size but b_size is incorrect because the bitmap usually stops prior to the end of the buffer data on the last bitmap. What this means is that when you do a write, you can reserve a chunk of blocks that runs off the end of the last bitmap. For example, I've got a file system where there is only one bitmap for each rgrp, so ri_length==1. I saw cases in which iozone tried to do a big write, grabbed a large block reservation, chose rgrp 5464152, which has ri_data0 5464153 and ri_data 8188. So 5464153 + 8188 = 5472341 which is the end of the rgrp. When it grabbed a reservation it got back: 5470936, length 7229. But 5470936 + 7229 = 5478165. So the reservation starts inside the rgrp but runs 5824 blocks past the end of the bitmap. This patch fixes the calculation so it won't exceed the last bitmap. It also adds a BUG_ON to guard against overflows in the future. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-30gfs2: Add a few missing newlines in messagesAndreas Gruenbacher1-1/+1
Some of the info, warning, and error messages are missing their trailing newline. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-23GFS2: Log the reason for log flushes in every log headerBob Peterson1-1/+2
This patch just adds the capability for GFS2 to track which function called gfs2_log_flush. This should make it easier to diagnose problems based on the sequence of events found in the journals. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>