diff options
author | 2019-07-09 12:34:26 -0700 | |
---|---|---|
committer | 2019-07-09 12:34:26 -0700 | |
commit | e9a83bd2322035ed9d7dcf35753d3f984d76c6a5 (patch) | |
tree | 66dc466ff9aec0f9bb7f39cba50a47eab6585559 /Documentation/device-mapper/log-writes.rst | |
parent | Merge tag 'printk-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk (diff) | |
parent | docs: automarkup.py: ignore exceptions when seeking for xrefs (diff) | |
download | linux-dev-e9a83bd2322035ed9d7dcf35753d3f984d76c6a5.tar.xz linux-dev-e9a83bd2322035ed9d7dcf35753d3f984d76c6a5.zip |
Merge tag 'docs-5.3' of git://git.lwn.net/linux
Pull Documentation updates from Jonathan Corbet:
"It's been a relatively busy cycle for docs:
- A fair pile of RST conversions, many from Mauro. These create more
than the usual number of simple but annoying merge conflicts with
other trees, unfortunately. He has a lot more of these waiting on
the wings that, I think, will go to you directly later on.
- A new document on how to use merges and rebases in kernel repos,
and one on Spectre vulnerabilities.
- Various improvements to the build system, including automatic
markup of function() references because some people, for reasons I
will never understand, were of the opinion that
:c:func:``function()`` is unattractive and not fun to type.
- We now recommend using sphinx 1.7, but still support back to 1.4.
- Lots of smaller improvements, warning fixes, typo fixes, etc"
* tag 'docs-5.3' of git://git.lwn.net/linux: (129 commits)
docs: automarkup.py: ignore exceptions when seeking for xrefs
docs: Move binderfs to admin-guide
Disable Sphinx SmartyPants in HTML output
doc: RCU callback locks need only _bh, not necessarily _irq
docs: format kernel-parameters -- as code
Doc : doc-guide : Fix a typo
platform: x86: get rid of a non-existent document
Add the RCU docs to the core-api manual
Documentation: RCU: Add TOC tree hooks
Documentation: RCU: Rename txt files to rst
Documentation: RCU: Convert RCU UP systems to reST
Documentation: RCU: Convert RCU linked list to reST
Documentation: RCU: Convert RCU basic concepts to reST
docs: filesystems: Remove uneeded .rst extension on toctables
scripts/sphinx-pre-install: fix out-of-tree build
docs: zh_CN: submitting-drivers.rst: Remove a duplicated Documentation/
Documentation: PGP: update for newer HW devices
Documentation: Add section about CPU vulnerabilities for Spectre
Documentation: platform: Delete x86-laptop-drivers.txt
docs: Note that :c:func: should no longer be used
...
Diffstat (limited to 'Documentation/device-mapper/log-writes.rst')
-rw-r--r-- | Documentation/device-mapper/log-writes.rst | 145 |
1 files changed, 145 insertions, 0 deletions
diff --git a/Documentation/device-mapper/log-writes.rst b/Documentation/device-mapper/log-writes.rst new file mode 100644 index 000000000000..23141f2ffb7c --- /dev/null +++ b/Documentation/device-mapper/log-writes.rst @@ -0,0 +1,145 @@ +============= +dm-log-writes +============= + +This target takes 2 devices, one to pass all IO to normally, and one to log all +of the write operations to. This is intended for file system developers wishing +to verify the integrity of metadata or data as the file system is written to. +There is a log_write_entry written for every WRITE request and the target is +able to take arbitrary data from userspace to insert into the log. The data +that is in the WRITE requests is copied into the log to make the replay happen +exactly as it happened originally. + +Log Ordering +============ + +We log things in order of completion once we are sure the write is no longer in +cache. This means that normal WRITE requests are not actually logged until the +next REQ_PREFLUSH request. This is to make it easier for userspace to replay +the log in a way that correlates to what is on disk and not what is in cache, +to make it easier to detect improper waiting/flushing. + +This works by attaching all WRITE requests to a list once the write completes. +Once we see a REQ_PREFLUSH request we splice this list onto the request and once +the FLUSH request completes we log all of the WRITEs and then the FLUSH. Only +completed WRITEs, at the time the REQ_PREFLUSH is issued, are added in order to +simulate the worst case scenario with regard to power failures. Consider the +following example (W means write, C means complete): + + W1,W2,W3,C3,C2,Wflush,C1,Cflush + +The log would show the following: + + W3,W2,flush,W1.... + +Again this is to simulate what is actually on disk, this allows us to detect +cases where a power failure at a particular point in time would create an +inconsistent file system. + +Any REQ_FUA requests bypass this flushing mechanism and are logged as soon as +they complete as those requests will obviously bypass the device cache. + +Any REQ_OP_DISCARD requests are treated like WRITE requests. Otherwise we would +have all the DISCARD requests, and then the WRITE requests and then the FLUSH +request. Consider the following example: + + WRITE block 1, DISCARD block 1, FLUSH + +If we logged DISCARD when it completed, the replay would look like this: + + DISCARD 1, WRITE 1, FLUSH + +which isn't quite what happened and wouldn't be caught during the log replay. + +Target interface +================ + +i) Constructor + + log-writes <dev_path> <log_dev_path> + + ============= ============================================== + dev_path Device that all of the IO will go to normally. + log_dev_path Device where the log entries are written to. + ============= ============================================== + +ii) Status + + <#logged entries> <highest allocated sector> + + =========================== ======================== + #logged entries Number of logged entries + highest allocated sector Highest allocated sector + =========================== ======================== + +iii) Messages + + mark <description> + + You can use a dmsetup message to set an arbitrary mark in a log. + For example say you want to fsck a file system after every + write, but first you need to replay up to the mkfs to make sure + we're fsck'ing something reasonable, you would do something like + this:: + + mkfs.btrfs -f /dev/mapper/log + dmsetup message log 0 mark mkfs + <run test> + + This would allow you to replay the log up to the mkfs mark and + then replay from that point on doing the fsck check in the + interval that you want. + + Every log has a mark at the end labeled "dm-log-writes-end". + +Userspace component +=================== + +There is a userspace tool that will replay the log for you in various ways. +It can be found here: https://github.com/josefbacik/log-writes + +Example usage +============= + +Say you want to test fsync on your file system. You would do something like +this:: + + TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc" + dmsetup create log --table "$TABLE" + mkfs.btrfs -f /dev/mapper/log + dmsetup message log 0 mark mkfs + + mount /dev/mapper/log /mnt/btrfs-test + <some test that does fsync at the end> + dmsetup message log 0 mark fsync + md5sum /mnt/btrfs-test/foo + umount /mnt/btrfs-test + + dmsetup remove log + replay-log --log /dev/sdc --replay /dev/sdb --end-mark fsync + mount /dev/sdb /mnt/btrfs-test + md5sum /mnt/btrfs-test/foo + <verify md5sum's are correct> + + Another option is to do a complicated file system operation and verify the file + system is consistent during the entire operation. You could do this with: + + TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc" + dmsetup create log --table "$TABLE" + mkfs.btrfs -f /dev/mapper/log + dmsetup message log 0 mark mkfs + + mount /dev/mapper/log /mnt/btrfs-test + <fsstress to dirty the fs> + btrfs filesystem balance /mnt/btrfs-test + umount /mnt/btrfs-test + dmsetup remove log + + replay-log --log /dev/sdc --replay /dev/sdb --end-mark mkfs + btrfsck /dev/sdb + replay-log --log /dev/sdc --replay /dev/sdb --start-mark mkfs \ + --fsck "btrfsck /dev/sdb" --check fua + +And that will replay the log until it sees a FUA request, run the fsck command +and if the fsck passes it will replay to the next FUA, until it is completed or +the fsck command exists abnormally. |