aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/dax/super.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2022-09-29dax: Remove usage of the deprecated ida_simple_xxx APIBo Liu1-3/+3
ida_alloc_max() makes it clear that the second argument is inclusive, and the alloc/free terminology is more idiomatic and symmetric then get/remove. Signed-off-by: Bo Liu <liubo03@inspur.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20220926012635.3205-1-liubo03@inspur.com [djbw: reword changelog] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-17dax: introduce holder for dax_deviceShiyang Ruan1-1/+66
Patch series "v14 fsdax-rmap + v11 fsdax-reflink", v2. The patchset fsdax-rmap is aimed to support shared pages tracking for fsdax. It moves owner tracking from dax_assocaite_entry() to pmem device driver, by introducing an interface ->memory_failure() for struct pagemap. This interface is called by memory_failure() in mm, and implemented by pmem device. Then call holder operations to find the filesystem which the corrupted data located in, and call filesystem handler to track files or metadata associated with this page. Finally we are able to try to fix the corrupted data in filesystem and do other necessary processing, such as killing processes who are using the files affected. The call trace is like this: memory_failure() |* fsdax case |------------ |pgmap->ops->memory_failure() => pmem_pgmap_memory_failure() | dax_holder_notify_failure() => | dax_device->holder_ops->notify_failure() => | - xfs_dax_notify_failure() | |* xfs_dax_notify_failure() | |-------------------------- | | xfs_rmap_query_range() | | xfs_dax_failure_fn() | | * corrupted on metadata | | try to recover data, call xfs_force_shutdown() | | * corrupted on file data | | try to recover data, call mf_dax_kill_procs() |* normal case |------------- |mf_generic_kill_procs() The patchset fsdax-reflink attempts to add CoW support for fsdax, and takes XFS, which has both reflink and fsdax features, as an example. One of the key mechanisms needed to be implemented in fsdax is CoW. Copy the data from srcmap before we actually write data to the destination iomap. And we just copy range in which data won't be changed. Another mechanism is range comparison. In page cache case, readpage() is used to load data on disk to page cache in order to be able to compare data. In fsdax case, readpage() does not work. So, we need another compare data with direct access support. With the two mechanisms implemented in fsdax, we are able to make reflink and fsdax work together in XFS. This patch (of 14): To easily track filesystem from a pmem device, we introduce a holder for dax_device structure, and also its operation. This holder is used to remember who is using this dax_device: - When it is the backend of a filesystem, the holder will be the instance of this filesystem. - When this pmem device is one of the targets in a mapped device, the holder will be this mapped device. In this case, the mapped device has its own dax_device and it will follow the first rule. So that we can finally track to the filesystem we needed. The holder and holder_ops will be set when filesystem is being mounted, or an target device is being activated. Link: https://lkml.kernel.org/r/20220603053738.1218681-1-ruansy.fnst@fujitsu.com Link: https://lkml.kernel.org/r/20220603053738.1218681-2-ruansy.fnst@fujitsu.com Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.wiliams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Goldwyn Rodrigues <rgoldwyn@suse.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Matthew Wilcox <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Goldwyn Rodrigues <rgoldwyn@suse.com> Cc: Ritesh Harjani <riteshh@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-16dax: add .recovery_write dax_operationJane Chu1-0/+9
Introduce dax_recovery_write() operation. The function is used to recover a dax range that contains poison. Typical use case is when a user process receives a SIGBUS with si_code BUS_MCEERR_AR indicating poison(s) in a dax range, in response, the user process issues a pwrite() to the page-aligned dax range, thus clears the poison and puts valid data in the range. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jane Chu <jane.chu@oracle.com> Link: https://lore.kernel.org/r/20220422224508.440670-6-jane.chu@oracle.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-05-16dax: introduce DAX_RECOVERY_WRITE dax access modeJane Chu1-2/+3
Up till now, dax_direct_access() is used implicitly for normal access, but for the purpose of recovery write, dax range with poison is requested. To make the interface clear, introduce enum dax_access_mode { DAX_ACCESS, DAX_RECOVERY_WRITE, } where DAX_ACCESS is used for normal dax access, and DAX_RECOVERY_WRITE is used for dax recovery write. Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Link: https://lore.kernel.org/r/165247982851.52965.11024212198889762949.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-03-24Merge tag 'dax-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimmLinus Torvalds1-0/+2
Pull DAX updates from Dan Williams: "Andrew has been shepherding major dax features that touch the core -mm through his tree, but I still collect the dax updates that are core-mm independent. - Fix a crash due to a missing rcu_barrier() in dax_fs_exit() - Fix two miscellaneous doc issues" * tag 'dax-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax: Fix missing kdoc for dax_device dax: make sure inodes are flushed before destroy cache fsdax: fix function description
2022-03-22fs: allocate inode by using alloc_inode_sb()Muchun Song1-1/+1
The inode allocation is supposed to use alloc_inode_sb(), so convert kmem_cache_alloc() of all filesystems to alloc_inode_sb(). Link: https://lkml.kernel.org/r/20220228122126.37293-5-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Theodore Ts'o <tytso@mit.edu> [ext4] Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Alex Shi <alexs@kernel.org> Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Fam Zheng <fam.zheng@bytedance.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kari Argillander <kari.argillander@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-12dax: Fix missing kdoc for dax_deviceIra Weiny1-0/+1
struct dax_device has a member named ops which was undocumented. Add the kdoc. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20220304204655.3489216-1-ira.weiny@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-17dax: make sure inodes are flushed before destroy cacheTong Zhang1-0/+1
A bug can be triggered by following command $ modprobe nd_pmem && modprobe -r nd_pmem [ 10.060014] BUG dax_cache (Not tainted): Objects remaining in dax_cache on __kmem_cache_shutdown() [ 10.060938] Slab 0x0000000085b729ac objects=9 used=1 fp=0x000000004f5ae469 flags=0x200000000010200(slab|head|node) [ 10.062433] Call Trace: [ 10.062673] dump_stack_lvl+0x34/0x44 [ 10.062865] slab_err+0x90/0xd0 [ 10.063619] __kmem_cache_shutdown+0x13b/0x2f0 [ 10.063848] kmem_cache_destroy+0x4a/0x110 [ 10.064058] __x64_sys_delete_module+0x265/0x300 This is caused by dax_fs_exit() not flushing inodes before destroy cache. To fix this issue, call rcu_barrier() before destroy cache. Signed-off-by: Tong Zhang <ztong0001@gmail.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220212071111.148575-1-ztong0001@gmail.com Fixes: 7b6be8444e0f ("dax: refactor dax-fs into a generic provider of 'struct dax_device' instances") Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-18dax: remove the copy_from_iter and copy_to_iter methodsChristoph Hellwig1-4/+32
These methods indirect the actual DAX read/write path. In the end pmem uses magic flush and mc safe variants and fuse and dcssblk use plain ones while device mapper picks redirects to the underlying device. Add set_dax_nocache() and set_dax_nomc() APIs to control which copy routines are used to remove indirect call from the read/write fast path as well as a lot of boilerplate code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> [virtiofs] Link: https://lore.kernel.org/r/20211215084508.435401-5-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-18dax: remove the DAXDEV_F_SYNC flagChristoph Hellwig1-5/+1
Remove the DAXDEV_F_SYNC flag and thus the flags argument to alloc_dax and just let the drivers call set_dax_synchronous directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20211215084508.435401-4-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-18dax: simplify dax_synchronous and set_dax_synchronousChristoph Hellwig1-4/+4
Remove the pointless wrappers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20211215084508.435401-3-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04dax: return the partition offset from fs_dax_get_by_bdevChristoph Hellwig1-3/+6
Prepare for the removal of the block_device from the DAX I/O path by returning the partition offset from fs_dax_get_by_bdev so that the file systems have it at hand for use during I/O. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-26-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04fsdax: simplify the pgoff calculationChristoph Hellwig1-14/+0
Replace the two steps of dax_iomap_sector and bdev_dax_pgoff with a single dax_iomap_pgoff helper that avoids lots of cumbersome sector conversions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-15-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04dax: remove dax_capableChristoph Hellwig1-36/+0
Just open code the block size and dax_dev == NULL checks in the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> [erofs] Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-9-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04dax: move the partition alignment check into fs_dax_get_by_bdevChristoph Hellwig1-17/+6
fs_dax_get_by_bdev is the primary interface to find a dax device for a block device, so move the partition alignment check there instead of wiring it up through ->dax_supported. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-7-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04dax: remove the pgmap sanity checks in generic_fsdax_supportedChristoph Hellwig1-48/+1
Drivers that register a dax_dev should make sure it works, no need to double check from the file system. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-6-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04dax: simplify the dax_device <-> gendisk associationChristoph Hellwig1-82/+27
Replace the dax_host_hash with an xarray indexed by the pointer value of the gendisk, and require explicitly calls from the block drivers that want to associate their gendisk with a dax_device. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-5-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04dm: make the DAX support depend on CONFIG_FS_DAXChristoph Hellwig1-4/+2
The device mapper DAX support is all hanging off a block device and thus can't be used with device dax. Make it depend on CONFIG_FS_DAX instead of CONFIG_DAX_DRIVER. This also means that bdev_dax_pgoff only needs to be built under CONFIG_FS_DAX now. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20211129102203.2243509-3-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-09-27nvdimm/pmem: move dax_attribute_group from dax to pmemChristoph Hellwig1-82/+18
dax_attribute_group is only used by the pmem driver, and can avoid the completely pointless lookup by the disk name if moved there. This leaves just a single caller of dax_get_by_host, so move dax_get_by_host into the same ifdef block as that caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20210922173431.2454024-3-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-08-26dax: remove bdev_dax_supportedChristoph Hellwig1-41/+1
All callers already have a dax_device obtained from fs_dax_get_by_bdev at hand, so just pass that to dax_supported() insted of doing another lookup. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20210826135510.6293-10-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-08-26dax: stub out dax_supported for !CONFIG_FS_DAXChristoph Hellwig1-18/+18
dax_supported calls into ->dax_supported which checks for fsdax support. Don't bother building it for !CONFIG_FS_DAX as it will always return false. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20210826135510.6293-8-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-08-26dax: remove __generic_fsdax_supportedChristoph Hellwig1-4/+4
Just implement generic_fsdax_supported directly out of line instead of adding a wrapper. Given that generic_fsdax_supported is only supplied for CONFIG_FS_DAX builds this also allows to not provide it at all for !CONFIG_FS_DAX builds. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20210826135510.6293-7-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-08-26dax: move the dax_read_lock() locking into dax_supportedChristoph Hellwig1-7/+9
Move the dax_read_lock/dax_read_unlock pair from the callers into dax_supported to make it a little easier to use. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20210826135510.6293-6-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-08-26dax: mark dax_get_by_host staticChristoph Hellwig1-55/+54
And move the code around a bit to avoid a forward declaration. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20210826135510.6293-5-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-08-26dax: stop using bdevnameChristoph Hellwig1-13/+7
Just use the %pg format specifier instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20210826135510.6293-3-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-07-07dax: Ensure errno is returned from dax_direct_accessIra Weiny1-1/+1
If the caller specifies a negative nr_pages that is an invalid parameter. Return -EINVAL to ensure callers get an errno if they want to check it. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20210525172428.3634316-4-ira.weiny@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-02-23whack-a-mole: don't open-code iminor/imajorAl Viro1-1/+1
several instances creeped back into the tree... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-12-15device-dax/core: Fix memory leak when rmmod dax.koWang Hai1-0/+1
When I repeatedly modprobe and rmmod dax.ko, kmemleak report a memory leak as follows: unreferenced object 0xffff9a5588c05088 (size 8): comm "modprobe", pid 261, jiffies 4294693644 (age 42.063s) ... backtrace: [<00000000e007ced0>] kstrdup+0x35/0x70 [<000000002ae73897>] kstrdup_const+0x3d/0x50 [<000000002b00c9c3>] kvasprintf_const+0xbc/0xf0 [<000000008023282f>] kobject_set_name_vargs+0x3b/0xd0 [<00000000d2cbaa4e>] kobject_set_name+0x62/0x90 [<00000000202e7a22>] bus_register+0x7f/0x2b0 [<000000000b77792c>] 0xffffffffc02840f7 [<000000002d5be5ac>] 0xffffffffc02840b4 [<00000000dcafb7cd>] do_one_initcall+0x58/0x240 [<00000000049fe480>] do_init_module+0x56/0x1e2 [<0000000022671491>] load_module+0x2517/0x2840 [<000000001a2201cb>] __do_sys_finit_module+0x9c/0xe0 [<000000003eb304e7>] do_syscall_64+0x33/0x40 [<0000000051c5fd06>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 When rmmod dax is executed, dax_bus_exit() is missing. This patch can fix this bug. Fixes: 9567da0b408a ("device-dax: Introduce bus + driver model") Cc: <stable@vger.kernel.org> Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Link: https://lore.kernel.org/r/20201201135929.66530-1-wanghai38@huawei.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2020-10-19Merge tag 'fuse-update-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuseLinus Torvalds1-1/+2
Pull fuse updates from Miklos Szeredi: - Support directly accessing host page cache from virtiofs. This can improve I/O performance for various workloads, as well as reducing the memory requirement by eliminating double caching. Thanks to Vivek Goyal for doing most of the work on this. - Allow automatic submounting inside virtiofs. This allows unique st_dev/ st_ino values to be assigned inside the guest to files residing on different filesystems on the host. Thanks to Max Reitz for the patches. - Fix an old use after free bug found by Pradeep P V K. * tag 'fuse-update-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (25 commits) virtiofs: calculate number of scatter-gather elements accurately fuse: connection remove fix fuse: implement crossmounts fuse: Allow fuse_fill_super_common() for submounts fuse: split fuse_mount off of fuse_conn fuse: drop fuse_conn parameter where possible fuse: store fuse_conn in fuse_req fuse: add submount support to <uapi/linux/fuse.h> fuse: fix page dereference after free virtiofs: add logic to free up a memory range virtiofs: maintain a list of busy elements virtiofs: serialize truncate/punch_hole and dax fault path virtiofs: define dax address space operations virtiofs: add DAX mmap support virtiofs: implement dax read/write operations virtiofs: introduce setupmapping/removemapping commands virtiofs: implement FUSE_INIT map_alignment field virtiofs: keep a list of free dax memory ranges virtiofs: add a mount option to enable dax virtiofs: set up virtio_fs dax_device ...
2020-09-20dax: Fix stack overflow when mounting fsdax pmem deviceAdrian Huang1-6/+6
When mounting fsdax pmem device, commit 6180bb446ab6 ("dax: fix detection of dax support for non-persistent memory block devices") introduces the stack overflow [1][2]. Here is the call path for mounting ext4 file system: ext4_fill_super bdev_dax_supported __bdev_dax_supported dax_supported generic_fsdax_supported __generic_fsdax_supported bdev_dax_supported The call path leads to the infinite calling loop, so we cannot call bdev_dax_supported() in __generic_fsdax_supported(). The sanity checking of the variable 'dax_dev' is moved prior to the two bdev_dax_pgoff() checks [3][4]. [1] https://lore.kernel.org/linux-nvdimm/1420999447.1004543.1600055488770.JavaMail.zimbra@redhat.com/ [2] https://lore.kernel.org/linux-nvdimm/alpine.LRH.2.02.2009141131220.30651@file01.intranet.prod.int.rdu2.redhat.com/ [3] https://lore.kernel.org/linux-nvdimm/CA+RJvhxBHriCuJhm-D8NvJRe3h2MLM+ZMFgjeJjrRPerMRLvdg@mail.gmail.com/ [4] https://lore.kernel.org/linux-nvdimm/20200903160608.GU878166@iweiny-DESK2.sc.intel.com/ Fixes: 6180bb446ab6 ("dax: fix detection of dax support for non-persistent memory block devices") Reported-by: Yi Zhang <yi.zhang@redhat.com> Reported-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Adrian Huang <ahuang12@lenovo.com> Reviewed-by: Jan Kara <jack@suse.cz> Tested-by: Ritesh Harjani <riteshh@linux.ibm.com> Cc: Coly Li <colyli@suse.de> Cc: Ira Weiny <ira.weiny@intel.com> Cc: John Pittman <jpittman@redhat.com> Link: https://lore.kernel.org/r/20200917111549.6367-1-adrianhuang0701@gmail.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2020-09-20dm: Call proper helper to determine dax supportJan Kara1-0/+4
DM was calling generic_fsdax_supported() to determine whether a device referenced in the DM table supports DAX. However this is a helper for "leaf" device drivers so that they don't have to duplicate common generic checks. High level code should call dax_supported() helper which that calls into appropriate helper for the particular device. This problem manifested itself as kernel messages: dm-3: error: dax access failed (-95) when lvm2-testsuite run in cases where a DM device was stacked on top of another DM device. Fixes: 7bf7eac8d648 ("dax: Arrange for dax_supported check to span multiple devices") Cc: <stable@vger.kernel.org> Tested-by: Adrian Huang <ahuang12@lenovo.com> Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Mike Snitzer <snitzer@redhat.com> Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/r/160061715195.13131.5503173247632041975.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2020-09-10dax: Modify bdev_dax_pgoff() to handle NULL bdevVivek Goyal1-1/+2
virtiofs does not have a block device but it has dax device. Modify bdev_dax_pgoff() to be able to handle that. If there is no bdev, that means dax offset is 0. (It can't be a partition block device starting at an offset in dax device). This is little hackish. There have been discussions about getting rid of dax not supporting partitions. https://lore.kernel.org/linux-fsdevel/20200107125159.GA15745@infradead.org/ IMHO, this path can easily break exisitng users. For example ioctl(BLKPG_ADD_PARTITION) will start breaking on block devices supporting DAX. Also, I personally find it very useful to be able to partition dax devices and still be able to use DAX. Alternatively, I tried to store offset into dax device information in iomap interface, but that got NACKed. https://lore.kernel.org/linux-fsdevel/20200217133117.GB20444@infradead.org/ I can't think of a good path to solve this issue properly. So to make progress, it seems this patch is least bad option for now and I hope we can take it. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Vishal L Verma <vishal.l.verma@intel.com> Cc: "Weiny, Ira" <ira.weiny@intel.com> Cc: linux-nvdimm@lists.01.org Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-09-03dax: fix detection of dax support for non-persistent memory block devicesColy Li1-1/+1
When calling __generic_fsdax_supported(), a dax-unsupported device may not have dax_dev as NULL, e.g. the dax related code block is not enabled by Kconfig. Therefore in __generic_fsdax_supported(), to check whether a device supports DAX or not, the following order of operations should be performed: - If dax_dev pointer is NULL, it means the device driver explicitly announce it doesn't support DAX. Then it is OK to directly return false from __generic_fsdax_supported(). - If dax_dev pointer is NOT NULL, it might be because the driver doesn't support DAX and not explicitly initialize related data structure. Then bdev_dax_supported() should be called for further check. If device driver desn't explicitly set its dax_dev pointer to NULL, this is not a bug. Calling bdev_dax_supported() makes sure they can be recognized as dax-unsupported eventually. Fixes: c2affe920b0e ("dax: do not print error message for non-persistent memory block device") Cc: Jan Kara <jack@suse.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Reviewed-and-tested-by: Adrian Huang <ahuang12@lenovo.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/20200903161625.19524-1-colyli@suse.de
2020-08-20dax: do not print error message for non-persistent memory block deviceAdrian Huang1-0/+6
Commit 231609785cbf ("dax: print error message by pr_info() in __generic_fsdax_supported()") happens to print the following error message during booting when the non-persistent memory block devices are configured by device mapper. Those error messages are caused by the variable 'dax_dev' is NULL. Users might be confused with those error messages since they do not use the persistent memory device. Moreover, users might scare about "what's wrong with my disks" because they see the 'error' and 'failed' keywords. # dmesg | grep fail sdk3: error: dax access failed (-95) sdk3: error: dax access failed (-95) sdk3: error: dax access failed (-95) sdk3: error: dax access failed (-95) sdk3: error: dax access failed (-95) sdk3: error: dax access failed (-95) sdk3: error: dax access failed (-95) sdk3: error: dax access failed (-95) sdk3: error: dax access failed (-95) sdb3: error: dax access failed (-95) sdb3: error: dax access failed (-95) sdb3: error: dax access failed (-95) sdb3: error: dax access failed (-95) sdb3: error: dax access failed (-95) sdb3: error: dax access failed (-95) sdb3: error: dax access failed (-95) sdb3: error: dax access failed (-95) sdb3: error: dax access failed (-95) # lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sda 8:0 0 1.1T 0 disk ├─sda1 8:1 0 156M 0 part ├─sda2 8:2 0 40G 0 part └─sda3 8:3 0 1.1T 0 part sdb 8:16 0 1.1T 0 disk ├─sdb1 8:17 0 600M 0 part ├─sdb2 8:18 0 1G 0 part └─sdb3 8:19 0 1.1T 0 part ├─rhel00-swap 254:3 0 4G 0 lvm ├─rhel00-home 254:4 0 1T 0 lvm └─rhel00-root 254:5 0 50G 0 lvm sdc 8:32 0 1.1T 0 disk sdd 8:48 0 1.1T 0 disk sde 8:64 0 1.1T 0 disk sdf 8:80 0 1.1T 0 disk sdg 8:96 0 1.1T 0 disk sdh 8:112 0 3.3T 0 disk ├─sdh1 8:113 0 500M 0 part /boot/efi ├─sdh2 8:114 0 40G 0 part / ├─sdh3 8:115 0 2.9T 0 part /home └─sdh4 8:116 0 314.6G 0 part [SWAP] sdi 8:128 0 1.1T 0 disk sdj 8:144 0 3.3T 0 disk ├─sdj1 8:145 0 512M 0 part └─sdj2 8:146 0 3.3T 0 part sdk 8:160 0 119.2G 0 disk ├─sdk1 8:161 0 200M 0 part ├─sdk2 8:162 0 1G 0 part └─sdk3 8:163 0 118G 0 part ├─rhel-swap 254:0 0 4G 0 lvm ├─rhel-home 254:1 0 64G 0 lvm └─rhel-root 254:2 0 50G 0 lvm sdl 8:176 0 119.2G 0 disk The call path is shown as follows: dm_table_determine_type dm_table_supports_dax device_supports_dax generic_fsdax_supported __generic_fsdax_supported With the disk configuration listing from the command 'lsblk', the member 'dev->dax_dev' of the block devices 'sdb3' and 'sdk3' (configured by device mapper) is NULL in function generic_fsdax_supported() because the member is configured in function open_table_device(). To prevent the confusing error messages in this scenario (this is normal behavior), just print those error messages by pr_debug() by checking if dax_dev is NULL and the block device does not support DAX. Link: https://lore.kernel.org/r/20200819154236.24191-1-adrianhuang0701@gmail.com Fixes: 231609785cbf ("dax: print error message by pr_info() in __generic_fsdax_supported()") Cc: Coly Li <colyli@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Acked-by: Coly Li <colyli@suse.de> Signed-off-by: Adrian Huang <ahuang12@lenovo.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2020-08-11Merge tag 'libnvdimm-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimmLinus Torvalds1-6/+7
Pull libnvdimm updayes from Vishal Verma: "You'd normally receive this pull request from Dan Williams, but he's busy watching a newborn (Congrats Dan!), so I'm watching libnvdimm this cycle. This adds a new feature in libnvdimm - 'Runtime Firmware Activation', and a few small cleanups and fixes in libnvdimm and DAX. I'd originally intended to make separate topic-based pull requests - one for libnvdimm, and one for DAX, but some of the DAX material fell out since it wasn't quite ready. Summary: - add 'Runtime Firmware Activation' support for NVDIMMs that advertise the relevant capability - misc libnvdimm and DAX cleanups" * tag 'libnvdimm-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: libnvdimm/security: ensure sysfs poll thread woke up and fetch updated attr libnvdimm/security: the 'security' attr never show 'overwrite' state libnvdimm/security: fix a typo ACPI: NFIT: Fix ARS zero-sized allocation dax: Fix incorrect argument passed to xas_set_err() ACPI: NFIT: Add runtime firmware activate support PM, libnvdimm: Add runtime firmware activation support libnvdimm: Convert to DEVICE_ATTR_ADMIN_RO() drivers/dax: Expand lock scope to cover the use of addresses fs/dax: Remove unused size parameter dax: print error message by pr_info() in __generic_fsdax_supported() driver-core: Introduce DEVICE_ATTR_ADMIN_{RO,RW} tools/testing/nvdimm: Emulate firmware activation commands tools/testing/nvdimm: Prepare nfit_ctl_test() for ND_CMD_CALL emulation tools/testing/nvdimm: Add command debug messages tools/testing/nvdimm: Cleanup dimm index passing ACPI: NFIT: Define runtime firmware activation commands ACPI: NFIT: Move bus_dsm_mask out of generic nvdimm_bus_descriptor libnvdimm: Validate command family indices
2020-07-28drivers/dax: Expand lock scope to cover the use of addressesIra Weiny1-1/+2
The addition of PKS protection to dax read lock/unlock will require that the address returned by dax_direct_access() be protected by this lock. Correct the locking by ensuring that the use of kaddr and end_kaddr are covered by the dax read lock/unlock. Link: https://lore.kernel.org/r/20200717072056.73134-12-ira.weiny@intel.com Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2020-07-28dax: print error message by pr_info() in __generic_fsdax_supported()Coly Li1-5/+5
In struct dax_operations, the callback routine dax_supported() returns a bool type result. For false return value, the caller has no idea whether the device does not support dax at all, or it is just some mis- configuration issue. An example is formatting an Ext4 file system on pmem device on top of a NVDIMM namespace by, # mkfs.ext4 /dev/pmem0 If the fs block size does not match kernel space memory page size (which is possible on non-x86 platform), mount this Ext4 file system will fail, # mount -o dax /dev/pmem0 /mnt mount: /mnt: wrong fs type, bad option, bad superblock on /dev/pmem0, missing codepage or helper program, or other error. And from the dmesg output there is only the following information, [ 307.853148] EXT4-fs (pmem0): DAX unsupported by block device. The above information is quite confusing. Because definitely the pmem0 device supports dax operation, and the super block is consistent as how it was created by mkfs.ext4. Indeed the failure is from __generic_fsdax_supported() by the following code piece, if (blocksize != PAGE_SIZE) { pr_debug("%s: error: unsupported blocksize for dax\n", bdevname(bdev, buf)); return false; } It is because the Ext4 block size is 4KB and kernel page size is 8KB or 16KB. It is not simple to make dax_supported() from struct dax_operations or __generic_fsdax_supported() to return exact failure type right now. So the simplest fix is to use pr_info() to print all the error messages inside __generic_fsdax_supported(). Then users may find informative clue from the kernel message at least. Message printed by pr_debug() is very easy to be ignored by users. This patch prints error message by pr_info() in __generic_fsdax_supported(), when then mount fails, following lines can be found from dmesg output, [ 2705.500885] pmem0: error: unsupported blocksize for dax [ 2705.500888] EXT4-fs (pmem0): DAX unsupported by block device. Now the users may have idea the mount failure is from pmem driver for unsupported block size. Link: https://lore.kernel.org/r/20200725162450.95999-1-colyli@suse.de Cc: Dan Williams <dan.j.williams@intel.com> Cc: Anthony Iliopoulos <ailiopoulos@suse.com> Reported-by: Michal Suchanek <msuchanek@suse.com> Suggested-by: Jan Kara <jack@suse.com> Reviewed-by: Jan Kara <jack@suse.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2020-07-01block: remove the bd_queue field from struct block_deviceChristoph Hellwig1-1/+1
Just use bd_disk->queue instead. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-02dax: Move mandatory ->zero_page_range() check in alloc_dax()Vivek Goyal1-5/+9
zero_page_range() dax operation is mandatory for dax devices. Right now that check happens in dax_zero_page_range() function. Dan thinks that's too late and its better to do the check earlier in alloc_dax(). I also modified alloc_dax() to return pointer with error code in it in case of failure. Right now it returns NULL and caller assumes failure happened due to -ENOMEM. But with this ->zero_page_range() check, I need to return -EINVAL instead. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Link: https://lore.kernel.org/r/20200401161125.GB9398@redhat.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2020-04-02dax, pmem: Add a dax operation zero_page_rangeVivek Goyal1-0/+20
Add a dax operation zero_page_range, to zero a page. This will also clear any known poison in the page being zeroed. As of now, zeroing of one page is allowed in a single call. There are no callers which are trying to zero more than a page in a single call. Once we grow the callers which zero more than a page in single call, we can add that support. Primary reason for not doing that yet is that this will add little complexity in dm implementation where a range might be spanning multiple underlying targets and one will have to split the range into multiple sub ranges and call zero_page_range() on individual targets. Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Link: https://lore.kernel.org/r/20200228163456.1587-3-vgoyal@redhat.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2020-01-16dax: Get rid of fs_dax_get_by_host() helperVivek Goyal1-1/+1
Looks like nobody is using fs_dax_get_by_host() except fs_dax_get_by_bdev() and it can easily use dax_get_by_host() instead. IIUC, fs_dax_get_by_host() was only introduced so that one could compile with CONFIG_FS_DAX=n and CONFIG_DAX=m. fs_dax_get_by_bdev() achieves the same purpose and hence it looks like fs_dax_get_by_host() is not needed anymore. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20200106181117.GA16248@redhat.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-07-19Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-13/+10
Pull vfs mount updates from Al Viro: "The first part of mount updates. Convert filesystems to use the new mount API" * 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) mnt_init(): call shmem_init() unconditionally constify ksys_mount() string arguments don't bother with registering rootfs init_rootfs(): don't bother with init_ramfs_fs() vfs: Convert smackfs to use the new mount API vfs: Convert selinuxfs to use the new mount API vfs: Convert securityfs to use the new mount API vfs: Convert apparmorfs to use the new mount API vfs: Convert openpromfs to use the new mount API vfs: Convert xenfs to use the new mount API vfs: Convert gadgetfs to use the new mount API vfs: Convert oprofilefs to use the new mount API vfs: Convert ibmasmfs to use the new mount API vfs: Convert qib_fs/ipathfs to use the new mount API vfs: Convert efivarfs to use the new mount API vfs: Convert configfs to use the new mount API vfs: Convert binfmt_misc to use the new mount API convenience helper: get_tree_single() convenience helper get_tree_nodev() vfs: Kill sget_userns() ...
2019-07-05libnvdimm: add dax_dev sync flagPankaj Gupta1-1/+18
This patch adds 'DAXDEV_SYNC' flag which is set for nd_region doing synchronous flush. This later is used to disable MAP_SYNC functionality for ext4 & xfs filesystem for devices don't support synchronous flush. Signed-off-by: Pankaj Gupta <pagupta@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 295Thomas Gleixner1-9/+1
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of version 2 of the gnu general public license as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 64 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190529141901.894819585@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-25vfs: Convert dax to use the new mount APIDavid Howells1-6/+10
Convert the dax filesystem to the new internal mount API as the old one will be obsoleted and removed. This allows greater flexibility in communication of mount parameters between userspace, the VFS and the filesystem. See Documentation/filesystems/mount_api.txt for more information. Signed-off-by: David Howells <dhowells@redhat.com> cc: Dan Williams <dan.j.williams@intel.com> cc: Vishal Verma <vishal.l.verma@intel.com> cc: Keith Busch <keith.busch@intel.com> cc: Dave Jiang <dave.jiang@intel.com> cc: linux-nvdimm@lists.01.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-25mount_pseudo(): drop 'name' argument, switch to d_make_root()Al Viro1-1/+1
Once upon a time we used to set ->d_name of e.g. pipefs root so that d_path() on pipes would work. These days it's completely pointless - dentries of pipes are not even connected to pipefs root. However, mount_pseudo() had set the root dentry name (passed as the second argument) and callers kept inventing names to pass to it. Including those that didn't *have* any non-root dentries to start with... All of that had been pointless for about 8 years now; it's time to get rid of that cargo-culting... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-21device-dax: Drop register_filesystem()Dan Williams1-7/+0
The device-dax fs is only there to allocate a common inode for each device-node that refers to the same device by major:minor. It is otherwise not user mountable and need not be displayed in /proc/filesystems. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-20dax: Arrange for dax_supported check to span multiple devicesDan Williams1-31/+57
Pankaj reports that starting with commit ad428cdb525a "dax: Check the end of the block-device capacity with dax_direct_access()" device-mapper no longer allows dax operation. This results from the stricter checks in __bdev_dax_supported() that validate that the start and end of a block-device map to the same 'pagemap' instance. Teach the dax-core and device-mapper to validate the 'pagemap' on a per-target basis. This is accomplished by refactoring the bdev_dax_supported() internals into generic_fsdax_supported() which takes a sector range to validate. Consequently generic_fsdax_supported() is suitable to be used in a device-mapper ->iterate_devices() callback. A new ->dax_supported() operation is added to allow composite devices to split and route upper-level bdev_dax_supported() requests. Fixes: ad428cdb525a ("dax: Check the end of the block-device...") Cc: <stable@vger.kernel.org> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Reported-by: Pankaj Gupta <pagupta@redhat.com> Reviewed-by: Pankaj Gupta <pagupta@redhat.com> Tested-by: Pankaj Gupta <pagupta@redhat.com> Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-05-01dax: make use of ->free_inode()Al Viro1-5/+2
we might want to drop ->destroy_inode() there - it's used only for WARN_ON() now, and AFAICS that could be moved to ->evict_inode() if we had one... Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-03-16Merge tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimmLinus Torvalds1-12/+29
Pull device-dax updates from Dan Williams: "New device-dax infrastructure to allow persistent memory and other "reserved" / performance differentiated memories, to be assigned to the core-mm as "System RAM". Some users want to use persistent memory as additional volatile memory. They are willing to cope with potential performance differences, for example between DRAM and 3D Xpoint, and want to use typical Linux memory management apis rather than a userspace memory allocator layered over an mmap() of a dax file. The administration model is to decide how much Persistent Memory (pmem) to use as System RAM, create a device-dax-mode namespace of that size, and then assign it to the core-mm. The rationale for device-dax is that it is a generic memory-mapping driver that can be layered over any "special purpose" memory, not just pmem. On subsequent boots udev rules can be used to restore the memory assignment. One implication of using pmem as RAM is that mlock() no longer keeps data off persistent media. For this reason it is recommended to enable NVDIMM Security (previously merged for 5.0) to encrypt pmem contents at rest. We considered making this recommendation an actively enforced requirement, but in the end decided to leave it as a distribution / administrator policy to allow for emulation and test environments that lack security capable NVDIMMs. Summary: - Replace the /sys/class/dax device model with /sys/bus/dax, and include a compat driver so distributions can opt-in to the new ABI. - Allow for an alternative driver for the device-dax address-range - Introduce the 'kmem' driver to hotplug / assign a device-dax address-range to the core-mm. - Arrange for the device-dax target-node to be onlined so that the newly added memory range can be uniquely referenced by numa apis" NOTE! I'm not entirely happy with the whole "PMEM as RAM" model because we currently have special - and very annoying rules in the kernel about accessing PMEM only with the "MC safe" accessors, because machine checks inside the regular repeat string copy functions can be fatal in some (not described) circumstances. And apparently the PMEM modules can cause that a lot more than regular RAM. The argument is that this happens because PMEM doesn't necessarily get scrubbed at boot like RAM does, but that is planned to be added for the user space tooling. Quoting Dan from another email: "The exposure can be reduced in the volatile-RAM case by scanning for and clearing errors before it is onlined as RAM. The userspace tooling for that can be in place before v5.1-final. There's also runtime notifications of errors via acpi_nfit_uc_error_notify() from background scrubbers on the DIMM devices. With that mechanism the kernel could proactively clear newly discovered poison in the volatile case, but that would be additional development more suitable for v5.2. I understand the concern, and the need to highlight this issue by tapping the brakes on feature development, but I don't see PMEM as RAM making the situation worse when the exposure is also there via DAX in the PMEM case. Volatile-RAM is arguably a safer use case since it's possible to repair pages where the persistent case needs active application coordination" * tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: device-dax: "Hotplug" persistent memory for use like normal RAM mm/resource: Let walk_system_ram_range() search child resources mm/memory-hotplug: Allow memory resources to be children mm/resource: Move HMM pr_debug() deeper into resource code mm/resource: Return real error codes from walk failures device-dax: Add a 'modalias' attribute to DAX 'bus' devices device-dax: Add a 'target_node' attribute device-dax: Auto-bind device after successful new_id acpi/nfit, device-dax: Identify differentiated memory with a unique numa-node device-dax: Add /sys/class/dax backwards compatibility device-dax: Add support for a dax override driver device-dax: Move resource pinning+mapping into the common driver device-dax: Introduce bus + driver model device-dax: Start defining a dax bus model device-dax: Remove multi-resource infrastructure device-dax: Kill dax_region base device-dax: Kill dax_region ida