aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/nvdimm (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-02-28libnvdimm/btt: Fix LBA masking during 'free list' populationVishal Verma3-6/+29
The Linux BTT implementation assumes that log entries will never have the 'zero' flag set, and indeed it never sets that flag for log entries itself. However, the UEFI spec is ambiguous on the exact format of the LBA field of a log entry, specifically as to whether it should include the additional flag bits or not. While a zero bit doesn't make sense in the context of a log entry, other BTT implementations might still have it set. If an implementation does happen to have it set, we would happily read it in as the next block to write to for writes. Since a high bit is set, it pushes the block number out of the range of an 'arena', and we fail such a write with an EIO. Follow the robustness principle, and tolerate such implementations by stripping out the zero flag when populating the free list during initialization. Additionally, use the same stripped out entries for detection of incomplete writes and map restoration that happens at this stage. Add a sysfs file 'log_zero_flags' that indicates the ability to accept such a layout to userspace applications. This enables 'ndctl check-namespace' to recognize whether the kernel is able to handle zero flags, or whether it should attempt a fix-up under the --repair option. Cc: Dan Williams <dan.j.williams@intel.com> Reported-by: Dexuan Cui <decui@microsoft.com> Reported-by: Pedro d'Aquino Filocre F S Barbuda <pbarbuda@microsoft.com> Tested-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-02-28libnvdimm/btt: Remove unnecessary code in btt_freelist_initVishal Verma1-6/+2
We call btt_log_read() twice, once to get the 'old' log entry, and again to get the 'new' entry. However, we have no use for the 'old' entry, so remove it. Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-02-22libnvdimm/pfn: Remove dax_label_reserveDan Williams1-4/+2
The reserve was for an abandoned effort to add label (partitioning support) to device-dax instances. Remove it. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-02-12libnvdimm/pmem: Honor force_raw for legacy pmem regionsDan Williams1-0/+4
For recovery, where non-dax access is needed to a given physical address range, and testing, allow the 'force_raw' attribute to override the default establishment of a dev_pagemap. Otherwise without this capability it is possible to end up with a namespace that can not be activated due to corrupted info-block, and one that can not be repaired due to a section collision. Cc: <stable@vger.kernel.org> Fixes: 004f1afbe199 ("libnvdimm, pmem: direct map legacy pmem by default") Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-02-12libnvdimm/pfn: Account for PAGE_SIZE > info-block-size in nd_pfn_init()Dan Williams1-7/+13
Similar to "libnvdimm: Fix altmap reservation size calculation" provide for a reservation of a full page worth of info block space at info-block establishment time. Typically there is already slack in the padding from honoring the default 2MB alignment, but provide for a reservation for corner case configurations that would otherwise fit. Cc: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-02-12libnvdimm: Fix altmap reservation size calculationOliver O'Halloran1-1/+1
Libnvdimm reserves the first 8K of pfn and devicedax namespaces to store a superblock describing the namespace. This 8K reservation is contained within the altmap area which the kernel uses for the vmemmap backing for the pages within the namespace. The altmap allows for some pages at the start of the altmap area to be reserved and that mechanism is used to protect the superblock from being re-used as vmemmap backing. The number of PFNs to reserve is calculated using: PHYS_PFN(SZ_8K) Which is implemented as: #define PHYS_PFN(x) ((unsigned long)((x) >> PAGE_SHIFT)) So on systems where PAGE_SIZE is greater than 8K the reservation size is truncated to zero and the superblock area is re-used as vmemmap backing. As a result all the namespace information stored in the superblock (i.e. if it's a PFN or DAX namespace) is lost and the namespace needs to be re-created to get access to the contents. This patch fixes this by using PFN_UP() rather than PHYS_PFN() to ensure that at least one page is reserved. On systems with a 4K pages size this patch should have no effect. Cc: stable@vger.kernel.org Cc: Dan Williams <dan.j.williams@intel.com> Fixes: ac515c084be9 ("libnvdimm, pmem, pfn: move pfn setup to the core") Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-02-12libnvdimm, pfn: Fix over-trim in trim_pfn_device()Wei Yang1-1/+1
When trying to see whether current nd_region intersects with others, trim_pfn_device() has already calculated the *size* to be expanded to SECTION size. Do not double append 'adjust' to 'size' when calculating whether the end of a region collides with the next pmem region. Fixes: ae86cbfef381 "libnvdimm, pfn: Pad pfn namespaces relative to other regions" Cc: <stable@vger.kernel.org> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-02-11Merge 5.0-rc6 into driver-core-nextGreg Kroah-Hartman4-7/+26
We need the debugfs fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-02libnvdimm/dimm: Add a no-BLK quirk based on NVDIMM familyDan Williams4-0/+23
As Dexuan reports the NVDIMM_FAMILY_HYPERV platform is incompatible with the existing Linux namespace implementation because it uses NSLABEL_FLAG_LOCAL for x1-width PMEM interleave sets. Quirk it as an platform / DIMM that does not provide BLK-aperture access. Allow the libnvdimm core to assume no potential for aliasing. In case other implementations make the same mistake, provide a "noblk" module parameter to force-enable the quirk. Link: https://lkml.kernel.org/r/PU1P153MB0169977604493B82B662A01CBF920@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM Reported-by: Dexuan Cui <decui@microsoft.com> Tested-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-01-31libnvdimm: Schedule device registration on node local to the deviceAlexander Duyck1-3/+8
Force the device registration for nvdimm devices to be closer to the actual device. This is achieved by using either the NUMA node ID of the region, or of the parent. By doing this we can have everything above the region based on the region, and everything below the region based on the nvdimm bus. By guaranteeing NUMA locality I see an improvement of as high as 25% for per-node init of a system with 12TB of persistent memory. Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-21libnvdimm/security: Require nvdimm_security_setup_events() to succeedDan Williams3-5/+24
The following warning: ACPI0012:00: security event setup failed: -19 ...is meant to capture exceptional failures of sysfs_get_dirent(), however it will also fail in the common case when security support is disabled. A few issues: 1/ A dev_warn() report for a common case is too chatty 2/ The setup of this notifier is generic, no need for it to be driven from the nfit driver, it can exist completely in the core. 3/ If it fails for any reason besides security support being disabled, that's fatal and should abort DIMM activation. Userspace may hang if it never gets overwrite notifications. 4/ The dirent needs to be released. Move the call to the core 'dimm' driver, make it conditional on security support being active, make it fatal for the exceptional case, add the missing sysfs_put() at device disable time. Fixes: 7d988097c546 ("...Add security DSM overwrite support") Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-01-15libnvdimm/security: Fix nvdimm_security_state() state request selectionDave Jiang1-2/+2
The input parameter should be enum nvdimm_passphrase_type instead of bool for selection of master/user for selection of extended master passphrase state or the regular user passphrase state. Fixes: 89fa9d8ea7bdf ("...add Intel DSM 1.8 master passphrase support") Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-01-15libnvdimm/label: Clear 'updating' flag after label-set updateDan Williams1-5/+18
The UEFI 2.7 specification sets expectations that the 'updating' flag is eventually cleared. To date, the libnvdimm core has never adhered to that protocol. The policy of the core matches the policy of other multi-device info-block formats like MD-Software-RAID that expect administrator intervention on inconsistent info-blocks, not automatic invalidation. However, some pre-boot environments may unfortunately attempt to "clean up" the labels and invalidate a set when it fails to find at least one "non-updating" label in the set. Clear the updating flag after set updates to minimize the window of vulnerability to aggressive pre-boot environments. Ideally implementations would not write to the label area outside of creating namespaces. Note that this only minimizes the window, it does not close it as the system can still crash while clearing the flag and the set can be subsequently deleted / invalidated by the pre-boot environment. Fixes: f524bf271a5c ("libnvdimm: write pmem label set") Cc: <stable@vger.kernel.org> Cc: Kelly Couch <kelly.j.couch@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-01-06acpi/nfit, device-dax: Identify differentiated memory with a unique numa-nodeDan Williams4-1/+4
Persistent memory, as described by the ACPI NFIT (NVDIMM Firmware Interface Table), is the first known instance of a memory range described by a unique "target" proximity domain. Where "initiator" and "target" proximity domains is an approach that the ACPI HMAT (Heterogeneous Memory Attributes Table) uses to described the unique performance properties of a memory range relative to a given initiator (e.g. CPU or DMA device). Currently the numa-node for a /dev/pmemX block-device or /dev/daxX.Y char-device follows the traditional notion of 'numa-node' where the attribute conveys the closest online numa-node. That numa-node attribute is useful for cpu-binding and memory-binding processes *near* the device. However, when the memory range backing a 'pmem', or 'dax' device is onlined (memory hot-add) the memory-only-numa-node representing that address needs to be differentiated from the set of online nodes. In other words, the numa-node association of the device depends on whether you can bind processes *near* the cpu-numa-node in the offline device-case, or bind process *on* the memory-range directly after the backing address range is onlined. Allow for the case that platform firmware describes persistent memory with a unique proximity domain, i.e. when it is distinct from the proximity of DRAM and CPUs that are on the same socket. Plumb the Linux numa-node translation of that proximity through the libnvdimm region device to namespaces that are in device-dax mode. With this in place the proposed kmem driver [1] can optionally discover a unique numa-node number for the address range as it transitions the memory from an offline state managed by a device-driver to an online memory range managed by the core-mm. [1]: https://lore.kernel.org/lkml/20181022201317.8558C1D8@viggo.jf.intel.com Reported-by: Fan Du <fan.du@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Oliver O'Halloran" <oohall@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-28Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-8/+5
Merge misc updates from Andrew Morton: - large KASAN update to use arm's "software tag-based mode" - a few misc things - sh updates - ocfs2 updates - just about all of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (167 commits) kernel/fork.c: mark 'stack_vm_area' with __maybe_unused memcg, oom: notify on oom killer invocation from the charge path mm, swap: fix swapoff with KSM pages include/linux/gfp.h: fix typo mm/hmm: fix memremap.h, move dev_page_fault_t callback to hmm hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization memory_hotplug: add missing newlines to debugging output mm: remove __hugepage_set_anon_rmap() include/linux/vmstat.h: remove unused page state adjustment macro mm/page_alloc.c: allow error injection mm: migrate: drop unused argument of migrate_page_move_mapping() blkdev: avoid migration stalls for blkdev pages mm: migrate: provide buffer_migrate_page_norefs() mm: migrate: move migrate_page_lock_buffers() mm: migrate: lock buffers before migrate_page_move_mapping() mm: migration: factor out code to compute expected number of page references mm, page_alloc: enable pcpu_drain with zone capability kmemleak: add config to select auto scan mm/page_alloc.c: don't call kasan_free_pages() at deferred mem init ...
2018-12-28Merge tag 'libnvdimm-for-4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimmLinus Torvalds11-18/+781
Pull libnvdimm updates from Dan Williams: "The vast bulk of this update is the new support for the security capabilities of some nvdimms. The userspace tooling for this capability is still a work in progress, but the changes survive the existing libnvdimm unit tests. The changes also pass manual checkout on hardware and the new nfit_test emulation of the security capability. The touches of the security/keys/ files have received the necessary acks from Mimi and David. Those changes were necessary to allow for a new generic encrypted-key type, and allow the nvdimm sub-system to lookup key material referenced by the libnvdimm-sysfs interface. Summary: - Add support for the security features of nvdimm devices that implement a security model similar to ATA hard drive security. The security model supports locking access to the media at device-power-loss, to be unlocked with a passphrase, and secure-erase (crypto-scramble). Unlike the ATA security case where the kernel expects device security to be managed in a pre-OS environment, the libnvdimm security implementation allows key provisioning and key-operations at OS runtime. Keys are managed with the kernel's encrypted-keys facility to provide data-at-rest security for the libnvdimm key material. The usage model mirrors fscrypt key management, but is driven via libnvdimm sysfs. - Miscellaneous updates for api usage and comment fixes" * tag 'libnvdimm-for-4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (21 commits) libnvdimm/security: Quiet security operations libnvdimm/security: Add documentation for nvdimm security support tools/testing/nvdimm: add Intel DSM 1.8 support for nfit_test tools/testing/nvdimm: Add overwrite support for nfit_test tools/testing/nvdimm: Add test support for Intel nvdimm security DSMs acpi/nfit, libnvdimm/security: add Intel DSM 1.8 master passphrase support acpi/nfit, libnvdimm/security: Add security DSM overwrite support acpi/nfit, libnvdimm: Add support for issue secure erase DSM to Intel nvdimm acpi/nfit, libnvdimm: Add enable/update passphrase support for Intel nvdimms acpi/nfit, libnvdimm: Add disable passphrase support to Intel nvdimm. acpi/nfit, libnvdimm: Add unlock of nvdimm support for Intel DIMMs acpi/nfit, libnvdimm: Add freeze security support to Intel nvdimm acpi/nfit, libnvdimm: Introduce nvdimm_security_ops keys-encrypted: add nvdimm key format type to encrypted keys keys: Export lookup_user_key to external users acpi/nfit, libnvdimm: Store dimm id as a member to struct nvdimm libnvdimm, namespace: Replace kmemdup() with kstrndup() libnvdimm, label: Switch to bitmap_zalloc() ACPI/nfit: Adjust annotation for why return 0 if fail to find NFIT at start libnvdimm, bus: Check id immediately following ida_simple_get ...
2018-12-28mm, devm_memremap_pages: fix shutdown handlingDan Williams1-8/+5
The last step before devm_memremap_pages() returns success is to allocate a release action, devm_memremap_pages_release(), to tear the entire setup down. However, the result from devm_add_action() is not checked. Checking the error from devm_add_action() is not enough. The api currently relies on the fact that the percpu_ref it is using is killed by the time the devm_memremap_pages_release() is run. Rather than continue this awkward situation, offload the responsibility of killing the percpu_ref to devm_memremap_pages_release() directly. This allows devm_memremap_pages() to do the right thing relative to init failures and shutdown. Without this change we could fail to register the teardown of devm_memremap_pages(). The likelihood of hitting this failure is tiny as small memory allocations almost always succeed. However, the impact of the failure is large given any future reconfiguration, or disable/enable, of an nvdimm namespace will fail forever as subsequent calls to devm_memremap_pages() will fail to setup the pgmap_radix since there will be stale entries for the physical address range. An argument could be made to require that the ->kill() operation be set in the @pgmap arg rather than passed in separately. However, it helps code readability, tracking the lifetime of a given instance, to be able to grep the kill routine directly at the devm_memremap_pages() call site. Link: http://lkml.kernel.org/r/154275558526.76910.7535251937849268605.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Fixes: e8d513483300 ("memremap: change devm_memremap_pages interface...") Reviewed-by: "Jérôme Glisse" <jglisse@redhat.com> Reported-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-27Merge miscellaneous libnvdimm updates for 4.21Dan Williams6-35/+86
* Use common helpers, bitmap_zalloc() and kstrndup(), to replace open coded versions. * Clarify the comments around hotplug vs initial init case for the nfit driver. * Cleanup the libnvdimm init path.
2018-12-22libnvdimm/security: Quiet security operationsDan Williams2-16/+16
The security implementation is too chatty. For example, the common case is that security is not enabled / setup, and booting a qemu configuration currently yields: nvdimm nmem0: request_key() found no key nvdimm nmem0: failed to unlock dimm: -126 nvdimm nmem1: request_key() found no key nvdimm nmem1: failed to unlock dimm: -126 Convert all security related log messages to debug level. Cc: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-21tools/testing/nvdimm: Add test support for Intel nvdimm security DSMsDave Jiang1-1/+1
Add nfit_test support for DSM functions "Get Security State", "Set Passphrase", "Disable Passphrase", "Unlock Unit", "Freeze Lock", and "Secure Erase" for the fake DIMMs. Also adding a sysfs knob in order to put the DIMMs in "locked" state. The order of testing DIMM unlocking would be. 1a. Disable DIMM X. 1b. Set Passphrase to DIMM X. 2. Write to /sys/devices/platform/nfit_test.0/nfit_test_dimm/test_dimmX/lock_dimm 3. Renable DIMM X 4. Check DIMM X state via sysfs "security" attribute for nmemX. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-21acpi/nfit, libnvdimm/security: add Intel DSM 1.8 master passphrase supportDave Jiang3-29/+69
With Intel DSM 1.8 [1] two new security DSMs are introduced. Enable/update master passphrase and master secure erase. The master passphrase allows a secure erase to be performed without the user passphrase that is set on the NVDIMM. The commands of master_update and master_erase are added to the sysfs knob in order to initiate the DSMs. They are similar in opeartion mechanism compare to update and erase. [1]: http://pmem.io/documents/NVDIMM_DSM_Interface-V1.8.pdf Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-21acpi/nfit, libnvdimm/security: Add security DSM overwrite supportDave Jiang5-5/+200
Add support for the NVDIMM_FAMILY_INTEL "ovewrite" capability as described by the Intel DSM spec v1.7. This will allow triggering of overwrite on Intel NVDIMMs. The overwrite operation can take tens of minutes. When the overwrite DSM is issued successfully, the NVDIMMs will be unaccessible. The kernel will do backoff polling to detect when the overwrite process is completed. According to the DSM spec v1.7, the 128G NVDIMMs can take up to 15mins to perform overwrite and larger DIMMs will take longer. Given that overwrite puts the DIMM in an indeterminate state until it completes introduce the NDD_SECURITY_OVERWRITE flag to prevent other operations from executing when overwrite is happening. The NDD_WORK_PENDING flag is added to denote that there is a device reference on the nvdimm device for an async workqueue thread context. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-21acpi/nfit, libnvdimm: Add support for issue secure erase DSM to Intel nvdimmDave Jiang3-2/+53
Add support to issue a secure erase DSM to the Intel nvdimm. The required passphrase is acquired from an encrypted key in the kernel user keyring. To trigger the action, "erase <keyid>" is written to the "security" sysfs attribute. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-21acpi/nfit, libnvdimm: Add enable/update passphrase support for Intel nvdimmsDave Jiang3-7/+69
Add support for enabling and updating passphrase on the Intel nvdimms. The passphrase is the an encrypted key in the kernel user keyring. We trigger the update via writing "update <old_keyid> <new_keyid>" to the sysfs attribute "security". If no <old_keyid> exists (for enabling security) then a 0 should be used. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-21acpi/nfit, libnvdimm: Add disable passphrase support to Intel nvdimm.Dave Jiang3-3/+116
Add support to disable passphrase (security) for the Intel nvdimm. The passphrase used for disabling is pulled from an encrypted-key in the kernel user keyring. The action is triggered by writing "disable <keyid>" to the sysfs attribute "security". Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-13acpi/nfit, libnvdimm: Add unlock of nvdimm support for Intel DIMMsDave Jiang5-1/+177
Add support to unlock the dimm via the kernel key management APIs. The passphrase is expected to be pulled from userspace through keyutils. The key management and sysfs attributes are libnvdimm generic. Encrypted keys are used to protect the nvdimm passphrase at rest. The master key can be a trusted-key sealed in a TPM, preferred, or an encrypted-key, more flexible, but more exposure to a potential attacker. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Co-developed-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-13acpi/nfit, libnvdimm: Add freeze security support to Intel nvdimmDave Jiang2-2/+65
Add support for freeze security on Intel nvdimm. This locks out any changes to security for the DIMM until a hard reset of the DIMM is performed. This is triggered by writing "freeze" to the generic nvdimm/nmemX "security" sysfs attribute. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Co-developed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-13acpi/nfit, libnvdimm: Introduce nvdimm_security_opsDave Jiang3-1/+63
Some NVDIMMs, like the ones defined by the NVDIMM_FAMILY_INTEL command set, expose a security capability to lock the DIMMs at poweroff and require a passphrase to unlock them. The security model is derived from ATA security. In anticipation of other DIMMs implementing a similar scheme, and to abstract the core security implementation away from the device-specific details, introduce nvdimm_security_ops. Initially only a status retrieval operation, ->state(), is defined, along with the base infrastructure and definitions for future operations. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Co-developed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-13acpi/nfit, libnvdimm: Store dimm id as a member to struct nvdimmDave Jiang2-5/+8
The generated dimm id is needed for the sysfs attribute as well as being used as the identifier/description for the security key. Since it's constant and should never change, store it as a member of struct nvdimm. As nvdimm_create() continues to grow parameters relative to NFIT driver requirements, do not require other implementations to keep pace. Introduce __nvdimm_create() to carry the new parameters and keep nvdimm_create() with the long standing default api. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-10libnvdimm, namespace: Replace kmemdup() with kstrndup()Andy Shevchenko1-2/+1
kstrndup() takes care of '\0' terminator for the strings. Use it here instead of kmemdup() + explicit terminating the input string. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-10libnvdimm, label: Switch to bitmap_zalloc()Andy Shevchenko1-4/+3
Switch to bitmap_zalloc() to show clearly what we are allocating. Besides that it returns pointer of bitmap type instead of opaque void *. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-10libnvdimm, bus: Check id immediately following ida_simple_getOcean He1-2/+2
The id check was not executed immediately following ida_simple_get. Just change the codes position, without function change. Signed-off-by: Ocean He <hehy1@lenovo.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-09Merge tag 'v4.20-rc6' into for-4.21/blockJens Axboe3-27/+80
Pull in v4.20-rc6 to resolve the conflict in NVMe, but also to get the two corruption fixes. We're going to be overhauling the direct dispatch path, and we need to do that on top of the changes we made for that in mainline. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-05libnvdimm, pfn: Pad pfn namespaces relative to other regionsDan Williams3-27/+80
Commit cfe30b872058 "libnvdimm, pmem: adjust for section collisions with 'System RAM'" enabled Linux to workaround occasions where platform firmware arranges for "System RAM" and "Persistent Memory" to collide within a single section boundary. Unfortunately, as reported in this issue [1], platform firmware can inflict the same collision between persistent memory regions. The approach of interrogating iomem_resource does not work in this case because platform firmware may merge multiple regions into a single iomem_resource range. Instead provide a method to interrogate regions that share the same parent bus. This is a stop-gap until the core-MM can grow support for hotplug on sub-section boundaries. [1]: https://github.com/pmem/ndctl/issues/76 Fixes: cfe30b872058 ("libnvdimm, pmem: adjust for section collisions with...") Cc: <stable@vger.kernel.org> Reported-by: Patrick Geary <patrickg@supermicro.com> Tested-by: Patrick Geary <patrickg@supermicro.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-12-04acpi/nfit: Add support for Intel DSM 1.8 commandsDave Jiang1-1/+1
Add command definition for security commands defined in Intel DSM specification v1.8 [1]. This includes "get security state", "set passphrase", "unlock unit", "freeze lock", "secure erase", "overwrite", "overwrite query", "master passphrase enable/disable", and "master erase", . Since this adds several Intel definitions, move the relevant bits to their own header. These commands mutate physical data, but that manipulation is not cache coherent. The requirement to flush and invalidate caches makes these commands unsuitable to be called from userspace, so extra logic is added to detect and block these commands from being submitted via the ioctl command submission path. Lastly, the commands may contain sensitive key material that should not be dumped in a standard debug session. Update the nvdimm-command payload-dump facility to move security command payloads behind a default-off compile time switch. [1]: http://pmem.io/documents/NVDIMM_DSM_Interface-V1.8.pdf Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-11-15block: remove the lock argument to blk_alloc_queue_nodeChristoph Hellwig1-1/+1
With the legacy request path gone there is no real need to override the queue_lock. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-25Merge tag 'libnvdimm-for-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimmLinus Torvalds11-63/+251
Pull libnvdimm updates from Dan Williams: - Improve the efficiency and performance of reading nvdimm-namespace labels. Reduce the amount of label data read at driver load time by a few orders of magnitude. Reduce heavyweight call-outs to platform-firmware routines. - Handle media errors located in the 'struct page' array stored on a persistent memory namespace. Let the kernel clear these errors rather than an awkward userspace workaround. - Fix Address Range Scrub (ARS) completion tracking. Correct occasions where the kernel indicates completion of ARS before submission. - Fix asynchronous device registration reference counting. - Add support for reporting an nvdimm dirty-shutdown-count via sysfs. - Fix various small libnvdimm core and uapi issues. * tag 'libnvdimm-for-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (21 commits) acpi, nfit: Further restrict userspace ARS start requests acpi, nfit: Fix Address Range Scrub completion tracking UAPI: ndctl: Remove use of PAGE_SIZE UAPI: ndctl: Fix g++-unsupported initialisation in headers tools/testing/nvdimm: Populate dirty shutdown data acpi, nfit: Collect shutdown status acpi, nfit: Introduce nfit_mem flags libnvdimm, label: Fix sparse warning nvdimm: Use namespace index data to reduce number of label reads needed nvdimm: Split label init out from the logic for getting config data nvdimm: Remove empty if statement nvdimm: Clarify comment in sizeof_namespace_index nvdimm: Sanity check labeloff libnvdimm, dimm: Maximize label transfer size libnvdimm, pmem: Fix badblocks population for 'raw' namespaces libnvdimm, namespace: Drop the repeat assignment for variable dev->parent libnvdimm, region: Fail badblocks listing for inactive regions libnvdimm, pfn: during init, clear errors in the metadata area libnvdimm: Set device node in nd_device_register libnvdimm: Hold reference on parent while scheduling async init ...
2018-10-12libnvdimm, label: Fix sparse warningDan Williams1-1/+3
The kbuild robot reports: drivers/nvdimm/label.c:500:32: warning: restricted __le32 degrades to integer ...read 'nslot' into a local u32. Reported-by: kbuild test robot <lkp@intel.com> Acked-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-10-12nvdimm: Use namespace index data to reduce number of label reads neededAlexander Duyck3-12/+88
This patch adds logic that is meant to make use of the namespace index data to reduce the number of reads that are needed to initialize a given namespace. The general idea is that once we have enough data to validate the namespace index we do so and then proceed to fetch only those labels that are not listed as being "free". By doing this I am seeing a total time reduction from about 4-5 seconds to 2-3 seconds for 24 NVDIMM modules each with 128K of label config area. Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-10-12nvdimm: Split label init out from the logic for getting config dataAlexander Duyck5-31/+61
This patch splits the initialization of the label data into two functions. One for doing the init, and another for reading the actual configuration data. The idea behind this is that by doing this we create a symmetry between the getting and setting of config data in that we have a function for both. In addition it will make it easier for us to identify the bits that are related to init versus the pieces that are a wrapper for reading data from the ACPI interface. So for example by splitting things out like this it becomes much more obvious that we were performing checks that weren't necessarily related to the set/get operations such as relying on ndd->data being present when the set and get ops should not care about a locally cached copy of the label area. Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-10-12nvdimm: Remove empty if statementAlexander Duyck1-3/+2
This patch removes an empty statement from an if expression and promotes the else statement to the if expression with the expression logic reversed. I feel this is more readable as the empty statement can lead to issues if any additional logic was ever added. Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-10-12nvdimm: Clarify comment in sizeof_namespace_indexAlexander Duyck1-1/+2
When working on the label code I found it rather confusing to see several spots that reference a minimum label size of 256 while working with labels that are 128 bytes in size. This patch is meant to provide a clarification on one of the comments that was at the heart of the issue. Specifically for version 1.2 and later of the namespace specification the minimum label size is 256, prior to that the minimum label size was 128. So we should state that as such to avoid confusion. Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-10-12nvdimm: Sanity check labeloffAlexander Duyck1-0/+7
This patch adds validation for the labeloff field in the indexes. Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-10-10libnvdimm, dimm: Maximize label transfer sizeDan Williams1-7/+6
Use kvzalloc() to bypass the arbitrary PAGE_SIZE limit of label transfer operations. Given the expense of calling into firmware, maximize the amount of label data we transfer per call to be up to the total label space if allowed by the firmware. Instead of limiting based on PAGE_SIZE we can instead simply limit the maximum size based on either the config_size int he case of the get operation, or the length of the write based on the set operation. On a system with 24 NVDIMM modules each with a config_size of 128K and a maximum transfer size of 64K - 4, this patch reduces the init time for the label data from around 24 seconds down to between 4-5 seconds. Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-10-09libnvdimm, pmem: Fix badblocks population for 'raw' namespacesDan Williams1-1/+3
The driver is only initializing bb_res in the devm_memremap_pages() paths, but the raw namespace case is passing an uninitialized bb_res to nvdimm_badblocks_populate(). Fixes: e8d513483300 ("memremap: change devm_memremap_pages interface...") Cc: <stable@vger.kernel.org> Cc: Christoph Hellwig <hch@lst.de> Reported-by: Jacek Zloch <jacek.zloch@intel.com> Reported-by: Krzysztof Rusocki <krzysztof.rusocki@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-10-01libnvdimm, namespace: Drop the repeat assignment for variable dev->parentGuangZhe Fu1-1/+0
The variable dev-parent is assigned twice with the same &nd_region->dev. I think we could drop the second one. Signed-off-by: GuangZhe Fu <fugz1@lenovo.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-09-28libnvdimm, region: Fail badblocks listing for inactive regionsDan Williams1-2/+9
While experimenting with region driver loading the following backtrace was triggered: INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. [..] Call Trace: dump_stack+0x85/0xcb register_lock_class+0x571/0x580 ? __lock_acquire+0x2ba/0x1310 ? kernfs_seq_start+0x2a/0x80 __lock_acquire+0xd4/0x1310 ? dev_attr_show+0x1c/0x50 ? __lock_acquire+0x2ba/0x1310 ? kernfs_seq_start+0x2a/0x80 ? lock_acquire+0x9e/0x1a0 lock_acquire+0x9e/0x1a0 ? dev_attr_show+0x1c/0x50 badblocks_show+0x70/0x190 ? dev_attr_show+0x1c/0x50 dev_attr_show+0x1c/0x50 This results from a missing successful call to devm_init_badblocks() from nd_region_probe(). Block attempts to show badblocks while the region is not enabled. Fixes: 6a6bef90425e ("libnvdimm: add mechanism to publish badblocks...") Cc: <stable@vger.kernel.org> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-09-28libnvdimm, pfn: during init, clear errors in the metadata areaVishal Verma1-1/+60
If there are badblocks present in the 'struct page' area for pfn namespaces, until now, the only way to clear them has been to force the namespace into raw mode, clear the errors, and re-enable the fsdax mode. This is clunky, given that it should be easy enough for the pfn driver to do the same. Add a new helper that uses the most recently available badblocks list to check whether there are any badblocks that lie in the volatile struct page area. If so, before initializing the struct pages, send down targeted writes via nvdimm_write_bytes to write zeroes to the affected blocks, and thus clear errors. Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-09-28block: genhd: add 'groups' argument to device_add_diskHannes Reinecke3-3/+3
Update device_add_disk() to take an 'groups' argument so that individual drivers can register a device with additional sysfs attributes. This avoids race condition the driver would otherwise have if these groups were to be created with sysfs_add_groups(). Signed-off-by: Martin Wilck <martin.wilck@suse.com> Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-26libnvdimm: Set device node in nd_device_registerAlexander Duyck1-6/+10
This change makes it so that we don't repeatedly overwrite the device node for nvdimm regions. The earliest we can set the node is immediately after calling device init, so I have moved the code there so we can avoid rewriting the node with each uevent. Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>