aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2021-06-17nvme-fabrics: remove memset in nvmf_reg_read64()Chaitanya Kulkarni1-2/+1
Declare and initialize structure variable to the zero values so that we can get rid of the zeroout memset call. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvme-tcp: use ctrl sgl check helperChaitanya Kulkarni1-1/+1
Use the helper to check NVMe controller's SGL support. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvme-pci: use ctrl sgl check helperChaitanya Kulkarni1-2/+2
Use the helper to check NVMe controller's SGL support. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvme-fc: use ctrl sgl check helperChaitanya Kulkarni1-1/+1
Use the helper to check NVMe controller's SGL support. Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvme: add a helper to check ctrl sgl supportChaitanya Kulkarni1-0/+5
For various transports such as fc/tcp/pci it is common to check if NVMe SGLs are supported or not by the controller. In this preparation patch we add a helper to avoid the open coding of such checks in the various transport. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvme-pci: remove trailing lines for helpersChaitanya Kulkarni1-2/+0
Remove the extra white line at the end of the functions. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17nvme-pci: fix var. type for increasing cq_headJK Kim1-1/+1
nvmeq->cq_head is compared with nvmeq->q_depth and changed the value and cq_phase for handling the next cq db. but, nvmeq->q_depth's type is u32 and max. value is 0x10000 when CQP.MSQE is 0xffff and io_queue_depth is 0x10000. current temp. variable for comparing with nvmeq->q_depth is overflowed when previous nvmeq->cq_head is 0xffff. in this case, nvmeq->cq_phase is not updated. so, fix data type for temp. variable to u32. Signed-off-by: JK Kim <jongkang.kim2@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-16nvme-tcp: fix error codes in nvme_tcp_setup_ctrl()Dan Carpenter1-0/+2
These error paths currently return success but they should return -EOPNOTSUPP. Fixes: 73ffcefcfca0 ("nvme-tcp: check sgl supported by target") Fixes: 3f2304f8c6d6 ("nvme-tcp: add NVMe over TCP host driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-16nvme: factor out a nvme_validate_passthru_nsid helperChaitanya Kulkarni1-10/+16
Add a helper nvme_validate_passthru_nsid() to validate the nsid that removes the nsid validation and error message print code from nvme_user_cmd() and nvme_user_cmd64(). Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-16nvme: fix grammar in the CONFIG_NVME_MULTIPATH kconfig help textGeert Uytterhoeven1-1/+1
Fix a singular/plural mismatch in the CONFIG_NVME_MULTIPATH help text. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-16nvme: remove superfluous bio_set_dev in nvme_requeue_workDaniel Wagner1-5/+0
Commit ce86dad222e9 ("nvme-multipath: reset bdev to ns head when failover") moved the reset code where the bio is added to the requeue_list for the failover path. But it left the original bio_set_dev in nvme_requeue_work. There is a second path to nvme_requee_work. It is via nvme_ns_head_submit_bio. Though we don't have to set bio->bi_bdev for this path either, as it points to the correct bdev already. Let's remove the bio_set_dev. It's updating the bio->bi_bdev with the same pointer and thus it's unnecessary. Signed-off-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-16nvme: verify MNAN value if ANA is enabledDaniel Wagner1-0/+7
The controller is required to have a non-zero MNAN value if it supports ANA: If the controller supports Asymmetric Namespace Access Reporting, then this field shall be set to a non-zero value that is less than or equal to the NN value. Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-16ACPI: Add quirks for AMD Renoir/Lucienne CPUs to force the D3 hintMario Limonciello3-0/+37
AMD systems from Renoir and Lucienne require that the NVME controller is put into D3 over a Modern Standby / suspend-to-idle cycle. This is "typically" accomplished using the `StorageD3Enable` property in the _DSD, but this property was introduced after many of these systems launched and most OEM systems don't have it in their BIOS. On AMD Renoir without these drives going into D3 over suspend-to-idle the resume will fail with the NVME controller being reset and a trace like this in the kernel logs: ``` [ 83.556118] nvme nvme0: I/O 161 QID 2 timeout, aborting [ 83.556178] nvme nvme0: I/O 162 QID 2 timeout, aborting [ 83.556187] nvme nvme0: I/O 163 QID 2 timeout, aborting [ 83.556196] nvme nvme0: I/O 164 QID 2 timeout, aborting [ 95.332114] nvme nvme0: I/O 25 QID 0 timeout, reset controller [ 95.332843] nvme nvme0: Abort status: 0x371 [ 95.332852] nvme nvme0: Abort status: 0x371 [ 95.332856] nvme nvme0: Abort status: 0x371 [ 95.332859] nvme nvme0: Abort status: 0x371 [ 95.332909] PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -16 [ 95.332936] nvme 0000:03:00.0: PM: failed to resume async: error -16 ``` The Microsoft documentation for StorageD3Enable mentioned that Windows has a hardcoded allowlist for D3 support, which was used for these platforms. Introduce quirks to hardcode them for Linux as well. As this property is now "standardized", OEM systems using AMD Cezanne and newer APU's have adopted this property, and quirks like this should not be necessary. CC: Shyam-sundar S-k <Shyam-sundar.S-k@amd.com> CC: Alexander Deucher <Alexander.Deucher@amd.com> CC: Prike Liang <prike.liang@amd.com> Link: https://docs.microsoft.com/en-us/windows-hardware/design/component-guidelines/power-management-for-storage-hardware-devices-intro Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Julian Sikorski <belegdol@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-16ACPI: Check StorageD3Enable _DSD property in ACPI codeMario Limonciello3-27/+35
Although first implemented for NVME, this check may be usable by other drivers as well. Microsoft's specification explicitly mentions that is may be usable by SATA and AHCI devices. Google also indicates that they have used this with SDHCI in a downstream kernel tree that a user can plug a storage device into. Link: https://docs.microsoft.com/en-us/windows-hardware/design/component-guidelines/power-management-for-storage-hardware-devices-intro Suggested-by: Keith Busch <kbusch@kernel.org> CC: Shyam-sundar S-k <Shyam-sundar.S-k@amd.com> CC: Alexander Deucher <Alexander.Deucher@amd.com> CC: Rafael J. Wysocki <rjw@rjwysocki.net> CC: Prike Liang <prike.liang@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-15floppy: Fix fall-through warning for ClangGustavo A. R. Silva1-0/+1
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning by explicitly adding a break statement instead of letting the code fall through to the next case. Link: https://github.com/KSPP/linux/issues/115 Link: https://lore.kernel.org/linux-hardening/47bcd36a-6524-348b-e802-0691d1b3c429@kernel.dk/ Suggested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Denis Efremov <efremov@linux.com>
2021-06-15floppy: cleanup: remove redundant assignment to nr_sectorsJiapeng Chong1-1/+0
Variable nr_sectors is set to zero but this value is never read as it is overwritten later on, hence it is a redundant assignment and can be removed. Clean up the following clang-analyzer warning: drivers/block/floppy.c:2333:2: warning: Value stored to 'nr_sectors' is never read [clang-analyzer-deadcode.DeadStores]. Link: https://lore.kernel.org/r/1619774805-121562-1-git-send-email-jiapeng.chong@linux.alibaba.com Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Denis Efremov <efremov@linux.com>
2021-06-14md/raid5: avoid device_lock in read_one_chunk()Gal Ofri1-6/+23
There is a lock contention on device_lock in read_one_chunk(). device_lock is taken to sync conf->active_aligned_reads and conf->quiesce. read_one_chunk() takes the lock, then waits for quiesce=0 (resumed) before incrementing active_aligned_reads. raid5_quiesce() takes the lock, sets quiesce=2 (in-progress), then waits for active_aligned_reads to be zero before setting quiesce=1 (suspended). Introduce a fast (lockless) path in read_one_chunk(): activate aligned read without taking device_lock. In case quiesce starts while activating the aligned-read in fast path, deactivate it and revert to old behavior (take device_lock and wait for quiesce to finish). Add smp store/load in raid5_quiesce()/read_one_chunk() respectively to gaurantee that read_one_chunk() does not miss an ongoing quiesce. My setups: 1. 8 local nvme drives (each up to 250k iops). 2. 8 ram disks (brd). Each setup with raid6 (6+2), 1024 io threads on a 96 cpu-cores (48 per socket) system. Record both iops and cpu spent on this contention with rand-read-4k. Record bw with sequential-read-128k. Note: in most cases cpu is still busy but due to "new" bottlenecks. nvme: | iops | cpu | bw ----------------------------------------------- without patch | 1.6M | ~50% | 5.5GB/s with patch | 2M (throttled) | 0% | 16GB/s (throttled) ram (brd): | iops | cpu | bw ----------------------------------------------- without patch | 2M | ~80% | 24GB/s with patch | 4M | 0% | 55GB/s CC: Song Liu <song@kernel.org> CC: Neil Brown <neilb@suse.de> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Gal Ofri <gal.ofri@storing.io> Signed-off-by: Song Liu <song@kernel.org>
2021-06-14md: add comments in md_integrity_registerGuoqing Jiang1-0/+6
Given it is not obvious for the error handling, let's try to add some comments here to make it clear. Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: Song Liu <song@kernel.org>
2021-06-14md: check level before create and exit io_acct_setGuoqing Jiang1-6/+14
The bio_set (io_acct_set) is used by personalities to clone bio and trace the timestamp of bio. Some personalities such as raid1/10 don't need the bio_set, so add check to not create it unconditionally. Also update the comment for md_account_bio to make it more clear. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: Song Liu <song@kernel.org>
2021-06-14md: Constify attribute_group structsRikard Falkeborn4-7/+7
The attribute_group structs are never modified, they're only passed to sysfs_create_group() and sysfs_remove_group(). Make them const to allow the compiler to put them in read-only memory. Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com> Signed-off-by: Song Liu <song@kernel.org>
2021-06-14md: mark some personalities as deprecatedGuoqing Jiang4-6/+6
Mark the three personalities (linear, fault and multipath) as deprecated because: 1. people can use dm multipath or nvme multipath. 2. linear is already deprecated in MODULE_ALIAS. 3. no one actively using fault. Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: Song Liu <song@kernel.org>
2021-06-14md/raid10: enable io accountingGuoqing Jiang2-0/+7
For raid10, we record the start time between split bio and clone bio, and finish the accounting in the final endio. Also introduce start_time in r10bio accordingly. Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: Song Liu <song@kernel.org>
2021-06-14md/raid1: enable io accountingGuoqing Jiang2-0/+8
For raid1, we record the start time between split bio and clone bio, and finish the accounting in the final endio. Also introduce start_time in r1bio accordingly. Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: Song Liu <song@kernel.org>
2021-06-14md/raid1: rename print_msg with r1bio_existedGuoqing Jiang1-4/+4
The caller of raid1_read_request could pass NULL or a valid pointer for "struct r1bio *r1_bio", so it actually means whether r1_bio is existed or not. Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: Song Liu <song@kernel.org>
2021-06-14md/raid5: avoid redundant bio clone in raid5_read_one_chunkGuoqing Jiang1-14/+15
After enable io accounting, chunk read bio could be cloned twice which is not good. To avoid such inefficiency, let's clone align_bio from io_acct_set too, then we need only call md_account_bio in make_request unconditionally. Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: Song Liu <song@kernel.org>
2021-06-14md/raid5: move checking badblock before clone bio in raid5_read_one_chunkGuoqing Jiang1-7/+7
We don't need to clone bio if the relevant region has badblock. Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: Song Liu <song@kernel.org>
2021-06-14md: add io accounting for raid0 and raid5Guoqing Jiang4-3/+68
We introduce a new bioset (io_acct_set) for raid0 and raid5 since they don't own clone infrastructure to accounting io. And the bioset is added to mddev instead of to raid0 and raid5 layer, because with this way, we can put common functions to md.h and reuse them in raid0 and raid5. Also struct md_io_acct is added accordingly which includes io start_time, the origin bio and cloned bio. Then we can call bio_{start,end}_io_acct to get related io status. Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: Song Liu <song@kernel.org>
2021-06-14md: revert io stats accountingGuoqing Jiang2-46/+0
The commit 41d2d848e5c0 ("md: improve io stats accounting") could cause double fault problem per the report [1], and also it is not correct to change ->bi_end_io if md don't own it, so let's revert it. And io stats accounting will be replemented in later commits. [1]. https://lore.kernel.org/linux-raid/3bf04253-3fad-434a-63a7-20214e38cf26@gmail.com/T/#t Fixes: 41d2d848e5c0 ("md: improve io stats accounting") Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Signed-off-by: Song Liu <song@kernel.org>
2021-06-14mark pstore-blk as brokenChristoph Hellwig1-0/+1
pstore-blk just pokes directly into the pagecache for the block device without going through the file operations for that by faking up it's own file operations that do not match the block device ones. As this breaks the control of the block layer of it's page cache, and even now just works by accident only the best thing is to just disable this driver. Fixes: 17639f67c1d6 ("pstore/blk: Introduce backend for block devices") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210608161327.1537919-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-09z2ram: remove unnecessary oom messageZhen Lei1-8/+2
Fixes scripts/checkpatch.pl warning: WARNING: Possible unnecessary 'out of memory' message Remove it can help us save a bit of memory. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-09sx8: remove unnecessary oom messageZhen Lei1-2/+0
Fixes scripts/checkpatch.pl warning: WARNING: Possible unnecessary 'out of memory' message Remove it can help us save a bit of memory. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-09sunvdc: remove unnecessary oom messageZhen Lei1-2/+1
Fixes scripts/checkpatch.pl warning: WARNING: Possible unnecessary 'out of memory' message Remove it can help us save a bit of memory. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-09mtip32xx: remove unnecessary oom messageZhen Lei1-21/+5
Fixes scripts/checkpatch.pl warning: WARNING: Possible unnecessary 'out of memory' message Remove it can help us save a bit of memory. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-09drbd: remove unnecessary oom messageZhen Lei1-16/+6
Fixes scripts/checkpatch.pl warning: WARNING: Possible unnecessary 'out of memory' message Remove it can help us save a bit of memory. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-09aoe: remove unnecessary oom messageZhen Lei1-3/+1
Fixes scripts/checkpatch.pl warning: WARNING: Possible unnecessary 'out of memory' message Remove it can help us save a bit of memory. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-03nvmet: remove a superfluous variableChaitanya Kulkarni1-2/+1
Remove the superfluous variable "bdev" that is only used once in the nvmet_bdev_alloc_bip() and use req->ns->bdev that is used everywhere in the code to access the nvmet request's bdev. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-03nvmet: move ka_work initialization to nvmet_alloc_ctrlAmit Engel1-1/+1
Initialize keep-alive work only once, as part of alloc_ctrl and not each time that nvmet_start_keep_alive_timer is being called Signed-off-by: Amit Engel <amit.engel@dell.com> Reviewed-by: Hou Pu <houpu.main@gmail.com>
2021-06-03nvme: remove nvme_{get,put}_ns_from_diskChristoph Hellwig2-45/+28
Now that only one caller is left remove the helpers by restructuring nvme_pr_command so that it has two helpers for sending a command of to a given nsid using either the ns_head for multipath, or the namespace stored in the gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2021-06-03nvme: split nvme_report_zonesChristoph Hellwig4-24/+35
Split multipath support out of nvme_report_zones into a separate helper and simplify the non-multipath version as a result. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2021-06-03nvme: move the CSI sanity check into nvme_ns_report_zonesChristoph Hellwig1-5/+4
Move the CSI check into nvme_ns_report_zones to clean up the code a little bit and prepare for further refactoring. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2021-06-03nvme: add a sparse annotation to nvme_ns_head_ctrl_ioctlChristoph Hellwig1-0/+1
Add the __releases annotation to tell sparse that nvme_ns_head_ctrl_ioctl is expected to unlock head->srcu. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2021-06-03nvme: open code nvme_put_ns_from_disk in nvme_ns_head_ctrl_ioctlChristoph Hellwig1-1/+1
nvme_ns_head_ctrl_ioctl is always used on multipath nodes, so just call srcu_read_unlock directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2021-06-03nvme: open code nvme_{get,put}_ns_from_disk in nvme_ns_head_ioctlChristoph Hellwig1-10/+10
nvme_ns_head_ioctl is always used on multipath nodes, no need to deal with the de-multiplexers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2021-06-03nvme: open code nvme_put_ns_from_disk in nvme_ns_head_chr_ioctlChristoph Hellwig1-7/+5
nvme_ns_head_chr_ioctl is always used on multipath nodes, so just call srcu_read_unlock and consolidate the two unlock paths. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2021-06-03nvme-fabrics: remove extra bracesChaitanya Kulkarni1-1/+1
No need to use the braces around ~ operator. No functionality change in this patch. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-03nvme-fabrics: remove an extra commentChaitanya Kulkarni1-1/+1
Remove the comment at the end of the switch that is not needed as function is small enough. No functionality change in this patch. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-03nvme-fabrics: remove extra new lines in the switchChaitanya Kulkarni1-5/+4
Remove the extra lines in the switch block that is not common practice in the kernel code. No functionality change in this patch. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-03nvme-fabrics: fix the kerneldco comment for nvmf_log_connect_error()Chaitanya Kulkarni1-13/+9
Fix the comment style that matches existing code. No functionality change in this patch. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-03nvme-tcp: allow selecting the network interface for connectionsMartin Belanger4-2/+50
In our application, we need a way to force TCP connections to go out a specific IP interface instead of letting Linux select the interface based on the routing tables. Add the 'host-iface' option to allow specifying the interface to use. When the option host-iface is specified, the driver uses the specified interface to set the option SO_BINDTODEVICE on the TCP socket before connecting. This new option is needed in addtion to the existing host-traddr for the following reasons: Specifying an IP interface by its associated IP address is less intuitive than specifying the actual interface name and, in some cases, simply doesn't work. That's because the association between interfaces and IP addresses is not predictable. IP addresses can be changed or can change by themselves over time (e.g. DHCP). Interface names are predictable [1] and will persist over time. Consider the following configuration. 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state ... link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 100.0.0.100/24 scope global lo valid_lft forever preferred_lft forever 2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc ... link/ether 08:00:27:21:65:ec brd ff:ff:ff:ff:ff:ff inet 100.0.0.100/24 scope global enp0s3 valid_lft forever preferred_lft forever 3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc ... link/ether 08:00:27:4f:95:5c brd ff:ff:ff:ff:ff:ff inet 100.0.0.100/24 scope global enp0s8 valid_lft forever preferred_lft forever The above is a VM that I configured with the same IP address (100.0.0.100) on all interfaces. Doing a reverse lookup to identify the unique interface associated with 100.0.0.100 does not work here. And this is why the option host_iface is required. I understand that the above config does not represent a standard host system, but I'm using this to prove a point: "We can never know how users will configure their systems". By te way, The above configuration is perfectly fine by Linux. The current TCP implementation for host_traddr performs a bind()-before-connect(). This is a common construct to set the source IP address on a TCP socket before connecting. This has no effect on how Linux selects the interface for the connection. That's because Linux uses the Weak End System model as described in RFC1122 [2]. On the other hand, setting the Source IP Address has benefits and should be supported by linux-nvme. In fact, setting the Source IP Address is a mandatory FedGov requirement (e.g. connection to a RADIUS/TACACS+ server). Consider the following configuration. $ ip addr list dev enp0s8 3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc ... link/ether 08:00:27:4f:95:5c brd ff:ff:ff:ff:ff:ff inet 192.168.56.101/24 brd 192.168.56.255 scope global enp0s8 valid_lft 426sec preferred_lft 426sec inet 192.168.56.102/24 scope global secondary enp0s8 valid_lft forever preferred_lft forever inet 192.168.56.103/24 scope global secondary enp0s8 valid_lft forever preferred_lft forever inet 192.168.56.104/24 scope global secondary enp0s8 valid_lft forever preferred_lft forever Here we can see that several addresses are associated with interface enp0s8. By default, Linux always selects the default IP address, 192.168.56.101, as the source address when connecting over interface enp0s8. Some users, however, want the ability to specify a different source address (e.g., 192.168.56.102, 192.168.56.103, ...). The option host_traddr can be used as-is to perform this function. In conclusion, I believe that we need 2 options for TCP connections. One that can be used to specify an interface (host-iface). And one that can be used to set the source address (host-traddr). Users should be allowed to use one or the other, or both, or none. Of course, the documentation for host_traddr will need some clarification. It should state that when used for TCP connection, this option only sets the source address. And the documentation for host_iface should say that this option is only available for TCP connections. References: [1] https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/ [2] https://tools.ietf.org/html/rfc1122 Tested both IPv4 and IPv6 connections. Signed-off-by: Martin Belanger <martin.belanger@dell.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-03nvme-pci: look for StorageD3Enable on companion ACPI device insteadMario Limonciello1-23/+1
The documentation around the StorageD3Enable property hints that it should be made on the PCI device. This is where newer AMD systems set the property and it's required for S0i3 support. So rather than look for nodes of the root port only present on Intel systems, switch to the companion ACPI device for all systems. David Box from Intel indicated this should work on Intel as well. Link: https://lore.kernel.org/linux-nvme/YK6gmAWqaRmvpJXb@google.com/T/#m900552229fa455867ee29c33b854845fce80ba70 Link: https://docs.microsoft.com/en-us/windows-hardware/design/component-guidelines/power-management-for-storage-hardware-devices-intro Fixes: df4f9bc4fb9c ("nvme-pci: add support for ACPI StorageD3Enable property") Suggested-by: Liang Prike <Prike.Liang@amd.com> Acked-by: Raul E Rangel <rrangel@chromium.org> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: David E. Box <david.e.box@linux.intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>