<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-dev/drivers/nvme/host, branch master</title>
<subtitle>Linux kernel development work - see feature branches</subtitle>
<id>https://git.zx2c4.com/linux-dev/atom/drivers/nvme/host?h=master</id>
<link rel='self' href='https://git.zx2c4.com/linux-dev/atom/drivers/nvme/host?h=master'/>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/'/>
<updated>2022-11-09T13:28:27Z</updated>
<entry>
<title>nvme: quiet user passthrough command errors</title>
<updated>2022-11-09T13:28:27Z</updated>
<author>
<name>Keith Busch</name>
<email>kbusch@kernel.org</email>
</author>
<published>2022-10-28T20:14:15Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=d7ac8dca938cd60cf7bd9a89a229a173c6bcba87'/>
<id>urn:sha1:d7ac8dca938cd60cf7bd9a89a229a173c6bcba87</id>
<content type='text'>
The driver is spamming the kernel logs for entirely harmless errors from
user space submitting unsupported commands. Just silence the errors.
The application has direct access to command status, so there's no need
to log these.

And since every passthrough command now uses the quiet flag, move the
setting to the common initializer.

Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Reviewed-by: Alan Adamson &lt;alan.adamson@oracle.com&gt;
Reviewed-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Reviewed-by: Kanchan Joshi &lt;joshi.k@samsung.com&gt;
Reviewed-by: Chaitanya Kulkarni &lt;kch@nvidia.com&gt;
Reviewed-by: Daniel Wagner &lt;dwagner@suse.de&gt;
Tested-by: Alan Adamson &lt;alan.adamson@oracle.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>nvme-multipath: set queue dma alignment to 3</title>
<updated>2022-10-25T15:07:53Z</updated>
<author>
<name>Keith Busch</name>
<email>kbusch@kernel.org</email>
</author>
<published>2022-10-24T18:57:45Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=fe8714b04fb137aa62e9a69424c48b5301b721b9'/>
<id>urn:sha1:fe8714b04fb137aa62e9a69424c48b5301b721b9</id>
<content type='text'>
NVMe spec requires all transports support dword aligned addresses, which
is already set in the namespace request_queue. Set the same limit in the
multipath device's request_queue as well.

Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Reviewed-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Reviewed-by: Chaitanya Kulkarni &lt;kch@nvidia.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>nvme-tcp: fix possible circular locking when deleting a controller under memory pressure</title>
<updated>2022-10-25T15:07:50Z</updated>
<author>
<name>Sagi Grimberg</name>
<email>sagi@grimberg.me</email>
</author>
<published>2022-10-23T08:04:43Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=83e1226b0ee2d7e3fb6e002fbbfc6ab36aabdc35'/>
<id>urn:sha1:83e1226b0ee2d7e3fb6e002fbbfc6ab36aabdc35</id>
<content type='text'>
When destroying a queue, when calling sock_release, the network stack
might need to allocate an skb to send a FIN/RST. When that happens
during memory pressure, there is a need to reclaim memory, which
in turn may ask the nvme-tcp device to write out dirty pages, however
this is not possible due to a ctrl teardown that is going on.

Set PF_MEMALLOC to the task that releases the socket to grant access
to PF_MEMALLOC reserves. In addition, do the same for the nvme-tcp
thread as this may also originate from the swap itself and should
be more resilient to memory pressure situations.

This fixes the following lockdep complaint:
--
======================================================
 WARNING: possible circular locking dependency detected
 6.0.0-rc2+ #25 Tainted: G        W
 ------------------------------------------------------
 kswapd0/92 is trying to acquire lock:
 ffff888114003240 (sk_lock-AF_INET-NVME){+.+.}-{0:0}, at: tcp_sendpage+0x23/0xa0

 but task is already holding lock:
 ffffffff97e95ca0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x987/0x10d0

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -&gt; #1 (fs_reclaim){+.+.}-{0:0}:
        fs_reclaim_acquire+0x11e/0x160
        kmem_cache_alloc_node+0x44/0x530
        __alloc_skb+0x158/0x230
        tcp_send_active_reset+0x7e/0x730
        tcp_disconnect+0x1272/0x1ae0
        __tcp_close+0x707/0xd90
        tcp_close+0x26/0x80
        inet_release+0xfa/0x220
        sock_release+0x85/0x1a0
        nvme_tcp_free_queue+0x1fd/0x470 [nvme_tcp]
        nvme_do_delete_ctrl+0x130/0x13d [nvme_core]
        nvme_sysfs_delete.cold+0x8/0xd [nvme_core]
        kernfs_fop_write_iter+0x356/0x530
        vfs_write+0x4e8/0xce0
        ksys_write+0xfd/0x1d0
        do_syscall_64+0x58/0x80
        entry_SYSCALL_64_after_hwframe+0x63/0xcd

 -&gt; #0 (sk_lock-AF_INET-NVME){+.+.}-{0:0}:
        __lock_acquire+0x2a0c/0x5690
        lock_acquire+0x18e/0x4f0
        lock_sock_nested+0x37/0xc0
        tcp_sendpage+0x23/0xa0
        inet_sendpage+0xad/0x120
        kernel_sendpage+0x156/0x440
        nvme_tcp_try_send+0x48a/0x2630 [nvme_tcp]
        nvme_tcp_queue_rq+0xefb/0x17e0 [nvme_tcp]
        __blk_mq_try_issue_directly+0x452/0x660
        blk_mq_plug_issue_direct.constprop.0+0x207/0x700
        blk_mq_flush_plug_list+0x6f5/0xc70
        __blk_flush_plug+0x264/0x410
        blk_finish_plug+0x4b/0xa0
        shrink_lruvec+0x1263/0x1ea0
        shrink_node+0x736/0x1a80
        balance_pgdat+0x740/0x10d0
        kswapd+0x5f2/0xaf0
        kthread+0x256/0x2f0
        ret_from_fork+0x1f/0x30

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(fs_reclaim);
                               lock(sk_lock-AF_INET-NVME);
                               lock(fs_reclaim);
  lock(sk_lock-AF_INET-NVME);

 *** DEADLOCK ***

3 locks held by kswapd0/92:
 #0: ffffffff97e95ca0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x987/0x10d0
 #1: ffff88811f21b0b0 (q-&gt;srcu){....}-{0:0}, at: blk_mq_flush_plug_list+0x6b3/0xc70
 #2: ffff888170b11470 (&amp;queue-&gt;send_mutex){+.+.}-{3:3}, at: nvme_tcp_queue_rq+0xeb9/0x17e0 [nvme_tcp]

Fixes: 3f2304f8c6d6 ("nvme-tcp: add NVMe over TCP host driver")
Reported-by: Daniel Wagner &lt;dwagner@suse.de&gt;
Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Tested-by: Daniel Wagner &lt;dwagner@suse.de&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>nvme-tcp: replace sg_init_marker() with sg_init_table()</title>
<updated>2022-10-25T15:07:50Z</updated>
<author>
<name>Nam Cao</name>
<email>namcaov@gmail.com</email>
</author>
<published>2022-10-22T17:46:36Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=5fa9add66b00ad0c796185ff7438eaa3e67c1187'/>
<id>urn:sha1:5fa9add66b00ad0c796185ff7438eaa3e67c1187</id>
<content type='text'>
In nvme_tcp_ddgst_update(), sg_init_marker() is called with an
uninitialized scatterlist. This is probably fine, but gcc complains:

  CC [M]  drivers/nvme/host/tcp.o
In file included from ./include/linux/dma-mapping.h:10,
                 from ./include/linux/skbuff.h:31,
                 from ./include/net/net_namespace.h:43,
                 from ./include/linux/netdevice.h:38,
                 from ./include/net/sock.h:46,
                 from drivers/nvme/host/tcp.c:12:
In function ‘sg_mark_end’,
    inlined from ‘sg_init_marker’ at ./include/linux/scatterlist.h:356:2,
    inlined from ‘nvme_tcp_ddgst_update’ at drivers/nvme/host/tcp.c:390:2:
./include/linux/scatterlist.h:234:11: error: ‘sg.page_link’ is used uninitialized [-Werror=uninitialized]
  234 |         sg-&gt;page_link |= SG_END;
      |         ~~^~~~~~~~~~~
drivers/nvme/host/tcp.c: In function ‘nvme_tcp_ddgst_update’:
drivers/nvme/host/tcp.c:388:28: note: ‘sg’ declared here
  388 |         struct scatterlist sg;
      |                            ^~
cc1: all warnings being treated as errors

Use sg_init_table() instead, which basically memset the scatterlist to
zero first before calling sg_init_marker().

Signed-off-by: Nam Cao &lt;namcaov@gmail.com&gt;
Reviewed-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Reviewed-by: Chaitanya Kulkarni &lt;kch@nvidia.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>nvme-hwmon: kmalloc the NVME SMART log buffer</title>
<updated>2022-10-19T10:43:13Z</updated>
<author>
<name>Serge Semin</name>
<email>Sergey.Semin@baikalelectronics.ru</email>
</author>
<published>2022-10-18T15:33:52Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=c94b7f9bab22ac504f9153767676e659988575ad'/>
<id>urn:sha1:c94b7f9bab22ac504f9153767676e659988575ad</id>
<content type='text'>
Recent commit 52fde2c07da6 ("nvme: set dma alignment to dword") has
caused a regression on our platform.

It turned out that the nvme_get_log() method invocation caused the
nvme_hwmon_data structure instance corruption.  In particular the
nvme_hwmon_data.ctrl pointer was overwritten either with zeros or with
garbage.  After some research we discovered that the problem happened
even before the actual NVME DMA execution, but during the buffer mapping.
Since our platform is DMA-noncoherent, the mapping implied the cache-line
invalidations or write-backs depending on the DMA-direction parameter.
In case of the NVME SMART log getting the DMA was performed
from-device-to-memory, thus the cache-invalidation was activated during
the buffer mapping.  Since the log-buffer isn't cache-line aligned, the
cache-invalidation caused the neighbour data to be discarded.  The
neighbouring data turned to be the data surrounding the buffer in the
framework of the nvme_hwmon_data structure.

In order to fix that we need to make sure that the whole log-buffer is
defined within the cache-line-aligned memory region so the
cache-invalidation procedure wouldn't involve the adjacent data. One of
the option to guarantee that is to kmalloc the DMA-buffer [1]. Seeing the
rest of the NVME core driver prefer that method it has been chosen to fix
this problem too.

Note after a deeper researches we found out that the denoted commit wasn't
a root cause of the problem. It just revealed the invalidity by activating
the DMA-based NVME SMART log getting performed in the framework of the
NVME hwmon driver. The problem was here since the initial commit of the
driver.

[1] Documentation/core-api/dma-api-howto.rst

Fixes: 400b6a7b13a3 ("nvme: Add hardware monitoring support")
Signed-off-by: Serge Semin &lt;Sergey.Semin@baikalelectronics.ru&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>nvme-hwmon: consistently ignore errors from nvme_hwmon_init</title>
<updated>2022-10-19T10:42:58Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2022-10-18T14:55:55Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=6b8cf94005187952f794c0c4ed3920a1e8accfa3'/>
<id>urn:sha1:6b8cf94005187952f794c0c4ed3920a1e8accfa3</id>
<content type='text'>
An NVMe controller works perfectly fine even when the hwmon
initialization fails.  Stop returning errors that do not come from a
controller reset from nvme_hwmon_init to handle this case consistently.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Guenter Roeck &lt;linux@roeck-us.net&gt;
Reviewed-by: Serge Semin &lt;fancer.lancer@gmail.com&gt;
</content>
</entry>
<entry>
<title>nvme-apple: don't limit DMA segement size</title>
<updated>2022-10-19T08:36:39Z</updated>
<author>
<name>Russell King (Oracle)</name>
<email>rmk+kernel@armlinux.org.uk</email>
</author>
<published>2022-10-12T11:46:06Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=d622f8477a8018974f8df961440dca58224f9c6b'/>
<id>urn:sha1:d622f8477a8018974f8df961440dca58224f9c6b</id>
<content type='text'>
NVMe uses PRPs for data transfers and has no specific limit for a single
DMA segement.  Limiting the size will cause problems because the block
layer assumes PRP-ish devices using a virt boundary mask don't have a
segment limit.  And while this is true, we also really need to tell the
DMA mapping layer about it, otherwise dma-debug will trip over it.

Fixes: 5bd2927aceba ("nvme-apple: Add initial Apple SoC NVMe driver")
Suggested-by: Sven Peter &lt;sven@svenpeter.dev&gt;
Signed-off-by: Russell King (Oracle) &lt;rmk+kernel@armlinux.org.uk&gt;
[hch: rewrote the commit message based on the PCIe commit]
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Eric Curtin &lt;ecurtin@redhat.com&gt;
Reviewed-by: Sven Peter &lt;sven@svenpeter.dev&gt;
</content>
</entry>
<entry>
<title>nvme-pci: disable write zeroes on various Kingston SSD</title>
<updated>2022-10-19T08:36:39Z</updated>
<author>
<name>Xander Li</name>
<email>xander_li@kingston.com.tw</email>
</author>
<published>2022-10-11T11:06:42Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=ac9b57d4e1e3ecf0122e915bbba1bd4c90ec3031'/>
<id>urn:sha1:ac9b57d4e1e3ecf0122e915bbba1bd4c90ec3031</id>
<content type='text'>
Kingston SSDs do support NVMe Write_Zeroes cmd but take long time to
process.  The firmware version is locked by these SSDs, we can not expect
firmware improvement, so disable Write_Zeroes cmd.

Signed-off-by: Xander Li &lt;xander_li@kingston.com.tw&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>nvme: fix error pointer dereference in error handling</title>
<updated>2022-10-19T08:36:39Z</updated>
<author>
<name>Dan Carpenter</name>
<email>dan.carpenter@oracle.com</email>
</author>
<published>2022-10-15T08:25:56Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=4739824e2d7878dcea88397a6758e31e3c5c124e'/>
<id>urn:sha1:4739824e2d7878dcea88397a6758e31e3c5c124e</id>
<content type='text'>
There is typo here so it releases the wrong variable.  "ctrl-&gt;admin_q"
was intended instead of "ctrl-&gt;fabrics_q".

Fixes: fe60e8c53411 ("nvme: add common helpers to allocate and free tagsets")
Signed-off-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Reviewed-by: Chaitanya Kulkarni &lt;kch@nvidia.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>nvme-multipath: fix possible hang in live ns resize with ANA access</title>
<updated>2022-10-12T09:42:58Z</updated>
<author>
<name>Sagi Grimberg</name>
<email>sagi@grimberg.me</email>
</author>
<published>2022-09-29T07:36:47Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=72e3b8883a36e80ebfa41015c7b6926ce31ace05'/>
<id>urn:sha1:72e3b8883a36e80ebfa41015c7b6926ce31ace05</id>
<content type='text'>
When we revalidate paths as part of ns size change (as of commit
e7d65803e2bb), it is possible that during the path revalidation, the
only paths that is IO capable (i.e. optimized/non-optimized) are the
ones that ns resize was not yet informed to the host, which will cause
inflight requests to be requeued (as we have available paths but none
are IO capable). These requests on the requeue list are waiting for
someone to resubmit them at some point.

The IO capable paths will eventually notify the ns resize change to the
host, but there is nothing that will kick the requeue list to resubmit
the queued requests.

Fix this by always kicking the requeue list, and if no IO capable path
exists, these requests will be queued again.

A typical log that indicates that IOs are requeued:
--
nvme nvme1: creating 4 I/O queues.
nvme nvme1: new ctrl: "testnqn1"
nvme nvme2: creating 4 I/O queues.
nvme nvme2: mapped 4/0/0 default/read/poll queues.
nvme nvme2: new ctrl: NQN "testnqn1", addr 127.0.0.1:8009
nvme nvme1: rescanning namespaces.
nvme1n1: detected capacity change from 2097152 to 4194304
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
nvme nvme2: rescanning namespaces.
--

Reported-by: Yogev Cohen &lt;yogev@lightbitslabs.com&gt;
Fixes: e7d65803e2bb ("nvme-multipath: revalidate paths during rescan")
Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Cc: &lt;stable@vger.kernel.org&gt; # v5.15+
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
</feed>
