scsi: lpfc: Fix crash when a fabric node is released prematurely

The driver's management of the fabric controller (aka pseudo-scsi initiator) node in SLI3 mode is causing this crash. The crash occurs because of a node reference imbalance that frees the fabric controller node while devloss is outstanding from the SCSI transport. This is triggered by an odd behavior where the switch reacts to a rejected RDP request with a PLOGI and nothing else, not even a LOGO. The driver ACKS the PLOGI and after successfully registering the RPI, incorrectly registers the fabric controller node because it has the NLP_FC4_FCP flag still set from the fabric controller PRLI. If a LIP is issued, the driver attempts to cleanup on Link Up and ends up executing too many puts. Fix by detecting the fabric node type and clearing out the nodes internal flags that triggered a SCSI transport registration and subsequence dev_loss event. The driver cannot count on any persistence from fabric controller nodes. Link: https://lore.kernel.org/r/20210104180240.46824-5-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
author: James Smart <jsmart2021@gmail.com> 2021-01-04 10:02:29 -0800
committer: Martin K. Petersen <martin.petersen@oracle.com> 2021-01-07 23:02:35 -0500
commit: 07aaefdf75c50b55e1f1e1c904fa6d00466e0a75 (patch)
tree: 2e9319d64373ce5d132384e8ffade776f3ae1aa7 /drivers/scsi/lpfc/lpfc_hbadisc.c
parent: scsi: lpfc: Refresh ndlp when a new PRLI is received in the PRLI issue state (diff)
download: wireguard-linux-07aaefdf75c50b55e1f1e1c904fa6d00466e0a75.tar.xz
wireguard-linux-07aaefdf75c50b55e1f1e1c904fa6d00466e0a75.zip
1 files changed, 13 insertions, 5 deletions
diff --git a/drivers/scsi/lpfc/lpfc_hbadisc.c b/drivers/scsi/lpfc/lpfc_hbadisc.c
index 2b6b5fc671fe..bcb5bf7e19dc 100644
--- a/drivers/scsi/lpfc/lpfc_hbadisc.c
+++ b/drivers/scsi/lpfc/lpfc_hbadisc.c
@@ -73,6 +73,16 @@ static void lpfc_unregister_fcfi_cmpl(struct lpfc_hba *, LPFC_MBOXQ_t *);
 static int lpfc_fcf_inuse(struct lpfc_hba *);
 static void lpfc_mbx_cmpl_read_sparam(struct lpfc_hba *, LPFC_MBOXQ_t *);
 
+static int
+lpfc_valid_xpt_node(struct lpfc_nodelist *ndlp)
+{
+	if (ndlp->nlp_fc4_type ||
+	    ndlp->nlp_DID == Fabric_DID ||
+	    ndlp->nlp_DID == NameServer_DID ||
+	    ndlp->nlp_DID == FDMI_DID)
+		return 1;
+	return 0;
+}
 /* The source of a terminate rport I/O is either a dev_loss_tmo
  * event or a call to fc_remove_host.  While the rport should be
  * valid during these downcalls, the transport can call twice
@@ -4318,7 +4328,8 @@ lpfc_nlp_state_cleanup(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp,
 	/* FCP and NVME Transport interface */
 	if ((old_state == NLP_STE_MAPPED_NODE ||
 	     old_state == NLP_STE_UNMAPPED_NODE)) {
-		if (ndlp->rport) {
+		if (ndlp->rport &&
+		    lpfc_valid_xpt_node(ndlp)) {
 			vport->phba->nport_event_cnt++;
 			lpfc_unregister_remote_port(ndlp);
 		}
@@ -4340,10 +4351,7 @@ lpfc_nlp_state_cleanup(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp,
 
 	if (new_state ==  NLP_STE_MAPPED_NODE ||
 	    new_state == NLP_STE_UNMAPPED_NODE) {
-		if (ndlp->nlp_fc4_type ||
-		    ndlp->nlp_DID == Fabric_DID ||
-		    ndlp->nlp_DID == NameServer_DID ||
-		    ndlp->nlp_DID == FDMI_DID) {
+		if (lpfc_valid_xpt_node(ndlp)) {
 			vport->phba->nport_event_cnt++;
 			/*
 			 * Tell the fc transport about the port, if we haven't
author	James Smart <jsmart2021@gmail.com>	2021-01-04 10:02:29 -0800
committer	Martin K. Petersen <martin.petersen@oracle.com>	2021-01-07 23:02:35 -0500
commit	07aaefdf75c50b55e1f1e1c904fa6d00466e0a75 (patch)
tree	2e9319d64373ce5d132384e8ffade776f3ae1aa7 /drivers/scsi/lpfc/lpfc_hbadisc.c
parent	scsi: lpfc: Refresh ndlp when a new PRLI is received in the PRLI issue state (diff)
download	wireguard-linux-07aaefdf75c50b55e1f1e1c904fa6d00466e0a75.tar.xz wireguard-linux-07aaefdf75c50b55e1f1e1c904fa6d00466e0a75.zip