linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2017-04-06	scsi: libsas: allow async aborts	Christoph Hellwig	1	-3/+0
	We now first try to call ->eh_abort_handler from a work queue, but libsas was always failing that for no good reason. Allow async aborts. Reviewed-by: Johannes Thumshirn <jth@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-06	scsi: always send command aborts	Hannes Reinecke	1	-12/+0
	When a command has timed out we always should be sending an abort; with the previous code a failed abort might signal SCSI EH to start, and all other timed out commands will never be aborted, even though they might belong to a different ITL nexus. Cc: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-06	scsi: sd: Return SUCCESS in sd_eh_action() after device offline	Hannes Reinecke	1	-1/+1
	If sd_eh_action() decides to take the device offline there is no point in returning FAILED, as taking the device offline is the ultimate step in SCSI EH anyway. So further escalation via SCSI EH is not likely to make a difference and we can as well return SUCCESS. Cc: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-06	scsi: scsi_error: count medium access timeout only once per EH run	Hannes Reinecke	3	-1/+45
	The current medium access timeout counter will be increased for each command, so if there are enough failed commands we'll hit the medium access timeout for even a single device failure and the following kernel message is displayed: sd H:C:T:L: [sdXY] Medium access timeout failure. Offlining disk! Fix this by making the timeout per EH run, ie the counter will only be increased once per device and EH run. Fixes: 18a4d0a ("[SCSI] Handle disk devices which can not process medium access commands") Cc: Ewan Milne <emilne@redhat.com> Cc: Lawrence Obermann <loberman@redhat.com> Cc: Benjamin Block <bblock@linux.vnet.ibm.com> Cc: Steffen Maier <maier@linux.vnet.ibm.com> Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-06	scsi: csiostor: switch to pci_alloc_irq_vectors	Christoph Hellwig	2	-82/+47
	And get automatic MSI-X affinity for free. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Varun Prakash <varun@chelsio.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-06	scsi: ses: don't get power status of SES device slot on probe	Mauricio Faria de Oliveira	2	-2/+6
	The commit 08024885a2a3 ("ses: Add power_status to SES device slot") introduced the 'power_status' attribute to enclosure components and the associated callbacks. There are 2 callbacks available to get the power status of a device: 1) ses_get_power_status() for 'struct enclosure_component_callbacks' 2) get_component_power_status() for the sysfs device attribute (these are available for kernel-space and user-space, respectively.) However, despite both methods being available to get power status on demand, that commit also introduced a call to get power status in ses_enclosure_data_process(). This dramatically increased the total probe time for SCSI devices on larger configurations, because ses_enclosure_data_process() is called several times during the SCSI devices probe and loops over the component devices (but that is another problem, another patch). That results in a tremendous continuous hammering of SCSI Receive Diagnostics commands to the enclosure-services device, which does delay the total probe time for the SCSI devices __significantly__: Originally, ~34 minutes on a system attached to ~170 disks: [ 9214.490703] mpt3sas version 13.100.00.00 loaded ... [11256.580231] scsi 17:0:177:0: qdepth(16), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) With this patch, it decreased to ~2.5 minutes -- a 13.6x faster [ 1002.992533] mpt3sas version 13.100.00.00 loaded ... [ 1151.978831] scsi 11:0:177:0: qdepth(16), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) Back to the commit discussion.. on the ses_get_power_status() call introduced in ses_enclosure_data_process(): impact of removing it. That may possibly be in place to initialize the power status value on device probe. However, those 2 functions available to retrieve that value _do_ automatically refresh/update it. So the potential benefit would be a direct access of the 'power_status' field which does not use the callbacks... But the only reader of 'struct enclosure_component::power_status' is the get_component_power_status() callback for sysfs attribute, and it _does_ check for and call the .get_power_status callback, (which indeed is defined and implemented by that commit), so the power status value is, again, automatically updated. So, the remaining potential for a direct/non-callback access to the power_status attribute would be out-of-tree modules -- well, for those, if they are for whatever reason interested in values that are set during device probe and not up-to-date by the time they need it.. well, that would be curious. Well, to handle that more properly, set the initial power state value to '-1' (i.e., uninitialized) instead of '1' (power 'on'), and check for it in that callback which may do an direct access to the field value _if_ a callback function is not defined. Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Fixes: 08024885a2a3 ("ses: Add power_status to SES device slot") Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Song Liu <songliubraving@fb.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-06	scsi: osd_uld: Check scsi_device_get() return value	Bart Van Assche	1	-3/+5
	scsi_device_get() can fail. Hence check its return value. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Boaz Harrosh <bharrosh@panasas.com> Acked-by: Boaz Harrosh <ooo@electrozaur.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-04	scsi: sas: remove sas_domain_release_transport	Johannes Thumshirn	1	-7/+0
	sas_domain_release_transport is unused since at least v3.13, remove it. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-04	scsi: qla2xxx: Fix typo in driver	Milan P Gandhi	5	-7/+7
	Signed-off-by: Milan P Gandhi <mgandhi@redhat.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-04-04	scsi: advansys: fix uninitialized data access	Arnd Bergmann	1	-11/+10
	gcc-7.0.1 now warns about a previously unnoticed access of uninitialized struct members: drivers/scsi/advansys.c: In function 'AscMsgOutSDTR': drivers/scsi/advansys.c:3860:26: error: '((void )&sdtr_buf+5)' may be used uninitialized in this function [-Werror=maybe-uninitialized] ((ushort)s_buffer[i + 1] << 8) \| s_buffer[i]); ^ drivers/scsi/advansys.c:3860:26: error: '((void )&sdtr_buf+7)' may be used uninitialized in this function [-Werror=maybe-uninitialized] drivers/scsi/advansys.c:3860:26: error: '((void )&sdtr_buf+5)' may be used uninitialized in this function [-Werror=maybe-uninitialized] drivers/scsi/advansys.c:3860:26: error: '((void )&sdtr_buf+7)' may be used uninitialized in this function [-Werror=maybe-uninitialized] The code has existed in this exact form at least since v2.6.12, and the warning seems correct. This uses named initializers to ensure we initialize all members of the structure. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-30	scsi: be2iscsi: switch to pci_alloc_irq_vectors	Christoph Hellwig	2	-87/+42
	And get automatic MSI-X affinity for free. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29	Revert "scsi: ufs: add queries retry mechanism"	Szymon Mielczarek	1	-45/+9
	This reverts commit 61e073590b82a539654626ecae91b8fab11db3f3. The patch introduced redundant query retries as we already had such mechanism provided with _retry functions. Both ufshcd_read_desc and ufshcd_read_unit_desc_param functions call ufshcd_query_descriptor_retry wrapper. Signed-off-by: Szymon Mielczarek <szymonx.mielczarek@intel.com> Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29	scsi: hpsa: change driver version	Don Brace	1	-1/+1
	Reviewed-by: Gerry Morong <gerry.morong@microsemi.com> Reviewed-by: Scott Teel <scott.teel@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29	scsi: hpsa: update pci ids	Don Brace	1	-0/+4
	Reviewed-by: Gerry Morong <gerry.morong@microsemi.com> Reviewed-by: Scott Teel <scott.teel@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29	scsi: hisi_sas: fix SATA dependency	Arnd Bergmann	1	-1/+2
	Removing the 'select SCSI_SAS_LIBSAS' statement in Kconfig resulted in a link failure in configurations that have hisi_sas built-in but libsas as a loadable module: drivers/scsi/built-in.o: In function `hisi_sas_scan_finished': hisi_sas_main.c:(.text+0x37ce9): undefined reference to `sas_drain_work' drivers/scsi/built-in.o: In function `hisi_sas_slave_configure': hisi_sas_main.c:(.text+0x37d17): undefined reference to `sas_slave_configure' hisi_sas_main.c:(.text+0x37d40): undefined reference to `sas_change_queue_depth' drivers/scsi/built-in.o: In function `hisi_sas_remove': All other libsas users have the 'select' statement, so we should do the same here for consistency. For all I can tell, the patch that added the sata softreset does not actually introduce a dependency on SCSI_SAS_ATA but instead adds calls into libata itself, so we can express that with a more specific dependency. We cannot have 'select SCSI_SAS_LIBSAS; depends on SCSI_SAS_ATA' as that would cause a dependency loop. Fixes: 7c594f0407de ("scsi: hisi_sas: add softreset function for SATA disk") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29	scsi: ufs: just use sizeof() for snprintf()	Tomohiro Kusumi	1	-1/+1
	Not much reason to use ARRAY_SIZE() when we know it's for a C string. Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com> Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29	scsi: ufs: remove deprecated enum for hw interrupt	Tomohiro Kusumi	1	-7/+0
	These flags are no longer needed after 2fbd009b in 2013. Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com> Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29	scsi: ufs: add missing macros for register bits from UFSHCI spec	Tomohiro Kusumi	1	-0/+2
	Add macros for register bits that can be found in JESD223C (v2.1). Not all registers are defined in ufshci.h (i.e. some are unused whether macros are defined or undefined), but all the bits for those registers that are already defined should appear here. Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com> Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29	scsi: ufs: non functional macro fix	Tomohiro Kusumi	1	-4/+4
	Not having () isn't likely to do any harm in this case, but all the other macros below do have it. Also add "are" in a comment. Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com> Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29	scsi: ufs: use existing macro CONTROLLER_ENABLE to test register bit	Tomohiro Kusumi	1	-1/+2
	Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com> Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29	scsi: ufs: make ufshcd_is_{device_present, hba_active}() return bool	Tomohiro Kusumi	1	-6/+6
	ufshcd driver generally uses bool for is_xxx type things instead of int, so conform to its style. Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com> Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29	scsi: hisi_sas: add missing break in switch statement	Colin Ian King	1	-0/+1
	It appears that a break in the TRANS_TX_OPEN_CNX_ERR_NO_DESTINATION case got accidentally removed in an earlier commit, as it stands, the ts->stat and ts->open_rej_reason are being updated twice for this case which looks incorrect. Fix this by adding in the missing break statement. Detected by CoverityScan, CID#1422110 ("Missing break in switch") Fixes: 634a9585f49c7 ("scsi: hisi_sas: process error codes according to their priority") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27	scsi: be2iscsi: Update driver version	Jitendra Bhivare	1	-1/+1
	Version 11.4.0.0 Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Chris Leech <cleech@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27	scsi: be2iscsi: Update Copyright	Jitendra Bhivare	9	-75/+36
	Update Broadcom Copyright markings in all files. Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27	scsi: be2iscsi: Check size before copying ASYNC handle	Jitendra Bhivare	1	-1/+6
	Data in buffers are gathered into a single buffer before giving to iSCSI layer. Though less likely to have payload more than 8K in ASYNC PDU, the data length is provide by FW and check is missing for overrun. Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Chris Leech <cleech@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27	scsi: be2iscsi: Remove free_list for ASYNC handles	Jitendra Bhivare	2	-64/+43
	With previous patch adding ASYNC Rx buffers to free_list is not required. Remove all free_list related operations. Add in_use to track if buffer posted is being processed by driver and purge all buffers received for connection if found so. Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Chris Leech <cleech@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27	scsi: be2iscsi: Use num_cons field in Rx CQE	Jitendra Bhivare	2	-74/+51
	FW runs out of buffer if buffers are not posted back soon. ASYNC Rx CQE indicates that FW has consumed 8 RQEs. Use it to post back buffers instead of waiting for buffers to be processed and freed by driver. Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Chris Leech <cleech@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27	scsi: be2iscsi: Increase HDQ default queue size	Jitendra Bhivare	2	-16/+17
	Currently, ASYNC PDU default queue size is set to max connections. This leaves only one buffer per connection for any ASYNC PDUs from targets. Double the size of the default queue. Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Chris Leech <cleech@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27	scsi: scsi_transport_iscsi: Use flush_work in iscsi_remove_session	Jitendra Bhivare	1	-2/+1
	scsi_flush_work flushes workqueue for the Scsi_Host. In iSCSI offload enabled host, this would wait for all other sessions under the host. Use flush_work for the session being removed instead. Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Chris Leech <cleech@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27	scsi: be2iscsi: Replace spin_unlock_bh with spin_lock	Jitendra Bhivare	1	-1/+1
	spin_unlock_bh back_lock is used in beiscsi_eh_device_reset instead of spin_lock. Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Chris Leech <cleech@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27	scsi: be2iscsi: Fix closing of connection	Jitendra Bhivare	5	-160/+159
	CID needs to be freed even when invalidate or upload connection fails. Attempt to close connection 3 times before freeing CID. Set cleanup_type to INVALIDATE instead of force TCP_RST. This unnecessarily is terminating connection with reset instead of gracefully closing it. Set save_cfg to 0 - session not to be saved on flash. Add delay and process CQ before uploading connection. Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Chris Leech <cleech@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27	scsi: be2iscsi: Check tag in beiscsi_mccq_compl_wait	Jitendra Bhivare	1	-0/+6
	scsi host12: BS_1377 : mgmt_invalidate_connection Failed for cid=256 BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: [<ffffffff81332ebf>] __list_add+0xf/0xc0 PGD 0 Oops: 0000 [#1] SMP Modules linked in: ... CPU: 9 PID: 1542 Comm: iscsid Tainted: G ------------ T 3.10.0-514.el7.x86_64 #1 Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 09/12/2016 task: ffff88076f310fb0 ti: ffff88076bba8000 task.ti: ffff88076bba8000 RIP: 0010:[<ffffffff81332ebf>] [<ffffffff81332ebf>] __list_add+0xf/0xc0 RSP: 0018:ffff88076bbab8e8 EFLAGS: 00010046 RAX: 0000000000000246 RBX: ffff88076bbab990 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff880468badf58 RDI: ffff88076bbab990 RBP: ffff88076bbab900 R08: 0000000000000246 R09: 00000000000020de R10: 0000000000000000 R11: ffff88076bbab5be R12: 0000000000000000 R13: ffff880468badf58 R14: 000000000001adb0 R15: ffff88076f310fb0 FS: 00007f377124a880(0000) GS:ffff88046fa40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 0000000771318000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff88076bbab990 ffff880468badf50 0000000000000001 ffff88076bbab938 ffffffff810b128b 0000000000000246 00000000cf9b7040 ffff880468bac7a0 0000000000000000 ffff880468bac7a0 ffff88076bbab9d0 ffffffffa05a6ea3 Call Trace: [<ffffffff810b128b>] prepare_to_wait+0x7b/0x90 [<ffffffffa05a6ea3>] beiscsi_mccq_compl_wait+0x153/0x330 [be2iscsi] [<ffffffff810b1600>] ? wake_up_atomic_t+0x30/0x30 [<ffffffffa05981b1>] beiscsi_ep_disconnect+0x91/0x2d0 [be2iscsi] [<ffffffffa0202ffa>] iscsi_if_ep_disconnect.isra.14+0x5a/0x70 [scsi_transport_iscsi] [<ffffffffa02042fb>] iscsi_if_recv_msg+0x113b/0x14a0 [scsi_transport_iscsi] [<ffffffff811dffd8>] ? __kmalloc_node_track_caller+0x58/0x290 [<ffffffffa02046ee>] iscsi_if_rx+0x8e/0x1f0 [scsi_transport_iscsi] [<ffffffff815a351d>] netlink_unicast+0xed/0x1b0 [<ffffffff815a38fe>] netlink_sendmsg+0x31e/0x690 [<ffffffff815a03e4>] ? netlink_rcv_wake+0x44/0x60 [<ffffffff815a19e3>] ? netlink_recvmsg+0x1e3/0x450 beiscsi_mccq_compl_wait gets called even when MCC tag allocation failed for mgmt_invalidate_connection. mcc_wait is not initialized for tag 0 so causes crash in prepare_to_wait. Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Chris Leech <cleech@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27	scsi: ufs: fix wrong/ambiguous fall through comments	Tomohiro Kusumi	1	-2/+1
	These aren't really falling through to anywhere meaningful. Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com> Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27	scsi: osd_uld: remove an unneeded NULL check	Dan Carpenter	1	-4/+3
	We don't call the remove() function unless probe() succeeds so "oud" can't be NULL here. Plus, if it were NULL, we dereference it on the next line so it would crash anyway. [mkp: applied by hand] Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Boaz Harrosh <ooo@electrozaur.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23	scsi: ipr: Driver version 2.6.4	Brian King	1	-2/+2
	Bump driver version Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com> Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23	scsi: ipr: Fix SATA EH hang	Brian King	1	-21/+41
	This patch fixes a hang that can occur in ATA EH with ipr. With ipr's usage of libata, commands should never end up on ap->eh_done_q. The timeout function we use for ipr, even for SATA devices, is scsi_times_out, so ATA_QCFLAG_EH_SCHEDULED never gets set for ipr and EH is driven completely by ipr and SCSI. The SCSI EH thread ends up calling ipr's eh_device_reset_handler, which then calls ata_std_error_handler. This ends up calling ipr_sata_reset, which issues a reset to the device. This should result in all pending commands getting failed back and having ata_qc_complete called for them, which should end up clearing ATA_QCFLAG_FAILED as qc->flags gets zeroed in ata_qc_free. This ensures that when we end up in ata_eh_finish, we don't do anything more with the command. On adapters that only support a single interrupt and when running with two MSI-X vectors or less, the adapter firmware guarantees that responses to all outstanding commands are sent back prior to sending the response to the SATA reset command. On newer adapters supporting multiple HRRQs, however, this can no longer be guaranteed, since the command responses and reset response may be processed on different HRRQs. If ipr returns from ipr_sata_reset before the outstanding command was returned, this sends us down the path of __ata_eh_qc_complete which then moves the associated scsi_cmd from the work_q in scsi_eh_bus_device_reset to ap->eh_done_q, which then will sit there forever and we will be wedged. This patch fixes this up by ensuring that any outstanding commands are flushed before returning from eh_device_reset_handler for a SATA device. Reported-by: David Jeffery <djeffery@redhat.com> Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com> Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23	scsi: ipr: Error path locking fixes	Brian King	1	-13/+93
	This patch closes up some potential race conditions observed in the error handling paths in ipr while debugging an issue resulting in a hang with SATA error handling. These patches ensure we are holding the correct lock when adding and removing commands from the free and pending queues in some error scenarios. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com> Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23	scsi: ipr: Fix abort path race condition	Brian King	1	-17/+47
	This fixes a race condition in the error handlomg paths of ipr. While a command is outstanding to the adapter, it is placed on a pending queue for the hrrq it is associated with, while holding the HRRQ lock. When a command is completed, it is removed from the pending queue, under HRRQ lock, and placed on a local list. This list is then iterated through without any locks and each command's done function is invoked, inside of which, the command gets returned to the free list while grabbing the HRRQ lock. This fixes two race conditions when commands have been removed from the pending list but have not yet been added to the free list. Both of these changes fix race conditions that could result in returning success from eh_abort_handler and then later calling scsi_done for the same request. The first race condition is in ipr_cancel_op. It looks through each pending queue to see if the command to be aborted is still outstanding or not. Rather than looking on the pending queue, reverse the logic to check to look for commands that are NOT on the free queue. The second race condition can occur when in ipr_wait_for_ops where we are waiting for responses for commands we've aborted. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com> Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23	scsi: ipr: Remove redundant initialization	Brian King	1	-6/+5
	Removes some code in __ipr_eh_dev_reset which was modifying the ipr_cmd done function. This should have already been setup at command allocation time and if its since been changed, it means we are in the ipr_erp* functions and need to wait for them to complete and don't want to override that here. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com> Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23	scsi: ipr: Fix missed EH wakeup	Brian King	1	-4/+12
	Following a command abort or device reset, ipr's EH handlers wait for the commands getting aborted to get sent back from the adapter prior to returning from the EH handler. This fixes up some cases where the completion handler was not getting called, which would have resulted in the EH thread waiting until it timed out, greatly extending EH time. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com> Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23	scsi: hisi_sas: add is_sata_phy_v2_hw()	Xiaofei Tan	1	-4/+13
	Add helper function is_sata_phy_v2_hw() to judge whether the attached device is SATA disk for a root PHY. Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23	scsi: hisi_sas: use dev_is_sata to identify SATA or SAS disk	Xiang Chen	1	-1/+1
	When SMP IO is sent, sas_protocol_ata couldn't judge whether the disk is SATA or SAS disk. So use dev_is_sata to identify SATA or SAS disk. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23	scsi: hisi_sas: check hisi_sas_lu_reset() error message	John Garry	1	-2/+3
	Unless we actually get some sort of failure in hisi_sas_lu_reset(), don't print a message. Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23	scsi: hisi_sas: release SMP slot in lldd_abort_task	Xiang Chen	1	-2/+7
	When an SMP task timeouts, it will call lldd_abort_task to release the associated slot, and then will release the sas_task. Currently in lldd_abort_task, if we fail to internally abort IO, then the slot of SMP IO is not released, but sas_task will still be later released, so the slot's sas_task is NULL, which will cause NULL pointer when hisi_sas_slot_task_free happens later. To resolve, check the return value of internal abort, and release the slot if it failed. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23	scsi: hisi_sas: add hisi_sas_clear_nexus_ha()	John Garry	1	-0/+8
	Add function for upper-layer to reset controller when all else fails. Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23	scsi: hisi_sas: rename hisi_sas_link_timeout_{enable, disable}_link	John Garry	1	-6/+6
	For consistency, remove the "hisi_sas_" prefix. Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23	scsi: hisi_sas: handle PHY UP+DOWN simultaneous irq	Xiaofei Tan	1	-17/+39
	Handle the situation that PHY UP and DOWN irq happen simultaneously. There is no mechanism of SoC HW to ensure this situation will never happen. So, we add this handle just in case. Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23	scsi: hisi_sas: some modifications to v2 hw reg init values	John Garry	1	-6/+6
	This patch includes: (1) Disable transport layer retry (2) Support CQ time and count interrupt coal (3) fix link FIFO full issue Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Zhao Nenglong <zhaonenglong@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23	scsi: hisi_sas: process error codes according to their priority	Xiang Chen	1	-155/+398
	There are some rules to decide which error code has the high priority when errors happen together: (1) Error phase of CQ decides the error happens on RX or TX; (2) For TX error, when DMA/TRANS TX error happen simultaneously, the priority of DMA TX error is higher than TRANS TX error, so for the priority of TX error: DW2 (DMA TX part) > DW0; (3) For RX error, when TRANS/DMA/SIPC RX error happen simultaneously, the priority of TRANS RX error is higher than DMA and SIPC RX error, and we should also keep the rules (the priority of DW3 > DW2), so for the priority of RX error: DW1 > DW3 > DW2(SIPC RX part); (4) There are also a priority we should keep in the same error type. So, modify slot error code to handle this. In addition to this, some some error codes are modified according to recommendation from SoC designer. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23	scsi: hisi_sas: remove task free'ing for timeouts	John Garry	1	-14/+1
	When a TMF or internal abort times-out, do not free slot. We expect this to be done upon later escalated error handling. Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>