2019-10-18  RDMA/iwcm: move iw_rem_ref() calls out of spinlock  (Krishnamraju Eraparaju, 1 file, -23/+29)

kref release routines usually perform memory release operations, so they should not be called with spinlocks held. One such case is the SIW kref release routine siw_free_qp(), which can sleep via vfree() while freeing queue memory. Hence, all iw_rem_ref() calls in IWCM are moved out of spinlocks.

Fixes: 922a8e9fb2e0 ("RDMA: iWARP Connection Manager.")
Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20191007102627.12568-1-krishna2@chelsio.com
Signed-off-by: Doug Ledford <dledford@redhat.com>

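A minimal sketch of the pattern this fix applies (hypothetical names, not the IWCM code verbatim): do the bookkeeping under the lock, but drop the reference only after unlocking, since the kref release routine may sleep.

    spin_lock_irqsave(&cm_id_priv->lock, flags);
    /* detach the work item under the lock ... */
    list_del(&work->list);
    spin_unlock_irqrestore(&cm_id_priv->lock, flags);

    /* ... but drop the reference unlocked: the kref release may end in
     * siw_free_qp() -> vfree(), which can sleep */
    iw_rem_ref(cm_id);
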
2019-10-18  iw_cxgb4: fix ECN check on the passive accept  (Potnuri Bharat Teja, 1 file, -14/+14)

pass_accept_req() is using the same skb for handling the accept request and sending the accept reply to HW. Here the req and rpl structures point to the same skb->data, which is overwritten by INIT_TP_WR() and leads to accessing corrupt req fields in accept_cr() while checking for ECN flags. Reorder the code in accept_cr() to fetch the correct req fields.

Fixes: 92e7ae7172 ("iw_cxgb4: Choose appropriate hw mtu index and ISS for iWARP connections")
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Link: https://lore.kernel.org/r/20191003104353.11590-1-bharat@chelsio.com
Signed-off-by: Doug Ledford <dledford@redhat.com>

2019-10-17  IB/hfi1: Use a common pad buffer for 9B and 16B packets  (Mike Marciniszyn, 2 files, -8/+7)

There is no reason for a different pad buffer for the two packet types. Expand the current buffer allocation to allow for both packet types.

Fixes: f8195f3b14a0 ("IB/hfi1: Eliminate allocation while atomic")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Link: https://lore.kernel.org/r/20191004204934.26838.13099.stgit@awfm-01.aw.intel.com
Signed-off-by: Doug Ledford <dledford@redhat.com>

2019-10-17  IB/hfi1: Avoid excessive retry for TID RDMA READ request  (Kaike Wan, 1 file, -5/+0)

A TID RDMA READ request could be retried under one of the following conditions:

- the RC retry timer expires;
- a later TID RDMA READ RESP packet is received before the next expected one.

For the latter, under normal conditions, the PSN in IB space is used for comparison. More specifically, the IB PSN in the incoming TID RDMA READ RESP packet is compared with the last IB PSN of a given TID RDMA READ request to determine if the request should be retried. This is similar to the retry logic for a normal RDMA READ request.

However, if a TID RDMA READ RESP packet is lost due to congestion, header suppression will be disabled and each incoming packet will raise an interrupt until the hardware flow is reloaded. Under this condition, each packet's KDETH PSN is checked by software against r_next_psn, and a retry is requested if the packet KDETH PSN is later than r_next_psn. Since each TID RDMA READ segment could have up to 64 packets and each TID RDMA READ request could have many segments, far more retries could be issued under such conditions, leading to RETRY_EXC_ERR status.

This patch fixes the issue by removing the retry when the incoming packet KDETH PSN is later than r_next_psn. Instead, it resorts to the RC timer and normal IB PSN comparison for any request retry.

Fixes: 9905bf06e890 ("IB/hfi1: Add functions to receive TID RDMA READ response")
Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Link: https://lore.kernel.org/r/20191004204035.26542.41684.stgit@awfm-01.aw.intel.com
Signed-off-by: Doug Ledford <dledford@redhat.com>

2019-10-17  RDMA/mlx5: Clear old rate limit when closing QP  (Rafi Wiener, 1 file, -3/+5)

Before a QP is closed it is moved to the ERROR state; when this happened, the QP was left with an old rate limit that had already been removed from the table.

Fixes: 7d29f349a4b9 ("IB/mlx5: Properly adjust rate limit on QP state transitions")
Signed-off-by: Rafi Wiener <rafiw@mellanox.com>
Signed-off-by: Oleg Kuporosov <olegk@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20191002120243.16971-1-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>

2019-10-04  RDMA/mlx5: Add missing synchronize_srcu() for MW cases  (Jason Gunthorpe, 4 files, -55/+33)

While an MR uses live as the SRCU 'update', the MW case uses the xarray directly; xa_erase() causes the MW to become inaccessible to the pagefault thread. Thus, whenever an MW is removed from the xarray, we must synchronize_srcu() before freeing it.

This must be done before freeing the mkey, as re-use of the mkey while the pagefault thread is using the stale mkey is undesirable.

Add the missing synchronizes to the MW and DEVX indirect mkey paths, and delete the bogus protection against double destroy in mlx5_core_destroy_mkey().

Fixes: 534fd7aac56a ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP")
Fixes: 6aec21f6a832 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191001153821.23621-7-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

2019-10-04  RDMA/mlx5: Put live in the correct place for ODP MRs  (Jason Gunthorpe, 3 files, -38/+14)

live is used to signal to the pagefault thread that the MR is initialized and ready for use. It should be set after the umem is assigned and all other setup is completed. This prevents races (at least) of the form:

    CPU0                                     CPU1
    mlx5_ib_alloc_implicit_mr()
      implicit_mr_alloc()
        live = 1
    imr->umem = umem
                                             num_pending_prefetch_inc()
                                               if (live)
                                                 atomic_inc(num_pending_prefetch)
    atomic_set(num_pending_prefetch, 0)      // Overwrites other thread's store

Further, live is being used with SRCU as the 'update' in an acquire/release fashion, so it cannot be read and written raw.

Move all live = 1's to after MR initialization is completed and use smp_store_release()/smp_load_acquire() for manipulating it.

Add a missing live = 0 when an implicit MR child is deleted, before queuing work to do synchronize_srcu().

The barriers in update_odp_mr() were a broken attempt to create an acquire/release, but were not even applied consistently and missed the point; delete them as well.

Fixes: 6aec21f6a832 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191001153821.23621-6-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

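A condensed sketch of the publish/consume pairing the fix establishes (field names simplified; not the mlx5 code verbatim):

    /* writer: make 'live' visible only after the MR is fully set up */
    mr->umem = umem;                  /* ... and all other initialization ... */
    smp_store_release(&mr->live, 1);

    /* reader (pagefault/prefetch path): pairs with the store-release above,
     * so a non-zero 'live' guarantees the initialization is visible */
    if (smp_load_acquire(&mr->live))
            atomic_inc(&mr->num_pending_prefetch);
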
2019-10-04  RDMA/mlx5: Order num_pending_prefetch properly with synchronize_srcu  (Jason Gunthorpe, 1 file, -2/+3)

During destroy, setting live = 0 and then calling synchronize_srcu() prevents num_pending_prefetch from incrementing, and also ensures that all work holding that count is queued on the WQ. Testing the counter before the synchronize causes races of the form:

    CPU0                                     CPU1
    dereg_mr()
                                             mlx5_ib_advise_mr_prefetch()
                                               srcu_read_lock()
                                               num_pending_prefetch_inc()
                                                 if (!live)
    live = 0
    atomic_read() == 0
    // skip flush_workqueue()
                                                 atomic_inc()
                                                 queue_work();
                                               srcu_read_unlock()
    WARN_ON(atomic_read())                   // Fails

Swap the order so that the synchronize_srcu() prevents this.

Fixes: a6bc3875f176 ("IB/mlx5: Protect against prefetch of invalid MR")
Link: https://lore.kernel.org/r/20191001153821.23621-5-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

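The resulting destroy-side ordering, sketched with simplified names (an assumption about the shape, not the exact mlx5 code; prefetch_wq is a hypothetical workqueue name):

    smp_store_release(&mr->live, 0);    /* no new prefetch work after this */
    synchronize_srcu(&dev->mr_srcu);    /* readers that saw live == 1 have
                                         * finished and queued their work */
    if (atomic_read(&mr->num_pending_prefetch))
            flush_workqueue(prefetch_wq);   /* all counted work is queued by now */
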
2019-10-04  RDMA/odp: Lift umem_mutex out of ib_umem_odp_unmap_dma_pages()  (Jason Gunthorpe, 2 files, -6/+12)

This fixes a race of the form:

    CPU0                                     CPU1
    mlx5_ib_invalidate_range()               mlx5_ib_invalidate_range()
    // This one actually makes npages == 0
    ib_umem_odp_unmap_dma_pages()
    if (npages == 0 && !dying)
                                             // This one does nothing
                                             ib_umem_odp_unmap_dma_pages()
                                             if (npages == 0 && !dying)
      dying = 1;
                                               dying = 1;
      schedule_work(&umem_odp->work);
                                             // Double schedule of the same work
                                             schedule_work(&umem_odp->work);  // BOOM

npages and dying must be read and written under the umem_mutex lock.

Since whenever ib_umem_odp_unmap_dma_pages() is called mlx5 must also call mlx5_ib_update_xlt, and both need to be done in the same locking region, hoist the lock out of unmap.

This avoids an expensive double critical section in mlx5_ib_invalidate_range().

Fixes: 81713d3788d2 ("IB/mlx5: Add implicit MR support")
Link: https://lore.kernel.org/r/20191001153821.23621-4-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

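Sketched shape of the hoisted locking (simplified; the helper and field names follow the message above):

    mutex_lock(&umem_odp->umem_mutex);
    ib_umem_odp_unmap_dma_pages(umem_odp, start, end);  /* caller holds the lock now */
    /* ... mlx5_ib_update_xlt() runs in the same critical section ... */
    if (!umem_odp->npages && !umem_odp->dying) {
            umem_odp->dying = 1;
            schedule_work(&umem_odp->work);             /* scheduled exactly once */
    }
    mutex_unlock(&umem_odp->umem_mutex);
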
2019-10-04  RDMA/mlx5: Fix a race with mlx5_ib_update_xlt on an implicit MR  (Jason Gunthorpe, 1 file, -2/+32)

mlx5_ib_update_xlt() must be protected against parallel free of the MR it is accessing, and it must be called single-threaded while updating the HW. Otherwise we can have races of the form:

    CPU0                                     CPU1
    mlx5_ib_update_xlt()
      mlx5_odp_populate_klm()
        odp_lookup() == NULL
        pklm = ZAP
                                             implicit_mr_get_data()
                                               implicit_mr_alloc()
                                                 <update interval tree>
                                               mlx5_ib_update_xlt()
                                                 mlx5_odp_populate_klm()
                                                   odp_lookup() != NULL
                                                   pklm = VALID
                                               mlx5_ib_post_send_wait()
    mlx5_ib_post_send_wait()
    // Replaces VALID with ZAP

This can be solved by putting both the SRCU and the umem_mutex lock around every call to mlx5_ib_update_xlt(). This ensures that the content of the interval tree relevant to mlx5_odp_populate_klm() (i.e. mr->parent == mr) will not change while it is running, and thus the posted WRs to update the KLM will always reflect the correct information.

The race above resolves either by having CPU1 wait until CPU0 completes the ZAP, or by having CPU0 run after the add and store VALID instead.

The pagefault path adding children already holds the umem_mutex and SRCU, so the only missed lock is during MR destruction.

Fixes: 81713d3788d2 ("IB/mlx5: Add implicit MR support")
Link: https://lore.kernel.org/r/20191001153821.23621-3-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

2019-10-04  RDMA/mlx5: Do not allow rereg of an ODP MR  (Jason Gunthorpe, 1 file, -3/+3)

This code is completely broken; the umem of an ODP MR simply cannot be discarded without a lot more locking, nor can an ODP mkey be blithely destroyed via destroy_mkey().

Fixes: 6aec21f6a832 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191001153821.23621-2-jgg@ziepe.ca
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

2019-10-04  IB/core: Fix wrong iterating on ports  (Mohamad Heib, 1 file, -1/+1)

rdma_for_each_port() already increments the iterator it receives; therefore, after the first iteration the iterator was being increased by 2, eventually causing wrong queries and possible traces.

Fix the above by removing the old, redundant increment that was used before the rdma_for_each_port() macro was introduced.

Cc: <stable@vger.kernel.org>
Fixes: ea1075edcbab ("RDMA: Add and use rdma_for_each_port")
Link: https://lore.kernel.org/r/20191002122127.17571-1-leon@kernel.org
Signed-off-by: Mohamad Heib <mohamadh@mellanox.com>
Reviewed-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

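Illustration of the fixed loop shape (hypothetical body):

    u32 port;

    rdma_for_each_port(device, port) {
            /* query/handle the port here; no manual 'port++' is needed,
             * the macro already advances the iterator once per pass */
    }
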
2019-10-04  RDMA/nldev: Reshuffle the code to avoid need to rebind QP in error path  (Leon Romanovsky, 1 file, -6/+4)

Properly unwind QP counter rebinding in case of failure. Trying to rebind the counter after unbinding it is not going to work reliably, so move the unbind to the end so it doesn't have to be unwound.

Fixes: b389327df905 ("RDMA/nldev: Allow counter manual mode configration through RDMA netlink")
Link: https://lore.kernel.org/r/20191002115627.16740-1-leon@kernel.org
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

2019-10-04  RDMA/cxgb4: Do not dma memory off of the stack  (Greg KH, 1 file, -11/+17)

Nicolas pointed out that the cxgb4 driver is doing DMA off of the stack, which is generally considered a very bad thing. On some architectures it could be a security problem, but odds are none of them actually run this driver, so it's just a "normal" bug.

Resolve this by allocating the memory for the message off of the heap instead of the stack. kmalloc() will always give us a proper memory location that DMA will work correctly from.

Link: https://lore.kernel.org/r/20191001165611.GA3542072@kroah.com
Reported-by: Nicolas Waisman <nico@semmle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tested-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

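The general pattern (hypothetical message type; not the cxgb4 code verbatim):

    struct fw_msg *msg;

    /* stack memory is not guaranteed to be DMA-able; kmalloc() memory is */
    msg = kmalloc(sizeof(*msg), GFP_KERNEL);
    if (!msg)
            return -ENOMEM;
    /* ... fill *msg and hand it to the hardware for DMA ... */
    kfree(msg);
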
2019-10-04  RDMA/cm: Fix memory leak in cm_add/remove_one  (Jack Morgenstein, 1 file, -0/+3)

In the process of moving the debug counters sysfs entries, the commit mentioned below eliminated the cm_infiniband sysfs directory.

This sysfs directory was tied to the cm_port object allocated in procedure cm_add_one(). Before the commit below, this cm_port object was freed via a call to kobject_put(port->kobj) in procedure cm_remove_port_fs().

Since port no longer uses its kobj, kobject_put(port->kobj) was eliminated. This, however, meant that kfree was never called for the cm_port buffers.

Fix this by adding explicit kfree(port) calls to functions cm_add_one() and cm_remove_one().

Note: the kfree call in the first chunk below (in the cm_add_one error flow) fixes an old, undetected memory leak.

Fixes: c87e65cfb97c ("RDMA/cm: Move debug counters to be under relevant IB device")
Link: https://lore.kernel.org/r/20190916071154.20383-2-leon@kernel.org
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

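Shape of the fix (a sketch; cm_setup_port_fs() is a hypothetical stand-in for the real setup helper):

    /* cm_add_one() error flow */
    port = kzalloc(sizeof(*port), GFP_KERNEL);
    if (!port)
            goto error;
    if (cm_setup_port_fs(port)) {
            kfree(port);    /* this kfree also fixes the old, undetected leak */
            goto error;
    }

    /* cm_remove_one() */
    kfree(port);            /* formerly dropped via kobject_put(port->kobj) */
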
2019-10-04  RDMA/core: Fix an error handling path in 'res_get_common_doit()'  (Christophe JAILLET, 1 file, -1/+1)

According to surrounding error paths, it is likely that 'goto err_get;' is expected here. Otherwise, a call to 'rdma_restrack_put(res);' would be missing.

Fixes: c5dfe0ea6ffa ("RDMA/nldev: Add resource tracker doit callback")
Link: https://lore.kernel.org/r/20190818091044.8845-1-christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

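The shape of the one-line fix (a sketch; fill_func stands in for the actual callback):

    ret = fill_func(msg, has_cap_net_admin, res, port);
    if (ret)
            goto err_get;   /* was a path that skipped rdma_restrack_put(res) */
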
2019-10-04  RDMA/i40iw: Associate ibdev to netdev before IB device registration  (Shiraz, Saleem, 1 file, -0/+4)

i40iw IB device registration fails with ENODEV:

    ib_register_device
      setup_device/setup_port_data
        i40iw_port_immutable
          ib_query_port
            iw_query_port
              ib_device_get_netdev (ENODEV)

ib_device_get_netdev() does not have a netdev associated with the ibdev and thus fails. Use ib_device_set_netdev() to associate the netdev to the ibdev in i40iw before IB device registration.

Fixes: 4929116bdf72 ("RDMA/core: Add common iWARP query port")
Link: https://lore.kernel.org/r/20190925164524.856-1-shiraz.saleem@intel.com
Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com>
Reviewed-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

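A sketch of the ordering (5.4-era core API calls; the i40iw variable names are assumptions):

    ret = ib_device_set_netdev(&iwibdev->ibdev, netdev, 1);
    if (ret)
            return ret;

    ret = ib_register_device(&iwibdev->ibdev, "i40iw%d");
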
2019-10-01  RDMA/iwcm: Fix a lock inversion issue  (Bart Van Assche, 1 file, -1/+2)

This patch fixes the lock inversion complaint:

    ============================================
    WARNING: possible recursive locking detected
    5.3.0-rc7-dbg+ #1 Not tainted
    --------------------------------------------
    kworker/u16:6/171 is trying to acquire lock:
    00000000035c6e6c (&id_priv->handler_mutex){+.+.}, at: rdma_destroy_id+0x78/0x4a0 [rdma_cm]

    but task is already holding lock:
    00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm]

    other info that might help us debug this:
     Possible unsafe locking scenario:

           CPU0
           ----
      lock(&id_priv->handler_mutex);
      lock(&id_priv->handler_mutex);

     *** DEADLOCK ***

     May be due to missing lock nesting notation

    3 locks held by kworker/u16:6/171:
     #0: 00000000e2eaa773 ((wq_completion)iw_cm_wq){+.+.}, at: process_one_work+0x472/0xac0
     #1: 000000001efd357b ((work_completion)(&work->work)#3){+.+.}, at: process_one_work+0x476/0xac0
     #2: 00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm]

    stack backtrace:
    CPU: 3 PID: 171 Comm: kworker/u16:6 Not tainted 5.3.0-rc7-dbg+ #1
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    Workqueue: iw_cm_wq cm_work_handler [iw_cm]
    Call Trace:
     dump_stack+0x8a/0xd6
     __lock_acquire.cold+0xe1/0x24d
     lock_acquire+0x106/0x240
     __mutex_lock+0x12e/0xcb0
     mutex_lock_nested+0x1f/0x30
     rdma_destroy_id+0x78/0x4a0 [rdma_cm]
     iw_conn_req_handler+0x5c9/0x680 [rdma_cm]
     cm_work_handler+0xe62/0x1100 [iw_cm]
     process_one_work+0x56d/0xac0
     worker_thread+0x7a/0x5d0
     kthread+0x1bc/0x210
     ret_from_fork+0x24/0x30

This is not a bug as there are actually two lock classes here.

Link: https://lore.kernel.org/r/20190930231707.48259-3-bvanassche@acm.org
Fixes: de910bd92137 ("RDMA/cma: Simplify locking needed for serialization of callbacks")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

2019-10-01  RDMA/iw_cxgb4: fix SRQ access from dump_qp()  (Potnuri Bharat Teja, 2 files, -11/+6)

dump_qp() wrongly tries to dump SRQ structures as a QP when an SRQ is used by the application. This patch matches the QPID before dumping the structures. It also removes the unwanted addition of SRQ ids to the QP id xarray.

Fixes: 2f43129127e6 ("cxgb4: Convert qpidr to XArray")
Link: https://lore.kernel.org/r/20190930074119.20046-1-bharat@chelsio.com
Signed-off-by: Rahul Kundu <rahul.kundu@chelsio.com>
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

2019-10-01  RDMA/hfi1: Prevent memory leak in sdma_init  (Navid Emamdoost, 1 file, -1/+4)

In sdma_init(), if rhashtable_init() fails, the memory allocated for tmp_sdma_rht should be released.

Fixes: 5a52a7acf7e2 ("IB/hfi1: NULL pointer dereference when freeing rhashtable")
Link: https://lore.kernel.org/r/20190925144543.10141-1-navid.emamdoost@gmail.com
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

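The error-path shape described above (condensed sketch):

    tmp_sdma_rht = kzalloc(sizeof(*tmp_sdma_rht), GFP_KERNEL);
    if (!tmp_sdma_rht)
            return -ENOMEM;

    ret = rhashtable_init(tmp_sdma_rht, &sdma_rht_params);
    if (ret < 0) {
            kfree(tmp_sdma_rht);    /* was leaked before this fix */
            return ret;
    }
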
2019-10-01  RDMA/core: Fix use after free and refcnt leak on ndev in_device in iwarp_query_port  (Michal Kalderon, 1 file, -4/+5)

If an iWARP driver is probed and removed while there are no IPs set for the device, it will lead to a reference count leak on the inet device of the netdevice. In addition, the netdevice was accessed after netdev_put had already been called, which could lead to using the netdev after it was already freed.

Fixes: 4929116bdf72 ("RDMA/core: Add common iWARP query port")
Link: https://lore.kernel.org/r/20190925123332.10746-1-michal.kalderon@marvell.com
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

2019-10-01  RDMA/siw: Fix serialization issue in write_space()  (Krishnamraju Eraparaju, 1 file, -4/+11)

In siw_qp_llp_write_space(), 'sock' members should be accessed with sk_callback_lock held; otherwise, it could race with siw_sk_restore_upcalls(), and this could cause a NULL-dereference panic. The panic below is due to the NULL cep returned from sk_to_cep(sk):

    Call Trace:
     <IRQ>
     siw_qp_llp_write_space+0x11/0x40 [siw]
     tcp_check_space+0x4c/0xf0
     tcp_rcv_established+0x52b/0x630
     tcp_v4_do_rcv+0xf4/0x1e0
     tcp_v4_rcv+0x9b8/0xab0
     ip_protocol_deliver_rcu+0x2c/0x1c0
     ip_local_deliver_finish+0x44/0x50
     ip_local_deliver+0x6b/0xf0
     ? ip_protocol_deliver_rcu+0x1c0/0x1c0
     ip_rcv+0x52/0xd0
     ? ip_rcv_finish_core.isra.14+0x390/0x390
     __netif_receive_skb_one_core+0x83/0xa0
     netif_receive_skb_internal+0x73/0xb0
     napi_gro_frags+0x1ff/0x2b0
     t4_ethrx_handler+0x4a7/0x740 [cxgb4]
     process_responses+0x2c9/0x590 [cxgb4]
     ? t4_sge_intr_msix+0x1d/0x30 [cxgb4]
     ? handle_irq_event_percpu+0x51/0x70
     ? handle_irq_event+0x41/0x60
     ? handle_edge_irq+0x97/0x1a0
     napi_rx_handler+0x14/0xe0 [cxgb4]
     net_rx_action+0x2af/0x410
     __do_softirq+0xda/0x2a8
     do_softirq_own_stack+0x2a/0x40
     </IRQ>
     do_softirq+0x50/0x60
     __local_bh_enable_ip+0x50/0x60
     ip_finish_output2+0x18f/0x520
     ip_output+0x6e/0xf0
     ? __ip_finish_output+0x1f0/0x1f0
     __ip_queue_xmit+0x14f/0x3d0
     ? __slab_alloc+0x4b/0x58
     __tcp_transmit_skb+0x57d/0xa60
     tcp_write_xmit+0x23b/0xfd0
     __tcp_push_pending_frames+0x2e/0xf0
     tcp_sendmsg_locked+0x939/0xd50
     tcp_sendmsg+0x27/0x40
     sock_sendmsg+0x57/0x80
     siw_tx_hdt+0x894/0xb20 [siw]
     ? find_busiest_group+0x3e/0x5b0
     ? common_interrupt+0xa/0xf
     ? common_interrupt+0xa/0xf
     ? common_interrupt+0xa/0xf
     siw_qp_sq_process+0xf1/0xe60 [siw]
     ? __wake_up_common_lock+0x87/0xc0
     siw_sq_resume+0x33/0xe0 [siw]
     siw_run_sq+0xac/0x190 [siw]
     ? remove_wait_queue+0x60/0x60
     kthread+0xf8/0x130
     ? siw_sq_resume+0xe0/0xe0 [siw]
     ? kthread_bind+0x10/0x10
     ret_from_fork+0x35/0x40

Fixes: f29dd55b0236 ("rdma/siw: queue pair methods")
Link: https://lore.kernel.org/r/20190923101112.32685-1-krishna2@chelsio.com
Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

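The serialized upcall shape (a sketch; the body of the callback is elided):

    static void siw_qp_llp_write_space(struct sock *sk)
    {
            struct siw_cep *cep;

            read_lock(&sk->sk_callback_lock);
            cep = sk_to_cep(sk);
            if (cep) {
                    /* ... original write_space work; holding sk_callback_lock
                     * guarantees siw_sk_restore_upcalls() cannot race us ... */
            }
            read_unlock(&sk->sk_callback_lock);
    }
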
2019-10-01  RDMA/vmw_pvrdma: Free SRQ only once  (Adit Ranadive, 1 file, -2/+0)

Removal of an extra kfree() cleanup was missed, since these structures are now deallocated by the IB core; the driver must not free the SRQ again.

Link: https://lore.kernel.org/r/1568848066-12449-1-git-send-email-aditr@vmware.com
Cc: <stable@vger.kernel.org>
Fixes: 68e326dea1db ("RDMA: Handle SRQ allocations by IB/core")
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Vishnu Dasa <vdasa@vmware.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

2019-09-30  Linux 5.4-rc1  (Linus Torvalds, 1 file, -2/+2)

2019-09-30  csky: Move static keyword to the front of declaration  (Krzysztof Wilczynski, 1 file, -1/+1)

Move the static keyword to the front of the declaration of csky_pmu_of_device_ids, and resolve the following compiler warning that can be seen when building with warnings enabled (W=1):

    arch/csky/kernel/perf_event.c:1340:1: warning:
      ‘static’ is not at beginning of declaration [-Wold-style-declaration]

Signed-off-by: Krzysztof Wilczynski <kw@linux.com>
Signed-off-by: Guo Ren <guoren@kernel.org>

2019-09-30  csky: entry: Remove unneeded need_resched() loop  (Valentin Schneider, 1 file, -4/+0)

Since the enabling and disabling of IRQs within preempt_schedule_irq() is contained in a need_resched() loop, we don't need the outer arch code loop.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Guo Ren <guoren@kernel.org>

2019-09-30  csky: Fixup csky_pmu.max_period assignment  (Mao Han, 1 file, -1/+1)

csky_pmu.max_period has type u64, but BIT() can only return a 32-bit unsigned long on C-SKY, so the initialization of max_period will be incorrect when count_width is bigger than 32. Use BIT_ULL() instead.

Signed-off-by: Mao Han <han_mao@c-sky.com>
Signed-off-by: Guo Ren <ren_guo@c-sky.com>

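The change implied by the message, shown as before/after:

    /* before: BIT() yields a 32-bit unsigned long on C-SKY */
    csky_pmu.max_period = BIT(count_width) - 1;

    /* after: BIT_ULL() yields a u64, correct for count_width > 32 */
    csky_pmu.max_period = BIT_ULL(count_width) - 1;
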
2019-09-30  csky: Add zero_fp fixup to prevent perf backtrace panic  (Guo Ren, 2 files, -21/+31)

We need to set fp to zero to let backtrace know where the frame chain ends. This patch fixes a perf callchain panic that occurred because backtrace did not know the end of the fp chain.

Signed-off-by: Guo Ren <ren_guo@c-sky.com>
Reported-by: Mao Han <han_mao@c-sky.com>

2019-09-30  csky: Use generic free_initrd_mem()  (Mike Rapoport, 1 file, -16/+0)

The csky implementation of free_initrd_mem() is an open-coded version of free_reserved_area() without poisoning. Remove it and make csky use the generic version of free_initrd_mem().

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Guo Ren <guoren@kernel.org>

2019-09-29  Revert "Revert "ext4: make __ext4_get_inode_loc plug""  (Linus Torvalds, 1 file, -0/+3)

This reverts commit 72dbcf72156641fde4d8ea401e977341bfd35a05.

Instead of waiting forever for entropy that may just not happen, we now try to actively generate entropy when required, and are thus hopefully avoiding the problem that caused the nice ext4 IO pattern fix to be reverted. So revert the revert.

Cc: Ahmed S. Darwish <darwish.07@gmail.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Alexander E. Patrakov <patrakov@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2019-09-29  random: try to actively add entropy rather than passively wait for it  (Linus Torvalds, 1 file, -1/+61)

For 5.3 we had to revert a nice ext4 IO pattern improvement, because it caused a bootup regression due to lack of entropy at bootup together with arguably broken user space that was asking for secure random numbers when it really didn't need to. See commit 72dbcf721566 (Revert "ext4: make __ext4_get_inode_loc plug").

This aims to solve the issue by actively generating entropy noise using the CPU cycle counter when waiting for the random number generator to initialize. This only works when you have a high-frequency time stamp counter available, but that's the case on all modern x86 CPUs, and on most other modern CPUs too.

What we do is to generate jitter entropy from the CPU cycle counter under a somewhat complex load: calling the scheduler while also guaranteeing a certain amount of timing noise by also triggering a timer.

I'm sure we can tweak this, and that people will want to look at other alternatives, but there's been a number of papers written on jitter entropy, and this should really be fairly conservative by crediting one bit of entropy for every timer-induced jump in the cycle counter. Not because the timer itself would be all that unpredictable, but because the interaction between the timer and the loop is going to be.

Even if (and perhaps particularly if) the timer actually happens on another CPU, the cacheline interaction between the loop that reads the cycle counter and the timer itself firing is going to add perturbations to the cycle counter values that get mixed into the entropy pool.

As Thomas pointed out, with a modern out-of-order CPU, even quite simple loops show a fair amount of hard-to-predict timing variability even in the absence of external interrupts. But this tries to take that further by actually having a fairly complex interaction.

This is not going to solve the entropy issue for architectures that have no CPU cycle counter, but it's not clear how (and if) that is solvable, and the hardware in question is largely starting to be irrelevant. And by doing this we can at least avoid some of the even more contentious approaches (like making the entropy waiting time out in order to avoid the possibly unbounded waiting).

Cc: Ahmed Darwish <darwish.07@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Nicholas Mc Guire <hofrat@opentech.at>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Alexander E. Patrakov <patrakov@gmail.com>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

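A heavily condensed sketch of the loop described above (not the final code in drivers/char/random.c; 'stack' is assumed to be a local struct holding the timer and the sampled counter value, as the message describes):

    while (!crng_ready()) {
            if (!timer_pending(&stack.timer))
                    mod_timer(&stack.timer, jiffies + 1);   /* guarantee timing noise */
            mix_pool_bytes(&input_pool, &stack.now, sizeof(stack.now));
            schedule();                                     /* the "complex load" */
            stack.now = random_get_entropy();               /* read the cycle counter */
    }
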
2019-09-29  Documentation/process: Clarify disclosure rules  (Thomas Gleixner, 1 file, -7/+33)

The role of the contact list provided by the disclosing party, and how it affects the disclosure process and the ability to include experts in the development process, is not really well explained. Neither is it entirely clear when the disclosing party will be informed that a developer who is not covered by an employer NDA needs to be brought in and have the information disclosed to them.

Better explain the role of the contact list and the information policy, along with the eventual conflict resolution.

Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/alpine.DEB.2.21.1909251028390.10825@nanos.tec.linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

2019-09-28  selftests/ftrace: Fix same probe error test  (Steven Rostedt (VMware), 1 file, -1/+1)

The "same probe" selftest is supposed to verify that adding the same probe twice fails, but it did not actually add the same probe, so the add succeeded and the test failed.

Fixes: b78b94b82122 ("selftests/ftrace: Update kprobe event error testcase")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

2019-09-28  mm, tracing: Print symbol name for call_site in trace events  (Changbin Du, 1 file, -3/+4)

To improve the readability of raw slab trace points, print the call_site ip using '%pS'. Then we can grep events with function names.

    [002] ....   808.188897: kmem_cache_free: call_site=putname+0x47/0x50 ptr=00000000cef40c80
    [002] ....   808.188898: kfree: call_site=security_cred_free+0x42/0x50 ptr=0000000062400820
    [002] ....   808.188904: kmem_cache_free: call_site=put_cred_rcu+0x88/0xa0 ptr=0000000058d74ef8
    [002] ....   808.188913: kmem_cache_alloc: call_site=prepare_creds+0x26/0x100 ptr=0000000058d74ef8 bytes_req=168 bytes_alloc=576 gfp_flags=GFP_KERNEL
    [002] ....   808.188917: kmalloc: call_site=security_prepare_creds+0x77/0xa0 ptr=0000000062400820 bytes_req=8 bytes_alloc=336 gfp_flags=GFP_KERNEL|__GFP_ZERO
    [002] ....   808.188920: kmem_cache_alloc: call_site=getname_flags+0x4f/0x1e0 ptr=00000000cef40c80 bytes_req=4096 bytes_alloc=4480 gfp_flags=GFP_KERNEL
    [002] ....   808.188925: kmem_cache_free: call_site=putname+0x47/0x50 ptr=00000000cef40c80
    [002] ....   808.188926: kfree: call_site=security_cred_free+0x42/0x50 ptr=0000000062400820
    [002] ....   808.188931: kmem_cache_free: call_site=put_cred_rcu+0x88/0xa0 ptr=0000000058d74ef8

Link: http://lkml.kernel.org/r/20190914103215.23301-1-changbin.du@gmail.com
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

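The substance of the change is switching the format string from a raw pointer value to '%pS', which resolves an address to symbol+offset (e.g. putname+0x47/0x50). Illustrative only; call_site and ptr are hypothetical locals:

    pr_info("call_site=%pS ptr=%p\n", (void *)call_site, ptr);
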
2019-09-28  tracing: Have error path in predicate_parse() free its allocated memory  (Navid Emamdoost, 1 file, -2/+4)

In predicate_parse() there is an error path that does not go to out_free; instead it returns directly, which leads to a memory leak.

Link: http://lkml.kernel.org/r/20190920225800.3870-1-navid.emamdoost@gmail.com
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

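The error-path pattern involved (a generic sketch, not the exact predicate_parse() lines; bad_input and the buffer names are placeholders):

    if (bad_input) {
            ret = ERR_PTR(-EINVAL);
            goto out_free;          /* was a direct return, leaking the buffers below */
    }
    /* ... */
    out_free:
            kfree(op_stack);
            kfree(prog_stack);
            kfree(inverts);
            return ret;
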
2019-09-28  tracing: Fix clang -Wint-in-bool-context warnings in IF_ASSIGN macro  (Nathan Chancellor, 1 file, -5/+5)

After r372664 in clang, the IF_ASSIGN macro causes a couple hundred warnings along the lines of:

    kernel/trace/trace_output.c:1331:2: warning: converting the enum constant to a boolean [-Wint-in-bool-context]
    kernel/trace/trace.h:409:3: note: expanded from macro 'trace_assign_type'
            IF_ASSIGN(var, ent, struct ftrace_graph_ret_entry,
            ^
    kernel/trace/trace.h:371:14: note: expanded from macro 'IF_ASSIGN'
                    WARN_ON(id && (entry)->type != id); \
                             ^
    264 warnings generated.

This warning can catch issues with constructs like:

    if (state == A || B)

where the developer really meant:

    if (state == A || state == B)

This is currently the only occurrence of the warning in the kernel tree across defconfig, allyesconfig, allmodconfig for arm32, arm64, and x86_64. Add the implicit '!= 0' to the WARN_ON statement to fix the warnings and find potential issues in the future.

Link: https://github.com/llvm/llvm-project/commit/28b38c277a2941e9e891b2db30652cfd962f070b
Link: https://github.com/ClangBuiltLinux/linux/issues/686
Link: http://lkml.kernel.org/r/20190926162258.466321-1-natechancellor@gmail.com
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

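The fix, in before/after form (taken directly from the description above):

    /* before: 'id', an enum constant, is used directly in boolean context */
    WARN_ON(id && (entry)->type != id);

    /* after: the explicit comparison silences -Wint-in-bool-context */
    WARN_ON(id != 0 && (entry)->type != id);
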
2019-09-28  tracing/probe: Fix to check the difference of nr_args before adding probe  (Masami Hiramatsu, 1 file, -0/+16)

Steven reported that a test triggered:

    ==================================================================
    BUG: KASAN: slab-out-of-bounds in trace_kprobe_create+0xa9e/0xe40
    Read of size 8 at addr ffff8880c4f25a48 by task ftracetest/4798

    CPU: 2 PID: 4798 Comm: ftracetest Not tainted 5.3.0-rc6-test+ #30
    Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
    Call Trace:
     dump_stack+0x7c/0xc0
     ? trace_kprobe_create+0xa9e/0xe40
     print_address_description+0x6c/0x332
     ? trace_kprobe_create+0xa9e/0xe40
     ? trace_kprobe_create+0xa9e/0xe40
     __kasan_report.cold.6+0x1a/0x3b
     ? trace_kprobe_create+0xa9e/0xe40
     kasan_report+0xe/0x12
     trace_kprobe_create+0xa9e/0xe40
     ? print_kprobe_event+0x280/0x280
     ? match_held_lock+0x1b/0x240
     ? find_held_lock+0xac/0xd0
     ? fs_reclaim_release.part.112+0x5/0x20
     ? lock_downgrade+0x350/0x350
     ? kasan_unpoison_shadow+0x30/0x40
     ? __kasan_kmalloc.constprop.6+0xc1/0xd0
     ? trace_kprobe_create+0xe40/0xe40
     ? trace_kprobe_create+0xe40/0xe40
     create_or_delete_trace_kprobe+0x2e/0x60
     trace_run_command+0xc3/0xe0
     ? trace_panic_handler+0x20/0x20
     ? kasan_unpoison_shadow+0x30/0x40
     trace_parse_run_command+0xdc/0x163
     vfs_write+0xe1/0x240
     ksys_write+0xba/0x150
     ? __ia32_sys_read+0x50/0x50
     ? tracer_hardirqs_on+0x61/0x180
     ? trace_hardirqs_off_caller+0x43/0x110
     ? mark_held_locks+0x29/0xa0
     ? do_syscall_64+0x14/0x260
     do_syscall_64+0x68/0x260

Fix this by checking the difference of nr_args before adding a probe to existing probes. This may also set the error log index bigger than the number of command parameters; in that case, the error position is set to just after the last parameter.

Link: http://lkml.kernel.org/r/156966474783.3478.13217501608215769150.stgit@devnote2
Fixes: ca89bc071d5e ("tracing/kprobe: Add multi-probe per event support")
Reported-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

2019-09-28  mm, page_alloc: allow hugepage fallback to remote nodes when madvised  (David Rientjes, 1 file, -0/+11)

For systems configured to always try hard to allocate transparent hugepages (thp defrag setting of "always") or for memory that has been explicitly madvised to MADV_HUGEPAGE, it is often better to fallback to remote memory to allocate the hugepage if the local allocation fails first.

The point is to allow the initial call to __alloc_pages_node() to attempt to defragment local memory to make a hugepage available, if possible, rather than immediately fallback to remote memory. Local hugepages will always have a better access latency than remote (huge)pages, so an attempt to make a hugepage available locally is always preferred.

If memory compaction cannot be successful locally, however, it is likely better to fallback to remote memory. This could take on two forms: either allow immediate fallback to remote memory or do per-zone watermark checks. It would be possible to fallback only when per-zone watermarks fail for order-0 memory, since that would require local reclaim for all subsequent faults so remote huge allocation is likely better than thrashing the local zone for large workloads.

In this case, it is assumed that because the system is configured to try hard to allocate hugepages or the vma is advised to explicitly want to try hard for hugepages that remote allocation is better when local allocation and memory compaction have both failed.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2019-09-28  mm, page_alloc: avoid expensive reclaim when compaction may not succeed  (David Rientjes, 1 file, -0/+22)

Memory compaction has a couple significant drawbacks as the allocation order increases, specifically:

- isolate_freepages() is responsible for finding free pages to use as migration targets and is implemented as a linear scan of memory starting at the end of a zone,

- failing order-0 watermark checks in memory compaction does not account for how far below the watermarks the zone actually is: to enable migration, there must be *some* free memory available. Per the above, watermarks are not always sufficient if isolate_freepages() cannot find the free memory, but it could require hundreds of MBs of reclaim to even reach this threshold (read: potentially very expensive reclaim with no indication compaction can be successful), and

- if compaction at this order has failed recently so that it does not even run as a result of deferred compaction, looping through reclaim can often be pointless.

For hugepage allocations, these are quite substantial drawbacks because these are very high order allocations (order-9 on x86) and falling back to doing reclaim can potentially be *very* expensive without any indication that compaction would even be successful.

Reclaim itself is unlikely to free entire pageblocks and certainly no reliance should be put on it to do so in isolation (recall lumpy reclaim). This means we should avoid reclaim and simply fail hugepage allocation if compaction is deferred.

It is also not helpful to thrash a zone by doing excessive reclaim if compaction may not be able to access that memory. If order-0 watermarks fail and the allocation order is sufficiently large, it is likely better to fail the allocation rather than thrashing the zone.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2019-09-28Revert "Revert "Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask""David Rientjes4-22/+51
This reverts commit 92717d429b38e4f9f934eed7e605cc42858f1839. Since commit a8282608c88e ("Revert "mm, thp: restore node-local hugepage allocations"") is reverted in this series, it is better to restore the previous 5.2 behavior between the thp allocation and the page allocator rather than to attempt any consolidation or cleanup for a policy that is now reverted. It's less risky during an rc cycle and subsequent patches in this series further modify the same policy that the pre-5.3 behavior implements. Consolidation and cleanup can be done subsequent to a sane default page allocation strategy, so this patch reverts a cleanup done on a strategy that is now reverted and thus is the least risky option. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-28Revert "Revert "mm, thp: restore node-local hugepage allocations""David Rientjes3-29/+17
This reverts commit a8282608c88e08b1782141026eab61204c1e533f. The commit references the original intended semantic for MADV_HUGEPAGE which has subsequently taken on three unique purposes: - enables or disables thp for a range of memory depending on the system's config (is thp "enabled" set to "always" or "madvise"), - determines the synchronous compaction behavior for thp allocations at fault (is thp "defrag" set to "always", "defer+madvise", or "madvise"), and - reverts a previous MADV_NOHUGEPAGE (there is no madvise mode to only clear previous hugepage advice). These are the three purposes that currently exist in 5.2 and over the past several years that userspace has been written around. Adding a NUMA locality preference adds a fourth dimension to an already conflated advice mode. Based on the semantic that MADV_HUGEPAGE has provided over the past several years, there exist workloads that use the tunable based on these principles: specifically that the allocation should attempt to defragment a local node before falling back. It is agreed that remote hugepages typically (but not always) have a better access latency than remote native pages, although on Naples this is at parity for intersocket. The revert commit that this patch reverts allows hugepage allocation to immediately allocate remotely when local memory is fragmented. This is contrary to the semantic of MADV_HUGEPAGE over the past several years: that is, memory compaction should be attempted locally before falling back. The performance degradation of remote hugepages over local hugepages on Rome, for example, is 53.5% increased access latency. For this reason, the goal is to revert back to the 5.2 and previous behavior that would attempt local defragmentation before falling back. With the patch that is reverted by this patch, we see performance degradations at the tail because the allocator happily allocates the remote hugepage rather than even attempting to make a local hugepage available. zone_reclaim_mode is not a solution to this problem since it does not only impact hugepage allocations but rather changes the memory allocation strategy for *all* page allocations. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-28  i2c: slave-eeprom: Add read only mode  (Björn Ardö, 1 file, -3/+11)

Add read-only versions of all EEPROMs. These versions are read-only on the i2c side, but can be written from the sysfs side.

Signed-off-by: Björn Ardö <bjorn.ardo@axis.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

2019-09-28  i2c: i801: Bring back Block Process Call support for certain platforms  (Jarkko Nikula, 1 file, -0/+1)

Commit b84398d6d7f9 ("i2c: i801: Use iTCO version 6 in Cannon Lake PCH and beyond") appears to have accidentally dropped Block Write-Block Read Process Call support for Intel Sunrisepoint, Lewisburg, Denverton and Kaby Lake.

That support was added for the above and newer platforms by commit 315cd67c9453 ("i2c: i801: Add Block Write-Block Read Process Call support"), so bring it back for the above platforms.

Fixes: b84398d6d7f9 ("i2c: i801: Use iTCO version 6 in Cannon Lake PCH and beyond")
Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

2019-09-28  i2c: riic: Clear NACK in tend isr  (Chris Brandt, 1 file, -0/+1)

The NACKF flag should be cleared in the INTRIICNAKI interrupt processing, as described in the HW manual.

This issue shows up quickly when PREEMPT_RT is applied and a device is probed that is not plugged in (like a touchscreen controller). The result is endless interrupts that halt system boot.

Fixes: 310c18a41450 ("i2c: riic: add driver")
Cc: stable@vger.kernel.org
Reported-by: Chien Nguyen <chien.nguyen.eb@rvc.renesas.com>
Signed-off-by: Chris Brandt <chris.brandt@renesas.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

2019-09-28  i2c: qcom-geni: Disable DMA processing on the Lenovo Yoga C630  (Lee Jones, 1 file, -4/+8)

We have a production-level laptop (Lenovo Yoga C630) which is exhibiting a rather horrific bug. When I2C HID devices are being scanned for at boot-time, the QCom Geni based I2C (Serial Engine) attempts to use DMA. When it does, the laptop reboots and the user never sees the OS.

Attempts are being made to debug the reason for the spontaneous reboot. No luck so far, hence the requirement for this hot-fix. This workaround will be removed once we have a viable fix.

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Tested-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

2019-09-28  iommu/amd: Lock code paths traversing protection_domain->dev_list  (Joerg Roedel, 1 file, -1/+24)

The traversing of this list requires protection_domain->lock to be taken to avoid nasty races with attach/detach code. Make sure the lock is held on all code paths traversing this list.

Reported-by: Filippo Sironi <sironi@amazon.de>
Fixes: 92d420ec028d ("iommu/amd: Relax locking in dma_ops path")
Reviewed-by: Filippo Sironi <sironi@amazon.de>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>

2019-09-28  iommu/amd: Lock dev_data in attach/detach code paths  (Joerg Roedel, 2 files, -0/+12)

Make sure that attaching and detaching a device can't race against each other, and protect the iommu_dev_data with a spin_lock in these code paths.

Fixes: 92d420ec028d ("iommu/amd: Relax locking in dma_ops path")
Reviewed-by: Filippo Sironi <sironi@amazon.de>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>

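The locking shape described (a sketch; the exact placement in the driver is an assumption):

    spin_lock_irqsave(&dev_data->lock, flags);
    /* attach or detach dev_data to/from the protection domain here;
     * the lock keeps a concurrent attach and detach from interleaving */
    spin_unlock_irqrestore(&dev_data->lock, flags);
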
2019-09-28  iommu/amd: Check for busy devices earlier in attach_device()  (Joerg Roedel, 1 file, -18/+7)

Check early in attach_device() whether the device is already attached to a domain. This also simplifies the code path so that __attach_device() can be removed.

Fixes: 92d420ec028d ("iommu/amd: Relax locking in dma_ops path")
Reviewed-by: Filippo Sironi <sironi@amazon.de>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>

2019-09-28  iommu/amd: Take domain->lock for complete attach/detach path  (Joerg Roedel, 1 file, -39/+26)

The code paths before __attach_device() and __detach_device() are called also access and modify domain state, so take the domain lock there too. This allows us to get rid of the __detach_device() function.

Fixes: 92d420ec028d ("iommu/amd: Relax locking in dma_ops path")
Reviewed-by: Filippo Sironi <sironi@amazon.de>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>

2019-09-28  iommu/amd: Remove amd_iommu_devtable_lock  (Joerg Roedel, 1 file, -17/+6)

The lock is not necessary because the device table does not contain shared state that needs protection. Locking is only needed on an individual entry basis, and that needs to happen at the iommu_dev_data level.

Fixes: 92d420ec028d ("iommu/amd: Relax locking in dma_ops path")
Reviewed-by: Filippo Sironi <sironi@amazon.de>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>