aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/tools/perf/scripts/python/call-graph-from-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2016-07-05NFS: Remove unused function nfs_revalidate_mapping_protected()Trond Myklebust2-35/+4
Clean up... Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05NFS: Remove redundant waits for O_DIRECT in fsync() and write_begin()Trond Myklebust1-6/+0
We're now waiting immediately after taking the locks, so waiting in fsync() and write_begin() is either redundant or potentially subject to livelock (if not holding the lock). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05NFS: Cleanup nfs_direct_complete()Trond Myklebust1-7/+5
There is only one caller that sets the "write" argument to true, so just move the call to nfs_zap_mapping() and get rid of the now redundant argument. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05NFS: Do not serialise O_DIRECT reads and writesTrond Myklebust6-37/+174
Allow dio requests to be scheduled in parallel, but ensuring that they do not conflict with buffered I/O. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05NFS: Move buffered I/O locking into nfs_file_write()Trond Myklebust1-12/+15
Preparation for the patch that de-serialises O_DIRECT reads and writes. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05NFS Cleanup: move call to generic_write_checks() into fs/nfs/direct.cTrond Myklebust2-9/+9
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05NFS: Remove racy size manipulations in O_DIRECTTrond Myklebust1-16/+0
On success, the RPC callbacks will ensure that we make the appropriate calls to nfs_writeback_update_inode() Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05NFS: Ensure we reset the write verifier 'committed' value on resend.Trond Myklebust2-0/+19
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05NFS: Fix O_DIRECT verifier problemsTrond Myklebust3-3/+16
We should not be interested in looking at the value of the stable field, since that could take any value. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05pNFS: pnfs_layoutcommit_outstanding() is no longer used when !CONFIG_NFS_V4_1Trond Myklebust1-7/+0
Cleanup... Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05pNFS: Ensure we layoutcommit before revalidating attributesTrond Myklebust1-16/+7
If we need to update the cached attributes, then we'd better make sure that we also layoutcommit first. Otherwise, the server may have stale attributes. Prior to this patch, the revalidation code tried to "fix" this problem by simply disabling attributes that would be affected by the layoutcommit. That approach breaks nfs_writeback_check_extend(), leading to a file size corruption. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05pNFS: Files and flexfiles always need to commit before layoutcommitTrond Myklebust5-9/+30
So ensure that we mark the layout for commit once the write is done, and then ensure that the commit to ds is finished before sending layoutcommit. Note that by doing this, we're able to optimise away the commit for the case of servers that don't need layoutcommit in order to return updated attributes. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05pNFS/flexfiles: Clean up calls to pnfs_set_layoutcommit()Trond Myklebust1-9/+10
Let's just have one place where we check ff_layout_need_layoutcommit(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05pNFS/flexfiles: Fix layoutcommit after a commit to DSTrond Myklebust1-2/+1
We should always do a layoutcommit after commit to DS, except if the layout segment we're using has set FF_FLAGS_NO_LAYOUTCOMMIT. Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-05pNFS/files: Fix layoutcommit after a commit to DSTrond Myklebust1-2/+1
According to the errata https://www.rfc-editor.org/errata_search.php?rfc=5661&eid=2751 we should always send layout commit after a commit to DS. Fixes: bc7d4b8fd091 ("nfs/filelayout: set layoutcommit...") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-06-22NFS: Don't call COMMIT in ->releasepage()Trond Myklebust1-23/+0
While COMMIT has the potential to free up a lot of memory that is being taken by unstable writes, it isn't guaranteed to free up this particular page. Also, calling fsync() on the server is expensive and so we want to do it in a more controlled fashion, rather than have it triggered at random by the VM. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-06-22NFS: Don't hold the inode lock across fsync()Trond Myklebust1-2/+0
Commits are no longer required to be serialised. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-06-22NFS: writepage of a single page should not be synchronousTrond Myklebust1-1/+1
It is almost always better to wait for more so that we can issue a bulk commit. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-06-22NFS: Kill NFS_INO_NFS_INO_FLUSHING: it is a performance killerTrond Myklebust4-21/+0
filemap_datawrite() and friends already deal just fine with livelock. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-06-22NFS: Cache aggressively when file is open for writingTrond Myklebust2-29/+46
Unless the user is using file locking, we must assume close-to-open cache consistency when the file is open for writing. Adjust the caching algorithm so that it does not clear the cache on out-of-order writes and/or attribute revalidations. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-06-15NFS: Cache access checks more aggressivelyTrond Myklebust1-21/+31
If an attribute revalidation fails, then we already know that we'll zap the access cache. If, OTOH, the inode isn't changing, there should be no need to eject access calls just because they are old. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-06-13NFS: Don't flush caches for a getattr that races with writebackTrond Myklebust1-6/+9
If there were outstanding writes then chalk up the unexpected change attribute on the server to them. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-06-12Linux 4.7-rc3Linus Torvalds1-1/+1
2016-06-10uvc_v4l2: Simplify compat ioctl implementationAndy Lutomirski1-56/+2
The uvc compat ioctl implementation seems to have copied user data for no good reason. Remove a bunch of copies. Signed-off-by: Andy Lutomirski <luto@kernel.org>
2016-06-10uvc: Forward compat ioctls to their handlers directlyAndy Lutomirski1-21/+18
The current code goes through a lot of indirection just to call a known handler. Simplify it: just call the handlers directly. Cc: stable@vger.kernel.org Signed-off-by: Andy Lutomirski <luto@kernel.org>
2016-06-10sched: panic on corrupted stack endJann Horn1-1/+2
Until now, hitting this BUG_ON caused a recursive oops (because oops handling involves do_exit(), which calls into the scheduler, which in turn raises an oops), which caused stuff below the stack to be overwritten until a panic happened (e.g. via an oops in interrupt context, caused by the overwritten CPU index in the thread_info). Just panic directly. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-10ecryptfs: forbid opening files without mmap handlerJann Horn1-2/+11
This prevents users from triggering a stack overflow through a recursive invocation of pagefault handling that involves mapping procfs files into virtual memory. Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Tyler Hicks <tyhicks@canonical.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-10proc: prevent stacking filesystems on topJann Horn1-0/+7
This prevents stacking filesystems (ecryptfs and overlayfs) from using procfs as lower filesystem. There is too much magic going on inside procfs, and there is no good reason to stack stuff on top of procfs. (For example, procfs does access checks in VFS open handlers, and ecryptfs by design calls open handlers from a kernel thread that doesn't drop privileges or so.) Signed-off-by: Jann Horn <jannh@google.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-10x86/ioapic: Fix incorrect pointers in ioapic_setup_resources()Rui Wang1-1/+1
On a 4-socket Brickland system, hot-removing one ioapic is fine. Hot-removing the 2nd one causes panic in mp_unregister_ioapic() while calling release_resource(). It is because the iomem_res pointer has already been released when removing the first ioapic. To explain the use of &res[num] here: res is assigned to ioapic_resources, and later in ioapic_insert_resources() we do: struct resource *r = ioapic_resources; for_each_ioapic(i) { insert_resource(&iomem_resource, r); r++; } Here 'r' is treated as an arry of 'struct resource', and the r++ ensures that each element of the array is inserted separately. Thus we should call release_resouce() on each element at &res[num]. Fix it by assigning the correct pointers to ioapics[i].iomem_res in ioapic_setup_resources(). Signed-off-by: Rui Wang <rui.y.wang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: tony.luck@intel.com Cc: linux-pci@vger.kernel.org Cc: rjw@rjwysocki.net Cc: linux-acpi@vger.kernel.org Cc: bhelgaas@google.com Link: http://lkml.kernel.org/r/1465369193-4816-3-git-send-email-rui.y.wang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-10x86/entry/traps: Don't force in_interrupt() to return true in IST handlersAndy Lutomirski1-10/+10
Forcing in_interrupt() to return true if we're not in a bona fide interrupt confuses the softirq code. This fixes warnings like: NOHZ: local_softirq_pending 282 ... which can happen when running things like selftests/x86. This will change perf's static percpu buffer usage in IST context. I think this is okay, and it's changing the behavior to match historical (pre-4.0) behavior. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: 959274753857 ("x86, traps: Track entry into and exit from IST context") Link: http://lkml.kernel.org/r/cdc215f94d118d691d73df35275022331156fb45.1464130360.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-10vmxnet3: segCnt can be 1 for LRO packetsShrikrishna Khare2-3/+3
The device emulation may send segCnt of 1 for LRO packets. Signed-off-by: Shrikrishna Khare <skhare@vmware.com> Signed-off-by: Jin Heo <heoj@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09packet: compat support for sock_fprogWillem de Bruijn3-2/+41
Socket option PACKET_FANOUT_DATA takes a struct sock_fprog as argument if PACKET_FANOUT has mode PACKET_FANOUT_CBPF. This structure contains a pointer into user memory. If userland is 32-bit and kernel is 64-bit the two disagree about the layout of struct sock_fprog. Add compat setsockopt support to convert a 32-bit compat_sock_fprog to a 64-bit sock_fprog. This is analogous to compat_sock_fprog support for SO_REUSEPORT added in commit 1957598840f4 ("soreuseport: add compat case for setsockopt SO_ATTACH_REUSEPORT_CBPF"). Reported-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09stmmac: fix parameter to dwmac4_set_umac_addr()Ben Dooks1-1/+1
The dwmac4_set_umac_addr() takes a struct mac_device_info as the first parameter, but is being passed a ioaddr instead from dwmac4_set_filter(). Fix the warning/bug by changing the first parameter. drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c:159:46: warning: incorrect type in argument 1 (different address spaces) drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c:159:46: expected struct mac_device_info *hw drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c:159:46: got void [noderef] <asn:2>*ioaddr Note, only compile tested this as do not have any hardware with it in. Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09net/mlx5e: Fix blue flame quota logicEli Cohen1-1/+2
Blue flame is a latency enhancement feature that allows the driver to write the packet data directly to the NIC's registers thus making the read of the packet data from host memory redundant. We maintain a quota for the blue flame which is reloaded whenever we identify that the hardware is processing send requests and processes them fast enough so by the time we post the next send request it was able to process all the pending ones. This indicates that the hardware is capable of processing more blue flame requests efficiently. The blue flame quota is decremented whenever we send using blue flame. The current code erroneously clears the budget if we did not use blue flame for the current post send operation and we fix it here. Fixes: 88a85f99e51f ('net/mlx5e: TX latency optimization to save DMA reads') Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09net/mlx5e: Use ndo_stop explicitly at shutdown flowEran Ben Elisha1-4/+1
The current implementation copies the flow of ndo_stop instead of calling it explicitly, Fixed it. Fixes: 5fc7197d3a25 ("net/mlx5: Add pci shutdown callback") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09net/mlx5: E-Switch, always set mc_promisc for allmulti vportsMohamad Haj Yahia1-0/+1
Set the mc_promisc flag also in the case of adding new mc address to existing allmulti vport. Fixes: a35f71f27a61 ('net/mlx5: E-Switch, Implement promiscuous rx modes vf request handling') Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09net/mlx5: E-Switch, Modify node guid on vf set MACNoa Osherovich4-4/+68
In RoCE, the RDMA-CM needs the node guid to establish connection between nodes. Today, the node guid exposed to mlx5 Ethernet VFs is zero, therefore RDMA-CM on the VF is broken. Whenever the administrator sets a MAC for a VF, derive the node guid from it and set it as well in the following way: MAC: e4:1d:2d:b3:f4:01 -> node_guid: e4:1d:2d:ff:fe:b3:f4:01 Fixes: 77256579c6b43 ('net/mlx5: E-Switch, Introduce Vport...') Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09net/mlx5: E-Switch, Fix vport enable flowMohamad Haj Yahia1-4/+1
Reorder vport enable flow to mark the vport as enabled before calling the vport change handler which was modified to handle the case for when vport is not enabled. This fixes the case for when the PF netdev is open before sriov is enabled, once sriov is enabled at esw_enable_vport, esw_vport_change_handle_locked didn't read the PF context since it thought the PF vport was not enabled. When we enable the vport, arming for events is not required anymore, since it's done on the vport change handle Fixes: 586cfa7f1d58 ('net/mlx5: E-Switch, Use vport event handler for vport cleanup') Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09net/mlx5: E-Switch, Use the correct error check on returned pointersOr Gerlitz1-17/+17
The mlx5 flow-steering API (mlx5_create_flow_table/group/rule) never returns null pointer on error. Even if it was doing that, checking for IS_ERR_OR_NULL(p) and then returning PTR_ERR(p) would have cause bugs, since PTR_ERR(NULL) --> success, crash. To make things more robust and protect against related future bugs, convert all IS_ERR_OR_NULL checks on returned values to IS_ERR. Fixes: 5742df0f7dbe ('net/mlx5: E-Switch, Introduce VST vport ingress/egress ACLs') Fixes: 86d722ad2c3b ('net/mlx5: Use flow steering infrastructure for mlx5_en') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09net/mlx5: E-Switch, Use the correct free() functionOr Gerlitz1-3/+3
We must use kvfree() for something that could have been allocated with vzalloc(), do that. Fixes: 5742df0f7dbe ('net/mlx5: E-Switch, Introduce VST vport ingress/egress ACLs') Fixes: 86d722ad2c3b ('net/mlx5: Use flow steering infrastructure for mlx5_en') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09net/mlx5: Fix E-Switch flow steering capabilities checkMaor Gottlieb1-13/+15
Add missing capabilities check for E-Switch FDB and ACLs flow tables before creating their namespace in flow steering. Fixes: efdc810ba39d ('net/mlx5: Flow steering, Add vport ACL support') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09net/mlx5: Fix flow steering NIC capabilities checkMaor Gottlieb2-1/+15
Flow steering infrastructure is currently used only on link layer ethernet, therefore the driver should initialize the flow steering when the device link layer is ethernet. In addition, add missing capability check before initializing the namespace of NIC RX flow tables. Fixes: 2530236303d9 ('net/mlx5_core: Flow steering tree initialization') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09net/mlx5: Fix root flow table updateMaor Gottlieb1-1/+1
When we destroy the last flow table we need to update the root_ft to NULL. It fixes an issue for when the last flow table is destroyed and recreated again, root_ft pointer will not be updated, as a result traffic will be dropped. Fixes: 2cc43b494a6c ('net/mlx5_core: Managing root flow table') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09net/mlx5: Fix MLX5_CMD_OP_MAX to be defined correctlyShahar Klein2-3/+2
Having MLX5_CMD_OP_MAX on another file causes us to repeatedly miss accounting new commands added to the driver and hence there're no entries for them in debugfs. To solve that, we integrate it into the commands enum as the last entry. Fixes: 34a40e689393 ('net/mlx5_core: Introduce modify flow table command') Signed-off-by: Shahar Klein <shahark@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09net/mlx5: Fix masking of reserved bits in XRCD numberMajd Dibbiny1-1/+1
Mask the reserved bits when reading the number of newly created XRCD. Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09net/mlx5: Fix the size of modify QP mailboxMajd Dibbiny1-0/+1
Add 16 reserved bytes at the end of mlx5_modify_qp_mbox_in to match the hardware spec definition. Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10powerpc/nohash: Fix build break with 64K pagesMichael Ellerman1-1/+1
Commit 74701d5947a6 "powerpc/mm: Rename function to indicate we are allocating fragments" renamed page_table_free() to pte_fragment_free(). One occurrence was mistyped as pte_fragment_fre(). This only breaks the nohash 64K page build, which is not the default or enabled in any defconfig. Fixes: 74701d5947a6 ("powerpc/mm: Rename function to indicate we are allocating fragments") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-10drm/amdgpu: fix warning with powerplay disabled.Dave Airlie1-1/+1
This just fixes a warning when you disable powerplay. Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-06-09mm/fadvise.c: do not discard partial pages with POSIX_FADV_DONTNEEDOleg Drokin1-0/+11
I noticed that the logic in the fadvise64_64 syscall is incorrect for partial pages. While first page of the region is correctly skipped if it is partial, the last page of the region is mistakenly discarded. This leads to problems for applications that read data in non-page-aligned chunks discarding already processed data between the reads. A somewhat misguided application that does something like write(XX bytes (non-page-alligned)); drop the data it just wrote; repeat gets a significant penalty in performance as a result. Link: http://lkml.kernel.org/r/1464917140-1506698-1-git-send-email-green@linuxhacker.ru Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-09mm: introduce dedicated WQ_MEM_RECLAIM workqueue to do lru_add_drain_allWang Sheng-Hui1-1/+19
This patch is based on https://patchwork.ozlabs.org/patch/574623/. Tejun submitted commit 23d11a58a9a6 ("workqueue: skip flush dependency checks for legacy workqueues") for the legacy create*_workqueue() interface. But some workq created by alloc_workqueue still reports warning on memory reclaim, e.g nvme_workq with flag WQ_MEM_RECLAIM set: workqueue: WQ_MEM_RECLAIM nvme:nvme_reset_work is flushing !WQ_MEM_RECLAIM events:lru_add_drain_per_cpu ------------[ cut here ]------------ WARNING: CPU: 0 PID: 6 at SoC/linux/kernel/workqueue.c:2448 check_flush_dependency+0xb4/0x10c ... check_flush_dependency+0xb4/0x10c flush_work+0x54/0x140 lru_add_drain_all+0x138/0x188 migrate_prep+0xc/0x18 alloc_contig_range+0xf4/0x350 cma_alloc+0xec/0x1e4 dma_alloc_from_contiguous+0x38/0x40 __dma_alloc+0x74/0x25c nvme_alloc_queue+0xcc/0x36c nvme_reset_work+0x5c4/0xda8 process_one_work+0x128/0x2ec worker_thread+0x58/0x434 kthread+0xd4/0xe8 ret_from_fork+0x10/0x50 That's because lru_add_drain_all() will schedule the drain work on system_wq, whose flag is set to 0, !WQ_MEM_RECLAIM. Introduce a dedicated WQ_MEM_RECLAIM workqueue to do lru_add_drain_all(), aiding in getting memory freed. Link: http://lkml.kernel.org/r/1464917521-9775-1-git-send-email-shhuiw@foxmail.com Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Keith Busch <keith.busch@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thierry Reding <treding@nvidia.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>