aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2020-05-23drm/msm: Refactor address space initializationJordan Crouse17-118/+85
Refactor how address space initialization works. Instead of having the address space function create the MMU object (and thus require separate but equal functions for gpummu and iommu) use a single function and pass the MMU struct in. Make the generic code cleaner by using target specific functions to create the address space so a2xx can do its own thing in its own space. For all the other targets use a generic helper to initialize IOMMU but leave the door open for newer targets to use customization if they need it. Reviewed-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> Tested-by: Shawn Guo <shawn.guo@linaro.org> [squash in rebase fixups] Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-05-23drm/msm: Attach the IOMMU device during initializationJordan Crouse9-55/+27
Everywhere an IOMMU object is created by msm_gpu_create_address_space the IOMMU device is attached immediately after. Instead of carrying around the infrastructure to do the attach from the device specific code do it directly in the msm_iommu_init() function. This gets it out of the way for more aggressive cleanups that follow. Reviewed-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> Tested-by: Shawn Guo <shawn.guo@linaro.org> [squash in rebase fixups and fix for unused fxn] Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-05-22drm/msm/dpu: dpu_setup_dspp_pcc() can be statickbuild test robot1-1/+1
Fixes: 4259ff7ae509 ("drm/msm/dpu: add support for pcc color block in dpu driver") Signed-off-by: kbuild test robot <lkp@intel.com> Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-05-22drm/msm/a6xx: a6xx_hfi_send_start() can be statickbuild test robot1-2/+2
Fixes: 8167e6fa76c8 ("drm/msm/a6xx: HFI v2 for A640 and A650") Signed-off-by: kbuild test robot <lkp@intel.com> Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-05-18drm/msm/a4xx: add a405_registers for a405 deviceShawn Guo1-3/+50
A405 device has a different set of registers than a4xx_registers. It has no VMIDMT or XPU registers, and VBIF registers are different. Let's add a405_registers for a405 device. As adreno_is_a405() works only after adreno_gpu_init() gets called, the assignments get moved down after adreno_gpu_init(). Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Reviewed-by: Jordan Crouse <jcrouse@codeauorora.org> Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-05-18drm/msm/a4xx: add adreno a405 supportShawn Guo4-13/+34
It adds support for adreno a405 found on MSM8939. The adreno_is_a430() check in adreno_submit() needs an extension to cover a405. The downstream driver suggests it should cover the whole a4xx generation. That's why it gets changed to adreno_is_a4xx(), while a420 is not tested though. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-05-18drm/msm/a6xx: update a6xx_hw_init for A640 and A650Jonathan Marek2-9/+61
Adreno 640 and 650 GPUs need some registers set differently. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-05-18drm/msm/a6xx: enable GMU logJonathan Marek3-0/+20
This is required for a650 to work. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-05-18drm/msm/a6xx: update pdc/rscc GMU registers for A640/A650Jonathan Marek3-53/+85
Update the gmu_pdc registers for A640 and A650. Some of the RSCC registers on A650 are in a separate region. Note this also changes the address of these registers: RSCC_TCS1_DRV0_STATUS RSCC_TCS2_DRV0_STATUS RSCC_TCS3_DRV0_STATUS Based on the values in msm-4.14 and msm-4.19 kernels. v3: replaced adreno_is_a650 around ->rscc with checks for "rscc" resource Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-05-18drm/msm/a6xx: A640/A650 GMU firmware pathJonathan Marek3-16/+138
Newer GPUs have different GMU firmware path. v3: updated a6xx_gmu_fw_load based on feedback, including gmu_write_bulk, and removed extra whitespace change Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-05-18drm/msm/a6xx: HFI v2 for A640 and A650Jonathan Marek5-24/+222
Add HFI v2 code paths required by Adreno 640 and 650 GPUs. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-05-18drm/msm/a6xx: add A640/A650 to gpulistJonathan Marek3-1/+35
Add Adreno 640 and 650 GPU info to the gpulist. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-05-18drm/msm/a6xx: use msm_gem for GMU memory objectsJonathan Marek3-42/+88
This gives more fine-grained control over how memory is allocated over the DMA api. In particular, it allows using an address range or pinning to a fixed address. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-05-18drm/msm: add internal MSM_BO_MAP_PRIV flagJonathan Marek2-0/+4
This flag sets IOMMU_PRIV, which is required for some a6xx GMU objects. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Jordan Crouse <jcrouse@codeauorora.org> Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-05-18drm/msm: add msm_gem_get_and_pin_iova_rangeJonathan Marek3-10/+30
This function allows pinning iova to a specific page range (for a6xx GMU). Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-05-18drm/msm: Check for powered down HW in the devfreq callbacksJordan Crouse3-0/+21
Writing to the devfreq sysfs nodes while the GPU is powered down can result in a system crash (on a5xx) or a nasty GMU error (on a6xx): $ /sys/class/devfreq/5000000.gpu# echo 500000000 > min_freq [ 104.841625] platform 506a000.gmu: [drm:a6xx_gmu_set_oob] *ERROR* Timeout waiting for GMU OOB set GPU_DCVS: 0x0 Despite the fact that we carefully try to suspend the devfreq device when the hardware is powered down there are lots of holes in the governors that don't check for the suspend state and blindly call into the devfreq callbacks that end up triggering hardware reads in the GPU driver. Call pm_runtime_get_if_in_use() in the gpu_busy() and gpu_set_freq() callbacks to skip the hardware access if it isn't active. v3: Only check pm_runtime_get_if_in_use() for == 0 per Eric Anholt v2: Use pm_runtime_get_if_in_use() per Eric Anholt Cc: stable@vger.kernel.org Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-05-18drm/msm/dpu: update bandwidth threshold checkKrishna Manikandan2-24/+3
Maximum allowed bandwidth has no dependency on the type of panel used. Hence, cleanup the code to use max_bw_high as the threshold value for bandwidth checks. Update the maximum allowed bandwidth as 6.8Gbps for SC7180 target. Signed-off-by: Krishna Manikandan <mkrishn@codeaurora.org> Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-05-18drm/msm/dpu: add support for clk and bw scaling for displayKalyan Thota8-23/+228
This change adds support to scale src clk and bandwidth as per composition requirements. Interconnect registration for bw has been moved to mdp device node from mdss to facilitate the scaling. Signed-off-by: Kalyan Thota <kalyan_t@codeaurora.org> Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-05-18drm/msm/dpu: add support for pcc color block in dpu driverKalyan Thota4-4/+164
This change adds support for color correction sub block for SC7180 device. Signed-off-by: Kalyan Thota <kalyan_t@codeaurora.org> Tested-by: Fritz Koenig <frkoenig@google.com> Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-05-18drm/msm/dpu: add support for color processing blocks in dpu driverKalyan Thota14-15/+322
This change adds support to configure dspp blocks in the dpu driver. Macro description of the changes coming in this patch. 1) Add dspp definitions in the hw catalog. 2) Add capability to reserve dspp blocks in the display data path. 3) Attach the reserved block to the encoder. Signed-off-by: Kalyan Thota <kalyan_t@codeaurora.org> Tested-by: Fritz Koenig <frkoenig@google.com> Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-05-18drm/msm/mdp5: Fix mdp5_init error path for failed mdp5_kms allocationRoy Spliet1-1/+2
When allocation for mdp5_kms fails, calling mdp5_destroy() leads to undefined behaviour, likely a nullptr exception or use-after-free troubles. Signed-off-by: Roy Spliet <nouveau@spliet.org> Reviewed-by: Abhinav Kumar <abhinavk@codeaurora.org> Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-05-18drm/msm: Fix typoChristophe JAILLET1-2/+2
Duplicated 'we' Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Abhinav Kumar <abhinavk@codeaurora.org> Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-05-18drm/msm: Fix undefined "rd_full" link errorBjorn Andersson1-2/+2
rd_full should be defined outside the CONFIG_DEBUG_FS region, in order to be able to link the msm driver even when CONFIG_DEBUG_FS is disabled. Fixes: e515af8d4a6f ("drm/msm: devcoredump should dump MSM_SUBMIT_BO_DUMP buffers") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Reviewed-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-05-18drm/msm: Add syncobj support.Bas Nieuwenhuizen3-4/+258
This 1) Enables core DRM syncobj support. 2) Adds options to the submission ioctl to wait/signal syncobjs. Just like the wait fence fd, this does inline waits. Using the scheduler would be nice but I believe it is out of scope for this work. Support for timeline syncobjs is implemented and the interface is ready for it, but I'm not enabling it yet until there is some code for turnip to use it. The reset is mostly in there because in the presence of waiting and signalling the same semaphores, resetting them after signalling can become very annoying. v2: - Fixed style issues - Removed a cleanup issue in a failure case - Moved to a copy_from_user per syncobj v3: - Fixed a missing declaration introduced in v2 - Reworked to use ERR_PTR/PTR_ERR - Simplified failure gotos. Used by: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2769 Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-05-18drm/msm/dpu: Fix compile warningsHongbo Yao1-18/+0
Using the following command will get compile warnings: make W=1 drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.o ARCH=arm64 drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c: In function ‘_dpu_crtc_program_lm_output_roi’: drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c:91:19: warning: variable ‘dpu_crtc’ set but not used [-Wunused-but-set-variable] struct dpu_crtc *dpu_crtc; ^~~~~~~~ drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c: In function ‘dpu_crtc_atomic_begin’: drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c:428:35: warning: variable ‘smmu_state’ set but not used [-Wunused-but-set-variable] struct dpu_crtc_smmu_state_data *smmu_state; ^~~~~~~~~~ drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c: In function ‘dpu_crtc_atomic_flush’: drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c:489:25: warning: variable ‘event_thread’ set but not used [-Wunused-but-set-variable] struct msm_drm_thread *event_thread; ^~~~~~~~~~~~ drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c: In function ‘dpu_crtc_destroy_state’: drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c:565:19: warning: variable ‘dpu_crtc’ set but not used [-Wunused-but-set-variable] struct dpu_crtc *dpu_crtc; ^~~~~~~~ drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c: In function ‘dpu_crtc_duplicate_state’: drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c:664:19: warning: variable ‘dpu_crtc’ set but not used [-Wunused-but-set-variable] struct dpu_crtc *dpu_crtc; ^~~~~~~~ drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c: In function ‘dpu_crtc_disable’: drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c:693:26: warning: variable ‘priv’ set but not used [-Wunused-but-set-variable] struct msm_drm_private *priv; ^~~~ drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c:691:27: warning: variable ‘mode’ set but not used [-Wunused-but-set-variable] struct drm_display_mode *mode; ^~~~ drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c: In function ‘dpu_crtc_enable’: drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c:766:26: warning: variable ‘priv’ set but not used [-Wunused-but-set-variable] struct msm_drm_private *priv; ^~~~ drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c: In function ‘dpu_crtc_init’: drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c:1292:18: warning: variable ‘kms’ set but not used [-Wunused-but-set-variable] struct dpu_kms *kms = NULL; ^~~ drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c:663: warning: Excess function parameter 'Returns' description in 'dpu_crtc_duplicate_state' Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Hongbo Yao <yaohongbo@huawei.com> Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-05-18drm/msm/a6xx: Fix a typo in an error messageChristophe JAILLET1-2/+2
'in' is duplicated in the error message. Axe one of them. While at it, slighly improve indentation. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-05-18drm/msm/mdp5: Add MDP5 configuration for MSM8x36.Konrad Dybcio1-0/+76
This change adds MDP5 configuration for MSM8x36-based SoCs, like MSM8936, 8939 and their APQ variants. The configuration is based on MSM8916's, but adds some notable features, like ad and pp blocks, along with some register changes. changes since v1: - add an ad block - add a second mixer @ 0x47000 - adjust .max_width - write a more descriptive commit message Signed-off-by: Konrad Dybcio <konradybcio@gmail.com> Reviewed-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Rob Clark <robdclark@chromium.org>
2020-05-10Linux 5.7-rc5Linus Torvalds1-1/+1
2020-05-09gcc-10: mark more functions __init to avoid section mismatch warningsLinus Torvalds2-2/+2
It seems that for whatever reason, gcc-10 ends up not inlining a couple of functions that used to be inlined before. Even if they only have one single callsite - it looks like gcc may have decided that the code was unlikely, and not worth inlining. The code generation difference is harmless, but caused a few new section mismatch errors, since the (now no longer inlined) function wasn't in the __init section, but called other init functions: Section mismatch in reference from the function kexec_free_initrd() to the function .init.text:free_initrd_mem() Section mismatch in reference from the function tpm2_calc_event_log_size() to the function .init.text:early_memremap() Section mismatch in reference from the function tpm2_calc_event_log_size() to the function .init.text:early_memunmap() So add the appropriate __init annotation to make modpost not complain. In both cases there were trivially just a single callsite from another __init function. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-09gcc-10: avoid shadowing standard library 'free()' in cryptoLinus Torvalds2-6/+6
gcc-10 has started warning about conflicting types for a few new built-in functions, particularly 'free()'. This results in warnings like: crypto/xts.c:325:13: warning: conflicting types for built-in function ‘free’; expected ‘void(void *)’ [-Wbuiltin-declaration-mismatch] because the crypto layer had its local freeing functions called 'free()'. Gcc-10 is in the wrong here, since that function is marked 'static', and thus there is no chance of confusion with any standard library function namespace. But the simplest thing to do is to just use a different name here, and avoid this gcc mis-feature. [ Side note: gcc knowing about 'free()' is in itself not the mis-feature: the semantics of 'free()' are special enough that a compiler can validly do special things when seeing it. So the mis-feature here is that gcc thinks that 'free()' is some restricted name, and you can't shadow it as a local static function. Making the special 'free()' semantics be a function attribute rather than tied to the name would be the much better model ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-09gcc-10: disable 'restrict' warning for nowLinus Torvalds1-0/+3
gcc-10 now warns about passing aliasing pointers to functions that take restricted pointers. That's actually a great warning, and if we ever start using 'restrict' in the kernel, it might be quite useful. But right now we don't, and it turns out that the only thing this warns about is an idiom where we have declared a few functions to be "printf-like" (which seems to make gcc pick up the restricted pointer thing), and then we print to the same buffer that we also use as an input. And people do that as an odd concatenation pattern, with code like this: #define sysfs_show_gen_prop(buffer, fmt, ...) \ snprintf(buffer, PAGE_SIZE, "%s"fmt, buffer, __VA_ARGS__) where we have 'buffer' as both the destination of the final result, and as the initial argument. Yes, it's a bit questionable. And outside of the kernel, people do have standard declarations like int snprintf( char *restrict buffer, size_t bufsz, const char *restrict format, ... ); where that output buffer is marked as a restrict pointer that cannot alias with any other arguments. But in the context of the kernel, that 'use snprintf() to concatenate to the end result' does work, and the pattern shows up in multiple places. And we have not marked our own version of snprintf() as taking restrict pointers, so the warning is incorrect for now, and gcc picks it up on its own. If we do start using 'restrict' in the kernel (and it might be a good idea if people find places where it matters), we'll need to figure out how to avoid this issue for snprintf and friends. But in the meantime, this warning is not useful. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-09gcc-10: disable 'stringop-overflow' warning for nowLinus Torvalds1-0/+1
This is the final array bounds warning removal for gcc-10 for now. Again, the warning is good, and we should re-enable all these warnings when we have converted all the legacy array declaration cases to flexible arrays. But in the meantime, it's just noise. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-09nvme: fix possible hang when ns scanning fails during error recoverySagi Grimberg1-1/+1
When the controller is reconnecting, the host fails I/O and admin commands as the host cannot reach the controller. ns scanning may revalidate namespaces during that period and it is wrong to remove namespaces due to these failures as we may hang (see 205da2434301). One command that may fail is nvme_identify_ns_descs. Since we return success due to having ns identify descriptor list optional, we continue to compare ns identifiers in nvme_revalidate_disk, obviously fail and return -ENODEV to nvme_validate_ns, which will remove the namespace. Exactly what we don't want to happen. Fixes: 22802bf742c2 ("nvme: Namepace identification descriptor list is optional") Tested-by: Anton Eidelman <anton@lightbitslabs.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09nvme-pci: fix "slimmer CQ head update"Alexey Dobriyan1-1/+5
Pre-incrementing ->cq_head can't be done in memory because OOB value can be observed by another context. This devalues space savings compared to original code :-\ $ ./scripts/bloat-o-meter ../vmlinux-000 ../obj/vmlinux add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-32 (-32) Function old new delta nvme_poll_irqdisable 464 456 -8 nvme_poll 455 447 -8 nvme_irq 388 380 -8 nvme_dev_disable 955 947 -8 But the code is minimal now: one read for head, one read for q_depth, one increment, one comparison, single instruction phase bit update and one write for new head. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reported-by: John Garry <john.garry@huawei.com> Tested-by: John Garry <john.garry@huawei.com> Fixes: e2a366a4b0feaeb ("nvme-pci: slimmer CQ head update") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09bdi: add a ->dev_name field to struct backing_dev_infoChristoph Hellwig2-2/+4
Cache a copy of the name for the life time of the backing_dev_info structure so that we can reference it even after unregistering. Fixes: 68f23b89067f ("memcg: fix a crash in wb_workfn when a device disappears") Reported-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09bdi: use bdi_dev_name() to get device nameYufen Yu4-8/+10
Use the common interface bdi_dev_name() to get device name. Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Add missing <linux/backing-dev.h> include BFQ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09gcc-10: disable 'array-bounds' warning for nowLinus Torvalds1-0/+1
This is another fine warning, related to the 'zero-length-bounds' one, but hitting the same historical code in the kernel. Because C didn't historically support flexible array members, we have code that instead uses a one-sized array, the same way we have cases of zero-sized arrays. The one-sized arrays come from either not wanting to use the gcc zero-sized array extension, or from a slight convenience-feature, where particularly for strings, the size of the structure now includes the allocation for the final NUL character. So with a "char name[1];" at the end of a structure, you can do things like v = my_malloc(sizeof(struct vendor) + strlen(name)); and avoid the "+1" for the terminator. Yes, the modern way to do that is with a flexible array, and using 'offsetof()' instead of 'sizeof()', and adding the "+1" by hand. That also technically gets the size "more correct" in that it avoids any alignment (and thus padding) issues, but this is another long-term cleanup thing that will not happen for 5.7. So disable the warning for now, even though it's potentially quite useful. Having a slew of warnings that then hide more urgent new issues is not an improvement. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-09gcc-10: disable 'zero-length-bounds' warning for nowLinus Torvalds1-0/+3
This is a fine warning, but we still have a number of zero-length arrays in the kernel that come from the traditional gcc extension. Yes, they are getting converted to flexible arrays, but in the meantime the gcc-10 warning about zero-length bounds is very verbose, and is hiding other issues. I missed one actual build failure because it was hidden among hundreds of lines of warning. Thankfully I caught it on the second go before pushing things out, but it convinced me that I really need to disable the new warnings for now. We'll hopefully be all done with our conversion to flexible arrays in the not too distant future, and we can then re-enable this warning. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-09Stop the ad-hoc games with -Wno-maybe-initializedLinus Torvalds3-23/+3
We have some rather random rules about when we accept the "maybe-initialized" warnings, and when we don't. For example, we consider it unreliable for gcc versions < 4.9, but also if -O3 is enabled, or if optimizing for size. And then various kernel config options disabled it, because they know that they trigger that warning by confusing gcc sufficiently (ie PROFILE_ALL_BRANCHES). And now gcc-10 seems to be introducing a lot of those warnings too, so it falls under the same heading as 4.9 did. At the same time, we have a very straightforward way to _enable_ that warning when wanted: use "W=2" to enable more warnings. So stop playing these ad-hoc games, and just disable that warning by default, with the known and straight-forward "if you want to work on the extra compiler warnings, use W=123". Would it be great to have code that is always so obvious that it never confuses the compiler whether a variable is used initialized or not? Yes, it would. In a perfect world, the compilers would be smarter, and our source code would be simpler. That's currently not the world we live in, though. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-08ceph: demote quotarealm lookup warning to a debug messageLuis Henriques1-2/+2
A misconfigured cephx can easily result in having the kernel client flooding the logs with: ceph: Can't lookup inode 1 (err: -13) Change this message to debug level. Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/44546 Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-05-08iommu/virtio: Reverse arguments to list_addJulia Lawall1-1/+1
Elsewhere in the file, there is a list_for_each_entry with &vdev->resv_regions as the second argument, suggesting that &vdev->resv_regions is the list head. So exchange the arguments on the list_add call to put the list head in the second argument. Fixes: 2a5a31487445 ("iommu/virtio: Add probe request") Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/1588704467-13431-1-git-send-email-Julia.Lawall@inria.fr Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-05-07mm: limit boost_watermark on small zonesHenry Willard1-0/+8
Commit 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs") adds a boost_watermark() function which increases the min watermark in a zone by at least pageblock_nr_pages or the number of pages in a page block. On Arm64, with 64K pages and 512M huge pages, this is 8192 pages or 512M. It does this regardless of the number of managed pages managed in the zone or the likelihood of success. This can put the zone immediately under water in terms of allocating pages from the zone, and can cause a small machine to fail immediately due to OoM. Unlike set_recommended_min_free_kbytes(), which substantially increases min_free_kbytes and is tied to THP, boost_watermark() can be called even if THP is not active. The problem is most likely to appear on architectures such as Arm64 where pageblock_nr_pages is very large. It is desirable to run the kdump capture kernel in as small a space as possible to avoid wasting memory. In some architectures, such as Arm64, there are restrictions on where the capture kernel can run, and therefore, the space available. A capture kernel running in 768M can fail due to OoM immediately after boost_watermark() sets the min in zone DMA32, where most of the memory is, to 512M. It fails even though there is over 500M of free memory. With boost_watermark() suppressed, the capture kernel can run successfully in 448M. This patch limits boost_watermark() to boosting a zone's min watermark only when there are enough pages that the boost will produce positive results. In this case that is estimated to be four times as many pages as pageblock_nr_pages. Mel said: : There is no harm in marking it stable. Clearly it does not happen very : often but it's not impossible. 32-bit x86 is a lot less common now : which would previously have been vulnerable to triggering this easily. : ppc64 has a larger base page size but typically only has one zone. : arm64 is likely the most vulnerable, particularly when CMA is : configured with a small movable zone. Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs") Signed-off-by: Henry Willard <henry.willard@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1588294148-6586-1-git-send-email-henry.willard@oracle.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07ubsan: disable UBSAN_ALIGNMENT under COMPILE_TESTKees Cook2-10/+6
The documentation for UBSAN_ALIGNMENT already mentions that it should not be used on all*config builds (and for efficient-unaligned-access architectures), so just refactor the Kconfig to correctly implement this so randconfigs will stop creating insane images that freak out objtool under CONFIG_UBSAN_TRAP (due to the false positives producing functions that never return, etc). Link: http://lkml.kernel.org/r/202005011433.C42EA3E2D@keescook Fixes: 0887a7ebc977 ("ubsan: add trap instrumentation option") Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/linux-next/202004231224.D6B3B650@keescook/ Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07mm/vmscan: remove unnecessary argument description of isolate_lru_pages()Qiwu Chen1-1/+0
Since commit a9e7c39fa9fd9 ("mm/vmscan.c: remove 7th argument of isolate_lru_pages()"), the explanation of 'mode' argument has been unnecessary. Let's remove it. Signed-off-by: Qiwu Chen <chenqiwu@xiaomi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200501090346.2894-1-chenqiwu@xiaomi.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07epoll: atomically remove wait entry on wake upRoman Penyaev1-19/+24
This patch does two things: - fixes a lost wakeup introduced by commit 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll") - improves performance for events delivery. The description of the problem is the following: if N (>1) threads are waiting on ep->wq for new events and M (>1) events come, it is quite likely that >1 wakeups hit the same wait queue entry, because there is quite a big window between __add_wait_queue_exclusive() and the following __remove_wait_queue() calls in ep_poll() function. This can lead to lost wakeups, because thread, which was woken up, can handle not all the events in ->rdllist. (in better words the problem is described here: https://lkml.org/lkml/2019/10/7/905) The idea of the current patch is to use init_wait() instead of init_waitqueue_entry(). Internally init_wait() sets autoremove_wake_function as a callback, which removes the wait entry atomically (under the wq locks) from the list, thus the next coming wakeup hits the next wait entry in the wait queue, thus preventing lost wakeups. Problem is very well reproduced by the epoll60 test case [1]. Wait entry removal on wakeup has also performance benefits, because there is no need to take a ep->lock and remove wait entry from the queue after the successful wakeup. Here is the timing output of the epoll60 test case: With explicit wakeup from ep_scan_ready_list() (the state of the code prior 339ddb53d373): real 0m6.970s user 0m49.786s sys 0m0.113s After this patch: real 0m5.220s user 0m36.879s sys 0m0.019s The other testcase is the stress-epoll [2], where one thread consumes all the events and other threads produce many events: With explicit wakeup from ep_scan_ready_list() (the state of the code prior 339ddb53d373): threads events/ms run-time ms 8 5427 1474 16 6163 2596 32 6824 4689 64 7060 9064 128 6991 18309 After this patch: threads events/ms run-time ms 8 5598 1429 16 7073 2262 32 7502 4265 64 7640 8376 128 7634 16767 (number of "events/ms" represents event bandwidth, thus higher is better; number of "run-time ms" represents overall time spent doing the benchmark, thus lower is better) [1] tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c [2] https://github.com/rouming/test-tools/blob/master/stress-epoll.c Signed-off-by: Roman Penyaev <rpenyaev@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jason Baron <jbaron@akamai.com> Cc: Khazhismel Kumykov <khazhy@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Heiher <r@hev.cc> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200430130326.1368509-2-rpenyaev@suse.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07kselftests: introduce new epoll60 testcase for catching lost wakeupsRoman Penyaev1-0/+146
This test case catches lost wake up introduced by commit 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll") The test is simple: we have 10 threads and 10 event fds. Each thread can harvest only 1 event. 1 producer fires all 10 events at once and waits that all 10 events will be observed by 10 threads. In case of lost wakeup epoll_wait() will timeout and 0 will be returned. Test case catches two sort of problems: forgotten wakeup on event, which hits the ->ovflist list, this problem was fixed by: 5a2513239750 ("eventpoll: fix missing wakeup for ovflist in ep_poll_callback") the other problem is when several sequential events hit the same waiting thread, thus other waiters get no wakeups. Problem is fixed in the following patch. Signed-off-by: Roman Penyaev <rpenyaev@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Khazhismel Kumykov <khazhy@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Heiher <r@hev.cc> Cc: Jason Baron <jbaron@akamai.com> Link: http://lkml.kernel.org/r/20200430130326.1368509-1-rpenyaev@suse.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07percpu: make pcpu_alloc() aware of current gfp contextFilipe Manana1-4/+10
Since 5.7-rc1, on btrfs we have a percpu counter initialization for which we always pass a GFP_KERNEL gfp_t argument (this happens since commit 2992df73268f78 ("btrfs: Implement DREW lock")). That is safe in some contextes but not on others where allowing fs reclaim could lead to a deadlock because we are either holding some btrfs lock needed for a transaction commit or holding a btrfs transaction handle open. Because of that we surround the call to the function that initializes the percpu counter with a NOFS context using memalloc_nofs_save() (this is done at btrfs_init_fs_root()). However it turns out that this is not enough to prevent a possible deadlock because percpu_alloc() determines if it is in an atomic context by looking exclusively at the gfp flags passed to it (GFP_KERNEL in this case) and it is not aware that a NOFS context is set. Because percpu_alloc() thinks it is in a non atomic context it locks the pcpu_alloc_mutex. This can result in a btrfs deadlock when pcpu_balance_workfn() is running, has acquired that mutex and is waiting for reclaim, while the btrfs task that called percpu_counter_init() (and therefore percpu_alloc()) is holding either the btrfs commit_root semaphore or a transaction handle (done fs/btrfs/backref.c: iterate_extent_inodes()), which prevents reclaim from finishing as an attempt to commit the current btrfs transaction will deadlock. Lockdep reports this issue with the following trace: ====================================================== WARNING: possible circular locking dependency detected 5.6.0-rc7-btrfs-next-77 #1 Not tainted ------------------------------------------------------ kswapd0/91 is trying to acquire lock: ffff8938a3b3fdc8 (&delayed_node->mutex){+.+.}, at: __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs] but task is already holding lock: ffffffffb4f0dbc0 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #4 (fs_reclaim){+.+.}: fs_reclaim_acquire.part.0+0x25/0x30 __kmalloc+0x5f/0x3a0 pcpu_create_chunk+0x19/0x230 pcpu_balance_workfn+0x56a/0x680 process_one_work+0x235/0x5f0 worker_thread+0x50/0x3b0 kthread+0x120/0x140 ret_from_fork+0x3a/0x50 -> #3 (pcpu_alloc_mutex){+.+.}: __mutex_lock+0xa9/0xaf0 pcpu_alloc+0x480/0x7c0 __percpu_counter_init+0x50/0xd0 btrfs_drew_lock_init+0x22/0x70 [btrfs] btrfs_get_fs_root+0x29c/0x5c0 [btrfs] resolve_indirect_refs+0x120/0xa30 [btrfs] find_parent_nodes+0x50b/0xf30 [btrfs] btrfs_find_all_leafs+0x60/0xb0 [btrfs] iterate_extent_inodes+0x139/0x2f0 [btrfs] iterate_inodes_from_logical+0xa1/0xe0 [btrfs] btrfs_ioctl_logical_to_ino+0xb4/0x190 [btrfs] btrfs_ioctl+0x165a/0x3130 [btrfs] ksys_ioctl+0x87/0xc0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x5c/0x260 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #2 (&fs_info->commit_root_sem){++++}: down_write+0x38/0x70 btrfs_cache_block_group+0x2ec/0x500 [btrfs] find_free_extent+0xc6a/0x1600 [btrfs] btrfs_reserve_extent+0x9b/0x180 [btrfs] btrfs_alloc_tree_block+0xc1/0x350 [btrfs] alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs] __btrfs_cow_block+0x122/0x5a0 [btrfs] btrfs_cow_block+0x106/0x240 [btrfs] commit_cowonly_roots+0x55/0x310 [btrfs] btrfs_commit_transaction+0x509/0xb20 [btrfs] sync_filesystem+0x74/0x90 generic_shutdown_super+0x22/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x31/0x70 cleanup_mnt+0x100/0x160 task_work_run+0x93/0xc0 exit_to_usermode_loop+0xf9/0x100 do_syscall_64+0x20d/0x260 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #1 (&space_info->groups_sem){++++}: down_read+0x3c/0x140 find_free_extent+0xef6/0x1600 [btrfs] btrfs_reserve_extent+0x9b/0x180 [btrfs] btrfs_alloc_tree_block+0xc1/0x350 [btrfs] alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs] __btrfs_cow_block+0x122/0x5a0 [btrfs] btrfs_cow_block+0x106/0x240 [btrfs] btrfs_search_slot+0x50c/0xd60 [btrfs] btrfs_lookup_inode+0x3a/0xc0 [btrfs] __btrfs_update_delayed_inode+0x90/0x280 [btrfs] __btrfs_commit_inode_delayed_items+0x81f/0x870 [btrfs] __btrfs_run_delayed_items+0x8e/0x180 [btrfs] btrfs_commit_transaction+0x31b/0xb20 [btrfs] iterate_supers+0x87/0xf0 ksys_sync+0x60/0xb0 __ia32_sys_sync+0xa/0x10 do_syscall_64+0x5c/0x260 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #0 (&delayed_node->mutex){+.+.}: __lock_acquire+0xef0/0x1c80 lock_acquire+0xa2/0x1d0 __mutex_lock+0xa9/0xaf0 __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs] btrfs_evict_inode+0x40d/0x560 [btrfs] evict+0xd9/0x1c0 dispose_list+0x48/0x70 prune_icache_sb+0x54/0x80 super_cache_scan+0x124/0x1a0 do_shrink_slab+0x176/0x440 shrink_slab+0x23a/0x2c0 shrink_node+0x188/0x6e0 balance_pgdat+0x31d/0x7f0 kswapd+0x238/0x550 kthread+0x120/0x140 ret_from_fork+0x3a/0x50 other info that might help us debug this: Chain exists of: &delayed_node->mutex --> pcpu_alloc_mutex --> fs_reclaim Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(fs_reclaim); lock(pcpu_alloc_mutex); lock(fs_reclaim); lock(&delayed_node->mutex); *** DEADLOCK *** 3 locks held by kswapd0/91: #0: (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30 #1: (shrinker_rwsem){++++}, at: shrink_slab+0x12f/0x2c0 #2: (&type->s_umount_key#43){++++}, at: trylock_super+0x16/0x50 stack backtrace: CPU: 1 PID: 91 Comm: kswapd0 Not tainted 5.6.0-rc7-btrfs-next-77 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x8f/0xd0 check_noncircular+0x170/0x190 __lock_acquire+0xef0/0x1c80 lock_acquire+0xa2/0x1d0 __mutex_lock+0xa9/0xaf0 __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs] btrfs_evict_inode+0x40d/0x560 [btrfs] evict+0xd9/0x1c0 dispose_list+0x48/0x70 prune_icache_sb+0x54/0x80 super_cache_scan+0x124/0x1a0 do_shrink_slab+0x176/0x440 shrink_slab+0x23a/0x2c0 shrink_node+0x188/0x6e0 balance_pgdat+0x31d/0x7f0 kswapd+0x238/0x550 kthread+0x120/0x140 ret_from_fork+0x3a/0x50 This could be fixed by making btrfs pass GFP_NOFS instead of GFP_KERNEL to percpu_counter_init() in contextes where it is not reclaim safe, however that type of approach is discouraged since memalloc_[nofs|noio]_save() were introduced. Therefore this change makes pcpu_alloc() look up into an existing nofs/noio context before deciding whether it is in an atomic context or not. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Link: http://lkml.kernel.org/r/20200430164356.15543-1-fdmanana@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07mm/slub: fix incorrect interpretation of s->offsetWaiman Long1-15/+30
In a couple of places in the slub memory allocator, the code uses "s->offset" as a check to see if the free pointer is put right after the object. That check is no longer true with commit 3202fa62fb43 ("slub: relocate freelist pointer to middle of object"). As a result, echoing "1" into the validate sysfs file, e.g. of dentry, may cause a bunch of "Freepointer corrupt" error reports like the following to appear with the system in panic afterwards. ============================================================================= BUG dentry(666:pmcd.service) (Tainted: G B): Freepointer corrupt ----------------------------------------------------------------------------- To fix it, use the check "s->offset == s->inuse" in the new helper function freeptr_outside_object() instead. Also add another helper function get_info_end() to return the end of info block (inuse + free pointer if not overlapping with object). Fixes: 3202fa62fb43 ("slub: relocate freelist pointer to middle of object") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Rafael Aquini <aquini@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Vitaly Nikolenko <vnik@duasynt.com> Cc: Silvio Cesare <silvio.cesare@gmail.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Markus Elfring <Markus.Elfring@web.de> Cc: Changbin Du <changbin.du@gmail.com> Link: http://lkml.kernel.org/r/20200429135328.26976-1-longman@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07scripts/gdb: repair rb_first() and rb_last()Aymeric Agon-Rambosson1-2/+2
The current implementations of the rb_first() and rb_last() gdb functions have a variable that references itself in its instanciation, which causes the function to throw an error if a specific condition on the argument is met. The original author rather intended to reference the argument and made a typo. Referring the argument instead makes the function work as intended. Signed-off-by: Aymeric Agon-Rambosson <aymeric.agon@yandex.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Kieran Bingham <kbingham@kernel.org> Cc: Douglas Anderson <dianders@chromium.org> Cc: Nikolay Borisov <n.borisov.lkml@gmail.com> Cc: Jackie Liu <liuyun01@kylinos.cn> Cc: Jason Wessel <jason.wessel@windriver.com> Link: http://lkml.kernel.org/r/20200427051029.354840-1-aymeric.agon@yandex.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07eventpoll: fix missing wakeup for ovflist in ep_poll_callbackKhazhismel Kumykov1-9/+9
In the event that we add to ovflist, before commit 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll") we would be woken up by ep_scan_ready_list, and did no wakeup in ep_poll_callback. With that wakeup removed, if we add to ovflist here, we may never wake up. Rather than adding back the ep_scan_ready_list wakeup - which was resulting in unnecessary wakeups, trigger a wake-up in ep_poll_callback. We noticed that one of our workloads was missing wakeups starting with 339ddb53d373 and upon manual inspection, this wakeup seemed missing to me. With this patch added, we no longer see missing wakeups. I haven't yet tried to make a small reproducer, but the existing kselftests in filesystem/epoll passed for me with this patch. [khazhy@google.com: use if/elif instead of goto + cleanup suggested by Roman] Link: http://lkml.kernel.org/r/20200424190039.192373-1-khazhy@google.com Fixes: 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll") Signed-off-by: Khazhismel Kumykov <khazhy@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Roman Penyaev <rpenyaev@suse.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Roman Penyaev <rpenyaev@suse.de> Cc: Heiher <r@hev.cc> Cc: Jason Baron <jbaron@akamai.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200424025057.118641-1-khazhy@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>