aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/scripts/python/failed-syscalls-by-pid.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2014-07-09kernfs: kernel-doc warning fixFabian Frederick1-1/+1
s/static_name/name_is_static Signed-off-by: Fabian Frederick <fabf@skynet.be> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-09debugfs: Fix corrupted loop in debugfs_remove_recursiveSteven Rostedt1-7/+26
[ I'm currently running my tests on it now, and so far, after a few hours it has yet to blow up. I'll run it for 24 hours which it never succeeded in the past. ] The tracing code has a way to make directories within the debugfs file system as well as deleting them using mkdir/rmdir in the instance directory. This is very limited in functionality, such as there is no renames, and the parent directory "instance" can not be modified. The tracing code creates the instance directory from the debugfs code and then replaces the dentry->d_inode->i_op with its own to allow for mkdir/rmdir to work. When these are called, the d_entry and inode locks need to be released to call the instance creation and deletion code. That code has its own accounting and locking to serialize everything to prevent multiple users from causing harm. As the parent "instance" directory can not be modified this simplifies things. I created a stress test that creates several threads that randomly creates and deletes directories thousands of times a second. The code stood up to this test and I submitted it a while ago. Recently I added a new test that adds readers to the mix. While the instance directories were being added and deleted, readers would read from these directories and even enable tracing within them. This test was able to trigger a bug: general protection fault: 0000 [#1] PREEMPT SMP Modules linked in: ... CPU: 3 PID: 17789 Comm: rmdir Tainted: G W 3.15.0-rc2-test+ #41 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007 task: ffff88003786ca60 ti: ffff880077018000 task.ti: ffff880077018000 RIP: 0010:[<ffffffff811ed5eb>] [<ffffffff811ed5eb>] debugfs_remove_recursive+0x1bd/0x367 RSP: 0018:ffff880077019df8 EFLAGS: 00010246 RAX: 0000000000000002 RBX: ffff88006f0fe490 RCX: 0000000000000000 RDX: dead000000100058 RSI: 0000000000000246 RDI: ffff88003786d454 RBP: ffff88006f0fe640 R08: 0000000000000628 R09: 0000000000000000 R10: 0000000000000628 R11: ffff8800795110a0 R12: ffff88006f0fe640 R13: ffff88006f0fe640 R14: ffffffff81817d0b R15: ffffffff818188b7 FS: 00007ff13ae24700(0000) GS:ffff88007d580000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000003054ec7be0 CR3: 0000000076d51000 CR4: 00000000000007e0 Stack: ffff88007a41ebe0 dead000000100058 00000000fffffffe ffff88006f0fe640 0000000000000000 ffff88006f0fe678 ffff88007a41ebe0 ffff88003793a000 00000000fffffffe ffffffff810bde82 ffff88006f0fe640 ffff88007a41eb28 Call Trace: [<ffffffff810bde82>] ? instance_rmdir+0x15b/0x1de [<ffffffff81132e2d>] ? vfs_rmdir+0x80/0xd3 [<ffffffff81132f51>] ? do_rmdir+0xd1/0x139 [<ffffffff8124ad9e>] ? trace_hardirqs_on_thunk+0x3a/0x3c [<ffffffff814fea62>] ? system_call_fastpath+0x16/0x1b Code: fe ff ff 48 8d 75 30 48 89 df e8 c9 fd ff ff 85 c0 75 13 48 c7 c6 b8 cc d2 81 48 c7 c7 b0 cc d2 81 e8 8c 7a f5 ff 48 8b 54 24 08 <48> 8b 82 a8 00 00 00 48 89 d3 48 2d a8 00 00 00 48 89 44 24 08 RIP [<ffffffff811ed5eb>] debugfs_remove_recursive+0x1bd/0x367 RSP <ffff880077019df8> It took a while, but every time it triggered, it was always in the same place: list_for_each_entry_safe(child, next, &parent->d_subdirs, d_u.d_child) { Where the child->d_u.d_child seemed to be corrupted. I added lots of trace_printk()s to see what was wrong, and sure enough, it was always the child's d_u.d_child field. I looked around to see what touches it and noticed that in __dentry_kill() which calls dentry_free(): static void dentry_free(struct dentry *dentry) { /* if dentry was never visible to RCU, immediate free is OK */ if (!(dentry->d_flags & DCACHE_RCUACCESS)) __d_free(&dentry->d_u.d_rcu); else call_rcu(&dentry->d_u.d_rcu, __d_free); } I also noticed that __dentry_kill() unlinks the child->d_u.child under the parent->d_lock spin_lock. Looking back at the loop in debugfs_remove_recursive() it never takes the parent->d_lock to do the list walk. Adding more tracing, I was able to prove this was the issue: ftrace-t-15385 1.... 246662024us : dentry_kill <ffffffff81138b91>: free ffff88006d573600 rmdir-15409 2.... 246662024us : debugfs_remove_recursive <ffffffff811ec7e5>: child=ffff88006d573600 next=dead000000100058 The dentry_kill freed ffff88006d573600 just as the remove recursive was walking it. In order to fix this, the list walk needs to be modified a bit to take the parent->d_lock. The safe version is no longer necessary, as every time we remove a child, the parent->d_lock must be released and the list walk must start over. Each time a child is removed, even though it may still be on the list, it should be skipped by the first check in the loop: if (!debugfs_positive(child)) continue; Cc: stable@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-09stable_kernel_rules: Add pointer to netdev-FAQ for network patchesDave Chiluk1-0/+3
Stable_kernel_rules should point submitters of network stable patches to the netdev_FAQ.txt as requests for stable network patches should go to netdev first. Signed-off-by: Dave Chiluk <chiluk@canonical.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-08driver core: platform: add device binding path 'driver_override'Kim Phillips3-0/+68
Needed by platform device drivers, such as the upcoming vfio-platform driver, in order to bypass the existing OF, ACPI, id_table and name string matches, and successfully be able to be bound to any device, like so: echo vfio-platform > /sys/bus/platform/devices/fff51000.ethernet/driver_override echo fff51000.ethernet > /sys/bus/platform/devices/fff51000.ethernet/driver/unbind echo fff51000.ethernet > /sys/bus/platform/drivers_probe This mimics "PCI: Introduce new device binding path using pci_dev.driver_override", which is an interface enhancement for more deterministic PCI device binding, e.g., when in the presence of hotplug. Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Alexander Graf <agraf@suse.de> Reviewed-by: Stuart Yoder <stuart.yoder@freescale.com> Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-08driver core/platform: remove unused implicit padding in platform_objectYann Droneaud1-2/+2
Up to 7 bytes are wasted at the end of struct platform_object in the form of padding after name field: unfortunately this padding is not used when allocating the memory to hold the name. This patch converts name array from name[1] to C99 flexible array name[] (equivalent to name[0]) so that no padding is required by the presence of this field. Memory allocation is updated to take care of allocating an additional byte for the NUL terminating character. Built on Fedora 20, using GCC 4.8, for ARM, i386, SPARC64 and x86_64 architectures, the data structure layout can be reported with following command: $ pahole drivers/base/platform.o \ --recursive \ --class_name device,pdev_archdata,platform_device,platform_object Please find below some comparisons of structure layout for arm, i386, sparc64 and x86_64 architecture before and after the patch. --- obj-arm/drivers/base/platform.o.pahole.v3.15-rc7-79-gfe45736f4134 2014-05-30 10:32:06.290960701 +0200 +++ obj-arm/drivers/base/platform.o.pahole.v3.15-rc7-80-g2cdb06858d71 2014-05-30 11:26:20.851988347 +0200 @@ -81,10 +81,9 @@ /* XXX last struct has 4 bytes of padding */ /* --- cacheline 6 boundary (384 bytes) was 8 bytes ago --- */ - char name[1]; /* 392 1 */ + char name[0]; /* 392 0 */ - /* size: 400, cachelines: 7, members: 2 */ - /* padding: 7 */ + /* size: 392, cachelines: 7, members: 2 */ /* paddings: 1, sum paddings: 4 */ - /* last cacheline: 16 bytes */ + /* last cacheline: 8 bytes */ }; --- obj-i386/drivers/base/platform.o.pahole.v3.15-rc7-79-gfe45736f4134 2014-05-30 10:32:06.305960691 +0200 +++ obj-i386/drivers/base/platform.o.pahole.v3.15-rc7-80-g2cdb06858d71 2014-05-30 11:26:20.875988332 +0200 @@ -73,9 +73,8 @@ struct platform_object { struct platform_device pdev; /* 0 396 */ /* --- cacheline 6 boundary (384 bytes) was 12 bytes ago --- */ - char name[1]; /* 396 1 */ + char name[0]; /* 396 0 */ - /* size: 400, cachelines: 7, members: 2 */ - /* padding: 3 */ - /* last cacheline: 16 bytes */ + /* size: 396, cachelines: 7, members: 2 */ + /* last cacheline: 12 bytes */ }; --- obj-sparc64/drivers/base/platform.o.pahole.v3.15-rc7-79-gfe45736f4134 2014-05-30 10:32:06.406960625 +0200 +++ obj-sparc64/drivers/base/platform.o.pahole.v3.15-rc7-80-g2cdb06858d71 2014-05-30 11:26:20.971988269 +0200 @@ -94,9 +94,8 @@ struct platform_object { struct platform_device pdev; /* 0 2208 */ /* --- cacheline 34 boundary (2176 bytes) was 32 bytes ago --- */ - char name[1]; /* 2208 1 */ + char name[0]; /* 2208 0 */ - /* size: 2216, cachelines: 35, members: 2 */ - /* padding: 7 */ - /* last cacheline: 40 bytes */ + /* size: 2208, cachelines: 35, members: 2 */ + /* last cacheline: 32 bytes */ }; --- obj-x86_64/drivers/base/platform.o.pahole.v3.15-rc7-79-gfe45736f4134 2014-05-30 10:32:06.432960608 +0200 +++ obj-x86_64/drivers/base/platform.o.pahole.v3.15-rc7-80-g2cdb06858d71 2014-05-30 11:26:21.000988250 +0200 @@ -84,9 +84,8 @@ struct platform_object { struct platform_device pdev; /* 0 720 */ /* --- cacheline 11 boundary (704 bytes) was 16 bytes ago --- */ - char name[1]; /* 720 1 */ + char name[0]; /* 720 0 */ - /* size: 728, cachelines: 12, members: 2 */ - /* padding: 7 */ - /* last cacheline: 24 bytes */ + /* size: 720, cachelines: 12, members: 2 */ + /* last cacheline: 16 bytes */ }; Changes from v5 [1]: - dropped dma_mask allocation changes and only kept padding removal changes (name array length set to 0). Changes from v4 [2]: [by Emil Goode <emilgoode@gmail.com>:] - Split v4 of the patch into two separate patches. - Generated new object file size and data structure layout info. - Updated the changelog message. Changes from v3 [3]: - fixed commit message so that git am doesn't fail. Changes from v2 [4]: - move 'dma_mask' to platform_object so that it's always allocated and won't leak on release; remove all previously added support functions. - use C99 flexible array member for 'name' to remove padding at the end of platform_object. Changes from v1 [5]: - remove unneeded kfree() from error path - add reference to author/commit adding allocation of dmamask Changes from v0 [6]: - small rewrite to squeeze the patch to a bare minimal [1] http://lkml.kernel.org/r/1401122483-31603-2-git-send-email-emilgoode@gmail.com http://lkml.kernel.org/r/1401122483-31603-1-git-send-email-emilgoode@gmail.com http://lkml.kernel.org/r/1401122483-31603-3-git-send-email-emilgoode@gmail.com [2] http://lkml.kernel.org/r/1390817152-30898-1-git-send-email-ydroneaud@opteya.com https://patchwork.kernel.org/patch/3541871/ [3] http://lkml.kernel.org/r/1390771138-28348-1-git-send-email-ydroneaud@opteya.com https://patchwork.kernel.org/patch/3540081/ [4] http://lkml.kernel.org/r/1389683909-17495-1-git-send-email-ydroneaud@opteya.com https://patchwork.kernel.org/patch/3484411/ [5] http://lkml.kernel.org/r/1389649085-7365-1-git-send-email-ydroneaud@opteya.com https://patchwork.kernel.org/patch/3480961/ [6] http://lkml.kernel.org/r/1386886207-2735-1-git-send-email-ydroneaud@opteya.com Cc: Emil Goode <emilgoode@gmail.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Shawn Guo <shawn.guo@freescale.com> Cc: Sascha Hauer <kernel@pengutronix.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Olof Johansson <olof@lixom.net> Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-08firmware loader: inform direct failure when udev loader is disabledLuis R. Rodriguez2-13/+15
Now that the udev firmware loader is optional request_firmware() will not provide any information on the kernel ring buffer if direct firmware loading failed and udev firmware loading is disabled. If no information is needed request_firmware_direct() should be used for optional firmware, at which point drivers can take on the onus over informing of any failures, if udev firmware loading is disabled though we should at the very least provide some sort of information as when the udev loader was enabled by default back in the days. With this change with a simple firmware load test module [0]: Example output without FW_LOADER_USER_HELPER_FALLBACK platform fake-dev.0: Direct firmware load for fake.bin failed with error -2 Example with FW_LOADER_USER_HELPER_FALLBACK platform fake-dev.0: Direct firmware load for fake.bin failed with error -2 platform fake-dev.0: Falling back to user helper Without this change without FW_LOADER_USER_HELPER_FALLBACK we get no output logged upon failure. Cc: Tom Gundersen <teg@jklm.no> Cc: Ming Lei <ming.lei@canonical.com> Cc: Abhay Salunke <Abhay_Salunke@dell.com> Cc: Stefan Roese <sr@denx.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Kay Sievers <kay@vrfy.org> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Reviewed-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-08firmware: replace ALIGN(PAGE_SIZE) by PAGE_ALIGNFabian Frederick1-1/+1
use mm.h definition Cc: Ming Lei <ming.lei@canonical.com> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-08firmware: read firmware size using i_size_read()Dmitry Kasatkin1-14/+3
There is no need to read attr because inode structure contains size of the file. Use i_size_read() instead. Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com> Acked-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-08firmware loader: allow disabling of udev as firmware loaderTakashi Iwai3-8/+19
[The patch was originally proposed by Tom Gundersen, and rewritten afterwards by me; most of changelogs below borrowed from Tom's original patch -- tiwai] Currently (at least) the dell-rbu driver selects FW_LOADER_USER_HELPER, which means that distros can't really stop loading firmware through udev without breaking other users (though some have). Ideally we would remove/disable the udev firmware helper in both the kernel and in udev, but if we were to disable it in udev and not the kernel, the result would be (seemingly) hung kernels as no one would be around to cancel firmware requests. This patch allows udev firmware loading to be disabled while still allowing non-udev firmware loading, as done by the dell-rbu driver, to continue working. This is achieved by only using the fallback mechanism when the uevent is suppressed. The patch renames the user-selectable Kconfig from FW_LOADER_USER_HELPER to FW_LOADER_USER_HELPER_FALLBACK, and the former is reverse-selected by the latter or the drivers that need userhelper like dell-rbu. Also, the "default y" is removed together with this change, since it's been deprecated in udev upstream, thus rather better to disable it nowadays. Tested with FW_LOADER_USER_HELPER=n LATTICE_ECP3_CONFIG=y DELL_RBU=y and udev without the firmware loading support, but I don't have the hardware to test the lattice/dell drivers, so additional testing would be appreciated. Reviewed-by: Tom Gundersen <teg@jklm.no> Cc: Ming Lei <ming.lei@canonical.com> Cc: Abhay Salunke <Abhay_Salunke@dell.com> Cc: Stefan Roese <sr@denx.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Kay Sievers <kay@vrfy.org> Tested-by: Balaji Singh <B_B_Singh@DELL.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-08reservation: add suppport for read-only access using rcuMaarten Lankhorst5-54/+400
This adds some extra functions to deal with rcu. reservation_object_get_fences_rcu() will obtain the list of shared and exclusive fences without obtaining the ww_mutex. reservation_object_wait_timeout_rcu() will wait on all fences of the reservation_object, without obtaining the ww_mutex. reservation_object_test_signaled_rcu() will test if all fences of the reservation_object are signaled without using the ww_mutex. reservation_object_get_excl and reservation_object_get_list require the reservation object to be held, updating requires write_seqcount_begin/end. If only the exclusive fence is needed, rcu_dereference followed by fence_get_rcu can be used, if the shared fences are needed it's recommended to use the supplied functions. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Acked-by: Sumit Semwal <sumit.semwal@linaro.org> Acked-by: Daniel Vetter <daniel@ffwll.ch> Reviewed-By: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-08reservation: update api and add some helpersMaarten Lankhorst4-19/+229
Move the list of shared fences to a struct, and return it in reservation_object_get_list(). Add reservation_object_get_excl to get the exclusive fence. Add reservation_object_reserve_shared(), which reserves space in the reservation_object for 1 more shared fence. reservation_object_add_shared_fence() and reservation_object_add_excl_fence() are used to assign a new fence to a reservation_object pointer, to complete a reservation. Changes since v1: - Add reservation_object_get_excl, reorder code a bit. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Acked-by: Sumit Semwal <sumit.semwal@linaro.org> Acked-by: Daniel Vetter <daniel@ffwll.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-08dma-buf: add poll support, v3Maarten Lankhorst2-0/+120
Thanks to Fengguang Wu for spotting a missing static cast. v2: - Kill unused variable need_shared. v3: - Clarify the BUG() in dma_buf_release some more. (Rob Clark) Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Acked-by: Sumit Semwal <sumit.semwal@linaro.org> Acked-by: Daniel Vetter <daniel@ffwll.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-08reservation: add support for fences to enable cross-device synchronisationMaarten Lankhorst1-1/+19
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Acked-by: Sumit Semwal <sumit.semwal@linaro.org> Acked-by: Daniel Vetter <daniel@ffwll.ch> Reviewed-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-08android: convert sync to fence api, v6Maarten Lankhorst7-651/+609
Just to show it's easy. Android syncpoints can be mapped to a timeline. This removes the need to maintain a separate api for synchronization. I've left the android trace events in place, but the core fence events should already be sufficient for debugging. v2: - Call fence_remove_callback in sync_fence_free if not all fences have fired. v3: - Merge Colin Cross' bugfixes, and the android fence merge optimization. v4: - Merge with the upstream fixes. v5: - Fix small style issues pointed out by Thomas Hellstrom. v6: - Fix for updates to fence api. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Acked-by: John Stultz <john.stultz@linaro.org> Acked-by: Sumit Semwal <sumit.semwal@linaro.org> Acked-by: Daniel Vetter <daniel@ffwll.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-08dma-buf: use reservation objectsMaarten Lankhorst17-14/+65
This allows reservation objects to be used in dma-buf. it's required for implementing polling support on the fences that belong to a dma-buf. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Acked-by: Mauro Carvalho Chehab <m.chehab@samsung.com> #drivers/media/v4l2-core/ Acked-by: Thomas Hellstrom <thellstrom@vmware.com> #drivers/gpu/drm/ttm Acked-by: Sumit Semwal <sumit.semwal@linaro.org> Acked-by: Daniel Vetter <daniel@ffwll.ch> Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net> #drivers/gpu/drm/armada/ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-08seqno-fence: Hardware dma-buf implementation of fencing (v6)Maarten Lankhorst5-2/+193
This type of fence can be used with hardware synchronization for simple hardware that can block execution until the condition (dma_buf[offset] - value) >= 0 has been met when WAIT_GEQUAL is used, or (dma_buf[offset] != 0) has been met when WAIT_NONZERO is set. A software fallback still has to be provided in case the fence is used with a device that doesn't support this mechanism. It is useful to expose this for graphics cards that have an op to support this. Some cards like i915 can export those, but don't have an option to wait, so they need the software fallback. I extended the original patch by Rob Clark. v1: Original v2: Renamed from bikeshed to seqno, moved into dma-fence.c since not much was left of the file. Lots of documentation added. v3: Use fence_ops instead of custom callbacks. Moved to own file to avoid circular dependency between dma-buf.h and fence.h v4: Add spinlock pointer to seqno_fence_init v5: Add condition member to allow wait for != 0. Fix small style errors pointed out by checkpatch. v6: Move to a separate file. Fix up api changes in fences. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Acked-by: Sumit Semwal <sumit.semwal@linaro.org> Acked-by: Daniel Vetter <daniel@ffwll.ch> Reviewed-by: Rob Clark <robdclark@gmail.com> #v4 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-08fence: dma-buf cross-device synchronization (v18)Maarten Lankhorst7-2/+915
A fence can be attached to a buffer which is being filled or consumed by hw, to allow userspace to pass the buffer without waiting to another device. For example, userspace can call page_flip ioctl to display the next frame of graphics after kicking the GPU but while the GPU is still rendering. The display device sharing the buffer with the GPU would attach a callback to get notified when the GPU's rendering-complete IRQ fires, to update the scan-out address of the display, without having to wake up userspace. A driver must allocate a fence context for each execution ring that can run in parallel. The function for this takes an argument with how many contexts to allocate: + fence_context_alloc() A fence is transient, one-shot deal. It is allocated and attached to one or more dma-buf's. When the one that attached it is done, with the pending operation, it can signal the fence: + fence_signal() To have a rough approximation whether a fence is fired, call: + fence_is_signaled() The dma-buf-mgr handles tracking, and waiting on, the fences associated with a dma-buf. The one pending on the fence can add an async callback: + fence_add_callback() The callback can optionally be cancelled with: + fence_remove_callback() To wait synchronously, optionally with a timeout: + fence_wait() + fence_wait_timeout() When emitting a fence, call: + trace_fence_emit() To annotate that a fence is blocking on another fence, call: + trace_fence_annotate_wait_on(fence, on_fence) A default software-only implementation is provided, which can be used by drivers attaching a fence to a buffer when they have no other means for hw sync. But a memory backed fence is also envisioned, because it is common that GPU's can write to, or poll on some memory location for synchronization. For example: fence = custom_get_fence(...); if ((seqno_fence = to_seqno_fence(fence)) != NULL) { dma_buf *fence_buf = seqno_fence->sync_buf; get_dma_buf(fence_buf); ... tell the hw the memory location to wait ... custom_wait_on(fence_buf, seqno_fence->seqno_ofs, fence->seqno); } else { /* fall-back to sw sync * / fence_add_callback(fence, my_cb); } On SoC platforms, if some other hw mechanism is provided for synchronizing between IP blocks, it could be supported as an alternate implementation with it's own fence ops in a similar way. enable_signaling callback is used to provide sw signaling in case a cpu waiter is requested or no compatible hardware signaling could be used. The intention is to provide a userspace interface (presumably via eventfd) later, to be used in conjunction with dma-buf's mmap support for sw access to buffers (or for userspace apps that would prefer to do their own synchronization). v1: Original v2: After discussion w/ danvet and mlankhorst on #dri-devel, we decided that dma-fence didn't need to care about the sw->hw signaling path (it can be handled same as sw->sw case), and therefore the fence->ops can be simplified and more handled in the core. So remove the signal, add_callback, cancel_callback, and wait ops, and replace with a simple enable_signaling() op which can be used to inform a fence supporting hw->hw signaling that one or more devices which do not support hw signaling are waiting (and therefore it should enable an irq or do whatever is necessary in order that the CPU is notified when the fence is passed). v3: Fix locking fail in attach_fence() and get_fence() v4: Remove tie-in w/ dma-buf.. after discussion w/ danvet and mlankorst we decided that we need to be able to attach one fence to N dma-buf's, so using the list_head in dma-fence struct would be problematic. v5: [ Maarten Lankhorst ] Updated for dma-bikeshed-fence and dma-buf-manager. v6: [ Maarten Lankhorst ] I removed dma_fence_cancel_callback and some comments about checking if fence fired or not. This is broken by design. waitqueue_active during destruction is now fatal, since the signaller should be holding a reference in enable_signalling until it signalled the fence. Pass the original dma_fence_cb along, and call __remove_wait in the dma_fence_callback handler, so that no cleanup needs to be performed. v7: [ Maarten Lankhorst ] Set cb->func and only enable sw signaling if fence wasn't signaled yet, for example for hardware fences that may choose to signal blindly. v8: [ Maarten Lankhorst ] Tons of tiny fixes, moved __dma_fence_init to header and fixed include mess. dma-fence.h now includes dma-buf.h All members are now initialized, so kmalloc can be used for allocating a dma-fence. More documentation added. v9: Change compiler bitfields to flags, change return type of enable_signaling to bool. Rework dma_fence_wait. Added dma_fence_is_signaled and dma_fence_wait_timeout. s/dma// and change exports to non GPL. Added fence_is_signaled and fence_enable_sw_signaling calls, add ability to override default wait operation. v10: remove event_queue, use a custom list, export try_to_wake_up from scheduler. Remove fence lock and use a global spinlock instead, this should hopefully remove all the locking headaches I was having on trying to implement this. enable_signaling is called with this lock held. v11: Use atomic ops for flags, lifting the need for some spin_lock_irqsaves. However I kept the guarantee that after fence_signal returns, it is guaranteed that enable_signaling has either been called to completion, or will not be called any more. Add contexts and seqno to base fence implementation. This allows you to wait for less fences, by testing for seqno + signaled, and then only wait on the later fence. Add FENCE_TRACE, FENCE_WARN, and FENCE_ERR. This makes debugging easier. An CONFIG_DEBUG_FENCE will be added to turn off the FENCE_TRACE spam, and another runtime option can turn it off at runtime. v12: Add CONFIG_FENCE_TRACE. Add missing documentation for the fence->context and fence->seqno members. v13: Fixup CONFIG_FENCE_TRACE kconfig description. Move fence_context_alloc to fence. Simplify fence_later. Kill priv member to fence_cb. v14: Remove priv argument from fence_add_callback, oops! v15: Remove priv from documentation. Explicitly include linux/atomic.h. v16: Add trace events. Import changes required by android syncpoints. v17: Use wake_up_state instead of try_to_wake_up. (Colin Cross) Fix up commit description for seqno_fence. (Rob Clark) v18: Rename release_fence to fence_release. Move to drivers/dma-buf/. Rename __fence_is_signaled and __fence_signal to *_locked. Rename __fence_init to fence_init. Make fence_default_wait return a signed long, and fix wait ops too. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Signed-off-by: Thierry Reding <thierry.reding@gmail.com> #use smp_mb__before_atomic() Acked-by: Sumit Semwal <sumit.semwal@linaro.org> Acked-by: Daniel Vetter <daniel@ffwll.ch> Reviewed-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-08dma-buf: move to drivers/dma-bufMaarten Lankhorst7-5/+5
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Acked-by: Sumit Semwal <sumit.semwal@linaro.org> Acked-by: Daniel Vetter <daniel@ffwll.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-06Linux 3.16-rc4Linus Torvalds1-1/+1
2014-07-05genirq: Fix memory leak when calling irq_free_hwirqs()Keith Busch1-2/+2
irq_free_hwirqs() always calls irq_free_descs() with a cnt == 0 which makes it a no-op since the interrupt count to free is decremented in itself. Fixes: 7b6ef1262549f6afc5c881aaef80beb8fd15f908 Signed-off-by: Keith Busch <keith.busch@intel.com> Acked-by: David Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/1404167084-8070-1-git-send-email-keith.busch@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-07-04MAINTAINERS: Add few more Keystone driversSantosh Shilimkar1-1/+25
Update MAINTAINERS file for recently added reset controller, AEMIF and clocksource driver for Keystone SOCs. The EMIF memory controller driver is also added along with AEMIF. Cc: Arnd Bergmann <arnd@arndb.de> Cc: Olof Johansson <olof@lixom.net> Cc: Kevin Hilman <khilman@linaro.org> Cc: Mike Turquette <mturquette@linaro.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Signed-off-by: Olof Johansson <olof@lixom.net>
2014-07-04MAINTAINERS: merge MXS entry into IMX oneShawn Guo1-7/+1
The mach-mxs platform is actually co-maintained by myself and pengutronix folks. Also it's hosted in the same kernel tree as IMX. So let's merge the entry into IMX one. Signed-off-by: Shawn Guo <shawn.guo@freescale.com> Acked-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Olof Johansson <olof@lixom.net>
2014-07-04ARM: sunxi: Reintroduce the restart code for A10/A20 SoCsMaxime Ripard1-0/+77
This partly reverts commits 553600502b84 (ARM: sunxi: Remove reset code from the platform) and 5e669ec583e2 (ARM: sunxi: Remove init_machine callback) for the sun4i, sun5i and sun7i families. This is needed because the watchdog counterpart of these commits was dropped, and didn't make it into 3.16. In order to still be able to reboot the board, we need to reintroduce that code. Of course, the long term view is still to get rid of that code in mach-sunxi. Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com> Signed-off-by: Olof Johansson <olof@lixom.net>
2014-07-04component: fix bug with legacy APIRussell King1-4/+6
Sachin Kamat reports that "component: add support for component match array" broke Exynos DRM due to a NULL pointer deref. Fix this. Reported-by: Sachin Kamat <sachin.kamat@samsung.com> Tested-by: Sachin Kamat <sachin.kamat@samsung.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-07-04arm64: fix el2_setup check of CurrentELMarc Zyngier3-4/+6
The CurrentEL system register reports the Current Exception Level of the CPU. It doesn't say anything about the stack handling, and yet we compare it to PSR_MODE_EL2t and PSR_MODE_EL2h. It works by chance because PSR_MODE_EL2t happens to match the right bits, but that's otherwise a very bad idea. Just check for the EL value instead. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> [catalin.marinas@arm.com: fixed arch/arm64/kernel/efi-entry.S] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-07-04arm64: mm: Make icache synchronisation logic huge page awareSteve Capper1-1/+2
The __sync_icache_dcache routine will only flush the dcache for the first page of a compound page, potentially leading to stale icache data residing further on in a hugetlb page. This patch addresses this issue by taking into consideration the order of the page when flushing the dcache. Reported-by: Mark Brown <broonie@linaro.org> Tested-by: Mark Brown <broonie@linaro.org> Signed-off-by: Steve Capper <steve.capper@linaro.org> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: <stable@vger.kernel.org> # v3.11+
2014-07-04arm64: mm: Fix horrendous config typoSteve Capper1-1/+1
The define ARM64_64K_PAGES is tested for rather than CONFIG_ARM64_64K_PAGES. Correct that typo here. Signed-off-by: Steve Capper <steve.capper@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-07-04ALSA: hda - restore BCLK M/N value as per CDCLK for HSW/BDW display HDA controllerMengdong Lin3-41/+66
For HSW/BDW display HD-A controller, hda_set_bclk() is defined to set BCLK by programming the M/N values as per the core display clock (CDCLK) queried from i915 display driver. And the audio driver will also set BCLK in azx_first_init() since the display driver can turn off the shared power in boot phase if only eDP is connected and M/N values will be lost and must be reprogrammed. Signed-off-by: Mengdong Lin <mengdong.lin@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2014-07-04drm/i915: provide interface for audio driver to query cdclkJani Nikula2-0/+22
For Haswell and Broadwell, if the display power well has been disabled, the display audio controller divider values EM4 M VALUE and EM5 N VALUE will have been lost. The CDCLK frequency is required for reprogramming them to generate 24MHz HD-A link BCLK. So provide a private interface for the audio driver to query CDCLK. This is a stopgap solution until a more generic interface between audio and display drivers has been implemented. Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Mengdong Lin <mengdong.lin@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2014-07-03ptrace,x86: force IRET path after a ptrace_stop()Tejun Heo2-0/+19
The 'sysret' fastpath does not correctly restore even all regular registers, much less any segment registers or reflags values. That is very much part of why it's faster than 'iret'. Normally that isn't a problem, because the normal ptrace() interface catches the process using the signal handler infrastructure, which always returns with an iret. However, some paths can get caught using ptrace_event() instead of the signal path, and for those we need to make sure that we aren't going to return to user space using 'sysret'. Otherwise the modifications that may have been done to the register set by the tracer wouldn't necessarily take effect. Fix it by forcing IRET path by setting TIF_NOTIFY_RESUME from arch_ptrace_stop_needed() which is invoked from ptrace_stop(). Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Oleg Nesterov <oleg@redhat.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-03lz4: add overrun checks to lz4_uncompress_unknownoutputsize()Greg Kroah-Hartman1-1/+5
Jan points out that I forgot to make the needed fixes to the lz4_uncompress_unknownoutputsize() function to mirror the changes done in lz4_decompress() with regards to potential pointer overflows. The only in-kernel user of this function is the zram code, which only takes data from a valid compressed buffer that it made itself, so it's not a big issue. But due to external kernel modules using this function, it's better to be safe here. Reported-by: Jan Beulich <JBeulich@suse.com> Cc: "Don A. Bailey" <donb@securitymouse.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-03[SCSI] use the scsi data buffer length to extract transfer sizeMartin K. Petersen1-1/+1
Commit 8846bab180fa introduced a helper that can be used to query the wire transfer size for a SCSI command taking protection information into account. However, some commands do not have a 1:1 mapping between the block range they work on and the payload size (discard, write same). After the scatterlist has been set up these requests use __data_len to store the number of bytes to report completion on. This means that callers of scsi_transfer_length() would get the wrong byte count for these types of requests. To overcome this we make scsi_transfer_length() use the scatterlist length in the scsi_data_buffer as basis for the wire transfer calculation instead of __data_len. Reported-by: Christoph Hellwig <hch@infradead.org> Debugged-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Fixes: d77e65350f2d82dfa0557707d505711f5a43c8fd Cc: stable@vger.kernel.org Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2014-07-03shmem: fix init_page_accessed use to stop !PageLRU bugHugh Dickins1-5/+10
Under shmem swapping load, I sometimes hit the VM_BUG_ON_PAGE(!PageLRU) in isolate_lru_pages() at mm/vmscan.c:1281! Commit 2457aec63745 ("mm: non-atomically mark page accessed during page cache allocation where possible") looks like interrupted work-in-progress. mm/filemap.c's call to init_page_accessed() is fine, but not mm/shmem.c's - shmem_write_begin() is clearly wrong to use it after shmem_getpage(), when the page is always visible in radix_tree, and often already on LRU. Revert change to shmem_write_begin(), and use init_page_accessed() or mark_page_accessed() appropriately for SGP_WRITE in shmem_getpage_gfp(). SGP_WRITE also covers shmem_symlink(), which did not mark_page_accessed() before; but since many other filesystems use [__]page_symlink(), which did and does mark the page accessed, consider this as rectifying an oversight. Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.cz> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Prabhakar Lad <prabhakar.csengg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-03kernel/printk/printk.c: revert "printk: enable interrupts before calling console_trylock_for_printk()"Andrew Morton1-26/+18
Revert commit 939f04bec1a4 ("printk: enable interrupts before calling console_trylock_for_printk()"). Andreas reported: : None of the post 3.15 kernel boot for me. They all hang at the GRUB : screen telling me it loaded and started the kernel, but the kernel : itself stops before it prints anything (or even replaces the GRUB : background graphics). 939f04bec1a4 is modest latency reduction. Revert it until we understand the reason for these failures. Reported-by: Andreas Bombe <aeb@debian.org> Cc: Jan Kara <jack@suse.cz> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-03tools/testing/selftests/ipc/msgque.c: improve error handling when not running as rootShuah Khan1-0/+5
The test fails in the middle when it is not run as root while accessing /proc/sys/kernel/msg_next_id. Changed it to check for root at the beginning of the test and exit if not root. Signed-off-by: Shuah Khan <shuah.kh@samsung.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Colin Ian King <colin.king@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-03fs/seq_file: fallback to vmalloc allocationHeiko Carstens1-9/+21
There are a couple of seq_files which use the single_open() interface. This interface requires that the whole output must fit into a single buffer. E.g. for /proc/stat allocation failures have been observed because an order-4 memory allocation failed due to memory fragmentation. In such situations reading /proc/stat is not possible anymore. Therefore change the seq_file code to fallback to vmalloc allocations which will usually result in a couple of order-0 allocations and hence also work if memory is fragmented. For reference a call trace where reading from /proc/stat failed: sadc: page allocation failure: order:4, mode:0x1040d0 CPU: 1 PID: 192063 Comm: sadc Not tainted 3.10.0-123.el7.s390x #1 [...] Call Trace: show_stack+0x6c/0xe8 warn_alloc_failed+0xd6/0x138 __alloc_pages_nodemask+0x9da/0xb68 __get_free_pages+0x2e/0x58 kmalloc_order_trace+0x44/0xc0 stat_open+0x5a/0xd8 proc_reg_open+0x8a/0x140 do_dentry_open+0x1bc/0x2c8 finish_open+0x46/0x60 do_last+0x382/0x10d0 path_openat+0xc8/0x4f8 do_filp_open+0x46/0xa8 do_sys_open+0x114/0x1f0 sysc_tracego+0x14/0x1a Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Tested-by: David Rientjes <rientjes@google.com> Cc: Ian Kent <raven@themaw.net> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Thorsten Diehl <thorsten.diehl@de.ibm.com> Cc: Andrea Righi <andrea@betterlinux.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Stefan Bader <stefan.bader@canonical.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-03/proc/stat: convert to single_open_size()Heiko Carstens1-20/+2
These two patches are supposed to "fix" failed order-4 memory allocations which have been observed when reading /proc/stat. The problem has been observed on s390 as well as on x86. To address the problem change the seq_file memory allocations to fallback to use vmalloc, so that allocations also work if memory is fragmented. This approach seems to be simpler and less intrusive than changing /proc/stat to use an interator. Also it "fixes" other users as well, which use seq_file's single_open() interface. This patch (of 2): Use seq_file's single_open_size() to preallocate a buffer that is large enough to hold the whole output, instead of open coding it. Also calculate the requested size using the number of online cpus instead of possible cpus, since the size of the output only depends on the number of online cpus. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Ian Kent <raven@themaw.net> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Thorsten Diehl <thorsten.diehl@de.ibm.com> Cc: Andrea Righi <andrea@betterlinux.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Stefan Bader <stefan.bader@canonical.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-03hwpoison: fix the handling path of the victimized page frame that belong to non-LRUChen Yucong1-4/+5
Until now, the kernel has the same policy to handle victimized page frames that belong to kernel-space(reserved/slab-subsystem) or non-LRU(unknown page state). In other word, the result of handling either of these victimized page frames is (IGNORED | FAILED), and the return value of memory_failure() is -EBUSY. This patch is to avoid that memory_failure() returns very soon due to the "true" value of (!PageLRU(p)), and it also ensures that action_result() can report more precise information("reserved kernel", "kernel slab", and "unknown page state") instead of "non LRU", especially for memory errors which are detected by memory-scrubbing. Andi said: : While running the mcelog test suite on 3.14 I hit the following VM_BUG_ON: : : soft_offline: 0x56d4: unknown non LRU page type 3ffff800008000 : page:ffffea000015b400 count:3 mapcount:2097169 mapping: (null) index:0xffff8800056d7000 : page flags: 0x3ffff800004081(locked|slab|head) : ------------[ cut here ]------------ : kernel BUG at mm/rmap.c:1495! : : I think what happened is that a LRU page turned into a slab page in : parallel with offlining. memory_failure initially tests for this case, : but doesn't retest later after the page has been locked. : : ... : : I ran this patch in a loop over night with some stress plus : the mcelog test suite running in a loop. I cannot guarantee it hit it, : but it should have given it a good beating. : : The kernel survived with no messages, although the mcelog test suite : got killed at some point because it couldn't fork anymore. Probably : some unrelated problem. : : So the patch is ok for me for .16. Signed-off-by: Chen Yucong <slaoub@gmail.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reported-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-03mm:vmscan: update the trace-vmscan-postprocess.pl for event vmscan/mm_vmscan_lru_isolateChen Yucong1-12/+2
When using trace-vmscan-postprocess.pl for checking the file/anon rate of scanning, we can find that it can not be performed. At the same time, the following message will be reported: WARNING: Format not as expected for event vmscan/mm_vmscan_lru_isolate 'file' != 'contig_taken' Fewer fields than expected in format at ./trace-vmscan-postprocess.pl line 171, <FORMAT> line 76. In trace-vmscan-postprocess.pl, (contig_taken, contig_dirty, and contig_failed) are be associated respectively to (nr_lumpy_taken, nr_lumpy_dirty, and nr_lumpy_failed) for lumpy reclaim. Via commit c53919adc045 ("mm: vmscan: remove lumpy reclaim"), lumpy reclaim had already been removed by Mel, but the update for trace-vmscan-postprocess.pl was missed. Signed-off-by: Chen Yucong <slaoub@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-03msync: fix incorrect fstart calculationNamjae Jeon1-1/+2
Fix a regression caused by 7fc34a62ca44 ("mm/msync.c: sync only the requested range in msync()"). xfstests generic/075 fail occured on ext4 data=journal mode because the intended range was not syncing due to wrong fstart calculation. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com> Reported-by: Eric Whitney <enwlinux@gmail.com> Tested-by: Eric Whitney <enwlinux@gmail.com> Acked-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Tested-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-03zram: revalidate disk after capacity changeMinchan Kim1-1/+4
Alexander reported mkswap on /dev/zram0 is failed if other process is opening the block device file. Step is as follows, 0. Reset the unused zram device. 1. Use a program that opens /dev/zram0 with O_RDWR and sleeps until killed. 2. While that program sleeps, echo the correct value to /sys/block/zram0/disksize. 3. Verify (e.g. in /proc/partitions) that the disk size is applied correctly. It is. 4. While that program still sleeps, attempt to mkswap /dev/zram0. This fails: mkswap: error: swap area needs to be at least 40 KiB When I investigated, the size get by ioctl(fd, BLKGETSIZE64, xxx) on mkswap to get a size of blockdev was zero although zram0 has right size by 2. The reason is zram didn't revalidate disk after changing capacity so that size of blockdev's inode is not uptodate until all of file is close. This patch should fix the BUG. Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: Alexander E. Patrakov <patrakov@gmail.com> Tested-by: Alexander E. Patrakov <patrakov@gmail.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Nitin Gupta <ngupta@vflare.org> Acked-by: Jerome Marchand <jmarchan@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-03tools: memory-hotplug fix unexpected operator errorShuah Khan1-1/+1
on-off-test uses "$UID != 0" to test for root, but $UID is a construct specific to bash. Using /bin/sh that isn't bash results in the following error (due to the "$UID" part expanding to nothing): ./on-off-test.sh: 9: [: !=: unexpected operator Change Makefile to use bash instead. Signed-off-by: Shuah Khan <shuah.kh@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-03tools: cpu-hotplug fix unexpected operator errorShuah Khan1-1/+1
on-off-test uses "$UID != 0" to test for root, but $UID is a construct specific to bash. Using /bin/sh that isn't bash results in the following error (due to the "$UID" part expanding to nothing): ./on-off-test.sh: 9: [: !=: unexpected operator Change Makefile to use bash instead. Signed-off-by: Shuah Khan <shuah.kh@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-03autofs4: fix false positive compile errorIan Kent1-1/+1
On strict build environments we can see: fs/autofs4/inode.c: In function 'autofs4_fill_super': fs/autofs4/inode.c:312: error: 'pgrp' may be used uninitialized in this function make[2]: *** [fs/autofs4/inode.o] Error 1 make[1]: *** [fs/autofs4] Error 2 make: *** [fs] Error 2 make: *** Waiting for unfinished jobs.... This is due to the use of pgrp_set being used to indicate pgrp has has been set rather than initializing pgrp itself. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-03slub: fix off by one in number of slab testsJoonsoo Kim1-3/+3
min_partial means minimum number of slab cached in node partial list. So, if nr_partial is less than it, we keep newly empty slab on node partial list rather than freeing it. But if nr_partial is equal or greater than it, it means that we have enough partial slabs so should free newly empty slab. Current implementation missed the equal case so if we set min_partial is 0, then, at least one slab could be cached. This is critical problem to kmemcg destroying logic because it doesn't works properly if some slabs is cached. This patch fixes this problem. Fixes 91cb69620284 ("slub: make dead memcg caches discard free slabs immediately"). Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-03mm: page_alloc: fix CMA area initialisation when pageblock > MAX_ORDERMichal Nazarewicz1-2/+14
With a kernel configured with ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE, the following is triggered at early boot: SMP: Total of 8 processors activated. devtmpfs: initialized Unable to handle kernel NULL pointer dereference at virtual address 00000008 pgd = fffffe0000050000 [00000008] *pgd=00000043fba00003, *pmd=00000043fba00003, *pte=00e0000078010407 Internal error: Oops: 96000006 [#1] SMP Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc864k+ #44 task: fffffe03bc040000 ti: fffffe03bc080000 task.ti: fffffe03bc080000 PC is at __list_add+0x10/0xd4 LR is at free_one_page+0x270/0x638 ... Call trace: __list_add+0x10/0xd4 free_one_page+0x26c/0x638 __free_pages_ok.part.52+0x84/0xbc __free_pages+0x74/0xbc init_cma_reserved_pageblock+0xe8/0x104 cma_init_reserved_areas+0x190/0x1e4 do_one_initcall+0xc4/0x154 kernel_init_freeable+0x204/0x2a8 kernel_init+0xc/0xd4 This happens because init_cma_reserved_pageblock() calls __free_one_page() with pageblock_order as page order but it is bigger than MAX_ORDER. This in turn causes accesses past zone->free_list[]. Fix the problem by changing init_cma_reserved_pageblock() such that it splits pageblock into individual MAX_ORDER pages if pageblock is bigger than a MAX_ORDER page. In cases where !CONFIG_HUGETLB_PAGE_SIZE_VARIABLE, which is all architectures expect for ia64, powerpc and tile at the moment, the “pageblock_order > MAX_ORDER” condition will be optimised out since both sides of the operator are constants. In cases where pageblock size is variable, the performance degradation should not be significant anyway since init_cma_reserved_pageblock() is called only at boot time at most MAX_CMA_AREAS times which by default is eight. Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Reported-by: Mark Salter <msalter@redhat.com> Tested-by: Mark Salter <msalter@redhat.com> Tested-by: Christopher Covington <cov@codeaurora.org> Cc: Mel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: <stable@vger.kernel.org> [3.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-03Btrfs: fix crash when starting transactionFilipe Manana1-0/+1
Often when starting a transaction we commit the currently running transaction, which can end up writing block group caches when the current process has its journal_info set to NULL (and not to a transaction). This makes our assertion at btrfs_check_data_free_space() (current_journal != NULL) fail, resulting in a crash/hang. Therefore fix it by setting journal_info. Two different traces of this issue follow below. 1) [51502.241936] BTRFS: assertion failed: current->journal_info, file: fs/btrfs/extent-tree.c, line: 3670 [51502.242213] ------------[ cut here ]------------ [51502.242493] kernel BUG at fs/btrfs/ctree.h:3964! [51502.242669] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC (...) [51502.244010] Call Trace: [51502.244010] [<ffffffffa02bc025>] btrfs_check_data_free_space+0x395/0x3a0 [btrfs] [51502.244010] [<ffffffffa02c3bdc>] btrfs_write_dirty_block_groups+0x4ac/0x640 [btrfs] [51502.244010] [<ffffffffa0357a6a>] commit_cowonly_roots+0x164/0x226 [btrfs] [51502.244010] [<ffffffffa02d53cd>] btrfs_commit_transaction+0x4ed/0xab0 [btrfs] [51502.244010] [<ffffffff8168ec7b>] ? _raw_spin_unlock+0x2b/0x40 [51502.244010] [<ffffffffa02d6259>] start_transaction+0x459/0x620 [btrfs] [51502.244010] [<ffffffffa02d67ab>] btrfs_start_transaction+0x1b/0x20 [btrfs] [51502.244010] [<ffffffffa02d73e1>] __unlink_start_trans+0x31/0xe0 [btrfs] [51502.244010] [<ffffffffa02dea67>] btrfs_unlink+0x37/0xc0 [btrfs] [51502.244010] [<ffffffff811bb054>] ? do_unlinkat+0x114/0x2a0 [51502.244010] [<ffffffff811baebc>] vfs_unlink+0xcc/0x150 [51502.244010] [<ffffffff811bb1a0>] do_unlinkat+0x260/0x2a0 [51502.244010] [<ffffffff811a9ef4>] ? filp_close+0x64/0x90 [51502.244010] [<ffffffff810aaea6>] ? trace_hardirqs_on_caller+0x16/0x1e0 [51502.244010] [<ffffffff81349cab>] ? trace_hardirqs_on_thunk+0x3a/0x3f [51502.244010] [<ffffffff811be9eb>] SyS_unlinkat+0x1b/0x40 [51502.244010] [<ffffffff81698452>] system_call_fastpath+0x16/0x1b [51502.244010] Code: 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 89 f1 48 c7 c2 71 13 36 a0 48 89 fe 31 c0 48 c7 c7 b8 43 36 a0 48 89 e5 e8 5d b0 32 e1 <0f> 0b 0f 1f 44 00 00 55 b9 11 00 00 00 48 89 e5 41 55 49 89 f5 [51502.244010] RIP [<ffffffffa03575da>] assfail.constprop.88+0x1e/0x20 [btrfs] 2) [25405.097230] BTRFS: assertion failed: current->journal_info, file: fs/btrfs/extent-tree.c, line: 3670 [25405.097488] ------------[ cut here ]------------ [25405.097767] kernel BUG at fs/btrfs/ctree.h:3964! [25405.097940] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC (...) [25405.100008] Call Trace: [25405.100008] [<ffffffffa02bc025>] btrfs_check_data_free_space+0x395/0x3a0 [btrfs] [25405.100008] [<ffffffffa02c3bdc>] btrfs_write_dirty_block_groups+0x4ac/0x640 [btrfs] [25405.100008] [<ffffffffa035755a>] commit_cowonly_roots+0x164/0x226 [btrfs] [25405.100008] [<ffffffffa02d53cd>] btrfs_commit_transaction+0x4ed/0xab0 [btrfs] [25405.100008] [<ffffffff8109c170>] ? bit_waitqueue+0xc0/0xc0 [25405.100008] [<ffffffffa02d6259>] start_transaction+0x459/0x620 [btrfs] [25405.100008] [<ffffffffa02d67ab>] btrfs_start_transaction+0x1b/0x20 [btrfs] [25405.100008] [<ffffffffa02e3407>] btrfs_create+0x47/0x210 [btrfs] [25405.100008] [<ffffffffa02d74cc>] ? btrfs_permission+0x3c/0x80 [btrfs] [25405.100008] [<ffffffff811bc63b>] vfs_create+0x9b/0x130 [25405.100008] [<ffffffff811bcf19>] do_last+0x849/0xe20 [25405.100008] [<ffffffff811b9409>] ? link_path_walk+0x79/0x820 [25405.100008] [<ffffffff811bd5b5>] path_openat+0xc5/0x690 [25405.100008] [<ffffffff810ab07d>] ? trace_hardirqs_on+0xd/0x10 [25405.100008] [<ffffffff811cdcd2>] ? __alloc_fd+0x32/0x1d0 [25405.100008] [<ffffffff811be2a3>] do_filp_open+0x43/0xa0 [25405.100008] [<ffffffff811cddf1>] ? __alloc_fd+0x151/0x1d0 [25405.100008] [<ffffffff811abcfc>] do_sys_open+0x13c/0x230 [25405.100008] [<ffffffff810aaea6>] ? trace_hardirqs_on_caller+0x16/0x1e0 [25405.100008] [<ffffffff811abe12>] SyS_open+0x22/0x30 [25405.100008] [<ffffffff81698452>] system_call_fastpath+0x16/0x1b [25405.100008] Code: 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 89 f1 48 c7 c2 51 13 36 a0 48 89 fe 31 c0 48 c7 c7 d0 43 36 a0 48 89 e5 e8 6d b5 32 e1 <0f> 0b 0f 1f 44 00 00 55 b9 11 00 00 00 48 89 e5 41 55 49 89 f5 [25405.100008] RIP [<ffffffffa03570ca>] assfail.constprop.88+0x1e/0x20 [btrfs] Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-07-03Btrfs: fix btrfs_print_leaf for skinny metadataJosef Bacik1-4/+5
We wouldn't actuall print the extent information if we had a skinny metadata item, this fixes that. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-07-03Btrfs: fix race of using total_bytes_pinnedLiu Bo1-4/+1
This percpu counter @total_bytes_pinned is introduced to skip unnecessary operations of 'commit transaction', it accounts for those space we may free but are stuck in delayed refs. And we zero out @space_info->total_bytes_pinned every transaction period so we have a better idea of how much space we'll actually free up by committing this transaction. However, we do the 'zero out' part a little earlier, before we actually unpin space, so we end up returning ENOSPC when we actually have free space that's just unpinned from committing transaction. xfstests/generic/074 complained then. This fixes it by actually accounting the percpu pinned number when 'unpin', and since it's protected by space_info->lock, the race is gone now. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-07-03btrfs: use E2BIG instead of EIO if compression does not helpDavid Sterba1-1/+1
Return codes got updated in 60e1975acb48fc3d74a3422b21dde74c977ac3d5 (btrfs: return errno instead of -1 from compression) lzo wrapper returns E2BIG in this case, do the same for zlib. Signed-off-by: David Sterba <dsterba@suse.cz>