path: root/tools/perf/scripts/python/syscall-counts.py
Age | Commit message | Author | Files | Lines
2014-06-25 | ARM: at91/dt: define sam9261ek slow crystal frequency | Alexandre Belloni | 1 | -0/+4
Define at91sam9261ek's slow crystal frequencies. Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Acked-by: Boris BREZILLON <boris.brezillon@free-electrons.com> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
2014-06-25 | ARM: at91/dt: sam9261: correctly define mainck | Alexandre Belloni | 1 | -2/+8
mainck (CKGR_MCFR register) is actually using main_osc (CKGR_MOR register). Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Acked-by: Boris BREZILLON <boris.brezillon@free-electrons.com> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
2014-06-25 | ARM: at91/dt: sam9n12: correct PLLA ICPLL and OUT values | Alexandre Belloni | 1 | -2/+2
ICPLL can only take 0 or 1, it got mixed with OUT which can be in the [0-3] range. Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Acked-by: Boris BREZILLON <boris.brezillon@free-electrons.com> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
2014-06-25 | ARM: at91/dt: sam9x5: correct PLLA ICPLL and OUT values | Alexandre Belloni | 1 | -2/+2
ICPLL can only take 0 or 1, it got mixed with OUT which can be in the [0-3] range. Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Acked-by: Boris BREZILLON <boris.brezillon@free-electrons.com> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
2014-06-25 | misc: atmel_pwm: fix Kconfig symbols | Nicolas Ferre | 1 | -1/+1
AT91 symbols AT91SAM9263, AT91SAM9RL, and AT91SAM9G45 do not exist and this patch changes them to their correct ARCH_* version. These symbols are chosen instead of the SOC_* ones because this driver is not converted to DT. Anyway, the ATMEL_PWM symbol and the associated driver will be removed soon, during the move to the PWM sub-system. Reported-by: Paul Bolle <pebolle@tiscali.nl> Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
2014-06-25 | powerpc: Don't skip ePAPR spin-table CPUs | Scott Wood | 1 | -1/+9
Commit 59a53afe70fd530040bdc69581f03d880157f15a "powerpc: Don't setup CPUs with bad status" broke ePAPR SMP booting. ePAPR says that CPUs that aren't presently running shall have status of disabled, with enable-method being used to determine whether the CPU can be enabled. Fix by checking for spin-table, which is currently the only supported enable-method. Signed-off-by: Scott Wood <scottwood@freescale.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Emil Medve <Emilian.Medve@Freescale.com> Cc: stable@vger.kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-25 | powerpc/module: Fix TOC symbol CRC | Laurent Dufour | 1 | -1/+10
The commit 71ec7c55ed91 introduced the magic symbol ".TOC." for ELFv2 ABI. This symbol is built manually and has no CRC value computed. A zero value is put in the CRC section to avoid modpost complaining about a missing CRC. Unfortunately, this breaks the kernel module loading when the kernel is relocated (kdump case for instance) because of the relocation applied to the kcrctab values. This patch computes a CRC value for the TOC symbol which will match the one computed by the kernel when it is relocated - aka '0 - relocate_start' done in maybe_relocated called by check_version (module.c). Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Cc: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-25 | powerpc/powernv: Remove OPAL v1 takeover | Michael Ellerman | 7 | -392/+2
In commit 27f4488872d9 "Add OPAL takeover from PowerVM" we added support for "takeover" on OPAL v1 machines. This was a mode of operation where we would boot under pHyp, and query for the presence of OPAL. If detected we would then do a special sequence to take over the machine, and the kernel would end up running in hypervisor mode. OPAL v1 was never a supported product, and was never shipped outside IBM. As far as we know no one is still using it. Newer versions of OPAL do not use the takeover mechanism. Although the query for OPAL should be harmless on machines with newer OPAL, we have seen a machine where it causes a crash in Open Firmware. The code in early_init_devtree() to copy boot_command_line into cmd_line was added in commit 817c21ad9a1f "Get kernel command line accross OPAL takeover", and AFAIK is only used by takeover, so should also be removed. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-24 | NFSv4: test SECINFO RPC_AUTH_GSS pseudoflavors for support | Andy Adamson | 4 | -45/+58
Fix nfs4_negotiate_security to create an rpc_clnt used to test each SECINFO returned pseudoflavor. Check credential creation (and gss_context creation) which is important for RPC_AUTH_GSS pseudoflavors which can fail for multiple reasons including mis-configuration. Don't call nfs4_negotiate in nfs4_submount as it was just called by nfs4_proc_lookup_mountpoint (nfs4_proc_lookup_common) Signed-off-by: Andy Adamson <andros@netapp.com> [Trond: fix corrupt return value from nfs_find_best_sec()] Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-24 | NFS Return -EPERM if no supported or matching SECINFO flavor | Andy Adamson | 1 | -7/+4
Do not return RPC_AUTH_UNIX if SECINFO reply tests fail. This prevents an infinite loop of NFS4ERR_WRONGSEC for non RPC_AUTH_UNIX mounts. Without this patch, a mount with no sec= option to a server that does not include RPC_AUTH_UNIX in the SECINFO return can be presented with an attempt to use RPC_AUTH_UNIX which will result in an NFS4ERR_WRONGSEC which will prompt the SECINFO call which will again try RPC_AUTH_UNIX.... Signed-off-by: Andy Adamson <andros@netapp.com> Tested-By: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
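The selection rule described above can be sketched as follows. This is a minimal illustration, not the kernel code: the function name pick_flavor and the string flavor names are hypothetical, and the only behaviour taken from the commit text is that a failed match yields -EPERM instead of a fallback to RPC_AUTH_UNIX.

```python
import errno

def pick_flavor(server_flavors, client_usable):
    """Return the first server-offered flavor the client can actually use.

    server_flavors: flavors listed in the SECINFO reply, in server order.
    client_usable:  flavors for which the client could create credentials.
    Both are plain strings here; that representation is an assumption.
    """
    for flavor in server_flavors:
        if flavor in client_usable:
            return flavor
    # No silent fallback to "sys" (RPC_AUTH_UNIX): a server that did not
    # offer it would answer NFS4ERR_WRONGSEC and restart the negotiation.
    return -errno.EPERM
```

With this shape the caller sees a hard error once, rather than bouncing between SECINFO and a flavor the server never offered.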
2014-06-24 | NFS check the return of nfs4_negotiate_security in nfs4_submount | Andy Adamson | 1 | -2/+5
Signed-off-by: Andy Adamson <andros@netapp.com> Tested-By: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-24 | NFS: Don't mark the data cache as invalid if it has been flushed | Trond Myklebust | 1 | -35/+40
Now that we have functions such as nfs_write_pageuptodate() that use the cache_validity flags to check if the data cache is valid or not, it is a little more important to keep the flags in sync with the state of the data cache. In particular, we'd like to ensure that if the data cache is empty, we don't start marking it as needing revalidation. Reported-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-24 | NFS: Clear NFS_INO_REVAL_PAGECACHE when we update the file size | Trond Myklebust | 1 | -0/+1
In nfs_update_inode(), if the change attribute is seen to change on the server, then we set NFS_INO_REVAL_PAGECACHE in order to make sure that we check the file size. However, if we also update the file size in the same function, we don't need to check it again. So make sure that we clear the NFS_INO_REVAL_PAGECACHE that was set earlier. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-24 | nfs: Fix cache_validity check in nfs_write_pageuptodate() | Scott Mayhew | 1 | -1/+3
NFS_INO_INVALID_DATA cannot be ignored, even if we have a delegation. We're still having some problems with data corruption when multiple clients are appending to a file and those clients are being granted write delegations on open.
To reproduce:
Client A:
  vi /mnt/`hostname -s`
  while :; do echo "XXXXXXXXXXXXXXX" >>/mnt/file; sleep $(( $RANDOM % 5 )); done
Client B:
  vi /mnt/`hostname -s`
  while :; do echo "YYYYYYYYYYYYYYY" >>/mnt/file; sleep $(( $RANDOM % 5 )); done
What's happening is that in nfs_update_inode() we're recognizing that the file size has changed and we're setting NFS_INO_INVALID_DATA accordingly, but then we ignore the cache_validity flags in nfs_write_pageuptodate() because we have a delegation. As a result, in nfs_updatepage() we're extending the write to cover the full page even though we've not read in the data to begin with.
Signed-off-by: Scott Mayhew <smayhew@redhat.com> Cc: <stable@vger.kernel.org> # v3.11+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-24 | aio: fix kernel memory disclosure in io_getevents() introduced in v3.10 | Benjamin LaHaise | 1 | -0/+3
A kernel memory disclosure was introduced in aio_read_events_ring() in v3.10 by commit a31ad380bed817aa25f8830ad23e1a0480fef797. The changes made to aio_read_events_ring() failed to correctly limit the index into ctx->ring_pages[], allowing an attacker to cause the subsequent kmap() of an arbitrary page with a copy_to_user() to copy the contents into userspace. This vulnerability has been assigned CVE-2014-0206. Thanks to Mateusz and Petr for disclosing this issue. This patch applies to v3.12+. A separate backport is needed for 3.10/3.11. Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Petr Matousek <pmatouse@redhat.com> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: stable@vger.kernel.org
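To illustrate why an unbounded index into ctx->ring_pages[] is dangerous, here is a small model. It is a sketch under stated assumptions: EVENTS_PER_PAGE, page_for_event and the list of page names are all hypothetical, and reducing the position modulo the ring size is only one plausible way to bound the index; the commit text above does not show the actual change.

```python
EVENTS_PER_PAGE = 4  # hypothetical value, kept small for illustration

def page_for_event(ring_pages, head, nr_events, clamp=True):
    """Return the page that holds event slot `head`.

    `head` models a ring position that userspace can overwrite, so it
    must not be trusted. With clamp=True it is first reduced into
    [0, nr_events), which keeps the page index inside ring_pages.
    """
    if clamp:
        head %= nr_events
    return ring_pages[head // EVENTS_PER_PAGE]
```

In this Python model an unclamped position raises IndexError; in C the same out-of-range index would simply read whatever pointer lies past the array, which is the disclosure the commit describes.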
2014-06-24 | aio: fix aio request leak when events are reaped by userspace | Benjamin LaHaise | 1 | -2/+1
The aio cleanups and optimizations by kmo that were merged into the 3.10 tree added a regression for userspace event reaping. Specifically, the reference counts are not decremented if the event is reaped in userspace, leading to the application being unable to submit further aio requests. This patch applies to 3.12+. A separate backport is required for 3.10/3.11. This issue was uncovered as part of CVE-2014-0206. Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Cc: stable@vger.kernel.org Cc: Kent Overstreet <kmo@daterainc.com> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Petr Matousek <pmatouse@redhat.com>
2014-06-24 | ARM: integrator: fix OF-related regression | Linus Walleij | 2 | -36/+13
Commit 07e461cd7e73a84f0e3757932b93cc80976fd749 "of: Ensure unique names without sacrificing determinism" caused a boot failure regression on the Integrator machines. The problem is probably caused by fiddling too much with the device tree population in the OF init function, such as passing the SoC bus device as parent when populating the device tree. This patch fixes the problem by:
- Avoiding an explicit lookup of the tree root
- Looking up the devices needed before device population from the match only, passing NULL as root
- Passing NULL as root and parent when calling of_platform_populate()
After this the Integrators boot again. Tested on Integrator/AP and Integrator/CP.
Cc: Grant Likely <grant.likely@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2014-06-24 | ARM: mvebu: Fix the improper use of the compatible string armada38x using a wildcard | Gregory CLEMENT | 6 | -7/+17
Wildcards in compatible strings should be avoided. "marvell,armada38x" was recently introduced but was not yet used. The armada 385 SoC is a superset of the armada 380 SoC (with more CPUs and more PCIe slots). So this patch replaces the use of "marvell,armada38x" by the "marvell,armada380" string. Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Link: https://lkml.kernel.org/r/1403533011-21339-1-git-send-email-gregory.clement@free-electrons.com Acked-by: Andrew Lunn <andrew@lunn.ch> Cc: <stable@vger.kernel.org> # v3.15+ Signed-off-by: Jason Cooper <jason@lakedaemon.net>
2014-06-24 | powerpc/kmemleak: Do not scan the DART table | Catalin Marinas | 1 | -0/+5
The DART table allocation is registered to kmemleak via the memblock_alloc_base() call. However, the DART table is later unmapped and the dart_tablebase VA is no longer accessible. This patch tells kmemleak not to scan this block, which avoids an unhandled paging request. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-24 | selftests/powerpc: Use the test harness for the TM DSCR test | Michael Ellerman | 2 | -4/+12
This gives us standardised success/failure output and also handles killing the test if it runs forever (2 minutes). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-24 | powerpc/cell: cbe_thermal.c: Cleaning up a variable is of the wrong type | Rickard Strandqvist | 1 | -1/+1
This variable is of the wrong type; everywhere it is used it should be an unsigned int rather than an int. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-24 | powerpc/kprobes: Fix jprobes on ABI v2 (LE) | Michael Ellerman | 1 | -3/+6
In commit 721aeaa9 "Build little endian ppc64 kernel with ABIv2", we missed some updates required in the kprobes code to make jprobes work when the kernel is built with ABI v2. Firstly update arch_deref_entry_point() to do the right thing. Now that we have added ppc_global_function_entry() we can just always use that, it will do the right thing for 32 & 64 bit and ABI v1 & v2. Secondly we need to update the code that sets up the register state before calling the jprobe handler. On ABI v1 we setup r2 to hold the TOC, on ABI v2 we need to populate r12 with the function entry point address. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-24 | powerpc/ftrace: Use pr_fmt() to namespace error messages | Michael Ellerman | 1 | -23/+20
The printks() in our ftrace code have no prefix, so they appear on the console with very little context, eg:
  Branch out of range
Use pr_fmt() & pr_err() to add a prefix. While we're at it, collapse a few split lines that don't need to be, and add a missing newline to one message.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-24 | powerpc/ftrace: Fix nop of modules on 64bit LE (ABIv2) | Michael Ellerman | 1 | -3/+4
There is a bug in the handling of the function entry when we are nopping out a branch from a module in ftrace. We compare the result of module_trampoline_target() with the value of ppc_function_entry(), and expect them to be equal. But they never will be. module_trampoline_target() will always return the global entry point of the function, whereas ppc_function_entry() will always return the local. Fix it by using the newly added ppc_global_function_entry(). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-24 | powerpc/ftrace: Fix inverted check of create_branch() | Michael Ellerman | 1 | -1/+1
In commit 24a1bdc35, "Fix ABIv2 issues with __ftrace_make_call", Anton changed the logic that creates and patches the branch, and added a thinko in the check of create_branch(). create_branch() returns the instruction that was generated, so a nonzero result means it succeeded and zero means it failed. The result is we can't ftrace modules:
  Branch out of range
  WARNING: at ../kernel/trace/ftrace.c:1638
  ftrace failed to modify [<d000000004ba001c>] fuse_req_init_context+0x1c/0x90 [fuse]
We should probably fix patch_instruction() to do that check and make the API saner, but that's a separate patch. For now just invert the test.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
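The return convention is easy to get backwards, so here is a small model of it. It is a sketch, not the kernel source: the encoder follows the standard PowerPC I-form branch layout, the constant names are chosen for illustration, and can_patch_buggy / can_patch_fixed are hypothetical stand-ins for the caller's check.

```python
BRANCH_SET_LINK = 0x1   # LK bit: branch and link
BRANCH_ABSOLUTE = 0x2   # AA bit: target is an absolute address

def create_branch(addr, target, flags=0):
    """Encode an I-form branch; return 0 when the target is unreachable."""
    offset = target
    if not (flags & BRANCH_ABSOLUTE):
        offset -= addr
    # The displacement is a signed 26-bit value and must be word aligned.
    if offset < -0x2000000 or offset > 0x1fffffc or (offset & 0x3):
        return 0
    return 0x48000000 | (flags & 0x3) | (offset & 0x03FFFFFC)

def can_patch_buggy(ip, target):
    # Inverted test: treats a valid (nonzero) instruction as a failure.
    return not create_branch(ip, target, BRANCH_SET_LINK)

def can_patch_fixed(ip, target):
    # A nonzero instruction word means the branch could be encoded.
    return bool(create_branch(ip, target, BRANCH_SET_LINK))
```

Under this model the buggy check rejects every reachable target and accepts only unreachable ones, which matches the "Branch out of range" message appearing for ordinary module functions.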
2014-06-24 | powerpc/ftrace: Fix typo in mask of opcode | Michael Ellerman | 1 | -1/+1
In commit 24a1bdc35, "Fix ABIv2 issues with __ftrace_make_call", Anton changed the logic that checks for the expected code sequence when patching a module. We missed the typo in the mask, 0xffff00000 should be 0xffff0000, which has the effect of making the test always true. That makes it impossible to ftrace against modules, eg:
  Unexpected call sequence: 48000008 e8410018
  WARNING: at ../kernel/trace/ftrace.c:1638
  ftrace failed to modify [<d000000007cf001c>] rng_dev_open+0x1c/0x70 [rng_core]
Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
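The effect of the extra zero can be checked with plain integer arithmetic. In this sketch the instruction word 0xe8410018 comes from the log line above, while the comparison value 0xe8410000 and the helper name is_unexpected are assumptions made for illustration; the commit text does not quote the exact condition.

```python
EXPECTED = 0xe8410000    # assumed comparison value, not quoted in the commit
TYPO_MASK = 0xffff00000  # nine hex digits: selects bits 20..35
GOOD_MASK = 0xffff0000   # eight hex digits: selects bits 16..31

def is_unexpected(op, mask):
    """Model the 'unexpected call sequence' test on a 32-bit instruction."""
    op &= 0xffffffff  # an instruction word has only 32 bits
    return (op & mask) != EXPECTED
```

Because the typo mask drops bits 16..19, and the assumed comparison value has one of those bits set, no 32-bit instruction can ever compare equal; the check reports "unexpected" for every input.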
2014-06-24 | powerpc: Add ppc_global_function_entry() | Michael Ellerman | 1 | -0/+11
ABIv2 has the concept of a global and local entry point to a function. In most cases we are interested in the local entry point, and so that is what ppc_function_entry() returns. However we have a case in the ftrace code where we want the global entry point, and there may be other places we need it too. Rather than special casing each, add an accessor. For ABIv1 and 32-bit there is only a single entry point, so we return that. That means it's safe for the caller to use this without also checking the ABI version. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-24 | powerpc/macintosh/smu.c: Fix closing brace followed by if | Rasmus Villemoes | 1 | -1/+2
A closing brace followed by "if" is almost certainly a mistake. Maybe "else if" was meant, but in this case it doesn't really matter. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-24 | powerpc: Remove __arch_swab* | Benjamin Herrenschmidt | 1 | -43/+0
The generic code uses gcc built-ins which work fine so there's no benefit in implementing our own anymore. We can't completely remove the ld/st_le* functions as some historical cruft still uses them, but that's next on the radar Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-24 | powerpc: Remove ancient DEBUG_SIG code | Michael Ellerman | 2 | -18/+0
We have some compile-time disabled debug code in signal_xx.c. It's from some ancient time BG, almost certainly part of the original port, given the very similar code on other arches. The show_unhandled_signal logic, added in d0c3d534a438 (2.6.24) is cleaner and prints more useful information, so drop the debug code. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-24 | powerpc/kerenl: Enable EEH for IO accessors | Gavin Shan | 1 | -10/+10
In arch/powerpc/kernel/iomap.c, many of the IO read accessors fail to check for EEH errors, as Ben pointed out. This patch fixes that. For the write accessors, we only change the functions they call so that they look similar to their read counterparts. Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-23 | ocfs2/dlm: do not purge lockres that is queued for assert master | Xue jiufei | 4 | -6/+55
When the workqueue is delayed, a lockres may be purged while it is still queued for master assert. This may trigger BUG() as follows.
N1: dlm_get_lockres() ->dlm_do_master_requery
N2: is the master of lockres, so queue assert_master work
N2: dlm_thread() starts running and purges the lockres
N2: dlm_assert_master_worker() sends assert master message to other nodes
N1: receiving the assert_master message, set master to N2
dlmlock_remote() sends a create_lock message to N2 but receives DLM_IVLOCKID; if it is a RECOVERY lockres, this triggers the BUG().
Another BUG() is triggered when N3 becomes the new master and sends assert_master to N1: N1 will trigger the BUG() because the owner doesn't match. So we should not purge a lockres when it is queued for assert master.
Signed-off-by: joyce.xue <xuejiufei@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-23 | ocfs2: do not return DLM_MIGRATE_RESPONSE_MASTERY_REF to avoid endless,loop during umount | jiangyiwen | 1 | -5/+9
The following case may lead to endless loop during umount.
node A: umount volume, migrate lockres1 to B
node D: want to lock lockres1, send MASTER_REQUEST_MSG to C
node C: init block mle
node B: send MIGRATE_REQUEST_MSG to C
node C: find a block mle, and then return DLM_MIGRATE_RESPONSE_MASTERY_REF to B
node B: set C in refmap
node A: umount successfully
node B: try to umount, endless loop occurs when migrate lockres1 since C is in refmap
So we can fix this endless loop case by only returning DLM_MIGRATE_RESPONSE_MASTERY_REF if it has a mastery mle when receiving MIGRATE_REQUEST_MSG.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: jiangyiwen <jiangyiwen@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Xue jiufei <xuejiufei@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-23 | ocfs2: manually do the iput once ocfs2_add_entry failed in ocfs2_symlink and ocfs2_mknod | jiangyiwen | 1 | -0/+27
When the call to ocfs2_add_entry() fails in ocfs2_symlink() or ocfs2_mknod(), iput() will not be called during dput(dentry) because d_instantiate() was never called, and this causes umount to hang. Signed-off-by: jiangyiwen <jiangyiwen@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-23 | ocfs2: fix a tiny race when running dirop_fileop_racer | Yiwen Jiang | 2 | -2/+96
When running dirop_fileop_racer we found a deadlock case. Two nodes, say Node A and Node B, mount the same ocfs2 volume. Create /race/16/1 in the filesystem, and let the inode number of dir 16 be less than the inode number of dir race.
Node A: mv /race/16/1 /race/
Node B (right after Node A has got the EX mode of /race/16/ and tries to get EX mode of /race): ls /race/16/
In this case, Node A has got the EX mode of /race/16/, and wants to get EX mode of /race/. Node B has got the PR mode of /race/, and wants to get the PR mode of /race/16/. Since EX and PR are mutually exclusive, deadlock happens. This patch fixes this case by locking in ancestor order before trying inode number order.
Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
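The ordering rule can be sketched as a small function. This is an illustration only: lock_order and the (path, inode number) tuples are hypothetical, and reading "ancestor order" as "lock the ancestor directory first" is an assumption about the commit's intent.

```python
from pathlib import PurePosixPath

def lock_order(dir_a, dir_b):
    """Return two directories in the order their locks should be taken.

    Each directory is a (path, inode_number) tuple. If one directory is
    an ancestor of the other, the ancestor is locked first; only
    unrelated directories fall back to inode number order.
    """
    path_a, path_b = PurePosixPath(dir_a[0]), PurePosixPath(dir_b[0])
    if path_a in path_b.parents:
        return [dir_a, dir_b]
    if path_b in path_a.parents:
        return [dir_b, dir_a]
    return sorted([dir_a, dir_b], key=lambda d: d[1])
```

For the scenario in the commit, the rename would then take /race before /race/16 even though dir 16 has the smaller inode number, which is the same order a path lookup such as ls /race/16/ follows.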
2014-06-23 | ocfs2/dlm: fix misuse of list_move_tail() in dlm_run_purge_list() | Xue jiufei | 1 | -1/+1
When a lockres is in the purge list but is still in use, it should be moved to the tail of the purge list. dlm_thread will then continue to check the next lockres in the purge list. However, the code list_move_tail(&dlm->purge_list, &lockres->purge) does *no* movement, so dlm_thread will purge the same lockres in this loop again and again. If it is in use for a long time, other lockres will not be processed. Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Signed-off-by: joyce.xue <xuejiufei@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
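Why the swapped arguments do nothing can be seen in a small model of a circular doubly linked list. This is a sketch that mimics the semantics of the kernel list helpers in Python; the Node class and walk() are illustrative, and it assumes the lockres being examined is the first entry on the purge list.

```python
class Node:
    """A list head or list entry; an unlinked node points at itself."""
    def __init__(self, name):
        self.name = name
        self.prev = self.next = self

def list_del(entry):
    # Unlink entry; its own prev/next pointers are left untouched.
    entry.prev.next = entry.next
    entry.next.prev = entry.prev

def list_add_tail(new, head):
    # Insert `new` immediately before `head`.
    prev = head.prev
    prev.next = new
    new.prev = prev
    new.next = head
    head.prev = new

def list_move_tail(entry, head):
    # Correct argument order: the entry to move first, then the list head.
    list_del(entry)
    list_add_tail(entry, head)

def walk(head):
    # Collect entry names in list order, starting after the head.
    names, cur = [], head.next
    while cur is not head:
        names.append(cur.name)
        cur = cur.next
    return names
```

Calling list_move_tail(head, first_entry) removes the head and reinserts it just before the first entry, which is exactly where it already was, so the order is unchanged; list_move_tail(first_entry, head) rotates the entry to the tail as intended.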
2014-06-23 | ocfs2: refcount: take rw_lock in ocfs2_reflink | Wengang Wang | 1 | -0/+8
This patch tries to fix this crash:
 #5 [ffff88003c1cd690] do_invalid_op at ffffffff810166d5
 #6 [ffff88003c1cd730] invalid_op at ffffffff8159b2de
    [exception RIP: ocfs2_direct_IO_get_blocks+359]
    RIP: ffffffffa05dfa27  RSP: ffff88003c1cd7e8  RFLAGS: 00010202
    RAX: 0000000000000000  RBX: ffff88003c1cdaa8  RCX: 0000000000000000
    RDX: 000000000000000c  RSI: ffff880027a95000  RDI: ffff88003c79b540
    RBP: ffff88003c1cd858   R8: 0000000000000000   R9: ffffffff815f6ba0
    R10: 00000000000001c9  R11: 00000000000001c9  R12: ffff88002d271500
    R13: 0000000000000001  R14: 0000000000000000  R15: 0000000000001000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff88003c1cd860] do_direct_IO at ffffffff811cd31b
 #8 [ffff88003c1cd950] direct_IO_iovec at ffffffff811cde9c
 #9 [ffff88003c1cd9b0] do_blockdev_direct_IO at ffffffff811ce764
#10 [ffff88003c1cdb80] __blockdev_direct_IO at ffffffff811ce7cc
#11 [ffff88003c1cdbb0] ocfs2_direct_IO at ffffffffa05df756 [ocfs2]
#12 [ffff88003c1cdbe0] generic_file_direct_write_iter at ffffffff8112f935
#13 [ffff88003c1cdc40] ocfs2_file_write_iter at ffffffffa0600ccc [ocfs2]
#14 [ffff88003c1cdd50] do_aio_write at ffffffff8119126c
#15 [ffff88003c1cddc0] aio_rw_vect_retry at ffffffff811d9bb4
#16 [ffff88003c1cddf0] aio_run_iocb at ffffffff811db880
#17 [ffff88003c1cde30] io_submit_one at ffffffff811dc238
#18 [ffff88003c1cde80] do_io_submit at ffffffff811dc437
#19 [ffff88003c1cdf70] sys_io_submit at ffffffff811dc530
#20 [ffff88003c1cdf80] system_call_fastpath at ffffffff8159a159
It crashes at
  BUG_ON(create && (ext_flags & OCFS2_EXT_REFCOUNTED));
in ocfs2_direct_IO_get_blocks. ocfs2_direct_IO_get_blocks is expecting the OCFS2_EXT_REFCOUNTED be removed in ocfs2_prepare_inode_for_write() if it was there. But no cluster lock is taken during the time before (or inside) ocfs2_prepare_inode_for_write() and after ocfs2_direct_IO_get_blocks().
It can happen in this case:
Node A(which crashes)                    Node B
------------------------                 ---------------------------
ocfs2_file_aio_write
  ocfs2_prepare_inode_for_write
    ocfs2_inode_lock
    ...
    ocfs2_inode_unlock
  #no refcount found
....
                                         ocfs2_reflink
                                           ocfs2_inode_lock
                                           ...
                                           ocfs2_inode_unlock
                                         #now, refcount flag set on extent
                                         ...
                                         flush change to disk
ocfs2_direct_IO_get_blocks
  ocfs2_get_clusters
    #extent map miss
    #buffer_head miss
    read extents from disk
  found refcount flag on extent
  crash..
Fix: Take rw_lock in ocfs2_reflink path
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-23 | ocfs2: revert "ocfs2: fix NULL pointer dereference when dismount and ocfs2rec simultaneously" | Xue jiufei | 1 | -6/+2
75f82eaa502c ("ocfs2: fix NULL pointer dereference when dismount and ocfs2rec simultaneously") may cause umount to hang while shutting down the truncate log. The situation is as follows:
ocfs2_dismount_volume
-> ocfs2_recovery_exit
-> free osb->recovery_map
-> ocfs2_truncate_shutdown
-> lock global bitmap inode
-> ocfs2_wait_for_recovery
-> check whether osb->recovery_map->rm_used is zero
Because osb->recovery_map is already freed, rm_used can hold any value, so umount may hang.
Signed-off-by: joyce.xue <xuejiufei@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-23 | ocfs2: fix deadlock when two nodes are converting same lock from PR to EX and idletimeout closes conn | Tariq Saeed | 1 | -5/+13
Orabug: 18639535
Two node cluster and both nodes hold a lock at PR level and both want to convert to EX at the same time. Master node 1 has sent BAST and then closes the connection due to idle timeout. Node 0 receives BAST, sends unlock req with cancel flag but gets error -ENOTCONN. The problem is this error is ignored in dlm_send_remote_unlock_request() on the **incorrect** assumption that the master is dead. See NOTE in comment why it returns DLM_NORMAL. Upon getting DLM_NORMAL, node 0 proceeds to send convert (without cancel flag) which fails with -ENOTCONN. It waits 5 sec and resends. This time it gets DLM_IVLOCKID from the master since the lock is not found in grant; it had been moved to the converting queue in response to the conv PR->EX req. No way out.
Node 1 (master)                          Node 0
===============                          ======
lock mode PR                             PR
convert PR -> EX
mv grant -> convert and queue BAST
...
                     <-----------------  convert PR -> EX
convert queue looks like this:
((node 1, PR -> EX) (node 0, PR -> EX))
...
BAST (want PR -> NL)
                     ----------------->
...
idle timeout, conn closed
                                         ...
                                         In response to BAST,
                                         sends unlock with cancel convert flag,
                                         gets -ENOTCONN. Ignores and
                                         sends remote convert request,
                                         gets -ENOTCONN, waits 5 sec, retries
...
reconnects
                     <-----------------  convert req goes through on next try
does not find lock on grant queue
status DLM_IVLOCKID
                     ----------------->
...
No way out.
Fix is to keep retrying unlock with cancel flag until it succeeds or the master dies.
Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-23 | ocfs2: should add inode into orphan dir after updating entry in ocfs2_rename() | alex chen | 1 | -11/+11
There are two files a and b in dir /mnt/ocfs2.
node A: mv a b
In ocfs2_rename(), after calling ocfs2_orphan_add(), the inode of file b will be added into the orphan dir. If ocfs2_update_entry() fails, ocfs2_rename() returns an error and the mv operation fails, but file b still exists in the parent dir.
ocfs2_queue_orphan_scan -> ocfs2_queue_recovery_completion -> ocfs2_complete_recovery -> ocfs2_recover_orphans
The inode of file b will be put with iput().
ocfs2_evict_inode -> ocfs2_delete_inode -> ocfs2_wipe_inode -> ocfs2_remove_inode
OCFS2_VALID_FL in the inode i_flags will be cleared. File b can still be accessed on node B.
node B: ls /mnt/ocfs2
When file b is first read with ocfs2_read_inode_block(), the inode is validated using ocfs2_validate_inode_block(). Because OCFS2_VALID_FL is not set in the inode i_flags, the file system becomes read-only. So we should add the inode into the orphan dir after updating the entry in ocfs2_rename().
Signed-off-by: alex.chen <alex.chen@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-23 | mm: fix crashes from mbind() merging vmas | Hugh Dickins | 1 | -26/+20
In v2.6.34 commit 9d8cebd4bcd7 ("mm: fix mbind vma merge problem") introduced vma merging to mbind(), but it should have also changed the convention of passing start vma from queue_pages_range() (formerly check_range()) to new_vma_page(): vma merging may have already freed that structure, resulting in BUG at mm/mempolicy.c:1738 and probably worse crashes. Fixes: 9d8cebd4bcd7 ("mm: fix mbind vma merge problem") Reported-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Tested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: <stable@vger.kernel.org> [2.6.34+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-23 | checkpatch: reduce false positives when checking void function return statements | Joe Perches | 1 | -5/+10
The previous patch had a few too many false positives on styles that should be acceptable. Signed-off-by: Joe Perches <joe@perches.com> Tested-by: Anish Bhatt <anish@chelsio.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-23 | ia64: arch/ia64/include/uapi/asm/fcntl.h needs personality.h | Andrew Morton | 1 | -0/+1
fs/notify/fanotify/fanotify_user.c: In function 'SYSC_fanotify_init':
fs/notify/fanotify/fanotify_user.c:726: error: implicit declaration of function 'personality'
fs/notify/fanotify/fanotify_user.c:726: error: 'PER_LINUX32' undeclared (first use in this function)
fs/notify/fanotify/fanotify_user.c:726: error: (Each undeclared identifier is reported only once
fs/notify/fanotify/fanotify_user.c:726: error: for each function it appears in.)
Reported-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Will Woods <wwoods@redhat.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: <stable@vger.kernel.org> [3.15.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-23DMA, CMA: fix possible memory leakJoonsoo Kim1-1/+11
We should free the memory for the bitmap when we find a zone mismatch; otherwise this memory will leak. Additionally, copy the code comment from PPC KVM's CMA code to explain why we need to check for a zone mismatch. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Reviewed-by: Michal Nazarewicz <mina86@mina86.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: Alexander Graf <agraf@suse.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
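The leak pattern described in the commit above can be modelled in userspace Python. This is only an illustrative sketch, not the kernel code: `Allocator`, `activate_area`, and the zone names are hypothetical, and the only point shown is that the bitmap allocated up front is released on the zone-mismatch error path.

```python
class Allocator:
    """Hypothetical stand-in that tracks live allocations (not a kernel API)."""

    def __init__(self):
        self.live = set()
        self._next_id = 0

    def alloc(self):
        self._next_id += 1
        self.live.add(self._next_id)
        return self._next_id

    def free(self, handle):
        self.live.discard(handle)


def activate_area(allocator, page_zones):
    """Return a bitmap handle, or None if the pages span more than one zone.

    page_zones is assumed to be a non-empty list of zone names.
    """
    bitmap = allocator.alloc()
    first_zone = page_zones[0]
    for zone in page_zones:
        if zone != first_zone:
            # Error path: release the bitmap before bailing out,
            # otherwise the allocation would leak.
            allocator.free(bitmap)
            return None
    return bitmap
```

Checking `allocator.live` after a failed call is a simple way to confirm that nothing stays allocated on the error path.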
2014-06-23slab: fix oops when reading /proc/slab_allocatorsJoonsoo Kim1-19/+71
Commit b1cb0982bdd6 ("change the management method of free objects of the slab") introduced a bug in the slab leak detector ('/proc/slab_allocators'). The detector works as follows: 1. traverse all objects on all the slabs. 2. determine whether each object is active or not. 3. if active, print who allocated the object. But that commit changed how free objects are managed, so the logic determining whether an object is active also changed. Before, we regarded an object in the cpu caches as inactive, but with this commit we mistakenly regard an object in the cpu caches as active. This introduces a kernel oops if DEBUG_PAGEALLOC is enabled. If DEBUG_PAGEALLOC is enabled, kernel_map_pages() is used to detect who corrupts free memory in the slab. It unmaps the page table mapping if an object is free and maps it if the object is active. When the slab leak detector checks an object in the cpu caches, it mistakenly thinks the object is active and so tries to access the object's memory to retrieve the caller of the allocation. At this point, the page table mapping for this object doesn't exist, so an oops occurs. Following is the oops message reported by Dave. It blew up when something tried to read /proc/slab_allocators (Just cat it, and you should see the oops below) Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: [snip...] 
CPU: 1 PID: 9386 Comm: trinity-c33 Not tainted 3.14.0-rc5+ #131 task: ffff8801aa46e890 ti: ffff880076924000 task.ti: ffff880076924000 RIP: 0010:[<ffffffffaa1a8f4a>] [<ffffffffaa1a8f4a>] handle_slab+0x8a/0x180 RSP: 0018:ffff880076925de0 EFLAGS: 00010002 RAX: 0000000000001000 RBX: 0000000000000000 RCX: 000000005ce85ce7 RDX: ffffea00079be100 RSI: 0000000000001000 RDI: ffff880107458000 RBP: ffff880076925e18 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 000000000000000f R12: ffff8801e6f84000 R13: ffffea00079be100 R14: ffff880107458000 R15: ffff88022bb8d2c0 FS: 00007fb769e45740(0000) GS:ffff88024d040000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff8801e6f84ff8 CR3: 00000000a22db000 CR4: 00000000001407e0 DR0: 0000000002695000 DR1: 0000000002695000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000070602 Call Trace: leaks_show+0xce/0x240 seq_read+0x28e/0x490 proc_reg_read+0x3d/0x80 vfs_read+0x9b/0x160 SyS_read+0x58/0xb0 tracesys+0xd4/0xd9 Code: f5 00 00 00 0f 1f 44 00 00 48 63 c8 44 3b 0c 8a 0f 84 e3 00 00 00 83 c0 01 44 39 c0 72 eb 41 f6 47 1a 01 0f 84 e9 00 00 00 89 f0 <4d> 8b 4c 04 f8 4d 85 c9 0f 84 88 00 00 00 49 8b 7e 08 4d 8d 46 RIP handle_slab+0x8a/0x180 To fix the problem, I introduce an object status buffer on each slab. With this, we can track object status precisely, so the slab leak detector will not access inactive objects and no kernel oops will occur. The memory overhead of this fix is imposed only on CONFIG_DEBUG_SLAB_LEAK, which is mainly used for debugging, so the overhead isn't a big problem. 
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reported-by: Dave Jones <davej@redhat.com> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
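The idea of a per-slab object status buffer, as described in the commit above, can be sketched in Python. This is a simplified model under stated assumptions, not the kernel implementation: `Slab`, `Status`, and `report_callers` are hypothetical names, and "accessing object memory" is represented by reading a caller entry.

```python
from enum import Enum


class Status(Enum):
    INACTIVE = 0
    ACTIVE = 1


class Slab:
    """Hypothetical model of one slab with a per-object status buffer."""

    def __init__(self, num_objects):
        self.status = [Status.INACTIVE] * num_objects
        self.caller = [None] * num_objects

    def allocate(self, index, caller):
        self.status[index] = Status.ACTIVE
        self.caller[index] = caller

    def release(self, index):
        # A released object may still sit in a per-CPU cache;
        # the status buffer records it as inactive regardless.
        self.status[index] = Status.INACTIVE


def report_callers(slab):
    """Collect allocation callers for active objects only.

    Inactive objects are skipped without touching their data, which is
    the property the status buffer is meant to guarantee.
    """
    return [
        slab.caller[i]
        for i, state in enumerate(slab.status)
        if state is Status.ACTIVE
    ]
```

The detector consults only the status buffer to decide which objects to inspect, so it does not depend on how free objects happen to be cached.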
2014-06-23shmem: fix faulting into a hole while it's punchedHugh Dickins1-4/+52
Trinity finds that mmap access to a hole while it's punched from shmem can prevent the madvise(MADV_REMOVE) or fallocate(FALLOC_FL_PUNCH_HOLE) from completing, until the reader chooses to stop; with the puncher's hold on i_mutex locking out all other writers until it can complete. It appears that the tmpfs fault path is too light in comparison with its hole-punching path, lacking an i_data_sem to obstruct it; but we don't want to slow down the common case. Extend shmem_fallocate()'s existing range notification mechanism, so shmem_fault() can refrain from faulting pages into the hole while it's punched, waiting instead on i_mutex (when safe to sleep; or repeatedly faulting when not). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Hugh Dickins <hughd@google.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Cc: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-23mm: let mm_find_pmd fix buggy race with THP faultHugh Dickins4-13/+20
Trinity has reported: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 IP: __lock_acquire (kernel/locking/lockdep.c:3070 (discriminator 1)) CPU: 6 PID: 16173 Comm: trinity-c364 Tainted: G W 3.15.0-rc1-next-20140415-sasha-00020-gaa90d09 #398 lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151) remove_migration_pte (mm/migrate.c:137) rmap_walk (mm/rmap.c:1628 mm/rmap.c:1699) remove_migration_ptes (mm/migrate.c:224) migrate_pages (mm/migrate.c:922 mm/migrate.c:960 mm/migrate.c:1126) migrate_misplaced_page (mm/migrate.c:1733) __handle_mm_fault (mm/memory.c:3762 mm/memory.c:3812 mm/memory.c:3925) handle_mm_fault (mm/memory.c:3948) __get_user_pages (mm/memory.c:1851) __mlock_vma_pages_range (mm/mlock.c:255) __mm_populate (mm/mlock.c:711) SyS_mlockall (include/linux/mm.h:1799 mm/mlock.c:817 mm/mlock.c:791) I believe this comes about because, whereas collapsing and splitting THP functions take anon_vma lock in write mode (which excludes concurrent rmap walks), faulting THP functions (write protection and misplaced NUMA) do not - and mostly they do not need to. But they do use a pmdp_clear_flush(), set_pmd_at() sequence which, for an instant (indeed, for a long instant, given the inter-CPU TLB flush in there), leaves *pmd neither present nor trans_huge. Which can confuse a concurrent rmap walk, as when removing migration ptes, seen in the dumped trace. Although that rmap walk has a 4k page to insert, anon_vmas containing THPs are in no way segregated from 4k-page anon_vmas, so the 4k-intent mm_find_pmd() does need to cope with that instant when a trans_huge pmd is temporarily absent. I don't think we need to strengthen the locking at the THP end: it's easily handled with an ACCESS_ONCE() before testing both conditions. 
And since mm_find_pmd() had only one caller who wanted a THP rather than a pmd, let's slightly repurpose it to fail when it hits a THP or non-present pmd, and open code split_huge_page_address() again. Signed-off-by: Hugh Dickins <hughd@google.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Bob Liu <bob.liu@oracle.com> Cc: Christoph Lameter <cl@gentwo.org> Cc: Dave Jones <davej@redhat.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
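The "read once, then test both conditions" pattern from the commit above can be illustrated with a small Python model. It is a sketch of the logic only: `PmdValue`, `PmdSlot`, and `find_pmd` are hypothetical names, and a plain Python attribute read merely stands in for ACCESS_ONCE(), which in C concerns compiler behaviour that Python does not reproduce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PmdValue:
    """Hypothetical immutable snapshot of a pmd entry's state."""
    present: bool
    trans_huge: bool


class PmdSlot:
    """Hypothetical mutable slot standing in for *pmd."""

    def __init__(self, value):
        self.value = value


def find_pmd(slot):
    """Return the slot only for a present, non-huge pmd; otherwise None."""
    # Take one snapshot and test both conditions on it, so a concurrent
    # update between the two tests cannot yield an inconsistent view.
    snapshot = slot.value
    if not snapshot.present or snapshot.trans_huge:
        return None
    return slot
```

In this model the transient state described in the commit message (neither present nor trans_huge) makes the lookup fail rather than hand back an entry the caller cannot use.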
2014-06-23mm: thp: fix DEBUG_PAGEALLOC oops in copy_page_rep()Hugh Dickins1-4/+35
Trinity has for over a year been reporting a CONFIG_DEBUG_PAGEALLOC oops in copy_page_rep() called from copy_user_huge_page() called from do_huge_pmd_wp_page(). I believe this is a DEBUG_PAGEALLOC false positive, due to the source page being split, and a tail page freed, while copy is in progress; and not a problem without DEBUG_PAGEALLOC, since the pmd_same() check will prevent a miscopy from being made visible. Fix by adding get_user_huge_page() and put_user_huge_page(): reducing to the usual get_page() and put_page() on head page in the usual config; but get and put references to all of the tail pages when DEBUG_PAGEALLOC. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Cc: Dave Jones <davej@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-23kernel/watchdog.c: print traces for all cpus on lockup detectionAaron Tomlin5-0/+73
A 'softlockup' is defined as a bug that causes the kernel to loop in kernel mode for more than a predefined period of time, without giving other tasks a chance to run. Currently, upon detection of this condition by the per-cpu watchdog task, debug information (including a stack trace) is sent to the system log. On some occasions, we have observed that the "victim" rather than the actual "culprit" (i.e. the owner/holder of the contended resource) is reported to the user. Often this information has proven to be insufficient to assist debugging efforts. To avoid loss of useful debug information, for architectures which support NMI, this patch makes it possible to improve soft lockup reporting. This is accomplished by issuing an NMI to each cpu to obtain a stack trace. If NMI is not supported, we just fall back to the old method. A sysctl and a boot-time parameter are available to toggle this feature. [dzickus@redhat.com: add CONFIG_SMP in certain areas] [akpm@linux-foundation.org: additional CONFIG_SMP=n optimisations] [mq@suse.cz: fix warning] Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Jan Moskyto Matejka <mq@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-23nmi: provide the option to issue an NMI back trace to every cpu but currentAaron Tomlin5-13/+38
Sometimes it is preferable not to use the trigger_all_cpu_backtrace() routine when one wants to avoid capturing a back trace for current, for instance if one was captured recently. This patch provides a new routine, trigger_allbutself_cpu_backtrace(), which offers the flexibility to issue an NMI to every cpu but current and capture a back trace accordingly. Patch x86 and sparc to support the new routine. [dzickus@redhat.com: add stub in #else clause] [dzickus@redhat.com: don't print message in single processor case, wrap with get/put_cpu based on Oleg's suggestion] [sfr@canb.auug.org.au: undo C99ism] Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
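The difference between an "all cpus" and an "all but self" back trace request, as described in the commit above, comes down to which cpus are targeted. A minimal Python sketch follows; `cpus_to_backtrace` and its parameters are hypothetical names chosen for illustration, not kernel interfaces.

```python
def cpus_to_backtrace(online_cpus, current_cpu, include_self):
    """Return the sorted list of cpus that would receive the NMI.

    online_cpus  -- iterable of online cpu numbers (assumed input)
    current_cpu  -- cpu number the request is issued from
    include_self -- True models the "all cpus" variant,
                    False models the "all but self" variant
    """
    targets = set(online_cpus)
    if not include_self:
        # Skip the requesting cpu, e.g. because its trace was
        # already captured by the caller.
        targets.discard(current_cpu)
    return sorted(targets)
```

On a single-processor system the "all but self" variant yields an empty target list, which is the case where there is nothing further to report.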