linux-dev - Linux kernel development work

Age	Commit message	Author	Files	Lines
2016-03-17	nfsd: recover: fix memory leak	Sudip Mukherjee	1	-0/+1
	nfsd4_cltrack_grace_start() will allocate the memory for grace_start but when we returned due to error we missed freeing it. Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-16	nfsd: fix deadlock secinfo+readdir compound	J. Bruce Fields	1	-0/+1
	nfsd_lookup_dentry exits with the parent filehandle locked. fh_put also unlocks if necessary (nfsd filehandle locking is probably too lenient), so it gets unlocked eventually, but if the following op in the compound needs to lock it again, we can deadlock. A fuzzer ran into this; normal clients don't send a secinfo followed by a readdir in the same compound. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-02	nfsd4: resfh unused in nfsd4_secinfo	J. Bruce Fields	1	-2/+0
	Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01	svcrdma: Use new CQ API for RPC-over-RDMA server send CQs	Chuck Lever	5	-177/+121
	Calling ib_poll_cq() to sort through WCs during a completion is a common pattern amongst RDMA consumers. Since commit 14d3a3b2498e ("IB: add a proper completion queue abstraction"), WC sorting can be handled by the IB core. By converting to this new API, svcrdma is made a better neighbor to other RDMA consumers, as it allows the core to schedule the delivery of completions more fairly amongst all active consumers. This new API also aims each completion at a function that is specific to the WR's opcode. Thus the ctxt->wr_op field and the switch in process_context is replaced by a set of methods that handle each completion type. Because each ib_cqe carries a pointer to a completion method, the core can now post operations on a consumer's QP, and handle the completions itself. The server's rdma_stat_sq_poll and rdma_stat_sq_prod metrics are no longer updated. As a clean up, the cq_event_handler, the dto_tasklet, and all associated locking is removed, as they are no longer referenced or used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01	svcrdma: Use new CQ API for RPC-over-RDMA server receive CQs	Chuck Lever	2	-91/+40
	Calling ib_poll_cq() to sort through WCs during a completion is a common pattern amongst RDMA consumers. Since commit 14d3a3b2498e ("IB: add a proper completion queue abstraction"), WC sorting can be handled by the IB core. By converting to this new API, svcrdma is made a better neighbor to other RDMA consumers, as it allows the core to schedule the delivery of completions more fairly amongst all active consumers. Because each ib_cqe carries a pointer to a completion method, the core can now post operations on a consumer's QP, and handle the completions itself. svcrdma receive completions no longer use the dto_tasklet. Each polled Receive WC is now handled individually in soft IRQ context. The server transport's rdma_stat_rq_poll and rdma_stat_rq_prod metrics are no longer updated. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01	svcrdma: Remove close_out exit path	Chuck Lever	1	-11/+1
	Clean up: close_out is reached only when ctxt == NULL and XPT_CLOSE is already set. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Tested-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01	svcrdma: Hook up the logic to return ERR_CHUNK	Chuck Lever	2	-13/+46
	RFC 5666 Section 4.2 states: > When the peer detects an RPC-over-RDMA header version that it does > not support (currently this document defines only version 1), it > replies with an error code of ERR_VERS, and provides the low and > high inclusive version numbers it does, in fact, support. And: > When other decoding errors are detected in the header or chunks, > either an RPC decode error MAY be returned or the RPC/RDMA error > code ERR_CHUNK MUST be returned. The Linux NFS server does throw ERR_VERS when a client sends it a request whose rdma_version is not "one." But it does not return ERR_CHUNK when a header decoding error occurs. It just drops the request. To improve protocol extensibility, it should reject invalid values in the rdma_proc field instead of treating them all like RDMA_MSG. Otherwise clients can't detect when the server doesn't support new rdma_proc values. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Tested-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
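The header check described in this commit can be sketched in user-space C as follows. The numeric values of the message types and error codes follow RFC 5666; `HDR_OK` and `classify_header()` are hypothetical names, and treating all five defined rdma_proc values as acceptable is a simplifying assumption, not the server's actual decode path.

```c
#include <stdint.h>

/* Message types and error codes as numbered in RFC 5666. */
enum { RDMA_MSG = 0, RDMA_NOMSG = 1, RDMA_MSGP = 2, RDMA_DONE = 3, RDMA_ERROR = 4 };
/* HDR_OK is a hypothetical "no error" value used only by this sketch. */
enum { HDR_OK = 0, ERR_VERS = 1, ERR_CHUNK = 2 };

/*
 * Hypothetical classifier: reject unsupported versions with ERR_VERS and
 * unknown rdma_proc values with ERR_CHUNK, instead of treating every
 * unknown value like RDMA_MSG.
 */
int classify_header(uint32_t rdma_vers, uint32_t rdma_proc)
{
	if (rdma_vers != 1)
		return ERR_VERS;

	switch (rdma_proc) {
	case RDMA_MSG:
	case RDMA_NOMSG:
	case RDMA_MSGP:
	case RDMA_DONE:
	case RDMA_ERROR:
		return HDR_OK;
	default:
		return ERR_CHUNK;
	}
}
```

Returning an explicit error for unknown values is what lets a client discover that the server does not understand a newer rdma_proc.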
2016-03-01	svcrdma: Use correct XID in error replies	Chuck Lever	3	-8/+4
	When constructing an error reply, svc_rdma_xdr_encode_error() needs to view the client's request message so it can get the failing request's XID. svc_rdma_xdr_decode_req() is supposed to return a pointer to the client's request header. But if it fails to decode the client's message (and thus an error reply is needed) it does not return the pointer. The server then sends a bogus XID in the error reply. Instead, unconditionally generate the pointer to the client's header in svc_rdma_recvfrom(), and pass that pointer to both functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Tested-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01	svcrdma: Make RDMA_ERROR messages work	Chuck Lever	5	-67/+74
	Fix several issues with svc_rdma_send_error(): - Post a receive buffer to replace the one that was consumed by the incoming request - Posting a send should use DMA_TO_DEVICE, not DMA_FROM_DEVICE - No need to put_page _and_ free pages in svc_rdma_put_context - Make sure the sge is set up completely in case the error path goes through svc_rdma_unmap_dma() - Replace the use of ENOSYS, which has a reserved meaning Related fixes in svc_rdma_recvfrom(): - Don't leak the ctxt associated with the incoming request - Don't close the connection after sending an error reply - Let svc_rdma_send_error() figure out the right header error code As a last clean up, move svc_rdma_send_error() to svc_rdma_sendto.c with other similar functions. There is some common logic in these functions that could someday be combined to reduce code duplication. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Tested-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01	rpcrdma: Add RPCRDMA_HDRLEN_ERR	Chuck Lever	1	-0/+1
	Error headers are shorter than either RDMA_MSG or RDMA_NOMSG. Since HDRLEN_MIN is already used in several other places that would be annoying to change, add RPCRDMA_HDRLEN_ERR for the one or two spots where the shorter length is needed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01	svcrdma: svc_rdma_post_recv() should close connection on error	Chuck Lever	5	-24/+20
	Clean up: Most svc_rdma_post_recv() call sites close the transport connection when a receive cannot be posted. Wrap that in a common helper. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Tested-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01	svcrdma: Close connection when a send error occurs	Chuck Lever	1	-2/+6
	Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01	nfsd: Lower NFSv4.1 callback message size limit	Chuck Lever	4	-14/+24
	The maximum size of a backchannel message on RPC-over-RDMA depends on the connection's inline threshold. Today that threshold is typically 1024 bytes, making the maximum message size 996 bytes. The Linux server's CREATE_SESSION operation checks that the size of callback Calls can be as large as 1044 bytes, to accommodate RPCSEC_GSS. Thus CREATE_SESSION fails if a client advertises the true message size maximum of 996 bytes. But the server's backchannel currently does not support RPCSEC_GSS. The actual maximum size it needs is much smaller. It is safe to reduce the limit to enable NFSv4.1 on RDMA backchannel operation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01	svcrdma: Do not send Write chunk XDR pad with inline content	Chuck Lever	3	-7/+19
	The NFS server's XDR encoders add an XDR pad for content in the xdr_buf page list at the beginning of the xdr_buf's tail buffer. On RDMA transports, Write chunks are sent separately and without an XDR pad. If a Write chunk is being sent, strip off the pad in the tail buffer so that inline content following the Write chunk remains XDR-aligned when it is sent to the client. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=294 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01	svcrdma: Do not write xdr_buf::tail in a Write chunk	Chuck Lever	1	-3/+8
	When the Linux NFS server writes an odd-length data item into a Write chunk, it finishes with XDR pad bytes. If the data item is smaller than the Write chunk, the pad bytes are written at the end of the data item, but still inside the chunk (ie, in the application's buffer). Since this is direct data placement, that exposes the pad bytes. XDR pad bytes are inserted in order to preserve the XDR alignment of the next XDR data item in an XDR stream. But Write chunks do not appear in the payload XDR stream, and only one data item is allowed in each chunk. Thus XDR padding is not needed in a Write chunk. With NFSv4, the Linux NFS server places the results of any operations that follow an NFSv4 READ or READLINK in the xdr_buf's tail. Those results also should never be sent as a part of a Write chunk. The current logic in send_write_chunks() appears to assume that the xdr_buf's tail contains only pad bytes (ie, NFSv3). The server should write only the contents of the xdr_buf's page list in a Write chunk. If there's more than an XDR pad in the tail, that needs to go inline or in the Reply chunk. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=294 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
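For reference, XDR aligns every data item to a 4-byte boundary, so an odd-length item is followed by 1 to 3 pad bytes. A minimal sketch of that arithmetic, with `xdr_pad_len()` as a hypothetical helper name rather than a function from the sunrpc code:

```c
#include <stddef.h>

/*
 * Number of pad bytes XDR appends after an item of `len` bytes so that
 * the next item starts on a 4-byte boundary (0 when already aligned).
 */
size_t xdr_pad_len(size_t len)
{
	return (4 - (len & 3)) & 3;
}
```

A 5-byte READ payload would therefore be followed by 3 pad bytes in an inline XDR stream; these are the bytes the commit says should stay out of the Write chunk.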
2016-03-01	svcrdma: Find client-provided write and reply chunks once per reply	Chuck Lever	1	-44/+36
	The client provides the location of Write chunks into which the server writes bulk payload. The client provides these when the Upper Layer Protocol wants direct data placement and the Binding allows it. (For NFS, this is READ and READLINK operations). The client also provides the location of a Reply chunk into which the server writes the non-bulk part of an RPC reply. The client provides this chunk whenever it believes the reply can be larger than its receive buffers. The server then uses the presence of these chunks to determine how it will form its reply message. svc_rdma_sendto() was looking for Write and Reply chunks multiple times for every reply message. It would be more efficient to do it just once. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01	nfsd: Update NFS server comments related to RDMA support	Chuck Lever	2	-4/+3
	The server does indeed now support NFSv4.1 on RDMA transports. It does not support shifting an RDMA-capable TCP transport (such as iWARP) to RDMA mode. Reported-by: Shirley Ma <shirley.ma@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01	nfsd: Fix a memory leak when meeting unsupported state_protect_how4	Kinglong Mee	1	-1/+2
	Remember to free the allocated client when meeting an unsupported state_protect_how4. Fixes: 50c7b948adbd ("nfsd: minor consolidation of mach_cred handling code") Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01	nfsd4: fix bad bounds checking	J. Bruce Fields	1	-5/+8
	A number of spots in the xdr decoding follow a pattern like n = be32_to_cpup(p++); READ_BUF(n + 4); where n is a u32. The only bounds checking is done in READ_BUF itself, but since it's checking (n + 4), it won't catch cases where n is very large, (u32)(-4) or higher. I'm not sure exactly what the consequences are, but we've seen crashes soon after. Instead, just break these up into two READ_BUF()s. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
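The wraparound behind this fix can be reproduced in user-space C. This is a sketch only: `buf_has_room()`, `check_combined()` and `check_split()` are hypothetical stand-ins, not the nfsd READ_BUF() macro.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a READ_BUF-style check: are `want` bytes left? */
bool buf_has_room(uint32_t remaining, uint32_t want)
{
	return want <= remaining;
}

/* Flawed pattern: n + 4 is computed in 32 bits and wraps for n >= (u32)(-4). */
bool check_combined(uint32_t remaining, uint32_t n)
{
	return buf_has_room(remaining, n + 4);
}

/* Split pattern: check the 4 fixed bytes and the n bytes separately. */
bool check_split(uint32_t remaining, uint32_t n)
{
	if (!buf_has_room(remaining, 4))
		return false;
	return buf_has_room(remaining - 4, n);
}
```

With 16 bytes remaining and n = 0xFFFFFFFC, the combined check sees a request for 0 bytes and passes, while the split check rejects it.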
2016-02-23	sunrpc/cache: fix off-by-one in qword_get()	Stefan Hajnoczi	1	-1/+1
	The qword_get() function NUL-terminates its output buffer. If the input string is in hex format \xXXXX... and the same length as the output buffer, there is an off-by-one: int qword_get(char **bpp, char *dest, int bufsize) { ... while (len < bufsize) { ... *dest++ = (h << 4) | l; len++; } ... *dest = '\0'; return len; } This patch ensures the NUL terminator doesn't fall outside the output buffer. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
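One way to keep the terminator inside the buffer is to stop decoding one byte early. The following is a simplified user-space sketch: `hex_decode()` and `hex_val()` are hypothetical helpers, not the kernel's qword_get(), and the loop bound shown is an assumption about how such a fix can look.

```c
#include <ctype.h>
#include <stddef.h>

/* Convert one hex digit to its value; the caller guarantees isxdigit(c). */
static int hex_val(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	return tolower((unsigned char)c) - 'a' + 10;
}

/*
 * Hypothetical decoder: turns pairs of hex digits from src into bytes in
 * dest and NUL-terminates the result. Stopping at bufsize - 1 reserves
 * room for the terminator. Returns the number of decoded bytes, or -1 if
 * bufsize is not positive.
 */
int hex_decode(const char *src, char *dest, int bufsize)
{
	int len = 0;

	if (bufsize <= 0)
		return -1;

	while (len < bufsize - 1 &&
	       isxdigit((unsigned char)src[0]) &&
	       isxdigit((unsigned char)src[1])) {
		*dest++ = (char)((hex_val(src[0]) << 4) | hex_val(src[1]));
		src += 2;
		len++;
	}
	*dest = '\0';
	return len;
}
```

With a 3-byte buffer, at most two bytes are decoded and the third slot always holds the NUL.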
2016-02-14	Linux 4.5-rc4	Linus Torvalds	1	-1/+1

2016-02-13	ALSA: usb-audio: avoid freeing umidi object twice	Andrey Konovalov	1	-1/+0
	The 'umidi' object will be free'd on the error path by snd_usbmidi_free() when tearing down the rawmidi interface. So we shouldn't try to free it in snd_usbmidi_create() after having registered the rawmidi interface. Found by KASAN. Signed-off-by: Andrey Konovalov <andreyknvl@gmail.com> Acked-by: Clemens Ladisch <clemens@ladisch.de> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2016-02-12	IB/mlx5: Fix RC transport send queue overhead computation	Leon Romanovsky	1	-5/+7
	Fix the RC QPs send queue overhead computation to take into account two additional segments in the WQE which are needed for registration operations. The ATOMIC and UMR segments can't coexist, so choose the maximum of the two. Commit 9e65dc371b5c ("IB/mlx5: Fix RC transport send queue overhead computation") was intended to update the RC transport, as its commit message states, but added the code to the UC transport. Fixes: 9e65dc371b5c ("IB/mlx5: Fix RC transport send queue overhead computation") Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-02-12	IB/ipoib: fix for rare multicast join race condition	Alex Estrin	1	-7/+17
	A narrow window for a race condition still exists between the multicast join thread and *dev_flush workers. A kernel crash caused by prolonged erratic link state changes was observed (most likely due to faulty cabling): [167275.656270] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 [167275.665973] IP: [<ffffffffa05f8f2e>] ipoib_mcast_join+0xae/0x1d0 [ib_ipoib] [167275.674443] PGD 0 [167275.677373] Oops: 0000 [#1] SMP ... [167275.977530] Call Trace: [167275.982225] [<ffffffffa05f92f0>] ? ipoib_mcast_free+0x200/0x200 [ib_ipoib] [167275.992024] [<ffffffffa05fa1b7>] ipoib_mcast_join_task+0x2a7/0x490 [ib_ipoib] [167276.002149] [<ffffffff8109d5fb>] process_one_work+0x17b/0x470 [167276.010754] [<ffffffff8109e3cb>] worker_thread+0x11b/0x400 [167276.019088] [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400 [167276.027737] [<ffffffff810a5aef>] kthread+0xcf/0xe0 Here was a hit spot: ipoib_mcast_join() { .............. rec.qkey = priv->broadcast->mcmember.qkey; ^^^^^^^ ..... } The proposed patch should prevent the multicast join task from continuing if a link state change is detected. Signed-off-by: Alex Estrin <alex.estrin@intel.com> Changes from v4: - as suggested by Doug Ledford, optimized spinlock usage, i.e. ipoib_mcast_join() is called with lock held. Changes from v3: - sync with priv->lock before flag check. Changes from v2: - Move check for OPER_UP flag state to mcast_join() to ensure no event worker is in progress. - minor style fixes. Changes from v1: - No need to lock again if error detected. Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-02-12	EVM: Use crypto_memneq() for digest comparisons	Ryan Ware	1	-1/+2
	This patch fixes vulnerability CVE-2016-2085. The problem exists because the evm_verify_hmac() function includes a use of memcmp(). Unfortunately, this allows timing side channel attacks; specifically a MAC forgery complexity drop from 2^128 to 2^12. This patch changes the memcmp() to the cryptographically safe crypto_memneq(). Reported-by: Xiaofei Rex Guo <xiaofei.rex.guo@intel.com> Signed-off-by: Ryan Ware <ware@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
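The idea behind a timing-safe comparison is to touch every byte regardless of where the first mismatch occurs. A minimal illustration, assuming nothing about how the kernel implements crypto_memneq(); `ct_memneq()` is a hypothetical helper:

```c
#include <stddef.h>

/*
 * Illustrative constant-time inequality test. Unlike memcmp(), the loop
 * never exits early, so the running time does not reveal the position of
 * the first differing byte. Returns 1 if the buffers differ, 0 otherwise.
 */
int ct_memneq(const void *a, const void *b, size_t len)
{
	const volatile unsigned char *pa = a;
	const volatile unsigned char *pb = b;
	unsigned char diff = 0;
	size_t i;

	for (i = 0; i < len; i++)
		diff |= (unsigned char)(pa[i] ^ pb[i]);

	return diff != 0;
}
```

An early-exit comparison lets an attacker guess a digest one byte at a time by measuring response latency, which is the attack this commit describes.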
2016-02-12	ARC: mm: Introduce explicit super page size support	Vineet Gupta	2	-19/+45
	MMUv4 supports 2 concurrent page sizes: Normal and Super [4K to 16M] So far Linux supported a single super page size for a given Normal page, depending on the software page walking address split. e.g. we had 11:8:13 address split for 8K page, which meant super page was 2 ^(8+13) = 2M (given that THP size has to be PMD_SHIFT) Now we turn this around, by allowing multiple Super Pages in Kconfig (currently 2M and 16M only) and forcing page walker address split to PGDIR_SHIFT and PAGE_SHIFT For configs without Super page, things are same as before and PGDIR_SHIFT can be hacked to get non default address split The motivation for this change is a customer who needs 16M super page and a 8K Normal page combo. Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
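The address-split arithmetic in this message can be checked with a few lines of C. The helper names are hypothetical, and the 32-bit pgd:pte:offset layout is taken from the 11:8:13 example in the text.

```c
#include <stdint.h>

/* Bytes covered by one PMD entry for a given pte-index width and page shift. */
uint64_t super_page_bytes(unsigned int pte_bits, unsigned int page_shift)
{
	return UINT64_C(1) << (pte_bits + page_shift);
}

/* PTE-index bits needed so that one PMD entry spans 2^super_shift bytes. */
unsigned int pte_bits_for(unsigned int super_shift, unsigned int page_shift)
{
	return super_shift - page_shift;
}
```

For an 8K page (shift 13), 8 pte bits give a 2M super page, while a 16M super page (shift 24) needs 11 pte bits, which leaves 8 bits for the pgd index on a 32-bit address.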
2016-02-11	arch/x86/Kconfig: CONFIG_X86_UV should depend on CONFIG_EFI	Andrew Morton	1	-0/+1
	arch/x86/built-in.o: In function `uv_bios_call': (.text+0xeba00): undefined reference to `efi_call' Reported-by: kbuild test robot <fengguang.wu@intel.com> Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk> Acked-by: Alex Thorlton <athorlton@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-11	mm: fix pfn_t vs highmem	Dan Williams	3	-12/+11
	The pfn_t type uses an unsigned long to store a pfn + flags value. On a 64-bit platform the upper 12 bits of an unsigned long are never used for storing the value of a pfn. However, this is not true on highmem platforms, where all 32 bits of a pfn value are used to address a 44-bit physical address space. A pfn_t needs to store a 64-bit value. Link: https://bugzilla.kernel.org/show_bug.cgi?id=112211 Fixes: 01c8f1c44b83 ("mm, dax, gpu: convert vm_insert_mixed to pfn_t") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Stuart Foster <smf.linux@ntlworld.com> Reported-by: Julian Margetson <runaway@candw.ms> Tested-by: Julian Margetson <runaway@candw.ms> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
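A small sketch shows why a 32-bit container cannot hold both a full 32-bit pfn and 12 flag bits. The layout (flags in the top 12 bits) and all names here are assumptions made for illustration, not the kernel's pfn_t definitions.

```c
#include <stdint.h>

/* Hypothetical layout: the top 12 bits of the container hold flags. */
#define SKETCH_FLAG_BITS 12

/* 32-bit container: only 20 bits remain for the pfn, so high bits are lost. */
uint32_t pack32(uint32_t pfn, uint32_t flags)
{
	return (pfn & (UINT32_MAX >> SKETCH_FLAG_BITS)) |
	       (flags << (32 - SKETCH_FLAG_BITS));
}

uint32_t unpack32_pfn(uint32_t v)
{
	return v & (UINT32_MAX >> SKETCH_FLAG_BITS);
}

/* 64-bit container: 52 bits remain, enough for any 32-bit pfn. */
uint64_t pack64(uint64_t pfn, uint64_t flags)
{
	return (pfn & (UINT64_MAX >> SKETCH_FLAG_BITS)) |
	       (flags << (64 - SKETCH_FLAG_BITS));
}

uint64_t unpack64_pfn(uint64_t v)
{
	return v & (UINT64_MAX >> SKETCH_FLAG_BITS);
}
```

A pfn of 0xFFFFFFFF comes back as 0x000FFFFF from the 32-bit variant but survives intact in the 64-bit one.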
2016-02-11	kernel/locking/lockdep.c: convert hash tables to hlists	Andrew Morton	2	-25/+21
	Mike said: : CONFIG_UBSAN_ALIGNMENT breaks x86-64 kernel with lockdep enabled, i. e : kernel with CONFIG_UBSAN_ALIGNMENT fails to load without even any error : message. : : The problem is that ubsan callbacks use spinlocks and might be called : before lockdep is initialized. Particularly this line in the : reserve_ebda_region function causes problem: : : lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES); : : If i put lockdep_init() before reserve_ebda_region call in : x86_64_start_reservations kernel loads well. Fix this ordering issue permanently: change lockdep so that it uses hlists for the hash tables. Unlike a list_head, an hlist_head is in its initialized state when it is all-zeroes, so lockdep is ready for operation immediately upon boot - lockdep_init() need not have run. The patch will also save some memory. lockdep_init() and lockdep_initialized can be done away with now - a 4.6 patch has been prepared to do this. Reported-by: Mike Krinkin <krinkin.m.u@gmail.com> Suggested-by: Mike Krinkin <krinkin.m.u@gmail.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
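The difference between the two list heads can be seen with simplified user-space copies of the structures. The struct layouts mirror the kernel's, but the helper functions below are hypothetical sketches, not the kernel's list API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified user-space copies of the kernel's list shapes. */
struct list_head { struct list_head *next, *prev; };
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/* A circular list is empty only when the head points at itself. */
bool list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

/* An hlist is empty when first is NULL, which is what zeroed memory gives. */
bool hlist_is_empty(const struct hlist_head *h)
{
	return h->first == NULL;
}

/* Insert n at the front of h; works on a head that was only zero-filled. */
void hlist_add_head_sketch(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first = h->first;

	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}
```

A zero-filled list_head has NULL next and prev pointers, so it neither looks empty nor can be traversed safely until an init step runs; a zero-filled hlist_head is already a valid empty list.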
2016-02-11	mm,thp: fix spellos in describing __HAVE_ARCH_FLUSH_PMD_TLB_RANGE	Vineet Gupta	1	-2/+2
	[akpm@linux-foundation.org: s/threshhold/threshold/] Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-11	mm,thp: khugepaged: call pte flush at the time of collapse	Vineet Gupta	1	-1/+3
	This showed up on ARC when running LMBench bw_mem tests as Overlapping TLB Machine Check Exception triggered due to STLB entry (2M pages) overlapping some NTLB entry (regular 8K page). bw_mem 2m touches a large chunk of vaddr creating NTLB entries. In the interim khugepaged kicks in, collapsing the contiguous ptes into a single pmd. pmdp_collapse_flush()->flush_pmd_tlb_range() is called to flush out NTLB entries for the ptes. This for ARC (by design) can only shootdown STLB entries (for pmd). The stray NTLB entries cause the overlap with the subsequent STLB entry for collapsed page. So make pmdp_collapse_flush() call pte flush interface not pmd flush. Note that originally all thp flush call sites in generic code called flush_tlb_range() leaving it to architecture to implement the flush for pte and/or pmd. Commit 12ebc1581ad11454 changed this by calling a new opt-in API flush_pmd_tlb_range() which made the semantics more explicit but failed to distinguish the pte vs pmd flush in generic code, which is what this patch fixes. Note that ARC can be fixed w/o touching the generic pmdp_collapse_flush() by defining an ARC version, but that defeats the purpose of generic version, plus semantically this is the right thing to do. Fixes STAR 9000961194: LMBench on AXS103 triggering duplicate TLB exceptions with super pages Fixes: 12ebc1581ad11454 ("mm,thp: introduce flush_pmd_tlb_range") Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> [4.4] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-11	mm/backing-dev.c: fix error path in wb_init()	Rasmus Villemoes	1	-1/+1
	We need to use post-decrement to get percpu_counter_destroy() called on &wb->stat[0]. Moreover, the pre-decrement would cause infinite out-of-bounds accesses if the setup code failed at i==0. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
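The unwind idiom at issue can be sketched outside the kernel. Everything here is hypothetical (`NR_ITEMS`, `initialized[]`, `setup_all()`); it only models "set up slots 0..n-1, undo the ones already done on failure".

```c
#define NR_ITEMS 4

/* Hypothetical flags tracking which slots are currently set up. */
int initialized[NR_ITEMS];

/* Set up slots in order; fail_at names the slot whose setup fails (-1: none). */
int setup_all(int fail_at)
{
	int i;

	for (i = 0; i < NR_ITEMS; i++) {
		if (i == fail_at)
			goto out_destroy;
		initialized[i] = 1;
	}
	return 0;

out_destroy:
	/*
	 * i is the slot that failed. Post-decrement tests i before
	 * decrementing, so the body runs for i-1 down to 0 and the loop
	 * stops cleanly when the failure happened at slot 0.
	 */
	while (i--)
		initialized[i] = 0;
	return -1;
}
```

With `while (--i)` instead, a failure at slot 2 would clean up slot 1 but skip slot 0, and a failure at slot 0 would start at index -1 and keep walking out of bounds.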
2016-02-11	mm, dax: check for pmd_none() after split_huge_pmd()	Kirill A. Shutemov	2	-2/+6
	DAX implements split_huge_pmd() by clearing pmd. This simple approach reduces memory overhead, as we don't need to deposit page table on huge page mapping to make split_huge_pmd() never-fail. PTE table can be allocated and populated later on page fault from backing store. But one side effect is that we have to check if pmd is pmd_none() after split_huge_pmd(). In most places we do this already to deal with parallel MADV_DONTNEED. But I found two call sites which are not affected by MADV_DONTNEED (due to down_write(mmap_sem)), but need to have the check to work with DAX properly. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-11	vsprintf: kptr_restrict is okay in IRQ when 2	Jason A. Donenfeld	1	-13/+13
	The kptr_restrict flag, when set to 1, only prints the kernel address when the user has CAP_SYSLOG. When it is set to 2, the kernel address is always printed as zero. When set to 1, this needs to check whether or not we're in IRQ. However, when set to 2, this check is unnecessary, and produces confusing results in dmesg. Thus, only make sure we're not in IRQ when mode 1 is used, but not mode 2. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-11	mm: fix filemap.c kernel doc warning	Randy Dunlap	1	-0/+1
	Add missing kernel-doc notation for function parameter 'gfp_mask' to fix kernel-doc warning. mm/filemap.c:1898: warning: No description found for parameter 'gfp_mask' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-11	ubsan: cosmetic fix to Kconfig text	Yang Shi	1	-1/+3
	When enabling UBSAN_SANITIZE_ALL, the kernel image size gets increased significantly (~3x). So, it sounds better to have some note in Kconfig. And, fixed a typo. Signed-off-by: Yang Shi <yang.shi@linaro.org> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-11	IB/core: Fix reading capability mask of the port info class	Eran Ben Elisha	1	-3/+2
	When checking specific attribute from a bit mask, need to use bitwise AND and not logical AND, fixed that. Fixes: 145d9c541032 ('IB/core: Display extended counter set if available') Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Christoph Lameter <cl@linux.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
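The difference between the two operators is easy to demonstrate. The function names and the bit values used in the example are hypothetical, not the IB port-info capability definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Buggy form: && only asks whether both operands are nonzero. */
bool has_cap_logical(uint32_t mask, uint32_t bit)
{
	return mask && bit;
}

/* Correct form: & isolates the one bit of interest. */
bool has_cap_bitwise(uint32_t mask, uint32_t bit)
{
	return (mask & bit) != 0;
}
```

For a mask of 0x1 and a capability bit of 0x2, the logical form reports the capability as present even though that bit is clear.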
2016-02-11	net/mlx4: fix some error handling in mlx4_multi_func_init()	Rasmus Villemoes	1	-1/+1
	The while loop after err_slaves should use post-decrement; otherwise we'll fail to do the kfrees for i==0, and will run into out-of-bounds accesses if the setup above failed already at i==0. [I'm not sure why one even bothers populating the ->vlan_filter array: mlx4.h isn't #included by anything outside drivers/net/ethernet/mellanox/mlx4/, and "git grep -C2 -w vlan_filter drivers/net/ethernet/mellanox/mlx4/" seems to suggest that the vlan_filter elements aren't used at all.] Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-02-11	Revert "mmc: block: don't use parameter prefix if built as module"	Ulf Hansson	1	-3/+0
	This reverts commit 829b6962f7e3cfc06f7c5c26269fd47ad48cf503. Revert this change as it causes a sysfs path to change and therefore introduces an ABI regression. More precisely, Android's vold is no longer able to access /sys/module/mmcblk/parameters/perdev_minors, since the path changes to: "/sys/module/mmc_block/..." Fixes: 829b6962f7e3 ("mmc: block: don't use parameter prefix if built as module") Reported-by: John Stultz <john.stultz@linaro.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-02-11	thermal: cpu_cooling: fix out of bounds access in time_in_idle	Javi Merino	1	-6/+8
	In __cpufreq_cooling_register() we allocate the arrays for time_in_idle and time_in_idle_timestamp to be as big as the number of cpus in this cpufreq device. However, in get_load() we access this array using the cpu number as index, which can result in an out of bound access. Index time_in_idle{,_timestamp} using the index in the cpufreq_device's allowed_cpus mask, as we do for the load_cpu array in cpufreq_get_requested_power() Reported-by: Nicolas Boichat <drinkcat@chromium.org> Cc: Amit Daniel Kachhap <amit.kachhap@gmail.com> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Eduardo Valentin <edubezval@gmail.com> Tested-by: Nicolas Boichat <drinkcat@chromium.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Javi Merino <javi.merino@arm.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
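The indexing problem can be illustrated with a plain bitmask. `index_in_mask()` is a hypothetical helper operating on a 64-bit mask, not the kernel's cpumask API, and it is only one way to map a CPU number to a dense array index.

```c
#include <stdint.h>

/*
 * Return the position of `cpu` among the set bits of `allowed` (0 for
 * the lowest set bit, 1 for the next, ...), or -1 if `cpu` is not in
 * the mask. The result is a safe index into an array that has one
 * element per CPU in the mask.
 */
int index_in_mask(uint64_t allowed, unsigned int cpu)
{
	int idx = 0;
	unsigned int bit;

	if (cpu >= 64 || !(allowed & (UINT64_C(1) << cpu)))
		return -1;

	for (bit = 0; bit < cpu; bit++)
		if (allowed & (UINT64_C(1) << bit))
			idx++;

	return idx;
}
```

For a cpufreq device covering CPUs 4 to 7, the per-device arrays have four elements; indexing them with the raw CPU number 6 would overrun, while its position in the mask is 2.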
2016-02-11	btrfs: properly set the termination value of ctx->pos in readdir	David Sterba	3	-3/+16
	The value of ctx->pos in the last readdir call is supposed to be set to INT_MAX due to 32bit compatibility, unless 'pos' is intentionally set to a larger value, then it's LLONG_MAX. There's a report from PaX SIZE_OVERFLOW plugin that "ctx->pos++" overflows (https://forums.grsecurity.net/viewtopic.php?f=1&t=4284), on a 64bit arch, where the value is 0x7fffffffffffffff ie. LLONG_MAX before the increment. We can get to that situation like this: * emit all regular readdir entries * still in the same call to readdir, bump the last pos to INT_MAX * next call to readdir will not emit any entries, but will reach the bump code again, finds pos to be INT_MAX and sets it to LLONG_MAX Normally this is not a problem, but if we call readdir again, we'll find 'pos' set to LLONG_MAX and the unconditional increment will overflow. The report from Victor at (http://thread.gmane.org/gmane.comp.file-systems.btrfs/49500) with debugging print shows that pattern: Overflow: e Overflow: 7fffffff Overflow: 7fffffffffffffff PAX: size overflow detected in function btrfs_real_readdir fs/btrfs/inode.c:5760 cicus.935_282 max, count: 9, decl: pos; num: 0; context: dir_context; CPU: 0 PID: 2630 Comm: polkitd Not tainted 4.2.3-grsec #1 Hardware name: Gigabyte Technology Co., Ltd. H81ND2H/H81ND2H, BIOS F3 08/11/2015 ffffffff81901608 0000000000000000 ffffffff819015e6 ffffc90004973d48 ffffffff81742f0f 0000000000000007 ffffffff81901608 ffffc90004973d78 ffffffff811cb706 0000000000000000 ffff8800d47359e0 ffffc90004973ed8 Call Trace: [<ffffffff81742f0f>] dump_stack+0x4c/0x7f [<ffffffff811cb706>] report_size_overflow+0x36/0x40 [<ffffffff812ef0bc>] btrfs_real_readdir+0x69c/0x6d0 [<ffffffff811dafc8>] iterate_dir+0xa8/0x150 [<ffffffff811e6d8d>] ? __fget_light+0x2d/0x70 [<ffffffff811dba3a>] SyS_getdents+0xba/0x1c0 Overflow: 1a [<ffffffff811db070>] ? iterate_dir+0x150/0x150 [<ffffffff81749b69>] entry_SYSCALL_64_fastpath+0x12/0x83 The jump from 7fffffff to 7fffffffffffffff happens when new dir entries are not yet synced and are processed from the delayed list. Then the code could go to the bump section again even though it might not emit any new dir entries from the delayed list. The fix avoids entering the "bump" section again once we've finished emitting the entries, both for synced and delayed entries. References: https://forums.grsecurity.net/viewtopic.php?f=1&t=4284 Reported-by: Victor <services@swwu.com> CC: stable@vger.kernel.org Signed-off-by: David Sterba <dsterba@suse.com> Tested-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com> Signed-off-by: Chris Mason <clm@fb.com>
2016-02-11	ARM: 8519/1: ICST: try other dividends than 1	Linus Walleij	1	-0/+1
	Since the dawn of time the ICST code has only supported divide by one or hang in an eternal loop. Luckily we were always dividing by one because the reference frequency for the systems using the ICSTs is 24MHz and the [min,max] values for the PLL input are [10,320] MHz for ICST307 and [6,200] for ICST525, so the loop will always terminate immediately without assigning any divisor for the reference frequency. But for the code to make sense, let's insert the missing i++ Reported-by: David Binderman <dcb314@hotmail.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2016-02-11	mmc: sdhci-acpi: Fix card detect race for Intel BXT/APL	Adrian Hunter	1	-0/+30
	Intel BXT/APL use a card detect GPIO; however, the host controller will not enable bus power unless its card detect also reflects the presence of a card. Unfortunately those 2 things race, which can result in commands not starting, after which the controller does nothing and the driver waits for its 10-second timer to time out. That is fixed by having the driver look also at the present state register to determine if the card is present. Consequently, provide a 'get_cd' mmc host operation for BXT/APL that does that. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org # v4.4+ Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-02-11	mmc: sdhci-pci: Fix card detect race for Intel BXT/APL	Adrian Hunter	1	-0/+31
	Intel BXT/APL use a card detect GPIO; however, the host controller will not enable bus power unless its card detect also reflects the presence of a card. Unfortunately those 2 things race, which can result in commands not starting, after which the controller does nothing and the driver waits for its 10-second timer to time out. That is fixed by having the driver look also at the present state register to determine if the card is present. Consequently, provide a 'get_cd' mmc host operation for BXT/APL that does that. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org # v4.4+ Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-02-11	mmc: sdhci: Allow override of get_cd() called from sdhci_request()	Adrian Hunter	1	-1/+1
	Drivers may need to provide their own get_cd() mmc host op, but currently the internals of the current op (sdhci_get_cd()) are provided by sdhci_do_get_cd() which is also called from sdhci_request(). To allow override of the get_cd functionality, change sdhci_request() to call ->get_cd() instead of sdhci_do_get_cd(). Note, in the future the call to ->get_cd() will likely be removed from sdhci_request() since most drivers don't actually need it. However this change is being done now to facilitate a subsequent bug fix. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org # v4.4+ Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-02-11	mmc: sdhci: Allow override of mmc host operations	Adrian Hunter	2	-1/+3
	In the past, fixes for specific hardware devices were implemented in sdhci using quirks. That approach is no longer accepted because the growing number of quirks was starting to make the code difficult to understand and maintain. One alternative to quirks is to allow drivers to override the default mmc host operations. This patch makes it easy to do that, and it is needed for a subsequent bug fix, for which separate patches are provided. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org # v4.4+ Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
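The override approach described above can be pictured with a small stand-alone model: each host carries its own copy of the default operations, a driver replaces individual entries, and the request path calls through the table. All names here (`struct host`, `host_ops`, `host_init`, `bxt_like_get_cd`, `host_request`) are hypothetical; this is a sketch of the pattern, not the sdhci API.

```c
#include <stddef.h>

struct host;

/* Table of overridable operations (hypothetical, one op only). */
struct host_ops {
    int (*get_cd)(struct host *h);
};

struct host {
    struct host_ops ops;     /* per-host copy, safe for a driver to modify */
    int gpio_cd;             /* card detect as seen by the GPIO */
    int present_state_cd;    /* card detect as seen by the controller */
};

/* Default op: trust the GPIO alone. */
int default_get_cd(struct host *h)
{
    return h->gpio_cd;
}

const struct host_ops default_ops = { .get_cd = default_get_cd };

/* Every host starts out with a copy of the defaults. */
void host_init(struct host *h)
{
    h->ops = default_ops;
    h->gpio_cd = 0;
    h->present_state_cd = 0;
}

/* Driver-specific op: the card counts as present only when the GPIO
 * and the controller's present state agree. */
int bxt_like_get_cd(struct host *h)
{
    return default_get_cd(h) && h->present_state_cd;
}

/* The request path goes through the table, so an override takes effect.
 * Returns 0 if a card is present, -1 otherwise. */
int host_request(struct host *h)
{
    return h->ops.get_cd(h) ? 0 : -1;
}
```

A driver in this model would call `host_init()` and then assign `h.ops.get_cd = bxt_like_get_cd`; keeping the table per host means the override does not leak into other hosts that share the defaults.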
2016-02-11	mips: Differentiate between 32 and 64 bit ELF header	Daniel Wagner	3	-4/+9
	Depending on the configuration either the 32 or 64 bit version of elf_check_arch() is defined. parse_crash_elf{32|64}_headers() does some basic verification of the ELF header via vmcore_elf{32|64}_check_arch() which happens to map to elf_check_arch(). Since the implementations of the 32 and 64 bit versions of elf_check_arch() differ, we use the wrong type: In file included from include/linux/elf.h:4:0, from fs/proc/vmcore.c:13: fs/proc/vmcore.c: In function 'parse_crash_elf64_headers': >> arch/mips/include/asm/elf.h:228:23: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types] struct elfhdr *__h = (hdr); \ ^ include/linux/crash_dump.h:41:37: note: in expansion of macro 'elf_check_arch' #define vmcore_elf64_check_arch(x) (elf_check_arch(x) || vmcore_elf_check_arch_cross(x)) ^ fs/proc/vmcore.c:1015:4: note: in expansion of macro 'vmcore_elf64_check_arch' !vmcore_elf64_check_arch(&ehdr) || ^ Therefore, we instead define vmcore_elf{32|64}_check_arch() as a basic machine check and also use it in binfmt_elf?32.c. Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Suggested-by: Maciej W. Rozycki <macro@imgtec.com> Reviewed-by: Maciej W. Rozycki <macro@imgtec.com> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Cc: linux-kernel@vger.kernel.org Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/12529/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-02-11	irqchip/gic-v3: Make sure read from ICC_IAR1_EL1 is visible on redestributor	Tirumalesh Chalamarla	1	-0/+1
	The ARM GICv3 specification mentions the need for dsb after a read from the ICC_IAR1_EL1 register: 4.1.1 Physical CPU Interface: The effects of reading ICC_IAR0_EL1 and ICC_IAR1_EL1 on the state of a returned INTID are not guaranteed to be visible until after the execution of a DSB. Not having this could result in missed interrupts, so let's add the required barrier. [Marc: fixed commit message] Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Tirumalesh Chalamarla <tchalamarla@caviumnetworks.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-02-11	irqchip/gic: Only set the EOImodeNS bit for the root controller	Jon Hunter	1	-1/+1
	EOImode1 is only used for the root controller and hence only the root controller uses the eoimode1 functions for handling interrupts. However, if the root controller supports EOImode1, then the EOImodeNS bit will be set for all GICs, enabling EOImode1. This is not what we want and this causes interrupts on non-root GICs to only be dropped in priority but never deactivated. Therefore, only set the EOImodeNS bit for the root controller. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-02-11	irqchip/gic: Only populate set_affinity for the root controller	Jon Hunter	1	-6/+5
	Setting the affinity of an IRQ is only applicable to the root interrupt controller, so only populate this operator for the root controller. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>