aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2017-07-12nfsd4: factor ctime into change attributeJ. Bruce Fields3-3/+25
Factoring ctime into the nfsv4 change attribute gives us better properties than just i_version alone. Eventually we'll likely also expose this (as opposed to raw i_version) to userspace, at which point we'll want to move it to a common helper, called from either userspace or individual filesystems. For now, nfsd is the only user. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-07-12svcrdma: Remove svc_rdma_chunk_ctxt::cc_dir fieldChuck Lever1-10/+8
Clean up: No need to save the I/O direction. The functions that release svc_rdma_chunk_ctxt already know what direction to use. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-07-12svcrdma: use offset_in_page() macroChuck Lever1-2/+3
Clean up: Use offset_in_page() macro instead of open-coding. Reported-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-07-12svcrdma: Clean up after converting svc_rdma_recvfrom to rdma_rw APIChuck Lever2-39/+4
Clean up: Registration mode details are now handled by the rdma_rw API, and thus can be removed from svcrdma. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-07-12svcrdma: Clean-up svc_rdma_unmap_dmaChuck Lever1-14/+5
There's no longer a need to compare each SGE's lkey with the PD's local_dma_lkey. Now that FRWR is gone, all DMA mappings are for pages that were registered with this key. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-07-12svcrdma: Remove frmr cacheChuck Lever2-104/+0
Clean up: Now that the svc_rdma_recvfrom path uses the rdma_rw API, the details of Read sink buffer registration are dealt with by the kernel's RDMA core. This cache is no longer used, and can be removed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-07-12svcrdma: Remove unused Read completion handlersChuck Lever2-87/+10
Clean up: The generic RDMA R/W API conversion of svc_rdma_recvfrom replaced the Register, Read, and Invalidate completion handlers. Remove the old ones, which are no longer used. These handlers shared some helper code with svc_rdma_wc_send. Fold the wc_common helper back into the one remaining completion handler. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-07-12svcrdma: Properly compute .len and .buflen for received RPC CallsChuck Lever2-11/+5
When an RPC-over-RDMA request is received, the Receive buffer contains a Transport Header possibly followed by an RPC message. Even though rq_arg.head[0] (as passed to NFSD) does not contain the Transport Header header, currently rq_arg.len includes the size of the Transport Header. That violates the intent of the xdr_buf API contract. .buflen should include everything, but .len should be exactly the length of the RPC message in the buffer. The rq_arg fields are summed together at the end of svc_rdma_recvfrom to obtain the correct return value. rq_arg.len really ought to contain the correct number of bytes already, but it currently doesn't due to the above misbehavior. Let's instead ensure that .buflen includes the length of the transport header, and that .len is always equal to head.iov_len + .page_len + tail.iov_len . Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-07-12svcrdma: Use generic RDMA R/W API in RPC Call pathChuck Lever3-468/+106
The current svcrdma recvfrom code path has a lot of detail about registration mode and the type of port (iWARP, IB, etc). Instead, use the RDMA core's generic R/W API. This shares code with other RDMA-enabled ULPs that manages the gory details of buffer registration and the posting of RDMA Read Work Requests. Since the Read list marshaling code is being replaced, I took the opportunity to replace C structure-based XDR encoding code with more portable code that uses pointer arithmetic. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-07-12svcrdma: Add recvfrom helpers to svc_rdma_rw.cChuck Lever2-1/+427
svc_rdma_rw.c already contains helpers for the sendto path. Introduce helpers for the recvfrom path. The plan is to replace the local NFSD bespoke code that constructs and posts RDMA Read Work Requests with calls to the rdma_rw API. This shares code with other RDMA-enabled ULPs that manages the gory details of buffer registration and posting Work Requests. This new code also puts all RDMA_NOMSG-specific logic in one place. Lastly, the use of rqstp->rq_arg.pages is deprecated in favor of using rqstp->rq_pages directly, for clarity. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-07-12sunrpc: Allocate up to RPCSVC_MAXPAGES per svc_rqstChuck Lever2-5/+7
svcrdma needs 259 pages allocated to receive 1MB NFSv4.0 WRITE requests: - 1 page for the transport header and head iovec - 256 pages for the data payload - 1 page for the trailing GETATTR request (since NFSD XDR decoding does not look for a tail iovec, the GETATTR is stuck at the end of the rqstp->rq_arg.pages list) - 1 page for building the reply xdr_buf But RPCSVC_MAXPAGES is already 259 (on x86_64). The problem is that svc_alloc_arg never allocates that many pages. To address this: 1. The final element of rq_pages always points to NULL. To accommodate up to 259 pages in rq_pages, add an extra element to rq_pages for the array termination sentinel. 2. Adjust the calculation of "pages" to match how RPCSVC_MAXPAGES is calculated, so it can go up to 259. Bruce noted that the calculation assumes sv_max_mesg is a multiple of PAGE_SIZE, which might not always be true. I didn't change this assumption. 3. Change the loop boundaries to allow 259 pages to be allocated. Additional clean-up: WARN_ON_ONCE adds an extra conditional branch, which is basically never taken. And there's no need to dump the stack here because svc_alloc_arg has only one caller. Keeping that NULL "array termination sentinel"; there doesn't appear to be any code that depends on it, only code in nfsd_splice_actor() which needs the 259th element to be initialized to *something*. So it's possible we could just keep the array at 259 elements and drop that final NULL, but we're being conservative for now. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-06-28svcrdma: Don't account for Receive queue "starvation"Chuck Lever1-15/+6
>From what I can tell, calling ->recvfrom when there is no work to do is a normal part of operation. This is the only way svc_recv can tell when there is no more data ready to receive on the transport. Neither the TCP nor the UDP transport implementations have a "starve" metric. The cost of receive starvation accounting is bumping an atomic, which results in extra (IMO unnecessary) bus traffic between CPU sockets, while holding a spin lock. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-06-28svcrdma: Improve Reply chunk sanity checkingChuck Lever1-6/+11
Identify malformed transport headers and unsupported chunk combinations as early as possible. - Ensure that segment lengths are not crazy. - Ensure that the Reply chunk's segment count is not crazy. With a 1KB inline threshold, the largest number of Write segments that can be conveyed is about 60 (for a RDMA_NOMSG Reply message). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-06-28svcrdma: Improve Write chunk sanity checkingChuck Lever1-5/+47
Identify malformed transport headers and unsupported chunk combinations as early as possible. - Reject RPC-over-RDMA messages that contain more than one Write chunk, since this implementation does not support more than one per message. - Ensure that segment lengths are not crazy. - Ensure that the chunk's segment count is not crazy. With a 1KB inline threshold, the largest number of Write segments that can be conveyed is about 60 (for a RDMA_NOMSG Reply message). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-06-28svcrdma: Improve Read chunk sanity checkingChuck Lever1-18/+37
Identify malformed transport headers and unsupported chunk combinations as early as possible. - Reject RPC-over-RDMA messages that contain more than one Read chunk, since this implementation currently does not support more than one per RPC transaction. - Ensure that segment lengths are not crazy. - Remove the segment count check. With a 1KB inline threshold, the largest number of Read segments that can be conveyed is about 40 (for a RDMA_NOMSG Call message). This is nowhere near RPCSVC_MAXPAGES. As far as I can tell, that was just a sanity check and does not enforce an implementation limit. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-06-28svcrdma: Remove svc_rdma_marshal.cChuck Lever4-173/+128
svc_rdma_marshal.c has one remaining exported function -- svc_rdma_xdr_decode_req -- and it has a single call site. Take the same approach as the sendto path, and move this function into the source file where it is called. This is a refactoring change only. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-06-28svcrdma: Avoid Send Queue overflowChuck Lever2-1/+6
Sanity case: Catch the case where more Work Requests are being posted to the Send Queue than there are Send Queue Entries. This might happen if a client sends a chunk with more segments than there are SQEs for the transport. The server can't send that reply, so the transport will deadlock unless the client drops the RPC. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-06-28svcrdma: Squelch disconnection messagesChuck Lever1-3/+10
The server displays "svcrdma: failed to post Send WR (-107)" in the kernel log when the client disconnects. This could flood the server's log, so remove the message. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-06-28sunrpc: Disable splice for krb5iChuck Lever2-1/+9
Running a multi-threaded 8KB fio test (70/30 mix), three or four out of twelve of the jobs fail when using krb5i. The failure is an EIO on a read. Troubleshooting confirmed the EIO results when the client fails to verify the MIC of an NFS READ reply. Bruce suggested the problem could be due to the data payload changing between the time the reply's MIC was computed on the server and the time the reply was actually sent. krb5p gets around this problem by disabling RQ_SPLICE_OK. Use the same mechanism for krb5i RPCs. "iozone -i0 -i1 -s128m -y1k -az -I", export is tmpfs, mount is sec=krb5i,vers=3,proto=rdma. The important numbers are the read / reread column. Here's without the RQ_SPLICE_OK patch: kB reclen write rewrite read reread 131072 1 7546 7929 8396 8267 131072 2 14375 14600 15843 15639 131072 4 19280 19248 21303 21410 131072 8 32350 31772 35199 34883 131072 16 36748 37477 49365 51706 131072 32 55669 56059 57475 57389 131072 64 74599 75190 74903 75550 131072 128 99810 101446 102828 102724 131072 256 122042 122612 124806 125026 131072 512 137614 138004 141412 141267 131072 1024 146601 148774 151356 151409 131072 2048 180684 181727 293140 292840 131072 4096 206907 207658 552964 549029 131072 8192 223982 224360 454493 473469 131072 16384 228927 228390 654734 632607 And here's with it: kB reclen write rewrite read reread 131072 1 7700 7365 7958 8011 131072 2 13211 13303 14937 14414 131072 4 19001 19265 20544 20657 131072 8 30883 31097 34255 33566 131072 16 36868 34908 51499 49944 131072 32 56428 55535 58710 56952 131072 64 73507 74676 75619 74378 131072 128 100324 101442 103276 102736 131072 256 122517 122995 124639 124150 131072 512 137317 139007 140530 140830 131072 1024 146807 148923 151246 151072 131072 2048 179656 180732 292631 292034 131072 4096 206216 208583 543355 541951 131072 8192 223738 224273 494201 489372 131072 16384 229313 229840 691719 668427 I would say that there is not much difference in this test. For good measure, here's the same test with sec=krb5p: kB reclen write rewrite read reread 131072 1 5982 5881 6137 6218 131072 2 10216 10252 10850 10932 131072 4 12236 12575 15375 15526 131072 8 15461 15462 23821 22351 131072 16 25677 25811 27529 27640 131072 32 31903 32354 34063 33857 131072 64 42989 43188 45635 45561 131072 128 52848 53210 56144 56141 131072 256 59123 59214 62691 62933 131072 512 63140 63277 66887 67025 131072 1024 65255 65299 69213 69140 131072 2048 76454 76555 133767 133862 131072 4096 84726 84883 251925 250702 131072 8192 89491 89482 270821 276085 131072 16384 91572 91597 361768 336868 BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=307 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-06-11Linux 4.12-rc5Linus Torvalds1-1/+1
2017-06-11compiler, clang: properly override 'inline' for clangLinus Torvalds1-1/+2
Commit abb2ea7dfd82 ("compiler, clang: suppress warning for unused static inline functions") just caused more warnings due to re-defining the 'inline' macro. So undef it before re-defining it, and also add the 'notrace' attribute like the gcc version that this is overriding does. Maybe this makes clang happier. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-11KVM: async_pf: avoid async pf injection when in guest modeWanpeng Li3-4/+7
INFO: task gnome-terminal-:1734 blocked for more than 120 seconds. Not tainted 4.12.0-rc4+ #8 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. gnome-terminal- D 0 1734 1015 0x00000000 Call Trace: __schedule+0x3cd/0xb30 schedule+0x40/0x90 kvm_async_pf_task_wait+0x1cc/0x270 ? __vfs_read+0x37/0x150 ? prepare_to_swait+0x22/0x70 do_async_page_fault+0x77/0xb0 ? do_async_page_fault+0x77/0xb0 async_page_fault+0x28/0x30 This is triggered by running both win7 and win2016 on L1 KVM simultaneously, and then gives stress to memory on L1, I can observed this hang on L1 when at least ~70% swap area is occupied on L0. This is due to async pf was injected to L2 which should be injected to L1, L2 guest starts receiving pagefault w/ bogus %cr2(apf token from the host actually), and L1 guest starts accumulating tasks stuck in D state in kvm_async_pf_task_wait() since missing PAGE_READY async_pfs. This patch fixes the hang by doing async pf when executing L1 guest. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-06-10hexagon: Use raw_copy_to_userGuenter Roeck1-3/+2
Commit ac4691fac8ad ("hexagon: switch to RAW_COPY_USER") replaced __copy_to_user_hexagon() with raw_copy_to_user(), but did not catch all callers, resulting in the following build error. arch/hexagon/mm/uaccess.c: In function '__clear_user_hexagon': arch/hexagon/mm/uaccess.c:40:3: error: implicit declaration of function '__copy_to_user_hexagon' Fixes: ac4691fac8ad ("hexagon: switch to RAW_COPY_USER") Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Richard Kuo <rkuo@codeaurora.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2017-06-10ufs: we need to sync inode before freeing itAl Viro1-0/+1
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-09excessive checks in ufs_write_failed() and ufs_evict_inode()Al Viro1-13/+5
As it is, short copy in write() to append-only file will fail to truncate the excessive allocated blocks. As the matter of fact, all checks in ufs_truncate_blocks() are either redundant or wrong for that caller. As for the only other caller (ufs_evict_inode()), we only need the file type checks there. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-09ufs_getfrag_block(): we only grab ->truncate_mutex on block creation pathAl Viro1-1/+3
Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-09ufs_extend_tail(): fix the braino in calling conventions of ufs_new_fragments()Al Viro1-1/+2
... and it really needs splitting into "new" and "extend" cases, but that's for later Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-09ufs: set correct ->s_maxsizeAl Viro1-0/+18
Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-09ufs: restore maintaining ->i_blocksAl Viro2-1/+26
Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-09fix ufs_isblockset()Al Viro1-3/+7
Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-09ufs: restore proper tail allocationAl Viro1-1/+1
Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-09Btrfs: fix delalloc accounting leak caused by u32 overflowOmar Sandoval1-2/+2
btrfs_calc_trans_metadata_size() does an unsigned 32-bit multiplication, which can overflow if num_items >= 4 GB / (nodesize * BTRFS_MAX_LEVEL * 2). For a nodesize of 16kB, this overflow happens at 16k items. Usually, num_items is a small constant passed to btrfs_start_transaction(), but we also use btrfs_calc_trans_metadata_size() for metadata reservations for extent items in btrfs_delalloc_{reserve,release}_metadata(). In drop_outstanding_extents(), num_items is calculated as inode->reserved_extents - inode->outstanding_extents. The difference between these two counters is usually small, but if many delalloc extents are reserved and then the outstanding extents are merged in btrfs_merge_extent_hook(), the difference can become large enough to overflow in btrfs_calc_trans_metadata_size(). The overflow manifests itself as a leak of a multiple of 4 GB in delalloc_block_rsv and the metadata bytes_may_use counter. This in turn can cause early ENOSPC errors. Additionally, these WARN_ONs in extent-tree.c will be hit when unmounting: WARN_ON(fs_info->delalloc_block_rsv.size > 0); WARN_ON(fs_info->delalloc_block_rsv.reserved > 0); WARN_ON(space_info->bytes_pinned > 0 || space_info->bytes_reserved > 0 || space_info->bytes_may_use > 0); Fix it by casting nodesize to a u64 so that btrfs_calc_trans_metadata_size() does a full 64-bit multiplication. While we're here, do the same in btrfs_calc_trunc_metadata_size(); this can't overflow with any existing uses, but it's better to be safe here than have another hard-to-debug problem later on. Cc: stable@vger.kernel.org Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2017-06-09Btrfs: clear EXTENT_DEFRAG bits in finish_ordered_ioLiu Bo1-1/+1
Before this, we use 'filled' mode here, ie. if all range has been filled with EXTENT_DEFRAG bits, get to clear it, but if the defrag range joins the adjacent delalloc range, then we'll have EXTENT_DEFRAG bits in extent_state until releasing this inode's pages, and that prevents extent_data from being freed. This clears the bit if any was found within the ordered extent. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2017-06-09btrfs: tree-log.c: Wrong printk information about namelenSu Yue1-1/+1
In verify_dir_item, it wants to printk name_len of dir_item but printk data_len acutally. Fix it by calling btrfs_dir_name_len instead of btrfs_dir_data_len. Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2017-06-09Input: synaptics-rmi4 - register F03 port as pass-through serioDmitry Torokhov1-1/+1
The 5th generation Thinkpad X1 Carbons use Synaptics touchpads accessible over SMBus/RMI, combined with ALPS or Elantech trackpoint devices instead of classic IBM/Lenovo trackpoints. Unfortunately there is no way for ALPS driver to detect whether it is dealing with touchpad + trackpoint combination or just a trackpoint, so we end up with a "phantom" dualpoint ALPS device in addition to real touchpad and trackpoint. Given that we do not have any special advanced handling for ALPS or Elantech trackpoints (unlike IBM trackpoints that have separate driver and a host of options) we are better off keeping the trackpoints in PS/2 emulation mode. We achieve that by setting serio type to SERIO_PS_PSTHRU, which will limit number of protocols psmouse driver will try. In addition to getting rid of the "phantom" touchpads, this will also speed up probing of F03 pass-through port. Reported-by: Damjan Georgievski <gdamjan@gmail.com> Suggested-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2017-06-09device-dax: fix 'dax' device filesystem inode destruction crashDan Williams1-2/+7
The inode destruction path for the 'dax' device filesystem incorrectly assumes that the inode was initialized through 'alloc_dax()'. However, if someone attempts to directly mount the dax filesystem with 'mount -t dax dax mnt' that will bypass 'alloc_dax()' and the following failure signatures may occur as a result: kill_dax() must be called before final iput() WARNING: CPU: 2 PID: 1188 at drivers/dax/super.c:243 dax_destroy_inode+0x48/0x50 RIP: 0010:dax_destroy_inode+0x48/0x50 Call Trace: destroy_inode+0x3b/0x60 evict+0x139/0x1c0 iput+0x1f9/0x2d0 dentry_unlink_inode+0xc3/0x160 __dentry_kill+0xcf/0x180 ? dput+0x37/0x3b0 dput+0x3a3/0x3b0 do_one_tree+0x36/0x40 shrink_dcache_for_umount+0x2d/0x90 generic_shutdown_super+0x1f/0x120 kill_anon_super+0x12/0x20 deactivate_locked_super+0x43/0x70 deactivate_super+0x4e/0x60 general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC RIP: 0010:kfree+0x6d/0x290 Call Trace: <IRQ> dax_i_callback+0x22/0x60 ? dax_destroy_inode+0x50/0x50 rcu_process_callbacks+0x298/0x740 ida_remove called for id=0 which is not allocated. WARNING: CPU: 0 PID: 0 at lib/idr.c:383 ida_remove+0x110/0x120 [..] Call Trace: <IRQ> ida_simple_remove+0x2b/0x50 ? dax_destroy_inode+0x50/0x50 dax_i_callback+0x3c/0x60 rcu_process_callbacks+0x298/0x740 Add missing initialization of the 'struct dax_device' and inode so that the destruction path does not kfree() or ida_simple_remove() uninitialized data. Fixes: 7b6be8444e0f ("dax: refactor dax-fs into a generic provider of 'struct dax_device' instances") Reported-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-06-09efi: Fix boot panic because of invalid BGRT image addressDave Young1-1/+25
Maniaxx reported a kernel boot crash in the EFI code, which I emulated by using same invalid phys addr in code: BUG: unable to handle kernel paging request at ffffffffff280001 IP: efi_bgrt_init+0xfb/0x153 ... Call Trace: ? bgrt_init+0xbc/0xbc acpi_parse_bgrt+0xe/0x12 acpi_table_parse+0x89/0xb8 acpi_boot_init+0x445/0x4e2 ? acpi_parse_x2apic+0x79/0x79 ? dmi_ignore_irq0_timer_override+0x33/0x33 setup_arch+0xb63/0xc82 ? early_idt_handler_array+0x120/0x120 start_kernel+0xb7/0x443 ? early_idt_handler_array+0x120/0x120 x86_64_start_reservations+0x29/0x2b x86_64_start_kernel+0x154/0x177 secondary_startup_64+0x9f/0x9f There is also a similar bug filed in bugzilla.kernel.org: https://bugzilla.kernel.org/show_bug.cgi?id=195633 The crash is caused by this commit: 7b0a911478c7 efi/x86: Move the EFI BGRT init code to early init code The root cause is the firmware on those machines provides invalid BGRT image addresses. In a kernel before above commit BGRT initializes late and uses ioremap() to map the image address. Ioremap validates the address, if it is not a valid physical address ioremap() just fails and returns. However in current kernel EFI BGRT initializes early and uses early_memremap() which does not validate the image address, and kernel panic happens. According to ACPI spec the BGRT image address should fall into EFI_BOOT_SERVICES_DATA, see the section 5.2.22.4 of below document: http://www.uefi.org/sites/default/files/resources/ACPI_6_1.pdf Fix this issue by validating the image address in efi_bgrt_init(). If the image address does not fall into any EFI_BOOT_SERVICES_DATA areas we just bail out with a warning message. Reported-by: Maniaxx <tripleshiftone@gmail.com> Signed-off-by: Dave Young <dyoung@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Fixes: 7b0a911478c7 ("efi/x86: Move the EFI BGRT init code to early init code") Link: http://lkml.kernel.org/r/20170609084558.26766-2-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-09cxl: Avoid double free_irq() for psl,slice interruptsVaibhav Jain1-3/+11
During an eeh call to cxl_remove can result in double free_irq of psl,slice interrupts. This can happen if perst_reloads_same_image == 1 and call to cxl_configure_adapter() fails during slot_reset callback. In such a case we see a kernel oops with following back-trace: Oops: Kernel access of bad area, sig: 11 [#1] Call Trace: free_irq+0x88/0xd0 (unreliable) cxl_unmap_irq+0x20/0x40 [cxl] cxl_native_release_psl_irq+0x78/0xd8 [cxl] pci_deconfigure_afu+0xac/0x110 [cxl] cxl_remove+0x104/0x210 [cxl] pci_device_remove+0x6c/0x110 device_release_driver_internal+0x204/0x2e0 pci_stop_bus_device+0xa0/0xd0 pci_stop_and_remove_bus_device+0x28/0x40 pci_hp_remove_devices+0xb0/0x150 pci_hp_remove_devices+0x68/0x150 eeh_handle_normal_event+0x140/0x580 eeh_handle_event+0x174/0x360 eeh_event_handler+0x1e8/0x1f0 This patch fixes the issue of double free_irq by checking that variables that hold the virqs (err_hwirq, serr_hwirq, psl_virq) are not '0' before un-mapping and resetting these variables to '0' when they are un-mapped. Cc: stable@vger.kernel.org Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-09gpio: mvebu: fix gpio bank registration when pwm is usedRichard Genoud1-0/+7
If more than one gpio bank has the "pwm" property, only one will be registered successfully, all the others will fail with: mvebu-gpio: probe of f1018140.gpio failed with error -17 That's because in alloc_pwms(), the chip->base (aka "int pwm"), was not set (thus, ==0) ; and 0 is a meaningful start value in alloc_pwm(). What was intended is mvpwm->chip->base = -1. Like that, the numbering will be done auto-magically Moreover, as the region might be already occupied by another pwm, we shouldn't force: mvpwm->chip->base = 0 nor mvpwm->chip->base = id * MVEBU_MAX_GPIO_PER_BANK; Tested on clearfog-pro (Marvell 88F6828) Fixes: 757642f9a584 ("gpio: mvebu: Add limited PWM support") Signed-off-by: Richard Genoud <richard.genoud@gmail.com> Reviewed-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2017-06-09gpio: mvebu: fix blink counter register selectionRichard Genoud1-1/+1
The blink counter A was always selected because 0 was forced in the blink select counter register. The variable 'set' was obviously there to be used as the register value, selecting the B counter when id==1 and A counter when id==0. Tested on clearfog-pro (Marvell 88F6828) Fixes: 757642f9a584 ("gpio: mvebu: Add limited PWM support") Reviewed-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Reviewed-by: Ralph Sennhauser <ralph.sennhauser@gmail.com> Signed-off-by: Richard Genoud <richard.genoud@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2017-06-09KEYS: fix refcount_inc() on zeroMark Rutland1-7/+4
If a key's refcount is dropped to zero between key_lookup() peeking at the refcount and subsequently attempting to increment it, refcount_inc() will see a zero refcount. Here, refcount_inc() will WARN_ONCE(), and will *not* increment the refcount, which will remain zero. Once key_lookup() drops key_serial_lock, it is possible for the key to be freed behind our back. This patch uses refcount_inc_not_zero() to perform the peek and increment atomically. Fixes: fff292914d3a2f1e ("security, keys: convert key.usage from atomic_t to refcount_t") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: David Howells <dhowells@redhat.com> Cc: David Windsor <dwindsor@gmail.com> Cc: Elena Reshetova <elena.reshetova@intel.com> Cc: Hans Liljestrand <ishkamiel@gmail.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Kees Cook <keescook@chromium.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-06-09KEYS: Convert KEYCTL_DH_COMPUTE to use the crypto KPP APIMat Martineau2-103/+171
The initial Diffie-Hellman computation made direct use of the MPI library because the crypto module did not support DH at the time. Now that KPP is implemented, KEYCTL_DH_COMPUTE should use it to get rid of duplicate code and leverage possible hardware acceleration. This fixes an issue whereby the input to the KDF computation would include additional uninitialized memory when the result of the Diffie-Hellman computation was shorter than the input prime number. Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-06-09crypto : asymmetric_keys : verify_pefile:zero memory content before freeingLoganaden Velvindron1-2/+2
Signed-off-by: Loganaden Velvindron <logan@hackers.mu> Signed-off-by: Yasir Auleear <yasirmx@hackers.mu> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-06-09KEYS: DH: add __user annotations to keyctl_kdf_paramsEric Biggers1-2/+2
Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-06-09KEYS: DH: ensure the KDF counter is properly alignedEric Biggers1-13/+3
Accessing a 'u8[4]' through a '__be32 *' violates alignment rules. Just make the counter a __be32 instead. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-06-09KEYS: DH: don't feed uninitialized "otherinfo" into KDFEric Biggers1-1/+1
If userspace called KEYCTL_DH_COMPUTE with kdf_params containing NULL otherinfo but nonzero otherinfolen, the kernel would allocate a buffer for the otherinfo, then feed it into the KDF without initializing it. Fix this by always doing the copy from userspace (which will fail with EFAULT in this scenario). Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-06-09KEYS: DH: forbid using digest_null as the KDF hashEric Biggers1-1/+11
Requesting "digest_null" in the keyctl_kdf_params caused an infinite loop in kdf_ctr() because the "null" hash has a digest size of 0. Fix it by rejecting hash algorithms with a digest size of 0. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-06-09KEYS: sanitize key structs before freeingEric Biggers2-4/+1
While a 'struct key' itself normally does not contain sensitive information, Documentation/security/keys.txt actually encourages this: "Having a payload is not required; and the payload can, in fact, just be a value stored in the struct key itself." In case someone has taken this advice, or will take this advice in the future, zero the key structure before freeing it. We might as well, and as a bonus this could make it a bit more difficult for an adversary to determine which keys have recently been in use. This is safe because the key_jar cache does not use a constructor. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-06-09KEYS: trusted: sanitize all key materialEric Biggers1-28/+22
As the previous patch did for encrypted-keys, zero sensitive any potentially sensitive data related to the "trusted" key type before it is freed. Notably, we were not zeroing the tpm_buf structures in which the actual key is stored for TPM seal and unseal, nor were we zeroing the trusted_key_payload in certain error paths. Cc: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: David Safford <safford@us.ibm.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-06-09KEYS: encrypted: sanitize all key materialEric Biggers1-18/+13
For keys of type "encrypted", consistently zero sensitive key material before freeing it. This was already being done for the decrypted payloads of encrypted keys, but not for the master key and the keys derived from the master key. Out of an abundance of caution and because it is trivial to do so, also zero buffers containing the key payload in encrypted form, although depending on how the encrypted-keys feature is used such information does not necessarily need to be kept secret. Cc: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: David Safford <safford@us.ibm.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <james.l.morris@oracle.com>