Age | Commit message (Collapse) | Author | Files | Lines |
|
Factoring ctime into the nfsv4 change attribute gives us better
properties than just i_version alone.
Eventually we'll likely also expose this (as opposed to raw i_version)
to userspace, at which point we'll want to move it to a common helper,
called from either userspace or individual filesystems. For now, nfsd
is the only user.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Clean up: No need to save the I/O direction. The functions that
release svc_rdma_chunk_ctxt already know what direction to use.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Clean up: Use offset_in_page() macro instead of open-coding.
Reported-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Clean up: Registration mode details are now handled by the rdma_rw
API, and thus can be removed from svcrdma.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
There's no longer a need to compare each SGE's lkey with the PD's
local_dma_lkey. Now that FRWR is gone, all DMA mappings are for
pages that were registered with this key.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Clean up: Now that the svc_rdma_recvfrom path uses the rdma_rw API,
the details of Read sink buffer registration are dealt with by the
kernel's RDMA core. This cache is no longer used, and can be
removed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Clean up:
The generic RDMA R/W API conversion of svc_rdma_recvfrom replaced
the Register, Read, and Invalidate completion handlers. Remove the
old ones, which are no longer used.
These handlers shared some helper code with svc_rdma_wc_send. Fold
the wc_common helper back into the one remaining completion handler.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
When an RPC-over-RDMA request is received, the Receive buffer
contains a Transport Header possibly followed by an RPC message.
Even though rq_arg.head[0] (as passed to NFSD) does not contain the
Transport Header header, currently rq_arg.len includes the size of
the Transport Header.
That violates the intent of the xdr_buf API contract. .buflen should
include everything, but .len should be exactly the length of the RPC
message in the buffer.
The rq_arg fields are summed together at the end of
svc_rdma_recvfrom to obtain the correct return value. rq_arg.len
really ought to contain the correct number of bytes already, but it
currently doesn't due to the above misbehavior.
Let's instead ensure that .buflen includes the length of the
transport header, and that .len is always equal to head.iov_len +
.page_len + tail.iov_len .
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
The current svcrdma recvfrom code path has a lot of detail about
registration mode and the type of port (iWARP, IB, etc).
Instead, use the RDMA core's generic R/W API. This shares code with
other RDMA-enabled ULPs that manages the gory details of buffer
registration and the posting of RDMA Read Work Requests.
Since the Read list marshaling code is being replaced, I took the
opportunity to replace C structure-based XDR encoding code with more
portable code that uses pointer arithmetic.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
svc_rdma_rw.c already contains helpers for the sendto path.
Introduce helpers for the recvfrom path.
The plan is to replace the local NFSD bespoke code that constructs
and posts RDMA Read Work Requests with calls to the rdma_rw API.
This shares code with other RDMA-enabled ULPs that manages the gory
details of buffer registration and posting Work Requests.
This new code also puts all RDMA_NOMSG-specific logic in one place.
Lastly, the use of rqstp->rq_arg.pages is deprecated in favor of
using rqstp->rq_pages directly, for clarity.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
svcrdma needs 259 pages allocated to receive 1MB NFSv4.0 WRITE requests:
- 1 page for the transport header and head iovec
- 256 pages for the data payload
- 1 page for the trailing GETATTR request (since NFSD XDR decoding
does not look for a tail iovec, the GETATTR is stuck at the end
of the rqstp->rq_arg.pages list)
- 1 page for building the reply xdr_buf
But RPCSVC_MAXPAGES is already 259 (on x86_64). The problem is that
svc_alloc_arg never allocates that many pages. To address this:
1. The final element of rq_pages always points to NULL. To
accommodate up to 259 pages in rq_pages, add an extra element
to rq_pages for the array termination sentinel.
2. Adjust the calculation of "pages" to match how RPCSVC_MAXPAGES
is calculated, so it can go up to 259. Bruce noted that the
calculation assumes sv_max_mesg is a multiple of PAGE_SIZE,
which might not always be true. I didn't change this assumption.
3. Change the loop boundaries to allow 259 pages to be allocated.
Additional clean-up: WARN_ON_ONCE adds an extra conditional branch,
which is basically never taken. And there's no need to dump the
stack here because svc_alloc_arg has only one caller.
Keeping that NULL "array termination sentinel"; there doesn't appear to
be any code that depends on it, only code in nfsd_splice_actor() which
needs the 259th element to be initialized to *something*. So it's
possible we could just keep the array at 259 elements and drop that
final NULL, but we're being conservative for now.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
>From what I can tell, calling ->recvfrom when there is no work to do
is a normal part of operation. This is the only way svc_recv can
tell when there is no more data ready to receive on the transport.
Neither the TCP nor the UDP transport implementations have a
"starve" metric.
The cost of receive starvation accounting is bumping an atomic, which
results in extra (IMO unnecessary) bus traffic between CPU sockets,
while holding a spin lock.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Identify malformed transport headers and unsupported chunk
combinations as early as possible.
- Ensure that segment lengths are not crazy.
- Ensure that the Reply chunk's segment count is not crazy.
With a 1KB inline threshold, the largest number of Write segments
that can be conveyed is about 60 (for a RDMA_NOMSG Reply message).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Identify malformed transport headers and unsupported chunk
combinations as early as possible.
- Reject RPC-over-RDMA messages that contain more than one Write
chunk, since this implementation does not support more than one per
message.
- Ensure that segment lengths are not crazy.
- Ensure that the chunk's segment count is not crazy.
With a 1KB inline threshold, the largest number of Write segments
that can be conveyed is about 60 (for a RDMA_NOMSG Reply message).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Identify malformed transport headers and unsupported chunk
combinations as early as possible.
- Reject RPC-over-RDMA messages that contain more than one Read chunk,
since this implementation currently does not support more than one
per RPC transaction.
- Ensure that segment lengths are not crazy.
- Remove the segment count check. With a 1KB inline threshold, the
largest number of Read segments that can be conveyed is about 40
(for a RDMA_NOMSG Call message). This is nowhere near
RPCSVC_MAXPAGES. As far as I can tell, that was just a sanity
check and does not enforce an implementation limit.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
svc_rdma_marshal.c has one remaining exported function --
svc_rdma_xdr_decode_req -- and it has a single call site. Take
the same approach as the sendto path, and move this function
into the source file where it is called.
This is a refactoring change only.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Sanity case: Catch the case where more Work Requests are being
posted to the Send Queue than there are Send Queue Entries.
This might happen if a client sends a chunk with more segments than
there are SQEs for the transport. The server can't send that reply,
so the transport will deadlock unless the client drops the RPC.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
The server displays "svcrdma: failed to post Send WR (-107)" in the
kernel log when the client disconnects. This could flood the server's
log, so remove the message.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Running a multi-threaded 8KB fio test (70/30 mix), three or four out
of twelve of the jobs fail when using krb5i. The failure is an EIO
on a read.
Troubleshooting confirmed the EIO results when the client fails to
verify the MIC of an NFS READ reply. Bruce suggested the problem
could be due to the data payload changing between the time the
reply's MIC was computed on the server and the time the reply was
actually sent.
krb5p gets around this problem by disabling RQ_SPLICE_OK. Use the
same mechanism for krb5i RPCs.
"iozone -i0 -i1 -s128m -y1k -az -I", export is tmpfs, mount is
sec=krb5i,vers=3,proto=rdma. The important numbers are the
read / reread column.
Here's without the RQ_SPLICE_OK patch:
kB reclen write rewrite read reread
131072 1 7546 7929 8396 8267
131072 2 14375 14600 15843 15639
131072 4 19280 19248 21303 21410
131072 8 32350 31772 35199 34883
131072 16 36748 37477 49365 51706
131072 32 55669 56059 57475 57389
131072 64 74599 75190 74903 75550
131072 128 99810 101446 102828 102724
131072 256 122042 122612 124806 125026
131072 512 137614 138004 141412 141267
131072 1024 146601 148774 151356 151409
131072 2048 180684 181727 293140 292840
131072 4096 206907 207658 552964 549029
131072 8192 223982 224360 454493 473469
131072 16384 228927 228390 654734 632607
And here's with it:
kB reclen write rewrite read reread
131072 1 7700 7365 7958 8011
131072 2 13211 13303 14937 14414
131072 4 19001 19265 20544 20657
131072 8 30883 31097 34255 33566
131072 16 36868 34908 51499 49944
131072 32 56428 55535 58710 56952
131072 64 73507 74676 75619 74378
131072 128 100324 101442 103276 102736
131072 256 122517 122995 124639 124150
131072 512 137317 139007 140530 140830
131072 1024 146807 148923 151246 151072
131072 2048 179656 180732 292631 292034
131072 4096 206216 208583 543355 541951
131072 8192 223738 224273 494201 489372
131072 16384 229313 229840 691719 668427
I would say that there is not much difference in this test.
For good measure, here's the same test with sec=krb5p:
kB reclen write rewrite read reread
131072 1 5982 5881 6137 6218
131072 2 10216 10252 10850 10932
131072 4 12236 12575 15375 15526
131072 8 15461 15462 23821 22351
131072 16 25677 25811 27529 27640
131072 32 31903 32354 34063 33857
131072 64 42989 43188 45635 45561
131072 128 52848 53210 56144 56141
131072 256 59123 59214 62691 62933
131072 512 63140 63277 66887 67025
131072 1024 65255 65299 69213 69140
131072 2048 76454 76555 133767 133862
131072 4096 84726 84883 251925 250702
131072 8192 89491 89482 270821 276085
131072 16384 91572 91597 361768 336868
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=307
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
|
|
Commit abb2ea7dfd82 ("compiler, clang: suppress warning for unused
static inline functions") just caused more warnings due to re-defining
the 'inline' macro.
So undef it before re-defining it, and also add the 'notrace' attribute
like the gcc version that this is overriding does.
Maybe this makes clang happier.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
INFO: task gnome-terminal-:1734 blocked for more than 120 seconds.
Not tainted 4.12.0-rc4+ #8
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
gnome-terminal- D 0 1734 1015 0x00000000
Call Trace:
__schedule+0x3cd/0xb30
schedule+0x40/0x90
kvm_async_pf_task_wait+0x1cc/0x270
? __vfs_read+0x37/0x150
? prepare_to_swait+0x22/0x70
do_async_page_fault+0x77/0xb0
? do_async_page_fault+0x77/0xb0
async_page_fault+0x28/0x30
This is triggered by running both win7 and win2016 on L1 KVM simultaneously,
and then gives stress to memory on L1, I can observed this hang on L1 when
at least ~70% swap area is occupied on L0.
This is due to async pf was injected to L2 which should be injected to L1,
L2 guest starts receiving pagefault w/ bogus %cr2(apf token from the host
actually), and L1 guest starts accumulating tasks stuck in D state in
kvm_async_pf_task_wait() since missing PAGE_READY async_pfs.
This patch fixes the hang by doing async pf when executing L1 guest.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Commit ac4691fac8ad ("hexagon: switch to RAW_COPY_USER") replaced
__copy_to_user_hexagon() with raw_copy_to_user(), but did not catch
all callers, resulting in the following build error.
arch/hexagon/mm/uaccess.c: In function '__clear_user_hexagon':
arch/hexagon/mm/uaccess.c:40:3: error:
implicit declaration of function '__copy_to_user_hexagon'
Fixes: ac4691fac8ad ("hexagon: switch to RAW_COPY_USER")
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
As it is, short copy in write() to append-only file will fail
to truncate the excessive allocated blocks. As the matter of
fact, all checks in ufs_truncate_blocks() are either redundant
or wrong for that caller. As for the only other caller
(ufs_evict_inode()), we only need the file type checks there.
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
... and it really needs splitting into "new" and "extend" cases, but that's for
later
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
btrfs_calc_trans_metadata_size() does an unsigned 32-bit multiplication,
which can overflow if num_items >= 4 GB / (nodesize * BTRFS_MAX_LEVEL * 2).
For a nodesize of 16kB, this overflow happens at 16k items. Usually,
num_items is a small constant passed to btrfs_start_transaction(), but
we also use btrfs_calc_trans_metadata_size() for metadata reservations
for extent items in btrfs_delalloc_{reserve,release}_metadata().
In drop_outstanding_extents(), num_items is calculated as
inode->reserved_extents - inode->outstanding_extents. The difference
between these two counters is usually small, but if many delalloc
extents are reserved and then the outstanding extents are merged in
btrfs_merge_extent_hook(), the difference can become large enough to
overflow in btrfs_calc_trans_metadata_size().
The overflow manifests itself as a leak of a multiple of 4 GB in
delalloc_block_rsv and the metadata bytes_may_use counter. This in turn
can cause early ENOSPC errors. Additionally, these WARN_ONs in
extent-tree.c will be hit when unmounting:
WARN_ON(fs_info->delalloc_block_rsv.size > 0);
WARN_ON(fs_info->delalloc_block_rsv.reserved > 0);
WARN_ON(space_info->bytes_pinned > 0 ||
space_info->bytes_reserved > 0 ||
space_info->bytes_may_use > 0);
Fix it by casting nodesize to a u64 so that
btrfs_calc_trans_metadata_size() does a full 64-bit multiplication.
While we're here, do the same in btrfs_calc_trunc_metadata_size(); this
can't overflow with any existing uses, but it's better to be safe here
than have another hard-to-debug problem later on.
Cc: stable@vger.kernel.org
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
Before this, we use 'filled' mode here, ie. if all range has been
filled with EXTENT_DEFRAG bits, get to clear it, but if the defrag
range joins the adjacent delalloc range, then we'll have EXTENT_DEFRAG
bits in extent_state until releasing this inode's pages, and that
prevents extent_data from being freed.
This clears the bit if any was found within the ordered extent.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
In verify_dir_item, it wants to printk name_len of dir_item but
printk data_len acutally.
Fix it by calling btrfs_dir_name_len instead of btrfs_dir_data_len.
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
The 5th generation Thinkpad X1 Carbons use Synaptics touchpads accessible
over SMBus/RMI, combined with ALPS or Elantech trackpoint devices instead
of classic IBM/Lenovo trackpoints. Unfortunately there is no way for ALPS
driver to detect whether it is dealing with touchpad + trackpoint
combination or just a trackpoint, so we end up with a "phantom" dualpoint
ALPS device in addition to real touchpad and trackpoint.
Given that we do not have any special advanced handling for ALPS or
Elantech trackpoints (unlike IBM trackpoints that have separate driver and
a host of options) we are better off keeping the trackpoints in PS/2
emulation mode. We achieve that by setting serio type to SERIO_PS_PSTHRU,
which will limit number of protocols psmouse driver will try. In addition
to getting rid of the "phantom" touchpads, this will also speed up probing
of F03 pass-through port.
Reported-by: Damjan Georgievski <gdamjan@gmail.com>
Suggested-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
The inode destruction path for the 'dax' device filesystem incorrectly
assumes that the inode was initialized through 'alloc_dax()'. However,
if someone attempts to directly mount the dax filesystem with 'mount -t
dax dax mnt' that will bypass 'alloc_dax()' and the following failure
signatures may occur as a result:
kill_dax() must be called before final iput()
WARNING: CPU: 2 PID: 1188 at drivers/dax/super.c:243 dax_destroy_inode+0x48/0x50
RIP: 0010:dax_destroy_inode+0x48/0x50
Call Trace:
destroy_inode+0x3b/0x60
evict+0x139/0x1c0
iput+0x1f9/0x2d0
dentry_unlink_inode+0xc3/0x160
__dentry_kill+0xcf/0x180
? dput+0x37/0x3b0
dput+0x3a3/0x3b0
do_one_tree+0x36/0x40
shrink_dcache_for_umount+0x2d/0x90
generic_shutdown_super+0x1f/0x120
kill_anon_super+0x12/0x20
deactivate_locked_super+0x43/0x70
deactivate_super+0x4e/0x60
general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
RIP: 0010:kfree+0x6d/0x290
Call Trace:
<IRQ>
dax_i_callback+0x22/0x60
? dax_destroy_inode+0x50/0x50
rcu_process_callbacks+0x298/0x740
ida_remove called for id=0 which is not allocated.
WARNING: CPU: 0 PID: 0 at lib/idr.c:383 ida_remove+0x110/0x120
[..]
Call Trace:
<IRQ>
ida_simple_remove+0x2b/0x50
? dax_destroy_inode+0x50/0x50
dax_i_callback+0x3c/0x60
rcu_process_callbacks+0x298/0x740
Add missing initialization of the 'struct dax_device' and inode so that
the destruction path does not kfree() or ida_simple_remove()
uninitialized data.
Fixes: 7b6be8444e0f ("dax: refactor dax-fs into a generic provider of 'struct dax_device' instances")
Reported-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Maniaxx reported a kernel boot crash in the EFI code, which I emulated
by using same invalid phys addr in code:
BUG: unable to handle kernel paging request at ffffffffff280001
IP: efi_bgrt_init+0xfb/0x153
...
Call Trace:
? bgrt_init+0xbc/0xbc
acpi_parse_bgrt+0xe/0x12
acpi_table_parse+0x89/0xb8
acpi_boot_init+0x445/0x4e2
? acpi_parse_x2apic+0x79/0x79
? dmi_ignore_irq0_timer_override+0x33/0x33
setup_arch+0xb63/0xc82
? early_idt_handler_array+0x120/0x120
start_kernel+0xb7/0x443
? early_idt_handler_array+0x120/0x120
x86_64_start_reservations+0x29/0x2b
x86_64_start_kernel+0x154/0x177
secondary_startup_64+0x9f/0x9f
There is also a similar bug filed in bugzilla.kernel.org:
https://bugzilla.kernel.org/show_bug.cgi?id=195633
The crash is caused by this commit:
7b0a911478c7 efi/x86: Move the EFI BGRT init code to early init code
The root cause is the firmware on those machines provides invalid BGRT
image addresses.
In a kernel before above commit BGRT initializes late and uses ioremap()
to map the image address. Ioremap validates the address, if it is not a
valid physical address ioremap() just fails and returns. However in current
kernel EFI BGRT initializes early and uses early_memremap() which does not
validate the image address, and kernel panic happens.
According to ACPI spec the BGRT image address should fall into
EFI_BOOT_SERVICES_DATA, see the section 5.2.22.4 of below document:
http://www.uefi.org/sites/default/files/resources/ACPI_6_1.pdf
Fix this issue by validating the image address in efi_bgrt_init(). If the
image address does not fall into any EFI_BOOT_SERVICES_DATA areas we just
bail out with a warning message.
Reported-by: Maniaxx <tripleshiftone@gmail.com>
Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Fixes: 7b0a911478c7 ("efi/x86: Move the EFI BGRT init code to early init code")
Link: http://lkml.kernel.org/r/20170609084558.26766-2-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
During an eeh call to cxl_remove can result in double free_irq of
psl,slice interrupts. This can happen if perst_reloads_same_image == 1
and call to cxl_configure_adapter() fails during slot_reset
callback. In such a case we see a kernel oops with following back-trace:
Oops: Kernel access of bad area, sig: 11 [#1]
Call Trace:
free_irq+0x88/0xd0 (unreliable)
cxl_unmap_irq+0x20/0x40 [cxl]
cxl_native_release_psl_irq+0x78/0xd8 [cxl]
pci_deconfigure_afu+0xac/0x110 [cxl]
cxl_remove+0x104/0x210 [cxl]
pci_device_remove+0x6c/0x110
device_release_driver_internal+0x204/0x2e0
pci_stop_bus_device+0xa0/0xd0
pci_stop_and_remove_bus_device+0x28/0x40
pci_hp_remove_devices+0xb0/0x150
pci_hp_remove_devices+0x68/0x150
eeh_handle_normal_event+0x140/0x580
eeh_handle_event+0x174/0x360
eeh_event_handler+0x1e8/0x1f0
This patch fixes the issue of double free_irq by checking that
variables that hold the virqs (err_hwirq, serr_hwirq, psl_virq) are
not '0' before un-mapping and resetting these variables to '0' when
they are un-mapped.
Cc: stable@vger.kernel.org
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
If more than one gpio bank has the "pwm" property, only one will be
registered successfully, all the others will fail with:
mvebu-gpio: probe of f1018140.gpio failed with error -17
That's because in alloc_pwms(), the chip->base (aka "int pwm"), was not
set (thus, ==0) ; and 0 is a meaningful start value in alloc_pwm().
What was intended is mvpwm->chip->base = -1.
Like that, the numbering will be done auto-magically
Moreover, as the region might be already occupied by another pwm, we
shouldn't force:
mvpwm->chip->base = 0
nor
mvpwm->chip->base = id * MVEBU_MAX_GPIO_PER_BANK;
Tested on clearfog-pro (Marvell 88F6828)
Fixes: 757642f9a584 ("gpio: mvebu: Add limited PWM support")
Signed-off-by: Richard Genoud <richard.genoud@gmail.com>
Reviewed-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
The blink counter A was always selected because 0 was forced in the
blink select counter register.
The variable 'set' was obviously there to be used as the register value,
selecting the B counter when id==1 and A counter when id==0.
Tested on clearfog-pro (Marvell 88F6828)
Fixes: 757642f9a584 ("gpio: mvebu: Add limited PWM support")
Reviewed-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Reviewed-by: Ralph Sennhauser <ralph.sennhauser@gmail.com>
Signed-off-by: Richard Genoud <richard.genoud@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
If a key's refcount is dropped to zero between key_lookup() peeking at
the refcount and subsequently attempting to increment it, refcount_inc()
will see a zero refcount. Here, refcount_inc() will WARN_ONCE(), and
will *not* increment the refcount, which will remain zero.
Once key_lookup() drops key_serial_lock, it is possible for the key to
be freed behind our back.
This patch uses refcount_inc_not_zero() to perform the peek and increment
atomically.
Fixes: fff292914d3a2f1e ("security, keys: convert key.usage from atomic_t to refcount_t")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: David Windsor <dwindsor@gmail.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Hans Liljestrand <ishkamiel@gmail.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
|
|
The initial Diffie-Hellman computation made direct use of the MPI
library because the crypto module did not support DH at the time. Now
that KPP is implemented, KEYCTL_DH_COMPUTE should use it to get rid of
duplicate code and leverage possible hardware acceleration.
This fixes an issue whereby the input to the KDF computation would
include additional uninitialized memory when the result of the
Diffie-Hellman computation was shorter than the input prime number.
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
|
|
Signed-off-by: Loganaden Velvindron <logan@hackers.mu>
Signed-off-by: Yasir Auleear <yasirmx@hackers.mu>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
|
|
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: James Morris <james.l.morris@oracle.com>
|
|
Accessing a 'u8[4]' through a '__be32 *' violates alignment rules. Just
make the counter a __be32 instead.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: James Morris <james.l.morris@oracle.com>
|
|
If userspace called KEYCTL_DH_COMPUTE with kdf_params containing NULL
otherinfo but nonzero otherinfolen, the kernel would allocate a buffer
for the otherinfo, then feed it into the KDF without initializing it.
Fix this by always doing the copy from userspace (which will fail with
EFAULT in this scenario).
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: James Morris <james.l.morris@oracle.com>
|
|
Requesting "digest_null" in the keyctl_kdf_params caused an infinite
loop in kdf_ctr() because the "null" hash has a digest size of 0. Fix
it by rejecting hash algorithms with a digest size of 0.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: James Morris <james.l.morris@oracle.com>
|
|
While a 'struct key' itself normally does not contain sensitive
information, Documentation/security/keys.txt actually encourages this:
"Having a payload is not required; and the payload can, in fact,
just be a value stored in the struct key itself."
In case someone has taken this advice, or will take this advice in the
future, zero the key structure before freeing it. We might as well, and
as a bonus this could make it a bit more difficult for an adversary to
determine which keys have recently been in use.
This is safe because the key_jar cache does not use a constructor.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
|
|
As the previous patch did for encrypted-keys, zero sensitive any
potentially sensitive data related to the "trusted" key type before it
is freed. Notably, we were not zeroing the tpm_buf structures in which
the actual key is stored for TPM seal and unseal, nor were we zeroing
the trusted_key_payload in certain error paths.
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: David Safford <safford@us.ibm.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
|
|
For keys of type "encrypted", consistently zero sensitive key material
before freeing it. This was already being done for the decrypted
payloads of encrypted keys, but not for the master key and the keys
derived from the master key.
Out of an abundance of caution and because it is trivial to do so, also
zero buffers containing the key payload in encrypted form, although
depending on how the encrypted-keys feature is used such information
does not necessarily need to be kept secret.
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: David Safford <safford@us.ibm.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
|