path: root/net/ceph/messenger.c
Age | Commit message | Author | Files | Lines
2022-10-04 | libceph: drop last_piece flag from ceph_msg_data_cursor | Jeff Layton | 1 | -35/+5
ceph_msg_data_next is always passed a NULL pointer for this field. Some of the "next" operations look at it in order to determine the length, but we can just take the min of the data on the page or cursor->resid. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
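The simplification described above can be pictured with a small user-space sketch: the length of the next piece follows from the bytes left on the current page and the residual count alone. This is not the kernel code. `SKETCH_PAGE_SIZE`, `struct sketch_cursor` and `sketch_next_len()` are hypothetical names, and only `resid` mirrors a field named in the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size, for illustration only */

/* Hypothetical stand-in for the cursor; only resid is named in the commit. */
struct sketch_cursor {
    size_t resid;        /* bytes of message data not yet consumed */
    size_t page_offset;  /* offset into the current page */
};

/* Length of the next piece: what is left on the page, capped by resid. */
size_t sketch_next_len(const struct sketch_cursor *c)
{
    size_t on_page;

    assert(c->page_offset < SKETCH_PAGE_SIZE);
    on_page = SKETCH_PAGE_SIZE - c->page_offset;
    return on_page < c->resid ? on_page : c->resid;
}
```

With this shape no separate "last piece" flag is needed: the caller can tell it has reached the end when the returned length equals `resid`.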
2022-02-02 | libceph: optionally use bounce buffer on recv path in crc mode | Ilya Dryomov | 1 | -0/+4
Both msgr1 and msgr2 in crc mode are zero copy in the sense that message data is read from the socket directly into the destination buffer. We assume that the destination buffer is stable (i.e. remains unchanged while it is being read to) though. Otherwise, CRC errors ensue:

  libceph: read_partial_message 0000000048edf8ad data crc 1063286393 != exp. 228122706
  libceph: osd1 (1)192.168.122.1:6843 bad crc/signature

  libceph: bad data crc, calculated 57958023, expected 1805382778
  libceph: osd2 (2)192.168.122.1:6876 integrity error, bad crc

Introduce rxbounce option to enable use of a bounce buffer when receiving message data. In particular this is needed if a mapped image is a Windows VM disk, passed to QEMU. Windows has a system-wide "dummy" page that may be mapped into the destination buffer (potentially more than once into the same buffer) by the Windows Memory Manager in an effort to generate a single large I/O [1][2]. QEMU makes a point of preserving overlap relationships when cloning I/O vectors, so krbd gets exposed to this behaviour.

[1] "What Is Really in That MDL?" https://docs.microsoft.com/en-us/previous-versions/windows/hardware/design/dn614012(v=vs.85)
[2] https://blogs.msmvps.com/kernelmustard/2005/05/04/dummy-pages/

URL: https://bugzilla.redhat.com/show_bug.cgi?id=1973317
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
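The idea behind a bounce buffer on the receive path can be sketched in user space as follows: data lands in a private buffer first, the checksum is computed over that stable copy, and only then is the data copied to the destination, so later changes to the destination cannot invalidate the checksum. This is an illustration under assumptions, not the kernel implementation: `sketch_crc32c()` is a plain bitwise CRC-32C written for this example (not the kernel's helper), and `sketch_recv_bounced()` with its `wire` argument is a hypothetical stand-in for reading from a socket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
uint32_t sketch_crc32c(const unsigned char *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/*
 * Hypothetical receive step: "wire" stands in for bytes read from the
 * socket. The checksum is taken over the private bounce copy, never over
 * the destination buffer.
 */
uint32_t sketch_recv_bounced(const unsigned char *wire, size_t len,
                             unsigned char *bounce, unsigned char *dst)
{
    uint32_t crc;

    assert(wire && bounce && dst);
    memcpy(bounce, wire, len);          /* socket -> bounce buffer */
    crc = sketch_crc32c(bounce, len);   /* checksum the stable copy */
    memcpy(dst, bounce, len);           /* bounce buffer -> destination */
    return crc;
}
```

The cost is one extra copy per message, which is presumably why the commit makes the behaviour optional.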
2022-01-20 | Merge tag 'ceph-for-5.17-rc1' of git://github.com/ceph/ceph-client | Linus Torvalds | 1 | -7/+8
Pull ceph updates from Ilya Dryomov:
 "The highlight is the new mount "device" string syntax implemented by Venky Shankar. It solves some long-standing issues with using different auth entities and/or mounting different CephFS filesystems from the same cluster, remounting and also misleading /proc/mounts contents. The existing syntax of course remains to be maintained.

  On top of that, there is a couple of fixes for edge cases in quota and a new mount option for turning on unbuffered I/O mode globally instead of on a per-file basis with ioctl(CEPH_IOC_SYNCIO)"

* tag 'ceph-for-5.17-rc1' of git://github.com/ceph/ceph-client:
  ceph: move CEPH_SUPER_MAGIC definition to magic.h
  ceph: remove redundant Lsx caps check
  ceph: add new "nopagecache" option
  ceph: don't check for quotas on MDS stray dirs
  ceph: drop send metrics debug message
  rbd: make const pointer spaces a static const array
  ceph: Fix incorrect statfs report for small quota
  ceph: mount syntax module parameter
  doc: document new CephFS mount device syntax
  ceph: record updated mon_addr on remount
  ceph: new device mount syntax
  libceph: rename parse_fsid() to ceph_parse_fsid() and export
  libceph: generalize addr/ip parsing based on delimiter
2022-01-15 | mm: allow !GFP_KERNEL allocations for kvmalloc | Michal Hocko | 1 | -1/+1
Support for GFP_NO{FS,IO} and __GFP_NOFAIL has been implemented by previous patches so we can allow the support for kvmalloc. This will allow some external users to simplify or completely remove their helpers. GFP_NOWAIT semantic hasn't been supported so far but it hasn't been explicitly documented so let's add a note about that. ceph_kvmalloc is the first helper to be dropped and changed to kvmalloc. Link: https://lkml.kernel.org/r/20211122153233.9924-5-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Neil Brown <neilb@suse.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-13 | libceph: generalize addr/ip parsing based on delimiter | Venky Shankar | 1 | -7/+8
... and remove hardcoded function name in ceph_parse_ips(). [ idryomov: delim parameter, drop CEPH_ADDR_PARSE_DEFAULT_DELIM ] Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
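Delimiter-driven parsing of an address list can be sketched roughly as below. This is a user-space illustration only: `sketch_parse_ips()` is a hypothetical helper that handles IPv4 via `inet_pton()`, whereas the kernel's `ceph_parse_ips()` has a different signature and also deals with ports and IPv6.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: parse up to max IPv4 addresses from s, separated by
 * delim. Returns the number parsed, or -1 on a malformed or oversized entry.
 */
int sketch_parse_ips(const char *s, char delim, struct in_addr *out, int max)
{
    int n = 0;

    assert(s && out);
    while (*s) {
        const char *end = strchr(s, delim);
        size_t len = end ? (size_t)(end - s) : strlen(s);
        char tmp[INET_ADDRSTRLEN];

        if (n == max || len == 0 || len >= sizeof(tmp))
            return -1;
        memcpy(tmp, s, len);
        tmp[len] = '\0';
        if (inet_pton(AF_INET, tmp, &out[n]) != 1)
            return -1;
        n++;
        s += len;
        if (*s == delim)
            s++;  /* skip the delimiter between entries */
    }
    return n;
}
```

Passing the delimiter as a parameter lets one routine serve callers that separate addresses differently, which appears to be the point of the change.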
2020-12-14 | libceph, ceph: implement msgr2.1 protocol (crc and secure modes) | Ilya Dryomov | 1 | -12/+56
Implement msgr2.1 wire protocol, available since nautilus 14.2.11 and octopus 15.2.5. msgr2.0 wire protocol is not implemented -- it has several security, integrity and robustness issues and is therefore considered deprecated. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 | libceph: move msgr1 protocol specific fields to its own struct | Ilya Dryomov | 1 | -4/+4
A couple whitespace fixups, no functional changes. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 | libceph: move msgr1 protocol implementation to its own file | Ilya Dryomov | 1 | -1495/+0
A pure move, no other changes. Note that ceph_tcp_recv{msg,page}() and ceph_tcp_send{msg,page}() helpers are also moved. msgr2 will bring its own, more efficient, variants based on iov_iter. Switching msgr1 to them was considered but decided against to avoid subtle regressions. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 | libceph: separate msgr1 protocol implementation | Ilya Dryomov | 1 | -50/+88
In preparation for msgr2, define internal messenger <-> protocol interface (as opposed to external messenger <-> client interface, which is struct ceph_connection_operations) consisting of try_read(), try_write(), revoke(), revoke_incoming(), opened(), reset_session() and reset_protocol() ops. The semantics are exactly the same as they are now. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 | libceph: export remaining protocol independent infrastructure | Ilya Dryomov | 1 | -82/+75
In preparation for msgr2, make all protocol independent functions in messenger.c global. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 | libceph: export zero_page | Ilya Dryomov | 1 | -8/+9
In preparation for msgr2, make zero_page global. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 | libceph: rename and export con->flags bits | Ilya Dryomov | 1 | -43/+34
In preparation for msgr2, move the defines to the header file. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 | libceph: rename and export con->state states | Ilya Dryomov | 1 | -51/+39
In preparation for msgr2, rename msgr1 specific states and move the defines to the header file. Also drop state transition comments. They don't cover all possible transitions (e.g. NEGOTIATING -> STANDBY, etc) and currently do more harm than good. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 | libceph: make con->state an int | Ilya Dryomov | 1 | -10/+6
unsigned long is a leftover from when con->state used to be a set of bits managed with set_bit(), clear_bit(), etc. Save a bit of memory. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 | libceph: don't export ceph_messenger_{init_fini}() to modules | Ilya Dryomov | 1 | -2/+0
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 | libceph: make sure our addr->port is zero and addr->nonce is non-zero | Ilya Dryomov | 1 | -10/+17
Our messenger instance addr->port is normally zero -- anything else is nonsensical because as a client we connect to multiple servers and don't listen on any port. However, a user can supply an arbitrary addr:port via ip option and the port is currently preserved. Zero it. Conversely, make sure our addr->nonce is non-zero. A zero nonce is special: in combination with a zero port, it is used to blocklist the entire ip. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
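The two invariants described above can be sketched in a few lines. This is a simplified user-space illustration: `struct sketch_addr` and `sketch_fixup_my_addr()` are hypothetical, and the caller-supplied `fallback_nonce` is an assumption of the sketch rather than how the kernel picks a nonce.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the client's own entity address. */
struct sketch_addr {
    uint16_t port;
    uint32_t nonce;
};

/*
 * Enforce both invariants: the port is always zeroed, and a zero nonce is
 * replaced. fallback_nonce must be non-zero (an assumption of this sketch).
 */
void sketch_fixup_my_addr(struct sketch_addr *a, uint32_t fallback_nonce)
{
    assert(fallback_nonce != 0);
    a->port = 0;
    if (a->nonce == 0)
        a->nonce = fallback_nonce;
}
```

After the fixup the address can never look like the special "zero port, zero nonce" combination that the commit message says is reserved for blocklisting an entire ip.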
2020-12-14 | libceph: factor out ceph_con_get_out_msg() | Ilya Dryomov | 1 | -20/+39
Move the logic of grabbing the next message from the queue into its own function. Like ceph_con_in_msg_alloc(), this is protocol independent. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 | libceph: change ceph_con_in_msg_alloc() to take hdr | Ilya Dryomov | 1 | -5/+6
ceph_con_in_msg_alloc() is protocol independent, but con->in_hdr (and struct ceph_msg_header in general) is msgr1 specific. While the struct is deeply ingrained inside and outside the messenger, con->in_hdr field can be separated. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 | libceph: change ceph_msg_data_cursor_init() to take cursor | Ilya Dryomov | 1 | -4/+3
Make it possible to have local cursors and embed them outside struct ceph_msg. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 | libceph: handle discarding acked and requeued messages separately | Ilya Dryomov | 1 | -20/+54
Make it easier to follow and remove dependency on msgr1 specific CEPH_MSGR_TAG_SEQ. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 | libceph: drop msg->ack_stamp field | Ilya Dryomov | 1 | -1/+0
It is set in process_ack() but never used. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 | libceph: remove redundant session reset log message | Ilya Dryomov | 1 | -4/+3
Stick with pr_info message because session reset isn't an error most of the time. When it is (i.e. if the server denies the reconnect attempt), we get a bunch of other pr_err messages. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 | libceph: clear con->peer_global_seq on RESETSESSION | Ilya Dryomov | 1 | -3/+3
con->peer_global_seq is part of session state. Clear it when the server tells us to reset, not just in ceph_con_close(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 | libceph: rename reset_connection() to ceph_con_reset_session() | Ilya Dryomov | 1 | -6/+4
With just session reset bits left, rename appropriately. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 | libceph: split protocol reset bits out of reset_connection() | Ilya Dryomov | 1 | -26/+24
Move protocol reset bits into ceph_con_reset_protocol(), leaving just session reset bits. Note that con->out_skip is now reset on faults. This fixes a crash in the case of a stateful session getting a fault while in the middle of revoking a message. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 | libceph: don't call reset_connection() on version/feature mismatches | Ilya Dryomov | 1 | -3/+0
A fault due to a version mismatch or a feature set mismatch used to be treated differently from other faults: the connection would get closed without trying to reconnect and there was a ->bad_proto() connection op for notifying about that. This changed a long time ago, see commits 6384bb8b8e88 ("libceph: kill bad_proto ceph connection op") and 0fa6ebc600bc ("libceph: fix protocol feature mismatch failure path"). Nowadays these aren't any different from other faults (i.e. we try to reconnect even though the mismatch won't resolve until the server is replaced). reset_connection() calls there are rather confusing because reset_connection() resets a session together with an individual instance of the protocol. This is cleaned up in the next patch. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14 | libceph: lower exponential backoff delay | Ilya Dryomov | 1 | -3/+9
The current setting allows the backoff to climb up to 5 minutes. This is too high -- it becomes hard to tell whether the client is stuck on something or just in backoff. In userspace, ms_max_backoff is defaulted to 15 seconds. Let's do the same. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
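A capped exponential backoff of the kind described above can be sketched as follows. The 15-second ceiling comes from the commit message; the 250 ms starting delay, the millisecond units and the name `sketch_next_delay()` are assumptions made for this illustration.

```c
#include <assert.h>

#define SKETCH_BASE_DELAY_MS 250u    /* assumed starting delay */
#define SKETCH_MAX_DELAY_MS  15000u  /* 15-second cap from the commit message */

/* Next delay: start at the base, then double, never exceeding the cap. */
unsigned int sketch_next_delay(unsigned int cur_ms)
{
    assert(cur_ms <= SKETCH_MAX_DELAY_MS);
    if (cur_ms == 0)
        return SKETCH_BASE_DELAY_MS;
    if (cur_ms > SKETCH_MAX_DELAY_MS / 2)
        return SKETCH_MAX_DELAY_MS;
    return cur_ms * 2;
}
```

Starting from zero, repeated calls yield 250, 500, 1000, 2000, 4000, 8000 and then stay at 15000 ms, so a client in backoff retries at least every 15 seconds.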
2020-12-14 | libceph: include middle_len in process_message() dout | Ilya Dryomov | 1 | -1/+2
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-10-12 | libceph: clear con->out_msg on Policy::stateful_server faults | Ilya Dryomov | 1 | -0/+5
con->out_msg must be cleared on Policy::stateful_server (!CEPH_MSG_CONNECT_LOSSY) faults. Not doing so botches the reconnection attempt, because after writing the banner the messenger moves on to writing the data section of that message (either from where it got interrupted by the connection reset or from the beginning) instead of writing struct ceph_msg_connect. This results in a bizarre error message because the server sends CEPH_MSGR_TAG_BADPROTOVER but we think we wrote struct ceph_msg_connect:

  libceph: mds0 (1)172.21.15.45:6828 socket error on write
  ceph: mds0 reconnect start
  libceph: mds0 (1)172.21.15.45:6829 socket closed (con state OPEN)
  libceph: mds0 (1)172.21.15.45:6829 protocol version mismatch, my 32 != server's 32
  libceph: mds0 (1)172.21.15.45:6829 protocol version mismatch

AFAICT this bug goes back to the dawn of the kernel client. The reason it survived for so long is that only MDS sessions are stateful and only two MDS messages have a data section: CEPH_MSG_CLIENT_RECONNECT (always, but reconnecting is rare) and CEPH_MSG_CLIENT_REQUEST (only when xattrs are involved). The connection has to get reset precisely when such message is being sent -- in this case it was the former.

Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/47723
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
2020-10-12 | libceph: format ceph_entity_addr nonces as unsigned | Ilya Dryomov | 1 | -3/+3
Match the server side logs. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-10-12 | libceph: move a dout in queue_con_delay() | Ilya Dryomov | 1 | -1/+1
The queued con->work can start executing (and therefore logging) before we get to this "con->work has been queued" message, making the logs confusing. Move it up, with the meaning of "con->work is about to be queued". Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-10-02 | libceph: use sendpage_ok() in ceph_tcp_sendpage() | Coly Li | 1 | -1/+1
In libceph, ceph_tcp_sendpage() performs the following check before handing the page to the network layer's zero-copy sendpage method: if (page_count(page) >= 1 && !PageSlab(page)). This check is exactly what sendpage_ok() does. This patch replaces the open-coded check with sendpage_ok() as a code cleanup. Signed-off-by: Coly Li <colyli@suse.de> Acked-by: Jeff Layton <jlayton@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-23 | treewide: Use fallthrough pseudo-keyword | Gustavo A. R. Silva | 1 | -2/+2
Replace the existing /* fall through */ comments and their variants with the new pseudo-keyword macro fallthrough [1]. Also, remove unnecessary fall-through markings where applicable. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
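For readers unfamiliar with the construct, the following user-space sketch shows a macro-based fall-through marker in a switch. The macro is defined locally under the hypothetical name `sketch_fallthrough` so the example stands alone; it relies on the `__fallthrough__` statement attribute, which needs GCC 7 or later or a recent Clang. `sketch_sum_levels()` is likewise made up for the illustration.

```c
#include <assert.h>

/* Local definition for the sketch; needs GCC 7+ or a recent Clang. */
#define sketch_fallthrough __attribute__((__fallthrough__))

/* Level 2 accumulates 2 + 1 by falling into case 1; level 1 adds only 1. */
int sketch_sum_levels(int level)
{
    int sum = 0;

    assert(level >= 0);
    switch (level) {
    case 2:
        sum += 2;
        sketch_fallthrough;  /* deliberate: continue into case 1 */
    case 1:
        sum += 1;
        break;
    default:
        break;
    }
    return sum;
}
```

Unlike a comment, the marker is visible to the compiler, so an unintended fall-through elsewhere can still be diagnosed with -Wimplicit-fallthrough.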
2020-05-28 | tcp: add tcp_sock_set_nodelay | Christoph Hellwig | 1 | -9/+2
Add a helper to directly set the TCP_NODELAY sockopt from kernel space without going through a fake uaccess. Cleanup the callers to avoid pointless wrappers now that this is a simple function call. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-23 | libceph: fix alloc_msg_with_page_vector() memory leaks | Ilya Dryomov | 1 | -2/+7
Make it so that CEPH_MSG_DATA_PAGES data item can own pages, fixing a bunch of memory leaks for a page vector allocated in alloc_msg_with_page_vector(). Currently, only watch-notify messages trigger this allocation, and normally the page vector is freed either in handle_watch_notify() or by the caller of ceph_osdc_notify(). But if the message is freed before that (e.g. if the session faults while reading in the message or if the notify is stale), we leak the page vector. This was supposed to be fixed by switching to a message-owned pagelist, but that never happened. Fixes: 1907920324f1 ("libceph: support for sending notifies") Reported-by: Roman Penyaev <rpenyaev@suse.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Roman Penyaev <rpenyaev@suse.de>
2019-11-27 | libceph, rbd, ceph: convert to use the new mount API | David Howells | 1 | -2/+0
Convert the ceph filesystem to the new internal mount API as the old one will be obsoleted and removed. This allows greater flexibility in communication of mount parameters between userspace, the VFS and the filesystem. See Documentation/filesystems/mount_api.txt for more information. [ Numerous string handling, leak and regression fixes; rbd conversion was particularly broken and had to be redone almost from scratch. ] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-09-16 | libceph: add function that reset client's entity addr | Yan, Zheng | 1 | -0/+6
This function also re-opens connections to OSD/MON, and re-sends in-flight OSD requests after re-opening connections to OSD. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-07-18 | Merge tag 'ceph-for-5.3-rc1' of git://github.com/ceph/ceph-client | Linus Torvalds | 1 | -6/+8
Pull ceph updates from Ilya Dryomov:
 "Lots of exciting things this time!

  - support for rbd object-map and fast-diff features (myself). This will speed up reads, discards and things like snap diffs on sparse images.
  - ceph.snap.btime vxattr to expose snapshot creation time (David Disseldorp). This will be used to integrate with "Restore Previous Versions" feature added in Windows 7 for folks who reexport ceph through SMB.
  - security xattrs for ceph (Zheng Yan). Only selinux is supported for now due to the limitations of ->dentry_init_security().
  - support for MSG_ADDR2, FS_BTIME and FS_CHANGE_ATTR features (Jeff Layton). This is actually a single feature bit which was missing because of the filesystem pieces. With this in, the kernel client will finally be reported as "luminous" by "ceph features" -- it is still being reported as "jewel" even though all required Luminous features were implemented in 4.13.
  - stop NULL-terminating ceph vxattrs (Jeff Layton). The convention with xattrs is to not terminate and this was causing inconsistencies with ceph-fuse.
  - change filesystem time granularity from 1 us to 1 ns, again fixing an inconsistency with ceph-fuse (Luis Henriques).

  On top of this there are some additional dentry name handling and cap flushing fixes from Zheng.

  Finally, Jeff is formally taking over for Zheng as the filesystem maintainer"

* tag 'ceph-for-5.3-rc1' of git://github.com/ceph/ceph-client: (71 commits)
  ceph: fix end offset in truncate_inode_pages_range call
  ceph: use generic_delete_inode() for ->drop_inode
  ceph: use ceph_evict_inode to cleanup inode's resource
  ceph: initialize superblock s_time_gran to 1
  MAINTAINERS: take over for Zheng as CephFS kernel client maintainer
  rbd: setallochint only if object doesn't exist
  rbd: support for object-map and fast-diff
  rbd: call rbd_dev_mapping_set() from rbd_dev_image_probe()
  libceph: export osd_req_op_data() macro
  libceph: change ceph_osdc_call() to take page vector for response
  libceph: bump CEPH_MSG_MAX_DATA_LEN (again)
  rbd: new exclusive lock wait/wake code
  rbd: quiescing lock should wait for image requests
  rbd: lock should be quiesced on reacquire
  rbd: introduce copyup state machine
  rbd: rename rbd_obj_setup_*() to rbd_obj_init_*()
  rbd: move OSD request allocation into object request state machines
  rbd: factor out __rbd_osd_setup_discard_ops()
  rbd: factor out rbd_osd_setup_copyup()
  rbd: introduce obj_req->osd_reqs list
  ...
2019-07-08 | libceph: rename ceph_encode_addr to ceph_encode_banner_addr | Jeff Layton | 1 | -3/+3
...ditto for the decode function. We only use these functions to fix up banner addresses now, so let's name them more appropriately. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-07-08 | libceph: use TYPE_LEGACY for entity addrs instead of TYPE_NONE | Jeff Layton | 1 | -2/+5
Going forward, we'll have different address types so let's use the addr2 TYPE_LEGACY for internal tracking rather than TYPE_NONE. Also, make ceph_pr_addr print the address type value as well. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-07-08 | libceph: fix sa_family just after reading address | Jeff Layton | 1 | -3/+2
It doesn't make sense to leave it undecoded until later. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-06-27 | keys: Pass the network namespace into request_key mechanism | David Howells | 1 | -1/+2
Create a request_key_net() function and use it to pass the network namespace domain tag into DNS resolver keys and rxrpc/AFS keys so that keys for different domains can coexist in the same keyring. Signed-off-by: David Howells <dhowells@redhat.com> cc: netdev@vger.kernel.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: linux-afs@lists.infradead.org
2019-05-16 | Merge tag 'afs-fixes-20190516' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs | Linus Torvalds | 1 | -1/+1
Pull misc AFS fixes from David Howells:
 "This fixes a set of miscellaneous issues in the afs filesystem, including:

  - leak of keys on file close.
  - broken error handling in xattr functions.
  - missing locking when updating VL server list.
  - volume location server DNS lookup whereby preloaded cells may not ever get a lookup and regular DNS lookups to maintain server lists consume power unnecessarily.
  - incorrect error propagation and handling in the fileserver iteration code causes operations to sometimes apparently succeed.
  - interruption of server record check/update side op during fileserver iteration causes uninterruptible main operations to fail unexpectedly.
  - callback promise expiry time miscalculation.
  - over invalidation of the callback promise on directories.
  - double locking on callback break waking up file locking waiters.
  - double increment of the vnode callback break counter.

  Note that it makes some changes outside of the afs code, including:

  - an extra parameter to dns_query() to allow the dns_resolver key just accessed to be immediately invalidated. AFS is caching the results itself, so the key can be discarded.
  - an interruptible version of wait_var_event().
  - an rxrpc function to allow the maximum lifespan to be set on a call.
  - a way for an rxrpc call to be marked as non-interruptible"

* tag 'afs-fixes-20190516' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Fix double inc of vnode->cb_break
  afs: Fix lock-wait/callback-break double locking
  afs: Don't invalidate callback if AFS_VNODE_DIR_VALID not set
  afs: Fix calculation of callback expiry time
  afs: Make dynamic root population wait uninterruptibly for proc_cells_lock
  afs: Make some RPC operations non-interruptible
  rxrpc: Allow the kernel to mark a call as being non-interruptible
  afs: Fix error propagation from server record check/update
  afs: Fix the maximum lifespan of VL and probe calls
  rxrpc: Provide kernel interface to set max lifespan on a call
  afs: Fix "kAFS: AFS vnode with undefined type 0"
  afs: Fix cell DNS lookup
  Add wait_var_event_interruptible()
  dns_resolver: Allow used keys to be invalidated
  afs: Fix afs_cell records to always have a VL server list record
  afs: Fix missing lock when replacing VL server list
  afs: Fix afs_xattr_get_yfs() to not try freeing an error value
  afs: Fix incorrect error handling in afs_xattr_get_acl()
  afs: Fix key leak in afs_release() and afs_evict_inode()
2019-05-15 | dns_resolver: Allow used keys to be invalidated | David Howells | 1 | -1/+1
Allow used DNS resolver keys to be invalidated after use if the caller is doing its own caching of the results. This reduces the amount of resources required. Fix AFS to invalidate DNS results to kill off permanent failure records that get lodged in the resolver keyring and prevent future lookups from happening. Fixes: 0a5143f2f89c ("afs: Implement VL server rotation") Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-07 | libceph: make ceph_pr_addr take an struct ceph_entity_addr pointer | Jeff Layton | 1 | -24/+24
GCC9 is throwing a lot of warnings about unaligned accesses by callers of ceph_pr_addr. All of the current callers are passing a pointer to the sockaddr inside struct ceph_entity_addr. Fix it to take a pointer to a struct ceph_entity_addr instead, and then have the function make a copy of the sockaddr before printing it. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07 | libceph: fix unaligned accesses in ceph_entity_addr handling | Jeff Layton | 1 | -40/+37
GCC9 is throwing a lot of warnings about unaligned access. This patch fixes some of them by changing most of the sockaddr handling functions to take a pointer to struct ceph_entity_addr instead of struct sockaddr_storage. The lower functions can then make copies or do unaligned accesses as needed. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
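The pattern described above, passing a pointer to the enclosing struct and copying the embedded sockaddr before use, can be sketched in user space like this. `struct sketch_entity_addr` is a hypothetical packed struct loosely modelled on the description, and `sketch_addr_family()` is a made-up accessor; neither is the kernel definition.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical packed struct: the compiler may not assume in_addr is aligned. */
struct sketch_entity_addr {
    uint32_t type;
    uint32_t nonce;
    struct sockaddr_storage in_addr;
} __attribute__((packed));

/*
 * Take the enclosing struct rather than a pointer to the packed member,
 * then copy the sockaddr into an aligned local before inspecting it.
 */
int sketch_addr_family(const struct sketch_entity_addr *a)
{
    struct sockaddr_storage ss;

    assert(a);
    ss = a->in_addr;  /* struct copy into a naturally aligned local */
    return ss.ss_family;
}
```

Callers never form a `struct sockaddr_storage *` that points into the packed struct, which is the kind of pointer the compiler warns about.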
2019-03-25 | libceph: fix breakage caused by multipage bvecs | Ilya Dryomov | 1 | -2/+6
A bvec can now consist of multiple physically contiguous pages. This means that bvec_iter_advance() can move to a different page while staying in the same bvec (i.e. ->bi_bvec_done != 0). The messenger works in terms of segments which can now be defined as the smaller of a bvec and a page. The "more bytes to process in this segment" condition holds only if bvec_iter_advance() leaves us in the same bvec _and_ in the same page. On next bvec (possibly in the same page) and on next page (possibly in the same bvec) we may need to set ->last_piece. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
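The definition of a segment as "the smaller of a bvec and a page" can be illustrated with a short user-space calculation. `SKETCH_PAGE_SIZE` and `sketch_segment_len()` are hypothetical, and the function works on plain byte counts rather than on the kernel's bvec iterator.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size, for illustration only */

/*
 * Length of the current segment: the smaller of what is left in the bvec
 * and what is left in the current page. offset is the byte offset of the
 * iterator from the start of the bvec's first page.
 */
size_t sketch_segment_len(size_t bvec_remaining, size_t offset)
{
    size_t in_page = SKETCH_PAGE_SIZE - (offset % SKETCH_PAGE_SIZE);

    assert(bvec_remaining > 0);
    return bvec_remaining < in_page ? bvec_remaining : in_page;
}
```

For a 10000-byte bvec starting at a page boundary, the first segment is 4096 bytes, so the iterator crosses into the next page while still inside the same bvec, which is the case the commit message calls out.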
2019-02-18 | libceph: handle an empty authorize reply | Ilya Dryomov | 1 | -6/+9
The authorize reply can be empty, for example when the ticket used to build the authorizer is too old and TAG_BADAUTHORIZER is returned from the service. Calling ->verify_authorizer_reply() results in an attempt to decrypt and validate (somewhat) random data in au->buf (most likely the signature block from calc_signature()), which fails and ends up in con_fault_finish() with !con->auth_retry. The ticket isn't invalidated and the connection is retried again and again until a new ticket is obtained from the monitor:

  libceph: osd2 192.168.122.1:6809 bad authorize reply
  libceph: osd2 192.168.122.1:6809 bad authorize reply
  libceph: osd2 192.168.122.1:6809 bad authorize reply
  libceph: osd2 192.168.122.1:6809 bad authorize reply

Let TAG_BADAUTHORIZER handler kick in and increment con->auth_retry.

Cc: stable@vger.kernel.org
Fixes: 5c056fdc5b47 ("libceph: verify authorize reply on connect")
Link: https://tracker.ceph.com/issues/20164
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2019-01-21 | libceph: avoid KEEPALIVE_PENDING races in ceph_con_keepalive() | Ilya Dryomov | 1 | -2/+3
con_fault() can transition the connection into STANDBY right after ceph_con_keepalive() clears STANDBY in clear_standby():

    libceph user thread               ceph-msgr worker

    ceph_con_keepalive()
      mutex_lock(&con->mutex)
      clear_standby(con)
      mutex_unlock(&con->mutex)
                                      mutex_lock(&con->mutex)
                                      con_fault()
                                        ...
                                        if KEEPALIVE_PENDING isn't set
                                          set state to STANDBY
                                        ...
                                      mutex_unlock(&con->mutex)
      set KEEPALIVE_PENDING
      set WRITE_PENDING

This triggers warnings in clear_standby() when either ceph_con_send() or ceph_con_keepalive() get to clearing STANDBY next time.

I don't see a reason to condition queue_con() call on the previous value of KEEPALIVE_PENDING, so move the setting of KEEPALIVE_PENDING into the critical section -- unlike WRITE_PENDING, KEEPALIVE_PENDING could have been a non-atomic flag.

Reported-by: syzbot+acdeb633f6211ccdf886@syzkaller.appspotmail.com
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Myungho Jung <mhjungk@gmail.com>
2018-12-26 | libceph: switch more to bool in ceph_tcp_sendmsg() | Ilya Dryomov | 1 | -1/+1
Unlike in ceph_tcp_sendpage(), it's a bool. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>