aboutsummaryrefslogtreecommitdiffstats
path: root/include/linux/ceph (follow)
AgeCommit message (Collapse)AuthorFilesLines
2022-10-04libceph: drop last_piece flag from ceph_msg_data_cursorJeff Layton1-3/+1
ceph_msg_data_next is always passed a NULL pointer for this field. Some of the "next" operations look at it in order to determine the length, but we can just take the min of the data on the page or cursor->resid. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-08-03libceph: clean up ceph_osdc_start_request prototypeJeff Layton1-3/+2
This function always returns 0, and ignores the nofail boolean. Drop the nofail argument, make the function void return and fix up the callers. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-08-03ceph: fix incorrect old_size length in ceph_mds_request_argsXiubo Li1-3/+3
The 'old_size' is a __le64 type since birth, not sure why the kclient incorrectly switched it to __le32. This change is okay won't break anything because union will always allocate more memory than the 'open' member needed. Rename 'file_replication' to 'pool' as ceph did. Though this 'open' struct may never be used in kclient in future, it's confusing when going through the ceph code. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-08-03ceph: fix the incorrect comment for the ceph_mds_caps structXiubo Li1-1/+1
The incorrect comment is misleading. Acutally the last members in ceph_mds_caps strcut is a union for none export and export bodies. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-08-03ceph: prevent a client from exceeding the MDS maximum xattr sizeLuís Henriques1-0/+1
The MDS tries to enforce a limit on the total key/values in extended attributes. However, this limit is enforced only if doing a synchronous operation (MDS_OP_SETXATTR) -- if we're buffering the xattrs, the MDS doesn't have a chance to enforce these limits. This patch adds support for decoding the xattrs maximum size setting that is distributed in the mdsmap. Then, when setting an xattr, the kernel client will revert to do a synchronous operation if that maximum size is exceeded. While there, fix a dout() that would trigger a printk warning: [ 98.718078] ------------[ cut here ]------------ [ 98.719012] precision 65536 too large [ 98.719039] WARNING: CPU: 1 PID: 3755 at lib/vsprintf.c:2703 vsnprintf+0x5e3/0x600 ... Link: https://tracker.ceph.com/issues/55725 Signed-off-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-18libceph: fix potential use-after-free on linger ping and resendsIlya Dryomov1-0/+3
request_reinit() is not only ugly as the comment rightfully suggests, but also unsafe. Even though it is called with osdc->lock held for write in all cases, resetting the OSD request refcount can still race with handle_reply() and result in use-after-free. Taking linger ping as an example: handle_timeout thread handle_reply thread down_read(&osdc->lock) req = lookup_request(...) ... finish_request(req) # unregisters up_read(&osdc->lock) __complete_request(req) linger_ping_cb(req) # req->r_kref == 2 because handle_reply still holds its ref down_write(&osdc->lock) send_linger_ping(lreq) req = lreq->ping_req # same req # cancel_linger_request is NOT # called - handle_reply already # unregistered request_reinit(req) WARN_ON(req->r_kref != 1) # fires request_init(req) kref_init(req->r_kref) # req->r_kref == 1 after kref_init ceph_osdc_put_request(req) kref_put(req->r_kref) # req->r_kref == 0 after kref_put, req is freed <further req initialization/use> !!! This happens because send_linger_ping() always (re)uses the same OSD request for watch ping requests, relying on cancel_linger_request() to unregister it from the OSD client and rip its messages out from the messenger. send_linger() does the same for watch/notify registration and watch reconnect requests. Unfortunately cancel_request() doesn't guarantee that after it returns the OSD client would be completely done with the OSD request -- a ref could still be held and the callback (if specified) could still be invoked too. The original motivation for request_reinit() was inability to deal with allocation failures in send_linger() and send_linger_ping(). Switching to using osdc->req_mempool (currently only used by CephFS) respects that and allows us to get rid of request_reinit(). Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Acked-by: Jeff Layton <jlayton@kernel.org>
2022-03-01ceph: do not release the global snaprealm until unmountingXiubo Li1-1/+2
The global snaprealm would be created and then destroyed immediately every time when updating it. URL: https://tracker.ceph.com/issues/54362 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-03-01ceph: remove incorrect and unused CEPH_INO_DOTDOT macroXiubo Li1-1/+0
Ceph have removed this macro and the 0x3 will be use for global dummy snaprealm. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-03-01ceph: move to a dedicated slabcache for ceph_cap_snapXiubo Li1-0/+1
There could be huge number of capsnaps around at any given time. On x86_64 the structure is 248 bytes, which will be rounded up to 256 bytes by kzalloc. Move this to a dedicated slabcache to save 8 bytes for each. [ jlayton: use kmem_cache_zalloc ] Signed-off-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-03-01ceph: add getvxattr opMilind Changire1-0/+1
Problem: Some directory vxattrs (e.g. ceph.dir.pin.random) are governed by information that isn't necessarily shared with the client. Add support for the new GETVXATTR operation, which allows the client to query the MDS directly for vxattrs. When the client is queried for a vxattr that doesn't have a special handler, have it issue a GETVXATTR to the MDS directly. Solution: Adds new getvxattr op to fetch ceph.dir.pin*, ceph.dir.layout* and ceph.file.layout* vxattrs. If the entire layout for a dir or a file is being set, then it is expected that the layout be set in standard JSON format. Individual field value retrieval is not wrapped in JSON. The JSON format also applies while setting the vxattr if the entire layout is being set in one go. As a temporary measure, setting a vxattr can also be done in the old format. The old format will be deprecated in the future. URL: https://tracker.ceph.com/issues/51062 Signed-off-by: Milind Changire <mchangir@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-02-02libceph: optionally use bounce buffer on recv path in crc modeIlya Dryomov2-0/+2
Both msgr1 and msgr2 in crc mode are zero copy in the sense that message data is read from the socket directly into the destination buffer. We assume that the destination buffer is stable (i.e. remains unchanged while it is being read to) though. Otherwise, CRC errors ensue: libceph: read_partial_message 0000000048edf8ad data crc 1063286393 != exp. 228122706 libceph: osd1 (1)192.168.122.1:6843 bad crc/signature libceph: bad data crc, calculated 57958023, expected 1805382778 libceph: osd2 (2)192.168.122.1:6876 integrity error, bad crc Introduce rxbounce option to enable use of a bounce buffer when receiving message data. In particular this is needed if a mapped image is a Windows VM disk, passed to QEMU. Windows has a system-wide "dummy" page that may be mapped into the destination buffer (potentially more than once into the same buffer) by the Windows Memory Manager in an effort to generate a single large I/O [1][2]. QEMU makes a point of preserving overlap relationships when cloning I/O vectors, so krbd gets exposed to this behaviour. [1] "What Is Really in That MDL?" https://docs.microsoft.com/en-us/previous-versions/windows/hardware/design/dn614012(v=vs.85) [2] https://blogs.msmvps.com/kernelmustard/2005/05/04/dummy-pages/ URL: https://bugzilla.redhat.com/show_bug.cgi?id=1973317 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-02-02libceph: make recv path in secure mode work the same as send pathIlya Dryomov1-0/+4
The recv path of secure mode is intertwined with that of crc mode. While it's slightly more efficient that way (the ciphertext is read into the destination buffer and decrypted in place, thus avoiding two potentially heavy memory allocations for the bounce buffer and the corresponding sg array), it isn't really amenable to changes. Sacrifice that edge and align with the send path which always uses a full-sized bounce buffer (currently there is no other way -- if the kernel crypto API ever grows support for streaming (piecewise) en/decryption for GCM [1], we would be able to easily take advantage of that on both sides). [1] https://lore.kernel.org/all/20141225202830.GA18794@gondor.apana.org.au/ Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-01-20Merge tag 'ceph-for-5.17-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds2-2/+3
Pull ceph updates from Ilya Dryomov: "The highlight is the new mount "device" string syntax implemented by Venky Shankar. It solves some long-standing issues with using different auth entities and/or mounting different CephFS filesystems from the same cluster, remounting and also misleading /proc/mounts contents. The existing syntax of course remains to be maintained. On top of that, there is a couple of fixes for edge cases in quota and a new mount option for turning on unbuffered I/O mode globally instead of on a per-file basis with ioctl(CEPH_IOC_SYNCIO)" * tag 'ceph-for-5.17-rc1' of git://github.com/ceph/ceph-client: ceph: move CEPH_SUPER_MAGIC definition to magic.h ceph: remove redundant Lsx caps check ceph: add new "nopagecache" option ceph: don't check for quotas on MDS stray dirs ceph: drop send metrics debug message rbd: make const pointer spaces a static const array ceph: Fix incorrect statfs report for small quota ceph: mount syntax module parameter doc: document new CephFS mount device syntax ceph: record updated mon_addr on remount ceph: new device mount syntax libceph: rename parse_fsid() to ceph_parse_fsid() and export libceph: generalize addr/ip parsing based on delimiter
2022-01-15mm: allow !GFP_KERNEL allocations for kvmallocMichal Hocko1-1/+0
Support for GFP_NO{FS,IO} and __GFP_NOFAIL has been implemented by previous patches so we can allow the support for kvmalloc. This will allow some external users to simplify or completely remove their helpers. GFP_NOWAIT semantic hasn't been supported so far but it hasn't been explicitly documented so let's add a note about that. ceph_kvmalloc is the first helper to be dropped and changed to kvmalloc. Link: https://lkml.kernel.org/r/20211122153233.9924-5-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Neil Brown <neilb@suse.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-13libceph: rename parse_fsid() to ceph_parse_fsid() and exportVenky Shankar1-0/+1
... as it is too generic. also, use __func__ when logging rather than hardcoding the function name. Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-01-13libceph: generalize addr/ip parsing based on delimiterVenky Shankar2-2/+2
... and remove hardcoded function name in ceph_parse_ips(). [ idryomov: delim parameter, drop CEPH_ADDR_PARSE_DEFAULT_DELIM ] Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-11-08libceph, ceph: move ceph_osdc_copy_from() into cephfs codeLuís Henriques1-11/+8
This patch moves ceph_osdc_copy_from() function out of libceph code into cephfs. There are no other users for this function, and there is the need (in another patch) to access internal ceph_osd_request struct members. Signed-off-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-11-08ceph: don't rely on error_string to validate blocklisted session.Kotresh HR1-0/+2
The "error_string" in the metadata of MClientSession is being parsed by kclient to validate whether the session is blocklisted. The "error_string" is for humans and shouldn't be relied on it. Hence added the flag to MClientsession to indicate the session is blocklisted. [ jlayton: minor formatting cleanup ] URL: https://tracker.ceph.com/issues/47450 Signed-off-by: Kotresh HR <khiremat@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-09-02ceph: flush mdlog before umountingXiubo Li1-0/+1
Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-06-24libceph: set global_id as soon as we get an auth ticketIlya Dryomov1-1/+3
Commit 61ca49a9105f ("libceph: don't set global_id until we get an auth ticket") delayed the setting of global_id too much. It is set only after all tickets are received, but in pre-nautilus clusters an auth ticket and the service tickets are obtained in separate steps (for a total of three MAuth replies). When the service tickets are requested, global_id is used to build an authorizer; if global_id is still 0 we never get them and fail to establish the session. Moving the setting of global_id into protocol implementations. This way global_id can be set exactly when an auth ticket is received, not sooner nor later. Fixes: 61ca49a9105f ("libceph: don't set global_id until we get an auth ticket") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2021-06-24libceph: don't pass result into ac->ops->handle_reply()Ilya Dryomov1-1/+1
There is no result to pass in msgr2 case because authentication failures are reported through auth_bad_method frame and in MAuth case an error is returned immediately. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2021-02-16libceph: deprecate [no]cephx_require_signatures optionsIlya Dryomov1-4/+3
These options were introduced in 3.19 with support for message signing and are rather useless, as explained in commit a51983e4dd2d ("libceph: add nocephx_sign_messages option"). Deprecate them. In case there is someone out there with a cluster that lacks support for MSG_AUTH feature (very unlikely but has to be considered since we haven't formally raised the bar from argonaut to bobtail yet), make nocephx_sign_messages also waive MSG_AUTH requirement. This is probably how it should have been done in the first place -- if we aren't going to sign, requiring the signing feature makes no sense. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2020-12-28libceph: add __maybe_unused to DEFINE_MSGR2_FEATUREIlya Dryomov1-2/+2
Avoid -Wunused-const-variable warnings for "make W=1". Reported-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14libceph: drop ceph_auth_{create,update}_authorizer()Ilya Dryomov1-6/+0
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14libceph, ceph: implement msgr2.1 protocol (crc and secure modes)Ilya Dryomov6-6/+231
Implement msgr2.1 wire protocol, available since nautilus 14.2.11 and octopus 15.2.5. msgr2.0 wire protocol is not implemented -- it has several security, integrity and robustness issues and therefore considered deprecated. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14libceph: introduce connection modes and ms_mode optionIlya Dryomov3-2/+13
msgr2 supports two connection modes: crc (plain) and secure (on-wire encryption). Connection mode is picked by server based on input from client. Introduce ms_mode option: ms_mode=legacy - msgr1 (default) ms_mode=crc - crc mode, if denied fail ms_mode=secure - secure mode, if denied fail ms_mode=prefer-crc - crc mode, if denied agree to secure mode ms_mode=prefer-secure - secure mode, if denied agree to crc mode ms_mode affects all connections, we don't separate connections to mons like it's done in userspace with ms_client_mode vs ms_mon_client_mode. For now the default is legacy, to be flipped to prefer-crc after some time. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14libceph, rbd: ignore addr->type while comparing in some casesIlya Dryomov1-1/+8
For libceph, this ensures that libceph instance sharing (share option) continues to work. For rbd, this avoids blocklisting alive lock owners (locker addr is always LEGACY, while watcher addr is ANY in nautilus). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14libceph, ceph: get and handle cluster maps with addrvecsIlya Dryomov4-6/+15
In preparation for msgr2, make the cluster send us maps with addrvecs including both LEGACY and MSGR2 addrs instead of a single LEGACY addr. This means advertising support for SERVER_NAUTILUS and also some older features: SERVER_MIMIC, MONENC and MONNAMES. MONNAMES and MONENC are actually pre-argonaut, we just never updated ceph_monmap_decode() for them. Decoding is unconditional, see commit 23c625ce3065 ("libceph: assume argonaut on the server side"). SERVER_MIMIC doesn't bear any meaning for the kernel client. Since ceph_decode_entity_addrvec() is guarded by encoding version checks (and in msgr2 case it is guarded implicitly by the fact that server is speaking msgr2), we assume MSG_ADDR2 for it. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14libceph: drop ac->ops->name fieldIlya Dryomov1-2/+0
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14libceph: amend cephx init_protocol() and build_request()Ilya Dryomov1-0/+1
In msgr2, initial authentication happens with an exchange of msgr2 control frames -- MAuth message and struct ceph_mon_request_header aren't used. Make that optional. Stop reporting cephx protocol as "x". Use "cephx" instead. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14libceph, ceph: incorporate nautilus cephx changesIlya Dryomov1-4/+12
- request service tickets together with auth ticket. Currently we get auth ticket via CEPHX_GET_AUTH_SESSION_KEY op and then request service tickets via CEPHX_GET_PRINCIPAL_SESSION_KEY op in a separate message. Since nautilus, desired service tickets are shared togther with auth ticket in CEPHX_GET_AUTH_SESSION_KEY reply. - propagate session key and connection secret, if any. In preparation for msgr2, update handle_reply() and verify_authorizer_reply() auth ops to propagate session key and connection secret. Since nautilus, if secure mode is negotiated, connection secret is shared either in CEPHX_GET_AUTH_SESSION_KEY reply (for mons) or in a final authorizer reply (for osds and mdses). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14libceph: move msgr1 protocol specific fields to its own structIlya Dryomov1-35/+41
A couple whitespace fixups, no functional changes. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14libceph: move msgr1 protocol implementation to its own fileIlya Dryomov1-0/+1
A pure move, no other changes. Note that ceph_tcp_recv{msg,page}() and ceph_tcp_send{msg,page}() helpers are also moved. msgr2 will bring its own, more efficient, variants based on iov_iter. Switching msgr1 to them was considered but decided against to avoid subtle regressions. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14libceph: separate msgr1 protocol implementationIlya Dryomov1-0/+8
In preparation for msgr2, define internal messenger <-> protocol interface (as opposed to external messenger <-> client interface, which is struct ceph_connection_operations) consisting of try_read(), try_write(), revoke(), revoke_incoming(), opened(), reset_session() and reset_protocol() ops. The semantics are exactly the same as they are now. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14libceph: export remaining protocol independent infrastructureIlya Dryomov1-1/+38
In preparation for msgr2, make all protocol independent functions in messenger.c global. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14libceph: export zero_pageIlya Dryomov1-0/+1
In preparation for msgr2, make zero_page global. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14libceph: rename and export con->flags bitsIlya Dryomov1-1/+12
In preparation for msgr2, move the defines to the header file. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14libceph: rename and export con->state statesIlya Dryomov1-1/+11
In preparation for msgr2, rename msgr1 specific states and move the defines to the header file. Also drop state transition comments. They don't cover all possible transitions (e.g. NEGOTIATING -> STANDBY, etc) and currently do more harm than good. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14libceph: make con->state an intIlya Dryomov1-1/+1
unsigned long is a leftover from when con->state used to be a set of bits managed with set_bit(), clear_bit(), etc. Save a bit of memory. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14libceph: drop msg->ack_stamp fieldIlya Dryomov1-1/+0
It is set in process_ack() but never used. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14libceph: lower exponential backoff delayIlya Dryomov1-2/+2
The current setting allows the backoff to climb up to 5 minutes. This is too high -- it becomes hard to tell whether the client is stuck on something or just in backoff. In userspace, ms_max_backoff is defaulted to 15 seconds. Let's do the same. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14ceph: implement updated ceph_mds_request_head structureJeff Layton1-1/+31
When we added the btime feature in mainline ceph, we had to extend struct ceph_mds_request_args so that it could be set. Implement the same in the kernel client. Rename ceph_mds_request_head with a _old extension, and a union ceph_mds_request_args_ext to allow for the extended size of the new header format. Add the appropriate code to handle both formats in struct create_request_message and key the behavior on whether the peer supports CEPH_FEATURE_FS_BTIME. The gid_list field in the payload is now populated from the saved credential. For now, we don't add any support for setting the btime via setattr, but this does enable us to add that in the future. [ idryomov: break unnecessarily long lines ] Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14ceph: set osdmap epoch for setxattrXiubo Li1-0/+1
When setting the file/dir layout, it may need data pool info. So in mds server, it needs to check the osdmap. At present, if mds doesn't find the data pool specified, it will try to get the latest osdmap. Now if pass the osd epoch for setxattr, the mds server can only check this epoch of osdmap. URL: https://tracker.ceph.com/issues/48504 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14libceph: remove unused port macrosLiu, Changcheng1-9/+0
1. monitor's default port is defined by CEPH_MON_PORT 2. CEPH_PORT_START and CEPH_PORT_LAST are not needed. Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-12-14ceph: add new RECOVER mount_state when recovering sessionJeff Layton1-0/+1
When recovering a session (a'la recover_session=clean), we want to do all of the operations that we do on a forced umount, but changing the mount state to SHUTDOWN is can cause queued MDS requests to fail when the session comes back. Most of those can idle until the session is recovered in this situation. Reserve SHUTDOWN state for forced umount, and make a new RECOVER state for the forced reconnect situation. Change several tests for equality with SHUTDOWN to test for that or RECOVER. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-10-12libceph: fix ENTITY_NAME format suggestionIlya Dryomov1-1/+1
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-10-12libceph, rbd, ceph: "blacklist" -> "blocklist"Ilya Dryomov2-2/+2
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-10-12libceph: multiple workspaces for CRUSH computationsIlya Dryomov1-2/+12
Replace a global map->crush_workspace (protected by a global mutex) with a list of workspaces, up to the number of CPUs + 1. This is based on a patch from Robin Geuze <robing@nl.team.blue>. Robin and his team have observed a 10-20% increase in IOPS on all queue depths and lower CPU usage as well on a high-end all-NVMe 100GbE cluster. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-08-24libceph: add __maybe_unused to DEFINE_CEPH_FEATUREIlya Dryomov1-4/+4
Avoid -Wunused-const-variable warnings for "make W=1". Reported-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
2020-08-04ceph: move sb->wb_pagevec_pool to be a global mempoolJeff Layton1-0/+1
When doing some testing recently, I hit some page allocation failures on mount, when creating the wb_pagevec_pool for the mount. That requires 128k (32 contiguous pages), and after thrashing the memory during an xfstests run, sometimes that would fail. 128k for each mount seems like a lot to hold in reserve for a rainy day, so let's change this to a global mempool that gets allocated when the module is plugged in. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>