Age | Commit message (Collapse) | Author | Files | Lines |
|
The Lustre filesystem has been in the kernel tree for over 5 years now.
While it has been an endless source of enjoyment for new kernel
developers learning how to do basic codingstyle cleanups, as well as an
semi-entertaining source of bewilderment from the vfs developers any
time they have looked into the codebase to try to figure out how to port
their latest api changes to this filesystem, it has not really moved
forward into the "this is in shape to get out of staging" despite many
half-completed attempts.
And getting code out of staging is the main goal of that portion of the
kernel tree. Code should not stagnate and it feels like having this
code in staging is only causing the development cycle of the filesystem
to take longer than it should. There is a whole separate out-of-tree
copy of this codebase where the developers work on it, and then random
changes are thrown over the wall at staging at some later point in time.
This dual-tree development model has never worked, and the state of this
codebase is proof of that.
So, let's just delete the whole mess. Now the lustre developers can go
off and work in their out-of-tree codebase and not have to worry about
providing valid changelog entries and breaking their patches up into
logical pieces. They can take the time they have spend doing those
types of housekeeping chores and get the codebase into a much better
shape, and it can be submitted for inclusion into the real part of the
kernel tree when ready.
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Instead of the catch-all libcfs_all.h, just include the
files actually needed in different places.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Just use current->pid and current->comm directly, instead
of having wrappers.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The variable "cfs_cpt_table" has the same name as
the structure "struct cfs_cpt_table".
This makes it hard to use #define to make one disappear
on a uni-processor build, but keep the other.
So rename the variable to cfs_cpt_tab.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Trivial fix to spelling mistake in DEBUG_REQ message text
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Discard cfs_time_current() and cfs_time_current64()
and use jiffies and get_jiffies_64() like the rest of the kernel.
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
1/ use list_for_each_entry_safe() instead of
list_for_each_safe() and similar.
2/ use list_first_entry() and list_last_entry() where appropriate.
3/ When removing everything from a list, use
while ((x = list_first_entry_or_null()) {
as it makes the intent clear
4/ No need to take a spinlock in a structure that is about
to be freed - we must have exclusive access at this stage.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The only interesting difference between libcfs_kvzalloc()
and kvzalloc() is that the former appears to work
with GFP_NOFS, which the latter gives a WARN_ON_ONCE()
when that is attempted.
Each libcfs_kvzalloc() should really be analysed
and either converted to a kzalloc() call if the size is never
more than a page, or to use GFP_KERNEL if no locks are held.
If there is ever a case where locks are held and a large allocation
is needed, then some other technique should be used.
It might be nice to not always blindly zero pages too...
For now, just convert libcfs_kvzalloc() calls to
kvzalloc(), and let the warning remind us that there is work to do.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Using vmalloc with GFP_NOFS is not supported as vmalloc
performs some internal allocations with GFP_KERNEL.
So in cases where the size passed to libcfs_kvzalloc()
is clearly at most 1 page, convert to kzalloc().
In cases where the call clearly doesn't hold any
filesystem locks, convert to GFP_KERNEL.
Unfortunately there are many more that are not easy to fix.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This is the last remaining use of l_wait_event().
It is the only use of LWI_TIMEOUT_INTR_ALL() which
has a meaning that timeouts can be interrupted.
Only interrupts by "fatal" signals are allowed, so
introduce l_wait_event_abortable_timeout() to
support this.
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: Patrick Farrell <paf@cray.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
replace l_wait_event() with wait_event_idle_timeout() and explicit
loop. This approach is easier to understand.
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: Patrick Farrell <paf@cray.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
We can replace l_wait_event() with
wait_event_idle_timeout() here providing we call the
timeout function when wait_event_idle_timeout() returns zero.
As ptlrpc_expired_set() returns 1, the l_wait_event() aborts of the
first timeout.
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: Patrick Farrell <paf@cray.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This waiter currently wakes up every second to re-test if
imp_flight is zero. If we ensure wakeup is called whenever
imp_flight is decremented to zero, we can just have a simple
wait_event_idle_timeout().
So add a wake_up_all to the one place it is missing, and simplify
the wait_event.
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: Patrick Farrell <paf@cray.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
cfs_time_seconds() converts a number of seconds to the
matching number of jiffies.
The standard way to do this in Linux is "* HZ".
So discard cfs_time_seconds() and use "* HZ" instead.
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: Patrick Farrell <paf@cray.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The cfs_get_random_bytes() interface adds nothing of value
to get_random_byte() (which it uses internally). So just use the
standard interface.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
It's good to have SPDX identifiers in all files to make it easier to
audit the kernel tree for correct licenses.
Update the drivers/staging/lustre files files with the correct SPDX
license identifier based on the license text in the file itself. The
SPDX identifier is a legally binding shorthand, which can be used
instead of the full boiler plate text.
This work is based on a script and data from Thomas Gleixner, Philippe
Ombredanne, and Kate Stewart.
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: James Simmons <jsimmons@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Rationalize include paths in the ptlrpc/ldlm source code files.
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The libcfs CFS_DURATION_T define is really only for
jiffies and its being used with time64_t in some of
the ptlrpc code. Lets remove CFS_DURATION_T and
replaced it with normal %lld instead.
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/24977
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4423
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
During the reorganization of ptlrpc_request some of the
time64_t fields were incorrectly turned into time_t.
Restore those fields back to time_64_t.
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/24977
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4423
Fixes: 32c8728d87dc ("staging/lustre/ptlrpc: reorganize ptlrpc_request")
CC: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
It's not necessary reassgin & re-adjust rq_mbits for replay
request in ptlrpc_set_bulk_mbits(), they all must have already
been correctly assigned before.
Such unecessary reassign could make the first matchbit not
PTLRPC_BULK_OPS_MASK aligned, that'll trigger LASSERT in
ptlrpc_register_bulk():
- ptlrpc_set_bulk_mbits() is called when first time sending
request, rq_mbits is set as xid, which is BULK_OPS aligned;
- ptlrpc_set_bulk_mbits() continue to adjust the mbits for
multi-bulk RPC, rq_mbits is not aligned anymore, then rq_xid
is changed accordingly if client is connecting to an old
server, so rq_xid became unaligned too;
- The request is replayed, ptlrpc_set_bulk_mbits() reassign
the rq_mbits as rq_xid, which isn't aligned already, but
ptlrpc_register_bulk() still assumes this value as the
first matchbits and LASSERT it's BULK_OPS aligned.
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6808
Reviewed-on: http://review.whamcloud.com/23048
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Due to the way the DFID was embedded in our debug strings checkpatch
would report the following error:
CHECK: Concatenated strings should use spaces between elements
This patch introduces proper space to resolve these reports.
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Change lnet_process_id_t from typedef to proper structure.
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/20831
Reviewed-by: Olaf Weber <olaf@sgi.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Change lnet_handle_md_t from a typedef of another typedef into
a proper stand alone structure. Create the inline functions
LNetInvalidateMDHandle and LNetMDHandleIsInvalid to handle this
new piece of data.
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/20831
Reviewed-by: Olaf Weber <olaf@sgi.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The macro DIV_ROUND_UP performs the computation (((n) + (d) - 1) /(d)).
It clarifies the divisor calculations. This occurence was detected using
the coccinelle script:
@@
expression e1;
expression e2;
@@
(
- ((e1) + e2 - 1) / (e2)
+ DIV_ROUND_UP(e1,e2)
|
- ((e1) + (e2 - 1)) / (e2)
+ DIV_ROUND_UP(e1,e2)
)
Signed-off-by: simran singhal <singhalsimran0@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The replay cursor should be updated properly when close happened
during replay, otherwise, ptlrpc_replay_next() could run into a
dead loop due to an invalid replay cursor:
- replay cursor is moved to an open request during replay;
- application close that open file, so the rq_replay of the open
request is cleared;
- ptlrpc_replay_next() calls ptlrpc_free_committed() to free
committed/closed requests, the open request is removed from
the committed list, so the replay cursor is changed to an
empty list_head now. The open request won't be freed now since
it's still held by the pending close request;
- ptlrpc_replay_next() continue to move the replay cursor to
next and run into a dead loop at the end;
Another change in this patch is to remove the out of date comments
in ptlrpc_replay_next() and cover the whole process of finding
replay request within imp_lock, because:
1. With two separated replay lists and replay cursor introduced,
finding replay request won't take much time as before, it's
not necessary to do this "lock -> unlock -> lock -> unlock"
trick anymore;
2. Nowadays there are various kind of non-replay requests are
allowed during recovery, so ptlrpc_free_committed() may run in
parallel to remove an open request while ptlrpc_replay_next()
is iterating the open requests list;
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8765
Reviewed-on: https://review.whamcloud.com/23418
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
ptlrpc_import_delay_req() refuses to delay blocking asts when import
is not in LUSTRE_IMP_FULL yet. That leads to client eviction assuming
that it failed to respond.
Allow delays for blocking asts being resent.
Signed-off-by: Vladimir Saveliev <vladimir.saveliev@seagate.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8351
Seagate-bug-id: MRP-3500
Reviewed-on: https://review.whamcloud.com/21065
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Set mbits for EINPROGRESS resend in ptl_send_rpc().
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8193
Reviewed-on: http://review.whamcloud.com/20377
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
lustre uses a fake switch() statement as a compile-time assert, but unfortunately
each use of that causes a warning when building with clang:
drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c:2907:2: warning: no case matching constant switch condition '42'
drivers/staging/lustre/lnet/klnds/socklnd/../../../include/linux/libcfs/libcfs_private.h:294:36: note: expanded from macro 'CLASSERT'
#define CLASSERT(cond) do {switch (42) {case (cond): case 0: break; } } while (0)
As Greg suggested, let's just kill off this macro completely instead of
fixing it. This replaces it with BUILD_BUG_ON(), which means we have
to negate all the conditions in the process.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
We should free "desc" before returning NULL.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The request xid was used to make sure the ost object timestamps
being updated by the out of order setattr/punch/write requests
properly. However, this mechanism is broken by the multiple rcvd
slot feature, where we deferred the xid assignment from request
packing to request sending.
This patch moved back the xid assignment to request packing, and
the manner of finding lowest unreplied xid is changed from scan
sending & delay list to scan a unreplied requests list.
This patch also skipped packing the known replied XID in connect
and disconnect request, so that we can make sure the known replied
XID is increased only on both server & client side.
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-on: http://review.whamcloud.com/16759
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5951
Reviewed-by: Gregoire Pichon <gregoire.pichon@bull.net>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
ptlrpc is using rq_xid as matchbits of bulk data, which means it
has to change rq_xid for bulk resend to avoid several bulk data
landing into the same buffer from different resends.
This patch uses one of reserved __u64 of ptlrpc_body to transfer
mbits to peer, matchbits is now separated from xid. With this change,
ptlrpc can keep rq_xid unchanged on resend, it only updates matchbits
for bulk data.
This protocol change is only applied if both sides of connection have
OBD_CONNECT_BULK_MBITS, otherwise, ptlrpc still uses old approach and
update xid while resending bulk.
Signed-off-by: Liang Zhen <liang.zhen@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3534
Reviewed-on: http://review.whamcloud.com/15421
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Fix: 5c689e689baa ("staging/lustre/ptlrpc: race at req processing")
decreased the race window, but does not remove it. Disable rq_resend
right after MSG_REPLAY flag set. Import lock protects two threads
from race between set/clear MSG_REPLAY and rq_resend flags.
Signed-off-by: Alexander Boyko <alexander_boyko@xyratex.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5554
Xyratex-bug-id: MRP-1888
Reviewed-on: http://review.whamcloud.com/10735
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
-EAGAIN is a normal return when requesting POSIX flocks.
We can't recognize exactly that case here, but it's the
only case that should result in -EAGAIN on LDLM_ENQUEUE, so
don't print to console in that case.
Signed-off-by: Patrick Farrell <paf@cray.com>
Reviewed-on: http://review.whamcloud.com/22856
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8658
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
imp_peer_committed_transno should not decrease as this can
confuse the users if imp_peer_committed_transno is used to
check commit status.
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7079
Reviewed-on: http://review.whamcloud.com/16161
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
When encryption is enabled RPCs are encrypted just before being
sent. The encryption requires allocating memory in the encoding pool.
The current implementation in sptlrpc_enc_pool_get_pages() is
deadlock-prone. Indeed, if there is no more free pages in the pool,
all ptlrpcd threads can end up waiting in a queue, so there is no
thread available to process other requests. It means client is not
able to process replies from servers that yet contain last committed
transno useful to release memory allocated by previous requests,
including enc_pool pages.
To fix this, in sptlrpc_enc_pool_get_pages(), do not make ptlrpcd
threads wait in queue if encoding pool has already reached its maximum
capacity. Instead, return -ENOMEM. If functions calling ptl_send_rpc()
get -ENOMEM, then put back request in queue by moving it back to
RQ_PHASE_NEW phase.
As an optimization, do not call ptl_send_rpc() again for requests that
already failed to allocate in the enc_pool, as long as there is not
enough memory in the enc_pool to satisfy theirs needs.
In /sys/fs/lustre/sptlrpc/encrypt_page_pools, add a new 'out of mem'
stat to track how many requests fail to allocate memory in the
enc_pool.
Signed-off-by: Sebastien Buisson <sebastien.buisson@bull.net>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6356
Reviewed-on: http://review.whamcloud.com/15070
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Even though the server might already got the bulk
replay request, but bulk transfer timeout, let's
replay the bulk request, i.e. treat such replay as
same as no replied replay request (See
ptlrpc_replay_interpret()).
Signed-off-by: wang di <di.wang@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6924
Reviewed-on: http://review.whamcloud.com/15793
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Atomically assign XIDs and put request and sending list so
we can learn the lowest unreplied XID at any point.
This allows to embed in every resquests the highest XID for
which a reply has been received and does not have an unreplied
lower-numbered XID.
This will be used by the MDT target to release in-memory
reply data corresponding to XIDs of reply received by the client.
Signed-off-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Signed-off-by: Gregoire Pichon <gregoire.pichon@bull.net>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5319
Reviewed-on: http://review.whamcloud.com/14793
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Added a union to ptlrpc_bulk_desc for KVEC and KIOV buffers.
bd_type has been changed to be a bit mask. Bits are set in
bd_type to specify {put,get}{source,sink}{kvec,kiov}
changed all instances in the code to access the union properly
ASSUMPTION: all current code only works with KIOV and new DNE code
to be added will introduce a different code path that uses IOVEC
As part of the implementation buffer operations are added
as callbacks to be defined by users of ptlrpc. This enables
buffer users to define how to allocate and release buffers.
Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5835
Reviewed-on: http://review.whamcloud.com/12525
Reviewed-by: wangdi <di.wang@intel.com>
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Do not take unwrap time in req_waittime calculation on client part
as decryption is not related to request service time.
Signed-off-by: Sebastien Buisson <sebastien.buisson@bull.net>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6356
Reviewed-on: http://review.whamcloud.com/14404
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: Jeremy Filizetti <jeremy.filizetti@gmail.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@seagate.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Patch fixes the issue seen on the client with growing request
timeout which occurred after the server side patch landed for
LU-5079. While commit itself is correct, it reveals another
issue. If request is being processed for a long time on server
then client adaptive timeouts will adapt to that after receiving
reply and new requests will have bigger timeout. Another problem
is that server AT history is corrupted by recovery request
processing time which not pure service time but includes
also waiting time for clients to recover.
Patch prevents the AT stats update from early replies on client and
from recovering requests processing time on server.
The ptlrpc_at_recv_early_reply() still updates the current request
timeout as asked by server, but don't include this into AT stats.
The real reply will bring that data from server after all.
Signed-off-by: Mikhail Pershin <mike.pershin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6084
Reviewed-on: http://review.whamcloud.com/13520
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jian Yu <jian.yu@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
A lot of symbols don't need to be exported at all because they are
only used in the module they belong to.
Signed-off-by: frank zago <fzago@cray.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5829
Reviewed-on: http://review.whamcloud.com/12510
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
It may be that a client or MDS is trying to connect to a target (OST
or peer MDT) before that target is finished setup. Rather than
spamming the console logs during initial connection, only print a
console error message if there are repeated failures trying to
connect to the target, which may indicate an error on that node.
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Bobi Jam <bobijam.xu@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3456
Reviewed-on: http://review.whamcloud.com/10057
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
To avoid scanning the replay open list every time in the
ptlrpc_free_committed(), the fix of LU-2613 (4322e0f9) changed
the ptlrpc_free_committed() to skip the open list unless the
import generation is changed. That introduced a race which could
make a closed open being replayed:
1. Application calls ll_close_inode_openhandle()-> mdc_close(),
to close file, rq_replay is cleared, but the open request is
still on the imp_committed_list;
2. Before the md_clear_open_replay_data() is called for close,
client start replay, and that closed open will be replayed
mistakenly;
3. Open replay interpret callback (mdc_replay_open) could race
with the mdc_clear_open_replay_data() at the end;
This patch fix the ptlrpc_free_committed() to make sure the
open list is scanned on recovery to prevent the closed open request
from being replayed.
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5507
Reviewed-on: http://review.whamcloud.com/12667
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Once connected, the previously gathered AT statistics is not valid
anymore because may reflect other routing, etc. The connect by itself
could take a long time due to different reasons (e.g. server was not
ready) and net latency got very high (see import_select_connection())
what does not reflect the current situation.
Take into account only the current (re-)CONNECT rpc latency.
Signed-off-by: Vitaly Fertman <vitaly_fertman@xyratex.com>
Signed-off-by: Alexander Boyko <alexander_boyko@xyratex.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5380
Xyratex-bug-id: MRP-1285
Reviewed-on: http://review.whamcloud.com/11155
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
While clients will resend client->server RPCs, servers would not
resend server->client RPCs such as LDLM callbacks (blocking
or completion callbacks/ASTs). This could result in clients being
evicted from the server if blocking callbacks were dropped by the
network (a failed router or lossy network) and the client did not
cancel the requested lock in time.
In order to fix this problem, this patch adds the ability to resend
LDLM callbacks from the server and give the client a chance to
respond within the timeout period before it is evicted:
- resend BL AST within lock callback timeout period;
- still do not resend CANCEL_ON_BLOCK;
- regular resend for CP AST without BL AST embedded;
- prolong lock callback timeout on resend;
some fixes:
- recovery-small test_10 to actually evict the client
with dropped BL AST;
- ETIMEDOUT to be returned if send limit is expired;
Signed-off-by: Vitaly Fertman <vitaly_fertman@xyratex.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5520
Reviewed-by: Alexey Lyashkov <Alexey_Lyashkov@xyratex.com>
Reviewed-by: Andriy Skulysh <Andriy_Skulysh@xyratex.com>
Xyratex-bug-id: MRP-417
Reviewed-on: http://review.whamcloud.com/9335
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Johann Lombardi <johann.lombardi@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Exit condition from UNREGISTERING phase is releasing of
both reply and bulk buffers.
Call ptlrpc_unregister_bulk() if ptlrpc_unregister_reply()
wasn't completed in async mode before switching to
UNREGISTERING phase.
Signed-off-by: Andriy Skulysh <Andriy_Skulysh@xyratex.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-5259
Xyratex-bug-id: MRP-1960
Reviewed-on: http://review.whamcloud.com/10846
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Ann Koehler <amk@cray.com>
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
When determining whether an early reply can be sent the server will
calculate the new deadline based on an offset from the request
arrival time. However, when actually setting the new deadline
the server offsets the current time. This can result in deadlines
being extended more than at_max seconds past the request arrival
time. Instead, the server should offset the arrival time when updating
its request timeout.
When a client receives an early reply it doesn't know the server side
arrival time so we use the original sent time as an approximation.
Signed-off-by: Chris Horn <hornc@cray.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4578
Reviewed-on: http://review.whamcloud.com/9100
Reviewed-by: James Simmons <uja.ornl@gmail.com>
Reviewed-by: Christopher J. Morrone <chris.morrone.llnl@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The reverse order of request_out_callback() and reply_in_callback()
puts the RPC into UNREGISTERING state, which is waiting for RPC &
bulk md unlink, whereas only RPC md unlink has been called so far.
If bulk is lost, even expired_set does not check for UNREGISTERING
state.
The same for write if server returns an error.
This phase is ambiguous, split to UNREG_RPC and UNREG_BULK.
Signed-off-by: Vitaly Fertman <vitaly.fertman@seagate.com>
Seagate-bug-id: MRP-2953, MRP-3206
Reviewed-by: Andriy Skulysh <andriy.skulysh@seagate.com>
Reviewed-by: Alexey Leonidovich Lyashkov <alexey.lyashkov@seagate.com>
Tested-by: Elena V. Gryaznova <elena.gryaznova@seagate.com>
Reviewed-on: http://review.whamcloud.com/19953
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Ann Koehler <amk@cray.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Combine __ptlrpc_request_bufs_pack into ptlrpc_request_bufs_pack
because it was an unnecessary wrapper otherwise.
Signed-off-by: Ben Evans <bevans@cray.com>
Reviewed-on: http://review.whamcloud.com/16765
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7269
Reviewed-by: Frank Zago <fzago@cray.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|