Age | Commit message (Collapse) | Author | Files | Lines |
|
Send probes to all the unprobed fileservers in a fileserver list on all
addresses simultaneously in an attempt to find out the fastest route whilst
not getting stuck for 20s on any server or address that we don't get a
reply from.
This alleviates the problem whereby attempting to access a new server can
take a long time because the rotation algorithm ends up rotating through
all servers and addresses until it finds one that responds.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
In some circumstances, the callback interest pointer is NULL, so in such a
case we can't dereference it when checking to see if the callback is
broken. This causes an oops in some circumstances.
Fix this by replacing the function that worked out the aggregate break
counter with one that actually does the comparison, and then make that
return true (ie. broken) if there is no callback interest as yet (ie. the
pointer is NULL).
Fixes: 68251f0a6818 ("afs: Fix whole-volume callback handling")
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Eliminate the address pointer from the address list cursor as it's
redundant (ac->addrs[ac->index] can be used to find the same address) and
address lists must be replaced rather than being rearranged, so is of
limited value.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Provide an option to allow the file or volume location server cursor to be
dumped if the rotation routine falls off the end without managing to
contact a server.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Implement support for talking to YFS-variant fileservers in the cache
manager and the filesystem client. These implement upgraded services on
the same port as their AFS services.
YFS fileservers provide expanded capabilities over AFS.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Expand fields in various data structures to support the expanded
information that YFS is capable of returning.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Get the target vnode in afs_rmdir() and validate it before we attempt the
deletion, The vnode pointer will be passed through to the delivery function
in a later patch so that the delivery function can mark it deleted.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Calculate the callback expiration time at the point of operation reply
delivery, using the reply time queried from AF_RXRPC on that call as a
base.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
The FS.FetchStatus reply delivery function was updating inode of the
directory in which a lookup had been done with the status of the looked up
file. This corrupts some of the directory state.
Fixes: 5cf9dd55a0ec ("afs: Prospectively look up extra files when doing a single lookup")
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Implement the YFS cache manager service which gives extra capabilities on
top of AFS. This is done by listening for an additional service on the
same port and indicating that anyone requesting an upgrade should be
upgraded to the YFS port.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Remove unnecessary details of a broken callback, such as version, expiry
and type, from the afs_callback_break struct as they're not actually used
and make the list take more memory.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Call the function to commit the status on a new file, dir or symlink so
that the access rights for the caller's key are cached for that object.
Without this, the next access to the file will cause a FetchStatus
operation to be emitted to retrieve the access rights.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Increase the sizes of the volume ID to 64 bits and the vnode ID (inode
number equivalent) to 96 bits to allow the support of YFS.
This requires the iget comparator to check the vnode->fid rather than i_ino
and i_generation as i_ino is not sufficiently capacious. It also requires
this data to be placed into the vnode cache key for fscache.
For the moment, just discard the top 32 bits of the vnode ID when returning
it though stat.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
When writing a new page, clear space in the page rather than attempting to
load it from the server if the space is beyond the EOF.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Add a couple of tracepoints to log the production of I/O errors within the AFS
filesystem.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Fix afs_deliver_to_call() to handle -EIO being returned by the operation
delivery function, indicating that the call found itself in the wrong
state, by printing an error and aborting the call.
Currently, an assertion failure will occur. This can happen, say, if the
delivery function falls off the end without calling afs_extract_data() with
the want_more parameter set to false to collect the end of the Rx phase of
a call.
The assertion failure looks like:
AFS: Assertion failed
4 == 7 is false
0x4 == 0x7 is false
------------[ cut here ]------------
kernel BUG at fs/afs/rxrpc.c:462!
and is matched in the trace buffer by a line like:
kworker/7:3-3226 [007] ...1 85158.030203: afs_io_error: c=0003be0c r=-5 CM_REPLY
Fixes: 98bf40cd99fc ("afs: Protect call->state changes against signals")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Currently the TTL on VL server and address lists isn't set in all
circumstances and may be set to poor choices in others, since the TTL is
derived from the SRV/AFSDB DNS record if and when available.
Fix the TTL by limiting the range to a minimum and maximum from the current
time. At some point these can be made into sysctl knobs. Further, use the
TTL we obtained from the upcall to set the expiry on negative results too;
in future a mechanism can be added to force reloading of such data.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Track VL servers as independent entities rather than lumping all their
addresses together into one set and implement server-level rotation by:
(1) Add the concept of a VL server list, where each server has its own
separate address list. This code is similar to the FS server list.
(2) Use the DNS resolver to retrieve a set of servers and their associated
addresses, ports, preference and weight ratings.
(3) In the case of a legacy DNS resolver or an address list given directly
through /proc/net/afs/cells, create a list containing just a dummy
server record and attach all the addresses to that.
(4) Implement a simple rotation policy, for the moment ignoring the
priorities and weights assigned to the servers.
(5) Show the address list through /proc/net/afs/<cell>/vlservers. This
also displays the source and status of the data as indicated by the
upcall.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Improve the error handling in FS server rotation by:
(1) Cache the latest useful error value for the fs operation as a whole in
struct afs_fs_cursor separately from the error cached in the
afs_addr_cursor struct. The one in the address cursor gets clobbered
occasionally. Copy over the error to the fs operation only when it's
something we'd be interested in passing to userspace.
(2) Make it so that EDESTADDRREQ is the default that is seen only if no
addresses are available to be accessed.
(3) When calling utility functions, such as checking a volume status or
probing a fileserver, don't let a successful result clobber the cached
error in the cursor; instead, stash the result in a temporary variable
until it has been assessed.
(4) Don't return ETIMEDOUT or ETIME if a better error, such as
ENETUNREACH, is already cached.
(5) On leaving the rotation loop, turn any remote abort code into a more
useful error than ECONNABORTED.
Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
afs_extract_data sets up a temporary iov_iter and passes it to AF_RXRPC
each time it is called to describe the remaining buffer to be filled.
Instead:
(1) Put an iterator in the afs_call struct.
(2) Set the iterator for each marshalling stage to load data into the
appropriate places. A number of convenience functions are provided to
this end (eg. afs_extract_to_buf()).
This iterator is then passed to afs_extract_data().
(3) Use the new ITER_DISCARD iterator to discard any excess data provided
by FetchData.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Include the site of detection of AFS protocol errors in trace lines to
better be able to determine what went wrong.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Add a new iterator, ITER_DISCARD, that can only be used in READ mode and
just discards any data copied to it.
This is useful in a network filesystem for discarding any unwanted data
sent by a server.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
In the iov_iter struct, separate the iterator type from the iterator
direction and use accessor functions to access them in most places.
Convert a bunch of places to use switch-statements to access them rather
then chains of bitwise-AND statements. This makes it easier to add further
iterator types. Also, this can be more efficient as to implement a switch
of small contiguous integers, the compiler can use ~50% fewer compare
instructions than it has to use bitwise-and instructions.
Further, cease passing the iterator type into the iterator setup function.
The iterator function can set that itself. Only the direction is required.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Use accessor functions to access an iterator's type and direction. This
allows for the possibility of using some other method of determining the
type of iterator than if-chains with bitwise-AND conditions.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Remove the undefinition of READ and WRITE because these constants may be
used elsewhere in subsequently included header files, thus breaking them.
These constants don't actually appear to be used in the driver, so the
undefinition seems pointless.
Fixes: 4562236b3bc0 ("drm/amd/dc: Add dc display driver (v2)")
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
We no longer need to worry about whether or not the entry is hashed in
order to figure out if the contents are valid. We only care whether or
not the refcount is non-zero.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
The filehandle has a length which is defined as a 32-bit
"unsigned integer". Change sign of the length appropriately.
Signed-off-by: Frank Sorenson <sorenson@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
This reverts commit 6fe9487892b32cb1c8b8b0d552ed7222a527fe30.
It is causing more serious regressions than the RCU warning
it is fixing.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is no need to have the 'struct rocker_desc_info *desc_info'
variable static since new value always be assigned before use it.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Trivial fix to spelling mistake in DP_INFO message.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 058214a4d1df ("ip6_tun: Add infrastructure for doing
encapsulation") added the ip6_tnl_encap() call in ip6_tnl_xmit(), before
the call to ipv6_push_frag_opts() to append the IPv6 Tunnel Encapsulation
Limit option (option 4, RFC 2473, par. 5.1) to the outer IPv6 header.
As long as the option didn't actually end up in generated packets, this
wasn't an issue. Then commit 89a23c8b528b ("ip6_tunnel: Fix missing tunnel
encapsulation limit option") fixed sending of this option, and the
resulting layout, e.g. for FoU, is:
.-------------------.------------.----------.-------------------.----- - -
| Outer IPv6 Header | UDP header | Option 4 | Inner IPv6 Header | Payload
'-------------------'------------'----------'-------------------'----- - -
Needless to say, FoU and GUE (at least) won't work over IPv6. The option
is appended by default, and I couldn't find a way to disable it with the
current iproute2.
Turn this into a more reasonable:
.-------------------.----------.------------.-------------------.----- - -
| Outer IPv6 Header | Option 4 | UDP header | Inner IPv6 Header | Payload
'-------------------'----------'------------'-------------------'----- - -
With this, and with 84dad55951b0 ("udp6: fix encap return code for
resubmitting"), FoU and GUE work again over IPv6.
Fixes: 058214a4d1df ("ip6_tun: Add infrastructure for doing encapsulation")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Andrey reported the following warning triggered while running CRIU tests:
tcp_clean_rtx_queue()
...
last_ackt = tcp_skb_timestamp_us(skb);
WARN_ON_ONCE(last_ackt == 0);
This is caused by 5f6188a8003d ("tcp: do not change tcp_wstamp_ns
in tcp_mstamp_refresh"), as we end up having skbs in retransmit queue
with a zero skb->skb_mstamp_ns field.
We could fix this bug in different ways, like making sure
tp->tcp_wstamp_ns is not zero at socket creation, but as Neal pointed
out, we also do not want that pacing status of a repaired socket
could push tp->tcp_wstamp_ns far ahead in the future.
So we prefer changing tcp_write_xmit() to not call tcp_update_skb_after_send()
and instead do what is requested by TCP_REPAIR logic.
Fixes: 5f6188a8003d ("tcp: do not change tcp_wstamp_ns in tcp_mstamp_refresh")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We initialize a struct tipc_event allocated on the kernel stack to
zero to avert info leak to user space.
Reported-by: syzbot+057458894bc8cada4dee@syzkaller.appspotmail.com
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds checksum offload and TSO support for the HiNIC
driver. Perfomance test (Iperf) shows more than 100% improvement
in TCP streams.
Signed-off-by: Zhao Chen <zhaochen6@huawei.com>
Signed-off-by: Xue Chaojing <xuechaojing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In ethtool_ioctl(), the ioctl command 'ethcmd' is checked through a switch
statement to see whether it is necessary to pre-process the ethtool
structure, because, as mentioned in the comment, the structure
ethtool_rxnfc is defined with padding. If yes, a user-space buffer 'rxnfc'
is allocated through compat_alloc_user_space(). One thing to note here is
that, if 'ethcmd' is ETHTOOL_GRXCLSRLALL, the size of the buffer 'rxnfc' is
partially determined by 'rule_cnt', which is actually acquired from the
user-space buffer 'compat_rxnfc', i.e., 'compat_rxnfc->rule_cnt', through
get_user(). After 'rxnfc' is allocated, the data in the original user-space
buffer 'compat_rxnfc' is then copied to 'rxnfc' through copy_in_user(),
including the 'rule_cnt' field. However, after this copy, no check is
re-enforced on 'rxnfc->rule_cnt'. So it is possible that a malicious user
race to change the value in the 'compat_rxnfc->rule_cnt' between these two
copies. Through this way, the attacker can bypass the previous check on
'rule_cnt' and inject malicious data. This can cause undefined behavior of
the kernel and introduce potential security risk.
This patch avoids the above issue via copying the value acquired by
get_user() to 'rxnfc->rule_cn', if 'ethcmd' is ETHTOOL_GRXCLSRLALL.
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On multi adapter setup if the uld registration fails even on
one adapter, the allocated resources for the uld on all the
adapters are freed, rendering the functioning adapters unusable.
This commit fixes the issue by freeing the allocated resources
only for the failed adapter.
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When dumping classes by parent, kernel would return classes twice:
| # tc qdisc add dev lo root prio
| # tc class show dev lo
| class prio 8001:1 parent 8001:
| class prio 8001:2 parent 8001:
| class prio 8001:3 parent 8001:
| # tc class show dev lo parent 8001:
| class prio 8001:1 parent 8001:
| class prio 8001:2 parent 8001:
| class prio 8001:3 parent 8001:
| class prio 8001:1 parent 8001:
| class prio 8001:2 parent 8001:
| class prio 8001:3 parent 8001:
This comes from qdisc_match_from_root() potentially returning the root
qdisc itself if its handle matched. Though in that case, root's classes
were already dumped a few lines above.
Fixes: cb395b2010879 ("net: sched: optimize class dumps")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The struct type was copied from the line before but it should be "tx"
instead of "rx". I have reviewed the code and I can't immediately see
that this bug causes a runtime issue.
Fixes: 36e53349b60b ("bnxt_en: Add additional extended port statistics.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Clang warns:
drivers/atm/zatm.c:513:7: error: while loop has empty body
[-Werror,-Wempty-body]
zwait;
^
drivers/atm/zatm.c:513:7: note: put the semicolon on a separate line to
silence this warning
Get rid of this warning by using an empty do-while loop. While we're at
it, add parentheses to make it clear that this is a function-like macro.
Link: https://github.com/ClangBuiltLinux/linux/issues/42
Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Clang warns:
drivers/atm/eni.c:244:48: error: for loop has empty body
[-Werror,-Wempty-body]
for (order = 0; (1 << order) < *size; order++);
^
drivers/atm/eni.c:244:48: note: put the semicolon on a separate line to
silence this warning
In this case, that loop is expected to be empty so silence the warning
in the way that Clang suggests.
Link: https://github.com/ClangBuiltLinux/linux/issues/42
Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Update Shrijeet's email address for the VRF entry.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commits ffb6ca33b04b and e08ea3a96fc7 prevent setting xprt_min_resvport
greater than xprt_max_resvport, but may also break simple code that sets
one parameter then the other, if the new range does not overlap the old.
Also it looks racy to me, unless there's some serialization I'm not
seeing. Granted it would probably require malicious privileged processes
(unless there's a chance these might eventually be settable in unprivileged
containers), but still it seems better not to let userspace panic the
kernel.
Simpler seems to be to allow setting the parameters to whatever you want
but interpret xprt_min_resvport > xprt_max_resvport as the empty range.
Fixes: ffb6ca33b04b "sunrpc: Prevent resvport min/max inversion..."
Fixes: e08ea3a96fc7 "sunrpc: Prevent rexvport min/max inversion..."
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
We don't need to call this in the direct, read, or pnfs resend paths and
the only other caller is the write path in nfs_page_async_flush() which
already checks and sets the pg_error on the context.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
We must check pg_error and call error_cleanup after any call to pg_doio.
Currently, we are skipping the unlock of a page if we encounter an error in
nfs_pageio_complete() before handing off the work to the RPC layer.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
These are counters for errors received on rx side, such as
FEC errors.
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Added "Per lane raw errors" capability bit in
Ports Capabilities Mask (PCAM) enhanced features
layout.
This bit determines if the fields "phy_raw_errors_laneX"
in "Physical Layer statistical" counters group are supported.
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|