From a5325ae5b8bff051933a754db7727fc9823e6414 Mon Sep 17 00:00:00 2001
From: Erik Hugne
Date: Thu, 28 Aug 2014 09:08:47 +0200
Subject: tipc: add name distributor resiliency queue

TIPC name table updates are distributed asynchronously in a cluster,
entailing a risk of certain race conditions. E.g., if two nodes
simultaneously issue conflicting (overlapping) publications, this may
not be detected until both publications have reached a third node, in
which case one of the publications will be silently dropped on that
node. Hence, we end up with an inconsistent name table.

In most cases this conflict is just a temporary race, e.g., one node is
issuing a publication under the assumption that a previous, conflicting,
publication has already been withdrawn by the other node. However,
because of the (rtt related) distributed update delay, this may not yet
hold true on all nodes. The symptom of this failure is a syslog message:
"tipc: Cannot publish {%u,%u,%u}, overlap error".

In this commit we add a resiliency queue at the receiving end of the
name table distributor. When insertion of an arriving publication fails,
we retain it in this queue for a short amount of time, assuming that
another update will arrive very soon and clear the conflict. If so
happens, we insert the publication, otherwise we drop it.

The (configurable) retention value defaults to 2000 ms. Knowing from
experience that the situation described above is extremely rare, there
is no risk that the queue will accumulate any large number of items.

Signed-off-by: Erik Hugne
Signed-off-by: Jon Maloy
Acked-by: Ying Xue
Signed-off-by: David S. Miller
---
 Documentation/sysctl/net.txt | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

(limited to 'Documentation/sysctl')

diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
index 9a0319a82470..04892b821157 100644
--- a/Documentation/sysctl/net.txt
+++ b/Documentation/sysctl/net.txt
@@ -241,6 +241,9 @@ address of the router (or Connected) for internal networks.
 6. TIPC
 -------------------------------------------------------
 
+tipc_rmem
+----------
+
 The TIPC protocol now has a tunable for the receive memory, similar to the
 tcp_rmem - i.e. a vector of 3 INTEGERs: (min, default, max)
 
@@ -252,3 +255,16 @@ The max value is set to CONN_OVERLOAD_LIMIT, and the default and min values
 are scaled (shifted) versions of that same value. Note that the min value
 is not at this point in time used in any meaningful way, but the triplet
 is preserved in order to be consistent with things like tcp_rmem.
+
+named_timeout
+--------------
+
+TIPC name table updates are distributed asynchronously in a cluster, without
+any form of transaction handling. This means that different race scenarios are
+possible. One such is that a name withdrawal sent out by one node and received
+by another node may arrive after a second, overlapping name publication already
+has been accepted from a third node, although the conflicting updates
+originally may have been issued in the correct sequential order.
+If named_timeout is nonzero, failed topology updates will be placed on a defer
+queue until another event arrives that clears the error, or until the timeout
+expires. Value is in milliseconds.
-- 
cgit v1.2.3-59-g8ed1b
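
The deferral behaviour described for named_timeout can be illustrated with a
small userspace model. This is only a sketch under stated assumptions, not the
kernel code in this commit: the names Publication, NameTable, publish,
withdraw and process_backlog are hypothetical, time is passed in explicitly as
milliseconds, and an "overlap" is assumed to mean two ranges of the same name
type that intersect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    """Hypothetical stand-in for a {type, lower, upper} name publication."""
    name_type: int
    lower: int
    upper: int
    node: str


class NameTable:
    """Toy receiving-side name table with a defer queue (illustrative only)."""

    def __init__(self, named_timeout_ms=2000):
        self.named_timeout_ms = named_timeout_ms
        self.published = []   # accepted publications
        self.deferred = []    # (expiry_ms, publication) pairs awaiting retry

    def _overlaps(self, pub):
        # Assumption: a conflict is any intersecting range of the same type.
        return any(
            p.name_type == pub.name_type
            and p.lower <= pub.upper
            and pub.lower <= p.upper
            for p in self.published
        )

    def publish(self, pub, now_ms):
        """Insert pub, or defer it if it conflicts and the timeout is nonzero."""
        if not self._overlaps(pub):
            self.published.append(pub)
            return "inserted"
        if self.named_timeout_ms > 0:
            self.deferred.append((now_ms + self.named_timeout_ms, pub))
            return "deferred"
        return "dropped"

    def withdraw(self, pub, now_ms):
        """Remove pub if present, then retry anything waiting on the queue."""
        if pub in self.published:
            self.published.remove(pub)
        self.process_backlog(now_ms)

    def process_backlog(self, now_ms):
        """Retry deferred publications; drop those whose timeout has expired."""
        still_waiting = []
        for expiry_ms, pub in self.deferred:
            if now_ms >= expiry_ms:
                continue                      # timed out: drop
            if not self._overlaps(pub):
                self.published.append(pub)    # conflict cleared: insert
            else:
                still_waiting.append((expiry_ms, pub))
        self.deferred = still_waiting
```

In this model, a publication from node C that overlaps one still held from
node A is queued rather than dropped. When A's late withdrawal arrives within
the timeout, the queued publication is inserted. With named_timeout set to 0
the conflicting publication is dropped immediately.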