aboutsummaryrefslogtreecommitdiffstats
path: root/net/tipc (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-03-22tipc: step sk->sk_drops when rcv buffer is fullGhantaKrishnamurthy MohanKrishna1-2/+7
Currently when tipc is unable to queue a received message on a socket, the message is rejected back to the sender with error TIPC_ERR_OVERLOAD. However, the application on this socket has no knowledge about these discards. In this commit, we try to step the sk_drops counter when tipc is unable to queue a received message. Export sk_drops using tipc socket diagnostics. Acked-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22tipc: implement socket diagnostics for AF_TIPCGhantaKrishnamurthy MohanKrishna5-6/+203
This commit adds socket diagnostics capability for AF_TIPC in netlink family NETLINK_SOCK_DIAG in a new kernel module (diag.ko). The following are key design considerations: - config TIPC_DIAG has default y, like INET_DIAG. - only requests with flag NLM_F_DUMP is supported (dump all). - tipc_sock_diag_req message is introduced to send filter parameters. - the response attributes are of TLV, some nested. To avoid exposing data structures between diag and tipc modules and avoid code duplication, the following additions are required: - export tipc_nl_sk_walk function to reuse socket iterator. - export tipc_sk_fill_sock_diag to fill the tipc diag attributes. - create a sock_diag response message in __tipc_add_sock_diag defined in diag.c and use the above exported tipc_sk_fill_sock_diag to fill response. Acked-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22tipc: modify socket iterator for sock_diagGhantaKrishnamurthy MohanKrishna1-24/+41
The current socket iterator function tipc_nl_sk_dump, handles socket locks and calls __tipc_nl_add_sk for each socket. To reuse this logic in sock_diag implementation, we do minor modifications to make these functions generic as described below. In this commit, we add a two new functions __tipc_nl_sk_walk, __tipc_nl_add_sk_info and modify tipc_nl_sk_dump, __tipc_nl_add_sk accordingly. In __tipc_nl_sk_walk we: 1. acquire and release socket locks 2. for each socket, execute the specified callback function In __tipc_nl_add_sk we: - Move the netlink attribute insertion to __tipc_nl_add_sk_info. tipc_nl_sk_dump calls tipc_nl_sk_walk with __tipc_nl_add_sk as argument. sock_diag will use these generic functions in a later commit. There is no functional change in this commit. Acked-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-17tipc: some name changesJon Maloy5-103/+106
We rename some lists and fields in struct publication both to make the naming more consistent and to better reflect their roles. We also update the descriptions of those lists. node_list -> local_publ cluster_list -> all_publ pport_list -> binding_sock ref -> port There are no functional changes in this commit. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-17tipc: merge two lists in struct publicationJon Maloy2-13/+12
The size of struct publication can be reduced further. Membership in lists 'nodesub_list' and 'local_list' is mutually exlusive, in that remote publications use the former and local publications the latter. We replace the two lists with one single, named 'binding_node' which reflects what it really is. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-17tipc: remove zone_list member in struct publicationJon Maloy2-76/+30
As a further consequence of the previous commits, we can also remove the member 'zone_list 'in struct name_info and struct publication. Instead, we now let the member cluster_list take over the role a container of all publications of a given <type,lower, upper>. We also remove the counters for the size of those lists, since they don't serve any purpose. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-17tipc: remove zone publication list in name tableJon Maloy4-26/+29
As a consequence of the previous commit we nan now eliminate zone scope related lists in the name table. We start with name_table::publ_list[3], which can now be replaced with two lists, one for node scope publications and one for cluster scope publications. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-17tipc: obsolete TIPC_ZONE_SCOPEJon Maloy6-40/+23
Publications for TIPC_CLUSTER_SCOPE and TIPC_ZONE_SCOPE are in all aspects handled the same way, both on the publishing node and on the receiving nodes. Despite previous ambitions to the contrary, this is never going to change, so we take the conseqeunce of this and obsolete TIPC_ZONE_SCOPE and related macros/functions. Whenever a user is doing a bind() or a sendmsg() attempt using ZONE_SCOPE we translate this internally to CLUSTER_SCOPE, while we remain compatible with users and remote nodes still using ZONE_SCOPE. Furthermore, the non-formalized scope value 0 has always been permitted for use during lookup, with the same meaning as ZONE_SCOPE/CLUSTER_SCOPE. We now permit it even as binding scope, but for compatibility reasons we choose to not change the value of TIPC_CLUSTER_SCOPE. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-13net: Convert tipc_net_opsKirill Tkhai1-0/+1
TIPC looks concentrated in itself, and other pernet_operations seem not touching its entities. tipc_net_ops look pernet-divided, and they should be safe to be executed in parallel for several net the same time. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-07tipc: bcast: use true and false for boolean valuesGustavo A. R. Silva1-1/+1
Assign true or false to boolean variables instead of an integer value. This issue was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2-0/+2
All of the conflicts were cases of overlapping changes. In net/core/devlink.c, we have to make care that the resouce size_params have become a struct member rather than a pointer to such an object. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27tipc: correct initial value for group congestion flagJon Maloy2-0/+2
In commit 60c253069632 ("tipc: fix race between poll() and setsockopt()") we introduced a pointer from struct tipc_group to the 'group_is_connected' flag in struct tipc_sock, so that this field can be checked without dereferencing the group pointer of the latter struct. The initial value for this flag is correctly set to 'false' when a group is created, but we miss the case when no group is created at all, in which case the initial value should be 'true'. This has the effect that SOCK_RDM/DGRAM sockets sending datagrams never receive POLLOUT if they request so. This commit corrects this bug. Fixes: 60c253069632 ("tipc: fix race between poll() and setsockopt()") Reported-by: Hoang Le <hoang.h.le@dektek.com.au> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller5-54/+91
2018-02-19tipc: don't call sock_release() in atomic contextPaolo Abeni1-1/+1
syzbot reported a scheduling while atomic issue at netns destruction time: BUG: sleeping function called from invalid context at net/core/sock.c:2769 in_atomic(): 1, irqs_disabled(): 0, pid: 85, name: kworker/u4:3 5 locks held by kworker/u4:3/85: #0: ((wq_completion)"%s""netns"){+.+.}, at: [<00000000c9792deb>] process_one_work+0xaaf/0x1af0 kernel/workqueue.c:2084 #1: (net_cleanup_work){+.+.}, at: [<00000000adc12e2a>] process_one_work+0xb01/0x1af0 kernel/workqueue.c:2088 #2: (net_sem){++++}, at: [<000000009ccb5669>] cleanup_net+0x23f/0xd20 net/core/net_namespace.c:494 #3: (net_mutex){+.+.}, at: [<00000000a92767d9>] cleanup_net+0xa7d/0xd20 net/core/net_namespace.c:496 #4: (&(&srv->idr_lock)->rlock){+...}, at: [<000000001343e568>] spin_lock_bh include/linux/spinlock.h:315 [inline] #4: (&(&srv->idr_lock)->rlock){+...}, at: [<000000001343e568>] tipc_topsrv_stop+0x231/0x610 net/tipc/topsrv.c:685 CPU: 0 PID: 85 Comm: kworker/u4:3 Not tainted 4.16.0-rc1+ #230 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: netns cleanup_net Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:53 ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6128 __might_sleep+0x95/0x190 kernel/sched/core.c:6081 lock_sock_nested+0x37/0x110 net/core/sock.c:2769 lock_sock include/net/sock.h:1463 [inline] tipc_release+0x103/0xff0 net/tipc/socket.c:572 sock_release+0x8d/0x1e0 net/socket.c:594 tipc_topsrv_stop+0x3c0/0x610 net/tipc/topsrv.c:696 tipc_exit_net+0x15/0x40 net/tipc/core.c:96 ops_exit_list.isra.6+0xae/0x150 net/core/net_namespace.c:148 cleanup_net+0x6ba/0xd20 net/core/net_namespace.c:529 process_one_work+0xbbf/0x1af0 kernel/workqueue.c:2113 worker_thread+0x223/0x1990 kernel/workqueue.c:2247 kthread+0x33c/0x400 kernel/kthread.c:238 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:429 This is caused by tipc_topsrv_stop() releasing the listener socket with the idr lock held. This changeset addresses the issue moving the release operation outside such lock. Reported-and-tested-by: syzbot+749d9d87c294c00ca856@syzkaller.appspotmail.com Fixes: 0ef897be12b8 ("tipc: separate topology server listener socket from subcsriber sockets") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: ///jon Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-19tipc: fix bug on error path in tipc_topsrv_kern_subscr()Jon Maloy1-3/+4
In commit cc1ea9ffadf7 ("tipc: eliminate struct tipc_subscriber") we re-introduced an old bug on the error path in the function tipc_topsrv_kern_subscr(). We now re-introduce the correction too. Reported-by: syzbot+f62e0f2a0ef578703946@syzkaller.appspotmail.com Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-16tipc: rename tipc_server to tipc_topsrvJon Maloy7-259/+258
We rename struct tipc_server to struct tipc_topsrv. This reflect its now specialized role as topology server. Accoringly, we change or add function prefixes to make it clearer which functionality those belong to. There are no functional changes in this commit. Acked-by: Ying.Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-16tipc: separate topology server listener socket from subcsriber socketsJon Maloy1-181/+147
We move the listener socket to struct tipc_server and give it its own work item. This makes it easier to follow the code, and entails some simplifications in the reception code in subscriber sockets. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-16tipc: make struct tipc_server private for server.cJon Maloy5-128/+110
In order to narrow the interface and dependencies between the topology server and the subscription/binding table functionality we move struct tipc_server inside the file server.c. This requires some code adaptations in other files, but those are mostly minor. The most important change is that we have to move the start/stop functions for the topology server to server.c, where they logically belong anyway. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-16tipc: some prefix changesJon Maloy4-56/+54
Since we now have removed struct tipc_subscriber from the code, and only struct tipc_subscription remains, there is no longer need for long and awkward prefixes to distinguish between their pertaining functions. We now change all tipc_subscrp_* prefixes to tipc_sub_*. This is a purely cosmetic change. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-16tipc: collapse subscription creation functionsJon Maloy4-43/+22
After the previous changes it becomes logical to collapse the two-level creation of subscription instances into one. We do that here. We also rename the creation and deletion functions for more consistency. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-16tipc: simplify endianness handling in topology subscriberJon Maloy5-102/+86
Because of the requirement for total distribution transparency, users send subscriptions and receive topology events in their own host format. It is up to the topology server to determine this format and do the correct conversions to and from its own host format when needed. Until now, this has been handled in a rather non-transparent way inside the topology server and subscriber code, leading to unnecessary complexity when creating subscriptions and issuing events. We now improve this situation by adding two new macros, tipc_sub_read() and tipc_evt_write(). Both those functions calculate the need for conversion internally before performing their respective operations. Hence, all handling of such conversions become transparent to the rest of the code. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-16tipc: simplify interaction between subscription and topology connectionJon Maloy5-149/+88
The message transmission and reception in the topology server is more generic than is currently necessary. By basing the funtionality on the fact that we only send items of type struct tipc_event and always receive items of struct tipc_subcr we can make several simplifications, and also get rid of some unnecessary dynamic memory allocations. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-16tipc: eliminate struct tipc_subscriberJon Maloy4-207/+146
It is unnecessary to keep two structures, struct tipc_conn and struct tipc_subscriber, with a one-to-one relationship and still with different life cycles. The fact that the two often run in different contexts, and still may access each other via direct pointers constitutes an additional hazard, something we have experienced at several occasions, and still see happening. We have identified at least two remaining problems that are easier to fix if we simplify the topology server data structure somewhat. - When there is a race between a subscription up/down event and a timeout event, it is fully possible that the former might be delivered after the latter, leading to confusion for the receiver. - The function tipc_subcrp_timeout() is executing in interrupt context, while the following call chain is at least theoretically possible: tipc_subscrp_timeout() tipc_subscrp_send_event() tipc_conn_sendmsg() conn_put() tipc_conn_kref_release() sock_release(sock) I.e., we end up calling a function that might try to sleep in interrupt context. To eliminate this, we need to ensure that the tipc_conn structure and the socket, as well as the subscription instances, only are deleted in work queue context, i.e., after the timeout event really has been sent out. We now remove this unnecessary complexity, by merging data and functionality of the subscriber structure into struct tipc_conn and the associated file server.c. We thereafter add a spinlock and a new 'inactive' state to the subscription structure. Using those, both problems described above can be easily solved. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-16tipc: remove unnecessary function pointersJon Maloy4-37/+20
Interaction between the functionality in server.c and subscr.c is done via function pointers installed in struct server. This makes the code harder to follow, and doesn't serve any obvious purpose. Here, we replace the function pointers with direct function calls. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-16tipc: remove redundant code in topology serverJon Maloy3-35/+9
The socket handling in the topology server is unnecessarily generic. It is prepared to handle both SOCK_RDM, SOCK_DGRAM and SOCK_STREAM type sockets, as well as the only socket type which is really used, SOCK_SEQPACKET. We now remove this redundant code to make the code more readable. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14tipc: apply bearer link tolerance on running linksJon Maloy4-4/+32
Currently, the default link tolerance set in struct tipc_bearer only has effect on links going up after that moment. I.e., a user has to reset all the node's links across that bearer to have the new value applied. This is too limiting and disturbing on a running cluster to be useful. We now change this so that also already existing links are updated dynamically, without any need for a reset, when the bearer value is changed. We leverage the already existing per-link functionality for this to achieve the wanted effect. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14tipc: Fix missing RTNL lock protection during setting link propertiesYing Xue1-6/+8
Currently when user changes link properties, TIPC first checks if user's command message contains media name or bearer name through tipc_media_find() or tipc_bearer_find() which is protected by RTNL lock. But when tipc_nl_compat_link_set() conducts the checking with the two functions, it doesn't hold RTNL lock at all, as a result, the following complaints were reported: audit: type=1400 audit(1514679888.244:9): avc: denied { write } for pid=3194 comm="syzkaller021477" path="socket:[11143]" dev="sockfs" ino=11143 scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023 tcontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023 tclass=netlink_generic_socket permissive=1 Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> ============================= WARNING: suspicious RCU usage 4.15.0-rc5+ #152 Not tainted ----------------------------- net/tipc/bearer.c:177 suspicious rcu_dereference_protected() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 2 locks held by syzkaller021477/3194: #0: (cb_lock){++++}, at: [<00000000d20133ea>] genl_rcv+0x19/0x40 net/netlink/genetlink.c:634 #1: (genl_mutex){+.+.}, at: [<00000000fcc5d1bc>] genl_lock net/netlink/genetlink.c:33 [inline] #1: (genl_mutex){+.+.}, at: [<00000000fcc5d1bc>] genl_rcv_msg+0x115/0x140 net/netlink/genetlink.c:622 stack backtrace: CPU: 1 PID: 3194 Comm: syzkaller021477 Not tainted 4.15.0-rc5+ #152 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:53 lockdep_rcu_suspicious+0x123/0x170 kernel/locking/lockdep.c:4585 tipc_bearer_find+0x2b4/0x3b0 net/tipc/bearer.c:177 tipc_nl_compat_link_set+0x329/0x9f0 net/tipc/netlink_compat.c:729 __tipc_nl_compat_doit net/tipc/netlink_compat.c:288 [inline] tipc_nl_compat_doit+0x15b/0x660 net/tipc/netlink_compat.c:335 tipc_nl_compat_handle net/tipc/netlink_compat.c:1119 [inline] tipc_nl_compat_recv+0x112f/0x18f0 net/tipc/netlink_compat.c:1201 genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:599 genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:624 netlink_rcv_skb+0x21e/0x460 net/netlink/af_netlink.c:2408 genl_rcv+0x28/0x40 net/netlink/genetlink.c:635 netlink_unicast_kernel net/netlink/af_netlink.c:1275 [inline] netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1301 netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1864 sock_sendmsg_nosec net/socket.c:636 [inline] sock_sendmsg+0xca/0x110 net/socket.c:646 sock_write_iter+0x31a/0x5d0 net/socket.c:915 call_write_iter include/linux/fs.h:1772 [inline] new_sync_write fs/read_write.c:469 [inline] __vfs_write+0x684/0x970 fs/read_write.c:482 vfs_write+0x189/0x510 fs/read_write.c:544 SYSC_write fs/read_write.c:589 [inline] SyS_write+0xef/0x220 fs/read_write.c:581 do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline] do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389 entry_SYSENTER_compat+0x54/0x63 arch/x86/entry/entry_64_compat.S:129 In order to correct the mistake, __tipc_nl_compat_doit() has been protected by RTNL lock, which means the whole operation of setting bearer/media properties is under RTNL protection. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reported-by: syzbot <syzbot+6345fd433db009b29413@syzkaller.appspotmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14tipc: Introduce __tipc_nl_net_setYing Xue2-3/+13
Introduce __tipc_nl_net_set() which doesn't hold RTNL lock. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14tipc: Introduce __tipc_nl_media_setYing Xue2-9/+15
Introduce __tipc_nl_media_set() which doesn't hold RTNL lock. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14tipc: Introduce __tipc_nl_bearer_setYing Xue2-9/+15
Introduce __tipc_nl_bearer_set() which doesn't holding RTNL lock. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14tipc: Introduce __tipc_nl_bearer_enableYing Xue2-7/+11
Introduce __tipc_nl_bearer_enable() which doesn't hold RTNL lock. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14tipc: Introduce __tipc_nl_bearer_disableYing Xue2-6/+14
Introduce __tipc_nl_bearer_disable() which doesn't hold RTNL lock. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-14tipc: Refactor __tipc_nl_compat_doitYing Xue1-14/+15
As preparation for adding RTNL to make (*cmd->transcode)() and (*cmd->transcode)() constantly protected by RTNL lock, we move out of memory allocations existing between them as many as possible so that the time of holding RTNL can be minimized in __tipc_nl_compat_doit(). Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-12net: make getname() functions return length rather than use int* parameterDenys Vlasenko1-3/+2
Changes since v1: Added changes in these files: drivers/infiniband/hw/usnic/usnic_transport.c drivers/staging/lustre/lnet/lnet/lib-socket.c drivers/target/iscsi/iscsi_target_login.c drivers/vhost/net.c fs/dlm/lowcomms.c fs/ocfs2/cluster/tcp.c security/tomoyo/network.c Before: All these functions either return a negative error indicator, or store length of sockaddr into "int *socklen" parameter and return zero on success. "int *socklen" parameter is awkward. For example, if caller does not care, it still needs to provide on-stack storage for the value it does not need. None of the many FOO_getname() functions of various protocols ever used old value of *socklen. They always just overwrite it. This change drops this parameter, and makes all these functions, on success, return length of sockaddr. It's always >= 0 and can be differentiated from an error. Tests in callers are changed from "if (err)" to "if (err < 0)", where needed. rpc_sockname() lost "int buflen" parameter, since its only use was to be passed to kernel_getsockname() as &buflen and subsequently not used in any way. Userspace API is not changed. text data bss dec hex filename 30108430 2633624 873672 33615726 200ef6e vmlinux.before.o 30108109 2633612 873672 33615393 200ee21 vmlinux.o Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: David S. Miller <davem@davemloft.net> CC: linux-kernel@vger.kernel.org CC: netdev@vger.kernel.org CC: linux-bluetooth@vger.kernel.org CC: linux-decnet-user@lists.sourceforge.net CC: linux-wireless@vger.kernel.org CC: linux-rdma@vger.kernel.org CC: linux-sctp@vger.kernel.org CC: linux-nfs@vger.kernel.org CC: linux-x25@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-11vfs: do bulk POLL* -> EPOLL* replacementLinus Torvalds1-11/+11
This is the mindless scripted replacement of kernel use of POLL* variables as described by Al, done by this script: for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'` for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done done with de-mangling cleanups yet to come. NOTE! On almost all architectures, the EPOLL* constants have the same values as the POLL* constants do. But they keyword here is "almost". For various bad reasons they aren't the same, and epoll() doesn't actually work quite correctly in some cases due to this on Sparc et al. The next patch from Al will sort out the final differences, and we should be all done. Scripted-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-08tipc: fix skb truesize/datasize ratio controlHoang Le1-2/+2
In commit d618d09a68e4 ("tipc: enforce valid ratio between skb truesize and contents") we introduced a test for ensuring that the condition truesize/datasize <= 4 is true for a received buffer. Unfortunately this test has two problems. - Because of the integer arithmetics the test if (skb->truesize / buf_roundup_len(skb) > 4) will miss all ratios [4 < ratio < 5], which was not the intention. - The buffer returned by skb_copy() inherits skb->truesize of the original buffer, which doesn't help the situation at all. In this commit, we change the ratio condition and replace skb_copy() with a call to skb_copy_expand() to finally get this right. Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds14-333/+423
Pull networking updates from David Miller: 1) Significantly shrink the core networking routing structures. Result of http://vger.kernel.org/~davem/seoul2017_netdev_keynote.pdf 2) Add netdevsim driver for testing various offloads, from Jakub Kicinski. 3) Support cross-chip FDB operations in DSA, from Vivien Didelot. 4) Add a 2nd listener hash table for TCP, similar to what was done for UDP. From Martin KaFai Lau. 5) Add eBPF based queue selection to tun, from Jason Wang. 6) Lockless qdisc support, from John Fastabend. 7) SCTP stream interleave support, from Xin Long. 8) Smoother TCP receive autotuning, from Eric Dumazet. 9) Lots of erspan tunneling enhancements, from William Tu. 10) Add true function call support to BPF, from Alexei Starovoitov. 11) Add explicit support for GRO HW offloading, from Michael Chan. 12) Support extack generation in more netlink subsystems. From Alexander Aring, Quentin Monnet, and Jakub Kicinski. 13) Add 1000BaseX, flow control, and EEE support to mvneta driver. From Russell King. 14) Add flow table abstraction to netfilter, from Pablo Neira Ayuso. 15) Many improvements and simplifications to the NFP driver bpf JIT, from Jakub Kicinski. 16) Support for ipv6 non-equal cost multipath routing, from Ido Schimmel. 17) Add resource abstration to devlink, from Arkadi Sharshevsky. 18) Packet scheduler classifier shared filter block support, from Jiri Pirko. 19) Avoid locking in act_csum, from Davide Caratti. 20) devinet_ioctl() simplifications from Al viro. 21) More TCP bpf improvements from Lawrence Brakmo. 22) Add support for onlink ipv6 route flag, similar to ipv4, from David Ahern. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1925 commits) tls: Add support for encryption using async offload accelerator ip6mr: fix stale iterator net/sched: kconfig: Remove blank help texts openvswitch: meter: Use 64-bit arithmetic instead of 32-bit tcp_nv: fix potential integer overflow in tcpnv_acked r8169: fix RTL8168EP take too long to complete driver initialization. qmi_wwan: Add support for Quectel EP06 rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK ipmr: Fix ptrdiff_t print formatting ibmvnic: Wait for device response when changing MAC qlcnic: fix deadlock bug tcp: release sk_frag.page in tcp_disconnect ipv4: Get the address of interface correctly. net_sched: gen_estimator: fix lockdep splat net: macb: Handle HRESP error net/mlx5e: IPoIB, Fix copy-paste bug in flow steering refactoring ipv6: addrconf: break critical section in addrconf_verify_rtnl() ipv6: change route cache aging logic i40e/i40evf: Update DESC_NEEDED value to reflect larger value bnxt_en: cleanup DIM work on device shutdown ...
2018-01-30Merge branch 'work.sock_recvmsg' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-2/+2
Pull kern_recvmsg reduction from Al Viro: "kernel_recvmsg() is a set_fs()-using wrapper for sock_recvmsg(). In all but one case that is not needed - use of ITER_KVEC for ->msg_iter takes care of the data and does not care about set_fs(). The only exception is svc_udp_recvfrom() where we want cmsg to be store into kernel object; everything else can just use sock_recvmsg() and be done with that. A followup converting svc_udp_recvfrom() away from set_fs() (and killing kernel_recvmsg() off) is *NOT* in here - I'd like to hear what netdev folks think of the approach proposed in that followup)" * 'work.sock_recvmsg' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: tipc: switch to sock_recvmsg() smc: switch to sock_recvmsg() ipvs: switch to sock_recvmsg() mISDN: switch to sock_recvmsg() drbd: switch to sock_recvmsg() lustre lnet_sock_read(): switch to sock_recvmsg() cfs2: switch to sock_recvmsg() ncpfs: switch to sock_recvmsg() dlm: switch to sock_recvmsg() svc_recvfrom(): switch to sock_recvmsg()
2018-01-30Merge branch 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-2/+2
Pull poll annotations from Al Viro: "This introduces a __bitwise type for POLL### bitmap, and propagates the annotations through the tree. Most of that stuff is as simple as 'make ->poll() instances return __poll_t and do the same to local variables used to hold the future return value'. Some of the obvious brainos found in process are fixed (e.g. POLLIN misspelled as POLL_IN). At that point the amount of sparse warnings is low and most of them are for genuine bugs - e.g. ->poll() instance deciding to return -EINVAL instead of a bitmap. I hadn't touched those in this series - it's large enough as it is. Another problem it has caught was eventpoll() ABI mess; select.c and eventpoll.c assumed that corresponding POLL### and EPOLL### were equal. That's true for some, but not all of them - EPOLL### are arch-independent, but POLL### are not. The last commit in this series separates userland POLL### values from the (now arch-independent) kernel-side ones, converting between them in the few places where they are copied to/from userland. AFAICS, this is the least disruptive fix preserving poll(2) ABI and making epoll() work on all architectures. As it is, it's simply broken on sparc - try to give it EPOLLWRNORM and it will trigger only on what would've triggered EPOLLWRBAND on other architectures. EPOLLWRBAND and EPOLLRDHUP, OTOH, are never triggered at all on sparc. With this patch they should work consistently on all architectures" * 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits) make kernel-side POLL... arch-independent eventpoll: no need to mask the result of epi_item_poll() again eventpoll: constify struct epoll_event pointers debugging printk in sg_poll() uses %x to print POLL... bitmap annotate poll(2) guts 9p: untangle ->poll() mess ->si_band gets POLL... bitmap stored into a user-visible long field ring_buffer_poll_wait() return value used as return value of ->poll() the rest of drivers/*: annotate ->poll() instances media: annotate ->poll() instances fs: annotate ->poll() instances ipc, kernel, mm: annotate ->poll() instances net: annotate ->poll() instances apparmor: annotate ->poll() instances tomoyo: annotate ->poll() instances sound: annotate ->poll() instances acpi: annotate ->poll() instances crypto: annotate ->poll() instances block: annotate ->poll() instances x86: annotate ->poll() instances ...
2018-01-19tipc: fix race between poll() and setsockopt()Jon Maloy3-17/+13
Letting tipc_poll() dereference a socket's pointer to struct tipc_group entails a race risk, as the group item may be deleted in a concurrent tipc_sk_join() or tipc_sk_leave() thread. We now move the 'open' flag in struct tipc_group to struct tipc_sock, and let the former retain only a pointer to the moved field. This will eliminate the race risk. Reported-by: syzbot+799dafde0286795858ac@syzkaller.appspotmail.com Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-12/+14
Overlapping changes all over. The mini-qdisc bits were a little bit tricky, however. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16tipc: fix race condition at topology server receiveJon Maloy3-46/+51
We have identified a race condition during reception of socket events and messages in the topology server. - The function tipc_close_conn() is releasing the corresponding struct tipc_subscriber instance without considering that there may still be items in the receive work queue. When those are scheduled, in the function tipc_receive_from_work(), they are using the subscriber pointer stored in struct tipc_conn, without first checking if this is valid or not. This will sometimes lead to crashes, as the next call of tipc_conn_recvmsg() will access the now deleted item. We fix this by making the usage of this pointer conditional on whether the connection is active or not. I.e., we check the condition test_bit(CF_CONNECTED) before making the call tipc_conn_recvmsg(). - Since the two functions may be running on different cores, the condition test described above is not enough. tipc_close_conn() may come in between and delete the subscriber item after the condition test is done, but before tipc_conn_recv_msg() is finished. This happens less frequently than the problem described above, but leads to the same symptoms. We fix this by using the existing sk_callback_lock for mutual exclusion in the two functions. In addition, we have to move a call to tipc_conn_terminate() outside the mentioned lock to avoid deadlock. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15tipc: fix bug during lookup of multicast destination nodesJon Maloy3-8/+4
In commit 232d07b74a33 ("tipc: improve groupcast scope handling") we inadvertently broke non-group multicast transmission when changing the parameter 'domain' to 'scope' in the function tipc_nametbl_lookup_dst_nodes(). We missed to make the corresponding change in the calling function, with the result that the lookup always fails. A closer anaysis reveals that this parameter is not needed at all. Non-group multicast is hard coded to use CLUSTER_SCOPE, and in the current implementation this will be delivered to all matching destinations except those which are published with NODE_SCOPE on other nodes. Since such publications never will be visible on the sending node anyway, it makes no sense to discriminate by scope at all. We now remove this parameter altogether. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15tipc: fix a memory leak in tipc_nl_node_get_link()Cong Wang1-12/+14
When tipc_node_find_by_name() fails, the nlmsg is not freed. While on it, switch to a goto label to properly free it. Fixes: be9c086715c ("tipc: narrow down exposure of struct tipc_node") Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Ying Xue <ying.xue@windriver.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15tipc: fix a potental access after delete in tipc_sk_join()Jon Maloy1-0/+1
In commit d12d2e12cec2 "tipc: send out join messages as soon as new member is discovered") we added a call to the function tipc_group_join() without considering the case that the preceding tipc_sk_publish() might have failed, and the group item already deleted. We fix this by returning from tipc_sk_join() directly after the failed tipc_sk_publish. Reported-by: syzbot+e3eeae78ea88b8d6d858@syzkaller.appspotmail.com Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-09tipc: improve poll() for group member socketJon Maloy3-33/+41
The current criteria for returning POLLOUT from a group member socket is too simplistic. It basically returns POLLOUT as soon as the group has external destinations, something obviously leading to a lot of spinning during destination congestion situations. At the same time, the internal congestion handling is unnecessarily complex. We now change this as follows. - We introduce an 'open' flag in struct tipc_group. This flag is used only to help poll() get the setting of POLLOUT right, and *not* for congeston handling as such. This means that a user can choose to ignore an EAGAIN for a destination and go on sending messages to other destinations in the group if he wants to. - The flag is set to false every time we return EAGAIN on a send call. - The flag is set to true every time any member, i.e., not necessarily the member that caused EAGAIN, is removed from the small_win list. - We remove the group member 'usr_pending' flag. The size of the send window and presence in the 'small_win' list is sufficient criteria for recognizing congestion. This solution seems to be a reasonable compromise between 'anycast', which is normally not waiting for POLLOUT for a specific destination, and the other three send modes, which are. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-09tipc: improve groupcast scope handlingJon Maloy8-71/+96
When a member joins a group, it also indicates a binding scope. This makes it possible to create both node local groups, invisible to other nodes, as well as cluster global groups, visible everywhere. In order to avoid that different members end up having permanently differing views of group size and memberhip, we must inhibit locally and globally bound members from joining the same group. We do this by using the binding scope as an additional separator between groups. I.e., a member must ignore all membership events from sockets using a different scope than itself, and all lookups for message destinations must require an exact match between the message's lookup scope and the potential target's binding scope. Apart from making it possible to create local groups using the same identity on different nodes, a side effect of this is that it now also becomes possible to create a cluster global group with the same identity across the same nodes, without interfering with the local groups. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-09tipc: add option to suppress PUBLISH events for pre-existing publicationsJon Maloy6-15/+21
Currently, when a user is subscribing for binding table publications, he will receive a PUBLISH event for all already existing matching items in the binding table. However, a group socket making a subscriptions doesn't need this initial status update from the binding table, because it has already scanned it during the join operation. Worse, the multiplicatory effect of issuing mutual events for dozens or hundreds group members within a short time frame put a heavy load on the topology server, with the end result that scale out operations on a big group tend to take much longer than needed. We now add a new filter option, TIPC_SUB_NO_STATUS, for topology server subscriptions, so that this initial avalanche of events is suppressed. This change, along with the previous commit, significantly improves the range and speed of group scale out operations. We keep the new option internal for the tipc driver, at least for now. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-09tipc: send out join messages as soon as new member is discoveredJon Maloy4-42/+68
When a socket is joining a group, we look up in the binding table to find if there are already other members of the group present. This is used for being able to return EAGAIN instead of EHOSTUNREACH if the user proceeds directly to a send attempt. However, the information in the binding table can be used to directly set the created member in state MBR_PUBLISHED and send a JOIN message to the peer, instead of waiting for a topology PUBLISH event to do this. When there are many members in a group, the propagation time for such events can be significant, and we can save time during the join operation if we use the initial lookup result fully. In this commit, we eliminate the member state MBR_DISCOVERED which has been the result of the initial lookup, and do instead go directly to MBR_PUBLISHED, which initiates the setup. After this change, the tipc_member FSM looks as follows: +-----------+ ---->| PUBLISHED |-----------------------------------------------+ PUB- +-----------+ LEAVE/WITHRAW | LISH |JOIN | | +-------------------------------------------+ | | | LEAVE/WITHDRAW | | | | +------------+ | | | | +----------->| PENDING |---------+ | | | | |msg/maxactv +-+---+------+ LEAVE/ | | | | | | | | WITHDRAW | | | | | | +----------+ | | | | | | | |revert/maxactv| | | | | | | V V V V V | +----------+ msg +------------+ +-----------+ +-->| JOINED |------>| ACTIVE |------>| LEAVING |---> | +----------+ +--- -+------+ LEAVE/+-----------+DOWN | A A | WITHDRAW A A A EVT | | | |RECLAIM | | | | | |REMIT V | | | | | |== adv +------------+ | | | | | +---------| RECLAIMING |--------+ | | | | +-----+------+ LEAVE/ | | | | |REMIT WITHDRAW | | | | |< adv | | | |msg/ V LEAVE/ | | | |adv==ADV_IDLE+------------+ WITHDRAW | | | +-------------| REMITTED |------------+ | | +------------+ | |PUBLISH | JOIN +-----------+ LEAVE/WITHDRAW | ---->| JOINING |-----------------------------------------------+ +-----------+ Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-09tipc: simplify group LEAVE sequenceJon Maloy1-31/+9
After the changes in the previous commit the group LEAVE sequence can be simplified. We now let the arrival of a LEAVE message unconditionally issue a group DOWN event to the user. When a topology WITHDRAW event is received, the member, if it still there, is set to state LEAVING, but we only issue a group DOWN event when the link to the peer node is gone, so that no LEAVE message is to be expected. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>