aboutsummaryrefslogtreecommitdiffstats
path: root/net/rds/cong.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-02-08rds: tcp: use rds_destroy_pending() to synchronize netns/module teardown and rds connection/workq managementSowmini Varadhan1-1/+1
An rds_connection can get added during netns deletion between lines 528 and 529 of 506 static void rds_tcp_kill_sock(struct net *net) : /* code to pull out all the rds_connections that should be destroyed */ : 528 spin_unlock_irq(&rds_tcp_conn_lock); 529 list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node) 530 rds_conn_destroy(tc->t_cpath->cp_conn); Such an rds_connection would miss out the rds_conn_destroy() loop (that cancels all pending work) and (if it was scheduled after netns deletion) could trigger the use-after-free. A similar race-window exists for the module unload path in rds_tcp_exit -> rds_tcp_destroy_conns Concurrency with netns deletion (rds_tcp_kill_sock()) must be handled by checking check_net() before enqueuing new work or adding new connections. Concurrency with module-unload is handled by maintaining a module specific flag that is set at the start of the module exit function, and must be checked before enqueuing new work or adding new connections. This commit refactors existing RDS_DESTROY_PENDING checks added by commit 3db6e0d172c9 ("rds: use RCU to synchronize work-enqueue with connection teardown") and consolidates all the concurrency checks listed above into the function rds_destroy_pending(). Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-05rds: use RCU to synchronize work-enqueue with connection teardownSowmini Varadhan1-3/+7
rds_sendmsg() can enqueue work on cp_send_w from process context, but it should not enqueue this work if connection teardown has commenced (else we risk enquing work after rds_conn_path_destroy() has assumed that all work has been cancelled/flushed). Similarly some other functions like rds_cong_queue_updates and rds_tcp_data_ready are called in softirq context, and may end up enqueuing work on rds_wq after rds_conn_path_destroy() has assumed that all workqs are quiesced. Check the RDS_DESTROY_PENDING bit and use rcu synchronization to avoid all these races. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-14RDS: split out connection specific state from rds_connection to rds_conn_pathSowmini Varadhan1-1/+2
In preparation for multipath RDS, split the rds_connection structure into a base structure, and a per-path struct rds_conn_path. The base structure tracks information and locks common to all paths. The workqs for send/recv/shutdown etc are tracked per rds_conn_path. Thus the workq callbacks now work with rds_conn_path. This commit allows for one rds_conn_path per rds_connection, and will be extended into multiple conn_paths in subsequent commits. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-16RDS: Fix the atomicity for congestion map updatesantosh.shilimkar@oracle.com1-2/+2
Two different threads with different rds sockets may be in rds_recv_rcvbuf_delta() via receive path. If their ports both map to the same word in the congestion map, then using non-atomic ops to update it could cause the map to be incorrect. Lets use atomics to avoid such an issue. Full credit to Wengang <wen.gang.wang@oracle.com> for finding the issue, analysing it and also pointing out to offending code with spin lock based fix. Reviewed-by: Leon Romanovsky <leon@leon.nu> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-11rds: rds_cong_queue_updates needs to defer the congestion update transmissionSowmini Varadhan1-1/+15
When the RDS transport is TCP, we cannot inline the call to rds_send_xmit from rds_cong_queue_update because (a) we are already holding the sock_lock in the recv path, and will deadlock when tcp_setsockopt/tcp_sendmsg try to get the sock lock (b) cong_queue_update does an irqsave on the rds_cong_lock, and this will trigger warnings (for a good reason) from functions called out of sock_lock. This patch reverts the change introduced by 2fa57129d ("RDS: Bypass workqueue when queueing cong updates"). The patch has been verified for both RDS/TCP as well as RDS/RDMA to ensure that there are not regressions for either transport: - for verification of RDS/TCP a client-server unit-test was used, with the server blocked in gdb and thus unable to drain its rcvbuf, eventually triggering a RDS congestion update. - for RDS/RDMA, the standard IB regression tests were used Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-31net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modulesPaul Gortmaker1-0/+1
These files are non modular, but need to export symbols using the macros now living in export.h -- call out the include so that things won't break when we remove the implicit presence of module.h from everywhere. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-03-23rds: use little-endian bitopsAkinobu Mita1-3/+3
As a preparation for removing ext2 non-atomic bit operations from asm/bitops.h. This converts ext2 non-atomic bit operations to little-endian bit operations. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Andy Grover <andy.grover@oracle.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-23rds: stop including asm-generic/bitops/le.h directlyAkinobu Mita1-5/+4
asm-generic/bitops/le.h is only intended to be included directly from asm-generic/bitops/ext2-non-atomic.h or asm-generic/bitops/minix-le.h which implements generic ext2 or minix bit operations. This stops including asm-generic/bitops/le.h directly and use ext2 non-atomic bit operations instead. It seems odd to use ext2_*_bit() on rds, but it will replaced with __{set,clear,test}_bit_le() after introducing little endian bit operations for all architectures. This indirect step is necessary to maintain bisectability for some architectures which have their own little-endian bit operations. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Andy Grover <andy.grover@oracle.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-08RDS: Bypass workqueue when queueing cong updatesAndy Grover1-1/+1
Now that rds_send_xmit() does not block, we can call it directly instead of going through the helper thread. Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08RDS: cleanup: remove "== NULL"s and "!= NULL"s in ptr comparisonsAndy Grover1-3/+3
Favor "if (foo)" style over "if (foo != NULL)". Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-04-11Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6David S. Miller1-0/+1
Conflicts: drivers/net/stmmac/stmmac_main.c drivers/net/wireless/wl12xx/wl1271_cmd.c drivers/net/wireless/wl12xx/wl1271_main.c drivers/net/wireless/wl12xx/wl1271_spi.c net/core/ethtool.c net/mac80211/scan.c
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.hTejun Heo1-0/+1
percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-16RDS: Fix congestion issues for loopbackAndy Grover1-2/+0
We have two kinds of loopback: software (via loop transport) and hardware (via IB). sw is used for 127.0.0.1, and doesn't support rdma ops. hw is used for sends to local device IPs, and supports rdma. Both are used in different cases. For both of these, when there is a congestion map update, we want to call rds_cong_map_updated() but not actually send anything -- since loopback local and foreign congestion maps point to the same spot, they're already in sync. The old code never called sw loop's xmit_cong_map(),so rds_cong_map_updated() wasn't being called for it. sw loop ports would not work right with the congestion monitor. Fixing that meant that hw loopback now would send congestion maps to itself. This is also undesirable (racy), so we check for this case in the ib-specific xmit code. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-30RDS: Do not send congestion updates to loopback connectionsAndy Grover1-0/+2
This issue was discovered by HP's Pradeep and fixed in OFED 1.3, but not fixed in later versions, since the fix's implementation was not immediately applyable to the later code. This patch should do the trick for 1.4+ codebases. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-23RDS: Export symbols from core RDSAndy Grover1-0/+1
Now that rdma and tcp transports will be modularized, we need to export a number of functions so they can call them. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02rds: Fix build on powerpc.David S. Miller1-0/+2
As reported by Stephen Rothwell. > Today's linux-next build (powerpc allyesconfig) failed like this: > > net/rds/cong.c: In function 'rds_cong_set_bit': > net/rds/cong.c:284: error: implicit declaration of function 'generic___set_le_bit' > net/rds/cong.c: In function 'rds_cong_clear_bit': > net/rds/cong.c:298: error: implicit declaration of function 'generic___clear_le_bit' > net/rds/cong.c: In function 'rds_cong_test_bit': > net/rds/cong.c:309: error: implicit declaration of function 'generic_test_le_bit' Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26RDS: Congestion-handling codeAndy Grover1-0/+402
RDS handles per-socket congestion by updating peers with a complete congestion map (8KB). This code keeps track of these maps for itself and ones received from peers. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>