aboutsummaryrefslogtreecommitdiffstats
path: root/net (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2013-10-23net: always inline net_secret_initHannes Frederic Sowa1-1/+1
Currently net_secret_init does not get inlined, so we always have a call to net_secret_init even in the fast path. Let's specify net_secret_init as __always_inline so we have the nop in the fast-path without the call to net_secret_init and the unlikely path at the epilogue of the function. jump_labels handle the inlining correctly. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-22Revert "bridge: only expire the mdb entry when query is received"Linus Lüssing3-22/+28
While this commit was a good attempt to fix issues occuring when no multicast querier is present, this commit still has two more issues: 1) There are cases where mdb entries do not expire even if there is a querier present. The bridge will unnecessarily continue flooding multicast packets on the according ports. 2) Never removing an mdb entry could be exploited for a Denial of Service by an attacker on the local link, slowly, but steadily eating up all memory. Actually, this commit became obsolete with "bridge: disable snooping if there is no querier" (b00589af3b) which included fixes for a few more cases. Therefore reverting the following commits (the commit stated in the commit message plus three of its follow up fixes): ==================== Revert "bridge: update mdb expiration timer upon reports." This reverts commit f144febd93d5ee534fdf23505ab091b2b9088edc. Revert "bridge: do not call setup_timer() multiple times" This reverts commit 1faabf2aab1fdaa1ace4e8c829d1b9cf7bfec2f1. Revert "bridge: fix some kernel warning in multicast timer" This reverts commit c7e8e8a8f7a70b343ca1e0f90a31e35ab2d16de1. Revert "bridge: only expire the mdb entry when query is received" This reverts commit 9f00b2e7cf241fa389733d41b615efdaa2cb0f5b. ==================== CC: Cong Wang <amwang@redhat.com> Signed-off-by: Linus Lüssing <linus.luessing@web.de> Reviewed-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-22net: remove function sk_reset_txq()ZHAO Gang1-6/+0
What sk_reset_txq() does is just calls function sk_tx_queue_reset(), and sk_reset_txq() is used only in sock.h, by dst_negative_advice(). Let dst_negative_advice() calls sk_tx_queue_reset() directly so we can remove unneeded sk_reset_txq(). Signed-off-by: ZHAO Gang <gamerh2o@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21tcp: initialize passive-side sk_pacing_rate after 3WHSNeal Cardwell1-0/+2
For passive TCP connections, upon receiving the ACK that completes the 3WHS, make sure we set our pacing rate after we get our first RTT sample. On passive TCP connections, when we receive the ACK completing the 3WHS we do not take an RTT sample in tcp_ack(), but rather in tcp_synack_rtt_meas(). So upon receiving the ACK that completes the 3WHS, tcp_ack() leaves sk_pacing_rate at its initial value. Originally the initial sk_pacing_rate value was 0, so passive-side connections defaulted to sysctl_tcp_min_tso_segs (2 segs) in skbuffs made in the first RTT. With a default initial cwnd of 10 packets, this happened to be correct for RTTs 5ms or bigger, so it was hard to see problems in WAN or emulated WAN testing. Since 7eec4174ff ("pkt_sched: fq: fix non TCP flows pacing"), the initial sk_pacing_rate is 0xffffffff. So after that change, passive TCP connections were keeping this value (and using large numbers of segments per skbuff) until receiving an ACK for data. Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21ipv6: probe routes asynchronous in rt6_probeHannes Frederic Sowa1-7/+31
Routes need to be probed asynchronous otherwise the call stack gets exhausted when the kernel attemps to deliver another skb inline, like e.g. xt_TEE does, and we probe at the same time. We update neigh->updated still at once, otherwise we would send to many probes. Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21ipv6: sit: add GSO/TSO supportEric Dumazet6-9/+34
Now ipv6_gso_segment() is stackable, its relatively easy to implement GSO/TSO support for SIT tunnels Performance results, when segmentation is done after tunnel device (as no NIC is yet enabled for TSO SIT support) : Before patch : lpq84:~# ./netperf -H 2002:af6:1153:: -Cc MIGRATED TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:1153:: () port 0 AF_INET6 Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 87380 16384 16384 10.00 3168.31 4.81 4.64 2.988 2.877 After patch : lpq84:~# ./netperf -H 2002:af6:1153:: -Cc MIGRATED TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:1153:: () port 0 AF_INET6 Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 87380 16384 16384 10.00 5525.00 7.76 5.17 2.763 1.840 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21ipv6: gso: make ipv6_gso_segment() stackableEric Dumazet1-6/+17
In order to support GSO on SIT tunnels, we need to make inet_gso_segment() stackable. It should not assume network header starts right after mac header. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21ipv4: Allow unprivileged users to use per net sysctlsEric W. Biederman1-4/+0
Allow unprivileged users to use: /proc/sys/net/ipv4/icmp_echo_ignore_all /proc/sys/net/ipv4/icmp_echo_ignore_broadcasts /proc/sys/net/ipv4/icmp_ignore_bogus_error_response /proc/sys/net/ipv4/icmp_errors_use_inbound_ifaddr /proc/sys/net/ipv4/icmp_ratelimit /proc/sys/net/ipv4/icmp_ratemask /proc/sys/net/ipv4/ping_group_range /proc/sys/net/ipv4/tcp_ecn /proc/sys/net/ipv4/ip_local_ports_range These are occassionally handy and after a quick review I don't see any problems with unprivileged users using them. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21ipv4: Use math to point per net sysctls into the appropriate struct net.Eric W. Biederman1-18/+5
Simplify maintenance of ipv4_net_table by using math to point the per net sysctls into the appropriate struct net, instead of manually reassinging all of the variables into hard coded table slots. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21tcp_memcontrol: Kill struct tcp_memcontrolEric W. Biederman1-43/+18
Replace the pointers in struct cg_proto with actual data fields and kill struct tcp_memcontrol as it is not fully redundant. This removes a confusing, unnecessary layer of abstraction. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21tcp_memcontrol: Remove the per netns control.Eric W. Biederman7-31/+20
The code that is implemented is per memory cgroup not per netns, and having per netns bits is just confusing. Remove the per netns bits to make it easier to see what is really going on. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21tcp_memcontrol: Remove setting cgroup settings via sysctlEric W. Biederman2-51/+2
The code is broken and does not constrain sysctl_tcp_mem as tcp_update_limit does. With the result that it allows the cgroup tcp memory limits to be bypassed. The semantics are broken as the settings are not per netns and are in a per netns table, and instead looks at current. Since the code is broken in both design and implementation and does not implement the functionality for which it was written remove it. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21tcp_memcontrol: Remove tcp_max_memoryEric W. Biederman1-13/+0
This function is never called. Remove it. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21netfilter: nf_conntrack: fix rt6i_gateway checks for H.323 helperJulian Anastasov1-2/+2
Now when rt6_nexthop() can return nexthop address we can use it for proper nexthop comparison of directly connected destinations. For more information refer to commit bbb5823cf742a7 ("netfilter: nf_conntrack: fix rt_gateway checks for H.323 helper"). Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21ipv6: fill rt6i_gateway with nexthop addressJulian Anastasov2-4/+8
Make sure rt6i_gateway contains nexthop information in all routes returned from lookup or when routes are directly attached to skb for generated ICMP packets. The effect of this patch should be a faster version of rt6_nexthop() and the consideration of local addresses as nexthop. Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19net: switch net_secret key generation to net_get_random_onceHannes Frederic Sowa1-12/+2
Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19tcp: switch tcp_fastopen key generation to net_get_random_onceHannes Frederic Sowa2-11/+21
Changed key initialization of tcp_fastopen cookies to net_get_random_once. If the user sets a custom key net_get_random_once must be called at least once to ensure we don't overwrite the user provided key when the first cookie is generated later on. Cc: Yuchung Cheng <ycheng@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19inet: convert inet_ehash_secret and ipv6_hash_secret to net_get_random_onceHannes Frederic Sowa7-42/+44
Initialize the ehash and ipv6_hash_secrets with net_get_random_once. Each compilation unit gets its own secret now: ipv4/inet_hashtables.o ipv4/udp.o ipv6/inet6_hashtables.o ipv6/udp.o rds/connection.o The functions still get inlined into the hashing functions. In the fast path we have at most two (needed in ipv6) if (unlikely(...)). Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19inet: split syncookie keys for ipv4 and ipv6 and initialize with net_get_random_onceHannes Frederic Sowa2-13/+14
This patch splits the secret key for syncookies for ipv4 and ipv6 and initializes them with net_get_random_once. This change was the reason I did this series. I think the initialization of the syncookie_secret is way to early. Cc: Florian Westphal <fw@strlen.de> Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19net: introduce new macro net_get_random_onceHannes Frederic Sowa1-0/+48
net_get_random_once is a new macro which handles the initialization of secret keys. It is possible to call it in the fast path. Only the initialization depends on the spinlock and is rather slow. Otherwise it should get used just before the key is used to delay the entropy extration as late as possible to get better randomness. It returns true if the key got initialized. The usage of static_keys for net_get_random_once is a bit uncommon so it needs some further explanation why this actually works: === In the simple non-HAVE_JUMP_LABEL case we actually have === no constrains to use static_key_(true|false) on keys initialized with STATIC_KEY_INIT_(FALSE|TRUE). So this path just expands in favor of the likely case that the initialization is already done. The key is initialized like this: ___done_key = { .enabled = ATOMIC_INIT(0) } The check if (!static_key_true(&___done_key)) \ expands into (pseudo code) if (!likely(___done_key > 0)) , so we take the fast path as soon as ___done_key is increased from the helper function. === If HAVE_JUMP_LABELs are available this depends === on patching of jumps into the prepared NOPs, which is done in jump_label_init at boot-up time (from start_kernel). It is forbidden and dangerous to use net_get_random_once in functions which are called before that! At compilation time NOPs are generated at the call sites of net_get_random_once. E.g. net/ipv6/inet6_hashtable.c:inet6_ehashfn (we need to call net_get_random_once two times in inet6_ehashfn, so two NOPs): 71: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 76: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) Both will be patched to the actual jumps to the end of the function to call __net_get_random_once at boot time as explained above. arch_static_branch is optimized and inlined for false as return value and actually also returns false in case the NOP is placed in the instruction stream. So in the fast case we get a "return false". But because we initialize ___done_key with (enabled != (entries & 1)) this call-site will get patched up at boot thus returning true. The final check looks like this: if (!static_key_true(&___done_key)) \ ___ret = __net_get_random_once(buf, \ expands to if (!!static_key_false(&___done_key)) \ ___ret = __net_get_random_once(buf, \ So we get true at boot time and as soon as static_key_slow_inc is called on the key it will invert the logic and return false for the fast path. static_key_slow_inc will change the branch because it got initialized with .enabled == 0. After static_key_slow_inc is called on the key the branch is replaced with a nop again. === Misc: === The helper defers the increment into a workqueue so we don't have problems calling this code from atomic sections. A seperate boolean (___done) guards the case where we enter net_get_random_once again before the increment happend. Cc: Ingo Molnar <mingo@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Jason Baron <jbaron@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19ipv6: split inet6_ehashfn to hash functions per compilation unitHannes Frederic Sowa2-4/+40
This patch splits the inet6_ehashfn into separate ones in ipv6/inet6_hashtables.o and ipv6/udp.o to ease the introduction of seperate secrets keys later. Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19ipv4: split inet_ehashfn to hash functions per compilation unitHannes Frederic Sowa3-7/+36
This duplicates a bit of code but let's us easily introduce separate secret keys later. The separate compilation units are ipv4/inet_hashtabbles.o, ipv4/udp.o and rds/connection.o. Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19ipip: add GSO/TSO supportEric Dumazet9-6/+23
Now inet_gso_segment() is stackable, its relatively easy to implement GSO/TSO support for IPIP Performance results, when segmentation is done after tunnel device (as no NIC is yet enabled for TSO IPIP support) : Before patch : lpq83:~# ./netperf -H 7.7.9.84 -Cc MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.9.84 () port 0 AF_INET Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 87380 16384 16384 10.00 3357.88 5.09 3.70 2.983 2.167 After patch : lpq83:~# ./netperf -H 7.7.9.84 -Cc MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.9.84 () port 0 AF_INET Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 87380 16384 16384 10.00 7710.19 4.52 6.62 1.152 1.687 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19ipv4: gso: make inet_gso_segment() stackableEric Dumazet2-7/+20
In order to support GSO on IPIP, we need to make inet_gso_segment() stackable. It should not assume network header starts right after mac header. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19ipv4: generalize gre_handle_offloadsEric Dumazet2-29/+33
This patch makes gre_handle_offloads() more generic and rename it to iptunnel_handle_offloads() This will be used to add GSO/TSO support to IPIP tunnels. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19net: generalize skb_segment()Eric Dumazet1-17/+5
While implementing GSO/TSO support for IPIP, I found skb_segment() was assuming network header was immediately following mac header. Its not really true in the case inet_gso_segment() is stacked : By the time tcp_gso_segment() is called, network header points to the inner IP header. Let's instead assume nothing and pick the current offsets found in original skb, we have skb_headers_offset_update() helper for that. Also move the csum_start update inside skb_headers_offset_update() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19ip_output: do skb ufo init for peeked non ufo skb as wellJiri Pirko1-4/+9
Now, if user application does: sendto len<mtu flag MSG_MORE sendto len>mtu flag 0 The skb is not treated as fragmented one because it is not initialized that way. So move the initialization to fix this. introduced by: commit e89e9cf539a28df7d0eb1d0a545368e9920b34ac "[IPv4/IPv6]: UFO Scatter-gather approach" Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19ip6_output: do skb ufo init for peeked non ufo skb as wellJiri Pirko1-11/+14
Now, if user application does: sendto len<mtu flag MSG_MORE sendto len>mtu flag 0 The skb is not treated as fragmented one because it is not initialized that way. So move the initialization to fix this. introduced by: commit e89e9cf539a28df7d0eb1d0a545368e9920b34ac "[IPv4/IPv6]: UFO Scatter-gather approach" Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19udp6: respect IPV6_DONTFRAG sockopt in case there are pending framesJiri Pirko1-3/+2
if up->pending != 0 dontfrag is left with default value -1. That causes that application that do: sendto len>mtu flag MSG_MORE sendto len>mtu flag 0 will receive EMSGSIZE errno as the result of the second sendto. This patch fixes it by respecting IPV6_DONTFRAG socket option. introduced by: commit 4b340ae20d0e2366792abe70f46629e576adaf5e "IPv6: Complete IPV6_DONTFRAG support" Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19ipv6: gso: remove redundant lockingEric Dumazet1-4/+1
ipv6_gso_send_check() and ipv6_gso_segment() are called by skb_mac_gso_segment() under rcu lock, no need to use rcu_read_lock() / rcu_read_unlock() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19net: misc: Remove extern from function prototypesJoe Perches11-167/+157
There are a mix of function prototypes with and without extern in the kernel sources. Standardize on not using extern for function prototypes. Function prototypes don't need to be written with extern. extern is assumed by the compiler. Its use is as unnecessary as using auto to declare automatic/local variables in a block. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19net: ipv4/ipv6: Remove extern from function prototypesJoe Perches4-59/+54
There are a mix of function prototypes with and without extern in the kernel sources. Standardize on not using extern for function prototypes. Function prototypes don't need to be written with extern. extern is assumed by the compiler. Its use is as unnecessary as using auto to declare automatic/local variables in a block. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19net: dccp: Remove extern from function prototypesJoe Perches7-158/+148
There are a mix of function prototypes with and without extern in the kernel sources. Standardize on not using extern for function prototypes. Function prototypes don't need to be written with extern. extern is assumed by the compiler. Its use is as unnecessary as using auto to declare automatic/local variables in a block. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19net: 8021q/bluetooth/bridge/can/ceph: Remove extern from function prototypesJoe Perches8-219/+194
There are a mix of function prototypes with and without extern in the kernel sources. Standardize on not using extern for function prototypes. Function prototypes don't need to be written with extern. extern is assumed by the compiler. Its use is as unnecessary as using auto to declare automatic/local variables in a block. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19ipv4: gso: send_check() & segment() cleanupsEric Dumazet1-13/+11
inet_gso_segment() and inet_gso_send_check() are called by skb_mac_gso_segment() under rcu lock, no need to use rcu_read_lock() / rcu_read_unlock() Avoid calling ip_hdr() twice per function. We can use ip_send_check() helper. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19net: unix: inherit SOCK_PASS{CRED, SEC} flags from socket to fix raceDaniel Borkmann1-0/+10
In the case of credentials passing in unix stream sockets (dgram sockets seem not affected), we get a rather sparse race after commit 16e5726 ("af_unix: dont send SCM_CREDENTIALS by default"). We have a stream server on receiver side that requests credential passing from senders (e.g. nc -U). Since we need to set SO_PASSCRED on each spawned/accepted socket on server side to 1 first (as it's not inherited), it can happen that in the time between accept() and setsockopt() we get interrupted, the sender is being scheduled and continues with passing data to our receiver. At that time SO_PASSCRED is neither set on sender nor receiver side, hence in cmsg's SCM_CREDENTIALS we get eventually pid:0, uid:65534, gid:65534 (== overflow{u,g}id) instead of what we actually would like to see. On the sender side, here nc -U, the tests in maybe_add_creds() invoked through unix_stream_sendmsg() would fail, as at that exact time, as mentioned, the sender has neither SO_PASSCRED on his side nor sees it on the server side, and we have a valid 'other' socket in place. Thus, sender believes it would just look like a normal connection, not needing/requesting SO_PASSCRED at that time. As reverting 16e5726 would not be an option due to the significant performance regression reported when having creds always passed, one way/trade-off to prevent that would be to set SO_PASSCRED on the listener socket and allow inheriting these flags to the spawned socket on server side in accept(). It seems also logical to do so if we'd tell the listener socket to pass those flags onwards, and would fix the race. Before, strace: recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=0, uid=65534, gid=65534}}, msg_flags=0}, 0) = 5 After, strace: recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=11580, uid=1000, gid=1000}}, msg_flags=0}, 0) = 5 Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19batman-adv: make the backbone gw check VLAN specificAntonio Quartulli3-43/+27
The backbone gw check has to be VLAN specific so that code using it can specify VID where the check has to be done. In the TT code, the check has been moved into the tt_global_add() function so that it can be performed on a per-entry basis instead of ignoring all the TT data received from another backbone node. Only TT global entries belonging to the VLAN where the backbone node is connected to are skipped. All the other spots where the TT code was checking whether a node is a backbone have been removed. Moreover, batadv_bla_is_backbone_gw_orig() now returns bool since it used to return only 1 or 0. Cc: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-19batman-adv: make the TT global purge routine VLAN specificAntonio Quartulli5-6/+21
Instead of unconditionally removing all the TT entries served by a given originator, make tt_global_orig_del() remove only entries matching a given VLAN identifier provided as argument. If such argument is negative all the global entries served by the originator are removed. This change is used into the BLA code to purge entries served by a newly discovered Backbone node, but limiting the operation only to those connected to the VLAN where the backbone has been discovered. Cc: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-19batman-adv: make the TT CRC logic VLAN specificAntonio Quartulli5-157/+730
This change allows nodes to handle the TT table on a per-VLAN basis. This is needed because nodes may have to store only some of the global entries advertised by another node. In this scenario such nodes would re-create only a partial global table and would not be able to compute a correct CRC anymore. This patch splits the logic and introduces one CRC per VLAN. In this way a node fetching only some entries belonging to some VLANs is still able to compute the needed CRCs and still check the table correctness. With this patch the shape of the TVLV-TT is changed too because now a node needs to advertise all the CRCs of all the VLANs that it is wired to. The debug output of the local Translation Table now shows the CRC along with each entry since there is not a common value for the entire table anymore. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-19batman-adv: lock around TT operations to avoid sending inconsistent dataAntonio Quartulli4-4/+34
A TT response may be prepared and sent while the local or global translation table is getting updated. The worst case is when one of the tables is accessed after its content has been recently updated but the metadata (TTVN/CRC) has not yet. In this case the reader will get a table content which does not match the TTVN/CRC. This will lead to an inconsistent state and so to a TT recovery. To avoid entering this situation, put a lock around those TT operations recomputing the metadata and around the TT Response creation (the latter is the only reader that accesses the metadata together with the table). Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-19batman-adv: remove bogus commentAntonio Quartulli1-5/+0
this comment refers to the old batmand codebase and does not make sense anymore. Remove it Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-19batman-adv: refine API calls for unicast transmissions of SKBsLinus Lüssing4-48/+108
With this patch the functions batadv_send_skb_unicast() and batadv_send_skb_unicast_4addr() are further refined into batadv_send_skb_via_tt(), batadv_send_skb_via_tt_4addr() and batadv_send_skb_via_gw(). This way we avoid any "guessing" about where to send a packet in the unicast forwarding methods and let the callers decide. This is going to be useful for the upcoming multicast related patches in particular. Further, the return values were polished a little to use the more appropriate NET_XMIT_* defines. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Acked-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2013-10-19batman-adv: make the AP isolation attribute VLAN specificAntonio Quartulli5-14/+30
AP isolation has to be enabled on one VLAN interface only. This patch moves the AP isolation attribute to the per-vlan interface attribute set, enabling it to have a different value depending on the selected vlan. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-19batman-adv: add sysfs framework for VLANAntonio Quartulli4-3/+198
Each VLAN can now have its own set of attributes which are exported through a new subfolder in the sysfs tree. Each VLAN created on top of a soft_iface will have its own subfolder. The subfolder is named "vlan%VID" and it is created inside the "mesh" sysfs folder belonging to batman-adv. Attributes corresponding to the untagged LAN are stored in the root sysfs folder as before. This patch also creates all the needed macros and data structures to easily handle new VLAN spacific attributes. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-19batman-adv: add per VLAN interface attribute frameworkAntonio Quartulli5-3/+197
Since batman-adv is now fully VLAN-aware, a proper framework able to handle per-vlan-interface attributes is needed. Those attributes will affect the associated VLAN interface only, rather than the real soft_iface (which would result in every vlan interface having the same attribute configuration). To make the code simpler and easier to extend, attributes associated to the standalone soft_iface are now treated like belonging to yet another vlan having a special vid. This vid is different from the others because it is made up by all zeros and the VLAN_HAS_TAG bit is not set. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-19batman-adv: make the Distributed ARP Table vlan awareAntonio Quartulli2-48/+107
The same IP subnet can be used on different VLANs, therefore DAT has to differentiate whether the IP to resolve belongs to one or the other virtual LAN. To accomplish this task DAT has to deal with the VLAN tag and store it together with each ARP entry. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-19batman-adv: make the GW module correctly talk to the new VLAN-TTAntonio Quartulli1-3/+18
The gateway code is now adapted in order to correctly interact with the Translation Table component by using the vlan ID Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-19batman-adv: print the VID together with the TT entriesAntonio Quartulli2-32/+52
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-19batman-adv: use vid when computing local and global TT CRCAntonio Quartulli1-4/+31
now that each TT entry is characterised by a VLAN ID, the latter has to be taken into consideration when computing the local/global table CRC as it would be theoretically possible to have the same client in two different VLANs Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-10-19batman-adv: add the VLAN ID attribute to the TT entryAntonio Quartulli13-135/+312
To make the translation table code VLAN-aware, each entry must carry the VLAN ID which it belongs to. This patch adds such attribute to the related TT structures. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>