aboutsummaryrefslogtreecommitdiffstats
path: root/net (follow)
AgeCommit message (Collapse)AuthorFilesLines
2013-10-09batman-adv: tvlv - convert tt query packet to use tvlv unicast packetsMarek Lindner5-334/+286
Instead of generating TT specific packets the TVLV unicast API is used to send translation table data. Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2013-10-09batman-adv: tvlv - convert tt data sent within OGMsMarek Lindner5-158/+187
The translation table meta data (version number, crc checksum, etc) as well as the translation table diff propgated within OGMs now uses the newly introduced tvlv infrastructure. Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2013-10-09batman-adv: tvlv - add network coding containerMarek Lindner5-1/+75
Create network coding container to announce network coding capabilities (if enabled). Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2013-10-09batman-adv: tvlv - add distributed arp table containerMarek Lindner5-1/+83
Create DAT container to announce DAT capabilities (if enabled). Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2013-10-09batman-adv: tvlv - gateway download/upload bandwidth containerMarek Lindner11-191/+327
Prior to this patch batman-adv read the advertised uplink bandwidth from userspace and compressed this information into a single byte called "gateway class". Now the download & upload bandwidth information is sent as-is. No userspace change is necessary since the sysfs API always allowed to specify a bandwidth. Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Spyros Gasteratos <morfeas3000@gmail.com> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2013-10-09batman-adv: tvlv - basic infrastructureMarek Lindner7-17/+794
The goal is to provide the infrastructure for sending, receiving and parsing information 'containers' while preserving backward compatibility. TVLV (based on the commonly known Type Length Value technique) was chosen as the format for those containers. Even if a node does not know the tvlv type of a certain container it can simply skip the current container and proceed with the next. Past experience has shown features evolve over time, so a 'version' field was added right from the start to allow differentiating between feature variants - hence the name: T(ype) V(ersion) L(ength) V(alue). This patch introduces the basic TVLV infrastructure: * register / unregister tvlv containers to be sent with each OGM (on primary interfaces only) * register / unregister callback handlers to be called upon finding the corresponding tvlv type in a tvlv buffer * unicast tvlv send / receive API calls Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Spyros Gasteratos <morfeas3000@gmail.com> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2013-10-09batman-adv: switch to a new packet compatibility versionAntonio Quartulli1-1/+1
With this change batman-adv is breaking compatibility with older versions and it is moving to compat-version 15. Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Martin Hundebøll <martin@hundeboll.net> Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-10-09net: fix build errors if ipv6 is disabledEric Dumazet3-3/+11
CONFIG_IPV6=n is still a valid choice ;) It appears we can remove dead code. Reported-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-09udp: fix a typo in __udp4_lib_mcast_demux_lookupEric Dumazet1-1/+1
At this point sk might contain garbage. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-09ipv6: make lookups simpler and fasterEric Dumazet28-239/+189
TCP listener refactoring, part 4 : To speed up inet lookups, we moved IPv4 addresses from inet to struct sock_common Now is time to do the same for IPv6, because it permits us to have fast lookups for all kind of sockets, including upcoming SYN_RECV. Getting IPv6 addresses in TCP lookups currently requires two extra cache lines, plus a dereference (and memory stall). inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6 This patch is way bigger than its IPv4 counter part, because for IPv4, we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6, it's not doable easily. inet6_sk(sk)->daddr becomes sk->sk_v6_daddr inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr And timewait socket also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr at the same offset. We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic macro. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-08tcp/dccp: remove twchainEric Dumazet8-240/+122
TCP listener refactoring, part 3 : Our goal is to hash SYN_RECV sockets into main ehash for fast lookup, and parallel SYN processing. Current inet_ehash_bucket contains two chains, one for ESTABLISH (and friend states) sockets, another for TIME_WAIT sockets only. As the hash table is sized to get at most one socket per bucket, it makes little sense to have separate twchain, as it makes the lookup slightly more complicated, and doubles hash table memory usage. If we make sure all socket types have the lookup keys at the same offsets, we can use a generic and faster lookup. It turns out TIME_WAIT and ESTABLISHED sockets already have common lookup fields for IPv4. [ INET_TW_MATCH() is no longer needed ] I'll provide a follow-up to factorize IPv6 lookup as well, to remove INET6_TW_MATCH() This way, SYN_RECV pseudo sockets will be supported the same. A new sock_gen_put() helper is added, doing either a sock_put() or inet_twsk_put() [ and will support SYN_RECV later ]. Note this helper should only be called in real slow path, when rcu lookup found a socket that was moved to another identity (freed/reused immediately), but could eventually be used in other contexts, like sock_edemux() Before patch : dmesg | grep "TCP established" TCP established hash table entries: 524288 (order: 11, 8388608 bytes) After patch : TCP established hash table entries: 524288 (order: 10, 4194304 bytes) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller27-69/+153
Conflicts: include/linux/netdevice.h net/core/sock.c Trivial merge issues. Removal of "extern" for functions declaration in netdevice.h at the same time "const" was added to an argument. Two parallel line additions in net/core/sock.c Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-08pkt_sched: fq: fix non TCP flows pacingEric Dumazet2-11/+10
Steinar reported FQ pacing was not working for UDP flows. It looks like the initial sk->sk_pacing_rate value of 0 was a wrong choice. We should init it to ~0U (unlimited) Then, TCA_FQ_FLOW_DEFAULT_RATE should be removed because it makes no real sense. The default rate is really unlimited, and we need to avoid a zero divide. Reported-by: Steinar H. Gunderson <sesse@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-08net: vlan: fix nlmsg size calculation in vlan_get_size()Marc Kleine-Budde1-1/+1
This patch fixes the calculation of the nlmsg size, by adding the missing nla_total_size(). Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-08pkt_sched: fq: fix typo for initial_quantumEric Dumazet1-1/+1
TCA_FQ_INITIAL_QUANTUM should set q->initial_quantum Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-08ipv6: Fix the upper MTU limit in GRE tunnelOussama Ghorbel1-2/+1
Unlike ipv4, the struct member hlen holds the length of the GRE and ipv6 headers. This length is also counted in dev->hard_header_len. Perhaps, it's more clean to modify the hlen to count only the GRE header without ipv6 header as the variable name suggest, but the simple way to fix this without regression risk is simply modify the calculation of the limit in ip6gre_tunnel_change_mtu function. Verified in kernel version v3.11. Signed-off-by: Oussama Ghorbel <ou.ghorbel@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-08cgroup: cls: remove unnecessary task_cls_classidGao feng1-2/+2
We can get classid through cgroup_subsys_state, this is directviewing and effective. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-08cgroup: netprio: remove unnecessary task_netprioidxGao feng1-2/+1
Since the tasks have been migrated to the cgroup, there is no need to call task_netprioidx to get task's cgroup id. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-08net: ipv4 only populate IP_PKTINFO when neededShawn Bohrer3-4/+5
The since the removal of the routing cache computing fib_compute_spec_dst() does a fib_table lookup for each UDP multicast packet received. This has introduced a performance regression for some UDP workloads. This change skips populating the packet info for sockets that do not have IP_PKTINFO set. Benchmark results from a netperf UDP_RR test: Before 89789.68 transactions/s After 90587.62 transactions/s Benchmark results from a fio 1 byte UDP multicast pingpong test (Multicast one way unicast response): Before 12.63us RTT After 12.48us RTT Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-08udp: ipv4: Add udp early demuxShawn Bohrer2-18/+185
The removal of the routing cache introduced a performance regression for some UDP workloads since a dst lookup must be done for each packet. This change caches the dst per socket in a similar manner to what we do for TCP by implementing early_demux. For UDP multicast we can only cache the dst if there is only one receiving socket on the host. Since caching only works when there is one receiving socket we do the multicast socket lookup using RCU. For UDP unicast we only demux sockets with an exact match in order to not break forwarding setups. Additionally since the hash chains may be long we only check the first socket to see if it is a match and not waste extra time searching the whole chain when we might not find an exact match. Benchmark results from a netperf UDP_RR test: Before 87961.22 transactions/s After 89789.68 transactions/s Benchmark results from a fio 1 byte UDP multicast pingpong test (Multicast one way unicast response): Before 12.97us RTT After 12.63us RTT Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-08udp: Only allow busy read/poll on connected socketsShawn Bohrer2-4/+6
UDP sockets can receive packets from multiple endpoints and thus may be received on multiple receive queues. Since packets packets can arrive on multiple receive queues we should not mark the napi_id for all packets. This makes busy read/poll only work for connected UDP sockets. This additionally enables busy read/poll for UDP multicast packets as long as the socket is connected by moving the check into __udp_queue_rcv_skb(). Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com> Suggested-by: Eric Dumazet <edumazet@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-08net_sched: increment drop counters in qdisc_tree_decrease_qlen()Eric Dumazet1-0/+3
qdisc_tree_decrease_qlen() is called when some packets are dropped on a qdisc, and we want to notify parents of qlen changes. We also can increment parents qdisc qstats drop counters. This permits more accurate drop counters up to root qdisc. For example a graft operation typically resets a qdisc (drops all packets) and call qdisc_tree_decrease_qlen() Note that callers are responsible for their drop counters. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-08l2tp: Fix build warning with ipv6 disabled.David S. Miller1-5/+8
net/l2tp/l2tp_core.c: In function ‘l2tp_verify_udp_checksum’: net/l2tp/l2tp_core.c:499:22: warning: unused variable ‘tunnel’ [-Wunused-variable] Create a helper "l2tp_tunnel()" to facilitate this, and as a side effect get rid of a bunch of unnecessary void pointer casts. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-086lowpan: Sync default hardware address of lowpan links to their wpanAlan Ott1-0/+3
When a lowpan link to a wpan device is created, set the hardware address of the lowpan link to that of the wpan device. Signed-off-by: Alan Ott <alan@signal11.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-086lowpan: Only make 6lowpan links to IEEE802154 devicesAlan Ott1-0/+2
Refuse to create 6lowpan links if the actual hardware interface is of any type other than ARPHRD_IEEE802154. Signed-off-by: Alan Ott <alan@signal11.us> Suggested-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-07net: Update the sysctl permissions handler to test effective uid/gidEric W. Biederman1-2/+2
On Tue, 20 Aug 2013 11:40:04 -0500 Eric Sandeen <sandeen@redhat.com> wrote: > This was brought up in a Red Hat bug (which may be marked private, I'm sorry): > > Bug 987055 - open O_WRONLY succeeds on some root owned files in /proc for process running with unprivileged EUID > > "On RHEL7 some of the files in /proc can be opened for writing by an unprivileged EUID." > > The flaw existed upstream as well last I checked. > > This commit in kernel v3.8 caused the regression: > > commit cff109768b2d9c03095848f4cd4b0754117262aa > Author: Eric W. Biederman <ebiederm@xmission.com> > Date: Fri Nov 16 03:03:01 2012 +0000 > > net: Update the per network namespace sysctls to be available to the network namespace owner > > - Allow anyone with CAP_NET_ADMIN rights in the user namespace of the > the netowrk namespace to change sysctls. > - Allow anyone the uid of the user namespace root the same > permissions over the network namespace sysctls as the global root. > - Allow anyone with gid of the user namespace root group the same > permissions over the network namespace sysctl as the global root group. > > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > Signed-off-by: David S. Miller <davem@davemloft.net> > > because it changed /sys/net's special permission handler to test current_uid, not > current_euid; same for current_gid/current_egid. > > So in this case, root cannot drop privs via set[ug]id, and retains all privs > in this codepath. Modify the code to use current_euid(), and in_egroup_p, as in done in fs/proc/proc_sysctl.c:test_perm() Cc: stable@vger.kernel.org Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reported-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-nextDavid S. Miller11-287/+936
Conflicts: drivers/net/wireless/brcm80211/brcmfmac/dhd_bus.h drivers/net/wireless/rtlwifi/rtl8188ee/phy.h drivers/net/wireless/rtlwifi/rtl8192ce/phy.h drivers/net/wireless/rtlwifi/rtl8192de/phy.h drivers/net/wireless/rtlwifi/rtl8723ae/phy.h Just some minor conflicts between the wireless-next changes and Joe Perches's "extern" removal from function prototypes in header files. John W. Linville says: ==================== Regarding the Bluetooth bits, Gustavo says: "The big work here is from Marcel and Johan. They did a lot of work in the L2CAP, HCI and MGMT layers. The most important ones are the addition of a new MGMT command to enable/disable LE advertisement and the introduction of the HCI user channel to allow applications to get directly and exclusive access to Bluetooth devices." As to the ath10k bits, Kalle says: "Bartosz dropped support for qca98xx hw1.0 hardware from ath10k, it's just too much to support it. Michal added support for the new firmware interface. Marek fixed WEP in AP and IBSS mode. Rest of the changes are minor fixes or cleanups." And also: "Major changes are: * throughput improvements including aligning the RX frames correctly and optimising HTT layer (Michal) * remove qca98xx hw1.0 support (Bartosz) * add support for firmware version 999.999.0.636 (Michal) * firmware htt statistics support (Kalle) * fix WEP in AP and IBSS mode (Marek) * fix a mutex unlock balance in debugfs file (Shafi) And of course there's a lot of smaller fixes and cleanup." For the wl12xx bits, Luca says: "Here are some patches intended for 3.13. Eliad is upstreaming a bunch of patches that have been pending in the internal tree. Mostly bugfixes and other small improvements." Along with that... Arend and friends bring us a batch of brcmfmac updates, Larry Finger offers some rtlwifi refactoring, and Sujith sends the usual batch of ath9k updates. As usual, there are a number of other small updates from a variety of players as well. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-07ipv4: fix ineffective source address selectionJiri Benc1-1/+1
When sending out multicast messages, the source address in inet->mc_addr is ignored and rewritten by an autoselected one. This is caused by a typo in commit 813b3b5db831 ("ipv4: Use caller's on-stack flowi as-is in output route lookups"). Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-07net: Separate the close_list and the unreg_list v2Eric W. Biederman2-14/+17
Separate the unreg_list and the close_list in dev_close_many preventing dev_close_many from permuting the unreg_list. The permutations of the unreg_list have resulted in cases where the loopback device is accessed it has been freed in code such as dst_ifdown. Resulting in subtle memory corruption. This is the second bug from sharing the storage between the close_list and the unreg_list. The issues that crop up with sharing are apparently too subtle to show up in normal testing or usage, so let's forget about being clever and use two separate lists. v2: Make all callers pass in a close_list to dev_close_many Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-07net: fix unsafe set_memory_rw from softirqAlexei Starovoitov1-4/+4
on x86 system with net.core.bpf_jit_enable = 1 sudo tcpdump -i eth1 'tcp port 22' causes the warning: [ 56.766097] Possible unsafe locking scenario: [ 56.766097] [ 56.780146] CPU0 [ 56.786807] ---- [ 56.793188] lock(&(&vb->lock)->rlock); [ 56.799593] <Interrupt> [ 56.805889] lock(&(&vb->lock)->rlock); [ 56.812266] [ 56.812266] *** DEADLOCK *** [ 56.812266] [ 56.830670] 1 lock held by ksoftirqd/1/13: [ 56.836838] #0: (rcu_read_lock){.+.+..}, at: [<ffffffff8118f44c>] vm_unmap_aliases+0x8c/0x380 [ 56.849757] [ 56.849757] stack backtrace: [ 56.862194] CPU: 1 PID: 13 Comm: ksoftirqd/1 Not tainted 3.12.0-rc3+ #45 [ 56.868721] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012 [ 56.882004] ffffffff821944c0 ffff88080bbdb8c8 ffffffff8175a145 0000000000000007 [ 56.895630] ffff88080bbd5f40 ffff88080bbdb928 ffffffff81755b14 0000000000000001 [ 56.909313] ffff880800000001 ffff880800000000 ffffffff8101178f 0000000000000001 [ 56.923006] Call Trace: [ 56.929532] [<ffffffff8175a145>] dump_stack+0x55/0x76 [ 56.936067] [<ffffffff81755b14>] print_usage_bug+0x1f7/0x208 [ 56.942445] [<ffffffff8101178f>] ? save_stack_trace+0x2f/0x50 [ 56.948932] [<ffffffff810cc0a0>] ? check_usage_backwards+0x150/0x150 [ 56.955470] [<ffffffff810ccb52>] mark_lock+0x282/0x2c0 [ 56.961945] [<ffffffff810ccfed>] __lock_acquire+0x45d/0x1d50 [ 56.968474] [<ffffffff810cce6e>] ? __lock_acquire+0x2de/0x1d50 [ 56.975140] [<ffffffff81393bf5>] ? cpumask_next_and+0x55/0x90 [ 56.981942] [<ffffffff810cef72>] lock_acquire+0x92/0x1d0 [ 56.988745] [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380 [ 56.995619] [<ffffffff817628f1>] _raw_spin_lock+0x41/0x50 [ 57.002493] [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380 [ 57.009447] [<ffffffff8118f52a>] vm_unmap_aliases+0x16a/0x380 [ 57.016477] [<ffffffff8118f44c>] ? vm_unmap_aliases+0x8c/0x380 [ 57.023607] [<ffffffff810436b0>] change_page_attr_set_clr+0xc0/0x460 [ 57.030818] [<ffffffff810cfb8d>] ? trace_hardirqs_on+0xd/0x10 [ 57.037896] [<ffffffff811a8330>] ? kmem_cache_free+0xb0/0x2b0 [ 57.044789] [<ffffffff811b59c3>] ? free_object_rcu+0x93/0xa0 [ 57.051720] [<ffffffff81043d9f>] set_memory_rw+0x2f/0x40 [ 57.058727] [<ffffffff8104e17c>] bpf_jit_free+0x2c/0x40 [ 57.065577] [<ffffffff81642cba>] sk_filter_release_rcu+0x1a/0x30 [ 57.072338] [<ffffffff811108e2>] rcu_process_callbacks+0x202/0x7c0 [ 57.078962] [<ffffffff81057f17>] __do_softirq+0xf7/0x3f0 [ 57.085373] [<ffffffff81058245>] run_ksoftirqd+0x35/0x70 cannot reuse jited filter memory, since it's readonly, so use original bpf insns memory to hold work_struct defer kfree of sk_filter until jit completed freeing tested on x86_64 and i386 Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-07ipv6: Allow the MTU of ipip6 tunnel to be set below 1280Oussama Ghorbel1-2/+10
The (inner) MTU of a ipip6 (IPv4-in-IPv6) tunnel cannot be set below 1280, which is the minimum MTU in IPv6. However, there should be no IPv6 on the tunnel interface at all, so the IPv6 rules should not apply. More info at https://bugzilla.kernel.org/show_bug.cgi?id=15530 This patch allows to check the minimum MTU for ipv6 tunnel according to these rules: -In case the tunnel is configured with ipip6 mode the minimum MTU is 68. -In case the tunnel is configured with ip6ip6 or any mode the minimum MTU is 1280. Signed-off-by: Oussama Ghorbel <ou.ghorbel@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-07netif_set_xps_queue: make cpu mask constMichael S. Tsirkin1-1/+2
virtio wants to pass in cpumask_of(cpu), make parameter const to avoid build warnings. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-04tcp: do not forget FIN in tcp_shifted_skb()Eric Dumazet1-1/+4
Yuchung found following problem : There are bugs in the SACK processing code, merging part in tcp_shift_skb_data(), that incorrectly resets or ignores the sacked skbs FIN flag. When a receiver first SACK the FIN sequence, and later throw away ofo queue (e.g., sack-reneging), the sender will stop retransmitting the FIN flag, and hangs forever. Following packetdrill test can be used to reproduce the bug. $ cat sack-merge-bug.pkt `sysctl -q net.ipv4.tcp_fack=0` // Establish a connection and send 10 MSS. 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +.000 bind(3, ..., ...) = 0 +.000 listen(3, 1) = 0 +.050 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> +.000 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6> +.001 < . 1:1(0) ack 1 win 1024 +.000 accept(3, ..., ...) = 4 +.100 write(4, ..., 12000) = 12000 +.000 shutdown(4, SHUT_WR) = 0 +.000 > . 1:10001(10000) ack 1 +.050 < . 1:1(0) ack 2001 win 257 +.000 > FP. 10001:12001(2000) ack 1 +.050 < . 1:1(0) ack 2001 win 257 <sack 10001:11001,nop,nop> +.050 < . 1:1(0) ack 2001 win 257 <sack 10001:12002,nop,nop> // SACK reneg +.050 < . 1:1(0) ack 12001 win 257 +0 %{ print "unacked: ",tcpi_unacked }% +5 %{ print "" }% First, a typo inverted left/right of one OR operation, then code forgot to advance end_seq if the merged skb carried FIN. Bug was added in 2.6.29 by commit 832d11c5cd076ab ("tcp: Try to restore large SKBs while SACK processing") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-04Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller27-1653/+2418
Pablo Neira Ayuso says: ==================== The following patchset contains Netfilter updates for your net-next tree, mostly ipset improvements and enhancements features, they are: * Don't call ip_nest_end needlessly in the error path from me, suggested by Pablo Neira Ayuso, from Jozsef Kadlecsik. * Fixed sparse warnings about shadowed variable and missing rcu annotation and fix of "may be used uninitialized" warnings, also from Jozsef. * Renamed simple macro names to avoid namespace issues, reported by David Laight, again from Jozsef. * Use fix sized type for timeout in the extension part, and cosmetic ordering of matches and targets separatedly in xt_set.c, from Jozsef. * Support package fragments for IPv4 protos without ports from Anders K. Pedersen. For example this allows a hash:ip,port ipset containing the entry 192.168.0.1,gre:0 to match all package fragments for PPTP VPN tunnels to/from the host. Without this patch only the first package fragment (with fragment offset 0) was matched. * Introduced a new operation to get both setname and family, from Jozsef. ip[6]tables set match and SET target need to know the family of the set in order to reject adding rules which refer to a set with a non-mathcing family. Currently such rules are silently accepted and then ignored instead of generating an error message to the user. * Reworked extensions support in ipset types from Jozsef. The approach of defining structures with all variations is not manageable as the number of extensions grows. Therefore a blob for the extensions is introduced, somewhat similar to conntrack. The support of extensions which need a per data destroy function is added as well. * When an element timed out in a list:set type of set, the garbage collector skipped the checking of the next element. So the purging was delayed to the next run of the gc, fixed by Jozsef. * A small Kconfig fix: NETFILTER_NETLINK cannot be selected and ipset requires it. * hash:net,net type from Oliver Smith. The type provides the ability to store pairs of subnets in a set. * Comment for ipset entries from Oliver Smith. This makes possible to annotate entries in a set with comments, for example: ipset n foo hash:net,net comment ipset a foo 10.0.0.0/21,192.168.1.0/24 comment "office nets A and B" * Fix of hash types resizing with comment extension from Jozsef. * Fix of new extensions for list:set type when an element is added into a slot from where another element was pushed away from Jozsef. * Introduction of a common function for the listing of the element extensions from Jozsef. * Net namespace support for ipset from Vitaly Lavrov. * hash:net,port,net type from Oliver Smith, which makes possible to store the triples of two subnets and a protocol, port pair in a set. * Get xt_TCPMSS working with net namespace, by Gao feng. * Use the proper net netnamespace to allocate skbs, also by Gao feng. * A couple of cleanups for the conntrack SIP helper, by Holger Eitzenberger. * Extend cttimeout to allow setting default conntrack timeouts via nfnetlink, so we can get rid of all our sysctl/proc interfaces in the future for timeout tuning, from me. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-03tcp: shrink tcp6_timewait_sock by one cache lineEric Dumazet3-6/+6
While working on tcp listener refactoring, I found that it would really make things easier if sock_common could include the IPv6 addresses needed in the lookups, instead of doing very complex games to get their values (depending on sock being SYN_RECV, ESTABLISHED, TIME_WAIT) For this to happen, I need to be sure that tcp6_timewait_sock and tcp_timewait_sock consume same number of cache lines. This is possible if we only use 32bits for tw_ttd, as we remove one 32bit hole in inet_timewait_sock inet_tw_time_stamp() is defined and used, even if its current implementation looks like tcp_time_stamp : We might need finer resolution for tcp_time_stamp in the future. Before patch : sizeof(struct tcp6_timewait_sock) = 0xc8 After patch : sizeof(struct tcp6_timewait_sock) = 0xc0 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-03Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wirelessDavid S. Miller5-14/+22
John W. Linville says: ==================== Here is another batch of fixes intended for the 3.12 stream... For the mac80211 bits, Johannes says: "This time I have two fixes for IBSS (including one for wext, hah), a fix for extended rates IEs, an active monitor checking fix and a sysfs registration race fix." On top of those... Amitkumar Karwar brings an mwifiex fix for an interrupt loss issue w/ SDIO devices. The problem was due to a command timeout issue introduced by an earlier patch. Felix Fietkau a stall in the ath9k driver. This patch fixes the regression introduced in the commit "ath9k: use software queues for un-aggregated data packets". Stanislaw Gruszka reverts an rt2x00 patch that was found to cause connection problems with some devices. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-03Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-nextJohn W. Linville12-326/+969
2013-10-03net: heap overflow in __audit_sockaddr()Dan Carpenter2-4/+22
We need to cap ->msg_namelen or it leads to a buffer overflow when we to the memcpy() in __audit_sockaddr(). It requires CAP_AUDIT_CONTROL to exploit this bug. The call tree is: ___sys_recvmsg() move_addr_to_user() audit_sockaddr() __audit_sockaddr() Reported-by: Jüri Aedla <juri.aedla@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-03Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davemJohn W. Linville5-14/+22
2013-10-03Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-mergeDavid S. Miller3-16/+31
Included change: - fix multi soft-interfaces setups with Network Coding enabled by registering the CODED packet type once only (instead of once per soft-if) Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-03net: ipv4: Change variable type to boolPeter Senna Tschudin1-1/+1
The variable fully_acked is only assigned the values true and false. Change its type to bool. The simplified semantic patch that find this problem is as follows (http://coccinelle.lip6.fr/): @exists@ type T; identifier b; @@ - T + bool b = ...; ... when any b = \(true\|false\) Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-03flow_dissector: factor out the ports extraction in skb_flow_get_portsNikolay Aleksandrov1-11/+28
Factor out the code that extracts the ports from skb_flow_dissect and add a new function skb_flow_get_ports which can be re-used. Suggested-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-03inet: consolidate INET_TW_MATCHEric Dumazet2-10/+7
TCP listener refactoring, part 2 : We can use a generic lookup, sockets being in whatever state, if we are sure all relevant fields are at the same place in all socket types (ESTABLISH, TIME_WAIT, SYN_RECV) This patch removes these macros : inet_addrpair, inet_addrpair, tw_addrpair, tw_portpair And adds : sk_portpair, sk_addrpair, sk_daddr, sk_rcv_saddr Then, INET_TW_MATCH() is really the same than INET_MATCH() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-03Bluetooth: Only one command per L2CAP LE signalling is supportedMarcel Holtmann1-25/+19
The Bluetooth specification makes it clear that only one command should be present in the L2CAP LE signalling packet. So tighten the checks here and restrict it to exactly one command. This is different from L2CAP BR/EDR signalling where multiple commands can be part of the same packet. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-03Bluetooth: Check minimum length of SMP packetsMarcel Holtmann1-2/+7
When SMP packets are received, make sure they contain at least 1 byte header for the opcode. If not, drop the packet and disconnect the link. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-03Bluetooth: Drop packets on ATT fixed channel on BR/EDRMarcel Holtmann1-0/+4
The ATT fixed channel is only valid when using LE connections. On BR/EDR it is required to go through L2CAP connection oriented channel for ATT. Drop ATT packets when they are received on a BR/EDR connection. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-03Bluetooth: L2CAP connectionless channels are only valid for BR/EDRMarcel Holtmann1-0/+4
When receiving connectionless packets on a LE connection, just drop the packet. There is no concept of connectionless channels for LE. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-03Bluetooth: SMP packets are only valid on LE connectionsMarcel Holtmann1-0/+6
When receiving SMP packets on a BR/EDR connection, then just drop the packet and do not try to process it. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-03Bluetooth: Don't copy L2CAP LE signalling to raw socketsMarcel Holtmann1-2/+0
The L2CAP raw sockets are only used for BR/EDR signalling. Packets on LE links should not be forwarded there. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-10-03Bluetooth: Fix switch statement order for L2CAP fixed channelsMarcel Holtmann1-3/+4
The switch statement for the various L2CAP fixed channel handlers is not really ordered. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>