aboutsummaryrefslogtreecommitdiffstats
path: root/net/packet/af_packet.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2013-04-25packet: account statistics only in tpacket_stats_uDaniel Borkmann1-21/+16
Currently, packet_sock has a struct tpacket_stats stats member for TPACKET_V1 and TPACKET_V2 statistic accounting, and with TPACKET_V3 ``union tpacket_stats_u stats_u'' was introduced, where however only statistics for TPACKET_V3 are held, and when copied to user space, TPACKET_V3 does some hackery and access also tpacket_stats' stats, although everything could have been done within the union itself. Unify accounting within the tpacket_stats_u union so that we can remove 8 bytes from packet_sock that are there unnecessary. Note that even if we switch to TPACKET_V3 and would use non mmap(2)ed option, this still works due to the union with same types + offsets, that are exposed to the user space. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-25packet: if hw/sw ts enabled in rx/tx ring, report which ts we gotDaniel Borkmann1-13/+23
Currently, there is no way to find out which timestamp is reported in tpacket{,2,3}_hdr's tp_sec, tp_{n,u}sec members. It can be one of SOF_TIMESTAMPING_SYS_HARDWARE, SOF_TIMESTAMPING_RAW_HARDWARE, SOF_TIMESTAMPING_SOFTWARE, or a fallback variant late call from the PF_PACKET code in software. Therefore, report in the tp_status member of the ring buffer which timestamp has been reported for RX and TX path. This should not break anything for the following reasons: i) in RX ring path, the user needs to test for tp_status & TP_STATUS_USER, and later for other flags as well such as TP_STATUS_VLAN_VALID et al, so adding other flags will do no harm; ii) in TX ring path, time stamps with PACKET_TIMESTAMP socketoption are not available resp. had no effect except that the application setting this is buggy. Next to TP_STATUS_AVAILABLE, the user also should check for other flags such as TP_STATUS_WRONG_FORMAT to reclaim frames to the application. Thus, in case TX ts are turned off (default case), nothing happens to the application logic, and in case we want to use this new feature, we now can also check which of the ts source is reported in the status field as provided in the docs. Reported-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-25packet: enable hardware tx timestamping on tpacket ringDaniel Borkmann1-25/+25
Currently, we only have software timestamping for the TX ring buffer path, but this limitation stems rather from the implementation. By just reusing tpacket_get_timestamp(), we can also allow hardware timestamping just as in the RX path. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-25packet: tx timestamping on tpacket ringWillem de Bruijn1-0/+33
When transmit timestamping is enabled at the socket level, record a timestamp on packets written to a PACKET_TX_RING. Tx timestamps are always looped to the application over the socket error queue. Software timestamps are also written back into the packet frame header in the packet ring. Reported-by: Paul Chavent <paul.chavent@onera.fr> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19packet: move hw/sw timestamp extraction into a small helperDaniel Borkmann1-34/+23
This patch introduces a small, internal helper function, that is used by PF_PACKET. Based on the flags that are passed, it extracts the packet timestamp in the receive path. This is merely a refactoring to remove some duplicate code in tpacket_rcv(), to make it more readable, and to enable others to use this function in PF_PACKET as well, e.g. for TX. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-16packet: minor: add generic tpacket_uhdr to access packet headersDaniel Borkmann1-27/+12
There is no need to add a dozen unions each time at the start of the function. So, do this once and use it instead. Thus, we can remove some duplicate code and make it more readable. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-14net: sock: make sock_tx_timestamp voidDaniel Borkmann1-6/+4
Currently, sock_tx_timestamp() always returns 0. The comment that describes the sock_tx_timestamp() function wrongly says that it returns an error when an invalid argument is passed (from commit 20d4947353be, ``net: socket infrastructure for SO_TIMESTAMPING''). Make the function void, so that we can also remove all the unneeded if conditions that check for such a _non-existant_ error case in the output path. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-27net: switch to use skb_probe_transport_header()Jason Wang1-19/+3
Switch to use the new help skb_probe_transport_header() to do the l4 header probing for untrusted sources. For packets with partial csum, the header should already been set by skb_partial_csum_set(). Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26packet: set transport header before doing xmitJason Wang1-0/+21
Set the transport header for 1) some drivers (e.g ixgbe needs l4 header to do atr) 2) precise packet length estimation (introduced in 1def9238) needs l4 header to compute header length. So this patch first tries to get l4 header for packet socket through skb_flow_dissect(), and pretend no l4 header if skb_flow_dissect() fails. Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-19packet: packet fanout rollover during socket overloadWillem de Bruijn1-23/+86
Changes: v3->v2: rebase (no other changes) passes selftest v2->v1: read f->num_members only once fix bug: test rollover mode + flag Minimize packet drop in a fanout group. If one socket is full, roll over packets to another from the group. Maintain flow affinity during normal load using an rxhash fanout policy, while dispersing unexpected traffic storms that hit a single cpu, such as spoofed-source DoS flows. Rollover breaks affinity for flows arriving at saturated sockets during those conditions. The patch adds a fanout policy ROLLOVER that rotates between sockets, filling each socket before moving to the next. It also adds a fanout flag ROLLOVER. If passed along with any other fanout policy, the primary policy is applied until the chosen socket is full. Then, rollover selects another socket, to delay packet drop until the entire system is saturated. Probing sockets is not free. Selecting the last used socket, as rollover does, is a greedy approach that maximizes chance of success, at the cost of extreme load imbalance. In practice, with sufficiently long queues to absorb bursts, sockets are drained in parallel and load balance looks uniform in `top`. To avoid contention, scales counters with number of sockets and accesses them lockfree. Values are bounds checked to ensure correctness. Tested using an application with 9 threads pinned to CPUs, one socket per thread and sufficient busywork per packet operation to limits each thread to handling 32 Kpps. When sent 500 Kpps single UDP stream packets, a FANOUT_CPU setup processes 32 Kpps in total without this patch, 270 Kpps with the patch. Tested with read() and with a packet ring (V1). Also, passes psock_fanout.c unit test added to selftests. Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-27hlist: drop the node parameter from iteratorsSasha Levin1-2/+1
I'm not sure why, but the hlist for each entry iterators were conceived list_for_each_entry(pos, head, member) The hlist ones were greedy and wanted an extra parameter: hlist_for_each_entry(tpos, pos, head, member) Why did they need an extra pos parameter? I'm not quite sure. Not only they don't really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate. Besides the semantic patch, there was some manual work required: - Fix up the actual hlist iterators in linux/list.h - Fix up the declaration of other iterators based on the hlist ones. - A very small amount of places were using the 'node' parameter, this was modified to use 'obj->member' instead. - Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually. The semantic patch which is mostly the work of Peter Senna Tschudin is here: @@ iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host; type T; expression a,c,d,e; identifier b; statement S; @@ -T b; <+... when != b ( hlist_for_each_entry(a, - b, c, d) S | hlist_for_each_entry_continue(a, - b, c) S | hlist_for_each_entry_from(a, - b, c) S | hlist_for_each_entry_rcu(a, - b, c, d) S | hlist_for_each_entry_rcu_bh(a, - b, c, d) S | hlist_for_each_entry_continue_rcu_bh(a, - b, c) S | for_each_busy_worker(a, c, - b, d) S | ax25_uid_for_each(a, - b, c) S | ax25_for_each(a, - b, c) S | inet_bind_bucket_for_each(a, - b, c) S | sctp_for_each_hentry(a, - b, c) S | sk_for_each(a, - b, c) S | sk_for_each_rcu(a, - b, c) S | sk_for_each_from -(a, b) +(a) S + sk_for_each_from(a) S | sk_for_each_safe(a, - b, c, d) S | sk_for_each_bound(a, - b, c) S | hlist_for_each_entry_safe(a, - b, c, d, e) S | hlist_for_each_entry_continue_rcu(a, - b, c) S | nr_neigh_for_each(a, - b, c) S | nr_neigh_for_each_safe(a, - b, c, d) S | nr_node_for_each(a, - b, c) S | nr_node_for_each_safe(a, - b, c, d) S | - for_each_gfn_sp(a, c, d, b) S + for_each_gfn_sp(a, c, d) S | - for_each_gfn_indirect_valid_sp(a, c, d, b) S + for_each_gfn_indirect_valid_sp(a, c, d) S | for_each_host(a, - b, c) S | for_each_host_safe(a, - b, c, d) S | for_each_mesh_entry(a, - b, c, d) S ) ...+> [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c] [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix warnings] [akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-18net: proc: change proc_net_remove to remove_proc_entryGao feng1-1/+1
proc_net_remove is only used to remove proc entries that under /proc/net,it's not a general function for removing proc entries of netns. if we want to remove some proc entries which under /proc/net/stat/, we still need to call remove_proc_entry. this patch use remove_proc_entry to replace proc_net_remove. we can remove proc_net_remove after this patch. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-18net: proc: change proc_net_fops_create to proc_createGao feng1-1/+1
Right now, some modules such as bonding use proc_create to create proc entries under /proc/net/, and other modules such as ipv4 use proc_net_fops_create. It looks a little chaos.this patch changes all of proc_net_fops_create to proc_create. we can remove proc_net_fops_create after this patch. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-03packet: fix leakage of tx_ring memoryPhil Sutter1-4/+6
When releasing a packet socket, the routine packet_set_ring() is reused to free rings instead of allocating them. But when calling it for the first time, it fills req->tp_block_nr with the value of rb->pg_vec_len which in the second invocation makes it bail out since req->tp_block_nr is greater zero but req->tp_block_size is zero. This patch solves the problem by passing a zeroed auto-variable to packet_set_ring() upon each invocation from packet_release(). As far as I can tell, this issue exists even since 69e3c75 (net: TX_RING and packet mmap), i.e. the original inclusion of TX ring support into af_packet, but applies only to sockets with both RX and TX ring allocated, which is probably why this was unnoticed all the time. Signed-off-by: Phil Sutter <phil.sutter@viprinet.com> Cc: Johann Baudy <johann.baudy@gnu-log.net> Cc: Daniel Borkmann <dborkman@redhat.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18net: Allow userns root to control llc, netfilter, netlink, packet, and xfrmEric W. Biederman1-1/+1
Allow an unpriviled user who has created a user namespace, and then created a network namespace to effectively use the new network namespace, by reducing capable(CAP_NET_ADMIN) and capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns, CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls. Allow creation of af_key sockets. Allow creation of llc sockets. Allow creation of af_packet sockets. Allow sending xfrm netlink control messages. Allow binding to netlink multicast groups. Allow sending to netlink multicast groups. Allow adding and dropping netlink multicast groups. Allow sending to all netlink multicast groups and port ids. Allow reading the netfilter SO_IP_SET socket option. Allow sending netfilter netlink messages. Allow setting and getting ip_vs netfilter socket options. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-07packet: tx_ring: allow the user to choose tx data offsetPaul Chavent1-1/+45
The tx data offset of packet mmap tx ring used to be : (TPACKET2_HDRLEN - sizeof(struct sockaddr_ll)) The problem is that, with SOCK_RAW socket, the payload (14 bytes after the beginning of the user data) is misaligned. This patch allows to let the user gives an offset for it's tx data if he desires. Set sock option PACKET_TX_HAS_OFF to 1, then specify in each frame of your tx ring tp_net for SOCK_DGRAM, or tp_mac for SOCK_RAW. Signed-off-by: Paul Chavent <paul.chavent@onera.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-26packet: minor: remove unused err assignmentDaniel Borkmann1-2/+0
This tiny patch removes two unused err assignments. In those two cases the err variable is either overwritten with another value at a later point in time without having read the previous assigment, or it is assigned and the function returns without using/reading err after the assignment. Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-1/+1
Merge the 'net' tree to get the recent set of netfilter bug fixes in order to assist with some merge hassles Pablo is going to have to deal with for upcoming changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespaceDavid S. Miller1-1/+1
This is an initial merge in of Eric Biederman's work to start adding user namespace support to the networking. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-23af_packet: match_fanout_group() can be staticFengguang Wu1-1/+1
cc: Eric Leblond <eric@regit.org> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22packet: Protect packet sk list with mutex (v2)Pavel Emelyanov1-5/+11
Change since v1: * Fixed inuse counters access spotted by Eric In patch eea68e2f (packet: Report socket mclist info via diag module) I've introduced a "scheduling in atomic" problem in packet diag module -- the socket list is traversed under rcu_read_lock() while performed under it sk mclist access requires rtnl lock (i.e. -- mutex) to be taken. [152363.820563] BUG: scheduling while atomic: crtools/12517/0x10000002 [152363.820573] 4 locks held by crtools/12517: [152363.820581] #0: (sock_diag_mutex){+.+.+.}, at: [<ffffffff81a2dcb5>] sock_diag_rcv+0x1f/0x3e [152363.820613] #1: (sock_diag_table_mutex){+.+.+.}, at: [<ffffffff81a2de70>] sock_diag_rcv_msg+0xdb/0x11a [152363.820644] #2: (nlk->cb_mutex){+.+.+.}, at: [<ffffffff81a67d01>] netlink_dump+0x23/0x1ab [152363.820693] #3: (rcu_read_lock){.+.+..}, at: [<ffffffff81b6a049>] packet_diag_dump+0x0/0x1af Similar thing was then re-introduced by further packet diag patches (fanount mutex and pgvec mutex for rings) :( Apart from being terribly sorry for the above, I propose to change the packet sk list protection from spinlock to mutex. This lock currently protects two modifications: * sklist * prot inuse counters The sklist modifications can be just reprotected with mutex since they already occur in a sleeping context. The inuse counters modifications are trickier -- the __this_cpu_-s are used inside, thus requiring the caller to handle the potential issues with contexts himself. Since packet sockets' counters are modified in two places only (packet_create and packet_release) we only need to protect the context from being preempted. BH disabling is not required in this case. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22af_packet: use define instead of constantdanborkmann@iogearbox.net1-2/+3
Instead of using a hard-coded value for the status variable, it would make the code more readable to use its destined define from linux/if_packet.h. Signed-off-by: daniel.borkmann@tik.ee.ethz.ch Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-2/+10
2012-08-20af_packet: don't emit packet on orig fanout groupEric Leblond1-0/+9
If a packet is emitted on one socket in one group of fanout sockets, it is transmitted again. It is thus read again on one of the sockets of the fanout group. This result in a loop for software which generate packets when receiving one. This retransmission is not the intended behavior: a fanout group must behave like a single socket. The packet should not be transmitted on a socket if it originates from a socket belonging to the same fanout group. This patch fixes the issue by changing the transmission check to take fanout group info account. Reported-by: Aleksandr Kotov <a1k@mail.ru> Signed-off-by: Eric Leblond <eric@regit.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20packet: Report fanout status via diag enginePavel Emelyanov1-20/+3
Reported value is the same reported by the FANOUT getsockoption, but unlike it, the absent fanout setup results in absent nlattr, rather than in nlattr with zero value. This is done so, since zero fanout report may mean both -- no fanout, and fanout with both id and type zero. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14userns: Print out socket uids in a user namespace aware fashion.Eric W. Biederman1-1/+1
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Sridhar Samudrala <sri@us.ibm.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14packet: Introduce net/packet/internal.h headerPavel Emelyanov1-97/+2
The diag module will need to access some private packet_sock data, so move it to a header in advance. This file will be shared between the af_packet.c and the diag.c Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-12af_packet: remove BUG statement in tpacket_destruct_skbdanborkmann@iogearbox.net1-1/+0
Here's a quote of the comment about the BUG macro from asm-generic/bug.h: Don't use BUG() or BUG_ON() unless there's really no way out; one example might be detecting data structure corruption in the middle of an operation that can't be backed out of. If the (sub)system can somehow continue operating, perhaps with reduced functionality, it's probably not BUG-worthy. If you're tempted to BUG(), think again: is completely giving up really the *only* solution? There are usually better options, where users don't need to reboot ASAP and can mostly shut down cleanly. In our case, the status flag of a ring buffer slot is managed from both sides, the kernel space and the user space. This means that even though the kernel side might work as expected, the user space screws up and changes this flag right between the send(2) is triggered when the flag is changed to TP_STATUS_SENDING and a given skb is destructed after some time. Then, this will hit the BUG macro. As David suggested, the best solution is to simply remove this statement since it cannot be used for kernel side internal consistency checks. I've tested it and the system still behaves /stable/ in this case, so in accordance with the above comment, we should rather remove it. Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-08af_packet: Quiet sparse noise about using plain integer as NULL pointerYing Xue1-1/+1
Quiets the sparse warning: warning: Using plain integer as NULL pointer Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-27net: added support for 40GbE link.parav.pandit@emulex.com1-12/+6
1. removed code replication for tov calculation for 1G, 10G and made is common for speed > 1G (1G, 10G, 40G, 100G). 2. defines values for #4 different 40G Phys (KR4, LF4, SR4, CR4) Signed-off-by: Parav Pandit <parav.pandit@emulex.com> Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-11af_packet: use sizeof instead of constant in spkt_devicedanborkmann@iogearbox.net1-1/+1
This small patch removes access to the last element of the spkt_device array through a constant. Instead, it is accessed by sizeof() to respect possible changes in if_packet.h. Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-04net: Remove casts to same typeJoe Perches1-5/+4
Adding casts of objects to the same type is unnecessary and confusing for a human reader. For example, this cast: int y; int *p = (int *)&y; I used the coccinelle script below to find and remove these unnecessary casts. I manually removed the conversions this script produces of casts with __force and __user. @@ type T; T *p; @@ - (T *)p + p Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21af_packet: packet_getsockopt() cleanupEric Dumazet1-38/+8
Factorize code, since most fetched values are int type. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-19packet: dont drop packet but consume itEric Dumazet1-1/+1
When we need to clone skb, we dont drop a packet. Call consume_skb() to not confuse dropwatch. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15net: cleanup unsigned to unsigned intEric Dumazet1-1/+1
Use of "unsigned int" is preferred to bare "unsigned" in net tree. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-28Remove all #inclusions of asm/system.hDavid Howells1-1/+0
Remove all #inclusions of asm/system.h preparatory to splitting and killing it. Performed with the following command: perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *` Signed-off-by: David Howells <dhowells@redhat.com>
2012-02-24net: Add framework to allow sending packets with customized CRC.Ben Greear1-4/+28
This is useful for testing RX handling of frames with bad CRCs. Requires driver support to actually put the packet on the wire properly. Signed-off-by: Ben Greear <greearb@candelatech.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-12-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-1/+5
2011-12-27packet: fix possible dev refcnt leak when bind failWei Yongjun1-1/+5
If bind is fail when bind is called after set PACKET_FANOUT sock option, the dev refcnt will leak. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-4/+2
Conflicts: net/bluetooth/l2cap_core.c Just two overlapping changes, one added an initialization of a local variable, and another change added a new local variable. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-23net: relax rcvbuf limitsEric Dumazet1-4/+2
skb->truesize might be big even for a small packet. Its even bigger after commit 87fb4b7b533 (net: more accurate skb truesize) and big MTU. We should allow queueing at least one packet per receiver, even with a low RCVBUF setting. Reported-by: Michal Simek <monstr@monstr.eu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-18packet: Add needed_tailroom to packet_sendmsg_spktHerbert Xu1-1/+2
packet: Add needed_tailroom to packet_sendmsg_spkt While auditing LL_ALLOCATED_SPACE I noticed that packet_sendmsg_spkt did not include needed_tailroom when allocating an skb. This isn't a fatal error as we should always tolerate inadequate tail room but it isn't optimal. This patch fixes that. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-18net: Remove all uses of LL_ALLOCATED_SPACEHerbert Xu1-7/+11
net: Remove all uses of LL_ALLOCATED_SPACE The macro LL_ALLOCATED_SPACE was ill-conceived. It applies the alignment to the sum of needed_headroom and needed_tailroom. As the amount that is then reserved for head room is needed_headroom with alignment, this means that the tail room left may be too small. This patch replaces all uses of LL_ALLOCATED_SPACE with the macro LL_RESERVED_SPACE and direct reference to needed_tailroom. This also fixes the problem with needed_headroom changing between allocating the skb and reserving the head room. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-03af_packet: de-inline some helper functionsOlof Johansson1-26/+26
This popped some compiler errors due to mismatched prototypes. Just remove most manual inlines, the compiler should be able to figure out what makes sense to inline and not. net/packet/af_packet.c:252: warning: 'prb_curr_blk_in_use' declared inline after being called net/packet/af_packet.c:252: warning: previous declaration of 'prb_curr_blk_in_use' was here net/packet/af_packet.c:258: warning: 'prb_queue_frozen' declared inline after being called net/packet/af_packet.c:258: warning: previous declaration of 'prb_queue_frozen' was here net/packet/af_packet.c:248: warning: 'packet_previous_frame' declared inline after being called net/packet/af_packet.c:248: warning: previous declaration of 'packet_previous_frame' was here net/packet/af_packet.c:251: warning: 'packet_increment_head' declared inline after being called net/packet/af_packet.c:251: warning: previous declaration of 'packet_increment_head' was here Signed-off-by: Olof Johansson <olof@lixom.net> Cc: Chetan Loke <loke.chetan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-18macvlan: handle fragmented multicast framesEric Dumazet1-38/+1
Fragmented multicast frames are delivered to a single macvlan port, because ip defrag logic considers other samples are redundant. Implement a defrag step before trying to send the multicast frame. Reported-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-10af_packet: remove unnecessary BUG_ON() in tpacket_destruct_skbdanborkmann@iogearbox.net1-2/+0
If skb is NULL, then stack trace is thrown anyway on dereference. Therefore, the stack trace triggered by BUG_ON is duplicate. Signed-off-by: Daniel Borkmann <danborkmann@googlemail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-07Merge branch 'master' of github.com:davem330/netDavid S. Miller1-1/+4
Conflicts: net/batman-adv/soft-interface.c
2011-10-03make PACKET_STATISTICS getsockopt report consistently between ring and non-ringWillem de Bruijn1-1/+4
This is a minor change. Up until kernel 2.6.32, getsockopt(fd, SOL_PACKET, PACKET_STATISTICS, ...) would return total and dropped packets since its last invocation. The introduction of socket queue overflow reporting [1] changed drop rate calculation in the normal packet socket path, but not when using a packet ring. As a result, the getsockopt now returns different statistics depending on the reception method used. With a ring, it still returns the count since the last call, as counts are incremented in tpacket_rcv and reset in getsockopt. Without a ring, it returns 0 if no drops occurred since the last getsockopt and the total drops over the lifespan of the socket otherwise. The culprit is this line in packet_rcv, executed on a drop: drop_n_acct: po->stats.tp_drops = atomic_inc_return(&sk->sk_drops); As it shows, the new drop number it taken from the socket drop counter, which is not reset at getsockopt. I put together a small example that demonstrates the issue [2]. It runs for 10 seconds and overflows the queue/ring on every odd second. The reported drop rates are: ring: 16, 0, 16, 0, 16, ... non-ring: 0, 15, 0, 30, 0, 46, 0, 60, 0 , 74. Note how the even ring counts monotonically increase. Because the getsockopt adds tp_drops to tp_packets, total counts are similarly reported cumulatively. Long story short, reinstating the original code, as the below patch does, fixes the issue at the cost of additional per-packet cycles. Another solution that does not introduce per-packet overhead is be to keep the current data path, record the value of sk_drops at getsockopt() at call N in a new field in struct packetsock and subtract that when reporting at call N+1. I'll be happy to code that, instead, it's just more messy. [1] http://patchwork.ozlabs.org/patch/35665/ [2] http://kernel.googlecode.com/files/test-packetsock-getstatistics.c Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-15net: consolidate and fix ethtool_ops->get_settings callingJiri Pirko1-25/+27
This patch does several things: - introduces __ethtool_get_settings which is called from ethtool code and from drivers as well. Put ASSERT_RTNL there. - dev_ethtool_get_settings() is replaced by __ethtool_get_settings() - changes calling in drivers so rtnl locking is respected. In iboe_get_rate was previously ->get_settings() called unlocked. This fixes it. Also prb_calc_retire_blk_tmo() in af_packet.c had the same problem. Also fixed by calling __dev_get_by_index() instead of dev_get_by_index() and holding rtnl_lock for both calls. - introduces rtnl_lock in bnx2fc_vport_create() and fcoe_vport_create() so bnx2fc_if_create() and fcoe_if_create() are called locked as they are from other places. - use __ethtool_get_settings() in bonding code Signed-off-by: Jiri Pirko <jpirko@redhat.com> v2->v3: -removed dev_ethtool_get_settings() -added ASSERT_RTNL into __ethtool_get_settings() -prb_calc_retire_blk_tmo - use __dev_get_by_index() and lock around it and __ethtool_get_settings() call v1->v2: add missing export_symbol Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> [except FCoE bits] Acked-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-26af_packet: Prefixed tpacket_v3 structs to avoid name space collisionchetan loke1-55/+62
structs introduced in tpacket_v3 implementation are prefixed with 'tpacket' to avoid namespace collision. Compile tested. Signed-off-by: Chetan Loke <loke.chetan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>