path: root/drivers/net/bonding
Age  Commit message  Author  Files  Lines
2012-12-07  bonding: Fix check for ethtool get_link operation support  Ben Hutchings  1  -11/+6
Since commit 2c60db037034 ('net: provide a default dev->ethtool_ops') all devices have a non-null ethtool_ops. Test only dev->ethtool_ops->get_link in both places where we care. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-30  bonding: delete migrated IP addresses from the rlb hash table  Jiri Bohac  3  -40/+184
Bonding in balance-alb mode records information from ARP packets passing through the bond in a hash table (rx_hashtbl). In certain situations (e.g. link change of a slave), rlb_update_rx_clients() will send out ARP packets to update ARP caches of other hosts on the network to achieve RX load balancing. The problem is that once an IP address is recorded in the hash table, it stays there indefinitely. If this IP address is migrated to a different host in the network, bonding still sends out ARP packets that poison other systems' ARP caches with invalid information. This patch solves this by looking at all incoming ARP packets, and checking if the source IP address is one of the source addresses stored in the rx_hashtbl. If it is, but the MAC addresses differ, the corresponding hash table entries are removed. Thus, when an IP address is migrated, the first ARP broadcast by its new owner will purge the offending entries of rx_hashtbl. The hash table is hashed by ip_dst. To be able to do the above check efficiently (not walking the whole hash table), we need a reverse mapping (by ip_src). I added three new members in struct rlb_client_info: rx_hashtbl[x].src_first will point to the start of a list of entries for which hash(ip_src) == x. The list is linked with src_next and src_prev. When an incoming ARP packet arrives at rlb_arp_recv(), rlb_purge_src_ip() can quickly walk only the entries on the corresponding lists, i.e. the entries that are likely to contain the offending IP address. To avoid confusion, I renamed these existing fields of struct rlb_client_info: next -> used_next, prev -> used_prev, rx_hashtbl_head -> rx_hashtbl_used_head. (The current linked list is _not_ a list of hash table entries with colliding ip_dst. It's a list of entries that are being used; its purpose is to avoid walking the whole hash table when looking for used entries.) Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
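The reverse mapping described in this message can be modelled roughly as follows. This is a userspace sketch in Python, not the kernel code: the class `RxTable`, the `bucket()` hash, the table size and the `mac_src` field are all assumptions made for illustration, and plain lists stand in for the src_first/src_next/src_prev links.

```python
from collections import defaultdict

TABLE_SIZE = 256  # assumed size, chosen only for this sketch


def bucket(ip):
    """Stand-in hash: fold the four bytes of an IPv4 address together."""
    return (ip ^ (ip >> 8) ^ (ip >> 16) ^ (ip >> 24)) % TABLE_SIZE


class RxTable:
    """Toy rx_hashtbl: primary index by ip_dst, reverse index by ip_src."""

    def __init__(self):
        self.by_dst = {}                    # bucket(ip_dst) -> entry
        self.src_lists = defaultdict(list)  # bucket(ip_src) -> entries

    def record(self, ip_src, ip_dst, mac_src):
        entry = {"ip_src": ip_src, "ip_dst": ip_dst, "mac_src": mac_src}
        old = self.by_dst.get(bucket(ip_dst))
        if old is not None:
            self._unlink(old)               # one entry per ip_dst bucket
        self.by_dst[bucket(ip_dst)] = entry
        self.src_lists[bucket(ip_src)].append(entry)

    def _unlink(self, entry):
        self.src_lists[bucket(entry["ip_src"])].remove(entry)
        del self.by_dst[bucket(entry["ip_dst"])]

    def purge_src_ip(self, ip_src, mac_src):
        """Drop entries that pair ip_src with a MAC other than mac_src.

        Only the list for bucket(ip_src) is walked, never the whole table.
        """
        stale = [e for e in self.src_lists[bucket(ip_src)]
                 if e["ip_src"] == ip_src and e["mac_src"] != mac_src]
        for e in stale:
            self._unlink(e)
        return len(stale)
```

An ARP packet whose source IP matches recorded entries but whose MAC differs removes those entries; a packet with the recorded MAC leaves them alone.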
2012-11-30  bonding: rlb mode of bond should not alter ARP originating via bridge  zheng.li  2  -0/+19
Do not modify or load balance ARP packets passing through balance-alb mode (wherein the ARP did not originate locally, and arrived via a bridge). Modifying pass-through ARP replies causes an incorrect MAC address to be placed into the ARP packet, rendering peers unable to communicate with the actual destination from which the ARP reply originated. Load balancing pass-through ARP requests causes an entry to be created for the peer in the rlb table, and bond_alb_monitor will occasionally issue ARP updates to all peers in the table instructing them as to which MAC address they should communicate with; this occurs when some event sets rx_ntt. In the bridged case, however, the MAC address used for the update would be the MAC of the slave, not the actual source MAC of the originating destination. This would render peers unable to communicate with the destinations beyond the bridge. Signed-off-by: Zheng Li <zheng.x.li@oracle.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28  bonding: in balance-rr mode, set curr_active_slave only if it is up  Michal Kubeček  1  -1/+1
If all slaves of a balance-rr bond with ARP monitor are enslaved with down link state, bond keeps down state even after slaves go up. This is caused by bond_enslave() setting curr_active_slave to first slave not taking into account its link state. As bond_loadbalance_arp_mon() uses curr_active_slave to identify whether slave's down->up transition should update bond's link state, bond stays down even if slaves are up (until first slave goes from up to down at least once). Before commit f31c7937 "bonding: start slaves with link down for ARP monitor", this was masked by slaves always starting in UP state with ARP monitor (and MII monitor not relying on curr_active_slave being NULL if there is no slave up). Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-21  bonding: Bonding driver does not consider the gso_max_size/gso_max_segs setting of slave devices.  Sarveshwar Bandi  1  -0/+7
Patch sets the lowest gso_max_size and gso_max_segs values of the slave devices during enslave and detach. Signed-off-by: Sarveshwar Bandi <sarveshwar.bandi@emulex.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
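A rough model of "take the lowest value across slaves", written as a Python sketch. The function name, the dictionary layout and the two starting limits are assumptions made for illustration; the commit message does not show the code.

```python
# Starting limits are assumed defaults for this sketch, not quoted from the patch.
GSO_MAX_SIZE = 65536
GSO_MAX_SEGS = 65535


def bond_gso_limits(slaves):
    """Return (gso_max_size, gso_max_segs) as the minimum over all slaves.

    Meant to be recomputed whenever a slave is enslaved or detached.
    """
    size, segs = GSO_MAX_SIZE, GSO_MAX_SEGS
    for slave in slaves:
        size = min(size, slave["gso_max_size"])
        segs = min(segs, slave["gso_max_segs"])
    return size, segs
```

With no slaves the sketch falls back to the starting limits, so detaching the most restrictive slave relaxes the bond's limits again.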
2012-11-01  bonding: fix second off-by-one error  nikolay@redhat.com  1  -1/+1
Fix off-by-one error because IFNAMSIZ == 16 and when this code gets executed we stick a NULL byte where we should not. How to reproduce: with CONFIG_CC_STACKPROTECTOR=y (otherwise it may pass by silently) modprobe bonding; echo 1 > /sys/class/net/bond0/bonding/mode; echo "AAAAAAAAAAAAAAAA" > /sys/class/net/bond0/bonding/active_slave; Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Note: Sorry for the second patch but I missed this one while checking the file. You can squash them into one patch. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-01  bonding: fix off-by-one error  nikolay@redhat.com  1  -1/+1
Fix off-by-one error because IFNAMSIZ == 16 and when this code gets executed we stick a NULL byte where we should not. How to reproduce: with CONFIG_CC_STACKPROTECTOR=y (otherwise it may pass by silently) modprobe bonding; echo 1 > /sys/class/net/bond0/bonding/mode; echo "AAAAAAAAAAAAAAAA" > /sys/class/net/bond0/bonding/primary; Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
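The arithmetic behind this off-by-one can be checked with a small Python sketch. It assumes the interface name is read with a width-limited string conversion (in the style of a `%Ns` scan) into a buffer of IFNAMSIZ bytes; the commit message does not show the code, so `bytes_needed()` is a hypothetical helper and not the driver's function.

```python
IFNAMSIZ = 16  # stated in the commit message


def bytes_needed(field_width, user_input):
    """Bytes a width-limited string read stores: the characters plus one NUL."""
    return min(field_width, len(user_input)) + 1


def fits(field_width, user_input):
    """True when the read stays inside an IFNAMSIZ-byte buffer."""
    return bytes_needed(field_width, user_input) <= IFNAMSIZ
```

With the 16-character reproducer string, a width of IFNAMSIZ needs 17 bytes and overruns the buffer by one, while a width of IFNAMSIZ - 1 always leaves room for the terminator.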
2012-10-16  vlan: fix bond/team enslave of vlan challenged slave/port  Jiri Pirko  1  -1/+1
In vlan_uses_dev() check for number of vlan devs rather than existence of vlan_info. The reason is that vlan id 0 is there by default without an appropriate vlan dev on it, which prevented enslaving a vlan challenged dev. Reported-by: Jon Stanley <jstanley@rmrf.net> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-04  bonding: set qdisc_tx_busylock to avoid LOCKDEP splat  Eric Dumazet  1  -0/+2
If a qdisc is installed on a bonding device, its possible to get following lockdep splat under stress : ============================================= [ INFO: possible recursive locking detected ] 3.6.0+ #211 Not tainted --------------------------------------------- ping/4876 is trying to acquire lock: (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830 but task is already holding lock: (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(dev->qdisc_tx_busylock ?: &qdisc_tx_busylock); lock(dev->qdisc_tx_busylock ?: &qdisc_tx_busylock); *** DEADLOCK *** May be due to missing lock nesting notation 6 locks held by ping/4876: #0: (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff815e5030>] raw_sendmsg+0x600/0xc30 #1: (rcu_read_lock_bh){.+....}, at: [<ffffffff815ba4bd>] ip_finish_output+0x12d/0x870 #2: (rcu_read_lock_bh){.+....}, at: [<ffffffff8157a0b0>] dev_queue_xmit+0x0/0x830 #3: (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830 #4: (&bond->lock){++.?..}, at: [<ffffffffa02128c1>] bond_start_xmit+0x31/0x4b0 [bonding] #5: (rcu_read_lock_bh){.+....}, at: [<ffffffff8157a0b0>] dev_queue_xmit+0x0/0x830 stack backtrace: Pid: 4876, comm: ping Not tainted 3.6.0+ #211 Call Trace: [<ffffffff810a0145>] __lock_acquire+0x715/0x1b80 [<ffffffff810a256b>] ? mark_held_locks+0x9b/0x100 [<ffffffff810a1bf2>] lock_acquire+0x92/0x1d0 [<ffffffff8157a191>] ? dev_queue_xmit+0xe1/0x830 [<ffffffff81726b7c>] _raw_spin_lock+0x3c/0x50 [<ffffffff8157a191>] ? dev_queue_xmit+0xe1/0x830 [<ffffffff8106264d>] ? rcu_read_lock_bh_held+0x5d/0x90 [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830 [<ffffffff8157a0b0>] ? 
netdev_pick_tx+0x570/0x570 [<ffffffffa0212a6a>] bond_start_xmit+0x1da/0x4b0 [bonding] [<ffffffff815796d0>] dev_hard_start_xmit+0x240/0x6b0 [<ffffffff81597c6e>] sch_direct_xmit+0xfe/0x2a0 [<ffffffff8157a249>] dev_queue_xmit+0x199/0x830 [<ffffffff8157a0b0>] ? netdev_pick_tx+0x570/0x570 [<ffffffff815ba96f>] ip_finish_output+0x5df/0x870 [<ffffffff815ba4bd>] ? ip_finish_output+0x12d/0x870 [<ffffffff815bb964>] ip_output+0x54/0xf0 [<ffffffff815bad48>] ip_local_out+0x28/0x90 [<ffffffff815bc444>] ip_send_skb+0x14/0x50 [<ffffffff815bc4b2>] ip_push_pending_frames+0x32/0x40 [<ffffffff815e536a>] raw_sendmsg+0x93a/0xc30 [<ffffffff8128d570>] ? selinux_file_send_sigiotask+0x1f0/0x1f0 [<ffffffff8109ddb4>] ? __lock_is_held+0x54/0x80 [<ffffffff815f6730>] ? inet_recvmsg+0x220/0x220 [<ffffffff8109ddb4>] ? __lock_is_held+0x54/0x80 [<ffffffff815f6855>] inet_sendmsg+0x125/0x240 [<ffffffff815f6730>] ? inet_recvmsg+0x220/0x220 [<ffffffff8155cddb>] sock_sendmsg+0xab/0xe0 [<ffffffff810a1650>] ? lock_release_non_nested+0xa0/0x2e0 [<ffffffff810a1650>] ? lock_release_non_nested+0xa0/0x2e0 [<ffffffff8155d18c>] __sys_sendmsg+0x37c/0x390 [<ffffffff81195b2a>] ? fsnotify+0x2ca/0x7e0 [<ffffffff811958e8>] ? fsnotify+0x88/0x7e0 [<ffffffff81361f36>] ? put_ldisc+0x56/0xd0 [<ffffffff8116f98a>] ? fget_light+0x3da/0x510 [<ffffffff8155f6c4>] sys_sendmsg+0x44/0x80 [<ffffffff8172fc22>] system_call_fastpath+0x16/0x1b Avoid this problem using a distinct lock_class_key for bonding devices. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-31  bonding: add some slack to arp monitoring time limits  Jiri Bohac  1  -11/+20
Currently, all the time limits in the bonding ARP monitor are in multiples of arp_interval -- the time interval at which the ARP monitor is periodically scheduled. With a fast network round-trip and a little scheduling latency of the ARP monitor work, a limit of n*delta_in_ticks may effectively mean (n-1)*delta_in_ticks. This is fatal in case of n==1 (the link will stay down forever) and makes the behaviour non-deterministic in all the other cases. Add a delta_in_ticks/2 time slack to all the time limits. Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
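The effect of the added slack can be illustrated with a short Python sketch. The function name and the tick values in the usage note are assumptions for illustration; only the n*delta_in_ticks limit and the delta_in_ticks/2 slack come from the message above.

```python
def within_limit(now, last_event, n, delta_in_ticks, slack=True):
    """True if last_event is recent enough to count against an n-interval limit.

    Without slack the limit is n * delta_in_ticks; with slack, half an
    interval is added so small scheduling delays do not cost a whole interval.
    """
    limit = n * delta_in_ticks
    if slack:
        limit += delta_in_ticks // 2
    return now - last_event <= limit
```

For example, with delta_in_ticks = 100 and n = 1, a reply seen at tick 1 and a monitor run delayed to tick 103 fails the strict check (102 > 100) but passes with slack (102 <= 150).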
2012-08-22  bonding: support for IPv6 transmit hashing  John Eaglesham  1  -26/+63
Currently the "bonding" driver does not support load balancing outgoing traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4) are currently supported; this patch adds transmit hashing for IPv6 (and TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the bonding driver. In addition, bounds checking has been added to all transmit hashing functions. The algorithm chosen (xor'ing the bottom three quads of the source and destination addresses together, then xor'ing each byte of that result into the bottom byte, finally xor'ing with the last bytes of the MAC addresses) was selected after testing almost 400,000 unique IPv6 addresses harvested from server logs. This algorithm had the most even distribution for both big- and little-endian architectures while still using few instructions. Its behavior also attempts to closely match that of the IPv4 algorithm. The IPv6 flow label was intentionally not included in the hash as it appears to be unset in the vast majority of IPv6 traffic sampled, and the current algorithm not using the flow label already offers a very even distribution. Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets, ie, they are not balanced based on layer 4 information. Additionally, IPv6 packets with intermediate headers are not balanced based on layer 4 information. In practice these intermediate headers are not common and this should not cause any problems, and the alternative (a packet-parsing loop and look-up table) seemed slow and complicated for little gain. Tested-by: John Eaglesham <linux@8192.net> Signed-off-by: John Eaglesham <linux@8192.net> Signed-off-by: David S. Miller <davem@davemloft.net>
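The hash described in prose above can be sketched in Python as follows. This follows the wording of the message only: treating the address as four big-endian 32-bit words, keeping the full 32-bit value before the modulo, and the function and parameter names are all assumptions of this sketch rather than the driver's code.

```python
import ipaddress


def _words(addr):
    """Split an IPv6 address string into four 32-bit words, top word first."""
    value = int(ipaddress.IPv6Address(addr))
    return [(value >> shift) & 0xFFFFFFFF for shift in (96, 64, 32, 0)]


def ipv6_xmit_hash(src_ip, dst_ip, src_mac, dst_mac, slave_count):
    """Pick a slave index from IPv6 addresses and the last MAC bytes."""
    s, d = _words(src_ip), _words(dst_ip)
    h = 0
    for i in (1, 2, 3):                    # bottom three words of each address
        h ^= s[i] ^ d[i]
    h ^= (h >> 24) ^ (h >> 16) ^ (h >> 8)  # fold every byte into the bottom byte
    h ^= src_mac[5] ^ dst_mac[5]           # last byte of each MAC address
    return h % slave_count
```

Because every step is an XOR, swapping source and destination gives the same slave, so both directions of a flow hash alike in this model.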
2012-08-22  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  David S. Miller  1  -7/+5
2012-08-14  netpoll: check netpoll tx status on the right device  Amerigo Wang  1  -1/+1
Although this doesn't matter actually, because netpoll_tx_running() doesn't use the parameter, the code will be more readable. For team_dev_queue_xmit() we have to move it down to avoid compile errors. Cc: David Miller <davem@davemloft.net> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14  netpoll: make __netpoll_cleanup non-block  Amerigo Wang  1  -3/+1
Like the previous patch, slave_disable_netpoll() and __netpoll_cleanup() may be called with read_lock() held too, so we should make them non-block, by moving the cleanup and kfree() to call_rcu_bh() callbacks. Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14  netpoll: use GFP_ATOMIC in slave_enable_netpoll() and __netpoll_setup()  Amerigo Wang  1  -3/+3
slave_enable_netpoll() and __netpoll_setup() may be called with read_lock() held, so should use GFP_ATOMIC to allocate memory. Eric suggested to pass gfp flags to __netpoll_setup(). Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14  net: remove netdev_bonding_change()  Amerigo Wang  1  -10/+10
I don't see any benefit in using netdev_bonding_change() over calling call_netdevice_notifiers() directly. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20  bond_sysfs: use real_num_tx_queues rather than params.tx_queue  Jiri Pirko  1  -1/+1
Since the number of tx queues can now be specified during bond instance creation and may therefore differ from params.tx_queues, use real_num_tx_queues for the boundary check. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20  net: rename bond_queue_mapping to slave_dev_queue_mapping  Jiri Pirko  1  -3/+3
As this is going to be used not only by bonding. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20  rtnl: allow to specify different num for rx and tx queue count  Jiri Pirko  1  -6/+8
Also cut out unused function parameters and possible err in return value. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-18  bonding: refine IFF_XMIT_DST_RELEASE capability  Eric Dumazet  1  -0/+5
Some workloads benefit greatly from the IFF_XMIT_DST_RELEASE capability on output net device, avoiding dirtying dst refcount. bonding currently disables IFF_XMIT_DST_RELEASE unconditionally. If all slaves have the IFF_XMIT_DST_RELEASE bit set, then bonding master can also have it in its priv_flags. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
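The "only if all slaves have it" rule amounts to AND-ing the bit across slaves. A minimal Python sketch, assuming a placeholder bit value and a hypothetical helper name (neither is taken from the kernel headers or the patch):

```python
IFF_XMIT_DST_RELEASE = 1 << 10  # placeholder bit position for this sketch


def bond_priv_flags(master_flags, slave_flags):
    """Return master priv_flags with the bit set only if every slave has it."""
    keep = IFF_XMIT_DST_RELEASE
    for flags in slave_flags:
        keep &= flags                  # one slave without the bit clears it
    return (master_flags & ~IFF_XMIT_DST_RELEASE) | keep
```

How a bond with no slaves should behave is not stated in the message; this sketch leaves the bit set in that case, which is an arbitrary choice.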
2012-07-17  netpoll: move np->dev and np->dev_name init into __netpoll_setup()  Jiri Pirko  1  -3/+1
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  David S. Miller  2  -4/+7
Conflicts: net/batman-adv/bridge_loop_avoidance.c net/batman-adv/bridge_loop_avoidance.h net/batman-adv/soft-interface.c net/mac80211/mlme.c With merge help from Antonio Quartulli (batman-adv) and Stephen Rothwell (drivers/net/usb/qmi_wwan.c). The net/mac80211/mlme.c conflict seemed easy enough, accounting for a conversion to some new tracing macros. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-09  bonding: debugfs and network namespaces are incompatible  Eric W. Biederman  1  -1/+1
The bonding debugfs support has been broken in the presence of network namespaces since it has been added. The debugfs support does not handle multiple bonding devices with the same name in different network namespaces. I haven't had any bug reports, and I'm not interested in getting any. Disable the debugfs support when network namespaces are enabled. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-09  bonding: Manage /proc/net/bonding/ entries from the netdev events  Eric W. Biederman  1  -3/+6
It was recently reported that moving a bonding device between network namespaces causes warnings from /proc. It turns out after the move we were trying to add and to remove the /proc/net/bonding entries from the wrong network namespace. Move the bonding /proc registration code into the NETDEV_REGISTER and NETDEV_UNREGISTER events where the proc registration and unregistration will always happen at the right time. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-25  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  David S. Miller  1  -2/+13
Conflicts: drivers/net/usb/qmi_wwan.c net/batman-adv/translation-table.c net/ipv6/route.c qmi_wwan.c resolution provided by Bjørn Mork. batman-adv conflict is dealing merely with the changes of global function names to have a proper subsystem prefix. ipv6's route.c conflict is merely two side-by-side additions of network namespace methods. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-17  bonding: show all the link status of slaves  Amerigo Wang  1  -2/+13
There are four link statuses of a bonding slave, but the procfs code shows a wrong status when using downdelay/updelay: (slave->link == BOND_LINK_UP) ? "up" : "down" It doesn't respect the other two statuses. This patch fixes it. Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
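The difference between the two-way test and a four-way mapping can be sketched in Python. The message names only BOND_LINK_UP; the other three state names, their numeric values and the status strings below are recalled from the bonding sources rather than quoted from this commit, so treat them as assumptions to verify against the tree.

```python
from enum import IntEnum


class BondLink(IntEnum):
    UP = 0    # link is up
    FAIL = 1  # link failed, waiting out downdelay
    DOWN = 2  # link is down
    BACK = 3  # link came back, waiting out updelay


_STATUS = {
    BondLink.UP: "up",
    BondLink.FAIL: "going down",
    BondLink.DOWN: "down",
    BondLink.BACK: "going back",
}


def old_status(link):
    """The two-way test quoted in the message."""
    return "up" if link == BondLink.UP else "down"


def link_status(link):
    """Four-way mapping that keeps the transitional states visible."""
    return _STATUS.get(link, "unknown")
```

In this model the old test reports both transitional states as "down", which is the misreporting the message describes.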
2012-06-13  bonding: drop_monitor aware  Eric Dumazet  3  -13/+13
When packets are dropped in TX path, it's better to use kfree_skb() instead of dev_kfree_skb() to give proper drop_monitor events. Also move the kfree_skb() call after read_unlock() in bond_alb_xmit() and bond_xmit_activebackup(). Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-12  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  David S. Miller  2  -6/+11
Conflicts: MAINTAINERS drivers/net/wireless/iwlwifi/pcie/trans.c The iwlwifi conflict was resolved by keeping the code added in 'net' that turns off the buggy chip feature. The MAINTAINERS conflict was merely overlapping changes, one change updated all the wireless web site URLs and the other changed some GIT trees to be Johannes's instead of John's. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-12  bonding: remove packet cloning in recv_probe()  Eric Dumazet  5  -40/+36
Cloning all packets in input path has a significant cost. Use skb_header_pointer()/skb_copy_bits() instead of pskb_may_pull() so that recv_probe handlers (bond_3ad_lacpdu_recv / bond_arp_rcv / rlb_arp_recv) don't touch input skb. bond_handle_frame() can avoid the skb_clone()/dev_kfree_skb(). Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Jiri Bohac <jbohac@suse.cz> Cc: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Cc: Maciej Żenczykowski <maze@google.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-12  bonding: Fix corrupted queue_mapping  Eric Dumazet  1  -4/+5
In the transmit path of the bonding driver, skb->cb is used to stash the skb->queue_mapping so that the bonding device can set its own queue mapping. This value becomes corrupted since the skb->cb is also used in __dev_xmit_skb. When transmitting through bonding driver, bond_select_queue is called from dev_queue_xmit. In bond_select_queue the original skb->queue_mapping is copied into skb->cb (via bond_queue_mapping) and skb->queue_mapping is overwritten with the bond driver queue. Subsequently in dev_queue_xmit, __dev_xmit_skb is called which writes the packet length into skb->cb, thereby overwriting the stashed queue mapping. In bond_dev_queue_xmit (called from hard_start_xmit), the queue mapping for the skb is set to the stashed value which is now the skb length and hence is an invalid queue for the slave device. If we want to save skb->queue_mapping into skb->cb[], best place is to add a field in struct qdisc_skb_cb, to make sure it won't conflict with other layers (e.g. Qdisc, Infiniband...). This patch also makes sure (struct qdisc_skb_cb)->data is aligned on 8 bytes: netem qdisc for example assumes it can store a u64 in it, without misalignment penalty. Note: we only have 20 bytes left in (struct qdisc_skb_cb)->data[]. The largest user is CHOKe and it fills it. Based on a previous patch from Tom Herbert. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Tom Herbert <therbert@google.com> Cc: John Fastabend <john.r.fastabend@intel.com> Cc: Roland Dreier <roland@kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
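The corruption sequence can be replayed on a byte buffer standing in for skb->cb. This is a Python model only: the buffer size, the offsets (0 for the packet length, 4 for the dedicated queue-mapping field), the little-endian packing and all function names are assumptions for illustration, not the layout of struct qdisc_skb_cb.

```python
import struct

CB_SIZE = 48          # assumed size of the control buffer in this model
PKT_LEN_OFF = 0       # where the qdisc layer stores the packet length
QUEUE_MAP_OFF = 4     # dedicated slot next to pkt_len (the fix)


def qdisc_set_pkt_len(cb, pkt_len):
    """What the qdisc layer does on every transmit: write a 32-bit length."""
    struct.pack_into("<I", cb, PKT_LEN_OFF, pkt_len)


def stash_buggy(cb, queue_mapping):
    """Before the fix: the stash shares offset 0 with pkt_len."""
    struct.pack_into("<H", cb, 0, queue_mapping)


def read_buggy(cb):
    return struct.unpack_from("<H", cb, 0)[0]


def stash_fixed(cb, queue_mapping):
    """After the fix: the stash has its own field and cannot be clobbered."""
    struct.pack_into("<H", cb, QUEUE_MAP_OFF, queue_mapping)


def read_fixed(cb):
    return struct.unpack_from("<H", cb, QUEUE_MAP_OFF)[0]
```

Stashing queue 3, then writing a 1500-byte packet length, makes the shared-offset read return 1500 instead of 3, while the dedicated field still returns 3.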
2012-06-12  bonding:record primary when modify it via sysfs  Weiping Pan  1  -2/+6
If we modify primary via sysfs and it is not a valid slave, we should record it for future use, and this behavior is the same as in bond_check_params(). Signed-off-by: Weiping Pan <wpan@redhat.com> Acked-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  David S. Miller  5  -18/+32
2012-05-13  bonding: Fix LACPDU rx_dropped commit.  David S. Miller  2  -6/+8
I applied the wrong version of Jiri's bonding fix in commit 13a8e0c8cdb43982372bd6c65fb26839c8fd8ce9 ("bonding: don't increase rx_dropped after processing LACPDUs") I applied v3, which introduces warnings I asked him to fix, instead of v4 which properly takes care of those issues. This inter-diffs such that the warnings are now gone. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-10  net, drivers/net: Convert compare_ether_addr_64bits to ether_addr_equal_64bits  Joe Perches  1  -29/+29
Use the new bool function ether_addr_equal_64bits to add some clarity and reduce the likelihood for misuse of compare_ether_addr_64bits for sorting. Done via cocci script:

$ cat compare_ether_addr_64bits.cocci
@@
expression a,b;
@@
- !compare_ether_addr_64bits(a, b)
+ ether_addr_equal_64bits(a, b)

@@
expression a,b;
@@
- compare_ether_addr_64bits(a, b)
+ !ether_addr_equal_64bits(a, b)

@@
expression a,b;
@@
- !ether_addr_equal_64bits(a, b) == 0
+ ether_addr_equal_64bits(a, b)

@@
expression a,b;
@@
- !ether_addr_equal_64bits(a, b) != 0
+ !ether_addr_equal_64bits(a, b)

@@
expression a,b;
@@
- ether_addr_equal_64bits(a, b) == 0
+ !ether_addr_equal_64bits(a, b)

@@
expression a,b;
@@
- ether_addr_equal_64bits(a, b) != 0
+ ether_addr_equal_64bits(a, b)

@@
expression a,b;
@@
- !!ether_addr_equal_64bits(a, b)
+ ether_addr_equal_64bits(a, b)

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-10  drivers/net: Convert compare_ether_addr to ether_addr_equal  Joe Perches  1  -1/+1
Use the new bool function ether_addr_equal to add some clarity and reduce the likelihood for misuse of compare_ether_addr for sorting. Done via cocci script:

$ cat compare_ether_addr.cocci
@@
expression a,b;
@@
- !compare_ether_addr(a, b)
+ ether_addr_equal(a, b)

@@
expression a,b;
@@
- compare_ether_addr(a, b)
+ !ether_addr_equal(a, b)

@@
expression a,b;
@@
- !ether_addr_equal(a, b) == 0
+ ether_addr_equal(a, b)

@@
expression a,b;
@@
- !ether_addr_equal(a, b) != 0
+ !ether_addr_equal(a, b)

@@
expression a,b;
@@
- ether_addr_equal(a, b) == 0
+ !ether_addr_equal(a, b)

@@
expression a,b;
@@
- ether_addr_equal(a, b) != 0
+ ether_addr_equal(a, b)

@@
expression a,b;
@@
- !!ether_addr_equal(a, b)
+ ether_addr_equal(a, b)

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
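The first two rewrite rules rest on a simple equivalence, which a Python model makes explicit. The two functions below are stand-ins that mirror the semantics described in the message (a compare that is zero on equality versus a bool that is true on equality); they are not the kernel implementations.

```python
def compare_ether_addr(a, b):
    """Old style: 0 when the addresses match, non-zero otherwise.

    The non-zero result carries no ordering, so it is not usable for sorting.
    """
    return int(a != b)


def ether_addr_equal(a, b):
    """New style: a plain boolean, true when the addresses match."""
    return a == b
```

Under this model `not compare_ether_addr(a, b)` and `ether_addr_equal(a, b)` always agree, which is the substitution the script performs.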
2012-05-10  bonding: don't increase rx_dropped after processing LACPDUs  Jiri Bohac  3  -12/+24
Since commit 3aba891d, bonding processes LACP frames (802.3ad mode) with bond_handle_frame(). Currently a copy of the skb is made and the original is left to be processed by other rx_handlers and the rest of the network stack by returning RX_HANDLER_ANOTHER. As there is no protocol handler for PKT_TYPE_LACPDU, the frame is dropped and dev->rx_dropped increased. Fix this by making bond_handle_frame() return RX_HANDLER_CONSUMED if bonding has processed the LACP frame. Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-27  bonding: bond_update_speed_duplex() can return void since no callers check its return  Rick Jones  1  -6/+6
As none of the callers of bond_update_speed_duplex (need to) check its return value, there is little point in it returning anything. Signed-off-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-19  bonding: start slaves with link down for ARP monitor  Michal Kubeček  1  -12/+21
Initialize slave device link state as down if ARP monitor is active and netif_carrier_ok() returns zero. Also shift initial value of its last_arp_tx so that it doesn't immediately cause fake detection of "up" state. When ARP monitoring is used, initializing the slave device with up link state can cause ARP monitor to detect link failure before the device is really up (with igb driver, this can take more than two seconds). Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13  bonding: Fixup get_tx_queue() op second arg type.  David S. Miller  1  -1/+1
I missed this when fixing up the warning in the previous commit. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13  rtnetlink & bonding: change args got get_tx_queues  stephen hemminger  1  -5/+2
Change get_tx_queues, drop unused arg/return value real_tx_queues, and use return by value (with error) rather than call by reference. Probably bonding should just change to LLTX and the whole get_tx_queues API could disappear! Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-05  bonding: properly unset current_arp_slave on slave link up  Veaceslav Falico  1  -1/+5
When a slave comes up, we're unsetting the current_arp_slave without removing active flags from it, which can lead to situations where we have more than one slave with active flags in active-backup mode. To avoid this situation we must remove the active flags from a slave before removing it as a current_arp_slave. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-04  net/bonding: correctly proxy slave neigh param setup ndo function  Shlomo Pongratz  1  -8/+43
The current implementation was buggy for slaves that use ndo_neigh_setup, since the networking stack invokes the bonding device ndo entry (from neigh_params_alloc) before any devices are enslaved, and the bonding driver can't further delegate the call at that point in time. As a result when bonding IPoIB devices, the neigh_cleanup hasn't been called. Fix that by deferring the actual call into the slave ndo_neigh_setup from the time the bonding neigh_setup is called. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-04  net/bonding: emit address change event also in bond_release  Shlomo Pongratz  1  -0/+3
commit 7d26bb103c4 "bonding: emit event when bonding changes MAC" didn't take care to emit the NETDEV_CHANGEADDR event in bond_release, where bonding actually changes the mac address (to all zeroes). As a result the neighbours aren't deleted by the core networking code (which does so upon getting that event). Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-02  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  Linus Torvalds  1  -1/+7
Pull networking fixes from David Miller: 1) Provide device string properly for USB i2400m wimax devices, also don't OOPS when providing firmware string. From Phil Sutter. 2) Add support for sh_eth SH7734 chips, from Nobuhiro Iwamatsu. 3) Add another device ID to USB zaurus driver, from Guan Xin. 4) Loop index start in pool vector iterator is wrong causing MAC to not get configured in bnx2x driver, fix from Dmitry Kravkov. 5) EQL driver assumes HZ=100, fix from Eric Dumazet. 6) Now that skb_add_rx_frag() can specify the truesize increment separately, do so in f_phonet and cdc_phonet, also from Eric Dumazet. 7) virtio_net accidently uses net_ratelimit() not only on the kernel warning but also the statistic bump, fix from Rick Jones. 8) ip_route_input_mc() uses fixed init_net namespace, oops, use dev_net(dev) instead. Fix from Benjamin LaHaise. 9) dev_forward_skb() needs to clear the incoming interface index of the SKB so that it looks like a new incoming packet, also from Benjamin LaHaise. 10) iwlwifi mistakenly initializes a channel entry as 2GHZ instead of 5GHZ, fix from Stanislav Yakovlev. 11) Missing kmalloc() return value checks in orinoco, from Santosh Nayak. 12) ath9k doesn't check for HT capabilities in the right way, it is checking ht_supported instead of the ATH9K_HW_CAP_HT flag. Fix from Sujith Manoharan. 13) Fix x86 BPF JIT emission of 16-bit immediate field of AND instructions, from Feiran Zhuang. 14) Avoid infinite loop in GARP code when registering sysfs entries. From David Ward. 15) rose protocol uses memcpy instead of memcmp in a device address comparison, oops. Fix from Daniel Borkmann. 16) Fix build of lpc_eth due to dev_hw_addr_rancom() interface being renamed to eth_hw_addr_random(). From Roland Stigge. 17) Make ipv6 RTM_GETROUTE interpret RTA_IIF attribute the same way that ipv4 does. Fix from Shmulik Ladkani. 18) via-rhine has an inverted bit test, causing suspend/resume regressions. Fix from Andreas Mohr. 
19) RIONET assumes 4K page size, fix from Akinobu Mita. 20) Initialization of imask register in sky2 is buggy, because bits are "or'd" into an uninitialized local variable. Fix from Lino Sanfilippo. 21) Fix FCOE checksum offload handling, from Yi Zou. 22) Fix VLAN processing regression in e1000, from Jiri Pirko. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits) sky2: dont overwrite settings for PHY Quick link tg3: Fix 5717 serdes powerdown problem net: usb: cdc_eem: fix mtu net: sh_eth: fix endian check for architecture independent usb/rtl8150 : Remove duplicated definitions rionet: fix page allocation order of rionet_active via-rhine: fix wait-bit inversion. ipv6: Fix RTM_GETROUTE's interpretation of RTA_IIF to be consistent with ipv4 net: lpc_eth: Fix rename of dev_hw_addr_random net/netfilter/nfnetlink_acct.c: use linux/atomic.h rose_dev: fix memcpy-bug in rose_set_mac_address Fix non TBI PHY access; a bad merge undid bug fix in a previous commit. net/garp: avoid infinite loop if attribute already exists x86 bpf_jit: fix a bug in emitting the 16-bit immediate operand of AND bonding: emit event when bonding changes MAC mac80211: fix oper channel timestamp updation ath9k: Use HW HT capabilites properly MAINTAINERS: adding maintainer for ipw2x00 net: orinoco: add error handling for failed kmalloc(). net/wireless: ipw2x00: fix a typo in wiphy struct initilization ...
2012-03-29  bonding: emit event when bonding changes MAC  Weiping Pan  1  -1/+7
When a bonding device is configured with fail_over_mac=active, we expect to see the MAC address of the new active slave as the source MAC address after failover. But we see that the source MAC address is the MAC address of previous active slave.

Emit NETDEV_CHANGEADDR event when bonding changes its MAC address, in order to let arp_netdev_event flush neighbour cache and route cache.

How to reproduce this bug?

                      -----------hostB----------------
hostA ----- switch ---|-- eth0--bond0(192.168.100.2/24)|
(192.168.100.1/24) \--|-- eth1-/                       |
                      --------------------------------

1 on hostB,
    modprobe bonding mode=1 miimon=500 fail_over_mac=active downdelay=1000 num_grat_arp=1
    ifconfig bond0 192.168.100.2/24 up
    ifenslave bond0 eth0
    ifenslave bond0 eth1

  then eth0 is the active slave, and MAC of bond0 is MAC of eth0.

2 on hostA,
    ping 192.168.100.2

3 on hostB,
    tcpdump -i bond0 -p icmp -XXX

  you will see bond0 uses MAC of eth0 as source MAC in icmp reply.

4 on hostB,
    ifconfig eth0 down
    tcpdump -i bond0 -p icmp -XXX (just keep it running in step 3)

  you will see first bond0 uses MAC of eth1 as source MAC in icmp reply, then it will use MAC of eth0 as source MAC.

Signed-off-by: Weiping Pan <wpan@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
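The mechanism described in this commit message, announcing an address change so that a listener can drop stale cached state, can be sketched in user space as follows. This is only a simulation under stated assumptions, not kernel code: `fake_dev`, `register_listener`, `arp_listener`, `set_mac_and_notify`, `EV_CHANGEADDR` and the `neigh_entries` counter are all hypothetical stand-ins. `arp_listener` plays the role the message assigns to arp_netdev_event.

```c
#include <stddef.h>
#include <string.h>

enum { EV_CHANGEADDR = 1 };   /* hypothetical stand-in for NETDEV_CHANGEADDR */

#define MAX_LISTENERS 4
#define MAC_LEN 6

struct fake_dev {
    unsigned char mac[MAC_LEN];
};

typedef void (*listener_fn)(int event, struct fake_dev *dev);

static listener_fn listeners[MAX_LISTENERS];
static size_t n_listeners;

/* Hypothetical neighbour cache, reduced to a count of cached entries. */
int neigh_entries = 3;

/* Add a callback to the list; returns -1 when the list is full. */
int register_listener(listener_fn fn)
{
    if (n_listeners == MAX_LISTENERS)
        return -1;
    listeners[n_listeners++] = fn;
    return 0;
}

/* Deliver an event to every registered callback, in registration order. */
static void notify(int event, struct fake_dev *dev)
{
    for (size_t i = 0; i < n_listeners; i++)
        listeners[i](event, dev);
}

/* Listener that flushes cached entries when a device address changes. */
void arp_listener(int event, struct fake_dev *dev)
{
    (void)dev;
    if (event == EV_CHANGEADDR)
        neigh_entries = 0;
}

/* Change the MAC and announce it; omitting the announcement is the bug. */
void set_mac_and_notify(struct fake_dev *dev, const unsigned char *mac)
{
    memcpy(dev->mac, mac, MAC_LEN);
    notify(EV_CHANGEADDR, dev);
}
```

If `set_mac_and_notify()` only copied the address and skipped `notify()`, the cached entries would survive the failover. That corresponds to the symptom in step 4, where replies fall back to the old source MAC.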
2012-03-28  Merge tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system  Linus Torvalds  1  -1/+0
Pull "Disintegrate and delete asm/system.h" from David Howells:

"Here are a bunch of patches to disintegrate asm/system.h into a set of separate bits to relieve the problem of circular inclusion dependencies.

I've built all the working defconfigs from all the arches that I can and made sure that they don't break.

The reason for these patches is that I recently encountered a circular dependency problem that came about when I produced some patches to optimise get_order() by rewriting it to use ilog2().

This uses bitops - and on the SH arch asm/bitops.h drags in asm-generic/get_order.h by a circuitous route involving asm/system.h.

The main difficulty seems to be asm/system.h. It holds a number of low level bits with no/few dependencies that are commonly used (eg. memory barriers) and a number of bits with more dependencies that aren't used in many places (eg. switch_to()).

These patches break asm/system.h up into the following core pieces:

 (1) asm/barrier.h
     Move memory barriers here. This is already done for MIPS and Alpha.

 (2) asm/switch_to.h
     Move switch_to() and related stuff here.

 (3) asm/exec.h
     Move arch_align_stack() here. Other process execution related bits could perhaps go here from asm/processor.h.

 (4) asm/cmpxchg.h
     Move xchg() and cmpxchg() here as they're full word atomic ops and frequently used by atomic_xchg() and atomic_cmpxchg().

 (5) asm/bug.h
     Move die() and related bits.

 (6) asm/auxvec.h
     Move AT_VECTOR_SIZE_ARCH here.

Other arch headers are created as needed on a per-arch basis."

Fixed up some conflicts from other header file cleanups and moving code around that has happened in the meantime, so David's testing is somewhat weakened by that. We'll find out anything that got broken and fix it.
* tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits)
  Delete all instances of asm/system.h
  Remove all #inclusions of asm/system.h
  Add #includes needed to permit the removal of asm/system.h
  Move all declarations of free_initmem() to linux/mm.h
  Disintegrate asm/system.h for OpenRISC
  Split arch_align_stack() out from asm-generic/system.h
  Split the switch_to() wrapper out of asm-generic/system.h
  Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h
  Create asm-generic/barrier.h
  Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h
  Disintegrate asm/system.h for Xtensa
  Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt]
  Disintegrate asm/system.h for Tile
  Disintegrate asm/system.h for Sparc
  Disintegrate asm/system.h for SH
  Disintegrate asm/system.h for Score
  Disintegrate asm/system.h for S390
  Disintegrate asm/system.h for PowerPC
  Disintegrate asm/system.h for PA-RISC
  Disintegrate asm/system.h for MN10300
  ...
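The pull request says the circular dependency surfaced while rewriting get_order() in terms of ilog2(). As a rough illustration of why get_order() then depends on the bitops headers, here is a user-space sketch of that relationship. It is not the kernel's implementation: `ilog2_sketch`, `get_order_sketch` and `PAGE_SHIFT_SKETCH` are hypothetical names, and a 4 KiB page size is assumed.

```c
#define PAGE_SHIFT_SKETCH 12u /* assumes 4 KiB pages */

/* floor(log2(n)) for n > 0, computed by counting right shifts. */
unsigned int ilog2_sketch(unsigned long n)
{
    unsigned int log = 0;

    while ((n >>= 1) != 0)
        log++;
    return log;
}

/*
 * Smallest order such that (page size << order) >= size.
 * Sizes up to one page, including 0 in this sketch, map to order 0.
 */
unsigned int get_order_sketch(unsigned long size)
{
    if (size <= (1UL << PAGE_SHIFT_SKETCH))
        return 0;
    return ilog2_sketch(size - 1) - PAGE_SHIFT_SKETCH + 1;
}
```

Subtracting 1 before taking the logarithm makes exact powers of two round down to their own order, while anything one byte larger rounds up to the next order.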
2012-03-28  Remove all #inclusions of asm/system.h  David Howells  1  -1/+0
Remove all #inclusions of asm/system.h preparatory to splitting and killing it. Performed with the following command:

    perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`

Signed-off-by: David Howells <dhowells@redhat.com>
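To make explicit which lines the pattern in the command above selects, the following hand-written matcher mirrors `^#\s*include\s*<asm/system[.]h>` in C. It is only an illustrative sketch and `is_asm_system_include` is a hypothetical helper, not part of the kernel or of the tooling used for the commit.

```c
#include <ctype.h>
#include <string.h>

/*
 * Returns 1 if `line` starts with '#', optional whitespace, "include",
 * optional whitespace, then "<asm/system.h>"; anything may follow.
 * Returns 0 otherwise. The match is anchored at the start of the line,
 * so an indented '#' does not match.
 */
int is_asm_system_include(const char *line)
{
    static const char header[] = "<asm/system.h>";

    if (*line != '#')
        return 0;
    line++;
    while (isspace((unsigned char)*line))
        line++;
    if (strncmp(line, "include", 7) != 0)
        return 0;
    line += 7;
    while (isspace((unsigned char)*line))
        line++;
    return strncmp(line, header, sizeof(header) - 1) == 0;
}
```

The `[.]` in the original pattern matches only a literal dot, which is why the sketch compares against the exact header name rather than accepting any character in that position.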
2012-03-22  bonding: remove entries for master_ip and vlan_ip and query devices instead  Andy Gospodarek  2  -69/+31
The following patch aimed to resolve an issue where secondary, tertiary, etc. addresses added to bond interfaces could overwrite the bond->master_ip and vlan_ip values.

    commit 917fbdb32f37e9a93b00bb12ee83532982982df3
    Author: Henrik Saavedra Persson <henrik.e.persson@ericsson.com>
    Date:   Wed Nov 23 23:37:15 2011 +0000

        bonding: only use primary address for ARP

That patch was good because it prevented bonds using ARP monitoring from sending frames with an invalid source IP address. Unfortunately, it didn't always work as expected.

When using an ioctl (like ifconfig does) to set the IP address and netmask, 2 separate ioctls are actually called to set the IP and netmask if the mask chosen doesn't match the standard mask for that class of address. The first ioctl did not have a mask that matched the one in the primary address and would still cause the device address to be overwritten. The second ioctl that was called to set the mask would then be detected as secondary and ignored, but the damage was already done.

This was not an issue when using an application that used netlink sockets as the setting of IP and netmask came down at once. The inconsistent behavior between those two interfaces was something that needed to be resolved.

While I was thinking about how I wanted to resolve this, Ralf Zeidler came up with a patch that resolved this on a RHEL kernel by keeping a full shadow of the entries in dev->ifa_list for the bonding device and vlan devices in the bonding driver. I didn't like the duplication of the list as I want to see the 'bonding' struct and code shrink rather than grow, but liked the general idea.

As the Subject indicates this patch drops the master_ip and vlan_ip elements from the 'bonding' and 'vlan_entry' structs, respectively. This can be done because a device's address-list is now traversed to determine the optimal source IP address for ARP requests and for checks to see if the bonding device has a particular IP address. This code could all have been contained inside the bonding driver, but it made more sense to me to EXPORT and call inet_confirm_addr since it did exactly what was needed.

I tested this and a backported patch and everything works as expected. Ralf also helped with verification of the backported patch.

Thanks to Ralf for all his help on this.

v2: Whitespace and organizational changes based on suggestions from Jay Vosburgh and Dave Miller.

v3: Fixup incorrect usage of rcu_read_unlock based on Dave Miller's suggestion.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
CC: Ralf Zeidler <ralf.zeidler@nsn.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
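The idea of querying a device's address list on demand, instead of caching a single master_ip, can be sketched in user space as below. This is an assumption-laden illustration and does not reproduce the behaviour of inet_confirm_addr: `addr_entry`, `pick_arp_source` and `device_has_addr` are hypothetical names, addresses are kept in host byte order for simplicity, and the selection policy (prefer a primary address on the target's subnet, else the first primary) is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of a simplified, singly linked per-device address list. */
struct addr_entry {
    uint32_t addr;           /* IPv4 address, host byte order */
    uint32_t mask;           /* netmask, host byte order */
    int secondary;           /* non-zero for secondary addresses */
    struct addr_entry *next;
};

/*
 * Choose a source address for an ARP probe to `target`.
 * Secondary addresses are skipped. A primary address on the target's
 * subnet wins; otherwise the first primary address is used.
 * Returns 0 when the list holds no primary address.
 */
uint32_t pick_arp_source(const struct addr_entry *list, uint32_t target)
{
    uint32_t fallback = 0;

    for (const struct addr_entry *e = list; e != NULL; e = e->next) {
        if (e->secondary)
            continue;
        if ((e->addr & e->mask) == (target & e->mask))
            return e->addr;
        if (fallback == 0)
            fallback = e->addr;
    }
    return fallback;
}

/* Check whether the device owns `addr`, primary or secondary. */
int device_has_addr(const struct addr_entry *list, uint32_t addr)
{
    for (; list != NULL; list = list->next)
        if (list->addr == addr)
            return 1;
    return 0;
}
```

Because both functions read the live list each time, there is no cached copy that a later address change could overwrite or leave stale.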
2012-03-19  bonding: send igmp report for its master  Peter Pan(潘卫平)  1  -3/+15
Liang Zheng (lzheng@redhat.com) found that in the following topology, bonding does not send an igmp report when we trigger a fail-over of bonding.

eth0--
      |-- bond0 -- br0
eth1--

    modprobe bonding mode=1 miimon=100 resend_igmp=10
    ifconfig bond0 up
    ifenslave bond0 eth0 eth1
    brctl addbr br0
    ifconfig br0 192.168.100.2/24 up
    brctl addif br0 bond0

Add 192.168.100.2 (br0) into a multicast group, like 224.10.10.10, then trigger a fail-over in bonding. You can see that parameter "resend_igmp" does not work.

The reason is that when we add br0 into a multicast group, it does not propagate multicast knowledge down to its ports.

If we choose to propagate multicast knowledge down to all ports for bridge, then we have to track every change that is done to bridge, and keep a backup for all ports. It is hard to track, I think.

Instead I choose to modify bonding to send igmp report for its master.

Changelog:
V2: correct comments
V3: move this check into bond_resend_igmp_join_requests()
V4: only send igmp reports if bond is enslaved to a bridge

Signed-off-by: Weiping Pan <panweiping3@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
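The approach chosen in this commit message, resending reports for the bond and additionally for its master when that master is a bridge, can be sketched roughly as follows. This is a user-space illustration under stated assumptions, not the driver's code: `sk_dev`, `send_reports_for` and `resend_igmp_sketch` are hypothetical names, and sending a report is reduced to incrementing a counter.

```c
#include <stddef.h>

/* Simplified device: only the fields this sketch needs. */
struct sk_dev {
    const char *name;
    struct sk_dev *master;  /* device this one is enslaved to, or NULL */
    int is_bridge;          /* non-zero if this device is a bridge */
    int reports_sent;       /* counts report rounds sent for this device */
};

/* Stand-in for resending the multicast memberships held by `dev`. */
void send_reports_for(struct sk_dev *dev)
{
    dev->reports_sent++;
}

/*
 * After a fail-over, resend reports for the bond itself and, only when
 * the bond is enslaved to a bridge, for that bridge as well.
 */
void resend_igmp_sketch(struct sk_dev *bond)
{
    send_reports_for(bond);
    if (bond->master != NULL && bond->master->is_bridge)
        send_reports_for(bond->master);
}
```

The bridge check matches the V4 changelog entry: a bond with no master, or with a non-bridge master, only resends its own memberships.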
2012-02-05  bonding: Fix misspelling of "since"  Jesper Juhl  1  -1/+1
Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: David S. Miller <davem@davemloft.net>