linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2015-07-27	packet: missing dev_put() in packet_do_bind()	Lars Westerhoff	1	-5/+3
	When binding a PF_PACKET socket, the use count of the bound interface is always increased with dev_hold in dev_get_by_{index,name}. However, when rebound with the same protocol and device as in the previous bind the use count of the interface was not decreased. Ultimately, this caused the deletion of the interface to fail with the following message: unregister_netdevice: waiting for dummy0 to become free. Usage count = 1 This patch moves the dev_put out of the conditional part that was only executed when either the protocol or device changed on a bind. Fixes: 902fefb82ef7 ('packet: improve socket create/bind latency in some cases') Signed-off-by: Lars Westerhoff <lars.westerhoff@newtec.eu> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27	macvtap: fix network header pointer for VLAN tagged pkts	Ivan Vecera	1	-0/+7
	Network header is set with offset ETH_HLEN but it is not true for VLAN (multiple-)tagged and results in checksum issues in lower devices. v2: leave skb->protocol untouched (thx Vlad), comment added v3: moved after skb_probe_transport_header() call (thx Toshiaki) Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27	fib_trie: Drop unnecessary calls to leaf_pull_suffix	Alexander Duyck	1	-4/+0
	It was reported that update_suffix was taking a long time on systems where a large number of leaves were attached to a single node. As it turns out fib_table_flush was calling update_suffix for each leaf that didn't have all of the aliases stripped from it. As a result, on this large node removing one leaf would result in us calling update_suffix for every other leaf on the node. The fix is to just remove the calls to leaf_pull_suffix since they are redundant as we already have a call in resize that will go through and update the suffix length for the node before we exit out of fib_table_flush or fib_table_flush_external. Reported-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Tested-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27	macb: Fix build with macro'ized readl/writel.	David S. Miller	2	-15/+15
	If an architecture defines readl/writel using CPP macros, we get the following kinds of build failure: > > > drivers/net/ethernet/cadence/macb.c:164:1: error: macro "writel" > > > passed 3 arguments, but takes just 2 > macb_or_gem_writel(bp, SA1B, bottom); > ^ Rename the methods so that this doesn't happen. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27	net: fec: Ensure clocks are enabled while using mdio bus	Andrew Lunn	1	-13/+76
	When a switch is attached to the mdio bus, the mdio bus can be used while the interface is not open. If the IPG clock is not enabled, MDIO reads/writes will simply time out. Add support for runtime PM to control this clock. Enable/disable this clock using runtime PM, with open()/close() and mdio read()/write() function triggering runtime PM operations. Since PM is optional, the IPG clock is enabled at probe and is no longer modified by fec_enet_clk_enable(), thus if PM is not enabled in the kernel, it is guaranteed the clock is running when MDIO operations are performed. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Cc: tyler.baker@linaro.org Cc: fabio.estevam@freescale.com Cc: shawn.guo@linaro.org Tested-by: Fabio Estevam <fabio.estevam@freescale.com> Tested-by: Tyler Baker <tyler.baker@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27	net: netcp: Fixes SGMII reset on network interface shutdown	WingMan Kwok	3	-2/+47
	This patch asserts SGMII RTRESET, i.e. resetting the SGMII Tx/Rx logic, during network interface shutdown to avoid having the hardware wedge when shutting down with high incoming traffic rates. This is cleared (brought out of RTRESET) when the interface is brought back up. Signed-off-by: WingMan Kwok <w-kwok2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27	net/macb: convert to kernel doc	Andy Shevchenko	1	-3/+11
	This patch coverts struct description to the kernel doc format. There is no functional change. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27	net/macb: replace macb_count_tx_descriptors() by DIV_ROUND_UP()	Andy Shevchenko	1	-8/+2
	macb_count_tx_descriptors() repeats the generic macro DIV_ROUND_UP(). The patch does a replacement. There is no functional change. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27	net/macb: suppress compiler warnings	Andy Shevchenko	2	-6/+5
	This patch fixes the following warnings: drivers/net/ethernet/cadence/macb.c: In function ‘macb_handle_link_change’: drivers/net/ethernet/cadence/macb.c:266: warning: comparison between signed and unsigned drivers/net/ethernet/cadence/macb.c:267: warning: comparison between signed and unsigned drivers/net/ethernet/cadence/macb.c:291: warning: comparison between signed and unsigned drivers/net/ethernet/cadence/macb.c: In function ‘gem_update_stats’: drivers/net/ethernet/cadence/macb.c:1908: warning: comparison between signed and unsigned drivers/net/ethernet/cadence/macb.c: In function ‘gem_get_ethtool_strings’: drivers/net/ethernet/cadence/macb.c:1988: warning: comparison between signed and unsigned Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27	net/macb: use dev_*() when netdev is not yet registered	Andy Shevchenko	1	-2/+2
	To avoid messages like macb macb.0 (unnamed net_device) (uninitialized): Cadence caps 0x00000000 macb macb.0 (unnamed net_device) (uninitialized): invalid hw address, using random let's use dev_*() macros. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27	net/macb: check if macb_config present	Andy Shevchenko	1	-2/+1
	The commit 98b5a0f4a228 introduces jumbo frame support, but also it assumes that macb_config present which is not always true. The configuration without macb_config fails to boot. Unable to handle kernel NULL pointer dereference at virtual address 00000010 ptbr = 90350000 pgd = 00000000 Oops: Kernel access of bad area, sig: 11 [#1] FRAME_POINTER chip: 0x01f:0x1e82 rev 2 Modules linked in: CPU: 0 PID: 1 Comm: swapper Not tainted 4.2.0-rc3-next-20150723+ #13 task: 91c26000 ti: 91c28000 task.ti: 91c28000 PC is at macb_probe+0x140/0x61c Fixes: 98b5a0f4a228 (net: macb: Add support for jumbo frames) Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27	net/macb: improve big endian CPU support	Andy Shevchenko	2	-44/+87
	The commit a50dad355a53 (net: macb: Add big endian CPU support) converted I/O accessors to readl_relaxed() and writel_relaxed() and consequentially broke MACB driver on AVR32 platforms such as ATNGW100. This patch improves I/O access by checking endiannes first and use the corresponding methods. Fixes: a50dad355a53 (net: macb: Add big endian CPU support) Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27	tcp: fix recv with flags MSG_WAITALL \| MSG_PEEK	Sabrina Dubroca	5	-10/+14
	Currently, tcp_recvmsg enters a busy loop in sk_wait_data if called with flags = MSG_WAITALL \| MSG_PEEK. sk_wait_data waits for sk_receive_queue not empty, but in this case, the receive queue is not empty, but does not contain any skb that we can use. Add a "last skb seen on receive queue" argument to sk_wait_data, so that it sleeps until the receive queue has new skbs. Link: https://bugzilla.kernel.org/show_bug.cgi?id=99461 Link: https://sourceware.org/bugzilla/show_bug.cgi?id=18493 Link: https://bugzilla.redhat.com/show_bug.cgi?id=1205258 Reported-by: Enrico Scholz <rh-bugzilla@ensc.de> Reported-by: Dan Searle <dan@censornet.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27	r8152: don't enable napi before rx ready	hayeswang	1	-2/+4
	Adjust napi_disable() and napi_enable() to avoid r8152_poll() start working before rx ready. Otherwise, it may have race condition for rx_agg. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27	r8152: fix wakeup settings	hayeswang	1	-6/+22
	Avoid the driver to enable WOL if the device doesn't support it. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27	r8152: fix the issue about U1/U2	hayeswang	1	-40/+54
	- Disable U1/U2 during initialization. - Disable lpm when linking is on, and enable it when linking is off. - Disable U1/U2 when enabling runtime suspend. It is possible to let hw stop working, if the U1/U2 request occurs during some situations. The patch is used to avoid it. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-26	net: fec: introduce fec_ptp_stop and use in probe fail path	Lucas Stach	3	-3/+13
	This function frees resources and cancels delayed work item that have been initialized in fec_ptp_init(). Use this to do proper error handling if something goes wrong in probe function after fec_ptp_init has been called. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Acked-by: Fugang Duan <B38611@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-26	net: fec: use managed DMA API functions to allocate BD ring	Lucas Stach	1	-2/+2
	So it gets freed when the device is going away. This fixes a DMA memory leak on driver probe() fail and driver remove(). Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-26	niu: don't count tx error twice in case of headroom realloc fails	Jiri Pirko	1	-3/+1
	Fixes: a3138df9 ("[NIU]: Add Sun Neptune ethernet driver.") Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-26	inet: frags: remove INET_FRAG_EVICTED and use list_evictor for the test	Nikolay Aleksandrov	4	-5/+7
	We can simply remove the INET_FRAG_EVICTED flag to avoid all the flags race conditions with the evictor and use a participation test for the evictor list, when we're at that point (after inet_frag_kill) in the timer there're 2 possible cases: 1. The evictor added the entry to its evictor list while the timer was waiting for the chainlock or 2. The timer unchained the entry and the evictor won't see it In both cases we should be able to see list_evictor correctly due to the sync on the chainlock. Joint work with Florian Westphal. Tested-by: Frank Schreuder <fschreuder@transip.nl> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-26	inet: frag: don't wait for timer deletion when evicting	Florian Westphal	1	-18/+11
	Frank reports 'NMI watchdog: BUG: soft lockup' errors when load is high. Instead of (potentially) unbounded restarts of the eviction process, just skip to the next entry. One caveat is that, when a netns is exiting, a timer may still be running by the time inet_evict_bucket returns. We use the frag memory accounting to wait for outstanding timers, so that when we free the percpu counter we can be sure no running timer will trip over it. Reported-and-tested-by: Frank Schreuder <fschreuder@transip.nl> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-26	inet: frag: change *_frag_mem_limit functions to take netns_frags as argument	Florian Westphal	6	-20/+20
	Followup patch will call it after inet_frag_queue was freed, so q->net doesn't work anymore (but netf = q->net; free(q); mem_limit(netf) would). Tested-by: Frank Schreuder <fschreuder@transip.nl> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-26	inet: frag: don't re-use chainlist for evictor	Florian Westphal	2	-5/+5
	commit 65ba1f1ec0eff ("inet: frags: fix a race between inet_evict_bucket and inet_frag_kill") describes the bug, but the fix doesn't work reliably. Problem is that ->flags member can be set on other cpu without chainlock being held by that task, i.e. the RMW-Cycle can clear INET_FRAG_EVICTED bit after we put the element on the evictor private list. We can crash when walking the 'private' evictor list since an element can be deleted from list underneath the evictor. Join work with Nikolay Alexandrov. Fixes: b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue") Reported-by: Johan Schuijt <johan@transip.nl> Tested-by: Frank Schreuder <fschreuder@transip.nl> Signed-off-by: Nikolay Alexandrov <nikolay@cumulusnetworks.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-26	net: sctp: stop spamming klog with rfc6458, 5.3.2. deprecation warnings	Daniel Borkmann	1	-6/+0
	Back then when we added support for SCTP_SNDINFO/SCTP_RCVINFO from RFC6458 5.3.4/5.3.5, we decided to add a deprecation warning for the (as per RFC deprecated) SCTP_SNDRCV via commit bbbea41d5e53 ("net: sctp: deprecate rfc6458, 5.3.2. SCTP_SNDRCV support"), see [1]. Imho, it was not a good idea, and we should just revert that message for a couple of reasons: 1) It's uapi and therefore set in stone forever. 2) To be able to run on older and newer kernels, an SCTP application would need to probe for both, SCTP_SNDRCV, but also SCTP_SNDINFO/ SCTP_RCVINFO support, so that on older kernels, it can make use of SCTP_SNDRCV, and on newer kernels SCTP_SNDINFO/SCTP_RCVINFO. In my (limited) experience, a lot of SCTP appliances are migrating to newer kernels only ve(ee)ry slowly. 3) Some people don't have the chance to change their applications, f.e. due to proprietary legacy stuff. So, they'll hit this warning in fast path and are stuck with older kernels. But i.e. due to point 1) I really fail to see the benefit of a warning. So just revert that for now, the issue was reported up Jamal. [1] http://thread.gmane.org/gmane.linux.network/321960/ Reported-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Michael Tuexen <tuexen@fh-muenster.de> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-26	net/mlx4_en: Remove BUG_ON assert when checking if ring is full	Ido Shamay	1	-1/+0
	In mlx4_en_is_ring_empty we check if ring surpassed its size. Since the prod and cons indicators are u32, there might be a state where prod wrapped around and cons, making this assert false, although no actual bug exists (other code segment can cope with this state). Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-26	net/mlx4_core: Relieve cpu load average on the port sending flow	Jack Morgenstein	1	-2/+15
	When a port is not attached, the FW requires a longer than usual time to execute the SENSE_PORT command. In the command flow, the wait_for_completion_timeout call used in mlx4_cmd_wait puts the kernel thread into the uninterruptible state during this time. This, in turn, due to the computation method, causes the CPU load average to increase. Fix this by using wait_for_completion_interruptible_timeout() for the SENSE_PORT command, which puts the thread in the interruptible state. In this state, the thread does not contribute to the CPU load average. Treat the interrupted case as if the SENSE_PORT command returned port_type = NONE. Fix suggested by Gideon Naim <gideonn@mellanox.com> and Bart Van Assche <bart.vanassche@sandisk.com>. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-26	net/mlx4_core: Fix wrong index in propagating port change event to VFs	Jack Morgenstein	1	-2/+2
	The port-change event processing in procedure mlx4_eq_int() uses "slave" as the vf_oper array index. Since the value of "slave" is the PF function index, the result is that the PF link state is used for deciding to propagate the event for all the VFs. The VF link state should be used, so the VF function index should be used here. Fixes: 948e306d7d64 ('net/mlx4: Add VF link state support') Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-26	net/mlx4_core: Use sink counter for the VF default as fallback	Or Gerlitz	1	-0/+5
	Some old PF drivers don't let VFs allocate counters, in that case, use the sink counter so the VF can load and operate properly. Fixes: 6de5f7f6a1fa ('net/mlx4_core: Allocate default counter per port') Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-26	bridge: netlink: fix slave_changelink/br_setport race conditions	Nikolay Aleksandrov	1	-1/+9
	Since slave_changelink support was added there have been a few race conditions when using br_setport() since some of the port functions it uses require the bridge lock. It is very easy to trigger a lockup due to some internal spin_lock() usage without bh disabled, also it's possible to get the bridge into an inconsistent state. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Fixes: 3ac636b8591c ("bridge: implement rtnl_link_ops->slave_changelink") Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-25	cgroup: net_cls: fix false-positive "suspicious RCU usage"	Konstantin Khlebnikov	1	-1/+2
	In dev_queue_xmit() net_cls protected with rcu-bh. [ 270.730026] =============================== [ 270.730029] [ INFO: suspicious RCU usage. ] [ 270.730033] 4.2.0-rc3+ #2 Not tainted [ 270.730036] ------------------------------- [ 270.730040] include/linux/cgroup.h:353 suspicious rcu_dereference_check() usage! [ 270.730041] other info that might help us debug this: [ 270.730043] rcu_scheduler_active = 1, debug_locks = 1 [ 270.730045] 2 locks held by dhclient/748: [ 270.730046] #0: (rcu_read_lock_bh){......}, at: [<ffffffff81682b70>] __dev_queue_xmit+0x50/0x960 [ 270.730085] #1: (&qdisc_tx_lock){+.....}, at: [<ffffffff81682d60>] __dev_queue_xmit+0x240/0x960 [ 270.730090] stack backtrace: [ 270.730096] CPU: 0 PID: 748 Comm: dhclient Not tainted 4.2.0-rc3+ #2 [ 270.730098] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011 [ 270.730100] 0000000000000001 ffff8800bafeba58 ffffffff817ad487 0000000000000007 [ 270.730103] ffff880232a0a780 ffff8800bafeba88 ffffffff810ca4f2 ffff88022fb23e00 [ 270.730105] ffff880232a0a780 ffff8800bafebb68 ffff8800bafebb68 ffff8800bafebaa8 [ 270.730108] Call Trace: [ 270.730121] [<ffffffff817ad487>] dump_stack+0x4c/0x65 [ 270.730148] [<ffffffff810ca4f2>] lockdep_rcu_suspicious+0xe2/0x120 [ 270.730153] [<ffffffff816a62d2>] task_cls_state+0x92/0xa0 [ 270.730158] [<ffffffffa00b534f>] cls_cgroup_classify+0x4f/0x120 [cls_cgroup] [ 270.730164] [<ffffffff816aac74>] tc_classify_compat+0x74/0xc0 [ 270.730166] [<ffffffff816ab573>] tc_classify+0x33/0x90 [ 270.730170] [<ffffffffa00bcb0a>] htb_enqueue+0xaa/0x4a0 [sch_htb] [ 270.730172] [<ffffffff81682e26>] __dev_queue_xmit+0x306/0x960 [ 270.730174] [<ffffffff81682b70>] ? __dev_queue_xmit+0x50/0x960 [ 270.730176] [<ffffffff816834a3>] dev_queue_xmit_sk+0x13/0x20 [ 270.730185] [<ffffffff81787770>] dev_queue_xmit+0x10/0x20 [ 270.730187] [<ffffffff8178b91c>] packet_snd.isra.62+0x54c/0x760 [ 270.730190] [<ffffffff8178be25>] packet_sendmsg+0x2f5/0x3f0 [ 270.730203] [<ffffffff81665245>] ? sock_def_readable+0x5/0x190 [ 270.730210] [<ffffffff817b64bb>] ? _raw_spin_unlock+0x2b/0x40 [ 270.730216] [<ffffffff8173bcbc>] ? unix_dgram_sendmsg+0x5cc/0x640 [ 270.730219] [<ffffffff8165f367>] sock_sendmsg+0x47/0x50 [ 270.730221] [<ffffffff8165f42f>] sock_write_iter+0x7f/0xd0 [ 270.730232] [<ffffffff811fd4c7>] __vfs_write+0xa7/0xf0 [ 270.730234] [<ffffffff811fe5b8>] vfs_write+0xb8/0x190 [ 270.730236] [<ffffffff811fe8c2>] SyS_write+0x52/0xb0 [ 270.730239] [<ffffffff817b6bae>] entry_SYSCALL_64_fastpath+0x12/0x76 Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-24	sch_choke: drop all packets in queue during reset	WANG Cong	1	-0/+13
	Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-24	sch_plug: purge buffered packets during reset	WANG Cong	1	-0/+1
	Otherwise the skbuff related structures are not correctly refcount'ed. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-24	ipv4: consider TOS in fib_select_default	Julian Anastasov	5	-15/+30
	fib_select_default considers alternative routes only when res->fi is for the first alias in res->fa_head. In the common case this can happen only when the initial lookup matches the first alias with highest TOS value. This prevents the alternative routes to require specific TOS. This patch solves the problem as follows: - routes that require specific TOS should be returned by fib_select_default only when TOS matches, as already done in fib_table_lookup. This rule implies that depending on the TOS we can have many different lists of alternative gateways and we have to keep the last used gateway (fa_default) in first alias for the TOS instead of using single tb_default value. - as the aliases are ordered by many keys (TOS desc, fib_priority asc), we restrict the possible results to routes with matching TOS and lowest metric (fib_priority) and routes that match any TOS, again with lowest metric. For example, packet with TOS 8 can not use gw3 (not lowest metric), gw4 (different TOS) and gw6 (not lowest metric), all other gateways can be used: tos 8 via gw1 metric 2 <--- res->fa_head and res->fi tos 8 via gw2 metric 2 tos 8 via gw3 metric 3 tos 4 via gw4 tos 0 via gw5 tos 0 via gw6 metric 1 Reported-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-24	ipv4: fib_select_default should match the prefix	Julian Anastasov	1	-0/+5
	fib_trie starting from 4.1 can link fib aliases from different prefixes in same list. Make sure the alternative gateways are in same table and for same prefix (0) by checking tb_id and fa_slen. Fixes: 79e5ad2ceb00 ("fib_trie: Remove leaf_info") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-23	Bluetooth: Fix NULL pointer dereference in smp_conn_security	Johan Hedberg	1	-0/+4
	The l2cap_conn->smp pointer may be NULL for various valid reasons where SMP has failed to initialize properly. One such scenario is when crypto support is missing, another when the adapter has been powered on through a legacy method. The smp_conn_security() function should have the appropriate check for this situation to avoid NULL pointer dereferences. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org # 4.0+
2015-07-22	netfilter: nf_conntrack: Support expectations in different zones	Joe Stringer	1	-1/+2
	When zones were originally introduced, the expectation functions were all extended to perform lookup using the zone. However, insertion was not modified to check the zone. This means that two expectations which are intended to apply for different connections that have the same tuple but exist in different zones cannot both be tracked. Fixes: 5d0aa2ccd4 (netfilter: nf_conntrack: add support for "conntrack zones") Signed-off-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-07-22	ARM64/irq: Use access helper irq_data_get_affinity_mask()	Jiang Liu	1	-2/+2
	This is a preparatory patch for moving irq_data struct members. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org> Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-07-22	arm64: switch_to: calculate cpu context pointer using separate register	Will Deacon	1	-2/+3
	Commit 0c8c0f03e3a2 ("x86/fpu, sched: Dynamically allocate 'struct fpu'") moved the thread_struct to the bottom of task_struct. As a result, the offset is now too large to be used in an immediate add on arm64 with some kernel configs: arch/arm64/kernel/entry.S: Assembler messages: arch/arm64/kernel/entry.S:588: Error: immediate out of range arch/arm64/kernel/entry.S:597: Error: immediate out of range This patch calculates the offset using an additional register instead of an immediate offset. Fixes: 0c8c0f03e3a2 ("x86/fpu, sched: Dynamically allocate 'struct fpu'") Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Olof Johansson <olof@lixom.net> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-07-21	ravb: fix ring memory allocation	Sergei Shtylyov	1	-25/+34
	The driver is written as if it can adapt to a low memory situation allocating less RX skbs and TX aligned buffers than the respective RX/TX ring sizes. In reality though the driver would malfunction in this case. Stop being overly smart and just fail in such situation -- this is achieved by moving the memory allocation from ravb_ring_format() to ravb_ring_init(). We leave dma_map_single() calls in place but make their failure non-fatal by marking the corresponding RX descriptors with zero data size which should prevent DMA to an invalid addresses. Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21	net: phy: dp83867: Fix warning check for setting the internal delay	Dan Murphy	1	-1/+1
	Fix warning: logical ‘or’ of collectively exhaustive tests is always true Change the internal delay check from an 'or' condition to an 'and' condition. Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Dan Murphy <dmurphy@ti.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21	openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes	Chris J Arges	1	-1/+1
	Some architectures like POWER can have a NUMA node_possible_map that contains sparse entries. This causes memory corruption with openvswitch since it allocates flow_cache with a multiple of num_possible_nodes() and assumes the node variable returned by for_each_node will index into flow->stats[node]. Use nr_node_ids to allocate a maximal sparse array instead of num_possible_nodes(). The crash was noticed after 3af229f2 was applied as it changed the node_possible_map to match node_online_map on boot. Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861 Signed-off-by: Chris J Arges <chris.j.arges@canonical.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21	netlink: don't hold mutex in rcu callback when releasing mmapd ring	Florian Westphal	1	-32/+47
	Kirill A. Shutemov says: This simple test-case trigers few locking asserts in kernel: int main(int argc, char *argv) { unsigned int block_size = 16 4096; struct nl_mmap_req req = { .nm_block_size = block_size, .nm_block_nr = 64, .nm_frame_size = 16384, .nm_frame_nr = 64 * block_size / 16384, }; unsigned int ring_size; int fd; fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC); if (setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &req, sizeof(req)) < 0) exit(1); if (setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &req, sizeof(req)) < 0) exit(1); ring_size = req.nm_block_nr * req.nm_block_size; mmap(NULL, 2 * ring_size, PROT_READ\|PROT_WRITE, MAP_SHARED, fd, 0); return 0; } +++ exited with 0 +++ BUG: sleeping function called from invalid context at /home/kas/git/public/linux-mm/kernel/locking/mutex.c:616 in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init 3 locks held by init/1: #0: (reboot_mutex){+.+...}, at: [<ffffffff81080959>] SyS_reboot+0xa9/0x220 #1: ((reboot_notifier_list).rwsem){.+.+..}, at: [<ffffffff8107f379>] __blocking_notifier_call_chain+0x39/0x70 #2: (rcu_callback){......}, at: [<ffffffff810d32e0>] rcu_do_batch.isra.49+0x160/0x10c0 Preemption disabled at:[<ffffffff8145365f>] __delay+0xf/0x20 CPU: 1 PID: 1 Comm: init Not tainted 4.1.0-00009-gbddf4c4818e0 #253 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014 ffff88017b3d8000 ffff88027bc03c38 ffffffff81929ceb 0000000000000102 0000000000000000 ffff88027bc03c68 ffffffff81085a9d 0000000000000002 ffffffff81ca2a20 0000000000000268 0000000000000000 ffff88027bc03c98 Call Trace: <IRQ> [<ffffffff81929ceb>] dump_stack+0x4f/0x7b [<ffffffff81085a9d>] ___might_sleep+0x16d/0x270 [<ffffffff81085bed>] __might_sleep+0x4d/0x90 [<ffffffff8192e96f>] mutex_lock_nested+0x2f/0x430 [<ffffffff81932fed>] ? _raw_spin_unlock_irqrestore+0x5d/0x80 [<ffffffff81464143>] ? __this_cpu_preempt_check+0x13/0x20 [<ffffffff8182fc3d>] netlink_set_ring+0x1ed/0x350 [<ffffffff8182e000>] ? netlink_undo_bind+0x70/0x70 [<ffffffff8182fe20>] netlink_sock_destruct+0x80/0x150 [<ffffffff817e484d>] __sk_free+0x1d/0x160 [<ffffffff817e49a9>] sk_free+0x19/0x20 [..] Cong Wang says: We can't hold mutex lock in a rcu callback, [..] Thomas Graf says: The socket should be dead at this point. It might be simpler to add a netlink_release_ring() function which doesn't require locking at all. Reported-by: "Kirill A. Shutemov" <kirill@shutemov.name> Diagnosed-by: Cong Wang <cwang@twopensource.com> Suggested-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21	ARM: net: fix vlan access instructions in ARM JIT.	Nicolas Schichan	1	-3/+5
	This makes BPF_ANC \| SKF_AD_VLAN_TAG and BPF_ANC \| SKF_AD_VLAN_TAG_PRESENT have the same behaviour as the in kernel VM and makes the test_bpf LD_VLAN_TAG and LD_VLAN_TAG_PRESENT tests pass. Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21	ARM: net: handle negative offsets in BPF JIT.	Nicolas Schichan	1	-9/+38
	Previously, the JIT would reject negative offsets known during code generation and mishandle negative offsets provided at runtime. Fix that by calling bpf_internal_load_pointer_neg_helper() appropriately in the jit_get_skb_{b,h,w} slow path helpers and by forcing the execution flow to the slow path helpers when the offset is negative. Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21	ARM: net: fix condition for load_order > 0 when translating load instructions.	Nicolas Schichan	1	-1/+1
	To check whether the load should take the fast path or not, the code would check that (r_skb_hlen - load_order) is greater than the offset of the access using an "Unsigned higher or same" condition. For halfword accesses and an skb length of 1 at offset 0, that test is valid, as we end up comparing 0xffffffff(-1) and 0, so the fast path is taken and the filter allows the load to wrongly succeed. A similar issue exists for word loads at offset 0 and an skb length of less than 4. Fix that by using the condition "Signed greater than or equal" condition for the fast path code for load orders greater than 0. Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21	tcp: suppress a division by zero warning	Eric Dumazet	1	-6/+5
	Andrew Morton reported following warning on one ARM build with gcc-4.4 : net/ipv4/inet_hashtables.c: In function 'inet_ehash_locks_alloc': net/ipv4/inet_hashtables.c:617: warning: division by zero Even guarded with a test on sizeof(spinlock_t), compiler does not like current construct on a !CONFIG_SMP build. Remove the warning by using a temporary variable. Fixes: 095dc8e0c368 ("tcp: fix/cleanup inet_ehash_locks_alloc()") Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21	Revert "fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()"	Linus Torvalds	1	-14/+20
	This reverts commit a2673b6e040663bf16a552f8619e6bde9f4b9acf. Kinglong Mee reports a memory leak with that patch, and Jan Kara confirms: "Thanks for report! You are right that my patch introduces a race between fsnotify kthread and fsnotify_destroy_group() which can result in leaking inotify event on group destruction. I haven't yet decided whether the right fix is not to queue events for dying notification group (as that is pointless anyway) or whether we should just fix the original problem differently... Whenever I look at fsnotify code mark handling I get lost in the maze of locks, lists, and subtle differences between how different notification systems handle notification marks :( I'll think about it over night" and after thinking about it, Jan says: "OK, I have looked into the code some more and I found another relatively simple way of fixing the original oops. It will be IMHO better than trying to fixup this issue which has more potential for breakage. I'll ask Linus to revert the fsnotify fix he already merged and send a new fix" Reported-by: Kinglong Mee <kinglongmee@gmail.com> Requested-by: Jan Kara <jack@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-07-21	drivers: net: cpsw: remove tx event processing in rx napi poll	Mugunthan V N	1	-6/+3
	With commit c03abd84634d ("net: ethernet: cpsw: don't requests IRQs we don't use") common isr and napi are separated into separate tx isr and rx isr/napi, but still in rx napi tx events are handled. So removing the tx event handling in rx napi. Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21	inet: frags: fix defragmented packet's IP header for af_packet	Edward Hyunkoo Jee	1	-2/+4
	When ip_frag_queue() computes positions, it assumes that the passed sk_buff does not contain L2 headers. However, when PACKET_FANOUT_FLAG_DEFRAG is used, IP reassembly functions can be called on outgoing packets that contain L2 headers. Also, IPv4 checksum is not corrected after reassembly. Fixes: 7736d33f4262 ("packet: Add pre-defragmentation support for ipv4 fanouts.") Signed-off-by: Edward Hyunkoo Jee <edjee@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Jerry Chu <hkchu@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-21	net: mvneta: fix refilling for Rx DMA buffers	Simon Guinot	1	-12/+10
	With the actual code, if a memory allocation error happens while refilling a Rx descriptor, then the original Rx buffer is both passed to the networking stack (in a SKB) and let in the Rx ring. This leads to various kernel oops and crashes. As a fix, this patch moves Rx descriptor refilling ahead of building SKB with the associated Rx buffer. In case of a memory allocation failure, data is dropped and the original DMA buffer is put back into the Rx ring. Signed-off-by: Simon Guinot <simon.guinot@sequanux.org> Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network unit") Cc: <stable@vger.kernel.org> # v3.8+ Tested-by: Yoann Sculo <yoann@sculo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>