linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2016-07-25	net/mlx5e: Check the minimum inline header mode before xmit	Hadar Hen Zion	3	-4/+53
	Each send queue (SQ) has inline mode that defines the minimal required inline headers in the SQ WQE. Before sending each packet check that the minimum required headers on the WQE are copied. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25	net/sctp: terminate rhashtable walk correctly	Vegard Nossum	1	-0/+1
	I was seeing a lot of these: BUG: sleeping function called from invalid context at mm/slab.h:388 in_atomic(): 0, irqs_disabled(): 0, pid: 14971, name: trinity-c2 Preemption disabled at:[<ffffffff819bcd46>] rhashtable_walk_start+0x46/0x150 [<ffffffff81149abb>] preempt_count_add+0x1fb/0x280 [<ffffffff83295722>] _raw_spin_lock+0x12/0x40 [<ffffffff811aac87>] console_unlock+0x2f7/0x930 [<ffffffff811ab5bb>] vprintk_emit+0x2fb/0x520 [<ffffffff811aba6a>] vprintk_default+0x1a/0x20 [<ffffffff812c171a>] printk+0x94/0xb0 [<ffffffff811d6ed0>] print_stack_trace+0xe0/0x170 [<ffffffff8115835e>] ___might_sleep+0x3be/0x460 [<ffffffff81158490>] __might_sleep+0x90/0x1a0 [<ffffffff8139b823>] kmem_cache_alloc+0x153/0x1e0 [<ffffffff819bca1e>] rhashtable_walk_init+0xfe/0x2d0 [<ffffffff82ec64de>] sctp_transport_walk_start+0x1e/0x60 [<ffffffff82edd8ad>] sctp_transport_seq_start+0x4d/0x150 [<ffffffff8143a82b>] seq_read+0x27b/0x1180 [<ffffffff814f97fc>] proc_reg_read+0xbc/0x180 [<ffffffff813d471b>] __vfs_read+0xdb/0x610 [<ffffffff813d4d3a>] vfs_read+0xea/0x2d0 [<ffffffff813d615b>] SyS_pread64+0x11b/0x150 [<ffffffff8100334c>] do_syscall_64+0x19c/0x410 [<ffffffff832960a5>] return_from_SYSCALL_64+0x0/0x6a [<ffffffffffffffff>] 0xffffffffffffffff Apparently we always need to call rhashtable_walk_stop(), even when rhashtable_walk_start() fails: * rhashtable_walk_start - Start a hash table walk * @iter: Hash table iterator * * Start a hash table walk. Note that we take the RCU lock in all * cases including when we return an error. So you must always call * rhashtable_walk_stop to clean up. otherwise we never call rcu_read_unlock() and we get the splat above. Fixes: 53fa1036 ("sctp: fix some rhashtable functions using in sctp proc/diag") See-also: 53fa1036 ("sctp: fix some rhashtable functions using in sctp proc/diag") See-also: f2dba9c6 ("rhashtable: Introduce rhashtable_walk_*") Cc: Xin Long <lucien.xin@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: stable@vger.kernel.org Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25	net/irda: fix NULL pointer dereference on memory allocation failure	Vegard Nossum	1	-2/+5
	I ran into this: kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 2 PID: 2012 Comm: trinity-c3 Not tainted 4.7.0-rc7+ #19 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 task: ffff8800b745f2c0 ti: ffff880111740000 task.ti: ffff880111740000 RIP: 0010:[<ffffffff82bbf066>] [<ffffffff82bbf066>] irttp_connect_request+0x36/0x710 RSP: 0018:ffff880111747bb8 EFLAGS: 00010286 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000069dd8358 RDX: 0000000000000009 RSI: 0000000000000027 RDI: 0000000000000048 RBP: ffff880111747c00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000069dd8358 R11: 1ffffffff0759723 R12: 0000000000000000 R13: ffff88011a7e4780 R14: 0000000000000027 R15: 0000000000000000 FS: 00007fc738404700(0000) GS:ffff88011af00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc737fdfb10 CR3: 0000000118087000 CR4: 00000000000006e0 Stack: 0000000000000200 ffff880111747bd8 ffffffff810ee611 ffff880119f1f220 ffff880119f1f4f8 ffff880119f1f4f0 ffff88011a7e4780 ffff880119f1f232 ffff880119f1f220 ffff880111747d58 ffffffff82bca542 0000000000000000 Call Trace: [<ffffffff82bca542>] irda_connect+0x562/0x1190 [<ffffffff825ae582>] SYSC_connect+0x202/0x2a0 [<ffffffff825b4489>] SyS_connect+0x9/0x10 [<ffffffff8100334c>] do_syscall_64+0x19c/0x410 [<ffffffff83295ca5>] entry_SYSCALL64_slow_path+0x25/0x25 Code: 41 89 ca 48 89 e5 41 57 41 56 41 55 41 54 41 89 d7 53 48 89 fb 48 83 c7 48 48 89 fa 41 89 f6 48 c1 ea 03 48 83 ec 20 4c 8b 65 10 <0f> b6 04 02 84 c0 74 08 84 c0 0f 8e 4c 04 00 00 80 7b 48 00 74 RIP [<ffffffff82bbf066>] irttp_connect_request+0x36/0x710 RSP <ffff880111747bb8> ---[ end trace 4cda2588bc055b30 ]--- The problem is that irda_open_tsap() can fail and leave self->tsap = NULL, and then irttp_connect_request() almost immediately dereferences it. Cc: stable@vger.kernel.org Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25	sctp: also point GSO head_skb to the sk when it's available	Marcelo Ricardo Leitner	1	-0/+3
	The head skb for GSO packets won't travel through the inner depths of SCTP stack as it doesn't contain any chunks on it. That means skb->sk doesn't get set and then when sctp_recvmsg() calls sctp_inet6_skb_msgname() on the head_skb it panics, as this last needs to check flags at the socket (sp->v4mapped). The fix is to initialize skb->sk for th head skb once we are able to do it. That is, when the first chunk is processed. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25	sctp: fix BH handling on socket backlog	Marcelo Ricardo Leitner	2	-2/+2
	Now that the backlog processing is called with BH enabled, we have to disable BH before taking the socket lock via bh_lock_sock() otherwise it may dead lock: sctp_backlog_rcv() bh_lock_sock(sk); if (sock_owned_by_user(sk)) { if (sk_add_backlog(sk, skb, sk->sk_rcvbuf)) sctp_chunk_free(chunk); else backloged = 1; } else sctp_inq_push(inqueue, chunk); bh_unlock_sock(sk); while sctp_inq_push() was disabling/enabling BH, but enabling BH triggers pending softirq, which then may try to re-lock the socket in sctp_rcv(). [ 219.187215] <IRQ> [ 219.187217] [<ffffffff817ca3e0>] _raw_spin_lock+0x20/0x30 [ 219.187223] [<ffffffffa041888c>] sctp_rcv+0x48c/0xba0 [sctp] [ 219.187225] [<ffffffff816e7db2>] ? nf_iterate+0x62/0x80 [ 219.187226] [<ffffffff816f1b14>] ip_local_deliver_finish+0x94/0x1e0 [ 219.187228] [<ffffffff816f1e1f>] ip_local_deliver+0x6f/0xf0 [ 219.187229] [<ffffffff816f1a80>] ? ip_rcv_finish+0x3b0/0x3b0 [ 219.187230] [<ffffffff816f17a8>] ip_rcv_finish+0xd8/0x3b0 [ 219.187232] [<ffffffff816f2122>] ip_rcv+0x282/0x3a0 [ 219.187233] [<ffffffff810d8bb6>] ? update_curr+0x66/0x180 [ 219.187235] [<ffffffff816abac4>] __netif_receive_skb_core+0x524/0xa90 [ 219.187236] [<ffffffff810d8e00>] ? update_cfs_shares+0x30/0xf0 [ 219.187237] [<ffffffff810d557c>] ? __enqueue_entity+0x6c/0x70 [ 219.187239] [<ffffffff810dc454>] ? enqueue_entity+0x204/0xdf0 [ 219.187240] [<ffffffff816ac048>] __netif_receive_skb+0x18/0x60 [ 219.187242] [<ffffffff816ad1ce>] process_backlog+0x9e/0x140 [ 219.187243] [<ffffffff816ac8ec>] net_rx_action+0x22c/0x370 [ 219.187245] [<ffffffff817cd352>] __do_softirq+0x112/0x2e7 [ 219.187247] [<ffffffff817cc3bc>] do_softirq_own_stack+0x1c/0x30 [ 219.187247] <EOI> [ 219.187248] [<ffffffff810aa1c8>] do_softirq.part.14+0x38/0x40 [ 219.187249] [<ffffffff810aa24d>] __local_bh_enable_ip+0x7d/0x80 [ 219.187254] [<ffffffffa0408428>] sctp_inq_push+0x68/0x80 [sctp] [ 219.187258] [<ffffffffa04190f1>] sctp_backlog_rcv+0x151/0x1c0 [sctp] [ 219.187260] [<ffffffff81692b07>] __release_sock+0x87/0xf0 [ 219.187261] [<ffffffff81692ba0>] release_sock+0x30/0xa0 [ 219.187265] [<ffffffffa040e46d>] sctp_accept+0x17d/0x210 [sctp] [ 219.187266] [<ffffffff810e7510>] ? prepare_to_wait_event+0xf0/0xf0 [ 219.187268] [<ffffffff8172d52c>] inet_accept+0x3c/0x130 [ 219.187269] [<ffffffff8168d7a3>] SYSC_accept4+0x103/0x210 [ 219.187271] [<ffffffff817ca2ba>] ? _raw_spin_unlock_bh+0x1a/0x20 [ 219.187272] [<ffffffff81692bfc>] ? release_sock+0x8c/0xa0 [ 219.187276] [<ffffffffa0413e22>] ? sctp_inet_listen+0x62/0x1b0 [sctp] [ 219.187277] [<ffffffff8168f2d0>] SyS_accept+0x10/0x20 Fixes: 860fbbc343bf ("sctp: prepare for socket backlog behavior change") Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25	hv_netvsc: Fix VF register on bonding devices	Haiyang Zhang	1	-2/+2
	Added a condition to avoid bonding devices with same MAC registering as VF. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25	kcm: remove redundant -ve error check and return path	Colin Ian King	1	-5/+1
	The check for a -ve error is redundant, remove it and just immediately return the return value from the call to seq_open_net. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25	net: ipv6: Always leave anycast and multicast groups on link down	Mike Manning	1	-0/+4
	Default kernel behavior is to delete IPv6 addresses on link down, which entails deletion of the multicast and the subnet-router anycast addresses. These deletions do not happen with sysctl setting to keep global IPv6 addresses on link down, so every link down/up causes an increment of the anycast and multicast refcounts. These bogus refcounts may stop these addrs from being removed on subsequent calls to delete them. The solution is to leave the groups for the multicast and subnet anycast on link down for the callflow when global IPv6 addresses are kept. Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional") Signed-off-by: Mike Manning <mmanning@brocade.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25	libcxgb: remove unused including <linux/version.h>	Wei Yongjun	1	-1/+0
	Remove including <linux/version.h> that don't need it. Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25	sctp: use inet_recvmsg to support sctp RFS well	Xin Long	2	-2/+2
	Commit 486bdee0134c ("sctp: add support for RPS and RFS") saves skb->hash into sk->sk_rxhash so that the inet_* can record it to flow table. But sctp uses sock_common_recvmsg as .recvmsg instead of inet_recvmsg, sock_common_recvmsg doesn't invoke sock_rps_record_flow to record the flow. It may cause that the receiver has no chances to record the flow if it doesn't send msg or poll the socket. So this patch fixes it by using inet_recvmsg as .recvmsg in sctp. Fixes: 486bdee0134c ("sctp: add support for RPS and RFS") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25	macsec: validate ICV length on link creation	Davide Caratti	1	-1/+13
	Test the cipher suite initialization in case ICV length has a value different than its default. If this test fails, creation of a new macsec link will also fail. This avoids situations where further security associations can't be added due to failures of crypto_aead_setauthsize(), caused by unsupported user-provided values of the ICV length. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25	macsec: fix error codes when a SA is created	Davide Caratti	1	-22/+36
	preserve the return value of AEAD functions that are called when a SA is created, to avoid inappropriate display of "RTNETLINK answers: Cannot allocate memory" message. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25	macsec: limit ICV length to 16 octets	Davide Caratti	2	-2/+4
	IEEE 802.1AE-2006 standard recommends that the ICV element in a MACsec frame should not exceed 16 octets: add MACSEC_STD_ICV_LEN in uapi definitions accordingly, and avoid accepting configurations where the ICV length exceeds the standard value. Leave definition of MACSEC_MAX_ICV_LEN unchanged for backwards compatibility with userspace programs. Fixes: dece8d2b78d1 ("uapi: add MACsec bits") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25	bridge: Fix incorrect re-injection of LLDP packets	Ido Schimmel	1	-0/+8
	Commit 8626c56c8279 ("bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict") caused LLDP packets arriving through a bridge port to be re-injected to the Rx path with skb->dev set to the bridge device, but this breaks the lldpad daemon. The lldpad daemon opens a packet socket with protocol set to ETH_P_LLDP for any valid device on the system, which doesn't not include soft devices such as bridge and VLAN. Since packet sockets (ptype_base) are processed in the Rx path after the Rx handler, LLDP packets with skb->dev set to the bridge device never reach the lldpad daemon. Fix this by making the bridge's Rx handler re-inject LLDP packets with RX_HANDLER_PASS, which effectively restores the behaviour prior to the mentioned commit. This means netfilter will never receive LLDP packets coming through a bridge port, as I don't see a way in which we can have okfn() consume the packet without breaking existing behaviour. I've already carried out a similar fix for STP packets in commit 56fae404fb2c ("bridge: Fix incorrect re-injection of STP packets"). Fixes: 8626c56c8279 ("bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Cc: Florian Westphal <fw@strlen.de> Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25	sctp: support ipv6 nonlocal bind	Xin Long	1	-1/+3
	This patch makes sctp support ipv6 nonlocal bind by adding sp->inet.freebind and net->ipv6.sysctl.ip_nonlocal_bind check in sctp_v6_available as what sctp did to support ipv4 nonlocal bind (commit cdac4e077489). Reported-by: Shijoe George <spanjikk@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25	bpf, events: fix offset in skb copy handler	Daniel Borkmann	4	-9/+14
	This patch fixes the __output_custom() routine we currently use with bpf_skb_copy(). I missed that when len is larger than the size of the current handle, we can issue multiple invocations of copy_func, and __output_custom() advances destination but also source buffer by the written amount of bytes. When we have __output_custom(), this is actually wrong since in that case the source buffer points to a non-linear object, in our case an skb, which the copy_func helper is supposed to walk. Therefore, since this is non-linear we thus need to pass the offset into the helper, so that copy_func can use it for extracting the data from the source object. Therefore, adjust the callback signatures properly and pass offset into the skb_header_pointer() invoked from bpf_skb_copy() callback. The __DEFINE_OUTPUT_COPY_BODY() is adjusted to accommodate for two things: i) to pass in whether we should advance source buffer or not; this is a compile-time constant condition, ii) to pass in the offset for __output_custom(), which we do with help of __VA_ARGS__, so everything can stay inlined as is currently. Both changes allow for adapting the __output_* fast-path helpers w/o extra overhead. Fixes: 555c8a8623a3 ("bpf: avoid stack copy and use skb ctx for event output") Fixes: 7e3f977edd0b ("perf, events: add non-linear data support for raw records") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25	net/ncsi: avoid maybe-uninitialized warning	Arnd Bergmann	1	-13/+19
	gcc-4.9 and higher warn about the newly added NSCI code: net/ncsi/ncsi-manage.c: In function 'ncsi_process_next_channel': net/ncsi/ncsi-manage.c:1003:2: error: 'old_state' may be used uninitialized in this function [-Werror=maybe-uninitialized] The warning is a false positive and therefore harmless, but it would be good to avoid it anyway. I have determined that the barrier in the spin_unlock_irqsave() is what confuses gcc to the point that it cannot track whether the variable was unused or not. This rearranges the code in a way that makes it obvious to gcc that old_state is always initialized at the time of use, functionally this should not change anything. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25	cxgb3i, cxgb4i: fix symbol not declared sparse warning	Varun Prakash	2	-3/+3
	Fix following sparse warnings warning: symbol 'cxgb3i_ofld_init' was not declared. Should it be static? warning: symbol 'cxgb4i_cplhandlers' was not declared. Should it be static? warning: symbol 'cxgb4i_ofld_init' was not declared. Should it be static? Signed-off-by: Varun Prakash <varun@chelsio.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25	libcxgb: export ppm release and tagmask set api	Varun Prakash	4	-0/+6
	Export cxgbi_ppm_release() to release ppod manager and cxgbi_tagmask_set() to set tag mask, they are used by cxgb3i, cxgb4i and cxgbit. Signed-off-by: Varun Prakash <varun@chelsio.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25	cxgb3i: add iSCSI DDP support	Varun Prakash	1	-1/+118
	Add iSCSI DDP support in cxgb3i driver using common iSCSI DDP Page Pod Manager. Signed-off-by: Varun Prakash <varun@chelsio.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25	cxgb4i,libcxgbi: add iSCSI DDP support	Varun Prakash	8	-2/+507
	Add iSCSI DDP support in cxgb4i driver using common iSCSI DDP Page Pod Manager. Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25	cxgb3i,cxgb4i,libcxgbi: remove iSCSI DDP support	Varun Prakash	4	-1009/+0
	Remove old ddp code from cxgb3i,cxgb4i,libcxgbi. Next two commits adds DDP support using common iSCSI DDP Page Pod Manager. Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25	libcxgb: add library module for Chelsio drivers	Varun Prakash	9	-26/+81
	Add common library module(libcxgb.ko) for Chelsio drivers to remove duplicate code. Code for iSCSI DDP Page Pod Manager is moved from cxgb4.ko to libcxgb.ko. Earlier only cxgbit.ko was using this code, now cxgb3i and cxgb4i will also use common Page Pod manager code. In future this module will have common connection management and hardware specific code that can be shared by multiple Chelsio drivers. Signed-off-by: Varun Prakash <varun@chelsio.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25	net: bridge: br_set_ageing_time takes a clock_t	Vivien Didelot	2	-2/+2
	Change the ageing_time type in br_set_ageing_time() from u32 to what it is expected to be, i.e. a clock_t. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25	net: bridge: fix br_stp_enable_bridge comment	Vivien Didelot	1	-1/+1
	br_stp_enable_bridge() does take the br->lock spinlock. Fix its wrongly pasted comment and use the same as br_stp_disable_bridge(). Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25	cxgb4/cxgb4vf: Add link mode mask API to cxgb4 and cxgb4vf	Ganesh Goudar	7	-236/+446
	Based on original work by Casey Leedom <leedom@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25	net/bonding: Enforce active-backup policy for IPoIB bonds	Mark Bloch	1	-1/+10
	When using an IPoIB bond currently only active-backup mode is a valid use case and this commit strengthens it. Since commit 2ab82852a270 ("net/bonding: Enable bonding to enslave netdevices not supporting set_mac_address()") was introduced till 4.7-rc1, IPoIB didn't support the set_mac_address ndo, and hence the fail over mac policy always applied to IPoIB bonds. With the introduction of commit 492a7e67ff83 ("IB/IPoIB: Allow setting the device address"), that doesn't hold and practically IPoIB bonds are broken as of that. To fix it, lets go to fail over mac if the device doesn't support the ndo OR this is IPoIB device. As a by-product, this commit also prevents a stack corruption which occurred when trying to copy 20 bytes (IPoIB) device address to a sockaddr struct that has only 16 bytes of storage. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24	mlxsw: spectrum: Add support in matchall mirror TC offloading	Yotam Gigi	2	-1/+507
	This patch offloads port mirroring directives to hw using the matchall TC with action mirror. It includes both the implementation of the ndo_setup_tc function for the spectrum driver and the spectrum hardware offload configuration code. The hardware offload code is basically two new functions which are capable of adding and removing a new mirror ports pair. It is done using the MPAT, MPAR and SBIB registers: - A new Switch-Port Analyzer (SPAN) entry is added using MPAT to the 'to' port. - The 'to' port is bound to the SPAN entry using MPAR register. - In case of egress SPAN, the 'to' port gets a new internal shared buffer using SBIB register. In addition, a new database was added to the mlxsw_sp struct to store all the SPAN entries and their bound ports list. The number of supported SPAN entries is determined by resource query. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24	net/sched: act_mirred: Add helper inlines to access tcf_mirred info.	Yotam Gigi	1	-0/+9
	The helper function is_tcf_mirred_mirror helps finding whether an action struct is of type mirred and is configured to be of type mirror. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24	mlxsw: reg: Add the Monitoring Port Analyzer register	Yotam Gigi	1	-0/+67
	The MPAR register is used to bind ports to a SPAN entry (which was created using MPAT register) and thus mirror their traffic (ingress / egress) to a different port. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24	mlxsw: reg: Add Monitoring Port Analyzer Table register	Yotam Gigi	1	-0/+54
	The MPAT register is used to query and configure the Switch Port Analyzer (SPAN) table. This register is used to configure a port as a mirror output port, while after that a mirrored input port can be bound using MPAR register. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24	mlxsw: reg: Add Shared Buffer Internal Buffer register	Yotam Gigi	1	-0/+41
	The SBIB register configures per port buffer for internal use. This register is used to configure an egress mirror buffer on the egress port which does the mirroring. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24	net/sched: Add match-all classifier hw offloading.	Yotam Gigi	4	-3/+87
	Following the work that have been done on offloading classifiers like u32 and flower, now the match-all classifier hw offloading is possible. if the interface supports tc offloading. To control the offloading, two tc flags have been introduced: skip_sw and skip_hw. Typical usage: tc filter add dev eth25 parent ffff: \ matchall skip_sw \ action mirred egress mirror \ dev eth27 Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24	net/sched: introduce Match-all classifier	Jiri Pirko	4	-0/+270
	The matchall classifier matches every packet and allows the user to apply actions on it. This filter is very useful in usecases where every packet should be matched, for example, packet mirroring (SPAN) can be setup very easily using that filter. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24	mlxsw: pci: Add max span resources to resources query	Nogah Frankel	2	-0/+7
	Add max span resources to resources query. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24	mlxsw: pci: Add resources query implementation.	Nogah Frankel	6	-3/+109
	Add resources query implementation. If exists, query the HW for its builtin resources instead of having them as consts in the code. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24	cdc_ether: Improve ZTE MF823/831/910 handling	Kristian Evensen	1	-0/+51
	The firmware in several ZTE devices (at least the MF823/831/910 modems/mifis) use OS fingerprinting to determine which type of device to export. In addition, these devices export a REST API which can be used to control the type of device. So far, on Linux, the devices have been seen as RNDIS or CDC Ether. When CDC Ether is used, devices of the same type are, as with RNDIS, exported with the same, bogus random MAC address. In addition, the devices (at least on all firmware revisions I have found) use the bogus MAC when sending traffic routed from external networks. And as a final feature, the devices sometimes export the link state incorrectly. There are also references online to several other ZTE devices displaying this behavior, with several different PIDs and MAC addresses. This patch tries to improve the handling of ZTE devices by doing the following: * Create a new driver_info-struct that is used by ZTE devices that do not have an explicit entry in the product table. This struct is the same as the default cdc_ether driver info, but a new bind- and an rx_fixup-function have been added. * In the new bind function, we check if we have read a random MAC from the device. If we have, then we generate a new random MAC address. This will ensure that all devices get a unique MAC. * The rx_fixup-function replaces the destination MAC address in the skb with that of the device. I have not seen a revision of these devices that behaves correctly (i.e., sets the right destination MAC), so I chose not to do any comparison with for example the known, bogus addresses. * The MF823/MF832/MF910 sometimes export cdc carrier on twice on connect (the correct behavior is off then on). Work around this by manually setting carrier to off if an on-notification is received and the NOCARRIER-bit is not set. This change will affect all devices, but it should take care of similar mistakes made by other manufacturers. I tried to think of/look/test for problems/regressions that could be introduced by this behavior, but could not find any. However, my familiarity with this code path is not that great, so there could be something I have overlooked. I have tested this patch with multiple revisions of all three devices, and they behave as expected. In other words, they all got a valid, random MAC, the correct operational state and I can receive/sent traffic without problems. I also tested with some other cdc_ether devices I have and did not find any problems/regressions caused by the two general changes. v3->v4: * Forgot to remove unused variables, sorry about that (thanks David Miller). v2->v3: * I had forgot to remove the random MAC generation from usbnet_cdc_bind() (thanks Oliver). * Rework logic in the ZTE bind-function a bit. v1->v2: * Only generate random MAC for ZTE devices (thanks Oliver Neukum). * Set random MAC and do RX fixup for all ZTE devices that do not have a product-entry, as the bogus MAC have been seen on devices with several different PIDs/MAC addresses. In other words, it seems to be the default behavior of ZTE CDC Ether devices (thanks Lars Melin). Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> Acked-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-23	netfilter: nft_compat: fix crash when related match/target module is removed	Liping Zhang	1	-23/+20
	We "cache" the loaded match/target modules and reuse them, but when the modules are removed, we still point to them. Then we may end up with invalid memory references when using iptables-compat to add rules later. Input the following commands will reproduce the kernel crash: # iptables-compat -A INPUT -j LOG # iptables-compat -D INPUT -j LOG # rmmod xt_LOG # iptables-compat -A INPUT -j LOG BUG: unable to handle kernel paging request at ffffffffa05a9010 IP: [<ffffffff813f783e>] strcmp+0xe/0x30 Call Trace: [<ffffffffa05acc43>] nft_target_select_ops+0x83/0x1f0 [nft_compat] [<ffffffffa058a177>] nf_tables_expr_parse+0x147/0x1f0 [nf_tables] [<ffffffffa058e541>] nf_tables_newrule+0x301/0x810 [nf_tables] [<ffffffff8141ca00>] ? nla_parse+0x20/0x100 [<ffffffffa057fa8f>] nfnetlink_rcv+0x33f/0x53d [nfnetlink] [<ffffffffa057f94b>] ? nfnetlink_rcv+0x1fb/0x53d [nfnetlink] [<ffffffff817116b8>] netlink_unicast+0x178/0x220 [<ffffffff81711a5b>] netlink_sendmsg+0x2fb/0x3a0 [<ffffffff816b7fc8>] sock_sendmsg+0x38/0x50 [<ffffffff816b8a7e>] ___sys_sendmsg+0x28e/0x2a0 [<ffffffff816bcb7e>] ? release_sock+0x1e/0xb0 [<ffffffff81804ac5>] ? _raw_spin_unlock_bh+0x35/0x40 [<ffffffff816bcbe2>] ? release_sock+0x82/0xb0 [<ffffffff816b93d4>] __sys_sendmsg+0x54/0x90 [<ffffffff816b9422>] SyS_sendmsg+0x12/0x20 [<ffffffff81805172>] entry_SYSCALL_64_fastpath+0x1a/0xa9 So when nobody use the related match/target module, there's no need to "cache" it. And nft_[match\|target]_release are useless anymore, remove them. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-23	netfilter: nft_compat: put back match/target module if init fail	Liping Zhang	1	-8/+24
	If the user specify the invalid NFTA_MATCH_INFO/NFTA_TARGET_INFO attr or memory alloc fail, we should call module_put to the related match or target. Otherwise, we cannot remove the module even nobody use it. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-23	netfilter: h323: Use mod_timer instead of set_expect_timeout	Gao Feng	1	-14/+1
	Simplify the code without any side effect. The set_expect_timeout is used to modify the timer expired time. It tries to delete timer, and add it again. So we could use mod_timer directly. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-23	pps: do not crash when failed to register	Jiri Slaby	1	-1/+1
	With this command sequence: modprobe plip modprobe pps_parport rmmod pps_parport the partport_pps modules causes this crash: BUG: unable to handle kernel NULL pointer dereference at (null) IP: parport_detach+0x1d/0x60 [pps_parport] Oops: 0000 [#1] SMP ... Call Trace: parport_unregister_driver+0x65/0xc0 [parport] SyS_delete_module+0x187/0x210 The sequence that builds up to this is: 1) plip is loaded and takes the parport device for exclusive use: plip0: Parallel port at 0x378, using IRQ 7. 2) pps_parport then fails to grab the device: pps_parport: parallel port PPS client parport0: cannot grant exclusive access for device pps_parport pps_parport: couldn't register with parport0 3) rmmod of pps_parport is then killed because it tries to access pardev->name, but pardev (taken from port->cad) is NULL. So add a check for NULL in the test there too. Link: http://lkml.kernel.org/r/20160714115245.12651-1-jslaby@suse.cz Signed-off-by: Jiri Slaby <jslaby@suse.cz> Acked-by: Rodolfo Giometti <giometti@enneenne.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-23	tools/vm/slabinfo: fix an unintentional printf	Dan Carpenter	1	-1/+2
	The curly braces are missing here so we print stuff unintentionally. Fixes: 9da4714a2d44 ('slub: slabinfo update for cmpxchg handling') Link: http://lkml.kernel.org/r/20160715211243.GE19522@mwanda Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: Laura Abbott <labbott@fedoraproject.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-23	testing/radix-tree: fix a macro expansion bug	Dan Carpenter	1	-1/+1
	There are no parentheses around this macro and it causes a problem when we do: index = rand() % THRASH_SIZE; Link: http://lkml.kernel.org/r/20160715210953.GC19522@mwanda Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-23	radix-tree: fix radix_tree_iter_retry() for tagged iterators.	Andrey Ryabinin	1	-0/+1
	radix_tree_iter_retry() resets slot to NULL, but it doesn't reset tags. Then NULL slot and non-zero iter.tags passed to radix_tree_next_slot() leading to crash: RIP: radix_tree_next_slot include/linux/radix-tree.h:473 find_get_pages_tag+0x334/0x930 mm/filemap.c:1452 .... Call Trace: pagevec_lookup_tag+0x3a/0x80 mm/swap.c:960 mpage_prepare_extent_to_map+0x321/0xa90 fs/ext4/inode.c:2516 ext4_writepages+0x10be/0x2b20 fs/ext4/inode.c:2736 do_writepages+0x97/0x100 mm/page-writeback.c:2364 __filemap_fdatawrite_range+0x248/0x2e0 mm/filemap.c:300 filemap_write_and_wait_range+0x121/0x1b0 mm/filemap.c:490 ext4_sync_file+0x34d/0xdb0 fs/ext4/fsync.c:115 vfs_fsync_range+0x10a/0x250 fs/sync.c:195 vfs_fsync fs/sync.c:209 do_fsync+0x42/0x70 fs/sync.c:219 SYSC_fdatasync fs/sync.c:232 SyS_fdatasync+0x19/0x20 fs/sync.c:230 entry_SYSCALL_64_fastpath+0x23/0xc1 arch/x86/entry/entry_64.S:207 We must reset iterator's tags to bail out from radix_tree_next_slot() and go to the slow-path in radix_tree_next_chunk(). Fixes: 46437f9a554f ("radix-tree: fix race in gang lookup") Link: http://lkml.kernel.org/r/1468495196-10604-1-git-send-email-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-23	mm: memcontrol: fix cgroup creation failure after many small jobs	Johannes Weiner	3	-24/+87
	The memory controller has quite a bit of state that usually outlives the cgroup and pins its CSS until said state disappears. At the same time it imposes a 16-bit limit on the CSS ID space to economically store IDs in the wild. Consequently, when we use cgroups to contain frequent but small and short-lived jobs that leave behind some page cache, we quickly run into the 64k limitations of outstanding CSSs. Creating a new cgroup fails with -ENOSPC while there are only a few, or even no user-visible cgroups in existence. Although pinning CSSs past cgroup removal is common, there are only two instances that actually need an ID after a cgroup is deleted: cache shadow entries and swapout records. Cache shadow entries reference the ID weakly and can deal with the CSS having disappeared when it's looked up later. They pose no hurdle. Swap-out records do need to pin the css to hierarchically attribute swapins after the cgroup has been deleted; though the only pages that remain swapped out after offlining are tmpfs/shmem pages. And those references are under the user's control, so they are manageable. This patch introduces a private 16-bit memcg ID and switches swap and cache shadow entries over to using that. This ID can then be recycled after offlining when the CSS remains pinned only by objects that don't specifically need it. This script demonstrates the problem by faulting one cache page in a new cgroup and deleting it again: set -e mkdir -p pages for x in `seq 128000`; do [ $((x % 1000)) -eq 0 ] && echo $x mkdir /cgroup/foo echo $$ >/cgroup/foo/cgroup.procs echo trex >pages/$x echo $$ >/cgroup/cgroup.procs rmdir /cgroup/foo done When run on an unpatched kernel, we eventually run out of possible IDs even though there are no visible cgroups: [root@ham ~]# ./cssidstress.sh [...] 65000 mkdir: cannot create directory '/cgroup/foo': No space left on device After this patch, the IDs get released upon cgroup destruction and the cache and css objects get released once memory reclaim kicks in. [hannes@cmpxchg.org: init the IDR] Link: http://lkml.kernel.org/r/20160621154601.GA22431@cmpxchg.org Fixes: b2052564e66d ("mm: memcontrol: continue cache reclaim from offlined groups") Link: http://lkml.kernel.org/r/20160617162516.GD19084@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: John Garcia <john.garcia@mesosphere.io> Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Nikolay Borisov <kernel@kyup.com> Cc: <stable@vger.kernel.org> [3.19+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-22	netfilter: connlabels: move set helper to xt_connlabel	Florian Westphal	3	-32/+16
	xt_connlabel is the only user so move it. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-22	netfilter: conntrack: support a fixed size of 128 distinct labels	Florian Westphal	6	-40/+18
	The conntrack label extension is currently variable-sized, e.g. if only 2 labels are used by iptables rules then the labels->bits[] array will only contain one element. We track size of each label storage area in the 'words' member. But in nftables and openvswitch we always have to ask for worst-case since we don't know what bit will be used at configuration time. As most arches are 64bit we need to allocate 24 bytes in this case: struct nf_conn_labels { u8 words; /* 0 1 / / XXX 7 bytes hole, try to pack / long unsigned bits[2]; / 8 24 */ Make bits a fixed size and drop the words member, it simplifies the code and only increases memory requirements on x86 when less than 64bit labels are required. We still only allocate the extension if its needed. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-22	gpio: tegra: don't auto-enable for COMPILE_TEST	Arnd Bergmann	1	-2/+5
	I stumbled over a build error with COMPILE_TEST and CONFIG_OF disabled: drivers/gpio/gpio-tegra.c: In function 'tegra_gpio_probe': drivers/gpio/gpio-tegra.c:603:9: error: 'struct gpio_chip' has no member named 'of_node' The problem is that the newly added GPIO_TEGRA Kconfig symbol does not have a dependency on CONFIG_OF. However, there is another problem here as the driver gets enabled unconditionally whenever COMPILE_TEST is set. This fixes both problems, by making the symbol user-visible when COMPILE_TEST is set and default-enabled for ARCH_TEGRA=y. As a side-effect, it is now possible to compile-test a Tegra kernel with GPIO support disabled, which is harmless. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 4dd4dd1d2120 ("gpio: tegra: Allow compile test") Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2016-07-22	crypto: rsa-pkcs1pad - fix rsa-pkcs1pad request struct	Herbert Xu	1	-2/+2
	To allow for child request context the struct akcipher_request child_req needs to be at the end of the structure. Cc: stable@vger.kernel.org Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-07-22	ixgbe: cleanup crosstalk fix	Don Skidmore	4	-41/+72
	This patch address a few issues with the initial crosstalk fix. Most important of which is the SDP that indicates the presents of a SFP+ module changes between HW types. With this change that is taken in to consideration It also moves the check closer to the base code that checks link. This makes it so we only need to do the check in one spot. Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>