Age | Commit message (Collapse) | Author | Files | Lines |
|
Each send queue (SQ) has inline mode that defines the minimal required
inline headers in the SQ WQE.
Before sending each packet check that the minimum required headers
on the WQE are copied.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I was seeing a lot of these:
BUG: sleeping function called from invalid context at mm/slab.h:388
in_atomic(): 0, irqs_disabled(): 0, pid: 14971, name: trinity-c2
Preemption disabled at:[<ffffffff819bcd46>] rhashtable_walk_start+0x46/0x150
[<ffffffff81149abb>] preempt_count_add+0x1fb/0x280
[<ffffffff83295722>] _raw_spin_lock+0x12/0x40
[<ffffffff811aac87>] console_unlock+0x2f7/0x930
[<ffffffff811ab5bb>] vprintk_emit+0x2fb/0x520
[<ffffffff811aba6a>] vprintk_default+0x1a/0x20
[<ffffffff812c171a>] printk+0x94/0xb0
[<ffffffff811d6ed0>] print_stack_trace+0xe0/0x170
[<ffffffff8115835e>] ___might_sleep+0x3be/0x460
[<ffffffff81158490>] __might_sleep+0x90/0x1a0
[<ffffffff8139b823>] kmem_cache_alloc+0x153/0x1e0
[<ffffffff819bca1e>] rhashtable_walk_init+0xfe/0x2d0
[<ffffffff82ec64de>] sctp_transport_walk_start+0x1e/0x60
[<ffffffff82edd8ad>] sctp_transport_seq_start+0x4d/0x150
[<ffffffff8143a82b>] seq_read+0x27b/0x1180
[<ffffffff814f97fc>] proc_reg_read+0xbc/0x180
[<ffffffff813d471b>] __vfs_read+0xdb/0x610
[<ffffffff813d4d3a>] vfs_read+0xea/0x2d0
[<ffffffff813d615b>] SyS_pread64+0x11b/0x150
[<ffffffff8100334c>] do_syscall_64+0x19c/0x410
[<ffffffff832960a5>] return_from_SYSCALL_64+0x0/0x6a
[<ffffffffffffffff>] 0xffffffffffffffff
Apparently we always need to call rhashtable_walk_stop(), even when
rhashtable_walk_start() fails:
* rhashtable_walk_start - Start a hash table walk
* @iter: Hash table iterator
*
* Start a hash table walk. Note that we take the RCU lock in all
* cases including when we return an error. So you must always call
* rhashtable_walk_stop to clean up.
otherwise we never call rcu_read_unlock() and we get the splat above.
Fixes: 53fa1036 ("sctp: fix some rhashtable functions using in sctp proc/diag")
See-also: 53fa1036 ("sctp: fix some rhashtable functions using in sctp proc/diag")
See-also: f2dba9c6 ("rhashtable: Introduce rhashtable_walk_*")
Cc: Xin Long <lucien.xin@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: stable@vger.kernel.org
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I ran into this:
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 2 PID: 2012 Comm: trinity-c3 Not tainted 4.7.0-rc7+ #19
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
task: ffff8800b745f2c0 ti: ffff880111740000 task.ti: ffff880111740000
RIP: 0010:[<ffffffff82bbf066>] [<ffffffff82bbf066>] irttp_connect_request+0x36/0x710
RSP: 0018:ffff880111747bb8 EFLAGS: 00010286
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000069dd8358
RDX: 0000000000000009 RSI: 0000000000000027 RDI: 0000000000000048
RBP: ffff880111747c00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000069dd8358 R11: 1ffffffff0759723 R12: 0000000000000000
R13: ffff88011a7e4780 R14: 0000000000000027 R15: 0000000000000000
FS: 00007fc738404700(0000) GS:ffff88011af00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc737fdfb10 CR3: 0000000118087000 CR4: 00000000000006e0
Stack:
0000000000000200 ffff880111747bd8 ffffffff810ee611 ffff880119f1f220
ffff880119f1f4f8 ffff880119f1f4f0 ffff88011a7e4780 ffff880119f1f232
ffff880119f1f220 ffff880111747d58 ffffffff82bca542 0000000000000000
Call Trace:
[<ffffffff82bca542>] irda_connect+0x562/0x1190
[<ffffffff825ae582>] SYSC_connect+0x202/0x2a0
[<ffffffff825b4489>] SyS_connect+0x9/0x10
[<ffffffff8100334c>] do_syscall_64+0x19c/0x410
[<ffffffff83295ca5>] entry_SYSCALL64_slow_path+0x25/0x25
Code: 41 89 ca 48 89 e5 41 57 41 56 41 55 41 54 41 89 d7 53 48 89 fb 48 83 c7 48 48 89 fa 41 89 f6 48 c1 ea 03 48 83 ec 20 4c 8b 65 10 <0f> b6 04 02 84 c0 74 08 84 c0 0f 8e 4c 04 00 00 80 7b 48 00 74
RIP [<ffffffff82bbf066>] irttp_connect_request+0x36/0x710
RSP <ffff880111747bb8>
---[ end trace 4cda2588bc055b30 ]---
The problem is that irda_open_tsap() can fail and leave self->tsap = NULL,
and then irttp_connect_request() almost immediately dereferences it.
Cc: stable@vger.kernel.org
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The head skb for GSO packets won't travel through the inner depths of
SCTP stack as it doesn't contain any chunks on it. That means skb->sk
doesn't get set and then when sctp_recvmsg() calls
sctp_inet6_skb_msgname() on the head_skb it panics, as this last needs
to check flags at the socket (sp->v4mapped).
The fix is to initialize skb->sk for th head skb once we are able to do
it. That is, when the first chunk is processed.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that the backlog processing is called with BH enabled, we have to
disable BH before taking the socket lock via bh_lock_sock() otherwise
it may dead lock:
sctp_backlog_rcv()
bh_lock_sock(sk);
if (sock_owned_by_user(sk)) {
if (sk_add_backlog(sk, skb, sk->sk_rcvbuf))
sctp_chunk_free(chunk);
else
backloged = 1;
} else
sctp_inq_push(inqueue, chunk);
bh_unlock_sock(sk);
while sctp_inq_push() was disabling/enabling BH, but enabling BH
triggers pending softirq, which then may try to re-lock the socket in
sctp_rcv().
[ 219.187215] <IRQ>
[ 219.187217] [<ffffffff817ca3e0>] _raw_spin_lock+0x20/0x30
[ 219.187223] [<ffffffffa041888c>] sctp_rcv+0x48c/0xba0 [sctp]
[ 219.187225] [<ffffffff816e7db2>] ? nf_iterate+0x62/0x80
[ 219.187226] [<ffffffff816f1b14>] ip_local_deliver_finish+0x94/0x1e0
[ 219.187228] [<ffffffff816f1e1f>] ip_local_deliver+0x6f/0xf0
[ 219.187229] [<ffffffff816f1a80>] ? ip_rcv_finish+0x3b0/0x3b0
[ 219.187230] [<ffffffff816f17a8>] ip_rcv_finish+0xd8/0x3b0
[ 219.187232] [<ffffffff816f2122>] ip_rcv+0x282/0x3a0
[ 219.187233] [<ffffffff810d8bb6>] ? update_curr+0x66/0x180
[ 219.187235] [<ffffffff816abac4>] __netif_receive_skb_core+0x524/0xa90
[ 219.187236] [<ffffffff810d8e00>] ? update_cfs_shares+0x30/0xf0
[ 219.187237] [<ffffffff810d557c>] ? __enqueue_entity+0x6c/0x70
[ 219.187239] [<ffffffff810dc454>] ? enqueue_entity+0x204/0xdf0
[ 219.187240] [<ffffffff816ac048>] __netif_receive_skb+0x18/0x60
[ 219.187242] [<ffffffff816ad1ce>] process_backlog+0x9e/0x140
[ 219.187243] [<ffffffff816ac8ec>] net_rx_action+0x22c/0x370
[ 219.187245] [<ffffffff817cd352>] __do_softirq+0x112/0x2e7
[ 219.187247] [<ffffffff817cc3bc>] do_softirq_own_stack+0x1c/0x30
[ 219.187247] <EOI>
[ 219.187248] [<ffffffff810aa1c8>] do_softirq.part.14+0x38/0x40
[ 219.187249] [<ffffffff810aa24d>] __local_bh_enable_ip+0x7d/0x80
[ 219.187254] [<ffffffffa0408428>] sctp_inq_push+0x68/0x80 [sctp]
[ 219.187258] [<ffffffffa04190f1>] sctp_backlog_rcv+0x151/0x1c0 [sctp]
[ 219.187260] [<ffffffff81692b07>] __release_sock+0x87/0xf0
[ 219.187261] [<ffffffff81692ba0>] release_sock+0x30/0xa0
[ 219.187265] [<ffffffffa040e46d>] sctp_accept+0x17d/0x210 [sctp]
[ 219.187266] [<ffffffff810e7510>] ? prepare_to_wait_event+0xf0/0xf0
[ 219.187268] [<ffffffff8172d52c>] inet_accept+0x3c/0x130
[ 219.187269] [<ffffffff8168d7a3>] SYSC_accept4+0x103/0x210
[ 219.187271] [<ffffffff817ca2ba>] ? _raw_spin_unlock_bh+0x1a/0x20
[ 219.187272] [<ffffffff81692bfc>] ? release_sock+0x8c/0xa0
[ 219.187276] [<ffffffffa0413e22>] ? sctp_inet_listen+0x62/0x1b0 [sctp]
[ 219.187277] [<ffffffff8168f2d0>] SyS_accept+0x10/0x20
Fixes: 860fbbc343bf ("sctp: prepare for socket backlog behavior change")
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Added a condition to avoid bonding devices with same MAC registering
as VF.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The check for a -ve error is redundant, remove it and just
immediately return the return value from the call to
seq_open_net.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Default kernel behavior is to delete IPv6 addresses on link
down, which entails deletion of the multicast and the
subnet-router anycast addresses. These deletions do not
happen with sysctl setting to keep global IPv6 addresses on
link down, so every link down/up causes an increment of the
anycast and multicast refcounts. These bogus refcounts may
stop these addrs from being removed on subsequent calls to
delete them. The solution is to leave the groups for the
multicast and subnet anycast on link down for the callflow
when global IPv6 addresses are kept.
Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
Signed-off-by: Mike Manning <mmanning@brocade.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove including <linux/version.h> that don't need it.
Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 486bdee0134c ("sctp: add support for RPS and RFS")
saves skb->hash into sk->sk_rxhash so that the inet_* can
record it to flow table.
But sctp uses sock_common_recvmsg as .recvmsg instead
of inet_recvmsg, sock_common_recvmsg doesn't invoke
sock_rps_record_flow to record the flow. It may cause
that the receiver has no chances to record the flow if
it doesn't send msg or poll the socket.
So this patch fixes it by using inet_recvmsg as .recvmsg
in sctp.
Fixes: 486bdee0134c ("sctp: add support for RPS and RFS")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Test the cipher suite initialization in case ICV length has a value
different than its default. If this test fails, creation of a new macsec
link will also fail. This avoids situations where further security
associations can't be added due to failures of crypto_aead_setauthsize(),
caused by unsupported user-provided values of the ICV length.
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
preserve the return value of AEAD functions that are called when a SA is
created, to avoid inappropriate display of "RTNETLINK answers: Cannot
allocate memory" message.
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
IEEE 802.1AE-2006 standard recommends that the ICV element in a MACsec
frame should not exceed 16 octets: add MACSEC_STD_ICV_LEN in uapi
definitions accordingly, and avoid accepting configurations where the ICV
length exceeds the standard value. Leave definition of MACSEC_MAX_ICV_LEN
unchanged for backwards compatibility with userspace programs.
Fixes: dece8d2b78d1 ("uapi: add MACsec bits")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 8626c56c8279 ("bridge: fix potential use-after-free when hook
returns QUEUE or STOLEN verdict") caused LLDP packets arriving through a
bridge port to be re-injected to the Rx path with skb->dev set to the
bridge device, but this breaks the lldpad daemon.
The lldpad daemon opens a packet socket with protocol set to ETH_P_LLDP
for any valid device on the system, which doesn't not include soft
devices such as bridge and VLAN.
Since packet sockets (ptype_base) are processed in the Rx path after the
Rx handler, LLDP packets with skb->dev set to the bridge device never
reach the lldpad daemon.
Fix this by making the bridge's Rx handler re-inject LLDP packets with
RX_HANDLER_PASS, which effectively restores the behaviour prior to the
mentioned commit.
This means netfilter will never receive LLDP packets coming through a
bridge port, as I don't see a way in which we can have okfn() consume
the packet without breaking existing behaviour. I've already carried out
a similar fix for STP packets in commit 56fae404fb2c ("bridge: Fix
incorrect re-injection of STP packets").
Fixes: 8626c56c8279 ("bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch makes sctp support ipv6 nonlocal bind by adding
sp->inet.freebind and net->ipv6.sysctl.ip_nonlocal_bind
check in sctp_v6_available as what sctp did to support
ipv4 nonlocal bind (commit cdac4e077489).
Reported-by: Shijoe George <spanjikk@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch fixes the __output_custom() routine we currently use with
bpf_skb_copy(). I missed that when len is larger than the size of the
current handle, we can issue multiple invocations of copy_func, and
__output_custom() advances destination but also source buffer by the
written amount of bytes. When we have __output_custom(), this is actually
wrong since in that case the source buffer points to a non-linear object,
in our case an skb, which the copy_func helper is supposed to walk.
Therefore, since this is non-linear we thus need to pass the offset into
the helper, so that copy_func can use it for extracting the data from
the source object.
Therefore, adjust the callback signatures properly and pass offset
into the skb_header_pointer() invoked from bpf_skb_copy() callback. The
__DEFINE_OUTPUT_COPY_BODY() is adjusted to accommodate for two things:
i) to pass in whether we should advance source buffer or not; this is
a compile-time constant condition, ii) to pass in the offset for
__output_custom(), which we do with help of __VA_ARGS__, so everything
can stay inlined as is currently. Both changes allow for adapting the
__output_* fast-path helpers w/o extra overhead.
Fixes: 555c8a8623a3 ("bpf: avoid stack copy and use skb ctx for event output")
Fixes: 7e3f977edd0b ("perf, events: add non-linear data support for raw records")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
gcc-4.9 and higher warn about the newly added NSCI code:
net/ncsi/ncsi-manage.c: In function 'ncsi_process_next_channel':
net/ncsi/ncsi-manage.c:1003:2: error: 'old_state' may be used uninitialized in this function [-Werror=maybe-uninitialized]
The warning is a false positive and therefore harmless, but it would be good to
avoid it anyway. I have determined that the barrier in the spin_unlock_irqsave()
is what confuses gcc to the point that it cannot track whether the variable
was unused or not.
This rearranges the code in a way that makes it obvious to gcc that old_state
is always initialized at the time of use, functionally this should not
change anything.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix following sparse warnings
warning: symbol 'cxgb3i_ofld_init' was not declared. Should it be static?
warning: symbol 'cxgb4i_cplhandlers' was not declared. Should it be static?
warning: symbol 'cxgb4i_ofld_init' was not declared. Should it be static?
Signed-off-by: Varun Prakash <varun@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Export cxgbi_ppm_release() to release
ppod manager and cxgbi_tagmask_set() to
set tag mask, they are used by cxgb3i, cxgb4i
and cxgbit.
Signed-off-by: Varun Prakash <varun@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add iSCSI DDP support in cxgb3i driver
using common iSCSI DDP Page Pod Manager.
Signed-off-by: Varun Prakash <varun@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add iSCSI DDP support in cxgb4i driver
using common iSCSI DDP Page Pod Manager.
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove old ddp code from cxgb3i,cxgb4i,libcxgbi.
Next two commits adds DDP support using
common iSCSI DDP Page Pod Manager.
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add common library module(libcxgb.ko) for
Chelsio drivers to remove duplicate code.
Code for iSCSI DDP Page Pod Manager is moved
from cxgb4.ko to libcxgb.ko. Earlier only cxgbit.ko
was using this code, now cxgb3i and cxgb4i will
also use common Page Pod manager code.
In future this module will have common connection
management and hardware specific code that can be
shared by multiple Chelsio drivers.
Signed-off-by: Varun Prakash <varun@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Change the ageing_time type in br_set_ageing_time() from u32 to what it
is expected to be, i.e. a clock_t.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
br_stp_enable_bridge() does take the br->lock spinlock. Fix its wrongly
pasted comment and use the same as br_stp_disable_bridge().
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Based on original work by Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When using an IPoIB bond currently only active-backup mode is a valid
use case and this commit strengthens it.
Since commit 2ab82852a270 ("net/bonding: Enable bonding to enslave
netdevices not supporting set_mac_address()") was introduced till
4.7-rc1, IPoIB didn't support the set_mac_address ndo, and hence the
fail over mac policy always applied to IPoIB bonds.
With the introduction of commit 492a7e67ff83 ("IB/IPoIB: Allow setting
the device address"), that doesn't hold and practically IPoIB bonds are
broken as of that. To fix it, lets go to fail over mac if the device
doesn't support the ndo OR this is IPoIB device.
As a by-product, this commit also prevents a stack corruption which
occurred when trying to copy 20 bytes (IPoIB) device address
to a sockaddr struct that has only 16 bytes of storage.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch offloads port mirroring directives to hw using the matchall TC
with action mirror. It includes both the implementation of the
ndo_setup_tc function for the spectrum driver and the spectrum hardware
offload configuration code.
The hardware offload code is basically two new functions which are capable
of adding and removing a new mirror ports pair. It is done using the MPAT,
MPAR and SBIB registers:
- A new Switch-Port Analyzer (SPAN) entry is added using MPAT to the 'to'
port.
- The 'to' port is bound to the SPAN entry using MPAR register.
- In case of egress SPAN, the 'to' port gets a new internal shared
buffer using SBIB register.
In addition, a new database was added to the mlxsw_sp struct to store all
the SPAN entries and their bound ports list. The number of supported SPAN
entries is determined by resource query.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The helper function is_tcf_mirred_mirror helps finding whether an action
struct is of type mirred and is configured to be of type mirror.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The MPAR register is used to bind ports to a SPAN entry (which was
created using MPAT register) and thus mirror their traffic (ingress /
egress) to a different port.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The MPAT register is used to query and configure the Switch Port Analyzer
(SPAN) table. This register is used to configure a port as a mirror output
port, while after that a mirrored input port can be bound using MPAR
register.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The SBIB register configures per port buffer for internal use. This
register is used to configure an egress mirror buffer on the egress port
which does the mirroring.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Following the work that have been done on offloading classifiers like u32
and flower, now the match-all classifier hw offloading is possible. if
the interface supports tc offloading.
To control the offloading, two tc flags have been introduced: skip_sw and
skip_hw. Typical usage:
tc filter add dev eth25 parent ffff: \
matchall skip_sw \
action mirred egress mirror \
dev eth27
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The matchall classifier matches every packet and allows the user to apply
actions on it. This filter is very useful in usecases where every packet
should be matched, for example, packet mirroring (SPAN) can be setup very
easily using that filter.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add max span resources to resources query.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add resources query implementation. If exists, query the HW for its
builtin resources instead of having them as consts in the code.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The firmware in several ZTE devices (at least the MF823/831/910
modems/mifis) use OS fingerprinting to determine which type of device to
export. In addition, these devices export a REST API which can be used to
control the type of device. So far, on Linux, the devices have been seen as
RNDIS or CDC Ether.
When CDC Ether is used, devices of the same type are, as with RNDIS,
exported with the same, bogus random MAC address. In addition, the devices
(at least on all firmware revisions I have found) use the bogus MAC when
sending traffic routed from external networks. And as a final feature, the
devices sometimes export the link state incorrectly. There are also
references online to several other ZTE devices displaying this behavior,
with several different PIDs and MAC addresses.
This patch tries to improve the handling of ZTE devices by doing the
following:
* Create a new driver_info-struct that is used by ZTE devices that do not
have an explicit entry in the product table. This struct is the same as the
default cdc_ether driver info, but a new bind- and an rx_fixup-function
have been added.
* In the new bind function, we check if we have read a random MAC from the
device. If we have, then we generate a new random MAC address. This will
ensure that all devices get a unique MAC.
* The rx_fixup-function replaces the destination MAC address in the skb
with that of the device. I have not seen a revision of these devices that
behaves correctly (i.e., sets the right destination MAC), so I chose not to
do any comparison with for example the known, bogus addresses.
* The MF823/MF832/MF910 sometimes export cdc carrier on twice on connect
(the correct behavior is off then on). Work around this by manually setting
carrier to off if an on-notification is received and the NOCARRIER-bit is
not set.
This change will affect all devices, but it should take care of similar
mistakes made by other manufacturers. I tried to think of/look/test for
problems/regressions that could be introduced by this behavior, but could
not find any. However, my familiarity with this code path is not that
great, so there could be something I have overlooked.
I have tested this patch with multiple revisions of all three devices, and
they behave as expected. In other words, they all got a valid, random MAC,
the correct operational state and I can receive/sent traffic without
problems. I also tested with some other cdc_ether devices I have and did
not find any problems/regressions caused by the two general changes.
v3->v4:
* Forgot to remove unused variables, sorry about that (thanks David
Miller).
v2->v3:
* I had forgot to remove the random MAC generation from usbnet_cdc_bind()
(thanks Oliver).
* Rework logic in the ZTE bind-function a bit.
v1->v2:
* Only generate random MAC for ZTE devices (thanks Oliver Neukum).
* Set random MAC and do RX fixup for all ZTE devices that do not have a
product-entry, as the bogus MAC have been seen on devices with several
different PIDs/MAC addresses. In other words, it seems to be the default
behavior of ZTE CDC Ether devices (thanks Lars Melin).
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Acked-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We "cache" the loaded match/target modules and reuse them, but when the
modules are removed, we still point to them. Then we may end up with
invalid memory references when using iptables-compat to add rules later.
Input the following commands will reproduce the kernel crash:
# iptables-compat -A INPUT -j LOG
# iptables-compat -D INPUT -j LOG
# rmmod xt_LOG
# iptables-compat -A INPUT -j LOG
BUG: unable to handle kernel paging request at ffffffffa05a9010
IP: [<ffffffff813f783e>] strcmp+0xe/0x30
Call Trace:
[<ffffffffa05acc43>] nft_target_select_ops+0x83/0x1f0 [nft_compat]
[<ffffffffa058a177>] nf_tables_expr_parse+0x147/0x1f0 [nf_tables]
[<ffffffffa058e541>] nf_tables_newrule+0x301/0x810 [nf_tables]
[<ffffffff8141ca00>] ? nla_parse+0x20/0x100
[<ffffffffa057fa8f>] nfnetlink_rcv+0x33f/0x53d [nfnetlink]
[<ffffffffa057f94b>] ? nfnetlink_rcv+0x1fb/0x53d [nfnetlink]
[<ffffffff817116b8>] netlink_unicast+0x178/0x220
[<ffffffff81711a5b>] netlink_sendmsg+0x2fb/0x3a0
[<ffffffff816b7fc8>] sock_sendmsg+0x38/0x50
[<ffffffff816b8a7e>] ___sys_sendmsg+0x28e/0x2a0
[<ffffffff816bcb7e>] ? release_sock+0x1e/0xb0
[<ffffffff81804ac5>] ? _raw_spin_unlock_bh+0x35/0x40
[<ffffffff816bcbe2>] ? release_sock+0x82/0xb0
[<ffffffff816b93d4>] __sys_sendmsg+0x54/0x90
[<ffffffff816b9422>] SyS_sendmsg+0x12/0x20
[<ffffffff81805172>] entry_SYSCALL_64_fastpath+0x1a/0xa9
So when nobody use the related match/target module, there's no need to
"cache" it. And nft_[match|target]_release are useless anymore, remove
them.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
If the user specify the invalid NFTA_MATCH_INFO/NFTA_TARGET_INFO attr
or memory alloc fail, we should call module_put to the related match
or target. Otherwise, we cannot remove the module even nobody use it.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Simplify the code without any side effect. The set_expect_timeout is
used to modify the timer expired time. It tries to delete timer, and
add it again. So we could use mod_timer directly.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
With this command sequence:
modprobe plip
modprobe pps_parport
rmmod pps_parport
the partport_pps modules causes this crash:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: parport_detach+0x1d/0x60 [pps_parport]
Oops: 0000 [#1] SMP
...
Call Trace:
parport_unregister_driver+0x65/0xc0 [parport]
SyS_delete_module+0x187/0x210
The sequence that builds up to this is:
1) plip is loaded and takes the parport device for exclusive use:
plip0: Parallel port at 0x378, using IRQ 7.
2) pps_parport then fails to grab the device:
pps_parport: parallel port PPS client
parport0: cannot grant exclusive access for device pps_parport
pps_parport: couldn't register with parport0
3) rmmod of pps_parport is then killed because it tries to access
pardev->name, but pardev (taken from port->cad) is NULL.
So add a check for NULL in the test there too.
Link: http://lkml.kernel.org/r/20160714115245.12651-1-jslaby@suse.cz
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Rodolfo Giometti <giometti@enneenne.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The curly braces are missing here so we print stuff unintentionally.
Fixes: 9da4714a2d44 ('slub: slabinfo update for cmpxchg handling')
Link: http://lkml.kernel.org/r/20160715211243.GE19522@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There are no parentheses around this macro and it causes a problem when
we do:
index = rand() % THRASH_SIZE;
Link: http://lkml.kernel.org/r/20160715210953.GC19522@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
radix_tree_iter_retry() resets slot to NULL, but it doesn't reset tags.
Then NULL slot and non-zero iter.tags passed to radix_tree_next_slot()
leading to crash:
RIP: radix_tree_next_slot include/linux/radix-tree.h:473
find_get_pages_tag+0x334/0x930 mm/filemap.c:1452
....
Call Trace:
pagevec_lookup_tag+0x3a/0x80 mm/swap.c:960
mpage_prepare_extent_to_map+0x321/0xa90 fs/ext4/inode.c:2516
ext4_writepages+0x10be/0x2b20 fs/ext4/inode.c:2736
do_writepages+0x97/0x100 mm/page-writeback.c:2364
__filemap_fdatawrite_range+0x248/0x2e0 mm/filemap.c:300
filemap_write_and_wait_range+0x121/0x1b0 mm/filemap.c:490
ext4_sync_file+0x34d/0xdb0 fs/ext4/fsync.c:115
vfs_fsync_range+0x10a/0x250 fs/sync.c:195
vfs_fsync fs/sync.c:209
do_fsync+0x42/0x70 fs/sync.c:219
SYSC_fdatasync fs/sync.c:232
SyS_fdatasync+0x19/0x20 fs/sync.c:230
entry_SYSCALL_64_fastpath+0x23/0xc1 arch/x86/entry/entry_64.S:207
We must reset iterator's tags to bail out from radix_tree_next_slot()
and go to the slow-path in radix_tree_next_chunk().
Fixes: 46437f9a554f ("radix-tree: fix race in gang lookup")
Link: http://lkml.kernel.org/r/1468495196-10604-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The memory controller has quite a bit of state that usually outlives the
cgroup and pins its CSS until said state disappears. At the same time
it imposes a 16-bit limit on the CSS ID space to economically store IDs
in the wild. Consequently, when we use cgroups to contain frequent but
small and short-lived jobs that leave behind some page cache, we quickly
run into the 64k limitations of outstanding CSSs. Creating a new cgroup
fails with -ENOSPC while there are only a few, or even no user-visible
cgroups in existence.
Although pinning CSSs past cgroup removal is common, there are only two
instances that actually need an ID after a cgroup is deleted: cache
shadow entries and swapout records.
Cache shadow entries reference the ID weakly and can deal with the CSS
having disappeared when it's looked up later. They pose no hurdle.
Swap-out records do need to pin the css to hierarchically attribute
swapins after the cgroup has been deleted; though the only pages that
remain swapped out after offlining are tmpfs/shmem pages. And those
references are under the user's control, so they are manageable.
This patch introduces a private 16-bit memcg ID and switches swap and
cache shadow entries over to using that. This ID can then be recycled
after offlining when the CSS remains pinned only by objects that don't
specifically need it.
This script demonstrates the problem by faulting one cache page in a new
cgroup and deleting it again:
set -e
mkdir -p pages
for x in `seq 128000`; do
[ $((x % 1000)) -eq 0 ] && echo $x
mkdir /cgroup/foo
echo $$ >/cgroup/foo/cgroup.procs
echo trex >pages/$x
echo $$ >/cgroup/cgroup.procs
rmdir /cgroup/foo
done
When run on an unpatched kernel, we eventually run out of possible IDs
even though there are no visible cgroups:
[root@ham ~]# ./cssidstress.sh
[...]
65000
mkdir: cannot create directory '/cgroup/foo': No space left on device
After this patch, the IDs get released upon cgroup destruction and the
cache and css objects get released once memory reclaim kicks in.
[hannes@cmpxchg.org: init the IDR]
Link: http://lkml.kernel.org/r/20160621154601.GA22431@cmpxchg.org
Fixes: b2052564e66d ("mm: memcontrol: continue cache reclaim from offlined groups")
Link: http://lkml.kernel.org/r/20160617162516.GD19084@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: John Garcia <john.garcia@mesosphere.io>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Nikolay Borisov <kernel@kyup.com>
Cc: <stable@vger.kernel.org> [3.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
xt_connlabel is the only user so move it.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The conntrack label extension is currently variable-sized, e.g. if
only 2 labels are used by iptables rules then the labels->bits[] array
will only contain one element.
We track size of each label storage area in the 'words' member.
But in nftables and openvswitch we always have to ask for worst-case
since we don't know what bit will be used at configuration time.
As most arches are 64bit we need to allocate 24 bytes in this case:
struct nf_conn_labels {
u8 words; /* 0 1 */
/* XXX 7 bytes hole, try to pack */
long unsigned bits[2]; /* 8 24 */
Make bits a fixed size and drop the words member, it simplifies
the code and only increases memory requirements on x86 when
less than 64bit labels are required.
We still only allocate the extension if its needed.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
I stumbled over a build error with COMPILE_TEST and CONFIG_OF
disabled:
drivers/gpio/gpio-tegra.c: In function 'tegra_gpio_probe':
drivers/gpio/gpio-tegra.c:603:9: error: 'struct gpio_chip' has no member named 'of_node'
The problem is that the newly added GPIO_TEGRA Kconfig symbol
does not have a dependency on CONFIG_OF. However, there is another
problem here as the driver gets enabled unconditionally whenever
COMPILE_TEST is set.
This fixes both problems, by making the symbol user-visible
when COMPILE_TEST is set and default-enabled for ARCH_TEGRA=y.
As a side-effect, it is now possible to compile-test a Tegra
kernel with GPIO support disabled, which is harmless.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 4dd4dd1d2120 ("gpio: tegra: Allow compile test")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
To allow for child request context the struct akcipher_request child_req
needs to be at the end of the structure.
Cc: stable@vger.kernel.org
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
This patch address a few issues with the initial crosstalk fix. Most
important of which is the SDP that indicates the presents of a SFP+
module changes between HW types. With this change that is taken in
to consideration
It also moves the check closer to the base code that checks link. This
makes it so we only need to do the check in one spot.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|