linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2018-10-17	net: ena: Fix Kconfig dependency on X86	Netanel Belgazal	1	-1/+1
	The Kconfig limitation of X86 is to too wide. The ENA driver only requires a little endian dependency. Change the dependency to be on little endian CPU. Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	udp6: fix encap return code for resubmitting	Paolo Abeni	1	-4/+2
	The commit eb63f2964dbe ("udp6: add missing checks on edumux packet processing") used the same return code convention of the ipv4 counterpart, but ipv6 uses the opposite one: positive values means resubmit. This change addresses the issue, using positive return value for resubmitting. Also update the related comment, which was broken, too. Fixes: eb63f2964dbe ("udp6: add missing checks on edumux packet processing") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	mlxsw: core: Fix use-after-free when flashing firmware during init	Ido Schimmel	3	-6/+17
	When the switch driver (e.g., mlxsw_spectrum) determines it needs to flash a new firmware version it resets the ASIC after the flashing process. The bus driver (e.g., mlxsw_pci) then registers itself again with mlxsw_core which means (among other things) that the device registers itself again with the hwmon subsystem again. Since the device was registered with the hwmon subsystem using devm_hwmon_device_register_with_groups(), then the old hwmon device (registered before the flashing) was never unregistered and was referencing stale data, resulting in a use-after free. Fix by removing reliance on device managed APIs in mlxsw_hwmon_init(). Fixes: c86d62cc410c ("mlxsw: spectrum: Reset FW after flash") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Alexander Petrovskiy <alexpe@mellanox.com> Tested-by: Alexander Petrovskiy <alexpe@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	tcp_bbr: centralize code to set gains	Neal Cardwell	1	-10/+30
	Centralize the code that sets gains used for computing cwnd and pacing rate. This simplifies the code and makes it easier to change the state machine or (in the future) dynamically change the gain values and ensure that the correct gain values are always used. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	tcp_bbr: adjust TCP BBR for departure time pacing	Neal Cardwell	1	-2/+35
	Adjust TCP BBR for the new departure time pacing model in the recent commit ab408b6dc7449 ("tcp: switch tcp and sch_fq to new earliest departure time model"). With TSQ and pacing at lower layers, there are often several skbs queued in the pacing layer, and thus there is less data "in the network" than "in flight". With departure time pacing at lower layers (e.g. fq or potential future NICs), the data in the pacing layer now has a pre-scheduled ("baked-in") departure time that cannot be changed, even if the congestion control algorithm decides to use a new pacing rate. This means that there can be a non-trivial lag between when BBR makes a pacing rate change and when the inter-skb pacing delays change. After a pacing rate change, the number of packets in the network can gradually evolve to be higher or lower, depending on whether the sending rate is higher or lower than the delivery rate. Thus ignoring this lag can cause significant overshoot, with the flow ending up with too many or too few packets in the network. This commit changes BBR to adapt its pacing rate based on the amount of data in the network that it estimates has already been "baked in" by previous departure time decisions. We estimate the number of our packets that will be in the network at the earliest departure time (EDT) for the next skb scheduled as: in_network_at_edt = inflight_at_edt - (EDT - now) * bw If we're increasing the amount of data in the network ("in_network"), then we want to know if the transmit of the EDT skb will push in_network above the target, so our answer includes bbr_tso_segs_goal() from the skb departing at EDT. If we're decreasing in_network, then we want to know if in_network will sink too low just before the EDT transmit, so our answer does not include the segments from the skb departing at EDT. Why do we treat pacing_gain > 1.0 case and pacing_gain < 1.0 case differently? The in_network curve is a step function: in_network goes up on transmits, and down on ACKs. To accurately predict when in_network will go beyond our target value, this will happen on different events, depending on whether we're concerned about in_network potentially going too high or too low: o if pushing in_network up (pacing_gain > 1.0), then in_network goes above target upon a transmit event o if pushing in_network down (pacing_gain < 1.0), then in_network goes below target upon an ACK event This commit changes the BBR state machine to use this estimated "packets in network" value to make its decisions. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	net/ncsi: Add NCSI Broadcom OEM command	Vijay Khemka	5	-2/+147
	This patch adds OEM Broadcom commands and response handling. It also defines OEM Get MAC Address handler to get and configure the device. ncsi_oem_gma_handler_bcm: This handler send NCSI broadcom command for getting mac address. ncsi_rsp_handler_oem_bcm: This handles response received for all broadcom OEM commands. ncsi_rsp_handler_oem_bcm_gma: This handles get mac address response and set it to device. Signed-off-by: Vijay Khemka <vijaykhemka@fb.com> Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	sctp: not free the new asoc when sctp_wait_for_connect returns err	Xin Long	1	-1/+3
	When sctp_wait_for_connect is called to wait for connect ready for sp->strm_interleave in sctp_sendmsg_to_asoc, a panic could be triggered if cpu is scheduled out and the new asoc is freed elsewhere, as it will return err and later the asoc gets freed again in sctp_sendmsg. [ 285.840764] list_del corruption, ffff9f0f7b284078->next is LIST_POISON1 (dead000000000100) [ 285.843590] WARNING: CPU: 1 PID: 8861 at lib/list_debug.c:47 __list_del_entry_valid+0x50/0xa0 [ 285.846193] Kernel panic - not syncing: panic_on_warn set ... [ 285.846193] [ 285.848206] CPU: 1 PID: 8861 Comm: sctp_ndata Kdump: loaded Not tainted 4.19.0-rc7.label #584 [ 285.850559] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 285.852164] Call Trace: ... [ 285.872210] ? __list_del_entry_valid+0x50/0xa0 [ 285.872894] sctp_association_free+0x42/0x2d0 [sctp] [ 285.873612] sctp_sendmsg+0x5a4/0x6b0 [sctp] [ 285.874236] sock_sendmsg+0x30/0x40 [ 285.874741] ___sys_sendmsg+0x27a/0x290 [ 285.875304] ? __switch_to_asm+0x34/0x70 [ 285.875872] ? __switch_to_asm+0x40/0x70 [ 285.876438] ? ptep_set_access_flags+0x2a/0x30 [ 285.877083] ? do_wp_page+0x151/0x540 [ 285.877614] __sys_sendmsg+0x58/0xa0 [ 285.878138] do_syscall_64+0x55/0x180 [ 285.878669] entry_SYSCALL_64_after_hwframe+0x44/0xa9 This is a similar issue with the one fixed in Commit ca3af4dd28cf ("sctp: do not free asoc when it is already dead in sctp_sendmsg"). But this one can't be fixed by returning -ESRCH for the dead asoc in sctp_wait_for_connect, as it will break sctp_connect's return value to users. This patch is to simply set err to -ESRCH before it returns to sctp_sendmsg when any err is returned by sctp_wait_for_connect for sp->strm_interleave, so that no asoc would be freed due to this. When users see this error, they will know the packet hasn't been sent. And it also makes sense to not free asoc because waiting connect fails, like the second call for sctp_wait_for_connect in sctp_sendmsg_to_asoc. Fixes: 668c9beb9020 ("sctp: implement assign_number for sctp_stream_interleave") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	sctp: fix race on sctp_id2asoc	Marcelo Ricardo Leitner	1	-3/+2
	syzbot reported an use-after-free involving sctp_id2asoc. Dmitry Vyukov helped to root cause it and it is because of reading the asoc after it was freed: CPU 1 CPU 2 (working on socket 1) (working on socket 2) sctp_association_destroy sctp_id2asoc spin lock grab the asoc from idr spin unlock spin lock remove asoc from idr spin unlock free(asoc) if asoc->base.sk != sk ... [] This can only be hit if trying to fetch asocs from different sockets. As we have a single IDR for all asocs, in all SCTP sockets, their id is unique on the system. An application can try to send stuff on an id that matches on another socket, and the if in [] will protect from such usage. But it didn't consider that as that asoc may belong to another socket, it may be freed in parallel (read: under another socket lock). We fix it by moving the checks in [*] into the protected region. This fixes it because the asoc cannot be freed while the lock is held. Reported-by: syzbot+c7dd55d7aec49d48e49a@syzkaller.appspotmail.com Acked-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	r8169: re-enable MSI-X on RTL8168g	Heiner Kallweit	1	-5/+0
	Similar to d49c88d7677b ("r8169: Enable MSI-X on RTL8106e") after e9d0ba506ea8 ("PCI: Reprogram bridge prefetch registers on resume") we can safely assume that this also fixes the root cause of the issue worked around by 7c53a722459c ("r8169: don't use MSI-X on RTL8168g"). So let's revert it. Fixes: 7c53a722459c ("r8169: don't use MSI-X on RTL8168g") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	net: phy: mscc: fix memory leak in vsc8574_config_pre_init	Gustavo A. R. Silva	1	-1/+1
	In case memory resources for fw were successfully allocated, release them before return. Addresses-Coverity-ID: 1473968 ("Resource leak") Fixes: 00d70d8e0e78 ("net: phy: mscc: add support for VSC8574 PHY") Reviewed-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	net: phy: mscc: fix signedness bug in vsc85xx_downshift_get	Gustavo A. R. Silva	1	-1/+1
	Currently, the error handling for the call to function phy_read_paged() doesn't work because reg_val is of type u16 (16 bits, unsigned), which makes it impossible for it to hold a value less than 0. Fix this by changing the type of variable reg_val to int. Addresses-Coverity-ID: 1473970 ("Unsigned compared against 0") Fixes: 6a0bfbbe20b0 ("net: phy: mscc: migrate to phy_select/restore_page functions") Reviewed-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	net: bpfilter: use get_pid_task instead of pid_task	Taehee Yoo	1	-2/+4
	pid_task() dereferences rcu protected tasks array. But there is no rcu_read_lock() in shutdown_umh() routine so that rcu_read_lock() is needed. get_pid_task() is wrapper function of pid_task. it holds rcu_read_lock() then calls pid_task(). if task isn't NULL, it increases reference count of task. test commands: %modprobe bpfilter %modprobe -rv bpfilter splat looks like: [15102.030932] ============================= [15102.030957] WARNING: suspicious RCU usage [15102.030985] 4.19.0-rc7+ #21 Not tainted [15102.031010] ----------------------------- [15102.031038] kernel/pid.c:330 suspicious rcu_dereference_check() usage! [15102.031063] other info that might help us debug this: [15102.031332] rcu_scheduler_active = 2, debug_locks = 1 [15102.031363] 1 lock held by modprobe/1570: [15102.031389] #0: 00000000580ef2b0 (bpfilter_lock){+.+.}, at: stop_umh+0x13/0x52 [bpfilter] [15102.031552] stack backtrace: [15102.031583] CPU: 1 PID: 1570 Comm: modprobe Not tainted 4.19.0-rc7+ #21 [15102.031607] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015 [15102.031628] Call Trace: [15102.031676] dump_stack+0xc9/0x16b [15102.031723] ? show_regs_print_info+0x5/0x5 [15102.031801] ? lockdep_rcu_suspicious+0x117/0x160 [15102.031855] pid_task+0x134/0x160 [15102.031900] ? find_vpid+0xf0/0xf0 [15102.032017] shutdown_umh.constprop.1+0x1e/0x53 [bpfilter] [15102.032055] stop_umh+0x46/0x52 [bpfilter] [15102.032092] __x64_sys_delete_module+0x47e/0x570 [ ... ] Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	ptp: fix Spectre v1 vulnerability	Gustavo A. R. Silva	1	-0/+4
	pin_index can be indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: drivers/ptp/ptp_chardev.c:253 ptp_ioctl() warn: potential spectre issue 'ops->pin_config' [r] (local cap) Fix this by sanitizing pin_index before using it to index ops->pin_config, and before passing it as an argument to function ptp_set_pinfunc(), in which it is used to index info->pin_config. Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2 Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	net: fix warning in af_unix	Kyeongdon Kim	1	-0/+2
	This fixes the "'hash' may be used uninitialized in this function" net/unix/af_unix.c:1041:20: warning: 'hash' may be used uninitialized in this function [-Wmaybe-uninitialized] addr->hash = hash ^ sk->sk_type; Signed-off-by: Kyeongdon Kim <kyeongdon.kim@lge.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	net: dsa: mv88e6xxx: Fix 88E6141/6341 2500mbps SERDES speed	Marek Behún	3	-4/+26
	This is a fix for the port_set_speed method for the Topaz family. Currently the same method is used as for the Peridot family, but this is wrong for the SERDES port. On Topaz, the SERDES port is port 5, not 9 and 10 as in Peridot. Moreover setting alt_bit on Topaz only makes sense for port 0 (for (differentiating 100mbps vs 200mbps). The SERDES port does not support more than 2500mbps, so alt_bit does not make any difference. Signed-off-by: Marek Behún <marek.behun@nic.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	sparc: vDSO: Silence an uninitialized variable warning	Dan Carpenter	1	-1/+3
	Smatch complains that "val" would be uninitialized if kstrtoul() fails. Fixes: 9a08862a5d2e ("vDSO for sparc") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	net: qla3xxx: Remove overflowing shift statement	Nathan Chancellor	1	-2/+0
	Clang currently warns: drivers/net/ethernet/qlogic/qla3xxx.c:384:24: warning: signed shift result (0xF00000000) requires 37 bits to represent, but 'int' only has 32 bits [-Wshift-overflow] ((ISP_NVRAM_MASK << 16) \| qdev->eeprom_cmd_data)); ~~~~~~~~~~~~~~ ^ ~~ 1 warning generated. The warning is certainly accurate since ISP_NVRAM_MASK is defined as (0x000F << 16) which is then shifted by 16, resulting in 64424509440, well above UINT_MAX. Given that this is the only location in this driver where ISP_NVRAM_MASK is shifted again, it seems likely that ISP_NVRAM_MASK was originally defined without a shift and during the move of the shift to the definition, this statement wasn't properly removed (since ISP_NVRAM_MASK is used in the statenent right above this). Only the maintainers can confirm this since this statment has been here since the driver was first added to the kernel. Link: https://github.com/ClangBuiltLinux/linux/issues/127 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	geneve, vxlan: Don't set exceptions if skb->len < mtu	Stefano Brivio	3	-6/+15
	We shouldn't abuse exceptions: if the destination MTU is already higher than what we're transmitting, no exception should be created. Fixes: 52a589d51f10 ("geneve: update skb dst pmtu on tx path") Fixes: a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	geneve, vxlan: Don't check skb_dst() twice	Stefano Brivio	2	-21/+6
	Commit f15ca723c1eb ("net: don't call update_pmtu unconditionally") avoids that we try updating PMTU for a non-existent destination, but didn't clean up cases where the check was already explicit. Drop those redundant checks. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	octeontx2-af: Support for disabling NIX RQ/SQ/CQ contexts	Geetha sowjanya	3	-3/+129
	This patch adds support for a RVU PF/VF to disable all RQ/SQ/CQ contexts of a NIX LF via mbox. This will be used by PF/VF drivers upon teardown or while freeing up HW resources. A HW context which is not INIT'ed cannot be modified and a RVU PF/VF driver may or may not INIT all the RQ/SQ/CQ contexts. So a bitmap is introduced to keep track of enabled NIX RQ/SQ/CQ contexts, so that only enabled hw contexts are disabled upon LF teardown. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Stanislaw Kardach <skardach@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	octeontx2-af: NIX AQ instruction enqueue support	Sunil Goutham	4	-9/+680
	Add support for a RVU PF/VF to submit instructions to NIX AQ via mbox. Instructions can be to init/write/read RQ/SQ/CQ/RSS contexts. In case of read, context will be returned as part of response to the mbox msg received. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	octeontx2-af: Alloc bitmaps for NIX Tx scheduler queues	Sunil Goutham	3	-1/+114
	Allocate bitmaps and memory for PFVF mapping info for maintaining NIX transmit scheduler queues maintenance. PF/VF drivers will request for alloc, free e.t.c of Tx schedulers via mailbox. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	octeontx2-af: NIX LSO config for TSOv4/v6 offload	Sunil Goutham	4	-0/+138
	Config LSO formats for TSOv4 and TSOv6 offloads. These formats tell HW which fields in the TCP packet's headers have to be updated while performing segmentation offload. Also report PF/VF drivers the LSO format indices as part of response to NIX_LF_ALLOC mbox msg. These indices are used in SQE extension headers while framing SQE for pkt transmission with TSO offload. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	octeontx2-af: NIX block LF initialization	Sunil Goutham	4	-0/+308
	Upon receiving NIX_LF_ALLOC mbox message allocate memory for NIXLF's CQ, SQ, RQ, CINT, QINT and RSS HW contexts and configure respective base iova HW. Enable caching of contexts into NIX NDC. Return SQ buffer (SQB) size, this PF/VF MAC address etc info e.t.c to the mbox msg sender. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	octeontx2-af: NIX block admin queue init	Sunil Goutham	5	-1/+220
	Initialize NIX admin queue (AQ) i.e alloc memory for AQ instructions and for the results. All NIX LFs will submit instructions to AQ to init/write/read RQ/SQ/CQ/RSS contexts and in case of read, get context from result memory. Also before configuring/using NIX block calibrate X2P bus and check if NIX interfaces like CGX and LBK are in active and working state. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	octeontx2-af: Support for disabling NPA Aura/Pool contexts	Geetha sowjanya	3	-0/+110
	This patch adds support for a RVU PF/VF to disable all Aura/Pool contexts of a NPA LF via mbox. This will be used by PF/VF drivers upon teardown or while freeing up HW resources. A HW context which is not INIT'ed cannot be modified and a RVU PF/VF driver may or may not INIT all the Aura/Pool contexts. So a bitmap is introduced to keep track of enabled NPA Aura/Pool contexts, so that only enabled hw contexts are disabled upon LF teardown. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Stanislaw Kardach <skardach@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	octeontx2-af: NPA AQ instruction enqueue support	Sunil Goutham	5	-0/+427
	Add support for a RVU PF/VF to submit instructions to NPA AQ via mbox. Instructions can be to init/write/read Aura/Pool/Qint contexts. In case of read, context will be returned as part of response to the mbox msg received. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	octeontx2-af: NPA block LF initialization	Sunil Goutham	5	-3/+214
	Upon receiving NPA_LF_ALLOC mbox message allocate memory for NPALF's aura, pool and qint contexts and configure the same to HW. Enable caching of contexts into NPA NDC. Return pool related info like stack size, num pointers per stack page e.t.c to the mbox msg sender. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	octeontx2-af: NPA block admin queue init	Sunil Goutham	6	-2/+309
	Initialize NPA admin queue (AQ) i.e alloc memory for AQ instructions and for the results. All NPA LFs will submit instructions to AQ to init/write/read Aura/Pool contexts and in case of read, get context from result memory. Added some common APIs for allocating memory for a queue and get IOVA in return, these APIs will be used by NIX AQ and for other purposes. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	octeontx2-af: Enable or disable CGX internal loopback	Geetha sowjanya	5	-0/+72
	Add support to enable or disable internal loopback mode in CGX. New mbox IDs CGX_INTLBK_ENABLE/DISABLE added for this. Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Linu Cherian <lcherian@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	octeontx2-af: Forward CGX link notifications to PFs	Linu Cherian	6	-12/+368
	Upon receiving notification from firmware the CGX event handler in the AF driver gets the current link info such as status, speed, duplex etc from CGX driver and sends it across to PFs who have registered to receive such notifications. To support above - Mbox messaging support for sending msgs from AF to PF has been added. - Added mbox msgs so that PFs can register/unregister for link events. - Link notifications are sent to PF under two scenarioss. 1. When a asynchronous link change notification is received from firmware with notification flag turned on for that PF. 2. Upon notification turn on request, the current link status is send to the PF. Also added a new mailbox msg using which RVU PF/VF can retrieve their mapped CGX LMAC's current link info. Link info includes status, speed, duplex and lmac type. Signed-off-by: Linu Cherian <lcherian@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	octeontx2-af: Support for MAC address filters in CGX	Vidhya Raman	5	-0/+183
	This patch adds support for setting MAC address filters in CGX for PF interfaces. Also PF interfaces can be put in promiscuous mode. Dataplane PFs access this functionality using mailbox messages to the AF driver. Signed-off-by: Vidhya Raman <vraman@marvell.com> Signed-off-by: Stanislaw Kardach <skardach@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	octeontx2-af: Support to retrieve CGX LMAC stats	Christina Jacob	5	-0/+76
	This patch adds support for a RVU PF/VF driver to retrieve it's mapped CGX LMAC Rx and Tx stats from AF via mbox. New mailbox msg is added is added. Signed-off-by: Christina Jacob <cjacob@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	octeontx2-af: CGX Rx/Tx enable/disable mbox handlers	Sunil Goutham	5	-0/+73
	Added new mailbox msgs for RVU PF/VFs to request AF to enable/disable their mapped CGX::LMAC Rx & Tx. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Linu Cherian <lcherian@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	octeontx2-af: Improve register polling loop	Sunil Goutham	1	-3/+3
	Instead of looping on a integer timeout, use time_before(jiffies), so that maximum poll time is capped. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	sparc: Fix syscall fallback bugs in VDSO.	David S. Miller	2	-1/+16
	First, the trap number for 32-bit syscalls is 0x10. Also, only negate the return value when syscall error is indicated by the carry bit being set. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	mlxsw: spectrum_switchdev: Add support for VxLAN encapsulation	Ido Schimmel	1	-1/+405
	In the device, VxLAN encapsulation takes place in the FDB table where certain {MAC, FID} entries are programmed with an underlay unicast IP. MAC addresses that are not programmed in the FDB are flooded to the relevant local ports and also to a list of underlay unicast IPs that are programmed using the all zeros MAC address in the VxLAN driver. One difference between the hardware and software data paths is the fact that in the software data path there are two FDB lookups prior to the encapsulation of the packet. First in the bridge's FDB table using {MAC, VID} and another in the VxLAN's FDB table using {MAC, VNI}. Therefore, when a new VxLAN FDB entry is notified, it is only programmed to the device if there is a corresponding entry in the bridge's FDB table. Similarly, when a new bridge FDB entry pointing to the VxLAN device is notified, it is only programmed to the device if there is a corresponding entry in the VxLAN's FDB table. Note that the above scheme will result in a discrepancy between both data paths if only one FDB table is populated in the software data path. For example, if only the bridge's FDB is populated with an entry pointing to a VxLAN device, then a packet hitting the entry will only be flooded by the kernel to remote VTEPs whereas the device will also flood the packets to other local ports member in the VLAN. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	mlxsw: spectrum: Enable VxLAN enslavement to bridges	Ido Schimmel	3	-1/+267
	Enslavement of VxLAN devices to offloaded bridges was never forbidden by mlxsw, but this patch makes sure the required configuration is performed in order to allow VxLAN encapsulation and decapsulation to take place in the device. The patch handles both the case where a VxLAN device is enslaved to an already offloaded bridge and the case where the first mlxsw port is enslaved to a bridge that already has VxLAN device configured. Invalid configurations are sanitized and an error string is returned via extack. Since encapsulation and decapsulation do not occur when the VxLAN device is down, the driver makes sure to enable / disable these functionalities based on NETDEV_PRE_UP and NETDEV_DOWN events. Note that NETDEV_PRE_UP is used in favor of NETDEV_UP, as the former allows to veto the operation, if necessary. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	bridge: switchdev: Allow clearing FDB entry offload indication	Ido Schimmel	8	-13/+20
	Currently, an FDB entry only ceases being offloaded when it is deleted. This changes with VxLAN encapsulation. Devices capable of performing VxLAN encapsulation usually have only one FDB table, unlike the software data path which has two - one in the bridge driver and another in the VxLAN driver. Therefore, bridge FDB entries pointing to a VxLAN device are only offloaded if there is a corresponding entry in the VxLAN FDB. Allow clearing the offload indication in case the corresponding entry was deleted from the VxLAN FDB. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	vxlan: Notify for each remote of a removed FDB entry	Petr Machata	1	-1/+4
	When notifications are sent about FDB activity, and an FDB entry with several remotes is removed, the notification is sent only for the first destination. That makes it impossible to distinguish between the case where only this first remote is removed, and the one where the FDB entry is removed as a whole. Therefore send one notification for each remote of a removed FDB entry. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	vxlan: Support marking RDSTs as offloaded	Petr Machata	3	-1/+61
	Offloaded bridge FDB entries are marked with NTF_OFFLOADED. Implement a similar mechanism for VXLAN, where a given remote destination can be marked as offloaded. To that end, introduce a new event, SWITCHDEV_VXLAN_FDB_OFFLOADED, through which the marking is communicated to the vxlan driver. To identify which RDST should be marked as offloaded, an switchdev_notifier_vxlan_fdb_info is passed to the listeners. The "offloaded" flag in that object determines whether the offloaded mark should be set or cleared. When sending offloaded FDB entries over netlink, mark them with NTF_OFFLOADED. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	vxlan: Add vxlan_fdb_find_uc() for FDB querying	Petr Machata	2	-0/+52
	A switchdev-capable driver that is aware of VXLAN may need to query VXLAN FDB. In the particular case of mlxsw, this functionality is limited to querying UC FDBs. Those being easier to deal with than the general case of RDST chain traversal, introduce an interface to query specifically UC FDBs: vxlan_fdb_find_uc(). Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	vxlan: Add switchdev notifications	Petr Machata	3	-2/+58
	When offloading VXLAN devices, drivers need to know about events in VXLAN FDB database. Since VXLAN models a bridge, it is natural to distribute the VXLAN FDB notifications using the pre-existing switchdev notification mechanism. To that end, introduce two new notification types: SWITCHDEV_VXLAN_FDB_ADD_TO_DEVICE and SWITCHDEV_VXLAN_FDB_DEL_TO_DEVICE. Introduce a new function, vxlan_fdb_switchdev_call_notifiers() to send the new notifier types, and a struct switchdev_notifier_vxlan_fdb_info to communicate the details of the FDB entry under consideration. Invoke the new function from vxlan_fdb_notify(). Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	net: Add netif_is_vxlan()	Ido Schimmel	2	-1/+9
	Add the ability to determine whether a netdev is a VxLAN netdev by calling the above mentioned function that checks the netdev's rtnl_link_ops. This will allow modules to identify netdev events involving a VxLAN netdev and act accordingly. For example, drivers capable of VxLAN offload will need to configure the underlying device when a VxLAN netdev is being enslaved to an offloaded bridge. Convert nfp to use the newly introduced helper. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	mlxsw: spectrum_router: Configure matching local routes for NVE decap	Ido Schimmel	1	-0/+10
	When a local route that matches the source IP of an offloaded NVE tunnel is notified, the driver needs to program it to perform NVE decapsulation instead of merely trapping packets to the CPU. This patch complements "mlxsw: spectrum_router: Enable local routes promotion to perform NVE decap" where existing local routes were promoted to perform NVE decapsulation. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	mlxsw: spectrum_fid: Clear NVE configuration when destroying 802.1D FIDs	Ido Schimmel	1	-0/+2
	802.1D FIDs are used to represent VLAN-unaware bridges and currently this is the only type of FID that supports NVE configuration. Since the NVE tunnel device does not take a reference on the FID, it is possible for the FID to be destroyed when it still has NVE configuration. Therefore, when destroying the FID make sure to disable its NVE configuration. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	mlxsw: spectrum_nve: Implement VxLAN operations	Ido Schimmel	1	-2/+188
	The common NVE core expects each encapsulation type to implement a certain set of operations that are specific to this type and the currently used ASIC. These operations include things such as the ability to determine whether a certain NVE configuration can be offloaded and ASIC-specific initialization for this type. Implement these operations for VxLAN on the Spectrum ASIC. Spectrum-2 support will be added by a future patchset. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	mlxsw: spectrum_nve: Implement common NVE core	Ido Schimmel	6	-1/+1158
	The Spectrum ASIC supports different types of NVE encapsulations (e.g., VxLAN, NVGRE) with more types to be supported by future ASICs. Despite being different, all these encapsulations share some common functionality such as the enablement of NVE encapsulation on a given filtering identifier (FID) and the addition of remote VTEPs to the linked-list of VTEPs that traffic should be flooded to. Implement this common core and allow different ASICs to register different operations for different encapsulation types. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	inet: Refactor INET_ECN_decapsulate()	Ido Schimmel	1	-4/+14
	Drivers that support tunnel decapsulation (IPinIP or NVE) need to configure the underlying device to conform to the behavior outlined in RFC 6040 with respect to the ECN bits. This behavior is implemented by INET_ECN_decapsulate() which requires an skb to be passed where the ECN CE bit can be potentially set. Since these drivers do not need to mark an skb, but only configure the device to do so, factor out the business logic to __INET_ECN_decapsulate() and potentially perform the marking in INET_ECN_decapsulate(). This allows drivers to invoke __INET_ECN_decapsulate() and configure the device. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Suggested-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17	vxlan: Export address checking functions	Ido Schimmel	2	-26/+32
	Drivers that support VxLAN offload need to be able to sanitize the configuration of the VxLAN device and accept / reject its offload. For example, mlxsw requires that the local IP of the VxLAN device be set and that packets be flooded to unicast IP(s) and not to a multicast group. Expose the functions that perform such checks. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>