linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2019-06-06	ipv6: fix EFAULT on sendto with icmpv6 and hdrincl	Olivier Matz	1	-5/+8
	The following code returns EFAULT (Bad address): s = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6); setsockopt(s, SOL_IPV6, IPV6_HDRINCL, 1); sendto(ipv6_icmp6_packet, addr); /* returns -1, errno = EFAULT */ The IPv4 equivalent code works. A workaround is to use IPPROTO_RAW instead of IPPROTO_ICMPV6. The failure happens because 2 bytes are eaten from the msghdr by rawv6_probe_proto_opt() starting from commit 19e3c66b52ca ("ipv6 equivalent of "ipv4: Avoid reading user iov twice after raw_probe_proto_opt""), but at that time it was not a problem because IPV6_HDRINCL was not yet introduced. Only eat these 2 bytes if hdrincl == 0. Fixes: 715f504b1189 ("ipv6: add IPV6_HDRINCL option for raw sockets") Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-06	ipv6: use READ_ONCE() for inet->hdrincl as in ipv4	Olivier Matz	1	-2/+10
	As it was done in commit 8f659a03a0ba ("net: ipv4: fix for a race condition in raw_sendmsg") and commit 20b50d79974e ("net: ipv4: emulate READ_ONCE() on ->hdrincl bit-field in raw_sendmsg()") for ipv4, copy the value of inet->hdrincl in a local variable, to avoid introducing a race condition in the next commit. Signed-off-by: Olivier Matz <olivier.matz@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-06	r8169: silence sparse warning in rtl8169_start_xmit	Heiner Kallweit	1	-1/+1
	The opts[] array is of type u32. Therefore remove the wrong cpu_to_le32(). The opts[] array members are converted to little endian later when being assigned to the respective descriptor fields. This is not a new issue, it just popped up due to r8169.c having been renamed and more thoroughly checked. Due to the renaming this patch applies to net-next only. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-06	Revert "gfs2: Replace gl_revokes with a GLF flag"	Bob Peterson	6	-31/+15
	Commit 73118ca8baf7 introduced a glock reference counting bug in gfs2_trans_remove_revoke. Given that, replacing gl_revokes with a GLF flag is no longer useful, so revert that commit. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-06	arm64: Silence gcc warnings about arch ABI drift	Dave Martin	1	-0/+1
	Since GCC 9, the compiler warns about evolution of the platform-specific ABI, in particular relating for the marshaling of certain structures involving bitfields. The kernel is a standalone binary, and of course nobody would be so stupid as to expose structs containing bitfields as function arguments in ABI. (Passing a pointer to such a struct, however inadvisable, should be unaffected by this change. perf and various drivers rely on that.) So these warnings do more harm than good: turn them off. We may miss warnings about future ABI drift, but that's too bad. Future ABI breaks of this class will have to be debugged and fixed the traditional way unless the compiler evolves finer-grained diagnostics. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-06-06	parisc: Fix crash due alternative coding for NP iopdir_fdc bit	Helge Deller	1	-1/+2
	According to the found documentation, data cache flushes and sync instructions are needed on the PCX-U+ (PA8200, e.g. C200/C240) platforms, while PCX-W (PA8500, e.g. C360) platforms aparently don't need those flushes when changing the IO PDIR data structures. We have no documentation for PCX-W+ (PA8600) and PCX-W2 (PA8700) CPUs, but Carlo Pisani reported that his C3600 machine (PA8600, PCX-W+) fails when the fdc instructions were removed. His firmware didn't set the NIOP bit, so one may assume it's a firmware bug since other C3750 machines had the bit set. Even if documentation (as mentioned above) states that PCX-W (PA8500, e.g. J5000) does not need fdc flushes, Sven could show that an Adaptec 29320A PCI-X SCSI controller reliably failed on a dd command during the first five minutes in his J5000 when fdc flushes were missing. Going forward, we will now NOT replace the fdc and sync assembler instructions by NOPS if: a) the NP iopdir_fdc bit was set by firmware, or b) we find a CPU up to and including a PCX-W+ (PA8600). This fixes the HPMC crashes on a C240 and C36XX machines. For other machines we rely on the firmware to set the bit when needed. In case one finds HPMC issues, people could try to boot their machines with the "no-alternatives" kernel option to turn off any alternative patching. Reported-by: Sven Schnelle <svens@stackframe.org> Reported-by: Carlo Pisani <carlojpisani@gmail.com> Tested-by: Sven Schnelle <svens@stackframe.org> Fixes: 3847dab77421 ("parisc: Add alternative coding infrastructure") Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org # 5.0+
2019-06-06	parisc: Use lpa instruction to load physical addresses in driver code	John David Anglin	3	-2/+26
	Most I/O in the kernel is done using the kernel offset mapping. However, there is one API that uses aliased kernel address ranges: > The final category of APIs is for I/O to deliberately aliased address > ranges inside the kernel. Such aliases are set up by use of the > vmap/vmalloc API. Since kernel I/O goes via physical pages, the I/O > subsystem assumes that the user mapping and kernel offset mapping are > the only aliases. This isn't true for vmap aliases, so anything in > the kernel trying to do I/O to vmap areas must manually manage > coherency. It must do this by flushing the vmap range before doing > I/O and invalidating it after the I/O returns. For this reason, we should use the hardware lpa instruction to load the physical address of kernel virtual addresses in the driver code. I believe we only use the vmap/vmalloc API with old PA 1.x processors which don't have a sba, so we don't hit this problem. Tested on c3750, c8000 and rp3440. Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de>
2019-06-06	parisc: configs: Remove useless UEVENT_HELPER_PATH	Krzysztof Kozlowski	7	-7/+0
	Remove the CONFIG_UEVENT_HELPER_PATH because: 1. It is disabled since commit 1be01d4a5714 ("driver: base: Disable CONFIG_UEVENT_HELPER by default") as its dependency (UEVENT_HELPER) was made default to 'n', 2. It is not recommended (help message: "This should not be used today [...] creates a high system load") and was kept only for ancient userland, 3. Certain userland specifically requests it to be disabled (systemd README: "Legacy hotplug slows down the system and confuses udev"). Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Helge Deller <deller@gmx.de>
2019-06-06	parisc: Use implicit space register selection for loading the coherence index of I/O pdirs	John David Anglin	2	-5/+2
	We only support I/O to kernel space. Using %sr1 to load the coherence index may be racy unless interrupts are disabled. This patch changes the code used to load the coherence index to use implicit space register selection. This saves one instruction and eliminates the race. Tested on rp3440, c8000 and c3750. Signed-off-by: John David Anglin <dave.anglin@bell.net> Cc: stable@vger.kernel.org Signed-off-by: Helge Deller <deller@gmx.de>
2019-06-06	ARM64: trivial: s/TIF_SECOMP/TIF_SECCOMP/ comment typo fix	George G. Davis	1	-1/+1
	Fix a s/TIF_SECOMP/TIF_SECCOMP/ comment typo Cc: Jiri Kosina <trivial@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org Signed-off-by: George G. Davis <george_davis@mentor.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-06-05	ipv6: tcp: send consistent flowlabel in TIME_WAIT state	Eric Dumazet	1	-0/+2
	After commit 1d13a96c74fc ("ipv6: tcp: fix flowlabel value in ACK messages"), we stored in tw_flowlabel the flowlabel, in the case ACK packets needed to be sent on behalf of a TIME_WAIT socket. We can use the same field so that RST packets sent from TIME_WAIT state also use a consistent flowlabel. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05	ipv6: tcp: enable flowlabel reflection in some RST packets	Eric Dumazet	4	-9/+29
	When RST packets are sent because no socket could be found, it makes sense to use flowlabel_reflect sysctl to decide if a reflection of the flowlabel is requested. This extends commit 22b6722bfa59 ("ipv6: Add sysctl for per namespace flow label reflection"), for some TCP RST packets. In order to provide full control of this new feature, flowlabel_reflect becomes a bitmask. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05	lib: objagg: Use struct_size() in kzalloc()	Gustavo A. R. Silva	1	-4/+2
	One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct objagg_stats { ... struct objagg_obj_stats_info stats_info[]; }; size = sizeof(objagg_stats) + sizeof(objagg_stats->stats_info[0]) count; instance = kzalloc(size, GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kzalloc(struct_size(instance, stats_info, count), GFP_KERNEL); Notice that, in this case, variable alloc_size is not necessary, hence it is removed. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05	inet_connection_sock: remove unused parameter of reqsk_queue_unlink func	Zhiqiang Liu	1	-3/+2
	small cleanup: "struct request_sock_queue *queue" parameter of reqsk_queue_unlink func is never used in the func, so we can remove it. Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05	Revert "fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied"	Hangbin Liu	1	-3/+3
	This reverts commit e9919a24d3022f72bcadc407e73a6ef17093a849. Nathan reported the new behaviour breaks Android, as Android just add new rules and delete old ones. If we return 0 without adding dup rules, Android will remove the new added rules and causing system to soft-reboot. Fixes: e9919a24d302 ("fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied") Reported-by: Nathan Chancellor <natechancellor@gmail.com> Reported-by: Yaro Slav <yaro330@gmail.com> Reported-by: Maciej Żenczykowski <zenczykowski@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05	net: phy: remove state PHY_FORCING	Heiner Kallweit	2	-35/+2
	In the early days of phylib we had a functionality that changed to the next lower speed in fixed mode if no link was established after a certain period of time. This functionality has been removed years ago, and state PHY_FORCING isn't needed any longer. Instead we can go from UP to RUNNING or NOLINK directly (same as in autoneg mode). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05	net: aquantia: fix wol configuration not applied sometimes	Nikita Danilov	2	-8/+10
	WoL magic packet configuration sometimes does not work due to couple of leakages found. Mainly there was a regression introduced during readx_poll refactoring. Next, fw request waiting time was too small. Sometimes that caused sleep proxy config function to return with an error and to skip WoL configuration. At last, WoL data were passed to FW from not clean buffer. That could cause FW to accept garbage as a random configuration data. Fixes: 6a7f2277313b ("net: aquantia: replace AQ_HW_WAIT_FOR with readx_poll_timeout_atomic") Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05	ethtool: fix potential userspace buffer overflow	Vivien Didelot	1	-1/+4
	ethtool_get_regs() allocates a buffer of size ops->get_regs_len(), and pass it to the kernel driver via ops->get_regs() for filling. There is no restriction about what the kernel drivers can or cannot do with the open ethtool_regs structure. They usually set regs->version and ignore regs->len or set it to the same size as ops->get_regs_len(). But if userspace allocates a smaller buffer for the registers dump, we would cause a userspace buffer overflow in the final copy_to_user() call, which uses the regs.len value potentially reset by the driver. To fix this, make this case obvious and store regs.len before calling ops->get_regs(), to only copy as much data as requested by userspace, up to the value returned by ops->get_regs_len(). While at it, remove the redundant check for non-null regbuf. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05	Fix memory leak in sctp_process_init	Neil Horman	2	-10/+8
	syzbot found the following leak in sctp_process_init BUG: memory leak unreferenced object 0xffff88810ef68400 (size 1024): comm "syz-executor273", pid 7046, jiffies 4294945598 (age 28.770s) hex dump (first 32 bytes): 1d de 28 8d de 0b 1b e3 b5 c2 f9 68 fd 1a 97 25 ..(........h...% 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000a02cebbd>] kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline] [<00000000a02cebbd>] slab_post_alloc_hook mm/slab.h:439 [inline] [<00000000a02cebbd>] slab_alloc mm/slab.c:3326 [inline] [<00000000a02cebbd>] __do_kmalloc mm/slab.c:3658 [inline] [<00000000a02cebbd>] __kmalloc_track_caller+0x15d/0x2c0 mm/slab.c:3675 [<000000009e6245e6>] kmemdup+0x27/0x60 mm/util.c:119 [<00000000dfdc5d2d>] kmemdup include/linux/string.h:432 [inline] [<00000000dfdc5d2d>] sctp_process_init+0xa7e/0xc20 net/sctp/sm_make_chunk.c:2437 [<00000000b58b62f8>] sctp_cmd_process_init net/sctp/sm_sideeffect.c:682 [inline] [<00000000b58b62f8>] sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1384 [inline] [<00000000b58b62f8>] sctp_side_effects net/sctp/sm_sideeffect.c:1194 [inline] [<00000000b58b62f8>] sctp_do_sm+0xbdc/0x1d60 net/sctp/sm_sideeffect.c:1165 [<0000000044e11f96>] sctp_assoc_bh_rcv+0x13c/0x200 net/sctp/associola.c:1074 [<00000000ec43804d>] sctp_inq_push+0x7f/0xb0 net/sctp/inqueue.c:95 [<00000000726aa954>] sctp_backlog_rcv+0x5e/0x2a0 net/sctp/input.c:354 [<00000000d9e249a8>] sk_backlog_rcv include/net/sock.h:950 [inline] [<00000000d9e249a8>] __release_sock+0xab/0x110 net/core/sock.c:2418 [<00000000acae44fa>] release_sock+0x37/0xd0 net/core/sock.c:2934 [<00000000963cc9ae>] sctp_sendmsg+0x2c0/0x990 net/sctp/socket.c:2122 [<00000000a7fc7565>] inet_sendmsg+0x64/0x120 net/ipv4/af_inet.c:802 [<00000000b732cbd3>] sock_sendmsg_nosec net/socket.c:652 [inline] [<00000000b732cbd3>] sock_sendmsg+0x54/0x70 net/socket.c:671 [<00000000274c57ab>] ___sys_sendmsg+0x393/0x3c0 net/socket.c:2292 [<000000008252aedb>] __sys_sendmsg+0x80/0xf0 net/socket.c:2330 [<00000000f7bf23d1>] __do_sys_sendmsg net/socket.c:2339 [inline] [<00000000f7bf23d1>] __se_sys_sendmsg net/socket.c:2337 [inline] [<00000000f7bf23d1>] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2337 [<00000000a8b4131f>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:3 The problem was that the peer.cookie value points to an skb allocated area on the first pass through this function, at which point it is overwritten with a heap allocated value, but in certain cases, where a COOKIE_ECHO chunk is included in the packet, a second pass through sctp_process_init is made, where the cookie value is re-allocated, leaking the first allocation. Fix is to always allocate the cookie value, and free it when we are done using it. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: syzbot+f7e9153b037eac9b1df8@syzkaller.appspotmail.com CC: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> CC: "David S. Miller" <davem@davemloft.net> CC: netdev@vger.kernel.org Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05	net: rds: fix memory leak when unload rds_rdma	Zhu Yanjun	2	-1/+4
	When KASAN is enabled, after several rds connections are created, then "rmmod rds_rdma" is run. The following will appear. " BUG rds_ib_incoming (Not tainted): Objects remaining in rds_ib_incoming on __kmem_cache_shutdown() Call Trace: dump_stack+0x71/0xab slab_err+0xad/0xd0 __kmem_cache_shutdown+0x17d/0x370 shutdown_cache+0x17/0x130 kmem_cache_destroy+0x1df/0x210 rds_ib_recv_exit+0x11/0x20 [rds_rdma] rds_ib_exit+0x7a/0x90 [rds_rdma] __x64_sys_delete_module+0x224/0x2c0 ? __ia32_sys_delete_module+0x2c0/0x2c0 do_syscall_64+0x73/0x190 entry_SYSCALL_64_after_hwframe+0x44/0xa9 " This is rds connection memory leak. The root cause is: When "rmmod rds_rdma" is run, rds_ib_remove_one will call rds_ib_dev_shutdown to drop the rds connections. rds_ib_dev_shutdown will call rds_conn_drop to drop rds connections as below. " rds_conn_path_drop(&conn->c_path[0], false); " In the above, destroy is set to false. void rds_conn_path_drop(struct rds_conn_path cp, bool destroy) { atomic_set(&cp->cp_state, RDS_CONN_ERROR); rcu_read_lock(); if (!destroy && rds_destroy_pending(cp->cp_conn)) { rcu_read_unlock(); return; } queue_work(rds_wq, &cp->cp_down_w); rcu_read_unlock(); } In the above function, destroy is set to false. rds_destroy_pending is called. This does not move rds connections to ib_nodev_conns. So destroy is set to true to move rds connections to ib_nodev_conns. In rds_ib_unregister_client, flush_workqueue is called to make rds_wq finsh shutdown rds connections. The function rds_ib_destroy_nodev_conns is called to shutdown rds connections finally. Then rds_ib_recv_exit is called to destroy slab. void rds_ib_recv_exit(void) { kmem_cache_destroy(rds_ib_incoming_slab); kmem_cache_destroy(rds_ib_frag_slab); } The above slab memory leak will not occur again. >From tests, 256 rds connections [root@ca-dev14 ~]# time rmmod rds_rdma real 0m16.522s user 0m0.000s sys 0m8.152s 512 rds connections [root@ca-dev14 ~]# time rmmod rds_rdma real 0m32.054s user 0m0.000s sys 0m15.568s To rmmod rds_rdma with 256 rds connections, about 16 seconds are needed. And with 512 rds connections, about 32 seconds are needed. >From ftrace, when one rds connection is destroyed, " 19) \| rds_conn_destroy [rds]() { 19) 7.782 us \| rds_conn_path_drop [rds](); 15) \| rds_shutdown_worker [rds]() { 15) \| rds_conn_shutdown [rds]() { 15) 1.651 us \| rds_send_path_reset [rds](); 15) 7.195 us \| } 15) + 11.434 us \| } 19) 2.285 us \| rds_cong_remove_conn [rds](); 19) 24062.76 us \| } " So if many rds connections will be destroyed, this function rds_ib_destroy_nodev_conns uses most of time. Suggested-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05	net: rds: add per rds connection cache statistics	Zhu Yanjun	2	-0/+4
	The variable cache_allocs is to indicate how many frags (KiB) are in one rds connection frag cache. The command "rds-info -Iv" will output the rds connection cache statistics as below: " RDS IB Connections: LocalAddr RemoteAddr Tos SL LocalDev RemoteDev 1.1.1.14 1.1.1.14 58 255 fe80::2:c903:a:7a31 fe80::2:c903:a:7a31 send_wr=256, recv_wr=1024, send_sge=8, rdma_mr_max=4096, rdma_mr_size=257, cache_allocs=12 " This means that there are about 12KiB frag in this rds connection frag cache. Since rds.h in rds-tools is not related with the kernel rds.h, the change in kernel rds.h does not affect rds-tools. rds-info in rds-tools 2.0.5 and 2.0.6 is tested with this commit. It works well. Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05	net: stmmac: dwmac4: fix flow control issue	Biao Huang	1	-2/+6
	Current dwmac4_flow_ctrl will not clear GMAC_RX_FLOW_CTRL_RFE/GMAC_RX_FLOW_CTRL_RFE bits, so MAC hw will keep flow control on although expecting flow control off by ethtool. Add codes to fix it. Fixes: 477286b53f55 ("stmmac: add GMAC4 core support") Signed-off-by: Biao Huang <biao.huang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05	net: stmmac: modify default value of tx-frames	Biao Huang	1	-1/+1
	the default value of tx-frames is 25, it's too late when passing tstamp to stack, then the ptp4l will fail: ptp4l -i eth0 -f gPTP.cfg -m ptp4l: selected /dev/ptp0 as PTP clock ptp4l: port 1: INITIALIZING to LISTENING on INITIALIZE ptp4l: port 0: INITIALIZING to LISTENING on INITIALIZE ptp4l: port 1: link up ptp4l: timed out while polling for tx timestamp ptp4l: increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug ptp4l: port 1: send peer delay response failed ptp4l: port 1: LISTENING to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED) ptp4l tests pass when changing the tx-frames from 25 to 1 with ethtool -C option. It should be fine to set tx-frames default value to 1, so ptp4l will pass by default. Signed-off-by: Biao Huang <biao.huang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05	net: stmmac: dwmac-mediatek: disable rx watchdog	Biao Huang	1	-0/+1
	disable rx watchdog for dwmac-mediatek, then the hw will issue a rx interrupt once receiving a packet, so the responding time for rx path will be reduced. Signed-off-by: Biao Huang <biao.huang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05	net: stmmac: dwmac-mediatek: enable Ethernet power domain	Biao Huang	1	-0/+7
	add Ethernet power on/off operations in init/exit flow. Signed-off-by: Biao Huang <biao.huang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05	ipv6: fix the check before getting the cookie in rt6_get_cookie	Xin Long	1	-2/+1
	In Jianlin's testing, netperf was broken with 'Connection reset by peer', as the cookie check failed in rt6_check() and ip6_dst_check() always returned NULL. It's caused by Commit 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes"), where the cookie can be got only when 'c1'(see below) for setting dst_cookie whereas rt6_check() is called when !'c1' for checking dst_cookie, as we can see in ip6_dst_check(). Since in ip6_dst_check() both rt6_dst_from_check() (c1) and rt6_check() (!c1) will check the 'from' cookie, this patch is to remove the c1 check in rt6_get_cookie(), so that the dst_cookie can always be set properly. c1: (rt->rt6i_flags & RTF_PCPU \|\| unlikely(!list_empty(&rt->rt6i_uncached))) Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05	ipv4: not do cache for local delivery if bc_forwarding is enabled	Xin Long	1	-12/+12
	With the topo: h1 ---\| rp1 \| \| route rp3 \|--- h3 (192.168.200.1) h2 ---\| rp2 \| If rp1 bc_forwarding is set while rp2 bc_forwarding is not, after doing "ping 192.168.200.255" on h1, then ping 192.168.200.255 on h2, and the packets can still be forwared. This issue was caused by the input route cache. It should only do the cache for either bc forwarding or local delivery. Otherwise, local delivery can use the route cache for bc forwarding of other interfaces. This patch is to fix it by not doing cache for local delivery if all.bc_forwarding is enabled. Note that we don't fix it by checking route cache local flag after rt_cache_valid() in "local_input:" and "ip_mkroute_input", as the common route code shouldn't be touched for bc_forwarding. Fixes: 5cbf777cfdf6 ("route: add support for directed broadcast forwarding") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05	drivers: net: vxlan: drop unneeded likely() call around IS_ERR()	Enrico Weigelt	1	-1/+1
	IS_ERR() already calls unlikely(), so this extra likely() call around the !IS_ERR() is not needed. Signed-off-by: Enrico Weigelt <info@metux.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05	net: ipv6: drop unneeded likely() call around IS_ERR()	Enrico Weigelt	2	-2/+2
	IS_ERR() already calls unlikely(), so this extra unlikely() call around IS_ERR() is not needed. Signed-off-by: Enrico Weigelt <info@metux.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05	net: ipv4: drop unneeded likely() call around IS_ERR()	Enrico Weigelt	4	-4/+4
	IS_ERR() already calls unlikely(), so this extra unlikely() call around IS_ERR() is not needed. Signed-off-by: Enrico Weigelt <info@metux.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05	net: openvswitch: drop unneeded likely() call around IS_ERR()	Enrico Weigelt	1	-1/+1
	IS_ERR() already calls unlikely(), so this extra likely() call around the !IS_ERR() is not needed. Signed-off-by: Enrico Weigelt <info@metux.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05	net: socket: drop unneeded likely() call around IS_ERR()	Enrico Weigelt	1	-1/+1
	IS_ERR() already calls unlikely(), so this extra likely() call around the !IS_ERR() is not needed. Signed-off-by: Enrico Weigelt <info@metux.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05	nfp: flower: use struct_size() helper	Gustavo A. R. Silva	1	-2/+1
	One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct nfp_tun_active_tuns { ... struct route_ip_info { __be32 ipv4; __be32 egress_port; __be32 extra[2]; } tun_info[]; }; Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes. So, replace the following form: sizeof(struct nfp_tun_active_tuns) + sizeof(struct route_ip_info) * count with: struct_size(payload, tun_info, count) This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05	i40e: Check and set the PF driver state first in i40e_ndo_set_vf_mac	Lihong Yang	1	-5/+5
	The PF driver state flag __I40E_VIRTCHNL_OP_PENDING needs to be checked and set at the beginning of i40e_ndo_set_vf_mac. Otherwise, if there are error conditions before it, the flag will be cleared unexpectedly by this function to cause potential race conditions. Hence move the check to the top of this function. Signed-off-by: Lihong Yang <lihong.yang@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05	i40e: Do not check VF state in i40e_ndo_get_vf_config	Lihong Yang	1	-4/+2
	The VF configuration returned in i40e_ndo_get_vf_config is already stored by the PF. There is no dependency on any specific state of the VF to return the configuration. Drop the check against I40E_VF_STATE_INIT since it is not needed. Signed-off-by: Lihong Yang <lihong.yang@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05	net: ixgbevf: fix a missing check of ixgbevf_write_msg_read_ack	Kangjie Lu	1	-3/+2
	If ixgbevf_write_msg_read_ack fails, return its error code upstream Signed-off-by: Kangjie Lu <kjlu@umn.edu> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-05	ixgbe: implement support for SDP/PPS output on X550 hardware	Jacob Keller	2	-5/+108
	Similar to the X540 hardware, enable support for generating a 1pps output signal on SDP0. This support is slightly different to the X540 hardware, because of the register layout changes. First, the system time register is now represented in 'cycles' and 'billions of cycles'. Second, we need to also program the TSSDP register, as well as the ESDP register. Third, the clock output uses only FREQOUT, instead of a full 64bit value for the output clock period. Finally, we have to use the ST0 bit instead of the SYNCLK bit in the TSAUXC register. This support should work even for the hardware with a higher frequency clock, as it carefully takes into account the multiply and shift of the cycle counter used. We also set the pps configuration to 1, since we now support generating a pulse per second output. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-05	net: hns3: Use LLDP ethertype define ETH_P_LLDP	Anirudh Venkataramanan	2	-2/+1
	Remove references to HCLGE_MAC_ETHERTYPE_LLDP and use ETH_P_LLDP instead. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-05	ice: Use LLDP ethertype define ETH_P_LLDP	Jeff Kirsher	1	-3/+1
	Instead of using a local define for the LLDP ethertype, use the kernel define ETH_P_LLDP. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-05	ixgbe: Use LLDP ethertype define ETH_P_LLDP	Anirudh Venkataramanan	2	-3/+1
	Remove references to IXGBE_ETH_P_LLD and use ETH_P_LLDP instead. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-05	i40e: Use LLDP ethertype define ETH_P_LLDP	Anirudh Venkataramanan	2	-4/+2
	Remove references to I40E_ETH_P_LLDP and use ETH_P_LLDP instead. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-05	net: Add a define for LLDP ethertype	Anirudh Venkataramanan	1	-0/+1
	Add a new define ETH_P_LLDP for Link Layer Discovery Protocol (LLDP) ethertype. Suggested-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-05	ixgbe: add a kernel documentation comment for ixgbe_ptp_get_ts_config	Jacob Keller	1	-0/+9
	This function was missing a documentation comment. Add one now. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-05	ixgbe: use 'cc' instead of 'hw_cc' for local variable	Jacob Keller	1	-3/+3
	The ixgbe_ptp.c file sometimes uses hw_cc as the local variable for the cycle counter in ixgbe_ptp_read_X550. However, we use just 'cc' as a local variable for this by convention else where in the file. Convert this lone usage of 'hw_cc' into just the shorter 'cc' name to match the other read functions in the file. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-05	ixgbe: fix PTP SDP pin setup on X540 hardware	Jacob Keller	1	-29/+42
	The function ixgbe_ptp_setup_sdp_X540 attempts to program a software defined pin, in order to generate a pulse-per-second output on SDP 0. It does work to generate the output, but does not align the output on the full second. Additionally, it does not take into account the cyclecounter multiplier. This leads to somewhat confusing code which is likely to be incorrect if blindly copied to another hardware type. Update this code to account for the cyclecounter multiplier, and to directly use timecounter_read. This change ensures that the SDP output will align properly on a full second, and makes the intent of the calculations a bit more clear. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-05	ixgbe: reduce PTP Tx timestamp timeout to 1 second	Jacob Keller	1	-1/+1
	Previously we waited for a whole 15 seconds before we cleared the Tx timestamp state. This is astronomically long compared to the worst case timings expected by our devices. In addition, this is longer than the wait in ptp4l when it detects a fault (caused by missing Tx timestamps). Thus, reduce the timer to only 1 second, which is well after the maximum expected delay. This should reduce user frustration when a timestamp does get dropped for some reason. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-05	ixgbe: fix AF_XDP tx packet count	William Tu	1	-0/+1
	The total_packets count at ixgbe_clean_xdp_tx_irq is always zero when testing with xdpsock -t -N. Set the gso_segs to 1 to make the tx packet count correct. Signed-off-by: William Tu <u9012063@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-05	ixgbe: fix AF_XDP tx byte count	William Tu	1	-1/+0
	The tx bytecount is done twice. When running './xdpsock -t -N -i eth3' and 'ip -s link show dev eth3' The avg packet size is 120 instead of 60. So remove the extra one. Signed-off-by: William Tu <u9012063@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-05	ixgbe: remove umem from adapter	Jan Sokolowski	2	-71/+19
	As current implementation of netdev already contains and provides umems for us, we no longer have the need to contain these structures in ixgbe_adapter. Refactor the code to operate on netdev-provided umems. Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-06-05	ixgbe: add tracking of AF_XDP zero-copy state for each queue pair	Jan Sokolowski	3	-1/+11
	Here, we add a bitmap to the ixgbe_adapter that tracks if a certain queue pair has been "zero-copy enabled" via the ndo_bpf. The bitmap is used in ixgbe_xsk_umem, and enables zero-copy if and only if XDP is enabled, the corresponding qid in the bitmap is set, and the umem is non-NULL; Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>