linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2019-05-07	net: phy: improve pause mode reporting in phy_print_status	Heiner Kallweit	1	-1/+27
	So far we report symmetric pause only, and we don't consider the local pause capabilities. Let's properly consider local and remote capabilities, and report also asymmetric pause. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	dt-bindings: net: Fix a typo in the phy-mode list for ethernet bindings	Maxime Chevallier	1	-1/+1
	The phy_mode "2000base-x" is actually supposed to be "1000base-x", even though the commit title of the original patch says otherwise. Fixes: 55601a880690 ("net: phy: Add 2000base-x, 2500base-x and rxaui modes") Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	net: macb: Change interrupt and napi enable order in open	Harini Katakam	1	-3/+3
	Current order in open: -> Enable interrupts (macb_init_hw) -> Enable NAPI -> Start PHY Sequence of RX handling: -> RX interrupt occurs -> Interrupt is cleared and interrupt bits disabled in handler -> NAPI is scheduled -> In NAPI, RX budget is processed and RX interrupts are re-enabled With the above, on QEMU or fixed link setups (where PHY state doesn't matter), there's a chance macb RX interrupt occurs before NAPI is enabled. This will result in NAPI being scheduled before it is enabled. Fix this macb open by changing the order. Fixes: ae1f2a56d273 ("net: macb: Added support for many RX queues") Signed-off-by: Harini Katakam <harini.katakam@xilinx.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	net: ll_temac: Improve error message on error IRQ	Esben Haabendal	1	-2/+8
	The channel status register value can be very helpful when debugging SDMA problems. Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	net/sched: remove block pointer from common offload structure	Pieter Jansen van Vuuren	8	-36/+25
	Based on feedback from Jiri avoid carrying a pointer to the tcf_block structure in the tc_cls_common_offload structure. Instead store a flag in driver private data which indicates if offloads apply to a shared block at block binding time. Suggested-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	net: ethernet: support of_get_mac_address new ERR_PTR error	Petr Štetiar	45	-45/+45
	There was NVMEM support added to of_get_mac_address, so it could now return ERR_PTR encoded error values, so we need to adjust all current users of of_get_mac_address to this new fact. While at it, remove superfluous is_valid_ether_addr as the MAC address returned from of_get_mac_address is always valid and checked by is_valid_ether_addr anyway. Fixes: d01f449c008a ("of_net: add NVMEM support to of_get_mac_address") Signed-off-by: Petr Štetiar <ynezz@true.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	net: usb: smsc: fix warning reported by kbuild test robot	Petr Štetiar	2	-2/+2
	This patch fixes following warning reported by kbuild test robot: In function ‘memcpy’, inlined from ‘smsc75xx_init_mac_address’ at drivers/net/usb/smsc75xx.c:778:3, inlined from ‘smsc75xx_bind’ at drivers/net/usb/smsc75xx.c:1501:2: ./include/linux/string.h:355:9: warning: argument 2 null where non-null expected [-Wnonnull] return __builtin_memcpy(p, q, size); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/net/usb/smsc75xx.c: In function ‘smsc75xx_bind’: ./include/linux/string.h:355:9: note: in a call to built-in function ‘__builtin_memcpy’ I've replaced the offending memcpy with ether_addr_copy, because I'm 100% sure, that of_get_mac_address can't return NULL as it returns valid pointer or ERR_PTR encoded value, nothing else. I'm hesitant to just change IS_ERR into IS_ERR_OR_NULL check, as this would make the warning disappear also, but it would be confusing to check for impossible return value just to make a compiler happy. Fixes: adfb3cb2c52e ("net: usb: support of_get_mac_address new ERR_PTR error") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Petr Štetiar <ynezz@true.cz> Reviewed-by: Woojung Huh <woojung.huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	staging: octeon-ethernet: Fix of_get_mac_address ERR_PTR check	Petr Štetiar	1	-1/+1
	Commit 284eb160681c ("staging: octeon-ethernet: support of_get_mac_address new ERR_PTR error") has introduced checking for ERR_PTR encoded error value from of_get_mac_address with IS_ERR macro, which is not sufficient in this case, as the mac variable is set to NULL initialy and if the kernel is compiled without DT support this NULL would get passed to IS_ERR, which would lead to the wrong decision and would pass that NULL pointer and invalid MAC address further. Fixes: 284eb160681c ("staging: octeon-ethernet: support of_get_mac_address new ERR_PTR error") Signed-off-by: Petr Štetiar <ynezz@true.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	net: dsa: support of_get_mac_address new ERR_PTR error	Petr Štetiar	1	-1/+1
	There was NVMEM support added to of_get_mac_address, so it could now return ERR_PTR encoded error values, so we need to adjust all current users of of_get_mac_address to this new fact. While at it, remove superfluous is_valid_ether_addr as the MAC address returned from of_get_mac_address is always valid and checked by is_valid_ether_addr anyway. Fixes: d01f449c008a ("of_net: add NVMEM support to of_get_mac_address") Signed-off-by: Petr Štetiar <ynezz@true.cz> Tested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	net: dsa: sja1105: Fix status initialization in sja1105_get_ethtool_stats	Nathan Chancellor	1	-1/+3
	Clang warns: drivers/net/dsa/sja1105/sja1105_ethtool.c:316:39: warning: suggest braces around initialization of subobject [-Wmissing-braces] struct sja1105_port_status status = {0}; ^ {} 1 warning generated. One way to fix these warnings is to add additional braces like Clang suggests; however, there has been a bit of push back from some maintainers[1][2], who just prefer memset as it is unambiguous, doesn't depend on a particular compiler version[3], and properly initializes all subobjects. Do that here so there are no more warnings. [1]: https://lore.kernel.org/lkml/022e41c0-8465-dc7a-a45c-64187ecd9684@amd.com/ [2]: https://lore.kernel.org/lkml/20181128.215241.702406654469517539.davem@davemloft.net/ [3]: https://lore.kernel.org/lkml/20181116150432.2408a075@redhat.com/ Fixes: 52c34e6e125c ("net: dsa: sja1105: Add support for ethtool port counters") Link: https://github.com/ClangBuiltLinux/linux/issues/471 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	vrf: sit mtu should not be updated when vrf netdev is the link	Stephen Suryaputra	1	-1/+1
	VRF netdev mtu isn't typically set and have an mtu of 65536. When the link of a tunnel is set, the tunnel mtu is changed from 1480 to the link mtu minus tunnel header. In the case of VRF netdev is the link, then the tunnel mtu becomes 65516. So, fix it by not setting the tunnel mtu in this case. Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	net: dsa: Fix error cleanup path in dsa_init_module	YueHaibing	1	-2/+9
	BUG: unable to handle kernel paging request at ffffffffa01c5430 PGD 3270067 P4D 3270067 PUD 3271063 PMD 230bc5067 PTE 0 Oops: 0000 [#1 CPU: 0 PID: 6159 Comm: modprobe Not tainted 5.1.0+ #33 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014 RIP: 0010:raw_notifier_chain_register+0x16/0x40 Code: 63 f8 66 90 e9 5d ff ff ff 90 90 90 90 90 90 90 90 90 90 90 55 48 8b 07 48 89 e5 48 85 c0 74 1c 8b 56 10 3b 50 10 7e 07 eb 12 <39> 50 10 7c 0d 48 8d 78 08 48 8b 40 08 48 85 c0 75 ee 48 89 46 08 RSP: 0018:ffffc90001c33c08 EFLAGS: 00010282 RAX: ffffffffa01c5420 RBX: ffffffffa01db420 RCX: 4fcef45928070a8b RDX: 0000000000000000 RSI: ffffffffa01db420 RDI: ffffffffa01b0068 RBP: ffffc90001c33c08 R08: 000000003e0a33d0 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000094443661 R12: ffff88822c320700 R13: ffff88823109be80 R14: 0000000000000000 R15: ffffc90001c33e78 FS: 00007fab8bd08540(0000) GS:ffff888237a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffa01c5430 CR3: 00000002297ea000 CR4: 00000000000006f0 Call Trace: register_netdevice_notifier+0x43/0x250 ? 0xffffffffa01e0000 dsa_slave_register_notifier+0x13/0x70 [dsa_core ? 0xffffffffa01e0000 dsa_init_module+0x2e/0x1000 [dsa_core do_one_initcall+0x6c/0x3cc ? do_init_module+0x22/0x1f1 ? rcu_read_lock_sched_held+0x97/0xb0 ? kmem_cache_alloc_trace+0x325/0x3b0 do_init_module+0x5b/0x1f1 load_module+0x1db1/0x2690 ? m_show+0x1d0/0x1d0 __do_sys_finit_module+0xc5/0xd0 __x64_sys_finit_module+0x15/0x20 do_syscall_64+0x6b/0x1d0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Cleanup allocated resourses if there are errors, otherwise it will trgger memleak. Fixes: c9eb3e0f8701 ("net: dsa: Add support for learning FDB through notification") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	l2tp: Fix possible NULL pointer dereference	YueHaibing	1	-1/+2
	BUG: unable to handle kernel NULL pointer dereference at 0000000000000128 PGD 0 P4D 0 Oops: 0000 [#1 CPU: 0 PID: 5697 Comm: modprobe Tainted: G W 5.1.0-rc7+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014 RIP: 0010:__lock_acquire+0x53/0x10b0 Code: 8b 1c 25 40 5e 01 00 4c 8b 6d 10 45 85 e4 0f 84 bd 06 00 00 44 8b 1d 7c d2 09 02 49 89 fe 41 89 d2 45 85 db 0f 84 47 02 00 00 <48> 81 3f a0 05 70 83 b8 00 00 00 00 44 0f 44 c0 83 fe 01 0f 86 3a RSP: 0018:ffffc90001c07a28 EFLAGS: 00010002 RAX: 0000000000000000 RBX: ffff88822f038440 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000128 RBP: ffffc90001c07a88 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000001 R13: 0000000000000000 R14: 0000000000000128 R15: 0000000000000000 FS: 00007fead0811540(0000) GS:ffff888237a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000128 CR3: 00000002310da000 CR4: 00000000000006f0 Call Trace: ? __lock_acquire+0x24e/0x10b0 lock_acquire+0xdf/0x230 ? flush_workqueue+0x71/0x530 flush_workqueue+0x97/0x530 ? flush_workqueue+0x71/0x530 l2tp_exit_net+0x170/0x2b0 [l2tp_core ? l2tp_exit_net+0x93/0x2b0 [l2tp_core ops_exit_list.isra.6+0x36/0x60 unregister_pernet_operations+0xb8/0x110 unregister_pernet_device+0x25/0x40 l2tp_init+0x55/0x1000 [l2tp_core ? 0xffffffffa018d000 do_one_initcall+0x6c/0x3cc ? do_init_module+0x22/0x1f1 ? rcu_read_lock_sched_held+0x97/0xb0 ? kmem_cache_alloc_trace+0x325/0x3b0 do_init_module+0x5b/0x1f1 load_module+0x1db1/0x2690 ? m_show+0x1d0/0x1d0 __do_sys_finit_module+0xc5/0xd0 __x64_sys_finit_module+0x15/0x20 do_syscall_64+0x6b/0x1d0 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7fead031a839 Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffe8d9acca8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 RAX: ffffffffffffffda RBX: 0000560078398b80 RCX: 00007fead031a839 RDX: 0000000000000000 RSI: 000056007659dc2e RDI: 0000000000000003 RBP: 000056007659dc2e R08: 0000000000000000 R09: 0000560078398b80 R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000000 R13: 00005600783a04a0 R14: 0000000000040000 R15: 0000560078398b80 Modules linked in: l2tp_core(+) e1000 ip_tables ipv6 [last unloaded: l2tp_core CR2: 0000000000000128 ---[ end trace 8322b2b8bf83f8e1 If alloc_workqueue fails in l2tp_init, l2tp_net_ops is unregistered on failure path. Then l2tp_exit_net is called which will flush NULL workqueue, this patch add a NULL check to fix it. Fixes: 67e04c29ec0d ("l2tp: unregister l2tp_net_ops on failure path") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	taprio: add null check on sched_nest to avoid potential null pointer dereference	Colin Ian King	1	-0/+2
	The call to nla_nest_start_noflag can return a null pointer and currently this is not being checked and this can lead to a null pointer dereference when the null pointer sched_nest is passed to function nla_nest_end. Fix this by adding in a null pointer check. Addresses-Coverity: ("Dereference null return value") Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	net: mvpp2: cls: fix less than zero check on a u32 variable	Colin Ian King	1	-2/+4
	The signed return from the call to mvpp2_cls_c2_port_flow_index is being assigned to the u32 variable c2.index and then checked for a negative error condition which is always going to be false. Fix this by assigning the return to the int variable index and checking this instead. Addresses-Coverity: ("Unsigned compared against 0") Fixes: 90b509b39ac9 ("net: mvpp2: cls: Add Classification offload support") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	net_sched: sch_fq: handle non connected flows	Eric Dumazet	1	-2/+13
	FQ packet scheduler assumed that packets could be classified based on their owning socket. This means that if a UDP server uses one UDP socket to send packets to different destinations, packets all land in one FQ flow. This is unfair, since each TCP flow has a unique bucket, meaning that in case of pressure (fully utilised uplink), TCP flows have more share of the bandwidth. If we instead detect unconnected sockets, we can use a stochastic hash based on the 4-tuple hash. This also means a QUIC server using one UDP socket will properly spread the outgoing packets to different buckets, and in-kernel pacing based on EDT model will no longer risk having big rb-tree on one flow. Note that UDP application might provide the skb->hash in an ancillary message at sendmsg() time to avoid the cost of a dissection in fq packet scheduler. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	net_sched: sch_fq: do not assume EDT packets are ordered	Eric Dumazet	1	-12/+83
	TCP stack makes sure packets for a given flow are monotically increasing, but we want to allow UDP packets to use EDT as well, so that QUIC servers can use in-kernel pacing. This patch adds a per-flow rb-tree on which packets might be stored. We still try to use the linear list for the typical cases where packets are queued with monotically increasing skb->tstamp, since queue/dequeue packets on a standard list is O(1). Note that the ability to store packets in arbitrary EDT order will allow us to implement later a per TCP socket mechanism adding delays (with jitter eventually) and reorders, to implement convenient network emulators. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	net: hns3: use devm_kcalloc when allocating desc_cb	Yunsheng Lin	1	-4/+4
	This patch uses devm_kcalloc instead of kcalloc when allocating ring->desc_cb, because devm_kcalloc not only ensure to free the memory when the dev is deallocted, but also allocate the memory from it's device memory node. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	net: hns3: some cleanup for struct hns3_enet_ring	Yunsheng Lin	2	-9/+2
	This patch removes some unused field in struct hns3_enet_ring, use ring->dev for ring_to_dev macro, and use dev consistently in hns3_fill_desc. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	net: hns3: unify the page reusing for page size 4K and 64K	Yunsheng Lin	1	-32/+13
	When page size is 64K, RX buffer is currently not reused when the page_offset is moved to last buffer. This patch adds checking to decide whether the buffer page can be reused when last_offset is moved beyond last offset. If the driver is the only user of page when page_offset is moved to beyond last offset, then buffer can be reused and page_offset is set to zero. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	net: hns3: optimize the barrier using when cleaning TX BD	Yunsheng Lin	1	-14/+15
	Currently, a barrier is used when cleaning each TX BD, which may cause performance degradation. This patch optimizes it to use one barrier when cleaning TX BD each round. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	net: hns3: fix error handling for desc filling	Yunsheng Lin	1	-11/+7
	When desc filling fails in hns3_nic_net_xmit, it will call hns3_clear_desc to unmap the dma mapping. But currently the ring->next_to_use points to the desc where the desc filling or dma mapping return error, which means the desc that ring->next_to_use points to has not done the dma mapping, the desc that need unmapping is before the ring->next_to_use. This patch fixes it by calling ring_ptr_move_bw(next_to_use) before doing unmapping operation, and set desc_cb->dma to zero to avoid freeing it again when unloading. Also, when filling skb head or frag fails, both need to unmap all the way back to next_to_use_head, so remove one desc filling error handling. Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	net: hns3: combine len and checksum handling for inner and outer header.	Yunsheng Lin	1	-111/+81
	When filling len and checksum info to description, there is some similar checking or calculation. So this patch adds hns3_set_l2l3l4 to fill the inner(/normal) header's len and checksum info. If it is a encapsulation skb, it calls hns3_set_outer_l2l3l4 to handle the outer header's len and checksum info, in order to avoid some similar checking or calculation. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	net: hns3: refactor BD filling for l2l3l4 info	Yunsheng Lin	1	-39/+23
	This patch separates the inner and outer l2l3l4 len handling in hns3_set_l2l3l4_len, this is a preparation to combine the l2l3l4 len and checksum handling for inner and outer header. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	net: hns3: fix for tunnel type handling in hns3_rx_checksum	Yunsheng Lin	1	-6/+8
	According to hardware user manual, the tunnel packet type is available in the rx.ol_info field of struct hns3_desc. Currently the tunnel packet type is decided by the rx.l234_info, which may cause RX checksum handling error. This patch fixes it by using the correct field in struct hns3_desc to decide the tunnel packet type. Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	net: hns3: add linearizing checking for TSO case	Yunsheng Lin	1	-0/+45
	HW requires every continuous 8 buffer data to be larger than MSS, we simplify it by ensuring skb_headlen + the first continuous 7 frags to to be larger than GSO header len + mss, and the remaining continuous 7 frags to be larger than MSS except the last 7 frags. This patch adds hns3_skb_need_linearized to handle it for TSO case. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	net: hns3: add counter for times RX pages gets allocated	Yunsheng Lin	3	-0/+6
	Currently, using "ethtool --statistics" can show how many time RX page have been reused, but there is no counter for RX page not being reused. This patch adds non_reuse_pg counter to better debug the performance issue, because it is hard to determine when the RX page is reused or not if there is no such counter. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	net: hns3: use napi_schedule_irqoff in hard interrupts handlers	Yunsheng Lin	1	-1/+1
	napi_schedule_irqoff is introduced to be used from hard interrupts handlers or when irqs are already masked, see: https://lists.openwall.net/netdev/2014/10/29/2 So this patch replaces napi_schedule with napi_schedule_irqoff. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	net: hns3: unify maybe_stop_tx for TSO and non-TSO case	Yunsheng Lin	3	-86/+50
	Currently, maybe_stop_tx ops for TSO and non-TSO case share some BD calculation code, so this patch unifies the maybe_stop_tx by removing the maybe_stop_tx ops. skb_is_gso() can be used to differentiate the case between TSO and non-TSO case if there is need to handle special case for TSO case. This patch also add tx_copy field in "ethtool --statistics" to help better debug the performance issue caused by calling skb_copy. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	net: dsa: lantiq: Add Forwarding Database access	Hauke Mehrtens	1	-0/+98
	This adds functions to add and remove static entries to and from the forwarding database and dump the full forwarding database. Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	net: dsa: lantiq: Add fast age function	Hauke Mehrtens	1	-0/+40
	Fast aging per port is not supported directly by the hardware, it is only possible to configure a global aging time. Do the fast aging by iterating over the MAC forwarding table and remove all dynamic entries for a given port. Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	net: dsa: lantiq: Add VLAN aware bridge offloading	Hauke Mehrtens	1	-4/+197
	The VLAN aware bridge offloading is similar to the VLAN unaware offloading, this makes it possible to offload the VLAN bridge functionalities. The hardware supports up to 64 VLAN bridge entries, we already use one entry for each LAN port to prevent forwarding of packets between the ports when the ports are not in a bridge, so in the end we have 57 possible VLANs. The VLAN filtering is currently only active when the ports are in a bridge, VLAN filtering for ports not in a bridge is not implemented. It is currently not possible to change between VLAN filtering and not filtering while the port is already in a bridge, this would make the driver more complicated. The VLANs are only defined on bridge entries, so we will not add anything into the hardware when the port joins a bridge if it is doing VLAN filtering, but only when an allowed VLAN is added. Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	net: dsa: lantiq: Add VLAN unaware bridge offloading	Hauke Mehrtens	1	-7/+468
	This allows to offload bridges with DSA to the switch hardware and do the packet forwarding in hardware. This implements generic functions to access the switch hardware tables, which are used to control many features of the switch. This patch activates the MAC learning by removing the MAC address table lock, to prevent uncontrolled forwarding of packets between all the LAN ports, they are added into individual bridge tables entries with individual flow ids and the switch will do the MAC learning for each port separately before they are added to a real bridge. Each bridge consist of an entry in the active VLAN table and the VLAN mapping table, table entries with the same index are matching. In the VLAN unaware mode we configure everything with VLAN ID 0, but we use different flow IDs, the switch should handle all VLANs as normal payload and ignore them. When the hardware looks for the port of the destination MAC address it only takes the entries which have the same flow ID of the ingress packet. The bridges are configured with 64 possible entries with these information: Table Index, 0...63 VLAN ID, 0...4095: VLAN ID 0 is untagged flow ID, 0..63: Same flow IDs share entries in MAC learning table port map, one bit for each port number tagged port map, one bit for each port number Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	net: dsa: lantiq: Allow special tags only on CPU port	Hauke Mehrtens	1	-2/+4
	Allow the special tag in ingress only on the CPU port and not on all ports. A packet with a special tag could circumvent the hardware forwarding and should only be allowed on the CPU port where Linux controls the port. Fixes: 14fceff4771e ("net: dsa: Add Lantiq / Intel DSA driver for vrx200)" Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07	afs: Implement YFS ACL setting	David Howells	3	-5/+112
	Implement the setting of YFS ACLs in AFS through the interface of setting the afs.yfs.acl extended attribute on the file. Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-07	afs: Get YFS ACLs and information through xattrs	David Howells	4	-4/+304
	The YFS/AuriStor variant of AFS provides more capable ACLs and provides per-volume ACLs and per-file ACLs as well as per-directory ACLs. It also provides some extra information that can be retrieved through four ACLs: (1) afs.yfs.acl The YFS file ACL (not the same format as afs.acl). (2) afs.yfs.vol_acl The YFS volume ACL. (3) afs.yfs.acl_inherited "1" if a file's ACL is inherited from its parent directory, "0" otherwise. (4) afs.yfs.acl_num_cleaned The number of of ACEs removed from the ACL by the server because the PT entries were removed from the PTS database (ie. the subject is no longer known). Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-07	afs: implement acl setting	Joe Gorse	5	-6/+110
	Implements the setting of ACLs in AFS by means of setting the afs.acl extended attribute on the file. Signed-off-by: Joe Gorse <jhgorse@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-07	afs: Get an AFS3 ACL as an xattr	David Howells	5	-0/+184
	Implement an xattr on AFS files called "afs.acl" that retrieves a file's ACL. It returns the raw AFS3 ACL from the result of calling FS.FetchACL, leaving any interpretation to userspace. Note that whilst YFS servers will respond to FS.FetchACL, this will render a more-advanced YFS ACL down. Use "afs.yfs.acl" instead for that. Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-07	afs: Fix getting the afs.fid xattr	David Howells	1	-3/+12
	The AFS3 FID is three 32-bit unsigned numbers and is represented as three up-to-8-hex-digit numbers separated by colons to the afs.fid xattr. However, with the advent of support for YFS, the FID is now a 64-bit volume number, a 96-bit vnode/inode number and a 32-bit uniquifier (as before). Whilst the sprintf in afs_xattr_get_fid() has been partially updated (it currently ignores the upper 32 bits of the 96-bit vnode number), the size of the stack-based buffer has not been increased to match, thereby allowing stack corruption to occur. Fix this by increasing the buffer size appropriately and conditionally including the upper part of the vnode number if it is non-zero. The latter requires the lower part to be zero-padded if the upper part is non-zero. Fixes: 3b6492df4153 ("afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS") Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-07	afs: Fix the afs.cell and afs.volume xattr handlers	David Howells	1	-2/+2
	Fix the ->get handlers for the afs.cell and afs.volume xattrs to pass the source data size to memcpy() rather than target buffer size. Overcopying the source data occasionally causes the kernel to oops. Fixes: d3e3b7eac886 ("afs: Add metadata xattrs") Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-07	afs: Calculate i_blocks based on file size	Marc Dionne	1	-1/+5
	While it's not possible to give an accurate number for the blocks used on the server, populate i_blocks based on the file size so that 'du' can give a reasonable estimate. The value is rounded up to 1K granularity, for consistency with what other AFS clients report, and the servers' 1K usage quota unit. Note that the value calculated by 'du' at the root of a volume can still be slightly lower than the quota usage on the server, as 0-length files are charged 1 quota block, but are reported as occupying 0 blocks. Again, this is consistent with other AFS clients. Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-07	afs: Log more information for "kAFS: AFS vnode with undefined type\n"	David Howells	4	-8/+36
	Log more information when "kAFS: AFS vnode with undefined type\n" is displayed due to a vnode record being retrieved from the server that appears to have a duff file type (usually 0). This prints more information to try and help pin down the problem. Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-07	io_uring: use cpu_online() to check p->sq_thread_cpu instead of cpu_possible()	Shenghui Wang	1	-1/+1
	This issue is found by running liburing/test/io_uring_setup test. When test run, the testcase "attempt to bind to invalid cpu" would not pass with messages like: io_uring_setup(1, 0xbfc2f7c8), \ flags: IORING_SETUP_SQPOLL\|IORING_SETUP_SQ_AFF, \ resv: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000, \ sq_thread_cpu: 2 expected -1, got 3 FAIL On my system, there is: CPU(s) possible : 0-3 CPU(s) online : 0-1 CPU(s) offline : 2-3 CPU(s) present : 0-1 The sq_thread_cpu 2 is offline on my system, so the bind should fail. But cpu_possible() will pass the check. We shouldn't be able to bind to an offline cpu. Use cpu_online() to do the check. After the change, the testcase run as expected: EINVAL will be returned for cpu offlined. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Shenghui Wang <shhuiw@foxmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-07	block: fix mismerge in bvec_advance	Christoph Hellwig	1	-7/+1
	When Jens merged my commit to only allow contiguous page structs in a bio_vec with Ming's 5.1 fix to ensue the bvec length didn't overflow we failed to keep the removal of the expensive nth_page calls. This commits adds them back as intended. Fixes: 5c61ee2cd586 ("Merge tag 'v5.1-rc6' into for-5.2/block") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-07	samples: show race-free pidfd metadata access	Christian Brauner	3	-1/+119
	This is a sample program showing userspace how to get race-free access to process metadata from a pidfd. It is rather easy to do and userspace can actually simply reuse code that currently parses a process's status file in procfs. The program can easily be extended into a generic helper suitable for inclusion in a libc to make it even easier for userspace to gain metadata access. Since this came up in a discussion because this API is going to be used in various service managers: A lot of programs will have a whitelist seccomp filter that returns <some-errno> for all new syscalls. This means that programs might get confused if CLONE_PIDFD works but the later pidfd_send_signal() syscall doesn't. Hence, here's a ahead of time check that pidfd_send_signal() is supported: bool pidfd_send_signal_supported() { int procfd = open("/proc/self", O_DIRECTORY \| O_RDONLY \| O_CLOEXEC); if (procfd < 0) return false; /* * A process is always allowed to signal itself so * pidfd_send_signal() should never fail this test. If it does * it must mean it is not available, blocked by an LSM, seccomp, * or other. */ return pidfd_send_signal(procfd, 0, NULL, 0) == 0; } Signed-off-by: Christian Brauner <christian@brauner.io> Co-developed-by: Jann Horn <jannh@google.com> Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: David Howells <dhowells@redhat.com> Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> Cc: Andy Lutomirsky <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk>
2019-05-07	signal: support CLONE_PIDFD with pidfd_send_signal	Christian Brauner	2	-6/+9
	Let pidfd_send_signal() use pidfds retrieved via CLONE_PIDFD. With this patch pidfd_send_signal() becomes independent of procfs. This fullfils the request made when we merged the pidfd_send_signal() patchset. The pidfd_send_signal() syscall is now always available allowing for it to be used by users without procfs mounted or even users without procfs support compiled into the kernel. Signed-off-by: Christian Brauner <christian@brauner.io> Co-developed-by: Jann Horn <jannh@google.com> Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: David Howells <dhowells@redhat.com> Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> Cc: Andy Lutomirsky <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk>
2019-05-07	clone: add CLONE_PIDFD	Christian Brauner	3	-4/+106
	This patchset makes it possible to retrieve pid file descriptors at process creation time by introducing the new flag CLONE_PIDFD to the clone() system call. Linus originally suggested to implement this as a new flag to clone() instead of making it a separate system call. As spotted by Linus, there is exactly one bit for clone() left. CLONE_PIDFD creates file descriptors based on the anonymous inode implementation in the kernel that will also be used to implement the new mount api. They serve as a simple opaque handle on pids. Logically, this makes it possible to interpret a pidfd differently, narrowing or widening the scope of various operations (e.g. signal sending). Thus, a pidfd cannot just refer to a tgid, but also a tid, or in theory - given appropriate flag arguments in relevant syscalls - a process group or session. A pidfd does not represent a privilege. This does not imply it cannot ever be that way but for now this is not the case. A pidfd comes with additional information in fdinfo if the kernel supports procfs. The fdinfo file contains the pid of the process in the callers pid namespace in the same format as the procfs status file, i.e. "Pid:\t%d". As suggested by Oleg, with CLONE_PIDFD the pidfd is returned in the parent_tidptr argument of clone. This has the advantage that we can give back the associated pid and the pidfd at the same time. To remove worries about missing metadata access this patchset comes with a sample program that illustrates how a combination of CLONE_PIDFD, and pidfd_send_signal() can be used to gain race-free access to process metadata through /proc/<pid>. The sample program can easily be translated into a helper that would be suitable for inclusion in libc so that users don't have to worry about writing it themselves. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <christian@brauner.io> Co-developed-by: Jann Horn <jannh@google.com> Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: David Howells <dhowells@redhat.com> Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> Cc: Andy Lutomirsky <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk>
2019-05-07	pinctrl: mcp23s08: Do not complain about unsupported params	Jan Kundrát	1	-2/+1
	It is expected that some of these operations won't work on each and every HW. Previously, even a simple `cat /sys/kernel/debug/pinctrl/spi1.1/pinconf-pins` caused excessive dmesg output. Signed-off-by: Jan Kundrát <jan.kundrat@cesnet.cz> Cc: Phil Reid <preid@electromag.com.au> Cc: Sebastian Reichel <sebastian.reichel@collabora.co.uk> Cc: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-05-06	tty: rocket: fix incorrect forward declaration of 'rp_init()'	Linus Torvalds	1	-1/+1
	Make the forward declaration actually match the real function definition, something that previous versions of gcc had just ignored. This is another patch to fix new warnings from gcc-9 before I start the merge window pulls. I don't want to miss legitimate new warnings just because my system update brought a new compiler with new warnings. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-06	ubsan: Remove vla bound checks.	Andrey Ryabinin	3	-24/+0
	The kernel the kernel is built with -Wvla for some time, so is not supposed to have any variable length arrays. Remove vla bounds checking from ubsan since it's useless now. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>