linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2019-02-01	samples/bpf: Extend RLIMIT_MEMLOCK for xdp_{sample_pkts, router_ipv4}	Maciej Fijalkowski	2	-0/+14
	There is a common problem with xdp samples that happens when user wants to run a particular sample and some bpf program is already loaded. The default 64kb RLIMIT_MEMLOCK resource limit will cause a following error (assuming that xdp sample that is failing was converted to libbpf usage): libbpf: Error in bpf_object__probe_name():Operation not permitted(1). Couldn't load basic 'r0 = 0' BPF program. libbpf: failed to load object './xdp_sample_pkts_kern.o' Fix it in xdp_sample_pkts and xdp_router_ipv4 by setting RLIMIT_MEMLOCK to RLIM_INFINITY. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01	samples/bpf: Convert XDP samples to libbpf usage	Maciej Fijalkowski	6	-103/+253
	Some of XDP samples that are attaching the bpf program to the interface via libbpf's bpf_set_link_xdp_fd are still using the bpf_load.c for loading and manipulating the ebpf program and maps. Convert them to do this through libbpf usage and remove bpf_load from the picture. While at it remove what looks like debug leftover in xdp_redirect_map_user.c In xdp_redirect_cpu, change the way that the program to be loaded onto interface is chosen - user now needs to pass the program's section name instead of the relative number. In case of typo print out the section names to choose from. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01	samples/bpf: xdp_redirect_cpu have not need for read_trace_pipe	Jesper Dangaard Brouer	1	-10/+0
	The sample xdp_redirect_cpu is not using helper bpf_trace_printk. Thus it makes no sense that the --debug option us reading from /sys/kernel/debug/tracing/trace_pipe via read_trace_pipe. Simply remove it. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01	libbpf: Add a helper for retrieving a map fd for a given name	Maciej Fijalkowski	3	-0/+10
	XDP samples are mostly cooperating with eBPF maps through their file descriptors. In case of a eBPF program that contains multiple maps it might be tiresome to iterate through them and call bpf_map__fd for each one. Add a helper mostly based on bpf_object__find_map_by_name, but instead of returning the struct bpf_map pointer, return map fd. Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01	bpf: powerpc64: add JIT support for bpf line info	Sandipan Das	1	-0/+1
	This adds support for generating bpf line info for JITed programs. Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01	Merge branch 'bpf-spinlocks'	Daniel Borkmann	26	-39/+1248
	Alexei Starovoitov says: ==================== Many algorithms need to read and modify several variables atomically. Until now it was hard to impossible to implement such algorithms in BPF. Hence introduce support for bpf_spin_lock. The api consists of 'struct bpf_spin_lock' that should be placed inside hash/array/cgroup_local_storage element and bpf_spin_lock/unlock() helper function. Example: struct hash_elem { int cnt; struct bpf_spin_lock lock; }; struct hash_elem * val = bpf_map_lookup_elem(&hash_map, &key); if (val) { bpf_spin_lock(&val->lock); val->cnt++; bpf_spin_unlock(&val->lock); } and BPF_F_LOCK flag for lookup/update bpf syscall commands that allows user space to read/write map elements under lock. Together these primitives allow race free access to map elements from bpf programs and from user space. Key restriction: root only. Key requirement: maps must be annotated with BTF. This concept was discussed at Linux Plumbers Conference 2018. Thank you everyone who participated and helped to iron out details of api and implementation. Patch 1: bpf_spin_lock support in the verifier, BTF, hash, array. Patch 2: bpf_spin_lock in cgroup local storage. Patches 3,4,5: tests Patch 6: BPF_F_LOCK flag to lookup/update Patches 7,8,9: tests v6->v7: - fixed this_cpu->__this_cpu per Peter's suggestion and added Ack. - simplified bpf_spin_lock and load/store overlap check in the verifier as suggested by Andrii - rebase v5->v6: - adopted arch_spinlock approach suggested by Peter - switched to spin_lock_irqsave equivalent as the simplest way to avoid deadlocks in rare case of nested networking progs (cgroup-bpf prog in preempt_disable vs clsbpf in softirq sharing the same map with bpf_spin_lock) bpf_spin_lock is only allowed in networking progs that don't have arbitrary entry points unlike tracing progs. - rebase and split test_verifier tests v4->v5: - disallow bpf_spin_lock for tracing progs due to insufficient preemption checks - socket filter progs cannot use bpf_spin_lock due to missing preempt_disable - fix atomic_set_release. Spotted by Peter. - fixed hash_of_maps v3->v4: - fix BPF_EXIST \| BPF_NOEXIST check patch 6. Spotted by Jakub. Thanks! - rebase v2->v3: - fixed build on ia64 and archs where qspinlock is not supported - fixed missing lock init during lookup w/o BPF_F_LOCK. Spotted by Martin v1->v2: - addressed several issues spotted by Daniel and Martin in patch 1 - added test11 to patch 4 as suggested by Daniel ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01	selftests/bpf: test for BPF_F_LOCK	Alexei Starovoitov	3	-1/+141
	Add C based test that runs 4 bpf programs in parallel that update the same hash and array maps. And another 2 threads that read from these two maps via lookup(key, value, BPF_F_LOCK) api to make sure the user space sees consistent value in both hash and array elements while user space races with kernel bpf progs. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01	libbpf: introduce bpf_map_lookup_elem_flags()	Alexei Starovoitov	3	-0/+16
	Introduce int bpf_map_lookup_elem_flags(int fd, const void key, void value, __u64 flags) helper to lookup array/hash/cgroup_local_storage elements with BPF_F_LOCK flag. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01	tools/bpf: sync uapi/bpf.h	Alexei Starovoitov	1	-0/+1
	add BPF_F_LOCK definition to tools/include/uapi/linux/bpf.h Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01	bpf: introduce BPF_F_LOCK flag	Alexei Starovoitov	7	-14/+110
	Introduce BPF_F_LOCK flag for map_lookup and map_update syscall commands and for map_update() helper function. In all these cases take a lock of existing element (which was provided in BTF description) before copying (in or out) the rest of map value. Implementation details that are part of uapi: Array: The array map takes the element lock for lookup/update. Hash: hash map also takes the lock for lookup/update and tries to avoid the bucket lock. If old element exists it takes the element lock and updates the element in place. If element doesn't exist it allocates new one and inserts into hash table while holding the bucket lock. In rare case the hashmap has to take both the bucket lock and the element lock to update old value in place. Cgroup local storage: It is similar to array. update in place and lookup are done with lock taken. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01	selftests/bpf: add bpf_spin_lock C test	Alexei Starovoitov	4	-2/+155
	add bpf_spin_lock C based test that requires latest llvm with BTF support Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01	selftests/bpf: add bpf_spin_lock verifier tests	Alexei Starovoitov	2	-1/+434
	add bpf_spin_lock tests to test_verifier.c that don't require latest llvm with BTF support Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01	tools/bpf: sync include/uapi/linux/bpf.h	Alexei Starovoitov	1	-1/+6
	sync bpf.h Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01	bpf: add support for bpf_spin_lock to cgroup local storage	Alexei Starovoitov	3	-1/+6
	Allow 'struct bpf_spin_lock' to reside inside cgroup local storage. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01	bpf: introduce bpf_spin_lock	Alexei Starovoitov	14	-26/+386
	Introduce 'struct bpf_spin_lock' and bpf_spin_lock/unlock() helpers to let bpf program serialize access to other variables. Example: struct hash_elem { int cnt; struct bpf_spin_lock lock; }; struct hash_elem * val = bpf_map_lookup_elem(&hash_map, &key); if (val) { bpf_spin_lock(&val->lock); val->cnt++; bpf_spin_unlock(&val->lock); } Restrictions and safety checks: - bpf_spin_lock is only allowed inside HASH and ARRAY maps. - BTF description of the map is mandatory for safety analysis. - bpf program can take one bpf_spin_lock at a time, since two or more can cause dead locks. - only one 'struct bpf_spin_lock' is allowed per map element. It drastically simplifies implementation yet allows bpf program to use any number of bpf_spin_locks. - when bpf_spin_lock is taken the calls (either bpf2bpf or helpers) are not allowed. - bpf program must bpf_spin_unlock() before return. - bpf program can access 'struct bpf_spin_lock' only via bpf_spin_lock()/bpf_spin_unlock() helpers. - load/store into 'struct bpf_spin_lock lock;' field is not allowed. - to use bpf_spin_lock() helper the BTF description of map value must be a struct and have 'struct bpf_spin_lock anyname;' field at the top level. Nested lock inside another struct is not allowed. - syscall map_lookup doesn't copy bpf_spin_lock field to user space. - syscall map_update and program map_update do not update bpf_spin_lock field. - bpf_spin_lock cannot be on the stack or inside networking packet. bpf_spin_lock can only be inside HASH or ARRAY map value. - bpf_spin_lock is available to root only and to all program types. - bpf_spin_lock is not allowed in inner maps of map-in-map. - ld_abs is not allowed inside spin_lock-ed region. - tracing progs and socket filter progs cannot use bpf_spin_lock due to insufficient preemption checks Implementation details: - cgroup-bpf class of programs can nest with xdp/tc programs. Hence bpf_spin_lock is equivalent to spin_lock_irqsave. Other solutions to avoid nested bpf_spin_lock are possible. Like making sure that all networking progs run with softirq disabled. spin_lock_irqsave is the simplest and doesn't add overhead to the programs that don't use it. - arch_spinlock_t is used when its implemented as queued_spin_lock - archs can force their own arch_spinlock_t - on architectures where queued_spin_lock is not available and sizeof(arch_spinlock_t) != sizeof(__u32) trivial lock is used. - presence of bpf_spin_lock inside map value could have been indicated via extra flag during map_create, but specifying it via BTF is cleaner. It provides introspection for map key/value and reduces user mistakes. Next steps: - allow bpf_spin_lock in other map types (like cgroup local storage) - introduce BPF_F_LOCK flag for bpf_map_update() syscall and helper to request kernel to grab bpf_spin_lock before rewriting the value. That will serialize access to map elements. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-31	bpf, cgroups: clean up kerneldoc warnings	Valdis Kletnieks	3	-3/+4
	Building with W=1 reveals some bitrot: CC kernel/bpf/cgroup.o kernel/bpf/cgroup.c:238: warning: Function parameter or member 'flags' not described in '__cgroup_bpf_attach' kernel/bpf/cgroup.c:367: warning: Function parameter or member 'unused_flags' not described in '__cgroup_bpf_detach' Add a kerneldoc line for 'flags'. Fixing the warning for 'unused_flags' is best approached by removing the unused parameter on the function call. Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-31	bpf: fix missing prototype warnings	Valdis Kletnieks	1	-1/+3
	Compiling with W=1 generates warnings: CC kernel/bpf/core.o kernel/bpf/core.c:721:12: warning: no previous prototype for ?bpf_jit_alloc_exec_limit? [-Wmissing-prototypes] 721 \| u64 __weak bpf_jit_alloc_exec_limit(void) \| ^~~~~~~~~~~~~~~~~~~~~~~~ kernel/bpf/core.c:757:14: warning: no previous prototype for ?bpf_jit_alloc_exec? [-Wmissing-prototypes] 757 \| void __weak bpf_jit_alloc_exec(unsigned long size) \| ^~~~~~~~~~~~~~~~~~ kernel/bpf/core.c:762:13: warning: no previous prototype for ?bpf_jit_free_exec? [-Wmissing-prototypes] 762 \| void __weak bpf_jit_free_exec(void addr) \| ^~~~~~~~~~~~~~~~~ All three are weak functions that archs can override, provide proper prototypes for when a new arch provides their own. Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-31	bpf: fix bitrotted kerneldoc	Valdis Kletnieks	1	-1/+2
	Over the years, the function signature has changed, but the kerneldoc block hasn't. Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-31	Merge branch 'bpf-tests-probe-kernel-support'	Daniel Borkmann	4	-5/+56
	Stanislav Fomichev says: ==================== If test_maps/test_verifier is running against the kernel which doesn't have _all_ BPF features enabled, it fails with an error. This patch series tries to probe kernel support for each failed test and skip it instead. This lets users run BPF selftests in the not-all-bpf-yes environments and received correct PASS/NON-PASS result. See https://www.spinics.net/lists/netdev/msg539331.html for more context. The series goes like this: * patch #1 skips sockmap tests in test_maps.c if BPF_MAP_TYPE_SOCKMAP map is not supported (if bpf_create_map fails, we probe the kernel for support) * patch #2 skips verifier tests if test->prog_type is not supported (if bpf_verify_program fails, we probe the kernel for support) * patch #3 skips verifier tests if test fixup map is not supported (if create_map fails, we probe the kernel for support) * next patches fix various small issues that arise from the first four: * patch #4 sets "unknown func bpf_trace_printk#6" prog_type to BPF_PROG_TYPE_TRACEPOINT so it is correctly skipped in CONFIG_BPF_EVENTS=n case * patch #5 exposes BPF_PROG_TYPE_CGROUP_{SKB,SOCK,SOCK_ADDR} only when CONFIG_CGROUP_BPF=y, this makes verifier correctly skip appropriate tests v3 changes: * rebased on top of Quentin's series which adds probes to libbpf v2 changes: * don't sprinkle "ifdef CONFIG_CGROUP_BPF" all around net/core/filter.c, doing it only in the bpf_types.h is enough to disable BPF_PROG_TYPE_CGROUP_{SKB,SOCK,SOCK_ADDR} prog types for non-cgroup enabled kernels ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-31	bpf: BPF_PROG_TYPE_CGROUP_{SKB, SOCK, SOCK_ADDR} require cgroups enabled	Stanislav Fomichev	1	-0/+2
	There is no way to exercise appropriate attach points without cgroups enabled. This lets test_verifier correctly skip tests for these prog_types if kernel was compiled without BPF cgroup support. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-31	selftests/bpf: mark verifier test that uses bpf_trace_printk as BPF_PROG_TYPE_TRACEPOINT	Stanislav Fomichev	1	-0/+1
	We don't have this helper if the kernel was compiled without CONFIG_BPF_EVENTS. Setting prog_type to BPF_PROG_TYPE_TRACEPOINT let's verifier correctly skip this test based on the missing prog_type support in the kernel. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-31	selftests/bpf: skip verifier tests for unsupported map types	Stanislav Fomichev	1	-3/+33
	Use recently introduced bpf_probe_map_type() to skip tests in the test_verifier if map creation (create_map) fails. It's handled explicitly for each fixup, i.e. if bpf_create_map returns negative fd, we probe the kernel for the appropriate map support and skip the test is map type is not supported. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-31	selftests/bpf: skip verifier tests for unsupported program types	Stanislav Fomichev	1	-1/+8
	Use recently introduced bpf_probe_prog_type() to skip tests in the test_verifier() if bpf_verify_program() fails. The skipped test is indicated in the output. Example: ... 679/p bpf_get_stack return R0 within range SKIP (unsupported program type 5) 680/p ld_abs: invalid op 1 OK ... Summary: 863 PASSED, 165 SKIPPED, 3 FAILED Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-31	selftests/bpf: skip sockmap in test_maps if kernel doesn't have support	Stanislav Fomichev	1	-1/+12
	Use recently introduced bpf_probe_map_type() to skip test_sockmap() if map creation fails. The skipped test is indicated in the output. Example: test_sockmap SKIP (unsupported map type BPF_MAP_TYPE_SOCKMAP) Fork 1024 tasks to 'test_update_delete' ... test_sockmap SKIP (unsupported map type BPF_MAP_TYPE_SOCKMAP) Fork 1024 tasks to 'test_update_delete' ... test_maps: OK, 2 SKIPPED Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-30	Merge branch 'hns3-next'	David S. Miller	13	-104/+145
	Huazhong Tan says: ==================== code optimizations & bugfixes for HNS3 driver This patchset includes bugfixes and code optimizations for the HNS3 ethernet controller driver ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30	net: hns3: keep flow director state unchanged when reset	Jian Shen	2	-5/+7
	In orginal codes, driver always enables flow director when intializing. When user disable flow director with command ethtool -K, the flow director will be enabled again after resetting. This patch fixes it by only enabling it when first initialzing. Fixes: 6871af29b3ab ("net: hns3: Add reset handle for flow director") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30	net: hns3: stop sending keep alive msg to PF when VF is resetting	Jian Shen	1	-0/+4
	When VF is resetting, it can't communicate to PF with mailbox msg. This patch adds reset state checking before sending keep alive msg to PF. Fixes: a6d818e31d08 ("net: hns3: Add vport alive state checking support") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30	net: hns3: fix an issue for hclgevf_ae_get_hdev	Peng Li	1	-1/+6
	HNS3 VF driver support NIC and Roce, hdev stores NIC handle and Roce handle, should use correct parameter for container_of. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30	net: hns3: fix improper error handling in the hclge_init_ae_dev()	Huazhong Tan	1	-1/+1
	While hclge_init_umv_space() failed in the hclge_init_ae_dev(), we should undo all the operation which has been done successfully, the last success operation maybe hclge_mac_mdio_config(), so if hclge_init_umv_space() failed, we also need to undo it. Fixes: 288475b2ad01 ("{topost} net: hns3: refine umv space allocation") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30	net: hns3: fix for rss result nonuniform	Jian Shen	2	-6/+26
	The rss result is more uniform when use recommended hash key from microsoft, instead of the one generated by netdev_rss_key_fill(). Also using hash algorithm "xor" is better than "toeplitz". This patch modifies the default hash key and hash algorithm. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30	net: hns3: fix netif_napi_del() not do problem when unloading	Huazhong Tan	1	-20/+7
	When the driver is unloading, if a global reset occurs, unmap_ring_from_vector() in the hns3_nic_uninit_vector_data() will fail, and hns3_nic_uninit_vector_data() just return. There may be some netif_napi_del() not be done. Since hardware will unmap all ring while resetting, so hns3_nic_uninit_vector_data() should ignore this error, and do the rest uninitialization. Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30	net: hns3: Fix NULL deref when unloading driver	Huazhong Tan	5	-22/+40
	When the driver is unloading, if there is a calling of ndo_open occurs between phy_disconnect() and unregister_netdev(), it will end up causing the kernel to eventually hit a NULL deref: [14942.417828] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000048 [14942.529878] Mem abort info: [14942.551166] ESR = 0x96000006 [14942.567070] Exception class = DABT (current EL), IL = 32 bits [14942.623081] SET = 0, FnV = 0 [14942.639112] EA = 0, S1PTW = 0 [14942.643628] Data abort info: [14942.659227] ISV = 0, ISS = 0x00000006 [14942.674870] CM = 0, WnR = 0 [14942.679449] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000224ad6ad [14942.695595] [0000000000000048] pgd=00000021e6673003, pud=00000021dbf01003, pmd=0000000000000000 [14942.723163] Internal error: Oops: 96000006 [#1] PREEMPT SMP [14942.729358] Modules linked in: hns3(O) hclge(O) pv680_mii(O) hnae3(O) [last unloaded: hclge] [14942.738907] CPU: 1 PID: 26629 Comm: kworker/u4:13 Tainted: G O 4.18.0-rc1-12928-ga960791-dirty #145 [14942.749491] Hardware name: Huawei Technologies Co., Ltd. D05/D05, BIOS Hi1620 FPGA TB BOOT BIOS B763 08/17/2018 [14942.760392] Workqueue: events_power_efficient phy_state_machine [14942.766644] pstate: 80c00009 (Nzcv daif +PAN +UAO) [14942.771918] pc : test_and_set_bit+0x18/0x38 [14942.776589] lr : netif_carrier_off+0x24/0x70 [14942.781033] sp : ffff0000121abd20 [14942.784518] x29: ffff0000121abd20 x28: 0000000000000000 [14942.790208] x27: ffff0000164d3cd8 x26: ffff8021da68b7b8 [14942.795832] x25: 0000000000000000 x24: ffff8021eb407800 [14942.801445] x23: 0000000000000000 x22: 0000000000000000 [14942.807046] x21: 0000000000000001 x20: 0000000000000000 [14942.812672] x19: 0000000000000000 x18: ffff000009781708 [14942.818284] x17: 00000000004970e8 x16: ffff00000816ad48 [14942.823900] x15: 0000000000000000 x14: 0000000000000008 [14942.829528] x13: 0000000000000000 x12: 0000000000000f65 [14942.835149] x11: 0000000000000001 x10: 00000000000009d0 [14942.840753] x9 : ffff0000121abaa0 x8 : 0000000000000000 [14942.846360] x7 : ffff000009781708 x6 : 0000000000000003 [14942.851970] x5 : 0000000000000020 x4 : 0000000000000004 [14942.857575] x3 : 0000000000000002 x2 : 0000000000000001 [14942.863180] x1 : 0000000000000048 x0 : 0000000000000000 [14942.868875] Process kworker/u4:13 (pid: 26629, stack limit = 0x00000000c909dbf3) [14942.876464] Call trace: [14942.879200] test_and_set_bit+0x18/0x38 [14942.883376] phy_link_change+0x38/0x78 [14942.887378] phy_state_machine+0x3dc/0x4f8 [14942.891968] process_one_work+0x158/0x470 [14942.896223] worker_thread+0x50/0x470 [14942.900219] kthread+0x104/0x130 [14942.903905] ret_from_fork+0x10/0x1c [14942.907755] Code: d2800022 8b400c21 f9800031 9ac32044 (c85f7c22) [14942.914185] ---[ end trace 968c9e12eb740b23 ]--- So this patch fixes it by modifying the timing to do phy_connect_direct() and phy_disconnect(). Fixes: 256727da7395 ("net: hns3: Add MDIO support to HNS3 Ethernet driver for hip08 SoC") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30	net: hns3: only support tc 0 for VF	Yunsheng Lin	3	-16/+28
	When the VF shares the same TC config as PF, the business running on PF and VF must have samiliar module. For simplicity, we are not considering VF sharing the same tc configuration as PF use case, so this patch removes the support of TC configuration from VF and forcing VF to just use single TC. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30	net: hns3: change hnae3_register_ae_dev() to int	Huazhong Tan	3	-4/+16
	hnae3_register_ae_dev() may fail, and it should return a error code to its caller, so change hnae3_register_ae_dev() return type to int. Also, when hnae3_register_ae_dev() return error, hns3_probe() should do some error handling and return the error code. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30	net: hns3: use the correct interface to stop\|open port	Peng Li	1	-2/+2
	dev_close() stop the netdev and the service base on the netdev will stop. But ndev->netdev_ops->ndo_stop() may only stop HW and stack queue, the service base on the netdev can still work. Fixes: 5668abda0931 ("net: hns3: add support for set_ringparam") Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30	net: hns3: fix VF dump register issue	Jian Shen	1	-0/+2
	In original codes, the .get_regs_len and .get_regs were missed assigned. This patch fixes it. Fixes: 1600c3e5f23e ("net: hns3: Support "ethtool -d" for HNS3 VF driver") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30	net: hns3: reuse the definition of l3 and l4 header info union	liyongxin	2	-27/+6
	Union l3_hdr_info and l4_hdr_info have already been defined in the hns3_enet.h, so it is unnecessary to define them elsewhere. This patch removes the redundant definition, and reuses the one defined in the hns3_enet.h. Signed-off-by: liyongxin <liyongxin1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30	Merge branch 'net-dsa-mt7530-support-MT7530-in-the-MT7621-SoC'	David S. Miller	6	-48/+117
	Greg Ungerer says: ==================== net: dsa: mt7530: support MT7530 in the MT7621 SoC This is the fourth version of a patch series supporting the MT7530 switch as used in the MediaTek MT7621 SoC. Unlike the MediaTek MT7623 the MT7621 is built around a dual core MIPS CPU architecture. But inside it uses basically the same 7530 switch. This series resolves all issues I had with previous versions, and I can now reliably use the driver on a 7621 SoC platform. These patches were generated against linux-5.0-rc4. The first patch enables support for the existing kernel mediatek ethernet driver on the MT7621 SoC. This support is from Bjørn Mork, with an update and fix by me. Using this driver fixed a number of problems I had (TX checksums, large RX packet drop) over the staging driver (drivers/staging/mt7621-eth). Patch 2 modifies the mt7530 DSA driver to support the 7530 switch as implemented in the Mediatek MT7621 SoC. The last patch updates the devicetree bindings to reflect the new support in the mt7530 driver. There is no real dependencies between the patches, so they can be taken independantly. Creating a new binding for the MT7621 seems like the only viable approach to distinguish between a stand alone 7530 switch, the silicon module in the MT7623 SoC and the silicon in the MT7621. Certainly the 7530 ID register in the MT7623 and MT7621 returns the same value, "0x7530001". Looking at the mt7530.c DSA driver it might make some sense to convert the existing "mediatek,mcm" binding to something like "mediatek,mt7623" to be consistent with this new MT7621 support. As far as I can tell this is the intention of this binding. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30	dt-bindings: net: dsa: add new MT7530 binding to support MT7621	Greg Ungerer	1	-1/+5
	Add devicetree binding to support the compatible mt7530 switch as used in the MediaTek MT7621 SoC. Signed-off-by: Greg Ungerer <gerg@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Sean Wang <sean.wang@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30	net: dsa: mt7530: support the 7530 switch on the Mediatek MT7621 SoC	Greg Ungerer	2	-39/+66
	The MediaTek MT7621 SoC device contains a 7530 switch, and the existing linux kernel 7530 DSA switch driver can be used with it. The bulk of the changes required stem from the 7621 having different regulator and pad setup. The existing setup of these in the 7530 driver appears to be very specific to its implemtation in the Mediatek 7623 SoC. (Not entirely surprising given the 7623 is a quad core ARM based SoC, and the 7621 is a dual core, dual thread MIPS based SoC). Create a new devicetree type, "mediatek,mt7621", to support the 7530 switch in the 7621 SoC. There appears to be no usable ID register to distinguish it from a 7530 in other hardware at runtime. This is used to carry out the appropriate configuration and setup. Signed-off-by: Greg Ungerer <gerg@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Sean Wang <sean.wang@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30	net: ethernet: mediatek: support MT7621 SoC ethernet hardware	Bjørn Mork	3	-8/+46
	The Mediatek MT7621 SoC contains the same ethernet hardware module as used on a number of other MediaTek SoC parts. There are some minor differences to deal with but we can use the same driver to support them all. This patch is based on work by Bjørn Mork <bjorn@mork.no>, and his original patch is at: https://github.com/bmork/LEDE/commit/3293bc63f5461ca1eb0bbc4ed90145335e7e3404 There is an additional compatible devicetree type added, and the primary change to the code required is to support a single interrupt (for both RX and TX interrupts). Signed-off-by: Bjørn Mork <bjorn@mork.no> [gerg@kernel.org: rebase to mainline and irq handler fix] Signed-off-by: Greg Ungerer <gerg@kernel.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Sean Wang <sean.wang@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30	Merge branch 'mlxsw-spectrum_acl-Include-delta-bits-into-hashtable-key'	David S. Miller	5	-43/+174
	Ido Schimmel says: ==================== mlxsw: spectrum_acl: Include delta bits into hashtable key The Spectrum-2 ASIC allows multiple rules to use the same mask provided that the difference between their masks is small enough (up to 8 consecutive delta bits). A more detailed explanation is provided in merge commit 756cd36626f7 ("Merge branch 'mlxsw-Introduce-algorithmic-TCAM-support'"). These delta bits are part of the rule's key and therefore rules that only differ in their delta bits can be inserted with the same A-TCAM mask. In case two rules share the same key and only differ in their priority, then the second will spill to the C-TCAM. Current code does not take the delta bits into account when checking for duplicate rules, which leads to unnecessary spillage to the C-TCAM. This may result in reduced scale and performance. Patch #1 includes the delta bits in the rule's key to avoid the above mentioned problem. Patch #2 adds a tracepoint when a rule is inserted into the C-TCAM. Patches #3-#5 add test cases to make sure unnecessary spillage into the C-TCAM does not occur. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30	selftests: spectrum-2: Add delta two masks one key test	Jiri Pirko	1	-1/+45
	Ensure that the bug is fixed and we no longer have C-TCAM spill for two keys that differ only in delta. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30	selftests: spectrum-2: Fix multiple_masks_test	Jiri Pirko	1	-4/+22
	With recent fix in C-TCAM spillage for delta masks, the test stops to be falsely positive. So fix it not to use delta by adding src_ip bits to the masks. Alongside with that, use C-TCAM spill trace to see when the spillage actually happens. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30	selftests: spectrum-2: Extend and move trace helpers	Jiri Pirko	1	-22/+49
	Allow to specify number of trace hits and move helpers to the beginning of the file. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30	mlxsw: spectrum_acl: Add C-TCAM spill tracepoint	Jiri Pirko	2	-0/+41
	Add some visibility to the rule addition process and trace whenever rule spilled into C-TCAM. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30	mlxsw: spectrum_acl: Include delta bits into hashtable key	Jiri Pirko	3	-16/+17
	Currently only ERP mask masked bits in key are considered for the hashtable key. That leads to false negative collisions and fallbacks to C-TCAM in case two keys differ only in delta bits. Fix this by taking full encoded key as a hashtable key, including delta bits. Reported-by: Nir Dotan <nird@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30	Merge branch 'sctp-support-SCTP_FUTURE-CURRENT-ALL_ASSOC'	David S. Miller	5	-267/+525
	Xin Long says: ==================== sctp: support SCTP_FUTURE/CURRENT/ALL_ASSOC This patchset adds the support for 3 assoc_id constants: SCTP_FUTURE_ASSOC SCTP_CURRENT_ASSOC, SCTP_ALL_ASSOC, described in rfc6458#section-7.2: All socket options set on a one-to-one style listening socket also apply to all future accepted sockets. For one-to-many style sockets, often a socket option will pass a structure that includes an assoc_id field. This field can be filled with the association identifier of a particular association and unless otherwise specified can be filled with one of the following constants: SCTP_FUTURE_ASSOC: Specifies that only future associations created after this socket option will be affected by this call. SCTP_CURRENT_ASSOC: Specifies that only currently existing associations will be affected by this call, and future associations will still receive the previous default value. SCTP_ALL_ASSOC: Specifies that all current and future associations will be affected by this call. The functions for many other sockopts that use assoc_id also need to be updated accordingly. ==================== Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30	sctp: add SCTP_FUTURE_ASOC and SCTP_CURRENT_ASSOC for SCTP_STREAM_SCHEDULER sockopt	Xin Long	3	-14/+34
	Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_scheduler and check with SCTP_FUTURE_ASSOC instead in sctp_getsockopt_scheduler, it's compatible with 0. SCTP_CURRENT_ASSOC is supported for SCTP_STREAM_SCHEDULER in this patch. It also adds default_ss in sctp_sock to support SCTP_FUTURE_ASSOC. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30	sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_EVENT sockopt	Xin Long	1	-29/+48
	Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_event and check with SCTP_FUTURE_ASSOC instead in sctp_getsockopt_event, it's compatible with 0. SCTP_CURRENT_ASSOC is supported for SCTP_EVENT in this patch. It also adds sctp_assoc_ulpevent_type_set() to make code more readable. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>