linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2020-01-09	libbpf: Make bpf_map order and indices stable	Andrii Nakryiko	1	-14/+0
	Currently, libbpf re-sorts bpf_map structs after all the maps are added and initialized, which might change their relative order and invalidate any bpf_map pointer or index taken before that. This is inconvenient and error-prone. For instance, it can cause .kconfig map index to point to a wrong map. Furthermore, libbpf itself doesn't rely on any specific ordering of bpf_maps, so it's just an unnecessary complication right now. This patch drops sorting of maps and makes their relative positions fixed. If efficient index is ever needed, it's better to have a separate array of pointers as a search index, instead of reordering bpf_map struct in-place. This will be less error-prone and will allow multiple independent orderings, if necessary (e.g., either by section index or by name). Fixes: 166750bc1dd2 ("libbpf: Support libbpf-provided extern variables") Reported-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200110034247.1220142-1-andriin@fb.com
2020-01-09	bpf: Document BPF_F_QUERY_EFFECTIVE flag	Andrey Ignatov	2	-2/+12
	Document BPF_F_QUERY_EFFECTIVE flag, mostly to clarify how it affects attach_flags what may not be obvious and what may lead to confision. Specifically attach_flags is returned only for target_fd but if programs are inherited from an ancestor cgroup then returned attach_flags for current cgroup may be confusing. For example, two effective programs of same attach_type can be returned but w/o BPF_F_ALLOW_MULTI in attach_flags. Simple repro: # bpftool c s /sys/fs/cgroup/path/to/task ID AttachType AttachFlags Name # bpftool c s /sys/fs/cgroup/path/to/task effective ID AttachType AttachFlags Name 95043 ingress tw_ipt_ingress 95048 ingress tw_ingress Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200108014006.938363-1-rdna@fb.com
2020-01-09	bpf: Add bpf_dctcp example	Martin KaFai Lau	3	-0/+625
	This patch adds a bpf_dctcp example. It currently does not do no-ECN fallback but the same could be done through the cgrp2-bpf. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200109003517.3856825-1-kafai@fb.com
2020-01-09	bpf: libbpf: Add STRUCT_OPS support	Martin KaFai Lau	6	-13/+661
	This patch adds BPF STRUCT_OPS support to libbpf. The only sec_name convention is SEC(".struct_ops") to identify the struct_ops implemented in BPF, e.g. To implement a tcp_congestion_ops: SEC(".struct_ops") struct tcp_congestion_ops dctcp = { .init = (void )dctcp_init, / <-- a bpf_prog / / ... some more func prts ... */ .name = "bpf_dctcp", }; Each struct_ops is defined as a global variable under SEC(".struct_ops") as above. libbpf creates a map for each variable and the variable name is the map's name. Multiple struct_ops is supported under SEC(".struct_ops"). In the bpf_object__open phase, libbpf will look for the SEC(".struct_ops") section and find out what is the btf-type the struct_ops is implementing. Note that the btf-type here is referring to a type in the bpf_prog.o's btf. A "struct bpf_map" is added by bpf_object__add_map() as other maps do. It will then collect (through SHT_REL) where are the bpf progs that the func ptrs are referring to. No btf_vmlinux is needed in the open phase. In the bpf_object__load phase, the map-fields, which depend on the btf_vmlinux, are initialized (in bpf_map__init_kern_struct_ops()). It will also set the prog->type, prog->attach_btf_id, and prog->expected_attach_type. Thus, the prog's properties do not rely on its section name. [ Currently, the bpf_prog's btf-type ==> btf_vmlinux's btf-type matching process is as simple as: member-name match + btf-kind match + size match. If these matching conditions fail, libbpf will reject. The current targeting support is "struct tcp_congestion_ops" which most of its members are function pointers. The member ordering of the bpf_prog's btf-type can be different from the btf_vmlinux's btf-type. ] Then, all obj->maps are created as usual (in bpf_object__create_maps()). Once the maps are created and prog's properties are all set, the libbpf will proceed to load all the progs. bpf_map__attach_struct_ops() is added to register a struct_ops map to a kernel subsystem. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200109003514.3856730-1-kafai@fb.com
2020-01-09	bpf: Synch uapi bpf.h to tools/	Martin KaFai Lau	1	-2/+17
	This patch sync uapi bpf.h to tools/ Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200109003512.3856559-1-kafai@fb.com
2020-01-09	bpf: Add BPF_FUNC_tcp_send_ack helper	Martin KaFai Lau	2	-2/+33
	Add a helper to send out a tcp-ack. It will be used in the later bpf_dctcp implementation that requires to send out an ack when the CE state changed. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200109004551.3900448-1-kafai@fb.com
2020-01-09	bpf: tcp: Support tcp_congestion_ops in bpf	Martin KaFai Lau	10	-16/+261
	This patch makes "struct tcp_congestion_ops" to be the first user of BPF STRUCT_OPS. It allows implementing a tcp_congestion_ops in bpf. The BPF implemented tcp_congestion_ops can be used like regular kernel tcp-cc through sysctl and setsockopt. e.g. [root@arch-fb-vm1 bpf]# sysctl -a \| egrep congestion net.ipv4.tcp_allowed_congestion_control = reno cubic bpf_cubic net.ipv4.tcp_available_congestion_control = reno bic cubic bpf_cubic net.ipv4.tcp_congestion_control = bpf_cubic There has been attempt to move the TCP CC to the user space (e.g. CCP in TCP). The common arguments are faster turn around, get away from long-tail kernel versions in production...etc, which are legit points. BPF has been the continuous effort to join both kernel and userspace upsides together (e.g. XDP to gain the performance advantage without bypassing the kernel). The recent BPF advancements (in particular BTF-aware verifier, BPF trampoline, BPF CO-RE...) made implementing kernel struct ops (e.g. tcp cc) possible in BPF. It allows a faster turnaround for testing algorithm in the production while leveraging the existing (and continue growing) BPF feature/framework instead of building one specifically for userspace TCP CC. This patch allows write access to a few fields in tcp-sock (in bpf_tcp_ca_btf_struct_access()). The optional "get_info" is unsupported now. It can be added later. One possible way is to output the info with a btf-id to describe the content. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200109003508.3856115-1-kafai@fb.com
2020-01-09	bpf: Introduce BPF_MAP_TYPE_STRUCT_OPS	Martin KaFai Lau	11	-47/+642
	The patch introduces BPF_MAP_TYPE_STRUCT_OPS. The map value is a kernel struct with its func ptr implemented in bpf prog. This new map is the interface to register/unregister/introspect a bpf implemented kernel struct. The kernel struct is actually embedded inside another new struct (or called the "value" struct in the code). For example, "struct tcp_congestion_ops" is embbeded in: struct bpf_struct_ops_tcp_congestion_ops { refcount_t refcnt; enum bpf_struct_ops_state state; struct tcp_congestion_ops data; /* <-- kernel subsystem struct here / } The map value is "struct bpf_struct_ops_tcp_congestion_ops". The "bpftool map dump" will then be able to show the state ("inuse"/"tobefree") and the number of subsystem's refcnt (e.g. number of tcp_sock in the tcp_congestion_ops case). This "value" struct is created automatically by a macro. Having a separate "value" struct will also make extending "struct bpf_struct_ops_XYZ" easier (e.g. adding "void (init)(void)" to "struct bpf_struct_ops_XYZ" to do some initialization works before registering the struct_ops to the kernel subsystem). The libbpf will take care of finding and populating the "struct bpf_struct_ops_XYZ" from "struct XYZ". Register a struct_ops to a kernel subsystem: 1. Load all needed BPF_PROG_TYPE_STRUCT_OPS prog(s) 2. Create a BPF_MAP_TYPE_STRUCT_OPS with attr->btf_vmlinux_value_type_id set to the btf id "struct bpf_struct_ops_tcp_congestion_ops" of the running kernel. Instead of reusing the attr->btf_value_type_id, btf_vmlinux_value_type_id s added such that attr->btf_fd can still be used as the "user" btf which could store other useful sysadmin/debug info that may be introduced in the furture, e.g. creation-date/compiler-details/map-creator...etc. 3. Create a "struct bpf_struct_ops_tcp_congestion_ops" object as described in the running kernel btf. Populate the value of this object. The function ptr should be populated with the prog fds. 4. Call BPF_MAP_UPDATE with the object created in (3) as the map value. The key is always "0". During BPF_MAP_UPDATE, the code that saves the kernel-func-ptr's args as an array of u64 is generated. BPF_MAP_UPDATE also allows the specific struct_ops to do some final checks in "st_ops->init_member()" (e.g. ensure all mandatory func ptrs are implemented). If everything looks good, it will register this kernel struct to the kernel subsystem. The map will not allow further update from this point. Unregister a struct_ops from the kernel subsystem: BPF_MAP_DELETE with key "0". Introspect a struct_ops: BPF_MAP_LOOKUP_ELEM with key "0". The map value returned will have the prog _id_ populated as the func ptr. The map value state (enum bpf_struct_ops_state) will transit from: INIT (map created) => INUSE (map updated, i.e. reg) => TOBEFREE (map value deleted, i.e. unreg) The kernel subsystem needs to call bpf_struct_ops_get() and bpf_struct_ops_put() to manage the "refcnt" in the "struct bpf_struct_ops_XYZ". This patch uses a separate refcnt for the purose of tracking the subsystem usage. Another approach is to reuse the map->refcnt and then "show" (i.e. during map_lookup) the subsystem's usage by doing map->refcnt - map->usercnt to filter out the map-fd/pinned-map usage. However, that will also tie down the future semantics of map->refcnt and map->usercnt. The very first subsystem's refcnt (during reg()) holds one count to map->refcnt. When the very last subsystem's refcnt is gone, it will also release the map->refcnt. All bpf_prog will be freed when the map->refcnt reaches 0 (i.e. during map_free()). Here is how the bpftool map command will look like: [root@arch-fb-vm1 bpf]# bpftool map show 6: struct_ops name dctcp flags 0x0 key 4B value 256B max_entries 1 memlock 4096B btf_id 6 [root@arch-fb-vm1 bpf]# bpftool map dump id 6 [{ "value": { "refcnt": { "refs": { "counter": 1 } }, "state": 1, "data": { "list": { "next": 0, "prev": 0 }, "key": 0, "flags": 2, "init": 24, "release": 0, "ssthresh": 25, "cong_avoid": 30, "set_state": 27, "cwnd_event": 28, "in_ack_event": 26, "undo_cwnd": 29, "pkts_acked": 0, "min_tso_segs": 0, "sndbuf_expand": 0, "cong_control": 0, "get_info": 0, "name": [98,112,102,95,100,99,116,99,112,0,0,0,0,0,0,0 ], "owner": 0 } } } ] Misc Notes: * bpf_struct_ops_map_sys_lookup_elem() is added for syscall lookup. It does an inplace update on "value" instead returning a pointer to syscall.c. Otherwise, it needs a separate copy of "zero" value for the BPF_STRUCT_OPS_STATE_INIT to avoid races. The bpf_struct_ops_map_delete_elem() is also called without preempt_disable() from map_delete_elem(). It is because the "->unreg()" may requires sleepable context, e.g. the "tcp_unregister_congestion_control()". * "const" is added to some of the existing "struct btf_func_model *" function arg to avoid a compiler warning caused by this patch. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200109003505.3855919-1-kafai@fb.com
2020-01-09	bpf: Introduce BPF_PROG_TYPE_STRUCT_OPS	Martin KaFai Lau	10	-63/+373
	This patch allows the kernel's struct ops (i.e. func ptr) to be implemented in BPF. The first use case in this series is the "struct tcp_congestion_ops" which will be introduced in a latter patch. This patch introduces a new prog type BPF_PROG_TYPE_STRUCT_OPS. The BPF_PROG_TYPE_STRUCT_OPS prog is verified against a particular func ptr of a kernel struct. The attr->attach_btf_id is the btf id of a kernel struct. The attr->expected_attach_type is the member "index" of that kernel struct. The first member of a struct starts with member index 0. That will avoid ambiguity when a kernel struct has multiple func ptrs with the same func signature. For example, a BPF_PROG_TYPE_STRUCT_OPS prog is written to implement the "init" func ptr of the "struct tcp_congestion_ops". The attr->attach_btf_id is the btf id of the "struct tcp_congestion_ops" of the _running_ kernel. The attr->expected_attach_type is 3. The ctx of BPF_PROG_TYPE_STRUCT_OPS is an array of u64 args saved by arch_prepare_bpf_trampoline that will be done in the next patch when introducing BPF_MAP_TYPE_STRUCT_OPS. "struct bpf_struct_ops" is introduced as a common interface for the kernel struct that supports BPF_PROG_TYPE_STRUCT_OPS prog. The supporting kernel struct will need to implement an instance of the "struct bpf_struct_ops". The supporting kernel struct also needs to implement a bpf_verifier_ops. During BPF_PROG_LOAD, bpf_struct_ops_find() will find the right bpf_verifier_ops by searching the attr->attach_btf_id. A new "btf_struct_access" is also added to the bpf_verifier_ops such that the supporting kernel struct can optionally provide its own specific check on accessing the func arg (e.g. provide limited write access). After btf_vmlinux is parsed, the new bpf_struct_ops_init() is called to initialize some values (e.g. the btf id of the supporting kernel struct) and it can only be done once the btf_vmlinux is available. The R0 checks at BPF_EXIT is excluded for the BPF_PROG_TYPE_STRUCT_OPS prog if the return type of the prog->aux->attach_func_proto is "void". Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200109003503.3855825-1-kafai@fb.com
2020-01-09	bpf: Support bitfield read access in btf_struct_access	Martin KaFai Lau	1	-5/+39
	This patch allows bitfield access as a scalar. It checks "off + size > t->size" to avoid accessing bitfield end up accessing beyond the struct. This check is done outside of the loop since it is applicable to all access. It also takes this chance to break early on the "off < moff" case. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200109003501.3855427-1-kafai@fb.com
2020-01-09	bpf: Add enum support to btf_ctx_access()	Martin KaFai Lau	1	-1/+1
	It allows bpf prog (e.g. tracing) to attach to a kernel function that takes enum argument. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200109003459.3855366-1-kafai@fb.com
2020-01-09	bpf: Avoid storing modifier to info->btf_id	Martin KaFai Lau	1	-3/+6
	info->btf_id expects the btf_id of a struct, so it should store the final result after skipping modifiers (if any). It also takes this chanace to add a missing newline in one of the bpf_log() messages. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200109003456.3855176-1-kafai@fb.com
2020-01-09	bpf: Save PTR_TO_BTF_ID register state when spilling to stack	Martin KaFai Lau	1	-0/+1
	This patch makes the verifier save the PTR_TO_BTF_ID register state when spilling to the stack. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200109003454.3854870-1-kafai@fb.com
2020-01-09	selftests/bpf: Restore original comm in test_overhead	Stanislav Fomichev	1	-1/+7
	test_overhead changes task comm in order to estimate BPF trampoline overhead but never sets the comm back to the original one. We have the tests (like core_reloc.c) that have 'test_progs' as hard-coded expected comm, so let's try to preserve the original comm. Currently, everything works because the order of execution is: first core_recloc, then test_overhead; but let's make it a bit future-proof. Other related changes: use 'test_overhead' as new comm instead of 'test' to make it easy to debug and drop '\n' at the end. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Petar Penkov <ppenkov@google.com> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200108192132.189221-1-sdf@google.com
2020-01-08	bpftool: Add misc section and probe for large INSN limit	Michal Rostecki	1	-0/+18
	Introduce a new probe section (misc) for probes not related to concrete map types, program types, functions or kernel configuration. Introduce a probe for large INSN limit as the first one in that section. Example outputs: # bpftool feature probe [...] Scanning miscellaneous eBPF features... Large program size limit is available # bpftool feature probe macros [...] /* eBPF misc features */ #define HAVE_HAVE_LARGE_INSN_LIMIT # bpftool feature probe -j \| jq '.["misc"]' { "have_large_insn_limit": true } Signed-off-by: Michal Rostecki <mrostecki@opensuse.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Link: https://lore.kernel.org/bpf/20200108162428.25014-3-mrostecki@opensuse.org
2020-01-08	libbpf: Add probe for large INSN limit	Michal Rostecki	3	-0/+23
	Introduce a new probe which checks whether kernel has large maximum program size which was increased in the following commit: c04c0d2b968a ("bpf: increase complexity limit and maximum program size") Based on the similar check in Cilium[0], authored by Daniel Borkmann. [0] https://github.com/cilium/cilium/commit/657d0f585afd26232cfa5d4e70b6f64d2ea91596 Co-authored-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Michal Rostecki <mrostecki@opensuse.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Link: https://lore.kernel.org/bpf/20200108162428.25014-2-mrostecki@opensuse.org
2020-01-07	ptp: clockmatrix: Rework clockmatrix version information.	Vincent Cheng	2	-64/+15
	Simplify and fix the version information displayed by the driver. The new info better relects what is needed to support the hardware. Prev: Version: 4.8.0, Pipeline 22169 0x4001, Rev 0, Bond 5, CSR 311, IRQ 2 New: Version: 4.8.0, Id: 0x4001 Hw Rev: 5 OTP Config Select: 15 - Remove pipeline, CSR and IRQ because version x.y.z already incorporates this information. - Remove bond number because it is not used. - Remove rev number because register was not implemented, always 0 - Add HW Rev ID register to replace rev number - Add OTP config select to show the user configuration chosen by the configurable GPIO pins on start-up Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-07	enetc: Fix inconsistent IS_ERR and PTR_ERR	YueHaibing	1	-1/+1
	The proper pointer to be passed as argument is hw Detected using Coccinelle. Fixes: 6517798dd343 ("enetc: Make MDIO accessors more generic and export to include/linux/fsl") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-07	enetc: Fix an off by one in enetc_setup_tc_txtime()	Dan Carpenter	1	-1/+1
	The priv->tx_ring[] has 16 elements but only priv->num_tx_rings are set up, the rest are NULL. This ">" comparison should be ">=" to avoid a potential crash. Fixes: 0d08c9ec7d6e ("enetc: add support time specific departure base on the qos etf") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-07	Documentation: networking: Add stmmac to device drivers list	Jose Abreu	1	-0/+1
	Add the stmmac RST file to the index. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-07	Documentation: networking: Convert stmmac documentation to RST format	Jose Abreu	2	-401/+697
	Convert the documentation of the driver to RST format and delete the old txt and old information that no longer applies. Also, add some new information. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-07	MAINTAINERS: Add stmmac Ethernet driver documentation entry	Jose Abreu	1	-0/+1
	Add the missing entry for the file that documents the stmicro Ethernet driver stmmac. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-07	drivers: net: cisco_hdlc: use __func__ in debug message	Chen Zhou	1	-2/+2
	Use __func__ to print the function name instead of hard coded string. BTW, replace printk(KERN_DEBUG, ...) with netdev_dbg. Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-07	net: ch9200: remove unnecessary return	Chen Zhou	1	-2/+0
	The return is not needed, remove it. Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-07	net: ch9200: use __func__ in debug message	Chen Zhou	1	-11/+11
	Use __func__ to print the function name instead of hard coded string. Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-07	ionic: clear compiler warning on hb use before set	Shannon Nelson	1	-1/+1
	Build checks have pointed out that 'hb' can theoretically be used before set, so let's initialize it and get rid of the compiler complaint. Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-07	ionic: restrict received packets to mtu size	Shannon Nelson	1	-2/+9
	Make sure the NIC drops packets that are larger than the specified MTU. The front end of the NIC will accept packets larger than MTU and will copy all the data it can to fill up the driver's posted buffers - if the buffers are not long enough the packet will then get dropped. With the Rx SG buffers allocagted as full pages, we are currently setting up more space than MTU size available and end up receiving some packets that are larger than MTU, up to the size of buffers posted. To be sure the NIC doesn't waste our time with oversized packets we need to lie a little in the SG descriptor about how long is the last SG element. At dealloc time, we know the allocation was a page, so the deallocation doesn't care about what length we put in the descriptor. Signed-off-by: Shannon Nelson <snelson@pensando.io> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-07	ionic: add Rx dropped packet counter	Shannon Nelson	3	-3/+11
	Add a counter for packets dropped by the driver, typically for bad size or a receive error seen by the device. Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-07	ionic: drop use of subdevice tags	Shannon Nelson	1	-4/+0
	The subdevice concept is not being used in the driver, so drop the references to it. Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06	net: dsa: mv88e6xxx: Unique ATU and VTU IRQ names	Andrew Lunn	3	-2/+10
	Dynamically generate a unique interrupt name for the VTU and ATU, based on the device name. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06	net: dsa: mv88e6xxx: Unique g2 IRQ name	Andrew Lunn	2	-1/+5
	Dynamically generate a unique g2 interrupt name, based on the device name. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06	net: dsa: mv88e6xxx: Unique watchdog IRQ name	Andrew Lunn	2	-1/+5
	Dynamically generate a unique watchdog interrupt name, based on the device name. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06	net: dsa: mv88e6xxx: Unique SERDES interrupt names	Andrew Lunn	2	-1/+6
	Dynamically generate a unique SERDES interrupt name, based on the device name and the port the SERDES is for. For example: 95: 3 mv88e6xxx-g2 9 Edge mv88e6xxx-0.2:00-serdes-9 96: 0 mv88e6xxx-g2 10 Edge mv88e6xxx-0.2:00-serdes-10 The 0.2:00 indicates the switch and -9 indicates port 9. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06	net: dsa: mv88e6xxx: Unique IRQ name	Andrew Lunn	2	-1/+5
	Dynamically generate a unique switch interrupt name, based on the device name. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06	igc: Use Start of Packet signal from PHY for timestamping	Vinicius Costa Gomes	2	-1/+6
	For better accuracy, i225 is able to do timestamping using the Start of Packet signal from the PHY. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-01-06	igc: Add support for ethtool GET_TS_INFO command	Vinicius Costa Gomes	1	-0/+34
	This command allows igc to report what types of timestamping are supported. ptp4l uses this to detect if the hardware supports timestamping. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-01-06	igc: Add support for TX timestamping	Vinicius Costa Gomes	4	-0/+158
	This adds support for timestamping packets being transmitted. Based on the code from i210. The basic differences is that i225 has 4 registers to store the transmit timestamps (i210 has one). Right now, we only support retrieving from one register, support for using the other registers will be added later. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-01-06	igc: Add support for RX timestamping	Vinicius Costa Gomes	5	-0/+340
	This adds support for timestamping received packets. It is based on the i210, as many features of i225 work the same way. The main difference from i210 is that i225 has support for choosing the timer register to use when timestamping packets. Right now, we only support using timer 0. The other difference is that i225 stores two timestamps in the receive descriptor, right now, we only retrieve one. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-01-06	epic100: allow nesting of ethtool_ops begin() and complete()	Michal Kubecek	1	-2/+5
	Unlike most networking drivers using begin() and complete() ethtool_ops callbacks to resume a device which is down and suspend it again when done, epic100 does not use standard refcounted infrastructure but sets device sleep state directly. With the introduction of netlink ethtool interface, we may have nested begin-complete blocks so that inner complete() would put the device back to sleep for the rest of the outer block. To avoid rewriting an old and not very actively developed driver, just add a nesting counter and only perform resume and suspend on the outermost level. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06	via-velocity: allow nesting of ethtool_ops begin() and complete()	Michal Kubecek	2	-4/+11
	Unlike most networking drivers using begin() and complete() ethtool_ops callbacks to resume a device which is down and suspend it again when done, via-velocity does not use standard refcounted infrastructure but sets device sleep state directly. With the introduction of netlink ethtool interface, we may have nested begin-complete blocks so that inner complete() would put the device back to sleep for the rest of the outer block. To avoid rewriting an old and not very actively developed driver, just add a nesting counter and only perform resume and suspend on the outermost level. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06	wil6210: get rid of begin() and complete() ethtool_ops	Michal Kubecek	1	-27/+16
	The wil6210 driver locks a mutex in begin() ethtool_ops callback and unlocks it in complete() so that all ethtool requests are serialized. This is not going to work correctly with netlink interface; e.g. when ioctl triggers a netlink notification, netlink code would call begin() again while the mutex taken by ioctl code is still held by the same task. Let's get rid of the begin() and complete() callbacks and move the mutex locking into the remaining ethtool_ops handlers except get_drvinfo which only copies strings that are not changing so that there is no need for serialization. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06	fcnal-test: Fix vrf argument in local tcp tests	David Ahern	1	-4/+4
	The recent MD5 tests added duplicate configuration in the default VRF. This change exposed a bug in existing tests designed to verify no connection when client and server are not in the same domain. The server should be running bound to the vrf device with the client run in the default VRF (the -2 option is meant for validating connection data). Fix the option for both tests. While technically this is a bug in previous releases, the tests are properly failing since the default VRF does not have any routing configuration so there really is no need to backport to prior releases. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06	gtp: simplify error handling code in 'gtp_encap_enable()'	Christophe JAILLET	1	-6/+3
	'gtp_encap_disable_sock(sk)' handles the case where sk is NULL, so there is no need to test it before calling the function. This saves a few line of code. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06	selftests: forwarding: router: Add test case for destination IP link-local	Amit Cohen	1	-0/+25
	Add test case to check that packets are not dropped when they need to be routed and their destination is link-local, i.e., 169.254.0.0/16. Signed-off-by: Amit Cohen <amitc@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06	mlxsw: spectrum: Disable DIP_LINK_LOCAL check in hardware pipeline	Amit Cohen	2	-0/+3
	The check drops packets if they need to be routed and their destination IP is link-local, i.e., belongs to 169.254.0.0/16 address range. Disable the check since the kernel forwards such packets and does not drop them. Signed-off-by: Amit Cohen <amitc@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06	selftests: forwarding: router: Add test case for source IP equals destination IP	Amit Cohen	1	-0/+44
	Add test case to check that packets are not dropped when they need to be routed and their source IP equals to their destination IP. Signed-off-by: Amit Cohen <amitc@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06	mlxsw: spectrum: Disable SIP_DIP check in hardware pipeline	Amit Cohen	2	-0/+3
	The check drops packets if they need to be routed and their source IP equals to their destination IP. Disable the check since the kernel forwards such packets and does not drop them. Signed-off-by: Amit Cohen <amitc@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06	selftests: forwarding: router: Add test case for multicast destination MAC mismatch	Amit Cohen	1	-0/+82
	Add test case to check that packets are not dropped when they need to be routed and their multicast MAC mismatched to their multicast destination IP. i.e., destination IP is multicast and * for IPV4: DMAC != {01-00-5E-0 (25 bits), DIP[22:0]} * for IPV6: DMAC != {33-33-0 (16 bits), DIP[31:0]} Signed-off-by: Amit Cohen <amitc@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06	mlxsw: spectrum: Disable MC_DMAC check in hardware pipeline	Amit Cohen	2	-0/+3
	The check drops packets if they need to be routed and their multicast MAC mismatched to their multicast destination IP. For IPV4: DMAC is mismatched if it is different from {01-00-5E-0 (25 bits), DIP[22:0]} For IPV6: DMAC is mismatched if it is different from {33-33-0 (16 bits), DIP[31:0]} Disable the check since the kernel forwards such packets and does not drop them. Signed-off-by: Amit Cohen <amitc@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06	selftests: forwarding: router: Add test case for source IP in class E	Amit Cohen	1	-1/+37
	Add test case to check that packets are not dropped when they need to be routed and their source IP in class E, (i.e., 240.0.0.0 – 255.255.255.254). Signed-off-by: Amit Cohen <amitc@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>