path: root/tools/testing/selftests/bpf
2018-10-18  selftests/bpf: fix file resource leak in load_kallsyms  (Peng Hao, 1 file, -0/+1)
FILE pointer variable f is opened but never closed. Signed-off-by: Peng Hao <peng.hao2@zte.com.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-17  bpf: fix doc of bpf_skb_adjust_room() in uapi  (Nicolas Dichtel, 2 files, -2/+2)
len_diff is signed. Fixes: fa15601ab31e ("bpf: add documentation for eBPF helpers (33-41)") CC: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-17  bpf: sockmap, add msg_peek tests to test_sockmap  (John Fastabend, 1 file, -52/+115)
Add tests that do a MSG_PEEK recv followed by a regular receive to test flag support. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
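For readers new to the flag, here is a minimal userspace sketch of the peek-then-read pattern the test exercises (the function name is invented and socket setup is omitted; fd is assumed to be a connected stream socket):

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    /* Peek at queued data without consuming it, then do a normal read.
     * If MSG_PEEK is honored, both calls observe the same bytes. */
    static ssize_t peek_then_read(int fd, char *peek_buf, char *read_buf, size_t len)
    {
            ssize_t peeked = recv(fd, peek_buf, len, MSG_PEEK); /* data stays queued */
            ssize_t got;
            size_t n;

            if (peeked < 0)
                    return peeked;

            got = recv(fd, read_buf, len, 0); /* data is consumed */
            if (got < 0)
                    return got;

            /* A mismatch here would indicate a broken MSG_PEEK path. */
            n = (size_t)(got < peeked ? got : peeked);
            return memcmp(peek_buf, read_buf, n) ? -1 : got;
    }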
2018-10-17  bpf: sockmap, support for msg_peek in sk_msg with redirect ingress  (John Fastabend, 3 files, -17/+30)
This adds support for the MSG_PEEK flag when doing redirect to ingress and receiving on the sk_msg psock queue. Previously the flag was being ignored which could confuse applications if they expected the flag to work as normal. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-17  bpf: skmsg, improve sk_msg_used_element to work in cork context  (John Fastabend, 1 file, -5/+8)
Currently sk_msg_used_element is only called in the zerocopy context, where cork is not possible; if that case happens we fall back to copy mode. However, the helper is more useful if it works in all contexts. Previously, when end == head, which can indicate either a full or an empty ring, the helper always reported an empty ring. Fix this by adding a test for the full-ring case so a full ring is no longer reported as having 0 elements. This additional functionality will be used in the next patches from the recvmsg context, where end == head with a full ring is a valid case. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
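The full-versus-empty ambiguity the patch addresses is the classic ring-buffer corner case. An illustrative standalone sketch (not the kernel helper; names and the ring size are invented) of why an extra piece of state is needed when end == head:

    #include <stdbool.h>

    #define RING_SIZE 16    /* illustrative only */

    struct ring {
            unsigned int head;  /* next slot to consume */
            unsigned int end;   /* next slot to produce */
            bool full;          /* set when producing makes end catch up to head */
    };

    /* head == end alone cannot distinguish "empty" from "full"; without the
     * extra state a full ring would be reported as having 0 used elements. */
    static unsigned int ring_used_elements(const struct ring *r)
    {
            if (r->end == r->head)
                    return r->full ? RING_SIZE : 0;
            if (r->end > r->head)
                    return r->end - r->head;
            return RING_SIZE - (r->head - r->end);
    }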
2018-10-17  bpf: sockmap, fix skmsg recvmsg handler to track size correctly  (John Fastabend, 2 files, -0/+2)
When converting sockmap to the new skmsg generic data structures we missed that the recvmsg handler did not correctly use sg.size and instead was using the individual elements' lengths. The result is that if a sock is closed with outstanding data we omit the call to sk_mem_uncharge() and can get the warning below.

    [ 66.728282] WARNING: CPU: 6 PID: 5783 at net/core/stream.c:206 sk_stream_kill_queues+0x1fa/0x210

To fix this, correct the redirect handler to transfer the size along with the scatterlist and also decrement the size from the recvmsg handler. Now when a sock is closed the remaining 'size' will be decremented with sk_mem_uncharge(). Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-16  nfp: bpf: double check vNIC capabilities after object sharing  (Jakub Kicinski, 3 files, -6/+22)
The program translation stage checks that the program can be offloaded to the netdev which was passed during the load (bpf_attr->prog_ifindex). After program sharing was introduced, however, the netdev on which the program is loaded can theoretically be different, and therefore we should recheck the program size and max stack size at load time. This was found by code inspection; AFAIK all vNICs today have identical caps. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-16  nfp: bpf: protect against mis-initializing atomic counters  (Jakub Kicinski, 3 files, -7/+76)
Atomic operations on the NFP are currently always in big endian. The driver keeps track of regions of memory storing atomic values and byte swaps them accordingly. There are corner cases where the map values may be initialized before the driver knows they are used as atomic counters. This can happen either when the datapath is performing the update and the stack contents are unknown, or when the map is updated before the program which will use it for atomic values is loaded. To avoid a situation where the user initializes the value to 0 1 2 3 and then, after loading a program which uses the word as an atomic counter, starts reading 3 2 1 0, only allow atomic counters to be initialized to endian-neutral values. For updates from the datapath the stack information may not be as precise, so just allow initializing such values to 0.

Example code which would break:

    struct bpf_map_def SEC("maps") rxcnt = {
            .type = BPF_MAP_TYPE_HASH,
            .key_size = sizeof(__u32),
            .value_size = sizeof(__u64),
            .max_entries = 1,
    };

    int xdp_prog1()
    {
            __u64 nonzeroval = 3;
            __u32 key = 0;
            __u64 *value;

            value = bpf_map_lookup_elem(&rxcnt, &key);
            if (!value)
                    bpf_map_update_elem(&rxcnt, &key, &nonzeroval, BPF_ANY);
            else
                    __sync_fetch_and_add(value, 1);

            return XDP_PASS;
    }

    $ offload bpftool map dump
    key: 00 00 00 00 value: 00 00 00 03 00 00 00 00

    should be:

    $ offload bpftool map dump
    key: 00 00 00 00 value: 03 00 00 00 00 00 00 00

Reported-by: David Beckett <david.beckett@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-16  libbpf: Per-symbol visibility for DSO  (Andrey Ignatov, 4 files, -148/+179)
Make global symbols in the libbpf DSO hidden by default with -fvisibility=hidden and export symbols that are part of the ABI explicitly with __attribute__((visibility("default"))). This is common practice that should prevent accidentally exporting a symbol that is not supposed to be part of the ABI, which, in turn, improves both the libbpf developer and user experience. See [1] for more details. Export control becomes more important since more and more projects use libbpf. The patch doesn't export a bunch of netlink related functions since, as agreed in [2], they'll be reworked. That doesn't break bpftool since bpftool links libbpf statically. [1] https://www.akkadia.org/drepper/dsohowto.pdf (2.2 Export Control) [2] https://www.mail-archive.com/netdev@vger.kernel.org/msg251434.html Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
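As a rough sketch of the pattern described above (illustrative names, not libbpf's actual macro): non-annotated globals disappear from the dynamic symbol table once the DSO is built with -fvisibility=hidden, and only the explicitly marked symbols remain part of the ABI.

    /* Build: cc -shared -fPIC -fvisibility=hidden -o libexample.so example.c */
    #define EXAMPLE_API __attribute__((visibility("default")))

    /* Non-static, but hidden by -fvisibility=hidden: not part of the ABI. */
    int internal_helper(void)
    {
            return 42;
    }

    /* Explicitly exported: stays in the DSO's dynamic symbol table. */
    EXAMPLE_API int exported_abi_function(void)
    {
            return internal_helper();
    }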
2018-10-16  bpf, tls: add tls header to tools infrastructure  (Daniel Borkmann, 2 files, -5/+86)
Andrey reported a build error for the BPF kselftest suite when compiled on a machine which does not have tls related header bits installed natively:

    test_sockmap.c:120:23: fatal error: linux/tls.h: No such file or directory
     #include <linux/tls.h>
              ^
    compilation terminated.

Fix it by adding the header to the tools include infrastructure and add definitions such as SOL_TLS that could potentially be missing. Fixes: e9dd904708c4 ("bpf: add tls support for testing in test_sockmap") Reported-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
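A sketch of how such a fallback definition typically looks, together with the usual kTLS setup sequence a test like this relies on (the helper name is invented and the key material is left zeroed purely for illustration; the SOL_TLS value matches the kernel's include/linux/socket.h):

    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <linux/tls.h>

    #ifndef SOL_TLS
    #define SOL_TLS 282     /* from include/linux/socket.h */
    #endif
    #ifndef TCP_ULP
    #define TCP_ULP 31      /* from include/uapi/linux/tcp.h */
    #endif

    static int enable_ktls_tx(int fd)
    {
            struct tls12_crypto_info_aes_gcm_128 ci = {
                    .info.version = TLS_1_2_VERSION,
                    .info.cipher_type = TLS_CIPHER_AES_GCM_128,
                    /* key/iv/salt/rec_seq left zeroed: illustration only */
            };

            if (setsockopt(fd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls")))
                    return -1;
            return setsockopt(fd, SOL_TLS, TLS_TX, &ci, sizeof(ci));
    }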
2018-10-16  net/ipv4: Bail early if user only wants prefix entries  (David Ahern, 1 file, -2/+6)
Unlike IPv6, IPv4 does not have routes marked with RTF_PREFIX_RT. If the flag is set in the dump request, just return. In the process of this change, move the CLONE check to use the new filter flags. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16  net/ipv6: Bail early if user only wants cloned entries  (David Ahern, 1 file, -2/+5)
Similar to IPv4, IPv6 fib no longer contains cloned routes. If a user requests a route dump for only cloned entries, no sense walking the FIB and returning everything. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16  net/mpls: Handle kernel side filtering of route dumps  (David Ahern, 1 file, -5/+28)
Update the dump request parsing in MPLS for the non-INET case to enable kernel side filtering. If INET is disabled the only filters that make sense for MPLS are protocol and nexthop device. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16  net: Enable kernel side filtering of route dumps  (David Ahern, 6 files, -15/+53)
Update parsing of route dump request to enable kernel side filtering. Allow filtering results by protocol (e.g., which routing daemon installed the route), route type (e.g., unicast), table id and nexthop device. These amount to the low hanging fruit, yet a huge improvement, for dumping routes. ip_valid_fib_dump_req is called with RTNL held, so __dev_get_by_index can be used to look up the device index without taking a reference. From there filter->dev is only used during dump loops with the lock still held. Set NLM_F_DUMP_FILTERED in the answer_flags so the user knows the results have been filtered should no entries be returned. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16  net: Plumb support for filtering ipv4 and ipv6 multicast route dumps  (David Ahern, 4 files, -12/+74)
Implement kernel side filtering of routes by egress device index and table id. If the table id is given in the filter, lookup table and call mr_table_dump directly for it. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16  ipmr: Refactor mr_rtm_dumproute  (David Ahern, 2 files, -33/+61)
Move per-table loops from mr_rtm_dumproute to mr_table_dump and export mr_table_dump for dumps by specific table id. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16  net/mpls: Plumb support for filtering route dumps  (David Ahern, 1 file, -1/+41)
Implement kernel side filtering of routes by egress device index and protocol. MPLS uses only a single table and route type. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16  net/ipv6: Plumb support for filtering route dumps  (David Ahern, 2 files, -14/+54)
Implement kernel side filtering of routes by table id, egress device index, protocol, and route type. If the table id is given in the filter, lookup the table and call fib6_dump_table directly for it. Move the existing route flags check for prefix only routes to the new filter. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16  net/ipv4: Plumb support for filtering route dumps  (David Ahern, 3 files, -13/+39)
Implement kernel side filtering of routes by table id, egress device index, protocol and route type. If the table id is given in the filter, lookup the table and call fib_table_dump directly for it. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16  net: Add struct for fib dump filter  (David Ahern, 7 files, -11/+37)
Add struct fib_dump_filter for options on limiting which routes are returned in a dump request. The current list is table id, protocol, route type, rtm_flags and nexthop device index. struct net is needed to lookup the net_device from the index. Declare the filter for each route dump handler and plumb the new arguments from dump handlers to ip_valid_fib_dump_req. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
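For orientation, a userspace mock-up of the filter the commit describes (field names here are illustrative; the real structure is kernel-internal and holds a struct net_device pointer resolved via the struct net mentioned above):

    #include <stdint.h>

    struct fib_dump_filter_sketch {
            uint32_t table_id;      /* 0 = dump all tables */
            uint8_t  protocol;      /* routing protocol (RTPROT_*), 0 = any */
            uint8_t  rt_type;       /* route type (RTN_*), 0 = any */
            uint32_t flags;         /* rtm_flags to match */
            int      ifindex;       /* nexthop device index, 0 = any */
    };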
2018-10-16  netlink: Add answer_flags to netlink_callback  (David Ahern, 2 files, -1/+3)
With dump filtering we need a way to ensure the NLM_F_DUMP_FILTERED flag is set on a message back to the user if the data returned is influenced by some input attributes. Normally this can be done as messages are added to the skb, but if the filter results in no data being returned, the user could be confused as to why. This patch adds answer_flags to the netlink_callback allowing dump handlers to set the NLM_F_DUMP_FILTERED at a minimum in the NLMSG_DONE message ensuring the flag gets back to the user. The netlink_callback space is initialized to 0 via a memset in __netlink_dump_start, so init of the new answer_flags is covered. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
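From the userspace side, the effect is that even an empty filtered dump carries the flag on the closing message. A minimal sketch of the check a dump consumer might perform (helper name invented):

    #include <stdbool.h>
    #include <linux/netlink.h>

    /* True when the kernel reports that the dump results were filtered,
     * which after this change is signalled at least on NLMSG_DONE. */
    static bool dump_was_filtered(const struct nlmsghdr *nlh)
    {
            return nlh->nlmsg_type == NLMSG_DONE &&
                   (nlh->nlmsg_flags & NLM_F_DUMP_FILTERED);
    }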
2018-10-15  net: phy: merge phy_start_aneg and phy_start_aneg_priv  (Heiner Kallweit, 1 file, -18/+3)
After commit 9f2959b6b52d ("net: phy: improve handling delayed work") the sync parameter isn't needed any longer in phy_start_aneg_priv(). This allows merging phy_start_aneg() and phy_start_aneg_priv(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15  hv_netvsc: fix vf serial matching with pci slot info  (Haiyang Zhang, 1 file, -4/+11)
The VF device's serial number is saved as a string in PCI slot's kobj name, not the slot->number. This patch corrects the netvsc driver, so the VF device can be successfully paired with synthetic NIC. Fixes: 00d7ddba1143 ("hv_netvsc: pair VF based on serial number") Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15  tcp: cdg: use tcp high resolution clock cache  (Eric Dumazet, 1 file, -1/+1)
We store in the tcp socket a cache of the most recent high resolution clock; there is no need to call local_clock() again, since this cache is good enough. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15  tcp_bbr: fix typo in bbr_pacing_margin_percent  (Neal Cardwell, 1 file, -2/+2)
There was a typo in this parameter name. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15  tcp: optimize tcp internal pacing  (Eric Dumazet, 1 file, -15/+16)
When TCP implements its own pacing (when no fq packet scheduler is used), it arms a high resolution timer after a packet is sent. But in many cases (like TCP_RR kind of workloads), this high resolution timer expires before the application attempts to write the following packet. This overhead also happens when the flow is ACK clocked and cwnd limited instead of being limited by the pacing rate, and it leads to extra overhead (a high number of IRQs). Now that tcp_wstamp_ns is reserved for the pacing timer only (after commit "tcp: do not change tcp_wstamp_ns in tcp_mstamp_refresh"), we can set up the timer only when a packet is about to be sent, and only if tcp_wstamp_ns is in the future. This leads to a ~10% performance increase in TCP_RR workloads. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
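The core of the optimization, as a heavily simplified standalone sketch (names invented; the real code arms the socket's pacing hrtimer):

    #include <stdbool.h>
    #include <stdint.h>

    /* Decide at transmit time whether a pacing timer is needed at all:
     * if the pacing release time is not in the future, send immediately
     * and skip the hrtimer (and its IRQ) entirely. */
    static bool must_defer_transmit(uint64_t wstamp_ns, uint64_t now_ns,
                                    uint64_t *delay_ns)
    {
            if (wstamp_ns <= now_ns)
                    return false;               /* no pacing delay needed */
            *delay_ns = wstamp_ns - now_ns;     /* arm the timer for this long */
            return true;
    }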
2018-10-15  net_sched: sch_fq: no longer use skb_is_tcp_pure_ack()  (Eric Dumazet, 1 file, -1/+1)
With the new EDT model, sch_fq no longer has to special case TCP pure acks, since their skb->tstamp will allow them being sent without pacing delay. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15  tcp: mitigate scheduling jitter in EDT pacing model  (Eric Dumazet, 1 file, -6/+13)
In commit fefa569a9d4b ("net_sched: sch_fq: account for schedule/timers drifts") we added a mitigation for scheduling jitter in the fq packet scheduler. This patch does the same in the TCP stack, now that it is using the EDT model. Note that this mitigation is valid for both external (fq packet scheduler) and internal TCP pacing. It uses the same strategy as the above commit, allowing a time credit of half the packet currently sent.

Consider the following case: an skb is sent after an idle period of 300 usec, and its air-time (skb->len/pacing_rate) is 500 usec. Instead of setting the pacing timer to now+500 usec, it will use now+min(500/2, 300) -> now+250usec. This is like having a token bucket with a depth of half an skb.

Tested:

    tc qdisc replace dev eth0 root pfifo_fast

    Before:
    netperf -P0 -H remote -- -q 1000000000    # 8000Mbit
    540000 262144 262144    10.00    7710.43

    After:
    netperf -P0 -H remote -- -q 1000000000    # 8000Mbit
    540000 262144 262144    10.00    7999.75  # Much closer to 8000Mbit target

Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
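One way to read the description above, as a small standalone sketch reproducing the worked example (all values in usec, names invented; the in-kernel code applies the credit to the EDT timestamps directly):

    #include <stdint.h>

    /* The pacing delay for a packet is reduced by a credit of up to half
     * its own air-time, capped by how long the flow was idle. */
    static uint64_t next_pacing_delay_us(uint64_t air_time, uint64_t idle_time)
    {
            uint64_t credit = air_time / 2;

            if (credit > idle_time)
                    credit = idle_time;
            return air_time - credit;   /* example: 500 - min(250, 300) = 250 */
    }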
2018-10-15  net: extend sk_pacing_rate to unsigned long  (Eric Dumazet, 7 files, -32/+40)
sk_pacing_rate has been introduced as a u32 field in 2013, effectively limiting per flow pacing to 34Gbit. We believe it is time to allow TCP to pace high speed flows on 64bit hosts, as we now can reach 100Gbit on one TCP flow. This patch adds no cost for 32bit kernels. The tcpi_pacing_rate and tcpi_max_pacing_rate were already exported as 64bit, so the iproute2/ss command requires no changes. Unfortunately the SO_MAX_PACING_RATE socket option will stay 32bit and we will need to add a new option to let applications control high pacing rates.

    State Recv-Q Send-Q Local Address:Port  Peer Address:Port
    ESTAB 0      1787144 10.246.9.76:49992  10.246.9.77:36741
        timer:(on,003ms,0) ino:91863 sk:2 <->
        skmem:(r0,rb540000,t66440,tb2363904,f605944,w1822984,o0,bl0,d0)
        ts sack bbr wscale:8,8 rto:201 rtt:0.057/0.006 mss:1448 rcvmss:536
        advmss:1448 cwnd:138 ssthresh:178 bytes_acked:256699822585
        segs_out:177279177 segs_in:3916318 data_segs_out:177279175
        bbr:(bw:31276.8Mbps,mrtt:0,pacing_gain:1.25,cwnd_gain:2)
        send 28045.5Mbps lastrcv:73333
        pacing_rate 38705.0Mbps delivery_rate 22997.6Mbps
        busy:73333ms unacked:135 retrans:0/157 rcv_space:14480
        notsent:2085120 minrtt:0.013

Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
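A quick sanity check of the 34Gbit figure (sk_pacing_rate is expressed in bytes per second, so a u32 caps it at 2^32 bytes/sec):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            double max_bits_per_sec = (double)UINT32_MAX * 8.0;

            printf("u32 pacing cap: %.1f Gbit/s\n", max_bits_per_sec / 1e9);
            return 0;   /* prints ~34.4 Gbit/s */
    }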
2018-10-15  tcp: do not change tcp_wstamp_ns in tcp_mstamp_refresh  (Eric Dumazet, 3 files, -4/+8)
In EDT design, I made the mistake of using tcp_wstamp_ns to store the last tcp_clock_ns() sample and to store the pacing virtual timer. This causes major regressions at high speed flows. Introduce tcp_clock_cache to store last tcp_clock_ns(). This is needed because some arches have slow high-resolution kernel time service. tcp_wstamp_ns is only updated when a packet is sent. Note that we can remove tcp_mstamp in the future since tcp_mstamp is essentially tcp_clock_cache/1000, so the apparent socket size increase is temporary. Fixes: 9799ccb0e984 ("tcp: add tcp_wstamp_ns socket field") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15  net: bridge: fix a possible memory leak in __vlan_add  (Li RongQing, 1 file, -0/+4)
After the addition of per-port vlan stats, the vlan stats should be released when we fail to add a vlan. Fixes: 9163a0fc1f0c0 ("net: bridge: add support for per-port vlan stats") CC: bridge@lists.linux-foundation.org cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> CC: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15  rxrpc: Add /proc/net/rxrpc/peers to display peer list  (David Howells, 3 files, -0/+130)
Add /proc/net/rxrpc/peers to display the list of peers currently active. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15  fore200e: fix missing unlock on error in bsq_audit()  (Wei Yongjun, 1 file, -0/+1)
Add the missing unlock before return from function bsq_audit() in the error handling case. Fixes: 1d9d8be91788 ("fore200e: check for dma mapping failures") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15  bnxt_en: Add PCI ID for BCM57508 device.  (Michael Chan, 1 file, -0/+3)
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15  bnxt_en: Add new NAPI poll function for 57500 chips.  (Michael Chan, 2 files, -4/+114)
Add a new poll function that polls for NQ events. If the NQ event is a CQ notification, we locate the CP ring from the cq_handle and call __bnxt_poll_work() to handle RX/TX events on the CP ring. Add a new has_more_work field in struct bnxt_cp_ring_info to indicate budget has been reached. __bnxt_poll_cqs_done() is called to update or ARM the CP rings, depending on whether budget has been reached. If budget has been reached, the next bnxt_poll_p5() call will continue to poll from the CQ rings directly. Otherwise, the NQ will be ARMed for the next IRQ. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15  bnxt_en: Refactor bnxt_poll_work().  (Michael Chan, 2 files, -11/+38)
Separate the CP ring polling logic in bnxt_poll_work() into 2 separate functions __bnxt_poll_work() and __bnxt_poll_work_done(). Since the logic is separated, we need to add tx_pkts and events fields to struct bnxt_napi to keep track of the events to handle between the 2 functions. We also add had_work_done field to struct bnxt_cp_ring_info to indicate whether some work was performed on the CP ring. This is needed to better support the 57500 chips. We need to poll up to 2 separate CP rings before we update or ARM the CP rings on the 57500 chips. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15  bnxt_en: Add coalescing setup for 57500 chips.  (Michael Chan, 1 file, -0/+46)
On legacy chips, the CP ring may be shared between RX and TX and so only setup the RX coalescing parameters in such a case. On 57500 chips, we always have a dedicated CP ring for TX so we can always set up the TX coalescing parameters in bnxt_hwrm_set_coal(). Also, the min_timer coalescing parameter applies to the NQ on the new chips and a separate firmware call needs to be made to set it up. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15  bnxt_en: Use bnxt_cp_ring_info struct pointer as parameter for RX path.  (Michael Chan, 2 files, -43/+45)
In the RX code path, we currently use the bnxt_napi struct pointer to identify the associated RX/CP rings. Change it to use the struct bnxt_cp_ring_info pointer instead since there are now up to 2 CP rings per MSIX. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15  bnxt_en: Add RSS support for 57500 chips.  (Michael Chan, 1 file, -4/+109)
RSS context allocation and RSS indirection table setup are very different on the new chip. Refactor bnxt_setup_vnic() to call 2 different functions to set up RSS for the vnic based on chip type. On the new chip, the number of RSS contexts and the indirection table size depends on the number of RX rings. Each indirection table entry is also different on the new chip since ring groups are no longer used. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15  bnxt_en: Increase RSS context array count and skip ring groups on 57500 chips.  (Michael Chan, 2 files, -10/+22)
On the new 57500 chips, we need to allocate one RSS context for every 64 RX rings. In previous chips, only one RSS context per vnic is required regardless of the number of RX rings. So increase the max RSS context array count to 8. Hardware ring groups are not used on the new chips. Note that the software ring group structure is still maintained in the driver to keep track of the rings associated with the vnic. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
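The arithmetic implied above, as a small illustrative helper (names are not the driver's): one RSS context is needed per 64 RX rings, so an array of 8 contexts covers up to 512 RX rings.

    /* Number of RSS contexts needed on the 57500 (P5) chips. */
    static unsigned int p5_rss_contexts_needed(unsigned int rx_rings)
    {
            return (rx_rings + 63) / 64;   /* one context per 64 RX rings, rounded up */
    }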
2018-10-15  bnxt_en: Allocate/Free CP rings for 57500 series chips.  (Michael Chan, 1 file, -5/+66)
On the new 57500 chips, we allocate/free one CP ring for each RX ring or TX ring separately. Using separate CP rings for RX/TX is an improvement as TX events will no longer be stuck behind RX events. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15  bnxt_en: Modify bnxt_ring_alloc_send_msg() to support 57500 chips.  (Michael Chan, 2 files, -6/+56)
Firmware ring allocation semantics are slightly different for most ring types on 57500 chips. Allocation/deallocation for NQ rings are also added for the new chips. A CP ring handle is also added so that from the NQ interrupt event, we can locate the CP ring. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15  bnxt_en: Add helper functions to get firmware CP ring ID.  (Michael Chan, 1 file, -11/+56)
On the new 57500 chips, getting the CP ring ID associated with an RX ring or TX ring is different than before. On the legacy chips, we find the associated ring group and look up the CP ring ID. On the 57500 chips, each RX ring and TX ring has a dedicated CP ring even if they share the MSIX. Use these helper functions at appropriate places to get the CP ring ID. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15  bnxt_en: Allocate completion ring structures for 57500 series chips.  (Michael Chan, 2 files, -0/+67)
On 57500 chips, the original bnxt_cp_ring_info struct now refers to the NQ. bp->cp_nr_rings refer to the number of NQs on 57500 chips. There are now 2 pointers for the CP rings associated with RX and TX rings. Modify bnxt_alloc_cp_rings() and bnxt_free_cp_rings() accordingly. With multiple CP rings per NAPI, we need to add a pointer in bnxt_cp_ring_info struct to point back to the bnxt_napi struct. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15  bnxt_en: Modify the ring reservation functions for 57500 series chips.  (Michael Chan, 1 file, -30/+97)
The ring reservation functions have to be modified for P5 chips in the following ways:

    - bnxt_cp_ring_info structs map to internal NQs as well as CP rings.
    - Ring groups are not used.
    - 1 CP ring must be available for each RX or TX ring.
    - number of RSS contexts to reserve is multiples of 64 RX rings.
    - RFS currently not supported.

Also, RX AGG rings are only used for jumbo frames, so we need to unconditionally call bnxt_reserve_rings() in __bnxt_open_nic() to see if we need to reserve AGG rings in case MTU has changed. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15  bnxt_en: Adjust MSIX and ring groups for 57500 series chips.  (Michael Chan, 1 file, -1/+8)
Store the maximum MSIX capability in PCIe config. space earlier. When we call firmware to query capability, we need to compare the PCIe MSIX max count with the firmware count and use the smaller one as the MSIX count for 57500 (P5) chips. The new chips don't use ring groups. But previous chips do and the existing logic limits the available rings based on resource calculations including ring groups. Setting the max ring groups to the max rx rings will work on the new chips without changing the existing logic. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15  bnxt_en: Re-structure doorbells.  (Michael Chan, 4 files, -62/+171)
The 57500 series chips have a new 64-bit doorbell format. Use a new bnxt_db_info structure to unify the new and the old 32-bit doorbells. Add a new bnxt_set_db() function to set up the doorbell addresses and doorbell keys ahead of time. Modify and introduce new doorbell helpers to help abstract and unify the old and new doorbells. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15  bnxt_en: Add 57500 new chip ID and basic structures.  (Michael Chan, 2 files, -15/+88)
57500 series is a new chip class (P5) that requires some driver changes in the next several patches. This adds basic chip ID, doorbells, and the notification queue (NQ) structures. Each MSIX is associated with an NQ instead of a CP ring in legacy chips. Each NQ has up to 2 associated CP rings for RX and TX. The same bnxt_cp_ring_info struct will be used for the NQ. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15  bnxt_en: Configure context memory on new devices.  (Michael Chan, 1 file, -3/+120)
Call firmware to configure the DMA addresses of all context memory pages on new devices requiring context memory. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15  bnxt_en: Check context memory requirements from firmware.  (Michael Chan, 2 files, -8/+248)
New device requires host context memory as a backing store. Call firmware to check for context memory requirements and store the parameters. Allocate host pages accordingly. We also need to move the call bnxt_hwrm_queue_qportcfg() earlier so that all the supported hardware queues and the IDs are known before checking and allocating context memory. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>