linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2018-10-08	rtnetlink: Update rtnl_bridge_getlink for strict data checking	David Ahern	1	-13/+57
	Update rtnl_bridge_getlink for strict data checking. If the flag is set, the dump request is expected to have an ifinfomsg struct as the header potentially followed by one or more attributes. Any data passed in the header or as an attribute is taken as a request to influence the data returned. Only values supported by the dump handler are allowed to be non-0 or set in the request. At the moment only the IFLA_EXT_MASK attribute is supported. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	rtnetlink: Update rtnl_dump_ifinfo for strict data checking	David Ahern	1	-30/+83
	Update rtnl_dump_ifinfo for strict data checking. If the flag is set, the dump request is expected to have an ifinfomsg struct as the header potentially followed by one or more attributes. Any data passed in the header or as an attribute is taken as a request to influence the data returned. Only values supported by the dump handler are allowed to be non-0 or set in the request. At the moment only the IFA_TARGET_NETNSID, IFLA_EXT_MASK, IFLA_MASTER, and IFLA_LINKINFO attributes are supported. Existing code does not fail the dump if nlmsg_parse fails. That behavior is kept for non-strict checking. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	net/ipv6: Update inet6_dump_addr for strict data checking	David Ahern	1	-10/+59
	Update inet6_dump_addr for strict data checking. If the flag is set, the dump request is expected to have an ifaddrmsg struct as the header potentially followed by one or more attributes. Any data passed in the header or as an attribute is taken as a request to influence the data returned. Only values suppored by the dump handler are allowed to be non-0 or set in the request. At the moment only the IFA_TARGET_NETNSID attribute is supported. Follow on patches can add support for other fields (e.g., honor ifa_index and only return data for the given device index). Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	net/ipv4: Update inet_dump_ifaddr for strict data checking	David Ahern	1	-11/+61
	Update inet_dump_ifaddr for strict data checking. If the flag is set, the dump request is expected to have an ifaddrmsg struct as the header potentially followed by one or more attributes. Any data passed in the header or as an attribute is taken as a request to influence the data returned. Only values supported by the dump handler are allowed to be non-0 or set in the request. At the moment only the IFA_TARGET_NETNSID attribute is supported. Follow on patches can support for other fields (e.g., honor ifa_index and only return data for the given device index). Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	netlink: Add new socket option to enable strict checking on dumps	David Ahern	4	-1/+23
	Add a new socket option, NETLINK_DUMP_STRICT_CHK, that userspace can use via setsockopt to request strict checking of headers and attributes on dump requests. To get dump features such as kernel side filtering based on data in the header or attributes appended to the dump request, userspace must call setsockopt() for NETLINK_DUMP_STRICT_CHK and a non-zero value. Since the netlink sock and its flags are private to the af_netlink code, the strict checking flag is passed to dump handlers via a flag in the netlink_callback struct. For old userspace on new kernel there is no impact as all of the data checks in later patches are wrapped in a check on the new strict flag. For new userspace on old kernel, the setsockopt will fail and even if new userspace sets data in the headers and appended attributes the kernel will silently ignore it. Moving forward when the setsockopt succeeds, the new userspace on old kernel means the dump request can pass an attribute the kernel does not understand. The dump will then fail as the older kernel does not understand it. New userspace on new kernel setting the socket option gets the benefit of the improved data dump. Kernel side the NETLINK_DUMP_STRICT_CHK uapi is converted to a generic NETLINK_F_STRICT_CHK flag which can potentially be leveraged for tighter checking on the NEW, DEL, and SET commands. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	net/ipv6: Refactor address dump to push inet6_fill_args to in6_dump_addrs	David Ahern	1	-27/+30
	Pull the inet6_fill_args arg up to in6_dump_addrs and move netnsid into it. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	netlink: Add strict version of nlmsg_parse and nla_parse	David Ahern	2	-12/+53
	nla_parse is currently lenient on message parsing, allowing type to be 0 or greater than max expected and only logging a message "netlink: %d bytes leftover after parsing attributes in process `%s'." if the netlink message has unknown data at the end after parsing. What this could mean is that the header at the front of the attributes is actually wrong and the parsing is shifted from what is expected. Add a new strict version that actually fails with EINVAL if there are any bytes remaining after the parsing loop completes, if the atttrbitue type is 0 or greater than max expected. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	net: Add extack to nlmsg_parse	David Ahern	12	-17/+21
	Make sure extack is passed to nlmsg_parse where easy to do so. Most of these are dump handlers and leveraging the extack in the netlink_callback. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	netlink: Add extack message to nlmsg_parse for invalid header length	David Ahern	1	-1/+3
	Give a user a reason why EINVAL is returned in nlmsg_parse. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	netlink: Pass extack to dump handlers	David Ahern	2	-1/+12
	Declare extack in netlink_dump and pass to dump handlers via netlink_callback. Add any extack message after the dump_done_errno allowing error messages to be returned. This will be useful when strict checking is done on dump requests, returning why the dump fails EINVAL. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	isdn/gigaset: mark expected switch fall-throughs	Gustavo A. R. Silva	1	-2/+2
	Replace "--v-- fall through --v--" with a proper "fall through" annotation. Also, change "bad cid: fall through" to "fall through - bad cid". This fix is part of the ongoing efforts to enabling -Wimplicit-fallthrough Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	net: sched: cls_u32: simplify the hell out u32_delete() emptiness check	Al Viro	1	-47/+1
	Now that we have the knode count, we can instantly check if any hnodes are non-empty. And that kills the check for extra references to root hnode - those could happen only if there was a knode to carry such a link. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	net: sched: cls_u32: keep track of knodes count in tc_u_common	Al Viro	1	-0/+6
	allows to simplify u32_delete() considerably Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	net: sched: cls_u32: get rid of tp_c	Al Viro	1	-7/+4
	Both hnode ->tp_c and tp_c argument of u32_set_parms() the latter is redundant, the former - never read... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	net: sched: cls_u32: the tp_c argument of u32_set_parms() is always tp->data	Al Viro	1	-3/+2
	It must be tc_u_common associated with that tp (i.e. tp->data). Proof: * both ->ht_up and ->tp_c are assign-once * ->tp_c of anything inserted into tp_c->hlist is tp_c * hnodes never get reinserted into the lists or moved between those, so anything found by u32_lookup_ht(tp->data, ...) will have ->tp_c equal to tp->data. * tp->root->tp_c == tp->data. * ->ht_up of anything inserted into hnode->ht[...] is equal to hnode. * knodes never get reinserted into hash chains or moved between those, so anything returned by u32_lookup_key(ht, ...) will have ->ht_up equal to ht. * any knode returned by u32_get(tp, ...) will have ->ht_up->tp_c point to tp->data Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	net: sched: cls_u32: pass tc_u_common to u32_set_parms() instead of tc_u_hnode	Al Viro	1	-4/+4
	the only thing we used ht for was ht->tp_c and callers can get that without going through ->tp_c at all; start with lifting that into the callers, next commits will massage those, eventually removing ->tp_c altogether. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	net: sched: cls_u32: clean tc_u_common hashtable	Al Viro	1	-15/+9
	* calculate key once, not for each hash chain element * let tc_u_hash() return the pointer to chain head rather than index - callers are cleaner that way. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	net: sched: cls_u32: get rid of tc_u_common ->rcu	Al Viro	1	-1/+0
	unused Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	net: sched: cls_u32: get rid of tc_u_knode ->tp	Al Viro	1	-3/+0
	not used anymore Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	net: sched: cls_u32: get rid of unused argument of u32_destroy_key()	Al Viro	1	-7/+6
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	net: sched: cls_u32: make sure that divisor is a power of 2	Al Viro	1	-1/+5
	Tested by modifying iproute2 to allow sending a divisor > 255 Tested-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	net: sched: cls_u32: disallow linking to root hnode	Al Viro	1	-0/+4
	Operation makes no sense. Nothing will actually break if we do so (depth limit in u32_classify() will prevent infinite loops), but according to maintainers it's best prohibited outright. NOTE: doing so guarantees that u32_destroy() will trigger the call of u32_destroy_hnode(); we might want to make that unconditional. Test: tc qdisc add dev eth0 ingress tc filter add dev eth0 parent ffff: protocol ip prio 100 u32 \ link 800: offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff should fail with Error: cls_u32: Not linking to root node Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	net: sched: cls_u32: mark root hnode explicitly	Al Viro	1	-1/+3
	... and produce consistent error on attempt to delete such. Existing check in u32_delete() is inconsistent - after tc qdisc add dev eth0 ingress tc filter add dev eth0 parent ffff: protocol ip prio 100 handle 1: u32 \ divisor 1 tc filter add dev eth0 parent ffff: protocol ip prio 200 handle 2: u32 \ divisor 1 both tc filter delete dev eth0 parent ffff: protocol ip prio 100 handle 801: u32 and tc filter delete dev eth0 parent ffff: protocol ip prio 100 handle 800: u32 will fail (at least with refcounting fixes), but the former will complain about an attempt to remove a busy table, while the latter will recognize it as root and yield "Not allowed to delete root node" instead. The problem with the existing check is that several tcf_proto instances might share the same tp->data and handle-to-hnode lookup will be the same for all of them. So comparing an hnode to be deleted with tp->root won't catch the case when one tp is used to try deleting the root of another. Solution is trivial - mark the root hnodes explicitly upon allocation and check for that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	net: phy: mscc: add support for VSC8574 PHY	Quentin Schulz	1	-1/+319
	The VSC8574 PHY is a 4-ports PHY that is 10/100/1000BASE-T, 100BASE-FX, 1000BASE-X and triple-speed copper SFP capable, can communicate with the MAC via SGMII, QSGMII or 1000BASE-X, supports WOL, downshifting and can set the blinking pattern of each of its 4 LEDs, supports SyncE as well as HP Auto-MDIX detection. This adds support for 10/100/1000BASE-T, SGMII/QSGMII link with the MAC, WOL, downshifting, HP Auto-MDIX detection and blinking pattern for its 4 LEDs. The VSC8574 has also an internal Intel 8051 microcontroller whose firmware needs to be patched when the PHY is reset. If the 8051's firmware has the expected CRC, its patching can be skipped. The microcontroller can be accessed from any port of the PHY, though the CRC function can only be done through the PHY that is the base PHY of the package (internal address 0) due to a limitation of the firmware. The GPIO register bank is a set of registers that are common to all PHYs in the package. So any modification in any register of this bank affects all PHYs of the package. If the PHYs haven't been reset before booting the Linux kernel and were configured to use interrupts for e.g. link status updates, it is required to clear the interrupts mask register of all PHYs before being able to use interrupts with any PHY. The first PHY of the package that will be init will take care of clearing all PHYs interrupts mask registers. Thus, we need to keep track of the init sequence in the package, if it's already been done or if it's to be done. Most of the init sequence of a PHY of the package is common to all PHYs in the package, thus we use the SMI broadcast feature which enables us to propagate a write in one register of one PHY to all PHYs in the same package. Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	net: phy: mscc: add support for VSC8584 PHY	Quentin Schulz	1	-0/+747
	The VSC8584 PHY is a 4-ports PHY that is 10/100/1000BASE-T, 100BASE-FX, 1000BASE-X and triple-speed copper SFP capable, can communicate with the MAC via SGMII, QSGMII or 1000BASE-X, supports downshifting and can set the blinking pattern of each of its 4 LEDs, supports hardware offloading of MACsec and supports SyncE as well as HP Auto-MDIX detection. This adds support for 10/100/1000BASE-T, SGMII/QSGMII link with the MAC, downshifting, HP Auto-MDIX detection and blinking pattern for its 4 LEDs. The VSC8584 has also an internal Intel 8051 microcontroller whose firmware needs to be patched when the PHY is reset. If the 8051's firmware has the expected CRC, its patching can be skipped. The microcontroller can be accessed from any port of the PHY, though the CRC function can only be done through the PHY that is the base PHY of the package (internal address 0) due to a limitation of the firmware. The GPIO register bank is a set of registers that are common to all PHYs in the package. So any modification in any register of this bank affects all PHYs of the package. If the PHYs haven't been reset before booting the Linux kernel and were configured to use interrupts for e.g. link status updates, it is required to clear the interrupts mask register of all PHYs before being able to use interrupts with any PHY. The first PHY of the package that will be init will take care of clearing all PHYs interrupts mask registers. Thus, we need to keep track of the init sequence in the package, if it's already been done or if it's to be done. Most of the init sequence of a PHY of the package is common to all PHYs in the package, thus we use the SMI broadcast feature which enables us to propagate a write in one register of one PHY to all PHYs in the same package. The revA of the VSC8584 PHY (which is not and will not be publicly released) should NOT patch the firmware of the microcontroller or it'll make things worse, the easiest way is just to not support it. Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	dt-bindings: net: vsc8531: add two additional LED modes for VSC8584	Quentin Schulz	1	-0/+2
	The VSC8584 (and most likely other PHYs in the same generation) has two additional LED modes that can be picked, so let's add them. Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	net: phy: mscc: remove unneeded temporary variable	Quentin Schulz	1	-8/+4
	Here, the rc variable is either used only for the condition right after the assignment or right before being used as the return value of the function it's being used in. So let's remove this unneeded temporary variable whenever possible. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	net: phy: mscc: shorten `x != 0` condition to `x`	Quentin Schulz	1	-4/+4
	`if (x != 0)` is basically a more verbose version of `if (x)` so let's use the latter so it's consistent throughout the whole driver. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	net: phy: mscc: remove unneeded parenthesis	Quentin Schulz	1	-1/+1
	The == operator precedes the \|\| operator, so we can remove the parenthesis around (a == b) \|\| (c == d). The condition is rather explicit and short so removing the parenthesis definitely does not make it harder to read. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	net: phy: mscc: Add EEE init sequence	Raju Lakkaraju	1	-0/+65
	Microsemi PHYs (VSC 8530/31/40/41) need to update the Energy Efficient Ethernet initialization sequence. In order to avoid certain link state errors that could result in link drops and packet loss, the physical coding sublayer (PCS) must be updated with settings related to EEE in order to improve performance. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	net: phy: mscc: add ethtool statistics counters	Raju Lakkaraju	1	-0/+119
	There are a few counters available in the PHY: receive errors, false carriers, link disconnects, media CRC errors and valids counters. So let's expose those in the PHY driver. Use the priv structure as the next PHY to be supported has a few additional counters. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	net: phy: mscc: migrate to phy_select/restore_page functions	Quentin Schulz	1	-84/+62
	The Microsemi PHYs have multiple banks of registers (called pages). Registers can only be accessed from one page, if we need a register from another page, we need to switch the page and the registers of all other pages are not accessible anymore. Basically, to read register 5 from page 0, 1, 2, etc., you do the same phy_read(phydev, 5); but you need to set the desired page beforehand. In order to guarantee that two concurrent functions do not change the page, we need to do some locking per page. This can be achieved with the use of phy_select_page and phy_restore_page functions but phy_write/read calls in-between those two functions shall be replaced by their lock-free alternative __phy_write/read. Let's migrate this driver to those functions. Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	net: dpaa2: fix and improve dpaa2-ptp driver	Yangbo Lu	1	-16/+9
	This patch is to fix and improve dpaa2-ptp driver in some places. - Fixed the return for some functions. - Replaced kzalloc with devm_kzalloc. - Removed dev_set_drvdata(dev, NULL). - Made ptp_dpaa2_caps const. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	net: dpaa2: remove unused code for dprtc	Yangbo Lu	3	-723/+0
	This patch is to removed unused code for dprtc. This code will be re-added along with more features of dpaa2-ptp added. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	net: dpaa2: rename rtc as ptp in dpaa2-ptp driver	Yangbo Lu	1	-15/+15
	In dpaa2-ptp driver, it's odd to use rtc in names of some functions and structures except these dprtc APIs. This patch is to use ptp instead of rtc in names. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	net: dpaa2: fix dependency of config FSL_DPAA2_ETH	Yangbo Lu	1	-1/+1
	The NETDEVICES dependency and ETHERNET dependency hadn't been required since dpaa2-eth was moved out of staging. Also allowed COMPILE_TEST for dpaa2-eth. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	MAINTAINERS: update files maintained under DPAA2 PTP/ETHERNET	Yangbo Lu	1	-3/+8
	The files maintained under DPAA2 PTP/ETHERNET needs to be updated since dpaa2 ptp driver had been moved into drivers/net/ethernet/freescale/dpaa2/. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	net: dpaa2: move DPAA2 PTP driver out of staging/	Yangbo Lu	11	-27/+23
	This patch is to move DPAA2 PTP driver out of staging/ since the dpaa2-eth had been moved out. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08	bpf: allow offload of programs with BPF-to-BPF function calls	Quentin Monnet	1	-7/+3
	Now that there is at least one driver supporting BPF-to-BPF function calls, lift the restriction, in the verifier, on hardware offload of eBPF programs containing such calls. But prevent jit_subprogs(), still in the verifier, from being run for offloaded programs. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08	nfp: bpf: support pointers to other stack frames for BPF-to-BPF calls	Quentin Monnet	3	-1/+6
	Mark instructions that use pointers to areas in the stack outside of the current stack frame, and process them accordingly in mem_op_stack(). This way, we also support BPF-to-BPF calls where the caller passes a pointer to data in its own stack frame to the callee (typically, when the caller passes an address to one of its local variables located in the stack, as an argument). Thanks to Jakub and Jiong for figuring out how to deal with this case, I just had to turn their email discussion into this patch. Suggested-by: Jiong Wang <jiong.wang@netronome.com> Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08	nfp: bpf: optimise save/restore for R6~R9 based on register usage	Quentin Monnet	3	-23/+78
	When pre-processing the instructions, it is trivial to detect what subprograms are using R6, R7, R8 or R9 as destination registers. If a subprogram uses none of those, then we do not need to jump to the subroutines dedicated to saving and restoring callee-saved registers in its prologue and epilogue. This patch introduces detection of callee-saved registers in subprograms and prevents the JIT from adding calls to those subroutines whenever we can: we save some instructions in the translated program, and some time at runtime on BPF-to-BPF calls and returns. If no subprogram needs to save those registers, we can avoid appending the subroutines at the end of the program. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08	nfp: bpf: fix return address from register-saving subroutine to callee	Quentin Monnet	1	-1/+27
	On performing a BPF-to-BPF call, we first jump to a subroutine that pushes callee-saved registers (R6~R9) to the stack, and from there we goes to the start of the callee next. In order to do so, the caller must pass to the subroutine the address of the NFP instruction to jump to at the end of that subroutine. This cannot be reliably implemented when translated the caller, as we do not always know the start offset of the callee yet. This patch implement the required fixup step for passing the start offset in the callee via the register used by the subroutine to hold its return address. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08	nfp: bpf: update fixup function for BPF-to-BPF calls support	Quentin Monnet	2	-3/+24
	Relocation for targets of BPF-to-BPF calls are required at the end of translation. Update the nfp_fixup_branches() function in that regard. When checking that the last instruction of each bloc is a branch, we must account for the length of the instructions required to pop the return address from the stack. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08	nfp: bpf: account for additional stack usage when checking stack limit	Quentin Monnet	2	-8/+68
	Offloaded programs using BPF-to-BPF calls use the stack to store the return address when calling into a subprogram. Callees also need some space to save eBPF registers R6 to R9. And contrarily to kernel verifier, we align stack frames on 64 bytes (and not 32). Account for all this when checking the stack size limit before JIT-ing the program. This means we have to recompute maximum stack usage for the program, we cannot get the value from the kernel. In addition to adapting the checks on stack usage, move them to the finalize() callback, now that we have it and because such checks are part of the verification step rather than translation. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08	nfp: bpf: add main logics for BPF-to-BPF calls support in nfp driver	Quentin Monnet	5	-4/+295
	This is the main patch for the logics of BPF-to-BPF calls in the nfp driver. The functions called on BPF_JUMP \| BPF_CALL and BPF_JUMP \| BPF_EXIT were used to call helpers and exit from the program, respectively; make them usable for calling into, or returning from, a BPF subprogram as well. For all calls, push the return address as well as the callee-saved registers (R6 to R9) to the stack, and pop them upon returning from the calls. In order to limit the overhead in terms of instruction number, this is done through dedicated subroutines. Jumping to the callee actually consists in jumping to the subroutine, that "returns" to the callee: this will require some fixup for passing the address in a later patch. Similarly, returning consists in jumping to the subroutine, which pops registers and then return directly to the caller (but no fixup is needed here). Return to the caller is performed with the RTN instruction newly added to the JIT. For the few steps where we need to know what subprogram an instruction belongs to, the struct nfp_insn_meta is extended with a new subprog_idx field. Note that checks on the available stack size, to take into account the additional requirements associated to BPF-to-BPF calls (storing R6-R9 and return addresses), are added in a later patch. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08	nfp: bpf: account for BPF-to-BPF calls when preparing nfp JIT	Quentin Monnet	2	-11/+27
	Similarly to "exit" or "helper call" instructions, BPF-to-BPF calls will require additional processing before translation starts, in order to record and mark jump destinations. We also mark the instructions where each subprogram begins. This will be used in a following commit to determine where to add prologues for subprograms. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08	nfp: bpf: ignore helper-related checks for BPF calls in nfp verifier	Quentin Monnet	2	-4/+13
	The checks related to eBPF helper calls are performed each time the nfp driver meets a BPF_JUMP \| BPF_CALL instruction. However, these checks are not relevant for BPF-to-BPF call (same instruction code, different value in source register), so just skip the checks for such calls. While at it, rename the function that runs those checks to make it clear they apply to _helper_ calls only. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08	nfp: bpf: copy eBPF subprograms information from kernel verifier	Quentin Monnet	3	-0/+29
	In order to support BPF-to-BPF calls in offloaded programs, the nfp driver must collect information about the distinct subprograms: namely, the number of subprograms composing the complete program and the stack depth of those subprograms. The latter in particular is non-trivial to collect, so we copy those elements from the kernel verifier via the newly added post-verification hook. The struct nfp_prog is extended to store this information. Stack depths are stored in an array of dedicated structs. Subprogram start indexes are not collected. Instead, meta instructions associated to the start of a subprogram will be marked with a flag in a later patch. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08	nfp: bpf: rename nfp_prog->stack_depth as nfp_prog->stack_frame_depth	Quentin Monnet	3	-8/+8
	In preparation for support for BPF to BPF calls in offloaded programs, rename the "stack_depth" field of the struct nfp_prog as "stack_frame_depth". This is to make it clear that the field refers to the maximum size of the current stack frame (as opposed to the maximum size of the whole stack memory). Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08	bpf: add verifier callback to get stack usage info for offloaded progs	Quentin Monnet	6	-2/+37
	In preparation for BPF-to-BPF calls in offloaded programs, add a new function attribute to the struct bpf_prog_offload_ops so that drivers supporting eBPF offload can hook at the end of program verification, and potentially extract information collected by the verifier. Implement a minimal callback (returning 0) in the drivers providing the structs, namely netdevsim and nfp. This will be useful in the nfp driver, in later commits, to extract the number of subprograms as well as the stack depth for those subprograms. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>