linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2018-05-23	amd-xgbe: Advertise FEC support with the KR re-driver	Tom Lendacky	1	-0/+4
	When a KR re-driver is present, indicate the FEC support is available during auto-negotiation. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	amd-xgbe: Always attempt link training in KR mode	Tom Lendacky	1	-53/+16
	Link training is always attempted when in KR mode, but the code is structured to check if link training has been enabled before attempting to perform it. Since that check will always be true, simplify the code to always enable and start link training during KR auto-negotiation. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	amd-xgbe: Add ethtool show/set channels support	Tom Lendacky	3	-0/+163
	Add ethtool support to show and set the device channel configuration. Changing the channel configuration will result in a device restart. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	amd-xgbe: Prepare for ethtool set-channel support	Tom Lendacky	2	-60/+68
	In order to support being able to dynamically set/change the number of Rx and Tx channels, update the code to: - Move alloc and free of device memory into callable functions - Move setting of the real number of Rx and Tx channels to device startup - Move mapping of the RSS channels to device startup Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	amd-xgbe: Add ethtool show/set ring parameter support	Tom Lendacky	3	-5/+72
	Add ethtool support to show and set the number of the Rx and Tx ring descriptors. Changing the ring configuration will result in a device restart. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	amd-xgbe: Add ethtool support to retrieve SFP module info	Tom Lendacky	4	-0/+189
	Add support to get SFP module information using ethtool. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	amd-xgbe: Remove field that indicates SFP diagnostic support	Tom Lendacky	1	-9/+0
	The driver currently sets an indication of whether the SFP supports, and that the driver can obtain, diagnostics data. This isn't currently used by the driver and the logic to set this indicator is flawed because the field is cleared each time the SFP is checked and only set when a new SFP is detected. Remove this field and the logic supporting it. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	amd-xgbe: Remove use of comm_owned field	Tom Lendacky	1	-16/+0
	The comm_owned field can hide logic where double locking is attempted and prevent multiple threads for the same device from accessing the mutex properly. Remove the comm_owned field and use the mutex API exclusively for gaining ownership. The current driver has been audited and is obtaining communications ownership properly. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	amd-xgbe: Read and save the port property registers during probe	Tom Lendacky	3	-47/+62
	Read and save the port property registers once during the device probe and then use the saved values as they are needed. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	amd-xgbe: Fix debug output of max channel counts	Tom Lendacky	1	-1/+1
	A debug output print statement uses the wrong variable to output the maximum Rx channel count (cut and paste error, basically). Fix the statement to use the proper variable. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	net/smc: longer delay when freeing client link groups	Ursula Braun	1	-1/+1
	Client link group creation always follows the server linkgroup creation. If peer creates a new server link group, client has to create a new client link group. If peer reuses a server link group for a new connection, client has to reuse its client link group as well. To avoid out-of-sync conditions for link groups a longer delay for for client link group removal is defined to make sure this link group still exists, once the peer decides to reuse a server link group. Currently the client link group delay time is just 10 jiffies larger than the server link group delay time. This patch increases the delay difference to 10 seconds to have a better protection against out-of-sync link groups. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	net/smc: urgent data support	Stefan Raspl	8	-36/+238
	Add support for out of band data send and receive. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	net/smc: lock smc_lgr_list in port_terminate()	Hans Wippel	1	-3/+13
	Currently, smc_port_terminate() is not holding the lock of the lgr list while it is traversing the list. This patch adds locking to this function and changes smc_lgr_terminate() accordingly. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	net/smc: return 0 for ioctl calls in states INIT and CLOSED	Ursula Braun	1	-3/+15
	A connected SMC-socket contains addresses of descriptors for the send buffer and the rmb (receive buffer). Fields of these descriptors are used to determine the answer for certain ioctl requests. Add extra handling for unconnected SMC socket states without valid buffer descriptor addresses. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Reported-by: syzbot+e6714328fda813fc670f@syzkaller.appspotmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	cxgb4: do L1 config when module is inserted	Ganesh Goudar	3	-12/+65
	trigger an L1 configure operation when a transceiver module is inserted in order to cause current "sticky" options like Requested Forward Error Correction to be reapplied. Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	cxgb4: change the port capability bits definition	Ganesh Goudar	4	-8/+8
	MDI Port Capabilities bit definitions were inconsistent with regard to the MDI enum values. 2 bits used to define MDI in the port capabilities are not really separable, it's a 2-bit field with 4 different values. Change the port capability bit definitions to be "AUTO" and "STRAIGHT" in order to get them to line up with the enum's. Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	net: dsa: qca8k: Remove redundant parentheses	Michal Vokáč	1	-1/+1
	Fix warning reported by checkpatch. Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	net: dsa: qca8k: Replace GPL boilerplate by SPDX	Michal Vokáč	1	-9/+1
	Replace the GPLv2 license boilerplate with the SPDX license identifier. Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	net: dsa: qca8k: Allow overwriting CPU port setting	Michal Vokáč	2	-0/+44
	Implement adjust_link function that allows to overwrite default CPU port setting using fixed-link device tree subnode. Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	net: dsa: qca8k: Force CPU port to its highest bandwidth	Michal Vokáč	2	-3/+9
	By default autonegotiation is enabled to configure MAC on all ports. For the CPU port autonegotiation can not be used so we need to set some sensible defaults manually. This patch forces the default setting of the CPU port to 1000Mbps/full duplex which is the chip maximum capability. Also correct size of the bit field used to configure link speed. Fixes: 6b93fb46480a ("net-next: dsa: add new driver for qca8xxx family") Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	net: dsa: qca8k: Enable RXMAC when bringing up a port	Michal Vokáč	1	-1/+1
	When a port is brought up/down do not enable/disable only the TXMAC but the RXMAC as well. This is essential for the CPU port to work. Fixes: 6b93fb46480a ("net-next: dsa: add new driver for qca8xxx family") Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	net: dsa: qca8k: Add support for QCA8334 switch	Michal Vokáč	1	-0/+1
	Add support for the four-port variant of the Qualcomm QCA833x switch. Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	net: dsa: qca8k: Add QCA8334 binding documentation	Michal Vokáč	1	-1/+22
	Add support for the four-port variant of the Qualcomm QCA833x switch. The CPU port default link settings can be reconfigured using a fixed-link sub-node. Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	cxgb4: Add new T6 device ids	Ganesh Goudar	1	-0/+2
	Add 0x6088 and 0x6089 device ids for new T6 cards. Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	selftests: uevent filtering	Christian Brauner	3	-0/+505
	Recent discussions around uevent filtering (cf. net-next commit [1], [2], and [3] and discussions in [4], [5], and [6]) have shown that the semantics around uevent filtering where not well understood. Now that we have settled - at least for the moment - how uevent filtering should look like let's add some selftests to ensure we don't regress anything in the future. Note, the semantics of uevent filtering are described in detail in my commit message to [2] so I won't repeat them here. [1]: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=90d52d4fd82007005125d9a8d2d560a1ca059b9d [2]: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=a3498436b3a0f8ec289e6847e1de40b4123e1639 [3]: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=26045a7b14bc7a5455e411d820110f66557d6589 [4]: https://lkml.org/lkml/2018/4/4/739 [5]: https://lkml.org/lkml/2018/4/26/767 [6]: https://lkml.org/lkml/2018/4/26/738 Signed-off-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	selftests: net: initial fib rule tests	Roopa Prabhu	2	-1/+249
	This adds a first set of tests for fib rule match/action for ipv4 and ipv6. Initial tests only cover action lookup table. can be extended to cover other actions in the future. Uses ip route get to validate the rule lookup. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	ipv6: support sport, dport and ip_proto in RTM_GETROUTE	Roopa Prabhu	1	-0/+17
	This is a followup to fib6 rules sport, dport and ipproto match support. Only supports tcp, udp and icmp for ipproto. Used by fib rule self tests. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	ipv4: support sport, dport and ip_proto in RTM_GETROUTE	Roopa Prabhu	6	-40/+140
	This is a followup to fib rules sport, dport and ipproto match support. Only supports tcp, udp and icmp for ipproto. Used by fib rule self tests. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	hv_netvsc: Add handlers for ethtool get/set msg level	Haiyang Zhang	1	-0/+16
	The handlers for ethtool get/set msg level are missing from netvsc. This patch adds them. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	net: vxge: fix spelling mistake in macro VXGE_HW_ERR_PRIVILAGED_OPEARATION	Colin Ian King	4	-10/+10
	Rename VXGE_HW_ERR_PRIVILAGED_OPEARATION to VXGE_HW_ERR_PRIVILEGED_OPERATION to fix spelling mistake. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	gso: limit udp gso to egress-only virtual devices	Willem de Bruijn	3	-5/+6
	Until the udp receive stack supports large packets (UDP GRO), GSO packets must not loop from the egress to the ingress path. Revert the change that added NETIF_F_GSO_UDP_L4 to various virtual devices through NETIF_F_GSO_ENCAP_ALL as this included devices that may loop packets, such as veth and macvlan. Instead add it to specific devices that forward to another device's egress path, bonding and team. Fixes: 83aa025f535f ("udp: add gso support to virtual devices") CC: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	udp: exclude gso from xfrm paths	Willem de Bruijn	2	-2/+4
	UDP GSO delays final datagram construction to the GSO layer. This conflicts with protocol transformations. Fixes: bec1f6f69736 ("udp: generate gso with UDP_SEGMENT") CC: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	Documentation/bindings: net: the sfp i2c-bus property is now mandatory	Antoine Tenart	1	-2/+2
	The i2c-bus property for sfp modules was made mandatory. Update the documentation to keep it in sync with the driver's behaviour. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	net: phy: sfp: make the i2c-bus dt property mandatory	Antoine Tenart	1	-13/+15
	This patch makes the i2c-bus property mandatory when using a device tree. If the sfp i2c bus isn't described it's impossible to guess the protocol to use for a given module, and the sfp module would then not work in most cases. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	net: phy: sfp: warn the user when no tx_disable pin is available	Antoine Tenart	1	-0/+9
	In case no Tx disable pin is available the SFP modules will always be emitting. This could be an issue when using modules using laser as their light source as we would have no way to disable it when the fiber is removed. This patch adds a warning when registering an SFP cage which do not have its tx_disable pin wired or available. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	nfp: assign vNIC id as phys_port_name of vNICs which are not ports	Jakub Kicinski	5	-7/+30
	When NFP is modelled as a switch we assign phys_port_name to respective port(representor )s: vNIC0 - \| - PF port (pf%d) MAC/PHY (p%d[s%d]) - \|E== In most cases there is only one vNIC for communication with the switch. If there is more than one we need to be able to identify them. Use %d as phys_port_name of the vNICs. We don't have to pass ID to nfp_net_debugfs_vnic_add() separately any more. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	nfp: use split in naming of PCIe PF ports	Jakub Kicinski	3	-1/+11
	PCI PFs can host more than one logical endpoint. In NFP terms this means having more than one vNIC for PCIe PF. The vNICs are usually corresponding 1:1 to Ethernet ports. In core NIC we use the legacy idea of vNIC being the Ethernet port, hence netdevs put pX(sY) in their phys_port_name, like Ethernet ports would. When ASIC ports are fully represented we need to be able to name different PCIe PF ports, too. Use a scheme similar to Ethernet ports - pfXsY, for PCIe PF number X, sub-port Y. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	nfp: abm: force Ethternet port up	Jakub Kicinski	3	-0/+18
	Current control firmware does not cater too well to multi-host applications. There is no way to check which hosts are up or otherwise negotiate what the state of the external port (the Ethernet port) should be. Make sure the link is up when driver loads, and don't take it down when Ethernet port netdev is closed. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	nfp: abm: spawn port netdevs	Jakub Kicinski	4	-20/+225
	To configure buffering points we need full set of netdevs: ASIC user netdev -- \| -- PCIe port MAC port -- \| -- Configuring egrees qdiscs on user netdev configures standard Linux TC software qdiscs, configuring PCIe port qdiscs will provide a way of setting ASIC queuing parameters for PCIe block. MAC port netdev egress qdiscs correspond to ASIC MAC Traffic Manager block. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	nfp: add devlink_eswitch_mode_set callback	Jakub Kicinski	2	-0/+22
	Our previous apps all assumed to use only one eswitch mode (legacy or switchdev) without the ability to change it. ABM NIC will want to support the switch so plumb devlink_eswitch_mode_set through. The devlink_eswitch_mode_set is expected to spawn representors and potentially devlink ports so it's called under big devlink lock and pf->lock. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	devlink: don't take instance lock around eswitch mode set	Jakub Kicinski	1	-1/+2
	Changing switch mode may want to register and unregister devlink ports. Therefore similarly to DEVLINK_CMD_PORT_SPLIT/UNSPLIT it should not take the instance lock. Drivers don't depend on existing locking since it's a very recent addition. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	nfp: add app pointer to port representors	Jakub Kicinski	1	-0/+2
	nfp_apps can currently associate their structures with vNICs but not representors. Add app priv pointer to representors as well. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	nfp: abm: create project-specific vNIC structure	Jakub Kicinski	6	-4/+186
	ABM NIC requires more complex vNIC handling, allocate per-vNIC structure. Find out RX queue base and PCI PF id. There will be multiple PFs sharing the same MAC port, therefore the MAC address assigned to the vNIC must be looked up in the HWInfo database. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	nfp: abm: add initial active buffer management NIC skeleton	Jakub Kicinski	6	-0/+155
	Add a very rudimentary active buffer management NIC support. For now it's like a core NIC without SR-IOV support. Next commits will extend its functionality. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	nfp: core: allow 4-byte aligned accesses to Memory Units	Jakub Kicinski	1	-50/+44
	Current code doesn't enforce length requirements on 32bit accesses with action NFP_CPP_ACTION_RW to memory units, but if the access is only aligned to 4 bytes as well we will fall into the explicit access case and error out. Such accesses are correct, allow them by lowering the width earlier. While at it use a switch statement to improve readability. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	nfp: add shared buffer configuration	Jakub Kicinski	6	-1/+294
	Allow app FW to advertise its shared buffer pool information. Use the per-PF mailbox to configure them from devlink. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	nfp: add support for per-PCI PF mailbox	Jakub Kicinski	3	-0/+173
	When working with devlink-related functionality for locking reasons it's easier to create a new mailbox per-PCI PF device than try to use one of the netdev/vNIC mailboxes. Define new mailbox structure and resolve its symbol during probe. For forward compatibility allow silent truncation of mailbox command data. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	nfp: move rtsym helpers to pf code	Jakub Kicinski	3	-48/+51
	nfp_net_pf_rtsym_read_optional() and nfp_net_pf_map_rtsym() are not really related to networking code. Move them to the PF code and remove the net from their names. They will soon be needed by code outside of nfp_net_main.c anyway. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	net: add skeleton of bpfilter kernel module	Alexei Starovoitov	13	-0/+339
	bpfilter.ko consists of bpfilter_kern.c (normal kernel module code) and user mode helper code that is embedded into bpfilter.ko The steps to build bpfilter.ko are the following: - main.c is compiled by HOSTCC into the bpfilter_umh elf executable file - with quite a bit of objcopy and Makefile magic the bpfilter_umh elf file is converted into bpfilter_umh.o object file with _binary_net_bpfilter_bpfilter_umh_start and _end symbols Example: $ nm ./bld_x64/net/bpfilter/bpfilter_umh.o 0000000000004cf8 T _binary_net_bpfilter_bpfilter_umh_end 0000000000004cf8 A _binary_net_bpfilter_bpfilter_umh_size 0000000000000000 T _binary_net_bpfilter_bpfilter_umh_start - bpfilter_umh.o and bpfilter_kern.o are linked together into bpfilter.ko bpfilter_kern.c is a normal kernel module code that calls the fork_usermode_blob() helper to execute part of its own data as a user mode process. Notice that _binary_net_bpfilter_bpfilter_umh_start - end is placed into .init.rodata section, so it's freed as soon as __init function of bpfilter.ko is finished. As part of __init the bpfilter.ko does first request/reply action via two unix pipe provided by fork_usermode_blob() helper to make sure that umh is healthy. If not it will kill it via pid. Later bpfilter_process_sockopt() will be called from bpfilter hooks in get/setsockopt() to pass iptable commands into umh via bpfilter.ko If admin does 'rmmod bpfilter' the __exit code bpfilter.ko will kill umh as well. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23	umh: introduce fork_usermode_blob() helper	Alexei Starovoitov	4	-12/+164
	Introduce helper: int fork_usermode_blob(void data, size_t len, struct umh_info info); struct umh_info { struct file pipe_to_umh; struct file pipe_from_umh; pid_t pid; }; that GPLed kernel modules (signed or unsigned) can use it to execute part of its own data as swappable user mode process. The kernel will do: - allocate a unique file in tmpfs - populate that file with [data, data + len] bytes - user-mode-helper code will do_execve that file and, before the process starts, the kernel will create two unix pipes for bidirectional communication between kernel module and umh - close tmpfs file, effectively deleting it - the fork_usermode_blob will return zero on success and populate 'struct umh_info' with two unix pipes and the pid of the user process As the first step in the development of the bpfilter project the fork_usermode_blob() helper is introduced to allow user mode code to be invoked from a kernel module. The idea is that user mode code plus normal kernel module code are built as part of the kernel build and installed as traditional kernel module into distro specified location, such that from a distribution point of view, there is no difference between regular kernel modules and kernel modules + umh code. Such modules can be signed, modprobed, rmmod, etc. The use of this new helper by a kernel module doesn't make it any special from kernel and user space tooling point of view. Such approach enables kernel to delegate functionality traditionally done by the kernel modules into the user space processes (either root or !root) and reduces security attack surface of the new code. The buggy umh code would crash the user process, but not the kernel. Another advantage is that umh code of the kernel module can be debugged and tested out of user space (e.g. opening the possibility to run clang sanitizers, fuzzers or user space test suites on the umh code). In case of the bpfilter project such architecture allows complex control plane to be done in the user space while bpf based data plane stays in the kernel. Since umh can crash, can be oom-ed by the kernel, killed by the admin, the kernel module that uses them (like bpfilter) needs to manage life time of umh on its own via two unix pipes and the pid of umh. The exit code of such kernel module should kill the umh it started, so that rmmod of the kernel module will cleanup the corresponding umh. Just like if the kernel module does kmalloc() it should kfree() it in the exit code. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>