wireguard-linux - WireGuard for the Linux kernel

Age	Commit message (Collapse)	Author	Files	Lines
2020-04-01	x86: get rid of 'rtype' argument to __get_user_asm() macro	Linus Torvalds	1	-11/+17
	This is the exact same thing as 3680785692fb ("x86: get rid of 'rtype' argument to __put_user_goto() macro") except it's about __get_user_asm() rather than __put_user_goto(). The reasons are the same: having the low-level asm access the argument with a different size than the compiler thinks it does is fundamentally wrong. But unlike the __put_user_goto() case, we actually did tell the compiler that we used a bigger variable (either long or long long), and then only filled in the low bits, and ended up "fixing" this by casting the result to the proper pointer type. That's because we needed to use a non-qualified type (the user pointer might be a const pointer!), and that makes this a bit more painful. Our '__inttype()' macro used to be lazy and only differentiate between "fits in a register" or "needs two registers". So this fix had to also make that '__inttype()' macro more precise. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-01	x86: get rid of 'rtype' argument to __put_user_goto() macro	Linus Torvalds	1	-7/+7
	The 'rtype' argument goes back to pre-git (and pre-BK) times, and comes from the fact that we used to not necessarily have the same type sizes for the arguments of the inline asm as we did for the actual accesses we did. So 'rtype' is the 'register type' - the override of the register size in the inline asm when it doesn't match the actual size of the variable we use as the output argument (for when you used "put_user()" on an "int" value that was assigned to a byte-sized user space access etc). That mismatch doesn't actually exist any more, and should probably never have existed in the first place. It's a horrid bug just waiting to happen (using more - or less - of the variable that the compiler expected us to use). I think we had some odd casting going on to hide the effects of that oddity after-the-fact, but those are long gone, and these days we should always have the right size value in the first place, using things like __typeof__(*(ptr)) __pu_val = (x); and gcc should thus have the right register size without any manual 'rtype' games. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-31	x86: get rid of 'errret' argument to __get_user_xyz() macross	Linus Torvalds	1	-15/+15
	Every remaining user just has the error case returning -EFAULT. In fact, the exception was __get_user_asm_nozero(), which was removed in commit 4b842e4e25b1 ("x86: get rid of small constant size cases in raw_copy_{to,from}_user()"), and the other __get_user_xyz() macros just followed suit for consistency. Fix up some macro whitespace while at it. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-31	x86: remove __put_user_asm() infrastructure	Linus Torvalds	1	-11/+0
	The last user was removed by commit 4b842e4e25b1 ("x86: get rid of small constant size cases in raw_copy_{to,from}_user()"). Get rid of the left-overs before somebody tries to use it again. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-31	net: ipv6: rpl_iptunnel: Fix potential memory leak in rpl_do_srh_inline	Gustavo A. R. Silva	1	-1/+3
	In case memory resources for buf were allocated, release them before return. Addresses-Coverity-ID: 1492011 ("Resource leak") Fixes: a7a29f9c361f ("net: ipv6: add rpl sr tunnel") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-31	cxgb4/chcr: nic-tls stats in ethtool	Rohit Maheshwari	1	-0/+18
	Included nic tls statistics in ethtool stats. Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-31	net: dsa: fix oops while probing Marvell DSA switches	Russell King	1	-1/+2
	Fix an oops in dsa_port_phylink_mac_change() caused by a combination of a20f997010c4 ("net: dsa: Don't instantiate phylink for CPU/DSA ports unless needed") and the net-dsa-improve-serdes-integration series of patches 65b7a2c8e369 ("Merge branch 'net-dsa-improve-serdes-integration'"). Unable to handle kernel NULL pointer dereference at virtual address 00000124 pgd = c0004000 [00000124] *pgd=00000000 Internal error: Oops: 805 [#1] SMP ARM Modules linked in: tag_edsa spi_nor mtd xhci_plat_hcd mv88e6xxx(+) xhci_hcd armada_thermal marvell_cesa dsa_core ehci_orion libdes phy_armada38x_comphy at24 mcp3021 sfp evbug spi_orion sff mdio_i2c CPU: 1 PID: 214 Comm: irq/55-mv88e6xx Not tainted 5.6.0+ #470 Hardware name: Marvell Armada 380/385 (Device Tree) PC is at phylink_mac_change+0x10/0x88 LR is at mv88e6352_serdes_irq_status+0x74/0x94 [mv88e6xxx] Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-31	net/bpfilter: remove superfluous testing message	Bruno Meneguele	1	-1/+0
	A testing message was brought by 13d0f7b814d9 ("net/bpfilter: fix dprintf usage for /dev/kmsg") but should've been deleted before patch submission. Although it doesn't cause any harm to the code or functionality itself, it's totally unpleasant to have it displayed on every loop iteration with no real use case. Thus remove it unconditionally. Fixes: 13d0f7b814d9 ("net/bpfilter: fix dprintf usage for /dev/kmsg") Signed-off-by: Bruno Meneguele <bmeneg@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-31	net: macb: Fix handling of fixed-link node	Codrin Ciubotariu	1	-0/+3
	fixed-link nodes are treated as PHY nodes by of_mdiobus_child_is_phy(). We must check if the interface is a fixed-link before looking up for PHY nodes. Fixes: 7897b071ac3b ("net: macb: convert to phylink") Tested-by: Cristian Birsan <cristian.birsan@microchip.com> Signed-off-by: Codrin Ciubotariu <codrin.ciubotariu@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-31	net: dsa: ksz: Select KSZ protocol tag	Codrin Ciubotariu	1	-0/+1
	KSZ protocol tag is needed by the KSZ DSA drivers. Fixes: 0b9f9dfbfab4 ("dsa: Allow tag drivers to be built as modules") Tested-by: Cristian Birsan <cristian.birsan@microchip.com> Signed-off-by: Codrin Ciubotariu <codrin.ciubotariu@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-01	Makefile: Update kselftest help information	Shuah Khan	1	-6/+9
	Update kselftest help information. Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-03-30	netdevsim: dev: Fix memory leak in nsim_dev_take_snapshot_write	Gustavo A. R. Silva	1	-0/+1
	In case memory resources for dummy_data were allocated, release them before return. Addresses-Coverity-ID: 1491997 ("Resource leak") Fixes: 7ef19d3b1d5e ("devlink: report error once U32_MAX snapshot ids have been used") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30	net: stmmac: add EHL 2.5Gbps PCI info and PCI ID	Voon Weifeng	1	-8/+16
	Add EHL SGMII 2.5Gbps PCI info and PCI ID Signed-off-by: Voon Weifeng <weifeng.voon@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30	net: stmmac: add EHL PSE0 & PSE1 1Gbps PCI info and PCI ID	Voon Weifeng	1	-0/+75
	Add EHL PSE0/1 RGMII & SGMII 1Gbps PCI info and PCI ID Signed-off-by: Voon Weifeng <weifeng.voon@intel.com> Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30	net: stmmac: create dwmac-intel.c to contain all Intel platform	Voon Weifeng	4	-313/+519
	As stmmac_pci.c file is getting bigger and more complex, it is reasonable to separate all the Intel specific dwmac pci device to a different file. This move includes Intel Quark, TGL and EHL. A new kernel config CONFIG_DWMAC_INTEL is introduced and depends on X86. For this initial patch, all the necessary function such as probe() and exit() are identical besides the function name. Signed-off-by: Voon Weifeng <weifeng.voon@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30	net: dsa: bcm_sf2: Support specifying VLAN tag egress rule	Florian Fainelli	1	-2/+38
	The port to which the ASP is connected on 7278 is not capable of processing VLAN tags as part of the Ethernet frame, so allow an user to configure the egress VLAN policy they want to see applied by purposing the h_ext.data[1] field. Bit 0 is used to indicate that 0=tagged, 1=untagged. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30	net: dsa: bcm_sf2: Add support for matching VLAN TCI	Florian Fainelli	1	-15/+38
	Update relevant code paths to support the programming and matching of VLAN TCI, this is the only member of the ethtool_flow_ext that we can match, the switch does not permit matching the VLAN Ethernet Type field. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30	net: dsa: bcm_sf2: Move writing of CFP_DATA(5) into slicing functions	Florian Fainelli	1	-32/+32
	In preparation for matching VLANs, move the writing of CFP_DATA(5) into the IPv4 and IPv6 slicing logic since they are part of the per-flow configuration. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30	net: dsa: bcm_sf2: Check earlier for FLOW_EXT and FLOW_MAC_EXT	Florian Fainelli	1	-2/+3
	We do not currently support matching on FLOW_EXT or FLOW_MAC_EXT, but we were not checking for those bits being set in the flow specification. The check for FLOW_EXT and FLOW_MAC_EXT are separated out because a subsequent commit will add support for matching VLAN TCI which are covered by FLOW_EXT. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30	net: dsa: bcm_sf2: Disable learning for ASP port	Florian Fainelli	1	-1/+9
	We don't want to enable learning for the ASP port since it only receives directed traffic, this allows us to bypass ARL-driven forwarding rules which could conflict with Broadcom tags and/or CFP forwarding. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30	net: dsa: b53: Deny enslaving port 7 for 7278 into a bridge	Florian Fainelli	1	-0/+6
	On 7278, port 7 connects to the ASP which should only receive frames through the use of CFP rules, it is not desirable to have it be part of a bridge at all since that would make it pick up unwanted traffic that it may not even be able to filter or sustain. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30	net: dsa: b53: Prevent tagged VLAN on port 7 for 7278	Florian Fainelli	1	-0/+8
	On 7278, port 7 of the switch connects to the ASP UniMAC which is not capable of processing VLAN tagged frames. We can still allow the port to be part of a VLAN entry, and we may want it to be untagged on egress on that VLAN because of that limitation. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30	net: dsa: b53: Restore VLAN entries upon (re)configuration	Florian Fainelli	1	-0/+15
	The first time b53_configure_vlan() is called we have not configured any VLAN entries yet, since that happens later when interfaces get brought up. When b53_configure_vlan() is called again from suspend/resume we need to restore all VLAN entries though. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30	net: dsa: bcm_sf2: Fix overflow checks	Florian Fainelli	1	-6/+3
	Commit f949a12fd697 ("net: dsa: bcm_sf2: fix buffer overflow doing set_rxnfc") tried to fix the some user controlled buffer overflows in bcm_sf2_cfp_rule_set() and bcm_sf2_cfp_rule_del() but the fix was using CFP_NUM_RULES, which while it is correct not to overflow the bitmaps, is not representative of what the device actually supports. Correct that by using bcm_sf2_cfp_rule_size() instead. The latter subtracts the number of rules by 1, so change the checks from greater than or equal to greater than accordingly. Fixes: f949a12fd697 ("net: dsa: bcm_sf2: fix buffer overflow doing set_rxnfc") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30	hv_netvsc: Remove unnecessary round_up for recv_completion_cnt	Haiyang Zhang	1	-4/+5
	The vzalloc_node(), already rounds the total size to whole pages, and sizeof(u64) is smaller than sizeof(struct recv_comp_data). So round_up of recv_completion_cnt is not necessary, and may cause extra memory allocation. To save memory, remove this unnecessary round_up for recv_completion_cnt. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30	selftests: mlxsw: Add test cases for devlink-trap policers	Ido Schimmel	2	-0/+390
	Add test cases that verify that each registered packet trap policer: * Honors that imposed limitations of rate and burst size * Able to police trapped packets to the specified rate * Able to police trapped packets to the specified burst size * Able to be unbound from its trap group Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30	mlxsw: spectrum_trap: Add support for setting of packet trap group parameters	Ido Schimmel	5	-5/+46
	Implement support for setting of packet trap group parameters by invoking the trap_group_init() callback with the new parameters. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30	mlxsw: spectrum_trap: Switch to use correct packet trap group	Ido Schimmel	3	-18/+16
	Some packet traps are currently exposed to user space as being member of "l3_drops" trap group, but internally they are member of a different group. Switch these traps to use the correct group so that they are all subject to the same policer, as exposed to user space. Set the trap priority of packets trapped due to loopback error during routing to the lowest priority. Such packets are not routed again by the kernel and therefore should not mask other traps (e.g., host miss) that should be routed. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30	mlxsw: spectrum_trap: Do not initialize dedicated discard policer	Ido Schimmel	1	-9/+1
	The policer is now initialized as part of the registration with devlink, so there is no need to initialize it before the registration. Remove the initialization. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30	mlxsw: spectrum_trap: Add devlink-trap policer support	Ido Schimmel	6	-11/+297
	Register supported packet trap policers with devlink and implement callbacks to change their parameters and read their counters. Prevent user space from passing invalid policer parameters down to the device by checking their validity and communicating the failure via an appropriate extack message. v2: * Remove the max/min validity checks from __mlxsw_sp_trap_policer_set() Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30	mlxsw: spectrum_trap: Prepare policers for registration with devlink	Ido Schimmel	2	-1/+77
	Prepare an array of policer IDs to register with devlink and their associated parameters. The array is composed from both policers that are currently bound to exposed trap groups and policers that are not bound to any trap group. v2: * Provide max/min rate/burst size when registering policers Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30	mlxsw: spectrum: Track used packet trap policer IDs	Ido Schimmel	4	-3/+38
	During initialization the driver configures various packet trap groups and binds policers to them. Currently, most of these groups are not exposed to user space and therefore their policers should not be exposed as well. Otherwise, user space will be able to alter policer parameters without knowing which packet traps are policed by the policer. Use a bitmap to track the used policer IDs so that these policers will not be registered with devlink in a subsequent patch. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30	mlxsw: reg: Extend QPCR register	Ido Schimmel	1	-0/+17
	The QoS Policer Configuration Register (QPCR) is used to configure hardware policers. Extend this register with following fields and defines which will be used by subsequent patches: 1. Violate counter: reads number of packets dropped by the policer 2. Clear counter: to ensure we start counting from 0 3. Rate and burst size limits Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30	selftests: netdevsim: Add test cases for devlink-trap policers	Ido Schimmel	2	-0/+153
	Add test cases for packet trap policer set / show commands as well as for the binding of these policers to packet trap groups. Both good and bad flows are tested for maximum coverage. v2: * Add test case with new 'fail_trap_policer_set' knob * Add test case for partially modified trap group Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30	netdevsim: Add support for setting of packet trap group parameters	Ido Schimmel	2	-0/+18
	Add a dummy callback to set trap group parameters. Return an error when the 'fail_trap_group_set' debugfs file is set in order to exercise error paths and verify that error is propagated to user space when should. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30	devlink: Allow setting of packet trap group parameters	Ido Schimmel	2	-2/+63
	The previous patch allowed device drivers to publish their default binding between packet trap policers and packet trap groups. However, some users might not be content with this binding and would like to change it. In case user space passed a packet trap policer identifier when setting a packet trap group, invoke the appropriate device driver callback and pass the new policer identifier. v2: * Check for presence of 'DEVLINK_ATTR_TRAP_POLICER_ID' in devlink_trap_group_set() and bail if not present * Add extack error message in case trap group was partially modified Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30	devlink: Add packet trap group parameters support	Ido Schimmel	4	-9/+43
	Packet trap groups are used to aggregate logically related packet traps. Currently, these groups allow user space to batch operations such as setting the trap action of all member traps. In order to prevent the CPU from being overwhelmed by too many trapped packets, it is desirable to bind a packet trap policer to these groups. For example, to limit all the packets that encountered an exception during routing to 10Kpps. Allow device drivers to bind default packet trap policers to packet trap groups when the latter are registered with devlink. The next patch will enable user space to change this default binding. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30	netdevsim: Add devlink-trap policer support	Ido Schimmel	2	-1/+86
	Register three dummy packet trap policers with devlink and implement callbacks to change their parameters and read their counters. This will be used later on in the series to test the devlink-trap policer infrastructure. v2: * Remove check about burst size being a power of 2 and instead add a debugfs knob to fail the operation * Provide max/min rate/burst size when registering policers and remove the validity checks from nsim_dev_devlink_trap_policer_set() Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30	Documentation: Add description of packet trap policers	Ido Schimmel	1	-0/+26
	Extend devlink-trap documentation with information about packet trap policers. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30	devlink: Add packet trap policers support	Ido Schimmel	3	-0/+515
	Devices capable of offloading the kernel's datapath and perform functions such as bridging and routing must also be able to send (trap) specific packets to the kernel (i.e., the CPU) for processing. For example, a device acting as a multicast-aware bridge must be able to trap IGMP membership reports to the kernel for processing by the bridge module. In most cases, the underlying device is capable of handling packet rates that are several orders of magnitude higher compared to those that can be handled by the CPU. Therefore, in order to prevent the underlying device from overwhelming the CPU, devices usually include packet trap policers that are able to police the trapped packets to rates that can be handled by the CPU. This patch allows capable device drivers to register their supported packet trap policers with devlink. User space can then tune the parameters of these policer (currently, rate and burst size) and read from the device the number of packets that were dropped by the policer, if supported. Subsequent patches in the series will allow device drivers to create default binding between these policers and packet trap groups and allow user space to change the binding. v2: * Add 'strict_start_type' in devlink policy * Have device drivers provide max/min rate/burst size for each policer. Use them to check validity of user provided parameters Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30	selftests/bpf: Test FD-based cgroup attachment	Andrii Nakryiko	2	-0/+268
	Add selftests to exercise FD-based cgroup BPF program attachments and their intermixing with legacy cgroup BPF attachments. Auto-detachment and program replacement (both unconditional and cmpxchng-like) are tested as well. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200330030001.2312810-5-andriin@fb.com
2020-03-30	libbpf: Add support for bpf_link-based cgroup attachment	Andrii Nakryiko	6	-1/+122
	Add bpf_program__attach_cgroup(), which uses BPF_LINK_CREATE subcommand to create an FD-based kernel bpf_link. Also add low-level bpf_link_create() API. If expected_attach_type is not specified explicitly with bpf_program__set_expected_attach_type(), libbpf will try to determine proper attach type from BPF program's section definition. Also add support for bpf_link's underlying BPF program replacement: - unconditional through high-level bpf_link__update_program() API; - cmpxchg-like with specifying expected current BPF program through low-level bpf_link_update() API. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200330030001.2312810-4-andriin@fb.com
2020-03-30	bpf: Implement bpf_prog replacement for an active bpf_cgroup_link	Andrii Nakryiko	5	-0/+186
	Add new operation (LINK_UPDATE), which allows to replace active bpf_prog from under given bpf_link. Currently this is only supported for bpf_cgroup_link, but will be extended to other kinds of bpf_links in follow-up patches. For bpf_cgroup_link, implemented functionality matches existing semantics for direct bpf_prog attachment (including BPF_F_REPLACE flag). User can either unconditionally set new bpf_prog regardless of which bpf_prog is currently active under given bpf_link, or, optionally, can specify expected active bpf_prog. If active bpf_prog doesn't match expected one, no changes are performed, old bpf_link stays intact and attached, operation returns a failure. cgroup_bpf_replace() operation is resolving race between auto-detachment and bpf_prog update in the same fashion as it's done for bpf_link detachment, except in this case update has no way of succeeding because of target cgroup marked as dying. So in this case error is returned. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200330030001.2312810-3-andriin@fb.com
2020-03-30	bpf: Implement bpf_link-based cgroup BPF program attachment	Andrii Nakryiko	7	-98/+354
	Implement new sub-command to attach cgroup BPF programs and return FD-based bpf_link back on success. bpf_link, once attached to cgroup, cannot be replaced, except by owner having its FD. Cgroup bpf_link supports only BPF_F_ALLOW_MULTI semantics. Both link-based and prog-based BPF_F_ALLOW_MULTI attachments can be freely intermixed. To prevent bpf_cgroup_link from keeping cgroup alive past the point when no BPF program can be executed, implement auto-detachment of link. When cgroup_bpf_release() is called, all attached bpf_links are forced to release cgroup refcounts, but they leave bpf_link otherwise active and allocated, as well as still owning underlying bpf_prog. This is because user-space might still have FDs open and active, so bpf_link as a user-referenced object can't be freed yet. Once last active FD is closed, bpf_link will be freed and underlying bpf_prog refcount will be dropped. But cgroup refcount won't be touched, because cgroup is released already. The inherent race between bpf_cgroup_link release (from closing last FD) and cgroup_bpf_release() is resolved by both operations taking cgroup_mutex. So the only additional check required is when bpf_cgroup_link attempts to detach itself from cgroup. At that time we need to check whether there is still cgroup associated with that link. And if not, exit with success, because bpf_cgroup_link was already successfully detached. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Roman Gushchin <guro@fb.com> Link: https://lore.kernel.org/bpf/20200330030001.2312810-2-andriin@fb.com
2020-03-30	staging/octeon: fix up merge error	Randy Dunlap	1	-1/+1
	There's a semantic conflict in the Octeon staging network driver, which used the skb_reset_tc() function to reset skb state when re-using an skb. But that inline helper function was removed in mainline by commit 2c64605b590e ("net: Fix CONFIG_NET_CLS_ACT=n and CONFIG_NFT_FWD_NETDEV={y, m} build"). Fix it by using skb_reset_redirect() instead. Also move it out of the This code path only ends up triggering if REUSE_SKBUFFS_WITHOUT_FREE is enabled, which in turn only happens if you don't have CONFIG_NETFILTER configured. Which was how this wasn't caught by the usual allmodconfig builds. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-30	selinux: clean up indentation issue with assignment statement	Colin Ian King	1	-4/+3
	The assignment of e->type_names is indented one level too deep, clean this up by removing the extraneous tab. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-03-30	NFS: Ensure security label is set for root inode	Scott Mayhew	4	-38/+39
	When using NFSv4.2, the security label for the root inode should be set via a call to nfs_setsecurity() during the mount process, otherwise the inode will appear as unlabeled for up to acdirmin seconds. Currently the label for the root inode is allocated, retrieved, and freed entirely witin nfs4_proc_get_root(). Add a field for the label to the nfs_fattr struct, and allocate & free the label in nfs_get_root(), where we also add a call to nfs_setsecurity(). Note that for the call to nfs_setsecurity() to succeed, it's necessary to also move the logic calling security_sb_{set,clone}_security() from nfs_get_tree_common() down into nfs_get_root()... otherwise the SBLABEL_MNT flag will not be set in the super_block's security flags and nfs_setsecurity() will silently fail. Reported-by: Richard Haines <richard_c_haines@btinternet.com> Signed-off-by: Scott Mayhew <smayhew@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Tested-by: Stephen Smalley <sds@tycho.nsa.gov> [PM: fixed 80-char line width problems] Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-03-30	bpf: Test_verifier, add alu32 bounds tracking tests	John Fastabend	1	-0/+39
	Its possible to have divergent ALU32 and ALU64 bounds when using JMP32 instructins and ALU64 arithmatic operations. Sometimes the clang will even generate this code. Because the case is a bit tricky lets add a specific test for it. Here is pseudocode asm version to illustrate the idea, 1 r0 = 0xffffffff00000001; 2 if w0 > 1 goto %l[fail]; 3 r0 += 1 5 if w0 > 2 goto %l[fail] 6 exit The intent here is the verifier will fail the load if the 32bit bounds are not tracked correctly through ALU64 op. Similarly we can check the 64bit bounds are correctly zero extended after ALU32 ops. 1 r0 = 0xffffffff00000001; 2 w0 += 1 2 if r0 > 3 goto %l[fail]; 6 exit The above will fail if we do not correctly zero extend 64bit bounds after 32bit op. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/158560430155.10843.514209255758200922.stgit@john-Precision-5820-Tower
2020-03-30	bpf: Test_verifier, #65 error message updates for trunc of boundary-cross	John Fastabend	1	-8/+4
	After changes to add update_reg_bounds after ALU ops and 32-bit bounds tracking truncation of boundary crossing range will fail earlier and with a different error message. Now the test error trace is the following 11: (17) r1 -= 2147483584 12: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=invP(id=0,smin_value=-2147483584,smax_value=63) R10=fp0 fp-8_w=mmmmmmmm 12: (17) r1 -= 2147483584 13: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=invP(id=0, umin_value=18446744069414584448,umax_value=18446744071562068095, var_off=(0xffffffff00000000; 0xffffffff)) R10=fp0 fp-8_w=mmmmmmmm 13: (77) r1 >>= 8 14: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=invP(id=0, umin_value=72057594021150720,umax_value=72057594029539328, var_off=(0xffffffff000000; 0xffffff), s32_min_value=-16777216,s32_max_value=-1, u32_min_value=-16777216) R10=fp0 fp-8_w=mmmmmmmm 14: (0f) r0 += r1 value 72057594021150720 makes map_value pointer be out of bounds Because we have 'umin_value == umax_value' instead of previously where 'umin_value != umax_value' we can now fail earlier noting that pointer addition is out of bounds. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/158560428103.10843.6316594510312781186.stgit@john-Precision-5820-Tower
2020-03-30	bpf: Test_verifier, bpf_get_stack return value add <0	John Fastabend	1	-4/+4
	With current ALU32 subreg handling and retval refine fix from last patches we see an expected failure in test_verifier. With verbose verifier state being printed at each step for clarity we have the following relavent lines [I omit register states that are not necessarily useful to see failure cause], #101/p bpf_get_stack return R0 within range FAIL Failed to load prog 'Success'! [..] 14: (85) call bpf_get_stack#67 R0_w=map_value(id=0,off=0,ks=8,vs=48,imm=0) R3_w=inv48 15: R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff)) 15: (b7) r1 = 0 16: R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff)) R1_w=inv0 16: (bf) r8 = r0 17: R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff)) R1_w=inv0 R8_w=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff)) 17: (67) r8 <<= 32 18: R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff)) R1_w=inv0 R8_w=inv(id=0,smax_value=9223372032559808512, umax_value=18446744069414584320, var_off=(0x0; 0xffffffff00000000), s32_min_value=0, s32_max_value=0, u32_max_value=0, var32_off=(0x0; 0x0)) 18: (c7) r8 s>>= 32 19 R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff)) R1_w=inv0 R8_w=inv(id=0,smin_value=-2147483648, smax_value=2147483647, var32_off=(0x0; 0xffffffff)) 19: (cd) if r1 s< r8 goto pc+16 R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff)) R1_w=inv0 R8_w=inv(id=0,smin_value=-2147483648, smax_value=0, var32_off=(0x0; 0xffffffff)) 20: R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff)) R1_w=inv0 R8_w=inv(id=0,smin_value=-2147483648, smax_value=0, R9=inv48 20: (1f) r9 -= r8 21: (bf) r2 = r7 22: R2_w=map_value(id=0,off=0,ks=8,vs=48,imm=0) 22: (0f) r2 += r8 value -2147483648 makes map_value pointer be out of bounds After call bpf_get_stack() on line 14 and some moves we have at line 16 an r8 bound with max_value 48 but an unknown min value. This is to be expected bpf_get_stack call can only return a max of the input size but is free to return any negative error in the 32-bit register space. The C helper is returning an int so will use lower 32-bits. Lines 17 and 18 clear the top 32 bits with a left/right shift but use ARSH so we still have worst case min bound before line 19 of -2147483648. At this point the signed check 'r1 s< r8' meant to protect the addition on line 22 where dst reg is a map_value pointer may very well return true with a large negative number. Then the final line 22 will detect this as an invalid operation and fail the program. What we want to do is proceed only if r8 is positive non-error. So change 'r1 s< r8' to 'r1 s> r8' so that we jump if r8 is negative. Next we will throw an error because we access past the end of the map value. The map value size is 48 and sizeof(struct test_val) is 48 so we walk off the end of the map value on the second call to get bpf_get_stack(). Fix this by changing sizeof(struct test_val) to 24 by using 'sizeof(struct test_val) / 2'. After this everything passes as expected. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/158560426019.10843.3285429543232025187.stgit@john-Precision-5820-Tower