linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2022-10-05	cpufreq: amd-pstate: Add explanation for X86_AMD_PSTATE_UT	Meng Li	2	-0/+9
	This kernel module is used for testing. It's safe to say M here. It can also be built-in without X86_AMD_PSTATE enabled. Currently, only tests for amd-pstate are supported. If X86_AMD_PSTATE is set disabled, it can tell the users test can only run on amd-pstate driver, please set X86_AMD_PSTATE enabled. In the future, comparison tests will be added. It can set amd-pstate disabled and set acpi-cpufreq enabled to run test cases, then compare the test results. Suggested-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Meng Li <li.meng@amd.com> Acked-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-10-05	selftests/cpu-hotplug: Add log info when test success	Zhao Gongyi	1	-1/+1
	Add log information when run full test successfully. Signed-off-by: Zhao Gongyi <zhaogongyi@huawei.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-10-05	selftests/cpu-hotplug: Reserve one cpu online at least	Zhao Gongyi	1	-18/+22
	Considering that we can not offline all cpus in any cases, we need to reserve one cpu online when the test offline all hotpluggable online cpus, otherwise the test will fail forever. Fixes: d89dffa976bc ("fault-injection: add selftests for cpu and memory hotplug") Signed-off-by: Zhao Gongyi <zhaogongyi@huawei.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-10-05	selftests/cpu-hotplug: Delete fault injection related code	Zhao Gongyi	2	-82/+6
	Delete fault injection related code since the module has been deleted. Signed-off-by: Zhao Gongyi <zhaogongyi@huawei.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-10-05	selftests/cpu-hotplug: Use return instead of exit	Zhao Gongyi	1	-5/+8
	Some cpus will be left in offline state when online function exits in some error conditions. Use return instead of exit to fix it. Signed-off-by: Zhao Gongyi <zhaogongyi@huawei.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-10-05	selftests/cpu-hotplug: Correct log info	Zhao Gongyi	1	-1/+1
	Correct the log info to match the test. Signed-off-by: Zhao Gongyi <zhaogongyi@huawei.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-10-05	cpufreq: amd-pstate: modify type in argument 2 for filp_open	Meng Li	1	-1/+1
	Modify restricted FMODE_PREAD to experted int O_RDONLY to fix the sparse warnings below: sparse warnings: (new ones prefixed by >>) >> drivers/cpufreq/amd-pstate-ut.c:74:40: sparse: sparse: incorrect type >> in argument 2 (different base types) @@ expected int @@ got >> restricted fmode_t [usertype] @@ drivers/cpufreq/amd-pstate-ut.c:74:40: sparse: expected int drivers/cpufreq/amd-pstate-ut.c:74:40: sparse: got restricted fmode_t [usertype] Signed-off-by: Meng Li <li.meng@amd.com> Reported-by: kernel test robot <lkp@intel.com> Acked-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-10-05	Documentation: amd-pstate: Add unit test introduction	Meng Li	1	-0/+76
	Introduce the AMD P-State unit test module design and implementation. It also talks about kselftest and how to use. Signed-off-by: Meng Li <li.meng@amd.com> Acked-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-10-05	selftests: amd-pstate: Add test trigger for amd-pstate driver	Meng Li	4	-0/+66
	Add amd-pstate test trigger in kselftest, it will load/unload amd-pstate-ut module to test some cases etc. Signed-off-by: Meng Li <li.meng@amd.com> Acked-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-10-05	cpufreq: amd-pstate: Add test module for amd-pstate driver	Meng Li	3	-0/+301
	Add amd-pstate-ut test module, this module is used by kselftest to unit test amd-pstate functionality. This module will be expected by some of selftests to be present and loaded. Signed-off-by: Meng Li <li.meng@amd.com> Acked-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-10-05	cpufreq: amd-pstate: Expose struct amd_cpudata	Meng Li	3	-59/+79
	Expose struct amd_cpudata to AMD P-State unit test module. This data struct will be used on the following AMD P-State unit test (amd-pstate-ut) module. The amd-pstate-ut module can get some AMD infomations by this data struct. For example: highest perf, nominal perf, boost supported etc. Signed-off-by: Meng Li <li.meng@amd.com> Acked-by: Huang Rui <ray.huang@amd.com> Acked-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-10-05	selftests/vm: use top_srcdir instead of recomputing relative paths	Axel Rasmussen	4	-5/+5
	In various places both in t/t/s/v/Makefile as well as some of the test sources, we were referring to headers or directories using some fairly long relative paths. Since we have a working top_srcdir variable though, which refers to the root of the kernel tree, we can clean up all of these "up and over" relative paths, just relying on the single variable instead. Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-10-05	arm64: alternatives: Use vdso/bits.h instead of linux/bits.h	Nathan Chancellor	1	-1/+1
	When building with CONFIG_LTO after commit ba00c2a04fa5 ("arm64: fix the build with binutils 2.27"), the following build error occurs: In file included from arch/arm64/kernel/module-plts.c:6: In file included from include/linux/elf.h:6: In file included from arch/arm64/include/asm/elf.h:8: In file included from arch/arm64/include/asm/hwcap.h:9: In file included from arch/arm64/include/asm/cpufeature.h:9: In file included from arch/arm64/include/asm/alternative-macros.h:5: In file included from include/linux/bits.h:22: In file included from include/linux/build_bug.h:5: In file included from include/linux/compiler.h:248: In file included from arch/arm64/include/asm/rwonce.h:71: include/asm-generic/rwonce.h:67:9: error: expected string literal in 'asm' return __READ_ONCE((unsigned long )addr); ^ arch/arm64/include/asm/rwonce.h:43:16: note: expanded from macro '__READ_ONCE' asm volatile(__LOAD_RCPC(b, %w0, %1) \ ^ arch/arm64/include/asm/rwonce.h:17:2: note: expanded from macro '__LOAD_RCPC' ALTERNATIVE( \ ^ Similar to the issue resolved by commit 0072dc1b53c3 ("arm64: avoid BUILD_BUG_ON() in alternative-macros"), there is a circular include dependency through <linux/bits.h> when CONFIG_LTO is enabled due to <asm/rwonce.h> appearing in the include chain before the contents of <asm/alternative-macros.h>, which results in ALTERNATIVE() not getting expanded properly because it has not been defined yet. Avoid this issue by including <vdso/bits.h>, which includes the definition of the BIT() macro, instead of <linux/bits.h>, as BIT() is the only macro from bits.h that is relevant to this header. Fixes: ba00c2a04fa5 ("arm64: fix the build with binutils 2.27") Link: https://github.com/ClangBuiltLinux/linux/issues/1728 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20221003193759.1141709-1-nathan@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-10-05	Revert "ARM: dts: BCM5301X: Add basic PCI controller properties"	Arnd Bergmann	1	-12/+0
	Adding the #address-cells/#size-cells properties without also adding the other required properties for PCI nodes causes new build warnings from dtc that now show up everywhere, rather than just while verifying the yaml bindings: arch/arm/boot/dts/bcm5301x.dtsi:240.21-246.5: Warning (pci_bridge): /axi@18000000/pcie@12000: missing ranges for PCI bridge (or not a bridge) arch/arm/boot/dts/bcm5301x.dtsi:248.21-254.5: Warning (pci_bridge): /axi@18000000/pcie@13000: missing ranges for PCI bridge (or not a bridge) arch/arm/boot/dts/bcm5301x.dtsi:256.21-262.5: Warning (pci_bridge): /axi@18000000/pcie@14000: missing ranges for PCI bridge (or not a bridge) arch/arm/boot/dts/bcm47094-luxul-xbr-4500.dtb: Warning (unit_address_format): Failed prerequisite 'pci_bridge' Revert it for now. Fixes: 61dc1e3850a6 ("ARM: dts: BCM5301X: Add basic PCI controller properties") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-10-04	ARM: 9246/1: dump: show page table level name	Wang Kefeng	1	-1/+5
	ARM could have 3 page table level if ARM_LPAE enabled, or only 2 page table level, let's show the page table level name when dump. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-10-04	ARM: 9245/1: dump: show FDT region	Wang Kefeng	1	-1/+1
	Since commit 7a1be318f579 ("ARM: 9012/1: move device tree mapping out of linear region"), FDT is placed between the end of the vmalloc region and the start of the fixmap region, let's show it in dump. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-10-04	ARM: 9242/1: kasan: Only map modules if CONFIG_KASAN_VMALLOC=n	Alex Sverdlin	1	-2/+7
	In case CONFIG_KASAN_VMALLOC=y kasan_populate_vmalloc() allocates the shadow pages dynamically. But even worse is that kasan_release_vmalloc() releases them, which is not compatible with create_mapping() of MODULES_VADDR..MODULES_END range: BUG: Bad page state in process kworker/9:1 pfn:2068b page:e5e06160 refcount:0 mapcount:0 mapping:00000000 index:0x0 flags: 0x1000(reserved) raw: 00001000 e5e06164 e5e06164 00000000 00000000 00000000 ffffffff 00000000 page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set bad because of flags: 0x1000(reserved) Modules linked in: ip_tables CPU: 9 PID: 154 Comm: kworker/9:1 Not tainted 5.4.188-... #1 Hardware name: LSI Axxia AXM55XX Workqueue: events do_free_init unwind_backtrace show_stack dump_stack bad_page free_pcp_prepare free_unref_page kasan_depopulate_vmalloc_pte __apply_to_page_range apply_to_existing_page_range kasan_release_vmalloc __purge_vmap_area_lazy _vm_unmap_aliases.part.0 __vunmap do_free_init process_one_work worker_thread kthread Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-10-04	ARM: 9240/1: dma-mapping: Pass (void *) to virt_to_page()	Linus Walleij	1	-1/+1
	Pointers to virtual memory functions are (void *) but the __dma_update_pte() function is passing an unsigned long. Fix this up by explicit cast. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-10-04	ARM: 9234/1: stacktrace: Avoid duplicate saving of exception PC value	Li Huafei	3	-14/+37
	Because an exception stack frame is not created in the exception entry, save_trace() does special handling for the exception PC, but this is only needed when CONFIG_FRAME_POINTER_UNWIND=y. When CONFIG_ARM_UNWIND=y, unwind annotations have been added to the exception entry and save_trace() will repeatedly save the exception PC: [0x7f000090] hrtimer_hander+0x8/0x10 [hrtimer] [0x8019ec50] __hrtimer_run_queues+0x18c/0x394 [0x8019f760] hrtimer_run_queues+0xbc/0xd0 [0x8019def0] update_process_times+0x34/0x80 [0x801ad2a4] tick_periodic+0x48/0xd0 [0x801ad3dc] tick_handle_periodic+0x1c/0x7c [0x8010f2e0] twd_handler+0x30/0x40 [0x80177620] handle_percpu_devid_irq+0xa0/0x23c [0x801718d0] generic_handle_domain_irq+0x24/0x34 [0x80502d28] gic_handle_irq+0x74/0x88 [0x8085817c] generic_handle_arch_irq+0x58/0x78 [0x80100ba8] __irq_svc+0x88/0xc8 [0x80108114] arch_cpu_idle+0x38/0x3c [0x80108114] arch_cpu_idle+0x38/0x3c <==== duplicate saved exception PC [0x80861bf8] default_idle_call+0x38/0x130 [0x8015d5cc] do_idle+0x150/0x214 [0x8015d978] cpu_startup_entry+0x18/0x1c [0x808589c0] rest_init+0xd8/0xdc [0x80c00a44] arch_post_acpi_subsys_init+0x0/0x8 We can move the special handling of the exception PC in save_trace() to the unwind_frame() of the frame pointer unwinder. Signed-off-by: Li Huafei <lihuafei1@huawei.com> Reviewed-by: Linus Waleij <linus.walleij@linaro.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-10-04	ARM: 9233/1: stacktrace: Skip frame pointer boundary check for call_with_stack()	Li Huafei	2	-7/+35
	When using the frame pointer unwinder, it was found that the stack trace output of stack_trace_save() is incomplete if the stack contains call_with_stack(): [0x7f00002c] dump_stack_task+0x2c/0x90 [hrtimer] [0x7f0000a0] hrtimer_hander+0x10/0x18 [hrtimer] [0x801a67f0] __hrtimer_run_queues+0x1b0/0x3b4 [0x801a7350] hrtimer_run_queues+0xc4/0xd8 [0x801a597c] update_process_times+0x3c/0x88 [0x801b5a98] tick_periodic+0x50/0xd8 [0x801b5bf4] tick_handle_periodic+0x24/0x84 [0x8010ffc4] twd_handler+0x38/0x48 [0x8017d220] handle_percpu_devid_irq+0xa8/0x244 [0x80176e9c] generic_handle_domain_irq+0x2c/0x3c [0x8052e3a8] gic_handle_irq+0x7c/0x90 [0x808ab15c] generic_handle_arch_irq+0x60/0x80 [0x8051191c] call_with_stack+0x1c/0x20 For the frame pointer unwinder, unwind_frame() checks stackframe::fp by stackframe::sp. Since call_with_stack() switches the SP from one stack to another, stackframe::fp and stackframe: :sp will point to different stacks, so we can no longer check stackframe::fp by stackframe::sp. Skip checking stackframe::fp at this point to avoid this problem. Signed-off-by: Li Huafei <lihuafei1@huawei.com> Reviewed-by: Linus Waleij <linus.walleij@linaro.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-10-04	alpha: add full ioread64/iowrite64 implementation	Arnd Bergmann	13	-23/+175
	The previous patch introduced ioread64/iowrite64 declarations, but this means we no longer get the io-64-nonatomic variant, and run into a long error when someone actually wants to use these: ERROR: modpost: "ioread64" [drivers/net/ethernet/freescale/enetc/fsl-enetc.ko] undefined! Add the (hopefully) correct implementation for each machine type, based on the 32-bit accessor. Since the 32-bit return type does not work for ioread64(), change the internal implementation to use the correct width consistently, but leave the external interface to match the asm-generic/iomap.h header that uses 32-bit or 64-bit return values. Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Fixes: 7e772dad9913 ("alpha: Use generic <asm-generic/io.h>") Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-10-03	eth: pse: add missing static inlines	Jakub Kicinski	1	-6/+6
	build bot reports missing 'static inline' qualifiers in the header. Reported-by: kernel test robot <lkp@intel.com> Fixes: 18ff0bcda6d1 ("ethtool: add interface to interact with Ethernet Power Equipment") Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20221004040327.2034878-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03	once: rename _SLOW to _SLEEPABLE	Jason A. Donenfeld	3	-26/+26
	The _SLOW designation wasn't really descriptive of anything. This is meant to be called from process context when it's possible to sleep. So name this more aptly _SLEEPABLE, which better fits its intended use. Fixes: 62c07983bef9 ("once: add DO_ONCE_SLOW() for sleepable contexts") Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20221003181413.1221968-1-Jason@zx2c4.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03	net: pse-pd: add regulator based PSE driver	Oleksij Rempel	3	-0/+160
	Add generic, regulator based PSE driver to support simple Power Sourcing Equipment without automatic classification support. This driver was tested on 10Bast-T1L switch with regulator based PoDL PSE. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03	dt-bindings: net: pse-dt: add bindings for regulator based PoDL PSE controller	Oleksij Rempel	2	-0/+73
	Add bindings for the regulator based Ethernet PoDL PSE controller and generic bindings for all PSE controllers. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03	ethtool: add interface to interact with Ethernet Power Equipment	Oleksij Rempel	10	-1/+449
	Add interface to support Power Sourcing Equipment. At current step it provides generic way to address all variants of PSE devices as defined in IEEE 802.3-2018 but support only objects specified for IEEE 802.3-2018 104.4 PoDL Power Sourcing Equipment (PSE). Currently supported and mandatory objects are: IEEE 802.3-2018 30.15.1.1.3 aPoDLPSEPowerDetectionStatus IEEE 802.3-2018 30.15.1.1.2 aPoDLPSEAdminState IEEE 802.3-2018 30.15.1.2.1 acPoDLPSEAdminControl This is minimal interface needed to control PSE on each separate ethernet port but it provides not all mandatory objects specified in IEEE 802.3-2018. Since "PoDL PSE" and "PSE" have similar names, but some different values I decide to not merge them and keep separate naming schema. This should allow as to be as close to IEEE 802.3 spec as possible and avoid name conflicts in the future. This implementation is connected to PHYs instead of MACs because PSE auto classification can potentially interfere with PHY auto negotiation. So, may be some extra PHY related initialization will be needed. With WIP version of ethtools interaction with PSE capable link looks as following: $ ip l ... 5: t1l1@eth0: <BROADCAST,MULTICAST> .. ... $ ethtool --show-pse t1l1 PSE attributs for t1l1: PoDL PSE Admin State: disabled PoDL PSE Power Detection Status: disabled $ ethtool --set-pse t1l1 podl-pse-admin-control enable $ ethtool --show-pse t1l1 PSE attributs for t1l1: PoDL PSE Admin State: enabled PoDL PSE Power Detection Status: delivering power Signed-off-by: kernel test robot <lkp@intel.com> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03	net: mdiobus: search for PSE nodes by parsing PHY nodes.	Oleksij Rempel	3	-2/+39
	Some PHYs can be linked with PSE (Power Sourcing Equipment), so search for related nodes and attach it to the phydev. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03	net: mdiobus: fwnode_mdiobus_register_phy() rework error handling	Oleksij Rempel	1	-9/+12
	Rework error handling as preparation for PSE patch. This patch should make it easier to extend this function. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03	net: add framework to support Ethernet PSE and PDs devices	Oleksij Rempel	6	-0/+341
	This framework was create with intention to provide support for Ethernet PSE (Power Sourcing Equipment) and PDs (Powered Device). At current step this patch implements generic PSE support for PoDL (Power over Data Lines 802.3bu) specification with reserving name space for PD devices as well. This framework can be extended to support 802.3af and 802.3at "Power via the Media Dependent Interface" (or PoE/Power over Ethernet) Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03	dt-bindings: net: phy: add PoDL PSE property	Oleksij Rempel	1	-0/+6
	Add property to reference node representing a PoDL Power Sourcing Equipment. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03	[brown paperbag] fix coredump breakage	Al Viro	1	-1/+2
	Let me count the ways in which I'd screwed up: * when emitting a page, handling of gaps in coredump should happen before fetching the current file position. * fix for a problem that occurs on rather uncommon setups (and hadn't been observed in the wild) had been sent very late in the cycle. * ... with badly insufficient testing, introducing an easily reproducible breakage. Without giving it time to soak in -next. Fucked-up-by: Al Viro <viro@zeniv.linux.org.uk> Reported-by: "J. R. Okajima" <hooanon05g@gmail.com> Tested-by: "J. R. Okajima" <hooanon05g@gmail.com> Fixes: 06bbaa6dc53c "[coredump] don't use __kernel_write() on kmap_local_page()" Cc: stable@kernel.org # v6.0-only Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-10-03	net: marvell: prestera: Propagate nh state from hw to kernel	Yevhen Orlov	2	-0/+114
	We poll nexthops in HW and call for each active nexthop appropriate neighbour. Also we provide implicity neighbour resolving. For example, user have added nexthop route: # ip route add 5.5.5.5 via 1.1.1.2 But neighbour 1.1.1.2 doesn't exist. In this case we will try to call neigh_event_send, even if there is no traffic. This is useful, when you have add route, which will be used after some time but with a lot of traffic (burst). So, we has prepared, offloaded route in advance. Co-developed-by: Taras Chornyi <tchornyi@marvell.com> Signed-off-by: Taras Chornyi <tchornyi@marvell.com> Co-developed-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03	net: marvell: prestera: Add neighbour cache accounting	Yevhen Orlov	2	-3/+797
	Move forward and use new PRESTERA_FIB_TYPE_UC_NH to provide basic nexthop routes support. Provide deinitialization sequence for all created router objects. Limitations: - Only "local" and "main" tables supported - Only generic interfaces supported for router (no bridges or vlans) Co-developed-by: Taras Chornyi <tchornyi@marvell.com> Signed-off-by: Taras Chornyi <tchornyi@marvell.com> Co-developed-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03	net: marvell: prestera: add stub handler neighbour events	Yevhen Orlov	2	-0/+60
	Actual handler will be added in next patches Co-developed-by: Taras Chornyi <tchornyi@marvell.com> Signed-off-by: Taras Chornyi <tchornyi@marvell.com> Co-developed-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03	net: marvell: prestera: Add heplers to interact with fib_notifier_info	Yevhen Orlov	1	-34/+65
	This will be used to implement nexthops related logic in next patches. Also try to keep ipv4/6 abstraction to be able to reuse helpers for ipv6 in the future. Co-developed-by: Taras Chornyi <tchornyi@marvell.com> Signed-off-by: Taras Chornyi <tchornyi@marvell.com> Co-developed-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03	net: marvell: prestera: Add length macros for prestera_ip_addr	Yevhen Orlov	1	-0/+2
	Add macros to determine IP address length (internal driver types). This will be used in next patches for nexthops logic. Co-developed-by: Taras Chornyi <tchornyi@marvell.com> Signed-off-by: Taras Chornyi <tchornyi@marvell.com> Co-developed-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03	net: marvell: prestera: add delayed wq and flush wq on deinit	Yevhen Orlov	3	-0/+14
	Flushing workqueues ensures, that no more pending works, related to just unregistered or deinitialized notifiers. After that we can free memory. Delayed wq will be used for neighbours in next patches. Co-developed-by: Taras Chornyi <tchornyi@marvell.com> Signed-off-by: Taras Chornyi <tchornyi@marvell.com> Co-developed-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03	net: marvell: prestera: Add strict cleanup of fib arbiter	Yevhen Orlov	1	-1/+41
	This will, ensure, that there is no more, preciously allocated fib_cache entries left after deinit. Will be used to free allocated resources of nexthop routes, that points to "not our" port (e.g. eth0). Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03	net: marvell: prestera: Add cleanup of allocated fib_nodes	Yevhen Orlov	1	-0/+12
	Do explicity cleanup on router_hw_fini, to ensure, that all allocated objects cleaned. This will be used in cases, when upper layer (cache) is not mapped to router_hw layer. Co-developed-by: Taras Chornyi <tchornyi@marvell.com> Signed-off-by: Taras Chornyi <tchornyi@marvell.com> Co-developed-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03	net: marvell: prestera: Add router nexthops ABI	Yevhen Orlov	6	-8/+582
	- Add functions to allocate/delete/set nexthop group - NOTE: non-ECMP nexthop is nexthop group with allocated size = 1 - Add function to read state of HW nh (if packets going through it) Co-developed-by: Taras Chornyi <tchornyi@marvell.com> Signed-off-by: Taras Chornyi <tchornyi@marvell.com> Co-developed-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03	eth: octeon: fix build after netif_napi_add() changes	Jakub Kicinski	1	-2/+2
	Guenter reports I missed a netif_napi_add() call in one of the platform-specific drivers: drivers/net/ethernet/cavium/octeon/octeon_mgmt.c: In function 'octeon_mgmt_probe': drivers/net/ethernet/cavium/octeon/octeon_mgmt.c:1399:9: error: too many arguments to function 'netif_napi_add' 1399 \| netif_napi_add(netdev, &p->napi, octeon_mgmt_napi_poll, \| ^~~~~~~~~~~~~~ Reported-by: Guenter Roeck <linux@roeck-us.net> Fixes: b48b89f9c189 ("net: drop the weight argument from netif_napi_add") Link: https://lore.kernel.org/r/20221002175650.1491124-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03	net/mlx5: E-Switch, Return EBUSY if can't get mode lock	Jianbo Liu	1	-1/+1
	It is to avoid tc retrying during device mode change. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03	net/mlx5: E-switch, Don't update group if qos is not enabled	Chris Mi	1	-1/+5
	Currently, qos group will be updated and qos will be enabled when unregistering devlink port. Actually no need to update group if qos is not enabled. Add a check to prevent unnecessary enabling and disabling qos for every port. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Dmytro Linkin <dlinkin@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03	net/mlx5: E-Switch, Allow offloading fwd dest flow table with vport	Roi Dayan	1	-7/+9
	Before this commit a fwd dest flow table resulted in ignoring vport dests which is incorrect and is supported. With this commit the dests can be a mix of flow table and vport dests. There is still a limitation that there cannot be more than one flow table dest. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03	net/mlx5: Set default grace period based on function type	Maher Sanalla	1	-2/+16
	Currently, driver sets the same grace period for fw fatal health reporter to any type of function. Since the lower level functions are more vulnerable to fw fatal errors as a result of parent function closure/reload, set a smaller grace period for the lower level functions, as follows: 1. For ECPF: 180 seconds. 2. For PF: 60 seconds. 3. For VF/SF: 30 seconds. Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03	net/mlx5: Start health poll at earlier stage of driver load	Moshe Shemesh	3	-10/+19
	Start health poll at earlier stage, so if fw fatal issue occurred before or during initialization commands such as init_hca or set_hca_cap the poll health can detect and indicate that the driver is already in error state. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03	net/mlx5e: Expose rx_oversize_pkts_buffer counter	Gal Pressman	4	-4/+32
	Add the rx_oversize_pkts_buffer counter to ethtool statistics. This counter exposes the number of dropped received packets due to length which arrived to RQ and exceed software buffer size allocated by the device for incoming traffic. It might imply that the device MTU is larger than the software buffers size. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03	net/mlx5e: xsk: Optimize for unaligned mode with 3072-byte frames	Maxim Mikityanskiy	4	-3/+61
	When XSK frame size is 3072 (or another power of two multiplied by 3), KLM mechanism for NIC virtual memory page mapping can be optimized by replacing it with KSM. Before this change, two KLM entries were needed to map an XSK frame that is not a power of two: one entry maps the UMEM memory up to the frame length, the other maps the rest of the stride to the garbage page. When the frame length divided by 3 is a power of two, it can be mapped using 3 KSM entries, and the fourth will map the rest of the stride to the garbage page. All 4 KSM entries are of the same size, which allows for a much faster lookup. Frame size 3072 is useful in certain use cases, because it allows packing 4 frames into 3 pages. Generally speaking, other frame sizes equal to PAGE_SIZE minus a power of two can be optimized in a similar way, but it will require many more KSMs per frame, which slows down UMRs a little bit, but more importantly may hit the limit for the maximum number of KSM entries. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03	net/mlx5e: xsk: Print a warning in slow configurations	Maxim Mikityanskiy	1	-0/+9
	On striding RQ, when the XSK frame size doesn't match the MKey page size, KLM is used for memory mappings, which is a slower mechanism than MTT or KSM. It may happen in two cases: 1. Frame size is not a power of two (only possible in the unaligned mode of XSK). 2. Frame size is 2048 bytes, and the firmware doesn't support MKey pages smaller than 4096 bytes. Depending on the case, print a warning and recommend to disable striding RQ or upgrade the firmware. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-03	net/mlx5e: xsk: Use KLM to protect frame overrun in unaligned mode	Maxim Mikityanskiy	4	-10/+90
	XSK RQs support striding RQ linear mode, but the stride size may be bigger than the XSK frame size, because: 1. The stride size must be a power of two. 2. The stride size must be equal to the UMR page size. Each XSK frame is treated as a separate page, because they aren't necessarily adjacent in physical memory, so the driver can't put more than one stride per page. 3. The minimal MTT page size is 4096 on older firmware. That means that if XSK frame size is 2048 or not a power of two, the strides may be bigger than XSK frames. Normally, it's not a problem if the hardware enforces the MTU. However, traffic between vports skips the hardware MTU check, and oversized packets may be received. If an oversized packet is bigger than the XSK frame but not bigger than the stride, it will cause overwriting of the adjacent UMEM region. If the packet takes more than one stride, they can be recycled for reuse, so it's not a problem when the XSK frame size matches the stride size. Work around the above issue by leveraging KLM to make a more fine-grained mapping. The beginning of each stride is mapped to the frame memory, and the padding up to the closest power of two is mapped to the overflow page that doesn't belong to UMEM. This way, application data corruption won't happen upon receiving packets bigger than MTU. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>