aboutsummaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2022-09-20efi: libstub: drop pointless get_memory_map() callArd Biesheuvel1-8/+0
Currently, the non-x86 stub code calls get_memory_map() redundantly, given that the data it returns is never used anywhere. So drop the call. Cc: <stable@vger.kernel.org> # v4.14+ Fixes: 24d7c494ce46 ("efi/arm-stub: Round up FDT allocation to mapping size") Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-09-20nvdimm: make __nvdimm_security_overwrite_query staticJiapeng Chong1-1/+1
This symbol is not used outside of security.c, so marks it static. drivers/nvdimm/security.c:411:6: warning: no previous prototype for function '__nvdimm_security_overwrite_query'. Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2148 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Link: https://lore.kernel.org/r/20220914061251.42052-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-09-20Merge tag 'for-6.0-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linuxLinus Torvalds2-8/+74
Pull btrfs fixes from David Sterba: - two fixes for hangs in the umount sequence where threads depend on each other and the work must be finished in the right order - in zoned mode, wait for flushing all block group metadata IO before finishing the zone * tag 'for-6.0-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: zoned: wait for extent buffer IOs before finishing a zone btrfs: fix hang during unmount when stopping a space reclaim worker btrfs: fix hang during unmount when stopping block group reclaim worker
2022-09-20Merge branch 'tcp-introduce-optional-per-netns-ehash'Jakub Kicinski28-161/+361
Kuniyuki Iwashima says: ==================== tcp: Introduce optional per-netns ehash. The more sockets we have in the hash table, the longer we spend looking up the socket. While running a number of small workloads on the same host, they penalise each other and cause performance degradation. The root cause might be a single workload that consumes much more resources than the others. It often happens on a cloud service where different workloads share the same computing resource. On EC2 c5.24xlarge instance (196 GiB memory and 524288 (1Mi / 2) ehash entries), after running iperf3 in different netns, creating 24Mi sockets without data transfer in the root netns causes about 10% performance regression for the iperf3's connection. thash_entries sockets length Gbps 524288 1 1 50.7 24Mi 48 45.1 It is basically related to the length of the list of each hash bucket. For testing purposes to see how performance drops along the length, I set 131072 (1Mi / 8) to thash_entries, and here's the result. thash_entries sockets length Gbps 131072 1 1 50.7 1Mi 8 49.9 2Mi 16 48.9 4Mi 32 47.3 8Mi 64 44.6 16Mi 128 40.6 24Mi 192 36.3 32Mi 256 32.5 40Mi 320 27.0 48Mi 384 25.0 To resolve the socket lookup degradation, we introduce an optional per-netns hash table for TCP, but it's just ehash, and we still share the global bhash, bhash2 and lhash2. With a smaller ehash, we can look up non-listener sockets faster and isolate such noisy neighbours. Also, we can reduce lock contention. For details, please see the last patch. patch 1 - 4: prep for per-netns ehash patch 5: small optimisation for netns dismantle without TIME_WAIT sockets patch 6: add per-netns ehash Many thanks to Eric Dumazet for reviewing and advising. ==================== Link: https://lore.kernel.org/r/20220908011022.45342-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20tcp: Introduce optional per-netns ehash.Kuniyuki Iwashima9-8/+164
The more sockets we have in the hash table, the longer we spend looking up the socket. While running a number of small workloads on the same host, they penalise each other and cause performance degradation. The root cause might be a single workload that consumes much more resources than the others. It often happens on a cloud service where different workloads share the same computing resource. On EC2 c5.24xlarge instance (196 GiB memory and 524288 (1Mi / 2) ehash entries), after running iperf3 in different netns, creating 24Mi sockets without data transfer in the root netns causes about 10% performance regression for the iperf3's connection. thash_entries sockets length Gbps 524288 1 1 50.7 24Mi 48 45.1 It is basically related to the length of the list of each hash bucket. For testing purposes to see how performance drops along the length, I set 131072 (1Mi / 8) to thash_entries, and here's the result. thash_entries sockets length Gbps 131072 1 1 50.7 1Mi 8 49.9 2Mi 16 48.9 4Mi 32 47.3 8Mi 64 44.6 16Mi 128 40.6 24Mi 192 36.3 32Mi 256 32.5 40Mi 320 27.0 48Mi 384 25.0 To resolve the socket lookup degradation, we introduce an optional per-netns hash table for TCP, but it's just ehash, and we still share the global bhash, bhash2 and lhash2. With a smaller ehash, we can look up non-listener sockets faster and isolate such noisy neighbours. In addition, we can reduce lock contention. We can control the ehash size by a new sysctl knob. However, depending on workloads, it will require very sensitive tuning, so we disable the feature by default (net.ipv4.tcp_child_ehash_entries == 0). Moreover, we can fall back to using the global ehash in case we fail to allocate enough memory for a new ehash. The maximum size is 16Mi, which is large enough that even if we have 48Mi sockets, the average list length is 3, and regression would be less than 1%. We can check the current ehash size by another read-only sysctl knob, net.ipv4.tcp_ehash_entries. A negative value means the netns shares the global ehash (per-netns ehash is disabled or failed to allocate memory). # dmesg | cut -d ' ' -f 5- | grep "established hash" TCP established hash table entries: 524288 (order: 10, 4194304 bytes, vmalloc hugepage) # sysctl net.ipv4.tcp_ehash_entries net.ipv4.tcp_ehash_entries = 524288 # can be changed by thash_entries # sysctl net.ipv4.tcp_child_ehash_entries net.ipv4.tcp_child_ehash_entries = 0 # disabled by default # ip netns add test1 # ip netns exec test1 sysctl net.ipv4.tcp_ehash_entries net.ipv4.tcp_ehash_entries = -524288 # share the global ehash # sysctl -w net.ipv4.tcp_child_ehash_entries=100 net.ipv4.tcp_child_ehash_entries = 100 # ip netns add test2 # ip netns exec test2 sysctl net.ipv4.tcp_ehash_entries net.ipv4.tcp_ehash_entries = 128 # own a per-netns ehash with 2^n buckets When more than two processes in the same netns create per-netns ehash concurrently with different sizes, we need to guarantee the size in one of the following ways: 1) Share the global ehash and create per-netns ehash First, unshare() with tcp_child_ehash_entries==0. It creates dedicated netns sysctl knobs where we can safely change tcp_child_ehash_entries and clone()/unshare() to create a per-netns ehash. 2) Control write on sysctl by BPF We can use BPF_PROG_TYPE_CGROUP_SYSCTL to allow/deny read/write on sysctl knobs. Note that the global ehash allocated at the boot time is spread over available NUMA nodes, but inet_pernet_hashinfo_alloc() will allocate pages for each per-netns ehash depending on the current process's NUMA policy. By default, the allocation is done in the local node only, so the per-netns hash table could fully reside on a random node. Thus, depending on the NUMA policy the netns is created with and the CPU the current thread is running on, we could see some performance differences for highly optimised networking applications. Note also that the default values of two sysctl knobs depend on the ehash size and should be tuned carefully: tcp_max_tw_buckets : tcp_child_ehash_entries / 2 tcp_max_syn_backlog : max(128, tcp_child_ehash_entries / 128) As a bonus, we can dismantle netns faster. Currently, while destroying netns, we call inet_twsk_purge(), which walks through the global ehash. It can be potentially big because it can have many sockets other than TIME_WAIT in all netns. Splitting ehash changes that situation, where it's only necessary for inet_twsk_purge() to clean up TIME_WAIT sockets in each netns. With regard to this, we do not free the per-netns ehash in inet_twsk_kill() to avoid UAF while iterating the per-netns ehash in inet_twsk_purge(). Instead, we do it in tcp_sk_exit_batch() after calling tcp_twsk_purge() to keep it protocol-family-independent. In the future, we could optimise ehash lookup/iteration further by removing netns comparison for the per-netns ehash. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20tcp: Save unnecessary inet_twsk_purge() calls.Kuniyuki Iwashima4-2/+18
While destroying netns, we call inet_twsk_purge() in tcp_sk_exit_batch() and tcpv6_net_exit_batch() for AF_INET and AF_INET6. These commands trigger the kernel to walk through the potentially big ehash twice even though the netns has no TIME_WAIT sockets. # ip netns add test # ip netns del test or # unshare -n /bin/true >/dev/null When tw_refcount is 1, we need not call inet_twsk_purge() at least for the net. We can save such unneeded iterations if all netns in net_exit_list have no TIME_WAIT sockets. This change eliminates the tax by the additional unshare() described in the next patch to guarantee the per-netns ehash size. Tested: # mount -t debugfs none /sys/kernel/debug/ # echo cleanup_net > /sys/kernel/debug/tracing/set_ftrace_filter # echo inet_twsk_purge >> /sys/kernel/debug/tracing/set_ftrace_filter # echo function > /sys/kernel/debug/tracing/current_tracer # cat ./add_del_unshare.sh for i in `seq 1 40` do (for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) & done wait; # ./add_del_unshare.sh Before the patch: # cat /sys/kernel/debug/tracing/trace_pipe kworker/u128:0-8 [031] ...1. 174.162765: cleanup_net <-process_one_work kworker/u128:0-8 [031] ...1. 174.240796: inet_twsk_purge <-cleanup_net kworker/u128:0-8 [032] ...1. 174.244759: inet_twsk_purge <-tcp_sk_exit_batch kworker/u128:0-8 [034] ...1. 174.290861: cleanup_net <-process_one_work kworker/u128:0-8 [039] ...1. 175.245027: inet_twsk_purge <-cleanup_net kworker/u128:0-8 [046] ...1. 175.290541: inet_twsk_purge <-tcp_sk_exit_batch kworker/u128:0-8 [037] ...1. 175.321046: cleanup_net <-process_one_work kworker/u128:0-8 [024] ...1. 175.941633: inet_twsk_purge <-cleanup_net kworker/u128:0-8 [025] ...1. 176.242539: inet_twsk_purge <-tcp_sk_exit_batch After: # cat /sys/kernel/debug/tracing/trace_pipe kworker/u128:0-8 [038] ...1. 428.116174: cleanup_net <-process_one_work kworker/u128:0-8 [038] ...1. 428.262532: cleanup_net <-process_one_work kworker/u128:0-8 [030] ...1. 429.292645: cleanup_net <-process_one_work Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20tcp: Access &tcp_hashinfo via net.Kuniyuki Iwashima17-78/+108
We will soon introduce an optional per-netns ehash. This means we cannot use tcp_hashinfo directly in most places. Instead, access it via net->ipv4.tcp_death_row.hashinfo. The access will be valid only while initialising tcp_hashinfo itself and creating/destroying each netns. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20tcp: Set NULL to sk->sk_prot->h.hashinfo.Kuniyuki Iwashima6-15/+24
We will soon introduce an optional per-netns ehash. This means we cannot use the global sk->sk_prot->h.hashinfo to fetch a TCP hashinfo. Instead, set NULL to sk->sk_prot->h.hashinfo for TCP and get a proper hashinfo from net->ipv4.tcp_death_row.hashinfo. Note that we need not use sk->sk_prot->h.hashinfo if DCCP is disabled. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20tcp: Don't allocate tcp_death_row outside of struct netns_ipv4.Kuniyuki Iwashima7-25/+15
We will soon introduce an optional per-netns ehash and access hash tables via net->ipv4.tcp_death_row->hashinfo instead of &tcp_hashinfo in most places. It could harm the fast path because dereferences of two fields in net and tcp_death_row might incur two extra cache line misses. To save one dereference, let's place tcp_death_row back in netns_ipv4 and fetch hashinfo via net->ipv4.tcp_death_row"."hashinfo. Note tcp_death_row was initially placed in netns_ipv4, and commit fbb8295248e1 ("tcp: allocate tcp_death_row outside of struct netns_ipv4") changed it to a pointer so that we can fire TIME_WAIT timers after freeing net. However, we don't do so after commit 04c494e68a13 ("Revert "tcp/dccp: get rid of inet_twsk_purge()""), so we need not define tcp_death_row as a pointer. Also, we move refcount_dec_and_test(&tw_refcount) from tcp_sk_exit() to tcp_sk_exit_batch() as a debug check. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20tcp: Clean up some functions.Kuniyuki Iwashima5-46/+45
This patch adds no functional change and cleans up some functions that the following patches touch around so that we make them tidy and easy to review/revert. The changes are - Keep reverse christmas tree order - Remove unnecessary init of port in inet_csk_find_open_port() - Use req_to_sk() once in reqsk_queue_unlink() - Use sock_net(sk) once in tcp_time_wait() and tcp_v[46]_connect() Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20nvdimm/region: Fix kernel-docJiapeng Chong1-1/+1
drivers/nvdimm/region_devs.c:1103: warning: expecting prototype for nvdimm_flush(). Prototype was for generic_nvdimm_flush() instead. Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2209 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Link: https://lore.kernel.org/r/20220919061428.102883-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-09-20IB/hfi1: remove rc_only_opcode and uc_only_opcode declarationsGaosheng Cui1-3/+0
rc_only_opcode and uc_only_opcode have been removed since commit b374e060cc2a ("IB/hfi1: Consolidate pio control masks into single definition"), so remove them. Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Link: https://lore.kernel.org/r/20220911092325.3216513-1-cuigaosheng1@huawei.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-20Merge tag 'fs.fixes.v6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmappingLinus Torvalds1-0/+2
Pull vfs fix from Christian Brauner: "Beginning of the merge window we introduced the vfs{g,u}id_t types in b27c82e12965 ("attr: port attribute changes to new types") and changed various codepaths over including chown_common(). When userspace passes -1 for an ownership change the ownership fields in struct iattr stay uninitialized. Usually this is fine because any code making use of any fields in struct iattr must check the ->ia_valid field whether the value of interest has been initialized. That's true for all struct iattr passing code. However, over the course of the last year with more heavy use of KMSAN we found quite a few places that got this wrong. A recent one I fixed was 3cb6ee991496 ("9p: only copy valid iattrs in 9P2000.L setattr implementation"). But we also have LSM hooks. Actually we have two. The first one is security_inode_setattr() in notify_change() which does the right thing and passes the full struct iattr down to LSMs and thus LSMs can check whether it is initialized. But then we also have security_path_chown() which passes down a path argument and the target ownership as the filesystem would see it. For the latter we now generate the target values based on struct iattr and pass it down. However, when userspace passes -1 then struct iattr isn't initialized. This patch simply initializes ->ia_vfs{g,u}id with INVALID_VFS{G,U}ID so the hook continue to see invalid ownership when -1 is passed from userspace. The only LSM that cares about the actual values is Tomoyo. The vfs codepaths don't look at these fields without ->ia_valid being set so there's no harm in initializing ->ia_vfs{g,u}id. Arguably this is also safer since we can't end up copying valid ownership values when invalid ownership values should be passed. This only affects mainline. No kernel has been released with this and thus no backport is needed. The commit is thus marked with a Fixes: tag but annotated with "# mainline only" (I didn't quite remember what Greg said about how to tell stable autoselect to not bother with fixes for mainline only)" * tag 'fs.fixes.v6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: open: always initialize ownership fields
2022-09-20efi: efibc: Guard against allocation failureGuilherme G. Piccoli1-0/+3
There is a single kmalloc in this driver, and it's not currently guarded against allocation failure. Do it here by just bailing-out the reboot handler, in case this tentative allocation fails. Fixes: 416581e48679 ("efi: efibc: avoid efivar API for setting variables") Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-09-20drm/amd/pm: Remove unneeded result variableye xingchen1-3/+1
Return the value atomctrl_initialize_mc_reg_table_v2_2() directly instead of storing it in another redundant variable. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-20drm/amd/pm: Remove the unneeded result variableye xingchen1-4/+1
Return the value append_vbios_pptable() directly instead of storing it in another redundant variable. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-20drm/amdgpu: fix initial connector audio valuehongao1-1/+6
This got lost somewhere along the way, This fixes audio not working until set_property was called. Signed-off-by: hongao <hongao@uniontech.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-20drm/amd/display: Reduce number of arguments of dml314's CalculateFlipSchedule()Nathan Chancellor1-125/+47
Most of the arguments are identical between the two call sites and they can be accessed through the 'struct vba_vars_st' pointer. This reduces the total amount of stack space that dml314_ModeSupportAndSystemConfigurationFull() uses by 112 bytes with LLVM 16 (1976 -> 1864), helping clear up the following clang warning: drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn314/display_mode_vba_314.c:4020:6: error: stack frame size (2216) exceeds limit (2048) in 'dml314_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than] void dml314_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib) ^ 1 error generated. Link: https://github.com/ClangBuiltLinux/linux/issues/1710 Reported-by: "kernelci.org bot" <bot@kernelci.org> Tested-by: Maíra Canal <mairacanal@riseup.net> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-20drm/amd/display: Reduce number of arguments of dml314's CalculateWatermarksAndDRAMSpeedChangeSupport()Nathan Chancellor1-196/+52
Most of the arguments are identical between the two call sites and they can be accessed through the 'struct vba_vars_st' pointer. This reduces the total amount of stack space that dml314_ModeSupportAndSystemConfigurationFull() uses by 240 bytes with LLVM 16 (2216 -> 1976), helping clear up the following clang warning: drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn314/display_mode_vba_314.c:4020:6: error: stack frame size (2216) exceeds limit (2048) in 'dml314_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than] void dml314_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib) ^ 1 error generated. Link: https://github.com/ClangBuiltLinux/linux/issues/1710 Reported-by: "kernelci.org bot" <bot@kernelci.org> Tested-by: Maíra Canal <mairacanal@riseup.net> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-20drm/amdgpu: don't register a dirty callback for non-atomicAlex Deucher1-1/+10
Some asics still support non-atomic code paths. Fixes: 66f99628eb2440 ("drm/amdgpu: use dirty framebuffer helper") Reported-by: Arthur Marsh <arthur.marsh@internode.on.net> Reviewed-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-20drm/amdgpu: bump minor for gang submitChristian König1-1/+2
Since that has now landed bump the minor to let userspace know about it. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-20drm/amdgpu: properly initialize return value during CSChristian König1-0/+1
The return value is no longer initialized before the loop because of moving code around. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: c2b08e7a6d27 ("drm/amdgpu: move entity selection and job init earlier during CS") Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-20drm/amd/pm: drop the pptable related workarounds for SMU 13.0.0Evan Quan2-94/+6
The pptable in the vbios is fully ready. The related workarounds in driver are not needed any more. Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-20drm/amd/pm: add support for 3794 pptable for SMU13.0.0Evan Quan1-0/+1
Enable 3794 pptable support for SMU13.0.0. Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-20drm/amdgpu: add gang submit frontend v6Christian König5-99/+195
Allows submitting jobs as gang which needs to run on multiple engines at the same time. All members of the gang get the same implicit, explicit and VM dependencies. So no gang member will start running until everything else is ready. The last job is considered the gang leader (usually a submission to the GFX ring) and used for signaling output dependencies. Each job is remembered individually as user of a buffer object, so there is no joining of work at the end. v2: rebase and fix review comments from Andrey and Yogesh v3: use READ instead of BOOKKEEP for now because of VM unmaps, set gang leader only when necessary v4: fix order of pushing jobs and adding fences found by Trigger. v5: fix job index calculation and adding IBs to jobs v6: fix typo found by Alex Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-20drm/amdgpu: add gang submit backend v2Christian König4-2/+67
Allows submitting jobs as gang which needs to run on multiple engines at the same time. Basic idea is that we have a global gang submit fence representing when the gang leader is finally pushed to run on the hardware last. Jobs submitted as gang are never re-submitted in case of a GPU reset since this won't work and will just deadlock the hardware immediately again. v2: fix logic inversion, improve documentation, fix rcu Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-20drm/amdgpu: cleanup instance limit on VCN4 v4Christian König1-19/+23
Similar to what we did for VCN3 use the job instead of the parser entity. Cleanup the coding style quite a bit as well. v2: merge improved application check into this patch v3: finally fix the check v4: limit to the correct engine Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-20drm/amdgpu: getting fan speed pwm for vega10 properlyYury Zhuravlev1-13/+12
Instead of using RPM speed, we will use a function from vega20 based on PWM registers. Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Yury Zhuravlev <stalkerg@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-20headers: Remove some left-over license textChristophe JAILLET2-4/+0
Remove a left-over from commit 2874c5fd2842 ("treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152") There is no need for an empty "License:". Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/0e5ff727626b748238f4b78932f81572143d8f0b.1662896317.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20selftests/bonding: add a test for bonding lladdr targetHangbin Liu2-0/+66
This is a regression test for commit 592335a4164c ("bonding: accept unsolicited NA message") and commit b7f14132bf58 ("bonding: use unspecified address if no available link local address"). When the bond interface up and no available link local address, unspecified address(::) is used to send the NS message. The unsolicited NA message should also be accepted for validation. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jonathan Toppins <jtoppins@redhat.com> Link: https://lore.kernel.org/r/20220920033047.173244-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20gfs2: Register fs after creating workqueuesBob Peterson1-12/+12
Before this patch, the gfs2 file system was registered prior to creating the three workqueues. In some cases this allowed dlm to send recovery work to a workqueue that did not yet exist because gfs2 was still initializing. This patch changes the order of gfs2's initialization routine so it only registers the file system after the work queues are created. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-09-20net: mdio: mux-multiplexer: Switch to use dev_err_probe() helperYang Yingliang1-6/+3
dev_err() can be replace with dev_err_probe() which will check if error code is -EPROBE_DEFER. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220915065043.665138-3-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20net: mdio: mux-mmioreg: Switch to use dev_err_probe() helperYang Yingliang1-6/+3
dev_err() can be replace with dev_err_probe() which will check if error code is -EPROBE_DEFER. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220915065043.665138-2-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20net: mdio: mux-meson-g12a: Switch to use dev_err_probe() helperYang Yingliang1-13/+7
dev_err() can be replace with dev_err_probe() which will check if error code is -EPROBE_DEFER. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220915065043.665138-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20ravb: Add RZ/G2L MII interface supportBiju Das2-1/+15
EMAC IP found on RZ/G2L Gb ethernet supports MII interface. This patch adds support for selecting MII interface mode. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Link: https://lore.kernel.org/r/20220914192604.265859-1-biju.das.jz@bp.renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20Merge tag 'execve-v6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linuxLinus Torvalds5-104/+3
Pull execve reverts from Kees Cook: "The recent work to support time namespace unsharing turns out to have some undesirable corner cases, so rather than allowing the API to stay exposed for another release, it'd be best to remove it ASAP, with the replacement getting another cycle of testing. Nothing is known to use this yet, so no userspace breakage is expected. For more details, see: https://lore.kernel.org/lkml/ed418e43ad28b8688cfea2b7c90fce1c@ispras.ru Summary: - Remove the recent 'unshare time namespace on vfork+exec' feature (Andrei Vagin)" * tag 'execve-v6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: Revert "fs/exec: allow to unshare a time namespace on vfork+exec" Revert "selftests/timens: add a test for vfork+exit"
2022-09-20net: rtnetlink: Enslave device before bringing it upPhil Sutter1-7/+7
Unlike with bridges, one can't add an interface to a bond and set it up at the same time: | # ip link set dummy0 down | # ip link set dummy0 master bond0 up | Error: Device can not be enslaved while up. Of all drivers with ndo_add_slave callback, bond and team decline if IFF_UP flag is set, vrf cycles the interface (i.e., sets it down and immediately up again) and the others just don't care. Support the common notion of setting the interface up after enslaving it by sorting the operations accordingly. Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220914150623.24152-1-phil@nwl.cc Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20Merge branch 'macb-add-zynqmp-sgmii-dynamic-configuration-support'Jakub Kicinski3-0/+98
Radhey Shyam Pandey says: ==================== macb: add zynqmp SGMII dynamic configuration support This patchset add firmware and driver support to do SD/GEM dynamic configuration. In traditional flow GEM secure space configuration is done by FSBL. However in specific usescases like dynamic designs where GEM is not enabled in base vivado design, FSBL skips GEM initialization and we need a mechanism to configure GEM secure space in linux space at runtime. ==================== Link: https://lore.kernel.org/r/1663158796-14869-1-git-send-email-radhey.shyam.pandey@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20net: macb: Add zynqmp SGMII dynamic configuration supportRadhey Shyam Pandey1-0/+22
Add support for the dynamic configuration which takes care of configuring the GEM secure space configuration registers using EEMI APIs. High level sequence is to: - Check for the PM dynamic configuration support, if no error proceed with GEM dynamic configurations(next steps) otherwise skip the dynamic configuration. - Configure GEM Fixed configurations. - Configure GEM_CLK_CTRL (gemX_sgmii_mode). - Trigger GEM reset. Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Conor Dooley <conor.dooley@microchip.com> (for MPFS) Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20firmware: xilinx: add support for sd/gem configRonak Jain2-0/+76
Add new APIs in firmware to configure SD/GEM registers. Internally it calls PM IOCTL for below SD/GEM register configuration: - SD/EMMC select - SD slot type - SD base clock - SD 8 bit support - SD fixed config - GEM SGMII Mode - GEM fixed config Signed-off-by: Ronak Jain <ronak.jain@xilinx.com> Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20HSI: nokia-modem: Replace of_gpio_count() by gpiod_count()Andy Shevchenko1-3/+1
As a preparation to unexport of_gpio_named_count(), convert the driver to use gpiod_count() instead. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2022-09-20HSI: ssi_protocol: fix potential resource leak in ssip_pn_open()Jianglei Nie1-0/+1
ssip_pn_open() claims the HSI client's port with hsi_claim_port(). When hsi_register_port_event() gets some error and returns a negetive value, the HSI client's port should be released with hsi_release_port(). Fix it by calling hsi_release_port() when hsi_register_port_event() fails. Signed-off-by: Jianglei Nie <niejianglei2021@163.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
2022-09-20net: clear msg_get_inq in __get_compat_msghdr()Tetsuo Handa1-0/+1
syzbot is still complaining uninit-value in tcp_recvmsg(), for commit 1228b34c8d0ecf6d ("net: clear msg_get_inq in __sys_recvfrom() and __copy_msghdr_from_user()") missed that __get_compat_msghdr() is called instead of copy_msghdr_from_user() when MSG_CMSG_COMPAT is specified. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: 1228b34c8d0ecf6d ("net: clear msg_get_inq in __sys_recvfrom() and __copy_msghdr_from_user()") Reviewed-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/d06d0f7f-696c-83b4-b2d5-70b5f2730a37@I-love.SAKURA.ne.jp Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20Merge branch 'ipmr-always-call-ip-6-_mr_forward-from-rcu-read-side-critical-section'Jakub Kicinski3-2/+97
Ido Schimmel says: ==================== ipmr: Always call ip{,6}_mr_forward() from RCU read-side critical section Patch #1 fixes a bug in ipmr code. Patch #2 adds corresponding test cases. ==================== Link: https://lore.kernel.org/r/20220914075339.4074096-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20selftests: forwarding: Add test cases for unresolved multicast routesIdo Schimmel1-1/+91
Add IPv4 and IPv6 test cases for unresolved multicast routes, testing that queued packets are forwarded after installing a matching (S, G) route. The test cases can be used to reproduce the bugs fixed in "ipmr: Always call ip{,6}_mr_forward() from RCU read-side critical section". Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20ipmr: Always call ip{,6}_mr_forward() from RCU read-side critical sectionIdo Schimmel2-1/+6
These functions expect to be called from RCU read-side critical section, but this only happens when invoked from the data path via ip{,6}_mr_input(). They can also be invoked from process context in response to user space adding a multicast route which resolves a cache entry with queued packets [1][2]. Fix by adding missing rcu_read_lock() / rcu_read_unlock() in these call paths. [1] WARNING: suspicious RCU usage 6.0.0-rc3-custom-15969-g049d233c8bcc-dirty #1387 Not tainted ----------------------------- net/ipv4/ipmr.c:84 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by smcrouted/246: #0: ffffffff862389b0 (rtnl_mutex){+.+.}-{3:3}, at: ip_mroute_setsockopt+0x11c/0x1420 stack backtrace: CPU: 0 PID: 246 Comm: smcrouted Not tainted 6.0.0-rc3-custom-15969-g049d233c8bcc-dirty #1387 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x91/0xb9 vif_dev_read+0xbf/0xd0 ipmr_queue_xmit+0x135/0x1ab0 ip_mr_forward+0xe7b/0x13d0 ipmr_mfc_add+0x1a06/0x2ad0 ip_mroute_setsockopt+0x5c1/0x1420 do_ip_setsockopt+0x23d/0x37f0 ip_setsockopt+0x56/0x80 raw_setsockopt+0x219/0x290 __sys_setsockopt+0x236/0x4d0 __x64_sys_setsockopt+0xbe/0x160 do_syscall_64+0x34/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd [2] WARNING: suspicious RCU usage 6.0.0-rc3-custom-15969-g049d233c8bcc-dirty #1387 Not tainted ----------------------------- net/ipv6/ip6mr.c:69 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by smcrouted/246: #0: ffffffff862389b0 (rtnl_mutex){+.+.}-{3:3}, at: ip6_mroute_setsockopt+0x6b9/0x2630 stack backtrace: CPU: 1 PID: 246 Comm: smcrouted Not tainted 6.0.0-rc3-custom-15969-g049d233c8bcc-dirty #1387 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x91/0xb9 vif_dev_read+0xbf/0xd0 ip6mr_forward2.isra.0+0xc9/0x1160 ip6_mr_forward+0xef0/0x13f0 ip6mr_mfc_add+0x1ff2/0x31f0 ip6_mroute_setsockopt+0x1825/0x2630 do_ipv6_setsockopt+0x462/0x4440 ipv6_setsockopt+0x105/0x140 rawv6_setsockopt+0xd8/0x690 __sys_setsockopt+0x236/0x4d0 __x64_sys_setsockopt+0xbe/0x160 do_syscall_64+0x34/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: ebc3197963fc ("ipmr: add rcu protection over (struct vif_device)->dev") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20xen-netfront: make bounce_skb staticruanjinjie1-1/+1
The symbol is not used outside of the file, so mark it static. Fixes the following warning: ./drivers/net/xen-netfront.c:676:16: warning: symbol 'bounce_skb' was not declared. Should it be static? Signed-off-by: ruanjinjie <ruanjinjie@huawei.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20220914064339.49841-1-ruanjinjie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20net: ipa: properly limit modem routing table useAlex Elder5-26/+32
IPA can route packets between IPA-connected entities. The AP and modem are currently the only such entities supported, and no routing is required to transfer packets between them. The number of entries in each routing table is fixed, and defined at initialization time. Some of these entries are designated for use by the modem, and the rest are available for the AP to use. The AP sends a QMI message to the modem which describes (among other things) information about routing table memory available for the modem to use. Currently the QMI initialization packet gives wrong information in its description of routing tables. What *should* be supplied is the maximum index that the modem can use for the routing table memory located at a given location. The current code instead supplies the total *number* of routing table entries. Furthermore, the modem is granted the entire table, not just the subset it's supposed to use. This patch fixes this. First, the ipa_mem_bounds structure is generalized so its "end" field can be interpreted either as a final byte offset, or a final array index. Second, the IPv4 and IPv6 (non-hashed and hashed) table information fields in the QMI ipa_init_modem_driver_req structure are changed to be ipa_mem_bounds rather than ipa_mem_array structures. Third, we set the "end" value for each routing table to be the last index, rather than setting the "count" to be the number of indices. Finally, instead of allowing the modem to use all of a routing table's memory, it is limited to just the portion meant to be used by the modem. In all versions of IPA currently supported, that is IPA_ROUTE_MODEM_COUNT (8) entries. Update a few comments for clarity. Fixes: 530f9216a9537 ("soc: qcom: ipa: AP/modem communications") Signed-off-by: Alex Elder <elder@linaro.org> Link: https://lore.kernel.org/r/20220913204602.1803004-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20bpf: Check whether or not node is NULL before free it in free_bulkHou Tao1-1/+2
llnode could be NULL if there are new allocations after the checking of c-free_cnt > c->high_watermark in bpf_mem_refill() and before the calling of __llist_del_first() in free_bulk (e.g. a PREEMPT_RT kernel or allocation in NMI context). And it will incur oops as shown below: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] PREEMPT_RT SMP CPU: 39 PID: 373 Comm: irq_work/39 Tainted: G W 6.0.0-rc6-rt9+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) RIP: 0010:bpf_mem_refill+0x66/0x130 ...... Call Trace: <TASK> irq_work_single+0x24/0x60 irq_work_run_list+0x24/0x30 run_irq_workd+0x18/0x20 smpboot_thread_fn+0x13f/0x2c0 kthread+0x121/0x140 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Simply fixing it by checking whether or not llnode is NULL in free_bulk(). Fixes: 8d5a8011b35d ("bpf: Batch call_rcu callbacks instead of SLAB_TYPESAFE_BY_RCU.") Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20220919144811.3570825-1-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-20net: phy: micrel: Add interrupts support for LAN8804 PHYHoratiu Vultur1-0/+62
Add support for interrupts for LAN8804 PHY. Tested-by: Michael Walle <michael@walle.cc> # on kontron-kswitch-d10 Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Link: https://lore.kernel.org/r/20220913142926.816746-1-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>