author | 2025-04-09 10:14:51 -0700 | |
---|---|---|
committer | 2025-06-05 11:09:43 -0700 | |
commit | be17c0df67959fe4f88dac75dc26ed9252d4b133 (patch) | |
tree | f00beb7f8a1b319cc911f400b1b945cc731eebb4 /scripts/lib/kdoc/kdoc_files.py | |
parent | Merge patch series "riscv: ftrace: atmoic patching and preempt improvements" (diff) | |
riscv: module: Optimize PLT/GOT entry counting
perf reports that 99.63% of the cycles from `modprobe amdgpu` are spent
inside module_frob_arch_sections(). This is because amdgpu.ko contains
about 300000 relocations in its .rela.text section, and the algorithm in
count_max_entries() takes quadratic time.
Apply two optimizations from the arm64 code, which together reduce the
total execution time by 99.58%. First, sort the relocations so duplicate
entries are adjacent. Second, reduce the number of relocations that must
be sorted by filtering to only relocations that need PLT/GOT entries, as
done in commit d4e0340919fb ("arm64/module: Optimize module load time by
optimizing PLT counting").
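To illustrate why sorting helps, here is a minimal user-space sketch, not the kernel code. The `struct rela` layout, the comparison function, and both counting helpers are hypothetical stand-ins; the real code works on ELF RELA entries and, per the message above, counts only relocations that need PLT/GOT entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative stand-in for an ELF RELA entry; not the kernel's type. */
struct rela {
	uint64_t info;   /* symbol and relocation type packed together */
	int64_t addend;
};

/* Total order on (info, addend) so that duplicates compare equal. */
static int cmp_rela(const void *a, const void *b)
{
	const struct rela *x = a, *y = b;

	if (x->info != y->info)
		return x->info < y->info ? -1 : 1;
	if (x->addend != y->addend)
		return x->addend < y->addend ? -1 : 1;
	return 0;
}

/* Quadratic: each entry is compared against every earlier entry. */
static size_t count_unique_naive(const struct rela *r, size_t n)
{
	size_t unique = 0;

	for (size_t i = 0; i < n; i++) {
		size_t j;

		for (j = 0; j < i; j++)
			if (cmp_rela(&r[i], &r[j]) == 0)
				break;
		if (j == i)	/* no earlier duplicate found */
			unique++;
	}
	return unique;
}

/* O(n log n): sort in place, then compare each entry with its predecessor. */
static size_t count_unique_sorted(struct rela *r, size_t n)
{
	size_t unique = 0;

	assert(r != NULL || n == 0);
	if (n == 0)
		return 0;
	qsort(r, n, sizeof(*r), cmp_rela);
	for (size_t i = 0; i < n; i++)
		if (i == 0 || cmp_rela(&r[i], &r[i - 1]) != 0)
			unique++;
	return unique;
}
```

Both helpers return the same count for the same input. Only the second reorders the array, which is acceptable only when nothing else depends on the original order.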
Unlike the arm64 code, here the filtering and sorting are done in a
scratch buffer, because the HI20 relocation search optimization in
apply_relocate_add() depends on the original order of the relocations.
Using a scratch buffer also allows accumulating PLT/GOT relocations
across sections, so sorting and counting are done only once per module.
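The scratch-buffer approach described above could look roughly like the following user-space sketch. Everything here is an assumption made for illustration: the `struct reloc` and `struct section` layouts, the `TYPE_NEEDS_PLT` and `TYPE_PLAIN` codes, and the `count_plt_entries()` helper are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical relocation record and type codes, for illustration only. */
struct reloc {
	uint32_t type;
	uint32_t sym;
	int64_t addend;
};

enum { TYPE_NEEDS_PLT = 1, TYPE_PLAIN = 2 };

/* One relocation section; the entries are read-only to the counter. */
struct section {
	const struct reloc *relocs;
	size_t count;
};

static int cmp_reloc(const void *a, const void *b)
{
	const struct reloc *x = a, *y = b;

	if (x->type != y->type)
		return x->type < y->type ? -1 : 1;
	if (x->sym != y->sym)
		return x->sym < y->sym ? -1 : 1;
	if (x->addend != y->addend)
		return x->addend < y->addend ? -1 : 1;
	return 0;
}

/*
 * Copy the qualifying relocations from every section into one scratch
 * buffer, sort the copy once, and count distinct entries. The input
 * sections keep their original order. Returns -1 if allocation fails.
 */
static long count_plt_entries(const struct section *secs, size_t nsecs)
{
	size_t total = 0, kept = 0;
	struct reloc *scratch;
	long unique = 0;

	assert(secs != NULL || nsecs == 0);
	for (size_t s = 0; s < nsecs; s++)
		total += secs[s].count;
	if (total == 0)
		return 0;

	scratch = malloc(total * sizeof(*scratch));
	if (!scratch)
		return -1;

	/* Filter first so that only relevant entries are sorted. */
	for (size_t s = 0; s < nsecs; s++)
		for (size_t i = 0; i < secs[s].count; i++)
			if (secs[s].relocs[i].type == TYPE_NEEDS_PLT)
				scratch[kept++] = secs[s].relocs[i];

	qsort(scratch, kept, sizeof(*scratch), cmp_reloc);
	for (size_t i = 0; i < kept; i++)
		if (i == 0 || cmp_reloc(&scratch[i], &scratch[i - 1]) != 0)
			unique++;

	free(scratch);
	return unique;
}
```

Because the sort touches only the copy, code that later walks the sections in their original order is unaffected, and a duplicate that appears in two different sections is counted once.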
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250409171526.862481-3-samuel.holland@sifive.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>