Age | Commit message (Collapse) | Author | Files | Lines |
|
It is complicated to add mocked-up symbols for pre-handling CRC.
Handle CRC after all the export symbols in the relevant module
are registered.
Call handle_modversion() after the handle_symbol() iteration.
In some cases, I see atand-alone __crc_* without __ksymtab_*.
For example, ARCH=arm allyesconfig produces __crc_ccitt_veneer and
__crc_itu_t_veneer. I guess they come from crc_ccitt, crc_itu_t,
respectively. Since __*_veneer are auto-generated symbols, just
ignore them.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
This function handles not only modversions, but also unresolved
symbols, export symbols, etc.
Rename it to a more proper function name.
While I was here, I also added the 'const' qualifier to *sym.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
Currently, namespace_from_kstrtabns() relies on the fact that
namespace strings are recorded in the __ksymtab_strings section.
Actually, it is coded in include/linux/export.h, but modpost does
not need to hard-code the section name.
Elf_Sym::st_shndx holds the index of the relevant section. Using it is
a more portable way to get the namespace string.
Make namespace_from_kstrtabns() simply call sym_get_data(), and delete
the info->ksymtab_strings .
While I was here, I added more 'const' qualifiers to pointers.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
When CONFIG_MODULE_REL_CRCS is enabled, the value of __crc_* is not
an absolute value, but the address to the CRC data embedded in the
.rodata section.
Getting the data pointed by the symbol value is somewhat complex.
Split it out into a new helper, sym_get_data().
I will reuse it to refactor namespace_from_kstrtabns() in the next
commit.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
Currently, some sanity checks for uapi headers are done by
scripts/headers_check.pl, which is wired up to the 'headers_check'
target in the top Makefile.
It is true compiling headers has better test coverage, but there
are still several headers excluded from the compile test. I like
to keep headers_check.pl for a while, but we can delete a lot of
code by moving the build rule to usr/include/Makefile.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
There are both positive and negative options about this feature.
At first, I thought it was a good idea, but actually Linus stated a
negative opinion (https://lkml.org/lkml/2019/9/29/227). I admit it
is ugly and annoying.
The baseline I'd like to keep is the compile-test of uapi headers.
(Otherwise, kernel developers have no way to ensure the correctness
of the exported headers.)
I will maintain a small build rule in usr/include/Makefile.
Remove the other header test functionality.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
GNU Make manual says:
$?
The names of all the prerequisites that are newer than the target,
with spaces between them.
To reflect this, rename any-prereq to newer-prereqs, which is clearer
and more intuitive.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
The incremental build of Linux kernel is pretty slow when lots of
objects are compiled. The rebuild of allmodconfig may take a few
minutes even when none of the objects needs to be rebuilt.
The time-consuming part in the incremental build is the evaluation of
if_changed* macros since they are used in the recipes to compile C and
assembly source files into objects.
I notice the following code in if_changed* is expensive:
$(filter-out $(PHONY) $(wildcard $^),$^)
In the incremental build, every object has its .*.cmd file, which
contains the auto-generated list of included headers. So, $^ are
expanded into the long list of the source file + included headers,
and $(wildcard $^) checks whether they exist.
It may not be clear why this check exists there.
Here is the record of my research.
[1] The first code addition into Kbuild
This code dates back to 2002. It is the pre-git era. So, I copy-pasted
it from the historical git tree.
| commit 4a6db0791528c220655b063cf13fefc8470dbfee (HEAD)
| Author: Kai Germaschewski <kai@tp1.ruhr-uni-bochum.de>
| Date: Mon Jun 17 00:22:37 2002 -0500
|
| kbuild: Handle removed headers
|
| New and old way to handle dependencies would choke when a file
| #include'd by other files was removed, since the dependency on it was
| still recorded, but since it was gone, make has no idea what to do about
| it (and would complain with "No rule to make <file> ...")
|
| We now add targets for all the previously included files, so make will
| just ignore them if they disappear.
|
| diff --git a/Rules.make b/Rules.make
| index 6ef827d3df39..7db5301ea7db 100644
| --- a/Rules.make
| +++ b/Rules.make
| @@ -446,7 +446,7 @@ if_changed = $(if $(strip $? \
| # execute the command and also postprocess generated .d dependencies
| # file
|
| -if_changed_dep = $(if $(strip $? \
| +if_changed_dep = $(if $(strip $? $(filter-out FORCE $(wildcard $^),$^)\
| $(filter-out $(cmd_$(1)),$(cmd_$@))\
| $(filter-out $(cmd_$@),$(cmd_$(1)))),\
| @set -e; \
| diff --git a/scripts/fixdep.c b/scripts/fixdep.c
| index b5d7bee8efc7..db45bd1888c0 100644
| --- a/scripts/fixdep.c
| +++ b/scripts/fixdep.c
| @@ -292,7 +292,7 @@ void parse_dep_file(void *map, size_t len)
| exit(1);
| }
| memcpy(s, m, p-m); s[p-m] = 0;
| - printf("%s: \\\n", target);
| + printf("deps_%s := \\\n", target);
| m = p+1;
|
| clear_config();
| @@ -314,7 +314,8 @@ void parse_dep_file(void *map, size_t len)
| }
| m = p + 1;
| }
| - printf("\n");
| + printf("\n%s: $(deps_%s)\n\n", target, target);
| + printf("$(deps_%s):\n", target);
| }
|
| void print_deps(void)
The "No rule to make <file> ..." error can be solved by passing -MP to
the compiler, but I think the detection of header removal is a good
feature. When a header is removed, all source files that previously
included it should be re-compiled. This makes sure we has correctly
got rid of #include directives of it.
This is also related with the behavior of $?. The GNU Make manual says:
$?
The names of all the prerequisites that are newer than the target,
with spaces between them.
This does not explain whether a non-existent prerequisite is considered
to be newer than the target.
At this point of time, GNU Make 3.7x was used, where the $? did not
include non-existent prerequisites. Therefore,
$(filter-out FORCE $(wildcard $^),$^)
was useful to detect the header removal, and to rebuild the related
objects if it is the case.
[2] Change of $? behavior
Later, the behavior of $? was changed (fixed) to include prerequisites
that did not exist.
First, GNU Make commit 64e16d6c00a5 ("Various changes getting ready for
the release of 3.81.") changed it, but in the release test of 3.81, it
turned out to break the kernel build.
See these:
- http://lists.gnu.org/archive/html/bug-make/2006-03/msg00003.html
- https://savannah.gnu.org/bugs/?16002
- https://savannah.gnu.org/bugs/?16051
Then, GNU Make commit 6d8d9b74d9c5 ("Numerous updates to tests for
issues found on Cygwin and Windows.") reverted it for the 3.81 release
to give Linux kernel time to adjust to the new behavior.
After the 3.81 release, GNU Make commit 7595f38f62af ("Fixed a number
of documentation bugs, plus some build/install issues:") re-added it.
[3] Adjustment to the new $? behavior on Kbuild side
Meanwhile, the kernel build was changed by commit 4f1933620f57 ("kbuild:
change kbuild to not rely on incorrect GNU make behavior") to adjust to
the new $? behavior.
[4] GNU Make 3.82 released in 2010
GNU Make 3.82 was the first release that integrated the correct $?
behavior. At this point, Kbuild dealt with GNU Make versions with
different $? behaviors.
3.81 or older:
$? does not contain any non-existent prerequisite.
$(filter-out $(PHONY) $(wildcard $^),$^) was useful to detect
removed include headers.
3.82 or newer:
$? contains non-existent prerequisites. When a header is removed,
it appears in $?. $(filter-out $(PHONY) $(wildcard $^),$^) became
a redundant check.
With the correct $? behavior, we could have dropped the expensive
check for 3.82 or later, but we did not. (Maybe nobody noticed this
optimization.)
[5] The .SECONDARY special target trips up $?
Some time later, I noticed $? did not work as expected under some
circumstances. As above, $? should contain non-existent prerequisites,
but the ones specified as SECONDARY do not appear in $?.
I asked this in GNU Make ML, and it seems a bug:
https://lists.gnu.org/archive/html/bug-make/2019-01/msg00001.html
Since commit 8e9b61b293d9 ("kbuild: move .SECONDARY special target to
Kbuild.include"), all files, including headers listed in .*.cmd files,
are treated as secondary.
So, we are back into the incorrect $? behavior.
If we Kbuild want to react to the header removal, we need to keep
$(filter-out $(PHONY) $(wildcard $^),$^) but this makes the rebuild
so slow.
[Summary]
- I believe noticing the header removal and recompiling related objects
is a nice feature for the build system.
- If $? worked correctly, $(filter-out $(PHONY),$?) would be enough
to detect the header removal.
- Currently, $? does not work correctly when used with .SECONDARY,
and Kbuild is hit by this bug.
- I filed a bug report for this, but not fixed yet as of writing.
- Currently, the header removal is detected by the following expensive
code:
$(filter-out $(PHONY) $(wildcard $^),$^)
- I do not want to revert commit 8e9b61b293d9 ("kbuild: move
.SECONDARY special target to Kbuild.include"). Specifying
.SECONDARY globally is clean, and it matches to the Kbuild policy.
This commit proactively removes the expensive check since it makes the
incremental build faster. A downside is Kbuild will no longer be able
to notice the header removal.
You can confirm it by the full-build followed by a header removal, and
then re-build.
$ make defconfig all
[ full build ]
$ rm include/linux/device.h
$ make
CALL scripts/checksyscalls.sh
CALL scripts/atomic/check-atomics.sh
DESCEND objtool
CHK include/generated/compile.h
Kernel: arch/x86/boot/bzImage is ready (#11)
Building modules, stage 2.
MODPOST 12 modules
Previously, Kbuild noticed a missing header and emits a build error.
Now, Kbuild is fine with it. This is an unusual corner-case, not a big
deal. Once the $? bug is fixed in GNU Make, everything will work fine.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
Since commit 707816c8b050 ("netfilter: remove deprecation warnings from
uapi headers."), you can compile linux/netfilter_ipv4/ipt_LOG.h and
linux/netfilter_ipv6/ip6t_LOG.h without warnings.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
The local variable, ns_entry, is unneeded.
While I was here, I also cleaned up the comparison with NULL or 0.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Matthias Maennich <maennich@google.com>
|
|
scripts/nsdeps is written to take care of only in-tree modules.
Perhaps, this is not a bug, but just a design. At least,
Documentation/core-api/symbol-namespaces.rst focuses on in-tree modules.
Having said that, some people already tried nsdeps for external modules.
So, it would be nice to support it.
Reported-by: Steve French <smfrench@gmail.com>
Reported-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Tested-by: Jessica Yu <jeyu@kernel.org>
Acked-by: Jessica Yu <jeyu@kernel.org>
Reviewed-by: Matthias Maennich <maennich@google.com>
Tested-by: Matthias Maennich <maennich@google.com>
|
|
The modpost, with the -d option given, generates per-module .ns_deps
files.
Kbuild generates per-module .mod files to carry module information.
This is convenient because Make handles multiple jobs in parallel
when the -j option is given.
On the other hand, the modpost always runs as a single thread.
I do not see a strong reason to produce separate .ns_deps files.
This commit changes the modpost to generate just one file,
modules.nsdeps, each line of which has the following format:
<module_name>: <list of missing namespaces>
Please note it contains *missing* namespaces instead of required ones.
So, modules.nsdeps is empty if the namespace dependency is all good.
This will work more efficiently because spatch will no longer process
already imported namespaces. I removed the '(if needed)' from the
nsdeps log since spatch is invoked only when needed.
This also solves the stale .ns_deps problem reported by Jessica Yu:
https://lkml.org/lkml/2019/10/28/467
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Tested-by: Jessica Yu <jeyu@kernel.org>
Acked-by: Jessica Yu <jeyu@kernel.org>
Reviewed-by: Matthias Maennich <maennich@google.com>
Tested-by: Matthias Maennich <maennich@google.com>
|
|
buf_write() allocates memory. Free it.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
'make nsdeps' invokes the modpost three times at most; before linking
vmlinux, before building modules, and finally for generating .ns_deps
files. Running the modpost again and again is not efficient.
The last two can be unified. When the -d option is given, the modpost
still does the usual job, and in addition, generates .ns_deps files.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Tested-by: Matthias Maennich <maennich@google.com>
Reviewed-by: Matthias Maennich <maennich@google.com>
|
|
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Alexander Kapshuk <alexander.kapshuk@gmail.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
If ncurses is installed, but at a non-default location, the previous
error message was not helpful in resolving the situation. Now it will
suggest that pkg-config might need to be installed in addition to
ncurses.
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
make listnewconfig will list the individual options that need to be set.
This is useful but there's no easy way to get the help text associated
with the options at the same time. Introduce a new targe
'make helpnewconfig' which lists the full help text of all the
new options as well. This makes it easier to automatically generate
changes that are easy for humans to review. This command also adds
markers between each option for easier parsing.
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
Add a 'dir-pkg' target which just creates the same directory structures
as in tar-pkg, but doesn't package anything.
Useful when the user wants to copy the kernel tree on a machine using
ssh, rsync or whatever.
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
There are 6 defconfigs with names longer than 24 characters, breaking
alignment in "make help".
The "winner" is "ecovec24-romimage_defconfig", counting in at 27
characters.
Extend the defconfig field size to 27 to restore alignment.
Don't use a larger value, to not encourage people to create even longer
defconfig names.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
Some "make help" text lines extend beyond 80 characters.
Wrap them before an opening parenthesis, or before 80 characters.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
This patch replaces backquote to dollar parenthesis syntax for better
readability.
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Nico Schottelius <nico-linuxsetlocalversion@schottelius.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
EXPORT_SYMBOL from assembly code produces an unused symbol __kcrctab_*.
kcrctab is used as a section name (prefixed with three underscores),
but never used as a symbol.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
For EXPORT_SYMBOL from C files, <linux/export.h> defines __ksymtab_*
as local symbols.
For EXPORT_SYMBOL from assembly, in contrast, <asm-generic/export.h>
produces globally-visible __ksymtab_* symbols due to this .globl
directive.
I do not know why this must be global. It still works without this.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
Since commit 394053f4a4b3 ("kbuild: make single targets work more
correctly"), building single targets is really slow.
Speed it up by not descending into unrelated directories.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
KBUILD_SINGLE_TARGETS does not need to contain all the targets.
Change it to keep track the targets only from the current directory
and its subdirectories.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
When single-build is set, everything in $(MAKECMDGOALS) is a single
target. You can use $(MAKECMDGOALS) to list out the single targets.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
This comment block explains why include/generated/compile.h is omitted,
but nothing about include/generated/autoconf.h, which might be more
difficult to understand. Add more comments.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
'pushd' ... 'popd' is the last bash-specific code in this script.
One way to avoid it is to run the code in a sub-shell.
With that addressed, you can run this script with sh.
I replaced $(BASH) with $(CONFIG_SHELL), and I changed the hashbang
to #!/bin/sh.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
This script copies headers by the cpio command twice; first from
srctree, and then from objtree. However, when we building in-tree,
we know the srctree and the objtree are the same. That is, all the
headers copied by the first cpio are overwritten by the second one.
Skip the first cpio when we are building in-tree.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
This script computes md5sum of headers in srctree and in objtree.
However, when we are building in-tree, we know the srctree and the
objtree are the same. That is, we end up with the same computation
twice. In fact, the first two lines of kernel/kheaders.md5 are always
the same for in-tree builds.
Unify the two md5sum calculations.
For in-tree builds ($building_out_of_srctree is empty), we check
only two directories, "include", and "arch/$SRCARCH/include".
For out-of-tree builds ($building_out_of_srctree is 1), we check
4 directories, "$srctree/include", "$srctree/arch/$SRCARCH/include",
"include", and "arch/$SRCARCH/include" since we know they are all
different.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
The 'head' and 'tail' commands can take a file path directly.
So, you do not need to run 'cat'.
cat kernel/kheaders.md5 | head -1
... is equivalent to:
head -1 kernel/kheaders.md5
and the latter saves forking one process.
While I was here, I replaced 'head -1' with 'head -n 1'.
I also replaced '==' with '=' since we do not have a good reason to
use the bashism.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
Since commit 040fcc819a2e ("kbuild: improved modversioning support for
external modules"), the external module build reads Module.symvers in
the directory of the module itself, then dumps symbols back into it.
It accumulates stale symbols in the file when you build an external
module incrementally.
The idea behind it was, as the commit log explained, you can copy
Modules.symvers from one module to another when you need to pass symbol
information between two modules. However, the manual copy of the file
sounds questionable to me, and containing stale symbols is a downside.
Some time later, commit 0d96fb20b7ed ("kbuild: Add new Kbuild variable
KBUILD_EXTRA_SYMBOLS") introduced a saner approach.
So, this commit removes the former one. Going forward, the external
module build dumps symbols into Module.symvers to be carried via
KBUILD_EXTRA_SYMBOLS, but never reads it automatically.
With the -I option removed, there is no one to set the external_module
flag unless KBUILD_EXTRA_SYMBOLS is passed. Now the -i option does it
instead.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
When building external modules, $(objtree)/Module.symvers is scanned
for symbol information of vmlinux and in-tree modules.
Additionally, vmlinux is parsed if it exists in $(objtree)/.
This is totally redundant since all the necessary information is
contained in $(objtree)/Module.symvers.
Do not parse vmlinux at all for external module builds. This makes
sense because vmlinux is deleted by 'make clean'.
'make clean' leaves all the build artifacts for building external
modules. vmlinux is unneeded for that.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
The comment line "When building external modules ..." explains
the same thing as "Include the module's Makefile ..." a few lines
below.
The comment "they may be used when building the .mod.c file" is no
longer true; .mod.c file is compiled in scripts/Makefile.modfinal
since commit 9b9a3f20cbe0 ("kbuild: split final module linking out
into Makefile.modfinal"). I still keep the code in case $(obj) or
$(src) is used in the external module Makefile.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
util.c exists both in scripts/kconfig/ and scripts/kconfig/lxdialog.
Prior to commit 54b8ae66ae1a ("kbuild: change *FLAGS_<basetarget>.o
to take the path relative to $(obj)"), Kbuild could not pass different
flags to source files with the same basename. Now that this issue
was solved, you can split util.c out of parser.y and compile them
independently of each other.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
This tool is only used by drivers/video/logo/Makefile. No reason to
keep it in scripts/.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
Shorten the code. It still works in the same way.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
These are listed in include/uapi/asm-generic/Kbuild, so Kbuild will
automatically generate them.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
Remove hexagon-specific bitsperlong.h so that it falls back to
include/uapi/asm-generic/bitsperlong.h
Kbuild will automatically create a wrapper of it.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
|
|
config option GENERIC_IO was removed but still selected by lib/kconfig
This patch finish the cleaning.
Fixes: 9de8da47742b ("kconfig: kill off GENERIC_IO option")
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The need_wakeup flag for Tx might not be set for AF_XDP sockets that
are only used to send packets. This happens if there is at least one
outstanding packet that has not been completed by the hardware and we
get that corresponding completion (which will not generate an
interrupt since interrupts are disabled in the napi poll loop) between
the time we stopped processing the Tx completions and interrupts are
enabled again. In this case, the need_wakeup flag will have been
cleared at the end of the Tx completion processing as we believe we
will get an interrupt from the outstanding completion at a later point
in time. But if this completion interrupt occurs before interrupts
are enable, we lose it and should at that point really have set the
need_wakeup flag since there are no more outstanding completions that
can generate an interrupt to continue the processing. When this
happens, user space will see a Tx queue need_wakeup of 0 and skip
issuing a syscall, which means will never get into the Tx processing
again and we have a deadlock.
This patch introduces a quick fix for this issue by just setting the
need_wakeup flag for Tx to 1 all the time. I am working on a proper
fix for this that will toggle the flag appropriately, but it is more
challenging than I anticipated and I am afraid that this patch will
not be completed before the merge window closes, therefore this easier
fix for now. This fix has a negative performance impact in the range
of 0% to 4%. Towards the higher end of the scale if you have driver
and application on the same core and issue a lot of packets, and
towards no negative impact if you use two cores, lower transmission
speeds and/or a workload that also receives packets.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
The need_wakeup flag for Tx might not be set for AF_XDP sockets that
are only used to send packets. This happens if there is at least one
outstanding packet that has not been completed by the hardware and we
get that corresponding completion (which will not generate an
interrupt since interrupts are disabled in the napi poll loop) between
the time we stopped processing the Tx completions and interrupts are
enabled again. In this case, the need_wakeup flag will have been
cleared at the end of the Tx completion processing as we believe we
will get an interrupt from the outstanding completion at a later point
in time. But if this completion interrupt occurs before interrupts
are enable, we lose it and should at that point really have set the
need_wakeup flag since there are no more outstanding completions that
can generate an interrupt to continue the processing. When this
happens, user space will see a Tx queue need_wakeup of 0 and skip
issuing a syscall, which means will never get into the Tx processing
again and we have a deadlock.
This patch introduces a quick fix for this issue by just setting the
need_wakeup flag for Tx to 1 all the time. I am working on a proper
fix for this that will toggle the flag appropriately, but it is more
challenging than I anticipated and I am afraid that this patch will
not be completed before the merge window closes, therefore this easier
fix for now. This fix has a negative performance impact in the range
of 0% to 4%. Towards the higher end of the scale if you have driver
and application on the same core and issue a lot of packets, and
towards no negative impact if you use two cores, lower transmission
speeds and/or a workload that also receives packets.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
When implementing launch time support in the igb and igc drivers, the
skb->tstamp value is assumed to be a s64, but it's declared as a ktime_t
value.
Although ktime_t is typedef'd to s64 it wasn't always, and the kernel
provides accessors for ktime_t values.
Use the ktime_to_timespec64 and ktime_set accessors instead of directly
assuming that the variable is always an s64.
This improves portability if the code is ever moved to another kernel
version, or if the definition of ktime_t ever changes again in the
future.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
This patch contains fix for a problem with command:
'ethtool -m <dev>'
which breaks functionality of:
'ethtool <dev>'
when called on X722 NIC
Disallowed update of link phy_types on X722 NIC
Currently correct value cannot be obtained from FW
Previously wrong value returned by FW was used and was
a root cause for incorrect output of 'ethtool <dev>' command
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Since commit 92418fb14750 ("i40e/i40evf: Use usec value instead of reg
value for ITR defines") the driver tracks the interrupt throttling
intervals in single usec units, although the actual ITRN registers are
programmed in 2 usec units. Most register programming flows in the driver
correctly handle the conversion, although it is currently not applied when
the registers are initialized to their default values. Most of the time
this doesn't present a problem since the default values are usually
immediately overwritten through the standard adaptive throttling mechanism,
or updated manually by the user, but if adaptive throttling is disabled and
the interval values are left alone then the incorrect value will persist.
Since the intended default interval of 50 usecs (vs. 100 usecs as
programmed) performs better for most traffic workloads, this can lead to
performance regressions.
This patch adds the correct conversion when writing the initial values to
the ITRN registers.
Signed-off-by: Nicholas Nunley <nicholas.d.nunley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Currently the for-loop counter i is a u8 however it is being checked
against a maximum value hw->num_tx_sched_layers which is a u16. Hence
there is a potential wrap-around of counter i back to zero if
hw->num_tx_sched_layers is greater than 255. Fix this by making i
a u16.
Addresses-Coverity: ("Infinite loop")
Fixes: b36c598c999c ("ice: Updates to Tx scheduler code")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
While rebooting the system with SR-IOV vfs enabled leads
to below crash due to recurrence of __qede_remove() on the VF
devices (first from .shutdown() flow of the VF itself and
another from PF's .shutdown() flow executing pci_disable_sriov())
This patch adds a safeguard in __qede_remove() flow to fix this,
so that driver doesn't attempt to remove "already removed" devices.
[ 194.360134] BUG: unable to handle kernel NULL pointer dereference at 00000000000008dc
[ 194.360227] IP: [<ffffffffc03553c4>] __qede_remove+0x24/0x130 [qede]
[ 194.360304] PGD 0
[ 194.360325] Oops: 0000 [#1] SMP
[ 194.360360] Modules linked in: tcp_lp fuse tun bridge stp llc devlink bonding ip_set nfnetlink ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib ib_umad rpcrdma sunrpc rdma_ucm ib_uverbs ib_iser rdma_cm iw_cm ib_cm libiscsi scsi_transport_iscsi dell_smbios iTCO_wdt iTCO_vendor_support dell_wmi_descriptor dcdbas vfat fat pcc_cpufreq skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd qedr ib_core pcspkr ses enclosure joydev ipmi_ssif sg i2c_i801 lpc_ich mei_me mei wmi ipmi_si ipmi_devintf ipmi_msghandler tpm_crb acpi_pad acpi_power_meter xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel mgag200
[ 194.361044] qede i2c_algo_bit drm_kms_helper qed syscopyarea sysfillrect nvme sysimgblt fb_sys_fops ttm nvme_core mpt3sas crc8 ptp drm pps_core ahci raid_class scsi_transport_sas libahci libata drm_panel_orientation_quirks nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ip_tables]
[ 194.361297] CPU: 51 PID: 7996 Comm: reboot Kdump: loaded Not tainted 3.10.0-1062.el7.x86_64 #1
[ 194.361359] Hardware name: Dell Inc. PowerEdge MX840c/0740HW, BIOS 2.4.6 10/15/2019
[ 194.361412] task: ffff9cea9b360000 ti: ffff9ceabebdc000 task.ti: ffff9ceabebdc000
[ 194.361463] RIP: 0010:[<ffffffffc03553c4>] [<ffffffffc03553c4>] __qede_remove+0x24/0x130 [qede]
[ 194.361534] RSP: 0018:ffff9ceabebdfac0 EFLAGS: 00010282
[ 194.361570] RAX: 0000000000000000 RBX: ffff9cd013846098 RCX: 0000000000000000
[ 194.361621] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9cd013846098
[ 194.361668] RBP: ffff9ceabebdfae8 R08: 0000000000000000 R09: 0000000000000000
[ 194.361715] R10: 00000000bfe14201 R11: ffff9ceabfe141e0 R12: 0000000000000000
[ 194.361762] R13: ffff9cd013846098 R14: 0000000000000000 R15: ffff9ceab5e48000
[ 194.361810] FS: 00007f799c02d880(0000) GS:ffff9ceacb0c0000(0000) knlGS:0000000000000000
[ 194.361865] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 194.361903] CR2: 00000000000008dc CR3: 0000001bdac76000 CR4: 00000000007607e0
[ 194.361953] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 194.362002] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 194.362051] PKRU: 55555554
[ 194.362073] Call Trace:
[ 194.362109] [<ffffffffc0355500>] qede_remove+0x10/0x20 [qede]
[ 194.362180] [<ffffffffb97d0f3e>] pci_device_remove+0x3e/0xc0
[ 194.362240] [<ffffffffb98b3c52>] __device_release_driver+0x82/0xf0
[ 194.362285] [<ffffffffb98b3ce3>] device_release_driver+0x23/0x30
[ 194.362343] [<ffffffffb97c86d4>] pci_stop_bus_device+0x84/0xa0
[ 194.362388] [<ffffffffb97c87e2>] pci_stop_and_remove_bus_device+0x12/0x20
[ 194.362450] [<ffffffffb97f153f>] pci_iov_remove_virtfn+0xaf/0x160
[ 194.362496] [<ffffffffb97f1aec>] sriov_disable+0x3c/0xf0
[ 194.362534] [<ffffffffb97f1bc3>] pci_disable_sriov+0x23/0x30
[ 194.362599] [<ffffffffc02f83c3>] qed_sriov_disable+0x5e3/0x650 [qed]
[ 194.362658] [<ffffffffb9622df6>] ? kfree+0x106/0x140
[ 194.362709] [<ffffffffc02cc0c0>] ? qed_free_stream_mem+0x70/0x90 [qed]
[ 194.362754] [<ffffffffb9622df6>] ? kfree+0x106/0x140
[ 194.362803] [<ffffffffc02cd659>] qed_slowpath_stop+0x1a9/0x1d0 [qed]
[ 194.362854] [<ffffffffc035544e>] __qede_remove+0xae/0x130 [qede]
[ 194.362904] [<ffffffffc03554e0>] qede_shutdown+0x10/0x20 [qede]
[ 194.362956] [<ffffffffb97cf90a>] pci_device_shutdown+0x3a/0x60
[ 194.363010] [<ffffffffb98b180b>] device_shutdown+0xfb/0x1f0
[ 194.363066] [<ffffffffb94b66c6>] kernel_restart_prepare+0x36/0x40
[ 194.363107] [<ffffffffb94b66e2>] kernel_restart+0x12/0x60
[ 194.363146] [<ffffffffb94b6959>] SYSC_reboot+0x229/0x260
[ 194.363196] [<ffffffffb95f200d>] ? handle_mm_fault+0x39d/0x9b0
[ 194.363253] [<ffffffffb942b621>] ? __switch_to+0x151/0x580
[ 194.363304] [<ffffffffb9b7ec28>] ? __schedule+0x448/0x9c0
[ 194.363343] [<ffffffffb94b69fe>] SyS_reboot+0xe/0x10
[ 194.363387] [<ffffffffb9b8bede>] system_call_fastpath+0x25/0x2a
[ 194.363430] Code: f9 e9 37 ff ff ff 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 4c 8d af 98 00 00 00 41 54 4c 89 ef 41 89 f4 53 e8 4c e4 55 f9 <80> b8 dc 08 00 00 01 48 89 c3 4c 8d b8 c0 08 00 00 4c 8b b0 c0
[ 194.363712] RIP [<ffffffffc03553c4>] __qede_remove+0x24/0x130 [qede]
[ 194.363764] RSP <ffff9ceabebdfac0>
[ 194.363791] CR2: 00000000000008dc
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Sudarsana Kalluru <skalluru@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
KCSAN reported the following data-race [1]
The fix will also prevent the compiler from optimizing out
the condition.
[1]
BUG: KCSAN: data-race in neigh_resolve_output / neigh_resolve_output
write to 0xffff8880a41dba78 of 8 bytes by interrupt on cpu 1:
neigh_event_send include/net/neighbour.h:443 [inline]
neigh_resolve_output+0x78/0x480 net/core/neighbour.c:1474
neigh_output include/net/neighbour.h:511 [inline]
ip_finish_output2+0x4af/0xe40 net/ipv4/ip_output.c:228
__ip_finish_output net/ipv4/ip_output.c:308 [inline]
__ip_finish_output+0x23a/0x490 net/ipv4/ip_output.c:290
ip_finish_output+0x41/0x160 net/ipv4/ip_output.c:318
NF_HOOK_COND include/linux/netfilter.h:294 [inline]
ip_output+0xdf/0x210 net/ipv4/ip_output.c:432
dst_output include/net/dst.h:436 [inline]
ip_local_out+0x74/0x90 net/ipv4/ip_output.c:125
__ip_queue_xmit+0x3a8/0xa40 net/ipv4/ip_output.c:532
ip_queue_xmit+0x45/0x60 include/net/ip.h:237
__tcp_transmit_skb+0xe81/0x1d60 net/ipv4/tcp_output.c:1169
tcp_transmit_skb net/ipv4/tcp_output.c:1185 [inline]
__tcp_retransmit_skb+0x4bd/0x15f0 net/ipv4/tcp_output.c:2976
tcp_retransmit_skb+0x36/0x1a0 net/ipv4/tcp_output.c:2999
tcp_retransmit_timer+0x719/0x16d0 net/ipv4/tcp_timer.c:515
tcp_write_timer_handler+0x42d/0x510 net/ipv4/tcp_timer.c:598
tcp_write_timer+0xd1/0xf0 net/ipv4/tcp_timer.c:618
read to 0xffff8880a41dba78 of 8 bytes by interrupt on cpu 0:
neigh_event_send include/net/neighbour.h:442 [inline]
neigh_resolve_output+0x57/0x480 net/core/neighbour.c:1474
neigh_output include/net/neighbour.h:511 [inline]
ip_finish_output2+0x4af/0xe40 net/ipv4/ip_output.c:228
__ip_finish_output net/ipv4/ip_output.c:308 [inline]
__ip_finish_output+0x23a/0x490 net/ipv4/ip_output.c:290
ip_finish_output+0x41/0x160 net/ipv4/ip_output.c:318
NF_HOOK_COND include/linux/netfilter.h:294 [inline]
ip_output+0xdf/0x210 net/ipv4/ip_output.c:432
dst_output include/net/dst.h:436 [inline]
ip_local_out+0x74/0x90 net/ipv4/ip_output.c:125
__ip_queue_xmit+0x3a8/0xa40 net/ipv4/ip_output.c:532
ip_queue_xmit+0x45/0x60 include/net/ip.h:237
__tcp_transmit_skb+0xe81/0x1d60 net/ipv4/tcp_output.c:1169
tcp_transmit_skb net/ipv4/tcp_output.c:1185 [inline]
__tcp_retransmit_skb+0x4bd/0x15f0 net/ipv4/tcp_output.c:2976
tcp_retransmit_skb+0x36/0x1a0 net/ipv4/tcp_output.c:2999
tcp_retransmit_timer+0x719/0x16d0 net/ipv4/tcp_timer.c:515
tcp_write_timer_handler+0x42d/0x510 net/ipv4/tcp_timer.c:598
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 67692435c411 ("sched: Rework pick_next_task() slow-path")
inadvertly introduced a race because it changed a previously
unexplored dependency between dropping the rq->lock and
sched_class::put_prev_task().
The comments about dropping rq->lock, in for example
newidle_balance(), only mentions the task being current and ->on_cpu
being set. But when we look at the 'change' pattern (in for example
sched_setnuma()):
queued = task_on_rq_queued(p); /* p->on_rq == TASK_ON_RQ_QUEUED */
running = task_current(rq, p); /* rq->curr == p */
if (queued)
dequeue_task(...);
if (running)
put_prev_task(...);
/* change task properties */
if (queued)
enqueue_task(...);
if (running)
set_next_task(...);
It becomes obvious that if we do this after put_prev_task() has
already been called on @p, things go sideways. This is exactly what
the commit in question allows to happen when it does:
prev->sched_class->put_prev_task(rq, prev, rf);
if (!rq->nr_running)
newidle_balance(rq, rf);
The newidle_balance() call will drop rq->lock after we've called
put_prev_task() and that allows the above 'change' pattern to
interleave and mess up the state.
Furthermore, it turns out we lost the RT-pull when we put the last DL
task.
Fix both problems by extracting the balancing from put_prev_task() and
doing a multi-class balance() pass before put_prev_task().
Fixes: 67692435c411 ("sched: Rework pick_next_task() slow-path")
Reported-by: Quentin Perret <qperret@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Quentin Perret <qperret@google.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
|