Age | Commit message (Collapse) | Author | Files | Lines |
|
When getting the directory contents, the entries are first fetched to a
kernel buffer, then they are copied to userspace with dir_emit(). This
second phase is non-blocking as long as the userspace buffer is not paged
out, making it interruptible makes zero sense.
Overload d_type as flags, since it only uses 4 bits from 32.
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://lore.kernel.org/20250513112335.1473177-1-mszeredi@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Fixes a typo in the root= parameter description, changing
"this a a" to "this is a".
Fixes: c0c1a7dcb6f5 ("init: move the nfs/cifs/ram special cases out of name_to_dev_t")
Signed-off-by: Petr Vaněk <arkamar@atlas.cz>
Link: https://lore.kernel.org/20250512110827.32530-1-arkamar@atlas.cz
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
When CONFIG_CGROUPS is not selected, {get,put}_cgroup_ns become no-ops
and therefore it is not necessary to compile in the code for changing
the reference count.
When CONFIG_CGROUP is selected, there is no valid case where
either of {get,put}_cgroup_ns() will be called with a NULL argument.
Signed-off-by: Joel Savitz <jsavitz@redhat.com>
Link: https://lore.kernel.org/20250508184930.183040-3-jsavitz@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
In free_nsproxy() and the error path of create_new_namesapces() the
put_*_ns() calls are guarded by unnecessary NULL checks.
put_pid_ns(), put_ipc_ns(), put_uts_ns(), and put_time_ns() will never
receive a NULL argument unless their namespace type is disabled, and in
this case all four become no-ops at compile time anyway. put_mnt_ns()
will never receive a null argument at any time.
This unguarded usage is in line with other call sites of put_*_ns().
Signed-off-by: Joel Savitz <jsavitz@redhat.com>
Link: https://lore.kernel.org/20250508184930.183040-2-jsavitz@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Stop using write_cache_pages and use writeback_iter directly. This
removes an indirect call per written folio and makes the code easier
to follow.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250507062124.3933305-1-hch@lst.de
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
After commit 475d0db742e3 ("fs: Fix theoretical division by 0 in
super_cache_scan()."), there's no need to plus one to prevent
division by zero.
Remove it to simplify the code.
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Link: https://lore.kernel.org/20250428135050.267297-1-alexjlzheng@tencent.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This makes it easy to detect proper anonymous inodes and to ensure that
we can detect them in codepaths such as readahead().
Readahead on anonymous inodes didn't work because they didn't have a
proper mode. Now that they have we need to retain EINVAL being returned
otherwise LTP will fail.
We also need to ensure that ioctls aren't simply fired like they are for
regular files so things like inotify inodes continue to correctly call
their own ioctl handlers as in [1].
Reported-by: Xilin Wu <sophon@radxa.com>
Link: https://lore.kernel.org/3A9139D5CD543962+89831381-31b9-4392-87ec-a84a5b3507d8@radxa.com [1]
Link: https://lore.kernel.org/7a1a7076-ff6b-4cb0-94e7-7218a0a44028@sirena.org.uk
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This system call has been deprecated for quite a while now.
Let's try and remove it from the kernel completely.
Link: https://lore.kernel.org/20250415-kanufahren-besten-02ac00e6becd@brauner
Acked-by: Kees Cook <kees@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The routine gets called for every path component during lookup.
->i_mode is going to be cached on account of permission checks, while
->i_rdev is an area which is most likely cache-cold.
gcc 14.2 is kind enough to emit one branch:
movzwl (%rbx),%eax
mov %eax,%edx
and $0xb000,%dx
cmp $0x2000,%dx
je 11bc <inode_permission+0xec>
This patch is lazy in that I don't know if the ->i_rdev branch makes
any sense with the newly added mode check upfront. I am not changing any
semantics here though.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/20250416221626.2710239-3-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Remove validate_constant_table() since:
- It has no caller.
- It has below 3 bugs for good constant table array array[] which must
end with a empty entry, and take below invocation for explaination:
validate_constant_table(array, ARRAY_SIZE(array), ...)
- Always return wrong value due to the last empty entry.
- Imprecise error message for missorted case.
- Potential NULL pointer dereference since the last pr_err() may use
@tbl[i].name NULL pointer to print the last empty entry's name.
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://lore.kernel.org/20250415-fix_fs-v4-1-5d575124a3ff@quicinc.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The routine only encounters errors when people try to access things they
can't, which is a negligible amount of calls.
The only questionable bit might be the pre-existing predict around
MAY_WRITE. Currently the routine is predominantly used for MAY_EXEC, so
this makes some sense.
I verified this straightens out the asm.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/20250416221626.2710239-2-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Delete macro fsparam_u32hex() since:
- it has no caller.
- it uses as type @fs_param_is_u32_hex which is never defined, so will
cause compile error when caller uses it.
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://lore.kernel.org/20250411-fix_fs-v2-1-5d3395c102e4@quicinc.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Looking at the asm produced by gcc 13.3 for x86-64:
1. may_lookup() usage was not optimized for succeeding, despite the
routine being inlined and rightfully starting with likely(!err)
2. the compiler assumed the path will have an indefinite amount of
slashes to skip, after which the result will be an empty name
As such:
1. predict may_lookup() succeeding
2. check for one slash, no explicit predicts. do roll forward with
skipping more slashes while predicting there is only one
3. predict the path to find was not a mere slash
This also has a side effect of shrinking the file:
add/remove: 1/1 grow/shrink: 0/3 up/down: 934/-1012 (-78)
Function old new delta
link_path_walk - 934 +934
path_parentat 138 112 -26
path_openat 4864 4823 -41
path_lookupat 418 374 -44
link_path_walk.part.constprop 901 - -901
Total: Before=46639, After=46561, chg -0.17%
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/20250412110935.2267703-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Make file-nr output the total allocated file handles, not per-cpu
cache number, it's more precise, and not in hot path
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Link: https://lore.kernel.org/20250410112117.2851-1-lirongqing@baidu.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Adding an unlikely() hint on the n < 0 comparison return path improves
run-time performance of the select() system call, the negative
value of n is very uncommon in normal select usage.
Benchmarking on an Debian based Intel(R) Core(TM) Ultra 9 285K with
a 6.15-rc1 kernel built with 14.2.0 using a select of 1000 file
descriptors with zero timeout shows a consistent call reduction from
258 ns down to 254 ns, which is a ~1.5% performance improvement.
Results based on running 25 tests with turbo disabled (to reduce clock
freq turbo changes), with 30 second run per test and comparing the number
of select() calls per second. The % standard deviation of the 25 tests
was 0.24%, so results are reliable.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/20250414092426.53529-1-colin.i.king@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
fs_name() has @index as unsigned int, so there is underflow risk for
operation '@index--'.
Fix by breaking the for loop when '@index == 0' which is also more proper
than '@index <= 0' for unsigned integer comparison.
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://lore.kernel.org/20250410-fix_fs-v1-1-7c14ccc8ebaa@quicinc.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
There is no mount option with pattern "...,=key_or_value,...", so the if
condition '(value == key)' in while loop of vfs_parse_monolithic_sep() is
is unlikely true.
Mark the condition with unlikely() to improve both performance and
readability.
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://lore.kernel.org/20250410-fix_fs-v1-5-7c14ccc8ebaa@quicinc.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
For fs_validate_description(), its comments easily mislead reader that
the function will search array @desc for duplicated entries with name
specified by parameter @name, but @name is not used for search actually.
Fix by marking name as owner's name of these parameter specifications.
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Use KERN_INFO instead of default KERN_NOTICE for
infof()|info_plog()|infofc() to printk informational messages.
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://lore.kernel.org/20250410-rfc_fix_fs-v1-1-406e13b3608e@quicinc.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Adding an unlikely() hint on the fd < 0 comparison return path improves
run-time performance of the poll() system call. gcov based coverage
analysis based on running stress-ng and a kernel build shows that this
path return path is highly unlikely.
Benchmarking on an Debian based Intel(R) Core(TM) Ultra 9 285K with
a 6.15-rc1 kernel and a poll of 1024 file descriptors with zero timeout
shows an call reduction from 32818 ns down to 32635 ns, which is a ~0.5%
performance improvement.
Results based on running 25 tests with turbo disabled (to reduce clock
freq turbo changes), with 30 second run per test and comparing the number
of poll() calls per second. The % standard deviation of the 25 tests
was 0.08%, so results are reliable.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/20250409155510.577490-1-colin.i.king@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Bring the netfs documentation up to date.
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/1690127.1744208325@warthog.procyon.org.uk
Reviewed-by: "Paulo Alcantara (Red Hat)" <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Viacheslav Dubeyko <slava@dubeyko.com>
cc: Alex Markuze <amarkuze@redhat.com>
cc: Timothy Day <timday@amazon.com>
cc: Jonathan Corbet <corbet@lwn.net>
cc: netfs@lists.linux.dev
cc: linux-doc@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Vast majority of the time the func returns false.
This avoids a branch to determine whether we are in RCU mode.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/20250408073641.1799151-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This matches the annotation in fdget().
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/20250406235806.1637000-2-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This is a nop, but I did verify asm improves.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/20250406235806.1637000-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Test that anonymous inodes cannot be open()ed.
Link: https://lore.kernel.org/20250407-work-anon_inode-v1-9-53a44c20d44e@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Test that anonymous inodes cannot be exec()ed.
Link: https://lore.kernel.org/20250407-work-anon_inode-v1-8-53a44c20d44e@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Test that anonymous inodes cannot be chmod()ed.
Link: https://lore.kernel.org/20250407-work-anon_inode-v1-7-53a44c20d44e@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Test that anonymous inodes cannot be chown()ed.
Link: https://lore.kernel.org/20250407-work-anon_inode-v1-6-53a44c20d44e@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
It isn't possible to execute anonymous inodes because they cannot be
opened in any way after they have been created. This includes execution:
execveat(fd_anon_inode, "", NULL, NULL, AT_EMPTY_PATH)
Anonymous inodes have inode->f_op set to no_open_fops which sets
no_open() which returns ENXIO. That means any call to do_dentry_open()
which is the endpoint of the do_open_execat() will fail. There's no
chance to execute an anonymous inode. Unless a given subsystem overrides
it ofc.
However, we should still harden this and raise SB_I_NODEV and
SB_I_NOEXEC on the superblock itself so that no one gets any creative
ideas.
Link: https://lore.kernel.org/20250407-work-anon_inode-v1-5-53a44c20d44e@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: stable@vger.kernel.org # all LTS kernels
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
So far pidfs did use it's own version. Just use the generic version.
We use our own wrappers because we're going to be implementing
properties soon.
Link: https://lore.kernel.org/20250407-work-anon_inode-v1-4-53a44c20d44e@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
It is currently possible to change the mode and owner of the single
anonymous inode in the kernel:
int main(int argc, char *argv[])
{
int ret, sfd;
sigset_t mask;
struct signalfd_siginfo fdsi;
sigemptyset(&mask);
sigaddset(&mask, SIGINT);
sigaddset(&mask, SIGQUIT);
ret = sigprocmask(SIG_BLOCK, &mask, NULL);
if (ret < 0)
_exit(1);
sfd = signalfd(-1, &mask, 0);
if (sfd < 0)
_exit(2);
ret = fchown(sfd, 5555, 5555);
if (ret < 0)
_exit(3);
ret = fchmod(sfd, 0777);
if (ret < 0)
_exit(3);
_exit(4);
}
This is a bug. It's not really a meaningful one because anonymous inodes
don't really figure into path lookup and they cannot be reopened via
/proc/<pid>/fd/<nr> and can't be used for lookup itself. So they can
only ever serve as direct references.
But it is still completely bogus to allow the mode and ownership or any
of the properties of the anonymous inode to be changed. Block this!
Link: https://lore.kernel.org/20250407-work-anon_inode-v1-3-53a44c20d44e@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: stable@vger.kernel.org # all LTS kernels
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
So far pidfs did use it's own version. Just use the generic version. We
use our own wrappers because we're going to be implementing our own
retrieval properties soon.
Link: https://lore.kernel.org/20250407-work-anon_inode-v1-2-53a44c20d44e@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This allows the VFS to not trip over anonymous inodes and we can add
asserts based on the mode into the vfs. When we report it to userspace
we can simply hide the mode to avoid regressions. I've audited all
direct callers of alloc_anon_inode() and only secretmen overrides i_mode
and i_op inode operations but it already uses a regular file.
Link: https://lore.kernel.org/20250407-work-anon_inode-v1-1-53a44c20d44e@kernel.org
Fixes: af153bb63a336 ("vfs: catch invalid modes in may_open()")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: stable@vger.kernel.org # all LTS kernels
Reported-by: syzbot+5d8e79d323a13aa0b248@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/67ed3fb3.050a0220.14623d.0009.GAE@google.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Update the document to reflect that initramfs didn't replace initrd
following kernel 2.5.x.
The initramfs buffer format now supports many compression types in
addition to gzip, so include them in the grammar section.
c_mtime use is dependent on CONFIG_INITRAMFS_PRESERVE_MTIME.
Signed-off-by: David Disseldorp <ddiss@suse.de>
Link: https://lore.kernel.org/r/20250402033949.852-2-ddiss@suse.de
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
|
|
The "real" linux/types.h UAPI header gracefully degrades to a NOOP when
included from assembly code.
Mirror this behaviour in the tools/ variant.
Test for __ASSEMBLER__ over __ASSEMBLY__ as the former is provided by the
toolchain automatically.
Reported-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/lkml/af553c62-ca2f-4956-932c-dd6e3a126f58@sirena.org.uk/
Fixes: c9fbaa879508 ("selftests: vDSO: parse_vdso: Use UAPI headers instead of libc headers")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://patch.msgid.link/20250321-uapi-consistency-v1-1-439070118dc0@linutronix.de
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Support up to 8192 processors
Add cpuidle governor debug telemetry, disabled by default
Update default output to exclude cpuidle invocation counts
Bug fixes
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Create "pct_idle" counter group, the sofware notion of residency
so it can now be singled out, independent of other counter groups.
Create "cpuidle" group, the cpuidle invocation counts.
Disable "cpuidle", by default.
Create "swidle" = "cpuidle" + "pct_idle".
Undocument "sysfs", the old name for "swidle", but keep it working
for backwards compatibilty.
Create "hwidle", all the HW idle counters
Modify "idle", enabled by default
"idle" = "hwidle" + "pct_idle" (and now excludes "cpuidle")
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
... and don't error out so hard on missing module descriptions.
Before commit 6c6c1fc09de3 ("modpost: require a MODULE_DESCRIPTION()")
we used to warn about missing module descriptions, but only when
building with extra warnigns (ie 'W=1').
After that commit the warning became an unconditional hard error.
And it turns out not all modules have been converted despite the claims
to the contrary. As reported by Damian Tometzki, the slub KUnit test
didn't have a module description, and apparently nobody ever really
noticed.
The reason nobody noticed seems to be that the slub KUnit tests get
disabled by SLUB_TINY, which also ends up disabling a lot of other code,
both in tests and in slub itself. And so anybody doing full build tests
didn't actually see this failre.
So let's disable SLUB_TINY for build-only tests, since it clearly ends
up limiting build coverage. Also turn the missing module descriptions
error back into a warning, but let's keep it around for non-'W=1'
builds.
Reported-by: Damian Tometzki <damian@riscv-rocks.de>
Link: https://lore.kernel.org/all/01070196099fd059-e8463438-7b1b-4ec8-816d-173874be9966-000000@eu-central-1.amazonses.com/
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Fixes: 6c6c1fc09de3 ("modpost: require a MODULE_DESCRIPTION()")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Probe cpuidle "sysfs" residency and counts separately,
since soon we will make one disabled on, and the
other disabled off.
Clarify that some BIC (build-in-counters) are actually "groups".
since we're about to re-name some of those groups.
no functional change.
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Do fflush() to discard the buffered data, before each read of the
graphics sysfs knobs.
Fixes: ba99a4fc8c24 ("tools/power turbostat: Remove unnecessary fflush() call")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Document that on Intel Granite Rapids Systems,
Uncore domains 0-2 are CPU domains, and
uncore domains 3-4 are IO domains.
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
The CoreThr column displays total thermal throttling events
since boot time.
Change it to report events during the measurement interval.
This is more useful for showing a user the current conditions.
Total events since boot time are still available to the user via
/sys/devices/system/cpu/cpu*/thermal_throttle/*
Document CoreThr on turbostat.8
Fixes: eae97e053fe30 ("turbostat: Support thermal throttle count print")
Reported-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: Chen Yu <yu.c.chen@intel.com>
|
|
On systems with >= 1024 cpus (in my case 1152), turbostat fails with the error output:
"turbostat: /sys/fs/cgroup/cpuset.cpus.effective: cpu str malformat 0-1151"
A similar error appears with the use of turbostat --cpu when the inputted cpu
range contains a cpu number >= 1024:
# turbostat -c 1100-1151
"--cpu 1100-1151" malformed
...
Both errors are caused by parse_cpu_str() reaching its limit of CPU_SUBSET_MAXCPUS.
It's a good idea to limit the maximum cpu number being parsed, but 1024 is too low.
For a small increase in compute and allocated memory, increasing CPU_SUBSET_MAXCPUS
brings support for parsing cpu numbers >= 1024.
Increase CPU_SUBSET_MAXCPUS to 8192, a common setting for CONFIG_NR_CPUS on x86_64.
Signed-off-by: Justin Ernst <justin.ernst@hpe.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
The rpm-pkg make target currently suffers from a few issues related to
debuginfo:
1. debuginfo for things built into the kernel (vmlinux) is not available
in any RPM produced by make rpm-pkg. This makes using tools like
systemtap against a make rpm-pkg kernel impossible.
2. debug source for the kernel is not available. This means that
commands like 'disas /s' in gdb, which display source intermixed with
assembly, can only print file names/line numbers which then must be
painstakingly resolved to actual source in a separate editor.
3. debuginfo for modules is available, but it remains bundled with the
.ko files that contain module code, in the main kernel RPM. This is a
waste of space for users who do not need to debug the kernel (i.e.
most users).
Address all of these issues by additionally building a debuginfo RPM
when the kernel configuration allows for it, in line with standard
patterns followed by RPM distributors. With these changes:
1. systemtap now works (when these changes are backported to 6.11, since
systemtap lags a bit behind in compatibility), as verified by the
following simple test script:
# stap -e 'probe kernel.function("do_sys_open").call { printf("%s\n", $$parms); }'
dfd=0xffffffffffffff9c filename=0x7fe18800b160 flags=0x88800 mode=0x0
...
2. disas /s works correctly in gdb, with source and disassembly
interspersed:
# gdb vmlinux --batch -ex 'disas /s blk_op_str'
Dump of assembler code for function blk_op_str:
block/blk-core.c:
125 {
0xffffffff814c8740 <+0>: endbr64
127
128 if (op < ARRAY_SIZE(blk_op_name) && blk_op_name[op])
0xffffffff814c8744 <+4>: mov $0xffffffff824a7378,%rax
0xffffffff814c874b <+11>: cmp $0x23,%edi
0xffffffff814c874e <+14>: ja 0xffffffff814c8768 <blk_op_str+40>
0xffffffff814c8750 <+16>: mov %edi,%edi
126 const char *op_str = "UNKNOWN";
0xffffffff814c8752 <+18>: mov $0xffffffff824a7378,%rdx
127
128 if (op < ARRAY_SIZE(blk_op_name) && blk_op_name[op])
0xffffffff814c8759 <+25>: mov -0x7dfa0160(,%rdi,8),%rax
126 const char *op_str = "UNKNOWN";
0xffffffff814c8761 <+33>: test %rax,%rax
0xffffffff814c8764 <+36>: cmove %rdx,%rax
129 op_str = blk_op_name[op];
130
131 return op_str;
132 }
0xffffffff814c8768 <+40>: jmp 0xffffffff81d01360 <__x86_return_thunk>
End of assembler dump.
3. The size of the main kernel package goes down substantially,
especially if many modules are built (quite typical). Here is a
comparison of installed size of the kernel package (configured with
allmodconfig, dwarf4 debuginfo, and module compression turned off)
before and after this patch:
# rpm -qi kernel-6.13* | grep -E '^(Version|Size)'
Version : 6.13.0postpatch+
Size : 1382874089
Version : 6.13.0prepatch+
Size : 17870795887
This is a ~92% size reduction.
Note that a debuginfo package can only be produced if the following
configs are set:
- CONFIG_DEBUG_INFO=y
- CONFIG_MODULE_COMPRESS=n
- CONFIG_DEBUG_INFO_SPLIT=n
The first of these is obvious - we can't produce debuginfo if the build
does not generate it. The second two requirements can in principle be
removed, but doing so is difficult with the current approach, which uses
a generic rpmbuild script find-debuginfo.sh that processes all packaged
executables. If we want to remove those requirements the best path
forward is likely to add some debuginfo extraction/installation logic to
the modules_install target (controllable by flags). That way, it's
easier to operate on modules before they're compressed, and the logic
can be reused by all packaging targets.
Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
The scripts/kconfig/merge_config.sh script requires an existing
$INITFILE (or the $1 argument) as a base file for merging Kconfig
fragments. However, an empty $INITFILE can serve as an initial starting
point, later referenced by the KCONFIG_ALLCONFIG Makefile variable
if -m is not used. This variable can point to any configuration file
containing preset config symbols (the merged output) as stated in
Documentation/kbuild/kconfig.rst. When -m is used $INITFILE will
contain just the merge output requiring the user to run make (i.e.
KCONFIG_ALLCONFIG=<$INITFILE> make <allnoconfig/alldefconfig> or make
olddefconfig).
Instead of failing when `$INITFILE` is missing, create an empty file and
use it as the starting point for merges.
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Commit 654102df2ac2 ("kbuild: add generic support for built-in boot
DTBs") introduced generic support for built-in DTBs.
Select GENERIC_BUILTIN_DTB when built-in DTB support is enabled.
To keep consistency across architectures, this commit also renames
CONFIG_NIOS2_DTB_SOURCE_BOOL to CONFIG_BUILTIN_DTB, and
CONFIG_NIOS2_DTB_SOURCE to CONFIG_BUILTIN_DTB_NAME.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
This option was removed from Kconfig in 8c710f75256b ("net/sched:
Retire tcindex classifier") but from the defconfigs.
Fixes: 8c710f75256b ("net/sched: Retire tcindex classifier")
Signed-off-by: Johan Korsnes <johan.korsnes@gmail.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
|
|
J2-based devices expect to find a device tree blob at the end of the
.bss section. As of a77725a9a3c5 ("scripts/dtc: Update to upstream
version v1.6.1-19-g0a3a9d3449c8"), libfdt enforces 8-byte alignment
for the DTB, causing J2 devices to fail early in sh_fdt_init().
As the J2 loader firmware calculates the DTB location based on the kernel
image .bss section size rather than the __bss_stop symbol offset, the
required alignment can't be enforced with BSS_SECTION(0, PAGE_SIZE, 8).
To fix this, inline a modified version of the above macro which grows
.bss by the required size. While this change affects all existing SH
boards, it should be benign on platforms which don't need this alignment.
Signed-off-by: Artur Rojek <contact@artur-rojek.eu>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Tested-by: Rob Landley <rob@landley.net>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
|
|
The function hrtimer_init() doesn't exist anymore. It was replaced by
hrtimer_setup().
Thus, rename the hrtimer_init trace event to hrtimer_setup to keep it
consistent.
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/all/cba84c3d853c5258aa3a262363a6eac08e2c7afc.1738746927.git.namcao@linutronix.de
|