From 1507f51255c9ff07d75909a84e7c0d7f3c4b2f49 Mon Sep 17 00:00:00 2001 From: Mike Rapoport Date: Wed, 7 Jul 2021 18:08:03 -0700 Subject: mm: introduce memfd_secret system call to create "secret" memory areas Introduce "memfd_secret" system call with the ability to create memory areas visible only in the context of the owning process and not mapped not only to other processes but in the kernel page tables as well. The secretmem feature is off by default and the user must explicitly enable it at the boot time. Once secretmem is enabled, the user will be able to create a file descriptor using the memfd_secret() system call. The memory areas created by mmap() calls from this file descriptor will be unmapped from the kernel direct map and they will be only mapped in the page table of the processes that have access to the file descriptor. Secretmem is designed to provide the following protections: * Enhanced protection (in conjunction with all the other in-kernel attack prevention systems) against ROP attacks. Seceretmem makes "simple" ROP insufficient to perform exfiltration, which increases the required complexity of the attack. Along with other protections like the kernel stack size limit and address space layout randomization which make finding gadgets is really hard, absence of any in-kernel primitive for accessing secret memory means the one gadget ROP attack can't work. Since the only way to access secret memory is to reconstruct the missing mapping entry, the attacker has to recover the physical page and insert a PTE pointing to it in the kernel and then retrieve the contents. That takes at least three gadgets which is a level of difficulty beyond most standard attacks. * Prevent cross-process secret userspace memory exposures. Once the secret memory is allocated, the user can't accidentally pass it into the kernel to be transmitted somewhere. The secreremem pages cannot be accessed via the direct map and they are disallowed in GUP. * Harden against exploited kernel flaws. In order to access secretmem, a kernel-side attack would need to either walk the page tables and create new ones, or spawn a new privileged uiserspace process to perform secrets exfiltration using ptrace. The file descriptor based memory has several advantages over the "traditional" mm interfaces, such as mlock(), mprotect(), madvise(). File descriptor approach allows explicit and controlled sharing of the memory areas, it allows to seal the operations. Besides, file descriptor based memory paves the way for VMMs to remove the secret memory range from the userspace hipervisor process, for instance QEMU. Andy Lutomirski says: "Getting fd-backed memory into a guest will take some possibly major work in the kernel, but getting vma-backed memory into a guest without mapping it in the host user address space seems much, much worse." memfd_secret() is made a dedicated system call rather than an extension to memfd_create() because it's purpose is to allow the user to create more secure memory mappings rather than to simply allow file based access to the memory. Nowadays a new system call cost is negligible while it is way simpler for userspace to deal with a clear-cut system calls than with a multiplexer or an overloaded syscall. Moreover, the initial implementation of memfd_secret() is completely distinct from memfd_create() so there is no much sense in overloading memfd_create() to begin with. If there will be a need for code sharing between these implementation it can be easily achieved without a need to adjust user visible APIs. The secret memory remains accessible in the process context using uaccess primitives, but it is not exposed to the kernel otherwise; secret memory areas are removed from the direct map and functions in the follow_page()/get_user_page() family will refuse to return a page that belongs to the secret memory area. Once there will be a use case that will require exposing secretmem to the kernel it will be an opt-in request in the system call flags so that user would have to decide what data can be exposed to the kernel. Removing of the pages from the direct map may cause its fragmentation on architectures that use large pages to map the physical memory which affects the system performance. However, the original Kconfig text for CONFIG_DIRECT_GBPAGES said that gigabyte pages in the direct map "... can improve the kernel's performance a tiny bit ..." (commit 00d1c5e05736 ("x86: add gbpages switches")) and the recent report [1] showed that "... although 1G mappings are a good default choice, there is no compelling evidence that it must be the only choice". Hence, it is sufficient to have secretmem disabled by default with the ability of a system administrator to enable it at boot time. Pages in the secretmem regions are unevictable and unmovable to avoid accidental exposure of the sensitive data via swap or during page migration. Since the secretmem mappings are locked in memory they cannot exceed RLIMIT_MEMLOCK. Since these mappings are already locked independently from mlock(), an attempt to mlock()/munlock() secretmem range would fail and mlockall()/munlockall() will ignore secretmem mappings. However, unlike mlock()ed memory, secretmem currently behaves more like long-term GUP: secretmem mappings are unmovable mappings directly consumed by user space. With default limits, there is no excessive use of secretmem and it poses no real problem in combination with ZONE_MOVABLE/CMA, but in the future this should be addressed to allow balanced use of large amounts of secretmem along with ZONE_MOVABLE/CMA. A page that was a part of the secret memory area is cleared when it is freed to ensure the data is not exposed to the next user of that page. The following example demonstrates creation of a secret mapping (error handling is omitted): fd = memfd_secret(0); ftruncate(fd, MAP_SIZE); ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); [1] https://lore.kernel.org/linux-mm/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux.intel.com/ [akpm@linux-foundation.org: suppress Kconfig whine] Link: https://lkml.kernel.org/r/20210518072034.31572-5-rppt@kernel.org Signed-off-by: Mike Rapoport Acked-by: Hagen Paul Pfeifer Acked-by: James Bottomley Cc: Alexander Viro Cc: Andy Lutomirski Cc: Arnd Bergmann Cc: Borislav Petkov Cc: Catalin Marinas Cc: Christopher Lameter Cc: Dan Williams Cc: Dave Hansen Cc: Elena Reshetova Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: James Bottomley Cc: "Kirill A. Shutemov" Cc: Matthew Wilcox Cc: Mark Rutland Cc: Michael Kerrisk Cc: Palmer Dabbelt Cc: Palmer Dabbelt Cc: Paul Walmsley Cc: Peter Zijlstra Cc: Rick Edgecombe Cc: Roman Gushchin Cc: Shakeel Butt Cc: Shuah Khan Cc: Thomas Gleixner Cc: Tycho Andersen Cc: Will Deacon Cc: David Hildenbrand Cc: kernel test robot Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/sys_ni.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'kernel') diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index dad4d994641e..30971b1dd4a9 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -358,6 +358,8 @@ COND_SYSCALL(pkey_mprotect); COND_SYSCALL(pkey_alloc); COND_SYSCALL(pkey_free); +/* memfd_secret */ +COND_SYSCALL(memfd_secret); /* * Architecture specific weak syscall entries. -- cgit v1.2.3-59-g8ed1b From 9a436f8ff6316c3c1a21a758e14ded930bd615d9 Mon Sep 17 00:00:00 2001 From: Mike Rapoport Date: Wed, 7 Jul 2021 18:08:07 -0700 Subject: PM: hibernate: disable when there are active secretmem users It is unsafe to allow saving of secretmem areas to the hibernation snapshot as they would be visible after the resume and this essentially will defeat the purpose of secret memory mappings. Prevent hibernation whenever there are active secret memory users. Link: https://lkml.kernel.org/r/20210518072034.31572-6-rppt@kernel.org Signed-off-by: Mike Rapoport Acked-by: David Hildenbrand Acked-by: James Bottomley Cc: Alexander Viro Cc: Andy Lutomirski Cc: Arnd Bergmann Cc: Borislav Petkov Cc: Catalin Marinas Cc: Christopher Lameter Cc: Dan Williams Cc: Dave Hansen Cc: David Hildenbrand Cc: Elena Reshetova Cc: Hagen Paul Pfeifer Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: James Bottomley Cc: "Kirill A. Shutemov" Cc: Mark Rutland Cc: Matthew Wilcox Cc: Michael Kerrisk Cc: Palmer Dabbelt Cc: Palmer Dabbelt Cc: Paul Walmsley Cc: Peter Zijlstra Cc: Rick Edgecombe Cc: Roman Gushchin Cc: Shakeel Butt Cc: Shuah Khan Cc: Thomas Gleixner Cc: Tycho Andersen Cc: Will Deacon Cc: kernel test robot Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/secretmem.h | 6 ++++++ kernel/power/hibernate.c | 5 ++++- mm/secretmem.c | 15 +++++++++++++++ 3 files changed, 25 insertions(+), 1 deletion(-) (limited to 'kernel') diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h index e617b4afcc62..21c3771e6a56 100644 --- a/include/linux/secretmem.h +++ b/include/linux/secretmem.h @@ -30,6 +30,7 @@ static inline bool page_is_secretmem(struct page *page) } bool vma_is_secretmem(struct vm_area_struct *vma); +bool secretmem_active(void); #else @@ -43,6 +44,11 @@ static inline bool page_is_secretmem(struct page *page) return false; } +static inline bool secretmem_active(void) +{ + return false; +} + #endif /* CONFIG_SECRETMEM */ #endif /* _LINUX_SECRETMEM_H */ diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c index da0b41914177..559acef3fddb 100644 --- a/kernel/power/hibernate.c +++ b/kernel/power/hibernate.c @@ -31,6 +31,7 @@ #include #include #include +#include #include #include "power.h" @@ -81,7 +82,9 @@ void hibernate_release(void) bool hibernation_available(void) { - return nohibernate == 0 && !security_locked_down(LOCKDOWN_HIBERNATION); + return nohibernate == 0 && + !security_locked_down(LOCKDOWN_HIBERNATION) && + !secretmem_active(); } /** diff --git a/mm/secretmem.c b/mm/secretmem.c index 972cd1bbc3cc..f77d25467a14 100644 --- a/mm/secretmem.c +++ b/mm/secretmem.c @@ -40,6 +40,13 @@ module_param_named(enable, secretmem_enable, bool, 0400); MODULE_PARM_DESC(secretmem_enable, "Enable secretmem and memfd_secret(2) system call"); +static atomic_t secretmem_users; + +bool secretmem_active(void) +{ + return !!atomic_read(&secretmem_users); +} + static vm_fault_t secretmem_fault(struct vm_fault *vmf) { struct address_space *mapping = vmf->vma->vm_file->f_mapping; @@ -94,6 +101,12 @@ static const struct vm_operations_struct secretmem_vm_ops = { .fault = secretmem_fault, }; +static int secretmem_release(struct inode *inode, struct file *file) +{ + atomic_dec(&secretmem_users); + return 0; +} + static int secretmem_mmap(struct file *file, struct vm_area_struct *vma) { unsigned long len = vma->vm_end - vma->vm_start; @@ -116,6 +129,7 @@ bool vma_is_secretmem(struct vm_area_struct *vma) } static const struct file_operations secretmem_fops = { + .release = secretmem_release, .mmap = secretmem_mmap, }; @@ -202,6 +216,7 @@ SYSCALL_DEFINE1(memfd_secret, unsigned int, flags) file->f_flags |= O_LARGEFILE; fd_install(fd, file); + atomic_inc(&secretmem_users); return fd; err_put_fd: -- cgit v1.2.3-59-g8ed1b From 9294523e3768030ae8afb84110bcecc66425a647 Mon Sep 17 00:00:00 2001 From: Stephen Boyd Date: Wed, 7 Jul 2021 18:09:20 -0700 Subject: module: add printk formats to add module build ID to stacktraces Let's make kernel stacktraces easier to identify by including the build ID[1] of a module if the stacktrace is printing a symbol from a module. This makes it simpler for developers to locate a kernel module's full debuginfo for a particular stacktrace. Combined with scripts/decode_stracktrace.sh, a developer can download the matching debuginfo from a debuginfod[2] server and find the exact file and line number for the functions plus offsets in a stacktrace that match the module. This is especially useful for pstore crash debugging where the kernel crashes are recorded in something like console-ramoops and the recovery kernel/modules are different or the debuginfo doesn't exist on the device due to space concerns (the debuginfo can be too large for space limited devices). Originally, I put this on the %pS format, but that was quickly rejected given that %pS is used in other places such as ftrace where build IDs aren't meaningful. There was some discussions on the list to put every module build ID into the "Modules linked in:" section of the stacktrace message but that quickly becomes very hard to read once you have more than three or four modules linked in. It also provides too much information when we don't expect each module to be traversed in a stacktrace. Having the build ID for modules that aren't important just makes things messy. Splitting it to multiple lines for each module quickly explodes the number of lines printed in an oops too, possibly wrapping the warning off the console. And finally, trying to stash away each module used in a callstack to provide the ID of each symbol printed is cumbersome and would require changes to each architecture to stash away modules and return their build IDs once unwinding has completed. Instead, we opt for the simpler approach of introducing new printk formats '%pS[R]b' for "pointer symbolic backtrace with module build ID" and '%pBb' for "pointer backtrace with module build ID" and then updating the few places in the architecture layer where the stacktrace is printed to use this new format. Before: Call trace: lkdtm_WARNING+0x28/0x30 [lkdtm] direct_entry+0x16c/0x1b4 [lkdtm] full_proxy_write+0x74/0xa4 vfs_write+0xec/0x2e8 After: Call trace: lkdtm_WARNING+0x28/0x30 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9] direct_entry+0x16c/0x1b4 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9] full_proxy_write+0x74/0xa4 vfs_write+0xec/0x2e8 [akpm@linux-foundation.org: fix build with CONFIG_MODULES=n, tweak code layout] [rdunlap@infradead.org: fix build when CONFIG_MODULES is not set] Link: https://lkml.kernel.org/r/20210513171510.20328-1-rdunlap@infradead.org [akpm@linux-foundation.org: make kallsyms_lookup_buildid() static] [cuibixuan@huawei.com: fix build error when CONFIG_SYSFS is disabled] Link: https://lkml.kernel.org/r/20210525105049.34804-1-cuibixuan@huawei.com Link: https://lkml.kernel.org/r/20210511003845.2429846-6-swboyd@chromium.org Link: https://fedoraproject.org/wiki/Releases/FeatureBuildId [1] Link: https://sourceware.org/elfutils/Debuginfod.html [2] Signed-off-by: Stephen Boyd Signed-off-by: Bixuan Cui Signed-off-by: Randy Dunlap Cc: Jiri Olsa Cc: Alexei Starovoitov Cc: Jessica Yu Cc: Evan Green Cc: Hsin-Yi Wang Cc: Petr Mladek Cc: Steven Rostedt Cc: Sergey Senozhatsky Cc: Andy Shevchenko Cc: Rasmus Villemoes Cc: Matthew Wilcox Cc: Baoquan He Cc: Borislav Petkov Cc: Catalin Marinas Cc: Dave Young Cc: Ingo Molnar Cc: Konstantin Khlebnikov Cc: Sasha Levin Cc: Thomas Gleixner Cc: Vivek Goyal Cc: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/core-api/printk-formats.rst | 11 ++++ include/linux/kallsyms.h | 21 +++++- include/linux/module.h | 9 ++- kernel/kallsyms.c | 104 ++++++++++++++++++++++++------ kernel/module.c | 42 ++++++++++-- lib/vsprintf.c | 8 ++- 6 files changed, 166 insertions(+), 29 deletions(-) (limited to 'kernel') diff --git a/Documentation/core-api/printk-formats.rst b/Documentation/core-api/printk-formats.rst index 4346ae17a72c..d941717a191b 100644 --- a/Documentation/core-api/printk-formats.rst +++ b/Documentation/core-api/printk-formats.rst @@ -125,6 +125,17 @@ used when printing stack backtraces. The specifier takes into consideration the effect of compiler optimisations which may occur when tail-calls are used and marked with the noreturn GCC attribute. +If the pointer is within a module, the module name and optionally build ID is +printed after the symbol name with an extra ``b`` appended to the end of the +specifier. + +:: + %pS versatile_init+0x0/0x110 [module_name] + %pSb versatile_init+0x0/0x110 [module_name ed5019fdf5e53be37cb1ba7899292d7e143b259e] + %pSRb versatile_init+0x9/0x110 [module_name ed5019fdf5e53be37cb1ba7899292d7e143b259e] + (with __builtin_extract_return_addr() translation) + %pBb prev_fn_of_versatile_init+0x88/0x88 [module_name ed5019fdf5e53be37cb1ba7899292d7e143b259e] + Probed Pointers from BPF / tracing ---------------------------------- diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h index 465060acc981..a1d6fc82d7f0 100644 --- a/include/linux/kallsyms.h +++ b/include/linux/kallsyms.h @@ -7,6 +7,7 @@ #define _LINUX_KALLSYMS_H #include +#include #include #include #include @@ -15,8 +16,10 @@ #include #define KSYM_NAME_LEN 128 -#define KSYM_SYMBOL_LEN (sizeof("%s+%#lx/%#lx [%s]") + (KSYM_NAME_LEN - 1) + \ - 2*(BITS_PER_LONG*3/10) + (MODULE_NAME_LEN - 1) + 1) +#define KSYM_SYMBOL_LEN (sizeof("%s+%#lx/%#lx [%s %s]") + \ + (KSYM_NAME_LEN - 1) + \ + 2*(BITS_PER_LONG*3/10) + (MODULE_NAME_LEN - 1) + \ + (BUILD_ID_SIZE_MAX * 2) + 1) struct cred; struct module; @@ -91,8 +94,10 @@ const char *kallsyms_lookup(unsigned long addr, /* Look up a kernel symbol and return it in a text buffer. */ extern int sprint_symbol(char *buffer, unsigned long address); +extern int sprint_symbol_build_id(char *buffer, unsigned long address); extern int sprint_symbol_no_offset(char *buffer, unsigned long address); extern int sprint_backtrace(char *buffer, unsigned long address); +extern int sprint_backtrace_build_id(char *buffer, unsigned long address); int lookup_symbol_name(unsigned long addr, char *symname); int lookup_symbol_attrs(unsigned long addr, unsigned long *size, unsigned long *offset, char *modname, char *name); @@ -128,6 +133,12 @@ static inline int sprint_symbol(char *buffer, unsigned long addr) return 0; } +static inline int sprint_symbol_build_id(char *buffer, unsigned long address) +{ + *buffer = '\0'; + return 0; +} + static inline int sprint_symbol_no_offset(char *buffer, unsigned long addr) { *buffer = '\0'; @@ -140,6 +151,12 @@ static inline int sprint_backtrace(char *buffer, unsigned long addr) return 0; } +static inline int sprint_backtrace_build_id(char *buffer, unsigned long addr) +{ + *buffer = '\0'; + return 0; +} + static inline int lookup_symbol_name(unsigned long addr, char *symname) { return -ERANGE; diff --git a/include/linux/module.h b/include/linux/module.h index 8100bb477d86..8a298d820dbc 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -11,6 +11,7 @@ #include #include +#include #include #include #include @@ -369,6 +370,11 @@ struct module { /* Unique handle for this module */ char name[MODULE_NAME_LEN]; +#ifdef CONFIG_STACKTRACE_BUILD_ID + /* Module build ID */ + unsigned char build_id[BUILD_ID_SIZE_MAX]; +#endif + /* Sysfs stuff. */ struct module_kobject mkobj; struct module_attribute *modinfo_attrs; @@ -636,7 +642,7 @@ void *dereference_module_function_descriptor(struct module *mod, void *ptr); const char *module_address_lookup(unsigned long addr, unsigned long *symbolsize, unsigned long *offset, - char **modname, + char **modname, const unsigned char **modbuildid, char *namebuf); int lookup_module_symbol_name(unsigned long addr, char *symname); int lookup_module_symbol_attrs(unsigned long addr, unsigned long *size, unsigned long *offset, char *modname, char *name); @@ -740,6 +746,7 @@ static inline const char *module_address_lookup(unsigned long addr, unsigned long *symbolsize, unsigned long *offset, char **modname, + const unsigned char **modbuildid, char *namebuf) { return NULL; diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c index c851ca0ed357..0ba87982d017 100644 --- a/kernel/kallsyms.c +++ b/kernel/kallsyms.c @@ -25,7 +25,10 @@ #include #include #include +#include #include +#include +#include /* * These will be re-linked against their real values @@ -297,21 +300,14 @@ int kallsyms_lookup_size_offset(unsigned long addr, unsigned long *symbolsize, get_symbol_pos(addr, symbolsize, offset); return 1; } - return !!module_address_lookup(addr, symbolsize, offset, NULL, namebuf) || + return !!module_address_lookup(addr, symbolsize, offset, NULL, NULL, namebuf) || !!__bpf_address_lookup(addr, symbolsize, offset, namebuf); } -/* - * Lookup an address - * - modname is set to NULL if it's in the kernel. - * - We guarantee that the returned name is valid until we reschedule even if. - * It resides in a module. - * - We also guarantee that modname will be valid until rescheduled. - */ -const char *kallsyms_lookup(unsigned long addr, - unsigned long *symbolsize, - unsigned long *offset, - char **modname, char *namebuf) +static const char *kallsyms_lookup_buildid(unsigned long addr, + unsigned long *symbolsize, + unsigned long *offset, char **modname, + const unsigned char **modbuildid, char *namebuf) { const char *ret; @@ -327,6 +323,8 @@ const char *kallsyms_lookup(unsigned long addr, namebuf, KSYM_NAME_LEN); if (modname) *modname = NULL; + if (modbuildid) + *modbuildid = NULL; ret = namebuf; goto found; @@ -334,7 +332,7 @@ const char *kallsyms_lookup(unsigned long addr, /* See if it's in a module or a BPF JITed image. */ ret = module_address_lookup(addr, symbolsize, offset, - modname, namebuf); + modname, modbuildid, namebuf); if (!ret) ret = bpf_address_lookup(addr, symbolsize, offset, modname, namebuf); @@ -348,6 +346,22 @@ found: return ret; } +/* + * Lookup an address + * - modname is set to NULL if it's in the kernel. + * - We guarantee that the returned name is valid until we reschedule even if. + * It resides in a module. + * - We also guarantee that modname will be valid until rescheduled. + */ +const char *kallsyms_lookup(unsigned long addr, + unsigned long *symbolsize, + unsigned long *offset, + char **modname, char *namebuf) +{ + return kallsyms_lookup_buildid(addr, symbolsize, offset, modname, + NULL, namebuf); +} + int lookup_symbol_name(unsigned long addr, char *symname) { int res; @@ -404,15 +418,17 @@ found: /* Look up a kernel symbol and return it in a text buffer. */ static int __sprint_symbol(char *buffer, unsigned long address, - int symbol_offset, int add_offset) + int symbol_offset, int add_offset, int add_buildid) { char *modname; + const unsigned char *buildid; const char *name; unsigned long offset, size; int len; address += symbol_offset; - name = kallsyms_lookup(address, &size, &offset, &modname, buffer); + name = kallsyms_lookup_buildid(address, &size, &offset, &modname, &buildid, + buffer); if (!name) return sprintf(buffer, "0x%lx", address - symbol_offset); @@ -424,8 +440,19 @@ static int __sprint_symbol(char *buffer, unsigned long address, if (add_offset) len += sprintf(buffer + len, "+%#lx/%#lx", offset, size); - if (modname) - len += sprintf(buffer + len, " [%s]", modname); + if (modname) { + len += sprintf(buffer + len, " [%s", modname); +#if IS_ENABLED(CONFIG_STACKTRACE_BUILD_ID) + if (add_buildid && buildid) { + /* build ID should match length of sprintf */ +#if IS_ENABLED(CONFIG_MODULES) + static_assert(sizeof(typeof_member(struct module, build_id)) == 20); +#endif + len += sprintf(buffer + len, " %20phN", buildid); + } +#endif + len += sprintf(buffer + len, "]"); + } return len; } @@ -443,10 +470,27 @@ static int __sprint_symbol(char *buffer, unsigned long address, */ int sprint_symbol(char *buffer, unsigned long address) { - return __sprint_symbol(buffer, address, 0, 1); + return __sprint_symbol(buffer, address, 0, 1, 0); } EXPORT_SYMBOL_GPL(sprint_symbol); +/** + * sprint_symbol_build_id - Look up a kernel symbol and return it in a text buffer + * @buffer: buffer to be stored + * @address: address to lookup + * + * This function looks up a kernel symbol with @address and stores its name, + * offset, size, module name and module build ID to @buffer if possible. If no + * symbol was found, just saves its @address as is. + * + * This function returns the number of bytes stored in @buffer. + */ +int sprint_symbol_build_id(char *buffer, unsigned long address) +{ + return __sprint_symbol(buffer, address, 0, 1, 1); +} +EXPORT_SYMBOL_GPL(sprint_symbol_build_id); + /** * sprint_symbol_no_offset - Look up a kernel symbol and return it in a text buffer * @buffer: buffer to be stored @@ -460,7 +504,7 @@ EXPORT_SYMBOL_GPL(sprint_symbol); */ int sprint_symbol_no_offset(char *buffer, unsigned long address) { - return __sprint_symbol(buffer, address, 0, 0); + return __sprint_symbol(buffer, address, 0, 0, 0); } EXPORT_SYMBOL_GPL(sprint_symbol_no_offset); @@ -480,7 +524,27 @@ EXPORT_SYMBOL_GPL(sprint_symbol_no_offset); */ int sprint_backtrace(char *buffer, unsigned long address) { - return __sprint_symbol(buffer, address, -1, 1); + return __sprint_symbol(buffer, address, -1, 1, 0); +} + +/** + * sprint_backtrace_build_id - Look up a backtrace symbol and return it in a text buffer + * @buffer: buffer to be stored + * @address: address to lookup + * + * This function is for stack backtrace and does the same thing as + * sprint_symbol() but with modified/decreased @address. If there is a + * tail-call to the function marked "noreturn", gcc optimized out code after + * the call so that the stack-saved return address could point outside of the + * caller. This function ensures that kallsyms will find the original caller + * by decreasing @address. This function also appends the module build ID to + * the @buffer if @address is within a kernel module. + * + * This function returns the number of bytes stored in @buffer. + */ +int sprint_backtrace_build_id(char *buffer, unsigned long address) +{ + return __sprint_symbol(buffer, address, -1, 1, 1); } /* To avoid using get_symbol_offset for every symbol, we carry prefix along. */ diff --git a/kernel/module.c b/kernel/module.c index 64bd61b2d3ad..ed13917ea5f3 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include @@ -1465,6 +1466,13 @@ resolve_symbol_wait(struct module *mod, return ksym; } +#ifdef CONFIG_KALLSYMS +static inline bool sect_empty(const Elf_Shdr *sect) +{ + return !(sect->sh_flags & SHF_ALLOC) || sect->sh_size == 0; +} +#endif + /* * /sys/module/foo/sections stuff * J. Corbet @@ -1472,11 +1480,6 @@ resolve_symbol_wait(struct module *mod, #ifdef CONFIG_SYSFS #ifdef CONFIG_KALLSYMS -static inline bool sect_empty(const Elf_Shdr *sect) -{ - return !(sect->sh_flags & SHF_ALLOC) || sect->sh_size == 0; -} - struct module_sect_attr { struct bin_attribute battr; unsigned long address; @@ -2797,6 +2800,26 @@ static void add_kallsyms(struct module *mod, const struct load_info *info) } #endif /* CONFIG_KALLSYMS */ +#if IS_ENABLED(CONFIG_KALLSYMS) && IS_ENABLED(CONFIG_STACKTRACE_BUILD_ID) +static void init_build_id(struct module *mod, const struct load_info *info) +{ + const Elf_Shdr *sechdr; + unsigned int i; + + for (i = 0; i < info->hdr->e_shnum; i++) { + sechdr = &info->sechdrs[i]; + if (!sect_empty(sechdr) && sechdr->sh_type == SHT_NOTE && + !build_id_parse_buf((void *)sechdr->sh_addr, mod->build_id, + sechdr->sh_size)) + break; + } +} +#else +static void init_build_id(struct module *mod, const struct load_info *info) +{ +} +#endif + static void dynamic_debug_setup(struct module *mod, struct _ddebug *debug, unsigned int num) { if (!debug) @@ -4021,6 +4044,7 @@ static int load_module(struct load_info *info, const char __user *uargs, goto free_arch_cleanup; } + init_build_id(mod, info); dynamic_debug_setup(mod, info->debug, info->num_debug); /* Ftrace init must be called in the MODULE_STATE_UNFORMED state */ @@ -4254,6 +4278,7 @@ const char *module_address_lookup(unsigned long addr, unsigned long *size, unsigned long *offset, char **modname, + const unsigned char **modbuildid, char *namebuf) { const char *ret = NULL; @@ -4264,6 +4289,13 @@ const char *module_address_lookup(unsigned long addr, if (mod) { if (modname) *modname = mod->name; + if (modbuildid) { +#if IS_ENABLED(CONFIG_STACKTRACE_BUILD_ID) + *modbuildid = mod->build_id; +#else + *modbuildid = NULL; +#endif + } ret = find_kallsyms_symbol(mod, addr, size, offset); } diff --git a/lib/vsprintf.c b/lib/vsprintf.c index 77ba1c40a99d..26c83943748a 100644 --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@ -993,8 +993,12 @@ char *symbol_string(char *buf, char *end, void *ptr, value = (unsigned long)ptr; #ifdef CONFIG_KALLSYMS - if (*fmt == 'B') + if (*fmt == 'B' && fmt[1] == 'b') + sprint_backtrace_build_id(sym, value); + else if (*fmt == 'B') sprint_backtrace(sym, value); + else if (*fmt == 'S' && (fmt[1] == 'b' || (fmt[1] == 'R' && fmt[2] == 'b'))) + sprint_symbol_build_id(sym, value); else if (*fmt != 's') sprint_symbol(sym, value); else @@ -2263,9 +2267,11 @@ early_param("no_hash_pointers", no_hash_pointers_enable); * - 'S' For symbolic direct pointers (or function descriptors) with offset * - 's' For symbolic direct pointers (or function descriptors) without offset * - '[Ss]R' as above with __builtin_extract_return_addr() translation + * - 'S[R]b' as above with module build ID (for use in backtraces) * - '[Ff]' %pf and %pF were obsoleted and later removed in favor of * %ps and %pS. Be careful when re-using these specifiers. * - 'B' For backtraced symbolic direct pointers with offset + * - 'Bb' as above with module build ID (for use in backtraces) * - 'R' For decoded struct resource, e.g., [mem 0x0-0x1f 64bit pref] * - 'r' For raw struct resource, e.g., [mem 0x0-0x1f flags 0x201] * - 'b[l]' For a bitmap, the number of bits is determined by the field -- cgit v1.2.3-59-g8ed1b From 44e8a5e9120bf4fc1ab046b648b0598e6652c36e Mon Sep 17 00:00:00 2001 From: Stephen Boyd Date: Wed, 7 Jul 2021 18:09:49 -0700 Subject: kdump: use vmlinux_build_id to simplify We can use the vmlinux_build_id array here now instead of open coding it. This mostly consolidates code. Link: https://lkml.kernel.org/r/20210511003845.2429846-14-swboyd@chromium.org Signed-off-by: Stephen Boyd Cc: Jiri Olsa Cc: Alexei Starovoitov Cc: Jessica Yu Cc: Evan Green Cc: Hsin-Yi Wang Cc: Dave Young Cc: Baoquan He Cc: Vivek Goyal Cc: Andy Shevchenko Cc: Borislav Petkov Cc: Catalin Marinas Cc: Ingo Molnar Cc: Konstantin Khlebnikov Cc: Matthew Wilcox Cc: Petr Mladek Cc: Rasmus Villemoes Cc: Sasha Levin Cc: Sergey Senozhatsky Cc: Steven Rostedt Cc: Thomas Gleixner Cc: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/buildid.h | 2 +- include/linux/crash_core.h | 12 +++++------ kernel/crash_core.c | 50 ++-------------------------------------------- lib/buildid.c | 2 +- 4 files changed, 10 insertions(+), 56 deletions(-) (limited to 'kernel') diff --git a/include/linux/buildid.h b/include/linux/buildid.h index 3e8d77a93ec4..3b7a0ff4642f 100644 --- a/include/linux/buildid.h +++ b/include/linux/buildid.h @@ -10,7 +10,7 @@ int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id, __u32 *size); int build_id_parse_buf(const void *buf, unsigned char *build_id, u32 buf_size); -#if IS_ENABLED(CONFIG_STACKTRACE_BUILD_ID) +#if IS_ENABLED(CONFIG_STACKTRACE_BUILD_ID) || IS_ENABLED(CONFIG_CRASH_CORE) extern unsigned char vmlinux_build_id[BUILD_ID_SIZE_MAX]; void init_vmlinux_build_id(void); #else diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h index 206bde8308b2..de62a722431e 100644 --- a/include/linux/crash_core.h +++ b/include/linux/crash_core.h @@ -38,8 +38,12 @@ phys_addr_t paddr_vmcoreinfo_note(void); #define VMCOREINFO_OSRELEASE(value) \ vmcoreinfo_append_str("OSRELEASE=%s\n", value) -#define VMCOREINFO_BUILD_ID(value) \ - vmcoreinfo_append_str("BUILD-ID=%s\n", value) +#define VMCOREINFO_BUILD_ID() \ + ({ \ + static_assert(sizeof(vmlinux_build_id) == 20); \ + vmcoreinfo_append_str("BUILD-ID=%20phN\n", vmlinux_build_id); \ + }) + #define VMCOREINFO_PAGESIZE(value) \ vmcoreinfo_append_str("PAGESIZE=%ld\n", value) #define VMCOREINFO_SYMBOL(name) \ @@ -69,10 +73,6 @@ extern unsigned char *vmcoreinfo_data; extern size_t vmcoreinfo_size; extern u32 *vmcoreinfo_note; -/* raw contents of kernel .notes section */ -extern const void __start_notes __weak; -extern const void __stop_notes __weak; - Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type, void *data, size_t data_len); void final_note(Elf_Word *buf); diff --git a/kernel/crash_core.c b/kernel/crash_core.c index da449c1cdca7..eb53f5ec62c9 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -4,6 +4,7 @@ * Copyright (C) 2002-2004 Eric Biederman */ +#include #include #include #include @@ -378,53 +379,6 @@ phys_addr_t __weak paddr_vmcoreinfo_note(void) } EXPORT_SYMBOL(paddr_vmcoreinfo_note); -#define NOTES_SIZE (&__stop_notes - &__start_notes) -#define BUILD_ID_MAX SHA1_DIGEST_SIZE -#define NT_GNU_BUILD_ID 3 - -struct elf_note_section { - struct elf_note n_hdr; - u8 n_data[]; -}; - -/* - * Add build ID from .notes section as generated by the GNU ld(1) - * or LLVM lld(1) --build-id option. - */ -static void add_build_id_vmcoreinfo(void) -{ - char build_id[BUILD_ID_MAX * 2 + 1]; - int n_remain = NOTES_SIZE; - - while (n_remain >= sizeof(struct elf_note)) { - const struct elf_note_section *note_sec = - &__start_notes + NOTES_SIZE - n_remain; - const u32 n_namesz = note_sec->n_hdr.n_namesz; - - if (note_sec->n_hdr.n_type == NT_GNU_BUILD_ID && - n_namesz != 0 && - !strcmp((char *)¬e_sec->n_data[0], "GNU")) { - if (note_sec->n_hdr.n_descsz <= BUILD_ID_MAX) { - const u32 n_descsz = note_sec->n_hdr.n_descsz; - const u8 *s = ¬e_sec->n_data[n_namesz]; - - s = PTR_ALIGN(s, 4); - bin2hex(build_id, s, n_descsz); - build_id[2 * n_descsz] = '\0'; - VMCOREINFO_BUILD_ID(build_id); - return; - } - pr_warn("Build ID is too large to include in vmcoreinfo: %u > %u\n", - note_sec->n_hdr.n_descsz, - BUILD_ID_MAX); - return; - } - n_remain -= sizeof(struct elf_note) + - ALIGN(note_sec->n_hdr.n_namesz, 4) + - ALIGN(note_sec->n_hdr.n_descsz, 4); - } -} - static int __init crash_save_vmcoreinfo_init(void) { vmcoreinfo_data = (unsigned char *)get_zeroed_page(GFP_KERNEL); @@ -443,7 +397,7 @@ static int __init crash_save_vmcoreinfo_init(void) } VMCOREINFO_OSRELEASE(init_uts_ns.name.release); - add_build_id_vmcoreinfo(); + VMCOREINFO_BUILD_ID(); VMCOREINFO_PAGESIZE(PAGE_SIZE); VMCOREINFO_SYMBOL(init_uts_ns); diff --git a/lib/buildid.c b/lib/buildid.c index 180a1a9b3b76..dfc62625cae4 100644 --- a/lib/buildid.c +++ b/lib/buildid.c @@ -174,7 +174,7 @@ int build_id_parse_buf(const void *buf, unsigned char *build_id, u32 buf_size) return parse_build_id_buf(build_id, NULL, buf, buf_size); } -#if IS_ENABLED(CONFIG_STACKTRACE_BUILD_ID) +#if IS_ENABLED(CONFIG_STACKTRACE_BUILD_ID) || IS_ENABLED(CONFIG_CRASH_CORE) unsigned char vmlinux_build_id[BUILD_ID_SIZE_MAX] __ro_after_init; /** -- cgit v1.2.3-59-g8ed1b