Diffstat (limited to 'Documentation/filesystems/proc.rst')
-rw-r--r-- | Documentation/filesystems/proc.rst | 654 |
1 files changed, 438 insertions, 216 deletions
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst index 533c79e8d2cd..2971551b7235 100644 --- a/Documentation/filesystems/proc.rst +++ b/Documentation/filesystems/proc.rst @@ -47,6 +47,8 @@ fixes/update part 1.1 Stefani Seibold <stefani@seibold.net> June 9 2009 3.10 /proc/<pid>/timerslack_ns - Task timerslack value 3.11 /proc/<pid>/patch_state - Livepatch patch operation state 3.12 /proc/<pid>/arch_status - Task architecture specific information + 3.13 /proc/<pid>/fd - List of symlinks to open files + 3.14 /proc/<pid>/ksm_stat - Information about the process's ksm status. 4 Configuring procfs 4.1 Mount options @@ -84,7 +86,7 @@ contact Bodo Bauer at bb@ricochet.net. We'll be happy to add them to this document. The latest version of this document is available online at -http://tldp.org/LDP/Linux-Filesystem-Hierarchy/html/proc.html +https://www.kernel.org/doc/html/latest/filesystems/proc.html If the above direction does not work for you, you could try the kernel mailing list at linux-kernel@vger.kernel.org and/or try to reach me at @@ -126,6 +128,16 @@ process running on the system, which is named after the process ID (PID). The link 'self' points to the process reading the file system. Each process subdirectory has the entries listed in Table 1-1. +A process can read its own information from /proc/PID/* with no extra +permissions. When reading /proc/PID/* information for other processes, the reading +process is required to have either CAP_SYS_PTRACE capability with +PTRACE_MODE_READ access permissions, or, alternatively, CAP_PERFMON +capability. This applies to all read-only information like `maps`, `environ`, +`pagemap`, etc. The only exception is the `mem` file due to its read-write nature, +which requires CAP_SYS_PTRACE capabilities with more elevated +PTRACE_MODE_ATTACH permissions; CAP_PERFMON capability does not grant access +to /proc/PID/mem for other processes.
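As a rough illustration of the paragraph above (a process reading its own /proc entries needs no extra permissions), the following Python sketch parses the ``Key: value`` lines of a status file into a dictionary. The helper names ``parse_status`` and ``read_own_status`` are hypothetical and not part of the patch; the reader function assumes a Linux system with procfs mounted at /proc.

```python
from pathlib import Path


def parse_status(text: str) -> dict:
    """Parse 'Key:<whitespace>value' lines as found in /proc/PID/status."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore anything that is not a key/value line
            fields[key.strip()] = value.strip()
    return fields


def read_own_status() -> dict:
    """Read the calling process's own status file (Linux only).

    Assumption: procfs is mounted at /proc. Reading another process's
    entries may fail with PermissionError unless the capability rules
    described in the text are satisfied.
    """
    return parse_status(Path("/proc/self/status").read_text())
```

On a Linux host, ``read_own_status().get("Kthread")`` would return the new ``Kthread`` field where the running kernel provides it, and ``None`` otherwise.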
+ Note that an open file descriptor to /proc/<pid> or to any of its contained files or subdirectories does not prevent <pid> being reused for some other process in the event that <pid> exits. Operations on @@ -178,6 +190,7 @@ read the file /proc/PID/status:: Gid: 100 100 100 100 FDSize: 256 Groups: 100 14 16 + Kthread: 0 VmPeak: 5004 kB VmSize: 5004 kB VmLck: 0 kB @@ -210,6 +223,7 @@ read the file /proc/PID/status:: NoNewPrivs: 0 Seccomp: 0 Speculation_Store_Bypass: thread vulnerable + SpeculationIndirectBranch: conditional enabled voluntary_ctxt_switches: 0 nonvoluntary_ctxt_switches: 1 @@ -230,7 +244,7 @@ asynchronous manner and the value may not be very precise. To see a precise snapshot of a moment, you can see /proc/<pid>/smaps file and scan page table. It's slow but very precise. -.. table:: Table 1-2: Contents of the status files (as of 4.19) +.. table:: Table 1-2: Contents of the status fields (as of 4.19) ========================== =================================================== Field Content @@ -244,7 +258,8 @@ It's slow but very precise. Ngid NUMA group ID (0 if none) Pid process id PPid process id of the parent process - TracerPid PID of process tracing this process (0 if not) + TracerPid PID of process tracing this process (0 if not, or + the tracer is outside of the current pid namespace) Uid Real, effective, saved set, and file system UIDs Gid Real, effective, saved set, and file system GIDs FDSize number of file descriptor slots currently allocated @@ -253,6 +268,7 @@ It's slow but very precise. NSpid descendant namespace process ID hierarchy NSpgid descendant namespace process group ID hierarchy NSsid descendant namespace session ID hierarchy + Kthread kernel thread flag, 1 is yes, 0 is no VmPeak peak virtual memory size VmSize total program size VmLck locked memory size @@ -292,6 +308,7 @@ It's slow but very precise. NoNewPrivs no_new_privs, like prctl(PR_GET_NO_NEW_PRIV, ...) Seccomp seccomp mode, like prctl(PR_GET_SECCOMP, ...) 
Speculation_Store_Bypass speculative store bypass mitigation status + SpeculationIndirectBranch indirect branch speculation mode Cpus_allowed mask of CPUs on which this process may run Cpus_allowed_list Same as previous, but in "list format" Mems_allowed mask of memory nodes allowed to this process @@ -301,7 +318,7 @@ It's slow but very precise. ========================== =================================================== -.. table:: Table 1-3: Contents of the statm files (as of 2.6.8-rc3) +.. table:: Table 1-3: Contents of the statm fields (as of 2.6.8-rc3) ======== =============================== ============================== Field Content @@ -319,7 +336,7 @@ It's slow but very precise. ======== =============================== ============================== -.. table:: Table 1-4: Contents of the stat files (as of 2.6.30-rc7) +.. table:: Table 1-4: Contents of the stat fields (as of 2.6.30-rc7) ============= =============================================================== Field Content @@ -424,15 +441,28 @@ with the memory region, as the case would be with BSS (uninitialized data). The "pathname" shows the name associated file for this mapping. If the mapping is not associated with a file: - ======= ==================================== + =================== =========================================== [heap] the heap of the program [stack] the stack of the main process [vdso] the "virtual dynamic shared object", the kernel system call handler - ======= ==================================== + [anon:<name>] a private anonymous mapping that has been + named by userspace + [anon_shmem:<name>] an anonymous shared memory mapping that has + been named by userspace + =================== =========================================== or if empty, the mapping is anonymous. +Starting with 6.11 kernel, /proc/PID/maps provides an alternative +ioctl()-based API that gives ability to flexibly and efficiently query and +filter individual VMAs. 
This interface is binary and is meant for more +efficient and easy programmatic use. `struct procmap_query`, defined in +linux/fs.h UAPI header, serves as an input/output argument to the +`PROCMAP_QUERY` ioctl() command. See comments in linux/fs.h UAPI header for +details on query semantics, supported flags, data returned, and general API +usage information. + The /proc/PID/smaps is an extension based on maps, showing the memory consumption for each of the process's mappings. For each mapping (aka Virtual Memory Area, or VMA) there is a series of lines such as the following:: @@ -444,12 +474,14 @@ Memory Area, or VMA) there is a series of lines such as the following:: MMUPageSize: 4 kB Rss: 892 kB Pss: 374 kB + Pss_Dirty: 0 kB Shared_Clean: 892 kB Shared_Dirty: 0 kB Private_Clean: 0 kB Private_Dirty: 0 kB Referenced: 892 kB Anonymous: 0 kB + KSM: 0 kB LazyFree: 0 kB AnonHugePages: 0 kB ShmemPmdMapped: 0 kB @@ -463,23 +495,42 @@ Memory Area, or VMA) there is a series of lines such as the following:: THPeligible: 0 VmFlags: rd ex mr mw me dw -The first of these lines shows the same information as is displayed for the -mapping in /proc/PID/maps. Following lines show the size of the mapping -(size); the size of each page allocated when backing a VMA (KernelPageSize), -which is usually the same as the size in the page table entries; the page size -used by the MMU when backing a VMA (in most cases, the same as KernelPageSize); -the amount of the mapping that is currently resident in RAM (RSS); the -process' proportional share of this mapping (PSS); and the number of clean and -dirty shared and private pages in the mapping. +The first of these lines shows the same information as is displayed for +the mapping in /proc/PID/maps.
Following lines show the size of the +mapping (size); the size of each page allocated when backing a VMA +(KernelPageSize), which is usually the same as the size in the page table +entries; the page size used by the MMU when backing a VMA (in most cases, +the same as KernelPageSize); the amount of the mapping that is currently +resident in RAM (RSS); the process's proportional share of this mapping +(PSS); and the number of clean and dirty shared and private pages in the +mapping. The "proportional set size" (PSS) of a process is the count of pages it has in memory, where each page is divided by the number of processes sharing it. So if a process has 1000 pages all to itself, and 1000 shared with one other -process, its PSS will be 1500. - -Note that even a page which is part of a MAP_SHARED mapping, but has only -a single pte mapped, i.e. is currently used by only one process, is accounted -as private and not as shared. +process, its PSS will be 1500. "Pss_Dirty" is the portion of PSS which +consists of dirty pages. ("Pss_Clean" is not included, but it can be +calculated by subtracting "Pss_Dirty" from "Pss".) + +Traditionally, a page is accounted as "private" if it is mapped exactly once, +and a page is accounted as "shared" when mapped multiple times, even when +mapped in the same process multiple times. Note that this accounting is +independent of MAP_SHARED. + +In some kernel configurations, the semantics of pages part of a larger +allocation (e.g., THP) can differ: a page is accounted as "private" if all +pages part of the corresponding large allocation are *certainly* mapped in the +same process, even if the page is mapped multiple times in that process. A +page is accounted as "shared" if any page of the larger allocation +is *maybe* mapped in a different process. In some cases, a large allocation +might be treated as "maybe mapped by multiple processes" even though this +is no longer the case.
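The PSS arithmetic described in this hunk can be checked with a small Python sketch. It follows the simplified model in the text (each resident page contributes one page divided by the number of processes sharing it) and is not the kernel's implementation, whose accounting may be approximate. The function names are hypothetical.

```python
from fractions import Fraction


def pss_pages(sharer_counts):
    """Return the proportional set size, in pages.

    sharer_counts holds one entry per resident page: the number of
    processes sharing that page (1 for a page the process has to itself).
    """
    return sum(Fraction(1, n) for n in sharer_counts)


def pss_clean_kb(pss_kb, pss_dirty_kb):
    """Derive the unreported "Pss_Clean" value from "Pss" and "Pss_Dirty"."""
    return pss_kb - pss_dirty_kb


# The example from the text: 1000 private pages plus 1000 pages shared
# with one other process.
example = pss_pages([1] * 1000 + [2] * 1000)
```

Here ``example`` evaluates to 1500 pages, matching the figure in the text, and ``pss_clean_kb(374, 0)`` yields 374 kB for the sample smaps entry shown earlier.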
+ +Some kernel configurations do not track the precise number of times a page part +of a larger allocation is mapped. In this case, when calculating the PSS, the +average number of mappings per page in this larger allocation might be used +as an approximation for the number of mappings of a page. The PSS calculation +will be imprecise in this case. "Referenced" indicates the amount of memory currently marked as referenced or accessed. @@ -488,18 +539,21 @@ accessed. a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE and a page is modified, the file page is replaced by a private anonymous copy. +"KSM" reports how many of the pages are KSM pages. Note that KSM-placed zeropages +are not included, only actual KSM pages. + "LazyFree" shows the amount of memory which is marked by madvise(MADV_FREE). The memory isn't freed immediately with madvise(). It's freed in memory pressure if the memory is clean. Please note that the printed value might be lower than the real value due to optimizations used in the current implementation. If this is not desirable please file a bug report. -"AnonHugePages" shows the ammount of memory backed by transparent hugepage. +"AnonHugePages" shows the amount of memory backed by transparent hugepage. -"ShmemPmdMapped" shows the ammount of shared (shmem/tmpfs) memory backed by +"ShmemPmdMapped" shows the amount of shared (shmem/tmpfs) memory backed by huge pages. -"Shared_Hugetlb" and "Private_Hugetlb" show the ammounts of memory backed by +"Shared_Hugetlb" and "Private_Hugetlb" show the amounts of memory backed by hugetlbfs page which is *not* counted in "RSS" or "PSS" field for historical reasons. And these are not included in {Shared,Private}_{Clean,Dirty} field. @@ -510,8 +564,10 @@ replaced by copy-on-write) part of the underlying shmem object out on swap. "SwapPss" shows proportional swap share of this mapping. Unlike "Swap", this does not take into account swapped out page of underlying shmem objects. 
"Locked" indicates whether the mapping is locked in memory or not. -"THPeligible" indicates whether the mapping is eligible for allocating THP -pages - 1 if true, 0 otherwise. It just shows the current status. + +"THPeligible" indicates whether the mapping is eligible for allocating +naturally aligned THP pages of any currently enabled size. 1 if true, 0 +otherwise. "VmFlags" field deserves a separate description. This member represents the kernel flags associated with the particular virtual memory area in two letter @@ -528,7 +584,6 @@ encoded manner. The codes are the following: ms may share gd stack segment growns down pf pure PFN range - dw disabled write to the mapped file lo pages are locked in memory io memory mapped I/O area sr sequential read advise provided @@ -538,14 +593,24 @@ encoded manner. The codes are the following: ac area is accountable nr swap space is not reserved for the area ht area uses huge tlb pages + sf synchronous page fault ar architecture specific flag + wf wipe on fork dd do not include area into core dump sd soft dirty flag mm mixed map area hg huge page advise flag nh no huge page advise flag - mg mergable advise flag + mg mergeable advise flag bt arm64 BTI guarded page + mt arm64 MTE allocation tags are enabled + um userfaultfd missing tracking + uw userfaultfd wr-protect tracking + ui userfaultfd minor fault + ss shadow/guarded control stack page + sl sealed + lf lock on fault pages + dp always lazily freeable mapping == ======================================= Note that there is no guarantee that every flag and associated mnemonic will @@ -649,6 +714,11 @@ Where: node locality page counters (N0 == node0, N1 == node1, ...) and the kernel page size, in KB, that is backing the mapping up. +Note that some kernel configurations do not track the precise number of times +a page part of a larger allocation (e.g., THP) is mapped. 
In these +configurations, "mapmax" might correspond to the average number of mappings +per page in such a larger allocation instead. + 1.2 Kernel data --------------- @@ -663,10 +733,17 @@ files are there, and which are missing. ============ =============================================================== File Content ============ =============================================================== + allocinfo Memory allocations profiling information apm Advanced power management info + bootconfig Kernel command line obtained from boot config, + and, if there were kernel parameters from the + boot loader, a "# Parameters from bootloader:" + line followed by a line containing those + parameters prefixed by "# ". (5.5) buddyinfo Kernel memory allocator information (see text) (2.5) bus Directory containing bus specific information - cmdline Kernel command line + cmdline Kernel command line, both from bootloader and embedded + in the kernel image cpuinfo Info about the CPU devices Available devices (block and character) dma Used DMA channels @@ -684,7 +761,14 @@ files are there, and which are missing. kcore Kernel core image (can be ELF or A.OUT(deprecated in 2.4)) kmsg Kernel messages ksyms Kernel symbol table - loadavg Load average of last 1, 5 & 15 minutes + loadavg Load average of last 1, 5 & 15 minutes; + number of processes currently runnable (running or on ready queue); + total number of processes in system; + last pid created. + All fields are separated by one space except "number of + processes currently runnable" and "total number of processes + in system", which are separated by a slash ('/'). Example: + 0.61 0.61 0.55 3/828 22084 locks Kernel locks meminfo Memory info misc Miscellaneous @@ -915,60 +999,117 @@ also be allocatable although a lot of filesystem metadata may have to be reclaimed to achieve this. +allocinfo +~~~~~~~~~ + +Provides information about memory allocations at all locations in the code +base.
Each allocation in the code is identified by its source file, line +number, module (if originates from a loadable module) and the function calling +the allocation. The number of bytes allocated and number of calls at each +location are reported. The first line indicates the version of the file, the +second line is the header listing fields in the file. + +Example output. + +:: + + > tail -n +3 /proc/allocinfo | sort -rn + 127664128 31168 mm/page_ext.c:270 func:alloc_page_ext + 56373248 4737 mm/slub.c:2259 func:alloc_slab_page + 14880768 3633 mm/readahead.c:247 func:page_cache_ra_unbounded + 14417920 3520 mm/mm_init.c:2530 func:alloc_large_system_hash + 13377536 234 block/blk-mq.c:3421 func:blk_mq_alloc_rqs + 11718656 2861 mm/filemap.c:1919 func:__filemap_get_folio + 9192960 2800 kernel/fork.c:307 func:alloc_thread_stack_node + 4206592 4 net/netfilter/nf_conntrack_core.c:2567 func:nf_ct_alloc_hashtable + 4136960 1010 drivers/staging/ctagmod/ctagmod.c:20 [ctagmod] func:ctagmod_start + 3940352 962 mm/memory.c:4214 func:alloc_anon_folio + 2894464 22613 fs/kernfs/dir.c:615 func:__kernfs_new_node + ... + + meminfo ~~~~~~~ Provides information about distribution and utilization of memory. This -varies by architecture and compile options. The following is from a -16GB PIII, which has highmem enabled. You may not have all of these fields. +varies by architecture and compile options. Some of the counters reported +here overlap. The memory reported by the non overlapping counters may not +add up to the overall memory usage and the difference for some workloads +can be substantial. In many cases there are other means to find out +additional memory using subsystem specific interfaces, for instance +/proc/net/sockstat for TCP memory allocations. + +Example output. You may not have all of these fields. 
:: > cat /proc/meminfo - MemTotal: 16344972 kB - MemFree: 13634064 kB - MemAvailable: 14836172 kB - Buffers: 3656 kB - Cached: 1195708 kB - SwapCached: 0 kB - Active: 891636 kB - Inactive: 1077224 kB - HighTotal: 15597528 kB - HighFree: 13629632 kB - LowTotal: 747444 kB - LowFree: 4432 kB - SwapTotal: 0 kB - SwapFree: 0 kB - Dirty: 968 kB - Writeback: 0 kB - AnonPages: 861800 kB - Mapped: 280372 kB - Shmem: 644 kB - KReclaimable: 168048 kB - Slab: 284364 kB - SReclaimable: 159856 kB - SUnreclaim: 124508 kB - PageTables: 24448 kB - NFS_Unstable: 0 kB - Bounce: 0 kB - WritebackTmp: 0 kB - CommitLimit: 7669796 kB - Committed_AS: 100056 kB - VmallocTotal: 112216 kB - VmallocUsed: 428 kB - VmallocChunk: 111088 kB - Percpu: 62080 kB - HardwareCorrupted: 0 kB - AnonHugePages: 49152 kB - ShmemHugePages: 0 kB - ShmemPmdMapped: 0 kB + MemTotal: 32858820 kB + MemFree: 21001236 kB + MemAvailable: 27214312 kB + Buffers: 581092 kB + Cached: 5587612 kB + SwapCached: 0 kB + Active: 3237152 kB + Inactive: 7586256 kB + Active(anon): 94064 kB + Inactive(anon): 4570616 kB + Active(file): 3143088 kB + Inactive(file): 3015640 kB + Unevictable: 0 kB + Mlocked: 0 kB + SwapTotal: 0 kB + SwapFree: 0 kB + Zswap: 1904 kB + Zswapped: 7792 kB + Dirty: 12 kB + Writeback: 0 kB + AnonPages: 4654780 kB + Mapped: 266244 kB + Shmem: 9976 kB + KReclaimable: 517708 kB + Slab: 660044 kB + SReclaimable: 517708 kB + SUnreclaim: 142336 kB + KernelStack: 11168 kB + PageTables: 20540 kB + SecPageTables: 0 kB + NFS_Unstable: 0 kB + Bounce: 0 kB + WritebackTmp: 0 kB + CommitLimit: 16429408 kB + Committed_AS: 7715148 kB + VmallocTotal: 34359738367 kB + VmallocUsed: 40444 kB + VmallocChunk: 0 kB + Percpu: 29312 kB + EarlyMemtestBad: 0 kB + HardwareCorrupted: 0 kB + AnonHugePages: 4149248 kB + ShmemHugePages: 0 kB + ShmemPmdMapped: 0 kB + FileHugePages: 0 kB + FilePmdMapped: 0 kB + CmaTotal: 0 kB + CmaFree: 0 kB + Unaccepted: 0 kB + Balloon: 0 kB + HugePages_Total: 0 + HugePages_Free: 0 + HugePages_Rsvd: 0 + 
HugePages_Surp: 0 + Hugepagesize: 2048 kB + Hugetlb: 0 kB + DirectMap4k: 401152 kB + DirectMap2M: 10008576 kB + DirectMap1G: 24117248 kB MemTotal Total usable RAM (i.e. physical RAM minus a few reserved bits and the kernel binary code) MemFree - The sum of LowFree+HighFree + Total free RAM. On highmem systems, the sum of LowFree+HighFree MemAvailable An estimate of how much memory is available for starting new applications, without swapping. Calculated from MemFree, @@ -982,8 +1123,9 @@ Buffers Relatively temporary storage for raw disk blocks shouldn't get tremendously large (20MB or so) Cached - in-memory cache for files read from the disk (the - pagecache). Doesn't include SwapCached + In-memory cache for files read from the disk (the + pagecache) as well as tmpfs & shmem. + Doesn't include SwapCached. SwapCached Memory that once was swapped out, is swapped back in but still also is in the swapfile (if memory is needed it @@ -995,6 +1137,11 @@ Active Inactive Memory which has been less recently used. It is more eligible to be reclaimed for other purposes +Unevictable + Memory allocated for userspace which cannot be reclaimed, such + as mlocked pages, ramfs backing pages, secret memfd pages etc. +Mlocked + Memory locked with mlock(). HighTotal, HighFree Highmem is all memory above ~860MB of physical memory. Highmem areas are for use by userspace programs, or @@ -1011,26 +1158,26 @@ SwapTotal SwapFree Memory which has been evicted from RAM, and is temporarily on the disk +Zswap + Memory consumed by the zswap backend (compressed size) +Zswapped + Amount of anonymous memory stored in zswap (original size) Dirty Memory which is waiting to get written back to the disk Writeback Memory which is actively being written back to the disk AnonPages - Non-file backed pages mapped into userspace page tables -HardwareCorrupted - The amount of RAM/memory in KB, the kernel identifies as - corrupted. 
-AnonHugePages - Non-file backed huge pages mapped into userspace page tables + Non-file backed pages mapped into userspace page tables. Note that + some kernel configurations might consider all pages part of a + larger allocation (e.g., THP) as "mapped", as soon as a single + page is mapped. Mapped - files which have been mmaped, such as libraries + files which have been mmapped, such as libraries. Note that some + kernel configurations might consider all pages part of a larger + allocation (e.g., THP) as "mapped", as soon as a single page is + mapped. Shmem Total memory used by shared memory (shmem) and tmpfs -ShmemHugePages - Memory used by shared memory (shmem) and tmpfs allocated - with huge pages -ShmemPmdMapped - Shared memory mapped into userspace with huge pages KReclaimable Kernel allocations that the kernel will attempt to reclaim under memory pressure. Includes SReclaimable (below), and other @@ -1041,16 +1188,22 @@ SReclaimable Part of Slab, that might be reclaimed, such as caches SUnreclaim Part of Slab, that cannot be reclaimed on memory pressure +KernelStack + Memory consumed by the kernel stacks of all tasks PageTables - amount of memory dedicated to the lowest level of page - tables. + Memory consumed by userspace page tables +SecPageTables + Memory consumed by secondary page tables, this currently includes + KVM mmu and IOMMU allocations on x86 and arm64. NFS_Unstable - Always zero. Previous counted pages which had been written to + Always zero. Previously counted pages which had been written to the server, but has not been committed to stable storage. Bounce - Memory used for block device "bounce buffers" + Always zero. Previously memory used for block device + "bounce buffers". WritebackTmp - Memory used by FUSE for temporary writeback buffers + Always zero. Previously memory used by FUSE for temporary + writeback buffers. 
CommitLimit Based on the overcommit ratio ('vm.overcommit_ratio'), this is the total amount of memory currently available to @@ -1068,14 +1221,14 @@ CommitLimit yield a CommitLimit of 7.3G. For more details, see the memory overcommit documentation - in vm/overcommit-accounting. + in mm/overcommit-accounting. Committed_AS The amount of memory presently allocated on the system. The committed memory is a sum of all of the memory which has been allocated by processes, even if it has not been "used" by them as of yet. A process which malloc()'s 1G of memory, but only touches 300M of it will show up as - using 1G. This 1G is memory which has been "committed" to + using 1G. This 1G is memory which has been "committed" to by the VM and can be used at any time by the allocating application. With strict overcommit enabled on the system (mode 2 in 'vm.overcommit_memory'), allocations which would @@ -1084,7 +1237,7 @@ Committed_AS not fail due to lack of memory once that memory has been successfully allocated. VmallocTotal - total size of vmalloc memory area + total size of vmalloc virtual address space VmallocUsed amount of vmalloc area which is used VmallocChunk @@ -1092,6 +1245,41 @@ VmallocChunk Percpu Memory allocated to the percpu allocator used to back percpu allocations. This stat excludes the cost of metadata. +EarlyMemtestBad + The amount of RAM/memory in kB, that was identified as corrupted + by early memtest. If memtest was not run, this field will not + be displayed at all. Size is never rounded down to 0 kB. + That means if 0 kB is reported, you can safely assume + there was at least one pass of memtest and none of the passes + found a single faulty byte of RAM. +HardwareCorrupted + The amount of RAM/memory in KB, the kernel identifies as + corrupted. 
+AnonHugePages + Non-file backed huge pages mapped into userspace page tables +ShmemHugePages + Memory used by shared memory (shmem) and tmpfs allocated + with huge pages +ShmemPmdMapped + Shared memory mapped into userspace with huge pages +FileHugePages + Memory used for filesystem data (page cache) allocated + with huge pages +FilePmdMapped + Page cache mapped into userspace with huge pages +CmaTotal + Memory reserved for the Contiguous Memory Allocator (CMA) +CmaFree + Free remaining memory in the CMA reserves +Unaccepted + Memory that has not been accepted by the guest +Balloon + Memory returned to Host by VM Balloon Drivers +HugePages_Total, HugePages_Free, HugePages_Rsvd, HugePages_Surp, Hugepagesize, Hugetlb + See Documentation/admin-guide/mm/hugetlbpage.rst. +DirectMap4k, DirectMap2M, DirectMap1G + Breakdown of page table sizes used in the kernel's + identity mapping of RAM vmallocinfo ~~~~~~~~~~~ @@ -1160,85 +1348,7 @@ Provides counts of softirq handlers serviced since boot time, for each CPU. HRTIMER: 0 0 0 0 RCU: 1678 1769 2178 2250 - -1.3 IDE devices in /proc/ide ----------------------------- - -The subdirectory /proc/ide contains information about all IDE devices of which -the kernel is aware. There is one subdirectory for each IDE controller, the -file drivers and a link for each IDE device, pointing to the device directory -in the controller specific subtree. - -The file 'drivers' contains general information about the drivers used for the -IDE devices:: - - > cat /proc/ide/drivers - ide-cdrom version 4.53 - ide-disk version 1.08 - -More detailed information can be found in the controller specific -subdirectories. These are named ide0, ide1 and so on. Each of these -directories contains the files shown in table 1-6. - - -.. table:: Table 1-6: IDE controller info in /proc/ide/ide? 
- - ======= ======================================= - File Content - ======= ======================================= - channel IDE channel (0 or 1) - config Configuration (only for PCI/IDE bridge) - mate Mate name - model Type/Chipset of IDE controller - ======= ======================================= - -Each device connected to a controller has a separate subdirectory in the -controllers directory. The files listed in table 1-7 are contained in these -directories. - - -.. table:: Table 1-7: IDE device information - - ================ ========================================== - File Content - ================ ========================================== - cache The cache - capacity Capacity of the medium (in 512Byte blocks) - driver driver and version - geometry physical and logical geometry - identify device identify block - media media type - model device identifier - settings device setup - smart_thresholds IDE disk management thresholds - smart_values IDE disk management values - ================ ========================================== - -The most interesting file is ``settings``. This file contains a nice -overview of the drive parameters:: - - # cat /proc/ide/ide0/hda/settings - name value min max mode - ---- ----- --- --- ---- - bios_cyl 526 0 65535 rw - bios_head 255 0 255 rw - bios_sect 63 0 63 rw - breada_readahead 4 0 127 rw - bswap 0 0 1 r - file_readahead 72 0 2097151 rw - io_32bit 0 0 3 rw - keepsettings 0 0 1 rw - max_kb_per_request 122 1 127 rw - multcount 0 0 8 rw - nice1 1 0 1 rw - nowerr 0 0 1 rw - pio_mode write-only 0 255 w - slow 0 0 1 rw - unmaskirq 0 0 1 rw - using_dma 0 0 1 rw - - -1.4 Networking info in /proc/net +1.3 Networking info in /proc/net -------------------------------- The subdirectory /proc/net follows the usual pattern. Table 1-8 shows the @@ -1284,6 +1394,7 @@ support this. Table 1-9 lists the files and their meaning. 
rt_cache Routing cache snmp SNMP data sockstat Socket statistics + softnet_stat Per-CPU incoming packets queues statistics of online CPUs tcp TCP sockets udp UDP sockets unix UNIX domain sockets @@ -1317,12 +1428,12 @@ It will contain information that is specific to that bond, such as the current slaves of the bond, the link status of the slaves, and how many times the slaves link has failed. -1.5 SCSI info +1.4 SCSI info ------------- -If you have a SCSI host adapter in your system, you'll find a subdirectory -named after the driver for this adapter in /proc/scsi. You'll also see a list -of all recognized SCSI devices in /proc/scsi:: +If you have a SCSI or ATA host adapter in your system, you'll find a +subdirectory named after the driver for this adapter in /proc/scsi. +You'll also see a list of all recognized SCSI devices in /proc/scsi:: >cat /proc/scsi/scsi Attached devices: @@ -1380,7 +1491,7 @@ AHA-2940 SCSI adapter:: Total transfers 0 (0 reads and 0 writes) -1.6 Parallel port info in /proc/parport +1.5 Parallel port info in /proc/parport --------------------------------------- The directory /proc/parport contains information about the parallel ports of @@ -1405,7 +1516,7 @@ These directories contain the four files shown in Table 1-10. number or none). ========= ==================================================================== -1.7 TTY info in /proc/tty +1.6 TTY info in /proc/tty ------------------------- Information about the available and actually used tty's can be found in the @@ -1440,7 +1551,7 @@ To see which tty's are currently in use, you can simply look into the file unknown /dev/tty 4 1-63 console -1.8 Miscellaneous kernel statistics in /proc/stat +1.7 Miscellaneous kernel statistics in /proc/stat ------------------------------------------------- Various pieces of information about kernel activity are available in the @@ -1448,16 +1559,18 @@ Various pieces of information about kernel activity are available in the since the system first booted. 
For a quick look, simply cat the file:: > cat /proc/stat - cpu 2255 34 2290 22625563 6290 127 456 0 0 0 - cpu0 1132 34 1441 11311718 3675 127 438 0 0 0 - cpu1 1123 0 849 11313845 2614 0 18 0 0 0 - intr 114930548 113199788 3 0 5 263 0 4 [... lots more numbers ...] - ctxt 1990473 - btime 1062191376 - processes 2915 - procs_running 1 + cpu 237902850 368826709 106375398 1873517540 1135548 0 14507935 0 0 0 + cpu0 60045249 91891769 26331539 468411416 495718 0 5739640 0 0 0 + cpu1 59746288 91759249 26609887 468860630 312281 0 4384817 0 0 0 + cpu2 59489247 92985423 26904446 467808813 171668 0 2268998 0 0 0 + cpu3 58622065 92190267 26529524 468436680 155879 0 2114478 0 0 0 + intr 8688370575 8 3373 0 0 0 0 0 0 1 40791 0 0 353317 0 0 0 0 224789828 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 190974333 41958554 123983334 43 0 224593 0 0 0 <more 0's deleted> + ctxt 22848221062 + btime 1605316999 + processes 746787147 + procs_running 2 procs_blocked 0 - softirq 183433 0 21755 12 39 1137 231 21459 2263 + softirq 12121874454 100099120 3938138295 127375644 2795979 187870761 0 173808342 3072582055 52608 224184354 The very first "cpu" line aggregates the numbers in all of the other "cpuN" lines. These numbers identify the amount of time the CPU has spent performing @@ -1513,14 +1626,14 @@ softirqs serviced; each subsequent column is the total for that particular softirq. -1.9 Ext4 file system parameters +1.8 Ext4 file system parameters ------------------------------- Information about mounted ext4 file systems can be found in /proc/fs/ext4. Each mounted filesystem will have a directory in /proc/fs/ext4 based on its device name (i.e., /proc/fs/ext4/hdc or -/proc/fs/ext4/dm-0). The files in each per-device directory are shown -in Table 1-12, below. +/proc/fs/ext4/sda9 or /proc/fs/ext4/dm-0). 
The files in each per-device
+directory are shown in Table 1-12, below.
 
 .. table:: Table 1-12: Files in /proc/fs/ext4/<devname>
 
@@ -1529,7 +1642,7 @@ in Table 1-12, below.
  mb_groups       details of multiblock allocator buddy cache of free blocks
  ==============  ==========================================================
 
-1.10 /proc/consoles
+1.9 /proc/consoles
 -------------------
 Shows registered system console lines.
 
@@ -1600,12 +1713,12 @@ can inadvertently disrupt your system, it is advisable to read both
 documentation and source before actually making adjustments. In any case, be
 very careful when writing to any of these files. The entries in /proc may
 change slightly between the 2.1.* and the 2.2 kernel, so if there is any doubt
-review the kernel documentation in the directory /usr/src/linux/Documentation.
+review the kernel documentation in the directory linux/Documentation.
 This chapter is heavily based on the documentation included in the pre 2.2
 kernels, and became part of it in version 2.2.1 of the Linux kernel.
 
-Please see: Documentation/admin-guide/sysctl/ directory for descriptions of these
-entries.
+Please see: Documentation/admin-guide/sysctl/ directory for descriptions of
+these entries.
 
 Summary
 -------
@@ -1836,19 +1949,19 @@ For example::
 This file contains lines of the form::
 
     36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue
-    (1)(2)(3)   (4)   (5)      (6)      (7)   (8) (9)   (10)         (11)
-
-    (1) mount ID: unique identifier of the mount (may be reused after umount)
-    (2) parent ID: ID of parent (or of self for the top of the mount tree)
-    (3) major:minor: value of st_dev for files on filesystem
-    (4) root: root of the mount within the filesystem
-    (5) mount point: mount point relative to the process's root
-    (6) mount options: per mount options
-    (7) optional fields: zero or more fields of the form "tag[:value]"
-    (8) separator: marks the end of the optional fields
-    (9) filesystem type: name of filesystem of the form "type[.subtype]"
-    (10) mount source: filesystem specific information or "none"
-    (11) super options: per super block options
+    (1)(2)(3)   (4)   (5)      (6)     (n…m) (m+1)(m+2) (m+3)         (m+4)
+
+    (1)   mount ID: unique identifier of the mount (may be reused after umount)
+    (2)   parent ID: ID of parent (or of self for the top of the mount tree)
+    (3)   major:minor: value of st_dev for files on filesystem
+    (4)   root: root of the mount within the filesystem
+    (5)   mount point: mount point relative to the process's root
+    (6)   mount options: per mount options
+    (n…m) optional fields: zero or more fields of the form "tag[:value]"
+    (m+1) separator: marks the end of the optional fields
+    (m+2) filesystem type: name of filesystem of the form "type[.subtype]"
+    (m+3) mount source: filesystem specific information or "none"
+    (m+4) super options: per super block options
 
 Parsers should ignore all unrecognised optional fields. Currently the
 possible optional fields are:
@@ -1875,8 +1988,8 @@ For more information on mount propagation see:
 These files provide a method to access a task's comm value. It also allows for
 a task to set its own or one of its thread siblings comm value.
 The comm value is limited in size compared to the cmdline value, so writing anything longer
-then the kernel's TASK_COMM_LEN (currently 16 chars) will result in a truncated
-comm value.
+then the kernel's TASK_COMM_LEN (currently 16 chars, including the NUL
+terminator) will result in a truncated comm value.
 
 
 3.7 /proc/<pid>/task/<tid>/children - Information about task children
@@ -1899,18 +2012,20 @@ if precise results are needed.
 3.8 /proc/<pid>/fdinfo/<fd> - Information about opened file
 ---------------------------------------------------------------
 This file provides information associated with an opened file. The regular
-files have at least three fields -- 'pos', 'flags' and 'mnt_id'. The 'pos'
-represents the current offset of the opened file in decimal form [see lseek(2)
-for details], 'flags' denotes the octal O_xxx mask the file has been
-created with [see open(2) for details] and 'mnt_id' represents mount ID of
-the file system containing the opened file [see 3.5 /proc/<pid>/mountinfo
-for details].
+files have at least four fields -- 'pos', 'flags', 'mnt_id' and 'ino'.
+The 'pos' represents the current offset of the opened file in decimal
+form [see lseek(2) for details], 'flags' denotes the octal O_xxx mask the
+file has been created with [see open(2) for details] and 'mnt_id' represents
+mount ID of the file system containing the opened file [see 3.5
+/proc/<pid>/mountinfo for details]. 'ino' represents the inode number of
+the file.
 
 A typical output is::
 
     pos: 0
     flags: 0100002
     mnt_id: 19
+    ino: 63107
 
 All locks associated with a file descriptor are shown in its fdinfo too::
 
@@ -1927,6 +2042,7 @@ Eventfd files
     pos: 0
     flags: 04002
     mnt_id: 9
+    ino: 63107
     eventfd-count: 5a
 
 where 'eventfd-count' is hex value of a counter.
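
Reviewer illustration (not part of the patch): the fdinfo layout described in the hunks above, with 'pos', 'mnt_id' and 'ino' in decimal, 'flags' in octal and 'eventfd-count' in hex, could be decoded roughly as follows. This is a minimal sketch: the helper names `parse_fdinfo` and `decode_common` and the `SAMPLE` text are hypothetical, and only the simple `key: value` fields quoted in the documentation are handled.

```python
# Hypothetical sample mirroring the eventfd example quoted in the patch.
SAMPLE = "pos:\t0\nflags:\t04002\nmnt_id:\t9\nino:\t63107\neventfd-count:\t5a\n"


def parse_fdinfo(text):
    """Split fdinfo text into a dict of raw strings, keyed by field name.

    Each line is split at its first colon. Lines without a colon are
    skipped; if a key repeats (e.g. several lock lines), the last one wins.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        fields[key.strip()] = value.strip()
    return fields


def decode_common(fields):
    """Convert the documented numeric fields using the documented bases."""
    bases = {
        "pos": 10,            # current file offset, decimal
        "flags": 8,           # O_xxx mask, octal
        "mnt_id": 10,         # mount ID, decimal
        "ino": 10,            # inode number, decimal
        "eventfd-count": 16,  # eventfd counter, hex
    }
    decoded = {}
    for name, base in bases.items():
        if name in fields:
            decoded[name] = int(fields[name], base)
    return decoded
```

On a live system the same helpers could be fed the contents of a file such as /proc/self/fdinfo/0; fields the sketch does not know about (locks, epoll targets, inotify marks) are left as raw strings in the `parse_fdinfo` result.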
@@ -1939,6 +2055,7 @@ Signalfd files
     pos: 0
     flags: 04002
     mnt_id: 9
+    ino: 63107
     sigmask: 0000000000000200
 
 where 'sigmask' is hex value of the signal mask associated
@@ -1952,6 +2069,7 @@ Epoll files
     pos: 0
     flags: 02
     mnt_id: 9
+    ino: 63107
     tfd: 5 events: 1d data: ffffffffffffffff pos:0 ino:61af sdev:7
 
 where 'tfd' is a target file descriptor number in decimal form,
@@ -1968,6 +2086,8 @@ For inotify files the format is the following::
 
     pos: 0
     flags: 02000000
+    mnt_id: 9
+    ino: 63107
     inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:7e9e0000640d1b6d
 
 where 'wd' is a watch descriptor in decimal form, i.e. a target file
@@ -1990,6 +2110,7 @@ For fanotify files the format is::
     pos: 0
     flags: 02
     mnt_id: 9
+    ino: 63107
     fanotify flags:10 event-flags:0
     fanotify mnt_id:12 mflags:40 mask:38 ignored_mask:40000003
     fanotify ino:4f969 sdev:800013 mflags:0 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:69f90400c275b5b4
@@ -2014,6 +2135,7 @@ Timerfd files
     pos: 0
     flags: 02
     mnt_id: 9
+    ino: 63107
     clockid: 0
     ticks: 0
     settime flags: 01
@@ -2028,6 +2150,22 @@ details]. 'it_value' is remaining time until the timer expiration.
 with TIMER_ABSTIME option which will be shown in 'settime flags', but 'it_value'
 still exhibits timer's remaining time.
 
+DMA Buffer files
+~~~~~~~~~~~~~~~~
+
+::
+
+    pos: 0
+    flags: 04002
+    mnt_id: 9
+    ino: 63107
+    size: 32768
+    count: 2
+    exp_name: system-heap
+
+where 'size' is the size of the DMA buffer in bytes. 'count' is the file count of
+the DMA buffer file. 'exp_name' is the name of the DMA buffer exporter.
+
 3.9 /proc/<pid>/map_files - Information about memory mapped files
 ---------------------------------------------------------------------
 This directory contains symbolic links which represent memory mapped files
@@ -2128,6 +2266,90 @@ AVX512_elapsed_ms
 the task is unlikely an AVX512 user, but depends on the workload and the
 scheduling scenario, it also could be a false negative mentioned above.
 
+3.13 /proc/<pid>/fd - List of symlinks to open files
+-------------------------------------------------------
+This directory contains symbolic links which represent open files
+the process is maintaining. Example output::
+
+  lr-x------ 1 root root 64 Sep 20 17:53 0 -> /dev/null
+  l-wx------ 1 root root 64 Sep 20 17:53 1 -> /dev/null
+  lrwx------ 1 root root 64 Sep 20 17:53 10 -> 'socket:[12539]'
+  lrwx------ 1 root root 64 Sep 20 17:53 11 -> 'socket:[12540]'
+  lrwx------ 1 root root 64 Sep 20 17:53 12 -> 'socket:[12542]'
+
+The number of open files for the process is stored in 'size' member
+of stat() output for /proc/<pid>/fd for fast access.
+-------------------------------------------------------
+
+3.14 /proc/<pid/ksm_stat - Information about the process's ksm status
+---------------------------------------------------------------------
+When CONFIG_KSM is enabled, each process has this file which displays
+the information of ksm merging status.
+
+Example
+~~~~~~~
+
+::
+
+    / # cat /proc/self/ksm_stat
+    ksm_rmap_items 0
+    ksm_zero_pages 0
+    ksm_merging_pages 0
+    ksm_process_profit 0
+    ksm_merge_any: no
+    ksm_mergeable: no
+
+Description
+~~~~~~~~~~~
+
+ksm_rmap_items
+^^^^^^^^^^^^^^
+
+The number of ksm_rmap_item structures in use. The structure
+ksm_rmap_item stores the reverse mapping information for virtual
+addresses. KSM will generate a ksm_rmap_item for each ksm-scanned page of
+the process.
+
+ksm_zero_pages
+^^^^^^^^^^^^^^
+
+When /sys/kernel/mm/ksm/use_zero_pages is enabled, it represent how many
+empty pages are merged with kernel zero pages by KSM.
+
+ksm_merging_pages
+^^^^^^^^^^^^^^^^^
+
+It represents how many pages of this process are involved in KSM merging
+(not including ksm_zero_pages). It is the same with what
+/proc/<pid>/ksm_merging_pages shows.
+
+ksm_process_profit
+^^^^^^^^^^^^^^^^^^
+
+The profit that KSM brings (Saved bytes). KSM can save memory by merging
+identical pages, but also can consume additional memory, because it needs
+to generate a number of rmap_items to save each scanned page's brief rmap
+information. Some of these pages may be merged, but some may not be abled
+to be merged after being checked several times, which are unprofitable
+memory consumed.
+
+ksm_merge_any
+^^^^^^^^^^^^^
+
+It specifies whether the process's 'mm is added by prctl() into the
+candidate list of KSM or not, and if KSM scanning is fully enabled at
+process level.
+
+ksm_mergeable
+^^^^^^^^^^^^^
+
+It specifies whether any VMAs of the process''s mms are currently
+applicable to KSM.
+
+More information about KSM can be found in
+Documentation/admin-guide/mm/ksm.rst.
+
+
 Chapter 4: Configuring procfs
 =============================
 
@@ -2156,7 +2378,7 @@ arguments are now protected against local eavesdroppers.
 hidepid=invisible or hidepid=2 means hidepid=1 plus all /proc/<pid>/ will be
 fully invisible to other users. It doesn't mean that it hides a fact whether a
 process with a specific pid value exists (it can be learned by other means, e.g.
-by "kill -0 $PID"), but it hides process' uid and gid, which may be learned by
+by "kill -0 $PID"), but it hides process's uid and gid, which may be learned by
 stat()'ing /proc/<pid>/ otherwise. It greatly complicates an intruder's task of
 gathering information about running processes, whether some daemon runs with
 elevated privileges, whether other user runs some sensitive program, whether
@@ -2175,7 +2397,7 @@ are not related to tasks.
 Chapter 5: Filesystem behavior
 ==============================
 
-Originally, before the advent of pid namepsace, procfs was a global file
+Originally, before the advent of pid namespace, procfs was a global file
 system. It means that there was only one procfs instance in the system.
 
 When pid namespace was added, a separate procfs instance was mounted in