From 60ca05c3b44566b70d64fbb8e87a6e0c67725468 Mon Sep 17 00:00:00 2001 From: Salvatore Bonaccorso Date: Wed, 15 Aug 2018 07:46:04 +0200 Subject: Documentation/l1tf: Fix small spelling typo Fix small typo (wiil -> will) in the "3.4. Nested virtual machines" section. Fixes: 5b76a3cff011 ("KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry") Cc: linux-kernel@vger.kernel.org Cc: Jonathan Corbet Cc: Josh Poimboeuf Cc: Paolo Bonzini Cc: Greg Kroah-Hartman Cc: Tony Luck Cc: linux-doc@vger.kernel.org Cc: trivial@kernel.org Signed-off-by: Salvatore Bonaccorso Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/l1tf.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/l1tf.rst b/Documentation/admin-guide/l1tf.rst index bae52b845de0..b85dd80510b0 100644 --- a/Documentation/admin-guide/l1tf.rst +++ b/Documentation/admin-guide/l1tf.rst @@ -553,7 +553,7 @@ When nested virtualization is in use, three operating systems are involved: the bare metal hypervisor, the nested hypervisor and the nested virtual machine. VMENTER operations from the nested hypervisor into the nested guest will always be processed by the bare metal hypervisor. If KVM is the -bare metal hypervisor it wiil: +bare metal hypervisor it will: - Flush the L1D cache on every switch from the nested hypervisor to the nested virtual machine, so that the nested hypervisor's secrets are not -- cgit v1.2.3-59-g8ed1b From a7ddcea58ae22d85d94eabfdd3de75c3742e376b Mon Sep 17 00:00:00 2001 From: Henrik Austad Date: Tue, 4 Sep 2018 00:15:23 +0200 Subject: Drop all 00-INDEX files from Documentation/ This is a respin with a wider audience (all that get_maintainer returned) and I know this spams a *lot* of people. Not sure what would be the correct way, so my apologies for ruining your inbox. The 00-INDEX files are supposed to give a summary of all files present in a directory, but these files are horribly out of date and their usefulness is brought into question. Often a simple "ls" would reveal the same information as the filenames are generally quite descriptive as a short introduction to what the file covers (it should not surprise anyone what Documentation/sched/sched-design-CFS.txt covers) A few years back it was mentioned that these files were no longer really needed, and they have since then grown further out of date, so perhaps it is time to just throw them out. A short status yields the following _outdated_ 00-INDEX files, first counter is files listed in 00-INDEX but missing in the directory, last is files present but not listed in 00-INDEX. List of outdated 00-INDEX: Documentation: (4/10) Documentation/sysctl: (0/1) Documentation/timers: (1/0) Documentation/blockdev: (3/1) Documentation/w1/slaves: (0/1) Documentation/locking: (0/1) Documentation/devicetree: (0/5) Documentation/power: (1/1) Documentation/powerpc: (0/5) Documentation/arm: (1/0) Documentation/x86: (0/9) Documentation/x86/x86_64: (1/1) Documentation/scsi: (4/4) Documentation/filesystems: (2/9) Documentation/filesystems/nfs: (0/2) Documentation/cgroup-v1: (0/2) Documentation/kbuild: (0/4) Documentation/spi: (1/0) Documentation/virtual/kvm: (1/0) Documentation/scheduler: (0/2) Documentation/fb: (0/1) Documentation/block: (0/1) Documentation/networking: (6/37) Documentation/vm: (1/3) Then there are 364 subdirectories in Documentation/ with several files that are missing 00-INDEX alltogether (and another 120 with a single file and no 00-INDEX). I don't really have an opinion to whether or not we /should/ have 00-INDEX, but the above 00-INDEX should either be removed or be kept up to date. If we should keep the files, I can try to keep them updated, but I rather not if we just want to delete them anyway. As a starting point, remove all index-files and references to 00-INDEX and see where the discussion is going. Signed-off-by: Henrik Austad Acked-by: "Paul E. McKenney" Just-do-it-by: Steven Rostedt Reviewed-by: Jens Axboe Acked-by: Paul Moore Acked-by: Greg Kroah-Hartman Acked-by: Mark Brown Acked-by: Mike Rapoport Cc: [Almost everybody else] Signed-off-by: Jonathan Corbet --- Documentation/00-INDEX | 428 -------------------------------- Documentation/PCI/00-INDEX | 26 -- Documentation/RCU/00-INDEX | 34 --- Documentation/RCU/rcu.txt | 4 - Documentation/admin-guide/README.rst | 3 +- Documentation/arm/00-INDEX | 50 ---- Documentation/block/00-INDEX | 34 --- Documentation/blockdev/00-INDEX | 18 -- Documentation/cdrom/00-INDEX | 11 - Documentation/cgroup-v1/00-INDEX | 26 -- Documentation/devicetree/00-INDEX | 12 - Documentation/fb/00-INDEX | 75 ------ Documentation/filesystems/00-INDEX | 153 ------------ Documentation/filesystems/nfs/00-INDEX | 26 -- Documentation/fmc/00-INDEX | 38 --- Documentation/gpio/00-INDEX | 4 - Documentation/ide/00-INDEX | 14 -- Documentation/ioctl/00-INDEX | 12 - Documentation/isdn/00-INDEX | 42 ---- Documentation/kbuild/00-INDEX | 14 -- Documentation/laptops/00-INDEX | 16 -- Documentation/leds/00-INDEX | 32 --- Documentation/locking/00-INDEX | 16 -- Documentation/m68k/00-INDEX | 7 - Documentation/mips/00-INDEX | 4 - Documentation/mmc/00-INDEX | 10 - Documentation/netlabel/00-INDEX | 10 - Documentation/netlabel/cipso_ipv4.txt | 11 +- Documentation/netlabel/introduction.txt | 2 +- Documentation/networking/00-INDEX | 234 ----------------- Documentation/parisc/00-INDEX | 6 - Documentation/power/00-INDEX | 44 ---- Documentation/powerpc/00-INDEX | 34 --- Documentation/s390/00-INDEX | 28 --- Documentation/scheduler/00-INDEX | 18 -- Documentation/scsi/00-INDEX | 108 -------- Documentation/serial/00-INDEX | 16 -- Documentation/spi/00-INDEX | 16 -- Documentation/sysctl/00-INDEX | 16 -- Documentation/timers/00-INDEX | 16 -- Documentation/virtual/00-INDEX | 11 - Documentation/virtual/kvm/00-INDEX | 35 --- Documentation/vm/00-INDEX | 50 ---- Documentation/w1/00-INDEX | 10 - Documentation/w1/masters/00-INDEX | 12 - Documentation/w1/slaves/00-INDEX | 14 -- Documentation/x86/00-INDEX | 20 -- Documentation/x86/x86_64/00-INDEX | 16 -- README | 1 - scripts/check_00index.sh | 67 ----- 50 files changed, 8 insertions(+), 1896 deletions(-) delete mode 100644 Documentation/00-INDEX delete mode 100644 Documentation/PCI/00-INDEX delete mode 100644 Documentation/RCU/00-INDEX delete mode 100644 Documentation/arm/00-INDEX delete mode 100644 Documentation/block/00-INDEX delete mode 100644 Documentation/blockdev/00-INDEX delete mode 100644 Documentation/cdrom/00-INDEX delete mode 100644 Documentation/cgroup-v1/00-INDEX delete mode 100644 Documentation/devicetree/00-INDEX delete mode 100644 Documentation/fb/00-INDEX delete mode 100644 Documentation/filesystems/00-INDEX delete mode 100644 Documentation/filesystems/nfs/00-INDEX delete mode 100644 Documentation/fmc/00-INDEX delete mode 100644 Documentation/gpio/00-INDEX delete mode 100644 Documentation/ide/00-INDEX delete mode 100644 Documentation/ioctl/00-INDEX delete mode 100644 Documentation/isdn/00-INDEX delete mode 100644 Documentation/kbuild/00-INDEX delete mode 100644 Documentation/laptops/00-INDEX delete mode 100644 Documentation/leds/00-INDEX delete mode 100644 Documentation/locking/00-INDEX delete mode 100644 Documentation/m68k/00-INDEX delete mode 100644 Documentation/mips/00-INDEX delete mode 100644 Documentation/mmc/00-INDEX delete mode 100644 Documentation/netlabel/00-INDEX delete mode 100644 Documentation/networking/00-INDEX delete mode 100644 Documentation/parisc/00-INDEX delete mode 100644 Documentation/power/00-INDEX delete mode 100644 Documentation/powerpc/00-INDEX delete mode 100644 Documentation/s390/00-INDEX delete mode 100644 Documentation/scheduler/00-INDEX delete mode 100644 Documentation/scsi/00-INDEX delete mode 100644 Documentation/serial/00-INDEX delete mode 100644 Documentation/spi/00-INDEX delete mode 100644 Documentation/sysctl/00-INDEX delete mode 100644 Documentation/timers/00-INDEX delete mode 100644 Documentation/virtual/00-INDEX delete mode 100644 Documentation/virtual/kvm/00-INDEX delete mode 100644 Documentation/vm/00-INDEX delete mode 100644 Documentation/w1/00-INDEX delete mode 100644 Documentation/w1/masters/00-INDEX delete mode 100644 Documentation/w1/slaves/00-INDEX delete mode 100644 Documentation/x86/00-INDEX delete mode 100644 Documentation/x86/x86_64/00-INDEX delete mode 100755 scripts/check_00index.sh (limited to 'Documentation/admin-guide') diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX deleted file mode 100644 index 2754fe83f0d4..000000000000 --- a/Documentation/00-INDEX +++ /dev/null @@ -1,428 +0,0 @@ - -This is a brief list of all the files in ./linux/Documentation and what -they contain. If you add a documentation file, please list it here in -alphabetical order as well, or risk being hunted down like a rabid dog. -Please keep the descriptions small enough to fit on one line. - Thanks -- Paul G. - -Following translations are available on the WWW: - - - Japanese, maintained by the JF Project (jf@listserv.linux.or.jp), at - http://linuxjf.sourceforge.jp/ - -00-INDEX - - this file. -ABI/ - - info on kernel <-> userspace ABI and relative interface stability. -CodingStyle - - nothing here, just a pointer to process/coding-style.rst. -DMA-API.txt - - DMA API, pci_ API & extensions for non-consistent memory machines. -DMA-API-HOWTO.txt - - Dynamic DMA mapping Guide -DMA-ISA-LPC.txt - - How to do DMA with ISA (and LPC) devices. -DMA-attributes.txt - - listing of the various possible attributes a DMA region can have -EDID/ - - directory with info on customizing EDID for broken gfx/displays. -IPMI.txt - - info on Linux Intelligent Platform Management Interface (IPMI) Driver. -IRQ-affinity.txt - - how to select which CPU(s) handle which interrupt events on SMP. -IRQ-domain.txt - - info on interrupt numbering and setting up IRQ domains. -IRQ.txt - - description of what an IRQ is. -Intel-IOMMU.txt - - basic info on the Intel IOMMU virtualization support. -Makefile - - It's not of interest for those who aren't touching the build system. -PCI/ - - info related to PCI drivers. -RCU/ - - directory with info on RCU (read-copy update). -SAK.txt - - info on Secure Attention Keys. -SM501.txt - - Silicon Motion SM501 multimedia companion chip -SubmittingPatches - - nothing here, just a pointer to process/coding-style.rst. -accounting/ - - documentation on accounting and taskstats. -acpi/ - - info on ACPI-specific hooks in the kernel. -admin-guide/ - - info related to Linux users and system admins. -aoe/ - - description of AoE (ATA over Ethernet) along with config examples. -arm/ - - directory with info about Linux on the ARM architecture. -arm64/ - - directory with info about Linux on the 64 bit ARM architecture. -auxdisplay/ - - misc. LCD driver documentation (cfag12864b, ks0108). -backlight/ - - directory with info on controlling backlights in flat panel displays -block/ - - info on the Block I/O (BIO) layer. -blockdev/ - - info on block devices & drivers -bt8xxgpio.txt - - info on how to modify a bt8xx video card for GPIO usage. -btmrvl.txt - - info on Marvell Bluetooth driver usage. -bus-devices/ - - directory with info on TI GPMC (General Purpose Memory Controller) -bus-virt-phys-mapping.txt - - how to access I/O mapped memory from within device drivers. -cdrom/ - - directory with information on the CD-ROM drivers that Linux has. -cgroup-v1/ - - cgroups v1 features, including cpusets and memory controller. -cma/ - - Continuous Memory Area (CMA) debugfs interface. -conf.py - - It's not of interest for those who aren't touching the build system. -connector/ - - docs on the netlink based userspace<->kernel space communication mod. -console/ - - documentation on Linux console drivers. -core-api/ - - documentation on kernel core components. -cpu-freq/ - - info on CPU frequency and voltage scaling. -cpu-hotplug.txt - - document describing CPU hotplug support in the Linux kernel. -cpu-load.txt - - document describing how CPU load statistics are collected. -cpuidle/ - - info on CPU_IDLE, CPU idle state management subsystem. -cputopology.txt - - documentation on how CPU topology info is exported via sysfs. -crc32.txt - - brief tutorial on CRC computation -crypto/ - - directory with info on the Crypto API. -dcdbas.txt - - information on the Dell Systems Management Base Driver. -debugging-modules.txt - - some notes on debugging modules after Linux 2.6.3. -debugging-via-ohci1394.txt - - how to use firewire like a hardware debugger memory reader. -dell_rbu.txt - - document demonstrating the use of the Dell Remote BIOS Update driver. -dev-tools/ - - directory with info on development tools for the kernel. -device-mapper/ - - directory with info on Device Mapper. -dmaengine/ - - the DMA engine and controller API guides. -devicetree/ - - directory with info on device tree files used by OF/PowerPC/ARM -digsig.txt - -info on the Digital Signature Verification API -dma-buf-sharing.txt - - the DMA Buffer Sharing API Guide -docutils.conf - - nothing here. Just a configuration file for docutils. -dontdiff - - file containing a list of files that should never be diff'ed. -driver-api/ - - the Linux driver implementer's API guide. -driver-model/ - - directory with info about Linux driver model. -early-userspace/ - - info about initramfs, klibc, and userspace early during boot. -efi-stub.txt - - How to use the EFI boot stub to bypass GRUB or elilo on EFI systems. -eisa.txt - - info on EISA bus support. -extcon/ - - directory with porting guide for Android kernel switch driver. -isa.txt - - info on EISA bus support. -fault-injection/ - - dir with docs about the fault injection capabilities infrastructure. -fb/ - - directory with info on the frame buffer graphics abstraction layer. -features/ - - status of feature implementation on different architectures. -filesystems/ - - info on the vfs and the various filesystems that Linux supports. -firmware_class/ - - request_firmware() hotplug interface info. -flexible-arrays.txt - - how to make use of flexible sized arrays in linux -fmc/ - - information about the FMC bus abstraction -fpga/ - - FPGA Manager Core. -futex-requeue-pi.txt - - info on requeueing of tasks from a non-PI futex to a PI futex -gcc-plugins.txt - - GCC plugin infrastructure. -gpio/ - - gpio related documentation -gpu/ - - directory with information on GPU driver developer's guide. -hid/ - - directory with information on human interface devices -highuid.txt - - notes on the change from 16 bit to 32 bit user/group IDs. -hwspinlock.txt - - hardware spinlock provides hardware assistance for synchronization -timers/ - - info on the timer related topics -hw_random.txt - - info on Linux support for random number generator in i8xx chipsets. -hwmon/ - - directory with docs on various hardware monitoring drivers. -i2c/ - - directory with info about the I2C bus/protocol (2 wire, kHz speed). -x86/i386/ - - directory with info about Linux on Intel 32 bit architecture. -ia64/ - - directory with info about Linux on Intel 64 bit architecture. -ide/ - - Information regarding the Enhanced IDE drive. -iio/ - - info on industrial IIO configfs support. -index.rst - - main index for the documentation at ReST format. -infiniband/ - - directory with documents concerning Linux InfiniBand support. -input/ - - info on Linux input device support. -intel_txt.txt - - info on intel Trusted Execution Technology (intel TXT). -io-mapping.txt - - description of io_mapping functions in linux/io-mapping.h -io_ordering.txt - - info on ordering I/O writes to memory-mapped addresses. -ioctl/ - - directory with documents describing various IOCTL calls. -iostats.txt - - info on I/O statistics Linux kernel provides. -irqflags-tracing.txt - - how to use the irq-flags tracing feature. -isapnp.txt - - info on Linux ISA Plug & Play support. -isdn/ - - directory with info on the Linux ISDN support, and supported cards. -kbuild/ - - directory with info about the kernel build process. -kdump/ - - directory with mini HowTo on getting the crash dump code to work. -doc-guide/ - - how to write and format reStructuredText kernel documentation -kernel-per-CPU-kthreads.txt - - List of all per-CPU kthreads and how they introduce jitter. -kobject.txt - - info of the kobject infrastructure of the Linux kernel. -kprobes.txt - - documents the kernel probes debugging feature. -kref.txt - - docs on adding reference counters (krefs) to kernel objects. -laptops/ - - directory with laptop related info and laptop driver documentation. -ldm.txt - - a brief description of LDM (Windows Dynamic Disks). -leds/ - - directory with info about LED handling under Linux. -livepatch/ - - info on kernel live patching. -locking/ - - directory with info about kernel locking primitives -lockup-watchdogs.txt - - info on soft and hard lockup detectors (aka nmi_watchdog). -logo.gif - - full colour GIF image of Linux logo (penguin - Tux). -logo.txt - - info on creator of above logo & site to get additional images from. -lsm.txt - - Linux Security Modules: General Security Hooks for Linux -lzo.txt - - kernel LZO decompressor input formats -m68k/ - - directory with info about Linux on Motorola 68k architecture. -mailbox.txt - - How to write drivers for the common mailbox framework (IPC). -md/ - - directory with info about Linux Software RAID -media/ - - info on media drivers: uAPI, kAPI and driver documentation. -memory-barriers.txt - - info on Linux kernel memory barriers. -memory-devices/ - - directory with info on parts like the Texas Instruments EMIF driver -memory-hotplug.txt - - Hotpluggable memory support, how to use and current status. -men-chameleon-bus.txt - - info on MEN chameleon bus. -mic/ - - Intel Many Integrated Core (MIC) architecture device driver. -mips/ - - directory with info about Linux on MIPS architecture. -misc-devices/ - - directory with info about devices using the misc dev subsystem -mmc/ - - directory with info about the MMC subsystem -mtd/ - - directory with info about memory technology devices (flash) -namespaces/ - - directory with various information about namespaces -netlabel/ - - directory with information on the NetLabel subsystem. -networking/ - - directory with info on various aspects of networking with Linux. -nfc/ - - directory relating info about Near Field Communications support. -nios2/ - - Linux on the Nios II architecture. -nommu-mmap.txt - - documentation about no-mmu memory mapping support. -numastat.txt - - info on how to read Numa policy hit/miss statistics in sysfs. -ntb.txt - - info on Non-Transparent Bridge (NTB) drivers. -nvdimm/ - - info on non-volatile devices. -nvmem/ - - info on non volatile memory framework. -output/ - - default directory where html/LaTeX/pdf files will be written. -padata.txt - - An introduction to the "padata" parallel execution API -parisc/ - - directory with info on using Linux on PA-RISC architecture. -parport-lowlevel.txt - - description and usage of the low level parallel port functions. -pcmcia/ - - info on the Linux PCMCIA driver. -percpu-rw-semaphore.txt - - RCU based read-write semaphore optimized for locking for reading -perf/ - - info about the APM X-Gene SoC Performance Monitoring Unit (PMU). -phy/ - - ino on Samsung USB 2.0 PHY adaptation layer. -phy.txt - - Description of the generic PHY framework. -pi-futex.txt - - documentation on lightweight priority inheritance futexes. -pinctrl.txt - - info on pinctrl subsystem and the PINMUX/PINCONF and drivers -platform/ - - List of supported hardware by compal and Dell laptop. -pnp.txt - - Linux Plug and Play documentation. -power/ - - directory with info on Linux PCI power management. -powerpc/ - - directory with info on using Linux with the PowerPC. -prctl/ - - directory with info on the priveledge control subsystem -preempt-locking.txt - - info on locking under a preemptive kernel. -process/ - - how to work with the mainline kernel development process. -pps/ - - directory with information on the pulse-per-second support -pti/ - - directory with info on Intel MID PTI. -ptp/ - - directory with info on support for IEEE 1588 PTP clocks in Linux. -pwm.txt - - info on the pulse width modulation driver subsystem -rapidio/ - - directory with info on RapidIO packet-based fabric interconnect -rbtree.txt - - info on what red-black trees are and what they are for. -remoteproc.txt - - info on how to handle remote processor (e.g. AMP) offloads/usage. -rfkill.txt - - info on the radio frequency kill switch subsystem/support. -robust-futex-ABI.txt - - documentation of the robust futex ABI. -robust-futexes.txt - - a description of what robust futexes are. -rpmsg.txt - - info on the Remote Processor Messaging (rpmsg) Framework -rtc.txt - - notes on how to use the Real Time Clock (aka CMOS clock) driver. -s390/ - - directory with info on using Linux on the IBM S390. -scheduler/ - - directory with info on the scheduler. -scsi/ - - directory with info on Linux scsi support. -security/ - - directory that contains security-related info -serial/ - - directory with info on the low level serial API. -sgi-ioc4.txt - - description of the SGI IOC4 PCI (multi function) device. -sh/ - - directory with info on porting Linux to a new architecture. -smsc_ece1099.txt - -info on the smsc Keyboard Scan Expansion/GPIO Expansion device. -sound/ - - directory with info on sound card support. -spi/ - - overview of Linux kernel Serial Peripheral Interface (SPI) support. -sphinx/ - - no documentation here, just files required by Sphinx toolchain. -sphinx-static/ - - no documentation here, just files required by Sphinx toolchain. -static-keys.txt - - info on how static keys allow debug code in hotpaths via patching -svga.txt - - short guide on selecting video modes at boot via VGA BIOS. -sync_file.txt - - Sync file API guide. -sysctl/ - - directory with info on the /proc/sys/* files. -target/ - - directory with info on generating TCM v4 fabric .ko modules -tee.txt - - info on the TEE subsystem and drivers -this_cpu_ops.txt - - List rationale behind and the way to use this_cpu operations. -thermal/ - - directory with information on managing thermal issues (CPU/temp) -trace/ - - directory with info on tracing technologies within linux -translations/ - - translations of this document from English to another language -unaligned-memory-access.txt - - info on how to avoid arch breaking unaligned memory access in code. -unshare.txt - - description of the Linux unshare system call. -usb/ - - directory with info regarding the Universal Serial Bus. -vfio.txt - - info on Virtual Function I/O used in guest/hypervisor instances. -video-output.txt - - sysfs class driver interface to enable/disable a video output device. -virtual/ - - directory with information on the various linux virtualizations. -vm/ - - directory with info on the Linux vm code. -w1/ - - directory with documents regarding the 1-wire (w1) subsystem. -watchdog/ - - how to auto-reboot Linux if it has "fallen and can't get up". ;-) -wimax/ - - directory with info about Intel Wireless Wimax Connections -core-api/workqueue.rst - - information on the Concurrency Managed Workqueue implementation -x86/x86_64/ - - directory with info on Linux support for AMD x86-64 (Hammer) machines. -xillybus.txt - - Overview and basic ui of xillybus driver -xtensa/ - - directory with documents relating to arch/xtensa port/implementation -xz.txt - - how to make use of the XZ data compression within linux kernel -zorro.txt - - info on writing drivers for Zorro bus devices found on Amigas. diff --git a/Documentation/PCI/00-INDEX b/Documentation/PCI/00-INDEX deleted file mode 100644 index 206b1d5c1e71..000000000000 --- a/Documentation/PCI/00-INDEX +++ /dev/null @@ -1,26 +0,0 @@ -00-INDEX - - this file -acpi-info.txt - - info on how PCI host bridges are represented in ACPI -MSI-HOWTO.txt - - the Message Signaled Interrupts (MSI) Driver Guide HOWTO and FAQ. -PCIEBUS-HOWTO.txt - - a guide describing the PCI Express Port Bus driver -pci-error-recovery.txt - - info on PCI error recovery -pci-iov-howto.txt - - the PCI Express I/O Virtualization HOWTO -pci.txt - - info on the PCI subsystem for device driver authors -pcieaer-howto.txt - - the PCI Express Advanced Error Reporting Driver Guide HOWTO -endpoint/pci-endpoint.txt - - guide to add endpoint controller driver and endpoint function driver. -endpoint/pci-endpoint-cfs.txt - - guide to use configfs to configure the PCI endpoint function. -endpoint/pci-test-function.txt - - specification of *PCI test* function device. -endpoint/pci-test-howto.txt - - userguide for PCI endpoint test function. -endpoint/function/binding/ - - binding documentation for PCI endpoint function diff --git a/Documentation/RCU/00-INDEX b/Documentation/RCU/00-INDEX deleted file mode 100644 index f46980c060aa..000000000000 --- a/Documentation/RCU/00-INDEX +++ /dev/null @@ -1,34 +0,0 @@ -00-INDEX - - This file -arrayRCU.txt - - Using RCU to Protect Read-Mostly Arrays -checklist.txt - - Review Checklist for RCU Patches -listRCU.txt - - Using RCU to Protect Read-Mostly Linked Lists -lockdep.txt - - RCU and lockdep checking -lockdep-splat.txt - - RCU Lockdep splats explained. -NMI-RCU.txt - - Using RCU to Protect Dynamic NMI Handlers -rcu_dereference.txt - - Proper care and feeding of return values from rcu_dereference() -rcubarrier.txt - - RCU and Unloadable Modules -rculist_nulls.txt - - RCU list primitives for use with SLAB_TYPESAFE_BY_RCU -rcuref.txt - - Reference-count design for elements of lists/arrays protected by RCU -rcu.txt - - RCU Concepts -RTFP.txt - - List of RCU papers (bibliography) going back to 1980. -stallwarn.txt - - RCU CPU stall warnings (module parameter rcu_cpu_stall_suppress) -torture.txt - - RCU Torture Test Operation (CONFIG_RCU_TORTURE_TEST) -UP.txt - - RCU on Uniprocessor Systems -whatisRCU.txt - - What is RCU? diff --git a/Documentation/RCU/rcu.txt b/Documentation/RCU/rcu.txt index 7d4ae110c2c9..721b3e426515 100644 --- a/Documentation/RCU/rcu.txt +++ b/Documentation/RCU/rcu.txt @@ -87,7 +87,3 @@ o Where can I find more information on RCU? See the RTFP.txt file in this directory. Or point your browser at http://www.rdrop.com/users/paulmck/RCU/. - -o What are all these files in this directory? - - See 00-INDEX for the list. diff --git a/Documentation/admin-guide/README.rst b/Documentation/admin-guide/README.rst index 15ea785b2dfa..0797eec76be1 100644 --- a/Documentation/admin-guide/README.rst +++ b/Documentation/admin-guide/README.rst @@ -51,8 +51,7 @@ Documentation - There are various README files in the Documentation/ subdirectory: these typically contain kernel-specific installation notes for some - drivers for example. See Documentation/00-INDEX for a list of what - is contained in each file. Please read the + drivers for example. Please read the :ref:`Documentation/process/changes.rst ` file, as it contains information about the problems, which may result by upgrading your kernel. diff --git a/Documentation/arm/00-INDEX b/Documentation/arm/00-INDEX deleted file mode 100644 index b6e69fd371c4..000000000000 --- a/Documentation/arm/00-INDEX +++ /dev/null @@ -1,50 +0,0 @@ -00-INDEX - - this file -Booting - - requirements for booting -CCN.txt - - Cache Coherent Network ring-bus and perf PMU driver. -Interrupts - - ARM Interrupt subsystem documentation -IXP4xx - - Intel IXP4xx Network processor. -Netwinder - - Netwinder specific documentation -Porting - - Symbol definitions for porting Linux to a new ARM machine. -Setup - - Kernel initialization parameters on ARM Linux -README - - General ARM documentation -SA1100/ - - SA1100 documentation -Samsung-S3C24XX/ - - S3C24XX ARM Linux Overview -SPEAr/ - - ST SPEAr platform Linux Overview -VFP/ - - Release notes for Linux Kernel Vector Floating Point support code -cluster-pm-race-avoidance.txt - - Algorithm for CPU and Cluster setup/teardown -empeg/ - - Ltd's Empeg MP3 Car Audio Player -firmware.txt - - Secure firmware registration and calling. -kernel_mode_neon.txt - - How to use NEON instructions in kernel mode -kernel_user_helpers.txt - - Helper functions in kernel space made available for userspace. -mem_alignment - - alignment abort handler documentation -memory.txt - - description of the virtual memory layout -nwfpe/ - - NWFPE floating point emulator documentation -swp_emulation - - SWP/SWPB emulation handler/logging description -tcm.txt - - ARM Tightly Coupled Memory -uefi.txt - - [U]EFI configuration and runtime services documentation -vlocks.txt - - Voting locks, low-level mechanism relying on memory system atomic writes. diff --git a/Documentation/block/00-INDEX b/Documentation/block/00-INDEX deleted file mode 100644 index 8d55b4bbb5e2..000000000000 --- a/Documentation/block/00-INDEX +++ /dev/null @@ -1,34 +0,0 @@ -00-INDEX - - This file -bfq-iosched.txt - - BFQ IO scheduler and its tunables -biodoc.txt - - Notes on the Generic Block Layer Rewrite in Linux 2.5 -biovecs.txt - - Immutable biovecs and biovec iterators -capability.txt - - Generic Block Device Capability (/sys/block//capability) -cfq-iosched.txt - - CFQ IO scheduler tunables -cmdline-partition.txt - - how to specify block device partitions on kernel command line -data-integrity.txt - - Block data integrity -deadline-iosched.txt - - Deadline IO scheduler tunables -ioprio.txt - - Block io priorities (in CFQ scheduler) -pr.txt - - Block layer support for Persistent Reservations -null_blk.txt - - Null block for block-layer benchmarking. -queue-sysfs.txt - - Queue's sysfs entries -request.txt - - The members of struct request (in include/linux/blkdev.h) -stat.txt - - Block layer statistics in /sys/block//stat -switching-sched.txt - - Switching I/O schedulers at runtime -writeback_cache_control.txt - - Control of volatile write back caches diff --git a/Documentation/blockdev/00-INDEX b/Documentation/blockdev/00-INDEX deleted file mode 100644 index c08df56dd91b..000000000000 --- a/Documentation/blockdev/00-INDEX +++ /dev/null @@ -1,18 +0,0 @@ -00-INDEX - - this file -README.DAC960 - - info on Mylex DAC960/DAC1100 PCI RAID Controller Driver for Linux. -cciss.txt - - info, major/minor #'s for Compaq's SMART Array Controllers. -cpqarray.txt - - info on using Compaq's SMART2 Intelligent Disk Array Controllers. -floppy.txt - - notes and driver options for the floppy disk driver. -mflash.txt - - info on mGine m(g)flash driver for linux. -nbd.txt - - info on a TCP implementation of a network block device. -paride.txt - - information about the parallel port IDE subsystem. -ramdisk.txt - - short guide on how to set up and use the RAM disk. diff --git a/Documentation/cdrom/00-INDEX b/Documentation/cdrom/00-INDEX deleted file mode 100644 index 433edf23dc49..000000000000 --- a/Documentation/cdrom/00-INDEX +++ /dev/null @@ -1,11 +0,0 @@ -00-INDEX - - this file (info on CD-ROMs and Linux) -Makefile - - only used to generate TeX output from the documentation. -cdrom-standard.tex - - LaTeX document on standardizing the CD-ROM programming interface. -ide-cd - - info on setting up and using ATAPI (aka IDE) CD-ROMs. -packet-writing.txt - - Info on the CDRW packet writing module - diff --git a/Documentation/cgroup-v1/00-INDEX b/Documentation/cgroup-v1/00-INDEX deleted file mode 100644 index 13e0c85e7b35..000000000000 --- a/Documentation/cgroup-v1/00-INDEX +++ /dev/null @@ -1,26 +0,0 @@ -00-INDEX - - this file -blkio-controller.txt - - Description for Block IO Controller, implementation and usage details. -cgroups.txt - - Control Groups definition, implementation details, examples and API. -cpuacct.txt - - CPU Accounting Controller; account CPU usage for groups of tasks. -cpusets.txt - - documents the cpusets feature; assign CPUs and Mem to a set of tasks. -admin-guide/devices.rst - - Device Whitelist Controller; description, interface and security. -freezer-subsystem.txt - - checkpointing; rationale to not use signals, interface. -hugetlb.txt - - HugeTLB Controller implementation and usage details. -memcg_test.txt - - Memory Resource Controller; implementation details. -memory.txt - - Memory Resource Controller; design, accounting, interface, testing. -net_cls.txt - - Network classifier cgroups details and usages. -net_prio.txt - - Network priority cgroups details and usages. -pids.txt - - Process number cgroups details and usages. diff --git a/Documentation/devicetree/00-INDEX b/Documentation/devicetree/00-INDEX deleted file mode 100644 index 8c4102c6a5e7..000000000000 --- a/Documentation/devicetree/00-INDEX +++ /dev/null @@ -1,12 +0,0 @@ -Documentation for device trees, a data structure by which bootloaders pass -hardware layout to Linux in a device-independent manner, simplifying hardware -probing. This subsystem is maintained by Grant Likely - and has a mailing list at -https://lists.ozlabs.org/listinfo/devicetree-discuss - -00-INDEX - - this file -booting-without-of.txt - - Booting Linux without Open Firmware, describes history and format of device trees. -usage-model.txt - - How Linux uses DT and what DT aims to solve. \ No newline at end of file diff --git a/Documentation/fb/00-INDEX b/Documentation/fb/00-INDEX deleted file mode 100644 index fe85e7c5907a..000000000000 --- a/Documentation/fb/00-INDEX +++ /dev/null @@ -1,75 +0,0 @@ -Index of files in Documentation/fb. If you think something about frame -buffer devices needs an entry here, needs correction or you've written one -please mail me. - Geert Uytterhoeven - -00-INDEX - - this file. -api.txt - - The frame buffer API between applications and buffer devices. -arkfb.txt - - info on the fbdev driver for ARK Logic chips. -aty128fb.txt - - info on the ATI Rage128 frame buffer driver. -cirrusfb.txt - - info on the driver for Cirrus Logic chipsets. -cmap_xfbdev.txt - - an introduction to fbdev's cmap structures. -deferred_io.txt - - an introduction to deferred IO. -efifb.txt - - info on the EFI platform driver for Intel based Apple computers. -ep93xx-fb.txt - - info on the driver for EP93xx LCD controller. -fbcon.txt - - intro to and usage guide for the framebuffer console (fbcon). -framebuffer.txt - - introduction to frame buffer devices. -gxfb.txt - - info on the framebuffer driver for AMD Geode GX2 based processors. -intel810.txt - - documentation for the Intel 810/815 framebuffer driver. -intelfb.txt - - docs for Intel 830M/845G/852GM/855GM/865G/915G/945G fb driver. -internals.txt - - quick overview of frame buffer device internals. -lxfb.txt - - info on the framebuffer driver for AMD Geode LX based processors. -matroxfb.txt - - info on the Matrox framebuffer driver for Alpha, Intel and PPC. -metronomefb.txt - - info on the driver for the Metronome display controller. -modedb.txt - - info on the video mode database. -pvr2fb.txt - - info on the PowerVR 2 frame buffer driver. -pxafb.txt - - info on the driver for the PXA25x LCD controller. -s3fb.txt - - info on the fbdev driver for S3 Trio/Virge chips. -sa1100fb.txt - - information about the driver for the SA-1100 LCD controller. -sh7760fb.txt - - info on the SH7760/SH7763 integrated LCDC Framebuffer driver. -sisfb.txt - - info on the framebuffer device driver for various SiS chips. -sm501.txt - - info on the framebuffer device driver for sm501 videoframebuffer. -sstfb.txt - - info on the frame buffer driver for 3dfx' Voodoo Graphics boards. -tgafb.txt - - info on the TGA (DECChip 21030) frame buffer driver. -tridentfb.txt - info on the framebuffer driver for some Trident chip based cards. -udlfb.txt - - Driver for DisplayLink USB 2.0 chips. -uvesafb.txt - - info on the userspace VESA (VBE2+ compliant) frame buffer device. -vesafb.txt - - info on the VESA frame buffer device. -viafb.modes - - list of modes for VIA Integration Graphic Chip. -viafb.txt - - info on the VIA Integration Graphic Chip console framebuffer driver. -vt8623fb.txt - - info on the fb driver for the graphics core in VIA VT8623 chipsets. diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX deleted file mode 100644 index 0937bade1099..000000000000 --- a/Documentation/filesystems/00-INDEX +++ /dev/null @@ -1,153 +0,0 @@ -00-INDEX - - this file (info on some of the filesystems supported by linux). -Locking - - info on locking rules as they pertain to Linux VFS. -9p.txt - - 9p (v9fs) is an implementation of the Plan 9 remote fs protocol. -adfs.txt - - info and mount options for the Acorn Advanced Disc Filing System. -afs.txt - - info and examples for the distributed AFS (Andrew File System) fs. -affs.txt - - info and mount options for the Amiga Fast File System. -autofs-mount-control.txt - - info on device control operations for autofs module. -automount-support.txt - - information about filesystem automount support. -befs.txt - - information about the BeOS filesystem for Linux. -bfs.txt - - info for the SCO UnixWare Boot Filesystem (BFS). -btrfs.txt - - info for the BTRFS filesystem. -caching/ - - directory containing filesystem cache documentation. -ceph.txt - - info for the Ceph Distributed File System. -cifs/ - - directory containing CIFS filesystem documentation and example code. -coda.txt - - description of the CODA filesystem. -configfs/ - - directory containing configfs documentation and example code. -cramfs.txt - - info on the cram filesystem for small storage (ROMs etc). -dax.txt - - info on avoiding the page cache for files stored on CPU-addressable - storage devices. -debugfs.txt - - info on the debugfs filesystem. -devpts.txt - - info on the devpts filesystem. -directory-locking - - info about the locking scheme used for directory operations. -dlmfs.txt - - info on the userspace interface to the OCFS2 DLM. -dnotify.txt - - info about directory notification in Linux. -dnotify_test.c - - example program for dnotify. -ecryptfs.txt - - docs on eCryptfs: stacked cryptographic filesystem for Linux. -efivarfs.txt - - info for the efivarfs filesystem. -exofs.txt - - info, usage, mount options, design about EXOFS. -ext2.txt - - info, mount options and specifications for the Ext2 filesystem. -ext3.txt - - info, mount options and specifications for the Ext3 filesystem. -ext4.txt - - info, mount options and specifications for the Ext4 filesystem. -f2fs.txt - - info and mount options for the F2FS filesystem. -fiemap.txt - - info on fiemap ioctl. -files.txt - - info on file management in the Linux kernel. -fuse.txt - - info on the Filesystem in User SpacE including mount options. -gfs2-glocks.txt - - info on the Global File System 2 - Glock internal locking rules. -gfs2-uevents.txt - - info on the Global File System 2 - uevents. -gfs2.txt - - info on the Global File System 2. -hfs.txt - - info on the Macintosh HFS Filesystem for Linux. -hfsplus.txt - - info on the Macintosh HFSPlus Filesystem for Linux. -hpfs.txt - - info and mount options for the OS/2 HPFS. -inotify.txt - - info on the powerful yet simple file change notification system. -isofs.txt - - info and mount options for the ISO 9660 (CDROM) filesystem. -jfs.txt - - info and mount options for the JFS filesystem. -locks.txt - - info on file locking implementations, flock() vs. fcntl(), etc. -mandatory-locking.txt - - info on the Linux implementation of Sys V mandatory file locking. -nfs/ - - nfs-related documentation. -nilfs2.txt - - info and mount options for the NILFS2 filesystem. -ntfs.txt - - info and mount options for the NTFS filesystem (Windows NT). -ocfs2.txt - - info and mount options for the OCFS2 clustered filesystem. -omfs.txt - - info on the Optimized MPEG FileSystem. -path-lookup.txt - - info on path walking and name lookup locking. -pohmelfs/ - - directory containing pohmelfs filesystem documentation. -porting - - various information on filesystem porting. -proc.txt - - info on Linux's /proc filesystem. -qnx6.txt - - info on the QNX6 filesystem. -quota.txt - - info on Quota subsystem. -ramfs-rootfs-initramfs.txt - - info on the 'in memory' filesystems ramfs, rootfs and initramfs. -relay.txt - - info on relay, for efficient streaming from kernel to user space. -romfs.txt - - description of the ROMFS filesystem. -seq_file.txt - - how to use the seq_file API. -sharedsubtree.txt - - a description of shared subtrees for namespaces. -spufs.txt - - info and mount options for the SPU filesystem used on Cell. -squashfs.txt - - info on the squashfs filesystem. -sysfs-pci.txt - - info on accessing PCI device resources through sysfs. -sysfs-tagging.txt - - info on sysfs tagging to avoid duplicates. -sysfs.txt - - info on sysfs, a ram-based filesystem for exporting kernel objects. -sysv-fs.txt - - info on the SystemV/V7/Xenix/Coherent filesystem. -tmpfs.txt - - info on tmpfs, a filesystem that holds all files in virtual memory. -ubifs.txt - - info on the Unsorted Block Images FileSystem. -udf.txt - - info and mount options for the UDF filesystem. -ufs.txt - - info on the ufs filesystem. -vfat.txt - - info on using the VFAT filesystem used in Windows NT and Windows 95. -vfs.txt - - overview of the Virtual File System. -xfs-delayed-logging-design.txt - - info on the XFS Delayed Logging Design. -xfs-self-describing-metadata.txt - - info on XFS Self Describing Metadata. -xfs.txt - - info and mount options for the XFS filesystem. diff --git a/Documentation/filesystems/nfs/00-INDEX b/Documentation/filesystems/nfs/00-INDEX deleted file mode 100644 index 53f3b596ac0d..000000000000 --- a/Documentation/filesystems/nfs/00-INDEX +++ /dev/null @@ -1,26 +0,0 @@ -00-INDEX - - this file (nfs-related documentation). -Exporting - - explanation of how to make filesystems exportable. -fault_injection.txt - - information for using fault injection on the server -knfsd-stats.txt - - statistics which the NFS server makes available to user space. -nfs.txt - - nfs client, and DNS resolution for fs_locations. -nfs41-server.txt - - info on the Linux server implementation of NFSv4 minor version 1. -nfs-rdma.txt - - how to install and setup the Linux NFS/RDMA client and server software -nfsd-admin-interfaces.txt - - Administrative interfaces for nfsd. -nfsroot.txt - - short guide on setting up a diskless box with NFS root filesystem. -pnfs.txt - - short explanation of some of the internals of the pnfs client code -rpc-cache.txt - - introduction to the caching mechanisms in the sunrpc layer. -idmapper.txt - - information for configuring request-keys to be used by idmapper -rpc-server-gss.txt - - Information on GSS authentication support in the NFS Server diff --git a/Documentation/fmc/00-INDEX b/Documentation/fmc/00-INDEX deleted file mode 100644 index 431c69570f43..000000000000 --- a/Documentation/fmc/00-INDEX +++ /dev/null @@ -1,38 +0,0 @@ - -Documentation in this directory comes from sections of the manual we -wrote for the externally-developed fmc-bus package. The complete -manual as of today (2013-02) is available in PDF format at -http://www.ohwr.org/projects/fmc-bus/files - -00-INDEX - - this file. - -FMC-and-SDB.txt - - What are FMC and SDB, basic concepts for this framework - -API.txt - - The functions that are exported by the bus driver - -parameters.txt - - The module parameters - -carrier.txt - - writing a carrier (a device) - -mezzanine.txt - - writing code for your mezzanine (a driver) - -identifiers.txt - - how identification and matching works - -fmc-fakedev.txt - - about drivers/fmc/fmc-fakedev.ko - -fmc-trivial.txt - - about drivers/fmc/fmc-trivial.ko - -fmc-write-eeprom.txt - - about drivers/fmc/fmc-write-eeprom.ko - -fmc-chardev.txt - - about drivers/fmc/fmc-chardev.ko diff --git a/Documentation/gpio/00-INDEX b/Documentation/gpio/00-INDEX deleted file mode 100644 index 17e19a68058f..000000000000 --- a/Documentation/gpio/00-INDEX +++ /dev/null @@ -1,4 +0,0 @@ -00-INDEX - - This file -sysfs.txt - - Information about the GPIO sysfs interface diff --git a/Documentation/ide/00-INDEX b/Documentation/ide/00-INDEX deleted file mode 100644 index 22f98ca79539..000000000000 --- a/Documentation/ide/00-INDEX +++ /dev/null @@ -1,14 +0,0 @@ -00-INDEX - - this file -ChangeLog.ide-cd.1994-2004 - - ide-cd changelog -ChangeLog.ide-floppy.1996-2002 - - ide-floppy changelog -ChangeLog.ide-tape.1995-2002 - - ide-tape changelog -ide-tape.txt - - info on the IDE ATAPI streaming tape driver -ide.txt - - important info for users of ATA devices (IDE/EIDE disks and CD-ROMS). -warm-plug-howto.txt - - using sysfs to remove and add IDE devices. \ No newline at end of file diff --git a/Documentation/ioctl/00-INDEX b/Documentation/ioctl/00-INDEX deleted file mode 100644 index c1a925787950..000000000000 --- a/Documentation/ioctl/00-INDEX +++ /dev/null @@ -1,12 +0,0 @@ -00-INDEX - - this file -botching-up-ioctls.txt - - how to avoid botching up ioctls -cdrom.txt - - summary of CDROM ioctl calls -hdio.txt - - summary of HDIO_ ioctl calls -ioctl-decoding.txt - - how to decode the bits of an IOCTL code -ioctl-number.txt - - how to implement and register device/driver ioctl calls diff --git a/Documentation/isdn/00-INDEX b/Documentation/isdn/00-INDEX deleted file mode 100644 index 2d1889b6c1fa..000000000000 --- a/Documentation/isdn/00-INDEX +++ /dev/null @@ -1,42 +0,0 @@ -00-INDEX - - this file (info on ISDN implementation for Linux) -CREDITS - - list of the kind folks that brought you this stuff. -HiSax.cert - - information about the ITU approval certification of the HiSax driver. -INTERFACE - - description of isdn4linux Link Level and Hardware Level interfaces. -INTERFACE.fax - - description of the fax subinterface of isdn4linux. -INTERFACE.CAPI - - description of kernel CAPI Link Level to Hardware Level interface. -README - - general info on what you need and what to do for Linux ISDN. -README.FAQ - - general info for FAQ. -README.HiSax - - info on the HiSax driver which replaces the old teles. -README.audio - - info for running audio over ISDN. -README.avmb1 - - info on driver for AVM-B1 ISDN card. -README.concap - - info on "CONCAP" encapsulation protocol interface used for X.25. -README.diversion - - info on module for isdn diversion services. -README.fax - - info for using Fax over ISDN. -README.gigaset - - info on the drivers for Siemens Gigaset ISDN adapters -README.hfc-pci - - info on hfc-pci based cards. -README.hysdn - - info on driver for Hypercope active HYSDN cards -README.mISDN - - info on the Modular ISDN subsystem (mISDN) -README.syncppp - - info on running Sync PPP over ISDN. -README.x25 - - info for running X.25 over ISDN. -syncPPP.FAQ - - frequently asked questions about running PPP over ISDN. diff --git a/Documentation/kbuild/00-INDEX b/Documentation/kbuild/00-INDEX deleted file mode 100644 index 8c5e6aa78004..000000000000 --- a/Documentation/kbuild/00-INDEX +++ /dev/null @@ -1,14 +0,0 @@ -00-INDEX - - this file: info on the kernel build process -headers_install.txt - - how to export Linux headers for use by userspace -kbuild.txt - - developer information on kbuild -kconfig.txt - - usage help for make *config -kconfig-language.txt - - specification of Config Language, the language in Kconfig files -makefiles.txt - - developer information for linux kernel makefiles -modules.txt - - how to build modules and to install them diff --git a/Documentation/laptops/00-INDEX b/Documentation/laptops/00-INDEX deleted file mode 100644 index 86169dc766f7..000000000000 --- a/Documentation/laptops/00-INDEX +++ /dev/null @@ -1,16 +0,0 @@ -00-INDEX - - This file -asus-laptop.txt - - information on the Asus Laptop Extras driver. -disk-shock-protection.txt - - information on hard disk shock protection. -laptop-mode.txt - - how to conserve battery power using laptop-mode. -sony-laptop.txt - - Sony Notebook Control Driver (SNC) Readme. -sonypi.txt - - info on Linux Sony Programmable I/O Device support. -thinkpad-acpi.txt - - information on the (IBM and Lenovo) ThinkPad ACPI Extras driver. -toshiba_haps.txt - - information on the Toshiba HDD Active Protection Sensor driver. diff --git a/Documentation/leds/00-INDEX b/Documentation/leds/00-INDEX deleted file mode 100644 index ae626b29a740..000000000000 --- a/Documentation/leds/00-INDEX +++ /dev/null @@ -1,32 +0,0 @@ -00-INDEX - - This file -leds-blinkm.txt - - Driver for BlinkM LED-devices. -leds-class.txt - - documents LED handling under Linux. -leds-class-flash.txt - - documents flash LED handling under Linux. -leds-lm3556.txt - - notes on how to use the leds-lm3556 driver. -leds-lp3944.txt - - notes on how to use the leds-lp3944 driver. -leds-lp5521.txt - - notes on how to use the leds-lp5521 driver. -leds-lp5523.txt - - notes on how to use the leds-lp5523 driver. -leds-lp5562.txt - - notes on how to use the leds-lp5562 driver. -leds-lp55xx.txt - - description about lp55xx common driver. -leds-lm3556.txt - - notes on how to use the leds-lm3556 driver. -leds-mlxcpld.txt - - notes on how to use the leds-mlxcpld driver. -ledtrig-oneshot.txt - - One-shot LED trigger for both sporadic and dense events. -ledtrig-transient.txt - - LED Transient Trigger, one shot timer activation. -ledtrig-usbport.txt - - notes on how to use the drivers/usb/core/ledtrig-usbport.c trigger. -uleds.txt - - notes on how to use the uleds driver. diff --git a/Documentation/locking/00-INDEX b/Documentation/locking/00-INDEX deleted file mode 100644 index c256c9bee2a4..000000000000 --- a/Documentation/locking/00-INDEX +++ /dev/null @@ -1,16 +0,0 @@ -00-INDEX - - this file. -lockdep-design.txt - - documentation on the runtime locking correctness validator. -lockstat.txt - - info on collecting statistics on locks (and contention). -mutex-design.txt - - info on the generic mutex subsystem. -rt-mutex-design.txt - - description of the RealTime mutex implementation design. -rt-mutex.txt - - desc. of RT-mutex subsystem with PI (Priority Inheritance) support. -spinlocks.txt - - info on using spinlocks to provide exclusive access in kernel. -ww-mutex-design.txt - - Intro to Mutex wait/would deadlock handling.s diff --git a/Documentation/m68k/00-INDEX b/Documentation/m68k/00-INDEX deleted file mode 100644 index 2be8c6b00e74..000000000000 --- a/Documentation/m68k/00-INDEX +++ /dev/null @@ -1,7 +0,0 @@ -00-INDEX - - this file -README.buddha - - Amiga Buddha and Catweasel IDE Driver -kernel-options.txt - - command line options for Linux/m68k - diff --git a/Documentation/mips/00-INDEX b/Documentation/mips/00-INDEX deleted file mode 100644 index 8ae9cffc2262..000000000000 --- a/Documentation/mips/00-INDEX +++ /dev/null @@ -1,4 +0,0 @@ -00-INDEX - - this file. -AU1xxx_IDE.README - - README for MIPS AU1XXX IDE driver. diff --git a/Documentation/mmc/00-INDEX b/Documentation/mmc/00-INDEX deleted file mode 100644 index 4623bc0aa0bb..000000000000 --- a/Documentation/mmc/00-INDEX +++ /dev/null @@ -1,10 +0,0 @@ -00-INDEX - - this file -mmc-dev-attrs.txt - - info on SD and MMC device attributes -mmc-dev-parts.txt - - info on SD and MMC device partitions -mmc-async-req.txt - - info on mmc asynchronous requests -mmc-tools.txt - - info on mmc-utils tools diff --git a/Documentation/netlabel/00-INDEX b/Documentation/netlabel/00-INDEX deleted file mode 100644 index 837bf35990e2..000000000000 --- a/Documentation/netlabel/00-INDEX +++ /dev/null @@ -1,10 +0,0 @@ -00-INDEX - - this file. -cipso_ipv4.txt - - documentation on the IPv4 CIPSO protocol engine. -draft-ietf-cipso-ipsecurity-01.txt - - IETF draft of the CIPSO protocol, dated 16 July 1992. -introduction.txt - - NetLabel introduction, READ THIS FIRST. -lsm_interface.txt - - documentation on the NetLabel kernel security module API. diff --git a/Documentation/netlabel/cipso_ipv4.txt b/Documentation/netlabel/cipso_ipv4.txt index 93dacb132c3c..a6075481fd60 100644 --- a/Documentation/netlabel/cipso_ipv4.txt +++ b/Documentation/netlabel/cipso_ipv4.txt @@ -6,11 +6,12 @@ May 17, 2006 * Overview -The NetLabel CIPSO/IPv4 protocol engine is based on the IETF Commercial IP -Security Option (CIPSO) draft from July 16, 1992. A copy of this draft can be -found in this directory, consult '00-INDEX' for the filename. While the IETF -draft never made it to an RFC standard it has become a de-facto standard for -labeled networking and is used in many trusted operating systems. +The NetLabel CIPSO/IPv4 protocol engine is based on the IETF Commercial +IP Security Option (CIPSO) draft from July 16, 1992. A copy of this +draft can be found in this directory +(draft-ietf-cipso-ipsecurity-01.txt). While the IETF draft never made +it to an RFC standard it has become a de-facto standard for labeled +networking and is used in many trusted operating systems. * Outbound Packet Processing diff --git a/Documentation/netlabel/introduction.txt b/Documentation/netlabel/introduction.txt index 5ecd8d1dcf4e..3caf77bcff0f 100644 --- a/Documentation/netlabel/introduction.txt +++ b/Documentation/netlabel/introduction.txt @@ -22,7 +22,7 @@ refrain from calling the protocol engines directly, instead they should use the NetLabel kernel security module API described below. Detailed information about each NetLabel protocol engine can be found in this -directory, consult '00-INDEX' for filenames. +directory. * Communication Layer diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX deleted file mode 100644 index 02a323c43261..000000000000 --- a/Documentation/networking/00-INDEX +++ /dev/null @@ -1,234 +0,0 @@ -00-INDEX - - this file -3c509.txt - - information on the 3Com Etherlink III Series Ethernet cards. -6pack.txt - - info on the 6pack protocol, an alternative to KISS for AX.25 -LICENSE.qla3xxx - - GPLv2 for QLogic Linux Networking HBA Driver -LICENSE.qlge - - GPLv2 for QLogic Linux qlge NIC Driver -LICENSE.qlcnic - - GPLv2 for QLogic Linux qlcnic NIC Driver -PLIP.txt - - PLIP: The Parallel Line Internet Protocol device driver -README.ipw2100 - - README for the Intel PRO/Wireless 2100 driver. -README.ipw2200 - - README for the Intel PRO/Wireless 2915ABG and 2200BG driver. -README.sb1000 - - info on General Instrument/NextLevel SURFboard1000 cable modem. -altera_tse.txt - - Altera Triple-Speed Ethernet controller. -arcnet-hardware.txt - - tons of info on ARCnet, hubs, jumper settings for ARCnet cards, etc. -arcnet.txt - - info on the using the ARCnet driver itself. -atm.txt - - info on where to get ATM programs and support for Linux. -ax25.txt - - info on using AX.25 and NET/ROM code for Linux -baycom.txt - - info on the driver for Baycom style amateur radio modems -bonding.txt - - Linux Ethernet Bonding Driver HOWTO: link aggregation in Linux. -bridge.txt - - where to get user space programs for ethernet bridging with Linux. -cdc_mbim.txt - - 3G/LTE USB modem (Mobile Broadband Interface Model) -checksum-offloads.txt - - Explanation of checksum offloads; LCO, RCO -cops.txt - - info on the COPS LocalTalk Linux driver -cs89x0.txt - - the Crystal LAN (CS8900/20-based) Ethernet ISA adapter driver -cxacru.txt - - Conexant AccessRunner USB ADSL Modem -cxacru-cf.py - - Conexant AccessRunner USB ADSL Modem configuration file parser -cxgb.txt - - Release Notes for the Chelsio N210 Linux device driver. -dccp.txt - - the Datagram Congestion Control Protocol (DCCP) (RFC 4340..42). -dctcp.txt - - DataCenter TCP congestion control -de4x5.txt - - the Digital EtherWORKS DE4?? and DE5?? PCI Ethernet driver -decnet.txt - - info on using the DECnet networking layer in Linux. -dl2k.txt - - README for D-Link DL2000-based Gigabit Ethernet Adapters (dl2k.ko). -dm9000.txt - - README for the Simtec DM9000 Network driver. -dmfe.txt - - info on the Davicom DM9102(A)/DM9132/DM9801 fast ethernet driver. -dns_resolver.txt - - The DNS resolver module allows kernel servies to make DNS queries. -driver.txt - - Softnet driver issues. -ena.txt - - info on Amazon's Elastic Network Adapter (ENA) -e100.txt - - info on Intel's EtherExpress PRO/100 line of 10/100 boards -e1000.txt - - info on Intel's E1000 line of gigabit ethernet boards -e1000e.txt - - README for the Intel Gigabit Ethernet Driver (e1000e). -eql.txt - - serial IP load balancing -fib_trie.txt - - Level Compressed Trie (LC-trie) notes: a structure for routing. -filter.txt - - Linux Socket Filtering -fore200e.txt - - FORE Systems PCA-200E/SBA-200E ATM NIC driver info. -framerelay.txt - - info on using Frame Relay/Data Link Connection Identifier (DLCI). -gen_stats.txt - - Generic networking statistics for netlink users. -generic-hdlc.txt - - The generic High Level Data Link Control (HDLC) layer. -generic_netlink.txt - - info on Generic Netlink -gianfar.txt - - Gianfar Ethernet Driver. -i40e.txt - - README for the Intel Ethernet Controller XL710 Driver (i40e). -i40evf.txt - - Short note on the Driver for the Intel(R) XL710 X710 Virtual Function -ieee802154.txt - - Linux IEEE 802.15.4 implementation, API and drivers -igb.txt - - README for the Intel Gigabit Ethernet Driver (igb). -igbvf.txt - - README for the Intel Gigabit Ethernet Driver (igbvf). -ip-sysctl.txt - - /proc/sys/net/ipv4/* variables -ip_dynaddr.txt - - IP dynamic address hack e.g. for auto-dialup links -ipddp.txt - - AppleTalk-IP Decapsulation and AppleTalk-IP Encapsulation -iphase.txt - - Interphase PCI ATM (i)Chip IA Linux driver info. -ipsec.txt - - Note on not compressing IPSec payload and resulting failed policy check. -ipv6.txt - - Options to the ipv6 kernel module. -ipvs-sysctl.txt - - Per-inode explanation of the /proc/sys/net/ipv4/vs interface. -irda.txt - - where to get IrDA (infrared) utilities and info for Linux. -ixgb.txt - - README for the Intel 10 Gigabit Ethernet Driver (ixgb). -ixgbe.txt - - README for the Intel 10 Gigabit Ethernet Driver (ixgbe). -ixgbevf.txt - - README for the Intel Virtual Function (VF) Driver (ixgbevf). -l2tp.txt - - User guide to the L2TP tunnel protocol. -lapb-module.txt - - programming information of the LAPB module. -ltpc.txt - - the Apple or Farallon LocalTalk PC card driver -mac80211-auth-assoc-deauth.txt - - authentication and association / deauth-disassoc with max80211 -mac80211-injection.txt - - HOWTO use packet injection with mac80211 -multiqueue.txt - - HOWTO for multiqueue network device support. -netconsole.txt - - The network console module netconsole.ko: configuration and notes. -netdev-features.txt - - Network interface features API description. -netdevices.txt - - info on network device driver functions exported to the kernel. -netif-msg.txt - - Design of the network interface message level setting (NETIF_MSG_*). -netlink_mmap.txt - - memory mapped I/O with netlink -nf_conntrack-sysctl.txt - - list of netfilter-sysctl knobs. -nfc.txt - - The Linux Near Field Communication (NFS) subsystem. -openvswitch.txt - - Open vSwitch developer documentation. -operstates.txt - - Overview of network interface operational states. -packet_mmap.txt - - User guide to memory mapped packet socket rings (PACKET_[RT]X_RING). -phonet.txt - - The Phonet packet protocol used in Nokia cellular modems. -phy.txt - - The PHY abstraction layer. -pktgen.txt - - User guide to the kernel packet generator (pktgen.ko). -policy-routing.txt - - IP policy-based routing -ppp_generic.txt - - Information about the generic PPP driver. -proc_net_tcp.txt - - Per inode overview of the /proc/net/tcp and /proc/net/tcp6 interfaces. -radiotap-headers.txt - - Background on radiotap headers. -ray_cs.txt - - Raylink Wireless LAN card driver info. -rds.txt - - Background on the reliable, ordered datagram delivery method RDS. -regulatory.txt - - Overview of the Linux wireless regulatory infrastructure. -rxrpc.txt - - Guide to the RxRPC protocol. -s2io.txt - - Release notes for Neterion Xframe I/II 10GbE driver. -scaling.txt - - Explanation of network scaling techniques: RSS, RPS, RFS, aRFS, XPS. -sctp.txt - - Notes on the Linux kernel implementation of the SCTP protocol. -secid.txt - - Explanation of the secid member in flow structures. -skfp.txt - - SysKonnect FDDI (SK-5xxx, Compaq Netelligent) driver info. -smc9.txt - - the driver for SMC's 9000 series of Ethernet cards -spider_net.txt - - README for the Spidernet Driver (as found in PS3 / Cell BE). -stmmac.txt - - README for the STMicro Synopsys Ethernet driver. -tc-actions-env-rules.txt - - rules for traffic control (tc) actions. -timestamping.txt - - overview of network packet timestamping variants. -tcp.txt - - short blurb on how TCP output takes place. -tcp-thin.txt - - kernel tuning options for low rate 'thin' TCP streams. -team.txt - - pointer to information for ethernet teaming devices. -tlan.txt - - ThunderLAN (Compaq Netelligent 10/100, Olicom OC-2xxx) driver info. -tproxy.txt - - Transparent proxy support user guide. -tuntap.txt - - TUN/TAP device driver, allowing user space Rx/Tx of packets. -udplite.txt - - UDP-Lite protocol (RFC 3828) introduction. -vortex.txt - - info on using 3Com Vortex (3c590, 3c592, 3c595, 3c597) Ethernet cards. -vxge.txt - - README for the Neterion X3100 PCIe Server Adapter. -vxlan.txt - - Virtual extensible LAN overview -x25.txt - - general info on X.25 development. -x25-iface.txt - - description of the X.25 Packet Layer to LAPB device interface. -xfrm_device.txt - - description of XFRM offload API -xfrm_proc.txt - - description of the statistics package for XFRM. -xfrm_sync.txt - - sync patches for XFRM enable migration of an SA between hosts. -xfrm_sysctl.txt - - description of the XFRM configuration options. -z8530drv.txt - - info about Linux driver for Z8530 based HDLC cards for AX.25 diff --git a/Documentation/parisc/00-INDEX b/Documentation/parisc/00-INDEX deleted file mode 100644 index cbd060961f43..000000000000 --- a/Documentation/parisc/00-INDEX +++ /dev/null @@ -1,6 +0,0 @@ -00-INDEX - - this file. -debugging - - some debugging hints for real-mode code -registers - - current/planned usage of registers diff --git a/Documentation/power/00-INDEX b/Documentation/power/00-INDEX deleted file mode 100644 index 7f3c2def2cac..000000000000 --- a/Documentation/power/00-INDEX +++ /dev/null @@ -1,44 +0,0 @@ -00-INDEX - - This file -apm-acpi.txt - - basic info about the APM and ACPI support. -basic-pm-debugging.txt - - Debugging suspend and resume -charger-manager.txt - - Battery charger management. -admin-guide/devices.rst - - How drivers interact with system-wide power management -drivers-testing.txt - - Testing suspend and resume support in device drivers -freezing-of-tasks.txt - - How processes and controlled during suspend -interface.txt - - Power management user interface in /sys/power -opp.txt - - Operating Performance Point library -pci.txt - - How the PCI Subsystem Does Power Management -pm_qos_interface.txt - - info on Linux PM Quality of Service interface -power_supply_class.txt - - Tells userspace about battery, UPS, AC or DC power supply properties -runtime_pm.txt - - Power management framework for I/O devices. -s2ram.txt - - How to get suspend to ram working (and debug it when it isn't) -states.txt - - System power management states -suspend-and-cpuhotplug.txt - - Explains the interaction between Suspend-to-RAM (S3) and CPU hotplug -swsusp-and-swap-files.txt - - Using swap files with software suspend (to disk) -swsusp-dmcrypt.txt - - How to use dm-crypt and software suspend (to disk) together -swsusp.txt - - Goals, implementation, and usage of software suspend (ACPI S3) -tricks.txt - - How to trick software suspend (to disk) into working when it isn't -userland-swsusp.txt - - Experimental implementation of software suspend in userspace -video.txt - - Video issues during resume from suspend diff --git a/Documentation/powerpc/00-INDEX b/Documentation/powerpc/00-INDEX deleted file mode 100644 index 9dc845cf7d88..000000000000 --- a/Documentation/powerpc/00-INDEX +++ /dev/null @@ -1,34 +0,0 @@ -Index of files in Documentation/powerpc. If you think something about -Linux/PPC needs an entry here, needs correction or you've written one -please mail me. - Cort Dougan (cort@fsmlabs.com) - -00-INDEX - - this file -bootwrapper.txt - - Information on how the powerpc kernel is wrapped for boot on various - different platforms. -cpu_features.txt - - info on how we support a variety of CPUs with minimal compile-time - options. -cxl.txt - - Overview of the CXL driver. -eeh-pci-error-recovery.txt - - info on PCI Bus EEH Error Recovery -firmware-assisted-dump.txt - - Documentation on the firmware assisted dump mechanism "fadump". -hvcs.txt - - IBM "Hypervisor Virtual Console Server" Installation Guide -mpc52xx.txt - - Linux 2.6.x on MPC52xx family -pmu-ebb.txt - - Description of the API for using the PMU with Event Based Branches. -qe_firmware.txt - - describes the layout of firmware binaries for the Freescale QUICC - Engine and the code that parses and uploads the microcode therein. -ptrace.txt - - Information on the ptrace interfaces for hardware debug registers. -transactional_memory.txt - - Overview of the Power8 transactional memory support. -dscr.txt - - Overview DSCR (Data Stream Control Register) support. diff --git a/Documentation/s390/00-INDEX b/Documentation/s390/00-INDEX deleted file mode 100644 index 317f0378ae01..000000000000 --- a/Documentation/s390/00-INDEX +++ /dev/null @@ -1,28 +0,0 @@ -00-INDEX - - this file. -3270.ChangeLog - - ChangeLog for the UTS Global 3270-support patch (outdated). -3270.txt - - how to use the IBM 3270 display system support. -cds.txt - - s390 common device support (common I/O layer). -CommonIO - - common I/O layer command line parameters, procfs and debugfs entries -config3270.sh - - example configuration for 3270 devices. -DASD - - information on the DASD disk device driver. -Debugging390.txt - - hints for debugging on s390 systems. -driver-model.txt - - information on s390 devices and the driver model. -monreader.txt - - information on accessing the z/VM monitor stream from Linux. -qeth.txt - - HiperSockets Bridge Port Support. -s390dbf.txt - - information on using the s390 debug feature. -vfio-ccw.txt - information on the vfio-ccw I/O subchannel driver. -zfcpdump.txt - - information on the s390 SCSI dump tool. diff --git a/Documentation/scheduler/00-INDEX b/Documentation/scheduler/00-INDEX deleted file mode 100644 index eccf7ad2e7f9..000000000000 --- a/Documentation/scheduler/00-INDEX +++ /dev/null @@ -1,18 +0,0 @@ -00-INDEX - - this file. -sched-arch.txt - - CPU Scheduler implementation hints for architecture specific code. -sched-bwc.txt - - CFS bandwidth control overview. -sched-design-CFS.txt - - goals, design and implementation of the Completely Fair Scheduler. -sched-domains.txt - - information on scheduling domains. -sched-nice-design.txt - - How and why the scheduler's nice levels are implemented. -sched-rt-group.txt - - real-time group scheduling. -sched-deadline.txt - - deadline scheduling. -sched-stats.txt - - information on schedstats (Linux Scheduler Statistics). diff --git a/Documentation/scsi/00-INDEX b/Documentation/scsi/00-INDEX deleted file mode 100644 index bb4a76f823e1..000000000000 --- a/Documentation/scsi/00-INDEX +++ /dev/null @@ -1,108 +0,0 @@ -00-INDEX - - this file -53c700.txt - - info on driver for 53c700 based adapters -BusLogic.txt - - info on driver for adapters with BusLogic chips -ChangeLog.1992-1997 - - Changes to scsi files, if not listed elsewhere -ChangeLog.arcmsr - - Changes to driver for ARECA's SATA RAID controller cards -ChangeLog.ips - - IBM ServeRAID driver Changelog -ChangeLog.lpfc - - Changes to lpfc driver -ChangeLog.megaraid - - Changes to LSI megaraid controller. -ChangeLog.megaraid_sas - - Changes to serial attached scsi version of LSI megaraid controller. -ChangeLog.ncr53c8xx - - Changes to ncr53c8xx driver -ChangeLog.sym53c8xx - - Changes to sym53c8xx driver -ChangeLog.sym53c8xx_2 - - Changes to second generation of sym53c8xx driver -FlashPoint.txt - - info on driver for BusLogic FlashPoint adapters -LICENSE.FlashPoint - - Licence of the Flashpoint driver -LICENSE.qla2xxx - - License for QLogic Linux Fibre Channel HBA Driver firmware. -LICENSE.qla4xxx - - License for QLogic Linux iSCSI HBA Driver. -Mylex.txt - - info on driver for Mylex adapters -NinjaSCSI.txt - - info on WorkBiT NinjaSCSI-32/32Bi driver -aacraid.txt - - Driver supporting Adaptec RAID controllers -advansys.txt - - List of Advansys Host Adapters -aha152x.txt - - info on driver for Adaptec AHA152x based adapters -aic79xx.txt - - Adaptec Ultra320 SCSI host adapters -aic7xxx.txt - - info on driver for Adaptec controllers -arcmsr_spec.txt - - ARECA FIRMWARE SPEC (for IOP331 adapter) -bfa.txt - - Brocade FC/FCOE adapter driver. -bnx2fc.txt - - FCoE hardware offload for Broadcom network interfaces. -cxgb3i.txt - - Chelsio iSCSI Linux Driver -dc395x.txt - - README file for the dc395x SCSI driver -dpti.txt - - info on driver for DPT SmartRAID and Adaptec I2O RAID based adapters -dtc3x80.txt - - info on driver for DTC 2x80 based adapters -g_NCR5380.txt - - info on driver for NCR5380 and NCR53c400 based adapters -hpsa.txt - - HP Smart Array Controller SCSI driver. -hptiop.txt - - HIGHPOINT ROCKETRAID 3xxx RAID DRIVER -libsas.txt - - Serial Attached SCSI management layer. -link_power_management_policy.txt - - Link power management options. -lpfc.txt - - LPFC driver release notes -megaraid.txt - - Common Management Module, shared code handling ioctls for LSI drivers -ncr53c8xx.txt - - info on driver for NCR53c8xx based adapters -osd.txt - Object-Based Storage Device, command set introduction. -osst.txt - - info on driver for OnStream SC-x0 SCSI tape -ppa.txt - - info on driver for IOmega zip drive -qlogicfas.txt - - info on driver for QLogic FASxxx based adapters -scsi-changer.txt - - README for the SCSI media changer driver -scsi-generic.txt - - info on the sg driver for generic (non-disk/CD/tape) SCSI devices. -scsi-parameters.txt - - List of SCSI-parameters to pass to the kernel at module load-time. -scsi.txt - - short blurb on using SCSI support as a module. -scsi_mid_low_api.txt - - info on API between SCSI layer and low level drivers -scsi_eh.txt - - info on SCSI midlayer error handling infrastructure -scsi_fc_transport.txt - - SCSI Fiber Channel Tansport -st.txt - - info on scsi tape driver -sym53c500_cs.txt - - info on PCMCIA driver for Symbios Logic 53c500 based adapters -sym53c8xx_2.txt - - info on second generation driver for sym53c8xx based adapters -tmscsim.txt - - info on driver for AM53c974 based adapters -ufs.txt - - info on Universal Flash Storage(UFS) and UFS host controller driver. diff --git a/Documentation/serial/00-INDEX b/Documentation/serial/00-INDEX deleted file mode 100644 index 8021a9f29fc5..000000000000 --- a/Documentation/serial/00-INDEX +++ /dev/null @@ -1,16 +0,0 @@ -00-INDEX - - this file. -README.cycladesZ - - info on Cyclades-Z firmware loading. -driver - - intro to the low level serial driver. -moxa-smartio - - file with info on installing/using Moxa multiport serial driver. -n_gsm.txt - - GSM 0710 tty multiplexer howto. -rocket.txt - - info on the Comtrol RocketPort multiport serial driver. -serial-rs485.txt - - info about RS485 structures and support in the kernel. -tty.txt - - guide to the locking policies of the tty layer. diff --git a/Documentation/spi/00-INDEX b/Documentation/spi/00-INDEX deleted file mode 100644 index 8e4bb17d70eb..000000000000 --- a/Documentation/spi/00-INDEX +++ /dev/null @@ -1,16 +0,0 @@ -00-INDEX - - this file. -butterfly - - AVR Butterfly SPI driver overview and pin configuration. -ep93xx_spi - - Basic EP93xx SPI driver configuration. -pxa2xx - - PXA2xx SPI master controller build by spi_message fifo wq -spidev - - Intro to the userspace API for spi devices -spi-lm70llp - - Connecting an LM70-LLP sensor to the kernel via the SPI subsys. -spi-sc18is602 - - NXP SC18IS602/603 I2C-bus to SPI bridge -spi-summary - - (Linux) SPI overview. If unsure about SPI or SPI in Linux, start here. diff --git a/Documentation/sysctl/00-INDEX b/Documentation/sysctl/00-INDEX deleted file mode 100644 index 8cf5d493fd03..000000000000 --- a/Documentation/sysctl/00-INDEX +++ /dev/null @@ -1,16 +0,0 @@ -00-INDEX - - this file. -README - - general information about /proc/sys/ sysctl files. -abi.txt - - documentation for /proc/sys/abi/*. -fs.txt - - documentation for /proc/sys/fs/*. -kernel.txt - - documentation for /proc/sys/kernel/*. -net.txt - - documentation for /proc/sys/net/*. -sunrpc.txt - - documentation for /proc/sys/sunrpc/*. -vm.txt - - documentation for /proc/sys/vm/*. diff --git a/Documentation/timers/00-INDEX b/Documentation/timers/00-INDEX deleted file mode 100644 index 3be05fe0f1f9..000000000000 --- a/Documentation/timers/00-INDEX +++ /dev/null @@ -1,16 +0,0 @@ -00-INDEX - - this file -highres.txt - - High resolution timers and dynamic ticks design notes -hpet.txt - - High Precision Event Timer Driver for Linux -hrtimers.txt - - subsystem for high-resolution kernel timers -NO_HZ.txt - - Summary of the different methods for the scheduler clock-interrupts management. -timekeeping.txt - - Clock sources, clock events, sched_clock() and delay timer notes -timers-howto.txt - - how to insert delays in the kernel the right (tm) way. -timer_stats.txt - - timer usage statistics diff --git a/Documentation/virtual/00-INDEX b/Documentation/virtual/00-INDEX deleted file mode 100644 index af0d23968ee7..000000000000 --- a/Documentation/virtual/00-INDEX +++ /dev/null @@ -1,11 +0,0 @@ -Virtualization support in the Linux kernel. - -00-INDEX - - this file. - -paravirt_ops.txt - - Describes the Linux kernel pv_ops to support different hypervisors -kvm/ - - Kernel Virtual Machine. See also http://linux-kvm.org -uml/ - - User Mode Linux, builds/runs Linux kernel as a userspace program. diff --git a/Documentation/virtual/kvm/00-INDEX b/Documentation/virtual/kvm/00-INDEX deleted file mode 100644 index 3492458a4ae8..000000000000 --- a/Documentation/virtual/kvm/00-INDEX +++ /dev/null @@ -1,35 +0,0 @@ -00-INDEX - - this file. -amd-memory-encryption.rst - - notes on AMD Secure Encrypted Virtualization feature and SEV firmware - command description -api.txt - - KVM userspace API. -arm - - internal ABI between the kernel and HYP (for arm/arm64) -cpuid.txt - - KVM-specific cpuid leaves (x86). -devices/ - - KVM_CAP_DEVICE_CTRL userspace API. -halt-polling.txt - - notes on halt-polling -hypercalls.txt - - KVM hypercalls. -locking.txt - - notes on KVM locks. -mmu.txt - - the x86 kvm shadow mmu. -msr.txt - - KVM-specific MSRs (x86). -nested-vmx.txt - - notes on nested virtualization for Intel x86 processors. -ppc-pv.txt - - the paravirtualization interface on PowerPC. -review-checklist.txt - - review checklist for KVM patches. -s390-diag.txt - - Diagnose hypercall description (for IBM S/390) -timekeeping.txt - - timekeeping virtualization for x86-based architectures. -vcpu-requests.rst - - internal VCPU request API diff --git a/Documentation/vm/00-INDEX b/Documentation/vm/00-INDEX deleted file mode 100644 index f4a4f3e884cf..000000000000 --- a/Documentation/vm/00-INDEX +++ /dev/null @@ -1,50 +0,0 @@ -00-INDEX - - this file. -active_mm.rst - - An explanation from Linus about tsk->active_mm vs tsk->mm. -balance.rst - - various information on memory balancing. -cleancache.rst - - Intro to cleancache and page-granularity victim cache. -frontswap.rst - - Outline frontswap, part of the transcendent memory frontend. -highmem.rst - - Outline of highmem and common issues. -hmm.rst - - Documentation of heterogeneous memory management -hugetlbfs_reserv.rst - - A brief overview of hugetlbfs reservation design/implementation. -hwpoison.rst - - explains what hwpoison is -ksm.rst - - how to use the Kernel Samepage Merging feature. -mmu_notifier.rst - - a note about clearing pte/pmd and mmu notifications -numa.rst - - information about NUMA specific code in the Linux vm. -overcommit-accounting.rst - - description of the Linux kernels overcommit handling modes. -page_frags.rst - - description of page fragments allocator -page_migration.rst - - description of page migration in NUMA systems. -page_owner.rst - - tracking about who allocated each page -remap_file_pages.rst - - a note about remap_file_pages() system call -slub.rst - - a short users guide for SLUB. -split_page_table_lock.rst - - Separate per-table lock to improve scalability of the old page_table_lock. -swap_numa.rst - - automatic binding of swap device to numa node -transhuge.rst - - Transparent Hugepage Support, alternative way of using hugepages. -unevictable-lru.rst - - Unevictable LRU infrastructure -z3fold.txt - - outline of z3fold allocator for storing compressed pages -zsmalloc.rst - - outline of zsmalloc allocator for storing compressed pages -zswap.rst - - Intro to compressed cache for swap pages diff --git a/Documentation/w1/00-INDEX b/Documentation/w1/00-INDEX deleted file mode 100644 index cb49802745dc..000000000000 --- a/Documentation/w1/00-INDEX +++ /dev/null @@ -1,10 +0,0 @@ -00-INDEX - - This file -slaves/ - - Drivers that provide support for specific family codes. -masters/ - - Individual chips providing 1-wire busses. -w1.generic - - The 1-wire (w1) bus -w1.netlink - - Userspace communication protocol over connector [1]. diff --git a/Documentation/w1/masters/00-INDEX b/Documentation/w1/masters/00-INDEX deleted file mode 100644 index 8330cf9325f0..000000000000 --- a/Documentation/w1/masters/00-INDEX +++ /dev/null @@ -1,12 +0,0 @@ -00-INDEX - - This file -ds2482 - - The Maxim/Dallas Semiconductor DS2482 provides 1-wire busses. -ds2490 - - The Maxim/Dallas Semiconductor DS2490 builds USB <-> W1 bridges. -mxc-w1 - - W1 master controller driver found on Freescale MX2/MX3 SoCs -omap-hdq - - HDQ/1-wire module of TI OMAP 2430/3430. -w1-gpio - - GPIO 1-wire bus master driver. diff --git a/Documentation/w1/slaves/00-INDEX b/Documentation/w1/slaves/00-INDEX deleted file mode 100644 index 68946f83e579..000000000000 --- a/Documentation/w1/slaves/00-INDEX +++ /dev/null @@ -1,14 +0,0 @@ -00-INDEX - - This file -w1_therm - - The Maxim/Dallas Semiconductor ds18*20 temperature sensor. -w1_ds2413 - - The Maxim/Dallas Semiconductor ds2413 dual channel addressable switch. -w1_ds2423 - - The Maxim/Dallas Semiconductor ds2423 counter device. -w1_ds2438 - - The Maxim/Dallas Semiconductor ds2438 smart battery monitor. -w1_ds28e04 - - The Maxim/Dallas Semiconductor ds28e04 eeprom. -w1_ds28e17 - - The Maxim/Dallas Semiconductor ds28e17 1-Wire-to-I2C Master Bridge. diff --git a/Documentation/x86/00-INDEX b/Documentation/x86/00-INDEX deleted file mode 100644 index 3bb2ee3edcd1..000000000000 --- a/Documentation/x86/00-INDEX +++ /dev/null @@ -1,20 +0,0 @@ -00-INDEX - - this file -boot.txt - - List of boot protocol versions -earlyprintk.txt - - Using earlyprintk with a USB2 debug port key. -entry_64.txt - - Describe (some of the) kernel entry points for x86. -exception-tables.txt - - why and how Linux kernel uses exception tables on x86 -microcode.txt - - How to load microcode from an initrd-CPIO archive early to fix CPU issues. -mtrr.txt - - how to use x86 Memory Type Range Registers to increase performance -pat.txt - - Page Attribute Table intro and API -usb-legacy-support.txt - - how to fix/avoid quirks when using emulated PS/2 mouse/keyboard. -zero-page.txt - - layout of the first page of memory. diff --git a/Documentation/x86/x86_64/00-INDEX b/Documentation/x86/x86_64/00-INDEX deleted file mode 100644 index 92fc20ab5f0e..000000000000 --- a/Documentation/x86/x86_64/00-INDEX +++ /dev/null @@ -1,16 +0,0 @@ -00-INDEX - - This file -boot-options.txt - - AMD64-specific boot options. -cpu-hotplug-spec - - Firmware support for CPU hotplug under Linux/x86-64 -fake-numa-for-cpusets - - Using numa=fake and CPUSets for Resource Management -kernel-stacks - - Context-specific per-processor interrupt stacks. -machinecheck - - Configurable sysfs parameters for the x86-64 machine check code. -mm.txt - - Memory layout of x86-64 (4 level page tables, 46 bits physical). -uefi.txt - - Booting Linux via Unified Extensible Firmware Interface. diff --git a/README b/README index 2c927ccbd970..669ac7c32292 100644 --- a/README +++ b/README @@ -12,7 +12,6 @@ In order to build the documentation, use ``make htmldocs`` or There are various text files in the Documentation/ subdirectory, several of them using the Restructured Text markup notation. -See Documentation/00-INDEX for a list of what is contained in each file. Please read the Documentation/process/changes.rst file, as it contains the requirements for building and running the kernel, and information about diff --git a/scripts/check_00index.sh b/scripts/check_00index.sh deleted file mode 100755 index aa47f5926c80..000000000000 --- a/scripts/check_00index.sh +++ /dev/null @@ -1,67 +0,0 @@ -#!/bin/bash -# SPDX-License-Identifier: GPL-2.0 - -cd Documentation/ - -# Check entries that should be removed - -obsolete="" -for i in $(tail -n +12 00-INDEX |grep -E '^[a-zA-Z0-9]+'); do - if [ ! -e $i ]; then - obsolete="$obsolete $i" - fi -done - -# Check directory entries that should be added -search="" -dir="" -for i in $(find . -maxdepth 1 -type d); do - if [ "$i" != "." ]; then - new=$(echo $i|perl -ne 's,./(.*),$1/,; print $_') - search="$search $new" - fi -done - -for i in $search; do - if [ "$(grep -P "^$i" 00-INDEX)" == "" ]; then - dir="$dir $i" - fi -done - -# Check file entries that should be added -search="" -file="" -for i in $(find . -maxdepth 1 -type f); do - if [ "$i" != "./.gitignore" ]; then - new=$(echo $i|perl -ne 's,./(.*),$1,; print $_') - search="$search $new" - fi -done - -for i in $search; do - if [ "$(grep -P "^$i\$" 00-INDEX)" == "" ]; then - file="$file $i" - fi -done - -# Output its findings - -echo -e "Documentation/00-INDEX check results:\n" - -if [ "$obsolete" != "" ]; then - echo -e "- Should remove those entries:\n\t$obsolete\n" -else - echo -e "- No obsolete entries\n" -fi - -if [ "$dir" != "" ]; then - echo -e "- Should document those directories:\n\t$dir\n" -else - echo -e "- No new directories to add\n" -fi - -if [ "$file" != "" ]; then - echo -e "- Should document those files:\n\t$file" -else - echo "- No new files to add" -fi -- cgit v1.2.3-59-g8ed1b From 9d723b4ccbd2248a388f37200a931b68dab28ada Mon Sep 17 00:00:00 2001 From: Robin Murphy Date: Thu, 20 Sep 2018 14:14:26 +0100 Subject: iommu: Fix passthrough option documentation Document that the default for "iommu.passthrough" is now configurable. Fixes: 58d1131777a4 ("iommu: Add config option to set passthrough as default") Signed-off-by: Robin Murphy Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/kernel-parameters.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 9871e649ffef..8c5a956f3d9c 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1754,7 +1754,7 @@ Format: { "0" | "1" } 0 - Use IOMMU translation for DMA. 1 - Bypass the IOMMU for DMA. - unset - Use IOMMU translation for DMA. + unset - Use value of CONFIG_IOMMU_DEFAULT_PASSTHROUGH. io7= [HW] IO7 for Marvel based alpha systems See comment before marvel_specify_io7 in -- cgit v1.2.3-59-g8ed1b From 68a6efe86f6a16e25556a2aff40efad41097b486 Mon Sep 17 00:00:00 2001 From: Zhen Lei Date: Thu, 20 Sep 2018 17:10:23 +0100 Subject: iommu: Add "iommu.strict" command line option Add a generic command line option to enable lazy unmapping via IOVA flush queues, which will initally be suuported by iommu-dma. This echoes the semantics of "intel_iommu=strict" (albeit with the opposite default value), but in the driver-agnostic fashion of "iommu.passthrough". Signed-off-by: Zhen Lei [rm: move handling out of SMMUv3 driver, clean up documentation] Signed-off-by: Robin Murphy [will: dropped broken printk when parsing command-line option] Signed-off-by: Will Deacon --- Documentation/admin-guide/kernel-parameters.txt | 12 ++++++++++++ drivers/iommu/iommu.c | 14 ++++++++++++++ 2 files changed, 26 insertions(+) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 9871e649ffef..92ae12aeabf4 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1749,6 +1749,18 @@ nobypass [PPC/POWERNV] Disable IOMMU bypass, using IOMMU for PCI devices. + iommu.strict= [ARM64] Configure TLB invalidation behaviour + Format: { "0" | "1" } + 0 - Lazy mode. + Request that DMA unmap operations use deferred + invalidation of hardware TLBs, for increased + throughput at the cost of reduced device isolation. + Will fall back to strict mode if not supported by + the relevant IOMMU driver. + 1 - Strict mode (default). + DMA unmap operations invalidate IOMMU hardware TLBs + synchronously. + iommu.passthrough= [ARM64] Configure DMA to bypass the IOMMU by default. Format: { "0" | "1" } diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 8c15c5980299..2b6dad2aa9f1 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -41,6 +41,7 @@ static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY; #else static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_DMA; #endif +static bool iommu_dma_strict __read_mostly = true; struct iommu_callback_data { const struct iommu_ops *ops; @@ -131,6 +132,12 @@ static int __init iommu_set_def_domain_type(char *str) } early_param("iommu.passthrough", iommu_set_def_domain_type); +static int __init iommu_dma_setup(char *str) +{ + return kstrtobool(str, &iommu_dma_strict); +} +early_param("iommu.strict", iommu_dma_setup); + static ssize_t iommu_group_attr_show(struct kobject *kobj, struct attribute *__attr, char *buf) { @@ -1072,6 +1079,13 @@ struct iommu_group *iommu_group_get_for_dev(struct device *dev) group->default_domain = dom; if (!group->domain) group->domain = dom; + + if (dom && !iommu_dma_strict) { + int attr = 1; + iommu_domain_set_attr(dom, + DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE, + &attr); + } } ret = iommu_group_add_device(group, dev); -- cgit v1.2.3-59-g8ed1b From bd0e6c9614b95352eb31d0207df16dc156c527fa Mon Sep 17 00:00:00 2001 From: Zeng Tao Date: Fri, 28 Sep 2018 19:27:52 +0800 Subject: usb: hub: try old enumeration scheme first for high speed devices The new scheme is required just to support legacy low and full-speed devices. For high speed devices, it will slower the enumeration speed. So in this patch we try the "old" enumeration scheme first for high speed devices, and this is what Windows does since Windows 8. Signed-off-by: Zeng Tao Acked-by: Alan Stern Reviewed-by: Roger Quadros Signed-off-by: Greg Kroah-Hartman --- Documentation/admin-guide/kernel-parameters.txt | 3 ++- drivers/usb/core/hub.c | 4 +++- 2 files changed, 5 insertions(+), 2 deletions(-) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 92eb1f42240d..151c527571e5 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -4610,7 +4610,8 @@ usbcore.old_scheme_first= [USB] Start with the old device initialization - scheme (default 0 = off). + scheme, applies only to low and full-speed devices + (default 0 = off). usbcore.usbfs_memory_mb= [USB] Memory limit (in MB) for buffers allocated by diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index 7801bb30bdba..bf76a3dd4359 100644 --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -2661,11 +2661,13 @@ static bool use_new_scheme(struct usb_device *udev, int retry, { int old_scheme_first_port = port_dev->quirks & USB_PORT_QUIRK_OLD_SCHEME; + int quick_enumeration = (udev->speed == USB_SPEED_HIGH); if (udev->speed >= USB_SPEED_SUPER) return false; - return USE_NEW_SCHEME(retry, old_scheme_first_port || old_scheme_first); + return USE_NEW_SCHEME(retry, old_scheme_first_port || old_scheme_first + || quick_enumeration); } /* Is a USB 3.0 port in the Inactive or Compliance Mode state? -- cgit v1.2.3-59-g8ed1b From d90fe2acd9b2900790da01354dbca48dba37c20d Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Fri, 28 Sep 2018 15:39:20 +0000 Subject: powerpc: Wire up memtest Add call to early_memtest() so that kernel compiled with CONFIG_MEMTEST really perform memtest at startup when requested via 'memtest' boot parameter. Tested-by: Daniel Axtens Signed-off-by: Christophe Leroy Signed-off-by: Michael Ellerman --- Documentation/admin-guide/kernel-parameters.txt | 2 +- arch/powerpc/kernel/setup-common.c | 3 +++ 2 files changed, 4 insertions(+), 1 deletion(-) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 64a3bf54b974..1ab0797e4db6 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2404,7 +2404,7 @@ seconds. Use this parameter to check at some other rate. 0 disables periodic checking. - memtest= [KNL,X86,ARM] Enable memtest + memtest= [KNL,X86,ARM,PPC] Enable memtest Format: default : 0 Specifies the number of memtest passes to be diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c index 93fa0c99681e..9ca9db707bcb 100644 --- a/arch/powerpc/kernel/setup-common.c +++ b/arch/powerpc/kernel/setup-common.c @@ -33,6 +33,7 @@ #include #include #include +#include #include #include #include @@ -966,6 +967,8 @@ void __init setup_arch(char **cmdline_p) initmem_init(); + early_memtest(min_low_pfn << PAGE_SHIFT, max_low_pfn << PAGE_SHIFT); + #ifdef CONFIG_DUMMY_CONSOLE conswitchp = &dummy_con; #endif -- cgit v1.2.3-59-g8ed1b From d3091215921bd4b8fdf3129bf8f733b8ca48dc80 Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Fri, 5 Oct 2018 19:11:59 -0400 Subject: docs: move ext4 administrative docs to admin-guide/ Move the ext4 mount option and other administrative stuff to the Linux administrator's guide. Signed-off-by: Darrick J. Wong Signed-off-by: Theodore Ts'o --- Documentation/admin-guide/ext4.rst | 574 +++++++++++++++++++++++++++++++ Documentation/admin-guide/index.rst | 1 + Documentation/conf.py | 2 + Documentation/filesystems/ext4/ext4.rst | 574 ------------------------------- Documentation/filesystems/ext4/index.rst | 1 - 5 files changed, 577 insertions(+), 575 deletions(-) create mode 100644 Documentation/admin-guide/ext4.rst delete mode 100644 Documentation/filesystems/ext4/ext4.rst (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/ext4.rst b/Documentation/admin-guide/ext4.rst new file mode 100644 index 000000000000..e506d3dae510 --- /dev/null +++ b/Documentation/admin-guide/ext4.rst @@ -0,0 +1,574 @@ +.. SPDX-License-Identifier: GPL-2.0 + +======================== +ext4 General Information +======================== + +Ext4 is an advanced level of the ext3 filesystem which incorporates +scalability and reliability enhancements for supporting large filesystems +(64 bit) in keeping with increasing disk capacities and state-of-the-art +feature requirements. + +Mailing list: linux-ext4@vger.kernel.org +Web site: http://ext4.wiki.kernel.org + + +Quick usage instructions +======================== + +Note: More extensive information for getting started with ext4 can be +found at the ext4 wiki site at the URL: +http://ext4.wiki.kernel.org/index.php/Ext4_Howto + + - The latest version of e2fsprogs can be found at: + + https://www.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/ + + or + + http://sourceforge.net/project/showfiles.php?group_id=2406 + + or grab the latest git repository from: + + https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git + + - Create a new filesystem using the ext4 filesystem type: + + # mke2fs -t ext4 /dev/hda1 + + Or to configure an existing ext3 filesystem to support extents: + + # tune2fs -O extents /dev/hda1 + + If the filesystem was created with 128 byte inodes, it can be + converted to use 256 byte for greater efficiency via: + + # tune2fs -I 256 /dev/hda1 + + - Mounting: + + # mount -t ext4 /dev/hda1 /wherever + + - When comparing performance with other filesystems, it's always + important to try multiple workloads; very often a subtle change in a + workload parameter can completely change the ranking of which + filesystems do well compared to others. When comparing versus ext3, + note that ext4 enables write barriers by default, while ext3 does + not enable write barriers by default. So it is useful to use + explicitly specify whether barriers are enabled or not when via the + '-o barriers=[0|1]' mount option for both ext3 and ext4 filesystems + for a fair comparison. When tuning ext3 for best benchmark numbers, + it is often worthwhile to try changing the data journaling mode; '-o + data=writeback' can be faster for some workloads. (Note however that + running mounted with data=writeback can potentially leave stale data + exposed in recently written files in case of an unclean shutdown, + which could be a security exposure in some situations.) Configuring + the filesystem with a large journal can also be helpful for + metadata-intensive workloads. + +Features +======== + +Currently Available +------------------- + +* ability to use filesystems > 16TB (e2fsprogs support not available yet) +* extent format reduces metadata overhead (RAM, IO for access, transactions) +* extent format more robust in face of on-disk corruption due to magics, +* internal redundancy in tree +* improved file allocation (multi-block alloc) +* lift 32000 subdirectory limit imposed by i_links_count[1] +* nsec timestamps for mtime, atime, ctime, create time +* inode version field on disk (NFSv4, Lustre) +* reduced e2fsck time via uninit_bg feature +* journal checksumming for robustness, performance +* persistent file preallocation (e.g for streaming media, databases) +* ability to pack bitmaps and inode tables into larger virtual groups via the + flex_bg feature +* large file support +* inode allocation using large virtual block groups via flex_bg +* delayed allocation +* large block (up to pagesize) support +* efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force + the ordering) + +[1] Filesystems with a block size of 1k may see a limit imposed by the +directory hash tree having a maximum depth of two. + +Options +======= + +When mounting an ext4 filesystem, the following option are accepted: +(*) == default + + ro + Mount filesystem read only. Note that ext4 will replay the journal (and + thus write to the partition) even when mounted "read only". The mount + options "ro,noload" can be used to prevent writes to the filesystem. + + journal_checksum + Enable checksumming of the journal transactions. This will allow the + recovery code in e2fsck and the kernel to detect corruption in the + kernel. It is a compatible change and will be ignored by older + kernels. + + journal_async_commit + Commit block can be written to disk without waiting for descriptor + blocks. If enabled older kernels cannot mount the device. This will + enable 'journal_checksum' internally. + + journal_path=path, journal_dev=devnum + When the external journal device's major/minor numbers have changed, + these options allow the user to specify the new journal location. The + journal device is identified through either its new major/minor numbers + encoded in devnum, or via a path to the device. + + norecovery, noload + Don't load the journal on mounting. Note that if the filesystem was + not unmounted cleanly, skipping the journal replay will lead to the + filesystem containing inconsistencies that can lead to any number of + problems. + + data=journal + All data are committed into the journal prior to being written into the + main file system. Enabling this mode will disable delayed allocation + and O_DIRECT support. + + data=ordered (*) + All data are forced directly out to the main file system prior to its + metadata being committed to the journal. + + data=writeback + Data ordering is not preserved, data may be written into the main file + system after its metadata has been committed to the journal. + + commit=nrsec (*) + Ext4 can be told to sync all its data and metadata every 'nrsec' + seconds. The default value is 5 seconds. This means that if you lose + your power, you will lose as much as the latest 5 seconds of work (your + filesystem will not be damaged though, thanks to the journaling). This + default value (or any low value) will hurt performance, but it's good + for data-safety. Setting it to 0 will have the same effect as leaving + it at the default (5 seconds). Setting it to very large values will + improve performance. + + barrier=<0|1(*)>, barrier(*), nobarrier + This enables/disables the use of write barriers in the jbd code. + barrier=0 disables, barrier=1 enables. This also requires an IO stack + which can support barriers, and if jbd gets an error on a barrier + write, it will disable again with a warning. Write barriers enforce + proper on-disk ordering of journal commits, making volatile disk write + caches safe to use, at some performance penalty. If your disks are + battery-backed in one way or another, disabling barriers may safely + improve performance. The mount options "barrier" and "nobarrier" can + also be used to enable or disable barriers, for consistency with other + ext4 mount options. + + inode_readahead_blks=n + This tuning parameter controls the maximum number of inode table blocks + that ext4's inode table readahead algorithm will pre-read into the + buffer cache. The default value is 32 blocks. + + nouser_xattr + Disables Extended User Attributes. See the attr(5) manual page for + more information about extended attributes. + + noacl + This option disables POSIX Access Control List support. If ACL support + is enabled in the kernel configuration (CONFIG_EXT4_FS_POSIX_ACL), ACL + is enabled by default on mount. See the acl(5) manual page for more + information about acl. + + bsddf (*) + Make 'df' act like BSD. + + minixdf + Make 'df' act like Minix. + + debug + Extra debugging information is sent to syslog. + + abort + Simulate the effects of calling ext4_abort() for debugging purposes. + This is normally used while remounting a filesystem which is already + mounted. + + errors=remount-ro + Remount the filesystem read-only on an error. + + errors=continue + Keep going on a filesystem error. + + errors=panic + Panic and halt the machine if an error occurs. (These mount options + override the errors behavior specified in the superblock, which can be + configured using tune2fs) + + data_err=ignore(*) + Just print an error message if an error occurs in a file data buffer in + ordered mode. + data_err=abort + Abort the journal if an error occurs in a file data buffer in ordered + mode. + + grpid | bsdgroups + New objects have the group ID of their parent. + + nogrpid (*) | sysvgroups + New objects have the group ID of their creator. + + resgid=n + The group ID which may use the reserved blocks. + + resuid=n + The user ID which may use the reserved blocks. + + sb= + Use alternate superblock at this location. + + quota, noquota, grpquota, usrquota + These options are ignored by the filesystem. They are used only by + quota tools to recognize volumes where quota should be turned on. See + documentation in the quota-tools package for more details + (http://sourceforge.net/projects/linuxquota). + + jqfmt=, usrjquota=, grpjquota= + These options tell filesystem details about quota so that quota + information can be properly updated during journal replay. They replace + the above quota options. See documentation in the quota-tools package + for more details (http://sourceforge.net/projects/linuxquota). + + stripe=n + Number of filesystem blocks that mballoc will try to use for allocation + size and alignment. For RAID5/6 systems this should be the number of + data disks * RAID chunk size in file system blocks. + + delalloc (*) + Defer block allocation until just before ext4 writes out the block(s) + in question. This allows ext4 to better allocation decisions more + efficiently. + + nodelalloc + Disable delayed allocation. Blocks are allocated when the data is + copied from userspace to the page cache, either via the write(2) system + call or when an mmap'ed page which was previously unallocated is + written for the first time. + + max_batch_time=usec + Maximum amount of time ext4 should wait for additional filesystem + operations to be batch together with a synchronous write operation. + Since a synchronous write operation is going to force a commit and then + a wait for the I/O complete, it doesn't cost much, and can be a huge + throughput win, we wait for a small amount of time to see if any other + transactions can piggyback on the synchronous write. The algorithm + used is designed to automatically tune for the speed of the disk, by + measuring the amount of time (on average) that it takes to finish + committing a transaction. Call this time the "commit time". If the + time that the transaction has been running is less than the commit + time, ext4 will try sleeping for the commit time to see if other + operations will join the transaction. The commit time is capped by + the max_batch_time, which defaults to 15000us (15ms). This + optimization can be turned off entirely by setting max_batch_time to 0. + + min_batch_time=usec + This parameter sets the commit time (as described above) to be at least + min_batch_time. It defaults to zero microseconds. Increasing this + parameter may improve the throughput of multi-threaded, synchronous + workloads on very fast disks, at the cost of increasing latency. + + journal_ioprio=prio + The I/O priority (from 0 to 7, where 0 is the highest priority) which + should be used for I/O operations submitted by kjournald2 during a + commit operation. This defaults to 3, which is a slightly higher + priority than the default I/O priority. + + auto_da_alloc(*), noauto_da_alloc + Many broken applications don't use fsync() when replacing existing + files via patterns such as fd = open("foo.new")/write(fd,..)/close(fd)/ + rename("foo.new", "foo"), or worse yet, fd = open("foo", + O_TRUNC)/write(fd,..)/close(fd). If auto_da_alloc is enabled, ext4 + will detect the replace-via-rename and replace-via-truncate patterns + and force that any delayed allocation blocks are allocated such that at + the next journal commit, in the default data=ordered mode, the data + blocks of the new file are forced to disk before the rename() operation + is committed. This provides roughly the same level of guarantees as + ext3, and avoids the "zero-length" problem that can happen when a + system crashes before the delayed allocation blocks are forced to disk. + + noinit_itable + Do not initialize any uninitialized inode table blocks in the + background. This feature may be used by installation CD's so that the + install process can complete as quickly as possible; the inode table + initialization process would then be deferred until the next time the + file system is unmounted. + + init_itable=n + The lazy itable init code will wait n times the number of milliseconds + it took to zero out the previous block group's inode table. This + minimizes the impact on the system performance while file system's + inode table is being initialized. + + discard, nodiscard(*) + Controls whether ext4 should issue discard/TRIM commands to the + underlying block device when blocks are freed. This is useful for SSD + devices and sparse/thinly-provisioned LUNs, but it is off by default + until sufficient testing has been done. + + nouid32 + Disables 32-bit UIDs and GIDs. This is for interoperability with + older kernels which only store and expect 16-bit values. + + block_validity(*), noblock_validity + These options enable or disable the in-kernel facility for tracking + filesystem metadata blocks within internal data structures. This + allows multi- block allocator and other routines to notice bugs or + corrupted allocation bitmaps which cause blocks to be allocated which + overlap with filesystem metadata blocks. + + dioread_lock, dioread_nolock + Controls whether or not ext4 should use the DIO read locking. If the + dioread_nolock option is specified ext4 will allocate uninitialized + extent before buffer write and convert the extent to initialized after + IO completes. This approach allows ext4 code to avoid using inode + mutex, which improves scalability on high speed storages. However this + does not work with data journaling and dioread_nolock option will be + ignored with kernel warning. Note that dioread_nolock code path is only + used for extent-based files. Because of the restrictions this options + comprises it is off by default (e.g. dioread_lock). + + max_dir_size_kb=n + This limits the size of directories so that any attempt to expand them + beyond the specified limit in kilobytes will cause an ENOSPC error. + This is useful in memory constrained environments, where a very large + directory can cause severe performance problems or even provoke the Out + Of Memory killer. (For example, if there is only 512mb memory + available, a 176mb directory may seriously cramp the system's style.) + + i_version + Enable 64-bit inode version support. This option is off by default. + + dax + Use direct access (no page cache). See + Documentation/filesystems/dax.txt. Note that this option is + incompatible with data=journal. + +Data Mode +========= +There are 3 different data modes: + +* writeback mode + + In data=writeback mode, ext4 does not journal data at all. This mode provides + a similar level of journaling as that of XFS, JFS, and ReiserFS in its default + mode - metadata journaling. A crash+recovery can cause incorrect data to + appear in files which were written shortly before the crash. This mode will + typically provide the best ext4 performance. + +* ordered mode + + In data=ordered mode, ext4 only officially journals metadata, but it logically + groups metadata information related to data changes with the data blocks into + a single unit called a transaction. When it's time to write the new metadata + out to disk, the associated data blocks are written first. In general, this + mode performs slightly slower than writeback but significantly faster than + journal mode. + +* journal mode + + data=journal mode provides full data and metadata journaling. All new data is + written to the journal first, and then to its final location. In the event of + a crash, the journal can be replayed, bringing both data and metadata into a + consistent state. This mode is the slowest except when data needs to be read + from and written to disk at the same time where it outperforms all others + modes. Enabling this mode will disable delayed allocation and O_DIRECT + support. + +/proc entries +============= + +Information about mounted ext4 file systems can be found in +/proc/fs/ext4. Each mounted filesystem will have a directory in +/proc/fs/ext4 based on its device name (i.e., /proc/fs/ext4/hdc or +/proc/fs/ext4/dm-0). The files in each per-device directory are shown +in table below. + +Files in /proc/fs/ext4/ + + mb_groups + details of multiblock allocator buddy cache of free blocks + +/sys entries +============ + +Information about mounted ext4 file systems can be found in +/sys/fs/ext4. Each mounted filesystem will have a directory in +/sys/fs/ext4 based on its device name (i.e., /sys/fs/ext4/hdc or +/sys/fs/ext4/dm-0). The files in each per-device directory are shown +in table below. + +Files in /sys/fs/ext4/: + +(see also Documentation/ABI/testing/sysfs-fs-ext4) + + delayed_allocation_blocks + This file is read-only and shows the number of blocks that are dirty in + the page cache, but which do not have their location in the filesystem + allocated yet. + + inode_goal + Tuning parameter which (if non-zero) controls the goal inode used by + the inode allocator in preference to all other allocation heuristics. + This is intended for debugging use only, and should be 0 on production + systems. + + inode_readahead_blks + Tuning parameter which controls the maximum number of inode table + blocks that ext4's inode table readahead algorithm will pre-read into + the buffer cache. + + lifetime_write_kbytes + This file is read-only and shows the number of kilobytes of data that + have been written to this filesystem since it was created. + + max_writeback_mb_bump + The maximum number of megabytes the writeback code will try to write + out before move on to another inode. + + mb_group_prealloc + The multiblock allocator will round up allocation requests to a + multiple of this tuning parameter if the stripe size is not set in the + ext4 superblock + + mb_max_to_scan + The maximum number of extents the multiblock allocator will search to + find the best extent. + + mb_min_to_scan + The minimum number of extents the multiblock allocator will search to + find the best extent. + + mb_order2_req + Tuning parameter which controls the minimum size for requests (as a + power of 2) where the buddy cache is used. + + mb_stats + Controls whether the multiblock allocator should collect statistics, + which are shown during the unmount. 1 means to collect statistics, 0 + means not to collect statistics. + + mb_stream_req + Files which have fewer blocks than this tunable parameter will have + their blocks allocated out of a block group specific preallocation + pool, so that small files are packed closely together. Each large file + will have its blocks allocated out of its own unique preallocation + pool. + + session_write_kbytes + This file is read-only and shows the number of kilobytes of data that + have been written to this filesystem since it was mounted. + + reserved_clusters + This is RW file and contains number of reserved clusters in the file + system which will be used in the specific situations to avoid costly + zeroout, unexpected ENOSPC, or possible data loss. The default is 2% or + 4096 clusters, whichever is smaller and this can be changed however it + can never exceed number of clusters in the file system. If there is not + enough space for the reserved space when mounting the file mount will + _not_ fail. + +Ioctls +====== + +There is some Ext4 specific functionality which can be accessed by applications +through the system call interfaces. The list of all Ext4 specific ioctls are +shown in the table below. + +Table of Ext4 specific ioctls + + EXT4_IOC_GETFLAGS + Get additional attributes associated with inode. The ioctl argument is + an integer bitfield, with bit values described in ext4.h. This ioctl is + an alias for FS_IOC_GETFLAGS. + + EXT4_IOC_SETFLAGS + Set additional attributes associated with inode. The ioctl argument is + an integer bitfield, with bit values described in ext4.h. This ioctl is + an alias for FS_IOC_SETFLAGS. + + EXT4_IOC_GETVERSION, EXT4_IOC_GETVERSION_OLD + Get the inode i_generation number stored for each inode. The + i_generation number is normally changed only when new inode is created + and it is particularly useful for network filesystems. The '_OLD' + version of this ioctl is an alias for FS_IOC_GETVERSION. + + EXT4_IOC_SETVERSION, EXT4_IOC_SETVERSION_OLD + Set the inode i_generation number stored for each inode. The '_OLD' + version of this ioctl is an alias for FS_IOC_SETVERSION. + + EXT4_IOC_GROUP_EXTEND + This ioctl has the same purpose as the resize mount option. It allows + to resize filesystem to the end of the last existing block group, + further resize has to be done with resize2fs, either online, or + offline. The argument points to the unsigned logn number representing + the filesystem new block count. + + EXT4_IOC_MOVE_EXT + Move the block extents from orig_fd (the one this ioctl is pointing to) + to the donor_fd (the one specified in move_extent structure passed as + an argument to this ioctl). Then, exchange inode metadata between + orig_fd and donor_fd. This is especially useful for online + defragmentation, because the allocator has the opportunity to allocate + moved blocks better, ideally into one contiguous extent. + + EXT4_IOC_GROUP_ADD + Add a new group descriptor to an existing or new group descriptor + block. The new group descriptor is described by ext4_new_group_input + structure, which is passed as an argument to this ioctl. This is + especially useful in conjunction with EXT4_IOC_GROUP_EXTEND, which + allows online resize of the filesystem to the end of the last existing + block group. Those two ioctls combined is used in userspace online + resize tool (e.g. resize2fs). + + EXT4_IOC_MIGRATE + This ioctl operates on the filesystem itself. It converts (migrates) + ext3 indirect block mapped inode to ext4 extent mapped inode by walking + through indirect block mapping of the original inode and converting + contiguous block ranges into ext4 extents of the temporary inode. Then, + inodes are swapped. This ioctl might help, when migrating from ext3 to + ext4 filesystem, however suggestion is to create fresh ext4 filesystem + and copy data from the backup. Note, that filesystem has to support + extents for this ioctl to work. + + EXT4_IOC_ALLOC_DA_BLKS + Force all of the delay allocated blocks to be allocated to preserve + application-expected ext3 behaviour. Note that this will also start + triggering a write of the data blocks, but this behaviour may change in + the future as it is not necessary and has been done this way only for + sake of simplicity. + + EXT4_IOC_RESIZE_FS + Resize the filesystem to a new size. The number of blocks of resized + filesystem is passed in via 64 bit integer argument. The kernel + allocates bitmaps and inode table, the userspace tool thus just passes + the new number of blocks. + + EXT4_IOC_SWAP_BOOT + Swap i_blocks and associated attributes (like i_blocks, i_size, + i_flags, ...) from the specified inode with inode EXT4_BOOT_LOADER_INO + (#5). This is typically used to store a boot loader in a secure part of + the filesystem, where it can't be changed by a normal user by accident. + The data blocks of the previous boot loader will be associated with the + given inode. + +References +========== + +kernel source: + + +programs: http://e2fsprogs.sourceforge.net/ + +useful links: http://fedoraproject.org/wiki/ext3-devel + http://www.bullopensource.org/ext4/ + http://ext4.wiki.kernel.org/index.php/Main_Page + http://fedoraproject.org/wiki/Features/Ext4 diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst index 0873685bab0f..965745d5fb9a 100644 --- a/Documentation/admin-guide/index.rst +++ b/Documentation/admin-guide/index.rst @@ -71,6 +71,7 @@ configure specific aspects of kernel behavior to your liking. java ras bcache + ext4 pm/index thunderbolt LSM/index diff --git a/Documentation/conf.py b/Documentation/conf.py index 05dad6bda787..4d32c01e1e16 100644 --- a/Documentation/conf.py +++ b/Documentation/conf.py @@ -383,6 +383,8 @@ latex_documents = [ 'The kernel development community', 'manual'), ('filesystems/index', 'filesystems.tex', 'Linux Filesystems API', 'The kernel development community', 'manual'), + ('admin-guide/ext4', 'ext4-admin-guide.tex', 'ext4 Administration Guide', + 'ext4 Community', 'manual'), ('filesystems/ext4/index', 'ext4.tex', 'ext4 Filesystem', 'ext4 Filesystem Developers', 'manual'), ('gpu/index', 'gpu.tex', 'Linux GPU Driver Developer\'s Guide', diff --git a/Documentation/filesystems/ext4/ext4.rst b/Documentation/filesystems/ext4/ext4.rst deleted file mode 100644 index e2b6bb7c2730..000000000000 --- a/Documentation/filesystems/ext4/ext4.rst +++ /dev/null @@ -1,574 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 - -======================== -General Information -======================== - -Ext4 is an advanced level of the ext3 filesystem which incorporates -scalability and reliability enhancements for supporting large filesystems -(64 bit) in keeping with increasing disk capacities and state-of-the-art -feature requirements. - -Mailing list: linux-ext4@vger.kernel.org -Web site: http://ext4.wiki.kernel.org - - -Quick usage instructions -======================== - -Note: More extensive information for getting started with ext4 can be -found at the ext4 wiki site at the URL: -http://ext4.wiki.kernel.org/index.php/Ext4_Howto - - - The latest version of e2fsprogs can be found at: - - https://www.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/ - - or - - http://sourceforge.net/project/showfiles.php?group_id=2406 - - or grab the latest git repository from: - - https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git - - - Create a new filesystem using the ext4 filesystem type: - - # mke2fs -t ext4 /dev/hda1 - - Or to configure an existing ext3 filesystem to support extents: - - # tune2fs -O extents /dev/hda1 - - If the filesystem was created with 128 byte inodes, it can be - converted to use 256 byte for greater efficiency via: - - # tune2fs -I 256 /dev/hda1 - - - Mounting: - - # mount -t ext4 /dev/hda1 /wherever - - - When comparing performance with other filesystems, it's always - important to try multiple workloads; very often a subtle change in a - workload parameter can completely change the ranking of which - filesystems do well compared to others. When comparing versus ext3, - note that ext4 enables write barriers by default, while ext3 does - not enable write barriers by default. So it is useful to use - explicitly specify whether barriers are enabled or not when via the - '-o barriers=[0|1]' mount option for both ext3 and ext4 filesystems - for a fair comparison. When tuning ext3 for best benchmark numbers, - it is often worthwhile to try changing the data journaling mode; '-o - data=writeback' can be faster for some workloads. (Note however that - running mounted with data=writeback can potentially leave stale data - exposed in recently written files in case of an unclean shutdown, - which could be a security exposure in some situations.) Configuring - the filesystem with a large journal can also be helpful for - metadata-intensive workloads. - -Features -======== - -Currently Available -------------------- - -* ability to use filesystems > 16TB (e2fsprogs support not available yet) -* extent format reduces metadata overhead (RAM, IO for access, transactions) -* extent format more robust in face of on-disk corruption due to magics, -* internal redundancy in tree -* improved file allocation (multi-block alloc) -* lift 32000 subdirectory limit imposed by i_links_count[1] -* nsec timestamps for mtime, atime, ctime, create time -* inode version field on disk (NFSv4, Lustre) -* reduced e2fsck time via uninit_bg feature -* journal checksumming for robustness, performance -* persistent file preallocation (e.g for streaming media, databases) -* ability to pack bitmaps and inode tables into larger virtual groups via the - flex_bg feature -* large file support -* inode allocation using large virtual block groups via flex_bg -* delayed allocation -* large block (up to pagesize) support -* efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force - the ordering) - -[1] Filesystems with a block size of 1k may see a limit imposed by the -directory hash tree having a maximum depth of two. - -Options -======= - -When mounting an ext4 filesystem, the following option are accepted: -(*) == default - - ro - Mount filesystem read only. Note that ext4 will replay the journal (and - thus write to the partition) even when mounted "read only". The mount - options "ro,noload" can be used to prevent writes to the filesystem. - - journal_checksum - Enable checksumming of the journal transactions. This will allow the - recovery code in e2fsck and the kernel to detect corruption in the - kernel. It is a compatible change and will be ignored by older - kernels. - - journal_async_commit - Commit block can be written to disk without waiting for descriptor - blocks. If enabled older kernels cannot mount the device. This will - enable 'journal_checksum' internally. - - journal_path=path, journal_dev=devnum - When the external journal device's major/minor numbers have changed, - these options allow the user to specify the new journal location. The - journal device is identified through either its new major/minor numbers - encoded in devnum, or via a path to the device. - - norecovery, noload - Don't load the journal on mounting. Note that if the filesystem was - not unmounted cleanly, skipping the journal replay will lead to the - filesystem containing inconsistencies that can lead to any number of - problems. - - data=journal - All data are committed into the journal prior to being written into the - main file system. Enabling this mode will disable delayed allocation - and O_DIRECT support. - - data=ordered (*) - All data are forced directly out to the main file system prior to its - metadata being committed to the journal. - - data=writeback - Data ordering is not preserved, data may be written into the main file - system after its metadata has been committed to the journal. - - commit=nrsec (*) - Ext4 can be told to sync all its data and metadata every 'nrsec' - seconds. The default value is 5 seconds. This means that if you lose - your power, you will lose as much as the latest 5 seconds of work (your - filesystem will not be damaged though, thanks to the journaling). This - default value (or any low value) will hurt performance, but it's good - for data-safety. Setting it to 0 will have the same effect as leaving - it at the default (5 seconds). Setting it to very large values will - improve performance. - - barrier=<0|1(*)>, barrier(*), nobarrier - This enables/disables the use of write barriers in the jbd code. - barrier=0 disables, barrier=1 enables. This also requires an IO stack - which can support barriers, and if jbd gets an error on a barrier - write, it will disable again with a warning. Write barriers enforce - proper on-disk ordering of journal commits, making volatile disk write - caches safe to use, at some performance penalty. If your disks are - battery-backed in one way or another, disabling barriers may safely - improve performance. The mount options "barrier" and "nobarrier" can - also be used to enable or disable barriers, for consistency with other - ext4 mount options. - - inode_readahead_blks=n - This tuning parameter controls the maximum number of inode table blocks - that ext4's inode table readahead algorithm will pre-read into the - buffer cache. The default value is 32 blocks. - - nouser_xattr - Disables Extended User Attributes. See the attr(5) manual page for - more information about extended attributes. - - noacl - This option disables POSIX Access Control List support. If ACL support - is enabled in the kernel configuration (CONFIG_EXT4_FS_POSIX_ACL), ACL - is enabled by default on mount. See the acl(5) manual page for more - information about acl. - - bsddf (*) - Make 'df' act like BSD. - - minixdf - Make 'df' act like Minix. - - debug - Extra debugging information is sent to syslog. - - abort - Simulate the effects of calling ext4_abort() for debugging purposes. - This is normally used while remounting a filesystem which is already - mounted. - - errors=remount-ro - Remount the filesystem read-only on an error. - - errors=continue - Keep going on a filesystem error. - - errors=panic - Panic and halt the machine if an error occurs. (These mount options - override the errors behavior specified in the superblock, which can be - configured using tune2fs) - - data_err=ignore(*) - Just print an error message if an error occurs in a file data buffer in - ordered mode. - data_err=abort - Abort the journal if an error occurs in a file data buffer in ordered - mode. - - grpid | bsdgroups - New objects have the group ID of their parent. - - nogrpid (*) | sysvgroups - New objects have the group ID of their creator. - - resgid=n - The group ID which may use the reserved blocks. - - resuid=n - The user ID which may use the reserved blocks. - - sb= - Use alternate superblock at this location. - - quota, noquota, grpquota, usrquota - These options are ignored by the filesystem. They are used only by - quota tools to recognize volumes where quota should be turned on. See - documentation in the quota-tools package for more details - (http://sourceforge.net/projects/linuxquota). - - jqfmt=, usrjquota=, grpjquota= - These options tell filesystem details about quota so that quota - information can be properly updated during journal replay. They replace - the above quota options. See documentation in the quota-tools package - for more details (http://sourceforge.net/projects/linuxquota). - - stripe=n - Number of filesystem blocks that mballoc will try to use for allocation - size and alignment. For RAID5/6 systems this should be the number of - data disks * RAID chunk size in file system blocks. - - delalloc (*) - Defer block allocation until just before ext4 writes out the block(s) - in question. This allows ext4 to better allocation decisions more - efficiently. - - nodelalloc - Disable delayed allocation. Blocks are allocated when the data is - copied from userspace to the page cache, either via the write(2) system - call or when an mmap'ed page which was previously unallocated is - written for the first time. - - max_batch_time=usec - Maximum amount of time ext4 should wait for additional filesystem - operations to be batch together with a synchronous write operation. - Since a synchronous write operation is going to force a commit and then - a wait for the I/O complete, it doesn't cost much, and can be a huge - throughput win, we wait for a small amount of time to see if any other - transactions can piggyback on the synchronous write. The algorithm - used is designed to automatically tune for the speed of the disk, by - measuring the amount of time (on average) that it takes to finish - committing a transaction. Call this time the "commit time". If the - time that the transaction has been running is less than the commit - time, ext4 will try sleeping for the commit time to see if other - operations will join the transaction. The commit time is capped by - the max_batch_time, which defaults to 15000us (15ms). This - optimization can be turned off entirely by setting max_batch_time to 0. - - min_batch_time=usec - This parameter sets the commit time (as described above) to be at least - min_batch_time. It defaults to zero microseconds. Increasing this - parameter may improve the throughput of multi-threaded, synchronous - workloads on very fast disks, at the cost of increasing latency. - - journal_ioprio=prio - The I/O priority (from 0 to 7, where 0 is the highest priority) which - should be used for I/O operations submitted by kjournald2 during a - commit operation. This defaults to 3, which is a slightly higher - priority than the default I/O priority. - - auto_da_alloc(*), noauto_da_alloc - Many broken applications don't use fsync() when replacing existing - files via patterns such as fd = open("foo.new")/write(fd,..)/close(fd)/ - rename("foo.new", "foo"), or worse yet, fd = open("foo", - O_TRUNC)/write(fd,..)/close(fd). If auto_da_alloc is enabled, ext4 - will detect the replace-via-rename and replace-via-truncate patterns - and force that any delayed allocation blocks are allocated such that at - the next journal commit, in the default data=ordered mode, the data - blocks of the new file are forced to disk before the rename() operation - is committed. This provides roughly the same level of guarantees as - ext3, and avoids the "zero-length" problem that can happen when a - system crashes before the delayed allocation blocks are forced to disk. - - noinit_itable - Do not initialize any uninitialized inode table blocks in the - background. This feature may be used by installation CD's so that the - install process can complete as quickly as possible; the inode table - initialization process would then be deferred until the next time the - file system is unmounted. - - init_itable=n - The lazy itable init code will wait n times the number of milliseconds - it took to zero out the previous block group's inode table. This - minimizes the impact on the system performance while file system's - inode table is being initialized. - - discard, nodiscard(*) - Controls whether ext4 should issue discard/TRIM commands to the - underlying block device when blocks are freed. This is useful for SSD - devices and sparse/thinly-provisioned LUNs, but it is off by default - until sufficient testing has been done. - - nouid32 - Disables 32-bit UIDs and GIDs. This is for interoperability with - older kernels which only store and expect 16-bit values. - - block_validity(*), noblock_validity - These options enable or disable the in-kernel facility for tracking - filesystem metadata blocks within internal data structures. This - allows multi- block allocator and other routines to notice bugs or - corrupted allocation bitmaps which cause blocks to be allocated which - overlap with filesystem metadata blocks. - - dioread_lock, dioread_nolock - Controls whether or not ext4 should use the DIO read locking. If the - dioread_nolock option is specified ext4 will allocate uninitialized - extent before buffer write and convert the extent to initialized after - IO completes. This approach allows ext4 code to avoid using inode - mutex, which improves scalability on high speed storages. However this - does not work with data journaling and dioread_nolock option will be - ignored with kernel warning. Note that dioread_nolock code path is only - used for extent-based files. Because of the restrictions this options - comprises it is off by default (e.g. dioread_lock). - - max_dir_size_kb=n - This limits the size of directories so that any attempt to expand them - beyond the specified limit in kilobytes will cause an ENOSPC error. - This is useful in memory constrained environments, where a very large - directory can cause severe performance problems or even provoke the Out - Of Memory killer. (For example, if there is only 512mb memory - available, a 176mb directory may seriously cramp the system's style.) - - i_version - Enable 64-bit inode version support. This option is off by default. - - dax - Use direct access (no page cache). See - Documentation/filesystems/dax.txt. Note that this option is - incompatible with data=journal. - -Data Mode -========= -There are 3 different data modes: - -* writeback mode - - In data=writeback mode, ext4 does not journal data at all. This mode provides - a similar level of journaling as that of XFS, JFS, and ReiserFS in its default - mode - metadata journaling. A crash+recovery can cause incorrect data to - appear in files which were written shortly before the crash. This mode will - typically provide the best ext4 performance. - -* ordered mode - - In data=ordered mode, ext4 only officially journals metadata, but it logically - groups metadata information related to data changes with the data blocks into - a single unit called a transaction. When it's time to write the new metadata - out to disk, the associated data blocks are written first. In general, this - mode performs slightly slower than writeback but significantly faster than - journal mode. - -* journal mode - - data=journal mode provides full data and metadata journaling. All new data is - written to the journal first, and then to its final location. In the event of - a crash, the journal can be replayed, bringing both data and metadata into a - consistent state. This mode is the slowest except when data needs to be read - from and written to disk at the same time where it outperforms all others - modes. Enabling this mode will disable delayed allocation and O_DIRECT - support. - -/proc entries -============= - -Information about mounted ext4 file systems can be found in -/proc/fs/ext4. Each mounted filesystem will have a directory in -/proc/fs/ext4 based on its device name (i.e., /proc/fs/ext4/hdc or -/proc/fs/ext4/dm-0). The files in each per-device directory are shown -in table below. - -Files in /proc/fs/ext4/ - - mb_groups - details of multiblock allocator buddy cache of free blocks - -/sys entries -============ - -Information about mounted ext4 file systems can be found in -/sys/fs/ext4. Each mounted filesystem will have a directory in -/sys/fs/ext4 based on its device name (i.e., /sys/fs/ext4/hdc or -/sys/fs/ext4/dm-0). The files in each per-device directory are shown -in table below. - -Files in /sys/fs/ext4/: - -(see also Documentation/ABI/testing/sysfs-fs-ext4) - - delayed_allocation_blocks - This file is read-only and shows the number of blocks that are dirty in - the page cache, but which do not have their location in the filesystem - allocated yet. - - inode_goal - Tuning parameter which (if non-zero) controls the goal inode used by - the inode allocator in preference to all other allocation heuristics. - This is intended for debugging use only, and should be 0 on production - systems. - - inode_readahead_blks - Tuning parameter which controls the maximum number of inode table - blocks that ext4's inode table readahead algorithm will pre-read into - the buffer cache. - - lifetime_write_kbytes - This file is read-only and shows the number of kilobytes of data that - have been written to this filesystem since it was created. - - max_writeback_mb_bump - The maximum number of megabytes the writeback code will try to write - out before move on to another inode. - - mb_group_prealloc - The multiblock allocator will round up allocation requests to a - multiple of this tuning parameter if the stripe size is not set in the - ext4 superblock - - mb_max_to_scan - The maximum number of extents the multiblock allocator will search to - find the best extent. - - mb_min_to_scan - The minimum number of extents the multiblock allocator will search to - find the best extent. - - mb_order2_req - Tuning parameter which controls the minimum size for requests (as a - power of 2) where the buddy cache is used. - - mb_stats - Controls whether the multiblock allocator should collect statistics, - which are shown during the unmount. 1 means to collect statistics, 0 - means not to collect statistics. - - mb_stream_req - Files which have fewer blocks than this tunable parameter will have - their blocks allocated out of a block group specific preallocation - pool, so that small files are packed closely together. Each large file - will have its blocks allocated out of its own unique preallocation - pool. - - session_write_kbytes - This file is read-only and shows the number of kilobytes of data that - have been written to this filesystem since it was mounted. - - reserved_clusters - This is RW file and contains number of reserved clusters in the file - system which will be used in the specific situations to avoid costly - zeroout, unexpected ENOSPC, or possible data loss. The default is 2% or - 4096 clusters, whichever is smaller and this can be changed however it - can never exceed number of clusters in the file system. If there is not - enough space for the reserved space when mounting the file mount will - _not_ fail. - -Ioctls -====== - -There is some Ext4 specific functionality which can be accessed by applications -through the system call interfaces. The list of all Ext4 specific ioctls are -shown in the table below. - -Table of Ext4 specific ioctls - - EXT4_IOC_GETFLAGS - Get additional attributes associated with inode. The ioctl argument is - an integer bitfield, with bit values described in ext4.h. This ioctl is - an alias for FS_IOC_GETFLAGS. - - EXT4_IOC_SETFLAGS - Set additional attributes associated with inode. The ioctl argument is - an integer bitfield, with bit values described in ext4.h. This ioctl is - an alias for FS_IOC_SETFLAGS. - - EXT4_IOC_GETVERSION, EXT4_IOC_GETVERSION_OLD - Get the inode i_generation number stored for each inode. The - i_generation number is normally changed only when new inode is created - and it is particularly useful for network filesystems. The '_OLD' - version of this ioctl is an alias for FS_IOC_GETVERSION. - - EXT4_IOC_SETVERSION, EXT4_IOC_SETVERSION_OLD - Set the inode i_generation number stored for each inode. The '_OLD' - version of this ioctl is an alias for FS_IOC_SETVERSION. - - EXT4_IOC_GROUP_EXTEND - This ioctl has the same purpose as the resize mount option. It allows - to resize filesystem to the end of the last existing block group, - further resize has to be done with resize2fs, either online, or - offline. The argument points to the unsigned logn number representing - the filesystem new block count. - - EXT4_IOC_MOVE_EXT - Move the block extents from orig_fd (the one this ioctl is pointing to) - to the donor_fd (the one specified in move_extent structure passed as - an argument to this ioctl). Then, exchange inode metadata between - orig_fd and donor_fd. This is especially useful for online - defragmentation, because the allocator has the opportunity to allocate - moved blocks better, ideally into one contiguous extent. - - EXT4_IOC_GROUP_ADD - Add a new group descriptor to an existing or new group descriptor - block. The new group descriptor is described by ext4_new_group_input - structure, which is passed as an argument to this ioctl. This is - especially useful in conjunction with EXT4_IOC_GROUP_EXTEND, which - allows online resize of the filesystem to the end of the last existing - block group. Those two ioctls combined is used in userspace online - resize tool (e.g. resize2fs). - - EXT4_IOC_MIGRATE - This ioctl operates on the filesystem itself. It converts (migrates) - ext3 indirect block mapped inode to ext4 extent mapped inode by walking - through indirect block mapping of the original inode and converting - contiguous block ranges into ext4 extents of the temporary inode. Then, - inodes are swapped. This ioctl might help, when migrating from ext3 to - ext4 filesystem, however suggestion is to create fresh ext4 filesystem - and copy data from the backup. Note, that filesystem has to support - extents for this ioctl to work. - - EXT4_IOC_ALLOC_DA_BLKS - Force all of the delay allocated blocks to be allocated to preserve - application-expected ext3 behaviour. Note that this will also start - triggering a write of the data blocks, but this behaviour may change in - the future as it is not necessary and has been done this way only for - sake of simplicity. - - EXT4_IOC_RESIZE_FS - Resize the filesystem to a new size. The number of blocks of resized - filesystem is passed in via 64 bit integer argument. The kernel - allocates bitmaps and inode table, the userspace tool thus just passes - the new number of blocks. - - EXT4_IOC_SWAP_BOOT - Swap i_blocks and associated attributes (like i_blocks, i_size, - i_flags, ...) from the specified inode with inode EXT4_BOOT_LOADER_INO - (#5). This is typically used to store a boot loader in a secure part of - the filesystem, where it can't be changed by a normal user by accident. - The data blocks of the previous boot loader will be associated with the - given inode. - -References -========== - -kernel source: - - -programs: http://e2fsprogs.sourceforge.net/ - -useful links: http://fedoraproject.org/wiki/ext3-devel - http://www.bullopensource.org/ext4/ - http://ext4.wiki.kernel.org/index.php/Main_Page - http://fedoraproject.org/wiki/Features/Ext4 diff --git a/Documentation/filesystems/ext4/index.rst b/Documentation/filesystems/ext4/index.rst index 71121605558c..427bc115012e 100644 --- a/Documentation/filesystems/ext4/index.rst +++ b/Documentation/filesystems/ext4/index.rst @@ -13,5 +13,4 @@ the ext4 community. :maxdepth: 5 :numbered: - ext4 ondisk/index -- cgit v1.2.3-59-g8ed1b From 6bf53999a3a269d5d27bb636602f6788e1bb4dd0 Mon Sep 17 00:00:00 2001 From: Mike Rapoport Date: Fri, 5 Oct 2018 01:11:00 +0300 Subject: docs: move memory hotplug description into admin-guide/mm The memory hotplug description in Documentation/memory-hotplug.txt is already formatted as ReST and can be easily added to admin-guide/mm section. While on it, slightly update formatting to make it consistent with the doc-guide. Signed-off-by: Mike Rapoport Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/mm/index.rst | 1 + Documentation/admin-guide/mm/memory-hotplug.rst | 508 ++++++++++++++++++++++++ Documentation/memory-hotplug.txt | 507 ----------------------- 3 files changed, 509 insertions(+), 507 deletions(-) create mode 100644 Documentation/admin-guide/mm/memory-hotplug.rst delete mode 100644 Documentation/memory-hotplug.txt (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/mm/index.rst b/Documentation/admin-guide/mm/index.rst index ceead68c2df7..8edb35f11317 100644 --- a/Documentation/admin-guide/mm/index.rst +++ b/Documentation/admin-guide/mm/index.rst @@ -29,6 +29,7 @@ the Linux memory management. hugetlbpage idle_page_tracking ksm + memory-hotplug numa_memory_policy pagemap soft-dirty diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst new file mode 100644 index 000000000000..a33090c179d6 --- /dev/null +++ b/Documentation/admin-guide/mm/memory-hotplug.rst @@ -0,0 +1,508 @@ +.. _admin_guide_memory_hotplug: + +============== +Memory Hotplug +============== + +:Created: Jul 28 2007 +:Updated: Add description of notifier of memory hotplug: Oct 11 2007 + +This document is about memory hotplug including how-to-use and current status. +Because Memory Hotplug is still under development, contents of this text will +be changed often. + +.. contents:: :local: + +.. CONTENTS + + 1. Introduction + 1.1 Purpose of memory hotplug + 1.2. Phases of memory hotplug + 1.3. Unit of Memory online/offline operation + 2. Kernel Configuration + 3. sysfs files for memory hotplug + 4. Physical memory hot-add phase + 4.1 Hardware(Firmware) Support + 4.2 Notify memory hot-add event by hand + 5. Logical Memory hot-add phase + 5.1. State of memory + 5.2. How to online memory + 6. Logical memory remove + 6.1 Memory offline and ZONE_MOVABLE + 6.2. How to offline memory + 7. Physical memory remove + 8. Memory hotplug event notifier + 9. Future Work List + + +.. note:: + + (1) x86_64's has special implementation for memory hotplug. + This text does not describe it. + (2) This text assumes that sysfs is mounted at ``/sys``. + + +Introduction +============ + +Purpose of memory hotplug +------------------------- + +Memory Hotplug allows users to increase/decrease the amount of memory. +Generally, there are two purposes. + +(A) For changing the amount of memory. + This is to allow a feature like capacity on demand. +(B) For installing/removing DIMMs or NUMA-nodes physically. + This is to exchange DIMMs/NUMA-nodes, reduce power consumption, etc. + +(A) is required by highly virtualized environments and (B) is required by +hardware which supports memory power management. + +Linux memory hotplug is designed for both purpose. + +Phases of memory hotplug +------------------------ + +There are 2 phases in Memory Hotplug: + + 1) Physical Memory Hotplug phase + 2) Logical Memory Hotplug phase. + +The First phase is to communicate hardware/firmware and make/erase +environment for hotplugged memory. Basically, this phase is necessary +for the purpose (B), but this is good phase for communication between +highly virtualized environments too. + +When memory is hotplugged, the kernel recognizes new memory, makes new memory +management tables, and makes sysfs files for new memory's operation. + +If firmware supports notification of connection of new memory to OS, +this phase is triggered automatically. ACPI can notify this event. If not, +"probe" operation by system administration is used instead. +(see :ref:`memory_hotplug_physical_mem`). + +Logical Memory Hotplug phase is to change memory state into +available/unavailable for users. Amount of memory from user's view is +changed by this phase. The kernel makes all memory in it as free pages +when a memory range is available. + +In this document, this phase is described as online/offline. + +Logical Memory Hotplug phase is triggered by write of sysfs file by system +administrator. For the hot-add case, it must be executed after Physical Hotplug +phase by hand. +(However, if you writes udev's hotplug scripts for memory hotplug, these +phases can be execute in seamless way.) + +Unit of Memory online/offline operation +--------------------------------------- + +Memory hotplug uses SPARSEMEM memory model which allows memory to be divided +into chunks of the same size. These chunks are called "sections". The size of +a memory section is architecture dependent. For example, power uses 16MiB, ia64 +uses 1GiB. + +Memory sections are combined into chunks referred to as "memory blocks". The +size of a memory block is architecture dependent and represents the logical +unit upon which memory online/offline operations are to be performed. The +default size of a memory block is the same as memory section size unless an +architecture specifies otherwise. (see :ref:`memory_hotplug_sysfs_files`.) + +To determine the size (in bytes) of a memory block please read this file:: + + /sys/devices/system/memory/block_size_bytes + +Kernel Configuration +==================== + +To use memory hotplug feature, kernel must be compiled with following +config options. + +- For all memory hotplug: + - Memory model -> Sparse Memory (``CONFIG_SPARSEMEM``) + - Allow for memory hot-add (``CONFIG_MEMORY_HOTPLUG``) + +- To enable memory removal, the following are also necessary: + - Allow for memory hot remove (``CONFIG_MEMORY_HOTREMOVE``) + - Page Migration (``CONFIG_MIGRATION``) + +- For ACPI memory hotplug, the following are also necessary: + - Memory hotplug (under ACPI Support menu) (``CONFIG_ACPI_HOTPLUG_MEMORY``) + - This option can be kernel module. + +- As a related configuration, if your box has a feature of NUMA-node hotplug + via ACPI, then this option is necessary too. + + - ACPI0004,PNP0A05 and PNP0A06 Container Driver (under ACPI Support menu) + (``CONFIG_ACPI_CONTAINER``). + + This option can be kernel module too. + + +.. _memory_hotplug_sysfs_files: + +sysfs files for memory hotplug +============================== + +All memory blocks have their device information in sysfs. Each memory block +is described under ``/sys/devices/system/memory`` as:: + + /sys/devices/system/memory/memoryXXX + +where XXX is the memory block id. + +For the memory block covered by the sysfs directory. It is expected that all +memory sections in this range are present and no memory holes exist in the +range. Currently there is no way to determine if there is a memory hole, but +the existence of one should not affect the hotplug capabilities of the memory +block. + +For example, assume 1GiB memory block size. A device for a memory starting at +0x100000000 is ``/sys/device/system/memory/memory4``:: + + (0x100000000 / 1Gib = 4) + +This device covers address range [0x100000000 ... 0x140000000) + +Under each memory block, you can see 5 files: + +- ``/sys/devices/system/memory/memoryXXX/phys_index`` +- ``/sys/devices/system/memory/memoryXXX/phys_device`` +- ``/sys/devices/system/memory/memoryXXX/state`` +- ``/sys/devices/system/memory/memoryXXX/removable`` +- ``/sys/devices/system/memory/memoryXXX/valid_zones`` + +=================== ============================================================ +``phys_index`` read-only and contains memory block id, same as XXX. +``state`` read-write + + - at read: contains online/offline state of memory. + - at write: user can specify "online_kernel", + + "online_movable", "online", "offline" command + which will be performed on all sections in the block. +``phys_device`` read-only: designed to show the name of physical memory + device. This is not well implemented now. +``removable`` read-only: contains an integer value indicating + whether the memory block is removable or not + removable. A value of 1 indicates that the memory + block is removable and a value of 0 indicates that + it is not removable. A memory block is removable only if + every section in the block is removable. +``valid_zones`` read-only: designed to show which zones this memory block + can be onlined to. + + The first column shows it`s default zone. + + "memory6/valid_zones: Normal Movable" shows this memoryblock + can be onlined to ZONE_NORMAL by default and to ZONE_MOVABLE + by online_movable. + + "memory7/valid_zones: Movable Normal" shows this memoryblock + can be onlined to ZONE_MOVABLE by default and to ZONE_NORMAL + by online_kernel. +=================== ============================================================ + +.. note:: + + These directories/files appear after physical memory hotplug phase. + +If CONFIG_NUMA is enabled the memoryXXX/ directories can also be accessed +via symbolic links located in the ``/sys/devices/system/node/node*`` directories. + +For example:: + + /sys/devices/system/node/node0/memory9 -> ../../memory/memory9 + +A backlink will also be created:: + + /sys/devices/system/memory/memory9/node0 -> ../../node/node0 + +.. _memory_hotplug_physical_mem: + +Physical memory hot-add phase +============================= + +Hardware(Firmware) Support +-------------------------- + +On x86_64/ia64 platform, memory hotplug by ACPI is supported. + +In general, the firmware (ACPI) which supports memory hotplug defines +memory class object of _HID "PNP0C80". When a notify is asserted to PNP0C80, +Linux's ACPI handler does hot-add memory to the system and calls a hotplug udev +script. This will be done automatically. + +But scripts for memory hotplug are not contained in generic udev package(now). +You may have to write it by yourself or online/offline memory by hand. +Please see :ref:`memory_hotplug_how_to_online_memory` and +:ref:`memory_hotplug_how_to_offline_memory`. + +If firmware supports NUMA-node hotplug, and defines an object _HID "ACPI0004", +"PNP0A05", or "PNP0A06", notification is asserted to it, and ACPI handler +calls hotplug code for all of objects which are defined in it. +If memory device is found, memory hotplug code will be called. + +Notify memory hot-add event by hand +----------------------------------- + +On some architectures, the firmware may not notify the kernel of a memory +hotplug event. Therefore, the memory "probe" interface is supported to +explicitly notify the kernel. This interface depends on +CONFIG_ARCH_MEMORY_PROBE and can be configured on powerpc, sh, and x86 +if hotplug is supported, although for x86 this should be handled by ACPI +notification. + +Probe interface is located at:: + + /sys/devices/system/memory/probe + +You can tell the physical address of new memory to the kernel by:: + + % echo start_address_of_new_memory > /sys/devices/system/memory/probe + +Then, [start_address_of_new_memory, start_address_of_new_memory + +memory_block_size] memory range is hot-added. In this case, hotplug script is +not called (in current implementation). You'll have to online memory by +yourself. Please see :ref:`memory_hotplug_how_to_online_memory`. + +Logical Memory hot-add phase +============================ + +State of memory +--------------- + +To see (online/offline) state of a memory block, read 'state' file:: + + % cat /sys/device/system/memory/memoryXXX/state + + +- If the memory block is online, you'll read "online". +- If the memory block is offline, you'll read "offline". + + +.. _memory_hotplug_how_to_online_memory: + +How to online memory +-------------------- + +When the memory is hot-added, the kernel decides whether or not to "online" +it according to the policy which can be read from "auto_online_blocks" file:: + + % cat /sys/devices/system/memory/auto_online_blocks + +The default depends on the CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE kernel config +option. If it is disabled the default is "offline" which means the newly added +memory is not in a ready-to-use state and you have to "online" the newly added +memory blocks manually. Automatic onlining can be requested by writing "online" +to "auto_online_blocks" file:: + + % echo online > /sys/devices/system/memory/auto_online_blocks + +This sets a global policy and impacts all memory blocks that will subsequently +be hotplugged. Currently offline blocks keep their state. It is possible, under +certain circumstances, that some memory blocks will be added but will fail to +online. User space tools can check their "state" files +(``/sys/devices/system/memory/memoryXXX/state``) and try to online them manually. + +If the automatic onlining wasn't requested, failed, or some memory block was +offlined it is possible to change the individual block's state by writing to the +"state" file:: + + % echo online > /sys/devices/system/memory/memoryXXX/state + +This onlining will not change the ZONE type of the target memory block, +If the memory block doesn't belong to any zone an appropriate kernel zone +(usually ZONE_NORMAL) will be used unless movable_node kernel command line +option is specified when ZONE_MOVABLE will be used. + +You can explicitly request to associate it with ZONE_MOVABLE by:: + + % echo online_movable > /sys/devices/system/memory/memoryXXX/state + +.. note:: current limit: this memory block must be adjacent to ZONE_MOVABLE + +Or you can explicitly request a kernel zone (usually ZONE_NORMAL) by:: + + % echo online_kernel > /sys/devices/system/memory/memoryXXX/state + +.. note:: current limit: this memory block must be adjacent to ZONE_NORMAL + +An explicit zone onlining can fail (e.g. when the range is already within +and existing and incompatible zone already). + +After this, memory block XXX's state will be 'online' and the amount of +available memory will be increased. + +This may be changed in future. + +Logical memory remove +===================== + +Memory offline and ZONE_MOVABLE +------------------------------- + +Memory offlining is more complicated than memory online. Because memory offline +has to make the whole memory block be unused, memory offline can fail if +the memory block includes memory which cannot be freed. + +In general, memory offline can use 2 techniques. + +(1) reclaim and free all memory in the memory block. +(2) migrate all pages in the memory block. + +In the current implementation, Linux's memory offline uses method (2), freeing +all pages in the memory block by page migration. But not all pages are +migratable. Under current Linux, migratable pages are anonymous pages and +page caches. For offlining a memory block by migration, the kernel has to +guarantee that the memory block contains only migratable pages. + +Now, a boot option for making a memory block which consists of migratable pages +is supported. By specifying "kernelcore=" or "movablecore=" boot option, you can +create ZONE_MOVABLE...a zone which is just used for movable pages. +(See also Documentation/admin-guide/kernel-parameters.rst) + +Assume the system has "TOTAL" amount of memory at boot time, this boot option +creates ZONE_MOVABLE as following. + +1) When kernelcore=YYYY boot option is used, + Size of memory not for movable pages (not for offline) is YYYY. + Size of memory for movable pages (for offline) is TOTAL-YYYY. + +2) When movablecore=ZZZZ boot option is used, + Size of memory not for movable pages (not for offline) is TOTAL - ZZZZ. + Size of memory for movable pages (for offline) is ZZZZ. + +.. note:: + + Unfortunately, there is no information to show which memory block belongs + to ZONE_MOVABLE. This is TBD. + +.. _memory_hotplug_how_to_offline_memory: + +How to offline memory +--------------------- + +You can offline a memory block by using the same sysfs interface that was used +in memory onlining:: + + % echo offline > /sys/devices/system/memory/memoryXXX/state + +If offline succeeds, the state of the memory block is changed to be "offline". +If it fails, some error core (like -EBUSY) will be returned by the kernel. +Even if a memory block does not belong to ZONE_MOVABLE, you can try to offline +it. If it doesn't contain 'unmovable' memory, you'll get success. + +A memory block under ZONE_MOVABLE is considered to be able to be offlined +easily. But under some busy state, it may return -EBUSY. Even if a memory +block cannot be offlined due to -EBUSY, you can retry offlining it and may be +able to offline it (or not). (For example, a page is referred to by some kernel +internal call and released soon.) + +Consideration: + Memory hotplug's design direction is to make the possibility of memory + offlining higher and to guarantee unplugging memory under any situation. But + it needs more work. Returning -EBUSY under some situation may be good because + the user can decide to retry more or not by himself. Currently, memory + offlining code does some amount of retry with 120 seconds timeout. + +Physical memory remove +====================== + +Need more implementation yet.... + - Notification completion of remove works by OS to firmware. + - Guard from remove if not yet. + +Memory hotplug event notifier +============================= + +Hotplugging events are sent to a notification queue. + +There are six types of notification defined in ``include/linux/memory.h``: + +MEM_GOING_ONLINE + Generated before new memory becomes available in order to be able to + prepare subsystems to handle memory. The page allocator is still unable + to allocate from the new memory. + +MEM_CANCEL_ONLINE + Generated if MEMORY_GOING_ONLINE fails. + +MEM_ONLINE + Generated when memory has successfully brought online. The callback may + allocate pages from the new memory. + +MEM_GOING_OFFLINE + Generated to begin the process of offlining memory. Allocations are no + longer possible from the memory but some of the memory to be offlined + is still in use. The callback can be used to free memory known to a + subsystem from the indicated memory block. + +MEM_CANCEL_OFFLINE + Generated if MEMORY_GOING_OFFLINE fails. Memory is available again from + the memory block that we attempted to offline. + +MEM_OFFLINE + Generated after offlining memory is complete. + +A callback routine can be registered by calling:: + + hotplug_memory_notifier(callback_func, priority) + +Callback functions with higher values of priority are called before callback +functions with lower values. + +A callback function must have the following prototype:: + + int callback_func( + struct notifier_block *self, unsigned long action, void *arg); + +The first argument of the callback function (self) is a pointer to the block +of the notifier chain that points to the callback function itself. +The second argument (action) is one of the event types described above. +The third argument (arg) passes a pointer of struct memory_notify:: + + struct memory_notify { + unsigned long start_pfn; + unsigned long nr_pages; + int status_change_nid_normal; + int status_change_nid_high; + int status_change_nid; + } + +- start_pfn is start_pfn of online/offline memory. +- nr_pages is # of pages of online/offline memory. +- status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask + is (will be) set/clear, if this is -1, then nodemask status is not changed. +- status_change_nid_high is set node id when N_HIGH_MEMORY of nodemask + is (will be) set/clear, if this is -1, then nodemask status is not changed. +- status_change_nid is set node id when N_MEMORY of nodemask is (will be) + set/clear. It means a new(memoryless) node gets new memory by online and a + node loses all memory. If this is -1, then nodemask status is not changed. + + If status_changed_nid* >= 0, callback should create/discard structures for the + node if necessary. + +The callback routine shall return one of the values +NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP +defined in ``include/linux/notifier.h`` + +NOTIFY_DONE and NOTIFY_OK have no effect on the further processing. + +NOTIFY_BAD is used as response to the MEM_GOING_ONLINE, MEM_GOING_OFFLINE, +MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. It stops +further processing of the notification queue. + +NOTIFY_STOP stops further processing of the notification queue. + +Future Work +=========== + + - allowing memory hot-add to ZONE_MOVABLE. maybe we need some switch like + sysctl or new control file. + - showing memory block and physical device relationship. + - test and make it better memory offlining. + - support HugeTLB page migration and offlining. + - memmap removing at memory offline. + - physical remove memory. diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt deleted file mode 100644 index 7f49ebf3ddb2..000000000000 --- a/Documentation/memory-hotplug.txt +++ /dev/null @@ -1,507 +0,0 @@ -============== -Memory Hotplug -============== - -:Created: Jul 28 2007 -:Updated: Add description of notifier of memory hotplug: Oct 11 2007 - -This document is about memory hotplug including how-to-use and current status. -Because Memory Hotplug is still under development, contents of this text will -be changed often. - -.. CONTENTS - - 1. Introduction - 1.1 purpose of memory hotplug - 1.2. Phases of memory hotplug - 1.3. Unit of Memory online/offline operation - 2. Kernel Configuration - 3. sysfs files for memory hotplug - 4. Physical memory hot-add phase - 4.1 Hardware(Firmware) Support - 4.2 Notify memory hot-add event by hand - 5. Logical Memory hot-add phase - 5.1. State of memory - 5.2. How to online memory - 6. Logical memory remove - 6.1 Memory offline and ZONE_MOVABLE - 6.2. How to offline memory - 7. Physical memory remove - 8. Memory hotplug event notifier - 9. Future Work List - - -.. note:: - - (1) x86_64's has special implementation for memory hotplug. - This text does not describe it. - (2) This text assumes that sysfs is mounted at /sys. - - -Introduction -============ - -purpose of memory hotplug -------------------------- - -Memory Hotplug allows users to increase/decrease the amount of memory. -Generally, there are two purposes. - -(A) For changing the amount of memory. - This is to allow a feature like capacity on demand. -(B) For installing/removing DIMMs or NUMA-nodes physically. - This is to exchange DIMMs/NUMA-nodes, reduce power consumption, etc. - -(A) is required by highly virtualized environments and (B) is required by -hardware which supports memory power management. - -Linux memory hotplug is designed for both purpose. - - -Phases of memory hotplug ------------------------- - -There are 2 phases in Memory Hotplug: - - 1) Physical Memory Hotplug phase - 2) Logical Memory Hotplug phase. - -The First phase is to communicate hardware/firmware and make/erase -environment for hotplugged memory. Basically, this phase is necessary -for the purpose (B), but this is good phase for communication between -highly virtualized environments too. - -When memory is hotplugged, the kernel recognizes new memory, makes new memory -management tables, and makes sysfs files for new memory's operation. - -If firmware supports notification of connection of new memory to OS, -this phase is triggered automatically. ACPI can notify this event. If not, -"probe" operation by system administration is used instead. -(see :ref:`memory_hotplug_physical_mem`). - -Logical Memory Hotplug phase is to change memory state into -available/unavailable for users. Amount of memory from user's view is -changed by this phase. The kernel makes all memory in it as free pages -when a memory range is available. - -In this document, this phase is described as online/offline. - -Logical Memory Hotplug phase is triggered by write of sysfs file by system -administrator. For the hot-add case, it must be executed after Physical Hotplug -phase by hand. -(However, if you writes udev's hotplug scripts for memory hotplug, these -phases can be execute in seamless way.) - - -Unit of Memory online/offline operation ---------------------------------------- - -Memory hotplug uses SPARSEMEM memory model which allows memory to be divided -into chunks of the same size. These chunks are called "sections". The size of -a memory section is architecture dependent. For example, power uses 16MiB, ia64 -uses 1GiB. - -Memory sections are combined into chunks referred to as "memory blocks". The -size of a memory block is architecture dependent and represents the logical -unit upon which memory online/offline operations are to be performed. The -default size of a memory block is the same as memory section size unless an -architecture specifies otherwise. (see :ref:`memory_hotplug_sysfs_files`.) - -To determine the size (in bytes) of a memory block please read this file: - -/sys/devices/system/memory/block_size_bytes - - -Kernel Configuration -==================== - -To use memory hotplug feature, kernel must be compiled with following -config options. - -- For all memory hotplug: - - Memory model -> Sparse Memory (CONFIG_SPARSEMEM) - - Allow for memory hot-add (CONFIG_MEMORY_HOTPLUG) - -- To enable memory removal, the following are also necessary: - - Allow for memory hot remove (CONFIG_MEMORY_HOTREMOVE) - - Page Migration (CONFIG_MIGRATION) - -- For ACPI memory hotplug, the following are also necessary: - - Memory hotplug (under ACPI Support menu) (CONFIG_ACPI_HOTPLUG_MEMORY) - - This option can be kernel module. - -- As a related configuration, if your box has a feature of NUMA-node hotplug - via ACPI, then this option is necessary too. - - - ACPI0004,PNP0A05 and PNP0A06 Container Driver (under ACPI Support menu) - (CONFIG_ACPI_CONTAINER). - - This option can be kernel module too. - - -.. _memory_hotplug_sysfs_files: - -sysfs files for memory hotplug -============================== - -All memory blocks have their device information in sysfs. Each memory block -is described under /sys/devices/system/memory as: - - /sys/devices/system/memory/memoryXXX - (XXX is the memory block id.) - -For the memory block covered by the sysfs directory. It is expected that all -memory sections in this range are present and no memory holes exist in the -range. Currently there is no way to determine if there is a memory hole, but -the existence of one should not affect the hotplug capabilities of the memory -block. - -For example, assume 1GiB memory block size. A device for a memory starting at -0x100000000 is /sys/device/system/memory/memory4:: - - (0x100000000 / 1Gib = 4) - -This device covers address range [0x100000000 ... 0x140000000) - -Under each memory block, you can see 5 files: - -- /sys/devices/system/memory/memoryXXX/phys_index -- /sys/devices/system/memory/memoryXXX/phys_device -- /sys/devices/system/memory/memoryXXX/state -- /sys/devices/system/memory/memoryXXX/removable -- /sys/devices/system/memory/memoryXXX/valid_zones - -=================== ============================================================ -``phys_index`` read-only and contains memory block id, same as XXX. -``state`` read-write - - - at read: contains online/offline state of memory. - - at write: user can specify "online_kernel", - - "online_movable", "online", "offline" command - which will be performed on all sections in the block. -``phys_device`` read-only: designed to show the name of physical memory - device. This is not well implemented now. -``removable`` read-only: contains an integer value indicating - whether the memory block is removable or not - removable. A value of 1 indicates that the memory - block is removable and a value of 0 indicates that - it is not removable. A memory block is removable only if - every section in the block is removable. -``valid_zones`` read-only: designed to show which zones this memory block - can be onlined to. - - The first column shows it`s default zone. - - "memory6/valid_zones: Normal Movable" shows this memoryblock - can be onlined to ZONE_NORMAL by default and to ZONE_MOVABLE - by online_movable. - - "memory7/valid_zones: Movable Normal" shows this memoryblock - can be onlined to ZONE_MOVABLE by default and to ZONE_NORMAL - by online_kernel. -=================== ============================================================ - -.. note:: - - These directories/files appear after physical memory hotplug phase. - -If CONFIG_NUMA is enabled the memoryXXX/ directories can also be accessed -via symbolic links located in the /sys/devices/system/node/node* directories. - -For example: -/sys/devices/system/node/node0/memory9 -> ../../memory/memory9 - -A backlink will also be created: -/sys/devices/system/memory/memory9/node0 -> ../../node/node0 - -.. _memory_hotplug_physical_mem: - -Physical memory hot-add phase -============================= - -Hardware(Firmware) Support --------------------------- - -On x86_64/ia64 platform, memory hotplug by ACPI is supported. - -In general, the firmware (ACPI) which supports memory hotplug defines -memory class object of _HID "PNP0C80". When a notify is asserted to PNP0C80, -Linux's ACPI handler does hot-add memory to the system and calls a hotplug udev -script. This will be done automatically. - -But scripts for memory hotplug are not contained in generic udev package(now). -You may have to write it by yourself or online/offline memory by hand. -Please see :ref:`memory_hotplug_how_to_online_memory` and -:ref:`memory_hotplug_how_to_offline_memory`. - -If firmware supports NUMA-node hotplug, and defines an object _HID "ACPI0004", -"PNP0A05", or "PNP0A06", notification is asserted to it, and ACPI handler -calls hotplug code for all of objects which are defined in it. -If memory device is found, memory hotplug code will be called. - - -Notify memory hot-add event by hand ------------------------------------ - -On some architectures, the firmware may not notify the kernel of a memory -hotplug event. Therefore, the memory "probe" interface is supported to -explicitly notify the kernel. This interface depends on -CONFIG_ARCH_MEMORY_PROBE and can be configured on powerpc, sh, and x86 -if hotplug is supported, although for x86 this should be handled by ACPI -notification. - -Probe interface is located at -/sys/devices/system/memory/probe - -You can tell the physical address of new memory to the kernel by:: - - % echo start_address_of_new_memory > /sys/devices/system/memory/probe - -Then, [start_address_of_new_memory, start_address_of_new_memory + -memory_block_size] memory range is hot-added. In this case, hotplug script is -not called (in current implementation). You'll have to online memory by -yourself. Please see :ref:`memory_hotplug_how_to_online_memory`. - - -Logical Memory hot-add phase -============================ - -State of memory ---------------- - -To see (online/offline) state of a memory block, read 'state' file:: - - % cat /sys/device/system/memory/memoryXXX/state - - -- If the memory block is online, you'll read "online". -- If the memory block is offline, you'll read "offline". - - -.. _memory_hotplug_how_to_online_memory: - -How to online memory --------------------- - -When the memory is hot-added, the kernel decides whether or not to "online" -it according to the policy which can be read from "auto_online_blocks" file:: - - % cat /sys/devices/system/memory/auto_online_blocks - -The default depends on the CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE kernel config -option. If it is disabled the default is "offline" which means the newly added -memory is not in a ready-to-use state and you have to "online" the newly added -memory blocks manually. Automatic onlining can be requested by writing "online" -to "auto_online_blocks" file:: - - % echo online > /sys/devices/system/memory/auto_online_blocks - -This sets a global policy and impacts all memory blocks that will subsequently -be hotplugged. Currently offline blocks keep their state. It is possible, under -certain circumstances, that some memory blocks will be added but will fail to -online. User space tools can check their "state" files -(/sys/devices/system/memory/memoryXXX/state) and try to online them manually. - -If the automatic onlining wasn't requested, failed, or some memory block was -offlined it is possible to change the individual block's state by writing to the -"state" file:: - - % echo online > /sys/devices/system/memory/memoryXXX/state - -This onlining will not change the ZONE type of the target memory block, -If the memory block doesn't belong to any zone an appropriate kernel zone -(usually ZONE_NORMAL) will be used unless movable_node kernel command line -option is specified when ZONE_MOVABLE will be used. - -You can explicitly request to associate it with ZONE_MOVABLE by:: - - % echo online_movable > /sys/devices/system/memory/memoryXXX/state - -.. note:: current limit: this memory block must be adjacent to ZONE_MOVABLE - -Or you can explicitly request a kernel zone (usually ZONE_NORMAL) by:: - - % echo online_kernel > /sys/devices/system/memory/memoryXXX/state - -.. note:: current limit: this memory block must be adjacent to ZONE_NORMAL - -An explicit zone onlining can fail (e.g. when the range is already within -and existing and incompatible zone already). - -After this, memory block XXX's state will be 'online' and the amount of -available memory will be increased. - -This may be changed in future. - - - -Logical memory remove -===================== - -Memory offline and ZONE_MOVABLE -------------------------------- - -Memory offlining is more complicated than memory online. Because memory offline -has to make the whole memory block be unused, memory offline can fail if -the memory block includes memory which cannot be freed. - -In general, memory offline can use 2 techniques. - -(1) reclaim and free all memory in the memory block. -(2) migrate all pages in the memory block. - -In the current implementation, Linux's memory offline uses method (2), freeing -all pages in the memory block by page migration. But not all pages are -migratable. Under current Linux, migratable pages are anonymous pages and -page caches. For offlining a memory block by migration, the kernel has to -guarantee that the memory block contains only migratable pages. - -Now, a boot option for making a memory block which consists of migratable pages -is supported. By specifying "kernelcore=" or "movablecore=" boot option, you can -create ZONE_MOVABLE...a zone which is just used for movable pages. -(See also Documentation/admin-guide/kernel-parameters.rst) - -Assume the system has "TOTAL" amount of memory at boot time, this boot option -creates ZONE_MOVABLE as following. - -1) When kernelcore=YYYY boot option is used, - Size of memory not for movable pages (not for offline) is YYYY. - Size of memory for movable pages (for offline) is TOTAL-YYYY. - -2) When movablecore=ZZZZ boot option is used, - Size of memory not for movable pages (not for offline) is TOTAL - ZZZZ. - Size of memory for movable pages (for offline) is ZZZZ. - -.. note:: - - Unfortunately, there is no information to show which memory block belongs - to ZONE_MOVABLE. This is TBD. - -.. _memory_hotplug_how_to_offline_memory: - -How to offline memory ---------------------- - -You can offline a memory block by using the same sysfs interface that was used -in memory onlining:: - - % echo offline > /sys/devices/system/memory/memoryXXX/state - -If offline succeeds, the state of the memory block is changed to be "offline". -If it fails, some error core (like -EBUSY) will be returned by the kernel. -Even if a memory block does not belong to ZONE_MOVABLE, you can try to offline -it. If it doesn't contain 'unmovable' memory, you'll get success. - -A memory block under ZONE_MOVABLE is considered to be able to be offlined -easily. But under some busy state, it may return -EBUSY. Even if a memory -block cannot be offlined due to -EBUSY, you can retry offlining it and may be -able to offline it (or not). (For example, a page is referred to by some kernel -internal call and released soon.) - -Consideration: - Memory hotplug's design direction is to make the possibility of memory - offlining higher and to guarantee unplugging memory under any situation. But - it needs more work. Returning -EBUSY under some situation may be good because - the user can decide to retry more or not by himself. Currently, memory - offlining code does some amount of retry with 120 seconds timeout. - -Physical memory remove -====================== - -Need more implementation yet.... - - Notification completion of remove works by OS to firmware. - - Guard from remove if not yet. - -Memory hotplug event notifier -============================= - -Hotplugging events are sent to a notification queue. - -There are six types of notification defined in include/linux/memory.h: - -MEM_GOING_ONLINE - Generated before new memory becomes available in order to be able to - prepare subsystems to handle memory. The page allocator is still unable - to allocate from the new memory. - -MEM_CANCEL_ONLINE - Generated if MEMORY_GOING_ONLINE fails. - -MEM_ONLINE - Generated when memory has successfully brought online. The callback may - allocate pages from the new memory. - -MEM_GOING_OFFLINE - Generated to begin the process of offlining memory. Allocations are no - longer possible from the memory but some of the memory to be offlined - is still in use. The callback can be used to free memory known to a - subsystem from the indicated memory block. - -MEM_CANCEL_OFFLINE - Generated if MEMORY_GOING_OFFLINE fails. Memory is available again from - the memory block that we attempted to offline. - -MEM_OFFLINE - Generated after offlining memory is complete. - -A callback routine can be registered by calling:: - - hotplug_memory_notifier(callback_func, priority) - -Callback functions with higher values of priority are called before callback -functions with lower values. - -A callback function must have the following prototype:: - - int callback_func( - struct notifier_block *self, unsigned long action, void *arg); - -The first argument of the callback function (self) is a pointer to the block -of the notifier chain that points to the callback function itself. -The second argument (action) is one of the event types described above. -The third argument (arg) passes a pointer of struct memory_notify:: - - struct memory_notify { - unsigned long start_pfn; - unsigned long nr_pages; - int status_change_nid_normal; - int status_change_nid_high; - int status_change_nid; - } - -- start_pfn is start_pfn of online/offline memory. -- nr_pages is # of pages of online/offline memory. -- status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask - is (will be) set/clear, if this is -1, then nodemask status is not changed. -- status_change_nid_high is set node id when N_HIGH_MEMORY of nodemask - is (will be) set/clear, if this is -1, then nodemask status is not changed. -- status_change_nid is set node id when N_MEMORY of nodemask is (will be) - set/clear. It means a new(memoryless) node gets new memory by online and a - node loses all memory. If this is -1, then nodemask status is not changed. - - If status_changed_nid* >= 0, callback should create/discard structures for the - node if necessary. - -The callback routine shall return one of the values -NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP -defined in include/linux/notifier.h - -NOTIFY_DONE and NOTIFY_OK have no effect on the further processing. - -NOTIFY_BAD is used as response to the MEM_GOING_ONLINE, MEM_GOING_OFFLINE, -MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. It stops -further processing of the notification queue. - -NOTIFY_STOP stops further processing of the notification queue. - -Future Work -=========== - - - allowing memory hot-add to ZONE_MOVABLE. maybe we need some switch like - sysctl or new control file. - - showing memory block and physical device relationship. - - test and make it better memory offlining. - - support HugeTLB page migration and offlining. - - memmap removing at memory offline. - - physical remove memory. -- cgit v1.2.3-59-g8ed1b From 98cee6742c80e35274bab06c5fa1141fe0abd910 Mon Sep 17 00:00:00 2001 From: Mike Rapoport Date: Fri, 5 Oct 2018 01:11:01 +0300 Subject: docs/vm: split memory hotplug notifier description to Documentation/core-api The memory hotplug notifier description is about kernel internals rather than admin/user visible API. Place it appropriately. Signed-off-by: Mike Rapoport Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/mm/memory-hotplug.rst | 83 --------------------- Documentation/core-api/index.rst | 2 + Documentation/core-api/memory-hotplug-notifier.rst | 84 ++++++++++++++++++++++ 3 files changed, 86 insertions(+), 83 deletions(-) create mode 100644 Documentation/core-api/memory-hotplug-notifier.rst (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst index a33090c179d6..0b9c83effaa4 100644 --- a/Documentation/admin-guide/mm/memory-hotplug.rst +++ b/Documentation/admin-guide/mm/memory-hotplug.rst @@ -31,7 +31,6 @@ be changed often. 6.1 Memory offline and ZONE_MOVABLE 6.2. How to offline memory 7. Physical memory remove - 8. Memory hotplug event notifier 9. Future Work List @@ -414,88 +413,6 @@ Need more implementation yet.... - Notification completion of remove works by OS to firmware. - Guard from remove if not yet. -Memory hotplug event notifier -============================= - -Hotplugging events are sent to a notification queue. - -There are six types of notification defined in ``include/linux/memory.h``: - -MEM_GOING_ONLINE - Generated before new memory becomes available in order to be able to - prepare subsystems to handle memory. The page allocator is still unable - to allocate from the new memory. - -MEM_CANCEL_ONLINE - Generated if MEMORY_GOING_ONLINE fails. - -MEM_ONLINE - Generated when memory has successfully brought online. The callback may - allocate pages from the new memory. - -MEM_GOING_OFFLINE - Generated to begin the process of offlining memory. Allocations are no - longer possible from the memory but some of the memory to be offlined - is still in use. The callback can be used to free memory known to a - subsystem from the indicated memory block. - -MEM_CANCEL_OFFLINE - Generated if MEMORY_GOING_OFFLINE fails. Memory is available again from - the memory block that we attempted to offline. - -MEM_OFFLINE - Generated after offlining memory is complete. - -A callback routine can be registered by calling:: - - hotplug_memory_notifier(callback_func, priority) - -Callback functions with higher values of priority are called before callback -functions with lower values. - -A callback function must have the following prototype:: - - int callback_func( - struct notifier_block *self, unsigned long action, void *arg); - -The first argument of the callback function (self) is a pointer to the block -of the notifier chain that points to the callback function itself. -The second argument (action) is one of the event types described above. -The third argument (arg) passes a pointer of struct memory_notify:: - - struct memory_notify { - unsigned long start_pfn; - unsigned long nr_pages; - int status_change_nid_normal; - int status_change_nid_high; - int status_change_nid; - } - -- start_pfn is start_pfn of online/offline memory. -- nr_pages is # of pages of online/offline memory. -- status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask - is (will be) set/clear, if this is -1, then nodemask status is not changed. -- status_change_nid_high is set node id when N_HIGH_MEMORY of nodemask - is (will be) set/clear, if this is -1, then nodemask status is not changed. -- status_change_nid is set node id when N_MEMORY of nodemask is (will be) - set/clear. It means a new(memoryless) node gets new memory by online and a - node loses all memory. If this is -1, then nodemask status is not changed. - - If status_changed_nid* >= 0, callback should create/discard structures for the - node if necessary. - -The callback routine shall return one of the values -NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP -defined in ``include/linux/notifier.h`` - -NOTIFY_DONE and NOTIFY_OK have no effect on the further processing. - -NOTIFY_BAD is used as response to the MEM_GOING_ONLINE, MEM_GOING_OFFLINE, -MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. It stops -further processing of the notification queue. - -NOTIFY_STOP stops further processing of the notification queue. - Future Work =========== diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst index 165d76886d73..4f8a4266eb7c 100644 --- a/Documentation/core-api/index.rst +++ b/Documentation/core-api/index.rst @@ -32,6 +32,8 @@ Core utilities gfp_mask-from-fs-io timekeeping boot-time-mm + memory-hotplug-notifier + Interfaces for kernel debugging =============================== diff --git a/Documentation/core-api/memory-hotplug-notifier.rst b/Documentation/core-api/memory-hotplug-notifier.rst new file mode 100644 index 000000000000..35347cc3a43a --- /dev/null +++ b/Documentation/core-api/memory-hotplug-notifier.rst @@ -0,0 +1,84 @@ +.. _memory_hotplug_notifier: + +============================= +Memory hotplug event notifier +============================= + +Hotplugging events are sent to a notification queue. + +There are six types of notification defined in ``include/linux/memory.h``: + +MEM_GOING_ONLINE + Generated before new memory becomes available in order to be able to + prepare subsystems to handle memory. The page allocator is still unable + to allocate from the new memory. + +MEM_CANCEL_ONLINE + Generated if MEM_GOING_ONLINE fails. + +MEM_ONLINE + Generated when memory has successfully brought online. The callback may + allocate pages from the new memory. + +MEM_GOING_OFFLINE + Generated to begin the process of offlining memory. Allocations are no + longer possible from the memory but some of the memory to be offlined + is still in use. The callback can be used to free memory known to a + subsystem from the indicated memory block. + +MEM_CANCEL_OFFLINE + Generated if MEM_GOING_OFFLINE fails. Memory is available again from + the memory block that we attempted to offline. + +MEM_OFFLINE + Generated after offlining memory is complete. + +A callback routine can be registered by calling:: + + hotplug_memory_notifier(callback_func, priority) + +Callback functions with higher values of priority are called before callback +functions with lower values. + +A callback function must have the following prototype:: + + int callback_func( + struct notifier_block *self, unsigned long action, void *arg); + +The first argument of the callback function (self) is a pointer to the block +of the notifier chain that points to the callback function itself. +The second argument (action) is one of the event types described above. +The third argument (arg) passes a pointer of struct memory_notify:: + + struct memory_notify { + unsigned long start_pfn; + unsigned long nr_pages; + int status_change_nid_normal; + int status_change_nid_high; + int status_change_nid; + } + +- start_pfn is start_pfn of online/offline memory. +- nr_pages is # of pages of online/offline memory. +- status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask + is (will be) set/clear, if this is -1, then nodemask status is not changed. +- status_change_nid_high is set node id when N_HIGH_MEMORY of nodemask + is (will be) set/clear, if this is -1, then nodemask status is not changed. +- status_change_nid is set node id when N_MEMORY of nodemask is (will be) + set/clear. It means a new(memoryless) node gets new memory by online and a + node loses all memory. If this is -1, then nodemask status is not changed. + + If status_changed_nid* >= 0, callback should create/discard structures for the + node if necessary. + +The callback routine shall return one of the values +NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP +defined in ``include/linux/notifier.h`` + +NOTIFY_DONE and NOTIFY_OK have no effect on the further processing. + +NOTIFY_BAD is used as response to the MEM_GOING_ONLINE, MEM_GOING_OFFLINE, +MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. It stops +further processing of the notification queue. + +NOTIFY_STOP stops further processing of the notification queue. -- cgit v1.2.3-59-g8ed1b From 31527da5d673ed16255869b6d0f209285b8b0981 Mon Sep 17 00:00:00 2001 From: Yves-Alexis Perez Date: Tue, 2 Oct 2018 22:47:23 +0200 Subject: yama: clarify ptrace_scope=2 in Yama documentation Current phrasing is ambiguous since it's unclear if attaching to a children through PTRACE_TRACEME requires CAP_SYS_PTRACE. Rephrase the sentence to make that clear. Signed-off-by: Yves-Alexis Perez Acked-by: Kees Cook Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/LSM/Yama.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/LSM/Yama.rst b/Documentation/admin-guide/LSM/Yama.rst index 13468ea696b7..d0a060de3973 100644 --- a/Documentation/admin-guide/LSM/Yama.rst +++ b/Documentation/admin-guide/LSM/Yama.rst @@ -64,8 +64,8 @@ The sysctl settings (writable only with ``CAP_SYS_PTRACE``) are: Using ``PTRACE_TRACEME`` is unchanged. 2 - admin-only attach: - only processes with ``CAP_SYS_PTRACE`` may use ptrace - with ``PTRACE_ATTACH``, or through children calling ``PTRACE_TRACEME``. + only processes with ``CAP_SYS_PTRACE`` may use ptrace, either with + ``PTRACE_ATTACH`` or through children calling ``PTRACE_TRACEME``. 3 - no attach: no processes may use ptrace with ``PTRACE_ATTACH`` nor via -- cgit v1.2.3-59-g8ed1b From 9b8c7c14059af801637a818882159145c370d6f1 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Wed, 10 Oct 2018 17:18:25 -0700 Subject: LSM: Provide init debugging infrastructure Booting with "lsm.debug" will report future details on how LSM ordering decisions are being made. Signed-off-by: Kees Cook Reviewed-by: Casey Schaufler Reviewed-by: John Johansen Reviewed-by: James Morris Signed-off-by: James Morris --- Documentation/admin-guide/kernel-parameters.txt | 2 ++ security/security.c | 18 ++++++++++++++++++ 2 files changed, 20 insertions(+) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 9871e649ffef..32d323ee9218 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2274,6 +2274,8 @@ ltpc= [NET] Format: ,, + lsm.debug [SECURITY] Enable LSM initialization debugging output. + machvec= [IA-64] Force the use of a particular machine-vector (machvec) in a generic kernel. Example: machvec=hpzx1_swiotlb diff --git a/security/security.c b/security/security.c index e74f46fba591..395f804f6a91 100644 --- a/security/security.c +++ b/security/security.c @@ -12,6 +12,8 @@ * (at your option) any later version. */ +#define pr_fmt(fmt) "LSM: " fmt + #include #include #include @@ -43,11 +45,19 @@ char *lsm_names; static __initdata char chosen_lsm[SECURITY_NAME_MAX + 1] = CONFIG_DEFAULT_SECURITY; +static __initdata bool debug; +#define init_debug(...) \ + do { \ + if (debug) \ + pr_info(__VA_ARGS__); \ + } while (0) + static void __init major_lsm_init(void) { struct lsm_info *lsm; for (lsm = __start_lsm_info; lsm < __end_lsm_info; lsm++) { + init_debug("initializing %s\n", lsm->name); lsm->init(); } } @@ -91,6 +101,14 @@ static int __init choose_lsm(char *str) } __setup("security=", choose_lsm); +/* Enable LSM order debugging. */ +static int __init enable_debug(char *str) +{ + debug = true; + return 1; +} +__setup("lsm.debug", enable_debug); + static bool match_last_lsm(const char *list, const char *lsm) { const char *last; -- cgit v1.2.3-59-g8ed1b From 63625899c6eba697fdbefcdcdc2cb27549df9e43 Mon Sep 17 00:00:00 2001 From: Mike Rapoport Date: Sun, 7 Oct 2018 19:32:44 +0300 Subject: docs/admin-guide: memory-hotplug: remove table of contents Remove "manual" table of contents and leave only the ReST tag so that Sphinx will take care of TOC generation. Signed-off-by: Mike Rapoport Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/mm/memory-hotplug.rst | 21 --------------------- 1 file changed, 21 deletions(-) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst index 0b9c83effaa4..25157aec5b31 100644 --- a/Documentation/admin-guide/mm/memory-hotplug.rst +++ b/Documentation/admin-guide/mm/memory-hotplug.rst @@ -13,27 +13,6 @@ be changed often. .. contents:: :local: -.. CONTENTS - - 1. Introduction - 1.1 Purpose of memory hotplug - 1.2. Phases of memory hotplug - 1.3. Unit of Memory online/offline operation - 2. Kernel Configuration - 3. sysfs files for memory hotplug - 4. Physical memory hot-add phase - 4.1 Hardware(Firmware) Support - 4.2 Notify memory hot-add event by hand - 5. Logical Memory hot-add phase - 5.1. State of memory - 5.2. How to online memory - 6. Logical memory remove - 6.1 Memory offline and ZONE_MOVABLE - 6.2. How to offline memory - 7. Physical memory remove - 9. Future Work List - - .. note:: (1) x86_64's has special implementation for memory hotplug. -- cgit v1.2.3-59-g8ed1b From 14fdc2c5318ae420e68496975f48dc1dbef52649 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Mon, 22 Oct 2018 16:39:01 +0100 Subject: Documentation/security-bugs: Clarify treatment of embargoed information The Linux kernel security team has been accused of rejecting the idea of security embargoes. This is incorrect, and could dissuade people from reporting security issues to us under the false assumption that the issue would leak prematurely. Clarify the handling of embargoed information in our process documentation. Co-developed-by: Ingo Molnar Acked-by: Kees Cook Acked-by: Peter Zijlstra Acked-by: Laura Abbott Signed-off-by: Will Deacon Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- Documentation/admin-guide/security-bugs.rst | 47 ++++++++++++++++++----------- 1 file changed, 29 insertions(+), 18 deletions(-) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/security-bugs.rst b/Documentation/admin-guide/security-bugs.rst index 30491d91e93d..164bf71149fd 100644 --- a/Documentation/admin-guide/security-bugs.rst +++ b/Documentation/admin-guide/security-bugs.rst @@ -26,23 +26,34 @@ information is helpful. Any exploit code is very helpful and will not be released without consent from the reporter unless it has already been made public. -Disclosure ----------- - -The goal of the Linux kernel security team is to work with the bug -submitter to understand and fix the bug. We prefer to publish the fix as -soon as possible, but try to avoid public discussion of the bug itself -and leave that to others. - -Publishing the fix may be delayed when the bug or the fix is not yet -fully understood, the solution is not well-tested or for vendor -coordination. However, we expect these delays to be short, measurable in -days, not weeks or months. A release date is negotiated by the security -team working with the bug submitter as well as vendors. However, the -kernel security team holds the final say when setting a timeframe. The -timeframe varies from immediate (esp. if it's already publicly known bug) -to a few weeks. As a basic default policy, we expect report date to -release date to be on the order of 7 days. +Disclosure and embargoed information +------------------------------------ + +The security list is not a disclosure channel. For that, see Coordination +below. + +Once a robust fix has been developed, our preference is to release the +fix in a timely fashion, treating it no differently than any of the other +thousands of changes and fixes the Linux kernel project releases every +month. + +However, at the request of the reporter, we will postpone releasing the +fix for up to 5 business days after the date of the report or after the +embargo has lifted; whichever comes first. The only exception to that +rule is if the bug is publicly known, in which case the preference is to +release the fix as soon as it's available. + +Whilst embargoed information may be shared with trusted individuals in +order to develop a fix, such information will not be published alongside +the fix or on any other disclosure channel without the permission of the +reporter. This includes but is not limited to the original bug report +and followup discussions (if any), exploits, CVE information or the +identity of the reporter. + +In other words our only interest is in getting bugs fixed. All other +information submitted to the security list and any followup discussions +of the report are treated confidentially even after the embargo has been +lifted, in perpetuity. Coordination ------------ @@ -68,7 +79,7 @@ may delay the bug handling. If a reporter wishes to have a CVE identifier assigned ahead of public disclosure, they will need to contact the private linux-distros list, described above. When such a CVE identifier is known before a patch is provided, it is desirable to mention it in the commit -message, though. +message if the reporter agrees. Non-disclosure agreements ------------------------- -- cgit v1.2.3-59-g8ed1b From 2ce7135adc9ad081aa3c49744144376ac74fea60 Mon Sep 17 00:00:00 2001 From: Johannes Weiner Date: Fri, 26 Oct 2018 15:06:31 -0700 Subject: psi: cgroup support On a system that executes multiple cgrouped jobs and independent workloads, we don't just care about the health of the overall system, but also that of individual jobs, so that we can ensure individual job health, fairness between jobs, or prioritize some jobs over others. This patch implements pressure stall tracking for cgroups. In kernels with CONFIG_PSI=y, cgroup2 groups will have cpu.pressure, memory.pressure, and io.pressure files that track aggregate pressure stall times for only the tasks inside the cgroup. Link: http://lkml.kernel.org/r/20180828172258.3185-10-hannes@cmpxchg.org Signed-off-by: Johannes Weiner Acked-by: Tejun Heo Acked-by: Peter Zijlstra (Intel) Tested-by: Daniel Drake Tested-by: Suren Baghdasaryan Cc: Christopher Lameter Cc: Ingo Molnar Cc: Johannes Weiner Cc: Mike Galbraith Cc: Peter Enderborg Cc: Randy Dunlap Cc: Shakeel Butt Cc: Vinayak Menon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/accounting/psi.txt | 9 +++ Documentation/admin-guide/cgroup-v2.rst | 18 +++++ include/linux/cgroup-defs.h | 4 ++ include/linux/cgroup.h | 15 ++++ include/linux/psi.h | 25 +++++++ init/Kconfig | 4 ++ kernel/cgroup/cgroup.c | 45 +++++++++++- kernel/sched/psi.c | 118 +++++++++++++++++++++++++++++--- 8 files changed, 228 insertions(+), 10 deletions(-) (limited to 'Documentation/admin-guide') diff --git a/Documentation/accounting/psi.txt b/Documentation/accounting/psi.txt index 3753a82f1cf5..b8ca28b60215 100644 --- a/Documentation/accounting/psi.txt +++ b/Documentation/accounting/psi.txt @@ -62,3 +62,12 @@ well as medium and long term trends. The total absolute stall time is tracked and exported as well, to allow detection of latency spikes which wouldn't necessarily make a dent in the time averages, or to average trends over custom time frames. + +Cgroup2 interface +================= + +In a system with a CONFIG_CGROUP=y kernel and the cgroup2 filesystem +mounted, pressure stall information is also tracked for tasks grouped +into cgroups. Each subdirectory in the cgroupfs mountpoint contains +cpu.pressure, memory.pressure, and io.pressure files; the format is +the same as the /proc/pressure/ files. diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index caf36105a1c7..8389d6f72a77 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -966,6 +966,12 @@ All time durations are in microseconds. $PERIOD duration. "max" for $MAX indicates no limit. If only one number is written, $MAX is updated. + cpu.pressure + A read-only nested-key file which exists on non-root cgroups. + + Shows pressure stall information for CPU. See + Documentation/accounting/psi.txt for details. + Memory ------ @@ -1271,6 +1277,12 @@ PAGE_SIZE multiple when read back. higher than the limit for an extended period of time. This reduces the impact on the workload and memory management. + memory.pressure + A read-only nested-key file which exists on non-root cgroups. + + Shows pressure stall information for memory. See + Documentation/accounting/psi.txt for details. + Usage Guidelines ~~~~~~~~~~~~~~~~ @@ -1408,6 +1420,12 @@ IO Interface Files 8:16 rbps=2097152 wbps=max riops=max wiops=max + io.pressure + A read-only nested-key file which exists on non-root cgroups. + + Shows pressure stall information for IO. See + Documentation/accounting/psi.txt for details. + Writeback ~~~~~~~~~ diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h index 22254c1fe1c5..5e1694fe035b 100644 --- a/include/linux/cgroup-defs.h +++ b/include/linux/cgroup-defs.h @@ -20,6 +20,7 @@ #include #include #include +#include #ifdef CONFIG_CGROUPS @@ -436,6 +437,9 @@ struct cgroup { /* used to schedule release agent */ struct work_struct release_agent_work; + /* used to track pressure stalls */ + struct psi_group psi; + /* used to store eBPF programs */ struct cgroup_bpf bpf; diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index b622d6608605..9968332cceed 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -650,6 +650,11 @@ static inline void pr_cont_cgroup_path(struct cgroup *cgrp) pr_cont_kernfs_path(cgrp->kn); } +static inline struct psi_group *cgroup_psi(struct cgroup *cgrp) +{ + return &cgrp->psi; +} + static inline void cgroup_init_kthreadd(void) { /* @@ -703,6 +708,16 @@ static inline union kernfs_node_id *cgroup_get_kernfs_id(struct cgroup *cgrp) return NULL; } +static inline struct cgroup *cgroup_parent(struct cgroup *cgrp) +{ + return NULL; +} + +static inline struct psi_group *cgroup_psi(struct cgroup *cgrp) +{ + return NULL; +} + static inline bool task_under_cgroup_hierarchy(struct task_struct *task, struct cgroup *ancestor) { diff --git a/include/linux/psi.h b/include/linux/psi.h index b0daf050de58..8e0725aac0aa 100644 --- a/include/linux/psi.h +++ b/include/linux/psi.h @@ -4,6 +4,9 @@ #include #include +struct seq_file; +struct css_set; + #ifdef CONFIG_PSI extern bool psi_disabled; @@ -16,6 +19,14 @@ void psi_memstall_tick(struct task_struct *task, int cpu); void psi_memstall_enter(unsigned long *flags); void psi_memstall_leave(unsigned long *flags); +int psi_show(struct seq_file *s, struct psi_group *group, enum psi_res res); + +#ifdef CONFIG_CGROUPS +int psi_cgroup_alloc(struct cgroup *cgrp); +void psi_cgroup_free(struct cgroup *cgrp); +void cgroup_move_task(struct task_struct *p, struct css_set *to); +#endif + #else /* CONFIG_PSI */ static inline void psi_init(void) {} @@ -23,6 +34,20 @@ static inline void psi_init(void) {} static inline void psi_memstall_enter(unsigned long *flags) {} static inline void psi_memstall_leave(unsigned long *flags) {} +#ifdef CONFIG_CGROUPS +static inline int psi_cgroup_alloc(struct cgroup *cgrp) +{ + return 0; +} +static inline void psi_cgroup_free(struct cgroup *cgrp) +{ +} +static inline void cgroup_move_task(struct task_struct *p, struct css_set *to) +{ + rcu_assign_pointer(p->cgroups, to); +} +#endif + #endif /* CONFIG_PSI */ #endif /* _LINUX_PSI_H */ diff --git a/init/Kconfig b/init/Kconfig index 26e639df5517..a4112e95724a 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -501,6 +501,10 @@ config PSI the share of walltime in which some or all tasks in the system are delayed due to contention of the respective resource. + In kernels with cgroup support, cgroups (cgroup2 only) will + have cpu.pressure, memory.pressure, and io.pressure files, + which aggregate pressure stalls for the grouped tasks only. + For more details see Documentation/accounting/psi.txt. Say N if unsure. diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index 4c1cf0969a80..8b79318810ad 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -55,6 +55,7 @@ #include #include #include +#include #include #define CREATE_TRACE_POINTS @@ -862,7 +863,7 @@ static void css_set_move_task(struct task_struct *task, */ WARN_ON_ONCE(task->flags & PF_EXITING); - rcu_assign_pointer(task->cgroups, to_cset); + cgroup_move_task(task, to_cset); list_add_tail(&task->cg_list, use_mg_tasks ? &to_cset->mg_tasks : &to_cset->tasks); } @@ -3446,6 +3447,21 @@ static int cpu_stat_show(struct seq_file *seq, void *v) return ret; } +#ifdef CONFIG_PSI +static int cgroup_io_pressure_show(struct seq_file *seq, void *v) +{ + return psi_show(seq, &seq_css(seq)->cgroup->psi, PSI_IO); +} +static int cgroup_memory_pressure_show(struct seq_file *seq, void *v) +{ + return psi_show(seq, &seq_css(seq)->cgroup->psi, PSI_MEM); +} +static int cgroup_cpu_pressure_show(struct seq_file *seq, void *v) +{ + return psi_show(seq, &seq_css(seq)->cgroup->psi, PSI_CPU); +} +#endif + static int cgroup_file_open(struct kernfs_open_file *of) { struct cftype *cft = of->kn->priv; @@ -4576,6 +4592,23 @@ static struct cftype cgroup_base_files[] = { .flags = CFTYPE_NOT_ON_ROOT, .seq_show = cpu_stat_show, }, +#ifdef CONFIG_PSI + { + .name = "io.pressure", + .flags = CFTYPE_NOT_ON_ROOT, + .seq_show = cgroup_io_pressure_show, + }, + { + .name = "memory.pressure", + .flags = CFTYPE_NOT_ON_ROOT, + .seq_show = cgroup_memory_pressure_show, + }, + { + .name = "cpu.pressure", + .flags = CFTYPE_NOT_ON_ROOT, + .seq_show = cgroup_cpu_pressure_show, + }, +#endif { } /* terminate */ }; @@ -4636,6 +4669,7 @@ static void css_free_rwork_fn(struct work_struct *work) */ cgroup_put(cgroup_parent(cgrp)); kernfs_put(cgrp->kn); + psi_cgroup_free(cgrp); if (cgroup_on_dfl(cgrp)) cgroup_rstat_exit(cgrp); kfree(cgrp); @@ -4892,10 +4926,15 @@ static struct cgroup *cgroup_create(struct cgroup *parent) cgrp->self.parent = &parent->self; cgrp->root = root; cgrp->level = level; - ret = cgroup_bpf_inherit(cgrp); + + ret = psi_cgroup_alloc(cgrp); if (ret) goto out_idr_free; + ret = cgroup_bpf_inherit(cgrp); + if (ret) + goto out_psi_free; + for (tcgrp = cgrp; tcgrp; tcgrp = cgroup_parent(tcgrp)) { cgrp->ancestor_ids[tcgrp->level] = tcgrp->id; @@ -4933,6 +4972,8 @@ static struct cgroup *cgroup_create(struct cgroup *parent) return cgrp; +out_psi_free: + psi_cgroup_free(cgrp); out_idr_free: cgroup_idr_remove(&root->cgroup_idr, cgrp->id); out_stat_exit: diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c index 595414599b98..7cdecfc010af 100644 --- a/kernel/sched/psi.c +++ b/kernel/sched/psi.c @@ -473,9 +473,35 @@ static void psi_group_change(struct psi_group *group, int cpu, schedule_delayed_work(&group->clock_work, PSI_FREQ); } +static struct psi_group *iterate_groups(struct task_struct *task, void **iter) +{ +#ifdef CONFIG_CGROUPS + struct cgroup *cgroup = NULL; + + if (!*iter) + cgroup = task->cgroups->dfl_cgrp; + else if (*iter == &psi_system) + return NULL; + else + cgroup = cgroup_parent(*iter); + + if (cgroup && cgroup_parent(cgroup)) { + *iter = cgroup; + return cgroup_psi(cgroup); + } +#else + if (*iter) + return NULL; +#endif + *iter = &psi_system; + return &psi_system; +} + void psi_task_change(struct task_struct *task, int clear, int set) { int cpu = task_cpu(task); + struct psi_group *group; + void *iter = NULL; if (!task->pid) return; @@ -492,17 +518,23 @@ void psi_task_change(struct task_struct *task, int clear, int set) task->psi_flags &= ~clear; task->psi_flags |= set; - psi_group_change(&psi_system, cpu, clear, set); + while ((group = iterate_groups(task, &iter))) + psi_group_change(group, cpu, clear, set); } void psi_memstall_tick(struct task_struct *task, int cpu) { - struct psi_group_cpu *groupc; + struct psi_group *group; + void *iter = NULL; - groupc = per_cpu_ptr(psi_system.pcpu, cpu); - write_seqcount_begin(&groupc->seq); - record_times(groupc, cpu, true); - write_seqcount_end(&groupc->seq); + while ((group = iterate_groups(task, &iter))) { + struct psi_group_cpu *groupc; + + groupc = per_cpu_ptr(group->pcpu, cpu); + write_seqcount_begin(&groupc->seq); + record_times(groupc, cpu, true); + write_seqcount_end(&groupc->seq); + } } /** @@ -565,8 +597,78 @@ void psi_memstall_leave(unsigned long *flags) rq_unlock_irq(rq, &rf); } -static int psi_show(struct seq_file *m, struct psi_group *group, - enum psi_res res) +#ifdef CONFIG_CGROUPS +int psi_cgroup_alloc(struct cgroup *cgroup) +{ + if (psi_disabled) + return 0; + + cgroup->psi.pcpu = alloc_percpu(struct psi_group_cpu); + if (!cgroup->psi.pcpu) + return -ENOMEM; + group_init(&cgroup->psi); + return 0; +} + +void psi_cgroup_free(struct cgroup *cgroup) +{ + if (psi_disabled) + return; + + cancel_delayed_work_sync(&cgroup->psi.clock_work); + free_percpu(cgroup->psi.pcpu); +} + +/** + * cgroup_move_task - move task to a different cgroup + * @task: the task + * @to: the target css_set + * + * Move task to a new cgroup and safely migrate its associated stall + * state between the different groups. + * + * This function acquires the task's rq lock to lock out concurrent + * changes to the task's scheduling state and - in case the task is + * running - concurrent changes to its stall state. + */ +void cgroup_move_task(struct task_struct *task, struct css_set *to) +{ + bool move_psi = !psi_disabled; + unsigned int task_flags = 0; + struct rq_flags rf; + struct rq *rq; + + if (move_psi) { + rq = task_rq_lock(task, &rf); + + if (task_on_rq_queued(task)) + task_flags = TSK_RUNNING; + else if (task->in_iowait) + task_flags = TSK_IOWAIT; + + if (task->flags & PF_MEMSTALL) + task_flags |= TSK_MEMSTALL; + + if (task_flags) + psi_task_change(task, task_flags, 0); + } + + /* + * Lame to do this here, but the scheduler cannot be locked + * from the outside, so we move cgroups from inside sched/. + */ + rcu_assign_pointer(task->cgroups, to); + + if (move_psi) { + if (task_flags) + psi_task_change(task, 0, task_flags); + + task_rq_unlock(rq, task, &rf); + } +} +#endif /* CONFIG_CGROUPS */ + +int psi_show(struct seq_file *m, struct psi_group *group, enum psi_res res) { int full; -- cgit v1.2.3-59-g8ed1b From f682a97a00591def7cefbb5003dc04045028e405 Mon Sep 17 00:00:00 2001 From: Alexander Duyck Date: Fri, 26 Oct 2018 15:07:45 -0700 Subject: mm: provide kernel parameter to allow disabling page init poisoning Patch series "Address issues slowing persistent memory initialization", v5. The main thing this patch set achieves is that it allows us to initialize each node worth of persistent memory independently. As a result we reduce page init time by about 2 minutes because instead of taking 30 to 40 seconds per node and going through each node one at a time, we process all 4 nodes in parallel in the case of a 12TB persistent memory setup spread evenly over 4 nodes. This patch (of 3): On systems with a large amount of memory it can take a significant amount of time to initialize all of the page structs with the PAGE_POISON_PATTERN value. I have seen it take over 2 minutes to initialize a system with over 12TB of RAM. In order to work around the issue I had to disable CONFIG_DEBUG_VM and then the boot time returned to something much more reasonable as the arch_add_memory call completed in milliseconds versus seconds. However in doing that I had to disable all of the other VM debugging on the system. In order to work around a kernel that might have CONFIG_DEBUG_VM enabled on a system that has a large amount of memory I have added a new kernel parameter named "vm_debug" that can be set to "-" in order to disable it. Link: http://lkml.kernel.org/r/20180925201921.3576.84239.stgit@localhost.localdomain Reviewed-by: Pavel Tatashin Signed-off-by: Alexander Duyck Cc: Dave Hansen Cc: Michal Hocko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/admin-guide/kernel-parameters.txt | 12 +++++++ include/linux/page-flags.h | 8 +++++ mm/debug.c | 46 +++++++++++++++++++++++++ mm/memblock.c | 5 ++- mm/sparse.c | 4 +-- 5 files changed, 69 insertions(+), 6 deletions(-) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 8022d902e770..dcd082576e79 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -4839,6 +4839,18 @@ This is actually a boot loader parameter; the value is passed to the kernel using a special protocol. + vm_debug[=options] [KNL] Available with CONFIG_DEBUG_VM=y. + May slow down system boot speed, especially when + enabled on systems with a large amount of memory. + All options are enabled by default, and this + interface is meant to allow for selectively + enabling or disabling specific virtual memory + debugging features. + + Available options are: + P Enable page structure init time poisoning + - Disable all of the above options + vmalloc=nn[KMG] [KNL,BOOT] Forces the vmalloc area to have an exact size of . This can be used to increase the minimum size (128MB on x86). It can also be used to diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 4d99504f6496..934f91ef3f54 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -163,6 +163,14 @@ static inline int PagePoisoned(const struct page *page) return page->flags == PAGE_POISON_PATTERN; } +#ifdef CONFIG_DEBUG_VM +void page_init_poison(struct page *page, size_t size); +#else +static inline void page_init_poison(struct page *page, size_t size) +{ +} +#endif + /* * Page flags policies wrt compound pages * diff --git a/mm/debug.c b/mm/debug.c index bd10aad8539a..cdacba12e09a 100644 --- a/mm/debug.c +++ b/mm/debug.c @@ -13,6 +13,7 @@ #include #include #include +#include #include "internal.h" @@ -175,4 +176,49 @@ void dump_mm(const struct mm_struct *mm) ); } +static bool page_init_poisoning __read_mostly = true; + +static int __init setup_vm_debug(char *str) +{ + bool __page_init_poisoning = true; + + /* + * Calling vm_debug with no arguments is equivalent to requesting + * to enable all debugging options we can control. + */ + if (*str++ != '=' || !*str) + goto out; + + __page_init_poisoning = false; + if (*str == '-') + goto out; + + while (*str) { + switch (tolower(*str)) { + case'p': + __page_init_poisoning = true; + break; + default: + pr_err("vm_debug option '%c' unknown. skipped\n", + *str); + } + + str++; + } +out: + if (page_init_poisoning && !__page_init_poisoning) + pr_warn("Page struct poisoning disabled by kernel command line option 'vm_debug'\n"); + + page_init_poisoning = __page_init_poisoning; + + return 1; +} +__setup("vm_debug", setup_vm_debug); + +void page_init_poison(struct page *page, size_t size) +{ + if (page_init_poisoning) + memset(page, PAGE_POISON_PATTERN, size); +} +EXPORT_SYMBOL_GPL(page_init_poison); #endif /* CONFIG_DEBUG_VM */ diff --git a/mm/memblock.c b/mm/memblock.c index 237944479d25..a85315083b5a 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -1444,10 +1444,9 @@ void * __init memblock_virt_alloc_try_nid_raw( ptr = memblock_virt_alloc_internal(size, align, min_addr, max_addr, nid); -#ifdef CONFIG_DEBUG_VM if (ptr && size > 0) - memset(ptr, PAGE_POISON_PATTERN, size); -#endif + page_init_poison(ptr, size); + return ptr; } diff --git a/mm/sparse.c b/mm/sparse.c index 10b07eea9a6e..67ad061f7fb8 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -696,13 +696,11 @@ int __meminit sparse_add_one_section(struct pglist_data *pgdat, goto out; } -#ifdef CONFIG_DEBUG_VM /* * Poison uninitialized struct pages in order to catch invalid flags * combinations. */ - memset(memmap, PAGE_POISON_PATTERN, sizeof(struct page) * PAGES_PER_SECTION); -#endif + page_init_poison(memmap, sizeof(struct page) * PAGES_PER_SECTION); section_mark_present(ms); sparse_init_one_section(ms, section_nr, memmap, usemap); -- cgit v1.2.3-59-g8ed1b From 7a1adfddaf0d11a39fdcaf6e82a88e9c0586e08b Mon Sep 17 00:00:00 2001 From: Roman Gushchin Date: Fri, 26 Oct 2018 15:09:48 -0700 Subject: mm: don't raise MEMCG_OOM event due to failed high-order allocation It was reported that on some of our machines containers were restarted with OOM symptoms without an obvious reason. Despite there were almost no memory pressure and plenty of page cache, MEMCG_OOM event was raised occasionally, causing the container management software to think, that OOM has happened. However, no tasks have been killed. The following investigation showed that the problem is caused by a failing attempt to charge a high-order page. In such case, the OOM killer is never invoked. As shown below, it can happen under conditions, which are very far from a real OOM: e.g. there is plenty of clean page cache and no memory pressure. There is no sense in raising an OOM event in this case, as it might confuse a user and lead to wrong and excessive actions (e.g. restart the workload, as in my case). Let's look at the charging path in try_charge(). If the memory usage is about memory.max, which is absolutely natural for most memory cgroups, we try to reclaim some pages. Even if we were able to reclaim enough memory for the allocation, the following check can fail due to a race with another concurrent allocation: if (mem_cgroup_margin(mem_over_limit) >= nr_pages) goto retry; For regular pages the following condition will save us from triggering the OOM: if (nr_reclaimed && nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER)) goto retry; But for high-order allocation this condition will intentionally fail. The reason behind is that we'll likely fall to regular pages anyway, so it's ok and even preferred to return ENOMEM. In this case the idea of raising MEMCG_OOM looks dubious. Fix this by moving MEMCG_OOM raising to mem_cgroup_oom() after allocation order check, so that the event won't be raised for high order allocations. This change doesn't affect regular pages allocation and charging. Link: http://lkml.kernel.org/r/20181004214050.7417-1-guro@fb.com Signed-off-by: Roman Gushchin Acked-by: David Rientjes Acked-by: Michal Hocko Acked-by: Johannes Weiner Cc: Vladimir Davydov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/admin-guide/cgroup-v2.rst | 4 ++++ mm/memcontrol.c | 4 ++-- 2 files changed, 6 insertions(+), 2 deletions(-) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index 8389d6f72a77..8384c681a4b2 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -1133,6 +1133,10 @@ PAGE_SIZE multiple when read back. disk readahead. For now OOM in memory cgroup kills tasks iff shortage has happened inside page fault. + This event is not raised if the OOM killer is not + considered as an option, e.g. for failed high-order + allocations. + oom_kill The number of processes belonging to this cgroup killed by any kind of OOM killer. diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 92d38c88250f..10a9b554d69f 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1669,6 +1669,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int if (order > PAGE_ALLOC_COSTLY_ORDER) return OOM_SKIPPED; + memcg_memory_event(memcg, MEMCG_OOM); + /* * We are in the middle of the charge context here, so we * don't want to block when potentially sitting on a callstack @@ -2250,8 +2252,6 @@ retry: if (fatal_signal_pending(current)) goto force; - memcg_memory_event(mem_over_limit, MEMCG_OOM); - /* * keep retrying as long as the memcg oom killer is able to make * a forward progress or bypass the charge if the oom killer -- cgit v1.2.3-59-g8ed1b