From 9f63df26beea78aae0500d1225a99bbd905dd157 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Sun, 6 Jan 2019 19:19:29 -0800 Subject: Documentation/filesystems: fix title underline lengths in path-lookup.rst Fix Sphinx warnings in path-lookup.rst: Documentation/filesystems/path-lookup.rst:347: WARNING: Title underline too short. Documentation/filesystems/path-lookup.rst:358: WARNING: Title underline too short. [...] Signed-off-by: Randy Dunlap Acked-by: NeilBrown Signed-off-by: Jonathan Corbet --- Documentation/filesystems/path-lookup.rst | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst index 9d6b68853f5b..80e22eda4132 100644 --- a/Documentation/filesystems/path-lookup.rst +++ b/Documentation/filesystems/path-lookup.rst @@ -344,7 +344,7 @@ In particular it is held while scanning chains in the dcache hash table, and the mount point hash table. Bringing it together with ``struct nameidata`` --------------------------------------------- +---------------------------------------------- .. _First edition Unix: http://minnie.tuhs.org/cgi-bin/utree.pl?file=V1/u2.s @@ -355,7 +355,7 @@ converts a "name" to an "inode". ``struct nameidata`` contains (among other fields): ``struct path path`` -~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~ A ``path`` contains a ``struct vfsmount`` (which is embedded in a ``struct mount``) and a ``struct dentry``. Together these @@ -366,13 +366,13 @@ step. A reference through ``d_lockref`` and ``mnt_count`` is always held. ``struct qstr last`` -~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~ This is a string together with a length (i.e. _not_ ``nul`` terminated) that is the "next" component in the pathname. ``int last_type`` -~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~ This is one of ``LAST_NORM``, ``LAST_ROOT``, ``LAST_DOT``, ``LAST_DOTDOT``, or ``LAST_BIND``. The ``last`` field is only valid if the type is @@ -381,7 +381,7 @@ components of the symlink have been processed yet. Others should be fairly self-explanatory. ``struct path root`` -~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~ This is used to hold a reference to the effective root of the filesystem. Often that reference won't be needed, so this field is @@ -510,7 +510,7 @@ potentially interesting things about these dentries corresponding to three different flags that might be set in ``dentry->d_flags``: ``DCACHE_MANAGE_TRANSIT`` -~~~~~~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~~~~~~ If this flag has been set, then the filesystem has requested that the ``d_manage()`` dentry operation be called before handling any possible @@ -529,7 +529,7 @@ filesystem, which will then give it a special pass through ``d_manage()`` by returning ``-EISDIR``. ``DCACHE_MOUNTED`` -~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~ This flag is set on every dentry that is mounted on. As Linux supports multiple filesystem namespaces, it is possible that the @@ -542,7 +542,7 @@ If this flag is set, and ``d_manage()`` didn't return ``-EISDIR``, and a new ``dentry`` (both with counted references). ``DCACHE_NEED_AUTOMOUNT`` -~~~~~~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~~~~~~ If ``d_manage()`` allowed us to get this far, and ``lookup_mnt()`` didn't find a mount point, then this flag causes the ``d_automount()`` dentry @@ -698,7 +698,7 @@ With that little refresher on seqlocks out of the way we can look at the bigger picture of how RCU-walk uses seqlocks. 
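As a reference point for the seqlock discussion above, here is a minimal hedged sketch of the reader-side retry loop that RCU-walk's sampling builds on. ``example_lock``, ``example_value`` and ``example_read()`` are invented for illustration; ``DEFINE_SEQLOCK()``, ``read_seqbegin()`` and ``read_seqretry()`` are the usual <linux/seqlock.h> interfaces::

  #include <linux/seqlock.h>

  static DEFINE_SEQLOCK(example_lock);
  static int example_value;

  /* Reader side: sample the protected state, then retry the whole read
   * if read_seqretry() reports that a writer ran in the meantime.
   */
  static int example_read(void)
  {
          unsigned int seq;
          int val;

          do {
                  seq = read_seqbegin(&example_lock);
                  val = example_value;
          } while (read_seqretry(&example_lock, seq));

          return val;
  }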
``mount_lock`` and ``nd->m_seq`` -~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We already met the ``mount_lock`` seqlock when REF-walk used it to ensure that crossing a mount point is performed safely. RCU-walk uses @@ -727,7 +727,7 @@ results would have been the same. This ensures the invariant holds, at least for vfsmount structures. ``dentry->d_seq`` and ``nd->seq`` -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In place of taking a count or lock on ``d_reflock``, RCU-walk samples the per-dentry ``d_seq`` seqlock, and stores the sequence number in the @@ -774,7 +774,7 @@ getting a counted reference to the new dentry before dropping that for the old dentry which we saw in REF-walk. No ``inode->i_rwsem`` or even ``rename_lock`` -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ A semaphore is a fairly heavyweight lock that can only be taken when it is permissible to sleep. As ``rcu_read_lock()`` forbids sleeping, @@ -796,7 +796,7 @@ locking. This neatly handles all cases, so adding extra checks on rename_lock would bring no significant value. ``unlazy walk()`` and ``complete_walk()`` -------------------------------------- +----------------------------------------- That "dropping down to REF-walk" typically involves a call to ``unlazy_walk()``, so named because "RCU-walk" is also sometimes -- cgit v1.2.3-59-g8ed1b From 1b23f5e9973abc2137ca615d770bf23d8e45b93c Mon Sep 17 00:00:00 2001 From: Otto Sabart Date: Sun, 6 Jan 2019 00:28:59 +0100 Subject: doc: networking: prepare offload documents for conversion into RST Add small number of markups which are sufficient for conversion into reStructuredText. Unfortunately there was necessary to restructure all sections in checksum-offloads.txt file and create paragraphs separated by newline. There also must not be a space at the beginning of paragpraph. There are no semantic changes. Signed-off-by: Otto Sabart Acked-by: David S. Miller Signed-off-by: Jonathan Corbet --- Documentation/networking/checksum-offloads.txt | 179 ++++++++++++--------- Documentation/networking/segmentation-offloads.txt | 46 ++++-- 2 files changed, 130 insertions(+), 95 deletions(-) diff --git a/Documentation/networking/checksum-offloads.txt b/Documentation/networking/checksum-offloads.txt index 27bc09cfcf6d..1a1cd94a3f6d 100644 --- a/Documentation/networking/checksum-offloads.txt +++ b/Documentation/networking/checksum-offloads.txt @@ -1,122 +1,143 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=============================================== Checksum Offloads in the Linux Networking Stack +=============================================== Introduction ============ -This document describes a set of techniques in the Linux networking stack - to take advantage of checksum offload capabilities of various NICs. +This document describes a set of techniques in the Linux networking stack to +take advantage of checksum offload capabilities of various NICs. 
The following technologies are described: - * TX Checksum Offload - * LCO: Local Checksum Offload - * RCO: Remote Checksum Offload + +* TX Checksum Offload +* LCO: Local Checksum Offload +* RCO: Remote Checksum Offload Things that should be documented here but aren't yet: - * RX Checksum Offload - * CHECKSUM_UNNECESSARY conversion + +* RX Checksum Offload +* CHECKSUM_UNNECESSARY conversion TX Checksum Offload =================== -The interface for offloading a transmit checksum to a device is explained - in detail in comments near the top of include/linux/skbuff.h. +The interface for offloading a transmit checksum to a device is explained in +detail in comments near the top of include/linux/skbuff.h. + In brief, it allows to request the device fill in a single ones-complement - checksum defined by the sk_buff fields skb->csum_start and - skb->csum_offset. The device should compute the 16-bit ones-complement - checksum (i.e. the 'IP-style' checksum) from csum_start to the end of the - packet, and fill in the result at (csum_start + csum_offset). -Because csum_offset cannot be negative, this ensures that the previous - value of the checksum field is included in the checksum computation, thus - it can be used to supply any needed corrections to the checksum (such as - the sum of the pseudo-header for UDP or TCP). +checksum defined by the sk_buff fields skb->csum_start and skb->csum_offset. +The device should compute the 16-bit ones-complement checksum (i.e. the +'IP-style' checksum) from csum_start to the end of the packet, and fill in the +result at (csum_start + csum_offset). + +Because csum_offset cannot be negative, this ensures that the previous value of +the checksum field is included in the checksum computation, thus it can be used +to supply any needed corrections to the checksum (such as the sum of the +pseudo-header for UDP or TCP). + This interface only allows a single checksum to be offloaded. Where - encapsulation is used, the packet may have multiple checksum fields in - different header layers, and the rest will have to be handled by another - mechanism such as LCO or RCO. +encapsulation is used, the packet may have multiple checksum fields in +different header layers, and the rest will have to be handled by another +mechanism such as LCO or RCO. + CRC32c can also be offloaded using this interface, by means of filling - skb->csum_start and skb->csum_offset as described above, and setting - skb->csum_not_inet: see skbuff.h comment (section 'D') for more details. +skb->csum_start and skb->csum_offset as described above, and setting +skb->csum_not_inet: see skbuff.h comment (section 'D') for more details. + No offloading of the IP header checksum is performed; it is always done in - software. This is OK because when we build the IP header, we obviously - have it in cache, so summing it isn't expensive. It's also rather short. +software. This is OK because when we build the IP header, we obviously have it +in cache, so summing it isn't expensive. It's also rather short. + The requirements for GSO are more complicated, because when segmenting an - encapsulated packet both the inner and outer checksums may need to be - edited or recomputed for each resulting segment. See the skbuff.h comment - (section 'E') for more details. +encapsulated packet both the inner and outer checksums may need to be edited or +recomputed for each resulting segment. See the skbuff.h comment (section 'E') +for more details. 
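As a hedged illustration of the csum_start/csum_offset contract described above, the following standalone sketch emulates what the device is asked to do over a plain byte buffer (this is not driver code; the helper names are invented): compute the 16-bit ones-complement sum from csum_start to the end of the packet, including the seeded checksum field, and write the complement back at (csum_start + csum_offset)::

  #include <stdint.h>
  #include <stddef.h>

  /* Fold a 32-bit accumulator down to a 16-bit ones-complement sum. */
  uint16_t csum_fold32(uint32_t sum)
  {
          while (sum >> 16)
                  sum = (sum & 0xffff) + (sum >> 16);
          return (uint16_t)sum;
  }

  /*
   * Emulate the device's job: sum 16-bit words from data[start] to the
   * end of the packet (the word at start + offset, i.e. the seeded
   * checksum field, is included), then store the complement of that sum
   * at data[start + offset].
   */
  void fill_tx_checksum(uint8_t *data, size_t len, size_t start, size_t offset)
  {
          uint32_t sum = 0;
          uint16_t csum;
          size_t i;

          for (i = start; i + 1 < len; i += 2)
                  sum += (uint32_t)((data[i] << 8) | data[i + 1]);
          if (i < len)
                  sum += (uint32_t)(data[i] << 8);  /* odd trailing byte */

          csum = ~csum_fold32(sum);
          data[start + offset] = csum >> 8;
          data[start + offset + 1] = csum & 0xff;
  }

Because the field at (csum_start + csum_offset) is summed along with everything else, whatever correction was seeded there (such as the pseudo-header sum) ends up reflected in the final value, which is the point made in the text above.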
A driver declares its offload capabilities in netdev->hw_features; see - Documentation/networking/netdev-features.txt for more. Note that a device - which only advertises NETIF_F_IP[V6]_CSUM must still obey the csum_start - and csum_offset given in the SKB; if it tries to deduce these itself in - hardware (as some NICs do) the driver should check that the values in the - SKB match those which the hardware will deduce, and if not, fall back to - checksumming in software instead (with skb_csum_hwoffload_help() or one of - the skb_checksum_help() / skb_crc32c_csum_help functions, as mentioned in - include/linux/skbuff.h). - -The stack should, for the most part, assume that checksum offload is - supported by the underlying device. The only place that should check is - validate_xmit_skb(), and the functions it calls directly or indirectly. - That function compares the offload features requested by the SKB (which - may include other offloads besides TX Checksum Offload) and, if they are - not supported or enabled on the device (determined by netdev->features), - performs the corresponding offload in software. In the case of TX - Checksum Offload, that means calling skb_csum_hwoffload_help(skb, features). +Documentation/networking/netdev-features.txt for more. Note that a device +which only advertises NETIF_F_IP[V6]_CSUM must still obey the csum_start and +csum_offset given in the SKB; if it tries to deduce these itself in hardware +(as some NICs do) the driver should check that the values in the SKB match +those which the hardware will deduce, and if not, fall back to checksumming in +software instead (with skb_csum_hwoffload_help() or one of the +skb_checksum_help() / skb_crc32c_csum_help functions, as mentioned in +include/linux/skbuff.h). + +The stack should, for the most part, assume that checksum offload is supported +by the underlying device. The only place that should check is +validate_xmit_skb(), and the functions it calls directly or indirectly. That +function compares the offload features requested by the SKB (which may include +other offloads besides TX Checksum Offload) and, if they are not supported or +enabled on the device (determined by netdev->features), performs the +corresponding offload in software. In the case of TX Checksum Offload, that +means calling skb_csum_hwoffload_help(skb, features). LCO: Local Checksum Offload =========================== LCO is a technique for efficiently computing the outer checksum of an - encapsulated datagram when the inner checksum is due to be offloaded. -The ones-complement sum of a correctly checksummed TCP or UDP packet is - equal to the complement of the sum of the pseudo header, because everything - else gets 'cancelled out' by the checksum field. This is because the sum was - complemented before being written to the checksum field. +encapsulated datagram when the inner checksum is due to be offloaded. + +The ones-complement sum of a correctly checksummed TCP or UDP packet is equal +to the complement of the sum of the pseudo header, because everything else gets +'cancelled out' by the checksum field. This is because the sum was +complemented before being written to the checksum field. + More generally, this holds in any case where the 'IP-style' ones complement - checksum is used, and thus any checksum that TX Checksum Offload supports. +checksum is used, and thus any checksum that TX Checksum Offload supports. 
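The cancellation property just stated can be spot-checked with a small standalone sketch; the toy byte buffer and the helper name are invented for illustration, and real kernel code relies on lco_csum() rather than anything like this. The +0/-0 ambiguity of ones-complement arithmetic is ignored here::

  #include <stdint.h>
  #include <stddef.h>

  /*
   * Toy self-check: for a segment whose checksum field already holds the
   * correct value (pseudo-header included), the ones-complement sum over
   * the whole segment equals the complement of the pseudo-header sum.
   */
  int lco_property_holds(const uint8_t *seg, size_t len, uint16_t pseudo_hdr_sum)
  {
          uint32_t sum = 0;
          size_t i;

          for (i = 0; i + 1 < len; i += 2)
                  sum += (uint32_t)((seg[i] << 8) | seg[i + 1]);
          if (i < len)
                  sum += (uint32_t)(seg[i] << 8);   /* odd trailing byte */
          while (sum >> 16)
                  sum = (sum & 0xffff) + (sum >> 16);

          return (uint16_t)sum == (uint16_t)~pseudo_hdr_sum;
  }

The paragraphs that follow build the actual LCO outer-checksum computation on exactly this property.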
+ That is, if we have set up TX Checksum Offload with a start/offset pair, we - know that after the device has filled in that checksum, the ones - complement sum from csum_start to the end of the packet will be equal to - the complement of whatever value we put in the checksum field beforehand. - This allows us to compute the outer checksum without looking at the payload: - we simply stop summing when we get to csum_start, then add the complement of - the 16-bit word at (csum_start + csum_offset). +know that after the device has filled in that checksum, the ones complement sum +from csum_start to the end of the packet will be equal to the complement of +whatever value we put in the checksum field beforehand. This allows us to +compute the outer checksum without looking at the payload: we simply stop +summing when we get to csum_start, then add the complement of the 16-bit word +at (csum_start + csum_offset). + Then, when the true inner checksum is filled in (either by hardware or by - skb_checksum_help()), the outer checksum will become correct by virtue of - the arithmetic. +skb_checksum_help()), the outer checksum will become correct by virtue of the +arithmetic. LCO is performed by the stack when constructing an outer UDP header for an - encapsulation such as VXLAN or GENEVE, in udp_set_csum(). Similarly for - the IPv6 equivalents, in udp6_set_csum(). +encapsulation such as VXLAN or GENEVE, in udp_set_csum(). Similarly for the +IPv6 equivalents, in udp6_set_csum(). + It is also performed when constructing an IPv4 GRE header, in - net/ipv4/ip_gre.c:build_header(). It is *not* currently performed when - constructing an IPv6 GRE header; the GRE checksum is computed over the - whole packet in net/ipv6/ip6_gre.c:ip6gre_xmit2(), but it should be - possible to use LCO here as IPv6 GRE still uses an IP-style checksum. +net/ipv4/ip_gre.c:build_header(). It is *not* currently performed when +constructing an IPv6 GRE header; the GRE checksum is computed over the whole +packet in net/ipv6/ip6_gre.c:ip6gre_xmit2(), but it should be possible to use +LCO here as IPv6 GRE still uses an IP-style checksum. + All of the LCO implementations use a helper function lco_csum(), in - include/linux/skbuff.h. +include/linux/skbuff.h. LCO can safely be used for nested encapsulations; in this case, the outer - encapsulation layer will sum over both its own header and the 'middle' - header. This does mean that the 'middle' header will get summed multiple - times, but there doesn't seem to be a way to avoid that without incurring - bigger costs (e.g. in SKB bloat). +encapsulation layer will sum over both its own header and the 'middle' header. +This does mean that the 'middle' header will get summed multiple times, but +there doesn't seem to be a way to avoid that without incurring bigger costs +(e.g. in SKB bloat). RCO: Remote Checksum Offload ============================ -RCO is a technique for eliding the inner checksum of an encapsulated - datagram, allowing the outer checksum to be offloaded. It does, however, - involve a change to the encapsulation protocols, which the receiver must - also support. For this reason, it is disabled by default. +RCO is a technique for eliding the inner checksum of an encapsulated datagram, +allowing the outer checksum to be offloaded. It does, however, involve a +change to the encapsulation protocols, which the receiver must also support. +For this reason, it is disabled by default. 
+ RCO is detailed in the following Internet-Drafts: -https://tools.ietf.org/html/draft-herbert-remotecsumoffload-00 -https://tools.ietf.org/html/draft-herbert-vxlan-rco-00 -In Linux, RCO is implemented individually in each encapsulation protocol, - and most tunnel types have flags controlling its use. For instance, VXLAN - has the flag VXLAN_F_REMCSUM_TX (per struct vxlan_rdst) to indicate that - RCO should be used when transmitting to a given remote destination. + +* https://tools.ietf.org/html/draft-herbert-remotecsumoffload-00 +* https://tools.ietf.org/html/draft-herbert-vxlan-rco-00 + +In Linux, RCO is implemented individually in each encapsulation protocol, and +most tunnel types have flags controlling its use. For instance, VXLAN has the +flag VXLAN_F_REMCSUM_TX (per struct vxlan_rdst) to indicate that RCO should be +used when transmitting to a given remote destination. diff --git a/Documentation/networking/segmentation-offloads.txt b/Documentation/networking/segmentation-offloads.txt index aca542ec125c..1794bfe98196 100644 --- a/Documentation/networking/segmentation-offloads.txt +++ b/Documentation/networking/segmentation-offloads.txt @@ -1,4 +1,9 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=================================================== Segmentation Offloads in the Linux Networking Stack +=================================================== + Introduction ============ @@ -15,6 +20,7 @@ The following technologies are described: * Partial Generic Segmentation Offload - GSO_PARTIAL * SCTP accelleration with GSO - GSO_BY_FRAGS + TCP Segmentation Offload ======================== @@ -42,6 +48,7 @@ NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when performing TSO and we will either increment the IP ID for all frames, or leave it at a static value based on driver preference. + UDP Fragmentation Offload ========================= @@ -54,6 +61,7 @@ UFO is deprecated: modern kernels will no longer generate UFO skbs, but can still receive them from tuntap and similar devices. Offload of UDP-based tunnel protocols is still supported. + IPIP, SIT, GRE, UDP Tunnel, and Remote Checksum Offloads ======================================================== @@ -71,17 +79,19 @@ refer to the tunnel headers as the outer headers, while the encapsulated data is normally referred to as the inner headers. Below is the list of calls to access the given headers: -IPIP/SIT Tunnel: - Outer Inner -MAC skb_mac_header -Network skb_network_header skb_inner_network_header -Transport skb_transport_header +IPIP/SIT Tunnel:: + + Outer Inner + MAC skb_mac_header + Network skb_network_header skb_inner_network_header + Transport skb_transport_header -UDP/GRE Tunnel: - Outer Inner -MAC skb_mac_header skb_inner_mac_header -Network skb_network_header skb_inner_network_header -Transport skb_transport_header skb_inner_transport_header +UDP/GRE Tunnel:: + + Outer Inner + MAC skb_mac_header skb_inner_mac_header + Network skb_network_header skb_inner_network_header + Transport skb_transport_header skb_inner_transport_header In addition to the above tunnel types there are also SKB_GSO_GRE_CSUM and SKB_GSO_UDP_TUNNEL_CSUM. These two additional tunnel types reflect the @@ -93,6 +103,7 @@ header has requested a remote checksum offload. In this case the inner headers will be left with a partial checksum and only the outer header checksum will be computed. 
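As a small hedged sketch of how the gso_type flags named above are typically tested: the wrapper function here is invented for illustration, while ``skb_is_gso()``, ``skb_shinfo()`` and the ``SKB_GSO_*`` flags are the existing <linux/skbuff.h> interfaces::

  #include <linux/skbuff.h>

  /* True if this GSO skb uses a tunnel type whose outer header also
   * carries a checksum (SKB_GSO_GRE_CSUM or SKB_GSO_UDP_TUNNEL_CSUM).
   */
  static bool example_tunnel_wants_outer_csum(const struct sk_buff *skb)
  {
          if (!skb_is_gso(skb))
                  return false;

          return skb_shinfo(skb)->gso_type &
                 (SKB_GSO_GRE_CSUM | SKB_GSO_UDP_TUNNEL_CSUM);
  }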
+ Generic Segmentation Offload ============================ @@ -106,6 +117,7 @@ Before enabling any hardware segmentation offload a corresponding software offload is required in GSO. Otherwise it becomes possible for a frame to be re-routed between devices and end up being unable to be transmitted. + Generic Receive Offload ======================= @@ -117,6 +129,7 @@ this is IPv4 ID in the case that the DF bit is set for a given IP header. If the value of the IPv4 ID is not sequentially incrementing it will be altered so that it is when a frame assembled via GRO is segmented via GSO. + Partial Generic Segmentation Offload ==================================== @@ -134,6 +147,7 @@ is the outer IPv4 ID field. It is up to the device drivers to guarantee that the IPv4 ID field is incremented in the case that a given header does not have the DF bit set. + SCTP accelleration with GSO =========================== @@ -157,14 +171,14 @@ appropriately. There are some helpers to make this easier: - - skb_is_gso(skb) && skb_is_gso_sctp(skb) is the best way to see if - an skb is an SCTP GSO skb. +- skb_is_gso(skb) && skb_is_gso_sctp(skb) is the best way to see if + an skb is an SCTP GSO skb. - - For size checks, the skb_gso_validate_*_len family of helpers correctly - considers GSO_BY_FRAGS. +- For size checks, the skb_gso_validate_*_len family of helpers correctly + considers GSO_BY_FRAGS. - - For manipulating packets, skb_increase_gso_size and skb_decrease_gso_size - will check for GSO_BY_FRAGS and WARN if asked to manipulate these skbs. +- For manipulating packets, skb_increase_gso_size and skb_decrease_gso_size + will check for GSO_BY_FRAGS and WARN if asked to manipulate these skbs. This also affects drivers with the NETIF_F_FRAGLIST & NETIF_F_GSO_SCTP bits set. Note also that NETIF_F_GSO_SCTP is included in NETIF_F_GSO_SOFTWARE. -- cgit v1.2.3-59-g8ed1b From d0dcde6426ce071ad447fb9d91c85ab649026114 Mon Sep 17 00:00:00 2001 From: Otto Sabart Date: Sun, 6 Jan 2019 00:29:15 +0100 Subject: doc: networking: convert offload files into RST and update references This patch renames offload files. This is necessary for Sphinx. Also update reference to checksum-offloads.rst file. Whole kernel code was grepped for references using: $ grep -r "\(segmentation\|checksum\)-offloads.txt" . There should be no other references to {segmentation,checksum}-offloads.txt files. Signed-off-by: Otto Sabart Acked-by: David S. Miller Signed-off-by: Jonathan Corbet --- Documentation/networking/checksum-offloads.rst | 143 ++++++++++++++++ Documentation/networking/checksum-offloads.txt | 143 ---------------- Documentation/networking/segmentation-offloads.rst | 184 +++++++++++++++++++++ Documentation/networking/segmentation-offloads.txt | 184 --------------------- include/linux/skbuff.h | 2 +- 5 files changed, 328 insertions(+), 328 deletions(-) create mode 100644 Documentation/networking/checksum-offloads.rst delete mode 100644 Documentation/networking/checksum-offloads.txt create mode 100644 Documentation/networking/segmentation-offloads.rst delete mode 100644 Documentation/networking/segmentation-offloads.txt diff --git a/Documentation/networking/checksum-offloads.rst b/Documentation/networking/checksum-offloads.rst new file mode 100644 index 000000000000..1a1cd94a3f6d --- /dev/null +++ b/Documentation/networking/checksum-offloads.rst @@ -0,0 +1,143 @@ +.. 
SPDX-License-Identifier: GPL-2.0 + +=============================================== +Checksum Offloads in the Linux Networking Stack +=============================================== + + +Introduction +============ + +This document describes a set of techniques in the Linux networking stack to +take advantage of checksum offload capabilities of various NICs. + +The following technologies are described: + +* TX Checksum Offload +* LCO: Local Checksum Offload +* RCO: Remote Checksum Offload + +Things that should be documented here but aren't yet: + +* RX Checksum Offload +* CHECKSUM_UNNECESSARY conversion + + +TX Checksum Offload +=================== + +The interface for offloading a transmit checksum to a device is explained in +detail in comments near the top of include/linux/skbuff.h. + +In brief, it allows to request the device fill in a single ones-complement +checksum defined by the sk_buff fields skb->csum_start and skb->csum_offset. +The device should compute the 16-bit ones-complement checksum (i.e. the +'IP-style' checksum) from csum_start to the end of the packet, and fill in the +result at (csum_start + csum_offset). + +Because csum_offset cannot be negative, this ensures that the previous value of +the checksum field is included in the checksum computation, thus it can be used +to supply any needed corrections to the checksum (such as the sum of the +pseudo-header for UDP or TCP). + +This interface only allows a single checksum to be offloaded. Where +encapsulation is used, the packet may have multiple checksum fields in +different header layers, and the rest will have to be handled by another +mechanism such as LCO or RCO. + +CRC32c can also be offloaded using this interface, by means of filling +skb->csum_start and skb->csum_offset as described above, and setting +skb->csum_not_inet: see skbuff.h comment (section 'D') for more details. + +No offloading of the IP header checksum is performed; it is always done in +software. This is OK because when we build the IP header, we obviously have it +in cache, so summing it isn't expensive. It's also rather short. + +The requirements for GSO are more complicated, because when segmenting an +encapsulated packet both the inner and outer checksums may need to be edited or +recomputed for each resulting segment. See the skbuff.h comment (section 'E') +for more details. + +A driver declares its offload capabilities in netdev->hw_features; see +Documentation/networking/netdev-features.txt for more. Note that a device +which only advertises NETIF_F_IP[V6]_CSUM must still obey the csum_start and +csum_offset given in the SKB; if it tries to deduce these itself in hardware +(as some NICs do) the driver should check that the values in the SKB match +those which the hardware will deduce, and if not, fall back to checksumming in +software instead (with skb_csum_hwoffload_help() or one of the +skb_checksum_help() / skb_crc32c_csum_help functions, as mentioned in +include/linux/skbuff.h). + +The stack should, for the most part, assume that checksum offload is supported +by the underlying device. The only place that should check is +validate_xmit_skb(), and the functions it calls directly or indirectly. That +function compares the offload features requested by the SKB (which may include +other offloads besides TX Checksum Offload) and, if they are not supported or +enabled on the device (determined by netdev->features), performs the +corresponding offload in software. 
In the case of TX Checksum Offload, that +means calling skb_csum_hwoffload_help(skb, features). + + +LCO: Local Checksum Offload +=========================== + +LCO is a technique for efficiently computing the outer checksum of an +encapsulated datagram when the inner checksum is due to be offloaded. + +The ones-complement sum of a correctly checksummed TCP or UDP packet is equal +to the complement of the sum of the pseudo header, because everything else gets +'cancelled out' by the checksum field. This is because the sum was +complemented before being written to the checksum field. + +More generally, this holds in any case where the 'IP-style' ones complement +checksum is used, and thus any checksum that TX Checksum Offload supports. + +That is, if we have set up TX Checksum Offload with a start/offset pair, we +know that after the device has filled in that checksum, the ones complement sum +from csum_start to the end of the packet will be equal to the complement of +whatever value we put in the checksum field beforehand. This allows us to +compute the outer checksum without looking at the payload: we simply stop +summing when we get to csum_start, then add the complement of the 16-bit word +at (csum_start + csum_offset). + +Then, when the true inner checksum is filled in (either by hardware or by +skb_checksum_help()), the outer checksum will become correct by virtue of the +arithmetic. + +LCO is performed by the stack when constructing an outer UDP header for an +encapsulation such as VXLAN or GENEVE, in udp_set_csum(). Similarly for the +IPv6 equivalents, in udp6_set_csum(). + +It is also performed when constructing an IPv4 GRE header, in +net/ipv4/ip_gre.c:build_header(). It is *not* currently performed when +constructing an IPv6 GRE header; the GRE checksum is computed over the whole +packet in net/ipv6/ip6_gre.c:ip6gre_xmit2(), but it should be possible to use +LCO here as IPv6 GRE still uses an IP-style checksum. + +All of the LCO implementations use a helper function lco_csum(), in +include/linux/skbuff.h. + +LCO can safely be used for nested encapsulations; in this case, the outer +encapsulation layer will sum over both its own header and the 'middle' header. +This does mean that the 'middle' header will get summed multiple times, but +there doesn't seem to be a way to avoid that without incurring bigger costs +(e.g. in SKB bloat). + + +RCO: Remote Checksum Offload +============================ + +RCO is a technique for eliding the inner checksum of an encapsulated datagram, +allowing the outer checksum to be offloaded. It does, however, involve a +change to the encapsulation protocols, which the receiver must also support. +For this reason, it is disabled by default. + +RCO is detailed in the following Internet-Drafts: + +* https://tools.ietf.org/html/draft-herbert-remotecsumoffload-00 +* https://tools.ietf.org/html/draft-herbert-vxlan-rco-00 + +In Linux, RCO is implemented individually in each encapsulation protocol, and +most tunnel types have flags controlling its use. For instance, VXLAN has the +flag VXLAN_F_REMCSUM_TX (per struct vxlan_rdst) to indicate that RCO should be +used when transmitting to a given remote destination. diff --git a/Documentation/networking/checksum-offloads.txt b/Documentation/networking/checksum-offloads.txt deleted file mode 100644 index 1a1cd94a3f6d..000000000000 --- a/Documentation/networking/checksum-offloads.txt +++ /dev/null @@ -1,143 +0,0 @@ -.. 
SPDX-License-Identifier: GPL-2.0 - -=============================================== -Checksum Offloads in the Linux Networking Stack -=============================================== - - -Introduction -============ - -This document describes a set of techniques in the Linux networking stack to -take advantage of checksum offload capabilities of various NICs. - -The following technologies are described: - -* TX Checksum Offload -* LCO: Local Checksum Offload -* RCO: Remote Checksum Offload - -Things that should be documented here but aren't yet: - -* RX Checksum Offload -* CHECKSUM_UNNECESSARY conversion - - -TX Checksum Offload -=================== - -The interface for offloading a transmit checksum to a device is explained in -detail in comments near the top of include/linux/skbuff.h. - -In brief, it allows to request the device fill in a single ones-complement -checksum defined by the sk_buff fields skb->csum_start and skb->csum_offset. -The device should compute the 16-bit ones-complement checksum (i.e. the -'IP-style' checksum) from csum_start to the end of the packet, and fill in the -result at (csum_start + csum_offset). - -Because csum_offset cannot be negative, this ensures that the previous value of -the checksum field is included in the checksum computation, thus it can be used -to supply any needed corrections to the checksum (such as the sum of the -pseudo-header for UDP or TCP). - -This interface only allows a single checksum to be offloaded. Where -encapsulation is used, the packet may have multiple checksum fields in -different header layers, and the rest will have to be handled by another -mechanism such as LCO or RCO. - -CRC32c can also be offloaded using this interface, by means of filling -skb->csum_start and skb->csum_offset as described above, and setting -skb->csum_not_inet: see skbuff.h comment (section 'D') for more details. - -No offloading of the IP header checksum is performed; it is always done in -software. This is OK because when we build the IP header, we obviously have it -in cache, so summing it isn't expensive. It's also rather short. - -The requirements for GSO are more complicated, because when segmenting an -encapsulated packet both the inner and outer checksums may need to be edited or -recomputed for each resulting segment. See the skbuff.h comment (section 'E') -for more details. - -A driver declares its offload capabilities in netdev->hw_features; see -Documentation/networking/netdev-features.txt for more. Note that a device -which only advertises NETIF_F_IP[V6]_CSUM must still obey the csum_start and -csum_offset given in the SKB; if it tries to deduce these itself in hardware -(as some NICs do) the driver should check that the values in the SKB match -those which the hardware will deduce, and if not, fall back to checksumming in -software instead (with skb_csum_hwoffload_help() or one of the -skb_checksum_help() / skb_crc32c_csum_help functions, as mentioned in -include/linux/skbuff.h). - -The stack should, for the most part, assume that checksum offload is supported -by the underlying device. The only place that should check is -validate_xmit_skb(), and the functions it calls directly or indirectly. That -function compares the offload features requested by the SKB (which may include -other offloads besides TX Checksum Offload) and, if they are not supported or -enabled on the device (determined by netdev->features), performs the -corresponding offload in software. 
In the case of TX Checksum Offload, that -means calling skb_csum_hwoffload_help(skb, features). - - -LCO: Local Checksum Offload -=========================== - -LCO is a technique for efficiently computing the outer checksum of an -encapsulated datagram when the inner checksum is due to be offloaded. - -The ones-complement sum of a correctly checksummed TCP or UDP packet is equal -to the complement of the sum of the pseudo header, because everything else gets -'cancelled out' by the checksum field. This is because the sum was -complemented before being written to the checksum field. - -More generally, this holds in any case where the 'IP-style' ones complement -checksum is used, and thus any checksum that TX Checksum Offload supports. - -That is, if we have set up TX Checksum Offload with a start/offset pair, we -know that after the device has filled in that checksum, the ones complement sum -from csum_start to the end of the packet will be equal to the complement of -whatever value we put in the checksum field beforehand. This allows us to -compute the outer checksum without looking at the payload: we simply stop -summing when we get to csum_start, then add the complement of the 16-bit word -at (csum_start + csum_offset). - -Then, when the true inner checksum is filled in (either by hardware or by -skb_checksum_help()), the outer checksum will become correct by virtue of the -arithmetic. - -LCO is performed by the stack when constructing an outer UDP header for an -encapsulation such as VXLAN or GENEVE, in udp_set_csum(). Similarly for the -IPv6 equivalents, in udp6_set_csum(). - -It is also performed when constructing an IPv4 GRE header, in -net/ipv4/ip_gre.c:build_header(). It is *not* currently performed when -constructing an IPv6 GRE header; the GRE checksum is computed over the whole -packet in net/ipv6/ip6_gre.c:ip6gre_xmit2(), but it should be possible to use -LCO here as IPv6 GRE still uses an IP-style checksum. - -All of the LCO implementations use a helper function lco_csum(), in -include/linux/skbuff.h. - -LCO can safely be used for nested encapsulations; in this case, the outer -encapsulation layer will sum over both its own header and the 'middle' header. -This does mean that the 'middle' header will get summed multiple times, but -there doesn't seem to be a way to avoid that without incurring bigger costs -(e.g. in SKB bloat). - - -RCO: Remote Checksum Offload -============================ - -RCO is a technique for eliding the inner checksum of an encapsulated datagram, -allowing the outer checksum to be offloaded. It does, however, involve a -change to the encapsulation protocols, which the receiver must also support. -For this reason, it is disabled by default. - -RCO is detailed in the following Internet-Drafts: - -* https://tools.ietf.org/html/draft-herbert-remotecsumoffload-00 -* https://tools.ietf.org/html/draft-herbert-vxlan-rco-00 - -In Linux, RCO is implemented individually in each encapsulation protocol, and -most tunnel types have flags controlling its use. For instance, VXLAN has the -flag VXLAN_F_REMCSUM_TX (per struct vxlan_rdst) to indicate that RCO should be -used when transmitting to a given remote destination. diff --git a/Documentation/networking/segmentation-offloads.rst b/Documentation/networking/segmentation-offloads.rst new file mode 100644 index 000000000000..1794bfe98196 --- /dev/null +++ b/Documentation/networking/segmentation-offloads.rst @@ -0,0 +1,184 @@ +.. 
SPDX-License-Identifier: GPL-2.0 + +=================================================== +Segmentation Offloads in the Linux Networking Stack +=================================================== + + +Introduction +============ + +This document describes a set of techniques in the Linux networking stack +to take advantage of segmentation offload capabilities of various NICs. + +The following technologies are described: + * TCP Segmentation Offload - TSO + * UDP Fragmentation Offload - UFO + * IPIP, SIT, GRE, and UDP Tunnel Offloads + * Generic Segmentation Offload - GSO + * Generic Receive Offload - GRO + * Partial Generic Segmentation Offload - GSO_PARTIAL + * SCTP accelleration with GSO - GSO_BY_FRAGS + + +TCP Segmentation Offload +======================== + +TCP segmentation allows a device to segment a single frame into multiple +frames with a data payload size specified in skb_shinfo()->gso_size. +When TCP segmentation requested the bit for either SKB_GSO_TCPV4 or +SKB_GSO_TCPV6 should be set in skb_shinfo()->gso_type and +skb_shinfo()->gso_size should be set to a non-zero value. + +TCP segmentation is dependent on support for the use of partial checksum +offload. For this reason TSO is normally disabled if the Tx checksum +offload for a given device is disabled. + +In order to support TCP segmentation offload it is necessary to populate +the network and transport header offsets of the skbuff so that the device +drivers will be able determine the offsets of the IP or IPv6 header and the +TCP header. In addition as CHECKSUM_PARTIAL is required csum_start should +also point to the TCP header of the packet. + +For IPv4 segmentation we support one of two types in terms of the IP ID. +The default behavior is to increment the IP ID with every segment. If the +GSO type SKB_GSO_TCP_FIXEDID is specified then we will not increment the IP +ID and all segments will use the same IP ID. If a device has +NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when performing TSO +and we will either increment the IP ID for all frames, or leave it at a +static value based on driver preference. + + +UDP Fragmentation Offload +========================= + +UDP fragmentation offload allows a device to fragment an oversized UDP +datagram into multiple IPv4 fragments. Many of the requirements for UDP +fragmentation offload are the same as TSO. However the IPv4 ID for +fragments should not increment as a single IPv4 datagram is fragmented. + +UFO is deprecated: modern kernels will no longer generate UFO skbs, but can +still receive them from tuntap and similar devices. Offload of UDP-based +tunnel protocols is still supported. + + +IPIP, SIT, GRE, UDP Tunnel, and Remote Checksum Offloads +======================================================== + +In addition to the offloads described above it is possible for a frame to +contain additional headers such as an outer tunnel. In order to account +for such instances an additional set of segmentation offload types were +introduced including SKB_GSO_IPXIP4, SKB_GSO_IPXIP6, SKB_GSO_GRE, and +SKB_GSO_UDP_TUNNEL. These extra segmentation types are used to identify +cases where there are more than just 1 set of headers. For example in the +case of IPIP and SIT we should have the network and transport headers moved +from the standard list of headers to "inner" header offsets. + +Currently only two levels of headers are supported. 
The convention is to +refer to the tunnel headers as the outer headers, while the encapsulated +data is normally referred to as the inner headers. Below is the list of +calls to access the given headers: + +IPIP/SIT Tunnel:: + + Outer Inner + MAC skb_mac_header + Network skb_network_header skb_inner_network_header + Transport skb_transport_header + +UDP/GRE Tunnel:: + + Outer Inner + MAC skb_mac_header skb_inner_mac_header + Network skb_network_header skb_inner_network_header + Transport skb_transport_header skb_inner_transport_header + +In addition to the above tunnel types there are also SKB_GSO_GRE_CSUM and +SKB_GSO_UDP_TUNNEL_CSUM. These two additional tunnel types reflect the +fact that the outer header also requests to have a non-zero checksum +included in the outer header. + +Finally there is SKB_GSO_TUNNEL_REMCSUM which indicates that a given tunnel +header has requested a remote checksum offload. In this case the inner +headers will be left with a partial checksum and only the outer header +checksum will be computed. + + +Generic Segmentation Offload +============================ + +Generic segmentation offload is a pure software offload that is meant to +deal with cases where device drivers cannot perform the offloads described +above. What occurs in GSO is that a given skbuff will have its data broken +out over multiple skbuffs that have been resized to match the MSS provided +via skb_shinfo()->gso_size. + +Before enabling any hardware segmentation offload a corresponding software +offload is required in GSO. Otherwise it becomes possible for a frame to +be re-routed between devices and end up being unable to be transmitted. + + +Generic Receive Offload +======================= + +Generic receive offload is the complement to GSO. Ideally any frame +assembled by GRO should be segmented to create an identical sequence of +frames using GSO, and any sequence of frames segmented by GSO should be +able to be reassembled back to the original by GRO. The only exception to +this is IPv4 ID in the case that the DF bit is set for a given IP header. +If the value of the IPv4 ID is not sequentially incrementing it will be +altered so that it is when a frame assembled via GRO is segmented via GSO. + + +Partial Generic Segmentation Offload +==================================== + +Partial generic segmentation offload is a hybrid between TSO and GSO. What +it effectively does is take advantage of certain traits of TCP and tunnels +so that instead of having to rewrite the packet headers for each segment +only the inner-most transport header and possibly the outer-most network +header need to be updated. This allows devices that do not support tunnel +offloads or tunnel offloads with checksum to still make use of segmentation. + +With the partial offload what occurs is that all headers excluding the +inner transport header are updated such that they will contain the correct +values for if the header was simply duplicated. The one exception to this +is the outer IPv4 ID field. It is up to the device drivers to guarantee +that the IPv4 ID field is incremented in the case that a given header does +not have the DF bit set. + + +SCTP accelleration with GSO +=========================== + +SCTP - despite the lack of hardware support - can still take advantage of +GSO to pass one large packet through the network stack, rather than +multiple small packets. + +This requires a different approach to other offloads, as SCTP packets +cannot be just segmented to (P)MTU. 
Rather, the chunks must be contained in +IP segments, padding respected. So unlike regular GSO, SCTP can't just +generate a big skb, set gso_size to the fragmentation point and deliver it +to IP layer. + +Instead, the SCTP protocol layer builds an skb with the segments correctly +padded and stored as chained skbs, and skb_segment() splits based on those. +To signal this, gso_size is set to the special value GSO_BY_FRAGS. + +Therefore, any code in the core networking stack must be aware of the +possibility that gso_size will be GSO_BY_FRAGS and handle that case +appropriately. + +There are some helpers to make this easier: + +- skb_is_gso(skb) && skb_is_gso_sctp(skb) is the best way to see if + an skb is an SCTP GSO skb. + +- For size checks, the skb_gso_validate_*_len family of helpers correctly + considers GSO_BY_FRAGS. + +- For manipulating packets, skb_increase_gso_size and skb_decrease_gso_size + will check for GSO_BY_FRAGS and WARN if asked to manipulate these skbs. + +This also affects drivers with the NETIF_F_FRAGLIST & NETIF_F_GSO_SCTP bits +set. Note also that NETIF_F_GSO_SCTP is included in NETIF_F_GSO_SOFTWARE. diff --git a/Documentation/networking/segmentation-offloads.txt b/Documentation/networking/segmentation-offloads.txt deleted file mode 100644 index 1794bfe98196..000000000000 --- a/Documentation/networking/segmentation-offloads.txt +++ /dev/null @@ -1,184 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 - -=================================================== -Segmentation Offloads in the Linux Networking Stack -=================================================== - - -Introduction -============ - -This document describes a set of techniques in the Linux networking stack -to take advantage of segmentation offload capabilities of various NICs. - -The following technologies are described: - * TCP Segmentation Offload - TSO - * UDP Fragmentation Offload - UFO - * IPIP, SIT, GRE, and UDP Tunnel Offloads - * Generic Segmentation Offload - GSO - * Generic Receive Offload - GRO - * Partial Generic Segmentation Offload - GSO_PARTIAL - * SCTP accelleration with GSO - GSO_BY_FRAGS - - -TCP Segmentation Offload -======================== - -TCP segmentation allows a device to segment a single frame into multiple -frames with a data payload size specified in skb_shinfo()->gso_size. -When TCP segmentation requested the bit for either SKB_GSO_TCPV4 or -SKB_GSO_TCPV6 should be set in skb_shinfo()->gso_type and -skb_shinfo()->gso_size should be set to a non-zero value. - -TCP segmentation is dependent on support for the use of partial checksum -offload. For this reason TSO is normally disabled if the Tx checksum -offload for a given device is disabled. - -In order to support TCP segmentation offload it is necessary to populate -the network and transport header offsets of the skbuff so that the device -drivers will be able determine the offsets of the IP or IPv6 header and the -TCP header. In addition as CHECKSUM_PARTIAL is required csum_start should -also point to the TCP header of the packet. - -For IPv4 segmentation we support one of two types in terms of the IP ID. -The default behavior is to increment the IP ID with every segment. If the -GSO type SKB_GSO_TCP_FIXEDID is specified then we will not increment the IP -ID and all segments will use the same IP ID. If a device has -NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when performing TSO -and we will either increment the IP ID for all frames, or leave it at a -static value based on driver preference. 
- - -UDP Fragmentation Offload -========================= - -UDP fragmentation offload allows a device to fragment an oversized UDP -datagram into multiple IPv4 fragments. Many of the requirements for UDP -fragmentation offload are the same as TSO. However the IPv4 ID for -fragments should not increment as a single IPv4 datagram is fragmented. - -UFO is deprecated: modern kernels will no longer generate UFO skbs, but can -still receive them from tuntap and similar devices. Offload of UDP-based -tunnel protocols is still supported. - - -IPIP, SIT, GRE, UDP Tunnel, and Remote Checksum Offloads -======================================================== - -In addition to the offloads described above it is possible for a frame to -contain additional headers such as an outer tunnel. In order to account -for such instances an additional set of segmentation offload types were -introduced including SKB_GSO_IPXIP4, SKB_GSO_IPXIP6, SKB_GSO_GRE, and -SKB_GSO_UDP_TUNNEL. These extra segmentation types are used to identify -cases where there are more than just 1 set of headers. For example in the -case of IPIP and SIT we should have the network and transport headers moved -from the standard list of headers to "inner" header offsets. - -Currently only two levels of headers are supported. The convention is to -refer to the tunnel headers as the outer headers, while the encapsulated -data is normally referred to as the inner headers. Below is the list of -calls to access the given headers: - -IPIP/SIT Tunnel:: - - Outer Inner - MAC skb_mac_header - Network skb_network_header skb_inner_network_header - Transport skb_transport_header - -UDP/GRE Tunnel:: - - Outer Inner - MAC skb_mac_header skb_inner_mac_header - Network skb_network_header skb_inner_network_header - Transport skb_transport_header skb_inner_transport_header - -In addition to the above tunnel types there are also SKB_GSO_GRE_CSUM and -SKB_GSO_UDP_TUNNEL_CSUM. These two additional tunnel types reflect the -fact that the outer header also requests to have a non-zero checksum -included in the outer header. - -Finally there is SKB_GSO_TUNNEL_REMCSUM which indicates that a given tunnel -header has requested a remote checksum offload. In this case the inner -headers will be left with a partial checksum and only the outer header -checksum will be computed. - - -Generic Segmentation Offload -============================ - -Generic segmentation offload is a pure software offload that is meant to -deal with cases where device drivers cannot perform the offloads described -above. What occurs in GSO is that a given skbuff will have its data broken -out over multiple skbuffs that have been resized to match the MSS provided -via skb_shinfo()->gso_size. - -Before enabling any hardware segmentation offload a corresponding software -offload is required in GSO. Otherwise it becomes possible for a frame to -be re-routed between devices and end up being unable to be transmitted. - - -Generic Receive Offload -======================= - -Generic receive offload is the complement to GSO. Ideally any frame -assembled by GRO should be segmented to create an identical sequence of -frames using GSO, and any sequence of frames segmented by GSO should be -able to be reassembled back to the original by GRO. The only exception to -this is IPv4 ID in the case that the DF bit is set for a given IP header. -If the value of the IPv4 ID is not sequentially incrementing it will be -altered so that it is when a frame assembled via GRO is segmented via GSO. 
- - -Partial Generic Segmentation Offload -==================================== - -Partial generic segmentation offload is a hybrid between TSO and GSO. What -it effectively does is take advantage of certain traits of TCP and tunnels -so that instead of having to rewrite the packet headers for each segment -only the inner-most transport header and possibly the outer-most network -header need to be updated. This allows devices that do not support tunnel -offloads or tunnel offloads with checksum to still make use of segmentation. - -With the partial offload what occurs is that all headers excluding the -inner transport header are updated such that they will contain the correct -values for if the header was simply duplicated. The one exception to this -is the outer IPv4 ID field. It is up to the device drivers to guarantee -that the IPv4 ID field is incremented in the case that a given header does -not have the DF bit set. - - -SCTP accelleration with GSO -=========================== - -SCTP - despite the lack of hardware support - can still take advantage of -GSO to pass one large packet through the network stack, rather than -multiple small packets. - -This requires a different approach to other offloads, as SCTP packets -cannot be just segmented to (P)MTU. Rather, the chunks must be contained in -IP segments, padding respected. So unlike regular GSO, SCTP can't just -generate a big skb, set gso_size to the fragmentation point and deliver it -to IP layer. - -Instead, the SCTP protocol layer builds an skb with the segments correctly -padded and stored as chained skbs, and skb_segment() splits based on those. -To signal this, gso_size is set to the special value GSO_BY_FRAGS. - -Therefore, any code in the core networking stack must be aware of the -possibility that gso_size will be GSO_BY_FRAGS and handle that case -appropriately. - -There are some helpers to make this easier: - -- skb_is_gso(skb) && skb_is_gso_sctp(skb) is the best way to see if - an skb is an SCTP GSO skb. - -- For size checks, the skb_gso_validate_*_len family of helpers correctly - considers GSO_BY_FRAGS. - -- For manipulating packets, skb_increase_gso_size and skb_decrease_gso_size - will check for GSO_BY_FRAGS and WARN if asked to manipulate these skbs. - -This also affects drivers with the NETIF_F_FRAGLIST & NETIF_F_GSO_SCTP bits -set. Note also that NETIF_F_GSO_SCTP is included in NETIF_F_GSO_SOFTWARE. diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 93f56fddd92a..4e671b46e767 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -4296,7 +4296,7 @@ static inline bool skb_head_is_locked(const struct sk_buff *skb) /* Local Checksum Offload. * Compute outer checksum based on the assumption that the * inner checksum will be offloaded later. - * See Documentation/networking/checksum-offloads.txt for + * See Documentation/networking/checksum-offloads.rst for * explanation of how this works. * Fill in outer checksum adjustment (e.g. with sum of outer * pseudo-header) before calling. -- cgit v1.2.3-59-g8ed1b From b83eb68cb939e4486c7792f26cacab9ce7a83dc4 Mon Sep 17 00:00:00 2001 From: Otto Sabart Date: Sun, 6 Jan 2019 00:29:28 +0100 Subject: doc: networking: shorten the main title in offloads documents The titles do not look very nice in the table of contents generated by Sphinx. I also think it is obvious that the documents are describing offloads in the Linux Networking Stack. Signed-off-by: Otto Sabart Acked-by: David S. 
Miller Signed-off-by: Jonathan Corbet --- Documentation/networking/checksum-offloads.rst | 6 +++--- Documentation/networking/segmentation-offloads.rst | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/Documentation/networking/checksum-offloads.rst b/Documentation/networking/checksum-offloads.rst index 1a1cd94a3f6d..905c8a84b103 100644 --- a/Documentation/networking/checksum-offloads.rst +++ b/Documentation/networking/checksum-offloads.rst @@ -1,8 +1,8 @@ .. SPDX-License-Identifier: GPL-2.0 -=============================================== -Checksum Offloads in the Linux Networking Stack -=============================================== +================= +Checksum Offloads +================= Introduction diff --git a/Documentation/networking/segmentation-offloads.rst b/Documentation/networking/segmentation-offloads.rst index 1794bfe98196..89d1ee933e9f 100644 --- a/Documentation/networking/segmentation-offloads.rst +++ b/Documentation/networking/segmentation-offloads.rst @@ -1,8 +1,8 @@ .. SPDX-License-Identifier: GPL-2.0 -=================================================== -Segmentation Offloads in the Linux Networking Stack -=================================================== +===================== +Segmentation Offloads +===================== Introduction -- cgit v1.2.3-59-g8ed1b From d96bedb2b2488519818f1c2654b31069b5136401 Mon Sep 17 00:00:00 2001 From: Otto Sabart Date: Sun, 6 Jan 2019 00:29:41 +0100 Subject: doc: networking: add offload documents into main index file This patch just adds references to offload documents into main table of contents in network documentation. Signed-off-by: Otto Sabart Acked-by: David S. Miller Signed-off-by: Jonathan Corbet --- Documentation/networking/index.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst index 6a47629ef8ed..7034214cf87e 100644 --- a/Documentation/networking/index.rst +++ b/Documentation/networking/index.rst @@ -32,6 +32,8 @@ Contents: alias bridge snmp_counter + checksum-offloads + segmentation-offloads .. only:: subproject -- cgit v1.2.3-59-g8ed1b From 2fec7b33094c1257ddb07b2b8318cfccba235d58 Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Mon, 7 Jan 2019 10:20:19 -0800 Subject: Documentation/CodingStyle: Use directory-local variables for emacs settings In emacs 23.1 support for directory-local variables was added (see also https://lists.gnu.org/archive/html/info-gnu-emacs/2009-07/msg00000.html). Simplify the settings in coding-style.rst by using that feature. Additionally, do not inherit any settings from emacs' linux coding style to minimize dependencies on the version of emacs that is being used. I have verified with several large and nontrivial kernel source files that the new settings format code according to what checkpatch expects. Signed-off-by: Bart Van Assche Cc: Matthew Wilcox Cc: Jani Nikula Cc: Alison Chaiken Cc: Joe Perches Cc: Federico Vaga Cc: Geyslan G. 
Bem Cc: Tiago Natel de Moura Signed-off-by: Jonathan Corbet --- Documentation/process/coding-style.rst | 57 ++++++++++++++-------- .../translations/it_IT/process/coding-style.rst | 57 ++++++++++++++-------- Documentation/translations/zh_CN/coding-style.rst | 57 ++++++++++++++-------- 3 files changed, 111 insertions(+), 60 deletions(-) diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst index b78dd680c038..afa11024279a 100644 --- a/Documentation/process/coding-style.rst +++ b/Documentation/process/coding-style.rst @@ -595,26 +595,43 @@ values. To do the latter, you can stick the following in your .emacs file: (* (max steps 1) c-basic-offset))) - (add-hook 'c-mode-common-hook - (lambda () - ;; Add kernel style - (c-add-style - "linux-tabs-only" - '("linux" (c-offsets-alist - (arglist-cont-nonempty - c-lineup-gcc-asm-reg - c-lineup-arglist-tabs-only)))))) - - (add-hook 'c-mode-hook - (lambda () - (let ((filename (buffer-file-name))) - ;; Enable kernel mode for the appropriate files - (when (and filename - (string-match (expand-file-name "~/src/linux-trees") - filename)) - (setq indent-tabs-mode t) - (setq show-trailing-whitespace t) - (c-set-style "linux-tabs-only"))))) + (dir-locals-set-class-variables + 'linux-kernel + '((c-mode . ( + (c-basic-offset . 8) + (c-label-minimum-indentation . 0) + (c-offsets-alist . ( + (arglist-close . c-lineup-arglist-tabs-only) + (arglist-cont-nonempty . + (c-lineup-gcc-asm-reg c-lineup-arglist-tabs-only)) + (arglist-intro . +) + (brace-list-intro . +) + (c . c-lineup-C-comments) + (case-label . 0) + (comment-intro . c-lineup-comment) + (cpp-define-intro . +) + (cpp-macro . -1000) + (cpp-macro-cont . +) + (defun-block-intro . +) + (else-clause . 0) + (func-decl-cont . +) + (inclass . +) + (inher-cont . c-lineup-multi-inher) + (knr-argdecl-intro . 0) + (label . -1000) + (statement . 0) + (statement-block-intro . +) + (statement-case-intro . +) + (statement-cont . +) + (substatement . +) + )) + (indent-tabs-mode . t) + (show-trailing-whitespace . t) + )))) + + (dir-locals-set-directory-class + (expand-file-name "~/src/linux-trees") + 'linux-kernel) This will make emacs go better with the kernel coding style for C files below ``~/src/linux-trees``. diff --git a/Documentation/translations/it_IT/process/coding-style.rst b/Documentation/translations/it_IT/process/coding-style.rst index b707bdbe178c..67c6d9ea7257 100644 --- a/Documentation/translations/it_IT/process/coding-style.rst +++ b/Documentation/translations/it_IT/process/coding-style.rst @@ -600,26 +600,43 @@ segue nel vostro file .emacs: (* (max steps 1) c-basic-offset))) - (add-hook 'c-mode-common-hook - (lambda () - ;; Add kernel style - (c-add-style - "linux-tabs-only" - '("linux" (c-offsets-alist - (arglist-cont-nonempty - c-lineup-gcc-asm-reg - c-lineup-arglist-tabs-only)))))) - - (add-hook 'c-mode-hook - (lambda () - (let ((filename (buffer-file-name))) - ;; Enable kernel mode for the appropriate files - (when (and filename - (string-match (expand-file-name "~/src/linux-trees") - filename)) - (setq indent-tabs-mode t) - (setq show-trailing-whitespace t) - (c-set-style "linux-tabs-only"))))) + (dir-locals-set-class-variables + 'linux-kernel + '((c-mode . ( + (c-basic-offset . 8) + (c-label-minimum-indentation . 0) + (c-offsets-alist . ( + (arglist-close . c-lineup-arglist-tabs-only) + (arglist-cont-nonempty . + (c-lineup-gcc-asm-reg c-lineup-arglist-tabs-only)) + (arglist-intro . +) + (brace-list-intro . +) + (c . c-lineup-C-comments) + (case-label . 
0) + (comment-intro . c-lineup-comment) + (cpp-define-intro . +) + (cpp-macro . -1000) + (cpp-macro-cont . +) + (defun-block-intro . +) + (else-clause . 0) + (func-decl-cont . +) + (inclass . +) + (inher-cont . c-lineup-multi-inher) + (knr-argdecl-intro . 0) + (label . -1000) + (statement . 0) + (statement-block-intro . +) + (statement-case-intro . +) + (statement-cont . +) + (substatement . +) + )) + (indent-tabs-mode . t) + (show-trailing-whitespace . t) + )))) + + (dir-locals-set-directory-class + (expand-file-name "~/src/linux-trees") + 'linux-kernel) Questo farà funzionare meglio emacs con lo stile del kernel per i file che si trovano nella cartella ``~/src/linux-trees``. diff --git a/Documentation/translations/zh_CN/coding-style.rst b/Documentation/translations/zh_CN/coding-style.rst index 1466aa64b8b4..3cb09803e084 100644 --- a/Documentation/translations/zh_CN/coding-style.rst +++ b/Documentation/translations/zh_CN/coding-style.rst @@ -535,26 +535,43 @@ Documentation/doc-guide/ 和 scripts/kernel-doc 以获得详细信息。 (* (max steps 1) c-basic-offset))) - (add-hook 'c-mode-common-hook - (lambda () - ;; Add kernel style - (c-add-style - "linux-tabs-only" - '("linux" (c-offsets-alist - (arglist-cont-nonempty - c-lineup-gcc-asm-reg - c-lineup-arglist-tabs-only)))))) - - (add-hook 'c-mode-hook - (lambda () - (let ((filename (buffer-file-name))) - ;; Enable kernel mode for the appropriate files - (when (and filename - (string-match (expand-file-name "~/src/linux-trees") - filename)) - (setq indent-tabs-mode t) - (setq show-trailing-whitespace t) - (c-set-style "linux-tabs-only"))))) + (dir-locals-set-class-variables + 'linux-kernel + '((c-mode . ( + (c-basic-offset . 8) + (c-label-minimum-indentation . 0) + (c-offsets-alist . ( + (arglist-close . c-lineup-arglist-tabs-only) + (arglist-cont-nonempty . + (c-lineup-gcc-asm-reg c-lineup-arglist-tabs-only)) + (arglist-intro . +) + (brace-list-intro . +) + (c . c-lineup-C-comments) + (case-label . 0) + (comment-intro . c-lineup-comment) + (cpp-define-intro . +) + (cpp-macro . -1000) + (cpp-macro-cont . +) + (defun-block-intro . +) + (else-clause . 0) + (func-decl-cont . +) + (inclass . +) + (inher-cont . c-lineup-multi-inher) + (knr-argdecl-intro . 0) + (label . -1000) + (statement . 0) + (statement-block-intro . +) + (statement-case-intro . +) + (statement-cont . +) + (substatement . +) + )) + (indent-tabs-mode . t) + (show-trailing-whitespace . t) + )))) + + (dir-locals-set-directory-class + (expand-file-name "~/src/linux-trees") + 'linux-kernel) 这会让 emacs 在 ``~/src/linux-trees`` 下的 C 源文件获得更好的内核代码风格。 -- cgit v1.2.3-59-g8ed1b From 2d87948a19ac26bb0307bd2af2a5550e4c3a172d Mon Sep 17 00:00:00 2001 From: Laurent Gauthier Date: Sat, 5 Jan 2019 00:08:34 +0100 Subject: doc: fault-injection: fix macro name in example Signed-off-by: Laurent Gauthier Signed-off-by: Jonathan Corbet --- Documentation/fault-injection/fault-injection.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/fault-injection/fault-injection.txt b/Documentation/fault-injection/fault-injection.txt index 4d1b7b4ccfaf..a17517a083c3 100644 --- a/Documentation/fault-injection/fault-injection.txt +++ b/Documentation/fault-injection/fault-injection.txt @@ -195,7 +195,7 @@ o #include o define the fault attributes - DECLARE_FAULT_INJECTION(name); + DECLARE_FAULT_ATTR(name); Please see the definition of struct fault_attr in fault-inject.h for details. 
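Beyond declaring the attributes in C, the same document describes how a fault attribute is tuned at run time through debugfs. A minimal sketch, assuming a kernel built with CONFIG_FAULT_INJECTION_DEBUG_FS and CONFIG_FAILSLAB and using the documented knobs of the existing failslab capability (substitute the directory of your own fault attribute as needed):

  mount -t debugfs none /sys/kernel/debug              # only if debugfs is not mounted yet
  echo 10  > /sys/kernel/debug/failslab/probability    # inject failures with ~10% probability
  echo 100 > /sys/kernel/debug/failslab/interval       # interval between injected failures
  echo -1  > /sys/kernel/debug/failslab/times          # -1 = no limit on how many failures may be injected
  echo 1   > /sys/kernel/debug/failslab/verbose        # print one log line per injected failure (2 adds a call trace)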
-- cgit v1.2.3-59-g8ed1b From 9ac963c98e2c2d0bc6faafa3e746c94bed12426a Mon Sep 17 00:00:00 2001 From: Federico Vaga Date: Sun, 23 Dec 2018 02:01:02 +0100 Subject: doc:it_IT: translation for process/submitting-patches It translats the document process/submitting-patches.rst. Signed-off-by: Federico Vaga Signed-off-by: Jonathan Corbet --- .../it_IT/process/submitting-patches.rst | 862 ++++++++++++++++++++- 1 file changed, 858 insertions(+), 4 deletions(-) diff --git a/Documentation/translations/it_IT/process/submitting-patches.rst b/Documentation/translations/it_IT/process/submitting-patches.rst index d633775ed556..b6368a45a8e4 100644 --- a/Documentation/translations/it_IT/process/submitting-patches.rst +++ b/Documentation/translations/it_IT/process/submitting-patches.rst @@ -1,13 +1,867 @@ .. include:: ../disclaimer-ita.rst :Original: :ref:`Documentation/process/submitting-patches.rst ` - +:Translator: Federico Vaga .. _it_submittingpatches: -Sottomettere modifiche: la guida essenziale per vedere il vostro codice nel kernel -================================================================================== +Inviare patch: la guida essenziale per vedere il vostro codice nel kernel +========================================================================= + +Una persona o un'azienda che volesse inviare una patch al kernel potrebbe +sentirsi scoraggiata dal processo di sottomissione, specialmente quando manca +una certa familiarità col "sistema". Questo testo è una raccolta di +suggerimenti che aumenteranno significativamente le probabilità di vedere le +vostre patch accettate. + +Questo documento contiene un vasto numero di suggerimenti concisi. Per +maggiori dettagli su come funziona il processo di sviluppo del kernel leggete +:ref:`Documentation/translations/it_IT/process `. +Leggete anche :ref:`Documentation/translations/it_IT/process/submit-checklist.rst ` +per una lista di punti da verificare prima di inviare del codice. Se state +inviando un driver, allora leggete anche :ref:`Documentation/translations/it_IT/process/submitting-drivers.rst `; +per delle patch relative alle associazioni per Device Tree leggete +Documentation/devicetree/bindings/submitting-patches.txt. + +Molti di questi passi descrivono il comportamento di base del sistema di +controllo di versione ``git``; se utilizzate ``git`` per preparare le vostre +patch molto del lavoro più ripetitivo lo troverete già fatto per voi, tuttavia +dovete preparare e documentare un certo numero di patch. Generalmente, l'uso +di ``git`` renderà la vostra vita di sviluppatore del kernel più facile. + +0) Ottenere i sorgenti attuali +------------------------------ + +Se non avete un repositorio coi sorgenti del kernel più recenti, allora usate +``git`` per ottenerli. Vorrete iniziare col repositorio principale che può +essere recuperato col comando:: + + git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git + +Notate, comunque, che potreste non voler sviluppare direttamente coi sorgenti +principali del kernel. La maggior parte dei manutentori hanno i propri +sorgenti e desiderano che le patch siano preparate basandosi su di essi. +Guardate l'elemento **T:** per un determinato sottosistema nel file MAINTANERS +che troverete nei sorgenti, o semplicemente chiedete al manutentore nel caso +in cui i sorgenti da usare non siano elencati il quel file. 
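A minimal illustrative sketch of this step, reusing the drivers/net/mydriver.c placeholder that appears in an example later in this document (the MAINTAINERS section name is likewise only an example)::

  # list the maintainers and mailing lists for the file you are changing
  ./scripts/get_maintainer.pl -f drivers/net/mydriver.c
  # then read that subsystem's section in MAINTAINERS and look for a "T:" (tree) entry
  grep -A 15 'NETWORKING DRIVERS' MAINTAINERS | grep '^T:'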
+ +Esiste ancora la possibilità di scaricare un rilascio del kernel come archivio +tar (come descritto in una delle prossime sezioni), ma questa è la via più +complicata per sviluppare per il kernel. + +1) ``diff -up`` +--------------- + +Se dovete produrre le vostre patch a mano, usate ``diff -up`` o ``diff -uprN`` +per crearle. Git produce di base le patch in questo formato; se state +usando ``git``, potete saltare interamente questa sezione. + +Tutte le modifiche al kernel Linux avvengono mediate patch, come descritte +in :manpage:`diff(1)`. Quando create la vostra patch, assicuratevi di +crearla nel formato "unified diff", come l'argomento ``-u`` di +:manpage:`diff(1)`. +Inoltre, per favore usate l'argomento ``-p`` per mostrare la funzione C +alla quale si riferiscono le diverse modifiche - questo rende il risultato +di ``diff`` molto più facile da leggere. Le patch dovrebbero essere basate +sulla radice dei sorgenti del kernel, e non sulle sue sottocartelle. + +Per creare una patch per un singolo file, spesso è sufficiente fare:: + + SRCTREE= linux + MYFILE= drivers/net/mydriver.c + + cd $SRCTREE + cp $MYFILE $MYFILE.orig + vi $MYFILE # make your change + cd .. + diff -up $SRCTREE/$MYFILE{.orig,} > /tmp/patch + +Per creare una patch per molteplici file, dovreste spacchettare i sorgenti +"vergini", o comunque non modificati, e fare un ``diff`` coi vostri. +Per esempio:: + + MYSRC= /devel/linux + + tar xvfz linux-3.19.tar.gz + mv linux-3.19 linux-3.19-vanilla + diff -uprN -X linux-3.19-vanilla/Documentation/dontdiff \ + linux-3.19-vanilla $MYSRC > /tmp/patch + +``dontdiff`` è una lista di file che sono generati durante il processo di +compilazione del kernel; questi dovrebbero essere ignorati in qualsiasi +patch generata con :manpage:`diff(1)`. + +Assicuratevi che la vostra patch non includa file che non ne fanno veramente +parte. Al fine di verificarne la correttezza, assicuratevi anche di +revisionare la vostra patch -dopo- averla generata con :manpage:`diff(1)`. + +Se le vostre modifiche producono molte differenze, allora dovrete dividerle +in patch indipendenti che modificano le cose in passi logici; leggete +:ref:`split_changes`. Questo faciliterà la revisione da parte degli altri +sviluppatori, il che è molto importante se volete che la patch venga accettata. + +Se state utilizzando ``git``, ``git rebase -i`` può aiutarvi nel procedimento. +Se non usate ``git``, un'alternativa popolare è ``quilt`` +. + +.. _it_describe_changes: + +2) Descrivete le vostre modifiche +--------------------------------- + +Descrivete il vostro problema. Esiste sempre un problema che via ha spinto +ha fare il vostro lavoro, che sia la correzione di un baco da una riga o una +nuova funzionalità da 5000 righe di codice. Convincete i revisori che vale +la pena risolvere il vostro problema e che ha senso continuare a leggere oltre +al primo paragrafo. + +Descrivete ciò che sarà visibile agli utenti. Chiari incidenti nel sistema +e blocchi sono abbastanza convincenti, ma non tutti i bachi sono così evidenti. +Anche se il problema è stato scoperto durante la revisione del codice, +descrivete l'impatto che questo avrà sugli utenti. 
Tenete presente che +la maggior parte delle installazioni Linux usa un kernel che arriva dai +sorgenti stabili o dai sorgenti di una distribuzione particolare che prende +singolarmente le patch dai sorgenti principali; quindi, includete tutte +le informazioni che possono essere utili a capire le vostre modifiche: +le circostanze che causano il problema, estratti da dmesg, descrizioni di +un incidente di sistema, prestazioni di una regressione, picchi di latenza, +blocchi, eccetera. + +Quantificare le ottimizzazioni e i compromessi. Se affermate di aver +migliorato le prestazioni, il consumo di memoria, l'impatto sollo stack, +o la dimensione del file binario, includete dei numeri a supporto della +vostra dichiarazione. Ma ricordatevi di descrivere anche eventuali costi +che non sono ovvi. Solitamente le ottimizzazioni non sono gratuite, ma sono +un compromesso fra l'uso di CPU, la memoria e la leggibilità; o, quando si +parla di ipotesi euristiche, fra differenti carichi. Descrivete i lati +negativi che vi aspettate dall'ottimizzazione cosicché i revisori possano +valutare i costi e i benefici. + +Una volta che il problema è chiaro, descrivete come lo risolvete andando +nel dettaglio tecnico. È molto importante che descriviate la modifica +in un inglese semplice cosicché i revisori possano verificare che il codice si +comporti come descritto. + +I manutentori vi saranno grati se scrivete la descrizione della patch in un +formato che sia compatibile con il gestore dei sorgenti usato dal kernel, +``git``, come un "commit log". Leggete :ref:`it_explicit_in_reply_to`. + +Risolvete solo un problema per patch. Se la vostra descrizione inizia ad +essere lunga, potrebbe essere un segno che la vostra patch necessita d'essere +divisa. Leggete :ref:`split_changes`. + +Quando inviate o rinviate una patch o una serie, includete la descrizione +completa delle modifiche e la loro giustificazione. Non limitatevi a dire che +questa è la versione N della patch (o serie). Non aspettatevi che i +manutentori di un sottosistema vadano a cercare le versioni precedenti per +cercare la descrizione da aggiungere. In pratica, la patch (o serie) e la sua +descrizione devono essere un'unica cosa. Questo aiuta i manutentori e i +revisori. Probabilmente, alcuni revisori non hanno nemmeno ricevuto o visto +le versioni precedenti della patch. + +Descrivete le vostro modifiche usando l'imperativo, per esempio "make xyzzy +do frotz" piuttosto che "[This patch] makes xyzzy do frotz" or "[I] changed +xyzzy to do frotz", come se steste dando ordini al codice di cambiare il suo +comportamento. + +Se la patch corregge un baco conosciuto, fare riferimento a quel baco inserendo +il suo numero o il suo URL. Se la patch è la conseguenza di una discussione +su una lista di discussione, allora fornite l'URL all'archivio di quella +discussione; usate i collegamenti a https://lkml.kernel.org/ con il +``Message-Id``, in questo modo vi assicurerete che il collegamento non diventi +invalido nel tempo. + +Tuttavia, cercate di rendere la vostra spiegazione comprensibile anche senza +far riferimento a fonti esterne. In aggiunta ai collegamenti a bachi e liste +di discussione, riassumente i punti più importanti della discussione che hanno +portato alla creazione della patch. + +Se volete far riferimento a uno specifico commit, non usate solo +l'identificativo SHA-1. Per cortesia, aggiungete anche la breve riga +riassuntiva del commit per rendere la chiaro ai revisori l'oggetto. 
+Per esempio:: + + Commit e21d2170f36602ae2708 ("video: remove unnecessary + platform_set_drvdata()") removed the unnecessary + platform_set_drvdata(), but left the variable "dev" unused, + delete it. + +Dovreste anche assicurarvi di usare almeno i primi 12 caratteri +dell'identificativo SHA-1. Il repositorio del kernel ha *molti* oggetti e +questo rende possibile la collisione fra due identificativi con pochi +caratteri. Tenete ben presente che anche se oggi non ci sono collisioni con il +vostro identificativo a 6 caratteri, potrebbero essercene fra 5 anni da oggi. + +Se la vostra patch corregge un baco in un commit specifico, per esempio avete +trovato un problema usando ``git bisect``, per favore usate l'etichetta +'Fixes:' indicando i primi 12 caratteri dell'identificativo SHA-1 seguiti +dalla riga riassuntiva. Per esempio:: + + Fixes: e21d2170f366 ("video: remove unnecessary platform_set_drvdata()") + +La seguente configurazione di ``git config`` può essere usata per formattare +i risultati dei comandi ``git log`` o ``git show`` come nell'esempio +precedente:: + + [core] + abbrev = 12 + [pretty] + fixes = Fixes: %h (\"%s\") + +.. _it_split_changes: + +3) Separate le vostre modifiche +------------------------------- + +Separate ogni **cambiamento logico** in patch distinte. + +Per esempio, se i vostri cambiamenti per un singolo driver includono +sia delle correzioni di bachi che miglioramenti alle prestazioni, +allora separateli in due o più patch. Se i vostri cambiamenti includono +un aggiornamento dell'API e un nuovo driver che lo sfrutta, allora separateli +in due patch. + +D'altro canto, se fate una singola modifica su più file, raggruppate tutte +queste modifiche in una singola patch. Dunque, un singolo cambiamento logico +è contenuto in una sola patch. + +Il punto da ricordare è che ogni modifica dovrebbe fare delle modifiche +che siano facilmente comprensibili e che possano essere verificate dai revisori. +Ogni patch dovrebbe essere giustificabile di per sé. + +Se al fine di ottenere un cambiamento completo una patch dipende da un'altra, +va bene. Semplicemente scrivete una nota nella descrizione della patch per +farlo presente: **"this patch depends on patch X"**. + +Quando dividete i vostri cambiamenti in una serie di patch, prestate +particolare attenzione alla verifica di ogni patch della serie; per ognuna +il kernel deve compilare ed essere eseguito correttamente. Gli sviluppatori +che usano ``git bisect`` per scovare i problemi potrebbero finire nel mezzo +della vostra serie in un punto qualsiasi; non vi saranno grati se nel mezzo +avete introdotto dei bachi. + +Se non potete condensare la vostra serie di patch in una più piccola, allora +pubblicatene una quindicina alla volta e aspettate che vengano revisionate +ed integrate. + + +4) Verificate lo stile delle vostre modifiche +--------------------------------------------- + +Controllate che la vostra patch non violi lo stile del codice, maggiori +dettagli sono disponibili in :ref:`Documentation/translations/it_IT/process/coding-style.rst `. +Non farlo porta semplicemente a una perdita di tempo da parte dei revisori e +voi vedrete la vostra patch rifiutata, probabilmente senza nemmeno essere stata +letta. + +Un'eccezione importante si ha quando del codice viene spostato da un file +ad un altro -- in questo caso non dovreste modificare il codice spostato +per nessun motivo, almeno non nella patch che lo sposta. Questo separa +chiaramente l'azione di spostare il codice e il vostro cambiamento. 
+Questo aiuta enormemente la revisione delle vere differenze e permette agli +strumenti di tenere meglio la traccia della storia del codice. + +Prima di inviare una patch, verificatene lo stile usando l'apposito +verificatore (scripts/checkpatch.pl). Da notare, comunque, che il verificator +di stile dovrebbe essere visto come una guida, non come un sostituto al +giudizio umano. Se il vostro codice è migliore nonostante una violazione +dello stile, probabilmente è meglio lasciarlo com'è. + +Il verificatore ha tre diversi livelli di severità: + - ERROR: le cose sono molto probabilmente sbagliate + - WARNING: le cose necessitano d'essere revisionate con attenzione + - CHECK: le cose necessitano di un pensierino + +Dovreste essere in grado di giustificare tutte le eventuali violazioni rimaste +nella vostra patch. + + +5) Selezionate i destinatari della vostra patch +----------------------------------------------- + +Dovreste sempre inviare una copia della patch ai manutentori dei sottosistemi +interessati dalle modifiche; date un'occhiata al file MAINTAINERS e alla storia +delle revisioni per scoprire chi si occupa del codice. Lo script +scripts/get_maintainer.pl può esservi d'aiuto. Se non riuscite a trovare un +manutentore per il sottosistema su cui state lavorando, allora Andrew Morton +(akpm@linux-foundation.org) sarà la vostra ultima possibilità. + +Normalmente, dovreste anche scegliere una lista di discussione a cui inviare +la vostra serie di patch. La lista di discussione linux-kernel@vger.kernel.org +è proprio l'ultima spiaggia, il volume di email su questa lista fa si che +diversi sviluppatori non la seguano. Guardate nel file MAINTAINERS per trovare +la lista di discussione dedicata ad un sottosistema; probabilmente lì la vostra +patch riceverà molta più attenzione. Tuttavia, per favore, non spammate le +liste di discussione che non sono interessate al vostro lavoro. + +Molte delle liste di discussione relative al kernel vengono ospitate su +vger.kernel.org; potete trovare un loro elenco alla pagina +http://vger.kernel.org/vger-lists.html. Tuttavia, ci sono altre liste di +discussione ospitate altrove. + +Non inviate più di 15 patch alla volta sulle liste di discussione vger!!! + +L'ultimo giudizio sull'integrazione delle modifiche accettate spetta a +Linux Torvalds. Il suo indirizzo e-mail è . +Riceve moltissime e-mail, e, a questo punto, solo poche patch passano +direttamente attraverso il suo giudizio; quindi, dovreste fare del vostro +meglio per -evitare di- inviargli e-mail. + +Se avete una patch che corregge un baco di sicurezza che potrebbe essere +sfruttato, inviatela a security@kernel.org. Per bachi importanti, un breve +embargo potrebbe essere preso in considerazione per dare il tempo alle +distribuzioni di prendere la patch e renderla disponibile ai loro utenti; +in questo caso, ovviamente, la patch non dovrebbe essere inviata su alcuna +lista di discussione pubblica. + +Patch che correggono bachi importanti su un kernel già rilasciato, dovrebbero +essere inviate ai manutentori dei kernel stabili aggiungendo la seguente riga:: + + Cc: stable@vger.kernel.org + +nella vostra patch, nell'area dedicata alle firme (notate, NON come destinatario +delle e-mail). In aggiunta a questo file, dovreste leggere anche +:ref:`Documentation/translations/it_IT/process/stable-kernel-rules.rst ` + +Tuttavia, notate, che alcuni manutentori di sottosistema preferiscono avere +l'ultima parola su quali patch dovrebbero essere aggiunte ai kernel stabili. 
+La rete di manutentori, in particolare, non vorrebbe vedere i singoli +sviluppatori aggiungere alle loro patch delle righe come quella sopracitata. + +Se le modifiche hanno effetti sull'interfaccia con lo spazio utente, per favore +inviate una patch per le pagine man ai manutentori di suddette pagine (elencati +nel file MAINTAINERS), o almeno una notifica circa la vostra modifica, +cosicché l'informazione possa trovare la sua strada nel manuale. Le modifiche +all'API dello spazio utente dovrebbero essere inviate in copia anche a +linux-api@vger.kernel.org. + +Per le piccole patch potreste aggiungere in CC l'indirizzo +*Trivial Patch Monkey trivial@kernel.org* che ha lo scopo di raccogliere +le patch "banali". Date uno sguardo al file MAINTAINERS per vedere chi +è l'attuale amministratore. + +Le patch banali devono rientrare in una delle seguenti categorie: + +- errori grammaticali nella documentazione +- errori grammaticali negli errori che potrebbero rompere :manpage:`grep(1)` +- correzione di avvisi di compilazione (riempirsi di avvisi inutili è negativo) +- correzione di errori di compilazione (solo se correggono qualcosa sul serio) +- rimozione di funzioni/macro deprecate +- sostituzione di codice non potabile con uno portabile (anche in codice + specifico per un'architettura, dato che le persone copiano, fintanto che + la modifica sia banale) +- qualsiasi modifica dell'autore/manutentore di un file (in pratica + "patch monkey" in modalità ritrasmissione) + + +6) Niente: MIME, links, compressione, allegati. Solo puro testo +---------------------------------------------------------------- + +Linus e gli altri sviluppatori del kernel devono poter commentare +le modifiche che sottomettete. Per uno sviluppatore è importante +essere in grado di "citare" le vostre modifiche, usando normali +programmi di posta elettronica, cosicché sia possibile commentare +una porzione specifica del vostro codice. + +Per questa ragione tutte le patch devono essere inviate via e-mail +come testo. .. warning:: - TODO ancora da tradurre + Se decidete di copiare ed incollare la patch nel corpo dell'e-mail, state + attenti che il vostro programma non corrompa il contenuto con andate + a capo automatiche. + +La patch non deve essere un allegato MIME, compresso o meno. Molti +dei più popolari programmi di posta elettronica non trasmettono un allegato +MIME come puro testo, e questo rende impossibile commentare il vostro codice. +Inoltre, un allegato MIME rende l'attività di Linus più laboriosa, diminuendo +così la possibilità che il vostro allegato-MIME venga accettato. + +Eccezione: se il vostro servizio di posta storpia le patch, allora qualcuno +potrebbe chiedervi di rinviarle come allegato MIME. + +Leggete :ref:`Documentation/translations/it_IT/process/email-clients.rst ` +per dei suggerimenti sulla configurazione del programmi di posta elettronica +per l'invio di patch intatte. + +7) Dimensione delle e-mail +-------------------------- + +Le grosse modifiche non sono adatte ad una lista di discussione, e nemmeno +per alcuni manutentori. Se la vostra patch, non compressa, eccede i 300 kB +di spazio, allora caricatela in una spazio accessibile su internet fornendo +l'URL (collegamento) ad essa. Ma notate che se la vostra patch eccede i 300 kB +è quasi certo che necessiti comunque di essere spezzettata. + +8) Rispondere ai commenti di revisione +-------------------------------------- + +Quasi certamente i revisori vi invieranno dei commenti su come migliorare +la vostra patch. 
Dovete rispondere a questi commenti; ignorare i revisori +è un ottimo modo per essere ignorati. Riscontri o domande che non conducono +ad una modifica del codice quasi certamente dovrebbero portare ad un commento +nel changelog cosicché il prossimo revisore potrà meglio comprendere cosa stia +accadendo. + +Assicuratevi di dire ai revisori quali cambiamenti state facendo e di +ringraziarli per il loro tempo. Revisionare codice è un lavoro faticoso e che +richiede molto tempo, e a volte i revisori diventano burberi. Tuttavia, anche +in questo caso, rispondete con educazione e concentratevi sul problema che +hanno evidenziato. + +9) Non scoraggiatevi - o impazientitevi +--------------------------------------- + +Dopo che avete inviato le vostre modifiche, siate pazienti e aspettate. +I revisori sono persone occupate e potrebbero non ricevere la vostra patch +immediatamente. + +Un tempo, le patch erano solite scomparire nel vuoto senza alcun commento, +ma ora il processo di sviluppo funziona meglio. Dovreste ricevere commenti +in una settimana o poco più; se questo non dovesse accadere, assicuratevi di +aver inviato le patch correttamente. Aspettate almeno una settimana prima di +rinviare le modifiche o sollecitare i revisori - probabilmente anche di più +durante la finestra d'integrazione. + +10) Aggiungete PATCH nell'oggetto +--------------------------------- + +Dato l'alto volume di e-mail per Linus, e la lista linux-kernel, è prassi +prefiggere il vostro oggetto con [PATCH]. Questo permette a Linus e agli +altri sviluppatori del kernel di distinguere facilmente le patch dalle altre +discussioni. + + +11) Firmate il vostro lavoro - Il certificato d'origine dello sviluppatore +-------------------------------------------------------------------------- + +Per migliorare la tracciabilità su "chi ha fatto cosa", specialmente per +quelle patch che per raggiungere lo stadio finale passano attraverso +diversi livelli di manutentori, abbiamo introdotto la procedura di "firma" +delle patch che vengono inviate per e-mail. + +La firma è una semplice riga alla fine della descrizione della patch che +certifica che l'avete scritta voi o che avete il diritto di pubblicarla +come patch open-source. Le regole sono abbastanza semplici: se potete +certificare quanto segue: + +Il certificato d'origine dello sviluppatore 1.1 +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Contribuendo a questo progetto, io certifico che: + + (a) Il contributo è stato creato interamente, o in parte, da me e che + ho il diritto di inviarlo in accordo con la licenza open-source + indicata nel file; oppure + + (b) Il contributo è basato su un lavoro precedente che, nei limiti + della mia conoscenza, è coperto da un'appropriata licenza + open-source che mi da il diritto di modificarlo e inviarlo, + le cui modifiche sono interamente o in parte mie, in accordo con + la licenza open-source (a meno che non abbia il permesso di usare + un'altra licenza) indicata nel file; oppure + + (c) Il contributo mi è stato fornito direttamente da qualcuno che + ha certificato (a), (b) o (c) e non l'ho modificata. + + (d) Capisco e concordo col fatto che questo progetto e i suoi + contributi sono pubblici e che un registro dei contributi (incluse + tutte le informazioni personali che invio con essi, inclusa la mia + firma) verrà mantenuto indefinitamente e che possa essere + ridistribuito in accordo con questo progetto o le licenze + open-source coinvolte. 
+ +poi dovete solo aggiungere una riga che dice:: + + Signed-off-by: Random J Developer + +usando il vostro vero nome (spiacenti, non si accettano pseudonimi o +contributi anonimi). + +Alcune persone aggiungono delle etichette alla fine. Per ora queste verranno +ignorate, ma potete farlo per meglio identificare procedure aziendali interne o +per aggiungere dettagli circa la firma. + +Se siete un manutentore di un sottosistema o di un ramo, qualche volta dovrete +modificare leggermente le patch che avete ricevuto al fine di poterle +integrare; questo perché il codice non è esattamente lo stesso nei vostri +sorgenti e in quelli dei vostri contributori. Se rispettate rigidamente la +regola (c), dovreste chiedere al mittente di rifare la patch, ma questo è +controproducente e una totale perdita di tempo ed energia. La regola (b) +vi permette di correggere il codice, ma poi diventa davvero maleducato cambiare +la patch di qualcuno e addossargli la responsabilità per i vostri bachi. +Per risolvere questo problema dovreste aggiungere una riga, fra l'ultimo +Signed-off-by e il vostro, che spiega la vostra modifica. Nonostante non ci +sia nulla di obbligatorio, un modo efficace è quello di indicare il vostro +nome o indirizzo email fra parentesi quadre, seguito da una breve descrizione; +questo renderà abbastanza visibile chi è responsabile per le modifiche +dell'ultimo minuto. Per esempio:: + + Signed-off-by: Random J Developer + [lucky@maintainer.example.org: struct foo moved from foo.c to foo.h] + Signed-off-by: Lucky K Maintainer + +Questa pratica è particolarmente utile se siete i manutentori di un ramo +stabile ma al contempo volete dare credito agli autori, tracciare e integrare +le modifiche, e proteggere i mittenti dalle lamentele. Notate che in nessuna +circostanza è permessa la modifica dell'identità dell'autore (l'intestazione +From), dato che è quella che appare nei changelog. + +Un appunto speciale per chi porta il codice su vecchie versioni. Sembra che +sia comune l'utile pratica di inserire un'indicazione circa l'origine della +patch all'inizio del messaggio di commit (appena dopo la riga dell'oggetto) +al fine di migliorare la tracciabilità. Per esempio, questo è quello che si +vede nel rilascio stabile 3.x-stable:: + + Date: Tue Oct 7 07:26:38 2014 -0400 + + libata: Un-break ATA blacklist + + commit 1c40279960bcd7d52dbdf1d466b20d24b99176c8 upstream. + +E qui quello che potrebbe vedersi su un kernel più vecchio dove la patch è +stata applicata:: + + Date: Tue May 13 22:12:27 2008 +0200 + + wireless, airo: waitbusy() won't delay + + [backport of 2.6 commit b7acbdfbd1f277c1eb23f344f899cfa4cd0bf36a] + +Qualunque sia il formato, questa informazione fornisce un importante aiuto +alle persone che vogliono seguire i vostri sorgenti, e quelle che cercano +dei bachi. + + +12) Quando utilizzare Acked-by:, Cc:, e Co-Developed-by: +-------------------------------------------------------- + +L'etichetta Signed-off-by: indica che il firmatario è stato coinvolto nello +sviluppo della patch, o che era nel suo percorso di consegna. + +Se una persona non è direttamente coinvolta con la preparazione o gestione +della patch ma desidera firmare e mettere agli atti la loro approvazione, +allora queste persone possono chiedere di aggiungere al changelog della patch +una riga Acked-by:. + +Acked-by: viene spesso utilizzato dai manutentori del sottosistema in oggetto +quando quello stesso manutentore non ha contribuito né trasmesso la patch. + +Acked-by: non è formale come Signed-off-by:. 
Questo indica che la persona ha +revisionato la patch e l'ha trovata accettabile. Per cui, a volte, chi +integra le patch convertirà un "sì, mi sembra che vada bene" in un Acked-by: +(ma tenete presente che solitamente è meglio chiedere esplicitamente). + +Acked-by: non indica l'accettazione di un'intera patch. Per esempio, quando +una patch ha effetti su diversi sottosistemi e ha un Acked-by: da un +manutentore di uno di questi, significa che il manutentore accetta quella +parte di codice relativa al sottosistema che mantiene. Qui dovremmo essere +giudiziosi. Quando si hanno dei dubbi si dovrebbe far riferimento alla +discussione originale negli archivi della lista di discussione. + +Se una persona ha avuto l'opportunità di commentare la patch, ma non lo ha +fatto, potete aggiungere l'etichetta ``Cc:`` alla patch. Questa è l'unica +etichetta che può essere aggiunta senza che la persona in questione faccia +alcunché - ma dovrebbe indicare che la persona ha ricevuto una copia della +patch. Questa etichetta documenta che terzi potenzialmente interessati sono +stati inclusi nella discussione. + +L'etichetta Co-Developed-by: indica che la patch è stata scritta dall'autore in +collaborazione con un altro sviluppatore. Qualche volta questo è utile quando +più persone lavorano sulla stessa patch. Notate, questa persona deve avere +nella patch anche una riga Signed-off-by:. + + +13) Utilizzare Reported-by:, Tested-by:, Reviewed-by:, Suggested-by: e Fixes: +----------------------------------------------------------------------------- + +L'etichetta Reported-by da credito alle persone che trovano e riportano i bachi +e si spera che questo possa ispirarli ad aiutarci nuovamente in futuro. +Rammentate che se il baco è stato riportato in privato, dovrete chiedere il +permesso prima di poter utilizzare l'etichetta Reported-by. + +L'etichetta Tested-by: indica che la patch è stata verificata con successo +(su un qualche sistema) dalla persona citata. Questa etichetta informa i +manutentori che qualche verifica è stata fatta, fornisce un mezzo per trovare +persone che possano verificare il codice in futuro, e garantisce che queste +stesse persone ricevano credito per il loro lavoro. + +Reviewd-by:, invece, indica che la patch è stata revisionata ed è stata +considerata accettabile in accordo con la dichiarazione dei revisori: + +Dichiarazione di svista dei revisori +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Offrendo la mia etichetta Reviewed-by, dichiaro quanto segue: + + (a) Ho effettuato una revisione tecnica di questa patch per valutarne + l'adeguatezza ai fini dell'inclusione nel ramo principale del + kernel. + + (b) Tutti i problemi e le domande riguardanti la patch sono stati + comunicati al mittente. Sono soddisfatto dalle risposte + del mittente. + + (c) Nonostante ci potrebbero essere cose migliorabili in queste + sottomissione, credo che sia, in questo momento, (1) una modifica + di interesse per il kernel, e (2) libera da problemi che + potrebbero metterne in discussione l'integrazione. + + (d) Nonostante abbia revisionato la patch e creda che vada bene, + non garantisco (se non specificato altrimenti) che questa + otterrà quello che promette o funzionerà correttamente in tutte + le possibili situazioni. + +L'etichetta Reviewed-by è la dichiarazione di un parere sulla bontà di +una modifica che si ritiene appropriata e senza alcun problema tecnico +importante. Qualsiasi revisore interessato (quelli che lo hanno fatto) +possono offrire il proprio Reviewed-by per la patch. 
Questa etichetta serve +a dare credito ai revisori e a informare i manutentori sul livello di revisione +che è stato fatto sulla patch. L'etichetta Reviewd-by, quando fornita da +revisori conosciuti per la loro conoscenza sulla materia in oggetto e per la +loro serietà nella revisione, accrescerà le probabilità che la vostra patch +venga integrate nel kernel. + +L'etichetta Suggested-by: indica che l'idea della patch è stata suggerita +dalla persona nominata e le da credito. Tenete a mente che questa etichetta +non dovrebbe essere aggiunta senza un permesso esplicito, specialmente se +l'idea non è stata pubblicata in un forum pubblico. Detto ciò, dando credito +a chi ci fornisce delle idee, si spera di poterli ispirare ad aiutarci +nuovamente in futuro. + +L'etichetta Fixes: indica che la patch corregge un problema in un commit +precedente. Serve a chiarire l'origine di un baco, il che aiuta la revisione +del baco stesso. Questa etichetta è di aiuto anche per i manutentori dei +kernel stabili al fine di capire quale kernel deve ricevere la correzione. +Questo è il modo suggerito per indicare che un baco è stato corretto nella +patch. Per maggiori dettagli leggete :ref:`it_describe_changes` + + +14) Il formato canonico delle patch +----------------------------------- + +Questa sezione descrive il formato che dovrebbe essere usato per le patch. +Notate che se state usando un repositorio ``git`` per salvare le vostre patch +potere usare il comando ``git format-patch`` per ottenere patch nel formato +appropriato. Lo strumento non crea il testo necessario, per cui, leggete +le seguenti istruzioni. + +L'oggetto di una patch canonica è la riga:: + + Subject: [PATCH 001/123] subsystem: summary phrase + +Il corpo di una patch canonica contiene i seguenti elementi: + + - Una riga ``from`` che specifica l'autore della patch, seguita + da una riga vuota (necessaria soltanto se la persona che invia la + patch non ne è l'autore). + + - Il corpo della spiegazione, con linee non più lunghe di 75 caratteri, + che verrà copiato permanentemente nel changelog per descrivere la patch. + + - Una riga vuota + + - Le righe ``Signed-off-by:``, descritte in precedenza, che finiranno + anch'esse nel changelog. + + - Una linea di demarcazione contenente semplicemente ``---``. + + - Qualsiasi altro commento che non deve finire nel changelog. + + - Le effettive modifiche al codice (il prodotto di ``diff``). + +Il formato usato per l'oggetto permette ai programmi di posta di usarlo +per ordinare le patch alfabeticamente - tutti i programmi di posta hanno +questa funzionalità - dato che al numero sequenziale si antepongono degli zeri; +in questo modo l'ordine numerico ed alfabetico coincidono. + +Il ``subsystem`` nell'oggetto dell'email dovrebbe identificare l'area +o il sottosistema modificato dalla patch. + +La ``summary phrase`` nell'oggetto dell'email dovrebbe descrivere brevemente +il contenuto della patch. La ``summary phrase`` non dovrebbe essere un nome +di file. Non utilizzate la stessa ``summary phrase`` per tutte le patch in +una serie (dove una ``serie di patch`` è una sequenza ordinata di diverse +patch correlate). + +Ricordatevi che la ``summary phrase`` della vostra email diventerà un +identificatore globale ed unico per quella patch. Si propaga fino al +changelog ``git``. La ``summary phrase`` potrà essere usata in futuro +dagli sviluppatori per riferirsi a quella patch. Le persone vorranno +cercare la ``summary phrase`` su internet per leggere le discussioni che la +riguardano. 
Potrebbe anche essere l'unica cosa che le persone vedranno +quando, in due o tre mesi, riguarderanno centinaia di patch usando strumenti +come ``gitk`` o ``git log --oneline``. + +Per queste ragioni, dovrebbe essere lunga fra i 70 e i 75 caratteri, e deve +descrivere sia cosa viene modificato, sia il perché sia necessario. Essere +brevi e descrittivi è una bella sfida, ma questo è quello che fa un riassunto +ben scritto. + +La ``summary phrase`` può avere un'etichetta (*tag*) di prefisso racchiusa fra +le parentesi quadre "Subject: [PATCH ...] ". +Le etichette non verranno considerate come parte della frase riassuntiva, ma +indicano come la patch dovrebbe essere trattata. Fra le etichette più comuni +ci sono quelle di versione che vengono usate quando una patch è stata inviata +più volte (per esempio, "v1, v2, v3"); oppure "RFC" per indicare che si +attendono dei commenti (*Request For Comments*). Se ci sono quattro patch +nella serie, queste dovrebbero essere enumerate così: 1/4, 2/4, 3/4, 4/4. +Questo assicura che gli sviluppatori capiranno l'ordine in cui le patch +dovrebbero essere applicate, e per tracciare quelle che hanno revisionate o +che hanno applicato. + +Un paio di esempi di oggetti:: + + Subject: [PATCH 2/5] ext2: improve scalability of bitmap searching + Subject: [PATCH v2 01/27] x86: fix eflags tracking + +La riga ``from`` dev'essere la prima nel corpo del messaggio ed è nel +formato: + + From: Original Author + +La riga ``from`` indica chi verrà accreditato nel changelog permanente come +l'autore della patch. Se la riga ``from`` è mancante, allora per determinare +l'autore da inserire nel changelog verrà usata la riga ``From`` +nell'intestazione dell'email. + +Il corpo della spiegazione verrà incluso nel changelog permanente, per cui +deve aver senso per un lettore esperto che è ha dimenticato i dettagli della +discussione che hanno portato alla patch. L'inclusione di informazioni +sui problemi oggetto dalla patch (messaggi del kernel, messaggi di oops, +eccetera) è particolarmente utile per le persone che potrebbero cercare fra +i messaggi di log per la patch che li tratta. Se la patch corregge un errore +di compilazione, non sarà necessario includere proprio _tutto_ quello che +è uscito dal compilatore; aggiungete solo quello che è necessario per far si +che la vostra patch venga trovata. Come nella ``summary phrase``, è importante +essere sia brevi che descrittivi. + +La linea di demarcazione ``---`` serve essenzialmente a segnare dove finisce +il messaggio di changelog. + +Aggiungere il ``diffstat`` dopo ``---`` è un buon uso di questo spazio, per +mostrare i file che sono cambiati, e il numero di file aggiunto o rimossi. +Un ``diffstat`` è particolarmente utile per le patch grandi. Altri commenti +che sono importanti solo per i manutentori, quindi inadatti al changelog +permanente, dovrebbero essere messi qui. Un buon esempio per questo tipo +di commenti potrebbe essere quello di descrivere le differenze fra le versioni +della patch. + +Se includete un ``diffstat`` dopo ``---``, usate le opzioni ``-p 1 -w70`` +cosicché i nomi dei file elencati non occupino troppo spazio (facilmente +rientreranno negli 80 caratteri, magari con qualche indentazione). +(``git`` genera di base dei diffstat adatti). + +Maggiori dettagli sul formato delle patch nei riferimenti qui di seguito. + +.. 
_it_explicit_in_reply_to: + +15) Usare esplicitamente In-Reply-To nell'intestazione +------------------------------------------------------ + +Aggiungere manualmente In-Reply-To: nell'intestazione dell'e-mail +potrebbe essere d'aiuto per associare una patch ad una discussione +precedente, per esempio per collegare la correzione di un baco con l'e-mail +che lo riportava. Tuttavia, per serie di patch multiple è generalmente +sconsigliato l'uso di In-Reply-To: per collegare precedenti versioni. +In questo modo versioni multiple di una patch non diventeranno un'ingestibile +giungla di riferimenti all'interno dei programmi di posta. Se un collegamento +è utile, potete usare https://lkml.kernel.org/ per ottenere i collegamenti +ad una versione precedente di una serie di patch (per esempio, potete usarlo +per l'email introduttiva alla serie). + +16) Inviare richieste ``git pull`` +---------------------------------- + +Se avete una serie di patch, potrebbe essere più conveniente per un manutentore +tirarle dentro al repositorio del sottosistema attraverso l'operazione +``git pull``. Comunque, tenete presente che prendere patch da uno sviluppatore +in questo modo richiede un livello di fiducia più alto rispetto a prenderle da +una lista di discussione. Di conseguenza, molti manutentori sono riluttanti +ad accettare richieste di *pull*, specialmente dagli sviluppatori nuovi e +quindi sconosciuti. Se siete in dubbio, potete fare una richiesta di *pull* +come messaggio introduttivo ad una normale pubblicazione di patch, così +il manutentore avrà la possibilità di scegliere come integrarle. + +Una richiesta di *pull* dovrebbe avere nell'oggetto [GIT] o [PULL]. +La richiesta stessa dovrebbe includere il nome del repositorio e quello del +ramo su una singola riga; dovrebbe essere più o meno così:: + + Please pull from + + git://jdelvare.pck.nerim.net/jdelvare-2.6 i2c-for-linus + + to get these changes: + +Una richiesta di *pull* dovrebbe includere anche un messaggio generico +che dica cos'è incluso, una lista delle patch usando ``git shortlog``, e una +panoramica sugli effetti della serie di patch con ``diffstat``. Il modo più +semplice per ottenere tutte queste informazioni è, ovviamente, quello di +lasciar fare tutto a ``git`` con il comando ``git request-pull``. + +Alcuni manutentori (incluso Linus) vogliono vedere le richieste di *pull* +da commit firmati con GPG; questo fornisce una maggiore garanzia sul fatto +che siate stati proprio voi a fare la richiesta. In assenza di tale etichetta +firmata Linus, in particolare, non prenderà alcuna patch da siti pubblici come +GitHub. + +Il primo passo verso la creazione di questa etichetta firmata è quello di +creare una chiave GNUPG ed averla fatta firmare da uno o più sviluppatori +principali del kernel. Questo potrebbe essere difficile per i nuovi +sviluppatori, ma non ci sono altre vie. Andare alle conferenze potrebbe +essere un buon modo per trovare sviluppatori che possano firmare la vostra +chiave. + +Una volta che avete preparato la vostra serie di patch in ``git``, e volete che +qualcuno le prenda, create una etichetta firmata col comando ``git tag -s``. +Questo creerà una nuova etichetta che identifica l'ultimo commit della serie +contenente una firma creata con la vostra chiave privata. Avrete anche +l'opportunità di aggiungere un messaggio di changelog all'etichetta; questo è +il posto ideale per descrivere gli effetti della richiesta di *pull*. 
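A minimal sketch of the tagging step, using the same my-signed-tag placeholder as the git request-pull example that follows (<public-remote> stands for whatever remote serves your public tree)::

  # sign the tip of the series with your GPG key and attach a changelog message
  git tag -s my-signed-tag -m "Summary of the changes this pull request contains"
  # publish the signed tag on the tree the maintainer will pull from
  git push <public-remote> my-signed-tag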
+ +Se i sorgenti da cui il manutentore prenderà le patch non sono gli stessi del +repositorio su cui state lavorando, allora non dimenticatevi di caricare +l'etichetta firmata anche sui sorgenti pubblici. + +Quando generate una richiesta di *pull*, usate l'etichetta firmata come +obiettivo. Un comando come il seguente farà il suo dovere:: + + git request-pull master git://my.public.tree/linux.git my-signed-tag + + +Riferimenti +----------- + +Andrew Morton, "La patch perfetta" (tpp). + + +Jeff Garzik, "Formato per la sottomissione di patch per il kernel Linux" + + +Greg Kroah-Hartman, "Come scocciare un manutentore di un sottosistema" + + + + + + + + + + + + +No!!!! Basta gigantesche bombe patch alle persone sulla lista linux-kernel@vger.kernel.org! + + +Kernel Documentation/translations/it_IT/process/coding-style.rst: + :ref:`Documentation/translations/it_IT/process/coding-style.rst ` + +E-mail di Linus Torvalds sul formato canonico di una patch: + + +Andi Kleen, "Su come sottomettere patch del kernel" + Alcune strategie su come sottomettere modifiche toste o controverse. + + http://halobates.de/on-submitting-patches.pdf -- cgit v1.2.3-59-g8ed1b From 787d07ed8b2ca726194b461f27fe5785d10add7d Mon Sep 17 00:00:00 2001 From: Chengguang Xu Date: Fri, 4 Jan 2019 22:27:22 +0800 Subject: doc: fix typo in Documentation/hwmon/f71882fg Just fix a typo in Documentation/hwmon/f71882fg. Signed-off-by: Chengguang Xu Signed-off-by: Jonathan Corbet --- Documentation/hwmon/f71882fg | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/hwmon/f71882fg b/Documentation/hwmon/f71882fg index de91c0db5846..4c3cb8377d74 100644 --- a/Documentation/hwmon/f71882fg +++ b/Documentation/hwmon/f71882fg @@ -94,7 +94,7 @@ Note that the lowest numbered temperature zone trip point corresponds to to the border between the highest and one but highest temperature zones, and vica versa. So the temperature zone trip points 1-4 (or 1-2) go from high temp to low temp! This is how things are implemented in the IC, and the driver -mimicks this. +mimics this. There are 2 modes to specify the speed of the fan, PWM duty cycle (or DC voltage) mode, where 0-100% duty cycle (0-100% of 12V) is specified. And RPM -- cgit v1.2.3-59-g8ed1b From 4ab5a5d2a4a2289c2af07accbec7170ca5671f41 Mon Sep 17 00:00:00 2001 From: Thorsten Leemhuis Date: Tue, 8 Jan 2019 20:40:06 +0100 Subject: tools: add a kernel-chktaint to tools/debugging Add a script to the tools/ directory that shows if or why the running kernel was tainted. The script was mostly written by Randy Dunlap; I enhanced the script a bit. There does not appear to be a good home for this script. so create tools/debugging for tools of this nature. 
Signed-off-by: Randy Dunlap Signed-off-by: Thorsten Leemhuis [ jc: fixed conflicts, rewrote changelog ] Signed-off-by: Jonathan Corbet --- tools/Makefile | 14 +-- tools/debugging/Makefile | 16 ++++ tools/debugging/kernel-chktaint | 202 ++++++++++++++++++++++++++++++++++++++++ 3 files changed, 226 insertions(+), 6 deletions(-) create mode 100644 tools/debugging/Makefile create mode 100755 tools/debugging/kernel-chktaint diff --git a/tools/Makefile b/tools/Makefile index abb358a70ad0..c0d1e59f5abb 100644 --- a/tools/Makefile +++ b/tools/Makefile @@ -12,6 +12,7 @@ help: @echo ' acpi - ACPI tools' @echo ' cgroup - cgroup tools' @echo ' cpupower - a tool for all things x86 CPU power' + @echo ' debugging - tools for debugging' @echo ' firewire - the userspace part of nosy, an IEEE-1394 traffic sniffer' @echo ' freefall - laptop accelerometer program for disk protection' @echo ' gpio - GPIO tools' @@ -60,7 +61,7 @@ acpi: FORCE cpupower: FORCE $(call descend,power/$@) -cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci: FORCE +cgroup firewire hv guest spi usb virtio vm bpf iio gpio objtool leds wmi pci debugging: FORCE $(call descend,$@) liblockdep: FORCE @@ -95,7 +96,8 @@ kvm_stat: FORCE all: acpi cgroup cpupower gpio hv firewire liblockdep \ perf selftests spi turbostat usb \ virtio vm bpf x86_energy_perf_policy \ - tmon freefall iio objtool kvm_stat wmi pci + tmon freefall iio objtool kvm_stat wmi \ + pci debugging acpi_install: $(call descend,power/$(@:_install=),install) @@ -103,7 +105,7 @@ acpi_install: cpupower_install: $(call descend,power/$(@:_install=),install) -cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install: +cgroup_install firewire_install gpio_install hv_install iio_install perf_install spi_install usb_install virtio_install vm_install bpf_install objtool_install wmi_install pci_install debugging_install: $(call descend,$(@:_install=),install) liblockdep_install: @@ -129,7 +131,7 @@ install: acpi_install cgroup_install cpupower_install gpio_install \ perf_install selftests_install turbostat_install usb_install \ virtio_install vm_install bpf_install x86_energy_perf_policy_install \ tmon_install freefall_install objtool_install kvm_stat_install \ - wmi_install pci_install + wmi_install pci_install debugging_install acpi_clean: $(call descend,power/acpi,clean) @@ -137,7 +139,7 @@ acpi_clean: cpupower_clean: $(call descend,power/cpupower,clean) -cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean: +cgroup_clean hv_clean firewire_clean spi_clean usb_clean virtio_clean vm_clean wmi_clean bpf_clean iio_clean gpio_clean objtool_clean leds_clean pci_clean debugging_clean: $(call descend,$(@:_clean=),clean) liblockdep_clean: @@ -175,6 +177,6 @@ clean: acpi_clean cgroup_clean cpupower_clean hv_clean firewire_clean \ perf_clean selftests_clean turbostat_clean spi_clean usb_clean virtio_clean \ vm_clean bpf_clean iio_clean x86_energy_perf_policy_clean tmon_clean \ freefall_clean build_clean libbpf_clean libsubcmd_clean liblockdep_clean \ - gpio_clean objtool_clean leds_clean wmi_clean pci_clean + gpio_clean objtool_clean leds_clean wmi_clean pci_clean debugging_clean .PHONY: FORCE diff --git a/tools/debugging/Makefile b/tools/debugging/Makefile new file mode 100644 index 000000000000..e2b7c1a6fb8f --- /dev/null +++ 
b/tools/debugging/Makefile @@ -0,0 +1,16 @@ +# SPDX-License-Identifier: GPL-2.0 +# Makefile for debugging tools + +PREFIX ?= /usr +BINDIR ?= bin +INSTALL ?= install + +TARGET = kernel-chktaint + +all: $(TARGET) + +clean: + +install: kernel-chktaint + $(INSTALL) -D -m 755 $(TARGET) $(DESTDIR)$(PREFIX)/$(BINDIR)/$(TARGET) + diff --git a/tools/debugging/kernel-chktaint b/tools/debugging/kernel-chktaint new file mode 100755 index 000000000000..2240cb56e6e5 --- /dev/null +++ b/tools/debugging/kernel-chktaint @@ -0,0 +1,202 @@ +#! /bin/sh +# SPDX-License-Identifier: GPL-2.0 +# +# Randy Dunlap , 2018 +# Thorsten Leemhuis , 2018 + +usage() +{ + cat < + +Call without parameters to decode /proc/sys/kernel/tainted. + +Call with a positive integer as parameter to decode a value you +retrieved from /proc/sys/kernel/tainted on another system. + +EOF +} + +if [ "$1"x != "x" ]; then + if [ "$1"x == "--helpx" ] || [ "$1"x == "-hx" ] ; then + usage + exit 1 + elif [ $1 -ge 0 ] 2>/dev/null ; then + taint=$1 + else + echo "Error: Parameter '$1' not a positive interger. Aborting." >&2 + exit 1 + fi +else + TAINTFILE="/proc/sys/kernel/tainted" + if [ ! -r $TAINTFILE ]; then + echo "No file: $TAINTFILE" + exit + fi + + taint=`cat $TAINTFILE` +fi + +if [ $taint -eq 0 ]; then + echo "Kernel not Tainted" + exit +else + echo "Kernel is \"tainted\" for the following reasons:" +fi + +T=$taint +out= + +addout() { + out=$out$1 +} + +if [ `expr $T % 2` -eq 0 ]; then + addout "G" +else + addout "P" + echo " * proprietary module was loaded (#0)" +fi + +T=`expr $T / 2` +if [ `expr $T % 2` -eq 0 ]; then + addout " " +else + addout "F" + echo " * module was force loaded (#1)" +fi + +T=`expr $T / 2` +if [ `expr $T % 2` -eq 0 ]; then + addout " " +else + addout "S" + echo " * SMP kernel oops on an officially SMP incapable processor (#2)" +fi + +T=`expr $T / 2` +if [ `expr $T % 2` -eq 0 ]; then + addout " " +else + addout "R" + echo " * module was force unloaded (#3)" +fi + +T=`expr $T / 2` +if [ `expr $T % 2` -eq 0 ]; then + addout " " +else + addout "M" + echo " * processor reported a Machine Check Exception (MCE) (#4)" +fi + +T=`expr $T / 2` +if [ `expr $T % 2` -eq 0 ]; then + addout " " +else + addout "B" + echo " * bad page referenced or some unexpected page flags (#5)" +fi + +T=`expr $T / 2` +if [ `expr $T % 2` -eq 0 ]; then + addout " " +else + addout "U" + echo " * taint requested by userspace application (#6)" +fi + +T=`expr $T / 2` +if [ `expr $T % 2` -eq 0 ]; then + addout " " +else + addout "D" + echo " * kernel died recently, i.e. 
there was an OOPS or BUG (#7)" +fi + +T=`expr $T / 2` +if [ `expr $T % 2` -eq 0 ]; then + addout " " +else + addout "A" + echo " * an ACPI table was overridden by user (#8)" +fi + +T=`expr $T / 2` +if [ `expr $T % 2` -eq 0 ]; then + addout " " +else + addout "W" + echo " * kernel issued warning (#9)" +fi + +T=`expr $T / 2` +if [ `expr $T % 2` -eq 0 ]; then + addout " " +else + addout "C" + echo " * staging driver was loaded (#10)" +fi + +T=`expr $T / 2` +if [ `expr $T % 2` -eq 0 ]; then + addout " " +else + addout "I" + echo " * workaround for bug in platform firmware applied (#11)" +fi + +T=`expr $T / 2` +if [ `expr $T % 2` -eq 0 ]; then + addout " " +else + addout "O" + echo " * externally-built ('out-of-tree') module was loaded (#12)" +fi + +T=`expr $T / 2` +if [ `expr $T % 2` -eq 0 ]; then + addout " " +else + addout "E" + echo " * unsigned module was loaded (#13)" +fi + +T=`expr $T / 2` +if [ `expr $T % 2` -eq 0 ]; then + addout " " +else + addout "L" + echo " * soft lockup occurred (#14)" +fi + +T=`expr $T / 2` +if [ `expr $T % 2` -eq 0 ]; then + addout " " +else + addout "K" + echo " * kernel has been live patched (#15)" +fi + +T=`expr $T / 2` +if [ `expr $T % 2` -eq 0 ]; then + addout " " +else + addout "X" + echo " * auxiliary taint, defined for and used by distros (#16)" + +fi +T=`expr $T / 2` +if [ `expr $T % 2` -eq 0 ]; then + addout " " +else + addout "T" + echo " * kernel was built with the struct randomization plugin (#17)" +fi + +echo "For a more detailed explanation of the various taint flags see" +echo " Documentation/admin-guide/tainted-kernels.rst in the the Linux kernel sources" +echo " or https://kernel.org/doc/html/latest/admin-guide/tainted-kernels.html" +echo "Raw taint value as int/string: $taint/'$out'" +#EOF# -- cgit v1.2.3-59-g8ed1b From 896dd323abbf6a9980d8aca2656b6c4bf5352c3b Mon Sep 17 00:00:00 2001 From: Thorsten Leemhuis Date: Tue, 8 Jan 2019 20:40:07 +0100 Subject: docs: Revamp tainted-kernels.rst to make it more comprehensible Add a section about decoding /proc/sys/kernel/tainted, create a more understandable intro and a hopefully explain better the tainted flags in bugs, oops or panics messages. Only thing missing then is a table that quickly describes the various bits and taint flags before going into more detail, so add that as well. That table is partly based on a section from Documentation/sysctl/kernel.txt, but a bit more compact. To avoid confusion I added the shortened version to kernel.txt; the same table is used in three different places now: ./tools/debugging/kernel-chktaint, Documentation/admin-guide/tainted-kernels.rst and Documentation/sysctl/kernel.txt During review of v1 (see above) a number of existing issues with the text were raised, like outdated usages as well as incomplete or missing descriptions. Address most of those as well. Signed-off-by: Thorsten Leemhuis [jc: tightened up changelog] Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/tainted-kernels.rst | 159 +++++++++++++++++++++----- Documentation/sysctl/kernel.txt | 50 ++++---- 2 files changed, 154 insertions(+), 55 deletions(-) diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst index 28a869c509a0..71e9184a9079 100644 --- a/Documentation/admin-guide/tainted-kernels.rst +++ b/Documentation/admin-guide/tainted-kernels.rst @@ -1,59 +1,164 @@ Tainted kernels --------------- -Some oops reports contain the string **'Tainted: '** after the program -counter. 
This indicates that the kernel has been tainted by some -mechanism. The string is followed by a series of position-sensitive -characters, each representing a particular tainted value. - - 1) ``G`` if all modules loaded have a GPL or compatible license, ``P`` if +The kernel will mark itself as 'tainted' when something occurs that might be +relevant later when investigating problems. Don't worry too much about this, +most of the time it's not a problem to run a tainted kernel; the information is +mainly of interest once someone wants to investigate some problem, as its real +cause might be the event that got the kernel tainted. That's why bug reports +from tainted kernels will often be ignored by developers, hence try to reproduce +problems with an untainted kernel. + +Note the kernel will remain tainted even after you undo what caused the taint +(i.e. unload a proprietary kernel module), to indicate the kernel remains not +trustworthy. That's also why the kernel will print the tainted state when it +notices an internal problem (a 'kernel bug'), a recoverable error +('kernel oops') or a non-recoverable error ('kernel panic') and writes debug +information about this to the logs ``dmesg`` outputs. It's also possible to +check the tainted state at runtime through a file in ``/proc/``. + + +Tainted flag in bugs, oops or panics messages +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +You find the tainted state near the top in a line starting with 'CPU:'; if or +why the kernel was tainted is shown after the Process ID ('PID:') and a shortened +name of the command ('Comm:') that triggered the event:: + + BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 + Oops: 0002 [#1] SMP PTI + CPU: 0 PID: 4424 Comm: insmod Tainted: P W O 4.20.0-0.rc6.fc30 #1 + Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 + RIP: 0010:my_oops_init+0x13/0x1000 [kpanic] + [...] + +You'll find a 'Not tainted: ' there if the kernel was not tainted at the +time of the event; if it was, then it will print 'Tainted: ' followed by characters, +either letters or blanks. In the above example it looks like this:: + + Tainted: P W O + +The meaning of those characters is explained in the table below. In this case +the kernel got tainted earlier because a proprietary module (``P``) was loaded, +a warning occurred (``W``), and an externally-built module was loaded (``O``). +To decode other letters use the table below. + + +Decoding tainted state at runtime +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +At runtime, you can query the tainted state by reading +``cat /proc/sys/kernel/tainted``. If that returns ``0``, the kernel is not +tainted; any other number indicates the reasons why it is. The easiest way to +decode that number is the script ``tools/debugging/kernel-chktaint``, which your +distribution might ship as part of a package called ``linux-tools`` or +``kernel-tools``; if it doesn't you can download the script from +`git.kernel.org `_ +and execute it with ``sh kernel-chktaint``, which would print something like +this on the machine that had the statements in the logs that were quoted earlier:: + + Kernel is Tainted for following reasons: + * Proprietary module was loaded (#0) + * Kernel issued warning (#9) + * Externally-built ('out-of-tree') module was loaded (#12) + See Documentation/admin-guide/tainted-kernels.rst in the Linux kernel or + https://www.kernel.org/doc/html/latest/admin-guide/tainted-kernels.html for + a more detailed explanation of the various taint flags. 
+ Raw taint value as int/string: 4609/'P W O ' + +You can try to decode the number yourself. That's easy if there was only one +reason that got your kernel tainted, as in this case you can find the number +with the table below. If there were multiple reasons you need to decode the +number, as it is a bitfield, where each bit indicates the absence or presence of +a particular type of taint. It's best to leave that to the aforementioned +script, but if you need something quick you can use this shell command to check +which bits are set:: + + $ for i in $(seq 18); do echo $(($i-1)) $(($(cat /proc/sys/kernel/tainted)>>($i-1)&1));done + +Table for decoding tainted state +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +=== === ====== ======================================================== +Bit Log Number Reason that got the kernel tainted +=== === ====== ======================================================== + 0 G/P 1 proprietary module was loaded + 1 _/F 2 module was force loaded + 2 _/S 4 SMP kernel oops on an officially SMP incapable processor + 3 _/R 8 module was force unloaded + 4 _/M 16 processor reported a Machine Check Exception (MCE) + 5 _/B 32 bad page referenced or some unexpected page flags + 6 _/U 64 taint requested by userspace application + 7 _/D 128 kernel died recently, i.e. there was an OOPS or BUG + 8 _/A 256 ACPI table overridden by user + 9 _/W 512 kernel issued warning + 10 _/C 1024 staging driver was loaded + 11 _/I 2048 workaround for bug in platform firmware applied + 12 _/O 4096 externally-built ("out-of-tree") module was loaded + 13 _/E 8192 unsigned module was loaded + 14 _/L 16384 soft lockup occurred + 15 _/K 32768 kernel has been live patched + 16 _/X 65536 auxiliary taint, defined for and used by distros + 17 _/T 131072 kernel was built with the struct randomization plugin +=== === ====== ======================================================== + +Note: The character ``_`` is representing a blank in this table to make reading +easier. + +More detailed explanation for tainting +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + 0) ``G`` if all modules loaded have a GPL or compatible license, ``P`` if any proprietary module has been loaded. Modules without a MODULE_LICENSE or with a MODULE_LICENSE that is not recognised by insmod as GPL compatible are assumed to be proprietary. - 2) ``F`` if any module was force loaded by ``insmod -f``, ``' '`` if all + 1) ``F`` if any module was force loaded by ``insmod -f``, ``' '`` if all modules were loaded normally. - 3) ``S`` if the oops occurred on an SMP kernel running on hardware that + 2) ``S`` if the oops occurred on an SMP kernel running on hardware that hasn't been certified as safe to run multiprocessor. Currently this occurs only on various Athlons that are not SMP capable. - 4) ``R`` if a module was force unloaded by ``rmmod -f``, ``' '`` if all + 3) ``R`` if a module was force unloaded by ``rmmod -f``, ``' '`` if all modules were unloaded normally. - 5) ``M`` if any processor has reported a Machine Check Exception, + 4) ``M`` if any processor has reported a Machine Check Exception, ``' '`` if no Machine Check Exceptions have occurred. - 6) ``B`` if a page-release function has found a bad page reference or - some unexpected page flags. + 5) ``B`` If a page-release function has found a bad page reference or some + unexpected page flags. This indicates a hardware problem or a kernel bug; + there should be other information in the log indicating why this tainting + occured. 
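For reference, the decoding that the shell one-liner and the table above describe can also be sketched as a small userspace C program. This is illustrative only and not part of the patch; the file name and the hard-coded value 4609 (the example quoted above) are assumptions.

  /* taint-decode.c: illustrative sketch, mirrors the table above */
  #include <stdio.h>

  static const char * const reason[] = {
          "proprietary module was loaded",                            /* 0,  G/P */
          "module was force loaded",                                  /* 1,  F */
          "SMP kernel oops on an officially SMP incapable processor", /* 2,  S */
          "module was force unloaded",                                /* 3,  R */
          "processor reported a Machine Check Exception (MCE)",       /* 4,  M */
          "bad page referenced or some unexpected page flags",        /* 5,  B */
          "taint requested by userspace application",                 /* 6,  U */
          "kernel died recently, i.e. there was an OOPS or BUG",      /* 7,  D */
          "ACPI table overridden by user",                            /* 8,  A */
          "kernel issued warning",                                    /* 9,  W */
          "staging driver was loaded",                                /* 10, C */
          "workaround for bug in platform firmware applied",          /* 11, I */
          "externally-built (\"out-of-tree\") module was loaded",     /* 12, O */
          "unsigned module was loaded",                               /* 13, E */
          "soft lockup occurred",                                     /* 14, L */
          "kernel has been live patched",                             /* 15, K */
          "auxiliary taint, defined for and used by distros",         /* 16, X */
          "kernel was built with the struct randomization plugin",    /* 17, T */
  };

  int main(void)
  {
          /* in practice this value would be read from /proc/sys/kernel/tainted */
          unsigned long taint = 4609;
          int bit;

          for (bit = 0; bit < 18; bit++)
                  if (taint & (1UL << bit))
                          printf(" * %s (#%d)\n", reason[bit], bit);
          return 0;
  }

For 4609 this prints the three reasons quoted earlier, since bits 0, 9 and 12 are set.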
- 7) ``U`` if a user or user application specifically requested that the + 6) ``U`` if a user or user application specifically requested that the Tainted flag be set, ``' '`` otherwise. - 8) ``D`` if the kernel has died recently, i.e. there was an OOPS or BUG. + 7) ``D`` if the kernel has died recently, i.e. there was an OOPS or BUG. - 9) ``A`` if the ACPI table has been overridden. + 8) ``A`` if an ACPI table has been overridden. - 10) ``W`` if a warning has previously been issued by the kernel. + 9) ``W`` if a warning has previously been issued by the kernel. (Though some warnings may set more specific taint flags.) - 11) ``C`` if a staging driver has been loaded. + 10) ``C`` if a staging driver has been loaded. - 12) ``I`` if the kernel is working around a severe bug in the platform + 11) ``I`` if the kernel is working around a severe bug in the platform firmware (BIOS or similar). - 13) ``O`` if an externally-built ("out-of-tree") module has been loaded. + 12) ``O`` if an externally-built ("out-of-tree") module has been loaded. - 14) ``E`` if an unsigned module has been loaded in a kernel supporting + 13) ``E`` if an unsigned module has been loaded in a kernel supporting module signature. - 15) ``L`` if a soft lockup has previously occurred on the system. + 14) ``L`` if a soft lockup has previously occurred on the system. + + 15) ``K`` if the kernel has been live patched. - 16) ``K`` if the kernel has been live patched. + 16) ``X`` Auxiliary taint, defined for and used by Linux distributors. -The primary reason for the **'Tainted: '** string is to tell kernel -debuggers if this is a clean kernel or if anything unusual has -occurred. Tainting is permanent: even if an offending module is -unloaded, the tainted value remains to indicate that the kernel is not -trustworthy. + 17) ``T`` Kernel was build with the randstruct plugin, which can intentionally + produce extremely unusual kernel structure layouts (even performance + pathological ones), which is important to know when debugging. Set at + build time. diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index 1b8775298cf7..b279b824b887 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -93,7 +93,7 @@ show up in /proc/sys/kernel: - stop-a [ SPARC only ] - sysrq ==> Documentation/admin-guide/sysrq.rst - sysctl_writes_strict -- tainted +- tainted ==> Documentation/admin-guide/tainted-kernels.rst - threads-max - unknown_nmi_panic - watchdog @@ -1002,39 +1002,33 @@ compilation sees a 1% slowdown, other systems and workloads may vary. 1: kernel stack erasing is enabled (default), it is performed before returning to the userspace at the end of syscalls. - ============================================================== -tainted: +tainted Non-zero if the kernel has been tainted. Numeric values, which can be ORed together. The letters are seen in "Tainted" line of Oops reports. - 1 (P): A module with a non-GPL license has been loaded, this - includes modules with no license. - Set by modutils >= 2.4.9 and module-init-tools. - 2 (F): A module was force loaded by insmod -f. - Set by modutils >= 2.4.9 and module-init-tools. - 4 (S): Unsafe SMP processors: SMP with CPUs not designed for SMP. - 8 (R): A module was forcibly unloaded from the system by rmmod -f. - 16 (M): A hardware machine check error occurred on the system. - 32 (B): A bad page was discovered on the system. - 64 (U): The user has asked that the system be marked "tainted". 
This - could be because they are running software that directly modifies - the hardware, or for other reasons. - 128 (D): The system has died. - 256 (A): The ACPI DSDT has been overridden with one supplied by the user - instead of using the one provided by the hardware. - 512 (W): A kernel warning has occurred. - 1024 (C): A module from drivers/staging was loaded. - 2048 (I): The system is working around a severe firmware bug. - 4096 (O): An out-of-tree module has been loaded. - 8192 (E): An unsigned module has been loaded in a kernel supporting module - signature. - 16384 (L): A soft lockup has previously occurred on the system. - 32768 (K): The kernel has been live patched. - 65536 (X): Auxiliary taint, defined and used by for distros. -131072 (T): The kernel was built with the struct randomization plugin. + 1 (P): proprietary module was loaded + 2 (F): module was force loaded + 4 (S): SMP kernel oops on an officially SMP incapable processor + 8 (R): module was force unloaded + 16 (M): processor reported a Machine Check Exception (MCE) + 32 (B): bad page referenced or some unexpected page flags + 64 (U): taint requested by userspace application + 128 (D): kernel died recently, i.e. there was an OOPS or BUG + 256 (A): an ACPI table was overridden by user + 512 (W): kernel issued warning + 1024 (C): staging driver was loaded + 2048 (I): workaround for bug in platform firmware applied + 4096 (O): externally-built ("out-of-tree") module was loaded + 8192 (E): unsigned module was loaded + 16384 (L): soft lockup occurred + 32768 (K): kernel has been live patched + 65536 (X): Auxiliary taint, defined and used by for distros +131072 (T): The kernel was built with the struct randomization plugin + +See Documentation/admin-guide/tainted-kernels.rst for more information. ============================================================== -- cgit v1.2.3-59-g8ed1b From 7fbc258fea74d888b5ac6ba15c50ef2a05b5243b Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Sun, 13 Jan 2019 19:28:58 -0800 Subject: Documentation: fix coding-style.rst Sphinx warning Fix Sphinx warning in coding-style.rst: Documentation/process/coding-style.rst:446: WARNING: Inline interpreted text or phrase reference start-string without end-string. Signed-off-by: Randy Dunlap Signed-off-by: Jonathan Corbet --- Documentation/process/coding-style.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst index afa11024279a..4e611a01adc6 100644 --- a/Documentation/process/coding-style.rst +++ b/Documentation/process/coding-style.rst @@ -443,7 +443,7 @@ In function prototypes, include parameter names with their data types. Although this is not required by the C language, it is preferred in Linux because it is a simple way to add valuable information for the reader. -Do not use the `extern' keyword with function prototypes as this makes +Do not use the ``extern`` keyword with function prototypes as this makes lines longer and isn't strictly necessary. -- cgit v1.2.3-59-g8ed1b From 5591a3075e956af501430bdf2b3516a69f4fe65d Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Sun, 13 Jan 2019 19:21:46 -0800 Subject: Documentation: add ibmvmc to toctree(index) and fix warnings Fix Sphinx warnings in ibmvmc.rst, add an index.rst file in Documentation/misc-devices/, and insert that index file into the top-level index file. Documentation/misc-devices/ibmvmc.rst:2: WARNING: Explicit markup ends without a blank line; unexpected unindent. 
Documentation/misc-devices/ibmvmc.rst:: WARNING: document isn't included in any toctree Signed-off-by: Randy Dunlap Cc: Steven Royer Cc: Jonathan Corbet Signed-off-by: Jonathan Corbet --- Documentation/index.rst | 1 + Documentation/misc-devices/ibmvmc.rst | 1 + Documentation/misc-devices/index.rst | 17 +++++++++++++++++ 3 files changed, 19 insertions(+) create mode 100644 Documentation/misc-devices/index.rst diff --git a/Documentation/index.rst b/Documentation/index.rst index c858c2e66e36..80a421cb935e 100644 --- a/Documentation/index.rst +++ b/Documentation/index.rst @@ -90,6 +90,7 @@ needed). filesystems/index vm/index bpf/index + misc-devices/index Architecture-specific documentation ----------------------------------- diff --git a/Documentation/misc-devices/ibmvmc.rst b/Documentation/misc-devices/ibmvmc.rst index 46ded79554d4..b46df4ea2b81 100644 --- a/Documentation/misc-devices/ibmvmc.rst +++ b/Documentation/misc-devices/ibmvmc.rst @@ -1,4 +1,5 @@ .. SPDX-License-Identifier: GPL-2.0+ + ====================================================== IBM Virtual Management Channel Kernel Driver (IBMVMC) ====================================================== diff --git a/Documentation/misc-devices/index.rst b/Documentation/misc-devices/index.rst new file mode 100644 index 000000000000..dfd1f45a3127 --- /dev/null +++ b/Documentation/misc-devices/index.rst @@ -0,0 +1,17 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============================================ +Assorted Miscellaneous Devices Documentation +============================================ + +This documentation contains information for assorted devices that do not +fit into other categories. + +.. class:: toc-title + + Table of contents + +.. toctree:: + :maxdepth: 2 + + ibmvmc -- cgit v1.2.3-59-g8ed1b From 35283f56626cbc6d5b2dcb42f6d2189ce34fee3d Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Fri, 11 Jan 2019 14:40:59 +0100 Subject: Documentation/filesystems: add binderfs This documents the Android binderfs filesystem used to dynamically add and remove binder devices that are private to each instance. Signed-off-by: Christian Brauner [jc: tweaked markup and added to filesystems/index.rst] Signed-off-by: Jonathan Corbet --- Documentation/filesystems/binderfs.rst | 68 ++++++++++++++++++++++++++++++++++ Documentation/filesystems/index.rst | 7 ++++ 2 files changed, 75 insertions(+) create mode 100644 Documentation/filesystems/binderfs.rst diff --git a/Documentation/filesystems/binderfs.rst b/Documentation/filesystems/binderfs.rst new file mode 100644 index 000000000000..c009671f8434 --- /dev/null +++ b/Documentation/filesystems/binderfs.rst @@ -0,0 +1,68 @@ +.. SPDX-License-Identifier: GPL-2.0 + +The Android binderfs Filesystem +=============================== + +Android binderfs is a filesystem for the Android binder IPC mechanism. It +allows to dynamically add and remove binder devices at runtime. Binder devices +located in a new binderfs instance are independent of binder devices located in +other binderfs instances. Mounting a new binderfs instance makes it possible +to get a set of private binder devices. + +Mounting binderfs +----------------- + +Android binderfs can be mounted with:: + + mkdir /dev/binderfs + mount -t binder binder /dev/binderfs + +at which point a new instance of binderfs will show up at ``/dev/binderfs``. +In a fresh instance of binderfs no binder devices will be present. There will +only be a ``binder-control`` device which serves as the request handler for +binderfs. 
Mounting another binderfs instance at a different location will +create a new and separate instance from all other binderfs mounts. This is +identical to the behavior of e.g. ``devpts`` and ``tmpfs``. The Android +binderfs filesystem can be mounted in user namespaces. + +Options +------- +max + binderfs instances can be mounted with a limit on the number of binder + devices that can be allocated. The ``max=`` mount option serves as + a per-instance limit. If ``max=`` is set then only ```` number + of binder devices can be allocated in this binderfs instance. + +Allocating binder Devices +------------------------- + +.. _ioctl: http://man7.org/linux/man-pages/man2/ioctl.2.html + +To allocate a new binder device in a binderfs instance a request needs to be +sent through the ``binder-control`` device node. A request is sent in the form +of an `ioctl() `_. + +What a program needs to do is to open the ``binder-control`` device node and +send a ``BINDER_CTL_ADD`` request to the kernel. Users of binderfs need to +tell the kernel which name the new binder device should get. By default a name +can only contain up to ``BINDERFS_MAX_NAME`` chars including the terminating +zero byte. + +Once the request is made via an `ioctl() `_ passing a ``struct +binder_device`` with the name to the kernel it will allocate a new binder +device and return the major and minor number of the new device in the struct +(This is necessary because binderfs allocates a major device number +dynamically.). After the `ioctl() `_ returns there will be a new +binder device located under /dev/binderfs with the chosen name. + +Deleting binder Devices +----------------------- + +.. _unlink: http://man7.org/linux/man-pages/man2/unlink.2.html +.. _rm: http://man7.org/linux/man-pages/man1/rm.1.html + +Binderfs binder devices can be deleted via `unlink() `_. This means +that the `rm() `_ tool can be used to delete them. Note that the +``binder-control`` device cannot be deleted since this would make the binderfs +instance unuseable. The ``binder-control`` device will be deleted when the +binderfs instance is unmounted and all references to it have been dropped. diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst index 605befab300b..61d2441b25d5 100644 --- a/Documentation/filesystems/index.rst +++ b/Documentation/filesystems/index.rst @@ -380,3 +380,10 @@ including: :maxdepth: 2 path-lookup.rst + +binderfs +======== + +.. toctree:: + + binderfs.rst -- cgit v1.2.3-59-g8ed1b From 9762dc1432e10d5571c1ae746c41469504360c2a Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Fri, 11 Jan 2019 14:41:00 +0100 Subject: samples: add binderfs sample program This adds a simple sample program mounting binderfs and adding, then removing a binder device. Hopefully, it will be helpful to users who want to know how binderfs is supposed to be used. Signed-off-by: Christian Brauner Signed-off-by: Jonathan Corbet --- samples/Kconfig | 7 ++++ samples/Makefile | 2 +- samples/binderfs/Makefile | 1 + samples/binderfs/binderfs_example.c | 83 +++++++++++++++++++++++++++++++++++++ 4 files changed, 92 insertions(+), 1 deletion(-) create mode 100644 samples/binderfs/Makefile create mode 100644 samples/binderfs/binderfs_example.c diff --git a/samples/Kconfig b/samples/Kconfig index ad1ec7016d4c..d19754ccad08 100644 --- a/samples/Kconfig +++ b/samples/Kconfig @@ -147,6 +147,13 @@ config SAMPLE_VFIO_MDEV_MBOCHS Specifically it does *not* include any legacy vga stuff. Device looks a lot like "qemu -device secondary-vga". 
+config SAMPLE_ANDROID_BINDERFS + bool "Build Android binderfs example" + depends on CONFIG_ANDROID_BINDERFS + help + Builds a sample program to illustrate the use of the Android binderfs + filesystem. + config SAMPLE_STATX bool "Build example extended-stat using code" depends on BROKEN diff --git a/samples/Makefile b/samples/Makefile index bd601c038b86..b1142a958811 100644 --- a/samples/Makefile +++ b/samples/Makefile @@ -3,4 +3,4 @@ obj-$(CONFIG_SAMPLES) += kobject/ kprobes/ trace_events/ livepatch/ \ hw_breakpoint/ kfifo/ kdb/ hidraw/ rpmsg/ seccomp/ \ configfs/ connector/ v4l/ trace_printk/ \ - vfio-mdev/ statx/ qmi/ + vfio-mdev/ statx/ qmi/ binderfs/ diff --git a/samples/binderfs/Makefile b/samples/binderfs/Makefile new file mode 100644 index 000000000000..01ca9f2529a7 --- /dev/null +++ b/samples/binderfs/Makefile @@ -0,0 +1 @@ +obj-$(CONFIG_SAMPLE_ANDROID_BINDERFS) += binderfs_example.o diff --git a/samples/binderfs/binderfs_example.c b/samples/binderfs/binderfs_example.c new file mode 100644 index 000000000000..5bbd2ebc0aea --- /dev/null +++ b/samples/binderfs/binderfs_example.c @@ -0,0 +1,83 @@ +// SPDX-License-Identifier: GPL-2.0 + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +int main(int argc, char *argv[]) +{ + int fd, ret, saved_errno; + size_t len; + struct binderfs_device device = { 0 }; + + ret = unshare(CLONE_NEWNS); + if (ret < 0) { + fprintf(stderr, "%s - Failed to unshare mount namespace\n", + strerror(errno)); + exit(EXIT_FAILURE); + } + + ret = mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, 0); + if (ret < 0) { + fprintf(stderr, "%s - Failed to mount / as private\n", + strerror(errno)); + exit(EXIT_FAILURE); + } + + ret = mkdir("/dev/binderfs", 0755); + if (ret < 0 && errno != EEXIST) { + fprintf(stderr, "%s - Failed to create binderfs mountpoint\n", + strerror(errno)); + exit(EXIT_FAILURE); + } + + ret = mount(NULL, "/dev/binderfs", "binder", 0, 0); + if (ret < 0) { + fprintf(stderr, "%s - Failed to mount binderfs\n", + strerror(errno)); + exit(EXIT_FAILURE); + } + + memcpy(device.name, "my-binder", strlen("my-binder")); + + fd = open("/dev/binderfs/binder-control", O_RDONLY | O_CLOEXEC); + if (fd < 0) { + fprintf(stderr, "%s - Failed to open binder-control device\n", + strerror(errno)); + exit(EXIT_FAILURE); + } + + ret = ioctl(fd, BINDER_CTL_ADD, &device); + saved_errno = errno; + close(fd); + errno = saved_errno; + if (ret < 0) { + fprintf(stderr, "%s - Failed to allocate new binder device\n", + strerror(errno)); + exit(EXIT_FAILURE); + } + + printf("Allocated new binder device with major %d, minor %d, and name %s\n", + device.major, device.minor, device.name); + + ret = unlink("/dev/binderfs/my-binder"); + if (ret < 0) { + fprintf(stderr, "%s - Failed to delete binder device\n", + strerror(errno)); + exit(EXIT_FAILURE); + } + + /* Cleanup happens when the mount namespace dies. */ + exit(EXIT_SUCCESS); +} -- cgit v1.2.3-59-g8ed1b From 631605c007531ef0b48690740519d0b9a6b22d38 Mon Sep 17 00:00:00 2001 From: Vincent Whitchurch Date: Fri, 11 Jan 2019 17:14:10 +0100 Subject: Documentation/sysctl/vm.txt: Fix drop_caches bit number Bits are usually numbered starting from zero, so 4 should be bit 2, not bit 3. 
Suggested-by: Matthew Wilcox Signed-off-by: Vincent Whitchurch Signed-off-by: Jonathan Corbet --- Documentation/sysctl/vm.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt index 187ce4f599a2..6af24cdb25cc 100644 --- a/Documentation/sysctl/vm.txt +++ b/Documentation/sysctl/vm.txt @@ -237,7 +237,7 @@ used: cat (1234): drop_caches: 3 These are informational only. They do not mean that anything is wrong -with your system. To disable them, echo 4 (bit 3) into drop_caches. +with your system. To disable them, echo 4 (bit 2) into drop_caches. ============================================================== -- cgit v1.2.3-59-g8ed1b From 58f4df3c1bde999574d3e66b20eb7ee796a2647e Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Mon, 14 Jan 2019 11:08:07 +0100 Subject: Documentation/dev-tools: Use gcc version number instead svn revision number svn commit 231296 matches commit d29e939c63b71 ("Add fuzzing coverage support") in the gcc git. The change is part of gcc 6.1.0. Replace the svn commit number with a gcc version which everyone can easily compare. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Jonathan Corbet --- Documentation/dev-tools/kcov.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/dev-tools/kcov.rst b/Documentation/dev-tools/kcov.rst index c2f6452e38ed..42b612677799 100644 --- a/Documentation/dev-tools/kcov.rst +++ b/Documentation/dev-tools/kcov.rst @@ -22,7 +22,7 @@ Configure the kernel with:: CONFIG_KCOV=y -CONFIG_KCOV requires gcc built on revision 231296 or later. +CONFIG_KCOV requires gcc 6.1.0 or later. If the comparison operands need to be collected, set:: -- cgit v1.2.3-59-g8ed1b From 98e5f349c9a0253d4a213d54db918f2aab5846ee Mon Sep 17 00:00:00 2001 From: Mike Rapoport Date: Mon, 14 Jan 2019 13:47:34 +0200 Subject: docs/core-api: memory-allocation: add mention of kmem_cache_create_userspace Mention that when a part of a slab cache might be exported to the userspace, the cache should be created using kmem_cache_create_usercopy() Signed-off-by: Mike Rapoport Signed-off-by: Jonathan Corbet --- Documentation/core-api/memory-allocation.rst | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/Documentation/core-api/memory-allocation.rst b/Documentation/core-api/memory-allocation.rst index 8954a88ff5b7..51a200d59f51 100644 --- a/Documentation/core-api/memory-allocation.rst +++ b/Documentation/core-api/memory-allocation.rst @@ -113,9 +113,11 @@ see :c:func:`kvmalloc_node` reference documentation. Note that If you need to allocate many identical objects you can use the slab cache allocator. The cache should be set up with -:c:func:`kmem_cache_create` before it can be used. Afterwards -:c:func:`kmem_cache_alloc` and its convenience wrappers can allocate -memory from that cache. +:c:func:`kmem_cache_create` or :c:func:`kmem_cache_create_usercopy` +before it can be used. The second function should be used if a part of +the cache might be copied to the userspace. After the cache is +created :c:func:`kmem_cache_alloc` and its convenience wrappers can +allocate memory from that cache. When the allocated memory is no longer needed it must be freed. 
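A minimal sketch of the slab-cache setup that hunk describes; the structure, cache name and usercopy region are invented for illustration, and plain kmem_cache_create() would do when no part of the object is exposed to user space.

  #include <linux/module.h>
  #include <linux/slab.h>

  struct foo {
          int id;
          char name[16];  /* assume this field may be copied to user space */
  };

  static struct kmem_cache *foo_cache;

  static int __init foo_init(void)
  {
          foo_cache = kmem_cache_create_usercopy("foo_cache", sizeof(struct foo),
                                                 0, SLAB_HWCACHE_ALIGN,
                                                 offsetof(struct foo, name),
                                                 sizeof_field(struct foo, name),
                                                 NULL);
          if (!foo_cache)
                  return -ENOMEM;
          return 0;
  }

  static void __exit foo_exit(void)
  {
          kmem_cache_destroy(foo_cache);
  }

  module_init(foo_init);
  module_exit(foo_exit);
  MODULE_LICENSE("GPL");

Objects would then come from kmem_cache_alloc(foo_cache, GFP_KERNEL) and be returned with kmem_cache_free().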
You can use :c:func:`kvfree` for the memory allocated with `kmalloc`, -- cgit v1.2.3-59-g8ed1b From 80a76c7261d5c3a21eca3dacd6f10f9553c08244 Mon Sep 17 00:00:00 2001 From: Mike Rapoport Date: Mon, 14 Jan 2019 20:32:58 +0200 Subject: docs/core-api/mm: fix GFP combinations section name Fix the mismatch between "Useful GFP flag combinations" section naming in the DOC: section in include/linux/gfp.h and Documentation/core-api/mm-api.rst. This brings in the documentation, and eliminates one warning: ./include/linux/gfp.h:1: warning: no structured comments found Signed-off-by: Mike Rapoport [jc: tweaked changelog] Signed-off-by: Jonathan Corbet --- Documentation/core-api/mm-api.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/core-api/mm-api.rst b/Documentation/core-api/mm-api.rst index aa8e54b85221..128e8a721c1e 100644 --- a/Documentation/core-api/mm-api.rst +++ b/Documentation/core-api/mm-api.rst @@ -35,7 +35,7 @@ users will want to use a plain ``GFP_KERNEL``. :doc: Reclaim modifiers .. kernel-doc:: include/linux/gfp.h - :doc: Common combinations + :doc: Useful GFP flag combinations The Slab Cache ============== -- cgit v1.2.3-59-g8ed1b From 4d01460ec9a454a9b536fd4e050f61389c381d9e Mon Sep 17 00:00:00 2001 From: Joel Nider Date: Mon, 14 Jan 2019 09:14:59 +0200 Subject: docs-rst: doc-guide: Minor grammar fixes While using this guide to learn the new documentation method, I saw a few phrases that I felt could be improved. These small changes improve the grammar and choice of words to further enhance the installation instructions. Signed-off-by: Joel Nider Acked-by: Mike Rapoport Acked-by: Matthew Wilcox Signed-off-by: Jonathan Corbet --- Documentation/doc-guide/sphinx.rst | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/Documentation/doc-guide/sphinx.rst b/Documentation/doc-guide/sphinx.rst index 02605ee1d876..c039224b404e 100644 --- a/Documentation/doc-guide/sphinx.rst +++ b/Documentation/doc-guide/sphinx.rst @@ -27,8 +27,8 @@ Sphinx Install ============== The ReST markups currently used by the Documentation/ files are meant to be -built with ``Sphinx`` version 1.3 or upper. If you're desiring to build -PDF outputs, it is recommended to use version 1.4.6 or upper. +built with ``Sphinx`` version 1.3 or higher. If you desire to build +PDF output, it is recommended to use version 1.4.6 or higher. There's a script that checks for the Sphinx requirements. Please see :ref:`sphinx-pre-install` for further details. @@ -37,15 +37,15 @@ Most distributions are shipped with Sphinx, but its toolchain is fragile, and it is not uncommon that upgrading it or some other Python packages on your machine would cause the documentation build to break. -A way to get rid of that is to use a different version than the one shipped -on your distributions. In order to do that, it is recommended to install +A way to avoid that is to use a different version than the one shipped +with your distributions. In order to do so, it is recommended to install Sphinx inside a virtual environment, using ``virtualenv-3`` or ``virtualenv``, depending on how your distribution packaged Python 3. .. note:: #) Sphinx versions below 1.5 don't work properly with Python's - docutils version 0.13.1 or upper. So, if you're willing to use + docutils version 0.13.1 or higher. So, if you're willing to use those versions, you should run ``pip install 'docutils==0.12'``. #) It is recommended to use the RTD theme for html output. Depending @@ -82,7 +82,7 @@ output. 
PDF and LaTeX builds -------------------- -Such builds are currently supported only with Sphinx versions 1.4 and upper. +Such builds are currently supported only with Sphinx versions 1.4 and higher. For PDF and LaTeX output, you'll also need ``XeLaTeX`` version 3.14159265. -- cgit v1.2.3-59-g8ed1b From b631c7f513548f4097687603ef6f959ccae18da9 Mon Sep 17 00:00:00 2001 From: Jonathan Corbet Date: Wed, 9 Jan 2019 14:01:32 -0700 Subject: docs: don't try to get comments from rcupdate_wait.h or rcutree.h Neither file contains any kerneldoc comments, so including them generates these warnings in the docs build: ./include/linux/rcupdate_wait.h:1: warning: no structured comments found ./include/linux/rcutree.h:1: warning: no structured comments found Remove them and make life a little quieter. Signed-off-by: Jonathan Corbet --- Documentation/core-api/kernel-api.rst | 4 ---- 1 file changed, 4 deletions(-) diff --git a/Documentation/core-api/kernel-api.rst b/Documentation/core-api/kernel-api.rst index cdd24943fbcc..71f5d2fe39b7 100644 --- a/Documentation/core-api/kernel-api.rst +++ b/Documentation/core-api/kernel-api.rst @@ -356,10 +356,6 @@ Read-Copy Update (RCU) .. kernel-doc:: include/linux/rcupdate.h -.. kernel-doc:: include/linux/rcupdate_wait.h - -.. kernel-doc:: include/linux/rcutree.h - .. kernel-doc:: kernel/rcu/tree.c .. kernel-doc:: kernel/rcu/tree_plugin.h -- cgit v1.2.3-59-g8ed1b From 053bc5693863d9230f9a11b923ea713aa6a4ce91 Mon Sep 17 00:00:00 2001 From: Yang Shi Date: Wed, 16 Jan 2019 07:51:35 +0800 Subject: doc: memcontrol: fix the obsolete content about force empty We don't do page cache reparent anymore when offlining memcg, so update force empty related content accordingly. Reviewed-by: Shakeel Butt Acked-by: Michal Hocko Cc: Johannes Weiner Signed-off-by: Yang Shi Signed-off-by: Jonathan Corbet --- Documentation/cgroup-v1/memory.txt | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/Documentation/cgroup-v1/memory.txt b/Documentation/cgroup-v1/memory.txt index 3682e99234c2..8e2cb1dabeb0 100644 --- a/Documentation/cgroup-v1/memory.txt +++ b/Documentation/cgroup-v1/memory.txt @@ -70,7 +70,7 @@ Brief summary of control files. memory.soft_limit_in_bytes # set/show soft limit of memory usage memory.stat # show various statistics memory.use_hierarchy # set/show hierarchical account enabled - memory.force_empty # trigger forced move charge to parent + memory.force_empty # trigger forced page reclaim memory.pressure_level # set memory pressure notifications memory.swappiness # set/show swappiness parameter of vmscan (See sysctl's vm.swappiness) @@ -459,8 +459,9 @@ About use_hierarchy, see Section 6. the cgroup will be reclaimed and as many pages reclaimed as possible. The typical use case for this interface is before calling rmdir(). - Because rmdir() moves all pages to parent, some out-of-use page caches can be - moved to the parent. If you want to avoid that, force_empty will be useful. + Though rmdir() offlines memcg, but the memcg may still stay there due to + charged file caches. Some out-of-use page caches may keep charged until + memory pressure happens. If you want to avoid that, force_empty will be useful. Also, note that when memory.kmem.limit_in_bytes is set the charges due to kernel pages will still be seen. 
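The typical "force_empty before rmdir()" sequence mentioned in that hunk could look like the following userspace sketch. The mount point and group name are assumptions, and writing any value to the file triggers the reclaim.

  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
          const char *grp = "/sys/fs/cgroup/memory/mygrp";  /* assumed v1 hierarchy */
          char path[128];
          int fd;

          snprintf(path, sizeof(path), "%s/memory.force_empty", grp);
          fd = open(path, O_WRONLY);
          if (fd < 0) {
                  perror("open memory.force_empty");
                  return 1;
          }
          if (write(fd, "0", 1) < 0)      /* trigger forced page reclaim */
                  perror("write");
          close(fd);

          if (rmdir(grp) < 0)             /* remove the now-empty group */
                  perror("rmdir");
          return 0;
  }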
This is not considered a failure and the -- cgit v1.2.3-59-g8ed1b From 6e6c61d3e342ee0830844b1ce3d433decfc2321e Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Wed, 16 Jan 2019 11:26:52 +0100 Subject: LICENSES: Add GCC runtime library exception text A recent commit added SPDX identifiers to the SuperH low level library code which originates from GCC. This code is licensed under the GPL 2.0 or later with the GCC runtime library exception. Unfortunately the authors did not bother to add the exception text to the LICENSES directory so spdxcheck fails with: arch/sh/lib/ashiftrt.S: 1:42 Invalid Exception ID: GCC-exception-2.0 arch/sh/lib/ashlsi3.S: 1:42 Invalid Exception ID: GCC-exception-2.0 arch/sh/lib/ashrsi3.S: 1:42 Invalid Exception ID: GCC-exception-2.0 arch/sh/lib/lshrsi3.S: 1:42 Invalid Exception ID: GCC-exception-2.0 arch/sh/lib/movmem.S: 1:42 Invalid Exception ID: GCC-exception-2.0 arch/sh/lib/udiv_qrnnd.S: 1:42 Invalid Exception ID: GCC-exception-2.0 arch/sh/lib/udivsi3.S: 1:42 Invalid Exception ID: GCC-exception-2.0 arch/sh/lib/udivsi3_i4i-Os.S: 1:42 Invalid Exception ID: GCC-exception-2.0 arch/sh/lib/udivsi3_i4i.S: 1:42 Invalid Exception ID: GCC-exception-2.0 Add the exception text along with the required tags which allow automated checking. Fixes: 4494ce4fb4ff ("sh: lib: convert to SPDX identifiers") Signed-off-by: Thomas Gleixner Acked-by: Greg Kroah-Hartman Cc: Kuninori Morimoto Cc: Simon Horman Cc: Yoshinori Sato Cc: Rich Felker Cc: Andrew Morton Cc: Kate Stewart Signed-off-by: Jonathan Corbet --- LICENSES/exceptions/GCC-exception-2.0 | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) create mode 100644 LICENSES/exceptions/GCC-exception-2.0 diff --git a/LICENSES/exceptions/GCC-exception-2.0 b/LICENSES/exceptions/GCC-exception-2.0 new file mode 100644 index 000000000000..422914a88c36 --- /dev/null +++ b/LICENSES/exceptions/GCC-exception-2.0 @@ -0,0 +1,18 @@ +SPDX-Exception-Identifier: GCC-exception-2.0 +SPDX-URL: https://spdx.org/licenses/GCC-exception-2.0.html +SPDX-Licenses: GPL-2.0, GPL-2.0+, GPL-2.0-only, GPL-2.0-or-later +Usage-Guide: + This exception is used together with one of the above SPDX-Licenses to + allow linking the compiled version of code to non GPL compliant code. + To use this exception add it with the keyword WITH to one of the + identifiers in the SPDX-Licenses tag: + SPDX-License-Identifier: WITH GCC-exception-2.0 +License-Text: + +In addition to the permissions in the GNU Library General Public License, +the Free Software Foundation gives you unlimited permission to link the +compiled version of this file into combinations with other programs, and to +distribute those programs without any restriction coming from the use of +this file. (The General Public License restrictions do apply in other +respects; for example, they cover modification of the file, and +distribution when not linked into another program.) -- cgit v1.2.3-59-g8ed1b From 959b49687838c9eae38f19cb19d504ba880b6cd0 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Wed, 16 Jan 2019 11:26:53 +0100 Subject: scripts/spdxcheck.py: Handle special quotation mark comments The SuperH boot code files use a magic format for the SPDX identifier comment: LIST "SPDX-License-Identifier: .... " The trailing quotation mark is not stripped before the token parser is invoked and causes the scan to fail. Handle it gracefully. 
Fixes: 6a0abce4c4cc ("sh: include: convert to SPDX identifiers") Signed-off-by: Thomas Gleixner Cc: Kuninori Morimoto Cc: Simon Horman Cc: Yoshinori Sato Cc: Rich Felker Cc: Andrew Morton Cc: Kate Stewart Reviewed-by: Greg Kroah-Hartman Signed-off-by: Jonathan Corbet --- scripts/spdxcheck.py | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/scripts/spdxcheck.py b/scripts/spdxcheck.py index e559c6294c39..3fb020c2cb7f 100755 --- a/scripts/spdxcheck.py +++ b/scripts/spdxcheck.py @@ -175,7 +175,13 @@ class id_parser(object): self.lines_checked += 1 if line.find("SPDX-License-Identifier:") < 0: continue - expr = line.split(':')[1].replace('*/', '').strip() + expr = line.split(':')[1].strip() + # Remove trailing comment closure + if line.startswith('/*'): + expr = expr.rstrip('*/').strip() + # Special case for SH magic boot code files + if line.startswith('LIST \"'): + expr = expr.rstrip('\"').strip() self.parse(expr) self.spdx_valid += 1 # -- cgit v1.2.3-59-g8ed1b From be5cd20c9b491504dfb9105404913de25c47c580 Mon Sep 17 00:00:00 2001 From: Jonathan Corbet Date: Fri, 11 Jan 2019 12:31:39 -0700 Subject: kernel-doc: suppress 'not described' warnings for embedded struct fields The ability to add kerneldoc comments for fields in embedded structures is useful, but it brought along a whole bunch of warnings for fields that could not be described before. In many cases, there's little value in adding docs for these nested fields, and in cases like: struct a { struct b { int c; } d, e; }; "c" would have to be described twice (as d.c and e.c) to make the warnings go away. We can no doubt do something smarter, but simply suppressing the warnings for this case removes about 70 warnings from the docs build, freeing us to focus on the ones that matter more. So make kerneldoc be silent about missing descriptions for any field containing a ".". Signed-off-by: Jonathan Corbet --- scripts/kernel-doc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/kernel-doc b/scripts/kernel-doc index c5333d251985..3350e498b4ce 100755 --- a/scripts/kernel-doc +++ b/scripts/kernel-doc @@ -1474,7 +1474,7 @@ sub push_parameter($$$$) { if (!defined $parameterdescs{$param} && $param !~ /^#/) { $parameterdescs{$param} = $undescribed; - if (show_warnings($type, $declaration_name)) { + if (show_warnings($type, $declaration_name) && $param !~ /\./) { print STDERR "${file}:$.: warning: Function parameter or member '$param' not described in '$declaration_name'\n"; ++$warnings; -- cgit v1.2.3-59-g8ed1b From 1d2375f048b7da90402c7a40fb3606b320c3f76f Mon Sep 17 00:00:00 2001 From: Federico Vaga Date: Fri, 18 Jan 2019 22:58:04 +0100 Subject: doc:process: remove note from 'stable api nonsense' The link referred by the note can't be retrieved: this patch just remove that old note. Signed-off-by: Federico Vaga Acked-by: Greg Kroah-Hartman Signed-off-by: Jonathan Corbet --- Documentation/process/stable-api-nonsense.rst | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/Documentation/process/stable-api-nonsense.rst b/Documentation/process/stable-api-nonsense.rst index 24f5aeecee91..57d95a49c096 100644 --- a/Documentation/process/stable-api-nonsense.rst +++ b/Documentation/process/stable-api-nonsense.rst @@ -171,8 +171,7 @@ is also a rough job. Simple, get your kernel driver into the main kernel tree (remember we are talking about GPL released drivers here, if your code doesn't fall -under this category, good luck, you are on your own here, you leech -.) 
If your +under this category, good luck, you are on your own here, you leech). If your driver is in the tree, and a kernel interface changes, it will be fixed up by the person who did the kernel change in the first place. This ensures that your driver is always buildable, and works over time, with -- cgit v1.2.3-59-g8ed1b From 3d18f58621601c04f5c31e4f51c640dfd41bbaf0 Mon Sep 17 00:00:00 2001 From: Federico Vaga Date: Sat, 19 Jan 2019 23:14:22 +0100 Subject: doc:it_IT: documentation alignment It aligns the italian translation with the latest changes: ae67ee6c5e1d docs: fix Co-Developed-by docs 3fe5dbfef47e Documentation/process/coding-style.rst: don't use "extern" with function prototypes Signed-off-by: Federico Vaga Signed-off-by: Jonathan Corbet --- Documentation/translations/it_IT/process/coding-style.rst | 3 +++ Documentation/translations/it_IT/process/submitting-patches.rst | 4 ++-- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/Documentation/translations/it_IT/process/coding-style.rst b/Documentation/translations/it_IT/process/coding-style.rst index 67c6d9ea7257..dfdefb951b89 100644 --- a/Documentation/translations/it_IT/process/coding-style.rst +++ b/Documentation/translations/it_IT/process/coding-style.rst @@ -449,6 +449,9 @@ Nonostante questo non sia richiesto dal linguaggio C, in Linux viene preferito perché è un modo semplice per aggiungere informazioni importanti per il lettore. +Non usate la parola chiave ``extern`` coi prototipi di funzione perché +rende le righe più lunghe e non è strettamente necessario. + 7) Centralizzare il ritorno delle funzioni ------------------------------------------ diff --git a/Documentation/translations/it_IT/process/submitting-patches.rst b/Documentation/translations/it_IT/process/submitting-patches.rst index b6368a45a8e4..2ab9c1401aa1 100644 --- a/Documentation/translations/it_IT/process/submitting-patches.rst +++ b/Documentation/translations/it_IT/process/submitting-patches.rst @@ -534,7 +534,7 @@ alle persone che vogliono seguire i vostri sorgenti, e quelle che cercano dei bachi. -12) Quando utilizzare Acked-by:, Cc:, e Co-Developed-by: +12) Quando utilizzare Acked-by:, Cc:, e Co-developed-by: -------------------------------------------------------- L'etichetta Signed-off-by: indica che il firmatario è stato coinvolto nello @@ -567,7 +567,7 @@ alcunché - ma dovrebbe indicare che la persona ha ricevuto una copia della patch. Questa etichetta documenta che terzi potenzialmente interessati sono stati inclusi nella discussione. -L'etichetta Co-Developed-by: indica che la patch è stata scritta dall'autore in +L'etichetta Co-developed-by: indica che la patch è stata scritta dall'autore in collaborazione con un altro sviluppatore. Qualche volta questo è utile quando più persone lavorano sulla stessa patch. Notate, questa persona deve avere nella patch anche una riga Signed-off-by:. -- cgit v1.2.3-59-g8ed1b From b04c11c988f4fcfba037ce6c356ea5ac9e23df63 Mon Sep 17 00:00:00 2001 From: Federico Vaga Date: Sun, 20 Jan 2019 12:16:29 +0100 Subject: doc:process: add missing internal link in stable-kernel-rules Keep consistent the document. 
In the document, option references are always linked, except for the one I fixed with this patch Signed-off-by: Federico Vaga Signed-off-by: Jonathan Corbet --- Documentation/process/stable-kernel-rules.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/Documentation/process/stable-kernel-rules.rst b/Documentation/process/stable-kernel-rules.rst index 0de6f6145cc6..070e3f04d183 100644 --- a/Documentation/process/stable-kernel-rules.rst +++ b/Documentation/process/stable-kernel-rules.rst @@ -98,9 +98,9 @@ text, like this: commit upstream. -Additionally, some patches submitted via Option 1 may have additional patch -prerequisites which can be cherry-picked. This can be specified in the following -format in the sign-off area: +Additionally, some patches submitted via :ref:`option_1` may have additional +patch prerequisites which can be cherry-picked. This can be specified in the +following format in the sign-off area: .. code-block:: none -- cgit v1.2.3-59-g8ed1b From 7967656ffbfa493f5546c0f18bf8a28f702c4baa Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Fri, 18 Jan 2019 15:50:47 -0700 Subject: coding-style: Clarify the expectations around bool There has been some confusion since checkpatch started warning about bool use in structures, and people have been avoiding using it. Many people feel there is still a legitimate place for bool in structures, so provide some guidance on bool usage derived from the entire thread that spawned the checkpatch warning. Link: https://lkml.kernel.org/r/CA+55aFwVZk1OfB9T2v014PTAKFhtVan_Zj2dOjnCy3x6E4UJfA@mail.gmail.com Signed-off-by: Joe Perches Acked-by: Joe Perches Reviewed-by: Bart Van Assche Acked-by: Jani Nikula Reviewed-by: Joey Pabalinas Signed-off-by: Jason Gunthorpe Signed-off-by: Jonathan Corbet --- Documentation/process/coding-style.rst | 38 ++++++++++++++++++++++++++++++---- scripts/checkpatch.pl | 13 ------------ 2 files changed, 34 insertions(+), 17 deletions(-) diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst index 4e611a01adc6..8ea913e99fa1 100644 --- a/Documentation/process/coding-style.rst +++ b/Documentation/process/coding-style.rst @@ -938,7 +938,37 @@ result. Typical examples would be functions that return pointers; they use NULL or the ERR_PTR mechanism to report failure. -17) Don't re-invent the kernel macros +17) Using bool +-------------- + +The Linux kernel bool type is an alias for the C99 _Bool type. bool values can +only evaluate to 0 or 1, and implicit or explicit conversion to bool +automatically converts the value to true or false. When using bool types the +!! construction is not needed, which eliminates a class of bugs. + +When working with bool values the true and false definitions should be used +instead of 1 and 0. + +bool function return types and stack variables are always fine to use whenever +appropriate. Use of bool is encouraged to improve readability and is often a +better option than 'int' for storing boolean values. + +Do not use bool if cache line layout or size of the value matters, as its size +and alignment varies based on the compiled architecture. Structures that are +optimized for alignment and size should not use bool. + +If a structure has many true/false values, consider consolidating them into a +bitfield with 1 bit members, or using an appropriate fixed width type, such as +u8. 
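A short illustration of that guidance; the function, structure and member names are invented for the example and are not taken from the patch.

  #include <linux/types.h>

  /* bool is fine for return values and locals ... */
  static bool foo_is_idle(unsigned int pending)
  {
          return pending == 0;
  }

  /* ... but where layout or size matters, consolidate the true/false state: */
  struct foo_state {
          unsigned int    registered:1;   /* 1-bit members instead of bool */
          unsigned int    suspended:1;
          unsigned int    removed:1;
          u8              power_state;    /* fixed-width type instead of bool */
  };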
+ +Similarly for function arguments, many true/false values can be consolidated +into a single bitwise 'flags' argument and 'flags' can often be a more +readable alternative if the call-sites have naked true/false constants. + +Otherwise limited use of bool in structures and arguments can improve +readability. + +18) Don't re-invent the kernel macros ------------------------------------- The header file include/linux/kernel.h contains a number of macros that @@ -961,7 +991,7 @@ need them. Feel free to peruse that header file to see what else is already defined that you shouldn't reproduce in your code. -18) Editor modelines and other cruft +19) Editor modelines and other cruft ------------------------------------ Some editors can interpret configuration information embedded in source files, @@ -995,7 +1025,7 @@ own custom mode, or may have some other magic method for making indentation work correctly. -19) Inline assembly +20) Inline assembly ------------------- In architecture-specific code, you may need to use inline assembly to interface @@ -1027,7 +1057,7 @@ the next instruction in the assembly output: : /* outputs */ : /* inputs */ : /* clobbers */); -20) Conditional Compilation +21) Conditional Compilation --------------------------- Wherever possible, don't use preprocessor conditionals (#if, #ifdef) in .c diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index b737ca9d7204..d62abd032885 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -6368,19 +6368,6 @@ sub process { } } -# check for bool bitfields - if ($sline =~ /^.\s+bool\s*$Ident\s*:\s*\d+\s*;/) { - WARN("BOOL_BITFIELD", - "Avoid using bool as bitfield. Prefer bool bitfields as unsigned int or u<8|16|32>\n" . $herecurr); - } - -# check for bool use in .h files - if ($realfile =~ /\.h$/ && - $sline =~ /^.\s+bool\s*$Ident\s*(?::\s*d+\s*)?;/) { - CHK("BOOL_MEMBER", - "Avoid using bool structure members because of possible alignment issues - see: https://lkml.org/lkml/2017/11/21/384\n" . $herecurr); - } - # check for semaphores initialized locked if ($line =~ /^.\s*sema_init.+,\W?0\W?\)/) { WARN("CONSIDER_COMPLETION", -- cgit v1.2.3-59-g8ed1b From e6e37f636815075c0055f33f42ebea4fb057def6 Mon Sep 17 00:00:00 2001 From: Otto Sabart Date: Fri, 18 Jan 2019 21:38:32 +0100 Subject: doc: networking: integrate scaling document into doc tree Convert scaling document into reStructuredText and add reference to scaling document into main table of contents in network documentation. There are no semantic changes. There are no references to "scaling.txt" file. Whole kernel tree was checked using: $ grep -r "scaling\.txt" Signed-off-by: Otto Sabart Signed-off-by: Jonathan Corbet --- Documentation/networking/index.rst | 1 + Documentation/networking/scaling.rst | 523 +++++++++++++++++++++++++++++++++++ Documentation/networking/scaling.txt | 484 -------------------------------- 3 files changed, 524 insertions(+), 484 deletions(-) create mode 100644 Documentation/networking/scaling.rst delete mode 100644 Documentation/networking/scaling.txt diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst index 7034214cf87e..44a0f1939fcd 100644 --- a/Documentation/networking/index.rst +++ b/Documentation/networking/index.rst @@ -34,6 +34,7 @@ Contents: snmp_counter checksum-offloads segmentation-offloads + scaling .. 
only:: subproject diff --git a/Documentation/networking/scaling.rst b/Documentation/networking/scaling.rst new file mode 100644 index 000000000000..f78d7bf27ff5 --- /dev/null +++ b/Documentation/networking/scaling.rst @@ -0,0 +1,523 @@ +.. SPDX-License-Identifier: GPL-2.0 + +===================================== +Scaling in the Linux Networking Stack +===================================== + + +Introduction +============ + +This document describes a set of complementary techniques in the Linux +networking stack to increase parallelism and improve performance for +multi-processor systems. + +The following technologies are described: + +- RSS: Receive Side Scaling +- RPS: Receive Packet Steering +- RFS: Receive Flow Steering +- Accelerated Receive Flow Steering +- XPS: Transmit Packet Steering + + +RSS: Receive Side Scaling +========================= + +Contemporary NICs support multiple receive and transmit descriptor queues +(multi-queue). On reception, a NIC can send different packets to different +queues to distribute processing among CPUs. The NIC distributes packets by +applying a filter to each packet that assigns it to one of a small number +of logical flows. Packets for each flow are steered to a separate receive +queue, which in turn can be processed by separate CPUs. This mechanism is +generally known as “Receive-side Scaling” (RSS). The goal of RSS and +the other scaling techniques is to increase performance uniformly. +Multi-queue distribution can also be used for traffic prioritization, but +that is not the focus of these techniques. + +The filter used in RSS is typically a hash function over the network +and/or transport layer headers-- for example, a 4-tuple hash over +IP addresses and TCP ports of a packet. The most common hardware +implementation of RSS uses a 128-entry indirection table where each entry +stores a queue number. The receive queue for a packet is determined +by masking out the low order seven bits of the computed hash for the +packet (usually a Toeplitz hash), taking this number as a key into the +indirection table and reading the corresponding value. + +Some advanced NICs allow steering packets to queues based on +programmable filters. For example, webserver bound TCP port 80 packets +can be directed to their own receive queue. Such “n-tuple” filters can +be configured from ethtool (--config-ntuple). + + +RSS Configuration +----------------- + +The driver for a multi-queue capable NIC typically provides a kernel +module parameter for specifying the number of hardware queues to +configure. In the bnx2x driver, for instance, this parameter is called +num_queues. A typical RSS configuration would be to have one receive queue +for each CPU if the device supports enough queues, or otherwise at least +one for each memory domain, where a memory domain is a set of CPUs that +share a particular memory level (L1, L2, NUMA node, etc.). + +The indirection table of an RSS device, which resolves a queue by masked +hash, is usually programmed by the driver at initialization. The +default mapping is to distribute the queues evenly in the table, but the +indirection table can be retrieved and modified at runtime using ethtool +commands (--show-rxfh-indir and --set-rxfh-indir). Modifying the +indirection table could be done to give different queues different +relative weights. + + +RSS IRQ Configuration +~~~~~~~~~~~~~~~~~~~~~ + +Each receive queue has a separate IRQ associated with it. The NIC triggers +this to notify a CPU when new packets arrive on the given queue. 
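The RSS queue selection described above (a masked hash indexing a 128-entry indirection table) boils down to something like the following sketch; the function and parameter names are invented and this is not a driver API.

  #define RSS_INDIR_SIZE  128     /* common hardware indirection table size */

  static unsigned int rss_select_queue(unsigned int toeplitz_hash,
                                       const unsigned char indir[RSS_INDIR_SIZE])
  {
          /* the low-order seven bits of the hash pick the table entry,
           * and the entry holds the receive queue number */
          return indir[toeplitz_hash & (RSS_INDIR_SIZE - 1)];
  }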
The +signaling path for PCIe devices uses message signaled interrupts (MSI-X), +that can route each interrupt to a particular CPU. The active mapping +of queues to IRQs can be determined from /proc/interrupts. By default, +an IRQ may be handled on any CPU. Because a non-negligible part of packet +processing takes place in receive interrupt handling, it is advantageous +to spread receive interrupts between CPUs. To manually adjust the IRQ +affinity of each interrupt see Documentation/IRQ-affinity.txt. Some systems +will be running irqbalance, a daemon that dynamically optimizes IRQ +assignments and as a result may override any manual settings. + + +Suggested Configuration +~~~~~~~~~~~~~~~~~~~~~~~ + +RSS should be enabled when latency is a concern or whenever receive +interrupt processing forms a bottleneck. Spreading load between CPUs +decreases queue length. For low latency networking, the optimal setting +is to allocate as many queues as there are CPUs in the system (or the +NIC maximum, if lower). The most efficient high-rate configuration +is likely the one with the smallest number of receive queues where no +receive queue overflows due to a saturated CPU, because in default +mode with interrupt coalescing enabled, the aggregate number of +interrupts (and thus work) grows with each additional queue. + +Per-cpu load can be observed using the mpstat utility, but note that on +processors with hyperthreading (HT), each hyperthread is represented as +a separate CPU. For interrupt handling, HT has shown no benefit in +initial tests, so limit the number of queues to the number of CPU cores +in the system. + + +RPS: Receive Packet Steering +============================ + +Receive Packet Steering (RPS) is logically a software implementation of +RSS. Being in software, it is necessarily called later in the datapath. +Whereas RSS selects the queue and hence CPU that will run the hardware +interrupt handler, RPS selects the CPU to perform protocol processing +above the interrupt handler. This is accomplished by placing the packet +on the desired CPU’s backlog queue and waking up the CPU for processing. +RPS has some advantages over RSS: + +1) it can be used with any NIC +2) software filters can easily be added to hash over new protocols +3) it does not increase hardware device interrupt rate (although it does + introduce inter-processor interrupts (IPIs)) + +RPS is called during bottom half of the receive interrupt handler, when +a driver sends a packet up the network stack with netif_rx() or +netif_receive_skb(). These call the get_rps_cpu() function, which +selects the queue that should process a packet. + +The first step in determining the target CPU for RPS is to calculate a +flow hash over the packet’s addresses or ports (2-tuple or 4-tuple hash +depending on the protocol). This serves as a consistent hash of the +associated flow of the packet. The hash is either provided by hardware +or will be computed in the stack. Capable hardware can pass the hash in +the receive descriptor for the packet; this would usually be the same +hash used for RSS (e.g. computed Toeplitz hash). The hash is saved in +skb->hash and can be used elsewhere in the stack as a hash of the +packet’s flow. + +Each receive hardware queue has an associated list of CPUs to which +RPS may enqueue packets for processing. For each received packet, +an index into the list is computed from the flow hash modulo the size +of the list. 
The indexed CPU is the target for processing the packet,
+and the packet is queued to the tail of that CPU’s backlog queue. At
+the end of the bottom half routine, IPIs are sent to any CPUs for which
+packets have been queued to their backlog queue. The IPI wakes backlog
+processing on the remote CPU, and any queued packets are then processed
+up the networking stack.
+
+
+RPS Configuration
+-----------------
+
+RPS requires a kernel compiled with the CONFIG_RPS kconfig symbol (on
+by default for SMP). Even when compiled in, RPS remains disabled until
+explicitly configured. The list of CPUs to which RPS may forward traffic
+can be configured for each receive queue using a sysfs file entry::
+
+  /sys/class/net/<dev>/queues/rx-<n>/rps_cpus
+
+This file implements a bitmap of CPUs. RPS is disabled when it is zero
+(the default), in which case packets are processed on the interrupting
+CPU. Documentation/IRQ-affinity.txt explains how CPUs are assigned to
+the bitmap.
+
+
+Suggested Configuration
+~~~~~~~~~~~~~~~~~~~~~~~
+
+For a single queue device, a typical RPS configuration would be to set
+the rps_cpus to the CPUs in the same memory domain of the interrupting
+CPU. If NUMA locality is not an issue, this could also be all CPUs in
+the system. At high interrupt rate, it might be wise to exclude the
+interrupting CPU from the map since that already performs much work.
+
+For a multi-queue system, if RSS is configured so that a hardware
+receive queue is mapped to each CPU, then RPS is probably redundant
+and unnecessary. If there are fewer hardware queues than CPUs, then
+RPS might be beneficial if the rps_cpus for each queue are the ones that
+share the same memory domain as the interrupting CPU for that queue.
+
+
+RPS Flow Limit
+--------------
+
+RPS scales kernel receive processing across CPUs without introducing
+reordering. The trade-off to sending all packets from the same flow
+to the same CPU is CPU load imbalance if flows vary in packet rate.
+In the extreme case a single flow dominates traffic. Especially on
+common server workloads with many concurrent connections, such
+behavior indicates a problem such as a misconfiguration or spoofed
+source Denial of Service attack.
+
+Flow Limit is an optional RPS feature that prioritizes small flows
+during CPU contention by dropping packets from large flows slightly
+ahead of those from small flows. It is active only when an RPS or RFS
+destination CPU approaches saturation. Once a CPU's input packet
+queue exceeds half the maximum queue length (as set by sysctl
+net.core.netdev_max_backlog), the kernel starts a per-flow packet
+count over the last 256 packets. If a flow exceeds a set ratio (by
+default, half) of these packets when a new packet arrives, then the
+new packet is dropped. Packets from other flows are still only
+dropped once the input packet queue reaches netdev_max_backlog.
+No packets are dropped when the input packet queue length is below
+the threshold, so flow limit does not sever connections outright:
+even large flows maintain connectivity.
+
+
+Interface
+~~~~~~~~~
+
+Flow limit is compiled in by default (CONFIG_NET_FLOW_LIMIT), but not
+turned on. It is implemented for each CPU independently (to avoid lock
+and cache contention) and toggled per CPU by setting the relevant bit
+in sysctl net.core.flow_limit_cpu_bitmap.
It exposes the same CPU +bitmap interface as rps_cpus (see above) when called from procfs:: + + /proc/sys/net/core/flow_limit_cpu_bitmap + +Per-flow rate is calculated by hashing each packet into a hashtable +bucket and incrementing a per-bucket counter. The hash function is +the same that selects a CPU in RPS, but as the number of buckets can +be much larger than the number of CPUs, flow limit has finer-grained +identification of large flows and fewer false positives. The default +table has 4096 buckets. This value can be modified through sysctl:: + + net.core.flow_limit_table_len + +The value is only consulted when a new table is allocated. Modifying +it does not update active tables. + + +Suggested Configuration +~~~~~~~~~~~~~~~~~~~~~~~ + +Flow limit is useful on systems with many concurrent connections, +where a single connection taking up 50% of a CPU indicates a problem. +In such environments, enable the feature on all CPUs that handle +network rx interrupts (as set in /proc/irq/N/smp_affinity). + +The feature depends on the input packet queue length to exceed +the flow limit threshold (50%) + the flow history length (256). +Setting net.core.netdev_max_backlog to either 1000 or 10000 +performed well in experiments. + + +RFS: Receive Flow Steering +========================== + +While RPS steers packets solely based on hash, and thus generally +provides good load distribution, it does not take into account +application locality. This is accomplished by Receive Flow Steering +(RFS). The goal of RFS is to increase datacache hitrate by steering +kernel processing of packets to the CPU where the application thread +consuming the packet is running. RFS relies on the same RPS mechanisms +to enqueue packets onto the backlog of another CPU and to wake up that +CPU. + +In RFS, packets are not forwarded directly by the value of their hash, +but the hash is used as index into a flow lookup table. This table maps +flows to the CPUs where those flows are being processed. The flow hash +(see RPS section above) is used to calculate the index into this table. +The CPU recorded in each entry is the one which last processed the flow. +If an entry does not hold a valid CPU, then packets mapped to that entry +are steered using plain RPS. Multiple table entries may point to the +same CPU. Indeed, with many flows and few CPUs, it is very likely that +a single application thread handles flows with many different flow hashes. + +rps_sock_flow_table is a global flow table that contains the *desired* CPU +for flows: the CPU that is currently processing the flow in userspace. +Each table value is a CPU index that is updated during calls to recvmsg +and sendmsg (specifically, inet_recvmsg(), inet_sendmsg(), inet_sendpage() +and tcp_splice_read()). + +When the scheduler moves a thread to a new CPU while it has outstanding +receive packets on the old CPU, packets may arrive out of order. To +avoid this, RFS uses a second flow table to track outstanding packets +for each flow: rps_dev_flow_table is a table specific to each hardware +receive queue of each device. Each table value stores a CPU index and a +counter. The CPU index represents the *current* CPU onto which packets +for this flow are enqueued for further kernel processing. Ideally, kernel +and userspace processing occur on the same CPU, and hence the CPU index +in both tables is identical. This is likely false if the scheduler has +recently migrated a userspace thread while the kernel still has packets +enqueued for kernel processing on the old CPU. 
+
+The counter in rps_dev_flow_table values records the length of the current
+CPU's backlog when a packet in this flow was last enqueued. Each backlog
+queue has a head counter that is incremented on dequeue. A tail counter
+is computed as head counter + queue length. In other words, the counter
+in rps_dev_flow[i] records the last element in flow i that has
+been enqueued onto the currently designated CPU for flow i (of course,
+entry i is actually selected by hash and multiple flows may hash to the
+same entry i).
+
+And now the trick for avoiding out of order packets: when selecting the
+CPU for packet processing (from get_rps_cpu()) the rps_sock_flow table
+and the rps_dev_flow table of the queue that the packet was received on
+are compared. If the desired CPU for the flow (found in the
+rps_sock_flow table) matches the current CPU (found in the rps_dev_flow
+table), the packet is enqueued onto that CPU’s backlog. If they differ,
+the current CPU is updated to match the desired CPU if one of the
+following is true:
+
+  - The current CPU's queue head counter >= the recorded tail counter
+    value in rps_dev_flow[i]
+  - The current CPU is unset (>= nr_cpu_ids)
+  - The current CPU is offline
+
+After this check, the packet is sent to the (possibly updated) current
+CPU. These rules aim to ensure that a flow only moves to a new CPU when
+there are no packets outstanding on the old CPU, as the outstanding
+packets could arrive later than those about to be processed on the new
+CPU.
+
+
+RFS Configuration
+-----------------
+
+RFS is only available if the kconfig symbol CONFIG_RPS is enabled (on
+by default for SMP). The functionality remains disabled until explicitly
+configured. The number of entries in the global flow table is set through::
+
+  /proc/sys/net/core/rps_sock_flow_entries
+
+The number of entries in the per-queue flow table are set through::
+
+  /sys/class/net/<dev>/queues/rx-<n>/rps_flow_cnt
+
+
+Suggested Configuration
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Both of these need to be set before RFS is enabled for a receive queue.
+Values for both are rounded up to the nearest power of two. The
+suggested flow count depends on the expected number of active connections
+at any given time, which may be significantly less than the number of open
+connections. We have found that a value of 32768 for rps_sock_flow_entries
+works fairly well on a moderately loaded server.
+
+For a single queue device, the rps_flow_cnt value for the single queue
+would normally be configured to the same value as rps_sock_flow_entries.
+For a multi-queue device, the rps_flow_cnt for each queue might be
+configured as rps_sock_flow_entries / N, where N is the number of
+queues. So for instance, if rps_sock_flow_entries is set to 32768 and there
+are 16 configured receive queues, rps_flow_cnt for each queue might be
+configured as 2048.
+
+
+Accelerated RFS
+===============
+
+Accelerated RFS is to RFS what RSS is to RPS: a hardware-accelerated load
+balancing mechanism that uses soft state to steer flows based on where
+the application thread consuming the packets of each flow is running.
+Accelerated RFS should perform better than RFS since packets are sent
+directly to a CPU local to the thread consuming the data. The target CPU
+will either be the same CPU where the application runs, or at least a CPU
+which is local to the application thread’s CPU in the cache hierarchy.
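An illustrative aside (not part of the patch above): accelerated RFS builds on
the plain RFS state described in the preceding sections, so a minimal baseline
RFS setup might look like the sketch below. The device name eth0 and the
16-queue count are assumptions, and 32768/2048 are simply the values suggested
in the "Suggested Configuration" text above::

    # Sketch only -- eth0 and the number of receive queues are assumptions.
    echo 32768 > /proc/sys/net/core/rps_sock_flow_entries
    # Repeat for each receive queue rx-0 .. rx-15 of the device:
    echo 2048 > /sys/class/net/eth0/queues/rx-0/rps_flow_cnt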
+ +To enable accelerated RFS, the networking stack calls the +ndo_rx_flow_steer driver function to communicate the desired hardware +queue for packets matching a particular flow. The network stack +automatically calls this function every time a flow entry in +rps_dev_flow_table is updated. The driver in turn uses a device specific +method to program the NIC to steer the packets. + +The hardware queue for a flow is derived from the CPU recorded in +rps_dev_flow_table. The stack consults a CPU to hardware queue map which +is maintained by the NIC driver. This is an auto-generated reverse map of +the IRQ affinity table shown by /proc/interrupts. Drivers can use +functions in the cpu_rmap (“CPU affinity reverse map”) kernel library +to populate the map. For each CPU, the corresponding queue in the map is +set to be one whose processing CPU is closest in cache locality. + + +Accelerated RFS Configuration +----------------------------- + +Accelerated RFS is only available if the kernel is compiled with +CONFIG_RFS_ACCEL and support is provided by the NIC device and driver. +It also requires that ntuple filtering is enabled via ethtool. The map +of CPU to queues is automatically deduced from the IRQ affinities +configured for each receive queue by the driver, so no additional +configuration should be necessary. + + +Suggested Configuration +~~~~~~~~~~~~~~~~~~~~~~~ + +This technique should be enabled whenever one wants to use RFS and the +NIC supports hardware acceleration. + + +XPS: Transmit Packet Steering +============================= + +Transmit Packet Steering is a mechanism for intelligently selecting +which transmit queue to use when transmitting a packet on a multi-queue +device. This can be accomplished by recording two kinds of maps, either +a mapping of CPU to hardware queue(s) or a mapping of receive queue(s) +to hardware transmit queue(s). + +1. XPS using CPUs map + +The goal of this mapping is usually to assign queues +exclusively to a subset of CPUs, where the transmit completions for +these queues are processed on a CPU within this set. This choice +provides two benefits. First, contention on the device queue lock is +significantly reduced since fewer CPUs contend for the same queue +(contention can be eliminated completely if each CPU has its own +transmit queue). Secondly, cache miss rate on transmit completion is +reduced, in particular for data cache lines that hold the sk_buff +structures. + +2. XPS using receive queues map + +This mapping is used to pick transmit queue based on the receive +queue(s) map configuration set by the administrator. A set of receive +queues can be mapped to a set of transmit queues (many:many), although +the common use case is a 1:1 mapping. This will enable sending packets +on the same queue associations for transmit and receive. This is useful for +busy polling multi-threaded workloads where there are challenges in +associating a given CPU to a given application thread. The application +threads are not pinned to CPUs and each thread handles packets +received on a single queue. The receive queue number is cached in the +socket for the connection. In this model, sending the packets on the same +transmit queue corresponding to the associated receive queue has benefits +in keeping the CPU overhead low. Transmit completion work is locked into +the same queue-association that a given application is polling on. This +avoids the overhead of triggering an interrupt on another CPU. 
When the
+application cleans up the packets during the busy poll, transmit completion
+may be processed along with it in the same thread context and so result in
+reduced latency.
+
+XPS is configured per transmit queue by setting a bitmap of
+CPUs/receive-queues that may use that queue to transmit. The reverse
+mapping, from CPUs to transmit queues or from receive-queues to transmit
+queues, is computed and maintained for each network device. When
+transmitting the first packet in a flow, the function get_xps_queue() is
+called to select a queue. This function uses the ID of the receive queue
+for the socket connection for a match in the receive queue-to-transmit queue
+lookup table. Alternatively, this function can also use the ID of the
+running CPU as a key into the CPU-to-queue lookup table. If the
+ID matches a single queue, that is used for transmission. If multiple
+queues match, one is selected by using the flow hash to compute an index
+into the set. When selecting the transmit queue based on receive queue(s)
+map, the transmit device is not validated against the receive device as it
+requires expensive lookup operation in the datapath.
+
+The queue chosen for transmitting a particular flow is saved in the
+corresponding socket structure for the flow (e.g. a TCP connection).
+This transmit queue is used for subsequent packets sent on the flow to
+prevent out of order (ooo) packets. The choice also amortizes the cost
+of calling get_xps_queues() over all packets in the flow. To avoid
+ooo packets, the queue for a flow can subsequently only be changed if
+skb->ooo_okay is set for a packet in the flow. This flag indicates that
+there are no outstanding packets in the flow, so the transmit queue can
+change without the risk of generating out of order packets. The
+transport layer is responsible for setting ooo_okay appropriately. TCP,
+for instance, sets the flag when all data for a connection has been
+acknowledged.
+
+XPS Configuration
+-----------------
+
+XPS is only available if the kconfig symbol CONFIG_XPS is enabled (on by
+default for SMP). The functionality remains disabled until explicitly
+configured. To enable XPS, the bitmap of CPUs/receive-queues that may
+use a transmit queue is configured using the sysfs file entry:
+
+For selection based on CPUs map::
+
+  /sys/class/net/<dev>/queues/tx-<n>/xps_cpus
+
+For selection based on receive-queues map::
+
+  /sys/class/net/<dev>/queues/tx-<n>/xps_rxqs
+
+
+Suggested Configuration
+~~~~~~~~~~~~~~~~~~~~~~~
+
+For a network device with a single transmission queue, XPS configuration
+has no effect, since there is no choice in this case. In a multi-queue
+system, XPS is preferably configured so that each CPU maps onto one queue.
+If there are as many queues as there are CPUs in the system, then each
+queue can also map onto one CPU, resulting in exclusive pairings that
+experience no contention. If there are fewer queues than CPUs, then the
+best CPUs to share a given queue are probably those that share the cache
+with the CPU that processes transmit completions for that queue
+(transmit interrupts).
+
+For transmit queue selection based on receive queue(s), XPS has to be
+explicitly configured mapping receive-queue(s) to transmit queue(s). If the
+user configuration for receive-queue map does not apply, then the transmit
+queue is selected based on the CPUs map.
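An illustrative aside (not part of the patch above): the two XPS mappings
described in this section could be exercised roughly as in the sketch below.
The device eth0, queue tx-0, and the masks are assumptions rather than
recommendations; normally only one of the two maps would be configured, and
the CPUs map is used when the receive-queues map does not apply::

    # Sketch only -- eth0, tx-0 and the masks are assumptions.
    # CPUs map: allow CPUs 0-3 to transmit on queue tx-0.
    echo 0f > /sys/class/net/eth0/queues/tx-0/xps_cpus
    # Receive-queues map: pair receive queue rx-0 with transmit queue tx-0.
    echo 1 > /sys/class/net/eth0/queues/tx-0/xps_rxqs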
+ + +Per TX Queue rate limitation +============================ + +These are rate-limitation mechanisms implemented by HW, where currently +a max-rate attribute is supported, by setting a Mbps value to:: + + /sys/class/net//queues/tx-/tx_maxrate + +A value of zero means disabled, and this is the default. + + +Further Information +=================== +RPS and RFS were introduced in kernel 2.6.35. XPS was incorporated into +2.6.38. Original patches were submitted by Tom Herbert +(therbert@google.com) + +Accelerated RFS was introduced in 2.6.35. Original patches were +submitted by Ben Hutchings (bwh@kernel.org) + +Authors: + +- Tom Herbert (therbert@google.com) +- Willem de Bruijn (willemb@google.com) diff --git a/Documentation/networking/scaling.txt b/Documentation/networking/scaling.txt deleted file mode 100644 index b7056a8a0540..000000000000 --- a/Documentation/networking/scaling.txt +++ /dev/null @@ -1,484 +0,0 @@ -Scaling in the Linux Networking Stack - - -Introduction -============ - -This document describes a set of complementary techniques in the Linux -networking stack to increase parallelism and improve performance for -multi-processor systems. - -The following technologies are described: - - RSS: Receive Side Scaling - RPS: Receive Packet Steering - RFS: Receive Flow Steering - Accelerated Receive Flow Steering - XPS: Transmit Packet Steering - - -RSS: Receive Side Scaling -========================= - -Contemporary NICs support multiple receive and transmit descriptor queues -(multi-queue). On reception, a NIC can send different packets to different -queues to distribute processing among CPUs. The NIC distributes packets by -applying a filter to each packet that assigns it to one of a small number -of logical flows. Packets for each flow are steered to a separate receive -queue, which in turn can be processed by separate CPUs. This mechanism is -generally known as “Receive-side Scaling” (RSS). The goal of RSS and -the other scaling techniques is to increase performance uniformly. -Multi-queue distribution can also be used for traffic prioritization, but -that is not the focus of these techniques. - -The filter used in RSS is typically a hash function over the network -and/or transport layer headers-- for example, a 4-tuple hash over -IP addresses and TCP ports of a packet. The most common hardware -implementation of RSS uses a 128-entry indirection table where each entry -stores a queue number. The receive queue for a packet is determined -by masking out the low order seven bits of the computed hash for the -packet (usually a Toeplitz hash), taking this number as a key into the -indirection table and reading the corresponding value. - -Some advanced NICs allow steering packets to queues based on -programmable filters. For example, webserver bound TCP port 80 packets -can be directed to their own receive queue. Such “n-tuple” filters can -be configured from ethtool (--config-ntuple). - -==== RSS Configuration - -The driver for a multi-queue capable NIC typically provides a kernel -module parameter for specifying the number of hardware queues to -configure. In the bnx2x driver, for instance, this parameter is called -num_queues. A typical RSS configuration would be to have one receive queue -for each CPU if the device supports enough queues, or otherwise at least -one for each memory domain, where a memory domain is a set of CPUs that -share a particular memory level (L1, L2, NUMA node, etc.). 
- -The indirection table of an RSS device, which resolves a queue by masked -hash, is usually programmed by the driver at initialization. The -default mapping is to distribute the queues evenly in the table, but the -indirection table can be retrieved and modified at runtime using ethtool -commands (--show-rxfh-indir and --set-rxfh-indir). Modifying the -indirection table could be done to give different queues different -relative weights. - -== RSS IRQ Configuration - -Each receive queue has a separate IRQ associated with it. The NIC triggers -this to notify a CPU when new packets arrive on the given queue. The -signaling path for PCIe devices uses message signaled interrupts (MSI-X), -that can route each interrupt to a particular CPU. The active mapping -of queues to IRQs can be determined from /proc/interrupts. By default, -an IRQ may be handled on any CPU. Because a non-negligible part of packet -processing takes place in receive interrupt handling, it is advantageous -to spread receive interrupts between CPUs. To manually adjust the IRQ -affinity of each interrupt see Documentation/IRQ-affinity.txt. Some systems -will be running irqbalance, a daemon that dynamically optimizes IRQ -assignments and as a result may override any manual settings. - -== Suggested Configuration - -RSS should be enabled when latency is a concern or whenever receive -interrupt processing forms a bottleneck. Spreading load between CPUs -decreases queue length. For low latency networking, the optimal setting -is to allocate as many queues as there are CPUs in the system (or the -NIC maximum, if lower). The most efficient high-rate configuration -is likely the one with the smallest number of receive queues where no -receive queue overflows due to a saturated CPU, because in default -mode with interrupt coalescing enabled, the aggregate number of -interrupts (and thus work) grows with each additional queue. - -Per-cpu load can be observed using the mpstat utility, but note that on -processors with hyperthreading (HT), each hyperthread is represented as -a separate CPU. For interrupt handling, HT has shown no benefit in -initial tests, so limit the number of queues to the number of CPU cores -in the system. - - -RPS: Receive Packet Steering -============================ - -Receive Packet Steering (RPS) is logically a software implementation of -RSS. Being in software, it is necessarily called later in the datapath. -Whereas RSS selects the queue and hence CPU that will run the hardware -interrupt handler, RPS selects the CPU to perform protocol processing -above the interrupt handler. This is accomplished by placing the packet -on the desired CPU’s backlog queue and waking up the CPU for processing. -RPS has some advantages over RSS: 1) it can be used with any NIC, -2) software filters can easily be added to hash over new protocols, -3) it does not increase hardware device interrupt rate (although it does -introduce inter-processor interrupts (IPIs)). - -RPS is called during bottom half of the receive interrupt handler, when -a driver sends a packet up the network stack with netif_rx() or -netif_receive_skb(). These call the get_rps_cpu() function, which -selects the queue that should process a packet. - -The first step in determining the target CPU for RPS is to calculate a -flow hash over the packet’s addresses or ports (2-tuple or 4-tuple hash -depending on the protocol). This serves as a consistent hash of the -associated flow of the packet. 
The hash is either provided by hardware -or will be computed in the stack. Capable hardware can pass the hash in -the receive descriptor for the packet; this would usually be the same -hash used for RSS (e.g. computed Toeplitz hash). The hash is saved in -skb->hash and can be used elsewhere in the stack as a hash of the -packet’s flow. - -Each receive hardware queue has an associated list of CPUs to which -RPS may enqueue packets for processing. For each received packet, -an index into the list is computed from the flow hash modulo the size -of the list. The indexed CPU is the target for processing the packet, -and the packet is queued to the tail of that CPU’s backlog queue. At -the end of the bottom half routine, IPIs are sent to any CPUs for which -packets have been queued to their backlog queue. The IPI wakes backlog -processing on the remote CPU, and any queued packets are then processed -up the networking stack. - -==== RPS Configuration - -RPS requires a kernel compiled with the CONFIG_RPS kconfig symbol (on -by default for SMP). Even when compiled in, RPS remains disabled until -explicitly configured. The list of CPUs to which RPS may forward traffic -can be configured for each receive queue using a sysfs file entry: - - /sys/class/net//queues/rx-/rps_cpus - -This file implements a bitmap of CPUs. RPS is disabled when it is zero -(the default), in which case packets are processed on the interrupting -CPU. Documentation/IRQ-affinity.txt explains how CPUs are assigned to -the bitmap. - -== Suggested Configuration - -For a single queue device, a typical RPS configuration would be to set -the rps_cpus to the CPUs in the same memory domain of the interrupting -CPU. If NUMA locality is not an issue, this could also be all CPUs in -the system. At high interrupt rate, it might be wise to exclude the -interrupting CPU from the map since that already performs much work. - -For a multi-queue system, if RSS is configured so that a hardware -receive queue is mapped to each CPU, then RPS is probably redundant -and unnecessary. If there are fewer hardware queues than CPUs, then -RPS might be beneficial if the rps_cpus for each queue are the ones that -share the same memory domain as the interrupting CPU for that queue. - -==== RPS Flow Limit - -RPS scales kernel receive processing across CPUs without introducing -reordering. The trade-off to sending all packets from the same flow -to the same CPU is CPU load imbalance if flows vary in packet rate. -In the extreme case a single flow dominates traffic. Especially on -common server workloads with many concurrent connections, such -behavior indicates a problem such as a misconfiguration or spoofed -source Denial of Service attack. - -Flow Limit is an optional RPS feature that prioritizes small flows -during CPU contention by dropping packets from large flows slightly -ahead of those from small flows. It is active only when an RPS or RFS -destination CPU approaches saturation. Once a CPU's input packet -queue exceeds half the maximum queue length (as set by sysctl -net.core.netdev_max_backlog), the kernel starts a per-flow packet -count over the last 256 packets. If a flow exceeds a set ratio (by -default, half) of these packets when a new packet arrives, then the -new packet is dropped. Packets from other flows are still only -dropped once the input packet queue reaches netdev_max_backlog. 
-No packets are dropped when the input packet queue length is below -the threshold, so flow limit does not sever connections outright: -even large flows maintain connectivity. - -== Interface - -Flow limit is compiled in by default (CONFIG_NET_FLOW_LIMIT), but not -turned on. It is implemented for each CPU independently (to avoid lock -and cache contention) and toggled per CPU by setting the relevant bit -in sysctl net.core.flow_limit_cpu_bitmap. It exposes the same CPU -bitmap interface as rps_cpus (see above) when called from procfs: - - /proc/sys/net/core/flow_limit_cpu_bitmap - -Per-flow rate is calculated by hashing each packet into a hashtable -bucket and incrementing a per-bucket counter. The hash function is -the same that selects a CPU in RPS, but as the number of buckets can -be much larger than the number of CPUs, flow limit has finer-grained -identification of large flows and fewer false positives. The default -table has 4096 buckets. This value can be modified through sysctl - - net.core.flow_limit_table_len - -The value is only consulted when a new table is allocated. Modifying -it does not update active tables. - -== Suggested Configuration - -Flow limit is useful on systems with many concurrent connections, -where a single connection taking up 50% of a CPU indicates a problem. -In such environments, enable the feature on all CPUs that handle -network rx interrupts (as set in /proc/irq/N/smp_affinity). - -The feature depends on the input packet queue length to exceed -the flow limit threshold (50%) + the flow history length (256). -Setting net.core.netdev_max_backlog to either 1000 or 10000 -performed well in experiments. - - -RFS: Receive Flow Steering -========================== - -While RPS steers packets solely based on hash, and thus generally -provides good load distribution, it does not take into account -application locality. This is accomplished by Receive Flow Steering -(RFS). The goal of RFS is to increase datacache hitrate by steering -kernel processing of packets to the CPU where the application thread -consuming the packet is running. RFS relies on the same RPS mechanisms -to enqueue packets onto the backlog of another CPU and to wake up that -CPU. - -In RFS, packets are not forwarded directly by the value of their hash, -but the hash is used as index into a flow lookup table. This table maps -flows to the CPUs where those flows are being processed. The flow hash -(see RPS section above) is used to calculate the index into this table. -The CPU recorded in each entry is the one which last processed the flow. -If an entry does not hold a valid CPU, then packets mapped to that entry -are steered using plain RPS. Multiple table entries may point to the -same CPU. Indeed, with many flows and few CPUs, it is very likely that -a single application thread handles flows with many different flow hashes. - -rps_sock_flow_table is a global flow table that contains the *desired* CPU -for flows: the CPU that is currently processing the flow in userspace. -Each table value is a CPU index that is updated during calls to recvmsg -and sendmsg (specifically, inet_recvmsg(), inet_sendmsg(), inet_sendpage() -and tcp_splice_read()). - -When the scheduler moves a thread to a new CPU while it has outstanding -receive packets on the old CPU, packets may arrive out of order. To -avoid this, RFS uses a second flow table to track outstanding packets -for each flow: rps_dev_flow_table is a table specific to each hardware -receive queue of each device. 
Each table value stores a CPU index and a -counter. The CPU index represents the *current* CPU onto which packets -for this flow are enqueued for further kernel processing. Ideally, kernel -and userspace processing occur on the same CPU, and hence the CPU index -in both tables is identical. This is likely false if the scheduler has -recently migrated a userspace thread while the kernel still has packets -enqueued for kernel processing on the old CPU. - -The counter in rps_dev_flow_table values records the length of the current -CPU's backlog when a packet in this flow was last enqueued. Each backlog -queue has a head counter that is incremented on dequeue. A tail counter -is computed as head counter + queue length. In other words, the counter -in rps_dev_flow[i] records the last element in flow i that has -been enqueued onto the currently designated CPU for flow i (of course, -entry i is actually selected by hash and multiple flows may hash to the -same entry i). - -And now the trick for avoiding out of order packets: when selecting the -CPU for packet processing (from get_rps_cpu()) the rps_sock_flow table -and the rps_dev_flow table of the queue that the packet was received on -are compared. If the desired CPU for the flow (found in the -rps_sock_flow table) matches the current CPU (found in the rps_dev_flow -table), the packet is enqueued onto that CPU’s backlog. If they differ, -the current CPU is updated to match the desired CPU if one of the -following is true: - -- The current CPU's queue head counter >= the recorded tail counter - value in rps_dev_flow[i] -- The current CPU is unset (>= nr_cpu_ids) -- The current CPU is offline - -After this check, the packet is sent to the (possibly updated) current -CPU. These rules aim to ensure that a flow only moves to a new CPU when -there are no packets outstanding on the old CPU, as the outstanding -packets could arrive later than those about to be processed on the new -CPU. - -==== RFS Configuration - -RFS is only available if the kconfig symbol CONFIG_RPS is enabled (on -by default for SMP). The functionality remains disabled until explicitly -configured. The number of entries in the global flow table is set through: - - /proc/sys/net/core/rps_sock_flow_entries - -The number of entries in the per-queue flow table are set through: - - /sys/class/net//queues/rx-/rps_flow_cnt - -== Suggested Configuration - -Both of these need to be set before RFS is enabled for a receive queue. -Values for both are rounded up to the nearest power of two. The -suggested flow count depends on the expected number of active connections -at any given time, which may be significantly less than the number of open -connections. We have found that a value of 32768 for rps_sock_flow_entries -works fairly well on a moderately loaded server. - -For a single queue device, the rps_flow_cnt value for the single queue -would normally be configured to the same value as rps_sock_flow_entries. -For a multi-queue device, the rps_flow_cnt for each queue might be -configured as rps_sock_flow_entries / N, where N is the number of -queues. So for instance, if rps_sock_flow_entries is set to 32768 and there -are 16 configured receive queues, rps_flow_cnt for each queue might be -configured as 2048. - - -Accelerated RFS -=============== - -Accelerated RFS is to RFS what RSS is to RPS: a hardware-accelerated load -balancing mechanism that uses soft state to steer flows based on where -the application thread consuming the packets of each flow is running. 
-Accelerated RFS should perform better than RFS since packets are sent -directly to a CPU local to the thread consuming the data. The target CPU -will either be the same CPU where the application runs, or at least a CPU -which is local to the application thread’s CPU in the cache hierarchy. - -To enable accelerated RFS, the networking stack calls the -ndo_rx_flow_steer driver function to communicate the desired hardware -queue for packets matching a particular flow. The network stack -automatically calls this function every time a flow entry in -rps_dev_flow_table is updated. The driver in turn uses a device specific -method to program the NIC to steer the packets. - -The hardware queue for a flow is derived from the CPU recorded in -rps_dev_flow_table. The stack consults a CPU to hardware queue map which -is maintained by the NIC driver. This is an auto-generated reverse map of -the IRQ affinity table shown by /proc/interrupts. Drivers can use -functions in the cpu_rmap (“CPU affinity reverse map”) kernel library -to populate the map. For each CPU, the corresponding queue in the map is -set to be one whose processing CPU is closest in cache locality. - -==== Accelerated RFS Configuration - -Accelerated RFS is only available if the kernel is compiled with -CONFIG_RFS_ACCEL and support is provided by the NIC device and driver. -It also requires that ntuple filtering is enabled via ethtool. The map -of CPU to queues is automatically deduced from the IRQ affinities -configured for each receive queue by the driver, so no additional -configuration should be necessary. - -== Suggested Configuration - -This technique should be enabled whenever one wants to use RFS and the -NIC supports hardware acceleration. - -XPS: Transmit Packet Steering -============================= - -Transmit Packet Steering is a mechanism for intelligently selecting -which transmit queue to use when transmitting a packet on a multi-queue -device. This can be accomplished by recording two kinds of maps, either -a mapping of CPU to hardware queue(s) or a mapping of receive queue(s) -to hardware transmit queue(s). - -1. XPS using CPUs map - -The goal of this mapping is usually to assign queues -exclusively to a subset of CPUs, where the transmit completions for -these queues are processed on a CPU within this set. This choice -provides two benefits. First, contention on the device queue lock is -significantly reduced since fewer CPUs contend for the same queue -(contention can be eliminated completely if each CPU has its own -transmit queue). Secondly, cache miss rate on transmit completion is -reduced, in particular for data cache lines that hold the sk_buff -structures. - -2. XPS using receive queues map - -This mapping is used to pick transmit queue based on the receive -queue(s) map configuration set by the administrator. A set of receive -queues can be mapped to a set of transmit queues (many:many), although -the common use case is a 1:1 mapping. This will enable sending packets -on the same queue associations for transmit and receive. This is useful for -busy polling multi-threaded workloads where there are challenges in -associating a given CPU to a given application thread. The application -threads are not pinned to CPUs and each thread handles packets -received on a single queue. The receive queue number is cached in the -socket for the connection. In this model, sending the packets on the same -transmit queue corresponding to the associated receive queue has benefits -in keeping the CPU overhead low. 
Transmit completion work is locked into -the same queue-association that a given application is polling on. This -avoids the overhead of triggering an interrupt on another CPU. When the -application cleans up the packets during the busy poll, transmit completion -may be processed along with it in the same thread context and so result in -reduced latency. - -XPS is configured per transmit queue by setting a bitmap of -CPUs/receive-queues that may use that queue to transmit. The reverse -mapping, from CPUs to transmit queues or from receive-queues to transmit -queues, is computed and maintained for each network device. When -transmitting the first packet in a flow, the function get_xps_queue() is -called to select a queue. This function uses the ID of the receive queue -for the socket connection for a match in the receive queue-to-transmit queue -lookup table. Alternatively, this function can also use the ID of the -running CPU as a key into the CPU-to-queue lookup table. If the -ID matches a single queue, that is used for transmission. If multiple -queues match, one is selected by using the flow hash to compute an index -into the set. When selecting the transmit queue based on receive queue(s) -map, the transmit device is not validated against the receive device as it -requires expensive lookup operation in the datapath. - -The queue chosen for transmitting a particular flow is saved in the -corresponding socket structure for the flow (e.g. a TCP connection). -This transmit queue is used for subsequent packets sent on the flow to -prevent out of order (ooo) packets. The choice also amortizes the cost -of calling get_xps_queues() over all packets in the flow. To avoid -ooo packets, the queue for a flow can subsequently only be changed if -skb->ooo_okay is set for a packet in the flow. This flag indicates that -there are no outstanding packets in the flow, so the transmit queue can -change without the risk of generating out of order packets. The -transport layer is responsible for setting ooo_okay appropriately. TCP, -for instance, sets the flag when all data for a connection has been -acknowledged. - -==== XPS Configuration - -XPS is only available if the kconfig symbol CONFIG_XPS is enabled (on by -default for SMP). The functionality remains disabled until explicitly -configured. To enable XPS, the bitmap of CPUs/receive-queues that may -use a transmit queue is configured using the sysfs file entry: - -For selection based on CPUs map: -/sys/class/net//queues/tx-/xps_cpus - -For selection based on receive-queues map: -/sys/class/net//queues/tx-/xps_rxqs - -== Suggested Configuration - -For a network device with a single transmission queue, XPS configuration -has no effect, since there is no choice in this case. In a multi-queue -system, XPS is preferably configured so that each CPU maps onto one queue. -If there are as many queues as there are CPUs in the system, then each -queue can also map onto one CPU, resulting in exclusive pairings that -experience no contention. If there are fewer queues than CPUs, then the -best CPUs to share a given queue are probably those that share the cache -with the CPU that processes transmit completions for that queue -(transmit interrupts). - -For transmit queue selection based on receive queue(s), XPS has to be -explicitly configured mapping receive-queue(s) to transmit queue(s). If the -user configuration for receive-queue map does not apply, then the transmit -queue is selected based on the CPUs map. 
- -Per TX Queue rate limitation: -============================= - -These are rate-limitation mechanisms implemented by HW, where currently -a max-rate attribute is supported, by setting a Mbps value to - -/sys/class/net//queues/tx-/tx_maxrate - -A value of zero means disabled, and this is the default. - -Further Information -=================== -RPS and RFS were introduced in kernel 2.6.35. XPS was incorporated into -2.6.38. Original patches were submitted by Tom Herbert -(therbert@google.com) - -Accelerated RFS was introduced in 2.6.35. Original patches were -submitted by Ben Hutchings (bwh@kernel.org) - -Authors: -Tom Herbert (therbert@google.com) -Willem de Bruijn (willemb@google.com) -- cgit v1.2.3-59-g8ed1b From 31f433307043cf14ba1057ce4b8eaac17a2c617f Mon Sep 17 00:00:00 2001 From: Corentin Labbe Date: Fri, 18 Jan 2019 13:38:22 +0000 Subject: Documentation: DMA-API: fix two typos This patch fixes two typos, a missing "e" and dma-api/driver_filter was incorrectly typed dma-api/driver-filter. Signed-off-by: Corentin Labbe [jc: fixed obvious language typos on the way in] Signed-off-by: Jonathan Corbet --- Documentation/DMA-API.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt index e133ccd60228..607c1e75e5aa 100644 --- a/Documentation/DMA-API.txt +++ b/Documentation/DMA-API.txt @@ -530,8 +530,8 @@ that simply cannot make consistent memory. dma_free_attrs(struct device *dev, size_t size, void *cpu_addr, dma_addr_t dma_handle, unsigned long attrs) -Free memory allocated by the dma_alloc_attrs(). All parameters common -parameters must identical to those otherwise passed to dma_fre_coherent, +Free memory allocated by the dma_alloc_attrs(). All common +parameters must be identical to those otherwise passed to dma_free_coherent, and the attrs argument must be identical to the attrs passed to dma_alloc_attrs(). @@ -717,7 +717,7 @@ dma-api/num_free_entries The current number of free dma_debug_entries dma-api/nr_total_entries The total number of dma_debug_entries in the allocator, both free and used. -dma-api/driver-filter You can write a name of a driver into this file +dma-api/driver_filter You can write a name of a driver into this file to limit the debug output to requests from that particular driver. Write an empty string to that file to disable the filter and see -- cgit v1.2.3-59-g8ed1b From 7d1179f0dbcd88dea67bdf3d69c93ab39750bbbb Mon Sep 17 00:00:00 2001 From: Frank Rowand Date: Fri, 1 Feb 2019 13:54:39 -0800 Subject: docs: kernel-doc: update commands to generate man page (1) The command to generate man pages is truncated in the pdf version of the document. Reformat the command into multiple lines to prevent the truncation. (2) Older versions of git do not support all variants of pathspec syntax. Provide commands to generate man pages for various alternate syntax. 
Signed-off-by: Frank Rowand Signed-off-by: Jonathan Corbet --- Documentation/doc-guide/kernel-doc.rst | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/Documentation/doc-guide/kernel-doc.rst b/Documentation/doc-guide/kernel-doc.rst index 51be62aa4385..450ac634e92e 100644 --- a/Documentation/doc-guide/kernel-doc.rst +++ b/Documentation/doc-guide/kernel-doc.rst @@ -517,4 +517,17 @@ How to use kernel-doc to generate man pages If you just want to use kernel-doc to generate man pages you can do this from the kernel git tree:: - $ scripts/kernel-doc -man $(git grep -l '/\*\*' -- :^Documentation :^tools) | scripts/split-man.pl /tmp/man + $ scripts/kernel-doc -man \ + $(git grep -l '/\*\*' -- :^Documentation :^tools) \ + | scripts/split-man.pl /tmp/man + +Some older versions of git do not support some of the variants of syntax for +path exclusion. One of the following commands may work for those versions:: + + $ scripts/kernel-doc -man \ + $(git grep -l '/\*\*' -- . ':!Documentation' ':!tools') \ + | scripts/split-man.pl /tmp/man + + $ scripts/kernel-doc -man \ + $(git grep -l '/\*\*' -- . ":(exclude)Documentation" ":(exclude)tools") \ + | scripts/split-man.pl /tmp/man -- cgit v1.2.3-59-g8ed1b From b5b2187db0cbc991466121fbf25d9e15796ea145 Mon Sep 17 00:00:00 2001 From: Frank Rowand Date: Fri, 1 Feb 2019 14:04:16 -0800 Subject: docs: kernel-doc: typo "documentaion" Fix a typo in kernel-doc.rst Signed-off-by: Frank Rowand Signed-off-by: Jonathan Corbet --- Documentation/doc-guide/kernel-doc.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/doc-guide/kernel-doc.rst b/Documentation/doc-guide/kernel-doc.rst index 450ac634e92e..6513c16b7e4f 100644 --- a/Documentation/doc-guide/kernel-doc.rst +++ b/Documentation/doc-guide/kernel-doc.rst @@ -490,7 +490,7 @@ doc: *title* functions: *[ function ...]* Include documentation for each *function* in *source*. - If no *function* if specified, the documentaion for all functions + If no *function* if specified, the documentation for all functions and types in the *source* will be included. Examples:: -- cgit v1.2.3-59-g8ed1b From 358b6ba9befa6d62bab30e2d2c6dc1d49caf532f Mon Sep 17 00:00:00 2001 From: "Tobin C. Harding" Date: Thu, 31 Jan 2019 15:06:21 +1100 Subject: docs: Fix SLUB docs typo There is a minor typo in SLUB documentation. Fix typo. Signed-off-by: Tobin C. Harding Acked-by: Mike Rapoport Signed-off-by: Jonathan Corbet --- Documentation/vm/slub.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst index 195928808bac..0871c584939b 100644 --- a/Documentation/vm/slub.rst +++ b/Documentation/vm/slub.rst @@ -141,7 +141,7 @@ can be influenced by kernel parameters: (list_lock) where contention may occur. ``slub_min_order`` - specifies a minim order of slabs. A similar effect like + specifies a minimum order of slabs. A similar effect like ``slub_min_objects``. ``slub_max_order`` -- cgit v1.2.3-59-g8ed1b From 11ede50059d0603e6ae2df5042cf813145ea5736 Mon Sep 17 00:00:00 2001 From: "Tobin C. Harding" Date: Thu, 31 Jan 2019 15:06:22 +1100 Subject: docs: Add missing colon sphinx emits warning WARNING: Inline emphasis start-string without end-string. This is caused by a missing colon. Add missing colon, clearing shpinx build warning. Signed-off-by: Tobin C. 
Harding Acked-by: Mike Rapoport Signed-off-by: Jonathan Corbet --- Documentation/vm/slub.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst index 0871c584939b..933ada4368ff 100644 --- a/Documentation/vm/slub.rst +++ b/Documentation/vm/slub.rst @@ -66,7 +66,7 @@ Trying to find an issue in the dentry cache? Try:: to only enable debugging on the dentry cache. You may use an asterisk at the end of the slab name, in order to cover all slabs with the same prefix. For example, here's how you can poison the dentry cache as well as all kmalloc -slabs: +slabs:: slub_debug=P,kmalloc-*,dentry -- cgit v1.2.3-59-g8ed1b From cd7198fc959e9ce6a90148b0ab7338b1a8b7c17e Mon Sep 17 00:00:00 2001 From: "Tobin C. Harding" Date: Thu, 31 Jan 2019 15:06:23 +1100 Subject: docs: Use underscore not hyphen in label sphinx emits warning WARNING: undefined label: memory-allocation ... This seems to be caused by the use of a hyphen in the label name instead of an underscore. Using an underscore for the label name and the reference clears the warning. Use underscore not hyphen in label and reference. Signed-off-by: Tobin C. Harding Signed-off-by: Jonathan Corbet --- Documentation/core-api/memory-allocation.rst | 2 +- Documentation/vm/index.rst | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/core-api/memory-allocation.rst b/Documentation/core-api/memory-allocation.rst index 51a200d59f51..7744aa3bf2e0 100644 --- a/Documentation/core-api/memory-allocation.rst +++ b/Documentation/core-api/memory-allocation.rst @@ -1,4 +1,4 @@ -.. _memory-allocation: +.. _memory_allocation: ======================= Memory Allocation Guide diff --git a/Documentation/vm/index.rst b/Documentation/vm/index.rst index 2b3ab3a1ccf3..b58cc3bfe777 100644 --- a/Documentation/vm/index.rst +++ b/Documentation/vm/index.rst @@ -4,7 +4,7 @@ Linux Memory Management Documentation This is a collection of documents about the Linux memory management (mm) subsystem. If you are looking for advice on simply allocating memory, -see the :ref:`memory-allocation`. +see the :ref:`memory_allocation`. User guides for MM features =========================== -- cgit v1.2.3-59-g8ed1b From 19c1d46dfc7793b53b89c796475399656b9a794a Mon Sep 17 00:00:00 2001 From: Jonathan Neuschäfer Date: Wed, 30 Jan 2019 18:25:15 +0100 Subject: doc: Change LXR references to elixir.bootlin.com MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Recently, Free Electrons was renamed to Bootlin[1]. Less recently, the Linux Cross Reference (LXR) at lxr.free-electrons.com was replaced by Elixir[2], and lxr.free-electrons.com redirected first to elixir.free-electrons.com and now to elixir.bootlin.com. 
[1]: https://bootlin.com/blog/free-electrons-becomes-bootlin/ [2]: https://github.com/free-electrons/elixir Signed-off-by: Jonathan Neuschäfer Reviewed-by: Martin Kepplinger Acked-by: Federico Vaga Signed-off-by: Jonathan Corbet --- Documentation/input/devices/xpad.rst | 2 +- Documentation/process/howto.rst | 2 +- Documentation/process/kernel-docs.rst | 2 +- Documentation/translations/it_IT/process/howto.rst | 2 +- Documentation/translations/ja_JP/howto.rst | 2 +- Documentation/translations/ko_KR/howto.rst | 2 +- 6 files changed, 6 insertions(+), 6 deletions(-) diff --git a/Documentation/input/devices/xpad.rst b/Documentation/input/devices/xpad.rst index b8bd65962dd8..173c2acda9fd 100644 --- a/Documentation/input/devices/xpad.rst +++ b/Documentation/input/devices/xpad.rst @@ -218,7 +218,7 @@ References .. [1] http://euc.jp/periphs/xbox-controller.ja.html (ITO Takayuki) .. [2] http://xpad.xbox-scene.com/ .. [3] http://www.markosweb.com/www/xboxhackz.com/ -.. [4] http://lxr.free-electrons.com/ident?i=xpad_device +.. [4] https://elixir.bootlin.com/linux/latest/ident/xpad_device Historic Edits diff --git a/Documentation/process/howto.rst b/Documentation/process/howto.rst index 58b2f46c4f98..bdd2eda81a1c 100644 --- a/Documentation/process/howto.rst +++ b/Documentation/process/howto.rst @@ -225,7 +225,7 @@ Cross-Reference project, which is able to present source code in a self-referential, indexed webpage format. An excellent up-to-date repository of the kernel code may be found at: - http://lxr.free-electrons.com/ + https://elixir.bootlin.com/ The development process diff --git a/Documentation/process/kernel-docs.rst b/Documentation/process/kernel-docs.rst index 3fb28de556e4..ab12dddc773e 100644 --- a/Documentation/process/kernel-docs.rst +++ b/Documentation/process/kernel-docs.rst @@ -565,7 +565,7 @@ Miscellaneous * Name: **Cross-Referencing Linux** - :URL: http://lxr.free-electrons.com/ + :URL: https://elixir.bootlin.com/ :Keywords: Browsing source code. :Description: Another web-based Linux kernel source code browser. Lots of cross references to variables and functions. You can see diff --git a/Documentation/translations/it_IT/process/howto.rst b/Documentation/translations/it_IT/process/howto.rst index 909e6a55bc43..f9a44b1af89d 100644 --- a/Documentation/translations/it_IT/process/howto.rst +++ b/Documentation/translations/it_IT/process/howto.rst @@ -234,7 +234,7 @@ il progetto Linux Cross-Reference, che è in grado di presentare codice sorgente in un formato autoreferenziale ed indicizzato. Un eccellente ed aggiornata fonte di consultazione del codice del kernel la potete trovare qui: - http://lxr.free-electrons.com/ + https://elixir.bootlin.com/ Il processo di sviluppo diff --git a/Documentation/translations/ja_JP/howto.rst b/Documentation/translations/ja_JP/howto.rst index f3116381c26b..2ca9389d5915 100644 --- a/Documentation/translations/ja_JP/howto.rst +++ b/Documentation/translations/ja_JP/howto.rst @@ -245,7 +245,7 @@ Linux カーネルソースツリーの中に含まれる、きれいにし、 できます。この最新の素晴しいカーネルコードのリポジトリは以下で見つかり ます - - http://lxr.free-electrons.com/ + https://elixir.bootlin.com/ 開発プロセス ------------ diff --git a/Documentation/translations/ko_KR/howto.rst b/Documentation/translations/ko_KR/howto.rst index a8197e072599..9fc869087e9c 100644 --- a/Documentation/translations/ko_KR/howto.rst +++ b/Documentation/translations/ko_KR/howto.rst @@ -235,7 +235,7 @@ ReST 마크업을 사용하는 문서들은 Documentation/output 에 생성된 소스코드를 인덱스된 웹 페이지들의 형태로 보여준다. 최신의 멋진 커널 코드 저장소는 다음을 통하여 참조할 수 있다. 
- http://lxr.free-electrons.com/ + https://elixir.bootlin.com/ 개발 프로세스 -- cgit v1.2.3-59-g8ed1b From 548a7643866b66526b700fc921a84fed47e4e452 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Thu, 24 Jan 2019 17:20:28 +0900 Subject: Documentation/kr: Update Korean translation to delete reference to the kernel-mentors mailing list Translate this commit to Korean: bc0ef4a7e4c3 ("Documentation: Delete reference to the kernel-mentors mailing list") Signed-off-by: SeongJae Park Signed-off-by: Jonathan Corbet --- Documentation/translations/ko_KR/howto.rst | 7 ------- 1 file changed, 7 deletions(-) diff --git a/Documentation/translations/ko_KR/howto.rst b/Documentation/translations/ko_KR/howto.rst index 9fc869087e9c..a61f7b3f927f 100644 --- a/Documentation/translations/ko_KR/howto.rst +++ b/Documentation/translations/ko_KR/howto.rst @@ -220,13 +220,6 @@ ReST 마크업을 사용하는 문서들은 Documentation/output 에 생성된 가지고 있지 않다면 다음에 무엇을 해야할지에 관한 방향을 배울 수 있을 것이다. -여러분들이 이미 커널 트리에 반영하길 원하는 코드 묶음을 가지고 있지만 -올바른 포맷으로 포장하는데 도움이 필요하다면 그러한 문제를 돕기 위해 -만들어진 kernel-mentors 프로젝트가 있다. 그곳은 메일링 리스트이며 -다음에서 참조할 수 있다. - - https://selenic.com/mailman/listinfo/kernel-mentors - 리눅스 커널 코드에 실제 변경을 하기 전에 반드시 그 코드가 어떻게 동작하는지 이해하고 있어야 한다. 코드를 분석하기 위하여 특정한 툴의 도움을 빌려서라도 코드를 직접 읽는 것보다 좋은 것은 없다(대부분의 -- cgit v1.2.3-59-g8ed1b From a6bee90a35f4d82bcd59416a33baacf202504616 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Thu, 24 Jan 2019 17:20:29 +0900 Subject: Documentation/process/howto/kr: Update Korean translation to remove outdated info about bugzilla mailing lists Translate this commit to Korean: bcd3cf0855c5 ("Documentation/process/howto: Remove outdated info about bugzilla mailing lists") Signed-off-by: SeongJae Park Signed-off-by: Jonathan Corbet --- Documentation/translations/ko_KR/howto.rst | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/Documentation/translations/ko_KR/howto.rst b/Documentation/translations/ko_KR/howto.rst index a61f7b3f927f..ddbea3ad3e97 100644 --- a/Documentation/translations/ko_KR/howto.rst +++ b/Documentation/translations/ko_KR/howto.rst @@ -370,15 +370,7 @@ https://bugzilla.kernel.org 는 리눅스 커널 개발자들이 커널의 버 다른 사람들의 버그들을 수정하기 위하여 시간을 낭비하지 않기 때문이다. 이미 보고된 버그 리포트들을 가지고 작업하기 위해서 https://bugzilla.kernel.org -를 참조하라. 여러분이 앞으로 생겨날 버그 리포트들의 조언자가 되길 원한다면 -bugme-new 메일링 리스트나(새로운 버그 리포트들만이 이곳에서 메일로 전해진다) -bugme-janitor 메일링 리스트(bugzilla에 모든 변화들이 여기서 메일로 전해진다) -에 등록하면 된다. - - https://lists.linux-foundation.org/mailman/listinfo/bugme-new - - https://lists.linux-foundation.org/mailman/listinfo/bugme-janitors - +를 참조하라. 메일링 리스트들 -- cgit v1.2.3-59-g8ed1b From 6fc48e6085ea72fc4ba3e20679478e9fb97727e5 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Thu, 24 Jan 2019 17:20:30 +0900 Subject: Documentation/process/howto.rst/kokr: Update Korean translation to add a missing cross-reference Translate this commit to Korean: dad051395413 ("Documentation/process/howto.rst: add a missing cross-reference") Signed-off-by: SeongJae Park Signed-off-by: Jonathan Corbet --- Documentation/translations/ko_KR/howto.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/translations/ko_KR/howto.rst b/Documentation/translations/ko_KR/howto.rst index ddbea3ad3e97..d21806244d32 100644 --- a/Documentation/translations/ko_KR/howto.rst +++ b/Documentation/translations/ko_KR/howto.rst @@ -99,7 +99,7 @@ mtk.manpages@gmail.com의 메인테이너에게 보낼 것을 권장한다. 다음은 커널 소스 트리에 있는 읽어야 할 파일들의 리스트이다. - README + :ref:`Documentation/admin-guide/README.rst ` 이 파일은 리눅스 커널에 관하여 간단한 배경 설명과 커널을 설정하고 빌드하기 위해 필요한 것을 설명한다. 
커널에 입문하는 사람들은 여기서 시작해야 한다. -- cgit v1.2.3-59-g8ed1b From 265083a4ae5bbf963ecd739b9638bebd6e026c6a Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Thu, 24 Jan 2019 17:20:31 +0900 Subject: docs/kokr: Update Korean translation to tidy up TOCs and refs to license-rules.rst Transalte this commit to Korean: 9799445af124 ("docs: tidy up TOCs and refs to license-rules.rst") Signed-off-by: SeongJae Park Signed-off-by: Jonathan Corbet --- Documentation/translations/ko_KR/howto.rst | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/Documentation/translations/ko_KR/howto.rst b/Documentation/translations/ko_KR/howto.rst index d21806244d32..16438515b8e1 100644 --- a/Documentation/translations/ko_KR/howto.rst +++ b/Documentation/translations/ko_KR/howto.rst @@ -77,10 +77,12 @@ Documentation/process/howto.rst 리눅스 커널 소스 코드는 GPL로 배포(release)되었다. 소스트리의 메인 디렉토리에 있는 라이센스에 관하여 상세하게 쓰여 있는 COPYING이라는 -파일을 봐라. 여러분이 라이센스에 관한 더 깊은 문제를 가지고 있다면 -리눅스 커널 메일링 리스트에 묻지말고 변호사와 연락하라. 메일링 -리스트들에 있는 사람들은 변호사가 아니기 때문에 법적 문제에 관하여 -그들의 말에 의지해서는 안된다. +파일을 봐라. 리눅스 커널 라이센싱 규칙과 소스 코드 안의 `SPDX +`_ 식별자 사용법은 +:ref:`Documentation/process/license-rules.rst ` 에 설명되어 +있다. 여러분이 라이센스에 관한 더 깊은 문제를 가지고 있다면 리눅스 커널 메일링 +리스트에 묻지말고 변호사와 연락하라. 메일링 리스트들에 있는 사람들은 변호사가 +아니기 때문에 법적 문제에 관하여 그들의 말에 의지해서는 안된다. GPL에 관한 잦은 질문들과 답변들은 다음을 참조하라. -- cgit v1.2.3-59-g8ed1b From faa6bcbb4c9c2676bd2ad2199c23bf3358356b76 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Thu, 24 Jan 2019 17:20:32 +0900 Subject: doc:process:kokr: Update Korean translation to add links where missing Translate this commit to Korean: f77af637f29d ("doc:process: add links where missing") Signed-off-by: SeongJae Park Signed-off-by: Jonathan Corbet --- Documentation/translations/ko_KR/howto.rst | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/Documentation/translations/ko_KR/howto.rst b/Documentation/translations/ko_KR/howto.rst index 16438515b8e1..528433932589 100644 --- a/Documentation/translations/ko_KR/howto.rst +++ b/Documentation/translations/ko_KR/howto.rst @@ -298,9 +298,9 @@ Andrew Morton의 글이 있다. 4.x.y는 "stable" 팀에 의해 관리되며 거의 매번 격주로 배포된다. -커널 트리 문서들 내에 Documentation/process/stable-kernel-rules.rst 파일은 어떤 -종류의 변경들이 -stable 트리로 들어왔는지와 배포 프로세스가 어떻게 -진행되는지를 설명한다. +커널 트리 문서들 내의 :ref:`Documentation/process/stable-kernel-rules.rst ` +파일은 어떤 종류의 변경들이 -stable 트리로 들어왔는지와 +배포 프로세스가 어떻게 진행되는지를 설명한다. 4.x -git 패치들 ~~~~~~~~~~~~~~~ @@ -355,9 +355,10 @@ https://bugzilla.kernel.org 는 리눅스 커널 개발자들이 커널의 버 https://bugzilla.kernel.org/page.cgi?id=faq.html -메인 커널 소스 디렉토리에 있는 admin-guide/reporting-bugs.rst 파일은 커널 버그라고 생각되는 -것을 보고하는 방법에 관한 좋은 템플릿이며 문제를 추적하기 위해서 커널 -개발자들이 필요로 하는 정보가 무엇들인지를 상세히 설명하고 있다. +메인 커널 소스 디렉토리에 있는 :ref:`admin-guide/reporting-bugs.rst ` +파일은 커널 버그라고 생각되는 것을 보고하는 방법에 관한 좋은 템플릿이며 문제를 +추적하기 위해서 커널 개발자들이 필요로 하는 정보가 무엇들인지를 상세히 설명하고 +있다. 버그 리포트들의 관리 @@ -417,7 +418,8 @@ https://bugzilla.kernel.org 는 리눅스 커널 개발자들이 커널의 버 "John 커널해커는 작성했다...."를 유지하며 여러분들의 의견을 그 메일의 윗부분에 작성하지 말고 각 인용한 단락들 사이에 넣어라. -여러분들이 패치들을 메일에 넣는다면 그것들은 Documentation/process/submitting-patches.rst에 +여러분들이 패치들을 메일에 넣는다면 그것들은 +:ref:`Documentation/process/submitting-patches.rst ` 에 나와있는데로 명백히(plain) 읽을 수 있는 텍스트여야 한다. 커널 개발자들은 첨부파일이나 압축된 패치들을 원하지 않는다. 
그들은 여러분들의 패치의 각 라인 단위로 코멘트를 하길 원하며 압축하거나 첨부하지 않고 보내는 것이 -- cgit v1.2.3-59-g8ed1b From a41e8f25fa8f8f67360d88eb0eebbabe95a64bdf Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Tue, 22 Jan 2019 19:46:32 +0100 Subject: stable-kernel-rules.rst: add link to networking patch queue The networking maintainer keeps a public list of the patches being queued up for the next round of stable releases. Be sure to check there before asking for a patch to be applied so that you do not waste people's time. Signed-off-by: Greg Kroah-Hartman Acked-by: David S. Miller Signed-off-by: Jonathan Corbet --- Documentation/process/stable-kernel-rules.rst | 3 +++ 1 file changed, 3 insertions(+) diff --git a/Documentation/process/stable-kernel-rules.rst b/Documentation/process/stable-kernel-rules.rst index 070e3f04d183..06f743b612c4 100644 --- a/Documentation/process/stable-kernel-rules.rst +++ b/Documentation/process/stable-kernel-rules.rst @@ -38,6 +38,9 @@ Procedure for submitting patches to the -stable tree - If the patch covers files in net/ or drivers/net please follow netdev stable submission guidelines as described in :ref:`Documentation/networking/netdev-FAQ.rst ` + after first checking the stable networking queue at + https://patchwork.ozlabs.org/bundle/davem/stable/?series=&submitter=&state=*&q=&archive= + to ensure the requested patch is not already queued up. - Security patches should not be handled (solely) by the -stable review process but should follow the procedures in :ref:`Documentation/admin-guide/security-bugs.rst `. -- cgit v1.2.3-59-g8ed1b From 8f7e6d134bda563c9b3abac14de11e89eb4c86d7 Mon Sep 17 00:00:00 2001 From: Adam Borowski Date: Tue, 22 Jan 2019 10:34:08 +0100 Subject: doc: process: GPL -> GPL-compatible Drivers under MIT, BSD-17-clause, or uncle-Bob's-newest-take-on-PD are all fine, not just GPL. Signed-off-by: Adam Borowski [jc: fixed conflict and refilled paragraph] Reviewed-by: Greg Kroah-Hartman Signed-off-by: Jonathan Corbet --- Documentation/process/stable-api-nonsense.rst | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/Documentation/process/stable-api-nonsense.rst b/Documentation/process/stable-api-nonsense.rst index 57d95a49c096..a9625ab1fdc2 100644 --- a/Documentation/process/stable-api-nonsense.rst +++ b/Documentation/process/stable-api-nonsense.rst @@ -169,13 +169,13 @@ driver for every different kernel version for every distribution is a nightmare, and trying to keep up with an ever changing kernel interface is also a rough job. -Simple, get your kernel driver into the main kernel tree (remember we -are talking about GPL released drivers here, if your code doesn't fall -under this category, good luck, you are on your own here, you leech). If your -driver is in the tree, and a kernel interface changes, it will be fixed -up by the person who did the kernel change in the first place. This -ensures that your driver is always buildable, and works over time, with -very little effort on your part. +Simple, get your kernel driver into the main kernel tree (remember we are +talking about drivers released under a GPL-compatible license here, if your +code doesn't fall under this category, good luck, you are on your own here, +you leech). If your driver is in the tree, and a kernel interface changes, +it will be fixed up by the person who did the kernel change in the first +place. This ensures that your driver is always buildable, and works over +time, with very little effort on your part. 
The very good side effects of having your driver in the main kernel tree are: -- cgit v1.2.3-59-g8ed1b From 560f28bcceb25c6c94b4c54afaf7b886d408e05a Mon Sep 17 00:00:00 2001 From: Kamalesh Babulal Date: Mon, 4 Feb 2019 14:16:57 +0530 Subject: static_keys.txt: Fix trivial spelling mistake Fix the spelling of 'functionnality' -> 'functionality'. Signed-off-by: Kamalesh Babulal Signed-off-by: Jonathan Corbet --- Documentation/static-keys.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/static-keys.txt b/Documentation/static-keys.txt index d68135560895..9803e14639bf 100644 --- a/Documentation/static-keys.txt +++ b/Documentation/static-keys.txt @@ -159,7 +159,7 @@ particularly the CPU hotplug lock (in order to avoid races against CPUs being brought in the kernel while the kernel is getting patched). Calling the static key API from within a hotplug notifier is thus a sure deadlock recipe. In order to still allow use of the -functionnality, the following functions are provided: +functionality, the following functions are provided: static_key_enable_cpuslocked() static_key_disable_cpuslocked() -- cgit v1.2.3-59-g8ed1b From 31dcbbefd385ae59b1384cbcacb31bc2e8dcc94b Mon Sep 17 00:00:00 2001 From: Otto Sabart Date: Sun, 3 Feb 2019 21:00:19 +0100 Subject: doc: kernel-parameters.txt: fix documentation of elevator parameter Legacy IO schedulers (cfq, deadline and noop) were removed in f382fb0bcef4. The documentation for deadline was retained because it carries over to mq-deadline as well, but the location of the doc file was changed over time. The old iosched algorithms were removed from the elevator= kernel parameter, and mq-deadline, kyber and bfq were added with a reference to their documentation. Fixes: f382fb0bcef4 ("block: remove legacy IO schedulers") Signed-off-by: Otto Sabart Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/kernel-parameters.txt | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index b799bcf67d7b..2238fb3c7dcf 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1182,9 +1182,10 @@ arch/x86/kernel/cpu/cpufreq/elanfreq.c. elevator= [IOSCHED] - Format: {"cfq" | "deadline" | "noop"} - See Documentation/block/cfq-iosched.txt and - Documentation/block/deadline-iosched.txt for details. + Format: { "mq-deadline" | "kyber" | "bfq" } + See Documentation/block/deadline-iosched.txt, + Documentation/block/kyber-iosched.txt and + Documentation/block/bfq-iosched.txt for details.
elfcorehdr=[size[KMG]@]offset[KMG] [IA64,PPC,SH,X86,S390] Specifies physical address of start of kernel core -- cgit v1.2.3-59-g8ed1b From 5eadc169fc80d483db88c9997d159ecf0e5fe21a Mon Sep 17 00:00:00 2001 From: Federico Vaga Date: Wed, 23 Jan 2019 21:22:55 +0100 Subject: doc:it_IT: update coding-style - expectations around bool This patch translates in Italian the content of the following patch 7967656ffbfa coding-style: Clarify the expectations around bool Signed-off-by: Federico Vaga Signed-off-by: Jonathan Corbet --- .../translations/it_IT/process/coding-style.rst | 43 +++++++++++++++++++--- 1 file changed, 38 insertions(+), 5 deletions(-) diff --git a/Documentation/translations/it_IT/process/coding-style.rst b/Documentation/translations/it_IT/process/coding-style.rst index dfdefb951b89..2fd0e7f79d55 100644 --- a/Documentation/translations/it_IT/process/coding-style.rst +++ b/Documentation/translations/it_IT/process/coding-style.rst @@ -949,7 +949,40 @@ qualche valore fuori dai limiti. Un tipico esempio è quello delle funzioni che ritornano un puntatore; queste utilizzano NULL o ERR_PTR come meccanismo di notifica degli errori. -17) Non reinventate le macro del kernel +17) L'uso di bool +----------------- + +Nel kernel Linux il tipo bool deriva dal tipo _Bool dello standard C99. +Un valore bool può assumere solo i valori 0 o 1, e implicitamente o +esplicitamente la conversione a bool converte i valori in vero (*true*) o +falso (*false*). Quando si usa un tipo bool il costrutto !! non sarà più +necessario, e questo va ad eliminare una certa serie di bachi. + +Quando si usano i valori booleani, dovreste utilizzare le definizioni di true +e false al posto dei valori 1 e 0. + +Per il valore di ritorno delle funzioni e per le variabili sullo stack, l'uso +del tipo bool è sempre appropriato. L'uso di bool viene incoraggiato per +migliorare la leggibilità e spesso è molto meglio di 'int' nella gestione di +valori booleani. + +Non usate bool se per voi sono importanti l'ordine delle righe di cache o +la loro dimensione; la dimensione e l'allineamento cambia a seconda +dell'architettura per la quale è stato compilato. Le strutture che sono state +ottimizzate per l'allineamento o la dimensione non dovrebbero usare bool. + +Se una struttura ha molti valori true/false, considerate l'idea di raggrupparli +in un intero usando campi da 1 bit, oppure usate un tipo dalla larghezza fissa, +come u8. + +Come per gli argomenti delle funzioni, molti valori true/false possono essere +raggruppati in un singolo argomento a bit denominato 'flags'; spesso 'flags' è +un'alternativa molto più leggibile se si hanno valori costanti per true/false. + +Detto ciò, un uso parsimonioso di bool nelle strutture dati e negli argomenti +può migliorare la leggibilità. + +18) Non reinventate le macro del kernel --------------------------------------- Il file di intestazione include/linux/kernel.h contiene un certo numero @@ -973,7 +1006,7 @@ rigido sui tipi. Sentitevi liberi di leggere attentamente questo file d'intestazione per scoprire cos'altro è stato definito che non dovreste reinventare nel vostro codice. -18) Linee di configurazione degli editor e altre schifezze +19) Linee di configurazione degli editor e altre schifezze ----------------------------------------------------------- Alcuni editor possono interpretare dei parametri di configurazione integrati @@ -1007,8 +1040,8 @@ d'indentazione e di modalità d'uso. 
Le persone potrebbero aver configurato una modalità su misura, oppure potrebbero avere qualche altra magia per far funzionare bene l'indentazione. -19) Inline assembly ---------------------- +20) Inline assembly +------------------- Nel codice specifico per un'architettura, potreste aver bisogno di codice *inline assembly* per interfacciarvi col processore o con una funzionalità @@ -1040,7 +1073,7 @@ al fine di allineare correttamente l'assembler che verrà generato: "more_magic %reg2, %reg3" : /* outputs */ : /* inputs */ : /* clobbers */); -20) Compilazione sotto condizione +21) Compilazione sotto condizione --------------------------------- Ovunque sia possibile, non usate le direttive condizionali del preprocessore -- cgit v1.2.3-59-g8ed1b From 0c5e194947fc5ccc82bdd656d0efcd2d13b6e743 Mon Sep 17 00:00:00 2001 From: Federico Vaga Date: Mon, 4 Feb 2019 22:30:47 +0100 Subject: doc:it_IT: add translations in process/ This patch adds the Italian translation for the following documents in Documentation/process: - applying-patches - submit-checklist - submitting-drivers - changes - stable api nonsense Signed-off-by: Federico Vaga Signed-off-by: Jonathan Corbet --- .../translations/it_IT/doc-guide/sphinx.rst | 2 + .../it_IT/process/applying-patches.rst | 12 +- .../translations/it_IT/process/changes.rst | 487 ++++++++++++++++++++- .../it_IT/process/stable-api-nonsense.rst | 202 ++++++++- .../it_IT/process/submit-checklist.rst | 127 +++++- .../it_IT/process/submitting-drivers.rst | 8 +- 6 files changed, 822 insertions(+), 16 deletions(-) diff --git a/Documentation/translations/it_IT/doc-guide/sphinx.rst b/Documentation/translations/it_IT/doc-guide/sphinx.rst index 474b7e127893..793b5cc33403 100644 --- a/Documentation/translations/it_IT/doc-guide/sphinx.rst +++ b/Documentation/translations/it_IT/doc-guide/sphinx.rst @@ -3,6 +3,8 @@ .. note:: Per leggere la documentazione originale in inglese: :ref:`Documentation/doc-guide/index.rst ` +.. _it_sphinxdoc: + Introduzione ============ diff --git a/Documentation/translations/it_IT/process/applying-patches.rst b/Documentation/translations/it_IT/process/applying-patches.rst index f5e9c7d0b16d..1d30e5cd2a3d 100644 --- a/Documentation/translations/it_IT/process/applying-patches.rst +++ b/Documentation/translations/it_IT/process/applying-patches.rst @@ -1,13 +1,15 @@ .. include:: ../disclaimer-ita.rst :Original: :ref:`Documentation/process/applying-patches.rst ` - +:Translator: Federico Vaga .. _it_applying_patches: -Applicare modifiche al kernel Linux -=================================== +Applicare patch al kernel Linux ++++++++++++++++++++++++++++++++ -.. warning:: +.. note:: - TODO ancora da tradurre + Questo documento è obsoleto. Nella maggior parte dei casi, piuttosto + che usare ``patch`` manualmente, vorrete usare Git. Per questo motivo + il documento non verrà tradotto. diff --git a/Documentation/translations/it_IT/process/changes.rst b/Documentation/translations/it_IT/process/changes.rst index 956cf95a1214..d0874327f301 100644 --- a/Documentation/translations/it_IT/process/changes.rst +++ b/Documentation/translations/it_IT/process/changes.rst @@ -1,12 +1,495 @@ .. include:: ../disclaimer-ita.rst :Original: :ref:`Documentation/process/changes.rst ` +:Translator: Federico Vaga .. _it_changes: Requisiti minimi per compilare il kernel ++++++++++++++++++++++++++++++++++++++++ -.. warning:: +Introduzione +============ - TODO ancora da tradurre +Questo documento fornisce una lista dei software necessari per eseguire i +kernel 4.x. 
+ +Questo documento è basato sul file "Changes" del kernel 2.0.x e quindi le +persone che lo scrissero meritano credito (Jared Mauch, Axel Boldt, +Alessandro Sigala, e tanti altri nella rete). + +Requisiti minimi correnti +************************* + +Prima di pensare d'avere trovato un baco, aggiornate i seguenti programmi +**almeno** alla versione indicata! Se non siete certi della versione che state +usando, il comando indicato dovrebbe dirvelo. + +Questa lista presume che abbiate già un kernel Linux funzionante. In aggiunta, +non tutti gli strumenti sono necessari ovunque; ovviamente, se non avete un +modem ISDN, per esempio, probabilmente non dovreste preoccuparvi di +isdn4k-utils. + +====================== ================= ======================================== + Programma Versione minima Comando per verificare la versione +====================== ================= ======================================== +GNU C 4.6 gcc --version +GNU make 3.81 make --version +binutils 2.20 ld -v +flex 2.5.35 flex --version +bison 2.0 bison --version +util-linux 2.10o fdformat --version +kmod 13 depmod -V +e2fsprogs 1.41.4 e2fsck -V +jfsutils 1.1.3 fsck.jfs -V +reiserfsprogs 3.6.3 reiserfsck -V +xfsprogs 2.6.0 xfs_db -V +squashfs-tools 4.0 mksquashfs -version +btrfs-progs 0.18 btrfsck +pcmciautils 004 pccardctl -V +quota-tools 3.09 quota -V +PPP 2.4.0 pppd --version +isdn4k-utils 3.1pre1 isdnctrl 2>&1|grep version +nfs-utils 1.0.5 showmount --version +procps 3.2.0 ps --version +oprofile 0.9 oprofiled --version +udev 081 udevd --version +grub 0.93 grub --version || grub-install --version +mcelog 0.6 mcelog --version +iptables 1.4.2 iptables -V +openssl & libcrypto 1.0.0 openssl version +bc 1.06.95 bc --version +Sphinx\ [#f1]_ 1.3 sphinx-build --version +====================== ================= ======================================== + +.. [#f1] Sphinx è necessario solo per produrre la documentazione del Kernel + +Compilazione del kernel +*********************** + +GCC +--- + +La versione necessaria di gcc potrebbe variare a seconda del tipo di CPU nel +vostro calcolatore. + +Make +---- + +Per compilare il kernel vi servirà GNU make 3.81 o successivo. + +Binutils +-------- + +Il sistema di compilazione, dalla versione 4.13, per la produzione dei passi +intermedi, si è convertito all'uso di *thin archive* (`ar T`) piuttosto che +all'uso del *linking* incrementale (`ld -r`). Questo richiede binutils 2.20 o +successivo. + +pkg-config +---------- + +Il sistema di compilazione, dalla versione 4.18, richiede pkg-config per +verificare l'esistenza degli strumenti kconfig e per determinare le +impostazioni da usare in 'make {g,x}config'. Precedentemente pkg-config +veniva usato ma non verificato o documentato. + +Flex +---- + +Dalla versione 4.16, il sistema di compilazione, durante l'esecuzione, genera +un analizzatore lessicale. Questo richiede flex 2.5.35 o successivo. + +Bison +----- + +Dalla versione 4.16, il sistema di compilazione, durante l'esecuzione, genera +un parsificatore. Questo richiede bison 2.0 o successivo. + +Perl +---- + +Per compilare il kernel vi servirà perl 5 e i seguenti moduli ``Getopt::Long``, +``Getopt::Std``, ``File::Basename``, e ``File::Find``. + +BC +-- + +Vi servirà bc per compilare i kernel dal 3.10 in poi. + +OpenSSL +------- + +Il programma OpenSSL e la libreria crypto vengono usati per la firma dei moduli +e la gestione dei certificati; sono usati per la creazione della chiave e +la generazione della firma. 
+ +Se la firma dei moduli è abilitata, allora vi servirà openssl per compilare il +kernel 3.7 e successivi. Vi serviranno anche i pacchetti di sviluppo di +openssl per compilare il kernel 4.3 o successivi. + + +Strumenti di sistema +******************** + +Modifiche architetturali +------------------------ + +DevFS è stato reso obsoleto da udev +(http://www.kernel.org/pub/linux/utils/kernel/hotplug/) + +Il supporto per UID a 32-bit è ora disponibile. Divertitevi! + +La documentazione delle funzioni in Linux è una fase di transizione +verso una documentazione integrata nei sorgenti stessi usando dei commenti +formattati in modo speciale e posizionati vicino alle funzioni che descrivono. +Al fine di arricchire la documentazione, questi commenti possono essere +combinati con i file ReST presenti in Documentation/; questi potranno +poi essere convertiti in formato PostScript, HTML, LaTex, ePUB o PDF. +Per convertire i documenti da ReST al formato che volete, avete bisogno di +Sphinx. + +Util-linux +---------- + +Le versioni più recenti di util-linux: forniscono il supporto a ``fdisk`` per +dischi di grandi dimensioni; supportano le nuove opzioni di mount; riconoscono +più tipi di partizioni; hanno un fdformat che funziona con i kernel 2.4; +e altre chicche. Probabilmente vorrete aggiornarlo. + +Ksymoops +-------- + +Se l'impensabile succede e il kernel va in oops, potrebbe servirvi lo strumento +ksymoops per decodificarlo, ma nella maggior parte dei casi non vi servirà. +Generalmente è preferibile compilare il kernel con l'opzione ``CONFIG_KALLSYMS`` +cosicché venga prodotto un output più leggibile che può essere usato così com'è +(produce anche un output migliore di ksymoops). Se per qualche motivo il +vostro kernel non è stato compilato con ``CONFIG_KALLSYMS`` e non avete modo di +ricompilarlo e riprodurre l'oops con quell'opzione abilitata, allora potete +usare ksymoops per decodificare l'oops. + +Mkinitrd +-------- + +I cambiamenti della struttura in ``/lib/modules`` necessita l'aggiornamento di +mkinitrd. + +E2fsprogs +--------- + +L'ultima versione di ``e2fsprogs`` corregge diversi bachi in fsck e debugfs. +Ovviamente, aggiornarlo è una buona idea. + +JFSutils +-------- + +Il pacchetto ``jfsutils`` contiene programmi per il file-system JFS. +Sono disponibili i seguenti strumenti: + +- ``fsck.jfs`` - avvia la ripetizione del log delle transizioni, e verifica e + ripara una partizione formattata secondo JFS + +- ``mkfs.jfs`` - crea una partizione formattata secondo JFS + +- sono disponibili altri strumenti per il file-system. + +Reiserfsprogs +------------- + +Il pacchetto reiserfsprogs dovrebbe essere usato con reiserfs-3.6.x (Linux +kernel 2.4.x). Questo è un pacchetto combinato che contiene versioni +funzionanti di ``mkreiserfs``, ``resize_reiserfs``, ``debugreiserfs`` e +``reiserfsck``. Questi programmi funzionano sulle piattaforme i386 e alpha. + +Xfsprogs +-------- + +L'ultima versione di ``xfsprogs`` contiene, fra i tanti, i programmi +``mkfs.xfs``, ``xfs_db`` e ``xfs_repair`` per il file-system XFS. +Dipendono dell'architettura e qualsiasi versione dalla 2.0.0 in poi +dovrebbe funzionare correttamente con la versione corrente del codice +XFS nel kernel (sono raccomandate le versioni 2.6.0 o successive per via +di importanti miglioramenti). + +PCMCIAutils +----------- + +PCMCIAutils sostituisce ``pcmica-cs``. 
Serve ad impostare correttamente i +connettori PCMCIA all'avvio del sistema e a caricare i moduli necessari per +i dispositivi a 16-bit se il kernel è stato modularizzato e il sottosistema +hotplug è in uso. + +Quota-tools +----------- + +Il supporto per uid e gid a 32 bit richiedono l'uso della versione 2 del +formato quota. La versione 3.07 e successive di quota-tools supportano +questo formato. Usate la versione raccomandata nella lista qui sopra o una +successiva. + +Micro codice per Intel IA32 +--------------------------- + +Per poter aggiornare il micro codice per Intel IA32, è stato aggiunto un +apposito driver; il driver è accessibile come un normale dispositivo a +caratteri (misc). Se non state usando udev probabilmente sarà necessario +eseguire i seguenti comandi come root prima di poterlo aggiornare:: + + mkdir /dev/cpu + mknod /dev/cpu/microcode c 10 184 + chmod 0644 /dev/cpu/microcode + +Probabilmente, vorrete anche il programma microcode_ctl da usare con questo +dispositivo. + +udev +---- + +``udev`` è un programma in spazio utente il cui scopo è quello di popolare +dinamicamente la cartella ``/dev`` coi dispositivi effettivamente presenti. +``udev`` sostituisce le funzionalità base di devfs, consentendo comunque +nomi persistenti per i dispositivi. + +FUSE +---- + +Serve libfuse 2.4.0 o successiva. Il requisito minimo assoluto è 2.3.0 ma +le opzioni di mount ``direct_io`` e ``kernel_cache`` non funzioneranno. + + +Rete +**** + +Cambiamenti generali +-------------------- + +Se per quanto riguarda la configurazione di rete avete esigenze di un certo +livello dovreste prendere in considerazione l'uso degli strumenti in ip-route2. + +Filtro dei pacchetti / NAT +-------------------------- + +Il codice per filtraggio dei pacchetti e il NAT fanno uso degli stessi +strumenti come nelle versioni del kernel antecedenti la 2.4.x (iptables). +Include ancora moduli di compatibilità per 2.2.x ipchains e 2.0.x ipdwadm. + +PPP +--- + +Il driver per PPP è stato ristrutturato per supportare collegamenti multipli e +per funzionare su diversi livelli. Se usate PPP, aggiornate pppd almeno alla +versione 2.4.0. + +Se non usate udev, dovete avere un file /dev/ppp che può essere creato da root +col seguente comando:: + + mknod /dev/ppp c 108 0 + +Isdn4k-utils +------------ + +Per via della modifica del campo per il numero di telefono, il pacchetto +isdn4k-utils dev'essere ricompilato o (preferibilmente) aggiornato. + +NFS-utils +--------- + +Nei kernel più antichi (2.4 e precedenti), il server NFS doveva essere +informato sui clienti ai quali si voleva fornire accesso via NFS. Questa +informazione veniva passata al kernel quando un cliente montava un file-system +mediante ``mountd``, oppure usando ``exportfs`` all'avvio del sistema. +exportfs prende le informazioni circa i clienti attivi da ``/var/lib/nfs/rmtab``. + +Questo approccio è piuttosto delicato perché dipende dalla correttezza di +rmtab, che non è facile da garantire, in particolare quando si cerca di +implementare un *failover*. Anche quando il sistema funziona bene, ``rmtab`` +ha il problema di accumulare vecchie voci inutilizzate. + +Sui kernel più recenti il kernel ha la possibilità di informare mountd quando +arriva una richiesta da una macchina sconosciuta, e mountd può dare al kernel +le informazioni corrette per l'esportazione. Questo rimuove la dipendenza con +``rmtab`` e significa che il kernel deve essere al corrente solo dei clienti +attivi. 
+ +Per attivare questa funzionalità, dovete eseguire il seguente comando prima di +usare exportfs o mountd:: + + mount -t nfsd nfsd /proc/fs/nfsd + +Dove possibile, raccomandiamo di proteggere tutti i servizi NFS dall'accesso +via internet mediante un firewall. + +mcelog +------ + +Quando ``CONFIG_x86_MCE`` è attivo, il programma mcelog processa e registra +gli eventi *machine check*. Gli eventi *machine check* sono errori riportati +dalla CPU. Incoraggiamo l'analisi di questi errori. + + +Documentazione del kernel +************************* + +Sphinx +------ + +Per i dettaglio sui requisiti di Sphinx, fate riferimento a :ref:`it_sphinx_install` +in :ref:`Documentation/translations/it_IT/doc-guide/sphinx.rst ` + +Ottenere software aggiornato +============================ + +Compilazione del kernel +*********************** + +gcc +--- + +- + +Make +---- + +- + +Binutils +-------- + +- + +Flex +---- + +- + +Bison +----- + +- + +OpenSSL +------- + +- + +Strumenti di sistema +******************** + +Util-linux +---------- + +- + +Kmod +---- + +- +- + +Ksymoops +-------- + +- + +Mkinitrd +-------- + +- + +E2fsprogs +--------- + +- + +JFSutils +-------- + +- + +Reiserfsprogs +------------- + +- + +Xfsprogs +-------- + +- + +Pcmciautils +----------- + +- + +Quota-tools +----------- + +- + + +Microcodice Intel P6 +-------------------- + +- + +udev +---- + +- + +FUSE +---- + +- + +mcelog +------ + +- + +Rete +**** + +PPP +--- + +- + +Isdn4k-utils +------------ + +- + +NFS-utils +--------- + +- + +Iptables +-------- + +- + +Ip-route2 +--------- + +- + +OProfile +-------- + +- + +NFS-Utils +--------- + +- + +Documentazione del kernel +************************* + +Sphinx +------ + +- diff --git a/Documentation/translations/it_IT/process/stable-api-nonsense.rst b/Documentation/translations/it_IT/process/stable-api-nonsense.rst index d4fa4abf8dd3..cdfc509b8566 100644 --- a/Documentation/translations/it_IT/process/stable-api-nonsense.rst +++ b/Documentation/translations/it_IT/process/stable-api-nonsense.rst @@ -1,13 +1,209 @@ .. include:: ../disclaimer-ita.rst :Original: :ref:`Documentation/process/stable-api-nonsense.rst ` - +:Translator: Federico Vaga .. _it_stable_api_nonsense: L'interfaccia dei driver per il kernel Linux ============================================ -.. warning:: +(tutte le risposte alle vostre domande e altro) + +Greg Kroah-Hartman + +Questo è stato scritto per cercare di spiegare perché Linux **non ha +un'interfaccia binaria, e non ha nemmeno un'interfaccia stabile**. + +.. note:: + + Questo articolo parla di interfacce **interne al kernel**, non delle + interfacce verso lo spazio utente. + + L'interfaccia del kernel verso lo spazio utente è quella usata dai + programmi, ovvero le chiamate di sistema. Queste interfacce sono **molto** + stabili nel tempo e non verranno modificate. Ho vecchi programmi che sono + stati compilati su un kernel 0.9 (circa) e tuttora funzionano sulle versioni + 2.6 del kernel. Queste interfacce sono quelle che gli utenti e i + programmatori possono considerare stabili. + +Riepilogo generale +------------------ + +Pensate di volere un'interfaccia del kernel stabile, ma in realtà non la +volete, e nemmeno sapete di non volerla. Quello che volete è un driver +stabile che funzioni, e questo può essere ottenuto solo se il driver si trova +nei sorgenti del kernel. 
Ci sono altri vantaggi nell'avere il proprio driver +nei sorgenti del kernel, ognuno dei quali hanno reso Linux un sistema operativo +robusto, stabile e maturo; questi sono anche i motivi per cui avete scelto +Linux. + +Introduzione +------------ + +Solo le persone un po' strambe vorrebbero scrivere driver per il kernel con +la costante preoccupazione per i cambiamenti alle interfacce interne. Per il +resto del mondo, queste interfacce sono invisibili o non di particolare +interesse. + +Innanzitutto, non tratterò **alcun** problema legale riguardante codice +chiuso, nascosto, avvolto, blocchi binari, o qualsia altra cosa che descrive +driver che non hanno i propri sorgenti rilasciati con licenza GPL. Per favore +fate riferimento ad un avvocato per qualsiasi questione legale, io sono un +programmatore e perciò qui vi parlerò soltanto delle questioni tecniche (non +per essere superficiali sui problemi legali, sono veri e dovete esserne a +conoscenza in ogni circostanza). + +Dunque, ci sono due tematiche principali: interfacce binarie del kernel e +interfacce stabili nei sorgenti. Ognuna dipende dall'altra, ma discuteremo +prima delle cose binarie per toglierle di mezzo. + +Interfaccia binaria del kernel +------------------------------ + +Supponiamo d'avere un'interfaccia stabile nei sorgenti del kernel, di +conseguenza un'interfaccia binaria dovrebbe essere anche'essa stabile, giusto? +Sbagliato. Prendete in considerazione i seguenti fatti che riguardano il +kernel Linux: + + - A seconda della versione del compilatore C che state utilizzando, diverse + strutture dati del kernel avranno un allineamento diverso, e possibilmente + un modo diverso di includere le funzioni (renderle inline oppure no). + L'organizzazione delle singole funzioni non è poi così importante, ma la + spaziatura (*padding*) nelle strutture dati, invece, lo è. + + - In base alle opzioni che sono state selezionate per generare il kernel, + un certo numero di cose potrebbero succedere: + + - strutture dati differenti potrebbero contenere campi differenti + - alcune funzioni potrebbero non essere implementate (per esempio, + alcuni *lock* spariscono se compilati su sistemi mono-processore) + - la memoria interna del kernel può essere allineata in differenti modi + a seconda delle opzioni di compilazione. + + - Linux funziona su una vasta gamma di architetture di processore. Non esiste + alcuna possibilità che il binario di un driver per un'architettura funzioni + correttamente su un'altra. + +Alcuni di questi problemi possono essere risolti compilando il proprio modulo +con la stessa identica configurazione del kernel, ed usando la stessa versione +del compilatore usato per compilare il kernel. Questo è sufficiente se volete +fornire un modulo per uno specifico rilascio su una specifica distribuzione +Linux. Ma moltiplicate questa singola compilazione per il numero di +distribuzioni Linux e il numero dei rilasci supportati da quest'ultime e vi +troverete rapidamente in un incubo fatto di configurazioni e piattaforme +hardware (differenti processori con differenti opzioni); dunque, anche per il +singolo rilascio di un modulo, dovreste creare differenti versioni dello +stesso. + +Fidatevi, se tenterete questa via, col tempo, diventerete pazzi; l'ho imparato +a mie spese molto tempo fa... 
+ + +Interfaccia stabile nei sorgenti del kernel +------------------------------------------- + +Se parlate con le persone che cercano di mantenere aggiornato un driver per +Linux ma che non si trova nei sorgenti, allora per queste persone l'argomento +sarà "ostico". + +Lo sviluppo del kernel Linux è continuo e viaggia ad un ritmo sostenuto, e non +rallenta mai. Perciò, gli sviluppatori del kernel trovano bachi nelle +interfacce attuali, o trovano modi migliori per fare le cose. Se le trovano, +allora le correggeranno per migliorarle. In questo frangente, i nomi delle +funzioni potrebbero cambiare, le strutture dati potrebbero diventare più grandi +o più piccole, e gli argomenti delle funzioni potrebbero essere ripensati. +Se questo dovesse succedere, nello stesso momento, tutte le istanze dove questa +interfaccia viene utilizzata verranno corrette, garantendo che tutto continui +a funzionare senza problemi. + +Portiamo ad esempio l'interfaccia interna per il sottosistema USB che ha subito +tre ristrutturazioni nel corso della sua vita. Queste ristrutturazioni furono +fatte per risolvere diversi problemi: + + - È stato fatto un cambiamento da un flusso di dati sincrono ad uno + asincrono. Questo ha ridotto la complessità di molti driver e ha + aumentato la capacità di trasmissione di tutti i driver fino a raggiungere + quasi la velocità massima possibile. + - È stato fatto un cambiamento nell'allocazione dei pacchetti da parte del + sottosistema USB per conto dei driver, cosicché ora i driver devono fornire + più informazioni al sottosistema USB al fine di correggere un certo numero + di stalli. + +Questo è completamente l'opposto di quello che succede in alcuni sistemi +operativi proprietari che hanno dovuto mantenere, nel tempo, il supporto alle +vecchie interfacce USB. I nuovi sviluppatori potrebbero usare accidentalmente +le vecchie interfacce e sviluppare codice nel modo sbagliato, portando, di +conseguenza, all'instabilità del sistema. + +In entrambe gli scenari, gli sviluppatori hanno ritenuto che queste importanti +modifiche erano necessarie, e quindi le hanno fatte con qualche sofferenza. +Se Linux avesse assicurato di mantenere stabile l'interfaccia interna, si +sarebbe dovuto procedere alla creazione di una nuova, e quelle vecchie, e +mal funzionanti, avrebbero dovuto ricevere manutenzione, creando lavoro +aggiuntivo per gli sviluppatori del sottosistema USB. Dato che gli +sviluppatori devono dedicare il proprio tempo a questo genere di lavoro, +chiedergli di dedicarne dell'altro, senza benefici, magari gratuitamente, non +è contemplabile. + +Le problematiche relative alla sicurezza sono molto importanti per Linux. +Quando viene trovato un problema di sicurezza viene corretto in breve tempo. +A volte, per prevenire il problema di sicurezza, si sono dovute cambiare +delle interfacce interne al kernel. Quando è successo, allo stesso tempo, +tutti i driver che usavano quelle interfacce sono stati aggiornati, garantendo +la correzione definitiva del problema senza doversi preoccupare di rivederlo +per sbaglio in futuro. Se non si fossero cambiate le interfacce interne, +sarebbe stato impossibile correggere il problema e garantire che non si sarebbe +più ripetuto. + +Nel tempo le interfacce del kernel subiscono qualche ripulita. Se nessuno +sta più usando un'interfaccia, allora questa verrà rimossa. 
Questo permette +al kernel di rimanere il più piccolo possibile, e garantisce che tutte le +potenziali interfacce sono state verificate nel limite del possibile (le +interfacce inutilizzate sono impossibili da verificare). + + +Cosa fare +--------- + +Dunque, se avete un driver per il kernel Linux che non si trova nei sorgenti +principali del kernel, come sviluppatori, cosa dovreste fare? Rilasciare un +file binario del driver per ogni versione del kernel e per ogni distribuzione, +è un incubo; inoltre, tenere il passo con tutti i cambiamenti del kernel è un +brutto lavoro. + +Semplicemente, fate sì che il vostro driver per il kernel venga incluso nei +sorgenti principali (ricordatevi, stiamo parlando di driver rilasciati secondo +una licenza compatibile con la GPL; se il vostro codice non ricade in questa +categoria: buona fortuna, arrangiatevi, siete delle sanguisughe) + +Se il vostro driver è nei sorgenti del kernel e un'interfaccia cambia, il +driver verrà corretto immediatamente dalla persona che l'ha modificata. Questo +garantisce che sia sempre possibile compilare il driver, che funzioni, e tutto +con un minimo sforzo da parte vostra. + +Avere il proprio driver nei sorgenti principali del kernel ha i seguenti +vantaggi: + + - La qualità del driver aumenterà e i costi di manutenzione (per lo + sviluppatore originale) diminuiranno. + - Altri sviluppatori aggiungeranno nuove funzionalità al vostro driver. + - Altri persone troveranno e correggeranno bachi nel vostro driver. + - Altri persone troveranno degli aggiustamenti da fare al vostro driver. + - Altri persone aggiorneranno il driver quando è richiesto da un cambiamento + di un'interfaccia. + - Il driver sarà automaticamente reso disponibile in tutte le distribuzioni + Linux senza dover chiedere a nessuna di queste di aggiungerlo. + +Dato che Linux supporta più dispositivi di qualsiasi altro sistema operativo, +e che girano su molti più tipi di processori di qualsiasi altro sistema +operativo; ciò dimostra che questo modello di sviluppo qualcosa di giusto, +dopo tutto, lo fa :) + + + +------ - TODO ancora da tradurre +Dei ringraziamenti vanno a Randy Dunlap, Andrew Morton, David Brownell, +Hanna Linder, Robert Love, e Nishanth Aravamudan per la loro revisione +e per i loro commenti sulle prime bozze di questo articolo. diff --git a/Documentation/translations/it_IT/process/submit-checklist.rst b/Documentation/translations/it_IT/process/submit-checklist.rst index b6b4dd94a660..70e65a7b3620 100644 --- a/Documentation/translations/it_IT/process/submit-checklist.rst +++ b/Documentation/translations/it_IT/process/submit-checklist.rst @@ -1,12 +1,131 @@ .. include:: ../disclaimer-ita.rst :Original: :ref:`Documentation/process/submit-checklist.rst ` +:Translator: Federico Vaga .. _it_submitchecklist: -Lista delle cose da fare per inviare una modifica al kernel Linux -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Lista delle verifiche da fare prima di inviare una patch per il kernel Linux +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -.. warning:: +Qui troverete una lista di cose che uno sviluppatore dovrebbe fare per +vedere le proprie patch accettate più rapidamente. - TODO ancora da tradurre +Tutti questi punti integrano la documentazione fornita riguardo alla +sottomissione delle patch, in particolare +:ref:`Documentation/translations/it_IT/process/submitting-patches.rst `. 
+ +1) Se state usando delle funzionalità del kernel allora includete (#include) + i file che le dichiarano/definiscono. Non dipendente dal fatto che un file + d'intestazione include anche quelli usati da voi. + +2) Compilazione pulita: + + a) con le opzioni ``CONFIG`` negli stati ``=y``, ``=m`` e ``=n``. Nessun + avviso/errore di ``gcc`` e nessun avviso/errore dal linker. + + b) con ``allnoconfig``, ``allmodconfig`` + + c) quando si usa ``O=builddir`` + +3) Compilare per diverse architetture di processore usando strumenti per + la cross-compilazione o altri. + +4) Una buona architettura per la verifica della cross-compilazione è la ppc64 + perché tende ad usare ``unsigned long`` per le quantità a 64-bit. + +5) Controllate lo stile del codice della vostra patch secondo le direttive + scritte in :ref:`Documentation/translations/it_IT/process/coding-style.rst `. + Prima dell'invio della patch, usate il verificatore di stile + (``script/checkpatch.pl``) per scovare le violazioni più semplici. + Dovreste essere in grado di giustificare tutte le violazioni rimanenti nella + vostra patch. + +6) Le opzioni ``CONFIG``, nuove o modificate, non scombussolano il menu + di configurazione e sono preimpostate come disabilitate a meno che non + soddisfino i criteri descritti in ``Documentation/kbuild/kconfig-language.txt`` + alla punto "Voci di menu: valori predefiniti". + +7) Tutte le nuove opzioni ``Kconfig`` hanno un messaggio di aiuto. + +8) La patch è stata accuratamente revisionata rispetto alle più importanti + configurazioni ``Kconfig``. Questo è molto difficile da fare + correttamente - un buono lavoro di testa sarà utile. + +9) Verificare con sparse. + +10) Usare ``make checkstack`` e ``make namespacecheck`` e correggere tutti i + problemi rilevati. + + .. note:: + + ``checkstack`` non evidenzia esplicitamente i problemi, ma una funzione + che usa più di 512 byte sullo stack è una buona candidata per una + correzione. + +11) Includete commenti :ref:`kernel-doc ` per documentare API + globali del kernel. Usate ``make htmldocs`` o ``make pdfdocs`` per + verificare i commenti :ref:`kernel-doc ` ed eventualmente + correggerli. + +12) La patch è stata verificata con le seguenti opzioni abilitate + contemporaneamente: ``CONFIG_PREEMPT``, ``CONFIG_DEBUG_PREEMPT``, + ``CONFIG_DEBUG_SLAB``, ``CONFIG_DEBUG_PAGEALLOC``, ``CONFIG_DEBUG_MUTEXES``, + ``CONFIG_DEBUG_SPINLOCK``, ``CONFIG_DEBUG_ATOMIC_SLEEP``, + ``CONFIG_PROVE_RCU`` e ``CONFIG_DEBUG_OBJECTS_RCU_HEAD``. + +13) La patch è stata compilata e verificata in esecuzione con, e senza, + le opzioni ``CONFIG_SMP`` e ``CONFIG_PREEMPT``. + +14) Se la patch ha effetti sull'IO dei dischi, eccetera: allora dev'essere + verificata con, e senza, l'opzione ``CONFIG_LBDAF``. + +15) Tutti i percorsi del codice sono stati verificati con tutte le funzionalità + di lockdep abilitate. + +16) Tutti i nuovi elementi in ``/proc`` sono documentati in ``Documentation/``. + +17) Tutti i nuovi parametri d'avvio del kernel sono documentati in + ``Documentation/admin-guide/kernel-parameters.rst``. + +18) Tutti i nuovi parametri dei moduli sono documentati con ``MODULE_PARM_DESC()``. + +19) Tutte le nuove interfacce verso lo spazio utente sono documentate in + ``Documentation/ABI/``. Leggete ``Documentation/ABI/README`` per maggiori + informazioni. Le patch che modificano le interfacce utente dovrebbero + essere inviate in copia anche a linux-api@vger.kernel.org. 
+ +20) Verifica che il kernel passi con successo ``make headers_check`` + +21) La patch è stata verificata con l'iniezione di fallimenti in slab e + nell'allocazione di pagine. Vedere ``Documentation/fault-injection/``. + + Se il nuovo codice è corposo, potrebbe essere opportuno aggiungere + l'iniezione di fallimenti specifici per il sottosistema. + +22) Il nuovo codice è stato compilato con ``gcc -W`` (usate + ``make EXTRA_CFLAGS=-W``). Questo genererà molti avvisi, ma è ottimo + per scovare bachi come "warning: comparison between signed and unsigned". + +23) La patch è stata verificata dopo essere stata inclusa nella serie di patch + -mm; questo al fine di assicurarsi che continui a funzionare assieme a + tutte le altre patch in coda e i vari cambiamenti nei sottosistemi VM, VFS + e altri. + +24) Tutte le barriere di sincronizzazione {per esempio, ``barrier()``, + ``rmb()``, ``wmb()``} devono essere accompagnate da un commento nei + sorgenti che ne spieghi la logica: cosa fanno e perché. + +25) Se la patch aggiunge nuove chiamate ioctl, allora aggiornate + ``Documentation/ioctl/ioctl-number.txt``. + +26) Se il codice che avete modificato dipende o usa una qualsiasi interfaccia o + funzionalità del kernel che è associata a uno dei seguenti simboli + ``Kconfig``, allora verificate che il kernel compili con diverse + configurazioni dove i simboli sono disabilitati e/o ``=m`` (se c'è la + possibilità) [non tutti contemporaneamente, solo diverse combinazioni + casuali]: + + ``CONFIG_SMP``, ``CONFIG_SYSFS``, ``CONFIG_PROC_FS``, ``CONFIG_INPUT``, + ``CONFIG_PCI``, ``CONFIG_BLOCK``, ``CONFIG_PM``, ``CONFIG_MAGIC_SYSRQ``, + ``CONFIG_NET``, ``CONFIG_INET=n`` (ma l'ultimo con ``CONFIG_NET=y``). diff --git a/Documentation/translations/it_IT/process/submitting-drivers.rst b/Documentation/translations/it_IT/process/submitting-drivers.rst index 16df950ef808..dadd77e47613 100644 --- a/Documentation/translations/it_IT/process/submitting-drivers.rst +++ b/Documentation/translations/it_IT/process/submitting-drivers.rst @@ -1,12 +1,16 @@ .. include:: ../disclaimer-ita.rst :Original: :ref:`Documentation/process/submitting-drivers.rst ` +:Translator: Federico Vaga .. _it_submittingdrivers: Sottomettere driver per il kernel Linux ======================================= -.. warning:: +.. note:: - TODO ancora da tradurre + Questo documento è vecchio e negli ultimi anni non è stato più aggiornato; + dovrebbe essere aggiornato, o forse meglio, rimosso. La maggior parte di + quello che viene detto qui può essere trovato anche negli altri documenti + dedicati allo sviluppo. Per questo motivo il documento non verrà tradotto. -- cgit v1.2.3-59-g8ed1b From de19055564c8f8f9d366f8db3395836da0b2176c Mon Sep 17 00:00:00 2001 From: Jeremy Linton Date: Fri, 25 Jan 2019 12:07:00 -0600 Subject: Documentation: Document arm64 kpti control For a while Arm64 has been capable of force enabling or disabling the kpti mitigations. Lets make sure the documentation reflects that. Signed-off-by: Jeremy Linton Reviewed-by: Andre Przywara Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/kernel-parameters.txt | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 2238fb3c7dcf..b251494d1b66 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1983,6 +1983,12 @@ Built with CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y, the default is off. 
+ kpti= [ARM64] Control page table isolation of user + and kernel address spaces. + Default: enabled on cores which need mitigation. + 0: force disabled + 1: force enabled + kvm.ignore_msrs=[KVM] Ignore guest accesses to unhandled MSRs. Default is 0 (don't ignore, but inject #GP) -- cgit v1.2.3-59-g8ed1b From bf7fbeeae6db644ef5995085de2bc5c6121f8c8d Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Fri, 8 Feb 2019 17:02:56 +0100 Subject: module: Cure the MODULE_LICENSE "GPL" vs. "GPL v2" bogosity The original MODULE_LICENSE string for kernel modules licensed under the GPL v2 (only / or later) was simply "GPL", which was - and still is - completely sufficient for the purpose of module loading and checking whether the module is free software or proprietary. In January 2003 this was changed with commit 3344ea3ad4b7 ("[PATCH] MODULE_LICENSE and EXPORT_SYMBOL_GPL support"). This commit can be found in the history git repository which holds the 1:1 import of Linus' bitkeeper repository: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=3344ea3ad4b7c302c846a680dbaeedf96ed45c02 The main intention of the patch was to refuse linking proprietary modules against symbols exported with EXPORT_SYMBOL_GPL() at module load time. As a completely undocumented side effect it also introduced the distinction between "GPL" and "GPL v2" MODULE_LICENSE() strings: * "GPL" [GNU Public License v2 or later] * "GPL v2" [GNU Public License v2] * "GPL and additional rights" [GNU Public License v2 rights and more] * "Dual BSD/GPL" [GNU Public License v2 * or BSD license choice] * "Dual MPL/GPL" [GNU Public License v2 * or Mozilla license choice] This distinction was and still is wrong in several aspects: 1) It broke all modules which were using the "GPL" string in the MODULE_LICENSE() already and were licensed under GPL v2 only. A quick license scan over the tree at that time shows that at least 480 out of 1484 modules have been affected by this change back then. The number is probably way higher as this was just a quick check for clearly identifiable license information. There was exactly ONE instance of a "GPL v2" module license string in the kernel back then - drivers/net/tulip/xircom_tulip_cb.c which otherwise had no license information at all. There is no indication that the change above is in any way related to this driver. The change happened with the 2.4.11 release which was on Oct. 9 2001 - so quite some time before the above commit. Unfortunately there is no trace on the intertubes to any discussion of this. 2) The dual licensed strings became ill defined as well because following the "GPL" vs. "GPL v2" distinction all dual licensed (or additional rights) MODULE_LICENSE strings would either require those dual licensed modules to be licensed under GPL v2 or later or just be unspecified for the dual licensing case. Neither choice is coherent with the GPL distinction. Due to the lack of a proper changelog and no real discussion on the patch submission other than a few implementation details, it's completely unclear why this distinction was introduced at all. Other than the comment in the module header file, no documentation for this exists at all. From a license compliance and license scanning POV this distinction is a total nightmare. As of 5.0-rc2 2873 out of 9200 instances of MODULE_LICENSE() strings are conflicting with the actual license in the source code (either SPDX or license boilerplate/reference).
A comparison between the scan of the history tree and a scan of current Linus tree shows to the extent that the git rename detection over Linus tree grafted with the history tree is halfways complete that almost none of the files which got broken in 2003 have been cleaned up vs. the MODULE_LICENSE string. So subtracting those 480 known instances from the conflicting 2800 of today more than 25% of the module authors got it wrong and it's a high propability that a large portion of the rest just got it right by chance. There is no value for the module loader to convey the detailed license information as the only decision to be made is whether the module is free software or not. The "and additional rights", "BSD" and "MPL" strings are not conclusive license information either. So there is no point in trying to make the GPL part conclusive and exact. As shown above it's already non conclusive for dual licensing and incoherent with a large portion of the module source. As an unintended side effect this distinction causes a major headache for license compliance, license scanners and the ongoing effort to clean up the license mess of the kernel. Therefore remove the well meant, but ill defined, distinction between "GPL" and "GPL v2" and document that: - "GPL" and "GPL v2" both express that the module is licensed under GPLv2 (without a distinction of 'only' and 'or later') and is therefore kernel license compliant. - None of the MODULE_LICENSE strings can be used for expressing or determining the exact license - Their sole purpose is to decide whether the module is free software or not. Add a MODULE_LICENSE subsection to the license rule documentation as well. Signed-off-by: Thomas Gleixner Reviewed-by: Greg Kroah-Hartman Acked-by: Philippe Ombredanne Acked-by: Joe Perches [jc: Did s/merily/merely/ ] Acked-by: Jessica Yu Signed-off-by: Jonathan Corbet --- Documentation/process/license-rules.rst | 62 +++++++++++++++++++++++++++++++++ include/linux/module.h | 18 +++++++++- 2 files changed, 79 insertions(+), 1 deletion(-) diff --git a/Documentation/process/license-rules.rst b/Documentation/process/license-rules.rst index 2bb8c0fc2238..43b6a1ee0193 100644 --- a/Documentation/process/license-rules.rst +++ b/Documentation/process/license-rules.rst @@ -372,3 +372,65 @@ in the LICENSE subdirectories. This is required to allow tool verification (e.g. checkpatch.pl) and to have the licenses ready to read and extract right from the source, which is recommended by various FOSS organizations, e.g. the `FSFE REUSE initiative `_. + +_`MODULE_LICENSE` +----------------- + + Loadable kernel modules also require a MODULE_LICENSE() tag. This tag is + neither a replacement for proper source code license information + (SPDX-License-Identifier) nor in any way relevant for expressing or + determining the exact license under which the source code of the module + is provided. + + The sole purpose of this tag is to provide sufficient information + whether the module is free software or proprietary for the kernel + module loader and for user space tools. + + The valid license strings for MODULE_LICENSE() are: + + ============================= ============================================= + "GPL" Module is licensed under GPL version 2. This + does not express any distinction between + GPL-2.0-only or GPL-2.0-or-later. The exact + license information can only be determined + via the license information in the + corresponding source files. + + "GPL v2" Same as "GPL". It exists for historic + reasons. 
+ + "GPL and additional rights" Historical variant of expressing that the + module source is dual licensed under a + GPL v2 variant and MIT license. Please do + not use in new code. + + "Dual MIT/GPL" The correct way of expressing that the + module is dual licensed under a GPL v2 + variant or MIT license choice. + + "Dual BSD/GPL" The module is dual licensed under a GPL v2 + variant or BSD license choice. The exact + variant of the BSD license can only be + determined via the license information + in the corresponding source files. + + "Dual MPL/GPL" The module is dual licensed under a GPL v2 + variant or Mozilla Public License (MPL) + choice. The exact variant of the MPL + license can only be determined via the + license information in the corresponding + source files. + + "Proprietary" The module is under a proprietary license. + This string is solely for proprietary third + party modules and cannot be used for modules + which have their source code in the kernel + tree. Modules tagged that way are tainting + the kernel with the 'P' flag when loaded and + the kernel module loader refuses to link such + modules against symbols which are exported + with EXPORT_SYMBOL_GPL(). + ============================= ============================================= + + + diff --git a/include/linux/module.h b/include/linux/module.h index 9a21fe3509af..3a2402b8d790 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -172,7 +172,7 @@ extern void cleanup_module(void); * The following license idents are currently accepted as indicating free * software modules * - * "GPL" [GNU Public License v2 or later] + * "GPL" [GNU Public License v2] * "GPL v2" [GNU Public License v2] * "GPL and additional rights" [GNU Public License v2 rights and more] * "Dual BSD/GPL" [GNU Public License v2 @@ -186,6 +186,22 @@ extern void cleanup_module(void); * * "Proprietary" [Non free products] * + * Both "GPL v2" and "GPL" (the latter also in dual licensed strings) are + * merely stating that the module is licensed under the GPL v2, but are not + * telling whether "GPL v2 only" or "GPL v2 or later". The reason why there + * are two variants is a historic and failed attempt to convey more + * information in the MODULE_LICENSE string. For module loading the + * "only/or later" distinction is completely irrelevant and does neither + * replace the proper license identifiers in the corresponding source file + * nor amends them in any way. The sole purpose is to make the + * 'Proprietary' flagging work and to refuse to bind symbols which are + * exported with EXPORT_SYMBOL_GPL when a non free module is loaded. + * + * In the same way "BSD" is not a clear license information. It merely + * states, that the module is licensed under one of the compatible BSD + * license variants. The detailed and correct license information is again + * to be found in the corresponding source files. + * * There are dual licensed components, but when running with Linux it is the * GPL that is relevant so this is a non issue. Similarly LGPL linked with GPL * is a GPL combined work. -- cgit v1.2.3-59-g8ed1b From 9a065fa8f76f1b025c6b595d781050ad32db76bf Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 11 Feb 2019 14:43:23 +0100 Subject: Documentation/DMA-ISA-LPC: fix an incorrect reference AFAIK we never had a isa_virt_to_phys, it always was isa_virt_to_bus. 
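Hedged sketch only (the helper name, the origin of the device and buffer, and the "0 means failure" convention are assumptions, not taken from the document): mapping a driver buffer for an ISA-style DMA transfer through the generic DMA API, as the corrected text below recommends, instead of calling isa_virt_to_bus()::

    #include <linux/dma-mapping.h>

    /*
     * Sketch: obtain a bus address for an ISA DMA transfer via the
     * generic DMA API rather than isa_virt_to_bus().  The device and
     * buffer are assumed to come from the (hypothetical) driver.
     */
    static dma_addr_t example_map_for_isa_dma(struct device *dev, void *buf,
                                              size_t len)
    {
            dma_addr_t bus_addr;

            bus_addr = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
            if (dma_mapping_error(dev, bus_addr))
                    return 0;       /* 0 signals failure in this sketch */

            return bus_addr;        /* usable as the ISA DMA address */
    }

The mapping would later be released with dma_unmap_single(); direction handling and error reporting are simplified here.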
Signed-off-by: Christoph Hellwig Signed-off-by: Jonathan Corbet --- Documentation/DMA-ISA-LPC.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/DMA-ISA-LPC.txt b/Documentation/DMA-ISA-LPC.txt index 8c2b8be6e45b..b1ec7b16c21f 100644 --- a/Documentation/DMA-ISA-LPC.txt +++ b/Documentation/DMA-ISA-LPC.txt @@ -52,8 +52,8 @@ Address translation ------------------- To translate the virtual address to a bus address, use the normal DMA -API. Do _not_ use isa_virt_to_phys() even though it does the same -thing. The reason for this is that the function isa_virt_to_phys() +API. Do _not_ use isa_virt_to_bus() even though it does the same +thing. The reason for this is that the function isa_virt_to_bus() will require a Kconfig dependency to ISA, not just ISA_DMA_API which is really all you need. Remember that even though the DMA controller has its origins in ISA it is used elsewhere. -- cgit v1.2.3-59-g8ed1b From c9389ad814cd0dd32e7d922bc33d5cc4b04784c1 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Sun, 10 Feb 2019 22:30:49 -0800 Subject: Documentation: fix lg-laptop.rst warnings Fix markup warnings by inserting blank lines. Also correct one typo. Documentation/laptops/lg-laptop.rst:2: WARNING: Explicit markup ends without a blank line; unexpected unindent. Documentation/laptops/lg-laptop.rst:16: WARNING: Unexpected indentation. Documentation/laptops/lg-laptop.rst:17: WARNING: Block quote ends without a blank line; unexpected unindent. Signed-off-by: Randy Dunlap Cc: Matan Ziv-Av Acked-by: Andy Shevchenko Signed-off-by: Jonathan Corbet --- Documentation/laptops/lg-laptop.rst | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/Documentation/laptops/lg-laptop.rst b/Documentation/laptops/lg-laptop.rst index e486fe7ddc35..aa503ee9b3bc 100644 --- a/Documentation/laptops/lg-laptop.rst +++ b/Documentation/laptops/lg-laptop.rst @@ -1,4 +1,5 @@ .. SPDX-License-Identifier: GPL-2.0+ + LG Gram laptop extra features ============================= @@ -9,6 +10,7 @@ Hotkeys ------- The following FN keys are ignored by the kernel without this driver: + - FN-F1 (LG control panel) - Generates F15 - FN-F5 (Touchpad toggle) - Generates F13 - FN-F6 (Airplane mode) - Generates RFKILL @@ -16,7 +18,7 @@ The following FN keys are ignored by the kernel without this driver: This key also changes keyboard backlight mode. - FN-F9 (Reader mode) - Generates F14 -The rest of the FN key work without a need for a special driver. +The rest of the FN keys work without a need for a special driver. Reader mode -- cgit v1.2.3-59-g8ed1b From 2c71d305caf9cd5076873c6a57d22ab8cb7c906c Mon Sep 17 00:00:00 2001 From: Jonathan Neuschäfer Date: Sun, 10 Feb 2019 18:12:59 +0100 Subject: docs: process: Remove outdated info about -git patches MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit As can be seen by clicking around the timeline on web.archive.org[1], there were no -git patches/tarballs on kernel.org since release 3.1. 
[1]: https://web.archive.org/web/20111103073843/http://www.kernel.org/ Signed-off-by: Jonathan Neuschäfer Signed-off-by: Jonathan Corbet --- Documentation/process/howto.rst | 9 --------- 1 file changed, 9 deletions(-) diff --git a/Documentation/process/howto.rst b/Documentation/process/howto.rst index bdd2eda81a1c..64deab5cfb3b 100644 --- a/Documentation/process/howto.rst +++ b/Documentation/process/howto.rst @@ -302,15 +302,6 @@ The file :ref:`Documentation/process/stable-kernel-rules.rst Date: Fri, 8 Feb 2019 16:30:38 +0100 Subject: doc:dmaengine: clarify DMA desc. pointer after submission It clarifies that the DMA description pointer returned by `dmaengine_prep_*` function should not be used after submission. Signed-off-by: Federico Vaga Signed-off-by: Jonathan Corbet --- Documentation/driver-api/dmaengine/client.rst | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/Documentation/driver-api/dmaengine/client.rst b/Documentation/driver-api/dmaengine/client.rst index fbbb2831f29f..d728e50105eb 100644 --- a/Documentation/driver-api/dmaengine/client.rst +++ b/Documentation/driver-api/dmaengine/client.rst @@ -168,6 +168,13 @@ The details of these operations are: dmaengine_submit() will not start the DMA operation, it merely adds it to the pending queue. For this, see step 5, dma_async_issue_pending. + .. note:: + + After calling ``dmaengine_submit()`` the submitted transfer descriptor + (``struct dma_async_tx_descriptor``) belongs to the DMA engine. + Consequentially, the client must consider invalid the pointer to that + descriptor. + 5. Issue pending DMA requests and wait for callback notification The transactions in the pending queue can be activated by calling the -- cgit v1.2.3-59-g8ed1b From 32c8966e904bd12d002a113da442839503ff84be Mon Sep 17 00:00:00 2001 From: Frank Rowand Date: Mon, 11 Feb 2019 14:38:04 -0800 Subject: docs: kernel-doc: typo "if ... if" -> "if ... is" "If no *function* if specified" should instead be "If no *function* is specified". Reported-by: Matthew Wilcox Signed-off-by: Frank Rowand Signed-off-by: Jonathan Corbet --- Documentation/doc-guide/kernel-doc.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/doc-guide/kernel-doc.rst b/Documentation/doc-guide/kernel-doc.rst index 6513c16b7e4f..f96059767c8c 100644 --- a/Documentation/doc-guide/kernel-doc.rst +++ b/Documentation/doc-guide/kernel-doc.rst @@ -490,7 +490,7 @@ doc: *title* functions: *[ function ...]* Include documentation for each *function* in *source*. - If no *function* if specified, the documentation for all functions + If no *function* is specified, the documentation for all functions and types in the *source* will be included. Examples:: -- cgit v1.2.3-59-g8ed1b From 44a47f0e3ec213a670c0170bb8ee588551ff1549 Mon Sep 17 00:00:00 2001 From: Nicholas Mc Guire Date: Fri, 15 Feb 2019 08:29:48 +0100 Subject: sysfs.txt: add note on available attribute macros The common cases of attributes wrappers should probably be using the __ATTR_XXX macros to make code more concise and readable but the current sysfs.txt does not point developers to those convenience macros. Further there is no note in sysfs.txt currently explaining why trying to set a sysfs file to mode 0666 will fail respectively revert to 0664. 
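As a sketch of the pattern documented by the patch below, assuming a hypothetical read-write attribute named foo backed by a plain integer: __ATTR_RW(foo) expects functions called foo_show() and foo_store() and applies mode 0644::

    #include <linux/device.h>
    #include <linux/kernel.h>

    /* Hypothetical attribute "foo" backed by an integer value. */
    static int foo_value;

    static ssize_t foo_show(struct device *dev, struct device_attribute *attr,
                            char *buf)
    {
            return sprintf(buf, "%d\n", foo_value);
    }

    static ssize_t foo_store(struct device *dev, struct device_attribute *attr,
                             const char *buf, size_t count)
    {
            if (kstrtoint(buf, 0, &foo_value))
                    return -EINVAL;
            return count;
    }

    /* Equivalent to __ATTR(foo, 0644, foo_show, foo_store). */
    static struct device_attribute dev_attr_foo = __ATTR_RW(foo);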
Signed-off-by: Nicholas Mc Guire Signed-off-by: Jonathan Corbet --- Documentation/filesystems/sysfs.txt | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/Documentation/filesystems/sysfs.txt b/Documentation/filesystems/sysfs.txt index a1426cabcef1..d70c8f112662 100644 --- a/Documentation/filesystems/sysfs.txt +++ b/Documentation/filesystems/sysfs.txt @@ -116,6 +116,27 @@ static struct device_attribute dev_attr_foo = { .store = store_foo, }; +Note as stated in include/linux/kernel.h "OTHER_WRITABLE? Generally +considered a bad idea." so trying to set a sysfs file writable for +everyone will fail reverting to RO mode for "Others". + +For the common cases sysfs.h provides convenience macros to make +defining attributes easier as well as making code more concise and +readable. The above case could be shortened to: + +static struct device_attribute dev_attr_foo = __ATTR_RW(foo); + +the list of helpers available to define your wrapper function is: +__ATTR_RO(name): assumes default name_show and mode 0444 +__ATTR_WO(name): assumes a name_store only and is restricted to mode + 0200 that is root write access only. +__ATTR_RO_MODE(name, mode): fore more restrictive RO access currently + only use case is the EFI System Resource Table + (see drivers/firmware/efi/esrt.c) +__ATTR_RW(name): assumes default name_show, name_store and setting + mode to 0644. +__ATTR_NULL: which sets the name to NULL and is used as end of list + indicator (see: kernel/workqueue.c) Subsystem-Specific Callbacks ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -- cgit v1.2.3-59-g8ed1b From 9d87bbae2d60ebbd1f28a4b5c65162051164c775 Mon Sep 17 00:00:00 2001 From: Alexey Budankov Date: Mon, 11 Feb 2019 16:42:58 +0300 Subject: perf-security: document perf_events/Perf resource control Extend perf-security.rst file with perf_events/Perf resource control section describing RLIMIT_NOFILE and perf_event_mlock_kb settings for performance monitoring user processes. Signed-off-by: Alexey Budankov Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/perf-security.rst | 42 +++++++++++++++++++++++++++++ 1 file changed, 42 insertions(+) diff --git a/Documentation/admin-guide/perf-security.rst b/Documentation/admin-guide/perf-security.rst index f73ebfe9bfe2..bac599e3c55f 100644 --- a/Documentation/admin-guide/perf-security.rst +++ b/Documentation/admin-guide/perf-security.rst @@ -84,6 +84,46 @@ governed by perf_event_paranoid [2]_ setting: locking limit is imposed but ignored for unprivileged processes with CAP_IPC_LOCK capability. +perf_events/Perf resource control +--------------------------------- + +Open file descriptors ++++++++++++++++++++++ + +The perf_events system call API [2]_ allocates file descriptors for every configured +PMU event. Open file descriptors are a per-process accountable resource governed +by the RLIMIT_NOFILE [11]_ limit (ulimit -n), which is usually derived from the login +shell process. When configuring Perf collection for a long list of events on a +large server system, this limit can be easily hit preventing required monitoring +configuration. RLIMIT_NOFILE limit can be increased on per-user basis modifying +content of the limits.conf file [12]_ . Ordinarily, a Perf sampling session +(perf record) requires an amount of open perf_event file descriptors that is not +less than the number of monitored events multiplied by the number of monitored CPUs. 
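A hedged userspace sketch (not part of the patch; the helper name is invented): before configuring a large perf_events session, a tool could raise its soft RLIMIT_NOFILE up to the current hard limit with setrlimit(); raising the hard limit itself still requires the limits.conf change described above::

    /* Raise the soft open-file limit to the hard limit (sketch only). */
    #include <stdio.h>
    #include <sys/resource.h>

    int raise_nofile_soft_limit(void)
    {
            struct rlimit rl;

            if (getrlimit(RLIMIT_NOFILE, &rl))
                    return -1;

            rl.rlim_cur = rl.rlim_max;      /* soft limit -> hard limit */
            if (setrlimit(RLIMIT_NOFILE, &rl))
                    return -1;

            printf("RLIMIT_NOFILE soft limit now %llu\n",
                   (unsigned long long)rl.rlim_cur);
            return 0;
    }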
+ +Memory allocation ++++++++++++++++++ + +The amount of memory available to user processes for capturing performance monitoring +data is governed by the perf_event_mlock_kb [2]_ setting. This perf_event specific +resource setting defines overall per-cpu limits of memory allowed for mapping +by the user processes to execute performance monitoring. The setting essentially +extends the RLIMIT_MEMLOCK [11]_ limit, but only for memory regions mapped specifically +for capturing monitored performance events and related data. + +For example, if a machine has eight cores and perf_event_mlock_kb limit is set +to 516 KiB, then a user process is provided with 516 KiB * 8 = 4128 KiB of memory +above the RLIMIT_MEMLOCK limit (ulimit -l) for perf_event mmap buffers. In particular, +this means that, if the user wants to start two or more performance monitoring +processes, the user is required to manually distribute the available 4128 KiB between the +monitoring processes, for example, using the --mmap-pages Perf record mode option. +Otherwise, the first started performance monitoring process allocates all available +4128 KiB and the other processes will fail to proceed due to the lack of memory. + +RLIMIT_MEMLOCK and perf_event_mlock_kb resource constraints are ignored for +processes with the CAP_IPC_LOCK capability. Thus, perf_events/Perf privileged users +can be provided with memory above the constraints for perf_events/Perf performance +monitoring purpose by providing the Perf executable with CAP_IPC_LOCK capability. + Bibliography ------------ @@ -94,4 +134,6 @@ Bibliography .. [5] ``_ .. [6] ``_ .. [7] ``_ +.. [11] ``_ +.. [12] ``_ -- cgit v1.2.3-59-g8ed1b From 68570ca0b4b5194c7680c5caf9669b6bcbc6153a Mon Sep 17 00:00:00 2001 From: Alexey Budankov Date: Mon, 11 Feb 2019 16:43:54 +0300 Subject: perf-security: document collected perf_events/Perf data categories Document and categorize system and performance data into groups that can be captured by perf_events/Perf and explicitly indicate the group that can contain process sensitive data. Signed-off-by: Alexey Budankov Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/perf-security.rst | 32 +++++++++++++++++++++++++++-- 1 file changed, 30 insertions(+), 2 deletions(-) diff --git a/Documentation/admin-guide/perf-security.rst b/Documentation/admin-guide/perf-security.rst index bac599e3c55f..8772d44a5912 100644 --- a/Documentation/admin-guide/perf-security.rst +++ b/Documentation/admin-guide/perf-security.rst @@ -11,8 +11,34 @@ impose a considerable risk of leaking sensitive data accessed by monitored processes. The data leakage is possible both in scenarios of direct usage of perf_events system call API [2]_ and over data files generated by Perf tool user mode utility (Perf) [3]_ , [4]_ . The risk depends on the nature of data that -perf_events performance monitoring units (PMU) [2]_ collect and expose for -performance analysis. Having that said perf_events/Perf performance monitoring +perf_events performance monitoring units (PMU) [2]_ and Perf collect and expose +for performance analysis. Collected system and performance data may be split into +several categories: + +1. System hardware and software configuration data, for example: a CPU model and + its cache configuration, an amount of available memory and its topology, used + kernel and Perf versions, performance monitoring setup including experiment + time, events configuration, Perf command line parameters, etc. + +2. 
User and kernel module paths and their load addresses with sizes, process and + thread names with their PIDs and TIDs, timestamps for captured hardware and + software events. + +3. Content of kernel software counters (e.g., for context switches, page faults, + CPU migrations), architectural hardware performance counters (PMC) [8]_ and + machine specific registers (MSR) [9]_ that provide execution metrics for + various monitored parts of the system (e.g., memory controller (IMC), interconnect + (QPI/UPI) or peripheral (PCIe) uncore counters) without direct attribution to any + execution context state. + +4. Content of architectural execution context registers (e.g., RIP, RSP, RBP on + x86_64), process user and kernel space memory addresses and data, content of + various architectural MSRs that capture data from this category. + +Data that belong to the fourth category can potentially contain sensitive process +data. If PMUs in some monitoring modes capture values of execution context registers +or data from process memory then access to such monitoring capabilities requires +to be ordered and secured properly. So, perf_events/Perf performance monitoring is the subject for security access control management [5]_ . perf_events/Perf access control @@ -134,6 +160,8 @@ Bibliography .. [5] ``_ .. [6] ``_ .. [7] ``_ +.. [8] ``_ +.. [9] ``_ .. [11] ``_ .. [12] ``_ -- cgit v1.2.3-59-g8ed1b From e152c7b7bf56f4b00262463dcedfd92dd5629b6f Mon Sep 17 00:00:00 2001 From: Alexey Budankov Date: Mon, 11 Feb 2019 16:44:55 +0300 Subject: perf-security: elaborate on perf_events/Perf privileged users Elaborate on possible perf_event/Perf privileged users groups and document steps about creating such groups. Signed-off-by: Alexey Budankov Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/perf-security.rst | 43 +++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+) diff --git a/Documentation/admin-guide/perf-security.rst b/Documentation/admin-guide/perf-security.rst index 8772d44a5912..dccbf2ec0c9e 100644 --- a/Documentation/admin-guide/perf-security.rst +++ b/Documentation/admin-guide/perf-security.rst @@ -73,6 +73,48 @@ enable capturing of additional data required for later performance analysis of monitored processes or a system. For example, CAP_SYSLOG capability permits reading kernel space memory addresses from /proc/kallsyms file. +perf_events/Perf privileged users +--------------------------------- + +Mechanisms of capabilities, privileged capability-dumb files [6]_ and file system +ACLs [10]_ can be used to create a dedicated group of perf_events/Perf privileged +users who are permitted to execute performance monitoring without scope limits. +The following steps can be taken to create such a group of privileged Perf users. + +1. Create perf_users group of privileged Perf users, assign perf_users group to + Perf tool executable and limit access to the executable for other users in the + system who are not in the perf_users group: + +:: + + # groupadd perf_users + # ls -alhF + -rwxr-xr-x 2 root root 11M Oct 19 15:12 perf + # chgrp perf_users perf + # ls -alhF + -rwxr-xr-x 2 root perf_users 11M Oct 19 15:12 perf + # chmod o-rwx perf + # ls -alhF + -rwxr-x--- 2 root perf_users 11M Oct 19 15:12 perf + +2. 
Assign the required capabilities to the Perf tool executable file and enable + members of perf_users group with performance monitoring privileges [6]_ : + +:: + + # setcap "cap_sys_admin,cap_sys_ptrace,cap_syslog=ep" perf + # setcap -v "cap_sys_admin,cap_sys_ptrace,cap_syslog=ep" perf + perf: OK + # getcap perf + perf = cap_sys_ptrace,cap_sys_admin,cap_syslog+ep + +As a result, members of perf_users group are capable of conducting performance +monitoring by using functionality of the configured Perf tool executable that, +when executes, passes perf_events subsystem scope checks. + +This specific access control management is only available to superuser or root +running processes with CAP_SETPCAP, CAP_SETFCAP [6]_ capabilities. + perf_events/Perf unprivileged users ----------------------------------- @@ -162,6 +204,7 @@ Bibliography .. [7] ``_ .. [8] ``_ .. [9] ``_ +.. [10] ``_ .. [11] ``_ .. [12] ``_ -- cgit v1.2.3-59-g8ed1b From e85a198e30e9ffbc8e12986c1a8d29e6f0bc0af9 Mon Sep 17 00:00:00 2001 From: Alexey Budankov Date: Mon, 11 Feb 2019 17:58:24 +0300 Subject: perf-security: wrap paragraphs on 72 columns Implemented formatting of paragraphs to be not wider than 72 columns. Signed-off-by: Alexey Budankov Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/perf-security.rst | 278 +++++++++++++++------------- 1 file changed, 149 insertions(+), 129 deletions(-) diff --git a/Documentation/admin-guide/perf-security.rst b/Documentation/admin-guide/perf-security.rst index dccbf2ec0c9e..72effa7c23b9 100644 --- a/Documentation/admin-guide/perf-security.rst +++ b/Documentation/admin-guide/perf-security.rst @@ -6,84 +6,94 @@ Perf Events and tool security Overview -------- -Usage of Performance Counters for Linux (perf_events) [1]_ , [2]_ , [3]_ can -impose a considerable risk of leaking sensitive data accessed by monitored -processes. The data leakage is possible both in scenarios of direct usage of -perf_events system call API [2]_ and over data files generated by Perf tool user -mode utility (Perf) [3]_ , [4]_ . The risk depends on the nature of data that -perf_events performance monitoring units (PMU) [2]_ and Perf collect and expose -for performance analysis. Collected system and performance data may be split into -several categories: - -1. System hardware and software configuration data, for example: a CPU model and - its cache configuration, an amount of available memory and its topology, used - kernel and Perf versions, performance monitoring setup including experiment - time, events configuration, Perf command line parameters, etc. - -2. User and kernel module paths and their load addresses with sizes, process and - thread names with their PIDs and TIDs, timestamps for captured hardware and - software events. - -3. Content of kernel software counters (e.g., for context switches, page faults, - CPU migrations), architectural hardware performance counters (PMC) [8]_ and - machine specific registers (MSR) [9]_ that provide execution metrics for - various monitored parts of the system (e.g., memory controller (IMC), interconnect - (QPI/UPI) or peripheral (PCIe) uncore counters) without direct attribution to any - execution context state. - -4. Content of architectural execution context registers (e.g., RIP, RSP, RBP on - x86_64), process user and kernel space memory addresses and data, content of - various architectural MSRs that capture data from this category. - -Data that belong to the fourth category can potentially contain sensitive process -data. 
If PMUs in some monitoring modes capture values of execution context registers -or data from process memory then access to such monitoring capabilities requires -to be ordered and secured properly. So, perf_events/Perf performance monitoring -is the subject for security access control management [5]_ . +Usage of Performance Counters for Linux (perf_events) [1]_ , [2]_ , [3]_ +can impose a considerable risk of leaking sensitive data accessed by +monitored processes. The data leakage is possible both in scenarios of +direct usage of perf_events system call API [2]_ and over data files +generated by Perf tool user mode utility (Perf) [3]_ , [4]_ . The risk +depends on the nature of data that perf_events performance monitoring +units (PMU) [2]_ and Perf collect and expose for performance analysis. +Collected system and performance data may be split into several +categories: + +1. System hardware and software configuration data, for example: a CPU + model and its cache configuration, an amount of available memory and + its topology, used kernel and Perf versions, performance monitoring + setup including experiment time, events configuration, Perf command + line parameters, etc. + +2. User and kernel module paths and their load addresses with sizes, + process and thread names with their PIDs and TIDs, timestamps for + captured hardware and software events. + +3. Content of kernel software counters (e.g., for context switches, page + faults, CPU migrations), architectural hardware performance counters + (PMC) [8]_ and machine specific registers (MSR) [9]_ that provide + execution metrics for various monitored parts of the system (e.g., + memory controller (IMC), interconnect (QPI/UPI) or peripheral (PCIe) + uncore counters) without direct attribution to any execution context + state. + +4. Content of architectural execution context registers (e.g., RIP, RSP, + RBP on x86_64), process user and kernel space memory addresses and + data, content of various architectural MSRs that capture data from + this category. + +Data that belong to the fourth category can potentially contain +sensitive process data. If PMUs in some monitoring modes capture values +of execution context registers or data from process memory then access +to such monitoring capabilities requires to be ordered and secured +properly. So, perf_events/Perf performance monitoring is the subject for +security access control management [5]_ . perf_events/Perf access control ------------------------------- -To perform security checks, the Linux implementation splits processes into two -categories [6]_ : a) privileged processes (whose effective user ID is 0, referred -to as superuser or root), and b) unprivileged processes (whose effective UID is -nonzero). Privileged processes bypass all kernel security permission checks so -perf_events performance monitoring is fully available to privileged processes -without access, scope and resource restrictions. - -Unprivileged processes are subject to a full security permission check based on -the process's credentials [5]_ (usually: effective UID, effective GID, and -supplementary group list). - -Linux divides the privileges traditionally associated with superuser into -distinct units, known as capabilities [6]_ , which can be independently enabled -and disabled on per-thread basis for processes and files of unprivileged users. 
- -Unprivileged processes with enabled CAP_SYS_ADMIN capability are treated as -privileged processes with respect to perf_events performance monitoring and -bypass *scope* permissions checks in the kernel. - -Unprivileged processes using perf_events system call API is also subject for -PTRACE_MODE_READ_REALCREDS ptrace access mode check [7]_ , whose outcome -determines whether monitoring is permitted. So unprivileged processes provided -with CAP_SYS_PTRACE capability are effectively permitted to pass the check. - -Other capabilities being granted to unprivileged processes can effectively -enable capturing of additional data required for later performance analysis of -monitored processes or a system. For example, CAP_SYSLOG capability permits -reading kernel space memory addresses from /proc/kallsyms file. +To perform security checks, the Linux implementation splits processes +into two categories [6]_ : a) privileged processes (whose effective user +ID is 0, referred to as superuser or root), and b) unprivileged +processes (whose effective UID is nonzero). Privileged processes bypass +all kernel security permission checks so perf_events performance +monitoring is fully available to privileged processes without access, +scope and resource restrictions. + +Unprivileged processes are subject to a full security permission check +based on the process's credentials [5]_ (usually: effective UID, +effective GID, and supplementary group list). + +Linux divides the privileges traditionally associated with superuser +into distinct units, known as capabilities [6]_ , which can be +independently enabled and disabled on per-thread basis for processes and +files of unprivileged users. + +Unprivileged processes with enabled CAP_SYS_ADMIN capability are treated +as privileged processes with respect to perf_events performance +monitoring and bypass *scope* permissions checks in the kernel. + +Unprivileged processes using perf_events system call API is also subject +for PTRACE_MODE_READ_REALCREDS ptrace access mode check [7]_ , whose +outcome determines whether monitoring is permitted. So unprivileged +processes provided with CAP_SYS_PTRACE capability are effectively +permitted to pass the check. + +Other capabilities being granted to unprivileged processes can +effectively enable capturing of additional data required for later +performance analysis of monitored processes or a system. For example, +CAP_SYSLOG capability permits reading kernel space memory addresses from +/proc/kallsyms file. perf_events/Perf privileged users --------------------------------- -Mechanisms of capabilities, privileged capability-dumb files [6]_ and file system -ACLs [10]_ can be used to create a dedicated group of perf_events/Perf privileged -users who are permitted to execute performance monitoring without scope limits. -The following steps can be taken to create such a group of privileged Perf users. +Mechanisms of capabilities, privileged capability-dumb files [6]_ and +file system ACLs [10]_ can be used to create a dedicated group of +perf_events/Perf privileged users who are permitted to execute +performance monitoring without scope limits. The following steps can be +taken to create such a group of privileged Perf users. -1. Create perf_users group of privileged Perf users, assign perf_users group to - Perf tool executable and limit access to the executable for other users in the - system who are not in the perf_users group: +1. 
Create perf_users group of privileged Perf users, assign perf_users + group to Perf tool executable and limit access to the executable for + other users in the system who are not in the perf_users group: :: @@ -97,8 +107,9 @@ The following steps can be taken to create such a group of privileged Perf users # ls -alhF -rwxr-x--- 2 root perf_users 11M Oct 19 15:12 perf -2. Assign the required capabilities to the Perf tool executable file and enable - members of perf_users group with performance monitoring privileges [6]_ : +2. Assign the required capabilities to the Perf tool executable file and + enable members of perf_users group with performance monitoring + privileges [6]_ : :: @@ -108,49 +119,52 @@ The following steps can be taken to create such a group of privileged Perf users # getcap perf perf = cap_sys_ptrace,cap_sys_admin,cap_syslog+ep -As a result, members of perf_users group are capable of conducting performance -monitoring by using functionality of the configured Perf tool executable that, -when executes, passes perf_events subsystem scope checks. +As a result, members of perf_users group are capable of conducting +performance monitoring by using functionality of the configured Perf +tool executable that, when executes, passes perf_events subsystem scope +checks. -This specific access control management is only available to superuser or root -running processes with CAP_SETPCAP, CAP_SETFCAP [6]_ capabilities. +This specific access control management is only available to superuser +or root running processes with CAP_SETPCAP, CAP_SETFCAP [6]_ +capabilities. perf_events/Perf unprivileged users ----------------------------------- -perf_events/Perf *scope* and *access* control for unprivileged processes is -governed by perf_event_paranoid [2]_ setting: +perf_events/Perf *scope* and *access* control for unprivileged processes +is governed by perf_event_paranoid [2]_ setting: -1: - Impose no *scope* and *access* restrictions on using perf_events performance - monitoring. Per-user per-cpu perf_event_mlock_kb [2]_ locking limit is - ignored when allocating memory buffers for storing performance data. - This is the least secure mode since allowed monitored *scope* is - maximized and no perf_events specific limits are imposed on *resources* - allocated for performance monitoring. + Impose no *scope* and *access* restrictions on using perf_events + performance monitoring. Per-user per-cpu perf_event_mlock_kb [2]_ + locking limit is ignored when allocating memory buffers for storing + performance data. This is the least secure mode since allowed + monitored *scope* is maximized and no perf_events specific limits + are imposed on *resources* allocated for performance monitoring. >=0: *scope* includes per-process and system wide performance monitoring - but excludes raw tracepoints and ftrace function tracepoints monitoring. - CPU and system events happened when executing either in user or - in kernel space can be monitored and captured for later analysis. - Per-user per-cpu perf_event_mlock_kb locking limit is imposed but - ignored for unprivileged processes with CAP_IPC_LOCK [6]_ capability. + but excludes raw tracepoints and ftrace function tracepoints + monitoring. CPU and system events happened when executing either in + user or in kernel space can be monitored and captured for later + analysis. Per-user per-cpu perf_event_mlock_kb locking limit is + imposed but ignored for unprivileged processes with CAP_IPC_LOCK + [6]_ capability. 
>=1: - *scope* includes per-process performance monitoring only and excludes - system wide performance monitoring. CPU and system events happened when - executing either in user or in kernel space can be monitored and - captured for later analysis. Per-user per-cpu perf_event_mlock_kb - locking limit is imposed but ignored for unprivileged processes with - CAP_IPC_LOCK capability. + *scope* includes per-process performance monitoring only and + excludes system wide performance monitoring. CPU and system events + happened when executing either in user or in kernel space can be + monitored and captured for later analysis. Per-user per-cpu + perf_event_mlock_kb locking limit is imposed but ignored for + unprivileged processes with CAP_IPC_LOCK capability. >=2: - *scope* includes per-process performance monitoring only. CPU and system - events happened when executing in user space only can be monitored and - captured for later analysis. Per-user per-cpu perf_event_mlock_kb - locking limit is imposed but ignored for unprivileged processes with - CAP_IPC_LOCK capability. + *scope* includes per-process performance monitoring only. CPU and + system events happened when executing in user space only can be + monitored and captured for later analysis. Per-user per-cpu + perf_event_mlock_kb locking limit is imposed but ignored for + unprivileged processes with CAP_IPC_LOCK capability. perf_events/Perf resource control --------------------------------- @@ -158,39 +172,45 @@ perf_events/Perf resource control Open file descriptors +++++++++++++++++++++ -The perf_events system call API [2]_ allocates file descriptors for every configured -PMU event. Open file descriptors are a per-process accountable resource governed -by the RLIMIT_NOFILE [11]_ limit (ulimit -n), which is usually derived from the login -shell process. When configuring Perf collection for a long list of events on a -large server system, this limit can be easily hit preventing required monitoring -configuration. RLIMIT_NOFILE limit can be increased on per-user basis modifying -content of the limits.conf file [12]_ . Ordinarily, a Perf sampling session -(perf record) requires an amount of open perf_event file descriptors that is not -less than the number of monitored events multiplied by the number of monitored CPUs. +The perf_events system call API [2]_ allocates file descriptors for +every configured PMU event. Open file descriptors are a per-process +accountable resource governed by the RLIMIT_NOFILE [11]_ limit +(ulimit -n), which is usually derived from the login shell process. When +configuring Perf collection for a long list of events on a large server +system, this limit can be easily hit preventing required monitoring +configuration. RLIMIT_NOFILE limit can be increased on per-user basis +modifying content of the limits.conf file [12]_ . Ordinarily, a Perf +sampling session (perf record) requires an amount of open perf_event +file descriptors that is not less than the number of monitored events +multiplied by the number of monitored CPUs. Memory allocation +++++++++++++++++ -The amount of memory available to user processes for capturing performance monitoring -data is governed by the perf_event_mlock_kb [2]_ setting. This perf_event specific -resource setting defines overall per-cpu limits of memory allowed for mapping -by the user processes to execute performance monitoring. 
The setting essentially -extends the RLIMIT_MEMLOCK [11]_ limit, but only for memory regions mapped specifically -for capturing monitored performance events and related data. - -For example, if a machine has eight cores and perf_event_mlock_kb limit is set -to 516 KiB, then a user process is provided with 516 KiB * 8 = 4128 KiB of memory -above the RLIMIT_MEMLOCK limit (ulimit -l) for perf_event mmap buffers. In particular, -this means that, if the user wants to start two or more performance monitoring -processes, the user is required to manually distribute the available 4128 KiB between the -monitoring processes, for example, using the --mmap-pages Perf record mode option. -Otherwise, the first started performance monitoring process allocates all available -4128 KiB and the other processes will fail to proceed due to the lack of memory. - -RLIMIT_MEMLOCK and perf_event_mlock_kb resource constraints are ignored for -processes with the CAP_IPC_LOCK capability. Thus, perf_events/Perf privileged users -can be provided with memory above the constraints for perf_events/Perf performance -monitoring purpose by providing the Perf executable with CAP_IPC_LOCK capability. +The amount of memory available to user processes for capturing +performance monitoring data is governed by the perf_event_mlock_kb [2]_ +setting. This perf_event specific resource setting defines overall +per-cpu limits of memory allowed for mapping by the user processes to +execute performance monitoring. The setting essentially extends the +RLIMIT_MEMLOCK [11]_ limit, but only for memory regions mapped +specifically for capturing monitored performance events and related data. + +For example, if a machine has eight cores and perf_event_mlock_kb limit +is set to 516 KiB, then a user process is provided with 516 KiB * 8 = +4128 KiB of memory above the RLIMIT_MEMLOCK limit (ulimit -l) for +perf_event mmap buffers. In particular, this means that, if the user +wants to start two or more performance monitoring processes, the user is +required to manually distribute the available 4128 KiB between the +monitoring processes, for example, using the --mmap-pages Perf record +mode option. Otherwise, the first started performance monitoring process +allocates all available 4128 KiB and the other processes will fail to +proceed due to the lack of memory. + +RLIMIT_MEMLOCK and perf_event_mlock_kb resource constraints are ignored +for processes with the CAP_IPC_LOCK capability. Thus, perf_events/Perf +privileged users can be provided with memory above the constraints for +perf_events/Perf performance monitoring purpose by providing the Perf +executable with CAP_IPC_LOCK capability. Bibliography ------------ -- cgit v1.2.3-59-g8ed1b From a10c29cd8bce35e938a2a3bd8f7eb425f7755d4d Mon Sep 17 00:00:00 2001 From: Federico Vaga Date: Thu, 21 Feb 2019 21:29:59 +0100 Subject: doc: translations: sync translations 'remove info about -git patches' Synchonise translations: CN, IT, JP, KR commit 2c71d305caf9 ("docs: process: Remove outdated info about -git patches") I can guarantee for the Italian translations, but since we are removing an entire chapter I think I did it right also for the other languages. 
Signed-off-by: Federico Vaga Acked-by: SeongJae Park Signed-off-by: Jonathan Corbet --- Documentation/translations/it_IT/process/howto.rst | 10 ---------- Documentation/translations/ja_JP/howto.rst | 9 --------- Documentation/translations/ko_KR/howto.rst | 8 -------- Documentation/translations/zh_CN/HOWTO | 8 -------- 4 files changed, 35 deletions(-) diff --git a/Documentation/translations/it_IT/process/howto.rst b/Documentation/translations/it_IT/process/howto.rst index f9a44b1af89d..15199aa5b4df 100644 --- a/Documentation/translations/it_IT/process/howto.rst +++ b/Documentation/translations/it_IT/process/howto.rst @@ -313,16 +313,6 @@ Il file Documentation/process/stable-kernel-rules.rst (nei sorgenti) documenta quali tipologie di modifiche sono accettate per i sorgenti -stable, e come avviene il processo di rilascio. -Le modifiche in 4.x -git -~~~~~~~~~~~~~~~~~~~~~~~~ - -Queste sono istantanee quotidiane del kernel di Linus e sono gestite in -una repositorio git (da qui il nome). Queste modifiche sono solitamente -rilasciate giornalmente e rappresentano l'attuale stato dei sorgenti di -Linus. Queste sono da considerarsi più sperimentali di un -rc in quanto -generate automaticamente senza nemmeno aver dato una rapida occhiata -per verificarne lo stato. - Sorgenti dei sottosistemi del kernel e le loro patch ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/Documentation/translations/ja_JP/howto.rst b/Documentation/translations/ja_JP/howto.rst index 2ca9389d5915..4fbf40552728 100644 --- a/Documentation/translations/ja_JP/howto.rst +++ b/Documentation/translations/ja_JP/howto.rst @@ -319,15 +319,6 @@ Documentation/process/stable-kernel-rules.rst ファイルにはどのような 類の変更が -stable ツリーに受け入れ可能か、またリリースプロセスがどう 動くかが記述されています。 -4.x -git パッチ -~~~~~~~~~~~~~~~ - -git リポジトリで管理されているLinus のカーネルツリーの毎日のスナップ -ショットがあります。(だから -git という名前がついています)。これらのパッ -チはおおむね毎日リリースされており、Linus のツリーの現状を表します。こ -れは -rc カーネルと比べて、パッチが大丈夫かどうかも確認しないで自動的 -に生成されるので、より実験的です。 - サブシステム毎のカーネルツリーとパッチ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/Documentation/translations/ko_KR/howto.rst b/Documentation/translations/ko_KR/howto.rst index 528433932589..d8acf255ac5f 100644 --- a/Documentation/translations/ko_KR/howto.rst +++ b/Documentation/translations/ko_KR/howto.rst @@ -302,14 +302,6 @@ Andrew Morton의 글이 있다. 파일은 어떤 종류의 변경들이 -stable 트리로 들어왔는지와 배포 프로세스가 어떻게 진행되는지를 설명한다. -4.x -git 패치들 -~~~~~~~~~~~~~~~ - -git 저장소(그러므로 -git이라는 이름이 붙음)에는 날마다 관리되는 Linus의 -커널 트리의 snapshot 들이 있다. 이 패치들은 일반적으로 날마다 배포되며 -Linus의 트리의 현재 상태를 나타낸다. 이 패치들은 정상적인지 조금도 -살펴보지 않고 자동적으로 생성된 것이므로 -rc 커널들 보다도 더 실험적이다. 
- 서브시스템 커널 트리들과 패치들 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/Documentation/translations/zh_CN/HOWTO b/Documentation/translations/zh_CN/HOWTO index 5f6d09edc9ac..8e7dee80a087 100644 --- a/Documentation/translations/zh_CN/HOWTO +++ b/Documentation/translations/zh_CN/HOWTO @@ -240,14 +240,6 @@ kernel.org网站的pub/linux/kernel/v2.6/目录下找到它。它的开发遵循 版内核接受的修改类型以及发布的流程。 -2.6.x -git补丁集 ----------------- -Linus的内核源码树的每日快照,这个源码树是由git工具管理的(由此得名)。这 -些补丁通常每天更新以反映Linus的源码树的最新状态。它们比-rc版本的内核源码 -树更具试验性质,因为这个补丁集是全自动生成的,没有任何人来确认其是否真正 -健全。 - - 2.6.x -mm补丁集 --------------- 这是由Andrew Morton维护的试验性内核补丁集。Andrew将所有子系统的内核源码 -- cgit v1.2.3-59-g8ed1b From 1c7f86cbceb4536b2490e74533db78fc2e0f0f80 Mon Sep 17 00:00:00 2001 From: Federico Vaga Date: Thu, 21 Feb 2019 21:30:00 +0100 Subject: doc: process: complete removal of info about -git patches The following patch forgot to remove a reference to the -git patches commit 2c71d305caf9 ("docs: process: Remove outdated info about -git patches") This patch complete the removal and update all translations Signed-off-by: Federico Vaga Acked-by: SeongJae Park Signed-off-by: Jonathan Corbet --- Documentation/process/howto.rst | 1 - Documentation/translations/it_IT/process/howto.rst | 1 - Documentation/translations/ja_JP/howto.rst | 1 - Documentation/translations/ko_KR/howto.rst | 1 - Documentation/translations/zh_CN/HOWTO | 1 - 5 files changed, 5 deletions(-) diff --git a/Documentation/process/howto.rst b/Documentation/process/howto.rst index 64deab5cfb3b..f16242bfb962 100644 --- a/Documentation/process/howto.rst +++ b/Documentation/process/howto.rst @@ -237,7 +237,6 @@ branches. These different branches are: - main 4.x kernel tree - 4.x.y -stable kernel tree - - 4.x -git kernel patches - subsystem specific kernel trees and patches - the 4.x -next kernel tree for integration tests diff --git a/Documentation/translations/it_IT/process/howto.rst b/Documentation/translations/it_IT/process/howto.rst index 15199aa5b4df..9903ac7c566b 100644 --- a/Documentation/translations/it_IT/process/howto.rst +++ b/Documentation/translations/it_IT/process/howto.rst @@ -244,7 +244,6 @@ e di molti altri rami per specifici sottosistemi. 
Questi rami sono: - I sorgenti kernel 4.x - I sorgenti stabili del kernel 4.x.y -stable - - Le modifiche in 4.x -git - Sorgenti dei sottosistemi del kernel e le loro modifiche - Il kernel 4.x -next per test d'integrazione diff --git a/Documentation/translations/ja_JP/howto.rst b/Documentation/translations/ja_JP/howto.rst index 4fbf40552728..2621b770a745 100644 --- a/Documentation/translations/ja_JP/howto.rst +++ b/Documentation/translations/ja_JP/howto.rst @@ -256,7 +256,6 @@ Linux カーネルの開発プロセスは現在幾つかの異なるメイン - メインの 4.x カーネルツリー - 4.x.y -stable カーネルツリー - - 4.x -git カーネルパッチ - サブシステム毎のカーネルツリーとパッチ - 統合テストのための 4.x -next カーネルツリー diff --git a/Documentation/translations/ko_KR/howto.rst b/Documentation/translations/ko_KR/howto.rst index d8acf255ac5f..bcd63731b80a 100644 --- a/Documentation/translations/ko_KR/howto.rst +++ b/Documentation/translations/ko_KR/howto.rst @@ -242,7 +242,6 @@ ReST 마크업을 사용하는 문서들은 Documentation/output 에 생성된 - main 4.x 커널 트리 - 4.x.y - 안정된 커널 트리 - - 4.x -git 커널 패치들 - 서브시스템을 위한 커널 트리들과 패치들 - 4.x - 통합 테스트를 위한 next 커널 트리 diff --git a/Documentation/translations/zh_CN/HOWTO b/Documentation/translations/zh_CN/HOWTO index 8e7dee80a087..7a00a8a4eb15 100644 --- a/Documentation/translations/zh_CN/HOWTO +++ b/Documentation/translations/zh_CN/HOWTO @@ -192,7 +192,6 @@ Linux内核代码中包含有大量的文档。这些文档对于学习如何与 些分支包括: - 2.6.x主内核源码树 - 2.6.x.y -stable内核源码树 - - 2.6.x -git内核补丁集 - 2.6.x -mm内核补丁集 - 子系统相关的内核源码树和补丁集 -- cgit v1.2.3-59-g8ed1b From f07fb1088fb16e62b69d62fb3836e4731fb4079b Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Wed, 20 Feb 2019 20:02:33 -0800 Subject: Documentation: fix admin-guide/README.rst minimum gcc version requirement Fix minimum gcc version as specified in Documentation/process/changes.rst. Suggested-by: Matthew Wilcox Signed-off-by: Randy Dunlap Signed-off-by: Jonathan Corbet --- Documentation/admin-guide/README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/admin-guide/README.rst b/Documentation/admin-guide/README.rst index 47e577264198..a582c780c3bd 100644 --- a/Documentation/admin-guide/README.rst +++ b/Documentation/admin-guide/README.rst @@ -251,7 +251,7 @@ Configuring the kernel Compiling the kernel -------------------- - - Make sure you have at least gcc 3.2 available. + - Make sure you have at least gcc 4.6 available. For more information, refer to :ref:`Documentation/process/changes.rst `. Please note that you can still run a.out user programs with this kernel. -- cgit v1.2.3-59-g8ed1b From 61ab9fecaf4fb829d21bbc3ec6679b1505fd61e8 Mon Sep 17 00:00:00 2001 From: Federico Vaga Date: Wed, 20 Feb 2019 23:46:09 +0100 Subject: doc: fix typos in license-rules.rst The patches fixes some typos in process/license-rules.rst Signed-off-by: Federico Vaga Signed-off-by: Jonathan Corbet --- Documentation/process/license-rules.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/process/license-rules.rst b/Documentation/process/license-rules.rst index 43b6a1ee0193..6b09033a8e9e 100644 --- a/Documentation/process/license-rules.rst +++ b/Documentation/process/license-rules.rst @@ -62,7 +62,7 @@ License identifier syntax The SPDX license identifier in kernel files shall be added at the first possible line in a file which can contain a comment. For the majority - or files this is the first line, except for scripts which require the + of files this is the first line, except for scripts which require the '#!PATH_TO_INTERPRETER' in the first line. For those scripts the SPDX identifier goes into the second line. 
@@ -368,7 +368,7 @@ kernel, can be broken down into: All SPDX license identifiers and exceptions must have a corresponding file -in the LICENSE subdirectories. This is required to allow tool +in the LICENSES subdirectories. This is required to allow tool verification (e.g. checkpatch.pl) and to have the licenses ready to read and extract right from the source, which is recommended by various FOSS organizations, e.g. the `FSFE REUSE initiative `_. -- cgit v1.2.3-59-g8ed1b From a5f4cb4288e548ab415bbfebd1105c7b29ba9f8c Mon Sep 17 00:00:00 2001 From: Aurélien Cedeyn Date: Wed, 20 Feb 2019 22:18:34 +0100 Subject: scripts/spdxcheck.py: fix C++ comment style detection MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit With the last commit to support the SuperH boot code files, we have the following regression: $ ./scripts/checkpatch.pl -f <(echo '/* SPDX-License-Identifier: MIT */') WARNING: 'SPDX-License-Identifier: MIT */' is not supported in LICENSES/.. +/* SPDX-License-Identifier: MIT */ total: 0 errors, 1 warnings, 1 lines checked NOTE: For some of the reported defects, checkpatch may be able to mechanically convert to the typical style using --fix or --fix-inplace. /dev/fd/63 has style problems, please review. NOTE: If any of the errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. This is not obvious, but spdxcheck.py is launched in checkpatch.pl with : ... } elsif ($rawline =~ /(SPDX-License-Identifier: .*)/) { my $spdx_license = $1; if (!is_SPDX_License_valid($spdx_license)) { WARN("SPDX_LICENSE_TAG", "'$spdx_license' is not supported in LICENSES/...\n" . \ $herecurr); } ... sub is_SPDX_License_valid { my ($license) = @_; ... my $status = `cd "$root_path"; echo "$license" | python scripts/spdxcheck.py -`; ... } The first chars before 'SPDX-License-Identifier:' are ignored. This commit fixes this regression. Fixes:959b49687838 (scripts/spdxcheck.py: Handle special quotation mark comments) Signed-off-by:Aurélien Cedeyn Signed-off-by: Jonathan Corbet --- scripts/spdxcheck.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/spdxcheck.py b/scripts/spdxcheck.py index 3fb020c2cb7f..4fe392e507fb 100755 --- a/scripts/spdxcheck.py +++ b/scripts/spdxcheck.py @@ -177,7 +177,7 @@ class id_parser(object): continue expr = line.split(':')[1].strip() # Remove trailing comment closure - if line.startswith('/*'): + if line.strip().endswith('*/'): expr = expr.rstrip('*/').strip() # Special case for SH magic boot code files if line.startswith('LIST \"'): -- cgit v1.2.3-59-g8ed1b From 3203561d6d081fa53d3b448d99fb9ffd933b3123 Mon Sep 17 00:00:00 2001 From: George Sofianos Date: Wed, 20 Feb 2019 03:22:45 +0200 Subject: Docs: Correct /proc/stat path The documentation for dynamic ticks in high resolution timers uses an incorrect path for /proc/stat - fix it. Signed-off-by: George Sofianos Signed-off-by: Jonathan Corbet --- Documentation/timers/highres.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/timers/highres.txt b/Documentation/timers/highres.txt index 9d88f67781c2..8f9741592123 100644 --- a/Documentation/timers/highres.txt +++ b/Documentation/timers/highres.txt @@ -231,7 +231,7 @@ in the idle period to make sure that jiffies are up to date and the interrupt handler has not to deal with an eventually stale jiffy value. 
The dynamic tick feature provides statistical values which are exported to -userspace via /proc/stats and can be made available for enhanced power +userspace via /proc/stat and can be made available for enhanced power management control. The implementation leaves room for further development like full tickless -- cgit v1.2.3-59-g8ed1b From d61330c689df2ef7ac76b63be2bd0a8561e47fd9 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Sun, 17 Feb 2019 14:08:36 -0800 Subject: doc: sctp: Merge and clean up rst files The SCTP sections were ending up at the top-level table of contents under the security section when they should have be sections with the SCTP chapters. In addition to correcting the section and subsection headings, this merges the SCTP documents into a single file to organize the chapters more clearly, internally linkifies them, and adds the missing SPDX header. Signed-off-by: Kees Cook Acked-by: Paul Moore Signed-off-by: Jonathan Corbet --- Documentation/security/LSM-sctp.rst | 175 ---------------- Documentation/security/SCTP.rst | 343 ++++++++++++++++++++++++++++++++ Documentation/security/SELinux-sctp.rst | 158 --------------- Documentation/security/index.rst | 3 +- security/selinux/hooks.c | 2 +- 5 files changed, 345 insertions(+), 336 deletions(-) delete mode 100644 Documentation/security/LSM-sctp.rst create mode 100644 Documentation/security/SCTP.rst delete mode 100644 Documentation/security/SELinux-sctp.rst diff --git a/Documentation/security/LSM-sctp.rst b/Documentation/security/LSM-sctp.rst deleted file mode 100644 index 6e5a3925a860..000000000000 --- a/Documentation/security/LSM-sctp.rst +++ /dev/null @@ -1,175 +0,0 @@ -SCTP LSM Support -================ - -For security module support, three SCTP specific hooks have been implemented:: - - security_sctp_assoc_request() - security_sctp_bind_connect() - security_sctp_sk_clone() - -Also the following security hook has been utilised:: - - security_inet_conn_established() - -The usage of these hooks are described below with the SELinux implementation -described in ``Documentation/security/SELinux-sctp.rst`` - - -security_sctp_assoc_request() ------------------------------ -Passes the ``@ep`` and ``@chunk->skb`` of the association INIT packet to the -security module. Returns 0 on success, error on failure. -:: - - @ep - pointer to sctp endpoint structure. - @skb - pointer to skbuff of association packet. - - -security_sctp_bind_connect() ------------------------------ -Passes one or more ipv4/ipv6 addresses to the security module for validation -based on the ``@optname`` that will result in either a bind or connect -service as shown in the permission check tables below. -Returns 0 on success, error on failure. -:: - - @sk - Pointer to sock structure. - @optname - Name of the option to validate. - @address - One or more ipv4 / ipv6 addresses. - @addrlen - The total length of address(s). This is calculated on each - ipv4 or ipv6 address using sizeof(struct sockaddr_in) or - sizeof(struct sockaddr_in6). 
- - ------------------------------------------------------------------ - | BIND Type Checks | - | @optname | @address contains | - |----------------------------|-----------------------------------| - | SCTP_SOCKOPT_BINDX_ADD | One or more ipv4 / ipv6 addresses | - | SCTP_PRIMARY_ADDR | Single ipv4 or ipv6 address | - | SCTP_SET_PEER_PRIMARY_ADDR | Single ipv4 or ipv6 address | - ------------------------------------------------------------------ - - ------------------------------------------------------------------ - | CONNECT Type Checks | - | @optname | @address contains | - |----------------------------|-----------------------------------| - | SCTP_SOCKOPT_CONNECTX | One or more ipv4 / ipv6 addresses | - | SCTP_PARAM_ADD_IP | One or more ipv4 / ipv6 addresses | - | SCTP_SENDMSG_CONNECT | Single ipv4 or ipv6 address | - | SCTP_PARAM_SET_PRIMARY | Single ipv4 or ipv6 address | - ------------------------------------------------------------------ - -A summary of the ``@optname`` entries is as follows:: - - SCTP_SOCKOPT_BINDX_ADD - Allows additional bind addresses to be - associated after (optionally) calling - bind(3). - sctp_bindx(3) adds a set of bind - addresses on a socket. - - SCTP_SOCKOPT_CONNECTX - Allows the allocation of multiple - addresses for reaching a peer - (multi-homed). - sctp_connectx(3) initiates a connection - on an SCTP socket using multiple - destination addresses. - - SCTP_SENDMSG_CONNECT - Initiate a connection that is generated by a - sendmsg(2) or sctp_sendmsg(3) on a new asociation. - - SCTP_PRIMARY_ADDR - Set local primary address. - - SCTP_SET_PEER_PRIMARY_ADDR - Request peer sets address as - association primary. - - SCTP_PARAM_ADD_IP - These are used when Dynamic Address - SCTP_PARAM_SET_PRIMARY - Reconfiguration is enabled as explained below. - - -To support Dynamic Address Reconfiguration the following parameters must be -enabled on both endpoints (or use the appropriate **setsockopt**\(2)):: - - /proc/sys/net/sctp/addip_enable - /proc/sys/net/sctp/addip_noauth_enable - -then the following *_PARAM_*'s are sent to the peer in an -ASCONF chunk when the corresponding ``@optname``'s are present:: - - @optname ASCONF Parameter - ---------- ------------------ - SCTP_SOCKOPT_BINDX_ADD -> SCTP_PARAM_ADD_IP - SCTP_SET_PEER_PRIMARY_ADDR -> SCTP_PARAM_SET_PRIMARY - - -security_sctp_sk_clone() -------------------------- -Called whenever a new socket is created by **accept**\(2) -(i.e. a TCP style socket) or when a socket is 'peeled off' e.g userspace -calls **sctp_peeloff**\(3). -:: - - @ep - pointer to current sctp endpoint structure. - @sk - pointer to current sock structure. - @sk - pointer to new sock structure. - - -security_inet_conn_established() ---------------------------------- -Called when a COOKIE ACK is received:: - - @sk - pointer to sock structure. - @skb - pointer to skbuff of the COOKIE ACK packet. - - -Security Hooks used for Association Establishment -================================================= -The following diagram shows the use of ``security_sctp_bind_connect()``, -``security_sctp_assoc_request()``, ``security_inet_conn_established()`` when -establishing an association. -:: - - SCTP endpoint "A" SCTP endpoint "Z" - ================= ================= - sctp_sf_do_prm_asoc() - Association setup can be initiated - by a connect(2), sctp_connectx(3), - sendmsg(2) or sctp_sendmsg(3). - These will result in a call to - security_sctp_bind_connect() to - initiate an association to - SCTP peer endpoint "Z". 
- INIT ---------------------------------------------> - sctp_sf_do_5_1B_init() - Respond to an INIT chunk. - SCTP peer endpoint "A" is - asking for an association. Call - security_sctp_assoc_request() - to set the peer label if first - association. - If not first association, check - whether allowed, IF so send: - <----------------------------------------------- INIT ACK - | ELSE audit event and silently - | discard the packet. - | - COOKIE ECHO ------------------------------------------> - | - | - | - <------------------------------------------- COOKIE ACK - | | - sctp_sf_do_5_1E_ca | - Call security_inet_conn_established() | - to set the peer label. | - | | - | If SCTP_SOCKET_TCP or peeled off - | socket security_sctp_sk_clone() is - | called to clone the new socket. - | | - ESTABLISHED ESTABLISHED - | | - ------------------------------------------------------------------ - | Association Established | - ------------------------------------------------------------------ - - diff --git a/Documentation/security/SCTP.rst b/Documentation/security/SCTP.rst new file mode 100644 index 000000000000..d903eb97fcf3 --- /dev/null +++ b/Documentation/security/SCTP.rst @@ -0,0 +1,343 @@ +.. SPDX-License-Identifier: GPL-2.0 + +==== +SCTP +==== + +SCTP LSM Support +================ + +Security Hooks +-------------- + +For security module support, three SCTP specific hooks have been implemented:: + + security_sctp_assoc_request() + security_sctp_bind_connect() + security_sctp_sk_clone() + +Also the following security hook has been utilised:: + + security_inet_conn_established() + +The usage of these hooks are described below with the SELinux implementation +described in the `SCTP SELinux Support`_ chapter. + + +security_sctp_assoc_request() +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Passes the ``@ep`` and ``@chunk->skb`` of the association INIT packet to the +security module. Returns 0 on success, error on failure. +:: + + @ep - pointer to sctp endpoint structure. + @skb - pointer to skbuff of association packet. + + +security_sctp_bind_connect() +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Passes one or more ipv4/ipv6 addresses to the security module for validation +based on the ``@optname`` that will result in either a bind or connect +service as shown in the permission check tables below. +Returns 0 on success, error on failure. +:: + + @sk - Pointer to sock structure. + @optname - Name of the option to validate. + @address - One or more ipv4 / ipv6 addresses. + @addrlen - The total length of address(s). This is calculated on each + ipv4 or ipv6 address using sizeof(struct sockaddr_in) or + sizeof(struct sockaddr_in6). 
+ + ------------------------------------------------------------------ + | BIND Type Checks | + | @optname | @address contains | + |----------------------------|-----------------------------------| + | SCTP_SOCKOPT_BINDX_ADD | One or more ipv4 / ipv6 addresses | + | SCTP_PRIMARY_ADDR | Single ipv4 or ipv6 address | + | SCTP_SET_PEER_PRIMARY_ADDR | Single ipv4 or ipv6 address | + ------------------------------------------------------------------ + + ------------------------------------------------------------------ + | CONNECT Type Checks | + | @optname | @address contains | + |----------------------------|-----------------------------------| + | SCTP_SOCKOPT_CONNECTX | One or more ipv4 / ipv6 addresses | + | SCTP_PARAM_ADD_IP | One or more ipv4 / ipv6 addresses | + | SCTP_SENDMSG_CONNECT | Single ipv4 or ipv6 address | + | SCTP_PARAM_SET_PRIMARY | Single ipv4 or ipv6 address | + ------------------------------------------------------------------ + +A summary of the ``@optname`` entries is as follows:: + + SCTP_SOCKOPT_BINDX_ADD - Allows additional bind addresses to be + associated after (optionally) calling + bind(3). + sctp_bindx(3) adds a set of bind + addresses on a socket. + + SCTP_SOCKOPT_CONNECTX - Allows the allocation of multiple + addresses for reaching a peer + (multi-homed). + sctp_connectx(3) initiates a connection + on an SCTP socket using multiple + destination addresses. + + SCTP_SENDMSG_CONNECT - Initiate a connection that is generated by a + sendmsg(2) or sctp_sendmsg(3) on a new asociation. + + SCTP_PRIMARY_ADDR - Set local primary address. + + SCTP_SET_PEER_PRIMARY_ADDR - Request peer sets address as + association primary. + + SCTP_PARAM_ADD_IP - These are used when Dynamic Address + SCTP_PARAM_SET_PRIMARY - Reconfiguration is enabled as explained below. + + +To support Dynamic Address Reconfiguration the following parameters must be +enabled on both endpoints (or use the appropriate **setsockopt**\(2)):: + + /proc/sys/net/sctp/addip_enable + /proc/sys/net/sctp/addip_noauth_enable + +then the following *_PARAM_*'s are sent to the peer in an +ASCONF chunk when the corresponding ``@optname``'s are present:: + + @optname ASCONF Parameter + ---------- ------------------ + SCTP_SOCKOPT_BINDX_ADD -> SCTP_PARAM_ADD_IP + SCTP_SET_PEER_PRIMARY_ADDR -> SCTP_PARAM_SET_PRIMARY + + +security_sctp_sk_clone() +~~~~~~~~~~~~~~~~~~~~~~~~ +Called whenever a new socket is created by **accept**\(2) +(i.e. a TCP style socket) or when a socket is 'peeled off' e.g userspace +calls **sctp_peeloff**\(3). +:: + + @ep - pointer to current sctp endpoint structure. + @sk - pointer to current sock structure. + @sk - pointer to new sock structure. + + +security_inet_conn_established() +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Called when a COOKIE ACK is received:: + + @sk - pointer to sock structure. + @skb - pointer to skbuff of the COOKIE ACK packet. + + +Security Hooks used for Association Establishment +------------------------------------------------- + +The following diagram shows the use of ``security_sctp_bind_connect()``, +``security_sctp_assoc_request()``, ``security_inet_conn_established()`` when +establishing an association. +:: + + SCTP endpoint "A" SCTP endpoint "Z" + ================= ================= + sctp_sf_do_prm_asoc() + Association setup can be initiated + by a connect(2), sctp_connectx(3), + sendmsg(2) or sctp_sendmsg(3). + These will result in a call to + security_sctp_bind_connect() to + initiate an association to + SCTP peer endpoint "Z". 
+ INIT ---------------------------------------------> + sctp_sf_do_5_1B_init() + Respond to an INIT chunk. + SCTP peer endpoint "A" is + asking for an association. Call + security_sctp_assoc_request() + to set the peer label if first + association. + If not first association, check + whether allowed, IF so send: + <----------------------------------------------- INIT ACK + | ELSE audit event and silently + | discard the packet. + | + COOKIE ECHO ------------------------------------------> + | + | + | + <------------------------------------------- COOKIE ACK + | | + sctp_sf_do_5_1E_ca | + Call security_inet_conn_established() | + to set the peer label. | + | | + | If SCTP_SOCKET_TCP or peeled off + | socket security_sctp_sk_clone() is + | called to clone the new socket. + | | + ESTABLISHED ESTABLISHED + | | + ------------------------------------------------------------------ + | Association Established | + ------------------------------------------------------------------ + + +SCTP SELinux Support +==================== + +Security Hooks +-------------- + +The `SCTP LSM Support`_ chapter above describes the following SCTP security +hooks with the SELinux specifics expanded below:: + + security_sctp_assoc_request() + security_sctp_bind_connect() + security_sctp_sk_clone() + security_inet_conn_established() + + +security_sctp_assoc_request() +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Passes the ``@ep`` and ``@chunk->skb`` of the association INIT packet to the +security module. Returns 0 on success, error on failure. +:: + + @ep - pointer to sctp endpoint structure. + @skb - pointer to skbuff of association packet. + +The security module performs the following operations: + IF this is the first association on ``@ep->base.sk``, then set the peer + sid to that in ``@skb``. This will ensure there is only one peer sid + assigned to ``@ep->base.sk`` that may support multiple associations. + + ELSE validate the ``@ep->base.sk peer_sid`` against the ``@skb peer sid`` + to determine whether the association should be allowed or denied. + + Set the sctp ``@ep sid`` to socket's sid (from ``ep->base.sk``) with + MLS portion taken from ``@skb peer sid``. This will be used by SCTP + TCP style sockets and peeled off connections as they cause a new socket + to be generated. + + If IP security options are configured (CIPSO/CALIPSO), then the ip + options are set on the socket. 
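As a purely illustrative aside (not part of the patch above), the three SCTP hooks described in this chapter are registered like any other LSM hooks. The minimal sketch below assumes the LSM registration interface of this era (``LSM_HOOK_INIT`` and ``security_add_hooks``); the module name, function names and bodies are hypothetical::

    #include <linux/init.h>
    #include <linux/kernel.h>
    #include <linux/lsm_hooks.h>
    #include <linux/skbuff.h>
    #include <net/sock.h>
    #include <net/sctp/structs.h>

    /* Hypothetical example module -- names and bodies are illustrative only. */
    static int example_sctp_assoc_request(struct sctp_endpoint *ep,
                                          struct sk_buff *skb)
    {
            /* Inspect the association INIT packet in @skb and the endpoint;
             * returning 0 allows the association, a negative errno denies it. */
            return 0;
    }

    static int example_sctp_bind_connect(struct sock *sk, int optname,
                                         struct sockaddr *address, int addrlen)
    {
            /* Validate the address list for the bind or connect service
             * selected by @optname (see the permission check tables above). */
            return 0;
    }

    static void example_sctp_sk_clone(struct sctp_endpoint *ep,
                                      struct sock *sk, struct sock *newsk)
    {
            /* Copy security state from @ep/@sk to the accept()ed or
             * peeled-off socket @newsk. */
    }

    static struct security_hook_list example_hooks[] __lsm_ro_after_init = {
            LSM_HOOK_INIT(sctp_assoc_request, example_sctp_assoc_request),
            LSM_HOOK_INIT(sctp_bind_connect, example_sctp_bind_connect),
            LSM_HOOK_INIT(sctp_sk_clone, example_sctp_sk_clone),
    };

    static __init int example_lsm_init(void)
    {
            security_add_hooks(example_hooks, ARRAY_SIZE(example_hooks),
                               "example");
            return 0;
    }

A real security module would apply its own policy in each callback; a non-zero return from the request or bind/connect hooks denies the operation.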
+ + +security_sctp_bind_connect() +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Checks permissions required for ipv4/ipv6 addresses based on the ``@optname`` +as follows:: + + ------------------------------------------------------------------ + | BIND Permission Checks | + | @optname | @address contains | + |----------------------------|-----------------------------------| + | SCTP_SOCKOPT_BINDX_ADD | One or more ipv4 / ipv6 addresses | + | SCTP_PRIMARY_ADDR | Single ipv4 or ipv6 address | + | SCTP_SET_PEER_PRIMARY_ADDR | Single ipv4 or ipv6 address | + ------------------------------------------------------------------ + + ------------------------------------------------------------------ + | CONNECT Permission Checks | + | @optname | @address contains | + |----------------------------|-----------------------------------| + | SCTP_SOCKOPT_CONNECTX | One or more ipv4 / ipv6 addresses | + | SCTP_PARAM_ADD_IP | One or more ipv4 / ipv6 addresses | + | SCTP_SENDMSG_CONNECT | Single ipv4 or ipv6 address | + | SCTP_PARAM_SET_PRIMARY | Single ipv4 or ipv6 address | + ------------------------------------------------------------------ + + +`SCTP LSM Support`_ gives a summary of the ``@optname`` +entries and also describes ASCONF chunk processing when Dynamic Address +Reconfiguration is enabled. + + +security_sctp_sk_clone() +~~~~~~~~~~~~~~~~~~~~~~~~ +Called whenever a new socket is created by **accept**\(2) (i.e. a TCP style +socket) or when a socket is 'peeled off' e.g userspace calls +**sctp_peeloff**\(3). ``security_sctp_sk_clone()`` will set the new +sockets sid and peer sid to that contained in the ``@ep sid`` and +``@ep peer sid`` respectively. +:: + + @ep - pointer to current sctp endpoint structure. + @sk - pointer to current sock structure. + @sk - pointer to new sock structure. + + +security_inet_conn_established() +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Called when a COOKIE ACK is received where it sets the connection's peer sid +to that in ``@skb``:: + + @sk - pointer to sock structure. + @skb - pointer to skbuff of the COOKIE ACK packet. + + +Policy Statements +----------------- +The following class and permissions to support SCTP are available within the +kernel:: + + class sctp_socket inherits socket { node_bind } + +whenever the following policy capability is enabled:: + + policycap extended_socket_class; + +SELinux SCTP support adds the ``name_connect`` permission for connecting +to a specific port type and the ``association`` permission that is explained +in the section below. + +If userspace tools have been updated, SCTP will support the ``portcon`` +statement as shown in the following example:: + + portcon sctp 1024-1036 system_u:object_r:sctp_ports_t:s0 + + +SCTP Peer Labeling +------------------ +An SCTP socket will only have one peer label assigned to it. This will be +assigned during the establishment of the first association. Any further +associations on this socket will have their packet peer label compared to +the sockets peer label, and only if they are different will the +``association`` permission be validated. This is validated by checking the +socket peer sid against the received packets peer sid to determine whether +the association should be allowed or denied. + +NOTES: + 1) If peer labeling is not enabled, then the peer context will always be + ``SECINITSID_UNLABELED`` (``unlabeled_t`` in Reference Policy). 
+ + 2) As SCTP can support more than one transport address per endpoint + (multi-homing) on a single socket, it is possible to configure policy + and NetLabel to provide different peer labels for each of these. As the + socket peer label is determined by the first associations transport + address, it is recommended that all peer labels are consistent. + + 3) **getpeercon**\(3) may be used by userspace to retrieve the sockets peer + context. + + 4) While not SCTP specific, be aware when using NetLabel that if a label + is assigned to a specific interface, and that interface 'goes down', + then the NetLabel service will remove the entry. Therefore ensure that + the network startup scripts call **netlabelctl**\(8) to set the required + label (see **netlabel-config**\(8) helper script for details). + + 5) The NetLabel SCTP peer labeling rules apply as discussed in the following + set of posts tagged "netlabel" at: http://www.paul-moore.com/blog/t. + + 6) CIPSO is only supported for IPv4 addressing: ``socket(AF_INET, ...)`` + CALIPSO is only supported for IPv6 addressing: ``socket(AF_INET6, ...)`` + + Note the following when testing CIPSO/CALIPSO: + a) CIPSO will send an ICMP packet if an SCTP packet cannot be + delivered because of an invalid label. + b) CALIPSO does not send an ICMP packet, just silently discards it. + + 7) IPSEC is not supported as RFC 3554 - sctp/ipsec support has not been + implemented in userspace (**racoon**\(8) or **ipsec_pluto**\(8)), + although the kernel supports SCTP/IPSEC. diff --git a/Documentation/security/SELinux-sctp.rst b/Documentation/security/SELinux-sctp.rst deleted file mode 100644 index a332cb1c5334..000000000000 --- a/Documentation/security/SELinux-sctp.rst +++ /dev/null @@ -1,158 +0,0 @@ -SCTP SELinux Support -===================== - -Security Hooks -=============== - -``Documentation/security/LSM-sctp.rst`` describes the following SCTP security -hooks with the SELinux specifics expanded below:: - - security_sctp_assoc_request() - security_sctp_bind_connect() - security_sctp_sk_clone() - security_inet_conn_established() - - -security_sctp_assoc_request() ------------------------------ -Passes the ``@ep`` and ``@chunk->skb`` of the association INIT packet to the -security module. Returns 0 on success, error on failure. -:: - - @ep - pointer to sctp endpoint structure. - @skb - pointer to skbuff of association packet. - -The security module performs the following operations: - IF this is the first association on ``@ep->base.sk``, then set the peer - sid to that in ``@skb``. This will ensure there is only one peer sid - assigned to ``@ep->base.sk`` that may support multiple associations. - - ELSE validate the ``@ep->base.sk peer_sid`` against the ``@skb peer sid`` - to determine whether the association should be allowed or denied. - - Set the sctp ``@ep sid`` to socket's sid (from ``ep->base.sk``) with - MLS portion taken from ``@skb peer sid``. This will be used by SCTP - TCP style sockets and peeled off connections as they cause a new socket - to be generated. - - If IP security options are configured (CIPSO/CALIPSO), then the ip - options are set on the socket. 
- - -security_sctp_bind_connect() ------------------------------ -Checks permissions required for ipv4/ipv6 addresses based on the ``@optname`` -as follows:: - - ------------------------------------------------------------------ - | BIND Permission Checks | - | @optname | @address contains | - |----------------------------|-----------------------------------| - | SCTP_SOCKOPT_BINDX_ADD | One or more ipv4 / ipv6 addresses | - | SCTP_PRIMARY_ADDR | Single ipv4 or ipv6 address | - | SCTP_SET_PEER_PRIMARY_ADDR | Single ipv4 or ipv6 address | - ------------------------------------------------------------------ - - ------------------------------------------------------------------ - | CONNECT Permission Checks | - | @optname | @address contains | - |----------------------------|-----------------------------------| - | SCTP_SOCKOPT_CONNECTX | One or more ipv4 / ipv6 addresses | - | SCTP_PARAM_ADD_IP | One or more ipv4 / ipv6 addresses | - | SCTP_SENDMSG_CONNECT | Single ipv4 or ipv6 address | - | SCTP_PARAM_SET_PRIMARY | Single ipv4 or ipv6 address | - ------------------------------------------------------------------ - - -``Documentation/security/LSM-sctp.rst`` gives a summary of the ``@optname`` -entries and also describes ASCONF chunk processing when Dynamic Address -Reconfiguration is enabled. - - -security_sctp_sk_clone() -------------------------- -Called whenever a new socket is created by **accept**\(2) (i.e. a TCP style -socket) or when a socket is 'peeled off' e.g userspace calls -**sctp_peeloff**\(3). ``security_sctp_sk_clone()`` will set the new -sockets sid and peer sid to that contained in the ``@ep sid`` and -``@ep peer sid`` respectively. -:: - - @ep - pointer to current sctp endpoint structure. - @sk - pointer to current sock structure. - @sk - pointer to new sock structure. - - -security_inet_conn_established() ---------------------------------- -Called when a COOKIE ACK is received where it sets the connection's peer sid -to that in ``@skb``:: - - @sk - pointer to sock structure. - @skb - pointer to skbuff of the COOKIE ACK packet. - - -Policy Statements -================== -The following class and permissions to support SCTP are available within the -kernel:: - - class sctp_socket inherits socket { node_bind } - -whenever the following policy capability is enabled:: - - policycap extended_socket_class; - -SELinux SCTP support adds the ``name_connect`` permission for connecting -to a specific port type and the ``association`` permission that is explained -in the section below. - -If userspace tools have been updated, SCTP will support the ``portcon`` -statement as shown in the following example:: - - portcon sctp 1024-1036 system_u:object_r:sctp_ports_t:s0 - - -SCTP Peer Labeling -=================== -An SCTP socket will only have one peer label assigned to it. This will be -assigned during the establishment of the first association. Any further -associations on this socket will have their packet peer label compared to -the sockets peer label, and only if they are different will the -``association`` permission be validated. This is validated by checking the -socket peer sid against the received packets peer sid to determine whether -the association should be allowed or denied. - -NOTES: - 1) If peer labeling is not enabled, then the peer context will always be - ``SECINITSID_UNLABELED`` (``unlabeled_t`` in Reference Policy). 
- - 2) As SCTP can support more than one transport address per endpoint - (multi-homing) on a single socket, it is possible to configure policy - and NetLabel to provide different peer labels for each of these. As the - socket peer label is determined by the first associations transport - address, it is recommended that all peer labels are consistent. - - 3) **getpeercon**\(3) may be used by userspace to retrieve the sockets peer - context. - - 4) While not SCTP specific, be aware when using NetLabel that if a label - is assigned to a specific interface, and that interface 'goes down', - then the NetLabel service will remove the entry. Therefore ensure that - the network startup scripts call **netlabelctl**\(8) to set the required - label (see **netlabel-config**\(8) helper script for details). - - 5) The NetLabel SCTP peer labeling rules apply as discussed in the following - set of posts tagged "netlabel" at: http://www.paul-moore.com/blog/t. - - 6) CIPSO is only supported for IPv4 addressing: ``socket(AF_INET, ...)`` - CALIPSO is only supported for IPv6 addressing: ``socket(AF_INET6, ...)`` - - Note the following when testing CIPSO/CALIPSO: - a) CIPSO will send an ICMP packet if an SCTP packet cannot be - delivered because of an invalid label. - b) CALIPSO does not send an ICMP packet, just silently discards it. - - 7) IPSEC is not supported as RFC 3554 - sctp/ipsec support has not been - implemented in userspace (**racoon**\(8) or **ipsec_pluto**\(8)), - although the kernel supports SCTP/IPSEC. diff --git a/Documentation/security/index.rst b/Documentation/security/index.rst index 85492bfca530..aad6d92ffe31 100644 --- a/Documentation/security/index.rst +++ b/Documentation/security/index.rst @@ -9,7 +9,6 @@ Security Documentation IMA-templates keys/index LSM - LSM-sctp - SELinux-sctp + SCTP self-protection tpm/index diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index f0e36c3492ba..bb4a8e088f7e 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -4531,7 +4531,7 @@ err_af: } /* This supports connect(2) and SCTP connect services such as sctp_connectx(3) - * and sctp_sendmsg(3) as described in Documentation/security/LSM-sctp.rst + * and sctp_sendmsg(3) as described in Documentation/security/SCTP.rst */ static int selinux_socket_connect_helper(struct socket *sock, struct sockaddr *address, int addrlen) -- cgit v1.2.3-59-g8ed1b From 80fcc98711a3980a8593d8ba4bd9cf9c7646fd60 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Sun, 17 Feb 2019 14:19:01 -0800 Subject: doc: security: Add kern-doc for lsm_hooks.h There is a lot of kern-doc for the LSM internals, but it wasn't visible in the HTML output. This exposes some formatting flaws in lsm_hooks.h that will be fixed in a later series of patches. Signed-off-by: Kees Cook Signed-off-by: Jonathan Corbet --- Documentation/security/LSM.rst | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/Documentation/security/LSM.rst b/Documentation/security/LSM.rst index 8b9ee597e9d0..31d92bc5fdd2 100644 --- a/Documentation/security/LSM.rst +++ b/Documentation/security/LSM.rst @@ -11,4 +11,7 @@ that end users and distros can make a more informed decision about which LSMs suit their requirements. For extensive documentation on the available LSM hook interfaces, please -see ``include/linux/lsm_hooks.h``. +see ``include/linux/lsm_hooks.h`` and associated structures: + +.. 
kernel-doc:: include/linux/lsm_hooks.h + :internal: -- cgit v1.2.3-59-g8ed1b From 19c3fe285cbaa5aa36d60638fc5c37706f2a1c3e Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Tue, 19 Feb 2019 07:27:15 -0800 Subject: docs: Explicitly state that the 'Fixes:' tag shouldn't split lines ...and use a commit with an obnoxiously long summary in the example to make it abundantly clear that keeping the tag on a single line takes priority over wrapping at 75 columns. Without the explicit exemption, one might assume splitting the tag is acceptable, even encouraged, e.g. due to being conditioned by checkpatch's line length warning. Per Stephen's scripts[1] and implied by commit bf4daf12a9fb ("checkpatch: avoid some commit message long line warnings"), splitting the 'Fixes:' tag across multiple lines is a no-no, presumably because parsing multi- line tags is unnecessarily painful. [1] https://lkml.kernel.org/r/20190216183433.71b7cfa7@canb.auug.org.au Cc: Stephen Rothwell Signed-off-by: Sean Christopherson Signed-off-by: Jonathan Corbet --- Documentation/process/submitting-patches.rst | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/Documentation/process/submitting-patches.rst b/Documentation/process/submitting-patches.rst index 30dc00a364e8..be7d1829c3af 100644 --- a/Documentation/process/submitting-patches.rst +++ b/Documentation/process/submitting-patches.rst @@ -182,9 +182,11 @@ change five years from now. If your patch fixes a bug in a specific commit, e.g. you found an issue using ``git bisect``, please use the 'Fixes:' tag with the first 12 characters of -the SHA-1 ID, and the one line summary. For example:: +the SHA-1 ID, and the one line summary. Do not split the tag across multiple +lines, tags are exempt from the "wrap at 75 columns" rule in order to simplify +parsing scripts. For example:: - Fixes: e21d2170f366 ("video: remove unnecessary platform_set_drvdata()") + Fixes: 54a4f0239f2e ("KVM: MMU: make kvm_mmu_zap_page() return the number of pages it actually freed") The following ``git config`` settings can be used to add a pretty format for outputting the above style in the ``git log`` or ``git show`` commands:: -- cgit v1.2.3-59-g8ed1b From d2b008f134b787c067b987c61f949b7768bb8b27 Mon Sep 17 00:00:00 2001 From: Zenghui Yu Date: Wed, 27 Feb 2019 02:22:22 +0800 Subject: Documentation/process/howto: Update for 4.x -> 5.x versioning As linux-5.0 is coming up soon, the howto.rst document can be updated for the new kernel version. Instead of changing all 4.x references to 5.x, this time we git rid of all explicit version numbers and rework some kernel trees' name to keep the docs current and real. Signed-off-by: Zenghui Yu Signed-off-by: Jonathan Corbet --- Documentation/process/howto.rst | 47 +++++++++++++++++++---------------------- 1 file changed, 22 insertions(+), 25 deletions(-) diff --git a/Documentation/process/howto.rst b/Documentation/process/howto.rst index f16242bfb962..ad2b6c852b95 100644 --- a/Documentation/process/howto.rst +++ b/Documentation/process/howto.rst @@ -235,22 +235,21 @@ Linux kernel development process currently consists of a few different main kernel "branches" and lots of different subsystem-specific kernel branches. 
These different branches are: - - main 4.x kernel tree - - 4.x.y -stable kernel tree - - subsystem specific kernel trees and patches - - the 4.x -next kernel tree for integration tests + - Linus's mainline tree + - Various stable trees with multiple major numbers + - Subsystem-specific trees + - linux-next integration testing tree -4.x kernel tree -~~~~~~~~~~~~~~~ +Mainline tree +~~~~~~~~~~~~~ -4.x kernels are maintained by Linus Torvalds, and can be found on -https://kernel.org in the pub/linux/kernel/v4.x/ directory. Its development -process is as follows: +Mainline tree are maintained by Linus Torvalds, and can be found at +https://kernel.org or in the repo. Its development process is as follows: - As soon as a new kernel is released a two weeks window is open, during this period of time maintainers can submit big diffs to Linus, usually the patches that have already been included in the - -next kernel for a few weeks. The preferred way to submit big changes + linux-next for a few weeks. The preferred way to submit big changes is using git (the kernel's source management tool, more information can be found at https://git-scm.com/) but plain patches are also just fine. @@ -277,21 +276,19 @@ mailing list about kernel releases: released according to perceived bug status, not according to a preconceived timeline."* -4.x.y -stable kernel tree -~~~~~~~~~~~~~~~~~~~~~~~~~ +Various stable trees with multiple major numbers +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Kernels with 3-part versions are -stable kernels. They contain relatively small and critical fixes for security problems or significant -regressions discovered in a given 4.x kernel. +regressions discovered in a given major mainline release, with the first +2-part of version number are the same correspondingly. This is the recommended branch for users who want the most recent stable kernel and are not interested in helping test development/experimental versions. -If no 4.x.y kernel is available, then the highest numbered 4.x -kernel is the current stable kernel. - -4.x.y are maintained by the "stable" team , and +Stable trees are maintained by the "stable" team , and are released as needs dictate. The normal release period is approximately two weeks, but it can be longer if there are no pressing problems. A security-related problem, instead, can cause a release to happen almost @@ -301,8 +298,8 @@ The file :ref:`Documentation/process/stable-kernel-rules.rst Date: Mon, 25 Feb 2019 21:23:26 +0100 Subject: docs: driver-api: iio: fix errors in documentation Improve IIO documentation by fixing a few mistakes. Signed-off-by: Tomasz Duszynski Acked-by: Jonathan Cameron Signed-off-by: Jonathan Corbet --- Documentation/driver-api/iio/buffers.rst | 2 +- Documentation/driver-api/iio/core.rst | 6 +++--- Documentation/driver-api/iio/hw-consumer.rst | 2 +- Documentation/driver-api/iio/triggers.rst | 2 +- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/Documentation/driver-api/iio/buffers.rst b/Documentation/driver-api/iio/buffers.rst index 02c99a6bee18..e9036ef9f8f4 100644 --- a/Documentation/driver-api/iio/buffers.rst +++ b/Documentation/driver-api/iio/buffers.rst @@ -26,7 +26,7 @@ IIO buffer setup ================ The meta information associated with a channel reading placed in a buffer is -called a scan element . The important bits configuring scan elements are +called a scan element. 
The important bits configuring scan elements are exposed to userspace applications via the :file:`/sys/bus/iio/iio:device{X}/scan_elements/*` directory. This file contains attributes of the following form: diff --git a/Documentation/driver-api/iio/core.rst b/Documentation/driver-api/iio/core.rst index 9a34ae03b679..b0bc0c028cc5 100644 --- a/Documentation/driver-api/iio/core.rst +++ b/Documentation/driver-api/iio/core.rst @@ -2,8 +2,8 @@ Core elements ============= -The Industrial I/O core offers a unified framework for writing drivers for -many different types of embedded sensors. a standard interface to user space +The Industrial I/O core offers both a unified framework for writing drivers for +many different types of embedded sensors and a standard interface to user space applications manipulating sensors. The implementation can be found under :file:`drivers/iio/industrialio-*` @@ -11,7 +11,7 @@ Industrial I/O Devices ---------------------- * struct :c:type:`iio_dev` - industrial I/O device -* :c:func:`iio_device_alloc()` - alocate an :c:type:`iio_dev` from a driver +* :c:func:`iio_device_alloc()` - allocate an :c:type:`iio_dev` from a driver * :c:func:`iio_device_free()` - free an :c:type:`iio_dev` from a driver * :c:func:`iio_device_register()` - register a device with the IIO subsystem * :c:func:`iio_device_unregister()` - unregister a device from the IIO diff --git a/Documentation/driver-api/iio/hw-consumer.rst b/Documentation/driver-api/iio/hw-consumer.rst index 8facce6a6733..e0fe0b98230e 100644 --- a/Documentation/driver-api/iio/hw-consumer.rst +++ b/Documentation/driver-api/iio/hw-consumer.rst @@ -1,7 +1,7 @@ =========== HW consumer =========== -An IIO device can be directly connected to another device in hardware. in this +An IIO device can be directly connected to another device in hardware. In this case the buffers between IIO provider and IIO consumer are handled by hardware. The Industrial I/O HW consumer offers a way to bond these IIO devices without software buffer for data. The implementation can be found under diff --git a/Documentation/driver-api/iio/triggers.rst b/Documentation/driver-api/iio/triggers.rst index f89d37e7dd82..5c2156de6284 100644 --- a/Documentation/driver-api/iio/triggers.rst +++ b/Documentation/driver-api/iio/triggers.rst @@ -38,7 +38,7 @@ There are two locations in sysfs related to triggers: * :file:`/sys/bus/iio/devices/iio:device{X}/trigger/*`, this directory is created once the device supports a triggered buffer. We can associate a - trigger with our device by writing the trigger's name in the + trigger with our device by writing the trigger's name in the :file:`current_trigger` file. IIO trigger setup -- cgit v1.2.3-59-g8ed1b From 6cd43851f858f4b34909e6d9ef398ba806e4adfd Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Thu, 28 Feb 2019 11:59:32 +0100 Subject: doc: rcu: Suspicious RCU usage is a warning Suspicious RCU usage messages are reported as warnings. Fixes: a5dd63efda3d07b5 ("lockdep: Use "WARNING" tag on lockdep splats") Signed-off-by: Geert Uytterhoeven Reviewed-by: Paul E. McKenney Signed-off-by: Jonathan Corbet --- Documentation/RCU/lockdep-splat.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/Documentation/RCU/lockdep-splat.txt b/Documentation/RCU/lockdep-splat.txt index 238e9f61352f..af24670ab77a 100644 --- a/Documentation/RCU/lockdep-splat.txt +++ b/Documentation/RCU/lockdep-splat.txt @@ -14,9 +14,9 @@ being the real world and all that. 
So let's look at an example RCU lockdep splat from 3.0-rc5, one that has long since been fixed: -=============================== -[ INFO: suspicious RCU usage. ] -------------------------------- +============================= +WARNING: suspicious RCU usage +----------------------------- block/cfq-iosched.c:2776 suspicious rcu_dereference_protected() usage! other info that might help us debug this: -- cgit v1.2.3-59-g8ed1b From 866d65b9d72f117134197712dca9c2569b703365 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Fri, 1 Mar 2019 10:40:52 +0100 Subject: Documentation/locking/lockdep: Drop last two chars of sample states Since the removal of FS_RECLAIM annotations, lockdep states contain four characters, not six. Fixes: e5684bbfc3f03480 ("Documentation/locking/lockdep: Update info about states") Fixes: d92a8cfcb37ecd13 ("locking/lockdep: Rework FS_RECLAIM annotation") Signed-off-by: Geert Uytterhoeven Acked-by: Will Deacon Signed-off-by: Jonathan Corbet --- Documentation/RCU/lockdep-splat.txt | 6 +++--- Documentation/locking/lockdep-design.txt | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/Documentation/RCU/lockdep-splat.txt b/Documentation/RCU/lockdep-splat.txt index af24670ab77a..9c015976b174 100644 --- a/Documentation/RCU/lockdep-splat.txt +++ b/Documentation/RCU/lockdep-splat.txt @@ -24,11 +24,11 @@ other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 3 locks held by scsi_scan_6/1552: - #0: (&shost->scan_mutex){+.+.+.}, at: [] + #0: (&shost->scan_mutex){+.+.}, at: [] scsi_scan_host_selected+0x5a/0x150 - #1: (&eq->sysfs_lock){+.+...}, at: [] + #1: (&eq->sysfs_lock){+.+.}, at: [] elevator_exit+0x22/0x60 - #2: (&(&q->__queue_lock)->rlock){-.-...}, at: [] + #2: (&(&q->__queue_lock)->rlock){-.-.}, at: [] cfq_exit_queue+0x43/0x190 stack backtrace: diff --git a/Documentation/locking/lockdep-design.txt b/Documentation/locking/lockdep-design.txt index 49f58a07ee7b..39fae143c9cb 100644 --- a/Documentation/locking/lockdep-design.txt +++ b/Documentation/locking/lockdep-design.txt @@ -45,10 +45,10 @@ When locking rules are violated, these state bits are presented in the locking error messages, inside curlies. A contrived example: modprobe/2287 is trying to acquire lock: - (&sio_locks[i].lock){-.-...}, at: [] mutex_lock+0x21/0x24 + (&sio_locks[i].lock){-.-.}, at: [] mutex_lock+0x21/0x24 but task is already holding lock: - (&sio_locks[i].lock){-.-...}, at: [] mutex_lock+0x21/0x24 + (&sio_locks[i].lock){-.-.}, at: [] mutex_lock+0x21/0x24 The bit position indicates STATE, STATE-read, for each of the states listed -- cgit v1.2.3-59-g8ed1b From 4064174becc09a5a2385a27c8a6fd40888b0e13c Mon Sep 17 00:00:00 2001 From: Jonathan Corbet Date: Wed, 20 Feb 2019 15:29:36 -0700 Subject: docs: Bring some order to filesystem documentation Documentation/filesystems is, like much of the rest of the kernel's documentation, a jumble of unorganized information. Split the documentation into categories and try to bring some order to the top-level index.rst files. No text changes other than a few section-introductory blurbs; this is all just moving stuff around. 
Cc: linux-fsdevel@vger.kernel.org Cc: Al Viro Signed-off-by: Jonathan Corbet --- Documentation/filesystems/api-summary.rst | 150 ++++++++++++ Documentation/filesystems/index.rst | 394 ++---------------------------- Documentation/filesystems/journalling.rst | 184 ++++++++++++++ Documentation/filesystems/path-lookup.rst | 15 ++ Documentation/filesystems/splice.rst | 22 ++ 5 files changed, 395 insertions(+), 370 deletions(-) create mode 100644 Documentation/filesystems/api-summary.rst create mode 100644 Documentation/filesystems/journalling.rst create mode 100644 Documentation/filesystems/splice.rst diff --git a/Documentation/filesystems/api-summary.rst b/Documentation/filesystems/api-summary.rst new file mode 100644 index 000000000000..aa51ffcfa029 --- /dev/null +++ b/Documentation/filesystems/api-summary.rst @@ -0,0 +1,150 @@ +============================= +Linux Filesystems API summary +============================= + +This section contains API-level documentation, mostly taken from the source +code itself. + +The Linux VFS +============= + +The Filesystem types +-------------------- + +.. kernel-doc:: include/linux/fs.h + :internal: + +The Directory Cache +------------------- + +.. kernel-doc:: fs/dcache.c + :export: + +.. kernel-doc:: include/linux/dcache.h + :internal: + +Inode Handling +-------------- + +.. kernel-doc:: fs/inode.c + :export: + +.. kernel-doc:: fs/bad_inode.c + :export: + +Registration and Superblocks +---------------------------- + +.. kernel-doc:: fs/super.c + :export: + +File Locks +---------- + +.. kernel-doc:: fs/locks.c + :export: + +.. kernel-doc:: fs/locks.c + :internal: + +Other Functions +--------------- + +.. kernel-doc:: fs/mpage.c + :export: + +.. kernel-doc:: fs/namei.c + :export: + +.. kernel-doc:: fs/buffer.c + :export: + +.. kernel-doc:: block/bio.c + :export: + +.. kernel-doc:: fs/seq_file.c + :export: + +.. kernel-doc:: fs/filesystems.c + :export: + +.. kernel-doc:: fs/fs-writeback.c + :export: + +.. kernel-doc:: fs/block_dev.c + :export: + +.. kernel-doc:: fs/anon_inodes.c + :export: + +.. kernel-doc:: fs/attr.c + :export: + +.. kernel-doc:: fs/d_path.c + :export: + +.. kernel-doc:: fs/dax.c + :export: + +.. kernel-doc:: fs/direct-io.c + :export: + +.. kernel-doc:: fs/file_table.c + :export: + +.. kernel-doc:: fs/libfs.c + :export: + +.. kernel-doc:: fs/posix_acl.c + :export: + +.. kernel-doc:: fs/stat.c + :export: + +.. kernel-doc:: fs/sync.c + :export: + +.. kernel-doc:: fs/xattr.c + :export: + +The proc filesystem +=================== + +sysctl interface +---------------- + +.. kernel-doc:: kernel/sysctl.c + :export: + +proc filesystem interface +------------------------- + +.. kernel-doc:: fs/proc/base.c + :internal: + +Events based on file descriptors +================================ + +.. kernel-doc:: fs/eventfd.c + :export: + +The Filesystem for Exporting Kernel Objects +=========================================== + +.. kernel-doc:: fs/sysfs/file.c + :export: + +.. kernel-doc:: fs/sysfs/symlink.c + :export: + +The debugfs filesystem +====================== + +debugfs interface +----------------- + +.. kernel-doc:: fs/debugfs/inode.c + :export: + +.. 
kernel-doc:: fs/debugfs/file.c + :export: diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst index 61d2441b25d5..1131c34d77f6 100644 --- a/Documentation/filesystems/index.rst +++ b/Documentation/filesystems/index.rst @@ -1,389 +1,43 @@ -===================== -Linux Filesystems API -===================== +=============================== +Filesystems in the Linux kernel +=============================== -The Linux VFS -============= +This under-development manual will, some glorious day, provide +comprehensive information on how the Linux virtual filesystem (VFS) layer +works, along with the filesystems that sit below it. For now, what we have +can be found below. -The Filesystem types --------------------- - -.. kernel-doc:: include/linux/fs.h - :internal: - -The Directory Cache -------------------- - -.. kernel-doc:: fs/dcache.c - :export: - -.. kernel-doc:: include/linux/dcache.h - :internal: - -Inode Handling --------------- - -.. kernel-doc:: fs/inode.c - :export: - -.. kernel-doc:: fs/bad_inode.c - :export: - -Registration and Superblocks ----------------------------- - -.. kernel-doc:: fs/super.c - :export: - -File Locks ----------- - -.. kernel-doc:: fs/locks.c - :export: - -.. kernel-doc:: fs/locks.c - :internal: - -Other Functions ---------------- - -.. kernel-doc:: fs/mpage.c - :export: - -.. kernel-doc:: fs/namei.c - :export: - -.. kernel-doc:: fs/buffer.c - :export: - -.. kernel-doc:: block/bio.c - :export: - -.. kernel-doc:: fs/seq_file.c - :export: - -.. kernel-doc:: fs/filesystems.c - :export: - -.. kernel-doc:: fs/fs-writeback.c - :export: - -.. kernel-doc:: fs/block_dev.c - :export: - -.. kernel-doc:: fs/anon_inodes.c - :export: - -.. kernel-doc:: fs/attr.c - :export: - -.. kernel-doc:: fs/d_path.c - :export: - -.. kernel-doc:: fs/dax.c - :export: - -.. kernel-doc:: fs/direct-io.c - :export: - -.. kernel-doc:: fs/file_table.c - :export: - -.. kernel-doc:: fs/libfs.c - :export: - -.. kernel-doc:: fs/posix_acl.c - :export: - -.. kernel-doc:: fs/stat.c - :export: - -.. kernel-doc:: fs/sync.c - :export: - -.. kernel-doc:: fs/xattr.c - :export: - -The proc filesystem -=================== - -sysctl interface ----------------- - -.. kernel-doc:: kernel/sysctl.c - :export: - -proc filesystem interface -------------------------- - -.. kernel-doc:: fs/proc/base.c - :internal: - -Events based on file descriptors -================================ - -.. kernel-doc:: fs/eventfd.c - :export: - -The Filesystem for Exporting Kernel Objects -=========================================== - -.. kernel-doc:: fs/sysfs/file.c - :export: - -.. kernel-doc:: fs/sysfs/symlink.c - :export: - -The debugfs filesystem +Core VFS documentation ====================== -debugfs interface ------------------ +See these manuals for documentation about the VFS layer itself and how its +algorithms work. -.. kernel-doc:: fs/debugfs/inode.c - :export: +.. toctree:: + :maxdepth: 2 -.. kernel-doc:: fs/debugfs/file.c - :export: + path-lookup.rst + api-summary + splice -The Linux Journalling API +Filesystem support layers ========================= -Overview --------- - -Details -~~~~~~~ - -The journalling layer is easy to use. You need to first of all create a -journal_t data structure. There are two calls to do this dependent on -how you decide to allocate the physical media on which the journal -resides. 
The :c:func:`jbd2_journal_init_inode` call is for journals stored in -filesystem inodes, or the :c:func:`jbd2_journal_init_dev` call can be used -for journal stored on a raw device (in a continuous range of blocks). A -journal_t is a typedef for a struct pointer, so when you are finally -finished make sure you call :c:func:`jbd2_journal_destroy` on it to free up -any used kernel memory. - -Once you have got your journal_t object you need to 'mount' or load the -journal file. The journalling layer expects the space for the journal -was already allocated and initialized properly by the userspace tools. -When loading the journal you must call :c:func:`jbd2_journal_load` to process -journal contents. If the client file system detects the journal contents -does not need to be processed (or even need not have valid contents), it -may call :c:func:`jbd2_journal_wipe` to clear the journal contents before -calling :c:func:`jbd2_journal_load`. - -Note that jbd2_journal_wipe(..,0) calls -:c:func:`jbd2_journal_skip_recovery` for you if it detects any outstanding -transactions in the journal and similarly :c:func:`jbd2_journal_load` will -call :c:func:`jbd2_journal_recover` if necessary. I would advise reading -:c:func:`ext4_load_journal` in fs/ext4/super.c for examples on this stage. - -Now you can go ahead and start modifying the underlying filesystem. -Almost. - -You still need to actually journal your filesystem changes, this is done -by wrapping them into transactions. Additionally you also need to wrap -the modification of each of the buffers with calls to the journal layer, -so it knows what the modifications you are actually making are. To do -this use :c:func:`jbd2_journal_start` which returns a transaction handle. - -:c:func:`jbd2_journal_start` and its counterpart :c:func:`jbd2_journal_stop`, -which indicates the end of a transaction are nestable calls, so you can -reenter a transaction if necessary, but remember you must call -:c:func:`jbd2_journal_stop` the same number of times as -:c:func:`jbd2_journal_start` before the transaction is completed (or more -accurately leaves the update phase). Ext4/VFS makes use of this feature to -simplify handling of inode dirtying, quota support, etc. - -Inside each transaction you need to wrap the modifications to the -individual buffers (blocks). Before you start to modify a buffer you -need to call :c:func:`jbd2_journal_get_create_access()` / -:c:func:`jbd2_journal_get_write_access()` / -:c:func:`jbd2_journal_get_undo_access()` as appropriate, this allows the -journalling layer to copy the unmodified -data if it needs to. After all the buffer may be part of a previously -uncommitted transaction. At this point you are at last ready to modify a -buffer, and once you are have done so you need to call -:c:func:`jbd2_journal_dirty_metadata`. Or if you've asked for access to a -buffer you now know is now longer required to be pushed back on the -device you can call :c:func:`jbd2_journal_forget` in much the same way as you -might have used :c:func:`bforget` in the past. - -A :c:func:`jbd2_journal_flush` may be called at any time to commit and -checkpoint all your transactions. - -Then at umount time , in your :c:func:`put_super` you can then call -:c:func:`jbd2_journal_destroy` to clean up your in-core journal object. - -Unfortunately there a couple of ways the journal layer can cause a -deadlock. 
The first thing to note is that each task can only have a -single outstanding transaction at any one time, remember nothing commits -until the outermost :c:func:`jbd2_journal_stop`. This means you must complete -the transaction at the end of each file/inode/address etc. operation you -perform, so that the journalling system isn't re-entered on another -journal. Since transactions can't be nested/batched across differing -journals, and another filesystem other than yours (say ext4) may be -modified in a later syscall. - -The second case to bear in mind is that :c:func:`jbd2_journal_start` can block -if there isn't enough space in the journal for your transaction (based -on the passed nblocks param) - when it blocks it merely(!) needs to wait -for transactions to complete and be committed from other tasks, so -essentially we are waiting for :c:func:`jbd2_journal_stop`. So to avoid -deadlocks you must treat :c:func:`jbd2_journal_start` / -:c:func:`jbd2_journal_stop` as if they were semaphores and include them in -your semaphore ordering rules to prevent -deadlocks. Note that :c:func:`jbd2_journal_extend` has similar blocking -behaviour to :c:func:`jbd2_journal_start` so you can deadlock here just as -easily as on :c:func:`jbd2_journal_start`. - -Try to reserve the right number of blocks the first time. ;-). This will -be the maximum number of blocks you are going to touch in this -transaction. I advise having a look at at least ext4_jbd.h to see the -basis on which ext4 uses to make these decisions. - -Another wriggle to watch out for is your on-disk block allocation -strategy. Why? Because, if you do a delete, you need to ensure you -haven't reused any of the freed blocks until the transaction freeing -these blocks commits. If you reused these blocks and crash happens, -there is no way to restore the contents of the reallocated blocks at the -end of the last fully committed transaction. One simple way of doing -this is to mark blocks as free in internal in-memory block allocation -structures only after the transaction freeing them commits. Ext4 uses -journal commit callback for this purpose. - -With journal commit callbacks you can ask the journalling layer to call -a callback function when the transaction is finally committed to disk, -so that you can do some of your own management. You ask the journalling -layer for calling the callback by simply setting -``journal->j_commit_callback`` function pointer and that function is -called after each transaction commit. You can also use -``transaction->t_private_list`` for attaching entries to a transaction -that need processing when the transaction commits. - -JBD2 also provides a way to block all transaction updates via -:c:func:`jbd2_journal_lock_updates()` / -:c:func:`jbd2_journal_unlock_updates()`. Ext4 uses this when it wants a -window with a clean and stable fs for a moment. E.g. - -:: - - - jbd2_journal_lock_updates() //stop new stuff happening.. - jbd2_journal_flush() // checkpoint everything. - ..do stuff on stable fs - jbd2_journal_unlock_updates() // carry on with filesystem use. - -The opportunities for abuse and DOS attacks with this should be obvious, -if you allow unprivileged userspace to trigger codepaths containing -these calls. - -Summary -~~~~~~~ - -Using the journal is a matter of wrapping the different context changes, -being each mount, each modification (transaction) and each changed -buffer to tell the journalling layer about them. 
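For orientation only, the per-buffer pattern summarised here looks roughly like the sketch below. It assumes a filesystem that already holds a ``journal_t`` and a ``struct buffer_head``; the function name, ``nblocks`` estimate and error handling are hypothetical and not taken from any patch in this series::

  #include <linux/jbd2.h>
  #include <linux/buffer_head.h>
  #include <linux/err.h>

  /*
   * Illustrative only: wrap one metadata buffer update in a transaction.
   * 'journal', 'bh' and 'nblocks' are assumed to have been set up
   * elsewhere by the filesystem.
   */
  static int example_update_metadata(journal_t *journal,
                                     struct buffer_head *bh, int nblocks)
  {
          handle_t *handle;
          int err;

          handle = jbd2_journal_start(journal, nblocks);
          if (IS_ERR(handle))
                  return PTR_ERR(handle);

          /* Declare the intent to modify this buffer... */
          err = jbd2_journal_get_write_access(handle, bh);
          if (err)
                  goto out;

          /* ... change the buffer contents here ... */

          /* ... and record the modification in the transaction. */
          err = jbd2_journal_dirty_metadata(handle, bh);
  out:
          jbd2_journal_stop(handle);
          return err;
  }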
- -Data Types ----------- - -The journalling layer uses typedefs to 'hide' the concrete definitions -of the structures used. As a client of the JBD2 layer you can just rely -on the using the pointer as a magic cookie of some sort. Obviously the -hiding is not enforced as this is 'C'. - -Structures -~~~~~~~~~~ - -.. kernel-doc:: include/linux/jbd2.h - :internal: - -Functions ---------- - -The functions here are split into two groups those that affect a journal -as a whole, and those which are used to manage transactions - -Journal Level -~~~~~~~~~~~~~ - -.. kernel-doc:: fs/jbd2/journal.c - :export: - -.. kernel-doc:: fs/jbd2/recovery.c - :internal: - -Transasction Level -~~~~~~~~~~~~~~~~~~ - -.. kernel-doc:: fs/jbd2/transaction.c - -See also --------- - -`Journaling the Linux ext2fs Filesystem, LinuxExpo 98, Stephen -Tweedie `__ - -`Ext3 Journalling FileSystem, OLS 2000, Dr. Stephen -Tweedie `__ - -splice API -========== - -splice is a method for moving blocks of data around inside the kernel, -without continually transferring them between the kernel and user space. - -.. kernel-doc:: fs/splice.c - -pipes API -========= - -Pipe interfaces are all for in-kernel (builtin image) use. They are not -exported for use by modules. - -.. kernel-doc:: include/linux/pipe_fs_i.h - :internal: - -.. kernel-doc:: fs/pipe.c - -Encryption API -============== - -A library which filesystems can hook into to support transparent -encryption of files and directories. +Documentation for the support code within the filesystem layer for use in +filesystem implementations. .. toctree:: - :maxdepth: 2 - - fscrypt - -Pathname lookup -=============== - - -This write-up is based on three articles published at lwn.net: + :maxdepth: 2 -- Pathname lookup in Linux -- RCU-walk: faster pathname lookup in Linux -- A walk among the symlinks + journalling + fscrypt -Written by Neil Brown with help from Al Viro and Jon Corbet. -It has subsequently been updated to reflect changes in the kernel -including: +Filesystem-specific documentation +================================= -- per-directory parallel name lookup. +Documentation for individual filesystem types can be found here. .. toctree:: :maxdepth: 2 - path-lookup.rst - -binderfs -======== - -.. toctree:: - binderfs.rst diff --git a/Documentation/filesystems/journalling.rst b/Documentation/filesystems/journalling.rst new file mode 100644 index 000000000000..58ce6b395206 --- /dev/null +++ b/Documentation/filesystems/journalling.rst @@ -0,0 +1,184 @@ +The Linux Journalling API +========================= + +Overview +-------- + +Details +~~~~~~~ + +The journalling layer is easy to use. You need to first of all create a +journal_t data structure. There are two calls to do this dependent on +how you decide to allocate the physical media on which the journal +resides. The :c:func:`jbd2_journal_init_inode` call is for journals stored in +filesystem inodes, or the :c:func:`jbd2_journal_init_dev` call can be used +for journal stored on a raw device (in a continuous range of blocks). A +journal_t is a typedef for a struct pointer, so when you are finally +finished make sure you call :c:func:`jbd2_journal_destroy` on it to free up +any used kernel memory. + +Once you have got your journal_t object you need to 'mount' or load the +journal file. The journalling layer expects the space for the journal +was already allocated and initialized properly by the userspace tools. +When loading the journal you must call :c:func:`jbd2_journal_load` to process +journal contents. 
If the client file system detects the journal contents +does not need to be processed (or even need not have valid contents), it +may call :c:func:`jbd2_journal_wipe` to clear the journal contents before +calling :c:func:`jbd2_journal_load`. + +Note that jbd2_journal_wipe(..,0) calls +:c:func:`jbd2_journal_skip_recovery` for you if it detects any outstanding +transactions in the journal and similarly :c:func:`jbd2_journal_load` will +call :c:func:`jbd2_journal_recover` if necessary. I would advise reading +:c:func:`ext4_load_journal` in fs/ext4/super.c for examples on this stage. + +Now you can go ahead and start modifying the underlying filesystem. +Almost. + +You still need to actually journal your filesystem changes, this is done +by wrapping them into transactions. Additionally you also need to wrap +the modification of each of the buffers with calls to the journal layer, +so it knows what the modifications you are actually making are. To do +this use :c:func:`jbd2_journal_start` which returns a transaction handle. + +:c:func:`jbd2_journal_start` and its counterpart :c:func:`jbd2_journal_stop`, +which indicates the end of a transaction are nestable calls, so you can +reenter a transaction if necessary, but remember you must call +:c:func:`jbd2_journal_stop` the same number of times as +:c:func:`jbd2_journal_start` before the transaction is completed (or more +accurately leaves the update phase). Ext4/VFS makes use of this feature to +simplify handling of inode dirtying, quota support, etc. + +Inside each transaction you need to wrap the modifications to the +individual buffers (blocks). Before you start to modify a buffer you +need to call :c:func:`jbd2_journal_get_create_access()` / +:c:func:`jbd2_journal_get_write_access()` / +:c:func:`jbd2_journal_get_undo_access()` as appropriate, this allows the +journalling layer to copy the unmodified +data if it needs to. After all the buffer may be part of a previously +uncommitted transaction. At this point you are at last ready to modify a +buffer, and once you are have done so you need to call +:c:func:`jbd2_journal_dirty_metadata`. Or if you've asked for access to a +buffer you now know is now longer required to be pushed back on the +device you can call :c:func:`jbd2_journal_forget` in much the same way as you +might have used :c:func:`bforget` in the past. + +A :c:func:`jbd2_journal_flush` may be called at any time to commit and +checkpoint all your transactions. + +Then at umount time , in your :c:func:`put_super` you can then call +:c:func:`jbd2_journal_destroy` to clean up your in-core journal object. + +Unfortunately there a couple of ways the journal layer can cause a +deadlock. The first thing to note is that each task can only have a +single outstanding transaction at any one time, remember nothing commits +until the outermost :c:func:`jbd2_journal_stop`. This means you must complete +the transaction at the end of each file/inode/address etc. operation you +perform, so that the journalling system isn't re-entered on another +journal. Since transactions can't be nested/batched across differing +journals, and another filesystem other than yours (say ext4) may be +modified in a later syscall. + +The second case to bear in mind is that :c:func:`jbd2_journal_start` can block +if there isn't enough space in the journal for your transaction (based +on the passed nblocks param) - when it blocks it merely(!) 
needs to wait +for transactions to complete and be committed from other tasks, so +essentially we are waiting for :c:func:`jbd2_journal_stop`. So to avoid +deadlocks you must treat :c:func:`jbd2_journal_start` / +:c:func:`jbd2_journal_stop` as if they were semaphores and include them in +your semaphore ordering rules to prevent +deadlocks. Note that :c:func:`jbd2_journal_extend` has similar blocking +behaviour to :c:func:`jbd2_journal_start` so you can deadlock here just as +easily as on :c:func:`jbd2_journal_start`. + +Try to reserve the right number of blocks the first time. ;-). This will +be the maximum number of blocks you are going to touch in this +transaction. I advise having a look at at least ext4_jbd.h to see the +basis on which ext4 uses to make these decisions. + +Another wriggle to watch out for is your on-disk block allocation +strategy. Why? Because, if you do a delete, you need to ensure you +haven't reused any of the freed blocks until the transaction freeing +these blocks commits. If you reused these blocks and crash happens, +there is no way to restore the contents of the reallocated blocks at the +end of the last fully committed transaction. One simple way of doing +this is to mark blocks as free in internal in-memory block allocation +structures only after the transaction freeing them commits. Ext4 uses +journal commit callback for this purpose. + +With journal commit callbacks you can ask the journalling layer to call +a callback function when the transaction is finally committed to disk, +so that you can do some of your own management. You ask the journalling +layer for calling the callback by simply setting +``journal->j_commit_callback`` function pointer and that function is +called after each transaction commit. You can also use +``transaction->t_private_list`` for attaching entries to a transaction +that need processing when the transaction commits. + +JBD2 also provides a way to block all transaction updates via +:c:func:`jbd2_journal_lock_updates()` / +:c:func:`jbd2_journal_unlock_updates()`. Ext4 uses this when it wants a +window with a clean and stable fs for a moment. E.g. + +:: + + + jbd2_journal_lock_updates() //stop new stuff happening.. + jbd2_journal_flush() // checkpoint everything. + ..do stuff on stable fs + jbd2_journal_unlock_updates() // carry on with filesystem use. + +The opportunities for abuse and DOS attacks with this should be obvious, +if you allow unprivileged userspace to trigger codepaths containing +these calls. + +Summary +~~~~~~~ + +Using the journal is a matter of wrapping the different context changes, +being each mount, each modification (transaction) and each changed +buffer to tell the journalling layer about them. + +Data Types +---------- + +The journalling layer uses typedefs to 'hide' the concrete definitions +of the structures used. As a client of the JBD2 layer you can just rely +on the using the pointer as a magic cookie of some sort. Obviously the +hiding is not enforced as this is 'C'. + +Structures +~~~~~~~~~~ + +.. kernel-doc:: include/linux/jbd2.h + :internal: + +Functions +--------- + +The functions here are split into two groups those that affect a journal +as a whole, and those which are used to manage transactions + +Journal Level +~~~~~~~~~~~~~ + +.. kernel-doc:: fs/jbd2/journal.c + :export: + +.. kernel-doc:: fs/jbd2/recovery.c + :internal: + +Transasction Level +~~~~~~~~~~~~~~~~~~ + +.. 
kernel-doc:: fs/jbd2/transaction.c + +See also +-------- + +`Journaling the Linux ext2fs Filesystem, LinuxExpo 98, Stephen +Tweedie `__ + +`Ext3 Journalling FileSystem, OLS 2000, Dr. Stephen +Tweedie `__ + diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst index 80e22eda4132..434a07b0002b 100644 --- a/Documentation/filesystems/path-lookup.rst +++ b/Documentation/filesystems/path-lookup.rst @@ -1,3 +1,18 @@ +=============== +Pathname lookup +=============== + +This write-up is based on three articles published at lwn.net: + +- Pathname lookup in Linux +- RCU-walk: faster pathname lookup in Linux +- A walk among the symlinks + +Written by Neil Brown with help from Al Viro and Jon Corbet. +It has subsequently been updated to reflect changes in the kernel +including: + +- per-directory parallel name lookup. Introduction to pathname lookup =============================== diff --git a/Documentation/filesystems/splice.rst b/Documentation/filesystems/splice.rst new file mode 100644 index 000000000000..edd874808472 --- /dev/null +++ b/Documentation/filesystems/splice.rst @@ -0,0 +1,22 @@ +================ +splice and pipes +================ + +splice API +========== + +splice is a method for moving blocks of data around inside the kernel, +without continually transferring them between the kernel and user space. + +.. kernel-doc:: fs/splice.c + +pipes API +========= + +Pipe interfaces are all for in-kernel (builtin image) use. They are not +exported for use by modules. + +.. kernel-doc:: include/linux/pipe_fs_i.h + :internal: + +.. kernel-doc:: fs/pipe.c -- cgit v1.2.3-59-g8ed1b
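For readers unfamiliar with splice, the same idea is visible from userspace through the **splice**\(2) system call, which moves data between a file descriptor and a pipe without copying it through a userspace buffer. The sketch below is illustrative only (the input path is hypothetical) and is not part of the in-kernel API documented above::

  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  /*
   * Illustrative only: splice up to 4096 bytes from a file into a pipe,
   * so the data moves between the two descriptors inside the kernel.
   */
  int main(void)
  {
          int pipefd[2];
          int in = open("/etc/hostname", O_RDONLY);   /* hypothetical input */

          if (in < 0 || pipe(pipefd) < 0) {
                  perror("setup");
                  return 1;
          }

          ssize_t n = splice(in, NULL, pipefd[1], NULL, 4096, 0);
          if (n < 0) {
                  perror("splice");
                  return 1;
          }

          printf("spliced %zd bytes into the pipe\n", n);
          close(in);
          close(pipefd[0]);
          close(pipefd[1]);
          return 0;
  }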