aboutsummaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2020-03-30NFS: Ensure security label is set for root inodeScott Mayhew4-38/+39
When using NFSv4.2, the security label for the root inode should be set via a call to nfs_setsecurity() during the mount process, otherwise the inode will appear as unlabeled for up to acdirmin seconds. Currently the label for the root inode is allocated, retrieved, and freed entirely witin nfs4_proc_get_root(). Add a field for the label to the nfs_fattr struct, and allocate & free the label in nfs_get_root(), where we also add a call to nfs_setsecurity(). Note that for the call to nfs_setsecurity() to succeed, it's necessary to also move the logic calling security_sb_{set,clone}_security() from nfs_get_tree_common() down into nfs_get_root()... otherwise the SBLABEL_MNT flag will not be set in the super_block's security flags and nfs_setsecurity() will silently fail. Reported-by: Richard Haines <richard_c_haines@btinternet.com> Signed-off-by: Scott Mayhew <smayhew@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Tested-by: Stephen Smalley <sds@tycho.nsa.gov> [PM: fixed 80-char line width problems] Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-03-12MAINTAINERS: Update my email addressStephen Smalley1-1/+1
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-03-05selinux: avtab_init() and cond_policydb_init() return voidPaul Moore5-21/+7
The avtab_init() and cond_policydb_init() functions always return zero so mark them as returning void and update the callers not to check for a return value. Suggested-by: Stephen Smalley <stephen.smalley.work@gmail.com> Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-03-05selinux: clean up error path in policydb_init()Ondrej Mosnacek1-13/+5
Commit e0ac568de1fa ("selinux: reduce the use of hard-coded hash sizes") moved symtab initialization out of policydb_init(), but left the cleanup of symtabs from the error path. This patch fixes the oversight. Suggested-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-27selinux: remove unused initial SIDs and improve handlingStephen Smalley5-59/+66
Remove initial SIDs that have never been used or are no longer used by the kernel from its string table, which is also used to generate the SECINITSID_* symbols referenced in code. Update the code to gracefully handle the fact that these can now be NULL. Stop treating it as an error if a policy defines additional initial SIDs unknown to the kernel. Do not load unused initial SID contexts into the sidtab. Fix the incorrect usage of the name from the ocontext in error messages when loading initial SIDs since these are not presently written to the kernel policy and are therefore always NULL. After this change, it is possible to safely reclaim and reuse some of the unused initial SIDs without compatibility issues. Specifically, unused initial SIDs that were being assigned the same context as the unlabeled initial SID in policies can be reclaimed and reused for another purpose, with existing policies still treating them as having the unlabeled context and future policies having the option of mapping them to a more specific context. For example, this could have been used when the infiniband labeling support was introduced to define initial SIDs for the default pkey and endport SIDs similar to the handling of port/netif/node SIDs rather than always using SECINITSID_UNLABELED as the default. The set of safely reclaimable unused initial SIDs across all known policies is igmp_packet (13), icmp_socket (14), tcp_socket (15), kmod (24), policy (25), and scmp_packet (26); these initial SIDs were assigned the same context as unlabeled in all known policies including mls. If only considering non-mls policies (i.e. assuming that mls users always upgrade policy with their kernels), the set of safely reclaimable unused initial SIDs further includes file_labels (6), init (7), sysctl_modprobe (16), and sysctl_fs (18) through sysctl_dev (23). Adding new initial SIDs beyond SECINITSID_NUM to policy unfortunately became a fatal error in commit 24ed7fdae669 ("selinux: use separate table for initial SID lookup") and even before that it could cause problems on a policy reload (collision between the new initial SID and one allocated at runtime) ever since commit 42596eafdd75 ("selinux: load the initial SIDs upon every policy load") so we cannot safely start adding new initial SIDs to policies beyond SECINITSID_NUM (27) until such a time as all such kernels do not need to be supported and only those that include this commit are relevant. That is not a big deal since we haven't added a new initial SID since 2004 (v2.6.7) and we have plenty of unused ones we can reclaim if we truly need one. If we want to avoid the wasted storage in initial_sid_to_string[] and/or sidtab->isids[] for the unused initial SIDs, we could introduce an indirection between the kernel initial SID values and the policy initial SID values and just map the policy SID values in the ocontexts to the kernel values during policy_load_isids(). Originally I thought we'd do this by preserving the initial SID names in the kernel policy and creating a mapping at load time like we do for the security classes and permissions but that would require a new kernel policy format version and associated changes to libsepol/checkpolicy and I'm not sure it is justified. Simpler approach is just to create a fixed mapping table in the kernel from the existing fixed policy values to the kernel values. Less flexible but probably sufficient. A separate selinux userspace change was applied in https://github.com/SELinuxProject/selinux/commit/8677ce5e8f592950ae6f14cea1b68a20ddc1ac25 to enable removal of most of the unused initial SID contexts from policies, but there is no dependency between that change and this one. That change permits removing all of the unused initial SID contexts from policy except for the fs and sysctl SID contexts. The initial SID declarations themselves would remain in policy to preserve the values of subsequent ones but the contexts can be dropped. If/when the kernel decides to reuse one of them, future policies can change the name and start assigning a context again without breaking compatibility. Here is how I would envision staging changes to the initial SIDs in a compatible manner after this commit is applied: 1. At any time after this commit is applied, the kernel could choose to reclaim one of the safely reclaimable unused initial SIDs listed above for a new purpose (i.e. replace its NULL entry in the initial_sid_to_string[] table with a new name and start using the newly generated SECINITSID_name symbol in code), and refpolicy could at that time rename its declaration of that initial SID to reflect its new purpose and start assigning it a context going forward. Existing/old policies would map the reclaimed initial SID to the unlabeled context, so that would be the initial default behavior until policies are updated. This doesn't depend on the selinux userspace change; it will work with existing policies and userspace. 2. In 6 months or so we'll have another SELinux userspace release that will include the libsepol/checkpolicy support for omitting unused initial SID contexts. 3. At any time after that release, refpolicy can make that release its minimum build requirement and drop the sid context statements (but not the sid declarations) for all of the unused initial SIDs except for fs and sysctl, which must remain for compatibility on policy reload with old kernels and for compatibility with kernels that were still using SECINITSID_SYSCTL (< 2.6.39). This doesn't depend on this kernel commit; it will work with previous kernels as well. 4. After N years for some value of N, refpolicy decides that it no longer cares about policy reload compatibility for kernels that predate this kernel commit, and refpolicy drops the fs and sysctl SID contexts from policy too (but retains the declarations). 5. After M years for some value of M, the kernel decides that it no longer cares about compatibility with refpolicies that predate step 4 (dropping the fs and sysctl SIDs), and those two SIDs also become safely reclaimable. This step is optional and need not ever occur unless we decide that the need to reclaim those two SIDs outweighs the compatibility cost. 6. After O years for some value of O, refpolicy decides that it no longer cares about policy load (not just reload) compatibility for kernels that predate this kernel commit, and both kernel and refpolicy can then start adding and using new initial SIDs beyond 27. This does not depend on the previous change (step 5) and can occur independent of it. Fixes: https://github.com/SELinuxProject/selinux-kernel/issues/12 Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-27selinux: reduce the use of hard-coded hash sizesOndrej Mosnacek4-40/+45
Instead allocate hash tables with just the right size based on the actual number of elements (which is almost always known beforehand, we just need to defer the hashtab allocation to the right time). The only case when we don't know the size (with the current policy format) is the new filename transitions hashtable. Here I just left the existing value. After this patch, the time to load Fedora policy on x86_64 decreases from 790 ms to 167 ms. If the unconfined module is removed, it decreases from 750 ms to 122 ms. It is also likely that other operations are going to be faster, mainly string_to_context_struct() or mls_compute_sid(), but I didn't try to quantify that. The memory usage of all hash table arrays increases from ~58 KB to ~163 KB (with Fedora policy on x86_64). Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-22selinux: Add xfs quota command typesRichard Haines1-0/+7
Add Q_XQUOTAOFF, Q_XQUOTAON and Q_XSETQLIM to trigger filesystem quotamod permission check. Add Q_XGETQUOTA, Q_XGETQSTAT, Q_XGETQSTATV and Q_XGETNEXTQUOTA to trigger filesystem quotaget permission check. Signed-off-by: Richard Haines <richard_c_haines@btinternet.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-22selinux: optimize storage of filename transitionsOndrej Mosnacek3-80/+110
In these rules, each rule with the same (target type, target class, filename) values is (in practice) always mapped to the same result type. Therefore, it is much more efficient to group the rules by (ttype, tclass, filename). Thus, this patch drops the stype field from the key and changes the datum to be a linked list of one or more structures that contain a result type and an ebitmap of source types that map the given target to the given result type under the given filename. The size of the hash table is also incremented to 2048 to be more optimal for Fedora policy (which currently has ~2500 unique (ttype, tclass, filename) tuples, regardless of whether the 'unconfined' module is enabled). Not only does this dramtically reduce memory usage when the policy contains a lot of unconfined domains (ergo a lot of filename based transitions), but it also slightly reduces memory usage of strongly confined policies (modeled on Fedora policy with 'unconfined' module disabled) and significantly reduces lookup times of these rules on Fedora (roughly matches the performance of the rhashtable conversion patch [1] posted recently to selinux@vger.kernel.org). An obvious next step is to change binary policy format to match this layout, so that disk space is also saved. However, since that requires more work (including matching userspace changes) and this patch is already beneficial on its own, I'm posting it separately. Performance/memory usage comparison: Kernel | Policy load | Policy load | Mem usage | Mem usage | openbench | | (-unconfined) | | (-unconfined) | (createfiles) -----------------|-------------|---------------|-----------|---------------|-------------- reference | 1,30s | 0,91s | 90MB | 77MB | 55 us/file rhashtable patch | 0.98s | 0,85s | 85MB | 75MB | 38 us/file this patch | 0,95s | 0,87s | 75MB | 75MB | 40 us/file (Memory usage is measured after boot. With SELinux disabled the memory usage was ~60MB on the same system.) [1] https://lore.kernel.org/selinux/20200116213937.77795-1-dev@lynxeye.de/T/ Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-13selinux: factor out loop body from filename_trans_read()Ondrej Mosnacek1-59/+63
It simplifies cleanup in the error path. This will be extra useful in later patch. Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-11security: selinux: allow per-file labeling for bpffsConnor O'Brien1-0/+1
Add support for genfscon per-file labeling of bpffs files. This allows for separate permissions for different pinned bpf objects, which may be completely unrelated to each other. Signed-off-by: Connor O'Brien <connoro@google.com> Signed-off-by: Steven Moreland <smoreland@google.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-11selinux: generalize evaluate_cond_node()Ondrej Mosnacek3-6/+12
Both callers iterate the cond_list and call it for each node - turn it into evaluate_cond_nodes(), which does the iteration for them. Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-11selinux: convert cond_expr to arrayOndrej Mosnacek2-43/+33
Since it is fixed-size after allocation and we know the size beforehand, using a plain old array is simpler and more efficient. Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-11selinux: convert cond_av_list to arrayOndrej Mosnacek2-79/+53
Since it is fixed-size after allocation and we know the size beforehand, using a plain old array is simpler and more efficient. Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-11selinux: convert cond_list to arrayOndrej Mosnacek7-59/+43
Since it is fixed-size after allocation and we know the size beforehand, using a plain old array is simpler and more efficient. While there, also fix signedness of some related variables/parameters. Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-10selinux: sel_avc_get_stat_idx should increase position indexVasily Averin1-0/+1
If seq_file .next function does not change position index, read after some lseek can generate unexpected output. $ dd if=/sys/fs/selinux/avc/cache_stats # usual output lookups hits misses allocations reclaims frees 817223 810034 7189 7189 6992 7037 1934894 1926896 7998 7998 7632 7683 1322812 1317176 5636 5636 5456 5507 1560571 1551548 9023 9023 9056 9115 0+1 records in 0+1 records out 189 bytes copied, 5,1564e-05 s, 3,7 MB/s $# read after lseek to midle of last line $ dd if=/sys/fs/selinux/avc/cache_stats bs=180 skip=1 dd: /sys/fs/selinux/avc/cache_stats: cannot skip to specified offset 056 9115 <<<< end of last line 1560571 1551548 9023 9023 9056 9115 <<< whole last line once again 0+1 records in 0+1 records out 45 bytes copied, 8,7221e-05 s, 516 kB/s $# read after lseek beyond end of of file $ dd if=/sys/fs/selinux/avc/cache_stats bs=1000 skip=1 dd: /sys/fs/selinux/avc/cache_stats: cannot skip to specified offset 1560571 1551548 9023 9023 9056 9115 <<<< generates whole last line 0+1 records in 0+1 records out 36 bytes copied, 9,0934e-05 s, 396 kB/s https://bugzilla.kernel.org/show_bug.cgi?id=206283 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-10selinux: allow kernfs symlinks to inherit parent directory contextChristian Göttsche3-2/+13
Currently symlinks on kernel filesystems, like sysfs, are labeled on creation with the parent filesystem root sid. Allow symlinks to inherit the parent directory context, so fine-grained kernfs labeling can be applied to symlinks too and checking contexts doesn't complain about them. For backward-compatibility this behavior is contained in a new policy capability: genfs_seclabel_symlinks Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-10selinux: simplify evaluate_cond_node()Ondrej Mosnacek3-13/+6
It never fails, so it can just return void. Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-10Documentation,selinux: deprecate setting checkreqprot to 1Stephen Smalley6-1/+40
Deprecate setting the SELinux checkreqprot tunable to 1 via kernel parameter or /sys/fs/selinux/checkreqprot. Setting it to 0 is left intact for compatibility since Android and some Linux distributions do so for security and treat an inability to set it as a fatal error. Eventually setting it to 0 will become a no-op and the kernel will stop using checkreqprot's value internally altogether. checkreqprot was originally introduced as a compatibility mechanism for legacy userspace and the READ_IMPLIES_EXEC personality flag. However, if set to 1, it weakens security by allowing mappings to be made executable without authorization by policy. The default value for the SECURITY_SELINUX_CHECKREQPROT_VALUE config option was changed from 1 to 0 in commit 2a35d196c160e3 ("selinux: change CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE default") and both Android and Linux distributions began explicitly setting /sys/fs/selinux/checkreqprot to 0 some time ago. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-10selinux: move status variables out of selinux_ssOndrej Mosnacek6-22/+23
It fits more naturally in selinux_state, since it reflects also global state (the enforcing and policyload fields). Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-09Linux 5.6-rc1Linus Torvalds1-2/+2
2020-02-09Merge tag 'kbuild-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuildLinus Torvalds53-261/+252
Pull more Kbuild updates from Masahiro Yamada: - fix randconfig to generate a sane .config - rename hostprogs-y / always to hostprogs / always-y, which are more natual syntax. - optimize scripts/kallsyms - fix yes2modconfig and mod2yesconfig - make multiple directory targets ('make foo/ bar/') work * tag 'kbuild-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: make multiple directory targets work kconfig: Invalidate all symbols after changing to y or m. kallsyms: fix type of kallsyms_token_table[] scripts/kallsyms: change table to store (strcut sym_entry *) scripts/kallsyms: rename local variables in read_symbol() kbuild: rename hostprogs-y/always to hostprogs/always-y kbuild: fix the document to use extra-y for vmlinux.lds kconfig: fix broken dependency in randconfig-generated .config
2020-02-09Merge tag 'zonefs-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefsLinus Torvalds9-0/+2058
Pull new zonefs file system from Damien Le Moal: "Zonefs is a very simple file system exposing each zone of a zoned block device as a file. Unlike a regular file system with native zoned block device support (e.g. f2fs or the on-going btrfs effort), zonefs does not hide the sequential write constraint of zoned block devices to the user. As a result, zonefs is not a POSIX compliant file system. Its goal is to simplify the implementation of zoned block devices support in applications by replacing raw block device file accesses with a richer file based API, avoiding relying on direct block device file ioctls which may be more obscure to developers. One example of this approach is the implementation of LSM (log-structured merge) tree structures (such as used in RocksDB and LevelDB) on zoned block devices by allowing SSTables to be stored in a zone file similarly to a regular file system rather than as a range of sectors of a zoned device. The introduction of the higher level construct "one file is one zone" can help reducing the amount of changes needed in the application while at the same time allowing the use of zoned block devices with various programming languages other than C. Zonefs IO management implementation uses the new iomap generic code. Zonefs has been successfully tested using a functional test suite (available with zonefs userland format tool on github) and a prototype implementation of LevelDB on top of zonefs" * tag 'zonefs-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: zonefs: Add documentation fs: New zonefs file system
2020-02-09irqchip/gic-v4.1: Avoid 64bit division for the sake of 32bit ARMMarc Zyngier1-2/+2
In order to allow the GICv4 code to link properly on 32bit ARM, make sure we don't use 64bit divisions when it isn't strictly necessary. Fixes: 4e6437f12d6e ("irqchip/gic-v4.1: Ensure L2 vPE table is allocated at RD level") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-09Merge tag '5.6-rc-smb3-plugfest-patches' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds22-129/+247
Pull cifs fixes from Steve French: "13 cifs/smb3 patches, most from testing at the SMB3 plugfest this week: - Important fix for multichannel and for modefromsid mounts. - Two reconnect fixes - Addition of SMB3 change notify support - Backup tools fix - A few additional minor debug improvements (tracepoints and additional logging found useful during testing this week)" * tag '5.6-rc-smb3-plugfest-patches' of git://git.samba.org/sfrench/cifs-2.6: smb3: Add defines for new information level, FileIdInformation smb3: print warning once if posix context returned on open smb3: add one more dynamic tracepoint missing from strict fsync path cifs: fix mode bits from dir listing when mounted with modefromsid cifs: fix channel signing cifs: add SMB3 change notification support cifs: make multichannel warning more visible cifs: fix soft mounts hanging in the reconnect code cifs: Add tracepoints for errors on flush or fsync cifs: log warning message (once) if out of disk space cifs: fail i/o on soft mounts if sessionsetup errors out smb3: fix problem with null cifs super block with previous patch SMB3: Backup intent flag missing from some more ops
2020-02-09Merge branch 'work.vboxsf' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds12-0/+3280
Pull vboxfs from Al Viro: "This is the VirtualBox guest shared folder support by Hans de Goede, with fixups for fs_parse folded in to avoid bisection hazards from those API changes..." * 'work.vboxsf' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: Add VirtualBox guest shared folder (vboxsf) support
2020-02-09Merge tag 'x86-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds13-11/+260
Pull x86 fixes from Thomas Gleixner: "A set of fixes for X86: - Ensure that the PIT is set up when the local APIC is disable or configured in legacy mode. This is caused by an ordering issue introduced in the recent changes which skip PIT initialization when the TSC and APIC frequencies are already known. - Handle malformed SRAT tables during early ACPI parsing which caused an infinite loop anda boot hang. - Fix a long standing race in the affinity setting code which affects PCI devices with non-maskable MSI interrupts. The problem is caused by the non-atomic writes of the MSI address (destination APIC id) and data (vector) fields which the device uses to construct the MSI message. The non-atomic writes are mandated by PCI. If both fields change and the device raises an interrupt after writing address and before writing data, then the MSI block constructs a inconsistent message which causes interrupts to be lost and subsequent malfunction of the device. The fix is to redirect the interrupt to the new vector on the current CPU first and then switch it over to the new target CPU. This allows to observe an eventually raised interrupt in the transitional stage (old CPU, new vector) to be observed in the APIC IRR and retriggered on the new target CPU and the new vector. The potential spurious interrupts caused by this are harmless and can in the worst case expose a buggy driver (all handlers have to be able to deal with spurious interrupts as they can and do happen for various reasons). - Add the missing suspend/resume mechanism for the HYPERV hypercall page which prevents resume hibernation on HYPERV guests. This change got lost before the merge window. - Mask the IOAPIC before disabling the local APIC to prevent potentially stale IOAPIC remote IRR bits which cause stale interrupt lines after resume" * tag 'x86-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic: Mask IOAPIC entries when disabling the local APIC x86/hyperv: Suspend/resume the hypercall page for hibernation x86/apic/msi: Plug non-maskable MSI affinity race x86/boot: Handle malformed SRAT tables during early ACPI parsing x86/timer: Don't skip PIT setup when APIC is disabled or in legacy mode
2020-02-09Merge tag 'smp-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-2/+3
Pull SMP fixes from Thomas Gleixner: "Two fixes for the SMP related functionality: - Make the UP version of smp_call_function_single() match SMP semantics when called for a not available CPU. Instead of emitting a warning and assuming that the function call target is CPU0, return a proper error code like the SMP version does. - Remove a superfluous check in smp_call_function_many_cond()" * tag 'smp-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: smp/up: Make smp_call_function_single() match SMP semantics smp: Remove superfluous cond_func check in smp_call_function_many_cond()
2020-02-09Merge tag 'perf-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds10-36/+88
Pull perf fixes from Thomas Gleixner: "A set of fixes and improvements for the perf subsystem: Kernel fixes: - Install cgroup events to the correct CPU context to prevent a potential list double add - Prevent an integer underflow in the perf mlock accounting - Add a missing prototype for arch_perf_update_userpage() Tooling: - Add a missing unlock in the error path of maps__insert() in perf maps. - Fix the build with the latest libbfd - Fix the perf parser so it does not delete parse event terms, which caused a regression for using perf with the ARM CoreSight as the sink configuration was missing due to the deletion. - Fix the double free in the perf CPU map merging test case - Add the missing ustring support for the perf probe command" * tag 'perf-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf maps: Add missing unlock to maps__insert() error case perf probe: Add ustring support for perf probe command perf: Make perf able to build with latest libbfd perf test: Fix test case Merge cpu map perf parse: Copy string to perf_evsel_config_term perf parse: Refactor 'struct perf_evsel_config_term' kernel/events: Add a missing prototype for arch_perf_update_userpage() perf/cgroups: Install cgroup events to correct cpuctx perf/core: Fix mlock accounting in perf_mmap()
2020-02-09Merge tag 'timers-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-3/+10
Pull timer fixes from Thomas Gleixner: "Two small fixes for the time(r) subsystem: - Handle a subtle race between the clocksource watchdog and a concurrent clocksource watchdog stop/start sequence correctly to prevent a timer double add bug. - Fix the file path for the core time namespace file" * tag 'timers-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource: Prevent double add_timer_on() for watchdog_timer MAINTAINERS: Correct path to time namespace source file
2020-02-09Merge tag 'irq-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds7-35/+127
Pull interrupt fixes from Thomas Gleixner: "A set of fixes for the interrupt subsystem: - Provision only ACPI enabled redistributors on GICv3 - Use the proper command colums when building the INVALL command for the GICv3-ITS - Ensure the allocation of the L2 vPE table for GICv4.1 - Correct the GICv4.1 VPROBASER programming so it uses the proper size - A set of small GICv4.1 tidy up patches - Configuration cleanup for C-SKY interrupt chip - Clarify the function documentation for irq_set_wake() to document that the wakeup functionality is orthogonal to the irq disable/enable mechanism" * tag 'irq-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/gic-v3-its: Rename VPENDBASER/VPROPBASER accessors irqchip/gic-v3-its: Remove superfluous WARN_ON irqchip/gic-v4.1: Drop 'tmp' in inherit_vpe_l1_table_from_rd() irqchip/gic-v4.1: Ensure L2 vPE table is allocated at RD level irqchip/gic-v4.1: Set vpe_l1_base for all redistributors irqchip/gic-v4.1: Fix programming of GICR_VPROPBASER_4_1_SIZE genirq: Clarify that irq wake state is orthogonal to enable/disable irqchip/gic-v3-its: Reference to its_invall_cmd descriptor when building INVALL irqchip: Some Kconfig cleanup for C-SKY irqchip/gic-v3: Only provision redistributors that are enabled in ACPI
2020-02-09Merge tag 'efi-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-1/+1
Pull EFI fix from Thomas Gleixner: "A single fix for a EFI boot regression on X86 which was caused by the recent rework of the EFI memory map parsing. On systems with invalid memmap entries the cleanup function uses an value which cannot be relied on in this stage. Use the actual EFI memmap entry instead" * tag 'efi-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/x86: Fix boot regression on systems with invalid memmap entries
2020-02-08Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds7-20/+29
Pull misc SCSI fixes from James Bottomley: "Five small patches, all in drivers or doc, which missed the initial pull request. The qla2xxx and megaraid_sas are actual fixes and the rest are spelling and doc changes" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ufs: fix spelling mistake "initilized" -> "initialized" scsi: pm80xx: fix spelling mistake "to" -> "too" scsi: MAINTAINERS: ufs: remove pedrom.sousa@synopsys.com scsi: megaraid_sas: fixup MSIx interrupt setup during resume scsi: qla2xxx: Fix unbound NVME response length
2020-02-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds80-327/+782
Pull networking fixes from David Miller: 1) Unbalanced locking in mwifiex_process_country_ie, from Brian Norris. 2) Fix thermal zone registration in iwlwifi, from Andrei Otcheretianski. 3) Fix double free_irq in sgi ioc3 eth, from Thomas Bogendoerfer. 4) Use after free in mptcp, from Florian Westphal. 5) Use after free in wireguard's root_remove_peer_lists, from Eric Dumazet. 6) Properly access packets heads in bonding alb code, from Eric Dumazet. 7) Fix data race in skb_queue_len(), from Qian Cai. 8) Fix regression in r8169 on some chips, from Heiner Kallweit. 9) Fix XDP program ref counting in hv_netvsc, from Haiyang Zhang. 10) Certain kinds of set link netlink operations can cause a NULL deref in the ipv6 addrconf code. Fix from Eric Dumazet. 11) Don't cancel uninitialized work queue in drop monitor, from Ido Schimmel. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits) net: thunderx: use proper interface type for RGMII mt76: mt7615: fix max_nss in mt7615_eeprom_parse_hw_cap bpf: Improve bucket_log calculation logic selftests/bpf: Test freeing sockmap/sockhash with a socket in it bpf, sockhash: Synchronize_rcu before free'ing map bpf, sockmap: Don't sleep while holding RCU lock on tear-down bpftool: Don't crash on missing xlated program instructions bpf, sockmap: Check update requirements after locking drop_monitor: Do not cancel uninitialized work item mlxsw: spectrum_dpipe: Add missing error path mlxsw: core: Add validation of hardware device types for MGPIR register mlxsw: spectrum_router: Clear offload indication from IPv6 nexthops on abort selftests: mlxsw: Add test cases for local table route replacement mlxsw: spectrum_router: Prevent incorrect replacement of local table routes net: dsa: microchip: enable module autoprobe ipv6/addrconf: fix potential NULL deref in inet6_set_link_af() dpaa_eth: support all modes with rate adapting PHYs net: stmmac: update pci platform data to use phy_interface net: stmmac: xgmac: fix missing IFF_MULTICAST checki in dwxgmac2_set_filter net: stmmac: fix missing IFF_MULTICAST check in dwmac4_set_filter ...
2020-02-08fs: Add VirtualBox guest shared folder (vboxsf) supportHans de Goede12-0/+3280
VirtualBox hosts can share folders with guests, this commit adds a VFS driver implementing the Linux-guest side of this, allowing folders exported by the host to be mounted under Linux. This driver depends on the guest <-> host IPC functions exported by the vboxguest driver. Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-08Merge tag 'powerpc-5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds2-5/+7
Pull powerpc fixes from Michael Ellerman: - Fix an existing bug in our user access handling, exposed by one of the bug fixes we merged this cycle. - A fix for a boot hang on 32-bit with CONFIG_TRACE_IRQFLAGS and the recently added CONFIG_VMAP_STACK. Thanks to: Christophe Leroy, Guenter Roeck. * tag 'powerpc-5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc: Fix CONFIG_TRACE_IRQFLAGS with CONFIG_VMAP_STACK powerpc/futex: Fix incorrect user access blocking
2020-02-08Fix up remaining devm_ioremap_nocache() in SGI IOC3 8250 UART driverLinus Torvalds1-1/+1
This is a merge error on my part - the driver was merged into mainline by commit c5951e7c8ee5 ("Merge tag 'mips_5.6' of git://../mips/linux") over a week ago, but nobody apparently noticed that it didn't actually build due to still having a reference to the devm_ioremap_nocache() function, removed a few days earlier through commit 6a1000bd2703 ("Merge tag 'ioremap-5.6' of git://../ioremap"). Apparently this didn't get any build testing anywhere. Not perhaps all that surprising: it's restricted to 64-bit MIPS only, and only with the new SGI_MFD_IOC3 support enabled. I only noticed because the ioremap conflicts in the ARM SoC driver update made me check there weren't any others hiding, and I found this one. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-08Merge tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/socLinus Torvalds35-675/+697
Pull ARM SoC late updates from Olof Johansson: "This is some material that we picked up into our tree late, or that had more complex dependencies on more than one topic branch that makes sense to keep separately. - TI support for secure accelerators and hwrng on OMAP4/5 - TI camera changes for dra7 and am437x and SGX improvement due to better reset control support on am335x, am437x and dra7 - Davinci moves to proper clocksource on DM365, and regulator/audio improvements for DM365 and DM644x eval boards" * tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (32 commits) ARM: dts: omap4-droid4: Enable hdq for droid4 ds250x 1-wire battery nvmem ARM: dts: motorola-cpcap-mapphone: Configure calibration interrupt ARM: dts: Configure interconnect target module for am437x sgx ARM: dts: Configure sgx for dra7 ARM: dts: Configure rstctrl reset for am335x SGX ARM: dts: dra7: Add ti-sysc node for VPE ARM: dts: dra7: add vpe clkctrl node ARM: dts: am43x-epos-evm: Add VPFE and OV2659 entries ARM: dts: am437x-sk-evm: Add VPFE and OV2659 entries ARM: dts: am43xx: add support for clkout1 clock arm: dts: dra76-evm: Add CAL and OV5640 nodes arm: dtsi: dra76x: Add CAL dtsi node arm: dts: dra72-evm-common: Add entries for the CSI2 cameras ARM: dts: DRA72: Add CAL dtsi node ARM: dts: dra7-l4: Add ti-sysc node for CAM ARM: OMAP: DRA7xx: Make CAM clock domain SWSUP only ARM: dts: dra7: add cam clkctrl node ARM: OMAP2+: Drop legacy platform data for omap4 des ARM: OMAP2+: Drop legacy platform data for omap4 sham ARM: OMAP2+: Drop legacy platform data for omap4 aes ...
2020-02-08Merge tag 'armsoc-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/socLinus Torvalds9-36/+113
Pull ARM SoC defconfig updates from Olof Johansson: "We keep this in a separate branch to avoid cross-branch conflicts, but most of the material here is fairly boring -- some new drivers turned on for hardware since they were merged, and some refreshed files due to time having moved a lot of entries around" * tag 'armsoc-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (38 commits) ARM: configs: at91: enable MMC_SDHCI_OF_AT91 and MICROCHIP_PIT64B arm64: defconfig: Enable Broadcom's GENET Ethernet controller ARM: multi_v7_defconfig: Enable devfreq thermal integration ARM: exynos_defconfig: Enable devfreq thermal integration ARM: multi_v7_defconfig: Enable NFS v4.1 and v4.2 ARM: exynos_defconfig: Enable NFS v4.1 and v4.2 arm64: defconfig: Enable Actions Semi specific drivers arm64: defconfig: Enable Broadcom's STB PCIe controller arm64: defconfig: Enable CONFIG_CLK_IMX8MP by default ARM: configs: at91: enable config flags for sam9x60 SoC ARM: configs: at91: use savedefconfig arm64: defconfig: Enable tegra XUDC support ARM: defconfig: gemini: Update defconfig arm64: defconfig: enable CONFIG_ARM_QCOM_CPUFREQ_NVMEM arm64: defconfig: enable CONFIG_QCOM_CPR arm64: defconfig: Enable HFPLL arm64: defconfig: Enable CRYPTO_DEV_FSL_CAAM ARM: imx_v6_v7_defconfig: Select the TFP410 driver ARM: imx_v6_v7_defconfig: Enable NFS_V4_1 and NFS_V4_2 support arm64: defconfig: Enable ATH10K_SNOC ...
2020-02-08Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/socLinus Torvalds142-3276/+6579
Pull ARM SoC-related driver updates from Olof Johansson: "Various driver updates for platforms: - Nvidia: Fuse support for Tegra194, continued memory controller pieces for Tegra30 - NXP/FSL: Refactorings of QuickEngine drivers to support ARM/ARM64/PPC - NXP/FSL: i.MX8MP SoC driver pieces - TI Keystone: ring accelerator driver - Qualcomm: SCM driver cleanup/refactoring + support for new SoCs. - Xilinx ZynqMP: feature checking interface for firmware. Mailbox communication for power management - Overall support patch set for cpuidle on more complex hierarchies (PSCI-based) and misc cleanups, refactorings of Marvell, TI, other platforms" * tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (166 commits) drivers: soc: xilinx: Use mailbox IPI callback dt-bindings: power: reset: xilinx: Add bindings for ipi mailbox drivers: soc: ti: knav_qmss_queue: Pass lockdep expression to RCU lists MAINTAINERS: Add brcmstb PCIe controller entry soc/tegra: fuse: Unmap registers once they are not needed anymore soc/tegra: fuse: Correct straps' address for older Tegra124 device trees soc/tegra: fuse: Warn if straps are not ready soc/tegra: fuse: Cache values of straps and Chip ID registers memory: tegra30-emc: Correct error message for timed out auto calibration memory: tegra30-emc: Firm up hardware programming sequence memory: tegra30-emc: Firm up suspend/resume sequence soc/tegra: regulators: Do nothing if voltage is unchanged memory: tegra: Correct reset value of xusb_hostr soc/tegra: fuse: Add APB DMA dependency for Tegra20 bus: tegra-aconnect: Remove PM_CLK dependency dt-bindings: mediatek: add MT6765 power dt-bindings soc: mediatek: cmdq: delete not used define memory: tegra: Add support for the Tegra194 memory controller memory: tegra: Only include support for enabled SoCs memory: tegra: Support DVFS on Tegra186 and later ...
2020-02-08Merge tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/socLinus Torvalds585-15650/+32100
Pull ARM Device-tree updates from Olof Johansson: "New SoCs: - Atmel/Microchip SAM9X60 (ARM926 SoC) - OMAP 37xx gets split into AM3703/AM3715/DM3725, who are all variants of it with different GPU/media IP configurations. - ST stm32mp15 SoCs (1-2 Cortex-A7, CAN, GPU depending on SKU) - ST Ericsson ab8505 (variant of ab8500) and db8520 (variant of db8500) - Unisoc SC9863A SoC (8x Cortex-A55 mobile chipset w/ GPU, modem) - Qualcomm SC7180 (8-core 64bit SoC, unnamed CPU class) New boards: - Allwinner: + Emlid Neutis SoM (H3 variant) + Libre Computer ALL-H3-IT + PineH64 Model B - Amlogic: + Libretech Amlogic GX PC (s905d and s912-based variants) - Atmel/Microchip: + Kizboxmini, sam9x60 EK, sama5d27 Wireless SOM (wlsom1) - Marvell: + Armada 385-based SolidRun Clearfog GTR - NXP: + Gateworks GW59xx boards based on i.MX6/6Q/6QDL + Tolino Shine 3 eBook reader (i.MX6sl) + Embedded Artists COM (i.MX7ULP) + SolidRun CLearfog CX/ITX and HoneyComb (LX2160A-based systems) + Google Coral Edge TPU (i.MX8MQ) - Rockchip: + Radxa Dalang Carrier (supports rk3288 and rk3399 SOMs) + Radxa Rock Pi N10 (RK3399Pro-based) + VMARC RK3399Pro SOM - ST: + Reference boards for stm32mp15 - ST Ericsson: + Samsung Galaxy S III mini (GT-I8190) + HREF520 reference board for DB8520 - TI OMAP: + Gen1 Amazon Echo (OMAP3630-based) - Qualcomm: + Inforce 6640 Single Board Computer (msm8996-based) + SC7180 IDP (SC7180-based)" * tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (623 commits) dt-bindings: fix compilation error of the example in marvell,mmp3-hsic-phy.yaml arm64: dts: ti: k3-am654-base-board: Add CSI2 OV5640 camera arm64: dts: ti: k3-am65-main Add CAL node arm64: dts: ti: k3-j721e-main: Add McASP nodes arm64: dts: ti: k3-am654-main: Add McASP nodes arm64: dts: ti: k3-j721e: DMA support arm64: dts: ti: k3-j721e-main: Move secure proxy and smmu under main_navss arm64: dts: ti: k3-j721e-main: Correct main NAVSS representation arm64: dts: ti: k3-j721e: Correct the address for MAIN NAVSS arm64: dts: ti: k3-am65: DMA support arm64: dts: ti: k3-am65-main: Move secure proxy under cbass_main_navss arm64: dts: ti: k3-am65-main: Correct main NAVSS representation ARM: dts: aspeed: rainier: Add UCD90320 power sequencer ARM: dts: aspeed: rainier: Switch PSUs to unknown version arm64: dts: rockchip: Kill off "simple-panel" compatibles ARM: dts: rockchip: Kill off "simple-panel" compatibles arm64: dts: rockchip: rename dwmmc node names to mmc ARM: dts: rockchip: rename dwmmc node names to mmc arm64: dts: exynos: Rename Samsung and Exynos to lowercase arm64: dts: uniphier: add reset-names to NAND controller node ...
2020-02-08Merge tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/socLinus Torvalds40-131/+457
Pull ARM SoC platform updates from Olof Johansson: "Most of these are smaller fixes that have accrued, and some continued cleanup of OMAP platforms towards shared frameworks. One new SoC from Atmel/Microchip: sam9x60" * tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (35 commits) ARM: OMAP2+: Fix undefined reference to omap_secure_init ARM: s3c64xx: Drop unneeded select of TIMER_OF ARM: exynos: Drop unneeded select of MIGHT_HAVE_CACHE_L2X0 ARM: s3c24xx: Switch to atomic pwm API in rx1950 ARM: OMAP2+: sleep43xx: Call secure suspend/resume handlers ARM: OMAP2+: Use ARM SMC Calling Convention when OP-TEE is available ARM: OMAP2+: Introduce check for OP-TEE in omap_secure_init() ARM: OMAP2+: Add omap_secure_init callback hook for secure initialization ARM: at91: Documentation: add sam9x60 product and datasheet ARM: at91: pm: use of_device_id array to find the proper shdwc node ARM: at91: pm: use SAM9X60 PMC's compatible ARM: imx: only select ARM_ERRATA_814220 for ARMv7-A ARM: zynq: use physical cpuid in zynq_slcr_cpu_stop/start ARM: tegra: Use clk_m CPU on Tegra124 LP1 resume ARM: tegra: Modify reshift divider during LP1 ARM: tegra: Enable PLLP bypass during Tegra124 LP1 ARM: samsung: Rename Samsung and Exynos to lowercase ARM: exynos: Correct the help text for platform Kconfig option ARM: bcm: Select ARM_AMBA for ARCH_BRCMSTB ARM: brcmstb: Add debug UART entry for 7216 ...
2020-02-08Merge tag 'compat-ioctl-fix' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playgroundLinus Torvalds1-4/+7
Pull compat-ioctl fix from Arnd Bergmann: "One patch in the compat-ioctl series broke 32-bit rootfs for multiple people testing on 64-bit kernels. Let's fix it in -rc1 before others run into the same issue" * tag 'compat-ioctl-fix' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: compat_ioctl: fix FIONREAD on devices
2020-02-08Merge branch 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds37-782/+562
Pull vfs file system parameter updates from Al Viro: "Saner fs_parser.c guts and data structures. The system-wide registry of syntax types (string/enum/int32/oct32/.../etc.) is gone and so is the horror switch() in fs_parse() that would have to grow another case every time something got added to that system-wide registry. New syntax types can be added by filesystems easily now, and their namespace is that of functions - not of system-wide enum members. IOW, they can be shared or kept private and if some turn out to be widely useful, we can make them common library helpers, etc., without having to do anything whatsoever to fs_parse() itself. And we already get that kind of requests - the thing that finally pushed me into doing that was "oh, and let's add one for timeouts - things like 15s or 2h". If some filesystem really wants that, let them do it. Without somebody having to play gatekeeper for the variants blessed by direct support in fs_parse(), TYVM. Quite a bit of boilerplate is gone. And IMO the data structures make a lot more sense now. -200LoC, while we are at it" * 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (25 commits) tmpfs: switch to use of invalfc() cgroup1: switch to use of errorfc() et.al. procfs: switch to use of invalfc() hugetlbfs: switch to use of invalfc() cramfs: switch to use of errofc() et.al. gfs2: switch to use of errorfc() et.al. fuse: switch to use errorfc() et.al. ceph: use errorfc() and friends instead of spelling the prefix out prefix-handling analogues of errorf() and friends turn fs_param_is_... into functions fs_parse: handle optional arguments sanely fs_parse: fold fs_parameter_desc/fs_parameter_spec fs_parser: remove fs_parameter_description name field add prefix to fs_context->log ceph_parse_param(), ceph_parse_mon_ips(): switch to passing fc_log new primitive: __fs_parse() switch rbd and libceph to p_log-based primitives struct p_log, variants of warnf() et.al. taking that one instead teach logfc() to handle prefices, give it saner calling conventions get rid of cg_invalf() ...
2020-02-08Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds10-111/+119
Pull misc vfs updates from Al Viro: - bmap series from cmaiolino - getting rid of convolutions in copy_mount_options() (use a couple of copy_from_user() instead of the __get_user() crap) * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: saner copy_mount_options() fibmap: Reject negative block numbers fibmap: Use bmap instead of ->bmap method in ioctl_fibmap ecryptfs: drop direct calls to ->bmap cachefiles: drop direct usage of ->bmap method. fs: Enable bmap() function to properly return errors
2020-02-08Merge branch 'pipe-exclusive-wakeup'Linus Torvalds4-30/+51
Merge thundering herd avoidance on pipe IO. This would have been applied for 5.5 already, but got delayed because of a user-space race condition in the GNU make jobserver code. Now that there's a new GNU make 4.3 release, and most distributions seem to have at least applied the (almost three year old) fix for the problem, let's see if people notice. And it might have been just bad random timing luck on my machine. If you do hit the race condition, things will still work, but the symptom is that you don't get nearly the expected parallelism when using "make -j<N>". The jobserver bug can definitely happen without this patch too, but seems to be easier to trigger when we no longer wake up pipe waiters unnecessarily. * pipe-exclusive-wakeup: pipe: use exclusive waits when reading or writing
2020-02-08pipe: use exclusive waits when reading or writingLinus Torvalds4-30/+51
This makes the pipe code use separate wait-queues and exclusive waiting for readers and writers, avoiding a nasty thundering herd problem when there are lots of readers waiting for data on a pipe (or, less commonly, lots of writers waiting for a pipe to have space). While this isn't a common occurrence in the traditional "use a pipe as a data transport" case, where you typically only have a single reader and a single writer process, there is one common special case: using a pipe as a source of "locking tokens" rather than for data communication. In particular, the GNU make jobserver code ends up using a pipe as a way to limit parallelism, where each job consumes a token by reading a byte from the jobserver pipe, and releases the token by writing a byte back to the pipe. This pattern is fairly traditional on Unix, and works very well, but will waste a lot of time waking up a lot of processes when only a single reader needs to be woken up when a writer releases a new token. A simplified test-case of just this pipe interaction is to create 64 processes, and then pass a single token around between them (this test-case also intentionally passes another token that gets ignored to test the "wake up next" logic too, in case anybody wonders about it): #include <unistd.h> int main(int argc, char **argv) { int fd[2], counters[2]; pipe(fd); counters[0] = 0; counters[1] = -1; write(fd[1], counters, sizeof(counters)); /* 64 processes */ fork(); fork(); fork(); fork(); fork(); fork(); do { int i; read(fd[0], &i, sizeof(i)); if (i < 0) continue; counters[0] = i+1; write(fd[1], counters, (1+(i & 1)) *sizeof(int)); } while (counters[0] < 1000000); return 0; } and in a perfect world, passing that token around should only cause one context switch per transfer, when the writer of a token causes a directed wakeup of just a single reader. But with the "writer wakes all readers" model we traditionally had, on my test box the above case causes more than an order of magnitude more scheduling: instead of the expected ~1M context switches, "perf stat" shows 231,852.37 msec task-clock # 15.857 CPUs utilized 11,250,961 context-switches # 0.049 M/sec 616,304 cpu-migrations # 0.003 M/sec 1,648 page-faults # 0.007 K/sec 1,097,903,998,514 cycles # 4.735 GHz 120,781,778,352 instructions # 0.11 insn per cycle 27,997,056,043 branches # 120.754 M/sec 283,581,233 branch-misses # 1.01% of all branches 14.621273891 seconds time elapsed 0.018243000 seconds user 3.611468000 seconds sys before this commit. After this commit, I get 5,229.55 msec task-clock # 3.072 CPUs utilized 1,212,233 context-switches # 0.232 M/sec 103,951 cpu-migrations # 0.020 M/sec 1,328 page-faults # 0.254 K/sec 21,307,456,166 cycles # 4.074 GHz 12,947,819,999 instructions # 0.61 insn per cycle 2,881,985,678 branches # 551.096 M/sec 64,267,015 branch-misses # 2.23% of all branches 1.702148350 seconds time elapsed 0.004868000 seconds user 0.110786000 seconds sys instead. Much better. [ Note! This kernel improvement seems to be very good at triggering a race condition in the make jobserver (in GNU make 4.2.1) for me. It's a long known bug that was fixed back in June 2017 by GNU make commit b552b0525198 ("[SV 51159] Use a non-blocking read with pselect to avoid hangs."). But there wasn't a new release of GNU make until 4.3 on Jan 19 2020, so a number of distributions may still have the buggy version. Some have backported the fix to their 4.2.1 release, though, and even without the fix it's quite timing-dependent whether the bug actually is hit. ] Josh Triplett says: "I've been hammering on your pipe fix patch (switching to exclusive wait queues) for a month or so, on several different systems, and I've run into no issues with it. The patch *substantially* improves parallel build times on large (~100 CPU) systems, both with parallel make and with other things that use make's pipe-based jobserver. All current distributions (including stable and long-term stable distributions) have versions of GNU make that no longer have the jobserver bug" Tested-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-08compat_ioctl: fix FIONREAD on devicesArnd Bergmann1-4/+7
My final cleanup patch for sys_compat_ioctl() introduced a regression on the FIONREAD ioctl command, which is used for both regular and special files, but only works on regular files after my patch, as I had missed the warning that Al Viro put into a comment right above it. Change it back so it can work on any file again by moving the implementation to do_vfs_ioctl() instead. Fixes: 77b9040195de ("compat_ioctl: simplify the implementation") Reported-and-tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Reported-and-tested-by: youling257 <youling257@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2020-02-08Merge tag 'irqchip-fixes-5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgentThomas Gleixner6-35/+120
Pull irqchip fixes for 5.6, take #1 from Marc Zyngier: - Guarantee allocation of L2 vPE table for GICv4.1 - Fix GICv4.1 VPROPBASER programming - Numerous GICv4.1 tidy ups - Fix disabled GICv3 redistributor provisioning with ACPI - KConfig cleanup for C-SKY
2020-02-08net: thunderx: use proper interface type for RGMIITim Harvey1-1/+1
The configuration of the OCTEONTX XCV_DLL_CTL register via xcv_init_hw() is such that the RGMII RX delay is bypassed leaving the RGMII TX delay enabled in the MAC: /* Configure DLL - enable or bypass * TX no bypass, RX bypass */ cfg = readq_relaxed(xcv->reg_base + XCV_DLL_CTL); cfg &= ~0xFF03; cfg |= CLKRX_BYP; writeq_relaxed(cfg, xcv->reg_base + XCV_DLL_CTL); This would coorespond to a interface type of PHY_INTERFACE_MODE_RGMII_RXID and not PHY_INTERFACE_MODE_RGMII. Fixing this allows RGMII PHY drivers to do the right thing (enable RX delay in the PHY) instead of erroneously enabling both delays in the PHY. Signed-off-by: Tim Harvey <tharvey@gateworks.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-08Merge tag 'wireless-drivers-2020-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-driversDavid S. Miller14-53/+159
Kalle Valo says: ==================== wireless-drivers fixes for v5.6 First set of fixes for v5.6. Buffer overflow fixes to mwifiex, quite a few functionality fixes to iwlwifi and smaller fixes to other drivers. mwifiex * fix an unlock from a previous security fix * fix two buffer overflows libertas * fix two bugs from previous security fixes iwlwifi * fix module removal with multiple NICs * don't treat IGTK removal failure as an error * avoid FW crashes due to DTS measurement races * fix a potential use after free in FTM code * prevent a NULL pointer dereference in iwl_mvm_cfg_he_sta() * fix TDLS discovery * check all CPUs when trying to detect an error during resume rtw88 * fix clang warning mt76 * fix reading of max_nss value from a register ==================== Signed-off-by: David S. Miller <davem@davemloft.net>