aboutsummaryrefslogtreecommitdiffstats
path: root/fs (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-07-10cgroup, blkcg: Prepare some symbols for module and !CONFIG_CGROUP usagesTejun Heo1-0/+3
btrfs is going to use css_put() and wbc helpers to improve cgroup writeback support. Add dummy css_get() definition and export wbc helpers to prepare for module and !CONFIG_CGROUP builds. Reported-by: kbuild test robot <lkp@intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-09Merge tag 'for-5.3/block-20190708' of git://git.kernel.dk/linux-blockLinus Torvalds6-41/+16
Pull block updates from Jens Axboe: "This is the main block updates for 5.3. Nothing earth shattering or major in here, just fixes, additions, and improvements all over the map. This contains: - Series of documentation fixes (Bart) - Optimization of the blk-mq ctx get/put (Bart) - null_blk removal race condition fix (Bob) - req/bio_op() cleanups (Chaitanya) - Series cleaning up the segment accounting, and request/bio mapping (Christoph) - Series cleaning up the page getting/putting for bios (Christoph) - block cgroup cleanups and moving it to where it is used (Christoph) - block cgroup fixes (Tejun) - Series of fixes and improvements to bcache, most notably a write deadlock fix (Coly) - blk-iolatency STS_AGAIN and accounting fixes (Dennis) - Series of improvements and fixes to BFQ (Douglas, Paolo) - debugfs_create() return value check removal for drbd (Greg) - Use struct_size(), where appropriate (Gustavo) - Two lighnvm fixes (Heiner, Geert) - MD fixes, including a read balance and corruption fix (Guoqing, Marcos, Xiao, Yufen) - block opal shadow mbr additions (Jonas, Revanth) - sbitmap compare-and-exhange improvemnts (Pavel) - Fix for potential bio->bi_size overflow (Ming) - NVMe pull requests: - improved PCIe suspent support (Keith Busch) - error injection support for the admin queue (Akinobu Mita) - Fibre Channel discovery improvements (James Smart) - tracing improvements including nvmetc tracing support (Minwoo Im) - misc fixes and cleanups (Anton Eidelman, Minwoo Im, Chaitanya Kulkarni)" - Various little fixes and improvements to drivers and core" * tag 'for-5.3/block-20190708' of git://git.kernel.dk/linux-block: (153 commits) blk-iolatency: fix STS_AGAIN handling block: nr_phys_segments needs to be zero for REQ_OP_WRITE_ZEROES blk-mq: simplify blk_mq_make_request() blk-mq: remove blk_mq_put_ctx() sbitmap: Replace cmpxchg with xchg block: fix .bi_size overflow block: sed-opal: check size of shadow mbr block: sed-opal: ioctl for writing to shadow mbr block: sed-opal: add ioctl for done-mark of shadow mbr block: never take page references for ITER_BVEC direct-io: use bio_release_pages in dio_bio_complete block_dev: use bio_release_pages in bio_unmap_user block_dev: use bio_release_pages in blkdev_bio_end_io iomap: use bio_release_pages in iomap_dio_bio_end_io block: use bio_release_pages in bio_map_user_iov block: use bio_release_pages in bio_unmap_user block: optionally mark pages dirty in bio_release_pages block: move the BIO_NO_PAGE_REF check into bio_release_pages block: skd_main.c: Remove call to memset after dma_alloc_coherent block: mtip32xx: Remove call to memset after dma_alloc_coherent ...
2019-07-08Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespaceLinus Torvalds2-2/+2
Pull force_sig() argument change from Eric Biederman: "A source of error over the years has been that force_sig has taken a task parameter when it is only safe to use force_sig with the current task. The force_sig function is built for delivering synchronous signals such as SIGSEGV where the userspace application caused a synchronous fault (such as a page fault) and the kernel responded with a signal. Because the name force_sig does not make this clear, and because the force_sig takes a task parameter the function force_sig has been abused for sending other kinds of signals over the years. Slowly those have been fixed when the oopses have been tracked down. This set of changes fixes the remaining abusers of force_sig and carefully rips out the task parameter from force_sig and friends making this kind of error almost impossible in the future" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (27 commits) signal/x86: Move tsk inside of CONFIG_MEMORY_FAILURE in do_sigbus signal: Remove the signal number and task parameters from force_sig_info signal: Factor force_sig_info_to_task out of force_sig_info signal: Generate the siginfo in force_sig signal: Move the computation of force into send_signal and correct it. signal: Properly set TRACE_SIGNAL_LOSE_INFO in __send_signal signal: Remove the task parameter from force_sig_fault signal: Use force_sig_fault_to_task for the two calls that don't deliver to current signal: Explicitly call force_sig_fault on current signal/unicore32: Remove tsk parameter from __do_user_fault signal/arm: Remove tsk parameter from __do_user_fault signal/arm: Remove tsk parameter from ptrace_break signal/nds32: Remove tsk parameter from send_sigtrap signal/riscv: Remove tsk parameter from do_trap signal/sh: Remove tsk parameter from force_sig_info_fault signal/um: Remove task parameter from send_sigtrap signal/x86: Remove task parameter from send_sigtrap signal: Remove task parameter from force_sig_mceerr signal: Remove task parameter from force_sig signal: Remove task parameter from force_sigsegv ...
2019-07-08Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds3-48/+17
Pull crypto updates from Herbert Xu: "Here is the crypto update for 5.3: API: - Test shash interface directly in testmgr - cra_driver_name is now mandatory Algorithms: - Replace arc4 crypto_cipher with library helper - Implement 5 way interleave for ECB, CBC and CTR on arm64 - Add xxhash - Add continuous self-test on noise source to drbg - Update jitter RNG Drivers: - Add support for SHA204A random number generator - Add support for 7211 in iproc-rng200 - Fix fuzz test failures in inside-secure - Fix fuzz test failures in talitos - Fix fuzz test failures in qat" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (143 commits) crypto: stm32/hash - remove interruptible condition for dma crypto: stm32/hash - Fix hmac issue more than 256 bytes crypto: stm32/crc32 - rename driver file crypto: amcc - remove memset after dma_alloc_coherent crypto: ccp - Switch to SPDX license identifiers crypto: ccp - Validate the the error value used to index error messages crypto: doc - Fix formatting of new crypto engine content crypto: doc - Add parameter documentation crypto: arm64/aes-ce - implement 5 way interleave for ECB, CBC and CTR crypto: arm64/aes-ce - add 5 way interleave routines crypto: talitos - drop icv_ool crypto: talitos - fix hash on SEC1. crypto: talitos - move struct talitos_edesc into talitos.h lib/scatterlist: Fix mapping iterator when sg->offset is greater than PAGE_SIZE crypto/NX: Set receive window credits to max number of CRBs in RxFIFO crypto: asymmetric_keys - select CRYPTO_HASH where needed crypto: serpent - mark __serpent_setkey_sbox noinline crypto: testmgr - dynamically allocate crypto_shash crypto: testmgr - dynamically allocate testvec_config crypto: talitos - eliminate unneeded 'done' functions at build time ...
2019-07-08Merge tag 'keys-acl-20190703' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fsLinus Torvalds10-21/+78
Pull keyring ACL support from David Howells: "This changes the permissions model used by keys and keyrings to be based on an internal ACL by the following means: - Replace the permissions mask internally with an ACL that contains a list of ACEs, each with a specific subject with a permissions mask. Potted default ACLs are available for new keys and keyrings. ACE subjects can be macroised to indicate the UID and GID specified on the key (which remain). Future commits will be able to add additional subject types, such as specific UIDs or domain tags/namespaces. Also split a number of permissions to give finer control. Examples include splitting the revocation permit from the change-attributes permit, thereby allowing someone to be granted permission to revoke a key without allowing them to change the owner; also the ability to join a keyring is split from the ability to link to it, thereby stopping a process accessing a keyring by joining it and thus acquiring use of possessor permits. - Provide a keyctl to allow the granting or denial of one or more permits to a specific subject. Direct access to the ACL is not granted, and the ACL cannot be viewed" * tag 'keys-acl-20190703' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: keys: Provide KEYCTL_GRANT_PERMISSION keys: Replace uid/gid/perm permissions checking with an ACL
2019-07-08Merge tag 'keys-namespace-20190627' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fsLinus Torvalds5-8/+12
Pull keyring namespacing from David Howells: "These patches help make keys and keyrings more namespace aware. Firstly some miscellaneous patches to make the process easier: - Simplify key index_key handling so that the word-sized chunks assoc_array requires don't have to be shifted about, making it easier to add more bits into the key. - Cache the hash value in the key so that we don't have to calculate on every key we examine during a search (it involves a bunch of multiplications). - Allow keying_search() to search non-recursively. Then the main patches: - Make it so that keyring names are per-user_namespace from the point of view of KEYCTL_JOIN_SESSION_KEYRING so that they're not accessible cross-user_namespace. keyctl_capabilities() shows KEYCTL_CAPS1_NS_KEYRING_NAME for this. - Move the user and user-session keyrings to the user_namespace rather than the user_struct. This prevents them propagating directly across user_namespaces boundaries (ie. the KEY_SPEC_* flags will only pick from the current user_namespace). - Make it possible to include the target namespace in which the key shall operate in the index_key. This will allow the possibility of multiple keys with the same description, but different target domains to be held in the same keyring. keyctl_capabilities() shows KEYCTL_CAPS1_NS_KEY_TAG for this. - Make it so that keys are implicitly invalidated by removal of a domain tag, causing them to be garbage collected. - Institute a network namespace domain tag that allows keys to be differentiated by the network namespace in which they operate. New keys that are of a type marked 'KEY_TYPE_NET_DOMAIN' are assigned the network domain in force when they are created. - Make it so that the desired network namespace can be handed down into the request_key() mechanism. This allows AFS, NFS, etc. to request keys specific to the network namespace of the superblock. This also means that the keys in the DNS record cache are thenceforth namespaced, provided network filesystems pass the appropriate network namespace down into dns_query(). For DNS, AFS and NFS are good, whilst CIFS and Ceph are not. Other cache keyrings, such as idmapper keyrings, also need to set the domain tag - for which they need access to the network namespace of the superblock" * tag 'keys-namespace-20190627' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: keys: Pass the network namespace into request_key mechanism keys: Network namespace domain tag keys: Garbage collect keys for which the domain has been removed keys: Include target namespace in match criteria keys: Move the user and user-session keyrings to the user_namespace keys: Namespace keyring names keys: Add a 'recurse' flag for keyring searches keys: Cache the hash value to avoid lots of recalculation keys: Simplify key description management
2019-07-08Merge branch 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-0/+10
Pull x86 AVX512 status update from Ingo Molnar: "This adds a new ABI that the main scheduler probably doesn't want to deal with but HPC job schedulers might want to use: the AVX512_elapsed_ms field in the new /proc/<pid>/arch_status task status file, which allows the user-space job scheduler to cluster such tasks, to avoid turbo frequency drops" * 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation/filesystems/proc.txt: Add arch_status file x86/process: Add AVX-512 usage elapsed time to /proc/pid/arch_status proc: Add /proc/<pid>/arch_status
2019-07-08Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-2/+2
Pull scheduler updates from Ingo Molnar: - Remove the unused per rq load array and all its infrastructure, by Dietmar Eggemann. - Add utilization clamping support by Patrick Bellasi. This is a refinement of the energy aware scheduling framework with support for boosting of interactive and capping of background workloads: to make sure critical GUI threads get maximum frequency ASAP, and to make sure background processing doesn't unnecessarily move to cpufreq governor to higher frequencies and less energy efficient CPU modes. - Add the bare minimum of tracepoints required for LISA EAS regression testing, by Qais Yousef - which allows automated testing of various power management features, including energy aware scheduling. - Restructure the former tsk_nr_cpus_allowed() facility that the -rt kernel used to modify the scheduler's CPU affinity logic such as migrate_disable() - introduce the task->cpus_ptr value instead of taking the address of &task->cpus_allowed directly - by Sebastian Andrzej Siewior. - Misc optimizations, fixes, cleanups and small enhancements - see the Git log for details. * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) sched/uclamp: Add uclamp support to energy_compute() sched/uclamp: Add uclamp_util_with() sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks sched/uclamp: Set default clamps for RT tasks sched/uclamp: Reset uclamp values on RESET_ON_FORK sched/uclamp: Extend sched_setattr() to support utilization clamping sched/core: Allow sched_setattr() to use the current policy sched/uclamp: Add system default clamps sched/uclamp: Enforce last task's UCLAMP_MAX sched/uclamp: Add bucket local max tracking sched/uclamp: Add CPU's clamp buckets refcounting sched/fair: Rename weighted_cpuload() to cpu_runnable_load() sched/debug: Export the newly added tracepoints sched/debug: Add sched_overutilized tracepoint sched/debug: Add new tracepoint to track PELT at se level sched/debug: Add new tracepoints to track PELT at rq level sched/debug: Add a new sched_trace_*() helper functions sched/autogroup: Make autogroup_path() always available sched/wait: Deduplicate code with do-while sched/topology: Remove unused 'sd' parameter from arch_scale_cpu_capacity() ...
2019-07-08Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-1/+1
Pull locking updates from Ingo Molnar: "The main changes in this cycle are: - rwsem scalability improvements, phase #2, by Waiman Long, which are rather impressive: "On a 2-socket 40-core 80-thread Skylake system with 40 reader and writer locking threads, the min/mean/max locking operations done in a 5-second testing window before the patchset were: 40 readers, Iterations Min/Mean/Max = 1,807/1,808/1,810 40 writers, Iterations Min/Mean/Max = 1,807/50,344/151,255 After the patchset, they became: 40 readers, Iterations Min/Mean/Max = 30,057/31,359/32,741 40 writers, Iterations Min/Mean/Max = 94,466/95,845/97,098" There's a lot of changes to the locking implementation that makes it similar to qrwlock, including owner handoff for more fair locking. Another microbenchmark shows how across the spectrum the improvements are: "With a locking microbenchmark running on 5.1 based kernel, the total locking rates (in kops/s) on a 2-socket Skylake system with equal numbers of readers and writers (mixed) before and after this patchset were: # of Threads Before Patch After Patch ------------ ------------ ----------- 2 2,618 4,193 4 1,202 3,726 8 802 3,622 16 729 3,359 32 319 2,826 64 102 2,744" The changes are extensive and the patch-set has been through several iterations addressing various locking workloads. There might be more regressions, but unless they are pathological I believe we want to use this new implementation as the baseline going forward. - jump-label optimizations by Daniel Bristot de Oliveira: the primary motivation was to remove IPI disturbance of isolated RT-workload CPUs, which resulted in the implementation of batched jump-label updates. Beyond the improvement of the real-time characteristics kernel, in one test this patchset improved static key update overhead from 57 msecs to just 1.4 msecs - which is a nice speedup as well. - atomic64_t cross-arch type cleanups by Mark Rutland: over the last ~10 years of atomic64_t existence the various types used by the APIs only had to be self-consistent within each architecture - which means they became wildly inconsistent across architectures. Mark puts and end to this by reworking all the atomic64 implementations to use 's64' as the base type for atomic64_t, and to ensure that this type is consistently used for parameters and return values in the API, avoiding further problems in this area. - A large set of small improvements to lockdep by Yuyang Du: type cleanups, output cleanups, function return type and othr cleanups all around the place. - A set of percpu ops cleanups and fixes by Peter Zijlstra. - Misc other changes - please see the Git log for more details" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (82 commits) locking/lockdep: increase size of counters for lockdep statistics locking/atomics: Use sed(1) instead of non-standard head(1) option locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING x86/jump_label: Make tp_vec_nr static x86/percpu: Optimize raw_cpu_xchg() x86/percpu, sched/fair: Avoid local_clock() x86/percpu, x86/irq: Relax {set,get}_irq_regs() x86/percpu: Relax smp_processor_id() x86/percpu: Differentiate this_cpu_{}() and __this_cpu_{}() locking/rwsem: Guard against making count negative locking/rwsem: Adaptive disabling of reader optimistic spinning locking/rwsem: Enable time-based spinning on reader-owned rwsem locking/rwsem: Make rwsem->owner an atomic_long_t locking/rwsem: Enable readers spinning on writer locking/rwsem: Clarify usage of owner's nonspinaable bit locking/rwsem: Wake up almost all readers in wait queue locking/rwsem: More optimal RT task handling of null owner locking/rwsem: Always release wait_lock before waking up tasks locking/rwsem: Implement lock handoff to prevent lock starvation locking/rwsem: Make rwsem_spin_on_owner() return owner state ...
2019-07-06Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-1/+0
Pull vfs fixlet from Al Viro: "Fix bogus default y in Kconfig (VALIDATE_FS_PARSER) That thing should not be turned on by default, especially since it's not quiet in case it finds no problems. Geert has sent the obvious fix quite a few times, but it fell through the cracks" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: VALIDATE_FS_PARSER should default to n
2019-07-05Merge tag 'nfsd-5.2-2' of git://linux-nfs.org/~bfields/linuxLinus Torvalds1-1/+1
Pull nfsd fixes from Bruce Fields: "Two more quick bugfixes for nfsd: fixing a regression causing mount failures on high-memory machines and fixing the DRC over RDMA" * tag 'nfsd-5.2-2' of git://linux-nfs.org/~bfields/linux: nfsd: Fix overflow causing non-working mounts on 1 TB machines svcrdma: Ignore source port when computing DRC hash
2019-07-05fs: VALIDATE_FS_PARSER should default to nGeert Uytterhoeven1-1/+0
CONFIG_VALIDATE_FS_PARSER is a debugging tool to check that the parser tables are vaguely sane. It was set to default to 'Y' for the moment to catch errors in upcoming fs conversion development. Make sure it is not enabled by default in the final release of v5.1. Fixes: 31d921c7fb969172 ("vfs: Add configuration parser helpers") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-07-05Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-16/+26
Merge more fixes from Andrew Morton: "5 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: swap_readpage(): avoid blk_wake_io_task() if !synchronous devres: allow const resource arguments mm/vmscan.c: prevent useless kswapd loops fs/userfaultfd.c: disable irqs for fault_pending and event locks mm/page_alloc.c: fix regression with deferred struct page init
2019-07-05Merge tag 'dax-fix-5.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimmLinus Torvalds1-5/+4
Pull dax fix from Dan Williams: "A single dax fix that has been soaking awaiting other fixes under discussion to join it. As it is getting late in the cycle lets proceed with this fix and save follow-on changes for post-v5.3-rc1. - Fix xarray entry association for mixed mappings" * tag 'dax-fix-5.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax: Fix xarray entry association for mixed mappings
2019-07-05Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-3/+4
Pull do_move_mount() fix from Al Viro: "Regression fix" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: move_mount: reject moving kernel internal mounts
2019-07-05fs/userfaultfd.c: disable irqs for fault_pending and event locksEric Biggers1-16/+26
When IOCB_CMD_POLL is used on a userfaultfd, aio_poll() disables IRQs and takes kioctx::ctx_lock, then userfaultfd_ctx::fd_wqh.lock. This may have to wait for userfaultfd_ctx::fd_wqh.lock to be released by userfaultfd_ctx_read(), which in turn can be waiting for userfaultfd_ctx::fault_pending_wqh.lock or userfaultfd_ctx::event_wqh.lock. But elsewhere the fault_pending_wqh and event_wqh locks are taken with IRQs enabled. Since the IRQ handler may take kioctx::ctx_lock, lockdep reports that a deadlock is possible. Fix it by always disabling IRQs when taking the fault_pending_wqh and event_wqh locks. Commit ae62c16e105a ("userfaultfd: disable irqs when taking the waitqueue lock") didn't fix this because it only accounted for the fd_wqh lock, not the other locks nested inside it. Link: http://lkml.kernel.org/r/20190627075004.21259-1-ebiggers@kernel.org Fixes: bfe4037e722e ("aio: implement IOCB_CMD_POLL") Signed-off-by: Eric Biggers <ebiggers@google.com> Reported-by: syzbot+fab6de82892b6b9c6191@syzkaller.appspotmail.com Reported-by: syzbot+53c0b767f7ca0dc0c451@syzkaller.appspotmail.com Reported-by: syzbot+a3accb352f9c22041cfa@syzkaller.appspotmail.com Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> [4.19+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-03nfsd: Fix overflow causing non-working mounts on 1 TB machinesPaul Menzel1-1/+1
Since commit 10a68cdf10 (nfsd: fix performance-limiting session calculation) (Linux 5.1-rc1 and 4.19.31), shares from NFS servers with 1 TB of memory cannot be mounted anymore. The mount just hangs on the client. The gist of commit 10a68cdf10 is the change below. -avail = clamp_t(int, avail, slotsize, avail/3); +avail = clamp_t(int, avail, slotsize, total_avail/3); Here are the macros. #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <) #define clamp_t(type, val, lo, hi) min_t(type, max_t(type, val, lo), hi) `total_avail` is 8,434,659,328 on the 1 TB machine. `clamp_t()` casts the values to `int`, which for 32-bit integers can only hold values −2,147,483,648 (−2^31) through 2,147,483,647 (2^31 − 1). `avail` (in the function signature) is just 65536, so that no overflow was happening. Before the commit the assignment would result in 21845, and `num = 4`. When using `total_avail`, it is causing the assignment to be 18446744072226137429 (printed as %lu), and `num` is then 4164608182. My next guess is, that `nfsd_drc_mem_used` is then exceeded, and the server thinks there is no memory available any more for this client. Updating the arguments of `clamp_t()` and `min_t()` to `unsigned long` fixes the issue. Now, `avail = 65536` (before commit 10a68cdf10 `avail = 21845`), but `num = 4` remains the same. Fixes: c54f24e338ed (nfsd: fix performance-limiting session calculation) Cc: stable@vger.kernel.org Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03Merge tag '5.2-rc6-smb3-fix' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds2-5/+73
Pull cifs fix from Steve French: "SMB3 fix (for stable as well) for crash mishandling one of the Windows reparse point symlink tags" * tag '5.2-rc6-smb3-fix' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix crash querying symlinks stored as reparse-points
2019-07-01vfs: move_mount: reject moving kernel internal mountsEric Biggers1-3/+4
sys_move_mount() crashes by dereferencing the pointer MNT_NS_INTERNAL, a.k.a. ERR_PTR(-EINVAL), if the old mount is specified by fd for a kernel object with an internal mount, such as a pipe or memfd. Fix it by checking for this case and returning -EINVAL. [AV: what we want is is_mounted(); use that instead of making the condition even more convoluted] Reproducer: #include <unistd.h> #define __NR_move_mount 429 #define MOVE_MOUNT_F_EMPTY_PATH 0x00000004 int main() { int fds[2]; pipe(fds); syscall(__NR_move_mount, fds[0], "", -1, "/", MOVE_MOUNT_F_EMPTY_PATH); } Reported-by: syzbot+6004acbaa1893ad013f0@syzkaller.appspotmail.com Fixes: 2db154b3ea8e ("vfs: syscall: Add move_mount(2) to move mounts around") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-07-01block: fix .bi_size overflowMing Lei2-2/+2
'bio->bi_iter.bi_size' is 'unsigned int', which at most hold 4G - 1 bytes. Before 07173c3ec276 ("block: enable multipage bvecs"), one bio can include very limited pages, and usually at most 256, so the fs bio size won't be bigger than 1M bytes most of times. Since we support multi-page bvec, in theory one fs bio really can be added > 1M pages, especially in case of hugepage, or big writeback with too many dirty pages. Then there is chance in which .bi_size is overflowed. Fixes this issue by using bio_full() to check if the added segment may overflow .bi_size. Cc: Liu Yiding <liuyd.fnst@cn.fujitsu.com> Cc: kernel test robot <rong.a.chen@intel.com> Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Cc: linux-xfs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Cc: stable@vger.kernel.org Fixes: 07173c3ec276 ("block: enable multipage bvecs") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-01Merge tag 'v5.2-rc6' into for-5.3/blockJens Axboe56-266/+220
Merge 5.2-rc6 into for-5.3/block, so we get the same page merge leak fix. Otherwise we end up having conflicts with future patches between for-5.3/block and master that touch this area. In particular, it makes the bio_full() fix hard to backport to stable. * tag 'v5.2-rc6': (482 commits) Linux 5.2-rc6 Revert "iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock" Bluetooth: Fix regression with minimum encryption key size alignment tcp: refine memory limit test in tcp_fragment() x86/vdso: Prevent segfaults due to hoisted vclock reads SUNRPC: Fix a credential refcount leak Revert "SUNRPC: Declare RPC timers as TIMER_DEFERRABLE" net :sunrpc :clnt :Fix xps refcount imbalance on the error path NFS4: Only set creation opendata if O_CREAT ARM: 8867/1: vdso: pass --be8 to linker if necessary KVM: nVMX: reorganize initial steps of vmx_set_nested_state KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries habanalabs: use u64_to_user_ptr() for reading user pointers nfsd: replace Jeff by Chuck as nfsd co-maintainer inet: clear num_timeout reqsk_alloc() PCI/P2PDMA: Ignore root complex whitelist when an IOMMU is present net: mvpp2: debugfs: Add pmap to fs dump ipv6: Default fib6_type to RTN_UNICAST when not set net: hns3: Fix inconsistent indenting net/af_iucv: always register net_device notifier ...
2019-06-29block: never take page references for ITER_BVECChristoph Hellwig1-3/+0
If we pass pages through an iov_iter we always already have a reference in the caller. Thus remove the ITER_BVEC_FLAG_NO_REF and don't take reference to pages by default for bvec backed iov_iters. Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-29direct-io: use bio_release_pages in dio_bio_completeChristoph Hellwig1-12/+3
Use bio_release_pages instead of duplicating it. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-29block_dev: use bio_release_pages in bio_unmap_userChristoph Hellwig1-9/+2
Use bio_release_pages instead of duplicating it. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-29block_dev: use bio_release_pages in blkdev_bio_end_ioChristoph Hellwig1-7/+1
Use bio_release_pages instead of duplicating it. Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-29iomap: use bio_release_pages in iomap_dio_bio_end_ioChristoph Hellwig1-7/+1
Use bio_release_pages instead of duplicating it. Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-29Merge tag 'xarray-5.2-rc6' of git://git.infradead.org/users/willy/linux-daxLinus Torvalds1-1/+1
Pull XArray fixes from Matthew Wilcox: - Account XArray nodes for the page cache to the appropriate cgroup (Johannes Weiner) - Fix idr_get_next() when called under the RCU lock (Matthew Wilcox) - Add a test for xa_insert() (Matthew Wilcox) * tag 'xarray-5.2-rc6' of git://git.infradead.org/users/willy/linux-dax: XArray tests: Add check_insert idr: Fix idr_get_next race with idr_remove mm: fix page cache convergence regression
2019-06-29Merge branch 'akpm' (patches from Andrew)Linus Torvalds6-42/+40
Merge misc fixes from Andrew Morton: "15 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: linux/kernel.h: fix overflow for DIV_ROUND_UP_ULL mm, swap: fix THP swap out fork,memcg: alloc_thread_stack_node needs to set tsk->stack MAINTAINERS: add CLANG/LLVM BUILD SUPPORT info mm/vmalloc.c: avoid bogus -Wmaybe-uninitialized warning mm/page_idle.c: fix oops because end_pfn is larger than max_pfn initramfs: fix populate_initrd_image() section mismatch mm/oom_kill.c: fix uninitialized oc->constraint mm: hugetlb: soft-offline: dissolve_free_huge_page() return zero on !PageHuge mm: soft-offline: return -EBUSY if set_hwpoison_free_buddy_page() fails signal: remove the wrong signal_pending() check in restore_user_sigmask() fs/binfmt_flat.c: make load_flat_shared_library() work mm/mempolicy.c: fix an incorrect rebind node in mpol_rebind_nodemask fs/proc/array.c: allow reporting eip/esp for all coredumping threads mm/dev_pfn: exclude MEMORY_DEVICE_PRIVATE while computing virtual address
2019-06-29Merge tag 'nfs-for-5.2-4' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds1-1/+1
Pull two more NFS client fixes from Anna Schumaker: "These are both stable fixes. One to calculate the correct client message length in the case of partial transmissions. And the other to set the proper TCP timeout for flexfiles" * tag 'nfs-for-5.2-4' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFS/flexfiles: Use the correct TCP timeout for flexfiles I/O SUNRPC: Fix up calculation of client message length
2019-06-29Merge tag 'ceph-for-5.2-rc7' of git://github.com/ceph/ceph-clientLinus Torvalds1-1/+2
Pull ceph fix from Ilya Dryomov: "A small fix for a potential -rc1 regression from Jeff" * tag 'ceph-for-5.2-rc7' of git://github.com/ceph/ceph-client: ceph: fix ceph_mdsc_build_path to not stop on first component
2019-06-29Merge tag 'for-linus-20190628' of git://git.kernel.dk/linux-blockLinus Torvalds1-3/+2
Pull block fixes from Jens Axboe: "Just two small fixes. One from Paolo, fixing a silly mistake in BFQ. The other one is from me, ensuring that we have ->file cleared in the io_uring request a bit earlier. That avoids a use-before-free, if we encounter an error before ->file is assigned" * tag 'for-linus-20190628' of git://git.kernel.dk/linux-block: block, bfq: fix operator in BFQQ_TOTALLY_SEEKY io_uring: ensure req->file is cleared on allocation
2019-06-29signal: remove the wrong signal_pending() check in restore_user_sigmask()Oleg Nesterov4-25/+32
This is the minimal fix for stable, I'll send cleanups later. Commit 854a6ed56839 ("signal: Add restore_user_sigmask()") introduced the visible change which breaks user-space: a signal temporary unblocked by set_user_sigmask() can be delivered even if the caller returns success or timeout. Change restore_user_sigmask() to accept the additional "interrupted" argument which should be used instead of signal_pending() check, and update the callers. Eric said: : For clarity. I don't think this is required by posix, or fundamentally to : remove the races in select. It is what linux has always done and we have : applications who care so I agree this fix is needed. : : Further in any case where the semantic change that this patch rolls back : (aka where allowing a signal to be delivered and the select like call to : complete) would be advantage we can do as well if not better by using : signalfd. : : Michael is there any chance we can get this guarantee of the linux : implementation of pselect and friends clearly documented. The guarantee : that if the system call completes successfully we are guaranteed that no : signal that is unblocked by using sigmask will be delivered? Link: http://lkml.kernel.org/r/20190604134117.GA29963@redhat.com Fixes: 854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add restore_user_sigmask()") Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Eric Wong <e@80x24.org> Tested-by: Eric Wong <e@80x24.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jason Baron <jbaron@akamai.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: David Laight <David.Laight@ACULAB.COM> Cc: <stable@vger.kernel.org> [5.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-29fs/binfmt_flat.c: make load_flat_shared_library() workJann Horn1-16/+7
load_flat_shared_library() is broken: It only calls load_flat_file() if prepare_binprm() returns zero, but prepare_binprm() returns the number of bytes read - so this only happens if the file is empty. Instead, call into load_flat_file() if the number of bytes read is non-negative. (Even if the number of bytes is zero - in that case, load_flat_file() will see nullbytes and return a nice -ENOEXEC.) In addition, remove the code related to bprm creds and stop using prepare_binprm() - this code is loading a library, not a main executable, and it only actually uses the members "buf", "file" and "filename" of the linux_binprm struct. Instead, call kernel_read() directly. Link: http://lkml.kernel.org/r/20190524201817.16509-1-jannh@google.com Fixes: 287980e49ffc ("remove lots of IS_ERR_VALUE abuses") Signed-off-by: Jann Horn <jannh@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-29fs/proc/array.c: allow reporting eip/esp for all coredumping threadsJohn Ogness1-1/+1
0a1eb2d474ed ("fs/proc: Stop reporting eip and esp in /proc/PID/stat") stopped reporting eip/esp and fd7d56270b52 ("fs/proc: Report eip/esp in /prod/PID/stat for coredumping") reintroduced the feature to fix a regression with userspace core dump handlers (such as minicoredumper). Because PF_DUMPCORE is only set for the primary thread, this didn't fix the original problem for secondary threads. Allow reporting the eip/esp for all threads by checking for PF_EXITING as well. This is set for all the other threads when they are killed. coredump_wait() waits for all the tasks to become inactive before proceeding to invoke a core dumper. Link: http://lkml.kernel.org/r/87y32p7i7a.fsf@linutronix.de Link: http://lkml.kernel.org/r/20190522161614.628-1-jlu@pengutronix.de Fixes: fd7d56270b526ca3 ("fs/proc: Report eip/esp in /prod/PID/stat for coredumping") Signed-off-by: John Ogness <john.ogness@linutronix.de> Reported-by: Jan Luebbe <jlu@pengutronix.de> Tested-by: Jan Luebbe <jlu@pengutronix.de> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-28NFS/flexfiles: Use the correct TCP timeout for flexfiles I/OTrond Myklebust1-1/+1
Fix a typo where we're confusing the default TCP retrans value (NFS_DEF_TCP_RETRANS) for the default TCP timeout value. Fixes: 15d03055cf39f ("pNFS/flexfiles: Set reasonable default ...") Cc: stable@vger.kernel.org # 4.8+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-06-28cifs: fix crash querying symlinks stored as reparse-pointsRonnie Sahlberg2-5/+73
We never parsed/returned any data from .get_link() when the object is a windows reparse-point containing a symlink. This results in the VFS layer oopsing accessing an uninitialized buffer: ... [ 171.407172] Call Trace: [ 171.408039] readlink_copy+0x29/0x70 [ 171.408872] vfs_readlink+0xc1/0x1f0 [ 171.409709] ? readlink_copy+0x70/0x70 [ 171.410565] ? simple_attr_release+0x30/0x30 [ 171.411446] ? getname_flags+0x105/0x2a0 [ 171.412231] do_readlinkat+0x1b7/0x1e0 [ 171.412938] ? __ia32_compat_sys_newfstat+0x30/0x30 ... Fix this by adding code to handle these buffers and make sure we do return a valid buffer to .get_link() CC: Stable <stable@vger.kernel.org> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-06-28Merge tag 'for-linus-20190627' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linuxLinus Torvalds1-2/+1
Pull pidfd fixes from Christian Brauner: "Userspace tools and libraries such as strace or glibc need a cheap and reliable way to tell whether CLONE_PIDFD is supported. The easiest way is to pass an invalid fd value in the return argument, perform the syscall and verify the value in the return argument has been changed to a valid fd. However, if CLONE_PIDFD is specified we currently check if pidfd == 0 and return EINVAL if not. The check for pidfd == 0 was originally added to enable us to abuse the return argument for passing additional flags along with CLONE_PIDFD in the future. However, extending legacy clone this way would be a terrible idea and with clone3 on the horizon and the ability to reuse CLONE_DETACHED with CLONE_PIDFD there's no real need for this clutch. So remove the pidfd == 0 check and help userspace out. Also, accordig to Al, anon_inode_getfd() should only be used past the point of no failure and ksys_close() should not be used at all since it is far too easy to get wrong. Al's motto being "basically, once it's in descriptor table, it's out of your control". So Al's patch switches back to what we already had in v1 of the original patchset and uses a anon_inode_getfile() + put_user() + fd_install() sequence in the success path and a fput() + put_unused_fd() in the failure path. The other two changes should be trivial" * tag 'for-linus-20190627' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux: proc: remove useless d_is_dir() check copy_process(): don't use ksys_close() on cleanups samples: make pidfd-metadata fail gracefully on older kernels fork: don't check parent_tidptr with CLONE_PIDFD
2019-06-28Merge tag 'afs-fixes-20190620' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fsLinus Torvalds4-19/+25
Pull AFS fixes from David Howells: "The in-kernel AFS client has been undergoing testing on opendev.org on one of their mirror machines. They are using AFS to hold data that is then served via apache, and Ian Wienand had reported seeing oopses, spontaneous machine reboots and updates to volumes going missing. This patch series appears to have fixed the problem, very probably due to patch (2), but it's not 100% certain. (1) Fix the printing of the "vnode modified" warning to exclude checks on files for which we don't have a callback promise from the server (and so don't expect the server to tell us when it changes). Without this, for every file or directory for which we still have an in-core inode that gets changed on the server, we may get a message logged when we next look at it. This can happen in bulk if, for instance, someone does "vos release" to update a R/O volume from a R/W volume and a whole set of files are all changed together. We only really want to log a message if the file changed and the server didn't tell us about it or we failed to track the state internally. (2) Fix accidental corruption of either afs_vlserver struct objects or the the following memory locations (which could hold anything). The issue is caused by a union that points to two different structs in struct afs_call (to save space in the struct). The call cleanup code assumes that it can simply call the cleanup for one of those structs if not NULL - when it might be actually pointing to the other struct. This means that every Volume Location RPC op is going to corrupt something. (3) Fix an uninitialised spinlock. This isn't too bad, it just causes a one-off warning if lockdep is enabled when "vos release" is called, but the spinlock still behaves correctly. (4) Fix the setting of i_block in the inode. This causes du, for example, to produce incorrect results, but otherwise should not be dangerous to the kernel" * tag 'afs-fixes-20190620' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Fix setting of i_blocks afs: Fix uninitialised spinlock afs_volume::cb_break_lock afs: Fix vlserver record corruption afs: Fix over zealous "vnode modified" warnings
2019-06-27keys: Replace uid/gid/perm permissions checking with an ACLDavid Howells10-21/+78
Replace the uid/gid/perm permissions checking on a key with an ACL to allow the SETATTR and SEARCH permissions to be split. This will also allow a greater range of subjects to represented. ============ WHY DO THIS? ============ The problem is that SETATTR and SEARCH cover a slew of actions, not all of which should be grouped together. For SETATTR, this includes actions that are about controlling access to a key: (1) Changing a key's ownership. (2) Changing a key's security information. (3) Setting a keyring's restriction. And actions that are about managing a key's lifetime: (4) Setting an expiry time. (5) Revoking a key. and (proposed) managing a key as part of a cache: (6) Invalidating a key. Managing a key's lifetime doesn't really have anything to do with controlling access to that key. Expiry time is awkward since it's more about the lifetime of the content and so, in some ways goes better with WRITE permission. It can, however, be set unconditionally by a process with an appropriate authorisation token for instantiating a key, and can also be set by the key type driver when a key is instantiated, so lumping it with the access-controlling actions is probably okay. As for SEARCH permission, that currently covers: (1) Finding keys in a keyring tree during a search. (2) Permitting keyrings to be joined. (3) Invalidation. But these don't really belong together either, since these actions really need to be controlled separately. Finally, there are number of special cases to do with granting the administrator special rights to invalidate or clear keys that I would like to handle with the ACL rather than key flags and special checks. =============== WHAT IS CHANGED =============== The SETATTR permission is split to create two new permissions: (1) SET_SECURITY - which allows the key's owner, group and ACL to be changed and a restriction to be placed on a keyring. (2) REVOKE - which allows a key to be revoked. The SEARCH permission is split to create: (1) SEARCH - which allows a keyring to be search and a key to be found. (2) JOIN - which allows a keyring to be joined as a session keyring. (3) INVAL - which allows a key to be invalidated. The WRITE permission is also split to create: (1) WRITE - which allows a key's content to be altered and links to be added, removed and replaced in a keyring. (2) CLEAR - which allows a keyring to be cleared completely. This is split out to make it possible to give just this to an administrator. (3) REVOKE - see above. Keys acquire ACLs which consist of a series of ACEs, and all that apply are unioned together. An ACE specifies a subject, such as: (*) Possessor - permitted to anyone who 'possesses' a key (*) Owner - permitted to the key owner (*) Group - permitted to the key group (*) Everyone - permitted to everyone Note that 'Other' has been replaced with 'Everyone' on the assumption that you wouldn't grant a permit to 'Other' that you wouldn't also grant to everyone else. Further subjects may be made available by later patches. The ACE also specifies a permissions mask. The set of permissions is now: VIEW Can view the key metadata READ Can read the key content WRITE Can update/modify the key content SEARCH Can find the key by searching/requesting LINK Can make a link to the key SET_SECURITY Can change owner, ACL, expiry INVAL Can invalidate REVOKE Can revoke JOIN Can join this keyring CLEAR Can clear this keyring The KEYCTL_SETPERM function is then deprecated. The KEYCTL_SET_TIMEOUT function then is permitted if SET_SECURITY is set, or if the caller has a valid instantiation auth token. The KEYCTL_INVALIDATE function then requires INVAL. The KEYCTL_REVOKE function then requires REVOKE. The KEYCTL_JOIN_SESSION_KEYRING function then requires JOIN to join an existing keyring. The JOIN permission is enabled by default for session keyrings and manually created keyrings only. ====================== BACKWARD COMPATIBILITY ====================== To maintain backward compatibility, KEYCTL_SETPERM will translate the permissions mask it is given into a new ACL for a key - unless KEYCTL_SET_ACL has been called on that key, in which case an error will be returned. It will convert possessor, owner, group and other permissions into separate ACEs, if each portion of the mask is non-zero. SETATTR permission turns on all of INVAL, REVOKE and SET_SECURITY. WRITE permission turns on WRITE, REVOKE and, if a keyring, CLEAR. JOIN is turned on if a keyring is being altered. The KEYCTL_DESCRIBE function translates the ACL back into a permissions mask to return depending on possessor, owner, group and everyone ACEs. It will make the following mappings: (1) INVAL, JOIN -> SEARCH (2) SET_SECURITY -> SETATTR (3) REVOKE -> WRITE if SETATTR isn't already set (4) CLEAR -> WRITE Note that the value subsequently returned by KEYCTL_DESCRIBE may not match the value set with KEYCTL_SETATTR. ======= TESTING ======= This passes the keyutils testsuite for all but a couple of tests: (1) tests/keyctl/dh_compute/badargs: The first wrong-key-type test now returns EOPNOTSUPP rather than ENOKEY as READ permission isn't removed if the type doesn't have ->read(). You still can't actually read the key. (2) tests/keyctl/permitting/valid: The view-other-permissions test doesn't work as Other has been replaced with Everyone in the ACL. Signed-off-by: David Howells <dhowells@redhat.com>
2019-06-27keys: Pass the network namespace into request_key mechanismDavid Howells5-8/+12
Create a request_key_net() function and use it to pass the network namespace domain tag into DNS revolver keys and rxrpc/AFS keys so that keys for different domains can coexist in the same keyring. Signed-off-by: David Howells <dhowells@redhat.com> cc: netdev@vger.kernel.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: linux-afs@lists.infradead.org
2019-06-27ceph: fix ceph_mdsc_build_path to not stop on first componentJeff Layton1-1/+2
When ceph_mdsc_build_path is handed a positive dentry, it will return a zero-length path string with the base set to that dentry. This is not what we want. Always include at least one path component in the string. ceph_mdsc_build_path has behaved this way for a long time but it didn't matter until recent d_name handling rework. Fixes: 964fff7491e4 ("ceph: use ceph_mdsc_build_path instead of clone_dentry_name") Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-06-27proc: remove useless d_is_dir() checkChristian Brauner1-2/+1
Remove the d_is_dir() check from tgid_pidfd_to_pid(). It is pointless since you should never get &proc_tgid_base_operations for f_op on a non-directory. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christian Brauner <christian@brauner.io>
2019-06-24Merge tag 'v5.2-rc6' into sched/core, to refresh the branchIngo Molnar55-265/+216
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-21Merge tag 'nfs-for-5.2-3' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds1-9/+11
Pull more NFS client fixes from Anna Schumaker: "These are mostly refcounting issues that people have found recently. The revert fixes a suspend recovery performance issue. - SUNRPC: Fix a credential refcount leak - Revert "SUNRPC: Declare RPC timers as TIMER_DEFERRABLE" - SUNRPC: Fix xps refcount imbalance on the error path - NFS4: Only set creation opendata if O_CREAT" * tag 'nfs-for-5.2-3' of git://git.linux-nfs.org/projects/anna/linux-nfs: SUNRPC: Fix a credential refcount leak Revert "SUNRPC: Declare RPC timers as TIMER_DEFERRABLE" net :sunrpc :clnt :Fix xps refcount imbalance on the error path NFS4: Only set creation opendata if O_CREAT
2019-06-21io_uring: ensure req->file is cleared on allocationJens Axboe1-3/+2
Stephen reports: I hit the following General Protection Fault when testing io_uring via the io_uring engine in fio. This was on a VM running 5.2-rc5 and the latest version of fio. The issue occurs for both null_blk and fake NVMe drives. I have not tested bare metal or real NVMe SSDs. The fio script used is given below. [io_uring] time_based=1 runtime=60 filename=/dev/nvme2n1 (note /dev/nullb0 also fails) ioengine=io_uring bs=4k rw=readwrite direct=1 fixedbufs=1 sqthread_poll=1 sqthread_poll_cpu=0 general protection fault: 0000 [#1] SMP PTI CPU: 0 PID: 872 Comm: io_uring-sq Not tainted 5.2.0-rc5-cpacket-io-uring #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 RIP: 0010:fput_many+0x7/0x90 Code: 01 48 85 ff 74 17 55 48 89 e5 53 48 8b 1f e8 a0 f9 ff ff 48 85 db 48 89 df 75 f0 5b 5d f3 c3 0f 1f 40 00 0f 1f 44 00 00 89 f6 <f0> 48 29 77 38 74 01 c3 55 48 89 e5 53 48 89 fb 65 48 \ RSP: 0018:ffffadeb817ebc50 EFLAGS: 00010246 RAX: 0000000000000004 RBX: ffff8f46ad477480 RCX: 0000000000001805 RDX: 0000000000000000 RSI: 0000000000000001 RDI: f18b51b9a39552b5 RBP: ffffadeb817ebc58 R08: ffff8f46b7a318c0 R09: 000000000000015d R10: ffffadeb817ebce8 R11: 0000000000000020 R12: ffff8f46ad4cd000 R13: 00000000fffffff7 R14: ffffadeb817ebe30 R15: 0000000000000004 FS: 0000000000000000(0000) GS:ffff8f46b7a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055828f0bbbf0 CR3: 0000000232176004 CR4: 00000000003606f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? fput+0x13/0x20 io_free_req+0x20/0x40 io_put_req+0x1b/0x20 io_submit_sqe+0x40a/0x680 ? __switch_to_asm+0x34/0x70 ? __switch_to_asm+0x40/0x70 io_submit_sqes+0xb9/0x160 ? io_submit_sqes+0xb9/0x160 ? __switch_to_asm+0x40/0x70 ? __switch_to_asm+0x34/0x70 ? __schedule+0x3f2/0x6a0 ? __switch_to_asm+0x34/0x70 io_sq_thread+0x1af/0x470 ? __switch_to_asm+0x34/0x70 ? wait_woken+0x80/0x80 ? __switch_to+0x85/0x410 ? __switch_to_asm+0x40/0x70 ? __switch_to_asm+0x34/0x70 ? __schedule+0x3f2/0x6a0 kthread+0x105/0x140 ? io_submit_sqes+0x160/0x160 ? kthread+0x105/0x140 ? io_submit_sqes+0x160/0x160 ? kthread_destroy_worker+0x50/0x50 ret_from_fork+0x35/0x40 which occurs because using a kernel side submission thread isn't valid without using fixed files (registered through io_uring_register()). This causes io_uring to put the request after logging an error, but before the file field is set in the request. If it happens to be non-zero, we attempt to fput() garbage. Fix this by ensuring that req->file is initialized when the request is allocated. Cc: stable@vger.kernel.org # 5.1+ Reported-by: Stephen Bates <sbates@raithlin.com> Tested-by: Stephen Bates <sbates@raithlin.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-21NFS4: Only set creation opendata if O_CREATBenjamin Coddington1-9/+11
We can end up in nfs4_opendata_alloc during task exit, in which case current->fs has already been cleaned up. This leads to a crash in current_umask(). Fix this by only setting creation opendata if we are actually doing an open with O_CREAT. We can drop the check for NULL nfs4_open_createattrs, since O_CREAT will never be set for the recovery path. Suggested-by: Trond Myklebust <trondmy@hammerspace.com> Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-06-21Merge tag 'spdx-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdxLinus Torvalds36-154/+35
Pull still more SPDX updates from Greg KH: "Another round of SPDX updates for 5.2-rc6 Here is what I am guessing is going to be the last "big" SPDX update for 5.2. It contains all of the remaining GPLv2 and GPLv2+ updates that were "easy" to determine by pattern matching. The ones after this are going to be a bit more difficult and the people on the spdx list will be discussing them on a case-by-case basis now. Another 5000+ files are fixed up, so our overall totals are: Files checked: 64545 Files with SPDX: 45529 Compared to the 5.1 kernel which was: Files checked: 63848 Files with SPDX: 22576 This is a huge improvement. Also, we deleted another 20000 lines of boilerplate license crud, always nice to see in a diffstat" * tag 'spdx-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: (65 commits) treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 507 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 506 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 505 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 504 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 503 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 502 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 501 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 499 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 498 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 497 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 496 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 495 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 491 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 490 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 489 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 488 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 487 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 486 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 485 ...
2019-06-21Merge tag '5.2-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds6-4/+24
Pull cifs fixes from Steve French: "Four small SMB3 fixes, all for stable" * tag '5.2-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix GlobalMid_Lock bug in cifs_reconnect SMB3: retry on STATUS_INSUFFICIENT_RESOURCES instead of failing write cifs: add spinlock for the openFileList to cifsInodeInfo cifs: fix panic in smb2_reconnect
2019-06-20Merge tag 'ovl-fixes-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfsLinus Torvalds5-59/+90
Pull overlayfs fixes from Miklos Szeredi: "Fix two regressions in this cycle, and a couple of older bugs" * tag 'ovl-fixes-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: make i_ino consistent with st_ino in more cases ovl: fix typo in MODULE_PARM_DESC ovl: fix bogus -Wmaybe-unitialized warning ovl: don't fail with disconnected lower NFS ovl: fix wrong flags check in FS_IOC_FS[SG]ETXATTR ioctls
2019-06-20Merge tag 'fuse-fixes-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuseLinus Torvalds1-10/+0
Pull fuse fix from Miklos Szeredi: "Just a single revert, fixing a regression in -rc1" * tag 'fuse-fixes-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: Revert "fuse: require /dev/fuse reads to have enough buffer capacity"