aboutsummaryrefslogtreecommitdiffstats
path: root/fs (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-02-21proc, oom: do not report alien mms when setting oom_score_adjMichal Hocko1-4/+0
Tetsuo has reported that creating a thousands of processes sharing MM without SIGHAND (aka alien threads) and setting /proc/<pid>/oom_score_adj will swamp the kernel log and takes ages [1] to finish. This is especially worrisome that all that printing is done under RCU lock and this can potentially trigger RCU stall or softlockup detector. The primary reason for the printk was to catch potential users who might depend on the behavior prior to 44a70adec910 ("mm, oom_adj: make sure processes sharing mm have same view of oom_score_adj") but after more than 2 years without a single report I guess it is safe to simply remove the printk altogether. The next step should be moving oom_score_adj over to the mm struct and remove all the tasks crawling as suggested by [2] [1] http://lkml.kernel.org/r/97fce864-6f75-bca5-14bc-12c9f890e740@i-love.sakura.ne.jp [2] http://lkml.kernel.org/r/20190117155159.GA4087@dhcp22.suse.cz Link: http://lkml.kernel.org/r/20190212102129.26288-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Cc: Yong-Taek Lee <ytk.lee@samsung.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-20orangefs: remove two un-needed BUG_ONs...Mike Marshall1-4/+0
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-02-20Merge branch 'fixes-v5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-securityLinus Torvalds1-14/+17
Pull keys fixes from James Morris: - Handle quotas better, allowing full quota to be reached. - Fix the creation of shortcuts in the assoc_array internal representation when the index key needs to be an exact multiple of the machine word size. - Fix a dependency loop between the request_key contruction record and the request_key authentication key. The construction record isn't really necessary and can be dispensed with. - Set the timestamp on a new key rather than leaving it as 0. This would ordinarily be fine - provided the system clock is never set to a time before 1970 * 'fixes-v5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: keys: Timestamp new keys keys: Fix dependency loop between construction record and auth key assoc_array: Fix shortcut creation KEYS: allow reaching the keys quotas exactly
2019-02-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller3-16/+56
Two easily resolvable overlapping change conflicts, one in TCP and one in the eBPF verifier. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-18exec: Fix mem leak in kernel_read_fileYueHaibing1-1/+1
syzkaller report this: BUG: memory leak unreferenced object 0xffffc9000488d000 (size 9195520): comm "syz-executor.0", pid 2752, jiffies 4294787496 (age 18.757s) hex dump (first 32 bytes): ff ff ff ff ff ff ff ff a8 00 00 00 01 00 00 00 ................ 02 00 00 00 00 00 00 00 80 a1 7a c1 ff ff ff ff ..........z..... backtrace: [<000000000863775c>] __vmalloc_node mm/vmalloc.c:1795 [inline] [<000000000863775c>] __vmalloc_node_flags mm/vmalloc.c:1809 [inline] [<000000000863775c>] vmalloc+0x8c/0xb0 mm/vmalloc.c:1831 [<000000003f668111>] kernel_read_file+0x58f/0x7d0 fs/exec.c:924 [<000000002385813f>] kernel_read_file_from_fd+0x49/0x80 fs/exec.c:993 [<0000000011953ff1>] __do_sys_finit_module+0x13b/0x2a0 kernel/module.c:3895 [<000000006f58491f>] do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290 [<00000000ee78baf4>] entry_SYSCALL_64_after_hwframe+0x49/0xbe [<00000000241f889b>] 0xffffffffffffffff It should goto 'out_free' lable to free allocated buf while kernel_read fails. Fixes: 39d637af5aa7 ("vfs: forbid write access when reading a file into memory") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-02-18exec: load_script: Do not exec truncated interpreter pathKees Cook1-9/+48
Commit 8099b047ecc4 ("exec: load_script: don't blindly truncate shebang string") was trying to protect against a confused exec of a truncated interpreter path. However, it was overeager and also refused to truncate arguments as well, which broke userspace, and it was reverted. This attempts the protection again, but allows arguments to remain truncated. In an effort to improve readability, helper functions and comments have been added. Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Samuel Dionne-Riel <samuel@dionne-riel.com> Cc: Richard Weinberger <richard.weinberger@gmail.com> Cc: Graham Christensen <graham@grahamc.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-18ceph: avoid repeatedly adding inode to mdsc->snap_flush_listYan, Zheng1-1/+2
Otherwise, mdsc->snap_flush_list may get corrupted. Cc: stable@vger.kernel.org Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-02-16Merge tag 'nfsd-5.0-2' of git://linux-nfs.org/~bfields/linuxLinus Torvalds1-2/+2
Pull more nfsd fixes from Bruce Fields: "Two small fixes, one for crashes using nfs/krb5 with older enctypes, one that could prevent clients from reclaiming state after a kernel upgrade" * tag 'nfsd-5.0-2' of git://linux-nfs.org/~bfields/linux: sunrpc: fix 4 more call sites that were using stack memory with a scatterlist Revert "nfsd4: return default lease period"
2019-02-16Merge tag 'nfs-for-5.0-4' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds1-5/+6
Pull more NFS client fixes from Anna Schumaker: "Three fixes this time. Nicolas's is for xprtrdma completion vector allocation on single-core systems. Greg's adds an error check when allocating a debugfs dentry. And Ben's is an additional fix for nfs_page_async_flush() to prevent pages from accidentally getting truncated. Summary: - Make sure Send CQ is allocated on an existing compvec - Properly check debugfs dentry before using it - Don't use page_file_mapping() after removing a page" * tag 'nfs-for-5.0-4' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFS: Don't use page_file_mapping after removing the page rpc: properly check debugfs dentry before using it xprtrdma: Make sure Send CQ is allocated on an existing compvec
2019-02-15keys: Fix dependency loop between construction record and auth keyDavid Howells1-14/+17
In the request_key() upcall mechanism there's a dependency loop by which if a key type driver overrides the ->request_key hook and the userspace side manages to lose the authorisation key, the auth key and the internal construction record (struct key_construction) can keep each other pinned. Fix this by the following changes: (1) Killing off the construction record and using the auth key instead. (2) Including the operation name in the auth key payload and making the payload available outside of security/keys/. (3) The ->request_key hook is given the authkey instead of the cons record and operation name. Changes (2) and (3) allow the auth key to naturally be cleaned up if the keyring it is in is destroyed or cleared or the auth key is unlinked. Fixes: 7ee02a316600 ("keys: Fix dependency loop between construction record and auth key") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <james.morris@microsoft.com>
2019-02-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller14-230/+168
The netfilter conflicts were rather simple overlapping changes. However, the cls_tcindex.c stuff was a bit more complex. On the 'net' side, Cong is fixing several races and memory leaks. Whilst on the 'net-next' side we have Vlad adding the rtnl-ness support. What I've decided to do, in order to resolve this, is revert the conversion over to using a workqueue that Cong did, bringing us back to pure RCU. I did it this way because I believe that either Cong's races don't apply with have Vlad did things, or Cong will have to implement the race fix slightly differently. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-14Revert "exec: load_script: don't blindly truncate shebang string"Linus Torvalds1-7/+3
This reverts commit 8099b047ecc431518b9bb6bdbba3549bbecdc343. It turns out that people do actually depend on the shebang string being truncated, and on the fact that an interpreter (like perl) will often just re-interpret it entirely to get the full argument list. Reported-by: Samuel Dionne-Riel <samuel@dionne-riel.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-14Merge branch 'linus' into irq/coreThomas Gleixner50-232/+551
Pick up upstream changes to avoid conflicts for pending patches.
2019-02-14Revert "gfs2: read journal in large chunks to locate the head"Bob Peterson8-192/+134
This reverts commit 2a5f14f279f59143139bcd1606903f2f80a34241. This patch causes xfstests generic/311 to fail. Reverting this for now until we have a proper fix. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-14Revert "nfsd4: return default lease period"J. Bruce Fields1-2/+2
This reverts commit d6ebf5088f09472c1136cd506bdc27034a6763f8. I forgot that the kernel's default lease period should never be decreased! After a kernel upgrade, the kernel has no way of knowing on its own what the previous lease time was. Unless userspace tells it otherwise, it will assume the previous lease period was the same. So if we decrease this value in a kernel upgrade, we end up enforcing a grace period that's too short, and clients will fail to reclaim state in time. Symptoms may include EIO and log messages like "NFS: nfs4_reclaim_open_state: Lock reclaim failed!" There was no real justification for the lease period decrease anyway. Reported-by: Donald Buczek <buczek@molgen.mpg.de> Fixes: d6ebf5088f09 "nfsd4: return default lease period" Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-02-12Merge branch 'akpm' (patches from Andrew)Linus Torvalds2-13/+16
Merge fixes from Andrew Morton: "6 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: proc: smaps_rollup: fix pss_locked calculation Rename include/{uapi => }/asm-generic/shmparam.h really Revert "mm: use early_pfn_to_nid in page_ext_init" mm/gup: fix gup_pmd_range() for dax Revert "mm: slowly shrink slabs with a relatively small number of objects" Revert "mm: don't reclaim inodes with many attached pages"
2019-02-12mm: proc: smaps_rollup: fix pss_locked calculationSandeep Patil1-8/+14
The 'pss_locked' field of smaps_rollup was being calculated incorrectly. It accumulated the current pss everytime a locked VMA was found. Fix that by adding to 'pss_locked' the same time as that of 'pss' if the vma being walked is locked. Link: http://lkml.kernel.org/r/20190203065425.14650-1-sspatil@android.com Fixes: 493b0e9d945f ("mm: add /proc/pid/smaps_rollup") Signed-off-by: Sandeep Patil <sspatil@android.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Daniel Colascione <dancol@google.com> Cc: <stable@vger.kernel.org> [4.14.x, 4.19.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-12Revert "mm: don't reclaim inodes with many attached pages"Dave Chinner1-5/+2
This reverts commit a76cf1a474d7d ("mm: don't reclaim inodes with many attached pages"). This change causes serious changes to page cache and inode cache behaviour and balance, resulting in major performance regressions when combining worklaods such as large file copies and kernel compiles. https://bugzilla.kernel.org/show_bug.cgi?id=202441 This change is a hack to work around the problems introduced by changing how agressive shrinkers are on small caches in commit 172b06c32b94 ("mm: slowly shrink slabs with a relatively small number of objects"). It creates more problems than it solves, wasn't adequately reviewed or tested, so it needs to be reverted. Link: http://lkml.kernel.org/r/20190130041707.27750-2-david@fromorbit.com Fixes: a76cf1a474d7d ("mm: don't reclaim inodes with many attached pages") Signed-off-by: Dave Chinner <dchinner@redhat.com> Cc: Wolfgang Walter <linux@stwm.de> Cc: Roman Gushchin <guro@fb.com> Cc: Spock <dairinin@gmail.com> Cc: Rik van Riel <riel@surriel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-12NFS: Don't use page_file_mapping after removing the pageBenjamin Coddington1-5/+6
If nfs_page_async_flush() removes the page from the mapping, then we can't use page_file_mapping() on it as nfs_updatepate() is wont to do when receiving an error. Instead, push the mapping to the stack before the page is possibly truncated. Fixes: 8fc75bed96bb ("NFS: Fix up return value on fatal errors in nfs_page_async_flush()") Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-12Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4Linus Torvalds1-9/+4
Pull ext4 fix from Ted Ts'o: "Revert a commit which landed in v5.0-rc1 since it makes fsync in ext4 nojournal mode unsafe" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: Revert "ext4: use ext4_write_inode() when fsyncing w/o a journal"
2019-02-11Merge 5.0-rc6 into driver-core-nextGreg Kroah-Hartman56-258/+613
We need the debugfs fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-11Merge tag 'v5.0-rc6' into sched/core, to pick up fixesIngo Molnar43-205/+496
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-10proc/stat: Make the interrupt statistics more efficientThomas Gleixner1-3/+26
Waiman reported that on large systems with a large amount of interrupts the readout of /proc/stat takes a long time to sum up the interrupt statistics. In principle this is not a problem. but for unknown reasons some enterprise quality software reads /proc/stat with a high frequency. The reason for this is that interrupt statistics are accounted per cpu. So the /proc/stat logic has to sum up the interrupt stats for each interrupt. The interrupt core provides now a per interrupt summary counter which can be used to avoid the summation loops completely except for interrupts marked PER_CPU which are only a small fraction of the interrupt space if at all. Another simplification is to iterate only over the active interrupts and skip the potentially large gaps in the interrupt number space and just print zeros for the gaps without going into the interrupt core in the first place. Waiman provided test results from a 4-socket IvyBridge-EX system (60-core 120-thread, 3016 irqs) excuting a test program which reads /proc/stat 50,000 times: Before: 18.436s (sys 18.380s) After: 3.769s (sys 3.742s) Reported-by: Waiman Long <longman@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Waiman Long <longman@redhat.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Cc: Matthew Wilcox <willy@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: linux-fsdevel@vger.kernel.org Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Daniel Colascione <dancol@google.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Randy Dunlap <rdunlap@infradead.org> Link: https://lkml.kernel.org/r/20190208135021.013828701@linutronix.de
2019-02-10Merge tag 'y2038-new-syscalls' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground into timers/2038Thomas Gleixner4-14/+14
Pull y2038 - time64 system calls from Arnd Bergmann: This series finally gets us to the point of having system calls with 64-bit time_t on all architectures, after a long time of incremental preparation patches. There was actually one conversion that I missed during the summer, i.e. Deepa's timex series, which I now updated based the 5.0-rc1 changes and review comments. The following system calls are now added on all 32-bit architectures using the same system call numbers: 403 clock_gettime64 404 clock_settime64 405 clock_adjtime64 406 clock_getres_time64 407 clock_nanosleep_time64 408 timer_gettime64 409 timer_settime64 410 timerfd_gettime64 411 timerfd_settime64 412 utimensat_time64 413 pselect6_time64 414 ppoll_time64 416 io_pgetevents_time64 417 recvmmsg_time64 418 mq_timedsend_time64 419 mq_timedreceiv_time64 420 semtimedop_time64 421 rt_sigtimedwait_time64 422 futex_time64 423 sched_rr_get_interval_time64 Each one of these corresponds directly to an existing system call that includes a 'struct timespec' argument, or a structure containing a timespec or (in case of clock_adjtime) timeval. Not included here are new versions of getitimer/setitimer and getrusage/waitid, which are planned for the future but only needed to make a consistent API rather than for correct operation beyond y2038. These four system calls are based on 'timeval', and it has not been finally decided what the replacement kernel interface will use instead. So far, I have done a lot of build testing across most architectures, which has found a number of bugs. Runtime testing so far included testing LTP on 32-bit ARM with the existing system calls, to ensure we do not regress for existing binaries, and a test with a 32-bit x86 build of LTP against a modified version of the musl C library that has been adapted to the new system call interface [3]. This library can be used for testing on all architectures supported by musl-1.1.21, but it is not how the support is getting integrated into the official musl release. Official musl support is planned but will require more invasive changes to the library. Link: https://lore.kernel.org/lkml/20190110162435.309262-1-arnd@arndb.de/T/ Link: https://lore.kernel.org/lkml/20190118161835.2259170-1-arnd@arndb.de/ Link: https://git.linaro.org/people/arnd/musl-y2038.git/ [2]
2019-02-09Merge tag 'for-linus-20190209' of git://git.kernel.dk/linux-blockLinus Torvalds2-9/+11
Pull block fixes from Jens Axboe: - NVMe pull request from Christoph, fixing namespace locking when dealing with the effects log, and a rapid add/remove issue (Keith) - blktrace tweak, ensuring requests with -1 sectors are shown (Jan) - link power management quirk for a Smasung SSD (Hans) - m68k nfblock dynamic major number fix (Chengguang) - series fixing blk-iolatency inflight counter issue (Liu) - ensure that we clear ->private when setting up the aio kiocb (Mike) - __find_get_block_slow() rate limit print (Tetsuo) * tag 'for-linus-20190209' of git://git.kernel.dk/linux-block: blk-mq: remove duplicated definition of blk_mq_freeze_queue Blk-iolatency: warn on negative inflight IO counter blk-iolatency: fix IO hang due to negative inflight counter blktrace: Show requests without sector fs: ratelimit __find_get_block_slow() failure message. m68k: set proper major_num when specifying module param major_num libata: Add NOLPM quirk for SAMSUNG MZ7TE512HMHP-000L1 SSD nvme-pci: fix rapid add remove sequence nvme: lock NS list changes while handling command effects aio: initialize kiocb private in case any filesystems expect it.
2019-02-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller28-110/+302
An ipvlan bug fix in 'net' conflicted with the abstraction away of the IPV6 specific support in 'net-next'. Similarly, a bug fix for mlx5 in 'net' conflicted with the flow action conversion in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08Merge tag 'driver-core-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-coreLinus Torvalds1-12/+24
Pull driver core fixes from Greg KH: "Here are some driver core fixes for 5.0-rc6. Well, not so much "driver core" as "debugfs". There's a lot of outstanding debugfs cleanup patches coming in through different subsystem trees, and in that process the debugfs core was found that it really should return errors when something bad happens, to prevent random files from showing up in the root of debugfs afterward. So debugfs was fixed up to handle this properly, and then two fixes for the relay and blk-mq code was needed as it was making invalid assumptions about debugfs return values. There's also a cacheinfo fix in here that resolves a tiny issue. All of these have been in linux-next for over a week with no reported problems" * tag 'driver-core-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: blk-mq: protect debugfs_create_files() from failures relay: check return of create_buf_file() properly debugfs: debugfs_lookup() should return NULL if not found debugfs: return error values, not NULL debugfs: fix debugfs_rename parameter checking cacheinfo: Keep the old value if of_property_read_u32 fails
2019-02-08Merge tag 'xfs-5.0-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds3-5/+27
Pull xfs fixes from Darrick Wong: "Here are a handful of XFS fixes to fix a data corruption problem, a crasher bug, and a deadlock. Summary: - Fix cache coherency problem with writeback mappings - Fix buffer deadlock when shutting fs down - Fix a null pointer dereference when running online repair" * tag 'xfs-5.0-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: set buffer ops when repair probes for btree type xfs: end sync buffer I/O properly on shutdown error xfs: eof trim writeback mapping as soon as it is cached
2019-02-08kernfs: Allocating memory for kernfs_iattrs with kmem_cache.Ayush Mittal4-4/+9
Creating a new cache for kernfs_iattrs. Currently, memory is allocated with kzalloc() which always gives aligned memory. On ARM, this is 64 byte aligned. To avoid the wastage of memory in aligning the size requested, a new cache for kernfs_iattrs is created. Size of struct kernfs_iattrs is 80 Bytes. On ARM, it will come in kmalloc-128 slab. and it will come in kmalloc-192 slab if debug info is enabled. Extra bytes taken 48 bytes. Total number of objects created : 4096 Total saving = 48*4096 = 192 KB After creating new slab(When debug info is enabled) : sh-3.2# cat /proc/slabinfo ... kernfs_iattrs_cache 4069 4096 128 32 1 : tunables 0 0 0 : slabdata 128 128 0 ... All testing has been done on ARM target. Signed-off-by: Ayush Mittal <ayush.m@samsung.com> Signed-off-by: Vaneet Narang <v.narang@samsung.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-08sysfs: remove unused include of kernfs-internal.hOndrej Mosnacek1-1/+0
This include is not needed (fs/sysfs/file.c builds just fine without it). Remove it. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-07Merge tag 'nfsd-5.0-1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds1-2/+4
Pull nfsd fixes from Bruce Fields: "Two small nfsd bugfixes for 5.0, for an RDMA bug and a file clone bug" * tag 'nfsd-5.0-1' of git://linux-nfs.org/~bfields/linux: svcrdma: Remove max_sge check at connect time nfsd: Fix error return values for nfsd4_clone_file_range()
2019-02-07Merge tag 'fuse-fixes-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuseLinus Torvalds3-3/+5
Pull fuse fixes from Miklos Szeredi: "A fix for a CUSE regression introduced in v4.20, as well as fixes for a couple of old bugs" * tag 'fuse-fixes-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: decrement NR_WRITEBACK_TEMP on the right page fuse: call pipe_buf_release() under pipe lock cuse: fix ioctl fuse: handle zero sized retrieve correctly
2019-02-07y2038: syscalls: rename y2038 compat syscallsArnd Bergmann4-14/+14
A lot of system calls that pass a time_t somewhere have an implementation using a COMPAT_SYSCALL_DEFINEx() on 64-bit architectures, and have been reworked so that this implementation can now be used on 32-bit architectures as well. The missing step is to redefine them using the regular SYSCALL_DEFINEx() to get them out of the compat namespace and make it possible to build them on 32-bit architectures. Any system call that ends in 'time' gets a '32' suffix on its name for that version, while the others get a '_time32' suffix, to distinguish them from the normal version, which takes a 64-bit time argument in the future. In this step, only 64-bit architectures are changed, doing this rename first lets us avoid touching the 32-bit architectures twice. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-02-06nfsd: Fix error return values for nfsd4_clone_file_range()Trond Myklebust1-2/+4
If the parameter 'count' is non-zero, nfsd4_clone_file_range() will currently clobber all errors returned by vfs_clone_file_range() and replace them with EINVAL. Fixes: 42ec3d4c0218 ("vfs: make remap_file_range functions take and...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v4.20+ Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-02-06fs: ratelimit __find_get_block_slow() failure message.Tetsuo Handa1-9/+10
When something let __find_get_block_slow() hit all_mapped path, it calls printk() for 100+ times per a second. But there is no need to print same message with such high frequency; it is just asking for stall warning, or at least bloating log files. [ 399.866302][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8 [ 399.873324][T15342] b_state=0x00000029, b_size=512 [ 399.878403][T15342] device loop0 blocksize: 4096 [ 399.883296][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8 [ 399.890400][T15342] b_state=0x00000029, b_size=512 [ 399.895595][T15342] device loop0 blocksize: 4096 [ 399.900556][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8 [ 399.907471][T15342] b_state=0x00000029, b_size=512 [ 399.912506][T15342] device loop0 blocksize: 4096 This patch reduces frequency to up to once per a second, in addition to concatenating three lines into one. [ 399.866302][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8, b_state=0x00000029, b_size=512, device loop0 blocksize: 4096 Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-06aio: initialize kiocb private in case any filesystems expect it.Mike Marshall1-0/+1
A recent optimization had left private uninitialized. Fixes: 2bc4ca9bb600 ("aio: don't zero entire aio_kiocb aio_get_req()") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-04sched/core: Convert sighand_struct.count to refcount_tElena Reshetova2-3/+3
atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable sighand_struct.count is used as pure reference counter. Convert it to refcount_t and fix up the operations. ** Important note for maintainers: Some functions from refcount_t API defined in lib/refcount.c have different memory ordering guarantees than their atomic counterparts. The full comparison can be seen in https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon in state to be merged to the documentation tree. Normally the differences should not matter since refcount_t provides enough guarantees to satisfy the refcounting use cases, but in some rare cases it might matter. Please double check that you don't have some undocumented memory guarantees for this variable usage. For the sighand_struct.count it might make a difference in following places: - __cleanup_sighand: decrement in refcount_dec_and_test() only provides RELEASE ordering and control dependency on success vs. fully ordered atomic counterpart Suggested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: viro@zeniv.linux.org.uk Link: https://lkml.kernel.org/r/1547814450-18902-2-git-send-email-elena.reshetova@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-03xfs: set buffer ops when repair probes for btree typeDarrick J. Wong2-3/+24
In xrep_findroot_block, we work out the btree type and correctness of a given block by calling different btree verifiers on root block candidates. However, we leave the NULL b_ops while ->verify_read validates the block, which means that if the verifier calls xfs_buf_verifier_error it'll crash on the null b_ops. Fix it to set b_ops before calling the verifier and unsetting it if the verifier fails. Furthermore, improve the documentation around xfs_buf_ensure_ops, which is the function that is responsible for cleaning up the b_ops state of buffers that go through xrep_findroot_block but don't match anything. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-02-03xfs: end sync buffer I/O properly on shutdown errorBrian Foster1-2/+1
As of commit e339dd8d8b ("xfs: use sync buffer I/O for sync delwri queue submission"), the delwri submission code uses sync buffer I/O for sync delwri I/O. Instead of waiting on async I/O to unlock the buffer, it uses the underlying sync I/O completion mechanism. If delwri buffer submission fails due to a shutdown scenario, an error is set on the buffer and buffer completion never occurs. This can cause xfs_buf_delwri_submit() to deadlock waiting on a completion event. We could check the error state before waiting on such buffers, but that doesn't serialize against the case of an error set via a racing I/O completion. Instead, invoke I/O completion in the shutdown case regardless of buffer I/O type. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-03xfs: eof trim writeback mapping as soon as it is cachedBrian Foster1-0/+2
The cached writeback mapping is EOF trimmed to try and avoid races between post-eof block management and writeback that result in sending cached data to a stale location. The cached mapping is currently trimmed on the validation check, which leaves a race window between the time the mapping is cached and when it is trimmed against the current inode size. For example, if a new mapping is cached by delalloc conversion on a blocksize == page size fs, we could cycle various locks, perform memory allocations, etc. in the writeback codepath before the associated mapping is eventually trimmed to i_size. This leaves enough time for a post-eof truncate and file append before the cached mapping is trimmed. The former event essentially invalidates a range of the cached mapping and the latter bumps the inode size such the trim on the next writepage event won't trim all of the invalid blocks. fstest generic/464 reproduces this scenario occasionally and causes a lost writeback and stale delalloc blocks warning on inode inactivation. To work around this problem, trim the cached writeback mapping as soon as it is cached in addition to on subsequent validation checks. This is a minor tweak to tighten the race window as much as possible until a proper invalidation mechanism is available. Fixes: 40214d128e07 ("xfs: trim writepage mapping to within eof") Cc: <stable@vger.kernel.org> # v4.14+ Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-03socket: Rename SO_RCVTIMEO/ SO_SNDTIMEO with _OLD suffixesDeepa Dinamani1-2/+2
SO_RCVTIMEO and SO_SNDTIMEO socket options use struct timeval as the time format. struct timeval is not y2038 safe. The subsequent patches in the series add support for new socket timeout options with _NEW suffix that will use y2038 safe data structures. Although the existing struct timeval layout is sufficiently wide to represent timeouts, because of the way libc will interpret time_t based on user defined flag, these new flags provide a way of having a structure that is the same for all architectures consistently. Rename the existing options with _OLD suffix forms so that the right option is enabled for userspace applications according to the architecture and time_t definition of libc. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Cc: ccaulfie@redhat.com Cc: deller@gmx.de Cc: paulus@samba.org Cc: ralf@linux-mips.org Cc: rth@twiddle.net Cc: cluster-devel@redhat.com Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-alpha@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-mips@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03Merge tag 'for-5.0-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linuxLinus Torvalds4-38/+71
Pull btrfs fixes from David Sterba: - regression fix: transaction commit can run away due to delayed ref waiting heuristic, this is not necessary now because of the proper reservation mechanism introduced in 5.0 - regression fix: potential crash due to use-before-check of an ERR_PTR return value - fix for transaction abort during transaction commit that needs to properly clean up pending block groups - fix deadlock during b-tree node/leaf splitting, when this happens on some of the fundamental trees, we must prevent new tree block allocation to re-enter indirectly via the block group flushing path - potential memory leak after errors during mount * tag 'for-5.0-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: On error always free subvol_name in btrfs_mount btrfs: clean up pending block groups when transaction commit aborts btrfs: fix potential oops in device_list_add btrfs: don't end the transaction for delayed refs in throttle Btrfs: fix deadlock when allocating tree block during leaf/node split
2019-02-02Merge branch 'akpm' (patches from Andrew)Linus Torvalds6-4/+36
Merge misc fixes from Andrew Morton: "24 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (24 commits) autofs: fix error return in autofs_fill_super() autofs: drop dentry reference only when it is never used fs/drop_caches.c: avoid softlockups in drop_pagecache_sb() mm: migrate: don't rely on __PageMovable() of newpage after unlocking it psi: clarify the Kconfig text for the default-disable option mm, memory_hotplug: __offline_pages fix wrong locking mm: hwpoison: use do_send_sig_info() instead of force_sig() kasan: mark file common so ftrace doesn't trace it init/Kconfig: fix grammar by moving a closing parenthesis lib/test_kmod.c: potential double free in error handling mm, oom: fix use-after-free in oom_kill_process mm/hotplug: invalid PFNs from pfn_to_online_page() mm,memory_hotplug: fix scan_movable_pages() for gigantic hugepages psi: fix aggregation idle shut-off mm, memory_hotplug: test_pages_in_a_zone do not pass the end of zone mm, memory_hotplug: is_mem_section_removable do not pass the end of a zone oom, oom_reaper: do not enqueue same task twice mm: migrate: make buffer_migrate_page_norefs() actually succeed kernel/exit.c: release ptraced tasks before zap_pid_ns_processes x86_64: increase stack size for KASAN_EXTRA ...
2019-02-01Merge tag '5.0-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds5-29/+61
Pull smb3 fixes from Steve French: "SMB3 fixes, some from this week's SMB3 test evemt, 5 for stable and a particularly important one for queryxattr (see xfstests 70 and 117)" * tag '5.0-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: update internal module version number CIFS: fix use-after-free of the lease keys CIFS: Do not consider -ENODATA as stat failure for reads CIFS: Do not count -ENODATA as failure for query directory CIFS: Fix trace command logging for SMB2 reads and writes CIFS: Fix possible oops and memory leaks in async IO cifs: limit amount of data we request for xattrs to CIFSMaxBufSize cifs: fix computation for MAX_SMB2_HDR_SIZE
2019-02-01autofs: fix error return in autofs_fill_super()Ian Kent1-1/+3
In autofs_fill_super() on error of get inode/make root dentry the return should be ENOMEM as this is the only failure case of the called functions. Link: http://lkml.kernel.org/r/154725123240.11260.796773942606871359.stgit@pluto-themaw-net Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01autofs: drop dentry reference only when it is never usedPan Bian1-1/+2
autofs_expire_run() calls dput(dentry) to drop the reference count of dentry. However, dentry is read via autofs_dentry_ino(dentry) after that. This may result in a use-free-bug. The patch drops the reference count of dentry only when it is never used. Link: http://lkml.kernel.org/r/154725122396.11260.16053424107144453867.stgit@pluto-themaw-net Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()Jan Kara1-1/+7
When superblock has lots of inodes without any pagecache (like is the case for /proc), drop_pagecache_sb() will iterate through all of them without dropping sb->s_inode_list_lock which can lead to softlockups (one of our customers hit this). Fix the problem by going to the slow path and doing cond_resched() in case the process needs rescheduling. Link: http://lkml.kernel.org/r/20190114085343.15011-1-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01proc: fix /proc/net/* after setns(2)Alexey Dobriyan3-1/+24
/proc entries under /proc/net/* can't be cached into dcache because setns(2) can change current net namespace. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: avoid vim miscolorization] [adobriyan@gmail.com: write test, add dummy ->d_revalidate hook: necessary if /proc/net/* is pinned at setns time] Link: http://lkml.kernel.org/r/20190108192350.GA12034@avx2 Link: http://lkml.kernel.org/r/20190107162336.GA9239@avx2 Fixes: 1da4d377f943fe4194ffb9fb9c26cc58fad4dd24 ("proc: revalidate misc dentries") Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reported-by: Mateusz Stępień <mateusz.stepien@netrounds.com> Reported-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01Merge tag 'iomap-5.0-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-7/+30
Pull iomap fixes from Darrick Wong: "A couple of iomap fixes to eliminate some memory corruption and hang problems that were reported: - fix page migration when using iomap for pagecache management - fix a use-after-free bug in the directio code" * tag 'iomap-5.0-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: fix a use after free in iomap_dio_rw iomap: get/put the page in iomap_page_create/release()
2019-02-01copy_mount_string: Limit string length to PATH_MAXChandan Rajendra1-1/+1
On ppc64le, When a string with PAGE_SIZE - 1 (i.e. 64k-1) length is passed as a "filesystem type" argument to the mount(2) syscall, copy_mount_string() ends up allocating 64k (the PAGE_SIZE on ppc64le) worth of space for holding the string in kernel's address space. Later, in set_precision() (invoked by get_fs_type() -> __request_module() -> vsnprintf()), we end up assigning strlen(fs-type-string) i.e. 65535 as the value to 'struct printf_spec'->precision member. This field has a width of 16 bits and it is a signed data type. Hence an invalid value ends up getting assigned. This causes the "WARN_ONCE(spec->precision != prec, "precision %d too large", prec)" statement inside set_precision() to be executed. This commit fixes the bug by limiting the length of the string passed by copy_mount_string() to strndup_user() to PATH_MAX. Signed-off-by: Chandan Rajendra <chandan@linux.ibm.com> Reported-by: Abdul Haleem <abdhalee@linux.ibm.com> Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>