aboutsummaryrefslogtreecommitdiffstats
path: root/fs (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-01-28Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds2-3/+3
Pull crypto updates from Herbert Xu: "API: - Removed CRYPTO_TFM_RES flags - Extended spawn grabbing to all algorithm types - Moved hash descsize verification into API code Algorithms: - Fixed recursive pcrypt dead-lock - Added new 32 and 64-bit generic versions of poly1305 - Added cryptogams implementation of x86/poly1305 Drivers: - Added support for i.MX8M Mini in caam - Added support for i.MX8M Nano in caam - Added support for i.MX8M Plus in caam - Added support for A33 variant of SS in sun4i-ss - Added TEE support for Raven Ridge in ccp - Added in-kernel API to submit TEE commands in ccp - Added AMD-TEE driver - Added support for BCM2711 in iproc-rng200 - Added support for AES256-GCM based ciphers for chtls - Added aead support on SEC2 in hisilicon" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (244 commits) crypto: arm/chacha - fix build failured when kernel mode NEON is disabled crypto: caam - add support for i.MX8M Plus crypto: x86/poly1305 - emit does base conversion itself crypto: hisilicon - fix spelling mistake "disgest" -> "digest" crypto: chacha20poly1305 - add back missing test vectors and test chunking crypto: x86/poly1305 - fix .gitignore typo tee: fix memory allocation failure checks on drv_data and amdtee crypto: ccree - erase unneeded inline funcs crypto: ccree - make cc_pm_put_suspend() void crypto: ccree - split overloaded usage of irq field crypto: ccree - fix PM race condition crypto: ccree - fix FDE descriptor sequence crypto: ccree - cc_do_send_request() is void func crypto: ccree - fix pm wrongful error reporting crypto: ccree - turn errors to debug msgs crypto: ccree - fix AEAD decrypt auth fail crypto: ccree - fix typo in comment crypto: ccree - fix typos in error msgs crypto: atmel-{aes,sha,tdes} - Retire crypto_platform_data crypto: x86/sha - Eliminate casts on asm implementations ...
2020-01-28Merge tag '5.6-smb3-fixes-and-dfs-and-readdir-improvements' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds18-713/+1041
Pull cifs updates from Steve French: "Various SMB3/CIFS fixes including four for stable. - Improvement to fallocate (enables 3 additional xfstests) - Fix for file creation when mounting with modefromsid - Add ability to backup/restore dos attributes and creation time - DFS failover and reconnect fixes - performance optimization for readir Note that due to the upcoming SMB3 Test Event (at SNIA SDC next week) there will likely be more changesets near the end of the merge window (since we will be testing heavily next week, I held off on some patches and I expect some additional multichannel patches as well as patches to enable some additional xfstests)" * tag '5.6-smb3-fixes-and-dfs-and-readdir-improvements' of git://git.samba.org/sfrench/cifs-2.6: (24 commits) CIFS: Fix task struct use-after-free on reconnect cifs: use PTR_ERR_OR_ZERO() to simplify code cifs: add support for fallocate mode 0 for non-sparse files cifs: fix NULL dereference in match_prepath smb3: fix default permissions on new files when mounting with modefromsid CIFS: Add support for setting owner info, dos attributes, and create time cifs: remove set but not used variable 'server' cifs: Fix memory allocation in __smb2_handle_cancelled_cmd() cifs: Fix mount options set in automount cifs: fix unitialized variable poential problem with network I/O cache lock patch cifs: Fix return value in __update_cache_entry cifs: Avoid doing network I/O while holding cache lock cifs: Fix potential deadlock when updating vol in cifs_reconnect() cifs: Merge is_path_valid() into get_normalized_path() cifs: Introduce helpers for finding TCP connection cifs: Get rid of kstrdup_const()'d paths cifs: Clean up DFS referral cache cifs: Don't use iov_iter::type directly cifs: set correct max-buffer-size for smb2_ioctl_init() cifs: use compounding for open and first query-dir for readdir() ...
2020-01-28Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscryptLinus Torvalds9-66/+267
Pull fsverity updates from Eric Biggers: - Optimize fs-verity sequential read performance by implementing readahead of Merkle tree pages. This allows the Merkle tree to be read in larger chunks. - Optimize FS_IOC_ENABLE_VERITY performance in the uncached case by implementing readahead of data pages. - Allocate the hash requests from a mempool in order to eliminate the possibility of allocation failures during I/O. * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: fs-verity: use u64_to_user_ptr() fs-verity: use mempool for hash requests fs-verity: implement readahead of Merkle tree pages fs-verity: implement readahead for FS_IOC_ENABLE_VERITY
2020-01-28Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscryptLinus Torvalds22-354/+748
Pull fscrypt updates from Eric Biggers: - Extend the FS_IOC_ADD_ENCRYPTION_KEY ioctl to allow the raw key to be provided via a keyring key. - Prepare for the new dirhash method (SipHash of plaintext name) that will be used by directories that are both encrypted and casefolded. - Switch to a new format for "no-key names" that prepares for the new dirhash method, and also fixes a longstanding bug where multiple filenames could map to the same no-key name. - Allow the crypto algorithms used by fscrypt to be built as loadable modules when the fscrypt-capable filesystems are. - Optimize fscrypt_zeroout_range(). - Various cleanups. * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: (26 commits) fscrypt: improve format of no-key names ubifs: allow both hash and disk name to be provided in no-key names ubifs: don't trigger assertion on invalid no-key filename fscrypt: clarify what is meant by a per-file key fscrypt: derive dirhash key for casefolded directories fscrypt: don't allow v1 policies with casefolding fscrypt: add "fscrypt_" prefix to fname_encrypt() fscrypt: don't print name of busy file when removing key ubifs: use IS_ENCRYPTED() instead of ubifs_crypt_is_encrypted() fscrypt: document gfp_flags for bounce page allocation fscrypt: optimize fscrypt_zeroout_range() fscrypt: remove redundant bi_status check fscrypt: Allow modular crypto algorithms fscrypt: include <linux/ioctl.h> in UAPI header fscrypt: don't check for ENOKEY from fscrypt_get_encryption_info() fscrypt: remove fscrypt_is_direct_key_policy() fscrypt: move fscrypt_valid_enc_modes() to policy.c fscrypt: check for appropriate use of DIRECT_KEY flag earlier fscrypt: split up fscrypt_supported_policy() by policy version fscrypt: introduce fscrypt_needs_contents_encryption() ...
2020-01-28Merge tag 'fs-dedupe-last-block-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linuxLinus Torvalds2-7/+6
Pull fs deduplication fix from David Sterba: "This is a fix for deduplication bug: the last block of two files is allowed to deduplicated. This got broken in 5.1 by lifting some generic checks to VFS layer. The affected filesystems are btrfs and xfs. The patches are marked for stable as the bug decreases deduplication effectivity" * tag 'fs-dedupe-last-block-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: make deduplication with range including the last block work fs: allow deduplication of eof block into the end of the destination file
2020-01-28Merge tag 'for-5.6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linuxLinus Torvalds40-1715/+2988
Pull btrfs updates from David Sterba: "Features, highlights: - async discard - "mount -o discard=async" to enable it - freed extents are not discarded immediatelly, but grouped together and trimmed later, with IO rate limiting - the "sync" mode submits short extents that could have been ignored completely by the device, for SATA prior to 3.1 the requests are unqueued and have a big impact on performance - the actual discard IO requests have been moved out of transaction commit to a worker thread, improving commit latency - IO rate and request size can be tuned by sysfs files, for now enabled only with CONFIG_BTRFS_DEBUG as we might need to add/delete the files and don't have a stable-ish ABI for general use, defaults are conservative - export device state info in sysfs, eg. missing, writeable - no discard of extents known to be untouched on disk (eg. after reservation) - device stats reset is logged with process name and PID that called the ioctl Fixes: - fix missing hole after hole punching and fsync when using NO_HOLES - writeback: range cyclic mode could miss some dirty pages and lead to OOM - two more corner cases for metadata_uuid change after power loss during the change - fix infinite loop during fsync after mix of rename operations Core changes: - qgroup assign returns ENOTCONN when quotas not enabled, used to return EINVAL that was confusing - device closing does not need to allocate memory anymore - snapshot aware code got removed, disabled for years due to performance problems, reimplmentation will allow to select wheter defrag breaks or does not break COW on shared extents - tree-checker: - check leaf chunk item size, cross check against number of stripes - verify location keys for DIR_ITEM, DIR_INDEX and XATTR items - new self test for physical -> logical mapping code, used for super block range exclusion - assertion helpers/macros updated to avoid objtool "unreachable code" reports on older compilers or config option combinations" * tag 'for-5.6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (84 commits) btrfs: free block groups after free'ing fs trees btrfs: Fix split-brain handling when changing FSID to metadata uuid btrfs: Handle another split brain scenario with metadata uuid feature btrfs: Factor out metadata_uuid code from find_fsid. btrfs: Call find_fsid from find_fsid_inprogress Btrfs: fix infinite loop during fsync after rename operations btrfs: set trans->drity in btrfs_commit_transaction btrfs: drop log root for dropped roots btrfs: sysfs, add devid/dev_state kobject and device attributes btrfs: Refactor btrfs_rmap_block to improve readability btrfs: Add self-tests for btrfs_rmap_block btrfs: selftests: Add support for dummy devices btrfs: Move and unexport btrfs_rmap_block btrfs: separate definition of assertion failure handlers btrfs: device stats, log when stats are zeroed btrfs: fix improper setting of scanned for range cyclic write cache pages btrfs: safely advance counter when looking up bio csums btrfs: remove unused member btrfs_device::work btrfs: remove unnecessary wrapper get_alloc_profile btrfs: add correction to handle -1 edge case in async discard ...
2020-01-28Merge branch 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-0/+11
Pull x86 resource control updates from Ingo Molnar: "The main change in this tree is the extension of the resctrl procfs ABI with a new file that helps tooling to navigate from tasks back to resctrl groups: /proc/{pid}/cpu_resctrl_groups. Also fix static key usage for certain feature combinations and simplify the task exit resctrl case" * 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/resctrl: Add task resctrl information display x86/resctrl: Check monitoring static key in the MBM overflow handler x86/resctrl: Do not reconfigure exiting tasks
2020-01-28Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-4/+4
Pull scheduler updates from Ingo Molnar: "These were the main changes in this cycle: - More -rt motivated separation of CONFIG_PREEMPT and CONFIG_PREEMPTION. - Add more low level scheduling topology sanity checks and warnings to filter out nonsensical topologies that break scheduling. - Extend uclamp constraints to influence wakeup CPU placement - Make the RT scheduler more aware of asymmetric topologies and CPU capacities, via uclamp metrics, if CONFIG_UCLAMP_TASK=y - Make idle CPU selection more consistent - Various fixes, smaller cleanups, updates and enhancements - please see the git log for details" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits) sched/fair: Define sched_idle_cpu() only for SMP configurations sched/topology: Assert non-NUMA topology masks don't (partially) overlap idle: fix spelling mistake "iterrupts" -> "interrupts" sched/fair: Remove redundant call to cpufreq_update_util() sched/psi: create /proc/pressure and /proc/pressure/{io|memory|cpu} only when psi enabled sched/fair: Fix sgc->{min,max}_capacity calculation for SD_OVERLAP sched/fair: calculate delta runnable load only when it's needed sched/cputime: move rq parameter in irqtime_account_process_tick stop_machine: Make stop_cpus() static sched/debug: Reset watchdog on all CPUs while processing sysrq-t sched/core: Fix size of rq::uclamp initialization sched/uclamp: Fix a bug in propagating uclamp value in new cgroups sched/fair: Load balance aggressively for SCHED_IDLE CPUs sched/fair : Improve update_sd_pick_busiest for spare capacity case watchdog: Remove soft_lockup_hrtimer_cnt and related code sched/rt: Make RT capacity-aware sched/fair: Make EAS wakeup placement consider uclamp restrictions sched/fair: Make task_fits_capacity() consider uclamp restrictions sched/uclamp: Rename uclamp_util_with() into uclamp_rq_util_with() sched/uclamp: Make uclamp util helpers use and return UL values ...
2020-01-28Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-5/+5
Pull perf updates from Ingo Molnar: "Kernel side changes: - Ftrace is one of the last W^X violators (after this only KLP is left). These patches move it over to the generic text_poke() interface and thereby get rid of this oddity. This requires a surprising amount of surgery, by Peter Zijlstra. - x86/AMD PMUs: add support for 'Large Increment per Cycle Events' to count certain types of events that have a special, quirky hw ABI (by Kim Phillips) - kprobes fixes by Masami Hiramatsu Lots of tooling updates as well, the following subcommands were updated: annotate/report/top, c2c, clang, record, report/top TUI, sched timehist, tests; plus updates were done to the gtk ui, libperf, headers and the parser" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits) perf/x86/amd: Add support for Large Increment per Cycle Events perf/x86/amd: Constrain Large Increment per Cycle events perf/x86/intel/rapl: Add Comet Lake support tracing: Initialize ret in syscall_enter_define_fields() perf header: Use last modification time for timestamp perf c2c: Fix return type for histogram sorting comparision functions perf beauty sockaddr: Fix augmented syscall format warning perf/ui/gtk: Fix gtk2 build perf ui gtk: Add missing zalloc object perf tools: Use %define api.pure full instead of %pure-parser libperf: Setup initial evlist::all_cpus value perf report: Fix no libunwind compiled warning break s390 issue perf tools: Support --prefix/--prefix-strip perf report: Clarify in help that --children is default tools build: Fix test-clang.cpp with Clang 8+ perf clang: Fix build with Clang 9 kprobes: Fix optimize_kprobe()/unoptimize_kprobe() cancellation logic tools lib: Fix builds when glibc contains strlcpy() perf report/top: Make 'e' visible in the help and make it toggle showing callchains perf report/top: Do not offer annotation for symbols without samples ...
2020-01-27Merge tag 'smp-core-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-1/+1
Pull core SMP updates from Thomas Gleixner: "A small set of SMP core code changes: - Rework the smp function call core code to avoid the allocation of an additional cpumask - Remove the not longer required GFP argument from on_each_cpu_cond() and on_each_cpu_cond_mask() and fixup the callers" * tag 'smp-core-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: smp: Remove allocation mask from on_each_cpu_cond.*() smp: Add a smp_cond_func_t argument to smp_call_function_many() smp: Use smp_cond_func_t as type for the conditional function
2020-01-27Merge tag 'timers-core-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds4-0/+104
Pull timer updates from Thomas Gleixner: "The timekeeping and timers departement provides: - Time namespace support: If a container migrates from one host to another then it expects that clocks based on MONOTONIC and BOOTTIME are not subject to disruption. Due to different boot time and non-suspended runtime these clocks can differ significantly on two hosts, in the worst case time goes backwards which is a violation of the POSIX requirements. The time namespace addresses this problem. It allows to set offsets for clock MONOTONIC and BOOTTIME once after creation and before tasks are associated with the namespace. These offsets are taken into account by timers and timekeeping including the VDSO. Offsets for wall clock based clocks (REALTIME/TAI) are not provided by this mechanism. While in theory possible, the overhead and code complexity would be immense and not justified by the esoteric potential use cases which were discussed at Plumbers '18. The overhead for tasks in the root namespace (ie where host time offsets = 0) is in the noise and great effort was made to ensure that especially in the VDSO. If time namespace is disabled in the kernel configuration the code is compiled out. Kudos to Andrei Vagin and Dmitry Sofanov who implemented this feature and kept on for more than a year addressing review comments, finding better solutions. A pleasant experience. - Overhaul of the alarmtimer device dependency handling to ensure that the init/suspend/resume ordering is correct. - A new clocksource/event driver for Microchip PIT64 - Suspend/resume support for the Hyper-V clocksource - The usual pile of fixes, updates and improvements mostly in the driver code" * tag 'timers-core-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits) alarmtimer: Make alarmtimer_get_rtcdev() a stub when CONFIG_RTC_CLASS=n alarmtimer: Use wakeup source from alarmtimer platform device alarmtimer: Make alarmtimer platform device child of RTC device alarmtimer: Update alarmtimer_get_rtcdev() docs to reflect reality hrtimer: Add missing sparse annotation for __run_timer() lib/vdso: Only read hrtimer_res when needed in __cvdso_clock_getres() MIPS: vdso: Define BUILD_VDSO32 when building a 32bit kernel clocksource/drivers/hyper-v: Set TSC clocksource as default w/ InvariantTSC clocksource/drivers/hyper-v: Untangle stimers and timesync from clocksources clocksource/drivers/timer-microchip-pit64b: Fix sparse warning clocksource/drivers/exynos_mct: Rename Exynos to lowercase clocksource/drivers/timer-ti-dm: Fix uninitialized pointer access clocksource/drivers/timer-ti-dm: Switch to platform_get_irq clocksource/drivers/timer-ti-dm: Convert to devm_platform_ioremap_resource clocksource/drivers/em_sti: Fix variable declaration in em_sti_probe clocksource/drivers/em_sti: Convert to devm_platform_ioremap_resource clocksource/drivers/bcm2835_timer: Fix memory leak of timer clocksource/drivers/cadence-ttc: Use ttc driver as platform driver clocksource/drivers/timer-microchip-pit64b: Add Microchip PIT64B support clocksource/drivers/hyper-v: Reserve PAGE_SIZE space for tsc page ...
2020-01-26CIFS: Fix task struct use-after-free on reconnectVincent Whitchurch3-0/+6
The task which created the MID may be gone by the time cifsd attempts to call the callbacks on MIDs from cifs_reconnect(). This leads to a use-after-free of the task struct in cifs_wake_up_task: ================================================================== BUG: KASAN: use-after-free in __lock_acquire+0x31a0/0x3270 Read of size 8 at addr ffff8880103e3a68 by task cifsd/630 CPU: 0 PID: 630 Comm: cifsd Not tainted 5.5.0-rc6+ #119 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: dump_stack+0x8e/0xcb print_address_description.constprop.5+0x1d3/0x3c0 ? __lock_acquire+0x31a0/0x3270 __kasan_report+0x152/0x1aa ? __lock_acquire+0x31a0/0x3270 ? __lock_acquire+0x31a0/0x3270 kasan_report+0xe/0x20 __lock_acquire+0x31a0/0x3270 ? __wake_up_common+0x1dc/0x630 ? _raw_spin_unlock_irqrestore+0x4c/0x60 ? mark_held_locks+0xf0/0xf0 ? _raw_spin_unlock_irqrestore+0x39/0x60 ? __wake_up_common_lock+0xd5/0x130 ? __wake_up_common+0x630/0x630 lock_acquire+0x13f/0x330 ? try_to_wake_up+0xa3/0x19e0 _raw_spin_lock_irqsave+0x38/0x50 ? try_to_wake_up+0xa3/0x19e0 try_to_wake_up+0xa3/0x19e0 ? cifs_compound_callback+0x178/0x210 ? set_cpus_allowed_ptr+0x10/0x10 cifs_reconnect+0xa1c/0x15d0 ? generic_ip_connect+0x1860/0x1860 ? rwlock_bug.part.0+0x90/0x90 cifs_readv_from_socket+0x479/0x690 cifs_read_from_socket+0x9d/0xe0 ? cifs_readv_from_socket+0x690/0x690 ? mempool_resize+0x690/0x690 ? rwlock_bug.part.0+0x90/0x90 ? memset+0x1f/0x40 ? allocate_buffers+0xff/0x340 cifs_demultiplex_thread+0x388/0x2a50 ? cifs_handle_standard+0x610/0x610 ? rcu_read_lock_held_common+0x120/0x120 ? mark_lock+0x11b/0xc00 ? __lock_acquire+0x14ed/0x3270 ? __kthread_parkme+0x78/0x100 ? lockdep_hardirqs_on+0x3e8/0x560 ? lock_downgrade+0x6a0/0x6a0 ? lockdep_hardirqs_on+0x3e8/0x560 ? _raw_spin_unlock_irqrestore+0x39/0x60 ? cifs_handle_standard+0x610/0x610 kthread+0x2bb/0x3a0 ? kthread_create_worker_on_cpu+0xc0/0xc0 ret_from_fork+0x3a/0x50 Allocated by task 649: save_stack+0x19/0x70 __kasan_kmalloc.constprop.5+0xa6/0xf0 kmem_cache_alloc+0x107/0x320 copy_process+0x17bc/0x5370 _do_fork+0x103/0xbf0 __x64_sys_clone+0x168/0x1e0 do_syscall_64+0x9b/0xec0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 0: save_stack+0x19/0x70 __kasan_slab_free+0x11d/0x160 kmem_cache_free+0xb5/0x3d0 rcu_core+0x52f/0x1230 __do_softirq+0x24d/0x962 The buggy address belongs to the object at ffff8880103e32c0 which belongs to the cache task_struct of size 6016 The buggy address is located 1960 bytes inside of 6016-byte region [ffff8880103e32c0, ffff8880103e4a40) The buggy address belongs to the page: page:ffffea000040f800 refcount:1 mapcount:0 mapping:ffff8880108da5c0 index:0xffff8880103e4c00 compound_mapcount: 0 raw: 4000000000010200 ffffea00001f2208 ffffea00001e3408 ffff8880108da5c0 raw: ffff8880103e4c00 0000000000050003 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8880103e3900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880103e3980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff8880103e3a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8880103e3a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880103e3b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== This can be reliably reproduced by adding the below delay to cifs_reconnect(), running find(1) on the mount, restarting the samba server while find is running, and killing find during the delay: spin_unlock(&GlobalMid_Lock); mutex_unlock(&server->srv_mutex); + msleep(10000); + cifs_dbg(FYI, "%s: issuing mid callbacks\n", __func__); list_for_each_safe(tmp, tmp2, &retry_list) { mid_entry = list_entry(tmp, struct mid_q_entry, qhead); Fix this by holding a reference to the task struct until the MID is freed. Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-01-26cifs: use PTR_ERR_OR_ZERO() to simplify codeChen Zhou1-1/+1
PTR_ERR_OR_ZERO contains if(IS_ERR(...)) + PTR_ERR, just use PTR_ERR_OR_ZERO directly. Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
2020-01-26cifs: add support for fallocate mode 0 for non-sparse filesRonnie Sahlberg3-37/+34
RHBZ 1336264 When we extend a file we must also force the size to be updated. This fixes an issue with holetest in xfs-tests which performs the following sequence : 1, create a new file 2, use fallocate mode==0 to populate the file 3, mmap the file 4, touch each page by reading the mmapped region. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26cifs: fix NULL dereference in match_prepathRonnie Sahlberg1-2/+4
RHBZ: 1760879 Fix an oops in match_prepath() by making sure that the prepath string is not NULL before we pass it into strcmp(). This is similar to other checks we make for example in cifs_root_iget() Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26smb3: fix default permissions on new files when mounting with modefromsidSteve French3-3/+29
When mounting with "modefromsid" mount parm most servers will require that some default permissions are given to users in the ACL on newly created files, files created with the new 'sd context' - when passing in an sd context on create, permissions are not inherited from the parent directory, so in addition to the ACE with the special SID which contains the mode, we also must pass in an ACE allowing users to access the file (GENERIC_ALL for authenticated users seemed like a reasonable default, although later we could allow a mount option or config switch to make it GENERIC_ALL for EVERYONE special sid). CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-By: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-01-26CIFS: Add support for setting owner info, dos attributes, and create timeBoris Protopopov1-11/+117
This is needed for backup/restore scenarios among others. Add extended attribute "system.cifs_ntsd" (and alias "system.smb3_ntsd") to allow for setting owner and DACL in the security descriptor. This is in addition to the existing "system.cifs_acl" and "system.smb3_acl" attributes that allow for setting DACL only. Add support for setting creation time and dos attributes using set_file_info() calls to complement the existing support for getting these attributes via query_path_info() calls. Signed-off-by: Boris Protopopov <bprotopopov@hotmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26cifs: remove set but not used variable 'server'YueHaibing1-4/+1
fs/cifs/smb2pdu.c: In function 'SMB2_query_directory': fs/cifs/smb2pdu.c:4444:26: warning: variable 'server' set but not used [-Wunused-but-set-variable] struct TCP_Server_Info *server; It is not used, so remove it. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26cifs: Fix memory allocation in __smb2_handle_cancelled_cmd()Paulo Alcantara (SUSE)1-1/+1
__smb2_handle_cancelled_cmd() is called under a spin lock held in cifs_mid_q_entry_release(), so make its memory allocation GFP_ATOMIC. This issue was observed when running xfstests generic/028: [ 1722.589204] CIFS VFS: \\192.168.30.26 Cancelling wait for mid 72064 cmd: 5 [ 1722.590687] CIFS VFS: \\192.168.30.26 Cancelling wait for mid 72065 cmd: 17 [ 1722.593529] CIFS VFS: \\192.168.30.26 Cancelling wait for mid 72066 cmd: 6 [ 1723.039014] BUG: sleeping function called from invalid context at mm/slab.h:565 [ 1723.040710] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 30877, name: cifsd [ 1723.045098] CPU: 3 PID: 30877 Comm: cifsd Not tainted 5.5.0-rc4+ #313 [ 1723.046256] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014 [ 1723.048221] Call Trace: [ 1723.048689] dump_stack+0x97/0xe0 [ 1723.049268] ___might_sleep.cold+0xd1/0xe1 [ 1723.050069] kmem_cache_alloc_trace+0x204/0x2b0 [ 1723.051051] __smb2_handle_cancelled_cmd+0x40/0x140 [cifs] [ 1723.052137] smb2_handle_cancelled_mid+0xf6/0x120 [cifs] [ 1723.053247] cifs_mid_q_entry_release+0x44d/0x630 [cifs] [ 1723.054351] ? cifs_reconnect+0x26a/0x1620 [cifs] [ 1723.055325] cifs_demultiplex_thread+0xad4/0x14a0 [cifs] [ 1723.056458] ? cifs_handle_standard+0x2c0/0x2c0 [cifs] [ 1723.057365] ? kvm_sched_clock_read+0x14/0x30 [ 1723.058197] ? sched_clock+0x5/0x10 [ 1723.058838] ? sched_clock_cpu+0x18/0x110 [ 1723.059629] ? lockdep_hardirqs_on+0x17d/0x250 [ 1723.060456] kthread+0x1ab/0x200 [ 1723.061149] ? cifs_handle_standard+0x2c0/0x2c0 [cifs] [ 1723.062078] ? kthread_create_on_node+0xd0/0xd0 [ 1723.062897] ret_from_fork+0x3a/0x50 Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Fixes: 9150c3adbf24 ("CIFS: Close open handle after interrupted close") Cc: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-01-26cifs: Fix mount options set in automountPaulo Alcantara (SUSE)1-54/+43
Starting from 4a367dc04435, we must set the mount options based on the DFS full path rather than the resolved target, that is, cifs_mount() will be responsible for resolving the DFS link (cached) as well as performing failover to any other targets in the referral. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reported-by: Martijn de Gouw <martijn.de.gouw@prodrive-technologies.com> Fixes: 4a367dc04435 ("cifs: Add support for failover in cifs_mount()") Link: https://lore.kernel.org/linux-cifs/39643d7d-2abb-14d3-ced6-c394fab9a777@prodrive-technologies.com Tested-by: Martijn de Gouw <martijn.de.gouw@prodrive-technologies.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26cifs: fix unitialized variable poential problem with network I/O cache lock patchSteve French1-1/+1
static analysis with Coverity detected an issue with the following commit: Author: Paulo Alcantara (SUSE) <pc@cjr.nz> Date: Wed Dec 4 17:38:03 2019 -0300 cifs: Avoid doing network I/O while holding cache lock Addresses-Coverity: ("Uninitialized pointer read") Reported-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26cifs: Fix return value in __update_cache_entryYueHaibing1-1/+1
copy_ref_data() may return error, it should be returned to upstream caller. Fixes: 03535b72873b ("cifs: Avoid doing network I/O while holding cache lock") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26cifs: Avoid doing network I/O while holding cache lockPaulo Alcantara (SUSE)1-264/+283
When creating or updating a cache entry, we need to get an DFS referral (get_dfs_referral), so avoid holding any locks during such network operation. To prevent that, do the following: * change cache hashtable sync method from RCU sync to a read/write lock. * use GFP_ATOMIC in memory allocations. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26cifs: Fix potential deadlock when updating vol in cifs_reconnect()Paulo Alcantara (SUSE)1-32/+77
We can't acquire volume lock while refreshing the DFS cache because cifs_reconnect() may call dfs_cache_update_vol() while we are walking through the volume list. To prevent that, make vol_info refcounted, create a temp list with all volumes eligible for refreshing, and then use it without any locks held. Besides, replace vol_lock with a spinlock and protect cache_ttl from concurrent accesses or changes. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26cifs: Merge is_path_valid() into get_normalized_path()Paulo Alcantara (SUSE)1-17/+4
Just do the trivial path validation in get_normalized_path(). Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26cifs: Introduce helpers for finding TCP connectionPaulo Alcantara (SUSE)1-13/+31
Add helpers for finding TCP connections that are good candidates for being used by DFS refresh worker. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26cifs: Get rid of kstrdup_const()'d pathsPaulo Alcantara (SUSE)1-3/+3
The DFS cache API is mostly used with heap allocated strings. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26cifs: Clean up DFS referral cachePaulo Alcantara (SUSE)1-286/+279
Do some renaming and code cleanup. No functional changes. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26cifs: Don't use iov_iter::type directlyDavid Howells1-4/+4
Don't use iov_iter::type directly, but rather use the new accessor functions that have been added. This allows the .type field to be split and rearranged without the need to update the filesystems. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26cifs: set correct max-buffer-size for smb2_ioctl_init()Ronnie Sahlberg1-2/+7
Fix two places where we need to adjust down the max response size for ioctl when it is used together with compounding. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> CC: Stable <stable@vger.kernel.org>
2020-01-26cifs: use compounding for open and first query-dir for readdir()Ronnie Sahlberg2-12/+87
Combine the initial SMB2_Open and the first SMB2_Query_Directory in a compound. This shaves one round-trip of each directory listing, changing it from 4 to 3 for small directories. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-01-26cifs: create a helper function to parse the query-directory response bufferRonnie Sahlberg1-46/+63
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-01-26cifs: prepare SMB2_query_directory to be used with compoundingRonnie Sahlberg3-36/+82
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-01-26fs/cifs/cifssmb.c: use true,false for bool variablezhengbin1-2/+2
Fixes coccicheck warning: fs/cifs/cifssmb.c:4622:3-22: WARNING: Assignment of 0/1 to bool variable fs/cifs/cifssmb.c:4756:3-22: WARNING: Assignment of 0/1 to bool variable Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26fs/cifs/smb2ops.c: use true,false for bool variablezhengbin1-1/+1
Fixes coccicheck warning: fs/cifs/smb2ops.c:807:2-36: WARNING: Assignment of 0/1 to bool variable Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-01-26Merge tag 'io_uring-5.5-2020-01-26' of git://git.kernel.dk/linux-blockLinus Torvalds1-10/+0
Pull io_uring fixes from Jens Axboe: "Fix for two regressions in this cycle, both reported by the postgresql use case. One removes the added restriction on who can submit IO, making it possible for rings shared across forks to do so. The other fixes an issue for the same kind of use case, where one exiting process would cancel all IO" * tag 'io_uring-5.5-2020-01-26' of git://git.kernel.dk/linux-block: io_uring: don't cancel all work on process exit Revert "io_uring: only allow submit from owning task"
2020-01-26Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-7/+10
Pull vfs fix from Al Viro: "Fix a use-after-free in do_last() handling of sysctl_protected_... checks. The use-after-free normally doesn't happen there, but race with rename() and it becomes possible" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: do_last(): fetch directory ->i_mode and ->i_uid before it's too late
2020-01-26io_uring: don't cancel all work on process exitJens Axboe1-4/+0
If we're sharing the ring across forks, then one process exiting means that we cancel ALL work and prevent future work. This is overly restrictive. As long as we cancel the work associated with the files from the current task, it's safe to let others persist. Normal fd close on exit will still wait (and cancel) pending work. Fixes: fcb323cc53e2 ("io_uring: io_uring: add support for async work inheriting files") Reported-by: Andres Freund <andres@anarazel.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-26Revert "io_uring: only allow submit from owning task"Jens Axboe1-6/+0
This ends up being too restrictive for tasks that willingly fork and share the ring between forks. Andres reports that this breaks his postgresql work. Since we're close to 5.5 release, revert this change for now. Cc: stable@vger.kernel.org Fixes: 44d282796f81 ("io_uring: only allow submit from owning task") Reported-by: Andres Freund <andres@anarazel.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-26afs: Fix characters allowed into cell namesDavid Howells1-1/+10
The afs filesystem needs to prohibit certain characters from cell names, such as '/', as these are used to form filenames in procfs, leading to the following warning being generated: WARNING: CPU: 0 PID: 3489 at fs/proc/generic.c:178 Fix afs_alloc_cell() to disallow nonprintable characters, '/', '@' and names that begin with a dot. Remove the check for "@cell" as that is then redundant. This can be tested by running: echo add foo/.bar 1.2.3.4 >/proc/fs/afs/cells Note that we will also need to deal with: - Names ending in ".invalid" shouldn't be passed to the DNS. - Names that contain non-valid domainname chars shouldn't be passed to the DNS. - DNS replies that say "your-dns-needs-immediate-attention.<gTLD>" and replies containing A records that say 127.0.53.53 should be considered invalid. [https://www.icann.org/en/system/files/files/name-collision-mitigation-01aug14-en.pdf] but these need to be dealt with by the kafs-client DNS program rather than the kernel. Reported-by: syzbot+b904ba7c947a37b4b291@syzkaller.appspotmail.com Cc: stable@kernel.org Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-26do_last(): fetch directory ->i_mode and ->i_uid before it's too lateAl Viro1-7/+10
may_create_in_sticky() call is done when we already have dropped the reference to dir. Fixes: 30aba6656f61e (namei: allow restricted O_CREAT of FIFOs and regular files) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-01-25Merge tag 'for-5.5-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linuxLinus Torvalds2-9/+29
Pull btrfs fix from David Sterba: "Here's a last minute fix for a regression introduced in this development cycle. There's a small chance of a silent corruption when device replace and NOCOW data writes happen at the same time in one block group. Metadata or COW data writes are unaffected. The extra fixup patch is there to silence an unnecessary warning" * tag 'for-5.5-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: dev-replace: remove warning for unknown return codes when finished btrfs: scrub: Require mandatory block group RO for dev-replace
2020-01-25btrfs: dev-replace: remove warning for unknown return codes when finishedDavid Sterba1-4/+1
The fstests btrfs/011 triggered a warning at the end of device replace, [ 1891.998975] BTRFS warning (device vdd): failed setting block group ro: -28 [ 1892.038338] BTRFS error (device vdd): btrfs_scrub_dev(/dev/vdd, 1, /dev/vdb) failed -28 [ 1892.059993] ------------[ cut here ]------------ [ 1892.063032] WARNING: CPU: 2 PID: 2244 at fs/btrfs/dev-replace.c:506 btrfs_dev_replace_start.cold+0xf9/0x140 [btrfs] [ 1892.074346] CPU: 2 PID: 2244 Comm: btrfs Not tainted 5.5.0-rc7-default+ #942 [ 1892.079956] RIP: 0010:btrfs_dev_replace_start.cold+0xf9/0x140 [btrfs] [ 1892.096576] RSP: 0018:ffffbb58c7b3fd10 EFLAGS: 00010286 [ 1892.098311] RAX: 00000000ffffffe4 RBX: 0000000000000001 RCX: 8888888888888889 [ 1892.100342] RDX: 0000000000000001 RSI: ffff9e889645f5d8 RDI: ffffffff92821080 [ 1892.102291] RBP: ffff9e889645c000 R08: 000001b8878fe1f6 R09: 0000000000000000 [ 1892.104239] R10: ffffbb58c7b3fd08 R11: 0000000000000000 R12: ffff9e88a0017000 [ 1892.106434] R13: ffff9e889645f608 R14: ffff9e88794e1000 R15: ffff9e88a07b5200 [ 1892.108642] FS: 00007fcaed3f18c0(0000) GS:ffff9e88bda00000(0000) knlGS:0000000000000000 [ 1892.111558] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1892.113492] CR2: 00007f52509ff420 CR3: 00000000603dd002 CR4: 0000000000160ee0 [ 1892.115814] Call Trace: [ 1892.116896] btrfs_dev_replace_by_ioctl+0x35/0x60 [btrfs] [ 1892.118962] btrfs_ioctl+0x1d62/0x2550 [btrfs] caused by the previous patch ("btrfs: scrub: Require mandatory block group RO for dev-replace"). Hitting ENOSPC is possible and could happen when the block group is set read-only, preventing NOCOW writes to the area that's being accessed by dev-replace. This has happend with scratch devices of size 12G but not with 5G and 20G, so this is depends on timing and other activity on the filesystem. The whole replace operation is restartable, the space state should be examined by the user in any case. The error code is propagated back to the ioctl caller so the kernel warning is causing false alerts. Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-24smp: Remove allocation mask from on_each_cpu_cond.*()Sebastian Andrzej Siewior1-1/+1
The allocation mask is no longer used by on_each_cpu_cond() and on_each_cpu_cond_mask() and can be removed. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20200117090137.1205765-4-bigeasy@linutronix.de
2020-01-24btrfs: scrub: Require mandatory block group RO for dev-replaceQu Wenruo1-5/+28
[BUG] For dev-replace test cases with fsstress, like btrfs/06[45] btrfs/071, looped runs can lead to random failure, where scrub finds csum error. The possibility is not high, around 1/20 to 1/100, but it's causing data corruption. The bug is observable after commit b12de52896c0 ("btrfs: scrub: Don't check free space before marking a block group RO") [CAUSE] Dev-replace has two source of writes: - Write duplication All writes to source device will also be duplicated to target device. Content: Not yet persisted data/meta - Scrub copy Dev-replace reused scrub code to iterate through existing extents, and copy the verified data to target device. Content: Previously persisted data and metadata The difference in contents makes the following race possible: Regular Writer | Dev-replace ----------------------------------------------------------------- ^ | | Preallocate one data extent | | at bytenr X, len 1M | v | ^ Commit transaction | | Now extent [X, X+1M) is in | v commit root | ================== Dev replace starts ========================= | ^ | | Scrub extent [X, X+1M) | | Read [X, X+1M) | | (The content are mostly garbage | | since it's preallocated) ^ | v | Write back happens for | | extent [X, X+512K) | | New data writes to both | | source and target dev. | v | | ^ | | Scrub writes back extent [X, X+1M) | | to target device. | | This will over write the new data in | | [X, X+512K) | v This race can only happen for nocow writes. Thus metadata and data cow writes are safe, as COW will never overwrite extents of previous transaction (in commit root). This behavior can be confirmed by disabling all fallocate related calls in fsstress (*), then all related tests can pass a 2000 run loop. *: FSSTRESS_AVOID="-f fallocate=0 -f allocsp=0 -f zero=0 -f insert=0 \ -f collapse=0 -f punch=0 -f resvsp=0" I didn't expect resvsp ioctl will fallback to fallocate in VFS... [FIX] Make dev-replace to require mandatory block group RO, and wait for current nocow writes before calling scrub_chunk(). This patch will mostly revert commit 76a8efa171bf ("btrfs: Continue replace when set_block_ro failed") for dev-replace path. The side effect is, dev-replace can be more strict on avaialble space, but definitely worth to avoid data corruption. Reported-by: Filipe Manana <fdmanana@suse.com> Fixes: 76a8efa171bf ("btrfs: Continue replace when set_block_ro failed") Fixes: b12de52896c0 ("btrfs: scrub: Don't check free space before marking a block group RO") Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-23Merge tag 'ceph-for-5.5-rc8' of https://github.com/ceph/ceph-clientLinus Torvalds1-2/+6
Pull ceph fix from Ilya Dryomov: "A fix for a potential use-after-free from Jeff, marked for stable" * tag 'ceph-for-5.5-rc8' of https://github.com/ceph/ceph-client: ceph: hold extra reference to r_parent over life of request
2020-01-23readdir: make user_access_begin() use the real access rangeLinus Torvalds1-38/+35
In commit 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()") I changed filldir to not do individual __put_user() accesses, but instead use unsafe_put_user() surrounded by the proper user_access_begin/end() pair. That make them enormously faster on modern x86, where the STAC/CLAC games make individual user accesses fairly heavy-weight. However, the user_access_begin() range was not really the exact right one, since filldir() has the unfortunate problem that it needs to not only fill out the new directory entry, it also needs to fix up the previous one to contain the proper file offset. It's unfortunate, but the "d_off" field in "struct dirent" is _not_ the file offset of the directory entry itself - it's the offset of the next one. So we end up backfilling the offset in the previous entry as we walk along. But since x86 didn't really care about the exact range, and used to be the only architecture that did anything fancy in user_access_begin() to begin with, the filldir[64]() changes did something lazy, and even commented on it: /* * Note! This range-checks 'previous' (which may be NULL). * The real range was checked in getdents */ if (!user_access_begin(dirent, sizeof(*dirent))) goto efault; and it all worked fine. But now 32-bit ppc is starting to also implement user_access_begin(), and the fact that we faked the range to only be the (possibly not even valid) previous directory entry becomes a problem, because ppc32 will actually be using the range that is passed in for more than just "check that it's user space". This is a complete rewrite of Christophe's original patch. By saving off the record length of the previous entry instead of a pointer to it in the filldir data structures, we can simplify the range check and the writing of the previous entry d_off field. No need for any conditionals in the user accesses themselves, although we retain the conditional EINTR checking for the "was this the first directory entry" signal handling latency logic. Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()") Link: https://lore.kernel.org/lkml/a02d3426f93f7eb04960a4d9140902d278cab0bb.1579697910.git.christophe.leroy@c-s.fr/ Link: https://lore.kernel.org/lkml/408c90c4068b00ea8f1c41cca45b84ec23d4946b.1579783936.git.christophe.leroy@c-s.fr/ Reported-and-tested-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-23readdir: be more conservative with directory entry namesLinus Torvalds1-1/+5
Commit 8a23eb804ca4 ("Make filldir[64]() verify the directory entry filename is valid") added some minimal validity checks on the directory entries passed to filldir[64](). But they really were pretty minimal. This fleshes out at least the name length check: we used to disallow zero-length names, but really, negative lengths or oevr-long names aren't ok either. Both could happen if there is some filesystem corruption going on. Now, most filesystems tend to use just an "unsigned char" or similar for the length of a directory entry name, so even with a corrupt filesystem you should never see anything odd like that. But since we then use the name length to create the directory entry record length, let's make sure it actually is half-way sensible. Note how POSIX states that the size of a path component is limited by NAME_MAX, but we actually use PATH_MAX for the check here. That's because while NAME_MAX is generally the correct maximum name length (it's 255, for the same old "name length is usually just a byte on disk"), there's nothing in the VFS layer that really cares. So the real limitation at a VFS layer is the total pathname length you can pass as a filename: PATH_MAX. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-23Btrfs: make deduplication with range including the last block workFilipe Manana1-1/+2
Since btrfs was migrated to use the generic VFS helpers for clone and deduplication, it stopped allowing for the last block of a file to be deduplicated when the source file size is not sector size aligned (when eof is somewhere in the middle of the last block). There are two reasons for that: 1) The generic code always rounds down, to a multiple of the block size, the range's length for deduplications. This means we end up never deduplicating the last block when the eof is not block size aligned, even for the safe case where the destination range's end offset matches the destination file's size. That rounding down operation is done at generic_remap_check_len(); 2) Because of that, the btrfs specific code does not expect anymore any non-aligned range length's for deduplication and therefore does not work if such nona-aligned length is given. This patch addresses that second part, and it depends on a patch that fixes generic_remap_check_len(), in the VFS, which was submitted ealier and has the following subject: "fs: allow deduplication of eof block into the end of the destination file" These two patches address reports from users that started seeing lower deduplication rates due to the last block never being deduplicated when the file size is not aligned to the filesystem's block size. Link: https://lore.kernel.org/linux-btrfs/2019-1576167349.500456@svIo.N5dq.dFFD/ CC: stable@vger.kernel.org # 5.1+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-23fs: allow deduplication of eof block into the end of the destination fileFilipe Manana1-6/+4
We always round down, to a multiple of the filesystem's block size, the length to deduplicate at generic_remap_check_len(). However this is only needed if an attempt to deduplicate the last block into the middle of the destination file is requested, since that leads into a corruption if the length of the source file is not block size aligned. When an attempt to deduplicate the last block into the end of the destination file is requested, we should allow it because it is safe to do it - there's no stale data exposure and we are prepared to compare the data ranges for a length not aligned to the block (or page) size - in fact we even do the data compare before adjusting the deduplication length. After btrfs was updated to use the generic helpers from VFS (by commit 34a28e3d77535e ("Btrfs: use generic_remap_file_range_prep() for cloning and deduplication")) we started to have user reports of deduplication not reflinking the last block anymore, and whence users getting lower deduplication scores. The main use case is deduplication of entire files that have a size not aligned to the block size of the filesystem. We already allow cloning the last block to the end (and beyond) of the destination file, so allow for deduplication as well. Link: https://lore.kernel.org/linux-btrfs/2019-1576167349.500456@svIo.N5dq.dFFD/ CC: stable@vger.kernel.org # 5.1+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>