aboutsummaryrefslogtreecommitdiffstats
path: root/fs (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-09-18Merge tag 'staging-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/stagingLinus Torvalds21-0/+6446
Pull staging and IIO driver updates from Greg KH: "Here is the big staging/iio driver update for 5.4-rc1. Lots of churn here, with a few driver/filesystems moving out of staging finally: - erofs moved out of staging - greybus core code moved out of staging Along with that, a new filesytem has been added: - extfat to provide support for those devices requiring that filesystem (i.e. transfer devices to/from windows systems or printers) Other than that, there a number of new IIO drivers, and lots and lots and lots of staging driver cleanups and minor fixes as people continue to dig into those for easy changes. All of these have been in linux-next for a while with no reported issues" * tag 'staging-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (453 commits) Staging: gasket: Use temporaries to reduce line length. Staging: octeon: Avoid several usecases of strcpy staging: vhciq_core: replace snprintf with scnprintf staging: wilc1000: avoid twice IRQ handler execution for each single interrupt staging: wilc1000: remove unused interrupt status handling code staging: fbtft: make several arrays static const, makes object smaller staging: rtl8188eu: make two arrays static const, makes object smaller staging: rtl8723bs: core: Remove Macro "IS_MAC_ADDRESS_BROADCAST" dt-bindings: anybus-controller: move to staging/ tree staging: emxx_udc: remove local TRUE/FALSE definition staging: wilc1000: look for rtc_clk clock staging: dt-bindings: wilc1000: add optional rtc_clk property staging: nvec: make use of devm_platform_ioremap_resource staging: exfat: drop unused function parameter Staging: exfat: Avoid use of strcpy staging: exfat: use integer constants staging: exfat: cleanup spacing for casts staging: exfat: cleanup spacing for operators staging: rtl8723bs: hal: remove redundant variable n staging: pi433: Fix typo in documentation ...
2019-09-18Merge tag 'driver-core-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-coreLinus Torvalds1-4/+5
Pull driver core updates from Greg Kroah-Hartman: "Here is the big driver core update for 5.4-rc1. There was a bit of a churn in here, with a number of core and OF platform patches being added to the tree, and then after much discussion and review and a day-long in-person meeting, they were decided to be reverted and a new set of patches is currently being reviewed on the mailing list. Other than that churn, there are two "persistent" branches in here that other trees will be pulling in as well during the merge window. One branch to add support for drivers to have the driver core automatically add sysfs attribute files when a driver is bound to a device so that the driver doesn't have to manually do it (and then clean it up, as it always gets it wrong). There's another branch in here for generic lookup helpers for the driver core that lots of busses are starting to use. That's the majority of the non-driver-core changes in this patch series. There's also some on-going debugfs file creation cleanup that has been slowly happening over the past few releases, with the goal to hopefully get that done sometime next year. All of these have been in linux-next for a while now with no reported issues" [ Note that the above-mentioned generic lookup helpers branch was already brought in by the LED merge (commit 4feaab05dc1e) that had shared it. Also note that that common branch introduced an i2c bug due to a bad conversion, which got fixed here. - Linus ] * tag 'driver-core-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (49 commits) coccinelle: platform_get_irq: Fix parse error driver-core: add include guard to linux/container.h sysfs: add BIN_ATTR_WO() macro driver core: platform: Export platform_get_irq_optional() hwmon: pwm-fan: Use platform_get_irq_optional() driver core: platform: Introduce platform_get_irq_optional() Revert "driver core: Add support for linking devices during device addition" Revert "driver core: Add edit_links() callback for drivers" Revert "of/platform: Add functional dependency link from DT bindings" Revert "driver core: Add sync_state driver/bus callback" Revert "of/platform: Pause/resume sync state during init and of_platform_populate()" Revert "of/platform: Create device links for all child-supplier depencencies" Revert "of/platform: Don't create device links for default busses" Revert "of/platform: Fix fn definitons for of_link_is_valid() and of_link_property()" Revert "of/platform: Fix device_links_supplier_sync_state_resume() warning" Revert "of/platform: Disable generic device linking code for PowerPC" devcoredump: fix typo in comment devcoredump: use memory_read_from_buffer of/platform: Disable generic device linking code for PowerPC device.h: Fix warnings for mismatched parameter names in comments ...
2019-09-17Merge tag 'pm-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds1-2/+2
Pull power management updates from Rafael Wysocki: "These include a rework of the main suspend-to-idle code flow (related to the handling of spurious wakeups), a switch over of several users of cpufreq notifiers to QoS-based limits, a new devfreq driver for Tegra20, a new cpuidle driver and governor for virtualized guests, an extension of the wakeup sources framework to expose wakeup sources as device objects in sysfs, and more. Specifics: - Rework the main suspend-to-idle control flow to avoid repeating "noirq" device resume and suspend operations in case of spurious wakeups from the ACPI EC and decouple the ACPI EC wakeups support from the LPS0 _DSM support (Rafael Wysocki). - Extend the wakeup sources framework to expose wakeup sources as device objects in sysfs (Tri Vo, Stephen Boyd). - Expose system suspend statistics in sysfs (Kalesh Singh). - Introduce a new haltpoll cpuidle driver and a new matching governor for virtualized guests wanting to do guest-side polling in the idle loop (Marcelo Tosatti, Joao Martins, Wanpeng Li, Stephen Rothwell). - Fix the menu and teo cpuidle governors to allow the scheduler tick to be stopped if PM QoS is used to limit the CPU idle state exit latency in some cases (Rafael Wysocki). - Increase the resolution of the play_idle() argument to microseconds for more fine-grained injection of CPU idle cycles (Daniel Lezcano). - Switch over some users of cpuidle notifiers to the new QoS-based frequency limits and drop the CPUFREQ_ADJUST and CPUFREQ_NOTIFY policy notifier events (Viresh Kumar). - Add new cpufreq driver based on nvmem for sun50i (Yangtao Li). - Add support for MT8183 and MT8516 to the mediatek cpufreq driver (Andrew-sh.Cheng, Fabien Parent). - Add i.MX8MN support to the imx-cpufreq-dt cpufreq driver (Anson Huang). - Add qcs404 to cpufreq-dt-platdev blacklist (Jorge Ramirez-Ortiz). - Update the qcom cpufreq driver (among other things, to make it easier to extend and to use kryo cpufreq for other nvmem-based SoCs) and add qcs404 support to it (Niklas Cassel, Douglas RAILLARD, Sibi Sankar, Sricharan R). - Fix assorted issues and make assorted minor improvements in the cpufreq code (Colin Ian King, Douglas RAILLARD, Florian Fainelli, Gustavo Silva, Hariprasad Kelam). - Add new devfreq driver for NVidia Tegra20 (Dmitry Osipenko, Arnd Bergmann). - Add new Exynos PPMU events to devfreq events and extend that mechanism (Lukasz Luba). - Fix and clean up the exynos-bus devfreq driver (Kamil Konieczny). - Improve devfreq documentation and governor code, fix spelling typos in devfreq (Ezequiel Garcia, Krzysztof Kozlowski, Leonard Crestez, MyungJoo Ham, Gaël PORTAY). - Add regulators enable and disable to the OPP (operating performance points) framework (Kamil Konieczny). - Update the OPP framework to support multiple opp-suspend properties (Anson Huang). - Fix assorted issues and make assorted minor improvements in the OPP code (Niklas Cassel, Viresh Kumar, Yue Hu). - Clean up the generic power domains (genpd) framework (Ulf Hansson). - Clean up assorted pieces of power management code and documentation (Akinobu Mita, Amit Kucheria, Chuhong Yuan). - Update the pm-graph tool to version 5.5 including multiple fixes and improvements (Todd Brandt). - Update the cpupower utility (Benjamin Weis, Geert Uytterhoeven, Sébastien Szymanski)" * tag 'pm-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (126 commits) cpuidle-haltpoll: Enable kvm guest polling when dedicated physical CPUs are available cpuidle-haltpoll: do not set an owner to allow modunload cpuidle-haltpoll: return -ENODEV on modinit failure cpuidle-haltpoll: set haltpoll as preferred governor cpuidle: allow governor switch on cpuidle_register_driver() PM: runtime: Documentation: add runtime_status ABI document pm-graph: make setVal unbuffered again for python2 and python3 powercap: idle_inject: Use higher resolution for idle injection cpuidle: play_idle: Increase the resolution to usec cpuidle-haltpoll: vcpu hotplug support cpufreq: Add qcs404 to cpufreq-dt-platdev blacklist cpufreq: qcom: Add support for qcs404 on nvmem driver cpufreq: qcom: Refactor the driver to make it easier to extend cpufreq: qcom: Re-organise kryo cpufreq to use it for other nvmem based qcom socs dt-bindings: opp: Add qcom-opp bindings with properties needed for CPR dt-bindings: opp: qcom-nvmem: Support pstates provided by a power domain Documentation: cpufreq: Update policy notifier documentation cpufreq: Remove CPUFREQ_ADJUST and CPUFREQ_NOTIFY policy notifier events PM / Domains: Verify PM domain type in dev_pm_genpd_set_performance_state() PM / Domains: Simplify genpd_lookup_dev() ...
2019-09-17Merge tag 'for-5.4/block-2019-09-16' of git://git.kernel.dk/linux-blockLinus Torvalds1-47/+127
Pull block updates from Jens Axboe: - Two NVMe pull requests: - ana log parse fix from Anton - nvme quirks support for Apple devices from Ben - fix missing bio completion tracing for multipath stack devices from Hannes and Mikhail - IP TOS settings for nvme rdma and tcp transports from Israel - rq_dma_dir cleanups from Israel - tracing for Get LBA Status command from Minwoo - Some nvme-tcp cleanups from Minwoo, Potnuri and Myself - Some consolidation between the fabrics transports for handling the CAP register - reset race with ns scanning fix for fabrics (move fabrics commands to a dedicated request queue with a different lifetime from the admin request queue)." - controller reset and namespace scan races fixes - nvme discovery log change uevent support - naming improvements from Keith - multiple discovery controllers reject fix from James - some regular cleanups from various people - Series fixing (and re-fixing) null_blk debug printing and nr_devices checks (André) - A few pull requests from Song, with fixes from Andy, Guoqing, Guilherme, Neil, Nigel, and Yufen. - REQ_OP_ZONE_RESET_ALL support (Chaitanya) - Bio merge handling unification (Christoph) - Pick default elevator correctly for devices with special needs (Damien) - Block stats fixes (Hou) - Timeout and support devices nbd fixes (Mike) - Series fixing races around elevator switching and device add/remove (Ming) - sed-opal cleanups (Revanth) - Per device weight support for BFQ (Fam) - Support for blk-iocost, a new model that can properly account cost of IO workloads. (Tejun) - blk-cgroup writeback fixes (Tejun) - paride queue init fixes (zhengbin) - blk_set_runtime_active() cleanup (Stanley) - Block segment mapping optimizations (Bart) - lightnvm fixes (Hans/Minwoo/YueHaibing) - Various little fixes and cleanups * tag 'for-5.4/block-2019-09-16' of git://git.kernel.dk/linux-block: (186 commits) null_blk: format pr_* logs with pr_fmt null_blk: match the type of parameter nr_devices null_blk: do not fail the module load with zero devices block: also check RQF_STATS in blk_mq_need_time_stamp() block: make rq sector size accessible for block stats bfq: Fix bfq linkage error raid5: use bio_end_sector in r5_next_bio raid5: remove STRIPE_OPS_REQ_PENDING md: add feature flag MD_FEATURE_RAID0_LAYOUT md/raid0: avoid RAID0 data corruption due to layout confusion. raid5: don't set STRIPE_HANDLE to stripe which is in batch list raid5: don't increment read_errors on EILSEQ return nvmet: fix a wrong error status returned in error log page nvme: send discovery log page change events to userspace nvme: add uevent variables for controller devices nvme: enable aen regardless of the presence of I/O queues nvme-fabrics: allow discovery subsystems accept a kato nvmet: Use PTR_ERR_OR_ZERO() in nvmet_init_discovery() nvme: Remove redundant assignment of cq vector nvme: Assign subsys instance from first ctrl ...
2019-09-17Merge tag 'for-5.4/io_uring-2019-09-15' of git://git.kernel.dk/linux-blockLinus Torvalds1-183/+348
Pull io_uring updates from Jens Axboe: - Allocate SQ/CQ ring together, more efficient. Expose this through a feature flag as well, so we can reduce the number of mmaps by 1 (Hristo and me) - Fix for sequence logic with SQ thread (Jackie). - Add support for links with drain commands (Jackie). - Improved async merging (me) - Improved buffered async write performance (me) - Support SQ poll wakeup + event get in single io_uring_enter() (me) - Support larger SQ ring size. For epoll conversions, the 4k limit was too small for some prod workloads (Daniel). - put_user_page() usage (John) * tag 'for-5.4/io_uring-2019-09-15' of git://git.kernel.dk/linux-block: io_uring: increase IORING_MAX_ENTRIES to 32K io_uring: make sqpoll wakeup possible with getevents io_uring: extend async work merging io_uring: limit parallelism of buffered writes io_uring: add io_queue_async_work() helper io_uring: optimize submit_and_wait API io_uring: add support for link with drain io_uring: fix wrong sequence setting logic io_uring: expose single mmap capability io_uring: allocate the two rings together fs/io_uring.c: convert put_page() to put_user_page*()
2019-09-17Merge tag 'docs-5.4' of git://git.lwn.net/linuxLinus Torvalds7-7/+7
Pull documentation updates from Jonathan Corbet: "It's a somewhat calmer cycle for docs this time, as the churn of the mass RST conversion is happily mostly behind us. - A new document on reproducible builds. - We finally got around to zapping the documentation for hardware support that was removed in 2004; one doesn't want to rush these things. - The usual assortment of fixes, typo corrections, etc" * tag 'docs-5.4' of git://git.lwn.net/linux: (67 commits) Documentation: kbuild: Add document about reproducible builds docs: printk-formats: Stop encouraging use of unnecessary %h[xudi] and %hh[xudi] Documentation: Add "earlycon=sbi" to the admin guide doc:lock: remove reference to clever use of read-write lock devices.txt: improve entry for comedi (char major 98) docs: mtd: Update spi nor reference driver doc: arm64: fix grammar dtb placed in no attributes region Documentation: sysrq: don't recommend 'S' 'U' before 'B' mailmap: Update email address for Quentin Perret docs: ftrace: clarify when tracing is disabled by the trace file docs: process: fix broken link Documentation/arm/samsung-s3c24xx: Remove stray U+FEFF character to fix title Documentation/arm/sa1100/assabet: Fix 'make assabet_defconfig' command Documentation/arm/sa1100: Remove some obsolete documentation docs/zh_CN: update Chinese howto.rst for latexdocs making Documentation: virt: Fix broken reference to virt tree's index docs: Fix typo on pull requests guide kernel-doc: Allow anonymous enum Documentation: sphinx: Don't parse socket() as identifier reference Documentation: sphinx: Add missing comma to list of strings ...
2019-09-17Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-1/+5
Pull core timer updates from Thomas Gleixner: "Timers and timekeeping updates: - A large overhaul of the posix CPU timer code which is a preparation for moving the CPU timer expiry out into task work so it can be properly accounted on the task/process. An update to the bogus permission checks will come later during the merge window as feedback was not complete before heading of for travel. - Switch the timerqueue code to use cached rbtrees and get rid of the homebrewn caching of the leftmost node. - Consolidate hrtimer_init() + hrtimer_init_sleeper() calls into a single function - Implement the separation of hrtimers to be forced to expire in hard interrupt context even when PREEMPT_RT is enabled and mark the affected timers accordingly. - Implement a mechanism for hrtimers and the timer wheel to protect RT against priority inversion and live lock issues when a (hr)timer which should be canceled is currently executing the callback. Instead of infinitely spinning, the task which tries to cancel the timer blocks on a per cpu base expiry lock which is held and released by the (hr)timer expiry code. - Enable the Hyper-V TSC page based sched_clock for Hyper-V guests resulting in faster access to timekeeping functions. - Updates to various clocksource/clockevent drivers and their device tree bindings. - The usual small improvements all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (101 commits) posix-cpu-timers: Fix permission check regression posix-cpu-timers: Always clear head pointer on dequeue hrtimer: Add a missing bracket and hide `migration_base' on !SMP posix-cpu-timers: Make expiry_active check actually work correctly posix-timers: Unbreak CONFIG_POSIX_TIMERS=n build tick: Mark sched_timer to expire in hard interrupt context hrtimer: Add kernel doc annotation for HRTIMER_MODE_HARD x86/hyperv: Hide pv_ops access for CONFIG_PARAVIRT=n posix-cpu-timers: Utilize timerqueue for storage posix-cpu-timers: Move state tracking to struct posix_cputimers posix-cpu-timers: Deduplicate rlimit handling posix-cpu-timers: Remove pointless comparisons posix-cpu-timers: Get rid of 64bit divisions posix-cpu-timers: Consolidate timer expiry further posix-cpu-timers: Get rid of zero checks rlimit: Rewrite non-sensical RLIMIT_CPU comment posix-cpu-timers: Respect INFINITY for hard RTTIME limit posix-cpu-timers: Switch thread group sampling to array posix-cpu-timers: Restructure expiry array posix-cpu-timers: Remove cputime_expires ...
2019-09-17Merge branch 'pm-sleep'Rafael J. Wysocki1-2/+2
* pm-sleep: (29 commits) ACPI: PM: s2idle: Always set up EC GPE for system wakeup ACPI: PM: s2idle: Avoid rearming SCI for wakeup unnecessarily PM / wakeup: Unexport wakeup_source_sysfs_{add,remove}() PM / wakeup: Register wakeup class kobj after device is added PM / wakeup: Fix sysfs registration error path PM / wakeup: Show wakeup sources stats in sysfs PM / wakeup: Use wakeup_source_register() in wakelock.c PM / wakeup: Drop wakeup_source_init(), wakeup_source_prepare() PM: sleep: Replace strncmp() with str_has_prefix() PM: suspend: Fix platform_suspend_prepare_noirq() intel-hid: Disable button array during suspend-to-idle intel-hid: intel-vbtn: Avoid leaking wakeup_mode set ACPI: PM: s2idle: Execute LPS0 _DSM functions with suspended devices ACPI: EC: PM: Make acpi_ec_dispatch_gpe() print debug message ACPI: EC: PM: Consolidate some code depending on PM_SLEEP ACPI: PM: s2idle: Eliminate acpi_sleep_no_ec_events() ACPI: PM: s2idle: Switch EC over to polling during "noirq" suspend ACPI: PM: s2idle: Add acpi.sleep_no_lps0 module parameter ACPI: PM: s2idle: Rearrange lps0_device_attach() PM/sleep: Expose suspend stats in sysfs ...
2019-09-15Revert "ext4: make __ext4_get_inode_loc plug"Linus Torvalds1-3/+0
This reverts commit b03755ad6f33b7b8cd7312a3596a2dbf496de6e7. This is sad, and done for all the wrong reasons. Because that commit is good, and does exactly what it says: avoids a lot of small disk requests for the inode table read-ahead. However, it turns out that it causes an entirely unrelated problem: the getrandom() system call was introduced back in 2014 by commit c6e9d6f38894 ("random: introduce getrandom(2) system call"), and people use it as a convenient source of good random numbers. But part of the current semantics for getrandom() is that it waits for the entropy pool to fill at least partially (unlike /dev/urandom). And at least ArchLinux apparently has a systemd that uses getrandom() at boot time, and the improvements in IO patterns means that existing installations suddenly start hanging, waiting for entropy that will never happen. It seems to be an unlucky combination of not _quite_ enough entropy, together with a particular systemd version and configuration. Lennart says that the systemd-random-seed process (which is what does this early access) is supposed to not block any other boot activity, but sadly that doesn't actually seem to be the case (possibly due bogus dependencies on cryptsetup for encrypted swapspace). The correct fix is to fix getrandom() to not block when it's not appropriate, but that fix is going to take a lot more discussion. Do we just make it act like /dev/urandom by default, and add a new flag for "wait for entropy"? Do we add a boot-time option? Or do we just limit the amount of time it will wait for entropy? So in the meantime, we do the revert to give us time to discuss the eventual fix for the fundamental problem, at which point we can re-apply the ext4 inode table access optimization. Reported-by: Ahmed S. Darwish <darwish.07@gmail.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Willy Tarreau <w@1wt.eu> Cc: Alexander E. Patrakov <patrakov@gmail.com> Cc: Lennart Poettering <mzxreary@0pointer.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-14io_uring: increase IORING_MAX_ENTRIES to 32KDaniel Xu1-1/+1
Some workloads can require far more than 4K oustanding entries. For example memcached can have ~300K sockets over ~40 cores. Bumping the max to 32K seems to work pretty well. Reported-by: Dan Melnic <dmm@fb.com> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-13Merge tag 'for-5.3-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linuxLinus Torvalds2-17/+34
Pull btrfs fixes from David Sterba: "Here are two fixes, one of them urgent fixing a bug introduced in 5.2 and reported by many users. It took time to identify the root cause, catching the 5.3 release is higly desired also to push the fix to 5.2 stable tree. The bug is a mess up of return values after adding proper error handling and honestly the kind of bug that can cause sleeping disorders until it's caught. My appologies to everybody who was affected. Summary of what could happen: 1) either a hang when committing a transaction, if this happens there's no risk of corruption, still the hang is very inconvenient and can't be resolved without a reboot 2) writeback for some btree nodes may never be started and we end up committing a transaction without noticing that, this is really serious and that will lead to the "parent transid verify failed" messages" * tag 'for-5.3-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: fix unwritten extent buffers and hangs on future writeback attempts Btrfs: fix assertion failure during fsync and use of stale transaction
2019-09-12io_uring: make sqpoll wakeup possible with geteventsJens Axboe1-6/+2
The way the logic is setup in io_uring_enter() means that you can't wake up the SQ poller thread while at the same time waiting (or polling) for completions afterwards. There's no reason for that to be the case. Reported-by: Lewis Baker <lbaker@fb.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-12io_uring: extend async work mergingJens Axboe1-8/+28
We currently merge async work items if we see a strict sequential hit. This helps avoid unnecessary workqueue switches when we don't need them. We can extend this merging to cover cases where it's not a strict sequential hit, but the IO still fits within the same page. If an application is doing multiple requests within the same page, we don't want separate workers waiting on the same page to complete IO. It's much faster to let the first worker bring in the page, then operate on that page from the same worker to complete the next request(s). Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-12Btrfs: fix unwritten extent buffers and hangs on future writeback attemptsFilipe Manana1-9/+26
The lock_extent_buffer_io() returns 1 to the caller to tell it everything went fine and the callers needs to start writeback for the extent buffer (submit a bio, etc), 0 to tell the caller everything went fine but it does not need to start writeback for the extent buffer, and a negative value if some error happened. When it's about to return 1 it tries to lock all pages, and if a try lock on a page fails, and we didn't flush any existing bio in our "epd", it calls flush_write_bio(epd) and overwrites the return value of 1 to 0 or an error. The page might have been locked elsewhere, not with the goal of starting writeback of the extent buffer, and even by some code other than btrfs, like page migration for example, so it does not mean the writeback of the extent buffer was already started by some other task, so returning a 0 tells the caller (btree_write_cache_pages()) to not start writeback for the extent buffer. Note that epd might currently have either no bio, so flush_write_bio() returns 0 (success) or it might have a bio for another extent buffer with a lower index (logical address). Since we return 0 with the EXTENT_BUFFER_WRITEBACK bit set on the extent buffer and writeback is never started for the extent buffer, future attempts to writeback the extent buffer will hang forever waiting on that bit to be cleared, since it can only be cleared after writeback completes. Such hang is reported with a trace like the following: [49887.347053] INFO: task btrfs-transacti:1752 blocked for more than 122 seconds. [49887.347059] Not tainted 5.2.13-gentoo #2 [49887.347060] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [49887.347062] btrfs-transacti D 0 1752 2 0x80004000 [49887.347064] Call Trace: [49887.347069] ? __schedule+0x265/0x830 [49887.347071] ? bit_wait+0x50/0x50 [49887.347072] ? bit_wait+0x50/0x50 [49887.347074] schedule+0x24/0x90 [49887.347075] io_schedule+0x3c/0x60 [49887.347077] bit_wait_io+0x8/0x50 [49887.347079] __wait_on_bit+0x6c/0x80 [49887.347081] ? __lock_release.isra.29+0x155/0x2d0 [49887.347083] out_of_line_wait_on_bit+0x7b/0x80 [49887.347084] ? var_wake_function+0x20/0x20 [49887.347087] lock_extent_buffer_for_io+0x28c/0x390 [49887.347089] btree_write_cache_pages+0x18e/0x340 [49887.347091] do_writepages+0x29/0xb0 [49887.347093] ? kmem_cache_free+0x132/0x160 [49887.347095] ? convert_extent_bit+0x544/0x680 [49887.347097] filemap_fdatawrite_range+0x70/0x90 [49887.347099] btrfs_write_marked_extents+0x53/0x120 [49887.347100] btrfs_write_and_wait_transaction.isra.4+0x38/0xa0 [49887.347102] btrfs_commit_transaction+0x6bb/0x990 [49887.347103] ? start_transaction+0x33e/0x500 [49887.347105] transaction_kthread+0x139/0x15c So fix this by not overwriting the return value (ret) with the result from flush_write_bio(). We also need to clear the EXTENT_BUFFER_WRITEBACK bit in case flush_write_bio() returns an error, otherwise it will hang any future attempts to writeback the extent buffer, and undo all work done before (set back EXTENT_BUFFER_DIRTY, etc). This is a regression introduced in the 5.2 kernel. Fixes: 2e3c25136adfb ("btrfs: extent_io: add proper error handling to lock_extent_buffer_for_io()") Fixes: f4340622e0226 ("btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up") Reported-by: Zdenek Sojka <zsojka@seznam.cz> Link: https://lore.kernel.org/linux-btrfs/GpO.2yos.3WGDOLpx6t%7D.1TUDYM@seznam.cz/T/#u Reported-by: Stefan Priebe - Profihost AG <s.priebe@profihost.ag> Link: https://lore.kernel.org/linux-btrfs/5c4688ac-10a7-fb07-70e8-c5d31a3fbb38@profihost.ag/T/#t Reported-by: Drazen Kacar <drazen.kacar@oradian.com> Link: https://lore.kernel.org/linux-btrfs/DB8PR03MB562876ECE2319B3E579590F799C80@DB8PR03MB5628.eurprd03.prod.outlook.com/ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204377 Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-12Btrfs: fix assertion failure during fsync and use of stale transactionFilipe Manana1-8/+8
Sometimes when fsync'ing a file we need to log that other inodes exist and when we need to do that we acquire a reference on the inodes and then drop that reference using iput() after logging them. That generally is not a problem except if we end up doing the final iput() (dropping the last reference) on the inode and that inode has a link count of 0, which can happen in a very short time window if the logging path gets a reference on the inode while it's being unlinked. In that case we end up getting the eviction callback, btrfs_evict_inode(), invoked through the iput() call chain which needs to drop all of the inode's items from its subvolume btree, and in order to do that, it needs to join a transaction at the helper function evict_refill_and_join(). However because the task previously started a transaction at the fsync handler, btrfs_sync_file(), it has current->journal_info already pointing to a transaction handle and therefore evict_refill_and_join() will get that transaction handle from btrfs_join_transaction(). From this point on, two different problems can happen: 1) evict_refill_and_join() will often change the transaction handle's block reserve (->block_rsv) and set its ->bytes_reserved field to a value greater than 0. If evict_refill_and_join() never commits the transaction, the eviction handler ends up decreasing the reference count (->use_count) of the transaction handle through the call to btrfs_end_transaction(), and after that point we have a transaction handle with a NULL ->block_rsv (which is the value prior to the transaction join from evict_refill_and_join()) and a ->bytes_reserved value greater than 0. If after the eviction/iput completes the inode logging path hits an error or it decides that it must fallback to a transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a non-zero value from btrfs_log_dentry_safe(), and because of that non-zero value it tries to commit the transaction using a handle with a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes the transaction commit hit an assertion failure at btrfs_trans_release_metadata() because ->bytes_reserved is not zero but the ->block_rsv is NULL. The produced stack trace for that is like the following: [192922.917158] assertion failed: !trans->bytes_reserved, file: fs/btrfs/transaction.c, line: 816 [192922.917553] ------------[ cut here ]------------ [192922.917922] kernel BUG at fs/btrfs/ctree.h:3532! [192922.918310] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI [192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: G W 5.1.4-btrfs-next-47 #1 [192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014 [192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs] (...) [192922.920925] RSP: 0018:ffffaebdc8a27da8 EFLAGS: 00010286 [192922.921315] RAX: 0000000000000051 RBX: ffff95c9c16a41c0 RCX: 0000000000000000 [192922.921692] RDX: 0000000000000000 RSI: ffff95cab6b16838 RDI: ffff95cab6b16838 [192922.922066] RBP: ffff95c9c16a41c0 R08: 0000000000000000 R09: 0000000000000000 [192922.922442] R10: ffffaebdc8a27e70 R11: 0000000000000000 R12: ffff95ca731a0980 [192922.922820] R13: 0000000000000000 R14: ffff95ca84c73338 R15: ffff95ca731a0ea8 [192922.923200] FS: 00007f337eda4e80(0000) GS:ffff95cab6b00000(0000) knlGS:0000000000000000 [192922.923579] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [192922.923948] CR2: 00007f337edad000 CR3: 00000001e00f6002 CR4: 00000000003606e0 [192922.924329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [192922.924711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [192922.925105] Call Trace: [192922.925505] btrfs_trans_release_metadata+0x10c/0x170 [btrfs] [192922.925911] btrfs_commit_transaction+0x3e/0xaf0 [btrfs] [192922.926324] btrfs_sync_file+0x44c/0x490 [btrfs] [192922.926731] do_fsync+0x38/0x60 [192922.927138] __x64_sys_fdatasync+0x13/0x20 [192922.927543] do_syscall_64+0x60/0x1c0 [192922.927939] entry_SYSCALL_64_after_hwframe+0x49/0xbe (...) [192922.934077] ---[ end trace f00808b12068168f ]--- 2) If evict_refill_and_join() decides to commit the transaction, it will be able to do it, since the nested transaction join only increments the transaction handle's ->use_count reference counter and it does not prevent the transaction from getting committed. This means that after eviction completes, the fsync logging path will be using a transaction handle that refers to an already committed transaction. What happens when using such a stale transaction can be unpredictable, we are at least having a use-after-free on the transaction handle itself, since the transaction commit will call kmem_cache_free() against the handle regardless of its ->use_count value, or we can end up silently losing all the updates to the log tree after that iput() in the logging path, or using a transaction handle that in the meanwhile was allocated to another task for a new transaction, etc, pretty much unpredictable what can happen. In order to fix both of them, instead of using iput() during logging, use btrfs_add_delayed_iput(), so that the logging path of fsync never drops the last reference on an inode, that step is offloaded to a safe context (usually the cleaner kthread). The assertion failure issue was sporadically triggered by the test case generic/475 from fstests, which loads the dm error target while fsstress is running, which lead to fsync failing while logging inodes with -EIO errors and then trying later to commit the transaction, triggering the assertion failure. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-10io_uring: limit parallelism of buffered writesJens Axboe1-8/+39
All the popular filesystems need to grab the inode lock for buffered writes. With io_uring punting buffered writes to async context, we observe a lot of contention with all workers hamming this mutex. For buffered writes, we generally don't need a lot of parallelism on the submission side, as the flushing will take care of that for us. Hence we don't need a deep queue on the write side, as long as we can safely punt from the original submission context. Add a workqueue with a limit of 2 that we can use for buffered writes. This greatly improves the performance and efficiency of higher queue depth buffered async writes with io_uring. Reported-by: Andres Freund <andres@anarazel.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10io_uring: add io_queue_async_work() helperJens Axboe1-5/+11
Add a helper for queueing a request for async execution, in preparation for optimizing it. No functional change in this patch. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10io_uring: optimize submit_and_wait APIJens Axboe1-17/+46
For some applications that end up using a submit-and-wait type of approach for certain batches of IO, we can make that a bit more efficient by allowing the application to block for the last IO submission. This prevents an async when we don't need it, as the application will be blocking for the completion event(s) anyway. Typical use cases are using the liburing io_uring_submit_and_wait() API, or just using io_uring_enter() doing both submissions and completions. As a specific example, RocksDB doing MultiGet() is sped up quite a bit with this change. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-09io_uring: add support for link with drainJackie Liu1-17/+97
To support the link with drain, we need to do two parts. There is an sqes: 0 1 2 3 4 5 6 +-----+-----+-----+-----+-----+-----+-----+ | N | L | L | L+D | N | N | N | +-----+-----+-----+-----+-----+-----+-----+ First, we need to ensure that the io before the link is completed, there is a easy way is set drain flag to the link list's head, so all subsequent io will be inserted into the defer_list. +-----+ (0) | N | +-----+ | (2) (3) (4) +-----+ +-----+ +-----+ +-----+ (1) | L+D | --> | L | --> | L+D | --> | N | +-----+ +-----+ +-----+ +-----+ | +-----+ (5) | N | +-----+ | +-----+ (6) | N | +-----+ Second, ensure that the following IO will not be completed first, an easy way is to create a mirror of drain io and insert it into defer_list, in this way, as long as drain io is not processed, the following io in the defer_list will not be actively process. +-----+ (0) | N | +-----+ | (2) (3) (4) +-----+ +-----+ +-----+ +-----+ (1) | L+D | --> | L | --> | L+D | --> | N | +-----+ +-----+ +-----+ +-----+ | +-----+ ('3) | D | <== This is a shadow of (3) +-----+ | +-----+ (5) | N | +-----+ | +-----+ (6) | N | +-----+ Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-09io_uring: fix wrong sequence setting logicJackie Liu1-1/+3
Sqo_thread will get sqring in batches, which will cause ctx->cached_sq_head to be added in batches. if one of these sqes is set with the DRAIN flag, then he will never get a chance to process, and finally sqo_thread will not exit. Fixes: de0617e4671 ("io_uring: add support for marking commands as draining") Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-06Merge tag 'configfs-for-5.3' of git://git.infradead.org/users/hch/configfsLinus Torvalds3-175/+257
Pull configfs fixes from Christoph Hellwig: "Late configfs fixes from Al that fix pretty nasty removal vs attribute access races" * tag 'configfs-for-5.3' of git://git.infradead.org/users/hch/configfs: configfs: provide exclusion between IO and removals configfs: new object reprsenting tree fragments configfs_register_group() shouldn't be (and isn't) called in rmdirable parts configfs: stash the data we need into configfs_buffer at open time
2019-09-06io_uring: expose single mmap capabilityJens Axboe1-0/+2
After commit 75b28affdd6a we can get by with just a single mmap to map both the sq and cq ring. However, userspace doesn't know that. Add a features variable to io_uring_params, and notify userspace that the kernel has this ability. This can then be used in liburing (or in applications directly) to avoid the second mmap. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-05erofs: use read_cache_page_gfp for erofs_get_meta_pageGao Xiang1-61/+9
As Christoph said [1], "I'd much prefer to just use read_cache_page_gfp, and live with the fact that this allocates bufferheads behind you for now. I'll try to speed up my attempts to get rid of the buffer heads on the block device mapping instead. " This simplifies the code a lot and a minor thing is "no REQ_META (e.g. for blktrace) on metadata at all..." [1] https://lore.kernel.org/r/20190903153704.GA2201@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-26-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05erofs: always use iget5_lockedGao Xiang1-7/+0
As Christoph said [1] [2], "Just use the slightly more complicated 32-bit version everywhere so that you have a single actually tested code path. And then remove this helper. " [1] https://lore.kernel.org/r/20190829102426.GE20598@infradead.org/ [2] https://lore.kernel.org/r/20190902125320.GA16726@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-25-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05erofs: use read_mapping_page instead of sb_breadGao Xiang1-6/+9
As Christoph said [1], "This seems to be your only direct use of buffer heads, which while not deprecated are a bit of an ugly step child. So if you can easily avoid creating a buffer_head dependency in a new filesystem I think you should avoid it. " [1] https://lore.kernel.org/r/20190902125109.GA9826@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-24-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05erofs: rename errln/infoln/debugln to erofs_{err, info, dbg}Gao Xiang11-90/+139
Add prefix "erofs_" to these functions and print sb->s_id as a prefix to erofs_{err, info} so that the user knows which file system is affected. Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-23-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05erofs: save one level of indentationGao Xiang1-32/+33
As Christoph said [1], ".. and save one level of indentation." [1] https://lore.kernel.org/r/20190829102426.GE20598@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-22-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05erofs: kill use_vmap module parameterGao Xiang1-33/+13
As Christoph said [1], "vm_map_ram is supposed to generally behave better. So if it doesn't please report that that to the arch maintainer and linux-mm so that they can look into the issue. Having user make choices of deep down kernel internals is just a horrible interface. Please talk to maintainers of other bits of the kernel if you see issues and / or need enhancements. " Let's redo the previous conclusion and kill the vmap approach. [1] https://lore.kernel.org/r/20190830165533.GA10909@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-21-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05erofs: kill all erofs specific fault injectionGao Xiang6-158/+2
As Christoph suggested [1], "Please just use plain kmalloc everywhere and let the normal kernel error injection code take care of injeting any errors." [1] https://lore.kernel.org/r/20190829102426.GE20598@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-20-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05erofs: add "erofs_" prefix for common and short functionsGao Xiang7-53/+54
Add erofs_ prefix to free_inode, alloc_inode, ... Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-19-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05erofs: kill __submit_bio()Gao Xiang3-17/+17
As Christoph pointed out [1], " Why is there __submit_bio which really just obsfucates what is going on? Also why is __submit_bio using bio_set_op_attrs instead of opencode it as the comment right next to it asks you to? " Let's use submit_bio directly instead. [1] https://lore.kernel.org/r/20190830162812.GA10694@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-18-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05erofs: kill prio and nofail of erofs_get_meta_page()Gao Xiang5-46/+16
As Christoph pointed out [1], "Why is there __erofs_get_meta_page with the two weird booleans instead of a single erofs_get_meta_page that gets and gfp_t for additional flags and an unsigned int for additional bio op flags." And since all callers can handle errors, let's kill prio and nofail and erofs_get_inline_page() now. [1] https://lore.kernel.org/r/20190830162812.GA10694@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-17-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05erofs: localize erofs_grab_bio()Gao Xiang3-45/+22
As Christoph pointed out [1], "erofs_grab_bio tries to handle a bio_alloc failure, except that the function will not actually fail due the mempool backing it." Sorry about useless code, fix it now and localize erofs_grab_bio [2]. [1] https://lore.kernel.org/r/20190830162812.GA10694@infradead.org/ [2] https://lore.kernel.org/r/20190902122016.GL15931@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-16-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05erofs: kill verbose debug info in erofs_fill_superGao Xiang1-7/+2
As Christoph said [1], "That is some very verbose debug info. We usually don't add that and let people trace the function instead. " [1] https://lore.kernel.org/r/20190829101545.GC20598@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-15-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05erofs: use dsb instead of layout for ondisk super_blockGao Xiang1-18/+17
As Christoph pointed out [1], "Why is the variable name for the on-disk subperblock layout? We usually still calls this something with sb in the name, e.g. dsb. for disksuper block. " Let's fix it. [1] https://lore.kernel.org/r/20190829101545.GC20598@infradead.org/ Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-14-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05erofs: better erofs symlink stuffsGao Xiang3-54/+29
Fix as Christoph suggested [1] [2], "remove is_inode_fast_symlink and just opencode it in the few places using it" and "Please just set the ops directly instead of obsfucating that in a single caller, single line inline function. And please set it instead of the normal symlink iops in the same place where you also set those." [1] https://lore.kernel.org/r/20190830163910.GB29603@infradead.org/ [2] https://lore.kernel.org/r/20190829102426.GE20598@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-13-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05erofs: update comments in inode.cGao Xiang1-3/+2
As Christoph suggested [1], update them all. [1] https://lore.kernel.org/r/20190829102426.GE20598@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-12-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05erofs: update erofs_fs.h commentsGao Xiang1-4/+5
As Christoph said [1] [2], update it now. [1] https://lore.kernel.org/r/20190902124521.GA22153@infradead.org/ [2] https://lore.kernel.org/r/20190902120548.GB15931@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-11-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05erofs: use erofs_inode namingGao Xiang10-57/+54
As Christoph suggested [1], "Why is this called vnode instead of inode? That seems like a rather odd naming for a Linux file system." [1] https://lore.kernel.org/r/20190829101545.GC20598@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-10-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05erofs: kill erofs_{init,exit}_inode_cacheGao Xiang1-19/+12
As Christoph said [1] "having this function seems entirely pointless", let's kill those. filesystem function name ext2,f2fs,ext4,isofs,squashfs,cifs,... init_inodecache In addition, add a necessary "rcu_barrier()" on exit_fs(); [1] https://lore.kernel.org/r/20190829101545.GC20598@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-9-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05erofs: better naming for erofs inode related stuffsGao Xiang6-90/+108
updates inode naming - kill is_inode_layout_compression [1] - kill magic underscores [2] [3] - better naming for datamode & data_mapping_mode [3] - better naming erofs_inode_{compact, extended} [4] [1] https://lore.kernel.org/r/20190829102426.GE20598@infradead.org/ [2] https://lore.kernel.org/r/20190829102426.GE20598@infradead.org/ [3] https://lore.kernel.org/r/20190902122627.GN15931@infradead.org/ [4] https://lore.kernel.org/r/20190902125438.GA17750@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-8-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05erofs: use feature_incompat rather than requirementsGao Xiang4-13/+14
As Christoph said [1], "This is only cosmetic, why not stick to feature_compat and feature_incompat?" In my thought, requirements means "incompatible" instead of "feature" though. [1] https://lore.kernel.org/r/20190902125109.GA9826@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-7-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05erofs: update erofs_inode_is_data_compressed helperGao Xiang1-3/+2
As Christoph said, "This looks like a really obsfucated way to write: return datamode == EROFS_INODE_FLAT_COMPRESSION || datamode == EROFS_INODE_FLAT_COMPRESSION_LEGACY; " Although I had my own consideration, it's the right way for now. [1] https://lore.kernel.org/r/20190829095954.GB20598@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-6-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05erofs: kill __packed for on-disk structuresGao Xiang1-9/+9
As Christoph suggested "Please don't add __packed" [1], remove all __packed except struct erofs_dirent here. Note that all on-disk fields except struct erofs_dirent (12 bytes with a 8-byte nid) in EROFS are naturally aligned. [1] https://lore.kernel.org/r/20190829095954.GB20598@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-5-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05erofs: some macros are much more readable as a functionGao Xiang3-11/+17
As Christoph suggested [1], these macros are much more readable as a function. [1] https://lore.kernel.org/r/20190829095954.GB20598@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-4-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05erofs: on-disk format should have explicitly assigned numbersGao Xiang1-9/+9
As Christoph suggested [1], on-disk format should have explicitly assigned numbers. [1] https://lore.kernel.org/r/20190829095954.GB20598@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-3-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-05erofs: remove all the byte offset commentsGao Xiang1-51/+54
As Christoph suggested [1], "Please remove all the byte offset comments. that is something that can easily be checked with gdb or pahole." [1] https://lore.kernel.org/r/20190829095954.GB20598@infradead.org/ Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190904020912.63925-2-gaoxiang25@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-04configfs: provide exclusion between IO and removalsAl Viro2-18/+80
Make sure that attribute methods are not called after the item has been removed from the tree. To do so, we * at the point of no return in removals, grab ->frag_sem exclusive and mark the fragment dead. * call the methods of attributes with ->frag_sem taken shared and only after having verified that the fragment is still alive. The main benefit is for method instances - they are guaranteed that the objects they are accessing *and* all ancestors are still there. Another win is that we don't need to bother with extra refcount on config_item when opening a file - the item will be alive for as long as it stays in the tree, and we won't touch it/attributes/any associated data after it's been removed from the tree. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-09-04erofs: using switch-case while checking the inode type.Pratik Shinde1-6/+12
while filling the linux inode, using switch-case statement to check the type of inode. switch-case statement looks more clean here. Signed-off-by: Pratik Shinde <pratikshinde320@gmail.com> Reviewed-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190830095615.10995-1-pratikshinde320@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-02configfs: new object reprsenting tree fragmentsAl Viro3-27/+97
Refcounted, hangs of configfs_dirent, created by operations that add fragments to configfs tree (mkdir and configfs_register_{subsystem,group}). Will be used in the next commit to provide exclusion between fragment removal and ->show/->store calls. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de>