aboutsummaryrefslogtreecommitdiffstats
path: root/fs (follow)
AgeCommit message (Collapse)AuthorFilesLines
2016-10-06Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespaceLinus Torvalds7-29/+177
Pull namespace updates from Eric Biederman: "This set of changes is a number of smaller things that have been overlooked in other development cycles focused on more fundamental change. The devpts changes are small things that were a distraction until we managed to kill off DEVPTS_MULTPLE_INSTANCES. There is an trivial regression fix to autofs for the unprivileged mount changes that went in last cycle. A pair of ioctls has been added by Andrey Vagin making it is possible to discover the relationships between namespaces when referring to them through file descriptors. The big user visible change is starting to add simple resource limits to catch programs that misbehave. With namespaces in general and user namespaces in particular allowing users to use more kinds of resources, it has become important to have something to limit errant programs. Because the purpose of these limits is to catch errant programs the code needs to be inexpensive to use as it always on, and the default limits need to be high enough that well behaved programs on well behaved systems don't encounter them. To this end, after some review I have implemented per user per user namespace limits, and use them to limit the number of namespaces. The limits being per user mean that one user can not exhause the limits of another user. The limits being per user namespace allow contexts where the limit is 0 and security conscious folks can remove from their threat anlysis the code used to manage namespaces (as they have historically done as it root only). At the same time the limits being per user namespace allow other parts of the system to use namespaces. Namespaces are increasingly being used in application sand boxing scenarios so an all or nothing disable for the entire system for the security conscious folks makes increasing use of these sandboxes impossible. There is also added a limit on the maximum number of mounts present in a single mount namespace. It is nontrivial to guess what a reasonable system wide limit on the number of mount structure in the kernel would be, especially as it various based on how a system is using containers. A limit on the number of mounts in a mount namespace however is much easier to understand and set. In most cases in practice only about 1000 mounts are used. Given that some autofs scenarious have the potential to be 30,000 to 50,000 mounts I have set the default limit for the number of mounts at 100,000 which is well above every known set of users but low enough that the mount hash tables don't degrade unreaonsably. These limits are a start. I expect this estabilishes a pattern that other limits for resources that namespaces use will follow. There has been interest in making inotify event limits per user per user namespace as well as interest expressed in making details about what is going on in the kernel more visible" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (28 commits) autofs: Fix automounts by using current_real_cred()->uid mnt: Add a per mount namespace limit on the number of mounts netns: move {inc,dec}_net_namespaces into #ifdef nsfs: Simplify __ns_get_path tools/testing: add a test to check nsfs ioctl-s nsfs: add ioctl to get a parent namespace nsfs: add ioctl to get an owning user namespace for ns file descriptor kernel: add a helper to get an owning user namespace for a namespace devpts: Change the owner of /dev/pts/ptmx to the mounter of /dev/pts devpts: Remove sync_filesystems devpts: Make devpts_kill_sb safe if fsi is NULL devpts: Simplify devpts_mount by using mount_nodev devpts: Move the creation of /dev/pts/ptmx into fill_super devpts: Move parse_mount_options into fill_super userns: When the per user per user namespace limit is reached return ENOSPC userns; Document per user per user namespace limits. mntns: Add a limit on the number of mount namespaces. netns: Add a limit on the number of net namespaces cgroupns: Add a limit on the number of cgroup namespaces ipcns: Add a limit on the number of ipc namespaces ...
2016-10-06Merge tag 'xfs-for-linus-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfsLinus Torvalds44-703/+1799
Pull xfs and iomap updates from Dave Chinner: "The main things in this update are the iomap-based DAX infrastructure, an XFS delalloc rework, and a chunk of fixes to how log recovery schedules writeback to prevent spurious corruption detections when recovery of certain items was not required. The other main chunk of code is some preparation for the upcoming reflink functionality. Most of it is generic and cleanups that stand alone, but they were ready and reviewed so are in this pull request. Speaking of reflink, I'm currently planning to send you another pull request next week containing all the new reflink functionality. I'm working through a similar process to the last cycle, where I sent the reverse mapping code in a separate request because of how large it was. The reflink code merge is even bigger than reverse mapping, so I'll be doing the same thing again.... Summary for this update: - change of XFS mailing list to linux-xfs@vger.kernel.org - iomap-based DAX infrastructure w/ XFS and ext2 support - small iomap fixes and additions - more efficient XFS delayed allocation infrastructure based on iomap - a rework of log recovery writeback scheduling to ensure we don't fail recovery when trying to replay items that are already on disk - some preparation patches for upcoming reflink support - configurable error handling fixes and documentation - aio access time update race fixes for XFS and generic_file_read_iter" * tag 'xfs-for-linus-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (40 commits) fs: update atime before I/O in generic_file_read_iter xfs: update atime before I/O in xfs_file_dio_aio_read ext2: fix possible integer truncation in ext2_iomap_begin xfs: log recovery tracepoints to track current lsn and buffer submission xfs: update metadata LSN in buffers during log recovery xfs: don't warn on buffers not being recovered due to LSN xfs: pass current lsn to log recovery buffer validation xfs: rework log recovery to submit buffers on LSN boundaries xfs: quiesce the filesystem after recovery on readonly mount xfs: remote attribute blocks aren't really userdata ext2: use iomap to implement DAX ext2: stop passing buffer_head to ext2_get_blocks xfs: use iomap to implement DAX xfs: refactor xfs_setfilesize xfs: take the ilock shared if possible in xfs_file_iomap_begin xfs: fix locking for DAX writes dax: provide an iomap based fault handler dax: provide an iomap based dax read/write path dax: don't pass buffer_head to copy_user_dax dax: don't pass buffer_head to dax_insert_mapping ...
2016-10-06Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds1-1/+1
Pull ARM updates from Russell King: - Correct ARMs dma-mapping to use the correct printk format strings. - Avoid defining OBJCOPYFLAGS globally which upsets lkdtm rodata testing. - Cleanups to ARMs asm/memory.h include. - L2 cache cleanups. - Allow flat nommu binaries to be executed on ARM MMU systems. - Kernel hardening - add more read-only after init annotations, including making some kernel vdso variables const. - Ensure AMBA primecell clocks are appropriately defaulted. - ARM breakpoint cleanup. - Various StrongARM 11x0 and companion chip (SA1111) updates to bring this legacy platform to use more modern APIs for (eg) GPIOs and interrupts, which will allow us in the future to reduce some of the board-level driver clutter and elimate function callbacks into board code via platform data. There still appears to be interest in these platforms! - Remove the now redundant secure_flush_area() API. - Module PLT relocation optimisations. Ard says: This series of 4 patches optimizes the ARM PLT generation code that is invoked at module load time, to get rid of the O(n^2) algorithm that results in pathological load times of 10 seconds or more for large modules on certain STB platforms. - ARMv7M cache maintanence support. - L2 cache PMU support * 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: (35 commits) ARM: sa1111: provide to_sa1111_device() macro ARM: sa1111: add sa1111_get_irq() ARM: sa1111: clean up duplication in IRQ chip implementation ARM: sa1111: implement a gpio_chip for SA1111 GPIOs ARM: sa1111: move irq cleanup to separate function ARM: sa1111: use devm_clk_get() ARM: sa1111: use devm_kzalloc() ARM: sa1111: ensure we only touch RAB bus type devices when removing ARM: 8611/1: l2x0: add PMU support ARM: 8610/1: V7M: Add dsb before jumping in handler mode ARM: 8609/1: V7M: Add support for the Cortex-M7 processor ARM: 8608/1: V7M: Indirect proc_info construction for V7M CPUs ARM: 8607/1: V7M: Wire up caches for V7M processors with cache support. ARM: 8606/1: V7M: introduce cache operations ARM: 8605/1: V7M: fix notrace variant of save_and_disable_irqs ARM: 8604/1: V7M: Add support for reading the CTR with read_cpuid_cachetype() ARM: 8603/1: V7M: Add addresses for mem-mapped V7M cache operations ARM: 8602/1: factor out CSSELR/CCSIDR operations that use cp15 directly ARM: kernel: avoid brute force search on PLT generation ARM: kernel: sort relocation sections before allocating PLTs ...
2016-10-06Merge branches 'misc' and 'sa1111-base' into for-linusRussell King106-914/+2006
2016-10-05Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuseLinus Torvalds9-256/+482
Pull fuse updates from Miklos Szeredi: "This adds POSIX ACL permission checking to the fuse kernel module. In addition there are minor bug fixes as well as cleanups" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: limit xattr returned size fuse: remove duplicate cs->offset assignment fuse: don't use fuse_ioctl_copy_user() helper fuse_ioctl_copy_user(): don't open-code copy_page_{to,from}_iter() fuse: get rid of fc->flags fuse: use timespec64 fuse: don't use ->d_time fuse: Add posix ACL support fuse: handle killpriv in userspace fs fuse: fix killing s[ug]id in setattr fuse: invalidate dir dentry after chmod fuse: Use generic xattr ops fuse: listxattr: verify xattr list
2016-10-05Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fsLinus Torvalds5-9/+54
Pull misc filesystem and quota fixes from Jan Kara: "Some smaller udf, ext2, quota & reiserfs fixes" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2: Unmap metadata when zeroing blocks udf: don't bother with full-page write optimisations in adinicb case reiserfs: Unlock superblock before calling reiserfs_quota_on_mount() udf: Remove useless check in udf_adinicb_write_begin() quota: fill in Q_XGETQSTAT inode information for inactive quotas ext2: Check return value from ext2_get_group_desc()
2016-10-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds13-513/+414
Pull networking updates from David Miller: 1) BBR TCP congestion control, from Neal Cardwell, Yuchung Cheng and co. at Google. https://lwn.net/Articles/701165/ 2) Do TCP Small Queues for retransmits, from Eric Dumazet. 3) Support collect_md mode for all IPV4 and IPV6 tunnels, from Alexei Starovoitov. 4) Allow cls_flower to classify packets in ip tunnels, from Amir Vadai. 5) Support DSA tagging in older mv88e6xxx switches, from Andrew Lunn. 6) Support GMAC protocol in iwlwifi mwm, from Ayala Beker. 7) Support ndo_poll_controller in mlx5, from Calvin Owens. 8) Move VRF processing to an output hook and allow l3mdev to be loopback, from David Ahern. 9) Support SOCK_DESTROY for UDP sockets. Also from David Ahern. 10) Congestion control in RXRPC, from David Howells. 11) Support geneve RX offload in ixgbe, from Emil Tantilov. 12) When hitting pressure for new incoming TCP data SKBs, perform a partial rathern than a full purge of the OFO queue (which could be huge). From Eric Dumazet. 13) Convert XFRM state and policy lookups to RCU, from Florian Westphal. 14) Support RX network flow classification to igb, from Gangfeng Huang. 15) Hardware offloading of eBPF in nfp driver, from Jakub Kicinski. 16) New skbmod packet action, from Jamal Hadi Salim. 17) Remove some inefficiencies in snmp proc output, from Jia He. 18) Add FIB notifications to properly propagate route changes to hardware which is doing forwarding offloading. From Jiri Pirko. 19) New dsa driver for qca8xxx chips, from John Crispin. 20) Implement RFC7559 ipv6 router solicitation backoff, from Maciej Żenczykowski. 21) Add L3 mode to ipvlan, from Mahesh Bandewar. 22) Support 802.1ad in mlx4, from Moshe Shemesh. 23) Support hardware LRO in mediatek driver, from Nelson Chang. 24) Add TC offloading to mlx5, from Or Gerlitz. 25) Convert various drivers to ethtool ksettings interfaces, from Philippe Reynes. 26) TX max rate limiting for cxgb4, from Rahul Lakkireddy. 27) NAPI support for ath10k, from Rajkumar Manoharan. 28) Support XDP in mlx5, from Rana Shahout and Saeed Mahameed. 29) UDP replicast support in TIPC, from Richard Alpe. 30) Per-queue statistics for qed driver, from Sudarsana Reddy Kalluru. 31) Support BQL in thunderx driver, from Sunil Goutham. 32) TSO support in alx driver, from Tobias Regnery. 33) Add stream parser engine and use it in kcm. 34) Support async DHCP replies in ipconfig module, from Uwe Kleine-König. 35) DSA port fast aging for mv88e6xxx driver, from Vivien Didelot. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1715 commits) mlxsw: switchx2: Fix misuse of hard_header_len mlxsw: spectrum: Fix misuse of hard_header_len net/faraday: Stop NCSI device on shutdown net/ncsi: Introduce ncsi_stop_dev() net/ncsi: Rework the channel monitoring net/ncsi: Allow to extend NCSI request properties net/ncsi: Rework request index allocation net/ncsi: Don't probe on the reserved channel ID (0x1f) net/ncsi: Introduce NCSI_RESERVED_CHANNEL net/ncsi: Avoid unused-value build warning from ia64-linux-gcc net: Add netdev all_adj_list refcnt propagation to fix panic net: phy: Add Edge-rate driver for Microsemi PHYs. vmxnet3: Wake queue from reset work i40e: avoid NULL pointer dereference and recursive errors on early PCI error qed: Add RoCE ll2 & GSI support qed: Add support for memory registeration verbs qed: Add support for QP verbs qed: PD,PKEY and CQ verb support qed: Add support for RoCE hw init qede: Add qedr framework ...
2016-10-04Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-securityLinus Torvalds2-0/+32
Pull security subsystem updates from James Morris: SELinux/LSM: - overlayfs support, necessary for container filesystems LSM: - finally remove the kernel_module_from_file hook Smack: - treat signal delivery as an 'append' operation TPM: - lots of bugfixes & updates Audit: - new audit data type: LSM_AUDIT_DATA_FILE * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (47 commits) Revert "tpm/tpm_crb: implement tpm crb idle state" Revert "tmp/tpm_crb: fix Intel PTT hw bug during idle state" Revert "tpm/tpm_crb: open code the crb_init into acpi_add" Revert "tmp/tpm_crb: implement runtime pm for tpm_crb" lsm,audit,selinux: Introduce a new audit data type LSM_AUDIT_DATA_FILE tmp/tpm_crb: implement runtime pm for tpm_crb tpm/tpm_crb: open code the crb_init into acpi_add tmp/tpm_crb: fix Intel PTT hw bug during idle state tpm/tpm_crb: implement tpm crb idle state tpm: add check for minimum buffer size in tpm_transmit() tpm: constify TPM 1.x header structures tpm/tpm_crb: fix the over 80 characters checkpatch warring tpm/tpm_crb: drop useless cpu_to_le32 when writing to registers tpm/tpm_crb: cache cmd_size register value. tmp/tpm_crb: drop include to platform_device tpm/tpm_tis: remove unused itpm variable tpm_crb: fix incorrect values of cmdReady and goIdle bits tpm_crb: refine the naming of constants tpm_crb: remove wmb()'s tpm_crb: fix crb_req_canceled behavior ...
2016-10-04Merge tag 'jfs-4.9' of git://github.com/kleikamp/linux-shaggyLinus Torvalds2-4/+9
Pull jfs updates from David Kleikamp: "Minor jfs updates" * tag 'jfs-4.9' of git://github.com/kleikamp/linux-shaggy: jfs: Simplify code jfs: jump to error_out when filemap_{fdatawait, write_and_wait} fails
2016-10-04Merge tag 'gfs2-4.8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2Linus Torvalds12-49/+71
Pull gfs2 updates from Bob Peterson: "We've only got six GFS2 patches for this merge window. In patch order: - Fabian Frederick submitted a nice cleanup that uses the BIT macro rather than bit shifting. - Andreas Gruenbacher contributed a patch that fixes a long-standing annoyance whereby GFS2 warned about dirty pages. - Andreas also fixed a problem with the recent extended attribute readahead feature. - Chao Yu contributed a patch that checks the return code from function register_shrinker and reacts accordingly. Previously, it was not checked. - Andreas Gruenbacher also fixed a problem whereby incore file timestamps were forgotten if the file was invalidated. This merely moves the assignment inside the inode glock where it belongs. - Andreas also fixed a problem where incore timestamps were not initialized" * tag 'gfs2-4.8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Initialize atime of I_NEW inodes gfs2: Update file times after grabbing glock gfs2: fix to detect failure of register_shrinker gfs2: Fix extended attribute readahead optimization gfs2: Remove dirty buffer warning from gfs2_releasepage GFS2: use BIT() macro
2016-10-04Merge tag 'locks-v4.9-1' of git://git.samba.org/jlayton/linuxLinus Torvalds1-3/+18
Pull file locking updates from Jeff Layton: "Only a single patch from Nikolay this cycle, with a small change to better handle /proc/locks in a containerized host" * tag 'locks-v4.9-1' of git://git.samba.org/jlayton/linux: locks: Filter /proc/locks output on proc pid ns
2016-10-03Merge tag 'tty-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/ttyLinus Torvalds1-45/+26
Pull tty and serial updates from Greg KH: "Here is the big tty and serial patch set for 4.9-rc1. It also includes some drivers/dma/ changes, as those were needed by some serial drivers, and they were all acked by the DMA maintainer. Also in here is the long-suffering ACPI SPCR patchset, which was passed around from maintainer to maintainer like a hot-potato. Seems I was the sucker^Wlucky one. All of those patches have been acked by the various subsystem maintainers as well. All of this has been in linux-next with no reported issues" * tag 'tty-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (111 commits) Revert "serial: pl011: add console matching function" MAINTAINERS: update entry for atmel_serial driver serial: pl011: add console matching function ARM64: ACPI: enable ACPI_SPCR_TABLE ACPI: parse SPCR and enable matching console of/serial: move earlycon early_param handling to serial Revert "drivers/tty: Explicitly pass current to show_stack" tty: amba-pl011: Don't complain on -EPROBE_DEFER when no irq nios2: dts: 10m50: Add tx-threshold parameter serial: 8250: Set Altera 16550 TX FIFO Threshold serial: 8250: of: Load TX FIFO Threshold from DT Documentation: dt: serial: Add TX FIFO threshold parameter drivers/tty: Explicitly pass current to show_stack serial: imx: Fix DCD reading serial: stm32: mark symbols static where possible serial: xuartps: Add some register initialisation to cdns_early_console_setup() serial: xuartps: Removed unwanted checks while reading the error conditions serial: xuartps: Rewrite the interrupt handling logic serial: stm32: use mapbase instead of membase for DMA tty/serial: atmel: fix fractional baud rate computation ...
2016-10-03Merge tag 'driver-core-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-coreLinus Torvalds3-15/+8
Pull driver core updates from Greg KH: "Here are the "big" driver core patches for 4.9-rc1. Also in here are a number of debugfs fixes that cropped up due to the changes that happened in 4.8 for that filesystem. Overall, nothing major, just a few fixes and cleanups. All of these have been in linux-next with no reported issues" * tag 'driver-core-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (23 commits) drivers: dma-coherent: Move spinlock in dma_alloc_from_coherent() drivers: dma-coherent: Fix DMA coherent size for less than page MAINTAINERS: extend firmware_class maintainer list debugfs: propagate release() call result driver-core: platform: Catch errors from calls to irq_get_irq_data sysfs print name of undiscoverable attribute group carl9170: fix debugfs crashes b43legacy: fix debugfs crash b43: fix debugfs crash debugfs: introduce a public file_operations accessor device core: Remove deprecated create_singlethread_workqueue drivers/base dmam_declare_coherent_memory leaks platform: don't return 0 from platform_get_irq[_byname]() on error cpu: clean up register_cpu func dma-mapping: use vma_pages(). drivers: dma-coherent: use vma_pages(). attribute_container: Fix typo base: soc: make it explicitly non-modular drivers: base: dma-mapping: page align the size when unmap_kernel_range platform driver: fix use-after-free in platform_device_del() ...
2016-10-03Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-15/+8
Pull x86 vdso updates from Ingo Molnar: "The main changes in this cycle centered around adding support for 32-bit compatible C/R of the vDSO on 64-bit kernels, by Dmitry Safonov" * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vdso: Use CONFIG_X86_X32_ABI to enable vdso prctl x86/vdso: Only define map_vdso_randomized() if CONFIG_X86_64 x86/vdso: Only define prctl_map_vdso() if CONFIG_CHECKPOINT_RESTORE x86/signal: Add SA_{X32,IA32}_ABI sa_flags x86/ptrace: Down with test_thread_flag(TIF_IA32) x86/coredump: Use pr_reg size, rather that TIF_IA32 flag x86/arch_prctl/vdso: Add ARCH_MAP_VDSO_* x86/vdso: Replace calculate_addr in map_vdso() with addr x86/vdso: Unmap vdso blob on vvar mapping failure
2016-10-03Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-1/+1
Pull low-level x86 updates from Ingo Molnar: "In this cycle this topic tree has become one of those 'super topics' that accumulated a lot of changes: - Add CONFIG_VMAP_STACK=y support to the core kernel and enable it on x86 - preceded by an array of changes. v4.8 saw preparatory changes in this area already - this is the rest of the work. Includes the thread stack caching performance optimization. (Andy Lutomirski) - switch_to() cleanups and all around enhancements. (Brian Gerst) - A large number of dumpstack infrastructure enhancements and an unwinder abstraction. The secret long term plan is safe(r) live patching plus maybe another attempt at debuginfo based unwinding - but all these current bits are standalone enhancements in a frame pointer based debug environment as well. (Josh Poimboeuf) - More __ro_after_init and const annotations. (Kees Cook) - Enable KASLR for the vmemmap memory region. (Thomas Garnier)" [ The virtually mapped stack changes are pretty fundamental, and not x86-specific per se, even if they are only used on x86 right now. ] * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits) x86/asm: Get rid of __read_cr4_safe() thread_info: Use unsigned long for flags x86/alternatives: Add stack frame dependency to alternative_call_2() x86/dumpstack: Fix show_stack() task pointer regression x86/dumpstack: Remove dump_trace() and related callbacks x86/dumpstack: Convert show_trace_log_lvl() to use the new unwinder oprofile/x86: Convert x86_backtrace() to use the new unwinder x86/stacktrace: Convert save_stack_trace_*() to use the new unwinder perf/x86: Convert perf_callchain_kernel() to use the new unwinder x86/unwind: Add new unwind interface and implementations x86/dumpstack: Remove NULL task pointer convention fork: Optimize task creation by caching two thread stacks per CPU if CONFIG_VMAP_STACK=y sched/core: Free the stack early if CONFIG_THREAD_INFO_IN_TASK lib/syscall: Pin the task stack in collect_syscall() x86/process: Pin the target stack in get_wchan() x86/dumpstack: Pin the target stack when dumping it kthread: Pin the stack via try_get_task_stack()/put_task_stack() in to_live_kthread() function sched/core: Add try_get_task_stack() and put_task_stack() x86/entry/64: Fix a minor comment rebase error iommu/amd: Don't put completion-wait semaphore on stack ...
2016-10-03Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-18/+51
Pull locking updates from Ingo Molnar: "The main changes in this cycle were: - rwsem micro-optimizations (Davidlohr Bueso) - Improve the implementation and optimize the performance of percpu-rwsems. (Peter Zijlstra.) - Convert all lglock users to better facilities such as percpu-rwsems or percpu-spinlocks and remove lglocks. (Peter Zijlstra) - Remove the ticket (spin)lock implementation. (Peter Zijlstra) - Korean translation of memory-barriers.txt and related fixes to the English document. (SeongJae Park) - misc fixes and cleanups" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) x86/cmpxchg, locking/atomics: Remove superfluous definitions x86, locking/spinlocks: Remove ticket (spin)lock implementation locking/lglock: Remove lglock implementation stop_machine: Remove stop_cpus_lock and lg_double_lock/unlock() fs/locks: Use percpu_down_read_preempt_disable() locking/percpu-rwsem: Add down_read_preempt_disable() fs/locks: Replace lg_local with a per-cpu spinlock fs/locks: Replace lg_global with a percpu-rwsem locking/percpu-rwsem: Add DEFINE_STATIC_PERCPU_RWSEMand percpu_rwsem_assert_held() locking/pv-qspinlock: Use cmpxchg_release() in __pv_queued_spin_unlock() locking/rwsem, x86: Drop a bogus cc clobber futex: Add some more function commentry locking/hung_task: Show all locks locking/rwsem: Scan the wait_list for readers only once locking/rwsem: Remove a few useless comments locking/rwsem: Return void in __rwsem_mark_wake() locking, rcu, cgroup: Avoid synchronize_sched() in __cgroup_procs_write() locking/Documentation: Add Korean translation locking/Documentation: Fix a typo of example result locking/Documentation: Fix wrong section reference ...
2016-10-03Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-5/+13
Pull EFI updates from Ingo Molnar: "Main changes in this cycle were: - Refactor the EFI memory map code into architecture neutral files and allow drivers to permanently reserve EFI boot services regions on x86, as well as ARM/arm64. (Matt Fleming) - Add ARM support for the EFI ESRT driver. (Ard Biesheuvel) - Make the EFI runtime services and efivar API interruptible by swapping spinlocks for semaphores. (Sylvain Chouleur) - Provide the EFI identity mapping for kexec which allows kexec to work on SGI/UV platforms with requiring the "noefi" kernel command line parameter. (Alex Thorlton) - Add debugfs node to dump EFI page tables on arm64. (Ard Biesheuvel) - Merge the EFI test driver being carried out of tree until now in the FWTS project. (Ivan Hu) - Expand the list of flags for classifying EFI regions as "RAM" on arm64 so we align with the UEFI spec. (Ard Biesheuvel) - Optimise out the EFI mixed mode if it's unsupported (CONFIG_X86_32) or disabled (CONFIG_EFI_MIXED=n) and switch the early EFI boot services function table for direct calls, alleviating us from having to maintain the custom function table. (Lukas Wunner) - Miscellaneous cleanups and fixes" * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits) x86/efi: Round EFI memmap reservations to EFI_PAGE_SIZE x86/efi: Allow invocation of arbitrary boot services x86/efi: Optimize away setup_gop32/64 if unused x86/efi: Use kmalloc_array() in efi_call_phys_prolog() efi/arm64: Treat regions with WT/WC set but WB cleared as memory efi: Add efi_test driver for exporting UEFI runtime service interfaces x86/efi: Defer efi_esrt_init until after memblock_x86_fill efi/arm64: Add debugfs node to dump UEFI runtime page tables x86/efi: Remove unused find_bits() function fs/efivarfs: Fix double kfree() in error path x86/efi: Map in physical addresses in efi_map_region_fixed lib/ucs2_string: Speed up ucs2_utf8size() firmware-gsmi: Delete an unnecessary check before the function call "dma_pool_destroy" x86/efi: Initialize status to ensure garbage is not returned on small size efi: Replace runtime services spinlock with semaphore efi: Don't use spinlocks for efi vars efi: Use a file local lock for efivars efi/arm*: esrt: Add missing call to efi_esrt_init() efi/esrt: Use memremap not ioremap to access ESRT table in memory x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data ...
2016-10-03fuse: limit xattr returned sizeMiklos Szeredi1-2/+2
Don't let userspace filesystem give bogus values for the size of xattr and xattr list. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller4-6/+26
Three sets of overlapping changes. Nothing serious. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-03Merge branch 'xfs-4.9-log-recovery-fixes' into for-nextDave Chinner13-98/+248
2016-10-03Merge branch 'iomap-4.9-dax' into for-nextDave Chinner11-122/+457
2016-10-03Merge branch 'xfs-4.9-delalloc-rework' into for-nextDave Chinner6-350/+242
2016-10-03Merge branch 'xfs-4.9-reflink-prep' into for-nextDave Chinner20-116/+719
2016-10-03Merge branch 'iomap-4.9-misc-fixes-1' into for-nextDave Chinner1-0/+84
2016-10-03xfs: update atime before I/O in xfs_file_dio_aio_readChristoph Hellwig1-1/+2
After the call to __blkdev_direct_IO the final reference to the file might have been dropped by aio_complete already, and the call to file_accessed might cause a use after free. Instead update the access time before the I/O, similar to how we update the time stamps before writes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-and-tested-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-03ext2: fix possible integer truncation in ext2_iomap_beginChristoph Hellwig1-1/+1
For 32-bit architectures we need to cast first_block to u64 before shifting it left. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Jan Kara <jack@suse.cz> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-01fuse: remove duplicate cs->offset assignmentMiklos Szeredi1-1/+0
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-01fuse: don't use fuse_ioctl_copy_user() helperMiklos Szeredi1-34/+18
The two invocations share little code. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-01fuse_ioctl_copy_user(): don't open-code copy_page_{to,from}_iter()Al Viro1-23/+7
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-01fuse: get rid of fc->flagsMiklos Szeredi3-24/+20
Only two flags: "default_permissions" and "allow_other". All other flags are handled via bitfields. So convert these two as well. They don't change during the lifetime of the filesystem, so this is quite safe. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-01fuse: use timespec64Miklos Szeredi1-3/+7
And check for valid nsec value before passing into timespec64_to_jiffies(). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-01fuse: don't use ->d_timeMiklos Szeredi1-20/+23
Store in memory pointed to by ->d_fsdata. Use ->d_init() to allocate the storage. Need to use RCU freeing because the data is used in RCU lookup mode. We could cast ->d_fsdata directly on 64bit archs, but I don't think this is worth the extra complexity. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-01fuse: Add posix ACL supportSeth Forshee7-7/+152
Add a new INIT flag, FUSE_POSIX_ACL, for negotiating ACL support with userspace. When it is set in the INIT response, ACL support will be enabled. ACL support also implies "default_permissions". When ACL support is enabled, the kernel will cache and have responsibility for enforcing ACLs. ACL xattrs will be passed to userspace, which is responsible for updating the ACLs in the filesystem, keeping the file mode in sync, and inheritance of default ACLs when new filesystem nodes are created. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-01fuse: handle killpriv in userspace fsMiklos Szeredi3-18/+33
Only userspace filesystem can do the killing of suid/sgid without races. So introduce an INIT flag and negotiate support for this. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-01fuse: fix killing s[ug]id in setattrMiklos Szeredi1-4/+28
Fuse allowed VFS to set mode in setattr in order to clear suid/sgid on chown and truncate, and (since writeback_cache) write. The problem with this is that it'll potentially restore a stale mode. The poper fix would be to let the filesystems do the suid/sgid clearing on the relevant operations. Possibly some are already doing it but there's no way we can detect this. So fix this by refreshing and recalculating the mode. Do this only if ATTR_KILL_S[UG]ID is set to not destroy performance for writes. This is still racy but the size of the window is reduced. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org>
2016-10-01fuse: invalidate dir dentry after chmodMiklos Szeredi1-2/+10
Without "default_permissions" the userspace filesystem's lookup operation needs to perform the check for search permission on the directory. If directory does not allow search for everyone (this is quite rare) then userspace filesystem has to set entry timeout to zero to make sure permissions are always performed. Changing the mode bits of the directory should also invalidate the (previously cached) dentry to make sure the next lookup will have a chance of updating the timeout, if needed. Reported-by: Jean-Pierre André <jean-pierre.andre@wanadoo.fr> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org>
2016-10-01fuse: Use generic xattr opsSeth Forshee5-176/+221
In preparation for posix acl support, rework fuse to use xattr handlers and the generic setxattr/getxattr/listxattr callbacks. Split the xattr code out into it's own file, and promote symbols to module-global scope as needed. Functionally these changes have no impact, as fuse still uses a single handler for all xattrs which uses the old callbacks. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-01fuse: listxattr: verify xattr listMiklos Szeredi1-0/+19
Make sure userspace filesystem is returning a well formed list of xattr names (zero or more nonzero length, null terminated strings). [Michael Theall: only verify in the nonzero size case] Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org>
2016-09-30ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()Eric Ren1-0/+10
The testcase "mmaptruncate" of ocfs2-test deadlocks occasionally. In this testcase, we create a 2*CLUSTER_SIZE file and mmap() on it; there are 2 process repeatedly performing the following operations respectively: one is doing memset(mmaped_addr + 2*CLUSTER_SIZE - 1, 'a', 1), while the another is playing ftruncate(fd, 2*CLUSTER_SIZE) and then ftruncate(fd, CLUSTER_SIZE) again and again. This is the backtrace when the deadlock happens: __wait_on_bit_lock+0x50/0xa0 __lock_page+0xb7/0xc0 ocfs2_write_begin_nolock+0x163f/0x1790 [ocfs2] ocfs2_page_mkwrite+0x1c7/0x2a0 [ocfs2] do_page_mkwrite+0x66/0xc0 handle_mm_fault+0x685/0x1350 __do_page_fault+0x1d8/0x4d0 trace_do_page_fault+0x37/0xf0 do_async_page_fault+0x19/0x70 async_page_fault+0x28/0x30 In ocfs2_write_begin_nolock(), we first grab the pages and then allocate disk space for this write; ocfs2_try_to_free_truncate_log() will be called if -ENOSPC is returned; if we're lucky to get enough clusters, which is usually the case, we start over again. But in ocfs2_free_write_ctxt() the target page isn't unlocked, so we will deadlock when trying to grab the target page again. Also, -ENOMEM might be returned in ocfs2_grab_pages_for_write(). Another deadlock will happen in __do_page_mkwrite() if ocfs2_page_mkwrite() returns non-VM_FAULT_LOCKED, and along with a locked target page. These two errors fail on the same path, so fix them by unlocking the target page manually before ocfs2_free_write_ctxt(). Jan Kara helps me clear out the JBD2 part, and suggest the hint for root cause. Changes since v1: 1. Also put ENOMEM error case into consideration. Link: http://lkml.kernel.org/r/1474173902-32075-1-git-send-email-zren@suse.com Signed-off-by: Eric Ren <zren@suse.com> Reviewed-by: He Gang <ghe@suse.com> Acked-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-30autofs: Fix automounts by using current_real_cred()->uidEric W. Biederman1-2/+2
Seth Forshee reports that in 4.8-rcN some automounts are failing because the requesting the automount changed. The relevant call path is: follow_automount() ->d_automount autofs4_d_automount autofs4_mount_wait autofs4_wait In autofs4_wait wq_uid and wq_gid are set to current_uid() and current_gid respectively. With follow_automount now overriding creds uid that we export to userspace changes and that breaks existing setups. To remove the regression set wq_uid and wq_gid from current_real_cred()->uid and current_real_cred()->gid respectively. This restores the current behavior as current->real_cred is identical to current->cred except when override creds are used. Cc: stable@vger.kernel.org Fixes: aeaa4a79ff6a ("fs: Call d_automount with the filesystems creds") Reported-by: Seth Forshee <seth.forshee@canonical.com> Tested-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2016-09-30mnt: Add a per mount namespace limit on the number of mountsEric W. Biederman4-2/+52
CAI Qian <caiqian@redhat.com> pointed out that the semantics of shared subtrees make it possible to create an exponentially increasing number of mounts in a mount namespace. mkdir /tmp/1 /tmp/2 mount --make-rshared / for i in $(seq 1 20) ; do mount --bind /tmp/1 /tmp/2 ; done Will create create 2^20 or 1048576 mounts, which is a practical problem as some people have managed to hit this by accident. As such CVE-2016-6213 was assigned. Ian Kent <raven@themaw.net> described the situation for autofs users as follows: > The number of mounts for direct mount maps is usually not very large because of > the way they are implemented, large direct mount maps can have performance > problems. There can be anywhere from a few (likely case a few hundred) to less > than 10000, plus mounts that have been triggered and not yet expired. > > Indirect mounts have one autofs mount at the root plus the number of mounts that > have been triggered and not yet expired. > > The number of autofs indirect map entries can range from a few to the common > case of several thousand and in rare cases up to between 30000 and 50000. I've > not heard of people with maps larger than 50000 entries. > > The larger the number of map entries the greater the possibility for a large > number of active mounts so it's not hard to expect cases of a 1000 or somewhat > more active mounts. So I am setting the default number of mounts allowed per mount namespace at 100,000. This is more than enough for any use case I know of, but small enough to quickly stop an exponential increase in mounts. Which should be perfect to catch misconfigurations and malfunctioning programs. For anyone who needs a higher limit this can be changed by writing to the new /proc/sys/fs/mount-max sysctl. Tested-by: CAI Qian <caiqian@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2016-09-30Merge branch 'x86/urgent' into x86/asmThomas Gleixner4-17/+36
Get the cr4 fixes so we can apply the final cleanup
2016-09-30Merge branch 'linus' into locking/core, to pick up fixesIngo Molnar3-6/+16
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-27ext2: Unmap metadata when zeroing blocksJan Kara1-0/+10
When zeroing blocks for DAX allocations, we also have to unmap aliases in the block device mappings. Otherwise writeback can overwrite zeros with stale data from block device page cache. Signed-off-by: Jan Kara <jack@suse.cz>
2016-09-27debugfs: propagate release() call resultEric Engestrom1-1/+1
The result was being ignored and 0 was always returned. Return the actual result instead. Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-27sysfs print name of undiscoverable attribute groupJohannes Thumshirn1-2/+2
Print the name of an undiscoverable attribute group and not the pointer's address. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-26gfs2: Initialize atime of I_NEW inodesAndreas Gruenbacher1-0/+4
Fix for commit 719ee344: initialize atime of I_NEW inodes to 0 so that the timestamps read from disk will always be more recent than the initial timestamp, and the atime in the I_NEW inode will be set correctly. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-09-26gfs2: Update file times after grabbing glockAndreas Gruenbacher1-3/+3
In gfs2_page_mkwrite, grab the inode glock in EX mode before calling file_update_time: grabbing the lock may result in a call to gfs2_dinode_in, which will reset the file times to their on-disk state. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-09-26xfs: log recovery tracepoints to track current lsn and buffer submissionBrian Foster2-2/+31
Log recovery has particular rules around buffer submission along with tricky corner cases where independent transactions can share an LSN. As such, it can be difficult to follow when/why buffers are submitted during recovery. Add a couple tracepoints to post the current LSN of a record when a new record is being processed and when a buffer is being skipped due to LSN ordering. Also, update the recover item class to include the LSN of the current transaction for the item being processed. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-09-26xfs: update metadata LSN in buffers during log recoveryBrian Foster1-3/+38
Log recovery is currently broken for v5 superblocks in that it never updates the metadata LSN of buffers written out during recovery. The metadata LSN is recorded in various bits of metadata to provide recovery ordering criteria that prevents transient corruption states reported by buffer write verifiers. Without such ordering logic, buffer updates can be replayed out of order and lead to false positive transient corruption states. This is generally not a corruption vector on its own, but corruption detection shuts down the filesystem and ultimately prevents a mount if it occurs during log recovery. This requires an xfs_repair run that clears the log and potentially loses filesystem updates. This problem is avoided in most cases as metadata writes during normal filesystem operation update the metadata LSN appropriately. The problem with log recovery not updating metadata LSNs manifests if the system happens to crash shortly after log recovery itself. In this scenario, it is possible for log recovery to complete all metadata I/O such that the filesystem is consistent. If a crash occurs after that point but before the log tail is pushed forward by subsequent operations, however, the next mount performs the same log recovery over again. If a buffer is updated multiple times in the dirty range of the log, an earlier update in the log might not be valid based on the current state of the associated buffer after all of the updates in the log had been replayed (before the previous crash). If a verifier happens to detect such a problem, the filesystem claims corruption and immediately shuts down. This commonly manifests in practice as directory block verifier failures such as the following, likely due to directory verifiers being particularly detailed in their checks as compared to most others: ... Mounting V5 Filesystem XFS (dm-0): Starting recovery (logdev: internal) XFS (dm-0): Internal error XFS_WANT_CORRUPTED_RETURN at line ... of \ file fs/xfs/libxfs/xfs_dir2_data.c. Caller xfs_dir3_data_verify ... ... Update log recovery to update the metadata LSN of recovered buffers. Since metadata LSNs are already updated by write verifer functions via attached log items, attach a dummy log item to the buffer during validation and explicitly set the LSN of the current transaction. This ensures that the metadata LSN of a buffer is updated based on whether the recovery I/O actually completes, and if so, that subsequent recovery attempts identify that the buffer is already up to date with respect to the current transaction. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>