aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/fs (follow)
AgeCommit message (Collapse)AuthorFilesLines
2013-04-09Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds3-43/+92
Pull vfs fixes from Al Viro: "A nasty bug in fs/namespace.c caught by Andrey + a couple of less serious unpleasantness - ecryptfs misc device playing hopeless games with try_module_get() and palinfo procfs support being... not quite correctly done, to be polite." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: mnt: release locks on error path in do_loopback palinfo fixes procfs: add proc_remove_subtree() ecryptfs: close rmmod race
2013-04-09mnt: release locks on error path in do_loopbackAndrey Vagin1-1/+1
do_loopback calls lock_mount(path) and forget to unlock_mount if clone_mnt or copy_mnt fails. [ 77.661566] ================================================ [ 77.662939] [ BUG: lock held when returning to user space! ] [ 77.664104] 3.9.0-rc5+ #17 Not tainted [ 77.664982] ------------------------------------------------ [ 77.666488] mount/514 is leaving the kernel with locks still held! [ 77.668027] 2 locks held by mount/514: [ 77.668817] #0: (&sb->s_type->i_mutex_key#7){+.+.+.}, at: [<ffffffff811cca22>] lock_mount+0x32/0xe0 [ 77.671755] #1: (&namespace_sem){+++++.}, at: [<ffffffff811cca3a>] lock_mount+0x4a/0xe0 Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-09procfs: add proc_remove_subtree()Al Viro1-30/+89
just what it sounds like; do that only to procfs subtrees you've created - doing that to something shared with another driver is not only antisocial, but might cause interesting races with proc_create() and its ilk. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-09ecryptfs: close rmmod raceAl Viro1-12/+2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-09NFSv4: Fix another potential state manager deadlockTrond Myklebust1-0/+1
Don't hold the NFSv4 sequence id while we check for open permission. The call to ACCESS may block due to reboot recovery. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-05sysfs: check if one entry has been removed before freeingMing Lei1-1/+8
It might be a kernel disaster if one sysfs entry is freed but still referenced by sysfs tree. Recently Dave and Sasha reported one use-after-free problem on sysfs entry, and the problem has been troubleshooted with help of debug message added in this patch. Given sysfs_get_dirent/sysfs_put are exported APIs, even inside sysfs they are called in many contexts(kobject/attribe add/delete, inode init/drop, dentry lookup/release, readdir, ...), it is healthful to check the removed flag before freeing one entry and dump message if it is freeing without being removed first. Cc: Dave Jones <davej@redhat.com> Cc: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-05NFSv4/4.1: Fix bugs in nfs4[01]_walk_client_listTrond Myklebust1-16/+28
It is unsafe to use list_for_each_entry_safe() here, because when we drop the nn->nfs_client_lock, we pin the _current_ list entry and ensure that it stays in the list, but we don't do the same for the _next_ list entry. Use of list_for_each_entry() is therefore the correct thing to do. Also fix the refcounting in nfs41_walk_client_list(). Finally, ensure that the nfs_client has finished being initialised and, in the case of NFSv4.1, that the session is set up. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Bryan Schumaker <bjschuma@netapp.com> Cc: stable@vger.kernel.org [>= 3.7]
2013-04-05NFSv4: Fix a memory leak in nfs4_discover_server_trunkingTrond Myklebust1-1/+7
When we assign a new rpc_client to clp->cl_rpcclient, we need to destroy the old one. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org [>=3.7]
2013-04-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixesLinus Torvalds4-38/+39
Pull GFS2 fixes from Steven Whitehouse: "There are two patches which fix up a couple of minor issues in the DLM interface code, a missing error path in gfs2_rs_alloc(), one patch which fixes a problem during "withdraw" and a fix for discards/FITRIM when using 4k sector sized devices." * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes: GFS2: Issue discards in 512b sectors GFS2: Fix unlock of fcntl locks during withdrawn state GFS2: return error if malloc failed in gfs2_rs_alloc() GFS2: use memchr_inv GFS2: use kmalloc for lvb bitmap
2013-04-05GFS2: Issue discards in 512b sectorsBob Peterson1-17/+13
This patch changes GFS2's discard issuing code so that it calls function sb_issue_discard rather than blkdev_issue_discard. The code was calling blkdev_issue_discard and specifying the correct sector offset and sector size, but blkdev_issue_discard expects these values to be in terms of 512 byte sectors, even if the native sector size for the device is different. Calling sb_issue_discard with the BLOCK size instead ensures the correct block-to-512b-sector translation. I verified that "minlen" is specified in blocks, so comparing it to a number of blocks is correct. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-04-04Merge tag 'upstream-3.9-rc6' of git://git.infradead.org/linux-ubifsLinus Torvalds1-6/+6
Pull UBIFS fix from Artem Bityutskiy: "Make the space fixup feature work in the case when the file-system is first mounted R/O and then remounted R/W." * tag 'upstream-3.9-rc6' of git://git.infradead.org/linux-ubifs: UBIFS: make space fixup work in the remount case
2013-04-04GFS2: Fix unlock of fcntl locks during withdrawn stateSteven Whitehouse1-1/+4
When withdraw occurs, we need to continue to allow unlocks of fcntl locks to occur, however these will only be local, since the node has withdrawn from the cluster. This prevents triggering a VFS level bug trap due to locks remaining when a file is closed. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-04-04GFS2: return error if malloc failed in gfs2_rs_alloc()Wei Yongjun1-1/+1
The error code in gfs2_rs_alloc() is set to ENOMEM when error but never be used, instead, gfs2_rs_alloc() always return 0. Fix to return 'error'. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-04-04GFS2: use memchr_invAkinobu Mita1-6/+2
Use memchr_inv to verify that the specified memory range is cleared. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: cluster-devel@redhat.com Cc: Christine Caulfield <ccaulfie@redhat.com> Cc: David Teigland <teigland@redhat.com>
2013-04-04GFS2: use kmalloc for lvb bitmapDavid Teigland2-13/+19
The temp lvb bitmap was on the stack, which could be an alignment problem for __set_bit_le. Use kmalloc for it instead. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-04-03Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4Linus Torvalds2-6/+9
Pull ext4 fixes from Ted Ts'o: "Unfortunately, we introduced some big-endian bugs during the last merge window. Fortunately, Cai and Christian noticed before 3.9 shipped." * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix big-endian bugs which could cause fs corruptions
2013-04-03sysfs: fix use after free in case of concurrent read/write and readdirMing Lei1-4/+11
The inode->i_mutex isn't hold when updating filp->f_pos in read()/write(), so the filp->f_pos might be read as 0 or 1 in readdir() when there is concurrent read()/write() on this same file, then may cause use after free in readdir(). The bug can be reproduced with Li Zefan's test code on the link: https://patchwork.kernel.org/patch/2160771/ This patch fixes the use after free under this situation. Cc: stable <stable@vger.kernel.org> Reported-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-03Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fsLinus Torvalds1-2/+2
Pull reiserfs fix from Jan Kara: "A fix for reiserfs xattr bug exposed by changes to lookup_one_len()" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: reiserfs: Fix warning and inode leak when deleting inode with xattrs
2013-04-03ext4: fix big-endian bugs which could cause fs corruptionsZheng Liu2-6/+9
When an extent was zeroed out, we forgot to do convert from cpu to le16. It could make us hit a BUG_ON when we try to write dirty pages out. So fix it. [ Also fix a bug found by Dmitry Monakhov where we were missing le32_to_cpu() calls in the new indirect punch hole code. There are a number of other big endian warnings found by static code analyzers, but we'll wait for the next merge window to fix them all up. These fixes are designed to be Obviously Correct by code inspection, and easy to demonstrate that it won't make any difference (and hence, won't introduce any bugs) on little endian architectures such as x86. --tytso ] Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reported-by: CAI Qian <caiqian@redhat.com> Reported-by: Christian Kujau <lists@nerdbynature.de> Cc: Dmitry Monakhov <dmonakhov@openvz.org>
2013-04-02Merge branch 'for-3.9' of git://linux-nfs.org/~bfields/linuxLinus Torvalds1-1/+1
Pull nfsd bugfix from J Bruce Fields: "An xdr decoding error--thanks, Toralf Förster, and Trinity!" * 'for-3.9' of git://linux-nfs.org/~bfields/linux: nfsd4: reject "negative" acl lengths
2013-04-01loop: prevent bdev freeing while device in useAnatol Pomozov1-0/+1
struct block_device lifecycle is defined by its inode (see fs/block_dev.c) - block_device allocated first time we access /dev/loopXX and deallocated on bdev_destroy_inode. When we create the device "losetup /dev/loopXX afile" we want that block_device stay alive until we destroy the loop device with "losetup -d". But because we do not hold /dev/loopXX inode its counter goes 0, and inode/bdev can be destroyed at any moment. Usually it happens at memory pressure or when user drops inode cache (like in the test below). When later in loop_clr_fd() we want to use bdev we have use-after-free error with following stack: BUG: unable to handle kernel NULL pointer dereference at 0000000000000280 bd_set_size+0x10/0xa0 loop_clr_fd+0x1f8/0x420 [loop] lo_ioctl+0x200/0x7e0 [loop] lo_compat_ioctl+0x47/0xe0 [loop] compat_blkdev_ioctl+0x341/0x1290 do_filp_open+0x42/0xa0 compat_sys_ioctl+0xc1/0xf20 do_sys_open+0x16e/0x1d0 sysenter_dispatch+0x7/0x1a To prevent use-after-free we need to grab the device in loop_set_fd() and put it later in loop_clr_fd(). The issue is reprodusible on current Linus head and v3.3. Here is the test: dd if=/dev/zero of=loop.file bs=1M count=1 while [ true ]; do losetup /dev/loop0 loop.file echo 2 > /proc/sys/vm/drop_caches losetup -d /dev/loop0 done [ Doing bdgrab/bput in loop_set_fd/loop_clr_fd is safe, because every time we call loop_set_fd() we check that loop_device->lo_state is Lo_unbound and set it to Lo_bound If somebody will try to set_fd again it will get EBUSY. And if we try to loop_clr_fd() on unbound loop device we'll get ENXIO. loop_set_fd/loop_clr_fd (and any other loop ioctl) is called under loop_device->lo_ctl_mutex. ] Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-01Merge v3.9-rc5 into driver-core-nextGreg Kroah-Hartman31-89/+416
We want the fixes in here. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-29Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfsLinus Torvalds13-43/+188
Pull btrfs fixes from Chris Mason: "We've had a busy two weeks of bug fixing. The biggest patches in here are some long standing early-enospc problems (Josef) and a very old race where compression and mmap combine forces to lose writes (me). I'm fairly sure the mmap bug goes all the way back to the introduction of the compression code, which is proof that fsx doesn't trigger every possible mmap corner after all. I'm sure you'll notice one of these is from this morning, it's a small and isolated use-after-free fix in our scrub error reporting. I double checked it here." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: don't drop path when printing out tree errors in scrub Btrfs: fix wrong return value of btrfs_lookup_csum() Btrfs: fix wrong reservation of csums Btrfs: fix double free in the btrfs_qgroup_account_ref() Btrfs: limit the global reserve to 512mb Btrfs: hold the ordered operations mutex when waiting on ordered extents Btrfs: fix space accounting for unlink and rename Btrfs: fix space leak when we fail to reserve metadata space Btrfs: fix EIO from btrfs send in is_extent_unchanged for punched holes Btrfs: fix race between mmap writes and compression Btrfs: fix memory leak in btrfs_create_tree() Btrfs: fix locking on ROOT_REPLACE operations in tree mod log Btrfs: fix missing qgroup reservation before fallocating Btrfs: handle a bogus chunk tree nicely Btrfs: update to use fs_state bit
2013-03-29reiserfs: Fix warning and inode leak when deleting inode with xattrsJan Kara1-2/+2
After commit 21d8a15a (lookup_one_len: don't accept . and ..) reiserfs started failing to delete xattrs from inode. This was due to a buggy test for '.' and '..' in fill_with_dentries() which resulted in passing '.' and '..' entries to lookup_one_len() in some cases. That returned error and so we failed to iterate over all xattrs of and inode. Fix the test in fill_with_dentries() along the lines of the one in lookup_one_len(). Reported-by: Pawel Zawora <pzawora@gmail.com> CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>
2013-03-29Btrfs: don't drop path when printing out tree errors in scrubJosef Bacik1-1/+2
A user reported a panic where we were panicing somewhere in tree_backref_for_extent from scrub_print_warning. He only captured the trace but looking at scrub_print_warning we drop the path right before we mess with the extent buffer to print out a bunch of stuff, which isn't right. So fix this by dropping the path after we use the eb if we need to. Thanks, Cc: stable@vger.kernel.org Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-28Merge tag 'driver-core-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-coreLinus Torvalds1-1/+16
Pull sysfs fixes from Greg Kroah-Hartman: "Here are two fixes for sysfs that resolve issues that have been found by the Trinity fuzz tool, causing oopses in sysfs. They both have been in linux-next for a while to ensure that they do not cause any other problems." * tag 'driver-core-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: sysfs: handle failure path correctly for readdir() sysfs: fix race between readdir and lseek
2013-03-28Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespaceLinus Torvalds5-1/+68
Pull userns fixes from Eric W Biederman: "The bulk of the changes are fixing the worst consequences of the user namespace design oversight in not considering what happens when one namespace starts off as a clone of another namespace, as happens with the mount namespace. The rest of the changes are just plain bug fixes. Many thanks to Andy Lutomirski for pointing out many of these issues." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: userns: Restrict when proc and sysfs can be mounted ipc: Restrict mounting the mqueue filesystem vfs: Carefully propogate mounts across user namespaces vfs: Add a mount flag to lock read only bind mounts userns: Don't allow creation if the user is chrooted yama: Better permission check for ptraceme pid: Handle the exit of a multi-threaded init. scm: Require CAP_SYS_ADMIN over the current pidns to spoof pids.
2013-03-28Btrfs: fix wrong return value of btrfs_lookup_csum()Miao Xie1-1/+3
If we don't find the expected csum item, but find a csum item which is adjacent to the specified extent, we should return -EFBIG, or we should return -ENOENT. But btrfs_lookup_csum() return -EFBIG even the csum item is not adjacent to the specified extent. Fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-28Btrfs: fix wrong reservation of csumsMiao Xie2-2/+2
We reserve the space for csums only when we write data into a file, in the other cases, such as tree log, log replay, we don't do reservation, so we can use the reservation of the transaction handle just for the former. And for the latter, we should use the tree's own reservation. But the function - btrfs_csum_file_blocks() didn't differentiate between these two types of the cases, fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-28Btrfs: fix double free in the btrfs_qgroup_account_ref()Wang Shilong1-2/+1
The function btrfs_find_all_roots is responsible to allocate memory for 'roots' and free it if errors happen,so the caller should not free it again since the work has been done. Besides,'tmp' is allocated after the function btrfs_find_all_roots, so we can return directly if btrfs_find_all_roots() fails. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-28Btrfs: limit the global reserve to 512mbJosef Bacik1-1/+1
A user reported a problem where he was getting early ENOSPC with hundreds of gigs of free data space and 6 gigs of free metadata space. This is because the global block reserve was taking up the entire free metadata space. This is ridiculous, we have infrastructure in place to throttle if we start using too much of the global reserve, so instead of letting it get this huge just limit it to 512mb so that users can still get work done. This allowed the user to complete his rsync without issues. Thanks Cc: stable@vger.kernel.org Reported-and-tested-by: Stefan Priebe <s.priebe@profihost.ag> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-28Btrfs: hold the ordered operations mutex when waiting on ordered extentsJosef Bacik1-0/+2
We need to hold the ordered_operations mutex while waiting on ordered extents since we splice and run the ordered extents list. We need to make sure anybody else who wants to wait on ordered extents does actually wait for them to be completed. This will keep us from bailing out of flushing in case somebody is already waiting on ordered extents to complete. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-28Btrfs: fix space accounting for unlink and renameJosef Bacik1-4/+2
We are way over-reserving for unlink and rename. Rename is just some random huge number and unlink accounts for tree log operations that don't actually happen during unlink, not to mention the tree log doesn't take from the trans block rsv anyway so it's completely useless. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-28Btrfs: fix space leak when we fail to reserve metadata spaceJosef Bacik1-6/+41
Dave reported a warning when running xfstest 275. We have been leaking delalloc metadata space when our reservations fail. This is because we were improperly calculating how much space to free for our checksum reservations. The problem is we would sometimes free up space that had already been freed in another thread and we would end up with negative usage for the delalloc space. This patch fixes the problem by calculating how much space the other threads would have already freed, and then calculate how much space we need to free had we not done the reservation at all, and then freeing any excess space. This makes xfstests 275 no longer have leaked space. Thanks Cc: stable@vger.kernel.org Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-28Btrfs: fix EIO from btrfs send in is_extent_unchanged for punched holesJan Schmidt1-6/+4
When you take a snapshot, punch a hole where there has been data, then take another snapshot and try to send an incremental stream, btrfs send would give you EIO. That is because is_extent_unchanged had no support for holes being punched. With this patch, instead of returning EIO we just return 0 (== the extent is not unchanged) and we're good. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Cc: Alexander Block <ablock84@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-27vfs/splice: Fix missed checks in new __kernel_write() helperAl Viro1-0/+3
Commit 06ae43f34bcc ("Don't bother with redoing rw_verify_area() from default_file_splice_from()") lost the checks to test existence of the write/aio_write methods. My apologies ;-/ Eventually, we want that in fs/splice.c side of things (no point repeating it for every buffer, after all), but for now this is the obvious minimal fix. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-27userns: Restrict when proc and sysfs can be mountedEric W. Biederman3-0/+29
Only allow unprivileged mounts of proc and sysfs if they are already mounted when the user namespace is created. proc and sysfs are interesting because they have content that is per namespace, and so fresh mounts are needed when new namespaces are created while at the same time proc and sysfs have content that is shared between every instance. Respect the policy of who may see the shared content of proc and sysfs by only allowing new mounts if there was an existing mount at the time the user namespace was created. In practice there are only two interesting cases: proc and sysfs are mounted at their usual places, proc and sysfs are not mounted at all (some form of mount namespace jail). Cc: stable@vger.kernel.org Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-27vfs: Carefully propogate mounts across user namespacesEric W. Biederman3-1/+12
As a matter of policy MNT_READONLY should not be changable if the original mounter had more privileges than creator of the mount namespace. Add the flag CL_UNPRIVILEGED to note when we are copying a mount from a mount namespace that requires more privileges to a mount namespace that requires fewer privileges. When the CL_UNPRIVILEGED flag is set cause clone_mnt to set MNT_NO_REMOUNT if any of the mnt flags that should never be changed are set. This protects both mount propagation and the initial creation of a less privileged mount namespace. Cc: stable@vger.kernel.org Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-27vfs: Add a mount flag to lock read only bind mountsEric W. Biederman1-0/+3
When a read-only bind mount is copied from mount namespace in a higher privileged user namespace to a mount namespace in a lesser privileged user namespace, it should not be possible to remove the the read-only restriction. Add a MNT_LOCK_READONLY mount flag to indicate that a mount must remain read-only. CC: stable@vger.kernel.org Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-27userns: Don't allow creation if the user is chrootedEric W. Biederman1-0/+24
Guarantee that the policy of which files may be access that is established by setting the root directory will not be violated by user namespaces by verifying that the root directory points to the root of the mount namespace at the time of user namespace creation. Changing the root is a privileged operation, and as a matter of policy it serves to limit unprivileged processes to files below the current root directory. For reasons of simplicity and comprehensibility the privilege to change the root directory is gated solely on the CAP_SYS_CHROOT capability in the user namespace. Therefore when creating a user namespace we must ensure that the policy of which files may be access can not be violated by changing the root directory. Anyone who runs a processes in a chroot and would like to use user namespace can setup the same view of filesystems with a mount namespace instead. With this result that this is not a practical limitation for using user namespaces. Cc: stable@vger.kernel.org Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-26Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds4-6/+44
Pull vfs fixes from Al Viro: "stable fodder; assorted deadlock fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vt: synchronize_rcu() under spinlock is not nice... Nest rename_lock inside vfsmount_lock Don't bother with redoing rw_verify_area() from default_file_splice_from()
2013-03-26Nest rename_lock inside vfsmount_lockAl Viro1-5/+11
... lest we get livelocks between path_is_under() and d_path() and friends. The thing is, wrt fairness lglocks are more similar to rwsems than to rwlocks; it is possible to have thread B spin on attempt to take lock shared while thread A is already holding it shared, if B is on lower-numbered CPU than A and there's a thread C spinning on attempt to take the same lock exclusive. As the result, we need consistent ordering between vfsmount_lock (lglock) and rename_lock (seq_lock), even though everything that takes both is going to take vfsmount_lock only shared. Spotted-by: Brad Spengler <spender@grsecurity.net> Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-03-26Merge tag 'nfs-for-3.9-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds6-32/+89
Pull NFS client bugfixes from Trond Myklebust: - Fix an NFSv4 idmapper regression - Fix an Oops in the pNFS blocks client - Fix up various issues with pNFS layoutcommit - Ensure correct read ordering of variables in rpc_wake_up_task_queue_locked * tag 'nfs-for-3.9-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: Add barriers to ensure read ordering in rpc_wake_up_task_queue_locked NFSv4.1: Add a helper pnfs_commit_and_return_layout NFSv4.1: Always clear the NFS_INO_LAYOUTCOMMIT in layoutreturn NFSv4.1: Fix a race in pNFS layoutcommit pnfs-block: removing DM device maybe cause oops when call dev_remove NFSv4: Fix the string length returned by the idmapper
2013-03-26nfsd4: reject "negative" acl lengthsJ. Bruce Fields1-1/+1
Since we only enforce an upper bound, not a lower bound, a "negative" length can get through here. The symptom seen was a warning when we attempt to a kmalloc with an excessive size. Reported-by: Toralf Förster <toralf.foerster@gmx.de> Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-03-26Btrfs: fix race between mmap writes and compressionChris Mason3-0/+49
Btrfs uses page_mkwrite to ensure stable pages during crc calculations and mmap workloads. We call clear_page_dirty_for_io before we do any crcs, and this forces any application with the file mapped to wait for the crc to finish before it is allowed to change the file. With compression on, the clear_page_dirty_for_io step is happening after we've compressed the pages. This means the applications might be changing the pages while we are compressing them, and some of those modifications might not hit the disk. This commit adds the clear_page_dirty_for_io before compression starts and makes sure to redirty the page if we have to fallback to uncompressed IO as well. Signed-off-by: Chris Mason <chris.mason@fusionio.com> Reported-by: Alexandre Oliva <oliva@gnu.org> cc: stable@vger.kernel.org
2013-03-25sysfs: use atomic_inc_unless_negative in sysfs_get_activeMaarten Lankhorst1-15/+2
It seems that sysfs has an interesting way of doing the same thing. This removes the cpu_relax unfortunately, but if it's really needed, it would be better to add this to include/linux/atomic.h to benefit all atomic ops users. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-25Merge branch 'for-3.9' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2-6/+8
Pull nfsd bugfixes from J Bruce Fields: "Fixes for a couple mistakes in the new DRC code. And thanks to Kent Overstreet for noticing we've been sync'ing the wrong range on stable writes since 3.8." * 'for-3.9' of git://linux-nfs.org/~bfields/linux: nfsd: fix bad offset use nfsd: fix startup order in nfsd_reply_cache_init nfsd: only unhash DRC entries that are in the hashtable
2013-03-22nfsd: fix bad offset useKent Overstreet1-1/+2
vfs_writev() updates the offset argument - but the code then passes the offset to vfs_fsync_range(). Since offset now points to the offset after what was just written, this is probably not what was intended Introduced by face15025ffdf664de95e86ae831544154d26c9c "nfsd: use vfs_fsync_range(), not O_SYNC, for stable writes". Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: stable@vger.kernel.org Reviewed-by: Zach Brown <zab@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-03-22vfs,proc: guarantee unique inodes in /procLinus Torvalds1-3/+3
Dave Jones found another /proc issue with his Trinity tool: thanks to the namespace model, we can have multiple /proc dentries that point to the same inode, aliasing directories in /proc/<pid>/net/ for example. This ends up being a total disaster, because it acts like hardlinked directories, and causes locking problems. We rely on the topological sort of the inodes pointed to by dentries, and if we have aliased directories, that odering becomes unreliable. In short: don't do this. Multiple dentries with the same (directory) inode is just a bad idea, and the namespace code should never have exposed things this way. But we're kind of stuck with it. This solves things by just always allocating a new inode during /proc dentry lookup, instead of using "iget_locked()" to look up existing inodes by superblock and number. That actually simplies the code a bit, at the cost of potentially doing more inode [de]allocations. That said, the inode lookup wasn't free either (and did a lot of locking of inodes), so it is probably not that noticeable. We could easily keep the old lookup model for non-directory entries, but rather than try to be excessively clever this just implements the minimal and simplest workaround for the problem. Reported-and-tested-by: Dave Jones <davej@redhat.com> Analyzed-by: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-21Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds6-56/+43
Pull CIFS fixes from Steve French: "Three small CIFS Fixes (the most important of the three fixes a recent problem authenticating to Windows 8 using cifs rather than SMB2)" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: cifs: ignore everything in SPNEGO blob after mechTypes cifs: delay super block destruction until all cifsFileInfo objects are gone cifs: map NT_STATUS_SHARING_VIOLATION to EBUSY instead of ETXTBSY