aboutsummaryrefslogtreecommitdiffstats
path: root/fs/nfs (follow)
AgeCommit message (Collapse)AuthorFilesLines
2013-10-28NFSv4: don't reprocess cached open CLAIM_PREVIOUSWeston Andros Adamson1-4/+8
Cached opens have already been handled by _nfs4_opendata_reclaim_to_nfs4_state and can safely skip being reprocessed, but must still call update_open_stateid to make sure that all active fmodes are recovered. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Cc: stable@vger.kernel.org # 3.7.x: f494a6071d3: NFSv4: fix NULL dereference Cc: stable@vger.kernel.org # 3.7.x: a43ec98b72a: NFSv4: don't fail on missin Cc: stable@vger.kernel.org # 3.7.x Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28NFSv4: Fix state reference counting in _nfs4_opendata_reclaim_to_nfs4_stateTrond Myklebust1-5/+1
Currently, if the call to nfs_refresh_inode fails, then we end up leaking a reference count, due to the call to nfs4_get_open_state. While we're at it, replace nfs4_get_open_state with a simple call to atomic_inc(); there is no need to do a full lookup of the struct nfs_state since it is passed as an argument in the struct nfs4_opendata, and is already assigned to the variable 'state'. Cc: stable@vger.kernel.org # 3.7.x: a43ec98b72a: NFSv4: don't fail on missing Cc: stable@vger.kernel.org # 3.7.x Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28NFSv4: don't fail on missing fattr in open recoverWeston Andros Adamson1-6/+0
This is an unneeded check that could cause the client to fail to recover opens. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28NFSv4: fix NULL dereference in open recoverWeston Andros Adamson1-1/+2
_nfs4_opendata_reclaim_to_nfs4_state doesn't expect to see a cached open CLAIM_PREVIOUS, but this can happen. An example is when there are RDWR openers and RDONLY openers on a delegation stateid. The recovery path will first try an open CLAIM_PREVIOUS for the RDWR openers, this marks the delegation as not needing RECLAIM anymore, so the open CLAIM_PREVIOUS for the RDONLY openers will not actually send an rpc. The NULL dereference is due to _nfs4_opendata_reclaim_to_nfs4_state returning PTR_ERR(rpc_status) when !rpc_done. When the open is cached, rpc_done == 0 and rpc_status == 0, thus _nfs4_opendata_reclaim_to_nfs4_state returns NULL - this is unexpected by callers of nfs4_opendata_to_nfs4_state(). This can be reproduced easily by opening the same file two times on an NFSv4.0 mount with delegations enabled, once as RDWR and once as RDONLY then sleeping for a long time. While the files are held open, kick off state recovery and this NULL dereference will be hit every time. An example OOPS: [ 65.003602] BUG: unable to handle kernel NULL pointer dereference at 00000000 00000030 [ 65.005312] IP: [<ffffffffa037d6ee>] __nfs4_close+0x1e/0x160 [nfsv4] [ 65.006820] PGD 7b0ea067 PUD 791ff067 PMD 0 [ 65.008075] Oops: 0000 [#1] SMP [ 65.008802] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache snd_ens1371 gameport nfsd snd_rawmidi snd_ac97_codec ac97_bus btusb snd_seq snd _seq_device snd_pcm ppdev bluetooth auth_rpcgss coretemp snd_page_alloc crc32_pc lmul crc32c_intel ghash_clmulni_intel microcode rfkill nfs_acl vmw_balloon serio _raw snd_timer lockd parport_pc e1000 snd soundcore parport i2c_piix4 shpchp vmw _vmci sunrpc ata_generic mperf pata_acpi mptspi vmwgfx ttm scsi_transport_spi dr m mptscsih mptbase i2c_core [ 65.018684] CPU: 0 PID: 473 Comm: 192.168.10.85-m Not tainted 3.11.2-201.fc19 .x86_64 #1 [ 65.020113] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 [ 65.022012] task: ffff88003707e320 ti: ffff88007b906000 task.ti: ffff88007b906000 [ 65.023414] RIP: 0010:[<ffffffffa037d6ee>] [<ffffffffa037d6ee>] __nfs4_close+0x1e/0x160 [nfsv4] [ 65.025079] RSP: 0018:ffff88007b907d10 EFLAGS: 00010246 [ 65.026042] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 65.027321] RDX: 0000000000000050 RSI: 0000000000000001 RDI: 0000000000000000 [ 65.028691] RBP: ffff88007b907d38 R08: 0000000000016f60 R09: 0000000000000000 [ 65.029990] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 [ 65.031295] R13: 0000000000000050 R14: 0000000000000000 R15: 0000000000000001 [ 65.032527] FS: 0000000000000000(0000) GS:ffff88007f600000(0000) knlGS:0000000000000000 [ 65.033981] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 65.035177] CR2: 0000000000000030 CR3: 000000007b27f000 CR4: 00000000000407f0 [ 65.036568] Stack: [ 65.037011] 0000000000000000 0000000000000001 ffff88007b907d90 ffff88007a880220 [ 65.038472] ffff88007b768de8 ffff88007b907d48 ffffffffa037e4a5 ffff88007b907d80 [ 65.039935] ffffffffa036a6c8 ffff880037020e40 ffff88007a880000 ffff880037020e40 [ 65.041468] Call Trace: [ 65.042050] [<ffffffffa037e4a5>] nfs4_close_state+0x15/0x20 [nfsv4] [ 65.043209] [<ffffffffa036a6c8>] nfs4_open_recover_helper+0x148/0x1f0 [nfsv4] [ 65.044529] [<ffffffffa036a886>] nfs4_open_recover+0x116/0x150 [nfsv4] [ 65.045730] [<ffffffffa036d98d>] nfs4_open_reclaim+0xad/0x150 [nfsv4] [ 65.046905] [<ffffffffa037d979>] nfs4_do_reclaim+0x149/0x5f0 [nfsv4] [ 65.048071] [<ffffffffa037e1dc>] nfs4_run_state_manager+0x3bc/0x670 [nfsv4] [ 65.049436] [<ffffffffa037de20>] ? nfs4_do_reclaim+0x5f0/0x5f0 [nfsv4] [ 65.050686] [<ffffffffa037de20>] ? nfs4_do_reclaim+0x5f0/0x5f0 [nfsv4] [ 65.051943] [<ffffffff81088640>] kthread+0xc0/0xd0 [ 65.052831] [<ffffffff81088580>] ? insert_kthread_work+0x40/0x40 [ 65.054697] [<ffffffff8165686c>] ret_from_fork+0x7c/0xb0 [ 65.056396] [<ffffffff81088580>] ? insert_kthread_work+0x40/0x40 [ 65.058208] Code: 5c 41 5d 5d c3 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 57 41 89 f7 41 56 41 89 ce 41 55 41 89 d5 41 54 53 48 89 fb <4c> 8b 67 30 f0 41 ff 44 24 44 49 8d 7c 24 40 e8 0e 0a 2d e1 44 [ 65.065225] RIP [<ffffffffa037d6ee>] __nfs4_close+0x1e/0x160 [nfsv4] [ 65.067175] RSP <ffff88007b907d10> [ 65.068570] CR2: 0000000000000030 [ 65.070098] ---[ end trace 0d1fe4f5c7dd6f8b ]--- Cc: <stable@vger.kernel.org> #3.7+ Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28NFSv4.1: Don't change the security label as part of open reclaim.Trond Myklebust1-2/+0
The current caching model calls for the security label to be set on first lookup and/or on any subsequent label changes. There is no need to do it as part of an open reclaim. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28nfs: fix handling of invalid mount options in nfs_remountJeff Layton1-2/+2
nfs_parse_mount_options returns 0 on error, not -errno. Reported-by: Karel Zak <kzak@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28nfs: reject version and minorversion changes on remount attemptsJeff Layton1-0/+4
Reported-by: Eric Doutreleau <edoutreleau@genoscope.cns.fr> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28NFSv4 Remove zeroing state kern warningsAndy Adamson1-8/+5
As of commit 5d422301f97b821301efcdb6fc9d1a83a5c102d6 we no longer zero the state. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-24nfs: use %p[dD] instead of open-coded (and often racy) equivalentsAl Viro11-186/+119
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-01NFSv4: Ensure that we disable the resend timeout for NFSv4Trond Myklebust2-0/+3
The spec states that the client should not resend requests because the server will disconnect if it needs to drop an RPC request. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-01NFSv4: Fix a use-after-free situation in _nfs4_proc_getlk()Trond Myklebust1-0/+1
In nfs4_proc_getlk(), when some error causes a retry of the call to _nfs4_proc_getlk(), we can end up with Oopses of the form BUG: unable to handle kernel NULL pointer dereference at 0000000000000134 IP: [<ffffffff8165270e>] _raw_spin_lock+0xe/0x30 <snip> Call Trace: [<ffffffff812f287d>] _atomic_dec_and_lock+0x4d/0x70 [<ffffffffa053c4f2>] nfs4_put_lock_state+0x32/0xb0 [nfsv4] [<ffffffffa053c585>] nfs4_fl_release_lock+0x15/0x20 [nfsv4] [<ffffffffa0522c06>] _nfs4_proc_getlk.isra.40+0x146/0x170 [nfsv4] [<ffffffffa052ad99>] nfs4_proc_lock+0x399/0x5a0 [nfsv4] The problem is that we don't clear the request->fl_ops after the first try and so when we retry, nfs4_set_lock_state() exits early without setting the lock stateid. Regression introduced by commit 70cc6487a4e08b8698c0e2ec935fb48d10490162 (locks: make ->lock release private data before returning in GETLK case) Reported-by: Weston Andros Adamson <dros@netapp.com> Reported-by: Jorge Mora <mora@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: <stable@vger.kernel.org> #2.6.22+
2013-09-29NFS: Give "flavor" an initial value to fix a compile warningAnna Schumaker1-1/+1
The previous patch introduces a compile warning by not assigning an initial value to the "flavor" variable. This could only be a problem if the server returns a supported secflavor list of length zero, but it's better to fix this before it's ever hit. Signed-off-by: Anna Schumaker <bjschuma@netapp.com> Acked-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-29NFSv4.1: try SECINFO_NO_NAME flavs until one worksWeston Andros Adamson1-3/+27
Call nfs4_lookup_root_sec for each flavor returned by SECINFO_NO_NAME until one works. One example of a situation this fixes: - server configured for krb5 - server principal somehow gets deleted from KDC - server still thinking krb is good, sends krb5 as first entry in SECINFO_NO_NAME response - client tries krb5, but this fails without even sending an RPC because gssd's requests to the KDC can't find the server's principal Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-29NFSv4.1: Ensure memory ordering between nfs4_ds_connect and nfs4_fl_prepare_dsTrond Myklebust1-0/+2
We need to ensure that the initialisation of the data server nfs_client structure in nfs4_ds_connect is correctly ordered w.r.t. the read of ds->ds_clp in nfs4_fl_prepare_ds. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-29NFSv4.1: nfs4_fl_prepare_ds - fix bugs when the connect attempt failsTrond Myklebust1-9/+9
- Fix an Oops when nfs4_ds_connect() returns an error. - Always check the device status after waiting for a connect to complete. Reported-by: Andy Adamson <andros@netapp.com> Reported-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: <stable@vger.kernel.org> # v3.10+
2013-09-27NFS: Use i_writecount to control whether to get an fscache cookie in nfs_open()David Howells5-158/+68
Use i_writecount to control whether to get an fscache cookie in nfs_open() as NFS does not do write caching yet. I *think* this is the cause of a problem encountered by Mark Moseley whereby __fscache_uncache_page() gets a NULL pointer dereference because cookie->def is NULL: BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 IP: [<ffffffff812a1903>] __fscache_uncache_page+0x23/0x160 PGD 0 Thread overran stack, or stack corrupted Oops: 0000 [#1] SMP Modules linked in: ... CPU: 7 PID: 18993 Comm: php Not tainted 3.11.1 #1 Hardware name: Dell Inc. PowerEdge R420/072XWF, BIOS 1.3.5 08/21/2012 task: ffff8804203460c0 ti: ffff880420346640 RIP: 0010:[<ffffffff812a1903>] __fscache_uncache_page+0x23/0x160 RSP: 0018:ffff8801053af878 EFLAGS: 00210286 RAX: 0000000000000000 RBX: ffff8800be2f8780 RCX: ffff88022ffae5e8 RDX: 0000000000004c66 RSI: ffffea00055ff440 RDI: ffff8800be2f8780 RBP: ffff8801053af898 R08: 0000000000000001 R09: 0000000000000003 R10: 0000000000000000 R11: 0000000000000000 R12: ffffea00055ff440 R13: 0000000000001000 R14: ffff8800c50be538 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88042fc60000(0063) knlGS:00000000e439c700 CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 CR2: 0000000000000010 CR3: 0000000001d8f000 CR4: 00000000000607f0 Stack: ... Call Trace: [<ffffffff81365a72>] __nfs_fscache_invalidate_page+0x42/0x70 [<ffffffff813553d5>] nfs_invalidate_page+0x75/0x90 [<ffffffff811b8f5e>] truncate_inode_page+0x8e/0x90 [<ffffffff811b90ad>] truncate_inode_pages_range.part.12+0x14d/0x620 [<ffffffff81d6387d>] ? __mutex_lock_slowpath+0x1fd/0x2e0 [<ffffffff811b95d3>] truncate_inode_pages_range+0x53/0x70 [<ffffffff811b969d>] truncate_inode_pages+0x2d/0x40 [<ffffffff811b96ff>] truncate_pagecache+0x4f/0x70 [<ffffffff81356840>] nfs_setattr_update_inode+0xa0/0x120 [<ffffffff81368de4>] nfs3_proc_setattr+0xc4/0xe0 [<ffffffff81357f78>] nfs_setattr+0xc8/0x150 [<ffffffff8122d95b>] notify_change+0x1cb/0x390 [<ffffffff8120a55b>] do_truncate+0x7b/0xc0 [<ffffffff8121f96c>] do_last+0xa4c/0xfd0 [<ffffffff8121ffbc>] path_openat+0xcc/0x670 [<ffffffff81220a0e>] do_filp_open+0x4e/0xb0 [<ffffffff8120ba1f>] do_sys_open+0x13f/0x2b0 [<ffffffff8126aaf6>] compat_SyS_open+0x36/0x50 [<ffffffff81d7204c>] sysenter_dispatch+0x7/0x24 The code at the instruction pointer was disassembled: > (gdb) disas __fscache_uncache_page > Dump of assembler code for function __fscache_uncache_page: > ... > 0xffffffff812a18ff <+31>: mov 0x48(%rbx),%rax > 0xffffffff812a1903 <+35>: cmpb $0x0,0x10(%rax) > 0xffffffff812a1907 <+39>: je 0xffffffff812a19cd <__fscache_uncache_page+237> These instructions make up: ASSERTCMP(cookie->def->type, !=, FSCACHE_COOKIE_TYPE_INDEX); That cmpb is the faulting instruction (%rax is 0). So cookie->def is NULL - which presumably means that the cookie has already been at least partway through __fscache_relinquish_cookie(). What I think may be happening is something like a three-way race on the same file: PROCESS 1 PROCESS 2 PROCESS 3 =============== =============== =============== open(O_TRUNC|O_WRONLY) open(O_RDONLY) open(O_WRONLY) -->nfs_open() -->nfs_fscache_set_inode_cookie() nfs_fscache_inode_lock() nfs_fscache_disable_inode_cookie() __fscache_relinquish_cookie() nfs_inode->fscache = NULL <--nfs_fscache_set_inode_cookie() -->nfs_open() -->nfs_fscache_set_inode_cookie() nfs_fscache_inode_lock() nfs_fscache_enable_inode_cookie() __fscache_acquire_cookie() nfs_inode->fscache = cookie <--nfs_fscache_set_inode_cookie() <--nfs_open() -->nfs_setattr() ... ... -->nfs_invalidate_page() -->__nfs_fscache_invalidate_page() cookie = nfsi->fscache -->nfs_open() -->nfs_fscache_set_inode_cookie() nfs_fscache_inode_lock() nfs_fscache_disable_inode_cookie() -->__fscache_relinquish_cookie() -->__fscache_uncache_page(cookie) <crash> <--__fscache_relinquish_cookie() nfs_inode->fscache = NULL <--nfs_fscache_set_inode_cookie() What is needed is something to prevent process #2 from reacquiring the cookie - and I think checking i_writecount should do the trick. It's also possible to have a two-way race on this if the file is opened O_TRUNC|O_RDONLY instead. Reported-by: Mark Moseley <moseleymark@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>
2013-09-27FS-Cache: Provide the ability to enable/disable cookiesDavid Howells1-4/+4
Provide the ability to enable and disable fscache cookies. A disabled cookie will reject or ignore further requests to: Acquire a child cookie Invalidate and update backing objects Check the consistency of a backing object Allocate storage for backing page Read backing pages Write to backing pages but still allows: Checks/waits on the completion of already in-progress objects Uncaching of pages Relinquishment of cookies Two new operations are provided: (1) Disable a cookie: void fscache_disable_cookie(struct fscache_cookie *cookie, bool invalidate); If the cookie is not already disabled, this locks the cookie against other dis/enablement ops, marks the cookie as being disabled, discards or invalidates any backing objects and waits for cessation of activity on any associated object. This is a wrapper around a chunk split out of fscache_relinquish_cookie(), but it reinitialises the cookie such that it can be reenabled. All possible failures are handled internally. The caller should consider calling fscache_uncache_all_inode_pages() afterwards to make sure all page markings are cleared up. (2) Enable a cookie: void fscache_enable_cookie(struct fscache_cookie *cookie, bool (*can_enable)(void *data), void *data) If the cookie is not already enabled, this locks the cookie against other dis/enablement ops, invokes can_enable() and, if the cookie is not an index cookie, will begin the procedure of acquiring backing objects. The optional can_enable() function is passed the data argument and returns a ruling as to whether or not enablement should actually be permitted to begin. All possible failures are handled internally. The cookie will only be marked as enabled if provisional backing objects are allocated. A later patch will introduce these to NFS. Cookie enablement during nfs_open() is then contingent on i_writecount <= 0. can_enable() checks for a race between open(O_RDONLY) and open(O_WRONLY/O_RDWR). This simplifies NFS's cookie handling and allows us to get rid of open(O_RDONLY) accidentally introducing caching to an inode that's open for writing already. One operation has its API modified: (3) Acquire a cookie. struct fscache_cookie *fscache_acquire_cookie( struct fscache_cookie *parent, const struct fscache_cookie_def *def, void *netfs_data, bool enable); This now has an additional argument that indicates whether the requested cookie should be enabled by default. It doesn't need the can_enable() function because the caller must prevent multiple calls for the same netfs object and it doesn't need to take the enablement lock because no one else can get at the cookie before this returns. Signed-off-by: David Howells <dhowells@redhat.com
2013-09-26NFSv4: Honour the 'opened' parameter in the atomic_open() filesystem methodTrond Myklebust3-9/+22
Determine if we've created a new file by examining the directory change attribute and/or the O_EXCL flag. This fixes a regression when doing a non-exclusive create of a new file. If the FILE_CREATED flag is not set, the atomic_open() command will perform full file access permissions checks instead of just checking for MAY_OPEN. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-16nfs: set FILE_CREATEDMiklos Szeredi1-0/+3
Set FILE_CREATED on O_CREAT|O_EXCL. If the NFS server honored our request for exclusivity then this must be correct. Currently this is a no-op, since the VFS sets FILE_CREATED anyway. The next patch will, however, require this flag to be always set by filesystems. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-13Merge git://git.kvack.org/~bcrl/aio-nextLinus Torvalds1-1/+0
Pull aio changes from Ben LaHaise: "First off, sorry for this pull request being late in the merge window. Al had raised a couple of concerns about 2 items in the series below. I addressed the first issue (the race introduced by Gu's use of mm_populate()), but he has not provided any further details on how he wants to rework the anon_inode.c changes (which were sent out months ago but have yet to be commented on). The bulk of the changes have been sitting in the -next tree for a few months, with all the issues raised being addressed" * git://git.kvack.org/~bcrl/aio-next: (22 commits) aio: rcu_read_lock protection for new rcu_dereference calls aio: fix race in ring buffer page lookup introduced by page migration support aio: fix rcu sparse warnings introduced by ioctx table lookup patch aio: remove unnecessary debugging from aio_free_ring() aio: table lookup: verify ctx pointer staging/lustre: kiocb->ki_left is removed aio: fix error handling and rcu usage in "convert the ioctx list to table lookup v3" aio: be defensive to ensure request batching is non-zero instead of BUG_ON() aio: convert the ioctx list to table lookup v3 aio: double aio_max_nr in calculations aio: Kill ki_dtor aio: Kill ki_users aio: Kill unneeded kiocb members aio: Kill aio_rw_vect_retry() aio: Don't use ctx->tail unnecessarily aio: io_cancel() no longer returns the io_event aio: percpu ioctx refcount aio: percpu reqs_available aio: reqs_active -> reqs_available aio: fix build when migration is disabled ...
2013-09-12Merge branch 'akpm' (patches from Andrew Morton)Linus Torvalds1-3/+1
Merge more patches from Andrew Morton: "The rest of MM. Plus one misc cleanup" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (35 commits) mm/Kconfig: add MMU dependency for MIGRATION. kernel: replace strict_strto*() with kstrto*() mm, thp: count thp_fault_fallback anytime thp fault fails thp: consolidate code between handle_mm_fault() and do_huge_pmd_anonymous_page() thp: do_huge_pmd_anonymous_page() cleanup thp: move maybe_pmd_mkwrite() out of mk_huge_pmd() mm: cleanup add_to_page_cache_locked() thp: account anon transparent huge pages into NR_ANON_PAGES truncate: drop 'oldsize' truncate_pagecache() parameter mm: make lru_add_drain_all() selective memcg: document cgroup dirty/writeback memory statistics memcg: add per cgroup writeback pages accounting memcg: check for proper lock held in mem_cgroup_update_page_stat memcg: remove MEMCG_NR_FILE_MAPPED memcg: reduce function dereference memcg: avoid overflow caused by PAGE_ALIGN memcg: rename RESOURCE_MAX to RES_COUNTER_MAX memcg: correct RESOURCE_MAX to ULLONG_MAX mm: memcg: do not trap chargers with full callstack on OOM mm: memcg: rework and document OOM waiting and wakeup ...
2013-09-12truncate: drop 'oldsize' truncate_pagecache() parameterKirill A. Shutemov1-3/+1
truncate_pagecache() doesn't care about old size since commit cedabed49b39 ("vfs: Fix vmtruncate() regression"). Let's drop it. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-12Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds3-7/+18
Pull vfs pile 4 from Al Viro: "list_lru pile, mostly" This came out of Andrew's pile, Al ended up doing the merge work so that Andrew didn't have to. Additionally, a few fixes. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (42 commits) super: fix for destroy lrus list_lru: dynamically adjust node arrays shrinker: Kill old ->shrink API. shrinker: convert remaining shrinkers to count/scan API staging/lustre/libcfs: cleanup linux-mem.h staging/lustre/ptlrpc: convert to new shrinker API staging/lustre/obdclass: convert lu_object shrinker to count/scan API staging/lustre/ldlm: convert to shrinkers to count/scan API hugepage: convert huge zero page shrinker to new shrinker API i915: bail out earlier when shrinker cannot acquire mutex drivers: convert shrinkers to new count/scan API fs: convert fs shrinkers to new scan/count API xfs: fix dquot isolation hang xfs-convert-dquot-cache-lru-to-list_lru-fix xfs: convert dquot cache lru to list_lru xfs: rework buffer dispose list tracking xfs-convert-buftarg-lru-to-generic-code-fix xfs: convert buftarg LRU to generic code fs: convert inode and dentry shrinking to be node aware vmscan: per-node deferred work ...
2013-09-12Merge tag 'nfs-for-3.12-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds3-28/+21
Pull NFS client bugfixes (part 2) from Trond Myklebust: "Bugfixes: - Fix a few credential reference leaks resulting from the SP4_MACH_CRED NFSv4.1 state protection code. - Fix the SUNRPC bloatometer footprint: convert a 256K hashtable into the intended 64 byte structure. - Fix a long standing XDR issue with FREE_STATEID - Fix a potential WARN_ON spamming issue - Fix a missing dprintk() kuid conversion New features: - Enable the NFSv4.1 state protection support for the WRITE and COMMIT operations" * tag 'nfs-for-3.12-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: No, I did not intend to create a 256KiB hashtable sunrpc: Add missing kuids conversion for printing NFSv4.1: sp4_mach_cred: WARN_ON -> WARN_ON_ONCE NFSv4.1: sp4_mach_cred: no need to ref count creds NFSv4.1: fix SECINFO* use of put_rpccred NFSv4.1: sp4_mach_cred: ask for WRITE and COMMIT NFSv4.1 fix decode_free_stateid
2013-09-11NFSv4.1: sp4_mach_cred: WARN_ON -> WARN_ON_ONCEWeston Andros Adamson1-2/+2
No need to spam the logs Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-11NFSv4.1: sp4_mach_cred: no need to ref count credsWeston Andros Adamson1-3/+3
The cl_machine_cred doesn't need to be reference counted here - a reference is held is for the lifetime of the struct nfs_client. Also, no need to put_rpccred the rpc_message.rpc_cred. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-11NFSv4.1: fix SECINFO* use of put_rpccredWeston Andros Adamson1-6/+10
Recent SP4_MACH_CRED changes allows rpc_message.rpc_cred to change, so keep a separate pointer to the machine cred for put_rpccred. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-11NFSv4.1: sp4_mach_cred: ask for WRITE and COMMITWeston Andros Adamson1-2/+4
Request SP4_MACH_CRED WRITE and COMMIT support in spo_must_allow list -- they're already supported by the client. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-10fs: convert fs shrinkers to new scan/count APIDave Chinner3-6/+17
Convert the filesystem shrinkers to use the new API, and standardise some of the behaviours of the shrinkers at the same time. For example, nr_to_scan means the number of objects to scan, not the number of objects to free. I refactored the CIFS idmap shrinker a little - it really needs to be broken up into a shrinker per tree and keep an item count with the tree root so that we don't need to walk the tree every time the shrinker needs to count the number of objects in the tree (i.e. all the time under memory pressure). [glommer@openvz.org: fixes for ext4, ubifs, nfs, cifs and glock. Fixes are needed mainly due to new code merged in the tree] [assorted fixes folded in] Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Glauber Costa <glommer@openvz.org> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Acked-by: Jan Kara <jack@suse.cz> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10super: fix calculation of shrinkable objects for small numbersGlauber Costa1-1/+1
The sysctl knob sysctl_vfs_cache_pressure is used to determine which percentage of the shrinkable objects in our cache we should actively try to shrink. It works great in situations in which we have many objects (at least more than 100), because the aproximation errors will be negligible. But if this is not the case, specially when total_objects < 100, we may end up concluding that we have no objects at all (total / 100 = 0, if total < 100). This is certainly not the biggest killer in the world, but may matter in very low kernel memory situations. Signed-off-by: Glauber Costa <glommer@openvz.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10NFSv4.1 fix decode_free_stateidAndy Adamson1-15/+2
The operation status is decoded in decode_op_hdr. Stop the print_overflow message that is always hit without this patch: nfs: decode_free_stateid: prematurely hit end of receive buffer. Remaining buffer length is 0 words. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-09Merge tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds32-720/+3521
Pull NFS client updates from Trond Myklebust: "Highlights include: - Fix NFSv4 recovery so that it doesn't recover lost locks in cases such as lease loss due to a network partition, where doing so may result in data corruption. Add a kernel parameter to control choice of legacy behaviour or not. - Performance improvements when 2 processes are writing to the same file. - Flush data to disk when an RPCSEC_GSS session timeout is imminent. - Implement NFSv4.1 SP4_MACH_CRED state protection to prevent other NFS clients from being able to manipulate our lease and file locking state. - Allow sharing of RPCSEC_GSS caches between different rpc clients. - Fix the broken NFSv4 security auto-negotiation between client and server. - Fix rmdir() to wait for outstanding sillyrename unlinks to complete - Add a tracepoint framework for debugging NFSv4 state recovery issues. - Add tracing to the generic NFS layer. - Add tracing for the SUNRPC socket connection state. - Clean up the rpc_pipefs mount/umount event management. - Merge more patches from Chuck in preparation for NFSv4 migration support" * tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (107 commits) NFSv4: use mach cred for SECINFO_NO_NAME w/ integrity NFS: nfs_compare_super shouldn't check the auth flavour unless 'sec=' was set NFSv4: Allow security autonegotiation for submounts NFSv4: Disallow security negotiation for lookups when 'sec=' is specified NFSv4: Fix security auto-negotiation NFS: Clean up nfs_parse_security_flavors() NFS: Clean up the auth flavour array mess NFSv4.1 Use MDS auth flavor for data server connection NFS: Don't check lock owner compatability unless file is locked (part 2) NFS: Don't check lock owner compatibility in writes unless file is locked nfs4: Map NFS4ERR_WRONG_CRED to EPERM nfs4.1: Add SP4_MACH_CRED write and commit support nfs4.1: Add SP4_MACH_CRED stateid support nfs4.1: Add SP4_MACH_CRED secinfo support nfs4.1: Add SP4_MACH_CRED cleanup support nfs4.1: Add state protection handler nfs4.1: Minimal SP4_MACH_CRED implementation SUNRPC: Replace pointer values with task->tk_pid and rpc_clnt->cl_clid SUNRPC: Add an identifier for struct rpc_clnt SUNRPC: Ensure rpc_task->tk_pid is available for tracepoints ...
2013-09-07NFSv4: use mach cred for SECINFO_NO_NAME w/ integrityWeston Andros Adamson1-4/+37
Commit 97431204ea005ec8070ac94bc3251e836daa7ca7 introduced a regression that causes SECINFO_NO_NAME to fail without sending an RPC if: 1) the nfs_client's rpc_client is using krb5i/p (now tried by default) 2) the current user doesn't have valid kerberos credentials This situation is quite common - as of now a sec=sys mount would use krb5i for the nfs_client's rpc_client and a user would hardly be faulted for not having run kinit. The solution is to use the machine cred when trying to use an integrity protected auth flavor for SECINFO_NO_NAME. Older servers may not support using the machine cred or an integrity protected auth flavor for SECINFO_NO_NAME in every circumstance, so we fall back to using the user's cred and the filesystem's auth flavor in this case. We run into another problem when running against linux nfs servers - they return NFS4ERR_WRONGSEC when using integrity auth flavor (unless the mount is also that flavor) even though that is not a valid error for SECINFO*. Even though it's against spec, handle WRONGSEC errors on SECINFO_NO_NAME by falling back to using the user cred and the filesystem's auth flavor. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-07NFS: nfs_compare_super shouldn't check the auth flavour unless 'sec=' was setTrond Myklebust1-2/+15
Also don't worry about obsolete mount flags... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-07NFSv4: Allow security autonegotiation for submountsTrond Myklebust2-5/+19
In cases where the parent super block was not mounted with a 'sec=' line, allow autonegotiation of security for the submounts. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-07NFSv4: Disallow security negotiation for lookups when 'sec=' is specifiedTrond Myklebust1-1/+3
Ensure that nfs4_proc_lookup_common respects the NFS_MOUNT_SECFLAVOUR flag. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-07NFSv4: Fix security auto-negotiationTrond Myklebust6-18/+30
NFSv4 security auto-negotiation has been broken since commit 4580a92d44e2b21c2254fa5fef0f1bfb43c82318 (NFS: Use server-recommended security flavor by default (NFSv3)) because nfs4_try_mount() will automatically select AUTH_SYS if it sees no auth flavours. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Chuck Lever <chuck.lever@oracle.com>
2013-09-07NFS: Clean up nfs_parse_security_flavors()Trond Myklebust1-12/+13
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-07NFS: Clean up the auth flavour array messTrond Myklebust2-13/+28
What is the point of having a 'auth_flavor_len' field, if it is always set to 1, and can't be used to determine if the user has selected an auth flavour? This cleanup goes back to using auth_flavor_len for its original intended purpose, and gets rid of the ad-hoc replacements. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-06NFSv4.1 Use MDS auth flavor for data server connectionAndy Adamson3-8/+145
Commit 4edaa308 "NFS: Use "krb5i" to establish NFSv4 state whenever possible" uses the nfs_client cl_rpcclient for all state management operations, and will use krb5i or auth_sys with no regard to the mount command authflavor choice. The MDS, as any NFSv4.1 mount point, uses the nfs_server rpc client for all non-state management operations with a different nfs_server for each fsid encountered traversing the mount point, each with a potentially different auth flavor. pNFS data servers are not mounted in the normal sense as there is no associated nfs_server structure. Data servers can also export multiple fsids, each with a potentially different auth flavor. Data servers need to use the same authflavor as the MDS server rpc client for non-state management operations. Populate a list of rpc clients with the MDS server rpc client auth flavor for the DS to use. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-06NFS: Don't check lock owner compatability unless file is locked (part 2)Trond Myklebust1-6/+16
When coalescing requests into a single READ or WRITE RPC call, and there is no file locking involved, we don't have to refuse coalescing for requests where the lock owner information doesn't match. Reported-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05NFS: Don't check lock owner compatibility in writes unless file is lockedTrond Myklebust1-1/+1
If we're doing buffered writes, and there is no file locking involved, then we don't have to worry about whether or not the lock owner information is identical. By relaxing this check, we ensure that fork()ed child processes can write to a page without having to first sync dirty data that was written by the parent to disk. Reported-by: Quentin Barnes <qbarnes@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Quentin Barnes <qbarnes@gmail.com>
2013-09-05nfs: use check_submounts_and_drop()Miklos Szeredi1-5/+4
Do have_submounts(), shrink_dcache_parent() and d_drop() atomically. check_submounts_and_drop() can deal with negative dentries and non-directories as well. Non-directories can also be mounted on. And just like directories we don't want these to disappear with invalidation. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-05nfs4: Map NFS4ERR_WRONG_CRED to EPERMWeston Andros Adamson1-0/+1
Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05nfs4.1: Add SP4_MACH_CRED write and commit supportWeston Andros Adamson3-9/+57
WRITE and COMMIT can use the machine credential. If WRITE is supported and COMMIT is not, make all (mach cred) writes FILE_SYNC4. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05nfs4.1: Add SP4_MACH_CRED stateid supportWeston Andros Adamson1-2/+17
TEST_STATEID and FREE_STATEID can use the machine credential. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05nfs4.1: Add SP4_MACH_CRED secinfo supportWeston Andros Adamson1-1/+13
SECINFO and SECINFO_NONAME can use the machine credential. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05nfs4.1: Add SP4_MACH_CRED cleanup supportWeston Andros Adamson1-1/+18
CLOSE and LOCKU can use the machine credential. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05nfs4.1: Add state protection handlerWeston Andros Adamson1-0/+35
Add nfs4_state_protect - the function responsible for switching to the machine credential and the correct rpc client when SP4_MACH_CRED is in use. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-05nfs4.1: Minimal SP4_MACH_CRED implementationWeston Andros Adamson2-17/+188
This is a minimal client side implementation of SP4_MACH_CRED. It will attempt to negotiate SP4_MACH_CRED iff the EXCHANGE_ID is using krb5i or krb5p auth. SP4_MACH_CRED will be used if the server supports the minimal operations: BIND_CONN_TO_SESSION EXCHANGE_ID CREATE_SESSION DESTROY_SESSION DESTROY_CLIENTID This patch only includes the EXCHANGE_ID negotiation code because the client will already use the machine cred for these operations. If the server doesn't support SP4_MACH_CRED or doesn't support the minimal operations, the exchange id will be resent with SP4_NONE. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>