aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/filesystems (follow)
AgeCommit message (Collapse)AuthorFilesLines
2012-12-12Merge tag 'for-linus-v3.8-rc1' of git://oss.sgi.com/xfs/xfsLinus Torvalds1-3/+10
Pull xfs update from Ben Myers: "There is plenty going on, including the cleanup of xfssyncd, metadata verifiers, CRC infrastructure for the log, tracking of inodes with speculative allocation, a cleanup of xfs_fs_subr.c, fixes for XFS_IOC_ZERO_RANGE, and important fix related to log replay (only update the last_sync_lsn when a transaction completes), a fix for deadlock on AGF buffers, documentation and comment updates, and a few more cleanups and fixes. Details: - remove the xfssyncd mess - only update the last_sync_lsn when a transaction completes - zero allocation_args on the kernel stack - fix AGF/alloc workqueue deadlock - silence uninitialised f.file warning - Update inode alloc comments - Update mount options documentation - report projid32bit feature in geometry call - speculative preallocation inode tracking - fix attr tree double split corruption - fix broken error handling in xfs_vm_writepage - drop buffer io reference when a bad bio is built - add more attribute tree trace points - growfs infrastructure changes for 3.8 - fs/xfs/xfs_fs_subr.c die die die - add CRC infrastructure - add CRC checks to the log - Remove description of nodelaylog mount option from xfs.txt - inode allocation should use unmapped buffers - byte range granularity for XFS_IOC_ZERO_RANGE - fix direct IO nested transaction deadlock - fix stray dquot unlock when reclaiming dquots - fix sparse reported log CRC endian issue" Fix up trivial conflict in fs/xfs/xfs_fsops.c due to the same patch having been applied twice (commits eaef854335ce and 1375cb65e87b: "xfs: growfs: don't read garbage for new secondary superblocks") with later updates to the affected code in the XFS tree. * tag 'for-linus-v3.8-rc1' of git://oss.sgi.com/xfs/xfs: (78 commits) xfs: fix sparse reported log CRC endian issue xfs: fix stray dquot unlock when reclaiming dquots xfs: fix direct IO nested transaction deadlock. xfs: byte range granularity for XFS_IOC_ZERO_RANGE xfs: inode allocation should use unmapped buffers. xfs: Remove the description of nodelaylog mount option from xfs.txt xfs: add CRC checks to the log xfs: add CRC infrastructure xfs: convert buffer verifiers to an ops structure. xfs: connect up write verifiers to new buffers xfs: add pre-write metadata buffer verifier callbacks xfs: add buffer pre-write callback xfs: Add verifiers to dir2 data readahead. xfs: add xfs_da_node verification xfs: factor and verify attr leaf reads xfs: factor dir2 leaf read xfs: factor out dir2 data block reading xfs: factor dir2 free block reading xfs: verify dir2 block format buffers xfs: factor dir2 block read operations ...
2012-12-11f2fs: fix a typo in f2fs documentationHuajun Li1-1/+1
In f2fs_fs.h, one f2fs inode contains 923 data block pointers, while f2fs documentation says it is 929. Fix this inconsistence. Signed-off-by: Huajun Li <huajun.li.lee@gmail.com>
2012-12-11f2fs: update the f2fs documentJaegeuk Kim1-1/+5
I moved the f2fs-tools.git into kernel.org. And I added a new mailing list, linux-f2fs-devel@lists.sourceforge.net. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-12-11f2fs: add documentJaegeuk Kim2-0/+419
This adds a document describing the mount options, proc entries, usage, and design of Flash-Friendly File System, namely F2FS. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-12-10ext4: Remove CONFIG_EXT4_FS_XATTRTao Ma1-6/+3
Ted has sent out a RFC about removing this feature. Eric and Jan confirmed that both RedHat and SUSE enable this feature in all their product. David also said that "As far as I know, it's enabled in all Android kernels that use ext4." So it seems OK for us. And what's more, as inline data depends its implementation on xattr, and to be frank, I don't run any test again inline data enabled while xattr disabled. So I think we should add inline data and remove this config option in the same release. [ The savings if you disable CONFIG_EXT4_FS_XATTR is only 27k, which isn't much in the grand scheme of things. Since no one seems to be testing this configuration except for some automated compile farms, on balance we are better removing this config option, and so that it is effectively always enabled. -- tytso ] Cc: David Brown <davidb@codeaurora.org> Cc: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-26xfs: Remove the description of nodelaylog mount option from xfs.txtSatoru Takeuchi1-1/+1
nodelaylog mount option is removed by commit 93b8a585. But there still be the description about it in the xfs document. This patch removes it. Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-11-26nfsd4: delay filling in write iovec array till after xdr decodingJ. Bruce Fields1-2/+2
Our server rejects compounds containing more than one write operation. It's unclear whether this is really permitted by the spec; with 4.0, it's possibly OK, with 4.1 (which has clearer limits on compound parameters), it's probably not OK. No client that we're aware of has ever done this, but in theory it could be useful. The source of the limitation: we need an array of iovecs to pass to the write operation. In the worst case that array of iovecs could have hundreds of elements (the maximum rwsize divided by the page size), so it's too big to put on the stack, or in each compound op. So we instead keep a single such array in the compound argument. We fill in that array at the time we decode the xdr operation. But we decode every op in the compound before executing any of them. So once we've used that array we can't decode another write. If we instead delay filling in that array till the time we actually perform the write, we can reuse it. Another option might be to switch to decoding compound ops one at a time. I considered doing that, but it has a number of other side effects, and I'd rather fix just this one problem for now. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-16mm, oom: reintroduce /proc/pid/oom_adjDavid Rientjes1-4/+12
This is mostly a revert of 01dc52ebdf47 ("oom: remove deprecated oom_adj") from Davidlohr Bueso. It reintroduces /proc/pid/oom_adj for backwards compatibility with earlier kernels. It simply scales the value linearly when /proc/pid/oom_score_adj is written. The major difference is that its scheduled removal is no longer included in Documentation/feature-removal-schedule.txt. We do warn users with a single printk, though, to suggest the more powerful and supported /proc/pid/oom_score_adj interface. Reported-by: Artem S. Tashkinov <t.artem@lycos.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-10nfsd4: update documentation on 4.1 progressJ. Bruce Fields1-8/+1
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-07nfsd4: implement backchannel_ctl operationJ. Bruce Fields1-6/+1
This operation is mandatory for servers to implement. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-02xfs: Update mount options documentationCarlos Maiolino1-2/+9
Once inode64 is the default allocation mode now, kernel documentation should be updated to match this behaviour. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-10-30efivarfs: Add documentation for the EFI variable filesystemMatt Fleming2-0/+18
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2012-10-13Merge branch 'for-3.7' of git://linux-nfs.org/~bfields/linuxLinus Torvalds1-0/+41
Pull nfsd update from J Bruce Fields: "Another relatively quiet cycle. There was some progress on my remaining 4.1 todo's, but a couple of them were just of the form "check that we do X correctly", so didn't have much affect on the code. Other than that, a bunch of cleanup and some bugfixes (including an annoying NFSv4.0 state leak and a busy-loop in the server that could cause it to peg the CPU without making progress)." * 'for-3.7' of git://linux-nfs.org/~bfields/linux: (46 commits) UAPI: (Scripted) Disintegrate include/linux/sunrpc UAPI: (Scripted) Disintegrate include/linux/nfsd nfsd4: don't allow reclaims of expired clients nfsd4: remove redundant callback probe nfsd4: expire old client earlier nfsd4: separate session allocation and initialization nfsd4: clean up session allocation nfsd4: minor free_session cleanup nfsd4: new_conn_from_crses should only allocate nfsd4: separate connection allocation and initialization nfsd4: reject bad forechannel attrs earlier nfsd4: enforce per-client sessions/no-sessions distinction nfsd4: set cl_minorversion at create time nfsd4: don't pin clientids to pseudoflavors nfsd4: fix bind_conn_to_session xdr comment nfsd4: cast readlink() bug argument NFSD: pass null terminated buf to kstrtouint() nfsd: remove duplicate init in nfsd4_cb_recall nfsd4: eliminate redundant nfs4_free_stateid fs/nfsd/nfs4idmap.c: adjust inconsistent IS_ERR and PTR_ERR ...
2012-10-10Merge tag 'nfs-for-3.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds1-3/+41
Pull NFS client updates from Trond Myklebust: "Features include: - Remove CONFIG_EXPERIMENTAL dependency from NFSv4.1 Aside from the issues discussed at the LKS, distros are shipping NFSv4.1 with all the trimmings. - Fix fdatasync()/fsync() for the corner case of a server reboot. - NFSv4 OPEN access fix: finally distinguish correctly between open-for-read and open-for-execute permissions in all situations. - Ensure that the TCP socket is closed when we're in CLOSE_WAIT - More idmapper bugfixes - Lots of pNFS bugfixes and cleanups to remove unnecessary state and make the code easier to read. - In cases where a pNFS read or write fails, allow the client to resume trying layoutgets after two minutes of read/write- through-mds. - More net namespace fixes to the NFSv4 callback code. - More net namespace fixes to the NFSv3 locking code. - More NFSv4 migration preparatory patches. Including patches to detect network trunking in both NFSv4 and NFSv4.1 - pNFS block updates to optimise LAYOUTGET calls." * tag 'nfs-for-3.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (113 commits) pnfsblock: cleanup nfs4_blkdev_get NFS41: send real read size in layoutget NFS41: send real write size in layoutget NFS: track direct IO left bytes NFSv4.1: Cleanup ugliness in pnfs_layoutgets_blocked() NFSv4.1: Ensure that the layout sequence id stays 'close' to the current NFSv4.1: Deal with seqid wraparound in the pNFS return-on-close code NFSv4 set open access operation call flag in nfs4_init_opendata_res NFSv4.1: Remove the dependency on CONFIG_EXPERIMENTAL NFSv4 reduce attribute requests for open reclaim NFSv4: nfs4_open_done first must check that GETATTR decoded a file type NFSv4.1: Deal with wraparound when updating the layout "barrier" seqid NFSv4.1: Deal with wraparound issues when updating the layout stateid NFSv4.1: Always set the layout stateid if this is the first layoutget NFSv4.1: Fix another refcount issue in pnfs_find_alloc_layout NFSv4: don't put ACCESS in OPEN compound if O_EXCL NFSv4: don't check MAY_WRITE access bit in OPEN NFS: Set key construction data for the legacy upcall NFSv4.1: don't do two EXCHANGE_IDs on mount NFS: nfs41_walk_client_list(): re-lock before iterating ...
2012-10-09nfs: disintegrate UAPI for nfsJ. Bruce Fields6-25/+51
This is to complete part of the Userspace API (UAPI) disintegration for which the preparatory patches were pulled recently. After these patches, userspace headers will be segregated into: include/uapi/linux/.../foo.h for the userspace interface stuff, and: include/linux/.../foo.h for the strictly kernel internal stuff. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-10-09oom: remove deprecated oom_adjDavidlohr Bueso1-18/+4
The deprecated /proc/<pid>/oom_adj is scheduled for removal this month. Signed-off-by: Davidlohr Bueso <dave@gnu.org> Acked-by: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-08Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4Linus Torvalds1-0/+10
Pull ext4 updates from Ted Ts'o: "The big new feature added this time is supporting online resizing using the meta_bg feature. This allows us to resize file systems which are greater than 16TB. In addition, the speed of online resizing has been improved in general. We also fix a number of races, some of which could lead to deadlocks, in ext4's Asynchronous I/O and online defrag support, thanks to good work by Dmitry Monakhov. There are also a large number of more minor bug fixes and cleanups from a number of other ext4 contributors, quite of few of which have submitted fixes for the first time." * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (69 commits) ext4: fix ext4_flush_completed_IO wait semantics ext4: fix mtime update in nodelalloc mode ext4: fix ext_remove_space for punch_hole case ext4: punch_hole should wait for DIO writers ext4: serialize truncate with owerwrite DIO workers ext4: endless truncate due to nonlocked dio readers ext4: serialize unlocked dio reads with truncate ext4: serialize dio nonlocked reads with defrag workers ext4: completed_io locking cleanup ext4: fix unwritten counter leakage ext4: give i_aiodio_unwritten a more appropriate name ext4: ext4_inode_info diet ext4: convert to use leXX_add_cpu() ext4: ext4_bread usage audit fs: reserve fallocate flag codepoint ext4: remove redundant offset check in mext_check_arguments() ext4: don't clear orphan list on ro mount with errors jbd2: fix assertion failure in commit code due to lacking transaction credits ext4: release donor reference when EXT4_IOC_MOVE_EXT ioctl fails ext4: enable FITRIM ioctl on bigalloc file system ...
2012-10-03Merge tag 'jfs-3.7' of git://github.com/kleikamp/linux-shaggyLinus Torvalds1-4/+15
Pull JFS update from Dave Kleikamp: "JFS TRIM support and some minor fixes" * tag 'jfs-3.7' of git://github.com/kleikamp/linux-shaggy: jfs: Fix do_div precision in commit b40c2e66 JFS: use list_move instead of list_del/list_add jfs: Remove obsolete email address fs/jfs: TRIM support for JFS Filesystem
2012-10-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds1-1/+9
Pull networking changes from David Miller: 1) GRE now works over ipv6, from Dmitry Kozlov. 2) Make SCTP more network namespace aware, from Eric Biederman. 3) TEAM driver now works with non-ethernet devices, from Jiri Pirko. 4) Make openvswitch network namespace aware, from Pravin B Shelar. 5) IPV6 NAT implementation, from Patrick McHardy. 6) Server side support for TCP Fast Open, from Jerry Chu and others. 7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel Borkmann. 8) Increate the loopback default MTU to 64K, from Eric Dumazet. 9) Use a per-task rather than per-socket page fragment allocator for outgoing networking traffic. This benefits processes that have very many mostly idle sockets, which is quite common. From Eric Dumazet. 10) Use up to 32K for page fragment allocations, with fallbacks to smaller sizes when higher order page allocations fail. Benefits are a) less segments for driver to process b) less calls to page allocator c) less waste of space. From Eric Dumazet. 11) Allow GRO to be used on GRE tunnels, from Eric Dumazet. 12) VXLAN device driver, one way to handle VLAN issues such as the limitation of 4096 VLAN IDs yet still have some level of isolation. From Stephen Hemminger. 13) As usual there is a large boatload of driver changes, with the scale perhaps tilted towards the wireless side this time around. Fix up various fairly trivial conflicts, mostly caused by the user namespace changes. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits) hyperv: Add buffer for extended info after the RNDIS response message. hyperv: Report actual status in receive completion packet hyperv: Remove extra allocated space for recv_pkt_list elements hyperv: Fix page buffer handling in rndis_filter_send_request() hyperv: Fix the missing return value in rndis_filter_set_packet_filter() hyperv: Fix the max_xfer_size in RNDIS initialization vxlan: put UDP socket in correct namespace vxlan: Depend on CONFIG_INET sfc: Fix the reported priorities of different filter types sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP sfc: Fix loopback self-test with separate_tx_channels=1 sfc: Fix MCDI structure field lookup sfc: Add parentheses around use of bitfield macro arguments sfc: Fix null function pointer in efx_sriov_channel_type vxlan: virtual extensible lan igmp: export symbol ip_mc_leave_group netlink: add attributes to fdb interface tg3: unconditionally select HWMON support when tg3 is enabled. Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT" gre: fix sparse warning ...
2012-10-01NFS: Add nfs4_unique_id boot parameterChuck Lever1-3/+41
An optional boot parameter is introduced to allow client administrators to specify a string that the Linux NFS client can insert into its nfs_client_id4 id string, to make it both more globally unique, and to ensure that it doesn't change even if the client's nodename changes. If this boot parameter is not specified, the client's nodename is used, as before. Client installation procedures can create a unique string (typically, a UUID) which remains unchanged during the lifetime of that client instance. This works just like creating a UUID for the label of the system's root and boot volumes. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-21ipconfig: add nameserver IPs to kernel-parameter ip=Christoph Fritz1-1/+9
On small systems (e.g. embedded ones) IP addresses are often configured by bootloaders and get assigned to kernel via parameter "ip=". If set to "ip=dhcp", even nameserver entries from DHCP daemons are handled. These entries exported in /proc/net/pnp are commonly linked by /etc/resolv.conf. To configure nameservers for networks without DHCP, this patch adds option <dns0-ip> and <dns1-ip> to kernel-parameter 'ip='. Signed-off-by: Christoph Fritz <chf.fritz@googlemail.com> Tested-by: Jan Weitzel <j.weitzel@phytec.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-17jfs: Remove obsolete email addressDave Kleikamp1-2/+0
The MAINTAINERS file suffices. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2012-09-17fs/jfs: TRIM support for JFS FilesystemTino Reichardt1-3/+16
This patch adds support for the two linux interfaces of the discard/TRIM command for SSD devices and sparse/thinly-provisioned LUNs. JFS will support batched discard via FITRIM ioctl and online discard with the discard mount option. Signed-off-by: Tino Reichardt <list-jfs@mcmilk.de> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2012-08-27debugfs: more tightly restrict default mount modeKees Cook1-2/+2
Since the debugfs is mostly only used by root, make the default mount mode 0700. Most system owners do not need a more permissive value, but they can choose to weaken the restrictions via their fstab. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-21Documentation: update mount option in filesystem/vfat.txtNamjae Jeon1-0/+11
Update two mount options(discard, nfs) in vfat.txt. Signed-off-by: Namjae Jeon <linkinjeon@gmail.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-08-21nfsd: document kernel interfaces for nfsd configurationJ. Bruce Fields1-0/+41
These are only needed by nfs-utils. But I needed to remind myself how they worked recently and thought this might be helpful. It's short and incomplete for now as I was only interested in startup, shutdown, and configuration of listening sockets. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-17ext4: add max_dir_size_kb mount optionTheodore Ts'o1-0/+10
Very large directories can cause significant performance problems, or perhaps even invoke the OOM killer, if the process is running in a highly constrained memory environment (whether it is VM's with a small amount of memory or in a small memory cgroup). So it is useful, in cloud server/data center environments, to be able to set a filesystem-wide cap on the maximum size of a directory, to ensure that directories never get larger than a sane size. We do this via a new mount option, max_dir_size_kb. If there is an attempt to grow the directory larger than max_dir_size_kb, the system call will return ENOSPC instead. Google-Bug-Id: 6863013 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-08-04Documentation: get rid of write_superArtem Bityutskiy3-9/+2
The '->write_super' superblock method is gone, and this patch removes all the references to 'write_super' from various pieces of the kernel documentation. Cc: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-08-01Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-2/+2
Pull second vfs pile from Al Viro: "The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the deadlock reproduced by xfstests 068), symlink and hardlink restriction patches, plus assorted cleanups and fixes. Note that another fsfreeze deadlock (emergency thaw one) is *not* dealt with - the series by Fernando conflicts a lot with Jan's, breaks userland ABI (FIFREEZE semantics gets changed) and trades the deadlock for massive vfsmount leak; this is going to be handled next cycle. There probably will be another pull request, but that stuff won't be in it." Fix up trivial conflicts due to unrelated changes next to each other in drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c} * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits) delousing target_core_file a bit Documentation: Correct s_umount state for freeze_fs/unfreeze_fs fs: Remove old freezing mechanism ext2: Implement freezing btrfs: Convert to new freezing mechanism nilfs2: Convert to new freezing mechanism ntfs: Convert to new freezing mechanism fuse: Convert to new freezing mechanism gfs2: Convert to new freezing mechanism ocfs2: Convert to new freezing mechanism xfs: Convert to new freezing code ext4: Convert to new freezing mechanism fs: Protect write paths by sb_start_write - sb_end_write fs: Skip atime update on frozen filesystem fs: Add freezing handling to mnt_want_write() / mnt_drop_write() fs: Improve filesystem freezing handling switch the protection of percpu_counter list to spinlock nfsd: Push mnt_want_write() outside of i_mutex btrfs: Push mnt_want_write() outside of i_mutex fat: Push mnt_want_write() outside of i_mutex ...
2012-08-01locks: remove unused lm_release_privateJ. Bruce Fields1-2/+0
In commit 3b6e2723f32d ("locks: prevent side-effects of locks_release_private before file_lock is initialized") we removed the last user of lm_release_private without removing the field itself. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31mm: add support for a filesystem to activate swap files and use direct_IO for writing swap pagesMel Gorman2-0/+25
Currently swapfiles are managed entirely by the core VM by using ->bmap to allocate space and write to the blocks directly. This effectively ensures that the underlying blocks are allocated and avoids the need for the swap subsystem to locate what physical blocks store offsets within a file. If the swap subsystem is to use the filesystem information to locate the blocks, it is critical that information such as block groups, block bitmaps and the block descriptor table that map the swap file were resident in memory. This patch adds address_space_operations that the VM can call when activating or deactivating swap backed by a file. int swap_activate(struct file *); int swap_deactivate(struct file *); The ->swap_activate() method is used to communicate to the file that the VM relies on it, and the address_space should take adequate measures such as reserving space in the underlying device, reserving memory for mempools and pinning information such as the block descriptor table in memory. The ->swap_deactivate() method is called on sys_swapoff() if ->swap_activate() returned success. After a successful swapfile ->swap_activate, the swapfile is marked SWP_FILE and swapper_space.a_ops will proxy to sis->swap_file->f_mappings->a_ops using ->direct_io to write swapcache pages and ->readpage to read. It is perfectly possible that direct_IO be used to read the swap pages but it is an unnecessary complication. Similarly, it is possible that ->writepage be used instead of direct_io to write the pages but filesystem developers have stated that calling writepage from the VM is undesirable for a variety of reasons and using direct_IO opens up the possibility of writing back batches of swap pages in the future. [a.p.zijlstra@chello.nl: Original patch] Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Neil Brown <neilb@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31Documentation: Correct s_umount state for freeze_fs/unfreeze_fsValerie Aurora1-2/+2
freeze_fs/unfreeze_fs ops are called with s_umount held for write, not read. Signed-off-by: Valerie Aurora <val@vaaconsulting.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14don't pass nameidata to ->create()Al Viro3-2/+8
boolean "does it have to be exclusive?" flag is passed instead; Local filesystem should just ignore it - the object is guaranteed not to be there yet. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14stop passing nameidata to ->lookup()Al Viro3-5/+4
Just the flags; only NFS cares even about that, but there are legitimate uses for such argument. And getting rid of that completely would require splitting ->lookup() into a couple of methods (at least), so let's leave that alone for now... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14stop passing nameidata * to ->d_revalidate()Al Viro3-5/+10
Just the lookup flags. Die, bastard, die... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14kill struct opendataAl Viro2-2/+2
Just pass struct file *. Methods are happier that way... There's no need to return struct file * from finish_open() now, so let it return int. Next: saner prototypes for parts in namei.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14make ->atomic_open() return intAl Viro2-4/+4
Change of calling conventions: old new NULL 1 file 0 ERR_PTR(-ve) -ve Caller *knows* that struct file *; no need to return it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14->atomic_open() prototype change - pass int * instead of bool *Al Viro2-2/+2
... and let finish_open() report having opened the file via that sucker. Next step: don't modify od->filp at all. [AV: FILE_CREATE was already used by cifs; Miklos' fix folded] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14vfs: add i_op->atomic_open()Miklos Szeredi2-0/+15
Add a new inode operation which is called on the last component of an open. Using this the filesystem can look up, possibly create and open the file in one atomic operation. If it cannot perform this (e.g. the file type turned out to be wrong) it may signal this by returning NULL instead of an open struct file pointer. i_op->atomic_open() is only called if the last component is negative or needs lookup. Handling cached positive dentries here doesn't add much value: these can be opened using f_op->open(). If the cached file turns out to be invalid, the open can be retried, this time using ->atomic_open() with a fresh dentry. For now leave the old way of using open intents in lookup and revalidate in place. This will be removed once all the users are converted. David Howells noticed that if ->atomic_open() opens the file but does not create it, handle_truncate() will be called on it even if it is not a regular file. Fix this by checking the file type in this case too. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14vfs: update documentation on ->i_dentry handlingAl Viro1-6/+4
we used to need to clean it in RCU callback freeing an inode; in 3.2 that requirement went away. Unfortunately, it hadn't been reflected in Documentation/filesystems/porting. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds2-0/+7
Pull vfs changes from Al Viro. "A lot of misc stuff. The obvious groups: * Miklos' atomic_open series; kills the damn abuse of ->d_revalidate() by NFS, which was the major stumbling block for all work in that area. * ripping security_file_mmap() and dealing with deadlocks in the area; sanitizing the neighborhood of vm_mmap()/vm_munmap() in general. * ->encode_fh() switched to saner API; insane fake dentry in mm/cleancache.c gone. * assorted annotations in fs (endianness, __user) * parts of Artem's ->s_dirty work (jff2 and reiserfs parts) * ->update_time() work from Josef. * other bits and pieces all over the place. Normally it would've been in two or three pull requests, but signal.git stuff had eaten a lot of time during this cycle ;-/" Fix up trivial conflicts in Documentation/filesystems/vfs.txt (the 'truncate_range' inode method was removed by the VM changes, the VFS update adds an 'update_time()' method), and in fs/btrfs/ulist.[ch] (due to sparse fix added twice, with other changes nearby). * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (95 commits) nfs: don't open in ->d_revalidate vfs: retry last component if opening stale dentry vfs: nameidata_to_filp(): don't throw away file on error vfs: nameidata_to_filp(): inline __dentry_open() vfs: do_dentry_open(): don't put filp vfs: split __dentry_open() vfs: do_last() common post lookup vfs: do_last(): add audit_inode before open vfs: do_last(): only return EISDIR for O_CREAT vfs: do_last(): check LOOKUP_DIRECTORY vfs: do_last(): make ENOENT exit RCU safe vfs: make follow_link check RCU safe vfs: do_last(): use inode variable vfs: do_last(): inline walk_component() vfs: do_last(): make exit RCU safe vfs: split do_lookup() Btrfs: move over to use ->update_time fs: introduce inode operation ->update_time reiserfs: get rid of resierfs_sync_super reiserfs: mark the superblock as dirty a bit later ...
2012-06-01fs: introduce inode operation ->update_timeJosef Bacik2-0/+7
Btrfs has to make sure we have space to allocate new blocks in order to modify the inode, so updating time can fail. We've gotten around this by having our own file_update_time but this is kind of a pain, and Christoph has indicated he would like to make xfs do something different with atime updates. So introduce ->update_time, where we will deal with i_version an a/m/c time updates and indicate which changes need to be made. The normal version just does what it has always done, updates the time and marks the inode dirty, and then filesystems can choose to do something different. I've gone through all of the users of file_update_time and made them check for errors with the exception of the fault code since it's complicated and I wasn't quite sure what to do there, also Jan is going to be pushing the file time updates into page_mkwrite for those who have it so that should satisfy btrfs and make it not a big deal to check the file_update_time() return code in the generic fault path. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-31c/r: procfs: add arg_start/end, env_start/end and exit_code members to /proc/$pid/statCyrill Gorcunov1-0/+5
We would like to have an ability to restore command line arguments and program environment pointers but first we need to obtain them somehow. Thus we put these values into /proc/$pid/stat. The exit_code is needed to restore zombie tasks. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Kees Cook <keescook@chromium.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Vagin <avagin@openvz.org> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31fs, proc: introduce /proc/<pid>/task/<tid>/children entryCyrill Gorcunov1-0/+18
When we do checkpoint of a task we need to know the list of children the task, has but there is no easy and fast way to generate reverse parent->children chain from arbitrary <pid> (while a parent pid is provided in "PPid" field of /proc/<pid>/status). So instead of walking over all pids in the system (creating one big process tree in memory, just to figure out which children a task has) -- we add explicit /proc/<pid>/task/<tid>/children entry, because the kernel already has this kind of information but it is not yet exported. This is a first level children, not the whole process tree. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29mm: document the meminfo and vmstat fields of relevance to transparent hugepagesMel Gorman1-0/+2
Update Documentation/vm/transhuge.txt and Documentation/filesystems/proc.txt with some information on monitoring transparent huge page usage and the associated overhead. Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29mm/fs: remove truncate_rangeHugh Dickins2-7/+8
Remove vmtruncate_range(), and remove the truncate_range method from struct inode_operations: only tmpfs ever supported it, and tmpfs has now converted over to using the fallocate method of file_operations. Update Documentation accordingly, adding (setlease and) fallocate lines. And while we're in mm.h, remove duplicate declarations of shmem_lock() and shmem_file_setup(): everyone is now using the ones in shmem_fs.h. Based-on-patch-by: Cong Wang <amwang@redhat.com> Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Cong Wang <amwang@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-28Merge tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linuxLinus Torvalds1-9/+7
Pull writeback tree from Wu Fengguang: "Mainly from Jan Kara to avoid iput() in the flusher threads." * tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux: writeback: Avoid iput() from flusher thread vfs: Rename end_writeback() to clear_inode() vfs: Move waiting for inode writeback from end_writeback() to evict_inode() writeback: Refactor writeback_single_inode() writeback: Remove wb->list_lock from writeback_single_inode() writeback: Separate inode requeueing after writeback writeback: Move I_DIRTY_PAGES handling writeback: Move requeueing when I_SYNC set to writeback_sb_inodes() writeback: Move clearing of I_SYNC into inode_sync_complete() writeback: initialize global_dirty_limit fs: remove 8 bytes of padding from struct writeback_control on 64 bit builds mm: page-writeback.c: local functions should not be exposed globally
2012-05-25Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fsLinus Torvalds1-3/+3
Pull ext2, ext3 and quota fixes from Jan Kara: "Interesting bits are: - removal of a special i_mutex locking subclass (I_MUTEX_QUOTA) since quota code does not need i_mutex anymore in any unusual way. - backport (from ext4) of a fix of a checkpointing bug (missing cache flush) that could lead to fs corruption on power failure The rest are just random small fixes & cleanups." * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2: trivial fix to comment for ext2_free_blocks ext2: remove the redundant comment for ext2_export_ops ext3: return 32/64-bit dir name hash according to usage type quota: Get rid of nested I_MUTEX_QUOTA locking subclass quota: Use precomputed value of sb_dqopt in dquot_quota_sync ext2: Remove i_mutex use from ext2_quota_write() reiserfs: Remove i_mutex use from reiserfs_quota_write() ext4: Remove i_mutex use from ext4_quota_write() ext3: Remove i_mutex use from ext3_quota_write() quota: Fix double lock in add_dquot_ref() with CONFIG_QUOTA_DEBUG jbd: Write journal superblock with WRITE_FUA after checkpointing jbd: protect all log tail updates with j_checkpoint_mutex jbd: Split updating of journal superblock and marking journal empty ext2: do not register write_super within VFS ext2: Remove s_dirt handling ext2: write superblock only once on unmount ext3: update documentation with barrier=1 default ext3: remove max_debt in find_group_orlov() jbd: Refine commit writeout logic
2012-05-22Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivialLinus Torvalds2-15/+15
Pull trivial updates from Jiri Kosina: "As usual, it's mostly typo fixes, redundant code elimination and some documentation updates." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (57 commits) edac, mips: don't change code that has been removed in edac/mips tree xtensa: Change mail addresses of Hannes Weiner and Oskar Schirmer lib: Change mail address of Oskar Schirmer net: Change mail address of Oskar Schirmer arm/m68k: Change mail address of Sebastian Hess i2c: Change mail address of Oskar Schirmer net: Fix tcp_build_and_update_options comment in struct tcp_sock atomic64_32.h: fix parameter naming mismatch Kconfig: replace "--- help ---" with "---help---" c2port: fix bogus Kconfig "default no" edac: Fix spelling errors. qla1280: Remove redundant NULL check before release_firmware() call remoteproc: remove redundant NULL check before release_firmware() qla2xxx: Remove redundant NULL check before release_firmware() call. aic94xx: Get rid of redundant NULL check before release_firmware() call tehuti: delete redundant NULL check before release_firmware() qlogic: get rid of a redundant test for NULL before call to release_firmware() bna: remove redundant NULL test before release_firmware() tg3: remove redundant NULL test before release_firmware() call typhoon: get rid of redundant conditional before all to release_firmware() ...
2012-05-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmwLinus Torvalds2-6/+122
Pull GFS2 changes from Steven Whitehouse. * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: (24 commits) GFS2: Fix quota adjustment return code GFS2: Add rgrp information to block_alloc trace point GFS2: Eliminate unused "new" parameter to gfs2_meta_indirect_buffer GFS2: Update glock doc to add new stats info GFS2: Update main gfs2 doc GFS2: Remove redundant metadata block type check GFS2: Fix sgid propagation when using ACLs GFS2: eliminate log elements and simplify GFS2: Eliminate vestigial sd_log_le_rg GFS2: Eliminate needless parameter from function gfs2_setbit GFS2: Log code fixes GFS2: Remove unused argument from gfs2_internal_read GFS2: Remove bd_list_tr GFS2: Remove duplicate log code GFS2: Clean up log write code path GFS2: Use variable rather than qa to determine if unstuff necessary GFS2: Change variable blk to biblk GFS2: Fix function parameter comments in rgrp.c GFS2: Eliminate offset parameter to gfs2_setbit GFS2: Use slab for block reservation memory ...