linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2010-10-25	new helper: ihold()	Al Viro	40	-49/+57
	Clones an existing reference to inode; caller must already hold one. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25	fs: remove inode_add_to_list/__inode_add_to_list	Christoph Hellwig	3	-41/+38
	Split up inode_add_to_list/__inode_add_to_list. Locking for the two lists will be split soon so these helpers really don't buy us much anymore. The __ prefixes for the sb list helpers will go away soon, but until inode_lock is gone we'll need them to distinguish between the locked and unlocked variants. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25	fs: move i_count increments into find_inode/find_inode_fast	Christoph Hellwig	1	-11/+6
	Now that iunique is not abusing find_inode anymore we can move the i_ref increment back to where it belongs. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25	fs: Stop abusing find_inode_fast in iunique	Christoph Hellwig	1	-5/+25
	Stop abusing find_inode_fast for iunique and opencode the inode hash walk. Introduce a new iunique_lock to protect the iunique counters once inode_lock is removed. Based on a patch originally from Nick Piggin. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25	fs: Factor inode hash operations into functions	Dave Chinner	1	-45/+55
	Before replacing the inode hash locking with a more scalable mechanism, factor the removal of the inode from the hashes rather than open coding it in several places. Based on a patch originally from Nick Piggin. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25	fs: Implement lazy LRU updates for inodes	Nick Piggin	4	-41/+71
	Convert the inode LRU to use lazy updates to reduce lock and cacheline traffic. We avoid moving inodes around in the LRU list during iget/iput operations so these frequent operations don't need to access the LRUs. Instead, we defer the refcount checks to reclaim-time and use a per-inode state flag, I_REFERENCED, to tell reclaim that iget has touched the inode in the past. This means that only reclaim should be touching the LRU with any frequency, hence significantly reducing lock acquisitions and the amount contention on LRU updates. This also removes the inode_in_use list, which means we now only have one list for tracking the inode LRU status. This makes it much simpler to split out the LRU list operations under it's own lock. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25	fs: Convert nr_inodes and nr_unused to per-cpu counters	Dave Chinner	5	-25/+52
	The number of inodes allocated does not need to be tied to the addition or removal of an inode to/from a list. If we are not tied to a list lock, we could update the counters when inodes are initialised or destroyed, but to do that we need to convert the counters to be per-cpu (i.e. independent of a lock). This means that we have the freedom to change the list/locking implementation without needing to care about the counters. Based on a patch originally from Eric Dumazet. [AV: cleaned up a bit, fixed build breakage on weird configs Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25	vfs: fix infinite loop caused by clone_mnt race	Miklos Szeredi	1	-1/+1
	If clone_mnt() happens while mnt_make_readonly() is running, the cloned mount might have MNT_WRITE_HOLD flag set, which results in mnt_want_write() spinning forever on this mount. Needs CAP_SYS_ADMIN to trigger deliberately and unlikely to happen accidentally. But if it does happen it can hang the machine. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25	switch hfs to hlist_add_fake()	Al Viro	3	-4/+1
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25	list.h: new helper - hlist_add_fake()	Al Viro	3	-2/+8
	Make node look as if it was on hlist, with hlist_del() working correctly. Usable without any locking... Convert a couple of places where we want to do that to inode->i_hash. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25	new helper: inode_unhashed()	Al Viro	6	-8/+13
	note: for race-free uses you inode_lock held Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25	unexport invalidate_inodes	Al Viro	3	-2/+5
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25	smbfs never retains inodes with zero refcount in the first place	Al Viro	1	-1/+0
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25	ntfs: don't call invalidate_inodes()	Al Viro	1	-15/+0
	We are in fill_super(); again, no inodes with zero i_count could be around until we set MS_ACTIVE. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25	gfs2: invalidate_inodes() is no-op there	Al Viro	2	-2/+0
	In fill_super() we hadn't MS_ACTIVE set yet, so there won't be any inodes with zero i_count sitting around. In put_super() we already have MS_ACTIVE removed and we had called invalidate_inodes() since then. So again there won't be any inodes with zero i_count... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25	ext2_remount: don't bother with invalidate_inodes()	Al Viro	1	-3/+1
	It's pointless - we do have busy inodes (root directory, for one), so that call will fail and attempt to change XIP flag will be ignored. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25	fs/buffer.c: call __block_write_begin() if we have page	Namhyung Kim	1	-5/+4
	If we have the appropriate page already, call __block_write_begin() directly instead of releasing and regrabbing it inside of block_write_begin(). Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25	lockdep: fixup checking of dir inode annotation	Namhyung Kim	1	-1/+1
	Since inode->i_mode shares its bits for S_IFMT, S_ISDIR should be used to distinguish whether it is a dir or not. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25	aio: bump i_count instead of using igrab	Chris Mason	1	-1/+14
	The aio batching code is using igrab to get an extra reference on the inode so it can safely batch. igrab will go ahead and take the global inode spinlock, which can be a bottleneck on large machines doing lots of AIO. In this case, igrab isn't required because we already have a reference on the file handle. It is safe to just bump the i_count directly on the inode. Benchmarking shows this patch brings IOP/s on tons of flash up by about 2.5X. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-10-25	update block_device_operations documentation	Christoph Hellwig	1	-8/+23
	Updated Documentation/filesystems/Locking to match the code. Signed-off-by: Christoph Hellwig <hch@lst.de>
2010-10-25	fs/buffer.c: remove duplicated assignment on b_private	Namhyung Kim	1	-1/+0
	bh->b_private is initialized within init_buffer(), thus the assignment should be redundant. Remove it. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25	fs: move exportfs since it is not a networking filesystem	Randy Dunlap	1	-3/+3
	Move the EXPORTFS kconfig symbol out of the NETWORK_FILESYSTEMS block since it provides a library function that can be (and is) used by other (non-network) filesystems. This also eliminates a kconfig dependency warning: warning: (XFS_FS && BLOCK \|\| NFSD && NETWORK_FILESYSTEMS && INET && FILE_LOCKING && BKL) selects EXPORTFS which has unmet direct dependencies (NETWORK_FILESYSTEMS) Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Alex Elder <aelder@sgi.com> Cc: xfs-masters@oss.sgi.com Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25	hfs: use sync_dirty_buffer	Christoph Hellwig	2	-13/+2
	Use sync_dirty_buffer instead of the incorrect opencoding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25	vfs: introduce FMODE_UNSIGNED_OFFSET for allowing negative f_pos	KAMEZAWA Hiroyuki	4	-4/+33
	Now, rw_verify_area() checsk f_pos is negative or not. And if negative, returns -EINVAL. But, some special files as /dev/(k)mem and /proc/<pid>/mem etc.. has negative offsets. And we can't do any access via read/write to the file(device). So introduce FMODE_UNSIGNED_OFFSET to allow negative file offsets. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25	hostfs: fix UML crash: remove f_spare from hostfs	Richard Weinberger	3	-10/+4
	365b1818 ("add f_flags to struct statfs(64)") resized f_spare within struct statfs which caused a UML crash. There is no need to copy f_spare. Signed-off-by: Richard Weinberger <richard@nod.at> Reported-by: Toralf Förster <toralf.foerster@gmx.de> Tested-by: Toralf Förster <toralf.foerster@gmx.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25	Documentation: Fix trivial typo in filesystems/sharedsubtree.txt	Valerie Aurora	1	-2/+2
	Documentation: Fix trivial typo in filesystems/sharedsubtree.txt This typo is easy to ignore unless you have spent a great deal of time thinking about how to eliminate duplicate dentries in unions. Signed-off-by: Valerie Aurora <vaurora@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25	affs: testing the wrong variable	Dan Carpenter	1	-2/+2
	The intent was to verify that bh = affs_bread_ino(...) returned a valid pointer. We checked "ext_bh" earlier in the function and it's valid here. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25	fs: allow for more than 2^31 files	Eric Dumazet	4	-24/+21
	Andrew, Could you please review this patch, you probably are the right guy to take it, because it crosses fs and net trees. Note : /proc/sys/fs/file-nr is a read-only file, so this patch doesnt depend on previous patch (sysctl: fix min/max handling in __do_proc_doulongvec_minmax()) Thanks ! [PATCH V4] fs: allow for more than 2^31 files Robin Holt tried to boot a 16TB system and found af_unix was overflowing a 32bit value : <quote> We were seeing a failure which prevented boot. The kernel was incapable of creating either a named pipe or unix domain socket. This comes down to a common kernel function called unix_create1() which does: atomic_inc(&unix_nr_socks); if (atomic_read(&unix_nr_socks) > 2 * get_max_files()) goto out; The function get_max_files() is a simple return of files_stat.max_files. files_stat.max_files is a signed integer and is computed in fs/file_table.c's files_init(). n = (mempages * (PAGE_SIZE / 1024)) / 10; files_stat.max_files = n; In our case, mempages (total_ram_pages) is approx 3,758,096,384 (0xe0000000). That leaves max_files at approximately 1,503,238,553. This causes 2 * get_max_files() to integer overflow. </quote> Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long integers, and change af_unix to use an atomic_long_t instead of atomic_t. get_max_files() is changed to return an unsigned long. get_nr_files() is changed to return a long. unix_nr_socks is changed from atomic_t to atomic_long_t, while not strictly needed to address Robin problem. Before patch (on a 64bit kernel) : # echo 2147483648 >/proc/sys/fs/file-max # cat /proc/sys/fs/file-max -18446744071562067968 After patch: # echo 2147483648 >/proc/sys/fs/file-max # cat /proc/sys/fs/file-max 2147483648 # cat /proc/sys/fs/file-nr 704 0 2147483648 Reported-by: Robin Holt <holt@sgi.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David Miller <davem@davemloft.net> Reviewed-by: Robin Holt <holt@sgi.com> Tested-by: Robin Holt <holt@sgi.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25	isofs: Fix isofs_get_blocks for 8TB files	Jan Kara	1	-9/+8
	Currently isofs_get_blocks() was limited to handle only 4TB files on 32-bit architectures because of unnecessary use of iblock variable which was signed long. Just remove the variable. The error messages that were using this variable should have rather used b_off anyway because that is the block we are currently mapping. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25	fs: kill block_prepare_write	Christoph Hellwig	14	-73/+39
	__block_write_begin and block_prepare_write are identical except for slightly different calling conventions. Convert all callers to the __block_write_begin calling conventions and drop block_prepare_write. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25	fs: mark destroy_inode static	Christoph Hellwig	2	-2/+1
	Hugetlbfs used to need it, but after the destroy_inode and evict_inode changes it's not required anymore. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25	fs: add sync_inode_metadata	Christoph Hellwig	10	-41/+30
	Add a new helper to write out the inode using the writeback code, that is including the correct dirty bit and list manipulation. A few of filesystems already opencode this, and a lot of others should be using it instead of using write_inode_now which also writes out the data. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25	fs: move permission check back into __lookup_hash	Christoph Hellwig	1	-10/+4
	The caller that didn't need it is gone. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25	x86-32, mm: Remove duplicated include	Borislav Petkov	1	-1/+0
	Commit b40827fa7268 ("x86-32, mm: Add an initial page table for core bootstrapping") added an include directive which is needless and is taken care of by a previous one. Remove it. Caught-by: Jaswinder Singh Rajput <jaswinderlinux@gmail.com> Signed-off-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-25	exofs: Remove inode->i_count manipulation in exofs_new_inode	Boaz Harrosh	1	-19/+9
	exofs_new_inode() was incrementing the inode->i_count and decrementing it in create_done(), in a bad attempt to make sure the inode will still be there when the asynchronous create_done() finally arrives. This was very stupid because iput() was not called, and if it was actually needed, it would leak the inode. However all this is not needed, because at exofs_evict_inode() we already wait for create_done() by waiting for the object_created event. Therefore remove the superfluous ref counting and just Thicken the comment at exofs_evict_inode() a bit. While at it change places that open coded wait_obj_created() to call the already available wrapper. CC: Dave Chinner <dchinner@redhat.com> CC: Christoph Hellwig <hch@lst.de> CC: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2010-10-25	fs/exofs: typo fix of faild to failed	Joe Perches	3	-14/+14
	Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2010-10-25	MIPS: MT: Fix build error iFPU affinity code	Ralf Baechle	1	-1/+1
	Commit b0ae19811375 ("security: remove unused parameter from security_task_setscheduler()") broke the build of arch/mips/kernel/mips-mt-fpaff.c. The function arguments were unnecessary, not the semicolon ... Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-25	Coda: replace BKL with mutex	Yoshihisa Abe	4	-50/+70
	Replace the BKL with a mutex to protect the venus_comm structure which binds the mountpoint with the character device and holds the upcall queues. Signed-off-by: Yoshihisa Abe <yoshiabe@cs.cmu.edu> Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-25	Coda: push BKL regions into coda_upcall()	Yoshihisa Abe	8	-190/+96
	Now that shared inode state is locked using the cii->c_lock, the BKL is only used to protect the upcall queues used to communicate with the userspace cache manager. The remaining state is all local and we can push the lock further down into coda_upcall(). Signed-off-by: Yoshihisa Abe <yoshiabe@cs.cmu.edu> Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-25	Coda: add spin lock to protect accesses to struct coda_inode_info.	Yoshihisa Abe	7	-17/+58
	We mostly need it to protect cached user permissions. The c_flags field is advisory, reading the wrong value is harmless and in the worst case we hit a slow path where we have to make an extra upcall to the userspace cache manager when revalidating a dentry or inode. Signed-off-by: Yoshihisa Abe <yoshiabe@cs.cmu.edu> Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-25	[S390] topology: export cpu topology via proc/sysinfo	Heiko Carstens	3	-3/+36
	Export the cpu configuration topology via sysinfo. Two new lines are introduced: CPU Topology HW: 0 0 0 4 6 4 CPU Topology SW: 0 0 0 0 4 24 The HW line describes the cpu topology nesting levels when the maximum nesting level is used to get the corresponding SYSIB. The SW line describes what Linux is actually using. In this case it supports only two levels (CONFIG_SCHED_BOOK off) and therefore the hardware folded the two lower levels in the SYSIB response block. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-10-25	[S390] topology: move topology sysinfo code	Heiko Carstens	2	-54/+56
	Move the topology sysinfo SYSIB definitions to the proper place in asm/sysinfo.h where they should be. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-10-25	[S390] topology: clean up facility detection	Heiko Carstens	3	-9/+10
	Move cpu topology facility detection to early setup code where it should be. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-10-25	[S390] cleanup facility list handling	Martin Schwidefsky	12	-74/+66
	Store the facility list once at system startup with stfl/stfle and reuse the result for all facility tests. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-10-25	[S390] enable ARCH_DMA_ADDR_T_64BIT with 64BIT	FUJITA Tomonori	1	-0/+3
	Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-10-25	[S390] dasd: ignore unsolicited interrupts for DIAG	Stefan Haberland	1	-9/+14
	For the DASD DIAG discipline IO is started through special diagnose calls. Unsolicited interrupts may contain information about the device itself. But this information is not needed because the device is not used directly. Fix the case that an unimplemented dicipline function may be called by ignoring unsolicited interrupts for the DIAG disciplin. Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-10-25	[S390] kvm: Enable z196 instruction facilities	Christian Borntraeger	1	-1/+1
	Enable PFPO, floating point extension, distinct-operands, fast-BCR-serialization, high-word, interlocked-access, load/store- on-condition, and population-count facilities for guests. (bits 37, 44 and 45). Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-10-25	[S390] dasd: fix unsolicited interrupt recognition	Stefan Weinhuber	2	-41/+40
	The dasd interrupt handler needs to distinguish solicited from unsolicited interrupts, as unsolicited interrupts may require special handling (e.g. summary unit checks) and solicited interrupts require proper error recovery for the failed I/O request. The interrupt handler needs to check several bit fields in the interrupt response block (irb) to make this distinction. So far our check of the status control bits has not been specific enough, which may lead to a failed request getting just retried instead of the necessary error recovery. Signed-off-by: Stefan Weinhuber <wein@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-10-25	[S390] dasd: fix use after free in dbf	Sebastian Ott	1	-1/+0
	Writing to /proc/dasd/statistics while the debug level of the generic dasd debug entry is set to DBF_DEBUG will lead to an use after free when accessing the debug entry later. Since for the format string "%s" in the s390 dbf only a pointer to the string is stored in the debug feature and the buffer used here is freed afterwards. To fix this just remove the debug message. Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-10-25	[S390] kvm: Fix badness at include/asm/mmu_context.h:83	Christian Borntraeger	1	-0/+2
	commit 050eef364ad700590a605a0749f825cab4834b1e [S390] fix tlb flushing vs. concurrent /proc accesses broke KVM on s390x. On every schedule a Badness at include/asm/mmu_context.h:83 appears. s390_enable_sie replaces the mm on the __running__ task, therefore, we have to increase the attach count of the new mm. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>