linux-dev - Linux kernel development work

Age	Commit message	Author	Files	Lines
2011-01-07	sched: Fix struct autogroup memory leak	Mike Galbraith	1	-0/+1
	Seems I lost a change somewhere, leaking memory. sched: fix struct autogroup memory leak Add missing change to actually use autogroup_free(). Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1294222285.8369.2.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-07	sched: Mark autogroup_init() __init	Yong Zhang	1	-1/+1
	autogroup_init() is only called at boot time. Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1294375425-31065-1-git-send-email-yong.zhang0@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-07	sched: Consolidate the name of root_task_group and init_task_group	Yong Zhang	3	-26/+24
	root_task_group is the leftover of USER_SCHED; now it's always the same as init_task_group. But as Mike suggested, root_task_group may be the more suitable name to keep for a tree. So in this patch: init_task_group --> root_task_group, init_task_group_load --> root_task_group_load, INIT_TASK_GROUP_LOAD --> ROOT_TASK_GROUP_LOAD. Suggested-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20110107071736.GA32635@windriver.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-07	EDAC, MCE: Fix NB error formatting	Borislav Petkov	1	-7/+10
	Minor formatting fixup, since the information about which core was associated with the MCE is not always valid. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07	EDAC, MCE: Use BIT_64() to eliminate warnings on 32-bit	Randy Dunlap	1	-2/+2
	Building for X86_32 produces shift count warnings, so use BIT_64() to eliminate the warnings. drivers/edac/mce_amd.c:778: warning: left shift count >= width of type drivers/edac/mce_amd.c:778: warning: left shift count >= width of type Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: bluesmoke-devel@lists.sourceforge.net Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
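Why the warning fires: on X86_32, `long` is only 32 bits wide, so a shift such as `1UL << 40` exceeds the width of the type. A minimal userspace sketch of the fix, assuming a `BIT_64()` defined along the lines of "shift a 64-bit one"; the helper `status_bit_set()` is hypothetical and only shows how such a macro is used on a 64-bit MCi_STATUS value.

```c
#include <stdint.h>

/* Shift a 64-bit constant, so the result is well-defined for n in [0, 63]
 * regardless of how wide int/long are on the build target. */
#define BIT_64(n) ((uint64_t)1 << (n))

/* Hypothetical helper: test bit n of a 64-bit MCi_STATUS value. */
static inline int status_bit_set(uint64_t status, unsigned int n)
{
	return (status & BIT_64(n)) != 0;
}
```

Using a fixed-width 64-bit constant makes the expression mean the same thing on 32-bit and 64-bit builds, which is exactly what silences the shift-count warning.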
2011-01-07	EDAC, MCE: Enable MCE decoding on F15h	Borislav Petkov	1	-6/+8
	Now that everything is in place, enable MCE decoding on F15h. Make the initcall routine a bit more readable. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07	EDAC, MCE: Allow F15h bank 6 MCE injection	Borislav Petkov	1	-4/+5
	F15h adds a sixth MCE bank: adjust bank number check in the injection code. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07	EDAC, MCE: Shorten error report formatting	Borislav Petkov	1	-22/+32
	Shorten up MCi_STATUS flags and add BD's new deferred and poison types. Also, simplify formatting. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07	EDAC, MCE: Overhaul error fields extraction macros	Borislav Petkov	3	-54/+43
	Make macro names shorter, thus making the code shorter and clearer. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07	EDAC, MCE: Add F15h FP MCE decoder	Borislav Petkov	1	-0/+44
	Add decoder for FP MCEs. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07	EDAC, MCE: Add F15 EX MCE decoder	Borislav Petkov	1	-7/+34
	Integrate the single FIROB signature into an expanded table along with the new BD MCE types. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07	EDAC, MCE: Add an F15h NB MCE decoder	Borislav Petkov	1	-0/+10
	Do this by (almost) reusing the F10h one, since the signatures are the same. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07	EDAC, MCE: No F15h LS MCE decoder	Borislav Petkov	1	-1/+1
	F15h BD doesn't generate LS MCEs, so warn about it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07	EDAC, MCE: Add F15h CU MCE decoder	Borislav Petkov	1	-1/+61
	MCE bank 2 is redefined from a BU to a CU (Combined Unit) bank on F15h. Add a decoder function for CU MCEs. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07	EDAC, MCE: Add F15h IC MCE decoder	Borislav Petkov	2	-4/+51
	Add support for decoding F15h IC MCEs. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07	EDAC, MCE: Add F15h DC MCE decoder	Borislav Petkov	2	-19/+62
	Add a decoder for F15h DC MCEs to support the new types of DC MCEs introduced by the BD microarchitecture. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07	EDAC, MCE: Select extended error code mask	Borislav Petkov	1	-4/+9
	F15h enlarges the extended error code of an MCE to a 5-bit field (MCi_STATUS[20:16]). Add a mask variable whose default of 0xf is overridden on F15h. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
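The mask selection described above can be sketched as follows. The names `xec_mask`, `select_xec_mask()` and `extract_xec()` are hypothetical, and the choice of 0x1f on family 0x15 is an assumption derived from the stated 5-bit field, not the driver's code.

```c
#include <stdint.h>

/* The extended error code lives in MCi_STATUS[20:16]. Before F15h only the
 * low four bits of that field are used, hence the 0xf default. */
static unsigned int xec_mask = 0xf;

/* Hypothetical init hook: widen the mask to five bits on family 0x15. */
static void select_xec_mask(unsigned int family)
{
	xec_mask = (family == 0x15) ? 0x1f : 0xf;
}

/* Shift the field down and apply the per-family mask. */
static unsigned int extract_xec(uint64_t status)
{
	return (unsigned int)(status >> 16) & xec_mask;
}
```

Keeping the width in a variable lets every decoder share one extraction path, with only the init code aware of which family widened the field.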
2011-01-07	amd64_edac: Disable DRAM ECC injection on K8	Borislav Petkov	1	-2/+3
	K8 does not allow an atomic RMW to a cacheline as F10h does, so disable the error injection interface for it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07	EDAC: Fixup scrubrate manipulation	Borislav Petkov	7	-63/+52
	Make the ->{get|set}_sdram_scrub_rate methods return the actual scrub rate bandwidth they succeeded in setting, and remove the superfluous pointer argument used for that. A negative return value still means that an error occurred while setting the scrub rate. Document this for future reference. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
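The return-value convention (non-negative: bandwidth actually set; negative: error) can be illustrated with a small sketch. The rate table, its units and the "fastest rate not above the request" policy are all hypothetical, chosen only to show why the caller needs the actual value back.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical table of supported scrub bandwidths, fastest first. */
static const int scrub_rates[] = { 1600000, 800000, 400000, 100000, 0 };

/* Pick the fastest supported rate that does not exceed the request and
 * return it; a real driver would also program the hardware here.
 * A negative errno signals failure. */
static int set_sdram_scrub_rate(int requested_bw)
{
	size_t i;

	if (requested_bw < 0)
		return -EINVAL;

	for (i = 0; i < sizeof(scrub_rates) / sizeof(scrub_rates[0]); i++) {
		if (scrub_rates[i] <= requested_bw)
			return scrub_rates[i];	/* bandwidth actually set */
	}
	return -EINVAL;	/* not reached: the table ends with 0 */
}
```

A caller then only needs one value: `int bw = set_sdram_scrub_rate(req); if (bw < 0)` handles the error, otherwise `bw` is what the hardware really uses, which may be lower than requested.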
2011-01-07	amd64_edac: Remove two-stage initialization	Borislav Petkov	1	-100/+68
	Now that all prerequisites are in place, drop the two-stage initialization of driver instances in favor of the following simple init sequence: 1. Probe the PCI device: we only test ECC capabilities here and, if there are none, exit early. 2. If the hw supports ECC and it is/can be enabled, init the per-node instance. Remove the "amd64_" prefix from static functions touched, while at it. There should actually be no visible functional change resulting from this patch. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07	amd64_edac: Check ECC capabilities initially	Borislav Petkov	1	-66/+75
	Rework the code to check the hardware ECC capabilities at PCI probing time. We do all further initialization only if we actually can/have ECC enabled. While at it: 0. Fix function naming. 1. Simplify/clarify debug output. 2. Remove the amd64_ prefix from the static functions. 3. Reorganize code. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07	amd64_edac: Carve out ECC-related hw settings	Borislav Petkov	2	-24/+49
	This is in preparation for the init path reorganization, where we only want to 1) test whether a particular node supports ECC, 2) check whether it can be enabled, and only then do the necessary allocation/initialization. For that, we need to decouple the ECC settings of the node from the instance's descriptor. There should be no functional change introduced by this patch. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07	amd64_edac: Remove PCI ECS enabling functions	Borislav Petkov	2	-59/+0
	PCI ECS has been enabled by default on AMD since 2.6.26, so this code is superfluous now; remove it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07	amd64_edac: Remove explicit Kconfig PCI dependency	Borislav Petkov	1	-4/+4
	AMD_NB pulls in the dependency on PCI. Clarify/fix help text while at it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07	amd64_edac: Allocate driver instances dynamically	Borislav Petkov	2	-18/+29
	Remove static allocation in favor of dynamically allocating space for as many driver instances as there are northbridges on the system. There should be no functional change resulting from this patch. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07	amd64_edac: Rework printk macros	Borislav Petkov	5	-107/+87
	Add a macro per printk level, shorten up error messages. Add relevant information to KERN_INFO level. No functional change. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07	amd64_edac: Rename CPU PCI devices	Borislav Petkov	3	-95/+77
	Rename variables representing PCI devices to their BKDG names for faster search and shorter, clearer code. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07	amd64_edac: Concentrate per-family init even more	Borislav Petkov	2	-29/+17
	Move the remaining per-family init code into the proper place and simplify the rest of the initialization. Reorganize error handling in amd64_init_one_instance(). Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07	amd64_edac: Cleanup the CPU PCI device reservation	Borislav Petkov	1	-30/+12
	Shorten code and clarify comments, return proper -E* values on error. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07	amd64_edac: Simplify CPU family detection	Borislav Petkov	2	-32/+32
	Concentrate CPU family detection in the per-family init function. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07	amd64_edac: Add per-family init function	Borislav Petkov	2	-19/+28
	Run a per-family init function which does all the settings based on the family this driver instance is running on. Move the scrubrate calculation into it and simplify code. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07	amd64_edac: Use cached extended CPU model	Borislav Petkov	1	-3/+2
	... instead of computing it needlessly again. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07	amd64_edac: Remove F11h support	Borislav Petkov	2	-49/+3
	F11h doesn't support DRAM ECC so whack it away. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07	fs: scale mntget/mntput	Nick Piggin	13	-98/+283
	The problem that this patch aims to fix is vfsmount refcounting scalability. We need to take a reference on the vfsmount for every successful path lookup, and lookups often go to the same mount point. The fundamental difficulty is that a "simple" reference count can never be made scalable, because any time a reference is dropped, we must check whether that was the last reference. Doing that requires communication with all other CPUs that may have taken a reference count. We can make refcounts more scalable in a couple of ways, involving keeping distributed counters and checking for the global-zero condition less frequently:
	- check the global sum once every interval (this will delay zero detection for some interval, so it's probably a showstopper for vfsmounts).
	- keep a local count and only take the global sum when the local count reaches 0 (this is difficult for vfsmounts, because we can't hold preempt off for the life of a reference, so a counter would need to be per-thread or tied strongly to a particular CPU, which requires more locking).
	- keep a local difference of increments and decrements, which allows us to sum the total difference and hence find the refcount when summing all CPUs. Then, keep a single integer "long" refcount for slow and long-lasting references, and only take the global sum of local counters when the long refcount is 0.
	This last scheme is what I implemented here. Attached mounts and process root and working directory references are "long" references, and everything else is a short reference. This allows scalable vfsmount references during path walking over mounted subtrees and unattached (lazy umounted) mounts with processes still running in them. This results in one fewer atomic op in the fastpath: mntget is now just a per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock and non-atomic decrement in the common case. However, the code is otherwise bigger and heavier, so single threaded performance is basically a wash. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
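The third scheme (per-CPU increment/decrement differences plus a "long" refcount) can be modelled in a few lines. This is a hypothetical, single-threaded sketch: `struct mnt_sketch`, the fixed CPU count and the explicit `cpu` argument are illustrative, and it omits the locking the kernel needs to make the cross-CPU sum safe against concurrent updates.

```c
#include <stdbool.h>

#define NR_CPUS_SKETCH 4	/* hypothetical CPU count */

/* Each slot holds that CPU's increments minus decrements, so a slot can go
 * negative when a reference is taken on one CPU and dropped on another. */
struct mnt_sketch {
	int pcpu_count[NR_CPUS_SKETCH];	/* short references */
	int longrefs;			/* attached mount, root/cwd refs */
};

static void mnt_get_short(struct mnt_sketch *m, int cpu)
{
	m->pcpu_count[cpu]++;	/* touches only this CPU's slot */
}

static int mnt_sum(const struct mnt_sketch *m)
{
	int cpu, sum = 0;

	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += m->pcpu_count[cpu];
	return sum;
}

/* Drop a short reference; returns true if it was the last reference.
 * The expensive cross-CPU sum is only taken when no long ref is held. */
static bool mnt_put_short(struct mnt_sketch *m, int cpu)
{
	m->pcpu_count[cpu]--;
	if (m->longrefs > 0)
		return false;
	return mnt_sum(m) == 0;
}
```

While a mount is attached, its long refcount stays non-zero, so path walks pay only for the local increment and decrement; the global sum is needed only after the mount has been detached.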
2011-01-07	fs: rename vfsmount counter helpers	Nick Piggin	1	-11/+11
	As suggested by Andreas, the mnt_ prefix is a clearer namespace, follows kernel conventions better, and is easier to tab-complete. I introduced these names, so I'll admit they were not good choices. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07	fs: implement faster dentry memcmp	Nick Piggin	2	-9/+24
	The standard memcmp function on a Westmere system shows up hot in profiles in the `git diff` workload (both parallel and single threaded), and it is likely due to the costs associated with trapping into microcode, and little opportunity to improve memory access (dentry name is not likely to take up more than a cacheline). So replace it with an open-coded byte comparison. This increases code size by 8 bytes in the critical __d_lookup_rcu function, but the speedup is huge, averaging 10 runs of each:
	git diff st   user   sys    elapsed   CPU
	  before      1.15   2.57   3.82      97.1
	  after       1.14   2.35   3.61      96.8
	git diff mt   user   sys    elapsed   CPU
	  before      1.27   3.85   1.46      349
	  after       1.26   3.54   1.43      333
	Elapsed time for single threaded git diff at 95.0% confidence: -0.21 +/- 0.01 (-5.45% +/- 0.24%). It's -0.66% +/- 0.06% elapsed time on my Opteron, so rep cmp costs on the fam10h seem to be relatively smaller, but there is still a win. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
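An open-coded byte comparison of the kind described can be sketched as follows. `dentry_name_cmp()` is a hypothetical helper, not the patch's code: it only answers equal/not-equal, which is all a hash-chain lookup needs, and it bails out early on a length mismatch.

```c
#include <stddef.h>

/* Return 0 if the two names are identical, nonzero otherwise. Dentry names
 * are short, so a plain byte loop replaces the library memcmp call. */
static inline int dentry_name_cmp(const unsigned char *cs, size_t scount,
				  const unsigned char *ct, size_t tcount)
{
	if (scount != tcount)
		return 1;
	while (tcount--) {
		if (*cs++ != *ct++)
			return 1;
	}
	return 0;
}
```

Because no ordering result is required, the loop can stop at the first differing byte without computing a signed difference.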
2011-01-07	fs: prefetch inode data in dcache lookup	Nick Piggin	1	-0/+3
	This makes single threaded git diff -1.25% +/- 0.05% elapsed time on my 2s12c24t Westmere system, and -0.86% +/- 0.05% on my 2s8c Barcelona, by prefetching the important first cacheline of the inode while we do the actual name compare and other operations on the dentry. There was no measurable slowdown in the single file stat case, or the creat case (where negative dentries would be common). Signed-off-by: Nick Piggin <npiggin@kernel.dk>
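The idea of overlapping the inode fetch with the name compare can be sketched in userspace with GCC/Clang's `__builtin_prefetch`. The structures and `dcache_lookup_sketch()` are hypothetical simplifications; prefetching only after a hash match is an assumption of this sketch, made so that non-matching chain entries do not generate useless memory traffic.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified structures. */
struct inode_sketch {
	unsigned int i_mode;	/* hot fields in the first cacheline */
	unsigned long i_ino;
};

struct dentry_sketch {
	struct dentry_sketch *next;	/* hash chain */
	unsigned int hash;
	struct inode_sketch *d_inode;
	size_t len;
	const char *name;
};

static struct dentry_sketch *dcache_lookup_sketch(struct dentry_sketch *head,
						  unsigned int hash,
						  const char *name, size_t len)
{
	struct dentry_sketch *d;

	for (d = head; d; d = d->next) {
		if (d->hash != hash)
			continue;
		/* Start pulling in the inode's first cacheline now, so the
		 * memory fetch overlaps with the name comparison below. */
		if (d->d_inode)
			__builtin_prefetch(d->d_inode);
		if (d->len == len && memcmp(d->name, name, len) == 0)
			return d;
	}
	return NULL;
}
```

A prefetch is only a hint: if the inode is never used, the cost is one wasted cache fill, which matches the commit's observation that negative-dentry-heavy workloads did not measurably slow down.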
2011-01-07	fs: improve scalability of pseudo filesystems	Nick Piggin	5	-3/+16
	Regardless of how much we try to scale dcache, there is likely always going to be some fundamental contention when adding or removing children under the same parent. Pseudo filesystems do not seem to need connected dentries, because by definition they are disconnected. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07	fs: dcache per-inode inode alias locking	Nick Piggin	9	-60/+67
	dcache_inode_lock can be replaced with per-inode locking. Use existing inode->i_lock for this. This is slightly non-trivial because we sometimes need to find the inode from the dentry, which requires d_inode to be stabilised (either with refcount or d_lock). Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07	fs: dcache per-bucket dcache hash locking	Nick Piggin	4	-53/+89
	We can turn the dcache hash locking from a global dcache_hash_lock into per-bucket locking. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
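Per-bucket locking means each hash chain has its own lock, so operations on different buckets no longer serialize on one global lock. The kernel uses per-bucket bit spinlocks for this (see the bl_list entry below); this userspace sketch substitutes a pthread mutex per bucket, and every name in it is hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

#define HASH_BUCKETS 64	/* hypothetical table size */

struct node {
	struct node *next;
	unsigned int key;
};

struct bucket {
	pthread_mutex_t lock;	/* protects only this chain */
	struct node *head;
};

static struct bucket table[HASH_BUCKETS];

static void table_init(void)
{
	for (size_t i = 0; i < HASH_BUCKETS; i++) {
		pthread_mutex_init(&table[i].lock, NULL);
		table[i].head = NULL;
	}
}

static void table_insert(struct node *n)
{
	struct bucket *b = &table[n->key % HASH_BUCKETS];

	pthread_mutex_lock(&b->lock);
	n->next = b->head;
	b->head = n;
	pthread_mutex_unlock(&b->lock);
}

static struct node *table_find(unsigned int key)
{
	struct bucket *b = &table[key % HASH_BUCKETS];
	struct node *n;

	pthread_mutex_lock(&b->lock);
	for (n = b->head; n; n = n->next)
		if (n->key == key)
			break;
	pthread_mutex_unlock(&b->lock);
	return n;
}
```

Contention is now limited to keys that hash to the same bucket (here, keys congruent modulo 64).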
2011-01-07	bit_spinlock: add required includes	Nick Piggin	1	-0/+4
	Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07	kernel: add bl_list	Nick Piggin	2	-0/+271
	Introduce a type of hlist that can support the use of the lowest bit in the hlist_head. This will be subsequently used to implement per-bucket bit spinlock for inode and dentry hashes, and may be useful in other cases such as network hashes. Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Nick Piggin <npiggin@kernel.dk>
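The trick is that list nodes are at least 2-byte aligned, so bit 0 of the head's "first" pointer is always zero and can double as a spinlock. A hypothetical C11 sketch of that idea (not the kernel's `hlist_bl` code): locking sets bit 0 atomically, readers mask it off, and updates under the lock keep it set.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct bl_node {
	struct bl_node *next;
};

struct bl_head {
	_Atomic uintptr_t first;	/* pointer to first node | lock bit */
};

#define BL_LOCK_BIT ((uintptr_t)1)

static void bl_lock(struct bl_head *h)
{
	/* Spin until our fetch_or is the one that set the bit. */
	while (atomic_fetch_or_explicit(&h->first, BL_LOCK_BIT,
					memory_order_acquire) & BL_LOCK_BIT)
		;
}

static void bl_unlock(struct bl_head *h)
{
	atomic_fetch_and_explicit(&h->first, ~BL_LOCK_BIT,
				  memory_order_release);
}

static struct bl_node *bl_first(struct bl_head *h)
{
	return (struct bl_node *)(atomic_load(&h->first) & ~BL_LOCK_BIT);
}

/* Caller must hold the lock; the lock bit is preserved on update. */
static void bl_add_head_locked(struct bl_head *h, struct bl_node *n)
{
	n->next = bl_first(h);
	atomic_store(&h->first, (uintptr_t)n | BL_LOCK_BIT);
}
```

The payoff is that a hash table of such heads gets one lock per bucket at zero extra memory cost.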
2011-01-07	xfs: provide simple rcu-walk ACL implementation	Nick Piggin	1	-3/+6
	This simple implementation just checks for no ACLs on the inode, and if so, then the rcu-walk may proceed, otherwise fail it. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07	btrfs: provide simple rcu-walk ACL implementation	Nick Piggin	2	-12/+12
	This simple implementation just checks for no ACLs on the inode, and if so, then the rcu-walk may proceed, otherwise fail it. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07	ext2,3,4: provide simple rcu-walk ACL implementation	Nick Piggin	3	-6/+15
	This simple implementation just checks for no ACLs on the inode, and if so, then the rcu-walk may proceed, otherwise fail it. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07	fs: provide simple rcu-walk generic_check_acl implementation	Nick Piggin	2	-10/+31
	This simple implementation just checks for no ACLs on the inode, and if so, then the rcu-walk may proceed, otherwise fail it. This could easily be extended to put acls under RCU and check them under seqlock, if need be. But this implementation is enough to show the rcu-walk aware permissions code for path lookups is working, and will handle cases where there are no ACLs or ACLs in just the final element. This patch implicitly converts tmpfs to rcu-aware permission check. Subsequent patches convert ext*, xfs, and btrfs. Each of these uses acl/permission code in a different way, so convert them all to provide templates and proof of concept. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
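The "no ACLs, or bail out" logic can be sketched as below. Everything here is hypothetical: `SKETCH_LOOKUP_RCU` stands in for the real lookup flag, the two booleans stand in for the inode's ACL state, and returning `-EAGAIN` for "no ACL, use mode bits" is an assumption modelled on the generic permission code of this era.

```c
#include <errno.h>
#include <stdbool.h>

#define SKETCH_LOOKUP_RCU 0x1	/* hypothetical stand-in for LOOKUP_RCU */

struct inode_acl_sketch {
	bool has_acl;		/* hypothetical: any ACL on the inode? */
	bool acl_allows;	/* hypothetical result of full ACL check */
};

/* In rcu-walk mode we may not sleep or take references, so only the
 * trivial "no ACL" case is handled; anything else returns -ECHILD to ask
 * the caller to retry in ref-walk mode. */
static int check_acl_sketch(const struct inode_acl_sketch *inode,
			    unsigned int flags)
{
	if (!inode->has_acl)
		return -EAGAIN;	/* no ACL: fall back to mode bits */
	if (flags & SKETCH_LOOKUP_RCU)
		return -ECHILD;	/* needs the slow path */
	return inode->acl_allows ? 0 : -EACCES;
}
```

This is why the simple version already covers the common case: most path components carry no ACL, so rcu-walk only aborts on the few inodes that do.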
2011-01-07	fs: provide rcu-walk aware permission i_ops	Nick Piggin	60	-146/+287
	Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07	fs: rcu-walk aware d_revalidate method	Nick Piggin	27	-61/+215
	Require filesystems to be aware of .d_revalidate being called in rcu-walk mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning -ECHILD from all implementations. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
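What the "simple push down" means for callers can be sketched as follows: a filesystem that cannot revalidate without blocking refuses rcu-walk with `-ECHILD`, and the walker redoes the step in ref-walk mode. Both functions and the flag are hypothetical; the real path-walk fallback is more involved than this per-component retry.

```c
#include <errno.h>
#include <stdbool.h>

#define SKETCH_LOOKUP_RCU 0x1	/* hypothetical stand-in for LOOKUP_RCU */

/* Push-down version: refuse rcu-walk, otherwise report "still valid". */
static int revalidate_sketch(unsigned int flags)
{
	if (flags & SKETCH_LOOKUP_RCU)
		return -ECHILD;
	return 1;	/* 1 = dentry valid */
}

/* Hypothetical caller: try the lock-free mode first, and on -ECHILD redo
 * the step in ref-walk mode, where blocking is allowed. */
static int walk_component_sketch(bool *used_ref_walk)
{
	int ret = revalidate_sketch(SKETCH_LOOKUP_RCU);

	*used_ref_walk = false;
	if (ret == -ECHILD) {
		*used_ref_walk = true;
		ret = revalidate_sketch(0);
	}
	return ret;
}
```

Returning `-ECHILD` everywhere is always correct, just slow; individual filesystems can later teach their `.d_revalidate` to succeed under RCU.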
2011-01-07	fs: cache optimise dentry and inode for rcu-walk	Nick Piggin	3	-38/+42
	Move the dentry and inode fields used by lookup to the top of each data structure. This allows RCU path traversal to perform an RCU dentry lookup in a path walk by touching only the first 56 bytes of the dentry. We also fit in 8 bytes of inline name in the first 64 bytes, so for short names, only 64 bytes need to be touched to perform the lookup. We should get rid of the hash->prev pointer from the first 64 bytes, and fit 16 bytes of name in there, which will take care of 81% rather than 32% of the kernel tree. inode is also rearranged so that RCU lookup will only touch a single cacheline in the inode, plus one in the i_ops structure. This is important for directory component lookups in RCU path walking. In the kernel source, the average directory name is around 6 chars long, so this works. When we reach the last element of the lookup, we need to lock it and take its refcount, which requires another cacheline access. Align dentry and inode operations structs, so members will be at predictable offsets and we can group common operations into the head of the structure. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
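The 56 + 8 byte arithmetic can be checked with a simplified layout. This sketch assumes a 64-bit build with 8-byte pointers; the field names echo familiar dentry fields, but the struct is a hypothetical model, not the patched `struct dentry`.

```c
#include <stddef.h>
#include <stdint.h>

struct hlist_node_sketch {
	void *next, **pprev;
};

struct qstr_sketch {
	uint32_t hash;
	uint32_t len;
	const unsigned char *name;
};

/* Everything an RCU lookup reads sits before offset 56; an 8-byte inline
 * name fills the rest of the first 64-byte cacheline. */
struct dentry_layout_sketch {
	uint32_t d_flags;				/* 0  */
	uint32_t d_seq;					/* 4  */
	struct hlist_node_sketch d_hash;		/* 8  */
	struct dentry_layout_sketch *d_parent;		/* 24 */
	struct qstr_sketch d_name;			/* 32 */
	void *d_inode;					/* 48 */
	unsigned char d_iname[8];			/* 56: short names inline */
	/* colder fields (lock, refcount, ...) would follow here */
};
```

Dropping one of the two `d_hash` pointers, as the message suggests, would shift the inline name down by 8 bytes and leave room for 16 bytes of name in the same cacheline.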
2011-01-07	fs: dcache reduce branches in lookup path	Nick Piggin	63	-137/+174
	Reduce some branches and memory accesses in dcache lookup by adding dentry flags to indicate common d_ops are set, rather than having to check them. This saves a pointer memory access (dentry->d_op) in common path lookup situations, and saves another pointer load and branch in cases where we have d_op but not the particular operation. Patched with: git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i Signed-off-by: Nick Piggin <npiggin@kernel.dk>
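The effect of routing every assignment through a setter such as `d_set_d_op()` can be sketched as follows: the setter records which methods exist in a flags word, and the fast path tests a flag instead of loading `d_op`. All names and flag values below are hypothetical.

```c
#include <stddef.h>

struct dentry_ops_sketch;

struct dentry_flags_sketch {
	unsigned int d_flags;
	const struct dentry_ops_sketch *d_op;
};

struct dentry_ops_sketch {
	int (*d_revalidate)(struct dentry_flags_sketch *);
	int (*d_hash)(struct dentry_flags_sketch *);
};

/* Hypothetical flag bits recording which operations are present. */
#define SK_OP_REVALIDATE 0x1u
#define SK_OP_HASH       0x2u

/* Set d_op once and cache which methods exist in d_flags. */
static void set_d_op_sketch(struct dentry_flags_sketch *d,
			    const struct dentry_ops_sketch *op)
{
	d->d_op = op;
	d->d_flags &= ~(SK_OP_REVALIDATE | SK_OP_HASH);
	if (!op)
		return;
	if (op->d_revalidate)
		d->d_flags |= SK_OP_REVALIDATE;
	if (op->d_hash)
		d->d_flags |= SK_OP_HASH;
}

/* Fast path: one flag test; d_op is dereferenced only when the method is
 * known to exist. */
static int maybe_revalidate(struct dentry_flags_sketch *d)
{
	if (d->d_flags & SK_OP_REVALIDATE)
		return d->d_op->d_revalidate(d);
	return 1;	/* no method: treat as valid */
}

/* Example method used in the usage check. */
static int always_invalid(struct dentry_flags_sketch *d)
{
	(void)d;
	return 0;
}
```

This only works if nothing assigns `d_op` directly, which is why the commit rewrites all such assignments mechanically.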