path: root/kernel/fork.c
Age | Commit message | Author | Files | Lines
2011-01-07 | sched: Fix struct autogroup memory leak | Mike Galbraith | 1 | -0/+1
Seems I lost a change somewhere, leaking memory.
sched: fix struct autogroup memory leak
Add missing change to actually use autogroup_free().
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1294222285.8369.2.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-07 | sched: Mark autogroup_init() __init | Yong Zhang | 1 | -1/+1
autogroup_init() is only called at boot time.
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1294375425-31065-1-git-send-email-yong.zhang0@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-07 | sched: Consolidate the name of root_task_group and init_task_group | Yong Zhang | 3 | -26/+24
root_task_group is a leftover of USER_SCHED; it is now always the same as init_task_group. But as Mike suggested, root_task_group may be the more suitable name to keep for a tree. So in this patch:
init_task_group --> root_task_group
init_task_group_load --> root_task_group_load
INIT_TASK_GROUP_LOAD --> ROOT_TASK_GROUP_LOAD
Suggested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20110107071736.GA32635@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-07 | EDAC, MCE: Fix NB error formatting | Borislav Petkov | 1 | -7/+10
Minor formatting fixup, since the information about which core was associated with the MCE is not always valid.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 | EDAC, MCE: Use BIT_64() to eliminate warnings on 32-bit | Randy Dunlap | 1 | -2/+2
Building for X86_32 produces shift count warnings, so use BIT_64() to eliminate the warnings.
drivers/edac/mce_amd.c:778: warning: left shift count >= width of type
drivers/edac/mce_amd.c:778: warning: left shift count >= width of type
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: bluesmoke-devel@lists.sourceforge.net
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
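A minimal sketch of why the warning appears and how a BIT_64()-style macro avoids it. SKETCH_BIT_64 and status_bit_set are hypothetical names used only for illustration; the kernel ships its own BIT_64(). The idea is that the constant is widened to 64 bits before the shift, so bit positions >= 32 of a 64-bit status value stay well defined even where unsigned long is 32 bits.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's BIT_64(): widen the constant to
 * 64 bits *before* shifting, so bit positions >= 32 are well defined even
 * when int and unsigned long are 32 bits wide (as on X86_32). */
#define SKETCH_BIT_64(n) ((uint64_t)1 << (n))

/* Test one bit of a 64-bit status value. Writing 1UL << 44 here would
 * trigger "left shift count >= width of type" on a 32-bit build;
 * SKETCH_BIT_64(44) does not. */
static inline int status_bit_set(uint64_t status, unsigned int bit)
{
	return (status & SKETCH_BIT_64(bit)) != 0;
}
```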
2011-01-07 | EDAC, MCE: Enable MCE decoding on F15h | Borislav Petkov | 1 | -6/+8
Now that everything is in place, enable MCE decoding on F15h. Make the initcall routine a bit more readable.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 | EDAC, MCE: Allow F15h bank 6 MCE injection | Borislav Petkov | 1 | -4/+5
F15h adds a sixth MCE bank: adjust the bank number check in the injection code.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 | EDAC, MCE: Shorten error report formatting | Borislav Petkov | 1 | -22/+32
Shorten up MCi_STATUS flags and add BD's new deferred and poison types. Also, simplify formatting.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 | EDAC, MCE: Overhaul error fields extraction macros | Borislav Petkov | 3 | -54/+43
Make macro names shorter, thus making the code shorter and clearer.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 | EDAC, MCE: Add F15h FP MCE decoder | Borislav Petkov | 1 | -0/+44
Add decoder for FP MCEs.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 | EDAC, MCE: Add F15 EX MCE decoder | Borislav Petkov | 1 | -7/+34
Integrate the single FIROB signature into an expanded table along with the new BD MCE types.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 | EDAC, MCE: Add an F15h NB MCE decoder | Borislav Petkov | 1 | -0/+10
... by (almost) reusing the F10h one, since the signatures are the same.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 | EDAC, MCE: No F15h LS MCE decoder | Borislav Petkov | 1 | -1/+1
F15h BD doesn't generate LS MCEs, so warn about it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 | EDAC, MCE: Add F15h CU MCE decoder | Borislav Petkov | 1 | -1/+61
MCE bank 2 is redefined from a BU to a CU (Combined Unit) bank on F15h. Add a decoder function for CU MCEs.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 | EDAC, MCE: Add F15h IC MCE decoder | Borislav Petkov | 2 | -4/+51
Add support for decoding F15h IC MCEs.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 | EDAC, MCE: Add F15h DC MCE decoder | Borislav Petkov | 2 | -19/+62
Add a decoder for F15h DC MCEs to support the new types of DC MCEs introduced by the BD microarchitecture.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 | EDAC, MCE: Select extended error code mask | Borislav Petkov | 1 | -4/+9
F15h enlarges the extended error code of an MCE to a 5-bit field (MCi_STATUS[20:16]). Add a mask variable whose default of 0xf is overridden on F15h.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
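A rough sketch of the mask selection described above. It assumes the F15h value is 0x1f, which follows from the 5-bit width stated in the commit. The names xec_mask, select_xec_mask and extended_error_code are hypothetical, not the driver's actual identifiers. The extended error code is taken from MCi_STATUS starting at bit 16 and masked with the family-dependent width.

```c
#include <stdint.h>

/* Hypothetical global: pre-F15h families use a 4-bit extended error
 * code (mask 0xf); F15h widens MCi_STATUS[20:16] to 5 bits (0x1f). */
static unsigned int xec_mask = 0xf;

/* Pick the mask once, based on the CPU family (0x15 == F15h). */
static void select_xec_mask(unsigned int family)
{
	xec_mask = (family == 0x15) ? 0x1f : 0xf;
}

/* Extract the extended error code, which starts at bit 16. */
static unsigned int extended_error_code(uint64_t status)
{
	return (unsigned int)((status >> 16) & xec_mask);
}
```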
2011-01-07 | amd64_edac: Disable DRAM ECC injection on K8 | Borislav Petkov | 1 | -2/+3
K8 does not allow an atomic RMW to a cacheline as F10h does, so disable the error injection interface for it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 | EDAC: Fixup scrubrate manipulation | Borislav Petkov | 7 | -63/+52
Make ->{get|set}_sdram_scrub_rate return the actual scrub rate bandwidth they succeeded in setting, and remove the superfluous pointer argument used for that. A negative return value still means that an error occurred while setting the scrub rate. Document this for future reference.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
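A minimal sketch of the return convention, assuming a driver that can only program a fixed set of bandwidths. The table values and their units are made up, and sketch_set_scrub_rate is a hypothetical name. The callback returns the bandwidth it actually programmed (>= 0), or a negative errno value, so no out-parameter is needed.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table of programmable scrub bandwidths, highest first, in
 * arbitrary units; 0 means scrubbing is disabled. */
static const uint32_t sketch_scrub_rates[] = { 1600, 800, 400, 100, 0 };

/* Program the largest supported bandwidth not exceeding the request and
 * return it; return a negative errno value if nothing could be set. */
static int sketch_set_scrub_rate(int hw_can_scrub, uint32_t requested_bw)
{
	size_t i;

	if (!hw_can_scrub)
		return -EINVAL;

	for (i = 0; i < sizeof(sketch_scrub_rates) / sizeof(sketch_scrub_rates[0]); i++)
		if (sketch_scrub_rates[i] <= requested_bw)
			return (int)sketch_scrub_rates[i];

	return -EINVAL; /* not reached with the trailing 0 entry; kept for safety */
}
```

A caller checks `ret < 0` for failure and otherwise reports `ret` as the bandwidth actually in effect, which may be lower than what was requested.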
2011-01-07 | amd64_edac: Remove two-stage initialization | Borislav Petkov | 1 | -100/+68
Now that all prerequisites are in place, drop the two-stage driver instance initialization in favor of the following simple init sequence:
1. Probe PCI device: we only test ECC capabilities here, and if there are none, exit early.
2. If the hw supports ECC and it is/can be enabled, we init the per-node instance.
Remove the "amd64_" prefix from the static functions touched, while at it. There actually should be no visible functional change resulting from this patch.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 | amd64_edac: Check ECC capabilities initially | Borislav Petkov | 1 | -66/+75
Rework the code to check the hardware ECC capabilities at PCI probing time. We do all further initialization only if we actually can/have ECC enabled. While at it:
0. Fix function naming.
1. Simplify/clarify debug output.
2. Remove the amd64_ prefix from the static functions.
3. Reorganize code.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 | amd64_edac: Carve out ECC-related hw settings | Borislav Petkov | 2 | -24/+49
This is in preparation for the init path reorganization, where we only want to 1) test whether a particular node supports ECC and 2) test whether it can be enabled, and only then do the necessary allocation/initialization. For that, we need to decouple the ECC settings of the node from the instance's descriptor. There should be no functional change introduced by this patch.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 | amd64_edac: Remove PCI ECS enabling functions | Borislav Petkov | 2 | -59/+0
PCI ECS has been enabled by default on AMD since 2.6.26, so this code is superfluous now; remove it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 | amd64_edac: Remove explicit Kconfig PCI dependency | Borislav Petkov | 1 | -4/+4
AMD_NB pulls in the dependency on PCI. Clarify/fix help text while at it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 | amd64_edac: Allocate driver instances dynamically | Borislav Petkov | 2 | -18/+29
Remove static allocation in favor of dynamically allocating space for as many driver instances as there are northbridges present on the system. There should be no functional change resulting from this patch.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 | amd64_edac: Rework printk macros | Borislav Petkov | 5 | -107/+87
Add a macro per printk level, shorten up error messages. Add relevant information to KERN_INFO level. No functional change.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 | amd64_edac: Rename CPU PCI devices | Borislav Petkov | 3 | -95/+77
Rename variables representing PCI devices to their BKDG names for faster search and shorter, clearer code.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 | amd64_edac: Concentrate per-family init even more | Borislav Petkov | 2 | -29/+17
Move the remaining per-family init code into the proper place and simplify the rest of the initialization. Reorganize error handling in amd64_init_one_instance().
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 | amd64_edac: Cleanup the CPU PCI device reservation | Borislav Petkov | 1 | -30/+12
Shorten code and clarify comments, return proper -E* values on error.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 | amd64_edac: Simplify CPU family detection | Borislav Petkov | 2 | -32/+32
Concentrate CPU family detection in the per-family init function.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 | amd64_edac: Add per-family init function | Borislav Petkov | 2 | -19/+28
Run a per-family init function which does all the settings based on the family this driver instance is running on. Move the scrub rate calculation into it and simplify code.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 | amd64_edac: Use cached extended CPU model | Borislav Petkov | 1 | -3/+2
... instead of computing it needlessly again.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 | amd64_edac: Remove F11h support | Borislav Petkov | 2 | -49/+3
F11h doesn't support DRAM ECC, so whack it away.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 | fs: scale mntget/mntput | Nick Piggin | 13 | -98/+283
The problem that this patch aims to fix is vfsmount refcounting scalability. We need to take a reference on the vfsmount for every successful path lookup, and these lookups often go to the same mount point.
The fundamental difficulty is that a "simple" reference count can never be made scalable, because any time a reference is dropped, we must check whether that was the last reference. To do that requires communication with all other CPUs that may have taken a reference count.
We can make refcounts more scalable in a couple of ways, involving keeping distributed counters, and checking for the global-zero condition less frequently.
- check the global sum once every interval (this will delay zero detection for some interval, so it's probably a showstopper for vfsmounts).
- keep a local count and only take the global sum when local reaches 0 (this is difficult for vfsmounts, because we can't hold preempt off for the life of a reference, so a counter would need to be per-thread or tied strongly to a particular CPU, which requires more locking).
- keep a local difference of increments and decrements, which allows us to sum the total difference and hence find the refcount when summing all CPUs. Then, keep a single integer "long" refcount for slow and long lasting references, and only take the global sum of local counters when the long refcount is 0.
This last scheme is what I implemented here. Attached mounts and process root and working directory references are "long" references, and everything else is a short reference.
This allows scalable vfsmount references during path walking over mounted subtrees and unattached (lazy umounted) mounts with processes still running in them.
This results in one fewer atomic op in the fastpath: mntget is now just a per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock and non-atomic decrement in the common case. However, the code is otherwise bigger and heavier, so single threaded performance is basically a wash.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
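A single-threaded model of the third scheme above: per-CPU increment/decrement differences plus one "long" refcount. The struct, the fixed CPU count and the function names are hypothetical, not the kernel's struct vfsmount. The sketch also omits the spinlock the commit describes for mntput. A per-CPU counter may go negative when a reference taken on one CPU is dropped on another, because only the sum over all CPUs is meaningful.

```c
#include <stdbool.h>

#define SKETCH_NR_CPUS 4  /* hypothetical fixed CPU count */

/* Hypothetical model, not the kernel's struct vfsmount. */
struct mnt_sketch {
	int pcpu_count[SKETCH_NR_CPUS]; /* per-CPU (gets - puts); may be negative */
	int long_count;                 /* attached mount, root/cwd references */
};

/* Fast path: a plain per-CPU increment, no shared cacheline touched. */
static void mnt_get_short(struct mnt_sketch *m, int cpu)
{
	m->pcpu_count[cpu]++;
}

/* Returns true if this put dropped the last reference. */
static bool mnt_put_short(struct mnt_sketch *m, int cpu)
{
	int i, sum = 0;

	m->pcpu_count[cpu]--;
	if (m->long_count > 0)
		return false;          /* a long reference pins the mount: no sum */

	for (i = 0; i < SKETCH_NR_CPUS; i++)
		sum += m->pcpu_count[i];   /* slow path: global sum */
	return sum == 0;
}
```

The fast path never reads other CPUs' counters while a long reference exists, which is the common case for an attached mount.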
2011-01-07 | fs: rename vfsmount counter helpers | Nick Piggin | 1 | -11/+11
Suggested by Andreas: the mnt_ prefix is a clearer namespace, follows kernel conventions better, and is easier to tab-complete. I introduced these names, so I'll admit they were not good choices.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 | fs: implement faster dentry memcmp | Nick Piggin | 2 | -9/+24
The standard memcmp function on a Westmere system shows up hot in profiles in the `git diff` workload (both parallel and single threaded), and it is likely due to the costs associated with trapping into microcode, and little opportunity to improve memory access (dentry name is not likely to take up more than a cacheline).
So replace it with an open-coded byte comparison. This increases code size by 8 bytes in the critical __d_lookup_rcu function, but the speedup is huge, averaging 10 runs of each:
git diff st   user  sys   elapsed  CPU
before        1.15  2.57  3.82     97.1
after         1.14  2.35  3.61     96.8
git diff mt   user  sys   elapsed  CPU
before        1.27  3.85  1.46     349
after         1.26  3.54  1.43     333
Elapsed time for single threaded git diff at 95.0% confidence: -0.21 +/- 0.01 (-5.45% +/- 0.24%)
It's -0.66% +/- 0.06% elapsed time on my Opteron, so rep cmp costs on the fam10h seem to be relatively smaller, but there is still a win.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
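A sketch of an open-coded byte comparison in the spirit of this patch. name_bytes_differ is a hypothetical name, not the function the patch adds. Unlike memcmp(), it only reports whether the names differ and says nothing about ordering, which is all a hash-chain lookup needs. Because names are short, the simple loop avoids the setup cost of a string instruction.

```c
#include <stddef.h>

/* Return 0 if the first count bytes of cs and ct match, 1 otherwise.
 * Equality only, no ordering: enough for a dcache hash-chain match. */
static inline int name_bytes_differ(const unsigned char *cs,
				    const unsigned char *ct, size_t count)
{
	while (count) {
		if (*cs != *ct)
			return 1;
		cs++;
		ct++;
		count--;
	}
	return 0;
}
```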
2011-01-07 | fs: prefetch inode data in dcache lookup | Nick Piggin | 1 | -0/+3
This makes single threaded git diff -1.25% +/- 0.05% elapsed time on my 2s12c24t Westmere system, and -0.86% +/- 0.05% on my 2s8c Barcelona, by prefetching the important first cacheline of the inode in while we do the actual name compare and other operations on the dentry. There was no measurable slowdown in the single file stat case, or the creat case (where negative dentries would be common).
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 | fs: improve scalability of pseudo filesystems | Nick Piggin | 5 | -3/+16
Regardless of how much we try to scale dcache, there is likely always going to be some fundamental contention when adding or removing children under the same parent. Pseudo filesystems do not seem to need connected dentries, because by definition they are disconnected.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 | fs: dcache per-inode inode alias locking | Nick Piggin | 9 | -60/+67
dcache_inode_lock can be replaced with per-inode locking. Use the existing inode->i_lock for this. This is slightly non-trivial because we sometimes need to find the inode from the dentry, which requires d_inode to be stabilised (either with refcount or d_lock).
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 | fs: dcache per-bucket dcache hash locking | Nick Piggin | 4 | -53/+89
We can turn the dcache hash locking from a global dcache_hash_lock into per-bucket locking.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
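A generic sketch of per-bucket locking using C11 atomics for a tiny spinlock per bucket. All names, and the bucket count, are hypothetical. Two inserts that hash to different buckets never touch the same lock, whereas with a single global lock they would serialise. For simplicity this sketch also takes the bucket lock on lookup.

```c
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_NR_BUCKETS 64  /* hypothetical table size */

struct sketch_node {
	struct sketch_node *next;
	unsigned long key;
};

/* One lock per bucket instead of one global hash lock. */
struct sketch_bucket {
	atomic_int locked;           /* 0 = free, 1 = held */
	struct sketch_node *head;
};

/* Static storage is zero-initialised: every bucket unlocked and empty. */
static struct sketch_bucket sketch_table[SKETCH_NR_BUCKETS];

static struct sketch_bucket *sketch_bucket_for(unsigned long key)
{
	return &sketch_table[key % SKETCH_NR_BUCKETS];
}

static void sketch_bucket_lock(struct sketch_bucket *b)
{
	while (atomic_exchange_explicit(&b->locked, 1, memory_order_acquire))
		; /* spin: contention only with users of this same bucket */
}

static void sketch_bucket_unlock(struct sketch_bucket *b)
{
	atomic_store_explicit(&b->locked, 0, memory_order_release);
}

static void sketch_hash_insert(struct sketch_node *n)
{
	struct sketch_bucket *b = sketch_bucket_for(n->key);

	sketch_bucket_lock(b);
	n->next = b->head;
	b->head = n;
	sketch_bucket_unlock(b);
}

static struct sketch_node *sketch_hash_find(unsigned long key)
{
	struct sketch_bucket *b = sketch_bucket_for(key);
	struct sketch_node *n;

	sketch_bucket_lock(b);
	for (n = b->head; n; n = n->next)
		if (n->key == key)
			break;
	sketch_bucket_unlock(b);
	return n;
}
```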
2011-01-07 | bit_spinlock: add required includes | Nick Piggin | 1 | -0/+4
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 | kernel: add bl_list | Nick Piggin | 2 | -0/+271
Introduce a type of hlist that can support the use of the lowest bit in the hlist_head. This will be subsequently used to implement per-bucket bit spinlock for inode and dentry hashes, and may be useful in other cases such as network hashes.
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
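A userspace sketch of the underlying trick, written with C11 atomics rather than the kernel's bit_spin_lock(). The type and function names are hypothetical and are not the bl_list API. Because list nodes are pointer-aligned, bit 0 of a valid node address is always 0, so the list head can use that bit as its own spinlock without growing the head.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct bl_node_sketch {
	struct bl_node_sketch *next;
};

/* Head stored as an integer so bit 0 can double as a lock bit. */
struct bl_head_sketch {
	_Atomic uintptr_t first;
};

#define BL_LOCK_BIT ((uintptr_t)1)

static void bl_lock(struct bl_head_sketch *h)
{
	/* Spin until *we* are the ones who flipped the bit from 0 to 1. */
	while (atomic_fetch_or_explicit(&h->first, BL_LOCK_BIT,
					memory_order_acquire) & BL_LOCK_BIT)
		;
}

static void bl_unlock(struct bl_head_sketch *h)
{
	atomic_fetch_and_explicit(&h->first, ~BL_LOCK_BIT, memory_order_release);
}

/* Mask off the lock bit to recover the real first-node pointer. */
static struct bl_node_sketch *bl_first(struct bl_head_sketch *h)
{
	uintptr_t v = atomic_load_explicit(&h->first, memory_order_relaxed);
	return (struct bl_node_sketch *)(v & ~BL_LOCK_BIT);
}

/* Caller must hold the lock; the lock bit is preserved on update. */
static void bl_add_head_locked(struct bl_head_sketch *h, struct bl_node_sketch *n)
{
	uintptr_t old = atomic_load_explicit(&h->first, memory_order_relaxed);

	n->next = (struct bl_node_sketch *)(old & ~BL_LOCK_BIT);
	atomic_store_explicit(&h->first, (uintptr_t)n | (old & BL_LOCK_BIT),
			      memory_order_relaxed);
}
```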
2011-01-07 | xfs: provide simple rcu-walk ACL implementation | Nick Piggin | 1 | -3/+6
This simple implementation just checks for no ACLs on the inode, and if so, then the rcu-walk may proceed, otherwise fail it.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 | btrfs: provide simple rcu-walk ACL implementation | Nick Piggin | 2 | -12/+12
This simple implementation just checks for no ACLs on the inode, and if so, then the rcu-walk may proceed, otherwise fail it.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 | ext2,3,4: provide simple rcu-walk ACL implementation | Nick Piggin | 3 | -6/+15
This simple implementation just checks for no ACLs on the inode, and if so, then the rcu-walk may proceed, otherwise fail it.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 | fs: provide simple rcu-walk generic_check_acl implementation | Nick Piggin | 2 | -10/+31
This simple implementation just checks for no ACLs on the inode, and if so, then the rcu-walk may proceed, otherwise fail it. This could easily be extended to put ACLs under RCU and check them under seqlock, if need be. But this implementation is enough to show the rcu-walk aware permissions code for path lookups is working, and will handle cases where there are no ACLs or ACLs in just the final element.
This patch implicitly converts tmpfs to rcu-aware permission check. Subsequent patches convert ext*, xfs, and btrfs. Each of these uses acl/permission code in a different way, so convert them all to provide templates and proof of concept.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
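A simplified model of the "no ACL, proceed; otherwise fail rcu-walk" rule. It is not the real generic_check_acl() signature. SKETCH_LOOKUP_RCU, struct sketch_inode and sketch_permission are hypothetical, and the ACL is reduced to a single allow-mask. Returning -ECHILD tells the caller to drop out of rcu-walk and retry the check in ref-walk mode, where the ACL may be consulted.

```c
#include <errno.h>
#include <stdbool.h>

#define SKETCH_LOOKUP_RCU 0x1   /* hypothetical stand-in for LOOKUP_RCU */

/* Hypothetical, heavily simplified inode. */
struct sketch_inode {
	unsigned int mode_allow;    /* permission bits granted by the mode */
	bool has_acl;
	unsigned int acl_allow;     /* bits granted by the ACL, if present */
};

/* 0 = allowed, -EACCES = denied, -ECHILD = retry in ref-walk mode. */
static int sketch_permission(const struct sketch_inode *inode,
			     unsigned int want, unsigned int flags)
{
	unsigned int allow;

	if (inode->has_acl) {
		if (flags & SKETCH_LOOKUP_RCU)
			return -ECHILD;        /* ACL present: fail the rcu-walk */
		allow = inode->acl_allow;  /* ref-walk: safe to consult ACL */
	} else {
		allow = inode->mode_allow; /* no ACL: mode bits decide */
	}
	return (allow & want) == want ? 0 : -EACCES;
}
```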
2011-01-07 | fs: provide rcu-walk aware permission i_ops | Nick Piggin | 60 | -146/+287
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 | fs: rcu-walk aware d_revalidate method | Nick Piggin | 27 | -61/+215
Require filesystems to be aware of .d_revalidate being called in rcu-walk mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning -ECHILD from all implementations.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
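A sketch of the simple push-down described above, together with the caller-side fallback it implies. The types and names are hypothetical stand-ins, not the VFS definitions. The hook refuses to work under rcu-walk by returning -ECHILD. The caller then clears the RCU flag and calls the hook again in ref-walk mode, where it can do its real check.

```c
#include <errno.h>

#define SKETCH_LOOKUP_RCU 0x1   /* hypothetical stand-in for LOOKUP_RCU */

struct sketch_nameidata {
	unsigned int flags;
};

/* Filesystem hook: 1 = dentry valid, 0 = invalid, -ECHILD = "not in
 * rcu-walk, please retry in ref-walk". */
static int sketch_d_revalidate(int dentry_still_valid,
			       const struct sketch_nameidata *nd)
{
	if (nd->flags & SKETCH_LOOKUP_RCU)
		return -ECHILD;
	return dentry_still_valid ? 1 : 0;
}

/* Caller side: start in rcu-walk, drop to ref-walk on -ECHILD. */
static int sketch_lookup(int dentry_still_valid)
{
	struct sketch_nameidata nd = { SKETCH_LOOKUP_RCU };
	int ret = sketch_d_revalidate(dentry_still_valid, &nd);

	if (ret == -ECHILD) {
		nd.flags &= ~SKETCH_LOOKUP_RCU;
		ret = sketch_d_revalidate(dentry_still_valid, &nd);
	}
	return ret;
}
```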
2011-01-07 | fs: cache optimise dentry and inode for rcu-walk | Nick Piggin | 3 | -38/+42
Put dentry and inode fields into the top of the data structure. This allows RCU path traversal to perform an RCU dentry lookup in a path walk by touching only the first 56 bytes of the dentry.
We also fit in 8 bytes of inline name in the first 64 bytes, so for short names, only 64 bytes need to be touched to perform the lookup. We should get rid of the hash->prev pointer from the first 64 bytes, and fit 16 bytes of name in there, which will take care of 81% rather than 32% of the kernel tree.
inode is also rearranged so that RCU lookup will only touch a single cacheline in the inode, plus one in the i_ops structure. This is important for directory component lookups in RCU path walking. In the kernel source, the average directory name is around 6 chars long, so this works.
When we reach the last element of the lookup, we need to lock it and take its refcount, which requires another cacheline access.
Align dentry and inode operations structs, so members will be at predictable offsets and we can group common operations into the head of the structure.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 | fs: dcache reduce branches in lookup path | Nick Piggin | 63 | -137/+174
Reduce some branches and memory accesses in dcache lookup by adding dentry flags to indicate common d_ops are set, rather than having to check them. This saves a pointer memory access (dentry->d_op) in common path lookup situations, and saves another pointer load and branch in cases where we have d_op but not the particular operation.
Patched with:
git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
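A sketch of the flag-caching idea. The flag names, the structs and sketch_set_d_op are hypothetical models, not the kernel's DCACHE_OP_* definitions or d_set_d_op() itself. When the operations table is set, the setter records in d_flags which methods exist. A hot path can then test a bit in the already-loaded dentry instead of dereferencing d_op and checking the method pointer.

```c
#include <stddef.h>

/* Hypothetical flag bits recording which d_op methods are present. */
#define SKETCH_DCACHE_OP_HASH        0x1u
#define SKETCH_DCACHE_OP_COMPARE     0x2u
#define SKETCH_DCACHE_OP_REVALIDATE  0x4u

struct dentry_sketch;

struct dentry_ops_sketch {
	int (*d_hash)(struct dentry_sketch *);
	int (*d_compare)(struct dentry_sketch *);
	int (*d_revalidate)(struct dentry_sketch *);
};

struct dentry_sketch {
	unsigned int d_flags;
	const struct dentry_ops_sketch *d_op;
};

/* Set d_op and cache which methods exist, so later checks need no load
 * of d_op and no branch on the individual method pointers. */
static void sketch_set_d_op(struct dentry_sketch *d,
			    const struct dentry_ops_sketch *op)
{
	d->d_op = op;
	d->d_flags &= ~(SKETCH_DCACHE_OP_HASH | SKETCH_DCACHE_OP_COMPARE |
			SKETCH_DCACHE_OP_REVALIDATE);
	if (!op)
		return;
	if (op->d_hash)
		d->d_flags |= SKETCH_DCACHE_OP_HASH;
	if (op->d_compare)
		d->d_flags |= SKETCH_DCACHE_OP_COMPARE;
	if (op->d_revalidate)
		d->d_flags |= SKETCH_DCACHE_OP_REVALIDATE;
}

/* Hot-path check: one test on a field already in cache. */
static int sketch_needs_revalidate(const struct dentry_sketch *d)
{
	return (d->d_flags & SKETCH_DCACHE_OP_REVALIDATE) != 0;
}

/* Example method for illustration: always reports the dentry valid. */
static int sketch_always_valid(struct dentry_sketch *d)
{
	(void)d;
	return 1;
}
```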