linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2008-01-25	sched: push RT tasks from overloaded CPUs	Steven Rostedt	2	-0/+11
	This patch adds pushing of overloaded RT tasks from a runqueue that is having tasks (most likely RT tasks) added to the run queue. TODO: We don't cover the case of waking of new RT tasks (yet). Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25	sched: pull RT tasks from overloaded runqueues	Steven Rostedt	2	-11/+178
	This patch adds the algorithm to pull tasks from RT overloaded runqueues. When a pull RT is initiated, all overloaded runqueues are examined for a RT task that is higher in prio than the highest prio task queued on the target runqueue. If another runqueue holds a RT task that is of higher prio than the highest prio task on the target runqueue is found it is pulled to the target runqueue. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25	sched: add rt-overload tracking	Steven Rostedt	1	-0/+36
	This patch adds an RT overload accounting system. When a runqueue has more than one RT task queued, it is marked as overloaded. That is that it is a candidate to have RT tasks pulled from it. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25	sched: add RT task pushing	Steven Rostedt	2	-2/+231
	This patch adds an algorithm to push extra RT tasks off a run queue to other CPU runqueues. When more than one RT task is added to a run queue, this algorithm takes an assertive approach to push the RT tasks that are not running onto other run queues that have lower priority. The way this works is that the highest RT task that is not running is looked at and we examine the runqueues on the CPUS for that tasks affinity mask. We find the runqueue with the lowest prio in the CPU affinity of the picked task, and if it is lower in prio than the picked task, we push the task onto that CPU runqueue. We continue pushing RT tasks off the current runqueue until we don't push any more. The algorithm stops when the next highest RT task can't preempt any other processes on other CPUS. TODO: The algorithm may stop when there are still RT tasks that can be migrated. Specifically, if the highest non running RT task CPU affinity is restricted to CPUs that are running higher priority tasks, there may be a lower priority task queued that has an affinity with a CPU that is running a lower priority task that it could be migrated to. This patch set does not address this issue. Note: checkpatch reveals two over 80 character instances. I'm not sure that breaking them up will help visually, so I left them as is. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25	sched: track highest prio task queued	Steven Rostedt	2	-0/+21
	This patch adds accounting to each runqueue to keep track of the highest prio task queued on the run queue. We only care about RT tasks, so if the run queue does not contain any active RT tasks its priority will be considered MAX_RT_PRIO. This information will be used for later patches. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25	sched: count # of queued RT tasks	Steven Rostedt	2	-0/+18
	This patch adds accounting to keep track of the number of RT tasks running on a runqueue. This information will be used in later patches. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25	softlockup: automatically detect hung TASK_UNINTERRUPTIBLE tasks	Ingo Molnar	7	-13/+164
	this patch extends the soft-lockup detector to automatically detect hung TASK_UNINTERRUPTIBLE tasks. Such hung tasks are printed the following way: ------------------> INFO: task prctl:3042 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message prctl D fd5e3793 0 3042 2997 f6050f38 00000046 00000001 fd5e3793 00000009 c06d8264 c06dae80 00000286 f6050f40 f6050f00 f7d34d90 f7d34fc8 c1e1be80 00000001 f6050000 00000000 f7e92d00 00000286 f6050f18 c0489d1a f6050f40 00006605 00000000 c0133a5b Call Trace: [<c04883a5>] schedule_timeout+0x6d/0x8b [<c04883d8>] schedule_timeout_uninterruptible+0x15/0x17 [<c0133a76>] msleep+0x10/0x16 [<c0138974>] sys_prctl+0x30/0x1e2 [<c0104c52>] sysenter_past_esp+0x5f/0xa5 ======================= 2 locks held by prctl/3042: #0: (&sb->s_type->i_mutex_key#5){--..}, at: [<c0197d11>] do_fsync+0x38/0x7a #1: (jbd_handle){--..}, at: [<c01ca3d2>] journal_start+0xc7/0xe9 <------------------ the current default timeout is 120 seconds. Such messages are printed up to 10 times per bootup. If the system has crashed already then the messages are not printed. if lockdep is enabled then all held locks are printed as well. this feature is a natural extension to the softlockup-detector (kernel locked up without scheduling) and to the NMI watchdog (kernel locked up with IRQs disabled). [ Gautham R Shenoy <ego@in.ibm.com>: CPU hotplug fixes. ] [ Andrew Morton <akpm@linux-foundation.org>: build warning fix. ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
2008-01-25	cpu-hotplug: fix build on !CONFIG_SMP	Ingo Molnar	1	-1/+7
	fix build on !CONFIG_SMP. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25	cpu-hotplug: replace per-subsystem mutexes with get_online_cpus()	Gautham R Shenoy	5	-50/+36
	This patch converts the known per-subsystem mutexes to get_online_cpus put_online_cpus. It also eliminates the CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE hotplug notification events. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25	cpu-hotplug: replace lock_cpu_hotplug() with get_online_cpus()	Gautham R Shenoy	15	-61/+62
	Replace all lock_cpu_hotplug/unlock_cpu_hotplug from the kernel and use get_online_cpus and put_online_cpus instead as it highlights the refcount semantics in these operations. The new API guarantees protection against the cpu-hotplug operation, but it doesn't guarantee serialized access to any of the local data structures. Hence the changes needs to be reviewed. In case of pseries_add_processor/pseries_remove_processor, use cpu_maps_update_begin()/cpu_maps_update_done() as we're modifying the cpu_present_map there. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25	cpu-hotplug: refcount based cpu hotplug	Gautham R Shenoy	3	-41/+115
	This patch implements a Refcount + Waitqueue based model for cpu-hotplug. Now, a thread which wants to prevent cpu-hotplug, will bump up a global refcount and the thread which wants to perform a cpu-hotplug operation will block till the global refcount goes to zero. The readers, if any, during an ongoing cpu-hotplug operation are blocked until the cpu-hotplug operation is over. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Paul Jackson <pj@sgi.com> [For !CONFIG_HOTPLUG_CPU ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25	sched: group scheduler, fix fairness of cpu bandwidth allocation for task groups	Srivatsa Vaddagiri	4	-45/+331
	The current load balancing scheme isn't good enough for precise group fairness. For example: on a 8-cpu system, I created 3 groups as under: a = 8 tasks (cpu.shares = 1024) b = 4 tasks (cpu.shares = 1024) c = 3 tasks (cpu.shares = 1024) a, b and c are task groups that have equal weight. We would expect each of the groups to receive 33.33% of cpu bandwidth under a fair scheduler. This is what I get with the latest scheduler git tree: Signed-off-by: Ingo Molnar <mingo@elte.hu> -------------------------------------------------------------------------------- Col1 \| Col2 \| Col3 \| Col4 ------\|---------\|-------\|------------------------------------------------------- a \| 277.676 \| 57.8% \| 54.1% 54.1% 54.1% 54.2% 56.7% 62.2% 62.8% 64.5% b \| 116.108 \| 24.2% \| 47.4% 48.1% 48.7% 49.3% c \| 86.326 \| 18.0% \| 47.5% 47.9% 48.5% -------------------------------------------------------------------------------- Explanation of o/p: Col1 -> Group name Col2 -> Cumulative execution time (in seconds) received by all tasks of that group in a 60sec window across 8 cpus Col3 -> CPU bandwidth received by the group in the 60sec window, expressed in percentage. Col3 data is derived as: Col3 = 100 * Col2 / (NR_CPUS * 60) Col4 -> CPU bandwidth received by each individual task of the group. Col4 = 100 * cpu_time_recd_by_task / 60 [I can share the test case that produces a similar o/p if reqd] The deviation from desired group fairness is as below: a = +24.47% b = -9.13% c = -15.33% which is quite high. After the patch below is applied, here are the results: -------------------------------------------------------------------------------- Col1 \| Col2 \| Col3 \| Col4 ------\|---------\|-------\|------------------------------------------------------- a \| 163.112 \| 34.0% \| 33.2% 33.4% 33.5% 33.5% 33.7% 34.4% 34.8% 35.3% b \| 156.220 \| 32.5% \| 63.3% 64.5% 66.1% 66.5% c \| 160.653 \| 33.5% \| 85.8% 90.6% 91.4% -------------------------------------------------------------------------------- Deviation from desired group fairness is as below: a = +0.67% b = -0.83% c = +0.17% which is far better IMO. Most of other runs have yielded a deviation within +-2% at the most, which is good. Why do we see bad (group) fairness with current scheuler? ========================================================= Currently cpu's weight is just the summation of individual task weights. This can yield incorrect results. For ex: consider three groups as below on a 2-cpu system: CPU0 CPU1 --------------------------- A (10) B(5) C(5) --------------------------- Group A has 10 tasks, all on CPU0, Group B and C have 5 tasks each all of which are on CPU1. Each task has the same weight (NICE_0_LOAD = 1024). The current scheme would yield a cpu weight of 10240 (101024) for each cpu and the load balancer will think both CPUs are perfectly balanced and won't move around any tasks. This, however, would yield this bandwidth: A = 50% B = 25% C = 25% which is not the desired result. What's changing in the patch? ============================= - How cpu weights are calculated when CONFIF_FAIR_GROUP_SCHED is defined (see below) - API Change - Two tunables introduced in sysfs (under SCHED_DEBUG) to control the frequency at which the load balance monitor thread runs. The basic change made in this patch is how cpu weight (rq->load.weight) is calculated. Its now calculated as the summation of group weights on a cpu, rather than summation of task weights. Weight exerted by a group on a cpu is dependent on the shares allocated to it and also the number of tasks the group has on that cpu compared to the total number of (runnable) tasks the group has in the system. Let, W(K,i) = Weight of group K on cpu i T(K,i) = Task load present in group K's cfs_rq on cpu i T(K) = Total task load of group K across various cpus S(K) = Shares allocated to group K NRCPUS = Number of online cpus in the scheduler domain to which group K is assigned. Then, W(K,i) = S(K) NRCPUS * T(K,i) / T(K) A load balance monitor thread is created at bootup, which periodically runs and adjusts group's weight on each cpu. To avoid its overhead, two min/max tunables are introduced (under SCHED_DEBUG) to control the rate at which it runs. Fixes from: Peter Zijlstra <a.p.zijlstra@chello.nl> - don't start the load_balance_monitor when there is only a single cpu. - rename the kthread because its currently longer than TASK_COMM_LEN Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25	sched: introduce a mutex and corresponding API to serialize access to doms_curarray	Srivatsa Vaddagiri	1	-0/+19
	doms_cur[] array represents various scheduling domains which are mutually exclusive. Currently cpusets code can modify this array (by calling partition_sched_domains()) as a result of user modifying sched_load_balance flag for various cpusets. This patch introduces a mutex and corresponding API (only when CONFIG_FAIR_GROUP_SCHED is defined) which allows a reader to safely read the doms_cur[] array w/o worrying abt concurrent modifications to the array. The fair group scheduler code (introduced in next patch of this series) makes use of this mutex to walk thr' doms_cur[] array while rebalancing shares of task groups across cpus. Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25	sched: group scheduling, change how cpu load is calculated	Srivatsa Vaddagiri	3	-20/+40
	This patch changes how the cpu load exerted by fair_sched_class tasks is calculated. Load exerted by fair_sched_class tasks on a cpu is now a summation of the group weights, rather than summation of task weights. Weight exerted by a group on a cpu is dependent on the shares allocated to it. This version of patch has a minor impact on code size, but should have no runtime/functional impact for !CONFIG_FAIR_GROUP_SCHED. Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25	sched: group scheduling, minor fixes	Srivatsa Vaddagiri	2	-9/+28
	Minor bug fixes for the group scheduler: - Use a mutex to serialize add/remove of task groups and also when changing shares of a task group. Use the same mutex when printing cfs_rq debugging stats for various task groups. - Use list_for_each_entry_rcu in for_each_leaf_cfs_rq macro (when walking task group list) Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25	sched: group scheduling code cleanup	Srivatsa Vaddagiri	1	-18/+3
	Minor cleanups: - Fix coding style - remove obsolete comment Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25	sched: remove printk_clock references from ia64	Ingo Molnar	2	-15/+0
	remove remaining printk_clock references from ia64. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25	sched: remove printk_clock()	Ingo Molnar	3	-43/+0
	printk_clock() is obsolete - it has been replaced with cpu_clock(). Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25	sched: fix CONFIG_PRINT_TIME's reliance on sched_clock()	Ingo Molnar	1	-1/+1
	Stefano Brivio reported weird printk timestamp behavior during CPU frequency changes: http://bugzilla.kernel.org/show_bug.cgi?id=9475 fix CONFIG_PRINT_TIME's reliance on sched_clock() and use cpu_clock() instead. Reported-and-bisected-by: Stefano Brivio <stefano.brivio@polimi.it> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25	printk: make printk more robust by not allowing recursion	Ingo Molnar	1	-10/+38
	make printk more robust by allowing recursion only if there's a crash going on. Also add recursion detection. I've tested it with an artificially injected printk recursion - instead of a lockup or spontaneous reboot or other crash, the output was a well controlled: [ 41.057335] SysRq : <2>BUG: recent printk recursion! [ 41.057335] loglevel0-8 reBoot Crashdump show-all-locks(D) tErm Full kIll saK showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync showTasks Unmount shoW-blocked-tasks also do all this printk-debug logic with irqs disabled. Signed-off-by: Ingo Molnar <mingo@elte.hu> Reviewed-by: Nick Piggin <npiggin@suse.de>
2008-01-25	slab: fix bootstrap on memoryless node	Pekka Enberg	1	-23/+23
	If the node we're booting on doesn't have memory, bootstrapping kmalloc() caches resorts to fallback_alloc() which requires ->nodelists set for all nodes. Fix that by calling set_up_list3s() for CACHE_CACHE in kmem_cache_init(). As kmem_getpages() is called with GFP_THISNODE set, this used to work before because of breakage in 2.6.22 and before with GFP_THISNODE returning pages from the wrong node if a node had no memory. So it may have worked accidentally and in an unsafe manner because the pages would have been associated with the wrong node which could trigger bug ons and locking troubles. Tested-by: Mel Gorman <mel@csn.ul.ie> Tested-by: Olaf Hering <olaf@aepfle.de> Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> [ With additional one-liner by Olaf Hering - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-25	fix oops on rmmod capidrv	Karsten Keil	1	-4/+5
	Fix overwriting the stack with the version string (it is currently 10 bytes + zero) when unloading the capidrv module. Safeguard against overwriting it should the version string grow in the future. Should fix Kernel Bug Tracker Bug 9696. Signed-off-by: Gerd v. Egidy <gerd.von.egidy@intra2net.com> Acked-by: Karsten Keil <kkeil@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-25	[GFS2] Allow journal recovery on read-only mount	Abhijith Das	1	-4/+12
	This patch allows gfs2 to perform journal recovery even if it is mounted read-only. Strictly speaking, a read-only mount should not be writing to the filesystem, but we do this only to perform journal recovery. A read-only mount will fail if we don't recover the dirty journal. Also, when gfs2 is used as a root filesystem, it will be mounted read-only before being mounted read-write during the boot sequence. A failed read-only mount will panic the machine during bootup. Signed-off-by: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25	[GFS2] Lockup on error	Bob Peterson	1	-1/+1
	I spotted this bug while I was digging around. Looks like it could cause a lockup in some rare error condition. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25	[GFS2] Fix page_mkwrite truncation race path	Steven Whitehouse	1	-0/+1
	There was a bug in the truncation/invalidation race path for ->page_mkwrite for gfs2. It ought to return 0 so that the effect is the same as if the page was truncated at any of the other points at which the page_lock is dropped. This will result in the restart of the whole page fault path. If it was due to a real truncation (as opposed to an invalidate because we let a glock go) then the ->fault path will pick that up when it gets called again. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25	[GFS2] Fix typo	Bob Peterson	1	-1/+1
	This patch fixes a minor typo. Surprisingly, it still compiled. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25	[GFS2] Fix write alloc required shortcut calculation	Steven Whitehouse	1	-2/+2
	The comparison was being made against the wrong quantity. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25	[GFS2] gfs2_alloc_required performance	Bob Peterson	1	-0/+5
	This is a small I/O performance enhancement to gfs2. (Actually, it is a rework of an earlier version I got wrong). The idea here is to check if the write extends past the last block in the file. If so, the function can save itself a lot of time and trouble because it knows an allocate will be required. Benchmarks like iozone should see better performance. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25	[GFS2] Remove unneeded i_spin	Bob Peterson	2	-2/+0
	This patch removes a vestigial variable "i_spin" from the gfs2_inode structure. This not only saves us memory (>300000 of these in memory for the oom test) it also saves us time because we don't have to spend time initializing it (i.e. slightly better performance). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25	[GFS2] Reduce inode size by moving i_alloc out of line	Steven Whitehouse	12	-36/+39
	It is possible to reduce the size of GFS2 inodes by taking the i_alloc structure out of the gfs2_inode. This patch allocates the i_alloc structure whenever its needed, and frees it afterward. This decreases the amount of low memory we use at the expense of requiring a memory allocation for each page or partial page that we write. A quick test with postmark shows that the overhead is not measurable and I also note that OCFS2 use the same approach. In the future I'd like to solve the problem by shrinking down the size of the members of the i_alloc structure, but for now, this reduces the immediate problem of using too much low-memory on x86 and doesn't add too much overhead. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25	[GFS2] Fix assert in log code	Steven Whitehouse	1	-7/+4
	Although the values were all being calculated correctly, there was a race in the assert due to the way it was using atomic variables. This changes the value we assert on so that we get the same effect by testing a different variable. This prevents the assert triggering when it shouldn't. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25	[GFS2] Fix problems relating to execution of files on GFS2	Steven Whitehouse	2	-9/+16
	This patch fixes a couple of problems which affected the execution of files on GFS2. The first is that there was a corner case where inodes were not always uptodate at the point at which permissions checks were being carried out, this was resulting in refusal of execute permission, but only on the first lookup, subsequent requests worked correctly. The second was a problem relating to incorrect updating of file sizes which was introduced with the write_begin/end code for GFS2 a little while back. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
2008-01-25	[GFS2] Initialize extent_list earlier	Bob Peterson	2	-1/+1
	Here is a patch for the latest upstream GFS2 code: The journal extent map needs to be initialized sooner than it currently is. Otherwise failed mount attempts (e.g. not enough journals, etc.) may panic trying to access the uninitialized list. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25	[GFS2] Allow page migration for writeback and ordered pages	Steven Whitehouse	1	-0/+2
	To improve performance on NUMA, we use the VM's standard page migration for writeback and ordered pages. Probably we could also do the same for journaled data, but that would need a careful audit of the code, so will be the subject of a later patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25	[GFS2] Remove unused variable	Steven Whitehouse	1	-1/+0
	The go_drop_th function is never called or referenced. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25	[GFS2] Fix log block mapper	Steven Whitehouse	1	-1/+1
	A missing offset in the calculation. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25	[GFS2] Minor correction	Bob Peterson	1	-1/+1
	This is a small correction to my previously posted patch1. It just changes a divide to a shift. It's faster and doesn't introduce odd dependencies on 32-bit compiles. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25	[GFS2] Eliminate the no longer needed sd_statfs_mutex	Bob Peterson	3	-6/+0
	This patch eliminates the unneeded sd_statfs_mutex mutex but preserves the ordering as discussed. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25	[GFS2] Incremental patch to fix compiler warning	Bob Peterson	1	-5/+2
	Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25	[GFS2] Function meta_read optimization	Bob Peterson	1	-6/+7
	This patch optimizes function gfs2_meta_read. Basically, gfs2_meta_wait was being called regardless of whether a disk read was requested. This just pulls that wait into the if that triggers the read. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25	[GFS2] Only fetch the dinode once in block_map	Bob Peterson	1	-7/+7
	Function gfs2_block_map was often looking up the disk inode twice. This optimizes it so that only does it once. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25	[GFS2] Reorganize function gfs2_glmutex_lock	Bob Peterson	1	-14/+9
	This patch optimizes the function gfs2_glmutex_lock. The basic theory is: Why bother initializing a holder, setting up wait bits and then waiting on them, if you know the glock can be yours. So the holder stuff is placed inside the if checking if the glock is locked. This one needs careful scrutiny because changing anything to do with locking should strike terror into one's heart. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25	[GFS2] Run through full bitmaps quicker in gfs2_bitfit	Bob Peterson	1	-25/+29
	I eliminated the passing of an unused parameter into gfs2_bitfit called rgd. This also changes the gfs2_bitfit code that searches for free (or used) blocks. Before, the code was trying to check for bytes that indicated 4 blocks in the undesired state. The problem is, it was spending more time trying to do this than it actually was saving. This version only optimizes the case where we're looking for free blocks, and it checks a machine word at a time. So on 32-bit machines, it will check 32-bits (16 blocks) and on 64-bit machines, it will check 64-bits (32 blocks) at a time. The compiler optimizes that quite well and we save some time, especially when running through full bitmaps (like the bitmaps allocated for the journals). There's probably a more elegant or optimized way to do this, but I haven't thought of it yet. I'm open to suggestions. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25	[GFS2] Get rid of useless "found" variable in quota.c	Bob Peterson	1	-4/+2
	This just eliminates an unused variable from the quota code. Not likely to be a time saver. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25	[GFS2] Journal extent mapping	Bob Peterson	4	-17/+97
	This patch saves a little time when gfs2 writes to the journals by keeping a mapping between logical and physical blocks on disk. That's better than constantly looking up indirect pointers in buffers, when the journals are several levels of indirection (which they typically are). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25	[GFS2] Remove function gfs2_get_block	Bob Peterson	8	-35/+17
	This patch is just a cleanup. Function gfs2_get_block() just calls function gfs2_block_map reversing the last two parameters. By reversing the parameters, gfs2_block_map() may be called directly and function gfs2_get_block may be eliminated altogether. Since this function is done for every block operation, this streamlines the code and makes it a little bit more efficient. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25	[GFS2] use pid for plock owner for nfs clients	David Teigland	1	-4/+14
	The fl_owner is that of lockd when posix locks arrive from nfs clients, so it can't be used to distinguish between lock holders. Use fl_pid as owner instead; it's the pid of the process on the nfs client. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25	[GFS2] Remove unused variable	Steven Whitehouse	1	-1/+0
	Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25	[GFS2] patch to check for recursive lock requests in gfs2_rename code path	Abhijith Das	1	-8/+25
	A certain scenario in the rename code path triggers a kernel BUG() because it accidentally does recursive locking The first lock is requested to unlink an already existing inode (replacing a file) and the second lock is requested when the destination directory needs to alloc some space. It is rare that these two events happen during the same rename call, and even more rare that these two instances try to lock the same rgrp. It is, however, possible. https://bugzilla.redhat.com/show_bug.cgi?id=404711 Signed-off-by: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25	[GFS2] Remove lock methods for lock_nolock protocol	Wendy Cheng	3	-14/+35
	GFS2 supports two modes of locking - lock_nolock for single node filesystem and lock_dlm for cluster mode locking. The gfs2 lock methods are removed from file operation table for lock_nolock protocol. This would allow VFS to handle posix lock and flock logics just like other in-tree filesystems without duplication. Signed-off-by: S. Wendy Cheng <wcheng@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>