aboutsummaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2008-07-10[GFS2] Remove unused declarationLi Xiaodong1-1/+0
The implementation of gfs2_inode_attr_in is removed. So remove its declaration. Signed-off-by: Li Xiaodong <lixd@cn.fujitsu.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-07-10[GFS2] Remove support for unused and pointless flagSteven Whitehouse5-26/+2
The ability to mark files for direct i/o access when opened normally is both unused and pointless, so this patch removes support for that feature. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-07-10[GFS2] Replace rgrp "recent list" with mru listSteven Whitehouse3-99/+12
This patch removes the "recent list" which is used during allocation and replaces it with the (already existing) mru list used during deletion. The "recent list" was not a true mru list leading to a number of inefficiencies including a "next" function which made scanning the list an order N^2 operation wrt to the number of list elements. This should increase allocation performance with large numbers of rgrps. Its also a useful preparation and cleanup before some further changes which are planned in this area. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-07-07[GFS2] Allow local DF locks when holding a cached EX glockSteven Whitehouse1-2/+6
We already allow local SH locks while we hold a cached EX glock, so here we allow DF locks as well. This works only because we rely on the VFS's invalidation for locally cached data, and because if we hold an EX lock, then we know that no other node can be caching data relating to this file. It dramatically speeds up initial writes to O_DIRECT files since we fall back to buffered I/O for this and would otherwise bounce between DF and EX modes on each and every write call. The lessons to be learned from that are to ensure that (for the time being anyway) O_DIRECT files are preallocated and that they are written to using reasonably large I/O sizes. Even so this change fixes that corner case nicely Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-07-07[GFS2] Fix delayed demote raceSteven Whitehouse1-1/+4
There is a race in the delayed demote code where it does the wrong thing if a demotion to UN has occurred for other reasons before the delay has expired. This patch adds an assert to catch that condition as well as fixing the root cause by adding an additional check for the UN state. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com>
2008-07-03[GFS2] don't call permission()Miklos Szeredi4-13/+30
GFS2 calls permission() to verify permissions after locks on the files have been taken. For this it's sufficient to call gfs2_permission() instead. This results in the following changes: - IS_RDONLY() check is not performed - IS_IMMUTABLE() check is not performed - devcgroup_inode_permission() is not called - security_inode_permission() is not called IS_RDONLY() should be unnecessary anyway, as the per-mount read-only flag should provide protection against read-only remounts during operations. do_gfs2_set_flags() has been fixed to perform mnt_want_write()/mnt_drop_write() to protect against remounting read-only. IS_IMMUTABLE has been added to gfs2_permission() Repeating the security checks seems to be pointless, as they don't normally change, and if they do, it's independent of the filesystem state. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-06-27[GFS2] Fix module buildingSteven Whitehouse2-2/+0
Two lines missed from the previous patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-06-27[GFS2] Glock documentationSteven Whitehouse1-0/+114
This patch adds a file describing the internals of GFS2's glock abstraction. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-06-27[GFS2] Remove all_list from lock_dlmSteven Whitehouse3-30/+1
I discovered that we had a list onto which every lock_dlm lock was being put. Its only function was to discover whether we'd got any locks left after umount. Since there was already a counter for that purpose as well, I removed the list. The saving is sizeof(struct list_head) per glock - well worth having. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-06-27[GFS2] Remove obsolete conversion deadlock avoidance codeSteven Whitehouse2-24/+1
This is only used by GFS1 so can be removed. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-06-27[GFS2] Remove remote lock dropping codeSteven Whitehouse11-73/+6
There are several reasons why this is undesirable: 1. It never happens during normal operation anyway 2. If it does happen it causes performance to be very, very poor 3. It isn't likely to solve the original problem (memory shortage on remote DLM node) it was supposed to solve 4. It uses a bunch of arbitrary constants which are unlikely to be correct for any particular situation and for which the tuning seems to be a black art. 5. In an N node cluster, only 1/N of the dropped locked will actually contribute to solving the problem on average. So all in all we are better off without it. This also makes merging the lock_dlm module into GFS2 a bit easier. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-06-27[GFS2] kernel panic mounting volumeBob Peterson1-1/+1
This patch fixes Red Hat bugzilla bug 450156. This started with a not-too-improbable mount failure because the locking protocol was never set back to its proper "lock_dlm" after the system was rebooted in the middle of a gfs2_fsck. That left a (purposely) invalid locking protocol in the superblock, which caused an error when the file system was mounted the next time. When there's an error mounting, vfs calls DQUOT_OFF, which calls vfs_quota_off which calls gfs2_sync_fs. Next, gfs2_sync_fs calls gfs2_log_flush passing s_fs_info. But due to the error, s_fs_info had been previously set to NULL, and so we have the kernel oops. My solution in this patch is to test for the NULL value before passing it. I tested this patch and it fixes the problem. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-06-27[GFS2] Revise readpage lockingSteven Whitehouse1-13/+16
The previous attempt to fix the locking in readpage failed due to the use of a "try lock" which resulted in occasional high cpu usage during testing (due to repeated tries) and also it did not resolve all the ordering problems wrt the transaction lock (although it did solve all the inode lock ordering problems). This patch avoids the problem by unlocking the page and getting the locks in the correct order. This means that we have to retest the page to ensure that it hasn't changed when we relock the page. This now passes the tests which were previously failing. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-06-27[GFS2] Fix ordering of args for list_addSteven Whitehouse1-1/+1
The patch to remove lock_nolock managed to get the arguments of this list_add backwards. This fixes it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-06-27[GFS2] trivial sparse lock annotationsHarvey Harrison2-0/+4
Annotate the &sdp->sd_log_lock. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-06-27[GFS2] No lock_nolockSteven Whitehouse8-266/+71
This patch merges the lock_nolock module into GFS2 itself. As well as removing some of the overhead of the module, it also means that its now impossible to build GFS2 without a lock module (which would be a pointless thing to do anyway). We also plan to merge lock_dlm into GFS2 in the future, but that is a more tricky task, and will therefore be a separate patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: David Teigland <teigland@redhat.com>
2008-06-27[GFS2] Fix ordering bug in lock_dlmSteven Whitehouse4-387/+306
This looks like a lot of change, but in fact its not. Mostly its things moving from one file to another. The change is just that instead of queuing lock completions and callbacks from the DLM we now pass them directly to GFS2. This gives us a net loss of two list heads per glock (a fair saving in memory) plus a reduction in the latency of delivering the messages to GFS2, plus we now have one thread fewer as well. There was a bug where callbacks and completions could be delivered in the wrong order due to this unnecessary queuing which is fixed by this patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com>
2008-06-27[GFS2] Clean up the glock coreSteven Whitehouse13-1049/+736
This patch implements a number of cleanups to the core of the GFS2 glock code. As a result a lot of code is removed. It looks like a really big change, but actually a large part of this patch is either removing or moving existing code. There are some new bits too though, such as the new run_queue() function which is considerably streamlined. Highlights of this patch include: o Fixes a cluster coherency bug during SH -> EX lock conversions o Removes the "glmutex" code in favour of a single bit lock o Removes the ->go_xmote_bh() for inodes since it was duplicating ->go_lock() o We now only use the ->lm_lock() function for both locks and unlocks (i.e. unlock is a lock with target mode LM_ST_UNLOCKED) o The fast path is considerably shortly, giving performance gains especially with lock_nolock o The glock_workqueue is now used for all the callbacks from the DLM which allows us to simplify the lock_dlm module (see following patch) o The way is now open to make further changes such as eliminating the two threads (gfs2_glockd and gfs2_scand) in favour of a more efficient scheme. This patch has undergone extensive testing with various test suites so it should be pretty stable by now. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com>
2008-06-24Linux 2.6.26-rc8Linus Torvalds1-1/+1
2008-06-24Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6Linus Torvalds3-5/+2
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: [IA64] Eliminate NULL test after alloc_bootmem in iosapic_alloc_rte() [IA64] Handle count==0 in sn2_ptc_proc_write() [IA64] Fix boot failure on ia64/sn2
2008-06-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixesLinus Torvalds2-15/+10
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes: [GFS2] fix gfs2 block allocation (cleaned up) [GFS2] BUG: unable to handle kernel paging request at ffff81002690e000
2008-06-24Merge branch 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvmLinus Torvalds18-266/+358
* 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: KVM: Remove now unused structs from kvm_para.h x86: KVM guest: Use the paravirt clocksource structs and functions KVM: Make kvm host use the paravirt clocksource structs x86: Make xen use the paravirt clocksource structs and functions x86: Add structs and functions for paravirt clocksource KVM: VMX: Fix host msr corruption with preemption enabled KVM: ioapic: fix lost interrupt when changing a device's irq KVM: MMU: Fix oops on guest userspace access to guest pagetable KVM: MMU: large page update_pte issue with non-PAE 32-bit guests (resend) KVM: MMU: Fix rmap_write_protect() hugepage iteration bug KVM: close timer injection race window in __vcpu_run KVM: Fix race between timer migration and vcpu migration
2008-06-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdogLinus Torvalds1-1/+0
* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog: Revert "[WATCHDOG] hpwdt: Add CFLAGS to get driver working"
2008-06-24Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tipLinus Torvalds6-77/+27
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: xen: remove support for non-PAE 32-bit
2008-06-24Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdbLinus Torvalds2-15/+8
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb: kgdb: sparse fix kgdb: documentation update - remove kgdboe
2008-06-24enable bus mastering on i915 at resume timeJie Luo1-0/+1
On 9xx chips, bus mastering needs to be enabled at resume time for much of the chip to function. With this patch, vblank interrupts will work as expected on resume, along with other chip functions. Fixes kernel bugzilla #10844. Signed-off-by: Jie Luo <clotho67@gmail.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-24KVM: Remove now unused structs from kvm_para.hGerd Hoffmann1-18/+0
The kvm_* structs are obsoleted by the pvclock_* ones. Now all users have been switched over and the old structs can be dropped. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24x86: KVM guest: Use the paravirt clocksource structs and functionsGerd Hoffmann2-56/+34
This patch updates the kvm host code to use the pvclock structs and functions, thereby making it compatible with Xen. The patch also fixes an initialization bug: on SMP systems the per-cpu has two different locations early at boot and after CPU bringup. kvmclock must take that in account when registering the physical address within the host. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24KVM: Make kvm host use the paravirt clocksource structsGerd Hoffmann2-14/+65
This patch updates the kvm host code to use the pvclock structs. It also makes the paravirt clock compatible with Xen. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24x86: Make xen use the paravirt clocksource structs and functionsGerd Hoffmann3-124/+16
This patch updates the xen guest to use the pvclock structs and helper functions. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24x86: Add structs and functions for paravirt clocksourceGerd Hoffmann5-0/+201
This patch adds structs for the paravirt clocksource ABI used by both xen and kvm (pvclock-abi.h). It also adds some helper functions to read system time and wall clock time from a paravirtual clocksource (pvclock.[ch]). They are based on the xen code. They are enabled using CONFIG_PARAVIRT_CLOCK. Subsequent patches of this series will put the code in use. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24[GFS2] fix gfs2 block allocation (cleaned up)Benjamin Marzinski1-14/+9
This patch fixes bz 450641. This patch changes the computation for zero_metapath_length(), which it renames to metapath_branch_start(). When you are extending the metadata tree, The indirect blocks that point to the new data block must either diverge from the existing tree either at the inode, or at the first indirect block. They can diverge at the first indirect block because the inode has room for 483 pointers while the indirect blocks have room for 509 pointers, so when the tree is grown, there is some free space in the first indirect block. What metapath_branch_start() now computes is the height where the first indirect block for the new data block is located. It can either be 1 (if the indirect block diverges from the inode) or 2 (if it diverges from the first indirect block). Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-06-24[IA64] Eliminate NULL test after alloc_bootmem in iosapic_alloc_rte()Julia Lawall1-2/+0
As noted by Akinobu Mita alloc_bootmem and related functions never return NULL and always return a zeroed region of memory. Thus a NULL test or memset after calls to these functions is unnecessary. Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-06-24[IA64] Handle count==0 in sn2_ptc_proc_write()Cliff Wickman1-1/+1
The fix applied in e0c6d97c65e0784aade7e97b9411f245a6c543e7 "security hole in sn2_ptc_proc_write" didn't take into account the case where count==0 (which results in a buffer underrun when adding the trailing '\0'). Thanks to Andi Kleen for pointing this out. Signed-off-by: Cliff Wickman <cpw@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-06-24[IA64] Fix boot failure on ia64/sn2Jes Sorensen1-2/+1
Call check_sal_cache_flush() after platform_setup() as check_sal_cache_flush() now relies on being able to call platform vector code. Problem was introduced by: 3463a93def55c309f3c0d0a8aaf216be3be42d64 "Update check_sal_cache_flush to use platform_send_ipi()" Signed-off-by: Jes Sorensen <jes@sgi.com> Tested-by: Alex Chiang: <achiang@hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-06-24kgdb: sparse fixJason Wessel1-1/+2
- Fix warning reported by sparse kernel/kgdb.c:1502:6: warning: symbol 'kgdb_console_write' was not declared. Should it be static? Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2008-06-24kgdb: documentation update - remove kgdboeJason Wessel1-14/+6
kgdboe is not presently included kgdb, and there should be no references to it. Also fix the tcp port terminal connection example. Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2008-06-24xen: remove support for non-PAE 32-bitJeremy Fitzhardinge6-77/+27
Non-PAE operation has been deprecated in Xen for a while, and is rarely tested or used. xen-unstable has now officially dropped non-PAE support. Since Xen/pvops' non-PAE support has also been broken for a while, we may as well completely drop it altogether. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-24[GFS2] BUG: unable to handle kernel paging request at ffff81002690e000Bob Peterson1-1/+1
This patch fixes bugzilla bug bz448866: gfs2: BUG: unable to handle kernel paging request at ffff81002690e000. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-06-24Revert "[WATCHDOG] hpwdt: Add CFLAGS to get driver working"Wim Van Sebroeck1-1/+0
After Linus fixed the inline assembly, the CFLAGS option is not needed anymore. Signed-off-by: Thomas Mingarelli <Thomas.Mingarelli@hp.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2008-06-24KVM: VMX: Fix host msr corruption with preemption enabledAvi Kivity1-8/+11
Switching msrs can occur either synchronously as a result of calls to the msr management functions (usually in response to the guest touching virtualized msrs), or asynchronously when preempting a kvm thread that has guest state loaded. If we're unlucky enough to have the two at the same time, host msrs are corrupted and the machine goes kaput on the next syscall. Most easily triggered by Windows Server 2008, as it does a lot of msr switching during bootup. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24KVM: ioapic: fix lost interrupt when changing a device's irqAvi Kivity1-20/+11
The ioapic acknowledge path translates interrupt vectors to irqs. It currently uses a first match algorithm, stopping when it finds the first redirection table entry containing the vector. That fails however if the guest changes the irq to a different line, leaving the old redirection table entry in place (though masked). Result is interrupts not making it to the guest. Fix by always scanning the entire redirection table. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24KVM: MMU: Fix oops on guest userspace access to guest pagetableAvi Kivity1-6/+0
KVM has a heuristic to unshadow guest pagetables when userspace accesses them, on the assumption that most guests do not allow userspace to access pagetables directly. Unfortunately, in addition to unshadowing the pagetables, it also oopses. This never triggers on ordinary guests since sane OSes will clear the pagetables before assigning them to userspace, which will trigger the flood heuristic, unshadowing the pagetables before the first userspace access. One particular guest, though (Xenner) will run the kernel in userspace, triggering the oops. Since the heuristic is incorrect in this case, we can simply remove it. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24KVM: MMU: large page update_pte issue with non-PAE 32-bit guests (resend)Marcelo Tosatti1-5/+7
kvm_mmu_pte_write() does not handle 32-bit non-PAE large page backed guests properly. It will instantiate two 2MB sptes pointing to the same physical 2MB page when a guest large pte update is trapped. Instead of duplicating code to handle this, disallow directory level updates to happen through kvm_mmu_pte_write(), so the two 2MB sptes emulating one guest 4MB pte can be correctly created by the page fault handling path. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24KVM: MMU: Fix rmap_write_protect() hugepage iteration bugMarcelo Tosatti1-0/+1
rmap_next() does not work correctly after rmap_remove(), as it expects the rmap chains not to change during iteration. Fix (for now) by restarting iteration from the beginning. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24KVM: close timer injection race window in __vcpu_runMarcelo Tosatti4-3/+9
If a timer fires after kvm_inject_pending_timer_irqs() but before local_irq_disable() the code will enter guest mode and only inject such timer interrupt the next time an unrelated event causes an exit. It would be simpler if the timer->pending irq conversion could be done with IRQ's disabled, so that the above problem cannot happen. For now introduce a new vcpu requests bit to cancel guest entry. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24KVM: Fix race between timer migration and vcpu migrationMarcelo Tosatti1-12/+3
A guest vcpu instance can be scheduled to a different physical CPU between the test for KVM_REQ_MIGRATE_TIMER and local_irq_disable(). If that happens, the timer will only be migrated to the current pCPU on the next exit, meaning that guest LAPIC timer event can be delayed until a host interrupt is triggered. Fix it by cancelling guest entry if any vcpu request is pending. This has the side effect of nicely consolidating vcpu->requests checks. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-23alpha: fix compile error in arch/alpha/mm/init.cThorsten Kranzkowski1-0/+2
Commit 9267b4b3880d00dc2dab90f1d817c856939114f7 ("alpha: fix module load failures on smp (bug #10926)") causes a regression for my ev4 uniprocessor build: CC arch/alpha/mm/init.o /export/data/repositories/linux-2.6/arch/alpha/mm/init.c:34: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘typeof’ make[2]: *** [arch/alpha/mm/init.o] Error 1 make[1]: *** [arch/alpha/mm] Error 2 make: *** [sub-make] Error 2 This fixes it for me (compile and boot tested): Signed-off-by: Thorsten Kranzkowski <dl8bcu@dl8bcu.de> Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-23Merge branch 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds3-37/+51
* 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: nfs_updatepage(): don't mark page as dirty if an error occurred NFS: Fix filehandle size comparisons in the mount code NFS: Reduce the NFS mount code stack usage.
2008-06-23NFS: nfs_updatepage(): don't mark page as dirty if an error occurredTrond Myklebust1-3/+4
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>