aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/tools/perf/scripts/python/syscall-counts.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2011-04-25add hlist_bl_lock/unlock helpersChristoph Hellwig3-20/+19
Now that the whole dcache_hash_bucket crap is gone, go all the way and also remove the weird locking layering violations for locking the hash buckets. Add hlist_bl_lock/unlock helpers to move the locking into the list abstraction instead of requiring each caller to open code it. After all allowing for the bit locks is the whole point of these helpers over the plain hlist variant. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-25bit_spinlock: don't play preemption games inside the busy loopLinus Torvalds1-4/+4
When we are waiting for the bit-lock to be released, and are looping over the 'cpu_relax()' should not be doing anything else - otherwise we miss the point of trying to do the whole 'cpu_relax()'. Do the preemption enable/disable around the loop, rather than inside of it. Noticed when I was looking at the code generation for the dcache __d_drop usage, and the code just looked very odd. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-25eCryptfs: Flush dirty pages in setattrTyler Hicks1-0/+6
After 57db4e8d73ef2b5e94a3f412108dff2576670a8a changed eCryptfs to write-back caching, eCryptfs page writeback updates the lower inode times due to the use of vfs_write() on the lower file. To preserve inode metadata changes, such as 'cp -p' does with utimensat(), we need to flush all dirty pages early in ecryptfs_setattr() so that the user-updated lower inode metadata isn't clobbered later in writeback. https://bugzilla.kernel.org/show_bug.cgi?id=33372 Reported-by: Rocko <rockorequin@hotmail.com> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2011-04-25eCryptfs: Handle failed metadata read in lookupTyler Hicks4-16/+28
When failing to read the lower file's crypto metadata during a lookup, eCryptfs must continue on without throwing an error. For example, there may be a plaintext file in the lower mount point that the user wants to delete through the eCryptfs mount. If an error is encountered while reading the metadata in lookup(), the eCryptfs inode's size could be incorrect. We must be sure to reread the plaintext inode size from the metadata when performing an open() or setattr(). The metadata is already being read in those paths, so this adds minimal performance overhead. This patch introduces a flag which will track whether or not the plaintext inode size has been read so that an incorrect i_size can be fixed in the open() or setattr() paths. https://bugs.launchpad.net/bugs/509180 Cc: <stable@kernel.org> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2011-04-25eCryptfs: Add reference counting to lower filesTyler Hicks6-59/+92
For any given lower inode, eCryptfs keeps only one lower file open and multiplexes all eCryptfs file operations through that lower file. The lower file was considered "persistent" and stayed open from the first lookup through the lifetime of the inode. This patch keeps the notion of a single, per-inode lower file, but adds reference counting around the lower file so that it is closed when not currently in use. If the reference count is at 0 when an operation (such as open, create, etc.) needs to use the lower file, a new lower file is opened. Since the file is no longer persistent, all references to the term persistent file are changed to lower file. Locking is added around the sections of code that opens the lower file and assign the pointer in the inode info, as well as the code the fputs the lower file when all eCryptfs users are done with it. This patch is needed to fix issues, when mounted on top of the NFSv3 client, where the lower file is left silly renamed until the eCryptfs inode is destroyed. Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2011-04-25eCryptfs: dput dentries returned from dget_parentTyler Hicks1-2/+2
Call dput on the dentries previously returned by dget_parent() in ecryptfs_rename(). This is needed for supported eCryptfs mounts on top of the NFSv3 client. Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2011-04-25eCryptfs: Remove extra d_delete in ecryptfs_rmdirTyler Hicks1-2/+0
vfs_rmdir() already calls d_delete() on the lower dentry. That was being duplicated in ecryptfs_rmdir() and caused a NULL pointer dereference when NFSv3 was the lower filesystem. Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2011-04-24libata: ahci_start_engine compliant to AHCI specJian Peng1-0/+21
At the end of section 10.1 of AHCI spec (rev 1.3), it states Software shall not set PxCMD.ST to 1 until it is determined that a functoinal device is present on the port as determined by PxTFD.STS.BSY=0, PxTFD.STS.DRQ=0 and PxSSTS.DET=3h Even though most AHCI host controller works without this check, specific controller will fail under this condition. Signed-off-by: Jian Peng <jipeng2005@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-04-24ata: pata_at91.c bugfix for initial_timing initialisationIgor Plyatov1-2/+12
The "struct ata_timing" must contain 10 members, but ".dmack_hold" member was forgotten for "initial_timing" initialisation. This patch fixes such a problem. Signed-off-by: Igor Plyatov <plyatov@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-04-24ata: pata_at91.c bugfix for high master clockIgor Plyatov1-1/+7
The AT91SAM9 microcontrollers with master clock higher then 105 MHz and PIO0, have overflow of the NCS_RD_PULSE value in the MSB. This lead to "NCS_RD_PULSE" pulse longer then "NRD_CYCLE" pulse and driver does not detect ATA device. Signed-off-by: Igor Plyatov <plyatov@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-04-24ahci: AHCI-mode SATA patch for Intel Panther Point DeviceIDsSeth Heasley1-0/+6
The previously submitted patch was word-wrapped. This patch adds the AHCI-mode SATA DeviceIDs for the Intel Panther Point PCH. Signed-off-by: Seth Heasley <seth.heasley@intel.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-04-24ata_piix: IDE-mode SATA patch for Intel Panther Point DeviceIDsSeth Heasley1-0/+8
The previously submitted patch was word-wrapped. This patch adds the IDE-mode SATA DeviceIDs for the Intel Panther Point PCH. Signed-off-by: Seth Heasley <seth.heasley@intel.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-04-24libata: Pioneer DVR-216D can't do SETXFERJeff Mahoney1-0/+1
Commit 4a5610a04d415ed94af75bb1159d2621d62c8328 fixed an issue with the Pioneer DVR-212D not handling SETXFER correctly. An openSUSE user reported a similar issue with his DVR-216D that the NOSETXFER horkage worked around for him as well. This patch adds the DVR-216D (1.08) to the horkage list for NOSETXFER. The issue was reported at: https://bugzilla.novell.com/show_bug.cgi?id=679143 Reported-by: Volodymyr Kyrychenko <vladimir.kirichenko@gmail.com> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-04-24ahci: don't enable port irq before handler is registeredMaxime Bizon2-3/+16
The ahci_pmp_attach() & ahci_pmp_detach() unmask port irqs, but they are also called during port initialization, before ahci host irq handler is registered. On ce4100 platform, this sometimes triggers "irq 4: nobody cared" message when loading driver. Fixed this by not touching the register if the port is in frozen state, and mark all uninitialized port as frozen. Signed-off-by: Maxime Bizon <mbizon@freebox.fr> Acked-by: Tejun Heo <tj@kernel.org> Cc: stable@kernel.org Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-04-24libata: Implement ATA_FLAG_NO_DIPM and apply it to mcp65Tejun Heo3-3/+6
NVIDIA mcp65 familiy of controllers cause command timeouts when DIPM is used. Implement ATA_FLAG_NO_DIPM and apply it. This problem was reported by Stefan Bader in the following thread. http://thread.gmane.org/gmane.linux.ide/48841 stable: applicable to 2.6.37 and 38. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Stefan Bader <stefan.bader@canonical.com> Cc: stable@kernel.org Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-04-24libata: Kill unused ATA_DFLAG_{H|D}IPM flagsTejun Heo1-2/+0
ATA_DFLAG_{H|D}IPM flags are no longer used. Kill them. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-04-24ahci: EM supported message type sysfs attributeHannes Reinecke2-0/+26
This patch adds an sysfs attribute 'em_message_supported' to the ahci host device which prints out the supported enclosure management message types. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-04-24kconfig: Avoid buffer underrun in choice inputBen Hutchings1-1/+1
Commit 40aee729b350 ('kconfig: fix default value for choice input') fixed some cases where kconfig would select the wrong option from a choice with a single valid option and thus enter an infinite loop. However, this broke the test for user input of the form 'N?', because when kconfig selects the single valid option the input is zero-length and the test will read the byte before the input buffer. If this happens to contain '?' (as it will in a mips build on Debian unstable today) then kconfig again enters an infinite loop. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: stable@kernel.org [2.6.17+] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-24vfs: get rid of insane dentry hashing rulesLinus Torvalds2-28/+18
The dentry hashing rules have been really quite complicated for a long while, in odd ways. That made functions like __d_drop() very fragile and non-obvious. In particular, whether a dentry was hashed or not was indicated with an explicit DCACHE_UNHASHED bit. That's despite the fact that the hash abstraction that the dentries use actually have a 'is this entry hashed or not' model (which is a simple test of the 'pprev' pointer). The reason that was done is because we used the normal 'is this entry unhashed' model to mark whether the dentry had _ever_ been hashed in the dentry hash tables, and that logic goes back many years (commit b3423415fbc2: "dcache: avoid RCU for never-hashed dentries"). That, in turn, meant that __d_drop had totally different unhashing logic for the dentry hash table case and for the anonymous dcache case, because in order to use the "is this dentry hashed" logic as a flag for whether it had ever been on the RCU hash table, we had to unhash such a dentry differently so that we'd never think that it wasn't 'unhashed' and wouldn't be free'd correctly. That's just insane. It made the logic really hard to follow, when there were two different kinds of "unhashed" states, and one of them (the one that used "list_bl_unhashed()") really had nothing at all to do with being unhashed per se, but with a very subtle lifetime rule instead. So turn all of it around, and make it logical. Instead of having a DENTRY_UNHASHED bit in d_flags to indicate whether the dentry is on the hash chains or not, use the hash chain unhashed logic for that. Suddenly "d_unhashed()" just uses "list_bl_unhashed()", and everything makes sense. And for the lifetime rule, just use an explicit DENTRY_RCUACCEES bit. If we ever insert the dentry into the dentry hash table so that it is visible to RCU lookup, we mark it DENTRY_RCUACCESS to show that it now needs the RCU lifetime rules. Now suddently that test at dentry free time makes sense too. And because unhashing now is sane and doesn't depend on where the dentry got unhashed from (because the dentry hash chain details doesn't have some subtle side effects), we can re-unify the __d_drop() logic and use common code for the unhashing. Also fix one more open-coded hash chain bit_spin_lock() that I missed in the previous chain locking cleanup commit. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-23vfs: get rid of 'struct dcache_hash_bucket' abstractionLinus Torvalds1-24/+21
It's a useless abstraction for 'hlist_bl_head', and it doesn't actually help anything - quite the reverse. All the users end up having to know about the hlist_bl_head details anyway, using 'struct hlist_bl_node *' etc. So it just makes the code look confusing. And the cost of it is extra '&b->head' syntactic noise, but more importantly it spuriously makes the hash table dentry list look different from the per-superblock DCACHE_DISCONNECTED dentry list. As a result, the code ended up using ad-hoc locking for one case and special helper functions for what is really another totally identical case in the very same function. Make it all look and work the same. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-22SECURITY: Move exec_permission RCU checks into security modulesAndi Kleen5-8/+14
Right now all RCU walks fall back to reference walk when CONFIG_SECURITY is enabled, even though just the standard capability module is active. This is because security_inode_exec_permission unconditionally fails RCU walks. Move this decision to the low level security module. This requires passing the RCU flags down the security hook. This way at least the capability module and a few easy cases in selinux/smack work with RCU walks with CONFIG_SECURITY=y Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-22perf, x86: Update/fix Intel Nehalem cache eventsPeter Zijlstra1-4/+4
Change the Nehalem cache events to use retired memory instruction counters (similar to Westmere), this greatly improves the provided stats. Using: main () { int i; for (i = 0; i < 1000000000; i++) { asm("mov (%%rsp), %%rbx;" "mov %%rbx, (%%rsp);" : : : "rbx"); } } We find: $ perf stat --repeat 10 -e instructions:u -e l1-dcache-loads:u -e l1-dcache-stores:u ./loop_1b_loads+stores Performance counter stats for './loop_1b_loads+stores' (10 runs): 4,000,081,056 instructions:u # 0.000 IPC ( +- 0.000% ) 4,999,502,846 l1-dcache-loads:u ( +- 0.008% ) 1,000,034,832 l1-dcache-stores:u ( +- 0.000% ) 1.565184942 seconds time elapsed ( +- 0.005% ) The 5b is surprising - we'd expect 1b: $ perf stat --repeat 10 -e instructions:u -e r10b:u -e l1-dcache-stores:u ./loop_1b_loads+stores Performance counter stats for './loop_1b_loads+stores' (10 runs): 4,000,081,054 instructions:u # 0.000 IPC ( +- 0.000% ) 1,000,021,961 r10b:u ( +- 0.000% ) 1,000,030,951 l1-dcache-stores:u ( +- 0.000% ) 1.565055422 seconds time elapsed ( +- 0.003% ) Which this patch thus fixes. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Stephane Eranian <eranian@google.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Link: http://lkml.kernel.org/n/tip-q9rtru7b7840tws75xzboapv@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-22perf, x86: P4 PMU - Don't forget to clear cpuc->active_mask on overflowCyrill Gorcunov1-1/+1
It's not enough to simply disable event on overflow the cpuc->active_mask should be cleared as well otherwise counter may stall in "active" even in real being already disabled (which potentially may lead to the situation that user may not use this counter further). Don pointed out that: " I also noticed this patch fixed some unknown NMIs on a P4 when I stressed the box". Tested-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Link: http://lkml.kernel.org/r/1303398203-2918-3-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-22x86, perf event: Turn off unstructured raw event access to offcore registersIngo Molnar1-1/+5
Andi Kleen pointed out that the Intel offcore support patches were merged without user-space tool support to the functionality: | | The offcore_msr perf kernel code was merged into 2.6.39-rc*, but the | user space bits were not. This made it impossible to set the extra mask | and actually do the OFFCORE profiling | Andi submitted a preliminary patch for user-space support, as an extension to perf's raw event syntax: | | Some raw events -- like the Intel OFFCORE events -- support additional | parameters. These can be appended after a ':'. | | For example on a multi socket Intel Nehalem: | | perf stat -e r1b7:20ff -a sleep 1 | | Profile the OFFCORE_RESPONSE.ANY_REQUEST with event mask REMOTE_DRAM_0 | that measures any access to DRAM on another socket. | But this kind of usability is absolutely unacceptable - users should not be expected to type in magic, CPU and model specific incantations to get access to useful hardware functionality. The proper solution is to expose useful offcore functionality via generalized events - that way users do not have to care which specific CPU model they are using, they can use the conceptual event and not some model specific quirky hexa number. We already have such generalization in place for CPU cache events, and it's all very extensible. "Offcore" events measure general DRAM access patters along various parameters. They are particularly useful in NUMA systems. We want to support them via generalized DRAM events: either as the fourth level of cache (after the last-level cache), or as a separate generalization category. That way user-space support would be very obvious, memory access profiling could be done via self-explanatory commands like: perf record -e dram ./myapp perf record -e dram-remote ./myapp ... to measure DRAM accesses or more expensive cross-node NUMA DRAM accesses. These generalized events would work on all CPUs and architectures that have comparable PMU features. ( Note, these are just examples: actual implementation could have more sophistication and more parameter - as long as they center around similarly simple usecases. ) Now we do not want to revert *all* of the current offcore bits, as they are still somewhat useful for generic last-level-cache events, implemented in this commit: e994d7d23a0b: perf: Fix LLC-* events on Intel Nehalem/Westmere But we definitely do not yet want to expose the unstructured raw events to user-space, until better generalization and usability is implemented for these hardware event features. ( Note: after generalization has been implemented raw offcore events can be supported as well: there can always be an odd event that is marginally useful but not useful enough to generalize. DRAM profiling is definitely *not* such a category so generalization must be done first. ) Furthermore, PERF_TYPE_RAW access to these registers was not intended to go upstream without proper support - it was a side-effect of the above e994d7d23a0b commit, not mentioned in the changelog. As v2.6.39 is nearing release we go for the simplest approach: disable the PERF_TYPE_RAW offcore hack for now, before it escapes into a released kernel and becomes an ABI. Once proper structure is implemented for these hardware events and users are offered usable solutions we can revisit this issue. Reported-by: Andi Kleen <ak@linux.intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1302658203-4239-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-22perf: Support Xeon E7's via the Westmere PMU driverAndi Kleen1-0/+1
There's a new model number public, 47, for Xeon E7 (aka Westmere EX). Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: a.p.zijlstra@chello.nl Link: http://lkml.kernel.org/r/1303429715-10202-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-21ide: unexport DISK_EVENT_MEDIA_CHANGE for ide-gd and ide-cdTejun Heo3-2/+12
check_events() implementations in both ide-gd and ide-cd are inadequate for in-kernel event polling. Both generate media change events continuously when certain conditions are met causing infinite event loop between the driver and userland event handler. As disk event now supports suppression of unlisted events, simply de-listing DISK_EVENT_MEDIA_CHANGE from disk->events resolves the problem. Internal handling around media revalidation will behave the same while userland will fall back to userland event polling after detecting the device doesn't support disk events. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Jens Axboe <jaxboe@fusionio.com> Acked-by: "David S. Miller" <davem@davemloft.net> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-21block: don't propagate unlisted DISK_EVENTs to userlandTejun Heo1-2/+6
DISK_EVENT_MEDIA_CHANGE is used for both userland visible event and internal event for revalidation of removeable devices. Some legacy drivers don't implement proper event detection and continuously generate events under certain circumstances. For example, ide-cd generates media changed continuously if there's no media in the drive, which can lead to infinite loop of events jumping back and forth between the driver and userland event handler. This patch updates disk event infrastructure such that it never propagates events not listed in disk->events to userland. Those events are processed the same for internal purposes but uevent generation is suppressed. This also ensures that userland only gets events which are advertised in the @events sysfs node lowering risk of confusion. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-21elevator: check for ELEVATOR_INSERT_SORT_MERGE in !elvpriv case tooJens Axboe1-1/+2
The sort insert is the one that goes to the IO scheduler. With the SORT_MERGE addition, we could bypass IO scheduler setup but still ask the IO scheduler to insert the request. This would cause an oops on switching IO schedulers through the sysfs interface, unless the disk just happened to be idle while it occured. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-21raid5: fix build error, sector_t usageRandy Dunlap1-1/+1
Change <sectors> from unsigned long long to sector_t. This matches its source field. ERROR: "__udivdi3" [drivers/md/raid456.ko] undefined! Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-21vfs: Pass setxattr(2) flags properlyJan Kara1-1/+1
For some reason generic_setxattr() did not pass flags (XATTR_CREATE, XATTR_REPLACE) to the filesystem specific helper. This caused that setxattr(2) syscall just ignored these flags. Fix the bug by passing flags correctly. Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-21virtio: console: Enable call to hvc_remove() on console port removeAmit Shah1-11/+0
This call was disabled as hot-unplugging one virtconsole port led to another virtconsole port freezing. Upon testing it again, this now works, so enable it. In addition, a bug was found in qemu wherein removing a port of one type caused the guest output from another port to stop working. I doubt it was just this bug that caused it (since disabling the hvc_remove() call did allow other ports to continue working), but since it's all solved now, we're fine with hot-unplugging of virtconsole ports. Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-04-21virtio_pci: Prevent double-free of pci regions after device hot-unplugAmit Shah1-7/+8
In the case where a virtio-console port is in use (opened by a program) and a virtio-console device is removed, the port is kept around but all the virtio-related state is assumed to be gone. When the port is finally released (close() called), we call device_destroy() on the port's device. This results in the parent device's structures to be freed as well. This includes the PCI regions for the virtio-console PCI device. Once this is done, however, virtio_pci_release_dev() kicks in, as the last ref to the virtio device is now gone, and attempts to do pci_iounmap(pci_dev, vp_dev->ioaddr); pci_release_regions(pci_dev); pci_disable_device(pci_dev); which results in a double-free warning. Move the code that releases regions, etc., to the virtio_pci_remove() function, and all that's now left in release_dev is the final freeing of the vp_dev. Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-04-21virtio: Decrement avail idx on buffer detachAmit Shah1-0/+1
When detaching a buffer from a vq, the avail.idx value should be decremented as well. This was noticed by hot-unplugging a virtio console port and then plugging in a new one on the same number (re-using the vqs which were just 'disowned'). qemu reported 'Guest moved used index from 0 to 256' when any IO was attempted on the new port. CC: stable@kernel.org Reported-by: juzhang <juzhang@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-04-21intel_iommu: disable all VT-d PMRs when TXT launchedJoseph Cihula1-9/+29
Intel VT-d Protected Memory Regions (PMRs) are supposed to be disabled, on each VT-d engine, after DMA remapping is enabled on the engines. This is because the behavior of having both enabled is not deterministic and because, if TXT has been used to launch the kernel, the PMRs may be programmed to cover memory regions that will be used for DMA. Under some circumstances (certain quirks detected, lack of multiple devices, etc.), the current code does not set up DMA remapping on some VT-d engines. In such cases it also skips disabling the PMRs. This causes failures when the kernel is launched with TXT (most often this occurs on the graphics engine and results in colored vertical bars on the display). This patch detects when the kernel has been launched with TXT and then disables the PMRs on all VT-d engines. In some cases where the reason that remapping is not being enabled is due to possible ACPI DMAR table errors, the VT-d engine addresses may not be correct and thus not able to be safely programmed even to disable PMRs. Because part of the TXT launch process is the verification of these addresses, it will always be safe to disable PMRs if the TXT launch has succeeded and hence only doing this in such cases. Signed-off-by: Joseph Cihula <joseph.cihula@intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2011-04-21UBIFS: fix master node recoveryArtem Bityutskiy1-0/+26
This patch fixes the following symptoms: 1. Unmount UBIFS cleanly. 2. Start mounting UBIFS R/W and have a power cut immediately 3. Start mounting UBIFS R/O, this succeeds 4. Try to re-mount UBIFS R/W - this fails immediately or later on, because UBIFS will write the master node to the flash area which has been written before. The analysis of the problem: 1. UBIFS is unmounted cleanly, both copies of the master node are clean. 2. UBIFS is being mounter R/W, starts changing master node copy 1, and a power cut happens. The copy N1 becomes corrupted. 3. UBIFS is being mounted R/O. It notices the copy N1 is corrupted and reads copy N2. Copy N2 is clean. 4. Because of R/O mode, UBIFS cannot recover copy 1. 5. The mount code (ubifs_mount()) sees that the master node is clean, so it decides that no recovery is needed. 6. We are re-mounting R/W. UBIFS believes no recovery is needed and starts updating the master node, but copy N1 is still corrupted and was not recovered! Fix this problem by marking the master node as dirty every time we recover it and we are in R/O mode. This forces further recovery and the UBIFS cleans-up the corruptions and recovers the copy N1 when re-mounting R/W later. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Cc: stable@kernel.org
2011-04-21UBIFS: fix false assertion warning in case of I/O failuresArtem Bityutskiy1-4/+6
When UBIFS switches to R/O mode because it detects I/O failures, then when we unmount, we still may have allocated budget, and the assertions which verify that we have not budget will fire. But it is expected to have the budget in case of I/O failures, so the assertion warnings will be false. Suppress them for the I/O failure case. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2011-04-21x86, numa: Fix cpu nodemasks for NUMA emulation and CONFIG_DEBUG_PER_CPU_MAPSDavid Rientjes3-33/+20
The cpu<->node mappings under CONFIG_DEBUG_PER_CPU_MAPS=y when NUMA emulation is enabled is currently broken because it does not iterate through every emulated node and bind cpus that have affinity to it. NUMA emulation should bind each cpu to every local node to accurately represent the true NUMA topology of the underlying machine. debug_cpumask_set_cpu() needs to be fixed at the same time so that the debugging information that it emits shows the new cpumask of the node being assigned when the cpu is being added or removed. It can now take responsibility of setting or clearing the cpu itself to remove the need for duplicate code. Also change its last parameter, "enable", to have the correct bool type since it can only be true or false. -v2: Fix the return statements, by Kosaki Motohiro Acked-and-Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: Andreas Herrmann <herrmann.der.user@googlemail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1104201918470.12634@chino.kir.corp.google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-21Revert "x86, NUMA: Fix fakenuma boot failure"David Rientjes1-23/+0
Andreas Herrmann reported that 7d6b46707f24 ("x86, NUMA: Fix fakenuma boot failure") causes certain physical NUMA topologies (for example AMD Magny-Cours) to move sibling cpus to a single node when in reality they are in separate domains. This may result in some nodes being completely void of cpus, which doesn't accurately represent the correct topology. The system will boot, but will have suboptimal NUMA performance. This commit was intended as a fix for NUMA emulation, but should not cause a regression for real NUMA machines as a side effect. ( There will be a separate fix for the numa-debug code, which will not affect physical topologies. ) Reported-by: Andreas Herrmann <herrmann.der.user@googlemail.com> Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1104201918110.12634@chino.kir.corp.google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-21agp: fix arbitrary kernel memory writesVasiliy Kulikov1-3/+8
pg_start is copied from userspace on AGPIOC_BIND and AGPIOC_UNBIND ioctl cmds of agp_ioctl() and passed to agpioc_bind_wrap(). As said in the comment, (pg_start + mem->page_count) may wrap in case of AGPIOC_BIND, and it is not checked at all in case of AGPIOC_UNBIND. As a result, user with sufficient privileges (usually "video" group) may generate either local DoS or privilege escalation. Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-04-21agp: fix OOM and buffer overflowVasiliy Kulikov1-1/+7
page_count is copied from userspace. agp_allocate_memory() tries to check whether this number is too big, but doesn't take into account the wrap case. Also agp_create_user_memory() doesn't check whether alloc_size is calculated from num_agp_pages variable without overflow. This may lead to allocation of too small buffer with following buffer overflow. Another problem in agp code is not addressed in the patch - kernel memory exhaustion (AGPIOC_RESERVE and AGPIOC_ALLOCATE ioctls). It is not checked whether requested pid is a pid of the caller (no check in agpioc_reserve_wrap()). Each allocation is limited to 16KB, though, there is no per-process limit. This might lead to OOM situation, which is not even solved in case of the caller death by OOM killer - the memory is allocated for another (faked) process. Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-04-20ALSA: hda - Fix unused warnings when !SND_HDA_NEEDS_RESUMEMike Waychison1-0/+4
When SND_HDA_NEEDS_RESUME is not defined, the compiler identifies that the following symbols are static but not used: restore_shutup_pins hda_cleanup_all_streams Fix warnings by adding SND_HDA_NEEDS_RESUME guards. Signed-off-by: Mike Waychison <mikew@google.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2011-04-20rtc: fix coh901331 startup crashLinus Walleij1-2/+2
The rtc_device_register() call has changed semantics so that it will immediately call out to rtc_read_alarm() and since the callbacks require the drvdata to be set, we need to set it before the registration call to avoid NULL dereference. Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2011-04-20mach-ux500: fix i2c0 device setup regressionLinus Walleij1-9/+10
Adding two sets of I2C devices to the same bus doesn't quite work, atleast not anymore. Stash one array and determine how much of it shall be added instead. Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2011-04-20xfs: fix duplicate message outputDave Chinner1-1/+3
Commit 957935dc ("xfs: fix xfs_debug warnings" broke the logic in __xfs_printk(). Instead of only printing one of two possible output strings based on whether the fs has a name or not, it outputs both. Fix it to only output one message again. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-04-20UBIFS: fix false space checking failureArtem Bityutskiy1-4/+15
This patch fixes UBIFS mount failure when the debugging support is enabled, we are recovering from a power cut, we were first mounter R/O and we are re-mounting R/W. In this case we should not assume that the amount of free space before we have re-mounted R/W and after are equivalent, because when we have mounted R/O the file-system is in a non-committed state so the amount of free space is slightly smaller, due to the fact that we cannot predict the amount of free space precisely before we commit. This patch fixes the issue by skipping the debugging check in case of recovery. This issue was reported by Caizhiyong <caizhiyong@huawei.com> here: http://thread.gmane.org/gmane.linux.drivers.mtd/34350/focus=34387 Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reported-by: Caizhiyong <caizhiyong@huawei.com> Cc: stable@kernel.org [2.6.30+]
2011-04-20Open with O_CREAT flag set fails to open existing files on non writable directoriesSachin Prabhu1-1/+8
An open on a NFS4 share using the O_CREAT flag on an existing file for which we have permissions to open but contained in a directory with no write permissions will fail with EACCES. A tcpdump shows that the client had set the open mode to UNCHECKED which indicates that the file should be created if it doesn't exist and encountering an existing flag is not an error. Since in this case the file exists and can be opened by the user, the NFS server is wrong in attempting to check create permissions on the parent directory. The patch adds a conditional statement to check for create permissions only if the file doesn't exist. Signed-off-by: Sachin S. Prabhu <sprabhu@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-04-20xen: mask_rw_pte: do not apply the early_ioremap checks on x86_32Stefano Stabellini1-4/+9
The two "is_early_ioremap_ptep" checks in mask_rw_pte are only used on x86_64, in fact early_ioremap is not used at all to setup the initial pagetable on x86_32. Moreover on x86_32 the two checks are wrong because the range pgt_buf_start..pgt_buf_end initially should be mapped RW because the pages in the range are not pagetable pages yet and haven't been cleared yet. Afterwards considering the pgt_buf_start..pgt_buf_end is part of the initial mapping, xen_alloc_pte is capable of turning the ptes RO when they become pagetable pages. Fix the issue and improve the readability of the code providing two different implementation of mask_rw_pte for x86_32 and x86_64. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-04-20xen: do not create the extra e820 region at an addr lower than 4GStefano Stabellini1-1/+1
Do not add the extra e820 region at a physical address lower than 4G because it breaks e820_end_of_low_ram_pfn(). It is OK for us to move the xen_extra_mem_start up and down because this is the index of the memory that can be ballooned in/out - it is memory not available to the kernel during bootup. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-04-20md: Update documentation for sync_min and sync_max entriesCoolCold1-0/+10
linux/Documentation/md.txt is missing description for sync_min and sync_max entries. This patch adds description for sync_min and sync_max entries. Signed-off-by: Roman Ovchinnikov <coolthecold@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-20md: Cleanup after raid45->raid0 takeoverKrzysztof Wojcik1-0/+1
Problem: After raid4->raid0 takeover operation, another takeover operation (e.g raid0->raid10) results "kernel oops". Root cause: Variables 'degraded' in mddev structure is not cleared on raid45->raid0 takeover. This patch reset this variable. Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>