Age | Commit message | Author | Files | Lines
2015-05-29 | xfs: enable sparse inode chunks for v5 superblocks | Brian Foster | 1 | -1/+2
Enable mounting of filesystems with sparse inode support enabled. Add the incompat. feature bit to the *_ALL mask.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-05-29 | xfs: skip unallocated regions of inode chunks in xfs_ifree_cluster() | Brian Foster | 3 | -20/+35
xfs_ifree_cluster() is called to mark all in-memory inodes and inode buffers as stale. This occurs after we've removed the inobt records and dropped any references of inobt data. xfs_ifree_cluster() uses the starting inode number to walk the namespace of inodes expected for a single chunk, a cluster buffer at a time. The cluster buffer disk addresses are calculated by decoding the sequential inode numbers expected from the chunk.

The problem with this approach is that if the inode chunk being removed is a sparse chunk, not all of the buffer addresses that are calculated as part of this sequence may be inode clusters. Attempting to acquire the buffer based on expected inode characteristics (i.e., cluster length) can lead to errors and is generally incorrect.

We already use a couple of variables to carry requisite state from xfs_difree() to xfs_ifree_cluster(). Rather than add a third, define a new internal structure to carry the existing parameters through these functions. Add an alloc field that represents the physical allocation bitmap of inodes in the chunk being removed. Modify xfs_ifree_cluster() to check each inode against the bitmap and skip the clusters that were never allocated as real inodes on disk.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
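A minimal sketch of the skip logic this describes, with hypothetical variable names (the kernel code walks struct xfs_buf cluster buffers; assume inodes_per_cluster < 64 so the shift is well defined):

    for (int c = 0; c < nclusters; c++) {
            int off = c * inodes_per_cluster;
            uint64_t mask = ((1ULL << inodes_per_cluster) - 1) << off;

            if (!(alloc_bitmap & mask))
                    continue;   /* sparse region: no inode cluster on disk */

            /* ... look up the cluster buffer and mark its inodes stale ... */
    }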
2015-05-29 | xfs: only free allocated regions of inode chunks | Brian Foster | 1 | -3/+78
An inode chunk is currently added to the transaction free list based on a simple fsb conversion and hardcoded chunk length. The nature of sparse chunks is such that the physical chunk of inodes on disk may consist of one or more discontiguous parts. Blocks that reside in the holes of the inode chunk are not inodes and could be allocated to any other use or not allocated at all.

Refactor the existing xfs_bmap_add_free() call into the xfs_difree_inode_chunk() helper. The new helper uses the existing calculation if a chunk is not sparse. Otherwise, use the inobt record holemask to free the contiguous regions of the chunk.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
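A sketch of the holemask walk described above (illustrative; the kernel hands each contiguous region to xfs_bmap_add_free()). A set holemask bit marks a hole, and each bit covers 4 of the chunk's 64 inodes:

    int lo = 0;
    while (lo < XFS_INOBT_HOLEMASK_BITS) {
            if (holemask & (1U << lo)) {    /* hole: nothing to free */
                    lo++;
                    continue;
            }
            int hi = lo;
            while (hi < XFS_INOBT_HOLEMASK_BITS && !(holemask & (1U << hi)))
                    hi++;
            /* bits [lo, hi) span one contiguous allocated region;
             * convert it to a startblock/blockcount pair and free it */
            lo = hi;
    }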
2015-05-29 | xfs: filter out sparse regions from individual inode allocation | Brian Foster | 1 | -2/+13
Inode allocation from an existing record with free inodes traditionally selects the first inode available according to the ir_free mask. With sparse inode chunks, the ir_free mask could refer to an unallocated region. We must mask the unallocated regions out of ir_free before using it to select a free inode in the chunk.

Update the xfs_inobt_first_free_inode() helper to find the first free inode in the allocated regions of the inode chunk.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
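A sketch of the masking (xfs_inobt_irec_to_allocmask() is the holemask conversion helper introduced elsewhere in this series; the assertion is illustrative):

    uint64_t candidates = rec->ir_free & xfs_inobt_irec_to_allocmask(rec);
    ASSERT(candidates != 0);
    return xfs_lowbit64(candidates);    /* index of first usable free inode */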
2015-05-29 | xfs: randomly do sparse inode allocations in DEBUG mode | Brian Foster | 1 | -2/+13
Sparse inode allocations generally only occur when full inode chunk allocation fails. This requires some level of filesystem space usage and fragmentation. For filesystems formatted with sparse inode chunks enabled, do random sparse inode chunk allocations when compiled in DEBUG mode to increase test coverage.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
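A sketch of how such a debug knob typically looks (the trigger condition and rate shown here are illustrative, not the committed code):

    #ifdef DEBUG
            /* randomly exercise the sparse allocation path on
             * filesystems that support it */
            if (xfs_sb_version_hassparseinodes(&mp->m_sb))
                    do_sparse = prandom_u32() & 1;
    #endif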
2015-05-29 | xfs: allocate sparse inode chunks on full chunk allocation failure | Brian Foster | 4 | -14/+401
xfs_ialloc_ag_alloc() makes several attempts to allocate a full inode chunk. If all else fails, reduce the allocation to the sparse length and alignment and attempt to allocate a sparse inode chunk.

If sparse chunk allocation succeeds, check whether an inobt record already exists that can track the chunk. If so, inherit and update the existing record. Otherwise, insert a new record for the sparse chunk.

Create helpers to align sparse chunk inode records and insert or update existing records in the inode btrees. The xfs_inobt_insert_sprec() helper implements the merge or update semantics required for sparse inode records with respect to both the inobt and finobt. To update the inobt, either insert a new record or merge with an existing record. To update the finobt, use the updated inobt record to either insert or replace an existing record.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-05-29 | xfs: helper to convert holemask to inode alloc. bitmap | Brian Foster | 2 | -0/+54
The inobt record holemask field is a condensed data type designed to fit into the existing on-disk record and is zero based (allocated regions are set to 0, sparse regions are set to 1) to provide backwards compatibility. This makes the type somewhat complex for use in higher level inode manipulations such as individual inode allocation, etc.

Rather than foist the complexity of dealing with this field to every bit of logic that requires inode granular information, create a helper to convert the holemask to an inode allocation bitmap. The inode allocation bitmap is inode granularity similar to the inobt record free mask and indicates which inodes of the chunk are physically allocated on disk, irrespective of whether the inode is considered allocated or free by the filesystem.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
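A runnable userspace model of the conversion, under the assumptions in the description (64 inodes per chunk, 16 holemask bits, hence 4 inodes per bit; the function name is illustrative):

    #include <stdint.h>
    #include <stdio.h>

    #define INODES_PER_CHUNK    64
    #define HOLEMASK_BITS       16
    #define INODES_PER_HOLEBIT  (INODES_PER_CHUNK / HOLEMASK_BITS)  /* 4 */

    static uint64_t holemask_to_allocmask(uint16_t holemask)
    {
            uint64_t allocmask = 0;

            for (int i = 0; i < HOLEMASK_BITS; i++)
                    /* zero based: a clear bit means those 4 inodes
                     * are physically allocated on disk */
                    if (!(holemask & (1U << i)))
                            allocmask |= 0xfULL << (i * INODES_PER_HOLEBIT);
            return allocmask;
    }

    int main(void)
    {
            /* low 8 holemask bits set: inodes 0-31 sparse, 32-63 allocated */
            printf("0x%016llx\n",
                   (unsigned long long)holemask_to_allocmask(0x00ff));
            return 0;
    }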
2015-05-29 | xfs: handle sparse inode chunks in icreate log recovery | Brian Foster | 1 | -6/+16
Recovery of icreate transactions assumes hardcoded values for the inode count and chunk length. Sparse inode chunks are allocated in units of m_ialloc_min_blks. Update the icreate validity checks to allow for appropriately sized inode chunks and verify the inode count matches what is expected based on the extent length rather than assuming a hardcoded count.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-05-29 | xfs: pass inode count through ordered icreate log item | Brian Foster | 3 | -6/+7
v5 superblocks use an ordered log item for logging the initialization of inode chunks. The icreate log item is currently hardcoded to an inode count of 64 inodes.

The agbno and extent length are used to initialize the inode chunk from log recovery. While an incorrect inode count does not lead to bad inode chunk initialization, we should pass the correct inode count such that log recovery has enough data to perform meaningful validity checks on the chunk.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-05-29 | xfs: use actual inode count for sparse records in bulkstat/inumbers | Brian Foster | 1 | -5/+8
The bulkstat and inumbers mechanisms make the assumption that inode records consist of a full 64 inode chunk in several places. For example, this is used to track how many inodes have been processed overall as well as to determine whether a record has allocated inodes that must be handled.

This assumption is invalid for sparse inode records. While sparse inodes will be marked as free in the ir_free mask, they are not accounted as free in ir_freecount because they cannot be allocated. Therefore, ir_freecount may be less than 64 inodes in an inode record for which all physically allocated inodes are free (and in turn ir_freecount < 64 does not signify that the record has allocated inodes).

The new in-core inobt record format includes the ir_count field. This holds the number of true, physical inodes tracked by the record. The in-core ir_count field is always valid as it is hardcoded to XFS_INODES_PER_CHUNK when sparse inodes is not enabled. Use ir_count to handle inode records correctly in bulkstat in a generic manner.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-05-29 | xfs: introduce inode record hole mask for sparse inode chunks | Brian Foster | 3 | -12/+81
The inode btrees track 64 inodes per record regardless of inode size. Thus, inode chunks on disk vary in size depending on the size of the inodes. This creates a contiguous allocation requirement for new inode chunks that can be difficult to satisfy on an aged and fragmented (free space) filesystem.

The inode record freecount currently uses 4 bytes on disk to track the free inode count. With a maximum freecount value of 64, only one byte is required. Convert the freecount field to a single byte and use two of the three remaining higher-order bytes for the hole mask field. Use the final leftover byte for the total count field.

The hole mask field tracks holes in the chunks of physical space that the inode record refers to. This facilitates the sparse allocation of inode chunks when contiguous chunks are not available and allows the inode btrees to identify what portions of the chunk contain valid inodes. The total count field contains the total number of valid inodes referred to by the record. This can also be deduced from the hole mask. The count field provides clarity and redundancy for internal record verification.

Note that neither of the new fields can be written to disk on filesystems without sparse inode support. Doing so writes to the high-order bytes of freecount and causes corruption from the perspective of older kernels. The on-disk inobt record data structure is updated with a union to distinguish between the original, "full" format and the new, "sparse" format. The conversion routines to get, insert and update records are updated to translate to and from the on-disk record accordingly such that freecount remains a 4-byte value on non-sparse filesystems, yet the new fields of the in-core record are always valid with respect to the record. This means that higher level code can refer to the current in-core record format unconditionally and lower level code ensures that records are translated to/from disk according to the capabilities of the filesystem.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
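A rendering of the on-disk union consistent with the description above (shown for illustration; see xfs_format.h for the authoritative definition):

    typedef struct xfs_inobt_rec {
            __be32  ir_startino;    /* starting inode number */
            union {
                    struct {
                            __be32  ir_freecount;   /* original "full"
                                                     * format: 4-byte count */
                    } f;
                    struct {
                            __be16  ir_holemask;    /* sparse format: hole
                                                     * mask, 4 inodes/bit */
                            __u8    ir_count;       /* total inode count */
                            __u8    ir_freecount;   /* free count, one byte */
                    } sp;
            } ir_u;
            __be64  ir_free;        /* free inode mask */
    } xfs_inobt_rec_t;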
2015-05-29 | xfs: add fs geometry bit for sparse inode chunks | Brian Foster | 2 | -1/+4
Define an fs geometry bit for sparse inode chunks such that the characteristic of the fs can be identified by userspace.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-05-29 | xfs: sparse inode chunks feature helpers and mount requirements | Brian Foster | 3 | -0/+44
The sparse inode chunks feature uses a helper function to enable the allocation of sparse inode chunks. The incompatible feature bit is set on disk at mkfs time to prevent mounts by unsupported kernels.

Also, enforce the inode alignment requirements required for sparse inode chunks at mount time. When enabled, full inode chunk (and inode record) alignment is increased from cluster size to inode chunk size. Sparse inode alignment must match the cluster size of the fs. Both superblock alignment fields are set as such by mkfs when sparse inode support is enabled.

Finally, warn that sparse inode chunks are an experimental feature until further notice.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-05-29 | xfs: use sparse chunk alignment for min. inode allocation requirement | Brian Foster | 3 | -1/+8
xfs_ialloc_ag_select() iterates through the allocation groups looking for free inodes or free space to determine whether to allow an inode allocation to proceed. If no free inodes are available, it assumes that an AG must have an extent longer than mp->m_ialloc_blks.

Sparse inode chunk support currently allows for allocations smaller than the traditional inode chunk size specified in m_ialloc_blks. The current minimum sparse allocation is set in the superblock sb_spino_align field at mkfs time. Create a new m_ialloc_min_blks field in xfs_mount and use this to represent the minimum supported allocation size for inode chunks. Initialize m_ialloc_min_blks at mount time based on whether sparse inodes are supported.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-05-29 | xfs: add sparse inode chunk alignment superblock field | Brian Foster | 2 | -4/+4
Add sb_spino_align to the superblock to specify sparse inode chunk alignment. This also currently represents the minimum allowable sparse chunk allocation size.

Signed-off-by: Brian Foster <bfoster@redhat.com>
2015-05-29 | xfs: support min/max agbno args in block allocator | Brian Foster | 2 | -5/+39
The block allocator supports various arguments to tweak block allocation behavior and set allocation requirements. The sparse inode chunk feature introduces a new requirement not supported by the current arguments. Sparse inode allocations must convert or merge into an inode record that describes a fixed length chunk (64 inodes x inodesize). Full inode chunk allocations by definition always result in valid inode records. Sparse chunk allocations are smaller and the associated records can refer to blocks not owned by the inode chunk. This model can result in invalid inode records in certain cases.

For example, if a sparse allocation occurs near the start of an AG, the aligned inode record for that chunk might refer to agbno 0. If an allocation occurs towards the end of the AG and the AG size is not aligned, the inode record could refer to blocks beyond the end of the AG. While neither of these scenarios directly results in corruption, they both insert invalid inode records and at minimum cause repair to complain, are unlikely to merge into full chunks over time and set land mines for other areas of code.

To guarantee sparse inode chunk allocation creates valid inode records, support the ability to specify an agbno range limit for XFS_ALLOCTYPE_NEAR_BNO block allocations. The min/max agbnos are specified in the allocation arguments and limit the block allocation algorithms to that range. The starting agbno hint is clamped to the range if the specified agbno is out of range. If no suitable extent is available within the range, the allocation fails. For backwards compatibility, the min/max fields can be initialized to 0 to disable range limiting (e.g., equivalent to min=0,max=agsize).

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
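A sketch of the clamping semantics (field names follow the xfs_alloc_arg convention; zero means "no limit"):

    if (args->min_agbno && args->agbno < args->min_agbno)
            args->agbno = args->min_agbno;
    if (args->max_agbno && args->agbno > args->max_agbno)
            args->agbno = args->max_agbno;
    /* the NEAR_BNO search then rejects any extent outside
     * [min_agbno, max_agbno] and fails if none qualifies */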
2015-05-29 | xfs: update free inode record logic to support sparse inode records | Brian Foster | 1 | -4/+12
xfs_difree_inobt() uses logic in a couple of places that assumes inobt records refer to fully allocated chunks. Specifically, the use of mp->m_ialloc_inos can cause problems for inode chunks that are sparsely allocated. Sparse inode chunks can, by definition, define a smaller number of inodes than a full inode chunk.

Fix the logic that determines whether an inode record should be removed from the inobt to use the ir_free mask rather than ir_freecount. Fix the agi counters modification to use ir_freecount to add the actual number of inodes freed rather than assuming a full inode chunk.

Also make sure that we preserve the behavior to not remove inode chunks if the block size is large enough for multiple inode chunks (e.g., bsize=64k, isize=512). This behavior was previously implicit in that in such configurations, ir_freecount of a single record never matches m_ialloc_inos. Hence, add some comments as well.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
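A sketch of the corrected removal test (XFS_INOBT_ALL_FREE is the all-ones free mask; the sb_inopblock comparison preserves the multiple-chunks-per-block behavior noted above; other conditions elided):

    if (rec.ir_free == XFS_INOBT_ALL_FREE &&
        mp->m_sb.sb_inopblock <= XFS_INODES_PER_CHUNK) {
            /* every physically allocated inode is free:
             * remove the record and free the chunk */
    } else {
            /* otherwise just log the updated record */
    }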
2015-05-29 | xfs: create individual inode alloc. helper | Brian Foster | 1 | -2/+12
Inode allocation from sparse inode records must filter the ir_free mask against ir_holemask. In preparation for this requirement, create a helper to allocate an individual inode from an inode record.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-05-03 | Linux 4.1-rc2 | Linus Torvalds | 1 | -1/+1
2015-05-02 | ext4: fix growing of tiny filesystems | Jan Kara | 1 | -2/+5
The estimate of necessary transaction credits in ext4_flex_group_add() is too pessimistic. It reserves credits for the sb, the resize inode, and the resize inode dindirect block for each group added in a flex group, although these are always the same blocks and thus it is enough to account for them only once. Also, the number of modified GDT blocks is overestimated, since we fit EXT4_DESC_PER_BLOCK(sb) descriptors in one block.

Make the estimate more precise. That reduces the number of requested credits enough that we can grow a 20 MB filesystem (which has a 1 MB journal, 79 reserved GDT blocks, and a flex group size of 16 by default).

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
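Restated as a rough formula implied by the description (not the exact kernel computation):

    credits ~= 3    /* sb + resize inode + its dindirect block,
                     * each counted once per flex group, not per group */
             + DIV_ROUND_UP(added_groups, EXT4_DESC_PER_BLOCK(sb))
                    /* modified GDT blocks */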
2015-05-02 | ext4: move check under lock scope to close a race. | Davide Italiano | 1 | -7/+8
fallocate() checks that the file is extent-based and returns EOPNOTSUPP in case it is not. Other tasks can convert the file from and to the indirect and extent formats, so it is only safe to check after grabbing the inode mutex.

Signed-off-by: Davide Italiano <dccitaliano@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2015-05-02 | ext4: fix data corruption caused by unwritten and delayed extents | Lukas Czerner | 2 | -0/+10
Currently it is possible to lose a whole filesystem block worth of data when we hit a specific interaction between unwritten and delayed extents in the extent status tree.

The problem is that when we insert a delayed extent into the extent status tree, the only way to get rid of it is to write out the delayed buffer. However, there is a limitation in the extent status tree implementation: when inserting an unwritten extent, should there be even a single delayed block, the whole unwritten extent is marked as delayed. At this point there is no way to get rid of the delayed extents, because there are no delayed buffers to write out. So when we write into said unwritten extent we will convert it to written, but it still remains delayed. When we try to write into that block later, ext4_da_map_blocks() will set the buffer new and delayed and map it to an invalid block, which causes the rest of the block to be zeroed, losing already written data.

For now we can fix this by simply not allowing delayed status to be set on a written extent in the extent status tree. Also add a WARN_ON() to make sure that we notice if this happens in the future.

This problem can be easily reproduced by running the following xfs_io commands:

    xfs_io -f -c "pwrite -S 0xaa 4096 2048" \
              -c "falloc 0 131072" \
              -c "pwrite -S 0xbb 65536 2048" \
              -c "fsync" /mnt/test/fff
    echo 3 > /proc/sys/vm/drop_caches
    xfs_io -c "pwrite -S 0xdd 67584 2048" /mnt/test/fff

This can theoretically also be reproduced at random by running fsx, but it's not very reliable, though on machines with a bigger page size (like ppc) this can be seen more often (especially in xfstest generic/127).

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2015-05-02 | ext4 crypto: remove duplicated encryption mode definitions | Chanho Park | 1 | -6/+0
This patch removes duplicated encryption modes which were already in ext4.h. They were duplicated from commit 3edc18d and commit f542fb.

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Michael Halcrow <mhalcrow@google.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: Chanho Park <chanho61.park@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-05-02 | ext4 crypto: do not select from EXT4_FS_ENCRYPTION | Herbert Xu | 1 | -2/+7
This patch adds a tristate EXT4_ENCRYPTION to do the selections for EXT4_FS_ENCRYPTION, because selecting from a bool causes all the selected options to be built-in, even if EXT4 itself is a module.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-05-01 | virtio: fix typo in vring_need_event() doc comment | Stefan Hajnoczi | 1 | -1/+1
Here the "other side" refers to the guest or host.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-05-01 | virtio: pass baton to Michael Tsirkin | Rusty Russell | 1 | -1/+0
With my job change kernel work will be "own time"; I'm keeping lguest and modules (and the virtio standards work), but virtio kernel has to go. This makes it clear that Michael is in charge. He's good, but having me watch over his shoulder won't help.

Good luck Michael!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-05-01 | ipv4: Missing sk_nulls_node_init() in ping_unhash(). | David S. Miller | 1 | -0/+1
If we don't do that, then the poison value is left in the ->pprev backlink. This can cause crashes if we do a disconnect, followed by a connect().

Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Wen Xu <hotdog3645@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
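The shape of the fix, as a sketch (surrounding lock handling elided):

    hlist_nulls_del(&sk->sk_nulls_node);
    sk_nulls_node_init(&sk->sk_nulls_node); /* clear the poison left
                                             * in ->pprev by the del */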
2015-05-01 | rbd: end I/O the entire obj_request on error | Ilya Dryomov | 1 | -0/+5
When we end an I/O struct request with an error, we need to pass obj_request->length as @nr_bytes so that the entire obj_request worth of bytes is completed. Otherwise the block layer ends up confused and we trip on

    rbd_assert(more ^ (which == img_request->obj_request_count));

in rbd_img_obj_callback() due to more being true no matter what. We already do it in most cases but we are missing some, in particular those where we don't even get a chance to submit any obj_requests, due to an early -ENOMEM for example.

A number of obj_request->xferred assignments seem to be redundant, but I haven't touched any of the obj_request->xferred stuff to keep this small and isolated.

Cc: Alex Elder <elder@linaro.org>
Cc: stable@vger.kernel.org # 3.10+
Reported-by: Shawn Edwards <lesser.evil@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-05-01 | ext4 crypto: add padding to filenames before encrypting | Theodore Ts'o | 5 | -8/+31
This obscures the length of the filenames, to decrease the amount of information leakage. By default, we pad the filenames to the next 4-byte boundary. This costs nothing, since the directory entries are aligned to 4-byte boundaries anyway. Filenames can also be padded to 8, 16, or 32 bytes, which will consume more directory space.

Change-Id: Ibb7a0fb76d2c48e2061240a709358ff40b14f322
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
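The round-up itself is standard power-of-two alignment; a runnable model (names are illustrative, not the kernel's):

    #include <stdint.h>
    #include <stdio.h>

    /* round a name length up to the padding boundary (4, 8, 16 or 32);
     * works for any power-of-two pad */
    static uint32_t pad_fname_len(uint32_t len, uint32_t pad)
    {
            return (len + pad - 1) & ~(pad - 1);
    }

    int main(void)
    {
            printf("%u\n", pad_fname_len(13, 4));   /* prints 16 */
            return 0;
    }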
2015-05-01 | ext4 crypto: simplify and speed up filename encryption | Theodore Ts'o | 5 | -204/+149
Avoid using SHA-1 when calculating the user-visible filename when the encryption key is available, and avoid decrypting lots of filenames when searching for a directory entry in a directory block.

Change-Id: If4655f144784978ba0305b597bfa1c8d7bb69e63
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-05-01 | powerpc/powernv: Restore non-volatile CRs after nap | Sam Bobroff | 1 | -0/+2
Patches 7cba160ad "powernv/cpuidle: Redesign idle states management" and 77b54e9f2 "powernv/powerpc: Add winkle support for offline cpus" use non-volatile condition registers (cr2, cr3 and cr4) early in the system reset interrupt handler (system_reset_pSeries()) before it has been determined if state loss has occurred. If state loss has not occurred, control returns via the power7_wakeup_noloss() path which does not restore those condition registers, leaving them corrupted.

Fix this by restoring the condition registers in the power7_wakeup_noloss() case.

This is apparent when running a KVM guest on hardware that does not support winkle or sleep and the guest makes use of secondary threads. In practice this means Power7 machines, though some early unreleased Power8 machines may also be susceptible.

The secondary CPUs are taken offline before the guest is started and they call pnv_smp_cpu_kill_self(). This checks support for sleep states (in this case there is no support) and power7_nap() is called. When the CPU is woken, power7_nap() returns and because the CPU is still offline, the main while loop executes again. The sleep states support test is executed again, but because the tested values cannot have changed, the compiler has optimized the test away and instead we rely on the result of the first test, which has been left in cr3 and/or cr4. With the result overwritten, the wrong branch is taken and power7_winkle() is called on a CPU that does not support it, leading to it stalling.

Fixes: 7cba160ad789 ("powernv/cpuidle: Redesign idle states management")
Fixes: 77b54e9f213f ("powernv/powerpc: Add winkle support for offline cpus")
[mpe: Massage change log a bit more]
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-01 | powerpc/eeh: Delay probing EEH device during hotplug | Gavin Shan | 1 | -0/+6
Commit 1c509148b ("powerpc/eeh: Do probe on pci_dn") probes EEH devices in an early stage, which is reasonable for the pSeries platform. However, it's wrong for the PowerNV platform because the PE# isn't determined until the resources (IO and MMIO) are assigned to the PE in the hotplug case. So we have to delay probing EEH devices for the PowerNV platform until the PE# is assigned.

Fixes: ff57b454ddb9 ("powerpc/eeh: Do probe on pci_dn")
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-01 | powerpc/eeh: Fix race condition in pcibios_set_pcie_reset_state() | Gavin Shan | 1 | -1/+4
When asserting reset in pcibios_set_pcie_reset_state(), the PE is forced into the (hardware) frozen state in order to drop unexpected PCI transactions (except PCI config read/write) automatically by hardware during reset, which would cause recursive EEH errors. However, the (software) frozen state EEH_PE_ISOLATED is missed. When users get 0xFF from a PCI config or MMIO read, EEH_PE_ISOLATED is set in the PE state retrieval backend. Unfortunately, nobody (the reset handler or the EEH recovery functionality in the host) will clear EEH_PE_ISOLATED when the PE has been passed through to a guest.

The patch sets and clears EEH_PE_ISOLATED properly during reset in pcibios_set_pcie_reset_state() to fix the issue.

Fixes: 28158cd ("Enhance pcibios_set_pcie_reset_state()")
Reported-by: Carol L. Soto <clsoto@us.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Tested-by: Carol L. Soto <clsoto@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-01 | powerpc/pseries: Correct cpu affinity for dlpar added cpus | Nathan Fontenot | 1 | -6/+4
The incorrect ordering of operations during cpu dlpar add results in invalid affinity for the cpu being added. The ibm,associativity property in the device tree is populated with all zeroes for the added cpu, which results in invalid affinity mappings, and all cpus appear to belong to node 0.

This occurs because rtas configure-connector is called prior to making the rtas set-indicator calls. Phyp does not assign affinity information for a cpu until the rtas set-indicator calls are made to set the isolation and allocation state.

Correct the order of operations to make the rtas set-indicator calls (done in dlpar_acquire_drc) before calling rtas configure-connector.

Fixes: 1a8061c46c46 ("powerpc/pseries: Add kernel based CPU DLPAR handling")
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-01 | selftests/powerpc: Fix the pmu install rule | Michael Ellerman | 1 | -1/+1
My patch to add install support for the powerpc selftests had a typo, leading to the three tests in the pmu directory itself not being installed.

Fixes: 6faeeea44b84 ("selftests: Add install support for the powerpc tests")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-30 | net: fec: Fix RGMII-ID mode | Markus Pargmann | 1 | -1/+4
RGMII-ID uses an internal delay within the transmitter or receiver. This feature is PHY-specific. The rest of the communication is normal RGMII. So the fec driver has to check for all RGMII modes, not only 'PHY_INTERFACE_MODE_RGMII'.

Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-30 | net/mlx4_en: Schedule napi when RX buffers allocation fails | Ido Shamay | 3 | -2/+26
When the system is out of memory, refilling of RX buffers fails while the driver continues to pass the received packets to the kernel stack. At some point, when all RX buffers deplete, the driver may fall into a sleep and not recover when memory for new RX buffers is once again available. This is because the hardware does not have valid descriptors, so no interrupt will be generated for the driver to return to work in napi context.

Fix it by scheduling the napi poll function from the stats_task delayed workqueue, as long as the allocations fail.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-30 | netxen_nic: use spin_[un]lock_bh around tx_clean_lock | Tony Camuso | 1 | -2/+2
Although testing this driver with DEBUG_LOCKDEP and DEBUG_SPINLOCK enabled did not produce any traces, it is more prudent to use spin_[un]lock_bh for tx_clean_lock, since this lock is manipulated in both process and softirq context.

This patch was tested for functionality and regressions with netperf and with DEBUG_LOCKDEP and DEBUG_SPINLOCK enabled.

Signed-off-by: Tony Camuso <tcamuso@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
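A sketch of the pattern (the lock and its critical section are from the driver; the comment states the general rule): a lock shared between process context and softirq context must disable bottom halves on the process side, otherwise a softirq arriving on the same CPU can spin on the held lock and deadlock.

    spin_lock_bh(&adapter->tx_clean_lock);
    /* ... walk and release completed tx buffers ... */
    spin_unlock_bh(&adapter->tx_clean_lock);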
2015-04-30 | net/mlx4_core: Fix unaligned accesses | David Ahern | 1 | -4/+14
Addresses the following kernel logs seen during boot:

    Kernel unaligned access at TPC[100ee150] mlx4_QUERY_HCA+0x80/0x248 [mlx4_core]
    Kernel unaligned access at TPC[100f071c] mlx4_QUERY_ADAPTER+0x100/0x12c [mlx4_core]
    Kernel unaligned access at TPC[100f071c] mlx4_QUERY_ADAPTER+0x100/0x12c [mlx4_core]
    Kernel unaligned access at TPC[100f071c] mlx4_QUERY_ADAPTER+0x100/0x12c [mlx4_core]
    Kernel unaligned access at TPC[100f071c] mlx4_QUERY_ADAPTER+0x100/0x12c [mlx4_core]

Signed-off-by: David Ahern <david.ahern@oracle.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-30 | mlx4_en: Use correct loop cursor in error path. | Benjamin Poirier | 1 | -1/+1
Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
Fixes: 9e311e7 ("net/mlx4_en: Use affinity hint")
Acked-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-30 | modsign: change default key details | David Howells | 2 | -6/+6
Change default key details to be more obviously unspecified.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-30 | dm: fix free_rq_clone() NULL pointer when requeueing unmapped request | Mike Snitzer | 1 | -4/+12
Commit 022333427a ("dm: optimize dm_mq_queue_rq to _not_ use kthread if using pure blk-mq") mistakenly removed free_rq_clone()'s clone->q check before testing clone->q->mq_ops. It was an oversight to discontinue that check for 1 of the 2 use-cases for free_rq_clone():

1) free_rq_clone() called when an unmapped original request is requeued
2) free_rq_clone() called in the request-based IO completion path

The clone->q check made sense for case #1 but not for #2. However, we cannot just reinstate the check as it'd mask a serious bug in the IO completion case #2 -- no in-flight request should have an uninitialized request_queue (basic block layer refcounting _should_ ensure this).

The NULL pointer seen for case #1 is detailed here:
https://www.redhat.com/archives/dm-devel/2015-April/msg00160.html

Fix this free_rq_clone() NULL pointer by simply checking if the mapped_device's type is DM_TYPE_MQ_REQUEST_BASED (clone's queue is blk-mq) rather than checking clone->q->mq_ops. This avoids the need to dereference clone->q, but a WARN_ON_ONCE is added to let us know if an uninitialized clone request is being completed.

Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-04-30 | dm: only initialize the request_queue once | Christoph Hellwig | 2 | -11/+9
Commit bfebd1cdb4 ("dm: add full blk-mq support to request-based DM") didn't properly account for the need to short-circuit re-initializing DM's blk-mq request_queue if it was already initialized. Otherwise, reloading a blk-mq request-based DM table (either manually or via multipathd) resulted in errors, see:
https://www.redhat.com/archives/dm-devel/2015-April/msg00132.html

Fix is to only initialize the request_queue on the initial table load (when the mapped_device type is assigned). This is better than having dm_init_request_based_blk_mq_queue() return early if the queue was already initialized because it elevates the constraint to a more meaningful location in DM core. As such the pre-existing early return in dm_init_request_based_queue() can now be removed.

Fixes: bfebd1cdb4 ("dm: add full blk-mq support to request-based DM")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-04-30 | arm64: perf: Fix the pmu node name in warning message | Suzuki K. Poulose | 1 | -1/+1
With commit d5efd9cc9cf2 ("arm64: pmu: add support for interrupt-affinity property"), we print a warning when we find a PMU SPI with a missing interrupt-affinity property in a pmu node. Unfortunately, we pass the wrong (NULL) device node to of_node_full_name, resulting in unhelpful messages such as:

    hw perfevents: Failed to parse <no-node>/interrupt-affinity[0]

This patch fixes the name to that of the pmu node.

Fixes: d5efd9cc9cf2 (arm64: pmu: add support for interrupt-affinity property)
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-04-30 | arm64: perf: don't warn about missing interrupt-affinity property for PPIs | Will Deacon | 1 | -1/+6
PPIs are affine by nature, so the interrupt-affinity property is not used and therefore we shouldn't print a warning in its absence.

Reported-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Reviewed-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-04-30Revert "powerpc/tm: Abort syscalls in active transactions"Michael Ellerman4-37/+18
This reverts commit feba40362b11341bee6d8ed58d54b896abbd9f84. Although the principle of this change is good, the implementation has a few issues. Firstly we can sometimes fail to abort a syscall because r12 may have been clobbered by C code if we went down the virtual CPU accounting path, or if syscall tracing was enabled. Secondly we have decided that it is safer to abort the syscall even earlier in the syscall entry path, so that we avoid the syscall tracing path when we are transactional. So that we have time to thoroughly test those changes we have decided to revert this for this merge window and will merge the fixed version in the next window. NB. Rather than reverting the selftest we just drop tm-syscall from TEST_PROGS so that it's not run by default. Fixes: feba40362b11 ("powerpc/tm: Abort syscalls in active transactions") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-29 | Btrfs: btrfs_release_extent_buffer_page didn't free pages of dummy extent | Forrest Liu | 1 | -25/+26
btrfs_release_extent_buffer_page() can't properly handle a dummy extent buffer allocated by btrfs_clone_extent_buffer(). That is because the reference count of pages allocated by btrfs_clone_extent_buffer() is 2: 1 from alloc_page(), and another from attach_extent_buffer_page().

Running the following command repeatedly can check for this memory leak problem:

    btrfs inspect-internal inode-resolve 256 /mnt/btrfs

Signed-off-by: Chien-Kuan Yeh <ckya@synology.com>
Signed-off-by: Forrest Liu <forrestl@synology.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Tested-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-04-29 | cxgb4: Fix MC1 memory offset calculation | Hariprasad Shenai | 1 | -1/+1
Commit 6559a7e8296002b4 ("cxgb4: Cleanup macros so they follow the same style and look consistent") introduced a regression where reading MC1 memory in adapters where MC0 isn't present, or where the MC0 size is not equal to the MC1 size, caused the adapter to crash due to incorrect computation of memoffset. The fix is to read the size of MC0 instead of MC1 for the offset calculation.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-29 | bnx2x: Delay during kdump load | Yuval Mintz | 1 | -0/+6
In a kdump environment interfaces might be re-loaded without a proper unload sequence in the previously running kernel. The bnx2x management FW and driver maintain a `pulse' that notifies the FW that the driver is still up and running.

Driver load on the kdump kernel should be performed only after the pulse has been out-of-sync long enough for the management FW to identify that the driver has crashed, at which point it will perform some necessary cleanup of the HW. In today's distros kdump loading is quite fast, sometimes too fast for our FW to get out-of-sync.

This patch delays the bnx2x's probe during kdump to allow a proper re-load on the kdump kernel.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
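A sketch of the mechanism (the interval shown is illustrative, not the committed value):

    #include <linux/crash_dump.h>
    #include <linux/delay.h>

    /* stall probe in a kdump kernel until the management FW notices
     * the pulse went out of sync and cleans up the HW */
    if (is_kdump_kernel())
            msleep(5000);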
2015-04-29 | net: Fix Kernel Panic in bonding driver debugfs file: rlb_hash_table | Pai | 1 | -0/+10
This patch fixes a kernel panic in the bonding driver debugfs file rlb_hash_table:

    $> modprobe bonding mode=6
    $> cat /sys/kernel/debug/bonding/bond0/rlb_hash_table

This will crash the kernel. The struct alb_bond_info is initialized only when the bonding interface is initialized (ip link set bond0 up) and not at the time it is allocated. If we try to read the table before that, it'll result in a kernel panic.

The patch applies against both net and net-next.

Signed-off-by: Vishwanath Pai <vpai@akamai.com>
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>