aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2015-08-09btrfs: Remove unnecessary variants in relocation.cZhaolei3-11/+7
These arguments are not used in functions, remove them for cleanup and make kernel stack happy. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09btrfs: Cleanup: Remove chunk_objectid argument from btrfs_relocate_chunk()Zhaolei1-8/+2
Remove chunk_objectid argument from btrfs_relocate_chunk() because it is not necessary, it can also cleanup some code in caller for prepare its value. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09btrfs: Cleanup: Remove objectid's init-value in create_reloc_inode()Zhaolei1-1/+1
objectid's init-value is not used in any case, remove it. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09btrfs: Error handle for get_ref_objectid_v0() in relocate_block_group()Zhaolei1-0/+4
We need error checking code for get_ref_objectid_v0() in relocate_block_group(), to avoid unpredictable result, especially for accessing uninitialized value(when function failed) after this line. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09btrfs: Fix data checksum error cause by replace with io-load.Zhaolei2-7/+29
xfstests btrfs/070 sometimes failed. In my test machine, its fail rate is about 30%. In another vm(vmware), its fail rate is about 50%. Reason: btrfs/070 do replace and defrag with fsstress simultaneously, after above operation, checksum error is found by scrub. Actually, it have no relationship with defrag operation, only replace with fsstress can trigger this bug. New data writen to target device have possibility rewrited by old data from source device by replace code in debug, to avoid above problem, we can set target block group to readonly in replace period, so new data requested by other operation will not write to same place with replace code. Before patch(4.1-rc3): 30% failed in 100 xfstests. After patch: 0% failed in 300 xfstests. It also happened in btrfs/071 as it's another scrub with IO load tests. Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09btrfs: use scrub_pause_on/off() to reduce code in scrub_enumerate_chunks()Zhaolei1-7/+3
Use new intruduced scrub_pause_on/off() can make this code block clean and more readable. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09btrfs: Separate scrub_blocked_if_needed() to scrub_pause_on/off()Zhaolei1-1/+10
It can reduce current duplicated code which is similar to scrub_blocked_if_needed() but can not call it because little different. It also used by my next patch which is in same case. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09btrfs: Use ref_cnt for set_block_group_ro()Zhaolei3-32/+30
More than one code call set_block_group_ro() and restore rw in fail. Old code use bool bit to save blockgroup's ro state, it can not support parallel case(it is confirmd exist in my debug log). This patch use ref count to store ro state, and rename set_block_group_ro/set_block_group_rw to inc_block_group_ro/dec_block_group_ro. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09btrfs: Bypass unrelated items before accessing its contents in scrubZhao Lei1-8/+8
When we access extent_root in scrub_stripe() and scrub_raid56_parity(), we need bypass unrelated tree item firstly before using its contents to do other condition. It is not a bug fix, only making code sequence in logic. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09btrfs: Load only necessary csums into list in scrubZhao Lei1-3/+5
We need not load csum of whole strip in scrub because strip is trimed before use, it is to say, what we really need to calculate csum is data between [extent_logical, extent_len). This patch changed to use above segment for btrfs_lookup_csums_range() in scrub_stripe() Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09btrfs: Fix calculate typo caused by ambiguous meaning of logic_endZhao Lei1-5/+5
For example, in scrub_raid56_parity(), following lines are used to judge is all data processed: place1: if (key.objectid > logic_end) ... place2: if (logic_start >= logic_end) ... ... (place2 is typo, is should be ">", it is copied from other place, where logic_end's meaning is different, long story...) We can fix above typo directly, but the root reason is ambiguous meaning of logic_end in scrub raid56 parity. In other place, XXX_end is pointed to data which is not included, and we need to process segment of [XXX_start, XXX_end). But for scrub raid56 parity, logic_end is pointed to lattest data need to process, and introduced many "+ 1" and "- 1" in code as below: length = sparity->logic_end - sparity->logic_start + 1 logic_end - logic_start + 1 stripe_logical + increment - 1 This patch changed logic_end's meaning to make it in normal understanding in raid56 parity functions and data struct alone with above bugfix. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09btrfs: Free checksum list on scrub_extent() failZhao Lei1-2/+6
When scrub_extent() failed, we need to free previois created checksum list. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09btrfs: Check cancel and pause in interval of scrub operationZhao Lei1-16/+18
Old code checking cancel and pause request inside scrub stripe operation, like: loop() { if (parity) { scrub_parity_stripe(); continue; } check_cancel_and_pause() scrub_normal_stripe(); } Reason is when introduce raid56 stripe scrub, new code is inserted simplely to front of loop. Better to: loop() { check_cancel_and_pause() if (parity) scrub_parity_stripe(); else scrub_normal_stripe(); } This patch adjusted code place to realize above sequence. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09btrfs: Show detail information when mount failed on missing devicesZhao Lei1-2/+3
When mount failed because missing device, we can see following dmesg: [ 1060.267743] BTRFS: too many missing devices, writeable mount is not allowed [ 1060.273158] BTRFS: open_ctree failed This patch add missing_device_number and tolerated_missing_device_number to above output, to let user know what really happened, and helps bug-report and debug. dmesg after patch: [ 127.050367] BTRFS: missing devices(1) exceeds the limit(0), writeable mount is not allowed [ 127.056099] BTRFS: open_ctree failed Changelog v1->v2: 1: Changed to more clear description, suggested-by: Anand Jain <anand.jain@oracle.com> Suggested-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09btrfs: Fix scrub panic when leaf crosses stripesZhao Lei1-7/+10
Scrub panic in following operation: mkfs.ext4 /dev/vdh btrfs-convert /dev/vdh mount /dev/vdh /mnt/tmp1 btrfs scrub start -B /dev/vdh (panic) Reason: 1: In some case, leaf created by btrfs-convert was splited into 2 strips. 2: Scrub bypassed part of above wrong leaf data, but remain data caused panic in scrub_checksum_tree_block(). For reason 1: we can get following information after some simple operation. a. mkfs.ext4 /dev/vdh btrfs-convert /dev/vdh b. btrfs-debug-tree /dev/vdh we can see following item in extent tree: item 25 key (27054080 METADATA_ITEM 0) itemoff 15083 itemsize 33 Its logical address is [27054080, 27070464) and acrossed 2 strips: [27000832, 27066368) [27066368, 27131904) Will be fixed in btrfs-progs(btrfs-convert, btrfsck, ...) For reason 2: Scrub is trying to do a "bypass" in this case, but the result is "panic", because current code lacks of some condition in bypass, and let some wrong leaf data escaped. This patch fixed above scrub code. Before patch: # btrfs scrub start -B /dev/vdh (panic) After patch: # btrfs scrub start -B /dev/vdh scrub done for 353cec8f-da31-4a94-aa35-be72d997b06e ... # dmesg ... [ 59.088697] BTRFS error (device vdh): scrub: tree block 27054080 spanning stripes, ignored. logical=27000832 [ 59.089929] BTRFS error (device vdh): scrub: tree block 27054080 spanning stripes, ignored. logical=27066368 # Reported-by: Chris Murphy <lists@colorremedies.com> Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09Btrfs: fix stale dir entries after removing a link and fsyncFilipe Manana1-20/+138
We have one more case where after a log tree is replayed we get inconsistent metadata leading to stale directory entries, due to some directories having entries pointing to some inode while the inode does not have a matching BTRFS_INODE_[REF|EXTREF]_KEY item. To trigger the problem we need to have a file with multiple hard links belonging to different parent directories. Then if one of those hard links is removed and we fsync the file using one of its other links that belongs to a different parent directory, we end up not logging the fact that the removed hard link doesn't exists anymore in the parent directory. Simple reproducer: seq=`basename $0` seqres=$RESULT_DIR/$seq echo "QA output created by $seq" tmp=/tmp/$$ status=1 # failure is the default! trap "_cleanup; exit \$status" 0 1 2 3 15 _cleanup() { _cleanup_flakey rm -f $tmp.* } # get standard environment, filters and checks . ./common/rc . ./common/filter . ./common/dmflakey # real QA test starts here _need_to_be_root _supported_fs generic _supported_os Linux _require_scratch _require_dm_flakey _require_metadata_journaling $SCRATCH_DEV rm -f $seqres.full _scratch_mkfs >>$seqres.full 2>&1 _init_flakey _mount_flakey # Create our test directory and file. mkdir $SCRATCH_MNT/testdir touch $SCRATCH_MNT/foo ln $SCRATCH_MNT/foo $SCRATCH_MNT/testdir/foo2 ln $SCRATCH_MNT/foo $SCRATCH_MNT/testdir/foo3 # Make sure everything done so far is durably persisted. sync # Now we remove one of our file's hardlinks in the directory testdir. unlink $SCRATCH_MNT/testdir/foo3 # We now fsync our file using the "foo" link, which has a parent that # is not the directory "testdir". $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo # Silently drop all writes and unmount to simulate a crash/power # failure. _load_flakey_table $FLAKEY_DROP_WRITES _unmount_flakey # Allow writes again, mount to trigger journal/log replay. _load_flakey_table $FLAKEY_ALLOW_WRITES _mount_flakey # After the journal/log is replayed we expect to not see the "foo3" # link anymore and we should be able to remove all names in the # directory "testdir" and then remove it (no stale directory entries # left after the journal/log replay). echo "Entries in testdir:" ls -1 $SCRATCH_MNT/testdir rm -f $SCRATCH_MNT/testdir/* rmdir $SCRATCH_MNT/testdir _unmount_flakey status=0 exit The test fails with: $ ./check generic/107 FSTYP -- btrfs PLATFORM -- Linux/x86_64 debian3 4.1.0-rc6-btrfs-next-11+ MKFS_OPTIONS -- /dev/sdc MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1 generic/107 3s ... - output mismatch (see .../results/generic/107.out.bad) --- tests/generic/107.out 2015-08-01 01:39:45.807462161 +0100 +++ /home/fdmanana/git/hub/xfstests/results//generic/107.out.bad @@ -1,3 +1,5 @@ QA output created by 107 Entries in testdir: foo2 +foo3 +rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/testdir': Directory not empty ... _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent \ (see /home/fdmanana/git/hub/xfstests/results//generic/107.full) _check_dmesg: something found in dmesg (see .../results/generic/107.dmesg) Ran: generic/107 Failures: generic/107 Failed 1 of 1 tests $ cat /home/fdmanana/git/hub/xfstests/results//generic/107.full (...) checking fs roots root 5 inode 257 errors 200, dir isize wrong unresolved ref dir 257 index 3 namelen 4 name foo3 filetype 1 errors 5, no dir item, no inode ref (...) And produces the following warning in dmesg: [127298.759064] BTRFS info (device dm-0): failed to delete reference to foo3, inode 258 parent 257 [127298.762081] ------------[ cut here ]------------ [127298.763311] WARNING: CPU: 10 PID: 7891 at fs/btrfs/inode.c:3956 __btrfs_unlink_inode+0x182/0x35a [btrfs]() [127298.767327] BTRFS: Transaction aborted (error -2) (...) [127298.788611] Call Trace: [127298.789137] [<ffffffff8145f077>] dump_stack+0x4f/0x7b [127298.790090] [<ffffffff81095de5>] ? console_unlock+0x356/0x3a2 [127298.791157] [<ffffffff8104b3b0>] warn_slowpath_common+0xa1/0xbb [127298.792323] [<ffffffffa065ad09>] ? __btrfs_unlink_inode+0x182/0x35a [btrfs] [127298.793633] [<ffffffff8104b410>] warn_slowpath_fmt+0x46/0x48 [127298.794699] [<ffffffffa065ad09>] __btrfs_unlink_inode+0x182/0x35a [btrfs] [127298.797640] [<ffffffffa065be8f>] btrfs_unlink_inode+0x1e/0x40 [btrfs] [127298.798876] [<ffffffffa065bf11>] btrfs_unlink+0x60/0x9b [btrfs] [127298.800154] [<ffffffff8116fb48>] vfs_unlink+0x9c/0xed [127298.801303] [<ffffffff81173481>] do_unlinkat+0x12b/0x1fb [127298.802450] [<ffffffff81253855>] ? lockdep_sys_exit_thunk+0x12/0x14 [127298.803797] [<ffffffff81174056>] SyS_unlinkat+0x29/0x2b [127298.805017] [<ffffffff81465197>] system_call_fastpath+0x12/0x6f [127298.806310] ---[ end trace bbfddacb7aaada7b ]--- [127298.807325] BTRFS warning (device dm-0): __btrfs_unlink_inode:3956: Aborting unused transaction(No such entry). So fix this by logging all parent inodes, current and old ones, to make sure we do not get stale entries after log replay. This is not a simple solution such as triggering a full transaction commit because it would imply full transaction commit when an inode is fsynced in the same transaction that modified it and reloaded it after eviction (because its last_unlink_trans is set to the same value as its last_trans as of the commit with the title "Btrfs: fix stale dir entries after unlink, inode eviction and fsync"), and it would also make fstest generic/066 fail since one of the fsyncs triggers a full commit and the next fsync will not find the inode in the log anymore (therefore not removing the xattr). Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09btrfs: fix search key advancing conditionNaohiro Aota1-3/+9
The search key advancing condition used in copy_to_sk() is loose. It can advance the key even if it reaches sk->max_*: e.g. when the max key = (512, 1024, -1) and the current key = (512, 1025, 10), it increments the offset by 1, continues hopeless search from (512, 1025, 11). This issue make ioctl() to take unexpectedly long time scanning all the leaf a blocks one by one. This commit fix the problem using standard way of key comparison: btrfs_comp_cpu_keys() Signed-off-by: Naohiro Aota <naota@elisp.net> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09Btrfs: teach backref walking about backrefs with underflowed offset valuesFilipe Manana1-2/+25
When cloning/deduplicating file extents (through the clone and extent_same ioctls) we can get data back references with offset values that are a result of an unsigned integer arithmetic underflow, that is, values that are much larger then they could be otherwise. This is not a problem when decrementing or dropping the back references (happens when we overwrite the extents or punch a hole for example, through __btrfs_drop_extents()), since we compute the same too large offset value, but it is a problem for the backref walking code, used by an incremental send and the ioctls that are used by the btrfs tool "inspect-internal" commands, as it makes it miss the corresponding file extent items because the search key is set for an extent item that starts at an offset matching the exceptionally large offset value of the data back reference. For an incremental send this causes the send ioctl to fail with -EIO. So teach the backref walking code to deal with these cases by setting the search key's offset to 0 if the backref's offset value is larger than LLONG_MAX (the largest possible file offset). This makes sure the backref walking code finds the corresponding file extent items at the expense of scanning more items and leafs in the btree. Fixing the clone/dedup ioctls to not produce such underflowed results would require major changes breaking backward compatibility, updating user space tools, etc. Simple reproducer case for fstests: seq=`basename $0` seqres=$RESULT_DIR/$seq echo "QA output created by $seq" tmp=/tmp/$$ status=1 # failure is the default! trap "_cleanup; exit \$status" 0 1 2 3 15 _cleanup() { rm -fr $send_files_dir rm -f $tmp.* } # get standard environment, filters and checks . ./common/rc . ./common/filter # real QA test starts here _supported_fs btrfs _supported_os Linux _require_scratch _require_cloner _need_to_be_root send_files_dir=$TEST_DIR/btrfs-test-$seq rm -f $seqres.full rm -fr $send_files_dir mkdir $send_files_dir _scratch_mkfs >>$seqres.full 2>&1 _scratch_mount # Create our test file with a single extent of 64K starting at file # offset 128K. $XFS_IO_PROG -f -c "pwrite -S 0xaa 128K 64K" $SCRATCH_MNT/foo \ | _filter_xfs_io _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT \ $SCRATCH_MNT/mysnap1 # Now clone parts of the original extent into lower offsets of the file. # # The first clone operation adds a file extent item to file offset 0 # that points to our initial extent with a data offset of 16K. The # corresponding data back reference in the extent tree has an offset of # 18446744073709535232, which is the result of file_offset - data_offset # = 0 - 16K. # # The second clone operation adds a file extent item to file offset 16K # that points to our initial extent with a data offset of 48K. The # corresponding data back reference in the extent tree has an offset of # 18446744073709518848, which is the result of file_offset - data_offset # = 16K - 48K. # # Those large back reference offsets (result of unsigned arithmetic # underflow) confused the back reference walking code (used by an # incremental send and the multiple inspect-internal ioctls) and made it # miss the back references, which for the case of an incremental send it # made it fail with -EIO and print a message like the following to # dmesg: # # "BTRFS error (device sdc): did not find backref in send_root. \ # inode=257, offset=0, disk_byte=12845056 found extent=12845056" # $CLONER_PROG -s $(((128 + 16) * 1024)) -d 0 -l $((16 * 1024)) \ $SCRATCH_MNT/foo $SCRATCH_MNT/foo $CLONER_PROG -s $(((128 + 48) * 1024)) -d $((16 * 1024)) \ -l $((16 * 1024)) $SCRATCH_MNT/foo $SCRATCH_MNT/foo _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT \ $SCRATCH_MNT/mysnap2 _run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $send_files_dir/1.snap _run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \ -f $send_files_dir/2.snap echo "File digest in the original filesystem:" md5sum $SCRATCH_MNT/mysnap2/foo | _filter_scratch # Now recreate the filesystem by receiving both send streams and verify # we get the same file contents that the original filesystem had. _scratch_unmount _scratch_mkfs >>$seqres.full 2>&1 _scratch_mount _run_btrfs_util_prog receive $SCRATCH_MNT -f $send_files_dir/1.snap _run_btrfs_util_prog receive $SCRATCH_MNT -f $send_files_dir/2.snap echo "File digest in the new filesystem:" md5sum $SCRATCH_MNT/mysnap2/foo | _filter_scratch status=0 exit The test's expected golden output is: wrote 65536/65536 bytes at offset 131072 XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) File digest in the original filesystem: 6c6079335cff141b8a31233ead04cbff SCRATCH_MNT/mysnap2/foo File digest in the new filesystem: 6c6079335cff141b8a31233ead04cbff SCRATCH_MNT/mysnap2/foo But it failed with: (...) @@ -1,7 +1,5 @@ QA output created by 097 wrote 65536/65536 bytes at offset 131072 XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) -File digest in the original filesystem: -6c6079335cff141b8a31233ead04cbff SCRATCH_MNT/mysnap2/foo -File digest in the new filesystem: -6c6079335cff141b8a31233ead04cbff SCRATCH_MNT/mysnap2/foo ... $ cat /home/fdmanana/git/hub/xfstests/results//btrfs/097.full (...) ERROR: send ioctl failed with -5: Input/output error Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09Btrfs: fix stale dir entries after unlink, inode eviction and fsyncFilipe Manana1-0/+29
If we remove a hard link from an inode, the inode gets evicted, then we fsync the inode and then power fail/crash, when the log tree is replayed, the parent directory inode still has entries pointing to the name that no longer exists, while our inode no longer has the BTRFS_INODE_REF_KEY item matching the deleted hard link (as expected), leaving the filesystem in an inconsistent state. The stale directory entries can not be deleted (an attempt to delete them causes -ESTALE errors), which makes it impossible to delete the parent directory. This happens because we track the id of the transaction where the last unlink operation for the inode happened (last_unlink_trans) in an in-memory only field of the inode, that is, a value that is never persisted in the inode item stored on the fs/subvol btree. So if an inode is evicted and loaded again, the value for last_unlink_trans is set to 0, which prevents the fsync from logging the parent directory at btrfs_log_inode_parent(). So fix this by setting last_unlink_trans to the id of the transaction that last modified the inode when we load the inode. This is a pessimistic approach but it always ensures correctness with the trade off of ocassional full transaction commits when an fsync is done against the inode in the same transaction where it was evicted and reloaded when our inode is a directory and often logging its parent unnecessarily when our inode is not a directory. The following test case for fstests triggers the problem: seq=`basename $0` seqres=$RESULT_DIR/$seq echo "QA output created by $seq" tmp=/tmp/$$ status=1 # failure is the default! trap "_cleanup; exit \$status" 0 1 2 3 15 _cleanup() { _cleanup_flakey rm -f $tmp.* } # get standard environment, filters and checks . ./common/rc . ./common/filter . ./common/dmflakey # real QA test starts here _need_to_be_root _supported_fs generic _supported_os Linux _require_scratch _require_dm_flakey _require_metadata_journaling $SCRATCH_DEV rm -f $seqres.full _scratch_mkfs >>$seqres.full 2>&1 _init_flakey _mount_flakey # Create our test file with 2 hard links. mkdir $SCRATCH_MNT/testdir touch $SCRATCH_MNT/testdir/foo ln $SCRATCH_MNT/testdir/foo $SCRATCH_MNT/testdir/bar # Make sure everything done so far is durably persisted. sync # Now remove one of the links, trigger inode eviction and then fsync # our inode. unlink $SCRATCH_MNT/testdir/bar echo 2 > /proc/sys/vm/drop_caches $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir/foo # Silently drop all writes on our scratch device to simulate a power failure. _load_flakey_table $FLAKEY_DROP_WRITES _unmount_flakey # Allow writes again and mount the fs to trigger log/journal replay. _load_flakey_table $FLAKEY_ALLOW_WRITES _mount_flakey # Now verify our directory entries. echo "Entries in testdir:" ls -1 $SCRATCH_MNT/testdir # If we remove our inode, its parent should become empty and therefore we should # be able to remove the parent. rm -f $SCRATCH_MNT/testdir/* rmdir $SCRATCH_MNT/testdir _unmount_flakey # The fstests framework will call fsck against our filesystem which will verify # that all metadata is in a consistent state. status=0 exit The test failed on btrfs with: generic/098 4s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//generic/098.out.bad) --- tests/generic/098.out 2015-07-23 18:01:12.616175932 +0100 +++ /home/fdmanana/git/hub/xfstests/results//generic/098.out.bad 2015-07-23 18:04:58.924138308 +0100 @@ -1,3 +1,6 @@ QA output created by 098 Entries in testdir: +bar foo +rm: cannot remove '/home/fdmanana/btrfs-tests/scratch_1/testdir/foo': Stale file handle +rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/testdir': Directory not empty ... (Run 'diff -u tests/generic/098.out /home/fdmanana/git/hub/xfstests/results//generic/098.out.bad' to see the entire diff) _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent (see /home/fdmanana/git/hub/xfstests/results//generic/098.full) $ cat /home/fdmanana/git/hub/xfstests/results//generic/098.full (...) checking fs roots root 5 inode 258 errors 2001, no inode item, link count wrong unresolved ref dir 257 index 0 namelen 3 name foo filetype 1 errors 6, no dir index, no inode ref unresolved ref dir 257 index 3 namelen 3 name bar filetype 1 errors 5, no dir item, no inode ref Checking filesystem on /dev/sdc (...) Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-09Btrfs: fix stale directory entries after fsync log replayFilipe Manana1-4/+60
We have another case where after an fsync log replay we get an inode with a wrong link count (smaller than it should be) and a number of directory entries greater than its link count. This happens when we add a new link hard link to our inode A and then we fsync some other inode B that has the side effect of logging the parent directory inode too. In this case at log replay time we add the new hard link to our inode (the item with key BTRFS_INODE_REF_KEY) when processing the parent directory but we never adjust the link count of our inode A. As a result we get stale dir entries for our inode A that can never be deleted and therefore it makes it impossible to remove the parent directory (as its i_size can never decrease back to 0). A simple reproducer for fstests that triggers this issue: seq=`basename $0` seqres=$RESULT_DIR/$seq echo "QA output created by $seq" tmp=/tmp/$$ status=1 # failure is the default! trap "_cleanup; exit \$status" 0 1 2 3 15 _cleanup() { _cleanup_flakey rm -f $tmp.* } # get standard environment, filters and checks . ./common/rc . ./common/filter . ./common/dmflakey # real QA test starts here _need_to_be_root _supported_fs generic _supported_os Linux _require_scratch _require_dm_flakey _require_metadata_journaling $SCRATCH_DEV rm -f $seqres.full _scratch_mkfs >>$seqres.full 2>&1 _init_flakey _mount_flakey # Create our test directory and files. mkdir $SCRATCH_MNT/testdir touch $SCRATCH_MNT/testdir/foo touch $SCRATCH_MNT/testdir/bar # Make sure everything done so far is durably persisted. sync # Create one hard link for file foo and another one for file bar. After # that fsync only the file bar. ln $SCRATCH_MNT/testdir/bar $SCRATCH_MNT/testdir/bar_link ln $SCRATCH_MNT/testdir/foo $SCRATCH_MNT/testdir/foo_link $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir/bar # Silently drop all writes on scratch device to simulate power failure. _load_flakey_table $FLAKEY_DROP_WRITES _unmount_flakey # Allow writes again and mount the fs to trigger log/journal replay. _load_flakey_table $FLAKEY_ALLOW_WRITES _mount_flakey # Now verify both our files have a link count of 2. echo "Link count for file foo: $(stat --format=%h $SCRATCH_MNT/testdir/foo)" echo "Link count for file bar: $(stat --format=%h $SCRATCH_MNT/testdir/bar)" # We should be able to remove all the links of our files in testdir, and # after that the parent directory should become empty and therefore # possible to remove it. rm -f $SCRATCH_MNT/testdir/* rmdir $SCRATCH_MNT/testdir _unmount_flakey # The fstests framework will call fsck against our filesystem which will verify # that all metadata is in a consistent state. status=0 exit The test fails with: -Link count for file foo: 2 +Link count for file foo: 1 Link count for file bar: 2 +rm: cannot remove '/home/fdmanana/btrfs-tests/scratch_1/testdir/foo_link': Stale file handle +rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/testdir': Directory not empty (...) _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent And fsck's output: (...) checking fs roots root 5 inode 258 errors 2001, no inode item, link count wrong unresolved ref dir 257 index 5 namelen 8 name foo_link filetype 1 errors 4, no inode ref Checking filesystem on /dev/sdc (...) So fix this by marking inodes for link count fixup at log replay time whenever a directory entry is replayed if the entry was created in the transaction where the fsync was made and if it points to a non-directory inode. This isn't a new problem/regression, the issue exists for a long time, possibly since the log tree feature was added (2008). Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-08-02Linux 4.2-rc5Linus Torvalds1-1/+1
2015-08-02i915: temporary fix for DP MST docking station NULL pointer dereferenceLinus Torvalds1-3/+5
Ted Ts'o reports that his Lenovo T540p ThinkPad crashes at boot if attached to the docking station. This is a regression that he was able to bisect to commit 8c7b5ccb7298: "drm/i915: Use atomic helpers for computing changed flags:" The reason seems to be the new call to drm_atomic_helper_check_modeset() added to intel_modeset_compute_config(), which in turn calls update_connector_routing(), and somehow ends up picking a NULL crtc for the connector state, causing the subsequent drm_crtc_index() to OOPS. Daniel Vetter says that the fundamental issue seems to be confusion in the encoder selection, and this isn't the right fix, but while he chases down the proper fix, this at least avoids the NULL pointer dereference and makes Ted's docking station work again. Reported-bisected-and-tested-by: Theodore Ts'o <tytso@mit.edu> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Mani Nikula <jani.nikula@linux.intel.com> Cc: Dave Airlie <airlied@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-01link_path_walk(): be careful when failing with ENOTDIRAl Viro1-1/+6
In RCU mode we might end up with dentry evicted just we check that it's a directory. In such case we should return ECHILD rather than ENOTDIR, so that pathwalk would be retries in non-RCU mode. Breakage had been introduced in commit b18825a - prior to that we were looking at nd->inode, which had been fetched before verifying that ->d_seq was still valid. That form of check would only be satisfied if at some point the pathname prefix would indeed have resolved to a non-directory. The fix consists of checking ->d_seq after we'd run into a non-directory dentry, and failing with ECHILD in case of mismatch. Note that all branches since 3.12 have that problem... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-31stmmac: fix missing MODULE_LICENSE in stmmac_platformJoachim Eastwood1-0/+4
Commit 50649ab14982 ("stmmac: drop driver from stmmac platform code") was a bit overzealous in removing code and dropped the MODULE_* macro's that are still needed since stmmac_platform can be a module. Fix this by putting the macro's remvoed in 50649ab14982 back. This fixes the following errors when used as a module: stmmac_platform: module license 'unspecified' taints kernel. Disabling lock debugging due to kernel taint stmmac_platform: Unknown symbol devm_kmalloc (err 0) stmmac_platform: Unknown symbol stmmac_suspend (err 0) stmmac_platform: Unknown symbol platform_get_irq_byname (err 0) stmmac_platform: Unknown symbol stmmac_dvr_remove (err 0) stmmac_platform: Unknown symbol platform_get_resource (err 0) stmmac_platform: Unknown symbol of_get_phy_mode (err 0) stmmac_platform: Unknown symbol of_property_read_u32_array (err 0) stmmac_platform: Unknown symbol of_alias_get_id (err 0) stmmac_platform: Unknown symbol stmmac_resume (err 0) stmmac_platform: Unknown symbol stmmac_dvr_probe (err 0) Fixes: 50649ab14982 ("stmmac: drop driver from stmmac platform code") Reported-by: Igor Gnatenko <i.gnatenko.brain@gmail.com> Signed-off-by: Joachim Eastwood <manabian@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-31gianfar: Enable device wakeup when appropriateClaudiu Manoil3-13/+3
The wol_en flag is 0 by default anyway, and we have the following inconsistency: a MAGIC packet wol capable eth interface is registered as a wake-up source but unable to wake-up the system as wol_en is 0 (wake-on flag set to 'd'). Calling set_wakeup_enable() at netdev open is just redundant because wol_en is 0 by default. Let only ethtool call set_wakeup_enable() for now. The bflock is obviously obsoleted, its utility has been corroded over time. The bitfield flags used today in gianfar are accessed only on the init/ config path, with no real possibility of concurrency - nothing that would justify smth. like bflock. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-31gianfar: Fix suspend/resume for wol magic packetClaudiu Manoil1-68/+30
If we disable NAPI in the first place we can mask the device's interrupts (and halt it) without fearing that imask may be concurrently accessed from interrupt context, so there's no need to do local_irq_save() around gfar_halt_nodisable(). lock_rx_qs()/unlock_tx_qs() are just obsoleted and potentially buggy routines. The txlock is currently used in the driver only to manage TX congestion, it has nothing to do with halting the device. With these changes, the TX processing is stopped before gfar_halt(). Compact gfar_halt() is used instead of gfar_halt_nodisable(), as it disables Rx/TX DMA h/w blocks and the Rx/TX h/w queues. gfar_start() re-enables all these blocks on resume. Enabling the magic-packet mode remains the same, note that the RX block is re-enabled just before entering sleep mode. Add IRQF_NO_SUSPEND flag for the error interrupt line, to signal that the interrupt line must remain active during sleep in order to wake the system by magic packet (MAG) reception interrupt. (On some systems the MAG interrupt did trigger w/o this flag as well, but on others it didn't.) Without these fixes, when suspended during fair Tx traffic the interface occasionally failed to be woken up by magic packet. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-31gianfar: Fix warning when CONFIG_PM offClaudiu Manoil1-0/+2
CC drivers/net/ethernet/freescale/gianfar.o drivers/net/ethernet/freescale/gianfar.c:568:13: warning: 'lock_tx_qs' defined but not used [-Wunused-function] static void lock_tx_qs(struct gfar_private *priv) ^ drivers/net/ethernet/freescale/gianfar.c:576:13: warning: 'unlock_tx_qs' defined but not used [-Wunused-function] static void unlock_tx_qs(struct gfar_private *priv) ^ Reported-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-31act_pedit: check binding before calling tcf_hash_release()WANG Cong1-3/+2
When we share an action within a filter, the bind refcnt should increase, therefore we should not call tcf_hash_release(). Fixes: 1a29321ed045 ("net_sched: act: Dont increment refcnt on replace") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-31ARM: dts: keystone: fix dt bindings to use post div register for mainpllMurali Karicheri3-9/+6
All of the keystone devices have a separate register to hold post divider value for main pll clock. Currently the fixed-postdiv value used for k2hk/l/e SoCs works by sheer luck as u-boot happens to use a value of 2 for this. Now that we have fixed this in the pll clock driver change the dt bindings for the same. Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Acked-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Olof Johansson <olof@lixom.net>
2015-07-31Revert "dmaengine: virt-dma: don't always free descriptor upon completion"Jun Nie2-25/+7
This reverts commit b9855f03d560d351e95301b9de0bc3cad3b31fe9. The patch break existing DMA usage case. For example, audio SOC dmaengine never release channel and cause virt-dma to cache too much memory in descriptor to exhaust system memory. Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-07-31dmaengine: mv_xor: fix big endian operation in register modeThomas Petazzoni1-4/+5
Commit 6f166312c6ea2 ("dmaengine: mv_xor: add support for a38x command in descriptor mode") introduced the support for a feature that appeared in Armada 38x: specifying the operation to be performed in a per-descriptor basis rather than globally per channel. However, when doing so, it changed the function mv_chan_set_mode() to use: if (IS_ENABLED(__BIG_ENDIAN)) instead of: #if defined(__BIG_ENDIAN) While IS_ENABLED() is perfectly fine for CONFIG_* symbols, it is not for other symbols such as __BIG_ENDIAN that is provided directly by the compiler. Consequently, the commit broke support for big-endian, as the XOR_DESCRIPTOR_SWAP flag was not set in the XOR channel configuration register. The primarily visible effect was some nasty warnings and failures appearing during the self-test of the XOR unit: [ 1.197368] mv_xor d0060900.xor: error on chan 0. intr cause 0x00000082 [ 1.197393] mv_xor d0060900.xor: config 0x00008440 [ 1.197410] mv_xor d0060900.xor: activation 0x00000000 [ 1.197427] mv_xor d0060900.xor: intr cause 0x00000082 [ 1.197443] mv_xor d0060900.xor: intr mask 0x000003f7 [ 1.197460] mv_xor d0060900.xor: error cause 0x00000000 [ 1.197477] mv_xor d0060900.xor: error addr 0x00000000 [ 1.197491] ------------[ cut here ]------------ [ 1.197513] WARNING: CPU: 0 PID: 1 at ../drivers/dma/mv_xor.c:664 mv_xor_interrupt_handler+0x14c/0x170() See also: http://storage.kernelci.org/next/next-20150617/arm-mvebu_v7_defconfig+CONFIG_CPU_BIG_ENDIAN=y/lab-khilman/boot-armada-xp-openblocks-ax3-4.txt Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Fixes: 6f166312c6ea2 ("dmaengine: mv_xor: add support for a38x command in descriptor mode") Reviewed-by: Maxime Ripard <maxime.ripard@free-electrons.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-07-31dmaengine: xgene-dma: Fix the resource map to handle overlappingRameshwar Prasad Sahu3-2/+5
There is an overlap in dma ring cmd csr region due to sharing of ethernet ring cmd csr region. This patch fix the resource overlapping by mapping the entire dma ring cmd csr region. Signed-off-by: Rameshwar Prasad Sahu <rsahu@apm.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-07-31dmaengine: at_xdmac: fix transfer data width in at_xdmac_prep_slave_sg()Cyrille Pitchen1-3/+4
This patch adds the missing update of the transfer data width in at_xdmac_prep_slave_sg(). Indeed, for each item in the scatter-gather list, we check whether the transfer length is aligned with the data width provided by dmaengine_slave_config(). If so, we directly use this data width for the current part of the transfer we are preparing. Otherwise, the data width is reduced to 8 bits (1 byte). Of course, the actual number of register accesses must also be updated to match the new data width. So one chunk was missing in the original patch (see Fixes tag below): the number of register accesses was correctly set to (len >> fixed_dwidth) in mbr_ubc but the real data width was not updated in mbr_cfg. Since mbr_cfg may change for each part of the scatter-gather transfer this also explains why the original patch used the Descriptor View 2 instead of the Descriptor View 1. Let's take the example of a DMA transfer to write 8bit data into an Atmel USART with FIFOs. When FIFOs are enabled in the USART, its Transmit Holding Register (THR) works in multidata mode, that is to say that up to 4 8bit data can be written into the THR in a single 32bit access and it is still possible to write only one data with a 8bit access. To take advantage of this new feature, the DMA driver was modified to allow multiple dwidths when doing slave transfers. For instance, when the total length is 22 bytes, the USART driver splits the transfer into 2 parts: First part: 20 bytes transferred through 5 32bit writes into THR Second part: 2 bytes transferred though 2 8bit writes into THR For the second part, the data width was first set to 4_BYTES by the USART driver thanks to dmaengine_slave_config() then at_xdmac_prep_slave_sg() reduces this data width to 1_BYTE because the 2 byte length is not aligned with the original 4_BYTES data width. Since the data width is modified, the actual number of writes into THR must be set accordingly. Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com> Fixes: 6d3a7d9e3ada ("dmaengine: at_xdmac: allow muliple dwidths when doing slave transfers") Cc: stable@vger.kernel.org #4.0 and later Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Acked-by: Ludovic Desroches <ludovic.desroches@atmel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-07-31dmaengine: at_hdmac: fix residue computationCyrille Pitchen2-47/+88
As claimed by the programmer datasheet and confirmed by the IP designer, the Block Transfer Size (BTSIZE) bitfield of the Channel x Control A Register (CTRLAx) always refers to a number of Source Width (SRC_WIDTH) transfers. Both the SRC_WIDTH and BTSIZE bitfields can be extacted from the CTRLAx register to compute the DMA residue. So the 'tx_width' field is useless and can be removed from the struct at_desc. Before this patch, atc_prep_slave_sg() was not consistent: BTSIZE was correctly initialized according to the SRC_WIDTH but 'tx_width' was always set to reg_width, which was incorrect for MEM_TO_DEV transfers. It led to bad DMA residue when 'tx_width' != SRC_WIDTH. Also the 'tx_width' field was mostly set only in the first and last descriptors. Depending on the kind of DMA transfer, this field remained uninitialized for intermediate descriptors. The accurate DMA residue was computed only when the currently processed descriptor was the first or the last of the chain. This algorithm was a little bit odd. An accurate DMA residue can always be computed using the SRC_WIDTH and BTSIZE bitfields in the CTRLAx register. Finally, the test to check whether the currently processed descriptor is the last of the chain was wrong: for cyclic transfer, last_desc->lli.dscr is NOT equal to zero, since set_desc_eol() is never called, but logically equal to first_desc->txd.phys. This bug has a side effect on the drivers/tty/serial/atmel_serial.c driver, which uses cyclic DMA transfer to receive data. Since the DMA residue was wrong each time the DMA transfer reaches the second (and last) period of the transfer, no more data were received by the USART driver till the cyclic DMA transfer loops back to the first period. Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com> Acked-by: Torsten Fleischer <torfl6749@gmail.com> Tested-by: JirĂ­ Prchal <jiri.prchal@aksignal.cz> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-07-31dmaengine: at_xdmac: fix bug about channel configurationLudovic Desroches1-9/+10
When using descriptor view 2 or higher, we don't write the configuration into AT_XDMAC_CC register because this configuration will be fetch from the descriptor. Unfortunately, the PROT bit is not updated with this method, we have to do it manually before enabling the channel. Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-07-31iommu/amd: Allow non-ATS devices in IOMMUv2 domainsJoerg Roedel1-1/+6
With the grouping of multi-function devices a non-ATS capable device might also end up in the same domain as an IOMMUv2 capable device. So handle this situation gracefully and don't consider it a bug anymore. Tested-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-07-31x86/ldt: Make modify_ldt synchronousAndy Lutomirski9-153/+210
modify_ldt() has questionable locking and does not synchronize threads. Improve it: redesign the locking and synchronize all threads' LDTs using an IPI on all modifications. This will dramatically slow down modify_ldt in multithreaded programs, but there shouldn't be any multithreaded programs that care about modify_ldt's performance in the first place. This fixes some fallout from the CVE-2015-5157 fixes. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: security@kernel.org <security@kernel.org> Cc: <stable@vger.kernel.org> Cc: xen-devel <xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/4c6978476782160600471bd865b318db34c7b628.1438291540.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-31x86/xen: Probe target addresses in set_aliased_prot() before the hypercallAndy Lutomirski1-0/+40
The update_va_mapping hypercall can fail if the VA isn't present in the guest's page tables. Under certain loads, this can result in an OOPS when the target address is in unpopulated vmap space. While we're at it, add comments to help explain what's going on. This isn't a great long-term fix. This code should probably be changed to use something like set_memory_ro. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Vrabel <dvrabel@cantab.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: security@kernel.org <security@kernel.org> Cc: <stable@vger.kernel.org> Cc: xen-devel <xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/0b0e55b995cda11e7829f140b833ef932fcabe3a.1438291540.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-30net: sk_clone_lock() should only do get_net() if the parent is not a kernel socketSowmini Varadhan1-1/+2
The newsk returned by sk_clone_lock should hold a get_net() reference if, and only if, the parent is not a kernel socket (making this similar to sk_alloc()). E.g,. for the SYN_RECV path, tcp_v4_syn_recv_sock->..inet_csk_clone_lock sets up the syn_recv newsk from sk_clone_lock. When the parent (listen) socket is a kernel socket (defined in sk_alloc() as having sk_net_refcnt == 0), then the newsk should also have a 0 sk_net_refcnt and should not hold a get_net() reference. Fixes: 26abe14379f8 ("net: Modify sk_alloc to not reference count the netns of kernel sockets.") Acked-by: Eric Dumazet <edumazet@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30net: sched: fix refcount imbalance in actionsDaniel Borkmann2-6/+13
Since commit 55334a5db5cd ("net_sched: act: refuse to remove bound action outside"), we end up with a wrong reference count for a tc action. Test case 1: FOO="1,6 0 0 4294967295," BAR="1,6 0 0 4294967294," tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 \ action bpf bytecode "$FOO" tc actions show action bpf action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe index 1 ref 1 bind 1 tc actions replace action bpf bytecode "$BAR" index 1 tc actions show action bpf action order 0: bpf bytecode '1,6 0 0 4294967294' default-action pipe index 1 ref 2 bind 1 tc actions replace action bpf bytecode "$FOO" index 1 tc actions show action bpf action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe index 1 ref 3 bind 1 Test case 2: FOO="1,6 0 0 4294967295," tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 action ok tc actions show action gact action order 0: gact action pass random type none pass val 0 index 1 ref 1 bind 1 tc actions add action drop index 1 RTNETLINK answers: File exists [...] tc actions show action gact action order 0: gact action pass random type none pass val 0 index 1 ref 2 bind 1 tc actions add action drop index 1 RTNETLINK answers: File exists [...] tc actions show action gact action order 0: gact action pass random type none pass val 0 index 1 ref 3 bind 1 What happens is that in tcf_hash_check(), we check tcf_common for a given index and increase tcfc_refcnt and conditionally tcfc_bindcnt when we've found an existing action. Now there are the following cases: 1) We do a late binding of an action. In that case, we leave the tcfc_refcnt/tcfc_bindcnt increased and are done with the ->init() handler. This is correctly handeled. 2) We replace the given action, or we try to add one without replacing and find out that the action at a specific index already exists (thus, we go out with error in that case). In case of 2), we have to undo the reference count increase from tcf_hash_check() in the tcf_hash_check() function. Currently, we fail to do so because of the 'tcfc_bindcnt > 0' check which bails out early with an -EPERM error. Now, while commit 55334a5db5cd prevents 'tc actions del action ...' on an already classifier-bound action to drop the reference count (which could then become negative, wrap around etc), this restriction only accounts for invocations outside a specific action's ->init() handler. One possible solution would be to add a flag thus we possibly trigger the -EPERM ony in situations where it is indeed relevant. After the patch, above test cases have correct reference count again. Fixes: 55334a5db5cd ("net_sched: act: refuse to remove bound action outside") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30r8152: reset device when tx timeouthayeswang1-4/+3
The device reset is necessary if the hw becomes abnormal and stops transmitting packets. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30r8152: add pre_reset and post_resethayeswang1-0/+54
Add rtl8152_pre_reset() and rtl8152_post_reset() which are used when calling usb_reset_device(). The two functions could reduce the time of reset when calling usb_reset_device() after probe(). Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30ALSA: hda - Fix MacBook Pro 5,2 quirkTakashi Iwai1-1/+1
MacBook Pro 5,2 with ALC889 codec had already a fixup entry, but this seems not working correctly, a fix for pin NID 0x15 is needed in addition. It's equivalent with the fixup for MacBook Air 1,1, so use this instead. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=102131 Reported-and-tested-by: Jeffery Miller <jefferym@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2015-07-30MAINTAINERS: Appoint Jiang and Marc as irqdomain maintainersThomas Gleixner1-1/+4
Ben was pretty surprised that he is still listed as the maintainer and he has no objections against transferring the duty to those who rumaged in and revamped that code in the recent past. Add kernel/irq/msi.c to the affected files as it's part of the shiny new hierarchical irqdomain machinery. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Grant Likely <grant.likely@linaro.org>
2015-07-30MAINTAINERS: Appoint Marc Zyngier as irqchips co-maintainerThomas Gleixner1-0/+1
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Jason Cooper <jason@lakedaemon.net> Cc: Marc Zyngier <marc.zyngier@arm.com>
2015-07-30x86/irq: Use the caller provided polarity setting in mp_check_pin_attr()Jiang Liu1-1/+1
Commit d32932d02e18 ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces") introduced a regression which causes malfunction of interrupt lines. The reason is that the conversion of mp_check_pin_attr() missed to update the polarity selection of the interrupt pin with the caller provided setting and instead uses a stale attribute value. That in turn results in chosing the wrong interrupt flow handler. Use the caller supplied setting to configure the pin correctly which also choses the correct interrupt flow handler. This restores the original behaviour and on the affected machine/driver (Surface Pro 3, i2c controller) all IOAPIC IRQ configuration are identical to v4.1. Fixes: d32932d02e18 ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces") Reported-and-tested-by: Matt Fleming <matt@codeblueprint.co.uk> Reported-and-tested-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Chen Yu <yu.c.chen@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1438242695-23531-1-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-30scsi: fix memory leak with scsi-mqTony Battersby2-4/+4
Fix a memory leak with scsi-mq triggered by commands with large data transfer length. __sg_alloc_table() sets both table->nents and table->orig_nents to the same value. When the scatterlist is DMA-mapped, table->nents is overwritten with the (possibly smaller) size of the DMA-mapped scatterlist, while table->orig_nents retains the original size of the allocated scatterlist. scsi_free_sgtable() should therefore check orig_nents instead of nents, and all code that initializes sdb->table without calling __sg_alloc_table() should set both nents and orig_nents. Fixes: d285203cf647 ("scsi: add support for a blk-mq based I/O path.") Cc: <stable@vger.kernel.org> # 3.17+ Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-07-30ipr: Fix invalid array indexing for HRRQBrian King1-3/+8
Fixes another signed / unsigned array indexing bug in the ipr driver. Currently, when hrrq_index wraps, it becomes a negative number. We do the modulo, but still have a negative number, so we end up indexing backwards in the array. Given where the hrrq array is located in memory, we probably won't actually reference memory we don't own, but nonetheless ipr is still looking at data within struct ipr_ioa_cfg and interpreting it as struct ipr_hrr_queue data, so bad things could certainly happen. Each ipr adapter has anywhere from 1 to 16 HRRQs. By default, we use 2 on new adapters. Let's take an example: Assume ioa_cfg->hrrq_index=0x7fffffffe and ioa_cfg->hrrq_num=4: The atomic_add_return will then return -1. We mod this with 3 and get -2, add one and get -1 for an array index. On adapters which support more than a single HRRQ, we dedicate HRRQ to adapter initialization and error interrupts so that we can optimize the other queues for fast path I/O. So all normal I/O uses HRRQ 1-15. So we want to spread the I/O requests across those HRRQs. With the default module parameter settings, this bug won't hit, only when someone sets the ipr.number_of_msix parameter to a value larger than 3 is when bad things start to happen. Cc: <stable@vger.kernel.org> Tested-by: Wen Xiong <wenxiong@linux.vnet.ibm.com> Reviewed-by: Wen Xiong <wenxiong@linux.vnet.ibm.com> Reviewed-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-07-30ipr: Fix incorrect trace indexingBrian King2-2/+4
When ipr's internal driver trace was changed to an atomic, a signed/unsigned bug slipped in which results in us indexing backwards in our memory buffer writing on memory that does not belong to us. This patch fixes this by removing the modulo and instead just mask off the low bits. Cc: <stable@vger.kernel.org> Tested-by: Wen Xiong <wenxiong@linux.vnet.ibm.com> Reviewed-by: Wen Xiong <wenxiong@linux.vnet.ibm.com> Reviewed-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-07-30ipr: Fix locking for unit attention handlingBrian King1-5/+7
Make sure we have the host lock held when calling scsi_report_bus_reset. Fixes a crash seen as the __devices list in the scsi host was changing as we were iterating through it. Cc: <stable@vger.kernel.org> Reviewed-by: Wen Xiong <wenxiong@linux.vnet.ibm.com> Reviewed-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: James Bottomley <JBottomley@Odin.com>