aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/tools/perf/scripts/python/call-graph-from-sql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2017-09-12f2fs: hurry up to issue discard after io interruptionChao Yu1-2/+15
Once we encounter I/O interruption during issuing discards, we will delay long time before next round, but if system status is I/O idle during the time, it may loses opportunity to issue discards. So this patch changes to hurry up to issue discard after io interruption. Besides, this patch also fixes to issue discards accurately with assigned rate. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-12f2fs: fix to show correct discard_granularity in sysfsChao Yu1-0/+2
Fix below incorrect display when reading discard_granularity sysfs node. $ cat /sys/fs/f2fs/<device>/discard_granularity $ 16 $ echo 32 > /sys/fs/f2fs/<device>/discard_granularity $ cat /sys/fs/f2fs/<device>/discard_granularity $ 16 Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-12f2fs: detect dirty inode in evict_inodeChao Yu1-0/+3
Add a bugon in f2fs_evict_inode to detect inconsistent status between inode cache and related node page cache. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-12ovl: fix false positive ESTALE on lookupAmir Goldstein1-4/+7
Commit b9ac5c274b8c ("ovl: hash overlay non-dir inodes by copy up origin") verifies that the origin lower inode stored in the overlayfs inode matched the inode of a copy up origin dentry found by lookup. There is a false positive result in that check when lower fs does not support file handles and copy up origin cannot be followed by file handle at lookup time. The false negative happens when finding an overlay inode in cache on a copied up overlay dentry lookup. The overlay inode still 'remembers' the copy up origin inode, but the copy up origin dentry is not available for verification. Relax the check in case copy up origin dentry is not available. Fixes: b9ac5c274b8c ("ovl: hash overlay non-dir inodes by copy up...") Cc: <stable@vger.kernel.org> # v4.13 Reported-by: Jordi Pujol <jordipujolp@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-09-12fuse: getattr cleanupMiklos Szeredi3-23/+18
The refreshed argument isn't used by any caller, get rid of it. Use a helper for just updating the inode (no need to fill in a kstat). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-09-12fuse: honor iocb sync flags on writeMiklos Szeredi3-22/+28
If the IOCB_DSYNC flag is set a sync is not being performed by fuse_file_write_iter. Honor IOCB_DSYNC/IOCB_SYNC by setting O_DYSNC/O_SYNC respectively in the flags filed of the write request. We don't need to sync data or metadata, since fuse_perform_write() does write-through and the filesystem is responsible for updating file times. Original patch by Vitaly Zolotusky. Reported-by: Nate Clark <nate@neworld.us> Cc: Vitaly Zolotusky <vitaly@unitc.com>. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-09-12fuse: allow server to run in different pid_nsMiklos Szeredi2-9/+7
Commit 0b6e9ea041e6 ("fuse: Add support for pid namespaces") broke Sandstorm.io development tools, which have been sending FUSE file descriptors across PID namespace boundaries since early 2014. The above patch added a check that prevented I/O on the fuse device file descriptor if the pid namespace of the reader/writer was different from the pid namespace of the mounter. With this change passing the device file descriptor to a different pid namespace simply doesn't work. The check was added because pids are transferred to/from the fuse userspace server in the namespace registered at mount time. To fix this regression, remove the checks and do the following: 1) the pid in the request header (the pid of the task that initiated the filesystem operation) is translated to the reader's pid namespace. If a mapping doesn't exist for this pid, then a zero pid is used. Note: even if a mapping would exist between the initiator task's pid namespace and the reader's pid namespace the pid will be zero if either mapping from initator's to mounter's namespace or mapping from mounter's to reader's namespace doesn't exist. 2) The lk.pid value in setlk/setlkw requests and getlk reply is left alone. Userspace should not interpret this value anyway. Also allow the setlk/setlkw operations if the pid of the task cannot be represented in the mounter's namespace (pid being zero in that case). Reported-by: Kenton Varda <kenton@sandstorm.io> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 0b6e9ea041e6 ("fuse: Add support for pid namespaces") Cc: <stable@vger.kernel.org> # v4.12+ Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Seth Forshee <seth.forshee@canonical.com>
2017-09-11f2fs: clear radix tree dirty tag of pages whose dirty flag is clearedDaeho Jeong2-0/+14
On a senario like writing out the first dirty page of the inode as the inline data, we only cleared dirty flags of the pages, but didn't clear the dirty tags of those pages in the radix tree. If we don't clear the dirty tags of the pages in the radix tree, the inodes which contain the pages will be marked with I_DIRTY_PAGES again and again, and writepages() for the inodes will be invoked in every writeback period. As a result, nothing will be done in every writepages() for the inodes and it will just consume CPU time meaninglessly. Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-11f2fs: speed up gc_urgent mode with SSRJaegeuk Kim3-13/+16
This patch activates SSR in gc_urgent mode. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-11f2fs: better to wait for fstrim completionJaegeuk Kim1-1/+6
In android, we'd better wait for fstrim completion instead of issuing the discard commands asynchronous. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-11block: directly insert blk-mq request from blk_insert_cloned_request()Jens Axboe3-1/+23
A NULL pointer crash was reported for the case of having the BFQ IO scheduler attached to the underlying blk-mq paths of a DM multipath device. The crash occured in blk_mq_sched_insert_request()'s call to e->type->ops.mq.insert_requests(). Paolo Valente correctly summarized why the crash occured with: "the call chain (dm_mq_queue_rq -> map_request -> setup_clone -> blk_rq_prep_clone) creates a cloned request without invoking e->type->ops.mq.prepare_request for the target elevator e. The cloned request is therefore not initialized for the scheduler, but it is however inserted into the scheduler by blk_mq_sched_insert_request." All said, a request-based DM multipath device's IO scheduler should be the only one used -- when the original requests are issued to the underlying paths as cloned requests they are inserted directly in the underlying dispatch queue(s) rather than through an additional elevator. But commit bd166ef18 ("blk-mq-sched: add framework for MQ capable IO schedulers") switched blk_insert_cloned_request() from using blk_mq_insert_request() to blk_mq_sched_insert_request(). Which incorrectly added elevator machinery into a call chain that isn't supposed to have any. To fix this introduce a blk-mq private blk_mq_request_bypass_insert() that blk_insert_cloned_request() calls to insert the request without involving any elevator that may be attached to the cloned request's request_queue. Fixes: bd166ef183c2 ("blk-mq-sched: add framework for MQ capable IO schedulers") Cc: stable@vger.kernel.org Reported-by: Bart Van Assche <Bart.VanAssche@wdc.com> Tested-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-11mm/backing-dev.c: fix an error handling path in 'cgwb_create()'Christophe JAILLET1-2/+4
If the 'kmalloc' fails, we must go through the existing error handling path. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Fixes: 52ebea749aae ("writeback: make backing_dev_info host cgroup-specific bdi_writebacks") Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-11string.h: un-fortify memcpy_and_padMartin Wilck1-13/+2
The way I'd implemented the new helper memcpy_and_pad with __FORTIFY_INLINE caused compiler warnings for certain kernel configurations. This helper is only used in a single place at this time, and thus doesn't benefit much from fortification. So simplify the code by dropping fortification support for now. Fixes: 01f33c336e2d "string.h: add memcpy_and_pad()" Signed-off-by: Martin Wilck <mwilck@suse.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-09-11nvme-pci: implement the HMB entry number and size limitationsChristoph Hellwig4-2/+13
Adds support for the new Host Memory Buffer Minimum Descriptor Entry Size and Host Memory Maximum Descriptors Entries field that were added in TP 4002 HMB Enhancements. These allow the controller to advertise limits for the usual number of segments in the host memory buffer, as well as a minimum usable per-segment size. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com>
2017-09-11nvme-pci: propagate (some) errors from host memory buffer setupChristoph Hellwig1-6/+12
We want to catch command execution errors when resetting the device, so propagate errors from the Set Features when setting up the host memory buffer. We keep ignoring memory allocation failures, as the spec clearly says that the controller must work without a host memory buffer. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Cc: stable@vger.kernel.org
2017-09-11nvme-pci: use appropriate initial chunk size for HMB allocationAkinobu Mita1-1/+1
The initial chunk size for host memory buffer allocation is currently PAGE_SIZE << MAX_ORDER. MAX_ORDER order allocation is usually failed without CONFIG_DMA_CMA. So the HMB allocation is retried with chunk size PAGE_SIZE << (MAX_ORDER - 1) in general, but there is no problem if the retry allocation works correctly. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> [hch: rebased] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Cc: stable@vger.kernel.org
2017-09-11nvme-pci: fix host memory buffer allocation fallbackChristoph Hellwig1-18/+30
nvme_alloc_host_mem currently contains two loops that are interwinded, and the outer retry loop turns out to be broken. Fix this by untangling the two. Based on a report an initial patch from Akinobu Mita. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Akinobu Mita <akinobu.mita@gmail.com> Tested-by: Akinobu Mita <akinobu.mita@gmail.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Cc: stable@vger.kernel.org
2017-09-11nvme: fix lightnvm checkChristoph Hellwig4-35/+14
nvme_nvm_ns_supported assumes every device is a pci_dev, which leads to reading an incorrect field, or possible even a dereference of unallocated memory for fabrics controllers. Fix this by introducing a quirk for lighnvm capable devices instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matias Bjørling <mb@lightnvm.io> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2017-09-11block: fix integer overflow in __blkdev_sectors_to_bio_pages()Mikulas Patocka1-2/+2
Fix possible integer overflow in __blkdev_sectors_to_bio_pages if sector_t is 32-bit. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: 615d22a51c04 ("block: Fix __blkdev_issue_zeroout loop") Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-11block: sed-opal: Set MBRDone on S3 resume path if TPER is MBREnabledScott Bauer2-0/+33
Users who are booting off their Opal enabled drives are having issues when they have a shadow MBR set up after s3/resume cycle. When the Drive has a shadow MBR setup the MBRDone flag is set to false upon power loss (S3/S4/S5). When the MBRDone flag is false I/O to LBA 0 -> LBA_END_MBR are remapped to the shadow mbr of the drive. If the drive contains useful data in the 0 -> end_mbr range upon s3 resume the user can never get to that data as the drive will keep remapping it to the MBR. To fix this when we unlock on S3 resume, we need to tell the drive that we're done with the shadow mbr (even though we didnt use it) by setting true to MBRDone. This way the drive will stop the remapping and the user can access their data. Acked-by Jon Derrick: <jonathan.derrick@intel.com> Signed-off-by: Scott Bauer <scott.bauer@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-11block: tolerate tracing of NULL bioGreg Thelen1-3/+2
__get_request() can call trace_block_getrq() with bio=NULL which causes block_get_rq::TP_fast_assign() to deref a NULL pointer and panic. Syzkaller fuzzer panics with linux-next (1d53d908b79d7870d89063062584eead4cf83448): kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN Modules linked in: CPU: 0 PID: 2983 Comm: syzkaller401111 Not tainted 4.13.0-rc7-next-20170901+ #13 task: ffff8801cf1da000 task.stack: ffff8801ce440000 RIP: 0010:perf_trace_block_get_rq+0x697/0x970 include/trace/events/block.h:384 RSP: 0018:ffff8801ce4473f0 EFLAGS: 00010246 RAX: ffff8801cf1da000 RBX: 1ffff10039c88e84 RCX: 1ffffd1ffff84d27 RDX: dffffc0000000001 RSI: 1ffff1003b643e7a RDI: ffffe8ffffc26938 RBP: ffff8801ce447530 R08: 1ffff1003b643e6c R09: ffffe8ffffc26964 R10: 0000000000000002 R11: fffff91ffff84d2d R12: ffffe8ffffc1f890 R13: ffffe8ffffc26930 R14: ffffffff85cad9e0 R15: 0000000000000000 FS: 0000000002641880(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000043e670 CR3: 00000001d1d7a000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: trace_block_getrq include/trace/events/block.h:423 [inline] __get_request block/blk-core.c:1283 [inline] get_request+0x1518/0x23b0 block/blk-core.c:1355 blk_old_get_request block/blk-core.c:1402 [inline] blk_get_request+0x1d8/0x3c0 block/blk-core.c:1427 sg_scsi_ioctl+0x117/0x750 block/scsi_ioctl.c:451 sg_ioctl+0x192d/0x2ed0 drivers/scsi/sg.c:1070 vfs_ioctl fs/ioctl.c:45 [inline] do_vfs_ioctl+0x1b1/0x1530 fs/ioctl.c:685 SYSC_ioctl fs/ioctl.c:700 [inline] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691 entry_SYSCALL_64_fastpath+0x1f/0xbe block_get_rq::TP_fast_assign() has multiple redundant ->dev assignments. Only one of them is NULL tolerant. Favor the NULL tolerant one. Fixes: 74d46992e0d9 ("block: replace bi_bdev with a gendisk pointer and partitions index") Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Thelen <gthelen@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-11x86/cpu: Remove unused and undefined __generic_processor_info() declarationDou Liyang2-2/+1
The following revert: 2b85b3d22920 ("x86/acpi: Restore the order of CPU IDs") ... got rid of __generic_processor_info(), but forgot to remove its declaration in mpspec.h. Remove the declaration and update the comments as well. Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: lenb@kernel.org Link: http://lkml.kernel.org/r/1505101403-29100-1-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-11sched/fair: Fix nuisance kernel-doc warningRandy Dunlap1-1/+1
Work around kernel-doc warning ('*' in Sphinx doc means "emphasis"): ../kernel/sched/fair.c:7584: WARNING: Inline emphasis start-string without end-string. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/f18b30f9-6251-6d86-9d44-16501e386891@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-10Revert "firmware: add sanity check on shutdown/suspend"Linus Torvalds2-110/+0
This reverts commit 81f95076281fdd3bc382e004ba1bce8e82fccbce. It causes random failures of firmware loading at resume time (well, random for me, it seems to be more reliable for others) because the firmware disabling is not actually synchronous with any particular resume event, and at least the btusb driver that uses a workqueue to load the firmware at resume seems to occasionally hit the "firmware loading is disabled" logic because the firmware loader hasn't gotten the resume event yet. Some kind of sanity check for not trying to load firmware when it's not possible might be a good thing, but this commit was not it. Greg seems to have silently suffered the same issue, and pointed to the likely culprit, and Gabriel C verified the revert fixed it for him too. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Pointed-at-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Tested-by: Gabriel C <nix.or.die@gmail.com> Cc: Luis R. Rodriguez <mcgrof@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-10m68k: Add braces to __pmd(x) initializer to kill compiler warningGeert Uytterhoeven1-1/+1
With gcc 4.1.2: include/linux/swapops.h: In function ‘swp_entry_to_pmd’: include/linux/swapops.h:294: warning: missing braces around initializer include/linux/swapops.h:294: warning: (near initialization for ‘(anonymous).pmd’) Due to a GCC zero initializer bug (#53119), the standard "(pmd_t){ 0 }" initializer is not accepted by all GCC versions. In addition, on m68k pmd_t is an array instead of a single value, so we need "(pmd_t){ { 0 }, }" instead of "(pmd_t){ 0 }". Based on commit 9157259d16a8 ("mm: add pmd_t initializer __pmd() to work around a GCC bug.") for sparc32. Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-10x86/mm/64: Fix an incorrect warning with CONFIG_DEBUG_VM=y, !PCIDAndy Lutomirski1-1/+1
I've been staring at the word PCID too long. Fixes: f13c8e8c58ba ("x86/mm: Reinitialize TLB state on hotplug and resume") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-09sparc64: Handle additional cases of no fault loadsRob Gardner1-0/+51
Load instructions using ASI_PNF or other no-fault ASIs should not cause a SIGSEGV or SIGBUS. A garden variety unmapped address follows the TSB miss path, and when no valid mapping is found in the process page tables, the miss handler checks to see if the access was via a no-fault ASI. It then fixes up the target register with a zero, and skips the no-fault load instruction. But different paths are taken for data access exceptions and alignment traps, and these do not respect the no-fault ASI. We add checks in these paths for the no-fault ASI, and fix up the target register and TPC just like in the TSB miss case. Signed-off-by: Rob Gardner <rob.gardner@oracle.com> Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-09sparc64: speed up etrap/rtrap on NG2 and later processorsAnthony Yznaga5-6/+45
For many sun4v processor types, reading or writing a privileged register has a latency of 40 to 70 cycles. Use a combination of the low-latency allclean, otherw, normalw, and nop instructions in etrap and rtrap to replace 2 rdpr and 5 wrpr instructions and improve etrap/rtrap performance. allclean, otherw, and normalw are available on NG2 and later processors. The average ticks to execute the flush windows trap ("ta 0x3") with and without this patch on select platforms: CPU Not patched Patched % Latency Reduction NG2 1762 1558 -11.58 NG4 3619 3204 -11.47 M7 3015 2624 -12.97 SPARC64-X 829 770 -7.12 Signed-off-by: Anthony Yznaga <anthony.yznaga@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-09Bluetooth: Properly check L2CAP config option output buffer lengthBen Seri1-37/+43
Validate the output buffer length for L2CAP config requests and responses to avoid overflowing the stack buffer used for building the option blocks. Cc: stable@vger.kernel.org Signed-off-by: Ben Seri <ben@armis.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-09NFS: Count the bytes of skipped subrequests in nfs_lock_and_join_requests()Trond Myklebust1-1/+5
If we skip a subrequest due to a zero refcount, we should still count the byte range that it covered so that we accurately reconstruct the original request size. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-09-09NFS: Don't hold the group lock when calling nfs_release_request()Trond Myklebust1-1/+1
That can deadlock if this is the last reference since nfs_page_group_destroy() calls nfs_page_group_sync_on_bit(). Note that even if the page was removed from the subpage list, the req->wb_head could still be pointing to the old head. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-09-09libnvdimm, btt: fix format string warningsRandy Dunlap1-2/+2
Fix format warnings (seen on i386) in nvdimm/btt.c: ../drivers/nvdimm/btt.c: In function ‘btt_map_init’: ../drivers/nvdimm/btt.c:430:3: warning: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘size_t’ [-Wformat=] dev_WARN_ONCE(to_dev(arena), size < 512, ^ ../drivers/nvdimm/btt.c: In function ‘btt_log_init’: ../drivers/nvdimm/btt.c:474:3: warning: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘size_t’ [-Wformat=] dev_WARN_ONCE(to_dev(arena), size < 512, ^ Fixes: 86652d2eb347 ("libnvdimm, btt: clean up warning and error messages") Reported-by: Arnd Bergmann <arnd@arndb.de> Reported-by: kbuild test robot <fengguang.wu@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-09-09remove gperf left-overs from build systemLinus Torvalds1-9/+0
I removed all the gperf use, but not the Makefile rules. Sam Ravnborg says I get bonus points for cleaning this up. I'll hold him to it. Requested-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-09NFS: Remove pnfs_generic_transfer_commit_list()Trond Myklebust2-41/+4
It's pretty much a duplicate of nfs_scan_commit_list() that also clears the PG_COMMIT_TO_DS flag. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-09-09NFS: nfs_lock_and_join_requests and nfs_scan_commit_list can deadlockTrond Myklebust2-9/+22
Since the commit list is not ordered, it is possible for nfs_scan_commit_list to hold a request that nfs_lock_and_join_requests() is waiting for, while at the same time trying to grab a request that nfs_lock_and_join_requests already holds. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-09-09ARM: 8691/1: Export save_stack_trace_tsk()Dustin Brown1-0/+1
The kernel watchdog is a great debugging tool for finding tasks that consume a disproportionate amount of CPU time in contiguous chunks. One can imagine building a similar watchdog for arbitrary driver threads using save_stack_trace_tsk() and print_stack_trace(). However, this is not viable for dynamically loaded driver modules on ARM platforms because save_stack_trace_tsk() is not exported for those architectures. Export save_stack_trace_tsk() for the ARM architecture to align with x86 and support various debugging use cases such as arbitrary driver thread watchdog timers. Signed-off-by: Dustin Brown <dustinb@codeaurora.org> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2017-09-08bpf: make error reporting in bpf_warn_invalid_xdp_action more clearDaniel Borkmann2-3/+7
Differ between illegal XDP action code and just driver unsupported one to provide better feedback when we throw a one-time warning here. Reason is that with 814abfabef3c ("xdp: add bpf_redirect helper function") not all drivers support the new XDP return code yet and thus they will fall into their 'default' case when checking for return codes after program return, which then triggers a bpf_warn_invalid_xdp_action() stating that the return code is illegal, but from XDP perspective it's not. I decided not to place something like a XDP_ACT_MAX define into uapi i) given we don't have this either for all other program types, ii) future action codes could have further encoding there, which would render such define unsuitable and we wouldn't be able to rip it out again, and iii) we rarely add new action codes. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08Revert "mdio_bus: Remove unneeded gpiod NULL check"Florian Fainelli1-2/+4
This reverts commit 95b80bf3db03c2bf572a357cf74b9a6aefef0a4a ("mdio_bus: Remove unneeded gpiod NULL check"), this commit assumed that GPIOLIB checks for NULL descriptors, so it's safe to drop them, but it is not when CONFIG_GPIOLIB is disabled in the kernel. If we do call gpiod_set_value_cansleep() on a GPIO descriptor we will issue warnings coming from the inline stubs declared in include/linux/gpio/consumer.h. Fixes: 95b80bf3db03 ("mdio_bus: Remove unneeded gpiod NULL check") Reported-by: Woojung Huh <Woojung.Huh@microchip.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08bpf: devmap, use cond_resched instead of cpu_relaxJohn Fastabend1-1/+1
Be a bit more friendly about waiting for flush bits to complete. Replace the cpu_relax() with a cond_resched(). Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08bpf: add support for sockmap detach programsJohn Fastabend4-16/+72
The bpf map sockmap supports adding programs via attach commands. This patch adds the detach command to keep the API symmetric and allow users to remove previously added programs. Otherwise the user would have to delete the map and re-add it to get in this state. This also adds a series of additional tests to capture detach operation and also attaching/detaching invalid prog types. API note: socks will run (or not run) programs depending on the state of the map at the time the sock is added. We do not for example walk the map and remove programs from previously attached socks. Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08net: rcu lock and preempt disable missing around generic xdpJohn Fastabend1-9/+16
do_xdp_generic must be called inside rcu critical section with preempt disabled to ensure BPF programs are valid and per-cpu variables used for redirect operations are consistent. This patch ensures this is true and fixes the splat below. The netif_receive_skb_internal() code path is now broken into two rcu critical sections. I decided it was better to limit the preempt_enable/disable block to just the xdp static key portion and the fallout is more rcu_read_lock/unlock calls. Seems like the best option to me. [ 607.596901] ============================= [ 607.596906] WARNING: suspicious RCU usage [ 607.596912] 4.13.0-rc4+ #570 Not tainted [ 607.596917] ----------------------------- [ 607.596923] net/core/dev.c:3948 suspicious rcu_dereference_check() usage! [ 607.596927] [ 607.596927] other info that might help us debug this: [ 607.596927] [ 607.596933] [ 607.596933] rcu_scheduler_active = 2, debug_locks = 1 [ 607.596938] 2 locks held by pool/14624: [ 607.596943] #0: (rcu_read_lock_bh){......}, at: [<ffffffff95445ffd>] ip_finish_output2+0x14d/0x890 [ 607.596973] #1: (rcu_read_lock_bh){......}, at: [<ffffffff953c8e3a>] __dev_queue_xmit+0x14a/0xfd0 [ 607.597000] [ 607.597000] stack backtrace: [ 607.597006] CPU: 5 PID: 14624 Comm: pool Not tainted 4.13.0-rc4+ #570 [ 607.597011] Hardware name: Dell Inc. Precision Tower 5810/0HHV7N, BIOS A17 03/01/2017 [ 607.597016] Call Trace: [ 607.597027] dump_stack+0x67/0x92 [ 607.597040] lockdep_rcu_suspicious+0xdd/0x110 [ 607.597054] do_xdp_generic+0x313/0xa50 [ 607.597068] ? time_hardirqs_on+0x5b/0x150 [ 607.597076] ? mark_held_locks+0x6b/0xc0 [ 607.597088] ? netdev_pick_tx+0x150/0x150 [ 607.597117] netif_rx_internal+0x205/0x3f0 [ 607.597127] ? do_xdp_generic+0xa50/0xa50 [ 607.597144] ? lock_downgrade+0x2b0/0x2b0 [ 607.597158] ? __lock_is_held+0x93/0x100 [ 607.597187] netif_rx+0x119/0x190 [ 607.597202] loopback_xmit+0xfd/0x1b0 [ 607.597214] dev_hard_start_xmit+0x127/0x4e0 Fixes: d445516966dc ("net: xdp: support xdp generic on virtual devices") Fixes: b5cdae3291f7 ("net: Generic XDP") Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08bpf: don't select potentially stale ri->map from buggy xdp progsDaniel Borkmann2-2/+35
We can potentially run into a couple of issues with the XDP bpf_redirect_map() helper. The ri->map in the per CPU storage can become stale in several ways, mostly due to misuse, where we can then trigger a use after free on the map: i) prog A is calling bpf_redirect_map(), returning XDP_REDIRECT and running on a driver not supporting XDP_REDIRECT yet. The ri->map on that CPU becomes stale when the XDP program is unloaded on the driver, and a prog B loaded on a different driver which supports XDP_REDIRECT return code. prog B would have to omit calling to bpf_redirect_map() and just return XDP_REDIRECT, which would then access the freed map in xdp_do_redirect() since not cleared for that CPU. ii) prog A is calling bpf_redirect_map(), returning a code other than XDP_REDIRECT. prog A is then detached, which triggers release of the map. prog B is attached which, similarly as in i), would just return XDP_REDIRECT without having called bpf_redirect_map() and thus be accessing the freed map in xdp_do_redirect() since not cleared for that CPU. iii) prog A is attached to generic XDP, calling the bpf_redirect_map() helper and returning XDP_REDIRECT. xdp_do_generic_redirect() is currently not handling ri->map (will be fixed by Jesper), so it's not being reset. Later loading a e.g. native prog B which would, say, call bpf_xdp_redirect() and then returns XDP_REDIRECT would find in xdp_do_redirect() that a map was set and uses that causing use after free on map access. Fix thus needs to avoid accessing stale ri->map pointers, naive way would be to call a BPF function from drivers that just resets it to NULL for all XDP return codes but XDP_REDIRECT and including XDP_REDIRECT for drivers not supporting it yet (and let ri->map being handled in xdp_do_generic_redirect()). There is a less intrusive way w/o letting drivers call a reset for each BPF run. The verifier knows we're calling into bpf_xdp_redirect_map() helper, so it can do a small insn rewrite transparent to the prog itself in the sense that it fills R4 with a pointer to the own bpf_prog. We have that pointer at verification time anyway and R4 is allowed to be used as per calling convention we scratch R0 to R5 anyway, so they become inaccessible and program cannot read them prior to a write. Then, the helper would store the prog pointer in the current CPUs struct redirect_info. Later in xdp_do_*_redirect() we check whether the redirect_info's prog pointer is the same as passed xdp_prog pointer, and if that's the case then all good, since the prog holds a ref on the map anyway, so it is always valid at that point in time and must have a reference count of at least 1. If in the unlikely case they are not equal, it means we got a stale pointer, so we clear and bail out right there. Also do reset map and the owning prog in bpf_xdp_redirect(), so that bpf_xdp_redirect_map() and bpf_xdp_redirect() won't get mixed up, only the last call should take precedence. A tc bpf_redirect() doesn't use map anywhere yet, so no need to clear it there since never accessed in that layer. Note that in case the prog is released, and thus the map as well we're still under RCU read critical section at that time and have preemption disabled as well. Once we commit with the __dev_map_insert_ctx() from xdp_do_redirect_map() and set the map to ri->map_to_flush, we still wait for a xdp_do_flush_map() to finish in devmap dismantle time once flush_needed bit is set, so that is fine. Fixes: 97f91a7cf04f ("bpf: add bpf_redirect_map helper routine") Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08net: tulip: Constify tulip_tblKees Cook2-2/+2
It looks like all users of tulip_tbl are reads, so mark this table as read-only. $ git grep tulip_tbl # edited to avoid line-wraps... interrupt.c: iowrite32(tulip_tbl[tp->chip_id].valid_intrs, ... interrupt.c: iowrite32(tulip_tbl[tp->chip_id].valid_intrs&~RxPollInt, ... interrupt.c: iowrite32(tulip_tbl[tp->chip_id].valid_intrs, ... interrupt.c: iowrite32(tulip_tbl[tp->chip_id].valid_intrs | TimerInt, pnic.c: iowrite32(tulip_tbl[tp->chip_id].valid_intrs, ioaddr + CSR7); tulip.h: extern struct tulip_chip_table tulip_tbl[]; tulip_core.c:struct tulip_chip_table tulip_tbl[] = { tulip_core.c:iowrite32(tulip_tbl[tp->chip_id].valid_intrs, ioaddr + CSR5); tulip_core.c:iowrite32(tulip_tbl[tp->chip_id].valid_intrs, ioaddr + CSR7); tulip_core.c:setup_timer(&tp->timer, tulip_tbl[tp->chip_id].media_timer, tulip_core.c:const char *chip_name = tulip_tbl[chip_idx].chip_name; tulip_core.c:if (pci_resource_len (pdev, 0) < tulip_tbl[chip_idx].io_size) tulip_core.c:ioaddr = pci_iomap(..., tulip_tbl[chip_idx].io_size); tulip_core.c:tp->flags = tulip_tbl[chip_idx].flags; tulip_core.c:setup_timer(&tp->timer, tulip_tbl[tp->chip_id].media_timer, tulip_core.c:INIT_WORK(&tp->media_work, tulip_tbl[tp->chip_id].media_task); Cc: "David S. Miller" <davem@davemloft.net> Cc: Jarod Wilson <jarod@redhat.com> Cc: "Gustavo A. R. Silva" <gustavo@embeddedor.com> Cc: netdev@vger.kernel.org Cc: linux-parisc@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08net: ethernet: ti: netcp_core: no need in netif_napi_delIvan Khoronzhuk1-1/+0
Don't remove rx_napi specifically just before free_netdev(), it's supposed to be done in it and is confusing w/o tx_napi deletion. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08davicom: Display proper debug level up to 6Mathieu Malaterre1-1/+1
This will make it explicit some messages are of the form: dm9000_dbg(db, 5, ... Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08net: phy: sfp: rename dt properties to match the bindingBaruch Siach1-2/+2
Make the Rx rate select control gpio property name match the documented binding. This would make the addition of 'rate-select1-gpios' for SFP+ support more natural. Also, make the MOD-DEF0 gpio property name match the documentation. Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>