aboutsummaryrefslogtreecommitdiffstats
path: root/include/uapi (follow)
AgeCommit message (Collapse)AuthorFilesLines
2015-09-11Merge branch 'akpm' (patches from Andrew)Linus Torvalds3-1/+57
Merge fourth patch-bomb from Andrew Morton: - sys_membarier syscall - seq_file interface changes - a few misc fixups * emailed patches from Andrew Morton <akpm@linux-foundation.org>: revert "ocfs2/dlm: use list_for_each_entry instead of list_for_each" mm/early_ioremap: add explicit #include of asm/early_ioremap.h fs/seq_file: convert int seq_vprint/seq_printf/etc... returns to void selftests: enhance membarrier syscall test selftests: add membarrier syscall test sys_membarrier(): system-wide memory barrier (generic, x86) MODSIGN: fix a compilation warning in extract-cert
2015-09-11Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pendingLinus Torvalds1-4/+0
Pull SCSI target updates from Nicholas Bellinger: "Here are the outstanding target-pending updates for v4.3-rc1. Mostly bug-fixes and minor changes this round. The fallout from the big v4.2-rc1 RCU conversion have (thus far) been minimal. The highlights this round include: - Move sense handling routines into scsi_common code (Sagi) - Return ABORTED_COMMAND sense key for PI errors (Sagi) - Add tpg_enabled_sendtargets attribute for disabled iscsi-target discovery (David) - Shrink target struct se_cmd by rearranging fields (Roland) - Drop iSCSI use of mutex around max_cmd_sn increment (Roland) - Replace iSCSI __kernel_sockaddr_storage with sockaddr_storage (Andy + Chris) - Honor fabric max_data_sg_nents I/O transfer limit (Arun + Himanshu + nab) - Fix EXTENDED_COPY >= v4.1 regression OOPsen (Alex + nab)" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (37 commits) target: use stringify.h instead of own definition target/user: Fix UFLAG_UNKNOWN_OP handling target: Remove no-op conditional target/user: Remove unused variable target: Fix max_cmd_sn increment w/o cmdsn mutex regressions target: Attach EXTENDED_COPY local I/O descriptors to xcopy_pt_sess target/qla2xxx: Honor max_data_sg_nents I/O transfer limit target/iscsi: Replace __kernel_sockaddr_storage with sockaddr_storage target/iscsi: Replace conn->login_ip with login_sockaddr target/iscsi: Keep local_ip as the actual sockaddr target/iscsi: Fix np_ip bracket issue by removing np_ip target: Drop iSCSI use of mutex around max_cmd_sn increment qla2xxx: Update tcm_qla2xxx module description to 24xx+ iscsi-target: Add tpg_enabled_sendtargets for disabled discovery drivers: target: Drop unlikely before IS_ERR(_OR_NULL) target: check DPO/FUA usage for COMPARE AND WRITE target: Shrink struct se_cmd by rearranging fields target: Remove cmd->se_ordered_id (unused except debug log lines) target: add support for START_STOP_UNIT SCSI opcode target: improve unsupported opcode message ...
2015-09-11sys_membarrier(): system-wide memory barrier (generic, x86)Mathieu Desnoyers3-1/+57
Here is an implementation of a new system call, sys_membarrier(), which executes a memory barrier on all threads running on the system. It is implemented by calling synchronize_sched(). It can be used to distribute the cost of user-space memory barriers asymmetrically by transforming pairs of memory barriers into pairs consisting of sys_membarrier() and a compiler barrier. For synchronization primitives that distinguish between read-side and write-side (e.g. userspace RCU [1], rwlocks), the read-side can be accelerated significantly by moving the bulk of the memory barrier overhead to the write-side. The existing applications of which I am aware that would be improved by this system call are as follows: * Through Userspace RCU library (http://urcu.so) - DNS server (Knot DNS) https://www.knot-dns.cz/ - Network sniffer (http://netsniff-ng.org/) - Distributed object storage (https://sheepdog.github.io/sheepdog/) - User-space tracing (http://lttng.org) - Network storage system (https://www.gluster.org/) - Virtual routers (https://events.linuxfoundation.org/sites/events/files/slides/DPDK_RCU_0MQ.pdf) - Financial software (https://lkml.org/lkml/2015/3/23/189) Those projects use RCU in userspace to increase read-side speed and scalability compared to locking. Especially in the case of RCU used by libraries, sys_membarrier can speed up the read-side by moving the bulk of the memory barrier cost to synchronize_rcu(). * Direct users of sys_membarrier - core dotnet garbage collector (https://github.com/dotnet/coreclr/issues/198) Microsoft core dotnet GC developers are planning to use the mprotect() side-effect of issuing memory barriers through IPIs as a way to implement Windows FlushProcessWriteBuffers() on Linux. They are referring to sys_membarrier in their github thread, specifically stating that sys_membarrier() is what they are looking for. To explain the benefit of this scheme, let's introduce two example threads: Thread A (non-frequent, e.g. executing liburcu synchronize_rcu()) Thread B (frequent, e.g. executing liburcu rcu_read_lock()/rcu_read_unlock()) In a scheme where all smp_mb() in thread A are ordering memory accesses with respect to smp_mb() present in Thread B, we can change each smp_mb() within Thread A into calls to sys_membarrier() and each smp_mb() within Thread B into compiler barriers "barrier()". Before the change, we had, for each smp_mb() pairs: Thread A Thread B previous mem accesses previous mem accesses smp_mb() smp_mb() following mem accesses following mem accesses After the change, these pairs become: Thread A Thread B prev mem accesses prev mem accesses sys_membarrier() barrier() follow mem accesses follow mem accesses As we can see, there are two possible scenarios: either Thread B memory accesses do not happen concurrently with Thread A accesses (1), or they do (2). 1) Non-concurrent Thread A vs Thread B accesses: Thread A Thread B prev mem accesses sys_membarrier() follow mem accesses prev mem accesses barrier() follow mem accesses In this case, thread B accesses will be weakly ordered. This is OK, because at that point, thread A is not particularly interested in ordering them with respect to its own accesses. 2) Concurrent Thread A vs Thread B accesses Thread A Thread B prev mem accesses prev mem accesses sys_membarrier() barrier() follow mem accesses follow mem accesses In this case, thread B accesses, which are ensured to be in program order thanks to the compiler barrier, will be "upgraded" to full smp_mb() by synchronize_sched(). * Benchmarks On Intel Xeon E5405 (8 cores) (one thread is calling sys_membarrier, the other 7 threads are busy looping) 1000 non-expedited sys_membarrier calls in 33s =3D 33 milliseconds/call. * User-space user of this system call: Userspace RCU library Both the signal-based and the sys_membarrier userspace RCU schemes permit us to remove the memory barrier from the userspace RCU rcu_read_lock() and rcu_read_unlock() primitives, thus significantly accelerating them. These memory barriers are replaced by compiler barriers on the read-side, and all matching memory barriers on the write-side are turned into an invocation of a memory barrier on all active threads in the process. By letting the kernel perform this synchronization rather than dumbly sending a signal to every process threads (as we currently do), we diminish the number of unnecessary wake ups and only issue the memory barriers on active threads. Non-running threads do not need to execute such barrier anyway, because these are implied by the scheduler context switches. Results in liburcu: Operations in 10s, 6 readers, 2 writers: memory barriers in reader: 1701557485 reads, 2202847 writes signal-based scheme: 9830061167 reads, 6700 writes sys_membarrier: 9952759104 reads, 425 writes sys_membarrier (dyn. check): 7970328887 reads, 425 writes The dynamic sys_membarrier availability check adds some overhead to the read-side compared to the signal-based scheme, but besides that, sys_membarrier slightly outperforms the signal-based scheme. However, this non-expedited sys_membarrier implementation has a much slower grace period than signal and memory barrier schemes. Besides diminishing the number of wake-ups, one major advantage of the membarrier system call over the signal-based scheme is that it does not need to reserve a signal. This plays much more nicely with libraries, and with processes injected into for tracing purposes, for which we cannot expect that signals will be unused by the application. An expedited version of this system call can be added later on to speed up the grace period. Its implementation will likely depend on reading the cpu_curr()->mm without holding each CPU's rq lock. This patch adds the system call to x86 and to asm-generic. [1] http://urcu.so membarrier(2) man page: MEMBARRIER(2) Linux Programmer's Manual MEMBARRIER(2) NAME membarrier - issue memory barriers on a set of threads SYNOPSIS #include <linux/membarrier.h> int membarrier(int cmd, int flags); DESCRIPTION The cmd argument is one of the following: MEMBARRIER_CMD_QUERY Query the set of supported commands. It returns a bitmask of supported commands. MEMBARRIER_CMD_SHARED Execute a memory barrier on all threads running on the system. Upon return from system call, the caller thread is ensured that all running threads have passed through a state where all memory accesses to user-space addresses match program order between entry to and return from the system call (non-running threads are de facto in such a state). This covers threads from all pro=E2=80=90 cesses running on the system. This command returns 0. The flags argument needs to be 0. For future extensions. All memory accesses performed in program order from each targeted thread is guaranteed to be ordered with respect to sys_membarrier(). If we use the semantic "barrier()" to represent a compiler barrier forcing memory accesses to be performed in program order across the barrier, and smp_mb() to represent explicit memory barriers forcing full memory ordering across the barrier, we have the following ordering table for each pair of barrier(), sys_membarrier() and smp_mb(): The pair ordering is detailed as (O: ordered, X: not ordered): barrier() smp_mb() sys_membarrier() barrier() X X O smp_mb() X O O sys_membarrier() O O O RETURN VALUE On success, these system calls return zero. On error, -1 is returned, and errno is set appropriately. For a given command, with flags argument set to 0, this system call is guaranteed to always return the same value until reboot. ERRORS ENOSYS System call is not implemented. EINVAL Invalid arguments. Linux 2015-04-15 MEMBARRIER(2) Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Nicholas Miell <nmiell@comcast.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: David Howells <dhowells@redhat.com> Cc: Pranith Kumar <bobby.prani@gmail.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-11Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds1-1/+1
Pull drm fixes from Dave Airlie: "Just a bunch of fixes to squeeze in before -rc1: - three nouveau regression fixes - one qxl regression fix - a bunch of i915 fixes ... and some core displayport/atomic fixes" * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: drm/nouveau/device: enable c800 quirk for tecra w50 drm/nouveau/clk/gt215: Unbreak engine pausing for GT21x/MCP7x drm/nouveau/gr/nv04: fix big endian setting on gr context drm/qxl: validate monitors config modes drm/i915: Allow DSI dual link to be configured on any pipe drm/i915: Don't try to use DDR DVFS on CHV when disabled in the BIOS drm/i915: Fix CSR MMIO address check drm/i915: Limit the number of loops for reading a split 64bit register drm/i915: Fix broken mst get_hw_state. drm/i915: Pass hpd_status_i915[] to intel_get_hpd_pins() in pre-g4x uapi/drm/i915_drm.h: fix userspace compilation. drm/i915: Always mark the object as dirty when used by the GPU drm/dp: Add dp_aux_i2c_speed_khz module param to set the assume i2c bus speed drm/dp: Adjust i2c-over-aux retry count based on message size and i2c bus speed drm/dp: Define AUX_RETRY_INTERVAL as 500 us drm/atomic: Fix bookkeeping with TEST_ONLY, v3.
2015-09-11target: use stringify.h instead of own definitionDavid Disseldorp1-4/+0
Signed-off-by: David Disseldorp <ddiss@suse.de> Acked-by: Andy Grover <agrover@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-09-10Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-0/+1
Merge third patch-bomb from Andrew Morton: - even more of the rest of MM - lib/ updates - checkpatch updates - small changes to a few scruffy filesystems - kmod fixes/cleanups - kexec updates - a dma-mapping cleanup series from hch * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (81 commits) dma-mapping: consolidate dma_set_mask dma-mapping: consolidate dma_supported dma-mapping: cosolidate dma_mapping_error dma-mapping: consolidate dma_{alloc,free}_noncoherent dma-mapping: consolidate dma_{alloc,free}_{attrs,coherent} mm: use vma_is_anonymous() in create_huge_pmd() and wp_huge_pmd() mm: make sure all file VMAs have ->vm_ops set mm, mpx: add "vm_flags_t vm_flags" arg to do_mmap_pgoff() mm: mark most vm_operations_struct const namei: fix warning while make xmldocs caused by namei.c ipc: convert invalid scenarios to use WARN_ON zlib_deflate/deftree: remove bi_reverse() lib/decompress_unlzma: Do a NULL check for pointer lib/decompressors: use real out buf size for gunzip with kernel fs/affs: make root lookup from blkdev logical size sysctl: fix int -> unsigned long assignments in INT_MIN case kexec: export KERNEL_IMAGE_SIZE to vmcoreinfo kexec: align crash_notes allocation to make it be inside one physical page kexec: remove unnecessary test in kimage_alloc_crash_control_pages() kexec: split kexec_load syscall from kexec core code ...
2015-09-10Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+5
Pull more kvm updates from Paolo Bonzini: "ARM: - Full debug support for arm64 - Active state switching for timer interrupts - Lazy FP/SIMD save/restore for arm64 - Generic ARMv8 target PPC: - Book3S: A few bug fixes - Book3S: Allow micro-threading on POWER8 x86: - Compiler warnings Generic: - Adaptive polling for guest halt" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (49 commits) kvm: irqchip: fix memory leak kvm: move new trace event outside #ifdef CONFIG_KVM_ASYNC_PF KVM: trace kvm_halt_poll_ns grow/shrink KVM: dynamic halt-polling KVM: make halt_poll_ns per-vCPU Silence compiler warning in arch/x86/kvm/emulate.c kvm: compile process_smi_save_seg_64() only for x86_64 KVM: x86: avoid uninitialized variable warning KVM: PPC: Book3S: Fix typo in top comment about locking KVM: PPC: Book3S: Fix size of the PSPB register KVM: PPC: Book3S HV: Exit on H_DOORBELL if HOST_IPI is set KVM: PPC: Book3S HV: Fix race in starting secondary threads KVM: PPC: Book3S: correct width in XER handling KVM: PPC: Book3S HV: Fix preempted vcore stolen time calculation KVM: PPC: Book3S HV: Fix preempted vcore list locking KVM: PPC: Book3S HV: Implement H_CLEAR_REF and H_CLEAR_MOD KVM: PPC: Book3S HV: Fix bug in dirty page tracking KVM: PPC: Book3S HV: Fix race in reading change bit when removing HPTE KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8 KVM: PPC: Book3S HV: Make use of unused threads when running guests ...
2015-09-10Merge tag 'for-linus-4.3-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tipLinus Torvalds1-0/+4
Pull xen terminology fixes from David Vrabel: "Use the correct GFN/BFN terms more consistently" * tag 'for-linus-4.3-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/xenbus: Rename the variable xen_store_mfn to xen_store_gfn xen/privcmd: Further s/MFN/GFN/ clean-up hvc/xen: Further s/MFN/GFN clean-up video/xen-fbfront: Further s/MFN/GFN clean-up xen/tmem: Use xen_page_to_gfn rather than pfn_to_gfn xen: Use correctly the Xen memory terminologies arm/xen: implement correctly pfn_to_mfn xen: Make clear that swiotlb and biomerge are dealing with DMA address
2015-09-10Merge branch 'next' of git://git.monstr.eu/linux-2.6-microblazeLinus Torvalds1-0/+1
Pull microblaze update from Michal Simek. * 'next' of git://git.monstr.eu/linux-2.6-microblaze: elf-em.h: move EM_MICROBLAZE to the common header
2015-09-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-0/+1
Pull networking fixes from David Miller: 1) Fix out-of-bounds array access in netfilter ipset, from Jozsef Kadlecsik. 2) Use correct free operation on netfilter conntrack templates, from Daniel Borkmann. 3) Fix route leak in SCTP, from Marcelo Ricardo Leitner. 4) Fix sizeof(pointer) in mac80211, from Thierry Reding. 5) Fix cache pointer comparison in ip6mr leading to missed unlock of mrt_lock. From Richard Laing. 6) rds_conn_lookup() needs to consider network namespace in key comparison, from Sowmini Varadhan. 7) Fix deadlock in TIPC code wrt broadcast link wakeups, from Kolmakov Dmitriy. 8) Fix fd leaks in bpf syscall, from Daniel Borkmann. 9) Fix error recovery when installing ipv6 multipath routes, we would delete the old route before we would know if we could fully commit to the new set of nexthops. Fix from Roopa Prabhu. 10) Fix run-time suspend problems in r8152, from Hayes Wang. 11) In fec, don't program the MAC address into the chip when the clocks are gated off. From Fugang Duan. 12) Fix poll behavior for netlink sockets when using rx ring mmap, from Daniel Borkmann. 13) Don't allocate memory with GFP_KERNEL from get_stats64 in r8169 driver, from Corinna Vinschen. 14) In TCP Cubic congestion control, handle idle periods better where we are application limited, in order to keep cwnd from growing out of control. From Eric Dumzet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits) tcp_cubic: better follow cubic curve after idle period tcp: generate CA_EVENT_TX_START on data frames xen-netfront: respect user provided max_queues xen-netback: respect user provided max_queues r8169: Fix sleeping function called during get_stats64, v2 ether: add IEEE 1722 ethertype - TSN netlink, mmap: fix edge-case leakages in nf queue zero-copy netlink, mmap: don't walk rx ring on poll if receive queue non-empty cxgb4: changes for new firmware 1.14.4.0 net: fec: add netif status check before set mac address r8152: fix the runtime suspend issues r8152: split DRIVER_VERSION ipv6: fix ifnullfree.cocci warnings add microchip LAN88xx phy driver stmmac: fix check for phydev being open net: qlcnic: delete redundant memsets net: mv643xx_eth: use kzalloc net: jme: use kzalloc() instead of kmalloc+memset net: cavium: liquidio: use kzalloc in setup_glist() net: ipv6: use common fib_default_rule_pref ...
2015-09-10proc: export idle flag via kpageflagsVladimir Davydov1-0/+1
As noted by Minchan, a benefit of reading idle flag from /proc/kpageflags is that one can easily filter dirty and/or unevictable pages while estimating the size of unused memory. Note that idle flag read from /proc/kpageflags may be stale in case the page was accessed via a PTE, because it would be too costly to iterate over all page mappings on each /proc/kpageflags read to provide an up-to-date value. To make sure the flag is up-to-date one has to read /sys/kernel/mm/page_idle/bitmap first. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Reviewed-by: Andres Lagar-Cavilla <andreslc@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Greg Thelen <gthelen@google.com> Cc: Michel Lespinasse <walken@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-09ether: add IEEE 1722 ethertype - TSNHenrik Austad1-0/+1
IEEE 1722 describes AVB (later renamed to TSN - Time Sensitive Networking), a protocol, encapsualtion and synchronization to utilize standard networks for audio/video (and later other time-sensitive) streams. This standard uses ethertype 0x22F0. http://standards.ieee.org/develop/regauth/ethertype/eth.txt This is a respin of a previous patch ("ether: add AVB frame type ETH_P_AVB") CC: "David S. Miller" <davem@davemloft.net> CC: netdev@vger.kernel.org CC: linux-api@vger.kernel.org CC: linux-kernel@vger.kernel.org Signed-off-by: Henrik Austad <henrik@austad.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-10elf-em.h: move EM_MICROBLAZE to the common headerMike Frysinger1-0/+1
The linux/audit.h header uses EM_MICROBLAZE in order to define AUDIT_ARCH_MICROBLAZE, but it's only available in the microblaze asm headers. Move it to the common elf-em.h header so that the define can be used on non-microblaze systems. Otherwise we get build errors that EM_MICROBLAZE isn't defined when we try to use the AUDIT_ARCH_MICROBLAZE symbol. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Michal Simek <michal.simek@xilinx.com>
2015-09-09Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdmaLinus Torvalds4-0/+512
Pull inifiniband/rdma updates from Doug Ledford: "This is a fairly sizeable set of changes. I've put them through a decent amount of testing prior to sending the pull request due to that. There are still a few fixups that I know are coming, but I wanted to go ahead and get the big, sizable chunk into your hands sooner rather than waiting for those last few fixups. Of note is the fact that this creates what is intended to be a temporary area in the drivers/staging tree specifically for some cleanups and additions that are coming for the RDMA stack. We deprecated two drivers (ipath and amso1100) and are waiting to hear back if we can deprecate another one (ehca). We also put Intel's new hfi1 driver into this area because it needs to be refactored and a transfer library created out of the factored out code, and then it and the qib driver and the soft-roce driver should all be modified to use that library. I expect drivers/staging/rdma to be around for three or four kernel releases and then to go away as all of the work is completed and final deletions of deprecated drivers are done. Summary of changes for 4.3: - Create drivers/staging/rdma - Move amso1100 driver to staging/rdma and schedule for deletion - Move ipath driver to staging/rdma and schedule for deletion - Add hfi1 driver to staging/rdma and set TODO for move to regular tree - Initial support for namespaces to be used on RDMA devices - Add RoCE GID table handling to the RDMA core caching code - Infrastructure to support handling of devices with differing read and write scatter gather capabilities - Various iSER updates - Kill off unsafe usage of global mr registrations - Update SRP driver - Misc mlx4 driver updates - Support for the mr_alloc verb - Support for a netlink interface between kernel and user space cache daemon to speed path record queries and route resolution - Ininitial support for safe hot removal of verbs devices" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (136 commits) IB/ipoib: Suppress warning for send only join failures IB/ipoib: Clean up send-only multicast joins IB/srp: Fix possible protection fault IB/core: Move SM class defines from ib_mad.h to ib_smi.h IB/core: Remove unnecessary defines from ib_mad.h IB/hfi1: Add PSM2 user space header to header_install IB/hfi1: Add CSRs for CONFIG_SDMA_VERBOSITY mlx5: Fix incorrect wc pkey_index assignment for GSI messages IB/mlx5: avoid destroying a NULL mr in reg_user_mr error flow IB/uverbs: reject invalid or unknown opcodes IB/cxgb4: Fix if statement in pick_local_ip6adddrs IB/sa: Fix rdma netlink message flags IB/ucma: HW Device hot-removal support IB/mlx4_ib: Disassociate support IB/uverbs: Enable device removal when there are active user space applications IB/uverbs: Explicitly pass ib_dev to uverbs commands IB/uverbs: Fix race between ib_uverbs_open and remove_one IB/uverbs: Fix reference counting usage of event files IB/core: Make ib_dealloc_pd return void IB/srp: Create an insecure all physical rkey only if needed ...
2015-09-08Merge tag 'platform-drivers-x86-v4.3-1' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86Linus Torvalds1-3/+29
Pull x86 platform driver updates from Darren Hart: "Significant work on toshiba_acpi, including new hardware support, refactoring, and cleanups. Extend device support for asus, ideapad, and acer systems. New surface pro 3 buttons driver. Misc minor cleanups for thinkpad and hp-wireless. acer-wmi: - No rfkill on HP Omen 15 wifi thinkpad_acpi: - Remove side effects from vdbg_printk -> no_printk macro surface pro 3: - Add support driver for Surface Pro 3 buttons hp-wireless: - remove unneeded goto/label in hpwl_init ideapad-laptop: - add alternative representation for Yoga 2 to DMI table - Add Lenovo Yoga 3 14 to no_hw_rfkill dmi list asus-laptop: - Add key found on Asus F3M MAINTAINERS: - Remove Toshiba Linux mailing list address toshiba_acpi: - Bump driver version to 0.23 - Remove unnecessary checks and returns in HCI/SCI functions - Refactor *{get, set} functions return value - Remove "*not supported" feature prints - Change *available functions return type - Add set_fan_status function - Change some variables to avoid warnings from ninja-check - Reorder toshiba_acpi_alt_keymap entries - Remove unused wireless defines - Transflective backlight updates - Avoid registering input device on WMI event laptops - Add /dev/toshiba_acpi device - Adapt /proc/acpi/toshiba/keys to TOS1900 devices" * tag 'platform-drivers-x86-v4.3-1' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86: (21 commits) acer-wmi: No rfkill on HP Omen 15 wifi thinkpad_acpi: Remove side effects from vdbg_printk -> no_printk macro surface pro 3: Add support driver for Surface Pro 3 buttons hp-wireless: remove unneeded goto/label in hpwl_init ideapad-laptop: add alternative representation for Yoga 2 to DMI table asus-laptop: Add key found on Asus F3M MAINTAINERS: Remove Toshiba Linux mailing list address ideapad-laptop: Add Lenovo Yoga 3 14 to no_hw_rfkill dmi list toshiba_acpi: Bump driver version to 0.23 toshiba_acpi: Remove unnecessary checks and returns in HCI/SCI functions toshiba_acpi: Refactor *{get, set} functions return value toshiba_acpi: Remove "*not supported" feature prints toshiba_acpi: Change *available functions return type toshiba_acpi: Add set_fan_status function toshiba_acpi: Change some variables to avoid warnings from ninja-check toshiba_acpi: Reorder toshiba_acpi_alt_keymap entries toshiba_acpi: Remove unused wireless defines toshiba_acpi: Transflective backlight updates toshiba_acpi: Avoid registering input device on WMI event laptops toshiba_acpi: Add /dev/toshiba_acpi device ...
2015-09-08Merge tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimmLinus Torvalds1-1/+11
Pull libnvdimm updates from Dan Williams: "This update has successfully completed a 0day-kbuild run and has appeared in a linux-next release. The changes outside of the typical drivers/nvdimm/ and drivers/acpi/nfit.[ch] paths are related to the removal of IORESOURCE_CACHEABLE, the introduction of memremap(), and the introduction of ZONE_DEVICE + devm_memremap_pages(). Summary: - Introduce ZONE_DEVICE and devm_memremap_pages() as a generic mechanism for adding device-driver-discovered memory regions to the kernel's direct map. This facility is used by the pmem driver to enable pfn_to_page() operations on the page frames returned by DAX ('direct_access' in 'struct block_device_operations'). For now, the 'memmap' allocation for these "device" pages comes from "System RAM". Support for allocating the memmap from device memory will arrive in a later kernel. - Introduce memremap() to replace usages of ioremap_cache() and ioremap_wt(). memremap() drops the __iomem annotation for these mappings to memory that do not have i/o side effects. The replacement of ioremap_cache() with memremap() is limited to the pmem driver to ease merging the api change in v4.3. Completion of the conversion is targeted for v4.4. - Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem driver, update the VFS DAX implementation and PMEM api to provide persistence guarantees for kernel operations on a DAX mapping. - Convert the ACPI NFIT 'BLK' driver to map the block apertures as cacheable to improve performance. - Miscellaneous updates and fixes to libnvdimm including support for issuing "address range scrub" commands, clarifying the optimal 'sector size' of pmem devices, a clarification of the usage of the ACPI '_STA' (status) property for DIMM devices, and other minor fixes" * tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (34 commits) libnvdimm, pmem: direct map legacy pmem by default libnvdimm, pmem: 'struct page' for pmem libnvdimm, pfn: 'struct page' provider infrastructure x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB add devm_memremap_pages mm: ZONE_DEVICE for "device memory" mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.h dax: drop size parameter to ->direct_access() nd_blk: change aperture mapping from WC to WB nvdimm: change to use generic kvfree() pmem, dax: have direct_access use __pmem annotation dax: update I/O path to do proper PMEM flushing pmem: add copy_from_iter_pmem() and clear_pmem() pmem, x86: clean up conditional pmem includes pmem: remove layer when calling arch_has_wmb_pmem() pmem, x86: move x86 PMEM API to new pmem.h header libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option pmem: switch to devm_ allocations devres: add devm_memremap libnvdimm, btt: write and validate parent_uuid ...
2015-09-08Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/auditLinus Torvalds1-1/+4
Pull audit update from Paul Moore: "This is one of the larger audit patchsets in recent history, consisting of eight patches and almost 400 lines of changes. The bulk of the patchset is the new "audit by executable" functionality which allows admins to set an audit watch based on the executable on disk. Prior to this, admins could only track an application by PID, which has some obvious limitations. Beyond the new functionality we also have some refcnt fixes and a few minor cleanups" * 'upstream' of git://git.infradead.org/users/pcmoore/audit: fixup: audit: implement audit by executable audit: implement audit by executable audit: clean simple fsnotify implementation audit: use macros for unset inode and device values audit: make audit_del_rule() more robust audit: fix uninitialized variable in audit_add_rule() audit: eliminate unnecessary extra layer of watch parent references audit: eliminate unnecessary extra layer of watch references
2015-09-08Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-securityLinus Torvalds1-2/+4
Pull security subsystem updates from James Morris: "Highlights: - PKCS#7 support added to support signed kexec, also utilized for module signing. See comments in 3f1e1bea. ** NOTE: this requires linking against the OpenSSL library, which must be installed, e.g. the openssl-devel on Fedora ** - Smack - add IPv6 host labeling; ignore labels on kernel threads - support smack labeling mounts which use binary mount data - SELinux: - add ioctl whitelisting (see http://kernsec.org/files/lss2015/vanderstoep.pdf) - fix mprotect PROT_EXEC regression caused by mm change - Seccomp: - add ptrace options for suspend/resume" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (57 commits) PKCS#7: Add OIDs for sha224, sha284 and sha512 hash algos and use them Documentation/Changes: Now need OpenSSL devel packages for module signing scripts: add extract-cert and sign-file to .gitignore modsign: Handle signing key in source tree modsign: Use if_changed rule for extracting cert from module signing key Move certificate handling to its own directory sign-file: Fix warning about BIO_reset() return value PKCS#7: Add MODULE_LICENSE() to test module Smack - Fix build error with bringup unconfigured sign-file: Document dependency on OpenSSL devel libraries PKCS#7: Appropriately restrict authenticated attributes and content type KEYS: Add a name for PKEY_ID_PKCS7 PKCS#7: Improve and export the X.509 ASN.1 time object decoder modsign: Use extract-cert to process CONFIG_SYSTEM_TRUSTED_KEYS extract-cert: Cope with multiple X.509 certificates in a single file sign-file: Generate CMS message as signature instead of PKCS#7 PKCS#7: Support CMS messages also [RFC5652] X.509: Change recorded SKID & AKID to not include Subject or Issuer PKCS#7: Check content type and versions MAINTAINERS: The keyrings mailing list has moved ...
2015-09-08xen/privcmd: Further s/MFN/GFN/ clean-upJulien Grall1-0/+4
The privcmd code is mixing the usage of GFN and MFN within the same functions which make the code difficult to understand when you only work with auto-translated guests. The privcmd driver is only dealing with GFN so replace all the mention of MFN into GFN. The ioctl structure used to map foreign change has been left unchanged given that the userspace is using it. Nonetheless, add a comment to explain the expected value within the "mfn" field. Signed-off-by: Julien Grall <julien.grall@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-09-07Merge tag 'nfs-for-4.3-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds1-1/+1
Pull NFS client updates from Trond Myklebust: "Highlights include: Stable patches: - Fix atomicity of pNFS commit list updates - Fix NFSv4 handling of open(O_CREAT|O_EXCL|O_RDONLY) - nfs_set_pgio_error sometimes misses errors - Fix a thinko in xs_connect() - Fix borkage in _same_data_server_addrs_locked() - Fix a NULL pointer dereference of migration recovery ops for v4.2 client - Don't let the ctime override attribute barriers. - Revert "NFSv4: Remove incorrect check in can_open_delegated()" - Ensure flexfiles pNFS driver updates the inode after write finishes - flexfiles must not pollute the attribute cache with attrbutes from the DS - Fix a protocol error in layoutreturn - Fix a protocol issue with NFSv4.1 CLOSE stateids Bugfixes + cleanups - pNFS blocks bugfixes from Christoph - Various cleanups from Anna - More fixes for delegation corner cases - Don't fsync twice for O_SYNC/IS_SYNC files - Fix pNFS and flexfiles layoutstats bugs - pnfs/flexfiles: avoid duplicate tracking of mirror data - pnfs: Fix layoutget/layoutreturn/return-on-close serialisation issues - pnfs/flexfiles: error handling retries a layoutget before fallback to MDS Features: - Full support for the OPEN NFS4_CREATE_EXCLUSIVE4_1 mode from Kinglong - More RDMA client transport improvements from Chuck - Removal of the deprecated ib_reg_phys_mr() and ib_rereg_phys_mr() verbs from the SUNRPC, Lustre and core infiniband tree. - Optimise away the close-to-open getattr if there is no cached data" * tag 'nfs-for-4.3-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (108 commits) NFSv4: Respect the server imposed limit on how many changes we may cache NFSv4: Express delegation limit in units of pages Revert "NFS: Make close(2) asynchronous when closing NFS O_DIRECT files" NFS: Optimise away the close-to-open getattr if there is no cached data NFSv4.1/flexfiles: Clean up ff_layout_write_done_cb/ff_layout_commit_done_cb NFSv4.1/flexfiles: Mark the layout for return in ff_layout_io_track_ds_error() nfs: Remove unneeded checking of the return value from scnprintf nfs: Fix truncated client owner id without proto type NFSv4.1/flexfiles: Mark layout for return if the mirrors are invalid NFSv4.1/flexfiles: RW layouts are valid only if all mirrors are valid NFSv4.1/flexfiles: Fix incorrect usage of pnfs_generic_mark_devid_invalid() NFSv4.1/flexfiles: Fix freeing of mirrors NFSv4.1/pNFS: Don't request a minimal read layout beyond the end of file NFSv4.1/pnfs: Handle LAYOUTGET return values correctly NFSv4.1/pnfs: Don't ask for a read layout for an empty file. NFSv4.1: Fix a protocol issue with CLOSE stateids NFSv4.1/flexfiles: Don't mark the entire deviceid as bad for file errors SUNRPC: Prevent SYN+SYNACK+RST storms SUNRPC: xs_reset_transport must mark the connection as disconnected NFSv4.1/pnfs: Ensure layoutreturn reserves space for the opaque payload ...
2015-09-05Merge tag 'media/v4.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-mediaLinus Torvalds2-1/+5
Pull media updates from Mauro Carvalho Chehab: - new DVB frontend drivers: ascot2e, cxd2841er, horus3a, lnbh25 - new HDMI capture driver: tc358743 - new driver for NetUP DVB new boards (netup_unidvb) - IR support for DVBSky cards (smipcie-ir) - Coda driver has gain macroblock tiling support - Renesas R-Car gains JPEG codec driver - new DVB platform driver for STi boards: c8sectpfe - added documentation for the media core kABI to device-drivers DocBook - lots of driver fixups, cleanups and improvements * tag 'media/v4.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (297 commits) [media] c8sectpfe: Remove select on undefined LIBELF_32 [media] i2c: fix platform_no_drv_owner.cocci warnings [media] cx231xx: Use wake_up_interruptible() instead of wake_up_interruptible_nr() [media] tc358743: only queue subdev notifications if devnode is set [media] tc358743: add missing Kconfig dependency/select [media] c8sectpfe: Use %pad to print 'dma_addr_t' [media] DocBook media: Fix typo "the the" in xml files [media] tc358743: make reset gpio optional [media] tc358743: set direction of reset gpio using devm_gpiod_get [media] dvbdev: document most of the functions/data structs [media] dvb_frontend.h: document the struct dvb_frontend [media] dvb-frontend.h: document struct dtv_frontend_properties [media] dvb-frontend.h: document struct dvb_frontend_ops [media] dvb: Use DVBFE_ALGO_HW where applicable [media] dvb_frontend.h: document struct analog_demod_ops [media] dvb_frontend.h: Document struct dvb_tuner_ops [media] Docbook: Document struct analog_parameters [media] dvb_frontend.h: get rid of dvbfe_modcod [media] add documentation for struct dvb_tuner_info [media] dvb_frontend: document dvb_frontend_tune_settings ...
2015-09-05Merge tag 'nfsd-4.3' of git://linux-nfs.org/~bfields/linuxLinus Torvalds1-0/+1
Pull nfsd updates from Bruce Fields: "Nothing major, but: - Add Jeff Layton as an nfsd co-maintainer: no change to existing practice, just an acknowledgement of the status quo. - Two patches ("nfsd: ensure that...") for a race overlooked by the state locking rewrite, causing a crash noticed by multiple users. - Lots of smaller bugfixes all over from Kinglong Mee. - From Jeff, some cleanup of server rpc code in preparation for possible shift of nfsd threads to workqueues" * tag 'nfsd-4.3' of git://linux-nfs.org/~bfields/linux: (52 commits) nfsd: deal with DELEGRETURN racing with CB_RECALL nfsd: return CLID_INUSE for unexpected SETCLIENTID_CONFIRM case nfsd: ensure that delegation stateid hash references are only put once nfsd: ensure that the ol stateid hash reference is only put once net: sunrpc: fix tracepoint Warning: unknown op '->' nfsd: allow more than one laundry job to run at a time nfsd: don't WARN/backtrace for invalid container deployment. fs: fix fs/locks.c kernel-doc warning nfsd: Add Jeff Layton as co-maintainer NFSD: Return word2 bitmask if setting security label in OPEN/CREATE NFSD: Set the attributes used to store the verifier for EXCLUSIVE4_1 nfsd: SUPPATTR_EXCLCREAT must be encoded before SECURITY_LABEL. nfsd: Fix an FS_LAYOUT_TYPES/LAYOUT_TYPES encode bug NFSD: Store parent's stat in a separate value nfsd: Fix two typos in comments lockd: NLM grace period shouldn't block NFSv4 opens nfsd: include linux/nfs4.h in export.h sunrpc: Switch to using hash list instead single list sunrpc/nfsd: Remove redundant code by exports seq_operations functions sunrpc: Store cache_detail in seq_file's private directly ...
2015-09-05Merge branch 'akpm' (patches from Andrew)Linus Torvalds4-1/+187
Merge patch-bomb from Andrew Morton: - a few misc things - Andy's "ambient capabilities" - fs/nofity updates - the ocfs2 queue - kernel/watchdog.c updates and feature work. - some of MM. Includes Andrea's userfaultfd feature. [ Hadn't noticed that userfaultfd was 'default y' when applying the patches, so that got fixed in this merge instead. We do _not_ mark new features that nobody uses yet 'default y' - Linus ] * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits) mm/hugetlb.c: make vma_has_reserves() return bool mm/madvise.c: make madvise_behaviour_valid() return bool mm/memory.c: make tlb_next_batch() return bool mm/dmapool.c: change is_page_busy() return from int to bool mm: remove struct node_active_region mremap: simplify the "overlap" check in mremap_to() mremap: don't do uneccesary checks if new_len == old_len mremap: don't do mm_populate(new_addr) on failure mm: move ->mremap() from file_operations to vm_operations_struct mremap: don't leak new_vma if f_op->mremap() fails mm/hugetlb.c: make vma_shareable() return bool mm: make GUP handle pfn mapping unless FOLL_GET is requested mm: fix status code which move_pages() returns for zero page mm: memcontrol: bring back the VM_BUG_ON() in mem_cgroup_swapout() genalloc: add support of multiple gen_pools per device genalloc: add name arg to gen_pool_get() and devm_gen_pool_create() mm/memblock: WARN_ON when nid differs from overlap region Documentation/features/vm: add feature description and arch support status for batched TLB flush after unmap mm: defer flush of writable TLB entries mm: send one IPI per CPU to TLB flush all entries after unmapping pages ...
2015-09-04userfaultfd: UFFDIO_COPY|UFFDIO_ZEROPAGE uAPIAndrea Arcangeli1-1/+41
This implements the uABI of UFFDIO_COPY and UFFDIO_ZEROPAGE. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com> Cc: zhang.zhanghailiang@huawei.com Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Feiner <pfeiner@google.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04userfaultfd: change the read API to return a uffd_msgAndrea Arcangeli1-15/+55
I had requests to return the full address (not the page aligned one) to userland. It's not entirely clear how the page offset could be relevant because userfaults aren't like SIGBUS that can sigjump to a different place and it actually skip resolving the fault depending on a page offset. There's currently no real way to skip the fault especially because after a UFFDIO_COPY|ZEROPAGE, the fault is optimized to be retried within the kernel without having to return to userland first (not even self modifying code replacing the .text that touched the faulting address would prevent the fault to be repeated). Userland cannot skip repeating the fault even more so if the fault was triggered by a KVM secondary page fault or any get_user_pages or any copy-user inside some syscall which will return to kernel code. The second time FAULT_FLAG_RETRY_NOWAIT won't be set leading to a SIGBUS being raised because the userfault can't wait if it cannot release the mmap_map first (and FAULT_FLAG_RETRY_NOWAIT is required for that). Still returning userland a proper structure during the read() on the uffd, can allow to use the current UFFD_API for the future non-cooperative extensions too and it looks cleaner as well. Once we get additional fields there's no point to return the fault address page aligned anymore to reuse the bits below PAGE_SHIFT. The only downside is that the read() syscall will read 32bytes instead of 8bytes but that's not going to be measurable overhead. The total number of new events that can be extended or of new future bits for already shipped events, is limited to 64 by the features field of the uffdio_api structure. If more will be needed a bump of UFFD_API will be required. [akpm@linux-foundation.org: use __packed] Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com> Cc: zhang.zhanghailiang@huawei.com Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Feiner <pfeiner@google.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04userfaultfd: Rename uffd_api.bits into .featuresPavel Emelyanov1-3/+9
This is (seems to be) the minimal thing that is required to unblock standard uffd usage from the non-cooperative one. Now more bits can be added to the features field indicating e.g. UFFD_FEATURE_FORK and others needed for the latter use-case. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com> Cc: zhang.zhanghailiang@huawei.com Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Feiner <pfeiner@google.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04userfaultfd: uAPIAndrea Arcangeli2-0/+84
Defines the uAPI of the userfaultfd, notably the ioctl numbers and protocol. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com> Cc: zhang.zhanghailiang@huawei.com Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Feiner <pfeiner@google.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04capabilities: add a securebit to disable PR_CAP_AMBIENT_RAISEAndy Lutomirski1-1/+10
Per Andrew Morgan's request, add a securebit to allow admins to disable PR_CAP_AMBIENT_RAISE. This securebit will prevent processes from adding capabilities to their ambient set. For simplicity, this disables PR_CAP_AMBIENT_RAISE entirely rather than just disabling setting previously cleared bits. Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Andrew G. Morgan <morgan@kernel.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Cc: Kees Cook <keescook@chromium.org> Cc: Christoph Lameter <cl@linux.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Aaron Jones <aaronmdjones@gmail.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Andrew G. Morgan <morgan@kernel.org> Cc: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: Austin S Hemmelgarn <ahferroin7@gmail.com> Cc: Markku Savela <msa@moth.iki.fi> Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: James Morris <james.l.morris@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04capabilities: ambient capabilitiesAndy Lutomirski1-0/+7
Credit where credit is due: this idea comes from Christoph Lameter with a lot of valuable input from Serge Hallyn. This patch is heavily based on Christoph's patch. ===== The status quo ===== On Linux, there are a number of capabilities defined by the kernel. To perform various privileged tasks, processes can wield capabilities that they hold. Each task has four capability masks: effective (pE), permitted (pP), inheritable (pI), and a bounding set (X). When the kernel checks for a capability, it checks pE. The other capability masks serve to modify what capabilities can be in pE. Any task can remove capabilities from pE, pP, or pI at any time. If a task has a capability in pP, it can add that capability to pE and/or pI. If a task has CAP_SETPCAP, then it can add any capability to pI, and it can remove capabilities from X. Tasks are not the only things that can have capabilities; files can also have capabilities. A file can have no capabilty information at all [1]. If a file has capability information, then it has a permitted mask (fP) and an inheritable mask (fI) as well as a single effective bit (fE) [2]. File capabilities modify the capabilities of tasks that execve(2) them. A task that successfully calls execve has its capabilities modified for the file ultimately being excecuted (i.e. the binary itself if that binary is ELF or for the interpreter if the binary is a script.) [3] In the capability evolution rules, for each mask Z, pZ represents the old value and pZ' represents the new value. The rules are: pP' = (X & fP) | (pI & fI) pI' = pI pE' = (fE ? pP' : 0) X is unchanged For setuid binaries, fP, fI, and fE are modified by a moderately complicated set of rules that emulate POSIX behavior. Similarly, if euid == 0 or ruid == 0, then fP, fI, and fE are modified differently (primary, fP and fI usually end up being the full set). For nonroot users executing binaries with neither setuid nor file caps, fI and fP are empty and fE is false. As an extra complication, if you execute a process as nonroot and fE is set, then the "secure exec" rules are in effect: AT_SECURE gets set, LD_PRELOAD doesn't work, etc. This is rather messy. We've learned that making any changes is dangerous, though: if a new kernel version allows an unprivileged program to change its security state in a way that persists cross execution of a setuid program or a program with file caps, this persistent state is surprisingly likely to allow setuid or file-capped programs to be exploited for privilege escalation. ===== The problem ===== Capability inheritance is basically useless. If you aren't root and you execute an ordinary binary, fI is zero, so your capabilities have no effect whatsoever on pP'. This means that you can't usefully execute a helper process or a shell command with elevated capabilities if you aren't root. On current kernels, you can sort of work around this by setting fI to the full set for most or all non-setuid executable files. This causes pP' = pI for nonroot, and inheritance works. No one does this because it's a PITA and it isn't even supported on most filesystems. If you try this, you'll discover that every nonroot program ends up with secure exec rules, breaking many things. This is a problem that has bitten many people who have tried to use capabilities for anything useful. ===== The proposed change ===== This patch adds a fifth capability mask called the ambient mask (pA). pA does what most people expect pI to do. pA obeys the invariant that no bit can ever be set in pA if it is not set in both pP and pI. Dropping a bit from pP or pI drops that bit from pA. This ensures that existing programs that try to drop capabilities still do so, with a complication. Because capability inheritance is so broken, setting KEEPCAPS, using setresuid to switch to nonroot uids, and then calling execve effectively drops capabilities. Therefore, setresuid from root to nonroot conditionally clears pA unless SECBIT_NO_SETUID_FIXUP is set. Processes that don't like this can re-add bits to pA afterwards. The capability evolution rules are changed: pA' = (file caps or setuid or setgid ? 0 : pA) pP' = (X & fP) | (pI & fI) | pA' pI' = pI pE' = (fE ? pP' : pA') X is unchanged If you are nonroot but you have a capability, you can add it to pA. If you do so, your children get that capability in pA, pP, and pE. For example, you can set pA = CAP_NET_BIND_SERVICE, and your children can automatically bind low-numbered ports. Hallelujah! Unprivileged users can create user namespaces, map themselves to a nonzero uid, and create both privileged (relative to their namespace) and unprivileged process trees. This is currently more or less impossible. Hallelujah! You cannot use pA to try to subvert a setuid, setgid, or file-capped program: if you execute any such program, pA gets cleared and the resulting evolution rules are unchanged by this patch. Users with nonzero pA are unlikely to unintentionally leak that capability. If they run programs that try to drop privileges, dropping privileges will still work. It's worth noting that the degree of paranoia in this patch could possibly be reduced without causing serious problems. Specifically, if we allowed pA to persist across executing non-pA-aware setuid binaries and across setresuid, then, naively, the only capabilities that could leak as a result would be the capabilities in pA, and any attacker *already* has those capabilities. This would make me nervous, though -- setuid binaries that tried to privilege-separate might fail to do so, and putting CAP_DAC_READ_SEARCH or CAP_DAC_OVERRIDE into pA could have unexpected side effects. (Whether these unexpected side effects would be exploitable is an open question.) I've therefore taken the more paranoid route. We can revisit this later. An alternative would be to require PR_SET_NO_NEW_PRIVS before setting ambient capabilities. I think that this would be annoying and would make granting otherwise unprivileged users minor ambient capabilities (CAP_NET_BIND_SERVICE or CAP_NET_RAW for example) much less useful than it is with this patch. ===== Footnotes ===== [1] Files that are missing the "security.capability" xattr or that have unrecognized values for that xattr end up with has_cap set to false. The code that does that appears to be complicated for no good reason. [2] The libcap capability mask parsers and formatters are dangerously misleading and the documentation is flat-out wrong. fE is *not* a mask; it's a single bit. This has probably confused every single person who has tried to use file capabilities. [3] Linux very confusingly processes both the script and the interpreter if applicable, for reasons that elude me. The results from thinking about a script's file capabilities and/or setuid bits are mostly discarded. Preliminary userspace code is here, but it needs updating: https://git.kernel.org/cgit/linux/kernel/git/luto/util-linux-playground.git/commit/?h=cap_ambient&id=7f5afbd175d2 Here is a test program that can be used to verify the functionality (from Christoph): /* * Test program for the ambient capabilities. This program spawns a shell * that allows running processes with a defined set of capabilities. * * (C) 2015 Christoph Lameter <cl@linux.com> * Released under: GPL v3 or later. * * * Compile using: * * gcc -o ambient_test ambient_test.o -lcap-ng * * This program must have the following capabilities to run properly: * Permissions for CAP_NET_RAW, CAP_NET_ADMIN, CAP_SYS_NICE * * A command to equip the binary with the right caps is: * * setcap cap_net_raw,cap_net_admin,cap_sys_nice+p ambient_test * * * To get a shell with additional caps that can be inherited by other processes: * * ./ambient_test /bin/bash * * * Verifying that it works: * * From the bash spawed by ambient_test run * * cat /proc/$$/status * * and have a look at the capabilities. */ #include <stdlib.h> #include <stdio.h> #include <errno.h> #include <cap-ng.h> #include <sys/prctl.h> #include <linux/capability.h> /* * Definitions from the kernel header files. These are going to be removed * when the /usr/include files have these defined. */ #define PR_CAP_AMBIENT 47 #define PR_CAP_AMBIENT_IS_SET 1 #define PR_CAP_AMBIENT_RAISE 2 #define PR_CAP_AMBIENT_LOWER 3 #define PR_CAP_AMBIENT_CLEAR_ALL 4 static void set_ambient_cap(int cap) { int rc; capng_get_caps_process(); rc = capng_update(CAPNG_ADD, CAPNG_INHERITABLE, cap); if (rc) { printf("Cannot add inheritable cap\n"); exit(2); } capng_apply(CAPNG_SELECT_CAPS); /* Note the two 0s at the end. Kernel checks for these */ if (prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_RAISE, cap, 0, 0)) { perror("Cannot set cap"); exit(1); } } int main(int argc, char **argv) { int rc; set_ambient_cap(CAP_NET_RAW); set_ambient_cap(CAP_NET_ADMIN); set_ambient_cap(CAP_SYS_NICE); printf("Ambient_test forking shell\n"); if (execv(argv[1], argv + 1)) perror("Cannot exec"); return 0; } Signed-off-by: Christoph Lameter <cl@linux.com> # Original author Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Aaron Jones <aaronmdjones@gmail.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Andrew G. Morgan <morgan@kernel.org> Cc: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: Austin S Hemmelgarn <ahferroin7@gmail.com> Cc: Markku Savela <msa@moth.iki.fi> Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: James Morris <james.l.morris@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds3-6/+55
Pull drm updates from Dave Airlie: "This is the main pull request for the drm for 4.3. Nouveau is probably the biggest amount of changes in here, since it missed 4.2. Highlights below, along with the usual bunch of fixes. All stuff outside drm should have applicable acks. Highlights: - new drivers: freescale dcu kms driver - core: more atomic fixes disable some dri1 interfaces on kms drivers drop fb panic handling, this was just getting more broken, as more locking was required. new core fbdev Kconfig support - instead of each driver enable/disabling it struct_mutex cleanups - panel: more new panels cleanup Kconfig - i915: Skylake support enabled by default legacy modesetting using atomic infrastructure Skylake fixes GEN9 workarounds - amdgpu: Fiji support CGS support for amdgpu Initial GPU scheduler - off by default Lots of bug fixes and optimisations. - radeon: DP fixes misc fixes - amdkfd: Add Carrizo support for amdkfd using amdgpu. - nouveau: long pending cleanup to complete driver, fully bisectable which makes it larger, perfmon work more reclocking improvements maxwell displayport fixes - vmwgfx: new DX device support, supports OpenGL 3.3 screen targets support - mgag200: G200eW support G200e new revision support - msm: dragonboard 410c support, msm8x94 support, msm8x74v1 support yuv format support dma plane support mdp5 rotation initial hdcp - sti: atomic support - exynos: lots of cleanups atomic modesetting/pageflipping support render node support - tegra: tegra210 support (dc, dsi, dp/hdmi) dpms with atomic modesetting support - atmel: support for 3 more atmel SoCs new input formats, PRIME support. - dwhdmi: preparing to add audio support - rockchip: yuv plane support" * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (1369 commits) drm/amdgpu: rename gmc_v8_0_init_compute_vmid drm/amdgpu: fix vce3 instance handling drm/amdgpu: remove ib test for the second VCE Ring drm/amdgpu: properly enable VM fault interrupts drm/amdgpu: fix warning in scheduler drm/amdgpu: fix buffer placement under memory pressure drm/amdgpu/cz: fix cz_dpm_update_low_memory_pstate logic drm/amdgpu: fix typo in dce11 watermark setup drm/amdgpu: fix typo in dce10 watermark setup drm/amdgpu: use top down allocation for non-CPU accessible vram drm/amdgpu: be explicit about cpu vram access for driver BOs (v2) drm/amdgpu: set MEC doorbell range for Fiji drm/amdgpu: implement burst NOP for SDMA drm/amdgpu: add insert_nop ring func and default implementation drm/amdgpu: add amdgpu_get_sdma_instance helper function drm/amdgpu: add AMDGPU_MAX_SDMA_INSTANCES drm/amdgpu: add burst_nop flag for sdma drm/amdgpu: add count field for the SDMA NOP packet v2 drm/amdgpu: use PT for VM sync on unmap drm/amdgpu: make wait_event uninterruptible in push_job ...
2015-09-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tileLinus Torvalds2-0/+5
Pull tile updates from Chris Metcalf: "This includes secure computing support as well as miscellaneous minor improvements" * git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: tile: correct some typos in opcode type names tile/vdso: emit a GNU hash as well tile: Remove finish_arch_switch tile: enable full SECCOMP support tile/time: Migrate to new 'set-state' interface
2015-09-03Merge tag 'powerpc-4.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds1-1/+3
Pull powerpc updates from Michael Ellerman: - support "hybrid" iommu/direct DMA ops for coherent_mask < dma_mask from Benjamin Herrenschmidt - EEH fixes for SRIOV from Gavin - introduce rtas_get_sensor_fast() for IRQ handlers from Thomas Huth - use hardware RNG for arch_get_random_seed_* not arch_get_random_* from Paul Mackerras - seccomp filter support from Michael Ellerman - opal_cec_reboot2() handling for HMIs & machine checks from Mahesh Salgaonkar - add powerpc timebase as a trace clock source from Naveen N. Rao - misc cleanups in the xmon, signal & SLB code from Anshuman Khandual - add an inline function to update POWER8 HID0 from Gautham R. Shenoy - fix pte_pagesize_index() crash on 4K w/64K hash from Michael Ellerman - drop support for 64K local store on 4K kernels from Michael Ellerman - move dma_get_required_mask() from pnv_phb to pci_controller_ops from Andrew Donnellan - initialize distance lookup table from drconf path from Nikunj A Dadhania - enable RTC class support from Vaibhav Jain - disable automatically blocked PCI config from Gavin Shan - add LEDs driver for PowerNV platform from Vasant Hegde - fix endianness issues in the HVSI driver from Laurent Dufour - kexec endian fixes from Samuel Mendoza-Jonas - fix corrupted pdn list from Gavin Shan - fix fenced PHB caused by eeh_slot_error_detail() from Gavin Shan - Freescale updates from Scott: Highlights include 32-bit memcpy/memset optimizations, checksum optimizations, 85xx config fragments and updates, device tree updates, e6500 fixes for non-SMP, and misc cleanup and minor fixes. - a ton of cxl updates & fixes: - add explicit precision specifiers from Rasmus Villemoes - use more common format specifier from Rasmus Villemoes - destroy cxl_adapter_idr on module_exit from Johannes Thumshirn - destroy afu->contexts_idr on release of an afu from Johannes Thumshirn - compile with -Werror from Daniel Axtens - EEH support from Daniel Axtens - plug irq_bitmap getting leaked in cxl_context from Vaibhav Jain - add alternate MMIO error handling from Ian Munsie - allow release of contexts which have been OPENED but not STARTED from Andrew Donnellan - remove use of macro DEFINE_PCI_DEVICE_TABLE from Vaishali Thakkar - release irqs if memory allocation fails from Vaibhav Jain - remove racy attempt to force EEH invocation in reset from Daniel Axtens - fix + cleanup error paths in cxl_dev_context_init from Ian Munsie - fix force unmapping mmaps of contexts allocated through the kernel api from Ian Munsie - set up and enable PSL Timebase from Philippe Bergheaud * tag 'powerpc-4.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (140 commits) cxl: Set up and enable PSL Timebase cxl: Fix force unmapping mmaps of contexts allocated through the kernel api cxl: Fix + cleanup error paths in cxl_dev_context_init powerpc/eeh: Fix fenced PHB caused by eeh_slot_error_detail() powerpc/pseries: Cleanup on pci_dn_reconfig_notifier() powerpc/pseries: Fix corrupted pdn list powerpc/powernv: Enable LEDS support powerpc/iommu: Set default DMA offset in dma_dev_setup cxl: Remove racy attempt to force EEH invocation in reset cxl: Release irqs if memory allocation fails cxl: Remove use of macro DEFINE_PCI_DEVICE_TABLE powerpc/powernv: Fix mis-merge of OPAL support for LEDS driver powerpc/powernv: Reset HILE before kexec_sequence() powerpc/kexec: Reset secondary cpu endianness before kexec powerpc/hvsi: Fix endianness issues in the HVSI driver leds/powernv: Add driver for PowerNV platform powerpc/powernv: Create LED platform device powerpc/powernv: Add OPAL interfaces for accessing and modifying system LED states powerpc/powernv: Fix the log message when disabling VF cxl: Allow release of contexts which have been OPENED but not STARTED ...
2015-09-03Merge tag 'dlm-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlmLinus Torvalds1-1/+1
Pull dlm updates from David Teigland: "This set mainly includes a change to the way the dlm uses the SCTP API in the kernel, removing the direct dependency on the sctp module. Other odd SCTP-related fixes are also included. The other notable fix is for a long standing regression in the behavior of lock value blocks for user space locks" * tag 'dlm-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: print error from kernel_sendpage dlm: fix lvb copy for user locks dlm: sctp_accept_from_sock() can be static dlm: fix reconnecting but not sending data dlm: replace BUG_ON with a less severe handling dlm: use sctp 1-to-1 API dlm: fix not reconnecting on connecting error handling dlm: fix race while closing connections dlm: fix connection stealing if using SCTP
2015-09-03IB/hfi1: Add PSM2 user space header to header_installIra Weiny2-0/+3
When the hfi1 driver was added a user space header file (hfi1_user.h) was added to be shared between PSM2 and the driver. However, the file was not added to the header install. Add it now. Fixes: d4ab347005fb ("IB/core: Add core header changes needed for OPA") Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds25-8/+267
Pull networking updates from David Miller: "Another merge window, another set of networking changes. I've heard rumblings that the lightweight tunnels infrastructure has been voted networking change of the year. But what do I know? 1) Add conntrack support to openvswitch, from Joe Stringer. 2) Initial support for VRF (Virtual Routing and Forwarding), which allows the segmentation of routing paths without using multiple devices. There are some semantic kinks to work out still, but this is a reasonably strong foundation. From David Ahern. 3) Remove spinlock fro act_bpf fast path, from Alexei Starovoitov. 4) Ignore route nexthops with a link down state in ipv6, just like ipv4. From Andy Gospodarek. 5) Remove spinlock from fast path of act_gact and act_mirred, from Eric Dumazet. 6) Document the DSA layer, from Florian Fainelli. 7) Add netconsole support to bcmgenet, systemport, and DSA. Also from Florian Fainelli. 8) Add Mellanox Switch Driver and core infrastructure, from Jiri Pirko. 9) Add support for "light weight tunnels", which allow for encapsulation and decapsulation without bearing the overhead of a full blown netdevice. From Thomas Graf, Jiri Benc, and a cast of others. 10) Add Identifier Locator Addressing support for ipv6, from Tom Herbert. 11) Support fragmented SKBs in iwlwifi, from Johannes Berg. 12) Allow perf PMUs to be accessed from eBPF programs, from Kaixu Xia. 13) Add BQL support to 3c59x driver, from Loganaden Velvindron. 14) Stop using a zero TX queue length to mean that a device shouldn't have a qdisc attached, use an explicit flag instead. From Phil Sutter. 15) Use generic geneve netdevice infrastructure in openvswitch, from Pravin B Shelar. 16) Add infrastructure to avoid re-forwarding a packet in software that was already forwarded by a hardware switch. From Scott Feldman. 17) Allow AF_PACKET fanout function to be implemented in a bpf program, from Willem de Bruijn" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1458 commits) netfilter: nf_conntrack: make nf_ct_zone_dflt built-in netfilter: nf_dup{4, 6}: fix build error when nf_conntrack disabled net: fec: clear receive interrupts before processing a packet ipv6: fix exthdrs offload registration in out_rt path xen-netback: add support for multicast control bgmac: Update fixed_phy_register() sock, diag: fix panic in sock_diag_put_filterinfo flow_dissector: Use 'const' where possible. flow_dissector: Fix function argument ordering dependency ixgbe: Resolve "initialized field overwritten" warnings ixgbe: Remove bimodal SR-IOV disabling ixgbe: Add support for reporting 2.5G link speed ixgbe: fix bounds checking in ixgbe_setup_tc for 82598 ixgbe: support for ethtool set_rxfh ixgbe: Avoid needless PHY access on copper phys ixgbe: cleanup to use cached mask value ixgbe: Remove second instance of lan_id variable ixgbe: use kzalloc for allocating one thing flow: Move __get_hash_from_flowi{4,6} into flow_dissector.c ixgbe: Remove unused PCI bus types ...
2015-09-02Merge tag 'dm-4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dmLinus Torvalds1-2/+2
Pull device mapper update from Mike Snitzer: - a couple small cleanups in dm-cache, dm-verity, persistent-data's dm-btree, and DM core. - a 4.1-stable fix for dm-cache that fixes the leaking of deferred bio prison cells - a 4.2-stable fix that adds feature reporting for the dm-stats features added in 4.2 - improve DM-snapshot to not invalidate the on-disk snapshot if snapshot device write overflow occurs; but a write overflow triggered through the origin device will still invalidate the snapshot. - optimize DM-thinp's async discard submission a bit now that late bio splitting has been included in block core. - switch DM-cache's SMQ policy lock from using a mutex to a spinlock; improves performance on very low latency devices (eg. NVMe SSD). - document DM RAID 4/5/6's discard support [ I did not pull the slab changes, which weren't appropriate for this tree, and weren't obviously the right thing to do anyway. At the very least they need some discussion and explanation before getting merged. Because not pulling the actual tagged commit but doing a partial pull instead, this merge commit thus also obviously is missing the git signature from the original tag ] * tag 'dm-4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm cache: fix use after freeing migrations dm cache: small cleanups related to deferred prison cell cleanup dm cache: fix leaking of deferred bio prison cells dm raid: document RAID 4/5/6 discard support dm stats: report precise_timestamps and histogram in @stats_list output dm thin: optimize async discard submission dm snapshot: don't invalidate on-disk image on snapshot write overflow dm: remove unlikely() before IS_ERR() dm: do not override error code returned from dm_get_device() dm: test return value for DM_MAPIO_SUBMITTED dm verity: remove unused mempool dm cache: move wake_waker() from free_migrations() to where it is needed dm btree remove: remove unused function get_nr_entries() dm btree: remove unused "dm_block_t root" parameter in btree_split_sibling() dm cache policy smq: change the mutex to a spinlock
2015-09-02Merge branch 'for-4.3/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds1-0/+1
Pull block driver updates from Jens Axboe: "On top of the 4.3 core block IO changes, here are the driver related changes for 4.3. Basically just NVMe and nbd this time around: - NVMe: - PRACT PI improvement from Alok Pandey. - Cleanups and improvements on submission queue doorbell and writing, using CMB if available. From Jon Derrick. - From Keith, support for setting queue maximum segments, and reset support. - Also from Jon, fixup of u64 division issue on 32-bit archs and wiring up of the reset support through and ioctl. - Two small cleanups from Matias and Sunad - Various code cleanups and fixes from Markus Pargmann" * 'for-4.3/drivers' of git://git.kernel.dk/linux-block: NVMe: Using PRACT bit to generate and verify PI by controller NVMe:Remove unreachable code in nvme_abort_req NVMe: Add nvme subsystem reset IOCTL NVMe: Add nvme subsystem reset support NVMe: removed unused nn var from nvme_dev_add NVMe: Set queue max segments nbd: flags is a u32 variable nbd: Rename functions for clearness of recv/send path nbd: Change 'disconnect' to be boolean nbd: Add debugfs entries nbd: Remove variable 'pid' nbd: Move clear queue debug message nbd: Remove 'harderror' and propagate error properly nbd: restructure sock_shutdown nbd: sock_shutdown, remove conditional lock nbd: Fix timeout detection nvme: Fixes u64 division which breaks i386 builds NVMe: Use CMB for the IO SQes if available NVMe: Unify SQ entry writing and doorbell ringing
2015-09-02Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds2-0/+175
Pull first round of SCSI updates from James Bottomley: "This includes one new driver: cxlflash plus the usual grab bag of updates for the major drivers: qla2xxx, ipr, storvsc, pm80xx, hptiop, plus a few assorted fixes. There's another tranch coming, but I want to incubate it another few days in the checkers, plus it includes a mpt2sas separated lifetime fix, which Avago won't get done testing until Friday" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (85 commits) aic94xx: set an error code on failure storvsc: Set the error code correctly in failure conditions storvsc: Allow write_same when host is windows 10 storvsc: use storage protocol version to determine storage capabilities storvsc: use correct defaults for values determined by protocol negotiation storvsc: Untangle the storage protocol negotiation from the vmbus protocol negotiation. storvsc: Use a single value to track protocol versions storvsc: Rather than look for sets of specific protocol versions, make decisions based on ranges. cxlflash: Remove unused variable from queuecommand cxlflash: shift wrapping bug in afu_link_reset() cxlflash: off by one bug in cxlflash_show_port_status() cxlflash: Virtual LUN support cxlflash: Superpipe support cxlflash: Base error recovery support qla2xxx: Update driver version to 8.07.00.26-k qla2xxx: Add pci device id 0x2261. qla2xxx: Fix missing device login retries. qla2xxx: do not clear slot in outstanding cmd array qla2xxx: Remove decrement of sp reference count in abort handler. qla2xxx: Add support to show MPI and PEP FW version for ISP27xx. ...
2015-09-02uapi/drm/i915_drm.h: fix userspace compilation.Artem Savkov1-1/+1
commit 346add7834557b5b9628b9bf2387106d42e631d4 Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Tue Jul 14 18:07:30 2015 +0200 drm/i915: Use expcitly fixed type in compat32 structs changed the type of param field in drm_i915_getparam from int to s32. This header is exported to userspace and needs to use userspace type __s32 instead. This fixes userspace compilation errors like the following: include/drm/i915_drm.h:361:2: error: unknown type name 's32' s32 param; Signed-off-by: Artem Savkov <asavkov@redhat.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2015-08-31Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-2/+33
Pull perf updates from Ingo Molnar: "Main perf kernel side changes: - uprobes updates/fixes. (Oleg Nesterov) - Add PERF_RECORD_SWITCH to indicate context switches and use it in tooling. (Adrian Hunter) - Support BPF programs attached to uprobes and first steps for BPF tooling support. (Wang Nan) - x86 generic x86 MSR-to-perf PMU driver. (Andy Lutomirski) - x86 Intel PT, LBR and BTS updates. (Alexander Shishkin) - x86 Intel Skylake support. (Andi Kleen) - x86 Intel Knights Landing (KNL) RAPL support. (Dasaratharaman Chandramouli) - x86 Intel Broadwell-DE uncore support. (Kan Liang) - x86 hw breakpoints robustization (Andy Lutomirski) Main perf tooling side changes: - Support Intel PT in several tools, enabling the use of the processor trace feature introduced in Intel Broadwell processors: (Adrian Hunter) # dmesg | grep Performance # [0.188477] Performance Events: PEBS fmt2+, 16-deep LBR, Broadwell events, full-width counters, Intel PMU driver. # perf record -e intel_pt//u -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.216 MB perf.data ] # perf script # then navigate in the tool output to some area, like this one: 184 1030 dl_main (/usr/lib64/ld-2.17.so) => 7f21ba661440 dl_main (/usr/lib64/ld-2.17.so) 185 1457 dl_main (/usr/lib64/ld-2.17.so) => 7f21ba669f10 _dl_new_object (/usr/lib64/ld-2.17.so) 186 9f37 _dl_new_object (/usr/lib64/ld-2.17.so) => 7f21ba677b90 strlen (/usr/lib64/ld-2.17.so) 187 7ba3 strlen (/usr/lib64/ld-2.17.so) => 7f21ba677c75 strlen (/usr/lib64/ld-2.17.so) 188 7c78 strlen (/usr/lib64/ld-2.17.so) => 7f21ba669f3c _dl_new_object (/usr/lib64/ld-2.17.so) 189 9f8a _dl_new_object (/usr/lib64/ld-2.17.so) => 7f21ba65fab0 calloc@plt (/usr/lib64/ld-2.17.so) 190 fab0 calloc@plt (/usr/lib64/ld-2.17.so) => 7f21ba675e70 calloc (/usr/lib64/ld-2.17.so) 191 5e87 calloc (/usr/lib64/ld-2.17.so) => 7f21ba65fa90 malloc@plt (/usr/lib64/ld-2.17.so) 192 fa90 malloc@plt (/usr/lib64/ld-2.17.so) => 7f21ba675e60 malloc (/usr/lib64/ld-2.17.so) 193 5e68 malloc (/usr/lib64/ld-2.17.so) => 7f21ba65fa80 __libc_memalign@plt (/usr/lib64/ld-2.17.so) 194 fa80 __libc_memalign@plt (/usr/lib64/ld-2.17.so) => 7f21ba675d50 __libc_memalign (/usr/lib64/ld-2.17.so) 195 5d63 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675e20 __libc_memalign (/usr/lib64/ld-2.17.so) 196 5e40 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675d73 __libc_memalign (/usr/lib64/ld-2.17.so) 197 5d97 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675e18 __libc_memalign (/usr/lib64/ld-2.17.so) 198 5e1e __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675df9 __libc_memalign (/usr/lib64/ld-2.17.so) 199 5e10 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba669f8f _dl_new_object (/usr/lib64/ld-2.17.so) 200 9fc2 _dl_new_object (/usr/lib64/ld-2.17.so) => 7f21ba678e70 memcpy (/usr/lib64/ld-2.17.so) 201 8e8c memcpy (/usr/lib64/ld-2.17.so) => 7f21ba678ea0 memcpy (/usr/lib64/ld-2.17.so) - Add support for using several Intel PT features (CYC, MTC packets), the relevant documentation was updated in: tools/perf/Documentation/intel-pt.txt briefly describing those packets, its purposes, how to configure them in the event config terms and relevant external documentation for further reading. (Adrian Hunter) - Introduce support for probing at an absolute address, for user and kernel 'perf probe's, useful when one have the symbol maps on a developer machine but not on an embedded system. (Wang Nan) - Add Intel BTS support, with a call-graph script to show it and PT in use in a GUI using 'perf script' python scripting with postgresql and Qt. (Adrian Hunter) - Allow selecting the type of callchains per event, including disabling callchains in all but one entry in an event list, to save space, and also to ask for the callchains collected in one event to be used in other events. (Kan Liang) - Beautify more syscall arguments in 'perf trace': (Arnaldo Carvalho de Melo) * A bunch more translate file/pathnames from pointers to strings. * Convert numbers to strings for the 'keyctl' syscall 'option' arg. * Add missing 'clockid' entries. - Introduce 'srcfile' sort key: (Andi Kleen) # perf record -F 10000 usleep 1 # perf report --stdio --dsos '[kernel.vmlinux]' -s srcfile <SNIP> # Overhead Source File 26.49% copy_page_64.S 5.49% signal.c 0.51% msr.h # It can be combined with other fields, for instance, experiment with '-s srcfile,symbol'. There are some oddities in some distros and with some specific DSOs, being investigated, so your mileage may vary. - Support per-event 'freq' term: (Namhyung Kim) $ perf record -e 'cpu/instructions,freq=1234/',cycles -c 1000 sleep 1 $ perf evlist -F cpu/instructions,freq=1234/: sample_freq=1234 cycles: sample_period=1000 $ - Deref sys_enter pointer args with contents from probe:vfs_getname, showing pathnames instead of pointers in many syscalls in 'perf trace'. (Arnaldo Carvalho de Melo) - Stop collecting /proc/kallsyms in perf.data files, saving about 4.5MB on a typical x86-64 system, use the the symbol resolution routines used in all the other tools (report, top, etc) now that we can ask libtraceevent to use perf's symbol resolution code. (Arnaldo Carvalho de Melo) - Allow filtering out of perf's PID via 'perf record --exclude-perf'. (Wang Nan) - 'perf trace' now supports syscall groups, like strace, i.e: $ trace -e file touch file Will expand 'file' into multiple, file related, syscalls. More work needed to add extra groups for other syscall groups, and also to complement what was added for the 'file' group, included as a proof of concept. (Arnaldo Carvalho de Melo) - Add lock_pi stresser to 'perf bench futex', to test the kernel code related to FUTEX_(UN)LOCK_PI. (Davidlohr Bueso) - Let user have timestamps with per-thread recording in 'perf record' (Adrian Hunter) - ... and tons of other changes, see the shortlog and the Git log for details" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (240 commits) perf evlist: Add backpointer for perf_env to evlist perf tools: Rename perf_session_env to perf_env perf tools: Do not change lib/api/fs/debugfs directly perf tools: Add tracing_path and remove unneeded functions perf buildid: Introduce sysfs/filename__sprintf_build_id perf evsel: Add a backpointer to the evlist a evsel is in perf trace: Add header with copyright and background info perf scripts python: Add new compaction-times script perf stat: Get correct cpu id for print_aggr tools lib traceeveent: Allow for negative numbers in print format perf script: Add --[no-]-demangle/--[no-]-demangle-kernel tracing/uprobes: Do not print '0x (null)' when offset is 0 perf probe: Support probing at absolute address perf probe: Fix error reported when offset without function perf probe: Fix list result when address is zero perf probe: Fix list result when symbol can't be found tools build: Allow duplicate objects in the object list perf tools: Remove export.h from MANIFEST perf probe: Prevent segfault when reading probe point with absolute address perf tools: Update Intel PT documentation ...
2015-08-31Merge tag 'usb-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usbLinus Torvalds1-0/+12
Pull USB updates from Greg KH: "Here's the big USB and PHY patchset for 4.3-rc1. As usual, the majority of the changes are in the USB gadget portion of the tree, lots of little changes all over the place for bugs and new hardware. Other than that, the normal mix of new hardware support and bugfixes. All have been in linux-next with no reported issues" * tag 'usb-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (261 commits) USB: qcserial: add HP lt4111 LTE/EV-DO/HSPA+ Gobi 4G Module USB: ftdi_sio: Added custom PID for CustomWare products USB: usb_wwan: silence read errors on disconnect USB: option: silence interrupt errors USB: symbolserial: Correct transferred data size USB: symbolserial: Use usb_get_serial_port_data usb: misc: usbtest: format max packet size for iso transfer usb: host: ehci-sys: delete useless bus_to_hcd conversion Revert "usb: interface authorization: Declare authorized attribute" Revert "usb: interface authorization: Introduces the default interface authorization" Revert "usb: interface authorization: Control interface probing and claiming" Revert "usb: interface authorization: Introduces the USB interface authorization" Revert "usb: interface authorization: SysFS part of USB interface authorization" Revert "usb: interface authorization: Documentation part" Revert "usb: interface authorization: Use a flag for the default device authorization" usb: core: hub: Removed some warnings generated by checkpatch.pl USB: host: ohci-at91: merge loops in ohci_hcd_at91_drv_probe USB: host: ohci-at91: merge ohci_at91_of_init in ohci_hcd_at91_drv_probe USB: host: ohci-at91: depend on OF USB: host: ohci-at91: move at91_usbh_data definition in c file ...
2015-08-31Merge tag 'tty-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/ttyLinus Torvalds1-0/+1
Pull tty/serial driver updates from Greg KH: "Here is the big tty/serial driver update for 4.3-rc1. Not many major things, a number of driver updates and changes, and the 8250 driver got split up a bit to make it easier to work with by moving some functions to a new file. Full details are in the shortlog. All have been in linux-next with no reported issues" * tag 'tty-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (90 commits) serial: imx: save and restore context in the suspend path serial: imx: allow waking up on RTSD serial: imx: introduce serial_imx_enable_wakeup() serial: imx: remove unbalanced clk_prepare serial: 8250: move rx_running out of the bitfield tty: serial: 8250_omap: do not use RX DMA if pause is not supported serial:8250_dw: do not alter CTS and DCTS since AFE is enabled tty: serial: men_z135_uart.c: Don't initialize port->lock tty: serial: men_z135_uart.c: Fix race between IRQ and set_termios() serial: 8250: bind to ALi Fast Infrared Controller (ALI5123) serial: 8250: don't bind to SMSC IrCC IR port serial: mxs-auart: fix baud rate range serial: mxs-auart: keep the AUART unit in reset state when not in use serial: mxs-auart: use a function name to reflect what it really does serial: 8250_pci: fix mode after S3/S4 resume for F81504/508/512 sc16is7xx: constify devtype sc16is7xx: support multiple devices sc16is7xx: save and use per-chip line number uart: pl011: Add support to ZTE ZX296702 uart uart: pl011: Improve LCRH register access decision ...
2015-08-31fib, fib6: reject invalid feature bitsDaniel Borkmann1-4/+7
Feature bits that are invalid should not be accepted by the kernel, only the lower 4 bits may be configured, but not the remaining ones. Even from these 4, 2 of them are unused. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31Merge tag 'char-misc-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-miscLinus Torvalds1-0/+19
Pull char/misc driver patches from Greg KH: "Here's the "big" char/misc driver update for 4.3-rc1. Not much really interesting here, just a number of little changes all over the place, and some nice consolidation of the nvmem drivers to a common framework. As usual, the mei drivers stand out as the largest "churn" to handle new devices and features in their hardware. All have been in linux-next for a while with no issues" * tag 'char-misc-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (136 commits) auxdisplay: ks0108: initialize local parport variable extcon: palmas: Fix build break due to devm_gpiod_get_optional API change extcon: palmas: Support GPIO based USB ID detection extcon: Fix signedness bugs about break error handling extcon: Drop owner assignment from i2c_driver extcon: arizona: Simplify pdata symantics for micd_dbtime extcon: arizona: Declare 3-pole jack if we detect open circuit on mic extcon: Add exception handling to prevent the NULL pointer access extcon: arizona: Ensure variables are set for headphone detection extcon: arizona: Use gpiod inteface to handle micd_pol_gpio gpio extcon: arizona: Add basic microphone detection DT/ACPI bindings extcon: arizona: Update to use the new device properties API extcon: palmas: Remove the mutually_exclusive array extcon: Remove optional print_state() function pointer of struct extcon_dev extcon: Remove duplicate header file in extcon.h extcon: max77843: Clear IRQ bits state before request IRQ toshiba laptop: replace ioremap_cache with ioremap misc: eeprom: max6875: clean up max6875_read() misc: eeprom: clean up eeprom_read() misc: eeprom: 93xx46: clean up eeprom_93xx46_bin_read/write ...
2015-08-31Merge tag 'kvm-4.3-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+2
Pull kvm updates from Paolo Bonzini: "A very small release for x86 and s390 KVM. - s390: timekeeping changes, cleanups and fixes - x86: support for Hyper-V MSRs to report crashes, and a bunch of cleanups. One interesting feature that was planned for 4.3 (emulating the local APIC in kernel while keeping the IOAPIC and 8254 in userspace) had to be delayed because Intel complained about my reading of the manual" * tag 'kvm-4.3-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits) x86/kvm: Rename VMX's segment access rights defines KVM: x86/vPMU: Fix unnecessary signed extension for AMD PERFCTRn kvm: x86: Fix error handling in the function kvm_lapic_sync_from_vapic KVM: s390: Fix assumption that kvm_set_irq_routing is always run successfully KVM: VMX: drop ept misconfig check KVM: MMU: fully check zero bits for sptes KVM: MMU: introduce is_shadow_zero_bits_set() KVM: MMU: introduce the framework to check zero bits on sptes KVM: MMU: split reset_rsvds_bits_mask_ept KVM: MMU: split reset_rsvds_bits_mask KVM: MMU: introduce rsvd_bits_validate KVM: MMU: move FNAME(is_rsvd_bits_set) to mmu.c KVM: MMU: fix validation of mmio page fault KVM: MTRR: Use default type for non-MTRR-covered gfn before WARN_ON KVM: s390: host STP toleration for VMs KVM: x86: clean/fix memory barriers in irqchip_in_kernel KVM: document memory barriers for kvm->vcpus/kvm->online_vcpus KVM: x86: remove unnecessary memory barriers for shared MSRs KVM: move code related to KVM_SET_BOOT_CPU_ID to x86 KVM: s390: log capability enablement and vm attribute changes ...
2015-08-31Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar1-0/+6
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-0/+1
2015-08-30IB/netlink: Add defines for local service requests through netlinkKaike Wan1-0/+82
This patch adds netlink defines for local service client, local service group, local service operations, and related attributes. Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: John Fleck <john.fleck@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-28netlink: add NETLINK_CAP_ACK socket optionChristophe Ricard1-0/+1
Since commit c05cdb1b864f ("netlink: allow large data transfers from user-space"), the kernel may fail to allocate the necessary room for the acknowledgment message back to userspace. This patch introduces a new socket option that trims off the payload of the original netlink message. The netlink message header is still included, so the user can guess from the sequence number what is the message that has triggered the acknowledgment. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-28IB/core: Add core header changes needed for OPADennis Dalessandro1-0/+427
This patch adds the value of the CNP opcode to the existing list of enumerated opcodes in ib_pack.h Add common OPA header definitions for driver build: - opa_port_info.h - opa_smi.h - hfi1_user.h Additionally, ib_mad.h, has additional definitions that are common to ib_drivers including: - trap support - cca support The qib driver has the duplication removed in favor those in ib_mad.h Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: John, Jubin <jubin.john@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>