aboutsummaryrefslogtreecommitdiffstats
path: root/arch (follow)
AgeCommit message (Collapse)AuthorFilesLines
2017-11-16s390/cpum_sf: load program parameter at sampler enablementHendrik Brueckner2-0/+9
The lpp instruction is used to place the PID of the current task in the program-parameter (PP) register. The register contents is then included in the sampling data entries. The lpp instruction loads the PP register only when at least one sampling function is enabled. Otherwise it is executed as a no-op. Linux calls lpp at context switch. If the context switch happens before the sampler is enabled, the PP register is empty. That means, the PID of the task that is sampled is not stored in sampling data until the next context switch. Hence, always call lpp when enabling the sampler. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-16s390/perf: extend perf_regs support to include floating-point registersHendrik Brueckner2-0/+37
Extend the perf register support to also export floating-point register contents for user space tasks. Floating-point registers might be used in leaf functions to contain the return address. Hence, they are required for proper DWARF unwinding. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Reviewed-and-tested-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-16s390/perf: add perf_regs support and user stack dumpHeiko Carstens4-1/+79
Add s390 support to dump user stack to user space for DWARF stack unwinding. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Reviewed-and-tested-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-16s390/cpum_sf: do not register PMU if no sampling mode is authorizedHendrik Brueckner1-0/+3
Previously, the cpum_sf PMU was registered even if there is no sampling mode authorized. Add a check and register cpum_sf only at least one sampling mode is authorized. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-16s390/cpumf: remove raw event support in basic-only sampling modePu Hou2-175/+27
Raw sample was implemented to export the diagnostic samples. With having this achieved with AUX buffers, there is no requirement for basic samples to export raw data. In particular, most basic sampling information are consumed for creating the perf event sample. Signed-off-by: Pu Hou <bjhoupu@linux.vnet.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-16s390/cpumf: enable using AUX bufferPu Hou1-11/+44
Modify PMU callback to use AUX buffer for diagnostic mode sampling. Basic-mode sampling still use orignal way. Signed-off-by: Pu Hou <bjhoupu@linux.vnet.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-16s390/cpumf: introduce AUX buffer for dump diagnostic sample dataPu Hou1-0/+446
Current implementation uses a private buffer for cpumf to dump samples. Samples first go to this buffer. Then copy to ring buffer allocated by perf core. With AUX buffer, this copy is not needed. AUX buffer is shared and zero-copy mapped to user space. The trailer information at the end of each SDB(sample data block) is also exported to user space. AUX buffer is used when diagnostic sampling mode is enabled. This patch contains functions to setup/free AUX buffer or to begin/end sampling per-cpu. Also include function called in interrupt to collect samples. Signed-off-by: Pu Hou <bjhoupu@linux.vnet.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-16Merge tag 'kvm-s390-next-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linuxRadim Krčmář15-111/+244
KVM: s390: fixes and improvements for 4.15 - Some initial preparation patches for exitless interrupts and crypto - New capability for AIS migration - Fixes - merge of the sthyi tree from the base s390 team, which moves the sthyi out of KVM into a shared function also for non-KVM
2017-11-16s390/disassembler: increase show_code buffer sizeVasily Gorbik1-2/+2
Current buffer size of 64 is too small. objdump shows that there are instructions which would require up to 75 bytes buffer (with current formating). 128 bytes "ought to be enough for anybody". Also replaces 8 spaces with a single tab to reduce the memory footprint. Fixes the following KASAN finding: BUG: KASAN: stack-out-of-bounds in number+0x3fe/0x538 Write of size 1 at addr 000000005a4a75a0 by task bash/1282 CPU: 1 PID: 1282 Comm: bash Not tainted 4.14.0+ #215 Hardware name: IBM 2964 N96 702 (z/VM 6.4.0) Call Trace: ([<000000000011eeb6>] show_stack+0x56/0x88) [<0000000000e1ce1a>] dump_stack+0x15a/0x1b0 [<00000000004e2994>] print_address_description+0xf4/0x288 [<00000000004e2cf2>] kasan_report+0x13a/0x230 [<0000000000e38ae6>] number+0x3fe/0x538 [<0000000000e3dfe4>] vsnprintf+0x194/0x948 [<0000000000e3ea42>] sprintf+0xa2/0xb8 [<00000000001198dc>] print_insn+0x374/0x500 [<0000000000119346>] show_code+0x4ee/0x538 [<000000000011f234>] show_registers+0x34c/0x388 [<000000000011f2ae>] show_regs+0x3e/0xa8 [<000000000011f502>] die+0x1ea/0x2e8 [<0000000000138f0e>] do_no_context+0x106/0x168 [<0000000000139a1a>] do_protection_exception+0x4da/0x7d0 [<0000000000e55914>] pgm_check_handler+0x16c/0x1c0 [<000000000090639e>] sysrq_handle_crash+0x46/0x58 ([<0000000000000007>] 0x7) [<00000000009073fa>] __handle_sysrq+0x102/0x218 [<0000000000907c06>] write_sysrq_trigger+0xd6/0x100 [<000000000061d67a>] proc_reg_write+0xb2/0x128 [<0000000000520be6>] __vfs_write+0xee/0x368 [<0000000000521222>] vfs_write+0x21a/0x278 [<000000000052156a>] SyS_write+0xda/0x178 [<0000000000e555cc>] system_call+0xc4/0x270 The buggy address belongs to the page: page:000003d1016929c0 count:0 mapcount:0 mapping: (null) index:0x0 flags: 0x0() raw: 0000000000000000 0000000000000000 0000000000000000 ffffffff00000000 raw: 0000000000000100 0000000000000200 0000000000000000 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: 000000005a4a7480: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 000000005a4a7500: 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00 00 >000000005a4a7580: 00 00 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00 ^ 000000005a4a7600: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 f8 f8 000000005a4a7680: f2 f2 f2 f2 f2 f2 f8 f8 f2 f2 f3 f3 f3 f3 00 00 ================================================================== Cc: <stable@vger.kernel.org> Signed-off-by: Vasily Gorbik <gor@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-16s390: Remove CONFIG_HARDENED_USERCOPYMichael Holzheu3-12/+1
When running the crash tool on a s390 live system we get a kernel panic for reading memory within the kernel image: # uname -a Linux r3545011 4.14.0-rc8-00066-g1c9dbd4615fd #45 SMP PREEMPT Fri Nov 10 16:16:22 CET 2017 s390x s390x s390x GNU/Linux # crash /boot/vmlinux-devel /dev/mem # crash> rd 0x100000 usercopy: kernel memory exposure attempt detected from 0000000000100000 (<kernel text>) (8 bytes) ------------[ cut here ]------------ kernel BUG at mm/usercopy.c:72! illegal operation: 0001 ilc:1 [#1] PREEMPT SMP. Modules linked in: CPU: 0 PID: 1461 Comm: crash Not tainted 4.14.0-rc8-00066-g1c9dbd4615fd-dirty #46 Hardware name: IBM 2827 H66 706 (z/VM 6.3.0) task: 000000001ad10100 task.stack: 000000001df78000 Krnl PSW : 0704d00180000000 000000000038165c (__check_object_size+0x164/0x1d0) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3 Krnl GPRS: 0000000012440e1d 0000000080000000 0000000000000061 00000000001cabc0 00000000001cc6d6 0000000000000000 0000000000cc4ed2 0000000000001000 000003ffc22fdd20 0000000000000008 0000000000100008 0000000000000001 0000000000000008 0000000000100000 0000000000381658 000000001df7bc90 Krnl Code: 000000000038164c: c020004a1c4a larl %r2,cc4ee0 0000000000381652: c0e5fff2581b brasl %r14,1cc688 #0000000000381658: a7f40001 brc 15,38165a >000000000038165c: eb42000c000c srlg %r4,%r2,12 0000000000381662: eb32001c000c srlg %r3,%r2,28 0000000000381668: c0110003ffff lgfi %r1,262143 000000000038166e: ec31ff752065 clgrj %r3,%r1,2,381558 0000000000381674: a7f4ff67 brc 15,381542 Call Trace: ([<0000000000381658>] __check_object_size+0x160/0x1d0) [<000000000082263a>] read_mem+0xaa/0x130. [<0000000000386182>] __vfs_read+0x42/0x168. [<000000000038632e>] vfs_read+0x86/0x140. [<0000000000386a26>] SyS_read+0x66/0xc0. [<0000000000ace6a4>] system_call+0xc4/0x2b0. INFO: lockdep is turned off. Last Breaking-Event-Address: [<0000000000381658>] __check_object_size+0x160/0x1d0 Kernel panic - not syncing: Fatal exception: panic_on_oops With CONFIG_HARDENED_USERCOPY copy_to_user() checks in __check_object_size() if the source address is within the kernel image. When the crash tool reads from 0x100000, this check leads to the kernel BUG(). So disable the kernel config option until this bug is fixed. Corresponding bug report on LKML: https://lkml.org/lkml/2017/11/10/341 Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-15Merge tag 'drm-for-v4.15' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds1-12/+0
Pull drm updates from Dave Airlie: "This is the main drm pull request for v4.15. Core: - Atomic object lifetime fixes - Atomic iterator improvements - Sparse/smatch fixes - Legacy kms ioctls to be interruptible - EDID override improvements - fb/gem helper cleanups - Simple outreachy patches - Documentation improvements - Fix dma-buf rcu races - DRM mode object leasing for improving VR use cases. - vgaarb improvements for non-x86 platforms. New driver: - tve200: Faraday Technology TVE200 block. This "TV Encoder" encodes a ITU-T BT.656 stream and can be found in the StorLink SL3516 (later Cortina Systems CS3516) as well as the Grain Media GM8180. New bridges: - SiI9234 support New panels: - S6E63J0X03, OTM8009A, Seiko 43WVF1G, 7" rpi touch panel, Toshiba LT089AC19000, Innolux AT043TN24 i915: - Remove Coffeelake from alpha support - Cannonlake workarounds - Infoframe refactoring for DisplayPort - VBT updates - DisplayPort vswing/emph/buffer translation refactoring - CCS fixes - Restore GPU clock boost on missed vblanks - Scatter list updates for userptr allocations - Gen9+ transition watermarks - Display IPC (Isochronous Priority Control) - Private PAT management - GVT: improved error handling and pci config sanitizing - Execlist refactoring - Transparent Huge Page support - User defined priorities support - HuC/GuC firmware refactoring - DP MST fixes - eDP power sequencing fixes - Use RCU instead of stop_machine - PSR state tracking support - Eviction fixes - BDW DP aux channel timeout fixes - LSPCON fixes - Cannonlake PLL fixes amdgpu: - Per VM BO support - Powerplay cleanups - CI powerplay support - PASID mgr for kfd - SR-IOV fixes - initial GPU reset for vega10 - Prime mmap support - TTM updates - Clock query interface for Raven - Fence to handle ioctl - UVD encode ring support on Polaris - Transparent huge page DMA support - Compute LRU pipe tweaks - BO flag to allow buffers to opt out of implicit sync - CTX priority setting API - VRAM lost infrastructure plumbing qxl: - fix flicker since atomic rework amdkfd: - Further improvements from internal AMD tree - Usermode events - Drop radeon support nouveau: - Pascal temperature sensor support - Improved BAR2 handling - MMU rework to support Pascal MMU exynos: - Improved HDMI/mixer support - HDMI audio interface support tegra: - Prep work for tegra186 - Cleanup/fixes msm: - Preemption support for a5xx - Display fixes for 8x96 (snapdragon 820) - Async cursor plane fixes - FW loading rework - GPU debugging improvements vc4: - Prep for DSI panels - fix T-format tiling scanout - New madvise ioctl Rockchip: - LVDS support omapdrm: - omap4 HDMI CEC support etnaviv: - GPU performance counters groundwork sun4i: - refactor driver load + TCON backend - HDMI improvements - A31 support - Misc fixes udl: - Probe/EDID read fixes. tilcdc: - Misc fixes. pl111: - Support more variants adv7511: - Improve EDID handling. - HDMI CEC support sii8620: - Add remote control support" * tag 'drm-for-v4.15' of git://people.freedesktop.org/~airlied/linux: (1480 commits) drm/rockchip: analogix_dp: Use mutex rather than spinlock drm/mode_object: fix documentation for object lookups. drm/i915: Reorder context-close to avoid calling i915_vma_close() under RCU drm/i915: Move init_clock_gating() back to where it was drm/i915: Prune the reservation shared fence array drm/i915: Idle the GPU before shinking everything drm/i915: Lock llist_del_first() vs llist_del_all() drm/i915: Calculate ironlake intermediate watermarks correctly, v2. drm/i915: Disable lazy PPGTT page table optimization for vGPU drm/i915/execlists: Remove the priority "optimisation" drm/i915: Filter out spurious execlists context-switch interrupts drm/amdgpu: use irq-safe lock for kiq->ring_lock drm/amdgpu: bypass lru touch for KIQ ring submission drm/amdgpu: Potential uninitialized variable in amdgpu_vm_update_directories() drm/amdgpu: potential uninitialized variable in amdgpu_vce_ring_parse_cs() drm/amd/powerplay: initialize a variable before using it drm/amd/powerplay: suppress KASAN out of bounds warning in vega10_populate_all_memory_levels drm/amd/amdgpu: fix evicted VRAM bo adjudgement condition drm/vblank: Tune drm_crtc_accurate_vblank_count() WARN down to a debug drm/rockchip: add CONFIG_OF dependency for lvds ...
2017-11-15Merge tag 'media/v4.15-1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-mediaLinus Torvalds3-2/+4
Pull media updates from Mauro Carvalho Chehab: - Documentation for digital TV (both kAPI and uAPI) are now in sync with the implementation (except for legacy/deprecated ioctls). This is a major step, as there were always a gap there - New sensor driver: imx274 - New cec driver: cec-gpio - New platform driver for rockship rga and tegra CEC - New RC driver: tango-ir - Several cleanups at atomisp driver - Core improvements for RC, CEC, V4L2 async probing support and DVB - Lots of drivers cleanup, fixes and improvements. * tag 'media/v4.15-1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (332 commits) dvb_frontend: don't use-after-free the frontend struct media: dib0700: fix invalid dvb_detach argument media: v4l2-ctrls: Don't validate BITMASK twice media: s5p-mfc: fix lockdep warning media: dvb-core: always call invoke_release() in fe_free() media: usb: dvb-usb-v2: dvb_usb_core: remove redundant code in dvb_usb_fe_sleep media: au0828: make const array addr_list static media: cx88: make const arrays default_addr_list and pvr2000_addr_list static media: drxd: make const array fastIncrDecLUT static media: usb: fix spelling mistake: "synchronuously" -> "synchronously" media: ddbridge: fix build warnings media: av7110: avoid 2038 overflow in debug print media: Don't do DMA on stack for firmware upload in the AS102 driver media: v4l: async: fix unregister for implicitly registered sub-device notifiers media: v4l: async: fix return of unitialized variable ret media: imx274: fix missing return assignment from call to imx274_mode_regs media: camss-vfe: always initialize reg at vfe_set_xbar_cfg() media: atomisp: make function calls cleaner media: atomisp: get rid of storage_class.h media: atomisp: get rid of wrong stddef.h include ...
2017-11-15Merge branch 'akpm' (patches from Andrew)Linus Torvalds59-1600/+308
Merge updates from Andrew Morton: - a few misc bits - ocfs2 updates - almost all of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (131 commits) memory hotplug: fix comments when adding section mm: make alloc_node_mem_map a void call if we don't have CONFIG_FLAT_NODE_MEM_MAP mm: simplify nodemask printing mm,oom_reaper: remove pointless kthread_run() error check mm/page_ext.c: check if page_ext is not prepared writeback: remove unused function parameter mm: do not rely on preempt_count in print_vma_addr mm, sparse: do not swamp log with huge vmemmap allocation failures mm/hmm: remove redundant variable align_end mm/list_lru.c: mark expected switch fall-through mm/shmem.c: mark expected switch fall-through mm/page_alloc.c: broken deferred calculation mm: don't warn about allocations which stall for too long fs: fuse: account fuse_inode slab memory as reclaimable mm, page_alloc: fix potential false positive in __zone_watermark_ok mm: mlock: remove lru_add_drain_all() mm, sysctl: make NUMA stats configurable shmem: convert shmem_init_inodecache() to void Unify migrate_pages and move_pages access checks mm, pagevec: rename pagevec drained field ...
2017-11-15mm, sparse: do not swamp log with huge vmemmap allocation failuresMichal Hocko1-1/+0
While doing memory hotplug tests under heavy memory pressure we have noticed too many page allocation failures when allocating vmemmap memmap backed by huge page kworker/u3072:1: page allocation failure: order:9, mode:0x24084c0(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO) [...] Call Trace: dump_trace+0x59/0x310 show_stack_log_lvl+0xea/0x170 show_stack+0x21/0x40 dump_stack+0x5c/0x7c warn_alloc_failed+0xe2/0x150 __alloc_pages_nodemask+0x3ed/0xb20 alloc_pages_current+0x7f/0x100 vmemmap_alloc_block+0x79/0xb6 __vmemmap_alloc_block_buf+0x136/0x145 vmemmap_populate+0xd2/0x2b9 sparse_mem_map_populate+0x23/0x30 sparse_add_one_section+0x68/0x18e __add_pages+0x10a/0x1d0 arch_add_memory+0x4a/0xc0 add_memory_resource+0x89/0x160 add_memory+0x6d/0xd0 acpi_memory_device_add+0x181/0x251 acpi_bus_attach+0xfd/0x19b acpi_bus_scan+0x59/0x69 acpi_device_hotplug+0xd2/0x41f acpi_hotplug_work_fn+0x1a/0x23 process_one_work+0x14e/0x410 worker_thread+0x116/0x490 kthread+0xbd/0xe0 ret_from_fork+0x3f/0x70 and we do see many of those because essentially every allocation fails for each memory section. This is an excessive way to tell the user that there is nothing to really worry about because we do have a fallback mechanism to use base pages. The only downside might be a performance degradation due to TLB pressure. This patch changes vmemmap_alloc_block() to use __GFP_NOWARN and warn explicitly once on the first allocation failure. This will reduce the noise in the kernel log considerably, while we still have an indication that a performance might be impacted. [mhocko@kernel.org: forgot to git add the follow up fix] Link: http://lkml.kernel.org/r/20171107090635.c27thtse2lchjgvb@dhcp22.suse.cz Link: http://lkml.kernel.org/r/20171106092228.31098-1-mhocko@kernel.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Joe Perches <joe@perches.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Khalid Aziz <khalid.aziz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15mm: remove cold parameter from free_hot_cold_page*Mel Gorman4-4/+4
Most callers users of free_hot_cold_page claim the pages being released are cache hot. The exception is the page reclaim paths where it is likely that enough pages will be freed in the near future that the per-cpu lists are going to be recycled and the cache hotness information is lost. As no one really cares about the hotness of pages being released to the allocator, just ditch the parameter. The APIs are renamed to indicate that it's no longer about hot/cold pages. It should also be less confusing as there are subtle differences between them. __free_pages drops a reference and frees a page when the refcount reaches zero. free_hot_cold_page handled pages whose refcount was already zero which is non-obvious from the name. free_unref_page should be more obvious. No performance impact is expected as the overhead is marginal. The parameter is removed simply because it is a bit stupid to have a useless parameter copied everywhere. [mgorman@techsingularity.net: add pages to head, not tail] Link: http://lkml.kernel.org/r/20171019154321.qtpzaeftoyyw4iey@techsingularity.net Link: http://lkml.kernel.org/r/20171018075952.10627-8-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15sparc64: optimize struct page zeroingPavel Tatashin1-0/+30
Add an optimized mm_zero_struct_page(), so struct page's are zeroed without calling memset(). We do eight to ten regular stores based on the size of struct page. Compiler optimizes out the conditions of switch() statement. SPARC-M6 with 15T of memory, single thread performance: BASE FIX OPTIMIZED_FIX bootmem_init 28.440467985s 2.305674818s 2.305161615s free_area_init_nodes 202.845901673s 225.343084508s 172.556506560s -------------------------------------------- Total 231.286369658s 227.648759326s 174.861668175s BASE: current linux FIX: This patch series without "optimized struct page zeroing" OPTIMIZED_FIX: This patch series including the current patch. bootmem_init() is where memory for struct pages is zeroed during allocation. Note, about two seconds in this function is a fixed time: it does not increase as memory is increased. Link: http://lkml.kernel.org/r/20171013173214.27300-11-pasha.tatashin@oracle.com Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15arm64/mm/kasan: don't use vmemmap_populate() to initialize shadowWill Deacon2-51/+81
The kasan shadow is currently mapped using vmemmap_populate() since that provides a semi-convenient way to map pages into init_top_pgt. However, since that no longer zeroes the mapped pages, it is not suitable for kasan, which requires zeroed shadow memory. Add kasan_populate_shadow() interface and use it instead of vmemmap_populate(). Besides, this allows us to take advantage of gigantic pages and use them to populate the shadow, which should save us some memory wasted on page tables and reduce TLB pressure. Link: http://lkml.kernel.org/r/20171103185147.2688-3-pasha.tatashin@oracle.com Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Steven Sistare <steven.sistare@oracle.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Bob Picco <bob.picco@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Alexander Potapenko <glider@google.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15x86/mm/kasan: don't use vmemmap_populate() to initialize shadowAndrey Ryabinin2-8/+137
The kasan shadow is currently mapped using vmemmap_populate() since that provides a semi-convenient way to map pages into init_top_pgt. However, since that no longer zeroes the mapped pages, it is not suitable for kasan, which requires zeroed shadow memory. Add kasan_populate_shadow() interface and use it instead of vmemmap_populate(). Besides, this allows us to take advantage of gigantic pages and use them to populate the shadow, which should save us some memory wasted on page tables and reduce TLB pressure. Link: http://lkml.kernel.org/r/20171103185147.2688-2-pasha.tatashin@oracle.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Steven Sistare <steven.sistare@oracle.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Bob Picco <bob.picco@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Alexander Potapenko <glider@google.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15sparc64: simplify vmemmap_populatePavel Tatashin1-17/+6
Remove duplicating code by using common functions vmemmap_pud_populate and vmemmap_pgd_populate. Link: http://lkml.kernel.org/r/20171013173214.27300-5-pasha.tatashin@oracle.com Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15sparc64/mm: set fields in deferred pagesPavel Tatashin1-1/+8
Without deferred struct page feature (CONFIG_DEFERRED_STRUCT_PAGE_INIT), flags and other fields in "struct page"es are never changed prior to first initializing struct pages by going through __init_single_page(). With deferred struct page feature enabled there is a case where we set some fields prior to initializing: mem_init() { register_page_bootmem_info(); free_all_bootmem(); ... } When register_page_bootmem_info() is called only non-deferred struct pages are initialized. But, this function goes through some reserved pages which might be part of the deferred, and thus are not yet initialized. mem_init register_page_bootmem_info register_page_bootmem_info_node get_page_bootmem .. setting fields here .. such as: page->freelist = (void *)type; free_all_bootmem() free_low_memory_core_early() for_each_reserved_mem_region() reserve_bootmem_region() init_reserved_page() <- Only if this is deferred reserved page __init_single_pfn() __init_single_page() memset(0) <-- Loose the set fields here We end up with similar issue as in the previous patch, where currently we do not observe problem as memory is zeroed. But, if flag asserts are changed we can start hitting issues. Also, because in this patch series we will stop zeroing struct page memory during allocation, we must make sure that struct pages are properly initialized prior to using them. The deferred-reserved pages are initialized in free_all_bootmem(). Therefore, the fix is to switch the above calls. Link: http://lkml.kernel.org/r/20171013173214.27300-4-pasha.tatashin@oracle.com Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15x86/mm: set fields in deferred pagesPavel Tatashin1-2/+8
Without deferred struct page feature (CONFIG_DEFERRED_STRUCT_PAGE_INIT), flags and other fields in "struct page"es are never changed prior to first initializing struct pages by going through __init_single_page(). With deferred struct page feature enabled, however, we set fields in register_page_bootmem_info that are subsequently clobbered right after in free_all_bootmem: mem_init() { register_page_bootmem_info(); free_all_bootmem(); ... } When register_page_bootmem_info() is called only non-deferred struct pages are initialized. But, this function goes through some reserved pages which might be part of the deferred, and thus are not yet initialized. mem_init register_page_bootmem_info register_page_bootmem_info_node get_page_bootmem .. setting fields here .. such as: page->freelist = (void *)type; free_all_bootmem() free_low_memory_core_early() for_each_reserved_mem_region() reserve_bootmem_region() init_reserved_page() <- Only if this is deferred reserved page __init_single_pfn() __init_single_page() memset(0) <-- Loose the set fields here We end up with issue where, currently we do not observe problem as memory is explicitly zeroed. But, if flag asserts are changed we can start hitting issues. Also, because in this patch series we will stop zeroing struct page memory during allocation, we must make sure that struct pages are properly initialized prior to using them. The deferred-reserved pages are initialized in free_all_bootmem(). Therefore, the fix is to switch the above calls. Link: http://lkml.kernel.org/r/20171013173214.27300-3-pasha.tatashin@oracle.com Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Tested-by: Bob Picco <bob.picco@oracle.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15kmemcheck: rip it outLevin, Alexander (Sasha Levin)19-1396/+3
Fix up makefiles, remove references, and git rm kmemcheck. Link: http://lkml.kernel.org/r/20171007030159.22241-4-alexander.levin@verizon.com Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vegard Nossum <vegardno@ifi.uio.no> Cc: Pekka Enberg <penberg@kernel.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexander Potapenko <glider@google.com> Cc: Tim Hansen <devtimhansen@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15kmemcheck: remove whats left of NOTRACK flagsLevin, Alexander (Sasha Levin)2-18/+0
Now that kmemcheck is gone, we don't need the NOTRACK flags. Link: http://lkml.kernel.org/r/20171007030159.22241-5-alexander.levin@verizon.com Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Cc: Alexander Potapenko <glider@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tim Hansen <devtimhansen@gmail.com> Cc: Vegard Nossum <vegardno@ifi.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15kmemcheck: stop using GFP_NOTRACK and SLAB_NOTRACKLevin, Alexander (Sasha Levin)13-20/+19
Convert all allocations that used a NOTRACK flag to stop using it. Link: http://lkml.kernel.org/r/20171007030159.22241-3-alexander.levin@verizon.com Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Cc: Alexander Potapenko <glider@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tim Hansen <devtimhansen@gmail.com> Cc: Vegard Nossum <vegardno@ifi.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15kmemcheck: remove annotationsLevin, Alexander (Sasha Levin)7-23/+1
Patch series "kmemcheck: kill kmemcheck", v2. As discussed at LSF/MM, kill kmemcheck. KASan is a replacement that is able to work without the limitation of kmemcheck (single CPU, slow). KASan is already upstream. We are also not aware of any users of kmemcheck (or users who don't consider KASan as a suitable replacement). The only objection was that since KASAN wasn't supported by all GCC versions provided by distros at that time we should hold off for 2 years, and try again. Now that 2 years have passed, and all distros provide gcc that supports KASAN, kill kmemcheck again for the very same reasons. This patch (of 4): Remove kmemcheck annotations, and calls to kmemcheck from the kernel. [alexander.levin@verizon.com: correctly remove kmemcheck call from dma_map_sg_attrs] Link: http://lkml.kernel.org/r/20171012192151.26531-1-alexander.levin@verizon.com Link: http://lkml.kernel.org/r/20171007030159.22241-2-alexander.levin@verizon.com Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Cc: Alexander Potapenko <glider@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tim Hansen <devtimhansen@gmail.com> Cc: Vegard Nossum <vegardno@ifi.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15mm: introduce wrappers to access mm->nr_ptesKirill A. Shutemov3-3/+3
Let's add wrappers for ->nr_ptes with the same interface as for nr_pmd and nr_pud. The patch also makes nr_ptes accounting dependent onto CONFIG_MMU. Page table accounting doesn't make sense if you don't have page tables. It's preparation for consolidation of page-table counters in mm_struct. Link: http://lkml.kernel.org/r/20171006100651.44742-1-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15mm: account pud page tablesKirill A. Shutemov3-1/+5
On a machine with 5-level paging support a process can allocate significant amount of memory and stay unnoticed by oom-killer and memory cgroup. The trick is to allocate a lot of PUD page tables. We don't account PUD page tables, only PMD and PTE. We already addressed the same issue for PMD page tables, see commit dc6c9a35b66b ("mm: account pmd page tables to the process"). Introduction of 5-level paging brings the same issue for PUD page tables. The patch expands accounting to PUD level. [kirill.shutemov@linux.intel.com: s/pmd_t/pud_t/] Link: http://lkml.kernel.org/r/20171004074305.x35eh5u7ybbt5kar@black.fi.intel.com [heiko.carstens@de.ibm.com: s390/mm: fix pud table accounting] Link: http://lkml.kernel.org/r/20171103090551.18231-1-heiko.carstens@de.ibm.com Link: http://lkml.kernel.org/r/20171002080427.3320-1-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15mm, arch: remove empty_bad_page*Michal Hocko6-53/+1
empty_bad_page() and empty_bad_pte_table() seem to be relics from old days which is not used by any code for a long time. I have tried to find when exactly but this is not really all that straightforward due to many code movements - traces disappear around 2.4 times. Anyway no code really references neither empty_bad_page nor empty_bad_pte_table. We only allocate the storage which is not used by anybody so remove them. Link: http://lkml.kernel.org/r/20171004150045.30755-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Ralf Baechle <ralf@linus-mips.org> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: David Howells <dhowells@redhat.com> Cc: Rich Felker <dalias@libc.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15m32r: fix endianness constraintsGeert Uytterhoeven1-2/+2
The m32r Kconfig provides both CPU_BIG_ENDIAN and CPU_LITTLE_ENDIAN configuration options. As they are user-selectable and independent, this allows invalid configurations: - All m32r defconfigs build a big endian kernel, but CPU_BIG_ENDIAN is not set, causing compiler warnings like: include/linux/byteorder/big_endian.h:7:2: warning: #warning inconsistent configuration, needs CONFIG_CPU_BIG_ENDIAN [-Wcpp] #warning inconsistent configuration, needs CONFIG_CPU_BIG_ENDIAN ^ - Since commit 5bdfca6435b82944 ("m32r: define CPU_BIG_ENDIAN"), building an allmodconfig or allyesconfig enables both CONFIG_CPU_BIG_ENDIAN and CONFIG_CPU_LITTLE_ENDIAN. While this did get rid of the warning above, both options are obviously mutually exclusive. Fix this by making only CPU_LITTLE_ENDIAN configurable by the user, as before, and by making sure exactly one of CPU_BIG_ENDIAN and CPU_LITTLE_ENDIAN is always enabled. Link: http://lkml.kernel.org/r/1509361505-18150-1-git-send-email-geert@linux-m68k.org Fixes: 5bdfca6435b82944 ("m32r: define CPU_BIG_ENDIAN") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-16kbuild: create object directories simpler and fasterMasahiro Yamada1-4/+0
For the out-of-tree build, scripts/Makefile.build creates output directories, but this operation is not efficient. scripts/Makefile.lib calculates obj-dirs as follows: obj-dirs := $(dir $(multi-objs) $(obj-y)) Please notice $(sort ...) is not used here. Usually the result is as many "./" as objects here. For a lot of duplicated paths, the following command is invoked. _dummy := $(foreach d,$(obj-dirs), $(shell [ -d $(d) ] || mkdir -p $(d))) Then, the costly shell command is run over and over again. I see many points for optimization: [1] Use $(sort ...) to cut down duplicated paths before passing them to system call [2] Use single $(shell ...) instead of repeating it with $(foreach ...) This will reduce forking. [3] We can calculate obj-dirs more simply. Most of objects are already accumulated in $(targets). So, $(dir $(targets)) is fine and more comprehensive. I also removed ugly code in arch/x86/entry/vdso/Makefile. This is now really unnecessary. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Ingo Molnar <mingo@kernel.org> Tested-by: Douglas Anderson <dianders@chromium.org>
2017-11-15Merge tag 'pci-v4.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pciLinus Torvalds20-65/+204
Pull PCI updates from Bjorn Helgaas: - detach driver before tearing down procfs/sysfs (Alex Williamson) - disable PCIe services during shutdown (Sinan Kaya) - fix ASPM oops on systems with no Root Ports (Ard Biesheuvel) - fix ASPM LTR_L1.2_THRESHOLD programming (Bjorn Helgaas) - fix ASPM Common_Mode_Restore_Time computation (Bjorn Helgaas) - fix portdrv MSI/MSI-X vector allocation (Dongdong Liu, Bjorn Helgaas) - report non-fatal AER errors only to the affected endpoint (Gabriele Paoloni) - distribute bus numbers, MMIO, and I/O space among hotplug bridges to allow more devices to be hot-added (Mika Westerberg) - fix pciehp races during initialization and surprise link down (Mika Westerberg) - handle surprise-removed devices in PME handling (Qiang) - support resizable BARs for large graphics devices (Christian König) - expose SR-IOV offset, stride, and VF device ID via sysfs (Filippo Sironi) - create SR-IOV virtfn/physfn sysfs links before attaching driver (Stuart Hayes) - fix SR-IOV "ARI Capable Hierarchy" restore issue (Tony Nguyen) - enforce Kconfig IOV/REALLOC dependency (Sascha El-Sharkawy) - avoid slot reset if bridge itself is broken (Jan Glauber) - clean up pci_reset_function() path (Jan H. Schönherr) - make pci_map_rom() fail if the option ROM is invalid (Changbin Du) - convert timers to timer_setup() (Kees Cook) - move PCI_QUIRKS to PCI bus Kconfig menu (Randy Dunlap) - constify pci_dev_type and intel_mid_pci_ops (Bhumika Goyal) - remove unnecessary pci_dev, pci_bus, resource, pcibios_set_master() declarations (Bjorn Helgaas) - fix endpoint framework overflows and BUG()s (Dan Carpenter) - fix endpoint framework issues (Kishon Vijay Abraham I) - avoid broken Cavium CN8xxx bus reset behavior (David Daney) - extend Cavium ACS capability quirks (Vadim Lomovtsev) - support Synopsys DesignWare RC in ECAM mode (Ard Biesheuvel) - turn off dra7xx clocks cleanly on shutdown (Keerthy) - fix Faraday probe error path (Wei Yongjun) - support HiSilicon STB SoC PCIe host controller (Jianguo Sun) - fix Hyper-V interrupt affinity issue (Dexuan Cui) - remove useless ACPI warning for Hyper-V pass-through devices (Vitaly Kuznetsov) - support multiple MSI on iProc (Sandor Bodo-Merle) - support Layerscape LS1012a and LS1046a PCIe host controllers (Hou Zhiqiang) - fix Layerscape default error response (Minghuan Lian) - support MSI on Tango host controller (Marc Gonzalez) - support Tegra186 PCIe host controller (Manikanta Maddireddy) - use generic accessors on Tegra when possible (Thierry Reding) - support V3 Semiconductor PCI host controller (Linus Walleij) * tag 'pci-v4.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (85 commits) PCI/ASPM: Add L1 Substates definitions PCI/ASPM: Reformat ASPM register definitions PCI/ASPM: Use correct capability pointer to program LTR_L1.2_THRESHOLD PCI/ASPM: Account for downstream device's Port Common_Mode_Restore_Time PCI: xgene: Rename xgene_pcie_probe_bridge() to xgene_pcie_probe() PCI: xilinx: Rename xilinx_pcie_link_is_up() to xilinx_pcie_link_up() PCI: altera: Rename altera_pcie_link_is_up() to altera_pcie_link_up() PCI: Fix kernel-doc build warning PCI: Fail pci_map_rom() if the option ROM is invalid PCI: Move pci_map_rom() error path PCI: Move PCI_QUIRKS to the PCI bus menu alpha/PCI: Make pdev_save_srm_config() static PCI: Remove unused declarations PCI: Remove redundant pci_dev, pci_bus, resource declarations PCI: Remove redundant pcibios_set_master() declarations PCI/PME: Handle invalid data when reading Root Status PCI: hv: Use effective affinity mask PCI: pciehp: Do not clear Presence Detect Changed during initialization PCI: pciehp: Fix race condition handling surprise link down PCI: Distribute available resources to hotplug-capable bridges ...
2017-11-15Merge tag 'modules-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linuxLinus Torvalds2-2/+2
Pull module updates from Jessica Yu: "Summary of modules changes for the 4.15 merge window: - treewide module_param_call() cleanup, fix up set/get function prototype mismatches, from Kees Cook - minor code cleanups" * tag 'modules-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: module: Do not paper over type mismatches in module_param_call() treewide: Fix function prototypes for module_param_call() module: Prepare to convert all module_param_call() prototypes kernel/module: Delete an error message for a failed memory allocation in add_module_usage()
2017-11-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds3-53/+4
Pull networking updates from David Miller: "Highlights: 1) Maintain the TCP retransmit queue using an rbtree, with 1GB windows at 100Gb this really has become necessary. From Eric Dumazet. 2) Multi-program support for cgroup+bpf, from Alexei Starovoitov. 3) Perform broadcast flooding in hardware in mv88e6xxx, from Andrew Lunn. 4) Add meter action support to openvswitch, from Andy Zhou. 5) Add a data meta pointer for BPF accessible packets, from Daniel Borkmann. 6) Namespace-ify almost all TCP sysctl knobs, from Eric Dumazet. 7) Turn on Broadcom Tags in b53 driver, from Florian Fainelli. 8) More work to move the RTNL mutex down, from Florian Westphal. 9) Add 'bpftool' utility, to help with bpf program introspection. From Jakub Kicinski. 10) Add new 'cpumap' type for XDP_REDIRECT action, from Jesper Dangaard Brouer. 11) Support 'blocks' of transformations in the packet scheduler which can span multiple network devices, from Jiri Pirko. 12) TC flower offload support in cxgb4, from Kumar Sanghvi. 13) Priority based stream scheduler for SCTP, from Marcelo Ricardo Leitner. 14) Thunderbolt networking driver, from Amir Levy and Mika Westerberg. 15) Add RED qdisc offloadability, and use it in mlxsw driver. From Nogah Frankel. 16) eBPF based device controller for cgroup v2, from Roman Gushchin. 17) Add some fundamental tracepoints for TCP, from Song Liu. 18) Remove garbage collection from ipv6 route layer, this is a significant accomplishment. From Wei Wang. 19) Add multicast route offload support to mlxsw, from Yotam Gigi" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2177 commits) tcp: highest_sack fix geneve: fix fill_info when link down bpf: fix lockdep splat net: cdc_ncm: GetNtbFormat endian fix openvswitch: meter: fix NULL pointer dereference in ovs_meter_cmd_reply_start netem: remove unnecessary 64 bit modulus netem: use 64 bit divide by rate tcp: Namespace-ify sysctl_tcp_default_congestion_control net: Protect iterations over net::fib_notifier_ops in fib_seq_sum() ipv6: set all.accept_dad to 0 by default uapi: fix linux/tls.h userspace compilation error usbnet: ipheth: prevent TX queue timeouts when device not ready vhost_net: conditionally enable tx polling uapi: fix linux/rxrpc.h userspace compilation errors net: stmmac: fix LPI transitioning for dwmac4 atm: horizon: Fix irq release error net-sysfs: trigger netlink notification on ifalias change via sysfs openvswitch: Using kfree_rcu() to simplify the code openvswitch: Make local function ovs_nsh_key_attr_size() static openvswitch: Fix return value check in ovs_meter_cmd_features() ...
2017-11-15Merge tag 'mips_4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mipsLinus Torvalds69-433/+503
Pull MIPS updates from James Hogan: "These are the main MIPS changes for 4.15. Fixes: - ralink: Fix MT7620 PCI build issues (4.5) - Disable cmpxchg64() and HAVE_VIRT_CPU_ACCOUNTING_GEN for 32-bit SMP (4.1) - Fix MIPS64 FP save/restore on 32-bit kernels (4.0) - ptrace: Pick up ptrace/seccomp changed syscall numbers (3.19) - ralink: Fix MT7628 pinmux (3.19) - BCM47XX: Fix LED inversion on WRT54GSv1 (3.17) - Fix n32 core dumping as o32 since regset support (3.13) - ralink: Drop obsolete USB_ARCH_HAS_HCD select Build system: - Default to "generic" (multiplatform) system type instead of IP22 - Use generic little endian MIPS32 r2 configuration as default defconfig instead of ip22_defconfig FPU emulation: - Fix exception generation for certain R6 FPU instructions SMP: - Allow __cpu_number_map to be larger than NR_CPUS for sparse CPU id spaces Miscellaneous: - Add iomem resource for kernel bss section for kexec/kdump - Atomics: Nudge writes on bit unlock - DT files: Standardise "ok" -> "okay" Minor cleanups: - Define virt_to_pfn() - Make thread_saved_pc static - Simplify 32-bit sign extension in __read_64bit_c0_split() - DMA: Use vma_pages() helper - FPU emulation: Replace unsigned with unsigned int - MM: Removed unused lastpfn - Alchemy: Make clk_ops const - Lasat: Use setup_timer() helper - ralink: Use BIT() in MT7620 PCI driver Platform support: BMIPS: - Enable HARDIRQS_SW_RESEND Broadcom BCM63XX: - Add clkdev lookup support - Update clk driver, UART driver, DTs to handle named refclk from DTs - Split apart various clocks to more closely match hardware - Add ethernet clocks Cavium Octeon: - Remove usage of cvmx_wait() in favour of __delay() ImgTec Pistachio: - DT: Drop deprecated dwmmc num-slots property Ingenic JZ4780: - Add NFS root to Ci20 defconfig - Add watchdog to Ci20 DT & defconfig, and allow building of watchdog driver with this SoC Generic (multiplatform): - Migrate xilfpga (MIPSfpga) platform to the generic platform Lantiq xway: - Fix ASC0/ASC1 clocks" * tag 'mips_4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips: (46 commits) MIPS: Add iomem resource for kernel bss section. MIPS: cmpxchg64() and HAVE_VIRT_CPU_ACCOUNTING_GEN don't work for 32-bit SMP MIPS: BMIPS: Enable HARDIRQS_SW_RESEND MIPS: pci: Make use of the BIT() macro inside the mt7620 driver MIPS: pci: Remove KERN_WARN instance inside the mt7620 driver MIPS: pci: Remove duplicate define in mt7620 driver MIPS: ralink: Fix typo in mt7628 pinmux function MIPS: ralink: Fix MT7628 pinmux MIPS: Fix odd fp register warnings with MIPS64r2 watchdog: jz4780: Allow selection of jz4740-wdt driver MIPS/ptrace: Update syscall nr on register changes MIPS/ptrace: Pick up ptrace/seccomp changed syscalls MIPS: Fix an n32 core file generation regset support regression MIPS: Fix MIPS64 FP save/restore on 32-bit kernels MIPS: page.h: Define virt_to_pfn() MIPS: Xilfpga: Switch to using generic defconfigs MIPS: generic: Add support for MIPSfpga MIPS: Set defconfig target to a generic system for 32r2el MIPS: Kconfig: Set default MIPS system type as generic MIPS: DTS: Remove num-slots from Pistachio SoC ...
2017-11-15Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linuxLinus Torvalds60-507/+3012
Pull arm64 updates from Will Deacon: "The big highlight is support for the Scalable Vector Extension (SVE) which required extensive ABI work to ensure we don't break existing applications by blowing away their signal stack with the rather large new vector context (<= 2 kbit per vector register). There's further work to be done optimising things like exception return, but the ABI is solid now. Much of the line count comes from some new PMU drivers we have, but they're pretty self-contained and I suspect we'll have more of them in future. Plenty of acronym soup here: - initial support for the Scalable Vector Extension (SVE) - improved handling for SError interrupts (required to handle RAS events) - enable GCC support for 128-bit integer types - remove kernel text addresses from backtraces and register dumps - use of WFE to implement long delay()s - ACPI IORT updates from Lorenzo Pieralisi - perf PMU driver for the Statistical Profiling Extension (SPE) - perf PMU driver for Hisilicon's system PMUs - misc cleanups and non-critical fixes" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (97 commits) arm64: Make ARMV8_DEPRECATED depend on SYSCTL arm64: Implement __lshrti3 library function arm64: support __int128 on gcc 5+ arm64/sve: Add documentation arm64/sve: Detect SVE and activate runtime support arm64/sve: KVM: Hide SVE from CPU features exposed to guests arm64/sve: KVM: Treat guest SVE use as undefined instruction execution arm64/sve: KVM: Prevent guests from using SVE arm64/sve: Add sysctl to set the default vector length for new processes arm64/sve: Add prctl controls for userspace vector length management arm64/sve: ptrace and ELF coredump support arm64/sve: Preserve SVE registers around EFI runtime service calls arm64/sve: Preserve SVE registers around kernel-mode NEON use arm64/sve: Probe SVE capabilities and usable vector lengths arm64: cpufeature: Move sys_caps_initialised declarations arm64/sve: Backend logic for setting the vector length arm64/sve: Signal handling support arm64/sve: Support vector length resetting for new processes arm64/sve: Core task context handling arm64/sve: Low-level CPU setup ...
2017-11-15Merge tag 'riscv-for-linus-4.15-arch-v9-premerge' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linuxLinus Torvalds102-0/+9776
Pull RISC-V architecture support from Palmer Dabbelt: "This contains the core RISC-V Linux port, which has been through nine rounds of review on various mailing lists. The port is not complete: there's some cleanup patches moving through the review process, a whole bunch of drivers that need some work, and a lot of feature additions that will be needed. The patches contained in this tag have been through nine rounds of review on the various mailing lists. I have some outstanding cleanup patches, but since there's been so much review on these patches I thought it would be best to submit them as-is and then submit explicit cleanup patches so everyone can review them. This first patch set is big enough that it's a bit of a pain to constantly rewrite, and it's caused a few headaches with various contributors. The port is definately a work in progress. While what's there builds and boots with 4.14, it's a bit hard to actually see anything happen because there are no device drivers yet. I maintain a staging branch that contains all the device drivers and cleanup that actually works, but those patches won't all be ready for a while. I'd like to get what we currently have into your tree so everyone can start working from a single base -- of particular importance is allowing the glibc upstreaming process to proceed so we can sort out any possibly lingering user-visible ABI problems we might have. Copied below is the ChangeLog that contains the history of this patch set: (v9) As per suggestions on our v8 patch set, I've split the core architecture code out from our drivers and would like to submit this patch set to be included into linux-next, with the goal being to be merged in during the next merge window. This patch set is based on 4.14-rc2, but if it's better to have it based on something else then I can change it around. This patch set contains just the core arch code for RISC-V, so while it builds an nominally boots, you can't print or take an interrupt so it's not that useful. If you're looking to actually boot a system it would probably be better to use the full patch set listed below. We've collected a handful of tags from reviewers, and the remainder of the patch set only got minimal feedback last time. Here's what changed: - We now use the device tree to initialize the timer driver so it's less tighly coupled with the arch port. - I cleaned up the defconfigs -- there's actually now just one, and it's empty. For now I think we're OK with what the kernel sets as defaults, but I anticipate we'll begin to expand this as people start to use the port more. - The VDSO symbols version is sane. - We WFI while spinning in the boot loop. - A handful of comments have been added. While there are still a handful of FIXMEs in this patch set, we've started to get enough interest from various users and contributors that maintaining an out of tree patch set is starting to become a big burden. Hopefully the patches are good enough to merge now, which will at least get everyone working in a more reasonable manner as we clean up the remaining issues. (v8) I know it may not be the ideal time to submit a patch set right now, as it's the middle of the merge window, but things have calmed down quite a bit in the last month so I thought it would be good to get everyone on the same page. There's been a handful of changes since the last patch set, but most of them are fairly minor: - We changed PAGE_OFFSET to allowing mapping more physical memory on 64-bit systems. This is user configurable, as it triggers a different code model that generates slightly less efficient code. - The device tree binding documentation is back, I'd managed to lose it at some point. - We now pass the atomic64 test suite - The SBI timer driver has been refactored. (v7) It's been a while since my last patch set, but the changes han been fairly minimal: - The PCI cleanup patches have been dropped, we'll do them as a separate patch set later. - We've the Kconfig entries from CONFIG_ISA_* to CONFIG_RISCV_ISA_*, to make grep easier. - There have been a handful of memory model related tweaks in I/O land, particularly relating the PCI and the upcoming platform specification. There are significant comments in the relevant files. This is still a WIP, but I think we're close to getting as good as we're going to get until we end up with some more specifications. (v6) As it's been only a day since the v5 patch set, the changes are pretty minimal: - The patch set is now based on linux-next/master, which I believe is a better base now that we're getting closer to upstream. - EARLY_PRINTK is no longer an option. Since the SBI console is reasonable, there's no penalty to enabling it (and thus no benefit to disabling it). - The mmap syscalls were refactored a bit. (v5) Things have really started to calm down, so this is fairly similar to the v4 patch set. The most interesting changes include: - We've moved back to a single patch set. - SMP support has been fixed, I was accidentally running on a non-SMP configuration. There were various mistakes all over the tree as a result of this. - The cmpxchg syscalls have been removed, as they were deemed a bad idea. As a result, RISC-V Linux systems mandate the A extension. The corresponding Kconfig entry to enable builds on non-A systems has been removed. - A few more atomic fixes: mostly fence changes, but those resulted in a handful of additional macros that were no longer necessary. - riscv_early_sie has been removed. (v4) There have only been a few changes since the v3 patch set: - The cmpxchg64 syscall is no longer enabled on 32-bit systems. It's not possible to provide this on SMP systems, and it's not necessary as glibc knows not to call it. - We provide a ELF_HWCAP so users can determine the ISA of the machine the kernel is running on. - The multi-line comments are in a better form. - There were a handful of headers that could be replaced with the asm-generic versions, and a few unnecessary definitions. - We no longer use printk, but instead use pr_*. - A few Kconfig and defconfig entries have been cleaned up. (v3) A highlight of the changes since the v2 patch set includes: - We've split out all our drivers into separate patch sets, which I've already sent out to the relevant maintainers. I haven't included those patches in this patch set, but some of them are necessary to build our port. - The patch set is now split up differently: rather than being split per directory it is split per topic. Hopefully this will make it easier to review the port on the mailing list. The split is a bit rough, so you probably still want to look at the patch set as a whole. - atomic.h has been completely rewritten and is hopefully now correct. I've attempted to sanitize the various other memory model related code as well, and I think it should all be sane now aside from a handful of FIXMEs commented in the code. - We've changed the cmpexchg syscall to always exist and to not be multiplexed. There is also a VDSO entry for compare and exchange, which allows kernels with the A extension to execute user code without the A extension reasonably fast. - Our user-visible register state now contains enough space for the Q extension for 128-bit floating point, as well as a few words to allow extensibility to future ISA extensions like the eventual V extension for vectors. - A handful of driver cleanups, but these have been split into separate patch sets now so I won't duplicate them here. (v2) A highlight of the changes since the v1 patch set includes: - We've split out our drivers into the right places, which means now there's a lot more patches. I'll be submitting these patches to various subsystem maintainers and including them in any future RISC-V patch sets until they've been merged. - The SBI console driver has been completely rewritten to use the HVC helpers and is now significantly smaller. - We've begun to use weaker barriers as opposed to just the big "fence". There's still some work to do here, specifically: - We need fences in the relaxed MMIO functions. - The non-relaxed MMIO functions are missing R/W bits on their fences. - Many AMOs need the aq and rl bits set. - We now have thread_info in task_struct. As a result, sscratch now contains TP instead of SP. This was necessary because thread_info is no longer on the stack. - A few shared routines have been added that we use instead of creating another arch copy" Reviewed-by: Arnd Bergmann <arnd@arndb.de> * tag 'riscv-for-linus-4.15-arch-v9-premerge' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux: RISC-V: Build Infrastructure RISC-V: User-facing API RISC-V: Paging and MMU RISC-V: Device, timer, IRQs, and the SBI RISC-V: Task implementation RISC-V: ELF and module implementation RISC-V: Generic library routines and assembly RISC-V: Atomic and Locking Code RISC-V: Init and Halt Code dt-bindings: RISC-V CPU Bindings lib: Add shared copies of some GCC library routines MAINTAINERS: Add RISC-V
2017-11-15x86 / CPU: Always show current CPU frequency in /proc/cpuinfoRafael J. Wysocki4-24/+61
After commit 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"") the "cpu MHz" number in /proc/cpuinfo on x86 can be either the nominal CPU frequency (which is constant) or the frequency most recently requested by a scaling governor in cpufreq, depending on the cpufreq configuration. That is somewhat inconsistent and is different from what it was before 4.13, so in order to restore the previous behavior, make it report the current CPU frequency like the scaling_cur_freq sysfs file in cpufreq. To that end, modify the /proc/cpuinfo implementation on x86 to use aperfmperf_snapshot_khz() to snapshot the APERF and MPERF feedback registers, if available, and use their values to compute the CPU frequency to be reported as "cpu MHz". However, do that carefully enough to avoid accumulating delays that lead to unacceptable access times for /proc/cpuinfo on systems with many CPUs. Run aperfmperf_snapshot_khz() once on all CPUs asynchronously at the /proc/cpuinfo open time, add a single delay upfront (if necessary) at that point and simply compute the current frequency while running show_cpuinfo() for each individual CPU. Also, to avoid slowing down /proc/cpuinfo accesses too much, reduce the default delay between consecutive APERF and MPERF reads to 10 ms, which should be sufficient to get large enough numbers for the frequency computation in all cases. Fixes: 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@kernel.org>
2017-11-15Merge branch 'for-linus' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jikos/trivialLinus Torvalds9-13/+12
Pull trivial tree updates from Jiri Kosina: "The usual rocket-science from trivial tree for 4.15" * 'for-linus' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: MAINTAINERS: relinquish kconfig MAINTAINERS: Update my email address treewide: Fix typos in Kconfig kfifo: Fix comments init/Kconfig: Fix module signing document location misc: ibmasm: Return error on error path HID: logitech-hidpp: fix mistake in printk, "feeback" -> "feedback" MAINTAINERS: Correct path to uDraw PS3 driver tracing: Fix doc mistakes in trace sample tracing: Kconfig text fixes for CONFIG_HWLAT_TRACER MIPS: Alchemy: Remove reverted CONFIG_NETLINK_MMAP from db1xxx_defconfig mm/huge_memory.c: fixup grammar in comment lib/xz: Add fall-through comments to a switch statement
2017-11-15fcntl: don't cap l_start and l_end values for F_GETLK64 in compat syscallJeff Layton8-8/+0
Currently, we're capping the values too low in the F_GETLK64 case. The fields in that structure are 64-bit values, so we shouldn't need to do any sort of fixup there. Make sure we check that assumption at build time in the future however by ensuring that the sizes we're copying will fit. With this, we no longer need COMPAT_LOFF_T_MAX either, so remove it. Fixes: 94073ad77fff2 (fs/locks: don't mess with the address limit in compat_fcntl64) Reported-by: Vitaly Lipatov <lav@etersoft.ru> Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: David Howells <dhowells@redhat.com>
2017-11-15sparc64: Fix page table walk for PUD hugepagesNitin Gupta1-1/+1
For a PUD hugepage entry, we need to propagate bits [32:22] from virtual address to resolve at 4M granularity. However, the current code was incorrectly propagating bits [29:19]. This bug can cause incorrect data to be returned for pages backed with 16G hugepages. Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com> Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15sparc64: Convert timers to user timer_setup()Allen Pais1-3/+3
Switch to using the new timer_setup() and from_timer() in LDOM Virtual I/O handshake. Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15sparc64: convert mdesc_handle.refcnt from atomic_t to refcount_tElena Reshetova1-8/+9
atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable mdesc_handle.refcnt is used as pure reference counter. Convert it to refcount_t and fix up the operations. Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15sparc/led: Convert timers to use timer_setup()Kees Cook1-7/+9
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Adds a static variable to hold timeout value. Cc: "David S. Miller" <davem@davemloft.net> Cc: Geliang Tang <geliangtang@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: sparclinux@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15sparc64: Use sparc optimized fls and __fls for T4 and aboveVijay Kumar2-0/+11
For T4 and above, patch fls and __fls functions at the boot time to use lzcnt instruction. Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15sparc64: SPARC optimized __fls functionVijay Kumar1-0/+10
Defined SPARC optimized __fls using lzcnt opcode. Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15sparc64: SPARC optimized fls functionVijay Kumar2-0/+21
Defined SPARC optimized fls using lzcnt opcode. Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15sparc64: Define SPARC default __fls functionVijay Kumar3-1/+63
__fls will now require a boot time patching on T4 and above. Redefining it under arch/sparc/lib. Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15sparc64: Define SPARC default fls functionVijay Kumar3-1/+70
fls will now require a boot time patching on T4 and above. Redefining it under arch/sparc/lib. Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15vDSO for sparcNagarathnam Muthusamy26-2/+1494
Following patch is based on work done by Nick Alcock on 64-bit vDSO for sparc in Oracle linux. I have extended it to include support for 32-bit vDSO for sparc on 64-bit kernel. vDSO for sparc is based on the X86 implementation. This patch provides vDSO support for both 64-bit and 32-bit programs on 64-bit kernel. vDSO will be disabled on 32-bit linux kernel on sparc. *) vclock_gettime.c contains all the vdso functions. Since data page is mapped before the vdso code page, the pointer to data page is got by subracting offset from an address in the vdso code page. The return address stored in %i7 is used for this purpose. *) During compilation, both 32-bit and 64-bit vdso images are compiled and are converted into raw bytes by vdso2c program to be ready for mapping into the process. 32-bit images are compiled only if CONFIG_COMPAT is enabled. vdso2c generates two files vdso-image-64.c and vdso-image-32.c which contains the respective vDSO image in C structure. *) During vdso initialization, required number of vdso pages are allocated and raw bytes are copied into the pages. *) During every exec, these pages are mapped into the process through arch_setup_additional_pages and the location of mapping is passed on to the process through aux vector AT_SYSINFO_EHDR which is used by glibc. *) A new update_vsyscall routine for sparc is added to keep the data page in vdso updated. *) As vDSO cannot contain dynamically relocatable references, a new version of cpu_relax is added for the use of vDSO. This change also requires a putback to glibc to use vDSO. For testing, programs planning to try vDSO can be compiled against the generated vdso(64/32).so in the source. Testing: ======== [root@localhost ~]# cat vdso_test.c int main() { struct timespec tv_start, tv_end; struct timeval tv_tmp; int i; int count = 1 * 1000 * 10000; long long diff; clock_gettime(0, &tv_start); for (i = 0; i < count; i++) gettimeofday(&tv_tmp, NULL); clock_gettime(0, &tv_end); diff = (long long)(tv_end.tv_sec - tv_start.tv_sec)*(1*1000*1000*1000); diff += (tv_end.tv_nsec - tv_start.tv_nsec); printf("Start sec: %d\n", tv_start.tv_sec); printf("End sec : %d\n", tv_end.tv_sec); printf("%d cycles in %lld ns = %f ns/cycle\n", count, diff, (double)diff / (double)count); return 0; } [root@localhost ~]# cc vdso_test.c -o t32_without_fix -m32 -lrt [root@localhost ~]# ./t32_without_fix Start sec: 1502396130 End sec : 1502396140 10000000 cycles in 9565148528 ns = 956.514853 ns/cycle [root@localhost ~]# cc vdso_test.c -o t32_with_fix -m32 ./vdso32.so.dbg [root@localhost ~]# ./t32_with_fix Start sec: 1502396168 End sec : 1502396169 10000000 cycles in 798141262 ns = 79.814126 ns/cycle [root@localhost ~]# cc vdso_test.c -o t64_without_fix -m64 -lrt [root@localhost ~]# ./t64_without_fix Start sec: 1502396208 End sec : 1502396218 10000000 cycles in 9846091800 ns = 984.609180 ns/cycle [root@localhost ~]# cc vdso_test.c -o t64_with_fix -m64 ./vdso64.so.dbg [root@localhost ~]# ./t64_with_fix Start sec: 1502396257 End sec : 1502396257 10000000 cycles in 380984048 ns = 38.098405 ns/cycle V1 to V2 Changes: ================= Added hot patching code to switch the read stick instruction to read tick instruction based on the hardware. V2 to V3 Changes: ================= Merged latest changes from sparc-next and moved the initialization of clocksource_tick.archdata.vclock_mode to time_init_early. Disabled queued spinlock and rwlock configuration when simulating 32-bit config to compile 32-bit VDSO. V3 to V4 Changes: ================= Hardcoded the page size as 8192 in linker script for both 64-bit and 32-bit binaries. Removed unused variables in vdso2c.h. Added -mv8plus flag to Makefile to prevent the generation of relocation entries for __lshrdi3 in 32-bit vdso binary. Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Signed-off-by: Nagarathnam Muthusamy <nagarathnam.muthusamy@oracle.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15powerpc/64s: Fix Power9 DD2.0 workarounds by adding DD2.1 featureMichael Ellerman4-11/+12
Recently we added a CPU feature for Power9 DD2.0, to capture the fact that some workarounds are required only on Power9 DD1 and DD2.0 but not DD2.1 or later. Then in commit 9d2f510a66ec ("powerpc/64s/idle: avoid POWER9 DD1 and DD2.0 ERAT workaround on DD2.1") and commit e3646330cf66 "powerpc/64s/idle: avoid POWER9 DD1 and DD2.0 PMU workaround on DD2.1") we changed CPU_FTR_SECTIONs to check for DD1 or DD20, eg: BEGIN_FTR_SECTION PPC_INVALIDATE_ERAT END_FTR_SECTION_IFSET(CPU_FTR_POWER9_DD1 | CPU_FTR_POWER9_DD20) Unfortunately although this reads as "if set DD1 or DD2.0", the or is a bitwise or and actually generates a mask of both bits. The code that does the feature patching then checks that the value of the CPU features masked with that mask are equal to the mask. So the end result is we're checking for DD1 and DD20 being set, which never happens. Yes the API is terrible. Removing the ERAT workaround on DD2.0 results in random SEGVs, the system tends to boot, but things randomly die including sometimes dhclient, udev etc. To fix the problem and hopefully avoid it in future, we remove the DD2.0 CPU feature and instead add a DD2.1 (or later) feature. This allows us to easily express that the workarounds are required if DD2.1 is not set. At some point we will drop the DD1 workarounds entirely and some of this can be cleaned up. Fixes: 9d2f510a66ec ("powerpc/64s/idle: avoid POWER9 DD1 and DD2.0 ERAT workaround on DD2.1") Fixes: e3646330cf66 ("powerpc/64s/idle: avoid POWER9 DD1 and DD2.0 PMU workaround on DD2.1") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>