linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2018-06-01	tools/power turbostat: rename num_cores_per_pkg to num_cores_per_node	Prarit Bhargava	1	-12/+14
	turbostat incorrectly assumes that there is one node per package. As a result num_cores_per_pkg is not correctly named and is actually num_cores_per_node. Rename num_cores_per_pkg to num_cores_per_node. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01	tools/power turbostat: track thread ID in cpu_topology	Prarit Bhargava	1	-66/+39
	The code can be simplified if the cpu_topology *cpus tracks the thread IDs. This removes an additional file lookup and simplifies the counter initialization code. Add thread ID to cpu_topology information and cleanup the counter initialization code. v2: prevent thread_id from being overwritten Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01	tools/power turbostat: Calculate additional node information for a package	Prarit Bhargava	1	-4/+61
	The code currently assumes each package has exactly one node. This is not the case for AMD systems and Intel systems with COD. AMD systems also may re-enumerate each node's core IDs starting at 0 (for example, an AMD processor may have two nodes, each with core IDs from 0 to 7). In order to properly enumerate the cores we need to track both the physical and logical node IDs. Add physical_node_id to track the node ID assigned by the kernel, and logical_node_id used by turbostat to track the nodes per package ie) a 0-based count within the package. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01	tools/power turbostat: Fix node and siblings lookup data	Len Brown	1	-35/+82
	The turbostat code only looks at thread_siblings_list to determine if processing units/threads are on the same the core. This works well on Intel systems which have a shared L1 instruction and data cache. This does not work on AMD systems which have shared L1 instruction cache but separate L1 data caches. Other utilities also check sibling's core ID to determine if the processing unit shares the same core. Additionally, the cpu_topology cpus list used in topology_probe() can be used elsewhere in the code to simplify things. Export cpus to the entire turbostat code, and add Processing Unit/Thread IDs information to each cpu_topology struct. Confirm that the thread is on the same core as indicated by thread_siblings_list. [v2]: Fixup CPU_* usage that caused gcc malloc error. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01	tools/power turbostat: set max_num_cpus equal to the cpumask length	Prarit Bhargava	1	-5/+16
	Future fixes will use sysfs files that contain cpumask output. The code needs to know the length of the cpumask in order to determine which cpus are set in a cpumask. Currently topo.max_cpu_num is the maximum cpu number. It can be increased the the maximum value of cpus represented in cpumasks. Set max_num_cpus to the length of a cpumask. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01	tools/power turbostat: if --num_iterations, print for specific number of iterations	Chen Yu	2	-1/+21
	There's a use case during test to only print specific round of iterations if --num_iterations is specified, for example, with this patch applied: turbostat -i 5 -n 4 will capture 4 samples with 5 seconds interval. [lenb: renamed to --num_iterations from --iterations] Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01	tools/power turbostat: Add Cannon Lake support	Srinivas Pandruvada	1	-3/+24
	All MSRs related to turbostat are same as Kabylake. Even though SDM claims that core C3 residency can be read from MSR 0x662, the read on this MSR fails on CNL platform. Hence disabled C3 MSR read and display. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01	tools/power turbostat: delete duplicate #defines	Len Brown	1	-3/+0
	The SNB_C1_AUTO_UNDEMOTE definition should have been deleted once it was copied into msr-index.h. One copy of the truth is better -- particularly when Matt needs to fix it:-) Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01	x86: msr-index.h: Correct SNB_C1/C3_AUTO_UNDEMOTE defines	Matt Turner	1	-2/+2
	According to the Intel Software Developers' Manual, Vol. 4, Order No. 335592, these macros have been reversed since they were added in the initial turbostat commit. The reversed definitions were presumably copied from turbostat.c to this file. Fixes: 9c63a650bb10 ("tools/power/x86/turbostat: share kernel MSR #defines") Signed-off-by: Matt Turner <mattst88@gmail.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01	tools/power turbostat: Correct SNB_C1/C3_AUTO_UNDEMOTE defines	Matt Turner	1	-2/+2
	According to the Intel Software Developers' Manual, Vol. 4, Order No. 335592, these macros have been reversed since they were added. Fixes: 889facbee3e6 ("tools/power turbostat: v3.0: monitor Watts and Temperature") Signed-off-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01	tools/power turbostat: add POLL and POLL% column	Len Brown	1	-5/+6
	Like the "C1" and "C1%" column, the new POLL and POLL% columns show invocations and residency% during the measurement interval. While it didn't seem important to track in the past, we've recently found some Linux cpuidle bugs related to POLL%. Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01	tools/power turbostat: Fix --hide Pk%pc10	Len Brown	1	-1/+1
	The column header for PC10 residency is "Pk%pc10" This is missing the 'g' that others have, eg Pkg%pc6, to allow tab-delimited columns to fit into 8-columns. However, --hide Pk%pc10 did not work, it was still looking for the 'g'. This was confusing, because --list shows the correct "Pk%pc10" Reported-by: Wendy Wang <wendy.wang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01	tools/power turbostat: Build-in "Low Power Idle" counters support	Len Brown	1	-0/+95
	Linux 4.15 exports the ACPI Low Power Idle Table's counters in /sys/devices/system/cpu/cpuidle/ low_power_idle_cpu_residency_us Show this in the "CPU%LPI" column. Today this reflects the "North Complex" residency in PC10, so expect it to closely follow "Pk%pc10". low_power_idle_system_residency_us Show this in the "SYS%LPI" column. Today, this reflects the North is in PC10, plus the PCH is sufficiently quiescent to save additional power via the "S0ix" system state, as measured by the PCH SLP_S0 counter. Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01	tools/power turbostat: Don't make man pages executable	Laura Abbott	2	-2/+2
	rpm-lint flagged these as being executable: kernel-tools.x86_64: W: spurious-executable-perm /usr/share/man/man8/turbostat.8.gz kernel-tools.x86_64: W: spurious-executable-perm /usr/share/man/man8/x86_energy_perf_policy.8.gz Fix this Signed-off-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01	tools/power turbostat: remove blank lines	Len Brown	1	-1/+2
	When the user reuests to collect and show columns that are not present on every row (eg. for every CPU) turbostat still prints an (empty) line for every CPU. Update so no blank lines are printed. old: # turbostat --quiet --show Pkg%pc6 Pkg%pc6 9.12 9.12 Pkg%pc6 9.12 9.12 new: # turbostat --quiet --show Pkg%pc6 Pkg%pc6 9.12 9.12 Pkg%pc6 9.12 9.12 Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01	tools/power turbostat: a small C-states dump readability immprovement	Artem Bityutskiy	1	-1/+1
	Improve readability a little bit by changing this output: MSR_PKG_CST_CONFIG_CONTROL: 0x00008407 (locked: pkg-cstate-limit=7: unlimited, automatic-c-state-conversion=off) with this output: MSR_PKG_CST_CONFIG_CONTROL: 0x00008407 (locked, pkg-cstate-limit=7 (unlimited), automatic-c-state-conversion=off) Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01	tools/power turbostat: dump BDX, SKX automatic C-state conversion bit	Artem Bityutskiy	1	-1/+18
	BDX and SKX have a bit that tells them to PROMOTE shallow C-states requests to MWAIT(C6). It is generally a BIOS bug if this bit is set. As we have encountered that BIOS bug, let's print this bit in turbostat debug output. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01	tools/power turbostat: do not hard-code 25MHz crystal on SKX	Len Brown	1	-1/+0
	Some SKX use a 24 MHz crystal, so do not hard code 25 MHz. Also, SKX crystal is not exact, because SKX uses an EMI reduction circuit that costs a fraction of a percent. Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01	tools/power turbostat: fix possible sprintf buffer overflow	Len Brown	1	-1/+1
	Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01	tools/power turbostat: fix MSR_IA32_MISC_ENABLE MWAIT printout	Len Brown	1	-1/+1
	MSR_IA32_MISC_ENABLE[18] is the MWAIT ENABLE bit, not DISABLE bit... so MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST No-MWAIT PREFETCH TURBO) should print as: MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST MWAIT PREFETCH TURBO) Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01	tools/power turbostat: fix printing on input	Artem Bityutskiy	1	-6/+10
	The recent patch that implements table printing on a keypress introduced a regression - turbostat prints the table almost continuously if it is run from a daemon program. The problem is also easy to reproduce like this: echo \| turbostat The reason is that we cannot assume that stdin is always a TTY. It can be many things. This patch adds fixes the problem by limiting the new keypress functionality to TTYs only. If stdin is not a TTY, we just sleep for the full interval time. While on it, clean-up 'do_sleep()' to return no value, as callers do not expect that anyway. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01	tools/power turbostat: end current interval upon newline input	Len Brown	2	-6/+44
	In turbostat interval mode, a newline typed on standard input will now conclude the current interval. Data will immediately be collected and printed for that interval, and the next interval will be started. This is similar to the recently added SIGUSR1 feature. But that is for use by programs, while this is for interactive use. Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01	tools/power turbostat: on SIGUSR1: sample, print and continue	Len Brown	2	-0/+10
	Interval-mode turbostat now catches and discards SIGUSR1. Thus, SIGUSR1 can be used to tell turbostat to cut short the current measurement interval. Turbostat will then start the next measurement interval using the regular interval length. This can be used to give turbostat variable intervals. Invoke turbostat with --interval LARGE_NUMBER_SEC and have a program that has permission to send it a SIGUSR1 always before LARGE_NUMBER_SEC expires. It may also be useful to use "--enable Time_Of_Day_Seconds" to observe the actual interval length. Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01	tools/power turbostat: on SIGINT: sample, print and exit	Len Brown	2	-0/+34
	When running in interval-mode, catch interrupts and print a final data record before exiting. Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01	tools/power turbostat: add --enable Time_Of_Day_Seconds	Len Brown	2	-82/+118
	Add a Time_Of_Day_Seconds column showing when measurement for each row was completed. Units are [sec.subsec] since Epoch, as reported by gettimeofday(2). While useful to correlate turbostat output with other tools, this built-in column is disabled, by default. Add the "--enable" option to enable such disabled-by-default built-in columns: "--enable Time_Of_Day_Seconds" "--enable usec" "--enable all", will enable all disabled-by-defauilt built-in counters. When "--debug" is used, all disabled-by-default columns are enabled, unless explicitly skipped using "--hide" Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01	tools/power turbostat: fix Skylake Xeon package C-state display	Artem Bityutskiy	1	-1/+1
	Turbostat neglects to display all package C-states for some Skylake Xeon BIOS configurations. This is due to a typo in the table decoding MSR_PKG_CST_CONFIG_CONTROL (0x000000e2) Here we fix that typo, according to Intel SDM, vol 4, Table 2-41 - "MSRs Supported by Intel® Xeon® Processor Scalable Family with DisplayFamily_DisplayModel 06_55H". Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01	MAINTAINERS: add turbostat utility	Len Brown	1	-0/+9
	Signed-off-by: Len Brown <len.brown@intel.com>
2018-05-31	platform/x86: asus-wmi: Fix NULL pointer dereference	João Paulo Rechi Vita	1	-10/+13
	Do not perform the rfkill cleanup routine when (asus->driver->wlan_ctrl_by_user && ashs_present()) is true, since nothing is registered with the rfkill subsystem in that case. Doing so leads to the following kernel NULL pointer dereference: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120 PGD 1a3aa8067 PUD 1a3b3d067 PMD 0 Oops: 0002 [#1] PREEMPT SMP Modules linked in: bnep ccm binfmt_misc uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core hid_a4tech videodev x86_pkg_temp_thermal intel_powerclamp coretemp ath3k btusb btrtl btintel bluetooth kvm_intel snd_hda_codec_hdmi kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass crc32c_intel arc4 i915 snd_hda_intel snd_hda_codec ath9k ath9k_common ath9k_hw ath i2c_algo_bit snd_hwdep mac80211 ghash_clmulni_intel snd_hda_core snd_pcm snd_timer cfg80211 ehci_pci xhci_pci drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm xhci_hcd ehci_hcd asus_nb_wmi(-) asus_wmi sparse_keymap r8169 rfkill mxm_wmi serio_raw snd mii mei_me lpc_ich i2c_i801 video soundcore mei i2c_smbus wmi i2c_core mfd_core CPU: 3 PID: 3275 Comm: modprobe Not tainted 4.9.34-gentoo #34 Hardware name: ASUSTeK COMPUTER INC. K56CM/K56CM, BIOS K56CM.206 08/21/2012 task: ffff8801a639ba00 task.stack: ffffc900014cc000 RIP: 0010:[<ffffffff816c7348>] [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120 RSP: 0018:ffffc900014cfce0 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff8801a54315b0 RCX: 00000000c0000100 RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8801a54315b4 RBP: ffffc900014cfd30 R08: 0000000000000000 R09: 0000000000000002 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801a54315b4 R13: ffff8801a639ba00 R14: 00000000ffffffff R15: ffff8801a54315b8 FS: 00007faa254fb700(0000) GS:ffff8801aef80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000001a3b1b000 CR4: 00000000001406e0 Stack: ffff8801a54315b8 0000000000000000 ffffffff814733ae ffffc900014cfd28 ffffffff8146a28c ffff8801a54315b0 0000000000000000 ffff8801a54315b0 ffff8801a66f3820 0000000000000000 ffffc900014cfd48 ffffffff816c73e7 Call Trace: [<ffffffff814733ae>] ? acpi_ut_release_mutex+0x5d/0x61 [<ffffffff8146a28c>] ? acpi_ns_get_node+0x49/0x52 [<ffffffff816c73e7>] mutex_lock+0x17/0x30 [<ffffffffa00a3bb4>] asus_rfkill_hotplug+0x24/0x1a0 [asus_wmi] [<ffffffffa00a4421>] asus_wmi_rfkill_exit+0x61/0x150 [asus_wmi] [<ffffffffa00a49f1>] asus_wmi_remove+0x61/0xb0 [asus_wmi] [<ffffffff814a5128>] platform_drv_remove+0x28/0x40 [<ffffffff814a2901>] __device_release_driver+0xa1/0x160 [<ffffffff814a29e3>] device_release_driver+0x23/0x30 [<ffffffff814a1ffd>] bus_remove_device+0xfd/0x170 [<ffffffff8149e5a9>] device_del+0x139/0x270 [<ffffffff814a5028>] platform_device_del+0x28/0x90 [<ffffffff814a50a2>] platform_device_unregister+0x12/0x30 [<ffffffffa00a4209>] asus_wmi_unregister_driver+0x19/0x30 [asus_wmi] [<ffffffffa00da0ea>] asus_nb_wmi_exit+0x10/0xf26 [asus_nb_wmi] [<ffffffff8110c692>] SyS_delete_module+0x192/0x270 [<ffffffff810022b2>] ? exit_to_usermode_loop+0x92/0xa0 [<ffffffff816ca560>] entry_SYSCALL_64_fastpath+0x13/0x94 Code: e8 5e 30 00 00 8b 03 83 f8 01 0f 84 93 00 00 00 48 8b 43 10 4c 8d 7b 08 48 89 63 10 41 be ff ff ff ff 4c 89 3c 24 48 89 44 24 08 <48> 89 20 4c 89 6c 24 10 eb 1d 4c 89 e7 49 c7 45 08 02 00 00 00 RIP [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120 RSP <ffffc900014cfce0> CR2: 0000000000000000 ---[ end trace 8d484233fa7cb512 ]--- note: modprobe[3275] exited with preempt_count 2 https://bugzilla.kernel.org/show_bug.cgi?id=196467 Reported-by: red.f0xyz@gmail.com Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-05-30	fs: clear writeback errors in inode_init_always	Darrick J. Wong	1	-0/+1
	In inode_init_always(), we clear the inode mapping flags, which clears any retained error (AS_EIO, AS_ENOSPC) bits. Unfortunately, we do not also clear wb_err, which means that old mapping errors can leak through to new inodes. This is crucial for the XFS inode allocation path because we recycle old in-core inodes and we do not want error state from an old file to leak into the new file. This bug was discovered by running generic/036 and generic/047 in a loop and noticing that the EIOs generated by the collision of direct and buffered writes in generic/036 would survive the remount between 036 and 047, and get reported to the fsyncs (on different files!) in generic/047. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-05-29	selinux: KASAN: slab-out-of-bounds in xattr_getsecurity	Sachin Grover	1	-1/+1
	Call trace: [<ffffff9203a8d7a8>] dump_backtrace+0x0/0x428 [<ffffff9203a8dbf8>] show_stack+0x28/0x38 [<ffffff920409bfb8>] dump_stack+0xd4/0x124 [<ffffff9203d187e8>] print_address_description+0x68/0x258 [<ffffff9203d18c00>] kasan_report.part.2+0x228/0x2f0 [<ffffff9203d1927c>] kasan_report+0x5c/0x70 [<ffffff9203d1776c>] check_memory_region+0x12c/0x1c0 [<ffffff9203d17cdc>] memcpy+0x34/0x68 [<ffffff9203d75348>] xattr_getsecurity+0xe0/0x160 [<ffffff9203d75490>] vfs_getxattr+0xc8/0x120 [<ffffff9203d75d68>] getxattr+0x100/0x2c8 [<ffffff9203d76fb4>] SyS_fgetxattr+0x64/0xa0 [<ffffff9203a83f70>] el0_svc_naked+0x24/0x28 If user get root access and calls security.selinux setxattr() with an embedded NUL on a file and then if some process performs a getxattr() on that file with a length greater than the actual length of the string, it would result in a panic. To fix this, add the actual length of the string to the security context instead of the length passed by the userspace process. Signed-off-by: Sachin Grover <sgrover@codeaurora.org> Cc: stable@vger.kernel.org Signed-off-by: Paul Moore <paul@paul-moore.com>
2018-05-29	nvme: fix extended data LBA supported setting	Max Gurtovoy	1	-1/+1
	This value depands on the metadata support value, so reorder the initialization to fit. Fixes: b5be3b392 ("nvme: always unregister the integrity profile in __nvme_revalidate_disk") Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org
2018-05-28	tracing: Make the snapshot trigger work with instances	Steven Rostedt (VMware)	3	-8/+25
	The snapshot trigger currently only affects the main ring buffer, even when it is used by the instances. This can be confusing as the snapshot trigger is listed in the instance. > # cd /sys/kernel/tracing > # mkdir instances/foo > # echo snapshot > instances/foo/events/syscalls/sys_enter_fchownat/trigger > # echo top buffer > trace_marker > # echo foo buffer > instances/foo/trace_marker > # touch /tmp/bar > # chown rostedt /tmp/bar > # cat instances/foo/snapshot # tracer: nop # # # * Snapshot is freed * # # Snapshot commands: # echo 0 > snapshot : Clears and frees snapshot buffer # echo 1 > snapshot : Allocates snapshot buffer, if not already allocated. # Takes a snapshot of the main buffer. # echo 2 > snapshot : Clears snapshot buffer (but does not allocate or free) # (Doesn't have to be '2' works with any number that # is not a '0' or '1') > # cat snapshot # tracer: nop # # _-----=> irqs-off # / _----=> need-resched # \| / _---=> hardirq/softirq # \|\| / _--=> preempt-depth # \|\|\| / delay # TASK-PID CPU# \|\|\|\| TIMESTAMP FUNCTION # \| \| \| \|\|\|\| \| \| bash-1189 [000] .... 111.488323: tracing_mark_write: top buffer Not only did the snapshot occur in the top level buffer, but the instance snapshot buffer should have been allocated, and it is still free. Cc: stable@vger.kernel.org Fixes: 85f2b08268c01 ("tracing: Add basic event trigger framework") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-05-27	tracing: Fix crash when freeing instances with event triggers	Steven Rostedt (VMware)	1	-2/+3
	If a instance has an event trigger enabled when it is freed, it could cause an access of free memory. Here's the case that crashes: # cd /sys/kernel/tracing # mkdir instances/foo # echo snapshot > instances/foo/events/initcall/initcall_start/trigger # rmdir instances/foo Would produce: general protection fault: 0000 [#1] PREEMPT SMP PTI Modules linked in: tun bridge ... CPU: 5 PID: 6203 Comm: rmdir Tainted: G W 4.17.0-rc4-test+ #933 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016 RIP: 0010:clear_event_triggers+0x3b/0x70 RSP: 0018:ffffc90003783de0 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 6b6b6b6b6b6b6b2b RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800c7130ba0 RBP: ffffc90003783e00 R08: ffff8801131993f8 R09: 0000000100230016 R10: ffffc90003783d80 R11: 0000000000000000 R12: ffff8800c7130ba0 R13: ffff8800c7130bd8 R14: ffff8800cc093768 R15: 00000000ffffff9c FS: 00007f6f4aa86700(0000) GS:ffff88011eb40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f6f4a5aed60 CR3: 00000000cd552001 CR4: 00000000001606e0 Call Trace: event_trace_del_tracer+0x2a/0xc5 instance_rmdir+0x15c/0x200 tracefs_syscall_rmdir+0x52/0x90 vfs_rmdir+0xdb/0x160 do_rmdir+0x16d/0x1c0 __x64_sys_rmdir+0x17/0x20 do_syscall_64+0x55/0x1a0 entry_SYSCALL_64_after_hwframe+0x49/0xbe This was due to the call the clears out the triggers when an instance is being deleted not removing the trigger from the link list. Cc: stable@vger.kernel.org Fixes: 85f2b08268c01 ("tracing: Add basic event trigger framework") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-05-27	Linux 4.17-rc7	Linus Torvalds	1	-1/+1

2018-05-26	ARM: Fix i2c-gpio GPIO descriptor tables	Linus Walleij	10	-11/+11
	I used bad names in my clumsiness when rewriting many board files to use GPIO descriptors instead of platform data. A few had the platform_device ID set to -1 which would indeed give the device name "i2c-gpio". But several had it set to >=0 which gives the names "i2c-gpio.0", "i2c-gpio.1" ... Fix the offending instances in the ARM tree. Sorry for the mess. Fixes: b2e63555592f ("i2c: gpio: Convert to use descriptors") Cc: Wolfram Sang <wsa@the-dreams.de> Cc: Simon Guinot <simon.guinot@sequanux.org> Reported-by: Simon Guinot <simon.guinot@sequanux.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Olof Johansson <olof@lixom.net>
2018-05-26	arm64: dts: hikey: Fix eMMC corruption regression	John Stultz	1	-1/+0
	This patch is a partial revert of commit abd7d0972a19 ("arm64: dts: hikey: Enable HS200 mode on eMMC") which has been causing eMMC corruption on my HiKey board. Symptoms usually looked like: mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31) ... mmc_host mmc0: Bus speed (slot 0) = 148800000Hz (slot req 150000000Hz, actual 148800000HZ div = 0) mmc0: new HS200 MMC card at address 0001 ... dwmmc_k3 f723d000.dwmmc0: Unexpected command timeout, state 3 mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31) mmc_host mmc0: Bus speed (slot 0) = 148800000Hz (slot req 150000000Hz, actual 148800000HZ div = 0) mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31) mmc_host mmc0: Bus speed (slot 0) = 148800000Hz (slot req 150000000Hz, actual 148800000HZ div = 0) mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31) mmc_host mmc0: Bus speed (slot 0) = 148800000Hz (slot req 150000000Hz, actual 148800000HZ div = 0) print_req_error: I/O error, dev mmcblk0, sector 8810504 Aborting journal on device mmcblk0p10-8. mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31) mmc_host mmc0: Bus speed (slot 0) = 148800000Hz (slot req 150000000Hz, actual 148800000HZ div = 0) mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31) mmc_host mmc0: Bus speed (slot 0) = 148800000Hz (slot req 150000000Hz, actual 148800000HZ div = 0) mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31) mmc_host mmc0: Bus speed (slot 0) = 148800000Hz (slot req 150000000Hz, actual 148800000HZ div = 0) mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31) mmc_host mmc0: Bus speed (slot 0) = 148800000Hz (slot req 150000000Hz, actual 148800000HZ div = 0) EXT4-fs error (device mmcblk0p10): ext4_journal_check_start:61: Detected aborted journal EXT4-fs (mmcblk0p10): Remounting filesystem read-only And quite often this would result in a disk that wouldn't properly boot even with older kernels. It seems the max-frequency property added by the above patch is causing the problem, so remove it. Cc: Ryan Grachek <ryan@edited.us> Cc: Wei Xu <xuwei5@hisilicon.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Ulf Hansson <ulf.hansson@linaro.org> Cc: YongQin Liu <yongqin.liu@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Tested-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Wei Xu <xuwei04@gmail.com>
2018-05-26	crypto: inside-secure - do not use memset on MMIO	Antoine Tenart	1	-2/+2
	This patch fixes the Inside Secure driver which uses a memtset() call to set an MMIO area from the cryptographic engine to 0. This is wrong as memset() isn't guaranteed to work on MMIO for many reasons. This led to kernel paging request panics in certain cases. Use memset_io() instead. Fixes: 1b44c5a60c13 ("crypto: inside-secure - add SafeXcel EIP197 crypto engine driver") Reported-by: Ofer Heifetz <oferh@marvell.com> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-05-25	kasan: fix memory hotplug during boot	David Hildenbrand	1	-1/+1
	Using module_init() is wrong. E.g. ACPI adds and onlines memory before our memory notifier gets registered. This makes sure that ACPI memory detected during boot up will not result in a kernel crash. Easily reproducible with QEMU, just specify a DIMM when starting up. Link: http://lkml.kernel.org/r/20180522100756.18478-3-david@redhat.com Fixes: 786a8959912e ("kasan: disable memory hotplug") Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-25	kasan: free allocated shadow memory on MEM_CANCEL_ONLINE	David Hildenbrand	1	-0/+1
	We have to free memory again when we cancel onlining, otherwise a later onlining attempt will fail. Link: http://lkml.kernel.org/r/20180522100756.18478-2-david@redhat.com Fixes: fa69b5989bb0 ("mm/kasan: add support for memory hotplug") Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-25	checkpatch: fix macro argument precedence test	Joe Perches	1	-1/+1
	checkpatch's macro argument precedence test is broken so fix it. Link: http://lkml.kernel.org/r/5dd900e9197febc1995604bb33c23c136d8b33ce.camel@perches.com Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-25	init/main.c: include <linux/mem_encrypt.h>	Mathieu Malaterre	1	-0/+1
	In commit c7753208a94c ("x86, swiotlb: Add memory encryption support") a call to function `mem_encrypt_init' was added. Include prototype defined in header <linux/mem_encrypt.h> to prevent a warning reported during compilation with W=1: init/main.c:494:20: warning: no previous prototype for `mem_encrypt_init' [-Wmissing-prototypes] Link: http://lkml.kernel.org/r/20180522195533.31415-1-malat@debian.org Signed-off-by: Mathieu Malaterre <malat@debian.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Kees Cook <keescook@chromium.org> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Gargi Sharma <gs051095@gmail.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-25	kernel/sys.c: fix potential Spectre v1 issue	Gustavo A. R. Silva	1	-0/+5
	`resource' can be controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: kernel/sys.c:1474 __do_compat_sys_old_getrlimit() warn: potential spectre issue 'get_current()->signal->rlim' (local cap) kernel/sys.c:1455 __do_sys_old_getrlimit() warn: potential spectre issue 'get_current()->signal->rlim' (local cap) Fix this by sanitizing resource before using it to index current->signal->rlim Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2 Link: http://lkml.kernel.org/r/20180515030038.GA11822@embeddedor.com Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-25	mm/memory_hotplug: fix leftover use of struct page during hotplug	Jonathan Cameron	3	-6/+9
	The case of a new numa node got missed in avoiding using the node info from page_struct during hotplug. In this path we have a call to register_mem_sect_under_node (which allows us to specify it is hotplug so don't change the node), via link_mem_sections which unfortunately does not. Fix is to pass check_nid through link_mem_sections as well and disable it in the new numa node path. Note the bug only 'sometimes' manifests depending on what happens to be in the struct page structures - there are lots of them and it only needs to match one of them. The result of the bug is that (with a new memory only node) we never successfully call register_mem_sect_under_node so don't get the memory associated with the node in sysfs and meminfo for the node doesn't report it. It came up whilst testing some arm64 hotplug patches, but appears to be universal. Whilst I'm triggering it by removing then reinserting memory to a node with no other elements (thus making the node disappear then appear again), it appears it would happen on hotplugging memory where there was none before and it doesn't seem to be related the arm64 patches. These patches call __add_pages (where most of the issue was fixed by Pavel's patch). If there is a node at the time of the __add_pages call then all is well as it calls register_mem_sect_under_node from there with check_nid set to false. Without a node that function returns having not done the sysfs related stuff as there is no node to use. This is expected but it is the resulting path that fails... Exact path to the problem is as follows: mm/memory_hotplug.c: add_memory_resource() The node is not online so we enter the 'if (new_node)' twice, on the second such block there is a call to link_mem_sections which calls into drivers/node.c: link_mem_sections() which calls drivers/node.c: register_mem_sect_under_node() which calls get_nid_for_pfn and keeps trying until the output of that matches the expected node (passed all the way down from add_memory_resource) It is effectively the same fix as the one referred to in the fixes tag just in the code path for a new node where the comments point out we have to rerun the link creation because it will have failed in register_new_memory (as there was no node at the time). (actually that comment is wrong now as we don't have register_new_memory any more it got renamed to hotplug_memory_register in Pavel's patch). Link: http://lkml.kernel.org/r/20180504085311.1240-1-Jonathan.Cameron@huawei.com Fixes: fc44f7f9231a ("mm/memory_hotplug: don't read nid from struct page during hotplug") Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-25	proc: fix smaps and meminfo alignment	Hugh Dickins	1	-5/+0
	The 4.17-rc /proc/meminfo and /proc/<pid>/smaps look ugly: single-digit numbers (commonly 0) are misaligned. Remove seq_put_decimal_ull_width()'s leftover optimization for single digits: it's wrong now that num_to_str() takes care of the width. Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1805241554210.1326@eggly.anvils Fixes: d1be35cb6f96 ("proc: add seq_put_decimal_ull_width to speed up /proc/pid/smaps") Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Andrei Vagin <avagin@openvz.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-25	mm: do not warn on offline nodes unless the specific node is explicitly requested	Michal Hocko	1	-1/+1
	Oscar has noticed that we splat WARNING: CPU: 0 PID: 64 at ./include/linux/gfp.h:467 vmemmap_alloc_block+0x4e/0xc9 [...] CPU: 0 PID: 64 Comm: kworker/u4:1 Tainted: G W E 4.17.0-rc5-next-20180517-1-default+ #66 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Workqueue: kacpi_hotplug acpi_hotplug_work_fn Call Trace: vmemmap_populate+0xf2/0x2ae sparse_mem_map_populate+0x28/0x35 sparse_add_one_section+0x4c/0x187 __add_pages+0xe7/0x1a0 add_pages+0x16/0x70 add_memory_resource+0xa3/0x1d0 add_memory+0xe4/0x110 acpi_memory_device_add+0x134/0x2e0 acpi_bus_attach+0xd9/0x190 acpi_bus_scan+0x37/0x70 acpi_device_hotplug+0x389/0x4e0 acpi_hotplug_work_fn+0x1a/0x30 process_one_work+0x146/0x340 worker_thread+0x47/0x3e0 kthread+0xf5/0x130 ret_from_fork+0x35/0x40 when adding memory to a node that is currently offline. The VM_WARN_ON is just too loud without a good reason. In this particular case we are doing alloc_pages_node(node, GFP_KERNEL\|__GFP_RETRY_MAYFAIL\|__GFP_NOWARN, order) so we do not insist on allocating from the given node (it is more a hint) so we can fall back to any other populated node and moreover we explicitly ask to not warn for the allocation failure. Soften the warning only to cases when somebody asks for the given node explicitly by __GFP_THISNODE. Link: http://lkml.kernel.org/r/20180523125555.30039-3-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Oscar Salvador <osalvador@techadventures.net> Tested-by: Oscar Salvador <osalvador@techadventures.net> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-25	mm, memory_hotplug: make has_unmovable_pages more robust	Michal Hocko	1	-6/+10
	Oscar has reported: : Due to an unfortunate setting with movablecore, memblocks containing bootmem : memory (pages marked by get_page_bootmem()) ended up marked in zone_movable. : So while trying to remove that memory, the system failed in do_migrate_range : and __offline_pages never returned. : : This can be reproduced by running : qemu-system-x86_64 -m 6G,slots=8,maxmem=8G -numa node,mem=4096M -numa node,mem=2048M : and movablecore=4G kernel command line : : linux kernel: BIOS-provided physical RAM map: : linux kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable : linux kernel: BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved : linux kernel: BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved : linux kernel: BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable : linux kernel: BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved : linux kernel: BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved : linux kernel: BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved : linux kernel: BIOS-e820: [mem 0x0000000100000000-0x00000001bfffffff] usable : linux kernel: NX (Execute Disable) protection: active : linux kernel: SMBIOS 2.8 present. : linux kernel: DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org : linux kernel: Hypervisor detected: KVM : linux kernel: e820: update [mem 0x00000000-0x00000fff] usable ==> reserved : linux kernel: e820: remove [mem 0x000a0000-0x000fffff] usable : linux kernel: last_pfn = 0x1c0000 max_arch_pfn = 0x400000000 : : linux kernel: SRAT: PXM 0 -> APIC 0x00 -> Node 0 : linux kernel: SRAT: PXM 1 -> APIC 0x01 -> Node 1 : linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff] : linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff] : linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x13fffffff] : linux kernel: ACPI: SRAT: Node 1 PXM 1 [mem 0x140000000-0x1bfffffff] : linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x1c0000000-0x43fffffff] hotplug : linux kernel: NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0xbfffffff] -> [mem 0x0 : linux kernel: NUMA: Node 0 [mem 0x00000000-0xbfffffff] + [mem 0x100000000-0x13fffffff] -> [mem 0 : linux kernel: NODE_DATA(0) allocated [mem 0x13ffd6000-0x13fffffff] : linux kernel: NODE_DATA(1) allocated [mem 0x1bffd3000-0x1bfffcfff] : : zoneinfo shows that the zone movable is placed into both numa nodes: : Node 0, zone Movable : pages free 160140 : min 1823 : low 2278 : high 2733 : spanned 262144 : present 262144 : managed 245670 : Node 1, zone Movable : pages free 448427 : min 3827 : low 4783 : high 5739 : spanned 524288 : present 524288 : managed 515766 Note how only Node 0 has a hutplugable memory region which would rule it out from the early memblock allocations (most likely memmap). Node1 will surely contain memmaps on the same node and those would prevent offlining to succeed. So this is arguably a configuration issue. Although one could argue that we should be more clever and rule early allocations from the zone movable. This would be correct but probably not worth the effort considering what a hack movablecore is. Anyway, We could do better for those cases though. We rely on start_isolate_page_range resp. has_unmovable_pages to do their job. The first one isolates the whole range to be offlined so that we do not allocate from it anymore and the later makes sure we are not stumbling over non-migrateable pages. has_unmovable_pages is overly optimistic, however. It doesn't check all the pages if we are withing zone_movable because we rely that those pages will be always migrateable. As it turns out we are still not perfect there. While bootmem pages in zonemovable sound like a clear bug which should be fixed let's remove the optimization for now and warn if we encounter unmovable pages in zone_movable in the meantime. That should help for now at least. Btw. this wasn't a real problem until commit 72b39cfc4d75 ("mm, memory_hotplug: do not fail offlining too early") because we used to have a small number of retries and then failed. This turned out to be too fragile though. Link: http://lkml.kernel.org/r/20180523125555.30039-2-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Oscar Salvador <osalvador@techadventures.net> Tested-by: Oscar Salvador <osalvador@techadventures.net> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-25	mm/kasan: don't vfree() nonexistent vm_area	Andrey Ryabinin	1	-2/+61
	KASAN uses different routines to map shadow for hot added memory and memory obtained in boot process. Attempt to offline memory onlined by normal boot process leads to this: Trying to vfree() nonexistent vm area (000000005d3b34b9) WARNING: CPU: 2 PID: 13215 at mm/vmalloc.c:1525 __vunmap+0x147/0x190 Call Trace: kasan_mem_notifier+0xad/0xb9 notifier_call_chain+0x166/0x260 __blocking_notifier_call_chain+0xdb/0x140 __offline_pages+0x96a/0xb10 memory_subsys_offline+0x76/0xc0 device_offline+0xb8/0x120 store_mem_state+0xfa/0x120 kernfs_fop_write+0x1d5/0x320 __vfs_write+0xd4/0x530 vfs_write+0x105/0x340 SyS_write+0xb0/0x140 Obviously we can't call vfree() to free memory that wasn't allocated via vmalloc(). Use find_vm_area() to see if we can call vfree(). Unfortunately it's a bit tricky to properly unmap and free shadow allocated during boot, so we'll have to keep it. If memory will come online again that shadow will be reused. Matthew asked: how can you call vfree() on something that isn't a vmalloc address? vfree() is able to free any address returned by __vmalloc_node_range(). And __vmalloc_node_range() gives you any address you ask. It doesn't have to be an address in [VMALLOC_START, VMALLOC_END] range. That's also how the module_alloc()/module_memfree() works on architectures that have designated area for modules. [aryabinin@virtuozzo.com: improve comments] Link: http://lkml.kernel.org/r/dabee6ab-3a7a-51cd-3b86-5468718e0390@virtuozzo.com [akpm@linux-foundation.org: fix typos, reflow comment] Link: http://lkml.kernel.org/r/20180201163349.8700-1-aryabinin@virtuozzo.com Fixes: fa69b5989bb0 ("mm/kasan: add support for memory hotplug") Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reported-by: Paul Menzel <pmenzel+linux-kasan-dev@molgen.mpg.de> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-25	MAINTAINERS: change hugetlbfs maintainer and update files	Mike Kravetz	1	-1/+7
	The current hugetlbfs maintainer has not been active for more than a few years. I have been been active in this area for more than two years and plan to remain active in the foreseeable future. Also, update the hugetlbfs entry to include linux-mm mail list and additional hugetlbfs related files. hugetlb.c and hugetlb.h are not 100% hugetlbfs, but a majority of their content is hugetlbfs related. Link: http://lkml.kernel.org/r/20180518225236.19079-1-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Nadia Yvette Chambers <nyc@holomorphy.com> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-25	ipc/shm: fix shmat() nil address after round-down when remapping	Davidlohr Bueso	1	-2/+10
	shmat()'s SHM_REMAP option forbids passing a nil address for; this is in fact the very first thing we check for. Andrea reported that for SHM_RND\|SHM_REMAP cases we can end up bypassing the initial addr check, but we need to check again if the address was rounded down to nil. As of this patch, such cases will return -EINVAL. Link: http://lkml.kernel.org/r/20180503204934.kk63josdu6u53fbd@linux-n805 Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Reported-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-25	Revert "ipc/shm: Fix shmat mmap nil-page protection"	Davidlohr Bueso	1	-7/+2
	Patch series "ipc/shm: shmat() fixes around nil-page". These patches fix two issues reported[1] a while back by Joe and Andrea around how shmat(2) behaves with nil-page. The first reverts a commit that it was incorrectly thought that mapping nil-page (address=0) was a no no with MAP_FIXED. This is not the case, with the exception of SHM_REMAP; which is address in the second patch. I chose two patches because it is easier to backport and it explicitly reverts bogus behaviour. Both patches ought to be in -stable and ltp testcases need updated (the added testcase around the cve can be modified to just test for SHM_RND\|SHM_REMAP). [1] lkml.kernel.org/r/20180430172152.nfa564pvgpk3ut7p@linux-n805 This patch (of 2): Commit 95e91b831f87 ("ipc/shm: Fix shmat mmap nil-page protection") worked on the idea that we should not be mapping as root addr=0 and MAP_FIXED. However, it was reported that this scenario is in fact valid, thus making the patch both bogus and breaks userspace as well. For example X11's libint10.so relies on shmat(1, SHM_RND) for lowmem initialization[1]. [1] https://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/os-support/linux/int10/linux.c#n347 Link: http://lkml.kernel.org/r/20180503203243.15045-2-dave@stgolabs.net Fixes: 95e91b831f87 ("ipc/shm: Fix shmat mmap nil-page protection") Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Reported-by: Joe Lawrence <joe.lawrence@redhat.com> Reported-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>