aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/kernel/cpu
AgeCommit message (Collapse)AuthorFilesLines
2020-01-25x86/CPU/AMD: Remove amd_get_topology_early()Borislav Petkov1-8/+2
... and fold its function body into its single call site. No functional changes: # arch/x86/kernel/cpu/amd.o: text data bss dec hex filename 5994 385 1 6380 18ec amd.o.before 5994 385 1 6380 18ec amd.o.after md5: 99ec6daa095b502297884e949c520f90 amd.o.before.asm 99ec6daa095b502297884e949c520f90 amd.o.after.asm Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200123165811.5288-1-bp@alien8.de
2020-01-23x86/mpx: remove MPX from arch/x86Dave Hansen2-54/+0
From: Dave Hansen <dave.hansen@linux.intel.com> MPX is being removed from the kernel due to a lack of support in the toolchain going forward (gcc). This removes all the remaining (dead at this point) MPX handling code remaining in the tree. The only remaining code is the XSAVE support for MPX state which is currently needd for KVM to handle VMs which might use MPX. Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: x86@kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
2020-01-20x86/resctrl: Clean up unused function parameter in mkdir pathXiaochen Shen1-11/+5
Commit 334b0f4e9b1b ("x86/resctrl: Fix a deadlock due to inaccurate reference") changed the argument to rdtgroup_kn_lock_live()/rdtgroup_kn_unlock() within mkdir_rdt_prepare(). That change resulted in an unused function parameter to mkdir_rdt_prepare(). Clean up the unused function parameter in mkdir_rdt_prepare() and its callers rdtgroup_mkdir_mon() and rdtgroup_mkdir_ctrl_mon(). Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1578500886-21771-5-git-send-email-xiaochen.shen@intel.com
2020-01-20x86/resctrl: Fix a deadlock due to inaccurate referenceXiaochen Shen1-8/+8
There is a race condition which results in a deadlock when rmdir and mkdir execute concurrently: $ ls /sys/fs/resctrl/c1/mon_groups/m1/ cpus cpus_list mon_data tasks Thread 1: rmdir /sys/fs/resctrl/c1 Thread 2: mkdir /sys/fs/resctrl/c1/mon_groups/m1 3 locks held by mkdir/48649: #0: (sb_writers#17){.+.+}, at: [<ffffffffb4ca2aa0>] mnt_want_write+0x20/0x50 #1: (&type->i_mutex_dir_key#8/1){+.+.}, at: [<ffffffffb4c8c13b>] filename_create+0x7b/0x170 #2: (rdtgroup_mutex){+.+.}, at: [<ffffffffb4a4389d>] rdtgroup_kn_lock_live+0x3d/0x70 4 locks held by rmdir/48652: #0: (sb_writers#17){.+.+}, at: [<ffffffffb4ca2aa0>] mnt_want_write+0x20/0x50 #1: (&type->i_mutex_dir_key#8/1){+.+.}, at: [<ffffffffb4c8c3cf>] do_rmdir+0x13f/0x1e0 #2: (&type->i_mutex_dir_key#8){++++}, at: [<ffffffffb4c86d5d>] vfs_rmdir+0x4d/0x120 #3: (rdtgroup_mutex){+.+.}, at: [<ffffffffb4a4389d>] rdtgroup_kn_lock_live+0x3d/0x70 Thread 1 is deleting control group "c1". Holding rdtgroup_mutex, kernfs_remove() removes all kernfs nodes under directory "c1" recursively, then waits for sub kernfs node "mon_groups" to drop active reference. Thread 2 is trying to create a subdirectory "m1" in the "mon_groups" directory. The wrapper kernfs_iop_mkdir() takes an active reference to the "mon_groups" directory but the code drops the active reference to the parent directory "c1" instead. As a result, Thread 1 is blocked on waiting for active reference to drop and never release rdtgroup_mutex, while Thread 2 is also blocked on trying to get rdtgroup_mutex. Thread 1 (rdtgroup_rmdir) Thread 2 (rdtgroup_mkdir) (rmdir /sys/fs/resctrl/c1) (mkdir /sys/fs/resctrl/c1/mon_groups/m1) ------------------------- ------------------------- kernfs_iop_mkdir /* * kn: "m1", parent_kn: "mon_groups", * prgrp_kn: parent_kn->parent: "c1", * * "mon_groups", parent_kn->active++: 1 */ kernfs_get_active(parent_kn) kernfs_iop_rmdir /* "c1", kn->active++ */ kernfs_get_active(kn) rdtgroup_kn_lock_live atomic_inc(&rdtgrp->waitcount) /* "c1", kn->active-- */ kernfs_break_active_protection(kn) mutex_lock rdtgroup_rmdir_ctrl free_all_child_rdtgrp sentry->flags = RDT_DELETED rdtgroup_ctrl_remove rdtgrp->flags = RDT_DELETED kernfs_get(kn) kernfs_remove(rdtgrp->kn) __kernfs_remove /* "mon_groups", sub_kn */ atomic_add(KN_DEACTIVATED_BIAS, &sub_kn->active) kernfs_drain(sub_kn) /* * sub_kn->active == KN_DEACTIVATED_BIAS + 1, * waiting on sub_kn->active to drop, but it * never drops in Thread 2 which is blocked * on getting rdtgroup_mutex. */ Thread 1 hangs here ----> wait_event(sub_kn->active == KN_DEACTIVATED_BIAS) ... rdtgroup_mkdir rdtgroup_mkdir_mon(parent_kn, prgrp_kn) mkdir_rdt_prepare(parent_kn, prgrp_kn) rdtgroup_kn_lock_live(prgrp_kn) atomic_inc(&rdtgrp->waitcount) /* * "c1", prgrp_kn->active-- * * The active reference on "c1" is * dropped, but not matching the * actual active reference taken * on "mon_groups", thus causing * Thread 1 to wait forever while * holding rdtgroup_mutex. */ kernfs_break_active_protection( prgrp_kn) /* * Trying to get rdtgroup_mutex * which is held by Thread 1. */ Thread 2 hangs here ----> mutex_lock ... The problem is that the creation of a subdirectory in the "mon_groups" directory incorrectly releases the active protection of its parent directory instead of itself before it starts waiting for rdtgroup_mutex. This is triggered by the rdtgroup_mkdir() flow calling rdtgroup_kn_lock_live()/rdtgroup_kn_unlock() with kernfs node of the parent control group ("c1") as argument. It should be called with kernfs node "mon_groups" instead. What is currently missing is that the kn->priv of "mon_groups" is NULL instead of pointing to the rdtgrp. Fix it by pointing kn->priv to rdtgrp when "mon_groups" is created. Then it could be passed to rdtgroup_kn_lock_live()/rdtgroup_kn_unlock() instead. And then it operates on the same rdtgroup structure but handles the active reference of kernfs node "mon_groups" to prevent deadlock. The same changes are also made to the "mon_data" directories. This results in some unused function parameters that will be cleaned up in follow-up patch as the focus here is on the fix only in support of backporting efforts. Fixes: c7d9aac61311 ("x86/intel_rdt/cqm: Add mkdir support for RDT monitoring") Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1578500886-21771-4-git-send-email-xiaochen.shen@intel.com
2020-01-20x86/resctrl: Fix use-after-free due to inaccurate refcount of rdtgroupXiaochen Shen1-2/+2
There is a race condition in the following scenario which results in an use-after-free issue when reading a monitoring file and deleting the parent ctrl_mon group concurrently: Thread 1 calls atomic_inc() to take refcount of rdtgrp and then calls kernfs_break_active_protection() to drop the active reference of kernfs node in rdtgroup_kn_lock_live(). In Thread 2, kernfs_remove() is a blocking routine. It waits on all sub kernfs nodes to drop the active reference when removing all subtree kernfs nodes recursively. Thread 2 could block on kernfs_remove() until Thread 1 calls kernfs_break_active_protection(). Only after kernfs_remove() completes the refcount of rdtgrp could be trusted. Before Thread 1 calls atomic_inc() and kernfs_break_active_protection(), Thread 2 could call kfree() when the refcount of rdtgrp (sentry) is 0 instead of 1 due to the race. In Thread 1, in rdtgroup_kn_unlock(), referring to earlier rdtgrp memory (rdtgrp->waitcount) which was already freed in Thread 2 results in use-after-free issue. Thread 1 (rdtgroup_mondata_show) Thread 2 (rdtgroup_rmdir) -------------------------------- ------------------------- rdtgroup_kn_lock_live /* * kn active protection until * kernfs_break_active_protection(kn) */ rdtgrp = kernfs_to_rdtgroup(kn) rdtgroup_kn_lock_live atomic_inc(&rdtgrp->waitcount) mutex_lock rdtgroup_rmdir_ctrl free_all_child_rdtgrp /* * sentry->waitcount should be 1 * but is 0 now due to the race. */ kfree(sentry)*[1] /* * Only after kernfs_remove() * completes, the refcount of * rdtgrp could be trusted. */ atomic_inc(&rdtgrp->waitcount) /* kn->active-- */ kernfs_break_active_protection(kn) rdtgroup_ctrl_remove rdtgrp->flags = RDT_DELETED /* * Blocking routine, wait for * all sub kernfs nodes to drop * active reference in * kernfs_break_active_protection. */ kernfs_remove(rdtgrp->kn) rdtgroup_kn_unlock mutex_unlock atomic_dec_and_test( &rdtgrp->waitcount) && (flags & RDT_DELETED) kernfs_unbreak_active_protection(kn) kfree(rdtgrp) mutex_lock mon_event_read rdtgroup_kn_unlock mutex_unlock /* * Use-after-free: refer to earlier rdtgrp * memory which was freed in [1]. */ atomic_dec_and_test(&rdtgrp->waitcount) && (flags & RDT_DELETED) /* kn->active++ */ kernfs_unbreak_active_protection(kn) kfree(rdtgrp) Fix it by moving free_all_child_rdtgrp() to after kernfs_remove() in rdtgroup_rmdir_ctrl() to ensure it has the accurate refcount of rdtgrp. Fixes: f3cbeacaa06e ("x86/intel_rdt/cqm: Add rmdir support") Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1578500886-21771-3-git-send-email-xiaochen.shen@intel.com
2020-01-20x86/resctrl: Fix use-after-free when deleting resource groupsXiaochen Shen1-2/+10
A resource group (rdtgrp) contains a reference count (rdtgrp->waitcount) that indicates how many waiters expect this rdtgrp to exist. Waiters could be waiting on rdtgroup_mutex or some work sitting on a task's workqueue for when the task returns from kernel mode or exits. The deletion of a rdtgrp is intended to have two phases: (1) while holding rdtgroup_mutex the necessary cleanup is done and rdtgrp->flags is set to RDT_DELETED, (2) after releasing the rdtgroup_mutex, the rdtgrp structure is freed only if there are no waiters and its flag is set to RDT_DELETED. Upon gaining access to rdtgroup_mutex or rdtgrp, a waiter is required to check for the RDT_DELETED flag. When unmounting the resctrl file system or deleting ctrl_mon groups, all of the subdirectories are removed and the data structure of rdtgrp is forcibly freed without checking rdtgrp->waitcount. If at this point there was a waiter on rdtgrp then a use-after-free issue occurs when the waiter starts running and accesses the rdtgrp structure it was waiting on. See kfree() calls in [1], [2] and [3] in these two call paths in following scenarios: (1) rdt_kill_sb() -> rmdir_all_sub() -> free_all_child_rdtgrp() (2) rdtgroup_rmdir() -> rdtgroup_rmdir_ctrl() -> free_all_child_rdtgrp() There are several scenarios that result in use-after-free issue in following: Scenario 1: ----------- In Thread 1, rdtgroup_tasks_write() adds a task_work callback move_myself(). If move_myself() is scheduled to execute after Thread 2 rdt_kill_sb() is finished, referring to earlier rdtgrp memory (rdtgrp->waitcount) which was already freed in Thread 2 results in use-after-free issue. Thread 1 (rdtgroup_tasks_write) Thread 2 (rdt_kill_sb) ------------------------------- ---------------------- rdtgroup_kn_lock_live atomic_inc(&rdtgrp->waitcount) mutex_lock rdtgroup_move_task __rdtgroup_move_task /* * Take an extra refcount, so rdtgrp cannot be freed * before the call back move_myself has been invoked */ atomic_inc(&rdtgrp->waitcount) /* Callback move_myself will be scheduled for later */ task_work_add(move_myself) rdtgroup_kn_unlock mutex_unlock atomic_dec_and_test(&rdtgrp->waitcount) && (flags & RDT_DELETED) mutex_lock rmdir_all_sub /* * sentry and rdtgrp are freed * without checking refcount */ free_all_child_rdtgrp kfree(sentry)*[1] kfree(rdtgrp)*[2] mutex_unlock /* * Callback is scheduled to execute * after rdt_kill_sb is finished */ move_myself /* * Use-after-free: refer to earlier rdtgrp * memory which was freed in [1] or [2]. */ atomic_dec_and_test(&rdtgrp->waitcount) && (flags & RDT_DELETED) kfree(rdtgrp) Scenario 2: ----------- In Thread 1, rdtgroup_tasks_write() adds a task_work callback move_myself(). If move_myself() is scheduled to execute after Thread 2 rdtgroup_rmdir() is finished, referring to earlier rdtgrp memory (rdtgrp->waitcount) which was already freed in Thread 2 results in use-after-free issue. Thread 1 (rdtgroup_tasks_write) Thread 2 (rdtgroup_rmdir) ------------------------------- ------------------------- rdtgroup_kn_lock_live atomic_inc(&rdtgrp->waitcount) mutex_lock rdtgroup_move_task __rdtgroup_move_task /* * Take an extra refcount, so rdtgrp cannot be freed * before the call back move_myself has been invoked */ atomic_inc(&rdtgrp->waitcount) /* Callback move_myself will be scheduled for later */ task_work_add(move_myself) rdtgroup_kn_unlock mutex_unlock atomic_dec_and_test(&rdtgrp->waitcount) && (flags & RDT_DELETED) rdtgroup_kn_lock_live atomic_inc(&rdtgrp->waitcount) mutex_lock rdtgroup_rmdir_ctrl free_all_child_rdtgrp /* * sentry is freed without * checking refcount */ kfree(sentry)*[3] rdtgroup_ctrl_remove rdtgrp->flags = RDT_DELETED rdtgroup_kn_unlock mutex_unlock atomic_dec_and_test( &rdtgrp->waitcount) && (flags & RDT_DELETED) kfree(rdtgrp) /* * Callback is scheduled to execute * after rdt_kill_sb is finished */ move_myself /* * Use-after-free: refer to earlier rdtgrp * memory which was freed in [3]. */ atomic_dec_and_test(&rdtgrp->waitcount) && (flags & RDT_DELETED) kfree(rdtgrp) If CONFIG_DEBUG_SLAB=y, Slab corruption on kmalloc-2k can be observed like following. Note that "0x6b" is POISON_FREE after kfree(). The corrupted bits "0x6a", "0x64" at offset 0x424 correspond to waitcount member of struct rdtgroup which was freed: Slab corruption (Not tainted): kmalloc-2k start=ffff9504c5b0d000, len=2048 420: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkjkkkkkkkkkkk Single bit error detected. Probably bad RAM. Run memtest86+ or a similar memory test tool. Next obj: start=ffff9504c5b0d800, len=2048 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Slab corruption (Not tainted): kmalloc-2k start=ffff9504c58ab800, len=2048 420: 6b 6b 6b 6b 64 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkdkkkkkkkkkkk Prev obj: start=ffff9504c58ab000, len=2048 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Fix this by taking reference count (waitcount) of rdtgrp into account in the two call paths that currently do not do so. Instead of always freeing the resource group it will only be freed if there are no waiters on it. If there are waiters, the resource group will have its flags set to RDT_DELETED. It will be left to the waiter to free the resource group when it starts running and finding that it was the last waiter and the resource group has been removed (rdtgrp->flags & RDT_DELETED) since. (1) rdt_kill_sb() -> rmdir_all_sub() -> free_all_child_rdtgrp() (2) rdtgroup_rmdir() -> rdtgroup_rmdir_ctrl() -> free_all_child_rdtgrp() Fixes: f3cbeacaa06e ("x86/intel_rdt/cqm: Add rmdir support") Fixes: 60cf5e101fd4 ("x86/intel_rdt: Add mkdir to resctrl file system") Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1578500886-21771-2-git-send-email-xiaochen.shen@intel.com
2020-01-20x86/cpu: Remove redundant cpu_detect_cache_sizes() callTony W Wang-oc2-4/+0
Both functions call init_intel_cacheinfo() which computes L2 and L3 cache sizes from CPUID(4). But then they also call cpu_detect_cache_sizes() a bit later which computes ->x86_tlbsize and L2 size from CPUID(80000006). However, the latter call is not needed because - on these CPUs, CPUID(80000006).EBX for ->x86_tlbsize is reserved - CPUID(80000006).ECX for the L2 size has the same result as CPUID(4) Therefore, remove the latter call to simplify the code. [ bp: Rewrite commit message. ] Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/1579075257-6985-1-git-send-email-TonyWWang-oc@zhaoxin.com
2020-01-20x86/resctrl: Add task resctrl information displayChen Yu1-0/+86
Monitoring tools that want to find out which resctrl control and monitor groups a task belongs to must currently read the "tasks" file in every group until they locate the process ID. Add an additional file /proc/{pid}/cpu_resctrl_groups to provide this information: 1) res: mon: resctrl is not available. 2) res:/ mon: Task is part of the root resctrl control group, and it is not associated to any monitor group. 3) res:/ mon:mon0 Task is part of the root resctrl control group and monitor group mon0. 4) res:group0 mon: Task is part of resctrl control group group0, and it is not associated to any monitor group. 5) res:group0 mon:mon1 Task is part of resctrl control group group0 and monitor group mon1. Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Jinshi Chen <jinshi.chen@intel.com> Link: https://lkml.kernel.org/r/20200115092851.14761-1-yu.c.chen@intel.com
2020-01-20Merge tag 'v5.5-rc7' into locking/kcsan, to refresh the treeIngo Molnar4-10/+11
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-20Merge tag 'v5.5-rc7' into efi/core, to pick up fixesIngo Molnar4-10/+11
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-18Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds3-6/+6
Pull x86 fixes from Ingo Molnar: "Misc fixes: - a resctrl fix for uninitialized objects found by debugobjects - a resctrl memory leak fix - fix the unintended re-enabling of the of SME and SEV CPU flags if memory encryption was disabled at bootup via the MSR space" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/CPU/AMD: Ensure clearing of SME/SEV features is maintained x86/resctrl: Fix potential memory leak x86/resctrl: Fix an imbalance in domain_remove_cpu()
2020-01-17x86/resctrl: Check monitoring static key in the MBM overflow handlerXiaochen Shen2-2/+3
Currently, there are three static keys in the resctrl file system: rdt_mon_enable_key and rdt_alloc_enable_key indicate if the monitoring feature and the allocation feature are enabled, respectively. The rdt_enable_key is enabled when either the monitoring feature or the allocation feature is enabled. If no monitoring feature is present (either hardware doesn't support a monitoring feature or the feature is disabled by the kernel command line option "rdt="), rdt_enable_key is still enabled but rdt_mon_enable_key is disabled. MBM is a monitoring feature. The MBM overflow handler intends to check if the monitoring feature is not enabled for fast return. So check the rdt_mon_enable_key in it instead of the rdt_enable_key as former is the more accurate check. [ bp: Massage commit message. ] Fixes: e33026831bdb ("x86/intel_rdt/mbm: Handle counter overflow") Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/1576094705-13660-1-git-send-email-xiaochen.shen@intel.com
2020-01-17x86/speculation/swapgs: Exclude Zhaoxin CPUs from SWAPGS vulnerabilityTony W Wang-oc1-2/+2
New Zhaoxin family 7 CPUs are not affected by the SWAPGS vulnerability. So mark these CPUs in the cpu vulnerability whitelist accordingly. Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/1579227872-26972-3-git-send-email-TonyWWang-oc@zhaoxin.com
2020-01-17x86/speculation/spectre_v2: Exclude Zhaoxin CPUs from SPECTRE_V2Tony W Wang-oc1-1/+8
New Zhaoxin family 7 CPUs are not affected by SPECTRE_V2. So define a separate cpu_vuln_whitelist bit NO_SPECTRE_V2 and add these CPUs to the cpu vulnerability whitelist. Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/1579227872-26972-2-git-send-email-TonyWWang-oc@zhaoxin.com
2020-01-17x86/cpu: Update cached HLE state on write to TSX_CTRL_CPUID_CLEARPawan Gupta1-6/+7
/proc/cpuinfo currently reports Hardware Lock Elision (HLE) feature to be present on boot cpu even if it was disabled during the bootup. This is because cpuinfo_x86->x86_capability HLE bit is not updated after TSX state is changed via the new MSR IA32_TSX_CTRL. Update the cached HLE bit also since it is expected to change after an update to CPUID_CLEAR bit in MSR IA32_TSX_CTRL. Fixes: 95c5824f75f3 ("x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default") Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Neelima Krishnan <neelima.krishnan@intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/2529b99546294c893dfa1c89e2b3e46da3369a59.1578685425.git.pawan.kumar.gupta@linux.intel.com
2020-01-16x86/CPU/AMD: Ensure clearing of SME/SEV features is maintainedTom Lendacky1-2/+2
If the SME and SEV features are present via CPUID, but memory encryption support is not enabled (MSR 0xC001_0010[23]), the feature flags are cleared using clear_cpu_cap(). However, if get_cpu_cap() is later called, these feature flags will be reset back to present, which is not desired. Change from using clear_cpu_cap() to setup_clear_cpu_cap() so that the clearing of the flags is maintained. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> # 4.16.x- Link: https://lkml.kernel.org/r/226de90a703c3c0be5a49565047905ac4e94e8f3.1579125915.git.thomas.lendacky@amd.com
2020-01-16x86/MCE/AMD, EDAC/mce_amd: Add new Load Store unit McaTypeYazen Ghannam1-0/+2
Add support for a new version of the Load Store unit bank type as indicated by its McaType value, which will be present in future SMCA systems. Add the new (HWID, MCATYPE) tuple. Reuse the same name, since this is logically the same to the user. Also, add the new error descriptions to edac_mce_amd. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200110015651.14887-2-Yazen.Ghannam@amd.com
2020-01-15x86/cpu: Print "VMX disabled" error message iff KVM is enabledSean Christopherson1-2/+3
Don't print an error message about VMX being disabled by BIOS if KVM, the sole user of VMX, is disabled. E.g. if KVM is disabled and the MSR is unlocked, the kernel will intentionally disable VMX when locking feature control and then complain that "BIOS" disabled VMX. Fixes: ef4d3bf19855 ("x86/cpu: Clear VMX feature flag if VMX is not fully enabled") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200114202545.20296-1-sean.j.christopherson@intel.com
2020-01-15x86/mce/therm_throt: Do not access uninitialized therm_workChuansheng Liu1-4/+5
It is relatively easy to trigger the following boot splat on an Ice Lake client platform. The call stack is like: kernel BUG at kernel/timer/timer.c:1152! Call Trace: __queue_delayed_work queue_delayed_work_on therm_throt_process intel_thermal_interrupt ... The reason is that a CPU's thermal interrupt is enabled prior to executing its hotplug onlining callback which will initialize the throttling workqueues. Such a race can lead to therm_throt_process() accessing an uninitialized therm_work, leading to the above BUG at a very early bootup stage. Therefore, unmask the thermal interrupt vector only after having setup the workqueues completely. [ bp: Heavily massage commit message and correct comment formatting. ] Fixes: f6656208f04e ("x86/mce/therm_throt: Optimize notifications of thermal throttle") Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200107004116.59353-1-chuansheng.liu@intel.com
2020-01-13x86/cpufeatures: Add flag to track whether MSR IA32_FEAT_CTL is configuredSean Christopherson1-0/+2
Add a new feature flag, X86_FEATURE_MSR_IA32_FEAT_CTL, to track whether IA32_FEAT_CTL has been initialized. This will allow KVM, and any future subsystems that depend on IA32_FEAT_CTL, to rely purely on cpufeatures to query platform support, e.g. allows a future patch to remove KVM's manual IA32_FEAT_CTL MSR checks. Various features (on platforms that support IA32_FEAT_CTL) are dependent on IA32_FEAT_CTL being configured and locked, e.g. VMX and LMCE. The MSR is always configured during boot, but only if the CPU vendor is recognized by the kernel. Because CPUID doesn't incorporate the current IA32_FEAT_CTL value in its reporting of relevant features, it's possible for a feature to be reported as supported in cpufeatures but not truly enabled, e.g. if the CPU supports VMX but the kernel doesn't recognize the CPU. As a result, without the flag, KVM would see VMX as supported even if IA32_FEAT_CTL hasn't been initialized, and so would need to manually read the MSR and check the various enabling bits to avoid taking an unexpected #GP on VMXON. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20191221044513.21680-14-sean.j.christopherson@intel.com
2020-01-13x86/cpu: Set synthetic VMX cpufeatures during init_ia32_feat_ctl()Sean Christopherson4-119/+14
Set the synthetic VMX cpufeatures, which need to be kept to preserve /proc/cpuinfo's ABI, in the common IA32_FEAT_CTL initialization code. Remove the vendor code that manually sets the synthetic flags. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20191221044513.21680-13-sean.j.christopherson@intel.com
2020-01-13x86/cpu: Print VMX flags in /proc/cpuinfo using VMX_FEATURES_*Sean Christopherson3-6/+29
Add support for generating VMX feature names in capflags.c and use the resulting x86_vmx_flags to print the VMX flags in /proc/cpuinfo. Don't print VMX flags if no bits are set in word 0, which holds Pin Controls. Pin Control's INTR and NMI exiting are fundamental pillars of VMX, if they are not supported then the CPU is broken, it does not actually support VMX, or the kernel wasn't built with support for the target CPU. Print the features in a dedicated "vmx flags" line to avoid polluting the common "flags" and to avoid having to prefix all flags with "vmx_", which results in horrendously long names. Keep synthetic VMX flags in cpufeatures to preserve /proc/cpuinfo's ABI for those flags. This means that "flags" and "vmx flags" will have duplicate entries for tpr_shadow (virtual_tpr), vnmi, ept, flexpriority, vpid and ept_ad, but caps the pollution of "flags" at those six VMX features. The vendor-specific code that populates the synthetic flags will be consolidated in a future patch to further minimize the lasting damage. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20191221044513.21680-12-sean.j.christopherson@intel.com
2020-01-13x86/cpu: Detect VMX features on Intel, Centaur and Zhaoxin CPUsSean Christopherson2-0/+77
Add an entry in struct cpuinfo_x86 to track VMX capabilities and fill the capabilities during IA32_FEAT_CTL MSR initialization. Make the VMX capabilities dependent on IA32_FEAT_CTL and X86_FEATURE_NAMES so as to avoid unnecessary overhead on CPUs that can't possibly support VMX, or when /proc/cpuinfo is not available. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20191221044513.21680-11-sean.j.christopherson@intel.com
2020-01-13x86/cpu: Clear VMX feature flag if VMX is not fully enabledSean Christopherson1-3/+20
Now that IA32_FEAT_CTL is always configured and locked for CPUs that are known to support VMX[*], clear the VMX capability flag if the MSR is unsupported or BIOS disabled VMX, i.e. locked IA32_FEAT_CTL and didn't set the appropriate VMX enable bit. [*] Because init_ia32_feat_ctl() is called from vendors ->c_init(), it's still possible for IA32_FEAT_CTL to be left unlocked when VMX is supported by the CPU. This is not fatal, and will be addressed in a future patch. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20191221044513.21680-9-sean.j.christopherson@intel.com
2020-01-13x86/zhaoxin: Use common IA32_FEAT_CTL MSR initializationSean Christopherson1-0/+2
Use the recently added IA32_FEAT_CTL MSR initialization sequence to opportunistically enable VMX support when running on a Zhaoxin CPU. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20191221044513.21680-8-sean.j.christopherson@intel.com
2020-01-13x86/centaur: Use common IA32_FEAT_CTL MSR initializationSean Christopherson1-0/+2
Use the recently added IA32_FEAT_CTL MSR initialization sequence to opportunistically enable VMX support when running on a Centaur CPU. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20191221044513.21680-7-sean.j.christopherson@intel.com
2020-01-13x86/mce: WARN once if IA32_FEAT_CTL MSR is left unlockedSean Christopherson1-5/+6
WARN if the IA32_FEAT_CTL MSR is somehow left unlocked now that CPU initialization unconditionally locks the MSR. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20191221044513.21680-6-sean.j.christopherson@intel.com
2020-01-13x86/intel: Initialize IA32_FEAT_CTL MSR at bootSean Christopherson4-0/+44
Opportunistically initialize IA32_FEAT_CTL to enable VMX when the MSR is left unlocked by BIOS. Configuring feature control at boot time paves the way for similar enabling of other features, e.g. Software Guard Extensions (SGX). Temporarily leave equivalent KVM code in place in order to avoid introducing a regression on Centaur and Zhaoxin CPUs, e.g. removing KVM's code would leave the MSR unlocked on those CPUs and would break existing functionality if people are loading kvm_intel on Centaur and/or Zhaoxin. Defer enablement of the boot-time configuration on Centaur and Zhaoxin to future patches to aid bisection. Note, Local Machine Check Exceptions (LMCE) are also supported by the kernel and enabled via feature control, but the kernel currently uses LMCE if and only if the feature is explicitly enabled by BIOS. Keep the current behavior to avoid introducing bugs, future patches can opt in to opportunistic enabling if it's deemed desirable to do so. Always lock IA32_FEAT_CTL if it exists, even if the CPU doesn't support VMX, so that other existing and future kernel code that queries the MSR can assume it's locked. Start from a clean slate when constructing the value to write to IA32_FEAT_CTL, i.e. ignore whatever value BIOS left in the MSR so as not to enable random features or fault on the WRMSR. Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20191221044513.21680-5-sean.j.christopherson@intel.com
2020-01-13x86/msr-index: Clean up bit defines for IA32_FEATURE_CONTROL MSRSean Christopherson1-5/+5
As pointed out by Boris, the defines for bits in IA32_FEATURE_CONTROL are quite a mouthful, especially the VMX bits which must differentiate between enabling VMX inside and outside SMX (TXT) operation. Rename the MSR and its bit defines to abbreviate FEATURE_CONTROL as FEAT_CTL to make them a little friendlier on the eyes. Arguably, the MSR itself should keep the full IA32_FEATURE_CONTROL name to match Intel's SDM, but a future patch will add a dedicated Kconfig, file and functions for the MSR. Using the full name for those assets is rather unwieldy, so bite the bullet and use IA32_FEAT_CTL so that its nomenclature is consistent throughout the kernel. Opportunistically, fix a few other annoyances with the defines: - Relocate the bit defines so that they immediately follow the MSR define, e.g. aren't mistaken as belonging to MISC_FEATURE_CONTROL. - Add whitespace around the block of feature control defines to make it clear they're all related. - Use BIT() instead of manually encoding the bit shift. - Use "VMX" instead of "VMXON" to match the SDM. - Append "_ENABLED" to the LMCE (Local Machine Check Exception) bit to be consistent with the kernel's verbiage used for all other feature control bits. Note, the SDM refers to the LMCE bit as LMCE_ON, likely to differentiate it from IA32_MCG_EXT_CTL.LMCE_EN. Ignore the (literal) one-off usage of _ON, the SDM is simply "wrong". Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20191221044513.21680-2-sean.j.christopherson@intel.com
2020-01-13x86/resctrl: Do not reconfigure exiting tasksXiaochen Shen1-0/+4
When writing a pid to file "tasks", a callback function move_myself() is queued to this task to be called when the task returns from kernel mode or exits. The purpose of move_myself() is to activate the newly assigned closid and/or rmid associated with this task. This activation is done by calling resctrl_sched_in() from move_myself(), the same function that is called when switching to this task. If this work is successfully queued but then the task enters PF_EXITING status (e.g., receiving signal SIGKILL, SIGTERM) prior to the execution of the callback move_myself(), move_myself() still calls resctrl_sched_in() since the task status is not currently considered. When a task is exiting, the data structure of the task itself will be freed soon. Calling resctrl_sched_in() to write the register that controls the task's resources is unnecessary and it implies extra performance overhead. Add check on task status in move_myself() and return immediately if the task is PF_EXITING. [ bp: Massage. ] Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lkml.kernel.org/r/1578500026-21152-1-git-send-email-xiaochen.shen@intel.com
2020-01-13x86/mce: Fix use of uninitialized MCE message stringJan H. Schönherr1-2/+2
The function mce_severity() is not required to update its msg argument. In fact, mce_severity_amd() does not, which makes mce_no_way_out() return uninitialized data, which may be used later for printing. Assuming that implementations of mce_severity() either always or never update the msg argument (which is currently the case), it is sufficient to initialize the temporary variable in mce_no_way_out(). While at it, avoid printing a useless "Unknown". Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200103150722.20313-4-jschoenh@amazon.de
2020-01-13x86/mce: Fix mce=nobootlogJan H. Schönherr1-13/+9
Since commit 8b38937b7ab5 ("x86/mce: Do not enter deferred errors into the generic pool twice") the mce=nobootlog option has become mostly ineffective (after being only slightly ineffective before), as the code is taking actions on MCEs left over from boot when they have a usable address. Move the check for MCP_DONTLOG a bit outward to make it effective again. Also, since commit 011d82611172 ("RAS: Add a Corrected Errors Collector") the two branches of the remaining "if" at the bottom of machine_check_poll() do same. Unify them. Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200103150722.20313-3-jschoenh@amazon.de
2020-01-13x86/mce: Take action on UCNA/Deferred errors againJan H. Schönherr1-15/+16
Commit fa92c5869426 ("x86, mce: Support memory error recovery for both UCNA and Deferred error in machine_check_poll") added handling of UCNA and Deferred errors by adding them to the ring for SRAO errors. Later, commit fd4cf79fcc4b ("x86/mce: Remove the MCE ring for Action Optional errors") switched storage from the SRAO ring to the unified pool that is still in use today. In order to only act on the intended errors, a filter for MCE_AO_SEVERITY is used -- effectively removing handling of UCNA/Deferred errors again. Extend the severity filter to include UCNA/Deferred errors again. Also, generalize the naming of the notifier from SRAO to UC to capture the extended scope. Note, that this change may cause a message like the following to appear, as the same address may be reported as SRAO and as UCNA: Memory failure: 0x5fe3284: already hardware poisoned Technically, this is a return to previous behavior. Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200103150722.20313-2-jschoenh@amazon.de
2020-01-10Merge branch 'x86/mm' into efi/core, to pick up dependenciesIngo Molnar5-5/+5
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-09x86/cpu: Add a missing prototype for arch_smt_update()Benjamin Thiel1-0/+1
.. in order to fix a -Wmissing-prototype warning. No functional change. Signed-off-by: Benjamin Thiel <b.thiel@posteo.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200109121723.8151-1-b.thiel@posteo.de
2020-01-02x86/resctrl: Fix potential memory leakShakeel Butt1-3/+3
set_cache_qos_cfg() is leaking memory when the given level is not RDT_RESOURCE_L3 or RDT_RESOURCE_L2. At the moment, this function is called with only valid levels but move the allocation after the valid level checks in order to make it more robust and future proof. [ bp: Massage commit message. ] Fixes: 99adde9b370de ("x86/intel_rdt: Enable L2 CDP in MSR IA32_L2_QOS_CFG") Signed-off-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20200102165844.133133-1-shakeelb@google.com
2019-12-30x86/resctrl: Fix an imbalance in domain_remove_cpu()Qian Cai1-1/+1
A system that supports resource monitoring may have multiple resources while not all of these resources are capable of monitoring. Monitoring related state is initialized only for resources that are capable of monitoring and correspondingly this state should subsequently only be removed from these resources that are capable of monitoring. domain_add_cpu() calls domain_setup_mon_state() only when r->mon_capable is true where it will initialize d->mbm_over. However, domain_remove_cpu() calls cancel_delayed_work(&d->mbm_over) without checking r->mon_capable resulting in an attempt to cancel d->mbm_over on all resources, even those that never initialized d->mbm_over because they are not capable of monitoring. Hence, it triggers a debugobjects warning when offlining CPUs because those timer debugobjects are never initialized: ODEBUG: assert_init not available (active state 0) object type: timer_list hint: 0x0 WARNING: CPU: 143 PID: 789 at lib/debugobjects.c:484 debug_print_object Hardware name: HP Synergy 680 Gen9/Synergy 680 Gen9 Compute Module, BIOS I40 05/23/2018 RIP: 0010:debug_print_object Call Trace: debug_object_assert_init del_timer try_to_grab_pending cancel_delayed_work resctrl_offline_cpu cpuhp_invoke_callback cpuhp_thread_fun smpboot_thread_fn kthread ret_from_fork Fixes: e33026831bdb ("x86/intel_rdt/mbm: Handle counter overflow") Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Reinette Chatre <reinette.chatre@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: john.stultz@linaro.org Cc: sboyd@kernel.org Cc: <stable@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: tj@kernel.org Cc: Tony Luck <tony.luck@intel.com> Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191211033042.2188-1-cai@lca.pw
2019-12-30Merge tag 'v5.5-rc4' into locking/kcsan, to resolve conflictsIngo Molnar13-184/+489
Conflicts: init/main.c lib/Kconfig.debug Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-17x86/mce: Remove mce_inject_log() in favor of mce_log()Jan H. Schönherr3-13/+2
The mutex in mce_inject_log() became unnecessary with commit 5de97c9f6d85 ("x86/mce: Factor out and deprecate the /dev/mcelog driver"), though the original reason for its presence only vanished with commit 7298f08ea887 ("x86/mcelog: Get rid of RCU remnants"). Drop the mutex. And as that makes mce_inject_log() identical to mce_log(), get rid of the former in favor of the latter. Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191210000733.17979-7-jschoenh@amazon.de
2019-12-17x86/mce: Pass MCE message to mce_panic() on failed kernel recoveryJan H. Schönherr1-1/+1
In commit b2f9d678e28c ("x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception table entries") another call to mce_panic() was introduced. Pass the message of the handled MCE to that instance of mce_panic() as well, as there doesn't seem to be a reason not to. Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191210000733.17979-6-jschoenh@amazon.de
2019-12-17x86/mce/therm_throt: Mark throttle_active_work() as __maybe_unusedArnd Bergmann1-1/+1
throttle_active_work() is only called if CONFIG_SYSFS is set, otherwise we get a harmless warning: arch/x86/kernel/cpu/mce/therm_throt.c:238:13: error: 'throttle_active_work' \ defined but not used [-Werror=unused-function] Mark the function as __maybe_unused to avoid the warning. Fixes: f6656208f04e ("x86/mce/therm_throt: Optimize notifications of thermal throttle") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: bberg@redhat.com Cc: ckellner@redhat.com Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: hdegoede@redhat.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191210203925.3119091-1-arnd@arndb.de
2019-12-17x86/mce: Fix possibly incorrect severity calculation on AMDJan H. Schönherr1-1/+1
The function mce_severity_amd_smca() requires m->bank to be initialized for correct operation. Fix the one case, where mce_severity() is called without doing so. Fixes: 6bda529ec42e ("x86/mce: Grade uncorrected errors for SMCA-enabled systems") Fixes: d28af26faa0b ("x86/MCE: Initialize mce.bank in the case of a fatal error in mce_no_way_out()") Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: <stable@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Link: https://lkml.kernel.org/r/20191210000733.17979-4-jschoenh@amazon.de
2019-12-17x86/MCE/AMD: Allow Reserved types to be overwritten in smca_banks[]Yazen Ghannam1-1/+1
Each logical CPU in Scalable MCA systems controls a unique set of MCA banks in the system. These banks are not shared between CPUs. The bank types and ordering will be the same across CPUs on currently available systems. However, some CPUs may see a bank as Reserved/Read-as-Zero (RAZ) while other CPUs do not. In this case, the bank seen as Reserved on one CPU is assumed to be the same type as the bank seen as a known type on another CPU. In general, this occurs when the hardware represented by the MCA bank is disabled, e.g. disabled memory controllers on certain models, etc. The MCA bank is disabled in the hardware, so there is no possibility of getting an MCA/MCE from it even if it is assumed to have a known type. For example: Full system: Bank | Type seen on CPU0 | Type seen on CPU1 ------------------------------------------------ 0 | LS | LS 1 | UMC | UMC 2 | CS | CS System with hardware disabled: Bank | Type seen on CPU0 | Type seen on CPU1 ------------------------------------------------ 0 | LS | LS 1 | UMC | RAZ 2 | CS | CS For this reason, there is a single, global struct smca_banks[] that is initialized at boot time. This array is initialized on each CPU as it comes online. However, the array will not be updated if an entry already exists. This works as expected when the first CPU (usually CPU0) has all possible MCA banks enabled. But if the first CPU has a subset, then it will save a "Reserved" type in smca_banks[]. Successive CPUs will then not be able to update smca_banks[] even if they encounter a known bank type. This may result in unexpected behavior. Depending on the system configuration, a user may observe issues enumerating the MCA thresholding sysfs interface. The issues may be as trivial as sysfs entries not being available, or as severe as system hangs. For example: Bank | Type seen on CPU0 | Type seen on CPU1 ------------------------------------------------ 0 | LS | LS 1 | RAZ | UMC 2 | CS | CS Extend the smca_banks[] entry check to return if the entry is a non-reserved type. Otherwise, continue so that CPUs that encounter a known bank type can update smca_banks[]. Fixes: 68627a697c19 ("x86/mce/AMD, EDAC/mce_amd: Enumerate Reserved SMCA bank type") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: <stable@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191121141508.141273-1-Yazen.Ghannam@amd.com
2019-12-17x86/MCE/AMD: Do not use rdmsr_safe_on_cpu() in smca_configure()Konstantin Khlebnikov1-1/+1
... because interrupts are disabled that early and sending IPIs can deadlock: BUG: sleeping function called from invalid context at kernel/sched/completion.c:99 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1 no locks held by swapper/1/0. irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<ffffffff8106dda9>] copy_process+0x8b9/0x1ca0 softirqs last enabled at (0): [<ffffffff8106dda9>] copy_process+0x8b9/0x1ca0 softirqs last disabled at (0): [<0000000000000000>] 0x0 Preemption disabled at: [<ffffffff8104703b>] start_secondary+0x3b/0x190 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.5.0-rc2+ #1 Hardware name: GIGABYTE MZ01-CE1-00/MZ01-CE1-00, BIOS F02 08/29/2018 Call Trace: dump_stack ___might_sleep.cold.92 wait_for_completion ? generic_exec_single rdmsr_safe_on_cpu ? wrmsr_on_cpus mce_amd_feature_init mcheck_cpu_init identify_cpu identify_secondary_cpu smp_store_cpu_info start_secondary secondary_startup_64 The function smca_configure() is called only on the current CPU anyway, therefore replace rdmsr_safe_on_cpu() with atomic rdmsr_safe() and avoid the IPI. [ bp: Update commit message. ] Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: <stable@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/157252708836.3876.4604398213417262402.stgit@buzz
2019-12-15x86/cpu/tsx: Define pr_fmt()Borislav Petkov1-1/+4
... so that all current and future pr_* statements in this file have the proper prefix. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: x86@kernel.org Link: https://lkml.kernel.org/r/20191112221823.19677-2-bp@alien8.de
2019-12-14x86/bugs: Move enum taa_mitigations to bugs.cBorislav Petkov1-0/+7
... because it is used only there. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: x86@kernel.org Link: https://lkml.kernel.org/r/20191112221823.19677-1-bp@alien8.de
2019-12-10x86/mm/pat: Rename <asm/pat.h> => <asm/memtype.h>Ingo Molnar5-5/+5
pat.h is a file whose main purpose is to provide the memtype_*() APIs. PAT is the low level hardware mechanism - but the high level abstraction is memtype. So name the header <memtype.h> as well - this goes hand in hand with memtype.c and memtype_interval.c. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-09x86/mtrr: Require CAP_SYS_ADMIN for all accessKees Cook1-19/+2
Zhang Xiaoxu noted that physical address locations for MTRR were visible to non-root users, which could be considered an information leak. In discussing[1] the options for solving this, it sounded like just moving the capable check into open() was the first step. If this breaks userspace, then we will have a test case for the more conservative approaches discussed in the thread. In summary: - MTRR should check capabilities at open time (or retain the checks on the opener's permissions for later checks). - changing the DAC permissions might break something that expects to open mtrr when not uid 0. - if we leave the DAC permissions alone and just move the capable check to the opener, we should get the desired protection. (i.e. check against CAP_SYS_ADMIN not just the wider uid 0.) - if that still breaks things, as in userspace expects to be able to read other parts of the file as non-uid-0 and non-CAP_SYS_ADMIN, then we need to censor the contents using the opener's permissions. For example, as done in other /proc cases, like commit 51d7b120418e ("/proc/iomem: only expose physical resource addresses to privileged users"). [1] https://lore.kernel.org/lkml/201911110934.AC5BA313@keescook/ Reported-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: James Morris <jamorris@linux.microsoft.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-security-module@vger.kernel.org Cc: Matthew Garrett <mjg59@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: x86-ml <x86@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/201911181308.63F06502A1@keescook
2019-12-09x86/mtrr: Get rid of mtrr_seq_show() forward declarationBorislav Petkov1-22/+20
... by moving the function up in the file. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: x86@kernel.org Link: https://lkml.kernel.org/r/20191108200815.24589-1-bp@alien8.de
2019-12-01Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-10/+2
Pull x86 fixes from Ingo Molnar: "Various fixes: - Fix the PAT performance regression that downgraded write-combining device memory regions to uncached. - There's been a number of bugs in 32-bit double fault handling - hopefully all fixed now. - Fix an LDT crash - Fix an FPU over-optimization that broke with GCC9 code optimizations. - Misc cleanups" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm/pat: Fix off-by-one bugs in interval tree search x86/ioperm: Save an indentation level in tss_update_io_bitmap() x86/fpu: Don't cache access to fpu_fpregs_owner_ctx x86/entry/32: Remove unused 'restore_all_notrace' local label x86/ptrace: Document FSBASE and GSBASE ABI oddities x86/ptrace: Remove set_segment_reg() implementations for current x86/traps: die() instead of panicking on a double fault x86/doublefault/32: Rewrite the x86_32 #DF handler and unify with 64-bit x86/doublefault/32: Move #DF stack and TSS to cpu_entry_area x86/doublefault/32: Rename doublefault.c to doublefault_32.c x86/traps: Disentangle the 32-bit and 64-bit doublefault code lkdtm: Add a DOUBLE_FAULT crash type on x86 selftests/x86/single_step_syscall: Check SYSENTER directly x86/mm/32: Sync only to VMALLOC_END in vmalloc_sync_all()