<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-dev/arch/x86/kernel/fpu, branch master</title>
<subtitle>Linux kernel development work - see feature branches</subtitle>
<id>https://git.zx2c4.com/linux-dev/atom/arch/x86/kernel/fpu?h=master</id>
<link rel='self' href='https://git.zx2c4.com/linux-dev/atom/arch/x86/kernel/fpu?h=master'/>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/'/>
<updated>2022-10-21T22:22:09Z</updated>
<entry>
<title>x86/fpu: Fix copy_xstate_to_uabi() to copy init states correctly</title>
<updated>2022-10-21T22:22:09Z</updated>
<author>
<name>Chang S. Bae</name>
<email>chang.seok.bae@intel.com</email>
</author>
<published>2022-10-21T18:58:44Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=471f0aa7fa64e23766a1473b32d9ec3f0718895a'/>
<id>urn:sha1:471f0aa7fa64e23766a1473b32d9ec3f0718895a</id>
<content type='text'>
When an extended state component is not present in fpstate, but in init
state, the function copies from init_fpstate via copy_feature().

But, dynamic states are not present in init_fpstate because of all-zeros
init states. Then retrieving them from init_fpstate will explode like this:

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 ...
 RIP: 0010:memcpy_erms+0x6/0x10
  ? __copy_xstate_to_uabi_buf+0x381/0x870
  fpu_copy_guest_fpstate_to_uabi+0x28/0x80
  kvm_arch_vcpu_ioctl+0x14c/0x1460 [kvm]
  ? __this_cpu_preempt_check+0x13/0x20
  ? vmx_vcpu_put+0x2e/0x260 [kvm_intel]
  kvm_vcpu_ioctl+0xea/0x6b0 [kvm]
  ? kvm_vcpu_ioctl+0xea/0x6b0 [kvm]
  ? __fget_light+0xd4/0x130
  __x64_sys_ioctl+0xe3/0x910
  ? debug_smp_processor_id+0x17/0x20
  ? fpregs_assert_state_consistent+0x27/0x50
  do_syscall_64+0x3f/0x90
  entry_SYSCALL_64_after_hwframe+0x63/0xcd

Adjust the 'mask' to zero out the userspace buffer for the features that
are not available both from fpstate and from init_fpstate.

The dynamic features depend on the compacted XSAVE format. Ensure it is
enabled before reading XCOMP_BV in init_fpstate.

Fixes: 2308ee57d93d ("x86/fpu/amx: Enable the AMX feature in 64-bit mode")
Reported-by: Yuan Yao &lt;yuan.yao@intel.com&gt;
Suggested-by: Dave Hansen &lt;dave.hansen@intel.com&gt;
Signed-off-by: Chang S. Bae &lt;chang.seok.bae@intel.com&gt;
Signed-off-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Tested-by: Yuan Yao &lt;yuan.yao@intel.com&gt;
Link: https://lore.kernel.org/lkml/BYAPR11MB3717EDEF2351C958F2C86EED95259@BYAPR11MB3717.namprd11.prod.outlook.com/
Link: https://lkml.kernel.org/r/20221021185844.13472-1-chang.seok.bae@intel.com
</content>
</entry>
<entry>
<title>x86/fpu: Exclude dynamic states from init_fpstate</title>
<updated>2022-10-17T13:44:25Z</updated>
<author>
<name>Chang S. Bae</name>
<email>chang.seok.bae@intel.com</email>
</author>
<published>2022-08-24T19:12:23Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=a401f45e38754953c9d402f8b3bc965707eecc91'/>
<id>urn:sha1:a401f45e38754953c9d402f8b3bc965707eecc91</id>
<content type='text'>
== Background ==

The XSTATE init code initializes all enabled and supported components.
Then, the init states are saved in the init_fpstate buffer that is
statically allocated in about one page.

The AMX TILE_DATA state is large (8KB) but its init state is zero. And the
feature comes only with the compacted format with these established
dependencies: AMX-&gt;XFD-&gt;XSAVES. So this state is excludable from
init_fpstate.

== Problem ==

But the buffer is formatted to include that large state. Then, this can be
the cause of a noisy splat like the below.

This came from XRSTORS for the task with init_fpstate in its XSAVE buffer.
It is reproducible on AMX systems when the running kernel is built with
CONFIG_DEBUG_PAGEALLOC=y and CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y:

 Bad FPU state detected at restore_fpregs_from_fpstate+0x57/0xd0, reinitializing FPU registers.
 ...
 RIP: 0010:restore_fpregs_from_fpstate+0x57/0xd0
  ? restore_fpregs_from_fpstate+0x45/0xd0
  switch_fpu_return+0x4e/0xe0
  exit_to_user_mode_prepare+0x17b/0x1b0
  syscall_exit_to_user_mode+0x29/0x40
  do_syscall_64+0x67/0x80
  ? do_syscall_64+0x67/0x80
  ? exc_page_fault+0x86/0x180
  entry_SYSCALL_64_after_hwframe+0x63/0xcd

== Solution ==

Adjust init_fpstate to exclude dynamic states. XRSTORS from init_fpstate
still initializes those states when their bits are set in the
requested-feature bitmap.

Fixes: 2308ee57d93d ("x86/fpu/amx: Enable the AMX feature in 64-bit mode")
Reported-by: Lin X Wang &lt;lin.x.wang@intel.com&gt;
Signed-off-by: Chang S. Bae &lt;chang.seok.bae@intel.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Tested-by: Lin X Wang &lt;lin.x.wang@intel.com&gt;
Link: https://lore.kernel.org/r/20220824191223.1248-4-chang.seok.bae@intel.com

</content>
</entry>
<entry>
<title>x86/fpu: Fix the init_fpstate size check with the actual size</title>
<updated>2022-10-17T13:44:25Z</updated>
<author>
<name>Chang S. Bae</name>
<email>chang.seok.bae@intel.com</email>
</author>
<published>2022-08-24T19:12:22Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=d3e021adac7c51a26d9ede167c789fcc1b878467'/>
<id>urn:sha1:d3e021adac7c51a26d9ede167c789fcc1b878467</id>
<content type='text'>
The init_fpstate buffer is statically allocated. Thus, the sanity test was
established to check whether the pre-allocated buffer is enough for the
calculated size or not.

The currently measured size is not strictly relevant. Fix to validate the
calculated init_fpstate size with the pre-allocated area.

Also, replace the sanity check function with open code for clarity. The
abstraction itself and the function naming do not tend to represent simply
what it does.

Fixes: 2ae996e0c1a3 ("x86/fpu: Calculate the default sizes independently")
Signed-off-by: Chang S. Bae &lt;chang.seok.bae@intel.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lore.kernel.org/r/20220824191223.1248-3-chang.seok.bae@intel.com

</content>
</entry>
<entry>
<title>x86/fpu: Configure init_fpstate attributes orderly</title>
<updated>2022-10-17T13:44:25Z</updated>
<author>
<name>Chang S. Bae</name>
<email>chang.seok.bae@intel.com</email>
</author>
<published>2022-08-24T19:12:21Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=c32d7cab57e3a77af8ecc17cde7a5761a26483b8'/>
<id>urn:sha1:c32d7cab57e3a77af8ecc17cde7a5761a26483b8</id>
<content type='text'>
The init_fpstate setup code is spread out and out of order. The init image
is recorded before its scoped features and the buffer size are determined.

Determine the scope of init_fpstate components and its size before
recording the init state. Also move the relevant code together.

Signed-off-by: Chang S. Bae &lt;chang.seok.bae@intel.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: neelnatu@google.com
Link: https://lore.kernel.org/r/20220824191223.1248-2-chang.seok.bae@intel.com

</content>
</entry>
<entry>
<title>x86/fpu: Add a helper to prepare AMX state for low-power CPU idle</title>
<updated>2022-07-19T16:46:15Z</updated>
<author>
<name>Chang S. Bae</name>
<email>chang.seok.bae@intel.com</email>
</author>
<published>2022-06-08T16:47:47Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=f17b168734c0fe47343a7502d012266a051f9942'/>
<id>urn:sha1:f17b168734c0fe47343a7502d012266a051f9942</id>
<content type='text'>
When a CPU enters an idle state, a non-initialized AMX register state may
be the cause of preventing a deeper low-power state. Other extended
register states whether initialized or not do not impact the CPU idle
state.

The new helper can ensure the AMX state is initialized before the CPU is
idle, and it will be used by the intel idle driver.

Check the AMX_TILE feature bit before using XGETBV1 as a chain of
dependencies was established via cpuid_deps[]: AMX-&gt;XFD-&gt;XGETBV1.

Signed-off-by: Chang S. Bae &lt;chang.seok.bae@intel.com&gt;
Signed-off-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Link: https://lkml.kernel.org/r/20220608164748.11864-2-chang.seok.bae@intel.com
</content>
</entry>
<entry>
<title>Merge tag 'kthread-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace</title>
<updated>2022-06-03T23:03:05Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-06-03T23:03:05Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=1ec6574a3c0a22c130c08e8c36c825cb87d68f8e'/>
<id>urn:sha1:1ec6574a3c0a22c130c08e8c36c825cb87d68f8e</id>
<content type='text'>
Pull kthread updates from Eric Biederman:
 "This updates init and user mode helper tasks to be ordinary user mode
  tasks.

  Commit 40966e316f86 ("kthread: Ensure struct kthread is present for
  all kthreads") caused init and the user mode helper threads that call
  kernel_execve to have struct kthread allocated for them. This struct
  kthread going away during execve in turned made a use after free of
  struct kthread possible.

  Here, commit 343f4c49f243 ("kthread: Don't allocate kthread_struct for
  init and umh") is enough to fix the use after free and is simple
  enough to be backportable.

  The rest of the changes pass struct kernel_clone_args to clean things
  up and cause the code to make sense.

  In making init and the user mode helpers tasks purely user mode tasks
  I ran into two complications. The function task_tick_numa was
  detecting tasks without an mm by testing for the presence of
  PF_KTHREAD. The initramfs code in populate_initrd_image was using
  flush_delayed_fput to ensuere the closing of all it's file descriptors
  was complete, and flush_delayed_fput does not work in a userspace
  thread.

  I have looked and looked and more complications and in my code review
  I have not found any, and neither has anyone else with the code
  sitting in linux-next"

* tag 'kthread-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  sched: Update task_tick_numa to ignore tasks without an mm
  fork: Stop allowing kthreads to call execve
  fork: Explicitly set PF_KTHREAD
  init: Deal with the init process being a user mode process
  fork: Generalize PF_IO_WORKER handling
  fork: Explicity test for idle tasks in copy_thread
  fork: Pass struct kernel_clone_args into copy_thread
  kthread: Don't allocate kthread_struct for init and umh
</content>
</entry>
<entry>
<title>Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm</title>
<updated>2022-05-26T21:20:14Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-05-26T21:20:14Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=bf9095424d027e942e1d1ee74977e17b7df8e455'/>
<id>urn:sha1:bf9095424d027e942e1d1ee74977e17b7df8e455</id>
<content type='text'>
Pull kvm updates from Paolo Bonzini:
 "S390:

   - ultravisor communication device driver

   - fix TEID on terminating storage key ops

  RISC-V:

   - Added Sv57x4 support for G-stage page table

   - Added range based local HFENCE functions

   - Added remote HFENCE functions based on VCPU requests

   - Added ISA extension registers in ONE_REG interface

   - Updated KVM RISC-V maintainers entry to cover selftests support

  ARM:

   - Add support for the ARMv8.6 WFxT extension

   - Guard pages for the EL2 stacks

   - Trap and emulate AArch32 ID registers to hide unsupported features

   - Ability to select and save/restore the set of hypercalls exposed to
     the guest

   - Support for PSCI-initiated suspend in collaboration with userspace

   - GICv3 register-based LPI invalidation support

   - Move host PMU event merging into the vcpu data structure

   - GICv3 ITS save/restore fixes

   - The usual set of small-scale cleanups and fixes

  x86:

   - New ioctls to get/set TSC frequency for a whole VM

   - Allow userspace to opt out of hypercall patching

   - Only do MSR filtering for MSRs accessed by rdmsr/wrmsr

  AMD SEV improvements:

   - Add KVM_EXIT_SHUTDOWN metadata for SEV-ES

   - V_TSC_AUX support

  Nested virtualization improvements for AMD:

   - Support for "nested nested" optimizations (nested vVMLOAD/VMSAVE,
     nested vGIF)

   - Allow AVIC to co-exist with a nested guest running

   - Fixes for LBR virtualizations when a nested guest is running, and
     nested LBR virtualization support

   - PAUSE filtering for nested hypervisors

  Guest support:

   - Decoupling of vcpu_is_preempted from PV spinlocks"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (199 commits)
  KVM: x86: Fix the intel_pt PMI handling wrongly considered from guest
  KVM: selftests: x86: Sync the new name of the test case to .gitignore
  Documentation: kvm: reorder ARM-specific section about KVM_SYSTEM_EVENT_SUSPEND
  x86, kvm: use correct GFP flags for preemption disabled
  KVM: LAPIC: Drop pending LAPIC timer injection when canceling the timer
  x86/kvm: Alloc dummy async #PF token outside of raw spinlock
  KVM: x86: avoid calling x86 emulator without a decoded instruction
  KVM: SVM: Use kzalloc for sev ioctl interfaces to prevent kernel data leak
  x86/fpu: KVM: Set the base guest FPU uABI size to sizeof(struct kvm_xsave)
  s390/uv_uapi: depend on CONFIG_S390
  KVM: selftests: x86: Fix test failure on arch lbr capable platforms
  KVM: LAPIC: Trace LAPIC timer expiration on every vmentry
  KVM: s390: selftest: Test suppression indication on key prot exception
  KVM: s390: Don't indicate suppression on dirtying, failing memop
  selftests: drivers/s390x: Add uvdevice tests
  drivers/s390/char: Add Ultravisor io device
  MAINTAINERS: Update KVM RISC-V entry to cover selftests support
  RISC-V: KVM: Introduce ISA extension register
  RISC-V: KVM: Cleanup stale TLB entries when host CPU changes
  RISC-V: KVM: Add remote HFENCE functions based on VCPU requests
  ...
</content>
</entry>
<entry>
<title>x86/fpu: KVM: Set the base guest FPU uABI size to sizeof(struct kvm_xsave)</title>
<updated>2022-05-25T09:11:37Z</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2022-05-04T00:12:19Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=d187ba5312307d51818beafaad87d28a7d939adf'/>
<id>urn:sha1:d187ba5312307d51818beafaad87d28a7d939adf</id>
<content type='text'>
Set the starting uABI size of KVM's guest FPU to 'struct kvm_xsave',
i.e. to KVM's historical uABI size.  When saving FPU state for usersapce,
KVM (well, now the FPU) sets the FP+SSE bits in the XSAVE header even if
the host doesn't support XSAVE.  Setting the XSAVE header allows the VM
to be migrated to a host that does support XSAVE without the new host
having to handle FPU state that may or may not be compatible with XSAVE.

Setting the uABI size to the host's default size results in out-of-bounds
writes (setting the FP+SSE bits) and data corruption (that is thankfully
caught by KASAN) when running on hosts without XSAVE, e.g. on Core2 CPUs.

WARN if the default size is larger than KVM's historical uABI size; all
features that can push the FPU size beyond the historical size must be
opt-in.

  ==================================================================
  BUG: KASAN: slab-out-of-bounds in fpu_copy_uabi_to_guest_fpstate+0x86/0x130
  Read of size 8 at addr ffff888011e33a00 by task qemu-build/681
  CPU: 1 PID: 681 Comm: qemu-build Not tainted 5.18.0-rc5-KASAN-amd64 #1
  Hardware name:  /DG35EC, BIOS ECG3510M.86A.0118.2010.0113.1426 01/13/2010
  Call Trace:
   &lt;TASK&gt;
   dump_stack_lvl+0x34/0x45
   print_report.cold+0x45/0x575
   kasan_report+0x9b/0xd0
   fpu_copy_uabi_to_guest_fpstate+0x86/0x130
   kvm_arch_vcpu_ioctl+0x72a/0x1c50 [kvm]
   kvm_vcpu_ioctl+0x47f/0x7b0 [kvm]
   __x64_sys_ioctl+0x5de/0xc90
   do_syscall_64+0x31/0x50
   entry_SYSCALL_64_after_hwframe+0x44/0xae
   &lt;/TASK&gt;
  Allocated by task 0:
  (stack is not available)
  The buggy address belongs to the object at ffff888011e33800
   which belongs to the cache kmalloc-512 of size 512
  The buggy address is located 0 bytes to the right of
   512-byte region [ffff888011e33800, ffff888011e33a00)
  The buggy address belongs to the physical page:
  page:0000000089cd4adb refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11e30
  head:0000000089cd4adb order:2 compound_mapcount:0 compound_pincount:0
  flags: 0x4000000000010200(slab|head|zone=1)
  raw: 4000000000010200 dead000000000100 dead000000000122 ffff888001041c80
  raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
  page dumped because: kasan: bad access detected
  Memory state around the buggy address:
   ffff888011e33900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   ffff888011e33980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  &gt;ffff888011e33a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                     ^
   ffff888011e33a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
   ffff888011e33b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ==================================================================
  Disabling lock debugging due to kernel taint

Fixes: be50b2065dfa ("kvm: x86: Add support for getting/setting expanded xstate buffer")
Fixes: c60427dd50ba ("x86/fpu: Add uabi_size to guest_fpu")
Reported-by: Zdenek Kaspar &lt;zkaspar82@gmail.com&gt;
Cc: Maciej S. Szmigiero &lt;mail@maciej.szmigiero.name&gt;
Cc: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: kvm@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
Tested-by: Zdenek Kaspar &lt;zkaspar82@gmail.com&gt;
Message-Id: &lt;20220504001219.983513-1-seanjc@google.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'x86_fpu_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2022-05-24T01:49:16Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-05-24T01:49:16Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=e36ae2290fd1ff3768c3688627e442cd11d4ca1e'/>
<id>urn:sha1:e36ae2290fd1ff3768c3688627e442cd11d4ca1e</id>
<content type='text'>
Pull x86 fpu updates from Borislav Petkov:

 - Add support for XSAVEC - the Compacted XSTATE saving variant - and
   thus allow for guests to use this compacted XSTATE variant when the
   hypervisor exports that support

 - A variable shadowing cleanup

* tag 'x86_fpu_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu: Cleanup variable shadowing
  x86/fpu/xsave: Support XSAVEC in the kernel
</content>
</entry>
<entry>
<title>Merge tag 'x86_cleanups_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2022-05-24T01:17:09Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-05-24T01:17:09Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=a13dc4d40938787cfc9e520ed426df4ebf48b1aa'/>
<id>urn:sha1:a13dc4d40938787cfc9e520ed426df4ebf48b1aa</id>
<content type='text'>
Pull x86 cleanups from Borislav Petkov:

 - Serious sanitization and cleanup of the whole APERF/MPERF and
   frequency invariance code along with removing the need for
   unnecessary IPIs

 - Finally remove a.out support

 - The usual trivial cleanups and fixes all over x86

* tag 'x86_cleanups_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  x86: Remove empty files
  x86/speculation: Add missing srbds=off to the mitigations= help text
  x86/prctl: Remove pointless task argument
  x86/aperfperf: Make it correct on 32bit and UP kernels
  x86/aperfmperf: Integrate the fallback code from show_cpuinfo()
  x86/aperfmperf: Replace arch_freq_get_on_cpu()
  x86/aperfmperf: Replace aperfmperf_get_khz()
  x86/aperfmperf: Store aperf/mperf data for cpu frequency reads
  x86/aperfmperf: Make parts of the frequency invariance code unconditional
  x86/aperfmperf: Restructure arch_scale_freq_tick()
  x86/aperfmperf: Put frequency invariance aperf/mperf data into a struct
  x86/aperfmperf: Untangle Intel and AMD frequency invariance init
  x86/aperfmperf: Separate AP/BP frequency invariance init
  x86/smp: Move APERF/MPERF code where it belongs
  x86/aperfmperf: Dont wake idle CPUs in arch_freq_get_on_cpu()
  x86/process: Fix kernel-doc warning due to a changed function name
  x86: Remove a.out support
  x86/mm: Replace nodes_weight() with nodes_empty() where appropriate
  x86: Replace cpumask_weight() with cpumask_empty() where appropriate
  x86/pkeys: Remove __arch_set_user_pkey_access() declaration
  ...
</content>
</entry>
</feed>
