| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
| |
Add AMX-FP16 support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
|
|
|
| |
Add WRMSRNS support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
|
|
|
|
| |
Add Architectural Performance Monitoring Extended Leaf (EAX = 23H)
support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
|
|
|
| |
Add CMPCCXADD support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
|
|
|
| |
Add Linear Address Space Separation (LASS) support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
|
|
|
| |
Add RAO-INT support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
|
|
|
| |
Add architectural LBR support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
|
|
|
| |
Add RTM_FORCE_ABORT support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
|
|
|
| |
Add SGX-KEYS support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
|
|
|
|
| |
Add Bus lock debug exceptions (BUS_LOCK_DETECT) support to
<sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
|
|
|
|
| |
Add 57-bit linear addresses and five-level paging (LA57) support to
<sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
|
|
|
|
| |
Rename x86_cpu_INDEX_7_ECX_1 to x86_cpu_INDEX_7_ECX_15 for the unused bit
15 in ECX from CPUID with EAX == 0x7 and ECX == 0.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
|
|
| |
Signed-off-by: John David Anglin <dave.anglin@bell.net>
|
|
|
|
|
|
| |
Handle both 32 and 64-bit ABIs.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
|
|
|
|
|
| |
Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
Message-Id: <20230318095826.1125734-4-gfleury@disroot.org>
|
|
|
|
|
|
|
|
|
|
|
| |
sysdeps/mach/hurd/htl/pt-pthread_self.c: New file.
htl/Makefile: .. Add it to libc routine.
sysdeps/mach/hurd/htl/pt-sysdep.c(__pthread_self): Remove it.
sysdeps/mach/hurd/htl/pt-sysdep.h(__pthread_self): Add hidden propertie.
htl/Versions(__pthread_self) Version it as private symbol.
Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
Message-Id: <20230318095826.1125734-3-gfleury@disroot.org>
|
|
|
|
| |
Also replace an unreachable assert with __builtin_unreachable.
|
|
|
|
|
|
|
|
| |
Similar to fb95c316382679c0826cc8399760977cd95f15c9, also disable
for string-ppc64.c (pulled on rltd as the default string
implementation).
Checked on powerpc64-linux-gnu.
|
|
|
|
|
|
| |
As indicated by sparc kernel-features.h, even though sparc64 defines
__NR_pause, it is not supported (ENOSYS). Always use ppoll or the
64 bit time_t variant instead.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The error handling is moved to sysdeps/ieee754 version with no SVID
support. The compatibility symbol versions still use the wrapper
with SVID error handling around the new code. There is no new symbol
version nor compatibility code on !LIBM_SVID_COMPAT targets
(e.g. riscv).
The ia64 is unchanged, since it still uses the arch specific
__libm_error_region on its implementation. For both i686 and m68k,
which provive arch specific implementation, wrappers are added so
no new symbol are added (which would require to change the
implementations).
It shows an small improvement, the results for fmod:
Architecture | Input | master | patch
-----------------|-----------------|----------|--------
x86_64 (Ryzen 9) | subnormals | 12.5049 | 9.40992
x86_64 (Ryzen 9) | normal | 296.939 | 296.738
x86_64 (Ryzen 9) | close-exponents | 16.0244 | 13.119
aarch64 (N1) | subnormal | 6.81778 | 4.33313
aarch64 (N1) | normal | 155.620 | 152.915
aarch64 (N1) | close-exponents | 8.21306 | 5.76138
armhf (N1) | subnormal | 15.1083 | 14.5746
armhf (N1) | normal | 244.833 | 241.738
armhf (N1) | close-exponents | 21.8182 | 22.457
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This uses a new algorithm similar to already proposed earlier [1].
With x = mx * 2^ex and y = my * 2^ey (mx, my, ex, ey being integers),
the simplest implementation is:
mx * 2^ex == 2 * mx * 2^(ex - 1)
while (ex > ey)
{
mx *= 2;
--ex;
mx %= my;
}
With mx/my being mantissa of double floating pointer, on each step the
argument reduction can be improved 8 (which is sizeof of uint32_t minus
MANTISSA_WIDTH plus the signal bit):
while (ex > ey)
{
mx << 8;
ex -= 8;
mx %= my;
} */
The implementation uses builtin clz and ctz, along with shifts to
convert hx/hy back to doubles. Different than the original patch,
this path assume modulo/divide operation is slow, so use multiplication
with invert values.
I see the following performance improvements using fmod benchtests
(result only show the 'mean' result):
Architecture | Input | master | patch
-----------------|-----------------|----------|--------
x86_64 (Ryzen 9) | subnormals | 17.2549 | 12.0318
x86_64 (Ryzen 9) | normal | 85.4096 | 49.9641
x86_64 (Ryzen 9) | close-exponents | 19.1072 | 15.8224
aarch64 (N1) | subnormal | 10.2182 | 6.81778
aarch64 (N1) | normal | 60.0616 | 20.3667
aarch64 (N1) | close-exponents | 11.5256 | 8.39685
I also see similar improvements on arm-linux-gnueabihf when running on
the N1 aarch64 chips, where it a lot of soft-fp implementation (for
modulo, and multiplication):
Architecture | Input | master | patch
-----------------|-----------------|----------|--------
armhf (N1) | subnormal | 11.6662 | 10.8955
armhf (N1) | normal | 69.2759 | 34.1524
armhf (N1) | close-exponents | 13.6472 | 18.2131
Instead of using the math_private.h definitions, I used the
math_config.h instead which is used on newer math implementations.
Co-authored-by: kirill <kirill.okhotnikov@gmail.com>
[1] https://sourceware.org/pipermail/libc-alpha/2020-November/119794.html
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This uses a new algorithm similar to already proposed earlier [1].
With x = mx * 2^ex and y = my * 2^ey (mx, my, ex, ey being integers),
the simplest implementation is:
mx * 2^ex == 2 * mx * 2^(ex - 1)
while (ex > ey)
{
mx *= 2;
--ex;
mx %= my;
}
With mx/my being mantissa of double floating pointer, on each step the
argument reduction can be improved 11 (which is sizeo of uint64_t minus
MANTISSA_WIDTH plus the signal bit):
while (ex > ey)
{
mx << 11;
ex -= 11;
mx %= my;
} */
The implementation uses builtin clz and ctz, along with shifts to
convert hx/hy back to doubles. Different than the original patch,
this path assume modulo/divide operation is slow, so use multiplication
with invert values.
I see the following performance improvements using fmod benchtests
(result only show the 'mean' result):
Architecture | Input | master | patch
-----------------|-----------------|----------|--------
x86_64 (Ryzen 9) | subnormals | 19.1584 | 12.5049
x86_64 (Ryzen 9) | normal | 1016.51 | 296.939
x86_64 (Ryzen 9) | close-exponents | 18.4428 | 16.0244
aarch64 (N1) | subnormal | 11.153 | 6.81778
aarch64 (N1) | normal | 528.649 | 155.62
aarch64 (N1) | close-exponents | 11.4517 | 8.21306
I also see similar improvements on arm-linux-gnueabihf when running on
the N1 aarch64 chips, where it a lot of soft-fp implementation (for
modulo, clz, ctz, and multiplication):
Architecture | Input | master | patch
-----------------|-----------------|----------|--------
armhf (N1) | subnormal | 15.908 | 15.1083
armhf (N1) | normal | 837.525 | 244.833
armhf (N1) | close-exponents | 16.2111 | 21.8182
Instead of using the math_private.h definitions, I used the
math_config.h instead which is used on newer math implementations.
Co-authored-by: kirill <kirill.okhotnikov@gmail.com>
[1] https://sourceware.org/pipermail/libc-alpha/2020-November/119794.html
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Linux kernel uses AT_HWCAP2 to indicate if FSGSBASE instructions are
enabled. If the HWCAP2_FSGSBASE bit in AT_HWCAP2 is set, FSGSBASE
instructions can be used in user space. Define dl_check_hwcap2 to set
the FSGSBASE feature to active on Linux when the HWCAP2_FSGSBASE bit is
set.
Add a test to verify that FSGSBASE is active on current kernels.
NB: This test will fail if the kernel doesn't set the HWCAP2_FSGSBASE
bit in AT_HWCAP2 while fsgsbase shows up in /proc/cpuinfo.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
|
|
|
|
|
|
|
|
|
| |
The divss instruction clobbers its first argument, and the constraints
need to reflect that. Fortunately, with GCC 12, generated code does
not actually change, so there is no externally visible bug.
Suggested-by: Jakub Jelinek <jakub@redhat.com>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
|
|
|
| |
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-30-bugaevc@gmail.com>
|
|
|
|
|
| |
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-29-bugaevc@gmail.com>
|
|
|
|
|
| |
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-23-bugaevc@gmail.com>
|
|
|
|
|
|
|
|
| |
Just like the other existing rtld-str* files, this provides rtld with
usable versions of stpncpy and strncpy.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-22-bugaevc@gmail.com>
|
|
|
|
|
|
|
|
| |
The source code is the same as sysdeps/i386/htl/tcb-offsets.sym, but of
course the produced tcb-offsets.h will be different.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-21-bugaevc@gmail.com>
|
|
|
|
|
|
|
| |
These do not need any changes to be used on x86_64.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-20-bugaevc@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is more correct, if only because these fields are defined as having
the type unsigned int in the Mach headers, so casting them to a signed
int and then back is suboptimal.
Also, remove an extra reassignment of uesp -- this is another remnant of
the ecx kludge.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-16-bugaevc@gmail.com>
|
|
|
|
|
|
|
|
| |
There's nothing Mach- or Hurd-specific about it; any port that ends
up with rtld pulling in strncpy will need this.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-15-bugaevc@gmail.com>
|
|
|
|
|
| |
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-13-bugaevc@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
| |
This was used for the value of libc-lock's owner when TLS is not yet set
up, so THREAD_SELF can not be used. Since the value need not be anything
specific -- it just has to be non-NULL -- we can just use a plain
constant, such as (void *) 1, for this. This avoids accessing the symbol
through GOT, and exporting it from libc.so in the first place.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-12-bugaevc@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Noone is or should be using __hurd_threadvar_stack_{offset,mask}, we
have proper TLS now. These two remaining variables are never set to
anything other than zero, so any code that would try to use them as
described would just dereference a zero pointer and crash. So remove
them entirely.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-6-bugaevc@gmail.com>
|
|
|
|
| |
For the next test from cf7ffdd8a5f6da55397e10b3860062944312824c.
|
|
|
|
|
| |
When /proc/self/loginuid is not set, we should still fall back to using
the traditional utmp lookup, instead of failing right away.
|
|
|
|
|
|
|
|
|
|
|
|
| |
And make always supported. The configure option was added on glibc 2.25
and some features require it (such as hwcap mask, huge pages support, and
lock elisition tuning). It also simplifies the build permutations.
Changes from v1:
* Remove glibc.rtld.dynamic_sort changes, it is orthogonal and needs
more discussion.
* Cleanup more code.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Prevent sh from interpreting a user string as shell options if it
starts with '-' or '+'. Since the version of /bin/sh used for testing
system() is different from the full-fledged system /bin/sh add support
to it for handling "--" after "-c". Add a testcase to ensure the
expected behavior.
Signed-off-by: Joe Simmons-Talbott <josimmon@redhat.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
| |
We added Adhemerval Zanella's comment to explain the reason for
using EF_LARCH_OBJABI_V1.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead define the required fields in system dependend files. The only
system dependent definition is FILENAME_MAX, which should match POSIX
PATH_MAX, and it is obtained from either kernel UAPI or mach headers.
Currently set pre-defined value from current kernels.
It avoids a circular dependendy when including stdio.h in
gen-as-const-headers files.
Checked on x86_64-linux-gnu and i686-linux-gnu
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
They are both used by __libc_freeres to free all library malloc
allocated resources to help tooling like mtrace or valgrind with
memory leak tracking.
The current scheme uses assembly markers and linker script entries
to consolidate the free routine function pointers in the RELRO segment
and to be freed buffers in BSS.
This patch changes it to use specific free functions for
libc_freeres_ptrs buffers and call the function pointer array directly
with call_function_static_weak.
It allows the removal of both the internal macros and the linker
script sections.
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
|
|
|
|
|
|
|
| |
This allows other targets to use the same inputs for their own libmvec
microbenchmarks without having to duplicate them in their own
subdirectory.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Binutils 2.40 sets EF_LARCH_OBJABI_V1 for shared objects:
$ ld --version | head -n1
GNU ld (GNU Binutils) 2.40
$ echo 'int dummy;' > dummy.c
$ cc dummy.c -shared
$ readelf -h a.out | grep Flags
Flags: 0x43, DOUBLE-FLOAT, OBJ-v1
We need to ignore it in ldconfig or ldconfig will consider all shared
objects linked by Binutils 2.40 "unsupported". Maybe we should stop
setting EF_LARCH_OBJABI_V1 for shared objects, but Binutils 2.40 is
already released and we cannot change it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Linux threads were removed about 12 years ago and the current
nptl implementation only requires 4-byte alignment for pthread
locks.
The 16-byte alignment causes various issues. For example in
building ignition-msgs, we have:
/usr/include/google/protobuf/map.h:124:37: error: static assertion failed
124 | static_assert(alignof(value_type) <= 8, "");
| ~~~~~~~~~~~~~~~~~~~~^~~~
This is caused by the 16-byte pthread lock alignment.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
|
|
|
|
|
|
|
| |
Don't check PREFETCHWT1 against /proc/cpuinfo since kernel doesn't report
PREFETCHWT1 in /proc/cpuinfo.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
|
|
|
|
|
|
|
| |
For better debug experience use separate code block with extra
cfi_* directives to run child (same as in __clone3).
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Use the clone3 wrapper on ARC. It doesn't care about stack alignment.
All callers should provide an aligned stack.
It follows the internal signature:
extern int clone3 (struct clone_args *__cl_args, size_t __size,
int (*__func) (void *__arg), void *__arg);
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
|
| |
|
| |
|