path: root/lib/libc/arch/amd64/gen
Commit message (author, age, files changed, lines -removed/+added)
* Save and restore the MXCSR register and the FPU control word such that floating-point control modes are properly restored by longjmp(3).
  ok guenther@
  (kettenis, 2020-10-21, 3 files, -3/+15)
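A minimal amd64-only sketch (GCC or Clang, inline assembly plus `<xmmintrin.h>`) of what "save and restore the MXCSR register and the FPU control word" means. It does explicitly what setjmp/longjmp now do internally. `struct fp_modes`, `save_fp_modes()`, `restore_fp_modes()` and `rounding_after_longjmp()` are hypothetical names, not libc's. Restoring only the MXCSR control bits while keeping the current exception flags is an assumption of this sketch.

```c
#include <setjmp.h>
#include <stdint.h>
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr */

/* MXCSR bits 0-5 are sticky exception flags; the bits above are control. */
#define MXCSR_STATUS_MASK 0x003Fu
#define MXCSR_RC_MASK     0x6000u   /* rounding-control field, bits 13-14 */
#define MXCSR_RC_DOWN     0x2000u   /* RC = 01: round toward -infinity */

struct fp_modes {
    uint32_t mxcsr;   /* SSE control/status register */
    uint16_t x87cw;   /* x87 FPU control word */
};

/* Roughly what setjmp records (hypothetical helper). */
static void save_fp_modes(struct fp_modes *m)
{
    m->mxcsr = _mm_getcsr();
    __asm__ volatile("fnstcw %0" : "=m"(m->x87cw));
}

/* Roughly what longjmp reapplies (hypothetical helper). Assumption: keep
 * the exception flags raised so far, restore only the control bits. */
static void restore_fp_modes(const struct fp_modes *m)
{
    uint32_t cur = _mm_getcsr();

    _mm_setcsr((m->mxcsr & ~MXCSR_STATUS_MASK) | (cur & MXCSR_STATUS_MASK));
    __asm__ volatile("fldcw %0" : : "m"(m->x87cw));
}

static jmp_buf demo_env;
static struct fp_modes demo_saved;

/* Change the SSE rounding mode, then unwind with longjmp. */
static void change_rounding_and_jump(void)
{
    _mm_setcsr((_mm_getcsr() & ~MXCSR_RC_MASK) | MXCSR_RC_DOWN);
    longjmp(demo_env, 1);
}

/* Returns the MXCSR rounding field in effect after the longjmp. */
unsigned int rounding_after_longjmp(void)
{
    save_fp_modes(&demo_saved);
    if (setjmp(demo_env) == 0)
        change_rounding_and_jump();
    restore_fp_modes(&demo_saved);
    return _mm_getcsr() & MXCSR_RC_MASK;
}
```

Without the restore step, the round-down mode set inside `change_rounding_and_jump()` would leak past the longjmp.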
* amd64: TSC timecounter: prefix RDTSC with LFENCE
  Regarding RDTSC, the Intel ISA reference says (Vol 2B. 4-545):
  > The RDTSC instruction is not a serializing instruction.
  >
  > It does not necessarily wait until all previous instructions
  > have been executed before reading the counter.
  >
  > Similarly, subsequent instructions may begin execution before the
  > read operation is performed.
  >
  > If software requires RDTSC to be executed only after all previous
  > instructions have completed locally, it can either use RDTSCP (if
  > the processor supports that instruction) or execute the sequence
  > LFENCE;RDTSC.
  To mitigate this problem, Linux and DragonFly use LFENCE. FreeBSD and NetBSD take a more complex route: they selectively use MFENCE, LFENCE, or CPUID depending on whether the CPU is AMD, Intel, VIA or something else.
  Let's start with just LFENCE. We only use the TSC as a timecounter on SSE2 systems so there is no need to conditionally compile the LFENCE. We can explore conditionally using MFENCE later.
  Microbenchmarking on my machine (Core i7-8650) suggests a penalty of about 7-10% over a "naked" RDTSC. This is acceptable. It's a bit of a moot point though: the alternative is a considerably weaker monotonicity guarantee when comparing timestamps between threads, which is not acceptable.
  It's worth noting that kernel timecounting is not *exactly* like userspace timecounting. However, they are similar enough that we can use userspace benchmarks to make conjectures about possible impacts on kernel performance.
  Concerns about kernel performance, in particular the network stack, were the blocking issue for this patch. Regarding networking performance, claudio@ says a 10% slower nanotime(9) or nanouptime(9) is acceptable and that shaving off "tens of cycles" is a micro-optimization. There are bigger optimizations to chase down before such a difference would matter.
  There is additional work to be done here. We could experiment with conditionally using MFENCE. Also, the userspace TSC timecounter doesn't have access to the adjustment skews available to the kernel timecounter. pirofti@ has suggested a scheme involving RDTSCP and an array of skews mapped into user memory. deraadt@ has suggested a scheme where the skew would be kept in the TCB. However it is done, access to the skews will improve monotonicity, which remains a problem with the TSC.
  First proposed by kettenis@ and pirofti@. With input from pirofti@, deraadt@, guenther@, naddy@, kettenis@, and claudio@. Based on similar changes in Linux, FreeBSD, NetBSD, and DragonFlyBSD.
  ok deraadt@ pirofti@ kettenis@ naddy@ claudio@
  (cheloha, 2020-08-23, 1 file, -4/+4)
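A sketch of the LFENCE;RDTSC sequence the Intel text recommends, written as GCC/Clang inline assembly for amd64. The helper name `rdtsc_lfence()` is hypothetical and this is not the code from the commit. LFENCE keeps RDTSC from executing before earlier instructions have completed locally, and the `"memory"` clobber also stops the compiler from moving memory accesses across the read.

```c
#include <stdint.h>

/* Read the time-stamp counter only after all earlier instructions have
 * completed locally (LFENCE;RDTSC). RDTSC returns the low 32 bits in EAX
 * and the high 32 bits in EDX. */
static inline uint64_t rdtsc_lfence(void)
{
    uint32_t lo, hi;

    __asm__ volatile("lfence\n\trdtsc" : "=a"(lo), "=d"(hi) : : "memory");
    return ((uint64_t)hi << 32) | lo;
}
```

Dropping the `lfence` from the asm string gives the "naked" RDTSC the commit message benchmarks against.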
* Clean up the amd64 userland timecounter implementation a bit:
  * We don't need TC_LAST
  * Make internal functions static to avoid namespace pollution in libc.a
  * Use a switch statement to harmonize with architectures providing multiple timecounters
  ok deraadt@, pirofti@
  (kettenis, 2020-07-08, 1 file, -10/+10)
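A sketch of the shape this cleanup describes: file-local (`static`) helpers and a `switch` over the kind of counter the kernel says is readable from userland, so architectures with several timecounters can add cases. `TC_NONE`, `TC_TSC`, and the `(tc_user, tc)` signature are assumptions for illustration; OpenBSD's actual `tc_get_timecount()` may take different arguments. The timecount is truncated to 32 bits on the assumption that the counter is consumed through a mask.

```c
#include <stdint.h>

/* Hypothetical counter identifiers; 0 means "not readable in userland". */
enum { TC_NONE = 0, TC_TSC = 1 };

static inline uint64_t rdtsc_lfence(void)
{
    uint32_t lo, hi;

    __asm__ volatile("lfence\n\trdtsc" : "=a"(lo), "=d"(hi) : : "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* static: keeps the symbol out of libc.a's global namespace.
 * Stores the raw counter in *tc and returns 0, or returns -1 if this
 * architecture cannot read the requested counter from userland. */
static int tc_get_timecount(unsigned int tc_user, unsigned int *tc)
{
    switch (tc_user) {
    case TC_TSC:
        *tc = (unsigned int)rdtsc_lfence();
        return 0;
    default:
        return -1;
    }
}
```

A caller getting -1 would fall back to the system call.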
* Add support for timecounting in userland.
  This diff exposes parts of clock_gettime(2) and gettimeofday(2) to userland via libc, liberating processes from the need for a context switch every time they want to count the passage of time.
  If a timecounter clock can be exposed to userland, then it needs to set its tc_user member to a non-zero value. Tested with one or multiple counters per architecture.
  The timing data is shared through a pointer found in the new ELF auxiliary vector AUX_openbsd_timekeep containing timehands information that is frequently updated by the kernel.
  Timing differences between the last kernel update and the current time are adjusted in userland by the tc_get_timecount() function inside the MD usertc.c file.
  This permits a much more responsive environment, quite visible in browsers, office programs and gaming (apparently one is able to fly in Minecraft now).
  Tested by robert@, sthen@, naddy@, kmos@, phessler@, and many others!
  OK from at least kettenis@, cheloha@, naddy@, sthen@
  (pirofti, 2020-07-06, 2 files, -2/+44)
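A sketch of the userland adjustment step, assuming the usual BSD timecounter arithmetic: published base time plus (ticks since the last kernel update × scale), where `scale` is the length of one tick in units of 2^-64 s. `struct th_snapshot` and its field names are hypothetical and do not reflect the real timekeep layout. The sketch also assumes it runs less than one second after the kernel's last update, so the product and the single carry cannot overflow.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical snapshot of the timehands data the kernel publishes. */
struct th_snapshot {
    time_t   sec;          /* whole seconds at the last kernel update */
    uint64_t frac;         /* fraction of a second, units of 2^-64 s */
    uint64_t scale;        /* 2^-64 s per counter tick (~2^64 / freq) */
    uint32_t offset_count; /* raw counter value at the last update */
    uint32_t counter_mask; /* valid bits of the hardware counter */
};

/* Turn a raw counter value into a timespec without entering the kernel. */
static struct timespec th_to_timespec(const struct th_snapshot *th, uint32_t tc)
{
    /* Unsigned subtraction plus the mask handles counter wraparound. */
    uint64_t delta = (uint32_t)(tc - th->offset_count) & th->counter_mask;
    uint64_t frac = th->frac + th->scale * delta;
    struct timespec ts;

    /* If the fraction wrapped, one second was carried. */
    ts.tv_sec = th->sec + (frac < th->frac);
    /* Top 32 bits of the fraction, scaled to nanoseconds. */
    ts.tv_nsec = (long)(((uint64_t)1000000000 * (uint32_t)(frac >> 32)) >> 32);
    return ts;
}
```

With a 1 GHz counter (`scale = UINT64_MAX / 1000000000`), half a billion ticks after the update at second 5 yield roughly 5.5 s. The last nanosecond is lost to truncation in the fixed-point conversion.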
* Add retguard macros to setjmp/longjmp on amd64. Knocks out some useful gadgets from libc.
  ok deraadt@, kettenis@
  (mortimer, 2019-03-30, 3 files, -21/+33)
* Add retguard macros for libc.
  ok deraadt
  (mortimer, 2018-07-03, 9 files, -9/+27)
* Put _map table into .rodata instead of .text
  (deraadt, 2017-08-19, 1 file, -3/+2)
* Copy files from ../librthread in preparation for moving functionality from libpthread to libc. No changes to the build yet, just making it easier to review the substantive diffs.
  ok beck@ kettenis@ tedu@
  (guenther, 2017-08-15, 1 file, -0/+26)
* Switch from calling obsolete sig{block,setmask} to directly using the sigprocmask syscall
  ok kettenis@
  (guenther, 2016-05-29, 2 files, -21/+27)
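A portable POSIX sketch of the two operations setjmp/longjmp need from the signal layer, expressed directly with `sigprocmask(2)` rather than `sigblock`/`sigsetmask`. Querying uses a NULL `set` (the `how` argument is then ignored), which replaces the old `sigblock(0)` idiom. The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* What setjmp records: the current mask, read without modifying it. */
static int save_sigmask(sigset_t *saved)
{
    return sigprocmask(SIG_BLOCK, NULL, saved);
}

/* What longjmp reinstates: exactly the mask recorded above. */
static int restore_sigmask(const sigset_t *saved)
{
    return sigprocmask(SIG_SETMASK, saved, NULL);
}
```

For example, block SIGUSR1, save the mask, unblock it, and then restore: SIGUSR1 ends up blocked again.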
* Using a 3-word buffer in the openbsd.randomdata segment, XOR swizzle the PC/FP/SP registers in the jmpbuf. An old idea (around 1999?) but the random segment sure makes it easy. Lots of help from kettenis
  ok kettenis
  (deraadt, 2016-05-12, 3 files, -21/+80)
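A sketch of the XOR swizzle on the three saved registers. The effect is that a jmpbuf leaked or overwritten by an attacker does not hold, and cannot be made to hold, a usable PC/FP/SP unless the per-process cookie is known. `jb_cookie`, `struct saved_regs` and `jb_swizzle()` are hypothetical. The fixed cookie values are placeholders only: in the commit the 3 words come from the openbsd.randomdata segment and differ per process.

```c
#include <stdint.h>

/* Placeholder cookie; the real one is random per process. */
static const uintptr_t jb_cookie[3] = {
    (uintptr_t)0x5a17c3e9d2b4a681ULL,
    (uintptr_t)0x9e3779b97f4a7c15ULL,
    (uintptr_t)0xc2b2ae3d27d4eb4fULL,
};

enum { JB_PC, JB_FP, JB_SP };

struct saved_regs {
    uintptr_t r[3];   /* indexed by JB_PC, JB_FP, JB_SP */
};

/* XOR is its own inverse: setjmp calls this to mangle the saved values,
 * longjmp calls it again to recover them. */
static void jb_swizzle(struct saved_regs *regs)
{
    for (int i = 0; i < 3; i++)
        regs->r[i] ^= jb_cookie[i];
}
```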
* Split the non-syscall ASM bits from SYS.h into DEFS.h and use that in the non-syscall .S source
  ok millert@ miod@
  (guenther, 2015-11-14, 1 file, -2/+2)
* Wrap the remaining math functions in libc: __fpclassify*(), __flt_rounds(), and ldexp().
  ok millert@
  (guenther, 2015-10-27, 3 files, -5/+7)
* Do provide hidden _libc_* aliases for sig{block,setmask} and use them in the ASM *setjmp implementations. Skip the PLT when calling them on amd64 (other archs to do this after testing)
  ok miod@
  (guenther, 2015-09-13, 2 files, -22/+6)
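A sketch of the hidden-alias idiom with GCC/Clang attributes on ELF targets. A second, hidden symbol names the same code, and because a hidden symbol cannot be preempted from outside the shared object, internal callers (including assembly) can reach it directly instead of through the PLT. `demo_fn` and `_libc_demo_fn` are hypothetical; only the `_libc_*` naming pattern comes from the commit.

```c
/* Public, preemptible entry point (hypothetical name, trivial body). */
int demo_fn(int x)
{
    return x + 1;
}

/* Hidden alias for the same code: binds locally within the library. */
extern __typeof__(demo_fn) _libc_demo_fn
    __attribute__((alias("demo_fn"), visibility("hidden")));
```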
* Put obvious END() macros that match ENTRY() entries.
  (uebayasi, 2015-05-29, 12 files, -12/+51)
* remove code for ancient gcc.
  ok millert@, kettenis@
  (daniel, 2015-01-04, 1 file, -7/+1)
* add proto for amd64 case; unify otherwise
  (deraadt, 2013-11-12, 1 file, -2/+4)
* Do a PC-relative relocation for _map rather than going through GOTPCREL. Uncovered after the binutils patch where it isn't optimized away at assembly and is forced to go through GOTPCREL. But _map is effectively a local variable. Found with cephes by guenther@.
  OK guenther@, kettenis@, deraadt@.
  (martynas, 2013-04-23, 1 file, -2/+2)
* Convert cpp | as rules in bsd.lib.mk and lib/libc/sys/Makefile.inc to pure cc invocations. This allows us to use the compiler builtin define __PIC__ to check for PIC/PIEness rather than passing -DPIC. Simplifies PIE work a lot.
  ok matthew@, conceptually ok kurt@
  (pascal, 2012-08-22, 3 files, -8/+8)
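A small sketch of testing PIC/PIE-ness with the compiler's own predefine instead of a `-DPIC` passed by the build. GCC and Clang define `__PIC__` for `-fpic`/`-fpie` (value 1) and `-fPIC`/`-fPIE` (value 2). In the commit this check lives in preprocessed `.S` sources, but it works the same way in C. The function name is hypothetical.

```c
/* Returns 0 when not compiled as position-independent code, otherwise
 * the value of the compiler's __PIC__ predefine (1 or 2). */
int compiled_as_pic(void)
{
#if defined(__PIC__)
    return __PIC__;
#else
    return 0;
#endif
}
```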
* rely on the compiler giving us a built-in alloca. any new architecture or compiler we use will.
  ok millert
  (deraadt, 2012-04-19, 2 files, -16/+1)
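A sketch of relying on the compiler's built-in alloca (`__builtin_alloca`, available in GCC and Clang) instead of a libc implementation. The compiler simply adjusts the stack pointer, so the memory disappears when the calling function returns and must never be returned or stored. `sum_via_alloca()` is a hypothetical example function.

```c
#include <stddef.h>
#include <string.h>

/* Copy n bytes into a stack buffer from the built-in alloca and sum them.
 * The buffer is released automatically when this function returns. */
unsigned sum_via_alloca(const unsigned char *src, size_t n)
{
    unsigned char *buf = __builtin_alloca(n);
    unsigned sum = 0;

    memcpy(buf, src, n);
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}
```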
* alloca.c cannot be used
  (deraadt, 2012-04-12, 1 file, -2/+1)
* Revert (leaving the complex math part alone). Some stuff is depending on this historical behavior; so we're stuck in this stupid situation. No cookie for me.
  (martynas, 2011-07-08, 3 files, -2/+110)
* Move fabs(3), frexp(3), and modf(3) to libm--nothing has been using them in libc for a very long time.
  OK guenther@.
  (martynas, 2011-07-08, 3 files, -110/+2)
* remove from gen so that lint doesn't check gen if assembly versions are available. spotted by theo
  (martynas, 2009-04-21, 1 file, -2/+2)
* - ldexp implementation has issues. switch to the one from libm
  - remove frexp in hppa64, cloned from hppa
  - move generic ieee754 implementations of modf and ldexp to gen
  ok kettenis@, "looks good" millert@
  (martynas, 2009-04-19, 1 file, -2/+2)
* these were not needed
  (martynas, 2008-12-09, 1 file, -2/+1)
* ditto frexpl and ldexpl
  (martynas, 2008-12-09, 1 file, -1/+2)
* - add long double signbit
  - make long double versions weak aliases to double versions, on archs where long doubles are 64 bits
  - no need to have two finites. finite() and finitef() are non-standard 3BSD obsolete versions of isfinite. remove from libm. make them weak_alias in libc to __isfinite and __isfinitef instead. similarly make 3BSD obsolete versions of isinf, isinff, isnan, isnanf weak_aliases to C99's __isinf, __isinff, __isnan, __isnanf
  - bump major
  ok millert@
  (martynas, 2008-12-09, 6 files, -6/+38)
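A sketch of both aliasing tricks from this commit, using the GCC/Clang `weak` and `alias` attributes on ELF. The obsolete name points at the C99-style implementation without a second copy of the code, and a long double entry point aliases the double one only where the two types share a format. All `demo_*` names are hypothetical stand-ins for `finite()`/`__isfinite()` and friends.

```c
#include <float.h>
#include <math.h>     /* INFINITY, NAN for the usage below */
#include <stdint.h>
#include <string.h>

/* C99-style implementation: a double is finite unless its 11-bit
 * exponent field is all ones (infinity or NaN). */
int demo_isfinite(double x)
{
    uint64_t bits;

    memcpy(&bits, &x, sizeof bits);
    return ((bits >> 52) & 0x7FF) != 0x7FF;
}

/* Obsolete 3BSD-style name as a weak alias: same code, and a strong
 * definition elsewhere at link time may still override it. */
int demo_finite(double) __attribute__((weak, alias("demo_isfinite")));

#if LDBL_MANT_DIG == DBL_MANT_DIG
/* Only where long double has the same format as double. */
int demo_isfinitel(long double) __attribute__((weak, alias("demo_isfinite")));
#endif
```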
* - replace dtoa w/ David's gdtoa, version 2008-03-15
  - provide proper dtoa locks
  - use the real strtof implementation
  - add strtold, __hdtoa, __hldtoa
  - add %a/%A support
  - don't lose precision in printf, don't round to double anymore
  - implement extended-precision versions of libc functions: fpclassify, isnan, isinf, signbit, isnormal, isfinite, now that the ieee.h is fixed
  - separate vax versions of strtof, and __hdtoa
  - add complex math support. added functions: cacos, casin, catan, ccos, csin, ctan, cacosh, casinh, catanh, ccosh, csinh, ctanh, cexp, clog, cabs, cpow, csqrt, carg, cimag, conj, cproj, creal, cacosf, casinf, catanf, ccosf, csinf, ctanf, cacoshf, casinhf, catanhf, ccoshf, csinhf, ctanhf, cexpf, clogf, cabsf, cpowf, csqrtf, cargf, cimagf, conjf, cprojf, crealf
  - add fdim, fmax, fmin
  - add log2. (adapted implementation e_log.c. could be more accurate & faster, but it's good enough for now)
  - remove wrappers & cruft in libm, supposed to work-around mistakes in SVID, etc.; use ieee versions. fixes issues in python 2.6 for djm@
  - make _digittoint static
  - proper definitions for i386, and amd64 in ieee.h
  - sh, powerpc don't really have extended-precision
  - add missing definitions for mips64 (quad), m{6,8}k (96-bit) float.h for LDBL_*
  - merge lead to frac for m{6,8}k, for gdtoa to work properly
  - add FRAC*BITS & EXT_TO_ARRAY32 definitions in ieee.h, for hdtoa&ldtoa to use
  - add EXT_IMPLICIT_NBIT definition, which indicates implicit normalization bit
  - add regression tests for libc: fpclassify and printf
  - arith.h & gd_qnan.h definitions
  - update ieee.h: hppa doesn't have quad-precision, hppa64 does
  - add missing prototypes to gdtoaimp
  - on 64-bit platforms make sure gdtoa doesn't use a long when it really wants an int
  - etc., what i may have forgotten...
  - bump libm major, due to removed&changed symbols
  - no libc bump, since this is riding on djm's libc major crank from a day ago
  discussed with / requested by / testing theo, sthen@, djm@, jsg@, merdely@, jsing@, tedu@, brad@, jakemsr@, and others.
  looks good to millert@
  parts of the diff ok kettenis@
  this commit does not include:
  - man page changes
  (martynas, 2008-09-07, 6 files, -1/+160)
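A small sketch of what `%a` support buys. `%a` prints the exact binary value of a double in hexadecimal floating-point notation, so formatting with `%a` and parsing back with `strtod()` reproduces the value bit for bit. Decimal `%f`/`%g` at default precision does not guarantee that. The helper name is hypothetical, and the exact digits printed can differ between C libraries.

```c
#include <stdio.h>
#include <stdlib.h>

/* Format x with %a into out, parse it back, and report whether the
 * value survived unchanged (1) or not (0). */
int hexfloat_roundtrips(double x, char *out, size_t outlen)
{
    snprintf(out, outlen, "%a", x);
    return strtod(out, NULL) == x;
}
```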
* - move isinf, isnan dups to gen, since most is ieee 754
  - is{inf,nan} should be macros for real-floating, so rename to __is{inf,nan}, per C99
  - implement C99 __fpclassify(), __fpclassifyf(), __isfinite(), __isfinitef(), __isnormal(), __isnormalf(), __signbit(), __signbitf()
  - long functions added, but not yet enabled, till ieee.h is fixed
  - implement vax equivalents of the functions
  - reimplement isinff, isnanf in a better way, and move to libc
  - add qnan bytes for all archs
  - bump major
  man pages will follow
  ok millert@. arm bits looked over by drahn@
  discussed w/ theo, who showed the right direction, to put these functions in libc
  (martynas, 2008-07-24, 4 files, -97/+12)
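A sketch of why the functions get `__` names: C99 requires `isinf()` and friends to be macros that accept any real-floating type, so a macro front end picks a per-type function. Here the choice is made by `sizeof`, which is one common way to do it and an assumption of this sketch. The `demo_*` names stand in for `__isinff`/`__isinf`/`__isinfl`. The float and double versions test IEEE 754 bit patterns, and the long double fallback uses plain comparisons.

```c
#include <math.h>     /* INFINITY, NAN, HUGE_VALL for the usage below */
#include <stdint.h>
#include <string.h>

/* Infinity: exponent all ones, significand zero (sign ignored). */
static int demo_isinf_f(float x)
{
    uint32_t b;

    memcpy(&b, &x, sizeof b);
    return (b & 0x7FFFFFFFu) == 0x7F800000u;
}

static int demo_isinf_d(double x)
{
    uint64_t b;

    memcpy(&b, &x, sizeof b);
    return (b & 0x7FFFFFFFFFFFFFFFULL) == 0x7FF0000000000000ULL;
}

static int demo_isinf_ld(long double x)
{
    return x == (long double)INFINITY || x == -(long double)INFINITY;
}

/* Type-generic front end over every real-floating type. */
#define DEMO_ISINF(x)                                          \
    (sizeof(x) == sizeof(float)  ? demo_isinf_f((float)(x)) :  \
     sizeof(x) == sizeof(double) ? demo_isinf_d((double)(x)) : \
                                   demo_isinf_ld((long double)(x)))
```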
* if we pull in a .S file, we must fake out the lint with a .c file; for this first cut, we will do this for alloca() using alloca.c by adding it to LSRCS
  (deraadt, 2005-11-29, 1 file, -2/+4)
* zap rcsid.
  okay deraadt@ (tested them all)
  (espie, 2005-08-07, 4 files, -19/+4)
* Replace broken frexp() with a working one from FreeBSD. There's no need to have a copy for each platform with ieee floating point, only vax needs a special version (which probably has similar bugs).
  OK and with help from otto@
  (millert, 2005-02-01, 2 files, -76/+2)
* Sync with NetBSD, picking up fixes to correctly reset status bits returning the old status bits.
  ok deraadt@
  (kettenis, 2004-07-13, 1 file, -3/+6)
* do signal blocking before saving registers
  (deraadt, 2004-02-09, 2 files, -23/+23)
* 16 byte align for performance, as on other architectures
  (deraadt, 2004-02-08, 1 file, -2/+4)
* from freebsd, helps awk too:
  Fix fabs(). This commit brought to you by the letter 'l'. (fstp stores a mem32 value, fstpl stores a mem64 value)
  (deraadt, 2004-02-08, 1 file, -2/+2)
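A C analogy (not the assembly from the commit) for why the missing `l` mattered. If the result passes through a 32-bit memory slot, as with `fstp` to a mem32 operand, the double is rounded to float precision. A 64-bit slot, as with `fstpl`, keeps every bit. The absolute value here is computed by clearing the IEEE 754 sign bit, and all function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* |x| by clearing the sign bit (also maps -0.0 to +0.0). */
static double clear_sign(double x)
{
    uint64_t b;

    memcpy(&b, &x, sizeof b);
    b &= ~(UINT64_C(1) << 63);
    memcpy(&x, &b, sizeof x);
    return x;
}

/* Like the bug: the result is stored through a 32-bit slot. */
double fabs_via_mem32(double x)
{
    volatile float slot = (float)clear_sign(x);
    return slot;
}

/* Like the fix: a 64-bit slot preserves the full double. */
double fabs_via_mem64(double x)
{
    volatile double slot = clear_sign(x);
    return slot;
}
```

Values exactly representable as float (such as -2.5) hide the bug, while most others (such as -0.1) come back slightly wrong from the mem32 path.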
* from freebsd (and appears to make our awk work better)
  Tidy up modf.S and make it actually work. It wasn't extracting the value out of ST(0) before copying it to %xmm0. Also remove bogus stack frame and work in the red zone.
  (deraadt, 2004-02-08, 1 file, -17/+14)
* things for amd64; from art@
  (mickey, 2004-01-28, 19 files, -0/+911)