| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
floating-point control modes are properly restored by longjmp(3).
ok guenther@
|
| |
Regarding RDTSC, the Intel ISA reference says (Vol 2B. 4-545):
> The RDTSC instruction is not a serializing instruction.
>
> It does not necessarily wait until all previous instructions
> have been executed before reading the counter.
>
> Similarly, subsequent instructions may begin execution before the
> read operation is performed.
>
> If software requires RDTSC to be executed only after all previous
> instructions have completed locally, it can either use RDTSCP (if
> the processor supports that instruction) or execute the sequence
> LFENCE;RDTSC.
To mitigate this problem, Linux and DragonFly use LFENCE. FreeBSD and
NetBSD take a more complex route: they selectively use MFENCE, LFENCE,
or CPUID depending on whether the CPU is AMD, Intel, VIA or something
else.
Let's start with just LFENCE. We only use the TSC as a timecounter on
SSE2 systems so there is no need to conditionally compile the LFENCE.
We can explore conditionally using MFENCE later.
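A minimal sketch of the LFENCE;RDTSC sequence quoted above, assuming a GCC- or Clang-compatible compiler. The name rdtsc_lfence and the timespec_get() fallback for non-x86 builds are illustrative assumptions, not the committed code.

```c
#include <stdint.h>
#include <time.h>

/*
 * Read the TSC only after earlier instructions have completed locally.
 * The function name is illustrative. The non-x86 branch is a stand-in
 * so the sketch builds elsewhere; it is not a TSC read.
 */
static inline uint64_t
rdtsc_lfence(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
	uint32_t hi, lo;

	/* LFENCE keeps RDTSC from executing ahead of prior instructions. */
	__asm__ volatile("lfence; rdtsc" : "=d" (hi), "=a" (lo) : : "memory");
	return ((uint64_t)hi << 32) | lo;
#else
	struct timespec ts;

	/* Assumption: a C11 libc providing timespec_get(). */
	timespec_get(&ts, TIME_UTC);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
#endif
}
```

Dropping the "lfence;" prefix gives the "naked" RDTSC that the microbenchmark below compares against.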
Microbenchmarking on my machine (Core i7-8650) suggests a penalty of
about 7-10% over a "naked" RDTSC. This is acceptable. It's a bit of
a moot point though: the alternative is a considerably weaker
monotonicity guarantee when comparing timestamps between threads,
which is not acceptable.
It's worth noting that kernel timecounting is not *exactly* like
userspace timecounting. However, they are similar enough that we can
use userspace benchmarks to make conjectures about possible impacts on
kernel performance.
Concerns about kernel performance, in particular the network stack,
were the blocking issue for this patch. Regarding networking
performance, claudio@ says a 10% slower nanotime(9) or nanouptime(9)
is acceptable and that shaving off "tens of cycles" is a
micro-optimization. There are bigger optimizations to chase down
before such a difference would matter.
There is additional work to be done here. We could experiment with
conditionally using MFENCE. Also, the userspace TSC timecounter
doesn't have access to the adjustment skews available to the kernel
timecounter. pirofti@ has suggested a scheme involving RDTSCP and an
array of skews mapped into user memory. deraadt@ has suggested a
scheme where the skew would be kept in the TCB. However it is done,
access to the skews will improve monotonicity, which remains a problem
with the TSC.
First proposed by kettenis@ and pirofti@. With input from pirofti@,
deraadt@, guenther@, naddy@, kettenis@, and claudio@. Based on
similar changes in Linux, FreeBSD, NetBSD, and DragonFlyBSD.
ok deraadt@ pirofti@ kettenis@ naddy@ claudio@
|
|
|
|
|
|
|
|
|
| |
* We don't need TC_LAST
* Make internal functions static to avoid namespace pollution in libc.a
* Use a switch statement to harmonize with architectures providing
multiple timecounters
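A rough sketch of the switch-based dispatch described in the last item. The identifiers TC_NONE and TC_TSC, the stub counter, and the function name are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical counter identifiers; 0 means "not usable from userland". */
enum { TC_NONE = 0, TC_TSC = 1 };

/* Stand-in for a hardware counter read, fixed for the example. */
static uint32_t
read_tsc_stub(void)
{
	return 42;
}

/* Returns 0 and fills *tc on success, -1 if the caller must use the syscall. */
static int
get_timecount(int tc_user, uint32_t *tc)
{
	switch (tc_user) {
	case TC_TSC:
		*tc = read_tsc_stub();
		return 0;
	default:
		return -1;
	}
}
```

An architecture with several counters would add one case per counter, which is the harmonization the item refers to.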
ok deraadt@, pirofti@
|
| |
This diff exposes parts of clock_gettime(2) and gettimeofday(2) to
userland via libc, liberating processes from the need for a context
switch every time they want to count the passage of time.
If a timecounter clock can be exposed to userland then it needs to set
its tc_user member to a non-zero value. Tested with one or multiple
counters per architecture.
The timing data is shared through a pointer found in the new ELF
auxiliary vector AUX_openbsd_timekeep containing timehands information
that is frequently updated by the kernel.
Timing differences between the last kernel update and the current time
are adjusted in userland by the tc_get_timecount() function inside the
MD usertc.c file.
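The adjustment described above can be sketched roughly as follows. The struct layout, field names, fixed-point scale, and fake_counter() are all hypothetical simplifications for illustration; they are not the kernel's timehands structure or the real MD counter read.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, simplified view of the data the kernel publishes. */
struct tk_snapshot {
	volatile uint32_t gen;		/* bumped per update; 0 while updating */
	uint32_t offset_count;		/* counter value at the last kernel update */
	uint32_t counter_mask;		/* valid bits of the hardware counter */
	uint64_t ns_per_tick_q32;	/* nanoseconds per tick, 32.32 fixed point */
	uint64_t base_ns;		/* time at the last kernel update, in ns */
};

/* Stand-in for the MD counter read, fixed so the sketch is deterministic. */
static uint32_t
fake_counter(void)
{
	return 1500;
}

/* Current time = last kernel update + ticks elapsed since then. */
static uint64_t
tk_now_ns(const struct tk_snapshot *tk, uint32_t (*read_counter)(void))
{
	uint32_t gen, delta;
	uint64_t ns;

	do {
		gen = tk->gen;
		atomic_thread_fence(memory_order_acquire);
		delta = (read_counter() - tk->offset_count) & tk->counter_mask;
		/* May overflow for large deltas or coarse counters. */
		ns = tk->base_ns + (((uint64_t)delta * tk->ns_per_tick_q32) >> 32);
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 || gen != tk->gen);	/* retry if an update raced us */

	return ns;
}
```

With offset_count = 500, a scale of 2 ns per tick, and base_ns = 1000000, the stub counter value 1500 yields 1002000 ns. No system call is involved on this path.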
This permits a much more responsive environment, quite visible in
browsers, office programs and gaming (apparently one is able to fly
in Minecraft now).
Tested by robert@, sthen@, naddy@, kmos@, phessler@, and many others!
OK from at least kettenis@, cheloha@, naddy@, sthen@
|
|
|
|
|
|
| |
gadgets from libc.
ok deraadt@, kettenis@
|
|
|
|
| |
ok deraadt
|
| |
|
|
|
|
|
|
|
| |
from libpthread to libc. No changes to the build yet, just making it
easier to review the substantive diffs.
ok beck@ kettenis@ tedu@
|
|
|
|
|
|
| |
sigprocmask syscall
ok kettenis@
|
|
|
|
|
|
| |
the PC/FP/SP registers in the jmpbuf. An old idea (around 1999?) but
the random segment sure makes it easy. Lots of help from kettenis
ok kettenis
|
|
|
|
|
|
| |
non-syscall .S source
ok millert@ miod@
|
|
|
|
|
|
| |
and ldexp().
ok millert@
|
|
|
|
|
|
|
| |
the ASM *setjmp implementations.
Skip the PLT when calling them on amd64 (other archs to do this after testing)
ok miod@
|
| |
|
|
|
|
| |
ok millert@, kettenis@
|
| |
|
|
|
|
|
|
|
|
|
| |
GOTPCREL. Uncovered after the binutils patch where it isn't optimized
away at assembly and is forced to go through GOTPCREL. But _map
is effectively a local variable.
Found with cephes by guenther@.
OK guenther@, kettenis@, deraadt@.
|
|
|
|
|
|
|
| |
invocations. This allows us to use the compiler builtin define __PIC__ to check
for PIC/PIEness rather than passing -DPIC. Simplifies PIE work a lot.
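A small illustration of checking the compiler-provided __PIC__ define instead of a hand-passed -DPIC. The helper name built_as_pic is hypothetical; the same #ifdef works in preprocessed assembly sources.

```c
/* Report whether this translation unit was compiled as PIC/PIE. */
static int
built_as_pic(void)
{
#ifdef __PIC__
	/* Defined by GCC and Clang when generating position-independent code. */
	return 1;
#else
	return 0;
#endif
}
```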
ok matthew@, conceptually ok kurt@
|
|
|
|
|
| |
or compiler we use will.
ok millert
|
| |
|
|
|
|
|
| |
on this historical behavior; so we're stuck in this stupid situation.
No cookie for me.
|
|
|
|
| |
them in libc for a very long time. OK guenther@.
|
|
|
|
| |
are available. spotted by theo
|
|
|
|
|
|
| |
- remove frexp in hppa64, cloned from hppa
- move generic ieee754 implementations of modf and ldexp to gen
ok kettenis@, "looks good" millert@
|
| |
|
| |
|
| |
- make long double versions weak aliases to double versions,
on archs where long doubles are 64 bits
- no need to have two finites. finite() and finitef() are
non-standard 3BSD obsolete versions of isfinite. remove
from libm. make them weak_alias in libc to __isfinite and
__isfinitef instead. similarly make 3BSD obsolete versions
of isinf, isinff, isnan, isnanf weak_aliases to C99's
__isinf, __isinff, __isnan, __isnanf
- bump major
ok millert@
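A minimal sketch of the weak-alias approach described above, assuming an ELF target, a GCC- or Clang-compatible compiler, and IEEE 754 binary64 doubles. The sketch_ names are hypothetical so the example does not collide with the real libc symbols.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative C99-style predicate; assumes IEEE 754 binary64 doubles. */
int
sketch_isfinite(double d)
{
	uint64_t bits;

	memcpy(&bits, &d, sizeof(bits));
	/* An all-ones exponent field encodes infinity or NaN. */
	return ((bits >> 52) & 0x7ff) != 0x7ff;
}

/* The obsolete name becomes a weak alias, so only one body exists. */
int sketch_finite(double) __attribute__((weak, alias("sketch_isfinite")));
```

Both names resolve to the same code, so the obsolete entry point costs no extra implementation.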
|
| |
- provide proper dtoa locks
- use the real strtof implementation
- add strtold, __hdtoa, __hldtoa
- add %a/%A support
- don't lose precision in printf, don't round to double anymore
- implement extended-precision versions of libc functions: fpclassify,
isnan, isinf, signbit, isnormal, isfinite, now that the ieee.h is
fixed
- separate vax versions of strtof, and __hdtoa
- add complex math support. added functions: cacos, casin, catan,
ccos, csin, ctan, cacosh, casinh, catanh, ccosh, csinh, ctanh, cexp,
clog, cabs, cpow, csqrt, carg, cimag, conj, cproj, creal, cacosf,
casinf, catanf, ccosf, csinf, ctanf, cacoshf, casinhf, catanhf,
ccoshf, csinhf, ctanhf, cexpf, clogf, cabsf, cpowf, csqrtf, cargf,
cimagf, conjf, cprojf, crealf
- add fdim, fmax, fmin
- add log2. (adapted implementation e_log.c. could be more accurate
& faster, but it's good enough for now)
- remove wrappers & cruft in libm, supposed to work-around mistakes
in SVID, etc.; use ieee versions. fixes issues in python 2.6 for
djm@
- make _digittoint static
- proper definitions for i386, and amd64 in ieee.h
- sh, powerpc don't really have extended-precision
- add missing definitions for mips64 (quad), m{6,8}k (96-bit) float.h
for LDBL_*
- merge lead to frac for m{6,8}k, for gdtoa to work properly
- add FRAC*BITS & EXT_TO_ARRAY32 definitions in ieee.h, for hdtoa&ldtoa
to use
- add EXT_IMPLICIT_NBIT definition, which indicates implicit
normalization bit
- add regression tests for libc: fpclassify and printf
- arith.h & gd_qnan.h definitions
- update ieee.h: hppa doesn't have quad-precision, hppa64 does
- add missing prototypes to gdtoaimp
- on 64-bit platforms make sure gdtoa doesn't use a long when it
really wants an int
- etc., what i may have forgotten...
- bump libm major, due to removed&changed symbols
- no libc bump, since this is riding on djm's libc major crank from
a day ago
discussed with / requested by / testing theo, sthen@, djm@, jsg@,
merdely@, jsing@, tedu@, brad@, jakemsr@, and others.
looks good to millert@
parts of the diff ok kettenis@
this commit does not include:
- man page changes
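To illustrate the %a/%A item in the list above, here is a small sketch using only standard C99 calls. The helper name hexfloat_roundtrips is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Format d with %a, parse it back, and report whether the value survived. */
static int
hexfloat_roundtrips(double d)
{
	char buf[64];

	/* With no precision given, %a prints enough digits to be exact. */
	snprintf(buf, sizeof(buf), "%a", d);
	return strtod(buf, NULL) == d;
}
```

Because the hexadecimal form is exact, the round trip should preserve every finite double, which a short decimal format such as %g does not guarantee.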
|
|
| |
- is{inf,nan} should be macros for real-floating, so rename to
__is{inf,nan}, per C99
- implement C99 __fpclassify(), __fpclassifyf(), __isfinite(),
__isfinitef(), __isnormal(), __isnormalf(), __signbit(), __signbitf()
- long functions added, but not yet enabled, till ieee.h is fixed
- implement vax equivalents of the functions
- reimplement isinff, isnanf in a better way, and move to libc
- add qnan bytes for all archs
- bump major
man pages will follow
ok millert@. arm bits looked over by drahn@
discussed w/ theo, who showed the right direction, to put these
functions in libc
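A rough sketch of how a C99-style classification can be done by inspecting the bits, assuming IEEE 754 binary64 doubles. The name classify_double is hypothetical and this is not the committed implementation.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Classify a double from its exponent and fraction fields. */
static int
classify_double(double d)
{
	uint64_t bits, frac;
	unsigned int e;

	memcpy(&bits, &d, sizeof(bits));
	e = (unsigned int)((bits >> 52) & 0x7ff);	/* 11-bit exponent */
	frac = bits & 0x000fffffffffffffULL;		/* 52-bit fraction */

	if (e == 0)
		return frac == 0 ? FP_ZERO : FP_SUBNORMAL;
	if (e == 0x7ff)
		return frac == 0 ? FP_INFINITE : FP_NAN;
	return FP_NORMAL;
}
```

The isnan and isinf checks fall out as comparisons of this result against FP_NAN and FP_INFINITE.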
|
|
|
|
|
| |
for this first cut, we will do this for alloca() using alloca.c by
adding it to LSRCS
|
|
|
|
| |
okay deraadt@ (tested them all)
|
|
|
|
|
|
| |
no need to have a copy for each platform with ieee floating point,
only vax needs a special version (which probably has similar bugs).
OK and with help from otto@
|
|
|
|
|
| |
the old status bits.
ok deraadt@
|
| |
|
| |
|
|
|
|
|
| |
Fix fabs(). This commit brought to you by the letter 'l'.
(fstp stores a mem32 value, fstpl stores a mem64 value)
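The effect of storing through a 32-bit slot instead of a 64-bit one can be modelled in C. This only illustrates the symptom and is not the assembly fix itself; the helper name through_mem32 is hypothetical.

```c
/* Round-trip a double through a 32-bit slot, as a mem32 store would. */
static double
through_mem32(double x)
{
	volatile float slot = (float)x;	/* value is rounded to single precision */

	return (double)slot;
}
```

Values exactly representable in single precision, such as 0.5, survive the round trip, while a value like 0.1 comes back changed.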
|
|
|
|
|
|
| |
Tidy up modf.S and make it actually work. It wasn't extracting
the value out of ST(0) before copying it to %xmm0. Also remove bogus stack
frame and work in the red zone.
|
|
|