add new module to core called Time::HiRes (a real benchmark framework) #23389

Open. bulk88 wants to merge 20 commits into base: blead.


bulk88
Contributor

@bulk88 bulk88 commented Jun 27, 2025

Read the commits.

Goals of this branch:

  • Make T::HR's public C API function myNVtime() actually usable for 3rd party CPAN authors who want to benchmark something. It's not usable now because of its dTHX;/Perl_get_context() call. Removing 1 Perl_get_context() sped up calls to the C function myNVtime() by 49% (23478412/s vs 34939543/s) in this very broken/flawed benchmark. It's flawed b/c XSUB XS::APItest::XSUB::Time::HiRes::myNVtime and XSUB XS::APItest::XSUB::Time::HiRes::myNVtime_cxt are Grade F XS code that did not write #define PERL_NO_GET_CONTEXT and have 1 or 2 dozen Perl_get_context() calls inside them.

That their TU (translation unit) is Grade F XS code without #define PERL_NO_GET_CONTEXT is not a bug; it is as designed.
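A minimal stand-alone model of why the Perl_get_context() call matters, assuming nothing from perl.h: interp, get_context(), time_implicit() and time_explicit() are hypothetical names, not Perl API. The dTHX-style function pays for a thread-local lookup on every call, while the pTHX-style function receives the pointer as an argument.

```c
#include <stddef.h>

/* Hypothetical stand-in for the interpreter struct (my_perl). */
typedef struct interp {
    double last_time;
} interp;

/* Models the thread-local slot that Perl_get_context() reads. */
static _Thread_local interp *current_interp;

/* Counts how often the lookup is paid, so the cost is observable. */
static unsigned long context_lookups;

static interp *get_context(void)
{
    context_lookups++;
    return current_interp;
}

/* dTHX style: every call fetches the context by itself. */
static double time_implicit(double now)
{
    interp *my = get_context();
    my->last_time = now;
    return my->last_time;
}

/* pTHX style: the caller already holds the context and passes it in. */
static double time_explicit(interp *my, double now)
{
    my->last_time = now;
    return my->last_time;
}
```

In this model the explicit variant never touches context_lookups, which is the effect the myNVtime_cxt() variant is after.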

C:\sources\perl5\win32>..\perl -I..\lib -E"use Benchmark; use Benchmark qw(:all) ; use XS::APItest; use Time::HiRes; XS::APItest::XSUB::Time::HiRes::Init(); say XS::APItest::XSUB::Time::HiRes::myNVtime();say XS::APItest::XSUB::Time::HiRes::myNVtime_cxt(); my @t = XS::APItest::XSUB::Time::HiRes::myU2time(); say $t[0];say $t[1]; cmpthese(0,{'gc' => \&XS::APItest::XSUB::Time::HiRes::myNVtime, 'c' => \&XS::APItest::XSUB::Time::HiRes::myNVtime_cxt});"
1750849048.78898
1750849048.78937
1750849048
789743
            Rate getc    c
 getc 23478412/s   -- -33%
 c    34939543/s  49%   --
 C:\sources\perl5\win32>

-Next goal. Discreetly add the rdtsc CPU instruction and Win32's QueryPerformanceCounter() and GetSystemTimePreciseAsFileTime() function calls to blead Perl/Perl core, without any visible public API changes, without any visible C code changes to HiRes.xs, and without documenting that they are now secretly available in stock P5P WinPerl core. No need to use Win32::API or go hunting for quick, dirty, and poorly maintained Win32::* CPAN mods just to get access to QueryPerformanceCounter().

If Linux glibc's C grammar token gettimeofday() becomes actual inline assembly, a builtin, or an intrinsic (a ptr deref into the vdso struct https://elixir.bootlin.com/linux/v4.7/source/arch/x86/entry/vdso/vclock_gettime.c#L104 ) inside libperl/hires.so, then that platform will be faster too.

GetSystemTimePreciseAsFileTime (rdtsc + utc time) is reachable with

C:\sources\perl5\win32>cd .. && perl.exe -Ilib -MTime::HiRes -E" say 'GSTPAFT() '.Time::HiRes::clock_gettime(1); say 'CORE '.time();" & cd win32
GSTPAFT() 1751048069.79015
CORE 1751048069
C:\sources\perl5\win32>

QueryPerformanceCounter (rdtsc with speedswitch/turboboost/hypervisor correction, no comment when 0 nanosecs happened) is reachable with

C:\sources\perl5\win32>cd .. && perl.exe -Ilib -MTime::HiRes -E" say 'QPC() '.Time::HiRes::clock_gettime(2); say 'CORE '.time();" & cd win32
QPC() 1471372.36982931
CORE 1751051969
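The QPC() value above is a tick count scaled to seconds, which is why it looks like uptime rather than epoch time. A minimal sketch of that scaling in portable C, assuming the raw counter and frequency are already in hand (on Windows they would come from QueryPerformanceCounter() and QueryPerformanceFrequency()); qpc_scale, qpc_scale_init() and qpc_to_seconds() are hypothetical names, not what HiRes.xs uses.

```c
#include <stdint.h>

/* Boot-time constants, computed once rather than on every reading. */
typedef struct {
    uint64_t ticks_per_sec;     /* integer frequency */
    double   ticks_per_sec_nv;  /* same value, pre-converted to double */
} qpc_scale;

/* Returns 1 on success, 0 if the frequency is unusable. */
static int qpc_scale_init(qpc_scale *s, uint64_t frequency)
{
    if (frequency == 0)
        return 0;
    s->ticks_per_sec = frequency;
    s->ticks_per_sec_nv = (double)frequency;
    return 1;
}

/* Whole seconds use integer math; only the remainder goes through FP. */
static double qpc_to_seconds(const qpc_scale *s, uint64_t ticks)
{
    const uint64_t whole = ticks / s->ticks_per_sec;
    const uint64_t rem   = ticks % s->ticks_per_sec;
    return (double)whole + (double)rem / s->ticks_per_sec_nv;
}
```

With a 10 MHz counter, 25000000 ticks come out as 2.5 seconds.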

Exposing these CPU features means stripping as much Perl XS "glue" from the XSUB as possible without SEGVing or doing black-box breaking like manually delinking SV heads and SV bodies, or manually growing/reallocing the mortal stack.

There were large areas of very poor quality XS code, and sometimes even bugs where a sv_newmortal() was executed and its retval was never used. In some places HiRes's XS code did an accidental sv_newmortal() (dXSTARG contains a sv_newmortal() call) and a few lines later did sv_2mortal(newSViv()).

The only perf optimizations NOT done in these commits were:

-manually delinking SV heads/bodies (making sv_2mortal(newSViv()) faster)
-using a call checker to rewrite the caller's optree
-the POPMARK macro/inline static's pointer aliasing violation (only tonyc and I know about it)
-writing any asm code
-adding any new OS-specific function calls that were not previously linked into T::HR
-reaching into the Windows vdso page/C struct or Linux vdso page/C struct

I DID handle and remove most of the XS_RETURN() macros, since XS_RETURN's C code violates x86 pointer aliasing (memory barrier/fence).

#ifdef PERL_IMPLICIT_CONTEXT

static NV
myNVtime_cxt(pTHX)
{
000007FEEEDA1160 48 83 EC 28          sub         rsp,28h  
    __debugbreak();
000007FEEEDA1164 CC                   int         3  
#ifdef HAS_NV_GETTIMEOFDAY
    return nv_gettimeofday();
#else
    struct timeval Tp;
    int status;
    status = gettimeofday (&Tp, NULL);
000007FEEEDA1165 E8 96 FE FF FF       call        _GetSystemTimePreciseAsFileTime (07FEEEDA1000h)  
000007FEEEDA116A 48 8B C8             mov         rcx,rax  
000007FEEEDA116D 48 BA 00 80 C1 2A 21 4E 62 FE mov         rdx,0FE624E212AC18000h  
000007FEEEDA1177 48 03 D0             add         rdx,rax  
000007FEEEDA117A 48 B8 BD 42 7A E5 D5 94 BF D6 mov         rax,0D6BF94D5E57A42BDh  
000007FEEEDA1184 48 F7 E2             mul         rax,rdx  
000007FEEEDA1187 48 B8 CD CC CC CC CC CC CC CC mov         rax,0CCCCCCCCCCCCCCCDh  
000007FEEEDA1191 4C 8B C2             mov         r8,rdx  
000007FEEEDA1194 48 F7 E1             mul         rax,rcx  
000007FEEEDA1197 49 C1 E8 17          shr         r8,17h  
000007FEEEDA119B 48 B8 DB 34 B6 D7 82 DE 1B 43 mov         rax,431BDE82D7B634DBh  
000007FEEEDA11A5 48 8B CA             mov         rcx,rdx  
000007FEEEDA11A8 48 C1 E9 03          shr         rcx,3  
000007FEEEDA11AC 48 F7 E1             mul         rax,rcx  
000007FEEEDA11AF 66 41 0F 6E C8       movd        xmm1,r8d  
    return status == 0 ? Tp.tv_sec + (Tp.tv_usec / NV_1E6) : -1.0;
000007FEEEDA11B4 F3 0F E6 C9          cvtdq2pd    xmm1,xmm1  
#ifdef HAS_NV_GETTIMEOFDAY
    return nv_gettimeofday();
#else
    struct timeval Tp;
    int status;
    status = gettimeofday (&Tp, NULL);
000007FEEEDA11B8 48 C1 EA 12          shr         rdx,12h  
000007FEEEDA11BC 48 69 C2 40 42 0F 00 imul        rax,rdx,0F4240h  
000007FEEEDA11C3 48 2B C8             sub         rcx,rax  
000007FEEEDA11C6 66 0F 6E C1          movd        xmm0,ecx  
    return status == 0 ? Tp.tv_sec + (Tp.tv_usec / NV_1E6) : -1.0;
000007FEEEDA11CA F3 0F E6 C0          cvtdq2pd    xmm0,xmm0  
000007FEEEDA11CE F2 0F 5E 05 0A 39 00 00 divsd       xmm0,mmword ptr [__real@412e848000000000 (07FEEEDA4AE0h)]  
000007FEEEDA11D6 F2 0F 58 C1          addsd       xmm0,xmm1  
#endif
}
000007FEEEDA11DA 48 83 C4 28          add         rsp,28h  
000007FEEEDA11DE C3                   ret  

AFTER inventing nv_gettimeofday();

#ifdef PERL_IMPLICIT_CONTEXT

static NV
myNVtime_cxt(pTHX)
{
000007FEF5BD1160 48 83 EC 28          sub         rsp,28h  
#ifdef HAS_NV_GETTIMEOFDAY
    return nv_gettimeofday();
000007FEF5BD1164 E8 97 FE FF FF       call        _GetSystemTimePreciseAsFileTime (07FEF5BD1000h)  
000007FEF5BD1169 48 8B C8             mov         rcx,rax  
000007FEF5BD116C 0F 57 C0             xorps       xmm0,xmm0  
000007FEF5BD116F 48 B8 00 80 3E D5 DE B1 9D 01 mov         rax,19DB1DED53E8000h  
000007FEF5BD1179 48 2B C8             sub         rcx,rax  
000007FEF5BD117C 78 07                js          myNVtime_cxt+25h (07FEF5BD1185h)  
000007FEF5BD117E F2 48 0F 2A C1       cvtsi2sd    xmm0,rcx  
000007FEF5BD1183 EB 15                jmp         myNVtime_cxt+3Ah (07FEF5BD119Ah)  
000007FEF5BD1185 48 8B C1             mov         rax,rcx  
000007FEF5BD1188 83 E1 01             and         ecx,1  
000007FEF5BD118B 48 D1 E8             shr         rax,1  
000007FEF5BD118E 48 0B C1             or          rax,rcx  
000007FEF5BD1191 F2 48 0F 2A C0       cvtsi2sd    xmm0,rax  
000007FEF5BD1196 F2 0F 58 C0          addsd       xmm0,xmm0  
000007FEF5BD119A F2 0F 5E 05 46 39 00 00 divsd       xmm0,mmword ptr [__real@416312d000000000 (07FEF5BD4AE8h)]  
#else
    struct timeval Tp;
    int status;
    status = gettimeofday (&Tp, NULL);
    return status == 0 ? Tp.tv_sec + (Tp.tv_usec / NV_1E6) : -1.0;
#endif
}
000007FEF5BD11A2 48 83 C4 28          add         rsp,28h  
000007FEF5BD11A6 C3                   ret  
--- No source file -------------------------------------------------------------
000007FEF5BD11A7 CC                   int         3  
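The AFTER listing boils down to one 64-bit subtract and one FP divide. A stand-alone sketch of that arithmetic, assuming the input is a raw FILETIME value (100 ns units since 1601-01-01); the constant matches the 19DB1DED53E8000h immediate and the 1e7 divisor visible in the disassembly, but filetime_to_unix_nv() is a hypothetical name, not the PR's nv_gettimeofday().

```c
#include <stdint.h>

/* 100 ns intervals between 1601-01-01 and 1970-01-01 (0x19DB1DED53E8000). */
#define EPOCH_BIAS UINT64_C(116444736000000000)

/* FILETIME counts 100 ns units, so 1e7 of them make one second. */
#define FILETIME_UNITS_PER_SEC 10000000.0

static double filetime_to_unix_nv(uint64_t filetime)
{
    /* Integer subtract first, so no precision is lost before the divide. */
    const uint64_t since_1970 = filetime - EPOCH_BIAS;
    return (double)since_1970 / FILETIME_UNITS_PER_SEC;
}
```

No struct timeval is built, so there is no split into seconds and microseconds that the compiler would later have to recombine.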

  • This set of changes requires a perldelta entry, and it is included.
  • This set of changes requires a perldelta entry, and I need help writing it.
  • This set of changes does not require a perldelta entry.

bulk88 added 20 commits June 27, 2025 13:31
More efficient. This is a static, there are no binary compat concerns.
The dTHX is from initial commit of hrstatns() in commit:

75d5269 - Steve Peters - 10/13/2006 10:11:04 AM
Upgrade to Time-HiRes-1.92.
-XPUSHs() requires saving the SV* retval of sv_2mortal(newSVsv()) around
 a possible Perl_stack_grow(), split the EXTEND from the PUSH, so SV*
 is held only in volatile registers (liveness).
-over EXTEND to 13 elements instead of 1 element.
 Why not? pp_stat()/pp_lstat() have to do the Perl_stack_grow() call if we
 don't do it.
-remove Zero() macro and use a function call free struct initializer.
 Just b/c GCC and its offshoot Clang will inline a fixed length memset()
 doesn't make it part of ISO C. MSVC compiler never inlines memset() calls
 on WinPerl (b/c P5P never added the magic sauce to ask for that feature).
 More portably, P5P has never verified the machine code output of all
 known commercial Unix CCs on all CPU archs regarding inlining memset().
-when filling out the fake OP, do some instruction level parallelism like
 filling in fakeop with 0s, while digging through my_perl->Iop->op_flags,
 my_perl->Icurstackinfo->si_cxsubix, my_perl->Icurstackinfo->si_cxstack,
 and etc, as part of GIMME_V macro, which used to be a libperl.so exported
 function call a very long time ago IIRC.

 Another example, translate ix?OP_LSTAT:OP_STAT while translating
 gm==G_LIST?OPf_WANT_LIST:gm==G_SCALAR?OPf_WANT_SCALAR:OPf_WANT_VOID.

 Dig through PLT/GOT/PE sym table as part of PL_ppaddr[op_type]
 while writing to C stk mem as part of fakeop.op_type = op_type

-change fakeop.op_ppaddr(aTHX); to ppaddr(aTHX); b/c some CCs have a low
 IQ and can't prove statement "PL_op = &fakeop;" won't modify field
 fakeop.op_ppaddr in our C auto storage OP struct var.
-don't execute the Perl_sv_2uv_flags() getter method pointlessly inside
 UV atime = SvUV(ST( 8)); if static function hrstatns() is a NOOP and
 is inlined and totally optimized away since in some build configs,
 hrstatns() only does atime_nsec = 0; mtime_nsec = 0; ctime_nsec = 0;
 Windows is an example.
-change SvUV(ST( 8)); to SvUV(SPBASE[ 8]); don't deref my_perl->Istack_base
 over and over
-strings "Time::HiRes::clock", "Time::HiRes::clock_nanosleep", etc will be
 inside HiRes.dll.so no matter what, b/c BOOT: and newXS_flags() requires
 them no matter what
-type NV_DIE is an un-invasive LOC-wise quick fix to get rid of the tons of
 EU::PXS injected dXSTARG; statements which execute Perl_sv_newmortal()
 right before executing

   croak("%s(): unimplemented in this platform","Time::HiRes::clock");

 The retval types could be changed to void, or SV* instead, to eliminate
 the Perl_sv_newmortal() before croak() calls. But for some hysterical
 raisins, Time::HiRes.xs is confusing Perl_warn() with Perl_croak() in
 dozens of places. Fixing that is out of scope for this patch.
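The "function call free struct initializer" point from the list above can be sketched as follows. This is a simplified illustration: fake_op and its fields only loosely mirror Perl's OP struct, and make_fake_op() is a hypothetical helper, not code from HiRes.xs.

```c
#include <stddef.h>

/* Simplified stand-in for the fake OP built on the C stack. */
typedef struct fake_op {
    struct fake_op *op_next;
    void *(*op_ppaddr)(void);
    unsigned op_type;
    unsigned char op_flags;
} fake_op;

static fake_op make_fake_op(unsigned type, unsigned char flags)
{
    /* Designated initializer: members not named here are zero-initialized
     * by the language itself, with no memset()/Zero() call in the source. */
    fake_op op = { .op_type = type, .op_flags = flags };
    return op;
}
```

Whether a given compiler emits better code for this than for an inlined memset() is something to check in its machine code output, as the commit message says.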
-It is a boot time constant. It will not change without a motherboard or
 CPU swap and then rebooting. The actual 64 bit integer returned, reflects
 if the NT Kernel wants to use Intel's APIC Timer or Intel's 8253/8254 PIT
 Timer, or Intel's RDTSC instruction. NT Kernel will only use RDTSC backend
 if both the CPU and Northbridge swear upon a holy book, that they will
 fire an interrupt at every Intel/AMD SpeedSwitch/TurboBoost transition.
 The dynamic CPU speed correction factor logic lives inside the machine
 code of QueryPerformanceCounter(). Not inside QueryPerformanceFrequency()
 which has been part of MS's frozen Public API since 1993.
-the test:

 if (!QueryPerformanceFrequency(&l_tick_frequency)) croak("WT???");

 can probably be removed one day, only Win2K or NT4 or Win95/98, running
 on any 32-bit CISC or 32-bit RISC CPU arch, are capable of retval FALSE.
 The test is added out of paranoia. IDK what in real life on real HW can
 cause retval FALSE.
-calc and save var unsigned __int64 qpc_res_ns; and
 unsigned __int64 qpc_res_ns_realtime; exactly once instead of re-calcing
 in the runloop, why not? HiRes.dll's .data section is only 0x650 bytes
 long and granularity is 0x1000/4096 bytes.
-the BOOT: initialization code of the 3 true C static global vars, is
 written, to assume 2 ithreads, or 2 my_perl ptrs, or 2 different
 embedding consumers of perl5XX.dll inside 1 OS process, can
 simultaneously call DynaLoader::bootstrap() or Time::HiRes::bootstrap()
 on 2 different CPU cores. This is unrealistic paranoia IMO, but CPU op
 lock xchg reg, [addr]; and mov [addr], reg; are both 7 bytes long.
 Maybe Windows >= 8.0 on ARM32/ARM64, want their memory fence/barrier
 formalities writing to an aligned 64 bit integer. So why not?
-#define S_InterlockedExchange64(_d,_s) has S_ prefix, so no assumptions
 are made on MSVC and Mingw GCC, if InterlockedExchange64() is a macro or
 a symbol. Any age, any version, any build number, any FOSS project code
 owner, or any FOSS binary packager, of those 2 C compiler families.
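A portable sketch of the "publish the boot-time constant with an atomic exchange" idea, using C11 <stdatomic.h> as a stand-in for InterlockedExchange64(); g_tick_frequency and publish_tick_frequency() are hypothetical names. If two threads race through BOOT: they both store the same measured value, so the race is harmless.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Process-wide boot-time constant, shared by every interpreter. */
static _Atomic uint64_t g_tick_frequency;

/* Store the measured frequency with full ordering and return what is now
 * visible. Calling it again with the same value is idempotent. */
static uint64_t publish_tick_frequency(uint64_t measured)
{
    (void)atomic_exchange(&g_tick_frequency, measured);
    return atomic_load(&g_tick_frequency);
}
```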
Macros SvIV()/SvNV()/SvUV() contain getter function calls. Don't execute
the getters, if we will croak() no matter what. The end user doesn't need
to see an "Uninitialized variable in" STDERR warning right before
croak("unimplemented"); executes. Same goes for SvGETMAGIC() methods firing
right before croak("unimplemented");

I picked "int die_t" vs "int_die_t" so IDE syntax highlight keeps working
on token "int".
…ime()

To summarize, MS's FILETIME type is an 8-byte-long, 64-bit integer that
might be aligned to 4 bytes, not 8.

SW E-Attorneys, will vigorously argue, MS's FILETIME type, is an 8 byte
long C struct, wrapping a union that wraps a U8 array[8]; string that is
8 bytes long. Claiming type FILETIME is a 64 bit int is libel and slander.

Since P5P does not publish a C compiler or C linker, that alignment detail
for Windows on RISC machine code is irrelevant.

This commit was written to prevent redundant re-reads of a C auto U64
from C stack memory to a CPU register around any possible function call,
if they exist, and to narrow down the peak width of each caller function's
callstack frame on the C stack.
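A small sketch of treating the two 32-bit FILETIME halves as one 64-bit value without assuming 8-byte alignment. filetime_parts is a hypothetical mirror of the dwLowDateTime/dwHighDateTime layout, declared here so the example builds without <windows.h>.

```c
#include <stdint.h>

/* Mirrors FILETIME's layout: two 32-bit halves, 4-byte alignment. */
typedef struct {
    uint32_t low;   /* corresponds to dwLowDateTime */
    uint32_t high;  /* corresponds to dwHighDateTime */
} filetime_parts;

/* Combine once into a local 64-bit value; from here on the compiler is
 * free to keep it in a register instead of re-reading stack memory. */
static uint64_t filetime_to_u64(const filetime_parts *ft)
{
    return ((uint64_t)ft->high << 32) | ft->low;
}
```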
…zers

-all C branches/CPP branches in these 2 XSUBs return and set "int status"
-remove align padding bytes from struct my_cxt_t{}.
 unsigned long run_count; is always 4 bytes, the other 3 members are
 always 8 bytes
-cleanup ABI/machine code gen of Win32-only static fn _gettimeofday()
 It never leaves this TU as a fn ptr. MSVC 2022 -O1/-O2 optimizer can only
 create uninitialized reg/C stk "holes" for args that are unused in
 all callers and unused in callee. It can't shift left or collapse any
 registers/C arguments that are unused on both sides, in 1 TU, even if no
 fn ptr is taken in a static function. The new macro remains POSIX-like.
-In _GetSystemTimePreciseAsFileTime(), immediately copy contents of our
 " &C_auto_u64 " var, to a new C auto var, so the 64-bit value
 "outputs" or pseudo-retvals of the MS Win API funcs, can be manipulated
 for the rest of the function's body, completely in CPU registers, with 0%
 chance of re-reading or pointlessly writing back to the C stack memory
 address.
-Do the same for _gettimeofday_x() when _gettimeofday_x() calls the MS
 public Win API funcs.
-Inside _GetSystemTimePreciseAsFileTime(), hoist/combine/factor out the 2
 different callsites of QueryPerformanceCounter() to the root block.
 All branches will execute QueryPerformanceCounter() anyways. MSVC 2022
 refused to hoist the QueryPerformanceCounter() call, around the statement

 if(MY_CXT.run_count++==0
    ||MY_CXT.base_systime_as_filetime.ft_i64>MY_CXT.reset_time){

-add PERL_STATIC_FORCE_INLINE for static funcs like _clock_gettime() that
 have exactly 1 caller/callsite, usually this is an XSUB function with a CV*
 argument.
-add PERL_STATIC_FORCE_INLINE to _gettimeofday(), even though it has
 8 different callers/callsites. The reason is because _gettimeofday() has
 a huge amount of U64 math at its bottom. All the callers then do a huge
 amount of mostly FP NV/double math, before saving the final NV value to a
 SV* with NOK_on. To allow the CC to optimize/combine/simplify these 2
 large groups of U64 math and NV math, they must be in the same function.
 So add PERL_STATIC_FORCE_INLINE to _gettimeofday().
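The hoisting item in the list above can be sketched like this: when every branch reads the counter anyway, read it once in the root block. fake_counter() stands in for QueryPerformanceCounter(), and clock_state and elapsed_ticks() are hypothetical names for illustration only.

```c
#include <stdint.h>

/* Test double for the OS counter; counts how often it is called. */
static uint64_t fake_counter_calls;

static uint64_t fake_counter(void)
{
    return ++fake_counter_calls * 100;
}

typedef struct {
    uint64_t run_count;
    uint64_t base_ticks;
} clock_state;

static uint64_t elapsed_ticks(clock_state *st)
{
    /* Hoisted: one call site in the root block instead of one per branch. */
    const uint64_t now = fake_counter();

    if (st->run_count++ == 0) {
        st->base_ticks = now;   /* first run: establish the baseline */
        return 0;
    }
    return now - st->base_ticks;
}
```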

…erefs

-each reference to a global var like qpc_res_ns or tick_frequency is 7
 bytes in machine code, or a couple more bytes than 7. Since BOOT:{}
 runs only once, and the chance 2 parallel BOOT:{} XSUBs in 2 different
 my_perls is almost zero, and even if there are 2 parallel OS threads
 executing, 1 OS thread isn't going to help shave time off the 2nd OS thread.
 So to reduce the number of 7 byte opcodes that are reading from the
 global vars, maximize C auto vars as much as possible.
 QueryPerformanceFrequency() internally on Win7 is around 1-3 ptr derefs
 into NT's "VDSO" aka KUSER_SHARED_DATA. On Win2k, QPF() is a ring 0 call.
-slide indent level to the left b/c the Win32 code block is nested too
 deep and almost every statement would exceed 80 chars
-cache PL_modglobal to a register, PL_modglobal is a big U32 offset 0x698
 into my_perl struct " 48 8B 9F 98 06 00 00 mov rbx, [rdi+698h] "
… COW

-we don't need to map values 0/1 to OP_STAT/OP_LSTAT at runtime, it can be
 done once at CC time / BOOT:{} time
-IDK why $_[0] is being duped, the pp_stat*() functions aren't supposed to
 modify incoming @_ args, but if we are going to dupe $_[0], at least try
 to use COW semantics if available
croak("%s(): unimplemented in this platform", "Time::HiRes::ualarm");

This can be estimated at 6 + 7 + 7 = 20 bytes of machine code on Intel.
My guess on a RISC CPU is 3 * 2 * 4 = 24 bytes.
On any CPU arch, the asm code will look like:

mov rel_U32; mov rel_U32; call rel_U32;

So create a dedicated static croak func, so these unimplemented stubs are
smaller, and will look like:

mov reg, reg; call rel_U32;

RISC: 4 + (4 || 8)
Intel: 3 + (5 || 6)
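A sketch of the "one dedicated helper instead of a format string at every stub" idea, in plain C so it builds without perl.h. format_unimplemented() is a hypothetical name; in XS the shared helper would end in croak() rather than fill a buffer.

```c
#include <stdio.h>

/* Each stub passes only the sub name; the format string lives in one place.
 * Returns the message length, as snprintf() does. */
static int format_unimplemented(char *buf, size_t size, const char *sub_name)
{
    return snprintf(buf, size, "%s(): unimplemented in this platform",
                    sub_name);
}
```

Each call site then loads one argument instead of two, which is where the per-stub byte savings estimated above come from.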
-gettimeofday() EXTEND is only needed if > 1 retval b/c pp_entersub promises
 @_ 1 slot, lift C stack memory var values to registers, this way if
 gettimeofday() is a static P5P written polyfill, and if the CC decides
 to inline it, the struct timeval Tp; C stack var will optimize away
-setitimer() min 2 incoming args + PPCODE: is proof we have at least 2
 retval slots
-getitimer() 1 in arg + PPCODE: is proof we have at least 1 retval slot
-utime() don't execute SvNV() over and over, don't exec sv_2io() 2x,
 add SvPV_const() for anti-de-COW future-proofing
-I measured S_croak_xs_unimplemented() at 0x88 bytes of MSVC 2022 -O1 x64
 machine code. The optimization probably isn't worth it if break even is
 0x88/(7*3) = 6.47 unimpl stubs. Just use exported function cv_name(),
 we don't need to perfectly match croak_xs_usage()'s text/logic.
…prmt)

-TMHR has a fancy Perl maintained Win32 high precision GTOD() polyfill impl
 inside it. But it can't be used for actual benchmarking by CPAN authors
 b/c it does a very slow Perl_get_context() call every time to get access
 to MY_CXT struct. So add a pTHX_ version of myNVtime(). Add tests that
 prove TMHR's C level public API for CPAN authors actually exists and
 works. Nothing inside the P5P repo, ever tries to use TMHR's C level
 Time::HiRes::myNVtime / Time::HiRes::myU2time function pointers.
-The 3 XSUBs for calling the TMHR C func ptrs, really should be in a
 new .xs file inside ext/XS-APItest/ called "benchmark.xs" or
 "noplgetcxt.xs" that has #define PERL_NO_GET_CONTEXT at the top, UNLIKE
 all the other XS-APItest .xs files, which try to prove the very slow
 ithreads-unaware CPAN XS legacy src code compat mode actually works.
-POK and SvPVX() store the 2nd fn ptr, in the same SV*, POK flag can be
 used by CPAN XS authors to separate old TMHR releases w/o the new fn ptr
 from new TMHR releases that have it. NOK and SvNVX() and using
union _xnvu {
    NV	    xnv_nv;
    HV *    xgv_stash; <<<<<<<<
    line_t  xnv_lines;
    bool    xnv_bm_tail;
};
 is an alternative design, but I went with POK and SvPVX, because even with
 SvREADONLY(), I have paranoia, some C code on some OS on some CPU arch
 somewhere, will do a random
     read -> round_and_or_fire_IEEE_OS_signals -> write to SvNVX()
 operation on the SvNVX() slot, for no good reason, b/c of
 academic purity/standards body compliance/ABI requirements of
 that CPU/OS arch, and the function ptr is now gibberish, or was converted
 from a denormal NaN to a normal NaN or SIG_DIV0-ed.
-future expansion provision exists, if SvPOK_on && SvCUR() > sizeof(void*),
 SvPVX() is now a pointer to a C struct/C array, with the 1st 4/8 bytes
 being a header, and not a fn ptr.
-TODO return by copy version of Time::U2time fn ptr, more efficient on
 certain ABIs (__vectorcall/SysV) that allow 128 bit structs/arrays to
 be returned in 2 registers back to the caller, and not secret pointers
 as a secret 1st arg
-reason, make these XSUBs as fast as possible so these XSUBs are more
 accurate for benchmarking, or contribute less overhead to the final
 numeric time deltas vs the time of whatever PP code was being measured
 The sv_newmortal()+sv_set_i_u_n_v_mg() permutation is unacceptable.
 Stepping into sv_upgrade() is unacceptable to do SVt_NULL->SVt_IV.
-TMR_TARG***(rsv, RETVAL, 1); macros could be further optimized here vs
 pp.h's impl of TARG***(RETVAL,1), but that is left for the future.
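A stand-alone model of the "function pointer stored in the string slot" design from the list above. slot is a hypothetical stand-in for an SV with a POK flag, SvCUR() length and SvPVX() buffer; none of these names are Perl API.

```c
#include <stddef.h>
#include <string.h>

typedef double (*nvtime_fn)(void);

/* Models an SV: pok ~ SvPOK, cur ~ SvCUR(), bytes ~ SvPVX(). */
typedef struct {
    char   bytes[sizeof(nvtime_fn)];
    size_t cur;
    int    pok;
} slot;

/* Copy the pointer's bytes in; string storage is never "rounded" the way
 * a floating point slot might be. */
static void slot_store(slot *s, nvtime_fn fn)
{
    memcpy(s->bytes, &fn, sizeof fn);
    s->cur = sizeof fn;
    s->pok = 1;
}

/* Old releases leave pok unset; a longer cur is reserved for a future
 * struct header. Both cases yield NULL here. */
static nvtime_fn slot_load(const slot *s)
{
    nvtime_fn fn = NULL;
    if (s->pok && s->cur == sizeof fn)
        memcpy(&fn, s->bytes, sizeof fn);
    return fn;
}

/* Example callee used to show the round trip. */
static double fixed_time(void)
{
    return 42.5;
}
```

A consumer checks for NULL and falls back to the older, slower entry point when the new pointer is absent.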
…n loss

-add NV retval variants nv_gettimeofday() and
 nv_clock_gettime(clock_id, &status), the splitting of the solo U64,
 into 2 IVs/UVs (64b IVs/UVs on my system), then recombining those 2 integers
 with integer or FP double logic, was very messy and verbose machine code
 and no, MSVC didn't "algebra" const fold away the splitting and recombining
 logic, so just create polyfills that always return NVs from the start
-do "- ((U64)EPOCH_BIAS" with U64 logic, for maximum chance of
 no rounding/no precision loss, then do division with FP logic for maximum
 fractional number precision
-"NV nv = nv_clock_gettime(clock_id, &status);" is inlined away,
 var bool status; has no C stack or register representation in mach code
 with MSVC 2022 -O1. Returning a pass by copy
 struct {NV nv; bool success;}; was considered, but never tried, b/c of
 Win64 AMD64 ABI's "rule" of all retval types > 8 bytes become secret ptrs
 and a secret 1st arg. Maybe MSVC would inline and fold away the struct,
 maybe it would not. I didn't try it. Current impl is working as intended.
-remember, nv_clock_gettime() still needs to reject junk values in clock_id
-add tick_frequency_nv, so U64 -> NV is done 1x at startup, not in the
 run loop
-S_croak_xs_unimplemented(const CV *const cv) silence CC warning, cv_name()
 doesn't want a const CV* head struct
-EU::Constant already has all these AUTOLOAD macro const C strings in the
 binary, and they aren't going away any time soon. So use those C strings
 to make SVPV HEK* COWs, and stick them in @EXPORT_OK, instead of
 @EXPORT_OK holding SVPV Newx() non-COW strings. Besides, most or all
 of these C strings will become HV* stash HEs, CV*s, or GV*s, and all
 of those hold PL_strtab HEK*s, so let's save private bytes phy/virtual
 memory of a Perl proc at runtime b/c @EXPORT_OK's SV*s are all COWs.
 And speed up Time::HiRes initial load time since yylex/ck_op*() doesn't
 have to parse, alloc OPs, alloc pad consts, then run BEGIN, then DTOR
 all the OPs and pad consts.
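A sketch of the "NV return value plus status out-parameter" shape described for nv_clock_gettime(), written against C11 timespec_get() so it runs off Windows. nv_clock_gettime_model() and MODEL_CLOCK_REALTIME are hypothetical; the real function's clock ids and backends differ.

```c
#include <stdbool.h>
#include <time.h>

enum { MODEL_CLOCK_REALTIME = 0 };  /* the only clock this model knows */

/* Returns seconds as a double and reports success through *ok.
 * Unknown clock ids are rejected instead of being passed along. */
static double nv_clock_gettime_model(int clock_id, bool *ok)
{
    struct timespec ts;

    if (clock_id != MODEL_CLOCK_REALTIME
        || timespec_get(&ts, TIME_UTC) != TIME_UTC) {
        *ok = false;
        return -1.0;
    }
    *ok = true;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}
```

Keeping the status in a separate bool avoids returning a struct larger than 8 bytes by hidden pointer, which is the ABI concern raised above.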
@bulk88 bulk88 force-pushed the timehires_cleanup branch from bc456d4 to 02c65aa Compare June 27, 2025 23:52