-
Notifications
You must be signed in to change notification settings - Fork 584
Mro dot xs cleanup #23385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: blead
Are you sure you want to change the base?
Mro dot xs cleanup #23385
Conversation
-less machine code on threaded perls for branches that will never execute in normal production code, each aTHX_, is a CPU's mov, push, load, or store machine code op (3-4 bytes typically, worst case 7 or 8 bytes).
…sh_U32 Perl_mro_register(aTHX_ &dfs_alg); will remember and store ptr &dfs_alg forever in a SVUV. So do not keep struct mro_alg'es in const RO memory, and fill in the U32 hash at perl proc start or shared lib attach time. Other parts of the interp will deref and pass struct member "struct mro_alg->hash" to hv_common(). PERL_HASH(hash, "c3", (sizeof("c3")-1)); was specially written instead of PERL_HASH(hash, c3_alg.name, c3_alg.length); because PERL_HASH() is inlineable, and I've seen MSVC 2022 -O1 do heavy inlining and heavy const folding of sbox32_hash_with_state(), if it knows the constant chars in a string, and the constant length of the string, ahead of time during CC .c->.obj phase. The machine code of a very heavy const fold of sbox32_hash_with_state(), superficially looks more like an inlined fixed length memcpy() or a struct to struct = assignment, than some kind of fancy "hash" or "cryptographic" or jpg/mpeg algorithm. I will assume "U32 hash = c3_alg.hash; if (hash == 0) {" and "c3_alg.hash = hash;" are atomic/multi-core safe. Aligned R/Ws to types U8/U16/U32 on all CPUs, and U64 on 64b, are supposed to be atomic safe on all CPUs. Aligned U32 R/Ws in C are never emulated with 4 separate U8 or 8 U4 R/W ops on any commercial CPU.
-HEK* cache str "UNIVERSAL" its heavily used inside the Perl VM, and take advantage of shared HEK* to HEK* comparison loop optimization inside hv_common -HEK* cache str "dfs", it is ineligible for COW 255 since its under 9 chars long -mro_get_mro() cleanup retval, PPCODE: moves SP back to 0, we push 1 elem then EUPXS does a PUTBACK for us, this removes alot of PL_stack_base derefs - av_push_simple(ret_array, newSVsv(hv_iterkeysv(iter))); is created 2 SV*s for no good reason, someone earlier never read the api docs for what hv_iterkeysv() does, or looked at its internals -mXPUSHs -> mPUSHs, X not needed, we bounds checked if (items != 1) and rewind with PPCODE earlier -switch to SvPV_const(), COW future proofing -hv_existss(isarev, "UNIVERSAL") switch to SVPV HEK* for faster lookups and precalced U32 hash -mro_get_pkg_gen() just use EU::PXS's built in IV retval logic instead of DIYing it, newer EU::PXSs use dXSTARG; and TARGi,(); optimization -mro_invalidate_all_method_caches() do @_ logic as early as possible croak_xs_usage() doesn't care if MARK was popped, or how much forwards or backwards the SV** inside PL_stack_sp is. Doing this allows the smallest machine code possible on any CPU arch, since the offset literals into my_perl, or unthreaded offsets into libperl.so.dll, that do the reads and writes are very close together (RISC CPUs might need a dedicated load const litteral integer to register op, to reach around inside my_perl or libperl.so.dll, they can't put a variable length 1-4 byte opcode + (U16 || U32) litteral into 1 RISC op. ARM's limit is U12 for example.
- newSVsv()/newSVsv_flags() will not propagate a SVPV HEK* COW for us b/c we are not PERL_CORE, so manually detect SVPV HEK* COWs and send them to newSVhek(), probably faster too than the forest of logic trees inside sv_setsv_flags(). GMG test is paranoia.
-S_mro_get_linear_isa_c3() dont call av_count() many times -the fn call mro_newSVsvhekok(), or any ISO C fn call located between static inlines newAV_alloc_xz() and av_push_simple() severely degrades or destroys any chance of anything const folding or any thing being removed by the CC for common sub exp elimination AV* isalin = newAV_alloc_xz(4); av_push_simple(isalin, mro_newSVsvhekok(classname)); Changing it to SV* nsv = mro_newSVsvhekok(classname); AV* isalin = newAV_alloc_xz(4); av_push_simple(isalin, nsv); fixes the problem. The CC is now allowed to fuse bottom half of newAV_alloc_xz() and top half of av_push_simple(). -"const I32 throw_nomethod = SvIVX(ST(1));" silence CC truncate warning
extra fields were added in commit fa60396 but looks to me like NC forgot the click save, or didn't stage a hunk, or a patch file/commit got lost on his PC, since he added all the pieces for the optimization, except for writing in the U32 hash value. the structs original were R/W static global memory see https://github.com/Perl/perl5/blame/123892d916359369839f3896f283189be71dc32c/mro.c then around the same time/(same branch?) of the commit that added the optimization, a design change happened, and they got marked RO static global |
New windows, new bathrooms, new kitchen, new landscaping, new swimming pool. Watch XSess Factor Flipping tonight at 8PM.
See the individual commits for details. Rational, since most Ctrl-C presses SEGV WinPerl, by chance I SEGVed inside one of mro.xs's .t, so I cancelled the Ctrl-C event, and started single stepping around. It needed cleanup. @_ macros are all wrong,
PUSH()
family andXS_RETURN()
family macros are never to be mixed.PUTBACK
is forPUSH()
family, or fall off the end and let EU::PXS do the PUTBACK. you dont useXS_RETURN()
. WritingST(1); ST(1);
in any XSUB is wrong. mro.xs needs cleanup. mro.xs's .t's are kindda longish, to run too. and the comments "there will never be aSV
* with aHEK*
PVX buf here, so we will duplicate it into aSV *
with a Newx() PVX buf" those comments were wrong. I BPed those areas, and I sawSVPV HEK * COWs
there, not SVPV Newx bufs. checkingSvLEN == 0
value butSVCUR > 0
proves it.Also why does the pluggable MRO algo provider struct, have a U32 hash field, but its frozen to forever bc that gbl struct is sitting in const RO global memory? Fill it in if its there. I'm was planning not to git blame who decided to add the const tag to the 2 structs and forgot to fill in the last c struct field, but i think i will do that now.