Skip to content

615 use value sort for evaluation #625

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 27 commits into
base: master
Choose a base branch
from

Conversation

jberthold
Copy link
Member

@jberthold jberthold commented Jul 1, 2025

Large refactoring to have TypedValue only in the <locals> list, while operating only with Values when we read, write, or compute with them.

  • Evaluation sort has result Value now

  • Projection traversal and helper functions recurse and return Value now

  • A few places where type information is needed for processing (pointer casts, for one) are now using a function that apply projections to type metadata.

  • The Moved marker was previously a TypedValue but is now just another Value. This is in order to not lose type and mutability when values are moved and the local is used again (typically in loops)

  • I added some Ceil rules that appeared useful to avoid ceil conditions on symbolic arguments.

  • I had to add an interim-simplification option to the server to work around a problem with symbolic-args-fail.main. Without adding that, the server would grow an enormous memory footprint during the proof. I experimented with the request in question but could not find out where exactly the memory is allocated. It is not Haskell heap data, so probably the LLVM backend allocates it somehow.

jberthold added 27 commits June 24, 2025 16:41
@jberthold jberthold marked this pull request as ready for review July 1, 2025 12:40
@jberthold jberthold requested review from dkcumming and ehildenb July 1, 2025 12:40
@@ -64,6 +64,7 @@ def kcfg_explore(self, label: str | None = None) -> Iterator[KCFGExplore]:
llvm_definition_dir=self.llvm_library_dir,
bug_report=self.bug_report,
id=label if self.bug_report is not None else None, # NB bug report arg.s must be coherent
interim_simplification=200, # working around memory problems in LLVM backend calls
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This interim-simplification option was added to the server to work around a problem with symbolic-args-fail.main. Without adding that, the server would grow an enormous memory footprint during the proof. I experimented with the request in question but could not find out where exactly the memory is allocated. It is not Haskell heap data, so probably the LLVM backend allocates it somehow.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the next steps to track the LLVM problem down? Should we reach out to another team about it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's the same issue that Kontrol is running into with munmap?

Copy link
Collaborator

@dkcumming dkcumming left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It all looks great. I wondered if we would see a performance increase, but it doesn't look like it. Do you think that is to do with the LLVM memory problems?

@@ -543,7 +543,7 @@ An operand may be a `Reference` (the only way a function could access another fu
rule <k> #setArgFromStack(IDX, operandMove(place(local(I), .ProjectionElems)))
=>
#setLocalValue(place(local(IDX), .ProjectionElems), #incrementRef({CALLERLOCALS[I]}:>TypedValue))
#setLocalValue(place(local(IDX), .ProjectionElems), #incrementRef(getValue(CALLERLOCALS, I)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use get get_value can we remove the side condition on this rule since it will be checked in get_value? Or will that cause branching?

Comment on lines +65 to +77
```k
// data coerced to sort TypedLocal is not undefined if it is of that sort
rule #Ceil({X}:>TypedLocal) => #Ceil(X)
requires isTypedLocal(X) [simplification]
// data coerced to sort Value is not undefined if it is of that sort
rule #Ceil({X}:>Value) => #Ceil(X)
requires isValue(X) [simplification]
// TypedLocals created by the semantics do not have undefined subterms
rule #Ceil(LIST:List[IDX]) => #Top
requires 0 <=Int IDX andBool IDX <Int size(LIST) [simplification]
```
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I trust these make sense with the conversation had the other day. I am confused with why the top two don't create an infinite loop though as it seems the RHS is valid input for the LHS

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LHS has the extra symbol {_}:>_, which is a cast to a lower sort. It's actually an explicit symbol in K, it's not something that's just there to inform the parser/type-checker.

When applying simplification rules in eth haskell backend, it won't try an unify on variables. So it won't try and say "what if the X on the RHS was also another semantic cast of Y?" for example. It only does that potential unification on semantic rules.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jberthold should the last rule check the definedness of the arguments to the lookup:

  // TypedLocals created by the semantics do not have undefined subterms
  rule #Ceil(LIST:List[IDX]) => #Ceil(LIST) #And #Ceil(IDX)
    requires 0 <=Int IDX andBool IDX <Int size(LIST) [simplification]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jberthold should the last rule check the definedness of the arguments to the lookup:

  // TypedLocals created by the semantics do not have undefined subterms
  rule #Ceil(LIST:List[IDX]) => #Ceil(LIST) #And #Ceil(IDX)
    requires 0 <=Int IDX andBool IDX <Int size(LIST) [simplification]

That would be good, yes.
This rule is probably too generous because the list may be well-defined but still contain something undefined at the given defined index. I can try what happens without the rule.

In particular, if we have pointer arithmetic with abstract pointers (not able to be resolved into concrete ints/bytes directly), then it will wrapper the operations in a `thunk`.
It is also useful to capture unimplemented semantic constructs so that we can have test / proof driven development.

```k
syntax Value ::= thunk ( Evaluation )
rule <k> EV:Evaluation => typedValue(thunk(EV), TyUnknown, mutabilityNot) ... </k> requires notBool isTypedValue(EV) [owise]
rule <k> EV:Evaluation => thunk(EV) ... </k> requires notBool isValue(EV) [owise]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an Evaluation := TypedValue | Value should this not be requires notBool isValue(EV) andBool notBool isTypedValue(EV)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do have the rule above:

rule <k> typedValue(VAL, _, _) => VAL ... </k> [owise]

But I can see these two rules conflicting. So either adding the extra condition (like you say), or changing the priorities so the rule above fires first.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Evaluation hsa more subsorts than TypedValue and Value. The [owise] priority should ensure that this rule only gets applied when no other rule can.

Comment on lines +197 to +203
// // FIXME this becomes setmoved
// rule <k> #setLocalValue(place(local(I), .ProjectionElems), Moved) => .K ... </k>
// <locals>
// LOCALS => LOCALS[I <- typedValue(Moved, tyOfLocal(getLocal(LOCALS, I)), mutabilityOf(getLocal(LOCALS, I)))]
// </locals>
// requires 0 <=Int I andBool I <Int size(LOCALS)
// [preserves-definedness] // valid list indexing checked
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be commented out or deleted?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this, I'll look into it. Setting Moved is now very different from what it was before, probably we can just delete this.

Comment on lines +577 to +591
rule <k> rvalueCast(CASTKIND, operandCopy(place(local(I), PROJS)) #as OPERAND, TY)
=> #cast(OPERAND, CASTKIND, getTyOf(tyOfLocal({LOCALS[I]}:>TypedLocal), PROJS, TYPEMAP), TY) ... </k>
<locals> LOCALS </locals>
<types> TYPEMAP </types>
requires 0 <=Int I andBool I <Int size(LOCALS)
andBool isTypedLocal(LOCALS[I])
[preserves-definedness] // valid list indexing checked
rule <k> rvalueCast(CASTKIND, operandMove(place(local(I), PROJS)) #as OPERAND, TY)
=> #cast(OPERAND, CASTKIND, getTyOf(tyOfLocal({LOCALS[I]}:>TypedLocal), PROJS, TYPEMAP), TY) ... </k>
<locals> LOCALS </locals>
<types> TYPEMAP </types>
requires 0 <=Int I andBool I <Int size(LOCALS)
andBool isTypedLocal(LOCALS[I])
[preserves-definedness] // valid list indexing checked
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These look the same, but do we need to write moved the local slot for operandMove, or do we do that at a different point?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a bit clunky but I wanted to exclude OperandConstant which would probably not make much sense for a cast, and if it was it would still need its own rule because there is no Place and the Ty of the constant needs to be read directly rather than using getTyOf.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "move" of the operand happens when the operand is read. #cast is declared strict(1) so the operand will be heated and written back as soon as it is evaluated to a Value - which does the Moved

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These do look the same to me as well, other than operandCopy and operandMove. I guess the issue is that we need to get variables I and PROJS out, which we use the #as pattern for. Maybe we could make projection funcitons for those elements instead, so that we could have a single rule for both of these cases?

It's probably not important to include in this PR.

A special function traverses type metadata along the applied projections to achieve this (using the type metadata map `Ty -> TypeInfo`).

```k
// should this be in types.md?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the sound of that, types.md is pretty small and data.md is getting large

@@ -64,6 +64,7 @@ def kcfg_explore(self, label: str | None = None) -> Iterator[KCFGExplore]:
llvm_definition_dir=self.llvm_library_dir,
bug_report=self.bug_report,
id=label if self.bug_report is not None else None, # NB bug report arg.s must be coherent
interim_simplification=200, # working around memory problems in LLVM backend calls
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the next steps to track the LLVM problem down? Should we reach out to another team about it?

Comment on lines -7 to +15
│ #expect ( typedValue ( BoolVal ( notBool ARG_UINT1:Int +Int ARG_UINT2:Int &Int 1
│ #expect ( BoolVal ( notBool ARG_UINT1:Int +Int ARG_UINT2:Int &Int 18446744073709
┃ (branch)
┣━━┓ subst: .Subst
┃ ┃ constraint:
┃ ┃ notBool notBool ARG_UINT1:Int +Int ARG_UINT2:Int &Int 18446744073709551615 ==Int ARG_UINT1:Int +Int ARG_UINT2:Int ==Bool false
┃ │
┃ ├─ 4
┃ │ #expect ( typedValue ( BoolVal ( notBool ARG_UINT1:Int +Int ARG_UINT2:Int &Int 1
┃ │ #expect ( BoolVal ( notBool ARG_UINT1:Int +Int ARG_UINT2:Int &Int 18446744073709
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes confuse me a bit, I would expect the typedValue( to be gone now, but am surprised that the 1 changes to a different number. Any reason for that? Did another simplification get applied, or one got missed?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I didn't notice that. I assume the test didn't change so it seems wrong

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants