Skip to content

Partition Operation enum #223

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

sellout
Copy link
Contributor

@sellout sellout commented May 22, 2025

This splits Operation into three separate enums – Control (for
if/else, which get executed regardless of vexec and disabled
operations, which always fail, regardless of vexec), Normal, for
operations that respect vexec, and Unknown for undefined opcodes
which only fail if they’re on an active branch. This is done so that the
evaluator can be split on the same lines.

It also integrates the values with the PushValue opcodes.

And it exposes two modules (op and pv) to allow a separation between internal opcode structure for the implementation and the interface consumers (PCZT, etc.) use.

This depends on #221 and #222.

@mpguerra mpguerra added this to Zebra Jun 2, 2025
@mpguerra mpguerra moved this to Review/QA in Zebra Jun 2, 2025
Copy link
Contributor

@str4d str4d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flushing comment from a partial review as of f0753c5.

@github-project-automation github-project-automation bot moved this from Review/QA to In progress in Zebra Jun 24, 2025
sellout added a commit to sellout/zcash_script that referenced this pull request Jul 2, 2025
This was originally part of ZcashFoundation#223, but with the script validation tests being added, it was helpful
to bring this subset forward.
@sellout sellout force-pushed the partition-opcodes branch from f0753c5 to 0ec5b66 Compare July 2, 2025 23:49
@natalieesk natalieesk moved this from In progress to Review/QA in Zebra Jul 9, 2025
conradoplg pushed a commit that referenced this pull request Jul 11, 2025
* Add more ergonomic opcode definitions

This was originally part of #223, but with the script validation tests being added, it was helpful
to bring this subset forward.

* Port script tests from C++

This is regex-transformed from the script_*valid.json tests in the C++
implementation.

The most significant difference is that this checks the specific result,
not just whether the script is valid. This is useful for a couple reasons:

- it catches places where a test may not be failing in the way we expect
 (e.g., invalid.rs has a test that is likely not testing what it should
   be);
- it catches places the implementation returns the wrong error (these
  have already been handled in the Rust implementation, but see all of
  the `UnknownError` results – if we’re explicitly testing for them,
  they shouldn’t be unknown, and one is definitely a case of forgot-to-
  report-`SigHighS`); and
- it allows us to interleave successful and unsuccessful tests in
  future, rather than splitting related tests into two files just
  because one succeeds and one represents an error.

However, some tests currently fail, this is because they rely on
specific transactions in `CHECK*SIG*`. This was easy to support prior to
the callback change, but these tests were never migrated to work on the
updated C++ implementation. See #240.

* Update script tests

This brings them in sync with
bitcoin/bitcoin@76da761 ,
which is the last commit before they were merged into a single file.

Unfortunately, this also adds expected results to each entry, so the
diff is bigger than just added/changed tests.

Also, since the C++/JSON version of these tests was never updated to
work after the callback change, I’m updating the JSON files to match the
Bitcoin ones (modulo the tests we don’t support), just to keep
everything in sync.

* Unify the valid & invalid script tests

This mirrors the changes from
bitcoin/bitcoin@dde46d3

* Sync script tests with Bitcoin HEAD

This brings the script tests up to date as of
bitcoin/bitcoin@fa33592 ,
but without the cases that require features we don’t support.

* Merge unified `TestVector`s into containing module
@natalieesk natalieesk added the external contribution If an issue or PR has been created by someone external to the Foundation label Jul 15, 2025
@sellout sellout force-pushed the partition-opcodes branch 2 times, most recently from 533f27a to fb6ff58 Compare July 16, 2025 20:22
@sellout sellout marked this pull request as draft July 21, 2025 19:34
src/script.rs Outdated
},
}
}

pub fn get_op(script: &[u8]) -> Result<(Result<Opcode, u8>, &[u8]), ScriptError> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of a tuple here, can the Ok side be a struct with named members? That would help with clarity.

Also, document this method (and the new struct).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ll pull the documentation here, but just for reference, I documented everything in 4ad8553 (from #209), which also adds #![deny(missing_docs)].

src/script.rs Outdated
@@ -400,28 +462,61 @@ impl Script<'_> {
})
}

pub fn get_op2(script: &[u8]) -> Result<(Opcode, &[u8], &[u8]), ScriptError> {
pub fn get_lv(script: &[u8]) -> Result<Option<(LargeValue, &[u8])>, ScriptError> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document this method. Does it need to be public?

src/script.rs Outdated
},
}
}

pub fn get_op(script: &[u8]) -> Result<(Result<Opcode, u8>, &[u8]), ScriptError> {
Self::get_lv(script).and_then(|r| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is another place where using ? would be clearer and require less indentation.

src/script.rs Outdated
Comment on lines 472 to 477
0x4c => Self::split_tagged_value(script, 1)
.map(|(v, script)| Some((OP_PUSHDATA1(v.to_vec()), script))),
0x4d => Self::split_tagged_value(script, 2)
.map(|(v, script)| Some((OP_PUSHDATA2(v.to_vec()), script))),
0x4e => Self::split_tagged_value(script, 4)
.map(|(v, script)| Some((OP_PUSHDATA4(v.to_vec()), script))),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add pub(crate) constants for these values instead of using the magic numbers inline.

src/script.rs Outdated
Comment on lines 478 to 485
_ => {
if 0x01 <= *leading_byte && *leading_byte < 0x4c {
Self::split_value(script, (*leading_byte).into())
.map(|(v, script)| Some((PushdataBytelength(v.to_vec()), script)))
} else {
Ok(None)
}
}
Copy link
Contributor

@str4d str4d Jul 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_ => {
if 0x01 <= *leading_byte && *leading_byte < 0x4c {
Self::split_value(script, (*leading_byte).into())
.map(|(v, script)| Some((PushdataBytelength(v.to_vec()), script)))
} else {
Ok(None)
}
}
0x01..0x4c => {
Self::split_value(script, (*leading_byte).into())
.map(|(v, script)| Some((PushdataBytelength(v.to_vec()), script)))
}
_ => Ok(None),

0x01..0x4c matches are newer Rust syntax; if the MSRV of this crate is too low, then use 0x01..=0x4b instead.

(And also move this above the 0x4c case so they are in order, for niceness.)

@sellout sellout force-pushed the partition-opcodes branch 2 times, most recently from 521747c to 05854fc Compare July 29, 2025 22:12
Copy link
Contributor

@str4d str4d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 05854fc

src/pv.rs Outdated
Comment on lines 28 to 42
pub fn pushdata_bytelength(value: Vec<u8>) -> PushValue {
LargeValue(PushdataBytelength(value))
}

pub fn pushdata1(value: Vec<u8>) -> PushValue {
LargeValue(OP_PUSHDATA1(value))
}

pub fn pushdata2(value: Vec<u8>) -> PushValue {
LargeValue(OP_PUSHDATA2(value))
}

pub fn pushdata4(value: Vec<u8>) -> PushValue {
LargeValue(OP_PUSHDATA4(value))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of having these in the public API, as they enable construction of both non-minimal (not ideal, but eh) and invalid (because the data is longer than can be represented) PushValues. Instead, I'd prefer a PushValue constructor that takes a data: Vec<u8> and returns the correct PushValue (or None / error if it is too long).

Non-blocking because it looks like crate::test_vectors::Entry::val_to_pv does exactly this, and IIRC there's also maybe a push_vec helper function that appears in a later PR, so I'm fine with addressing this in a subsequent PR (at which point all of these should be pub(crate) at most, unless there's a documented reason for these being in the public API).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You’re right. It’s fixed in e54309d#diff-06fac4d64e96cf6aba9121211fd0596281b0225a40a27abe35f87357a9e749bb (from #209), but I can pull that into here.

src/script.rs Outdated
Comment on lines 28 to 35
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
pub enum LargeValue {
// push value
PushdataBytelength(Vec<u8>),
OP_PUSHDATA1(Vec<u8>),
OP_PUSHDATA2(Vec<u8>),
OP_PUSHDATA4(Vec<u8>),
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These being enum tuple variants means that they are forcibly public, which has the same problem as my previous comment (it is possible to construct an invalid LargeValue, and thus an invalid PushValue). I would prefer that each of these use an opaque newtype wrapper to keep the Vec<u8> hidden:

Suggested change
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
pub enum LargeValue {
// push value
PushdataBytelength(Vec<u8>),
OP_PUSHDATA1(Vec<u8>),
OP_PUSHDATA2(Vec<u8>),
OP_PUSHDATA4(Vec<u8>),
}
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
pub enum LargeValue {
// push value
PushdataBytelength(PushData0),
OP_PUSHDATA1(PushData1),
OP_PUSHDATA2(PushData2),
OP_PUSHDATA4(PushData4),
}
/// A `PushData` that is short enough to not require a length prefix.
pub struct PushData0(Vec<u8>);
/// A `PushData` that requires a 1-byte length prefix.
pub struct PushData1(Vec<u8>);
/// A `PushData` that requires a 2-byte length prefix.
pub struct PushData2(Vec<u8>);
/// A `PushData` that requires a 4-byte length prefix.
pub struct PushData4(Vec<u8>);

Alternatively, make LargeValue a struct, and handle the difference between the required opcode / length prefix internally. (What benefit do we get from exposing these differences in the public API for matching?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, in #209, these are no longer public (the eval_* functions have become methods on the individual types, and so only Opcode and PushValue are exposed as types, and they don’t have their constructors exposed).

To defend the current encoding, I pulled in some work that wasn’t on any published branch yet – using different BoundedVecs for each LargeValue constructor. It doesn’t yet enforce minimal encoding at the type level. I think that’d be feasible with const_generic_exprs, but in the meantime we could parameterize PushValue to take either, say, MinimalLargeValue or NonMinimalLargeValue. But that’s not in this change.

src/script.rs Outdated
use LargeValue::*;

impl LargeValue {
pub fn value(&self) -> Vec<u8> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document this method.

#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
#[repr(u8)]
pub enum PushValue {
pub enum SmallValue {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Document this enum.
  • Document this addition in the changelog.

Comment on lines 440 to 450
pv.value().map_or(
Err(ScriptError::BadOpcode(Some(SmallValue::OP_RESERVED.into()))),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, here instead it becomes clear that None actually means OP_RESERVED, and the map_or(0, _) earlier is being used so that we reach this error.

However, I note that in the previous code we always return BadOpcode, whereas here it's possible we return MinimalData instead (unless pv.is_minimal_push() is documented to always return true for OP_RESERVED?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, there is a commit in #209 (6725282) that removes OP_RESERVED from SmallValue making this and the previous comment obsolete. I can pull that into this PR.

Comment on lines 936 to 947
if stack.len() < i.into() {
return Err(ScriptError::InvalidStackOperation);
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this incorrectly removed in an earlier refactor, or if not then what is this refactored from? I ask because it changes the error that gets returned (instead of returning the error from stack.rget(i.into())?). Maybe this is also a rebase error?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stack.len() < 0usize can't be true, so this is dead code. I suspect it's incorrectly duplicated from other instances below.

Suggested change
if stack.len() < i.into() {
return Err(ScriptError::InvalidStackOperation);
};

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t know how this got reintroduced. Originally i was initialized to 1, so the check made sense, and it should have been removed when that was changed to 0.

Comment on lines 922 to 973
let mut isig = i;
i += sigs_count;
if stack.len() <= i.into() {
return Err(ScriptError::InvalidStackOperation);
};

let mut success = true;
while success && sigs_count > 0 {
let vch_sig: &ValType = stack.rget(isig.into())?;
let vch_pub_key: &ValType = stack.rget(ikey.into())?;

// Check signature
let ok: bool =
is_sig_valid(vch_sig, vch_pub_key, flags, script, &checker)?;

if ok {
isig += 1;
sigs_count -= 1;
}
ikey += 1;
keys_count -= 1;

// If there are more signatures left than keys left,
// then too many signatures have failed. Exit early,
// without checking any further signatures.
if sigs_count > keys_count {
success = false;
};
}

// Clean up stack of actual arguments
for _ in 0..i {
stack.pop()?;
}

// A bug causes CHECKMULTISIG to consume one extra argument
// whose contents were not checked in any way.
//
// Unfortunately this is a potential source of mutability,
// so optionally verify it is exactly equal to zero prior
// to removing it from the stack.
if stack.is_empty() {
return Err(ScriptError::InvalidStackOperation);
}
if flags.contains(VerificationFlags::NullDummy)
&& !stack.rget(0)?.is_empty()
{
return Err(ScriptError::SigNullDummy);
}
stack.pop()?;

stack.push(cast_from_bool(success));

if op == OP_CHECKMULTISIGVERIFY {
if success {
stack.pop()?;
} else {
return Err(ScriptError::CheckMultisigVerify);
}
}
}
OP_CHECKMULTISIG | OP_CHECKMULTISIGVERIFY => {
// ([sig ...] num_of_signatures [pubkey ...] num_of_pubkeys -- bool)

// NB: This is guaranteed u8-safe, because we are limited to 20 keys and
// 20 signatures, plus a couple other fields. u8 also gives us total
// conversions to the other types we deal with here (`isize` and `i64`).
let mut i: u8 = 0;
if stack.len() < i.into() {
return Err(ScriptError::InvalidStackOperation);
};

let mut keys_count =
u8::try_from(parse_num(stack.rget(i.into())?, require_minimal, None)?)
.map_err(|_| ScriptError::PubKeyCount)?;
if keys_count > 20 {
return Err(ScriptError::PubKeyCount);
};
assert!(*op_count <= MAX_OP_COUNT);
*op_count += keys_count;
if *op_count > MAX_OP_COUNT {
return Err(ScriptError::OpCount);
};
i += 1;
let mut ikey = i;
i += keys_count;
if stack.len() <= i.into() {
return Err(ScriptError::InvalidStackOperation);
}

_ => {
return Err(ScriptError::BadOpcode);
}
let mut sigs_count =
u8::try_from(parse_num(stack.rget(i.into())?, require_minimal, None)?)
.map_err(|_| ScriptError::SigCount)?;
if sigs_count > keys_count {
return Err(ScriptError::SigCount);
};
assert!(i <= 21);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reverts #222 (comment) which I presume is due to a rebase error. Undo this change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same problem here; when looking with whitespace hidden this diff is much more obvious.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, I wonder if this is actually an artifact of this PR not having been rebased yet.

Comment on lines 942 to 989
isig += 1;
sigs_count -= 1;
}
ikey += 1;
keys_count -= 1;

// If there are more signatures left than keys left,
// then too many signatures have failed. Exit early,
// without checking any further signatures.
if sigs_count > keys_count {
success = false;
};
}

// Clean up stack of actual arguments
for _ in 0..i {
stack.pop()?;
}

// A bug causes CHECKMULTISIG to consume one extra argument
// whose contents were not checked in any way.
//
// Unfortunately this is a potential source of mutability,
// so optionally verify it is exactly equal to zero prior
// to removing it from the stack.
if stack.is_empty() {
return Err(ScriptError::InvalidStackOperation);
}
if flags.contains(VerificationFlags::NullDummy)
&& !stack.rget(0)?.is_empty()
{
return Err(ScriptError::SigNullDummy);
}
stack.pop()?;

stack.push(cast_from_bool(success));

if op == OP_CHECKMULTISIGVERIFY {
if success {
stack.pop()?;
} else {
return Err(ScriptError::CheckMultisigVerify);
}
}
}
OP_CHECKMULTISIG | OP_CHECKMULTISIGVERIFY => {
// ([sig ...] num_of_signatures [pubkey ...] num_of_pubkeys -- bool)

// NB: This is guaranteed u8-safe, because we are limited to 20 keys and
// 20 signatures, plus a couple other fields. u8 also gives us total
// conversions to the other types we deal with here (`isize` and `i64`).
let mut i: u8 = 0;
if stack.len() < i.into() {
return Err(ScriptError::InvalidStackOperation);
};

let mut keys_count =
u8::try_from(parse_num(stack.rget(i.into())?, require_minimal, None)?)
.map_err(|_| ScriptError::PubKeyCount)?;
if keys_count > 20 {
return Err(ScriptError::PubKeyCount);
};
assert!(*op_count <= MAX_OP_COUNT);
*op_count += keys_count;
if *op_count > MAX_OP_COUNT {
return Err(ScriptError::OpCount);
};
i += 1;
let mut ikey = i;
i += keys_count;
if stack.len() <= i.into() {
return Err(ScriptError::InvalidStackOperation);
}

_ => {
return Err(ScriptError::BadOpcode);
}
let mut sigs_count =
u8::try_from(parse_num(stack.rget(i.into())?, require_minimal, None)?)
.map_err(|_| ScriptError::SigCount)?;
if sigs_count > keys_count {
return Err(ScriptError::SigCount);
};
assert!(i <= 21);
i += 1;
let mut isig = i;
i += sigs_count;
if stack.len() <= i.into() {
return Err(ScriptError::InvalidStackOperation);
};

let mut success = true;
while success && sigs_count > 0 {
let vch_sig: &ValType = stack.rget(isig.into())?;
let vch_pub_key: &ValType = stack.rget(ikey.into())?;

// Note how this makes the exact order of pubkey/signature evaluation
// distinguishable by CHECKMULTISIG NOT if the STRICTENC flag is set.
// See the script_(in)valid tests for details.
let ok: bool = is_sig_valid(vch_sig, vch_pub_key, flags, script, checker)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also looks like a rebase error, though I guess the comment being re-added here is fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ew, at the outer level (without showing whitespace presumably) this diff looks horrible. I was talking specifically about the re-introduction of the "Note how this makes..." comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only skimmed the changes to this file.

sellout added 9 commits July 31, 2025 08:13
On the C++ side, this is the value of the error output parameter when
script validation is successful. It never occurs on the Rust side
(because we have `Result`), and so we can remove it from the enumeration
without an consequences.
Previously, the `TestVector`s held normalized results, and we would
normalize the _actual_ result before comparison.

This changes the `TestVector`s to hold richer Rust results, and then to
normalize the _expected_ result only for the C++ case.
This splits `Operation` into three separate enums – `Control` (for
if/else, which get executed regardless of `vexec` and disabled
operations, which always fail, regardless of `vexec`), `Normal`, for
operations that respect `vexec`, and `Unknown` for undefined opcodes
which only fail if they’re on an active branch. This is done so that the
evaluator can be split on the same lines.

__NB__: I strongly recommend ignoring whitespace changes when reviewing
        this commit.

(cherry picked from commit f97b92c)
(cherry picked from commit 27a5037)
This parallels the existing `op` module, and is used in cases where we
want to guarantee that only push values are used.

`op` itself has been updated to reference `pv`, rather than using the
constructors directly.
This introduces one edge case: If a disabled opcode is the 202nd
operation in a script, the C++ impl would return `Error::OpCount`, while
the Rust impl would return `Error::DisabledOpcode`.
Having `Control` and `Normal` grouped under `Operation` only
eliminated one conditional (checking whether we hit the `op_count`
limit, and that is now better abstracted anyway), and it introduced a
lot of code (see the 55 lines removed here _plus_ the many nested calls
removed, as in op.rs).

So, `Normal` is now called `Operation`. The confusing label of “control”
for some `Operation` (née `Normal`) opcodes has been removed.

(cherry picked from commit 1c98bb3)
Well, my _tiny_ edge case of “only if the 202nd operation is a disabled opcode” didn’t slip past the
fuzzer. It caught that pretty quickly.

So, this does a better job of normalizing errors for comparisons.

First, it normalizes both the C++ and Rust side, which allows the Rust error cases to not be a
superset of the C++ error cases. Then, it also normalizes errors in the stepwise comparator (which I
think was done in ZcashFoundation#210, but it’s reasonable to do along with these other changes).

Also, `ScriptError::UnknownError` has been replaced with `ScriptError::ExternalError`, which takes
a string description. This is used for two purposes:

1. “Ambiguous” cases. One was already done – `UNKNOWN_ERROR` on the C++
   side with `ScriptError::ScriptNumError` or `ScriptError::HighS` on
   the Rust side, but now it’s handled differently. The other is the
   edge case I introduced earlier in this PR – Rust will fail with a
   `DisabledOpcode` and C++ will fail with a `OpCount`, so those two
   cases have been unified.
2. Errors that occur inside a stepper. Previously, this just melded in
   by returning `UnknownError`, but that was just the “least bad”
   option. Now we can distinguish these from other `ScriptError`.
@sellout sellout force-pushed the partition-opcodes branch from 05854fc to faef56e Compare July 31, 2025 15:40
sellout added 2 commits July 31, 2025 10:34
- add a `ParsedOpcode` type rather than using a tuple
- use constants for `PUSHDATA` bytes
- add some documentation
@sellout sellout marked this pull request as ready for review July 31, 2025 20:33
sellout added 3 commits July 31, 2025 14:35
This gives the different `LargeValue` constructors different types, but
it can’t (yet) enforce minimal encoding at the type level.
This is one more step toward type-safe scripts.

It also modifies `Script::parse` to not fail on bad opcodes, and adds a
new `Script::parse_strict` that _does_ fail on bad opcodes.
@sellout sellout force-pushed the partition-opcodes branch from c719e4a to 87202b0 Compare July 31, 2025 20:36
Copy link
Contributor

@nuttycom nuttycom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor style nits; some of my style complaints from earlier commits were already addressed by other changes by the time I got to the end of the PR, but some (in particular, the request that functional constructs not be used for side-effecting code) remain.

src/lib.rs Outdated
ScriptError::ScriptNumError(_) => ScriptError::UnknownError,
_ => serr,
}),
Error::Ok(serr) => Error::Ok(normalize_script_error(serr)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went and looked at the documentation of Error::Ok, and I could not figure out why this is called Ok - can you document how or why this particular error is okay in some sense? I know it's not something that's specific to this PR, but that naming is definitely confusing without there being additional elaboration.

@@ -484,7 +492,7 @@ fn eval_control(
vexec.pop()?;
}

OP_VERIF | OP_VERNOTIF => return Err(ScriptError::BadOpcode),
OP_VERIF | OP_VERNOTIF => return Err(ScriptError::BadOpcode(Some(op.into()))),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I generally have had bad experiences when using .into(); these can be broken by spooky action at a distance by the introduction of From impls. Here it's probably fine, but I've gotten into the habit of avoiding into entirely in favor of e.g. u8::from(op).

src/script.rs Outdated
Comment on lines 432 to 437
pub struct ParsedOpcode<'a> {
/// The [`Result`] allows us to preserve unknown opcodes, which only trigger a failure if
/// they’re on an active branch during interpretation.
pub opcode: Result<Opcode, u8>,
pub remaining_code: &'a [u8],
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, does this need to be part of the public API? Second, can the members be crate-private? Finally, it looks like now the Result is just being used as a generic Either type; I'd prefer a bespoke enum instead.

src/script.rs Outdated
pub struct ParsedOpcode<'a> {
/// The [`Result`] allows us to preserve unknown opcodes, which only trigger a failure if
/// they’re on an active branch during interpretation.
pub opcode: Result<Opcode, u8>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub opcode: Result<Opcode, u8>,
pub opcode: OpcodeParseResult,

(potentially pub(crate)?)

src/script.rs Outdated
Comment on lines 454 to 456
opcode
.map_err(|byte| ScriptError::BadOpcode(Some(byte)))
.map(|op| result.push(op))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
opcode
.map_err(|byte| ScriptError::BadOpcode(Some(byte)))
.map(|op| result.push(op))
match opcode {
Known(op) => {
result.push(op);
Ok(())
}
Unknown(byte) => Err(ScriptError::BadOpcode(Some(byte)))
}

Comment on lines +56 to +59
vec.clone().try_into().map_or(
vec.clone().try_into().map_or(
vec.clone().try_into().map_or(
vec.try_into().map_or(None, |bv| Some(OP_PUSHDATA4(bv))),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would really prefer that these were <type>::try_from calls instead; relying on type inference makes this code really hard to read.


use PushValue::*;
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
pub enum PushValue {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variants still need documentation.

opcode
.map_err(|byte| ScriptError::BadOpcode(Some(byte)))
.map(|op| result.push(op))
result.push(opcode)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a better resolution to my previous comment, but I would still prefer imperative style here given that it's mutating variables in the enclosing scope.

Copy link
Contributor

@str4d str4d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 87202b0. Comments are all style and non-blocking (particularly if addressing them makes the later PRs harder to rebase), though I will likely go through and make the changes myself in a follow-up PR if they aren't addressed.

Comment on lines +359 to +360
pub fn run_test_vector(
tv: &TestVector,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed that this is now in the public API (behind test-dependencies), and TestVector is now pub, but there's no way to obtain a TestVector in the public API (because crate::test_vectors is private). Either make crate::test_vectors public, or make this private (depending on which is intended).

Comment on lines +55 to +65
let vec = v.to_vec();
vec.clone().try_into().map_or(
vec.clone().try_into().map_or(
vec.clone().try_into().map_or(
vec.try_into().map_or(None, |bv| Some(OP_PUSHDATA4(bv))),
|bv| Some(OP_PUSHDATA2(bv)),
),
|bv| Some(OP_PUSHDATA1(bv)),
),
|bv| Some(PushdataBytelength(bv)),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nits:

  • This clones the data one more time than necessary.
  • It also clones the data way more times than necessary, because .map_or eagerly evaluates its first argument. You'd need to use .map_or_else instead to get lazy evaulation.
  • I find nested map_or (where the or case is complex) unreadable, and for the inner case where there is no further nesting, it is more verbose than .ok().map(_).
Suggested change
let vec = v.to_vec();
vec.clone().try_into().map_or(
vec.clone().try_into().map_or(
vec.clone().try_into().map_or(
vec.try_into().map_or(None, |bv| Some(OP_PUSHDATA4(bv))),
|bv| Some(OP_PUSHDATA2(bv)),
),
|bv| Some(OP_PUSHDATA1(bv)),
),
|bv| Some(PushdataBytelength(bv)),
)
if let Ok(bv) = v.to_vec().try_into() {
Some(PushdataBytelength(bv))
} else if let Ok(bv) = v.to_vec().try_into() {
Some(OP_PUSHDATA1(bv))
} else if let Ok(bv) = v.to_vec().try_into() {
Some(OP_PUSHDATA2(bv))
} else {
v.to_vec().try_into().ok().map(OP_PUSHDATA4)
}

Or if we want the more functional approach:

Suggested change
let vec = v.to_vec();
vec.clone().try_into().map_or(
vec.clone().try_into().map_or(
vec.clone().try_into().map_or(
vec.try_into().map_or(None, |bv| Some(OP_PUSHDATA4(bv))),
|bv| Some(OP_PUSHDATA2(bv)),
),
|bv| Some(OP_PUSHDATA1(bv)),
),
|bv| Some(PushdataBytelength(bv)),
)
v.to_vec()
.try_into()
.ok()
.map(PushdataBytelength)
.or_else(|| {
v.to_vec()
.try_into()
.ok()
.map(OP_PUSHDATA1)
})
.or_else(|| {
v.to_vec()
.try_into()
.ok()
.map(OP_PUSHDATA2)
})
.or_else(|| {
v.to_vec()
.try_into()
.ok()
.map(OP_PUSHDATA4)
})
}

Comment on lines +52 to +54
match v {
[] => None,
_ => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of a single-case match, we generally use:

if v.is_empty() {
    None
} else {

or

(!v.is_empty()).then(|| {

Comment on lines +30 to +32
///
/// TODO: These should have lower bounds that can prevent non-minimal encodings, but that requires
/// at least `const_generic_exprs`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should they? Is it the case that the C++ script engine never permits non-minimal encodings to be used during execution?

The upper bounds by contrast are clearly correct, as it is impossible to e.g. parse an OP_PUSHDATA1 and get more than 0xff bytes.

Comment on lines +166 to +189
match v {
[] => Some(PushValue::SmallValue(OP_0)),
[byte] => Some(match byte {
0x81 => PushValue::SmallValue(OP_1NEGATE),
1 => PushValue::SmallValue(OP_1),
2 => PushValue::SmallValue(OP_2),
3 => PushValue::SmallValue(OP_3),
4 => PushValue::SmallValue(OP_4),
5 => PushValue::SmallValue(OP_5),
6 => PushValue::SmallValue(OP_6),
7 => PushValue::SmallValue(OP_7),
8 => PushValue::SmallValue(OP_8),
9 => PushValue::SmallValue(OP_9),
10 => PushValue::SmallValue(OP_10),
11 => PushValue::SmallValue(OP_11),
12 => PushValue::SmallValue(OP_12),
13 => PushValue::SmallValue(OP_13),
14 => PushValue::SmallValue(OP_14),
15 => PushValue::SmallValue(OP_15),
16 => PushValue::SmallValue(OP_16),
_ => PushValue::LargeValue(PushdataBytelength([*byte; 1].into())),
}),
_ => LargeValue::from_slice(v).map(PushValue::LargeValue),
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: the inner match can be merged with the outer, and then we aren't duplicating the parsing logic for one-byte PushdataBytelengths:

Suggested change
match v {
[] => Some(PushValue::SmallValue(OP_0)),
[byte] => Some(match byte {
0x81 => PushValue::SmallValue(OP_1NEGATE),
1 => PushValue::SmallValue(OP_1),
2 => PushValue::SmallValue(OP_2),
3 => PushValue::SmallValue(OP_3),
4 => PushValue::SmallValue(OP_4),
5 => PushValue::SmallValue(OP_5),
6 => PushValue::SmallValue(OP_6),
7 => PushValue::SmallValue(OP_7),
8 => PushValue::SmallValue(OP_8),
9 => PushValue::SmallValue(OP_9),
10 => PushValue::SmallValue(OP_10),
11 => PushValue::SmallValue(OP_11),
12 => PushValue::SmallValue(OP_12),
13 => PushValue::SmallValue(OP_13),
14 => PushValue::SmallValue(OP_14),
15 => PushValue::SmallValue(OP_15),
16 => PushValue::SmallValue(OP_16),
_ => PushValue::LargeValue(PushdataBytelength([*byte; 1].into())),
}),
_ => LargeValue::from_slice(v).map(PushValue::LargeValue),
}
match v {
[] => Some(PushValue::SmallValue(OP_0)),
[0x81] => Some(PushValue::SmallValue(OP_1NEGATE)),
[1] => Some(PushValue::SmallValue(OP_1)),
[2] => Some(PushValue::SmallValue(OP_2)),
[3] => Some(PushValue::SmallValue(OP_3)),
[4] => Some(PushValue::SmallValue(OP_4)),
[5] => Some(PushValue::SmallValue(OP_5)),
[6] => Some(PushValue::SmallValue(OP_6)),
[7] => Some(PushValue::SmallValue(OP_7)),
[8] => Some(PushValue::SmallValue(OP_8)),
[9] => Some(PushValue::SmallValue(OP_9)),
[10] => Some(PushValue::SmallValue(OP_10)),
[11] => Some(PushValue::SmallValue(OP_11)),
[12] => Some(PushValue::SmallValue(OP_12)),
[13] => Some(PushValue::SmallValue(OP_13)),
[14] => Some(PushValue::SmallValue(OP_14)),
[15] => Some(PushValue::SmallValue(OP_15)),
[16] => Some(PushValue::SmallValue(OP_16)),
_ => LargeValue::from_slice(v).map(PushValue::LargeValue),
}

Comment on lines +566 to +574
Self::get_op(pc).map(
|ParsedOpcode {
opcode,
remaining_code,
}| {
pc = remaining_code;
result.push(opcode)
},
)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Self::get_op(pc).map(
|ParsedOpcode {
opcode,
remaining_code,
}| {
pc = remaining_code;
result.push(opcode)
},
)?;
let ParsedOpcode {
opcode,
remaining_code,
} = Self::get_op(pc)?;
pc = remaining_code;
result.push(opcode);

Comment on lines +659 to +674
Some((leading_byte, remaining_code)) => Disabled::from_u8(*leading_byte).map_or(
Ok(ParsedOpcode {
opcode: SmallValue::from_u8(*leading_byte).map_or(
Control::from_u8(*leading_byte).map_or(
Operation::from_u8(*leading_byte)
.map_or(Err((*leading_byte).into()), |op| {
Ok(Opcode::Operation(op))
}),
|ctl| Ok(Opcode::Control(ctl)),
),
|sv| Ok(Opcode::PushValue(PushValue::SmallValue(sv))),
),
remaining_code,
}),
|disabled| Err(ScriptError::DisabledOpcode(Some(disabled))),
),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking: Another case where the .map_or is eagerly running when it should lazily run. However, this entire section will be rewritten when enum_primitive is removed, so I'm fine leaving it as-is for now.

Comment on lines +555 to +559
match self.parse().map(|script| script.into_iter().collect()) {
Err(op_err) => Err(Ok(op_err)),
Ok(Err(bad_ops)) => Err(Err(bad_ops)),
Ok(Ok(ops)) => Ok(ops),
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very confusing match without documentation. @nuttycom's enum suggestion would likely help, but also some comments would make things clearer.

if Bad::OP_RESERVED != bad {
state.increment_op_count()?;
}
if Bad::OP_VERIF == bad || Bad::OP_VERNOTIF == bad || should_exec(&state.vexec) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also work:

Suggested change
if Bad::OP_VERIF == bad || Bad::OP_VERNOTIF == bad || should_exec(&state.vexec) {
if matches!(bad, Bad::OP_VERIF | Bad::OP_VERNOTIF) || should_exec(&state.vexec) {

(turns out it's the same length in this instance).

remaining_code,
}| match opcode {
Err(bad) => {
if Bad::OP_RESERVED != bad {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if Bad::OP_RESERVED != bad {
// See the documentation of `Bad` for an explanation of this logic.
if Bad::OP_RESERVED != bad {

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
external contribution If an issue or PR has been created by someone external to the Foundation
Projects
Status: Review/QA
Development

Successfully merging this pull request may close these issues.

6 participants