Partition Operation enum #223

sellout · 2025-05-22T22:46:24Z

This splits Operation into three separate enums: Control (for
if/else, which are executed regardless of vexec, and for disabled
operations, which always fail regardless of vexec), Normal (for
operations that respect vexec), and Unknown (for undefined opcodes,
which only fail if they’re on an active branch). This is done so that the
evaluator can be split along the same lines.
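
To make the partition concrete, here is a minimal sketch of how an evaluator could be split along these lines. The variant sets, `StepError`, and `step` are hypothetical and heavily trimmed; they are assumptions for illustration, not the types in this PR.

```rust
/// Hypothetical, heavily trimmed opcode sets (the real enums have many more variants).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Control {
    If,
    Else,
    EndIf,
    /// Stand-in for any disabled operation.
    Disabled,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Normal {
    Dup,
    Drop,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Opcode {
    Control(Control),
    Normal(Normal),
    /// An undefined opcode byte, preserved so that it can fail lazily.
    Unknown(u8),
}

#[derive(Debug, PartialEq, Eq)]
pub enum StepError {
    DisabledOpcode,
    BadOpcode(u8),
    UnbalancedConditional,
    StackUnderflow,
}

/// Evaluates one opcode. `vexec` holds one flag per enclosing `If`;
/// the current branch is active only when every flag is true.
pub fn step(op: Opcode, stack: &mut Vec<bool>, vexec: &mut Vec<bool>) -> Result<(), StepError> {
    let active = vexec.iter().all(|flag| *flag);
    match op {
        // Control opcodes are processed even on inactive branches.
        Opcode::Control(Control::If) => {
            // The condition is only consumed on an active branch.
            let cond = if active {
                stack.pop().ok_or(StepError::StackUnderflow)?
            } else {
                false
            };
            vexec.push(cond);
            Ok(())
        }
        Opcode::Control(Control::Else) => {
            let top = vexec.last_mut().ok_or(StepError::UnbalancedConditional)?;
            *top = !*top;
            Ok(())
        }
        Opcode::Control(Control::EndIf) => vexec
            .pop()
            .map(|_| ())
            .ok_or(StepError::UnbalancedConditional),
        // Disabled operations fail whether or not the branch is active.
        Opcode::Control(Control::Disabled) => Err(StepError::DisabledOpcode),
        // Normal and unknown opcodes are skipped on inactive branches.
        Opcode::Normal(_) | Opcode::Unknown(_) if !active => Ok(()),
        Opcode::Normal(Normal::Dup) => {
            let top = *stack.last().ok_or(StepError::StackUnderflow)?;
            stack.push(top);
            Ok(())
        }
        Opcode::Normal(Normal::Drop) => stack.pop().map(|_| ()).ok_or(StepError::StackUnderflow),
        Opcode::Unknown(byte) => Err(StepError::BadOpcode(byte)),
    }
}
```

In this sketch an unknown byte inside a false `If` branch is skipped, while a disabled operation in the same position still fails.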

It also integrates the values with the PushValue opcodes.

It also exposes two modules (op and pv) to separate the internal opcode structure used by the implementation from the interface that consumers (PCZT, etc.) use.

This depends on #221 and #222.

str4d

Flushing comment from a partial review as of f0753c5.

src/script.rs

This was originally part of ZcashFoundation#223, but with the script validation tests being added, it was helpful to bring this subset forward.

* Add more ergonomic opcode definitions

  This was originally part of #223, but with the script validation tests being added, it was helpful to bring this subset forward.

* Port script tests from C++

  This is regex-transformed from the script_*valid.json tests in the C++ implementation. The most significant difference is that this checks the specific result, not just whether the script is valid. This is useful for a couple reasons:

  - it catches places where a test may not be failing in the way we expect (e.g., invalid.rs has a test that is likely not testing what it should be);
  - it catches places the implementation returns the wrong error (these have already been handled in the Rust implementation, but see all of the `UnknownError` results – if we’re explicitly testing for them, they shouldn’t be unknown, and one is definitely a case of forgot-to-report-`SigHighS`); and
  - it allows us to interleave successful and unsuccessful tests in future, rather than splitting related tests into two files just because one succeeds and one represents an error.

  However, some tests currently fail because they rely on specific transactions in `CHECK*SIG*`. This was easy to support prior to the callback change, but these tests were never migrated to work on the updated C++ implementation. See #240.

* Update script tests

  This brings them in sync with bitcoin/bitcoin@76da761, which is the last commit before they were merged into a single file. Unfortunately, this also adds expected results to each entry, so the diff is bigger than just added/changed tests.

  Also, since the C++/JSON version of these tests was never updated to work after the callback change, I’m updating the JSON files to match the Bitcoin ones (modulo the tests we don’t support), just to keep everything in sync.

* Unify the valid & invalid script tests

  This mirrors the changes from bitcoin/bitcoin@dde46d3

* Sync script tests with Bitcoin HEAD

  This brings the script tests up to date as of bitcoin/bitcoin@fa33592, but without the cases that require features we don’t support.

* Merge unified `TestVector`s into containing module

nuttycom · 2025-07-29T18:05:15Z

src/script.rs

            },
        }
    }

+    pub fn get_op(script: &[u8]) -> Result<(Result<Opcode, u8>, &[u8]), ScriptError> {


Instead of a tuple here, can the Ok side be a struct with named members? That would help with clarity.

Also, document this method (and the new struct).

I’ll pull the documentation here, but just for reference, I documented everything in 4ad8553 (from #209), which also adds #![deny(missing_docs)].

nuttycom · 2025-07-29T18:06:08Z

src/script.rs

@@ -400,28 +462,61 @@ impl Script<'_> {
        })
    }

-    pub fn get_op2(script: &[u8]) -> Result<(Opcode, &[u8], &[u8]), ScriptError> {
+    pub fn get_lv(script: &[u8]) -> Result<Option<(LargeValue, &[u8])>, ScriptError> {


Document this method. Does it need to be public?

nuttycom · 2025-07-29T18:09:49Z

src/script.rs

            },
        }
    }

+    pub fn get_op(script: &[u8]) -> Result<(Result<Opcode, u8>, &[u8]), ScriptError> {
+        Self::get_lv(script).and_then(|r| {


nit: this is another place where using ? would be clearer and require less indentation.

nuttycom · 2025-07-29T18:12:42Z

src/script.rs

+                0x4c => Self::split_tagged_value(script, 1)
+                    .map(|(v, script)| Some((OP_PUSHDATA1(v.to_vec()), script))),
+                0x4d => Self::split_tagged_value(script, 2)
+                    .map(|(v, script)| Some((OP_PUSHDATA2(v.to_vec()), script))),
+                0x4e => Self::split_tagged_value(script, 4)
+                    .map(|(v, script)| Some((OP_PUSHDATA4(v.to_vec()), script))),


Add pub(crate) constants for these values instead of using the magic numbers inline.
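
A minimal sketch of what that could look like. The constant names and the helper `length_prefix_width` are hypothetical; only the byte values 0x4c, 0x4d, and 0x4e and their prefix widths come from the diff above.

```rust
/// Leading byte of a push whose length is given by a 1-byte prefix.
pub(crate) const OP_PUSHDATA1_BYTE: u8 = 0x4c;
/// Leading byte of a push whose length is given by a 2-byte prefix.
pub(crate) const OP_PUSHDATA2_BYTE: u8 = 0x4d;
/// Leading byte of a push whose length is given by a 4-byte prefix.
pub(crate) const OP_PUSHDATA4_BYTE: u8 = 0x4e;

/// Returns the width in bytes of the length prefix that follows
/// `leading_byte`, or `None` if the byte is not a `PUSHDATA` opcode.
pub(crate) fn length_prefix_width(leading_byte: u8) -> Option<usize> {
    match leading_byte {
        OP_PUSHDATA1_BYTE => Some(1),
        OP_PUSHDATA2_BYTE => Some(2),
        OP_PUSHDATA4_BYTE => Some(4),
        _ => None,
    }
}
```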

str4d · 2025-07-29T18:18:41Z

src/script.rs

+                _ => {
+                    if 0x01 <= *leading_byte && *leading_byte < 0x4c {
+                        Self::split_value(script, (*leading_byte).into())
+                            .map(|(v, script)| Some((PushdataBytelength(v.to_vec()), script)))
+                    } else {
+                        Ok(None)
+                    }
                }


Suggested change

-                _ => {
-                    if 0x01 <= *leading_byte && *leading_byte < 0x4c {
-                        Self::split_value(script, (*leading_byte).into())
-                            .map(|(v, script)| Some((PushdataBytelength(v.to_vec()), script)))
-                    } else {
-                        Ok(None)
-                    }
-                }
+                0x01..0x4c => {
+                    Self::split_value(script, (*leading_byte).into())
+                        .map(|(v, script)| Some((PushdataBytelength(v.to_vec()), script)))
+                }
+                _ => Ok(None),

Exclusive range patterns such as 0x01..0x4c are newer Rust syntax (stabilized in Rust 1.80); if the MSRV of this crate is lower than that, use 0x01..=0x4b instead.

(And also move this above the 0x4c case so they are in order, for niceness.)
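
For reference, a small standalone sketch of the ordered match using the inclusive form, which compiles on older toolchains. `PushKind` and `classify` are hypothetical names used only for illustration.

```rust
/// Hypothetical classification of a script's leading byte.
#[derive(Debug, PartialEq, Eq)]
pub enum PushKind {
    /// The leading byte is itself the data length.
    Direct(usize),
    /// A length prefix of the given width (in bytes) follows.
    Prefixed(usize),
}

pub fn classify(leading_byte: u8) -> Option<PushKind> {
    match leading_byte {
        // Inclusive range: covers the same bytes as the exclusive `0x01..0x4c`.
        0x01..=0x4b => Some(PushKind::Direct(usize::from(leading_byte))),
        0x4c => Some(PushKind::Prefixed(1)),
        0x4d => Some(PushKind::Prefixed(2)),
        0x4e => Some(PushKind::Prefixed(4)),
        _ => None,
    }
}
```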

str4d

Reviewed 05854fc

str4d · 2025-07-30T14:38:56Z

src/pv.rs

+pub fn pushdata_bytelength(value: Vec<u8>) -> PushValue {
+    LargeValue(PushdataBytelength(value))
+}
+
+pub fn pushdata1(value: Vec<u8>) -> PushValue {
+    LargeValue(OP_PUSHDATA1(value))
+}
+
+pub fn pushdata2(value: Vec<u8>) -> PushValue {
+    LargeValue(OP_PUSHDATA2(value))
+}
+
+pub fn pushdata4(value: Vec<u8>) -> PushValue {
+    LargeValue(OP_PUSHDATA4(value))
+}


I'm not a fan of having these in the public API, as they enable construction of both non-minimal (not ideal, but eh) and invalid (because the data is longer than can be represented) PushValues. Instead, I'd prefer a PushValue constructor that takes a data: Vec<u8> and returns the correct PushValue (or None / error if it is too long).

Non-blocking because it looks like crate::test_vectors::Entry::val_to_pv does exactly this, and IIRC there's also maybe a push_vec helper function that appears in a later PR, so I'm fine with addressing this in a subsequent PR (at which point all of these should be pub(crate) at most, unless there's a documented reason for these being in the public API).
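
A rough sketch of the kind of constructor being asked for, choosing the encoding from the data length. `Push` and `from_data` are hypothetical stand-ins rather than the PR's types, and the sketch ignores the single-byte values that have their own small opcodes (OP_1 through OP_16 and OP_1NEGATE).

```rust
/// Hypothetical stand-in for `PushValue`; variant names are illustrative.
#[derive(Debug, PartialEq, Eq)]
pub enum Push {
    /// Empty data, encoded as `OP_0`.
    Empty,
    /// 1 to 0x4b bytes: the leading byte is the length.
    Direct(Vec<u8>),
    /// Up to 0xff bytes after a 1-byte length prefix.
    Data1(Vec<u8>),
    /// Up to 0xffff bytes after a 2-byte length prefix.
    Data2(Vec<u8>),
    /// Up to 0xffff_ffff bytes after a 4-byte length prefix.
    Data4(Vec<u8>),
}

impl Push {
    /// Picks the shortest length encoding that can represent `data`,
    /// or returns `None` if the length cannot be represented at all.
    pub fn from_data(data: Vec<u8>) -> Option<Push> {
        // Widen to u64 so the bounds are valid on 32- and 64-bit targets.
        match data.len() as u64 {
            0 => Some(Push::Empty),
            1..=0x4b => Some(Push::Direct(data)),
            0x4c..=0xff => Some(Push::Data1(data)),
            0x100..=0xffff => Some(Push::Data2(data)),
            0x1_0000..=0xffff_ffff => Some(Push::Data4(data)),
            _ => None,
        }
    }
}
```

Because the caller never names a variant, neither a non-minimal length prefix nor an over-long value can be constructed through this entry point.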

You’re right. It’s fixed in e54309d#diff-06fac4d64e96cf6aba9121211fd0596281b0225a40a27abe35f87357a9e749bb (from #209), but I can pull that into here.

str4d · 2025-07-30T14:46:13Z

src/script.rs

+#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
+pub enum LargeValue {
+    // push value
+    PushdataBytelength(Vec<u8>),
+    OP_PUSHDATA1(Vec<u8>),
+    OP_PUSHDATA2(Vec<u8>),
+    OP_PUSHDATA4(Vec<u8>),
+}


These being enum tuple variants means that they are forcibly public, which has the same problem as my previous comment (it is possible to construct an invalid LargeValue, and thus an invalid PushValue). I would prefer that each of these use an opaque newtype wrapper to keep the Vec<u8> hidden:

Suggested change

-#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
-pub enum LargeValue {
-    // push value
-    PushdataBytelength(Vec<u8>),
-    OP_PUSHDATA1(Vec<u8>),
-    OP_PUSHDATA2(Vec<u8>),
-    OP_PUSHDATA4(Vec<u8>),
-}
+#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
+pub enum LargeValue {
+    // push value
+    PushdataBytelength(PushData0),
+    OP_PUSHDATA1(PushData1),
+    OP_PUSHDATA2(PushData2),
+    OP_PUSHDATA4(PushData4),
+}
+
+/// A `PushData` that is short enough to not require a length prefix.
+pub struct PushData0(Vec<u8>);
+
+/// A `PushData` that requires a 1-byte length prefix.
+pub struct PushData1(Vec<u8>);
+
+/// A `PushData` that requires a 2-byte length prefix.
+pub struct PushData2(Vec<u8>);
+
+/// A `PushData` that requires a 4-byte length prefix.
+pub struct PushData4(Vec<u8>);

Alternatively, make LargeValue a struct, and handle the difference between the required opcode / length prefix internally. (What benefit do we get from exposing these differences in the public API for matching?)

So, in #209, these are no longer public (the eval_* functions have become methods on the individual types, and so only Opcode and PushValue are exposed as types, and they don’t have their constructors exposed).

To defend the current encoding, I pulled in some work that wasn’t on any published branch yet – using different BoundedVecs for each LargeValue constructor. It doesn’t yet enforce minimal encoding at the type level. I think that’d be feasible with const_generic_exprs, but in the meantime we could parameterize PushValue to take either, say, MinimalLargeValue or NonMinimalLargeValue. But that’s not in this change.
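
As an illustration of the bounded-vector idea, here is a minimal const-generic sketch. `Bounded` is a hypothetical type written for this example and is not the API of any particular crate; like the change described above, it enforces only the upper bound, so it cannot rule out non-minimal encodings.

```rust
use std::convert::TryFrom;

/// Hypothetical byte vector that holds at most `MAX` bytes.
/// The inner `Vec` is private, so the bound cannot be bypassed.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Bounded<const MAX: usize>(Vec<u8>);

impl<const MAX: usize> TryFrom<Vec<u8>> for Bounded<MAX> {
    /// On failure the original bytes are handed back to the caller.
    type Error = Vec<u8>;

    fn try_from(bytes: Vec<u8>) -> Result<Self, Self::Error> {
        if bytes.len() <= MAX {
            Ok(Bounded(bytes))
        } else {
            Err(bytes)
        }
    }
}

impl<const MAX: usize> Bounded<MAX> {
    /// Read-only access to the contained bytes.
    pub fn as_slice(&self) -> &[u8] {
        &self.0
    }
}

/// Data short enough that the leading byte is the length.
pub type PushData0 = Bounded<0x4b>;
/// Data that fits a 1-byte length prefix.
pub type PushData1 = Bounded<0xff>;
/// Data that fits a 2-byte length prefix.
pub type PushData2 = Bounded<0xffff>;
```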

str4d · 2025-07-30T14:46:55Z

src/script.rs

+use LargeValue::*;
+
+impl LargeValue {
+    pub fn value(&self) -> Vec<u8> {


Document this method.

str4d · 2025-07-30T14:47:29Z

src/script.rs

 #[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
 #[repr(u8)]
-pub enum PushValue {
+pub enum SmallValue {


Document this enum.

Document this addition in the changelog.

str4d · 2025-07-30T17:03:40Z

src/interpreter.rs

+        pv.value().map_or(
+            Err(ScriptError::BadOpcode(Some(SmallValue::OP_RESERVED.into()))),


Ah, here instead it becomes clear that None actually means OP_RESERVED, and the map_or(0, _) earlier is being used so that we reach this error.

However, I note that in the previous code we always return BadOpcode, whereas here it's possible we return MinimalData instead (unless pv.is_minimal_push() is documented to always return true for OP_RESERVED?)

Ah yeah, there is a commit in #209 (6725282) that removes OP_RESERVED from SmallValue making this and the previous comment obsolete. I can pull that into this PR.

str4d · 2025-07-30T17:10:12Z

src/interpreter.rs

+            if stack.len() < i.into() {
+                return Err(ScriptError::InvalidStackOperation);
+            };


Was this incorrectly removed in an earlier refactor, or if not then what is this refactored from? I ask because it changes the error that gets returned (instead of returning the error from stack.rget(i.into())?). Maybe this is also a rebase error?

stack.len() < 0usize can't be true, so this is dead code. I suspect it's incorrectly duplicated from other instances below.

Suggested change

-            if stack.len() < i.into() {
-                return Err(ScriptError::InvalidStackOperation);
-            };

I don’t know how this got reintroduced. Originally i was initialized to 1, so the check made sense, and it should have been removed when that was changed to 0.

str4d · 2025-07-30T17:12:58Z

src/interpreter.rs

-                        let mut isig = i;
-                        i += sigs_count;
-                        if stack.len() <= i.into() {
-                            return Err(ScriptError::InvalidStackOperation);
-                        };
-
-                        let mut success = true;
-                        while success && sigs_count > 0 {
-                            let vch_sig: &ValType = stack.rget(isig.into())?;
-                            let vch_pub_key: &ValType = stack.rget(ikey.into())?;
-
-                            // Check signature
-                            let ok: bool =
-                                is_sig_valid(vch_sig, vch_pub_key, flags, script, &checker)?;
-
-                            if ok {
-                                isig += 1;
-                                sigs_count -= 1;
-                            }
-                            ikey += 1;
-                            keys_count -= 1;
-
-                            // If there are more signatures left than keys left,
-                            // then too many signatures have failed. Exit early,
-                            // without checking any further signatures.
-                            if sigs_count > keys_count {
-                                success = false;
-                            };
-                        }
-
-                        // Clean up stack of actual arguments
-                        for _ in 0..i {
-                            stack.pop()?;
-                        }
-
-                        // A bug causes CHECKMULTISIG to consume one extra argument
-                        // whose contents were not checked in any way.
-                        //
-                        // Unfortunately this is a potential source of mutability,
-                        // so optionally verify it is exactly equal to zero prior
-                        // to removing it from the stack.
-                        if stack.is_empty() {
-                            return Err(ScriptError::InvalidStackOperation);
-                        }
-                        if flags.contains(VerificationFlags::NullDummy)
-                            && !stack.rget(0)?.is_empty()
-                        {
-                            return Err(ScriptError::SigNullDummy);
-                        }
-                        stack.pop()?;
-
-                        stack.push(cast_from_bool(success));
-
-                        if op == OP_CHECKMULTISIGVERIFY {
-                            if success {
-                                stack.pop()?;
-                            } else {
-                                return Err(ScriptError::CheckMultisigVerify);
-                            }
-                        }
-                    }
+        OP_CHECKMULTISIG | OP_CHECKMULTISIGVERIFY => {
+            // ([sig ...] num_of_signatures [pubkey ...] num_of_pubkeys -- bool)
+
+            // NB: This is guaranteed u8-safe, because we are limited to 20 keys and
+            //     20 signatures, plus a couple other fields. u8 also gives us total
+            //     conversions to the other types we deal with here (`isize` and `i64`).
+            let mut i: u8 = 0;
+            if stack.len() < i.into() {
+                return Err(ScriptError::InvalidStackOperation);
+            };
+
+            let mut keys_count =
+                u8::try_from(parse_num(stack.rget(i.into())?, require_minimal, None)?)
+                    .map_err(|_| ScriptError::PubKeyCount)?;
+            if keys_count > 20 {
+                return Err(ScriptError::PubKeyCount);
+            };
+            assert!(*op_count <= MAX_OP_COUNT);
+            *op_count += keys_count;
+            if *op_count > MAX_OP_COUNT {
+                return Err(ScriptError::OpCount);
+            };
+            i += 1;
+            let mut ikey = i;
+            i += keys_count;
+            if stack.len() <= i.into() {
+                return Err(ScriptError::InvalidStackOperation);
+            }

-                    _ => {
-                        return Err(ScriptError::BadOpcode);
-                    }
+            let mut sigs_count =
+                u8::try_from(parse_num(stack.rget(i.into())?, require_minimal, None)?)
+                    .map_err(|_| ScriptError::SigCount)?;
+            if sigs_count > keys_count {
+                return Err(ScriptError::SigCount);
+            };
+            assert!(i <= 21);


This reverts #222 (comment) which I presume is due to a rebase error. Undo this change.

Same problem here; when looking with whitespace hidden this diff is much more obvious.

Oh wait, I wonder if this is actually an artifact of this PR not having been rebased yet.

str4d · 2025-07-30T17:14:17Z

src/interpreter.rs

-                                isig += 1;
-                                sigs_count -= 1;
-                            }
-                            ikey += 1;
-                            keys_count -= 1;
-
-                            // If there are more signatures left than keys left,
-                            // then too many signatures have failed. Exit early,
-                            // without checking any further signatures.
-                            if sigs_count > keys_count {
-                                success = false;
-                            };
-                        }
-
-                        // Clean up stack of actual arguments
-                        for _ in 0..i {
-                            stack.pop()?;
-                        }
-
-                        // A bug causes CHECKMULTISIG to consume one extra argument
-                        // whose contents were not checked in any way.
-                        //
-                        // Unfortunately this is a potential source of mutability,
-                        // so optionally verify it is exactly equal to zero prior
-                        // to removing it from the stack.
-                        if stack.is_empty() {
-                            return Err(ScriptError::InvalidStackOperation);
-                        }
-                        if flags.contains(VerificationFlags::NullDummy)
-                            && !stack.rget(0)?.is_empty()
-                        {
-                            return Err(ScriptError::SigNullDummy);
-                        }
-                        stack.pop()?;
-
-                        stack.push(cast_from_bool(success));
-
-                        if op == OP_CHECKMULTISIGVERIFY {
-                            if success {
-                                stack.pop()?;
-                            } else {
-                                return Err(ScriptError::CheckMultisigVerify);
-                            }
-                        }
-                    }
+        OP_CHECKMULTISIG | OP_CHECKMULTISIGVERIFY => {
+            // ([sig ...] num_of_signatures [pubkey ...] num_of_pubkeys -- bool)
+
+            // NB: This is guaranteed u8-safe, because we are limited to 20 keys and
+            //     20 signatures, plus a couple other fields. u8 also gives us total
+            //     conversions to the other types we deal with here (`isize` and `i64`).
+            let mut i: u8 = 0;
+            if stack.len() < i.into() {
+                return Err(ScriptError::InvalidStackOperation);
+            };
+
+            let mut keys_count =
+                u8::try_from(parse_num(stack.rget(i.into())?, require_minimal, None)?)
+                    .map_err(|_| ScriptError::PubKeyCount)?;
+            if keys_count > 20 {
+                return Err(ScriptError::PubKeyCount);
+            };
+            assert!(*op_count <= MAX_OP_COUNT);
+            *op_count += keys_count;
+            if *op_count > MAX_OP_COUNT {
+                return Err(ScriptError::OpCount);
+            };
+            i += 1;
+            let mut ikey = i;
+            i += keys_count;
+            if stack.len() <= i.into() {
+                return Err(ScriptError::InvalidStackOperation);
+            }

-                    _ => {
-                        return Err(ScriptError::BadOpcode);
-                    }
+            let mut sigs_count =
+                u8::try_from(parse_num(stack.rget(i.into())?, require_minimal, None)?)
+                    .map_err(|_| ScriptError::SigCount)?;
+            if sigs_count > keys_count {
+                return Err(ScriptError::SigCount);
+            };
+            assert!(i <= 21);
+            i += 1;
+            let mut isig = i;
+            i += sigs_count;
+            if stack.len() <= i.into() {
+                return Err(ScriptError::InvalidStackOperation);
+            };
+
+            let mut success = true;
+            while success && sigs_count > 0 {
+                let vch_sig: &ValType = stack.rget(isig.into())?;
+                let vch_pub_key: &ValType = stack.rget(ikey.into())?;
+
+                // Note how this makes the exact order of pubkey/signature evaluation
+                // distinguishable by CHECKMULTISIG NOT if the STRICTENC flag is set.
+                // See the script_(in)valid tests for details.
+                let ok: bool = is_sig_valid(vch_sig, vch_pub_key, flags, script, checker)?;


This also looks like a rebase error, though I guess the comment being re-added here is fine.

Ew, at the outer level (without showing whitespace presumably) this diff looks horrible. I was talking specifically about the re-introduction of the "Note how this makes..." comment.

str4d · 2025-07-30T17:15:27Z

src/test_vectors.rs

I only skimmed the changes to this file.

On the C++ side, this is the value of the error output parameter when script validation is successful. It never occurs on the Rust side (because we have `Result`), and so we can remove it from the enumeration without any consequences.

Previously, the `TestVector`s held normalized results, and we would normalize the _actual_ result before comparison. This changes the `TestVector`s to hold richer Rust results, and then to normalize the _expected_ result only for the C++ case.

This splits `Operation` into three separate enums – `Control` (for if/else, which get executed regardless of `vexec` and disabled operations, which always fail, regardless of `vexec`), `Normal`, for operations that respect `vexec`, and `Unknown` for undefined opcodes which only fail if they’re on an active branch. This is done so that the evaluator can be split on the same lines. __NB__: I strongly recommend ignoring whitespace changes when reviewing this commit. (cherry picked from commit f97b92c)

(cherry picked from commit 27a5037)

This parallels the existing `op` module, and is used in cases where we want to guarantee that only push values are used. `op` itself has been updated to reference `pv`, rather than using the constructors directly.

This introduces one edge case: If a disabled opcode is the 202nd operation in a script, the C++ impl would return `Error::OpCount`, while the Rust impl would return `Error::DisabledOpcode`.
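
The divergence comes down to the order of two checks. The sketch below models that ordering only; the function names and the exact point at which each implementation increments the counter are assumptions made for illustration.

```rust
pub const MAX_OP_COUNT: u8 = 201;

#[derive(Debug, PartialEq, Eq)]
pub enum ScriptError {
    OpCount,
    DisabledOpcode,
}

/// Assumed C++ order: count the operation first, then reject disabled opcodes.
pub fn check_cpp_order(op_count: &mut u8, is_disabled: bool) -> Result<(), ScriptError> {
    *op_count = op_count.saturating_add(1);
    if *op_count > MAX_OP_COUNT {
        return Err(ScriptError::OpCount);
    }
    if is_disabled {
        return Err(ScriptError::DisabledOpcode);
    }
    Ok(())
}

/// Assumed Rust order after this change: disabled opcodes are rejected
/// before the operation count is checked.
pub fn check_rust_order(op_count: &mut u8, is_disabled: bool) -> Result<(), ScriptError> {
    if is_disabled {
        return Err(ScriptError::DisabledOpcode);
    }
    *op_count = op_count.saturating_add(1);
    if *op_count > MAX_OP_COUNT {
        return Err(ScriptError::OpCount);
    }
    Ok(())
}
```

With 201 operations already counted, a disabled opcode yields `OpCount` under the first ordering and `DisabledOpcode` under the second; for any earlier position both orderings agree.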

Having `Control` and `Normal` grouped under `Operation` only eliminated one conditional (checking whether we hit the `op_count` limit, and that is now better abstracted anyway), and it introduced a lot of code (see the 55 lines removed here _plus_ the many nested calls removed, as in op.rs). So, `Normal` is now called `Operation`. The confusing label of “control” for some `Operation` (née `Normal`) opcodes has been removed. (cherry picked from commit 1c98bb3)

Well, my _tiny_ edge case of “only if the 202nd operation is a disabled opcode” didn’t slip past the fuzzer. It caught that pretty quickly.

So, this does a better job of normalizing errors for comparisons. First, it normalizes both the C++ and Rust side, which allows the Rust error cases to not be a superset of the C++ error cases. Then, it also normalizes errors in the stepwise comparator (which I think was done in ZcashFoundation#210, but it’s reasonable to do along with these other changes).

Also, `ScriptError::UnknownError` has been replaced with `ScriptError::ExternalError`, which takes a string description. This is used for two purposes:

1. “Ambiguous” cases. One was already done – `UNKNOWN_ERROR` on the C++ side with `ScriptError::ScriptNumError` or `ScriptError::HighS` on the Rust side, but now it’s handled differently. The other is the edge case I introduced earlier in this PR – Rust will fail with a `DisabledOpcode` and C++ will fail with an `OpCount`, so those two cases have been unified.
2. Errors that occur inside a stepper. Previously, this just melded in by returning `UnknownError`, but that was just the “least bad” option. Now we can distinguish these from other `ScriptError`.

- add a `ParsedOpcode` type rather than using a tuple
- use constants for `PUSHDATA` bytes
- add some documentation

This gives the different `LargeValue` constructors different types, but it can’t (yet) enforce minimal encoding at the type level.

This is one more step toward type-safe scripts. It also modifies `Script::parse` to not fail on bad opcodes, and adds a new `Script::parse_strict` that _does_ fail on bad opcodes.

nuttycom

A few minor style nits; some of my style complaints from earlier commits were already addressed by other changes by the time I got to the end of the PR, but some (in particular, the request that functional constructs not be used for side-effecting code) remain.

nuttycom · 2025-07-31T17:48:23Z

src/lib.rs

-            ScriptError::ScriptNumError(_) => ScriptError::UnknownError,
-            _ => serr,
-        }),
+        Error::Ok(serr) => Error::Ok(normalize_script_error(serr)),


I went and looked at the documentation of Error::Ok, and I could not figure out why this is called Ok - can you document how or why this particular error is okay in some sense? I know it's not something that's specific to this PR, but that naming is definitely confusing without there being additional elaboration.

nuttycom · 2025-08-01T19:57:17Z

src/interpreter.rs

@@ -484,7 +492,7 @@ fn eval_control(
            vexec.pop()?;
        }

-        OP_VERIF | OP_VERNOTIF => return Err(ScriptError::BadOpcode),
+        OP_VERIF | OP_VERNOTIF => return Err(ScriptError::BadOpcode(Some(op.into()))),


nit: I generally have had bad experiences when using .into(); these can be broken by spooky action at a distance by the introduction of From impls. Here it's probably fine, but I've gotten into the habit of avoiding into entirely in favor of e.g. u8::from(op).
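
A short sketch of the difference. The trimmed `Control` and `ScriptError` types and the `reject` function are hypothetical; the byte values are the standard ones for OP_VERIF and OP_VERNOTIF.

```rust
/// Hypothetical trimmed enum for illustration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum Control {
    OpVerIf = 0x65,
    OpVerNotIf = 0x66,
}

impl From<Control> for u8 {
    fn from(op: Control) -> u8 {
        op as u8
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum ScriptError {
    BadOpcode(Option<u8>),
}

pub fn reject(op: Control) -> Result<(), ScriptError> {
    // Naming the target type pins this call to `From<Control> for u8`.
    // With `op.into()` the target is inferred from context, so the call
    // depends on the expected type staying the same elsewhere.
    Err(ScriptError::BadOpcode(Some(u8::from(op))))
}
```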

nuttycom · 2025-08-01T20:06:32Z

src/script.rs

+pub struct ParsedOpcode<'a> {
+    /// The [`Result`] allows us to preserve unknown opcodes, which only trigger a failure if
+    /// they’re on an active branch during interpretation.
+    pub opcode: Result<Opcode, u8>,
+    pub remaining_code: &'a [u8],
+}


First, does this need to be part of the public API? Second, can the members be crate-private? Finally, it looks like now the Result is just being used as a generic Either type; I'd prefer a bespoke enum instead.
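
One possible shape for such a bespoke enum, sketched with hypothetical names (`OpcodeParseResult`, `parse_byte`, `collect_strict`) and a two-opcode stand-in for `Opcode`:

```rust
/// Hypothetical trimmed opcode set.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Opcode {
    Dup,
    Drop,
}

/// Says what the two cases mean, instead of reusing `Result` as an `Either`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum OpcodeParseResult {
    /// A defined opcode.
    Known(Opcode),
    /// An undefined opcode byte, kept so it can fail later on an active branch.
    Unknown(u8),
}

#[derive(Debug, PartialEq, Eq)]
pub enum ScriptError {
    BadOpcode(Option<u8>),
}

pub fn parse_byte(byte: u8) -> OpcodeParseResult {
    match byte {
        0x76 => OpcodeParseResult::Known(Opcode::Dup),
        0x75 => OpcodeParseResult::Known(Opcode::Drop),
        other => OpcodeParseResult::Unknown(other),
    }
}

/// Strict collection: the first unknown byte is an error.
pub fn collect_strict(code: &[u8]) -> Result<Vec<Opcode>, ScriptError> {
    let mut result = Vec::new();
    for byte in code {
        match parse_byte(*byte) {
            OpcodeParseResult::Known(op) => result.push(op),
            OpcodeParseResult::Unknown(b) => return Err(ScriptError::BadOpcode(Some(b))),
        }
    }
    Ok(result)
}
```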

nuttycom · 2025-08-01T20:13:48Z

src/script.rs

+pub struct ParsedOpcode<'a> {
+    /// The [`Result`] allows us to preserve unknown opcodes, which only trigger a failure if
+    /// they’re on an active branch during interpretation.
+    pub opcode: Result<Opcode, u8>,


Suggested change

-    pub opcode: Result<Opcode, u8>,
+    pub opcode: OpcodeParseResult,

(potentially pub(crate)?)

nuttycom · 2025-08-01T20:16:07Z

src/script.rs

+                    opcode
+                        .map_err(|byte| ScriptError::BadOpcode(Some(byte)))
+                        .map(|op| result.push(op))


Suggested change

-                    opcode
-                        .map_err(|byte| ScriptError::BadOpcode(Some(byte)))
-                        .map(|op| result.push(op))
+                    match opcode {
+                        Known(op) => {
+                            result.push(op);
+                            Ok(())
+                        }
+                        Unknown(byte) => Err(ScriptError::BadOpcode(Some(byte)))
+                    }

nuttycom · 2025-08-01T20:22:22Z

src/script.rs

+                vec.clone().try_into().map_or(
+                    vec.clone().try_into().map_or(
+                        vec.clone().try_into().map_or(
+                            vec.try_into().map_or(None, |bv| Some(OP_PUSHDATA4(bv))),


I would really prefer that these were <type>::try_from calls instead; relying on type inference makes this code really hard to read.
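
A tiny example of the readability point, using plain integer conversions rather than the PR's bounded vectors. The function names are hypothetical.

```rust
use std::convert::TryFrom;

/// Returns `len` as a one-byte length prefix, or `None` if it does not fit.
pub fn one_byte_prefix(len: usize) -> Option<u8> {
    // The target type is visible at the call site. With `len.try_into()`
    // the reader would have to recover it from the return type.
    u8::try_from(len).ok()
}

/// Returns `len` as a two-byte length prefix, or `None` if it does not fit.
pub fn two_byte_prefix(len: usize) -> Option<u16> {
    u16::try_from(len).ok()
}
```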

nuttycom · 2025-08-01T20:26:12Z

src/script.rs


-use PushValue::*;
+#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
+pub enum PushValue {


Variants still need documentation.

nuttycom · 2025-08-01T20:29:51Z

src/script.rs

-                    opcode
-                        .map_err(|byte| ScriptError::BadOpcode(Some(byte)))
-                        .map(|op| result.push(op))
+                    result.push(opcode)


This is a better resolution to my previous comment, but I would still prefer imperative style here given that it's mutating variables in the enclosing scope.

str4d

utACK 87202b0. Comments are all style and non-blocking (particularly if addressing them makes the later PRs harder to rebase), though I will likely go through and make the changes myself in a follow-up PR if they aren't addressed.

str4d · 2025-08-01T14:02:12Z

src/lib.rs

+    pub fn run_test_vector(
+        tv: &TestVector,


I just noticed that this is now in the public API (behind test-dependencies), and TestVector is now pub, but there's no way to obtain a TestVector in the public API (because crate::test_vectors is private). Either make crate::test_vectors public, or make this private (depending on which is intended).

str4d · 2025-08-01T14:49:02Z

src/script.rs

+                let vec = v.to_vec();
+                vec.clone().try_into().map_or(
+                    vec.clone().try_into().map_or(
+                        vec.clone().try_into().map_or(
+                            vec.try_into().map_or(None, |bv| Some(OP_PUSHDATA4(bv))),
+                            |bv| Some(OP_PUSHDATA2(bv)),
+                        ),
+                        |bv| Some(OP_PUSHDATA1(bv)),
+                    ),
+                    |bv| Some(PushdataBytelength(bv)),
+                )


Nits:

- This clones the data one more time than necessary.

- It also clones the data way more times than necessary, because .map_or eagerly evaluates its first argument. You'd need to use .map_or_else instead to get lazy evaluation.

- I find nested map_or (where the or case is complex) unreadable, and for the inner case where there is no further nesting, it is more verbose than .ok().map(_).
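
The eager-versus-lazy point can be seen in isolation with a counter. This is a standalone sketch; `expensive_fallback`, `eager`, and `lazy` are hypothetical names, and the counter stands in for the cost of the extra clones.

```rust
use std::cell::Cell;

/// Stand-in for an expensive default; counts how often it is evaluated.
fn expensive_fallback(calls: &Cell<u32>) -> u32 {
    calls.set(calls.get() + 1);
    0
}

/// `map_or` takes its default by value, so the fallback is computed
/// before `map_or` even looks at `input`.
pub fn eager(input: Result<u32, ()>, calls: &Cell<u32>) -> u32 {
    input.map_or(expensive_fallback(calls), |v| v + 1)
}

/// `map_or_else` takes a closure, so the fallback only runs on `Err`.
pub fn lazy(input: Result<u32, ()>, calls: &Cell<u32>) -> u32 {
    input.map_or_else(|_| expensive_fallback(calls), |v| v + 1)
}
```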

Suggested change

-                let vec = v.to_vec();
-                vec.clone().try_into().map_or(
-                    vec.clone().try_into().map_or(
-                        vec.clone().try_into().map_or(
-                            vec.try_into().map_or(None, |bv| Some(OP_PUSHDATA4(bv))),
-                            |bv| Some(OP_PUSHDATA2(bv)),
-                        ),
-                        |bv| Some(OP_PUSHDATA1(bv)),
-                    ),
-                    |bv| Some(PushdataBytelength(bv)),
-                )
+                if let Ok(bv) = v.to_vec().try_into() {
+                    Some(PushdataBytelength(bv))
+                } else if let Ok(bv) = v.to_vec().try_into() {
+                    Some(OP_PUSHDATA1(bv))
+                } else if let Ok(bv) = v.to_vec().try_into() {
+                    Some(OP_PUSHDATA2(bv))
+                } else {
+                    v.to_vec().try_into().ok().map(OP_PUSHDATA4)
+                }

Or if we want the more functional approach:

Suggested change

-                let vec = v.to_vec();
-                vec.clone().try_into().map_or(
-                    vec.clone().try_into().map_or(
-                        vec.clone().try_into().map_or(
-                            vec.try_into().map_or(None, |bv| Some(OP_PUSHDATA4(bv))),
-                            |bv| Some(OP_PUSHDATA2(bv)),
-                        ),
-                        |bv| Some(OP_PUSHDATA1(bv)),
-                    ),
-                    |bv| Some(PushdataBytelength(bv)),
-                )
+                v.to_vec()
+                    .try_into()
+                    .ok()
+                    .map(PushdataBytelength)
+                    .or_else(|| {
+                        v.to_vec()
+                            .try_into()
+                            .ok()
+                            .map(OP_PUSHDATA1)
+                    })
+                    .or_else(|| {
+                        v.to_vec()
+                            .try_into()
+                            .ok()
+                            .map(OP_PUSHDATA2)
+                    })
+                    .or_else(|| {
+                        v.to_vec()
+                            .try_into()
+                            .ok()
+                            .map(OP_PUSHDATA4)
+                    })
+            }

str4d · 2025-08-01T14:55:02Z

src/script.rs

+        match v {
+            [] => None,
+            _ => {


Instead of a single-case match, we generally use:

if v.is_empty() { None } else {

or

(!v.is_empty()).then(|| {
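
Both forms side by side, as a complete sketch. The function names are hypothetical, and taking the first byte is just a placeholder for the real body.

```rust
/// Explicit branch on emptiness.
pub fn first_byte_branch(v: &[u8]) -> Option<u8> {
    if v.is_empty() {
        None
    } else {
        Some(v[0])
    }
}

/// `bool::then` runs the closure only when the condition is true
/// and wraps its result in `Some`; otherwise it yields `None`.
pub fn first_byte_then(v: &[u8]) -> Option<u8> {
    (!v.is_empty()).then(|| v[0])
}
```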

str4d · 2025-08-01T15:01:14Z

src/script.rs

+///
+/// TODO: These should have lower bounds that can prevent non-minimal encodings, but that requires
+///       at least `const_generic_exprs`.


Should they? Is it the case that the C++ script engine never permits non-minimal encodings to be used during execution?

The upper bounds by contrast are clearly correct, as it is impossible to e.g. parse an OP_PUSHDATA1 and get more than 0xff bytes.

str4d · 2025-08-01T22:48:06Z

src/script.rs

+        match v {
+            [] => Some(PushValue::SmallValue(OP_0)),
+            [byte] => Some(match byte {
+                0x81 => PushValue::SmallValue(OP_1NEGATE),
+                1 => PushValue::SmallValue(OP_1),
+                2 => PushValue::SmallValue(OP_2),
+                3 => PushValue::SmallValue(OP_3),
+                4 => PushValue::SmallValue(OP_4),
+                5 => PushValue::SmallValue(OP_5),
+                6 => PushValue::SmallValue(OP_6),
+                7 => PushValue::SmallValue(OP_7),
+                8 => PushValue::SmallValue(OP_8),
+                9 => PushValue::SmallValue(OP_9),
+                10 => PushValue::SmallValue(OP_10),
+                11 => PushValue::SmallValue(OP_11),
+                12 => PushValue::SmallValue(OP_12),
+                13 => PushValue::SmallValue(OP_13),
+                14 => PushValue::SmallValue(OP_14),
+                15 => PushValue::SmallValue(OP_15),
+                16 => PushValue::SmallValue(OP_16),
+                _ => PushValue::LargeValue(PushdataBytelength([*byte; 1].into())),
+            }),
+            _ => LargeValue::from_slice(v).map(PushValue::LargeValue),
+        }


Nit: the inner match can be merged with the outer, and then we aren't duplicating the parsing logic for one-byte PushdataBytelengths:

Suggested change

-match v {
-    [] => Some(PushValue::SmallValue(OP_0)),
-    [byte] => Some(match byte {
-        0x81 => PushValue::SmallValue(OP_1NEGATE),
-        1 => PushValue::SmallValue(OP_1),
-        2 => PushValue::SmallValue(OP_2),
-        3 => PushValue::SmallValue(OP_3),
-        4 => PushValue::SmallValue(OP_4),
-        5 => PushValue::SmallValue(OP_5),
-        6 => PushValue::SmallValue(OP_6),
-        7 => PushValue::SmallValue(OP_7),
-        8 => PushValue::SmallValue(OP_8),
-        9 => PushValue::SmallValue(OP_9),
-        10 => PushValue::SmallValue(OP_10),
-        11 => PushValue::SmallValue(OP_11),
-        12 => PushValue::SmallValue(OP_12),
-        13 => PushValue::SmallValue(OP_13),
-        14 => PushValue::SmallValue(OP_14),
-        15 => PushValue::SmallValue(OP_15),
-        16 => PushValue::SmallValue(OP_16),
-        _ => PushValue::LargeValue(PushdataBytelength([*byte; 1].into())),
-    }),
-    _ => LargeValue::from_slice(v).map(PushValue::LargeValue),
-}
+match v {
+    [] => Some(PushValue::SmallValue(OP_0)),
+    [0x81] => Some(PushValue::SmallValue(OP_1NEGATE)),
+    [1] => Some(PushValue::SmallValue(OP_1)),
+    [2] => Some(PushValue::SmallValue(OP_2)),
+    [3] => Some(PushValue::SmallValue(OP_3)),
+    [4] => Some(PushValue::SmallValue(OP_4)),
+    [5] => Some(PushValue::SmallValue(OP_5)),
+    [6] => Some(PushValue::SmallValue(OP_6)),
+    [7] => Some(PushValue::SmallValue(OP_7)),
+    [8] => Some(PushValue::SmallValue(OP_8)),
+    [9] => Some(PushValue::SmallValue(OP_9)),
+    [10] => Some(PushValue::SmallValue(OP_10)),
+    [11] => Some(PushValue::SmallValue(OP_11)),
+    [12] => Some(PushValue::SmallValue(OP_12)),
+    [13] => Some(PushValue::SmallValue(OP_13)),
+    [14] => Some(PushValue::SmallValue(OP_14)),
+    [15] => Some(PushValue::SmallValue(OP_15)),
+    [16] => Some(PushValue::SmallValue(OP_16)),
+    _ => LargeValue::from_slice(v).map(PushValue::LargeValue),
+}
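The flattened shape can be tried out in isolation with the sketch below. The `Push` enum and `classify` function are hypothetical simplifications (a real `PushValue` has one variant per small opcode), and the range binding `n @ 1..=16` is used here only to keep the example short. The point is that every one-byte case sits at the top level, so anything not matched falls through to the single general-purpose arm.

```rust
/// Hypothetical, simplified classification of a pushed byte string.
#[derive(Debug, PartialEq, Eq)]
pub enum Push {
    Zero,
    NegOne,
    /// Stands in for OP_1 through OP_16.
    Small(u8),
    /// Everything else, handled by one shared code path.
    Large(Vec<u8>),
}

pub fn classify(v: &[u8]) -> Push {
    match v {
        [] => Push::Zero,
        [0x81] => Push::NegOne,
        // Slice patterns can bind and range-check a single element at once.
        [n @ 1..=16] => Push::Small(*n),
        // One-byte values outside the small range land here as well,
        // so there is no separate one-byte fallback to maintain.
        _ => Push::Large(v.to_vec()),
    }
}
```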

str4d · 2025-08-01T22:54:46Z

src/script.rs

+            Self::get_op(pc).map(
+                |ParsedOpcode {
+                     opcode,
+                     remaining_code,
+                 }| {
+                    pc = remaining_code;
+                    result.push(opcode)
+                },
+            )?;


Suggested change

-Self::get_op(pc).map(
-    |ParsedOpcode {
-         opcode,
-         remaining_code,
-     }| {
-        pc = remaining_code;
-        result.push(opcode)
-    },
-)?;
+let ParsedOpcode {
+    opcode,
+    remaining_code,
+} = Self::get_op(pc)?;
+pc = remaining_code;
+result.push(opcode);
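A self-contained sketch of the destructuring-`let` form in a parse loop follows. `Parsed`, `get_op`, and `parse_all` are hypothetical and far simpler than the real parser (every byte is treated as a one-byte opcode, and `0xff` is arbitrarily treated as an error); the sketch only shows that `?` on the right-hand side of a destructuring `let` replaces the side-effecting closure.

```rust
/// Hypothetical result of reading one opcode.
pub struct Parsed<'a> {
    pub opcode: u8,
    pub remaining_code: &'a [u8],
}

/// Reads one byte as an opcode; 0xff is arbitrarily rejected in this sketch.
pub fn get_op(code: &[u8]) -> Result<Parsed<'_>, String> {
    match code.split_first() {
        Some((&opcode, remaining_code)) if opcode != 0xff => Ok(Parsed {
            opcode,
            remaining_code,
        }),
        Some((&bad, _)) => Err(format!("bad byte {:#04x}", bad)),
        None => Err("unexpected end of script".to_string()),
    }
}

pub fn parse_all(code: &[u8]) -> Result<Vec<u8>, String> {
    let mut pc = code;
    let mut result = Vec::new();
    while !pc.is_empty() {
        // Destructure directly; `?` returns early on the first error.
        let Parsed {
            opcode,
            remaining_code,
        } = get_op(pc)?;
        pc = remaining_code;
        result.push(opcode);
    }
    Ok(result)
}
```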

str4d · 2025-08-01T23:00:57Z

src/script.rs

+                Some((leading_byte, remaining_code)) => Disabled::from_u8(*leading_byte).map_or(
+                    Ok(ParsedOpcode {
+                        opcode: SmallValue::from_u8(*leading_byte).map_or(
+                            Control::from_u8(*leading_byte).map_or(
+                                Operation::from_u8(*leading_byte)
+                                    .map_or(Err((*leading_byte).into()), |op| {
+                                        Ok(Opcode::Operation(op))
+                                    }),
+                                |ctl| Ok(Opcode::Control(ctl)),
+                            ),
+                            |sv| Ok(Opcode::PushValue(PushValue::SmallValue(sv))),
+                        ),
+                        remaining_code,
+                    }),
+                    |disabled| Err(ScriptError::DisabledOpcode(Some(disabled))),
+                ),


Non-blocking: Another case where `.map_or` evaluates its default argument eagerly when it should be evaluated lazily. However, this entire section will be rewritten when `enum_primitive` is removed, so I'm fine leaving it as-is for now.
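For readers unfamiliar with the distinction, the following sketch makes the eager evaluation observable. The `fallback` function and its call counter are hypothetical, added purely for illustration: `Option::map_or` takes its default as a value, so the expression producing it always runs, whereas `Option::map_or_else` takes a closure that runs only for `None`.

```rust
use std::cell::Cell;

/// Hypothetical default computation that records how often it runs.
pub fn fallback(calls: &Cell<u32>) -> u8 {
    calls.set(calls.get() + 1);
    0
}

/// The default is computed before `map_or` is even called.
pub fn eager(x: Option<u8>, calls: &Cell<u32>) -> u8 {
    x.map_or(fallback(calls), |v| v + 1)
}

/// The default is computed only when `x` is `None`.
pub fn lazy(x: Option<u8>, calls: &Cell<u32>) -> u8 {
    x.map_or_else(|| fallback(calls), |v| v + 1)
}
```

With nested `map_or` calls, as in the code under review, every inner lookup runs even when an outer one has already succeeded.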

str4d · 2025-08-01T23:05:36Z

src/script.rs

+        match self.parse().map(|script| script.into_iter().collect()) {
+            Err(op_err) => Err(Ok(op_err)),
+            Ok(Err(bad_ops)) => Err(Err(bad_ops)),
+            Ok(Ok(ops)) => Ok(ops),
+        }


This is a very confusing match without documentation. @nuttycom's enum suggestion would likely help, but also some comments would make things clearer.
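As one possible reading of what a dedicated enum could look like, here is a sketch. `ParseFailure`, its variant names, and `flatten` are invented for illustration and are not the suggestion referred to above; the sketch only shows how naming the two failure modes avoids an error type that is itself a `Result`.

```rust
/// Hypothetical replacement for an error type of the shape `Result<E, B>`.
#[derive(Debug, PartialEq, Eq)]
pub enum ParseFailure<E, B> {
    /// The script could not be parsed at all.
    Malformed(E),
    /// The script parsed, but contained bad opcodes.
    BadOpcodes(B),
}

/// Converts the nested result into one with a self-describing error.
pub fn flatten<T, E, B>(nested: Result<Result<T, B>, E>) -> Result<T, ParseFailure<E, B>> {
    match nested {
        // Outer failure: parsing itself went wrong.
        Err(e) => Err(ParseFailure::Malformed(e)),
        // Inner failure: parsing succeeded but the contents were rejected.
        Ok(Err(b)) => Err(ParseFailure::BadOpcodes(b)),
        Ok(Ok(t)) => Ok(t),
    }
}
```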

str4d · 2025-08-01T23:10:49Z

src/interpreter.rs

+                if Bad::OP_RESERVED != bad {
+                    state.increment_op_count()?;
+                }
+                if Bad::OP_VERIF == bad || Bad::OP_VERNOTIF == bad || should_exec(&state.vexec) {


This should also work:

Suggested change

-if Bad::OP_VERIF == bad || Bad::OP_VERNOTIF == bad || should_exec(&state.vexec) {
+if matches!(bad, Bad::OP_VERIF | Bad::OP_VERNOTIF) || should_exec(&state.vexec) {

(turns out it's the same length in this instance).
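A standalone sketch of the `matches!` form is below. The `Bad` enum here is a hypothetical reduction with invented variant names, and `executing` stands in for `should_exec(&state.vexec)`. One side benefit, visible in the sketch, is that `matches!` works by pattern matching and so does not need the type to implement `PartialEq`.

```rust
/// Hypothetical reduction of the "bad opcode" enum; note: no `PartialEq`.
#[derive(Clone, Copy, Debug)]
pub enum Bad {
    OpReserved,
    OpVerIf,
    OpVerNotIf,
    Unknown(u8),
}

/// True for the opcodes that fail even on an inactive branch.
pub fn fails_when_skipped(bad: Bad) -> bool {
    matches!(bad, Bad::OpVerIf | Bad::OpVerNotIf)
}

/// Mirrors the shape of the condition in the suggestion above.
pub fn must_fail(bad: Bad, executing: bool) -> bool {
    fails_when_skipped(bad) || executing
}
```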

str4d · 2025-08-01T23:15:29Z

src/interpreter.rs

+             remaining_code,
+         }| match opcode {
+            Err(bad) => {
+                if Bad::OP_RESERVED != bad {


Suggested change

-if Bad::OP_RESERVED != bad {
+// See the documentation of `Bad` for an explanation of this logic.
+if Bad::OP_RESERVED != bad {

This was referenced May 22, 2025

Add script serialization & patterns #224

Open

Stable Rust changes #209

Draft

mpguerra added this to Zebra Jun 2, 2025

mpguerra moved this to Review/QA in Zebra Jun 2, 2025

mpguerra assigned conradoplg Jun 2, 2025

str4d requested changes Jun 24, 2025

github-project-automation bot moved this from Review/QA to In progress in Zebra Jun 24, 2025

sellout added a commit to sellout/zcash_script that referenced this pull request Jul 2, 2025

Add more ergonomic opcode definitions

d74fe5b

This was originally part of ZcashFoundation#223, but with the script validation tests being added, it was helpful to bring this subset forward.

sellout mentioned this pull request Jul 2, 2025

Migrate script_*valid.json tests to Rust TestVectors #241

Merged

sellout force-pushed the partition-opcodes branch from f0753c5 to 0ec5b66 Compare July 2, 2025 23:49

natalieesk moved this from In progress to Review/QA in Zebra Jul 9, 2025

natalieesk added the external contribution If an issue or PR has been created by someone external to the Foundation label Jul 15, 2025

sellout force-pushed the partition-opcodes branch 2 times, most recently from 533f27a to fb6ff58 Compare July 16, 2025 20:22

sellout marked this pull request as draft July 21, 2025 19:34

nuttycom reviewed Jul 29, 2025

str4d reviewed Jul 29, 2025

sellout force-pushed the partition-opcodes branch 2 times, most recently from 521747c to 05854fc Compare July 29, 2025 22:12

str4d requested changes Jul 30, 2025

sellout added 9 commits July 31, 2025 08:13

Remove ScriptError::Ok

9ed0ceb

On the C++ side, this is the value of the error output parameter when script validation is successful. It never occurs on the Rust side (because we have `Result`), and so we can remove it from the enumeration without any consequences.

Improve TestVector expectations

548391a

Previously, the `TestVector`s held normalized results, and we would normalize the _actual_ result before comparison. This changes the `TestVector`s to hold richer Rust results, and then to normalize the _expected_ result only for the C++ case.

Integrate values with push ops

88cd075

(cherry picked from commit 27a5037)

Expose ergonomic push values

b61b1c5

This parallels the existing `op` module, and is used in cases where we want to guarantee that only push values are used. `op` itself has been updated to reference `pv`, rather than using the constructors directly.

Exclude disabled ops from Opcode

b874388

This introduces one edge case: If a disabled opcode is the 202nd operation in a script, the C++ impl would return `Error::OpCount`, while the Rust impl would return `Error::DisabledOpcode`.

Add a MAX_OP_COUNT constant

f6c71ad

Don’t promote unknown bytes to opcodes

3bab11e

sellout force-pushed the partition-opcodes branch from 05854fc to faef56e Compare July 31, 2025 15:40

sellout added 2 commits July 31, 2025 10:34

Improve opcode parsing clarity

fa8a800

- add a `ParsedOpcode` type rather than using a tuple
- use constants for `PUSHDATA` bytes
- add some documentation

Make the public interface for PushValue safer

8bc3320

sellout marked this pull request as ready for review July 31, 2025 20:33

sellout added 3 commits July 31, 2025 14:35

Use BoundedVec for LargeValue

24b0348

This gives the different `LargeValue` constructors different types, but it can’t (yet) enforce minimal encoding at the type level.

Address other minor PR comments

38e10c3

Separate “bad” opcodes from Opcode

87202b0

This is one more step toward type-safe scripts. It also modifies `Script::parse` to not fail on bad opcodes, and adds a new `Script::parse_strict` that _does_ fail on bad opcodes.

sellout force-pushed the partition-opcodes branch from c719e4a to 87202b0 Compare July 31, 2025 20:36

nuttycom requested changes Aug 1, 2025

str4d approved these changes Aug 1, 2025

-#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
-pub enum LargeValue {
-    // push value
-    PushdataBytelength(Vec<u8>),
-    OP_PUSHDATA1(Vec<u8>),
-    OP_PUSHDATA2(Vec<u8>),
-    OP_PUSHDATA4(Vec<u8>),
-}
+#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
+pub enum LargeValue {
+    // push value
+    PushdataBytelength(PushData0),
+    OP_PUSHDATA1(PushData1),
+    OP_PUSHDATA2(PushData2),
+    OP_PUSHDATA4(PushData4),
+}
+/// A `PushData` that is short enough to not require a length prefix.
+pub struct PushData0(Vec<u8>);
+/// A `PushData` that requires a 1-byte length prefix.
+pub struct PushData1(Vec<u8>);
+/// A `PushData` that requires a 2-byte length prefix.
+pub struct PushData2(Vec<u8>);
+/// A `PushData` that requires a 4-byte length prefix.
+pub struct PushData4(Vec<u8>);

		pv.value().map_or(
		Err(ScriptError::BadOpcode(Some(SmallValue::OP_RESERVED.into()))),

	if stack.len() < i.into() {
	return Err(ScriptError::InvalidStackOperation);
	};

	pub opcode: Result<Opcode, u8>,
	pub opcode: OpcodeParseResult,
