-
Notifications
You must be signed in to change notification settings - Fork 12
Partition Operation enum #223
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flushing comment from a partial review as of f0753c5.
This was originally part of ZcashFoundation#223, but with the script validation tests being added, it was helpful to bring this subset forward.
f0753c5
to
0ec5b66
Compare
* Add more ergonomic opcode definitions This was originally part of #223, but with the script validation tests being added, it was helpful to bring this subset forward. * Port script tests from C++ This is regex-transformed from the script_*valid.json tests in the C++ implementation. The most significant difference is that this checks the specific result, not just whether the script is valid. This is useful for a couple reasons: - it catches places where a test may not be failing in the way we expect (e.g., invalid.rs has a test that is likely not testing what it should be); - it catches places the implementation returns the wrong error (these have already been handled in the Rust implementation, but see all of the `UnknownError` results – if we’re explicitly testing for them, they shouldn’t be unknown, and one is definitely a case of forgot-to- report-`SigHighS`); and - it allows us to interleave successful and unsuccessful tests in future, rather than splitting related tests into two files just because one succeeds and one represents an error. However, some tests currently fail, this is because they rely on specific transactions in `CHECK*SIG*`. This was easy to support prior to the callback change, but these tests were never migrated to work on the updated C++ implementation. See #240. * Update script tests This brings them in sync with bitcoin/bitcoin@76da761 , which is the last commit before they were merged into a single file. Unfortunately, this also adds expected results to each entry, so the diff is bigger than just added/changed tests. Also, since the C++/JSON version of these tests was never updated to work after the callback change, I’m updating the JSON files to match the Bitcoin ones (modulo the tests we don’t support), just to keep everything in sync. * Unify the valid & invalid script tests This mirrors the changes from bitcoin/bitcoin@dde46d3 * Sync script tests with Bitcoin HEAD This brings the script tests up to date as of bitcoin/bitcoin@fa33592 , but without the cases that require features we don’t support. * Merge unified `TestVector`s into containing module
533f27a
to
fb6ff58
Compare
src/script.rs
Outdated
}, | ||
} | ||
} | ||
|
||
pub fn get_op(script: &[u8]) -> Result<(Result<Opcode, u8>, &[u8]), ScriptError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of a tuple here, can the Ok
side be a struct with named members? That would help with clarity.
Also, document this method (and the new struct).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/script.rs
Outdated
@@ -400,28 +462,61 @@ impl Script<'_> { | |||
}) | |||
} | |||
|
|||
pub fn get_op2(script: &[u8]) -> Result<(Opcode, &[u8], &[u8]), ScriptError> { | |||
pub fn get_lv(script: &[u8]) -> Result<Option<(LargeValue, &[u8])>, ScriptError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document this method. Does it need to be public?
src/script.rs
Outdated
}, | ||
} | ||
} | ||
|
||
pub fn get_op(script: &[u8]) -> Result<(Result<Opcode, u8>, &[u8]), ScriptError> { | ||
Self::get_lv(script).and_then(|r| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this is another place where using ?
would be clearer and require less indentation.
src/script.rs
Outdated
0x4c => Self::split_tagged_value(script, 1) | ||
.map(|(v, script)| Some((OP_PUSHDATA1(v.to_vec()), script))), | ||
0x4d => Self::split_tagged_value(script, 2) | ||
.map(|(v, script)| Some((OP_PUSHDATA2(v.to_vec()), script))), | ||
0x4e => Self::split_tagged_value(script, 4) | ||
.map(|(v, script)| Some((OP_PUSHDATA4(v.to_vec()), script))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add pub(crate)
constants for these values instead of using the magic numbers inline.
src/script.rs
Outdated
_ => { | ||
if 0x01 <= *leading_byte && *leading_byte < 0x4c { | ||
Self::split_value(script, (*leading_byte).into()) | ||
.map(|(v, script)| Some((PushdataBytelength(v.to_vec()), script))) | ||
} else { | ||
Ok(None) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_ => { | |
if 0x01 <= *leading_byte && *leading_byte < 0x4c { | |
Self::split_value(script, (*leading_byte).into()) | |
.map(|(v, script)| Some((PushdataBytelength(v.to_vec()), script))) | |
} else { | |
Ok(None) | |
} | |
} | |
0x01..0x4c => { | |
Self::split_value(script, (*leading_byte).into()) | |
.map(|(v, script)| Some((PushdataBytelength(v.to_vec()), script))) | |
} | |
_ => Ok(None), |
0x01..0x4c
matches are newer Rust syntax; if the MSRV of this crate is too low, then use 0x01..=0x4b
instead.
(And also move this above the 0x4c
case so they are in order, for niceness.)
521747c
to
05854fc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 05854fc
src/pv.rs
Outdated
pub fn pushdata_bytelength(value: Vec<u8>) -> PushValue { | ||
LargeValue(PushdataBytelength(value)) | ||
} | ||
|
||
pub fn pushdata1(value: Vec<u8>) -> PushValue { | ||
LargeValue(OP_PUSHDATA1(value)) | ||
} | ||
|
||
pub fn pushdata2(value: Vec<u8>) -> PushValue { | ||
LargeValue(OP_PUSHDATA2(value)) | ||
} | ||
|
||
pub fn pushdata4(value: Vec<u8>) -> PushValue { | ||
LargeValue(OP_PUSHDATA4(value)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of having these in the public API, as they enable construction of both non-minimal (not ideal, but eh) and invalid (because the data is longer than can be represented) PushValue
s. Instead, I'd prefer a PushValue
constructor that takes a data: Vec<u8>
and returns the correct PushValue
(or None
/ error if it is too long).
Non-blocking because it looks like crate::test_vectors::Entry::val_to_pv
does exactly this, and IIRC there's also maybe a push_vec
helper function that appears in a later PR, so I'm fine with addressing this in a subsequent PR (at which point all of these should be pub(crate)
at most, unless there's a documented reason for these being in the public API).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You’re right. It’s fixed in e54309d#diff-06fac4d64e96cf6aba9121211fd0596281b0225a40a27abe35f87357a9e749bb (from #209), but I can pull that into here.
src/script.rs
Outdated
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug)] | ||
pub enum LargeValue { | ||
// push value | ||
PushdataBytelength(Vec<u8>), | ||
OP_PUSHDATA1(Vec<u8>), | ||
OP_PUSHDATA2(Vec<u8>), | ||
OP_PUSHDATA4(Vec<u8>), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These being enum tuple variants means that they are forcibly public, which has the same problem as my previous comment (it is possible to construct an invalid LargeValue
, and thus an invalid PushValue
). I would prefer that each of these use an opaque newtype wrapper to keep the Vec<u8>
hidden:
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug)] | |
pub enum LargeValue { | |
// push value | |
PushdataBytelength(Vec<u8>), | |
OP_PUSHDATA1(Vec<u8>), | |
OP_PUSHDATA2(Vec<u8>), | |
OP_PUSHDATA4(Vec<u8>), | |
} | |
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug)] | |
pub enum LargeValue { | |
// push value | |
PushdataBytelength(PushData0), | |
OP_PUSHDATA1(PushData1), | |
OP_PUSHDATA2(PushData2), | |
OP_PUSHDATA4(PushData4), | |
} | |
/// A `PushData` that is short enough to not require a length prefix. | |
pub struct PushData0(Vec<u8>); | |
/// A `PushData` that requires a 1-byte length prefix. | |
pub struct PushData1(Vec<u8>); | |
/// A `PushData` that requires a 2-byte length prefix. | |
pub struct PushData2(Vec<u8>); | |
/// A `PushData` that requires a 4-byte length prefix. | |
pub struct PushData4(Vec<u8>); |
Alternatively, make LargeValue
a struct, and handle the difference between the required opcode / length prefix internally. (What benefit do we get from exposing these differences in the public API for matching?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, in #209, these are no longer public (the eval_*
functions have become methods on the individual types, and so only Opcode
and PushValue
are exposed as types, and they don’t have their constructors exposed).
To defend the current encoding, I pulled in some work that wasn’t on any published branch yet – using different BoundedVec
s for each LargeValue
constructor. It doesn’t yet enforce minimal encoding at the type level. I think that’d be feasible with const_generic_exprs
, but in the meantime we could parameterize PushValue
to take either, say, MinimalLargeValue
or NonMinimalLargeValue
. But that’s not in this change.
src/script.rs
Outdated
use LargeValue::*; | ||
|
||
impl LargeValue { | ||
pub fn value(&self) -> Vec<u8> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document this method.
#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Debug)] | ||
#[repr(u8)] | ||
pub enum PushValue { | ||
pub enum SmallValue { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Document this enum.
- Document this addition in the changelog.
src/interpreter.rs
Outdated
pv.value().map_or( | ||
Err(ScriptError::BadOpcode(Some(SmallValue::OP_RESERVED.into()))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, here instead it becomes clear that None
actually means OP_RESERVED
, and the map_or(0, _)
earlier is being used so that we reach this error.
However, I note that in the previous code we always return BadOpcode
, whereas here it's possible we return MinimalData
instead (unless pv.is_minimal_push()
is documented to always return true
for OP_RESERVED
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/interpreter.rs
Outdated
if stack.len() < i.into() { | ||
return Err(ScriptError::InvalidStackOperation); | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this incorrectly removed in an earlier refactor, or if not then what is this refactored from? I ask because it changes the error that gets returned (instead of returning the error from stack.rget(i.into())?
). Maybe this is also a rebase error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stack.len() < 0usize
can't be true, so this is dead code. I suspect it's incorrectly duplicated from other instances below.
if stack.len() < i.into() { | |
return Err(ScriptError::InvalidStackOperation); | |
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t know how this got reintroduced. Originally i
was initialized to 1
, so the check made sense, and it should have been removed when that was changed to 0
.
src/interpreter.rs
Outdated
let mut isig = i; | ||
i += sigs_count; | ||
if stack.len() <= i.into() { | ||
return Err(ScriptError::InvalidStackOperation); | ||
}; | ||
|
||
let mut success = true; | ||
while success && sigs_count > 0 { | ||
let vch_sig: &ValType = stack.rget(isig.into())?; | ||
let vch_pub_key: &ValType = stack.rget(ikey.into())?; | ||
|
||
// Check signature | ||
let ok: bool = | ||
is_sig_valid(vch_sig, vch_pub_key, flags, script, &checker)?; | ||
|
||
if ok { | ||
isig += 1; | ||
sigs_count -= 1; | ||
} | ||
ikey += 1; | ||
keys_count -= 1; | ||
|
||
// If there are more signatures left than keys left, | ||
// then too many signatures have failed. Exit early, | ||
// without checking any further signatures. | ||
if sigs_count > keys_count { | ||
success = false; | ||
}; | ||
} | ||
|
||
// Clean up stack of actual arguments | ||
for _ in 0..i { | ||
stack.pop()?; | ||
} | ||
|
||
// A bug causes CHECKMULTISIG to consume one extra argument | ||
// whose contents were not checked in any way. | ||
// | ||
// Unfortunately this is a potential source of mutability, | ||
// so optionally verify it is exactly equal to zero prior | ||
// to removing it from the stack. | ||
if stack.is_empty() { | ||
return Err(ScriptError::InvalidStackOperation); | ||
} | ||
if flags.contains(VerificationFlags::NullDummy) | ||
&& !stack.rget(0)?.is_empty() | ||
{ | ||
return Err(ScriptError::SigNullDummy); | ||
} | ||
stack.pop()?; | ||
|
||
stack.push(cast_from_bool(success)); | ||
|
||
if op == OP_CHECKMULTISIGVERIFY { | ||
if success { | ||
stack.pop()?; | ||
} else { | ||
return Err(ScriptError::CheckMultisigVerify); | ||
} | ||
} | ||
} | ||
OP_CHECKMULTISIG | OP_CHECKMULTISIGVERIFY => { | ||
// ([sig ...] num_of_signatures [pubkey ...] num_of_pubkeys -- bool) | ||
|
||
// NB: This is guaranteed u8-safe, because we are limited to 20 keys and | ||
// 20 signatures, plus a couple other fields. u8 also gives us total | ||
// conversions to the other types we deal with here (`isize` and `i64`). | ||
let mut i: u8 = 0; | ||
if stack.len() < i.into() { | ||
return Err(ScriptError::InvalidStackOperation); | ||
}; | ||
|
||
let mut keys_count = | ||
u8::try_from(parse_num(stack.rget(i.into())?, require_minimal, None)?) | ||
.map_err(|_| ScriptError::PubKeyCount)?; | ||
if keys_count > 20 { | ||
return Err(ScriptError::PubKeyCount); | ||
}; | ||
assert!(*op_count <= MAX_OP_COUNT); | ||
*op_count += keys_count; | ||
if *op_count > MAX_OP_COUNT { | ||
return Err(ScriptError::OpCount); | ||
}; | ||
i += 1; | ||
let mut ikey = i; | ||
i += keys_count; | ||
if stack.len() <= i.into() { | ||
return Err(ScriptError::InvalidStackOperation); | ||
} | ||
|
||
_ => { | ||
return Err(ScriptError::BadOpcode); | ||
} | ||
let mut sigs_count = | ||
u8::try_from(parse_num(stack.rget(i.into())?, require_minimal, None)?) | ||
.map_err(|_| ScriptError::SigCount)?; | ||
if sigs_count > keys_count { | ||
return Err(ScriptError::SigCount); | ||
}; | ||
assert!(i <= 21); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reverts #222 (comment) which I presume is due to a rebase error. Undo this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same problem here; when looking with whitespace hidden this diff is much more obvious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait, I wonder if this is actually an artifact of this PR not having been rebased yet.
src/interpreter.rs
Outdated
isig += 1; | ||
sigs_count -= 1; | ||
} | ||
ikey += 1; | ||
keys_count -= 1; | ||
|
||
// If there are more signatures left than keys left, | ||
// then too many signatures have failed. Exit early, | ||
// without checking any further signatures. | ||
if sigs_count > keys_count { | ||
success = false; | ||
}; | ||
} | ||
|
||
// Clean up stack of actual arguments | ||
for _ in 0..i { | ||
stack.pop()?; | ||
} | ||
|
||
// A bug causes CHECKMULTISIG to consume one extra argument | ||
// whose contents were not checked in any way. | ||
// | ||
// Unfortunately this is a potential source of mutability, | ||
// so optionally verify it is exactly equal to zero prior | ||
// to removing it from the stack. | ||
if stack.is_empty() { | ||
return Err(ScriptError::InvalidStackOperation); | ||
} | ||
if flags.contains(VerificationFlags::NullDummy) | ||
&& !stack.rget(0)?.is_empty() | ||
{ | ||
return Err(ScriptError::SigNullDummy); | ||
} | ||
stack.pop()?; | ||
|
||
stack.push(cast_from_bool(success)); | ||
|
||
if op == OP_CHECKMULTISIGVERIFY { | ||
if success { | ||
stack.pop()?; | ||
} else { | ||
return Err(ScriptError::CheckMultisigVerify); | ||
} | ||
} | ||
} | ||
OP_CHECKMULTISIG | OP_CHECKMULTISIGVERIFY => { | ||
// ([sig ...] num_of_signatures [pubkey ...] num_of_pubkeys -- bool) | ||
|
||
// NB: This is guaranteed u8-safe, because we are limited to 20 keys and | ||
// 20 signatures, plus a couple other fields. u8 also gives us total | ||
// conversions to the other types we deal with here (`isize` and `i64`). | ||
let mut i: u8 = 0; | ||
if stack.len() < i.into() { | ||
return Err(ScriptError::InvalidStackOperation); | ||
}; | ||
|
||
let mut keys_count = | ||
u8::try_from(parse_num(stack.rget(i.into())?, require_minimal, None)?) | ||
.map_err(|_| ScriptError::PubKeyCount)?; | ||
if keys_count > 20 { | ||
return Err(ScriptError::PubKeyCount); | ||
}; | ||
assert!(*op_count <= MAX_OP_COUNT); | ||
*op_count += keys_count; | ||
if *op_count > MAX_OP_COUNT { | ||
return Err(ScriptError::OpCount); | ||
}; | ||
i += 1; | ||
let mut ikey = i; | ||
i += keys_count; | ||
if stack.len() <= i.into() { | ||
return Err(ScriptError::InvalidStackOperation); | ||
} | ||
|
||
_ => { | ||
return Err(ScriptError::BadOpcode); | ||
} | ||
let mut sigs_count = | ||
u8::try_from(parse_num(stack.rget(i.into())?, require_minimal, None)?) | ||
.map_err(|_| ScriptError::SigCount)?; | ||
if sigs_count > keys_count { | ||
return Err(ScriptError::SigCount); | ||
}; | ||
assert!(i <= 21); | ||
i += 1; | ||
let mut isig = i; | ||
i += sigs_count; | ||
if stack.len() <= i.into() { | ||
return Err(ScriptError::InvalidStackOperation); | ||
}; | ||
|
||
let mut success = true; | ||
while success && sigs_count > 0 { | ||
let vch_sig: &ValType = stack.rget(isig.into())?; | ||
let vch_pub_key: &ValType = stack.rget(ikey.into())?; | ||
|
||
// Note how this makes the exact order of pubkey/signature evaluation | ||
// distinguishable by CHECKMULTISIG NOT if the STRICTENC flag is set. | ||
// See the script_(in)valid tests for details. | ||
let ok: bool = is_sig_valid(vch_sig, vch_pub_key, flags, script, checker)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also looks like a rebase error, though I guess the comment being re-added here is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ew, at the outer level (without showing whitespace presumably) this diff looks horrible. I was talking specifically about the re-introduction of the "Note how this makes..." comment.
src/test_vectors.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only skimmed the changes to this file.
On the C++ side, this is the value of the error output parameter when script validation is successful. It never occurs on the Rust side (because we have `Result`), and so we can remove it from the enumeration without an consequences.
Previously, the `TestVector`s held normalized results, and we would normalize the _actual_ result before comparison. This changes the `TestVector`s to hold richer Rust results, and then to normalize the _expected_ result only for the C++ case.
This splits `Operation` into three separate enums – `Control` (for if/else, which get executed regardless of `vexec` and disabled operations, which always fail, regardless of `vexec`), `Normal`, for operations that respect `vexec`, and `Unknown` for undefined opcodes which only fail if they’re on an active branch. This is done so that the evaluator can be split on the same lines. __NB__: I strongly recommend ignoring whitespace changes when reviewing this commit. (cherry picked from commit f97b92c)
(cherry picked from commit 27a5037)
This parallels the existing `op` module, and is used in cases where we want to guarantee that only push values are used. `op` itself has been updated to reference `pv`, rather than using the constructors directly.
This introduces one edge case: If a disabled opcode is the 202nd operation in a script, the C++ impl would return `Error::OpCount`, while the Rust impl would return `Error::DisabledOpcode`.
Having `Control` and `Normal` grouped under `Operation` only eliminated one conditional (checking whether we hit the `op_count` limit, and that is now better abstracted anyway), and it introduced a lot of code (see the 55 lines removed here _plus_ the many nested calls removed, as in op.rs). So, `Normal` is now called `Operation`. The confusing label of “control” for some `Operation` (née `Normal`) opcodes has been removed. (cherry picked from commit 1c98bb3)
Well, my _tiny_ edge case of “only if the 202nd operation is a disabled opcode” didn’t slip past the fuzzer. It caught that pretty quickly. So, this does a better job of normalizing errors for comparisons. First, it normalizes both the C++ and Rust side, which allows the Rust error cases to not be a superset of the C++ error cases. Then, it also normalizes errors in the stepwise comparator (which I think was done in ZcashFoundation#210, but it’s reasonable to do along with these other changes). Also, `ScriptError::UnknownError` has been replaced with `ScriptError::ExternalError`, which takes a string description. This is used for two purposes: 1. “Ambiguous” cases. One was already done – `UNKNOWN_ERROR` on the C++ side with `ScriptError::ScriptNumError` or `ScriptError::HighS` on the Rust side, but now it’s handled differently. The other is the edge case I introduced earlier in this PR – Rust will fail with a `DisabledOpcode` and C++ will fail with a `OpCount`, so those two cases have been unified. 2. Errors that occur inside a stepper. Previously, this just melded in by returning `UnknownError`, but that was just the “least bad” option. Now we can distinguish these from other `ScriptError`.
05854fc
to
faef56e
Compare
- add a `ParsedOpcode` type rather than using a tuple - use constants for `PUSHDATA` bytes - add some documentation
This gives the different `LargeValue` constructors different types, but it can’t (yet) enforce minimal encoding at the type level.
This is one more step toward type-safe scripts. It also modifies `Script::parse` to not fail on bad opcodes, and adds a new `Script::parse_strict` that _does_ fail on bad opcodes.
c719e4a
to
87202b0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor style nits; some of my style complaints from earlier commits were already addressed by other changes by the time I got to the end of the PR, but some (in particular, the request that functional constructs not be used for side-effecting code) remain.
src/lib.rs
Outdated
ScriptError::ScriptNumError(_) => ScriptError::UnknownError, | ||
_ => serr, | ||
}), | ||
Error::Ok(serr) => Error::Ok(normalize_script_error(serr)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went and looked at the documentation of Error::Ok
, and I could not figure out why this is called Ok
- can you document how or why this particular error is okay in some sense? I know it's not something that's specific to this PR, but that naming is definitely confusing without there being additional elaboration.
src/interpreter.rs
Outdated
@@ -484,7 +492,7 @@ fn eval_control( | |||
vexec.pop()?; | |||
} | |||
|
|||
OP_VERIF | OP_VERNOTIF => return Err(ScriptError::BadOpcode), | |||
OP_VERIF | OP_VERNOTIF => return Err(ScriptError::BadOpcode(Some(op.into()))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I generally have had bad experiences when using .into()
; these can be broken by spooky action at a distance by the introduction of From
impls. Here it's probably fine, but I've gotten into the habit of avoiding into
entirely in favor of e.g. u8::from(op)
.
src/script.rs
Outdated
pub struct ParsedOpcode<'a> { | ||
/// The [`Result`] allows us to preserve unknown opcodes, which only trigger a failure if | ||
/// they’re on an active branch during interpretation. | ||
pub opcode: Result<Opcode, u8>, | ||
pub remaining_code: &'a [u8], | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, does this need to be part of the public API? Second, can the members be crate-private? Finally, it looks like now the Result
is just being used as a generic Either
type; I'd prefer a bespoke enum instead.
src/script.rs
Outdated
pub struct ParsedOpcode<'a> { | ||
/// The [`Result`] allows us to preserve unknown opcodes, which only trigger a failure if | ||
/// they’re on an active branch during interpretation. | ||
pub opcode: Result<Opcode, u8>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub opcode: Result<Opcode, u8>, | |
pub opcode: OpcodeParseResult, |
(potentially pub(crate)?)
src/script.rs
Outdated
opcode | ||
.map_err(|byte| ScriptError::BadOpcode(Some(byte))) | ||
.map(|op| result.push(op)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opcode | |
.map_err(|byte| ScriptError::BadOpcode(Some(byte))) | |
.map(|op| result.push(op)) | |
match opcode { | |
Known(op) => { | |
result.push(op); | |
Ok(()) | |
} | |
Unknown(byte) => Err(ScriptError::BadOpcode(Some(byte))) | |
} |
vec.clone().try_into().map_or( | ||
vec.clone().try_into().map_or( | ||
vec.clone().try_into().map_or( | ||
vec.try_into().map_or(None, |bv| Some(OP_PUSHDATA4(bv))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would really prefer that these were <type>::try_from
calls instead; relying on type inference makes this code really hard to read.
|
||
use PushValue::*; | ||
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug)] | ||
pub enum PushValue { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variants still need documentation.
opcode | ||
.map_err(|byte| ScriptError::BadOpcode(Some(byte))) | ||
.map(|op| result.push(op)) | ||
result.push(opcode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a better resolution to my previous comment, but I would still prefer imperative style here given that it's mutating variables in the enclosing scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK 87202b0. Comments are all style and non-blocking (particularly if addressing them makes the later PRs harder to rebase), though I will likely go through and make the changes myself in a follow-up PR if they aren't addressed.
pub fn run_test_vector( | ||
tv: &TestVector, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just noticed that this is now in the public API (behind test-dependencies
), and TestVector
is now pub
, but there's no way to obtain a TestVector
in the public API (because crate::test_vectors
is private). Either make crate::test_vectors
public, or make this private (depending on which is intended).
let vec = v.to_vec(); | ||
vec.clone().try_into().map_or( | ||
vec.clone().try_into().map_or( | ||
vec.clone().try_into().map_or( | ||
vec.try_into().map_or(None, |bv| Some(OP_PUSHDATA4(bv))), | ||
|bv| Some(OP_PUSHDATA2(bv)), | ||
), | ||
|bv| Some(OP_PUSHDATA1(bv)), | ||
), | ||
|bv| Some(PushdataBytelength(bv)), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nits:
- This clones the data one more time than necessary.
- It also clones the data way more times than necessary, because
.map_or
eagerly evaluates its first argument. You'd need to use.map_or_else
instead to get lazy evaulation. - I find nested
map_or
(where theor
case is complex) unreadable, and for the inner case where there is no further nesting, it is more verbose than.ok().map(_)
.
let vec = v.to_vec(); | |
vec.clone().try_into().map_or( | |
vec.clone().try_into().map_or( | |
vec.clone().try_into().map_or( | |
vec.try_into().map_or(None, |bv| Some(OP_PUSHDATA4(bv))), | |
|bv| Some(OP_PUSHDATA2(bv)), | |
), | |
|bv| Some(OP_PUSHDATA1(bv)), | |
), | |
|bv| Some(PushdataBytelength(bv)), | |
) | |
if let Ok(bv) = v.to_vec().try_into() { | |
Some(PushdataBytelength(bv)) | |
} else if let Ok(bv) = v.to_vec().try_into() { | |
Some(OP_PUSHDATA1(bv)) | |
} else if let Ok(bv) = v.to_vec().try_into() { | |
Some(OP_PUSHDATA2(bv)) | |
} else { | |
v.to_vec().try_into().ok().map(OP_PUSHDATA4) | |
} |
Or if we want the more functional approach:
let vec = v.to_vec(); | |
vec.clone().try_into().map_or( | |
vec.clone().try_into().map_or( | |
vec.clone().try_into().map_or( | |
vec.try_into().map_or(None, |bv| Some(OP_PUSHDATA4(bv))), | |
|bv| Some(OP_PUSHDATA2(bv)), | |
), | |
|bv| Some(OP_PUSHDATA1(bv)), | |
), | |
|bv| Some(PushdataBytelength(bv)), | |
) | |
v.to_vec() | |
.try_into() | |
.ok() | |
.map(PushdataBytelength) | |
.or_else(|| { | |
v.to_vec() | |
.try_into() | |
.ok() | |
.map(OP_PUSHDATA1) | |
}) | |
.or_else(|| { | |
v.to_vec() | |
.try_into() | |
.ok() | |
.map(OP_PUSHDATA2) | |
}) | |
.or_else(|| { | |
v.to_vec() | |
.try_into() | |
.ok() | |
.map(OP_PUSHDATA4) | |
}) | |
} |
match v { | ||
[] => None, | ||
_ => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of a single-case match, we generally use:
if v.is_empty() {
None
} else {
or
(!v.is_empty()).then(|| {
/// | ||
/// TODO: These should have lower bounds that can prevent non-minimal encodings, but that requires | ||
/// at least `const_generic_exprs`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should they? Is it the case that the C++ script engine never permits non-minimal encodings to be used during execution?
The upper bounds by contrast are clearly correct, as it is impossible to e.g. parse an OP_PUSHDATA1
and get more than 0xff
bytes.
match v { | ||
[] => Some(PushValue::SmallValue(OP_0)), | ||
[byte] => Some(match byte { | ||
0x81 => PushValue::SmallValue(OP_1NEGATE), | ||
1 => PushValue::SmallValue(OP_1), | ||
2 => PushValue::SmallValue(OP_2), | ||
3 => PushValue::SmallValue(OP_3), | ||
4 => PushValue::SmallValue(OP_4), | ||
5 => PushValue::SmallValue(OP_5), | ||
6 => PushValue::SmallValue(OP_6), | ||
7 => PushValue::SmallValue(OP_7), | ||
8 => PushValue::SmallValue(OP_8), | ||
9 => PushValue::SmallValue(OP_9), | ||
10 => PushValue::SmallValue(OP_10), | ||
11 => PushValue::SmallValue(OP_11), | ||
12 => PushValue::SmallValue(OP_12), | ||
13 => PushValue::SmallValue(OP_13), | ||
14 => PushValue::SmallValue(OP_14), | ||
15 => PushValue::SmallValue(OP_15), | ||
16 => PushValue::SmallValue(OP_16), | ||
_ => PushValue::LargeValue(PushdataBytelength([*byte; 1].into())), | ||
}), | ||
_ => LargeValue::from_slice(v).map(PushValue::LargeValue), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: the inner match can be merged with the outer, and then we aren't duplicating the parsing logic for one-byte PushdataBytelength
s:
match v { | |
[] => Some(PushValue::SmallValue(OP_0)), | |
[byte] => Some(match byte { | |
0x81 => PushValue::SmallValue(OP_1NEGATE), | |
1 => PushValue::SmallValue(OP_1), | |
2 => PushValue::SmallValue(OP_2), | |
3 => PushValue::SmallValue(OP_3), | |
4 => PushValue::SmallValue(OP_4), | |
5 => PushValue::SmallValue(OP_5), | |
6 => PushValue::SmallValue(OP_6), | |
7 => PushValue::SmallValue(OP_7), | |
8 => PushValue::SmallValue(OP_8), | |
9 => PushValue::SmallValue(OP_9), | |
10 => PushValue::SmallValue(OP_10), | |
11 => PushValue::SmallValue(OP_11), | |
12 => PushValue::SmallValue(OP_12), | |
13 => PushValue::SmallValue(OP_13), | |
14 => PushValue::SmallValue(OP_14), | |
15 => PushValue::SmallValue(OP_15), | |
16 => PushValue::SmallValue(OP_16), | |
_ => PushValue::LargeValue(PushdataBytelength([*byte; 1].into())), | |
}), | |
_ => LargeValue::from_slice(v).map(PushValue::LargeValue), | |
} | |
match v { | |
[] => Some(PushValue::SmallValue(OP_0)), | |
[0x81] => Some(PushValue::SmallValue(OP_1NEGATE)), | |
[1] => Some(PushValue::SmallValue(OP_1)), | |
[2] => Some(PushValue::SmallValue(OP_2)), | |
[3] => Some(PushValue::SmallValue(OP_3)), | |
[4] => Some(PushValue::SmallValue(OP_4)), | |
[5] => Some(PushValue::SmallValue(OP_5)), | |
[6] => Some(PushValue::SmallValue(OP_6)), | |
[7] => Some(PushValue::SmallValue(OP_7)), | |
[8] => Some(PushValue::SmallValue(OP_8)), | |
[9] => Some(PushValue::SmallValue(OP_9)), | |
[10] => Some(PushValue::SmallValue(OP_10)), | |
[11] => Some(PushValue::SmallValue(OP_11)), | |
[12] => Some(PushValue::SmallValue(OP_12)), | |
[13] => Some(PushValue::SmallValue(OP_13)), | |
[14] => Some(PushValue::SmallValue(OP_14)), | |
[15] => Some(PushValue::SmallValue(OP_15)), | |
[16] => Some(PushValue::SmallValue(OP_16)), | |
_ => LargeValue::from_slice(v).map(PushValue::LargeValue), | |
} |
Self::get_op(pc).map( | ||
|ParsedOpcode { | ||
opcode, | ||
remaining_code, | ||
}| { | ||
pc = remaining_code; | ||
result.push(opcode) | ||
}, | ||
)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Self::get_op(pc).map( | |
|ParsedOpcode { | |
opcode, | |
remaining_code, | |
}| { | |
pc = remaining_code; | |
result.push(opcode) | |
}, | |
)?; | |
let ParsedOpcode { | |
opcode, | |
remaining_code, | |
} = Self::get_op(pc)?; | |
pc = remaining_code; | |
result.push(opcode); |
Some((leading_byte, remaining_code)) => Disabled::from_u8(*leading_byte).map_or( | ||
Ok(ParsedOpcode { | ||
opcode: SmallValue::from_u8(*leading_byte).map_or( | ||
Control::from_u8(*leading_byte).map_or( | ||
Operation::from_u8(*leading_byte) | ||
.map_or(Err((*leading_byte).into()), |op| { | ||
Ok(Opcode::Operation(op)) | ||
}), | ||
|ctl| Ok(Opcode::Control(ctl)), | ||
), | ||
|sv| Ok(Opcode::PushValue(PushValue::SmallValue(sv))), | ||
), | ||
remaining_code, | ||
}), | ||
|disabled| Err(ScriptError::DisabledOpcode(Some(disabled))), | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-blocking: Another case where the .map_or
is eagerly running when it should lazily run. However, this entire section will be rewritten when enum_primitive
is removed, so I'm fine leaving it as-is for now.
match self.parse().map(|script| script.into_iter().collect()) { | ||
Err(op_err) => Err(Ok(op_err)), | ||
Ok(Err(bad_ops)) => Err(Err(bad_ops)), | ||
Ok(Ok(ops)) => Ok(ops), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very confusing match without documentation. @nuttycom's enum suggestion would likely help, but also some comments would make things clearer.
if Bad::OP_RESERVED != bad { | ||
state.increment_op_count()?; | ||
} | ||
if Bad::OP_VERIF == bad || Bad::OP_VERNOTIF == bad || should_exec(&state.vexec) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should also work:
if Bad::OP_VERIF == bad || Bad::OP_VERNOTIF == bad || should_exec(&state.vexec) { | |
if matches!(bad, Bad::OP_VERIF | Bad::OP_VERNOTIF) || should_exec(&state.vexec) { |
(turns out it's the same length in this instance).
remaining_code, | ||
}| match opcode { | ||
Err(bad) => { | ||
if Bad::OP_RESERVED != bad { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if Bad::OP_RESERVED != bad { | |
// See the documentation of `Bad` for an explanation of this logic. | |
if Bad::OP_RESERVED != bad { |
This splits
Operation
into three separate enums –Control
(forif/else, which get executed regardless of
vexec
and disabledoperations, which always fail, regardless of
vexec
),Normal
, foroperations that respect
vexec
, andUnknown
for undefined opcodeswhich only fail if they’re on an active branch. This is done so that the
evaluator can be split on the same lines.
It also integrates the values with the
PushValue
opcodes.And it exposes two modules (
op
andpv
) to allow a separation between internal opcode structure for the implementation and the interface consumers (PCZT, etc.) use.This depends on #221 and #222.