Skip to content

allow hash() to take multiple arguments similar to pack() #1526

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

spoonincode
Copy link
Member

This change allows passing hash() multiple arguments to concatenate together similar to how pack() already does. This can allow some more concise usage of hash() without resorting to make_tuple/pair() and remembering to avoid the trap of needing to use cref along with make_tuple/pair() to avoid copies.

I updated generate_action_digest() as a good demonstration of the improved conciseness. Of course there are other locations we could potentially change too.

Unfortunately there were a few gnarly quirks that came out of adding just this "simple" variadic template that I will describe below. If it is too gnarly (i.e. too risky) we can explore other approaches that aren't so invasive. For example I have a different branch main...vardigest that just creates a new variadic digest() call so as not to conflict with any existing hash() usages at all.

inline void pack( Stream& s, T (&v)[N]) {
fc::raw::pack( s, unsigned_int((uint32_t)N) );
for (uint64_t i = 0; i < N; ++i)
fc::raw::pack(s, v[i]);
}

template<typename Stream, typename T, size_t N>
requires (!std::is_same_v<std::remove_cv_t<T>, char>)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When changing from raw_fwd.hpp to raw.hpp (a required change afaict), I would get ambiguous call errors between the T(&v)[N] (only present in raw.hpp) and char* overloads, due to a call such as sha256::hash("foo"). This seems like an appropriate change to keep C strings using the char* overload.

template<typename T>
static sha256 hash( const T& t )
template<typename... T>
requires NotTwoArgsCharUint32<T...>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The various hash types have some existing overloads of hash() such as

static sha256 hash( const char* d, uint32_t dlen );
static sha256 hash( const std::string& );
static sha256 hash( const sha256& );

This is pretty gnarly on its own because the std::string overload doesn't follow the same rules as if sending a std::string through the encoder. i.e.

   sha256 a = sha256::hash(std::string("banana"));
   sha256::encoded enc;
   fc::raw::pack(enc, std::string("banana"));
   sha256 b = enc.result();
   // a != b

but the real problem is the hash( const char* d, uint32_t dlen ) overload that will be used, for example,

std::vector<char> foo;
sha256::hash(foo.data(), foo.size())

I need to ensure that calls like sha256::hash(foo.data(), foo.size()) continue to go to the existing overload, so I need to remove the variadic function from consideration in such cases (it is picked without this).

template <typename T>
static sha3 hash(const T &t, bool is_nist=true)
struct keccak {};
struct nist {};
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There didn't appear to be any existing usages of sha3::hash(). I opted to use tag dispatching to select between the two sha3 options and still have multiple arguments. The downside with this is that can't use sha3's hash() somewhere the hash type itself is templated. e.g.

template <typename HashType>
HashType do_something(unsigned x) {
   return HashType::hash(x);
}

@@ -20,11 +22,12 @@ class sha1
static sha1 hash( const char* d, uint32_t dlen );
static sha1 hash( const std::string& );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I think we should rename these to hash_str and remove the complicated template concepts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe safest to rename the variadic one to hash_values so it is clear which one is in-use.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be a viable option; I was reluctant to make changes across existing code but I can try and see just how pervasive those changes would be. (many tests did fail without the complicated concepts though it's possible only due to a small number root causes)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, you could just add:

   template<typename T1, typename T2, typename... T>
   static sha256 hash( const T1& t1, const T2& t2, const T&... t )
    {
      sha256::encoder e;
      fc::raw::pack(e, t1, t2, t...);
      return e.result();
    }

to support the case a hash called with two or more parameters. I believe it would require no change to existing code or new concepts.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that solves the problem though. Need std::vector .data() and .size() to go through the 'old' overload. e.g.

#include <iostream>
#include <vector>
#include <string>
#include <cstddef>
#include <cstdint>
#include <typeinfo>

struct sha256 {};

struct hasher {
    static sha256 hash(const char* d, uint32_t dlen) {
        std::cout << "CALLED: hash(const char*, uint32_t)" << std::endl;
        return {};
    }

    static sha256 hash(const std::string& s) {
        std::cout << "CALLED: hash(const std::string&)" << std::endl;
        return {};
    }

    static sha256 hash(const sha256& h) {
        std::cout << "CALLED: hash(const sha256&)" << std::endl;
        return {};
    }

    template<typename T1, typename T2, typename... T>
    static sha256 hash( const T1& t1, const T2& t2, const T&... t )
    {
        std::cout << "CALLED: Variadic hash"<< std::endl;
        return {};
    }
};

int main() {
    std::vector<char> wif_bytes = {'t', 'e', 's', 't'};
    hasher::hash(wif_bytes.data(), wif_bytes.size());
    return 0;
}

when running this get CALLED: Variadic hash but we need CALLED: hash(const char*, uint32_t)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly the std::is_convertible_v<std::decay_t<T>, size_t> is too loose, since even an fc::unsigned_int will match it (due to it having operator uint32_t()), so you can end up with a nasty surprise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a valid concern for the CStyleStringPtr version.

In my comment I was more worried about the variadic hash version (hash( const T&... t )) catching unexpected calls. One way to check that it doesn't happen would be to add an assert(0) in its implementation, and run the ci/cd on code that is not yet using the variadic version. If the asserton build passes, it will be a good indication that existing uses of hash didn't migrate to using the variadic version.

Upon reflection, I think the pattern of hash accepting a char * + length is not a good one, it probably would have been better to always have a single parameter to specify an object to be hashed, whether it is

  • a char* ( expected to be null-terminated)
  • a std::string or std::string_view
  • a vector<T> where T is a hashable type (or maybe any iterable container where the value_type is hashable)
  • a std::span<T> where T is a hashable type
  • etc...

which would avoid the ambiguity of a two argument hash call specifying one or two values to be hashed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That can all be avoided by renaming the existing string ones to hash_chars()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does increasingly feel like renaming the old overloads is the best option, provided there is not too much fallout

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the number of calls to the non-templated hash() was substantially more than may have guessed. I prepared a commit where I replace all them to (I hope; didn't audit yet) 1:1 equivalent of new hash_raw(), c893000

But a substantial number of these don't need to be 1:1 exact as they were before (like in the benchmark, or many/most of tests), so I guess I can build on top of the commit a new commit that migrates some back to plain hash() to limit the number of calls hash_raw()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants