String#hash of strings generated by decompress_buffered is different from that of literal strings in Ruby 3.3.0 or later #89

Closed
@abicky

We ran into a strange problem: a Hash lookup failed when one string came from Zstd.decompress and the other was a multibyte string literal with the same content.
I found that the problem reproduces only when the compressed data has no Frame_Content_Size field, that is, when decompress_buffered is used.
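
Whether a given frame carries Frame_Content_Size can be checked without any gem by reading the Frame_Header_Descriptor byte that follows the 4-byte magic number. The following is a minimal sketch based on the Zstandard frame format; the helper name `frame_content_size_present?` and the constant names are hypothetical and not part of zstd-ruby.

```ruby
# Minimal Zstandard frame header check in pure Ruby (no gems needed).
# Hypothetical helper for illustration; it is not part of zstd-ruby.
FRAME_HEX = '28B52FFD0058180000E38182010000'
ZSTD_MAGIC_NUMBER = 0xFD2FB528 # stored little-endian as 28 B5 2F FD

# Returns true when the frame header carries a Frame_Content_Size field.
def frame_content_size_present?(frame)
  magic = frame.byteslice(0, 4).unpack1('V')
  raise ArgumentError, 'not a Zstandard frame' unless magic == ZSTD_MAGIC_NUMBER

  descriptor = frame.getbyte(4)          # Frame_Header_Descriptor
  fcs_flag = descriptor >> 6             # bits 7-6: Frame_Content_Size_flag
  single_segment = (descriptor >> 5) & 1 # bit 5: Single_Segment_flag
  # The field is omitted only when both flags are zero.
  fcs_flag != 0 || single_segment == 1
end

frame = [FRAME_HEX].pack('H*')
puts frame_content_size_present?(frame) # prints "false" for this frame
```

For the hex string used in the reproduction, the descriptor byte is 0x00, so both flags are zero and the decompressor cannot know the output size in advance.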

I'm not sure whether this is a bug in Ruby or in zstd-ruby.

Here is code that reproduces the problem:

require 'zstd-ruby'

# This constant was generated by the following Java code
# and the compressed data doesn't have Frame_Content_Size:
#
#   import com.github.luben.zstd.RecyclingBufferPool;
#   import com.github.luben.zstd.ZstdOutputStreamNoFinalizer;
#   import org.apache.kafka.common.utils.ByteBufferOutputStream;
#   import javax.xml.bind.DatatypeConverter;
#
#   import java.io.BufferedOutputStream;
#   import java.io.DataOutputStream;
#   import java.io.IOException;
#   import java.nio.charset.StandardCharsets;
#
#   class Main {
#       public static void main(String[] args) throws IOException {
#           ByteBufferOutputStream buffer = new ByteBufferOutputStream(10);
#           DataOutputStream stream = new DataOutputStream(new BufferedOutputStream(new ZstdOutputStreamNoFinalizer(buffer, RecyclingBufferPool.INSTANCE), 16 * 1024));
#           stream.write("あ".getBytes(StandardCharsets.UTF_8));
#           stream.close();
#
#           System.out.println(DatatypeConverter.printHexBinary(buffer.buffer().array()));
#       }
#   }
COMPRESSED_DATA_HEX = '28B52FFD0058180000E38182010000'

data = Zstd.decompress([COMPRESSED_DATA_HEX].pack('H*')).force_encoding('UTF-8')
expected_data = 'あ'
puts <<~MSG
  RUBY_VERSION: #{RUBY_VERSION}
  data: #{data}
  data == expected_data: #{data == expected_data}
  data.equal?(expected_data): #{data.equal?(expected_data)}
  data.hash: #{data.hash}
  expected_data.hash: #{expected_data.hash}
  { expected_data => 1 }.has_key?(data): #{{ expected_data => 1 }.has_key?(data)}
MSG

Here is the output:

RUBY_VERSION: 3.3.1
data: あ
data == expected_data: true
data.equal?(expected_data): false
data.hash: 3328309050837243483
expected_data.hash: 3486244608461787623
{ expected_data => 1 }.has_key?(data): false

As you can see, { expected_data => 1 }.has_key?(data) is false even though data == expected_data is true.

In Ruby 3.2.2, the result is as expected:

RUBY_VERSION: 3.2.2
data: あ
data == expected_data: true
data.equal?(expected_data): false
data.hash: 3278076437348888334
expected_data.hash: 3278076437348888334
{ expected_data => 1 }.has_key?(data): true
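
A Hash lookup needs both a matching hash value and eql? to succeed, so a lookup can fail while == still returns true. The following is a minimal diagnostic sketch, not taken from the original report, that compares a string with a copy rebuilt from its raw bytes. It makes no claim about the root cause; it only helps separate "the bytes differ" from "same bytes, different internal state". The helper names `rebuilt_from_bytes` and `string_report` are hypothetical.

```ruby
# Diagnostic sketch with hypothetical helper names (not part of zstd-ruby).

# Builds a brand-new string from the raw bytes of +str+, so that nothing
# except the bytes and the encoding is carried over from the original object.
def rebuilt_from_bytes(str)
  str.bytes.pack('C*').force_encoding(str.encoding)
end

# Collects the observable properties of +str+ and checks whether its hash
# agrees with that of a freshly built copy holding the same bytes.
def string_report(str)
  fresh = rebuilt_from_bytes(str)
  {
    encoding: str.encoding,
    bytes: str.bytes,
    ascii_only: str.ascii_only?,
    valid_encoding: str.valid_encoding?,
    hash_matches_rebuilt_copy: str.hash == fresh.hash,
    eql_to_rebuilt_copy: str.eql?(fresh)
  }
end

# "\u3042" is the same character as the literal used in the reproduction.
report = string_report("\u3042")
puts report
```

For an ordinary literal, hash_matches_rebuilt_copy is true. Passing the decompressed data from the reproduction to string_report would show whether its hash also agrees with a copy rebuilt from the same bytes.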
