Skip to content

Investigate string decode performance improvements #4

Open
@andrewthad

Description

@andrewthad

In 665c428, I've added a benchmark that focuses on the performance of decoding strings. Here's the performance of decoding 4KB of json, most of which is just 30-character strings:

benchmarked json/url/100/decode
time                 8.053 μs   (7.736 μs .. 8.411 μs)
                     0.996 R²   (0.989 R² .. 0.999 R²)
mean                 8.089 μs   (8.051 μs .. 8.139 μs)
std dev              74.41 ns   (48.48 ns .. 107.8 ns)

benchmarked aeson/url/100/decode
time                 22.66 μs   (21.50 μs .. 24.51 μs)
                     0.994 R²   (0.989 R² .. 0.999 R²)
mean                 22.72 μs   (22.49 μs .. 22.84 μs)
std dev              276.1 ns   (159.0 ns .. 414.7 ns)

Not bad. We're ahead of aeson by a factor of three, but can this number be improved further? String decode currently walks the string, byte-by-byte, until it finds the end of it. As it walks the string, it keeps up with information about whether or not anything will need to be unescaped. I think that it should be possible to instead walk the string w64-by-w64. This could be done by adapting the approach in bytestring-encodings to work with ByteArray instead of Ptr and adding some additional bit twiddling hacks. The general idea would be:

  • Fail if you encounter a backslash
  • Fail if you encounter a byte less than 0x20
  • Fail if you encounter a byte greater than 0x7E
  • Fail if your read would give you a w64 that straddled the end of the string (simplifies things a little)
  • Succeed if you encounter a "

Failure just means to fall back to the existing string decode logic, and succeed means that we may perform a memcpy (as we do now). This whole thing is a little bit tricky because it's possible to encounter both a " and a failing byte in them same w64, and then the order that they showed up in matters. But I think that the conservative action of always failing even if the quote showed up first is probably the best course. It simplifies it, and the bytes after a string ends are probably ascii characters anyway, so this shouldn't cost a ton of performance.

This needs to be implemented and benchmarked, but I think it could this benchmark at least 2x faster.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions