Surprising parse results from binary and octal constants #3421

Open
@gwhitney

Describe the bug
Binary and octal constants consume the start of a juxtaposed variable name (e.g., one intended for implicit multiplication) if and only if the name happens to begin with hex characters.

To Reproduce

const eval = math.evaluate
const scope = {c: 3, k: 3}
eval('1k', scope)  // 3
eval('0x1k', scope)  // 3
eval('0o1k', scope)  // 3
eval('0b1k', scope)  // 3

eval('1c', scope)  // 3
eval('0x1c', scope)  // 28; makes sense, `1c` is a hex number equal to 28 decimal
eval('0o1c', scope)  // SyntaxError: String "0o1c" is not a valid number
eval('0b1c', scope)  // SyntaxError: String "0b1c" is not a valid number

Discussion

Is the difference between 0b1k and 0b1c intentional? The syntax error in the octal/binary case looks even less sensible with something like 0b1010Face, which reads pretty clearly as binary ten times the value of the variable named Face, or with 0b1010ConversionValue, which is a syntax error just because ConversionValue happens to start with C, whereas 0b1010ValueForConversion would parse fine.
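
A rough model that reproduces the results above, for illustration only. It assumes the scanner greedily consumes hex characters after any of the 0b/0o/0x prefixes and validates the digits afterwards. That is a guess at the mechanism, not something taken from the mathjs source, and `scanPrefixedLiteral` is a hypothetical name.

```javascript
// Hypothetical model of the observed behavior; NOT mathjs code.
// Assumption: after 0b/0o/0x, every hex character is consumed,
// and only then are the digits checked against the radix.
function scanPrefixedLiteral (src) {
  const m = /^0([box])([0-9a-fA-F]*)/.exec(src)
  if (!m) return null
  const [token, kind, digits] = m
  const valid = { b: /^[01]+$/, o: /^[0-7]+$/, x: /^[0-9a-fA-F]+$/ }[kind]
  if (!valid.test(digits)) {
    throw new SyntaxError(`String "${token}" is not a valid number`)
  }
  const radix = { b: 2, o: 8, x: 16 }[kind]
  // `rest` is what would be left over for implicit multiplication
  return { value: parseInt(digits, radix), rest: src.slice(token.length) }
}

console.log(scanPrefixedLiteral('0b1k'))
// { value: 1, rest: 'k' }
console.log(scanPrefixedLiteral('0b1010ValueForConversion'))
// { value: 10, rest: 'ValueForConversion' }
```

Under this model, 0b1c, 0b1010Face, and 0b1010ConversionValue all throw, because the leading hex letters of the name are swallowed into the literal before validation.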

I agree with the idea that binary and octal constants should consume all digits; 0o108 is much more likely a typo than a weird way of writing 64 (octal 10, i.e. 8, implicitly multiplied by decimal 8). But it seems to me that either (I) binary and octal constants (unlike hexadecimal) should not consume any alphabetic characters, or (II) all of binary, octal, and hexadecimal should consume all alphabetic characters, essentially disallowing implicit multiplication by such constants without intervening whitespace (0b1010 Mask is easier to read than 0b1010Mask anyway). The goal is to avoid parsing behavior that changes just because you rename a variable in your expressions in a way that changes whether its first character is in the hex range.
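
To make the two options concrete, here is a minimal sketch of how each could scan a prefixed literal. The function `scanLiteral`, its `option` parameter, and the lookup tables are hypothetical names for illustration, not mathjs APIs, and the sketch treats only ASCII letters and digits as name characters.

```javascript
// Hypothetical sketch of options (I) and (II); NOT mathjs code.
const VALID_DIGITS = { b: /^[01]+$/, o: /^[0-7]+$/, x: /^[0-9a-fA-F]+$/ }
const RADIX_OF = { b: 2, o: 8, x: 16 }

function scanLiteral (src, option) {
  const m = /^0([box])/.exec(src)
  if (!m) return null
  const kind = m[1]
  let consume
  if (option === 'II') {
    // (II): every literal swallows all following letters and digits
    consume = /^[0-9a-zA-Z]*/
  } else {
    // (I): binary/octal swallow decimal digits only; hex keeps a-f
    consume = kind === 'x' ? /^[0-9a-fA-F]*/ : /^[0-9]*/
  }
  const digits = consume.exec(src.slice(2))[0]
  if (!VALID_DIGITS[kind].test(digits)) {
    throw new SyntaxError(`String "0${kind}${digits}" is not a valid number`)
  }
  return {
    value: parseInt(digits, RADIX_OF[kind]),
    rest: src.slice(2 + digits.length)
  }
}

console.log(scanLiteral('0b1c', 'I'))         // { value: 1, rest: 'c' }
console.log(scanLiteral('0b1010 Mask', 'II')) // { value: 10, rest: ' Mask' }
```

In this sketch 0o108 is rejected under both options, 0b1c becomes 1 * c under (I), and 0b1c, 0b1010Mask, and 0x80Cone are all rejected under (II).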

Especially bad along these lines would be if you wrote, say, 0x80Sphere for implicit multiplication, and you global-replaced Sphere with Cone to get 0x80Cone and happened to also have a variable named one in scope, because then it would silently compute 2060 * one (0x80C is 2060) instead of 128 * Cone (0x80 is 128).
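
The arithmetic behind that hazard can be checked in plain JavaScript. The scope values below are made up purely for illustration.

```javascript
// Hypothetical scope values, chosen only to show the size of the discrepancy.
const scope = { Cone: 5, one: 7 }

const intended = 0x80 * scope.Cone // hex 80 times Cone
const actual = 0x80c * scope.one   // hex 80C times one: the C was swallowed

console.log(0x80, 0x80c)      // 128 2060
console.log(intended, actual) // 640 14420
```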

So on balance, I'd recommend requiring a non-alphabetic character after a hex constant (whitespace or some other delimiter), and I could go either way on whether binary and octal constants consume only digits (so 0b1c is allowed as 1 * c) or also require a non-alphabetic character after them. There are other reasonable resolutions, but the current state of affairs seems surprisingly inconsistent. Thanks for your thoughts.
