surrogate pair and lone surrogate support in stringLiteral #1701

@hardfist

Steps to reproduce

For the following code, tsgo and TypeScript generate different token text:

"🦀\ud7ff\ud800\ud801\uD83E\uDD80"

It seems tsgo uses a Go string to store the code points (from the JS string):

func (f *NodeFactory) NewStringLiteral(text string) *Node {

However, a JS string is not a strict UTF-16 string: it may contain lone surrogates. When a lone surrogate code point is encoded into a Go string as UTF-8, it is replaced with U+FFFD. This conversion is lossy, so the original code unit is lost.
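A minimal sketch of the lossy conversion, using only the Go standard library. The helper names (`encodePerUnit`, `encodePaired`, `countReplacement`) are hypothetical and are not taken from tsgo. The sketch only assumes that each `\uXXXX` escape ends up as a rune that is then UTF-8 encoded into a Go string; it does not claim to reproduce the scanner's actual code path.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// encodePerUnit converts every UTF-16 code unit to a rune on its own and
// appends it to a Go string. Surrogate code points are not valid runes,
// so each one is encoded as U+FFFD, even when two of them form a valid pair.
func encodePerUnit(units []uint16) string {
	out := ""
	for _, u := range units {
		out += string(rune(u))
	}
	return out
}

// encodePaired decodes the code units as UTF-16 first. Valid surrogate
// pairs are combined into one code point, but lone surrogates are still
// replaced with U+FFFD, so the original code units cannot be recovered.
func encodePaired(units []uint16) string {
	return string(utf16.Decode(units))
}

// countReplacement counts the U+FFFD runes in s.
func countReplacement(s string) int {
	n := 0
	for _, r := range s {
		if r == '\uFFFD' {
			n++
		}
	}
	return n
}

func main() {
	// Code units for the escapes \ud7ff \ud800 \ud801 \uD83E \uDD80.
	units := []uint16{0xD7FF, 0xD800, 0xD801, 0xD83E, 0xDD80}

	fmt.Println(countReplacement(encodePerUnit(units))) // 4
	fmt.Println(countReplacement(encodePaired(units)))  // 2
	fmt.Printf("%q\n", encodePaired(units))
}
```

Per-unit encoding yields four U+FFFD characters for these escapes, which is consistent with the tsgo output reported below. Decoding pairs first recovers 🦀 from `\uD83E\uDD80`, but `\ud800` and `\ud801` are still lost, so preserving lone surrogates would need a representation other than a plain UTF-8 encoding of each code point.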

Behavior with [email protected]

🦀\ud7ff\ud800\ud801\uD83E\uDD80

https://ts-ast-viewer.com/#code/ESPg3AG7A6CuAmDsAzB0YA4AM6UYIzQCKoDMAogYesEA

Behavior with tsgo

🦀퟿����

https://rslint.rs/playground/?tab=ast&code=%22%F0%9F%A6%80%5Cud7ff%5Cud800%5Cud801%5CuD83E%5CuDD80%22
