Description
I have documents with Unicode characters, which rowan handles correctly. The orgize-wasm frontend, however, creates the data-range links without taking into account that the indices are byte offsets into UTF-8 text, while JavaScript strings are indexed in UTF-16 code units. Because of this mismatch, the links drift out of sync as soon as the input contains non-ASCII characters.
For instance, parse this input in the web frontend:
Hello
world
Foo 💩 Bar
Baz
- 💩
- Lorem ipsum dolor sit amet
- 💩
- consectetur adipiscing elit
- 💩
- sed do eiusmod tempor incididunt ut labore et dolore magna aliqua
In the "Syntax" tab, clicking on the range in [email protected] "Hello\r\n" properly highlights "Hello", and the same goes for "world". After the poo emoji, however, the links desynchronize from the input. The range in [email protected] "Baz\r\n" highlights the "z" and all of the newlines up to the hyphen. The list is worse, since each emoji shifts the range by 2 additional "phantom" UTF-16 code units.
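
One possible fix is to convert the byte offsets to UTF-16 code unit offsets before they are written into the data-range attributes. The following is only a minimal sketch of that conversion, not orgize-wasm's actual code: the function names `byte_to_utf16_offset` and `byte_range_to_utf16` are hypothetical, and it assumes the original source text is still available where the links are generated.

```rust
/// Hypothetical helper: convert a UTF-8 byte offset into `text` to the
/// equivalent UTF-16 code unit offset, as used by JavaScript string indices.
///
/// Returns `None` if `byte_offset` is past the end of `text` or does not
/// fall on a character boundary.
fn byte_to_utf16_offset(text: &str, byte_offset: usize) -> Option<usize> {
    // `str::get` checks both the bounds and the char boundary for us.
    let prefix = text.get(..byte_offset)?;
    // Count how many UTF-16 code units the prefix would occupy.
    Some(prefix.encode_utf16().count())
}

/// Hypothetical helper: convert a byte range (start, end) to a UTF-16 range.
fn byte_range_to_utf16(text: &str, start: usize, end: usize) -> Option<(usize, usize)> {
    if start > end {
        return None;
    }
    Some((
        byte_to_utf16_offset(text, start)?,
        byte_to_utf16_offset(text, end)?,
    ))
}
```

For the fragment "Foo 💩 Bar\r\nBaz\r\n", "Baz" starts at byte offset 14 but at UTF-16 offset 12, which is the 2-unit shift per emoji described above. Scanning the prefix for every range is linear in the document length; if that becomes a bottleneck, a single pass that builds a lookup table would avoid the repeated work.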