Skip to content

Lossless Syntax Tree Pattern

andychu edited this page Mar 29, 2017 · 37 revisions

Hacker News Comments on "From AST to Lossless Syntax Tree"

Other Names

  • RedBaron: Full Syntax Tree
  • Go: AST
  • JavaScript: AST or CST
  • Roslyn: Syntax Tree -- makes the explicit claim of being round-trippable
    • Red-Green Trees? This term is more about how to handle insertions in an IDE, which change the position of every node after the insertion. Red tree is a wrapper around the green tree.

IDE Architecture

  • Roslyn -- Microsoft's C# compiler platform

  • JetBrains Implementing Parser and PSI -- The AST nodes have a direct mapping to text ranges in the underlying document. The bottom-most nodes of the AST match individual tokens returned by the lexer, and higher level nodes match multiple-token fragments. Operations performed on nodes of the AST tree, such as inserting, removing, reordering nodes and so on, are immediately reflected as changes to the text of the underlying document.

    • There is currently no ready way to reuse existing language grammars, for example, from ANTLR, for creating custom language parsers. The parsers need to be coded manually. -- this should go in my meta-language rant
    • Custom language parser and PSI classes can be generated from grammars using Grammar-Kit plugin. Besides code generation it provides various features for editing grammar files: syntax highlighting, quick navigation, refactorings and more. The Grammar-Kit plugin is built using its own engine and its source code can be found on GitHub.
  • Making Tools Kythe-Compatible -- I would have thought that Kythe had the LST concept, because it's basically an IDE, but I don't see it. I guess that is because it's read-only?

Lossless Representations

  • cst for JavaScript -- CST means Concrete Syntax Tree. Unlike an AST (Abstract Syntax Tree), a CST contains all the information from the JavaScript source file: whitespace, punctuators, comments. This information is extremely useful for code style checkers and other code linters. CST is also useful for cases when you need to apply modifications to existing JavaScript files while preserving the initial file formatting.

  • JRuby Parser -- JRuby once had a parser which kept track of all sorts of extra information when it built it's Abstract Syntax Tree (AST). Stuff like character offsets where a particular element started or ended. The impact of this extra information was a more than noticeable amount of memory and a bit of a perf impact. At the time we decided to discontinue having this sort of parser in JRuby we created JRubyParser.* -- related to Parsing Is Difficult

Style-Preserving Source Translators

  • 2to3 -- for Python 2 to Python 3

  • RedBaron -- RedBaron, by relying on Baron, uses a Full Syntax Tree (FST). It’s like an AST except it keeps every information, included formatting, and is then a lossless representation of the source code.

  • Facebook's jscodeshift -- uses the recast library

    • recast - JavaScript syntax tree transformer, nondestructive pretty-printer, and automatic source map generator
  • Facebook's pfff -- Tools for code analysis, visualizations, or style-preserving source transformation

  • External Clang Examples -- has clang-mutate, but it's dormant

Source Reformatters

TODO: What data structures do these use?

Other Docs / Designs / Discussions

  • Haskell GHC AST Annotations -- Simplify the roundtripping and modification of source by explicitly capturing the missing location information for the syntactic markers

  • What to do about comments? on Lambda the Ultimate

  • https://news.ycombinator.com/item?id=13914218 -- From C# compiler dev -- Resilient parsing. This is the big one! If you give our parser a string that is illegal according to the grammar, our parser will still give you a syntax tree! (We'll also spit errors out). But getting a syntax tree regardless of the actual validity of the program being passed in means that the IDE can give autocomplete and report type-checking error messages. As an example, the code "var x = velocity." is invalid C#. However, in order to give autocomplete on "velocity", that code needs to be parsed into an AST, and then typechecked, and then we can extract the members on the type in order to provide a good user experience.

Transpilers

These don't need to use the Lossless Syntax Tree because the resulting code won't be edited by a human.

  • CoffeeScript, etc.
Clone this wiki locally