-
-
Notifications
You must be signed in to change notification settings - Fork 163
Lossless Syntax Tree Pattern
Hacker News Comments on "From AST to Lossless Syntax Tree"
- RedBaron: Full Syntax Tree
- Go: AST
- JavaScript: AST or CST
- Roslyn: Syntax Tree -- makes the explicit claim of being round-trippable
- Red-Green Trees? This term is more about how to handle insertions in an IDE, which change the position of every node after the insertion. Red tree is a wrapper around the green tree.
-
Roslyn -- Microsoft's C# compiler platform
-
JetBrains Implementing Parser and PSI -- The AST nodes have a direct mapping to text ranges in the underlying document. The bottom-most nodes of the AST match individual tokens returned by the lexer, and higher level nodes match multiple-token fragments. Operations performed on nodes of the AST tree, such as inserting, removing, reordering nodes and so on, are immediately reflected as changes to the text of the underlying document.
- There is currently no ready way to reuse existing language grammars, for example, from ANTLR, for creating custom language parsers. The parsers need to be coded manually. -- this should go in my meta-language rant
- Custom language parser and PSI classes can be generated from grammars using Grammar-Kit plugin. Besides code generation it provides various features for editing grammar files: syntax highlighting, quick navigation, refactorings and more. The Grammar-Kit plugin is built using its own engine and its source code can be found on GitHub.
-
Making Tools Kythe-Compatible -- I would have thought that Kythe had the LST concept, because it's basically an IDE, but I don't see it. I guess that is because it's read-only?
-
cst for JavaScript -- CST means Concrete Syntax Tree. Unlike an AST (Abstract Syntax Tree), a CST contains all the information from the JavaScript source file: whitespace, punctuators, comments. This information is extremely useful for code style checkers and other code linters. CST is also useful for cases when you need to apply modifications to existing JavaScript files while preserving the initial file formatting.
-
JRuby Parser -- JRuby once had a parser which kept track of all sorts of extra information when it built it's Abstract Syntax Tree (AST). Stuff like character offsets where a particular element started or ended. The impact of this extra information was a more than noticeable amount of memory and a bit of a perf impact. At the time we decided to discontinue having this sort of parser in JRuby we created JRubyParser.* -- related to Parsing Is Difficult
-
2to3 -- for Python 2 to Python 3
-
RedBaron -- RedBaron, by relying on Baron, uses a Full Syntax Tree (FST). It’s like an AST except it keeps every information, included formatting, and is then a lossless representation of the source code.
-
Facebook's jscodeshift -- uses the recast library
- recast - JavaScript syntax tree transformer, nondestructive pretty-printer, and automatic source map generator
-
Facebook's pfff -- Tools for code analysis, visualizations, or style-preserving source transformation
- Facebook's Spatch for PHP, based on the pfff toolkit
-
External Clang Examples -- has clang-mutate, but it's dormant
TODO: What data structures do these use?
- clang-format -- Clang AST, which is huge
- http://clang.llvm.org/docs/LibFormat.html -- design is undocumented
- gofmt -- uses the Go AST package, which is reputed to be messy
- dartfmt -- The Hardest Program I've Ever Written
- yapf for Python
- scalafmt
- prettier for JavaScript
- rustfmt Design -- Blog Post
-
Haskell GHC AST Annotations -- Simplify the roundtripping and modification of source by explicitly capturing the missing location information for the syntactic markers
-
What to do about comments? on Lambda the Ultimate
-
https://news.ycombinator.com/item?id=13914218 -- From C# compiler dev -- Resilient parsing. This is the big one! If you give our parser a string that is illegal according to the grammar, our parser will still give you a syntax tree! (We'll also spit errors out). But getting a syntax tree regardless of the actual validity of the program being passed in means that the IDE can give autocomplete and report type-checking error messages. As an example, the code "var x = velocity." is invalid C#. However, in order to give autocomplete on "velocity", that code needs to be parsed into an AST, and then typechecked, and then we can extract the members on the type in order to provide a good user experience.
These don't need to use the Lossless Syntax Tree because the resulting code won't be edited by a human.
- CoffeeScript, etc.