Test of new comment attachment model #300
Draft
This is a starting point for a fix for #299 intended for discussion of the approach.
Current approach to comment attachment
When the parser reads a token in `readPeek`, the lexer is advanced until the next token is not a comment or linefeed. Any comments read in this process are attached as leading comments to the new node meta struct stored in the parser's `peekToken` pointer. This means all comments found in the token stream are attached as leading comments to the first syntax node that follows them.

The various parser functions then move these leading comments to the appropriate location or node with the `swap*()` helper functions.

This method captures most comments, except for comments after the last top-level definition. However, unless a node parser function explicitly extracts the comments out of a token, they are not added to the tree. Examples include the `local` token in a variable declaration and the semicolons at the end of line statements. Any missed call to `swap*()` for these tokens results in a comment being lost and not attached to the final syntax tree.

Proposed model
With the addition of a formatting sub-command, it is critical that none of the input source or its comments are lost. I propose that, instead of automatically attaching comments to each token, comments are added to a list of unbound comments, which are then attached to nodes at boundaries in the parsing process.
For this stream of tokens (where `<#>` are comments): `<1> declare <2> local <3> var.s <4> STRING <5> ; <6>`

For each token processed while parsing a declare statement, comments would be attached to the declare node and its child nodes as follows.
- declare:
  - unbound - [<1>]
  - unbound peek - [<2>]
  - unbound comments are attached as leading comments to the declare statement node and the slice is reset
- local:
  - unbound - [<2>]
  - unbound peek - [<3>]
  - unbound comments are attached as infix comments to the declare statement node and the slice is reset
- var.s:
  - unbound - [<3>]
  - unbound peek - [<4>]
  - unbound comments are attached as leading comments to the ident node and the slice is reset
  - unbound peek comments are attached as trailing comments to the ident node and the slice is reset
- STRING:
  - unbound - []
  - unbound peek - [<5>]
  - unbound peek comments are attached as trailing comments to the ident node and the slice is reset
- ;:
  - unbound - []
  - unbound peek - [<6>]
  - there are no unbound comments, so no action is taken
  - unbound peek comments are attached as trailing comments to the declare statement node and the slice is reset
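The resulting attachments for the parsed declare statement, following the steps above: the declare statement node has `<1>` leading, `<2>` infix, and `<6>` trailing; the ident node has `<3>` leading and `<4>` and `<5>` trailing.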
By attaching all unbound comments each time, we ensure that even if there is an error in our logic, the worst case is a comment being placed in the wrong location during formatting. A missed attachment call would not lead to a lost comment in the output.
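To illustrate the "misplaced, not lost" property, here is a minimal Go sketch. The `node` and `binder` types and the `see`, `attach`, and `parseWithMissedCall` names are hypothetical stand-ins invented for this example, not the parser's real types. It simulates a parse function that forgets one attachment call and shows that the skipped comment ends up on the next node rather than disappearing.

```go
package main

import "fmt"

// node and binder are hypothetical stand-ins, not the parser's real types.
type node struct {
	name     string
	comments []string
}

// binder collects comments until some node claims them.
type binder struct {
	unbound []string
}

// see records comments as they are read from the token stream.
func (b *binder) see(comments ...string) {
	b.unbound = append(b.unbound, comments...)
}

// attach hands every pending comment to n and resets the slice.
func (b *binder) attach(n *node) {
	n.comments = append(n.comments, b.unbound...)
	b.unbound = nil
}

// parseWithMissedCall simulates a parse function that forgets to attach
// the comment seen before the second token.
func parseWithMissedCall() (first, second *node) {
	b := &binder{}
	first, second = &node{name: "first"}, &node{name: "second"}

	b.see("<1>")
	b.attach(first)

	b.see("<2>") // the attach call for this token is missing

	b.see("<3>")
	b.attach(second) // picks up <2> as well as <3>
	return first, second
}

func main() {
	first, second := parseWithMissedCall()
	fmt.Println(first.name, first.comments)   // first [<1>]
	fmt.Println(second.name, second.comments) // second [<2> <3>]
}
```

In this sketch `<2>` is attached to the wrong node, which a formatter would print in a slightly different place, but all three comments are still present in the tree.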
Implementation
The `readPeek` method is updated to append any comments it finds into one of two slices tracking unbound comments.

Comments are first placed into `unboundPeekComments` by `readPeek`. When `nextToken` is called, the `unboundPeekComments` entries are removed and appended to `unboundComments`. This two-stage batching of unbound comments gives the parse functions a simple way to attach trailing comments.

The interface for attaching comments to nodes is the new parser method `attachUnboundComments`. It takes any unbound comments that have been seen since the last attachment call and attaches them at the specified attachment point (leading, trailing, or infix).

If the `peek` argument is true, the `unboundPeekComments` are appended to any comments still in `unboundComments`. This would typically be used to attach trailing comments to a node.

A call with `peek` set to true is also the point where the function could be set up to break up ambiguous comments.
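The two-stage batching can be sketched in Go roughly as follows. This is a simplified illustration under stated assumptions, not the code in this PR: the `token`, `place`, `node`, and `parser` types, the `curToken` and `pos` fields, the `newParser` and `parseDeclare` helpers, the `attachUnboundComments` signature, and the whitespace-split input string are all hypothetical. Only the names `readPeek`, `nextToken`, `peekToken`, `unboundComments`, `unboundPeekComments`, and `attachUnboundComments` come from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// Every type here is a simplified, hypothetical stand-in.
type token struct {
	literal   string
	isComment bool
}

type place int

const (
	leading place = iota
	infix
	trailing
)

type node struct {
	comments [3][]string // indexed by place
}

type parser struct {
	tokens              []token
	pos                 int
	curToken, peekToken token
	unboundComments     []string
	unboundPeekComments []string
}

// newParser splits src on whitespace and treats "<n>" fields as comments.
func newParser(src string) *parser {
	p := &parser{}
	for _, lit := range strings.Fields(src) {
		p.tokens = append(p.tokens, token{lit, strings.HasPrefix(lit, "<")})
	}
	p.readPeek()
	p.nextToken()
	return p
}

// readPeek advances to the next non-comment token, stashing any comments
// it passes in unboundPeekComments.
func (p *parser) readPeek() {
	for p.pos < len(p.tokens) && p.tokens[p.pos].isComment {
		p.unboundPeekComments = append(p.unboundPeekComments, p.tokens[p.pos].literal)
		p.pos++
	}
	p.peekToken = token{literal: "EOF"}
	if p.pos < len(p.tokens) {
		p.peekToken = p.tokens[p.pos]
		p.pos++
	}
}

// nextToken promotes peek comments to unboundComments, then advances.
func (p *parser) nextToken() {
	p.unboundComments = append(p.unboundComments, p.unboundPeekComments...)
	p.unboundPeekComments = nil
	p.curToken = p.peekToken
	p.readPeek()
}

// attachUnboundComments moves all pending comments onto n at the given place.
// With peek set, comments that follow the current token are included too.
func (p *parser) attachUnboundComments(n *node, at place, peek bool) {
	pending := p.unboundComments
	if peek {
		pending = append(pending, p.unboundPeekComments...)
		p.unboundPeekComments = nil
	}
	n.comments[at] = append(n.comments[at], pending...)
	p.unboundComments = nil
}

func parseDeclare(p *parser) (decl, ident *node) {
	decl, ident = &node{}, &node{}
	p.attachUnboundComments(decl, leading, false) // at declare: <1>
	p.nextToken()
	p.attachUnboundComments(decl, infix, false) // at local: <2>
	p.nextToken()
	p.attachUnboundComments(ident, leading, false) // at var.s: <3>
	p.attachUnboundComments(ident, trailing, true) // peek: <4>
	p.nextToken()
	p.attachUnboundComments(ident, trailing, true) // at STRING, peek: <5>
	p.nextToken()
	p.attachUnboundComments(decl, trailing, true) // at ;, peek: <6>
	return decl, ident
}

func main() {
	decl, ident := parseDeclare(newParser("<1> declare <2> local <3> var.s <4> STRING <5> ; <6>"))
	fmt.Println(decl.comments, ident.comments) // [[<1>] [<2>] [<6>]] [[<3>] [] [<4> <5>]]
}
```

Keeping peek comments in their own slice is what lets a parse function choose, per call, whether the comments after the current token belong to the node it is finishing (trailing) or should wait for the next node (leading).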
For example, `attachUnboundComments` could separate two comments `a` and `b`, making `a` an infix comment for the subroutine and leaving `b` to be attached as a leading comment for the log statement.

Tests
I have updated the `ast.SubroutineDeclaration`, `ast.DeclareStatement`, and `ast.Ident` nodes to use the new model and have set up a test case showing the results.