Test of new comment attachment model #300

richardmarshall · 2024-04-07T06:13:22Z

This is a starting point for a fix for #299 intended for discussion of the approach.

Current approach to comment attachment

When a token is read by the parser in readPeek the lexer is advanced until the next token is not a comment or linefeed. Any comments that were read in this process are attached as leading comments to the new node meta struct stored in the parser peekToken pointer. This means all comments found in the token stream are attached as leading comments to the first syntax node that follows them.

The various parser functions then move these leading comments to the appropriate location / node with the swap*() helper functions.

This method captures most comments, except for comments after the last top level definition. However unless a node parser function explicitly extracts the comments out of tokens they are not added to the tree. For example the local token in a variable declaration, the semi-colons at the end of line statements, etc. Any missed calls to swap*() for these tokens will result in a comment being lost and not attached to the final syntax tree.

Proposed model

With the addition of a formatting sub-command the importance of not losing any of the input source or comments is incredibly important. I propose that instead of automatically attaching comments to each token that comments are added to a list of unbound comments which are attached to nodes at boundaries in the parsing process.

For this stream of tokens (where <#> are comments):

<1> declare <2> local <3> var.s <4> STRING <5> ; <6>

For each token processed as part of the parsing of a declare statement the following would be the process of attaching comments to the declare node and its child nodes.

declare:
unbound - [<1>]
unbound peek - [<2>]

unbound comments are attached at leading to the declare statement node and the slice reset

local:
unbound - [<2>]
unbound peek - [<3>]

unbound comments are attached at infix to the declare statement node and the slice reset

var.s:
unbound - [<3>]
unbound peek - [<4>]

unbound comments are attached at leading to the ident node and the slice reset
unbound peek comments are attached at trailing to the ident node and the slice reset

STRING:
unbound - []
unbound peek - [<5>]

unbound peek comments are attached at trailing to the ident node and the slice reset

;:
unbound - []
unbound peek - [<6>]

there are no unbound comments so no action
unbound peek comments are attached at trailing to the declare statement node and the slice reset.

The resulting attachments for the parsed declare statement:

declareStatment {
    leading: [<1>]
    infix: [<2>]
    trailing: [<6>]
    name: ident {
        leading: [<3>]
        trailing: [<4>]
    }
    type: ident {
        trailing: [<5>]
    }
}

By attaching all unbound comments each time we ensure that even if we have an error in our logic the worst case is a comment being placed in the wrong location during formatting. A missed attachment call would not lead to a lost comment in the output.

Implementation

The readPeek method is updated to append any comments it finds into one of two slices tracking unbound comments.

unboundComments ast.Comments
unboundPeekComments ast.Comments

Comments are first paced into unboundPeekComments by readPeek and then when nextToken is called the unboundPeekComments entries are removed and appended to unboundComments. This two stage batching of unbound comments allows for a simple method for allowing the parse functions to attach trailing comments.

The interface for attaching comments to nodes is the new parser method attachUnboundComments. It takes any unbound comments that have been seen since the last attachment call and attaches them at the specified attachment point (leading, trailing, infix).

func (p *Parser) attachUnboundComments(node *ast.Node, attachmentPoint string, peek bool)

If peek argument is true the unboundPeakComments are appended to any comments still in unboundComments. This would typically be used to attach trailing comments to a node.

p.attachUnboundComments(s, AttachmentTrailing, true)

When called with true value for peek would also be when the function could be setup to break up ambiguous comments.

For example it could break up the following comments a and b with a being an infix comment for the subroutine and leaving comment b to be attached as a leading comment for the log statement.

sub foo { // a 
    // b
    log "foo";
}

NOTE: this type of logic has not yet been implemented.

Tests

Note: The general test suite is currently failing due to the comment handling changes not being applied to all node types yet.

I have updated the ast.SubroutineDeclaration, ast.DeclareStatement, and ast.Ident nodes to use the new model and have setup a test case showing the results.

NOTE: I've left some debugging prints in place to help visualize where each comment is being attached. These will be removed later.

go test -run TestNewCommentMode ./parser -v
=== RUN   TestNewCommentModel
token sub: Attached 1 comments to &ast.SubroutineDeclaration{Meta:(*ast.Meta)(0xc0001968c0), Name:(*ast.Ident)(nil), Block:(*ast.BlockStatement)(nil), ReturnType:(*ast.Ident)(nil)} at (leading):
/* 1 */
token foo: Attached 1 comments to &ast.Ident{Meta:(*ast.Meta)(0xc000196960), Value:"foo"} at (leading):
/* 2 */
token foo: Attached 1 comments to &ast.Ident{Meta:(*ast.Meta)(0xc000196960), Value:"foo"} at (trailing):
/* 3 */
token declare: Attached 2 comments to &ast.DeclareStatement{Meta:(*ast.Meta)(0xc000196aa0), Name:(*ast.Ident)(nil), ValueType:(*ast.Ident)(nil)} at (leading):
/* 4 */
/* 5 */
token declare: Attached 1 comments to &ast.DeclareStatement{Meta:(*ast.Meta)(0xc000196aa0), Name:(*ast.Ident)(nil), ValueType:(*ast.Ident)(nil)} at (infix):
/* 6 */
token var.s: Attached 1 comments to &ast.Ident{Meta:(*ast.Meta)(0xc000196be0), Value:"var.s"} at (leading):
/* 7 */
token var.s: Attached 1 comments to &ast.Ident{Meta:(*ast.Meta)(0xc000196be0), Value:"var.s"} at (trailing):
/* 8 */
token STRING: Attached 1 comments to &ast.Ident{Meta:(*ast.Meta)(0xc000196c80), Value:"STRING"} at (trailing):
/* 9 */
token declare: Attached 2 comments to &ast.DeclareStatement{Meta:(*ast.Meta)(0xc000196aa0), Name:(*ast.Ident)(0xc000190240), ValueType:(*ast.Ident)(0xc000190288)} at (trailing):
/* 10 */
/* 11 */
token sub: Attached 2 comments to &ast.SubroutineDeclaration{Meta:(*ast.Meta)(0xc0001968c0), Name:(*ast.Ident)(0xc0001901b0), Block:(*ast.BlockStatement)(0xc0001de180), ReturnType:(*ast.Ident)(nil)} at (trailing):
/* 12 */
/* 13 */
--- PASS: TestNewCommentModel (0.00s)
PASS
ok      github.com/ysugimoto/falco/parser       0.006s

Test of new comment attachment model

a20335c

richardmarshall mentioned this pull request Apr 7, 2024

Feature/implement formatter #291

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Test of new comment attachment model #300

Test of new comment attachment model #300

richardmarshall commented Apr 7, 2024

Test of new comment attachment model #300

Are you sure you want to change the base?

Test of new comment attachment model #300

Conversation

richardmarshall commented Apr 7, 2024

Current approach to comment attachment

Proposed model

Implementation

Tests