
CLAP Grammar Proposal

okapia edited this page Oct 18, 2018 · 17 revisions

CLAP grammar (Command Line Argument Parse)

EARLY DRAFT STATE

Conceptually, we want to specify a grammar in much the same way as Backus-Naur form, but augmented with additional information such as descriptions and tags (which specify what is to be completed). It should be concise and focussed on completion candidates, while remaining open to uses other than completion.

Parsing goes left to right up to the cursor, and the set of possible states at that point indicates what can be completed. Parsing should also proceed to the right of the cursor (perhaps going backwards from the end), which can further limit the sets, mainly by manipulating sentinel values.
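To make the "set of possible states" idea concrete, here is a very rough Python sketch. It is not CLAP: the grammar is reduced to a flat list of steps, each a set of literal words plus an "optional" flag, and the names closure and candidates are made up for illustration. It only covers the left-to-right pass up to the cursor; regexes, sentinels and parsing to the right of the cursor are left out.

```python
def closure(steps, positions):
    """Add every position reachable by skipping optional steps."""
    result = set(positions)
    stack = list(positions)
    while stack:
        i = stack.pop()
        if i < len(steps) and steps[i][1] and i + 1 not in result:
            result.add(i + 1)
            stack.append(i + 1)
    return result


def candidates(steps, words, prefix):
    """Return completions for `prefix` after the complete `words`.

    steps  -- list of (set_of_literal_words, optional_flag) pairs
    words  -- complete words to the left of the cursor
    prefix -- the partial word under the cursor
    """
    states = closure(steps, {0})
    for word in words:
        # Keep every state that can consume this word; ambiguity is
        # preserved rather than resolved.
        advanced = {i + 1 for i in states
                    if i < len(steps) and word in steps[i][0]}
        states = closure(steps, advanced)
    found = set()
    for i in states:
        if i < len(steps):
            found.update(w for w in steps[i][0] if w.startswith(prefix))
    return sorted(found)


# Hypothetical grammar: optional flag, required action, optional time.
steps = [({"-v", "-q"}, True), ({"start", "stop"}, False), ({"now", "later"}, True)]
print(candidates(steps, [], ""))         # ['-q', '-v', 'start', 'stop']
print(candidates(steps, ["-v"], "st"))   # ['start', 'stop']
print(candidates(steps, ["start"], ""))  # ['later', 'now']
```

The point is only that all live states contribute candidates, so an ambiguous parse completes every case.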

Conceptually, this is similar to Zsh's _regex_arguments but the aim is to be able to define things as succinctly as _arguments. _arguments handles common command-line parser variants, including:

  • -- terminates options (or not)
  • non-option argument terminates options (or not)
  • option packing: -abc for -a -b -c and -abc arg forms
  • -a arg -a=arg --aa arg --aa=arg

CLAP will therefore need some sort of macro layer on top so that you can select the particular form of the parser.
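As an illustration of one of these variants, option packing could behave roughly like the Python sketch below. The function name expand_packed and the takes_arg parameter are invented here, and treating the remainder of the word as an attached value (-xVALUE) is an assumption on top of the forms listed above.

```python
def expand_packed(word, takes_arg):
    """Expand a packed short-option word such as -abc.

    takes_arg is the set of option letters that take an argument.
    Returns a list of (option, value) pairs; value is None when the
    option has no argument or when the argument is in the next word.
    """
    if len(word) < 2 or not word.startswith("-") or word.startswith("--"):
        raise ValueError("not a short-option word: %r" % word)
    letters = word[1:]
    result = []
    for i, letter in enumerate(letters):
        if letter not in takes_arg:
            result.append(("-" + letter, None))
            continue
        rest = letters[i + 1:]
        if rest.startswith("="):
            value = rest[1:]   # -x=arg form
        elif rest:
            value = rest       # -xarg form (assumed, not listed above)
        else:
            value = None       # argument follows in the next word
        result.append(("-" + letter, value))
        break                  # nothing after the argument is an option
    return result


print(expand_packed("-abc", set()))      # [('-a', None), ('-b', None), ('-c', None)]
print(expand_packed("-ax=foo", {"x"}))   # [('-a', None), ('-x', 'foo')]
```

A macro layer would presumably switch this kind of behaviour on or off per command rather than hard-coding it.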

{ … } blocks of alternatives (no alternation operator unless you count newline)
<…>   delimits tags; these correspond to things that might be completed
[…]   descriptions
/…/   regex for consumed characters
&/…/  lookahead
!/…/  negative lookahead
\x…x  regex with alternate delimiter
(…)   sentinel conditions
=>    concatenation (continuation in next argument (or same if no chars consumed))
->    concatenation (splitting the current argument)
$     named sequence (like <…> in BNF), can be scoped
@     macro or function invocation, can also be scoped
=     assign a sequence to a name, newline terminates but blocks and trailing operators allow continuation

& and ! are PEG syntax for lookahead. I also pondered \…, |…| and /…/ for lookbehind, consuming text and lookahead respectively.

Alternation is the default within {…} blocks; each alternative can only be matched once unless followed by (*).
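A small Python sketch of how the once-only rule might filter candidates. The function name and the limits mapping are hypothetical: None stands for an unbounded (*), and an integer stands for a bounded repeat such as (*3).

```python
from collections import Counter


def remaining_alternatives(limits, seen):
    """Return the alternatives that may still be offered.

    limits -- maps each alternative to its maximum number of uses,
              or None when it is unlimited
    seen   -- alternatives already matched on the command line
    """
    used = Counter(seen)
    return [alt for alt, limit in limits.items()
            if limit is None or used[alt] < limit]


# -f once (the default), -r unlimited, -v at most three times
limits = {"-f": 1, "-r": None, "-v": 3}
print(remaining_alternatives(limits, ["-f", "-r", "-v", "-v"]))  # ['-r', '-v']
```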

The following is a sort of example to demonstrate usage:

CLAP 0.0                              # declare grammar specification version
command cmd = $args                   # define grammar for a command's arguments
stdin cmd = $in                       # define grammar for redirection in
stdout cmd = $out                     # define grammar for redirection out
stderr cmd = $err                     # define grammar for stderr redirection
envar VAR = $var                      # define grammar for environment variable value

args = $opts => $arg1 => $arg2

opts = {                              # define a named sequence
  <options>:[option]
  /-./                                # match and ignore unknown options
  /--.+/
  -f [option description]             # short option with description
  { -l --long } [synonyms]
  -r [repeatable option] (*)          # repeatable option
  -v [verbose]          (*3)          # repeatable with maximum
  -x [specify argument] => {          # option taking argument
    <users>:[user] /.*/               # defer to shell's user completion
  }
  -y [desc] (!a)                      # option excluded by sentinel
  { -o --opt } $sep =>
  <options/actions>:[action option] (+a) # subgroup incrementing a sentinel
  --do
  --undo
} (*)


arg1 = {
  @lowercase()                        # predefined function for mapping lower to upper
  <arg>:[argument]:!"$arg --list-args"  # shell-command to list arguments:descriptions
}

arg2 = {
  @partial(_)                         # predefined function so sn_ca matches snake_case
  snake_case
  joined_words
}

macros: TBD

Sentinels

This is the basis for specifying mutual exclusion between options.

  • * – allow many
  • ^ – exclusive (allow many but not duplicates)
  • ^list – exclusive, sharing a dedup list between places.
  • +o – increment o counter
  • -debug – decrement debug counter
  • ?opt – require opt counter >0
  • !opt – require opt counter <=0
  • 0opt – zero counter
  • 1opt – set counter

When the parser backtracks, it needs to revert changes to the counts. This is mainly notable because it doesn't happen with _regex_arguments. Finding a matching point should not prevent backtracking: if it is ambiguous we complete all cases.
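One way the counters and their reversal on backtracking could work is an undo log, sketched here in Python. The Sentinels class and its method names are invented for illustration; reading "1opt" as "set the counter to 1" is an assumption, and the *, ^ and ^list forms are not covered.

```python
class Sentinels:
    """Sentinel counters with an undo log so a parser can backtrack."""

    def __init__(self):
        self.counts = {}
        self.trail = []  # (name, previous value) for every change made

    def _set(self, name, value):
        self.trail.append((name, self.counts.get(name, 0)))
        self.counts[name] = value

    def apply(self, op):
        """Apply one sentinel such as '+a' or '!opt'.

        Returns False when a ? or ! condition fails, True otherwise.
        """
        kind, name = op[0], op[1:]
        current = self.counts.get(name, 0)
        if kind == "+":
            self._set(name, current + 1)
        elif kind == "-":
            self._set(name, current - 1)
        elif kind == "0":
            self._set(name, 0)
        elif kind == "1":
            self._set(name, 1)  # assumed meaning of "set counter"
        elif kind == "?":
            return current > 0
        elif kind == "!":
            return current <= 0
        else:
            raise ValueError("unknown sentinel: %r" % op)
        return True

    def mark(self):
        """Remember the current point so it can be returned to."""
        return len(self.trail)

    def rollback(self, mark):
        """Undo every change made since mark, most recent first."""
        while len(self.trail) > mark:
            name, old = self.trail.pop()
            self.counts[name] = old


s = Sentinels()
point = s.mark()
s.apply("+a")           # e.g. the parser tried the --do branch
print(s.apply("!a"))    # False: -y is excluded while a > 0
s.rollback(point)       # the branch failed, so revert the counts
print(s.apply("!a"))    # True: -y is allowed again
```

Taking a mark before each alternative and rolling back on failure keeps the counts consistent however many branches are explored.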

Common list of standard tags (plural names):

  • options
  • processes
  • files
  • directories
  • users
  • hosts

For common tags, you can elide the description. We'll need ways to provide options to certain tags like file extensions to be completed. Tags also come with predefined regexes for characters to be consumed.

Predefined Macros

The following affect matching rules:

  • @quote(start,protected[,end])
  • @shell_quote() - nested level of shell quoting
  • @packed_options() - allow -abc to match -a -b -c
  • @uppercase() - lowercase matching uppercase
  • @lowercase() - uppercase matching lowercase
  • @map() - character equivalence classes
  • @partial() - e.g. /u/l/b completing to /usr/local/bin
  • @initial() - initial characters that match nothing, e.g. initial zeroes on a number
  • @camel() - partial matching of camel case (or some generalisation thereof)
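As a rough idea of what @partial() might do, the Python sketch below splits both the typed text and the candidate on a separator and requires each typed piece to be a prefix of the corresponding candidate piece. The function name and signature are hypothetical, and letting a shorter typed string match (so that "sn" still offers snake_case) is an assumption about the intended behaviour.

```python
def partial_match(typed, candidate, sep):
    """Return True if each sep-delimited piece of typed is a prefix
    of the corresponding piece of candidate."""
    typed_parts = typed.split(sep)
    cand_parts = candidate.split(sep)
    if len(typed_parts) > len(cand_parts):
        return False
    return all(c.startswith(t) for t, c in zip(typed_parts, cand_parts))


print(partial_match("sn_ca", "snake_case", "_"))       # True
print(partial_match("/u/l/b", "/usr/local/bin", "/"))  # True
print(partial_match("sn_ca", "joined_words", "_"))     # False
```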

Regular Expressions

It'd be really useful to have certain extensions like matching characters from $IFS and matching the ends of arguments.
