Defining a document markup language for finl

The markup language for finl will be based on LaTeX, but many of the pain points of LaTeX come from the macro-expansion approach that Knuth’s TeX takes towards parsing the document. I remember reading The TeXbook as a teenager, puzzling over the whole mouth-gullet-stomach description, and finding it hard to follow.

LaTeX attempts to impose a sense of order on some of the randomness of TeX’s syntax (albeit at the price of occasional verbosity and the loss of some of the natural-language feel of plain TeX; compare typing {a \over b} with \frac{a}{b}). Still, LaTeX’s basic markup language is a good starting point. This gives us commands and environments as the basic markup for a document. There’s something to be said for different modes of parsing as well. Parsing rules for math would differ from parsing rules for text, and there should be an option to take a chunk of input completely unparsed, e.g., for \verb or the verbatim environment. Changing the timing of how things are parsed would enable us to do things like \footnote{This footnote has \verb+some_verbatim+ text}.

Commands

We retain the basic definition of how commands are parsed. A command begins with \ and is followed by either a single non-letter or a string of letters that runs until the first non-letter. Letters are defined as those Unicode characters that are marked as alphabetic, which means that not only is \bird a valid command, but so are \pták, \طائر and \鳥.
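The rule above can be sketched in a few lines of Python. This is only an illustration, not finl's implementation: the function name read_command is made up for this example, and Python's str.isalpha() (which tests for Unicode letter categories) is used as a stand-in for "marked as alphabetic", so the two may disagree on edge cases such as combining marks.

```python
def read_command(src: str, pos: int = 0) -> tuple[str, int]:
    """Return (command name, index just past the name) for a command at pos.

    Hypothetical helper: str.isalpha() approximates "Unicode alphabetic".
    """
    if pos >= len(src) or src[pos] != "\\":
        raise ValueError("expected a backslash")
    start = pos + 1
    if start >= len(src):
        raise ValueError("backslash at end of input")
    if not src[start].isalpha():
        # A single non-letter is a complete command name, e.g. \$ or \].
        return src[start], start + 1
    end = start
    while end < len(src) and src[end].isalpha():
        end += 1  # keep consuming letters until the first non-letter
    return src[start:end], end
```

Calling read_command on the text "\bird flies" yields the name bird and the index of the space that follows it; \pták, \طائر and \鳥 are read the same way.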

Commands can take any number of arguments. Arguments can be of the following forms:

  • Required argument. This will be either a single token (a Unicode character, a command with its arguments, or an environment) or a delimited token list (usually enclosed in braces, but see below).
    Note that this varies from existing LaTeX syntax in that a user could write, e.g., \frac 1\sqrt{2} rather than \frac 1{\sqrt{2}}. I may change my mind on this later.
  • Optional argument. This must be delimited by square brackets. Inside an optional argument, a closing square bracket must either appear inside braces or be escaped as \]. \[ will likewise be treated as a literal square bracket within an optional argument.
  • Ordered pair. Two floating-point numbers separated by a comma and enclosed in parentheses. Any white space inside the parentheses will be ignored. This would be used for, e.g., graphics environments.
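As a rough sketch of the last two argument forms, here is how the optional-argument and ordered-pair rules might be read in Python. The names read_optional and read_ordered_pair are invented for this example, and several details are my assumptions rather than anything settled above: \[ and \] are unescaped to plain brackets in the result, braces are kept in the returned text, and the accepted number syntax is a simple decimal with an optional sign.

```python
import re

def read_optional(src, pos=0):
    """Read [ ... ] starting at pos; return (contents, index past the ']')."""
    if pos >= len(src) or src[pos] != "[":
        raise ValueError("expected '['")
    out, depth, i = [], 0, pos + 1
    while i < len(src):
        ch = src[i]
        if ch == "\\" and i + 1 < len(src):
            nxt = src[i + 1]
            # \[ and \] stand for literal brackets; other escapes pass through.
            out.append(nxt if nxt in "[]" else ch + nxt)
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth = max(depth - 1, 0)
        elif ch == "]" and depth == 0:
            return "".join(out), i + 1  # an unbraced ']' closes the argument
        out.append(ch)
        i += 1
    raise ValueError("unterminated optional argument")

# Assumed number syntax: optional sign, digits, optional decimal point.
_NUM = r"[-+]?(?:\d+\.?\d*|\.\d+)"
_PAIR = re.compile(r"\(\s*(" + _NUM + r")\s*,\s*(" + _NUM + r")\s*\)")

def read_ordered_pair(src, pos=0):
    """Read (x, y) starting at pos; return ((x, y), index past the ')')."""
    m = _PAIR.match(src, pos)
    if m is None:
        raise ValueError("expected an ordered pair")
    return (float(m.group(1)), float(m.group(2))), m.end()
```

With these definitions, a ] inside braces does not end the optional argument, and white space around the two numbers of a pair is skipped.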

A command can have a single * immediately after the command name, which indicates an alternate form of the command.

Arguments can have types as well. These can be:

  • Parsed text. This will be parsed as normal, including any enclosed macros.
  • Mathematics. Spaces will be ignored. Math-only commands will be defined. ^ will indicate a superscript, _ will indicate a subscript (note that outside of math mode, ^ and _ will simply typeset those characters).
  • Unparsed text. This will be treated as a straight unparsed character stream. The command will be responsible for parsing this stream. The argument can either be enclosed in braces or be delimited by a repeated character, in which case the character at the beginning of the unparsed character stream also indicates its close.
  • Key-value pair list. This will be a list of key-value pairs separated by commas. Any white space at the beginning or end of the list will be ignored, as will any white space surrounding the commas. If a value contains a comma, either the whole value should be enclosed in braces or the comma should be escaped as \,. Values are otherwise treated as unparsed text. Keys and values are separated by ->.
  • No space mode. Parsed as normal except all spaces are ignored.
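To make the key-value rules concrete, here is a small Python sketch. The function names are hypothetical, and a few choices are assumptions of mine that the description above leaves open: white space around -> is trimmed, a key with no -> gets an empty value, and the outer braces of a fully braced value are dropped from the result.

```python
def _strip_outer_braces(value):
    """Drop one pair of outer braces if they enclose the whole value."""
    if not (value.startswith("{") and value.endswith("}")):
        return value
    depth = 0
    for i, ch in enumerate(value):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0 and i != len(value) - 1:
                return value  # the first group closes early, e.g. {a}{b}
    return value[1:-1] if depth == 0 else value

def parse_key_values(src):
    """Split 'k -> v, k -> v' into a dict, honouring braces and \\, escapes."""
    items, buf, depth, i = [], [], 0, 0
    while i < len(src):
        ch = src[i]
        if ch == "\\" and i + 1 < len(src) and src[i + 1] == ",":
            buf.append(",")  # an escaped comma is part of the value
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth = max(depth - 1, 0)
        elif ch == "," and depth == 0:
            items.append("".join(buf))  # an unbraced comma ends the pair
            buf = []
            i += 1
            continue
        buf.append(ch)
        i += 1
    items.append("".join(buf))

    result = {}
    for item in items:
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("->")
        result[key.strip()] = _strip_outer_braces(value.strip())
    return result
```

For example, the list "width -> 3in, title -> {Hello, world}" would come back with the title value Hello, world intact, since its comma sits inside braces.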

As an aside, unparsed text can be re-submitted for parsing, potentially after manipulation of the text by the command definition.
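The two ways of delimiting an unparsed argument might be handled along these lines. This is a sketch with an invented function name, and it assumes that braces inside a braced unparsed argument must balance, which the description above does not actually specify.

```python
def read_unparsed(src, pos=0):
    """Read an unparsed argument at pos; return (raw text, index past it)."""
    if pos >= len(src):
        raise ValueError("expected an unparsed argument")
    opener = src[pos]
    if opener == "{":
        # Braced form: assumed here to end at the matching closing brace.
        depth, i = 1, pos + 1
        while i < len(src):
            if src[i] == "{":
                depth += 1
            elif src[i] == "}":
                depth -= 1
                if depth == 0:
                    return src[pos + 1:i], i + 1
            i += 1
        raise ValueError("unterminated braced argument")
    # Delimiter form: the first character also closes the stream.
    end = src.find(opener, pos + 1)
    if end == -1:
        raise ValueError("unterminated delimited argument")
    return src[pos + 1:end], end + 1
```

The raw string it returns is what a command definition would receive, and it is also what could later be handed back to the parser after any manipulation.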

Environments

Environments are marked with \begin{X} and \end{X} where X is the environment name.

Environment names can consist of any characters except {, } or *. A single * at the end of the environment name indicates that this is an alternate form of the environment, as with commands above.

The \begin environment command can take any number of arguments as above.

The contents of the environment can be any of the types for command arguments as above. Unparsed text, however, can only be concluded with the appropriate \end environment command.
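As an illustration of the unparsed case, the following Python sketch reads an environment whose body is taken verbatim up to the matching \end. The function name is hypothetical, and the sketch deliberately ignores any arguments to \begin, which a real parser would have to read before the body starts.

```python
def read_unparsed_environment(src, pos=0):
    """Read \\begin{X}...\\end{X} at pos with an unparsed body.

    Returns (name, starred, body, index past the \\end{X}).
    Simplification: arguments after \\begin{X} are not handled.
    """
    prefix = "\\begin{"
    if not src.startswith(prefix, pos):
        raise ValueError("expected \\begin{")
    close = src.find("}", pos + len(prefix))
    if close == -1:
        raise ValueError("unterminated environment name")
    full_name = src[pos + len(prefix):close]
    starred = full_name.endswith("*")  # trailing * marks the alternate form
    name = full_name[:-1] if starred else full_name
    if not name or any(c in "{}*" for c in name):
        raise ValueError("invalid environment name")
    # Only the matching \end{X} can conclude unparsed contents.
    terminator = "\\end{" + full_name + "}"
    body_start = close + 1
    body_end = src.find(terminator, body_start)
    if body_end == -1:
        raise ValueError("missing " + terminator)
    return name, starred, src[body_start:body_end], body_end + len(terminator)
```

Anything between the two markers, including stray braces or backslashes, comes back untouched as the body.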

There will be some special shortcuts for environments. \(…\) and $…$ shall be equivalent to \begin{math}…\end{math}, while \[…\] and $$…$$ shall be equivalent to \begin{displaymath}…\end{displaymath}.
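One way to picture these shortcuts is as a rewrite into the long form before any further parsing. The Python sketch below does that for a string that consists of exactly one shortcut; the table and function names are made up for the example, and a real parser would of course recognise the delimiters while scanning rather than on whole strings.

```python
# (opener, closer, environment); $$ is listed before $ so it is tried first.
SHORTCUTS = [
    ("$$", "$$", "displaymath"),
    ("\\[", "\\]", "displaymath"),
    ("\\(", "\\)", "math"),
    ("$", "$", "math"),
]

def expand_shortcut(src):
    """Rewrite a single shortcut such as $x$ into its \\begin/\\end form."""
    for opener, closer, env in SHORTCUTS:
        long_enough = len(src) >= len(opener) + len(closer)
        if long_enough and src.startswith(opener) and src.endswith(closer):
            body = src[len(opener):len(src) - len(closer)]
            return "\\begin{%s}%s\\end{%s}" % (env, body, env)
    return src  # not a shortcut: leave the text alone
```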

Updates

26 Feb 2021: Minor formatting change; indicated that key-value pairs are separated by ->.