DMS — Tier 1 specification

Version: 0.1 (draft)

Sibling specification to SPEC.md. Defines the tier 1 extension: a syntactic surface that lets DMS represent element-shaped data (markup, declarative AST nodes, structured function calls) on top of the tier-0 value-tree algebra.

A tier-0-only decoder (per SPEC.md) rejects tier-1 documents at front-matter decode with a tier-1-pointing error. A tier-1-capable decoder ships the four functions described in Decoder / encoder split below; tier-0 conformance is preserved.

Pre-1.0: breaking changes are still possible. No version-bump rules apply yet.

Goal

Add a syntactic surface to DMS that lets it represent element-shaped data (markup, declarative AST nodes, structured function calls) without compromising the value-tree algebra (Map / List / Scalar) that the rest of the spec rests on.

Concrete near-term use cases:

What tier 1 does not add: expressions, references, includes from other files, computation. Those are separate decorator families that could be defined later as opt-in dialects, each with its own spec, but they are not part of this draft.

Tier 1 inherits all of tier-0's design non-goals (SPEC.md §"Design non-goals") unchanged:

If a tier-1 dialect's runtime layer wants to introduce any of these (e.g., an i18n dialect doing key resolution at render time), that's the dialect's contract with its consumers — DMS itself stays on the same algebra.

Core concept: decorators as AST decoration

A tier-1 line can carry a decorator call:

+ |tag(lang: "en")

The decorator (|tag(...)) is AST decoration, not value-tree content. The decoded document keeps two parallel structures:

The decorators sidecar is to |-calls what the comments AST is to # ... lines today: parallel, path-keyed, captured during full-mode decode, dropped in lite mode, reattached on encode for byte-stable round-trip. Its canonical shape is a list of per-node entries — see "AST shape" for details.

Sigil categories

Two disjoint roles:

Tier-0 reserved decorator sigil set

The tier-0 spec reserves the following characters as decorator sigils:

 !  @  $  %  ^  &  *  |  ~  `  .  ,  >  <  ?  ;  =

Tier 0 also reserves the Reserved Emoji Set (see SPEC.md §Lexical → "Reserved emoji characters"): every extended grapheme cluster containing at least one codepoint from Extended_Pictographic ∪ Regional Indicators ∪ Emoji Modifiers ∪ {U+20E3}. Emoji-bearing grapheme clusters are first-class decorator sigils alongside the ASCII set.

Neither kind of character appears in valid tier-0 bare keys, so the reservation breaks no existing tier-0 documents. The tier-0 spec rejects them in every value-position (see SPEC.md §"Implicit reservations"):

Position                                      Tier-0 status  Tier-1 use
First non-whitespace of a body line           parse error    decorator at scalar root or leading
After key:, before inline_value               parse error    inner decoration on kvpair value
After +, before inline_value                  parse error    inner decoration on list-item value
After inline_value on kvpair line, pre-NL     parse error    trailing decoration on kvpair value
After inline_value on list-item line, pre-NL  parse error    trailing decoration on list-item value
Inside flow_array, before/after element       parse error    inner/trailing decoration on flow element
Inside flow_table, before/after value         parse error    inner/trailing decoration on flow_kv

The reservation positions are exactly the value-positions where tier 1 places decoration. Tier-1 decoration "fills in" the slots tier 0 already keeps empty.

This is a fixed list in the tier-0 spec, not a dynamic per-file reservation. Benefits:

The tier-0 spec also reserves _-prefixed root keys in front matter (existing behavior); _dms_imports and any future _dms_* field name is covered by that.

Multi-character sigils

A sigil is a non-empty sequence of sigil atoms. A sigil atom is one of:

Single-atom sigils (|, @, 🚀, 🇺🇸) are the common case; multi-atom sigils (||, |@, ~~, &|*, 🚀🔥, |🚀) are the escape valve when a file imports more dialects/families than single-atom sigils can accommodate. Capacity scales as N + N² + N³ + …, effectively unlimited; the addition of the Reserved Emoji Set increases N from 17 to ~3.7k.
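The capacity figure is plain arithmetic. A minimal sketch, assuming sigils are capped at some maximum length (the cap and the function name sigil_capacity are illustrative; the text itself places no upper bound):

```python
def sigil_capacity(atoms: int, max_len: int) -> int:
    """Count distinct sigils of length 1..max_len drawn from `atoms` atoms."""
    # N + N^2 + ... + N^max_len
    return sum(atoms ** k for k in range(1, max_len + 1))


# With only the 17 ASCII atoms:
print(sigil_capacity(17, 1))  # 17
print(sigil_capacity(17, 3))  # 17 + 289 + 4913 = 5219
```

Even at length three the ASCII set alone yields a few thousand sigils, which is why multi-atom sigils work as an escape valve.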

Sigils may only combine atoms drawn from the reserved sets above. Cross-set combinations with non-reserved chars like |+ are not valid sigils — + is the tier-0 list-item marker, and admitting it into sigil position would reintroduce parse ambiguity that tier 0 avoids by construction. ASCII-reserved chars and emoji clusters may be mixed within a single sigil (|🚀, 🚀|, @🇺🇸@); both kinds are tier-0-reserved at the same parse positions, so the resulting tokenization stays context-free.

Lexer rule: longest match. At decoration position, the lexer reads the maximal run of sigil atoms (a sequence of ASCII reserved-sigil chars and/or Reserved-Emoji grapheme clusters, in any order, with no intervening non-reserved characters) and matches it against the file's bound-sigil table (built from each import's bind, plus dialect defaults). The longest registered prefix of that run is the sigil; remaining sigil atoms (if any) are a parse error. If the run has no registered prefix, the error is "unknown sigil ''." Match comparison is byte-exact after NFC normalization of the source line, consistent with the rest of the spec.

Concretely: if a file binds both | and ||, then ||tag lexes as (||, tag) and |tag lexes as (|, tag) — no ambiguity. If a file binds only |, then ||tag is a parse error (the lexer would read || as the candidate, find no || binding, and reject it rather than fall back to | followed by literal |tag). Likewise, if a file binds 🚀 to a family, 🚀tag lexes as (🚀, tag); 🚀🔥tag is a parse error unless 🚀🔥 is also bound.

Skin-tone, ZWJ, and keycap sequences are matched as single atoms by the UAX #29 grapheme-cluster boundary algorithm (frozen at 15.1.0). 👍🏽 is one atom, not two; binding 👍 does not match 👍🏽. Authors who want both must bind both explicitly.
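The longest-match rule can be sketched as follows. This is an illustration under stated assumptions, not a reference lexer: it handles only the ASCII atoms (emoji atoms would need UAX #29 grapheme segmentation, omitted here), and the names lex_sigil, SigilError, and the `bound` set are hypothetical.

```python
# ASCII reserved decorator sigil atoms, as listed in the tier-0 set above.
ASCII_SIGIL_ATOMS = set("!@$%^&*|~`.,><?;=")


class SigilError(Exception):
    """Stands in for a decoder parse error."""


def lex_sigil(text, bound):
    """Split `text` into (sigil, rest) using the longest-match rule.

    `bound` is the set of sigils bound for this file (assumed prebuilt).
    """
    # 1. Read the maximal run of sigil atoms.
    end = 0
    while end < len(text) and text[end] in ASCII_SIGIL_ATOMS:
        end += 1
    run = text[:end]
    if not run:
        raise SigilError("no sigil at decoration position")
    # 2. Find the longest registered prefix of the run.
    for length in range(len(run), 0, -1):
        if run[:length] in bound:
            if length < len(run):
                # Leftover atoms are an error; there is no fallback.
                raise SigilError(
                    f"unexpected sigil atoms {run[length:]!r} after {run[:length]!r}"
                )
            return run, text[end:]
    raise SigilError(f"unknown sigil {run!r}")
```

With both | and || bound, lex_sigil("||tag", {"|", "||"}) returns ("||", "tag"); with only | bound, the same input raises, matching the worked example in the text.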

Front matter additions

Tier 1 adds two reserved root fields to front matter:

Tier-0 documents (_dms_tier: 0 or no front matter) must not contain _dms_imports. The decoder rejects such documents with an error suggesting _dms_tier: 1.

Import shape

+++
_dms_tier: 1
_dms_imports:
  + dialect: "html"
    version: "1.0.0"
    ns: "html"
    bind:
      "|": ["tag", "entity"]
    deny:
      tag: ["script", "iframe"]
    alias:
      tag:
        para: "p"
+++

Per import: two required identity fields, plus four optional sub-maps, each handling one orthogonal concern.

Identity (required):

Identity (optional):

Syntax (optional):

Curation (optional, all keyed by family-name):

Resolution rules

A decorator call appears at any value-position (see Decoration sites: every value-position). To parse one:

  1. Lex the sigil. At the call's start, read the maximal run of sigil atoms — ASCII chars from the tier-0 reserved decorator sigil set and/or extended grapheme clusters from the Reserved Emoji Set, in any order, with no intervening non-reserved characters. Match the longest prefix of that run against the file's bound-sigil table (built from each import's bind, falling back to the dialect's published defaults). The matched prefix is the sigil; any unmatched trailing sigil atoms are a parse error. Lookup yields a set of (family, dialect) candidates — possibly more than one if a sigil binds multiple families.
  2. Lex the name. Read the identifier after the sigil following tier-0 ident rules. If a . follows, this is a fully-qualified call (|<ns>.<name>); the namespace bypasses step 1's candidate set and resolves directly through <ns>'s import.
  3. Resolve the name within candidate families:
     - Resolve through alias first: if the identifier is an alias key in a candidate family's alias map, replace it with the canonical name. Writing the canonical name directly when an alias exists is a parse error.
     - Apply curation: if allow is specified, the resolved name must appear in it; if deny is specified, the resolved name must not appear in it; otherwise the family is wildcard and any name is accepted.
     Exactly one candidate family must accept the name. Zero accepting ⇒ name not found in any family bound to '<sigil>'. More than one accepting ⇒ name '<x>' is ambiguous between families <A>, <B> under sigil '<sigil>'; qualify the call as |<ns>.<x>(...).
  4. Lex the params. If the next char (no whitespace) is (, parse one param group, either flow-table-shaped (named) or flow-array-shaped (positional), determined by the group's first token (see "Positional params" under "Dialect specification contract"). Then loop: while the next char is ( (no whitespace), parse another group. Otherwise the call has zero explicit param groups. Decoded shape:
     - bare |tag and empty-parens |tag() both produce params: [{}]
     - |tag(a: 1) → params: [{ a: 1 }] (named)
     - |score(95, 5) → params: [[95, 5]] (positional)
     - |tag(a: 1)(b: 2) → params: [{ a: 1 }, { b: 2 }] (two named groups)
     - |score(95, 5)(commented: true) → params: [[95, 5], { commented: true }] (positional + named)
  5. Record the AST entry: { family: <canonical family>, fn: <canonical fn name>, params: [...], params_dec: [], position: <leading|inner|trailing|floating> }. fn is always the canonical name regardless of whether an alias was used at the source. params_dec is empty by default and populated only if a param value was itself prefixed with decoration (see AST shape).
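Step 3 (alias, then curation, then the exactly-one check) can be sketched like this. The Family class, resolve function, and ResolveError are hypothetical names for illustration. Treating "canonical name written directly while an alias exists" as an immediate error is this sketch's reading of the rule above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


class ResolveError(Exception):
    """Stands in for a decoder parse error."""


@dataclass
class Family:
    name: str
    alias: Dict[str, str] = field(default_factory=dict)  # alias -> canonical
    allow: Optional[Set[str]] = None
    deny: Optional[Set[str]] = None

    def accepts(self, ident: str) -> Optional[str]:
        """Return the canonical name if this family accepts `ident`, else None."""
        if ident in self.alias.values():
            raise ResolveError(
                f"'{ident}' has an alias in family '{self.name}'; write the alias"
            )
        canonical = self.alias.get(ident, ident)
        if self.allow is not None and canonical not in self.allow:
            return None
        if self.deny is not None and canonical in self.deny:
            return None
        return canonical  # wildcard family, or the name passed curation


def resolve(ident: str, sigil: str, candidates: List[Family]) -> Tuple[str, str]:
    """Return (family, canonical fn); exactly one family must accept."""
    hits = []
    for fam in candidates:
        canonical = fam.accepts(ident)
        if canonical is not None:
            hits.append((fam.name, canonical))
    if not hits:
        raise ResolveError(f"name not found in any family bound to '{sigil}'")
    if len(hits) > 1:
        names = ", ".join(f for f, _ in hits)
        raise ResolveError(
            f"name '{ident}' is ambiguous between families {names} under sigil '{sigil}'"
        )
    return hits[0]
```

For example, with a tag family carrying alias para → p and a deny list, resolve("para", "|", [tag, entity]) yields ("tag", "p"), so the recorded fn is the canonical name.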

Conflict detection

After all _dms_imports entries are merged (shallow), the decoder runs two layers of conflict checks.

Front-matter-time: cross-import family collisions

It is a hard parse error for two different imports to bind the same (sigil, namespace, family) triple:

Decoder error in front matter (line 12):
  Decorator binding collision on (sigil='|', ns=<unset>, family='tag'):
    - import #0: dialect 'html' v1.0.0 binds '|' → 'tag'
    - import #1: dialect 'math' v0.1.0 binds '|' → 'tag'

  Resolve by remapping one. Suggestion (rebinding 'math' to '@'):
    _dms_imports:
      + { dialect: "html", version: "1.0.0", ns: "html" }
      + { dialect: "math", version: "0.1.0", ns: "math",
          bind: { "@": ["tag"] } }

Heuristic for which dialect the error suggests remapping: the later one in the imports list, since the earlier import is more likely the file's "primary" dialect.

A single sigil binding multiple families from the same import is not a collision ("|": ["tag", "entity"] is fine). Same-sigil multi-family bindings across different imports are also fine as long as no (sigil, ns, family) triple repeats; function-name disambiguation handles the rest at parse time.
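The front-matter-time check amounts to looking for a repeated triple across imports. A minimal sketch, assuming the bindings have already been flattened into (import_index, sigil, ns, family) tuples; that flattened shape and the function name find_collisions are assumptions of this example, not part of the spec:

```python
def find_collisions(bindings):
    """Return [(triple, first_import, later_import)] for repeated triples.

    `bindings` is an iterable of (import_index, sigil, ns, family) tuples.
    The same import binding a triple twice is not reported.
    """
    first_seen = {}
    collisions = []
    for imp, sigil, ns, family in bindings:
        triple = (sigil, ns, family)
        owner = first_seen.setdefault(triple, imp)
        if owner != imp:
            # The later import is the one the error suggests remapping.
            collisions.append((triple, owner, imp))
    return collisions


bindings = [
    (0, "|", None, "tag"),     # html binds '|' -> tag
    (0, "|", None, "entity"),  # same import, same sigil: fine
    (1, "|", None, "tag"),     # math binds '|' -> tag: collision
]
print(find_collisions(bindings))
```

Rebinding the second import to another sigil, as the sample error suggests, makes the triples distinct and the list comes back empty.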

Parse-time: same-sigil function-name ambiguity

Different families bound to the same sigil may publish overlapping function names. The decoder cannot enumerate names ahead of time when one or both families are wildcard, so this class of conflict is caught at body parse — see step 3 of Resolution rules. The error message points the user toward fully-qualified |<ns>.<name>(...) form.

Sigil validation

Every key in any bind map must be a non-empty sequence of sigil atoms: ASCII chars drawn from the tier-0 reserved decorator sigil set and/or extended grapheme clusters drawn from the Reserved Emoji Set (see "Tier-0 reserved decorator sigil set" and "Multi-character sigils" above for the canonical sources). A bind key containing any character outside the union of those two sets is a hard parse error at front-matter decode. (Note: underscore _ is not in either set; it is its own category, reserved for core / built-in decorators. ASCII chars that have Emoji=Yes but are not in the Reserved Emoji Set (#, *, digits) gain no sigil-atom status from that property; * is an atom only because it is in the ASCII set. ©, ®, and ™ are sigil atoms because Unicode classifies them as Extended_Pictographic=Yes.)

AST shape

The decorators sidecar on Document_t1 is a list of entries, one per decorated value-tree node, in source order:

decorators:
  -
    path: [0]
    "|":
      -
        family: "tag"
        fn:     "html"
        params: [{ lang: "en" }]
        params_dec: []
        position: "leading"
    comments:
      -
        kind: "leading"
        text: "page root"

Shape: List< { path, <kind>: List<Record>, ... } >.

The path field identifies which value-tree node the entry attaches to. Path is a list of typed segments:

Examples: [] is the root; ["body"] is the value at map key body; ["body", "children", 0] is the first element of the list at body.children.

Each entry beyond path is keyed by decoration kind. Each value is an array of decoration records of that kind, in source order. Per-kind shapes:

A given path must appear at most once in the list. All decoration kinds attached to the same node share one entry. The list-of-entries form is canonical because: (1) it preserves source order without depending on map iteration semantics, and (2) it lets path segments stay typed values rather than baking an escape grammar for arbitrary unicode map keys into a path string.
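A consumer that wants random access can build an index over the list while enforcing the at-most-once rule. A minimal sketch, assuming entries are plain dicts with a "path" list (index_by_path is a hypothetical helper name):

```python
def index_by_path(decorators):
    """Map each entry's path (as a tuple) to the entry itself.

    Tuples keep segments typed: the key ("body", 0) is distinct
    from ("body", "0"), so no escape grammar is needed.
    """
    index = {}
    for entry in decorators:
        key = tuple(entry["path"])
        if key in index:
            raise ValueError(f"path {entry['path']!r} appears more than once")
        index[key] = entry
    return index


sidecar = [
    {"path": [0], "|": [{"family": "tag", "fn": "html"}]},
    {"path": ["body", "children", 0], "comments": [{"kind": "leading", "text": "x"}]},
]
index = index_by_path(sidecar)
print(index[(0,)]["|"][0]["fn"])  # html
```

The list stays the canonical form; the index is a derived view and preserves source order only because Python dicts keep insertion order.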

Identity vs param data: separate AST slots

The decoration record's identity / machinery fields (family, fn, params, params_dec, position) live at the top level of the record. The user's param data sits one level below, inside the params slot. Function identity lives only in fn; no other slot carries it.

This means user-written param keys — name, id, class, even fn itself — are plain data with no AST-machinery meaning. There is no collision between the AST's identity slots and the user's param keys, regardless of what the user writes:

|form(name: "x", fn: "y")

decodes to:

{ "family": "tag", "fn": "form",
  "params": [{ "name": "x", "fn": "y" }] }

The AST record's fn holds "form" (the function name from source). params[0].fn holds "y" (user data). They never overwrite each other.

The only reserved param key per family is the one declared as content_slot — that key triggers hoisting and never appears in params after decode (its value moves to the value tree). Everything else is plain user data.

Lite-mode behavior

Lite mode drops the decorators sidecar wholesale, the same way tier-0 lite mode drops the comments sidecar.

For documents whose meaning depends on decorator content (markup docs, anything element-shaped), this is semantically lossy in a way tier-0 lite mode is not. Comments are advisory; tier-1 decorators are load-bearing. Tier 1 therefore states:

Comments are advisory; consumers may discard them without loss of document meaning. Tier-1 decorators are structural; consumers that discard them must not claim to preserve document semantics.

Decoration attachment

Tier-1 decoration mirrors tier-0 comment attachment exactly. There are four positions; they share source-location rules with tier-0 comments (SPEC.md §"Comments → attachment").

Position  Source location                                                             Attaches at          Stacks?
Leading   Own line(s) immediately before a kvpair / list-item, no blank line between  The following node   Yes
Inner     Between key: (or +) and the value, same line                                The node's value     Yes
Trailing  After the value, same line, before newline                                  The node's value     Yes
Floating  Own line(s), blank-line-separated, or after the last child of a block       The enclosing block  Yes

Path-keying: leading, inner, and trailing all attach at the same path (the node's value path). Floating attaches one level up (the parent container's path). The position field on each decoration record (analogous to comments' kind field) records which of the four was used, for round-trip fidelity.

Decorator call syntax

decorator_call = sigil name [ "." name ] { "(" [ flow_kvs ] ")" }
sigil          = <one of the bound sigils for this file>
name           = ident                                  (* tier-0 ident rules *)
flow_kvs       = flow_kv { "," flow_kv }                (* tier-0 flow_kv *)

Examples by position

# leading — own line(s) before a node
|cached_for("1h")
|requires_auth
endpoint: "/api/users"

# inner — between header and value, same line
endpoint: |cached_for("1h") "/api/users"
endpoint: |required                        # inner-only — value is dialect default
+ |row(class: "header") name: "Alice"      # inner on list-item kvpair-continuation form

# trailing — after the value, same line
port: 5432 |validates_range(1024, 65535)
+ "first" |emphasis

# floating — blank-line-separated, attaches to enclosing block
servers:
  + name: "web1"
  + name: "web2"

  |section_status("paused")

Stacking and interleaving with comments

Each position accepts an arbitrary number of decorations and comments, in source order. Decorations and comments may interleave freely:

a: "A" /* before |dec1 */ |dec1() |dec2() # this is a trailing line comment

Decoded as four trailing entries on a, in this order:

  1. trailing block comment "before |dec1"
  2. trailing decorator |dec1()
  3. trailing decorator |dec2()
  4. trailing line comment "this is a trailing line comment"

Rule (inherited from tier-0 comments): any # or // line comment must come last — line comments consume to end-of-line, so anything after them isn't part of the same slot. Decorators and /* … */ block comments don't consume EOL and can appear in any order before the line comment.

This rule applies wherever line comments are syntactically possible (trailing, leading, floating). Inner position is between header and value on the same line; tier-0 already forbids # / // there (they'd consume the value).

Decoration sites: every value-position

The grammar additions, by surrounding production:

decoration         = decorator_call | line_comment | block_comment

inner_run          = decoration { whitespace decoration }
trailing_run       = decoration { whitespace decoration }
leading_block      = ( decoration NEWLINE )+    (* no blank line before next node *)

decorated_value_t1 = inner_run? base_value? trailing_run?
                   ;  (* at least one of inner_run or base_value must be present *)

scalar_root_t1     = leading_block? decorated_value_t1
kvpair_t1          = leading_block? key ":" decorated_value_t1
list_item_t1       = leading_block? "+"     decorated_value_t1
flow_array_t1      = "[" [ decorated_inline_t1 { "," decorated_inline_t1 } ] "]"
flow_table_t1      = "{" [ flow_kv_t1         { "," flow_kv_t1          } ] "}"
flow_kv_t1         = key ":" decorated_inline_t1

decorated_inline_t1 = inner_run? inline_value trailing_run?
                    ;  (* flow forms have no leading/floating positions *)

base_value covers whatever the surrounding production already admits (e.g., child_block is available after + and key:, not inside flow forms — same as tier 0).

Floating decoration follows tier-0's floating-comment rules and attaches at the enclosing container's path, identically to how floating comments attach.

Decoration-only (no base_value)

When inner_run is present but base_value is absent, the value at that path is the dialect-specified empty default for the family of the first inner decorator on that line:

+ |meta(charset: "UTF-8")              # value resolves to html.tag's empty default ({})
+ |link(rel: "stylesheet", href: "x")
key: |required                         # value resolves to required's family empty default

Each dialect publishes a per-family empty default as part of its registration contract (typical defaults: empty table {} for record-shaped families like tag, empty list [] for collection-shaped families).

If no inner decoration is present and no base_value is written, that's a tier-0 parse error as today (e.g., bare + with no continuation). Trailing decoration without a base_value is syntactically impossible — there's no value-position for it to sit after.
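The decoration-only rule reduces to a small lookup. A minimal sketch, assuming the decoder has already collected the families of the inner decorators in source order and holds a family → empty-default table from dialect registration (both shapes, and the name resolve_value, are assumptions of this example):

```python
import copy

_ABSENT = object()  # marks "no base_value was written"


def resolve_value(inner_families, empty_defaults, base_value=_ABSENT):
    """Pick the value for a line that may carry only inner decoration."""
    if base_value is not _ABSENT:
        return base_value
    if not inner_families:
        # No decoration and no value: the tier-0 parse error stays as is.
        raise ValueError("no inner decoration and no base_value")
    # The first inner decorator's family decides the default.
    # Copy so callers never share one mutable default object.
    return copy.deepcopy(empty_defaults[inner_families[0]])


defaults = {"tag": {}, "items": []}
print(resolve_value(["tag"], defaults))           # {}
print(resolve_value(["items", "tag"], defaults))  # []
```

Copying the default is a choice made here so that two decoration-only nodes never alias the same table or list.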

Indent-block role: unchanged from tier 0

Because tier 1 only adds decoration positions around values, the question "what does an indent block under this line mean?" is answered exactly by tier-0 productions, applied to whatever base_value the line carries (or the dialect-empty default if inner-only):

Line shape                          Indent-block role
key: <inner?> <inline> <trailing?>  (no indent allowed; leaf)
key: <inner?>                       child_block is the value of key
+ <inner?> <inline> <trailing?>     (no indent allowed; leaf)
+ <inner?> key: <inline>            sibling kvpairs of the same record
+ <inner?>                          child_block is the list-item's value

Tier 1 does not add a new indent-block opener. The tier-0 discriminator stays binary: a line either opens a block (no inline value present) or it doesn't (inline value present). Decoration sits orthogonally and never affects which case applies.

Content hoisting

Children of a tier-1 element can be written in two equivalent forms — block (indent-block) or flow (a content-slot param):

# Block form
+ |p(class: "lede")
  + "Click "
  + |a(href: "/x") "here"
  + " to read."

# Flow form (semantically equivalent)
+ |p(class: "lede", children: ["Click ", |a(href: "/x") "here", " to read."])

Both decode to the same value tree. The decode pipeline hoists the content-slot param into the value-tree position the decoration is attached to, so consumers find children at one place — the value tree — regardless of which source form was used.

Per-family content-slot declaration

The content-slot param name is not hardcoded. Each family in the dialect spec optionally declares its content-slot name as part of its registration contract:

When a family has no declared content slot, no param name triggers hoisting — content (or any other key) is just an ordinary param.

Hoist pass

Tier-1 decode runs the hoist pass after the body parse:

  1. Parse front matter, resolve dialect imports and bound sigils.
  2. Parse body → raw AST where decorator params are intact maps, no value-tree promotion yet.
  3. Hoist pass. For each decoration record:
     - Look up its family in the dialect.
     - If the family declares a content slot and the param map contains that key, move the slot's value out of params[N][slot] and into the value-tree position the decoration attaches to.
     - The remaining keys in params[N] stay as decoration.
  4. Apply other tier-1 normalizations (decoration-only → dialect-empty default, params_dec for nested decoration, etc.).

Hoisting is tier-1-only. Tier-0 docs have no decorators, so the pass is a no-op on them.
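The hoist step for a single record can be sketched as below. This is an illustration only: records and param groups are plain dicts and lists, content_slots is an assumed family → slot-name table, only named groups are examined, and the function name hoist is hypothetical.

```python
import copy


def hoist(record, content_slots):
    """Return (record_without_slot, hoisted_values) for one decoration record.

    `hoisted_values` is empty when nothing was hoisted, so a hoisted
    null stays distinguishable from "no content slot present".
    """
    rec = copy.deepcopy(record)  # leave the raw AST untouched
    slot = content_slots.get(rec["family"])
    hoisted = []
    if slot is not None:
        for group in rec["params"]:
            # Positional groups are lists and are skipped in this sketch.
            if isinstance(group, dict) and slot in group:
                hoisted.append(group.pop(slot))
    return rec, hoisted


raw = {
    "family": "tag",
    "fn": "p",
    "params": [{"class": "lede", "children": ["Click ", " to read."]}],
}
rec, content = hoist(raw, {"tag": "children"})
print(rec["params"])  # [{'class': 'lede'}]
print(content)        # [['Click ', ' to read.']]
```

The caller would then place the hoisted value at the value-tree position the decoration attaches to, or raise the both-forms conflict if an indent block is also present.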

Conflict: both forms present

Specifying content via both the declared content_slot param (children: for HTML, whatever the dialect declared for other families) and an indent block on the same line is a parse error:

+ |p(children: ["a"])     # ← parse error
  + "b"

Decoder error at line N:
  Element |p has content specified via both 'children:' parameter
  and indent block. Pick one.

No magic merging or override semantics. Pick one form per node.

Encoder canonical form

The decoder collapses both source forms to the same value tree; the encoder must choose which form to re-emit. Heuristic, deterministic from content shape:

This rule rewrites source — |p(children: ["one"]) decodes and re-encodes as block form. That's the same kind of canonicalizer behavior tier-0 encoders already do for things like quote style. A future revision can add per-node form preservation (sidecar original_form marker) if real users find rewriting jarring.

Dialect specification contract

A dialect publishes a structured specification that the decoder loads at registration time. The spec is the cross-port source of truth — each port translates it into its native registration format, but the contract (what the decoder validates, what canonical names exist, etc.) is identical across ports.

The spec contains four kinds of declarations: families, param signatures (per family), named structs, and the dialect's version-match rule (covered in "Import shape").

Families

Each family the dialect publishes:

Naming guidance: pick a slot name that does not conflict with any valid attribute name for the family. HTML's tag family uses "children"; "content" was the obvious choice but collides with <meta name="..." content="...">, where content is a literal HTML attribute. "children" has no collision in standard HTML. The hoist mechanism owns one canonical slot name per family; every other key flows through params unchanged. If a dialect can't find a non-colliding name, fall back to a prefixed form ("_children", "__body__"), which is verbose but unambiguous.

- params (optional): param signature for this family (see below).

Param signatures

Three modes per family:

Per-family params block structure:

params:
  mode: "wildcard_with_typed"
  typed:
    class:    { type: "string" }
    hidden:   { type: "boolean", default: false }
    children: { type: "list_of any" }
  required: ["id"]

Validation is family-level only. Per-function tightening (e.g. HTML's <input> requires type, <span> does not) lives in the dialect's runtime / render layer, not in the DMS decoder. This keeps the decoder's job small and the spec testable across ports.

Param values themselves can be any tier-0 inline_value shape — scalar, flow_array, or flow_table — or a decorated value (|inner(...) nested in another decorator's params, resolved through params_dec). Validation applies to the hoisted + nested-resolved value, after the decoder has finished normalizing the AST.

Type vocabulary

Type          Matches
string        tier-0 string
integer       tier-0 integer
float         tier-0 float
boolean       tier-0 boolean
datetime      tier-0 datetime
list_of <T>   flow_array (or hoisted block list) where every element matches <T>
map_of <T>    flow_table (or block table) where every value matches <T>
any           any value-tree shape
<StructName>  a map matching the named struct (see below)

Named structs

Dialects may declare reusable struct types referenced by name in typed signatures:

structs:
  Address:
    street: { type: "string", required: true }
    city:   { type: "string", required: true }
    zip:    { type: "string" }

  ContactInfo:
    email: { type: "string" }
    home:  { type: "Address" }
    work:  { type: "Address" }

families:
  + name: "user_card"
    params:
      mode: "wildcard_with_typed"
      typed:
        contact:   { type: "ContactInfo" }
        addresses: { type: "list_of Address" }

Each struct field has the same shape as a typed entry — type, optional required, optional default. Structs may reference other structs and built-in types. Cycles are a registration-time error (the dialect's spec fails to load if a struct references itself directly or transitively).
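The registration-time cycle check is a depth-first search over struct references. A minimal sketch, assuming the structs block has been loaded into a dict of dicts and that a type string's last word names the referenced struct ("list_of Address" → "Address"). The function names are hypothetical, and names that are not declared structs (built-ins, typos) are simply ignored here.

```python
def struct_refs(fields, structs):
    """List the declared structs referenced by one struct's fields."""
    refs = []
    for spec in fields.values():
        base = spec["type"].split()[-1]  # strips a list_of / map_of prefix
        if base in structs:
            refs.append(base)
    return refs


def find_struct_cycle(structs):
    """Return one reference cycle as a list of names, or None."""
    state = {}  # name -> "visiting" | "done"

    def visit(name, trail):
        if state.get(name) == "done":
            return None
        if state.get(name) == "visiting":
            # Back edge: the cycle is the trail from the first occurrence.
            return trail[trail.index(name):] + [name]
        state[name] = "visiting"
        for ref in struct_refs(structs[name], structs):
            cycle = visit(ref, trail + [name])
            if cycle:
                return cycle
        state[name] = "done"
        return None

    for name in structs:
        cycle = visit(name, [])
        if cycle:
            return cycle
    return None


structs = {
    "Address": {"street": {"type": "string"}, "city": {"type": "string"}},
    "ContactInfo": {"email": {"type": "string"}, "home": {"type": "Address"}},
}
print(find_struct_cycle(structs))  # None
```

A loader would refuse to register the dialect whenever the result is not None, reporting the returned chain of names in the error.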

Struct names live in the dialect's namespace; cross-dialect struct references are not supported in this revision. If a file imports two dialects that both define Address, each dialect's families resolve their own Address and there is no shared definition.

Decoder validation behavior

When a decorator call is decoded, after Resolution rules and content hoisting the decoder applies the family's signature:

  1. If mode == "wildcard", skip validation.
  2. If mode == "strict", every key in the (post-hoist) param group must appear in typed. Unknown keys are parse errors.
  3. If mode == "wildcard_with_typed", declared typed keys are checked when present; unknown keys pass through unchecked.
  4. For each declared key, run type-match:
     - Built-in types match by tier-0 value-tree kind.
     - Struct types recursively validate the value as a map against the struct's field signatures.
  5. required keys must be present after hoisting + defaults. Missing required is a parse error.
  6. Defaults fill in absent keys before the AST is finalized (i.e., the decorator record's params shows the defaulted value).
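The six steps above can be sketched for a single named param group. This is a simplified illustration, not the decoder's actual code: struct types and the positional mode are left out, values are plain Python objects, and validate, matches, and ParamError are hypothetical names.

```python
import datetime


class ParamError(Exception):
    """Stands in for a decode-time validation error."""


def matches(type_name, value):
    """Built-in type-match only; struct types are not handled here."""
    if type_name == "any":
        return True
    if type_name.startswith("list_of "):
        inner = type_name[len("list_of "):]
        return isinstance(value, list) and all(matches(inner, v) for v in value)
    if type_name.startswith("map_of "):
        inner = type_name[len("map_of "):]
        return isinstance(value, dict) and all(matches(inner, v) for v in value.values())
    if type_name == "boolean":
        return isinstance(value, bool)
    if type_name == "integer":  # bool is an int subclass in Python
        return isinstance(value, int) and not isinstance(value, bool)
    kinds = {"string": str, "float": float, "datetime": datetime.datetime}
    if type_name in kinds:
        return isinstance(value, kinds[type_name])
    raise ParamError(f"type '{type_name}' is not supported in this sketch")


def validate(params, signature):
    """Return the finalized param map, with defaults filled in."""
    mode = signature["mode"]
    if mode == "wildcard":
        return dict(params)
    typed = signature.get("typed", {})
    if mode == "strict":
        unknown = sorted(k for k in params if k not in typed)
        if unknown:
            raise ParamError(f"unknown params: {unknown}")
    result = dict(params)
    for key, spec in typed.items():
        if key in result:
            if not matches(spec["type"], result[key]):
                raise ParamError(f"Param '{key}' does not match {spec['type']}.")
        elif "default" in spec:
            result[key] = spec["default"]
    for key in signature.get("required", []):
        if key not in result:
            raise ParamError(f"missing required param '{key}'")
    return result
```

With the wildcard_with_typed signature shown earlier, validate({"id": "x", "class": "lede"}, sig) returns the same map plus hidden: False, while class: 5 raises.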

Validation errors fire at decode time with path context:

Decoder error at line N, decorator |tag(...) at path [0, 1]:
  Param 'class' has type integer but signature requires string.

(Path is rendered with the canonical typed-segment form — strings for map keys, integers for list indices, displayed as a list.)

Positional params

A param group is either flow-table-shaped (all named) or flow-array-shaped (all positional). Single calls separate the two modes by group:

|score(95, 5)                      # one positional group
|tag(class: "lede")                # one named group
|score(95, 5)(commented: true)     # positional group, then named group
|some(5)                           # variant payload — positional
|emphasis "text"                   # base_value form (existing) still works

A param group cannot mix positional and named at the same level. The mode of a group is detected from its first token and locked in for the whole group — mixing within one group is a parse error. To mix modes, use multiple param groups.

Rationale for separation-by-group rather than Python-style "positional-then-named within one group":

Lexer rule

At the open (, the decoder peeks at the first non-whitespace token:

First token after (         Group is    Parse as
key: (ident followed by :)  named       flow_kvs
) (immediate close)         empty       [{}] (back-compat)
anything else               positional  flow_array_elems

The "anything else" includes: scalars, flow forms, decorator calls, base_value-like inline values. A positional group is exactly a flow_array body without the brackets.

A key: token appearing after a positional element in the same group, or any non-key: token after a named element, is a parse error: "Cannot mix positional and named params in one group; use a separate (...) group."
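The first-token peek can be sketched as a small classifier over the text that follows the open paren. The identifier pattern below is a stand-in (the real tier-0 ident rules live in SPEC.md and are not reproduced here), and group_mode is a hypothetical name.

```python
import re

# Assumed approximation of "ident followed by :"; not the tier-0 grammar.
IDENT_COLON = re.compile(r"[A-Za-z_][A-Za-z0-9_]*:")


def group_mode(after_paren):
    """Classify a param group from the text right after its '('."""
    rest = after_paren.lstrip()  # peek at the first non-whitespace token
    if rest.startswith(")"):
        return "empty"
    if IDENT_COLON.match(rest):
        return "named"
    return "positional"


print(group_mode('class: "lede")'))  # named
print(group_mode("95, 5)"))          # positional
print(group_mode(")"))               # empty
```

Once the mode is fixed, the rest of the group is parsed under that mode, and a token of the other kind triggers the "Cannot mix positional and named params" error.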

AST shape

The params field on a decoration record becomes a list of either Map (named group) or List (positional group):

# |tag(class: "x") decodes to:
params: [{ class: "x" }]

# |score(95, 5) decodes to:
params: [[95, 5]]

# |score(95, 5)(commented: true) decodes to:
params: [[95, 5], { commented: true }]

# |tag and |tag() both decode to:
params: [{}]

In languages with sum types: params: List<Map<String,Value> | List<Value>>. In dynamically-typed languages: detect kind at runtime.

Per-family signature

Families that accept positional params declare a positional block alongside the existing typed block. The mode enum gains a fourth value:

Mode                 Positional groups  Named groups  Strict checking on names
wildcard             rejected           accepted      none
wildcard_with_typed  rejected           accepted      declared typed keys
strict               rejected           accepted      only declared typed keys
positional           accepted           accepted      positional slots typed; named keys per typed

Spec example:

families:
  + name: "variant"
    default_sigils: ["|"]
    empty_default: {}
    content_slot: "value"
    params:
      mode: "positional"
      positional:
        - { name: "value", type: "any" }
      typed: {}                     # no named keys defined for this family

Each positional slot has:

- name: used for AST round-trip identity and error messages
- type: from the standard type vocabulary
- required (optional, default true): slot must be present
- default (optional): fills in absent slot before AST is finalized
- variadic (optional, default false): see below

positional is an ordered list of slots. Element 0 of the positional group fills slot 0, element 1 fills slot 1, etc.

Variadic positional slot

A family that accepts arbitrary-arity positional calls (|node(a, b, c, d, e)) declares its last slot as variadic: true. Each surplus positional element collects into the variadic slot's list.

# A family that takes one required string label and any number
# of additional values:
positional:
  - { name: "label", type: "string", required: true }
  - { name: "args",  type: "any", variadic: true }

# A family that takes only variadic args (KDL-shaped):
positional:
  - { name: "args", type: "any", variadic: true }

Rules.

  1. Only the last slot may be variadic. A variadic slot followed by another slot is a registration-time error ("variadic slot 'X' must be the last positional slot in family <f>"). Forbidding mid-list variadic keeps slot assignment a single left-to-right scan with no end-counting.
  2. At most one variadic slot per family. Falls out of (1).
  3. Variadic slots are implicitly optional: required and default are not used on variadic slots; zero matching elements is valid and produces [].
  4. Element-level typing. The slot's type describes the type of each element. The slot's collected value is implicitly a list of those elements. To accept any element type (KDL's case), set type: "any". To accept only integers (|sum(1, 2, 3, 4)), set type: "integer".
  5. Surplus elements never error when variadic is present — they always have a slot to land in. Without a variadic slot, surplus elements remain a parse error per existing rules.

Validation pass — slot assignment.

For a positional group with K elements and a signature with N slots where slot N-1 is variadic:

Signature                               Call                 Validates as
[label: string!, args: any (variadic)]  |node("x")           label: "x", args: []
[label: string!, args: any (variadic)]  |node("x", 1, 2, 3)  label: "x", args: [1, 2, 3]
[label: string!, args: any (variadic)]  |node                parse error: label required
[args: any (variadic)]                  |node                args: [] (no positional group)
[args: any (variadic)]                  |node("a", "b")      args: ["a", "b"]
[args: integer (variadic)]              |node(1, 2, 3)       args: [1, 2, 3]
[args: integer (variadic)]              |node(1, "two", 3)   parse error: element 1 type mismatch
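
The slot-assignment rows above can be reproduced with a small validation pass. The following is a minimal Python sketch, not a normative algorithm: the function name assign_slots, the TYPE_CHECKS table (a three-entry stand-in for the standard type vocabulary), and the error wording are all assumptions made for illustration. It also checks the "variadic must be last" rule inline, whereas the spec places that check at registration time.

```python
from typing import Any

# Hypothetical checkers for a small subset of the type vocabulary.
TYPE_CHECKS = {
    "any": lambda v: True,
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
}


def assign_slots(slots: list[dict], elements: list[Any]) -> dict:
    """Map a flat positional group onto named slots in one left-to-right scan."""
    # Rule 1: only the last slot may be variadic (a real port would reject
    # this at registration time, not per call).
    for i, slot in enumerate(slots):
        if slot.get("variadic") and i != len(slots) - 1:
            raise ValueError(
                f"variadic slot '{slot['name']}' must be the last positional slot"
            )

    out: dict = {}
    for i, slot in enumerate(slots):
        check = TYPE_CHECKS[slot["type"]]
        if slot.get("variadic"):
            # Every remaining element lands here; zero elements yields [].
            rest = elements[i:]
            for j, value in enumerate(rest, start=i):
                if not check(value):
                    raise ValueError(f"element {j} type mismatch")
            out[slot["name"]] = list(rest)
            return out
        if i < len(elements):
            if not check(elements[i]):
                raise ValueError(f"element {i} type mismatch")
            out[slot["name"]] = elements[i]
        elif slot.get("required", True):
            raise ValueError(f"{slot['name']} required")
        else:
            out[slot["name"]] = slot.get("default")

    # No variadic slot was reached, so surplus elements are an error.
    if len(elements) > len(slots):
        raise ValueError("surplus positional elements")
    return out


signature = [
    {"name": "label", "type": "string", "required": True},
    {"name": "args", "type": "any", "variadic": True},
]
structured = assign_slots(signature, ["x", 1, 2, 3])
```

Note that the input stays the flat list the decoder produced; the structured view is computed on top of it and never written back into the AST.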

AST shape — unchanged.

Variadic does not change the AST. The positional group stays a flat List<Value> in params[N]:

# |node("x", 1, 2, 3) decodes to:
params: [["x", 1, 2, 3]]

The dialect's positional signature is metadata for validation and structured access, not an AST transform. Tools that want the structured { label: "x", args: [1, 2, 3] } view apply the signature on top of the raw list; tools that don't — generic walkers, sidecar inspectors, lite-mode consumers — get the same flat List<Value> regardless of whether variadic is declared.

Decoder cost.

Zero new lex/parse work. The positional-group lexer still produces a flat List<Value> regardless of the family's slot declarations. Variadic is a validation-pass rule applied after parsing — the same pass that already iterates the list to type-check non-variadic slots.

The cost the spec adds:

No lexer state change, no new tokens, no new AST shape, no new streaming yield rule. Streaming behavior is identical to the existing positional-group rule (decorator-call parens are yield-suspending; yield is deferred until the close paren regardless of how many elements appear inside).

Encoder.

Encoder emits the positional group as a flat comma-separated list. There is no marker for the variadic boundary; the boundary is implicit. With N slots, up to the first N-1 elements fill the non-variadic slots, and every element from index N-1 onward belongs to the variadic slot.

|node("x", 1, 2, 3) round-trips as |node("x", 1, 2, 3), both with and without a variadic-aware encoder.

Decoder validation behavior (extended)

For each param group:

  1. If group is positional and family mode == "positional": validate group elements against the family's positional slots in order. Type-check each element against its slot's type. Apply defaults for absent trailing slots if not required.
  2. If group is positional and family mode != "positional": parse error — "Family '<f>' does not accept positional params; use named keys."
  3. If group is named: validate per the existing rules (wildcard / wildcard_with_typed / strict).
  4. If family mode == "positional": named groups still validate against typed exactly as wildcard_with_typed would. Mixed-group calls (one positional, one named) are normal.

Validation errors carry slot identity for positional groups:

Decoder error at line N, decorator |score(...) at path [0, 1]:
  Positional slot 1 ('y') has type string but signature
  requires integer.

Encoder canonical form

The encoder emits each group in its decoded shape:

Group order is preserved from decode. No re-ordering, no merging across groups, no automatic conversion (positional elements are never re-emitted as named, even when slot names exist).

Multi-line vs single-line group emission

Decorator-call parens are a flow-form region (per "Streaming / incremental decode" above), so they inherit SPEC.md's canonical multi-line layout for flow forms — close-bracket anchors the indent, members one level deeper, trailing comma on the last member. Tier 1 adds two specifics:

  1. Multi-line emission is not optional infrastructure for tier-1 ports. Decorator-call parens have no block-form alternative (unlike tier-0 lists / tables, which canonicalize to block form when non-empty). Block-shaped dialects routinely have groups with many keys; a single-line-only encoder produces unreadable output. Every tier-1-capable port MUST support multi-line emission for both named and positional groups.
  2. Mixing single-line and multi-line groups in one call is permitted. If the first group fits on one line and the second doesn't, emit single-line then multi-line:

       |resource("aws_instance", "web")(
         count: 3,
         ami: "ami-...",
         instance_type: "t2.micro",
       )

     (Where ("aws_instance", "web") is single-line and the named group is multi-line, both anchored on the call's line.)

The break threshold (when to choose multi-line) is the same as SPEC.md's flow-form rule: single-line render exceeding the port's line-width threshold, OR the group containing a value that itself renders multi-line (nested decorator call, multi-line flow form, heredoc).
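
A minimal Python sketch of that choice, under stated assumptions: emit_call and emit_group are hypothetical names, scalars are rendered with json.dumps as a stand-in for real DMS scalar syntax, a positional group is modeled as a list and a named group as a dict, and the line-width threshold is a plain character count.

```python
import json


def emit_group(group, indent: int, column: int, width: int) -> str:
    """Render one param group; a list is positional, a dict is named."""
    if isinstance(group, dict):
        members = [f"{key}: {json.dumps(value)}" for key, value in group.items()]
    else:
        members = [json.dumps(value) for value in group]
    single = "(" + ", ".join(members) + ")"
    # Break when the single-line render would pass the width threshold,
    # or when any member already renders across several lines.
    needs_break = column + len(single) > width or any("\n" in m for m in members)
    if not needs_break:
        return single
    pad = " " * (indent + 2)
    # Members sit one level deeper than the close paren; trailing comma
    # on every member, including the last.
    body = "".join(f"{pad}{member},\n" for member in members)
    return "(\n" + body + " " * indent + ")"


def emit_call(fn: str, groups, indent: int = 0, width: int = 80) -> str:
    """Emit |fn followed by each group, preserving group order."""
    out = " " * indent + "|" + fn
    for group in groups:
        column = len(out) - (out.rfind("\n") + 1)  # length of the current line
        out += emit_group(group, indent, column, width)
    return out


rendered = emit_call(
    "resource",
    [["aws_instance", "web"],
     {"count": 3, "ami": "ami-...", "instance_type": "t2.micro"}],
    width=40,
)
```

With width=40 the positional group stays on the call's line and the named group breaks, which is the mixed layout permitted by rule 2. A group that fits, such as emit_call("node", [["x", 1, 2, 3]]), is emitted as a flat comma-separated list.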

Decoding accepts both forms unconditionally — decorator-call parens are yield-suspending, so line breaks inside (...) are invisible to the parse.

Hoisting interaction

content_slot hoisting is a named-key mechanism. A positional group does not trigger hoisting, regardless of whether positional slot 0's name matches content_slot.

Inline base_value continues to hoist:

Three forms can produce the same value tree if value is the content_slot AND positional slot 0's name is value. The dialect MUST document its canonical encode form (typically inline base_value when possible, else named, else positional) so round-trips are stable.

Dialect versioning

Dialect versions are semver. All dialects must publish their versions as MAJOR.MINOR.PATCH strings, with optional pre-release (-rc.1, -alpha.2) and build-metadata (+build.7) suffixes. This is a hard requirement — no other versioning schemes are supported.

The dialect declares one match strategy in its canonical spec, drawn from this fixed enum:

Strategy  Behavior
exact     Installed version equals requested version exactly.
caret     Same major, installed ≥ requested. (npm ^x.y.z semantics.) For 0.x.y requests, behaves as tilde: pre-1.0 minor bumps are breaking, per semver convention.
tilde     Same major.minor, installed patch ≥ requested patch.
gte       Installed version ≥ requested version.
any       Any installed version matches.

Default if undeclared: caret. Standard practice; friendliest evolution path.

The match algorithms are normative. Every port implements all five strategies identically — no per-port semantics drift.
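
As an illustration of the five strategies, here is a minimal Python sketch. It is not the normative algorithm: the names parse_core and matches are hypothetical, and the sketch deliberately refuses versions with pre-release or build-metadata suffixes rather than guess at rules not restated here.

```python
STRATEGIES = ("exact", "caret", "tilde", "gte", "any")


def parse_core(version: str) -> tuple[int, int, int]:
    """Parse a plain MAJOR.MINOR.PATCH string into an integer triple."""
    if "-" in version or "+" in version:
        # Assumption: suffix handling is out of scope for this sketch.
        raise NotImplementedError("pre-release / build metadata not handled here")
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def matches(strategy: str, installed: str, requested: str) -> bool:
    """Return True if the installed version satisfies the request."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown version_strategy: {strategy!r}")
    inst, req = parse_core(installed), parse_core(requested)
    if strategy == "caret" and req[0] == 0:
        strategy = "tilde"  # 0.x.y requests behave as tilde
    if strategy == "exact":
        return inst == req
    if strategy == "caret":
        return inst[0] == req[0] and inst >= req
    if strategy == "tilde":
        return inst[:2] == req[:2] and inst[2] >= req[2]
    if strategy == "gte":
        return inst >= req
    return True  # "any"


installed_versions = ["1.0.0", "1.2.0", "1.4.9"]
satisfying = [v for v in installed_versions if matches("caret", v, "1.5.0")]
```

Tuple comparison gives the usual numeric ordering on (major, minor, patch), so string pitfalls such as "1.10.0" sorting before "1.9.0" do not arise. For the installed set above and a caret request of 1.5.0, no version satisfies.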

Pre-release and build-metadata rules:

Where it lives in the dialect spec:

# Dialect canonical spec
name: "html"
version: "1.0.0"
version_strategy: "caret"        # optional; defaults to "caret"

structs: ...
families: ...

File-side syntax: the file writes a plain semver string (version: "1.0.0"); the dialect's strategy is applied. Range specifiers in the file (npm-style ^1.0.0, ~1.0.0) are not supported in this revision and would be a parse error if written. Range-specifier syntax is parked as a future enhancement.

Failure mode at decode:

Decoder error in front matter: dialect 'html' v1.5.0 requested
with strategy 'caret', but installed versions [1.0.0, 1.2.0,
1.4.9] do not satisfy. Install ≥1.5.0 of html.

Registration-time validation: if a dialect spec declares a version_strategy outside the five-value enum, the port refuses to register the dialect and surfaces an error.

Branding & file naming

A tier-1 document that imports any dialect is no longer a plain DMS document — it's a DMS dialect document. Naming conventions:

Open: dialect registry governance. Who allocates short brand names (html, markup, config)? Punted to a future registry. For now: an allocations document in the SPEC repo with PR-based additions; an x- prefix for unofficial / experimental dialects (x-mybrand); reverse-DNS namespacing available for anything else (io.flolabs.html).

Decoder / encoder split

Tier 1 introduces enough new lex / parse / sidecar machinery that mixing it into the tier-0 entry point would (a) bloat tier-0-only ports with code they don't need, and (b) muddle conformance — "does this port handle tier 1?" should be a yes/no per port, answered by which functions it ships.

Four functions, paired by tier

decode_t0(source, opts?) → Document_t0      # tier-0 only; rejects tier-1
encode_t0(doc: Document_t0, opts?) → str    # tier-0 only; rejects decorations

decode_t1(source, opts?) → Document_t1      # accepts both tiers
encode_t1(doc: Document_t1, opts?) → str    # accepts both tiers

The opts shape is per-port idiom (kwargs, options struct, builder, etc.) and carries:

Tier detection

Tier is not declared by tier-0 documents. The decoder reads front matter and:

A bare tier-0 document needs no declaration; the _dms_tier field is the opt-in marker for tier ≥ 1.

Document types

Document_t0 = { value_tree, comments }
Document_t1 = Document_t0 + { decorators }       # strict superset

Languages with subtyping (Python, TS): Document_t1 extends Document_t0. Languages without (Rust, Go): explicit field — a Document_t0 is convertible to a Document_t1 with empty decorators.

A decode_t1 always produces a Document_t1. If the source was tier-0 (no _dms_tier: 1), the result is a Document_t1 with an empty decorators list — structurally indistinguishable from a tier-0 doc round-tripping through tier-1 machinery.
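
In a language with subtyping, the two document types and the tier-0 to tier-1 conversion could look like the Python sketch below. The class names DocumentT0 / DocumentT1 and the helper promote are hypothetical; only the field names (value_tree, comments, decorators) come from this spec.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DocumentT0:
    value_tree: Any
    comments: list = field(default_factory=list)


@dataclass
class DocumentT1(DocumentT0):
    # Strict superset: everything in DocumentT0 plus the decorators sidecar.
    decorators: list = field(default_factory=list)


def promote(doc: DocumentT0) -> DocumentT1:
    """Convert a tier-0 document to tier 1 with an empty decorators list."""
    return DocumentT1(value_tree=doc.value_tree, comments=doc.comments, decorators=[])


plain = DocumentT0(value_tree={"server": {"port": 8080}})
promoted = promote(plain)
```

Because DocumentT1 extends DocumentT0, code written against the tier-0 type keeps working on a promoted document; ports in languages without subtyping would carry decorators as an explicit field instead.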

Behavior at the boundary

Lite vs full is orthogonal to tier

decode_t0(source, mode='lite')  → value tree only
decode_t0(source, mode='full')  → value tree + comments
decode_t1(source, mode='lite')  → value tree only          (lossy on tier-1 — see warning)
decode_t1(source, mode='full')  → value tree + comments + decorators

Lite mode on tier-1 docs is semantically lossy (per "Lite-mode behavior" earlier). Tier-1-capable ports must surface this in their docs; consumers who lite-decode a tier-1 doc and re-emit it have produced a structurally different document.

Conformance per port

Port profile    Ships                 Corpus
Tier-0-only     decode_t0, encode_t0  tier-0 (~4695 fixtures)
Tier-1-capable  All four              tier-0 + per-dialect tier-1

A tier-1-capable port still ships decode_t0 / encode_t0 — some consumers want strict tier-0 behavior in a tier-1-capable port (e.g., tooling pipelines that reject tier-1 docs by policy).

Forward extensibility

A future tier 2 adds decode_t2 / encode_t2 alongside the existing four. Cumulative — each tier-N decoder accepts tier-N and below. A port adopting tier 2 ships six functions; no existing function changes signature.

Mutate API symmetry

Ports that expose a mutate / path-update API split the same way: mutate_t0 operates on value tree + comments; mutate_t1 preserves decorators across mutations. Tier-1 mutations need to keep decorations attached to the right node across insertion, deletion, and reorder. The contract mutate_t1 must satisfy, and the two implementation strategies (opaque-ID backing or path-rewriting), are spelled out under Stable node identity (port-level) below.

Streaming / incremental decode

Streaming is optional per port. A port may ship batch-only decoders (whole document → Document) without violating spec. If a port ships a streaming decoder, the tier-1 streaming behavior is fully determined by the rule below — no per-port divergence.

Yield-point rule (inherited from tier 0): a streaming decoder yields a (path, value, decorations?) event when it has fully parsed a value-tree node.

Tier-1 addition: decorator-call parens are a yield-suspending region. A |tag(...) call's parens bound a region treated exactly like a flow form: yields suspend until the closing ). Param groups may span lines, and the decoder buffers until the group closes.

The hoisting wrinkle. When a family declares a content_slot, the decoder cannot finalize the decorated node's value-tree value until the param group is fully parsed — because it needs to check whether the slot key is present and, if so, hoist it into the value tree. Concretely:

+ |p(class: "lede", children: [..., long flow array spanning lines, ...])

The list-item's value-tree value is either the family's empty default (if content is absent) or the hoisted array (if present). The decoder doesn't know which until the param group closes. The yield for this list-item is therefore deferred until ). This isn't a new constraint shape — it's the same suspension flow forms already impose on tier-0 streaming, just applied to decorator-call parens.

What remains line-streamable in tier 1:

Summary: yield-suspending regions in tier 1 are exactly flow forms + decorator-call parens; hoisting defers a node's yield to the close of its containing param group. Everything else streams as in tier 0.
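
The suspension rule can be pictured as line grouping: a streaming decoder holds physical lines until every open bracket has closed. The Python sketch below shows only that buffering step, under loud assumptions: logical_units is a hypothetical helper, strings are taken to be double-quoted with backslash escapes and confined to one line, and comments and heredocs are ignored entirely. A real decoder would do this inside its lexer.

```python
def logical_units(lines):
    """Yield groups of lines; a group ends only when all brackets are closed."""
    buffer, depth = [], 0
    for line in lines:
        in_string = escaped = False
        for ch in line:
            if in_string:
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch in "([{":
                depth += 1  # entering a yield-suspending region
            elif ch in ")]}":
                depth -= 1
        buffer.append(line)
        if depth == 0:
            # Safe yield point: no decorator-call parens or flow form is open.
            yield "\n".join(buffer)
            buffer = []
    if buffer:
        raise ValueError("input ended inside an open bracket")


source = [
    '+ |h1 "Welcome to DMS"',
    '+ |p(class: "lede",',
    '    children: ["a)", "b"])',
    '+ "tail"',
]
units = list(logical_units(source))
```

The second and third lines come out as one unit because the |p(...) parens stay open across the line break; the ")" inside the string "a)" does not close anything.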

Per-port API

Each port exposes a path-keyed metadata-lookup API on the decoded Document so consumers can read tier-1 decoration without walking the decorators list themselves. Paths are passed as lists of typed segments (strings for map keys, integers for list indices), matching the AST path field exactly:

doc.decorations_at(["body", 0])              # all kinds at path
doc.decorations_at(["body", 0], kind="|")    # just |-decorations
doc.decoration_value(["body", 0], kind="|")[0].fn
doc.decorations_at([])                       # root-level decorations

The spec mandates the lookup contract (path-keyed by typed segments, kind-discriminated, ordered array per kind). Ports may build an internal hash index on first lookup so that later lookups over the canonical list shape run in amortized O(1). Each port picks idiomatic naming on top.
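
One way a port might back this contract is a lazily built index over the canonical list. The Python sketch below is illustrative only: the class name DecorationIndex is hypothetical, sidecar entries are modeled as plain dicts shaped like the entries in the worked example, and the return shapes for a missing path (an empty dict or list) are an assumption.

```python
class DecorationIndex:
    """Path-keyed lookup over a canonical list of sidecar entries."""

    def __init__(self, decorators):
        self._entries = decorators
        self._index = None  # built on first lookup

    def decorations_at(self, path, kind=None):
        if self._index is None:
            # Tuples of typed segments are hashable; "0" and 0 stay distinct
            # because Python strings and integers never compare equal.
            self._index = {tuple(e["path"]): e for e in self._entries}
        entry = self._index.get(tuple(path))
        if entry is None:
            return {} if kind is None else []
        if kind is None:
            return {k: v for k, v in entry.items() if k != "path"}
        return entry.get(kind, [])


sidecar = [
    {"path": [], "|": [{"family": "tag", "fn": "html", "params": [{"lang": "en"}]}]},
    {"path": ["body", 0], "|": [{"family": "tag", "fn": "p", "params": [{"class": "lede"}]}]},
]
index = DecorationIndex(sidecar)
first_fn = index.decorations_at(["body", 0], kind="|")[0]["fn"]
```

The canonical list is left untouched; the hash is a private cache that a port would need to invalidate whenever the sidecar is mutated.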

Stable node identity (port-level)

The canonical sidecar shape is path-keyed: every entry's path field locates the decorated node in the value tree by typed-segment path. This is the wire format for the sidecar, the streaming yield shape, and the inspection / debug-print form.

Path-keying is positional — paths shift when lists are mutated (insert at index 0 → every later index shifts by one). Naive mutation of the value tree can leave decorations pointing at the wrong node. To keep the canonical shape simple while still supporting mutation-heavy workloads, the spec adopts a hybrid model:

A port that ships mutate_t1 MUST guarantee:

After any sequence of mutate_t1 operations on a Document_t1, a subsequent encode_t1 produces output where each decoration is emitted on the same logical node it was attached to at decode time — modulo replacements, where the caller overwrites a node with a fresh value and the old node's identity (and its decoration) is dropped.

Ports MAY satisfy this contract by:

The spec is silent on which strategy a port picks; both satisfy the contract. The choice is between paying the cost in the data model (opaque-ID) or in the mutate API surface (path rewriting).

For ports that ship only decode_t1 / encode_t1 (no mutate_t1): decoration attachment is preserved across decode → encode round-trips on unmutated documents. Programmatic mutation outside a mutate_t1 API is the caller's responsibility — typical mitigation is to manipulate the sidecar in tandem with the value tree, or to re-decode after editing source text directly.
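
To make the path-rewriting strategy concrete, here is a minimal Python sketch of a single mutation, list insertion, that keeps the sidecar in step with the value tree. The function name insert_with_rewrite is hypothetical and sidecar entries are modeled as dicts with a mutable path list; a full mutate_t1 would also cover deletion, reorder, key rename, and subtree moves.

```python
def insert_with_rewrite(value_tree, decorators, list_path, index, value):
    """Insert value into the list at list_path and shift affected sidecar paths."""
    target = value_tree
    for segment in list_path:
        target = target[segment]
    target.insert(index, value)

    depth = len(list_path)
    for entry in decorators:
        path = entry["path"]
        # Only paths that run through this list at or after the insertion
        # point refer to nodes that moved one index to the right.
        if len(path) > depth and path[:depth] == list_path and path[depth] >= index:
            path[depth] += 1


tree = {"items": ["a", "b"], "other": ["z"]}
sidecar_entries = [
    {"path": ["items", 0], "|": [{"family": "tag", "fn": "li", "params": [{}]}]},
    {"path": ["items", 1], "|": [{"family": "tag", "fn": "li", "params": [{}]}]},
    {"path": ["other", 0], "|": [{"family": "tag", "fn": "li", "params": [{}]}]},
]
insert_with_rewrite(tree, sidecar_entries, ["items"], 0, "new")
```

After the call, the decorations that were on "a" and "b" sit at ["items", 1] and ["items", 2], the fresh node at index 0 has none, and the unrelated ["other", 0] entry is unchanged.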

Plain + direct + preserving: pick two

The reason the spec accommodates multiple strategies (rather than mandating one) is that three properties port authors might want out of Document_t1 mutation are not jointly satisfiable:

  1. Plain data model — value tree is unwrapped Map / List / Scalar in the host language's idiomatic types. No wrappers, no identity slots, no per-node ceremony.
  2. Direct mutation — caller edits the value tree in place, without routing through any mutate API.
  3. Preservation — decoration stays attached to the same logical node across mutation.

A port can pick any two of the three. Each choice maps to one of the strategies above:

This isn't a bug; it's a description of what's available. Each port picks the pair that matches its workload. The canonical sidecar shape is path-keyed regardless of the choice — the three strategies differ only in how a port represents the value tree in memory and what its mutate surface looks like, not in what encode_t1 writes to disk.

When to use mutate_t1 (caller-facing rules)

Independent of the port's strategy, callers need a clear rule for when direct value-tree mutation preserves decoration and when it doesn't. The rule follows from path-keying: decoration is attached to the path it was decoded onto. Anything that changes which path identifies which logical node breaks attachment.

Direct mutation is safe (preservation holds) when:

  1. Leaf value swap at unchanged path. Replacing the scalar at an existing path with another scalar:

       doc.value_tree["server"]["port"] = 8080

  2. Editing decoration content in place. Params, comment text, and position fields live inside sidecar entries and aren't path-keyed:

       doc.decorations_at(["body", 0], kind="|")[0].params[0]["class"] = "lede"

  3. Tail-only append to a list. If no decoration entry has a path through that list at any index, or only through indices already present, appending past the end shifts nothing:

       doc.value_tree["items"].append("new")   # safe iff no decoration on ["items", N+]
  4. Adding a fresh decoration entry. New entries with new paths don't disturb existing ones.

Direct mutation is unsafe without mutate_t1 (or manual sidecar rewriting) when:

  1. List indices shift. Any insertion or deletion at a non-tail index of a list that has decoration on later indices, or any list reorder: paths through later indices now point at the wrong logical nodes.
  2. Map keys change. Rename, pop + set under a new key, or any operation that removes the key the decoration was attached to — decoration becomes orphaned.
  3. Subtrees move. Splicing a node from one location to another — decoration entries through the source path are wrong; the destination doesn't get them.
  4. Non-leaf replacement with structural change. Overwriting a map or list with a value of different shape — decoration through the old structure doesn't fit the new structure.

Operational summary:

If your mutation only changes a leaf value or a decoration's internal contents, edit directly. If your mutation changes which path identifies which logical node — index shift, key change, subtree move, structural overwrite — use mutate_t1, rewrite the sidecar paths yourself, or re-decode after editing source text.

This rule applies regardless of which strategy a port adopted:

The rule is format-level, not port-level: any tier-1 consumer can use it to decide whether doc.value_tree[...] = x is fine or whether they need a heavier mutation path.

Why hybrid. Pure path-keyed leaves mutation safety unsolved and forces every consumer that edits a Document_t1 in place to write its own path-bookkeeping. Pure opaque-ID forces a data-model regression: every value-tree node would need an identity slot, breaking the "value tree is plain Map / List / Scalar" contract that tier 0 keeps and that tier 1 inherits. The hybrid keeps the canonical shape clean for interop (path-keyed everywhere it matters across ports) while letting mutation-heavy ports adopt opaque-ID backing as an internal implementation choice.

Conformance

Negatives accepted in this design

These are tier-1 properties that are not bugs but real costs of the approach:

Open questions parked for later

Worked example: HTML in dms+html

The full dms+html dialect spec — families, typed attributes, content semantics, tag inventory, versioning — lives in dialects/dms+html.md. What follows is a single end-to-end example showing source, decoded value tree, and decoration sidecar.

+++
_dms_tier: 1
_dms_imports:
  + dialect: "html"
    version: "1.0.0"
    ns: "html"
+++

+ |html(lang: "en")
  + |head
    + |title "DMS feature tour"
    + |meta(charset: "UTF-8")
    + |link(rel: "stylesheet", href: "style.css")
  + |body(class: "main", id: "root")
    + |h1 "Welcome to DMS"
    + |p(class: "lede")
      + "Click "
      + |a(href: "/spec.html") "here"
      + " to read the spec."
    + |ul(class: "items")
      + |li "first item"
      + |li "second item"

Decoded value tree (lite mode equivalent):

[
  [
    [["DMS feature tour"]],
    [
      ["Welcome to DMS"],
      ["Click ", ["here"], " to read the spec."],
      [["first item"], ["second item"]]
    ]
  ]
]

Decoration sidecar (full mode), shown in DMS:

decorators:
  - { path: [0],          "|": [{ family: "tag", fn: "html",  params: [{ lang: "en" }] }] }
  - { path: [0, 0],       "|": [{ family: "tag", fn: "head",  params: [{}] }] }
  - { path: [0, 0, 0],    "|": [{ family: "tag", fn: "title", params: [{}] }] }
  - { path: [0, 0, 1],    "|": [{ family: "tag", fn: "meta",  params: [{ charset: "UTF-8" }] }] }
  - { path: [0, 0, 2],    "|": [{ family: "tag", fn: "link",  params: [{ rel: "stylesheet", href: "style.css" }] }] }
  - { path: [0, 1],       "|": [{ family: "tag", fn: "body",  params: [{ class: "main", id: "root" }] }] }
  - { path: [0, 1, 0],    "|": [{ family: "tag", fn: "h1",    params: [{}] }] }
  - { path: [0, 1, 1],    "|": [{ family: "tag", fn: "p",     params: [{ class: "lede" }] }] }
  - { path: [0, 1, 1, 1], "|": [{ family: "tag", fn: "a",     params: [{ href: "/spec.html" }] }] }
  - { path: [0, 1, 2],    "|": [{ family: "tag", fn: "ul",    params: [{ class: "items" }] }] }
  - { path: [0, 1, 2, 0], "|": [{ family: "tag", fn: "li",    params: [{}] }] }
  - { path: [0, 1, 2, 1], "|": [{ family: "tag", fn: "li",    params: [{}] }] }

(params_dec and position fields elided here; both default empty / "leading" for this example.)