DMS — Tier 1 specification

Version: 0.1 (draft)

Sibling specification to SPEC.md. Defines the tier 1 extension: a syntactic surface that lets DMS represent element-shaped data (markup, declarative AST nodes, structured function calls) on top of the tier-0 value-tree algebra.

A tier-0-only decoder (per SPEC.md) rejects tier-1 documents at front-matter decode with a tier-1-pointing error. A tier-1-capable decoder ships the four functions described in Decoder / encoder split below; tier-0 conformance is preserved.

Pre-1.0: breaking changes are still possible. No version-bump rules apply yet.

Goal

Add a syntactic surface to DMS that lets it represent element-shaped data (markup, declarative AST nodes, structured function calls) without compromising the value-tree algebra (Map / List / Scalar) that the rest of the spec rests on.

Concrete near-term use cases:

DMS-shaped HTML / XML / SVG / JSX-like markup.
DMS-shaped structured logging with named verbs (|event(...), |metric(...)).
DMS-shaped IaC where named decorators map to resources (|resource("aws_instance", ...)).

What tier 1 does not add: expressions, references, includes from other files, computation. Those are separate decorator families that could be defined later as opt-in dialects, each with its own spec, but they are not part of this draft.

Tier 1 inherits all of tier-0's design non-goals (SPEC.md §"Design non-goals") unchanged:

No null / none. Missing values are expressed by key absence — including in tier-1 decoration AST records, where optional fields are omitted rather than set to a null marker.
No string concatenation or line continuation for single-line strings. Heredocs remain the only multi-line string mechanism.
No unit suffixes.
No references / interpolation / schemas.
No anchors / aliases / type tags.

If a tier-1 dialect's runtime layer wants to introduce any of these (e.g., an i18n dialect doing key resolution at render time), that's the dialect's contract with its consumers — DMS itself stays on the same algebra.

Core concept: decorators as AST decoration

A tier-1 line can carry a decorator call:

+ |tag(lang: "en")

The decorator (|tag(...)) is AST decoration, not value-tree content. The decoded document keeps two parallel structures:

Value tree — pure Map / List / Scalar. Same algebra as tier 0. Generic tooling (diff, patch, JSON-Schema, lite-mode parser, pretty-printer) operates on this without knowing tier 1 exists.
Decoration sidecar — parallel structure addressing each decorated value-tree node by path. Records every decorator call (and every comment, reusing the same machinery).

The decorators sidecar is to |-calls what the comments AST is to # ... lines today: parallel, path-keyed, captured during full-mode decode, dropped in lite mode, reattached on encode for byte-stable round-trip. Its canonical shape is a list of per-node entries — see "AST shape" for details.

Sigil categories

Two disjoint roles:

_ is reserved for core / built-in decorators. Existing tier-0 heredoc modifiers (_trim, _fold_paragraphs) keep working unchanged. Future spec-defined core decorators across any tier also use _.
Tier 0 reserves a fixed set of decorator sigils that no tier-0 document may use at any value-position, line start included. Tier-1 imports bind sigils from this set to dialect families.

Tier-0 reserved decorator sigil set

Tier 0 spec adds the following characters as reserved decorator sigils:

 !  @  $  %  ^  &  *  |  ~  `  .  ,  >  <  ?  ;  =

Tier 0 also reserves the Reserved Emoji Set (see SPEC.md §Lexical → "Reserved emoji characters"): every extended grapheme cluster containing at least one codepoint from Extended_Pictographic ∪ Regional Indicators ∪ Emoji Modifiers ∪ {U+20E3}. Emoji-bearing grapheme clusters are first-class decorator sigils alongside the ASCII set.

Neither kind of character can appear in a valid tier-0 bare key, so the reservation invalidates no existing tier-0 document. The tier-0 spec rejects them in every value-position (see SPEC.md §"Implicit reservations"):

Position	Tier-0 status	Tier-1 use
First non-whitespace of a body line	parse error	decorator at scalar root or leading
After `key:`, before inline_value	parse error	inner decoration on kvpair value
After `+`, before inline_value	parse error	inner decoration on list-item value
After inline_value on kvpair line, pre-NL	parse error	trailing decoration on kvpair value
After inline_value on list-item line, pre-NL	parse error	trailing decoration on list-item value
Inside flow_array, before/after element	parse error	inner/trailing decoration on flow element
Inside flow_table, before/after value	parse error	inner/trailing decoration on flow_kv

The reservation positions are exactly the value-positions where tier 1 places decoration. Tier-1 decoration "fills in" the slots tier 0 already keeps empty.

This is a fixed list in the tier-0 spec, not a dynamic per-file reservation. Benefits:

Lexer is stable across files. Tokenizing a tier-0 line doesn't depend on whether the document has front matter or what it declares; a reserved sigil is always lexed as a decorator sigil, and a tier-0 decoder always rejects it.
Forward compatibility. Tier-0 decoders reject tier-1 decorator usage explicitly, with an actionable error ("decorator sigil | requires tier 1; set _dms_tier: 1 and declare the dialect in _dms_imports").
Cross-file tooling stays grep-able. A reserved sigil always means the same thing across the corpus — a decorator call. Search and refactor tools don't need per-file context.

The tier-0 spec also reserves _-prefixed root keys in front matter (existing behavior); _dms_imports and any future _dms_* field names are covered by that reservation.

Multi-character sigils

A sigil is a non-empty sequence of sigil atoms. A sigil atom is one of:

a single ASCII char from the reserved decorator sigil set above, or
one extended grapheme cluster from the Reserved Emoji Set.

Single-atom sigils (|, @, 🚀, 🇺🇸) are the common case; multi-atom sigils (||, |@, ~~, &|*, 🚀🔥, |🚀) are the escape valve when a file imports more dialects/families than single-atom sigils can accommodate. Capacity scales as N + N² + N³ + …, effectively unlimited; the addition of the Reserved Emoji Set increases N from 17 to ~3.7k.
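The capacity formula follows from counting: a sigil of length k is any ordered sequence of k atoms, so N atoms yield N^k sigils of that length. The minimal sketch below (the function name is illustrative) sums those counts up to a chosen maximum length, using only the 17 ASCII atoms; it does not model emoji atoms.

```python
def sigil_capacity(n_atoms: int, max_len: int) -> int:
    """Number of distinct sigils of length 1..max_len over n_atoms atoms.

    Every ordered sequence of atoms is a distinct candidate sigil, so sigils of
    length k contribute n_atoms ** k; the sum is N + N**2 + ... + N**max_len.
    """
    return sum(n_atoms ** k for k in range(1, max_len + 1))


if __name__ == "__main__":
    ASCII_ATOMS = 17  # size of the tier-0 reserved ASCII sigil set
    for length in (1, 2, 3):
        print(length, sigil_capacity(ASCII_ATOMS, length))
```

With the 17 ASCII atoms alone, allowing sigils of up to 2 atoms gives 17 + 289 = 306 candidates, and up to 3 atoms gives 5219.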

Sigils may only combine atoms drawn from the reserved sets above. A combination that includes any non-reserved char, like |+, is not a valid sigil: + is the tier-0 list-item marker, and admitting it into sigil position would reintroduce the parse ambiguity that tier 0 avoids by construction. ASCII-reserved chars and emoji clusters may be mixed within a single sigil (|🚀, 🚀|, @🇺🇸@); both kinds are tier-0-reserved at the same parse positions, so the resulting tokenization stays context-free.

Lexer rule: longest match. At decoration position, the lexer reads the maximal run of sigil atoms (a sequence of ASCII reserved-sigil chars and/or Reserved-Emoji grapheme clusters, in any order, with no intervening non-reserved characters) and matches it against the file's bound-sigil table (built from each import's bind, falling back to dialect defaults for imports without one). The longest registered prefix of that run is the sigil; remaining sigil atoms (if any) are a parse error. If the run has no registered prefix, the error is "unknown sigil ''." Match comparison is byte-exact after NFC normalization of the source line, consistent with the rest of the spec.

Concretely: if a file binds both | and ||, then ||tag lexes as (||, tag) and |tag lexes as (|, tag) — no ambiguity. If a file binds only |, then ||tag is a parse error (the lexer would read || as the candidate, find no || binding, and reject it rather than fall back to | followed by literal |tag). Likewise, if a file binds 🚀 to a family, 🚀tag lexes as (🚀, tag); 🚀🔥tag is a parse error unless 🚀🔥 is also bound.
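The longest-match rule can be sketched in a few lines of Python. This is a minimal sketch, assuming ASCII atoms only (it skips UAX #29 grapheme segmentation, which emoji atoms would need) and assuming the bound-sigil table is a plain set of strings; lex_sigil is a hypothetical helper name.

```python
ASCII_SIGIL_ATOMS = frozenset("!@$%^&*|~`.,><?;=")


def lex_sigil(line: str, pos: int, bound: set) -> tuple:
    """Longest-match sigil lexing at line[pos:], ASCII atoms only.

    Returns (sigil, next_pos). Raises SyntaxError if the run of sigil atoms has
    no bound prefix, or if atoms remain after the longest bound prefix.
    """
    end = pos
    # Read the maximal run of reserved sigil atoms.
    while end < len(line) and line[end] in ASCII_SIGIL_ATOMS:
        end += 1
    run = line[pos:end]
    # Try prefixes from longest to shortest; the first bound one is the sigil.
    for cut in range(len(run), 0, -1):
        prefix = run[:cut]
        if prefix in bound:
            if cut < len(run):
                # Leftover atoms are an error, never re-read as literal text.
                raise SyntaxError(
                    f"unexpected sigil atoms {run[cut:]!r} after sigil {prefix!r}"
                )
            return prefix, pos + cut
    raise SyntaxError(f"unknown sigil {run!r}")


if __name__ == "__main__":
    print(lex_sigil("||tag", 0, {"|", "||"}))  # ('||', 2)
    print(lex_sigil("|tag", 0, {"|", "||"}))   # ('|', 1)
    try:
        lex_sigil("||tag", 0, {"|"})
    except SyntaxError as err:
        print("error:", err)
```

The third call mirrors the example above: with only | bound, ||tag is rejected rather than lexed as | followed by |tag.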

Skin-tone, ZWJ, and keycap sequences are matched as single atoms by the UAX #29 grapheme-cluster boundary algorithm (frozen at 15.1.0). 👍🏽 is one atom, not two; binding 👍 does not match 👍🏽. Authors who want both must bind both explicitly.

Front matter additions

Tier 1 adds two reserved root fields to front matter:

_dms_tier: 1 — already reserved by SPEC v0.14; sets the tier. Required for any document that uses tier-1 features.
_dms_imports — list of dialect imports. Required if the document uses any non-_-prefixed decorator.

Tier-0 documents (_dms_tier: 0 or no front matter) must not contain _dms_imports. Decoder rejects such documents with an error suggesting _dms_tier: 1.

Import shape

+++
_dms_tier: 1
_dms_imports:
  + dialect: "html"
    version: "1.0.0"
    ns: "html"
    bind:
      "|": ["tag", "entity"]
    deny:
      tag: ["script", "iframe"]
    alias:
      tag:
        para: "p"
+++

Per import: two required identity fields, one optional identity field, and four optional sub-maps (bind, allow, deny, alias), each handling one orthogonal concern.

Identity (required):

dialect (string) — the dialect name. Resolved against the decoder's installed dialect registry.
version (string, semver) — the version of the dialect to bind. Must be a valid semver string (MAJOR.MINOR.PATCH, with optional -pre.release and +build.metadata). The decoder matches the requested version against installed versions using the dialect's declared match strategy — see "Dialect versioning" under "Dialect specification contract".

Identity (optional):

ns (string) — namespace under which this dialect's decorators are accessible via fully-qualified reference. When set, calls of the form |<ns>.<fn_name>(...) resolve through this dialect explicitly. Unqualified calls still resolve through normal lookup.

Syntax (optional):

bind (map: sigil → list of family-names) — overrides the dialect's default sigil set for this file. If present, fully replaces defaults — you control which sigils bind which families. If absent, the dialect's published defaults apply. Sigil keys must be sequences of sigil atoms — ASCII chars from the tier-0 reserved decorator sigil set and/or grapheme clusters from the Reserved Emoji Set (see "Multi-character sigils" above); family-name values must be families the dialect publishes. Multiple families may bind to the same sigil; the function name disambiguates at parse time (see Resolution rules). Single-family bindings still use list form for type consistency: "|": ["tag"], not "|": "tag".

Curation (optional, all keyed by family-name):

allow (map: family → list of names) — whitelist. Only listed names within each family are accepted; wildcard turns off for any family with an allow entry. Families not listed in allow remain at dialect default.
deny (map: family → list of names) — blacklist. Listed names rejected; everything else passes through wildcard. Mutually exclusive with allow for the same family — declaring both for one family is a parse error. Different families may use different modes within one import.
alias (map: family → map: alias → canonical) — local renames. Source code uses the alias; AST records the canonical name. Aliases shadow canonical names within a family — writing the canonical name when an alias exists is a parse error, to preserve "one way to write each thing." Aliases are resolved before allow/deny checks.

Resolution rules

A decorator call appears at any value-position (see Decoration sites: every value-position). To parse one:

1. Lex the sigil. At the call's start, read the maximal run of sigil atoms (ASCII chars from the tier-0 reserved decorator sigil set and/or extended grapheme clusters from the Reserved Emoji Set, in any order, with no intervening non-reserved characters). Match the longest prefix of that run against the file's bound-sigil table (built from each import's bind, falling back to the dialect's published defaults). The matched prefix is the sigil; any unmatched trailing sigil atoms are a parse error. Lookup yields a set of (family, dialect) candidates, possibly more than one if a sigil binds multiple families.
2. Lex the name. Read the identifier after the sigil following tier-0 ident rules. If a . follows, this is a fully-qualified call (|<ns>.<name>); the namespace bypasses step 1's candidate set and resolves directly through <ns>'s import.
3. Resolve the name within the candidate families:
   - Resolve through alias first: if the identifier is an alias key in a candidate family's alias map, replace it with the canonical name. Writing the canonical name directly when an alias exists is a parse error.
   - Apply curation: if allow is specified, the resolved name must appear in it; if deny is specified, the resolved name must not appear in it; otherwise the family is wildcard and any name is accepted.
   - Exactly one candidate family must accept the name. Zero accepting ⇒ name not found in any family bound to '<sigil>'. More than one accepting ⇒ name '<x>' is ambiguous between families <A>, <B> under sigil '<sigil>'; qualify the call as |<ns>.<x>(...).
4. Lex the params. If the next char (no whitespace) is (, parse one param group, either flow-table-shaped (named) or flow-array-shaped (positional), as determined by the group's first token (see "Positional params" under "Dialect specification contract"). Then loop: while the next char is ( (no whitespace), parse another group. Otherwise the call has zero explicit param groups. Decoded shapes:
   - bare |tag and empty-parens |tag() both produce params: [{}]
   - |tag(a: 1) → params: [{ a: 1 }] (named)
   - |score(95, 5) → params: [[95, 5]] (positional)
   - |tag(a: 1)(b: 2) → params: [{ a: 1 }, { b: 2 }] (two named groups)
   - |score(95, 5)(commented: true) → params: [[95, 5], { commented: true }] (positional + named)
5. Record the AST entry: { family: <canonical family>, fn: <canonical fn name>, params: [...], params_dec: [], position: <leading|inner|trailing|floating> }. fn is always the canonical name, regardless of whether an alias was used at the source. params_dec is empty by default and populated only if a param value was itself prefixed with decoration (see "AST shape").
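Step 3 (alias first, then curation, then the exactly-one-acceptor check) can be sketched in Python. This is a minimal sketch, assuming each candidate family arrives as a hypothetical FamilyBinding record carrying its import's alias map and optional allow or deny set; it covers only unqualified calls and leaves out sigil and param lexing.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FamilyBinding:
    """One candidate family under a sigil, with its per-import curation."""
    family: str
    alias: dict = field(default_factory=dict)  # alias -> canonical name
    allow: Optional[set] = None  # None: no allow list for this family
    deny: Optional[set] = None   # None: no deny list for this family


def resolve_name(ident: str, sigil: str, candidates: list) -> tuple:
    """Return (family, canonical_fn) for an unqualified call, or raise SyntaxError."""
    accepting = []
    for cand in candidates:
        # Writing a canonical name that has an alias in this family is an error.
        if ident not in cand.alias and ident in cand.alias.values():
            raise SyntaxError(
                f"'{ident}' has an alias in family '{cand.family}'; write the alias"
            )
        name = cand.alias.get(ident, ident)  # aliases resolve before curation
        if cand.allow is not None and name not in cand.allow:
            continue
        if cand.deny is not None and name in cand.deny:
            continue
        accepting.append((cand.family, name))
    if not accepting:
        raise SyntaxError(f"name '{ident}' not found in any family bound to '{sigil}'")
    if len(accepting) > 1:
        families = ", ".join(fam for fam, _ in accepting)
        raise SyntaxError(
            f"name '{ident}' is ambiguous between families {families} "
            f"under sigil '{sigil}'; qualify the call"
        )
    return accepting[0]


if __name__ == "__main__":
    tag = FamilyBinding("tag", alias={"para": "p"}, deny={"script", "iframe"})
    entity = FamilyBinding("entity", allow={"nbsp", "amp"})
    print(resolve_name("para", "|", [tag, entity]))  # ('tag', 'p')
    for ident in ("p", "script", "nbsp"):
        try:
            resolve_name(ident, "|", [tag, entity])
        except SyntaxError as err:
            print("error:", err)
```

In this setup nbsp is ambiguous: tag is in deny mode, so it accepts every undenied name, and entity's allow list also contains nbsp. That is the parse-time ambiguity the spec defers to body parse rather than front-matter decode.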

Conflict detection

After all _dms_imports entries are merged (shallow), the decoder runs two layers of conflict checks.

Front-matter-time: cross-import family collisions

If two different imports bind the same (sigil, namespace, family) triple, the decoder rejects the front matter with a hard parse error:

Decoder error in front matter (line 12):
  Decorator binding collision on (sigil='|', ns=<unset>, family='tag'):
    - import #0: dialect 'html' v1.0.0 binds '|' → 'tag'
    - import #1: dialect 'math' v0.1.0 binds '|' → 'tag'

  Resolve by remapping one. Suggestion (rebinding 'math' to '@'):
    _dms_imports:
      + { dialect: "html", version: "1.0.0", ns: "html" }
      + { dialect: "math", version: "0.1.0", ns: "math",
          bind: { "@": ["tag"] } }

Heuristic for which dialect the error suggests remapping: the later one in the imports list, since the earlier import is more likely the file's "primary" dialect.

A single sigil binding multiple families from the same import is not a collision ("|": ["tag", "entity"] is fine). Same-sigil multi-family bindings across different imports are also fine as long as no (sigil, ns, family) triple repeats; function-name disambiguation handles the rest at parse time.
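The front-matter check amounts to detecting a repeated key across imports. A minimal Python sketch, assuming each import is a dict that already carries its effective bind map (the file's bind, or the dialect defaults when bind is absent) and modelling a missing ns as None; find_binding_collisions is a hypothetical name.

```python
def find_binding_collisions(imports: list) -> list:
    """Return (triple, first_import_index, later_import_index) per repeated triple.

    A triple is (sigil, ns, family). Repeats within one import are not reported,
    matching the rule that only two *different* imports can collide.
    """
    first_seen = {}
    collisions = []
    for idx, imp in enumerate(imports):
        ns = imp.get("ns")  # None stands for an unset namespace
        for sigil, families in imp["bind"].items():
            for family in families:
                triple = (sigil, ns, family)
                prev = first_seen.get(triple)
                if prev is None:
                    first_seen[triple] = idx
                elif prev != idx:
                    collisions.append((triple, prev, idx))
    return collisions


if __name__ == "__main__":
    html = {"dialect": "html", "bind": {"|": ["tag", "entity"]}}
    math = {"dialect": "math", "bind": {"|": ["tag"]}}
    print(find_binding_collisions([html, math]))  # [(('|', None, 'tag'), 0, 1)]
```

The later index in each reported tuple is the import the error message would suggest remapping, per the heuristic above.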

Parse-time: same-sigil function-name ambiguity

Different families bound to the same sigil may publish overlapping function names. The decoder cannot enumerate names ahead of time when one or both families are wildcard, so this class of conflict is caught at body parse — see step 3 of Resolution rules. The error message points the user toward fully-qualified |<ns>.<name>(...) form.

Sigil validation

Every key in any bind map must be a non-empty sequence of sigil atoms: ASCII chars drawn from the tier-0 reserved decorator sigil set and/or extended grapheme clusters drawn from the Reserved Emoji Set (see "Tier-0 reserved decorator sigil set" and "Multi-character sigils" above for the canonical sources). A bind key containing any character outside the union of those two sets is a hard parse error at front-matter decode. (Note: underscore _ is in neither set; it is its own category, reserved for core / built-in decorators. ASCII chars that have Emoji=Yes but are not in the Reserved Emoji Set, namely a standalone # and the digits, are not sigil atoms either. * also has Emoji=Yes, but it is a sigil atom because it belongs to the ASCII reserved set. ©, ®, and ™ are sigil atoms because Unicode classifies them as Extended_Pictographic=Yes.)
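Bind-key validation reduces to a per-atom membership test. This minimal Python sketch assumes the key has already been split into grapheme clusters and that the caller supplies an is_reserved_emoji predicate, since the Python standard library does not expose Extended_Pictographic; both the function name and the predicate are hypothetical.

```python
from typing import Callable, Iterable

ASCII_SIGIL_ATOMS = frozenset("!@$%^&*|~`.,><?;=")


def validate_bind_key(atoms: Iterable[str], is_reserved_emoji: Callable[[str], bool]) -> None:
    """Raise ValueError unless every atom is a reserved ASCII char or emoji cluster.

    `atoms` must already be segmented into grapheme clusters; this sketch does
    not segment text itself.
    """
    atom_list = list(atoms)
    if not atom_list:
        raise ValueError("bind key must contain at least one sigil atom")
    for atom in atom_list:
        if atom in ASCII_SIGIL_ATOMS or is_reserved_emoji(atom):
            continue
        raise ValueError(f"{atom!r} is not a sigil atom")


if __name__ == "__main__":
    def stub(cluster: str) -> bool:
        # Stand-in predicate: treats only the rocket emoji as reserved.
        return cluster == "🚀"

    validate_bind_key(["|", "🚀"], stub)  # mixed ASCII + emoji: accepted
    validate_bind_key(["*"], stub)        # * passes via the ASCII set
    for bad in (["_"], ["#"], ["|", "+"]):
        try:
            validate_bind_key(bad, stub)
        except ValueError as err:
            print("error:", err)
```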

AST shape

The decorators sidecar on Document_t1 is a list of entries, one per decorated value-tree node, in source order:

decorators:
  +
    path: [0]
    "|":
      +
        family: "tag"
        fn:     "html"
        params: [{ lang: "en" }]
        params_dec: []
        position: "leading"
    comments:
      +
        kind: "leading"
        text: "page root"

Shape: List< { path, <kind>: List<Record>, ... } >.

The path field identifies which value-tree node the entry attaches to. Path is a list of typed segments:

String segment → map key (any unicode key works without escaping, since segments are typed values, not stringified).
Integer segment → list index (zero-based).
Empty list [] → the document root.

Examples: [] is the root; ["body"] is the value at map key body; ["body", "children", 0] is the first element of the list at body.children.
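Because segments are typed, path lookup needs no escaping or parsing. A minimal Python sketch, assuming the value tree is represented with Python dicts, lists, and scalars; resolve_path is a hypothetical name.

```python
from typing import Any


def resolve_path(tree: Any, path: list) -> Any:
    """Follow typed path segments: str -> map key, int -> zero-based list index."""
    node = tree
    for seg in path:
        if isinstance(seg, bool):
            # bool is a subclass of int in Python, but it is not a valid segment.
            raise TypeError("path segments must be str or int")
        if isinstance(seg, str):
            if not isinstance(node, dict):
                raise TypeError(f"string segment {seg!r} applied to a non-map")
            node = node[seg]  # a missing key raises KeyError
        elif isinstance(seg, int):
            if not isinstance(node, list) or not 0 <= seg < len(node):
                raise IndexError(f"integer segment {seg} does not index a list element")
            node = node[seg]
        else:
            raise TypeError(f"unsupported segment {seg!r}")
    return node


if __name__ == "__main__":
    tree = {"body": {"children": ["a", {"url": "/x"}]}}
    print(resolve_path(tree, ["body", "children", 0]))  # a
    print(resolve_path(tree, ["body", "children", 1, "url"]))  # /x
```

Rejecting negative indices keeps the zero-based rule strict; plain Python indexing would otherwise count from the end.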

Each entry beyond path is keyed by decoration kind. Each value is an array of decoration records of that kind, in source order. Per-kind shapes:

| (or whichever sigil the file binds): array of { family, fn, params, params_dec, position } records, in source order.
family: canonical family name from the dialect (e.g., "tag").
fn: function name (e.g., "html"). Always the canonical name after alias resolution, even when the source used an alias; always present.
params: ordered list of param groups. Each group is either a Map (named, flow-table-shaped) or a List (positional, flow-array-shaped). See "Positional params" under "Dialect specification contract" for the lexer rule that determines mode at the open (. Decoded shapes:
- |tag (bare, no parens) → [{}]
- |tag() (empty parens) → [{}] (equivalent to bare)
- |tag(a: 1) → [{ a: 1 }] (named)
- |tag(a: 1)(b: 2) → [{ a: 1 }, { b: 2 }] (two named)
- |score(95, 5) → [[95, 5]] (positional)
- |score(95, 5)(commented: true) → [[95, 5], { commented: true }] (mixed groups)
params_dec: parallel sidecar for nested decoration on param values. Same shape as the top-level decorators list (a List< { path, <kind>: ... } >), but rooted at this call's params. Path segments reach into the param groups: integer first segment selects the group index, then subsequent segments descend into that group's value. The second segment's kind reflects the group's shape — string for named ({}) groups, integer for positional ([]) groups. Examples:
- path: [0, "src", "url"] decorates params[0].src.url (named group at index 0).
- path: [0, 1, "url"] decorates params[0][1].url (positional group at index 0, second element, then a named-key descent into its value).
params_dec is the empty list [] when no nested decoration exists.
position: which of the four decoration positions the call occupied at source — "leading", "inner", "trailing", or "floating". Mirrors the kind field on tier-0 comment records. The path already identifies which node the decoration attaches to; position records how it was written so encoders can round-trip back to the same form.
comments: array of { kind, text } records, where kind ∈ { "leading", "trailing", "inner", "floating" } (existing tier-0 attachment positions).
Future kinds (e.g., "original_forms") follow the same shape: a kind key mapping to an ordered array of records of that kind.

A given path must appear at most once in the list. All decoration kinds attached to the same node share one entry. The list-of-entries form is canonical because: (1) it preserves source order without depending on map iteration semantics, and (2) it lets path segments stay typed values rather than baking an escape grammar for arbitrary unicode map keys into a path string.
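The one-entry-per-path rule can be illustrated with a small Python sketch that folds decoration records, assumed to arrive as (path, kind, record) triples in source order, into the canonical list. The function name and the input shape are illustrative, not part of the spec.

```python
def build_sidecar(records):
    """Group (path, kind, record) triples, given in source order, into entries.

    Each distinct path gets exactly one entry; entries appear in the order their
    path was first seen, and records keep source order within each kind.
    """
    entries = []
    by_path = {}
    for path, kind, record in records:
        key = tuple(path)  # typed segments: "0" and 0 stay distinct keys
        entry = by_path.get(key)
        if entry is None:
            entry = {"path": list(path)}
            by_path[key] = entry
            entries.append(entry)
        entry.setdefault(kind, []).append(record)
    return entries


if __name__ == "__main__":
    records = [
        ([0], "|", {"family": "tag", "fn": "html"}),
        ([0], "comments", {"kind": "leading", "text": "page root"}),
        (["body"], "|", {"family": "tag", "fn": "body"}),
    ]
    for entry in build_sidecar(records):
        print(entry)
```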

Identity vs param data: separate AST slots

The decoration record's identity / machinery fields (family, fn, params, params_dec, position) live at the top level of the record. User-written param data lives one level down, inside the param groups of params. Function identity lives only in fn.

This means user-written param keys — name, id, class, even fn itself — are plain data with no AST-machinery meaning. There is no collision between the AST's identity slots and the user's param keys, regardless of what the user writes:

|form(name: "x", fn: "y")

decodes to (showing only family, fn, and params):

{ "family": "tag", "fn": "form",
  "params": [{ "name": "x", "fn": "y" }] }

The AST record's fn holds "form" (the function name from source). params[0].fn holds "y" (user data). They never overwrite each other.

The only reserved param key per family is the one declared as content_slot — that key triggers hoisting and never appears in params after decode (its value moves to the value tree). Everything else is plain user data.

Lite-mode behavior

Lite mode drops the decorators sidecar wholesale, the same way tier-0 lite mode drops the comments sidecar.

For documents whose meaning depends on decorator content (markup docs, anything element-shaped), this is semantically lossy in a way tier-0 lite mode is not: comments are advisory, but tier-1 decorators are load-bearing. This spec therefore states:

Comments are advisory; consumers may discard them without loss of document meaning. Tier-1 decorators are structural; consumers that discard them must not claim to preserve document semantics.

Decoration attachment

Tier-1 decoration mirrors tier-0 comment attachment exactly. There are four positions; they share source-location rules with tier-0 comments (SPEC.md §"Comments → attachment").

Position	Source location	Attaches at	Stacks?
Leading	Own line(s) immediately before a kvpair / list-item, no blank line between	The following node	Yes
Inner	Between `key:` (or `+`) and the value, same line	The node's value	Yes
Trailing	After the value, same line, before newline	The node's value	Yes
Floating	Own line(s), blank-line-separated, or after the last child of a block	The enclosing block	Yes

Path-keying: leading, inner, and trailing all attach at the same path (the node's value path). Floating attaches one level up (the parent container's path). The position field on each decoration record (analogous to comments' kind field) records which of the four was used, for round-trip fidelity.
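The path-keying rule is small enough to state as a function. This minimal Python sketch assumes a hypothetical input, anchor_path: the value path of the node the decoration is written next to (for floating decoration, a neighbouring child of the enclosing block).

```python
ATTACH_AT_NODE = ("leading", "inner", "trailing")


def attachment_path(anchor_path: list, position: str) -> list:
    """Sidecar path for a decoration written at `position` next to anchor_path."""
    if position in ATTACH_AT_NODE:
        return list(anchor_path)  # same path as the node's value
    if position == "floating":
        if not anchor_path:
            raise ValueError("floating decoration needs an enclosing block")
        return list(anchor_path[:-1])  # one level up: the parent container
    raise ValueError(f"unknown position {position!r}")


if __name__ == "__main__":
    # |section_status("paused") after the last server attaches to "servers".
    print(attachment_path(["servers", 1], "floating"))  # ['servers']
    print(attachment_path(["port"], "trailing"))        # ['port']
```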

Decorator call syntax

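decorator_call = sigil name [ "." name ] { param_group }
sigil          = <one of the bound sigils for this file>
name           = ident                                  (* tier-0 ident rules *)
param_group    = "(" [ flow_kvs | flow_elems ] ")"      (* named or positional *)
flow_kvs       = flow_kv { "," flow_kv }                (* tier-0 flow_kv *)
flow_elems     = inline_value { "," inline_value }      (* tier-0 flow_array elements *)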

The name [ "." name ] form covers fully-qualified calls (|html.tag etc.) used to disambiguate between dialects.
Parens are optional. |tag and |tag() are equivalent; both produce params: [{}]. The lexer reads the name; if the next char (no whitespace) is (, parse param group(s), otherwise the call has no explicit params. See AST shape.
Multiple back-to-back param groups (|tag(a: 1)(b: 2)) are allowed; each becomes a separate entry in params. Dialects that don't use multi-group calls treat the second group as a parse error at the dialect layer.

Examples by position

# leading — own line(s) before a node
|cached_for("1h")
|requires_auth
endpoint: "/api/users"

# inner — between header and value, same line
endpoint: |cached_for("1h") "/api/users"
endpoint: |required                        # inner-only — value is dialect default
+ |row(class: "header") name: "Alice"      # inner on list-item kvpair-continuation form

# trailing — after the value, same line
port: 5432 |validates_range(1024, 65535)
+ "first" |emphasis

# floating — blank-line-separated, attaches to enclosing block
servers:
  + name: "web1"
  + name: "web2"

  |section_status("paused")

Stacking and interleaving with comments

Each position accepts an arbitrary number of decorations and comments, in source order. Decorations and comments may interleave freely:

a: "A" /* before |dec1 */ |dec1() |dec2() # this is a trailing line comment

Decoded as four trailing entries on a, in this order:
1. trailing block comment "before |dec1"
2. trailing decorator |dec1()
3. trailing decorator |dec2()
4. trailing line comment "this is a trailing line comment"

Rule (inherited from tier-0 comments): any # or // line comment must come last — line comments consume to end-of-line, so anything after them isn't part of the same slot. Decorators and /* … */ block comments don't consume EOL and can appear in any order before the line comment.

This rule applies wherever line comments are syntactically possible (trailing, leading, floating). Inner position is between header and value on the same line; tier-0 already forbids # / // there (they'd consume the value).

Decoration sites: every value-position

The grammar additions, by surrounding production:

decoration         = decorator_call | line_comment | block_comment

inner_run          = decoration { whitespace decoration }
trailing_run       = decoration { whitespace decoration }
leading_block      = ( decoration NEWLINE )+    (* no blank line before next node *)

decorated_value_t1 = inner_run? base_value? trailing_run?
                   ;  (* at least one of inner_run or base_value must be present;
                         trailing_run is only valid after a base_value *)

scalar_root_t1     = leading_block? decorated_value_t1
kvpair_t1          = leading_block? key ":" decorated_value_t1
list_item_t1       = leading_block? "+"     decorated_value_t1
flow_array_t1      = "[" [ decorated_inline_t1 { "," decorated_inline_t1 } ] "]"
flow_table_t1      = "{" [ flow_kv_t1         { "," flow_kv_t1          } ] "}"
flow_kv_t1         = key ":" decorated_inline_t1

decorated_inline_t1 = inner_run? inline_value trailing_run?
                    ;  (* flow forms have no leading/floating positions *)

base_value covers whatever the surrounding production already admits (e.g., child_block is available after + and key:, not inside flow forms — same as tier 0).

Floating decoration follows tier-0's floating-comment rules and attaches at the enclosing container's path, identically to how floating comments attach.

Decoration-only (no base_value)

When inner_run is present but base_value is absent, the value at that path is the dialect-specified empty default for the family of the first inner decorator on that line:

+ |meta(charset: "UTF-8")              # value resolves to html.tag's empty default ({})
+ |link(rel: "stylesheet", href: "x")
key: |required                         # value resolves to required's family empty default

Each dialect publishes a per-family empty default as part of its registration contract (typical defaults: empty table {} for record-shaped families like tag, empty list [] for collection-shaped families).

If no inner decoration is present and no base_value is written, that's a tier-0 parse error as today (e.g., bare + with no continuation). Trailing decoration without a base_value is syntactically impossible — there's no value-position for it to sit after.
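The decoration-only rule can be sketched as follows. This minimal Python sketch assumes inner decorations are dicts with a "family" key and that the dialect's per-family empty defaults are supplied as a dict; ABSENT is a hypothetical sentinel for "no base_value written" (DMS itself has no null).

```python
import copy

ABSENT = object()  # hypothetical marker for "no base_value on this line"


def resolve_line_value(base_value, inner_decorations: list, empty_defaults: dict):
    """Value for one line: the base_value, else the first inner family's default."""
    if base_value is not ABSENT:
        return base_value
    if not inner_decorations:
        # Neither a value nor inner decoration: a tier-0 parse error.
        raise SyntaxError("missing value: no base_value and no inner decoration")
    family = inner_decorations[0]["family"]
    # Copy so that two decoration-only lines never share one mutable default.
    return copy.deepcopy(empty_defaults[family])


if __name__ == "__main__":
    defaults = {"tag": {}, "list_family": []}
    meta = [{"family": "tag", "fn": "meta", "params": [{"charset": "UTF-8"}]}]
    print(resolve_line_value(ABSENT, meta, defaults))  # {}
```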

Indent-block role: unchanged from tier 0

Because tier 1 only adds decoration positions around values, the question "what does an indent block under this line mean?" is answered exactly by tier-0 productions, applied to whatever base_value the line carries (or the dialect-empty default if inner-only):

Line shape	Indent-block role
`key: <inner?> <inline> <trailing?>`	(no indent allowed — leaf)
`key: <inner?>`	`child_block` is the value of `key`
`+ <inner?> <inline> <trailing?>`	(no indent allowed — leaf)
`+ <inner?> key: <inline>`	sibling kvpairs of the same record
`+ <inner?>`	`child_block` is the list-item's value

Tier 1 does not add a new indent-block opener. The tier-0 discriminator stays binary: a line either opens a block (no inline value present) or it doesn't (inline value present). Decoration sits orthogonally and never affects which case applies.

Content hoisting

Children of a tier-1 element can be written in two equivalent forms — block (indent-block) or flow (a content-slot param):

# Block form
+ |p(class: "lede")
  + "Click "
  + |a(href: "/x") "here"
  + " to read."

# Flow form (semantically equivalent)
+ |p(class: "lede", children: ["Click ", |a(href: "/x") "here", " to read."])

Both decode to the same value tree. The decode pipeline hoists the content-slot param into the value-tree position the decoration is attached to, so consumers find children at one place — the value tree — regardless of which source form was used.

Per-family content-slot declaration

The content-slot param name is not hardcoded. Each family in the dialect spec optionally declares its content-slot name as part of its registration contract:

html.tag declares content slot "children" (chosen because no standard HTML element has a literal children="..." attribute — "content" would collide with <meta name="..." content="...">)
A markdown-flavored dialect might declare "body"
An events dialect might declare "payload"
Families with no value-bearing children (e.g., i18n.message, validators.required) declare no content slot and never hoist

When a family has no declared content slot, no param name triggers hoisting — content (or any other key) is just an ordinary param.

Hoist pass

Tier-1 decode runs the hoist pass after the body parse:

1. Parse front matter; resolve dialect imports and bound sigils.
2. Parse body → raw AST where decorator params are intact maps, with no value-tree promotion yet.
3. Hoist pass. For each decoration record:
   - Look up its family in the dialect.
   - If the family declares a content slot and a param map params[N] contains that key, move the value of params[N][slot] out of params and into the value-tree position the decoration attaches to.
   - The remaining keys in params[N] stay as decoration.
4. Apply other tier-1 normalizations (decoration-only → dialect-empty default, params_dec for nested decoration, etc.).

Hoisting is tier-1-only. Tier-0 docs have no decorators, so the pass is a no-op on them.
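The hoist step for a single record might look like the Python sketch below. It is a minimal sketch under stated assumptions: content slots are passed as a hypothetical family → slot-name dict, the caller reports whether the line also opened an indent block (the both-forms conflict), and a slot appearing in more than one param group is treated as an error, which this spec does not explicitly address.

```python
import copy

NOT_HOISTED = object()  # sentinel: the record carried no content-slot param


def hoist_content(record: dict, content_slots: dict, has_block_content: bool = False):
    """Return (new_record, hoisted_value) without mutating `record`."""
    slot = content_slots.get(record["family"])
    new_record = copy.deepcopy(record)
    if slot is None:
        return new_record, NOT_HOISTED  # family declares no content slot
    hits = [i for i, group in enumerate(new_record["params"])
            if isinstance(group, dict) and slot in group]
    if not hits:
        return new_record, NOT_HOISTED
    if len(hits) > 1:
        # Assumption: the spec does not define multi-group slots; reject them.
        raise SyntaxError(f"content slot '{slot}' appears in more than one param group")
    if has_block_content:
        raise SyntaxError(
            f"element '{record['fn']}' has content specified via both "
            f"'{slot}:' parameter and indent block; pick one"
        )
    # Move the slot value out of params; the caller places it in the value tree.
    value = new_record["params"][hits[0]].pop(slot)
    return new_record, value


if __name__ == "__main__":
    rec = {"family": "tag", "fn": "p",
           "params": [{"class": "lede", "children": ["Click ", " to read."]}]}
    new_rec, value = hoist_content(rec, {"tag": "children"})
    print(new_rec["params"], value)
```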

Conflict: both forms present

Specifying content via both the declared content_slot param (children: for HTML, whatever the dialect declared for other families) and an indent block on the same line is a parse error:

+ |p(children: ["a"])     # ← parse error
  + "b"

Decoder error at line N:
  Element |p has content specified via both 'children:' parameter
  and indent block. Pick one.

No magic merging or override semantics. Pick one form per node.

Encoder canonical form

The decoder collapses both source forms to the same value tree; the encoder must choose which form to re-emit. Heuristic, deterministic from content shape:

Block form when the value is a list with > 1 element, OR any element is itself a non-scalar (a list, table, or decorated value with non-trivial content).
Flow form when the value is a short inline span — a list with ≤ 1 element, OR all elements are scalars / decorated scalars (the typical mixed-text-with-spans case).
Block form when the value is a non-list (a table — block is the only way to express a table value-tree position ergonomically).
No content emission when the value matches the family's empty default (e.g., <meta>-shaped self-closing elements).

This rule rewrites source — |p(children: ["one"]) decodes and re-encodes as block form. That's the same kind of canonicalizer behavior tier-0 encoders already do for things like quote style. A future revision can add per-node form preservation (sidecar original_form marker) if real users find rewriting jarring.

Dialect specification contract

A dialect publishes a structured specification that the decoder loads at registration time. The spec is the cross-port source of truth — each port translates it into its native registration format, but the contract (what the decoder validates, what canonical names exist, etc.) is identical across ports.

The spec contains four kinds of declarations: families, param signatures (per family), named structs, and the dialect's version-match rule (see "Dialect versioning"; the per-import version field is described in "Import shape").

Families

Each family the dialect publishes:

name — canonical family name (e.g. "tag", "entity").
default_sigils — sigils this family binds to in the absence of a per-file bind override. List form, even for single-sigil defaults.
empty_default — the value-tree value the decoder uses for decoration-only positions (no following base_value). Typically {} for record-shaped families, [] for list-shaped families, "" for string-shaped, etc.
content_slot (optional) — name of the param that gets hoisted into the value tree on decode (Content hoisting). Omit if the family has no value-bearing children.

Naming guidance: pick a slot name that does not conflict with any valid attribute name for the family. HTML's tag family uses "children" — "content" was the obvious choice but collides with <meta name="..." content="...">, where content is a literal HTML attribute. "children" has no collision in standard HTML. The hoist mechanism owns one canonical slot name per family; every other key flows through params unchanged. If a dialect can't find a non-colliding name, fall back to a prefixed form ("_children", "__body__") — verbose but unambiguous.

params (optional) — param signature for this family (see below).

Param signatures

Three modes per family:

strict — only declared keys accepted; unknown keys are parse errors. Required keys must be present.
wildcard_with_typed — any keys accepted; declared keys are type-checked. Required keys must be present.
wildcard — any keys accepted; no checks. Default if no params block is declared. Equivalent to decoding without any param signature.

Per-family params block structure:

params:
  mode: "wildcard_with_typed"
  typed:
    class:    { type: "string" }
    hidden:   { type: "boolean", default: false }
    children: { type: "list_of any" }
  required: ["id"]

Validation is family-level only. Per-function tightening (e.g. HTML's <input> requires type, <span> does not) lives in the dialect's runtime / render layer, not in the DMS decoder. This keeps the decoder's job small and the spec testable across ports.

Param values themselves can be any tier-0 inline_value shape — scalar, flow_array, or flow_table — or a decorated value (|inner(...) nested in another decorator's params, resolved through params_dec). Validation applies to the hoisted + nested-resolved value, after the decoder has finished normalizing the AST.

Type vocabulary

Type	Matches
`string`	tier-0 string
`integer`	tier-0 integer
`float`	tier-0 float
`boolean`	tier-0 boolean
`datetime`	tier-0 datetime
`list_of <T>`	flow_array (or hoisted block list) where every element matches `<T>`
`map_of <T>`	flow_table (or block table) where every value matches `<T>`
`any`	any value-tree shape
`<StructName>`	a map matching the named struct (see below)
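The type vocabulary can be read as a recursive matcher. The Python below is a minimal sketch, assuming a port that maps tier-0 values to Python str / int / float / bool, datetime.date / datetime.time (datetime.datetime included), lists and dicts. The names type_matches and struct_matches are hypothetical, and struct signatures use the field-spec shape from Named structs below. The table does not say whether undeclared keys inside a struct are allowed; this sketch lets them through.

```python
import datetime

# Hypothetical port mapping: tier-0 values arrive as str / int / float / bool /
# datetime objects, lists (flow_array or block list) and dicts (flow_table or block table).
_BUILTIN = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "float": lambda v: isinstance(v, float),
    "boolean": lambda v: isinstance(v, bool),
    "datetime": lambda v: isinstance(v, (datetime.date, datetime.time)),
    "any": lambda v: True,
}


def type_matches(type_str, value, structs=None):
    """Return True if value matches a type string such as "list_of Address"."""
    structs = structs or {}
    t = type_str.strip()
    if t.startswith("list_of "):
        inner = t[len("list_of "):]
        return isinstance(value, list) and all(type_matches(inner, v, structs) for v in value)
    if t.startswith("map_of "):
        inner = t[len("map_of "):]
        return isinstance(value, dict) and all(
            type_matches(inner, v, structs) for v in value.values())
    if t in _BUILTIN:
        return _BUILTIN[t](value)
    if t in structs:
        return struct_matches(structs[t], value, structs)
    raise ValueError(f"unknown type {t!r}")


def struct_matches(fields, value, structs):
    """Check a map against a named struct's field signatures (recursively)."""
    if not isinstance(value, dict):
        return False
    for name, spec in fields.items():
        if name not in value:
            if spec.get("required", False):
                return False
            continue
        if not type_matches(spec["type"], value[name], structs):
            return False
    return True  # undeclared keys pass through in this sketch
```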

Named structs

Dialects may declare reusable struct types referenced by name in typed signatures:

structs:
  Address:
    street: { type: "string", required: true }
    city:   { type: "string", required: true }
    zip:    { type: "string" }

  ContactInfo:
    email: { type: "string" }
    home:  { type: "Address" }
    work:  { type: "Address" }

families:
  + name: "user_card"
    params:
      mode: "wildcard_with_typed"
      typed:
        contact:   { type: "ContactInfo" }
        addresses: { type: "list_of Address" }

Each struct field has the same shape as a typed entry — type, optional required, optional default. Structs may reference other structs and built-in types. Cycles are a registration-time error (the dialect's spec fails to load if a struct references itself directly or transitively).
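The registration-time cycle check is an ordinary depth-first search over struct references. The sketch below assumes the structs block has been loaded into a dict of name to field specs; check_struct_cycles and struct_ref are hypothetical names. Following the text above literally, a reference wrapped in list_of / map_of still counts as a reference, so a struct that contains list_of itself is rejected too.

```python
_BUILTINS = {"string", "integer", "float", "boolean", "datetime", "any"}


def struct_ref(type_str):
    """Peel list_of / map_of wrappers; return the struct name or None for built-ins."""
    t = type_str.strip()
    while t.startswith(("list_of ", "map_of ")):
        t = t.split(" ", 1)[1].strip()
    return None if t in _BUILTINS else t


def check_struct_cycles(structs):
    """Raise ValueError if any struct references itself directly or transitively."""
    state = {}  # name -> "active" while on the DFS stack, "done" once finished

    def visit(name, trail):
        state[name] = "active"
        for spec in structs[name].values():
            ref = struct_ref(spec["type"])
            if ref is None or ref not in structs:
                continue  # built-in, or an unknown name left to other checks
            if state.get(ref) == "active":
                cycle = trail[trail.index(ref):] + [ref]
                raise ValueError("struct cycle: " + " -> ".join(cycle))
            if ref not in state:
                visit(ref, trail + [ref])
        state[name] = "done"

    for name in structs:
        if name not in state:
            visit(name, [name])
```

A port would run this once when the dialect spec loads and refuse to register the dialect on error.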

Struct names live in the dialect's namespace; cross-dialect struct references are not supported in this revision. If a file imports two dialects that both define Address, each dialect's families resolve their own Address and there is no shared definition.

Decoder validation behavior

When a decorator call is decoded, after Resolution rules and content hoisting the decoder applies the family's signature:

If mode == "wildcard", skip validation.
If mode == "strict", every key in the (post-hoist) param group must appear in typed. Unknown keys are parse errors.
If mode == "wildcard_with_typed", declared typed keys are checked when present; unknown keys pass through unchecked.
For each declared key, run type-match:
  Built-in types match by tier-0 value-tree kind.
  Struct types recursively validate the value as a map against the struct's field signatures.
required keys must be present after hoisting + defaults. Missing required is a parse error.
Defaults fill in absent keys before the AST is finalized (i.e., the decorator record's params shows the defaulted value).
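The rules above, applied to one post-hoist named group, fit in a single function. This is a sketch under stated assumptions: only the scalar part of the type vocabulary is checked (list_of, map_of and struct types are omitted), ParamError stands in for the port's parse error, and validate_named_group is a hypothetical name. A family in positional mode falls through to the same checks as wildcard_with_typed, matching the named-group rule given later for that mode.

```python
class ParamError(Exception):
    """Stand-in for the port's decode-time parse error."""


# Scalar subset of the type vocabulary; list_of / map_of / struct types are omitted here.
_SCALAR = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "float": lambda v: isinstance(v, float),
    "boolean": lambda v: isinstance(v, bool),
    "any": lambda v: True,
}


def validate_named_group(params_sig, group):
    """Validate one post-hoist named group and return a copy with defaults filled in."""
    mode = (params_sig or {}).get("mode", "wildcard")  # no params block means wildcard
    if mode == "wildcard":
        return dict(group)
    typed = params_sig.get("typed", {})
    out = dict(group)
    if mode == "strict":
        unknown = sorted(k for k in out if k not in typed)
        if unknown:
            raise ParamError(f"Unknown param(s) {unknown} in a strict family.")
    for key, spec in typed.items():
        if key not in out and "default" in spec:
            out[key] = spec["default"]  # defaults land before the required check
        if key in out and not _SCALAR[spec["type"]](out[key]):
            raise ParamError(f"Param {key!r} does not match signature type {spec['type']!r}.")
    missing = [k for k in params_sig.get("required", []) if k not in out]
    if missing:
        raise ParamError(f"Missing required param(s) {missing}.")
    return out
```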

Validation errors fire at decode time with path context:

Decoder error at line N, decorator |tag(...) at path [0, 1]:
  Param 'class' has type integer but signature requires string.

(Path is rendered with the canonical typed-segment form — strings for map keys, integers for list indices, displayed as a list.)

Positional params

A param group is either flow-table-shaped (all named) or flow-array-shaped (all positional). A single call keeps the two modes apart by putting each in its own group:

|score(95, 5)                      # one positional group
|tag(class: "lede")                # one named group
|score(95, 5)(commented: true)     # positional group, then named group
|some(5)                           # variant payload — positional
|emphasis "text"                   # base_value form (existing) still works

A param group cannot mix positional and named at the same level. The mode of a group is detected from its first token and locked in for the whole group — mixing within one group is a parse error. To mix modes, use multiple param groups.

Rationale for separation-by-group rather than Python-style "positional-then-named within one group":

No parse ambiguity. Mode is detectable at the first token after (; no lookahead, no late re-classification.
Lossless round-trip. Encoder knows from the AST shape which group was positional vs named and emits faithfully.
Existing multi-group syntax gets purpose. (a)(b) was underused while every group was named; positional groups make it load-bearing.
Consistent with DMS's "one form per concept" principle. No implicit-mixing magic.

Lexer rule

At the open (, the decoder peeks at the first non-whitespace token:

First token after `(`	Group is	Parse as
`key:` (ident followed by `:`)	named	`flow_kvs`
`)` (immediate close)	empty	`[{}]` (back-compat)
anything else	positional	`flow_array_elems`

The "anything else" includes: scalars, flow forms, decorator calls, base_value-like inline values. A positional group is exactly a flow_array body without the brackets.

A key: token appearing after a positional element in the same group, or any non-key: token after a named element, is a parse error: "Cannot mix positional and named params in one group; use a separate (...) group."
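The first-member lock-in and the mixing error can be illustrated with a small text-level sketch. It assumes the caller has already found the group's matching ) and passes the text between the parens; it understands only double-quoted strings with backslash escapes (no heredocs, no comments) and treats a bare identifier followed by : as the key: token. split_members and classify_group are hypothetical names, and a real lexer would make the same decision token by token rather than on pre-split text.

```python
import re

# A named member starts with a bare identifier followed by ':'.
_KEY = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\s*:")


def split_members(body):
    """Split the text between a group's parens at top-level commas."""
    members, depth, in_str, start, i = [], 0, False, 0, 0
    while i < len(body):
        ch = body[i]
        if in_str:
            if ch == "\\":
                i += 1  # skip the escaped character
            elif ch == '"':
                in_str = False
        elif ch == '"':
            in_str = True
        elif ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        elif ch == "," and depth == 0:
            members.append(body[start:i].strip())
            start = i + 1
        i += 1
    members.append(body[start:].strip())
    if members[-1] == "":
        members.pop()  # trailing comma, or an empty body
    if "" in members:
        raise SyntaxError("Empty member in param group.")
    return members


def classify_group(body):
    """Return ("empty" | "named" | "positional", members) for one param group."""
    members = split_members(body)
    if not members:
        return "empty", members
    named = bool(_KEY.match(members[0]))  # mode locks in at the first member
    for m in members[1:]:
        if bool(_KEY.match(m)) != named:
            raise SyntaxError("Cannot mix positional and named params in one group; "
                              "use a separate (...) group.")
    return ("named" if named else "positional"), members
```

For example, classify_group("95, 5") yields ("positional", ["95", "5"]), while "95, commented: true" raises the mixing error.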

AST shape

The params field on a decoration record becomes a list of either Map (named group) or List (positional group):

# |tag(class: "x") decodes to:
params: [{ class: "x" }]

# |score(95, 5) decodes to:
params: [[95, 5]]

# |score(95, 5)(commented: true) decodes to:
params: [[95, 5], { commented: true }]

# |tag and |tag() both decode to:
params: [{}]

In languages with sum types: params: List<Map<String,Value> | List<Value>>. In dynamically-typed languages: detect kind at runtime.

Per-family signature

Families that accept positional params declare a positional block alongside the existing typed block. The mode enum gains a fourth value:

Mode	Positional groups	Named groups	Strict checking on names
`wildcard`	rejected	accepted	none
`wildcard_with_typed`	rejected	accepted	declared `typed` keys
`strict`	rejected	accepted	only declared `typed` keys
`positional`	accepted	accepted	positional slots typed; named keys per `typed`

Spec example:

families:
  + name: "variant"
    default_sigils: ["|"]
    empty_default: {}
    content_slot: "value"
    params:
      mode: "positional"
      positional:
        - { name: "value", type: "any" }
      typed: {}                     # no named keys defined for this family

Each positional slot has:

name — used for AST round-trip identity and error messages
type — from the standard type vocabulary
required (optional, default true) — slot must be present
default (optional) — fills in absent slot before AST is finalized
variadic (optional, default false) — see below

positional is an ordered list of slots. Element 0 of the positional group fills slot 0, element 1 fills slot 1, etc.

Variadic positional slot

A family that accepts arbitrary-arity positional calls (|node(a, b, c, d, e)) declares its last slot as variadic: true. Each surplus positional element collects into the variadic slot's list.

# A family that takes one required string label and any number
# of additional values:
positional:
  - { name: "label", type: "string", required: true }
  - { name: "args",  type: "any", variadic: true }

# A family that takes only variadic args (KDL-shaped):
positional:
  - { name: "args", type: "any", variadic: true }

Rules.

Only the last slot may be variadic. A variadic slot followed by another slot is a registration-time error ("variadic slot 'X' must be the last positional slot in family <f>"). Forbidding mid-list variadic keeps slot assignment a single left-to-right scan with no end-counting.
At most one variadic slot per family. Falls out of the previous rule.
Variadic slots are implicitly optional — required and default are not used on variadic slots; zero matching elements is valid and produces [].
Element-level typing. The slot's type describes the type of each element. The slot's collected value is implicitly a list of those elements. To accept any element type (KDL's case), set type: "any". To accept only integers (|sum(1, 2, 3, 4)), set type: "integer".
Surplus elements never error when variadic is present — they always have a slot to land in. Without a variadic slot, surplus elements remain a parse error per existing rules.

Validation pass — slot assignment.

For a positional group with K elements and a signature with N slots where slot N-1 is variadic:

Elements 0..N-2 fill the non-variadic slots in order. Type-check each against its slot's type. Apply defaults to missing optional slots in this range.
Elements N-1..K-1 collect into the variadic slot's list. Type-check each against the variadic slot's type (element type).
If K < N-1, missing required non-variadic slots are a parse error. The variadic slot itself can be empty.

Signature	Call	Validates as
`[label: string!, args: any (variadic)]`	`|node("x")`	`label: "x"`, `args: []`
`[label: string!, args: any (variadic)]`	`|node("x", 1, 2, 3)`	`label: "x"`, `args: [1, 2, 3]`
`[label: string!, args: any (variadic)]`	`|node`	parse error: `label` required
`[args: any (variadic)]`	`|node`	`args: []` (no positional group)
`[args: any (variadic)]`	`|node("a", "b")`	`args: ["a", "b"]`
`[args: integer (variadic)]`	`|node(1, 2, 3)`	`args: [1, 2, 3]`
`[args: integer (variadic)]`	`|node(1, "two", 3)`	parse error: element 1 type mismatch
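The slot-assignment pass and the registration-time check can be sketched in Python as follows. Assumptions: slots are dicts shaped like the positional entries above, a call with no positional group (|node) is modelled as an empty element list, only a few scalar types are checked, and SlotError, check_slots and assign_slots are hypothetical names. The returned dict is the structured view a tool might build; the AST itself stays a flat list, as described under AST shape below.

```python
class SlotError(Exception):
    """Stand-in for the port's parse / registration error."""


_CHECK = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "any": lambda v: True,
}


def check_slots(slots):
    """Registration-time rule: only the last positional slot may be variadic."""
    for slot in slots[:-1]:
        if slot.get("variadic"):
            raise SlotError(f"variadic slot {slot['name']!r} must be the last positional slot")


def assign_slots(slots, elements):
    """Single left-to-right scan mapping a flat positional group onto named slots."""
    variadic = slots[-1] if slots and slots[-1].get("variadic") else None
    fixed = slots[:-1] if variadic else slots
    if variadic is None and len(elements) > len(fixed):
        raise SlotError(f"{len(elements) - len(fixed)} surplus positional element(s)")
    view = {}
    for i, slot in enumerate(fixed):
        if i < len(elements):
            value = elements[i]
        elif "default" in slot:
            value = slot["default"]
        elif slot.get("required", True):
            raise SlotError(f"positional slot {i} ({slot['name']!r}) is required")
        else:
            continue  # optional slot without a default stays absent
        if not _CHECK[slot["type"]](value):
            raise SlotError(f"positional slot {i} ({slot['name']!r}) does not match {slot['type']!r}")
        view[slot["name"]] = value
    if variadic is not None:
        rest = list(elements[len(fixed):])  # zero surplus elements gives []
        for j, value in enumerate(rest, start=len(fixed)):
            if not _CHECK[variadic["type"]](value):
                raise SlotError(f"element {j} does not match {variadic['type']!r}")
        view[variadic["name"]] = rest
    return view
```

Each row of the table above corresponds to one assign_slots call; for instance assign_slots(node_slots, ["x", 1, 2, 3]) gives {"label": "x", "args": [1, 2, 3]}.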

AST shape — unchanged.

Variadic does not change the AST. The positional group stays a flat List<Value> at its position in params:

# |node("x", 1, 2, 3) decodes to:
params: [["x", 1, 2, 3]]

The dialect's positional signature is metadata for validation and structured access, not an AST transform. Tools that want the structured { label: "x", args: [1, 2, 3] } view apply the signature on top of the raw list; tools that don't — generic walkers, sidecar inspectors, lite-mode consumers — get the same flat List<Value> regardless of whether variadic is declared.

Decoder cost.

Zero new lex/parse work. The positional-group lexer still produces a flat List<Value> regardless of the family's slot declarations. Variadic is a validation-pass rule applied after parsing — the same pass that already iterates the list to type-check non-variadic slots.

The cost the spec adds:

One new descriptor field per slot (variadic: bool)
One new validation rule (slot assignment under variadic) — shape: a single left-to-right scan, no backtracking
One new registration-time check (variadic-slot-must-be-last)

No lexer state change, no new tokens, no new AST shape, no new streaming yield rule. Streaming behavior is identical to the existing positional-group rule (decorator-call parens are yield-suspending; yield is deferred until the close paren regardless of how many elements appear inside).

Encoder.

Encoder emits the positional group as a flat comma-separated list. No marker for the variadic boundary — the boundary is implicit: with N slots, every element after the first N-1 belongs to the variadic slot.

|node("x", 1, 2, 3) round-trips as |node("x", 1, 2, 3), both with and without a variadic-aware encoder.

Decoder validation behavior (extended)

For each param group:

If group is positional and family mode == "positional": validate group elements against the family's positional slots in order. Type-check each element against its slot's type. Apply defaults for absent trailing slots if not required.
If group is positional and family mode != "positional": parse error — "Family '<f>' does not accept positional params; use named keys."
If group is named: validate per the existing rules (wildcard / wildcard_with_typed / strict).
If family mode == "positional": named groups still validate against typed exactly as wildcard_with_typed would. Mixed-group calls (one positional, one named) are normal.

Validation errors carry slot identity for positional groups:

Decoder error at line N, decorator |score(...) at path [0, 1]:
  Positional slot 1 ('y') has type string but signature
  requires integer.

Encoder canonical form

The encoder emits each group in its decoded shape:

params: [{ k: v }] → (k: v)
params: [[a, b]] → (a, b)
params: [[a, b], { k: v }] → (a, b)(k: v)

Group order is preserved from decode. No re-ordering, no merging across groups, no automatic conversion (positional elements are never re-emitted as named, even when slot names exist).
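The single-line emission rule is a straight walk over params. The sketch below is an illustration under assumptions: values are limited to scalars and flow forms (decorated values in params are not handled), strings use JSON-style escaping, which is assumed rather than taken from SPEC.md, map keys are emitted unquoted, and render_value / render_groups are hypothetical names. It also renders params: [{}] as "()" and takes no position on whether a bare |tag is the preferred output for that case.

```python
import json


def render_value(v):
    """Render a tier-0 scalar or flow form (decorated values are out of scope here)."""
    if isinstance(v, bool):  # check bool before int: bool is an int subclass
        return "true" if v else "false"
    if isinstance(v, (int, float)):
        return repr(v)
    if isinstance(v, str):
        # Assumption: JSON-style escapes are acceptable DMS string syntax.
        return json.dumps(v, ensure_ascii=False)
    if isinstance(v, list):
        return "[" + ", ".join(render_value(x) for x in v) + "]"
    if isinstance(v, dict):
        return "{" + ", ".join(f"{k}: {render_value(x)}" for k, x in v.items()) + "}"
    raise TypeError(f"sketch cannot render {type(v).__name__}")


def render_groups(params):
    """Emit each group in its decoded shape, in decoded order, with no merging."""
    parts = []
    for group in params:
        if isinstance(group, dict):  # named group -> (k: v, ...)
            members = [f"{k}: {render_value(v)}" for k, v in group.items()]
        else:  # positional group -> (a, b, ...), never re-keyed by slot name
            members = [render_value(v) for v in group]
        parts.append("(" + ", ".join(members) + ")")
    return "".join(parts)
```

For example, render_groups([[95, 5], {"commented": True}]) returns "(95, 5)(commented: true)".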

Multi-line vs single-line group emission

Decorator-call parens are a flow-form region (per "Streaming / incremental decode" below), so they inherit SPEC.md's canonical multi-line layout for flow forms — close-bracket anchors the indent, members one level deeper, trailing comma on the last member. Tier 1 adds two specifics:

Multi-line emission is not optional infrastructure for tier-1 ports. Decorator-call parens have no block-form alternative (unlike tier-0 lists / tables, which canonicalize to block form when non-empty). Block-shaped dialects routinely have groups with many keys; a single-line-only encoder produces unreadable output. Every tier-1-capable port MUST support multi-line emission for both named and positional groups.
Mixing single-line and multi-line groups in one call is permitted. If the first group fits on one line and the second doesn't, emit single-line then multi-line:

|resource("aws_instance", "web")(
  count: 3,
  ami: "ami-...",
  instance_type: "t2.micro",
)

Here ("aws_instance", "web") is single-line and the named group is multi-line; both groups open on the call's line.

The break threshold (when to choose multi-line) is the same as SPEC.md's flow-form rule: single-line render exceeding the port's line-width threshold, OR the group containing a value that itself renders multi-line (nested decorator call, multi-line flow form, heredoc).
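The per-group break decision can be sketched as a width check against the current physical line. This sketch implements only the width trigger; the second trigger (a member that itself renders multi-line, such as a nested call or heredoc) is omitted. The 80-column default, the two-space indent step, JSON-style string escaping and the name render_call are assumptions, not values taken from SPEC.md.

```python
import json


def _render(v):
    # Scalars only in this sketch; JSON-style string escaping is an assumption.
    if isinstance(v, bool):
        return "true" if v else "false"
    if isinstance(v, str):
        return json.dumps(v, ensure_ascii=False)
    return repr(v)


def render_call(fn, params, indent="", width=80, step="  "):
    """Emit |fn(...)(...), breaking a group onto multiple lines only when it overflows."""
    out = f"{indent}|{fn}"
    for group in params:
        if isinstance(group, dict):
            members = [f"{k}: {_render(v)}" for k, v in group.items()]
        else:
            members = [_render(v) for v in group]
        single = "(" + ", ".join(members) + ")"
        current_line = out.rsplit("\n", 1)[-1]  # width applies to the physical line
        if len(current_line) + len(single) <= width:
            out += single
        else:
            # Members one level deeper, each ending in a comma (the last one too),
            # close paren anchored at the call's indent.
            out += "(" + "".join(f"\n{indent}{step}{m}," for m in members) + f"\n{indent})"
    return out
```

With width=40, the |resource call from the example above renders with its positional group on the call line and its named group broken out, matching the layout shown there.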

Decoding accepts both forms unconditionally — decorator-call parens are yield-suspending, so line breaks inside (...) are invisible to the parse.

Hoisting interaction

content_slot hoisting is a named-key mechanism. A positional group does not trigger hoisting, regardless of whether positional slot 0's name matches content_slot.

Inline base_value continues to hoist:

|some 5 → hoists 5 into the family's content_slot
|some(5) → first positional param is 5; not hoisted
|some(value: 5) → named value key; hoisted if value is the content_slot name

Three forms can produce the same value tree if value is the content_slot AND positional slot 0's name is value. The dialect MUST document its canonical encode form (typically inline base_value when possible, else named, else positional) so round-trips are stable.
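The three hoisting cases can be condensed into one function. Assumptions in this sketch: the hoisted key is removed from its named group (consistent with validation running on the "post-hoist" param group above), the first named group carrying the key wins when several do, a private sentinel marks "no inline base_value" because DMS has no null, and hoist is a hypothetical name.

```python
_ABSENT = object()  # sentinel for "no inline base_value" (DMS itself has no null)


def hoist(content_slot, params, empty_default, base_value=_ABSENT):
    """Return (value_tree_value, params_after_hoist) for one decorated node."""
    # Copy every group so the caller's params are left untouched.
    params = [dict(g) if isinstance(g, dict) else list(g) for g in params]
    if base_value is not _ABSENT:
        return base_value, params  # |some 5: inline base_value hoists
    if content_slot is not None:
        for group in params:
            # Only named groups hoist; the first group carrying the key wins here.
            if isinstance(group, dict) and content_slot in group:
                return group.pop(content_slot), params
    # |some(5): a positional group never hoists, even if slot 0 shares the slot's name.
    return empty_default, params
```

hoist("value", [{"value": 5}], {}) and hoist("value", [{}], {}, base_value=5) both produce the value-tree value 5, while hoist("value", [[5]], {}) leaves the empty default in the tree: the round-trip ambiguity the paragraph above asks dialects to resolve.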

Dialect versioning

Dialect versions are semver. All dialects must publish their versions as MAJOR.MINOR.PATCH strings, with optional pre-release (-rc.1, -alpha.2) and build-metadata (+build.7) suffixes. This is a hard requirement — no other versioning schemes are supported.

The dialect declares one match strategy in its canonical spec, drawn from this fixed enum:

Strategy	Behavior
`exact`	Installed version equals requested version exactly.
`caret`	Same major, installed ≥ requested. (npm `^x.y.z` semantics.) For `0.x.y` requests, behaves as `tilde` — pre-1.0 minor bumps are breaking, per semver convention.
`tilde`	Same major.minor, installed patch ≥ requested patch.
`gte`	Installed version ≥ requested version.
`any`	Any installed version matches.

Default if undeclared: caret. Standard practice; friendliest evolution path.

The match algorithms are normative. Every port implements all five strategies identically — no per-port semantics drift.

Pre-release and build-metadata rules:

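Pre-release tags participate in matching and are ordered by semver precedence: an installed 1.0.0-rc.1 sorts below 1.0.0, so it does not satisfy a request for 1.0.0 under any strategy except any (which matches every installed version). Pre-release ordering follows semver (-alpha < -beta < -rc < release).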
Build metadata is ignored for matching: 1.0.0+linux matches 1.0.0+darwin under exact.
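A minimal Python sketch of the five strategies under these rules follows. It reads tilde as "same major.minor and installed ≥ requested by full semver precedence", so pre-release tags participate as stated above; that reading, and the names parse, compare and version_matches, are assumptions of the sketch. Precedence follows Semantic Versioning 2.0.0: numeric identifiers compare numerically and sort before alphanumeric ones, and a longer identifier list wins a tie.

```python
import re

# MAJOR.MINOR.PATCH with optional -pre-release and +build metadata (ASCII only).
_SEMVER = re.compile(
    r"^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)"
    r"(?:-([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?"
    r"(?:\+[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?$"
)


def parse(version):
    m = _SEMVER.match(version)
    if m is None:
        raise ValueError(f"not a semver string: {version!r}")
    major, minor, patch, pre = m.groups()  # build metadata is matched but discarded
    return (int(major), int(minor), int(patch)), (tuple(pre.split(".")) if pre else ())


def _cmp_ident(a, b):
    if a.isdigit() and b.isdigit():
        return (int(a) > int(b)) - (int(a) < int(b))
    if a.isdigit() != b.isdigit():
        return -1 if a.isdigit() else 1  # numeric identifiers sort first
    return (a > b) - (a < b)  # ASCII order: alpha < beta < rc


def compare(x, y):
    """Semver precedence: negative, zero or positive. Build metadata never counts."""
    (core_x, pre_x), (core_y, pre_y) = parse(x), parse(y)
    if core_x != core_y:
        return -1 if core_x < core_y else 1
    if not pre_x or not pre_y:
        return (not pre_x) - (not pre_y)  # a release outranks its pre-releases
    for a, b in zip(pre_x, pre_y):
        c = _cmp_ident(a, b)
        if c:
            return c
    return (len(pre_x) > len(pre_y)) - (len(pre_x) < len(pre_y))


def version_matches(strategy, installed, requested):
    (i_core, _), (r_core, _) = parse(installed), parse(requested)
    at_least = compare(installed, requested) >= 0
    if strategy == "exact":
        return compare(installed, requested) == 0
    if strategy == "caret":
        if r_core[0] == 0:  # pre-1.0: minor bumps are breaking, so behave as tilde
            return version_matches("tilde", installed, requested)
        return i_core[0] == r_core[0] and at_least
    if strategy == "tilde":
        return i_core[:2] == r_core[:2] and at_least
    if strategy == "gte":
        return at_least
    if strategy == "any":
        return True
    raise ValueError(f"unknown version_strategy: {strategy!r}")
```

Under this sketch, the failure example below holds: none of 1.0.0, 1.2.0 or 1.4.9 satisfies a caret request for 1.5.0.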

Where it lives in the dialect spec:

# Dialect canonical spec
name: "html"
version: "1.0.0"
version_strategy: "caret"        # optional; defaults to "caret"

structs: ...
families: ...

File-side syntax: the file writes a plain semver string (version: "1.0.0"); the dialect's strategy is applied. Range specifiers in the file (npm-style ^1.0.0, ~1.0.0) are not supported in this revision and would be a parse error if written. Range-specifier syntax is parked as a future enhancement.

Failure mode at decode:

Decoder error in front matter: dialect 'html' v1.5.0 requested
with strategy 'caret', but installed versions [1.0.0, 1.2.0,
1.4.9] do not satisfy. Install ≥1.5.0 of html.

Registration-time validation: if a dialect spec declares a version_strategy outside the five-value enum, the port refuses to register the dialect and surfaces an error.

Branding & file naming

A tier-1 document that imports any dialect is no longer a plain DMS document — it's a DMS dialect document. Naming conventions:

Brand identifier: dms+<brand> form. The + echoes EBNF / MIME dialect notation. Avoid dms-<brand> because that conflicts with port-naming convention (dms-c, dms-py, dms-rs).
File extension: .dms.<brand_extension>. Examples: .dms.html, .dms.svg. Editors and grep can dispatch on the secondary extension without parsing front matter.
Tier 0 stays canonical "DMS". Brands are strict supersets: every tier-1 dialect doc, with its decoration stripped, is a valid tier-0 document.

Open: dialect registry governance. Who allocates short brand names (html, markup, config)? Punted to a future registry. For now: an allocations document in the SPEC repo with PR-based additions; an x- prefix for unofficial / experimental dialects (x-mybrand); reverse-DNS namespacing available for anything else (io.flolabs.html).

Decoder / encoder split

Tier 1 introduces enough new lex / parse / sidecar machinery that mixing it into the tier-0 entry point would (a) bloat tier-0-only ports with code they don't need, and (b) muddle conformance — "does this port handle tier 1?" should be a yes/no per port, answered by which functions it ships.

Four functions, paired by tier

decode_t0(source, opts?) → Document_t0      # tier-0 only; rejects tier-1
encode_t0(doc: Document_t0, opts?) → str    # tier-0 only; rejects decorations

decode_t1(source, opts?) → Document_t1      # accepts both tiers
encode_t1(doc: Document_t1, opts?) → str    # accepts both tiers

The opts shape is per-port idiom (kwargs, options struct, builder, etc.) and carries:

mode: 'lite' | 'full' (default 'full')
dialect_registry (for _t1 only — port-specific lookup of installed dialect implementations)
other port-local knobs (formatter options on encode, etc.)

Tier detection

Tier-0 documents need not declare a tier. The decoder reads front matter and:

_dms_tier absent (or front matter itself absent) → tier 0
_dms_tier: 0 → tier 0 (legal but redundant)
_dms_tier: 1 → tier 1
any other value or future-tier integer → version error
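The detection table maps directly onto a small function. This sketch assumes front matter has already been decoded into a dict (None when the document has no front matter) and uses ValueError as a stand-in for the port's version error; detect_tier is a hypothetical name.

```python
def detect_tier(front_matter):
    """Map decoded front matter (a dict, or None when absent) to a tier number."""
    if front_matter is None or "_dms_tier" not in front_matter:
        return 0  # no declaration: plain tier-0 document
    tier = front_matter["_dms_tier"]
    # type() rather than isinstance(): True would otherwise pass as 1.
    if type(tier) is int and tier in (0, 1):
        return tier
    raise ValueError(f"unsupported _dms_tier value {tier!r}")
```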

A bare tier-0 document needs no declaration; the _dms_tier field is the opt-in marker for tier ≥ 1.

Document types

Document_t0 = { value_tree, comments }
Document_t1 = Document_t0 + { decorators }       # strict superset

Languages with subtyping (Python, TS): Document_t1 extends Document_t0. Languages without (Rust, Go): explicit field — a Document_t0 is convertible to a Document_t1 with empty decorators.

A decode_t1 always produces a Document_t1. If the source was tier-0 (no _dms_tier: 1), the result is a Document_t1 with an empty decorators list — structurally indistinguishable from a tier-0 doc round-tripping through tier-1 machinery.

Behavior at the boundary

decode_t0 on tier-1 input (front matter has _dms_tier: 1): errors immediately with the actionable forward-compat message described in SPEC's reservations section: "_dms_tier: 1 found, but this decoder only supports tier 0. Use decode_t1." Decoder must not attempt to parse the body.
decode_t1 on tier-0 input: succeeds. Tier-0 logic runs unchanged; the decoration-aware lexer paths are gated on the tier flag and cost nothing at parse time when not active.
encode_t0 on a Document_t1 with non-empty decorators: errors with "Document has tier-1 decorations at <path>; strip first or use encode_t1." Ports may ship a strip_decorations(doc) → Document_t0 helper for consumers who want a tier-0 projection.
encode_t1 on a tier-0-shaped Document_t1 (empty decorators): succeeds, emits a tier-0 document (no _dms_tier field).
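In a language with subtyping, the two document types and the encode_t0 boundary check might look like the Python below. This is a sketch: the class names mirror the spec's prose rather than PEP 8, decorator entries are assumed to be dicts carrying a "path" key, and ensure_encodable_t0 is a hypothetical helper; strip_decorations is the optional helper named above, here returning a shallow projection that shares the value tree.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document_t0:
    value_tree: Any
    comments: list = field(default_factory=list)


@dataclass
class Document_t1(Document_t0):  # strict superset: adds the decorators sidecar
    decorators: list = field(default_factory=list)


def ensure_encodable_t0(doc):
    """The encode_t0 boundary check: refuse documents that still carry decorations."""
    decorators = getattr(doc, "decorators", [])
    if decorators:
        path = decorators[0]["path"]
        raise ValueError(f"Document has tier-1 decorations at {path}; "
                         "strip first or use encode_t1.")


def strip_decorations(doc):
    """Tier-0 projection: same value tree and comments, no decorators (shallow copy)."""
    return Document_t0(value_tree=doc.value_tree, comments=list(doc.comments))
```

A Document_t1 with an empty decorators list passes the check, which matches the rule that such a document encodes as plain tier 0.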

Lite vs full is orthogonal to tier

decode_t0(source, mode='lite')  → value tree only
decode_t0(source, mode='full')  → value tree + comments
decode_t1(source, mode='lite')  → value tree only          (lossy on tier-1 — see warning)
decode_t1(source, mode='full')  → value tree + comments + decorators

Lite mode on tier-1 docs is semantically lossy (per "Lite-mode behavior" earlier). Tier-1-capable ports must surface this in their docs; consumers who lite-decode a tier-1 doc and re-emit it have produced a structurally different document.

Conformance per port

Port profile	Ships	Corpus
Tier-0-only	`decode_t0`, `encode_t0`	tier-0 (~4695 fixtures)
Tier-1-capable	All four	tier-0 + per-dialect tier-1

A tier-1-capable port still ships decode_t0 / encode_t0 — some consumers want strict tier-0 behavior in a tier-1-capable port (e.g., tooling pipelines that reject tier-1 docs by policy).

Forward extensibility

A future tier 2 adds decode_t2 / encode_t2 alongside the existing four. Cumulative — each tier-N decoder accepts tier-N and below. A port adopting tier 2 ships six functions; no existing function changes signature.

Mutate API symmetry

Ports that expose a mutate / path-update API split the same way: mutate_t0 operates on value tree + comments; mutate_t1 preserves decorators across mutations. Tier-1 mutations need to keep decorations attached to the right node across insertion, deletion, and reorder. The contract mutate_t1 must satisfy, and the two implementation strategies (opaque-ID backing or path-rewriting), are spelled out under Stable node identity (port-level) below.

Streaming / incremental decode

Streaming is optional per port. A port may ship batch-only decoders (whole document → Document) without violating spec. If a port ships a streaming decoder, the tier-1 streaming behavior is fully determined by the rule below — no per-port divergence.

Yield-point rule (inherited from tier 0): a streaming decoder yields a (path, value, decorations?) event when it has fully parsed a value-tree node.

Leaf values yield on line completion, after any trailing decoration on that line is attached.
Container values (block-form table, list, child_block) yield on DEDENT.
Flow forms ([...], {...}) suspend line-based yields between the open and close brackets — flow_kvs may span lines and the decoder buffers until the close.

Tier-1 addition: decorator-call parens are a yield-suspending region. A |tag(...) call's parens bound a region treated exactly like a flow form: yields suspend until the closing ). Param groups may span lines, and the decoder buffers until the group closes.

The hoisting wrinkle. When a family declares a content_slot, the decoder cannot finalize the decorated node's value-tree value until the param group is fully parsed — because it needs to check whether the slot key is present and, if so, hoist it into the value tree. Concretely:

+ |p(class: "lede", children: [..., long flow array spanning lines, ...])

The list-item's value-tree value is either the family's empty default (if children is absent) or the hoisted array (if present). The decoder doesn't know which until the param group closes. The yield for this list-item is therefore deferred until ). This isn't a new constraint shape — it's the same suspension flow forms already impose on tier-0 streaming, just applied to decorator-call parens.

What remains line-streamable in tier 1:

Leading decoration — yields when the next sibling node yields, with the leading decorations attached. Same as leading comments today.
Floating decoration — yields on container close. Same as floating comments.
Inner / trailing decoration — same-line as header / value, no streaming impact beyond what tier-0 already handles.
Decorator calls with no multi-line params — yield with the line they're on.
Block-form children of a decorated parent — yield as encountered, identically to tier-0 children under a parent kvpair.

Summary: yield-suspending regions in tier 1 are exactly flow forms + decorator-call parens; hoisting defers a node's yield to the close of its containing param group. Everything else streams as in tier 0.

Per-port API

Each port exposes a path-keyed metadata-lookup API on the decoded Document so consumers can read tier-1 decoration without walking the decorators list themselves. Paths are passed as lists of typed segments (strings for map keys, integers for list indices), matching the AST path field exactly:

doc.decorations_at(["body", 0])              # all kinds at path
doc.decorations_at(["body", 0], kind="|")    # just |-decorations
doc.decoration_value(["body", 0], kind="|")[0].fn
doc.decorations_at([])                       # root-level decorations

Spec mandates the lookup contract (path-keyed by typed segments, kind-discriminated, ordered array per kind). Ports may build an internal hash on first lookup so that repeated lookups cost amortized O(1) instead of a scan of the canonical list. Each port picks idiomatic naming on top.
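A lazily built index over the canonical list is one way to meet this contract. In the sketch below, DecorationIndex and invalidate are hypothetical names, decorator entries are assumed to be dicts with "path" and "kind" keys, and "~" is a made-up second sigil used only in the usage check. Editing an entry's params in place keeps the index valid; adding, removing or re-pathing entries requires invalidate().

```python
from collections import defaultdict


class DecorationIndex:
    """Path-keyed, kind-discriminated lookup over a canonical decorators list."""

    def __init__(self, decorators):
        self._decorators = decorators  # canonical list; entries carry "path" and "kind"
        self._index = None             # built lazily on the first lookup

    def _build(self):
        index = defaultdict(lambda: defaultdict(list))
        for entry in self._decorators:  # list order is preserved within each kind
            # Typed segments (str / int) become hashable once the path is a tuple.
            index[tuple(entry["path"])][entry["kind"]].append(entry)
        return index

    def invalidate(self):
        """Call after adding, removing or re-pathing entries in the canonical list."""
        self._index = None

    def decorations_at(self, path, kind=None):
        if self._index is None:
            self._index = self._build()
        by_kind = self._index.get(tuple(path), {})
        if kind is not None:
            return list(by_kind.get(kind, []))
        # All kinds: grouped by kind (first-seen order), list order within a kind.
        return [e for entries in by_kind.values() for e in entries]
```

Because ["body", 0] and ["body", "0"] become different tuples, a string key and a list index never collide, which keeps the typed-segment distinction intact.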

Stable node identity (port-level)

The canonical sidecar shape is path-keyed: every entry's path field locates the decorated node in the value tree by typed-segment path. This is the wire format for the sidecar, the streaming yield shape, and the inspection / debug-print form.

Path-keying is positional — paths shift when lists are mutated (insert at index 0 → every later index shifts by one). Naive mutation of the value tree can leave decorations pointing at the wrong node. To keep the canonical shape simple while still supporting mutation-heavy workloads, the spec adopts a hybrid model:

At rest, on the wire, in streaming events: path-keyed, always. No opaque IDs in source, no opaque IDs in serialized sidecars, no opaque IDs in streaming events. The sidecar stays self-describing — path: ["body", 0] means something to a human reading the structure.
In memory, per-port, optional: ports MAY back the value tree with identity-bearing nodes (wrappers, arenas, object identity) so a mutate_t1 API can keep decorations attached to nodes through arbitrary mutation without per-call path rewriting. The port chooses the representation; spec does not mandate one.

A port that ships mutate_t1 MUST guarantee:

After any sequence of mutate_t1 operations on a Document_t1, a subsequent encode_t1 produces output where each decoration is emitted on the same logical node it was attached to at decode time — modulo replacements, where the caller overwrites a node with a fresh value and the old node's identity (and its decoration) is dropped.

Ports MAY satisfy this contract by:

Opaque-ID backing. Wrap value-tree nodes in identity-bearing containers internally; sidecar lookups go through identity, not path. Mutations preserve identity for free; replacements drop it explicitly. Port pays a data-model cost (every node carries identity) but mutation is bookkeeping-free.
Path rewriting. Keep value-tree nodes plain. Each mutate_t1 operation rewrites affected path fields in the sidecar as part of the mutation. Port pays bookkeeping cost but keeps the data model plain. Direct mutation of the value tree (bypassing the API) breaks attachment.

The spec is silent on which strategy a port picks; both satisfy the contract. The choice is between paying the cost in the data model (opaque-ID) or in the mutate API surface (path rewriting).
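For the path-rewriting strategy, a single mutate_t1-style list insertion might look like the sketch below. It covers insertion only; deletion, reorder and subtree moves need analogous rewrite rules. insert_with_rewrite is a hypothetical name, the value tree is plain dicts and lists, and decorator entries are assumed to be dicts whose "path" is a list of typed segments.

```python
def insert_with_rewrite(value_tree, decorators, list_path, index, new_value):
    """Insert into the list at list_path, then shift sidecar paths that ran through it."""
    target = value_tree
    for seg in list_path:  # typed segments: str for map keys, int for list indices
        target = target[seg]
    if not 0 <= index <= len(target):
        raise IndexError(f"insert index {index} out of range")
    target.insert(index, new_value)
    depth = len(list_path)
    prefix = list(list_path)
    for entry in decorators:
        path = entry["path"]
        # Only paths that pass through this list at or after the insertion point move.
        if (len(path) > depth and path[:depth] == prefix
                and isinstance(path[depth], int) and path[depth] >= index):
            entry["path"] = path[:depth] + [path[depth] + 1] + path[depth + 1:]
```

Decorations on the list itself, on earlier indices, and elsewhere in the tree keep their paths; only entries at or past the insertion index shift by one, so each stays on the same logical node.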

For ports that ship only decode_t1 / encode_t1 (no mutate_t1): decoration attachment is preserved across decode → encode round-trips on unmutated documents. Programmatic mutation outside a mutate_t1 API is the caller's responsibility — typical mitigation is to manipulate the sidecar in tandem with the value tree, or to re-decode after editing source text directly.

Plain + direct + preserving: pick two

The reason the spec accommodates multiple strategies (rather than mandating one) is that three properties port authors might want out of Document_t1 mutation are not jointly satisfiable:

Plain data model — value tree is unwrapped Map / List / Scalar in the host language's idiomatic types. No wrappers, no identity slots, no per-node ceremony.
Direct mutation — caller edits the value tree in place, without routing through any mutate API.
Preservation — decoration stays attached to the same logical node across mutation.

A port can pick any two of the three. Each choice maps to one of the strategies above:

Opaque-ID backing — gives up the plain data model. Direct mutation + preservation, but the value tree is wrapped (port-internal data-model cost). Suits editor-shaped consumers that mutate heavily and want plain tree[k] = v semantics.
Path rewriting in mutate_t1 — gives up direct mutation. Plain data model + preservation, but only when mutations route through the API. Suits API-shaped consumers willing to call methods for mutation.
Pure plain tree, no mutate_t1 — gives up preservation. Plain data model + direct mutation, but no preservation guarantee under insertion/reorder. Suits config-shaped workloads that decode → small edit → encode (or re-decode after edits).

This isn't a bug; it's a description of what's available. Each port picks the pair that matches its workload. The canonical sidecar shape is path-keyed regardless of the choice — the three strategies differ only in how a port represents the value tree in memory and what its mutate surface looks like, not in what encode_t1 writes to disk.

When to use `mutate_t1` (caller-facing rules)

Independent of the port's strategy, callers need a clear rule for when direct value-tree mutation preserves decoration and when it doesn't. The rule follows from path-keying: decoration is attached to the path it was decoded onto. Anything that changes which path identifies which logical node breaks attachment.

Direct mutation is safe (preservation holds) when:

Leaf value swap at unchanged path. Replacing the scalar at an existing path with another scalar:

  doc.value_tree["server"]["port"] = 8080

Editing decoration content in place. Params, comment text, position fields — these live inside sidecar entries and aren't path-keyed:

  doc.decorations_at(["body", 0], kind="|")[0].params[0]["class"] = "lede"

Tail-only append to a list. Appending past the end shifts no existing index, so decorations through indices already present stay attached. It is safe as long as no decoration entry points at an index at or beyond the list's current length N:

  doc.value_tree["items"].append("new")   # safe iff no decoration on ["items", N+]

Adding a fresh decoration entry. New entries with new paths don't disturb existing ones.
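The tail-append condition can be checked mechanically before mutating. A minimal sketch, assuming the sidecar is held in memory as a list of `{"path": [...], ...}` dicts mirroring the canonical shape; `tail_append_is_safe` is a hypothetical helper, not part of any port's API.

```python
def tail_append_is_safe(sidecar: list[dict], list_path: list, length: int) -> bool:
    """True iff no sidecar path runs through list_path at an index >= length."""
    n = len(list_path)
    for entry in sidecar:
        p = entry["path"]
        # Does this entry's path pass through the list at list_path?
        if len(p) > n and p[:n] == list_path:
            idx = p[n]
            # An index at or past the current end would attach to appended elements.
            if isinstance(idx, int) and idx >= length:
                return False
    return True

tree = {"items": ["a", "b"]}
sidecar = [{"path": ["items", 1], "|": [{"family": "tag", "fn": "li", "params": [{}]}]}]

if tail_append_is_safe(sidecar, ["items"], len(tree["items"])):
    tree["items"].append("c")  # direct mutation stays in the safe set
```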

Direct mutation is unsafe without mutate_t1 (or manual sidecar rewriting) when:

List indices shift. Any insertion or deletion at a non-tail index of a list that has decoration on later indices, or any list reorder — paths through later indices now point at the wrong logical nodes.
Map keys change. Rename, pop + set under a new key, or any operation that removes the key the decoration was attached to — decoration becomes orphaned.
Subtrees move. Splicing a node from one location to another — decoration entries through the source path are wrong; the destination doesn't get them.
Non-leaf replacement with structural change. Overwriting a map or list with a value of different shape — decoration through the old structure doesn't fit the new structure.
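The index-shift case shows why path-rewriting ports route mutation through an API. Below is a minimal sketch of what a path-rewriting insert might do; `insert_with_rewrite` is a hypothetical stand-in for one operation of a port's `mutate_t1` surface, assuming the same list-of-dicts sidecar shape.

```python
def insert_with_rewrite(tree, sidecar, list_path, index, value):
    """Insert into the list at list_path and shift sidecar paths to match.

    Entries whose path passes through list_path at an index >= `index`
    get that index bumped by one, so they keep pointing at the same node.
    """
    target = tree
    for key in list_path:
        target = target[key]
    target.insert(index, value)
    n = len(list_path)
    for entry in sidecar:
        p = entry["path"]
        if len(p) > n and p[:n] == list_path and p[n] >= index:
            entry["path"] = p[:n] + [p[n] + 1] + p[n + 1:]

tree = {"body": ["intro", "outro"]}
sidecar = [{"path": ["body", 1], "|": [{"family": "tag", "fn": "p", "params": [{}]}]}]

# A bare tree["body"].insert(0, "preface") would leave the decoration on
# ["body", 1], which now names "intro". The rewriting insert moves it to
# ["body", 2], where "outro" now lives.
insert_with_rewrite(tree, sidecar, ["body"], 0, "preface")
```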

Operational summary:

If your mutation only changes a leaf value or a decoration's internal contents, edit directly. If your mutation changes which path identifies which logical node — index shift, key change, subtree move, structural overwrite — use mutate_t1, rewrite the sidecar paths yourself, or re-decode after editing source text.
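For callers who edit directly and want a cheap sanity check afterwards, a sketch like the following can flag sidecar entries whose path no longer resolves. The helper names are hypothetical. It only catches orphaned entries (e.g. after a key rename); an index shift that leaves a path resolvable but pointing at the wrong node passes this check silently, which is why the rule above still applies.

```python
def resolve(tree, path):
    """Follow map keys / list indices; return (True, node) or (False, None)."""
    node = tree
    for step in path:
        if isinstance(node, dict) and step in node:
            node = node[step]
        elif isinstance(node, list) and isinstance(step, int) and 0 <= step < len(node):
            node = node[step]
        else:
            return False, None
    return True, node

def orphaned_entries(tree, sidecar):
    """Sidecar entries whose path no longer resolves in the value tree."""
    return [entry for entry in sidecar if not resolve(tree, entry["path"])[0]]

tree = {"server": {"port": 80}}
# Illustrative decoration payload; only the path matters here.
sidecar = [{"path": ["server"], "|": [{"family": "tag", "fn": "section", "params": [{}]}]}]

tree["srv"] = tree.pop("server")  # key rename: one of the unsafe categories
print(orphaned_entries(tree, sidecar))  # the "server" entry is now orphaned
```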

This rule applies regardless of which strategy a port adopted:

Opaque-ID-backed ports make the unsafe cases safe automatically (identity survives; sidecar lookups don't depend on path), so the rule is advisory rather than load-bearing.
Path-rewriting ports rely on the rule strictly: direct mutation in the unsafe categories silently breaks attachment.
No-mutate_t1 ports offer no recourse — callers either stay in the safe set, do their own sidecar bookkeeping, or re-decode.

The rule is format-level, not port-level: any tier-1 consumer can use it to decide whether doc.value_tree[...] = x is fine or whether they need a heavier mutation path.

Why hybrid. Pure path-keyed leaves mutation safety unsolved and forces every consumer that edits a Document_t1 in place to write its own path-bookkeeping. Pure opaque-ID forces a data-model regression — every value-tree node would need an identity slot, breaking the "value tree is plain Map / List / Scalar" contract that tier 0 keeps and that tier 1 inherits. The hybrid keeps the canonical shape clean for interop (path-keyed everywhere it matters across ports) while letting mutation-heavy ports adopt opaque-ID backing as an internal implementation choice.

Conformance

Tier-0 conformance corpus (~4695 fixtures) stays as-is, unchanged.
Tier-1 adds its own corpus, scoped per dialect: dms+html fixtures, dms+markup fixtures, etc.
Each port declares which dialects it supports. Conformance becomes a matrix of (port × dialect → pass count) rather than a single number.
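One way a harness might fold per-fixture results into that (port × dialect) matrix is sketched below. The result tuples, port names, and `conformance_matrix` are hypothetical; the spec only fixes that each cell is a pass count, and this sketch also keeps the fixture total so a cell can be reported as "passed / total".

```python
from collections import defaultdict

# Hypothetical per-fixture results: (port, dialect, passed)
results = [
    ("python", "dms+html", True),
    ("python", "dms+html", False),
    ("rust", "dms+html", True),
    ("rust", "dms+markup", True),
]

def conformance_matrix(results):
    """Fold fixture results into {(port, dialect): (passed, total)}."""
    cells = defaultdict(lambda: [0, 0])
    for port, dialect, passed in results:
        cell = cells[(port, dialect)]
        cell[0] += passed  # bool adds as 0 or 1
        cell[1] += 1
    return {key: tuple(cell) for key, cell in cells.items()}

print(conformance_matrix(results))
```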
Tier-0 conformance per port is unchanged: every port still hits 4695 / 4695 on tier-0 fixtures.

Negatives accepted in this design

These are tier-1 properties that are not bugs but real costs of the approach:

Lite mode loses element identity for decorated documents. Acknowledged; the spec flags this loss with an explicit semantic-loss warning.
Tier-1 → other-format export is muddy. No clean serialization of the value tree + decoration sidecar to JSON, since JSON has no slot for the parallel sidecar. Consumers who need an interop story choose a flattening convention.
Mutation bookkeeping is per-port. The canonical sidecar is path-keyed; mutation safety under mutate_t1 is a port-level implementation choice (opaque-ID backing, or path-rewriting in the API). Direct mutation of the value tree, bypassing mutate_t1, breaks decoration attachment in path-keyed ports by design.
Mixed inline content goes vertical in block form. Mitigated by the family's content_slot in flow form for inline runs (e.g., HTML's children: param).
Family / dialect registry coordination cost. Cross-port consistency requires every port that claims dialect support to implement the same families with the same semantics.
Encoder must dispatch on family. No longer a single walker; needs a per-family renderer registry.
Schema validation splits in two. Structural validation on the value tree, semantic validation on the decoration sidecar.
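On the export point, one possible flattening convention is a two-key JSON envelope. This is a sketch of a consumer-side choice, not something the spec defines; the `"value"` / `"decorations"` key names are assumptions. A plain JSON consumer that ignores the envelope loses the decoration, which is the muddiness noted above.

```python
import json

def flatten_to_json(value_tree, sidecar):
    """Hypothetical convention: wrap both halves in a two-key envelope."""
    return json.dumps({"value": value_tree, "decorations": sidecar}, sort_keys=True)

def unflatten_from_json(text):
    obj = json.loads(text)
    return obj["value"], obj["decorations"]

tree = [["here"]]
sidecar = [{"path": [0], "|": [{"family": "tag", "fn": "a", "params": [{"href": "/spec.html"}]}]}]
text = flatten_to_json(tree, sidecar)
```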

Open questions parked for later

Dialect registry governance + naming.
Editor / IDE story per dialect (syntax highlighting, completion, schema-aware errors).
Range-specifier syntax in files (^1.0.0, ~1.0.0); parked pending real-user pressure once dialects mature.

Worked example: HTML in dms+html

The full dms+html dialect spec — families, typed attributes, content semantics, tag inventory, versioning — lives in dialects/dms+html.md. What follows is a single end-to-end example showing source, decoded value tree, and decoration sidecar.

+++
_dms_tier: 1
_dms_imports:
  + dialect: "html"
    version: "1.0.0"
    ns: "html"
+++

+ |html(lang: "en")
  + |head
    + |title "DMS feature tour"
    + |meta(charset: "UTF-8")
    + |link(rel: "stylesheet", href: "style.css")
  + |body(class: "main", id: "root")
    + |h1 "Welcome to DMS"
    + |p(class: "lede")
      + "Click "
      + |a(href: "/spec.html") "here"
      + " to read the spec."
    + |ul(class: "items")
      + |li "first item"
      + |li "second item"

Decoded value tree (lite mode equivalent):

[
  [
    [["DMS feature tour"]],
    [
      ["Welcome to DMS"],
      ["Click ", ["here"], " to read the spec."],
      [["first item"], ["second item"]]
    ]
  ]
]

Decoration sidecar (full mode), shown in DMS:

decorators:
  - { path: [0],          "|": [{ family: "tag", fn: "html",  params: [{ lang: "en" }] }] }
  - { path: [0, 0],       "|": [{ family: "tag", fn: "head",  params: [{}] }] }
  - { path: [0, 0, 0],    "|": [{ family: "tag", fn: "title", params: [{}] }] }
  - { path: [0, 0, 1],    "|": [{ family: "tag", fn: "meta",  params: [{ charset: "UTF-8" }] }] }
  - { path: [0, 0, 2],    "|": [{ family: "tag", fn: "link",  params: [{ rel: "stylesheet", href: "style.css" }] }] }
  - { path: [0, 1],       "|": [{ family: "tag", fn: "body",  params: [{ class: "main", id: "root" }] }] }
  - { path: [0, 1, 0],    "|": [{ family: "tag", fn: "h1",    params: [{}] }] }
  - { path: [0, 1, 1],    "|": [{ family: "tag", fn: "p",     params: [{ class: "lede" }] }] }
  - { path: [0, 1, 1, 1], "|": [{ family: "tag", fn: "a",     params: [{ href: "/spec.html" }] }] }
  - { path: [0, 1, 2],    "|": [{ family: "tag", fn: "ul",    params: [{ class: "items" }] }] }
  - { path: [0, 1, 2, 0], "|": [{ family: "tag", fn: "li",    params: [{}] }] }
  - { path: [0, 1, 2, 1], "|": [{ family: "tag", fn: "li",    params: [{}] }] }

(params_dec and position fields are elided here; in this example they take their defaults: params_dec is empty and position is "leading".)
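To tie the two halves together, here is a minimal Python sketch that joins the decoded value tree with the path-keyed sidecar and renders the body subtree back to HTML through a per-family renderer registry. `RENDERERS`, `render_tag`, and `render` are hypothetical; this is not the dms+html encoder, and it ignores params_dec, position fields, and the ordering of stacked decorators (each node here carries exactly one).

```python
from html import escape

value_tree = [
    [
        [["DMS feature tour"]],
        [
            ["Welcome to DMS"],
            ["Click ", ["here"], " to read the spec."],
            [["first item"], ["second item"]],
        ],
    ]
]

# The body-subtree entries from the sidecar above.
sidecar = [
    {"path": [0, 1],       "|": [{"family": "tag", "fn": "body", "params": [{"class": "main", "id": "root"}]}]},
    {"path": [0, 1, 0],    "|": [{"family": "tag", "fn": "h1",   "params": [{}]}]},
    {"path": [0, 1, 1],    "|": [{"family": "tag", "fn": "p",    "params": [{"class": "lede"}]}]},
    {"path": [0, 1, 1, 1], "|": [{"family": "tag", "fn": "a",    "params": [{"href": "/spec.html"}]}]},
    {"path": [0, 1, 2],    "|": [{"family": "tag", "fn": "ul",   "params": [{"class": "items"}]}]},
    {"path": [0, 1, 2, 0], "|": [{"family": "tag", "fn": "li",   "params": [{}]}]},
    {"path": [0, 1, 2, 1], "|": [{"family": "tag", "fn": "li",   "params": [{}]}]},
]

def render_tag(deco, inner):
    """Renderer for the "tag" family: first param map becomes attributes."""
    attrs = "".join(f' {k}="{escape(str(v))}"' for k, v in deco["params"][0].items())
    return f"<{deco['fn']}{attrs}>{inner}</{deco['fn']}>"

RENDERERS = {"tag": render_tag}  # family name -> renderer (hypothetical registry)

def render(node, path, index):
    """Render children first, then wrap with any decorators at this path."""
    if isinstance(node, str):
        inner = escape(node)
    else:
        inner = "".join(render(child, path + (i,), index) for i, child in enumerate(node))
    for deco in index.get(path, []):
        inner = RENDERERS[deco["family"]](deco, inner)
    return inner

index = {tuple(entry["path"]): entry["|"] for entry in sidecar}
out = render(value_tree[0][1], (0, 1), index)
print(out)
```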
